Debugging and reverse engineering: 0x50 Debugging

0x50, the bug check we all love because it's so easy to say 'Remove avast!, AVG, Kaspersky, McAfee, Norton, ESET, etc' because most commonly this bug check is caused by antiviruses corrupting the file system, interceptors conflicting if anti-malware and antivirus active protections are running (maybe two antiviruses running at once), etc. Lots of different possibilities. However, what if we're not so quick to blame the antivirus, and come to find instead that it's faulty RAM? Well, let's talk about it!

---------------------------

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced. This cannot be protected by try-except,
it must be protected by a Probe. Typically the address is just plain bad or it
is pointing at freed memory.
Arguments:
Arg1: fffffa806589b700, memory referenced.
Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
Arg3: fffff803fd7133d4, If non-zero, the instruction address which referenced the bad memory
address.
Arg4: 0000000000000002, (reserved)

^^ Here we of course have the basic output of the bug check. As we can see, parameter 1 is the memory that was referenced, and parameter 3 (if non-zero), is the instruction address which referenced the bad memory address (parameter 1). So, we can so far say that address fffffa806589b700 was written to by the instruction at address fffff803fd7133d4. Pretty easy so far!

6: kd> r cr2
cr2=fffffa806589b700

^^ We can see above that the 1st parameter address was stored in cr2 prior to calling the page fault handler. This doesn't tell us anything we don't already know about the bug check, just a confirmation, if you will.

---------------------------

Now that we know all of this, let's go ahead and run !pte on the 1st parameter address. !pte displays the page table entry (PTE) and page directory entry (PDE) for the specified address.

6: kd> !pte fffffa806589b700
                                           VA fffffa806589b700
PXE at FFFFF6FB7DBEDFA8    PPE at FFFFF6FB7DBF5008    PDE at FFFFF6FB7EA01960    PTE at FFFFF6FD4032C4D8
contains 000000021EFFF863 contains 0000000000000000
GetUlongFromAddress: unable to read from fffff803fd9e20e4
pfn 21efff    ---DA--KWEV not valid

^^ We can see from the above that the address fffffa806589b700 is indeed invalid. With this said, why did fffffa806589b700 attempt to write to fffff803fd7133d4?

Let's go ahead and run kv to get the trapframe:

6: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffff880`02f324f8 fffff803`fd7a55a0 : 00000000`00000050 fffffa80`6589b700 00000000`00000000 fffff880`02f326e0 : nt!KeBugCheckEx
fffff880`02f32500 fffff803`fd71eacb : 00000000`00000000 fffffa80`6589b700 fffffa80`06705700 00000000`00000000 : nt! ?? ::FNODOBFM::`string'+0x33e2a
fffff880`02f325a0 fffff803`fd6e1eee : 00000000`00000000 a1e00021`d8925847 00000000`00001000 fffff880`02f326e0 : nt!MmAccessFault+0x55b
fffff880`02f326e0 fffff803`fd7133d4 : 00000003`00000000 00000000`00000000 a0200002`15917005 00000000`73576d4d : nt!KiPageFault+0x16e (TrapFrame @ fffff880`02f326e0)
fffff880`02f32870 fffff803`fd732434 : fffffa80`080e2568 fffff880`02f32ac0 00000000`00000001 fffff803`fd731c9e : nt!MiAgeWorkingSet+0x264
fffff880`02f32a30 fffff803`fd7318bd : fffff880`02f32b09 00000000`00000001 00000000`00000000 00000000`00000000 : nt!MiTrimOrAgeWorkingSet+0xb4
fffff880`02f32a80 fffff803`fd740e94 : 00000000`00000007 00000000`00000000 00000000`00000001 00000000`00000007 : nt!MiProcessWorkingSets+0x1dd
fffff880`02f32b70 fffff803`fd731e31 : 00000000`00000002 fffff880`02f32be0 00000000`00000008 00000000`00000000 : nt!MmWorkingSetManager+0x40
fffff880`02f32ba0 fffff803`fd6b6fd9 : fffffa80`06705700 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KeBalanceSetManager+0xd9
fffff880`02f32c10 fffff803`fd76b7e6 : fffff880`00e65180 fffffa80`06705700 fffff880`00e70f40 fffffa80`066fd040 : nt!PspSystemThreadStartup+0x59
fffff880`02f32c60 00000000`00000000 : fffff880`02f33000 fffff880`02f2d000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16

6: kd> .trap fffff880`02f326e0
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=00000000021d8925 rbx=0000000000000000 rcx=8000000000000000
rdx=0000098000000000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff803fd7133d4 rsp=fffff88002f32870 rbp=000000000000021e
r8=0000000fffffffff r9=fffff803fd9db700 r10=0000058000000000
r11=000007f8ca99f000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei ng nz na po cy
nt!MiAgeWorkingSet+0x264:
fffff803`fd7133d4 498b5510 mov rdx,qword ptr [r13+10h] ds:00000000`00000010=????????????????

^^ On the instruction we failed on, address fffff803`fd7133d4 deferenced r13+10h where r13 is 0000000000000000. All of this would result in a memory write to the address 00000000`00000010. Let's go ahead and run !pte on 00000000`00000010 to see whether or not it's a valid address.

6: kd> !pte 00000000`00000010
                                           VA 0000000000000010
PXE at FFFFF6FB7DBED000    PPE at FFFFF6FB7DA00000    PDE at FFFFF6FB40000000    PTE at FFFFF68000000000
contains 19000001DD904867 contains 014000020228E867 contains 01500001D070F867 contains 0000000000000000
pfn 1dd904    ---DA--UWEV pfn 20228e    ---DA--UWEV pfn 1d070f    ---DA--UWEV not valid

^^ It's invalid. We can go one step further and run dd which dumps the physical address.

6: kd> dd 00000000`00000010
00000000`00000010 ???????? ???????? ???????? ????????
00000000`00000020 ???????? ???????? ???????? ????????
00000000`00000030 ???????? ???????? ???????? ????????
00000000`00000040 ???????? ???????? ???????? ????????
00000000`00000050 ???????? ???????? ???????? ????????
00000000`00000060 ???????? ???????? ???????? ????????
00000000`00000070 ???????? ???????? ???????? ????????
00000000`00000080 ???????? ???????? ???????? ????????

^^ Again, completely invalid.

---------------------------

Right, so the code wanted to write to 00000000`00000010 which as we can see above is a completely invalid and/or bogus address. The 1st parameter and cr2 however note we failed writing to address fffffa806589b700. This does not make sense, and is essentially not logically possible.

MiAgeWorkingSet told the hardware to write to 00000000`00000010 (which again by the way is a completely invalid address), and the hardware came back and said 'I cannot write to fffffa806589b700'. I like ntdebug's analogy on this, which can be read (here). The way I like to explain it in this specific scenario is if you kindly asked the waiter of your table for some delicious hot lava water (doesn't exist, of course! :']), he writes it down, but comes back and says 'I'm sorry, but we're all out of coffee'.

Ultimately, the hardware was told to write to a completely invalid address, and then came back with an entirely different invalid address. Seems very fishy on hardware, doesn't it?

We can also very likely confirm that this is a hardware issue not just by the analysis alone, but this specific crash dump was verifier enabled, yet failed to find a 3rd party driver being the culprit (because there isn't one):

Verify Level 2092b ... enabled options are:
    Special pool
    Special irql
    All pool allocations checked on unload
    Deadlock detection enabled
    Security checks enabled
    Miscellaneous checks enabled

Summary of All Verifier Statistics

RaiseIrqls                             0x0
AcquireSpinLocks                       0xfbf4b4
Synch Executions                       0x124b88
Trims                                  0xa77

Pool Allocations Attempted             0x28395e
Pool Allocations Succeeded             0x28395e
Pool Allocations Succeeded SpecialPool 0x28395e
Pool Allocations With NO TAG           0x26
Pool Allocations Failed                0x0
Resource Allocations Failed Deliberately   0x0

Current paged pool allocations         0xa8 for 00100740 bytes
Peak paged pool allocations            0xbc for 002387F0 bytes
Current nonpaged pool allocations      0x6eb9 for 0099F300 bytes
Peak nonpaged pool allocations         0x6f4f for 009B0B08 bytes

---------------------------

Ultimately, in this scenario, I asked the user to run Memtest (as RAM is as always the most likely culprit in a situation like this), and sure enough in ~10 hrs time, there were 7 errors.

Thanks for reading!