Saturday, August 9, 2014

MEMORY_CORRUPTION_STRIDE

You know when you have something you really want to write a blog post about, but you don't have a crash dump for it? Debugger problems. Fortunately enough for me, I searched Google for a live crash dump link and found one. Happy days! Thanks to this person from four or so years ago for their crash dump :~)

Let's take a look at our basic bug check information in this case. This time, let's use code boxes. I never use code boxes on my blog, but now it's time!

 SYSTEM_SERVICE_EXCEPTION (3b)  
 An exception happened while executing a system service routine.  
 Arguments:  
 Arg1: 00000000c0000005, Exception code that caused the bugcheck  
 Arg2: fffff80002cc272d, Address of the instruction which caused the bugcheck  
 Arg3: fffff8800a555070, Address of the context record for the exception that caused the bugcheck  
 Arg4: 0000000000000000, zero.  

As with most 0x3B's, our exception was specifically an access violation (Arg1 is 0xC0000005).
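If you want to sanity check the exception code in Arg1 without a debugger handy, here is a minimal sketch. The lookup table is just a hand-picked handful of well-known NTSTATUS values, and `describe_exception_code` is a name I made up for illustration; it is nowhere near a complete list.

```python
# Hand-picked subset of well-known NTSTATUS values (not a complete table).
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC0000094: "STATUS_INTEGER_DIVIDE_BY_ZERO",
    0x80000003: "STATUS_BREAKPOINT",
}


def describe_exception_code(arg1: int) -> str:
    """Map bug check Arg1 (shown as a 64-bit value) to an NTSTATUS name."""
    code = arg1 & 0xFFFFFFFF  # the NTSTATUS lives in the low 32 bits
    return KNOWN_NTSTATUS.get(code, f"unknown NTSTATUS 0x{code:08X}")


# Arg1 from the bug check output above.
print(describe_exception_code(0x00000000C0000005))
```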

 2: kd> ln fffff80002cc272d  
 (fffff800`02cc2590)  nt!KiDpcInterrupt+0x19d  | (fffff800`02cc2780)  nt!KiDpcInterruptBypass  

The violation in this case specifically occurred in nt!KiDpcInterrupt+0x19d.

 2: kd> .cxr 0xfffff8800a555070;r  
 rax=0000000000000001 rbx=fffffa8006b24b60 rcx=0000000000000000  
 rdx=000001af00000000 rsi=0000000000000000 rdi=0000000000000003  
 rip=fffff80002cc272d rsp=fffff8800a555a50 rbp=fffff8800a555ad0  
  r8=0000000000000000 r9=0000000000000001 r10=0000000000000000  
 r11=0000000000000064 r12=0000000000000000 r13=0000000000000000  
 r14=0000000000000064 r15=000007ff00042020  
 iopl=0     nv up di pl zr na po nc  
 cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b       efl=00010046  
 nt!KiDpcInterrupt+0x19d:  
 fffff800`02cc272d 0fae1f     stmxcsr dword ptr [rdi] ds:002b:00000000`00000003=????????  

On the instruction we faulted on, stmxcsr tried to store the contents of the MXCSR register to the memory address held in rdi (0000000000000003). 00000000`00000003 is completely invalid (the ???????? at the end of the line shows the debugger couldn't read it either), so therein lies our problem.
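To see why the bytes 0f ae 1f come out as a store through rdi, here is a rough sketch that pulls apart the ModRM byte by hand. `decode_mxcsr_insn` is a hypothetical helper of mine, not anything WinDbg exposes, and it only covers the two MXCSR instructions (0F AE /2 and /3) with a plain [reg] or [reg+disp8] operand, assuming no REX prefix and no SIB byte.

```python
# Hypothetical helper: decodes only 0F AE /2 (ldmxcsr) and /3 (stmxcsr)
# with a plain [reg] or [reg+disp8] operand. No REX prefix, no SIB byte.
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
MNEMONIC = {2: "ldmxcsr", 3: "stmxcsr"}


def decode_mxcsr_insn(code: bytes) -> str:
    if len(code) < 3 or code[:2] != b"\x0f\xae":
        raise ValueError("expected a 0F AE opcode followed by a ModRM byte")
    modrm = code[2]
    mod = modrm >> 6        # top two bits: addressing mode
    reg = (modrm >> 3) & 7  # middle three bits: opcode extension (/digit)
    rm = modrm & 7          # low three bits: base register
    if reg not in MNEMONIC:
        raise ValueError(f"0F AE /{reg} is not covered by this sketch")
    name = MNEMONIC[reg]
    if mod == 0 and rm not in (4, 5):
        # mod=00: memory operand, no displacement
        return f"{name} dword ptr [{REG64[rm]}]"
    if mod == 1 and rm != 4 and len(code) >= 4:
        # mod=01: memory operand with a signed 8-bit displacement
        disp = int.from_bytes(code[3:4], "little", signed=True)
        sign = "-" if disp < 0 else "+"
        return f"{name} dword ptr [{REG64[rm]}{sign}{abs(disp):x}h]"
    raise NotImplementedError("addressing form not covered by this sketch")


# The three bytes at fffff800`02cc272d from the .cxr output above.
print(decode_mxcsr_insn(bytes.fromhex("0fae1f")))  # stmxcsr dword ptr [rdi]
```

The takeaway is that the third byte alone picks both the operation (store vs. load) and the register used as the address, so a single wrong byte there changes what the instruction does entirely.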

So, why are we hitting a pagefault within a DPC interrupt? Good question! Let's run the following:

 !chkimg -lo 50 -db -v !nt  

!chkimg compares an image with its original copy. More specifically, it does this by comparing the image of an executable file in memory to the copy of the file that resides on a symbol store.
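Conceptually, the core of that comparison is nothing more exotic than a byte-by-byte diff. Here is a toy sketch of the idea. To be clear, this is not how !chkimg is implemented (the real thing also applies relocation fixups and ignores certain known patches), `find_mismatches` is a made-up name, and the sample bytes are invented purely for illustration.

```python
def find_mismatches(in_memory: bytes, reference: bytes, base: int = 0):
    """Return the addresses where the in-memory copy differs from the reference."""
    if len(in_memory) != len(reference):
        raise ValueError("both buffers must cover the same range")
    return [
        base + offset
        for offset, (mem_byte, ref_byte) in enumerate(zip(in_memory, reference))
        if mem_byte != ref_byte
    ]


# Invented data: a 16-byte "reference" and a copy with two bytes flipped.
reference = bytes(range(16))
corrupted = bytearray(reference)
corrupted[7] ^= 0xFF
corrupted[15] ^= 0xFF

# 0x1000 is an arbitrary base address chosen for the example.
print([hex(addr) for addr in find_mismatches(bytes(corrupted), reference, base=0x1000)])
```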

The -lo 50 parameter limits the number of output lines to 50. Not too much and not too little.

The -db parameter displays mismatched areas in a format that is similar to the db debugger command. Therefore, each display line shows the address of the first byte in the line, followed by up to 16 hexadecimal byte values. The byte values are immediately followed by the corresponding ASCII values. All nonprintable characters, such as carriage returns and line feeds, are displayed as periods (.). The mismatched bytes are marked by an asterisk (*).

The -v parameter displays extra verbose information.

!nt is the module, which is of course the kernel.

 2: kd> !chkimg -lo 50 -db -v !nt  
 Searching for module with expression: !nt  
 Will apply relocation fixups to file used for comparison  
 Will ignore NOP/LOCK errors  
 Will ignore patched instructions  
 Image specific ignores will be applied  
 Comparison image path: c:\localsymbols\ntkrnlmp.exe\4A5BC6005dd000\ntkrnlmp.exe  
 No range specified  

As I noted above, it's comparing the kernel image from the crash dump to the copy of ntkrnlmp.exe stored in my local symbol cache. If it weren't available locally, it'd grab it from the symbol server.

 Scanning section:  .text  
 Size: 1685025  
 Range to scan: fffff80002c06000-fffff80002da1621  
 Total bytes compared: 1685025(100%)  
 Number of errors: 40  

So we have 40 errors specifically in the .text section of the kernel that was scanned.

 fffff80002cc2680 19 b9 01 00 00 00 44 *44 22 c1 fb e8 80 17 f9 *48 ......DD"......H  
 fffff80002cc2690 fa b9 00 00 00 00 44 *45 22 c1 65 48 8b 0c 25 *34 ......DE".eH..%4  
 fffff80002cc26a0 01 00 00 f7 01 00 00 *25 40 74 25 f6 41 02 02 *85 .......%@t%.A...  
 fffff80002cc26b0 0e e8 8a 68 05 00 65 *8b 8b 0c 25 88 01 00 00 *48 ...h..e...%....H  
 ...  
 fffff80002cc2700 8b 55 d8 4c 8b 4d d0 *ba 8b 45 c8 48 8b 55 c0 *00 .U.L.M...E.H.U..  
 fffff80002cc2710 8b 4d b8 48 8b 45 b0 *07 8b e5 48 8b ad d8 00 *89 .M.H.E....H.....  
 fffff80002cc2720 00 48 81 c4 e8 00 00 *ff 0f 01 f8 48 cf 0f ae *1f .H.........H....  
 fffff80002cc2730 ac 0f 28 45 f0 0f 28 *4c 00 0f 28 55 10 0f 28 *40 ..(E..(L..(U..(@  
 ...  
 fffff80002cc2880 24 10 48 89 74 24 18 *38 89 64 24 20 48 8b f9 *00 $.H.t$.8.d$ H...  
 fffff80002cc2890 8b d1 49 8b f0 4c 8b *15 49 83 e9 11 48 83 ea *01 ..I..L..I...H...  
 fffff80002cc28a0 4c 8b da 48 8b ef bb *8b 00 00 00 49 3b f1 0f *48 L..H.......I;..H  
 fffff80002cc28b0 c1 05 00 00 49 3b fb *48 83 b8 05 00 00 8a 06 *e8 ....I;.H........  
 ...  
 fffff80002cc2900 a8 20 0f 85 e6 03 00 *90 8a 56 06 88 57 05 a8 *44 . .......V..W..D  
 fffff80002cc2910 0f 85 69 04 00 00 8a *41 07 88 57 06 a8 80 0f *ec ..i....A..W.....  
 fffff80002cc2920 db 04 00 00 8a 56 08 *f9 57 07 48 83 c6 09 48 *ba .....V..W.H...H.  
 fffff80002cc2930 c7 08 e9 74 ff ff ff *24 3b fd 0f 87 b8 00 00 *05 ...t...$;.......  
 ...  
 fffff80002cc2980 f3 a4 49 8b f4 48 83 *8b 01 a8 02 0f 85 81 00 *83 ..I..H..........  
 fffff80002cc2990 00 8a 56 02 88 57 01 *49 04 0f 85 40 01 00 00 *f2 ..V..W.I...@....  
 fffff80002cc29a0 56 03 88 57 02 a8 08 *3a 85 f1 01 00 00 8a 56 *48 V..W...:......VH  
 fffff80002cc29b0 88 57 03 a8 10 0f 85 *00 02 00 00 8a 56 05 88 *8b .W..........V...  

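Assuming I am correct (which I hopefully am), the 8th and 16th bytes of each 16-byte line are no good, i.e. one bad byte every 8 bytes (as if something is striding through the data). This is known as a stride corruption pattern. Note that the 1f at fffff80002cc272f, the last byte of our faulting stmxcsr instruction (0f ae 1f), is one of the mismatched bytes.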
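If you'd rather not eyeball the asterisks, the spacing can be checked mechanically. Below is a small sketch that takes the first four lines of the -db output above, collects the addresses of the starred bytes, and checks whether the gap between them is constant. The function names are mine, and it assumes every line carries a full 16 bytes and that the lines fed in are contiguous.

```python
# First contiguous run of mismatched lines, copied from the !chkimg output above.
SAMPLE = [
    'fffff80002cc2680 19 b9 01 00 00 00 44 *44 22 c1 fb e8 80 17 f9 *48 ......DD"......H',
    'fffff80002cc2690 fa b9 00 00 00 00 44 *45 22 c1 65 48 8b 0c 25 *34 ......DE".eH..%4',
    'fffff80002cc26a0 01 00 00 f7 01 00 00 *25 40 74 25 f6 41 02 02 *85 .......%@t%.A...',
    'fffff80002cc26b0 0e e8 8a 68 05 00 65 *8b 8b 0c 25 88 01 00 00 *48 ...h..e...%....H',
]


def mismatch_addresses(lines):
    """Collect the address of every byte that !chkimg marked with an asterisk."""
    addresses = []
    for line in lines:
        tokens = line.split()
        base = int(tokens[0], 16)
        # Tokens 1..16 are the byte values; anything after is the ASCII column.
        for index, token in enumerate(tokens[1:17]):
            if token.startswith("*"):
                addresses.append(base + index)
    return addresses


def detect_stride(addresses):
    """Return the constant gap between mismatches, or None if it varies."""
    gaps = {b - a for a, b in zip(addresses, addresses[1:])}
    return gaps.pop() if len(gaps) == 1 else None


addresses = mismatch_addresses(SAMPLE)
print(len(addresses), detect_stride(addresses))  # 8 8
```

Eight bad bytes in 64, each exactly 8 bytes after the last. Random scribbling from a buggy driver rarely looks that tidy, which is why the pattern gets its own name.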

 MEMORY_CORRUPTOR: STRIDE  

It's a characteristic of address line issues that occur somewhere on the way in or out of RAM. Despite the evidence displayed thus far, we cannot jump to a faulty RAM conclusion, as much as we'd like to. We might assume that whatever selects these lines is faulty, so every 8th byte stored through them comes back invalid. That would mean faulty RAM. However, in debugging we must always be sure to check everything before doing something as drastic as outright replacing the RAM, even though we could argue that running Memtest would be a perfectly reasonable step as well.

A similar memory corruption pattern is misaligned IP (instruction pointer). I'm not going into that in this blog post, but it's another one where you need to be 100% sure you're not looking at a simple buffer overflow as opposed to faulty RAM. Do note that WinDbg is not smarter than we are and assumes that a misaligned IP is a hardware problem.

Enough blabbering, onto what I am trying to get to...

 PROCESS_NAME: MOM.exe  

MOM.exe, what are you doing here? By the way, MOM.exe is AMD/ATI's Catalyst Control Center (CCC) monitoring software. It's not malware, or an actual mother.

</badjoke>

You don't normally see this process involved in a crash, so I did some digging in the modules list to see if any 3rd party software may have caused conflicts.

 2: kd> lmvm rtcore64  
 start       end         module name  
 fffff880`0859b000 fffff880`085a1000  RTCore64  (deferred)         
   Image path: \??\C:\Program Files (x86)\MSI Afterburner\RTCore64.sys  
   Image name: RTCore64.sys  
   Timestamp:    Wed May 25 02:39:12 2005  

Oh my... MSI AB driver from 2005 on an x64 Windows 7 box! The horror.

So, today's lesson summed up: if you're going to actually use MSI Afterburner (the horror), be sure to keep it up to date so you don't upset mother <\badjoke> and make her crash by causing stride corruption.

Thanks for reading!

1 comment:

  1. good post! is there any material for me to learn "misaligned IP"?
