Wednesday, August 13, 2014

Double Fault

I was recently sent a pretty neat kernel-dump by my good friend Jared. I've always wanted to go into double faults, so let's get started! Thanks, Jared : )

 UNEXPECTED_KERNEL_MODE_TRAP (7f)  
 This means a trap occurred in kernel mode, and it's a trap of a kind  
 that the kernel isn't allowed to have/catch (bound trap) or that  
 is always instant death (double fault).  
 Arguments:  
 Arg1: 0000000000000008, EXCEPTION_DOUBLE_FAULT  
 Arg2: 0000000080050033  
 Arg3: 00000000000406f8  
 Arg4: fffff800032aa875  

In our case, the 1st argument is 8, which indicates that a double fault occurred. So, what is a double fault, and when/why does one occur?
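As a quick illustration, Arg1 in this bug check is the x86 exception vector number, so it can be looked up in a small table. Here is a minimal Python sketch of that lookup. The names `TRAP_NAMES` and `describe_trap` are made up for this post, and only a handful of well-known vectors are listed.

```python
# Hypothetical helper: map Arg1 of bug check 0x7F to the x86 exception it names.
# Only a few well-known vectors are listed; this is not an exhaustive table.
TRAP_NAMES = {
    0x0: "Divide Error (#DE)",
    0x4: "Overflow (#OF)",
    0x5: "BOUND Range Exceeded (#BR)",
    0x6: "Invalid Opcode (#UD)",
    0x8: "Double Fault (#DF)",
    0xD: "General Protection (#GP)",
    0xE: "Page Fault (#PF)",
}


def describe_trap(arg1):
    """Take Arg1 as printed by !analyze (a hex string) and return a readable name."""
    vector = int(arg1, 16)
    return TRAP_NAMES.get(vector, f"vector {vector:#x} (not in this table)")


print(describe_trap("0000000000000008"))
```

Feeding it the Arg1 value from the dump above gives the same answer !analyze printed next to the argument.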

Double faults occur when an exception cannot be handled by its handler, or when a second exception is raised while the CPU is still trying to call the handler for an earlier one. In most cases, two such exceptions are simply handled one after the other. In some cases they cannot be: for example, if a page fault occurs and the page fault handler itself is located in a not-present page, a second page fault is raised while delivering the first, and neither of them can be handled. This is known as a double fault! Double faults can also occur (as in this scenario) when the processor cannot properly service a pending interrupt.
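To make the "handled serially vs. double fault" rule a bit more concrete, here is a rough Python sketch of how the CPU classifies two back-to-back exceptions, based on the classification in the Intel manuals. The function names are my own, and I've only included the classic vector numbers, so treat it as an illustration rather than a complete model.

```python
# Exception classes the CPU uses when deciding whether a second exception,
# raised while delivering the first, turns into a double fault.
# Classic x86 vectors only; newer additions are left out of this sketch.
CONTRIBUTORY = {0, 10, 11, 12, 13}   # #DE, #TS, #NP, #SS, #GP
PAGE_FAULT = {14}                    # #PF


def classify(vector):
    if vector in CONTRIBUTORY:
        return "contributory"
    if vector in PAGE_FAULT:
        return "page fault"
    return "benign"


def second_exception_outcome(first, second):
    """What happens when `second` is raised while delivering `first`."""
    a, b = classify(first), classify(second)
    if a == "contributory" and b == "contributory":
        return "double fault"
    if a == "page fault" and b in ("contributory", "page fault"):
        return "double fault"
    return "handled serially"


print(second_exception_outcome(14, 14))  # page fault while delivering a page fault
print(second_exception_outcome(13, 14))  # page fault while delivering #GP
print(second_exception_outcome(3, 13))   # #GP while delivering a breakpoint
```

The first call matches the not-present handler example above and comes out as a double fault, while the other two combinations are handled one after the other.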

 4: kd> k  
 Child-SP     RetAddr      Call Site  
 fffff880`009b9de8 fffff800`0328b169 nt!KeBugCheckEx  
 fffff880`009b9df0 fffff800`03289632 nt!KiBugCheckDispatch+0x69  
 fffff880`009b9f30 fffff800`032aa875 nt!KiDoubleFaultAbort+0xb2  <- Uh oh, double fault!
 fffff880`03dccfd0 fffff800`032909ba nt!KiIpiSendRequest+0x305  <- As it is a multiprocessor job, processor #4 sent an inter-processor interrupt to interrupt another processor saying "Hey, we need to flush the TLB."
 fffff880`03dcd090 fffff800`032ec198 nt!KeFlushMultipleRangeTb+0x22a  <- Flushing translation lookaside buffer, this is a multiprocessor job.
 fffff880`03dcd160 fffff800`033935ea nt! ?? ::FNODOBFM::`string'+0x204ce  
 fffff880`03dcd350 fffff800`03394be7 nt!MiEmptyWorkingSet+0x24a  <- Removing as many pages as possible from the working set.
 fffff880`03dcd400 fffff800`0372f371 nt!MiTrimAllSystemPagableMemory+0x218  <- Unmapping all pageable system memory.
 fffff880`03dcd460 fffff800`0372f4cf nt!MmVerifierTrimMemory+0xf1  
 fffff880`03dcd490 fffff800`0372fc24 nt!ViKeRaiseIrqlSanityChecks+0xcf  <- As verifier is enabled, it's doing a sanity check. A sanity check is essentially verifier saying "Okay, what IRQL are we at and are we supposed to be here?"
 fffff880`03dcd4d0 fffff880`018443f5 nt!VerifierKeAcquireSpinLockRaiseToDpc+0x54  <- IRST raising IRQL to DISPATCH_LEVEL (2) and then acquiring a spin lock.
 fffff880`03dcd530 fffff880`018222a2 iaStor+0x253f5  <- Intel Rapid Storage Technology
 fffff880`03dcd560 fffff880`01871489 iaStor+0x32a2  <- Intel Rapid Storage Technology

 4: kd> ub nt!KiIpiSendRequest+0x305  
 nt!KiIpiSendRequest+0x2eb:  
 fffff800`032aa85b 5e       pop   rsi  
 fffff800`032aa85c 5d       pop   rbp  
 fffff800`032aa85d c3       ret  
 fffff800`032aa85e 8bc6      mov   eax,esi  
 fffff800`032aa860 e9e2feffff   jmp   nt!KiIpiSendRequest+0x1d7 (fffff800`032aa747)  
 fffff800`032aa865 0fb70db4892100 movzx  ecx,word ptr [nt!KeActiveProcessors (fffff800`034c3220)]  
 fffff800`032aa86c 0fb705af892100 movzx  eax,word ptr [nt!KeActiveProcessors+0x2 (fffff800`034c3222)]  
 fffff800`032aa873 8bfa      mov   edi,edx  

By unassembling backwards from nt!KiIpiSendRequest+0x305, it looks like there's a check for active processors, followed by the attempt to send the IPI.

 4: kd> !ipi  
 IPI State for Processor 0  
   TargetCount     0 PacketBarrier    0 IpiFrozen   2 [Frozen]  
 IPI State for Processor 1  
   TargetCount     0 PacketBarrier    0 IpiFrozen   2 [Frozen]  
 IPI State for Processor 2  
   TargetCount     0 PacketBarrier    0 IpiFrozen   2 [Frozen]  
 IPI State for Processor 3  
   TargetCount     0 PacketBarrier    0 IpiFrozen   2 [Frozen]  
 IPI State for Processor 4  
   TargetCount     0 PacketBarrier    0 IpiFrozen   0 [Running]  
 IPI State for Processor 5  
   TargetCount     0 PacketBarrier    0 IpiFrozen   2 [Frozen]  
 IPI State for Processor 6  
   TargetCount     0 PacketBarrier    0 IpiFrozen   2 [Frozen]  
 IPI State for Processor 7  
   TargetCount     0 PacketBarrier    0 IpiFrozen   2 [Frozen]  

By running !ipi we can check the inter-processor interrupt state for every processor on the box. We can see here that every single processor except #4 is in a frozen state (idle), so our IPI is never going to be serviced. It will remain pending, and we're going to double fault.
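On a box with a lot of processors, eyeballing that output gets tedious. Here's a small Python sketch that pulls the state of each processor out of pasted !ipi text. The `ipi_states` function and the shortened `SAMPLE` text are my own, and the regular expressions assume the output looks exactly like the listing above.

```python
import re

# Sample text in the same shape as the !ipi output above (shortened to three CPUs).
SAMPLE = """\
IPI State for Processor 0
  TargetCount     0 PacketBarrier    0 IpiFrozen   2 [Frozen]
IPI State for Processor 4
  TargetCount     0 PacketBarrier    0 IpiFrozen   0 [Running]
IPI State for Processor 5
  TargetCount     0 PacketBarrier    0 IpiFrozen   2 [Frozen]
"""


def ipi_states(text):
    """Return {processor number: state name} parsed from pasted !ipi output."""
    states = {}
    cpu = None
    for line in text.splitlines():
        header = re.search(r"IPI State for Processor (\d+)", line)
        if header:
            cpu = int(header.group(1))
            continue
        state = re.search(r"IpiFrozen\s+\d+\s+\[(\w+)\]", line)
        if state and cpu is not None:
            states[cpu] = state.group(1)
    return states


states = ipi_states(SAMPLE)
running = [cpu for cpu, s in states.items() if s != "Frozen"]
print(states)
print("not frozen:", running)
```

With the sample text, the only processor reported as not frozen is #4, which matches what we read off by hand.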

 4: kd> lmvm iaStor  
 start       end         module name  
 fffff880`0181f000 fffff880`01bc3000  iaStor   (no symbols)        
   Loaded symbol image file: iaStor.sys  
   Image path: \SystemRoot\system32\DRIVERS\iaStor.sys  
   Image name: iaStor.sys  
   Timestamp:    Wed Feb 01 19:15:24 2012  

The IRST driver dates from early 2012, which is likely the problem, since it is a notoriously problematic driver and it gets worse as it gets older. A newer version would likely solve it, but honestly, I usually recommend that a user safely removes this driver and replaces it with the standard MSFT driver if they aren't running a RAID setup. Kaspersky was also present on this system, and antivirus suites don't tend to play nice with this software either.

This post also shows how helpful Driver Verifier is. Without it in this specific scenario, we likely would have had no idea what was causing this, and might have interpreted it as a hardware problem.

Thanks for reading!
