Thursday, February 13, 2014

0x133 (with a little 0xC4 in addition) Debugging

I've written a newer/more informative version of this blog post here - 0x133 (with a little 0xC4 in addition) Debugging - New Version (Update).

--------------------

I've solved of course many *133 bug checks in the past, however, this was the first time I was supplied a kernel-dump for it and was finally knowledgeable enough thanks to reading, experience, and documentation, to debug it 'in-depth', and successfully solved it at the same time.

I have also supplied Harry (x BlueRobot) this kernel-dump as we both wanted to write and learn about 0x133's in-depth, so he has gone ahead and written a tutorial as well for it (which you can see here). Harry goes into pretty nice detail about DPC's, which is something I won't be doing here in my tutorial. I'll instead be focusing on what caused it and how it was solved.

--------------------

I've thankfully done most of the analysis in the thread I solved the crash in, however, I don't want to get lazy and will of course go into detail wherever I can. Let's get started:

DPC_WATCHDOG_VIOLATION (133)
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment. The offending
    component can usually be identified with a stack trace.
Arg2: 0000000000000501, The DPC time count (in ticks).
Arg3: 0000000000000500, The DPC time allotment (in ticks).
Here we have the basic bug check information. First off, the DPC_WATCHDOG_VIOLATION bug check can be triggered in two ways. First, if a single DPC exceeds a specified number of ticks, the system will stop with 0x133 with parameter 1 of the bug check set to 0.  In this case, the system's time limit for single DPC will be in parameter 3, with the number of ticks taken by this DPC in parameter 2.

BugCheck 133, {0, 501, 500, 0}


^^ In this case, the 1st parameter = 0!


0: kd> .bugcheck
Bugcheck code 00000133
Arguments 00000000`00000000 00000000`00000501 00000000`00000500 00000000`00000000



This specific DPC has run for 0x501 ticks, when the limit was 0x500.


In the case of a stop 0x133 with the first parameter set to 0, the call stack should contain the offending driver. Rather than running a kv as we usually do, let's go ahead and run knL:

0: kd> knL
 # Child-SP          RetAddr           Call Site
00 fffff803`a2459408 fffff803`a044bf4b nt!KeBugCheckEx
01 fffff803`a2459410 fffff803`a0310774 nt! ?? ::FNODOBFM::`string'+0x145a4
02 fffff803`a2459490 fffff803`a0228eca nt!KeUpdateTime+0x2ec
03 fffff803`a2459670 fffff803`a02c501e hal!HalpTimerClockInterrupt+0x86
04 fffff803`a24596a0 fffff880`18564dd2 nt!KiInterruptDispatchLBControl+0x1ce
05 fffff803`a2459830 fffff803`a02f51ea usb80236!CancelSendsTimerDpc+0xa6
06 fffff803`a2459870 fffff803`a02f3655 nt!KiProcessExpiredTimerList+0x22a
07 fffff803`a24599a0 fffff803`a02f5668 nt!KiExpireTimerTable+0xa9
08 fffff803`a2459a40 fffff803`a02f4a06 nt!KiTimerExpiration+0xc8
09 fffff803`a2459af0 fffff803`a02f59ba nt!KiRetireDpcList+0x1f6
0a fffff803`a2459c60 00000000`00000000 nt!KiIdleLoop+0x5a
k = Displays the call stack of the given thread.
n = Displays frame numbers.
L (important you capitalize it) = Hides source lines in the display.

This makes a nice, neat, and informative call stack for 0x133 debugging.

As we can see above in the call stack, the offending driver is usb80236.sys, which is the Remote NDIS USB Driver. It calls into KiInterruptDispatchLBControl.

Let’s view the driver’s unassembled DPC routine:

0: kd> ub fffff880`18564dd2
usb80236!CancelSendsTimerDpc+0x81:
fffff880`18564dad 488bcd          mov     rcx,rbp
fffff880`18564db0 c6465001        mov     byte ptr [rsi+50h],1
fffff880`18564db4 ff1556230000    call    qword ptr [usb80236!_imp_KeReleaseSpinLockFromDpcLevel (fffff880`18567110)]
fffff880`18564dba 488bcb          mov     rcx,rbx
fffff880`18564dbd ff15a5220000    call    qword ptr [usb80236!_imp_IoCancelIrp (fffff880`18567068)]
fffff880`18564dc3 488bcd          mov     rcx,rbp
fffff880`18564dc6 ff15e4220000    call    qword ptr [usb80236!_imp_KeAcquireSpinLockAtDpcLevel (fffff880`185670b0)]
fffff880`18564dcc ff8708030000    inc     dword ptr [rdi+308h]
1. The driver started by calling the KeAcquireSpinLockAtDpcLevel routine which acquires a spin lock when the caller is already running at IRQL >= DISPATCH_LEVEL.

The caller should release the spin lock with KeReleaseSpinLockFromDpcLevel as quickly as possible.

2. The driver then called the IoCancelIrp routine which sets the cancel bit in a given IRP and calls the cancel routine for the IRP if there is one.

If the IRP has a cancel routine, IoCancelIrp sets the cancel bit and calls the cancel routine.

3. The driver then called the KeReleaseSpinLockFromDpcLevel routine (as mentioned above) which releases an executive spin lock without changing the IRQL.

Going a bit deeper...

0: kd> u fffff880`18564dd2
usb80236!CancelSendsTimerDpc+0xa6:
fffff880`18564dd2 488b36          mov     rsi,qword ptr [rsi]
fffff880`18564dd5 493bf6          cmp     rsi,r14
fffff880`18564dd8 75a9            jne     usb80236!CancelSendsTimerDpc+0x57 (fffff880`18564d83)
fffff880`18564dda 4c8d87b0020000  lea     r8,[rdi+2B0h]
fffff880`18564de1 488d8f70020000  lea     rcx,[rdi+270h]
fffff880`18564de8 48c7c2001f0afa  mov     rdx,0FFFFFFFFFA0A1F00h
fffff880`18564def ff1553220000    call    qword ptr [usb80236!_imp_KeSetTimer (fffff880`18567048)]
fffff880`18564df5 eb12            jmp     usb80236!CancelSendsTimerDpc+0xdd (fffff880`18564e09)
1. The driver called the CancelSendsTimerDpc routine. I do not know exactly what this routine does, however, it's certainly something in regards to a timer on and/or for a DPC (Deferred Procedure Call). According to Harry, he believes that the driver may use a Custom DPC associated with a Timer object.

2. The driver then calls the KeSetTimer routine which sets the absolute or relative interval at which a timer object is to be set to a signaled state and, optionally, supplies a CustomTimerDpc routine to be executed when that interval expires.

3. The driver then calls the CancelSendsTimerDpc routine again. As far as I know, what should be going on here is the CustomTimerDpc routine should be called, but CancelSendsTimerDpc may be in a loop.

Overall, what seems to be occurring is the DPC may be looping itself by gathering a Spinlock at DPC Level, cancelling the Timer, and then finally releasing the Spinlock again. This is happening over and over again, therefore we have a loop.

--------------------

If we run a !pcr to show us the queued DPCs for the processor:

0: kd> !pcr
KPCR for Processor 0 at fffff803a056a000:
    Major 1 Minor 1
    NtTib.ExceptionList: fffff803a2452000
        NtTib.StackBase: fffff803a2453080
       NtTib.StackLimit: 0000000004ccee48
     NtTib.SubSystemTib: fffff803a056a000
          NtTib.Version: 00000000a056a180
      NtTib.UserPointer: fffff803a056a7f0
          NtTib.SelfTib: 000000007ef44000

                SelfPcr: 0000000000000000
                   Prcb: fffff803a056a180
                   Irql: 0000000000000000
                    IRR: 0000000000000000
                    IDR: 0000000000000000
          InterruptMode: 0000000000000000
                    IDT: 0000000000000000
                    GDT: 0000000000000000
                    TSS: 0000000000000000

          CurrentThread: fffff803a05c4880
             NextThread: fffffa80036e8040
             IdleThread: fffff803a05c4880

              DpcQueue:  0xfffffa80073595e0 0xfffff8800400c960 [Normal] dxgkrnl!DpiFdoDpcForIsr
                         0xfffffa8007be9b68 0xfffff88001efb380 [Normal] ndis!ndisInterruptDpc
                         0xfffff803a04fcfe0 0xfffff803a027c71c [Normal] nt!PpmCheckPeriodicStart
                         0xfffff803a0545d60 0xfffff803a031e45c [Normal] nt!KiBalanceSetManagerDeferredRoutine
1. ndis.sys:

(Network Driver Interface Specification driver) routine call. The Network Driver Interface Specification (NDIS) is an application programming interface (API) for network interface cards (NICs). The NDIS forms the Logical Link Control (LLC) sublayer, which is the upper sublayer of the OSI data link layer (layer 2). Therefore, the NDIS acts as the interface between the Media Access Control (MAC) sublayer, which is the lower sublayer of the data link layer, and the network layer (layer 3).

The NDIS is a library of functions often referred to as a "wrapper" that hides the underlying complexity of the NIC hardware and serves as a standard interface for level 3 network protocol drivers and hardware level MAC drivers. Another common LLC is the Open Data-Link Interface (ODI).

2. dxgkrnl.sys - Direct X Kernel.

--------------------

So, with all of this said, we know that something is causing usb80236.sys to call into a loop, and it may be anything that's working with and/or possibility interfering with Windows' networking, or Direct X. We'll need to do some detective work to determine what is causing this, as it's a system driver and is being faulted by something else. At this point, since we're at quite the wall, I recommend enabling Driver Verifier so we can see what's going on. The user enabled DV, and sure enough, they had an 0xC4 crash! Let's take a look:

DRIVER_VERIFIER_DETECTED_VIOLATION (c4)
A device driver attempting to corrupt the system has been caught.  This is
because the driver was specified in the registry as being suspect (by the
administrator) and the kernel has enabled substantial checking of this driver.
If the driver attempts to corrupt the system, bugchecks 0xC4, 0xC1 and 0xA will
be among the most commonly seen crashes.
Arguments:
Arg1: 0000000000001011, Invariant MDL buffer contents for Read Irp were modified during dispatch or buffer backed by dummy pages.
Arg2: fffffa8006219060, Device object to which the Read IRP was issued.
Arg3: fffff980098a8c60, The address of the IRP.
Arg4: fffff8801a5b3000, System-Space Virtual Address for the buffer that the MDL describes.
Here we have the basic bug check info, with the 2nd/3rd parameter highlighted as they will be useful later on. Let's go ahead and take a look at the call stack first:

0: kd> kv
Child-SP          RetAddr           : Args to Child                                                           : Call Site
fffff880`1af96888 fffff803`07a31fcc : 00000000`000000c4 00000000`00001011 fffffa80`06219060 fffff980`098a8c60 : nt!KeBugCheckEx
fffff880`1af96890 fffff803`07ebfa51 : 00000000`00000000 fffff880`1af96a00 fffff980`098a8c60 fffff803`07cba000 : nt!MdlInvariantPreProcessing1+0x200
fffff880`1af96900 fffff803`07ebdc51 : 00000000`00000000 fffffa80`0dd877d0 fffffa80`06218060 00000000`00000001 : nt!IovpCallDriver1+0x1cd
fffff880`1af96a60 fffff803`07eb4cde : fffff980`098a8c60 00000000`00000002 fffff880`1af96c40 00000000`c0000089 : nt!VfBeforeCallDriver+0x141
fffff880`1af96a90 fffff880`012ccdab : fffff980`098a8c60 fffffa80`06219060 fffffa80`062191b0 fffffa80`0dd877d0 : nt!IovCallDriver+0x35e
fffff880`1af96ae0 fffff880`012ccb7e : fffff980`098a8c60 fffff880`1af96ba0 fffff980`098a8e50 fffffa80`06219060 : intmsd+0x3dab
fffff880`1af96b20 fffff880`012d4402 : 00000000`00000000 fffff880`1af96c40 00000000`00000000 fffff803`07928d44 : intmsd+0x3b7e
fffff880`1af96b60 fffff880`012ccaf3 : fffff980`098a8c60 fffffa80`062191b0 fffffa80`06219060 00000000`00000000 : intmsd+0xb402
fffff880`1af96bf0 fffff803`07eb4d66 : fffff980`098a8c60 00000000`00000002 fffffa80`0bf1b790 fffffa80`06219b10 : intmsd+0x3af3
fffff880`1af96cc0 fffff803`07eb4d66 : fffff980`098a8c60 fffffa80`06219b10 00000000`00000002 fffffa80`0bf1b790 : nt!IovCallDriver+0x3e6
fffff880`1af96d10 fffff880`0127b14e : fffffa80`05356de0 fffffa80`05356c90 00000000`00000002 fffffa80`0d745ba0 : nt!IovCallDriver+0x3e6
fffff880`1af96d60 fffff803`07eb4d66 : 00000000`00000002 fffff980`098a8c60 fffffa80`05356c90 fffffa80`0dd66e00 : volmgr!VmReadWrite+0x13e
fffff880`1af96da0 fffff880`01f63faa : fffff980`098a8c60 00000001`94c3f000 fffff980`098a8c60 fffffa80`0dd66e00 : nt!IovCallDriver+0x3e6
fffff880`1af96df0 fffff880`01f64236 : fffff980`098a8c60 00000001`94c3f000 fffff980`098a8c60 fffff803`07ebdc51 : fvevol!FveReadWrite+0x3e
fffff880`1af96e30 fffff803`07eb4d66 : fffffa80`06227040 00000000`00000002 fffffa80`0d9eee40 fffff880`01c18329 : fvevol!FveFilterRundownReadWrite+0x1b6
fffff880`1af96e80 fffff880`01c01af2 : fffffa80`06228190 fffffa80`06228040 00000000`00000002 fffffa80`0d9eee40 : nt!IovCallDriver+0x3e6
fffff880`1af96ed0 fffff803`07eb4d66 : fffff980`098a8c60 00000000`00000002 00000000`00000030 00000000`00000000 : volsnap!VolSnapReadFilter+0x112
fffff880`1af96f00 fffff880`01817b69 : fffff880`1af79828 fffff880`1af798f0 fffffa80`0c543080 fffffa80`045faf40 : nt!IovCallDriver+0x3e6
fffff880`1af96f50 fffff803`078c8b67 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000001 : Ntfs!NtfsStorageDriverCallout+0x16
fffff880`1af96f80 fffff803`078c8b2d : 00000000`00000000 00000000`00000000 00000000`00000002 fffff803`07923ad8 : nt!KxSwitchKernelStackCallout+0x27 (TrapFrame @ fffff880`1af96e40)
fffff880`1af79690 fffff803`07923ad8 : 00000000`00000006 00000000`00000000 00000000`00000006 00000000`00000000 : nt!KiSwitchKernelStackContinue
fffff880`1af796b0 fffff803`07926405 : fffff880`01817b54 fffff880`1af79828 00000000`00055000 00000000`00000001 : nt!KeExpandKernelStackAndCalloutInternal+0x218
fffff880`1af797b0 fffff880`01817aac : fffff880`1af79c60 fffff980`098a8c60 fffff8a0`01299b90 00000000`0005d000 : nt!KeExpandKernelStackAndCalloutEx+0x25
fffff880`1af797f0 fffff880`0181646a : fffff880`1af798a0 fffff880`1af79c60 fffff8a0`01299b90 fffff880`018770b0 : Ntfs!NtfsMultipleAsync+0xac
fffff880`1af79860 fffff880`01825b26 : 00000000`00000000 fffff880`1af79df0 fffff8a0`01299be8 fffff8a0`01299b90 : Ntfs!NtfsNonCachedIo+0x26a
fffff880`1af79a70 fffff880`0182742b : fffff880`1af79c60 fffff980`098a8c60 fffffa80`0c169701 00000000`00000001 : Ntfs!NtfsCommonRead+0x896
fffff880`1af79c30 fffff803`07eb4d66 : fffff980`098a8c60 fffff980`098a8c60 00000000`00000002 fffffa80`0dd800c0 : Ntfs!NtfsFsdRead+0x1db
fffff880`1af79e70 fffff880`013924ee : fffffa80`0fb303f0 fffff880`1af79f00 fffff980`098a8c60 fffffa80`0dd800c0 : nt!IovCallDriver+0x3e6
fffff880`1af79ec0 fffff880`013900b6 : fffffa80`0538bb00 00000000`00000002 fffff980`098a8c60 fffffa80`0d579e28 : FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x25e
fffff880`1af79f60 fffff803`07eb4d66 : fffff980`098a8c60 00000000`00000002 fffffa80`0d5f47b0 fffffa80`0398e598 : FLTMGR!FltpDispatch+0xb6
fffff880`1af79fc0 fffff803`0794b43e : fffff980`098a8c60 00000000`00000000 fffffa80`0d5f4878 fffffa80`0d579d80 : nt!IovCallDriver+0x3e6
fffff880`1af7a010 fffff803`0794fa61 : fffffa80`0d5f47b0 fffffa80`0d5f47d8 fffffa80`0d5f4818 fffffa80`0d5f4808 : nt!IoPageRead+0x21e
fffff880`1af7a060 fffff803`0794ab20 : 00000000`00000002 fffff880`1af7a0d0 fffffa80`0c543080 fffffa80`0d5f47b0 : nt!MiIssueHardFaultIO+0xc9
fffff880`1af7a0a0 fffff803`07908d8f : fffffa80`0c543080 fffff803`07bc5f40 00000000`c0033333 00000000`00000000 : nt!MiIssueHardFault+0x170
fffff880`1af7a130 fffff803`0792369b : 00000000`00000000 fffffa80`0c543080 8de00001`15efb900 00000000`00000000 : nt!MmAccessFault+0x81f
fffff880`1af7a270 fffff803`07939a37 : 00000000`00000000 00000000`00051000 00000000`00055000 fffff880`1af7a438 : nt!MmCheckCachedPageStates+0x8db
fffff880`1af7a400 fffff803`07938e62 : fffffa80`0c16b010 0000003c`a4737c90 fffff880`1af7a550 fffff880`00000000 : nt!CcMapAndCopyInToCache+0x397
fffff880`1af7a4f0 fffff880`018d4070 : 00000000`000551b8 fffffa80`0538bb00 fffffa80`0c5acbe8 fffffa80`0c1697b0 : nt!CcCopyWriteEx+0x1b2
fffff880`1af7a590 fffff880`01394415 : 00000000`00000000 00000000`000002b0 fffff880`1af7a9b0 00000000`00000000 : Ntfs!NtfsCopyWriteA+0x290
fffff880`1af7a7f0 fffff880`01394b53 : fffff880`1af7a8e0 00000000`00000000 fffffa80`0c169700 fffffa80`0c5aca90 : FLTMGR!FltpPerformFastIoCall+0x155
fffff880`1af7a850 fffff880`013bcabd : 00000000`000000d8 00000000`00000000 00000000`00000000 00000000`00000000 : FLTMGR!FltpPassThroughFastIo+0xc3
fffff880`1af7a8b0 fffff803`07caa249 : fffffa80`0c1697b0 00000000`00000000 00000000`00000000 00000000`00000000 : FLTMGR!FltpFastIoWrite+0x19d
fffff880`1af7a960 fffff803`078cd453 : fffffa80`0c543001 00000000`00000000 00000000`00000000 00000000`00000000 : nt!NtWriteFile+0x5b8
fffff880`1af7aa90 000007f8`a88c2c6a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`1af7ab00)
0000003c`a744fa28 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x000007f8`a88c2c6a
As we can see, we have many different file system related routines (Ntfs, FLTMGR, etc). Why? Well, as we move up the stack, we eventually see three intmsd.sys calls. This is the IntelliMemory Storage Filter Driver from Condusiv Technologies.

Now, let's go ahead and refer to the 2nd/3rd parameters as I've highlighted above!

If we run an !irp on the 3rd parameter:

 [  3, 0]  10 e0 fffffa8006219060 00000000 fffff880012ccbc8-fffffa8006219060 Success Error Cancel
           \Driver\intmsd    intmsd
            Args: 00008000 00000000 1cec3f000 00000000
>[  3, 0]  10 e1 fffffa8006219060 00000000 fffff880012183c0-00000000 Success Error Cancel pending
           \Driver\intmsd    partmgr!PmIoCompletion
If we run a !devobj on the 2nd parameter:

0: kd> !devobj fffffa8006219060
Device object (fffffa8006219060) is for:
 intmsd0 \Driver\intmsd DriverObject fffffa80049241b0
Current Irp 00000000 RefCount 0 Type 00000007 Flags 00000850
Vpb fffffa800534d8f0 Dacl fffff9a10052d360 DevExt fffffa80062191b0 DevObjExt fffffa8006219930 Dope fffffa800534d880
ExtensionFlags (0x80000800)  DOE_DEFAULT_SD_PRESENT, DOE_DESIGNATED_FDO
Characteristics (0x00000100)  FILE_DEVICE_SECURE_OPEN
AttachedDevice (Upper) fffffa8006219b10 \Driver\partmgr
AttachedTo (Lower) fffffa8006218060 \Driver\disk
Device queue is not busy.
That's why we're seeing so many file system and storage related routines being called. After this was found, I recommend disabling and/or preferably uninstalling IntelliMemory. After uninstalling IntelliMemory, the crashes ceased. Why?

First off, IntelliMemory™ is an intelligent data caching technology that provides faster access to frequently used files. IntelliMemory is supposed to improve latency and throughput by reducing disk I/O requests as active files are predicatively cached within the server to preempt round trips between VMs and network storage.

Remember how we saw various network related routines, etc, during the 0x133 debugging? Well, it's because IntelliMemory was the driver that was causing the loop.

...and solved : )



1 comment: