--------------------
I've solved of course many *133 bug checks in the past, however, this was the first time I was supplied a kernel-dump for it and was finally knowledgeable enough thanks to reading, experience, and documentation, to debug it 'in-depth', and successfully solved it at the same time.
I have also supplied Harry (x BlueRobot) this kernel-dump as we both wanted to write and learn about 0x133's in-depth, so he has gone ahead and written a tutorial as well for it (which you can see here). Harry goes into pretty nice detail about DPC's, which is something I won't be doing here in my tutorial. I'll instead be focusing on what caused it and how it was solved.
--------------------
I've thankfully done most of the analysis in the thread I solved the crash in, however, I don't want to get lazy and will of course go into detail wherever I can. Let's get started:
DPC_WATCHDOG_VIOLATION (133)Here we have the basic bug check information. First off, the DPC_WATCHDOG_VIOLATION bug check can be triggered in two ways. First, if a single DPC exceeds a specified number of ticks, the system will stop with 0x133 with parameter 1 of the bug check set to 0. In this case, the system's time limit for single DPC will be in parameter 3, with the number of ticks taken by this DPC in parameter 2.
The DPC watchdog detected a prolonged run time at an IRQL of DISPATCH_LEVEL
or above.
Arguments:
Arg1: 0000000000000000, A single DPC or ISR exceeded its time allotment. The offending
component can usually be identified with a stack trace.
Arg2: 0000000000000501, The DPC time count (in ticks).
Arg3: 0000000000000500, The DPC time allotment (in ticks).
BugCheck 133, {0, 501, 500, 0}
^^ In this case, the 1st parameter = 0!
0: kd> .bugcheck
Bugcheck code 00000133
Arguments 00000000`00000000 00000000`00000501 00000000`00000500 00000000`00000000
This specific DPC has run for 0x501 ticks, when the limit was 0x500.
In the case of a stop 0x133 with the first parameter set to 0, the call stack should contain the offending driver. Rather than running a kv as we usually do, let's go ahead and run knL:
0: kd> knLk = Displays the call stack of the given thread.
# Child-SP RetAddr Call Site
00 fffff803`a2459408 fffff803`a044bf4b nt!KeBugCheckEx
01 fffff803`a2459410 fffff803`a0310774 nt! ?? ::FNODOBFM::`string'+0x145a4
02 fffff803`a2459490 fffff803`a0228eca nt!KeUpdateTime+0x2ec
03 fffff803`a2459670 fffff803`a02c501e hal!HalpTimerClockInterrupt+0x86
04 fffff803`a24596a0 fffff880`18564dd2 nt!KiInterruptDispatchLBControl+0x1ce
05 fffff803`a2459830 fffff803`a02f51ea usb80236!CancelSendsTimerDpc+0xa6
06 fffff803`a2459870 fffff803`a02f3655 nt!KiProcessExpiredTimerList+0x22a
07 fffff803`a24599a0 fffff803`a02f5668 nt!KiExpireTimerTable+0xa9
08 fffff803`a2459a40 fffff803`a02f4a06 nt!KiTimerExpiration+0xc8
09 fffff803`a2459af0 fffff803`a02f59ba nt!KiRetireDpcList+0x1f6
0a fffff803`a2459c60 00000000`00000000 nt!KiIdleLoop+0x5a
n = Displays frame numbers.
L (important you capitalize it) = Hides source lines in the display.
This makes a nice, neat, and informative call stack for 0x133 debugging.
As we can see above in the call stack, the offending driver is usb80236.sys, which is the Remote NDIS USB Driver. It calls into KiInterruptDispatchLBControl.
Let’s view the driver’s unassembled DPC routine:
0: kd> ub fffff880`18564dd21. The driver started by calling the KeAcquireSpinLockAtDpcLevel routine which acquires a spin lock when the caller is already running at IRQL >= DISPATCH_LEVEL.
usb80236!CancelSendsTimerDpc+0x81:
fffff880`18564dad 488bcd mov rcx,rbp
fffff880`18564db0 c6465001 mov byte ptr [rsi+50h],1
fffff880`18564db4 ff1556230000 call qword ptr [usb80236!_imp_KeReleaseSpinLockFromDpcLevel (fffff880`18567110)]
fffff880`18564dba 488bcb mov rcx,rbx
fffff880`18564dbd ff15a5220000 call qword ptr [usb80236!_imp_IoCancelIrp (fffff880`18567068)]
fffff880`18564dc3 488bcd mov rcx,rbp
fffff880`18564dc6 ff15e4220000 call qword ptr [usb80236!_imp_KeAcquireSpinLockAtDpcLevel (fffff880`185670b0)]
fffff880`18564dcc ff8708030000 inc dword ptr [rdi+308h]
The caller should release the spin lock with KeReleaseSpinLockFromDpcLevel as quickly as possible.
2. The driver then called the IoCancelIrp routine which sets the cancel bit in a given IRP and calls the cancel routine for the IRP if there is one.
If the IRP has a cancel routine, IoCancelIrp sets the cancel bit and calls the cancel routine.
3. The driver then called the KeReleaseSpinLockFromDpcLevel routine (as mentioned above) which releases an executive spin lock without changing the IRQL.
Going a bit deeper...
0: kd> u fffff880`18564dd21. The driver called the CancelSendsTimerDpc routine. I do not know exactly what this routine does, however, it's certainly something in regards to a timer on and/or for a DPC (Deferred Procedure Call). According to Harry, he believes that the driver may use a Custom DPC associated with a Timer object.
usb80236!CancelSendsTimerDpc+0xa6:
fffff880`18564dd2 488b36 mov rsi,qword ptr [rsi]
fffff880`18564dd5 493bf6 cmp rsi,r14
fffff880`18564dd8 75a9 jne usb80236!CancelSendsTimerDpc+0x57 (fffff880`18564d83)
fffff880`18564dda 4c8d87b0020000 lea r8,[rdi+2B0h]
fffff880`18564de1 488d8f70020000 lea rcx,[rdi+270h]
fffff880`18564de8 48c7c2001f0afa mov rdx,0FFFFFFFFFA0A1F00h
fffff880`18564def ff1553220000 call qword ptr [usb80236!_imp_KeSetTimer (fffff880`18567048)]
fffff880`18564df5 eb12 jmp usb80236!CancelSendsTimerDpc+0xdd (fffff880`18564e09)
2. The driver then calls the KeSetTimer routine which sets the absolute or relative interval at which a timer object is to be set to a signaled state and, optionally, supplies a CustomTimerDpc routine to be executed when that interval expires.
3. The driver then calls the CancelSendsTimerDpc routine again. As far as I know, what should be going on here is the CustomTimerDpc routine should be called, but CancelSendsTimerDpc may be in a loop.
Overall, what seems to be occurring is the DPC may be looping itself by gathering a Spinlock at DPC Level, cancelling the Timer, and then finally releasing the Spinlock again. This is happening over and over again, therefore we have a loop.
--------------------
If we run a !pcr to show us the queued DPCs for the processor:
0: kd> !pcr1. ndis.sys:
KPCR for Processor 0 at fffff803a056a000:
Major 1 Minor 1
NtTib.ExceptionList: fffff803a2452000
NtTib.StackBase: fffff803a2453080
NtTib.StackLimit: 0000000004ccee48
NtTib.SubSystemTib: fffff803a056a000
NtTib.Version: 00000000a056a180
NtTib.UserPointer: fffff803a056a7f0
NtTib.SelfTib: 000000007ef44000
SelfPcr: 0000000000000000
Prcb: fffff803a056a180
Irql: 0000000000000000
IRR: 0000000000000000
IDR: 0000000000000000
InterruptMode: 0000000000000000
IDT: 0000000000000000
GDT: 0000000000000000
TSS: 0000000000000000
CurrentThread: fffff803a05c4880
NextThread: fffffa80036e8040
IdleThread: fffff803a05c4880
DpcQueue: 0xfffffa80073595e0 0xfffff8800400c960 [Normal] dxgkrnl!DpiFdoDpcForIsr
0xfffffa8007be9b68 0xfffff88001efb380 [Normal] ndis!ndisInterruptDpc
0xfffff803a04fcfe0 0xfffff803a027c71c [Normal] nt!PpmCheckPeriodicStart
0xfffff803a0545d60 0xfffff803a031e45c [Normal] nt!KiBalanceSetManagerDeferredRoutine
(Network Driver Interface Specification driver) routine call. The Network Driver Interface Specification (NDIS) is an application programming interface (API) for network interface cards (NICs). The NDIS forms the Logical Link Control (LLC) sublayer, which is the upper sublayer of the OSI data link layer (layer 2). Therefore, the NDIS acts as the interface between the Media Access Control (MAC) sublayer, which is the lower sublayer of the data link layer, and the network layer (layer 3).
The NDIS is a library of functions often referred to as a "wrapper" that hides the underlying complexity of the NIC hardware and serves as a standard interface for level 3 network protocol drivers and hardware level MAC drivers. Another common LLC is the Open Data-Link Interface (ODI).
2. dxgkrnl.sys - Direct X Kernel.
--------------------
So, with all of this said, we know that something is causing usb80236.sys to call into a loop, and it may be anything that's working with and/or possibility interfering with Windows' networking, or Direct X. We'll need to do some detective work to determine what is causing this, as it's a system driver and is being faulted by something else. At this point, since we're at quite the wall, I recommend enabling Driver Verifier so we can see what's going on. The user enabled DV, and sure enough, they had an 0xC4 crash! Let's take a look:
DRIVER_VERIFIER_DETECTED_VIOLATION (c4)Here we have the basic bug check info, with the 2nd/3rd parameter highlighted as they will be useful later on. Let's go ahead and take a look at the call stack first:
A device driver attempting to corrupt the system has been caught. This is
because the driver was specified in the registry as being suspect (by the
administrator) and the kernel has enabled substantial checking of this driver.
If the driver attempts to corrupt the system, bugchecks 0xC4, 0xC1 and 0xA will
be among the most commonly seen crashes.
Arguments:
Arg1: 0000000000001011, Invariant MDL buffer contents for Read Irp were modified during dispatch or buffer backed by dummy pages.
Arg2: fffffa8006219060, Device object to which the Read IRP was issued.
Arg3: fffff980098a8c60, The address of the IRP.
Arg4: fffff8801a5b3000, System-Space Virtual Address for the buffer that the MDL describes.
0: kd> kvAs we can see, we have many different file system related routines (Ntfs, FLTMGR, etc). Why? Well, as we move up the stack, we eventually see three intmsd.sys calls. This is the IntelliMemory Storage Filter Driver from Condusiv Technologies.
Child-SP RetAddr : Args to Child : Call Site
fffff880`1af96888 fffff803`07a31fcc : 00000000`000000c4 00000000`00001011 fffffa80`06219060 fffff980`098a8c60 : nt!KeBugCheckEx
fffff880`1af96890 fffff803`07ebfa51 : 00000000`00000000 fffff880`1af96a00 fffff980`098a8c60 fffff803`07cba000 : nt!MdlInvariantPreProcessing1+0x200
fffff880`1af96900 fffff803`07ebdc51 : 00000000`00000000 fffffa80`0dd877d0 fffffa80`06218060 00000000`00000001 : nt!IovpCallDriver1+0x1cd
fffff880`1af96a60 fffff803`07eb4cde : fffff980`098a8c60 00000000`00000002 fffff880`1af96c40 00000000`c0000089 : nt!VfBeforeCallDriver+0x141
fffff880`1af96a90 fffff880`012ccdab : fffff980`098a8c60 fffffa80`06219060 fffffa80`062191b0 fffffa80`0dd877d0 : nt!IovCallDriver+0x35e
fffff880`1af96ae0 fffff880`012ccb7e : fffff980`098a8c60 fffff880`1af96ba0 fffff980`098a8e50 fffffa80`06219060 : intmsd+0x3dab
fffff880`1af96b20 fffff880`012d4402 : 00000000`00000000 fffff880`1af96c40 00000000`00000000 fffff803`07928d44 : intmsd+0x3b7e
fffff880`1af96b60 fffff880`012ccaf3 : fffff980`098a8c60 fffffa80`062191b0 fffffa80`06219060 00000000`00000000 : intmsd+0xb402
fffff880`1af96bf0 fffff803`07eb4d66 : fffff980`098a8c60 00000000`00000002 fffffa80`0bf1b790 fffffa80`06219b10 : intmsd+0x3af3
fffff880`1af96cc0 fffff803`07eb4d66 : fffff980`098a8c60 fffffa80`06219b10 00000000`00000002 fffffa80`0bf1b790 : nt!IovCallDriver+0x3e6
fffff880`1af96d10 fffff880`0127b14e : fffffa80`05356de0 fffffa80`05356c90 00000000`00000002 fffffa80`0d745ba0 : nt!IovCallDriver+0x3e6
fffff880`1af96d60 fffff803`07eb4d66 : 00000000`00000002 fffff980`098a8c60 fffffa80`05356c90 fffffa80`0dd66e00 : volmgr!VmReadWrite+0x13e
fffff880`1af96da0 fffff880`01f63faa : fffff980`098a8c60 00000001`94c3f000 fffff980`098a8c60 fffffa80`0dd66e00 : nt!IovCallDriver+0x3e6
fffff880`1af96df0 fffff880`01f64236 : fffff980`098a8c60 00000001`94c3f000 fffff980`098a8c60 fffff803`07ebdc51 : fvevol!FveReadWrite+0x3e
fffff880`1af96e30 fffff803`07eb4d66 : fffffa80`06227040 00000000`00000002 fffffa80`0d9eee40 fffff880`01c18329 : fvevol!FveFilterRundownReadWrite+0x1b6
fffff880`1af96e80 fffff880`01c01af2 : fffffa80`06228190 fffffa80`06228040 00000000`00000002 fffffa80`0d9eee40 : nt!IovCallDriver+0x3e6
fffff880`1af96ed0 fffff803`07eb4d66 : fffff980`098a8c60 00000000`00000002 00000000`00000030 00000000`00000000 : volsnap!VolSnapReadFilter+0x112
fffff880`1af96f00 fffff880`01817b69 : fffff880`1af79828 fffff880`1af798f0 fffffa80`0c543080 fffffa80`045faf40 : nt!IovCallDriver+0x3e6
fffff880`1af96f50 fffff803`078c8b67 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000001 : Ntfs!NtfsStorageDriverCallout+0x16
fffff880`1af96f80 fffff803`078c8b2d : 00000000`00000000 00000000`00000000 00000000`00000002 fffff803`07923ad8 : nt!KxSwitchKernelStackCallout+0x27 (TrapFrame @ fffff880`1af96e40)
fffff880`1af79690 fffff803`07923ad8 : 00000000`00000006 00000000`00000000 00000000`00000006 00000000`00000000 : nt!KiSwitchKernelStackContinue
fffff880`1af796b0 fffff803`07926405 : fffff880`01817b54 fffff880`1af79828 00000000`00055000 00000000`00000001 : nt!KeExpandKernelStackAndCalloutInternal+0x218
fffff880`1af797b0 fffff880`01817aac : fffff880`1af79c60 fffff980`098a8c60 fffff8a0`01299b90 00000000`0005d000 : nt!KeExpandKernelStackAndCalloutEx+0x25
fffff880`1af797f0 fffff880`0181646a : fffff880`1af798a0 fffff880`1af79c60 fffff8a0`01299b90 fffff880`018770b0 : Ntfs!NtfsMultipleAsync+0xac
fffff880`1af79860 fffff880`01825b26 : 00000000`00000000 fffff880`1af79df0 fffff8a0`01299be8 fffff8a0`01299b90 : Ntfs!NtfsNonCachedIo+0x26a
fffff880`1af79a70 fffff880`0182742b : fffff880`1af79c60 fffff980`098a8c60 fffffa80`0c169701 00000000`00000001 : Ntfs!NtfsCommonRead+0x896
fffff880`1af79c30 fffff803`07eb4d66 : fffff980`098a8c60 fffff980`098a8c60 00000000`00000002 fffffa80`0dd800c0 : Ntfs!NtfsFsdRead+0x1db
fffff880`1af79e70 fffff880`013924ee : fffffa80`0fb303f0 fffff880`1af79f00 fffff980`098a8c60 fffffa80`0dd800c0 : nt!IovCallDriver+0x3e6
fffff880`1af79ec0 fffff880`013900b6 : fffffa80`0538bb00 00000000`00000002 fffff980`098a8c60 fffffa80`0d579e28 : FLTMGR!FltpLegacyProcessingAfterPreCallbacksCompleted+0x25e
fffff880`1af79f60 fffff803`07eb4d66 : fffff980`098a8c60 00000000`00000002 fffffa80`0d5f47b0 fffffa80`0398e598 : FLTMGR!FltpDispatch+0xb6
fffff880`1af79fc0 fffff803`0794b43e : fffff980`098a8c60 00000000`00000000 fffffa80`0d5f4878 fffffa80`0d579d80 : nt!IovCallDriver+0x3e6
fffff880`1af7a010 fffff803`0794fa61 : fffffa80`0d5f47b0 fffffa80`0d5f47d8 fffffa80`0d5f4818 fffffa80`0d5f4808 : nt!IoPageRead+0x21e
fffff880`1af7a060 fffff803`0794ab20 : 00000000`00000002 fffff880`1af7a0d0 fffffa80`0c543080 fffffa80`0d5f47b0 : nt!MiIssueHardFaultIO+0xc9
fffff880`1af7a0a0 fffff803`07908d8f : fffffa80`0c543080 fffff803`07bc5f40 00000000`c0033333 00000000`00000000 : nt!MiIssueHardFault+0x170
fffff880`1af7a130 fffff803`0792369b : 00000000`00000000 fffffa80`0c543080 8de00001`15efb900 00000000`00000000 : nt!MmAccessFault+0x81f
fffff880`1af7a270 fffff803`07939a37 : 00000000`00000000 00000000`00051000 00000000`00055000 fffff880`1af7a438 : nt!MmCheckCachedPageStates+0x8db
fffff880`1af7a400 fffff803`07938e62 : fffffa80`0c16b010 0000003c`a4737c90 fffff880`1af7a550 fffff880`00000000 : nt!CcMapAndCopyInToCache+0x397
fffff880`1af7a4f0 fffff880`018d4070 : 00000000`000551b8 fffffa80`0538bb00 fffffa80`0c5acbe8 fffffa80`0c1697b0 : nt!CcCopyWriteEx+0x1b2
fffff880`1af7a590 fffff880`01394415 : 00000000`00000000 00000000`000002b0 fffff880`1af7a9b0 00000000`00000000 : Ntfs!NtfsCopyWriteA+0x290
fffff880`1af7a7f0 fffff880`01394b53 : fffff880`1af7a8e0 00000000`00000000 fffffa80`0c169700 fffffa80`0c5aca90 : FLTMGR!FltpPerformFastIoCall+0x155
fffff880`1af7a850 fffff880`013bcabd : 00000000`000000d8 00000000`00000000 00000000`00000000 00000000`00000000 : FLTMGR!FltpPassThroughFastIo+0xc3
fffff880`1af7a8b0 fffff803`07caa249 : fffffa80`0c1697b0 00000000`00000000 00000000`00000000 00000000`00000000 : FLTMGR!FltpFastIoWrite+0x19d
fffff880`1af7a960 fffff803`078cd453 : fffffa80`0c543001 00000000`00000000 00000000`00000000 00000000`00000000 : nt!NtWriteFile+0x5b8
fffff880`1af7aa90 000007f8`a88c2c6a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`1af7ab00)
0000003c`a744fa28 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x000007f8`a88c2c6a
Now, let's go ahead and refer to the 2nd/3rd parameters as I've highlighted above!
If we run an !irp on the 3rd parameter:
[ 3, 0] 10 e0 fffffa8006219060 00000000 fffff880012ccbc8-fffffa8006219060 Success Error CancelIf we run a !devobj on the 2nd parameter:
\Driver\intmsd intmsd
Args: 00008000 00000000 1cec3f000 00000000
>[ 3, 0] 10 e1 fffffa8006219060 00000000 fffff880012183c0-00000000 Success Error Cancel pending
\Driver\intmsd partmgr!PmIoCompletion
0: kd> !devobj fffffa8006219060That's why we're seeing so many file system and storage related routines being called. After this was found, I recommend disabling and/or preferably uninstalling IntelliMemory. After uninstalling IntelliMemory, the crashes ceased. Why?
Device object (fffffa8006219060) is for:
intmsd0 \Driver\intmsd DriverObject fffffa80049241b0
Current Irp 00000000 RefCount 0 Type 00000007 Flags 00000850
Vpb fffffa800534d8f0 Dacl fffff9a10052d360 DevExt fffffa80062191b0 DevObjExt fffffa8006219930 Dope fffffa800534d880
ExtensionFlags (0x80000800) DOE_DEFAULT_SD_PRESENT, DOE_DESIGNATED_FDO
Characteristics (0x00000100) FILE_DEVICE_SECURE_OPEN
AttachedDevice (Upper) fffffa8006219b10 \Driver\partmgr
AttachedTo (Lower) fffffa8006218060 \Driver\disk
Device queue is not busy.
First off, IntelliMemory™ is an intelligent data caching technology that provides faster access to frequently used files. IntelliMemory is supposed to improve latency and throughput by reducing disk I/O requests as active files are predicatively cached within the server to preempt round trips between VMs and network storage.
Remember how we saw various network related routines, etc, during the 0x133 debugging? Well, it's because IntelliMemory was the driver that was causing the loop.
...and solved : )
Very Informative blog.
ReplyDeleteReverseEngineering