CLOCK_WATCHDOG_TIMEOUT (101)
This indicates that an expected clock interrupt on a secondary processor, in a multi-processor system, was not received within the allocated interval.
So there's the basic definition of this particular bug check. Let's get into the debugging now.
--------------------
BugCheck 101, {19, 0, fffff880009b2180, 4}
^^ The first parameter is the timeout interval in clock ticks. WinDbg prints bug check parameters in hex, so 19 here is 0x19 (25 decimal).
fffff880009b2180 is the PRCB address of the hung processor. Let's keep this address in mind.
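To keep the four parameters straight, here is a minimal Python sketch that splits the BugCheck line into named fields. The names `ClockWatchdogArgs` and `parse_bugcheck_101` are made up for illustration, and the sketch assumes the debugger prints every parameter in hexadecimal.

```python
from typing import NamedTuple


class ClockWatchdogArgs(NamedTuple):
    timeout_ticks: int   # parameter 1: timeout interval, in clock ticks
    reserved: int        # parameter 2: 0 in this dump
    prcb_address: int    # parameter 3: PRCB of the unresponsive processor
    hung_processor: int  # parameter 4: index of the hung processor


def parse_bugcheck_101(line: str) -> ClockWatchdogArgs:
    """Parse a 'BugCheck 101, {a, b, c, d}' line (values assumed to be hex)."""
    code, _, rest = line.partition(",")
    if code.split()[-1].lower() != "101":
        raise ValueError("not a CLOCK_WATCHDOG_TIMEOUT bug check line")
    fields = rest.strip().strip("{}").split(",")
    values = [int(field.strip(), 16) for field in fields]
    if len(values) != 4:
        raise ValueError("expected exactly four bug check parameters")
    return ClockWatchdogArgs(*values)


args = parse_bugcheck_101("BugCheck 101, {19, 0, fffff880009b2180, 4}")
print(f"timeout: {args.timeout_ticks} ticks (0x{args.timeout_ticks:x})")
print(f"PRCB: {args.prcb_address:016x}")
print(f"hung processor: {args.hung_processor}")
```

Reading the values as hex is what turns the "19" in the dump into 25 ticks.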
0: kd> !prcb 4
PRCB for Processor 4 at fffff880009b2180:
Current IRQL -- 0
Threads-- Current fffffa800d851060 Next fffffa800caa6680 Idle fffff880009bd0c0
Processor Index 4 Number (0, 4) GroupSetMember 10
Interrupt Count -- 001bd6a1
Times -- Dpc 00000018 Interrupt 00000048
Kernel 0000f52d User 00003d36
For reference, I did not run !prcb 0 through 4 to find it; that would have been very tedious. Instead, you can use !running -it. The "i" argument causes it to display idle processors too, and "t" displays the stack trace for the thread running on each processor. If we run that extension, it shows this is an 8-core box.
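As a side note on the !prcb output, the GroupSetMember value looks like a one-bit affinity mask for the processor within its group. Assuming it is printed in hex (my assumption, not something the output states), a small Python sketch can turn the mask back into processor indexes; `processors_in_mask` is a made-up helper name.

```python
from typing import List


def processors_in_mask(mask_hex: str) -> List[int]:
    """Return the bit positions set in a hexadecimal affinity mask."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# GroupSetMember 10 from the !prcb output: 0x10 has only bit 4 set.
print(processors_in_mask("10"))
```

Under that assumption, 0x10 maps to processor 4, which agrees with the Processor Index field.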
Hint: At times, the 4th parameter of the bug check will show you the responsible processor. For example, in this *101, it was correct, as the 4th parameter was 4.
Hint #2: You can also generally tell the number of cores on the box by checking the bug check string: BUGCHECK_STR: CLOCK_WATCHDOG_TIMEOUT_8_PROC
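If you triage many dumps, the processor count can be pulled out of that string mechanically. The sketch below is a hypothetical helper (`processor_count` is my own name) and it assumes the count is printed in hexadecimal like other debugger values; for this 8-processor dump, hex and decimal agree.

```python
import re
from typing import Optional

PATTERN = re.compile(r"CLOCK_WATCHDOG_TIMEOUT_([0-9a-fA-F]+)_PROC")


def processor_count(bugcheck_str: str) -> Optional[int]:
    """Extract the processor count from a BUGCHECK_STR value, if present."""
    match = PATTERN.fullmatch(bugcheck_str.strip())
    # Assumption: the count is hexadecimal.
    return int(match.group(1), 16) if match else None


print(processor_count("CLOCK_WATCHDOG_TIMEOUT_8_PROC"))
```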
As the PRCB address for processor 4 matches the 3rd parameter of the bug check, processor #4 is the responsible processor. With the information we have thus far, we know that processor #4 went 0x19 clock ticks without responding, and therefore the system crashed.
Before we go further, what is a clock tick? The clock interrupt is a periodic timer interrupt that the kernel uses to keep time (on this system it comes through the HPET, as hal!HalpHpetClockInterrupt in processor 0's stack shows). Every processor is expected to service its clock ticks, and when one of them fails to do so within the timeout interval, the system bug checks.
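To make the watchdog idea concrete, here is a toy Python model. It is only an illustration of "a processor misses too many consecutive ticks", not how the kernel actually implements the check; the function name and the data layout are invented for the example.

```python
from typing import Dict, List, Optional


def find_hung_processor(ticks: List[Dict[int, bool]], timeout_ticks: int) -> Optional[int]:
    """Return the first processor that misses `timeout_ticks` ticks in a row."""
    missed: Dict[int, int] = {}
    for tick in ticks:
        for cpu, responded in tick.items():
            # A serviced tick resets the counter; a missed one increments it.
            missed[cpu] = 0 if responded else missed.get(cpu, 0) + 1
            if missed[cpu] >= timeout_ticks:
                return cpu
    return None


# Eight processors; processor 4 stops servicing ticks from tick 5 onward.
history = [{cpu: not (cpu == 4 and tick >= 5) for cpu in range(8)} for tick in range(40)]
print(find_hung_processor(history, timeout_ticks=0x19))
```

With a timeout of 0x19 (25) ticks, the model flags processor 4 once it has missed 25 ticks in a row, which mirrors the first and fourth bug check parameters in this dump.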
--------------------
Let's now look at the stacks of the different processors to see what the threads were involved in:
We can use knL and go through a grueling method of obtaining the trap frame, but we don't like having to put in more work, so let's use kv instead on Processor 0:
0: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffff880`009a9728 fffff800`0322c443 : 00000000`00000101 00000000`00000019 00000000`00000000 fffff880`009b2180 : nt!KeBugCheckEx
fffff880`009a9730 fffff800`032885f7 : 00000000`00000000 fffff800`00000004 00000000`00002711 00000000`00000000 : nt! ?? ::FNODOBFM::`string'+0x4e3e
fffff880`009a97c0 fffff800`037f5895 : fffff800`0381a460 fffff880`009a9970 fffff800`0381a460 00000000`00000000 : nt!KeUpdateSystemTime+0x377
fffff880`009a98c0 fffff800`0327c3f3 : 00000000`8e403992 fffff800`033f8e80 fffff880`03088180 fffffa80`101ca060 : hal!HalpHpetClockInterrupt+0x8d
fffff880`009a98f0 fffff800`032b55a3 : 00000000`00000000 00000000`00000001 00000000`00000000 00000000`00000000 : nt!KiInterruptDispatchNoLock+0x163 (TrapFrame @ fffff880`009a98f0)
fffff880`009a9a80 fffff800`0328de2c : 00000000`00000000 fffff6fc`40004308 00000000`00000000 00000000`00000000 : nt!KxFlushEntireTb+0x93
fffff880`009a9ac0 fffff800`032c76b9 : fffff6fc`40004308 00000000`00000008 fffff800`033f8e80 00000000`00000080 : nt!KeFlushMultipleRangeTb+0x28c
fffff880`009a9b90 fffff800`032c728f : ffffffff`ffffffff 00000000`0000007f 00000000`00000000 00000000`00000000 : nt!MiZeroPageChain+0x14e
fffff880`009a9bd0 fffff800`03523166 : fffffa80`0ca90460 00000000`00000080 fffffa80`0ca909e0 fffff800`0325e479 : nt!MmZeroPageThread+0x7da
fffff880`009a9d00 fffff800`0325e486 : fffff800`033f8e80 fffffa80`0ca90460 fffff800`03406c40 15ff0000`0160248c : nt!PspSystemThreadStartup+0x5a
fffff880`009a9d40 00000000`00000000 : fffff880`009aa000 fffff880`009a4000 fffff880`009a9970 00000000`00000000 : nt!KiStartSystemThread+0x16
There it is (TrapFrame @ fffff880`009a98f0)! Let's move forward:
0: kd> .trap fffff880`009a98f0
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000001 rbx=0000000000000000 rcx=00000000000406f8
rdx=00000000000008e1 rsi=0000000000000000 rdi=0000000000000000
rip=fffff800032b55a3 rsp=fffff880009a9a80 rbp=fffff880009a9bb8
r8=0000000000000000 r9=ffffffffffffff7f r10=0000000000000008
r11=fffff880009a9a20 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei ng nz na pe nc
nt!KxFlushEntireTb+0x93:
fffff800`032b55a3 ebe4 jmp nt!KxFlushEntireTb+0x79 (fffff800`032b5589)
0: kd> knL
*** Stack trace for last set context - .thread/.cxr resets it
# Child-SP RetAddr Call Site
00 fffff880`009a9a80 fffff800`0328de2c nt!KxFlushEntireTb+0x93
01 fffff880`009a9ac0 fffff800`032c76b9 nt!KeFlushMultipleRangeTb+0x28c
02 fffff880`009a9b90 fffff800`032c728f nt!MiZeroPageChain+0x14e
03 fffff880`009a9bd0 fffff800`03523166 nt!MmZeroPageThread+0x7da
04 fffff880`009a9d00 fffff800`0325e486 nt!PspSystemThreadStartup+0x5a
05 fffff880`009a9d40 00000000`00000000 nt!KiStartSystemThread+0x16
^^ Here we can find the stored registers and the stack at the time of the interrupt.
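If you do this often, the trap frame address that kv prints can be picked out with a regular expression and turned into the matching .trap command. This is just a convenience sketch in Python; `trap_command` is a made-up helper, and it only looks for the "(TrapFrame @ ...)" annotation seen in the kv output above.

```python
import re
from typing import Optional

TRAP_FRAME = re.compile(r"\(TrapFrame @ ([0-9a-fA-F`]+)\)")


def trap_command(kv_output: str) -> Optional[str]:
    """Build a .trap command from the first TrapFrame annotation in kv output."""
    match = TRAP_FRAME.search(kv_output)
    return f".trap {match.group(1)}" if match else None


sample = "nt!KiInterruptDispatchNoLock+0x163 (TrapFrame @ fffff880`009a98f0)"
print(trap_command(sample))
```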
This is where we're going to do some instruction disassembling:
0: kd> u @rip
nt!KxFlushEntireTb+0x93:
fffff800`032b55a3 ebe4 jmp nt!KxFlushEntireTb+0x79 (fffff800`032b5589)
fffff800`032b55a5 f08305d3c5140001 lock add dword ptr [nt!KiTbFlushTimeStamp (fffff800`03401b80)],1
fffff800`032b55ad 400fb6c6 movzx eax,sil
fffff800`032b55b1 440f22c0 mov cr8,rax
fffff800`032b55b5 488b5c2440 mov rbx,qword ptr [rsp+40h]
fffff800`032b55ba 488b742448 mov rsi,qword ptr [rsp+48h]
fffff800`032b55bf 4883c430 add rsp,30h
fffff800`032b55c3 5f pop rdi
0: kd> u fffff800`032b5589 fffff800`032b55a5
nt!KxFlushEntireTb+0x79:
fffff800`032b5589 8b8780200000 mov eax,dword ptr [rdi+2080h]
fffff800`032b558f 85c0 test eax,eax <-- Checking if value is non-zero.
fffff800`032b5591 7412 je nt!KxFlushEntireTb+0x95 (fffff800`032b55a5) <-- Taken only when the value is zero, which exits the loop; otherwise execution falls through to the pause and the jmp below.
fffff800`032b5593 ffc3 inc ebx
fffff800`032b5595 851d310d2000 test dword ptr [nt!HvlLongSpinCountMask (fffff800`034b62cc)],ebx
fffff800`032b559b 0f84a11e0200 je nt! ?? ::FNODOBFM::`string'+0x7467 (fffff800`032d7442)
fffff800`032b55a1 f390 pause
fffff800`032b55a3 ebe4 jmp nt!KxFlushEntireTb+0x79 (fffff800`032b5589)
fffff800`032b55a5 f08305d3c5140001 lock add dword ptr [nt!KiTbFlushTimeStamp (fffff800`03401b80)],1
^^ Disassembling the first few instructions at rip reveals a jump (jmp) back up into the nt!KxFlushEntireTb function. It appears that at the time of the bug check, the thread was executing a pause (a spin-wait hint to the CPU) in a loop, waiting for the value at [rdi+2080h] to be released (become zero).
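So, what's the summary so far? Processor #0 is the processor that raised the bug check itself. Its thread (the zero page thread) was spinning in nt!KxFlushEntireTb when the clock interrupt arrived, and it was during that clock interrupt (nt!KeUpdateSystemTime) that the CLOCK_WATCHDOG_TIMEOUT bug check was triggered.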
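The spin loop at nt!KxFlushEntireTb+0x79 can be modeled roughly like this in Python. It is my reading of the disassembly above, not the kernel's source: `read_pending` stands in for the load from [rdi+2080h], and `on_long_spin` is a hypothetical callback for the out-of-line branch taken when the spin count hits the HvlLongSpinCountMask test (what that branch really does is not visible in this dump).

```python
from typing import Callable


def spin_until_clear(read_pending: Callable[[], int],
                     long_spin_mask: int,
                     on_long_spin: Callable[[], None]) -> int:
    """Spin until read_pending() returns zero; return the number of spins."""
    spins = 0
    while True:
        if read_pending() == 0:               # mov eax,[rdi+2080h]; test; je +0x95
            return spins                      # leaves the loop (lock add at +0x95)
        spins += 1                            # inc ebx
        if (spins & long_spin_mask) == 0:     # test [HvlLongSpinCountMask],ebx; je
            on_long_spin()                    # hypothetical out-of-line path
        # pause; jmp +0x79 -> go around again


pending = iter([1, 1, 1, 0])
long_spins = []
count = spin_until_clear(lambda: next(pending), 0x1, lambda: long_spins.append(1))
print(count, len(long_spins))
```

In the crash, the value presumably never became zero, so processor #0 kept going around this loop until the clock interrupt fired.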
--------------------
Let's take a look into Processor #1's call stack like we did Processor #0:
Child-SP RetAddr : Args to Child : Call Site
fffff880`02f1bc58 fffff800`0328da3a : 00000000`0035ce39 fffffa80`0dc25bd8 fffff880`009fb0c0 00000000`00000001 : intelppm!MWaitIdle+0x19
fffff880`02f1bc60 fffff800`032886cc : fffff880`009f0180 fffff880`00000000 00000000`00000000 fffff880`00000000 : nt!PoIdle+0x53a
fffff880`02f1bd40 00000000`00000000 : fffff880`02f1c000 fffff880`02f16000 fffff880`02f1bd00 00000000`00000000 : nt!KiIdleLoop+0x2c
1: kd> !irql
Debugger saved IRQL for processor 0x1 -- 0 (LOW_LEVEL)
^^ Either it's running at 0, or the IRQL, despite saying 'saved', didn't really get saved. Windows Internals notes this is a possibility.
1: kd> u @rip
intelppm!MWaitIdle+0x19:
fffff880`06c7ac61 c3 ret
fffff880`06c7ac62 cc int 3
fffff880`06c7ac63 cc int 3
fffff880`06c7ac64 cc int 3
fffff880`06c7ac65 cc int 3
fffff880`06c7ac66 cc int 3
fffff880`06c7ac67 cc int 3
intelppm!SetPerfStateIO:
fffff880`06c7ac68 48895c2408 mov qword ptr [rsp+8],rbx
^^ So it seems that we have the intelppm!MWaitIdle function. I did some research and could not find documentation on it, although intelppm is the Intel processor power management driver, which I believe handles power configuration, power states, etc. Given that the call came from nt!KiIdleLoop and nt!PoIdle, this most likely means that processor #1 was simply idle at the time of the crash.
--------------------
Let's check Processor #2:
2: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffff880`02f8cc58 fffff800`0328da3a : 00000000`0035ce39 fffffa80`0d163908 00000000`00000000 00000000`00000000 : intelppm!MWaitIdle+0x19
fffff880`02f8cc60 fffff800`032886cc : fffff880`02f64180 fffff880`00000000 00000000`00000000 fffff880`00000000 : nt!PoIdle+0x53a
fffff880`02f8cd40 00000000`00000000 : fffff880`02f8d000 fffff880`02f87000 fffff880`02f8cd00 00000000`00000000 : nt!KiIdleLoop+0x2c
^^ Exact same as Processor #1.
--------------------
Let's check Processor #3:
3: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffff880`07bb5210 fffff800`032ab0fb : 00000000`00000002 fffff880`07bb5380 fffff900`c2e82000 00000000`00000001 : nt!MiFlushTbAsNeeded+0x28a
fffff880`07bb5320 fffff800`033afceb : 00000000`00001230 00000000`00001230 00000000`00000021 00000008`00000000 : nt!MiAllocatePagedPoolPages+0x4bb
fffff880`07bb5440 fffff800`03292860 : 00000000`00001230 fffff880`049fccc0 00000000`00000021 00000000`00000000 : nt!MiAllocatePoolPages+0x8e2
fffff880`07bb5590 fffff800`033b2bfe : 00000000`00000000 00000000`02323fff fffffa80`00000020 fffff880`049fccc0 : nt!ExpAllocateBigPool+0xb0
fffff880`07bb5680 fffff960`001928ed : 00000000`00001d01 00000000`00000000 00000000`00000000 fffff960`001a40d1 : nt!ExAllocatePoolWithTag+0x82e
fffff880`07bb5770 fffff960`00193e0f : 00000000`00000001 fffff880`07bb5908 00000000`00000001 fffff960`001a4302 : win32k!AllocateObject+0xdd
fffff880`07bb57b0 fffff960`00169f2f : fffff880`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : win32k!SURFMEM::bCreateDIB+0x1fb
fffff880`07bb58a0 fffff960`00180bc4 : 00000000`01010051 fffff900`c34b3a00 00000000`00000000 00000000`0000002c : win32k!GreCreateDIBitmapReal+0x533
fffff880`07bb59d0 fffff960`00182be6 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : win32k!InternalGetIconInfo+0x174
fffff880`07bb5ac0 fffff800`0327f153 : fffffa80`0ea2d060 00000000`0272f098 fffff880`07bb5b88 00000000`00000020 : win32k!NtUserGetIconInfo+0x182
fffff880`07bb5b70 00000000`778f462a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`07bb5be0)
00000000`0272f078 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x778f462a
3: kd> .trap fffff880`07bb5be0
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=00000000778f462a rsp=000000000272f078 rbp=000000000272f1e0
r8=0000000000000000 r9=0000000000000000 r10=0000000000000000
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl zr na po nc
0033:00000000`778f462a ?? ???
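3: kd> u @rip
00000000`778f462a ?? ???
^ Memory access error in 'u @rip'
^^ We cannot disassemble at rip on processor #3. The trap frame holds a user-mode address (00000000`778f462a), and that memory is most likely not included in this dump. From the stack, it looks like nt!MiFlushTbAsNeeded may be in a loop.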
--------------------
Let's now take a look at the problematic processor (#4):
4: kd> kv
Child-SP RetAddr : Args to Child : Call Site
00000000`00000000 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0
4: kd> r
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=0000000000000000 rsp=0000000000000000 rbp=0000000000000000
r8=0000000000000000 r9=0000000000000000 r10=0000000000000000
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up di pl nz na pe nc
cs=0000 ss=0000 ds=0000 es=0000 fs=0000 gs=0000 efl=00000000
00000000`00000000 ?? ???
^^ We have a zeroed stack and registers, so this will be problematic. Usually this occurs on the problem processor because the IRQL is too high, or because the processor was too hung at the time of the crash to report its information. We will need to get the raw stack.
Let's give this a shot:
4: kd> !pcr
KPCR for Processor 4 at fffff880009b2000:
Major 1 Minor 1
NtTib.ExceptionList: fffff880009bd640
NtTib.StackBase: fffff880009b7040
NtTib.StackLimit: 00000000000ade58
NtTib.SubSystemTib: fffff880009b2000
NtTib.Version: 00000000009b2180
NtTib.UserPointer: fffff880009b27f0
NtTib.SelfTib: 00000000fffdb000
SelfPcr: 0000000000000000
Prcb: fffff880009b2180
Irql: 0000000000000000
IRR: 0000000000000000
IDR: 0000000000000000
InterruptMode: 0000000000000000
IDT: 0000000000000000
GDT: 0000000000000000
TSS: 0000000000000000
CurrentThread: fffffa800d851060
NextThread: fffffa800caa6680
IdleThread: fffff880009bd0c0
4: kd> !thread
THREAD fffffa800d851060 Cid 0bb8.08c4 Teb: 00000000fffdb000 Win32Thread: fffff900c4147c30 RUNNING on processor 4
Not impersonating
DeviceMap fffff8a0019544e0
Owning Process fffffa800d845b30 Image: chrome.exe
Attached Process N/A Image: N/A
Wait Start TickCount 78494 Ticks: 714 (0:00:00:11.138)
Context Switch Count 232083 IdealProcessor: 4 LargeStack
UserTime 00:00:44.990
KernelTime 00:00:09.032
Win32 Start Address 0x0000000000288c9e
Stack Init fffff8800b922d70 Current fffff8800b922a60
Base fffff8800b923000 Limit fffff8800b91a000 Call 0
Priority 9 BasePriority 8 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
Child-SP RetAddr : Args to Child : Call Site
00000000`00000000 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0
^^ We'll be using the Base and Limit addresses from this output to dump the raw stack.
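One way to dump that range with symbols is the dps command, run from Limit (the lowest address) up to Base. The exact command used for this post isn't shown, so treat the following Python sketch as just one option; `raw_stack_command` is a made-up helper that reads the Base/Limit line from !thread output.

```python
import re

BASE_LIMIT = re.compile(r"Base\s+([0-9a-fA-F`]+)\s+Limit\s+([0-9a-fA-F`]+)")


def raw_stack_command(thread_output: str) -> str:
    """Build a 'dps <limit> <base>' command from !thread output."""
    match = BASE_LIMIT.search(thread_output)
    if match is None:
        raise ValueError("no Base/Limit line found in !thread output")
    base, limit = match.group(1), match.group(2)
    # The stack grows downward, so Limit is the low end of the range.
    return f"dps {limit} {base}"


line = "Base fffff8800b923000 Limit fffff8800b91a000 Call 0"
print(raw_stack_command(line))
```

For this thread that works out to dps fffff8800b91a000 fffff8800b923000.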
(For convenience purposes, I cut the stack to the important part because the entire raw stack is really large. Even after cutting it, it's still really large...)
In the raw stack below, we can see quite a few DirectX kernel and DirectX MMS calls (dxgkrnl, dxgmms1), as well as NVIDIA driver calls (nvlddmkm). This is good news, as this may be our problem (it gives us a good start as far as troubleshooting goes). I'd like to note that there was much more than this, and that the raw stack went on for a very long time. I am cutting it to a small sample for blogging purposes.
*** ERROR: Symbol file could not be found. Defaulted to export symbols for nvlddmkm.sys -
fffff880`0b920988 fffff880`0fef3a11 nvlddmkm+0xbda11
fffff880`0b920990 fffffa80`0fab3460
fffff880`0b920998 fffff880`06d92095 dxgmms1!VidSchiUpdateContextRunningTimeAtISR+0x45
fffff880`0b9209a0 00000000`0035ce39
fffff880`0b9209a8 fffffa80`0e47c000
fffff880`0b9209b0 00000000`00000000
fffff880`0b9209b8 00000000`00000005
fffff880`0b9209c0 fffffa80`0fab3460
fffff880`0b9209c8 00000000`0000e323
fffff880`0b9209d0 fffffa80`0e47d1a0
fffff880`0b9209d8 fffff880`06d8e66b dxgmms1!VidSchiProcessIsrCompletedPacket+0x1eb
fffff880`0b9209e0 fffffa80`0e47d1a0
fffff880`0b9209e8 fffffa80`0e44a410
fffff880`0b9209f0 fffffa80`0e47c000
fffff880`0b9209f8 fffffa80`0e47c000
fffff880`0b920a00 00000000`00000000
fffff880`0b920a08 00000000`00000000
fffff880`0b920a10 fffffa80`0e47d1a0
fffff880`0b920a18 00000000`00000000
fffff880`0b920a20 fffffa80`0fab3460
fffff880`0b920a28 00000000`00001661
fffff880`0b920a30 00000000`0000c350
fffff880`0b920a38 00000000`00400120
fffff880`0b920a40 00000000`0000e323
fffff880`0b920a48 00000000`00000001
fffff880`0b920a50 00000000`00000000
fffff880`0b920a58 00000000`00000000
fffff880`0b920a60 fffffa80`0e44a410
fffff880`0b920a68 fffff880`06d8e172 dxgmms1!VidSchDdiNotifyInterruptWorker+0x1ea
fffff880`0b920a70 fffffa80`0ebbd9c0
fffff880`0b920a78 fffff880`06d92095 dxgmms1!VidSchiUpdateContextRunningTimeAtISR+0x45
fffff880`0b920a80 00000000`0035ce39
fffff880`0b920a88 fffffa80`0e47e000
fffff880`0b920a90 00000000`00000000
fffff880`0b920a98 00000000`00000006
fffff880`0b920aa0 fffffa80`0ebbd9c0
fffff880`0b920aa8 00000000`00006ed6
fffff880`0b920ab0 fffffa80`0e47f5b0
fffff880`0b920ab8 fffff880`06d8e66b dxgmms1!VidSchiProcessIsrCompletedPacket+0x1eb
fffff880`0b920ac0 fffffa80`0e47f5b0
fffff880`0b920ac8 fffffa80`0e44a410
fffff880`0b920ad0 fffffa80`0e47e000
fffff880`0b920ad8 fffffa80`0e47e000
fffff880`0b920ae0 00000000`00000000
fffff880`0b920ae8 00000000`00000000
fffff880`0b920af0 fffffa80`0e47f5b0
fffff880`0b920af8 00000000`00000000
fffff880`0b920b00 fffffa80`0ebbd9c0
fffff880`0b920b08 00000000`0000071a
fffff880`0b920b10 00000000`000124f8
fffff880`0b920b18 00000000`00400120
fffff880`0b920b20 00000000`00006ed6
fffff880`0b920b28 00000000`00000001
fffff880`0b920b30 00000000`00000001
fffff880`0b920b38 00000000`00000000
fffff880`0b920b40 fffffa80`0e44a410
fffff880`0b920b48 fffff880`06d8e172 dxgmms1!VidSchDdiNotifyInterruptWorker+0x1ea
fffff880`0b920b50 fffffa80`0e47e000
fffff880`0b920b58 00000000`00000000
fffff880`0b920b60 fffffa80`00000001
fffff880`0b920b68 fffff880`0b920e00
fffff880`0b920b70 fffff880`0b920ba0
fffff880`0b920b78 00000000`00000005
fffff880`0b920b80 00000000`00000000
fffff880`0b920b88 00000000`00000001
fffff880`0b920b90 fffff880`0b920e00
fffff880`0b920b98 fffff880`06d8df76 dxgmms1!VidSchDdiNotifyInterrupt+0x9e
fffff880`0b920ba0 fffffa80`00006ed6
fffff880`0b920ba8 00000000`00000000
fffff880`0b920bb0 fffffa80`0e472000
fffff880`0b920bb8 fffffa80`0ebc2010
fffff880`0b920bc0 fffff880`0b920e00
fffff880`0b920bc8 fffff880`06c9513f dxgkrnl!DxgNotifyInterruptCB+0x83
fffff880`0b920bd0 00000000`00006ed6
fffff880`0b920bd8 00000000`00000000
fffff880`0b920be0 00000000`00000001
fffff880`0b920be8 00000000`00000000
fffff880`0b920bf0 fffff880`0b920c80
fffff880`0b920bf8 fffff880`0fef37c9 nvlddmkm+0xbd7c9
fffff880`0b920c00 fffff880`0b920e00
fffff880`0b920c08 fffffa80`0ebc2010
fffff880`0b920c10 fffffa80`0d906000
fffff880`0b920c18 00000000`00000000
fffff880`0b920c20 fffff880`0fef376f nvlddmkm+0xbd76f
fffff880`0b920c28 fffffa80`0d906000
fffff880`0b920c30 00000000`00000000
fffff880`0b920c38 00000000`00000000
fffff880`0b920c40 00000000`00000000
fffff880`0b920c48 00000000`00000000
fffff880`0b920c50 fffffa80`0d697480
fffff880`0b920c58 fffff880`0b920e00
fffff880`0b920c60 00000000`00006ed6
fffff880`0b920c68 fffffa80`0e246000
fffff880`0b920c70 00000000`00000001
fffff880`0b920c78 00000000`00000000
fffff880`0b920c80 fffff880`0b920d40
fffff880`0b920c88 fffff880`0fef3a11 nvlddmkm+0xbda11
fffff880`0b920c90 fffffa80`0d906000
fffff880`0b920c98 fffffa80`0ebc2010
fffff880`0b920ca0 fffff880`0b920e00
fffff880`0b920ca8 fffffa80`0e246000
fffff880`0b920cb0 fffff880`0fef39aa nvlddmkm+0xbd9aa
fffff880`0b920cb8 fffffa80`0d906000
fffff880`0b920cc0 00000000`00000000
fffff880`0b920cc8 00000000`00000000
fffff880`0b920cd0 00000000`00000000
fffff880`0b920cd8 00000000`00000000
fffff880`0b920ce0 00000000`00006ed6
fffff880`0b920ce8 00000000`00000000
fffff880`0b920cf0 00000000`00000001
fffff880`0b920cf8 00000000`00000000
fffff880`0b920d00 00000000`00000000
fffff880`0b920d08 fffffa80`0d906000
fffff880`0b920d10 00000000`00006ed6
fffff880`0b920d18 fffff880`0fef3924 nvlddmkm+0xbd924
fffff880`0b920d20 fffff880`0fef39aa nvlddmkm+0xbd9aa
fffff880`0b920d28 fffffa80`0e246000
fffff880`0b920d30 fffffa80`0e251ad0
fffff880`0b920d38 fffffa80`0d906000
fffff880`0b920d40 fffff880`0b920e90
fffff880`0b920d48 fffff880`0ff2c17d nvlddmkm+0xf617d
fffff880`0b920d50 00000000`00000000
fffff880`0b920d58 fffffa80`0ebc2010
fffff880`0b920d60 00000000`00000001
fffff880`0b920d68 00000000`00000000
fffff880`0b920d70 fffff880`0ff2be9c nvlddmkm+0xf5e9c
fffff880`0b920d78 fffffa80`0d906000
fffff880`0b920d80 fffffa80`0ebb5010
fffff880`0b920d88 fffffa80`0ebb6010
fffff880`0b920d90 fffffa80`0ebb69d0
fffff880`0b920d98 00000000`00000000
fffff880`0b920da0 00000000`00000001
fffff880`0b920da8 00000000`00000000
fffff880`0b920db0 fffffa80`0fab3460
fffff880`0b920db8 fffff880`06d92095 dxgmms1!VidSchiUpdateContextRunningTimeAtISR+0x45
fffff880`0b920dc0 00000000`0035ce39
fffff880`0b920dc8 fffffa80`0e47c000
fffff880`0b920dd0 00000000`00000000
fffff880`0b920dd8 00000000`00000006
fffff880`0b920de0 fffffa80`0fab3460
fffff880`0b920de8 00000000`0005fea6
fffff880`0b920df0 fffffa80`0e47cf30
fffff880`0b920df8 fffff880`06d8e66b dxgmms1!VidSchiProcessIsrCompletedPacket+0x1eb
fffff880`0b920e00 fffffa80`0fab3460
fffff880`0b920e08 fffff880`06d92095 dxgmms1!VidSchiUpdateContextRunningTimeAtISR+0x45
fffff880`0b920e10 00000000`0035ce39
fffff880`0b920e18 fffffa80`0e47c000
fffff880`0b920e20 00000000`00000000
fffff880`0b920e28 00000000`0000000b
fffff880`0b920e30 fffffa80`0fab3460
fffff880`0b920e38 00000000`0007cd31
fffff880`0b920e40 fffffa80`0e47ccc0
fffff880`0b920e48 fffff880`06d8e66b dxgmms1!VidSchiProcessIsrCompletedPacket+0x1eb
fffff880`0b920e50 fffffa80`0e47ccc0
fffff880`0b920e58 fffffa80`0e44a410
fffff880`0b920e60 fffffa80`0e47c000
fffff880`0b920e68 fffffa80`0e47c000
fffff880`0b920e70 00000000`00000000
fffff880`0b920e78 00000000`00000000
fffff880`0b920e80 fffffa80`0e47ccc0
fffff880`0b920e88 00000000`00000000
fffff880`0b920e90 fffffa80`0fab3460
fffff880`0b920e98 00000000`00000d3f
fffff880`0b920ea0 00000000`00007111
fffff880`0b920ea8 00000000`00400120
fffff880`0b920eb0 00000000`0007cd31
fffff880`0b920eb8 00000000`00000001
fffff880`0b920ec0 00000000`00000000
fffff880`0b920ec8 00000000`00000000
fffff880`0b920ed0 fffffa80`0e44a410
fffff880`0b920ed8 fffff880`06d8e172 dxgmms1!VidSchDdiNotifyInterruptWorker+0x1ea
fffff880`0b920ee0 fffffa80`0e47c000
fffff880`0b920ee8 00000000`00000000
fffff880`0b920ef0 fffffa80`00000001
fffff880`0b920ef8 fffff880`0b921190
fffff880`0b920f00 fffff880`0b920f30
fffff880`0b920f08 00000000`00000005
fffff880`0b920f10 00000000`00000000
fffff880`0b920f18 00000000`00000000
fffff880`0b920f20 fffff880`0b921190
fffff880`0b920f28 fffff880`06d8df76 dxgmms1!VidSchDdiNotifyInterrupt+0x9e
fffff880`0b920f30 fffffa80`0007cd31
fffff880`0b920f38 00000000`00000000
fffff880`0b920f40 fffffa80`0e472000
fffff880`0b920f48 fffffa80`0fbf5010
fffff880`0b920f50 fffff880`0b921190
fffff880`0b920f58 fffff880`06c9513f dxgkrnl!DxgNotifyInterruptCB+0x83
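With a raw stack this long, it helps to count which modules show up rather than eyeballing it. Below is a small Python sketch that tallies the module name on every symbolized line of dps-style output. The helper name `module_hits` is invented, and keep in mind that values left on a raw stack can be stale, so the counts are a triage hint and not proof of who was executing.

```python
import re
from collections import Counter

# address, value, then "module!symbol+off" or "module+off"
SYMBOL = re.compile(r"^\S+\s+\S+\s+([A-Za-z0-9_]+)[!+]")


def module_hits(raw_stack: str) -> Counter:
    """Count how often each module appears on symbolized raw-stack lines."""
    hits = Counter()
    for line in raw_stack.splitlines():
        match = SYMBOL.match(line.strip())
        if match:
            hits[match.group(1)] += 1
    return hits


sample = """\
fffff880`0b920bc8 fffff880`06c9513f dxgkrnl!DxgNotifyInterruptCB+0x83
fffff880`0b920bd0 00000000`00006ed6
fffff880`0b920bf8 fffff880`0fef37c9 nvlddmkm+0xbd7c9
fffff880`0b920c20 fffff880`0fef376f nvlddmkm+0xbd76f
fffff880`0b920db8 fffff880`06d92095 dxgmms1!VidSchiUpdateContextRunningTimeAtISR+0x45
"""
for module, count in module_hits(sample).most_common():
    print(module, count)
```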
--------------------
Let's check Processor #5:
5: kd> kv
Child-SP RetAddr : Args to Child : Call Site
00000000`00000000 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x0
5: kd> u @rip
00000000`00000000 ?? ???
^ Memory access error in 'u @rip'
^^ Looks like this specific processor was too hung at the time of the crash to report any information.
--------------------
Let's check Processor #6:
6: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffff880`03121c58 fffff800`0328da3a : 00000000`0035ce39 fffffa80`0d16d378 fffff880`030f9180 00000000`00000000 : intelppm!MWaitIdle+0x19
fffff880`03121c60 fffff800`032886cc : fffff880`030f9180 fffff880`00000000 00000000`00000000 fffff880`00000000 : nt!PoIdle+0x53a
fffff880`03121d40 00000000`00000000 : fffff880`03122000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiIdleLoop+0x2c
^^ Same as processors #1 and #2.
--------------------
Finally, let's check Processor #7:
7: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffff880`07a19930 fffff800`032c1c4a : 00000000`00000000 00000000`2d6c0fff fffffa80`00000000 fffffa80`00000000 : nt!MiDeleteVirtualAddresses+0x7d8
fffff880`07a19af0 fffff800`0327f153 : ffffffff`ffffffff 00000000`2174e6d0 00000000`2174e6c8 00000000`00008000 : nt!NtFreeVirtualMemory+0x5ca
fffff880`07a19be0 00000000`77a3009a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`07a19be0)
00000000`2174e698 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x77a3009a
7: kd> .trap fffff880`07a19be0
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=0000000000000000
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=0000000077a3009a rsp=000000002174e698 rbp=0000000017a8f904
r8=0000000000000000 r9=0000000000000000 r10=0000000000000000
r11=0000000000000000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei pl nz na po nc
7: kd> u @rip
00000000`77a3009a ?? ???
^ Memory access error in 'u @rip'
^^ We cannot go much further into processor #7 (rip is again a user-mode address), but from the stack it was doing virtual memory work: nt!NtFreeVirtualMemory calling nt!MiDeleteVirtualAddresses.
--------------------
Overall, from the above, this looks like a hardware issue: the video card, RAM, or the CPU itself. It's also possible that a video driver is causing corruption. We shall see.
I'm having the user go through hardware diagnostics, as well as a few other things, so I'll report back with any info when I have it.