Monday, June 23, 2014

How the BSOD actually 'works', why, etc.

So in this blog I've talked many times in depth about postmortem debugging of kernel dumps from blue screen crashes. Well, I decided maybe it's time to go ahead and explain in detail why a blue screen occurs, what actually goes on when a blue screen occurs, etc.

-- Disclaimer: At the time of this post, I have never myself experienced a BSOD on my Windows 8.1 system, so I cannot 100% confirm whether or not the display is shifted to a low-res VGA mode when painting the screen. I may use NotMyFault to test this out and will edit when I get a confirmation. For now though, let's assume nothing has changed and hope I'm correct : )

---------------------------

First off, why does the blue screen of death occur? Well, it's important to know that there are many reasons as to why a blue screen occurs. Just to name a few:

  • References to invalid/inaccessible memory; causes access violations, etc.
  • Unexpected exceptions.
  • Bugs in drivers causing a fault in a kernel-mode driver, 3rd party drivers doing what I first mentioned, etc.
Again, these are only a few of the potential reasons, but they're some of the most prevalent. For those interested, here's a distribution of what most commonly causes bug checks in Windows.


This is a picture I found on Google from a TechNet article, so thanks to the author for this! It's from Windows Internals - 5th Edition. AFAIK there is not one in the 6th, at least I have not seen it throughout my reading or research afterwards. With this said, it's likely not entirely accurate in regards to today, but I imagine it has not changed too much. Given I analyze postmortem kernel-dumps quite a bit, I am surprised to see pool is so low. Again, this was way back in the writing of the 5th edition which was during Vista's legacy, so many things have changed since then. It's up in the air, really!

Now, with that said, we understand a few reasons why Windows stops and a blue screen occurs. Good! Now let's also understand that when one of these things occurs, Windows could theoretically ignore it and keep going. Why doesn't it just do this? Well, it's actually extremely simple: many of these things can cause severe data/memory corruption, which could even lead to hardware problems.

Since we don't want any of that, Windows thankfully has a fail-safe known to us as the Blue Screen of Death (BSOD -- abbreviating from now on). If Windows detects that there is a serious problem that is unrecoverable, it will stop all executions, switch the display to the basic/low-res VGA mode, paint the actual blue screen itself, write memory/crash information to what we know as a memory dump (crash dump/dmp file/dmp), and display a stop code (bug check). All of this is done through a series of functions.

Now that we're on this topic, I must STRESS and dispel the misconception right now that the blue screen itself is a bad thing. It's not! The blue screen is a good thing, and it's making it so our data doesn't become completely corrupt. Remember, the blue screen happens because Windows has detected something has gone horribly wrong, and it cannot recover and/or stop it. When this happens, the appropriate bug check based on what caused the error is called, and the blue screen is painted.

Bottom line... the blue screen is our friend, not our enemy : )

---------------------------

As discussed above, a blue screen happens when Windows detects that there's an unrecoverable/irreversible problem occurring. Regardless of what this actual problem is, the end result is a blue screen. As I mentioned above, this blue screen process actually happens through functions.

Despite the belief that only one function begins the bug check process, that's not true! There are two: KeBugCheck and KeBugCheckEx.

First off, before stating their differences, let's make it easy by saying that both of these functions take what is known as a BugCheckCode parameter. What is a BugCheckCode parameter? Good question! This parameter is otherwise known as a STOP code (for example, 0x0000000A, 0x0000001A, 0x0000009F, etc.). These STOP codes (otherwise called 'bug checks') are what allow us, other than actually debugging the crash dump itself, to troubleshoot the blue screen, because each STOP code has a preset meaning/cause as to why it occurred.

Great, so now that we know that information, what is the difference between KeBugCheckEx and KeBugCheck? Good question! KeBugCheck calls KeBugCheckEx and sets the four parameters to zero.

Example - {0,0,0,0}

Essentially, KeBugCheckEx provides more information because its caller passes four parameters, and each of those has a preset meaning based on the STOP code/bug check.
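To make the relationship between the two a bit more concrete, here's a toy Python model. It is not how the kernel implements them (the real routines are kernel exports written in C), and the names ke_bug_check, ke_bug_check_ex and format_bug_check, plus the example parameter values, are made up for illustration.

```python
# Toy model only: the real KeBugCheck/KeBugCheckEx are kernel routines.
from typing import NamedTuple, Tuple


class BugCheck(NamedTuple):
    code: int
    params: Tuple[int, int, int, int]


def ke_bug_check_ex(code, p1, p2, p3, p4):
    """Model of KeBugCheckEx: a STOP code plus four caller-supplied parameters."""
    return BugCheck(code, (p1, p2, p3, p4))


def ke_bug_check(code):
    """Model of KeBugCheck: forward to the Ex variant with all four parameters zeroed."""
    return ke_bug_check_ex(code, 0, 0, 0, 0)


def format_bug_check(bc):
    """Format the result roughly the way the debugger prints a bug check line."""
    params = ", ".join(f"{p:x}" for p in bc.params)
    return f"BugCheck {bc.code:X}, {{{params}}}"


plain = ke_bug_check(0x0A)
# The four values below are invented purely for the example.
detailed = ke_bug_check_ex(0x0A, 0x10, 0x2, 0x0, 0x82A01234)

print(format_bug_check(plain))
print(format_bug_check(detailed))
```

The first call gives the all-zero parameter list from the example above, while the second carries four values a debugger could actually interpret.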

---------------------------

Once KeBugCheckEx is called, it first goes ahead and disables all interrupts by calling the KiDisableInterrupts function. After this is done, it transitions to a special system-state in which the STOP code is dumped (0x0000000A for example). It accomplishes the transition and dump of the STOP code with a call from KiDisableInterrupts to the HalDisplayString function.

HalDisplayString itself goes ahead and first takes one parameter (string to print to the blue screen), and does a check to see if the system is in its special system-state (blue screen 'mode'). If it is not in this state however, it will go ahead and attempt to successfully use the firmware to swap to this proper system-state in order to continue.

Once the check has been successfully completed and confirmed that the system is in its proper state, HalDisplayString goes ahead and dumps the string into text-mode video memory at the current location of the cursor. This is kept track of throughout all of the future calls.

After all of this is successfully accomplished, KeBugCheckEx then goes ahead and calls the KeGetBugMessageText function. KeGetBugMessageText translates the stop code into its text-equivalent. There's a bug check reference list here.
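The idea of translating a stop code into its text-equivalent can be sketched as a simple lookup. This is only an illustration in Python: the table below is a tiny hand-picked subset, the function name bug_message_text is hypothetical, and the fallback string is my own invention rather than anything Windows prints.

```python
# Hand-picked subset of bug check names; not the table Windows itself uses.
BUG_CHECK_NAMES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000001A: "MEMORY_MANAGEMENT",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x0000009F: "DRIVER_POWER_STATE_FAILURE",
}


def bug_message_text(code):
    """Return the symbolic name for a stop code, or a placeholder if unknown."""
    return BUG_CHECK_NAMES.get(code, f"UNKNOWN_BUG_CHECK_0x{code:08X}")


for stop_code in (0x0000000A, 0x0000001A, 0x0000009F):
    print(f"0x{stop_code:08X} -> {bug_message_text(stop_code)}")
```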

Once that is completed, KeBugCheckEx will then go ahead at this point and start to call any bug check handlers that drivers registered (if any). The handlers themselves are registered by calling KeRegisterBugCheckCallback. When a handler is called, it fills in a buffer that was allocated by the caller of the register routine, so the contents can be examined later in the debugger. It also gives drivers in general a chance to stop their devices.
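Here's a rough Python sketch of that register-then-call pattern. Everything in it is hypothetical (the function names, the 'MyDriver' component, the buffer contents) and it only loosely mirrors the shape of the real routine. The point is just that the driver hands over its own buffer at registration time, and its handler fills that buffer in at crash time.

```python
# Toy model of bug check callbacks; names and behaviour are illustrative only.
registered_callbacks = []


def register_bug_check_callback(routine, buffer, component):
    """Remember a driver's routine together with the buffer the driver allocated."""
    registered_callbacks.append((routine, buffer, component))


def run_bug_check_callbacks():
    """Call every registered routine so it can fill in its own buffer."""
    for routine, buffer, _component in registered_callbacks:
        routine(buffer)
    return [(component, bytes(buffer)) for _r, buffer, component in registered_callbacks]


def my_driver_callback(buffer):
    """Hypothetical handler: record a short state marker at the start of the buffer."""
    state = b"DEV_STOPPED"
    buffer[:len(state)] = state


register_bug_check_callback(my_driver_callback, bytearray(16), "MyDriver")
results = run_bug_check_callbacks()
print(results)
```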

Once that is through, we move on to calling the callbacks that were registered with the KeRegisterBugCheckReasonCallback function, which allow drivers to write data to the crash dump or write crash dump information to alternate devices.

Once the above is done (if possible, because handlers aren't always registered) KeDumpMachineState is called which dumps the rest of the text on the screen. However, the first thing KeDumpMachineState tries to do is successfully interpret the four parameters that were passed to KeBugCheckEx as a valid address within a loaded module. It will go ahead and stop when it can successfully resolve one. The function that is used to accomplish this is KiPcToFileHeader.

For the first parameter that KiPcToFileHeader successfully resolves, the text form of the STOP code is printed immediately, along with the base address of the module and the module’s name.
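The 'try each parameter and stop at the first one that lands inside a loaded module' logic can be sketched like this. It's a minimal Python illustration under my own assumptions: the module list, base addresses and sizes are invented, and pc_to_file_header is just a stand-in name for the real kernel routine.

```python
# Hypothetical loaded-module list: (base address, size in bytes, name).
LOADED_MODULES = [
    (0x82A00000, 0x00410000, "ntoskrnl.exe"),
    (0x8D400000, 0x00034000, "fltmgr.sys"),
]


def pc_to_file_header(address):
    """Return (base, name) of the module containing address, or None."""
    for base, size, name in LOADED_MODULES:
        if base <= address < base + size:
            return base, name
    return None


def first_resolved_parameter(params):
    """Walk the four bug check parameters in order and stop at the first hit."""
    for index, value in enumerate(params, start=1):
        hit = pc_to_file_header(value)
        if hit is not None:
            base, name = hit
            return index, base, name
    return None


# Invented parameters: only the second one falls inside a module.
result = first_resolved_parameter((0x00000008, 0x8D401234, 0x0, 0x0))
if result is not None:
    index, base, name = result
    print(f"parameter {index} resolves to {name} (base {base:08x})")
```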

---------------------------

Below I will share the difference between your standard 8/8.1 and XP/Vista/7 screens:


(Windows 8/8.1 displaying 0x5C bug check)


(Windows XP/Vista/7 displaying 0x50 bug check)

Thanks for reading, and thanks to NT Insider and MSDN as always for double-checking!

Tuesday, June 17, 2014

0xC000021A Debugging

Yay, a debugging post! : )

This bug check is in most if not all cases caused by corruption of a critical Windows component (.dll, piece of the file system, etc.) or, more rarely, by a 3rd party driver causing a conflict.

---------------------------

First of all, let's have a look at the basic description of the bug check:

WINLOGON_FATAL_ERROR (c000021a)

This means that an error has occurred in a crucial user-mode subsystem.

Okay, with that said let's go ahead and expand a bit on what this exactly means. Within user-mode we have various subsystems such as WinLogon or csrss.exe (Client/Server Runtime Subsystem). When for some reason these 'critical' subsystems unexpectedly cease to exist, or have any sort of problem that prevents them from running or doing their job, the OS will swap to kernel-mode.

What's the problem with this? The subsystems I mentioned above are strictly user-mode, so when the OS swaps to kernel-mode, it calls a bug check. This is a big no-no because the OS cannot run without those subsystems.

In this bug check, two of the four parameters are important:

-- In this example, I will be using a 0xC000021A I solved quite some time ago. Your parameters may obviously differ.

BugCheck C000021A, {8da5e6b0, c0000006, 75a4e5e5, 13f86c}

The 1st parameter (8da5e6b0 in our case) is the string that identifies the problem.

The 2nd parameter (c0000006 in our case) is the error code.
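If you'd like to pull these values out of the debugger output programmatically, something like the following Python sketch would do. The function name parse_bugcheck_line is hypothetical, and the NTSTATUS table only contains the one code from this example (0xC0000006 is STATUS_IN_PAGE_ERROR).

```python
import re

# Only the status code from this example; a real table would be far larger.
NTSTATUS_NAMES = {0xC0000006: "STATUS_IN_PAGE_ERROR"}


def parse_bugcheck_line(line):
    """Split a 'BugCheck CODE, {p1, p2, p3, p4}' line into (code, [params])."""
    match = re.match(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line)
    if match is None:
        raise ValueError(f"not a BugCheck line: {line!r}")
    code = int(match.group(1), 16)
    params = [int(p.strip(), 16) for p in match.group(2).split(",")]
    return code, params


code, params = parse_bugcheck_line(
    "BugCheck C000021A, {8da5e6b0, c0000006, 75a4e5e5, 13f86c}"
)
print(f"bug check: {code:#x}")
print(f"1st parameter (string address): {params[0]:#x}")
print(f"2nd parameter (error code): {params[1]:#x} "
      f"{NTSTATUS_NAMES.get(params[1], 'unknown')}")
```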

---------------------------

FAILURE_BUCKET_ID:  0xc000021a_csrss.exe_c0000006_PoShutdown_ANALYSIS_INCONCLUSIVE
We can see it was csrss.exe that terminated unexpectedly. Why?
1: kd> db 8da5e6b0
8da5e6b0  57 69 6e 64 6f 77 73 20-53 75 62 53 79 73 74 65  Windows SubSyste
8da5e6c0  6d 00 a5 8d c0 e6 a5 8d-04 04 2b 06 46 4d 66 6e  m.........+.FMfn
8da5e6d0  04 f2 4e 01 00 00 00 00-a7 73 19 00 00 00 00 00  ..N......s......
8da5e6e0  e0 e6 a5 8d 00 00 00 00-00 00 00 00 e4 cf 61 8a  ..............a.
8da5e6f0  00 00 00 00 00 00 00 00-00 00 00 00 40 00 00 00  ............@...
8da5e700  01 00 00 00 dc 00 de 00-40 e7 a5 8d 2e 00 2e 00  ........@.......
8da5e710  40 e7 a5 8d 00 00 00 00-00 00 00 00 00 00 00 00  @...............
8da5e720  00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ................
If we run db 1st parameter it dumps the bytes from the string. We can see FMfn, which is a pool tag, specifically for the NAME_CACHE_NODE structure. It's part of fltmgr.sys, which is the Microsoft Filesystem Filter Manager driver.

1: kd> da 8da5e6b0
8da5e6b0  "Windows SubSystem"
If we run da 1st parameter it dumps ASCII strings. Not very helpful given we already knew this, but it's just another way to show you how you can see what caused the crash.
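For anyone curious how the byte dump maps to the text, here's a small Python sketch that redoes by hand what db and da are showing for the first two rows above. It isn't how WinDbg is implemented; row_to_bytes, ascii_column and c_string are names I made up for the example.

```python
# First two rows of the db output above, hex columns only.
ROWS = [
    "57 69 6e 64 6f 77 73 20-53 75 62 53 79 73 74 65",
    "6d 00 a5 8d c0 e6 a5 8d-04 04 2b 06 46 4d 66 6e",
]


def row_to_bytes(row):
    """Convert one row of hex pairs (with the mid-row dash) to bytes."""
    return bytes.fromhex(row.replace("-", " "))


def ascii_column(data):
    """Mimic the right-hand column: printable ASCII as-is, everything else as '.'."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in data)


def c_string(data):
    """Mimic da: read ASCII characters up to the first NUL byte."""
    return data.split(b"\x00", 1)[0].decode("ascii")


data = b"".join(row_to_bytes(row) for row in ROWS)

print(ascii_column(data[:16]))   # right-hand column of the first row
print(ascii_column(data[16:]))   # right-hand column of the second row
print(c_string(data))            # the NUL-terminated string at the start
print(data[0x1C:0x20])           # the four bytes that read as the pool tag
```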

---------------------------

In this specific case, I advised the user to insert the installation media and run a repair (which solved the problem).

Thanks for reading!

Saturday, June 14, 2014

Blog Update

Hi friends and readers,

I'm just going to post a quick blog update as things are going to be changing a little bit! No longer from this point on will I be posting the threads I've solved. It was easy before, but now that I am extremely active on various communities and solve hundreds of threads a week, having to attempt to keep up and post all of them that I've solved is nearly impossible and extremely exhausting.

With that said, my blog from this point on will be everything it is now, minus the 'solved' posts. If anything, this will also allow me to write many more in-depth debugging posts as I won't be so stressed to have to constantly post solved posts.

I hope you understand!

Sunday, June 8, 2014

[SOLVED] SYSTEM_SERVICE_EXCEPTION

Link to solved thread - BugCheck 3B on Intel NUC w/Windows 7

What the issue was - 


- Latest video card drivers needed to be installed.

- McAfee needed to be removed and replaced with MSE.

- wdcsam64.sys was present on the system and needed to be removed.

[SOLVED] DRIVER_IRQL_NOT_LESS_OR_EQUAL

Link to solved thread - BSOD DRIVER_IRQL_NOT_LESS_OR_EQUAL (igdkmd64.sys)

What the issue was - 


- Video card drivers needed to be updated.

- Screaming Bee needed to be removed.

- Trend Micro needed to be removed and replaced with Windows Defender.

[SOLVED] BAD_POOL_HEADER

Link to solved thread - Bad pool header with blue screen of death

What the issue was -
Malwarebytes PRO needed to be removed.

[SOLVED] WDF_VIOLATION

Link to solved thread - Whats the cause of my recent BSoD?

What the issue was -
DS3 Tool/MotionJoy needed to be removed.

[SOLVED] KERNEL_DATA_INPAGE_ERROR / CRITICAL_PROCESS_DIED

Link to solved thread - Windows 8.1 BSOD Kernel Data Inpage Error and Critical Process Died

What the issue was -
ESET needed to be removed and replaced with Windows Defender.

[SOLVED] DRIVER_POWER_STATE_FAILURE

Link to solved thread - 0x0000009F when resuming from sleep on ASUS K53SD

What the issue was -
Kaspersky needed to be removed and replaced with MSE.

[SOLVED] UNEXPECTED_KERNEL_MODE_TRAP

Link to solved thread - Unexpected Error: UNEXPECTED_KERNEL_MODE_TRAP (Wfd01000.sys)

What the issue was -
VIA_USB_SER.sys driver needed to be removed.

[SOLVED] BAD_POOL_HEADER / KERNEL_SECURITY_CHECK_FAILURE

Link to solved thread - Bad_Pool_Header and Kernel_Security_Check Failure BSODs alternating

What the issue was -
Modem needed to be disabled as device drivers weren't compatible.

[SOLVED] IRQL_NOT_LESS_OR_EQUAL

Link to solved thread - BLUESCREEN ERROR in WINDOWS 7 32 BIT

What the issue was -
QuickHeal needed to be removed and replaced with MSE.

[SOLVED] DRIVER_IRQL_NOT_LESS_OR_EQUAL / DRIVER_OVERRAN_STACK_BUFFER / ATTEMPTED_EXECUTE_OF_NOEXECUTE_MEMORY

Link to solved thread - Win8 Blue Screen DRIVER_OVERRAN_STACK_BUFFER

What the issue was -
Kaspersky needed to be removed and replaced with Windows Defender.

[SOLVED] KERNEL_AUTO_BOOST_INVALID_LOCK_RELEASE

Link to solved thread - BSOD still happening Please help!

What the issue was - 


- AODDriver2.sys was present on the system and its software needed to be removed.

- Pinnacle Marvin Bus needed to be removed as the device drivers were far too old.

- PCAUSA NDIS 5.0 SPR Protocol needed to be removed for same reason as above.

[SOLVED] IRQL_NOT_LESS_OR_EQUAL

Link to solved thread - BSOD while going into sleep mode

What the issue was - 


- avast! needed to be removed and replaced with MSE.

- Asus bloatware needed to be removed.

- Boot Defrag needed to be removed.

[SOLVED] CRITICAL_PROCESS_DIED / KERNEL_DATA_INPAGE_ERROR

Link to solved thread - Windows 8.1 Kernel Data Inpage Error

What the issue was - 


- avast! and Kaspersky were conflicting, both needed to be removed.

[SOLVED] DRIVER_POWER_STATE_FAILURE

Link to solved thread - BSOD driver power state failure Windows 8.1 caused by ntoskrnl.exe

What the issue was - 


- USB drivers needed to be updated.

- Daemon Tools needed to be removed.

[SOLVED] DRIVER_POWER_STATE_FAILURE

Link to solved thread - Windows 8.1 Blue Screen

What the issue was -
IRST needed to be updated.

[SOLVED] SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M

Link to solved thread - Blue Screen BCCode: 1000007e

What the issue was - 


- Novatel NWADI Bus Enumerator software needed to be removed.

- PowerISO needed to be removed.

- Atheros drivers needed to be updated.

[SOLVED] PAGE_FAULT_IN_NONPAGED_AREA

Link to solved thread - frequent blueScreen after Hibernating

What the issue was -
AVM Remote USB Architecture software needed to be removed.

[SOLVED] DRIVER_POWER_STATE_FAILURE

Link to solved thread - Windows Blue Screen error 1000009f-How do I fix it?

What the issue was - 


- Asus bloatware needed to be removed.

- Kaspersky needed to be removed and replaced with MSE.

[SOLVED] BAD_POOL_HEADER

Link to solved thread - I keep running into Bad Pool Header

What the issue was -
Kaspersky needed to be removed and replaced with Windows Defender.

[SOLVED] DRIVER_IRQL_NOT_LESS_OR_EQUAL

Link to solved thread - DRIVER_IRQL_NOT_LESS_OR_EQUAL (wdf01000.sys)

What the issue was -
USB port on the case was faulty.

[SOLVED] SYSTEM_SERVICE_EXCEPTION

Link to solved thread - BSoD SYSTEM_SERVICE_EXCEPTION

What the issue was - 


- Needed to update SP1 ASAP.

- avast! needed to be removed and replaced with MSE.

- Video card drivers needed to be updated.

[SOLVED] IRQL_NOT_LESS_OR_EQUAL / DPC_WATCHDOG_VIOLATION

Link to solved thread - Much Errors and BSOD in Windows 8

What the issue was - 


- Accelerometer needed to be removed.

- Bluetooth drivers needed to be updated.

[SOLVED] UNEXPECTED_KERNEL_MODE_TRAP

Link to solved thread - BSOD problem WIN 8

What the issue was -
McAfee needed to be removed and replaced with Windows Defender.

[SOLVED] DRIVER_POWER_STATE_FAILURE

Link to solved thread - BSOD (Upgraded to Windows 8)

What the issue was -
ExpressCache needed to be removed.

[SOLVED] WHEA_UNCORRECTABLE_ERROR

Link to solved thread - Random STOP errors

What the issue was -
???

[SOLVED] DRIVER_IRQL_NOT_LESS_OR_EQUAL

Link to solved thread - Blue screen windows 7 64 bit Dell inspiron n5110

What the issue was -
Connectify needed to be removed.

[SOLVED] IRQL_NOT_LESS_OR_EQUAL

Link to solved thread - Problem Event Name: Blue Screen

What the issue was - 


- Asus bloatware needed to be removed.

- Glary Utilities needed to be removed.

- Malwarebytes PRO needed to be removed.

- Rainbow Tech/SafeNet USB Security Device/software needed to be removed.

[SOLVED] DRIVER_IRQL_NOT_LESS_OR_EQUAL

Link to solved thread - windows 8.1 BSOD irql_not_less_or_equal

What the issue was - 


- Video card drivers needed to be updated.

- Asus bloatware needed to be removed.

- McAfee needed to be removed and replaced with Windows Defender.

[SOLVED] DRIVER_IRQL_NOT_LESS_OR_EQUAL

Link to solved thread - DRIVER_IRQL_NOT_LESS_OR_EQUAL(athw8x.sys)

What the issue was -
Qualcomm Atheros UB91/UB93/UB94 Network Adapter driver needed to be updated.

[SOLVED] SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M / SYSTEM_SERVICE_EXCEPTION / KMODE_EXCEPTION_NOT_HANDLED

Link to solved thread - I need help : systematic BSOD on my computeur

What the issue was -
Video card and/or motherboard needed to be replaced.

[SOLVED] DRIVER_VERIFIER_DETECTED_VIOLATION

Link to solved thread - Driver Verifier Detected on Every Startup

What the issue was -
Video card drivers needed to be rolled back.

[SOLVED] KERNEL_SECURITY_CHECK_FAILURE

Link to solved thread - Computer get Bluescreen everytime I plug in my mouse~Need help~ :(

What the issue was -
Needed to circumvent incompatible mouse drivers by allowing Windows to install default mouse drivers.

[SOLVED] DRIVER_IRQL_NOT_LESS_OR_EQUAL

Link to solved thread - Windows 8 BSOD problem (DRIVER_IRQL_NOT_LESS_OR_EQUAL)

What the issue was - 


- Asus bloatware needed to be removed.

- ExpressCache needed to be removed.

- Network drivers needed to be updated.

[SOLVED] MEMORY_MANAGEMENT

Link to solved thread - Windows 8.1 several crashes BSOD

What the issue was -
AppleCharger and other bloatware needed to be removed.

[SOLVED] WHEA_UNCORRECTABLE_ERROR

Link to solved thread - BSOD/Bad

What the issue was -
Overclock needed to be returned to defaults.

[SOLVED] DRIVER_POWER_STATE_FAILURE

Link to solved thread - DRIVER_POWER_STATE-FAILURE_Message on Asus Laptop

What the issue was -
Bluetooth driver needed to be updated.

[SOLVED] BAD_POOL_CALLER

Link to solved thread - BSOD after torrent started BAD_POOL_CALLER windows 8.1 pro wmc 64, with ralink rt3290

What the issue was -
AMD Quick Stream Technology - flow processing engine software needed to be removed.

[SOLVED] SYSTEM_SERVICE_EXCEPTION / NTFS_FILE_SYSTEM

Link to solved thread - Windows 7 Blue Screens - Vga Adapter?

What the issue was -
???

[SOLVED] SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M / IRQL_NOT_LESS_OR_EQUAL

Link to solved thread - Blue screen of death

What the issue was -
???

[SOLVED] PAGE_FAULT_IN_NONPAGED_AREA

Link to solved thread - Windows 7 x64 Pro Blue Screen Intermittent on new system

What the issue was -
RAM needed to be replaced.

[SOLVED] DRIVER_VERIFIER_DETECTED_VIOLATION

Link to solved thread - Windows 8.1 BSOD

What the issue was -
SiS RAID Stor Miniport driver needed to be updated.

Saturday, June 7, 2014

[SOLVED] DRIVER_POWER_STATE_FAILURE

Link to solved thread - Stop 0x9F DRIVER_POWER_STATE_FAILURE

What the issue was -
ENE Tech USB memory card reader software needed to be removed.

[SOLVED] SYSTEM_SERVICE_EXCEPTION

Link to solved thread - Windows 8.1 BSOD after disconecting HDMI cable, could somebody help me analyse .dmp file please?

What the issue was -
The user simply did not properly remove the HDMI cable, which caused the bugcheck to be called. There were no actual problems.

[SOLVED] DRIVER_POWER_STATE_FAILURE

Link to solved thread - DRIVER POWER STATUS FAILURE - BLUE SCREEN ERROR

What the issue was -
Printer driver needed to be uninstalled/removed.

[SOLVED] WHEA_UNCORRECTABLE_ERROR / BUGCODE_USB_DRIVER

Link to solved thread - IRQL not less or equal blue screen error

What the issue was -
System was replaced under warranty as it was faulty.