Wednesday, February 12, 2014

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1) NETIO.sys Debugging

This will be the first of many debugging tutorials (aside from my older ones)! I very much want to get back into writing tutorials for a few reasons, but the main one is that they are very fun to write, and I obviously learn more and more every day. Tutorials are all over the web on various blogs and forums, but they differ in style: some contain more information than others, and each explains things in its own way. My goal with everything regarding debugging has always been, and always will be, to explain as much as my personal knowledge permits, in a way that lets anyone who doesn't know how to do it learn by reading and then trying it hands-on.

--------------------

Let's get started! We're going to start off with the *D1 bug check, specifically when NETIO.sys is labeled as the fault of the crash. I've been debugging online on various forums for a little over two years now, and over the past few months to a year I have seen a huge increase in NETIO.sys *D1's. From what I have seen (and I have debugged and solved MANY NETIO.sys *D1's), they are caused 100% of the time by one of the following:

1. Network drivers themselves, whether they need to be updated, reinstalled due to corruption, rolled back due to a bug in the latest version, etc.

2. Third-party antivirus or firewall software causing NetBIOS and/or network related conflicts.

(99% of the time #2 is the cause. I have rarely seen #1, but it's of course possible.)

Right, so with all of this said, what's NETIO.sys? NETIO.sys is Microsoft Windows' Network I/O Subsystem.

First of all, Input and Output (I/O) is an extremely in-depth topic and will not be explained in this blog post. If you would like to read about it and learn (which I highly recommend), read the following from the MSDN website.

More specifically, we're interested in Network I/O operations in this regard - msdn link here

--------------------

With this said, the basic definition (per msdn) for the *D1 bug check is the following:

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
This indicates that a kernel-mode driver attempted to access pageable memory at a process IRQL that was too high.
A driver tried to access an address that is pageable (or that is completely invalid) while the IRQL was too high. This bug check is usually caused by drivers that have used improper addresses. 
So, this is a fairly standard explanation for a person who understands how Windows' memory manager works. If you don't however, you can kinda sorta get the gist of it, but at the same time it may not really mean much to you. Let's go into detail on the memory manager subsystem, because we're all about learning!

Page faults can only be serviced while the processor is running below IRQL 2 (DISPATCH_LEVEL); ordinary threads run at IRQL 0 (PASSIVE_LEVEL). If, for example, a driver attempts to access memory that is not currently in RAM (paged out), the processor raises a page fault. Windows' memory manager handles the fault and reads the page back in from the hard disk, and the processor then resumes the driver at the instruction that touched the memory, which at this point is resident in RAM again.

Alright, great, so why do we get this bug check? *D1 occurs when a driver that is running at an elevated IRQL attempts to access pageable memory. This is not good (clearly): when the driver touches paged-out memory at IRQL[n] (I use n because there are different levels, but 2 is the most common, so from this point on I will use 2), the memory manager would have to page the memory back in, and that means waiting on disk I/O. Waiting is not possible at IRQL 2 because the scheduler itself runs at that level, so the fault could never be resolved. Rather than let the system deadlock, Windows bug checks.

The same bug check also occurs if a driver running at an elevated IRQL attempts to access a completely invalid memory address, not only a paged-out one.
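
The rule described above can be sketched as a small Python model. This is only a toy illustration, not how the kernel is implemented: the names access_memory, Irql and BugCheck are made up for this example, and only the IRQL values 0 and 2 come from the text above.

```python
from enum import IntEnum


class Irql(IntEnum):
    PASSIVE_LEVEL = 0    # ordinary thread execution
    APC_LEVEL = 1
    DISPATCH_LEVEL = 2   # page faults cannot be serviced at or above this level


class BugCheck(Exception):
    """Stands in for the system halting with a stop code."""


def access_memory(irql, address_valid, resident):
    """Toy model of a driver touching one address at a given IRQL."""
    if address_valid and resident:
        return "ok"  # the page is in RAM, so no fault occurs at any IRQL
    if irql >= Irql.DISPATCH_LEVEL:
        # A fault at IRQL 2 or above cannot be resolved, whether the page
        # is merely paged out or the address is completely invalid.
        raise BugCheck("DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)")
    if address_valid:
        return "paged in"  # fault serviced below DISPATCH_LEVEL
    # An invalid address at low IRQL still crashes, but not as *D1.
    raise BugCheck("invalid address at low IRQL (not *D1)")
```

Calling access_memory(Irql.PASSIVE_LEVEL, True, False) returns "paged in", while the same access at Irql.DISPATCH_LEVEL raises the *D1 stand-in.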

--------------------

Now that we have all of that said, let's move onto an example crash dump (just a random *D1 NETIO.sys dump from a user that I managed to dig up):

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 0000000000000028, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffff80000f8c43f, address which referenced memory

Debugging Details:
------------------
Right away we can see that the 2nd argument of the *D1 bug check is 0000000000000002 (2), the IRQL I mentioned earlier. There are several other ways to display the arguments of a bug check.

For example, by running the .bugcheck command:

0: kd> .bugcheck
Bugcheck code 000000D1
Arguments 00000000`00000028 00000000`00000002 00000000`00000000 fffff800`00f8c43f

Here the second argument, '00000000`00000002', = 2.

Before running !analyze -v it's listed:

BugCheck D1, {28, 2, 0, fffff80000f8c43f}
It's also listed further down in the !analyze -v output:

CURRENT_IRQL:  2
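
All of the displays above carry the same four values, so they can be decoded mechanically. Here is a minimal Python sketch that parses the one-line "BugCheck D1, {...}" summary and labels each argument using the meanings printed by !analyze -v. The function names are hypothetical, and the "near_null" check (anything below 0x10000) is my own rough heuristic for spotting a null pointer plus a small offset, not something the debugger reports.

```python
import re


def parse_bugcheck_summary(line):
    """Parse a line such as 'BugCheck D1, {28, 2, 0, fffff80000f8c43f}'."""
    match = re.match(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line.strip())
    if match is None:
        raise ValueError("not a BugCheck summary line: %r" % line)
    code = int(match.group(1), 16)
    # Every argument is printed in hexadecimal without a 0x prefix.
    args = [int(part.strip(), 16) for part in match.group(2).split(",")]
    return code, args


def describe_d1(args):
    """Label the four *D1 arguments (Arg1..Arg4 in the !analyze -v output)."""
    referenced, irql, operation, instruction = args
    operations = {0: "read", 1: "write"}
    return {
        "memory_referenced": hex(referenced),
        "irql": irql,
        "operation": operations.get(operation, "other"),
        "faulting_instruction": hex(instruction),
        # Heuristic (assumption): a tiny address is never valid kernel memory,
        # so it usually means a null pointer plus a structure offset.
        "near_null": referenced < 0x10000,
    }
```

Feeding it the summary line from this dump gives a read of address 0x28 at IRQL 2, flagged as near-null, which fits the "completely invalid address" case rather than the paged-out case.
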
This specific crash dump was a minidump, so it didn't contain very much information. For example, just have a look at the call stack:

STACK_TEXT: 
ffffd000`253ab288 fffff801`9776d7e9 : 00000000`0000000a 00000000`00000028 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx
ffffd000`253ab290 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiBugCheckDispatch+0x69
The stack shows only Windows' usual bug check routines (nt!KeBugCheckEx and nt!KiBugCheckDispatch). No driver calls, etc. Very dead stack. Let's go ahead and refer to the FAILURE_BUCKET_ID (FBID):

FAILURE_BUCKET_ID:  X64_0xD1_NETIO!RtlCopyBufferToMdl+1f
The bucket ID has the form module!function+offset, so the faulting instruction is inside NETIO.sys, 0x1f bytes into its RtlCopyBufferToMdl routine. I am not entirely sure what this routine does, however just from knowing the acronyms...

Rtl = Run-Time Library.

Mdl = Memory Descriptor List.
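
The bucket ID can also be split into its parts mechanically. Below is a minimal Python sketch; the function name is hypothetical, and the layout it assumes (ARCH_0xCODE_MODULE!FUNCTION+OFFSET, with a hexadecimal offset) is inferred only from the single example in this dump, so other dumps may not follow it.

```python
import re


def parse_failure_bucket(line):
    """Split e.g. 'FAILURE_BUCKET_ID:  X64_0xD1_NETIO!RtlCopyBufferToMdl+1f'."""
    value = line.strip()
    prefix = "FAILURE_BUCKET_ID:"
    if value.startswith(prefix):
        value = value[len(prefix):].strip()

    # Left of '!' is ARCH_0xCODE_MODULE, right of it is FUNCTION+OFFSET.
    left, _, right = value.partition("!")
    match = re.match(r"^(\w+?)_0x([0-9A-Fa-f]+)_(.+)$", left)
    if match is None or not right:
        raise ValueError("unexpected bucket layout: %r" % line)

    function, plus, offset = right.rpartition("+")
    if not plus:  # no '+offset' suffix present
        function, offset = right, "0"

    return {
        "arch": match.group(1),
        "code": int(match.group(2), 16),
        "module": match.group(3),
        "function": function,
        "offset": int(offset, 16),
    }
```

For this dump it yields module NETIO, function RtlCopyBufferToMdl and offset 0x1f, which is just a structured view of what the FBID line already says.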

I can imagine it's a run-time library routine that copies a buffer into the memory described by an MDL. So, what does this mean to us? Well, nothing really. It's a minidump with not very much information. All we know is that something is conflicting with NETIO.sys. Let's go ahead and take a look at the loaded modules list (Debug > Modules). In NETIO.sys dumps you are going to want to check for popular antivirus drivers. I would list them here, but there are so many. I think I'll add them over time. For now I will just let you know that this specific dump contained ggc.sys, which is a driver belonging to Quick Heal AntiVirus.

0: kd> lmvm ggc
start             end                 module name
fffff800`01600000 fffff800`01618000   ggc        (deferred)           
    Image path: \SystemRoot\system32\DRIVERS\ggc.sys
    Image name: ggc.sys
    Timestamp:        Wed Sep 04 02:43:22 2013
So, there's ggc.sys. At this point I recommended removal of Quick Heal and explained that it was likely causing network related conflicts, which in turn caused the system to crash. After Quick Heal was removed, the crashes stopped.
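
Checking the module list by eye gets tedious, so here is a minimal Python sketch that scans pasted lm/lmvm output for driver names on a watch list. The function name and the WATCH_LIST dictionary are hypothetical; ggc is the only entry that comes from this post, so add your own as you come across them. A hit only means the driver was loaded, not that it is proven to be the cause.

```python
import re

# Hypothetical watch list: only ggc comes from this post.
WATCH_LIST = {"ggc": "Quick Heal AntiVirus"}

# A module row starts with two addresses (start, end) followed by the name.
MODULE_LINE = re.compile(r"^[0-9a-fA-F`]+\s+[0-9a-fA-F`]+\s+(\S+)")


def find_watched_modules(lm_output, watch_list=WATCH_LIST):
    """Return {module name: product} for watched modules in lm/lmvm output."""
    hits = {}
    for line in lm_output.splitlines():
        match = MODULE_LINE.match(line.strip())
        if match is None:
            continue  # header rows and detail rows such as 'Image path:'
        name = match.group(1).lower()
        if name in watch_list:
            hits[name] = watch_list[name]
    return hits
```

Pasting the lmvm ggc output from above into find_watched_modules returns the ggc entry together with its Quick Heal AntiVirus label.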

--------------------

-- Today when I wake up I will add a list of antiviruses and firewalls that I have seen cause this bug check.

1 comment:

  1. hi Patrick

    not really understand how you figure out the problem is related to ggc.sys, could you elaborate a bit? I'm struggling in a similar situation for a while

    thanks
