Thursday, July 12, 2012

0x116: VIDEO_TDR_ERROR

* Updated 2/25/2014

**This is actually a canned reply, even though it's not listed under my canned replies post. You can still use it for troubleshooting; judging by the page views this post gets, most readers do that anyway.

----------------

The basic definition of a 0x116 bugcheck is:
This indicates that an attempt to reset the display driver and recover from a timeout failed.
So, let me now explain what VIDEO_TDR_ERROR means. First off, TDR is an acronym for 'Timeout Detection and Recovery', which was introduced in Vista and carried over to Windows 7. Rather than paraphrasing what Timeout Detection and Recovery does, I'll just quote the MSDN article directly:

Timeout detection:
The GPU scheduler, which is part of the DirectX graphics kernel subsystem (Dxgkrnl.sys), detects that the GPU is taking more than the permitted amount of time to execute a particular task. The GPU scheduler then tries to preempt this particular task. The preempt operation has a "wait" timeout, which is the actual TDR timeout. This step is thus the timeout detection phase of the process. The default timeout period in Windows Vista and later operating systems is 2 seconds. If the GPU cannot complete or preempt the current task within the TDR timeout period, the operating system diagnoses that the GPU is frozen.
To prevent timeout detection from occurring, hardware vendors should ensure that graphics operations (that is, DMA buffer completion) take no more than 2 seconds in end-user scenarios such as productivity and game play.
Preparation for recovery:
The operating system's GPU scheduler calls the display miniport driver's DxgkDdiResetFromTimeout function to inform the driver that the operating system detected a timeout. The driver must then reinitialize itself and reset the GPU. In addition, the driver must stop accessing memory and should not access hardware. The operating system and the driver collect hardware and other state information that could be useful for post-mortem diagnosis. 
Desktop recovery:
The operating system resets the appropriate state of the graphics stack. The video memory manager, which is also part of Dxgkrnl.sys, purges all allocations from video memory. The display miniport driver resets the GPU hardware state. The graphics stack takes the final actions and restores the desktop to the responsive state. As previously mentioned, some legacy DirectX applications might render just black at the end of this recovery, which requires the end user to restart these applications. Well-written DirectX 9Ex and DirectX 10 and later applications that handle Device Remove technology continue to work correctly. An application must release and then recreate its Direct3D device and all of the device's objects. For more information about how DirectX applications recover, see the Windows SDK.
 Article here.

With that said, if Timeout Detection and Recovery fails to recover the display driver, Windows throws the 0x116 bugcheck. Many different things can cause a 0x116, which I will explain below:
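To make the flow quoted above easier to picture, here is a toy decision model in Python. It is only a sketch of the three phases, not how Dxgkrnl.sys actually works: the function name `tdr_outcome` and its parameters are made up for illustration, and the only real value in it is the 2-second default from the MSDN text.

```python
TDR_TIMEOUT_SECONDS = 2.0  # default timeout quoted in the MSDN text above


def tdr_outcome(gpu_task_seconds, reset_succeeds, timeout=TDR_TIMEOUT_SECONDS):
    """Toy model of the TDR flow (hypothetical helper, not a Windows API)."""
    # Phase 1: timeout detection. A task that finishes in time never triggers TDR.
    if gpu_task_seconds <= timeout:
        return "task completed, no TDR"
    # Phases 2 and 3: the OS asks the driver to reset the GPU and restores the desktop.
    if reset_succeeds:
        return "display driver recovered"
    # Recovery failed, which is exactly the case this post is about.
    return "bugcheck 0x116 (VIDEO_TDR_ERROR)"


if __name__ == "__main__":
    print(tdr_outcome(0.5, reset_succeeds=True))
    print(tdr_outcome(5.0, reset_succeeds=True))
    print(tdr_outcome(5.0, reset_succeeds=False))
```

The point of the model is simply that a long-running GPU task alone does not produce a 0x116; the bugcheck only appears when the reset that follows the timeout also fails.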

(Ensure you have the latest video card drivers. If you are already on the latest video card drivers, uninstall them and install a version or a few versions behind the latest to rule out an issue that only affects the latest driver. If you have already experimented with the latest video card driver and many previous versions, please give the beta driver for your card a try.)

The following hardware issues can cause a TDR event:

1. Unstable overclock (CPU, GPU, etc). Revert any and all overclocks to stock settings.

2. Faulty memory resulting in corrupt data being communicated between the GPU and the system (video memory, otherwise known as vRAM, or physical memory, otherwise known as RAM).

GPU testing: Furmark, run for ~15 minutes and watch temperatures to ensure there's no overheating and watch for artifacts.

RAM testing: Memtest (RUN FOR NO LESS THAN ~8 PASSES) - Refer to the below:

Memtest:

Memtest86+:

Download Memtest86+ here:

http://www.memtest.org/

Which should I download?

You can either download the pre-compiled ISO that you would burn to a CD and then boot from the CD, or you can download the auto-installer for the USB key. The auto-installer will format your USB drive, make it a bootable device, and then install the necessary files. Both do the same job; it's just up to you which you choose, or which you have available (whether it's CD or USB).

Do note that some older generation motherboards do not support USB-based booting, in which case your only option is CD (or Floppy if you really wanted to).

How Memtest works:

Memtest86 writes a series of test patterns to most memory addresses, reads back the data written, and compares it for errors.

The default pass does 9 different tests, varying in access patterns and test data. A tenth test, bit fade, is selectable from the menu. It writes all memory with zeroes, then sleeps for 90 minutes before checking to see if bits have changed (perhaps because of refresh problems). This is repeated with all ones for a total time of 3 hours per pass.
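The write, read back, and compare idea can be shown in a few lines of Python. This is a simplified sketch and not Memtest86's actual code: the four patterns are just common examples, and `StuckBitMemory` is a made-up class that simulates a single bad cell so the test has something to find.

```python
def pattern_test(memory, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Write each pattern to every cell, read it back, and collect mismatches."""
    errors = []
    for pattern in patterns:
        for addr in range(len(memory)):
            memory[addr] = pattern
        for addr in range(len(memory)):
            value = memory[addr]
            if value != pattern:
                errors.append((addr, pattern, value))
    return errors


class StuckBitMemory:
    """Simulated RAM where bit 0 of one address is stuck at 1 (hypothetical fault)."""

    def __init__(self, size, bad_addr):
        self.cells = bytearray(size)
        self.bad_addr = bad_addr

    def __len__(self):
        return len(self.cells)

    def __getitem__(self, addr):
        return self.cells[addr]

    def __setitem__(self, addr, value):
        if addr == self.bad_addr:
            value |= 0x01  # the faulty cell cannot store a 0 in bit 0
        self.cells[addr] = value


if __name__ == "__main__":
    print(pattern_test(bytearray(64)))          # healthy memory: no errors
    print(pattern_test(StuckBitMemory(64, 10)))  # faulty memory: mismatches at address 10
```

Notice that the stuck bit is only caught by the patterns that try to write a 0 into it, which is why a real tester uses several different patterns and multiple passes.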

Many chipsets can report RAM speeds and timings via SPD (Serial Presence Detect) or EPP (Enhanced Performance Profiles), and some even support changing the expected memory speed. If the expected memory speed is overclocked, Memtest86 can test that memory performance is error-free with these faster settings.

Some hardware is able to report the "PAT status" (PAT: enabled or PAT: disabled). This is a reference to Intel Performance Acceleration Technology; there may be BIOS settings which affect this aspect of memory timing.

This information, if available to the program, can be displayed via a menu option.

Any other questions can most likely be answered by reading this great guide here:

http://forum.canardpc.com/threads/28864-FAQ-please-read-before-posting

3. Corrupt hard drive or Windows installation, resulting in corruption of the registry or page file.

HDD diagnostics: Seatools - Refer to the below:

http://www.seagate.com/support/downloads/seatools/

You can run it via Windows or DOS. The only difference is the environment you're running it in. If you believe you are having device driver related issues in Windows that may cause conflicts or false positives, it may be wise to choose the most minimal testing environment (DOS).

Run all tests EXCEPT: Fix All, Long Generic, and anything Advanced.

To reset your page file, follow the instructions below:

a ) Go to Start...Run...and type in "sysdm.cpl" (without the quotes) and press Enter.

- Then click on the Advanced tab,
- Then on the Performance Settings Button,
- Then on the next Advanced tab,
- Then on the Virtual Memory Change button.

b ) In this window, note down the current settings for your pagefile (so you can restore them later on).

- Then click on the "No paging file" radio button, and

- then on the "Set" button. If you have multiple hard drives, make sure the paging file is set to 0 on all of them.

- Click OK to exit the dialogs.

c ) Reboot (this will remove the pagefile from your system)

d ) Then go back in following the directions in step a ) and re-enter the settings that you wrote down in step b ). Follow the steps all the way through (and including) the reboot.

e ) Once you've rebooted this second time, go back in and check to make sure that the settings are as they're supposed to be.
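If you would rather record the current page file settings with a script than write them down by hand, the sketch below may help. It assumes the settings live in the `PagingFiles` registry value under `HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management`, stored as entries of the form `path initial_MB maximum_MB`; please verify that on your own system before relying on it. The function names are my own, and the script only reads the value, it never changes anything.

```python
def parse_paging_file_entry(entry):
    """Split one 'path initial_MB maximum_MB' entry into its parts.

    Entries without explicit sizes are returned with the sizes set to None.
    """
    parts = entry.rsplit(None, 2)
    if len(parts) == 3 and parts[1].isdigit() and parts[2].isdigit():
        return {
            "path": parts[0],
            "initial_mb": int(parts[1]),
            "maximum_mb": int(parts[2]),
        }
    return {"path": entry.strip(), "initial_mb": None, "maximum_mb": None}


def read_paging_files():
    """Windows only: read and parse the PagingFiles registry value (read-only)."""
    import winreg  # standard library module, available only on Windows

    key_path = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        entries, _value_type = winreg.QueryValueEx(key, "PagingFiles")
    return [parse_paging_file_entry(e) for e in entries if e.strip()]


if __name__ == "__main__":
    import sys

    if sys.platform == "win32":
        for setting in read_paging_files():
            print(setting)
```

Save the printed output somewhere safe before step b ), then compare it against the dialog after step e ).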

Run System File Checker:

SFC.EXE /SCANNOW

Go to Start and type in "cmd.exe" (without the quotes)

At the top of the search box, right click on cmd.exe and select "Run as administrator"

In the black window that opens, type "SFC.EXE /SCANNOW" (without the quotes) and press Enter.

Let the program run and post back what it says when it's done. 
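If SFC reports that it found problems, the details end up in the CBS log (on Vista and later that is %windir%\Logs\CBS\CBS.log, with the System File Checker entries tagged `[SR]`). Here is a small Python sketch that pulls out just those lines so they are easier to post. The function name is my own, and the script only reads the log.

```python
import os


def sfc_log_lines(log_path=None):
    """Return the lines of CBS.log that were written by System File Checker."""
    if log_path is None:
        windir = os.environ.get("windir", r"C:\Windows")
        log_path = os.path.join(windir, "Logs", "CBS", "CBS.log")
    with open(log_path, "r", encoding="utf-8", errors="replace") as log:
        # SFC entries carry the [SR] tag; everything else is other servicing activity.
        return [line.rstrip("\n") for line in log if "[SR]" in line]


if __name__ == "__main__":
    for line in sfc_log_lines():
        print(line)
```

Run it from an elevated prompt, since the log may not be readable otherwise.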

- Overheating of the CPU, GPU, and/or other components can cause 0x116 bugchecks. Monitor your temperatures and ensure the system is cooled adequately.

- GPU failure- Heat, power issue (PSU issue), faulty vRAM, etc.

The following software issues can cause a TDR event:

- Incompatible drivers of any sort

- Messy / corrupt registry

- Corrupt Direct X - http://support.microsoft.com/kb/179113

- Corrupt system files (run System File Checker as advised above)

- Buggy and/or corrupt 3rd party drivers. If you suspect a 3rd party driver is the issue, enable Driver Verifier:

Driver Verifier:

What is Driver Verifier?

Driver Verifier is included in Windows 8/8.1, 7, Windows Server 2008 R2, Windows Vista, Windows Server 2008, Windows 2000, Windows XP, and Windows Server 2003 to promote stability and reliability; you can use this tool to troubleshoot driver issues. Windows kernel-mode components can cause system corruption or system failures as a result of an improperly written driver, such as an earlier version of a Windows Driver Model (WDM) driver.

Essentially, if there's a 3rd party driver believed to be at issue, enabling Driver Verifier will help flush out the rogue driver if it detects a violation.

Before enabling Driver Verifier, it is recommended to create a System Restore Point:

Vista - START | type rstrui - create a restore point
Windows 7 - START | type create | select "Create a Restore Point"
Windows 8 - http://www.eightforums.com/tutorials/4690-restore-point-create-windows-8-a.html

How to enable Driver Verifier:

Start > type "verifier" without the quotes > Select the following options -

1. Select - "Create custom settings (for code developers)"
2. Select - "Select individual settings from a full list"
3. Check the following boxes -
- Special Pool
- Pool Tracking
- Force IRQL Checking
- Deadlock Detection
- Security Checks (Windows 7 & 8)
- DDI compliance checking (Windows 8)
- Miscellaneous Checks
4. Select - "Select driver names from a list"
5. Click on the "Provider" tab. This will sort all of the drivers by the provider.
6. Check EVERY box that is NOT provided by Microsoft / Microsoft Corporation.
7. Click on Finish.
8. Restart.
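The same selection can also be expressed as a `verifier /flags ... /driver ...` command line. The Python sketch below only builds that command so you can see how the checkboxes map to a single flags value; it does not run anything. The bit values are the ones I know from Microsoft's Driver Verifier documentation, so double-check them against the documentation for your Windows version, and note that `example.sys` is just a placeholder driver name.

```python
# Flag bits for the options ticked in step 3 (assumed from Microsoft's
# Driver Verifier documentation; verify for your Windows version).
VERIFIER_FLAGS = {
    "Special Pool": 0x1,
    "Force IRQL Checking": 0x2,
    "Pool Tracking": 0x8,
    "Deadlock Detection": 0x20,
    "Security Checks": 0x100,
    "Miscellaneous Checks": 0x800,
    "DDI compliance checking": 0x20000,
}

WINDOWS_7_OPTIONS = [
    "Special Pool",
    "Pool Tracking",
    "Force IRQL Checking",
    "Deadlock Detection",
    "Security Checks",
    "Miscellaneous Checks",
]
WINDOWS_8_OPTIONS = WINDOWS_7_OPTIONS + ["DDI compliance checking"]


def verifier_command(options, drivers):
    """Build (but do not run) a verifier command for the given options and drivers."""
    flags = 0
    for name in options:
        flags |= VERIFIER_FLAGS[name]
    return ["verifier", "/flags", hex(flags), "/driver", *drivers]


if __name__ == "__main__":
    # example.sys is a placeholder; list your own non-Microsoft drivers here.
    print(" ".join(verifier_command(WINDOWS_7_OPTIONS, ["example.sys"])))
    print(" ".join(verifier_command(WINDOWS_8_OPTIONS, ["example.sys"])))
```

If you go this route, the command has to be run from an elevated command prompt and still needs a restart afterwards, just like the GUI method.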

Important information regarding Driver Verifier:

- If Driver Verifier finds a violation, the system will BSOD. To expand on this a bit more for the interested: Driver Verifier looks for any driver making illegal function calls. If such a call is allowed to continue, system corruption can occur. When Driver Verifier is enabled, it monitors all 3rd party drivers (as we have it set that way), and when it catches a driver attempting to do this, it quickly flags that driver as a troublemaker and brings down the system safely before any corruption can occur.

- After enabling Driver Verifier and restarting the system, depending on the culprit, if for example the driver is on start-up, you may not be able to get back into normal Windows because Driver Verifier will detect it in violation almost straight away, and as stated above, that will cause / force a BSOD.

If this happens, do not panic, do the following:

- Boot into Safe Mode by repeatedly tapping the F8 key during boot-up.

- Once in Safe Mode - Start > Search > type "cmd" without the quotes.

- To turn off Driver Verifier, type in cmd "verifier /reset" without the quotes.

- Restart and boot into normal Windows.

If your OS became corrupt or you cannot boot into Windows after disabling verifier via Safe Mode:

- Boot into Safe Mode by repeatedly tapping the F8 key during boot-up.

- Once in Safe Mode - Start > type "system restore" without the quotes.

- Choose the restore point you created earlier.

-- Note that Safe Mode for Windows 8 is a bit different, and you may need to try different methods: 5 Ways to Boot into Safe Mode in Windows 8 & Windows 8.1

How long should I keep Driver Verifier enabled for?

I recommend keeping it enabled for at least 24 hours. If you don't BSOD by then, disable Driver Verifier. I will usually say whether or not I'd like for you to keep it enabled any longer.

My system BSOD'd with Driver Verifier enabled, where can I find the crash dumps?

They will be located in %systemroot%\Minidump
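If you want a quick list of what is in that folder, newest first, here is a small Python sketch. The function name `list_minidumps` is my own; it falls back to C:\Windows when the SystemRoot environment variable is not set, and it only lists files.

```python
import os
from pathlib import Path


def list_minidumps(folder=None):
    """Return the .dmp files in the Minidump folder, newest first."""
    if folder is None:
        folder = Path(os.environ.get("SystemRoot", r"C:\Windows")) / "Minidump"
    dumps = Path(folder).glob("*.dmp")
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for dump in list_minidumps():
        print(dump.name)
```

You may need to run it from an elevated prompt, as the Minidump folder is not always readable by a standard user.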

Any other questions can most likely be answered by this article:
http://support.microsoft.com/kb/244617
 
----------------

Now that we've gone over what can cause a 0x116 bugcheck, let's go over a very simple case I solved the other day!

The user was complaining of crashing during gameplay, specifically Battlefield 3. After taking a look at the dumps, of course there were 0x116 VIDEO_TDR_ERROR bugchecks. Here's a dump file excerpt:

Built by: 7601.17835.amd64fre.win7sp1_gdr.120503-2030
Debug session time: Fri Jun 22 04:12:04.033 2012 (UTC - 4:00)
System Uptime: 0 days 1:26:38.899
BugCheck 116, {fffffa800cdfe4e0, fffff880042078b8, 0, c}
*** WARNING: Unable to verify timestamp for atikmpag.sys
*** ERROR: Module load completed but symbols could not be loaded for atikmpag.sys
Probably caused by : atikmpag.sys ( atikmpag+78b8 )
BUGCHECK_STR:  0x116
PROCESS_NAME:  bf3.exe

As you can see, the "Probably caused by" line points to atikmpag.sys (ATI/AMD video card drivers). In most cases this tells you very little: a 0x116 is the display driver failing to recover, so Windows will of course blame the video / display driver most of the time. In 0x116 crashes you will at times also see DirectX blamed (either dxgkrnl - DirectX Kernel OR dxgmms1 - DirectX MMS).
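When you have a pile of dumps to go through, it can help to pull the bugcheck code and the blamed module out of the debugger output automatically. The Python sketch below does that for text shaped like the excerpt above. The regular expressions are written against that excerpt only, so treat them as assumptions if your debugger output looks different; `summarize_dump` is a name I made up.

```python
import re

# Patterns written against the excerpt in this post (assumed format).
BUGCHECK_RE = re.compile(r"^BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")
CAUSE_RE = re.compile(r"^Probably caused by\s*:\s*(\S+)")


def summarize_dump(text):
    """Extract the bugcheck code, its arguments, and the blamed module."""
    summary = {}
    for line in text.splitlines():
        match = BUGCHECK_RE.match(line)
        if match:
            summary["code"] = int(match.group(1), 16)
            summary["args"] = [arg.strip() for arg in match.group(2).split(",")]
        match = CAUSE_RE.match(line)
        if match:
            summary["module"] = match.group(1)
    return summary


if __name__ == "__main__":
    excerpt = (
        "BugCheck 116, {fffffa800cdfe4e0, fffff880042078b8, 0, c}\n"
        "Probably caused by : atikmpag.sys ( atikmpag+78b8 )\n"
    )
    info = summarize_dump(excerpt)
    print(hex(info["code"]), info["module"], info["args"])
```

Keep in mind what was said above: for a 0x116 the blamed module is nearly always the display driver, so the summary tells you which crashes are TDR failures, not what the root cause is.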

In most cases of 0x116 BSODs, the first thing I recommend, as mentioned above, is uninstalling and reinstalling the video card drivers. If the user is on the latest version, roll back a version or two to see if the issue disappears. If the user is not on the latest, update to the latest OR a beta if available.

Well, as it turns out, this case was as simple as an uninstall and reinstall of the latest available video card drivers for the user's specific GPU. Unfortunately, in most cases it's not that easy: the issue is usually hardware related, which takes some patience along with trial and error.

2 comments:

  1. Thanks for instructions. I had 116 BSOD issue which was making me mad. I tried everything, including format HDD, reinstall OS and change BIOS. No effect. On the end I was thinking about new motherboard, but luckily I had an idea, what if it's only a power issue. I changed my GPU supplemental power from 2 x 4-pin peripheral cable with PCI-e adapter to actual PCI-e to PCI-e cable and it works now as a charm. I guess usual plug on my power supply was not able to provide enough current, but PCI-e plug does :)

    1. My pleasure!

      Great to hear you've solved your issue : )