24th December 2007, 04:41 AM
Chaclar's Guide to Troubleshooting BSOD
Well, in the tech help section, I’ve noticed a number of threads asking for help with blue-screen errors, so I decided to write up this (kinda) quick and (sorta) easy troubleshooting guide for the most common ones.
Note: this guide should also apply to Windows Vista BSODs. As far as I know, Microsoft hasn't changed many of the error codes since the Windows NT days, so I don't imagine they changed them for Vista, but since I don't use Vista, my knowledge of it is very limited.
Introduction and Disabling the auto restart
A Blue Screen of Death (BSOD) is another name for the Windows STOP error (when a critical error occurs, Windows “stops”, hence the name).
By default, Windows doesn’t display the error screen. It simply reboots automatically, which can be rather annoying if you keep getting an error, as all you see is a brief blue flash, which is remarkably unhelpful for troubleshooting.
Usually, you’ll want to be able to see the error so you can figure out the problem. To do this, right-click the “My Computer” icon on the desktop and choose Properties. This brings up the System Properties window. From there, click the Advanced tab, then click the Settings button under the Startup and Recovery section. Uncheck the “Automatically restart” box, then hit the OK buttons until you’re back at the desktop. The next time the system hits a blue screen error, it will show you the error info so you can start figuring out your problem.
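If you'd rather flip this setting from a script (handy if you look after more than one machine), the checkbox is, to my knowledge, backed by the AutoReboot value under HKLM\SYSTEM\CurrentControlSet\Control\CrashControl, where 0 means "don't restart". Below is a rough Python sketch that builds the standard `reg add` command for that value. The function name is my own, it only actually runs the command on Windows, and it needs an administrator prompt since it writes under HKLM. Treat it as a convenience sketch, not a replacement for the steps above.

```python
import subprocess
import sys

# Registry key holding the crash settings; AutoReboot = 0 means
# "stay on the blue screen instead of restarting".
CRASH_CONTROL_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\CrashControl"


def build_disable_autoreboot_cmd():
    """Return the reg.exe command line that sets AutoReboot to 0."""
    return [
        "reg", "add", CRASH_CONTROL_KEY,
        "/v", "AutoReboot",  # value name
        "/t", "REG_DWORD",   # value type
        "/d", "0",           # 0 = don't restart automatically
        "/f",                # overwrite without prompting
    ]


if __name__ == "__main__":
    cmd = build_disable_autoreboot_cmd()
    if sys.platform == "win32":
        # Must be run from an administrator prompt to write under HKLM.
        subprocess.run(cmd, check=True)
    else:
        print("Not on Windows; the command would be:")
        print(" ".join(cmd))
```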
There are many different types of errors, and most can have several different causes. A BSOD error screen has a few important parts:
- The error name – this is near the top, spelled out in all caps with underscores for spaces. It tells you exactly what the error was; sometimes this is enough to find the cause, but usually you need to dig deeper.
- Troubleshooting advice – this sits right under the error name and offers some basic, though sound, troubleshooting advice. The things listed here are the first things you should try.
- Technical information – this is probably the most useful part of the error screen. It lists the actual error code (in an ever-so-understandable hexadecimal format), along with other information about the system state when the error occurred, which varies depending on the error.
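To make the technical information easier to work with, here's a small Python sketch that pulls the stop code and its parameters out of a STOP line you've copied down by hand, and names the code if it's one this guide covers. It assumes you transcribed the line in the usual XP style, `*** STOP: 0x0000000A (0x..., 0x..., 0x..., 0x...)`. The function and table names are made up for illustration, and the table only lists the codes given in this guide.

```python
import re

# Stop codes covered in this guide, keyed by their numeric value.
KNOWN_STOP_CODES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x000000D1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    0x00000023: "FAT_FILE_SYSTEM",
    0x00000024: "NTFS_FILE_SYSTEM",
    0x0000002E: "DATA_BUS_ERROR",
    0x0000007F: "UNEXPECTED_KERNEL_MODE_TRAP",
}

# "STOP:", an 8-digit hex code, then the parameters in parentheses.
STOP_LINE = re.compile(r"STOP:\s*(0x[0-9A-Fa-f]{8})\s*\(([^)]*)\)")


def parse_stop_line(line):
    """Return (code, name, params) from a transcribed STOP line, or None."""
    match = STOP_LINE.search(line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    # int(..., 16) accepts the "0x" prefix; skip empty entries.
    params = [int(p.strip(), 16) for p in match.group(2).split(",") if p.strip()]
    name = KNOWN_STOP_CODES.get(code, "UNKNOWN (not covered here, Google it)")
    return code, name, params


if __name__ == "__main__":
    print(parse_stop_line(
        "*** STOP: 0x000000D1 (0x00000000, 0x00000002, 0x00000000, 0xF86B5A89)"
    ))
```

Anything not in the table comes back as unknown, which is your cue to punch the code into Google.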
Common Error codes
IRQL_NOT_LESS_OR_EQUAL or DRIVER_IRQL_NOT_LESS_OR_EQUAL - 0x0000000A or 0x000000D1
These are probably the most common BSOD errors, and they have several possible causes, which often makes them a royal PITA to troubleshoot.
First, if you are overclocking anything, crank it back to stock speeds. Anything overclocked can cause errors in something that seems completely unrelated.
Another cause is something wrong with a hardware driver.
If you installed something new recently, like a new webcam, good money says that’s your problem. Try removing the device and uninstalling the driver to see if the error goes away.
Often, a filename is listed in the technical information part of the error. This is the file that caused the error, and it can go a long way toward finding the cause. A good idea in this case is to write down the info given, punch the listed file name into Google, and find out what the file is. For example, if you get an error that references “nv4_disp.dll”, a quick Google search will tell you that the file is part of the NVIDIA ForceWare display driver, so your video card or its driver is the cause of the problem. If you just changed video cards (especially if you switched brands, such as from an NVIDIA-based card to an ATI-based card), some remnant of the old drivers has probably been left behind and is causing trouble (call it the company's revenge for abandoning them). In this case, you need a tool to completely nuke the old drivers. One of my favourite tools for this is the obviously-named Driver Cleaner. Just run it, select the driver you want to get rid of, let it clean, then reboot.
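If you've copied a lot of technical information down, a quick way to pull out the file names worth Googling is a regular expression that looks for anything ending in .sys, .dll or .exe. This Python sketch does exactly that. The function name is hypothetical, and the sample line in the usage is illustrative (the addresses and date stamp are made up).

```python
import re

# File names ending in .sys, .dll or .exe, e.g. "nv4_disp.dll".
MODULE_NAME = re.compile(r"\b([\w.-]+\.(?:sys|dll|exe))\b", re.IGNORECASE)


def find_module_names(technical_info):
    """Return each module file name once, in the order it first appears."""
    found, seen = [], set()
    for name in MODULE_NAME.findall(technical_info):
        key = name.lower()  # treat NV4_DISP.DLL and nv4_disp.dll as the same file
        if key not in seen:
            seen.add(key)
            found.append(name)
    return found


if __name__ == "__main__":
    sample = "*** nv4_disp.dll - Address BF9D5E3A base at BF9D1000, DateStamp 4551e1a4"
    print(find_module_names(sample))  # prints ['nv4_disp.dll']
```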
If the file referenced by the error turns out to be related to the sound driver, and the error occurs while you are playing a game, such as an FPS, check the game's sound settings. This error occurs quite often when the sound options are set to features the sound card doesn’t support, such as EAX (or EAX2, for that matter) on most onboard sound chips. I personally get this all the time when I forget to set the sound options in a new game (I have lousy onboard audio, as I'm too cheap to get a "real" sound card). Try setting the sound to a software mode or safe mode (the naming varies between games), and it’s a good bet your problem will be gone.
If the file mentioned is "dump_wmimmc.sys", congrats, you're experiencing a bug in GameGuard. Check the GameGuard folder for any .erl files and send them to email@example.com with as much other info as you can (when you get the blue screen, the error code, etc.).
Sometimes (fairly often, really) the file referenced turns out to be part of a spyware/adware program. Download Spybot S&D and Ad-Aware, run them, and nuke the program the error points to.
Another possible cause of the error is running more than one anti-virus utility. For various reasons, most anti-virus programs do not play nice with others, so choose one and run only that. The same goes for software firewalls and backup utilities (this usually only applies to ones that run scheduled backups to an external drive or something, but it is still a good idea to pick one and use it rather than deal with 2 to the nth utilities that do the same thing).
If you want my opinion on which to use, I personally use AVG Free for antivirus, Nero BackItUp for backups, and ZoneAlarm Free for my firewall.
This error can also often be caused by overheating, so refer to the UNEXPECTED_KERNEL_MODE_TRAP listing for further advice on fixing that.
If all this fails, it's time to take the computer into the shop, as you’re likely looking at a hardware failure. It could be pretty much anything in the case, so the tech will need to play parts-swap to figure out what needs to be replaced.
DATA_BUS_ERROR – 0x0000002E
This error tells you that something is wrong with your RAM. It is either:
A) misconfigured, such as running at too high a clock speed or with too-aggressive timings, or (on some motherboards that are picky about it) installed in the wrong slot; or
B) defective. If you just installed new RAM, odds are good that's your problem.
For the first cause, go into the BIOS (look at the bottom of the screen while the computer is booting; it will usually say something like “Press F2 to enter SETUP”), then look around for settings regarding memory. Make sure the memory clock speed or the FSB (Front Side Bus) is set to automatic and the memory timings are set to either “automatic” or “by SPD”. Then save the changes, exit the BIOS, and go back into Windows to see if the error is gone. Keep in mind that most OEM computers (Dell, HP, etc.) will not have these options available. They'll be hidden (to keep inept people from fooling with them) and locked at the automatic settings. If that's the case, this step checks out and you can continue with the next one.
If the error is still there, crack open the manual for the motherboard and look for information on installing memory. Open up the case and check whether the way your memory is arranged matches what the manual says. If not, rearrange it to match.
Then shut the case and boot up to see if your error is gone.
If the error persists and you installed new RAM recently, open up the case again, pull the new stick out, then boot up and see if the error is gone.
If not, try removing the other stick and putting the new one back in. If you have more than 2 sticks of RAM in your system, you’ll have to keep swapping them, leaving one in the machine at a time to test each. In this instance, a memory-diagnostic program such as memtest86 (google it!) is helpful.
Also, try moving the RAM into different slots, as it is entirely possible (though rare) for a slot to stop working, and the symptoms are damn near identical to bad RAM.
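When you're juggling several sticks and slots, it helps to keep score. The Python sketch below is one hypothetical way to do that: you record, for each stick/slot combination you tried, whether the machine ran clean (or passed memtest86), and it flags any stick that failed in every slot it was tried in, plus any slot where everything failed even though those same sticks worked elsewhere. It's a rule of thumb built on the swap procedure above, not a definitive diagnosis, and the names are my own.

```python
from collections import defaultdict


def isolate_bad_ram(results):
    """Rough heuristic for the stick/slot swap test.

    results maps (stick, slot) -> True if that combination ran clean
    (or passed memtest86), False if it crashed or failed.
    Returns (suspect_sticks, suspect_slots).
    """
    by_stick = defaultdict(list)
    by_slot = defaultdict(list)
    for (stick, slot), passed in results.items():
        by_stick[stick].append(passed)
        by_slot[slot].append(passed)

    # A stick that failed in every slot it was tried in is the likely culprit.
    suspect_sticks = sorted(s for s, runs in by_stick.items() if not any(runs))
    good_sticks = {s for s, runs in by_stick.items() if any(runs)}

    # A slot where every test failed, even with sticks that worked elsewhere,
    # points at the slot rather than the RAM.
    suspect_slots = sorted(
        slot for slot, runs in by_slot.items()
        if not any(runs)
        and any(st in good_sticks for (st, sl) in results if sl == slot)
    )
    return suspect_sticks, suspect_slots


if __name__ == "__main__":
    # Hypothetical results: stick "B" fails everywhere, slot 3 fails with everything.
    tests = {
        ("A", 1): True, ("A", 3): False,
        ("B", 1): False, ("B", 2): False,
        ("C", 2): True, ("C", 3): False,
    }
    print(isolate_bad_ram(tests))  # prints (['B'], [3])
```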
NTFS_FILE_SYSTEM or FAT_FILE_SYSTEM – 0x00000024 or 0x00000023
These errors are slightly different but both scream in, literally, all caps that there is something wrong with your file system, hard drive or accompanying components.
Before you do anything else, back up everything of importance on your computer. Do it RIGHT NOW! This error is sometimes the first warning of a failing hard drive, so better safe than sorry.
The most commonly overlooked cause of this problem is a loose cable, especially first-generation SATA cables. Those things come loose ridiculously easily, though the design has since been revised to fit more securely using a locking thingy, which they really should have done in the first place.
Simply unplug the cable at both ends, line it back up (make sure you put it in the right way! It only fits one way.) and push it in firmly. Also check that the power connector is firmly in place. If you have a SATA drive, make sure that only 1 power connector is plugged in, as some drives (WD drives in particular) have both the old 4-pin Molex connector and the new SATA power connector (which also has a tendency to come loose; why they didn't catch this when designing the bloody things, I don't know), and it can cause problems if both are plugged in. Then try booting the machine up again to see if the problem is gone.
If not, then try swapping out the cable with a cable that you know is good.
If that doesn’t fix it, open the Run box in Windows (or, if Windows isn’t booting, choose the “Safe Mode with Command Prompt” option from the boot menu; just mash F8 repeatedly while the computer is booting) and type “chkdsk /f /r”, without the quotes. Make sure you use the correct slash (it’s the one on the same key as the question mark). If you’re in Windows, it will ask whether you want to schedule the check for the next boot. Answer yes (type Y), then reboot the computer. When it reboots, it will start a scan of the hard drive. The /f tells it to look for and fix any file-system errors, and the /r tells it to also scan the drive surface for bad sectors and mark any it finds. The same check can also be run by booting off your Windows XP CD and selecting the Recovery Console if the machine won't boot off the hard drive.
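For reference, here's the same chkdsk command built from Python, in case you want to script it. It's a minimal sketch: the function name and its parameters are hypothetical, it only runs the command when on Windows, and on the system drive chkdsk will still ask (Y/N) whether to schedule the check for the next boot.

```python
import re
import subprocess
import sys


def build_chkdsk_cmd(drive="C:", fix_errors=True, scan_surface=True):
    """Build a command line such as ["chkdsk", "C:", "/f", "/r"]."""
    if not re.fullmatch(r"[A-Za-z]:", drive):
        raise ValueError("drive should look like 'C:'")
    cmd = ["chkdsk", drive.upper()]
    if fix_errors:
        cmd.append("/f")  # fix file-system errors
    if scan_surface:
        cmd.append("/r")  # scan the surface for bad sectors
    return cmd


if __name__ == "__main__":
    cmd = build_chkdsk_cmd("C:")
    if sys.platform == "win32":
        # On the Windows drive, chkdsk asks (Y/N) to schedule the scan for the next boot.
        subprocess.run(cmd)
    else:
        print(" ".join(cmd))  # prints chkdsk C: /f /r
```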
Also, if you use an add-in card for your hard drives, such as a RAID controller, make sure you have the latest drivers for it. The same goes for the built-in controller on the motherboard (for this one you want the chipset and/or DMA/UltraDMA drivers). These go by various names, such as "Hyperion" drivers or "4-in-1" drivers, depending on who made the motherboard/chipset/controller card. For RAID controllers, updating the BIOS on the card might also help solve some problems. Installation procedures vary between manufacturers, so read the bloody manual and do as it says.
UNEXPECTED_KERNEL_MODE_TRAP – 0x0000007F
This error most commonly happens if you’re overclocking the system too far. If you are, set it back to stock speeds. But there are other possible causes if you are not overclocking.
The second most common cause is that some component is overheating. About the simplest way to test this is to take the side panel off the computer, point a room fan to blow into it, then go back to what you were doing when the system crashed and see if the error returns. If it doesn’t, look at your cooling system. Are the fans clogged with dust? Are there both intake and exhaust fans? Are they balanced in terms of airflow? Are any of them blocked? Do all the fans on the various components (video card, CPU, Northbridge chip, etc.) work? If all of those check out, it would be a good idea to add another fan or two, or to replace one or two of the fans with more powerful ones.
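To answer the "are they balanced?" question with numbers, add up the rated airflow (in CFM, usually on the fan's label or spec sheet) of your intake fans and compare it with your exhaust fans. The Python sketch below does that arithmetic; the function name, the 5 CFM tolerance and the fan figures in the example are assumptions for illustration, not a rule from this guide.

```python
def airflow_balance(intake_cfm, exhaust_cfm, tolerance_cfm=5.0):
    """Compare total intake vs exhaust airflow, both given as lists of fan CFM ratings."""
    intake = sum(intake_cfm)
    exhaust = sum(exhaust_cfm)
    diff = intake - exhaust
    if abs(diff) <= tolerance_cfm:
        verdict = "roughly balanced"
    elif diff > 0:
        verdict = "more intake than exhaust (positive pressure)"
    else:
        verdict = "more exhaust than intake (negative pressure)"
    return intake, exhaust, verdict


if __name__ == "__main__":
    # Hypothetical case: two 40 CFM front intakes, one 38 CFM rear exhaust.
    print(airflow_balance([40, 40], [38]))
```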
If it still blue screens with the fan, check the memory. This error can sometimes be caused by putting memory sticks into the wrong slots, especially on a motherboard that uses dual-channel memory. Check the manual for the proper placement of the memory sticks.
Also, this error is sometimes caused by Norton AntiVirus having issues with kernel memory. A solution to this, which requires registry editing, is listed here.
If all this fails, it's time to take the computer into the shop, as you’re likely looking at a hardware failure, probably the CPU, RAM, or motherboard.
Less common errors
PAGE_FAULT_IN_NONPAGED_AREA – this one usually means faulty hardware, particularly RAM. Follow the same troubleshooting steps as for DATA_BUS_ERROR, though this can also be caused by faulty memory on a video card or even the CPU’s cache.
INACCESSIBLE_BOOT_DEVICE – this one is usually caused by setting the jumpers wrong on a parallel ATA drive (if you have more than one drive on a cable, make sure that one is set to Master and one to Slave; I personally don’t bother with cable select, as it can sometimes cause problems, like this one). Other possible causes include a bad/loose cable (refer to NTFS_FILE_SYSTEM for troubleshooting steps), a boot sector virus, or installing the wrong drivers for the chipset or an add-in drive controller/RAID controller.
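If you want to double-check your jumper settings against the rule above, here's a tiny Python sketch that takes the jumper settings of the drives sharing one PATA cable and lists anything that looks off. The function name and the setting strings ("master", "slave", "cable select") are my own assumptions, and it follows this guide's preference for explicit Master/Slave over cable select.

```python
def check_pata_channel(jumpers):
    """Check the jumper settings of the drives sharing one PATA cable.

    jumpers: e.g. ["master", "slave"]. Returns a list of problems (empty = OK).
    """
    settings = [j.strip().lower() for j in jumpers]
    problems = []
    if len(settings) > 2:
        problems.append("a PATA cable only takes two drives")
    if "cable select" in settings:
        problems.append("cable select in use; this guide prefers explicit Master/Slave")
    if len(settings) == 2 and sorted(settings) != ["master", "slave"]:
        problems.append("two drives on one cable need one Master and one Slave")
    return problems


if __name__ == "__main__":
    print(check_pata_channel(["Master", "Master"]))
```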
VIDEO_DRIVER_INIT_FAILURE – this is usually caused by either installing the wrong drivers for your video card or rebooting your computer before the drivers finish installing. Boot into safe mode and uninstall the current drivers, then reboot normally and install the correct ones, or simply reinstall the drivers if you are certain you have the correct ones.
BAD_POOL_CALLER – this is a driver problem, usually caused by installing faulty or incompatible drivers, such as Windows 98 drivers on Windows XP. Make sure you have the correct and most recent drivers for everything.
This one could also be some nasty spyware/adware, so download Spybot S&D and Ad-Aware, run them, and nuke whatever they find.
PFN_LIST_CORRUPT – this is caused by faulty RAM. Follow the troubleshooting steps for DATA_BUS_ERROR.
MACHINE_CHECK_EXCEPTION – this is caused by overheating, a defective or over-aggressively overclocked CPU, or a faulty or under-powered power supply. Set the CPU speed back to stock, follow the troubleshooting steps for UNEXPECTED_KERNEL_MODE_TRAP, and if that doesn’t work, get a new power supply. And when buying a new power supply, don’t skimp. Get a good, powerful one, minimum 400 watts for a basic modern system (2 optical drives, 1 hard drive, 1 CPU, 1 video card), from a reputable manufacturer such as PC Power & Cooling, Enermax, Antec, or Suntech. You'll definitely want more than that if you have a very recent (made in 2006) video card (if it uses a power connector, it qualifies) or processor, as recent video cards can draw well over 100 watts (the latest NVIDIA card will draw up to 185 watts) and many processors come pretty close to that mark too, to say nothing of everything else in the computer. For these systems, treat 400 watts as a bare minimum; more is always better.
If the PC in question was made by Dell, you will need to buy a specific power supply from PC Power & Cooling (one of the ones wired specifically for Dell computers), as Dell uses a strange wiring scheme for the main connector and there will be fireworks if you connect a normal power supply to a Dell computer. Actually, some new Dell computers even use a bloody proprietary power cable, in addition to the proprietary internal connector wiring (WTF?).
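To put rough numbers on the power supply advice, you can add up the estimated peak draw of each part and leave some headroom on top. This Python sketch does that; the function name is hypothetical, the 30% headroom is my own assumption (not a figure from this guide), the 400 W floor comes from the minimum suggested above, and apart from the 185 W video card, the per-part wattages in the example are illustrative guesses. Check the actual specs for your parts.

```python
def recommend_psu_watts(component_watts, headroom_pct=30, floor_watts=400):
    """Sum estimated peak draw per part, add headroom, and never go below the floor.

    component_watts maps a part name to its estimated peak draw in whole watts.
    """
    total = sum(component_watts.values())
    # Integer ceiling division, so the answer always rounds *up*.
    with_headroom = -(-total * (100 + headroom_pct) // 100)
    return max(floor_watts, with_headroom)


if __name__ == "__main__":
    # Illustrative guesses, except the 185 W video card mentioned above.
    parts = {
        "cpu": 90,
        "video card": 185,
        "motherboard + RAM": 50,
        "drives": 40,
        "fans/misc": 20,
    }
    print(recommend_psu_watts(parts))  # prints 501
```

Rounding up rather than to the nearest watt is deliberate: with power supplies, erring on the high side is the safe direction.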
This is Chaclar's Guide to Troubleshooting BSOD. All rights are reserved to him. I took this from Sleepywood, so it can help you guys. Hope it does!
22nd October 2016, 06:46 PM
I am having an IRQL_NOT_LESS_OR_EQUAL problem on my PC....