How To Fix Windows 7 Crashes, Bugs And Blue Screens


27.04.2011

I am guessing that most of you are reading SoftSailor on a nice Windows 7 system that you love and hate at the same time. For example, right after you finish setting up the computer, with updated drivers, a good antivirus and a few must-have programs installed, something blue appears on your screen. Yes, I am talking about the old faithful blue screen that has been with us since the early days of Windows.

Most of the time the problem can be tracked down with the classic Windows debugger tool, WinDbg. The good thing about this solution is that it is completely free and easy to use. It has been around since the Windows XP era, and an up-to-date version is available for Windows 7. But there is a question that probably bugs you a lot:

Are the blue screens different in all the Windows versions?

According to Andre Vachon, principal development lead for Microsoft: “The latest releases of Microsoft Windows use the same operating system kernel, the same primary interfaces, drivers work on both server and client, and the debugger uses the same debug files. Further, we used the same code base and source tree to compile both 32- and 64-bit versions.”

After reading this I chose to refer to Windows 7 for most of the problems discussed here. That doesn’t mean other versions of Windows behave differently or crash for different reasons. So the first question that needs to be answered is:

Why does Windows 7 crash?

With a new version of Windows appearing every two to three years, the OS has obviously come a long way since the first releases. Just look at the move from 16-bit to 32-bit and now 64-bit and you realize how much progress has been made. But why does Windows still crash? Well, it appears that the reasons are the same as they were in Windows XP.

Windows relies on a protection mechanism that lets multiple applications run at the same time without getting in each other’s way. You have probably heard of it as User Mode and Kernel Mode; it was originally known as the Ring Protection scheme. First let’s talk about Kernel Mode.

Kernel Mode (Ring 0) software has full access to the hardware. It runs at the most trusted level, which means it can execute any instruction and reference any address in the system. A crash in Kernel Mode is a total system failure and the computer needs a reboot. This is where you will find the operating system kernel code and most drivers.

Now let’s talk a little about User Mode. User Mode (Ring 3) software can’t access the hardware directly or reference addresses freely. It has to send its requests through APIs. This protects the whole system even when an application makes an erroneous call or accesses an inappropriate address.

When a crash happens in User Mode you only need to restart the application, not the entire OS. This is where most code runs, from applications like Word to tasks like encoding an MP3.

As you probably guessed, most software runs in User Mode these days. This reduces the chances of applications corrupting system-level software or each other. One thing needs to be pointed out, though: kernel-mode software is not protected from other kernel-mode software.

Here is an example to make it clearer: you install a driver, and that driver writes to a region of memory that is “reserved” for something else. At that moment Windows declares a total system failure. You get what is called a Bug Check, followed instantly by the nice blue screen with all those numbers.

Now let’s talk a little about numbers. They vary, but not by much. From my research and my experience dealing with crashes I have come to a clear conclusion: around 70% of system crashes are caused by third-party drivers operating in Kernel Mode, around 10% come from unknown sources, 10% or so come from bad hardware (around half of these from bad memory) and only 5% come from bad Microsoft code.

Keep in mind that most crashes happen for the same reason they happened before. That is mostly because many admins don’t know how to handle them when they appear, so the same crash comes back to “bug” you again and again. I will show you how to resolve them and what you need to know and use in the battle against blue screens.

System Requirements

For this to work you will need WinDbg. Here are the minimum system requirements for this method:

  • Windows 7/Vista/XP or Windows Server 2008/2003 on 32-bit or 64-bit

  • Hard disk space around 25MB

  • A live internet connection

  • Microsoft Internet Explorer

  • The latest version of WinDbg, which is available as an option in the Windows SDK. The web installer is a file called winsdk_web.exe; it is 498KB in size and free to download.

  • A memory dump (the page file must be on C: for Windows to save the memory dump file)

Now let’s find out how we will install the WinDbg software.

After downloading the Windows SDK and running the Setup wizard, simply select the Debugging Tools for Windows option located under Common Utilities. Now it’s time to configure Startup and Recovery. This is a little annoying but must be done. I found it very non-intuitive to locate the dialog box where you check that your system is set to take the appropriate actions during a BugCheck, such as whether to restart automatically and what size of dump file to save. Here is how it is done.

  1. Select the Start button in the bottom left corner of the screen.

  2. Select Control Panel.

  3. Select the System and Security option.

  4. Notice the different options in the right column. Select System from there.

  5. Select Advanced system settings. This displays the System Properties box.

  6. In the System Properties box select the Advanced tab.

  7. In the Startup and Recovery section, click the Settings button.

Now take a look at the Startup and Recovery dialog box and make sure its settings are correct.

Under System failure do the following:

  1. Check Write an event to the system log.

  2. Check Automatically restart.

  3. Select Kernel memory dump.

  4. Make sure the Dump file box contains %SystemRoot%\MEMORY.DMP

  5. Check Overwrite any existing file to save hard drive space.

With these settings your system saves both a kernel dump file and a minidump file. You will get a minidump for every crash, but only the last kernel dump is kept. Now it’s time to configure WinDbg.
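The same Startup and Recovery settings are stored in the registry, so they can be checked from a script. The sketch below is only an illustration: it assumes the settings live under HKLM\SYSTEM\CurrentControlSet\Control\CrashControl with the value names shown (CrashDumpEnabled = 2 meaning a kernel dump), and the helper names check_settings and read_crash_control are made up for this example.

```python
# Registry key assumed to hold the Startup and Recovery settings.
CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"

# The choices recommended in the steps above, expressed as registry data.
RECOMMENDED = {
    "LogEvent": 1,          # Write an event to the system log
    "AutoReboot": 1,        # Automatically restart
    "CrashDumpEnabled": 2,  # 2 = kernel memory dump
    "Overwrite": 1,         # Overwrite any existing file
}


def read_crash_control():
    """Read the live settings. Works only on Windows (winreg is Windows-only)."""
    import winreg

    values = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        for name in list(RECOMMENDED) + ["DumpFile"]:
            try:
                values[name], _type = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                pass  # value is not set on this machine
    return values


def check_settings(values):
    """Return a list of mismatches between `values` and RECOMMENDED."""
    problems = []
    for name, wanted in RECOMMENDED.items():
        actual = values.get(name)
        if actual != wanted:
            problems.append(f"{name} is {actual!r}, expected {wanted!r}")
    return problems


if __name__ == "__main__":
    import sys

    if sys.platform == "win32":
        print(check_settings(read_crash_control()) or "settings match")
```

An empty list from check_settings means the machine is configured as described above; anything else tells you which checkbox to revisit in the dialog.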

To launch the debugger, follow these steps.

  1. Go to Start.

  2. Select All Programs.

  3. Go to Debugging Tools for Windows.

  4. Select WinDbg.

  5. If you intend to use it frequently, simply pin the program to the taskbar.

OK, hopefully everything has gone well so far. But before you jump in and hunt for the miscreant module in a dump file, you first need to be sure the debugger is ready. The most important part is locating the symbol files for exactly the right version of the operating system.

Symbol files are a byproduct of compilation. When a program is compiled, the source code is translated into code the machine can understand. At the same time, the compiler creates a symbol file: a list of identifiers with their locations in the program and their attributes. The program doesn’t need this information to execute, so it can be taken out and stored in a separate file, reducing the size of the final executable.

As you probably already know, a smaller executable takes up less disk space and loads into memory faster than a larger one. But there is a flip side: when a program causes a problem, the operating system knows only the hex address of the problem. You need something more than that to determine which program was using that memory and what it was trying to do.

This is where Windows symbol tables hold the answer. Having the right symbols loaded is like putting names on a map. Analyzing a dump with the wrong symbols, on the other hand, is like walking around Los Angeles with a map of New York.

Now let’s find out how to configure WinDbg to locate the needed symbols.

I can tell you from the start that there is an incredibly large number of symbol table files for Windows, because every build of the OS, and even every one-off variant, gets its own file. Luckily WinDbg can handle this. All you have to do is point it at the correct path. Launch the program and make the following selections:

  1. Go to File

  2. Select Symbol file path

  3. Enter the path “srv*c:\cache*http://msdl.microsoft.com/download/symbols” and make sure your Windows firewall allows access to msdl.microsoft.com

  4. The folder between the asterisks is where the symbols will be stored for future reference. In my case I chose to keep them in a symbols folder on the c: drive, so my path is “srv*c:\symbols*http://msdl.microsoft.com/download/symbols”
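Because a single typo or stray space in the symbol path is enough to break symbol loading, it can help to build and check the string in code. This is a minimal sketch, assuming the srv*cache*server layout described above; build_symbol_path and parse_symbol_path are hypothetical helper names, not part of WinDbg.

```python
MS_SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"


def build_symbol_path(cache_dir, server=MS_SYMBOL_SERVER):
    """Compose a symbol path of the form srv*<local cache>*<symbol server>."""
    return f"srv*{cache_dir}*{server}"


def parse_symbol_path(path):
    """Split a symbol path into (cache_dir, server), rejecting obvious typos."""
    if " " in path:
        raise ValueError("symbol path must not contain blank spaces")
    parts = path.split("*")
    if len(parts) != 3 or parts[0].lower() != "srv":
        raise ValueError("expected the form srv*<cache>*<server>")
    cache_dir, server = parts[1], parts[2]
    if not cache_dir or not server:
        raise ValueError("cache folder and server must both be present")
    return cache_dir, server


if __name__ == "__main__":
    print(build_symbol_path(r"c:\symbols"))
```

Pasting the printed string into the Symbol file path dialog avoids retyping it by hand each time you set up a new machine.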

Every time you open a memory dump, WinDbg looks at the executable files (.exe, .dll, etc.) and extracts their version information. It then sends a request to Microsoft’s symbol server, which uses the version information to locate the matching symbols. This way the program downloads only the symbols it actually needs, not all of them. You could also download the complete symbol packages from Microsoft, but those run between 600MB and 800MB for each Windows version. By comparison, WinDbg downloads less than 100MB to be able to analyze several versions of the operating system. Hard drive space may be cheap, but I would still stick with the option that uses less of it.

Now let’s talk a little about dump files. A memory dump file is a snapshot of what the system had in memory at the time of the crash. Windows can create memory dumps in three different sizes: minidumps, kernel dumps, and full dumps.

  • Minidump (the small one)

In Windows 7, minidumps are 256KB, which is tiny by any standard, although they have grown since the Windows 2000/XP days when they were only 64KB. They are so small mainly because they don’t contain any of the binary or executable files that were in memory. Those files are nonetheless crucial for analysis by the debugger. If you are debugging on the machine that created the dump file, WinDbg can find them in the System Root folders, provided the binaries were not changed by a system update after the dump file was created. Alternatively, the debugger should be able to locate them through SymServ. If Windows 7 is configured correctly, it saves a minidump for every crash event, even when it also saves a kernel dump.

  • Kernel dump

When it comes to size, a kernel dump is roughly equal to the amount of RAM occupied by the Windows 7 kernel. On my test machine, a notebook, a kernel dump runs around 344MB, or just over 100MB compressed. The main advantage of a kernel dump is that it contains the binaries. I have set the system to always save the latest kernel dump. Note that when it saves one, the system also saves a minidump.

  • Full dump (the complete version)

This one is the biggest. Its size equals the amount of RAM installed. These days, when many systems have many GB of memory, storage can quickly become a problem, and just think of the headache if numerous crashes happen. In most cases I don’t recommend this type of dump, mainly because of the storage space it needs. But people from Microsoft said at one point that: “if you are trying to debug a very complex problem, such as an RPC issue between multiple services in the box and you want to see what the services are doing in User Mode, the full memory dump can be very helpful.” So my advice is to stay with the kernel dump and be ready to generate a full dump if the situation requires it. That leaves another question: what happens if there is no memory dump to work with?

Nobody wants a crash, but this is one situation where you need one. It can be done with some Registry settings, but I must admit that is not my favorite method. I prefer a tool called NotMyFault, made by Mark Russinovich and his team at Sysinternals. It lets you load a misbehaving driver (note that this requires administrator privileges).

This will crash your system. Make sure the machine is ready for it: warn anyone who is using the system to log off within a few minutes, and save any files you will need later. Your computer will go down and write a minidump and a kernel dump that you can then open. In my experience there were no problems and everything should run OK.

Here are the step-by-step instructions.

  1. First download the NotMyFault app, which you will use to force a system crash.

  2. I recommend downloading NotMyFault from the following Microsoft web site: http://download.sysinternals.com/Files/Notmyfault.zip

  3. Run NotMyFault.exe. In some situations the message “You don’t have permission to open this file” could appear. If it does, right-click NotMyFault.exe and select “Run as administrator”.

  4. A menu will appear. Select “High IRQL fault (kernel mode)” and then click the Do Bug button. This produces a “Stop D1” error and generates a memory dump file.

  5. Wait for the system to recover and you will have both a kernel dump and a minidump at your disposal.
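Once the system is back up, a short script can confirm that the dumps were actually written. The sketch below assumes the default locations (MEMORY.DMP directly under the system root and minidumps in its Minidump subfolder) and, as far as I know, that kernel-mode dump files begin with the signature PAGEDUMP (32-bit) or PAGEDU64 (64-bit); treat both as assumptions to verify on your own machine. find_dumps and dump_kind are names invented for this example.

```python
import os
from pathlib import Path

# Leading bytes assumed to identify each kind of dump file.
SIGNATURES = {
    b"PAGEDU64": "kernel-mode dump (64-bit)",
    b"PAGEDUMP": "kernel-mode dump (32-bit)",
    b"MDMP": "user-mode minidump",
}


def dump_kind(path):
    """Classify a dump file by the first bytes of its header."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    for signature, label in SIGNATURES.items():
        if head.startswith(signature):
            return label
    return "unknown"


def find_dumps(system_root):
    """Return existing dump files under system_root, newest first."""
    root = Path(system_root)
    candidates = [root / "MEMORY.DMP"]
    minidump_dir = root / "Minidump"
    if minidump_dir.is_dir():
        candidates.extend(minidump_dir.glob("*.dmp"))
    found = [p for p in candidates if p.is_file()]
    return sorted(found, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    # SystemRoot is normally C:\Windows; fall back to that if it is unset.
    for dump in find_dumps(os.environ.get("SystemRoot", r"C:\Windows")):
        print(dump, dump.stat().st_size, dump_kind(dump))
```

Reading files under the Windows folder usually needs an elevated prompt, so run it as administrator if it reports nothing.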

Now load the dump file. If the message “You don’t have permission to open this file” appears, simply relaunch WinDbg by right-clicking it and selecting Run as administrator.

With the debugger running, select File and then Open Crash Dump, and point it to the memory dump you want to analyze. When WinDbg asks whether to save information for the workspace, select Yes, and be sure to remember where the dump file is.

WinDbg looks for the Windows symbol files for that precise build of Windows. It consults the symbol file path, contacts microsoft.com and displays the results.

If you have the impression that the debugger is a little too busy, don’t worry. This is probably the first time a dump file has been opened, so the symbols are being downloaded from SymServ. The next time a dump is opened on the same machine the program will work faster because the needed symbols are already there.

A Command window will appear; this is where the crash analysis is displayed. At the lower left there is a kd> prompt, and to its right a single-line box where you type all the commands.

One nasty situation is getting the message “*** ERROR: Symbol file could not be found. Defaulted to export symbols for ntoskrnl.exe”. This usually happens for one of the following reasons.

  • You entered an incorrect path. Make sure there are no errors or typos (and no blank spaces) in the address you entered earlier.

  • Your internet connection is down. Make sure it is up and running.

  • Your firewall blocked the download of the symbol files.

  • Some symbols were damaged while being retrieved.

If you are sure the path is correct and the internet connection is solid, the most likely culprit is your firewall. If the firewall initially blocks WinDbg from downloading the files it needs, the result can be a corrupt symbol file. Even after you unblock the firewall the damaged symbol stays in the cache, so nothing changes. I recommend closing the app, unblocking the firewall and deleting the folder in which the symbols are stored. Then reopen WinDbg and the dump file. The program will simply recreate the folder and download the symbols again.
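Deleting the symbol folder can of course be done in Explorer, but if you repeat it often a tiny helper is handy. This is just a sketch: reset_symbol_cache is a made-up name, and the folder you pass in is assumed to be the cache folder you put between the asterisks in the symbol path. Close WinDbg first, and double-check the folder, because everything inside it is removed.

```python
import shutil
from pathlib import Path


def reset_symbol_cache(cache_dir):
    """Delete the local symbol cache so WinDbg downloads fresh copies.

    Returns True if a folder was removed, False if there was nothing to do.
    """
    cache = Path(cache_dir)
    if not cache.is_dir():
        return False
    shutil.rmtree(cache)  # removes the folder and everything inside it
    return True
```

For the path used in this article the call would be reset_symbol_cache(r"c:\symbols").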

Another nasty message you might see is: “***** Kernel symbols are WRONG. Please fix symbols to do analysis.”

Now look through WinDbg’s output. You may see an error message stating that it couldn’t load information for myfault.sys:

“Unable to load image \??\C:\Windows\system32\drivers\myfault.sys, Win32 error 0n2

*** WARNING: Unable to verify timestamp for myfault.sys

*** ERROR: Module load completed but symbols could not be loaded for myfault.sys”

This means just one thing: the debugger went looking for symbol information for myfault.sys and found none. Because myfault.sys is like a third-party driver (it is a Microsoft application, but other people were involved in creating it), there are no symbols for it. This is one message you can safely ignore. In most cases vendors don’t ship symbol files with their drivers. They aren’t necessary for your work, and you can pinpoint the problem without them.

When you open a dump file, WinDbg automatically runs a basic analysis, without you giving the debugger any commands.
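If you have many dump files to go through, the same analysis can be run without the GUI. The sketch below builds a command line for cdb.exe, the console debugger installed alongside WinDbg in Debugging Tools for Windows, using its -z (dump file), -y (symbol path) and -c (initial command) options. The function names and the assumption that cdb.exe is on your PATH are mine; adjust the debugger argument to wherever it is installed.

```python
import subprocess


def build_analyze_command(dump_file, symbol_path, debugger="cdb.exe"):
    """Build the argument list that opens a dump, runs !analyze -v and quits."""
    return [
        debugger,
        "-z", dump_file,          # dump file to open
        "-y", symbol_path,        # symbol path, e.g. srv*c:\symbols*<server>
        "-c", "!analyze -v; q",   # commands to run once the dump is loaded
    ]


def run_analysis(dump_file, symbol_path, debugger="cdb.exe"):
    """Run the debugger and return its text output (Windows only)."""
    result = subprocess.run(
        build_analyze_command(dump_file, symbol_path, debugger),
        capture_output=True,
        text=True,
    )
    return result.stdout
```

A call such as run_analysis(r"C:\Windows\MEMORY.DMP", r"srv*c:\symbols*http://msdl.microsoft.com/download/symbols") returns the analysis text, which you can then save or search.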

Now we will learn about some useful commands.

Commands:

There are far too many commands to cover in one article, and you will learn them as you keep using WinDbg. For this case all you really need is one. OK, maybe one is too few, so let’s raise the stakes to three: !analyze -v, lmv, and lmvm. Let’s take them one by one.

!analyze -v

Simply type !analyze -v on the command line at the bottom of the Command window. Note the space before the “-v”. The v stands for verbose and tells WinDbg you want the details. The output may scare you a little, but don’t worry; it is always like this at the beginning. You might even recognize the cause of the crash right here.

A very important part of the !analyze -v output is the stack text. When you look at a dump file, always look at the far right end of each stack line and watch for third-party drivers. With the NotMyFault driver you will see myfault there. Keep in mind that the stack reads chronologically from bottom to top: each new call is written at the top and pushes the earlier ones down. Even a short stack like this one shows that myfault was active. (In the exhibit that accompanied this article, some data was removed to fit the page, as indicated by the “truncated” comments.)
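The “look at the far right end of the stack” rule is easy to automate. The following sketch takes stack text lines, reads the last column of each and lists the modules that appear, leaving out a few core Windows modules. The sample lines are illustrative only (addresses and offsets are made up, not real debugger output), and the list of core modules is my own assumption, not something WinDbg defines.

```python
import re

# Modules treated as part of Windows itself (an assumption for this sketch).
CORE_MODULES = ("nt", "hal", "win32k")

# Illustrative stack text: most recent call on top, addresses invented.
SAMPLE_STACK = """\
fffff880`0000a000 fffff800`0000b000 : 00000000`0000000a : nt!KeBugCheckEx
fffff880`0000a010 fffff800`0000b010 : 00000000`00000000 : nt!KiPageFault+0x260
fffff880`0000a020 fffff880`0000b020 : 00000000`00000000 : myfault+0x1755
fffff880`0000a030 fffff800`0000b030 : 00000000`00000000 : nt!IopXxxControlFile+0x607
""".splitlines()


def modules_in_stack(stack_lines):
    """Return module names from the far right column, in order of appearance."""
    modules = []
    for line in stack_lines:
        tokens = line.split()
        if not tokens:
            continue
        call_site = tokens[-1]  # e.g. nt!KiPageFault+0x260 or myfault+0x1755
        module = re.split(r"[!+]", call_site, maxsplit=1)[0]
        if module not in modules:
            modules.append(module)
    return modules


def suspects(stack_lines, core=CORE_MODULES):
    """Return the modules on the stack that are not core Windows modules."""
    return [m for m in modules_in_stack(stack_lines) if m.lower() not in core]


if __name__ == "__main__":
    print(suspects(SAMPLE_STACK))
```

Symbol names that contain spaces would confuse this simple last-column split, so treat the result as a hint about where to look, not a verdict.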

Now let’s speak about the analysis made with the lmv command.

This is the step where you confirm the suspect’s existence and find out some details about it. Typing lm on the command line displays the loaded modules. The v modifier tells the debugger to use verbose mode and show all the known details of each module.

In some situations a message telling you the program is busy will appear. Don’t worry: the program is gathering information about the modules that were loaded when the system failed, which may take a couple of minutes. When it is done you will see kd> back where BUSY was.

The amount of information may overwhelm you at first, but don’t get scared. It takes a while to get used to it. Using Edit -> Find to search for myfault is usually the easiest way to get to it. How much information you see depends on the driver vendor. Some vendors put very little information in their files, while others, like Microsoft, include a great deal.

Last but not least is the analysis with lmvm.

This is the best way to get right to a specific module. Just enter lmvm myfault and the debugger returns only the data for that module. Once you have found the vendor’s name, go to its web site and check for updates, knowledge base articles, and other supporting information. If these don’t exist or don’t solve your problem, simply contact the vendor. They will almost certainly ask you to send the debugging information (you can copy it into an email message or a simple Word document). In some cases they might even ask for the memory dump itself. In that case I recommend zipping it to protect the data integrity.

In most cases (two out of three) you will know the cause as soon as you open the dump file. But what about the remaining third? That is where the information provided is either misleading or insufficient. What do you do then?

Keep in mind that sometimes the problem is the hardware. If you have numerous crashes with no clear cause, it may be a memory problem. Simply download the free test tool called Memtest86; with it the diagnosis is quick and easy. Many people can’t believe the problem could be their memory, because it happens so rarely. But in those “strange” cases where crashes keep appearing, bad memory is often the reason.

How about the times when Windows itself is to blame? It is a common belief, but trust me when I say: not likely! Surprising as it may seem, the OS is usually not at fault. If the debugger blames ntoskrnl.exe (the Windows core) or win32k.sys, don’t take it at face value. It is more likely that some errant third-party device driver called on a Windows component to perform an operation and then gave it a wrong instruction, such as telling it to write to non-existent memory. I am not saying Microsoft is never to blame, just that in most cases it is not the guilty party.

Another trap is the wrong driver being named, which often happens with antivirus software. For example, after you run !analyze -v, the debugger reports a driver from your antivirus program on the “IMAGE_NAME” line. The result seems obvious, but this driver gets blamed more often than it deserves. For an antivirus to work it has to watch every file being opened and closed, so it has to run around the clock. It is so busy that when a crash occurs, the antivirus is likely to be on the stack, touching that very file or process. And keep in mind that any third-party driver on the stack immediately becomes a suspect, so it will often get named. You can see how frequently this happens and why it is a nasty problem.

And how about missing vendor information? Yes, this can happen, mostly because some driver vendors don’t take the time to include all the needed information with their products. If lmv doesn’t help, a good option is to look at the subdirectories in the image path, when one is shown. In most cases one of them will be the vendor name or a contraction of it. Also don’t forget the search engine everybody has been using for years: Google. Simply type in the driver name or even the folder name and you will be amazed at the amount of information at your disposal. You will find the vendor and also other people who have posted useful information about the driver.
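The “look at the subdirectories of the image path” trick can be sketched in a few lines. This example strips the \??\ prefix that the debugger sometimes prints, splits the path into folders and drops the generic Windows ones, leaving the names most likely to hint at a vendor. The function name, the list of generic folders and the ExampleVendor path are all hypothetical and only illustrate the idea.

```python
from pathlib import PureWindowsPath

# Folder names that say nothing about the vendor (an assumed, partial list).
GENERIC_FOLDERS = {
    "windows", "system32", "syswow64", "drivers",
    "program files", "program files (x86)",
}


def vendor_hints(image_path):
    """Return folder names from image_path that may name the vendor."""
    prefix = "\\??\\"
    if image_path.startswith(prefix):
        image_path = image_path[len(prefix):]
    path = PureWindowsPath(image_path)
    folders = [part for part in path.parent.parts if part != path.anchor]
    return [f for f in folders if f.lower() not in GENERIC_FOLDERS]


if __name__ == "__main__":
    # Hypothetical driver location used only for illustration.
    print(vendor_hints(r"C:\Program Files\ExampleVendor\Filter\exflt.sys"))
```

An empty result, as you would get for a driver sitting directly in system32\drivers, means the path holds no clue and a web search on the driver name is the next step.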

Summary

We have come a long way in understanding the reasons behind the BSOD. The thing to remember is that in most cases you can open a dump file and find the reason for a crash in just a couple of minutes. Being able to do that, and to act on it, is a must-have skill for computer specialists and even for you, the person who just uses the computer on a daily basis.

With that said, I hope you never encounter crashes, and if you do, remember where you learned how to handle them: right here at SoftSailor, your favorite “how to” web site.
