
Dr. Watson: Please send in your error report

I bet you’ve seen dialogs similar to this:

 

 

We at Microsoft refer to this dialog/technology as Dr. Watson (after Sherlock Holmes’s famous assistant).

 

Before Watson, computer users would get the BSOD (the Blue Screen of Death), which would just say that an error had occurred and that Windows would have to shut down. Microsoft had no idea how often these occurred or how to solve them.

 

When you choose to send Watson errors to Microsoft, a small amount of information is sent, from which we can (hopefully) figure out the cause of the problem and possibly even point the user to a solution. This information includes things like what program was running, what modules (DLLs, or Dynamic Link Libraries) were loaded for that process, and the contents of the CPU registers.
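
To make the kind of information described above concrete, here is a minimal sketch of what such a report could look like as a data structure, and how incoming reports might be counted per program. The `CrashReport` class, its field names, the `count_by_program` helper and the sample values are all hypothetical illustrations; this is not the actual Watson report format.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CrashReport:
    """Hypothetical report shape, NOT the actual Watson format."""
    program: str    # the program that was running
    modules: list   # names of the DLLs loaded for that process
    registers: dict = field(default_factory=dict)  # CPU register contents


def count_by_program(reports):
    """Count how many reports arrived for each program."""
    return Counter(report.program for report in reports)


# Made-up sample data, for illustration only.
reports = [
    CrashReport("winword.exe", ["ntdll.dll", "kernel32.dll"], {"eip": 0x77F51234}),
    CrashReport("winword.exe", ["ntdll.dll", "user32.dll"], {"eip": 0x77F51234}),
    CrashReport("notepad.exe", ["ntdll.dll"], {"eip": 0x01001A2B}),
]

print(count_by_program(reports).most_common())
# [('winword.exe', 2), ('notepad.exe', 1)]
```

Even a simple count like this answers the question that could not be answered before: how many times a given failure occurs.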

 

If the error occurs in a third party product, we can forward that information to the third party.

 

I’ve received several Watson dialogs. After I choose Send, many times I’ve been pointed to a web site with updated device drivers that can cure the problem.

 

Windows XP, and in fact all versions of Windows since 3.11, run applications in protected address mode. This means that an errant application (or program) cannot disturb the memory of another running application. These applications run in what’s called User Mode. The processor enforces this application protection.

 

However, certain programs run in a non-protected mode called Kernel Mode, in which a program can do nasty things like corrupt the memory of other running programs. Kernel Mode also allows the program to access hardware directly and perform other operations that User Mode programs are not allowed to do. (One of my old computers had a HALT instruction… nasty nasty.)

 

This separation between User Mode and Kernel Mode made operating systems much more resilient against errant applications. However, Device Drivers run in Kernel Mode, and thus have full power over the memory of the machine. If an error occurs in a Kernel Mode program, it’s likely that the machine needs to shut down to prevent any additional damage. If it’s in User Mode, then just closing that process is sufficient to resume.
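
The User Mode half of that claim can be demonstrated from any scripting language. The sketch below, which assumes nothing beyond the Python standard library, starts a child process that deliberately aborts itself (standing in for an errant application) and shows that the parent simply carries on. The names `CRASHING_CHILD` and `run_and_survive` are made up for this example.

```python
import subprocess
import sys

# The child deliberately aborts itself, standing in for an errant application.
CRASHING_CHILD = "import os; os.abort()"


def run_and_survive():
    """Run the crashing child in its own process and return its exit status."""
    result = subprocess.run(
        [sys.executable, "-c", CRASHING_CHILD],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


status = run_and_survive()

# The parent process is still running and can report on the failed child.
print("child ended abnormally:", status != 0)
```

The exact exit status differs between operating systems, so the sketch only checks that it is nonzero. Nothing comparable is possible for Kernel Mode code: there is no outer process left to do the cleaning up.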

 

I attended a talk by BillG a while ago in which he said that something like 80% of all Watson-reported errors were due to device driver issues, particularly video drivers. This implies that the errors were due to third party device driver authors, but perhaps it also means that it’s not easy to write a device driver that is bug free.

 

In the old days, a user could run only a single program at a time. I could run a word processor. If I wanted to run another program, I would exit the word processor and start another program. These monolithic programs had full control of the entire computer: any error would be the fault of the program’s author. However, this also meant that the author would have to handle all printers, input devices (like mice), output to the screen, and reading and writing files on the disk. Operating Systems were developed to help programs handle many of these tasks. They also helped unify the way things were done on the computer: if all programs wrote to the disk/printer/screen in the same way and handled the keyboard/mouse/pen the same way, then the user would get a consistent user interface.

 

I wrote a cartoon animation program around 1982 on an original 4.77 MHz IBM PC (it’s reincarnated in Visual FoxPro: start the Task Pane, Solution Sample, Forms, Form Graphics, Line Animation). You could draw a single picture, save it, draw another picture, save it, then choose to animate, which would morph one picture to the other using interpolation.
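
As a rough sketch of the morphing idea (not the original 1982 code), the snippet below blends one picture into another by interpolating each point. It assumes linear interpolation and that both pictures are lists of (x, y) points of the same length in corresponding order; the function names `lerp_point` and `morph_frames` and the sample shapes are made up for illustration.

```python
def lerp_point(p, q, t):
    """Linearly interpolate between points p and q; t=0 gives p, t=1 gives q."""
    return (p[0] + (q[0] - p[0]) * t, p[1] + (q[1] - p[1]) * t)


def morph_frames(start, end, steps):
    """Return steps + 1 frames that move every point of start toward end."""
    if len(start) != len(end):
        raise ValueError("both pictures need the same number of points")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    frames = []
    for i in range(steps + 1):
        t = i / steps
        frames.append([lerp_point(p, q, t) for p, q in zip(start, end)])
    return frames


# Two made-up pictures with four points each.
square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
diamond = [(2.0, -1.0), (5.0, 2.0), (2.0, 5.0), (-1.0, 2.0)]

frames = morph_frames(square, diamond, 4)
print(frames[2][0])  # first point of the halfway frame: (1.0, -0.5)
```

Drawing each frame in turn gives the animation; more steps give a smoother morph.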

 

I wanted to add a mouse interface to the program, so I bought a product called Visi-On from VisiCorp (the company behind VisiCalc), which came with a mouse. I had to decode the signals that were sent by the mouse using an oscilloscope to figure out how it communicated with the PC. Then I had to write my own pull-down mouse menus and define the mouse behavior.


Try this: click down, then release on any menu from any application in Windows. Now move the mouse around the screen away from the menu. Should the menu disappear when the mouse goes off the menu? Try clicking down on a menu item. Don’t release the mouse. Should the menu item be chosen? What if you then drag the mouse off the item before you release?

 

Nowadays, a typical application can have dozens of modules loaded in its process space. For example, I just started Microsoft Word. I have Visual Studio installed, so I can bring up Task Manager, right-click on the WinWord process, and choose Debug. VS starts up and I can choose Debug->Windows->Modules to see that there are over 100 modules loaded!

The interactions, interfaces and assumptions made between these modules are not trivial. With multiple authors and possibly ambiguous specifications, the complexity skyrockets.

 

 

Sending in these errors is very beneficial to software users and Microsoft. Given these error reports, we can see actual computer problems experienced by the user and fix real world issues.

Published Monday, June 28, 2004 1:15 PM by Calvin_Hsia


Comments

# re: Dr. Watson: Please send in your error report

Monday, June 28, 2004 1:51 PM by Mark R. Sizer
The tone of this is strange:

Software is easier to write now because one doesn't have to have an oscilloscope to use the mouse, yet software is hard to write because of the number of interactions going on "behind the scenes". So developing software is complicated - so is building a skyscraper, yet they rarely crash (although they are sometimes "buggy").

It's not all that difficult to write software with very few bugs in it - it's just VERY expensive.

OK, so it's difficult to write correct device drivers. There are two possible sources of this problem: The framework for writing them is too difficult (i.e. it's MS's fault) or the people writing them are not good at what they do (i.e. it's the author's fault).

In either case, it's the person who loses their work because their machine mysteriously crashes who pays the price. As a user, I don't see any difference between a crashing process and a crashing machine. I've still lost what I'm doing. The "restart" is slightly faster if it's only a process that goes down but anyone who keeps working after having one process crash has not been using MS products for very long. I don't know what really happens, but one crashing program can indeed corrupt the system. If anything goes down, reboot immediately.

There is one very simple solution to crashing software, regardless of the cause: Unleash the lawyers. If software authors were held accountable for losses caused by their faulty products, there would be a dramatic increase in software quality (cost, too, of course - it's a trade-off). It would also be a good way for MS to battle open source: With open source there's no one to sue if things go badly wrong.

# re: Dr. Watson: Please send in your error report

Monday, June 28, 2004 2:29 PM by Rob Mensching
Actually, to be completely accurate that dialog is of "DAD Watson" which replaced Dr. Watson in Windows XP. AFAIK, Dr. Watson was around since the first version of Windows and it definitely could not send crash reports over the internet. Also, the "DAD Watson" name was a play on the old "Dr. Watson" name and the technology was better known by the nickname "DW" (sister to Arthur in some cartoon somewhere). AFAIK, "DAD Watson" has nothing to do with Alexander Graham Bell, but I still kinda' like your story

# re: Dr. Watson: Please send in your error report

Monday, June 28, 2004 2:34 PM by anon
I thought Dr. Watson came from Sherlock Holmes. Since it helps the investigator.

# re: Dr. Watson: Please send in your error report

Monday, June 28, 2004 4:18 PM by CalvinH [MS]
Yes: I'm wrong: it's Sir Arthur Conan Doyle we have to thank. Too many Watsons!<g>

# re: Dr. Watson: Please send in your error report

Tuesday, June 29, 2004 6:27 AM by Bob Archer
Mark,

Sure, and whenever a hard drive fails we can sue the maker for lost data and time. And whenever a power supply fails we can sue the PS maker. Heck, whenever your car breaks down you can sue the manufacturer for lost wages too.

You take a risk in anything you do. I bet 99% of the software manuals tell people to save and back up often. I know ours do. But people DON'T do it. Why? Then when something happens it's not their fault, it's the programmer, or the PC, or the weather. Give me a break.

Really, the lawyers have enough blood to suck with malpractice suits and people that get burned because their coffee was hot. If more people took personal responsibility the world would be a better place.

BTW: I don't disagree that the quality of software could be higher. There are proven methodologies that can be adopted, such as test-driven development (TDD), shorter development cycles, and iterative cycles rather than the waterfall method. But, as you point out, some of these things cost money, which would raise the price of software, which customers do NOT want to pay for, no matter how much they whine about the bugs. Would you pay $5000 - $10,000 for Windows if it were "bug free"?

# What is a C0000005 crash?

Wednesday, June 30, 2004 7:36 PM by Calvin_Hsia's WebLog

# Is a process hijacking your machine?

Monday, October 18, 2004 3:04 PM by Calvin Hsia's WebLog

# re: Dr. Watson: Please send in your error report

Thursday, April 27, 2006 10:12 AM by Racine
how to stop error reports from coming

# re: Dr. Watson: Please send in your error report

Tuesday, May 02, 2006 4:01 PM by mariyah
where do i send the report? help!
