People are being killed and injured by software and embedded systems. But IS managers can help prevent high-tech manslaughter by ensuring safety and high quality through rigorous testing and preparation.

An intravenous medication pump runs dry and injects air into a patient. A monitor fails to sound an alarm when a heart stops beating. A digital display combines the name of one patient with medical data from another. A respirator delivers "unscheduled breaths" to a patient.

Those scary scenarios have two things in common: All the devices were recalled under orders from the U.S. Food and Drug Administration (FDA), and all owed their potentially deadly performances to flaws in the software that controls them.

The FDA has issued 20 such recalls so far this year. The agency says there isn't any evidence that the quality of software is getting worse, but defects are on the rise as software becomes pervasive in medical devices. "We are seeing more software problems," says E. Stewart Crumpler, a software specialist at the FDA's Center for Devices and Radiological Health. "Have there been deaths and injuries due to software? You betcha."

Experts say the software that controls medical devices, nuclear power plants, airplane flights and the like often isn't up to the high safety standards demanded of physical entities such as cars, appliances and lawn mowers.

The failure to make safety-critical software safe is largely a failure of management, experts say. Managers tend to overestimate the quality of software; they frequently don't understand safety and quality concepts; and their priorities too often lie elsewhere. "Managers are so focused on short-term costs, they fail to include the real-world costs of shoddy software," says M. E. Kabay, director of education at the National Computer Security Association in Carlisle, Pa. Software development managers "don't really integrate product quality into their thinking."
"The number of accidents in high-tech systems seems to be increasing," says Nancy Leveson, a computer sciences professor at the University of Washington and author of Safeware System Safety and Computers. "These incidents happen all the time, but because of liability, nobody discusses them in public." Software errors have been triggering accidents for years, including the following:
ROOM TO IMPROVE

"Safety-critical software is nowhere near as good as it could be," says Peter Neumann, principal scientist at SRI International, Inc. in Menlo Park, Calif., and moderator of the Risks Forum newsgroup on the Internet. "The run-of-the-mill software developer does not have the slightest idea what the risks are or what techniques to use to ensure reliability and safety," he says.

Leveson says complacency plays a key role in most major accidents. Developers of missiles or nuclear power plants wouldn't buy their hardware at a thrift shop, she argues, but they're apt to do the equivalent with software by building safety-critical applications on top of buggy commercial operating systems. "They have this tremendous confidence in technology they don't understand," she says.

Another common management error is the failure to establish an effective safety information system that tracks hazards, accidents and trend data, Leveson says. As a result, many organizations with smart and well-meaning people nevertheless lack the information needed to build safety into systems.

Baltimore Gas and Electric Co. doesn't employ any unusual techniques to develop the software that runs its two nuclear reactors. But it takes special pains with requirements definition, quality reviews, testing, documentation and configuration control, says Gary Spurrier, account director for nuclear information technology services at Baltimore Gas. Software from the internal information systems group and from vendors must be developed in compliance with federal Quality Assurance Criteria for Nuclear Power Plants. "When we choose a new vendor, we send an audit team to make sure they have a software quality assurance program and it meets the regulations," Spurrier says.

The utility's IS group has an alter ego in its nuclear design engineering group. IS conducts design reviews, code reviews and systems tests, and the design engineers also perform those functions.
Thus, software defects have two chances of being caught.

Hewlett-Packard Medical Group in Andover, Mass., which makes monitors for critically ill patients, proves out its safety-critical software with special methods, including "formal inspections," in which specifications and code are analyzed by experts according to a set of strict rules and procedures. Formal inspections are time-consuming, but the benefits far outweigh the costs, says Brian Connolly, an engineering manager at the medical group. "The human cost of a product recall is horrible," he says. "It's 10,000 times as costly to correct a defect in the field as if you had found it in the specification phase."

Alan Barbell, a project manager at Environmental Criminology Research, Inc. in Plymouth Meeting, Pa., an institute that evaluates medical devices, says software designers must see for themselves how their products are going to be used. "A lot of companies have their marketing people go out in the field and then come back and relay information through a filter," he says.

David Croston, director of clinical engineering at Buffalo General Hospital in Buffalo, N.Y., says the quality of software in medical devices is good and improving. But he says he still sees "unexplainable software glitches." The hospital tests new gear for about 30 days before paying for it, he says. "Manufacturers do a good job, but when we get it into the true clinical environment, and users are doing things slightly differently, that's when the real problems can come," he says.

Sometimes IS managers underestimate how critical the software is. Robert Charette, a computer risk specialist and president of ITABHI Corp. in Fairfax, Va., tells of such myopia among managers at the London Ambulance Service. The service brought up a computerized dispatching system in 1992. It was poorly tested and beset by bugs.
Emergency callers waited 30 minutes or more to get through; calls disappeared from dispatchers' screens; ambulances arrived hours late. As many as 46 people died in the first two days, due partly to delays in getting medical help. But not one coroner's report listed the dispatch system as the cause of death.

"The ambulance people did not view the software as safety-critical," says Charette, chairman of the Risk Management Advisory Board of the Software Engineering Institute at Carnegie Mellon University in Pittsburgh. "If you don't see it as safety-critical, you don't take the extra care."

And there are perverse incentives to avoid having your software seen as safety-critical, Charette says. Software that can injure or kill is a red flag in litigation, he says, and can drive developers' insurance rates through the roof.

By Gary H. Anthes

Anthes is Computerworld's senior editor, special reports.