People are being killed and injured by software and embedded systems. But IS managers can help prevent high-tech manslaughter by ensuring safety and high quality through rigorous testing and preparation.

An intravenous medication pump runs dry and injects air into a patient. A monitor fails to sound an alarm when a heart stops beating. A digital display combines the name of one patient with medical data from another. A respirator delivers "unscheduled breaths" to a patient.

Those scary scenarios have two things in common: All the devices were recalled under orders from the U.S. Food and Drug Administration (FDA), and all owed their potentially deadly performances to flaws in the software that controls them.

The FDA has issued 20 such recalls so far this year. The agency says there isn't any evidence that the quality of software is getting worse, but defects are on the rise as software becomes pervasive in medical devices. "We are seeing more software problems," says E. Stewart Crumpler, a software specialist at the FDA's Center for Devices and Radiological Health. "Have there been deaths and injuries due to software? You betcha."

Experts say the software that controls medical devices, nuclear power plants, airplane flights and the like often isn't up to the high safety standards demanded of physical entities such as cars, appliances and lawn mowers.

The failure to make safety-critical software safe is largely a failure of management, experts say. Managers tend to overestimate the quality of software; they frequently don't understand safety and quality concepts; and their priorities too often lie elsewhere. "Managers are so focused on short-term costs, they fail to include the real-world costs of shoddy software," says M. E. Kabay, director of education at the National Computer Security Association in Carlisle, Pa. Software development managers "don't really integrate product quality into their thinking."
"The number of accidents in high-tech systems seems to be increasing," says Nancy Leveson, a computer sciences professor at the University of Washington and author of Safeware System Safety and Computers. "These incidents happen all the time, but because of liability, nobody discusses them in public." Software errors have been triggering accidents for years, including the following:
ROOM TO IMPROVE

"Safety-critical software is nowhere near as good as it could be," says Peter Neumann, principal scientist at SRI International, Inc. in Menlo Park, Calif., and moderator of the Risks Forum newsgroup on the Internet. "The run-of-the-mill software developer does not have the slightest idea what the risks are or what techniques to use to ensure reliability and safety," he says.

Leveson says complacency plays a key role in most major accidents. Developers of missiles or nuclear power plants wouldn't buy their hardware at a thrift shop, she argues, but they're apt to do the equivalent with software by building safety-critical applications on top of buggy commercial operating systems. "They have this tremendous confidence in technology they don't understand," she says.

Another common management error is the failure to establish an effective safety information system that tracks hazards, accidents and trend data, Leveson says. As a result, many organizations with smart and well-meaning people nevertheless lack the information needed to build safety into systems.

Baltimore Gas and Electric Co. doesn't employ any unusual techniques to develop the software that runs its two nuclear reactors. But it takes special pains with requirements definition, quality reviews, testing, documentation and configuration control, says Gary Spurrier, account director for nuclear information technology services at Baltimore Gas. Software from the internal information systems group and from vendors must be developed in compliance with federal Quality Assurance Criteria for Nuclear Power Plants. "When we choose a new vendor, we send an audit team to make sure they have a software quality assurance program and it meets the regulations," Spurrier says.

The utility's IS group has an alter ego in its nuclear design engineering group. IS conducts design reviews, code reviews and systems tests, and the design engineers also perform those functions.
Thus, software defects have two chances of being caught.

Hewlett-Packard Medical Group in Andover, Mass., which makes monitors for critically ill patients, proves out its safety-critical software with special methods, including "formal inspections," in which specifications and code are analyzed by experts according to a set of strict rules and procedures. Formal inspections are time-consuming, but the benefits far outweigh the costs, says Brian Connolly, an engineering manager at the medical group. "The human cost of a product recall is horrible," he says. "It's 10,000 times as costly to correct a defect in the field as if you had found it in the specification phase."

Alan Barbell, a project manager at Environmental Criminology Research, Inc. in Plymouth Meeting, Pa., an institute that evaluates medical devices, says software designers must see for themselves how their products are going to be used. "A lot of companies have their marketing people go out in the field and then come back and relay information through a filter," he says.

David Croston, director of clinical engineering at Buffalo General Hospital in Buffalo, N.Y., says the quality of software in medical devices is good and improving. But he says he still sees "unexplainable software glitches." The hospital tests new gear for about 30 days before paying for it, he says. "Manufacturers do a good job, but when we get it into the true clinical environment, and users are doing things slightly differently, that's when the real problems can come," he says.

Sometimes IS managers underestimate how critical the software is. Robert Charette, a computer risk specialist and president of ITABHI Corp. in Fairfax, Va., tells of such myopia among managers at the London Ambulance Service. The service brought up a computerized dispatching system in 1992. It was poorly tested and beset by bugs.
Emergency callers waited 30 minutes or more to get through; calls disappeared from dispatchers' screens; ambulances arrived hours late. As many as 46 people died in the first two days, due partly to delays in getting medical help. But not one coroner's report listed the dispatch system as the cause of death.

"The ambulance people did not view the software as safety-critical," says Charette, chairman of the Risk Management Advisory Board of the Software Engineering Institute at Carnegie Mellon University in Pittsburgh. "If you don't see it as safety-critical, you don't take the extra care."

And there are perverse incentives to avoid having your software seen as safety-critical, Charette says. Software that can injure or kill is a red flag in litigation, he says, and can drive developers' insurance rates through the roof.

By Gary H. Anthes

Anthes is Computerworld's senior editor, special reports.