Preventing errors within extremely complicated technological systems is often elusive. The more complex the system, the more complex the pattern of error. But a curious thing happens in systems that are kept relatively error-free: as major errors are prevented, it gets more difficult to forecast future major errors — because so few happen! In these kinds of mission-critical systems the genesis of a major failure may be unknown because major failures are so rare.
We would not know very much about how to cure a disease if all we could study were healthy people with minor ailments. Some life-critical technological systems — like a modern passenger jet — have zero tolerance for major errors. And so we do a very good job of preventing them. But because there are so few crashes we don’t have a large body of knowledge about how they happen. A US Congressional panel investigating the FAA policy of not punishing disclosure of minor safety errors made a very keen observation about technological systems:
It said that in fields where there were few accidents, the only choice to improve safety was to gather data on “accident precursors,” minor events that could add up to catastrophe. Such events, it said, were often known only to a few airline employees.
How do you prevent major errors in a system built to successfully keep major errors to a minimum? You look for the ugly.
The safety of aircraft is so essential that it is regulated, in hopes that regulation can decrease errors. Error prevention enforced by legal penalties presents a problem, though: severe penalties discourage disclosure of problems early enough for them to be remedied. To counter that human tendency, the US FAA has generally allowed airlines to admit errors they find without punishing them. These smaller infractions are the “ugly.” By themselves they aren’t significant, but they can compound with other small “uglies.” Often they are so minimal — perhaps a worn valve, or a discolored pipe — that one can hardly call them errors. They are just precursors to something breaking down the road. Other times they are things that break without causing harm.
The general agreement in the industry is that a policy of unpunished infractions encourages quicker repairs and reduces the chances of major failures. Of course, not punishing companies for safety violations rubs some people the wrong way. A recent Times article reports on the Congressional investigation into whether this policy of unpunished disclosure should continue; the observation quoted above came from that panel. The Times says:
“We live in an era right now where we’re blessed with extremely safe systems,” said one panel member, William McCabe, a veteran of business aviation companies. “You can’t use forensics,” he said, because there are not enough accidents to analyze.
“You’re looking for ugly,” Mr. McCabe said. “You ask your people to look for ugly.” A successful safety system, he said, “acknowledges, recognizes and rewards people for coming forward and saying, ‘That might be one of your precursors.’ ”
Looking for ugly is a great way to describe a precursor-based error detection system. You are not really searching for failure so much as for signs that failure will begin. These are less like errors and more like deviations: off-center in an unhealthy way. For some very large systems — like airplanes, human health, ecosystems — detection of deviations is more art than science, more a matter of beauty or the lack of it.
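The precursor idea can be caricatured in a few lines of code: pool the minor-incident reports and flag any subsystem whose count drifts well above the fleet-wide baseline. This is purely an illustrative sketch — the subsystem names, counts, and threshold are all invented here, and real precursor programs lean on human judgment rather than a statistical cutoff.

```python
from statistics import mean, stdev

def find_ugly(reports, threshold=1.5):
    """Flag subsystems whose minor-incident counts sit more than
    `threshold` standard deviations above the fleet-wide average.

    reports: dict mapping subsystem name -> count of minor
    incident reports ("uglies") over some period.
    """
    counts = list(reports.values())
    baseline, spread = mean(counts), stdev(counts)
    if spread == 0:  # all subsystems report alike; nothing stands out
        return []
    return [name for name, n in reports.items()
            if (n - baseline) / spread > threshold]

# Hypothetical report counts; one subsystem is quietly accumulating uglies.
reports = {
    "hydraulics": 3, "avionics": 2, "landing gear": 4,
    "fuel system": 3, "valve assembly": 19,
}
print(find_ugly(reports))  # → ['valve assembly']
```

None of the flagged counts is an accident in itself — each report is trivial on its own — which is exactly the point: the deviation from the baseline, not any single event, is what looks ugly.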
Come to think of it, looking for ugly is how we assess our own health. I suspect looking for ugly is how we will be assessing complex systems like robots, AIs and virtual realities.