How does software fail?

Materials and structures have strength and fatigue properties that determine their engineering applications in system designs, and the failure modes that result from these uses. These properties must be considered when designing systems and performing stress analysis, to assure that the designed structure will fulfill its intended functions within a given environment. The structural integrity of a design depends on both the static and fatigue characteristics of its constituent materials. Weaknesses in either can lead to crack initiation and propagation under cyclic loads, excessive elastic deformations, and other time-dependent failure modes, such as corrosion, creep, stress rupture, and thermal fatigue.

Software, unlike physical materials, is immune to these types of failures. While the computer on which software executes may exhibit many of these failure modes, a deterministic algorithm that is correctly implemented in software will always work in the future as it has in the past, given the same initial conditions and the same sequence of inputs.

  • It does not fatigue.
  • It does not crack.
  • It does not buckle or deform.
  • It is unaffected by temperature, current, electromagnetic radiation, or other environmental insults.
  • It needs no protection from shock and vibration.

Yet what our programs are observed to do - far too often - is fail, rather than succeed, under circumstances that seem to surprise those not experienced in software development. Our programs fail because their composition is complex and heterogeneous, and because we are not able to fully test or analyze them as we can test homogeneous materials. No laws of physics guide software creation. A line of code is weightless. We cannot reliably extrapolate a program's reliability from its structure, even though metrics like cyclomatic complexity attempt to do so. At best, such analyses give us hints about potential weaknesses.

People unfamiliar with how software actually works in a computer often have the idea that software can work for a while and then stop working, as if wearing out. They also often believe that software obeys other mechanical rules. Neither of these views is accurate. Software may appear to work at some times and not at others, but such behavior is the result of problems with the software's requirements, design, or implementation, or with the environment in which the software executes. Further, the failure of an improperly designed software component can bring the system in which it is embedded to its knees. And often, because of the software's complexity and structure, such defects lie waiting for us, undetected, until just the right sequence of inputs reveals symptoms of their existence; sadly, even then we may notice them only if we are in the right place at the right time.

Consider the above flow graph, which represents two sequential if-then-else clauses nested in a looping construct. How many tests should be employed to adequately traverse this logic? The answer is not easy to determine. This logic is not like a structural assembly that transfers loads from one beam to another. There may be data dependencies between code fragments; the conditions which trigger each clause may themselves be highly complex and interdependent; and the looping mechanisms may be flawed.
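To give a sense of scale, here is a minimal sketch that counts the distinct execution paths through a loop whose body contains two sequential if-then-else clauses. The function names and the loop bounds are illustrative assumptions, not part of the flow graph itself, and the count ignores data dependencies that might make some paths infeasible.

```python
from itertools import product


def paths_per_iteration(num_sequential_branches: int = 2) -> int:
    """Each two-way if-then-else doubles the number of ways through the loop body."""
    return 2 ** num_sequential_branches


def count_paths(max_iterations: int, num_sequential_branches: int = 2) -> int:
    """Count distinct paths over every loop trip count from 0 up to max_iterations."""
    per_iter = paths_per_iteration(num_sequential_branches)
    return sum(per_iter ** trips for trips in range(max_iterations + 1))


def enumerate_paths(trips: int):
    """List every path for exactly `trips` iterations as tuples of branch outcomes."""
    one_pass = list(product(("then", "else"), repeat=2))  # 4 ways through the body
    return list(product(one_pass, repeat=trips))


if __name__ == "__main__":
    for bound in (1, 2, 5, 10):
        print(bound, count_paths(bound))
```

Under these assumptions, a loop that runs at most 10 times already has 1,398,101 distinct paths. With simple two-way conditions, the cyclomatic complexity of the same logic is only 4 (three decisions plus one), which illustrates how far apart a structural metric and exhaustive path coverage can be.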

A software program is effectively an unambiguous specification for traversing a set of execution paths through the hardware it executes on. The programming languages in which these specifications are expressed provide a vehicle for communication between the programmer and, through a compiler, a computer architecture. Yet as the details necessary to write software are progressively revealed and incorporated into a program, it is inevitable that mistakes will be made, typically at surprisingly high frequencies. Joel Spolsky suggests that this is because we find it easier to write specifications than to communicate ideas in ways another human brain can "compile", i.e. understand, translate, and act appropriately on their underlying meaning.

People do not describe specifications as 'failing', but instead characterize them with terms such as:

  • correct or incorrect,
  • consistent or inconsistent,
  • complete or incomplete,
  • clear or ambiguous,
  • safe or dangerous.

Problems with software occur because people define such specifications improperly, not because a computer doesn't execute its instructions reliably. Only in rare cases does the flaw lie in the environment in which the software executes. Here are some of the many ways that people fail to correctly implement the software they develop:

  • Faulty requirements
    • Wrong requirements (not what is defined for a specified situation)
    • Missing requirements (encountering a situation which the specification did not describe)
    • Infeasible requirements (attempting to implement an inconsistent
    • Inconsistent requirements
  • Faulty designs
    • Incomplete or erroneous algorithms
    • Resource contention or constraints for computer resources or interfacing devices
    • Failure to handle all possible exceptional conditions
  • Faulty code
    • Improperly implemented designs
    • Erroneous data coupling
    • Improperly implemented logic
    • Memory leaks
  • Faulty environment
    • Errors injected by tools used to build the software
    • Improper calls to runtime systems
    • Improperly set up support software

To err in these ways is human, and a part of our very nature. Studies using data collected in the Personal Software Process indicate that the average rate of injecting defects into even small programs, by even very experienced programmers, remains at about 10 defects per hundred lines of code. Luckily, about half of these defects are found by compilers, since they can be recognized as flaws in syntax, but the remaining ones prove far more difficult to eliminate. These defects must be detected, isolated, and removed in order for failures in software programs to be reduced to an acceptable level. No method is able to remove more than about 70% of the defects in any one verification stage (and many remove only 20-40%). Further, because making fixes introduces additional defects, it takes multiple stages of defect filters just to achieve an acceptable level of quality. And after all that, despite our best investments in testing methods and tools, the typical defect removal rate of most development teams prior to release remains at only about 85 percent.
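The arithmetic behind these figures can be sketched as follows. This is a simplified model under stated assumptions: every verification stage has the same fixed yield, stages act independently, and fixes inject no new defects (which the text says is not true in practice, so real projects need more stages than this suggests). The function names are hypothetical, and the text does not say whether the 85 percent figure includes compiler-detected defects; here it is treated simply as a cumulative target.

```python
from typing import Sequence


def remaining_after_stages(injected: float, stage_yields: Sequence[float]) -> float:
    """Defects left after passing through a series of defect filters."""
    remaining = injected
    for stage_yield in stage_yields:
        remaining *= 1.0 - stage_yield  # each stage removes a fraction of what is left
    return remaining


def stages_needed(target_removal: float, per_stage_yield: float) -> int:
    """Number of equal-yield stages needed to reach a cumulative removal target."""
    if not 0.0 < per_stage_yield < 1.0:
        raise ValueError("per_stage_yield must be between 0 and 1 (exclusive)")
    if not 0.0 <= target_removal < 1.0:
        raise ValueError("target_removal must be in [0, 1)")
    escaped = 1.0  # fraction of defects that has slipped past every stage so far
    stages = 0
    while 1.0 - escaped < target_removal:
        escaped *= 1.0 - per_stage_yield
        stages += 1
    return stages


if __name__ == "__main__":
    injected = 1000 * 10 / 100          # about 10 defects per hundred lines
    after_compiler = injected * 0.5     # about half are caught by the compiler
    print(after_compiler)
    for stage_yield in (0.2, 0.4, 0.7):
        print(stage_yield, stages_needed(0.85, stage_yield))
```

In this model, a 1,000-line program starts with about 100 defects, of which about 50 survive compilation. Reaching 85 percent cumulative removal takes two stages at a 70 percent yield, four stages at 40 percent, and nine stages at 20 percent.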

What this means is that to build software that works properly, we must first recognize that in the majority of situations, software does not fail, but the people who write or run software do. As a result, when a failure mode and effects analysis considers the probability that a software program contains an error, that probability should be set to one, because people will always make mistakes. According to Murphy's law, these mistakes will be manifested as system problems at the worst possible times.

As a result, we must focus on how teams build their software in the first place, and minimize the number of defects they introduce. In parallel, we must of course continue to refine the methods we use to find and remove those defects. Tools can help somewhat with this, but tools are themselves software that may or may not work properly, for the same reasons described above. This is why talent, architectures, effective processes, and organizational maturity are so critical to successful software development. Automation approaches, by contrast, just help us produce failures more quickly, and often with less understanding.