Software Defect Root Cause Analysis Review

The final phase of the software reliability assessment is the software defect root cause analysis. The actual root causes of software defects are often quite different from what software engineers and managers believe. For example, software engineers may blame the requirements for the defects when the design is actually the most problematic area. Faulty assumptions about the types of software defects can lead to defect prevention methods that target the wrong types of defects, which wastes time and money.

Contrary to popular myth, defect root causes can and will vary by product, product maturity, and development organization. There are no "default" root causes for defects. Hence, a root cause analysis is an essential first step toward effectively improving the software.

[Figure: Software defect root cause analysis]

The types of software defects can often be identified via a few key words. For example, timing related problems will almost always have the word "time" in the description, and state related issues will similarly tend to have the word "state". The defect reports are reviewed against a list of key words to identify the most common types of defects by severity classification.
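The key word review described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the key word lists, the severity labels, and the sample defect reports are all hypothetical, and a real list would be tuned to the product and its defect tracking system.

```python
import re
from collections import Counter

# Hypothetical key word lists; a real analysis would tune these per product.
KEYWORDS = {
    "timing": ["time", "timing", "timeout", "delay"],
    "state": ["state", "mode", "transition"],
}

def classify(description):
    """Return the set of defect types whose key words appear in the description."""
    # Match whole words only, so that "sometimes" does not count as "time".
    words = set(re.findall(r"[a-z]+", description.lower()))
    return {dtype for dtype, keys in KEYWORDS.items() if words & set(keys)}

def tally(reports):
    """Count (severity, defect type) pairs over (severity, description) reports."""
    counts = Counter()
    for severity, description in reports:
        # Reports that match no key word are kept visible as "unclassified".
        for dtype in classify(description) or {"unclassified"}:
            counts[(severity, dtype)] += 1
    return counts

# Made-up defect reports, for illustration only.
reports = [
    ("critical", "Display freezes when the time limit expires"),
    ("critical", "Wrong state after a timeout during startup"),
    ("minor", "Label is misspelled on the settings page"),
]

counts = tally(reports)
for (severity, dtype), n in sorted(counts.items()):
    print(severity, dtype, n)
```

A single report can match more than one type, as the second sample report does. Reports that land in "unclassified" still need a manual read, and they are a good source of new key words.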

In the example defect root cause analysis above, you can see not only the most common types of defects but also that the software group had known about several defects that were ultimately reported by the customer. As a result, the priority classification system required adjustment.

  • Requirements related - the "Whats" are incorrect. These include incomplete requirements, incorrect requirements, ambiguous requirements, unwritten assumptions, etc.
  • Design related - the "Hows" are incorrect. These include incomplete design, timing, state, logic, etc.
  • Code related - the "Whats" and "Hows" are correct but the software engineer does not implement them properly. There are many examples of these, such as divide by zero, compilable typos, memory leaks, etc.
  • External related - the software had to change because of an external event such as poor source control, or a change to the hardware or to an external software application such as the O/S.
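
Once each defect has a type, the types can be rolled up into the four categories above to show where prevention effort is best spent. The following is a minimal sketch, assuming a hypothetical mapping from defect type to category (based on the examples in the list) and made-up sample data. It is not a definitive taxonomy.

```python
from collections import Counter

# Assumed mapping from defect type to root cause category (hypothetical,
# loosely following the examples given in the list above).
CATEGORY_OF = {
    "ambiguous requirement": "requirements",
    "incomplete requirement": "requirements",
    "timing": "design",
    "state": "design",
    "logic": "design",
    "divide by zero": "code",
    "memory leak": "code",
    "hardware change": "external",
    "O/S change": "external",
}

def category_shares(defect_types):
    """Return (category, count, percent) tuples, most common category first."""
    if not defect_types:
        return []
    # Types missing from the mapping are reported as "unknown", not dropped.
    counts = Counter(CATEGORY_OF.get(t, "unknown") for t in defect_types)
    total = sum(counts.values())
    return [(cat, n, 100.0 * n / total) for cat, n in counts.most_common()]

# Made-up defect types, one entry per defect report, for illustration only.
sample = [
    "timing", "state", "timing", "memory leak",
    "ambiguous requirement", "logic", "hardware change", "state",
]

shares = category_shares(sample)
for category, n, percent in shares:
    print(f"{category}: {n} ({percent:.1f}%)")
```

In this made-up sample the design category dominates, which is the kind of result that would contradict an assumption that requirements are the main problem.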