Software Defect Root Cause Analysis Review


There are more than 400 root causes for software defects. The root causes for a given program depend on what the program does and on when and how its code was developed. For example, missile defense software is more prone to timing- and state-related defects than other systems. Older software is more prone to maintenance-related defects, while newer systems are more prone to requirements-related defects. Software with many internal and external interfaces is prone to interface-related defects, and mathematically intensive software is more prone to data-related defects.

Often, the actual root causes of software defects are quite different from what software engineers and managers think. For example, software engineers blame the requirements for defects when it is really the design that is most problematic. Faulty assumptions about the types of software defects can lead to defect prevention methods that focus on the wrong types of defects, which wastes time and money.

Defect root cause analysis is a highly recommended first phase of the software failure modes and effects analysis (SFMEA). It is also highly recommended as a final phase of the software reliability assessment.

Contrary to popular myth, defect root causes can and will vary by product, product maturity, and development organization; there are no "default" root causes for defects. Hence, a root cause analysis is an essential first step in effectively improving the software.

How the software defect root cause analysis works

The following is an EXAMPLE of the distribution of defects by the artifact or development activity that introduced them. This distribution will vary from one software program to another, and from one release to another. For example, newer software systems tend to have more requirements-related defects, while systems in sustainment mode tend to have more defects introduced during maintenance and coding.

[Figure: Software defect root cause analysis by artifact]

First, a collection of reports for defects recently encountered in testing or operation of a particular software product is analyzed to determine the development activity in which each defect was introduced. The development activities include software requirements, interface design, detailed design, coding, and maintenance.
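This first step, tallying defect reports by the development activity that introduced them, can be sketched in Python. It is a minimal sketch, not a prescribed tool: it assumes a reviewer has already assigned each report to exactly one activity, and the `DefectReport` record, its fields, and the sample reports are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Development activities named in the text; adjust to the program's own lifecycle.
ACTIVITIES = ("requirements", "interface design", "detailed design", "coding", "maintenance")


@dataclass
class DefectReport:
    """Hypothetical minimal defect report; real reports carry many more fields."""
    report_id: str
    activity: str  # activity that introduced the defect, assigned during review
    severity: str
    description: str


def distribution_by_activity(reports):
    """Return {activity: percent of reports} for the development activities above."""
    counts = Counter()
    for r in reports:
        # Reject unknown labels so the percentages always cover every report.
        if r.activity not in ACTIVITIES:
            raise ValueError(f"{r.report_id}: unknown activity {r.activity!r}")
        counts[r.activity] += 1
    total = sum(counts.values())
    return {a: (100.0 * counts[a] / total if total else 0.0) for a in ACTIVITIES}


if __name__ == "__main__":
    # Illustrative reports only; not real project data.
    sample = [
        DefectReport("D-1", "coding", "major", "Index out of range in track table"),
        DefectReport("D-2", "coding", "minor", "Wrong label on status display"),
        DefectReport("D-3", "requirements", "critical", "Required reset behavior never specified"),
        DefectReport("D-4", "maintenance", "major", "Patch reintroduced an old fault"),
    ]
    for activity, pct in distribution_by_activity(sample).items():
        print(f"{activity:16} {pct:5.1f}%")
```

Running the same tally separately for each release is one way to see how the distribution shifts from release to release.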

Next, the defect reports are analyzed by failure mode, such as faulty functionality, faulty timing, faulty sequencing, faulty state management, faulty data, and faulty error handling. The following is an EXAMPLE. The most common failure modes are unique to the software under development.

[Figure: Software defect root cause analysis by failure mode]

The types of software defects can be identified via a few keywords. For example, the description of a timing-related problem will almost always contain the word "time", and the same is true of state-related issues and the word "state". The defect reports are reviewed against a list of keywords to identify the most common defect types by severity classification.
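The keyword screening described above can be sketched as follows. This is a minimal sketch, assuming each report has been reduced to a (severity, description) pair; the keyword lists, failure-mode names, and severity labels are illustrative assumptions that would need tuning for a real product. A description may match several failure modes, and reports that match none are tagged "unclassified" so that a reviewer can read them manually.

```python
import re
from collections import Counter, defaultdict

# Illustrative keyword lists (assumptions); tune them to the product's own vocabulary.
FAILURE_MODE_KEYWORDS = {
    "timing": ["time", "timing", "latency", "delay", "late"],
    "sequencing": ["sequence", "order"],
    "state": ["state", "mode", "transition"],
    "data": ["overflow", "underflow", "rounding", "precision", "range"],
    "error handling": ["exception", "unhandled", "error handling", "recover"],
}

# One case-insensitive regex per failure mode. The leading \b anchors each keyword
# at the start of a word, so "time" matches "timeout" and "timestamp" but not "runtime".
_PATTERNS = {
    mode: re.compile("|".join(r"\b" + re.escape(k) for k in keywords), re.IGNORECASE)
    for mode, keywords in FAILURE_MODE_KEYWORDS.items()
}


def classify(description):
    """Return the failure modes whose keywords appear in a defect description."""
    modes = {mode for mode, pattern in _PATTERNS.items() if pattern.search(description)}
    return modes or {"unclassified"}  # unmatched reports go to manual review


def failure_modes_by_severity(reports):
    """Count failure modes per severity.

    reports: iterable of (severity, description) pairs.
    Returns {severity: Counter mapping failure mode -> number of matching reports}.
    """
    table = defaultdict(Counter)
    for severity, description in reports:
        table[severity].update(classify(description))
    return table


if __name__ == "__main__":
    # Illustrative descriptions only; not real defect data.
    sample = [
        ("critical", "Track update arrives late; timeout expires before fusion"),
        ("critical", "System stuck in STANDBY state after reset"),
        ("major", "Altitude rounding error near range limit"),
        ("minor", "Typo in operator display label"),
    ]
    for severity, counts in failure_modes_by_severity(sample).items():
        print(severity, counts.most_common())
```

Keyword matching is only a first pass; the "unclassified" bucket shows how much of the data still needs a human reading.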

Once the failure modes are established, the root causes of those failure modes can be identified. These root causes are useful inputs to the software FMEA (SFMEA). History tends to repeat itself, and there are more than 400 software root causes, so focusing on the most common root causes from recent history makes the SFMEA more effective. The following is an EXAMPLE. The most common root causes are unique to the software under development.

[Figure: Software defect root cause analysis]
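Selecting the most common root causes from recent history, as described above, can be sketched as a simple frequency ranking. This sketch assumes each analyzed defect has already been assigned one root-cause label during review. The Pareto-style `coverage` cutoff (default 80%) is an assumption added for illustration, not part of the method described here, and the root-cause labels in the example are hypothetical.

```python
from collections import Counter


def most_common_root_causes(root_causes, coverage=0.8):
    """Return the most frequent root causes that together account for at least
    `coverage` of all analyzed defects (a Pareto-style cutoff, assumed here).

    root_causes: iterable of root-cause labels, one per analyzed defect.
    Returns a list of (root_cause, count, cumulative_share) tuples.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    counts = Counter(root_causes)
    total = sum(counts.values())
    selected = []
    running = 0
    # most_common() yields causes from most to least frequent.
    for cause, count in counts.most_common():
        running += count
        selected.append((cause, count, running / total))
        if running / total >= coverage:
            break
    return selected  # empty if there were no defects


if __name__ == "__main__":
    # Hypothetical root-cause labels for ten analyzed defects.
    causes = (["missing timeout handling"] * 4 + ["unhandled state transition"] * 3
              + ["wrong units"] * 2 + ["off-by-one index"])
    for cause, count, share in most_common_root_causes(causes):
        print(f"{cause:28} {count:3}  cumulative {share:.0%}")
```

Rerunning the ranking on each new batch of defect reports keeps the SFMEA focused on recent history rather than on the full list of more than 400 root causes.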