1. The Frestimate System Reliability Model Module is used with either the Software Reliability Toolkit or the Frestimate software. You must have purchased one of these products before using this module.
2. Using one of the above tools, create a prediction for each software LRU in the system. Make sure to include all in-house developed software, Commercial Off the Shelf (COTS) software, Government Furnished Software (GFS), Free Open Source Software (FOSS), and firmware. Ideally, the software should be designed so that each software LRU (typically called a Computer Software Configuration Item) is associated cohesively with a particular target hardware. For example, if the system is an automobile, the software components might include the transmission software, GPS software, camera software, security software, convertible top control software, entertainment software, temperature control software, etc. Unless the software in the system is very small, there should be more than one software LRU. The LRU for software is the Dynamic Link Library (DLL) file, the Executable (EXE) file, or the application file.
3. Once the predictions are defined for each software LRU, launch the System Reliability Model Module.
4. Select the File->Open menu option. Select the Frestimate or Software Reliability Toolkit file that you just created. It will usually be in the c:\SWRT folder.
5. View the Failure Mode Tab. There is a checkbox and a pulldown menu. All other columns are read only.
Filter results for critical failures only checkbox
The Frestimate software generates two sets of reliability figures of merit. The first set includes failures of any criticality, and the second set includes only the failures predicted to affect availability. The “Filter results for critical failures only” checkbox toggles which set is used when the reliability figures are output.
Checked | Uses the predicted number of failures that are expected to affect availability. On average this can be about 2-8% of all software failures. |
Unchecked | Uses all types of failure severities which are serious enough to be noticeable. |
Model Type
The “Model type” is a pulldown menu. Select either “Failure Rate” or “MTBF”. Both results are exported, but only the selected model type is displayed on the Reliability Blocks when the predictions are imported into the Isograph Reliability Workbench™ software.
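The two model types are two views of the same prediction. As a minimal sketch, assuming a constant failure rate (the usual reciprocal relationship, not necessarily the exact computation Frestimate performs), the conversion looks like this. The function names and the per-hour unit are hypothetical and chosen only for illustration.

```python
def mtbf_from_failure_rate(failure_rate):
    """Return MTBF as the reciprocal of a constant failure rate."""
    if failure_rate <= 0:
        raise ValueError("failure rate must be greater than 0")
    return 1.0 / failure_rate


def failure_rate_from_mtbf(mtbf):
    """Return the constant failure rate as the reciprocal of MTBF."""
    if mtbf <= 0:
        raise ValueError("MTBF must be greater than 0")
    return 1.0 / mtbf


# Hypothetical LRU: 0.001 failures per hour corresponds to an MTBF of about 1000 hours.
example_mtbf = mtbf_from_failure_rate(0.001)
example_rate = failure_rate_from_mtbf(example_mtbf)
```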
The other information on this page is read only and is retrieved from the Frestimate prediction file. Review each column prior to exporting. Any changes to the inputs below must be made using the Frestimate standard or manager’s edition or the Software Reliability Toolkit.
Column header | Description |
Software component name | The name of the software LRU that you defined in the In-house and COTS worksheets of the Software Reliability Toolkit or Frestimate software. |
Type of software component | This is either Application (developed by your organization) or COTS (commercially developed by another organization). The in-house software components may exhibit different failure modes than the COTS components. For example, COTS components are generally more likely to have interface or update problems. |
Model type | Select either “Failure Rate” or “MTBF”. Both results are exported; this field determines which result is shown on the Reliability Blocks. |
Month of Interest | You defined this in Frestimate or the Software Reliability Toolkit. It is the number of months of operational growth that will transpire before the milestone of interest. This is rarely more than 12 months because most software systems receive new feature releases at least yearly. |
Unavailability, Failure rate and MTBF | These are all computed at the “Month of Interest”. If you wish to change the month of interest, you will need to do so in the Software Reliability Toolkit or Frestimate software. |
Restore time | There is no “MTTR” for software because software does not wear out. Instead, there is a Mean Time To Software Restore, which is a weighted average of restart time, reboot time and workaround time. Some operational failures can only be fixed by changing the software product code, so the administrator time needed to obtain a new release or downgrade to a previous version is also considered. |
Unavailability, Failure rate and MTBF for the first 10 months of operational usage | Isograph Reliability Workbench allows for 10 different milestones for these predictions. The 10 milestones are associated with the first 10 months of predicted operational growth. |
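To make the restore time entry more concrete, the following sketch computes a weighted average restore time and a steady-state unavailability figure. It rests on assumptions not taken from Frestimate: the function names, the share-of-failures weights, the hour values and the MTBF are all hypothetical, and the unavailability formula is the standard restore / (MTBF + restore) ratio rather than a confirmed description of the tool's internals.

```python
def mean_time_to_software_restore(restore_actions):
    """Weighted average restore time; each entry is (share_of_failures, hours)."""
    total_share = sum(share for share, _ in restore_actions)
    if total_share <= 0:
        raise ValueError("shares must sum to a positive value")
    return sum(share * hours for share, hours in restore_actions) / total_share


def unavailability(mtbf_hours, restore_hours):
    """Steady-state unavailability = restore time / (MTBF + restore time)."""
    return restore_hours / (mtbf_hours + restore_hours)


# Hypothetical mix of restore actions (share of failures, hours to restore).
actions = [
    (0.70, 0.05),  # restart the application
    (0.20, 0.25),  # reboot the target hardware
    (0.08, 1.00),  # apply a workaround
    (0.02, 8.00),  # install a new release or downgrade to a previous version
]

mtsr = mean_time_to_software_restore(actions)           # about 0.325 hours
u = unavailability(mtbf_hours=1000.0, restore_hours=mtsr)
```

Even a small share of failures that need a new release dominates the average here, which is why the administrator time is included in the restore figure.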
6. Select the Fault Tree Tab.
This tab shows 28 different failure mode/root cause pairs associated with software. Edit each of the columns for each software LRU so that the failure modes most likely for a particular LRU are given a higher weighting than those that are less likely. Some tips for determining which failure modes are more likely than others can be found in Effective Application of Software Failure Modes Effects Analysis as well as in IEEE 1633 Recommended Practices for Software Reliability.
The table below lists the 28 failure mode/root cause pairs. The fault tree displays these events and assigns each event a failure rate that is the product of its relative weighting, which you enter on this tab, and the failure rate prediction for that software LRU. For example, if the software LRU is predicted to have a failure rate of 0.001 and you assign equal relative portions to all of the failure mode/root cause pairs, then each event will have a resulting failure rate of 0.001/28.
a) Edit the cells under each failure mode/root cause heading to assign more or less weighting to each of the failure mode/root cause pairs listed below. For more information about these failure mode/root cause pairs, see “Effective Application of Software Failure Modes Effects Analysis” at http://softrel.com/SoftwareReliabilityPublications.html.
b) The “compute” button ensures that the relative portions for the 28 failure mode/root cause pairs sum to 1. It is possible to assign a relative portion of 0 to a failure mode/root cause column if you have no past or current evidence that the failure mode is likely. For example, if the software is installed exclusively in a factory or by a qualified service technician, the likelihood of a serviceability failure mode is relatively small.
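The weighting arithmetic described above can be sketched as follows. This is an illustration only: it assumes that “compute” rescales the entered weightings so they sum to 1, and the function names and pair labels are hypothetical rather than part of the module.

```python
def normalize_weights(weights):
    """Scale raw relative weightings so that they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weighting must be greater than 0")
    return {name: w / total for name, w in weights.items()}


def allocate_failure_rate(lru_failure_rate, weights):
    """Event failure rate = normalized relative weighting * LRU failure rate."""
    portions = normalize_weights(weights)
    return {name: lru_failure_rate * p for name, p in portions.items()}


# Hypothetical LRU with 28 equally weighted failure mode/root cause pairs:
# every event receives 0.001 / 28.
equal = {f"pair_{i}": 1.0 for i in range(1, 29)}
rates = allocate_failure_rate(0.001, equal)

# A weighting of 0 removes that pair's contribution; the remaining pairs
# share the LRU failure rate in proportion to their weightings.
skewed = {
    "installed improperly": 0.0,
    "executed too late": 3.0,
    "terminated prematurely": 1.0,
}
skewed_rates = allocate_failure_rate(0.001, skewed)
```

In both cases the event failure rates add back up to the LRU failure rate, so changing the weightings redistributes the prediction without changing its total.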
Note that all of the failure modes and root causes listed below can and do occur as a single point failure. Three of the faulty error handling root causes happen when there is a failure in the system (i.e. hardware or other software) that the software fails to detect or handle. You can supply a failure rate for the system event, which is not necessarily related to the software itself. For example, if the software LRU is a transmission software system, and the transmission hardware encounters a failure that the software fails to detect or fails to recover from, that is both a hardware failure and a software failure.
Generic failure mode | Specific root cause |
Faulty functionality | This LRU performed an extraneous function |
Faulty functionality | This LRU failed to execute when required |
Faulty functionality | This LRU is missing a function |
Faulty functionality | This LRU performed a function but not as required |
Faulty sequencing | This LRU executed while in the wrong state |
Faulty sequencing | This LRU executed out of order |
Faulty sequencing | This LRU failed to terminate when required |
Faulty sequencing | This LRU terminated prematurely |
Faulty timing | This LRU executed too early |
Faulty timing | This LRU executed too late |
Faulty data | This LRU manipulated data in the wrong unit of measure or scale |
Faulty data | This LRU can't handle blank or missing data |
Faulty data | This LRU can't handle corrupt data |
Faulty data | This LRU data/results are too big |
Faulty data | This LRU data or results are too small |
Faulty error handling | This LRU generated a false alarm |
Faulty error handling | A failure in the hardware, system or software has occurred |
Faulty error handling | This LRU detected a system failure but provided an incorrect recovery |
Faulty error handling | This LRU failed to detect errors in the incoming data, hardware, software, user or system |
Faulty processing | This LRU consumed too many resources while executing |
Faulty processing | This LRU was unable to communicate/interface with the rest of the system |
Faulty usability | This LRU caused the user to make a mistake |
Faulty usability | The user made a mistake because of the user manual |
Faulty usability | This LRU failed to prevent common human mistakes |
Faulty usability | This LRU allowed user to perform functions that they should not perform |
Faulty usability | This LRU prevented user from performing functions that they should be allowed to perform |
Faulty serviceability | This LRU was installed improperly |
Faulty serviceability | This LRU was updated improperly |
Faulty serviceability | This LRU is the wrong version or is outdated |
7. Export Tab. On this tab, press the Export button to export the failure modes, reliability blocks and fault tree information so that they can be imported by Isograph Reliability Workbench. Refer to this link for instructions on how to import the exported file into the Isograph software.