The Software Failure Modes and Effects Analysis (SFMEA) [God00] is a method for analyzing the safety characteristics of critical systems that are based on software. The method is based on the Failure Modes and Effects Analysis (FMEA), which is widely known and used in industry [Wan11]. The FMEA method has been used intensively to evaluate the safety of critical hardware systems in the automotive, aerospace and military domains. However, applying the FMEA method to software has proven problematic. Software is somewhat different, since almost all errors within software are design errors or logic errors. Thus, software does not "fail" in the usual sense: it does exactly what it was programmed to do. In the past, the SFMEA has not been very popular, and only a few papers provide comprehensive examples of how the analysis is done for software. However, Goddard [God00] and Bowles and Wan [BW01] showed that the SFMEA can efficiently uncover potential hazards within software projects that remained undetected by other analytical approaches. Furthermore, analyzing the software of a given system also reveals certain hardware errors that cause failures within the software. The SFMEA can be used for embedded systems as well as for large software projects [BW01].
The SFMEA is also mentioned for the evaluation of critical systems in some standards, e.g., IEC 61508.
The SFMEA can be applied in a top-down or a bottom-up manner. In my thesis, the SFMEA is complemented by a Fault Tree Analysis (FTA), which itself follows a top-down approach. Since it is preferable to combine two different viewpoints, only the bottom-up approach is considered for the SFMEA.
An SFMEA consists of the following steps. First, the scope of the planned analysis is defined. This includes the viewpoints that should be considered during the analysis: functionality, maintainability, usability, serviceability, vulnerability and interfaces. Next, the resources needed to carry out the analysis have to be identified. Furthermore, several terms have to be defined and rated; their scales and ratings can differ depending on the software being analyzed. These terms are the likelihood, the severity and the detectability of a failure.
After defining the ratings for the detectability, severity and likelihood, a table or template for the SFMEA has to be created or chosen.
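A worksheet template can be sketched as a small data structure. The following is a minimal sketch, assuming a 1 to 10 scale for all three ratings and the common FMEA convention that a high detectability value means a failure is hard to detect. The class name, field names and scale are illustrative assumptions, not something prescribed by the SFMEA method or by [God00].

```python
from dataclasses import dataclass, field

# Assumed rating scale: integers from 1 (best) to 10 (worst) for every factor.
RATING_MIN, RATING_MAX = 1, 10


def check_rating(name: str, value: int) -> int:
    """Reject ratings that fall outside the agreed scale."""
    if not RATING_MIN <= value <= RATING_MAX:
        raise ValueError(f"{name} must be in [{RATING_MIN}, {RATING_MAX}], got {value}")
    return value


@dataclass
class SfmeaRow:
    """One entry of a hypothetical SFMEA worksheet."""
    component: str
    failure_mode: str
    root_cause: str
    effect: str
    likelihood: int      # how probable the failure is
    severity: int        # how harmful its effect is
    detectability: int   # high value = hard to detect before it causes harm
    countermeasures: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Validate all three ratings when the row is created.
        check_rating("likelihood", self.likelihood)
        check_rating("severity", self.severity)
        check_rating("detectability", self.detectability)
```

Validating the ratings at construction time keeps every row of the table on the same scale, which matters once the ratings are multiplied together.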
After all these preparations are finished, the actual analysis begins. Potential failure modes are researched and characterized. Failure modes are the ways in which a component or system might fail. For each failure mode, the root cause should be identified. Various techniques can be applied to find the failure modes, including architectural considerations, a system preliminary hazard analysis, a review of the requirements and, especially, the analysis of critical variables. Goddard [God00] describes these methods in more detail.
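As a rough illustration of the critical-variable technique, the sketch below pairs each critical variable with a checklist of generic failure modes to produce candidate worksheet entries, which the team then keeps or discards and traces back to a root cause. The checklist, the function name and the variable names are hypothetical; [God00] should be consulted for the actual guidance.

```python
from itertools import product

# Hypothetical checklist of generic failure modes for a variable;
# a real analysis would derive its own list for the system at hand.
GENERIC_VARIABLE_FAILURE_MODES = [
    "value missing",
    "value out of range",
    "value wrong but within range",
    "value stale (not updated in time)",
]


def candidate_failure_modes(critical_variables: list[str]) -> list[str]:
    """Pair every critical variable with every generic failure mode."""
    return [
        f"{var}: {mode}"
        for var, mode in product(critical_variables, GENERIC_VARIABLE_FAILURE_MODES)
    ]


# Hypothetical critical variables of a braking controller.
for candidate in candidate_failure_modes(["brake_pressure", "wheel_speed"]):
    print(candidate)
```

The systematic pairing mainly ensures that no combination is forgotten; judging which candidates are plausible remains the analysts' task.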
In the next step, the effects of the failures or failure modes are determined. Each failure is rated by its likelihood, detectability and severity. The product of these three values is called the Risk Priority Number (RPN), which reflects the importance of a given entry. Then, preventive measures and countermeasures that lower the threat of a given failure are identified. Once the countermeasures are identified, the RPN and its three underlying ratings can be revised, which usually leads to a better (lower) RPN and thus a lower risk. If the threat is still too high, the procedure can be repeated to identify further countermeasures and protective measures. The main difference between an SFMEA and an FMEA is that different viewpoints and failure modes are used to analyze the underlying system.
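The RPN calculation and the revision step can be sketched in a few lines. This is a minimal sketch, assuming 1 to 10 scales (10 = worst) and a hypothetical acceptance threshold of 100; both the threshold and the example ratings are illustrative, not taken from a real analysis.

```python
# Hypothetical acceptance threshold; each project defines its own.
RPN_THRESHOLD = 100


def risk_priority_number(likelihood: int, severity: int, detectability: int) -> int:
    """RPN: the product of the three ratings."""
    return likelihood * severity * detectability


def is_acceptable(likelihood: int, severity: int, detectability: int,
                  threshold: int = RPN_THRESHOLD) -> bool:
    """True if the entry no longer needs further countermeasures."""
    return risk_priority_number(likelihood, severity, detectability) <= threshold


# Hypothetical entry before and after adding a plausibility check on an input,
# which mainly improves detectability.
before = risk_priority_number(likelihood=6, severity=8, detectability=5)  # 240
after = risk_priority_number(likelihood=6, severity=8, detectability=2)   # 96
print(before, after, is_acceptable(6, 8, 2))
```

In this example the countermeasure leaves severity unchanged, which is typical: many software countermeasures make a failure easier to detect or less likely rather than less harmful.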
See an example of an SFMEA table at the end of this blog post.
An FTA is a top-down analysis approach. An FTA for software works similarly to an ordinary FTA; the only differences are the types of events and the failure modes used. FTAs are especially useful for finding the root causes of specific failures, in particular if a failure is caused by a combination of multiple errors. An FTA starts with gathering the necessary documents, such as requirements and design documents. Then the analysis team brainstorms failure events, which are placed at the top of the tree.
The team then tries to identify the causes of each failure. The causes that lead to a failure event are combined with logical "and" or "or" gates. This process creates smaller subtrees for each failure event. The FTA can thus identify the root causes of failures as well as the paths from these root errors to the failure, which show where mitigations can be applied. By evaluating the risks and severities of each sub-path, starting from the bottom, an overall risk and severity score can be derived along each path. The team can then revise the applicable requirements and design documents.
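One common way to quantify such a tree, used here as an assumption rather than the risk and severity scoring described above, assigns a probability to each basic event and evaluates the gates bottom-up. Assuming independent basic events, an "and" gate multiplies its input probabilities and an "or" gate yields one minus the product of their complements. The class names and the example tree are hypothetical.

```python
from dataclasses import dataclass
from math import prod
from typing import Union


@dataclass
class BasicEvent:
    """A leaf of the fault tree with an assumed occurrence probability."""
    name: str
    probability: float


@dataclass
class Gate:
    """A logical gate combining the causes of an event: "and" or "or"."""
    kind: str
    inputs: list["Node"]


Node = Union[BasicEvent, Gate]


def event_probability(node: Node) -> float:
    """Evaluate the tree bottom-up, assuming all basic events are independent."""
    if isinstance(node, BasicEvent):
        return node.probability
    ps = [event_probability(child) for child in node.inputs]
    if node.kind == "and":
        # All inputs must occur together.
        return prod(ps)
    if node.kind == "or":
        # At least one input occurs: complement of "none of them occur".
        return 1.0 - prod(1.0 - p for p in ps)
    raise ValueError(f"unknown gate kind: {node.kind!r}")


# Hypothetical top event: a wrong actuator command is issued.
top = Gate("or", [
    Gate("and", [BasicEvent("sensor_a_fails", 0.01), BasicEvent("sensor_b_fails", 0.02)]),
    BasicEvent("logic_error", 0.001),
])
print(event_probability(top))  # roughly 0.0012
```

The independence assumption is the weak point of this calculation: common-cause failures (for example, both sensors sharing one faulty driver) would have to be modeled as an explicit shared event.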
A bottom-up SFMEA can be used to complement an FTA; together, both methods produce a detailed safety analysis. Nicodemos et al. [GLAS12] analyzed multiple approaches for combining the SFMEA and FTA methods. Additionally, the authors applied both methods to space-critical software projects. They show that the FTA can be used to identify gaps in the fulfillment of the defined requirements, and that the SFMEA can be used to analyze failures in the software requirements definition itself.
Ann Marie Neufelder, chairperson of the IEEE 1633 Recommended Practices for Reliable Software working group, also notes that SFMEA and FTA can be combined to complement each other [Neu16].
The different analysis approaches of an FTA and an SFMEA:
SFMEA vs FTA
An SFMEA is especially useful for identifying failure modes and single points of failure, but it requires analysts with deep knowledge of the software. The FTA, in contrast, is especially useful for identifying failures that are caused by a combination of multiple events, which an SFMEA fails to detect.
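To illustrate this difference, the sketch below computes the minimal cut sets of a small, hypothetical fault tree, i.e., the smallest sets of basic events that together trigger the top event. Cut sets containing a single event roughly correspond to the single points of failure an SFMEA also reveals; cut sets with several events are the combinations that the FTA exposes. The tuple encoding and the event names are assumptions made for this example.

```python
from itertools import product


def cut_sets(node):
    """Cut sets of a fault tree given as nested tuples.

    A node is either a basic event name (str) or a tuple ("and" | "or", child, ...).
    Returns a set of frozensets of basic event names.
    """
    if isinstance(node, str):
        return {frozenset([node])}
    kind, *children = node
    child_sets = [cut_sets(child) for child in children]
    if kind == "or":
        # Any single child's cut set triggers the event.
        return set().union(*child_sets)
    if kind == "and":
        # One cut set from every child has to occur together.
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unknown gate: {kind!r}")


def minimal_cut_sets(node):
    """Keep only cut sets that do not contain a smaller cut set."""
    sets = cut_sets(node)
    return {s for s in sets if not any(other < s for other in sets)}


# Hypothetical fault tree for one top event.
tree = ("or",
        ("and", "sensor_a_fails", "sensor_b_fails"),
        "logic_error",
        ("and", "logic_error", "watchdog_disabled"))

for cs in sorted(minimal_cut_sets(tree), key=len):
    label = "single point of failure" if len(cs) == 1 else "combination"
    print(label, sorted(cs))
```

Here {logic_error, watchdog_disabled} is dropped because {logic_error} alone already triggers the top event, while {sensor_a_fails, sensor_b_fails} remains as a combination that a one-failure-at-a-time SFMEA would not flag.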
If you want to know more, feel free to read my master's thesis (and contact me if you do not have access to it).
Example: SFMEA Table
[God00] P. L. Goddard. “Software FMEA techniques.” In: Annual Reliability and Maintainability Symposium. 2000 Proceedings. International Symposium on Product Quality and Integrity (Cat. No.00CH37055). 2000, pp. 118–123. DOI: 10.1109/RAMS.2000.816294.
[Wan11] M. H. Wang. “A cost-based FMEA decision tool for product quality design and management.” In: Proceedings of 2011 IEEE International Conference on Intelligence and Securit
[BW01] J. B. Bowles, C. Wan. “Software failure modes and effects analysis for a small embedded control system.” In: Annual Reliability and Maintainability Symposium. 2001 Proceedings. International Symposium on Product Quality and Integrity (Cat. No.01CH37179). 2001, pp. 1–6. DOI: 10.1109/RAMS.2001.902433.
[GLAS12] F. G. Nicodemos, C. Lahoz, M. A. D. Abdala, O. Saotome. “Using Combined SFTA and SFMECA Techniques for Space Critical Software.” In: (Jan. 2012), pp. 12–.
[Neu16] A. M. Neufelder. “How to apply software reliability engineering.” SoftRel. 2016. URL: http://www.softrel.com/softwarereliability.pdf.