Fault Tree Analysis & Examples: What It Is, How to Do It, and Why It’s Important
Answered November 05 2019
A fault tree analysis shows maintenance team members a visual representation of how a problem occurred and the potential pathways that led to the main failure event.
Picture a challenge, issue, or problem that has been plaguing you or your work for the past week, month, or even a year. Maybe it’s a leaking tap, a flashing fluorescent light bulb, or a machine that hasn’t been quite up to scratch for some time.
Most people would honestly like to know what is going on. It would be nice to fix these problems now. But for some reason, no one gets around to it until disaster strikes.
That’s where a fault tree analysis comes into play.
What is a fault tree analysis?
Fault tree analysis is a systematic approach to identifying the root cause of an event using a fault tree diagram. It can also be viewed as a framework for turning the available information into a concrete plan of action.
This process gives the analyst a logical sequence that helps uncover the root causes of the event in question. And when a fault tree analysis is used in tandem with FMEA (failure mode and effects analysis) or another analysis method, it can provide a better overall picture for maintenance decisions.
How can companies build this type of process into their systems? What are the main components that make fault tree analysis work? And what are some good reasons to use this analysis in the first place?
Why use fault tree analysis?
New processes are generally regarded with suspicion, particularly when the system in question already works. What’s the main reason companies should use a fault tree analysis?
Companies should use fault tree analysis in order to discover the true roots of the problems that they are facing.
What does that look like in practice? Fault tree analyses are most common in the fields of safety and reliability engineering, where they are used to discover flaws in many different situations. Some uses of the fault tree analysis process include:
- Understanding the series of events that lead to a flaw in the system or machine
- Demonstrating compliance with safety rules and regulations, such as ADA
- Minimizing and optimizing the resources the company spends
- Assisting in reviewing, overhauling or redesigning systems
One of the best parts about this process is that it is a pre-built system that companies can simply slot into place. Let’s take a look at the components of a fault tree and how any company can run a simple fault tree analysis right away.
The 3 main components of a fault tree analysis
Every fault tree runs on three components:
- The diagram of the process
- The events to which the diagram is applied
- The gates, or the connections between events
Here's a look at each one of these components and how it works in a company setting.
Fault tree diagram
The first piece of a fault tree analysis is the diagram of events. This framework is basically a flowchart. The actual analysis is performed by drawing a series of logical deductions that start with the failure event and trace back through the diagram to the root cause.
Events
The next piece of the puzzle is the events. In a fault tree analysis, an event is an occurrence in the system. Events can be divided into two categories: input events, which lead to other events, and output events, which result from input events.
Events include everything that has happened or could have happened. Each event is a cause of, or a partial contributor to, the situation at hand.
Gates
Events are connected using “and” and “or” gates. If the output event occurs only when all of its input events occur together, the inputs are connected with an “and” gate. If any one input event is enough on its own to cause the output event, the inputs are connected with an “or” gate.
For example, let’s say a room isn’t lit properly. If the room has two bulbs and goes dark only when both have burnt out, those two events are connected to the unlit room by an “and” gate. If either a burnt-out bulb or bad wiring would be enough on its own to darken the room, those events are connected with an “or” gate.
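The gate logic can be sketched in a few lines of Python. This is only an illustration using the standard definitions: an “and” gate fires when all of its inputs have occurred, an “or” gate when at least one has. The two-bulb room, the function names, and the variable names are assumptions made up for the example.

```python
def and_gate(*inputs: bool) -> bool:
    """Output event occurs only if every input event has occurred."""
    return all(inputs)


def or_gate(*inputs: bool) -> bool:
    """Output event occurs if at least one input event has occurred."""
    return any(inputs)


# Hypothetical room with two bulbs on one circuit.
bulb_a_burnt_out = True
bulb_b_burnt_out = False
wiring_bad = False

# The room has no working bulb only when both bulbs are burnt out.
no_working_bulb = and_gate(bulb_a_burnt_out, bulb_b_burnt_out)

# The room is dark if there is no working bulb or the wiring is bad.
room_dark = or_gate(no_working_bulb, wiring_bad)

print(room_dark)
```

With one bulb still working and good wiring, the room stays lit. Flipping `wiring_bad` to `True` darkens it on its own, which is what an “or” gate expresses.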
Now that the basic building blocks of a fault tree analysis have been explained, how do they all come together?
How to do a fault tree analysis in 6 steps
1. Define the top event
This is the starting point for the diagram: what specifically went wrong? The more precise your starting event is, the better the process will flow. Examples of well-defined top events include:
- The electrical system went down across the entire building
- The HVAC system cannot keep the required zones hot or cold
- An important part of a machine keeps failing
- Regulations have changed and the company must be in accordance with them by a certain date
The purpose of this definition is to put into words what exactly is wrong. If the top event is too broad, the process will not work as well. The best results are reached when it is narrowly defined and tightly contained.
2. Understand the system
The next step is to obtain as much information on the system as possible. Some sample questions that could be used are as follows.
- What are the different components of the system? How do they all work together ideally? How do they actually work together?
- Is the failure mechanical? Electrical? Software?
- What do the schematics show? Do you have boundary diagrams?
- What are the code requirements? Are the proposed changes actually realistic?
- What are your system engineers’ thoughts and opinions?
- How do similar systems work?
The aim of this step is to figure out how the system worked before the top event became a major or debilitating problem.
3. List potential causes of the top event
The next step of the process is to list the potential causes of the top event. A simple way to accomplish this is:
- Come up with 5 potential causes
- Estimate the probability of each causing the event
- Put the causes in order from most to least likely
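As a rough sketch, the listing and ranking above might look like this in Python. The causes and probabilities are invented for illustration (assuming a top event such as an HVAC system that cannot hold zone temperature); in practice they would come from your own data and your team's judgment.

```python
# Hypothetical causes for a top event such as
# "HVAC system cannot hold the required zone temperature".
# The probabilities are illustrative guesses, not measured data.
potential_causes = {
    "Clogged air filter": 0.30,
    "Failed thermostat sensor": 0.15,
    "Low refrigerant charge": 0.25,
    "Stuck damper actuator": 0.10,
    "Control software misconfiguration": 0.05,
}

# Sort the causes from most to least likely.
ranked = sorted(potential_causes.items(), key=lambda item: item[1], reverse=True)

for rank, (cause, probability) in enumerate(ranked, start=1):
    print(f"{rank}. {cause}: {probability:.0%}")
```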
Another way that this can be done is by estimating the failure probability of the most vulnerable areas of the system or event in question. Companies and professionals that are familiar with fault tree analysis develop their own methods over time.
This part of the process is very flexible.
4. Draw the fault tree diagram
Now you are ready to draw or otherwise create your fault tree diagram. Starting with the top event, map out its potential causes beneath it. Then connect the events with “and” or “or” gates, and keep breaking each cause down until you arrive at potential base-level causes. You will end up with something very similar to a flow chart.
It’s important to note that this fourth step relies heavily on the other steps in order for it to actually work. If your diagram is getting messy or clogged, go back to the first three steps and make sure that you are working off of a solid foundation.
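If you prefer to draft the tree as data before drawing it, a small Python structure is one possible way to do so. This is a sketch only: the `Node` class, the `render` helper, and the lighting example are hypothetical names chosen for illustration, not part of any standard tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One event in the tree. `gate` is "AND", "OR", or None for a base event."""
    name: str
    gate: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented text lines, top event first."""
    label = f"{node.name} [{node.gate}]" if node.gate else node.name
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical tree: the top event sits at the root, base events at the leaves.
tree = Node("Room not lit", "OR", [
    Node("No working bulb", "AND", [
        Node("Bulb A burnt out"),
        Node("Bulb B burnt out"),
    ]),
    Node("Wiring fault"),
])

print("\n".join(render(tree)))
```

The printed outline lists the top event first, with each level of indentation stepping one layer closer to the base-level causes.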
5. Assess risk
The next step is to assign a risk and probability level to each base event. This can get very complicated and is again heavily reliant on the first three steps. Some simple things that you can do in order to better assess the correct risk include:
- Relying on as much data as is feasible
- Projecting your existing data into the future
- Consulting with the people who know the systems the best
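Once each base event has a probability, the gates let you roll those numbers up to the top event. The sketch below assumes the base events are independent of one another, which is a simplification that real systems often violate. The probabilities and names are made up for illustration.

```python
from math import prod


def and_probability(probabilities):
    """P(all inputs occur), assuming the inputs are independent."""
    return prod(probabilities)


def or_probability(probabilities):
    """P(at least one input occurs), assuming the inputs are independent."""
    return 1 - prod(1 - p for p in probabilities)


# Illustrative yearly failure probabilities, not real data.
p_bulb_a = 0.10
p_bulb_b = 0.10
p_wiring = 0.02

# "No working bulb" needs both bulbs to fail (AND gate).
p_no_working_bulb = and_probability([p_bulb_a, p_bulb_b])

# "Room not lit" happens if there is no working bulb or the wiring fails (OR gate).
p_room_not_lit = or_probability([p_no_working_bulb, p_wiring])

print(f"{p_room_not_lit:.4f}")
```

In this made-up case the wiring dominates the result: both bulbs failing together is only a 1% event, so the 2% wiring fault contributes most of the top-event probability.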
6. Mitigate risk
Finally, the last part of the process is taking steps to mitigate the highest-risk and highest-probability events. Again, if the other parts of the process have been done well, this last piece will flow right out of them. That’s one of the best indications of the quality of your fault tree analysis: does this step follow naturally from the ones before it?
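One simple way to decide where to focus is a what-if comparison: recompute the top-event probability as if each base event had been eliminated, and see which fix lowers it the most. The sketch below is an assumption-laden illustration. It hard-codes a hypothetical lighting tree (two bulbs under an “and” gate, combined with a wiring fault under an “or” gate), treats the base events as independent, and uses invented probabilities.

```python
def top_event_probability(p):
    """Top-event probability for the hypothetical lighting tree."""
    no_working_bulb = p["bulb_a"] * p["bulb_b"]            # AND gate
    return 1 - (1 - no_working_bulb) * (1 - p["wiring"])   # OR gate


# Illustrative base-event probabilities, not real data.
base = {"bulb_a": 0.10, "bulb_b": 0.10, "wiring": 0.02}
baseline = top_event_probability(base)

# For each base event, see how much the top event improves if it never occurs.
reductions = {}
for event in base:
    what_if = {**base, event: 0.0}
    reductions[event] = baseline - top_event_probability(what_if)

# Largest reduction first: the most valuable event to mitigate.
ranked = sorted(reductions, key=reductions.get, reverse=True)

for event in ranked:
    print(f"{event}: reduces top-event probability by {reductions[event]:.4f}")
```

Here fixing the wiring helps most, because it can cause the top event on its own, while either bulb only matters in combination with the other.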
Now that we’ve looked over the process in detail, what does it look like in action?
A fault tree analysis example
Here is a visualization of a fault tree analysis in action. This representation shows the possible ways an error occurred and the number of potential events that added up to a failure.
The initial problem is clearly defined at the top of the diagram, with the various events outlined briefly and succinctly. It’s set up very well for someone to determine what the next steps should be for this particular problem.
While this is a simple fault tree analysis, fault trees can be as complicated as you need them to be. The process remains the same.
Benefits of fault tree analysis
What are some of the direct and indirect benefits of fault tree analysis that other methods don’t offer? The top three benefits of a fault tree analysis include:
It accounts for human error
Many people focus on the faults of the tools, the system, or other issues that do not involve people. A fault tree analysis takes into account the people that work the system and the various bottlenecks that they can create.
It focuses on one fault at a time
When you use a fault tree, you can break down a web of failure into a series of issues that can be solved in a much more organized way.
It highlights important system elements that are contributing to the failure in question
When something breaks, people want to know what it is. Fault trees can get you that information, unlike other reactive methods.
Other major benefits include its systematic approach, its easy implementation, and the addition of another tool to your analysis toolkit. This raises the question: how does a fault tree analysis compare to other analysis methods?
What about different analysis methods?
We’ve talked a lot about the process and the thought behind fault tree analysis methods. How do they differ from other fault-based methods, specifically FMEA and event tree analysis?
Fault tree analysis vs. FMEA
At first glance, these processes may seem very similar. They both analyze failure, and both suggest ways to prevent and reduce risk. What is the difference?
Simply put, a fault tree analysis uses a top-down approach starting with a failure event, while FMEA employs a bottom-up approach starting with all potential failure modes.
It may be helpful to think of FMEA as the opposite of a fault tree analysis. They examine the same event from different perspectives, using different processes. This makes the two methods a very cooperative pair, and there is great benefit in using them in tandem when deeper analysis is required. If only one can be used, the decision should be made after a careful look at the company’s needs and existing problem-solving structures.
Fault tree analysis vs. event tree analysis
Event tree analysis takes a different approach again. An event tree starts from a single initiating event and works forward, branching at each point where something can succeed or fail, to map out the possible outcomes. A fault tree starts from the failure and works backward to its causes. While they are both “tree” forms of thinking, the event tree is very different from the fault tree.
Perhaps the most dramatic difference is that event tree analysis is typically used in finance, banking, and other specialized industries as opposed to a fault tree, which can be used across many different industries.
Fault tree analysis is a powerful tool in the maintenance management field and beyond. It provides a scalable, repeatable process of discovery that is fairly easy to learn and implement. When used with other analytical methods, such as FMEA and event tree analysis, its effectiveness can quickly increase.
However, it does rely on accurate data and sound predictions. If the early steps of the process are rushed, the whole analysis is apt to fall apart. If a company cannot dedicate adequate time or resources to it, it may be better not to implement a fault tree analysis process at all.
But companies that do invest in it can discover the true root of the challenges they are facing.