Fault Tree Analysis & Examples: What It Is, How to Do It, and Why It’s Important

A fault tree analysis is a process you can use to improve the operations of your maintenance teams by helping them understand why failures occurred. Visual representations make it easier for everyone on the team to be on the same page.

A fault tree analysis shows maintenance team members a visual representation of how a problem occurred and the potential pathways that led to the main failure event.

Picture a challenge, issue, or problem that has been plaguing you or your work for the past week, month, or even a year. Maybe it’s a leaking tap, a flashing fluorescent light bulb, or a machine that hasn’t been quite up to scratch for some time.

Most people would honestly like to know what is going on. It would be nice to fix these problems now. But for some reason, no one gets around to it until disaster strikes.

That’s where a fault tree analysis comes into play.

What is a fault tree analysis?

Fault tree analysis is a systematic approach to identifying the main cause of an event with the use of a fault tree diagram. It can also be viewed as a framework that guides you in systematically transforming available information into a concrete plan of action.

This process provides a logical sequence that helps you discover the root causes of the event in question. And when a fault tree analysis is used in tandem with FMEA or another analysis method, it can provide a better overall picture for maintenance decisions.

How can companies implement this type of process into their company systems? What are the main components that make fault tree analysis work? And what are some good reasons to use this analysis in the first place?

Why use fault tree analysis?

New processes are generally regarded with suspicion, particularly when the system in question already works. What’s the main reason companies should use a fault tree analysis?

Companies should use fault tree analysis in order to discover the true roots of the problems that they are facing.

What does that look like in practice? Fault tree analyses are generally used in the fields of safety and reliability engineering because they are an effective way to discover flaws in many different situations. Some uses of the fault tree analysis process include:

  • Understanding the series of events that lead to a flaw in the system or machine
  • Demonstrating compliance with safety rules and regulations, such as ADA
  • Minimizing and optimizing the resources that are being spent by the company
  • Assisting in reviewing, overhauling, or redesigning systems

One of the best parts about this process is that it is a pre-built system that companies can simply slot into place. Let’s take a look at the components of a fault tree and how any company can run a simple fault tree analysis right away.

The 3 main components of a fault tree analysis

Every fault tree runs on three components:

  • The diagram of the process
  • The events that have happened and to which the diagram is being applied
  • The gates, or the connections between events

Here's a look at each one of these components and how it works in a company setting. 

Fault tree diagram

The first piece of a fault tree analysis is the diagram of events. This framework is basically a flowchart. The actual analysis is performed by drawing a series of logical deductions that start with the failure event and trace back through the diagram to the root cause.

Events

The next piece of the puzzle is the events that have happened. In a fault tree analysis, an event is an occurrence in the system. Events can be divided into two categories: input events, which lead to other events, and output events, which are the result of input events.

Events are everything that has happened and/or what could have happened. An event is a cause, or a partial contributor, of the situation at hand.

Figure: FTA diagram symbols. The different symbols on FTA diagrams indicate specific meanings.

Gates

Events are connected using “and” and “or” gates. If an output event occurs only when all of its input events occur together, the inputs are connected with an “and” gate. If any one of the input events is enough to cause the output event, they’re connected with an “or” gate.

For example, let’s say a room isn’t lit properly. If the room has a single light and goes dark when either the bulb burns out or the wiring fails, those two input events are connected to the output event by an “or” gate. If the room has two bulbs and goes dark only when both of them burn out, those two input events are connected by an “and” gate.
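As a minimal sketch of the gate logic, the two gate types can be written as Boolean functions: an “and” gate fires only when all of its inputs occur, and an “or” gate fires when at least one does. The function names and the two light bulb scenarios below are illustrative assumptions, not part of any standard tool.

```python
def and_gate(*inputs: bool) -> bool:
    """Output event occurs only if every input event occurs."""
    return all(inputs)


def or_gate(*inputs: bool) -> bool:
    """Output event occurs if at least one input event occurs."""
    return any(inputs)


# Hypothetical scenario 1: a single light. Either fault alone darkens the room.
bulb_burnt_out = True
wiring_failed = False
room_dark_single_bulb = or_gate(bulb_burnt_out, wiring_failed)

# Hypothetical scenario 2: two bulbs. The room is dark only if both are out.
bulb_a_burnt_out = True
bulb_b_burnt_out = False
room_dark_two_bulbs = and_gate(bulb_a_burnt_out, bulb_b_burnt_out)

print(room_dark_single_bulb)  # True
print(room_dark_two_bulbs)    # False
```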

Now that the basic building blocks of a fault tree analysis have been explained, how do they all come together?

How to do a fault tree analysis in 6 steps

1. Define the top event

This is the starting point for the diagram: what specifically went wrong? The more precise your starting event is, the better the process will flow. Examples of well-defined top events include:

  • The electrical system went down across the entire building
  • The HVAC system cannot keep the required zones hot or cold
  • An important part of a machine keeps failing constantly
  • Regulations have changed and the company must be in accordance with them by a certain date

The purpose of this definition is to put into words what exactly is wrong. If the top event is too broad, the process will not work as well. The best results are reached when it is narrowly defined and tightly contained.

2. Understand the system

The next step is to obtain as much information on the system as possible. Some sample questions that could be used are as follows.

  • What are the different components of the system? How do they all work together ideally? How do they actually work together?
  • Is the failure mechanical? Electric? Software?
  • What do the schematics show? Do you have boundary diagrams?
  • What are the code requirements? Are the proposed changes actually realistic?
  • What are your system engineers’ thoughts and opinions?
  • How do similar systems work?

The aim of this step is to figure out how the system worked before the top event became a major or debilitating problem.

3. List potential causes of the top event

The next step of the process is to list the potential causes of the top event. A simple way to accomplish this is:

  • Come up with five potential causes
  • Estimate the probability of each causing the event
  • Put the causes in order from most to least probable

Another way that this can be done is by estimating the failure probability of the most vulnerable areas of the system or event in question. Companies and professionals that are familiar with fault tree analysis develop their own methods over time.

This part of the process is very flexible.
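The simple list-estimate-rank approach can be sketched in a few lines of code. The causes and probability values below are made-up placeholders for a hypothetical HVAC top event; real estimates would come from your own records and the people who know the system.

```python
# Hypothetical causes and rough probability estimates for a top event such as
# "the HVAC system cannot keep the required zones hot or cold".
potential_causes = {
    "Thermostat miscalibrated": 0.20,
    "Clogged air filter": 0.30,
    "Control software fault": 0.05,
    "Refrigerant leak": 0.15,
    "Failed blower motor": 0.10,
}

# Order the causes from most to least probable.
ranked_causes = sorted(
    potential_causes.items(), key=lambda item: item[1], reverse=True
)

for rank, (cause, probability) in enumerate(ranked_causes, start=1):
    print(f"{rank}. {cause}: {probability:.0%}")
```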

4. Draw the fault tree diagram

Now you are ready to draw or otherwise create your fault tree diagram. Starting with the top event, map out the different potential causes of the top event in some shape or form. Then connect each step with “and” or “or” gates, until you arrive at potential base-level causes. You will end up with something very similar to a flow chart.

It’s important to note that this fourth step relies heavily on the other steps in order for it to actually work. If your diagram is getting messy or clogged, go back to the first three steps and make sure that you are working off of a solid foundation.
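If it helps to draft the tree before drawing it, the structure can be captured as nested nodes and printed as an indented outline. This is only a sketch: the `Node` class, the `render` helper, and the room lighting example are assumptions made for illustration, not a standard notation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    gate: Optional[str] = None          # "AND", "OR", or None for a base event
    children: List["Node"] = field(default_factory=list)


def render(node: Node, depth: int = 0) -> List[str]:
    """Return the tree as indented text lines, top event first."""
    label = f"{node.name} [{node.gate}]" if node.gate else node.name
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical tree for the top event "Room is not lit".
tree = Node("Room is not lit", "OR", [
    Node("No power to fixture", "OR", [
        Node("Breaker tripped"),
        Node("Wiring failed"),
    ]),
    Node("Both bulbs burnt out", "AND", [
        Node("Bulb A burnt out"),
        Node("Bulb B burnt out"),
    ]),
])

outline = render(tree)
print("\n".join(outline))
```

Nodes without a gate are the base-level causes, so an outline that keeps growing without reaching them is a hint that the top event or the cause list needs another look.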

5. Assess risk

The next step is to assign a risk level and a probability to each base event. This can get very complicated and is again heavily reliant on the first three steps. Some simple things that you can do to assess risk more accurately include:

  • Relying on as much data as is feasible
  • Projecting your existing data into the future
  • Consulting with the people who know the systems the best
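Once base events have probability estimates, one common way to roll them up to the top event is to multiply probabilities through an “and” gate and to take one minus the product of the complements through an “or” gate. That arithmetic only holds if the base events are independent, which is an assumption here, and the probability values are invented for a hypothetical room lighting tree.

```python
from math import prod


def and_probability(probabilities):
    """P(all inputs occur), assuming the inputs are independent."""
    return prod(probabilities)


def or_probability(probabilities):
    """P(at least one input occurs), assuming the inputs are independent."""
    return 1 - prod(1 - p for p in probabilities)


# Hypothetical yearly probabilities for each base event.
p_breaker_tripped = 0.02
p_wiring_failed = 0.01
p_bulb_a_burnt_out = 0.10
p_bulb_b_burnt_out = 0.10

# "No power to fixture" = breaker tripped OR wiring failed.
p_no_power = or_probability([p_breaker_tripped, p_wiring_failed])
# "Both bulbs burnt out" = bulb A AND bulb B.
p_both_bulbs = and_probability([p_bulb_a_burnt_out, p_bulb_b_burnt_out])
# Top event "Room is not lit" = no power OR both bulbs burnt out.
p_top_event = or_probability([p_no_power, p_both_bulbs])

print(f"{p_top_event:.4f}")  # 0.0395
```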

6. Mitigate risk

Finally, the last part of the process is taking steps to mitigate the highest-risk and highest-probability events. Again, if the other parts of the process have been done well, this last piece will flow right out from them. That’s one of the best indications of the quality of your fault tree analysis: does this step follow naturally from the ones before it?
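One rough way to decide where to mitigate first is a what-if comparison: set each base event's probability to zero in turn and see how much the top-event probability drops. The sketch below does this for a hypothetical room lighting tree with invented probabilities, and it assumes independent base events. It is a simple illustration, not a formal importance measure.

```python
def top_event_probability(p):
    """Top event for a hypothetical tree:
    (breaker tripped OR wiring failed) OR (bulb A AND bulb B burnt out)."""
    no_power = 1 - (1 - p["breaker_tripped"]) * (1 - p["wiring_failed"])
    both_bulbs = p["bulb_a_burnt_out"] * p["bulb_b_burnt_out"]
    return 1 - (1 - no_power) * (1 - both_bulbs)


# Hypothetical yearly probabilities for each base event.
baseline_inputs = {
    "breaker_tripped": 0.02,
    "wiring_failed": 0.01,
    "bulb_a_burnt_out": 0.10,
    "bulb_b_burnt_out": 0.10,
}
baseline = top_event_probability(baseline_inputs)

# For each base event, pretend it is fully mitigated and record the reduction.
reductions = {}
for event in baseline_inputs:
    mitigated_inputs = {**baseline_inputs, event: 0.0}
    reductions[event] = baseline - top_event_probability(mitigated_inputs)

ranked_mitigations = sorted(
    reductions.items(), key=lambda item: item[1], reverse=True
)

for event, reduction in ranked_mitigations:
    print(f"{event}: top-event probability drops by {reduction:.4f}")
```

With these made-up numbers, eliminating the tripped breaker gives the largest reduction, even though a single bulb is more likely to fail, because either bulb alone cannot cause the top event.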

Now that we’ve looked over the process in detail, what does it look like in action?

A fault tree analysis example

Here is a visualization of a fault tree analysis in action. This representation shows the possible ways an error occurred, as well as the number of potential events that added up to a failure.

The initial problem is clearly defined at the top of the diagram, with the various events outlined briefly and succinctly. It’s set up very well for someone to determine what the next steps should be for this particular problem.

While this is a simple example, a fault tree analysis can be as complicated as you need it to be. The process remains the same.

Figure: fault tree analysis diagram

Benefits of fault tree analysis

What are some of the direct and indirect benefits of fault tree analysis that other methods don’t offer? The top three benefits of a fault tree analysis include:

It accounts for human error

Many people focus on the faults of the tools, the system, or other issues that do not involve people. A fault tree analysis takes into account the people that work the system and the various bottlenecks that they can create. 

It focuses on one fault at a time

When you use a fault tree, you can break down a web of failure into a series of issues that can be solved in a much more organized way. 

It highlights important system elements that are contributing to the failure(s) in question

When something breaks, people want to know what it is. Fault trees can get you that information, unlike other reactive methods.  

Other major benefits include its systematic approach, its easy implementation, and the addition of another tool to your analysis toolkit. This raises the question: how does a fault tree analysis compare to other analysis methods?

What about different analysis methods?

We’ve talked a lot about the process and the thought behind fault tree analysis methods. How do they differ from other fault-based methods, specifically FMEA and event tree analysis?

Fault tree analysis vs. FMEA

At first glance, these processes may seem very similar. They both analyze failure, and they both point to ways of preventing and reducing risk. What is the difference?

Simply put, a fault tree analysis uses a top-down approach starting with a failure event, while FMEA employs a bottom-up approach starting with all potential failure modes.

It may be helpful to think about FMEA as the opposite of a fault tree analysis. They examine the same event from different perspectives and by using different processes. This makes the two methods a very cooperative pair. There is great benefit in using them in tandem when deeper analysis is required. If only one can be used, the decision should be made after a careful look at the company’s needs and existing problem-solving structures.

Fault tree analysis vs. Event tree analysis

Event tree analysis takes a different approach to a problem or question. Where a fault tree starts with a failure and works backward to its causes, an event tree starts with an initiating event and works forward, step by step, through the possible outcomes that could follow. While they are both “tree” forms of thinking, the event tree is very different from the fault tree.

Another difference is where each method is used. Event tree analysis is most often associated with risk assessment in specialized, safety-critical industries, such as nuclear power and chemical processing, whereas a fault tree can be used across many different industries.

In summary

Fault tree analysis is a powerful tool in the maintenance management field and beyond. It provides a scalable, repeatable process of discovery that is fairly easy to learn and implement. When used with other analytical methods, such as FMEA and event tree analysis, its effectiveness can quickly increase.

However, it does rely on accurate data and smart predictions. If the early steps of the process are rushed, the whole analysis is apt to fall apart. If companies cannot dedicate adequate time or resources to it, it may be better not to try to implement a fault tree analysis process.

But companies that invest in it can discover the true root of the challenges they are facing.
