Root Cause Analysis Techniques and Fundamentals

Answered November 24 2020


Root cause analysis (RCA) is a key part of a successful maintenance program. However, to conduct one correctly and effectively, it’s critical to understand the components, tools, and training required. In addition, sharing the process, the reasons behind the data collection, the changes required, and the results can help fuel a culture that values RCA processes and procedures.

What Is Root Cause Analysis?

An RCA is a process that identifies a factor that causes a nonconformance, and then eliminates that factor through a process improvement. Although many root cause analyses are performed, it seems rare that the root cause itself is actually permanently eliminated.

A root cause is a core issue, or the highest-level cause that sets in motion an entire cause-and-effect reaction that ultimately leads to the problem. Contributing factors, or failure threads, need to be traced, so an organization can ultimately get to the roots of the problem.

RCA is, therefore, a collective term that describes a wide range of approaches, tools, and techniques used to uncover various causes of problems. Some RCA approaches are more geared toward identifying true root causes, while others are more general problem-solving techniques.

Unfortunately, many companies do not really understand RCA or use it effectively. In some cases, a few people might tinker with RCA but not consistently. Some companies may take on RCA without formal training; however, the many RCA techniques need to be formally learned and experienced. It’s critical that the first RCA application is a quick win, so team members can measure and see its effectiveness. Having the right data to find the root cause, understanding the process to conduct an RCA, and then making a permanent change to eliminate the cause of the problem are critical steps to success.

Five Reasons Why Companies Use RCA

Ultimately, companies have five main reasons to use RCA in their business:

  1. Mitigate failures
  2. Optimize asset reliability
  3. Optimize process reliability
  4. Reduce costs
  5. Reduce stress

Many companies may have established a maintenance process, but over time, it can deteriorate and fall back into a mostly reactive mode. RCA can help provide more ongoing structure through five key tools: the Pareto chart, the Ishikawa (fishbone) diagram, the five whys, the scatter diagram, and failure modes and effects analysis (FMEA).

Root Cause Analysis Triggers

Many events within an organization can trigger an RCA. Here are the top triggers:

Resources: An RCA trigger may be determined by RCA type and the number of resources applied. For example, a company sees that its rolling element bearing consumption has trended upward over six months. This trend serves as the trigger; an RCA may discover that a lack of lubrication or improper installation is the root cause of the problem.

Cost: Cost triggers may include maintenance labor hours, cost of restoration, cost of rework and material management, or cost of parts. Understanding those costs and the root causes of those costs is important.

Asset Failures: Obviously, if you have critical assets failing every few months, this is a trigger. Paying attention to failures after repairs or rebuilds can illustrate that you’re not addressing the true root of the problem. 

Equipment Downtime: Companies may specify that if downtime exceeds a certain level, an RCA is triggered to discover the cause.

Overnight Delivery Costs: Although few companies measure this trigger, it can be a simple and very revealing one. Businesses that cannot track purchases or materials often spend a great deal of money rush-shipping parts.

Process Failures: Manufacturers who track process stability can establish triggers in each process. For example, a stop that lasts longer than six minutes could be a trigger to search for a root cause.

Employee Safety Incident: Any time a company experiences an accident or has an OSHA safety violation, that should serve as an RCA trigger to prevent similar incidents in the future.

Environmental Failures: Situations such as major spills or leaks that cause environmental damage should serve as a trigger for a thorough root cause investigation.

In addition, visual scorecards that show progress for key metrics can help identify triggers for RCA. Scorecards might include things like preventive maintenance compliance, schedule compliance, work order closeout accuracy, rework levels, or maintenance costs. Team members need to see progress and where problems lie in order to help them identify the triggers that can lead to RCA.
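The threshold-style triggers described above (downtime, process stops, overnight delivery costs) can be expressed as simple rules. The following is a minimal sketch, not a prescribed method: the `Trigger` class, the metric names, and every limit are hypothetical values chosen for illustration, and each site would set its own.

```python
from dataclasses import dataclass


@dataclass
class Trigger:
    """A hypothetical threshold rule: exceed the limit and an RCA is opened."""
    metric: str
    limit: float
    unit: str


def fired_triggers(observed, triggers):
    """Return the triggers whose observed value exceeds the limit."""
    return [t for t in triggers if observed.get(t.metric, 0) > t.limit]


# Illustrative limits only (assumptions, not values from the webinar).
TRIGGERS = [
    Trigger("downtime_hours_per_month", 8.0, "h"),
    Trigger("longest_process_stop_minutes", 6.0, "min"),
    Trigger("overnight_delivery_cost_per_month", 1000.0, "USD"),
]

# Hypothetical scorecard values for one month.
observed = {
    "downtime_hours_per_month": 11.5,
    "longest_process_stop_minutes": 4.0,
    "overnight_delivery_cost_per_month": 2300.0,
}

for t in fired_triggers(observed, TRIGGERS):
    print(f"RCA trigger: {t.metric} = {observed[t.metric]} {t.unit} (limit {t.limit})")
```

With these sample numbers, the downtime and overnight delivery rules fire, while the process stop rule does not.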

What Is the Five Whys Method?

The five whys is an interrogative technique used to explore the cause-and-effect relationships underlying a particular problem. The primary goal of this technique is to determine the root cause of a defect or problem by repeating the question “why?” Each answer forms the basis for the next question.

For example, a company is not meeting its production goals. Why? Because breakdowns are high. Why? Because there’s no planning and scheduling. Why? Because we tried that, and it didn’t work. Why? Because there was a lack of discipline among team members. That provides you with the root cause that needs to be addressed and eliminated.

What Is an Ishikawa Fishbone Diagram?

The Ishikawa fishbone diagram shows the potential causes of a specific event. It helps identify the factors that contribute to an outcome, such as maintenance performance. Companies may use this tool for maintenance process design, finding quality defects, and exploring other cause-and-effect relationships.

A company may use the fishbone to discuss key performance indicators (KPIs), the need for a maintenance dashboard, or whether Responsible, Accountable, Consulted and Informed (RACI) charts are required. You can brainstorm ideas about materials, secure storerooms, roles, and responsibilities.

For example, if employees have free access to a storeroom but fail to charge a part to a work order, this can increase costs. Understanding personnel aptitude and attitude as well as maintenance knowledge and skills can help you discover root causes.

Ishikawa is mainly used for brainstorming, so companies can determine what factors are contributing to a current problem state.

Data Analysis and Tools

Once you have conducted your RCA, you will want to be able to analyze your data. The following are the most common tools that can help you do just that.

Pareto Chart

A Pareto chart combines a bar graph and a line graph. Individual values are represented by bars in descending order, and the cumulative total is represented by the line. A Pareto chart can help you visualize something like production losses by equipment. Although these charts can be very valuable in showing you the data, it’s important that the business then acts to change those numbers when needed.
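The numbers behind a Pareto chart are straightforward to compute: sort the categories from largest to smallest (the bars) and accumulate a running percentage (the line). The sketch below assumes hypothetical production losses by equipment; the equipment names and unit counts are invented for illustration.

```python
def pareto_table(losses):
    """Sort categories by loss (largest first) and add cumulative percent."""
    total = sum(losses.values())
    if total == 0:
        return []
    rows = []
    running = 0
    for name, value in sorted(losses.items(), key=lambda kv: kv[1], reverse=True):
        running += value
        rows.append((name, value, round(100 * running / total, 1)))
    return rows


# Hypothetical production losses (units) by equipment.
losses = {"Filler": 120, "Conveyor": 450, "Labeler": 80, "Palletizer": 350}

for name, value, cum_pct in pareto_table(losses):
    print(f"{name:<12}{value:>6}{cum_pct:>8}%")
```

In this sample data, the first two assets account for 80% of the losses, which is the kind of concentration a Pareto chart is meant to expose.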

Scatter Diagram

A scatter diagram is a type of plot that uses coordinates to display the values of two variables, typically, for a set of data. For example, you can plot preventive maintenance labor hours versus emergency or urgent labor hours to show whether the latter are out of control or becoming more stable.
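A scatter diagram is read by eye, but the relationship it shows can also be summarized with a correlation coefficient. The sketch below is an addition for illustration, not something covered in the webinar: it computes the Pearson correlation between weekly preventive maintenance hours and emergency hours, using invented sample data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: raises ZeroDivisionError if either series is constant.
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly labor hours (same weeks, same order).
pm_hours = [40, 35, 30, 25, 20]
emergency_hours = [10, 14, 21, 24, 31]

r = pearson_r(pm_hours, emergency_hours)
print(f"correlation between PM and emergency hours: {r:.2f}")
```

A value close to -1 in this sample suggests that emergency hours rise as preventive maintenance hours fall. Correlation alone does not establish cause, so the result is a prompt for investigation rather than a conclusion.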

Failure Modes and Effects Analysis

FMEA helps companies identify risks within any operating environment. It helps organizations focus on the things that are causing the highest level of risk to that business.

For example, let’s say that maintenance planning is not meeting expectations. A company can document the process being reviewed, the issue, the revision of the process, the data review, and the FMEA owner or coordinator. Then, consider the first process step, the potential failure mode, and the impact or severity of the failure. Once you have that information, you look for controls that can be put in place to prevent the cause of failure and assign a risk priority number (RPN), which combines ratings for severity, occurrence, and detection.

An FMEA helps companies establish their priorities. For example, if job planning is not working, companies may need to prioritize training, increase the level of discipline among the team, or secure leadership buy-in.
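The RPN is calculated by multiplying ratings for severity, occurrence, and detection, and the failure modes with the highest RPN are addressed first. The sketch below applies that to the maintenance planning example. The `FailureMode` class, the three failure modes, their ratings, and the 1-to-10 rating scale are all assumptions made for illustration; use whatever scale your own FMEA procedure defines.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    step: str
    mode: str
    severity: int    # assumed scale: 1 (minor) to 10 (severe)
    occurrence: int  # assumed scale: 1 (rare) to 10 (frequent)
    detection: int   # assumed scale: 1 (easily detected) to 10 (rarely detected)

    @property
    def rpn(self):
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes for a maintenance planning process.
failure_modes = [
    FailureMode("Scheduling", "PM deferred for production", 8, 6, 3),
    FailureMode("Job planning", "Parts not kitted before job start", 7, 8, 4),
    FailureMode("Work order closeout", "Failure codes left blank", 5, 7, 6),
]

# Highest RPN first: this is the order in which to prioritize action.
ranked = sorted(failure_modes, key=lambda fm: fm.rpn, reverse=True)

for fm in ranked:
    print(f"{fm.rpn:>4}  {fm.step}: {fm.mode}")
```

Note that the failure mode with the highest severity rating is not necessarily the one with the highest RPN, which is why the three ratings are considered together.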

Creating an Effective Failure Mitigation RACI Chart

A RACI chart can help companies manage the people who are helping not only to identify root cause triggers but also to resolve them.

In order to create an effective failure mitigation RACI, organizations must assemble a team to eliminate and mitigate failures. For example, in the maintenance planning example, this team may include a planner, supervisor, technician, reliability engineer, and production manager.

The team should be trained in failure mitigation strategy, which will involve defining the process as tasks and steps required for a successful failure elimination effort. This RACI process should be visible to all team members along with a KPI dashboard focused on resolutions. Most importantly, each step in the process should have a single person accountable for that task.
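The rule that each step has a single accountable person is easy to check mechanically. The sketch below assumes a RACI chart stored as a nested dictionary with one letter code per person per step, which is a simplification. The step names and assignments are hypothetical, though the roles come from the maintenance planning team described above.

```python
def accountability_problems(raci):
    """Return {step: names marked "A"} for steps without exactly one Accountable."""
    problems = {}
    for step, roles in raci.items():
        accountable = [person for person, code in roles.items() if code == "A"]
        if len(accountable) != 1:
            problems[step] = accountable
    return problems


# Hypothetical chart: R = Responsible, A = Accountable, C = Consulted, I = Informed.
raci = {
    "Collect failure data": {
        "Technician": "R", "Reliability engineer": "A", "Planner": "C",
    },
    "Identify root cause": {
        "Reliability engineer": "R", "Supervisor": "A", "Production manager": "A",
    },
    "Implement corrective action": {
        "Planner": "R", "Technician": "R", "Production manager": "I",
    },
}

for step, names in accountability_problems(raci).items():
    found = ", ".join(names) if names else "nobody"
    print(f"{step}: expected exactly one Accountable, found {found}")
```

In this sample chart, one step has two accountable people and another has none, so both are flagged.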

Root Cause Analysis Training

RCA training should be provided for every team member, whether they’re leading investigators or simply supporting the effort. Each member should understand his or her responsibilities, so the RCA effort can work as a whole system, not just a collection of individual tasks.

For example, a company reported bearing failures seven times over the course of two years. Preventive maintenance was deferred 22 times due to production requirements, and the company lost 2,300 units or $220,000 in production. Preventive maintenance compliance was at 90%, maintenance labor cost was $2,400, and maintenance material cost was $4,500. The company discovered similar assets having the same type of failure.

After investigating the trigger, the company discovered that over-lubrication was causing bearing seal failure, which in turn led to bearing failure. One of the main contributing factors was the lack of effective preventive maintenance procedures. This was a training issue for technicians. By sharing a report with technicians company-wide, this organization could show how the RCA identified the failure and why lubrication practices needed to be better defined and taught. The related KPI should then be shared over the next 18 months to illustrate the impact of the training changes.

Miscellaneous Tips

The webinar wrapped up with questions and answers:

What’s the difference between RCA and FMEA?

FMEA is just a tool in the RCA toolkit.

Is it possible to implement RCA techniques for a new plant without any breakdown history?

Each piece of equipment should have a failure history, which can be used to create risk assessments. Perform an RCA on risk as opposed to waiting for the consequences. Once you baseline the equipment and start using it, you can evaluate problems that arise.

When it comes to identification of a root cause of any failure, is it logical to investigate all or any failures?

In some cases, one failure causes another failure to occur, which is why it’s important to consider the entire failure thread.

What is the difference between FMEA and FMECA?

FMECA (failure modes, effects, and criticality analysis) takes criticality into account, whereas FMEA looks only at severity times probability.

When is RCA reactive, and when is it proactive?

If you don’t have a process, then RCA is reactive. 

Which method is best used to initiate proactive brainstorming?

It’s important to start with education, so the team understands what RCA is and how it can help. That’s the foundation of success.

Can you please recommend books on RCA?

There are many good books out there, including ones by Bob Latino, Susan Lubell, and Ricky Smith. It’s more important to find the particular approach that meets your needs at your facility and stick with it. Look at the basics of any investigative occupation: collect evidence, establish a team, and graphically reconstruct what happened. Then do something to fix the problem, and measure to show that the result is better.

Is there any other methodology that could help us assure the assumptions established on the Ishikawa diagram or five whys are valid?

Be sure you start with facts and not just hearsay. Just because someone claims something, that is not enough to go on. Back that up with evidence, whether that be parts, position, people, paradigms, or paper. Consider an evidence-based RCA such as PROACT.

Who typically defines what is the most critical of failures? Is it purely financial in most cases? What if one division is constantly being pointed to, but they are high up on the food chain?

One method is to examine the risk priority number (RPN). Calculate it by multiplying measures of severity, detection, and occurrence together, which should give you an idea of where you need to focus. This method will help you see the important failures.

However, remember that the total cost of the failure over the life cycle of the facility is what's most important. This includes the safety, environmental, and reputational costs. It's important to see the whole picture because sometimes the small chronic failures that add up every day get missed when they should be solved.

Is it better to attack reliability from a process level instead of at the system component level?

It’s important to start with failures and where you have your worst problem. For example, asset reliability may be on a production line where a bottleneck occurs.

Conclusion

At the end of the day, root cause analysis only works if a company has management buy-in, leadership support, comprehensive training, and a way to show progress. Maintenance teams can be naturally inquisitive, and when they see that certain things are happening on the shop floor, they will start to look for those critical root causes. If team members understand that changes in processes, procedures, or training affect those numbers, which in turn impact things like production goals, costs, and profit, they will become excellent long-term supporters and proponents of RCA.

Note: This article is based on a webinar “Root Cause Analysis Techniques and Fundamentals” with Ricky Smith. To view the recording of the webinar, visit this link.