The ultimate guide to creating a reliability centered maintenance program at your facility

Key Takeaways:

  • RCM is a methodical system to help you select the right maintenance strategy for each failure mode
  • An easy step-by-step approach considers cost of failure, ease of tracking, severity of failure, and occurrence rate in the analysis
  • Reports and KPIs help refine overall strategy
  • Repeating the process fosters continuous RCM improvement

Overview

Reliability centered maintenance (RCM) is a process that looks at the maintenance levels required to keep a company’s assets operating effectively, efficiently, and reliably. During an RCM analysis, you’ll need to determine failure modes and their effects. Then, you will logically create mitigating action items for each possible failure mode. After implementing those action items and collecting the resulting data, the RCM analysis will repeat to create a cycle of continuous improvement.

Since RCM can be applied to such a wide range of assets, industries, and facilities, you’ll want to learn about the overall process and analysis first. Once you understand the theory behind RCM, you’ll be better able to understand how to apply RCM to your unique facility.

In this guide, we’ll walk you through a step-by-step process on how to establish an RCM program. Then, we’ll analyze four common maintenance examples and illustrate how to actually think through each of these ten steps to reach the best RCM conclusion and action plan. Let’s get started.

10 Steps to Establish an RCM Program

When you’re tackling something as massive as implementing RCM, it can quickly seem overwhelming. The best approach is to take things one step at a time and move through a logical methodology that can be repeated and improved through every cycle. Here are 10 simple steps to the methodology:

1. Identify assets

Begin by documenting any equipment that is critical to your business or valued at $2,000 or more. If you have only basic information such as cost and expected life span, that’s okay. If you have additional maintenance data, include that as well. Create a basic spreadsheet with the data.

2. Estimate cost

Score the cost of failure. Using a scale of one to 10, assign a cost-of-failure score to each asset. This might take into account downtime, replacement, and regulatory costs. It may include safety risks as well as the probability of failure.

3. Evaluate ease of assessment

Score ability to track condition. Again, use a scale from one to 10, and select a score that represents how difficult it is to track the condition of each asset, where a low score means tracking is simple. This may include labor costs to inspect the asset, cost of sensors, and time to implement maintenance tasks.

4. Plot the data

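Plot data points. Place the scores you assigned on a four-quadrant graph, with cost of failure on the vertical axis and the tracking score on the horizontal axis. The quadrant an asset lands in suggests the best maintenance strategy for that particular asset: run-to-failure (lower left), time-based preventive (upper left), predictive (upper right), or condition-based (lower right).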

5. Repeat 1-4

Collect more data and double-check. Once you’ve assigned maintenance strategies, double-check and make sure everything makes sense to you. For example, if certain assets are subject to regulatory compliance, be sure you meet those requirements even if this exercise results in a different strategy. Confirm that the assets that end up in the run-to-failure quadrant can acceptably be run to failure.

6. Run reports

Run reports and establish KPIs. Your reports will tell you how many assets have different maintenance strategies and how much equipment lands in each quadrant. Consider if this current scenario is acceptable, and establish primary, secondary, and tertiary metrics for each.

7. Take preventive maintenance actions

Time-based preventive maintenance actions. Establish preventive maintenance schedules and intervals for each asset falling in this quadrant. This can be based on manufacturer recommendations or your own experience.

8. Begin condition-based maintenance

Condition-based maintenance actions. Purchase tools that will help you track the assets in this quadrant. Record data as maintenance tasks are performed.

9. Move toward predictive maintenance

Predictive maintenance actions. Buy the necessary sensors for the assets in this quadrant and set acceptable ranges. Schedule maintenance tasks as sensors notify you of potential issues before failures occur, and record information.

10. Repeat 1-9

All of these processes create better data. Return to step one, and repeat the process to continually improve.
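
To make steps 2 through 6 more concrete, here is a minimal Python sketch of the quadrant lookup and a simple count of assets per strategy. The function name `select_strategy`, the `MIDPOINT` cutoff of 5, and the dictionary layout are assumptions made for illustration, not part of any RCM standard or UpKeep feature. The scores are the ones assigned in the four examples later in this guide.

```python
from collections import Counter

# Assumption: on the 1-10 scale, scores above this midpoint count as "high".
MIDPOINT = 5

def select_strategy(cost_of_failure, tracking_difficulty):
    """Map the two 1-10 scores to a quadrant of the first grid."""
    high_cost = cost_of_failure > MIDPOINT
    hard_to_track = tracking_difficulty > MIDPOINT
    if high_cost and hard_to_track:
        return "predictive"             # upper right quadrant
    if high_cost:
        return "time-based preventive"  # upper left quadrant
    if hard_to_track:
        return "condition-based"        # lower right quadrant
    return "run-to-failure"             # lower left quadrant

# (cost of failure, tracking score) pairs taken from this guide's examples.
assets = {
    "light bulb": (1, 1),
    "HVAC filter": (7, 2),
    "heat exchanger": (9, 6),
    "industrial vehicle": (3, 7),
}

strategies = {name: select_strategy(*scores) for name, scores in assets.items()}

# Step 6: a basic report of how many assets land in each quadrant.
report = Counter(strategies.values())

for name, strategy in strategies.items():
    print(f"{name}: {strategy}")
```

A fixed midpoint is only one way to draw the quadrant boundaries. If most of your assets cluster at one end of the scale, you may want to adjust the cutoff and then double-check the results as described in step 5.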

Now that you have a general understanding of the RCM process, let’s take a look at four common maintenance scenarios and how RCM analysis plays out in each.

Tip: Sign up for a free trial of UpKeep to help you manage your data

The Light Bulb: Run-to-Fail

Every organization uses a standard light bulb, which makes this a simple maintenance example to consider. Although it fails to meet the “critical asset” or “$2,000 value” criteria in the first step above, it can be a useful maintenance example nonetheless.

The cost of failure of a standard light bulb is probably low for most facilities. Let’s assign it a score of 1. Tracking the condition of the bulb is simple and warrants a rating of 1 as well. That would place the light bulb in the lower left quadrant.

Another useful grid looks at the severity and the frequency of failure occurrences. Quadrants correspond to the first grid in terms of resulting maintenance strategy.

In this example, the light bulb has a low occurrence and low severity of failure, which makes it a perfect candidate for a run-to-failure maintenance strategy.

The remaining steps are not needed for a run-to-failure strategy. Instead, you’ll want to make sure you have extra light bulbs on hand to replace those that burn out. For larger run-to-failure assets, you’ll want to have spare parts on hand for repairs or a plan in place to replace the asset once it fails.

Tip: Make sure you’ve considered regulatory compliance on any run-to-fail assets

HVAC System: Time-based Preventive

Most facilities take their heating, ventilating, and air conditioning (HVAC) systems for granted. Employees simply expect that the indoor temperature and climate will be managed, regardless of outdoor weather conditions.

An HVAC system is certainly valued at more than $2,000 and will have multiple failure modes. In this example, we’ll look at one common reason for failure: a clogged filter.

The cost of failure of an HVAC system can be fairly high. It may affect the overall comfort level and productivity of the majority of your employees. In some cases, where climate control is very important in the operation of sensitive equipment, the cost of failure may be extremely high. For our example, let’s assign it a score of 7. Tracking the condition of the filter, in this case, is simple and warrants a rating of 2. That would place the HVAC filter in the upper left quadrant.

If we evaluate this example on the second chart, you’ll see that HVAC filters will end up in the corresponding quadrant with a high occurrence of failure and a low severity. Changing HVAC filters falls into the category of time-based preventive maintenance.
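
As a rough illustration of a time-based schedule, the following Python sketch lists the next few due dates for a recurring task such as a filter change. The function name `next_due_dates`, the 90-day interval, and the start date are hypothetical values chosen for the example. Your own interval should come from manufacturer recommendations or your own experience.

```python
from datetime import date, timedelta

def next_due_dates(last_done, interval_days, count):
    """Return the next `count` due dates, spaced `interval_days` apart."""
    return [last_done + timedelta(days=interval_days * i)
            for i in range(1, count + 1)]

# Hypothetical values: filter last changed on 1 January 2024, 90-day interval.
schedule = next_due_dates(date(2024, 1, 1), interval_days=90, count=3)

for due in schedule:
    print(due.isoformat())
```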

Heat Exchanger: Predictive Maintenance

For this third example, we’ll consider a food distribution facility that relies heavily on a working heat exchanger for its food storage system. Although a heat exchanger can be a simple part of an HVAC system where the analysis would be more similar to the clogged filter example, in this case, the heat exchanger asset is more critical to this particular business.

First, you’ll want to collect data about your heat exchanger including its cost, age, expected life span, and other information you already have about this asset. Then, assign a score on a scale of one to 10 to represent the cost of failure. This might include things like cost of downtime, replacement, or regulation compliance as well as probability of failure and safety risks.

The third step involves assigning another score on a scale of one to 10, which quantifies your ability to monitor the condition of the heat exchanger. This may include the labor costs to inspect and measure its performance as well as the cost and implementation of sensors.

Evaluate your data

Plot these data points on our first grid that considers the cost of failure and the ease of tracking performance. In our example of a critical heat exchanger in a food distribution facility, the cost of failure would be extremely high, causing the loss of potentially all your inventory. Let’s assign this a score of 9. With automated sensors in place, tracking the condition of a heat exchanger would be relatively simple. However, getting there would require purchasing sensors and possibly applying internet of things technology, so let’s say the tracking score is 6. That would place the heat exchanger in the upper right quadrant.

In our second chart, this heat exchanger would have a medium-high occurrence rate and a very high severity score, placing it in the upper right quadrant as well. Both analyses point to the use of an automated, sensor-driven predictive maintenance strategy.
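
The idea of setting acceptable ranges for sensor data can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the function name `readings_out_of_range`, the temperature values, and the 1.0 to 4.0 degree range are invented and do not come from any particular sensor product.

```python
def readings_out_of_range(readings, low, high):
    """Return the (timestamp, value) pairs that fall outside [low, high]."""
    return [(ts, value) for ts, value in readings if not low <= value <= high]

# Hypothetical storage temperatures in degrees Celsius, one reading per hour.
readings = [("08:00", 3.1), ("09:00", 3.4), ("10:00", 4.6), ("11:00", 5.2)]

# Assumed acceptable range, used for this illustration only.
alerts = readings_out_of_range(readings, low=1.0, high=4.0)

for ts, value in alerts:
    print(f"{ts}: {value} is outside the acceptable range, schedule an inspection")
```

In practice, you would likely set the range tighter than the actual failure point so that the alert arrives before a failure occurs.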

Tip: Check out Monnit sensors for predictive maintenance solutions

Industrial Vehicle: Condition-based Maintenance

Most facilities use industrial vehicles on a regular basis whether they are service cars, forklifts, trucks, or overhead cranes. The majority of these assets would have a high price tag and experience multiple failure modes. In our example, let’s look at contaminated oil as the failure mode in question.

The cost of failure of an industrial vehicle with contaminated oil is probably medium-low. Safety risks are minimal and the loss of a vehicle may be more inconvenient than anything else. Let’s assign it a score of 3. The ease of tracking the condition of the oil, however, can be more difficult. It requires a technician to perform the labor and may require oil testing to see the level of contamination. Let’s assign this a score of 7. This would place the industrial vehicle in the lower right quadrant, which denotes a condition-based maintenance strategy.

Using our second chart to confirm, the occurrence of contaminated oil is probably medium-low and the severity medium-high, placing the asset in the corresponding lower right quadrant.

RCM and Continuous Improvement

In all of our examples, the maintenance actions taken will result in generating additional data. Over time, you’ll be able to see how often run-to-failure assets will require replacement or repair. You’ll fine-tune the intervals for time-based maintenance tasks as things like usage and loads change. Condition-based maintenance tasks should become more predictable as you collect data on how frequently assets require service based on pre-set conditions. And a wide range of sensors are available to not only trigger a predictive maintenance order but also to provide a great deal of information about how your asset is operating as a whole.
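
One simple way to start using this data is to measure how often an asset has actually needed replacement or service. The Python sketch below averages the gap between recorded events. The function name `mean_days_between` and the dates are hypothetical, and a real analysis would likely also account for usage and load.

```python
from datetime import date

def mean_days_between(events):
    """Average number of days between consecutive event dates."""
    ordered = sorted(events)
    if len(ordered) < 2:
        return None  # not enough history to compute a gap
    gaps = [(later - earlier).days
            for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)

# Hypothetical replacement dates pulled from maintenance records.
replacements = [
    date(2023, 1, 10),
    date(2023, 4, 20),
    date(2023, 7, 9),
    date(2023, 11, 6),
]

average_gap = mean_days_between(replacements)
print(f"Average days between replacements: {average_gap}")
```

Comparing this figure against your current preventive maintenance interval is one input for deciding whether that interval is too short or too long.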

In the end, implementing a strong RCM program is not a one-time event. It’s a cycle of continuous improvement, always applying data to make better, smarter maintenance decisions. If you’re ready to take on the challenge for your facility, learn about the RCM process, allocate resources to the effort, and take one step at a time to begin building a world-class RCM program.