MTBF, MTTR, MTTF – The difference and a full guide to failure codes and metrics
Metrics that matter
The metrics that an organization chooses to measure say a lot about what it cares about. While you can’t choose your data, you are in control of choosing the metrics you want to measure. Metrics and performance indicators only begin to make sense once they are properly identified and managed.
Equipment failure metrics make up an entire category of performance indicators. Failure data provide undeniable value in the maintenance setting. Because equipment failures can be very costly, it is important to treat every failure event as an opportunity to learn. Maintenance teams can make failures count by diligently collecting data from each event. Equipment uptime and overall reliability can be maximized by paying attention to equipment failure rates and identifying metrics that can be managed. While breakdowns are inevitable, minimizing the risks that cause them helps to unleash the full potential of a plant.
The cost of failure
Failure events are not as rare as you might expect. A 2013 study showed that 30% of manufacturers experienced unplanned downtime, and that was in the first four months of the observed year alone. To paint an even more concerning picture, surveys suggest that manufacturers can experience as much as 800 hours of downtime annually.
Fixing a broken piece of equipment naturally incurs repair costs. However, a potentially bigger hit to the company is the loss of revenue due to stoppages in production. Downtime periods are simply wasted opportunities to be productive. In the automotive industry, for example, stopped production is estimated to cost an average of $22,000 per minute, with maximum costs reaching up to $50,000 per minute.
In addition to piling up production losses and repair costs, failure events threaten the safety of workers. One study observed that 35% of accidental events are caused by equipment failure. To put the value of safety in measurable terms, the National Safety Council (NSC) estimated the total cost of work injuries in 2017 at $161.5 billion.
Equipment breakdowns represent a big part of the total failure picture. These events are a major struggle that maintenance workers should aim to avoid at all costs. The good news is that you can avoid failure events. All it takes is the right tools and the proper mindset.
Repairable vs. Non-repairable assets
Before even starting to manage the risk of equipment failure, it is important to first identify whether the equipment is maintainable. Equipment is either repairable or non-repairable. Defining these terms allows the team to collect data more appropriately by assigning applicable performance indicators to each asset. Repairable and non-repairable assets are handled differently. Thus, it makes sense to measure their failure rates in ways that fit how they are used in the plant.
A repairable asset or repairable system is a system that can be restored to fully satisfactory performance by any method other than replacement of the entire system. For practical purposes, repairable systems only include assets that are not beyond economic repair. An asset is beyond economic repair when its repair cost exceeds its replacement value. In that case, it becomes more practical to replace the asset than to repair it.
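As a minimal sketch of the rule above, the check below labels an asset as repairable only when its repair cost does not exceed its replacement value. The function names and the cost figures are hypothetical and chosen purely for illustration; a real decision would also weigh factors that this comparison ignores.

```python
def is_beyond_economic_repair(repair_cost: float, replacement_value: float) -> bool:
    """Return True when repairing the asset costs more than replacing it."""
    return repair_cost > replacement_value


def classify_asset(repair_cost: float, replacement_value: float) -> str:
    """Label an asset so that the matching metrics can be assigned to it:
    MTBF/MTTR for repairable assets, MTTF for non-repairable ones."""
    if is_beyond_economic_repair(repair_cost, replacement_value):
        return "non-repairable"
    return "repairable"


# Illustrative figures only; they do not come from any real asset register.
print(classify_asset(repair_cost=1_200, replacement_value=15_000))  # repairable
print(classify_asset(repair_cost=80, replacement_value=25))         # non-repairable
```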
Common metrics that describe the failure behavior of a repairable asset are the mean time between failures (MTBF) and the mean time to repair (MTTR). Assuming an asset fails at a constant rate, the MTBF estimates how long the asset is expected to run before its next breakdown. The MTTR, on the other hand, measures the average time it takes for equipment to recover after a breakdown.
A non-repairable asset is any asset that needs to be replaced after a single failure. Whether an item can be repaired or not determines which metric is used to describe its failure patterns. For non-repairable items, the mean time to failure (MTTF) is used. The MTTF characterizes the expected service life of an asset. It measures the total time that an asset is able to perform its intended task before completely failing.
How to calculate MTBF
The MTBF metric relates the uptime of equipment to its failure rate. It measures the average time that equipment operates between failures. To calculate the MTBF, divide the total uptime of the equipment by the number of failures within an observed period. In formula form, this looks like:
$$\text{MTBF} = \frac{\text{total uptime}}{\text{number of failures}}$$
Uptime is the total duration that the equipment is performing its intended task. A breakdown is defined as an event that causes a stoppage in normal operations.
To illustrate the MTBF calculation, imagine a heavy-duty pump that accumulates 1,000 hours of uptime over an observed period. During that period, say that 4 breakdowns were recorded. The following calculation shows that the MTBF for the pump is 250 hours:
$$\text{MTBF} = \frac{1{,}000 \text{ hours}}{4 \text{ failures}} = 250 \text{ hours}$$
This metric is also commonly expressed in terms of the repair rate, more specifically known as the rate of occurrence of failures (ROCOF) over time. For convenience, ROCOF is also commonly referred to as the failure rate. It is defined as the number of times that an asset breaks down per unit of time during an observed period. The equation relating MTBF to ROCOF is:
$$\text{MTBF} = \frac{1}{\text{ROCOF}}$$
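The pump example can be checked with a few lines of Python. This is only a sketch: the function names `mtbf` and `failure_rate` are our own, and the reciprocal relationship assumes the constant failure rate mentioned earlier.

```python
def mtbf(total_uptime_hours: float, failure_count: int) -> float:
    """Mean time between failures: total uptime divided by number of failures."""
    if failure_count <= 0:
        raise ValueError("MTBF is undefined without at least one recorded failure")
    return total_uptime_hours / failure_count


def failure_rate(mtbf_hours: float) -> float:
    """ROCOF (failures per hour) as the reciprocal of MTBF."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return 1.0 / mtbf_hours


# Heavy-duty pump from the example: 1,000 hours of uptime, 4 breakdowns.
pump_mtbf = mtbf(total_uptime_hours=1_000, failure_count=4)
pump_rocof = failure_rate(pump_mtbf)

print(pump_mtbf)   # 250.0 hours between failures
print(pump_rocof)  # 0.004 failures per hour
```

Raising an error for zero failures is a design choice made here; with no recorded failures, the observed period simply does not support an MTBF estimate.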
How to calculate MTTR
The MTBF gives the average time that an asset can operate before breaking down. While this is valuable in predicting the likelihood of a future breakdown, it does not tell us what to expect when a failure does occur. The MTTR metric fills this gap by measuring how quickly an asset can be restored to working condition after breaking down. In other words, MTTR is a way to assess the maintainability of an asset.
To calculate this metric, divide the total downtime experienced by an asset by the number of failures. In formula form, express MTTR as:
$$\text{MTTR} = \frac{\text{total downtime}}{\text{number of failures}}$$
In this formula, downtime is defined as the duration when equipment is out of service due to failure. In this case, downtime excludes the time when an asset is not scheduled to be utilized.
Picking up where we left off from the previous example, say that the four breakdowns observed lasted for the following durations: 3 hours, 2.5 hours, 4 hours, and 3.5 hours. Dividing the total downtime of 13 hours by the 4 failures, the MTTR equals 3.25 hours as shown below:
$$\text{MTTR} = \frac{3 + 2.5 + 4 + 3.5}{4} = \frac{13}{4} = 3.25 \text{ hours}$$
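The same MTTR arithmetic can be sketched in Python. The function name `mttr` is hypothetical, and the sketch assumes each list entry is the downtime of one failure event, so the number of entries equals the number of failures.

```python
from typing import Sequence


def mttr(downtime_hours: Sequence[float]) -> float:
    """Mean time to repair: total downtime divided by the number of failures.

    Each entry is assumed to be the downtime (in hours) of one failure event.
    """
    if not downtime_hours:
        raise ValueError("MTTR is undefined without at least one failure event")
    return sum(downtime_hours) / len(downtime_hours)


# Downtime of the four pump breakdowns from the example, in hours.
pump_downtimes = [3, 2.5, 4, 3.5]

print(sum(pump_downtimes))   # 13.0 hours of total downtime
print(mttr(pump_downtimes))  # 3.25 hours
```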
How to calculate MTTF
The MTTF is another statistical failure metric used to represent reliability. It predicts how long an asset will be in use before it fails completely. In other words, the MTTF is the mean lifespan of an asset.
Because MTTF measures the time of operation until complete failure, it is more appropriate for non-repairable items – including consumable items. The MTTF is a mean value of several events over long periods of time, or over the accumulation of data from a large number of observed units.
The basic equation for MTTF can be written as the total uptime or service hours of an asset or group of assets divided by the number of assets observed. In formula form:
$$\text{MTTF} = \frac{\text{total uptime}}{\text{number of assets}}$$
For example, say a single light bulb operates for a total of 1,000 hours before replacement. The MTTF simply equals 1,000 hours. Now increase the number of observed light bulbs to 5. Take each bulb’s uptime: 900 hours, 850 hours, 1,000 hours, 1,050 hours, and 950 hours. The MTTF then equals 950 hours as shown below:
$$\text{MTTF} = \frac{900 + 850 + 1{,}000 + 1{,}050 + 950}{5} = \frac{4{,}750}{5} = 950 \text{ hours}$$
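For larger groups of assets, the averaging is easier to script than to write out by hand. The sketch below uses a hypothetical `mttf` function and assumes every observed unit has already run to failure, so each entry is a complete lifetime.

```python
from typing import Sequence


def mttf(lifetimes_hours: Sequence[float]) -> float:
    """Mean time to failure: total service hours divided by the number of assets.

    Assumes every asset in the list has run until it failed.
    """
    if not lifetimes_hours:
        raise ValueError("MTTF is undefined without at least one observed asset")
    return sum(lifetimes_hours) / len(lifetimes_hours)


# A single bulb, then the five bulbs from the example (hours until failure).
print(mttf([1000]))          # 1000.0 hours

bulb_lifetimes = [900, 850, 1000, 1050, 950]
print(mttf(bulb_lifetimes))  # 950.0 hours
```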
You can imagine how this calculation becomes longer and more tedious as the number of observed assets increases. Note, however, that the effort yields valuable insight. In fact, some component manufacturers include MTTF information with their products. This gives maintenance teams a better idea of when to expect to change out non-repairable parts and consumables.
What is the difference between MTBF and MTTF?
The main difference between MTBF and MTTF lies in the type of item observed. The metric to use would depend on the actions performed on an item after a failure event. Use MTBF for repairable items where restoration to satisfactory operating conditions is possible. Use MTTF for non-repairable items that organizations must replace after only one failure.
There are differences in collecting and utilizing data for these metrics. Technicians usually collect MTBF data and analyze it for a specific asset. Historical data for a particular pump, for example, can give the team useful insight on how to schedule maintenance activities for that particular pump. MTTF, on the other hand, comes from a large number of consumable items. For instance, for a type of electrical relay used in multiple parts of the plant, the individual lifetimes can be averaged into a meaningful MTTF value. Pooling the data this way yields a metric backed by more data points for predicting future failures.
The decision on whether an asset should use the MTBF or MTTF metric should be made with care. The decision process should involve the right people, with the knowledge and experience of using the assets on a daily basis.
How can I improve my metrics?
It is important to note that MTBF, MTTR, and MTTF are statistical figures that use historical data. First, define the data-gathering process before describing the state of the plant using these metrics. Improving failure metrics starts with accurately gathering data. The next part would be analyzing the patterns and identifying areas of opportunity.
Awareness of the condition of each asset should increase as the size of operations grows. If you haven’t already invested in a computerized maintenance management system (CMMS), then it might be a good time to start looking into one. If you are already using a CMMS, then make sure that you are using it to its full capability. Make sure that your CMMS is in line with the metrics you are trying to manage.
For example, the metrics discussed are only as accurate as the time records that feed the equations. It is easy to overlook individual equipment uptime if your team is not tracking it meticulously. Use your CMMS to do the work for you. Have a record of each asset in your system and have actual figures of your uptime and downtime percentages. Check the problem areas that are weighing down your performance.
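As a rough illustration of the uptime and downtime percentages mentioned above, the sketch below reuses the pump figures from the earlier examples (1,000 hours of uptime and 13 hours of downtime across four breakdowns). It assumes that scheduled time is simply uptime plus downtime; the function name is hypothetical and is not part of any particular CMMS.

```python
from typing import Tuple


def uptime_downtime_percent(uptime_hours: float, downtime_hours: float) -> Tuple[float, float]:
    """Return (uptime %, downtime %) of scheduled time.

    Assumes scheduled time = uptime + downtime, i.e. periods when the asset
    was not scheduled to run are already excluded from both figures.
    """
    scheduled_hours = uptime_hours + downtime_hours
    if scheduled_hours <= 0:
        raise ValueError("Scheduled time must be positive")
    uptime_pct = 100.0 * uptime_hours / scheduled_hours
    return uptime_pct, 100.0 - uptime_pct


# Pump example: 1,000 hours of uptime, 3 + 2.5 + 4 + 3.5 = 13 hours of downtime.
up, down = uptime_downtime_percent(uptime_hours=1_000, downtime_hours=13)
print(f"uptime {up:.2f}%, downtime {down:.2f}%")  # uptime 98.72%, downtime 1.28%
```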
Having confidence in your data gives you the power to decide the next steps to improve your overall performance. Define your metrics in a way that aligns with your assets. Align the whole team with the company’s goals. Promoting a culture of awareness around these metrics equips the team with the mindset to minimize risks and increase overall performance.