MTBF, MTTR, MTTF: The Difference, and a Guide to Failure Codes and Metrics
Equipment failure metrics make up an entire category of performance indicators. Because equipment failures can be very costly, failure data provide undeniable value in the maintenance setting when properly used. To leverage this data, treat every failure event as an opportunity to collect data and learn from it.
Maintenance teams can make failures count by collecting data from each event. While breakdowns are inevitable, minimizing risks that cause these events helps to unleash the full potential of a plant. Equipment uptime and overall reliability can then be maximized.
By paying attention to equipment failure rates and identifying metrics that can be managed, companies can take the next steps to future success. Here’s how you can start doing this today.
- Find the definitions and differences between MTBF, MTTR, MTTF, and MDT
- Learn how to calculate MTBF, MTTR, MTTF, and MDT
- Understand the true cost of failure: 800 hours of annual downtime, and up to $50K per minute for lost production
- Discover how to choose the right metrics
An overview of asset types that generate failure codes
Before even starting to manage the risk of equipment failure, it is important to first identify whether the equipment is maintainable. Equipment is either repairable or non-repairable. Defining these terms allows the team to collect data more appropriately by assigning applicable performance indicators to each asset.
Repairable and non-repairable assets are handled differently, so it makes sense to measure their failure rates in ways that fit how they are used in the plant. Here are some easy definitions to get you started.
A repairable asset or repairable system is a system that can be restored to fully satisfactory performance by any method other than replacement of the entire system. For practical purposes, repairable systems only include assets that are not beyond economic repair.
Repair costs for assets that are beyond economic repair are higher than the asset’s replacement value. In that case, it becomes more practical to replace the asset rather than to repair it.
Common metrics that describe the failure rate of a repairable asset are the mean time between failures (MTBF) and the mean time to repair (MTTR). Assuming an asset fails at a constant rate, the MTBF describes when the next breakdown will likely occur. The MTTR, on the other hand, measures the average time it takes for equipment to recover after a breakdown.
A non-repairable asset is any asset that needs to be replaced after a single failure. Whether an item can be repaired or not determines which metric applies to its failure patterns. For non-repairable items, the mean time to failure (MTTF) is used. The MTTF characterizes the expected useful life of an asset. It measures the total time that an asset is able to perform its intended task before completely failing.
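The repairable versus non-repairable distinction can be sketched in code as a simple rule for assigning metrics to each asset. This is a minimal illustration, not a standard classification scheme: the `Asset` fields, the function names, and the dollar figures are all hypothetical, and the "beyond economic repair" test simply compares repair cost against replacement value as described above.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    repair_cost: float        # estimated cost to restore the asset (hypothetical)
    replacement_value: float  # cost to replace it outright (hypothetical)
    can_be_restored: bool     # False for single-use items such as light bulbs


def is_repairable(asset: Asset) -> bool:
    """Repairable only if it can be restored and is not beyond economic repair."""
    return asset.can_be_restored and asset.repair_cost <= asset.replacement_value


def failure_metrics_for(asset: Asset) -> list[str]:
    """Repairable assets get MTBF and MTTR; non-repairable assets get MTTF."""
    return ["MTBF", "MTTR"] if is_repairable(asset) else ["MTTF"]


pump = Asset("heavy-duty pump", repair_cost=1_200, replacement_value=15_000, can_be_restored=True)
bulb = Asset("light bulb", repair_cost=0, replacement_value=5, can_be_restored=False)

print(failure_metrics_for(pump))  # ['MTBF', 'MTTR']
print(failure_metrics_for(bulb))  # ['MTTF']
```

In practice the decision also involves judgment from the people who use the assets daily, so a rule like this is best treated as a starting point.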
Here’s a closer look at the definitions and the differences between these formulas and how they are used in the majority of plants today.
Definitions and differences between MTBF, MTTR, MTTF, and MDT
The sections below take each metric in turn, covering what it measures and how to calculate it, starting with the first in the list.
How to calculate MTBF
The MTBF metric relates the uptime of equipment to its failure rate. It measures the average time that equipment can operate without experiencing any failures. To calculate the MTBF, divide the total uptime of equipment by the number of failures within an observed period. In formula form, this looks like:
MTBF = Total uptime / Number of failures
Uptime is the total duration that the equipment is performing its intended task. A breakdown is defined as an event that causes a stoppage in normal operations.
As an illustration, imagine a heavy-duty pump observed to operate without issues for 1,000 hours. Say that 4 breakdowns were recorded over those 1,000 hours of normal operation. The following calculation shows that the MTBF for the pump is 250 hours:
MTBF = 1,000 hours / 4 failures = 250 hours
MTBF is also commonly expressed in terms of the failure rate. For repairable assets, this is known more specifically as the rate of occurrence of failures (ROCOF). For convenience, ROCOF is commonly referred to simply as the failure rate: the number of times that an asset breaks down during an observed period of time.
Note: The lowercase Greek letter lambda (λ) commonly represents the failure rate. Some sources express the formula above as MTBF = 1 / λ.
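The pump example and the MTBF = 1 / λ relationship can be sketched in a few lines of Python. The function names here are illustrative, not part of any particular CMMS or library.

```python
def mtbf(total_uptime_hours: float, failure_count: int) -> float:
    """Mean time between failures: total uptime divided by number of failures."""
    if failure_count <= 0:
        raise ValueError("MTBF needs at least one recorded failure")
    return total_uptime_hours / failure_count


def failure_rate(total_uptime_hours: float, failure_count: int) -> float:
    """Failure rate (lambda): failures per hour of uptime, the inverse of MTBF."""
    if total_uptime_hours <= 0:
        raise ValueError("uptime must be positive")
    return failure_count / total_uptime_hours


pump_mtbf = mtbf(1000, 4)            # 250.0 hours between failures
pump_lambda = failure_rate(1000, 4)  # 0.004 failures per hour
print(pump_mtbf, pump_lambda)
```

Multiplying the two results gives 1, which is a quick sanity check that the same uptime and failure counts were used for both figures.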
How to calculate MTTR
The MTBF shows the average time that an asset can operate before breaking down. While this is valuable in predicting the likelihood of a future breakdown, it does not tell us what to expect when a failure does occur. The MTTR metric fills this gap by measuring how quickly an asset can be restored to working condition after breaking down.
Briefly put, MTTR is a way to assess the maintainability of an asset.
To calculate this metric, divide the total downtime experienced by an asset by the number of failures. In formula form, express MTTR as:
MTTR = Total downtime / Number of failures
In this formula, downtime is defined as the duration when equipment is out of service due to failure. In this case, downtime excludes the time when an asset is not scheduled to be utilized.
Picking up where we left off from the previous example, say that the four breakdowns observed lasted for the following durations: 3 hours, 2.5 hours, 4 hours, and 3.5 hours. Summing up the total downtime and dividing by the four failures, the MTTR equals 3.25 hours as shown below:
MTTR = (3 + 2.5 + 4 + 3.5) hours / 4 failures = 13 hours / 4 = 3.25 hours
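The same MTTR arithmetic can be written as a short Python sketch. The function name is illustrative, and the list holds the four breakdown durations from the pump example.

```python
def mttr(downtime_hours: list[float]) -> float:
    """Mean time to repair: total downtime divided by number of failures."""
    if not downtime_hours:
        raise ValueError("MTTR needs at least one recorded failure")
    return sum(downtime_hours) / len(downtime_hours)


pump_downtimes = [3.0, 2.5, 4.0, 3.5]  # hours out of service per breakdown
pump_mttr = mttr(pump_downtimes)       # 13.0 / 4 = 3.25 hours
print(pump_mttr)
```

Keeping one duration per breakdown, rather than a single running total, also makes it easy to spot an unusually long repair that is pulling the average up.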
How to calculate MTTF
The MTTF is another statistical failure metric that represents reliability. It predicts how long an asset can be used before it fails completely. In other words, the MTTF is the mean lifespan of an asset.
Because MTTF measures the time of operation until complete failure, it is more appropriate for non-repairable items – including consumable items. The MTTF is a mean value of several events over long periods of time, or over the accumulation of data from a large number of observed units.
The basic equation for MTTF can be written as the total uptime or service hours from an asset or group of assets divided by the number of assets observed. In formula form:
MTTF = Total uptime of all observed assets / Number of assets observed
For example, say a single light bulb operates for a total of 1,000 hours before replacement. The MTTF simply equals 1,000 hours. Now increase the number of observed light bulbs to 5. Take each bulb’s uptime: 900 hours, 850 hours, 1,000 hours, 1,050 hours, and 950 hours. The MTTF then equals 950 hours as shown below:
MTTF = (900 + 850 + 1,000 + 1,050 + 950) hours / 5 bulbs = 4,750 hours / 5 = 950 hours
You can imagine how this equation becomes significantly longer and more tedious as the number of observed assets increases. Note, however, that the long calculations come with valuable insight. In fact, some component manufacturers include MTTF information with their products. This gives maintenance teams a better idea of when to expect to change out non-repairable parts and consumables.
Finally, MTTF only records one failure occurrence per asset. To provide a more predictive value, increase the number of assets observed.
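Because the calculation gets tedious by hand as the number of units grows, it is a natural candidate for a small script. The sketch below uses the five light bulb lifetimes from the example; the function name is illustrative.

```python
def mttf(unit_lifetimes_hours: list[float]) -> float:
    """Mean time to failure: total service hours divided by units observed.

    Each entry is the lifetime of one non-repairable unit, so each unit
    contributes exactly one failure.
    """
    if not unit_lifetimes_hours:
        raise ValueError("MTTF needs at least one observed unit")
    return sum(unit_lifetimes_hours) / len(unit_lifetimes_hours)


bulb_lifetimes = [900, 850, 1000, 1050, 950]  # hours until each bulb failed
bulb_mttf = mttf(bulb_lifetimes)              # 4750 / 5 = 950.0 hours
print(bulb_mttf)
```

Adding more observed units only means appending to the list, which is how a larger sample can be folded into the estimate.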
How to calculate MDT
Mean down time (MDT) is the total time the asset is down divided by the number of failures. Another way of thinking of it is that it is the average time that any given system is non-operational. This includes all downtime that is associated with or related to the asset, including but not limited to repair time, corrective and/or preventive maintenance, imposed downtime, logistical and/or administrative delays, and paperwork holdups.
In practice, calculating MDT means accounting for every one of these contributing factors, not just the hands-on repair time.
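One way to keep all of these factors visible is to record them separately for each failure and sum them when computing MDT. The sketch below is only an illustration: the component names and the delay figures are hypothetical, and only the repair hours are borrowed from the pump example.

```python
def mean_down_time(events: list[dict[str, float]]) -> float:
    """MDT: total down time across all components divided by number of failures."""
    if not events:
        raise ValueError("MDT needs at least one recorded failure")
    total_down_hours = sum(sum(event.values()) for event in events)
    return total_down_hours / len(events)


# Hours per failure, split by cause. Delay values are made up for illustration.
pump_events = [
    {"repair": 3.0, "waiting_for_parts": 1.0, "admin_delay": 0.5},
    {"repair": 2.5, "waiting_for_parts": 0.0, "admin_delay": 0.5},
    {"repair": 4.0, "waiting_for_parts": 2.0, "admin_delay": 1.0},
    {"repair": 3.5, "waiting_for_parts": 0.5, "admin_delay": 0.5},
]

pump_mdt = mean_down_time(pump_events)  # 19.0 / 4 = 4.75 hours
print(pump_mdt)
```

Which components belong in each record is a choice the team has to make and apply consistently, since it directly changes the resulting figure.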
The differences between them
Now, here’s a look at the differences between all these formulas and what makes each of them valuable.
What is the difference between MTBF and MTTF?
The main difference between MTBF and MTTF lies in the type of item observed. The metric to use would depend on the actions performed on an item after a failure event. Use MTBF for repairable items where restoration to satisfactory operating conditions is possible. Use MTTF for non-repairable items that organizations must replace after only one failure.
There are differences in collecting and utilizing data for these metrics. Technicians usually collect MTBF data and analyze it for a specific asset. Historical data for a particular pump, for example, can give the team useful insight on how to schedule maintenance activities for that particular pump.
MTTF, on the other hand, comes from a large number of consumable items. For instance, for a type of electrical relay used in multiple parts of the plant, the individual lifetimes are averaged into a meaningful MTTF value. Because this metric draws on more data points, it is better suited to predicting future failures.
Deciding on whether an asset should use the MTBF or MTTF metric should be taken very seriously. The decision process should involve the right people with the knowledge and experience of using the assets on a daily basis.
What is the difference between MDT and MTTR?
MDT is the total time the asset is down divided by the number of failures, whereas MTTR is the total wrench time divided by the number of failures. A gap between the two can indicate problems with scheduling, acquiring parts, or other inefficiencies around the maintenance process.
Unlike MTTR, MDT also does not rest on one fixed formula beyond that ratio, because what counts as down time depends on which delays are included.
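A quick way to use the two metrics together is to look at the gap between them. The sketch below assumes an MTTR of 3.25 hours (the pump example) and a hypothetical MDT of 4.75 hours; the function name is illustrative.

```python
def non_repair_delay(mdt_hours: float, mttr_hours: float) -> tuple[float, float]:
    """Return (hours of non-repair delay per failure, its share of total down time)."""
    if mdt_hours <= 0 or mttr_hours < 0 or mttr_hours > mdt_hours:
        raise ValueError("expected 0 <= MTTR <= MDT and MDT > 0")
    delay = mdt_hours - mttr_hours
    return delay, delay / mdt_hours


delay_hours, delay_share = non_repair_delay(mdt_hours=4.75, mttr_hours=3.25)
print(f"{delay_hours:.2f} h per failure, {delay_share:.0%} of down time")
# prints: 1.50 h per failure, 32% of down time
```

A large share suggests looking at scheduling and parts availability before trying to speed up the repair work itself.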
Why does all this matter? Why are failures so important to avoid? The answer can be found in the true cost of failure and how these equations can dramatically reduce plant failures across the board.
The true cost of failure
First of all, failure events are not as rare as you might expect. In a 2013 study, it was shown that 30% of manufacturers experienced unplanned downtime – and that’s just looking at the first four months of the observed year! To paint an even more concerning picture, surveys suggest that manufacturers can experience as much as 800 hours of downtime annually.
Fixing a broken piece of equipment naturally incurs repair costs. However, a potentially bigger hit to the company is the loss of revenue due to stoppages in production. Downtime periods are simply wasted opportunities to be productive. In the automotive industry, for example, stopped production is estimated to cost an average of $22,000 per minute, with maximum costs reaching up to $50,000 per minute.
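To get a feel for how quickly those per-minute figures add up, the arithmetic can be sketched as follows. The per-minute rates are the automotive estimates quoted above; the 60-minute stoppage is a hypothetical example, and actual costs vary widely by plant and industry.

```python
AVERAGE_COST_PER_MINUTE = 22_000  # USD, automotive average quoted above
MAX_COST_PER_MINUTE = 50_000      # USD, automotive maximum quoted above


def lost_production_cost(downtime_minutes: float, cost_per_minute: float) -> float:
    """Estimated lost production: downtime duration times cost per minute."""
    if downtime_minutes < 0:
        raise ValueError("downtime cannot be negative")
    return downtime_minutes * cost_per_minute


stoppage_minutes = 60  # hypothetical one-hour line stoppage
average_case = lost_production_cost(stoppage_minutes, AVERAGE_COST_PER_MINUTE)
worst_case = lost_production_cost(stoppage_minutes, MAX_COST_PER_MINUTE)
print(f"${average_case:,.0f} to ${worst_case:,.0f}")
# prints: $1,320,000 to $3,000,000
```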
In addition to piling up production losses and costs of repair, failure events threaten the safety of workers. A study observed that 35% of accidental events are caused by equipment failure. To put the value of safety in a measurable perspective, the National Safety Council (NSC) estimates the total cost of work injuries to be $161.5 billion in 2017.
Equipment breakdowns represent a big part of the total failure picture. These events are a major struggle that maintenance workers should aim to avoid at all costs. The good news is that you can avoid failure events. All it takes are the right tools and the proper mindset.
And that’s where the metrics discussed previously come into play. How do companies choose which metrics matter to them?
How to choose the right metrics
It is important to note that MTBF, MTTR, and MTTF are statistical figures that use historical data. First, define the data-gathering process before describing the state of the plant using these metrics. Improving failure metrics starts with accurately gathering data. The next part would be analyzing the patterns and identifying areas of opportunity.
The awareness of the condition of each asset should increase as the size of operations grows. If you haven’t already invested in a computerized maintenance management system (CMMS), then it might be a good time to start looking into one. If you are already using a CMMS, then make sure that you are using it to its full capability. Make sure that your CMMS is in line with the metrics you are trying to manage.
Tip: Update the status of your asset as it changes in real-time with UpKeep. This creates a log of uptime/downtime percentages with ease.
The accuracy of the metrics discussed is only as good as the accuracy of tracking the time components of the equations. It is easy to overlook individual equipment uptime, for example, if your team is not tracking it meticulously.
Use your CMMS to do the work for you. Have a record of each asset in your system and have actual figures of your uptime and downtime percentages. Check the problem areas that are weighing down your performance.
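As a rough illustration of how uptime and downtime percentages fall out of a status log, consider the sketch below. The log format (a time-ordered list of timestamp and status pairs) and the function name are hypothetical and do not represent the export format or API of any particular CMMS.

```python
from datetime import datetime


def uptime_downtime_percent(status_log, window_end):
    """Return (uptime %, downtime %) from a time-ordered status log.

    status_log: list of (timestamp, status) pairs, status is "up" or "down".
    Each status is assumed to hold until the next entry or window_end.
    """
    totals = {"up": 0.0, "down": 0.0}
    next_changes = [timestamp for timestamp, _ in status_log[1:]] + [window_end]
    for (start, status), end in zip(status_log, next_changes):
        totals[status] += (end - start).total_seconds()
    observed = totals["up"] + totals["down"]
    return 100 * totals["up"] / observed, 100 * totals["down"] / observed


# Hypothetical single-day log for one asset.
log = [
    (datetime(2020, 3, 2, 8, 0), "up"),
    (datetime(2020, 3, 2, 14, 0), "down"),
    (datetime(2020, 3, 2, 16, 0), "up"),
]
up_pct, down_pct = uptime_downtime_percent(log, datetime(2020, 3, 2, 18, 0))
print(up_pct, down_pct)  # 80.0 20.0
```

The same per-status totals also supply the uptime and downtime inputs that the MTBF and MTTR formulas need.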
And when you need them, the numbers will be all organized in your system, waiting for you to use them.
Minimize risk and maximize performance
Having confidence in your data gives you the power to decide the next steps to improve your overall performance. Define your metrics in a way that aligns with your assets. Align the whole team with the company’s goals. Promoting an awareness culture on these metrics equips the team with the mindset to minimize risks and increase overall performance.
This story was updated in March 2020 with more and updated information.