How to Address Age-Related Equipment Failure

Ryan Chan

This article contains insights from the book “Maintenance Control” by James Borowski, a maintenance professional and UpKeep customer. If you want additional help predicting and preventing equipment failures, you can download two chapters of the book for free that deal with equipment failure and reliability.

Age-related equipment failures are associated with time in service. Equipment that experiences this category of failure typically has surfaces in direct contact with the material being handled or the product being manufactured.

Systems tied to this failure type may be exposed to shock, mechanical vibration, a corrosive atmosphere, oxidizing chemicals or vapors, or heat evaporation, all of which take a physical toll. These systems may also include consumable items or surfaces such as wear plates.

Examples of specific equipment types that are typically considered to fail based on time in service include:

  • simple electromechanical systems with pins
  • bushings
  • chains
  • sprockets
  • belts
  • sheaves
  • pump rotating groups
  • valve seats
  • material handling conveyors 
  • hopper liners
  • DC motor brushes
  • hydraulic filters 
  • electrical components like limit switches and relays 
  • vehicle tires 
  • clutches 
  • internal combustion engines

Why equipment fails

Age-related equipment failure is generally tied to how long the system in question has been operating, which can be measured in calendar time, in use, or both. The difficulty arises when only part of a whole system has aged to the point of failure.

Most age-related equipment failures are associated with time in service (use), not with the calendar age of the item in question. The affected parts or systems typically have some surfaces in direct contact with the material being handled or the product being manufactured.


How to address age-related failures with maintenance

For equipment and components that fail based on time, a restoration or overhaul task is scheduled in a maintenance management system.

Perform scheduled restoration/overhaul tasks

A restoration/overhaul task is used for equipment that has worn to the point where failure is imminent. Scheduled restoration implies that the equipment can be rebuilt or overhauled before failure occurs. The restoration process is intended to return the equipment to its original “like new” condition or close to it. The result is that the equipment is reborn and given a new life.

For example, let’s assume that a production unit has a roll line consisting of a series of individually driven rolls coupled to a gearbox and electric motor. History has shown that because of the plant environment, the gearboxes typically have a useful life of 24 months. In this application, to avoid a functional failure, a scheduled maintenance task is created to replace the boxes before an operating age of 24 months.

Useful life runs from initial installation to the point just before multiple failures begin to take place. For our gearbox example, the useful life is 24 months. But, as the failure curve for this example shows, equipment still fails before the wear-out zone is reached, at a rate of one failure every two months. Likewise, most of the equipment taken out of service at the 24-month mark still has many months of good service left, some as much as another year.

From our example, it can be seen that restoring or overhauling equipment on the basis of time is ineffective and wasteful. It is ineffective in preventing functional failures, since many pieces of equipment will fail before the wear-out zone. It is wasteful because the equipment with significant operating life will be taken out of service too early.
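To put rough numbers on both problems, here is a minimal Python sketch. The ten gearbox lifetimes are made-up illustration values, not data from the book, and `evaluate_fixed_interval` is a hypothetical helper name.

```python
def evaluate_fixed_interval(lifetimes_months, interval_months):
    """Summarize how a fixed overhaul interval performs against known lifetimes.

    lifetimes_months: months each unit would run if left in service until failure.
    interval_months: age at which every surviving unit is pulled for overhaul.
    """
    # Units that fail before the scheduled overhaul (the "ineffective" part).
    failed_early = [life for life in lifetimes_months if life < interval_months]
    # Units pulled on schedule, possibly with life remaining (the "wasteful" part).
    pulled = [life for life in lifetimes_months if life >= interval_months]
    unused = sum(life - interval_months for life in pulled)
    return {
        "failed_before_interval": len(failed_early),
        "pulled_at_interval": len(pulled),
        "unused_life_months": unused,
    }


# Hypothetical lifetimes (in months) for ten gearboxes on the roll line.
gearbox_lifetimes = [6, 11, 19, 24, 26, 27, 29, 31, 33, 36]
summary = evaluate_fixed_interval(gearbox_lifetimes, interval_months=24)
print(summary)
```

With these assumed lifetimes, three gearboxes fail before the 24-month overhaul, and the seven pulled on schedule give up 38 months of remaining service between them.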

Despite their inefficiency and waste, scheduled restoration tasks continue to be used and will always be around. Arguably, they are not the most effective proactive maintenance task, but they are simple to schedule.

The important thing here is that a scheduled restoration task should be used only with equipment having a bathtub curve failure mode or a Conditional Wear Curve with Wear-out Zone failure mode.

Figure: Bathtub curve that shows how equipment fails over time

Figure: Conditional wear curve with wear-out zone

No other failure mode curves apply. These are the only two failure modes with wear-out zones.

Scheduled restoration/overhaul tasks have been used for many years to improve equipment reliability, whether or not they are inefficient or wasteful. But keep in mind that this strategy rests on some assumptions:

  • The equipment in question does, in fact, wear out over time. It has an age-related failure curve with a wear-out zone.
  • A large percentage of the equipment type must survive to the wear-out zone. There can’t be peaks or bursts of random failures along the way to the wear-out zone.
  • The equipment must be capable of being restored or overhauled. The item in question can’t be a throwaway that has no value once it has worn out.

Perform scheduled on-condition tasks

It is quite apparent from the Conditional Probability of Failure Curve of our example that random failures take place even with equipment scheduled to be overhauled after time in service. In a thorough reliability program, these random failures are treated in the same way as any random failure. That is, scheduled on-condition tasks, like detailed inspections, are carried out. The inspections may be visual and act as insurance that nothing out of the ordinary is going on.

For example, cables of an overhead crane may be scheduled for change-out every 9 months. Experience may show this is the useful life in a certain application. Yet, the maintenance department may still thoroughly inspect every foot of the cable every two months to ensure that there are no breaks or fraying of cable strands. This is a scheduled inspection of an on-condition task. The cable stays in service on the condition that it is not frayed. When 9 months of service comes along, the cable is changed. At that point, the wear-out zone has been reached for the cable.
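The crane cable routine can be sketched in a few lines of Python. The 9-month change-out and 2-month inspection interval come from the example above; the function names and return strings are hypothetical.

```python
CHANGE_OUT_MONTHS = 9    # useful life of the cable in this application
INSPECTION_INTERVAL = 2  # months between detailed inspections


def cable_schedule(change_out=CHANGE_OUT_MONTHS, interval=INSPECTION_INTERVAL):
    """Return (month, task) pairs for one cable life cycle."""
    # Inspections fall on every interval that occurs before the change-out.
    tasks = [(month, "inspect") for month in range(interval, change_out, interval)]
    tasks.append((change_out, "replace"))
    return tasks


def cable_action(months_in_service, frayed, change_out=CHANGE_OUT_MONTHS):
    """Decide what to do with the cable after an inspection."""
    if frayed:
        # On-condition rule: the cable stays in service only if it is not frayed.
        return "replace now (failed inspection)"
    if months_in_service >= change_out:
        return "replace (scheduled change-out)"
    return "keep in service"


print(cable_schedule())
print(cable_action(4, frayed=True))
```

The schedule covers the age-related wear-out, while `cable_action` covers the random failure that an inspection may catch before the 9 months are up.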

In many cases, scheduled inspections of equipment that has an age-related failure mode may use predictive technologies like infrared scans to detect heat, non-destructive testing to identify stress cracks, and oil analysis to determine the type and accumulation of dirt particles.

For example, assume a truck engine has a useful life of 10,000 operating hours. With anything beyond these hours, a truck fleet can be expected to see multiple failures. Consequently, a strategy is developed that says truck engines will be taken out of service at a point not to exceed 10,000 operating hours. Yet, as we have seen, multiple engines within the fleet can be expected to fail randomly.

To prolong the life of every engine in the truck fleet, the maintenance organization does a couple of things. First, regular oil changes are scheduled, performed, and tracked for compliance. Second, a sample of the oil is analyzed after each oil change to determine if it contains any excessive wear particles. If it does, maintenance schedules the engine exchange in a timely manner so that the truck doesn’t fail in service.
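As a rough illustration, the two triggers for an engine exchange could be checked like this. The 10,000-hour limit comes from the example; the wear-particle threshold, its unit, and all names in the code are assumptions made for the sketch.

```python
from dataclasses import dataclass

HOUR_LIMIT = 10_000     # useful life in operating hours, from the example
WEAR_LIMIT_PPM = 100.0  # hypothetical threshold for "excessive" wear particles


@dataclass
class OilSample:
    engine_id: str
    operating_hours: float
    wear_particles_ppm: float


def engines_to_exchange(samples, hour_limit=HOUR_LIMIT, wear_limit_ppm=WEAR_LIMIT_PPM):
    """Return {engine_id: reason} for every engine that should be scheduled for exchange."""
    flagged = {}
    for sample in samples:
        if sample.wear_particles_ppm > wear_limit_ppm:
            # On-condition trigger: the oil analysis points to abnormal wear.
            flagged[sample.engine_id] = "excessive wear particles"
        elif sample.operating_hours >= hour_limit:
            # Time-based trigger: the engine has reached its useful life.
            flagged[sample.engine_id] = "hour limit reached"
    return flagged


fleet = [
    OilSample("truck-01", 4_200, 35.0),
    OilSample("truck-02", 6_800, 140.0),  # young engine, but wearing abnormally
    OilSample("truck-03", 10_050, 60.0),  # clean oil, but past the hour limit
]
print(engines_to_exchange(fleet))
```

Checking the oil result first means a young engine with abnormal wear is caught well before the hour limit would have pulled it.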

This is common sense and common practice. Even when equipment is expected to fail on a time basis, experience says that things can happen to bring it to its knees beforehand. This strategy only holds for equipment that has some degree of complexity. For example, wear plates and conveyor belts that can be readily observed by an operator or any interested party do not require predictive analysis to determine when failure might occur.

It is very common for items that have a history of scheduled restoration tasks, overhauls, or component exchanges to have scheduled inspections of on-condition tasks to guard against random failure.

Perform scheduled discard/throwaway tasks

A scheduled discard proactive task is assigned to equipment where a critical functional failure cannot be tolerated for any reason. A condition where a critical item must be removed from service before a failure occurs is called a safe-life limit. That is, while in service, the item must have a 100% probability of surviving to the next period. There can be no chance of the item failing. These are typically associated with simple pieces of equipment.

For example, a battery operating a critical sensing device like a gas analyzer or a light bulb in a panel indicating a critical condition must not be allowed to fail. Individual batteries and bulbs are tested in a laboratory and their failure points noted. Based upon testing, the period before any failures begin is then divided by 2, 3, or sometimes 4 to provide a margin of safety. When that point is reached in service, the item (battery or bulb in this case) is taken out of service and discarded knowing full well more life is probably available from the item.
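The arithmetic behind a safe-life limit is simple enough to sketch. The lab results below are invented for illustration, and `safe_life_limit` is a hypothetical helper, not a function from any maintenance package.

```python
def safe_life_limit(lab_failure_hours, safety_factor=3):
    """Return the in-service discard age for an item with a safe-life limit.

    lab_failure_hours: hours at which each tested item failed in the laboratory.
    safety_factor: divisor applied to the failure-free period (typically 2, 3, or 4).
    """
    if not lab_failure_hours:
        raise ValueError("at least one lab failure time is required")
    if safety_factor < 1:
        raise ValueError("safety factor must be at least 1")
    # The failure-free period ends at the earliest failure seen in testing.
    failure_free_period = min(lab_failure_hours)
    return failure_free_period / safety_factor


# Hypothetical lab results for a batch of gas analyzer batteries (hours to failure).
battery_failures = [1200, 1350, 1500, 1425]
print(safe_life_limit(battery_failures, safety_factor=3))  # 400.0
```

Here the earliest lab failure is at 1,200 hours, so with a safety factor of 3 every battery is discarded at 400 hours in service.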

For an item with a safe-life limit, the goal is to never collect any in-service failure data, because the item is always removed before it can fail.

A critical item can also be removed and discarded for economic consequences. This is called an economic-life limit and is treated in the same manner as the conditional probability bathtub and slowly increasing wear curves discussed in the scheduled restoration/overhaul sections. And, similar to the restoration task, the assumptions are that:

  • The equipment wears out over time having an age-related failure mode with wear-out curve
  • A large percentage of this equipment type must survive to the wear-out zone
  • The equipment is NOT intended to be rebuilt and is discarded or scrapped

For example, an elevator for a reheat furnace may be driven by a bronze nut/steel screw arrangement. The life of the screws and nuts may be 16 months, but they are changed at 14 months because of an economic-life limit. If any of the screw/nut drives fails early, it creates a significant economic hardship (delay costs) and must be avoided. Consequently, the screw/nut drives are changed out early even though more life remains. Upon being removed from service, the screws and nuts are scrapped.

Just like the scheduled restoration/overhaul task, this type of scheduled task is wasteful and inefficient. But for the sake of safety or economic issues related to financial risk, it is a management decision to accept the waste.


Creating schedules and switching to condition-based maintenance

Throughout the article, we have mentioned an overall strategy that depends on scheduled maintenance. How are those schedules created in the first place? 

The short answer is by observing, testing, tracking, and collecting information into a condition-based maintenance strategy. 

What is condition-based maintenance? What are some of the impacts it has? And why does it matter in a strategy that is specifically designed to address age-related equipment failure?

A quick look at condition-based maintenance

Condition-based maintenance is exactly what it says: a maintenance strategy that is based on the actual condition of the asset in question, regardless of any outside factors that may influence the asset and its condition. 

These strategies fill the gaps that a time-based strategy creates. Condition-based maintenance is particularly valuable when a company is experiencing a significant amount of age-related equipment failure that is not necessarily on schedule or expected. 

Since this type of failure is typically expected to make up about 80% of all equipment failures, it is in a company’s best interest to have some type of condition-based maintenance strategy in place, particularly when calculating optimal replacement time.

How do companies do that? 

How to calculate optimal replacement time

Unfortunately, the only sure way to calculate a reasonable optimal replacement time is by studying the data available to you and drawing logical conclusions. All the formulas and calculations that assist in discovering your optimal replacement time depend on this data.

Oddly enough, your major source of this type of data is time-based maintenance. When properly tracked, these records display your maintenance data in easy-to-understand reports that can be fed into your optimal replacement formulas.
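The article does not name a specific formula, so the sketch below uses one common textbook choice, the age-replacement cost-rate model with a Weibull life distribution, purely as an illustration. The Weibull shape and scale would come from your own failure records; the values and costs used here are assumptions.

```python
import math


def weibull_reliability(t, shape, scale):
    """Probability that a unit survives to age t under a Weibull life model."""
    return math.exp(-((t / scale) ** shape))


def cost_rate(age, shape, scale, planned_cost, failure_cost, steps=1000):
    """Expected cost per unit time when replacing at `age` or at failure, whichever comes first."""
    dt = age / steps
    # Expected cycle length is the integral of R(t) from 0 to `age` (trapezoidal rule).
    cycle_length = 0.0
    for i in range(steps):
        left = weibull_reliability(i * dt, shape, scale)
        right = weibull_reliability((i + 1) * dt, shape, scale)
        cycle_length += 0.5 * (left + right) * dt
    survive = weibull_reliability(age, shape, scale)
    # Survivors cost a planned replacement; the rest cost an in-service failure.
    expected_cost = planned_cost * survive + failure_cost * (1.0 - survive)
    return expected_cost / cycle_length


def optimal_replacement_age(candidate_ages, **model):
    """Pick the candidate age with the lowest expected cost rate."""
    return min(candidate_ages, key=lambda age: cost_rate(age, **model))


# Assumed inputs: shape > 1 indicates wear-out; scale is in months; costs are arbitrary units.
model = dict(shape=3.0, scale=30.0, planned_cost=1_000.0, failure_cost=5_000.0)
best_age = optimal_replacement_age(range(1, 61), **model)
print(best_age, round(cost_rate(best_age, **model), 1))
```

In this model, a shape value of 1 or less (no wear-out zone) makes the cost rate fall as the replacement age grows, so no preventive replacement age pays off. That matches the earlier point that time-based tasks only suit failure modes with a wear-out zone.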

This type of study is the quickest way to see results that are steady and reliable. If this seems like too much, consider it as an investment. Calculating your optimal replacement time is probably one of the biggest improvements you can make that will address age-related equipment failure.

At the end of the day, addressing age-related equipment failures is a struggle that many companies and industries deal with every day. However, a few strategic changes focused on time-based and condition-based maintenance can go a long way toward reducing unexpected failures, breakdowns, and inconveniences for you and your clients.

This article was updated with more information in June, 2020
