How to Address Non-Age-Related or Random Equipment Failure
This article contains insights from the book “Maintenance Control” by James Borowski, a maintenance professional and UpKeep customer. If you want additional help predicting and preventing equipment failures, you can download two chapters of the book for free that deal with equipment failure and reliability.
The failure process for age-related issues is directly linked to time. It’s generally felt that new equipment upon installation will operate for a continuous period until it eventually wears out. Maintenance personnel from the earliest times believed that this was the manner in which all equipment failed. It could arguably be stated that most maintenance personnel even today believe that this is the way all equipment fails.
Research conducted decades ago showed that this is not the case.
In fact, according to some experts, most equipment failures are not age-related at all. Their failure modes are more or less random in nature and not really associated with time or equipment age.
For age-related failures, stresses are applied continuously during the life of a piece of equipment. These applied stresses take their toll over time and weaken equipment. The ultimate result is functional failure after a period. With non-age-related failures, however, equipment is subjected to stresses, which are typically inconsistent and variable in nature.
Examples of events that cause non-age-related failures include:
- A high voltage transformer experiencing an extremely strong electrical surge resulting in high stress to its insulation and wiring. This may cause a short within the transformer and ultimate failure.
- An operator subjecting his piece of equipment to periodic overloads or shocks that impact a machine’s structural framework or rolling element bearings.
- The equipment itself being manufactured or assembled incorrectly causing premature failure.
These are examples of equipment failures not based on time.
How random equipment failures happen
To understand the process of non-age-related failures, four graphs can be used.
Graph A shows that the equipment has a certain resistance to operational stresses. However, when an extremely high peak stress is received (e.g., stress from operator error or equipment damage), the resistance of the equipment is overcome and functional failure results immediately.
Graph B illustrates a condition where a high stress is received which lessens the overall resistance to failure. The more stresses of this nature that the equipment receives, the weaker the equipment becomes.
Graph C depicts a situation where a very high stress is received during operation, but the impact is temporary. That is, healing takes place resulting in equipment regaining its original level of resistance. This is typical of equipment with thermoplastic materials that soften and harden with temperature.
Graph D shows a relationship where a high applied stress impacts the resistance of the equipment such that there is a deteriorating downward trend toward an eventual failure based on time. This failure mode is typical of equipment that has been assembled with damaged parts or has not been stored properly.
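The behavior described for Graphs A and B can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the stress values, the damage threshold, and the damage fraction are all hypothetical and are not taken from the book. Graph C (healing) and Graph D (time-based decline after a stress) are not modeled here.

```python
def run_until_failure(stresses, resistance, damage_threshold=None, damage_fraction=0.0):
    """Return the index of the stress event that causes failure, or None.

    Hypothetical model: failure occurs when a stress meets or exceeds the
    remaining resistance. If damage_threshold is set, any surviving stress at
    or above it permanently lowers the resistance (the Graph B behavior).
    """
    for i, stress in enumerate(stresses):
        if stress >= resistance:
            return i  # resistance overcome: immediate functional failure
        if damage_threshold is not None and stress >= damage_threshold:
            resistance -= damage_fraction * stress  # permanent weakening
    return None


# Graph A: one extreme peak exceeds the full resistance of 100.
graph_a = run_until_failure([20, 30, 120, 25], resistance=100)

# Graph B: no single peak reaches 100, but repeated peaks of 70 weaken the unit.
peaks = [20, 30, 70, 25, 70, 30, 70, 40]
no_damage = run_until_failure(peaks, resistance=100)
graph_b = run_until_failure(peaks, resistance=100, damage_threshold=60, damage_fraction=0.5)

print(graph_a, no_damage, graph_b)
```

With these made-up numbers, the Graph A unit fails at the single extreme peak, while the Graph B unit survives the first peak of 70 but fails at the second, because the first one lowered its resistance below 70.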
For equipment whose failure mode is not related to time or age, the Conditional Probability of Failure is the same from period to period. If the equipment successfully survives one period without failure, its chances of failing in the succeeding period are exactly the same. Failure chances or probability of failure are unrelated to time.
This is illustrated in the following graph.
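The same idea can be checked numerically. The sketch below assumes a fixed 5% chance of failure in every period for the random case and, for contrast, a Weibull wear-out curve with shape 3 and scale 10 for an age-related case. Both models and all of the numbers are assumptions chosen for illustration, not figures from the book.

```python
import math


def conditional_failure_probability(survival, period):
    """P(fail during `period` | survived all earlier periods)."""
    survived_before = survival(period - 1)
    return (survived_before - survival(period)) / survived_before


P_FAIL = 0.05  # assumed chance of failure in any single period


def random_survival(n):
    """Probability of surviving n periods when failure is random."""
    return (1 - P_FAIL) ** n


def wearout_survival(n, shape=3.0, scale=10.0):
    """Probability of surviving n periods under an assumed Weibull wear-out model."""
    return math.exp(-((n / scale) ** shape))


random_probs = [conditional_failure_probability(random_survival, n) for n in range(1, 11)]
wearout_probs = [conditional_failure_probability(wearout_survival, n) for n in range(1, 11)]

for n, (r, w) in enumerate(zip(random_probs, wearout_probs), start=1):
    print(f"period {n:2d}: random {r:.3f}   wear-out {w:.3f}")
```

In the random column the conditional probability stays at the same value in every period, while in the wear-out column it climbs as the equipment ages. That difference is why a time-based replacement can help in the second case but not in the first.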
How to address non-age-related failures with maintenance
Non-age-related equipment failures are associated with complex equipment. The components of these machines include sophisticated electronic controls, mechanical devices with close tolerances, and rolling element bearings of all types and sizes. Random failure occurs with this equipment because of stresses from operational errors and external damage.
If that is the case, then a reasonable person might ask, “What can be done?” Is there an option or any technique that can be used to predict the random failure of this type of equipment? The answer is “yes.” There is a way. The trick is to identify a potential failure point.
According to Nowlan and Heap in Reliability-centered Maintenance, “a potential failure is an identifiable physical condition, which indicates a functional failure is imminent.”
With potential failure, the equipment is living up to expectations as far as the operators are concerned. Yet, if the proper inspection is made, it can be seen that the equipment is providing a clear, visible signal that failure is about to happen.
When a random failure occurs, in most cases the equipment gives a signal. This signal is a point of potential failure. If we can identify that potential failure point before the equipment reaches ultimate functional failure, the equipment can be serviced to extend its life or removed from service.
To address non-age-related random failures, the potential failure point for the equipment must be found. As suggested by Nowlan and Heap, maintenance organizations use the following proactive strategies to find potential failures:
- Scheduled condition monitoring tasks
- Scheduled failure-finding tasks
Perform scheduled condition monitoring tasks
For equipment with a random probability of failure, assigning time-based maintenance tasks to replace or refurbish the equipment is useless and wasteful. Condition-monitoring tasks are used instead.
Proactive condition monitoring activities consist of inspections that are either visual in nature or they involve the use of sophisticated technologies. Both are part of a program to recognize the correct time or point to service a piece of equipment before failure occurs.
Visual inspections to determine operating life are a considerable step above the general walk-around inspections that may be conducted by operators and maintenance personnel alike. These inspections are technical in nature: data (e.g., physical measurements, pictures of current conditions, and sketches of problem areas) are acquired for decision-making purposes. All the information is analyzed by experts, and a determination is made of the present condition of the equipment and its remaining life.
In addition to technical visual inspections, sophisticated technologies and techniques are commonly used to determine equipment condition. These include:
- Vibration Analysis
- Motor Current Signature Analysis (MCSA)
- Non-Destructive Testing
All of these are able to capture signals and identify nuances that foretell that a piece of equipment is on the path to imminent failure. Regardless of how sophisticated or simple each of the above technologies / techniques is, it is designed to find and seek out potential failures. That is its purpose in life and reason for existing.
All of the above technologies or techniques are intended to find the “P” on a P-F curve. Before we move on to exploring failure-finding tasks, I want to explain what that means.
P-F curve and condition monitoring
When equipment with a non-age-related failure mode begins to experience random failure, there is some warning that the failure is about to happen in most cases. The intent of the on-condition monitoring task is to find the “P” on the P-F Curve.
The P-F Curve has an X-axis of time and a Y-axis for condition. Time increases toward the right. The best equipment condition is at the top of the Y-axis.
The curve consists of three significant points:
- Point where failure begins
- Point where the failure is detected (P)
- Point where the equipment has failed functionally (F)
The function of performing an on-condition task is to detect Point P, where failure is in its earliest stages. For a visual inspection, this can mean finding a wear measurement that is extremely dangerous or out-of-spec, or finding a crack in a metal surface that is about to fracture.
On-condition tasks using sophisticated technologies and techniques can mean:
- Finding bearing wear, coupling misalignment, or fan imbalance using Vibration Analysis
- Discovering hot spots in refractories, electrical switchgear, or electric motor leads with Thermography
- Determining excessive wear of gearing or seals with Tribology
- Documenting the condition of electric motors by means of Meggering or Motor Current Signature Analysis
- Identifying fatigue cracks in objects and structures using Dye Penetrant Testing
- Discovering leak paths around seals or fluid flow through pipes using Ultrasonic Testing
The intent is that the discovery of the initial failure point will provide enough time within which the unit can be serviced. This period is known as the P-F Interval.
The P-F Interval is the warning period or lead-time to failure for the equipment. This is the period within which the equipment must be serviced. The wildcard in this strategy is Point F.
It may be difficult or impossible to determine when exactly failure will take place. A maintenance organization realistically does not know if they have a long or short P-F Interval when Point P is discovered. In this case, experience is the best guide to how long the P-F Interval actually is.
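One piece of arithmetic is worth making explicit here. If a potential failure becomes detectable just after an inspection, it will not be found until the next one, so the usable warning time is the P-F Interval minus the inspection interval. The sketch below assumes the P-F Interval has already been estimated from experience and that the inspection reliably detects Point P. The function names and the day counts are hypothetical.

```python
def worst_case_lead_time(pf_interval_days, inspection_interval_days):
    """Days left to act if Point P appears right after an inspection.

    A negative result means the unit can go from P to F entirely between
    two inspections, so the failure may never be caught in time.
    """
    return pf_interval_days - inspection_interval_days


def inspection_is_adequate(pf_interval_days, inspection_interval_days, response_days):
    """True if the worst-case lead time still covers the time needed to plan and repair."""
    return worst_case_lead_time(pf_interval_days, inspection_interval_days) >= response_days


# Assumed example: P-F Interval estimated at 60 days, 14 days needed to respond.
print(worst_case_lead_time(60, 30), inspection_is_adequate(60, 30, 14))  # monthly checks
print(worst_case_lead_time(60, 90), inspection_is_adequate(60, 90, 14))  # quarterly checks
```

With these assumed figures, monthly inspections leave at least 30 days to react, while quarterly inspections give a negative lead time. Because the real P-F Interval is uncertain, the estimate fed into a check like this should be a conservative one.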
In practice, inspections are scheduled on a time or calendar basis, like every month or every quarter. Once a potential failure (point P) has been detected and a certain threshold has been reached, a maintenance organization typically abandons the established schedule, embarking on more frequent condition monitoring to determine the rate of decay. Typically, this is followed by debate as to when the machine should come down.
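One simple way to inform that debate is to fit a straight line through the recent readings and extrapolate to a limit. The sketch below does this with an ordinary least-squares fit. The vibration readings, the limit of 8.0 mm/s, and the assumption that decay is linear are all made up for illustration; real degradation often accelerates, so a result like this is a rough guide at best.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_limit(days, readings, limit):
    """Estimate days from the last reading until the trend line reaches `limit`.

    Returns None when the trend is flat or improving, since no crossing
    can be projected in that case.
    """
    slope, intercept = fit_line(days, readings)
    if slope <= 0:
        return None
    day_at_limit = (limit - intercept) / slope
    return day_at_limit - days[-1]


# Hypothetical weekly vibration readings (mm/s) taken after Point P was detected.
days = [0, 7, 14, 21]
readings = [4.0, 4.7, 5.4, 6.1]
remaining = days_until_limit(days, readings, limit=8.0)
print(f"Projected days until limit: {remaining:.1f}")
```

For these invented readings the trend rises by about 0.1 mm/s per day, which projects roughly 19 days of margin after the last reading.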
Note: The use of any of the current predictive technologies relies on a good deal of experience and intuition rather than hard science.
Part of any modern equipment reliability program involves the use of predictive technologies to find potential failures of operating equipment. In the high-stress world of equipment reliability, maintenance guys continuously look for “P.”
As just described, a maintenance inspection can be scheduled in a system to look for a potential failure. The next task, which also is unrelated to time, is scheduled in a system to find something that has already failed. Typically, these are found in systems that have intermittent use.
Perform scheduled failure-finding tasks
For the next scheduled proactive maintenance task, we are looking for something that has already failed, but hasn’t been seen yet. It is something that is not obvious to an operator, but it must be found. Typically, these hidden failures are associated with emergency or backup systems.
Functional failures that are hidden can lead to multiple failures. For example, a hydraulic system may have a supply of pressurized fluid stored in a bank of accumulators for emergency purposes. The pressurized supply under normal conditions is isolated from system components by means of an electrically energized valve. When power is lost to the machine, the valve is de-energized, causing a spring to shift the valve open, allowing pressurized fluid to enter the system.
The valve in this example can fail without anyone being aware. That is, when electrical power is lost, the valve may “stick” or “hang up,” blocking accumulator discharge. Without emergency flow, the machine fails in operation, possibly creating a safety issue or causing major damage.
During an emergency when electric power is lost, it is common in some applications for a diesel-powered generator to automatically start and feed emergency power to a network. If the generator is out of fuel or in some way unable to start, a hidden failure is present.
Frequently, a tank has a high-level switch that shuts down the pump filling it with liquid. That high-level control circuit is commonly backed up with another switch, identified as a high-high level switch. If the first high-level switch fails, the high-high level switch shuts down the pump. However, if the high-high level switch has also failed, no one will know until the tank overflows, most likely resulting in more damage.
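The tank example can be sketched as a small simulation to show why the failure is hidden. The setpoints, the tank capacity, and the function names below are assumptions made for illustration only.

```python
def pump_stops(level, high_ok, high_high_ok, high_setpoint=80, high_high_setpoint=90):
    """True if either working switch trips the pump at this level."""
    if high_ok and level >= high_setpoint:
        return True
    if high_high_ok and level >= high_high_setpoint:
        return True
    return False


def fill_tank(high_ok, high_high_ok, capacity=100):
    """Fill one unit at a time; return the level where the pump stops, or None on overflow."""
    level = 0
    while level < capacity:
        if pump_stops(level, high_ok, high_high_ok):
            return level
        level += 1
    return None  # neither switch worked: the tank overflows


print(fill_tank(high_ok=True, high_high_ok=True))    # 80
print(fill_tank(high_ok=True, high_high_ok=False))   # 80, backup failure stays hidden
print(fill_tank(high_ok=False, high_high_ok=True))   # 90, backup does its job
print(fill_tank(high_ok=False, high_high_ok=False))  # None, overflow
```

The first two cases give the same result, so normal operation can never reveal that the high-high switch has failed. Only a deliberate test of that switch exposes the fault before the fourth case occurs.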
Failure-finding tasks are scheduled maintenance activities to discover functional failures that are hidden. They are processes and techniques used by maintenance organizations to seek out failed components and circuits that frequently are only required to function during an abnormal or emergency condition.
For our examples, a failure-finding task would be scheduled in a maintenance management system to test the emergency operation of the bank of accumulators; or to test the emergency operation of the diesel-powered generator; or to test the high-high level switch of the tank control circuitry.
Finding functional failures that are hidden is a very important proactive maintenance task for any equipment reliability program. These tasks may save lives, production equipment, and a company’s well-being, in addition to making the equipment reliable.