
The Big 3 Equipment Failure Modes, and How to Address Them

This article contains insights from the book “Maintenance Control” by James Borowski, a maintenance professional and UpKeep customer. If you want additional help predicting and preventing equipment failures, you can download two chapters of the book for free that deal with equipment failure and reliability.

Maintenance organizations are always concerned about how long a piece of equipment will operate. In many organizations, intuition, guesses, and the experience of old-timers determine how long a piece of equipment is expected to last. That expectation eventually translates into the type of service the equipment is given.

For example, if old-timers believe that a piece of equipment in their plant requires frequent component change-outs to maintain a long life, component changes are scheduled frequently. However, for that same piece of equipment operating under the same conditions at another location, old-timers may feel that frequent oil changes are the key to long equipment life. The bottom line is that one or both of these services may not be meaningful and may, therefore, be a waste of time and valuable resources.

Guesses and intuition are not good enough in many instances. During the 1960s, the civil aviation industry was having considerable trouble keeping aircraft flying reliably. The normal reaction was to give the aircraft more service, do more maintenance, and change out more equipment on a regularly scheduled basis. But this didn’t work. The industry needed a better approach.

After careful analysis, a joint study by FAA and commercial airline representatives determined that equipment failures in the aircraft industry fell into two separate categories of failure modes: age-related failures and non-age-related failures.

Age- and Non-age-related equipment failure

As the labels imply, age-related failures deal with equipment that fails due to time in service while non-age-related failures do not relate to time in service at all.

Experience tells us that all equipment will fail after some time in service, even if that takes decades. For example, the power transmission line in your plant may fail after 30 or 40 years. This is not the age-related failure referred to here; for our purposes, we are not including equipment that fails after decades of service. On the other hand, an engine in a truck that operates continuously every day will wear out and fail after a predictable time.

Non-age-related failure relates to equipment that may fail after a month, a year, or multiple years. We just don’t know. This equipment doesn’t wear out in the typical sense but fails for other reasons, whereas equipment that has an age-related failure mode will wear out over time and need to be replaced.

Some equipment has a life that is directly related to time; other equipment does not have this strong tie. The intent of this article is to identify what the experts tell us about age-related and non-age-related equipment failure. Much of the information that follows is based on the work of researchers F. Stanley Nowlan (Director, Maintenance Analysis, United Airlines) and Howard F. Heap (Manager, Maintenance Program Planning, United Airlines), which was introduced in their report, Reliability-Centered Maintenance, made available to the public in 1978.

What we will find is that failure is strongly related to the kinds of stress a piece of equipment receives. Failure modes characterized as age-related involve equipment that is subjected to continuous stress throughout its life, while non-age-related failure modes are linked to equipment that receives intermittent stress during its life.

Below, we’ll also explore a third failure category called infant mortality. This is a special failure mode in which the probability of failure is highest when the equipment is first started and decreases as time goes on.

Big three categories of equipment failure

Age-related, non-age-related, and infant mortality failure modes can be combined to make more than three curves (see below). But for now, we summarize the three curves in the following way and look at what they mean and how they are typically applied:

Age-related failure mode

  • Commonly related to equipment that comes in direct contact with a product being manufactured. Consequently, these failures tend to be associated with shock, fatigue, corrosion, oxidation, and evaporation.
  • Random failures are expected as time moves toward the wear-out zone
  • As time progresses in the wear-out zone, the probability of failure increases as equipment moves from one period to the next
  • Within the wear-out zone, multiple equipment failures occur, and their ages at failure take the shape of a normal distribution

Graph of age-related equipment failure mode

Related article: How to Address Age-Related Equipment Failure
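To see why a normal distribution of failure ages implies a rising probability of failure from one period to the next, here is a minimal sketch. The mean life, standard deviation, and period boundaries are made-up numbers for illustration, not figures from the book or from Nowlan and Heap.

```python
import math

def normal_cdf(x, mean, std):
    """Probability that a normally distributed failure age is <= x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def wearout_conditional_probability(start, end, mean, std):
    """Probability of failing between `start` and `end`, given survival to `start`."""
    survived_to_start = 1.0 - normal_cdf(start, mean, std)
    failed_in_period = normal_cdf(end, mean, std) - normal_cdf(start, mean, std)
    return failed_in_period / survived_to_start

# Hypothetical wear-out item: mean life 10,000 hours, standard deviation 1,000 hours.
MEAN_LIFE = 10_000.0
STD_LIFE = 1_000.0

# Hypothetical 1,000-hour periods leading into and through the wear-out zone.
PERIODS = [(7_000, 8_000), (8_000, 9_000), (9_000, 10_000),
           (10_000, 11_000), (11_000, 12_000)]

for start, end in PERIODS:
    p = wearout_conditional_probability(start, end, MEAN_LIFE, STD_LIFE)
    print(f"{start:>6}-{end:<6} h: {p:.3f}")
```

Each printed value is the chance that a unit still running at the start of the period fails before the period ends. Under these assumed numbers the value climbs with every period, which is the behavior the wear-out zone describes.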

Non-age-related failure mode

  • Associated with high stress from operational errors or external damage
  • Equipment failure is not based on time; there is no relationship between how long the equipment has been in service and the likelihood of it failing
  • As time progresses, probability of failure is the same regardless of the period
  • Equipment failure is random
  • Equipment that experiences more non-age-related failures tends to be complex equipment or systems

Graph of non-age-related equipment failure mode

Related article: How to Address Non-Age-Related or Random Equipment Failure
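One common way to model "the same probability of failure regardless of the period" is a constant failure rate, which gives an exponential survival curve. The sketch below assumes that model and a made-up rate; neither comes from the book.

```python
import math

def survival(t, rate):
    """Probability an item is still running at age t under a constant failure rate."""
    return math.exp(-rate * t)

def random_conditional_probability(start, end, rate):
    """Probability of failing between `start` and `end`, given survival to `start`."""
    return (survival(start, rate) - survival(end, rate)) / survival(start, rate)

RATE = 0.02  # hypothetical constant failure rate per month

# Compare one-month periods at very different equipment ages.
for start in (0, 12, 60, 120):
    p = random_conditional_probability(start, start + 1, RATE)
    print(f"month {start:>3} to {start + 1:<3}: {p:.4f}")
```

Whether the equipment is new or ten years old, the chance of failing in the next month comes out the same. That is why replacing such equipment on a schedule does not lower its failure probability.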

Infant mortality failure mode

  • Relates to equipment that has been rebuilt by trade/craft personnel or that has received a service that was invasive to the system
  • Probability of failure is highest when equipment is first started
  • Probability of failure levels off after a time period

Graph of infant mortality equipment failure mode

Related article: How to Address Infant Mortality Equipment Failure
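One simple way to reproduce a curve that starts high and then levels off is to assume that a small share of rebuilds carry a defect and fail quickly, while the rest fail at a low, steady rate. This mixture model and every number in it are assumptions made for illustration; the book does not prescribe them.

```python
import math

def mixture_failure_rate(t, weak_fraction, weak_rate, sound_rate):
    """Failure rate at age t for a population mixing defective and sound units.

    Each unit has its own constant failure rate. Defective units drop out
    early, so the surviving population is increasingly made up of sound units.
    """
    weak_alive = weak_fraction * math.exp(-weak_rate * t)
    sound_alive = (1.0 - weak_fraction) * math.exp(-sound_rate * t)
    failing_now = weak_rate * weak_alive + sound_rate * sound_alive
    return failing_now / (weak_alive + sound_alive)

# Hypothetical numbers: 10% of rebuilds carry a defect.
WEAK_FRACTION = 0.10
WEAK_RATE = 0.50    # failures per week for defective rebuilds
SOUND_RATE = 0.01   # failures per week for sound rebuilds

for week in (0, 1, 2, 4, 8, 16, 32):
    rate = mixture_failure_rate(week, WEAK_FRACTION, WEAK_RATE, SOUND_RATE)
    print(f"week {week:>2}: {rate:.4f}")
```

The rate begins well above the sound-unit rate and settles toward it as the defective units are weeded out, matching the "highest at start-up, then levels off" shape described above.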

With three failure mode curves defined, we now expand them to the six classical failure curves of Nowlan and Heap, developed in the 1960s.

Six classical equipment failure modes

In their work with United Airlines, Stanley Nowlan and Howard Heap studied equipment failures for many years. In their report, the authors identified six equipment failure mode curves that they referred to as “age-reliability patterns.”

Here’s what the report says:

“In each case the vertical axis represents the conditional probability of failure and the horizontal axis represents operating age since manufacture, overhaul, or repair. These six curves are derived from reliability analyses conducted over a number of years, during which all the items analyzed were found to be characterized by one or another of the age-reliability relationships shown.”

The report follows with pencil sketches of the six failure mode curves that have now become classics in the maintenance arena. These curves were simply labeled A through F and given a short description but no title. In the versions illustrated below, the graphic has been modified and enhanced with a nominal title for each curve as well as comments about each failure mode.
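The vertical axis on all six curves, conditional probability of failure, is the share of units that fail during an interval out of those still running when the interval began. Here is a minimal sketch of how you might estimate it from your own records. The function name and the pump data are hypothetical, and the sketch assumes every unit ran until it failed, which real maintenance records rarely satisfy.

```python
def conditional_failure_by_interval(failure_ages, interval_width):
    """Return (interval_start, units_at_risk, failures, conditional_probability) rows.

    Assumes every unit in `failure_ages` ran until it failed; units removed
    early for scheduled replacement are not handled here.
    """
    if not failure_ages:
        return []
    rows = []
    at_risk = len(failure_ages)
    last_age = max(failure_ages)
    start = 0
    while start <= last_age and at_risk > 0:
        end = start + interval_width
        failures = sum(1 for age in failure_ages if start <= age < end)
        rows.append((start, at_risk, failures, failures / at_risk))
        at_risk -= failures  # only survivors enter the next interval
        start = end
    return rows

# Hypothetical operating hours at failure for ten identical pumps.
FAILURE_AGES = [120, 450, 900, 1300, 1700, 2100, 2350, 2600, 2800, 2950]
INTERVAL = 1000

for start, at_risk, failures, prob in conditional_failure_by_interval(FAILURE_AGES, INTERVAL):
    print(f"{start:>5}-{start + INTERVAL:<5} h  at risk: {at_risk:>2}  "
          f"failed: {failures}  conditional probability: {prob:.2f}")
```

Plotting the last column against operating age gives a rough version of the curves discussed below. With only a handful of failures the shape will be noisy, so treat it as a starting point rather than a conclusion.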

Bathtub curve

The bathtub curve represents an age-related equipment failure. It starts with infant mortality followed by a constant or gradually increasing failure probability and then by a pronounced “wearout” region. An age limit may be desirable, provided a large number of units survive to the age at which wearout begins.

Bathtub curve that shows how equipment fails over time
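A common textbook way to sketch a bathtub shape is to add together a decreasing, a constant, and an increasing failure rate. The example below uses Weibull failure rates for the three pieces. That choice, and every parameter value, is an assumption for illustration and not something taken from Nowlan and Heap.

```python
def weibull_failure_rate(t, shape, scale):
    """Weibull failure rate at age t > 0: (shape / scale) * (t / scale) ** (shape - 1).

    shape < 1 gives a decreasing rate, shape == 1 a constant rate,
    and shape > 1 an increasing rate.
    """
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def bathtub_failure_rate(t):
    """Sum of three illustrative failure rates; all parameters are made up."""
    early = weibull_failure_rate(t, shape=0.5, scale=2_000.0)     # infant mortality
    chance = weibull_failure_rate(t, shape=1.0, scale=5_000.0)    # random failures
    wearout = weibull_failure_rate(t, shape=5.0, scale=10_000.0)  # wear-out
    return early + chance + wearout

# Operating ages in hours; zero is excluded because the early term is undefined there.
AGES = [100, 500, 1_000, 3_000, 5_000, 8_000, 10_000, 12_000]

for age in AGES:
    print(f"{age:>6} h: {bathtub_failure_rate(age):.6f}")
```

With these assumed parameters the rate falls from its start-up level, flattens through mid-life, and climbs again as the wear-out term takes over.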

Conditional wear curve with wear-out zone

This curve represents an age-related equipment failure. It shows constant or gradually increasing failure probability followed by a pronounced wear-out region. Once again, an age limit may be desirable.

Graph of conditional wear curve with wear-out zone

Slowly increasing conditional wear curve without wear-out zone

This curve represents an age-related equipment failure. It shows gradually increasing failure probability but with no identifiable wear-out age. It is usually not desirable to impose an age limit in such cases.

Graph showing a slowly increasing conditional wear without wear-out zone

Low conditional wear curve

This curve represents a non-age-related equipment failure. It shows low failure probability when the item is new or just out of the shop, followed by a quick increase to a constant level.

Graph with a low conditional wear curve

Constant conditional wear curve

This curve represents a non-age-related equipment failure. It shows constant probability of failure at all ages.

Graph with a constant conditional wear curve

High infant mortality curve

This curve represents a non-age-related equipment failure. It shows infant mortality, followed by a constant or very slowly increasing failure probability.

Graph of high infant mortality curve

Considerations for the different failure modes

  • It’s difficult for the ordinary maintenance person to appreciate the number of random failures predicted by the study of Nowlan and Heap and their six classical failure curves. It is not intuitive that so many failures are random. Yet, after years of study and application, numerous large, respected organizations have determined that this is the case. A very early example of this philosophy appeared in the 1960s with Walter P. Cisler and Detroit Edison Co. Likewise, the US Navy and commercial aviation in both the United States and Europe are firm believers in the concept.
  • It is interesting to note that infant mortality is the leading culprit when it comes to equipment reliability, at least according to Nowlan and Heap. Yet, in the decades before their research, hardly any mention was made of maintenance personnel doing poor work or of original equipment manufacturers making poor equipment that failed shortly after installation.
  • Look at your organization and the equipment that you service. Can you say that most or many of the pieces of equipment you are responsible for fail because they wear out? Probably not. If not, then how do they fail? Do any match the classic failure mode curves of Nowlan and Heap?
  • It appears that infant mortality accounts for a significant number of failures in all studies. This most likely means that any organization attempting to improve reliability should document this failure mode and put countermeasures in place before embarking on a reliability improvement program.
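As a starting point for documenting infant mortality in your own plant, you could measure what share of recorded failures happened shortly after a rebuild or intrusive service. The sketch below is one hypothetical way to do that. The 100-hour window, the function name, and the sample records are all assumptions, so pick a window that makes sense for your equipment.

```python
def early_failure_share(hours_since_last_intervention, early_window_hours):
    """Fraction of failures that occurred within `early_window_hours` of the
    last rebuild, overhaul, or intrusive service."""
    if not hours_since_last_intervention:
        return 0.0
    early = sum(1 for hours in hours_since_last_intervention
                if hours <= early_window_hours)
    return early / len(hours_since_last_intervention)

# Hypothetical failure records: operating hours between the last intrusive
# maintenance task and the failure that followed it.
HOURS_TO_FAILURE = [12, 30, 45, 400, 900, 1500, 60, 2200, 20, 3100]
EARLY_WINDOW = 100  # hypothetical cutoff for an "early" failure, in hours

share = early_failure_share(HOURS_TO_FAILURE, EARLY_WINDOW)
print(f"{share:.0%} of recorded failures occurred within "
      f"{EARLY_WINDOW} h of the last intervention")
```

A high share suggests that start-up and post-maintenance failures deserve attention before any schedule changes aimed at wear-out.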