Episode 17: Doing Root Cause Analysis the RIGHT way with Bob Latino
This week on Masterminds in Maintenance, we hear from Bob Latino, who shares his best practices on root cause analysis. Bob is the CEO of Reliability Center, Inc. and is an internationally recognized author, trainer, software developer, lecturer and practitioner of best practices in the field of Reliability Engineering and specifically in Root Cause Analysis & Investigation Management. Robert J. Latino has been facilitating RCA & FMEA analyses with his clientele around the world for over 35 years and has taught over 10,000 students in the PROACT® RCA Methodology.
Bob shares his story of how he got into the maintenance industry and how his family has shaped his maintenance journey. More specifically, Bob shares everything you need to know about root cause analysis. Bob argues that root cause analysis is not just a matter of asking the 5 Whys about why equipment failed; it also involves asking how the equipment could have failed.
00:06 Ryan Chan: Welcome to Masterminds in Maintenance, a podcast for those with new ideas in maintenance. I’m your host, Ryan. I’m the CEO and founder of UpKeep. Each week, I’ll be meeting with a guest who’s had an idea for how to shake things up in the maintenance and reliability industry. Sometimes the idea failed, sometimes it made their business more successful, and other times their idea revolutionized an entire industry. Today I’m so excited to have Bob Latino, the CEO of Reliability Center, Inc., on the show. Bob is an internationally recognized author, trainer, software developer, lecturer, and practitioner of best practices in the field of reliability engineering, and more specifically root cause analysis and investigation management. You’ve got tons of experience, Bob, and I’m so excited to chat with you today. I know that you’ve been doing root cause analysis and failure modes and effects analysis with your clients around the world for over 35 years, and that you’ve taught over 10,000 students in the PROACT root cause methodology. I’m super excited. Welcome to our show, Bob.
01:05 Bob Latino: I really appreciate you having me. And I hope I live up to all that stuff you just said.
01:12 RC: Of course. Of course you will, and a whole lot more, Bob. I know you pretty well; we’ve been following each other for quite some time. But I would love for you to give all of our listeners a quick background on yourself and your story.
01:30 BL: Alright. It goes back quite a ways. We were established in 1972 as a research and development group of a company called Allied Chemical, which at the time was a global chemical company; through acquisitions, it’s more commonly known today as Honeywell. What we were founded on was equipment, process, and human reliability, taking the principles from aviation and moving them into manufacturing plants. Just think about that: that was 1972. You look at people trying to get reliability started today, and it’s a challenge, but this was a fully functional department with roles and responsibilities, and it set up reliability departments at about 300 plants around the world at that time. That was what the R&D group was chartered to do. In 1985, the founder and director of that group decided to retire, and he bought his group out of the corporation.
02:40 BL: Now, it’s a huge coincidence that that fellow’s name was Charles Latino which will answer your next question about how did I get into reliability and maintenance. So my father is the one who directed and founded that group and he purchased it and then the intellectual property conveyed and we were able to take those principles and apply them to any industry that we wanted to.
03:07 RC: Wow. What an awesome story. I’ve also heard quite a few times that the core fundamentals of reliability stem from aviation. I’d love to learn more about what that was like. How were you able to take the learnings from aviation and bring them into manufacturing?
03:28 BL: Well, from his standpoint, he always had a fascination with aviation and the reliability they were achieving, even nearly 50 years ago. He was the head of engineering and maintenance at a chemical plant in this area that had about 5,000 people at that time, and he couldn’t sustain the same level of reliability that aviation was achieving. That’s what got him into doing research in that area, using his plants here as his own lab for applying preventive, predictive, and all the other technologies available back then. If you’re into nostalgia, it’s on our website: they were doing vibration, infrared, and eddy-current testing, and they were one of the first to do operator rounds with handheld data recorders. There was no CMMS way back then, so they created and coded their own. They had their own lubrication management software back then too, so they were really way ahead of their time. And that was 47 years ago, give or take.
04:50 RC: So 47 years later, you’re still doing this Bob, going strong. You definitely are a big leader in this space, and I know that one of the things that you’re very, very passionate about is root cause analysis. I’m curious how did that become your core focus or one of your core focuses? And what are you excited by?
05:12 BL: Well, a lot of that revolves around what you think reliability is. I consider myself as having grown up in the University of Reliability, because it wasn’t just a job for my father; it was a way of life. He even designed reliability into the features of his house. [laughter] No kidding. So I grew up understanding the difference between proaction and reaction, and that’s really what reliability was to me. What I was taught was that if you PROACT effectively, you don’t have to get better at catching the consequences. A lot of people use prediction to get to a failure quicker: you catch it at an earlier stage and you mitigate the consequences. But what we really focus on is why whatever you were tracking, whatever signal, went out of its alarm limits in the first place.
06:24 RC: Yeah.
06:25 BL: Why did I need to detect it? If you focused on the risk side, you would be a lot better on the consequence side, meaning you wouldn’t have the consequences. So I think a lot of this revolves around proaction versus reaction. And I do wanna tell a quick story. Back then, when nobody had ever heard of reliability and it was new in this big chemical conglomerate, they didn’t know where to put it on the organizational chart. So initially, they made it subordinate to maintenance. My father rejected the idea. He said, “Maintenance people are today people, and reliability people are tomorrow people. If you put a proactive function underneath a reactive function, it will never get to do the proactive activities.”
07:18 RC: Yeah.
07:18 BL: Because you’ll become a slave to the reactive nature of the field.
07:23 RC: Yeah. Maybe a quick question there, since this has been brought up a few times: do you think maintenance and reliability should be under the same department?
07:33 BL: Well, again, I think they need to be isolated from each other, in the sense that one’s proactive and one’s reactive. Maintenance is shorter term: whatever I’ve got to deal with here and now, and maybe over the next couple of weeks. Reliability is looking out months and years ahead.
07:52 RC: Yeah, absolutely.
07:54 BL: I’ve been in the business long enough to see maintenance engineers get a new vice president who comes in and says, “We’re gonna have reliability now.” So they change the name of the Maintenance department to the Reliability department, but they don’t change the function.
08:16 BL: So if I’m a maintenance engineer doing reliability work, operations doesn’t care. They say, “I need somebody to deal with my issues right now, today.” Changing the title didn’t solve the problem.
08:32 RC: Yeah. Well, Bob, I would love to chat a little bit about root cause analysis. Could you walk us through how to perform a root cause analysis the right way?
08:45 BL: Well, I’m just gonna go through the methodology we espouse, because if you look at any investigative occupation, the steps are all the same. Whether somebody’s method of RCA meets the criteria or not is subjective. But look at the steps. The PR in PROACT is preserving evidence. Pick NCIS or any of your favorite crime shows, and you know what step one is: you go out there, cordon off the area, and preserve the evidence. The O is organizing the team. What are the roles and responsibilities of the team members? Because bias can certainly play a role in an analysis.
09:39 RC: Yeah.
09:39 BL: If I have something to gain or lose by the analysis results, I shouldn’t be leading it. That’s just common sense.
09:47 RC: Who has a motive? I’m thinking back to the crime show. [chuckle]
09:51 BL: Right. It happens all the time, though. You’ve been in this business a while, and you see that when there’s a need for an RCA, they find the person most familiar with that particular equipment, the expert. The expert really just goes through the formalities, because he or she doesn’t really want a team anyway. Everybody’s intimidated by the expert because they’re scared to ask the stupid question, and the expert doesn’t wanna hear it. So you’ve gotta get away from that. Then the A is analyzing the event with the team I have, and to us that’s the most important part; it’s what most people consider RCA to be. You’ve gotta have a graphical reconstruction of what happened. This basically follows the scientific method: you come up with hypotheses, you verify those hypotheses, and you just keep drilling down on what’s true. One of the unique things is how we question the facts: you keep asking, “How could?” Think about the difference between asking “How could?” and “Why?” There’s a large population that believes root cause analysis is the same thing as applying the 5 Whys.
11:19 RC: Yeah.
11:20 BL: But if I ask, “How can a crime occur?” versus “Why did a crime occur?”, those lead to very different answers.
11:26 RC: Interesting, I never thought about it that way.
11:29 BL: Well, think about… I know that later in these questions we’re gonna come across the difference between the physical sciences, the physics of failure, and the social sciences, which get into the soft stuff about humans and how they make decisions.
11:45 RC: Yeah.
11:45 BL: But when you’re going through the physics, not much changes in physics. When I ask, “How can a bearing fail?”, there are only so many answers to that question. So when I’m going through the physics, asking “How could?” is appropriate, and then you use your metallurgical analysis to determine what’s true and what’s not. But eventually you’re going to come across a human who made an error of omission or commission: they made a decision to do something they shouldn’t have done, or they should have done something and didn’t do it. That’s when you start switching over to asking “Why?” And a lot of people end their analysis there and blame somebody.
12:27 RC: Yeah.
12:28 BL: But that really should be the beginning of your investigation, not the end of it, because you won’t get anybody participating in future RCAs if you’re using RCA as a weapon to point out that somebody did something wrong. The real gold comes from asking, “Why did that person, that day, think it was the right thing to do?”
12:50 RC: Yeah.
12:50 BL: That’s what a progressive organization would want out of its RCA system. And that gets into the soft stuff of understanding the systems people depend on. I know you’re in the CMMS business, and you look at all of these systems that are lacking. Maybe we have new technology but didn’t implement any SOPs to support it, or didn’t train anybody when we did, or what we have in place is obsolete and a well-intentioned person followed an obsolete procedure. Then we just go ahead and discipline that person and hope it goes away. So a lot of this gets past the decision maker and into the reasoning. And getting back to your original question, since I completely got off track: the C in PROACT is about communicating findings and recommendations, because solving the failure is only half the headache. The other half is getting something done about it, and this is where great tools like yours come into play, because now I have to come up with the recommendations. It can’t be a checklist item; just because I have a recommendation doesn’t mean it worked. So the T in PROACT is about tracking for bottom-line performance. It’s really the measure of effectiveness. Where is the ROI? I used all this talent and all these resources to do an RCA; what did I get for it?
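One way to make the T in PROACT concrete is to compare how often a failure recurred before and after the recommendations were implemented. The Python sketch below is a minimal illustration, assuming ROI is measured as avoided losses minus the cost of the analysis, divided by that cost. The `RcaTracking` record, its fields, and the example figures are hypothetical and are not part of the PROACT methodology as described in the episode.

```python
from dataclasses import dataclass


@dataclass
class RcaTracking:
    """Hypothetical record for tracking one RCA's bottom-line effect."""
    analysis_cost: float   # talent and resources spent on the RCA and its fixes
    loss_per_event: float  # average cost of one occurrence of the failure
    events_before: int     # occurrences in a period before the fixes
    events_after: int      # occurrences in an equal-length period after the fixes


def roi(t: RcaTracking) -> float:
    """ROI as (avoided losses - analysis cost) / analysis cost."""
    avoided_losses = (t.events_before - t.events_after) * t.loss_per_event
    return (avoided_losses - t.analysis_cost) / t.analysis_cost


# Made-up figures for illustration only.
example = RcaTracking(analysis_cost=20_000, loss_per_event=15_000,
                      events_before=6, events_after=1)
print(f"ROI: {roi(example):.0%}")  # prints "ROI: 275%"
```

With these made-up inputs, five avoided events at $15,000 each against a $20,000 analysis give an ROI of 275%; a negative result would mean the RCA has not yet paid for itself.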
14:26 RC: Yeah, yeah. I feel like that’s a very common thing that people forget. You don’t do an analysis just to do an analysis; you do it to ultimately take action and prevent that from happening again.
14:39 BL: Well, and the even bigger picture for a more progressive organization is that you’re building an internal knowledge base. If you have an RCA knowledge base and use it properly, it’s all searchable. Otherwise you’re solving the same thing over and over again just because people don’t know somebody else has already done it. You’re also capturing the logic of the people who do the RCAs. You might have a materials engineer at one location and not at another, so the people at the second location wouldn’t know the questions that have already been answered.
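A searchable RCA knowledge base can start as simply as a word index over past analyses. The Python sketch below is a minimal illustration, assuming each past RCA is stored as a record with a site and a title; the records and the `build_index` and `search` helpers are hypothetical, and a real system would likely index the full logic of each analysis rather than just its title.

```python
from collections import defaultdict

# Hypothetical summaries of past RCAs from different sites.
PAST_RCAS = [
    {"site": "Plant A", "title": "Pump bearing fatigue traced to misalignment"},
    {"site": "Plant B", "title": "Heat exchanger tube corrosion"},
    {"site": "Plant C", "title": "Repeat bearing failures on cooling tower fans"},
]


def build_index(records):
    """Map each lowercase title word to the positions of the records that contain it."""
    index = defaultdict(set)
    for position, record in enumerate(records):
        for word in record["title"].lower().split():
            index[word].add(position)
    return index


def search(index, records, query):
    """Return the records whose titles contain every word in the query."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return [records[position] for position in sorted(matches)]


index = build_index(PAST_RCAS)
for record in search(index, PAST_RCAS, "bearing"):
    print(record["site"], "-", record["title"])
```

In this toy data, a site without a materials engineer searching for "bearing" would find Plant A’s misalignment analysis and Plant C’s repeat failures before starting from scratch.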
15:20 RC: Yeah. What do you think most people get wrong about doing a root cause analysis?
15:26 BL: I think they use it as a checklist item. They wanna get the regulatory drivers off their back, they wanna get their bosses off their back. So they use a lot of fancy words and checkboxes, and it looks really good and meets whatever criteria, but nobody ever tracks whether it was effective.
15:45 RC: Yeah, yeah. So how do you know when to run a root cause analysis? When is it the right time to do it?
15:54 BL: Well, the real answer is that any organization is gonna have triggers or thresholds at which it conducts an RCA, and they’re usually based on regulatory drivers: somebody was hurt to a recordable level, we had a certain dollar amount of lost production, I had equipment damage in excess of $100,000, or whatever the case may be. But in my unicorn world, RCA would get a PR makeover, because it’s associated with everything negative. I work in hospitals, and it’s funny: if they know what you do, they know you’re not there with good news. People close their doors when you come down the hallway, because you’re not there to say, “Great job.”
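Trigger thresholds like the ones Bob lists can be written down as explicit rules, which makes it unambiguous when an incident calls for an RCA. This Python sketch assumes three hypothetical criteria: only the recordable-injury and $100,000 equipment-damage examples come from the conversation, while the $50,000 production-loss figure is invented for illustration. Each organization would substitute its own thresholds.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    recordable_injury: bool
    production_loss_usd: float
    equipment_damage_usd: float


# Hypothetical thresholds; each organization sets its own.
PRODUCTION_LOSS_TRIGGER_USD = 50_000    # invented for illustration
EQUIPMENT_DAMAGE_TRIGGER_USD = 100_000  # example figure mentioned in the episode


def rca_triggers(incident: Incident) -> list[str]:
    """Return every threshold this incident exceeds; an empty list means no trigger fired."""
    reasons = []
    if incident.recordable_injury:
        reasons.append("recordable injury")
    if incident.production_loss_usd > PRODUCTION_LOSS_TRIGGER_USD:
        reasons.append("production loss")
    if incident.equipment_damage_usd > EQUIPMENT_DAMAGE_TRIGGER_USD:
        reasons.append("equipment damage")
    return reasons


print(rca_triggers(Incident(False, 0, 250_000)))  # ['equipment damage']
```

Returning every reason, rather than a single yes or no, keeps a record of why the analysis was opened.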
16:58 RC: Yeah. It’s almost like the weapon that you’re talking about.
17:02 BL: But the reality is that root cause analysis, properly applied, could be done proactively. It’s unfortunate that the only reason we apply the thought process is that the trigger has already happened, so the consequence has already occurred. Compare that to the failure modes and effects analysis you mentioned earlier, which is a proactive tool. It’s essentially a risk assessment: the probability of something happening times its severity equals its criticality. If you did a Pareto cut of that type of analysis, you’d find that 20% or less of the failure modes account for 80% or more of the risk. So there’s no reason you can’t do an RCA that starts with an unacceptable risk of X, because the thought process is the same “How could?” questioning. Now you’re just asking, “How could the risk be so high?” So you’d be mitigating risk in that sense and preventing the consequence.
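The FMEA arithmetic Bob describes, criticality as probability times severity followed by a Pareto cut, can be sketched in a few lines of Python. The failure modes and 1-to-10 scores below are hypothetical and were chosen so that the top 20% of modes carry at least 80% of the total risk; they are not data from the episode. `pareto_cut` returns the smallest set of highest-criticality modes whose combined score reaches the chosen share.

```python
# Hypothetical FMEA rows: (failure mode, probability score, severity score), each 1-10.
FAILURE_MODES = [
    ("Pump bearing fatigue", 9, 9),
    ("Mechanical seal leak", 8, 9),
    ("Motor winding short", 2, 4),
    ("Coupling wear", 3, 2),
    ("Impeller erosion", 2, 3),
    ("Valve actuator stiction", 2, 2),
    ("Instrument drift", 4, 1),
    ("Gasket weep", 3, 1),
    ("Loose guard", 2, 1),
    ("Paint degradation", 1, 1),
]


def criticality(probability: float, severity: float) -> float:
    """Criticality as described in the episode: probability times severity."""
    return probability * severity


def pareto_cut(rows, share: float = 0.8) -> list[str]:
    """Return the fewest highest-criticality modes whose scores reach `share` of the total."""
    scored = sorted(((name, criticality(p, s)) for name, p, s in rows),
                    key=lambda item: item[1], reverse=True)
    total = sum(score for _, score in scored)
    selected, running = [], 0.0
    for name, score in scored:
        if running >= share * total:
            break
        selected.append(name)
        running += score
    return selected


top = pareto_cut(FAILURE_MODES)
print(f"{len(top)} of {len(FAILURE_MODES)} modes carry 80%+ of the risk: {top}")
```

Here the two top modes score 81 and 72 out of a total of 187, about 82% of the risk. Some FMEA variants instead compute a risk priority number that also multiplies in a detection rating; this sketch follows the two-factor version described here.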
18:09 RC: And I think that 80/20 rule is actually really, really important when you’re figuring out what to run an RCA on. You mentioned safety, and oftentimes people run an RCA when there’s a safety incident. What’s the difference between running a safety investigation and a root cause analysis? Is there one?
18:40 BL: There is, and there are all sorts of qualifiers about definitions and the words you use here, because there’s traditional safety, and then there’s a whole other progressive movement called Safety Differently, which claims to make safety look different from the traditional approach. From the traditionalist standpoint, they essentially define RCA as consistent with the principles of the 5 Whys, and the 5 Whys is very predominant. It has its place, but I personally don’t consider it a qualified root cause analysis tool. Here’s how I’ll back that up: it gives people the impression that failure happens linearly, and it doesn’t. Think about the difference between asking “Why?” and “How could?” “How could?” opens up a lot more possibilities, whereas “Why?” is very limiting and misleading. It also gives people the impression that there’s one root cause. I’ve been doing this for 30-some years, across hundreds if not thousands of analyses, and I’ve never had one with just one root cause. I don’t even know where the “five” in 5 Whys came from. Why not 3 Whys, or 10 Whys? I have no idea.
20:17 BL: But it’s good marketing, I guess.
20:19 RC: Yeah, it’s really good marketing. Everyone knows it.
20:23 BL: It came out of Toyota and it was meant for individuals on an assembly line to peel the onion a little deeper with things that they faced individually.
20:32 RC: Yeah.
20:33 BL: It was not a tool meant for complex failures.
20:37 RC: Right, right. I understand the appeal of the 5 Whys and its linear progression, because it’s much more succinct: you try to find a root cause. But the question “How could?” is very broad, and I imagine it takes a lot more time to answer. How could… how could?
21:00 BL: Absolutely.
21:05 BL: Do you wanna be quick, or do you wanna be right?
21:06 RC: That’s fair. Totally.
21:08 BL: Say there were actually 10 root causes associated with an incident and you only found one. All you’re gonna do is get really good at doing the same thing over and over again.
21:22 RC: Yeah, yeah. So, how does that relate to a safety investigation then?
21:28 BL: Well, safety has a notorious reputation here. When a safety incident occurs, especially a serious injury or something like that, step one for the organization is to go check all of its paperwork and ask, “Do we have policies and procedures in here that people should have adhered to?”
21:53 RC: Yeah.
21:54 BL: And once they find out they’re covered, step two is to find out who broke their rules and discipline that person. It’s the assignment of blame. The difference with the way I perform RCA, which I can easily do on safety events because the thought process is the same, is that you don’t stop at the blame. You delve deeper into the reasoning and get into the head of that person. That job is how they feed their family.
22:28 BL: They didn’t say, “I’m gonna come to work today and screw up everything, ha-ha-ha.” What made them think that was the right thing to do that day? That’s really what progressive management should be interested in. And they should actually reject an RCA that doesn’t get to that depth.
22:45 RC: Yeah.
22:46 BL: Because it’s not complete.
22:48 RC: Yeah. I love the attitude, Bob. I think it really shines a good light on our industry, taking away this blame game and moving toward, like you keep mentioning, a progressive environment, something that’s supportive of the team and really tries to understand… I’m gonna say it: the “why” behind the actions? [chuckle] Maybe the “How could” behind the actions? [laughter]
23:12 BL: I think that’s where people get uncomfortable. I don’t know whether it’s unique or not, but to us it’s unique that our form of RCA, the way we define it, couples the physics involved with a failure with the social sciences. Engineers are great at the physics side.
23:36 RC: Yeah.
23:37 BL: But you get them down into the soft stuff and they’re lost.
23:41 RC: Yeah.
23:41 BL: And there’s a unique skill among those involved with human performance, and human and organizational performance, for doing the questioning much more effectively than the directness and bluntness of engineers. But conversely, what’s missing in safety is the opposite. If you have a bunch of social-scientist types who are good with the human side, they’re lost in the physics of the failure. They don’t even really wanna deal with that side, because they’re starting with the human and trying to figure out how that person’s day was going and what led up to the bad outcome. But if you ask them to look at a bearing, they’re gonna say, “It’s round.”
24:33 RC: Bob, do you have any examples of a time where you ran a root cause analysis? Could you walk us through an example and an interesting insight that you were able to find coming out of this root cause analysis?
24:52 BL: I don’t know how to do that without getting into something that’s gonna be boring for people, since I can’t show pictures and stuff. I’ll just go through a very easy scenario. Say we had a significant production outage, and I know that a pump failed. That’s kind of like going to the crime scene and putting the crime scene tape around it. Just like at a crime scene, whatever is inside that tape is the facts, and our job is to explain the facts: how did they come to be? So I get into it, and I know that a particular bearing within this pump has failed, and my question becomes, “How can a bearing fail?”
25:42 RC: Yeah.
25:43 BL: Now, you can have a bunch of engineers on your team, but the thought process in reconstruction is that you’re going backward in short increments of time, visually watching this happen. If I ask you, “How can a bearing fail?”, most people are gonna say, “Well, it was a bad bearing. The lubrication people screwed it up. We didn’t install it right.” It’s gonna be all of these things that could have happened.
26:17 RC: Yeah.
26:18 BL: But given the way I’ve posed that question, there are only four ways a mechanical component can fail: erosion, corrosion, fatigue, and overload. So when you have the failed part and have a metallurgist look at it, they’re gonna tell you which one it is by reading the surface. Say they come back with fatigue, which is what it is the majority of the time. Then you ask, “Well, how can I have a fatigued bearing?” It could be mechanical fatigue or thermal fatigue. Well, what did my bearing tell me? Say it’s mechanical fatigue. Okay, I keep following what’s true: how can I have mechanical fatigue?
27:00 RC: Yeah.
27:00 BL: The only ways I can have that are imbalance, resonance, and misalignment. Each of these has to be proven or disproven with evidence from the scene. And say we determine that the person doing the alignment didn’t have the skills to align; they weren’t aligning properly from the beginning. Now, a lot of people would end there and say, “Well, I’m gonna discipline that person for not aligning properly.” But what I think of when I see things like that, especially in my work in hospitals where you have people who are deemed not qualified for their position, is: where is the managerial oversight? Who allowed somebody who wasn’t qualified to be in that position? So I get down to asking, “Well, why would somebody not align properly?” One, they didn’t know how. Two, they were put in the position and allowed to stay there through a lapse in oversight. This typically happens with attrition: somebody leaves, and they just hand off the role. Lubrication is infamous for that. It’s deemed a perfunctory duty without a lot of science to it, so they just give the person the can and the route and say, “Go ahead,” as if it’s that simple.
28:26 BL: The same goes with misalignment. We wanna understand why: they didn’t have the right tools to do it, they weren’t taught how to do it, and they were put in a position where they shouldn’t have been. That gets into what we call the latent root causes, which are the systems-oriented issues that led to that decision and need to be addressed. Now, in that very simple example, if I had disciplined the person for the decision, does the problem go away?
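The bearing walkthrough is a logic tree rather than a single chain: each “How could?” question branches into hypotheses, and only those verified by evidence are followed further. The Python sketch below encodes that example, assuming hypothetical `physical`, `human`, and `latent` labels on each node; the word “latent” comes from the episode, while the other labels, the node wording, and which hypotheses count as verified are illustrative. Collecting the verified latent causes returns three of them, which echoes Bob’s observation that he has never had an analysis with just one root cause.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One hypothesis in a "How could?" logic tree."""
    text: str
    kind: str = "physical"  # hypothetical labels: "physical", "human", or "latent"
    verified: bool = False  # True only when evidence from the scene supports it
    children: list["Node"] = field(default_factory=list)


def verified_causes(node: Node, kind: str) -> list[str]:
    """Follow only verified branches and collect every verified node of the given kind."""
    if not node.verified:
        return []
    found = [node.text] if node.kind == kind else []
    for child in node.children:
        found.extend(verified_causes(child, kind))
    return found


# The bearing example from the episode; which hypotheses are verified is assumed.
tree = Node("Pump bearing failed", verified=True, children=[
    Node("Erosion"), Node("Corrosion"), Node("Overload"),
    Node("Fatigue", verified=True, children=[
        Node("Thermal fatigue"),
        Node("Mechanical fatigue", verified=True, children=[
            Node("Imbalance"), Node("Resonance"),
            Node("Misalignment", verified=True, children=[
                # Decision point: from here the questioning switches from "How could?" to "Why?"
                Node("Alignment not done properly", kind="human", verified=True, children=[
                    Node("Not taught how to align", kind="latent", verified=True),
                    Node("No proper alignment tools", kind="latent", verified=True),
                    Node("Put in the role without oversight", kind="latent", verified=True),
                ]),
            ]),
        ]),
    ]),
])

print(verified_causes(tree, "latent"))
```

Disciplining the person addresses only the human node, and replacing the bearing addresses only the top event; the three latent causes would remain untouched in either case.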
29:00 RC: Nope.
29:00 BL: If I had just replaced the bearing, does the problem go away?
29:05 RC: No.
29:06 BL: Because unless you address those systems issues that are influencing the decision makers, it’s gonna come back. If you discipline that person, it may not happen with him or her again, but the systems aren’t built around one person, so it’ll come up with somebody else, and then you just keep doing the same thing.
29:28 RC: Well, Bob, that was extremely insightful. You didn’t put me to sleep, don’t worry about that. [chuckle] It was extremely helpful to walk through that decision-making process of asking “How could? How could?”, really…
29:45 BL: You ask “How could?” until you get to the decision maker, then you switch to “Why?”
29:49 RC: Yeah, yeah. Well, thank you so much for that Bob, that was again very, very insightful. Curious, what’s something you wish more people knew about the maintenance and reliability industry?
30:02 BL: Well, I’ve had an interesting… I’ve been hanging out in the safety world for the last two years, and learning their perceptions of reliability has been an enlightening experience for me. I can tell you that outside our bubble, the reliability field is viewed as component-based; they think we’re a field of broken stuff. They don’t think we address anything on the human side or the systems side, or that systems thinking is employed in this field. I was even called out on it once. I kept defending reliability and referring to holistic reliability, covering equipment, process, and human, because that’s just the way I was raised. But I found through this exercise that these other people were right, because all they did was direct me to Wikipedia. [chuckle] When you look up reliability, it is a component-based definition. I had to eat that one, but it was an epiphany for me that the outside world just sees us as mechanical-type people.
31:27 RC: Yeah, yeah. I truly believe, from this conversation, that you view reliability as so much more than just a single component. You even brought it home with you. [chuckle]
31:41 BL: Yeah, literally, yeah.
31:43 RC: Exactly.
31:43 BL: It’s not something that you just turn off, it’s a way of thinking, it’s your lifestyle, it’s not your job.
31:50 RC: Alright, I love that, Bob. So I’m curious, where do you go for additional resources to keep learning and getting better in this field?
32:04 BL: A good therapist. [chuckle] I’m just kidding. Like I said, I like to go outside my circles to become uncomfortable. One area I see emerging, or at least one that’s of interest to me personally, is AI related to RCA: utilizing that RCA database of logic to better predict and trend the future. To that point, we have templates that we build into our offerings, our solutions. These are RCA templates like the erosion, corrosion, fatigue, and overload one I described, and we have them for mechanical, electrical, safety, all that kind of stuff. The constructs of those are what will feed an AI algorithm.
33:06 RC: Yeah.
33:07 BL: I can foresee that type of thing growing as the industry progresses.
33:07 RC: Alright.
33:07 BL: And the last one is the neuroscience aspect, which is a fancy word for understanding our human reasoning, just understanding why people do what they do. I think we need more of that.
33:07 RC: Yeah, absolutely. And again, I think it goes back to your view of a holistic field and industry. Bob, can you share with our listeners all the ways, the different ways that they can connect with you and follow you on your journey?
33:07 BL: I don’t know. I’m nowhere near as connected as you young folks, but I’m pretty heavily involved in the LinkedIn forums, and I write a lot on there. And you know you were early in the game with reliability when you’ve got the domain reliability.com. [chuckle]
34:11 RC: Well, that must be a good domain to have. Bob, thank you so much for joining us; I personally learned a ton. Thank you to all of our listeners for tuning in to today’s Masterminds in Maintenance. My name is Ryan Chan, I’m the CEO and founder of UpKeep. You can connect with me on LinkedIn as well; I’m pretty active, pretty fun. Or you can email me directly: my email is [email protected]. Thanks so much, and until next time. Thanks again, Bob.
34:36 BL: Thank you Ryan.