A COMPREHENSIVE GUIDE TO ROOT CAUSE ANALYSIS (RCA)
Most processes carry some risk, and this is true across industries and businesses. Things can fail or go wrong no matter how well you plan, even when you have the most knowledgeable people and state-of-the-art equipment on the job.
And although it’s important for a vessel, piece of equipment or plant to function at maximum capability, problems can and do occur. Minimising risk is always a priority, because mistakes or system deficiencies can lead to reputational damage, lost profit, fractured client relationships and injuries or even fatalities, as some mining deaths in Western Australia (WA) have shown.
In big industrial settings where the slightest error or malfunction can mean damages worth hundreds of thousands or millions of dollars, it’s crucial to find out the root cause of every problem that occurs. This way, you will not only solve the problem but also take the necessary steps to prevent it from recurring.
The process of identifying problems and the factors that cause them is referred to as root cause analysis or RCA. Sometimes you need to dig deeper to find the root cause rather than just identifying a secondary cause.
In this post, we’ll discuss what root cause analysis is in greater depth, its many uses and benefits, how it works, its different types and other intricacies of this essential process.
What Is Root Cause Analysis?
Root cause analysis is an investigation technique used to identify all the contributing factors in an incident and determine the root cause of why a failure occurred.
Whilst identifying and fixing the failure that led to an incident is necessary, isolating the root cause is essential to preventing recurrence. Jumping to conclusions without a detailed root cause analysis can prove very costly in the long term.
AME has a simple but very effective methodology for determining the actual root cause of an incident.
Root Cause Analysis Meaning and Definition
The Institute of Internal Auditors Australia defines root cause analysis as ‘a problem-solving methodology based on the idea that effective management requires more than merely reacting to problems, but finds ways to prevent them’.
RCA’s meaning in business is basically the same, as it refers to the formal, systematic exploration of the possible causes of a problem with the end goal of identifying its root cause. By isolating and analysing the cause of a problem and addressing it adequately, you can prevent the problem (or similar concerns) from happening again.
What Is Root Cause Analysis Used For?
The wide applicability of root cause analysis makes it a highly useful method for revealing weaknesses or inefficiencies in processes and systems.
In the mining sector, the significance of root cause analysis in safety incidents is widely acknowledged. RCA is used to investigate and analyse the following:
- Accidents and incidents
- Plant and equipment failure
- Human errors
- Maintenance problems
Applying root cause analysis is one of the most effective ways of making industrial workplaces safer, healthier, more productive and more efficient.
The Three Levels of the Root Cause of Failure in a System
In mining systems, the root cause of a failure can be found at three levels.
- Physical root cause: Plant and equipment failure caused by physical reasons. This provides the technical explanation as to why the equipment broke down or the parts failed.
- Human root cause: Equipment failure caused by human intervention – which could mean omission or commission leading to the physical root cause. It refers to what someone did or failed to do.
- Latent root cause: Equipment failure caused by organisational-level decisions that trigger a fault event. These are deficiencies in management systems or approaches that enable human errors to continue happening unchecked. Some examples include outdated or inadequate training, safety procedures, maintenance systems or organisational behaviour.
To illustrate the three, let’s say a bearing fails in a piece of plant machinery. Over time, a bearing failure can cause the equipment to break down. When the equipment stops working, the bearing failure is the physical root cause.
However, there is a human root cause behind the bearing failure, which usually involves poor maintenance and monitoring, such as incorrect lubrication or contamination from failed seals. The human root cause here would be improper maintenance, or a total lack of it, leading to the bearing failure.
In this case, one must also ask why the bearings were poorly maintained, or not maintained at all. This is where the latent root cause comes in.
It may be that the company has a substantial number of assets, equipment or machinery that need the attention of an in-house maintenance team. But instead of dedicating engineers or technicians to making sure every asset is audited, has a suitable maintenance strategy assigned and follows a regular maintenance schedule, the company may have developed a culture where equipment is used until it fails completely. Or perhaps machines are only checked and fixed after something has already failed.
If this is the way things are done in a company that uses plant equipment or any type of machinery, accidents and equipment breakdowns could be quite common. The latent root cause is the management’s lackadaisical stance on plant preventive maintenance.
This kind of system (or lack of it, in this case) needs to change if the company wants to prolong the lifespan of its physical assets (i.e., plant equipment) and make its workplace safer for its employees. By investing in maintenance, the company can also avoid equipment failures and the production problems they cause.
Incidents are often due to a combination of factors, some of which may be very difficult to ascertain due to misunderstanding, bias or lack of knowledge from those central to or assessing the issue. Using an unbiased third party and experienced, qualified personnel to perform root cause analysis is highly recommended to assist with this.
As outlined above, it is also important to consider the role organisational procedures have played in a failure and to review your site’s maintenance processes to help prevent further incidents. If managed correctly, this will improve your overall site safety and productivity, preventing further failures and saving you time, money and resources in the long term.
If you do not feel confident in following your root cause analysis procedures, AME can provide professional engineers to perform a third-party unbiased analysis of the failure for you.
Purpose and Goals of Root Cause Analysis
The primary purpose of root cause analysis is to study a problem systematically and reveal its root cause.
Moreover, RCA aims to provide a complete understanding of how to fix, compensate for, or learn from the underlying issues behind the root cause of a problem. In doing so, you can apply the results of the analysis to prevent problems and ensure success.
Root cause analysis is a useful tool for modifying core processes and resolving system issues in a methodical manner to prevent problematic situations from arising.
Benefits of Effective Root Cause Analysis
Aside from finding the root cause of a problem, there are other benefits of effective root cause analysis, including the following:
- You can address root causes directly instead of initiating reactive firefighting. Rather than depending on quick fixes or short-term solutions referred to as ‘firefighting’, you need to work on finding the source of the ‘fire’ to prevent it from happening again. With RCA, this is possible, so you needn’t spend time, resources and effort on putting out fires (symptoms of the problem) temporarily. By applying the results of root cause analysis, you can actually prevent ‘fires’ or problems from cropping up.
- You can make better-informed decisions, faster. Similar problems can recur in an industrial setting or any other business environment. Something you resolved in the past might come back in a different form after technological innovations or system changes have taken place. When it does, your experience with RCA will help you make sound decisions even when you are pressed for time, so you can step in quickly to fix the problem and stop it from causing more damage.
- You can minimise or control costs by finding lasting solutions. Problems that go unchecked or are fixed only temporarily can escalate into bigger, more expensive issues. By using RCA to solve a problem at its root, you can avoid costly repair or replacement expenses. You’ll also avoid the loss of profit and clients that production problems from equipment breakdowns can cause.
- You’ll improve your communication skills. Some problems occur because of communication failure, such as when you are unable to describe exactly what led to the problem. When you dedicate time to root cause analysis, you’ll gain an in-depth look into the problem and the many causes leading to it, as well as the symptoms. The more information and details you acquire, the more adept you’ll be in discussing the problem with your team. This way, you can brainstorm together and come up with viable solutions.
- Your business will remain competitive. Today’s competitive industrial landscape requires miners and manufacturers to use their resources efficiently and effectively. To do this, shop floor activities like problem solving should be done methodically. With RCA at your disposal, you can find and eliminate recurring problems and make better decisions that ultimately benefit your clients and employees. Of course, this will come back to you in the form of profit and business growth.
Principles of RCA
As a systematic problem-solving process, root cause analysis follows certain principles.
- The main goal of RCA is to identify the factors leading to the problem, including its nature, magnitude, location, and timing. Delving into these details helps to determine which behaviours (commission or omission) or conditions need to be changed or eliminated to prevent the same harmful outcomes or problems.
- Consider all possible solutions to a problem, then adopt the simplest, lowest-cost solution that still prevents it from recurring.
- Correcting a problem at its root, and monitoring and evaluating the resulting performance improvement, is far more effective than simply getting rid of its symptoms.
- A successful and effective RCA is always done systematically, identifies causes by presenting factual evidence and provides conclusions based on data.
- A problem may have more than one root cause.
- Causal relationships identified between the root cause(s) and the defined problem during RCA should be clearly defined.
- Root causes that are discovered or revealed vary based on how the problem or event is defined.
- An effective root cause analysis identifies the process or chain of events linking the root cause(s), the contributing or causal factors and the problem or event itself, so that the relationships between them can be fully understood.
- The consistent application of RCA to problems in an organisation can change a reactive culture into a proactive one, so problems are solved before they happen or escalate.
- Organisations with cultures that are resistant to change usually reject root cause analysis.
Root Cause Analysis Process Flow
Although different businesses or organisations may have a more detailed, step-by-step method of applying RCA, the basic root cause analysis process flow is as follows:
- Identification of the problem: This initial step requires you to define the problem, event or issue and its effect on your business, operation or specific system.
- Data gathering: The data gathering step involves collecting information on the duration of the problem, how the problem was identified, the consequences of the problem, and so on. You need to select your information sources carefully to ensure you have high-quality data.
- Analysis of the problem: With the data on hand, you can start analysing the problem and identify possible relationships among different causes or factors to ultimately arrive at the root cause(s).
- Problem resolution: The final step after conducting a thorough analysis is solving the problem – which means finding a solution that will address the root cause and prevent recurrences. At this stage, you’ll also need to talk about the details of solution implementation, such as who will be tasked to conduct or oversee the implementation and the timeline. You also need to be familiar with any possible risks that come with adopting certain solutions.
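The four steps above can be sketched as a simple case record that will not move to the next step until the current one has produced its output (a defined problem, evidence, at least one root cause). It only counts as closed once every corrective action has an owner and a timeline. This is a minimal Python sketch under those assumptions, not a prescribed AME procedure. The class names, fields and example case data are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """The four steps of the basic RCA process flow, in order."""
    IDENTIFICATION = 1
    DATA_GATHERING = 2
    ANALYSIS = 3
    RESOLUTION = 4


@dataclass
class Action:
    """A corrective action with the owner and timeline agreed at the resolution step."""
    description: str
    owner: str
    due: str


@dataclass
class RcaCase:
    problem: str
    stage: Stage = Stage.IDENTIFICATION
    evidence: list[str] = field(default_factory=list)
    root_causes: list[str] = field(default_factory=list)
    actions: list[Action] = field(default_factory=list)

    def advance(self) -> Stage:
        """Move to the next step only once the current step has produced its output."""
        if self.stage == Stage.IDENTIFICATION and not self.problem.strip():
            raise ValueError("define the problem before gathering data")
        if self.stage == Stage.DATA_GATHERING and not self.evidence:
            raise ValueError("gather evidence before analysing the problem")
        if self.stage == Stage.ANALYSIS and not self.root_causes:
            raise ValueError("identify at least one root cause before resolving")
        if self.stage == Stage.RESOLUTION:
            raise ValueError("problem resolution is already the final step")
        self.stage = Stage(self.stage + 1)
        return self.stage

    def is_closed(self) -> bool:
        """Closed once at the final step with actions that all have an owner and due date."""
        return (
            self.stage == Stage.RESOLUTION
            and bool(self.actions)
            and all(a.owner and a.due for a in self.actions)
        )


# Hypothetical example case, loosely based on the bearing failure described earlier.
case = RcaCase("Conveyor drive stopped: bearing failure")
case.advance()  # -> DATA_GATHERING
case.evidence.append("Maintenance records show no lubrication logged")
case.advance()  # -> ANALYSIS
case.root_causes.append("No maintenance strategy assigned to the asset")
case.advance()  # -> RESOLUTION
case.actions.append(Action("Add asset to the preventive maintenance schedule",
                           owner="Maintenance planner", due="Within 2 weeks"))
print(case.stage.name, case.is_closed())  # RESOLUTION True
```

The hard stops in `advance` mirror the point of the process flow: the team should not jump to a solution before the data and analysis exist.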
Types of Root Cause Analysis Techniques
RCA can be carried out using one or more of several root cause analysis techniques, including:
- 5 Whys Analysis
- Causal Factor Tree Analysis
- Failure Mode and Effects Analysis (FMEA)
- Change Analysis
- Barrier Analysis
- Fishbone Diagram or Ishikawa Diagram
- Pareto Analysis
- Fault Tree Analysis
Some of the common root cause analysis techniques are discussed in detail below.
5 Whys Root Cause Analysis
Despite its name, the 5 whys approach to root cause analysis may take anywhere from two to 50 whys, much like a child asking one ‘why’ after another until the root cause is revealed. The method helps prevent assumptions by seeking detailed answers to successive, related questions until the final ‘why’ is answered, revealing the root cause to resolve or fix.
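A minimal Python sketch of how the chain of questions can be recorded is shown below. Each answer becomes the next statement to question, and the chain stops when no further answer is recorded. The function name and the `answers` dictionary are hypothetical. The answers simply restate the bearing failure example from earlier in this guide, which shows a chain passing through the physical (bearing failure), human (poor maintenance) and latent (run-to-failure culture) levels.

```python
def five_whys(problem: str, answers: dict[str, str], max_whys: int = 50) -> list[str]:
    """Follow the chain of 'why?' answers from a problem towards its root cause.

    `answers` maps each statement to the team's answer to "Why did this happen?".
    The chain ends when no further answer is recorded; the last entry is the
    candidate root cause. `max_whys` caps the number of questions asked.
    """
    chain = [problem]
    while len(chain) - 1 < max_whys and chain[-1] in answers:
        cause = answers[chain[-1]]
        if cause in chain:  # an answer that loops back means the reasoning is circular
            raise ValueError(f"circular answer: {cause!r}")
        chain.append(cause)
    return chain


# Hypothetical answers, based on the bearing failure example earlier in this guide.
answers = {
    "The plant equipment stopped working": "A bearing failed",
    "A bearing failed": "Incorrect lubrication and contamination from failed seals",
    "Incorrect lubrication and contamination from failed seals":
        "The bearing was poorly maintained",
    "The bearing was poorly maintained": "No maintenance strategy or schedule was assigned",
    "No maintenance strategy or schedule was assigned":
        "Management tolerated a run-to-failure culture",
}

chain = five_whys("The plant equipment stopped working", answers)
for i, (effect, cause) in enumerate(zip(chain, chain[1:]), start=1):
    print(f"Why {i}? {effect} -> because: {cause}")
print("Candidate root cause:", chain[-1])
```

In a real investigation each answer should be backed by evidence. The cap on the number of whys and the check for circular answers are only guards for the sketch.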
Advantages of 5 Whys RCA
- Simple and easy to implement
- Enables you to go beyond the symptoms and identify the real cause of the problem
- Prevents you from taking premature action
- Helps cultivate a culture of continuous improvement
Disadvantages of 5 Whys RCA
- Reliability may be a concern if different people yield different answers in their search for the cause of the same problem.
- Its applicability is limited to the knowledge and experience of the people implementing it.
- Depending on who is asking the questions, a 5 whys accident investigation may not be thorough enough to reveal the true cause of the problem.
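Pareto Root Cause Analysis
You can use a Pareto chart in root cause analysis to rank the causes of problems or events by their relative frequency. The chart draws on the Pareto principle: a small share of causes (often cited as roughly 20%) tends to account for most of the problems (roughly 80%).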
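The ranking behind a Pareto chart is a frequency count sorted in descending order, plus a running cumulative percentage. The Python sketch below uses a hypothetical stoppage log and an 80% cut-off (a common convention, not a fixed rule) to flag the ‘vital few’ causes worth investigating first. It only ranks the recorded causes. Finding their root causes still needs another RCA tool.

```python
from collections import Counter


def pareto(records: list[str], threshold: float = 0.8):
    """Rank causes by frequency and flag the 'vital few' up to `threshold`.

    Returns (rows, vital_few); each row is (cause, count, cumulative share).
    """
    ranked = Counter(records).most_common()  # (cause, count), most frequent first
    if not ranked:
        return [], []
    total = sum(count for _, count in ranked)
    rows, vital_few, running = [], [], 0
    for cause, count in ranked:
        # A cause belongs to the vital few if the causes ranked above it
        # have not yet reached the cut-off on their own.
        if running / total < threshold:
            vital_few.append(cause)
        running += count
        rows.append((cause, count, running / total))
    return rows, vital_few


# Hypothetical stoppage log: one entry per recorded stoppage, tagged with its cause.
log = (["Seal leak"] * 12 + ["Bearing overheating"] * 9 + ["Belt misalignment"] * 4
       + ["Sensor fault"] * 3 + ["Operator error"] * 2)

rows, vital_few = pareto(log)
for cause, count, share in rows:
    print(f"{cause:<20}{count:>4}{share:>8.1%}")
print("Vital few:", vital_few)
```

Here three of the five causes account for just over 80% of the stoppages, so they would be the first candidates for a deeper RCA.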
Advantages of Pareto Analysis
- Helps to reveal the main causes of defects or problems
- Ranks the identified defects in descending order of frequency or impact
- Aids in determining the cumulative impact of certain defects
- Provides a basis for prioritising and resolving certain defects first
- Improves problem-solving and decision-making skills
Disadvantages of Pareto Analysis
- You need a separate RCA tool to determine the root causes of defects.
- It only focuses on historical data concerning the damage that has taken place.
- Its applicability is limited to specific cases.
- Mistakes in scoring or application can crop up if it is not conducted carefully.
Fishbone Method of Root Cause Analysis
Creating a fishbone diagram is another common RCA method. Also called an ‘Ishikawa diagram’, it helps you visually map cause-and-effect relationships to identify the possible causes of a problem.
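A fishbone diagram is essentially an effect (the head) with possible causes grouped under categories (the bones). The hypothetical Python sketch below records one for the bearing failure example as a dictionary, prints it as an outline and flattens it into a list of candidate causes. The six category names follow a common ‘6M’-style convention and are an assumption. The causes listed are hypotheses to check against evidence, not findings.

```python
# Hypothetical fishbone for the bearing failure example. The "6M"-style
# category names are an assumed convention; teams often choose their own.
effect = "Bearing failure stopped the plant equipment"
fishbone: dict[str, list[str]] = {
    "Machine": ["Contamination from failed seals"],
    "Method": ["Incorrect lubrication", "No regular maintenance schedule"],
    "Manpower": ["No technician dedicated to the asset"],
    "Material": ["Lubricant grade not verified"],
    "Measurement": ["Bearing condition not monitored"],
    "Environment": [],
}


def render(effect: str, bones: dict[str, list[str]]) -> str:
    """Render the diagram as an indented outline: one block per category (bone)."""
    lines = [f"EFFECT: {effect}"]
    for category, causes in bones.items():
        lines.append(f"  {category}")
        lines.extend(f"    - {cause}" for cause in causes)
        if not causes:
            lines.append("    (nothing recorded yet: review this category)")
    return "\n".join(lines)


def candidate_causes(bones: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Flatten the diagram into (category, cause) pairs to verify one by one."""
    return [(category, cause) for category, causes in bones.items() for cause in causes]


print(render(effect, fishbone))
print(len(candidate_causes(fishbone)), "candidate causes to check against evidence")
```

Flagging empty categories prompts the team to consider every bone. Flattening the causes reflects the method’s main weakness: each cause is an opinion until evidence confirms it.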
Advantages of Fishbone Method
- Easy to learn and apply
- Can be easily understood because of the visual depiction of relationships
- Has been in use since the 1960s
Disadvantages of Fishbone Method
- It can lead to irrelevant potential causes.
- It is sometimes based on opinions or subjective information.
- There is no set pattern for building the bone structure of the fish.
- It’s difficult to illustrate complex relationships using a fishbone diagram.
FMEA Root Cause Analysis
Failure Mode and Effects Analysis (FMEA) is a method of identifying where processes, products or designs are most likely to fail, the possible reasons why and the effects of each failure.
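This guide does not describe how FMEA prioritises failure modes. A widely used convention rates each failure mode for severity, occurrence and detection on a 1 to 10 scale and multiplies the three ratings into a risk priority number (RPN). The Python sketch below assumes that convention. The worksheet rows and their ratings are hypothetical, loosely echoing the boom case study later in this guide. Rating scales vary between standards and organisations, and some newer guidance prioritises by other means than the RPN product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    item: str
    mode: str
    effect: str
    severity: int    # 1 = negligible ... 10 = hazardous
    occurrence: int  # 1 = remote ... 10 = almost certain
    detection: int   # 1 = almost certain to be detected ... 10 = undetectable

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1..10 scale.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


# Hypothetical worksheet rows; the ratings are illustrative, not measured values.
worksheet = [
    FailureMode("Boom longitudinal weld", "HAZ cracking", "Boom section failure", 10, 4, 7),
    FailureMode("Boom weld, basket end", "Transverse cracking", "Boom section failure", 9, 5, 4),
    FailureMode("Hydraulic hose", "Leak", "Loss of boom control", 7, 3, 2),
]

# Highest RPN first: these failure modes get corrective actions first.
for fm in sorted(worksheet, key=lambda f: f.rpn, reverse=True):
    print(f"{fm.rpn:>4}  {fm.item}: {fm.mode} -> {fm.effect}")
```

A high detection rating (hard to detect) pushes the internal HAZ cracking to the top here, which is the kind of result that would justify a targeted inspection method.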
Advantages of FMEA Root Cause Analysis
- Highly effective approach for evaluating processes, services or products
- Logical and structured
- Allows for the early identification of system interface problems and single failure points
Disadvantages of FMEA Root Cause Analysis
- The quality of the results depends on or is limited to the knowledge and expertise of the team implementing it.
- It identifies failure modes but does not eliminate them; other tools and actions are needed to resolve the issue.
- It may prove unmanageable depending on the scope and team.
- FMEA needs to be updated regularly to remain relevant and useful in addressing potential failure modes.
Root Cause Analysis Example: Real Life AME RCA
To illustrate how AME implemented root cause analysis in an actual client case study, we’ll use the RCA process flow to provide a basic overview of the incident.
Identification of the problem: Failed Palfinger PK 8501 (loader crane) Boom Arm Section
Data gathering
AME submitted the failed boom arm section to its third-party laboratory for detailed metallurgical analysis, including visual inspection and non-destructive testing, fractography, chemical analysis, hardness testing, metallography and tensile testing.
Analysis of the problem
- There was clear evidence of heat affected zone (HAZ) cracking running parallel to the boom welds. HAZ cracking occurs in the base material, not in the weld material, and three conditions must be present simultaneously for it to occur: a sufficient level of hydrogen, a sensitive (susceptible) microstructure and a sufficiently high residual or applied stress. Residual stress in the welds is unlikely to have been the contributing factor, as the boom was manufactured by the original equipment manufacturer (OEM). However, the high cyclic stresses produced by workers moving around in the basket could provide the applied stress required to initiate HAZ cracking.
- As the vehicle loading crane (VLC) has been modified to function as an elevating work platform (EWP), the boom is now exposed to far greater cyclic loading than it was designed for. The manufacturer designed this boom, along with its rated capacity, on the basis that it would be used as a VLC. Because of the way an EWP operates (the operator constantly manoeuvring the workbox as well as moving around within it), the cyclic loading in an EWP boom is considerably higher than in a VLC boom. Consequently, Palfinger granted only a one-year warranty on the VLC.
- A magnetic particle inspection showed transverse cracking at the basket end of the boom but not at the opposite end. The weld material was also found to be slightly harder than the parent and HAZ material. Transverse cracking is usually associated with weld metal that is higher in strength than the base material. Like HAZ cracking, transverse cracking depends on stress, excessive hydrogen and a sensitive microstructure. Because EWP operation produces high stresses and shear forces close to the basket, transverse cracking is more likely to occur at the basket end of the boom.
- The boom is designed so that the longitudinal weld joint is primarily in compression during operation, because welds under compression provide a stable mechanism (hence the weld joint is at the bottom of the boom). It is important to note, however, that the operator’s movements in the basket also cause the boom to swap from compressive to tensile forces. These cyclic tensile forces may be a critical factor in weakening the welds of the PK 8501 boom arm, as they place the welds in tension along the longitudinal axis and promote fatigue cracking. The first boom extension would be extended on most jobs and would therefore be subjected to most of the cyclic forces.
- Finally, it is worth noting that the geometry of the weld left a sharp corner at the weld root. Kirsch’s solution, which describes the stress concentration around a hole in a plate under load, shows why edges should be kept smooth and rounded so they do not act as stress concentrators (stress raisers). It is possible that this sharp corner acted as a stress concentrator and initiated a crack.
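To put the stress-concentration point in numbers, Kirsch’s classical solution for a small circular hole of radius a in a large elastic plate under remote uniaxial tension σ gives the hoop stress on the hole edge shown in the first line below, with θ measured from the loading direction. The second line is a different but closely related result, Inglis’s solution for an elliptical hole. It is added here as an assumption-free textbook extension because it shows how the concentration grows as a notch sharpens. Neither is a calculation for the PK 8501 weld root, whose geometry differs; they illustrate the trend only.

```latex
% Kirsch: circular hole of radius a, remote uniaxial tension \sigma, \theta from the load axis
\sigma_{\theta\theta}(a,\theta) = \sigma\,(1 - 2\cos 2\theta),
\qquad
\sigma_{\max} = \sigma_{\theta\theta}\!\left(a, \pm\tfrac{\pi}{2}\right) = 3\sigma,
\qquad
K_t = \frac{\sigma_{\max}}{\sigma} = 3

% Inglis: elliptical hole, semi-axis a across the load, b along it, tip radius \rho = b^2/a
K_t = 1 + \frac{2a}{b} = 1 + 2\sqrt{\frac{a}{\rho}} \;\to\; \infty \quad \text{as } \rho \to 0
```

A circle (a = b) recovers Kirsch’s factor of 3, while a sharp corner (ρ approaching zero) drives the theoretical concentration far higher, which is why rounded weld roots are preferred.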
This is now the third elevating work platform boom section found to contain longitudinal cracking. In each case, the cracks followed the same failure mechanism: initiating internally and then propagating outwards to the surface.
Problem resolution
AME recommended the following practices to help prevent future boom failures:
- Include regular visual inspections within the existing maintenance procedures.
- As a first step, have all existing units tested by magnetic particle inspection.
- As all previous failures initiated from inside the tubes, introduce annual ultrasonic testing, a leading non-destructive testing (NDT) method, using angle probes to aid early detection.
- Carry out a thorough cost analysis of repairing versus replacing failed booms.
Root Cause Analysis Consulting and Report
If you are looking to improve systems and processes, you can depend on AME’s team of experienced engineers who are very well-versed in root cause analysis.
Aside from providing you with a detailed engineering report comprising key information, such as findings and recommendations for the prevention of recurring failures, problems or events, we’ll also give you expert guidance in evaluating and improving your current maintenance system.
So, don’t wait to experience a major incident or significant failure.
➜ Get in touch with AME so we can get to the root of the matter and provide long-term solutions to your concerns.