Many companies and organizations have been on the reliability journey for a number of years, while others are just beginning. A solid reliability program has many elements, including establishing a reliability-centered culture, tracking key metrics, running bad actor elimination programs, and establishing equipment reliability plans. One key element, and one that is very important to improving unit reliability metrics, is root cause failure analysis (RCFA).
Historically, various methods have been used to assist in RCFA, including Why Tree Analysis, 5-Whys, and Fishbone Diagrams. One common tool to facilitate and document the RCFA is a Cause Map, or Cause and Effect Diagram. Cause Mapping is a very flexible platform that can be used for any type of failure investigation, regardless of equipment type, industry, or complexity. Cause Mapping starts with a primary effect, which is the fundamental issue being investigated. From this primary effect flow the various possible causes. Data is gathered and brought into the investigation to help the team determine the root cause of the failure. Determining the root cause is the best way to develop a robust solution that prevents the failure from occurring again. In the end, defining solutions is the goal of a failure investigation, and the RCFA is the vehicle to get you there.
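To make the structure of a cause map concrete, here is a minimal sketch in Python. It is an illustration only, not a standard or a vendor tool: the `Cause` class, the `candidate_root_causes` helper, and the pump seal example are all hypothetical. The sketch assumes a cause map can be treated as a tree that starts at the primary effect, where each cause carries the evidence gathered for it and can be ruled out when the data contradicts it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cause:
    """One box on the cause map: an effect and the causes that feed it."""
    description: str
    evidence: List[str] = field(default_factory=list)  # data supporting this cause
    ruled_out: bool = False                            # True when data contradicts it
    causes: List["Cause"] = field(default_factory=list)

    def add_cause(self, child: "Cause") -> "Cause":
        """Attach a possible cause beneath this effect and return it."""
        self.causes.append(child)
        return child


def candidate_root_causes(node: Cause) -> List[Cause]:
    """Return the deepest cause on every branch that has not been ruled out."""
    if node.ruled_out:
        return []
    open_children = [c for c in node.causes if not c.ruled_out]
    if not open_children:
        return [node]  # nothing deeper is still open, so this is a candidate
    found: List[Cause] = []
    for child in open_children:
        found.extend(candidate_root_causes(child))
    return found


# Hypothetical example: the primary effect and two possible causes.
primary = Cause("Pump seal failure")
dry_running = primary.add_cause(
    Cause("Seal faces ran dry", evidence=["Heat checking on seal faces"])
)
dry_running.add_cause(
    Cause("Flush line plugged", evidence=["Debris found in flush orifice"])
)
primary.add_cause(Cause("Shaft misalignment", ruled_out=True))

for cause in candidate_root_causes(primary):
    print(cause.description, "| evidence:", cause.evidence)
```

In this sketch, a branch drops out of the investigation as soon as the data rules it out, which mirrors how the cause map focuses the team on where to look next. A real investigation would also record solutions against each confirmed cause.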
The RCFA Process
Rules about when to engage a full detailed RCFA should be defined by the organization, and to some extent they reflect how far along the reliability journey the organization finds itself. Some organizations will have threshold criteria for when to engage a full RCFA, such as the cost of the incident, safety/health/environment severity, or goals defining a certain percentage of all failures to be captured by a detailed RCFA. When a full detailed RCFA is performed, a team is involved and must be committed to actively participating in the investigation. The team is led by an experienced RCFA facilitator, with team member representation from operations, maintenance, reliability, discipline subject matter experts (SMEs), SHE (Safety, Health, Environmental), and in some cases the original equipment manufacturer (OEM). The exact team composition is unique to each failure being investigated, and one of the first steps in the RCFA process is identifying the appropriate team members.
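As an illustration of threshold criteria, the following Python sketch checks an incident against a cost threshold and a SHE severity threshold. Everything in it is an assumption made for the example: the `Incident` fields, the 0 to 4 severity scale, and the threshold values are hypothetical, and each organization would define its own. The percentage-of-failures goal mentioned above is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    cost_usd: float    # estimated total cost of the incident
    she_severity: int  # hypothetical scale: 0 (none) to 4 (severe)


# Hypothetical thresholds; real values are set by each organization.
COST_THRESHOLD_USD = 100_000
SHE_SEVERITY_THRESHOLD = 3


def requires_full_rcfa(
    incident: Incident,
    cost_threshold: float = COST_THRESHOLD_USD,
    severity_threshold: int = SHE_SEVERITY_THRESHOLD,
) -> bool:
    """Return True when either threshold is met or exceeded."""
    return (
        incident.cost_usd >= cost_threshold
        or incident.she_severity >= severity_threshold
    )


# Usage: a costly failure with low SHE impact still triggers a full RCFA.
print(requires_full_rcfa(Incident(cost_usd=250_000, she_severity=1)))
```

Using "or" rather than "and" means either criterion alone is enough to trigger the full investigation, which is one reasonable reading of threshold criteria; an organization could just as well combine them differently.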
Effective RCFA utilizing a cause mapping approach provides several benefits. First, it removes subjectivity and bias from the investigation, which can be a significant problem because human nature often brings biases. Second, it is an efficient process: the cause mapping method streamlines data gathering (rather than a “shotgun” data gathering approach) and focuses the team on where to explore for additional data and information pertinent to possible root causes. Third, by guiding the team to drill down to the root cause, the RCFA provides effective solutions, which is the ultimate goal of any RCFA. Lastly, the RCFA process provides very good documentation of the entire failure investigation. A well-executed RCFA utilizing the cause mapping approach can fully harness the collective power of the team to define robust solutions, thereby improving overall unit reliability.
When and Where to Utilize RCFA
The most obvious time to employ the RCFA work process is after a critical equipment failure. But as time progresses and the organization starts to recognize the benefits of the RCFA work process in eliminating critical failures, a bigger picture starts to emerge. The RCFA work process can be used to investigate (and eliminate) any type of failure, problem, or deficiency in the organization. For example, business failures can benefit from an RCFA investigation. RCFA can also be used to explore management system issues that are systemic to the entire organization, becoming a more forward-looking and proactive methodology for addressing institutional-level systems that lead to problems and failures across the organization. The point is that what can benefit from an RCFA extends far beyond equipment failures.
It can also be very valuable to start the initial RCFA cause map during an equipment disassembly and rebuild resulting from an unplanned failure on a critical equipment item, with the understanding that you will not have a full team assembled yet, nor will you likely complete the full RCFA investigation before the equipment is returned to service. Having a reliability engineer start the RCFA cause map during the equipment teardown and rebuild, working in conjunction with the maintenance group, has at least two benefits. First, it focuses effort not just on returning the equipment to service, but also on what caused the unplanned event. This can sometimes lead to more obvious corrective actions being implemented before restart, possibly avoiding a second unplanned event after startup. Second, some data needed for the full RCFA, which will not be finalized until after startup, can only be gathered while the equipment is open or the unit is down. Waiting to start the RCFA until after the equipment is restarted could prevent some key data from being brought into the failure investigation. This is where organizations with separate maintenance and reliability groups can capture the benefit of not having the maintenance and reliability roles combined.
There has been much discussion in recent years about building a culture of reliability in an organization. Reliability is not just the responsibility of the reliability group, but involves everyone: operations, maintenance, engineering, safety, purchasing, sales, business teams, etc. One of the interesting benefits for organizations that have fully embraced the RCFA work process is that, over time, the RCFA methodology starts to influence how people approach everyday problems. It becomes how they think about even the smallest failures, problems, or defects. The organization then starts to evolve into a culture that does not accept failure and that provides a mindset to help eliminate failures across the organization. At that point an even larger prize can be captured, going beyond just eliminating the critical equipment failures that are subject to a full RCFA.
Becht Engineering has many years of experience facilitating RCFA and also providing subject matter experts as part of critical failure investigations. Also, we can be engaged to review and critique a client’s existing RCFA process, or provide a cold eyes review of a completed RCFA, helping our clients improve their existing RCFA process and ultimately their reliability.
Becht Engineering has experts in all equipment disciplines as well as trained reliability professionals who can assist your facility in implementation of effective programs. These efforts have significantly improved reliability, safety performance and production in a variety of plants. Please contact us if you have questions about this article or other aspects of plant reliability improvement.