Leadership Lessons In Outage Work Selection, Planning & Scheduling

As production outages become costlier and can cause significant profit loss, it is increasingly important that they be planned and prepared to run for the shortest time possible and with the greatest certainty.  This paper outlines four leadership activities that the author believes are critical to achieving this goal: rigorous work selection, moving as much of the work scope as possible outside of the outage window, optimizing what remains on the critical path, and cost-justified preparation for possible discovery (found) work.

Labor, far more than materials or equipment, will be the major cost during a plant outage, with few exceptions.  The final cost and duration will almost certainly be determined by labor efficiency.  This, combined with the need for a safe workplace, leads to a bias toward minimizing the outage’s working hours.

“The best outage you can have is the one you don’t have to do,” a senior manager once told this author.

Having the “right” scope is critical to achieving high predictability of the outage’s outcomes, whether that means an injury-free workplace, meeting cost and duration objectives, or meeting production targets during the next run.

Individuals responsible for planning outages in continuous or semi-continuous manufacturing industries, in today’s ultra-competitive environment with compressed profit margins, should:

  1. Ensure that only essential work is completed during the outage; low-return or low-benefit work is excluded,
  2. Move as much work as possible outside of the outage window,
  3. Improve the execution plan to reduce the risk of time (and cost) overruns, and
  4. Be prepared for any potential found work.


Let’s look at these four leadership lessons in selecting, planning, and scheduling outage work.

1. Selecting Only the Necessary Work

Each location has its own method for identifying and selecting work to be completed during the outage, with risk-based work selection (RBWS) becoming more popular in recent years.  We won’t go into detail about this because there are several articles on the subject.

Rather, we’ll concentrate on the leadership guidance that is frequently absent or sometimes ambiguous.

Why are we shutting down production now?

I used to tease my coworkers that you should be able to hug the piece of machinery that makes you stop production when you plan to… (Do not attempt this; you may burn yourself!)

While amusing, the point is that we must all agree on why the outage is necessary and, as a result, when it must be carried out.  There could be several reasons for this:

  • Predicted end-of-life of unspared equipment may lead to major loss of production – or worse,
  • Deteriorating production capacity or quality due to internal fouling or other reasons,
  • Regulatory requirement for equipment inspections, and so on.


Another way to answer this question is to ask what the consequences will be if the outage work is not completed at this time.

What are we trying to achieve during and after the outage?

The best way to approach this problem is to figure out what the site wants to accomplish in the next production period, or until the next scheduled outage.

  • Restore or increase production capacity?
  • Introduce new products?
  • Re-establish or improve production yields or quality?
  • Restore or improve production reliability?
  • Improve energy efficiency or operating (process) safety?


There are undoubtedly many more, some of which combine several objectives.

Work execution objectives must also be listed, and these will usually include targets for workplace safety, quality, cost, and duration.

In any case, the key takeaway is that the why, when, and what must all be documented and approved by all stakeholders.

These now become the filters that must be applied to all items on the outage worklist.

2. Get as Much Work Completed Outside of the Outage as Possible

Getting as much field work as possible done outside of the outage window will greatly reduce schedule and duration risks and, as a result, cost overruns.  Labor costs during the outage will often be 30 percent to 50 percent higher than during regular work hours, especially when overheads, temporary facilities, and other costs are factored in.  “Fine-tuning” the scope of an outage can pay for itself.
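
To make the arithmetic concrete, here is a minimal sketch in Python. The 30 to 50 percent premium comes from the paragraph above; the number of hours moved and the hourly rate are hypothetical values chosen only for illustration, and the function name is not from any standard tool.

```python
def outage_labor_savings(hours_moved, base_rate, premium):
    """Labor cost avoided by working `hours_moved` hours at the regular rate
    instead of at the outage rate, taken here as base_rate * (1 + premium)."""
    outage_cost = hours_moved * base_rate * (1 + premium)
    regular_cost = hours_moved * base_rate
    return outage_cost - regular_cost

# Hypothetical inputs: 2,000 labor hours moved out of the outage window,
# at a regular all-in rate of 100 currency units per hour.
low = outage_labor_savings(2000, 100.0, 0.30)   # 30 percent premium
high = outage_labor_savings(2000, 100.0, 0.50)  # 50 percent premium
print(f"Estimated savings: {low:,.0f} to {high:,.0f}")
```

Under these assumed inputs, the labor savings alone fall between 60,000 and 100,000, before counting any benefit from a shorter or more predictable outage.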

Now that the work selection phase is well underway, it’s time to adjust the scope in order to reduce execution risks:

  • Test and retest decisions about repair vs. replacement. While replacing equipment is typically more expensive, it allows for greater cost and duration certainty, as well as virtually eliminating the possibility of additional found work.  In the end, the replacement option may not be that much more expensive, with the extra cost serving as a low insurance premium against additional expenses and schedule extensions.
  • Adopt a preference for simpler repair methods, as long as they are technically sound. Screwed or bolted connections, for example, are typically easier to assemble than those that require welding, soldering, or brazing.  The safety benefits of reducing the amount of hot work done during the outage cannot be overstated.
  • Prefabrication and preassembly have numerous advantages that are hard to overstate. This is the simplest method for moving labor hours outside of the outage window.  Minor changes or additions to work items that make prefabrication possible deserve serious consideration.
  • Challenge the parts of the scope that could be completed before or after the outage. Insulation, painting, and equipment demolition are all examples of this.  Another option is to identify items that are potentially more labor intensive (e.g., asbestos removal) and complete them prior to the outage, then replace them with temporary facilities that are much easier to address during the outage (e.g., temporary insulation wrap).


In all cases, work completed outside of the outage period, whether before or after, must be meticulously planned and tracked.  Another important factor to consider is when to begin the prework.  This needs to be carefully considered, because spilling some of the prework into the outage window clearly defeats the purpose of our efforts.

3. Now is the Time to Optimize the Critical Path Activities

The work selection activities are essentially complete at this point, and we’ve begun adjusting scope items to push as much work outside the outage window as possible.  Now is the time to examine the critical and near-critical path duration activities more closely.

Our goals are to reduce the outage’s duration as much as possible, which will have clear financial benefits for the facility, and to reduce the risk of schedule delays.

An adaptation of the Single-Minute Exchange of Die (SMED) method is one of the more common ways to accomplish this.  Although they were originally developed for industrial manufacturing, SMED’s main principles can be applied once the critical / near-critical path activities are identified, planned, and scheduled.

We will not go into detail about SMED in this paper because there are numerous resources, including training programs, available.  Applied to maintenance outages, the general approach is to:

  1. Move as much work as possible away from the critical / near-critical path, and,
  2. For what’s left, make it parallel, make it predictable, and allow no rework and no delay.


Having one or more participants who are not from the site but are familiar with the type of work being discussed can provide invaluable insight.

A word of caution: as this effort nears completion, the planning team should keep an eye out for any different set(s) of activities that become critical / near-critical path. Before the final critical / near-critical path emerges, this exercise may need to be repeated several times.
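
The caution above can be illustrated with a small longest-path calculation. The following is only a sketch: the activity names, durations, and dependencies are hypothetical, and a real outage schedule would come from the site's scheduling tool rather than from a script like this.

```python
from functools import lru_cache

def critical_path(durations, predecessors):
    """Return (total_days, path) for the longest chain of dependent activities."""
    @lru_cache(maxsize=None)
    def earliest_finish(activity):
        # Longest chain ending at any predecessor, or (0, ()) if there is none.
        preds = predecessors.get(activity, [])
        days, path = max((earliest_finish(p) for p in preds),
                         key=lambda r: r[0], default=(0, ()))
        return days + durations[activity], path + (activity,)
    return max((earliest_finish(a) for a in durations), key=lambda r: r[0])

# Hypothetical activities, durations in days.
durations = {"shutdown": 1, "open_exchanger": 1, "clean_bundle": 4,
             "replace_piping": 3, "reassemble": 1, "startup": 1}
predecessors = {"open_exchanger": ["shutdown"],
                "clean_bundle": ["open_exchanger"],
                "replace_piping": ["shutdown"],
                "reassemble": ["clean_bundle", "replace_piping"],
                "startup": ["reassemble"]}

before = critical_path(durations, predecessors)

# Hypothetical improvement: swap in a pre-cleaned spare bundle (4 days -> 1 day).
after = critical_path({**durations, "clean_bundle": 1}, predecessors)

print(before)
print(after)
```

In this sketch the outage shortens from 8 days to 6, not to 5, because the piping replacement becomes the new critical path once the bundle cleaning is shortened. That is why the review has to be repeated after each improvement.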

4. Get Ready for Discovery Work

With the work selection done and the plans and schedules mostly optimized, including the critical path work, we can now address the curse of all maintenance outages: discovery work (also called found or added work).

This is a time-consuming step that will pay off in the long run.  Given that discovery work can occur on virtually any piece of equipment being worked on, the most practical way to complete this analysis is to go through each piece of equipment in the outage one by one – this author can reassure readers that for certain equipment classes, this can be done quite quickly.

A cross-functional team of operations, maintenance, technical, equipment specialists (e.g., metals inspectors), and reliability experts will conduct a series of workshops as part of the process.  Depending on the equipment class being reviewed, the team makeup will change.  The workshop facilitator is usually the outage’s lead planner, but this isn’t always the case.

The overall strategy is based on a ‘classic’ risk assessment:

  Risk = Probability × Consequence

In our case, the probability is the likelihood that additional work will be required beyond the agreed-upon base scope; the consequence is an extension of the outage’s duration, with special attention paid to scope items that are on or near the critical path.

Using prior repair history, condition monitoring reports, reliability or failure records, and other data, the team will determine if additional work can be expected for each piece of equipment or group of similar ones.  The workshop participants’ own experiences should not be overlooked.  Note that in some cases, this analysis must be extended to the equipment sub-component level, such as the impeller, casing, mechanical seal, shaft, and bearing of a centrifugal pump.

The team must also assess the likelihood of the additional work (high, medium, low), as well as its impact on plant operations (none, partial slowdown, significant slowdown, full shutdown; for how long? Are there any safety or environmental implications?).  Following the initial or detailed condition inspection, a small repair could potentially become a larger one, or even a full replacement; if both scenarios are possible, they must be listed separately.

The next step is to put in place the most effective countermeasures to these duration risks.  In general, additional repairs that have a greater impact on the duration of the outage will necessitate a higher level of preparedness.

Typically, there is no need to follow up on identified low-probability additional work that has little to no impact on the schedule.

Extra repairs that will likely extend the outage’s critical path duration may necessitate significant mitigation, such as completing all required engineering deliverables, ordering additional materials, adding the additional work to the job plan and including it in the overall schedule, and ensuring that the necessary resources will be available to execute the work.

Various levels of engineering, procurement, planning, and resourcing may be justified for those “in between” cases:

  • Only complete the engineering work
  • Only confirm the delivery timing of additional materials and, perhaps, ‘reserve’ the necessary inventory
  • Identify the resources needed to complete the extra work (labor, equipment) and simply confirm their availability and lead time to mobilize, since there may be only very short notice to do so


For this latter situation, the additional repair work will only be triggered after it’s been identified, evaluated and confirmed as necessary against the outage work selection criteria.
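
The graded response described above can be sketched as a simple lookup. This is an illustrative assumption, not the author's documented procedure: the category names, the thresholds, and the equipment items are all hypothetical, and each site would substitute its own criteria.

```python
LIKELIHOODS = ("low", "medium", "high")
SCHEDULE_IMPACTS = ("none", "minor", "critical_path")

def preparedness_level(likelihood, schedule_impact):
    """Map a discovery-work risk to a preparation level (illustrative thresholds)."""
    if likelihood not in LIKELIHOODS:
        raise ValueError(f"unknown likelihood: {likelihood!r}")
    if schedule_impact not in SCHEDULE_IMPACTS:
        raise ValueError(f"unknown schedule impact: {schedule_impact!r}")
    # Low probability and little to no schedule impact: no follow-up needed.
    if likelihood == "low" and schedule_impact in ("none", "minor"):
        return "no follow-up"
    # Likely to extend the critical path: engineer, order, plan, and resource it.
    if likelihood == "high" and schedule_impact == "critical_path":
        return "full preparation"
    # Everything in between: some mix of engineering, material, and resource checks.
    return "partial preparation"

# Hypothetical worklist entries: (item, likelihood, schedule impact).
worklist = [
    ("pump seal", "low", "none"),
    ("exchanger bundle", "high", "critical_path"),
    ("control valve trim", "medium", "minor"),
]
levels = {item: preparedness_level(lik, imp) for item, lik, imp in worklist}
print(levels)
```

A table like this is mainly useful for keeping the workshops consistent from one equipment class to the next; the team's judgment still decides which partial measures fit each case.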


Figure 1 summarizes this process:

Figure 1: Discovery work risk evaluation

Let’s take a look at an example.  During a previous outage at a chemical plant, an inspection revealed that a section of pipe needed to be replaced.  There is no way to perform an accurate in-service inspection of this pipe.  The construction material is an alloy steel that is not stocked locally and routinely has a long delivery time.  According to the approved job plan, the remaining section of pipe will be inspected again during the upcoming outage to see whether it needs to be replaced during a future outage.

Following a detailed scope risk review with the metallurgical specialist and engineering team, it was determined that there was a high probability (>50%) that replacement would be required during the upcoming outage.  The outage could be extended by 2 to 3 days due to the lengthy delivery of the additional material.  Given this risk, the planning team made the following decisions:

  • Continue only with the original scope
  • Complete the engineering for the additional pipe replacement and keep a copy of it on file
  • Order the additional material and place it in the plant’s inventory, where it can be used whenever the extra replacement is needed


For the small incremental cost of additional engineering and inventory carrying costs, the team avoided the risk of a schedule extension should the discovery work materialize.
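
The trade-off in this example can be checked with a rough expected-value calculation. The probability and the 2 to 3 day extension come from the example above; the daily cost of lost production and the preparation cost are hypothetical figures, used only to show the shape of the comparison.

```python
def expected_delay_cost(probability, delay_days, cost_per_day):
    """Expected cost of a schedule extension: probability x consequence."""
    return probability * delay_days * cost_per_day

# From the example: probability above 50 percent, extension of 2 to 3 days.
probability = 0.5            # lower bound of ">50%"
# Hypothetical figures, not from the example:
cost_per_day = 200_000       # lost production per day of extension
preparation_cost = 60_000    # extra engineering plus inventory carrying cost

low = expected_delay_cost(probability, 2, cost_per_day)
high = expected_delay_cost(probability, 3, cost_per_day)
worth_preparing = low > preparation_cost

print(f"Expected delay cost: {low:,.0f} to {high:,.0f}")
print(f"Preparation justified: {worth_preparing}")
```

With these assumed figures, even the low end of the expected delay cost is several times the preparation cost, which is consistent with the team's decision to engineer and stock the material in advance.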

Finally, the method described above will not look for potential discovery work on equipment excluded from the approved scope of the outage.  A separate discovery work risk evaluation may be required if the planning team believes this is possible or has happened in the past.

This could be as simple as asking the participants at the end of the workshops, “Anything else we haven’t discussed?”


These lessons can be thought of as four main pillars for achieving predictable outage outcomes, whether it’s in terms of workplace safety (injury prevention), duration, or cost.

  • Perform a thorough review of all planned work items in relation to the upcoming production run’s objectives
  • Adopt a replacement-over-repair mindset, as well as simpler and safer work execution methods
  • Once the critical path activities have been identified, conduct a thorough scope and execution review and make any necessary adjustments to minimize work on the critical path or implement set-ups or work methods to ensure high predictability / no rework or delay
  • Conduct a thorough analysis of all repair work list items to determine the likelihood of additional repairs or scope expansion, as well as their impact on the duration of the outage. To lessen the impact, implement cost-effective countermeasures


While the steps are listed in this sequence, they are not necessarily done in that order – expect some overlap and recycling as the mitigation plans are incorporated into the outage scope and plans.

Allow plenty of time for this during the planning stage. The reward will be significant.




About The Author

Daniel Evoy has over 30 years of experience in reliability, maintenance and construction of equipment and facilities in the petroleum refining industry based on his long-term career with Imperial Oil Ltd. His experience includes in-depth knowledge of reliability and maintenance best practices. Mr. Evoy holds a Bachelor of Science degree in Mechanical Engineering, a Master of Business Administration, and certification in Management Accounting, all from McGill University in Montreal, Quebec.
