
Post 9z: Collision at Sea-Conclusion Part 3

Introduction

Having read hundreds of incident reports and case histories, I accept that most are not written with learning as the primary motivation. This can be by design or from a lack of skill on the writer’s part. “Postaccident investigations can be as much about managing political reality as they are about finding out ‘what happened’ or about organizational learning.”1 The best a reader can do is identify the content of a report that is meaningful for them. That starts with recognizing the first trap for learning from accidents: what people learn about causality on the job, listening to seniors, attending critiques, and reading and writing incident reports. Most of those lessons are not helpful. Only after you reconstruct events from the point of view of those involved (who usually don’t write the reports!) can you begin to learn and position yourself to avoid the second trap of learning from accidents: hindsight bias. This is where the real learning opportunity begins.

Human Error is Not a Cause

“Human error” is not a cause of accidents. I know this could make some people’s heads explode, but press firmly inward on both temples and keep reading. Declining to label operator actions as human error is a philosophical choice. What are commonly called “errors” are judgments made after the fact by people with power in organizations, judgments frequently contaminated by hindsight and used to impose an artificial order on a very messy reality. The things called errors usually don’t look like errors in the moment. When the CO ordered steering split between the Helm and Lee Helm stations on the bridge of the USS JOHN S MCCAIN, he didn’t perceive it as an error, or he likely wouldn’t have done it. People sometimes give orders they know at the time are a bad idea (not quite the same as an error, but close), but that is rare outside of war.

Refusing to accept human error as a cause gives you a much broader understanding of causality. The bad consequences of some human actions and decisions are context-dependent. This means they can be made over and over until just the right dominoes line up, then POW! Errors can reveal problems in organizational design, deficient procedures, customs that make it hard to do the right thing (like not questioning a CO’s order on the Bridge), gaps in equipment design, and other factors. Human actions that turn out badly are indications of important problems below the surface of the system. Lifting the lid of this box and peering inside is not for the faint of heart!

Human performance and error are intimately linked to the characteristics of tools, operational context, and mission. Opportunities for improving safety come from understanding the connections between them. Humans are inherently not fixable. Firing, jailing, or retraining (what my mentor Bill Rigot called the “blame, shame, and train” cycle) may fulfill a deep psychological need (the desire for revenge runs deep in the human psyche), but it does nothing to identify and reduce the impact of error-producing conditions. Operator freedom of action is constrained by context and culture.

The Root Cause Myth

If you kept your head from exploding while reading the previous section, press harder now: there is no such thing as a “root” cause. Causality, yes, but “the one thing done differently that would have prevented the accident” (or whatever definition you learned on the job) doesn’t exist for complex socio-technical systems. No matter how many times people think, or read in an incident report, that “if person X had taken action Y, then bad outcome Z wouldn’t have occurred,” recognize that this is just an assertion. Because different people regularly make different choices about what to call the “root” cause, what they decide is the cause is their construction, their judgment, not reality.

Because accident trajectories are seldom linear, there is rarely a single cause. This can be profoundly unsettling for those seeking closure after a disaster. Who wants to live in a world where the stories we tell ourselves about causes and predictable consequences aren’t true? This is not a problem if one’s motive is to affix blame, punish the miscreants, and get back to work as quickly as possible.

It is much more helpful to identify problems and construct corrective actions. Corrective actions are always driven by the available resources, the most limiting of which are typically budgets and time. This is another motivation for the blame, shame, and train cycle: it is cheap and fast. This does not mean you don’t fire or punish people for the consequences of their decisions. It does mean that you don’t stop there, because that type of “solution” won’t prevent the problem from happening again. In fact, nothing can, since humans can always make decisions that have disastrous consequences. Don’t let this discourage you. There are always things you can do to improve your odds of avoiding failure.

How Hindsight Operates

After a disaster like a collision at sea that kills sailors sleeping in their bunks, everything looks linear. We know what was “missed,” who wasn’t “trained,” and that the Captain shouldn’t have given an order that “confused the watch team.” The problem is that hindsight channels our thinking about causality in two main ways.

Channel 1 is making complex accident trajectories look simple and linear. Because people know how things turned out after the accident, it is straightforward to identify how each action or event led to another to produce an “inevitable” outcome, an outcome that we must constantly remind ourselves the participants couldn’t see coming. Hindsight enables us to create structure and meaning out of events that looked like normal work for the people involved. People deal with problems and interruptions constantly, but with the aid of hindsight, their actions get channeled into ominous sequences of catastrophe.

Channel 2 is identifying what people should have done to prevent the accident that they couldn’t foresee. Say that three times. This is counterfactual thinking. We know the facts, but they could have been otherwise if we imagine people thinking and acting differently. I called this “if only” thinking in my analysis of the Navy collision report (ref (a)). While counterfactual thinking may be useful when looking for solutions to reduce the risk of future events, “saying what people could have done in order to prevent a particular outcome does not explain why they did what they did.”2 When you use counterfactuals as an explanation, you skip the harder challenge of understanding why people behaved a certain way. This is what you really want to know so you can learn from the event.

One of the things that makes it hard for me to identify opportunities for learning without being condescending is my years of experience as a nuclear operator (nuke). That background helps me spot problems like the lack of posted operating procedures, informality, and not using indications to assess the outcomes of orders. These things are mandatory for nuclear operations, so their absence is quickly noted in inspections and problem reports. This is not a common way of thinking in many non-nuclear organizations like the JOHN S MCCAIN (JSM).

Reconstructing the Situation

Reconstruction of the context is where you start your analysis. Sidney Dekker called this “[reconstructing] the unfolding mindset” of the operators.3 This means collecting as much contextual data as you can to understand the perspective of the actors at the time. Begin with a brief outline of what happened (one way to capture such an outline is sketched after this list):

  • Events

  • Actions

  • Errors (that we recognize now)

  • Decisions

  • Consequences

You may have to repeat this several times to spot less obvious problems.
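
For readers who like to keep this outline in a structured form rather than on a legal pad, here is a minimal sketch, in Python, of one way it could be captured. The names (Kind, Entry, Reconstruction) and the sample entries are my own illustrative choices, not anything taken from the Navy report; treat it as a note-taking aid, not a method.

    from dataclasses import dataclass, field
    from enum import Enum
    from typing import List

    class Kind(Enum):
        EVENT = "event"
        ACTION = "action"
        ERROR = "error (recognized in hindsight)"
        DECISION = "decision"
        CONSEQUENCE = "consequence"

    @dataclass
    class Entry:
        time: str          # clock or relative time, e.g. "t+05"
        kind: Kind
        description: str
        context: str = ""  # what the people involved knew or were attending to

    @dataclass
    class Reconstruction:
        entries: List[Entry] = field(default_factory=list)

        def add(self, time: str, kind: Kind, description: str, context: str = "") -> None:
            self.entries.append(Entry(time, kind, description, context))

        def chronology(self) -> List[Entry]:
            # Sort by time so each pass over the outline stays readable.
            return sorted(self.entries, key=lambda e: e.time)

    if __name__ == "__main__":
        recon = Reconstruction()
        # Illustrative placeholder entries only; see ref (a) for the actual sequence.
        recon.add("t+00", Kind.DECISION,
                  "Order splitting duties between the Helm and Lee Helm stations",
                  context="Perceived workload problem at a single watch station")
        recon.add("t+01", Kind.EVENT,
                  "Watch team interprets indications as a loss of steering",
                  context="Incomplete picture of which station had control")
        for e in recon.chronology():
            print(f"{e.time}  [{e.kind.value}]  {e.description}")

Re-running the chronology after each round of data collection is the programmatic equivalent of re-walking the timeline to spot the less obvious problems.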

The questions I considered in my analysis of the JSM-ALNIC collision were:

  • What were the decisions that didn’t turn out well? What were the factors that influenced the outcomes?

  • What mental biases and logical fallacies might have been involved?

  • What blind spots can we see that the actors did not?

  • What were the actors’ goals and anti-goals (things they didn’t want to happen)? These might not be explicit in the report. What made them challenging?

  • What were the defenses, how did they work, how were gaps exposed?

  • What aspects of the situation had the biggest impact on the operators? Where was their attention? What was their physical state?

  • What was their level of training and experience?

  • What role did equipment play?

  • How did the people involved understand the situation? What did they miss and why?

  • What were the weak signals of danger? (ref (b))

  • What were the latent conditions and active errors?

  • How were risk management tools used?

  • How would an improved questioning attitude and other pillars of HRO have helped?

  • What moral dilemmas were involved?

  • What “undiscussables” were exposed? These are what Argyris (ref (c)) called self-reinforcing, usually defensive, organizational routines that prevent learning and perpetuate the very thing they are trying to prevent.

  • What were the no-win contexts?

Risk Mitigation

As a bonus for those hardy enough to have read this far and because I wasn’t sure where else to put it, here is a list of risk mitigation practices I thought about as I analyzed the JSM-ALNIC collision:

  • modifying the mission or the schedule to reduce the risk

  • dedicated watch team training, which can include walkthroughs

  • additional supervision or supplemental watches

  • assigning a watch team that has more experience

  • a brief, especially for things that could go wrong

  • no last-minute changes

For the next several posts, I will return to the fundamentals of Highly Reliable Organizing (HRO) to explore: organizational blind spots, the role of questioning attitude, principles of HRO beyond the six of Weick and Sutcliffe, and High Reliability in the U.S. Navy. Stay tuned.


References

(a) Chief of Naval Operations. (2017). Memorandum for distribution, Enclosure (2) report on the collision between USS JOHN S MCCAIN (DDG 56) and motor vessel Alnic MC, retrieved from https://www.doncio.navy.mil/FileHandler.ashx?id=12011.

(b) Vaughan, D. (1996). The Challenger launch decision: Risky technology, culture, and deviance at NASA. University of Chicago Press.

(c) Argyris, C. (2003). A life full of learning. Organization Studies, 24(7), 1178-1192.

(d) Dekker, S. W. (2002). Reconstructing human contributions to accidents: The new view on error and performance. Journal of Safety Research, 33(3), 371-385. Retrieved from https://www.sciencedirect.com/science/article/pii/S0022437502000324

(e) Dekker, S. (2017). The field guide to understanding ‘human error’. CRC Press.

(f) Dekker, S. W. (2003). Accidents are normal and human error does not exist: a new look at the creation of occupational safety. International Journal of Occupational Safety and Ergonomics, 9(2), 211-218.

(g) Dekker, S.W.A. (2001). Reconstructing human contributions to accidents: The new view on error and performance. Tech Report 2001-01. Lund University School of Aviation.



Endnotes

1. Dekker, 2003, p.218.

2. Dekker, 2001, p.6.

3. Dekker, 2017.
