Recently, I have felt like I’m in danger of becoming complacent with the bedrock of my chosen field. I’ll admit that in the past, I’ve been fairly vocal about this bedrock’s limitations and mantra-like recitation by aviation safety professionals the world over. But the recent apparent abandonment of this concept by one of the first Australian organisations to go “all-in” on it, gave me cause for reflection.
But it wasn’t a critical review of “Reason” that was on my mind. Instead, I started to think about whether we had embraced it enough to allow us to move on.
For me, being a “cheese-head” has just been part and parcel of being in the aviation safety game. Human factors was mother’s milk during my first year of uni with CRM and organisational accidents the solids of second and third years. From there, I’ve continued along the modern system safety trajectory of culture, SMS and so on. I’ve never known it any other way.
But how has the rest of the world taken to it? The general public, I mean. The great unwashed.
To examine this, I thought I’d look at MSM coverage of aviation accident investigations in Australia. So, I took to Google and searched for pages related to three accidents in the days following the release of the related accident investigation report. I was looking at how the news reported the “causes” of the accident.
The three accidents I chose were:
- Lockhart River – Australia’s worst air disaster in the last 40 years or so and an investigation I knew did follow the accident causation chain right up to the regulator.
- Pel-air – The trigger for all the current controversy and, in opposition to the above, a report that is generally said not to follow the causation chain beyond the frontline operators.
- R44 @ Jaspers Brush – The most recent investigation report to be issued which would have received media coverage and also a relatively small accident in which organisational factors might be hard to identify.
In Lockhart River’s case, I could really only identify three stories in the immediate aftermath of the investigation report’s release. One from the SMH, one from ABC News and one from Lateline (ABC as well). Overall, I thought the reporting was quite good. All three pieces discussed multiple contributory factors and generally shied away from the word “cause” – except for Tony Jones’ intro to the Lateline piece which was actually more concerned with the regulator’s role. However, the headlines for the SMH and ABC News stories were old school all the way – “Pilot error blamed for Lockhart River (plane) crash” – I guess we can blame the sub-editors for these ones.
For Pel-air, Google yielded only one real MSM link, with a couple of others stemming more from the 4Corners story shown a couple of days after the report’s release and many coming from aviation industry outlets. The Australian‘s story was fairly consistent with the characterisation of the ATSB report in that it focussed on the crew’s actions, but it did briefly mention more upstream factors. The other stories were quite critical of the ATSB report for its perceived lack of analysis beyond the Unsafe Acts and Local Workplace Factors levels.
In the final accident, the two MSM stories I found (The Australian & Fairfax Media) put a real emphasis on the “what happened” aspects and ventured little beyond that. In this case the operation was private and, I’m sure, some would argue that “Reason” doesn’t apply. The fleet grounding and safety recommendation for a change to the fuel tank were mentioned.
At the very least, I’m sure more could have been said about the human factors aspects of the event. And more could definitely be said about the aircraft and crashworthiness standards. As I said a couple of weeks ago, no man is an island. Not even private pilots, and not even aircraft designers and manufacturers. Imagine the impact this investigation could have had if its analysis had revealed deficiencies in how aircraft structural certification processes consider post-crash fire.
I don’t know whether it would have, but if certification isn’t the issue, why then do we need to change the R44’s tanks? This is not a high-level systemic fix. What is to stop another aircraft type from having the same problem in the future?
Okay, I’ll admit that we can’t go on a mass analysis expedition with every accident investigation and we have to select those investigations that have the potential to yield the greatest safety benefit. As an idealist, I do have trouble with the finiteness of the real world even though I do have to deal with this in my day job.
But where does this leave “Reason”?
Well, the Lockhart River articles were (save for the subs) quite heartening and even the Pel-air coverage (overall) tried to encapsulate the complexity of an aviation system breakdown. I guess the disappointment is more the ATSB report which, as we saw with Lockhart River, can drive the media coverage.
I’d like the organisational accident, or system failure, approach to remain fundamental to all aviation safety analysis and investigation. In fact, it should be extended to try to capture the non-linear, close-coupled nature of complex socio-technical systems like aviation. The “Post-Reason” world may be upon us but I don’t think it is based on the approach offered by the ATSB’s chief:
If we want to go to Professor Reason’s model of investigation—though we think we have come a long way since Professor Reason’s initial work in the 1990s—there is error and there is violation. While the focus of our investigations is on error and understanding error—how to prevent it, how to detect it and how to deal with its consequences—there was also in this case an element of what, in Professor Reason’s model, would be viewed as violation; and that is principally the responsibility of the regulator.
Reason’s error types fit well within his larger model and, to be honest, I don’t see the ATSB-error/CASA-violation distinction. There’s a whole other blog post on that one!
I still quite like the distinction I made in my other post on this subject, where I considered the very high-level intent of the operator. If the intent of the operator was to get people safely from A to B on their aircraft, it falls within both CASA’s and the ATSB’s courts. While the operator may intend to break a specific regulation or company policy, their overall intent remains getting their pax on the ground. If the intent of the operator is anything else, then it actually becomes a criminal matter for the police and OTS.
To analogise where I think we are at, “Reason” was a mud hut for safety professionals. It gave us a basic structure and shelter to develop the field a little more. Unfortunately, we’ve outgrown the hut and we need something more. Maybe a hard floor, doors, windows, who knows? There are quite a few options on the table to take us to the next level; it’s only a matter of time before someone puts it together in a package as neat as the “Reason Model” was. It’s an exciting time to be a safety professional.
I’ve been out in the “real” world for the past six months or so and in that time, my thinking on risk management has changed a little bit. So here it comes, a confession…
I have been using a PIG recently, and I feel its use has probably helped with effective management of overall risk.
How can that be? Don’t you despise PIGs with every fibre of your being? Well, yes. I still do but let me provide two little points which might put this confession in context.
Firstly, the company I work for doesn’t rely solely on the product of impact and probability to assess risk. They also score risk on maximum foreseeable loss. That scale puts most aviation activities into the highest risk bucket straight away. No complex probability calculations required.
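To make the idea concrete, here’s a minimal sketch of how such a two-track score might work. The function name, the 1–5 scales and the rounding rule are all my own assumptions, not my company’s actual process:

```python
# Sketch of a two-track risk score: probability x impact, plus a
# maximum-foreseeable-loss (MFL) override. All scales are illustrative.

def risk_score(probability: int, impact: int, mfl: int) -> int:
    """Return a 1-5 risk rating.

    probability, impact and mfl are each rated 1 (lowest) to 5 (highest).
    The MFL rating acts as a floor: a catastrophic worst case keeps the
    risk in the top bucket regardless of the assessed probability.
    """
    pig_rating = round((probability * impact) / 5)  # collapse 1-25 down to ~1-5
    return max(min(pig_rating, 5), mfl)

# An aviation activity: modest probability and impact, but the MFL is
# catastrophic, so the activity lands in the highest bucket anyway.
print(risk_score(probability=2, impact=3, mfl=5))  # -> 5
```

The point of the MFL floor is that no amount of probability arithmetic can talk a catastrophic worst case out of the top bucket.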
The second point is that no business is solely interested in safety. Now, I know that is extremely obvious and on some level I knew that but I don’t think I appreciated it that much when I was Mr “All-About-Safety”. That’s not the way it is anymore, I have other things to think about and my superiors expect me to provide a picture of the overall operation at my airport.
So, now that the business knows that aviation is one of its highest risks, what now? If it is “red” all the time, how do you manage that?
Okay, now we are back on track. How does one assess the complex safety environment that sits inside the business’s aviation risk? Well, I’ve been exploring the how-to-do-it bit on here for a while and I’m getting closer to tying it up, but lately I’ve been thinking more about how this fits into the bigger picture.
The best I can come up with is to propose that this type of risk analysis be categorised as intra-risk analysis.
I have been trying to avoid segregating safety risk analysis from general risk analysis but in order to progress the concepts I’ve been working on within my real work, I feel the need to put the whole grand unifying theory of risk to one side.
PIGs and the like have a strong foothold in existing risk management frameworks and pragmatically, it makes sense to create a space in which these concepts can develop.
That’s what I’m going to run with for now. I’ll have my generally PIG-based risk register for the entire operation and, within it, an intra-risk register for aviation safety using a framework based on the concepts of criticality, exposure and control.
Using this approach, I hope to develop a way of informing senior managers what the picture of risk is within that large “red” box labelled aviation and how they can be assured that the risk is both acceptable and as low as reasonably practicable.
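As a sketch, the nesting might look something like this. The structure, the scenarios and every field name are hypothetical, purely to illustrate the idea of a register within a register:

```python
# Sketch of an "intra-risk" register nested inside a corporate register.
# All names, scenarios and ratings here are my own illustration.

corporate_register = {
    "aviation": {
        "pig_rating": "red",  # top bucket on the corporate PIG scale
        "intra_register": [
            # each entry assessed on exposure / criticality / control
            {"scenario": "runway excursion",
             "exposure": "high", "criticality": "severe",
             "control_effectiveness": "good"},
            {"scenario": "wildlife strike",
             "exposure": "medium", "criticality": "moderate",
             "control_effectiveness": "fair"},
        ],
    },
    "finance": {"pig_rating": "amber", "intra_register": []},
}

# Senior managers see the red box; the intra-register explains what is
# inside it and how well each scenario is controlled.
for entry in corporate_register["aviation"]["intra_register"]:
    print(entry["scenario"], "->", entry["control_effectiveness"])
```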
I’ll let you know how it goes.
I’ve been a bit out of the loop over the past couple of months as I try to get a handle on my new job and the (almost overwhelming) responsibility that goes along with it. But I can’t ignore the action over at the Federal Senate’s Rural and Regional Affairs and Transport References Committee’s inquiry into Aviation Accident Investigations.
Before I comment, some disclaimers – I’m not going to comment on the particulars being discussed at the Senate hearings. While I worked with many of those involved, I never worked on anything associated with the accident event (before or after), but if I were to comment, it might look as though I have inside information, am bearing a grudge or am just being a stirrer. I don’t, I’m not, and maybe just a little.
I do, however, want to comment on the philosophy surrounding some of the issues at hand.
The particulars of the situation on which I would like to comment are, basically, that an accident occurred and the resulting investigation focussed on the operating crew. In the 15th February hearing, two comments by Senator Fawcett struck me as warranting further examination. They were:
One thing the committee wants to put on the table upfront is we accept the contention by CASA that there were errors made on behalf of the pilot in command of the flight. There seems to have been some concern raised that this inquiry is all about exonerating an individual and shifting blame elsewhere. That is not the case. We accept the fact that in the view of some it was even a violation as opposed to error. (p. 1)
With the concept of a systems approach, whereby not only the operator and the piloting command but also the regulator are key parts of the safety system… (p. 3)
For all the other problems we seem to be having in this scenario, we still seem to be stuck on the basics.
Part of a Complex System
Senator Fawcett’s second quote there, and numerous others throughout the course of the hearing, show that he is quite familiar with the concept of a safety system. But he, and I think a large part of the industry, can’t escape the concept of personal responsibility associated with criminal law.
The language of “exonerate” and “shift blame” suggests strongly that the old approach to investigations and safety improvement is still alive. We seem to have slid back into the days of pointing the finger at the front-line operator, stamping the label of “cause” upon them, punting them into touch, dusting our hands and declaring the world a safer place.
Okay, I’ll admit that this could be a harsh analysis of what is possibly a “throw-away” line but the language could indicate a deep-seated belief in the very concepts we are supposed to have left behind. I’m also not singling out Senator Fawcett. I think we all fight these traditional ideas, conditioned within us since an early age. How many of us still use the word “cause” despite its often misleading level of direct influence and independence?
Exonerate, Exshmonerate; Blame, Shame.
It’s a hard thing to let go of but, I think, we have to let go of the criminal view of personal responsibility when we are dealing with accidents in complex socio-technical systems, such as aviation. I’m just going to come out and say it:
No one, who participates in the aviation system, should ever go to jail, be fined or sanctioned as a criminal. Ever. Regardless of the error, violation, failing, mistake, slip, lapse, omission, commission, faux-pas, foul-up, whatever.
If we accept that aviation is indeed a system – a complex set of individuals, machines, procedures, tools, organisations – all working to achieve the objective of moving stuff from A to B – then no single part of that system can be singled out as having “failed”.
As a system there are, or should be, feedback loops. Sub-systems for checking and re-checking. There should be self-correction. If one part has failed, more parts have failed; in fact, the whole system has failed.
If you are going to blame one, you need to blame all. Jail one, jail all. Fine one, fine all.
Whoa Warden, Don’t Open that Door Yet
I am definitely not advocating some criminal reform agenda that would see society’s jails shut down and personal responsibility disappear. I am arguing for a clear distinction between how we view undesirable events within the aviation endeavour and in society at large. I don’t think it is appropriate to look at the aviation industry as a sub-set of society and apply the same thinking.
The big differences between aviation and society are choice and intent. Pilots, ATC’ers, LAMEs, AROs and many others choose to be part of the aviation system with the intent of achieving the industry’s objective of moving stuff from here to there safely.
Society on the other hand is, really, all encompassing. By definition, we don’t really have a choice to join. You could run off into the woods, build a log cabin and live as a hermit but you’d still be a part of society in the broadest sense and still, more importantly, be subject to various laws governing human relationships.
What to do with a broken part?
A while back the industry tried “no-blame” and it didn’t work. I think it was because the concept suggested there would be no ramifications, no consequences for behaviour which contributed to undesirable outcomes.
And this, of course, is untenable. If the system experiences an undesirable state or outcome, it should be able to correct its performance.
The response was to abandon “no-blame” as going too far but I think the problem was that the concept of blame actually ceases to have any meaning within a safety system approach. Much like one cannot meaningfully discuss events “before” the big bang, because time began at the big bang.
So What’s the Lesson?
The tiny lesson I’m trying to get at here is that we need to try harder to fully integrate the system approach into our thinking. It’s not so much that we can’t identify frontline operators as contributors to accidents but that there will (not might) be more to the story. Someone else, actually numerous people, will have contributed, in every case.
And in taking this approach, in identifying as many contributory factors as possible, the actions we take with respect to those people, tools, equipment, etc. will be and be perceived as appropriate. It will support actions like suspending a licence, grounding a fleet or withdrawing a certificate.
Without it, homing in on a frontline operator and booting them out of the system will never look justified, regardless of how necessary it is.
PS – Criminal Offences Against Aviation
There should still be criminal offences relating to aviation. For example, morons who shine lasers at aircraft should be tried as criminals because they have not chosen to be part of the aviation system, nor do they intend to support its objective. The same goes for those who wish to use civil aviation as a weapon.
I can’t lie to you. I have been turning myself inside out trying to get a handle on risk evaluation in the aviation safety sphere for close to five years now and I still don’t feel any closer to an answer.
And I say “an” answer and not “the” answer. Since you are always assessing risk in terms of your objectives, there can and will be multiple approaches to assessing the risk of the same scenario depending on whether you are considering your safety, financial or legal objectives.
The Perpetual Problem?
The “problem” with aviation safety risk evaluation popped its head up again for me in a recent discussion. Without going into too much detail I was discussing the impact of an aerodrome defect with a non-aviation colleague.
We both identified safety as the key impact area and then our company process required us to assess the impact according to a scale (not quite a matrix). We couldn’t escape the top box, the highest level category, because as soon as the safety of an aircraft is called into question, you can’t escape the possibility of complete disaster.
When pondering this problem, I keep coming back to the idea that aviation, from a safety perspective, is inherently perilous. You can’t commit aviation without being “all in”. As such, the risk-level question tends to end up as a probability continuum from negligible impact to catastrophe.
Alright, let’s stop there. I’m pretty sure I’ve discussed this stuff before. So, let’s take it as read that I am, essentially, only interested in the probability of the worst case.
That simplifies things, doesn’t it? Unfortunately, my recent readings of Dekker and Taleb have primed me for skepticism when complex systems appear simple. In the last BT post I wrote, I did highlight that a bow-tie diagram is only ever a model of reality. I think it would be inappropriate to evaluate it using an approach more complex than the model itself.
How to Murder an Analogy
When you want to see something in the dark, it is best not to look directly at it. Due to the biology of the eye, low-light receptors (the rods) are more prevalent in the area of the retina outside the fovea, the point of sharpest focus. Therefore, you will better see an object in the dark if you aren’t looking directly at it!
I’m proposing something similar. If you want to evaluate the risk of the bow-tie scenario, don’t look at the top event – look around the top event.
Around the top event, I consider there to be three primary things – threats, consequences and controls (including defeating factors and secondary line controls).
Therefore, I propose we assess a BT based on:
- our exposure to the threats;
- the criticality of the consequences; and
- the effectiveness of the controls.
Exposure is a common word in the risk management game and I really like it. Even so, I think it is underused. What I like about it is the implicit idea that risk exists everywhere, at all times, but that the context in which we are operating may vary.
If you take my boring predictable runway excursion BT example, those threats really do exist at all airports. All aircraft have the potential to carry out an unstable approach, all runways have the potential to be contaminated but not all contexts have the same exposure to these threats.
Why not use probability or likelihood?
Well, probability tends to convey an air of accuracy and mathematical legitimacy which is rarely justified. Likelihood less so, but it is often tied to the occurrence of a discrete event. Linguistically, for me at least, exposure is better attuned to both discrete events and persistent conditions.
So, step one is to assess one’s exposure to the identified threats.
On the other side of the top event, let’s look at the criticality of the consequences. In an earlier post, I had used the term influence to encompass the concepts of pathways and proximity of events to the final condition (absolute destruction). I’ve had a rethink and today, I’m going with criticality.
Think of the relationship between each consequence and the potential final outcome. Are there many ways this situation can go pear-shaped? Or is this consequence a LOL-cat’s whisker away from disaster itself?
Step two is to assess the criticality of the outcomes.
Once you’ve plugged the holes with your controls, identified new holes, plugged them up again and so on, you will need to sit back and critically assess the effectiveness of those controls.
Without a BT diagram, this could get very hard but the diagrammatic approach can help and some software makes things even easier. Once you have your measure of effectiveness, I think you’ve got all you need to make an assessment of risk, all without actually assessing the top event.
Step three, assess the effectiveness of controls.
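The three steps could be sketched as follows. To be clear, the question of how to combine the three measures is deliberately left open here, so the ordinal scales and the naive combination rule below are placeholders of my own, purely to illustrate assessing around the top event rather than at it:

```python
# A naive illustration of the three-step assessment around a bow-tie
# top event. The scales and the combination rule are my own placeholders.

SCALE = {"low": 1, "medium": 2, "high": 3}

def assess_bowtie(threat_exposure, consequence_criticality,
                  control_effectiveness):
    """Combine the three assessments into a single 'concern' number.

    Step 1: exposure to threats (take the worst).
    Step 2: criticality of consequences (take the worst).
    Step 3: effectiveness of controls (the weakest control reduces
            concern the least, so take the minimum).
    """
    exposure = max(SCALE[t] for t in threat_exposure.values())
    criticality = max(SCALE[c] for c in consequence_criticality.values())
    weakest_control = min(SCALE[c] for c in control_effectiveness.values())
    return exposure + criticality - weakest_control  # higher = more concern

# Using the runway-excursion scenario from the earlier BT example.
concern = assess_bowtie(
    threat_exposure={"unstable approach": "high",
                     "contaminated runway": "low"},
    consequence_criticality={"veer-off": "medium",
                             "overrun into water": "high"},
    control_effectiveness={"go-around policy": "high", "RESA": "medium"},
)
print(concern)  # -> 4  (3 + 3 - 2)
```

Note that the top event itself never appears in the calculation; the assessment is built entirely from what surrounds it.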
How to actually assess exposure, criticality and effectiveness and how to put them together are questions I have not yet answered. But the brain matter is continually churning and as soon as I know (or think I know), I’ll post it here.
1. I’m sorry. I’ve been reading a few obtuse academic texts lately and perhaps the language is rubbing off on me.
When I joined the aviation safety regulator I was introduced to the concept of systems-based auditing (SBA). Before this I had been carrying out aerodrome inspections and I thought becoming an Aerodrome Inspector for the government was going to be more of the same. How wrong I was! Even after four years, my concept of systems-based auditing is still evolving.
I’m coming to discover, and it seems everything I read attests, that most things in life tend to be more complex than we initially think – SBA is no different.
For those not familiar with the concept, let’s look at the features and benefits of this approach.
SBA is often compared to its predecessor, which I will call, product-based auditing. This approach involved comparing examples of finished products to the standards laid out. The image often conjured up is of an auditor with checklist in hand and ticking off the compliant aspects of the product.
The problem with this is, of course, that it can only ever make an assessment of the selection of products observed. The auditor can, perhaps, infer that future products will also meet the standard involved, but they haven’t really assessed that to a point where a true judgement can be made.
And that is what SBA sets out to do. By looking at the system that produces the product, the auditor is making an assessment of the operator’s ability to consistently achieve the required output standard.
This approach comes to the fore when systems are brittle but the environment hasn’t yet put pressure on the system’s weaknesses. The products have all met the requirements so far but you can see that problems will arise if just one small thing falls out of place – say a key person leaves the company. It also works really well for systems that are rarely put into action – such as an aerodrome emergency response plan at a small aerodrome.
When I’ve discussed systems with colleagues before, we have sometimes descended into a semantic quagmire. Depending on one’s field, education and experience, what constitutes a system differs in the mind. Some definitions can be so restrictive that discussion is pointless; others are so loose that discussion is impossible.
Let’s aim for the middle, at something useful.
A system is some form of endeavour that seeks to convert some input(s) into some predetermined output(s) in a consistent and predictable manner.
Now that can be a similar definition for a process, task, element, activity, etc. but I am going to stick with system to cover any single or collection of human or mechanical changes of state which result in an input changing into an output.
You may note that I haven’t defined the scale of the “endeavour”. It could be making toast or it could be shipping 10,000 new Furbies from their factory in Taiwan to Walmart stores on the west coast of America. The scale of the system is simply that which is to be and can be audited.
Yes, you could define an airline system with a whole bunch of inputs, a single box of action and then one output, the safe transport of people and cargo from A to B. But it wouldn’t really be possible to audit that in one go and make a judgement as to the ability of the system to consistently produce the desired output.
We need, therefore, to breakdown the overall system into smaller chunks to make auditing manageable. And we will need to consider the interrelationships between these chunks. So, how can we do this?
In Need of a Model
There are plenty of system models out there but I’ve grown fond of the SADT or IDEF0 approach. I’ll say fond because we’ve only really flirted so far; maybe a quick dance but definitely not a slow dance or any real alone time.
I stumbled on to this model initially through some reading on the Structured Analysis & Design Technique (SADT). I can’t say I remember much about the actual technique. It was the graphical representation of a system that caught my eye. So in the beginning, it was all about looks.
I did some more reading into the graphical approach and I’ve made a few tweaks to suit the socio-technical systems of which an aviation organisation is likely to consist.
Now, let me introduce you…
I’ll go through the components of the model first and then give you an example to help explain.
- Inputs – These are the things which are converted into the outputs. They are fundamentally changed by the process and become a constituent part of the output.
- Resources – These are things used within the system to transform the inputs into the outputs. They are not changed by the process but while they are being used in one system they are not available for use by another system or instance of the same system.
- Controls – These things guide the system in its process. They too are not changed by the process but they can be used by multiple systems or instances.
- Outputs – These are the primary products of the system. They are the system’s raison d’être and, in a larger sense, should meet the objective of the designer of the system.
- By-Products – These are also outputs of the system, but they are not necessarily composed of the inputs, nor are they the primary objective of the system. This feature is something I’ve only just thought of and doesn’t appear in the SADT approach. It is a significant tweak I’ve included to help with the amalgamation of systems into a larger system – more on this later.
Let’s get to a simple example: Baking a cake.
Here’s the breakdown:
- Inputs – The ingredients – flour, sugar, eggs, etc. The ingredients are fundamentally changed by the system – they become the cake.
- Resources – The bowls, spoons, cups, cake tin, oven, the chef, etc. These things are used within the system but do not become part of the cake. However, while they are part of the system, they can’t be used to bake another cake or cook something else.
- Controls – The recipe, procedures, etc. Typically, controls are data or information. They can be used over and over again by many people at once.
- Outputs – The cake. Hopefully, it meets our objective and we weren’t trying to bake a potato.
- By-Products – This one is a bit harder in this simple example but let’s say that this system is part of a professional kitchen. The chef will probably be trying to improve their skills and would be taking notes on the performance of the system. These notes are an output of the system but are not made up of the inputs. They are, however, an important part of the larger kitchen system.
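The cake example can be written down directly in the five-part vocabulary. This is just an illustrative data structure of my own, not any formal IDEF0 tooling:

```python
# The cake-baking system expressed in the tweaked SADT/IDEF0 vocabulary.
# A small illustrative data structure, not a formal modelling tool.

from dataclasses import dataclass, field

@dataclass
class System:
    name: str
    inputs: list        # consumed and transformed into the outputs
    resources: list     # used but not consumed; exclusive while in use
    controls: list      # guide the process; shareable across systems
    outputs: list       # the primary products, the raison d'etre
    by_products: list = field(default_factory=list)  # secondary outputs

bake_cake = System(
    name="bake a cake",
    inputs=["flour", "sugar", "eggs"],
    resources=["bowl", "oven", "chef"],
    controls=["recipe"],
    outputs=["cake"],
    by_products=["chef's performance notes"],
)

print(bake_cake.outputs)  # -> ['cake']
```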
Putting It All Together
Two big selling points of the SADT approach are the ability to link systems together and to fold them up or drill down into super- or subsystems, as desired.
In my academic reading at the moment, I’m coming across quite a bit of push back against reductionism and modelling. It is thought that the complexity of the world is being masked by oversimplification and that people are failing to consider the bigger picture.
The SADT approach is still reductionist in a sense but when you start tying systems together, you can start to identify interrelationships and dependencies between them. Modelling will always be a simplification, it’s about striking a balance.
I haven’t yet dived into mapping out a large super-system, like an aerodrome, but it is on my to-do list. What I have done so far has highlighted the model’s ability to capture some of the complexity in managing a large socio-technical system. I found that it didn’t take long to see how outputs from one system become inputs, controls or resources for another.
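That linking step is easy to sketch: treat each system as a record of its arrows and look for outputs that reappear as another system’s inputs, controls or resources. The systems and the matching rule here are my own toy example:

```python
# Sketch of how one system's outputs feed other systems, which is where
# interrelationships and dependencies start to show. Systems are plain
# dicts; the matching rule is a deliberate simplification.

systems = {
    "bake cake": {"outputs": ["cake", "performance notes"]},
    "serve dessert": {"inputs": ["cake"], "controls": [], "resources": []},
    "improve kitchen": {"inputs": [], "controls": ["performance notes"],
                        "resources": []},
}

def find_links(systems):
    """Yield (producer, item, consumer, role) dependencies."""
    for producer, spec in systems.items():
        for item in spec.get("outputs", []):
            for consumer, cspec in systems.items():
                for role in ("inputs", "controls", "resources"):
                    if item in cspec.get(role, []):
                        yield (producer, item, consumer, role)

for link in find_links(systems):
    print(link)
# ('bake cake', 'cake', 'serve dessert', 'inputs')
# ('bake cake', 'performance notes', 'improve kitchen', 'controls')
```

Even in this tiny example, the by-product (the notes) turns out to be a control for another system, which is exactly the kind of dependency a product-based audit would never see.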
What’s in the Box?
Let’s call time there. This post is already getting too long and I’m still, very much, in the middle of processing these ideas into something useful. As the sub heading notes, I haven’t really looked inside the box. We’ve got arrows going in and coming out but what actually happens in the box is still a mystery.
I also haven’t really identified a way of assessing the various components of the model and therefore, you can’t do much with it, yet.
I’m in jeopardy of becoming a tease on this blog. I’m always signing off with a promise of more to come. Please remember that this blog is more of a journey than a destination and I thank you for coming along for the ride.
More to come…
The week before last, I finished a 4-year stint with the aviation safety regulator. Even though I’m heading back to industry, I’m not going to stop writing this blog.
I believe that the role of the national regulator is the next safety frontier (not the last) and I like the idea of exploring new territory. As the industry continues to explore concepts like safety management, systems-based this, risk-based that and outcome-based whatchamacallit as well as safety culture, we are all going to come to the realisation that safety can be greatly affected (more than we ever imagined) by the approach and actions taken by a national regulator.
Why do I think this?
Well, it goes back to that graph that I referenced in my first post here. I hinted that I had a few gripes with it, so let’s get to them first and then I’ll talk about the pearls which also reside within. Here it is again, to save you from scrolling to the bottom:
Alright, my first gripe is that I haven’t seen any reference to actual data that shows these rate changes. The best long term trend I can find is this one by Boeing which shows an overall downward trend but nothing along the three “phases” shown in the stylised version often presented. I did, however, find this post which added a few little humps to their graph but I’m not sure I agree with that “hump” on the far right.
My second gripe is that while examples of this graph tend to consistently show three distinct phases, the labels often change. I’ve seen aircraft, technology, human factors, CRM, organisational culture, systems and a few others all slotted into their appropriate home in one of the phases. This inconsistency annoys me but maybe I’m being too picky1.
So let’s turn this one around.
I agree with what I think is the underlying message – that safety improvements have been progressively found within a widening scope.
That brings me back to my gripes, though. The above graph suggests that safety improvements can no longer be found using the methods employed during the previous phases. It leads you to conclude that technological improvements in aircraft technology or even basic human factors won’t have an impact on safety thanks to the “levelling out” of that phase’s accident rate. Why then do we work on systems like TAWS, GPS augmentation, FRMS, EFBs, etc.? Obviously work within those other phases continues and will yield safety improvements from time to time.
I’ve been pondering what would be a better measure and I keep coming back to Return on Investment (ROI). Unfortunately, I think using a measure as standardised as ROI would suggest a level of data integrity which doesn’t exist – not in my analysis anyway. But the concept I want to get at is that at any given time aviation safety professionals are exerting effort in one or more given areas and that effort is seeing a variety of results.
Generally, over time, a new area of effort will grow in its return as the “low hanging fruit” are picked, it will reach a peak as these become scarce and then it will tend to peter out. That is not to say that new discoveries can’t cause spikes in that area’s return in the time that follows its peak. I don’t think it will ever return to the peak, though, because as that area’s return waned, effort would have shifted to the next “orchard”.
I’ve tried to capture this idea in a graph of my own. I’ve plotted the Return on Effort (ROE) of several “phases” of safety improvement endeavour as an index of the peak return against time. It is, of course, completely theoretical and not based on any data – just my interpretation of aviation safety history. The graph shows the trend described above for a couple of phases I tend to identify as the major changes in safety improvement focus since aviation began (you may have others and I’d love to hear about them).
Now, back to that underlying message I agreed with above. I see each successive phase as considering a progressively wider scope. In the beginning we looked at the aircraft and then we started to consider the pilot and the aircraft. After this we started to look at the crew, first the co-pilot, then cabin crew and more recently even ground crew. The current battlefield tends to be dealing with the company as a whole.
Therefore, the key relationship for me is this widening of scope over time. To show this relationship I plotted the scope of the highest ROE over time.
And then I extrapolated the line to suggest that the regulator will be the next frontier of safety improvement effort. A couple of months ago, I blogged about the different levels of culture. Current efforts have primarily looked at organisational culture but as I pointed out these cultures are influenced by other levels of culture including the professional and national levels. As these interrelationships become better understood, I think we will find the influence of the regulator to be stronger than previously thought.
Now obviously, the regulator is currently doing stuff to influence safety – as did pilots and crews and companies. Each phase has not been about the introduction of these things but about examining how they influence safety.
The regulator phase has already kicked off. The State Safety Programme (SSP) concept is designed to modernise a regulator’s approach to safety. While it is analogous to an operator’s SMS, there is a level of complexity the regulator needs to address which is above that of an operator. How to deal with this complexity will be one of the major hurdles for the near future.
As sad as this may sound, I find this stuff fascinating and I’ll be exploring it in my PhD and private research. I’ll blog about it here and I’ll start up my airport blogging again over at The Runway Centreline soon too. As this post suggests, I am still committed to the role of the regulator and dare I say to my former colleagues…
1 – An accusation often levelled at me by my wife!
As a follow-on to my first post on the Bow-Tie risk assessment method, I thought I’d concentrate on controls (or barriers or whatever else you would like to call them). This is, after all, where all the action happens. Risk controls are how we spend most of our time – they are the practical aspect of managing risk.
Our typical bow-tie model consists of one or more threats leading to a single top event which results in one or more consequences. The idea is to insert your controls into these connections in such a way as to reduce the level of risk associated with the scenario. Controls may also be subject to defeating factors which affect their ability to reduce risk. Here’s my overview picture from a couple of weeks ago:
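If it helps to see that structure laid out concretely, here’s a minimal sketch of a bow-tie in Python. The class names and the runway-excursion content are my own invention for illustration – they’re not from any bow-tie tool or standard:

```python
from dataclasses import dataclass, field

@dataclass
class Control:
    name: str
    # defeating factors that can degrade this control (secondary lines)
    defeating_factors: list["DefeatingFactor"] = field(default_factory=list)

@dataclass
class DefeatingFactor:
    name: str
    # controls placed on the secondary line to address this factor
    controls: list[Control] = field(default_factory=list)

@dataclass
class BowTie:
    top_event: str
    # each threat line and each consequence line carries its own controls
    threats: dict[str, list[Control]] = field(default_factory=dict)
    consequences: dict[str, list[Control]] = field(default_factory=dict)

# Invented example content: a runway-excursion scenario
bt = BowTie(top_event="Runway excursion")
braking = Control("Aircraft braking system")
braking.defeating_factors.append(
    DefeatingFactor("Wet runway", controls=[Control("Grooved runway surface")])
)
bt.threats["Unstable approach"] = [braking]
bt.consequences["Aircraft damage"] = [Control("Runway end safety area")]
```

Nothing fancy – the point is just that controls hang off the threat and consequence lines, while defeating factors (and their own controls) hang off the controls themselves.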
You can categorise controls in a multitude of ways. Risk professionals would be familiar with the standard hierarchy of controls and other ways of breaking them up. Now, I’m not sure if you’re getting to know me yet but, as you may have guessed, I’ve got a slightly different approach.
The first concept I’d like to introduce is that bow-ties are made up of primary lines and secondary lines. The primary lines are those that link threats to the top event to consequences while the secondary lines are those connecting defeating factors to controls – see my new diagram below. The reason for the distinction is that I believe there are fundamental differences between the controls required on the primary line and those used on secondary lines.
I only noticed this phenomenon the other day when I was putting together a bow-tie on mid-air collision within a very specific context. I had a good piece of technical analysis in front of me but I wanted to create a picture of the risk to assist in evaluation. This analysis contained a list, in no particular order, of existing and potential controls and as I slotted them into the diagram, I noticed that certain types of controls went on the primary lines and other types ended up on the secondary lines.
Now, I’ve been racking my brain on how to describe these differences and I’m still not fully there but here goes.
Within my approach to creating a bow-tie, the primary line consists of events closely related in time. Maybe not a short time but at least a progression from threat through top event to consequence. Therefore, controls along the primary line must also exist along that same temporal line – not necessarily within it though, as we shall see in a moment. This means that controls here must be things that interact with the events that occur along the line. I noticed that front-line operator actions, equipment and facilities tended to fall along this line.
Secondary lines, on the other hand, may not relate to events which occur at the time or may be situations or conditions which lie dormant until the right set of circumstances arises. I noted last time the similarity between defeating factors and latent conditions. As such, controls on these lines must address these latent conditions and should have been implemented prior to the events of the primary line taking place. In the bow-tie I was working on, controls on these lines tended to be things like education and promotion related to the primary line controls.
Not all defeating factors, however, are latent conditions. I can think of a few that are events or situations related in time to the primary line. As an example, I tend to think that low visibility is a common defeating factor to many aerodrome related controls – visual markers etc. – and this is definitely something which needs to exist at the time of the top event to have an impact.
However, I have begun to distinguish these two types of controls as action controls on the primary lines and capability controls on the secondary lines1. That is not to say that capability controls don’t involve action. Of course they do, but their objective is to ensure the capability of the action control to achieve what it aims to achieve. I’m not exactly sure how to operationalise this concept – I would like to turn it into some form of advice or guidance on what type of controls go where or how to word controls on each line. That level of understanding still eludes me.
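One trivial way to start operationalising the distinction is simply to tag each control with the line it sits on. A quick sketch, with invented example controls loosely in the spirit of that mid-air collision bow-tie:

```python
from enum import Enum

class LineType(Enum):
    PRIMARY = "primary"      # threat -> top event -> consequence
    SECONDARY = "secondary"  # defeating factor -> control

def control_kind(line: LineType) -> str:
    # action controls interact with the events along the primary line;
    # capability controls exist to keep those action controls working
    return "action" if line is LineType.PRIMARY else "capability"

# Invented examples for illustration only
examples = {
    "See-and-avoid lookout": LineType.PRIMARY,
    "Lookout education and promotion": LineType.SECONDARY,
}
for name, line in examples.items():
    print(f"{name}: {control_kind(line)} control")
```

It doesn’t answer the harder question of how to word controls on each line, but it does force you to decide, control by control, which line each one belongs to.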
Squares or Triangles?2
Regardless of this shortfall, I have also been categorising controls according to where they act on their line. I think it is very important to consider this as part of categorising your controls because a bow-tie diagram has the potential to overly constrain your thinking.
As I mentioned previously, a bow-tie is or should be limited in its focus. I may have created that constraint but I stand by it, for now. When it comes to controls, however, you may need to identify things that impact on the situation to the left of the threats, to the right of the consequences and outside of the defeating factors. Before we get into all that, let’s have another picture.
In this diagram, I’ve identified four types of controls categorised according to where they impact on the links between the bow-tie’s components. Let’s go through them, one by one.
Prevention – These controls act outside the bow-tie diagram as they attempt to prevent the existence of the threat or defeating factor. An example of such a control might be maintenance on an aircraft braking system designed to prevent the system from failing.
Intervention – These controls intervene after the threat or defeating factor has occurred or manifested and seek to stop that situation from becoming a top event or impacting the capability of a control. A sufficiently wide runway would be a good example in the case of runway excursions – this control can’t prevent threats from occurring but it may stop a runway excursion from occurring if it is wide enough to contain the aircraft’s lateral deviation during landing or take-off.
Mitigation – These controls don’t stop the top event from occurring but they seek to mitigate the consequence. Continuing on from the last example, a sufficient runway strip would be such a control as it only comes into play once the runway excursion has occurred.
Recovery – These controls also act outside the bow-tie diagram. This time they impact the scenario after the consequence has occurred. Any form of response – emergency response, for example – is a recovery control.
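The four categories really boil down to where a control sits relative to the diagram’s links, so they lend themselves to a simple lookup. A sketch, reusing the runway-excursion examples from above (the position labels are my own shorthand, not standard bow-tie terms):

```python
# Map a control's position on the diagram to its category.
POSITION_TO_CATEGORY = {
    "before_threat": "Prevention",             # acts left of the diagram
    "threat_to_top_event": "Intervention",     # on a primary line, left side
    "top_event_to_consequence": "Mitigation",  # on a primary line, right side
    "after_consequence": "Recovery",           # acts right of the diagram
}

# Example controls drawn from the discussion above
examples = {
    "Brake system maintenance": "before_threat",
    "Sufficiently wide runway": "threat_to_top_event",
    "Runway strip": "top_event_to_consequence",
    "Emergency response": "after_consequence",
}

for control, position in examples.items():
    print(f"{control}: {POSITION_TO_CATEGORY[position]}")
```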
You can cut the control-pie other ways. In fact, you have to if you want to conduct analysis of the risk picture or turn it into a consolidated action plan. The more complicated the picture, the more important the structure as this helps to break it up into manageable chunks. For example, you might want to think about what types of activities are involved in your controls. Which ones involve training and which ones involve inspections of facilities?
Where to from Here?
I think I’ll be doing a lot more bow-ties in the very near future. So, this group of posts is going to develop, grow, change, morph, what have you. Next on my list might be evaluation methods, maybe. I’ll be back with more as soon as I figure it out. Actually, I might be back before then.
1. I really have trouble naming things. As proof I offer my high-school band name – “Hot Pink Combi” – I know, right?
2. I ask this question of my kids nearly every weekend morning when I go to cut their toast. For me, it has come to mean the same thing as slicing up a pie.