In this week’s bulletin, Charlie discusses the Kelly Report into the Heathrow substation fire and how the airport responded to the incident.
As business continuity practitioners, we should never stop learning and taking lessons wherever we can find them, whether from internal enquiries and wash-ups or from reports on major incidents, be they local, national or specific to your industry. So, when I saw that the Kelly report on the Heathrow power outage was out, I thought I should have a look, especially as I have spent the last three weeks working with power distribution and generation companies.
The report is very comprehensive and gives a very good description of the incident management set-up at Heathrow, how the airport responded and the detail behind some of the decisions it made. It is a rather glowing report. Its author is a Non-Executive Director at Heathrow, so the airport is slightly marking its own homework, and the report didn’t look at the response from the point of view of Heathrow’s customers, either the passengers affected or the airlines. I suspect that perspective would have made the report a little more downbeat and critical. It is not often that organisations expose themselves and show the full breadth of their crisis response, so there is important learning to be had from this report.
I also think the report is important in that it gives us an overview of what an organisation can do to prepare for incidents and how it can respond. According to the report, the Gold, Silver and Bronze teams responded remarkably well, so it gives us a benchmark of what is possible if you put enough time, effort and money into your preparation.
An incident involving the closure of Heathrow airport will always be international news and therefore a negative story. Although they managed the incident well, miracles couldn’t be performed and the recovery was always going to take time, which magnified the incident. Good incident management goes some way to mitigating this and prevents the spawning of further negative stories, but the airport’s reputation will always be dented.
What lessons can we take away?
- The airport had a comprehensive incident management structure which was deployed during the incident. This was based on the JESIP principles of Gold, Silver and Bronze Command. They also had a duty manager on call so that the incident could be responded to quickly. Not all organisations need duty managers, but having a manager on call is a good idea. Often, it is the question of paying people to be on call that leads organisations to settle for a best-efforts system.
- The airport also used a version of the JESIP Joint Decision Model to support decision making. I am a great believer in decision models as they help focus decision making and lead to better decisions. Using a standard model used by the emergency services is always a good idea, because in any subsequent enquiry you can show that an established, widely used model guided your response. We at PlanB Consulting use a slightly modified version of the JESIP model, but it is still based on the same base model.
- “There were a large number of pre-prepared contingency plans which were used in the response and sped up the return to full recovery.” This shouldn’t be a surprise to us as it is our job to write them, but I think the importance of technical plans for bringing back complex machinery, such as the power to terminals, should not be underestimated. When you are bringing back an airport terminal’s power and every minute counts, you don’t want to have to start writing the plan. In the same way, organisations should have disaster recovery plans for recovering their systems, and these should be comprehensively documented. There should also be a plan in place for how the recovery of systems will be coordinated and prioritised.
- “It was known within Heathrow at a technical level that the structure of the airport’s own internal High Voltage electricity network meant that loss of power from one intake would result in a suspension of operations for a significant period (for at least eight hours).” How often do we see in organisations that the technical people know how long machinery or IT will take to recover if it goes down, but this knowledge doesn’t make it up to management? As a result, initial timescales for recovery are unrealistic and RTOs may not be achievable.
- I like the objectives (strategic intent) set for the incident. Objectives are not often shared outside organisations during incidents, so here are Heathrow’s. I like them as they are clear and some of them are SMART:
a. Ensuring people, colleagues and staff are safe and secure;
b. Ensuring that the environment and assets are safe, including integrity of the UK Border;
c. Minimising disruption to passengers;
d. Being in operation by opening time at 04:30 on 22 March; and
e. Resuming operations earlier than that on 21 March if possible.
- It seems that F24 was used extensively to call out members of the various incident teams. Notification systems can save valuable time in an incident, are reasonably priced, and give you a feedback loop of who has been informed and has replied and who hasn’t.
- Heathrow divide their response into two phases: the initial Operational Response, which covers the first 90 minutes, and then the Response and Recovery, which follows. As Heathrow is a very operational organisation involving lots of moving parts, I like the time differentiation of the two phases.
- The CEO is not on the Gold Team roster as they “liaise with the Board, Department for Transport and other stakeholders”. If a team relies heavily on its CEO to lead the Crisis Team, it may be thrown when the CEO has to leave to carry out media interviews or speak to regulators, the parent organisation or the board. Consider, like Heathrow, not automatically making the CEO the first call to run the team.
- “Heathrow makes template A3 pads available in APOC with the Decision Model printed on them, for employees to use during incidents. Multiple Heathrow employees who were involved in the incident response referred to using these pads or the Decision Model.” I like this idea and will recycle (steal!) it.
- All the decisions the Gold team took were deemed sensible in the report and, even with hindsight, were identified by the report writer as appropriate. I am impressed by this. A good incident management structure helps teams come to better decisions.
- The report states that a better relationship with suppliers at a senior level, before the incident, would have helped the response. I think this is a really good point. Many organisations have outsourced part of their processes, but have they got the senior managers’ out-of-hours phone numbers, discussed a joint response and exercised together? Not many, I suspect.
Finally, there was a story which I picked up and commented on in a previous bulletin, Heathrow Power Outage: Unseen Lessons, that when the CEO was informed of the incident at midnight, he said he wanted to be fresh in the morning, so he put the Chief Operating Officer, Mr Javier Echave, in charge and went back to sleep. This story was picked up by the papers. According to the Kelly report, he had put his phone on silent and never heard the call. This seems slightly odd, and even if it was the case, would you not send someone round to his house to knock on the door and get him out of bed? Perhaps the lasting story will always be that he stayed in bed, and it will be retold by practitioners for years as an example of how a CEO should not react when informed of an incident.
In conclusion, read these reports and take away learning for your organisation. The one nugget you learn and implement may be a lifesaver for your organisation.