Building operational resilience

2025 has proven to be a volatile year so far, not just in the financial services industry but across many other sectors. Events have shone a spotlight on resilience – the ability to prevent, adapt and respond to, recover and learn from operational disruption. Here are our thoughts on what this might mean for our industry.


We have seen events impact banks, airports and whole peninsulas, which raises the question: are we resilient by design?


A growing focus on banking outages

Back in January 2025, Barclays Bank UK experienced a significant technology outage that lasted for nearly three days, affecting their digital services. 

The outage occurred at a critical time, coinciding with payday for many workers and the deadline for filing Self-Assessment tax returns with HM Revenue and Customs (HMRC). As a result, customers were unable to see up-to-date account balances, and recent payments made or received were not showing in their accounts.

A further outage on 28th February 2025, covered by ORX News, impacted Barclays as well as other UK financial services firms, including Lloyds Banking Group (incl. Halifax and Bank of Scotland), Nationwide Building Society and First Direct. No root cause was publicly reported.

In today’s digital world, the bar is higher than ever before. Rightly, customers expect their banking services to work 24/7. So, too, do the regulators. 

In the UK, resilience has been a focus not just for financial services, but for the government too. In February 2025, the Treasury Select Committee asked nine banks for details of IT incidents that prevented customers from using their services. The responses revealed that between January 2023 and February 2025, the banks experienced at least 158 IT failures and service disruptions impacting millions of customers in the UK. The incidents caused a total of 803 hours of outages, the equivalent of 33 days, and the average outage time per incident was over five hours. The institutions also detailed the GBP 6.2 million in financial compensation they had paid to impacted customers as of early March 2025. Root causes for the incidents commonly included issues with third party suppliers, changes in systems, and internal software malfunctions.
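As a quick sanity check on the headline figures, the arithmetic can be reproduced in a few lines. This is only an illustrative sketch using the numbers quoted above; the variable names are our own, and because the incident count is reported as "at least 158", the average is an approximation rather than an exact figure.

```python
# Figures as quoted in the article (Treasury Select Committee responses).
incidents = 158            # reported as "at least 158", so treat as a lower bound
total_outage_hours = 803   # total hours of outages across all incidents

# Convert the total outage time into days.
outage_days = total_outage_hours / 24

# Average outage duration per incident, in hours.
avg_hours_per_incident = total_outage_hours / incidents

print(f"Total outage: about {outage_days:.1f} days")
print(f"Average per incident: about {avg_hours_per_incident:.2f} hours")
```

The results are consistent with the "33 days" and "over five hours" quoted in the responses.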

Operational resilience: a broader perspective

On 21st March 2025, Heathrow Airport, Europe’s largest airport, closed suddenly, creating widespread disruption with ripple effects across the global aviation network. A fire at a nearby substation highlighted gaps in operational resilience across the aviation and critical national infrastructure ecosystem. This has resulted in the Civil Aviation Authority (CAA) reviewing the rules on resilience.

On 28th April, the Iberian Peninsula experienced major disruption due to a massive power cut across Spain and parts of Portugal. It affected critical infrastructure and stranded tens of millions of people. Transport systems ground to a halt, ATMs and card machines were unavailable, communication networks faltered, and water was unavailable for many of those reliant on electricity-powered pumps. It took almost 23 hours for Spain’s electrical grid to be declared back up and running as normal. At the time of writing, the cause is still unknown, with many theories being shared online. Overreliance on renewable energy has been one such theory, and a cyberattack has not yet been ruled out. Many questions will need to be answered, but what is certain is that operational resilience needs to be at the heart of our efforts to mitigate the impact of such events on society and our industry.

Organisations need to prepare for a broad spectrum of severe but plausible scenarios, including power outages, third party failures, cyberattacks, IT failures and more.

Given the ongoing complex geopolitical landscape and the volatility we have experienced in just the first four months of 2025, there is a growing need for a broader perspective on, and preparedness for, operational resilience. Our digital world is made up of an interconnected ecosystem, and any disruption to it causes fast ripple effects across interconnected risks. Ensuring systems and processes are resilient by design will help mitigate further high impact events.

Following the CrowdStrike incident in 2024, ORX shared member views on the key lessons learnt. These are still relevant, even with different scenarios.

Members stressed the importance of ensuring they have a grasp of how service provider failures could impact the resilience of their critical/important business services. To achieve this, firms must continue to build an understanding of the potential impact a vulnerability can have across the end-to-end delivery of their critical/important business services.

Scenario testing and simulation exercises may provide an effective way of achieving this, but members flagged scenario testing as an ongoing challenge, with the following questions being asked:  

  • Are we testing sufficiently severe and plausible scenarios to enable us to plan forward?
  • Are scenario storylines specific and detailed enough?
  • Are we moving towards end-to-end testing to deliver more comprehensive response plans?
  • Is testing considering what is important/critical from a resilience point of view?  
  • Are third parties being involved in scenario testing? If not, how can they be?

Regulatory landscape

Across the globe, regulators are focusing on the ability of financial services firms to manage incidents gracefully, ensuring critical services can be recovered quickly to minimise the impact on customers and the wider industry.

We are seeing an increasing trend of regulators recognising that good risk management results in good resilience. This is particularly of note in Canada and Australia. New regulations such as DORA (EU), CPS 230 (Australia) and OSFI’s Guideline E-21 in Canada are sharpening firms’ focus on the broader control environment across critical services and third parties, and on understanding end-to-end process management. Further industry focus includes:

A drive towards planning and designing more complex scenarios including:

  • Broadening the scope away from focusing primarily on cyber incidents and looking at impacts more broadly
  • Considering the difference in recoverability depending on whether an outage is caused by a malicious actor or if it is a benign event 

Conducting additional testing, simulation or premortem exercises to test similar scenarios while incorporating learnings from incidents.  

  • This may provide an opportunity to bring together relevant stakeholders (e.g. crisis management and technology teams) to discuss response plans 

Focus on addressing concentration risk and digital monocultures 

Revisiting controls and remediation plans to ensure there is a clear understanding of:

  • Who has the mandate to make important decisions when something needs to be done urgently
  • How to interact with third parties that are responsible for managing impacted technology 

Focus on internal communication and collaboration to drive individual and collective accountability, as well as ensuring customers are at the centre of incident responses. 

How can ORX help?

In 2024, ORX launched the Reference Process and Service Library. It is an essential reference guide to typical processes and services within financial organisations, covering both banks and insurers. The library provides both risk management and process management benefits.

In addition, ORX has recently merged the Risk Management Working Group and the Operational Resilience Working Group in recognition of maturing practice. ORX members can find out about our new Risk and Resilience Working Group and how to join here. ORX Membership enables you to join focused discussions across both of these areas.

In 2025, ORX is also focusing on Third Party Ecosystem Risk. As part of this initiative, we are undertaking the following activities:

  • Form a small member focus group to meet and help steer the direction of activity – throughout H1 2025
  • Build a community of second line specialists focusing on this topic – throughout H1 2025
  • Run a survey on third party risk management – April/May 2025
  • Share practices at roundtable meetings – throughout Q2 2025
  • Publish the Third Party Ecosystem Risk Paper – July 2025
  • Explore opportunities for further work with the community – H2 2025

For more information on how to participate in the initiative, see the Third Party Ecosystem Risk 2025 page.

The events discussed in this article are highlighted in reports from ORX News. ORX News subscribers enjoy comprehensive access to our global database of operational risk loss events.

