Roxane, could you please introduce us to operational resilience management?
Dashboards and KPIs that convey concrete messages and calls for action are often what drives the success of operational resilience initiatives.
Operational resilience brings together and harmonises multiple disciplines that were previously managed in silos: business continuity, IT and disaster recovery, incident and crisis management (IT, business and cyber), cyber defence, third party management, and operational risk management.
In order to coordinate and orchestrate these disciplines effectively to establish an accurate picture of the overall resilience, companies need to analyse their data in relation to these topics. This requires a complete mapping of critical services (Important Business Services), their dependencies (business processes, applications, suppliers, teams, buildings, etc.) and testing.
To make this possible, there is a real need for tools and automation. This is also why we are seeing more end-to-end solutions for operational resilience management emerging in the market, from specialist vendors such as Fusion Risk Management and Castellan to non-specialist ones such as ServiceNow.
What are the challenges in the field?
Depending on the company’s maturity, each stage of the process may pose challenges or difficulties.
Challenge 1: Data Model
The operational resilience data model must be built around Important Business Services and their respective dependencies. Preferably, an organisation would reuse existing inventories (e.g. CMDB, supplier inventories, BIAs, HR systems, etc.) and run workshops to draw on the knowledge of its business representatives, IT experts, suppliers, etc. The challenge stems from the need to rationalise all the elements into a format that enables data analysis. This means that even if one starts with Excel, it is important to first define precise rules (common referencing system, one piece of information per line, etc.).
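As an illustration of these rules, here is a minimal sketch of what "one piece of information per line" and a "common referencing system" could look like in practice. All identifiers (IBS-001, APP-0001, etc.), field names and helper functions are hypothetical and chosen only for the example; a real model would follow the organisation's own inventories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    """One row = one dependency of one Important Business Service."""
    service_id: str     # hypothetical IBS reference, e.g. "IBS-001"
    dependency_id: str  # hypothetical shared reference reused across inventories
    kind: str           # "application", "supplier", "asset", "team", ...
    name: str
    source: str         # inventory the row was taken from, e.g. "CMDB"


# Hypothetical rows for a cash withdrawal service.
ROWS = [
    Dependency("IBS-001", "AST-0001", "asset", "ATM", "CMDB"),
    Dependency("IBS-001", "APP-0001", "application", "Card authentication", "CMDB"),
    Dependency("IBS-001", "SUP-0001", "supplier", "Account management software provider", "Supplier inventory"),
]


def dependencies_by_kind(rows, service_id):
    """Group the dependency names of one service by kind."""
    grouped = {}
    for row in rows:
        if row.service_id == service_id:
            grouped.setdefault(row.kind, []).append(row.name)
    return grouped


def duplicate_ids(rows):
    """Return dependency ids listed more than once for the same service."""
    seen, dupes = set(), set()
    for row in rows:
        key = (row.service_id, row.dependency_id)
        if key in seen:
            dupes.add(row.dependency_id)
        seen.add(key)
    return sorted(dupes)


print(dependencies_by_kind(ROWS, "IBS-001"))
print(duplicate_ids(ROWS))
```

Because every row carries the same reference columns, the same structure can start life as a spreadsheet and later be loaded into a dedicated tool without being reshaped.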
Challenge 2: Identifying gaps
Once this mapping is carried out, companies need to identify threats linked to the end-to-end service and existing resilience capabilities to mitigate them. These capabilities can be specific to a dependency or broader. This allows the creation of indicators that show resilience gaps. Overall, there can be two types of gaps:
A dependency with insufficient contingency plans
This can be identified in the initial analysis, through existing controls, or through testing.
Example: A person wants to withdraw cash. Normally, this service is available through an ATM. Several elements are necessary for ‘normal’ service to function properly:
- Physical ATM itself
- Customer authentication system via their bank card
- Customer account management software provided by a third party to check the balance
The following threats may affect this service:
- Major IT loss (whether or not caused by a cyberattack)
- Loss of the software provider
- Physical incident affecting the ATM
We shall assume that 4 hours is the period before the inability to withdraw cash becomes an intolerable source of harm to the customer (also known as the impact tolerance). With this context in mind, the bank needs to consider the following questions to identify resilience gaps:
- Recovery Time Objective (RTO): In the event of a major IT loss, can the ATM and authentication system be brought back online within 4 hours according to their RTO? Has it been tested?
- Exit plan: In the event of a major breakdown or bankruptcy of the account management software provider, is there an alternate provider the bank can turn to for delivering the service without intolerable delay? Alternatively, is there a way to bring the activities in-house?
- Contingencies: Is there a degraded process for dispensing cash, for example, by replacing a faulty ATM? What are the dependencies for this process? Can it be done without an IT system?
Once these gaps have been identified, you can then calculate resilience scores for individual components.
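The questions above can be turned into a simple automated check. The sketch below is only one possible way to do it: the component values are invented, and the scoring rule (share of three checks passed) is an assumption made for the example, not a prescribed method. Only the 4-hour impact tolerance comes from the ATM scenario.

```python
from dataclasses import dataclass
from typing import List, Optional

IMPACT_TOLERANCE_HOURS = 4.0  # impact tolerance from the ATM example


@dataclass
class Component:
    name: str
    rto_hours: Optional[float]  # None means no RTO has been defined
    rto_tested: bool            # has recovery within the RTO been tested?
    has_fallback: bool          # exit plan or degraded process available?


def find_gaps(component: Component, tolerance: float = IMPACT_TOLERANCE_HOURS) -> List[str]:
    """Return the resilience gaps found for one component (at most three)."""
    gaps = []
    if component.rto_hours is None:
        gaps.append("no RTO defined")
    elif component.rto_hours > tolerance:
        gaps.append("RTO exceeds impact tolerance")
    if not component.rto_tested:
        gaps.append("RTO not tested")
    if not component.has_fallback:
        gaps.append("no exit plan or degraded process")
    return gaps


def resilience_score(component: Component, tolerance: float = IMPACT_TOLERANCE_HOURS) -> float:
    """Illustrative score: share of the three checks passed, from 0.0 to 1.0."""
    return (3 - len(find_gaps(component, tolerance))) / 3


# Hypothetical values, for illustration only.
COMPONENTS = [
    Component("ATM", rto_hours=2.0, rto_tested=True, has_fallback=True),
    Component("Authentication system", rto_hours=6.0, rto_tested=False, has_fallback=False),
    Component("Account management software", rto_hours=None, rto_tested=False, has_fallback=True),
]

for c in COMPONENTS:
    print(c.name, find_gaps(c), round(resilience_score(c), 2))
```

In this sketch an untested RTO counts as a gap even when the stated RTO is within tolerance, reflecting the "Has it been tested?" question.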
Absence of a core resilience capability
Every organisation needs a range of operational resilience capabilities, spanning business and IT continuity, third party management, cyber defence, disaster recovery and crisis management. We have identified a list of 50 generic core capabilities, linked to the most common threats, and are deploying this framework with our clients to measure the overall operational resilience maturity level.
Examples of key capabilities include:
- Crisis management: alternative communication channel
- Disaster recovery: cyber vault
- Third party management: crisis SLAs with third parties
- Business and IT continuity: degraded processes without IT
- Cyber defence: emergency authentication procedure
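To show how a capability list of this kind could feed a maturity indicator, here is a minimal sketch using only the five example capabilities above. The implemented/not implemented statuses are invented, and the coverage ratio is an assumed simplification for illustration; it is not the 50-capability framework itself.

```python
# Example capabilities from the list above, with hypothetical statuses
# (True = capability in place, False = capability missing).
CAPABILITIES = {
    "Crisis management": {"Alternative communication channel": True},
    "Disaster recovery": {"Cyber vault": False},
    "Third party management": {"Crisis SLAs with third parties": True},
    "Business and IT continuity": {"Degraded processes without IT": True},
    "Cyber defence": {"Emergency authentication procedure": False},
}


def coverage(capabilities):
    """Share of capabilities in place, from 0.0 to 1.0."""
    statuses = [ok for items in capabilities.values() for ok in items.values()]
    return sum(statuses) / len(statuses) if statuses else 0.0


def missing(capabilities):
    """List (discipline, capability) pairs that are not in place."""
    return [
        (discipline, name)
        for discipline, items in capabilities.items()
        for name, ok in items.items()
        if not ok
    ]


print(f"Coverage: {coverage(CAPABILITIES):.0%}")
for discipline, name in missing(CAPABILITIES):
    print(f"Missing: {discipline} / {name}")
```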
Challenge 3: Governance
Finally, governance is required to ensure that operational resilience data is kept up to date, so that accurate reporting can be delivered to aid decision-making in the right forums. For instance, any initiative to remediate identified resilience gaps requires management buy-in and funding, and management can only make the right decisions and prioritise initiatives based on what appears in official reports.
Finally, what should be measured?
The underlying question in management information (MI) is: how well is your organisation prepared to withstand a major incident?
- Are the dependencies identified?
- Is the necessary documentation in place?
- Are the threats known?
- Are controls in place to indicate a gap?
- Are the company’s employees prepared to respond and minimise the operational impact of a major incident?
What are customers’ expectations?
As of today, through supporting our clients in their operational resilience programs, we have identified three common themes in our clients’ expectations of operational resilience projects:
- Clients need help with creating an inventory and rationalising multiple sources with various data formats to be incorporated into the data model.
- Clients regularly require support with creating reporting. This can take the form of designing useful KPIs that translate into actionable items and drive decision-making, or creating dashboards in data visualisation tools such as Power BI.
- There is an increasing demand for sourcing and deployment of operational resilience tools. Wavestone can help companies find the right tool that suits their needs via:
  - Performing a benchmark
  - Gathering requirements and specifications through workshops with future users
  - Creating an RFP and a suitable scoring mechanism to evaluate vendors
A good example of our expertise in sourcing and deploying operational resilience tools is the second edition of Wavestone’s Operational Resilience Tooling Panorama. It captures the main market players across a range of topics such as emergency notifications, resilience management (mapping, testing, dashboards), crisis management and business or cyber incident simulation (cyber range). The radar is also built to encompass a wide spectrum of players, from disruptive innovators to traditional players, and from start-ups to large organisations.
Any final advice for readers?
For French clients who have not yet launched an operational resilience program, there are two pieces of advice:
- As soon as the mapping is done, you need to think about how to store the data (i.e. the data model). Excel may not be sufficient as a tool to ensure the sustainability of the model.
- Do not hesitate to re-use what your company already has in terms of business and IT continuity, third party management, cyber defence, IT reconstruction and crisis management.