Disaster recovery is an essential service for EDF with Phenix-IT
For French power generation utility EDF, the efficiency and reliability of its business continuity (BC) and disaster recovery (DR) plans, and the technology they rest upon, go way beyond the organization – they extend to the functioning of the country itself.
In this article, we look at how EDF built its BC/DR provision across the entire organization to cover all the risks identified by its IT department: potential outages in its datacentres, cyber attacks, and even the once-a-century likelihood of major flooding around Paris.
To deliver its DR plans, EDF developed an application dedicated to the job that it calls Phenix-IT. “We needed a tool to manage the recovery plan in case of a data center disaster,” says Bruno Leyris, head of IT BC/DR and crisis management at EDF.
The objective of Phenix-IT is twofold. Firstly, it is an operational tool that functions in real time when a datacentre DR plan is activated. Secondly, it is used daily to manage the plan and generate reporting for the leadership at EDF and its lines of business.
Leyris underlines the role of Phenix-IT in testing and improving BC/DR provision. “We use Phenix-IT regularly during our annual exercises,” he says. “Every year we simulate the destruction of data centers and follow the recovery of activity through Phenix-IT.
“Within the framework provided by these exercises, we generate reporting and graphics, and communicate findings using secure tools across the group’s crisis management organization.”
Beyond this operational role during crisis management, Phenix-IT collects data relevant to these exercises. Analysis of that data contributes to improvements in crisis management and the recovery plan.
Phenix-IT: An increasingly rich and sophisticated application
Since its launch 12 years ago, Phenix-IT has evolved hugely, with numerous areas of functionality added to the original application jointly with software provider Mega. Among the most significant additions during the life of Phenix-IT has been the roll-out of workflows between key actors in crisis management and secure communications that alert members of crisis teams.
These tools enable efficient decision-making via information sharing on a 24/7 basis. Securing them is essential because if the crisis is triggered by a cyber attack, it is important that the attackers cannot listen in.
Reporting on Phenix-IT has been a particular focus of development in recent years. During any disaster, the software delivers forecasting graphics. It is also integrated with impact simulation functionality that allows IT teams to test system resilience via Phenix-IT.
Outside the key role played by Phenix-IT during crisis simulations, it is also used by IT on a daily basis and has become a standard for the EDF group.
“Beyond its disaster recovery functionality, impact simulation and secure communications, Phenix-IT has become a vital standard in IT,” says Leyris. “It is accessible by security staff to collect information and test the resilience of IT systems in quiet times.”
Modernization on Mega Hopex
Phenix-IT was developed on a now-obsolete version of the Mega platform which had become increasingly costly to maintain. So, a project to update it was launched recently. After analysis, it was decided to continue with Mega, but to migrate to the company’s Hopex platform, which is dedicated to management of governance, risk and compliance.
“This solution opened up new possibilities for us while ensuring continuity in our approach,” says Leyris. “And we also came back to standards that we had diverged from.”
EDF’s CIO had progressively deployed an agile mode of operation with Mega. EDF project chiefs, developers and Mega’s sales force were all involved. “Phenix-IT is now in its ninth year of evolution,” says Leyris. “Lots of people have taken part in the project, but they have always taken care to understand the aims and added value that EDF expects from the software in times of disaster as well as under normal working.”
The project team produces two major versions a year, and works to tight deadlines to do so. Each major version must be completed before the exercise it is intended to support.
“Every six months, we have a major version of Phenix-IT for our disaster recovery exercises,” says Leyris. “It’s a constraint of time and quality because we can’t have Phenix-IT crashing during the exercise.”
Agile organization allows for continuous improvements in Phenix-IT. Outside times when DR exercises are running, those responsible for crisis management in EDF connect via the tool to deliver analyses, including between IT and the business.
Number of objects growing by 10% a year
A key challenge facing Phenix-IT and the DR plan is the 10% annual growth in the number of elements that it must track in EDF’s IT systems. The software has to continuously integrate data from new sources, and that brings further work on the quality of data ingested by Hopex. Increases in raw data, and the new functionality that accompanies that growth, also affect the plan in terms of the power and availability the platform must provide.
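To give a sense of what 10% annual growth means for capacity planning, here is a minimal sketch of the compound-growth arithmetic. It assumes the rate stays constant from year to year, which the article does not state, and the starting figure of 100 is purely illustrative: EDF has not published the number of objects Phenix-IT tracks. The function names are hypothetical and are not part of Phenix-IT or Hopex.

```python
import math


def projected_objects(current: float, annual_growth: float, years: int) -> float:
    """Project an object count forward, assuming constant compound growth."""
    return current * (1 + annual_growth) ** years


def doubling_time(annual_growth: float) -> float:
    """Years needed for the object count to double at a constant growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    # Illustrative baseline only: 100 stands in for "today's" object count.
    for year in (1, 5, 10):
        print(year, round(projected_objects(100, 0.10, year), 1))
    print("doubling time (years):", round(doubling_time(0.10), 2))
```

Under that assumption, the number of tracked objects would roughly double in a little over seven years, which helps explain why data quality and platform capacity are recurring concerns.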
EDF wants to keep developing the platform, with the chief idea being to increasingly integrate Phenix-IT with crisis management processes.
“We want to work on ergonomics to make the software more intuitive,” says Leyris. “Someone woken in the middle of the night during a disaster needs an intuitive tool.”
Phenix-IT also needs to be improved to support the business continuity team’s work in the event of a ransomware attack.
Currently, Phenix-IT runs on Hopex v3, but the project team is studying its possible migration to v5 and putting new functionality in that version to work.