Croatian met service backs up its production at ECMWF after earthquake
Xavier Abellan (ECMWF), Kristian Horvath, Izidor Pelajić, Antonio Stanešić (all DHMZ)
The Croatian Meteorological and Hydrological Service (DHMZ) has successfully backed up its operational production and essential services on ECMWF’s High-Performance Computing Facility (HPCF) and the European Weather Cloud, following an earthquake that severely damaged DHMZ’s headquarters in March this year. Despite the emergency situation, and without previous preparation, the backup system was put together in just a matter of days. The success of this project was made possible thanks to a joint effort by a number of staff from DHMZ together with many others at ECMWF, EUMETSAT and the weather software company IBL.
The IT infrastructure at DHMZ survived the earthquake, but it was clear that an alternative arrangement was required to ensure continuity of service, using off-site resources. In the days after the earthquake, stable communication channels were quickly established among key people at DHMZ, ECMWF and EUMETSAT to discuss the best migration strategy and possible options, and a plan to move forward was outlined.
It was agreed that numerical weather prediction (NWP) and some post-processing activities would be ported to ECMWF’s HPCF, while some other essential services not fit for such infrastructure would use the European Weather Cloud. Within a week, the main components of the NWP system were running on the Centre’s supercomputers, and the European Weather Cloud was hosting a number of other services. ECMWF dissemination streams and EUMETCast feeds were also configured to be delivered to the newly created locations. “I am hugely impressed by the prompt, effective response and support by ECMWF and EUMETSAT to DHMZ after the earthquake,” says Dr Branka Ivančan-Picek, Director General of DHMZ.
Moving onto ECMWF’s HPCF
DHMZ’s NWP group maintains and develops the operational limited-area model ALADIN and its local applications. The model and the post-processing chain are hosted on DHMZ HPC and servers located in the DHMZ headquarters. After the earthquake, it was decided to establish a backup of NWP operations at ECMWF. A list of priority users was drawn up. The goal was to provide a backup solution to them and then to gradually include others. Until then DHMZ had had very little experience with ECMWF’s HPCF: it had built only one version of the ALADIN model and one script for running the model integration.
As the local configuration was reviewed, it was clear that data assimilation would be too hard to set up in such a short time. The solution was to use the initial conditions obtained from an unperturbed member of A-LAEF (RC-LACE ensemble system), which was already running at ECMWF. For lateral boundary conditions (LBC), ECMWF products are used. In coordination with ECMWF, a new channel for LBC distribution was promptly established, so the same products distributed regularly to DHMZ were also sent directly to ECMWF’s HPCF to feed into the ALADIN model. After this, hard work on the porting of different model configurations and post-processing started. With the help of ECMWF user documentation and guidance, in a little more than a week DHMZ had the first prototype of its operations backup running at ECMWF.
Within days of the model being configured and starting to run, parallelised post-processing using conda environments and a Python/Bash framework was set up. It was essential to keep DHMZ’s users informed on a daily basis about what was happening and what DHMZ was doing to mitigate the risk of failing to deliver its numerical products. End users were informed about the availability of products based on the operational backup at HPCF, which were ready to use in case of a major failure of DHMZ’s local IT infrastructure and during occasional delays with local operations. DHMZ received a lot of appreciation for the work done from its critical end users, who felt safe knowing that they would not have any issues with their weather-related decision-making.
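As an illustration only, the kind of parallelised Python/Bash post-processing described above can be sketched as follows. This is not DHMZ’s actual framework: the script name postprocess_step.sh, the three-hourly lead times and the worker count are all hypothetical, and the sketch assumes that each forecast lead time can be processed independently by a Bash script launched from Python.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(lead_hour):
    """Build the shell command for one forecast lead time.

    'postprocess_step.sh' is a hypothetical script name used for illustration.
    """
    return ["bash", "postprocess_step.sh", f"{lead_hour:03d}"]


def run_step(lead_hour, runner=subprocess.run):
    """Run post-processing for one lead time and return (lead_hour, exit code)."""
    result = runner(build_command(lead_hour), check=False)
    return lead_hour, result.returncode


def run_all(lead_hours, max_workers=4, runner=subprocess.run):
    """Process several lead times concurrently.

    Threads are sufficient here because the real work happens in the
    external Bash processes, not in the Python interpreter.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(lambda hour: run_step(hour, runner), lead_hours))


if __name__ == "__main__":
    # Hypothetical 72-hour forecast with three-hourly output.
    exit_codes = run_all(range(0, 73, 3))
    failed = [hour for hour, code in exit_codes.items() if code != 0]
    print("failed lead times:", failed)
```

In a setup like this, the script would typically be started from within an activated conda environment so that every worker sees the same software stack. Collecting the exit codes per lead time makes it easy to rerun only the steps that failed.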
Getting on board the European Weather Cloud
All the IT infrastructure used to serve and display products to end users is also located in the damaged DHMZ headquarters. However, for those services the HPCF would not have been a feasible alternative. It was therefore proposed to back them up on the European Weather Cloud. After the project was created on the cloud, DHMZ managed to quickly provision an Ubuntu virtual machine (VM) with predefined options. All relevant components for a web server, such as Apache and PHP, were soon installed and configured. DHMZ found the documentation for creating a virtual machine clear and helpful, but to configure it some prior knowledge of system administration was required.
To ensure continuity of the remaining essential services, an attempt was made to recreate the forecasters’ operational environment in the cloud. As DHMZ uses IBL’s Visual Weather workstation, this was done in coordination with IBL as well as EUMETSAT, who provided both data and additional IT support through various channels. The operational environment in general consists of inputs, the core production part and a number of outputs. To facilitate this, three additional virtual machines (VMs) were set up.
A first reception VM was created, where a subset of EUMETCast (MSG and DWDSAT satellite data) is delivered by FTP (at first as a push service from EUMETCast Terrestrial). A second CentOS VM was dedicated to a Visual Weather visualisation and forecast production system, and a third VM was provisioned as a Network File System shared storage service for internal data manipulation. ECMWF model outputs were made available within the European Weather Cloud Object Storage service, ALADIN (HPCF version) data were imported directly, ICON data from the German national meteorological service (DWD) were collected from DWD’s open data server, and US Global Forecast System data were also ingested into the system. With Global Telecommunication System data available over DWDSAT, LINET lightning data and MSG SEVIRI HRIT satellite data, most of the essential data are now available in the cloud system.
Visualisation and report production configuration were then transferred from DHMZ’s native Visual Weather system to the newly created cloud instance. Forecasters now have access to a familiar system from their home computers with relatively simple means through SSH and VNC services. DHMZ found setting up the dissemination part to be the most challenging aspect, mostly due to a lack of understanding of the security features on ECMWF’s side of the European Weather Cloud.
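One common way to combine SSH and VNC for this kind of remote access is to forward the VNC port through an SSH tunnel, so that the VNC service is never exposed directly to the internet. The sketch below only builds the corresponding ssh command. It is an assumption about how such access could be arranged, not a description of DHMZ’s setup, and the user name, host name and display number are hypothetical.

```python
def vnc_port(display):
    """Return the TCP port for a VNC display (by convention, 5900 + display)."""
    return 5900 + display


def build_tunnel_command(user, host, display=1):
    """Build an ssh command that forwards a local port to the remote VNC server.

    -N: do not run a remote command, only forward ports.
    -L: forward local_port to localhost:remote_port as seen from the remote host.
    """
    port = vnc_port(display)
    return ["ssh", "-N", "-L", f"{port}:localhost:{port}", f"{user}@{host}"]


if __name__ == "__main__":
    # Hypothetical account and host name, for illustration only.
    command = build_tunnel_command("forecaster", "vw.example.org", display=1)
    print(" ".join(command))
```

With the tunnel running, a VNC viewer on the home computer would connect to localhost on the forwarded port rather than to the cloud VM directly.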
Earthquake damage outside. Some of the outside of the DHMZ building was also damaged.
The 5.3 magnitude earthquake on Sunday, 22 March 2020 and more than 30 aftershocks caused considerable damage to the 19th-century building hosting DHMZ in Zagreb. No one was injured during the event, but the building was deemed unsafe to work in. Staff managed to carry on with their duties remotely and to deliver the service to the public, relevant authorities and critical end users.
The operational production risk mitigation plan for the near future is to keep the backup of NWP and post-processing production on ECMWF’s HPCF active until DHMZ acquires and puts into operation a new supercomputer, as well as ensuring that related essential services can be run on the European Weather Cloud.
The earthquake in Croatia was a reminder of how natural hazards can endanger the ability of national meteorological services to serve society and protect lives and property. Through effective collaboration with ECMWF and EUMETSAT, DHMZ was able to ensure the resilience of its critical weather services in the time after the earthquake, while showcasing the usefulness of the European Weather Cloud and other services at ECMWF.
Many people were involved in making a success of this project. Beyond the authors, several other DHMZ experts should be acknowledged since they were instrumental in either setting up the backup (M. Hrastinski, E. Keresturi, S. Panežić) or establishing communication links (B. Matjačić, M. Tudor). A number of staff across departments at ECMWF and EUMETSAT, as well as IBL, also played a key role in ensuring a quick and smooth journey.