The role of ECMWF and the C3S in historical data rescue

Still from reconstruction of Ulysses storm over the UK in February 1903; credit: Philip Brohan

Long records of weather conditions stretching back to the beginning of the 20th century and earlier are vital to understand climate variability and change. Such data provide a baseline of past, historical climate and underpin efforts to monitor, manage and adapt to the changing risks from climate extremes and hazardous weather.

While many millions of observations now pour automatically into archives from across the world several times a day, before the digital age the situation was very different. Weather observations were recorded by hand on paper. While many of these observations have been digitised, scores of records are still unavailable to weather and climate science. Undoubtedly, more data lie in historical documents as yet undiscovered and at risk of perishing. Unfortunately, numerous records have already perished.

Data rescue is the process of recovering and digitising such old weather records. It is attracting growing interest because of its value to climate research and climate services, and is now being supported through the Copernicus Climate Change Service (C3S), implemented by ECMWF on behalf of the European Union.

What is data rescue?

Data rescue is the discovery, preservation, quality control, digitisation and consolidation of past observations of the Earth system, including land surface, ocean and upper air data. It is not just about discovering and digitising the data, but also involves bringing those data together in a consistent way and making them available to all. It is a huge task.

Prior to the digital age, observations were recorded by hand, for example in ship log books and diaries. Other important sources include paper records such as pressure measurements from barographs (Figure 1). While long and important historical weather and climate datasets exist, a huge amount of weather and climate data have still not been digitised. The digitised records that do exist are often focused on particular regions and many, as yet, have not been consolidated into a global archive.

Figure 1: Barograph strip chart from a French station. Strip charts are a precious but often neglected source of high-resolution observations. Image credit: Météo-France. Image reproduced under Creative Commons Licence CC BY 4.0.

New data sources are being discovered all the time. Recent discoveries in archives and museums in Norway and Finland uncovered more than 50,000 images from log books, for example. Archives in Sweden and Denmark are little explored. Such log books can cover ship voyages into the “data sparse” oceans of the mid- and high-latitude southern hemisphere. These are just some examples of the huge amount of non-digitised data that exist in archives around the world.

Even in relatively recent decades, Earth observations, such as those from satellites, were recorded on magnetic tapes or punch cards, which can deteriorate, get lost or even be inadvertently discarded.

Many data rescue activities focus on land-based surface stations which can provide a long record of observations from the same location and are vital for climate change studies. However, marine and upper-air data rescue have also received attention and are important, especially for improving global reanalyses.

Core variables are temperature, precipitation and pressure, but new areas of interest can develop. For example, there is now considerable interest in historical solar radiation and wind data for planning and managing renewable energy generation. Observations of ice from polar regions are also of great interest.

There is also a growing interest in sub-daily (e.g. hourly) data to support the analysis of short time-scale extremes (such as intense rainfall).

The process of data rescue

Data rescue involves a number of key steps:

  • Search, discovery and archiving

This covers both data and essential background information about the data and the rescue activity (e.g. accurate time and location (including station height) of observations, information about the instrumentation, etc.). Data sources then need to be archived, catalogued and stored in a way that prevents deterioration.

  • Imaging and digitising

This stage involves scanning/photographing records and then transcribing into electronic form (digitising) the useful climatic data they contain.

  • Quality control

This is a vital, though painstaking, process to ensure that data (and background metadata) are of good quality. Checks are made for obvious observation errors and to ensure the original data are digitised accurately. Quality control can also involve comparison with any other available data and consideration of changes in observational practices, station location, etc.

  • Formatting and storage

A common data format and accessible data storage ensure effective sharing of rescued data. This step also ensures greater visibility of data rescue efforts and helps to maximise benefits.

  • Consolidation  

Bringing historical climate datasets together to form a comprehensive global repository is an important, but substantial, additional step that C3S is supporting.

Figure 2: A schematic of the data rescue process (blue rectangles). The C3S Data Rescue Service will provide assistance in every phase, as shown by the red circles. Image reproduced under Creative Commons Licence CC BY 4.0.

The importance of historical data for reanalysis

Reanalysis is a process of reconstructing past weather conditions using a weather forecast model blended with observed data. It aims to provide a globally complete picture of the climate system, produced in a consistent way over time. Clearly, the more quality-controlled observations you have, the more realistic your reanalysis will be. So, quality reanalyses and data rescue go hand in hand, particularly for reanalyses which go back to the beginning of the 20th century (or earlier). This was demonstrated clearly in the 20th Century Reanalysis Project, from which ECMWF’s 20th century reanalysis work developed.
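
The idea of blending a model with observations can be illustrated for a single quantity with a textbook variance-weighted update. The Python sketch below is only a toy under stated assumptions and not the scheme used in any ECMWF or 20th Century Reanalysis product: the blend function, the pressures and the error variances are all hypothetical.

```python
from typing import Iterable, Tuple


def blend(background: float, background_var: float,
          observations: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine a model first guess with (value, error variance) observations.

    Each observation pulls the estimate towards itself in proportion to how
    uncertain the current estimate is relative to the observation.
    """
    estimate, variance = background, background_var
    for value, obs_var in observations:
        gain = variance / (variance + obs_var)
        estimate = estimate + gain * (value - estimate)
        variance = (1.0 - gain) * variance
    return estimate, variance


# Hypothetical numbers: a first guess of 990 hPa with a large uncertainty.
first_guess = (990.0, 16.0)
one_obs = [(978.0, 4.0)]
three_obs = [(978.0, 4.0), (976.5, 4.0), (979.0, 4.0)]

analysis_1, var_1 = blend(*first_guess, one_obs)
analysis_3, var_3 = blend(*first_guess, three_obs)
print(f"1 observation:  {analysis_1:.1f} hPa, variance {var_1:.2f}")
print(f"3 observations: {analysis_3:.1f} hPa, variance {var_3:.2f}")
```

In this toy setting the variance of the blended estimate shrinks with every observation added, which is the sense in which more quality-controlled observations give a more tightly constrained reconstruction.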

ECMWF produced its first reanalysis of the 20th century as part of the ERA-CLIM project, and data rescue was one of the activities in the project. ERA-CLIM and its follow-on project, ERA-CLIM2, have rescued more than 5.5 million station days of surface measurements and more than 1.1 million station days of upper air measurements (ref: Observations for Reanalysis, Nick Rayner).

Figure 3 shows an example of the difference that rescued data can make. With rescued data (central panel) a much clearer picture emerges of a strong storm crossing the UK in February 1903.

Figure 3: Two reconstructions of the Ulysses storm over the UK in February 1903 showing mean sea-level pressure (mslp). Left: early version of the 20th Century Reanalysis (20CR) v3 and middle: after assimilating additional observations from UK Daily Weather Reports. Right panel: a leave-one-out cross validation; black lines show observed pressures (when available, typically twice daily), blue dots the original 20CRv3 ensemble at the station locations, and red dots the 20CR ensemble after assimilating all the observations except the observation at that location. Credit: Philip Brohan

Data rescue – existing activities and challenges

Many countries have national initiatives to rescue weather and climate data. The World Meteorological Organization (WMO) publishes guidelines for best practice and has established the International Data Rescue (I-DARE) portal where data rescue activities can be reported.

There are many individual projects and one-off activities dealing with data rescue for specific needs. While the resulting data are often made publicly available, the short-term nature of these projects can mean that expertise is lost and rescued data archives are not maintained.

Citizen science also has a role to play. For example, as part of the OldWeather initiative, volunteers explore, mark, and transcribe ships’ logs from the 19th and early 20th centuries (Figure 4). These tasks are very difficult to automate, due to the diversity of reports and idiosyncratic handwriting that only human beings can read and understand effectively.

Figure 4: Screenshot of OldWeather (new Whaling Chapter). Citizen science projects such as OldWeather have provided millions of valuable ship observations. Image reproduced under Creative Commons Licence CC BY 4.0.

ACRE (Atmospheric Circulation Reconstructions over the Earth) is an international initiative, led by the UK Met Office, which is linking together data rescue activities at the international, national and individual project level. It is focusing on land and ocean surface data over the last 250 years and provides a coordination and support role as well as undertaking some data rescue activity itself. The data rescue efforts are focused on providing historical data for reanalyses back to the 19th century. The initiative provided a foundation for the development of the C3S Data Rescue Service.

While much progress has been made with data rescue, many challenges still exist:

  • Data rescue is a huge, highly labour-intensive task.
  • It cannot be done by a single country, group or project alone.
  • Data rescue can suffer from insufficient funding/staffing.
  • There is a risk that activity may not be sustained.
  • There is a need for greater coordination and support.

Data rescue services – through ECMWF and the Copernicus Climate Change Service

C3S has considered how it can best help to promote and sustain data rescue activity. Fundamentally, data rescue activities need:

  • to be sustained, comprehensive and long-term;
  • to be carried out in a distributed way across a large number of smaller projects;
  • to have mechanisms to coordinate and bring together individual activity. 

In 2017, C3S signed an ambitious four-year contract with the UK Met Office to develop a service to facilitate and coordinate meteorological data rescue activities worldwide (Figure 2). The C3S climate data rescue service will provide:

  • a portal for the discovery and registration of both data rescue projects and individual datasets, linked with the existing WMO I-DARE portal;
  • tools for scanning, digitising and quality checking data;
  • an upload facility to ensure newly digitised data are made available to all through the C3S Climate Data Store.

A prototype portal was launched in 2017 and will be made fully operational in summer 2019.

In a collaboration with NOAA, the rescued land and marine surface data are being combined with current observations in the C3S Climate Data Store. This involves integrating centuries of meteorological records from NOAA’s data archives, together with the growing quantity of rescued data from the C3S Data Rescue Service, into the C3S Global Land and Marine Observations Database. This new global repository of historic weather observations represents a real breakthrough, but is a massive undertaking. Work will continue over the next two years, with an early version of the repository due for release in 2019.

Looking further ahead, there are plans to extend C3S data rescue efforts to include satellite data and to produce an ECMWF reanalysis all the way back to 1850.

The C3S Data Rescue Service and repository represent a system that is built to last. It is an exciting and major step forward, but is only possible through international collaboration, the coordination role of C3S and the long-term vision and funding from the EU Copernicus Programme.