This project has ended | 2015 to 2018
The H2020 EarthServer-2 project concluded after three years in April 2018. The two major project targets were achieved:
- The use of OGC web services for data access was successfully demonstrated on various data sets from three Copernicus services (CEMS, C3S, CAMS).
- A connection between ECMWF's MARS archive and third-party software, which accesses data sets and ingests them on the fly for advanced data processing, was successfully demonstrated.
The outcomes of the project have already started to influence ECMWF's development of future web services offering new services to users of its forecasts.
OGC web services
ECMWF already delivers large amounts of forecast data in real time to its users, and data from its MARS archive to the meteorological community. These services work well for meteorological users, but businesses and scientists from other domains who want to integrate meteorological data into their services often struggle with domain-specific interfaces, data formats and data volumes. Web services offer easier, customised access to the data: data is accessed over the Internet via a URL and can thus easily be integrated into web or desktop applications and scientific workflows. One of the main advantages of web services is the ability to retrieve time-series data. Users of climate data in particular are interested in retrieving climate information for specific locations. Retrieving time series directly spares them time-consuming downloads and the effort of extracting the information from large amounts of data.
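To illustrate how a time series can be requested with a single URL, here is a minimal Python sketch that builds a WCS 2.0.1 GetCoverage request in key-value-pair form. The endpoint, the coverage name `ERA_Interim_2m_temperature`, the axis labels `Lat`, `Long` and `ansi` and the `text/csv` output format are hypothetical; a real service lists its coverages, axis labels and supported formats in its capabilities and coverage descriptions.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and coverage name, for illustration only.
ENDPOINT = "https://example.org/rasdaman/ows"
COVERAGE_ID = "ERA_Interim_2m_temperature"


def point_time_series_url(endpoint, coverage_id, lat, lon, start, end, fmt="text/csv"):
    """Build a WCS 2.0.1 KVP GetCoverage URL that slices a coverage at one
    location and trims it to a time range, i.e. requests a 1-D time series.

    The axis labels Lat, Long and ansi are assumptions; the real labels are
    given in the coverage's DescribeCoverage response.
    """
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
        # Slicing on Lat and Long (a single value each) removes the spatial axes.
        ("subset", f"Lat({lat})"),
        ("subset", f"Long({lon})"),
        # Trimming on the time axis (low,high) keeps a range of time steps.
        ("subset", f'ansi("{start}","{end}")'),
        ("format", fmt),
    ]
    # A list of tuples lets the 'subset' key appear several times.
    return f"{endpoint}?{urlencode(params)}"


url = point_time_series_url(ENDPOINT, COVERAGE_ID, 51.5, -0.1, "2010-01-01", "2010-12-31")
print(url)
```

The resulting URL can be opened in a browser, pasted into a desktop GIS or fetched from a script, which is what makes this kind of access easy to integrate.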
Outreach activities, such as PyData London 2017, were important for gathering feedback from users, especially those without a meteorological background.
Web services using OGC standards bring several additional benefits. OGC standards provide data access in a standardised way which is already well established in the world of geospatial information systems (GIS). This makes it easier to exchange or combine data from different data systems. Instead of learning how to access data from different systems, users merely have to learn the structure of an OGC web service request. The same structure is applicable to any OGC web service. An added benefit of using standards is vendor neutrality. Service providers can choose from different solutions to provide a standard web service endpoint.
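The shared request structure can be sketched in a few lines of Python: one helper builds requests for a WCS and a WMS, and only the `service` and `request` values (plus any service-specific parameters) change. The endpoint and coverage name are hypothetical placeholders.

```python
from urllib.parse import urlencode


def ogc_kvp_url(endpoint, service, request, version=None, **extra):
    """Build an OGC key-value-pair (KVP) request URL.

    Every request names the service and the operation; the version and any
    service-specific parameters are appended when given.
    """
    params = {"service": service, "request": request}
    if version is not None:
        params["version"] = version
    params.update(extra)
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint offering both a WCS and a WMS, for illustration only.
endpoint = "https://example.org/ows"
wcs_caps = ogc_kvp_url(endpoint, "WCS", "GetCapabilities")
wms_caps = ogc_kvp_url(endpoint, "WMS", "GetCapabilities")
describe = ogc_kvp_url(endpoint, "WCS", "DescribeCoverage", version="2.0.1",
                       coverageId="ERA_Interim_2m_temperature")
```

Because the structure is the same everywhere, switching to another provider's endpoint only means changing `endpoint` and the coverage or layer names.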
Data access on demand can easily be integrated into data processing workflows. Data no longer has to be downloaded; it can be accessed directly from the server hosted by the data provider. Data processing or analysis workflows, for example in Jupyter notebooks, can then easily be shared with and reproduced by team members and colleagues.
Example Jupyter notebook accessing the CSDS through OGC WCPS.
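A notebook like this one sends WCPS queries to the server. The sketch below composes one such query, which averages a year of values at a single location on the server so that only one number is returned, and wraps it in a WCS `ProcessCoverages` request, the key-value form that rasdaman accepts for WCPS. The coverage name, axis labels and endpoint are assumptions for illustration.

```python
from urllib.parse import urlencode

# Hypothetical rasdaman endpoint, for illustration only.
ENDPOINT = "https://example.org/rasdaman/ows"


def wcps_annual_mean_query(coverage, lat, lon, year):
    """Compose a WCPS query that averages a point time series over one year
    on the server side. Axis labels (Lat, Long, ansi) are assumptions."""
    return (
        f"for c in ({coverage}) "
        f'return avg(c[Lat({lat}), Long({lon}), ansi("{year}-01-01":"{year}-12-31")])'
    )


def wcps_url(endpoint, query):
    # The WCPS query travels as the 'query' parameter of a ProcessCoverages request.
    params = {"service": "WCS", "version": "2.0.1",
              "request": "ProcessCoverages", "query": query}
    return f"{endpoint}?{urlencode(params)}"


query = wcps_annual_mean_query("ERA_Interim_2m_temperature", 51.5, -0.1, 2010)
url = wcps_url(ENDPOINT, query)
# In a notebook, the result could then be fetched, e.g. with urllib.request.urlopen(url).
```

Note that WCPS trims with a colon (`low:high`), whereas the WCS GetCoverage subset parameter uses a comma.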
Accessing data in the MARS archive through web services
Most of the OGC web service tests mentioned above were carried out with smaller ingested data sets (tens of GB). Duplicating the larger data sets stored in ECMWF's MARS archive by ingesting them into a different system was not feasible, so a different solution was explored.
The ERA-Interim reanalysis was chosen as the first dataset made accessible on the fly through the web service interface. Data is retrieved in several steps that completely abstract the retrieval from the MARS syntax. The approach retrieves data from MARS efficiently by combining the rasdaman array database technology with FeMME, a metadata manager that hosts the metadata the two systems need to communicate with each other. The steps can be summarised as follows:
- The user sends a WCS request to the public-facing interface of the service, and the request is translated into a corresponding MARS request.
- The MARS request is then dispatched internally and placed in the queue.
- The data retrieved from MARS is registered in the rasdaman database, according to pre-defined, data-dependent configuration files that define how the data is mapped.
- Finally, a response is sent to the user as soon as the data is available for download.
This way, users can send queries for analysis-ready data without having to fetch potentially large GRIB files or write explicit requests to the MARS system. At the same time, the approach benefits from the extensive data processing functionality of the WCS standard's processing extension (WCPS).
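The first two steps can be sketched in Python as follows. The dictionary mapping a coverage to fixed MARS keywords only stands in for the project's configuration files; the coverage name and the keyword values chosen for ERA-Interim 2 m temperature are assumptions for illustration. Registration in rasdaman and the response to the user are left out.

```python
import queue
from dataclasses import dataclass

# Hypothetical mapping from a coverage id to fixed MARS keywords; in the
# project such mappings came from configuration files, not from this dict.
COVERAGE_TO_MARS = {
    "ERA_Interim_2m_temperature": {
        "class": "ei", "dataset": "interim", "expver": "1",
        "stream": "oper", "type": "an", "levtype": "sfc",
        "param": "167.128", "time": "00/06/12/18", "grid": "0.75/0.75",
    },
}


@dataclass
class WcsSubset:
    coverage_id: str
    north: float
    west: float
    south: float
    east: float
    start: str  # YYYY-MM-DD
    end: str    # YYYY-MM-DD


def to_mars_request(subset):
    """Step 1: translate a WCS subset into a MARS request (as a dict)."""
    request = dict(COVERAGE_TO_MARS[subset.coverage_id])  # copy, keep config intact
    request["date"] = f"{subset.start}/to/{subset.end}"
    # MARS 'area' is given as North/West/South/East.
    request["area"] = f"{subset.north}/{subset.west}/{subset.south}/{subset.east}"
    return request


def format_mars(request, verb="retrieve"):
    """Render the dict in MARS request syntax: verb, then key=value lines."""
    body = ",\n    ".join(f"{k}={v}" for k, v in request.items())
    return f"{verb},\n    {body}"


# Step 2: dispatch internally by placing the request in a queue. A worker
# consuming this queue would perform steps 3 and 4 (not shown).
mars_queue = queue.Queue()
subset = WcsSubset("ERA_Interim_2m_temperature", 60.0, -10.0, 35.0, 30.0,
                   "2010-01-01", "2010-12-31")
mars_queue.put(to_mars_request(subset))
```

Keeping the translation table outside the code mirrors the design described above: adding a dataset means adding configuration rather than changing the service.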
The advantage of this architecture is twofold. Firstly, the user can interact with the service through plain WCS requests without retrieving (and then post-processing) large amounts of meteorological and climate data. Secondly, there is no need to ingest large amounts of potentially unused data into the rasdaman database. This makes the approach scalable and suitable for larger databases.
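The second advantage, registering only data that is actually requested, can be illustrated with a small lazy store. `fetch_from_archive` and `register` are hypothetical stand-ins for the MARS retrieval and the rasdaman registration; reusing data that has already been registered for later requests is an assumption of this sketch, not something the project description states.

```python
class OnDemandStore:
    """Retrieve and register data only when a request first needs it."""

    def __init__(self, fetch_from_archive, register):
        self._fetch = fetch_from_archive  # hypothetical: archive retrieval
        self._register = register        # hypothetical: database registration
        self._registered = {}            # request key -> registered coverage id

    def get(self, key):
        """Return the coverage id for `key`, fetching and registering on first use."""
        if key not in self._registered:
            data = self._fetch(key)
            self._registered[key] = self._register(key, data)
        return self._registered[key]


fetch_calls = []


def fake_fetch(key):
    fetch_calls.append(key)  # record every trip to the archive
    return b"GRIB..."


def fake_register(key, data):
    return f"cov_{key}"


store = OnDemandStore(fake_fetch, fake_register)
store.get("2t_2010")
store.get("2t_2010")  # served from the already registered coverage
store.get("2t_2011")
```

Data that no one asks for is never fetched, so the database grows with demand rather than with the size of the archive.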
Overall design of the MARS access.
EarthServer - Agile Analytics on Big Data Cubes
About the project
The Horizon 2020 funded EarthServer project aims to establish scalable web-based analysis and processing services for multi-dimensional geo-referenced Earth Science data. Big Earth Science data often need extensive processing before they yield meaningful and communicable information for users and decision-makers. The challenge is to extract the right information from petabytes (PB) of raw data. Ideally, data processing and analysis should be performed on the server side, so that only kilobytes of refined information need to be downloaded.
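A back-of-the-envelope calculation shows why server-side extraction turns a large download into kilobytes. The numbers are illustrative assumptions, not project figures: a regular 0.75-degree global grid (480 × 241 points), uncompressed 4-byte values, and 6-hourly fields of one parameter on one level over a non-leap year. GRIB packing would shrink the absolute sizes, but not the ratio between them.

```python
# Illustrative assumptions, not project figures.
N_LON = int(360 / 0.75)      # 480 longitudes
N_LAT = int(180 / 0.75) + 1  # 241 latitudes, both poles included
BYTES_PER_VALUE = 4          # uncompressed 32-bit floats
STEPS_PER_YEAR = 4 * 365     # four fields per day, non-leap year

field_bytes = N_LON * N_LAT * BYTES_PER_VALUE
year_of_fields = field_bytes * STEPS_PER_YEAR    # download every global field
point_series = BYTES_PER_VALUE * STEPS_PER_YEAR  # extract one grid point on the server

print(f"full fields:  {year_of_fields / 1e6:.1f} MB")  # 675.6 MB
print(f"point series: {point_series / 1e3:.2f} kB")    # 5.84 kB
```

The reduction factor equals the number of grid points (115,680), and it grows further for multi-level or multi-year requests.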
By combining rasdaman, an intelligent Array Database technology, with standard data access protocols defined by the Open Geospatial Consortium (OGC), EarthServer explores the possibilities and challenges of providing access to 3-D and 4-D Earth Science datacubes beyond one petabyte in size. In the course of EarthServer, large data centres and institutions, such as ECMWF, PML, MEEO/ESA, NCI and Jacobs University, aim to set up web-based services for water, air, climate and planetary science data.
About ECMWF's role in EarthServer
ECMWF's participation in EarthServer falls within the scope of its Scalability programme. MARS, ECMWF's Meteorological Archive and Retrieval System, is the world's largest archive of meteorological data. In November 2015, the MARS archive held ~87 PB of data and was growing by a further ~3 PB every month. For users to benefit fully from data volumes beyond the petabyte scale, it is in ECMWF's interest as a data provider to minimise the necessary data transport while still providing access to the full range of data and information.
ECMWF will set up a test web service that facilitates climate data access, exploration, analysis and visualisation. The aim is to access more than 1 PB of global reanalysis data from ECMWF's MARS archive and to serve it with rasdaman and OGC standard data access protocols, for example to visualise the data with NASA WebWorldWind.
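Visualisation clients such as web globes usually consume maps through a WMS. The sketch below builds a WMS 1.3.0 GetMap request of the kind such a client could display as an image layer; the endpoint, layer name and time value are hypothetical. One detail worth highlighting: in WMS 1.3.0, EPSG:4326 uses latitude/longitude axis order, so the bounding box is written south,west,north,east.

```python
from urllib.parse import urlencode


def wms_getmap_url(endpoint, layer, south, west, north, east,
                   width=1024, height=512, time=None):
    """Build a WMS 1.3.0 GetMap URL for a layer in EPSG:4326."""
    params = {
        "service": "WMS",
        "version": "1.3.0",
        "request": "GetMap",
        "layers": layer,
        "styles": "",  # empty value selects the default style
        "crs": "EPSG:4326",
        # WMS 1.3.0 + EPSG:4326: first axis is latitude, second is longitude.
        "bbox": f"{south},{west},{north},{east}",
        "width": width,
        "height": height,
        "format": "image/png",
        "transparent": "TRUE",
    }
    if time is not None:
        params["time"] = time  # optional TIME dimension
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and layer name, for illustration only.
url = wms_getmap_url("https://example.org/rasdaman/ows", "ERA_Interim_2m_temperature",
                     -90, -180, 90, 180, time="2010-01-01T00:00:00Z")
```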
In this way, ECMWF data shall become more easily accessible to researchers and decision-makers who use GIS standards. We aim to
- minimise data transport in general,
- serve data and information in standard GIS formats, and
- provide web-based data services on-demand.
EarthServer-2 is receiving funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 654367.
EarthServer Project Partners