With the development of the Python-based toolbox for the Copernicus Climate Data Store (CDS) and the new Python interface to Metview, ECMWF is stepping up its efforts to provide processing and visualisation options in the Python programming language. Building a Python framework is challenging. ECMWF is therefore looking to benefit as much as possible from activities in the wider Python community.
To engage with the community, ECMWF hosted a ‘Workshop on developing Python frameworks for Earth system sciences’ on 28 and 29 November 2017. The aim was to bring together key actors from the Python community who develop packages for Earth system sciences. The workshop was a great success: many participants had not met before and could exchange experiences for the first time. The event was split evenly between presentations and working groups. The presentations showed the different functionalities provided by the packages and offered insights into the challenges involved in their development and distribution. Packages represented at the workshop included MetPy, IRIS, MET, Pytroll, MetWork, EPyGrAM, CliMAF, ESMValTool and the Community Intercomparison Suite. For ECMWF, it was a chance to show first results of the work on the CDS toolbox and Metview’s new Python interface. Presentations about the xarray package and Cray’s plans regarding Python and containers helped to frame these developments in the context of the wider community.
Three working groups were set up to discuss the various challenges and how we as a community can work together better. Each group tackled a different aspect:
- Deploying and packaging Python frameworks
- Handling Big Data in Python
- (Code) Interoperability and common data structures
The discussions focused on the challenge of enabling better interoperability between the various frameworks represented at the workshop. Some of the main points made were that:
- There needs to be a good mapping of metadata between data formats, especially between GRIB and NetCDF. This is because interoperability between packages relies on the handling of data, and metadata plays a crucial role in giving meaning to the data. It was noted that various attempts have been made to solve these problems.
- It is essential to make as much use as possible of core Python packages, such as NumPy, pandas, xarray and Dask. Building on these shared foundations reduces incompatibilities between packages.
- ECMWF needs to support efforts to engage with the Python community and explain its community’s needs, for example by participating in a Python-for-Earth-system-sciences session at the annual European Conference on Python in Science.
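The point about metadata mapping between GRIB and NetCDF can be made concrete with a minimal sketch. It assumes a small hand-written table from GRIB `shortName` values to CF-style attributes. The table `GRIB_TO_CF` and the function `grib_to_cf_attrs` are hypothetical names introduced for illustration only; they do not represent any package shown at the workshop, and a real mapping would need to be far more complete and community-maintained.

```python
# Hypothetical, hand-written subset of a GRIB-to-CF mapping (an assumption
# for illustration, not an official table).
GRIB_TO_CF = {
    "2t": {"standard_name": "air_temperature", "units": "K"},
    "msl": {"standard_name": "air_pressure_at_mean_sea_level", "units": "Pa"},
    "10u": {"standard_name": "eastward_wind", "units": "m s-1"},
    "10v": {"standard_name": "northward_wind", "units": "m s-1"},
}


def grib_to_cf_attrs(grib_keys):
    """Translate a dict of GRIB keys into NetCDF/CF-style variable attributes.

    Only the 'shortName' key is consulted in this sketch. The original GRIB
    name is kept as an extra attribute so the translation stays traceable.
    """
    short_name = grib_keys["shortName"]
    if short_name not in GRIB_TO_CF:
        raise KeyError(f"no CF mapping known for GRIB shortName {short_name!r}")
    attrs = dict(GRIB_TO_CF[short_name])  # copy so the table is never mutated
    attrs["GRIB_shortName"] = short_name
    return attrs


if __name__ == "__main__":
    print(grib_to_cf_attrs({"shortName": "2t"}))
```

Raising an error for unknown parameters, rather than guessing, reflects the concern voiced in the discussion: metadata gives the data its meaning, so a silent mismatch between formats is worse than an explicit failure.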
Everyone was encouraged to contribute to the core packages to improve them instead of implementing new solutions. Participants stressed that it is also important to pick the right tool for the job, ideally one with good community support. The various attempts to implement units for Python packages are an example: supporting one implementation (with NumPy/xarray) would make interoperability easier. Similarly, interoperability would be easier to achieve if all packages used Dask as their main choice for distributed computation. Here the community could work together to achieve automatic chunking of data when they are read from NetCDF or GRIB.
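To illustrate what automatic chunking might involve, here is a minimal pure-Python sketch of one possible heuristic. It is not Dask's actual algorithm. The function name `auto_chunks`, the 128 MiB default target and the "halve the leading dimensions first" rule are all assumptions made for this example.

```python
from math import prod


def auto_chunks(shape, itemsize, target_bytes=128 * 2**20):
    """Pick a chunk shape whose size is at most target_bytes where possible.

    Heuristic (an assumption of this sketch): halve the leading dimensions
    first, so that trailing dimensions, often the horizontal grid, stay
    contiguous within each chunk.
    """
    chunks = list(shape)
    axis = 0
    while prod(chunks) * itemsize > target_bytes and axis < len(chunks):
        if chunks[axis] > 1:
            chunks[axis] = (chunks[axis] + 1) // 2  # halve, rounding up
        else:
            axis += 1  # this axis cannot shrink further; move to the next
    return tuple(chunks)


if __name__ == "__main__":
    # Hypothetical field: 1000 time steps, 10 levels, 721 x 1440 grid, float32.
    print(auto_chunks((1000, 10, 721, 1440), itemsize=4))
```

In practice, a package would more likely delegate this decision to the shared libraries than reimplement it; xarray, for instance, accepts `chunks="auto"` when opening a dataset and lets Dask choose the chunk sizes.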
There was a strong message on community-led developments and distribution. All packages presented were Open Source, but it was pointed out that putting projects up on GitHub also allows Open Development, which is the first step towards building a community. The UK Met Office has worked on providing conda-forge, a community-driven repository of Python packages for the conda package manager. Using services like this enables developers to automate most of their software release work. Some participants reported that these repositories were not only popular with single users installing Python packages but also very suitable for operational environments.