Science blog

The end of theory for Earth sciences?

24 September 2018

Wilco Hazeleger

Wilco Hazeleger, Director of the Netherlands eScience Center

As a keynote speaker at ECMWF’s 18th workshop on high performance computing (HPC) in meteorology, I would like to use this blog to share my thoughts on the challenges ahead and on just some of the crucial issues we will be discussing. This is the third blog in a series on HPC; see blog 1 and blog 2.

HPC workshop graphic

ECMWF's 18th workshop on high performance computing in meteorology is taking place in Reading from 24 to 28 September 2018. Wilco Hazeleger is one of several keynote speakers.

“The end of theory”. This was the title of an article in Wired Magazine ten years ago. Its claim: no theory is needed, because the data deluge allows us to extract any knowledge simply by exploring data. This new empiricism received lots of attention and criticism. I came across it a few years ago when I started to immerse myself in the world of big data, artificial intelligence and computing. Having a background in climate research, I thought I knew all about computation and data. I was wrong.

I now work at the Netherlands eScience Center, an expertise center for research software at the interface of computer science, data science and research applications. I became fascinated by the wealth of projects at the eScience Center. Astronomers plowing through petabytes of data to find pulsars, high energy physicists fitting their models to data from particle accelerators in a massively parallel way, humanities scholars learning with machines from ancient texts, sociologists unraveling complex networks, medical scientists taking advantage of modern sequencing methods and imaging techniques and coupling them to many other datasets. It is a world with new names such as SPARK and XENON, and one where bright stickers cover laptops. It is only natural to think about my own field of research and what all these developments in data science entail for it.

The computational demands of Earth sciences

In Earth sciences, including meteorology, there has been a focus on both theory and data over the past century. Based on theory, there is a quest for resolving finer scales in numerical models as physical processes interact seamlessly from the planetary to the molecular scales. Having a theory has large advantages. We know which equations to solve, and theory limits the state and parameter space that we need to explore.

Still, to resolve deep convection, one of the most important drivers of the large-scale weather and climate, at least a thousand-fold more computational resources are needed. New exascale computing developments are promising, but it will take much more than scaling up generic computing resources, and the machines are too energy-hungry.
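To get a feel for where a factor of this size can come from, here is a back-of-envelope sketch. It rests on assumptions that are not stated above: that cost grows with the number of horizontal grid points, and that the time step has to shrink in proportion to the grid spacing. The function name and its parameters are purely illustrative, and real models will not follow such a clean power law.

```python
def cost_factor(refinement, horizontal_dims=2, refine_time=True, refine_vertical=False):
    """Rough multiplier on compute cost when grid spacing shrinks by `refinement`.

    Assumption (illustrative only): cost scales with the number of grid
    points times the number of time steps, and nothing else changes.
    """
    exponent = horizontal_dims          # finer grid in each horizontal direction
    if refine_time:
        exponent += 1                   # shorter time step to keep the scheme stable
    if refine_vertical:
        exponent += 1                   # optionally more vertical levels as well
    return refinement ** exponent


# Compare a few refinement factors under the assumptions above.
for r in (2, 5, 10):
    print(f"{r}x finer grid -> about {cost_factor(r)}x the compute cost")
```

Under these assumptions, a ten times finer horizontal grid already costs about a thousand times more, before counting extra vertical levels, more output or more complex physics.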

We clearly need new paradigms. Only through co-design with manufacturers, hardware vendors, software engineers, computer scientists and weather and climate scientists can such a huge quest be taken on. We can take astronomy for inspiration. Projects like LOFAR are a success due to co-design and their focus on research software and workflows to make very efficient pipelines. Only when we take such a professional approach that serves an entire community can we meet the demands.

Data science methods

Do we need new data science methods as well? Of course. Processes at unresolved scales are not fully understood and boundary conditions are poorly known: think of details and characteristics of the land surface. Interactions and processes beyond the physical domain often don’t have a basic theory. There is already work ongoing in heuristic modelling for radiative transfer, turbulence and cloud characteristics. Not only can such studies aid in increasing understanding, they can also enable faster simulations. Again, taking advantage of computer and data science expertise and domain knowledge will bring in the necessary interdisciplinary perspective to make progress.

The challenge of data integration

The integration of data is an even bigger challenge. There is so much data out there to take advantage of. We can’t even count it. They say it must be zettabytes. Using data from unconventional sensors in meteorology, such as my cell phone, is still in its infancy, let alone using social media data. Especially at short forecast horizons and at small scales, the heterogeneity of the environment and the limits of computation create opportunities for machine-learned models and for including unconventional data. Also, weather and climate information is just one source of information to base decisions on. It takes a much wider perspective on data and simulations to advance science and to advance informed decision-making.

All of this seems daunting to work on, but in proposals like ExtremeEarth (a flagship project proposed to the European Commission), our community comes together and dares to dream and to cross disciplinary boundaries. Digital technology, when developed in co-design, will help us to reach our goals. We are on a mission. Interactively confronting theory and simulations at all scales with data will allow us to increase scientific understanding and improve informed decision-making. I am looking forward to that bright ExtremeEarth sticker, the only one that may cover my laptop case.

Top banner image: MrJub/iStock/Thinkstock