Artificial Intelligence Revolutionises Weather Forecast, Climate Monitoring and Decadal Prediction

Abstract: Artificial Intelligence (AI) is an explosively growing field of computer technology, which is expected to transform many aspects of our society in a profound way. AI techniques are used to analyse large amounts of unstructured and heterogeneous data and to discover and exploit complex and intricate relations among these data, without recourse to an explicit analytical treatment of those relations. These AI techniques are indispensable for making sense of the rapidly increasing data deluge and for responding to the challenging new demands in Weather Forecast (WF), Climate Monitoring (CM) and Decadal Prediction (DP). The use of AI techniques can lead simultaneously to: (1) a reduction of human development effort, (2) a more efficient use of computing resources and (3) an increased forecast quality. To realise this potential, a new generation of scientists combining atmospheric science domain knowledge and state-of-the-art AI skills needs to be trained. AI should become a cornerstone of future weather and climate observation and modelling systems.


Introduction
In this brief Introduction, we address, in a nonexhaustive manner, some important aspects of the current state-of-the-art in Weather Forecast (WF), Climate Monitoring (CM) and Decadal Prediction (DP) and highlight some challenges, before diving into Artificial Intelligence (AI)-related work.
It is difficult to overstate the societal impact of weather forecast [1] and climate change [2]. Numerical weather prediction was presented as a quiet revolution in [1] because its progress has been incremental, but impressive. Its impact in terms of "societal benefits" is "among the greatest in any area of physical science" [1]. Reference [1] must be considered a milestone paper summarizing past achievements and future challenges categorized into three areas, namely physical process representation, ensemble forecasting and model initialization. It focused on numerical models rooted in physics and brought in agreement with observations through data assimilation.
In [2], various components of climate change were analysed with respect to mitigation plans addressing policy makers, which highlighted the urgency of proactive interventions for the drastic reduction of greenhouse gas emissions. The Paris Agreement [3] is a legally binding international treaty on climate change (the emission targets themselves are not legally binding, but the obligation to regularly set improved national targets is). Its goal is to limit global warming to well below 2 °C, preferably to 1.5 °C, compared to pre-industrial levels. In Europe, the Green Deal [4] aims to bring the net emission of greenhouse gases to zero by 2050.
Practical weather forecast relies on the assimilation of a large number of observations of various types into Numerical Weather Prediction (NWP) models. Since the end of the 20th century, a breakthrough in the assimilation of global satellite data has resulted in the convergence of the Northern and Southern Hemisphere prediction skill [5]. The current state-of-the-art data assimilation method is the so-called 4D-Var data assimilation [6,7]. For each observation type, a so-called observation operator is implemented, which relates the NWP model's internal parameters to that specific observation type. In the variational data assimilation process, the internal parameters are tuned in order to reproduce the available observations. For the high-density satellite remote sensing observations, some form of data thinning [8] is needed. The current horizontal resolution of the European Centre for Medium-Range Weather Forecasts (ECMWF) deterministic forecast is 9 km [9]. A probabilistic ensemble forecast is obtained by running the forecast model multiple times with slightly different initial conditions. To reduce the computational cost, the resolution is reduced to 18 km for the ECMWF ensemble forecast [9].
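For orientation, the variational tuning described above is commonly written as the minimisation of a cost function. The form below is the conventional textbook (strong-constraint) formulation, given here as a sketch; the symbols are the usual ones and are assumptions of this illustration, not notation taken from [6,7].

```latex
J(\mathbf{x}_0) =
  \tfrac{1}{2}\,(\mathbf{x}_0-\mathbf{x}_b)^{\mathrm{T}}\,\mathbf{B}^{-1}\,(\mathbf{x}_0-\mathbf{x}_b)
  + \tfrac{1}{2}\sum_{i=0}^{N}
    \bigl(\mathcal{H}_i(\mathcal{M}_{0\to i}(\mathbf{x}_0))-\mathbf{y}_i\bigr)^{\mathrm{T}}
    \mathbf{R}_i^{-1}
    \bigl(\mathcal{H}_i(\mathcal{M}_{0\to i}(\mathbf{x}_0))-\mathbf{y}_i\bigr)
```

Here x_0 is the model state at the start of the assimilation window, x_b the background (prior) state with error covariance B, y_i the observations at time i with error covariance R_i, M_{0→i} the forecast model integrated from time 0 to time i, and H_i the observation operator for the observations available at time i.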
The climate is primarily changing due to the accumulated anthropogenic emission of carbon dioxide, with an expected atmospheric lifetime extending from centuries to millennia [10]. The effects of climate change, such as global warming (which is amplified in the Arctic), can be quantified by climate models. An early example was given by [11]. Climate models can be considered as a variant of NWP models. They run freely with the omission of the near-real-time data assimilation, and they have a special focus on long-term integration and long-term systematic changes. In order to realistically simulate the long-term changes, a climate model needs to describe not only changes in the atmosphere, but also changes in the oceans and land cover [12].

Challenges
In general, it is considered that a higher spatial resolution used in NWP models results in a higher realism of the produced weather forecasts, in particular for precipitation forecasts [13–18]. Convection-permitting spatial resolutions of the order of 1 km are used for high-resolution NWP models over limited areas. Ideally, they would also be used in global NWP models. However, this is considered to be at the limit of what is feasible following the traditional Central Processing Unit (CPU)-based computing approach, and therefore, alternative computing architectures using Graphics Processing Units (GPUs) have been investigated [19]. For global climate models as well, a convection-permitting spatial resolution is considered highly desirable [20].
Medium-range weather forecasts have skill for a forecast range of the order of two weeks; long-range or seasonal ones have skill for a forecast range of the order of three to six months. There is a growing interest in developing so-called subseasonal to seasonal forecasts to fill in the gap between the medium-range and seasonal forecasts [21].
The standard use of climate models [2] is to predict the expected long-term climate change, e.g., the temperature increase and associated regional climate patterns expected by the end of the century under various greenhouse gas emission scenarios. There is a growing demand and a high societal relevance for so-called climate prediction [22], which uses initialised climate models to give detailed predictions of the regional climate for the years and decades to come. Within the so-called Digital Europe Programme (DEP), the Destination Earth (DestinE) initiative aims to develop Digital Twins (DTs) of the Earth to study both extreme weather events and climate change [23]. A digital twin is a virtual representation acting as the real-time digital counterpart of a physical object or process.

AI Potential
Deep Learning (DL) [24], as a particular form of Machine Learning (ML) and Artificial Intelligence (AI), has recently emerged as a powerful evolution of neural networks. It has dramatically improved the state-of-the-art in image, video and speech processing and is now rapidly spreading to other application domains. DL is a purely data-driven technique, which discovers relations between the input and output data, often intractable by explicit analytic or physical analysis, by training on reference input datasets and corresponding labelled output data. Several training algorithms are emerging, but the backpropagation algorithm has been the most popular one for many years [25]. Given the ever-increasing availability of "big data" and of the computing power needed to train DL networks, particularly on dedicated hardware such as GPUs, discovering hidden relations in the data themselves has become more efficient in terms of human effort than the traditional approach, which requires the handcrafted design of feature extractors (as used in traditional machine learning, classification and pattern recognition) or of models translating the available input data into the desired output data.
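To make the idea of training by backpropagation concrete, the following is a minimal sketch in plain Python: a network with one small hidden layer is fitted to labelled (input, output) pairs by gradient descent. The network size, learning rate, target function and the name train_tiny_network are all assumptions chosen for illustration; real DL networks are much larger and are trained with dedicated libraries on GPUs.

```python
import math
import random


def train_tiny_network(samples, hidden=4, lr=0.05, epochs=300, seed=0):
    """Fit y ~ sum_j w2[j] * tanh(w1[j] * x + b1[j]) + b2 by gradient descent.

    samples: list of (x, y) pairs. Returns the mean squared error per epoch,
    evaluated before each parameter update.
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    b2 = 0.0
    n = len(samples)
    history = []
    for _ in range(epochs):
        gw1 = [0.0] * hidden
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        loss = 0.0
        for x, y in samples:
            # Forward pass: hidden activations, then the scalar prediction.
            h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
            pred = sum(w2[j] * h[j] for j in range(hidden)) + b2
            err = pred - y
            loss += err * err / n
            # Backward pass: chain rule from the loss back to each weight.
            d_pred = 2.0 * err / n
            gb2 += d_pred
            for j in range(hidden):
                gw2[j] += d_pred * h[j]
                # Derivative of tanh(z) is 1 - tanh(z)^2.
                d_pre = d_pred * w2[j] * (1.0 - h[j] ** 2)
                gw1[j] += d_pre * x
                gb1[j] += d_pre
        history.append(loss)
        # Gradient descent update of all parameters.
        for j in range(hidden):
            w1[j] -= lr * gw1[j]
            b1[j] -= lr * gb1[j]
            w2[j] -= lr * gw2[j]
        b2 -= lr * gb2
    return history


# Labelled training data (hypothetical): inputs x with reference outputs x^2.
data = [(i / 10.0, (i / 10.0) ** 2) for i in range(-10, 11)]
losses = train_tiny_network(data)
```

The relation between input and output is never written into the code; it is recovered from the labelled examples alone, which is the sense in which the technique is data driven.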
The broad field of Earth system science appears to be well suited for the application of DL techniques [26]. Ever-increasing amounts of Earth system data are available, from heterogeneous sources ranging from sophisticated Earth Observation (EO) satellites [27] to massively deployed low-cost crowdsourcing sensors [28]. Most of the algorithms and models used to exploit those data are still designed by hand and suffer from insufficient scalability when large amounts of data are considered.
The high potential of AI techniques has also been recognised in the particular fields of satellite remote sensing and weather forecast [29]. The different elements in the traditional satellite remote sensing/NWP weather forecast production chain show potential to be either replaced or augmented by DL techniques [30].

Observations
Observations form the basis of weather forecasting and climate monitoring. Typical steps in the processing of weather and climate observations are information retrieval, quality control, bias correction, and data assimilation [6] and/or data fusion [31]. Using the state-of-the-art 4D-Var approach [1,6], data assimilation into NWP models is costly both in terms of human development effort, since specific observation operators must be constructed for each observation type, and in terms of computing power, since the data assimilation cost function minimisation must be solved online for every assimilation time step.
An example where the information retrieval and data fusion are combined through a single DL network was given in [32], where Meteosat Spinning Enhanced Visible and Infrared Imager (SEVIRI) [33] geostationary satellite images were combined with rain gauges for the estimation of the amount of precipitation. In [32], a multiscale DL network of the U-Net type [34] combines spectral and spatial information from three selected SEVIRI channels at 8.7, 10.8, and 12.0 µm with surface rain gauges. The U-Net architecture is a popular example of a Convolutional Neural Network (CNN), illustrated in Figure 1. We note that those SEVIRI channels are usable day and night and contain information on microphysical parameters relevant to precipitation processes [35]. The network simultaneously performs different tasks:
• Retrieval of the precipitation amount from SEVIRI channels trained on rain gauge data;
• Adaptive interpolation of rain gauge data;
• Data fusion of interpolated rain gauge data and satellite retrievals.
The resulting AI-based satellite/rain gauge precipitation estimates have recently been shown to be more accurate than the preceding state-of-the-art, that is traditional, non-AI-based radar/rain gauge precipitation estimates [36].
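To illustrate what the interpolation and fusion tasks involve, the sketch below implements a simple handcrafted counterpart: inverse-distance weighting of rain gauges, blended with a satellite estimate according to the distance to the nearest gauge. This is explicitly not the method of [32], where these steps are learned by the network; the function names, the exponential blending rule and the length scale are assumptions made for illustration.

```python
import math


def idw_interpolate(gauges, x, y, power=2.0):
    """Inverse-distance-weighted rain amount at (x, y).

    gauges: list of (gauge_x, gauge_y, rain_mm) tuples.
    """
    num = 0.0
    den = 0.0
    for gx, gy, rain in gauges:
        d = math.hypot(x - gx, y - gy)
        if d == 0.0:
            return rain  # exactly at a gauge: return its measurement
        w = d ** -power
        num += w * rain
        den += w
    return num / den


def fuse(gauges, satellite_mm, x, y, length_scale=10.0):
    """Blend gauge interpolation and satellite retrieval at (x, y).

    The gauge weight is 1 at a gauge location and decays with the distance
    to the nearest gauge (hypothetical rule, not taken from the paper).
    """
    nearest = min(math.hypot(x - gx, y - gy) for gx, gy, _ in gauges)
    alpha = math.exp(-nearest / length_scale)
    return alpha * idw_interpolate(gauges, x, y) + (1.0 - alpha) * satellite_mm


# Two hypothetical gauges 10 km apart and a satellite estimate of 6 mm.
gauges = [(0.0, 0.0, 2.0), (10.0, 0.0, 4.0)]
midpoint_gauge_only = idw_interpolate(gauges, 5.0, 0.0)
midpoint_fused = fuse(gauges, 6.0, 5.0, 0.0)
at_gauge_fused = fuse(gauges, 6.0, 0.0, 0.0)
```

In a handcrafted scheme like this one, the weighting exponent and the blending length scale must be chosen by the developer, whereas a DL network adapts the equivalent behaviour to the data during training.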
Another example of AI-based data fusion is the Multi-Instrument Inversion and Data Assimilation Preprocessing System (MIIDAPS-AI) [37]. MIIDAPS-AI provides an AI-based data fusion of microwave and infrared imagers and sounders and produces the analysis of a variety of atmospheric parameters. The algorithms for the data fusion of around ten satellite sensors were built by two people in less than ten months. The processing time was at least a factor of 100 shorter than for a comparable non-AI-based system. In [38], DL in combination with optical flow was used to detect convective initiation. This is an example of the use of DL to enhance an existing observation system, by extracting new information that was hitherto unexploited.

Nowcasting
For short forecast ranges, typically up to 4-6 h in the future, the direct extrapolation of observations, referred to as nowcasting, is considered more accurate than an NWP forecast [39]. For satellite-based nowcasting, the extrapolation of satellite images with optical flow methods provided better skill scores than NWP models, in particular for phenomena dealing with clouds [40], e.g., surface radiation, heavy precipitation and thunderstorms. One application of thunderstorm nowcasting is in aviation, to avoid hazardous weather situations.
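The principle of extrapolation-based nowcasting can be sketched in a few lines: estimate the motion between the two most recent images, then move the latest image forward along that motion. The sketch below uses a single integer displacement found by brute-force matching, which is a strong simplification of the dense optical flow methods referred to above; the function names and the synthetic 10 x 10 field are assumptions for illustration.

```python
def shift_field(field, dy, dx, fill=0.0):
    """Translate a 2D field by (dy, dx) grid cells, filling uncovered cells."""
    rows, cols = len(field), len(field[0])
    out = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, c0 = r - dy, c - dx
            if 0 <= r0 < rows and 0 <= c0 < cols:
                out[r][c] = field[r0][c0]
    return out


def estimate_motion(prev, curr, max_shift=3):
    """Find the integer (dy, dx) that best maps prev onto curr.

    Every candidate displacement is scored by the mean squared difference
    over the overlapping area; the best-scoring one is returned.
    """
    rows, cols = len(prev), len(prev[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            err, count = 0.0, 0
            for r in range(rows):
                for c in range(cols):
                    r0, c0 = r - dy, c - dx
                    if 0 <= r0 < rows and 0 <= c0 < cols:
                        err += (curr[r][c] - prev[r0][c0]) ** 2
                        count += 1
            if count and err / count < best_err:
                best, best_err = (dy, dx), err / count
    return best


def nowcast(prev, curr, lead_steps=1):
    """Extrapolate curr forward by lead_steps time steps of constant motion."""
    dy, dx = estimate_motion(prev, curr)
    return shift_field(curr, dy * lead_steps, dx * lead_steps)


# Synthetic example: a small "cloud" moving 1 row and 2 columns per time step.
frame0 = [[0.0] * 10 for _ in range(10)]
frame0[2][2] = 5.0
frame0[2][3] = 3.0
frame1 = shift_field(frame0, 1, 2)
forecast = nowcast(frame0, frame1, lead_steps=2)
```

Pure advection of this kind cannot represent the growth or decay of a feature, which is one reason its advantage over NWP is limited to short forecast ranges.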
The nowcasting methods are expected to be significantly improved by combining different data sources, e.g., satellite images, radar, ground-based data and crowdsourced data. As a consequence, nowcasting deals with the challenge of big data and with shortcomings in the quality control and analysis of the value and weights of the different data sources, characterized by diverse origins, types, densities and reliability.
A DL-based architecture for nowcasting was first proposed in [41] and consists of an encoder-decoder architecture, where the encoder performs motion detection or a more complicated spatiotemporal analysis and the decoder performs motion extrapolation or a more complicated spatiotemporal forecasting. Both the encoder and the decoder use stacked ConvLSTM units, which are based on Long Short-Term Memory (LSTM) [42] recurrent neural networks for time forecasting and have convolutional structures in both the input-to-state and the state-to-state transitions.
The stacked ConvLSTM architecture is illustrated in Figure 2.
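For readers unfamiliar with the unit, a ConvLSTM cell can be summarised by the following update equations, written here in the commonly used form without peephole connections. The notation is an assumption of this sketch and is not taken from [41]: X_t is the input at time t, H_t the hidden state, C_t the cell state, * denotes convolution, ∘ the element-wise product and σ the logistic sigmoid.

```latex
\begin{aligned}
i_t &= \sigma\!\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right) \\
f_t &= \sigma\!\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right) \\
o_t &= \sigma\!\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\!\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
H_t &= o_t \circ \tanh\!\left(C_t\right)
\end{aligned}
```

The terms W_x * X_t are the input-to-state transitions and the terms W_h * H_{t-1} the state-to-state transitions; replacing the matrix products of a standard LSTM by convolutions in both is what lets the unit preserve the spatial structure of the image sequence.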
In [43], a DL-based, purely data-driven nowcasting of precipitation using sequences of satellite and radar images as the input was presented. The DL network consists of an input spatial encoder, a middle temporal encoder based on ConvLSTMs and an output spatial aggregator. The network directly produces the probability distribution of the precipitation, i.e., a probabilistic precipitation nowcast, with forecast ranges up to 8 h. The performance of the DL nowcasting, compared to reference radar observations, is superior to a convection-permitting NWP with radar data assimilation. It is noteworthy that the DL nowcasting network of [43] thus outperforms the convection-permitting NWP model on its own ground [13], namely that of improved short-term precipitation forecast skill.
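A network can output a probability distribution directly by predicting one score per precipitation class and normalising the scores. The sketch below assumes a discretisation of the precipitation rate into bins and a softmax normalisation; the bin edges, the scores and the function names are hypothetical and do not reproduce the configuration of [43].

```python
import math


def softmax(logits):
    """Convert raw network scores into probabilities that sum to one."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exceedance_probability(logits, bin_lower_edges_mm, threshold_mm):
    """Probability that precipitation is at least threshold_mm.

    bin_lower_edges_mm[k] is the lower edge of bin k; the probability is the
    sum over all bins whose lower edge is at or above the threshold.
    """
    probs = softmax(logits)
    return sum(p for p, lower in zip(probs, bin_lower_edges_mm)
               if lower >= threshold_mm)


# Hypothetical output for one grid cell: four bins starting at these rates (mm/h).
bin_lower_edges = [0.0, 0.1, 1.0, 5.0]
uniform_scores = [0.0, 0.0, 0.0, 0.0]
p_at_least_1mm = exceedance_probability(uniform_scores, bin_lower_edges, 1.0)
```

A single forward pass therefore yields the full distribution for every grid cell, from which the exceedance probability for any threshold aligned with a bin edge follows by summation.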

Pure AI Forecasting
As opposed to the conventional NWP approach [1], where observations are assimilated into an NWP model based on a physical modelling of the behaviour of the atmosphere, it is possible to completely replace the physical NWP model with a pure data-driven AI-based model without any a priori physical knowledge included [44]. In [45], it was demonstrated that a pure AI forecasting model can have comparable performance to a conventional NWP model, provided that the models have a comparable resolution. A bottleneck for the development of pure AI-based forecasts at a resolution and performance comparable to state-of-the-art operational NWP forecasts is the availability of sufficient training data [45]. The best practical results can therefore be expected from a combination of NWP and AI techniques [46]. This might also facilitate the acceptance of AI in a domain hitherto dominated by numerical (physical) modelling. In [47], a benchmark dataset for the evaluation of pure AI forecasting was provided. Pure AI forecasting is an area of active development. Recent examples were described in [48,49].

Process Parametrisation
Within the context of conventional NWP models, Machine Learning (ML) can be used to improve the parametrisation of physical processes that are not explicitly resolved in the model [50–53]. Reference [50] dealt with a general framework on how parametrisations can learn from observations and targeted high-resolution simulations, with a focus on the parametrisation of clouds and convection. Reference [51] focused on the ML parametrisation of moist convection, trained on the output of a conventional parametrisation, and its behaviour when used in a climate model. Reference [52] derived a unified physics parametrisation by minimising the forecast error over several days of prediction of a near-global cloud-resolving model. Reference [53] focused on the use of machine learning to derive the poorly known parts of a numerical Earth system model, in combination with physics-based modelling for the well-known parts of the model.

Hybrid AI NWP Forecasting
Given the limitation of the available training data, the best forecast performance can be expected from some form of combination of NWP and AI techniques. In August 2018, an AI weather forecast challenge was organised for the Beijing area [54], where the competing teams were asked to provide in real time the best 36 h forecasts for 2 m air temperature and relative humidity and 10 m wind speed, based on NWP model output and the surface observations of 10 automatic weather stations.
Compared to the NWP baseline, Reference [55] improved the forecast for temperature by 24.4%, for relative humidity by 12.4% and for wind speed by 6.2%. Reference [55] used a combination of a Spatiotemporal Attention Network (STAN) and a Multilayer Perceptron (MLP).
Compared to the NWP baseline, Reference [56] improved the forecast for temperature by 17.0% and for relative humidity by 9.7%. Reference [56] used a combination of Recurrent Neural Networks (RNNs) and, similar to [43] for nowcasting, a direct probabilistic output.

Postprocessing of NWP Output
A common way to improve the output of NWP models is to apply Model Output Statistics (MOS) [57] to correct the systematic errors of the NWP forecasts at different lead times. In [58], several Model Output Deep Learning (MODL) methods were investigated for the correction of temperature forecasts of the ECMWF NWP model. Averaged over a forecast range of 3-10 days, the original ECMWF temperature forecast error was 4 °C, the conventional MOS temperature error was 3 °C and the U-Net-based MODL error was 2 °C. An example of the use of the U-Net DL architecture for the postprocessing of cloud cover in an NWP model output was given in [59]. An example of the use of the DL architecture for the postprocessing of probabilistic NWP model ensemble forecasts was given in [60].
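The basic idea of MOS can be sketched as a linear regression between past raw forecasts and the corresponding observations, fitted separately for each lead time. The example below is a minimal illustration with invented numbers; the function names and the choice of a single predictor are assumptions, and operational MOS systems typically use several predictors.

```python
def fit_linear_correction(forecasts, observations):
    """Least-squares fit of observation ~ slope * forecast + intercept."""
    n = len(forecasts)
    mean_f = sum(forecasts) / n
    mean_o = sum(observations) / n
    cov = sum((f - mean_f) * (o - mean_o)
              for f, o in zip(forecasts, observations))
    var = sum((f - mean_f) ** 2 for f in forecasts)
    slope = cov / var
    return slope, mean_o - slope * mean_f


def fit_mos(training):
    """Fit one correction per lead time.

    training: dict mapping lead time (hours) to (forecasts, observations).
    """
    return {lead: fit_linear_correction(f, o)
            for lead, (f, o) in training.items()}


def apply_mos(coefficients, lead, raw_forecast):
    """Correct a raw forecast using the coefficients for its lead time."""
    slope, intercept = coefficients[lead]
    return slope * raw_forecast + intercept


# Hypothetical archive of 2 m temperature forecasts and observations (°C).
training = {
    24: ([10.0, 12.0, 14.0, 16.0], [9.0, 10.5, 12.0, 13.5]),
    48: ([10.0, 12.0, 14.0, 16.0], [8.0, 10.0, 12.0, 14.0]),
}
coefficients = fit_mos(training)
corrected = apply_mos(coefficients, 24, 20.0)
```

The MODL methods of [58] replace this per-point linear relation by a DL network, which can additionally exploit the spatial structure of the forecast fields.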

Multimodel Combination
NWP models are developed and run operationally at different centres around the world. The combination of multiple models can be an effective way to reduce the uncertainty of the forecasts, as demonstrated by using neural network techniques for hurricane intensity forecasts in [61]. An example of the use of DL techniques for the combination of multimodel outputs was given in [62].
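As a point of reference for what a learned combination improves upon, the sketch below shows a simple classical baseline: each model is weighted by the inverse of its past mean squared error. This is not the neural network approach of [61] or the DL approach of [62]; the model names, the numbers and the weighting rule are assumptions for illustration.

```python
def inverse_mse_weights(past_forecasts, past_observations):
    """Weight each model by the inverse of its past mean squared error.

    past_forecasts: dict mapping model name to a list of past forecasts,
    aligned with past_observations. Returns weights that sum to one.
    """
    inverse = {}
    for name, forecasts in past_forecasts.items():
        mse = sum((f - o) ** 2
                  for f, o in zip(forecasts, past_observations))
        mse /= len(past_observations)
        inverse[name] = 1.0 / max(mse, 1e-12)  # guard against a zero error
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine(current_forecasts, weights):
    """Weighted average of the current forecasts of all models."""
    return sum(weights[name] * current_forecasts[name] for name in weights)


# Two hypothetical models: model_b has had twice the error of model_a.
past = {"model_a": [1.0, -1.0], "model_b": [2.0, -2.0]}
observed = [0.0, 0.0]
weights = inverse_mse_weights(past, observed)
blended = combine({"model_a": 10.0, "model_b": 20.0}, weights)
```

Fixed weights of this kind ignore the fact that the relative quality of the models may depend on the weather situation, which is the kind of dependence a trained network can pick up.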

Downscaling
Running high-resolution NWP models is costly in terms of computing resources. Convection-permitting NWP models at the global scale are currently at the limit of what is feasible using conventional NWP techniques. A possible solution is the use of DL techniques as described in [63]. Examples of the use of DL for the downscaling of wind fields were given in [64,65]. An example of the use of DL for the downscaling of temperature was given in [66].

Warnings for High-Impact Weather
Despite the large efforts put into the advancement of NWP weather forecasts, predicting weather extremes remains challenging. References [67,68] gave an example of the use of DL to learn from data how to predict surface weather extremes over North America with lead times of 1-5 days.

Seasonal to Subseasonal Prediction
An important aspect of climate change is the so-called Arctic amplification (the Arctic temperature is rising twice as fast as the global temperature), which has important consequences for Northern Hemisphere midlatitude weather [69]. In particular, the temperature gradient between the Equator and the North Pole has reduced [70], which results in a weaker and wavier jet stream [71] and an increased occurrence of atmospheric blocking [72] and structural midlatitude drought [73]. In [74], it was shown that this mechanism is poorly represented in conventional weather and climate models and can better be captured by a Subseasonal-to-Seasonal (S2S) forecast system based on machine learning.
Other examples of machine learning-based subseasonal forecast methods outperforming state-of-the-art conventional methods were given in [75,76].
In [77,78], a pure AI forecasting method, which becomes competitive with state-of-the-art NWP models at the S2S lead times and which is much more efficient in terms of computing time, was presented.

Decadal Climate Prediction
In [79], a CNN was successfully used to skilfully predict El Niño Southern Oscillation (ENSO) events with lead times up to one and a half years and with considerably better skill than dynamical physical forecast models. A particular problem for the training of decadal climate prediction is that the available observation period is too short to achieve proper training. In [79], this problem was solved through the use of transfer learning. First, the CNN was pretrained on model simulations; next, the training was refined using the available observations.
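The two-stage training can be illustrated with a deliberately small stand-in: a linear model with two parameters takes the place of the CNN, plentiful but biased "simulated" data take the place of the climate model runs, and three "observed" samples take the place of the short observational record. All numbers, names and the choice of a linear model are assumptions for illustration and are not taken from [79].

```python
def gradient_descent(samples, start=(0.0, 0.0), lr=0.05, epochs=20):
    """Fit y ~ a * x + b by gradient descent on the mean squared error."""
    a, b = start
    n = len(samples)
    for _ in range(epochs):
        grad_a = sum(2.0 * (a * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2.0 * (a * x + b - y) for x, y in samples) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def mse(params, samples):
    """Mean squared error of the linear model on the given samples."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in samples) / len(samples)


grid = [i / 20.0 for i in range(-20, 21)]
# Plentiful simulated data with the right slope but a biased offset.
simulated = [(x, 2.0 * x + 0.5) for x in grid]
# Scarce observations of the true relation y = 2x + 1.
observed = [(-0.5, 0.0), (0.0, 1.0), (0.5, 2.0)]
# Independent evaluation data drawn from the true relation.
truth = [(x, 2.0 * x + 1.0) for x in grid]

# Step 1: pretrain on simulations. Step 2: refine briefly on observations.
pretrained = gradient_descent(simulated, epochs=500)
fine_tuned = gradient_descent(observed, start=pretrained, epochs=20)
# For comparison: the same short training on observations only.
from_scratch = gradient_descent(observed, epochs=20)
```

In this toy setting, the pretrained model already has the correct slope and the brief refinement mainly removes the offset bias, whereas the same short training from scratch on three samples remains far from the true relation.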

Benefits of New DL Techniques as Enhancements to Traditional NWP Methods
Traditional NWP data assimilation requires the development of a handcrafted observation operator, and hence a dedicated human development effort, for every observation type that is assimilated [1,5,6]. Moreover, it is common practice to use "data thinning" [80] prior to the assimilation, so that only a relatively small fraction of the available observations is actually used.
In contrast, DL methods "learn from data" [24]. This is efficient in terms of human effort. Furthermore, it allows learning hidden relations, which cannot be captured by physical models, and making use of the full potential of the available observations.
A prerequisite for the training of a DL method is the availability of a sufficient amount of training data. Today, not enough training data are available to train a "big bang" or "hard AI" method that can completely replace a traditional NWP method [45]. Instead, the available training data should be used to enhance existing NWP physical models, in a "residual learning" approach [81], where the DL network innovates on top of the physical model.
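One way to write down this approach, as a sketch under assumed notation that is not taken from [45] or [81], is to let the DL network f_θ predict only the correction to the output of the physical model M:

```latex
\hat{\mathbf{y}} = \mathcal{M}(\mathbf{x}) + f_{\theta}(\mathbf{x}),
\qquad
\theta^{*} = \arg\min_{\theta} \sum_{k=1}^{K}
  \bigl\lVert \mathbf{y}_k - \mathcal{M}(\mathbf{x}_k) - f_{\theta}(\mathbf{x}_k) \bigr\rVert^{2}
```

Here x_k are the inputs of the K training cases and y_k the corresponding reference values (observations or analyses). Because the network only has to learn the residual y_k − M(x_k), which is typically much smaller and simpler than the full field, less training data is needed than for a network that must reproduce the complete forecast.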
The state-of-the-art 4D-Var NWP data assimilation is an inverse modelling problem, which requires an iterative online cost function optimisation, which is costly in computing time [6,19]. The DL method's training also requires a cost function optimisation, but in contrast to NWP data assimilation, this optimisation needs to be made only once, in an offline mode. A DL approach, once trained, can be operationally very efficient in computer time.
Probabilistic NWP forecasts are obtained by running multiple forecasts with slightly modified initial conditions. Running multiple forecasts is a "brute force" method, which is costly in computing time. A key advantage of DL is its flexibility. It can be trained for a deterministic forecast, as well as for probability density output [43], without the need to produce multiple integrations of a deterministic NWP or climate model.
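The ensemble principle can be illustrated with a toy chaotic system in place of an NWP model. In the sketch below, the logistic map stands in for the forecast model, the initial state is perturbed with small Gaussian noise, and the event probability is the fraction of members exceeding a threshold. The model, the perturbation size and the function names are assumptions for illustration only.

```python
import random


def toy_model(x0, steps, r=3.9):
    """Integrate the logistic map, used here as a stand-in chaotic model."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x


def ensemble_probability(x0, steps, threshold, members=50, spread=1e-3, seed=0):
    """Estimate P(final state > threshold) from perturbed initial conditions.

    Returns the probability and the list of final states of all members.
    """
    rng = random.Random(seed)
    finals = []
    for _ in range(members):
        # Perturb the initial state and keep it inside the valid range [0, 1].
        perturbed = min(max(x0 + rng.gauss(0.0, spread), 0.0), 1.0)
        finals.append(toy_model(perturbed, steps))
    probability = sum(f > threshold for f in finals) / members
    return probability, finals


# At analysis time the members are nearly identical; after 50 steps the
# small initial differences have grown and the members have diverged.
p_start, members_start = ensemble_probability(0.4, 0, 0.5)
p_later, members_later = ensemble_probability(0.4, 50, 0.5)
```

The cost of this approach grows linearly with the number of members, since every member requires a full model integration. A DL network with a probabilistic output layer produces the distribution in a single evaluation.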

Summary
In the past few decades, remarkable progress in weather forecast skill, with high societal benefits, has been achieved. The key technological advances underpinning this success have been the increase in computing power, the development of the 4D-Var data assimilation technique and the availability of global satellite observations for an increasing number of atmospheric parameters. AI and, in particular, DL hold great promise to become the new key technology that will further revolutionize operational weather forecasting in the years to come. DL is data driven; this means that DL models are derived from large labelled datasets, requiring much less human development effort than traditional methods. As not enough labelled data are available to train a complete DL weather forecast model with a resolution comparable to current operational NWP models, the best results can be obtained from the "residual learning" approach, where a DL model innovates on top of an existing physics-based NWP model.
Traditional NWP models are demanding in terms of computing power, in particular as concerns the online data assimilation cost function optimisation and the need to run the model multiple times to obtain a probabilistic forecast. In contrast, DL models require a high computing power only during the offline training phase. In their online execution, they can be very efficient.
Traditional NWP models solve an initial value problem, where the knowledge of the current state of the atmosphere, based on observations, is combined with physics-based prognostic equations, which are solved iteratively forward in time. In the new DL approach, the traditional initial value problem can be replaced by an end-to-end trainable DL model, where the forecast step is included in the model optimisation. Such a DL model has the potential to be not only more efficient in terms of human development effort and computing power, but also more performant in terms of forecast quality.

The Way Forward
In order to exploit the clear potential benefits of the application of AI for weather and climate studies, a new generation of scientists needs to be trained, combining both domain knowledge in the Earth system sciences and specific AI expertise. AI should become a cornerstone of the weather and climate models of the future, such as those developed within the "DestinE" initiative. Operational weather and climate organisations are increasingly including the use of AI in their strategies. The number of workshops, benchmark datasets and journal Special Issues dedicated to the subject of AI for weather and climate application is increasing. It is very likely that within the next five to ten years, AI will become an indispensable part of state-of-the-art weather forecasting and climate monitoring and prediction. Techniques such as transfer learning and data augmentation, which have given a boost to many other applications, could accelerate the impact of AI techniques in climate and weather prediction and reduce the need for extra-large labelled datasets.
Author Contributions: S.D. had the initial idea and wrote the first version of this white paper. J.P.C., R.M. and A.M. provided improvements to the initial text. All authors read and agreed to the published version of the manuscript.
Funding: No dedicated funding was received for this paper.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.

Acknowledgments: The idea to write this paper originated at the first EUMETNET workshop on AI for weather and climate hosted at the RMIB in February 2020. EUMETNET is a grouping of 31 European National Meteorological Services that provides a framework to organise cooperative programmes among its Members in the various fields of basic meteorological activities. These activities include observing systems, data processing, basic forecasting products, research and development and training.

Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: