Unraveling the Impact of Land Cover Changes on Climate Using Machine Learning and Explainable Artificial Intelligence

Abstract: A general issue in climate science is the handling of big data and the running of complex, computationally heavy simulations. In this paper, we explore the potential of using machine learning (ML) to save computational time and optimize data usage. The paper analyzes the effects of changes in land cover (LC), such as deforestation or urbanization, on local climate. Along with greenhouse gas emissions, LC changes are known to be important causes of climate change. ML methods were trained to learn the relation between LC changes and temperature changes. The results showed that random forest (RF) outperformed the other ML methods, in particular the linear regression models that represent current practice in the literature. Explainable artificial intelligence (XAI) was further used to interpret the RF model and analyze the impact of different LC changes on temperature. The results mainly agree with the climate science literature, but also reveal new and interesting findings, demonstrating that ML methods in combination with XAI can be useful in analyzing the climate effects of LC changes. All parts of the analysis pipeline are explained, including data pre-processing, feature extraction, ML training, performance evaluation, and XAI.


Introduction
One of the peculiar features of climate science is the accumulation of an enormous amount of data [1,2]. The estimated size of climate data exceeds ten petabytes and continues to grow exponentially [3]. Furthermore, the number and diversity of data sources is also increasing. Initially, information is collected by thousands of ground-based weather instruments all over the world, such as weather stations, as well as by a large number of satellites that perform measurements from kilometers above the ground. These data need to be processed and transformed into formats that are comparable with each other. Climate science multimedia systems exist, but are still underinvestigated compared to other areas such as social media or medicine [4,5].
Motivated by the big data challenges and the need for multimedia systems within climate science, we will in this paper address the important, yet less studied, problem of analyzing the climate effects of land cover (LC) changes, such as deforestation or urbanization. We will go through all the steps of the multimedia pipeline to handle and gain knowledge from the large amounts of climate data to address this problem.
Nowadays, climate change and global warming are indisputable facts [6-10]. Global surface temperature has been methodically collected since 1850. According to these records, take into account all types of LC simultaneously and further to distinguish the individual impact of different LC changes in regional climate.
In this paper, we follow the same approach as Huang et al. but, given the aforementioned complexities in how LC changes affect temperature, we explore the potential of well-established ML methods, such as support vector regression (SVR), random forest (RF), multiple linear regression (MLR), and least absolute shrinkage and selection operator regression (LASSO), to learn these complex relations [29]. The method that learns the relations best is then used to study the effects of LC changes on temperature, using a newly proposed framework based on explainable artificial intelligence (XAI) methods [30].

Data
The dataset used in this paper is the same dataset as used in [14]. We provide a brief description of the most important properties of the data, and refer to [14] for further details. The dataset consists of two parts: (1) land cover data described in Section 2.1 and (2) temperature data described in Section 2.2. The dataset approximately covers Europe from about 22°W to 45°E longitude and from 27°N to 72°N latitude [31]. The data have a resolution of 467 cells in the south-north direction and 479 cells in the west-east direction. By excluding grid cells over water, the dataset consists of a total of 121,849 grid cells.

Land Cover Dataset
The European Space Agency (ESA) has produced detailed global LC maps for the period from 1992 to 2015 as a part of the Climate Change Initiative (CCI) [32]. These maps have a spatial resolution of 0.002778 degree (around 300 m at the equator) in the latitude and longitude directions, and they contain 37 LC classes following the United Nations LC Classification System (UNLCCS) [33]. To obtain the dataset used in this paper, the 37 UNLCCS LC classes were transformed to the IGBP-MODIS land cover classification system following the cross-walking table given by Huang et al. [14]. The IGBP-MODIS system consists of 21 categories that are described in Table 1. The re-classified data were further aggregated to a spatial resolution of 0.11 degree (approximately 12 km at the equator) to agree with the climate model simulations described below. Each cell of the aggregated LC dataset contains information about the portion of each of the 21 LC classes. In the period from 1992 to 2015, some categories of LC underwent more substantial changes than others. Figure 1 shows the most prominent LC changes in the dataset, such as the expansion of urban and built-up cover and changes in Evergreen Needleleaf forest. Different colors represent the proportion of a certain LC in each cell on the grid.
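The aggregation step described above, from per-pixel class labels to per-cell class portions, can be illustrated with a minimal sketch. It assumes square, non-overlapping blocks of fine pixels and uses hypothetical class names; the actual regridding from 0.002778 to 0.11 degree (roughly 40 fine pixels per side) is described in [14] and may differ.

```python
from collections import Counter

def aggregate_fractions(fine_grid, block, classes):
    """Return, for each coarse cell, the fraction of fine pixels in each class."""
    n_rows, n_cols = len(fine_grid), len(fine_grid[0])
    coarse = []
    for r0 in range(0, n_rows, block):
        row = []
        for c0 in range(0, n_cols, block):
            # Collect the labels of all fine pixels inside this block.
            labels = [fine_grid[r][c]
                      for r in range(r0, min(r0 + block, n_rows))
                      for c in range(c0, min(c0 + block, n_cols))]
            counts = Counter(labels)
            row.append({k: counts[k] / len(labels) for k in classes})
        coarse.append(row)
    return coarse

# Toy 4 x 4 fine grid (hypothetical labels), aggregated to 2 x 2 blocks.
fine = [
    ["crop", "crop", "urban", "urban"],
    ["crop", "forest", "urban", "crop"],
    ["forest", "forest", "crop", "crop"],
    ["forest", "forest", "crop", "urban"],
]
fractions = aggregate_fractions(fine, block=2, classes=["crop", "forest", "urban"])
print(fractions[0][0])
```

By construction, the portions in each coarse cell sum to one, which is the property the LC-change features rely on later.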

Simulated Temperature Data
The Weather Research and Forecasting (WRF) model version 3.9.1 was used to make simulations based on the input data that include the LC data for 1992 and 2015 and the settings of the international Coordinated Regional Climate Downscaling Experiment (CORDEX) initiative (EURO-CORDEX) [14,34]. The WRF model is one of the most accurate models for regional climate simulations and has been validated and widely used in Europe [34,35]. Average climate data for 24 years (from 1 January 1992 to 31 December 2015) were produced from the ERA-Interim data and used as initial and lateral boundary conditions [36]. The result of the WRF model simulations we focus on is the simulated 2-m air temperature in degrees Celsius for each day of the year [14]. The simulated temperatures are a result of two runs of the WRF model. In these two independent runs, the boundary conditions are the same, but the LC dataset is different: one run uses the LC of 1992 as input and the other the LC of 2015. Therefore, these simulations illustrate how the temperature would change if only LC changed. We refer to Huang et al. [14] for a detailed introduction to the setting of the simulations.

Machine Learning and Explainable Artificial Intelligence
Let $x_{i,j}$, $i = 1, \ldots, N$, $j = 1, \ldots, p$, represent N observations of p features, and let $y_i$, $i = 1, \ldots, N$, represent some associated response. The aim of ML is to learn a function that is able to predict the response from these features. In this paper we consider four well-known models, namely MLR, LASSO, SVM, and RF.
MLR assumes a linear association between the features and the response

$$y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{i,j} + \varepsilon_i, \quad i = 1, \ldots, N, \tag{1}$$

where the $\varepsilon_i$ represent zero-mean Gaussian distributed error terms. The parameter estimates are usually found by minimizing the least squares error

$$\hat{\beta}_0, \ldots, \hat{\beta}_p = \arg\min_{\beta_0, \ldots, \beta_p} \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{i,j} \Big)^2. \tag{2}$$

Given many features, a potential challenge with linear regression is that the model fits not only the signal in the data, but also the noise, usually resulting in poor prediction performance on held-out data. This problem is referred to as over-fitting. Regularization is a popular technique to address this issue. For example, the LASSO model adds the sum of the absolute values of the parameters as a penalty term to the optimization [37]

$$\hat{\beta}_0, \ldots, \hat{\beta}_p = \arg\min_{\beta_0, \ldots, \beta_p} \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{i,j} \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|. \tag{3}$$

A positive property of the LASSO is that the resulting model will often be sparse, in the sense that most of the parameter estimates are set to zero, making model interpretation easier. A higher value of the regularization parameter $\lambda$ results in a sparser solution and less chance of over-fitting. In this paper, we adjusted the value of $\lambda$ to optimize prediction performance on held-out data.
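To make the sparsity effect of the penalty concrete, the following is a minimal sketch of LASSO fitted by cyclic coordinate descent with an unpenalized intercept, minimizing the residual sum of squares plus `lam` times the sum of absolute coefficients. The solver, the toy data, and all names are illustrative assumptions; the paper does not state which LASSO implementation was used.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize sum_i (y_i - b0 - sum_j b_j x_ij)^2 + lam * sum_j |b_j|."""
    n, p = len(X), len(X[0])
    # Centering removes the unpenalized intercept from the inner problem.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: the current fit without feature j.
            r = [yc[i] - sum(beta[k] * Xc[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(Xc[i][j] * r[i] for i in range(n))
            norm = sum(Xc[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho, lam / 2) / norm
    b0 = y_mean - sum(beta[j] * x_mean[j] for j in range(p))
    return b0, beta

# Toy data: the response depends on the first feature only, plus small noise.
X = [[1, 0.5], [2, -0.3], [3, 0.8], [4, -0.6], [5, 0.1]]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
b0_ols, beta_ols = lasso_cd(X, y, lam=0.0)      # lam = 0 reduces to least squares
b0_lasso, beta_lasso = lasso_cd(X, y, lam=4.0)  # penalized fit
print(beta_ols, beta_lasso)
```

With `lam=0.0` the irrelevant second feature receives a nonzero coefficient fitted to the noise, whereas with `lam=4.0` that coefficient is set exactly to zero and the first coefficient is shrunk slightly.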
We also consider two other very popular ML methods, namely the SVM and the RF. When the response is continuous (as it is in this work), SVM is often referred to as support vector regression (SVR). The idea behind SVR is to find the regression plane such that as many of the observations are within a (support) region around the regression plane as possible. The width of the support region is also part of the optimizing procedure.
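The notion of a support region can be sketched with the epsilon-insensitive loss commonly used in SVR: residuals inside a tube of half-width `eps` around the regression plane cost nothing, and only points outside it are penalized. This is a generic illustration with hypothetical values, not the specific SVR formulation or settings used in the experiments.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the tube |y_true - y_pred| <= eps, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

# A prediction inside the tube is not penalized; one outside it is.
inside = eps_insensitive_loss(1.0, 1.05, eps=0.1)
outside = eps_insensitive_loss(1.0, 1.5, eps=0.1)
print(inside, outside)
```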
The RF model consists of an ensemble of decision trees and, thus, is called random forest. A decision tree is a flowchart-like structure in which each internal node represents a decision based on a single feature or linear combination of a subset of features. The classification or prediction decision is based on a series of such individual decisions. RF is based on using different bootstrapping techniques to train multiple decision trees. RF makes decisions based on all the trees, for example through the average output from the trees or the majority output. In this paper, we based the decisions on the average outputs.
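The combination of bootstrapping and output averaging can be sketched as follows. For brevity, each tree is reduced to a single split on one feature (a regression stump), whereas a real RF grows deeper trees and also samples random feature subsets. All data and names are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single feature by minimizing squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all feature values identical in this bootstrap sample
        m = sum(ys) / len(ys)
        return (xs[0], m, m)
    return best[1:]

def predict_stump(stump, x):
    t, ml, mr = stump
    return ml if x <= t else mr

def fit_forest(xs, ys, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest

def predict_forest(forest, x):
    """Average the outputs of all trees, as done for regression."""
    return sum(predict_stump(s, x) for s in forest) / len(forest)

# Toy data: one LC-change feature and a temperature-change response.
xs = [-0.8, -0.5, -0.2, 0.0, 0.1, 0.4, 0.6, 0.9]
ys = [-0.4, -0.3, -0.1, 0.0, 0.0, 0.3, 0.4, 0.5]
forest = fit_forest(xs, ys)
print(predict_forest(forest, -0.8), predict_forest(forest, 0.9))
```

Because each stump splits at a different threshold, the averaged prediction is smoother than any single stump, which is the main benefit of the ensemble.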
High-dimensional data or a complex model can make model interpretation difficult. Regression models can to some extent be interpreted by studying the size of the regression parameters $\beta_0, \ldots, \beta_p$, which represents the core of statistical inference. However, other models, such as the RF, are far more difficult to interpret. Recently, the field of XAI has received a lot of interest in trying to provide explanations for such opaque models. The core idea of XAI techniques is quite simple and is based on analyzing how changes in the input features affect the model output, but more sophisticated methods have also been developed [30]. In this paper, we resort to a quite simple XAI approach based on analyzing how changes in a single feature affect the output. The analysis is explained in further detail in Section 4.2.

Experiments
In this section, we will describe the experiments with the aim of measuring the effects of LC changes on temperature. The section is organized as follows. In Section 4.1, we describe how to extract features from the dataset, and in Section 4.2, we describe our XAI-based method to analyze the effects of LC changes on temperature.

Feature Extraction
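Our approach is based on using the differences in LC as features

$$x_{i,j} = \mathrm{LC}^{2015}_{i,j} - \mathrm{LC}^{1992}_{i,j}, \quad i = 1, \ldots, N, \; j = 1, \ldots, p, \tag{4}$$

where $\mathrm{LC}^{1992}_{i,j}$ and $\mathrm{LC}^{2015}_{i,j}$ refer to the portion of LC type j in grid cell i in 1992 and 2015, respectively. Since the $x_{i,j}$'s represent differences between portions that sum to one within each grid cell, it follows that

$$\sum_{j=1}^{p} x_{i,j} = 0, \quad i = 1, \ldots, N. \tag{5}$$

To be able to study the effects of LC changes on temperature, we define the differences in average temperatures

$$y_{i,D} = \frac{1}{|D|} \sum_{d \in D} \left( T^{2015}_{i,d} - T^{1992}_{i,d} \right), \quad i = 1, \ldots, N, \tag{6}$$

where N refers to the number of grid cells and $T^{1992}_{i,d}$ and $T^{2015}_{i,d}$ to the temperatures from the simulations described in Section 2.2 in grid cell i at day d. D refers to some part of the whole year, and |D| to the number of days in this period. In the computer experiments, five periods were used, namely winter (December, January, February), spring (March, April, May), summer (June, July, August), autumn (September, October, November), and the whole year.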
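The feature and response definitions above can be sketched for a single grid cell as follows: the feature is the 2015 portion minus the 1992 portion for each LC class, and the response is the mean over the days of a period of the 2015-run temperature minus the 1992-run temperature. The data layout (one list per cell, with a month label per simulated day) and the toy values are assumptions for illustration only.

```python
SEASONS = {
    "winter": (12, 1, 2),
    "spring": (3, 4, 5),
    "summer": (6, 7, 8),
    "autumn": (9, 10, 11),
    "year": tuple(range(1, 13)),
}

def lc_change(lc_1992, lc_2015):
    """Change in the portion of each LC class in one grid cell (2015 minus 1992)."""
    return [b - a for a, b in zip(lc_1992, lc_2015)]

def mean_temp_change(t_1992, t_2015, months, period):
    """Mean over the days in the period of the 2015-run minus the 1992-run temperature."""
    diffs = [b - a for a, b, m in zip(t_1992, t_2015, months)
             if m in SEASONS[period]]
    return sum(diffs) / len(diffs)

# Toy cell with three LC classes and four simulated days (two in January, two in July).
x = lc_change([0.6, 0.3, 0.1], [0.4, 0.3, 0.3])
months = [1, 1, 7, 7]
t_1992 = [0.0, 2.0, 20.0, 22.0]
t_2015 = [0.5, 2.5, 20.0, 21.0]
y_winter = mean_temp_change(t_1992, t_2015, months, "winter")
y_summer = mean_temp_change(t_1992, t_2015, months, "summer")
y_year = mean_temp_change(t_1992, t_2015, months, "year")
print(x, y_winter, y_summer, y_year)
```

Because the portions in each year sum to one, the entries of `x` sum to zero up to floating-point rounding.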
We predict $y_{i,D}$ using LC changes in the same geographic location, in line with the recent literature [14,25]. For a given period D, the dataset used in the experiments was therefore as follows:

$$\text{Input: } x_{i,1}, \ldots, x_{i,p}, \qquad \text{Output: } y_{i,D}, \qquad i = 1, \ldots, N. \tag{7}$$

Analyzing Effects of LC Changes on Temperature Using XAI
In this section, we explain our approach to analyzing the effects of LC changes on temperature. We suggest using an XAI technique that is based on inserting different LC changes into the trained models to study the resulting effects on temperature changes. We point out three important considerations when using this approach:

1. To analyze the effect of some LC change, we must check that the given change is frequently present in the training dataset, to ensure that the ML method has learned the relation between this LC change and the temperature change well. If an LC change is not present in the training dataset, we cannot trust the model prediction related to this change. We will, therefore, only show the effects for the most frequent LC changes in the dataset.

2. In statistical inference, multicollinearity is a well-known issue, resulting in a range of models that have about the same agreement with the data, but may represent different inferential conclusions. The XAI technique used in this paper has to tackle the same potential challenge. However, except for the "grand" collinearity in Equation (5), we did not observe any strong multicollinearity in the data, and we can, therefore, reliably use the suggested XAI approach.

3. It is well-known that the association between LC changes and temperature is complex and associated with noise. It is, therefore, important to quantify the uncertainty of the ML prediction. We quantify uncertainty by using standard $(1 - \alpha) \cdot 100\%$ prediction intervals for the model output

$$\hat{y} \pm z_{\alpha/2} \hat{\sigma}, \tag{8}$$

where $\hat{y}$ is the model prediction, $z_{\alpha/2}$ the upper $\alpha/2$ quantile of the standard normal distribution, and $\hat{\sigma}$ the estimated standard deviation of the prediction error [38]. The standard deviation was estimated from the prediction error on unseen test examples in a ten-fold CV experiment over the data samples in Equation (7).
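The prediction interval in consideration 3 can be computed with the Python standard library, as in the minimal sketch below. Estimating the standard deviation as the root mean squared held-out error is an assumption of this sketch; the paper only states that it was estimated from the prediction error on unseen test examples. The numeric values are hypothetical.

```python
import math
from statistics import NormalDist

def estimate_sigma(heldout_errors):
    """Assumed estimator: root mean squared error on held-out examples."""
    return math.sqrt(sum(e ** 2 for e in heldout_errors) / len(heldout_errors))

def prediction_interval(y_hat, sigma_hat, alpha=0.05):
    """Return the (1 - alpha) * 100% interval y_hat +/- z * sigma_hat."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return y_hat - z * sigma_hat, y_hat + z * sigma_hat

def is_significant(interval):
    """A change is significant at the given level if the interval excludes zero."""
    lo, hi = interval
    return lo > 0 or hi < 0

sigma_hat = estimate_sigma([0.3, -0.3, 0.3, -0.3])
warming = prediction_interval(0.8, sigma_hat)
unclear = prediction_interval(0.2, sigma_hat)
print(warming, is_significant(warming), is_significant(unclear))
```

The same interval also supports stricter criteria, for example requiring the lower bound to exceed 0.5 to flag a warming of at least 0.5°C with 95% certainty.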

Results
In this section, we summarize the results from the experiments described above. In Section 5.1, we compare the prediction performance of the different ML methods introduced in Section 3, and in Section 5.2, we show the effects of LC changes on temperature.

Evaluation of ML Methods to Predict Temperature Changes from LC Changes
In this section, we present the performance of MLR, LASSO, RF, and SVR in predicting temperature changes $y_{i,D}$ from LC changes $x_{i,j}$. In addition, we evaluated the performance of a baseline that predicts the temperature change without using the LC features, i.e., the baseline predicts the average temperature change in the training data. The whole geographic area was divided into square sectors of 25 × 25, 50 × 50, or 75 × 75 cells. The methods were evaluated with a spatial cross validation (CV) procedure, where the methods were trained on data from all sectors except one. The remaining sector was used as a test set. This approach evaluates how well the methods can learn from some parts of the geographic area and predict on others. We also evaluated the algorithms with the cells used for training and testing randomly selected over the whole geographic area. The results from the sector and the randomization experiments were consistent, and only the results for the sector approach are shown. The prediction performance was measured using the root mean squared error (RMSE) between the temperature differences $y_{i,D}$ and the model predictions. The results are shown in Table 2. We see that RF outperforms the other ML methods, especially the linear regression models that represent the current practice in the literature [14,25]. Using the 5 × 2 CV test [39], we verified that the RF performed significantly better than all the other algorithms, with p-values < $10^{-14}$. We further see that the other algorithms performed about as well as the baseline and thus had trouble taking advantage of the information in the LC features. The performance of the methods is reduced with larger sectors, which is as expected since the differences between the properties of the training and test areas are greater and the number of data samples used for training is reduced.
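The sector-based spatial CV and the mean baseline can be sketched as follows. The sketch assumes that each cell is identified by integer (row, column) grid indices, that sectors are aligned to the grid origin, and that the RMSE is pooled over all held-out sectors; these details are not specified in the text and are labeled here as assumptions.

```python
import math

def sector_id(row, col, size):
    """Map a grid cell to the square sector (size x size cells) that contains it."""
    return (row // size, col // size)

def spatial_folds(cells, size):
    """Yield (train_idx, test_idx) pairs, holding out one sector at a time."""
    by_sector = {}
    for i, (row, col) in enumerate(cells):
        by_sector.setdefault(sector_id(row, col, size), []).append(i)
    for held_out, test_idx in by_sector.items():
        train_idx = [i for s, idx in by_sector.items() if s != held_out for i in idx]
        yield train_idx, test_idx

def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def baseline_cv_rmse(cells, y, size):
    """Predict the training mean for every held-out cell; pool errors over sectors."""
    truth, pred = [], []
    for train_idx, test_idx in spatial_folds(cells, size):
        mean_train = sum(y[i] for i in train_idx) / len(train_idx)
        truth += [y[i] for i in test_idx]
        pred += [mean_train] * len(test_idx)
    return rmse(truth, pred)

# Toy 4 x 4 grid with 2 x 2 sectors; only one sector shows a temperature change.
cells = [(r, c) for r in range(4) for c in range(4)]
y = [1.0 if sector_id(r, c, 2) == (0, 0) else 0.0 for r, c in cells]
folds = list(spatial_folds(cells, 2))
score = baseline_cv_rmse(cells, y, 2)
print(len(folds), score)
```

Any regression model can be substituted for the mean baseline by fitting it on `train_idx` and predicting on `test_idx` inside the loop.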

Effects of LC Changes on Temperature
The analyses are based on the RF model, which had the best performance in our experiments. Since the RF model outperformed the linear regression models that represent current practice [14,25], it is reasonable to assume that our analysis will be more robust and potentially reveal new insights from a climate science perspective. We will discuss this further in Section 6. The analyses follow the approach suggested in Section 4.2. Table 3 shows the temperature change for the whole of Europe if a grid cell completely changed from one LC to another. The table shows the 15 most frequent LC changes in the dataset, ensuring reliability in our results (recall consideration 1 in Section 4.2). Tables A1-A3 in Appendix A show results for the northern, central, and southern regions of Europe. The values in parentheses in the tables show 95% prediction intervals. Blue and red cells show LC changes resulting in statistically significant cooling and warming, respectively. Intense blue and red cells show LC changes that resulted in at least a 0.5°C temperature change with 95% certainty. The column 'Event rate' shows the number of cells where the LC change was observed in the dataset.
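A complete change of a grid cell from one LC to another corresponds to an input vector with -1 for the source class, +1 for the target class, and 0 elsewhere, which is then passed to the trained model. The sketch below illustrates this probing step; `toy_model` is a hypothetical stand-in for the trained RF, and the class indices are illustrative.

```python
def transition_vector(n_classes, source, target, amount=1.0):
    """LC-change features for moving a portion `amount` of a cell from source to target."""
    x = [0.0] * n_classes
    x[source] -= amount
    x[target] += amount
    return x

def toy_model(x):
    """Hypothetical stand-in for a trained model's predict function."""
    return 0.5 * x[2] - 0.25 * x[1]

# Hypothetical class indices: 0 = cropland, 1 = forest, 2 = urban and built-up.
x_probe = transition_vector(3, source=0, target=2)
effect = toy_model(x_probe)
print(x_probe, effect)
```

The probe vector sums to zero, so it stays consistent with the constraint that the LC changes within a cell cancel out.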

Discussion
Despite the substantial amount of uncertainty in the predictions, the results reveal several statistically significant temperature changes. They also show that the most frequent LC changes result mainly in warming in northern and central Europe and primarily in cooling in southern Europe. The most frequent LC changes differ considerably between the parts of Europe, which makes sense since the different parts of Europe mainly consist of different types of vegetation. However, for the LC changes that are frequent in more than one part of Europe, we observe a consistency in temperature change. For example, cropland to urban and built-up results in significant warming in all three parts of Europe and for the whole of Europe. There is also a consistency between seasons of the year, in the sense that an LC change results in either warming or cooling for every season (there are no rows with both red and blue cells). Interestingly, this observation was not detected by Huang et al. [14] with the regression-based approach. For example, for the whole of Europe, deciduous broadleaf forest to cropland results in statistically significant cooling for both summer and autumn and no statistically significant warming (or cooling) for the other seasons.
To further verify the validity of our suggested approach, we now analyze how consistent our results are with other studies based on statistical approaches and climate model simulations. Many studies have revealed a strong correlation between temperature increase and growth in shrub species [6,40-43]. Some of these researchers discussed the positive feedback loop in which LC transitions affect climate, while temperature changes also influence LC transformation [40,43,44]. Firstly, warming increases the spread of shrublands. Then, the LC transition to shrublands influences the energy exchange, increasing the absorption of solar radiation due to lower surface albedo. This, in turn, results in a temperature rise. However, it can be complicated to distinguish the main driver in this feedback loop. In this paper, we analyze only the impact of LC on temperature change, ignoring the effect of warming on LC. We observed that the transition to open shrublands alone leads to a temperature increase in northern and southern Europe.
Some works demonstrate that shrubland increase in the Arctic can lead to an annual temperature increase [41,42,45], which is consistent with our own findings. However, most articles only consider the growth of shrubs and do not pay attention to the initial cover. Therefore, our approach can help in understanding how prominent the effect of LC transformation to shrubs is, depending on the initial LC. For instance, the replacement of barren or sparsely vegetated cover by shrublands causes a larger warming than the temperature rise associated with the transition from permanent wetland to open shrublands.
Urbanization and its impact on temperature is another subject that draws the interest of climate scientists. In general, researchers conclude that the transition to urban and built-up covers causes warming [7,14,46,47]. Indeed, we also observed that most of the LC changes to urban and built-up covers result in a temperature increase during the whole year, as well as seasonally.
Deforestation and its contribution to temperature increase is an important research subject that has been explored by many authors [14,48,49]. In this paper, we observed a similar trend: most LC changes associated with deforestation observed in our work lead to a significant temperature increase.
Afforestation is considered a possible solution to the problem of the warming effect of deforestation because of its contribution to cooling [7,48,49]. In this paper, we detected such a trend in southern Europe, where the shift from cropland or natural vegetation mosaic to Evergreen Needleleaf or deciduous broadleaf forest results in a significant cooling. However, in central Europe, we could not identify a clear pattern in temperature change associated with afforestation. Moreover, the transition from permanent wetland to any kind of forest contributes to warming in northern Europe. This is consistent with the results of Li et al., where a transition of any LC to forest leads to cooling in tropical regions but to warming in high latitudes [49].
In summary, we conclude that our predictions of the impact of LC changes on temperature are consistent with the main trends described by the IPCC [6,7] and other studies. Our analyses also revealed new insights, which supports the assumption that ML techniques can be a useful tool in climate science and that it is possible to develop a model that makes meaningful predictions. In addition, our approach allows us to extract more complex patterns and gain a clearer understanding of the effect of different LC transitions. This demonstrates that ML techniques can help to determine the effect of LC changes on surface temperature, which opens up many directions for future work to explore and exploit this further.

Conclusions
In this paper, we have presented a framework based on ML and XAI to analyze the effects of LC changes on temperature. The results show that the RF model achieved better prediction performance than the linear regression based models that are the current practice in the literature [14,25]. Our framework based on RF is able to find several statistically significant relations that align with other research. Our analyses also revealed new insights from a climate science perspective, for example the consistency between seasons.
We train models that predict temperature changes using LC changes at the same geographic location as features. However, it is expected that temperature changes can also be affected by LC changes at other geographic locations. An interesting direction for future research is, therefore, to develop models that predict temperature using LC changes from other geographic locations as features as well. This will, however, complicate the XAI analyses, since temperature changes in the model then depend on LC changes from multiple geographic locations. Another interesting direction is to analyze the effects of telecoupling, that is, how LC changes in one place affect the climate in other locations.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Effects of LC Changes
Tables A1-A3 show temperature changes from the most frequent LC changes for the regions northern, central, and southern Europe, respectively (Figure A1 shows the regions).

Figure A1. Three regions used to predict the effect of LC changes on surface temperature: northern (green), central (yellow), and southern (red) Europe.