Remotely sensed data are available from a wide range of sources, from satellites to drones, and have been used for many environmental applications and for the analysis of spatial and temporal trends. For example, Landsat data are freely available [1]; the imagery covers a wide geographical area and avoids expensive, extensive and often impractical in situ measurement. Remotely sensed Landsat imagery is therefore well suited to land use and land cover (LULC) analyses that detect and estimate the magnitude of spatio-temporal trends in measures of the quantity of green vegetation [2]. Monitoring long-term trends of green vegetation in a semi-arid region gives valuable insight into dependencies and changing quantities influenced by climate variability. For monitoring and analysis of green vegetation, the infrared (IR) and near infrared (NIR) spectral channels are best suited, since they discriminate between green, active vegetation and woody vegetation or organic litter [4]. Fractional cover (FCover) data are a product derived from Landsat imagery that expresses the land cover types present within a pixel as percentage fractions of that pixel. A satellite pixel combines the reflected radiation from different objects on the Earth's surface, and this spectral mixing effect results in a so-called mixed pixel, or Mixel [5]. In a spectral unmixing approach, the Landsat pixel is decomposed into fractions assigned to biophysical variables [6]. For example, in our study described below, we used only the band that shows the fraction of green vegetation, taken from a three-layer composite containing two additional layers for bare soil and for non-green vegetation. The derivation of FCover is described in [6
Environmental modelling is important when we want to understand and monitor the local variability and spatial trends of green vegetation over time. Instead of applying our method at the full resolution, we used aggregated FCover pixels with a much coarser resolution to examine green vegetation trends. The aggregation scheme, the best resolution for LULC studies and its suitability are described in our paper [12]. The goal is to detect spatial and temporal trends based on 30 years of data. More specifically, we want to understand how the linear trend in FCover changes over the spatial region, and how well this can be described by geographic coordinates. For this, we specify qualitative factor levels representing six categories of green vegetation trends, which are listed and further described in Table 2. Then, we model these trends using latitude and longitude coordinates, which serve as a North–South gradient (latitude) and an East–West gradient (longitude).
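As an illustration of what aggregating FCover pixels to a coarser resolution can look like, the following minimal sketch averages non-overlapping square blocks of a small grid. The block-mean rule, the function name `aggregate_blocks`, the use of `None` for missing pixels and the toy values are all assumptions made for this example; the aggregation scheme actually used is the one described in [12].

```python
from statistics import fmean


def aggregate_blocks(grid, block):
    """Average non-overlapping block x block windows of a 2D list.

    None marks a missing pixel and is ignored; a window with no valid
    pixels yields None. (Hypothetical helper, not the scheme from [12].)
    """
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r0 in range(0, rows, block):
        coarse_row = []
        for c0 in range(0, cols, block):
            values = [
                grid[r][c]
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
                if grid[r][c] is not None
            ]
            coarse_row.append(fmean(values) if values else None)
        coarse.append(coarse_row)
    return coarse


# Toy green-vegetation fractions (percent) on a 4 x 4 grid, one pixel missing.
green = [
    [10.0, 20.0, 30.0, 40.0],
    [10.0, 20.0, 30.0, 40.0],
    [50.0, None, 70.0, 70.0],
    [50.0, 50.0, 70.0, 70.0],
]
coarse = aggregate_blocks(green, 2)  # 2 x 2 grid of block means
```

Each coarse cell then carries one green fraction per acquisition date, which is the kind of series the trend analysis operates on.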
The use of latitude and longitude as surrogate covariates is not uncommon. For example, in [13], the authors used latitude and longitude coordinates as surrogate variables for North–South and East–West gradients to account for the variation in deciduous forested ecoregions. The response was an aggregated Normalized Difference Vegetation Index (NDVI) variable used as an on-site quantification of vegetation in North America. Similarly, in a study of the geographic distribution of plant functional types [14], the authors examined the relationship of precipitation and temperature to the distribution of C3 (cool-season) grasses, C4 (more drought-resistant, warm-season) grasses and shrubs using latitude and longitude coordinates. Along a given longitude, C3 grasses increased with latitude and, as one moved westward, C4 grasses were replaced by shrubs. They concluded that latitude and longitude can be used as surrogate variables for the main climatic dimensions of the area. Latitude and longitude explained a substantial portion of the variability in the relative abundance of shrubs, C3 grasses and C4 grasses.
In general, there are many methods using machine learning approaches for predicting temporal and spatio-temporal trends, and they are not limited to green vegetation. Examples include long-term seasonal changes of the Danube River eco-chemical status [15], epidemiology studies and analysis of disease processes in public health [16], spatial and temporal trends of birds over France [17], long-term trends in dryland vegetation variability in Ethiopia [18], and identification of environmental controls in fire-prone biome and spatial patterns at several spatial scales in the Canadian boreal forest [19]. In a previous paper, we evaluated the performance of a popular machine learning technique, namely Boosted Regression Tree (BRT), and concluded that it can perform well in high-dimensional and complex problems, deal with missing data by default without the need for interpolation/infilling, describe complex nonlinearity and interactions between variables, deal with spatial and non-spatial data and different data granularities, and reduce data complexity without negatively affecting prediction performance [20]. However, the focus of that paper was on spatial estimation of environmental indices at a single point in time. In this paper, we focus on estimation over both space and time. We do this by proposing a two-step approach: first, slope coefficients are extracted from the summary of a linear model; second, the extracted slope coefficients are predicted using BRT. The combination of a linear regression and a nonlinear BRT model defines our two-step approach.
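The first step, extracting a slope coefficient per grid cell, can be sketched as an ordinary least-squares fit of green fraction against year. This is only an illustration under stated assumptions: the function name `ols_trend`, the cell coordinates and the synthetic series are hypothetical, and the sketch returns the t-statistic of the slope rather than a p-value, because converting it to a p-value needs a t-distribution (available, for instance, through `scipy.stats.linregress`, which reports a `pvalue`).

```python
import math


def ols_trend(years, values):
    """Return (slope, t_statistic) for a least-squares fit values ~ years."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares and standard error of the slope (n - 2 d.o.f.).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / std_err if std_err > 0 else math.copysign(math.inf, slope)
    return slope, t_stat


# Hypothetical cells keyed by (latitude, longitude), with synthetic yearly
# green fractions for 1987-2017: one rising by 0.5 per year, one flat.
years = list(range(1987, 2018))
noise = [1.0 if i % 2 == 0 else -1.0 for i in range(len(years))]
cells = {
    (-27.0, 150.0): [20.0 + 0.5 * (y - 1987) + e for y, e in zip(years, noise)],
    (-27.5, 150.5): [30.0 + e for e in noise],
}
trends = {cell: ols_trend(years, series) for cell, series in cells.items()}
```

The resulting dictionary maps each cell location to its slope and test statistic, which is the form of input the second step needs: one response value per coordinate pair.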
The detection of trends in change of green vegetation over time is essential for the assessment of the impacts of climate variability on the LULCC (Land Use Land Cover Change) of a region. The study described in this paper aims to determine the annual trends of slope coefficients over a semi-arid region. Long-term (1987–2017) gridded aggregated FCover fractions of green vegetation data are used to spatially divide the FCover scene. Historical trends are examined using a linear model to regress the aggregated green fraction over time for each grid cell. The extracted grid-specific slope coefficients are then used as a response variable, with the corresponding latitude and longitude as covariates, in the hierarchical supervised machine learning BRT model. The BRT results thus provide an evaluation of the spatial nature of the overall temporal trend in green vegetation over time.
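To make the second step concrete, the sketch below fits a very small gradient-boosted ensemble of one-split regression trees (stumps) to slope coefficients, with latitude and longitude as the only covariates. It is a simplified stand-in written for illustration, not the BRT implementation or the tuned hyperparameters used in this study; the function names, the learning rate, the number of trees and the synthetic slopes (which depend on longitude only) are assumptions.

```python
def fit_stump(X, residuals):
    """Find the single-feature threshold split minimising squared error."""
    best = None
    for j in range(len(X[0])):
        levels = sorted(set(row[j] for row in X))
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, lm, rm)
    return best[1:]


def fit_boosted_stumps(X, y, n_trees=500, learning_rate=0.1):
    """Squared-error gradient boosting: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, lm, rm = fit_stump(X, residuals)
        stumps.append((j, thr, lm, rm))
        pred = [p + learning_rate * (lm if row[j] <= thr else rm)
                for row, p in zip(X, pred)]
    return base, learning_rate, stumps


def predict(model, row):
    """Sum the shrunken stump contributions on top of the mean response."""
    base, lr, stumps = model
    return base + sum(lr * (lm if row[j] <= thr else rm)
                      for j, thr, lm, rm in stumps)


# Synthetic grid: the slope increases from west to east and ignores latitude.
lats = [-28.0, -27.5, -27.0, -26.5, -26.0]
lons = [150.0, 150.5, 151.0, 151.5, 152.0]
X = [(lat, lon) for lat in lats for lon in lons]
y = [0.1 * lons.index(lon) for _, lon in X]
model = fit_boosted_stumps(X, y)
west = predict(model, (-27.0, 150.0))
east = predict(model, (-27.0, 152.0))
```

In this toy setting the fitted model recovers the East–West gradient in the slopes, which is the kind of spatial structure the second step is meant to expose.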
The paper is structured as follows. Section 1 provides background information, places our study in context to other studies and demonstrates why there is a gap we need to fill. Section 2 introduces the study area and presents the context of the other linear model approach to extracting the slope coefficient. In Section 3, we introduce the BRT modelling approach and describe the hyperparameter tuning steps and the model goodness of fit. Section 4 presents the results of the two stages of the analysis. The implications of the data and the output of the prediction of the BRT, as well as strengths and limitation measures of BRT are discussed in Section 5.
In this paper, we have proposed a two-step method for evaluating the spatial patterns of linear trends across a landscape, based on the geographic covariates latitude and longitude. The relative importance of these covariates, combined with the trend estimates themselves, can provide a deeper understanding of environmental impacts on the target response. For instance, in the case study considered here, the analyses allow insight into whether climate variability appears to have little to no impact on the existing green vegetation in our study area. In Figure 2a, we show the distribution of green vegetation fractions as boxplots covering 30 years and visualising the interquartile range and the minimum and maximum values. We can see an increase in the median, especially in 2011 and 2014, when the highest fractions of green vegetation reached 70% and higher. This is surprising because many studies have found a general trend of desertification in semi-arid regions around the world. However, our results demonstrate that 84.48% of all extracted slope coefficients show a neutral to slightly positive trend in green vegetation, as shown in Table 2.
We conducted a temporal and spatio-temporal investigation on one overall data set and on three data sets covering one decade each, to better understand whether there are temporal patterns that are not captured by the overall 30-year time frame. Our findings are shown in Figure 7 and Figure 8. The RMSE values listed in Table 3 indicate that dividing the data set brings no significant improvement in prediction accuracy. In addition, it can be seen that BRT under-predicts the slope coefficient when using geographic coordinates as spatial gradients.
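The two diagnostics referred to here can be written down in a few lines: the RMSE summarises the size of the prediction errors, and the mean error (bias) shows their direction, with a negative value indicating under-prediction. The sketch below uses made-up slope values purely to show the calculation; the numbers are not taken from Table 3, and the function names are assumptions.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def mean_error(observed, predicted):
    """Mean of (predicted - observed); negative means under-prediction."""
    return sum(p - o for o, p in zip(observed, predicted)) / len(observed)


# Made-up slope coefficients and two hypothetical sets of predictions.
observed = [0.10, 0.20, 0.30, 0.40]
pred_whole_period = [0.12, 0.18, 0.33, 0.37]
pred_under = [0.08, 0.17, 0.27, 0.36]

rmse_whole = rmse(observed, pred_whole_period)
bias_under = mean_error(observed, pred_under)
```

Comparing `rmse` across scenarios (for example, the whole period against each decade) shows whether splitting the data changes accuracy, while `mean_error` makes a systematic under-prediction visible.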
By plotting the p-values using their geographic coordinates, we can demonstrate a spatial trend of strongly significant p-values associated with the extracted slope coefficients, as shown in Figure 5. Furthermore, we demonstrate a stronger influence of the longitude coordinates in explaining our response variable, as shown in Table 4. To get insight into temporal and spatio-temporal trends, we split up the FCover scene and investigated several scenarios, namely the whole data set, the three decades, and the 30 years in four even segments of the FCover scene. We investigated whether there are spatial trends in the slope coefficients and trends of green vegetation in each scenario. Figure 9 shows the influence of latitude and longitude in all eight scenarios as partial dependency plots. We can clearly see that each segment and each decade differ from each other, which affirms our approach of using consecutive time intervals to investigate spatial green vegetation trends individually and so gain spatio-temporal insight into the amount of green vegetation fractions and how the greenness developed over space and time.
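For readers unfamiliar with partial dependency plots, the curve for one covariate is obtained by fixing that covariate at a grid value for every observation, predicting, and averaging the predictions. The following sketch shows that calculation for any prediction function. The helper name `partial_dependence` and the closed-form `toy_predict` function, which merely stands in for a fitted BRT model, are assumptions for illustration.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with one feature fixed at each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value  # overwrite the covariate of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_predict(row):
    """Hypothetical stand-in for a fitted model: strong longitude effect."""
    lat, lon = row
    return 0.02 * (lon - 150.0) + 0.001 * (lat + 27.0)


# Rows are (latitude, longitude); feature index 1 is longitude.
rows = [(-27.0, 150.0), (-26.0, 151.0)]
pd_lon = partial_dependence(toy_predict, rows, feature=1, grid=[150.0, 151.0])
pd_lat = partial_dependence(toy_predict, rows, feature=0, grid=[-27.0, -26.0])
```

A steep curve for one covariate and a nearly flat curve for the other is the visual counterpart of a high and a low relative influence, respectively.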
Using a linear model to extract slope coefficients allows formal, statistical investigation of the vegetation trends and their associated p-values. However, it only considers linear, monotonic trends of green vegetation. We tried to overcome this by dividing the data set into three decades, but this did not improve prediction accuracy substantially. Furthermore, no turning points or extreme events that would have described changes in green vegetation fractions were taken into consideration, since the linear approach could not have detected them. As seen in Figure 8, we used this to split the data into three decades, which shows that there were no other significant trends captured. We only used geographic surrogate gradients without adding any other environmental covariates to the BRT model. In addition, no further testing using FCover scenes from a different Landsat footprint was undertaken to determine whether the results are restricted to our location only.
There are many generalisations of the approach presented here. For example, while this paper has intentionally focused on a single analytic method for each of the two steps in the proposed approach, it is clear that these methods could be replaced by any one, or indeed a number, of a wide variety of statistical machine learning methods designed for estimation and/or prediction. For example, instead of the linear regression and BRT approaches illustrated here, one could consider other regression models that capture temporal and spatial correlation (e.g., exponentially weighted moving average models, Markov random field models, respectively) or other nonlinear models such as neural networks or support vector machines. It is also valuable to look at the literature in other domains that evaluate and compare these and other methods for spatial and temporal estimation and prediction, such as [34]. Generalising in another direction, although this paper has focused on a single output from the first step (the estimated regression coefficient) and used this as a univariate response for the analysis in the second step, a multivariate approach could be adopted whereby the outputs from the first step (and inputs for the second step) are the regression estimates and their associated standard errors and/or RMSE, or estimates of multiple coefficients in a multiple regression, or parameter estimates and associated goodness of fit estimates from an alternative supervised learning method such as a neural network.
In this study, we demonstrated that a localised and quantitative distribution of temporal and spatio-temporal trends of green vegetation cover can be predicted using BRT. Altogether, eight scenarios were investigated: the whole data set covering 30 years, three data sets covering a decade each, and the four quadrants of the image over all years. We showed that combining a linear model and BRT achieved good results in predicting location-based trends of green vegetation, using the RMSE as the measure of goodness of fit. The extracted slope coefficients and p-values were categorised and further analysed by the direction of change in the quantity of green vegetation and by the statistical significance indicated by the p-values. A limitation is that 84% of the slope coefficients were positive, but most were associated with non-significant p-values. In our paper [12], we concluded that the North–South gradient dominates over the East–West gradient in predicting the quantity of green vegetation fractions in a spatial context. Here, using the same data, we see that the North–South gradient does not contribute to the rate of change in green vegetation or to the temporal trends based on the three decades. Our results confirm those of [13], whose authors concluded that latitude and longitude can be used to explain the spatial variability in the distribution of C3 and C4 grasses along North–South and East–West gradients. In analysing 30 years as a spatio-temporal aspect in the four segments shown in Table 4, we show a decrease in the relative influence of the North–South gradient and an increase in that of the East–West gradient in predicting vegetation trends. By analysing the data using the whole data set of 30 years, the three decades, or the four segments covering 30 years in each quadrant, we can conclude that no temporal trends were observed in the shorter time frames and that the overall linear trend over 30 years seems sufficient.