The Normalized Difference Vegetation Index (NDVI) reflects vegetation growth and is closely related to the amount of absorbed photosynthetically active radiation, as indicated in [1
]. It is calculated from the radiometric information obtained for the red (R) and near-infrared (NIR) wavelengths of the electromagnetic spectrum in the following way:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{R}}{\mathrm{NIR} + \mathrm{R}}$$
] for more details). As mentioned in [4
], this parameter is sensitive to the greenness of the observed area, which is closely related to the presence of vegetation. Although the numerical limits of NDVI used for vegetation classification can vary, it is widely accepted that negative NDVI values correspond to water or snow. NDVI values close to zero could correspond to bare soils, yet these soils can show high variability. Values between approximately 0.2 and 0.5 correspond to sparse vegetation, and values between 0.6 and 1.0 correspond to dense vegetation such as that found in temperate and tropical forests or crops at their peak growth stage.
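As a minimal Python sketch of the index and of the approximate value ranges quoted above, the functions below compute NDVI = (NIR - R) / (NIR + R) for a single pixel and attach a rough cover description. The function names, the handling of a zero denominator, and the treatment of values between 0 and 0.2 as "close to zero" are assumptions made for illustration; the text assigns no class to values between 0.5 and 0.6.

```python
def ndvi(nir, red):
    """Return (NIR - R) / (NIR + R), or None when the denominator is zero."""
    total = nir + red
    if total == 0:
        return None  # assumption: treat an undefined ratio as missing
    return (nir - red) / total


def describe(value):
    """Map an NDVI value to the approximate cover types quoted in the text."""
    if value is None:
        return "undefined"
    if value < 0:
        return "water or snow"
    if value < 0.2:
        # assumption: 'close to zero' is read here as the range [0, 0.2)
        return "bare soil (highly variable)"
    if value <= 0.5:
        return "sparse vegetation"
    if value >= 0.6:
        return "dense vegetation"
    return "intermediate (not classified in the text)"


if __name__ == "__main__":
    for nir, red in [(0.5, 0.1), (0.4, 0.2), (0.1, 0.3)]:
        v = ndvi(nir, red)
        print(f"NIR={nir}, R={red}: NDVI={v:.3f} ({describe(v)})")
```

Because the limits vary between studies, the thresholds are best kept as adjustable parameters in any real application.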
However, in remote sensing data, atmospheric conditions or cloud presence can distort the estimation of NDVI. A large number of papers have been devoted to completing, reconstructing and predicting the spatial and temporal dynamics of the future NDVI distribution using a time series of images (see, for example, [5
]). These studies mainly account for the temporal correlation of individual pixels at different resolutions but ignore the spatial dependence among them. Perhaps the most broadly used method for analysing NDVI temporal changes is the non-parametric Mann–Kendall test (see, for example, [11
]). When plotting significant changes, a discrete pixel by pixel map of the NDVI trend changes is obtained. Figure 1
shows the coloured pixels where significant NDVI trend changes have been detected in continental Spain from October 2011 to December 2013. This discretization arises because the Mann–Kendall test only assumes a time dependence within the same pixel across years and does not account for the spatial dependence among neighbouring pixels. Yet, unless random disturbances occur because of fire events, land-use/cover changes, crop rotation, land degradation or many other causes, we expect nearby locations to present similar trend changes. Some improvements of this test have also been proposed. For example, Neeti and Eastman [15
] introduced the contextual Mann–Kendall approach for assessing the trend significance of the NDVI time series, by removing serial correlation through a prewhitening process. It consists of evaluating the trend at a regional scale comprised of the
neighborhood around each pixel, providing a smoother picture of the trend changes but without completely avoiding the final discretization of images.
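The pixel-by-pixel nature of the classical test can be seen in a minimal Python sketch. The function below takes the NDVI series of one pixel and returns the Mann–Kendall statistic S, its normal-approximation Z score, and a two-sided p-value. It ignores ties and serial correlation and does not implement the contextual variant; the function name is illustrative.

```python
import math


def mann_kendall(series):
    """Two-sided Mann-Kendall trend test (normal approximation, no tie correction)."""
    n = len(series)
    s = 0
    # S counts concordant minus discordant pairs over all i < j.
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity correction of one unit towards zero.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return s, z, p_value


if __name__ == "__main__":
    pixel_series = [0.31, 0.33, 0.32, 0.36, 0.38, 0.37, 0.41, 0.43, 0.42, 0.45]
    print(mann_kendall(pixel_series))
```

Applying this function independently to every pixel is what produces a discrete map: nothing in the statistic links the result at one pixel to its neighbours.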
To obtain a smooth map of the NDVI trend changes, only a few alternatives are found in the spatio-temporal literature. For example, Xu et al. [16
] proposed a spatio-temporal iteration method to reconstruct contaminated pixels of the MODIS13Q1 NDVI time series dataset. The method is not stochastic but based on numerical approximations. The authors first estimate the NDVI of contaminated pixels through linear interpolation of adjacent high-quality pixels; the NDVIs of the remaining contaminated pixels are then determined from the NDVI of a high-quality pixel located in the same ecological zone and showing the most similar NDVI change trajectory. They iterate the process, using the estimated NDVIs as high-quality pixels to predict the undetermined NDVIs of contaminated pixels, until the NDVIs of all contaminated pixels are estimated. The well-known proposal by Eklundh and Jönsson [17
] provides the TIMESAT [18
] free program, designed primarily for analyzing the time series of satellite data. It uses an adaptive Savitzky–Golay filtering and methods based on upper envelope weighted asymmetric Gaussian and double logistic model functions. This program can be downloaded from [19
] and it has been used in this paper for comparison purposes.
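To illustrate the kind of purely temporal smoothing involved, the Python sketch below applies a fixed five-point quadratic Savitzky–Golay filter to a single pixel's series. This is not TIMESAT's adaptive implementation: the fixed window, the choice to leave the two values at each end unchanged, and the function name are all assumptions made for illustration.

```python
# Standard convolution weights for a 5-point quadratic/cubic Savitzky-Golay filter.
SG5_WEIGHTS = (-3.0, 12.0, 17.0, 12.0, -3.0)
SG5_NORM = 35.0


def savgol5(series):
    """Smooth a series with a 5-point Savitzky-Golay filter.

    Assumption: the first and last two values are returned unchanged,
    since the symmetric window does not fit there.
    """
    smoothed = list(series)
    for i in range(2, len(series) - 2):
        window = series[i - 2:i + 3]
        smoothed[i] = sum(w * y for w, y in zip(SG5_WEIGHTS, window)) / SG5_NORM
    return smoothed


if __name__ == "__main__":
    noisy = [0.30, 0.34, 0.29, 0.40, 0.45, 0.41, 0.50, 0.48, 0.52]
    print([round(v, 3) for v in savgol5(noisy)])
```

The filter fits a local polynomial in time only, so each pixel is smoothed without reference to its spatial neighbours.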
The use of stochastic spatio-temporal models (see [20
]) is scarce with satellite data. Hengl et al. [21
] use spatio-temporal regression kriging for smoothing land surface temperature data from MODIS MOD11A2. The time series consists of 46 daytime and nighttime eight-day composite land surface temperature (LST) images from 2008 and the ground data of 159 Croatian meteorological stations. The difficulty of this method lies in fitting the variogram needed to model the spatio-temporal dependence, a task that becomes harder as the number of periods and stations increases. As an alternative, we propose a stochastic state-space model that simultaneously exploits dependencies across space and time. Figure 2
shows a graphical summary of the procedure followed in the paper.
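To give a flavour of what a state-space smoother does along the time axis, the Python sketch below runs a scalar local-level Kalman filter over one pixel's NDVI series. It is not the model proposed in this paper, which also exploits spatial dependence and covariates; the function name, the variance parameters, and the convention of marking missing observations as None are assumptions.

```python
def local_level_filter(observations, obs_var, state_var,
                       init_mean=0.0, init_var=1e6):
    """Filtered means of a local-level model: y_t = mu_t + e_t, mu_t = mu_{t-1} + w_t.

    Missing observations (None) skip the update step, so the previous
    filtered mean is carried forward with inflated uncertainty.
    """
    mean, var = init_mean, init_var
    filtered = []
    for y in observations:
        var = var + state_var              # prediction step
        if y is not None:
            gain = var / (var + obs_var)   # Kalman gain
            mean = mean + gain * (y - mean)
            var = (1.0 - gain) * var
        filtered.append(mean)
    return filtered


if __name__ == "__main__":
    series = [0.42, 0.45, None, 0.80, 0.47, 0.50]
    print([round(m, 3) for m in local_level_filter(series, 0.01, 0.001)])
```

The ratio of `state_var` to `obs_var` controls how strongly isolated extreme values are pulled back towards the underlying level.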
Data from the third generation of the Global Inventory Modeling and Mapping Studies Normalized Difference Vegetation Index (GIMMS NDVI3g) between 2011 and 2013 are used in this paper to analyse the spatio-temporal NDVI distribution in continental Spain. GIMMS NDVI3g data are bi-weekly composite NDVI data, and the composite images are obtained by Maximum Value Compositing (MVC). The NDVI3g product has been shown to be more accurate than the earlier GIMMS NDVI data for monitoring vegetation activity and phenological change [22
]. More details on GIMMS NDVI3g can be found in [23
]. Figure 3
shows the GIMMS NDVI3g image over the Earth in the first fifteen days of October 2011. The GIMMS NDVI3g time series is an improved NDVI data set produced from Advanced Very High Resolution Radiometer (AVHRR) instruments onboard NOAA satellites, and it extends from 1981 to the present. It has been widely used in recent years, for example in [24
]. GIMMS NDVI3g data can be downloaded from [26
]. The data have flags accounting for additional information about the pixel quality. These flags can vary between 1 and 7, where 1 or 2 indicates good quality, numbers between 3 and 6 indicate different kinds of processing, and 7 indicates missing data.
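The flag scheme described above can be written as a small lookup. The Python sketch below groups flag values into the three categories given in the text and keeps only good-quality pixels; the function names and the representation of a pixel as an (NDVI, flag) pair are hypothetical.

```python
def flag_category(flag):
    """Group a GIMMS NDVI3g quality flag (1..7) as described in the text."""
    if flag in (1, 2):
        return "good"
    if 3 <= flag <= 6:
        return "processed"
    if flag == 7:
        return "missing"
    raise ValueError(f"flag outside the documented range 1..7: {flag}")


def good_quality(pixels):
    """Keep the NDVI values of (ndvi, flag) pairs whose flag is 1 or 2."""
    return [value for value, flag in pixels if flag_category(flag) == "good"]


if __name__ == "__main__":
    sample = [(0.62, 1), (0.15, 4), (0.48, 2), (None, 7)]
    print(good_quality(sample))
```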
The spatial resolution of these data is 8 km at the equator, but the data have been corrected for calibration, view geometry, volcanic aerosols, and other effects not related to vegetation changes, which sometimes yields unrealistic NDVI values when the index is downscaled to smaller regions [27
]. Figure 4
shows the 72 scenes of original GIMMS NDVI3g data cropped to continental Spain from January 2011 to December 2013 and plotted in the free statistical software R [28
]. In particular, the gimms library [29
] has been used for reading the images in R, although this can also be done with the raster library [30
]. According to this figure, western and northern Spanish regions reach the maximum limit of NDVI even in summer, which is usually the driest season. This is unlikely in this country, particularly in the central western regions.
In this paper, climate data from the Climatic Research Unit (CRU) are additionally used as auxiliary information in the stochastic space-time model to calibrate satellite data. CRU data are the result of processed meteorological data that can be downloaded from [31
]. This is a gridded climate data set of monthly observations taken at meteorological stations across the world's land areas and referred to as CRU TS3.10. Station anomalies were interpolated into 0.5 degree latitude/longitude grid cells covering the global land surface (excluding Antarctica) and combined with an existing climatology database to obtain absolute monthly values. Detailed information can be found in [32
]. Figure 5
(left) shows the grid locations of the CRU TS3.10 data from which the auxiliary meteorological information is drawn. This database contains the following auxiliary variables:
Label  Variable                                   Units                  Scaling
cld    cloud cover                                percentage (%)         x 10
dtr    diurnal temperature range                  degrees Celsius        x 10
frs    frost day frequency                        days                   x 100
pet    potential evapotranspiration               millimetres per day    x 10
pre    precipitation                              millimetres per month  x 10
tmp    daily mean temperature                     degrees Celsius        x 10
tmn    monthly average daily minimum temperature  degrees Celsius        x 10
tmx    monthly average daily maximum temperature  degrees Celsius        x 10
vap    vapour pressure                            hectopascals (hPa)     x 10
wet    wet day frequency (rain days per month)    days                   x 100
In this list, only
variables are used because
can be derived from the rest, and the stochastic spatio-temporal models require independent auxiliary variables for avoiding multicollinearity [33
]. The six chosen variables will be called covariates hereafter.
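The scaling column of the variable list can be handled with a simple lookup. The Python sketch below assumes that "x 10" and "x 100" mean the stored integer is the physical value multiplied by that factor, so decoding divides by it; this reading, and the function name, are assumptions rather than something stated in the text.

```python
# Scale factors copied from the variable list; the interpretation
# (stored value = physical value * factor) is an assumption.
CRU_SCALE = {
    "cld": 10, "dtr": 10, "frs": 100, "pet": 10, "pre": 10,
    "tmp": 10, "tmn": 10, "tmx": 10, "vap": 10, "wet": 100,
}


def decode_cru(variable, stored_value):
    """Convert a stored CRU value to physical units using its scale factor."""
    try:
        factor = CRU_SCALE[variable]
    except KeyError:
        raise ValueError(f"unknown CRU variable label: {variable!r}")
    return stored_value / factor


if __name__ == "__main__":
    print(decode_cru("tmp", 153))   # stored 153 read as 15.3 degrees Celsius
    print(decode_cru("wet", 1250))  # stored 1250 read as 12.5 days
```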
From the GIMMS NDVI3g data, we randomly choose
locations among those with good flag attributes (indicating high quality). These locations are plotted on Figure 5
right. At these locations, we extract the meteorological information of the six covariates. As the temporal resolution of the CRU data differs from that of the GIMMS NDVI3g data (monthly versus bi-monthly, i.e., twice-monthly), we transformed the CRU monthly data into bi-monthly data. In particular,
remain invariant in the corresponding fifteen days, but
are divided by two. Next, the CRU covariates and the altitude of the sampled locations are organized in a
matrix. The first column corresponds to the height values of the n
sampled observations, and the rest are blocks of 72 periods by six covariates. The number of sampled locations has been chosen after checking different sizes between 300 and 1000 locations. From 300 locations onwards, similar results have been obtained. This number is closely related to the resolution of the meteorological data, because these data must be drawn at the sampled locations, yet only 211 pixels of the CRU TS3.10 data lie inside continental Spain. This means that only 211 different sets of covariates are available for use in the model, and, consequently, negligible differences in the model coefficient estimates are found when increasing the number of sampled locations.
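The resampling and the layout of the covariate matrix can be sketched in Python as follows. One altitude column plus six covariates over 72 periods gives 1 + 6 x 72 = 433 columns per location. Which covariates are kept invariant and which are halved is not recoverable from the text as extracted, so the split below (temperature kept, precipitation halved) is purely illustrative, as are the function names and the covariate-by-covariate column order.

```python
def to_half_monthly(monthly, halve):
    """Expand monthly values to two half-month values each.

    halve=False repeats the monthly value (e.g. an average);
    halve=True splits it in two (e.g. a monthly total).
    """
    expanded = []
    for value in monthly:
        half = value / 2 if halve else value
        expanded.extend([half, half])
    return expanded


def design_row(altitude, covariate_series):
    """Build one matrix row: altitude first, then one block per covariate.

    Assumption: blocks are ordered covariate by covariate.
    """
    row = [altitude]
    for series in covariate_series:
        row.extend(series)
    return row


if __name__ == "__main__":
    months = 36  # three years of monthly CRU data
    temperature = to_half_monthly([15.0] * months, halve=False)    # illustrative
    precipitation = to_half_monthly([40.0] * months, halve=True)   # illustrative
    others = [[0.0] * 72 for _ in range(4)]  # placeholders for the other covariates
    row = design_row(650.0, [temperature, precipitation] + others)
    print(len(temperature), len(row))
```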
Classical statistical tools are used for checking the statistical significance of the model coefficients. Table 1
shows the estimates, the standard errors, the t
-values, and the confidence intervals of the state-space model coefficients. Standard errors are obtained by bootstrapping 10 replicates, but similar results are obtained when increasing the number of replicates. All the coefficients are statistically significant because none of the confidence intervals contains zero, except for the
variable. Different random sets of 561 sampled locations have been tried with similar results. In some cases, the
covariate is statistically significant but with a very small estimate. Therefore, this covariate has been kept in the model, although we know that it has a negligible impact on the predictions. The signs of the estimates allow us to conclude that NDVI is positively correlated with altitude, precipitation, and number of cloud days. However, NDVI decreases when maximum temperature or vapour pressure increases, as expected. Meteorological covariates have been divided by 100 and altitude by 1000 because scaling the covariates helps to avoid singularities when inverting matrices. Maximum temperature could be substituted by the average or minimum temperature without significantly altering the model estimation and the predictions. The model has been statistically validated by testing the normality of the residuals.
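The bootstrap logic behind the standard errors and confidence intervals can be sketched in Python on a much simpler estimator. The example below resamples (x, y) pairs and recomputes an ordinary least-squares slope; it stands in for, and is not, the state-space coefficient estimation. The percentile interval, the default number of replicates, and all names are assumptions.

```python
import random
import statistics


def slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slope(xs, ys, replicates=200, seed=0):
    """Return (estimate, bootstrap standard error, 95% percentile interval)."""
    rng = random.Random(seed)
    n = len(xs)
    draws = []
    while len(draws) < replicates:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if max(bx) == min(bx):
            continue  # degenerate resample: slope undefined, draw again
        by = [ys[i] for i in idx]
        draws.append(slope(bx, by))
    draws.sort()
    se = statistics.stdev(draws)
    lower = draws[int(0.025 * (replicates - 1))]
    upper = draws[int(0.975 * (replicates - 1))]
    return slope(xs, ys), se, (lower, upper)


if __name__ == "__main__":
    rng = random.Random(1)
    altitude = [a / 1000 for a in range(0, 2000, 50)]  # scaled as in the text
    ndvi = [0.3 + 0.1 * a + rng.gauss(0, 0.02) for a in altitude]
    estimate, se, (lo, hi) = bootstrap_slope(altitude, ndvi)
    print(f"slope={estimate:.3f}, se={se:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

A coefficient is then read as significant when the interval excludes zero, which mirrors the criterion used with Table 1.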
To check the model, we first compare sampled versus predicted data, both in a single period for all of the 561 locations and, separately, in every one of the 72 bi-monthly periods. The overall summary of the sampled and predicted values of NDVI from 2011 to 2013 is shown in Table 2
, where we can observe that the model provides not only the same average for sampled and predicted values, but also similar quantile values. As expected, the smoothing process trims the most extreme values. The state-space model predictions follow the pattern of the GIMMS NDVI3g data not only over the whole period (2011–2013), but also in every one of the 72 bi-monthly periods, as shown in the histograms of sampled NDVI values (Figure 7
) and the corresponding predictions (Figure 8
). Similarity between these figures is evident. In addition, Figure 9
plots sampled versus predicted NDVI data in the 72 periods, also exhibiting close agreement between them. Therefore, the good performance of the model on sampled data is shown not only in summary statistics but also across the sampled locations. Next, ordinary kriging was applied in every one of the 72 bi-monthly periods to obtain an image of the whole of continental Spain. The geostatsp library [42
] has been used in this step. Figure 10
shows the monthly predictions obtained by averaging the bi-monthly predictions. To complete the validation process, we compare these results to the documented information retrieved from the Spanish National Agency of Meteorology (AEMET) [43
] and the Spanish CRU TS3.10 meteorological data.
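The step from bi-monthly to monthly predictions is a pairwise average. A minimal Python sketch for one pixel is given below; it assumes the 72 predictions are ordered in time so that consecutive pairs belong to the same month, and the function name is illustrative.

```python
def monthly_means(half_monthly):
    """Average consecutive pairs of half-month predictions into monthly values."""
    if len(half_monthly) % 2 != 0:
        raise ValueError("expected an even number of half-month values")
    return [
        (half_monthly[i] + half_monthly[i + 1]) / 2
        for i in range(0, len(half_monthly), 2)
    ]


if __name__ == "__main__":
    predictions = [0.40 + 0.001 * k for k in range(72)]  # toy series for one pixel
    monthly = monthly_means(predictions)
    print(len(monthly), round(monthly[0], 4))
```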
Spain is the fifth largest country in Europe, with an area of 505,000 km² and an average altitude of 650 m, which makes it the third highest country in Europe. It has three climatological regions: the Mediterranean region, with dry and warm summers and cool to mild, wet winters; the oceanic region, located in the north of Spain and characterised by relatively mild winters and warm summers; and the semiarid region, located in the southeastern part of the country, where, in contrast to the Mediterranean region, the dry season continues beyond the end of the summer. This climatology affects the country's vegetation, and differences can be appreciated among and within seasons.
AEMET reveals that the year 2011 was extremely hot with higher temperatures than the historical average (1971–2000). It was also very dry with 25% less rainfall in the North of Spain; however, spring was more humid than normal, particularly in March. The autumn rainfall was 10% lower than usual. The meteorological information drawn from the CRU TS3.10 data is summarized in Figure 11
. On Figure 11
left, monthly average temperatures are shown, and, on the right panel, the corresponding monthly average rainfall is given. Different colors are used for the different years and the historical mean is plotted in black in both panels. In the spring of 2011, high temperatures and abundant rainfall were also reported, yet the autumn was also very dry. Figure 10
shows the monthly NDVI predictions for Spain obtained by averaging the bi-monthly predictions given by the state-space model. In 2011, low NDVI values are estimated in autumn and very high values in spring, in agreement with AEMET and CRU TS3.10 data. The year 2012 was also very hot, especially in summer, and rainfall was 15% less than usual, except in autumn and in the region of Galicia, located in the northwest of Spain, which was extremely humid. These features are also observed in Figure 10
, where a blue color is observed in December 2012 in Galicia, a brown color predominates in the main plateau of Spain, and northern regions show high values of NDVI, particularly in spring.
The year 2013 was hot, but not as hot as 2011 and 2012. January and February were 30% more humid than normal, and March was extremely humid, with more than 340% more rain than the normal average. However, December was very dry. In Figure 11
, CRU TS3.10 data also show a large peak of rainfall in winter that corresponds to high values of smoothed NDVI in spring.
In summary, smoothed NDVI reveals a clear seasonality that intensifies the effect of spring vegetation in 2011, 2012 and 2013, when a higher level of rainfall than average is documented. The images preserve the pattern of the original ones but reduce the larger values of NDVI. As expected, the northern regions of Spain maintain higher values, around 0.8 and 0.9, mainly in spring and early summer when temperatures and rainfall are more intense. Mountainous regions are also prone to the highest values, and the main plateau reaches values between 0.3 and 0.5, indicating the presence of bare soils or sparse vegetation. Therefore, the smoothed NDVI obtained through the state-space model is close to the real climatological scenario in Spain between 2011 and 2013. Overall, smoothed images are more sensitive to seasonal and specific meteorological changes than the original ones.
Checking the performance of the smoothed NDVI with the real data is a difficult task because the NDVI is only estimated through satellite images. In this regard, comparisons of the mean estimated surfaces in four categories of NDVI are presented in Table 3
: ndvi1 for data less than or equal to 0.2, ndvi2 for data greater than 0.2 and less than or equal to 0.5, ndvi3 for data greater than 0.5 and less than or equal to 0.7, and ndvi4 for data greater than 0.7. The mean total surfaces have been calculated with the raw GIMMS NDVI3g images, the state-space smoothed NDVI values, and three versions of the TIMESAT smoothed NDVI values from 2011–2013. The smoothing effect of the state-space model is mainly seen in the ndvi1 and ndvi4 categories, where the smoothed NDVI mean total surfaces are lower than the raw averages. These reductions have been added to the ndvi2 and ndvi3 categories. The three TIMESAT versions behave similarly, providing values close to those obtained with the original images in the ndvi2 and ndvi3 categories, but important differences are found in the remaining categories. Figure 12
shows the monthly mean surfaces, by year, of the raw GIMMS NDVI3g data and three smoothed versions: the state-space model and two versions of TIMESAT, the Savitzky–Golay filtering and the Gaussian filtering. The double logistic smoothing version of TIMESAT has been omitted because it is equal to the Gaussian version. The state-space approach follows the same pattern as the original data, but we can see how the ndvi1 category is smoothed mainly in winter and the ndvi4 category in spring and winter. The TIMESAT smoothing versions do not preserve the pattern of the raw data well in the smallest category, where larger differences than with the state-space procedure can be found. In summary, the state-space approach preserves the monthly pattern of the raw data by year and smooths mainly the lowest and highest categories. Additionally, this approach incorporates external information coming from CRU TS3.10 meteorological data and agrees with the information provided by the Spanish National Agency of Meteorology.
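The surface comparison rests on assigning each pixel to one of the four categories and summing pixel areas. The Python sketch below uses the category limits given in the text; the constant pixel area of 64 km² (a nominal 8 km by 8 km cell) is a simplifying assumption, since the true area varies with latitude, and the function names are illustrative.

```python
PIXEL_AREA_KM2 = 64.0  # assumption: nominal 8 km x 8 km pixel


def ndvi_category(value):
    """Assign an NDVI value to ndvi1..ndvi4 using the limits in the text."""
    if value <= 0.2:
        return "ndvi1"
    if value <= 0.5:
        return "ndvi2"
    if value <= 0.7:
        return "ndvi3"
    return "ndvi4"


def surface_by_category(values, pixel_area=PIXEL_AREA_KM2):
    """Total surface (km^2) per category for one image; None marks missing pixels."""
    totals = {"ndvi1": 0.0, "ndvi2": 0.0, "ndvi3": 0.0, "ndvi4": 0.0}
    for value in values:
        if value is None:
            continue
        totals[ndvi_category(value)] += pixel_area
    return totals


if __name__ == "__main__":
    image = [0.10, 0.20, 0.30, 0.60, 0.80, None]
    print(surface_by_category(image))
```

Averaging these totals over the images of a month or a year gives mean surfaces comparable in form to those reported for the raw and smoothed data.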
GIMMS NDVI3g data have been widely used during the last decades for studying large-scale trend changes over the years, mainly over continental or semi-continental regions. The latest version of the GIMMS NDVI data spans the period July 1981 to December 2011 and is termed NDVI3g, but, in this paper, only the last three years are used. The time window of 72 images between 2011 and 2013 has been chosen for two main reasons. The first is the computational burden of estimating the model, which grows as the period is enlarged; the second is that auxiliary data at the same resolution are also needed, and these are difficult to find when going back a long time. Nevertheless, higher resolutions can also be considered, and future work is needed to encompass more years in the proposed model.
The actual resolution of 8 km at the equator is an attractive feature for monitoring changes of vegetation at any scale. Unfortunately, this resolution is not enough to guarantee high-precision images at smaller scales because the images have been pre-processed and, most likely, there is also an important ocean border effect, as in the case of Spain. The Maximum Value Compositing (MVC) algorithm used to suppress atmospheric effects also minimizes significant problems associated with short-wave passive remote sensing of the Earth’s surface, but the MVC technique itself has generated a second level of problems that must be addressed for proper interpretation of the NDVI MVC images. These are radiometric effects, which are relevant to the stratification assumption, and engineering effects, which are relevant to the MVC technique [44
]. Similar situations can also be found with other NDVI global scenes coming from Terra MODIS or SPOT VGT (see, for example, [45
], where long-term vegetation trends derived from these satellites are evaluated, revealing differences among them). High bias can also be found when using MVC in mountain regions (see [46
]). Therefore, when down-scaling global scenes to country levels, as in the case of GIMMS NDVI3g in Spain, an adequate smoothing of NDVI data is needed for a proper interpretation of the spatio-temporal NDVI distribution. Similar situations can be found with other image processing techniques that may require smoothing procedures to analyse the data properly.
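The basic idea of Maximum Value Compositing can be sketched in a few lines of Python: for each pixel, the composite keeps the largest NDVI observed during the compositing window, on the premise that clouds and atmospheric effects tend to lower NDVI. The data layout (one list of observations per pixel, with None for unusable observations) and the function name are assumptions, and operational MVC processing involves more than this.

```python
def max_value_composite(observations_per_pixel):
    """Return, for each pixel, the maximum valid NDVI in the compositing window.

    A pixel with no valid observation in the window yields None.
    """
    composite = []
    for observations in observations_per_pixel:
        valid = [v for v in observations if v is not None]
        composite.append(max(valid) if valid else None)
    return composite


if __name__ == "__main__":
    window = [
        [0.21, 0.55, None, 0.48],  # cloudy days depress or remove values
        [None, None, None, None],  # no usable observation
        [0.10, 0.08, 0.12, 0.11],
    ]
    print(max_value_composite(window))
```

Because the operator always selects the upper extreme, any positive error in a single observation propagates into the composite, which is one way to see why high values may need smoothing afterwards.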
The main aim of this paper is to show the importance of considering both the spatial and the temporal dependence when analysing and smoothing NDVI data. The stochastic spatio-temporal model used here is a useful tool to capture space and time variability while smoothing images. Smoothed images have been compared with those from TIMESAT, which uses only temporal dependence. The state-space method outperforms this alternative, as it is able to reduce the most extreme values while preserving the original pattern of the raw data. The state-space model also provides the contribution of every covariate to the NDVI predictions. In this regard, it agrees with other studies such as [47
], where it is shown that, among climatic factors, precipitation and temperature influence both temporal and spatial patterns of NDVI.
However, there is an inherent difficulty in checking the performance of the stochastic model. Validation based on comparisons between sampled and predicted data is a common approach that evaluates the goodness of fit of the model, yet it is not enough to guarantee good performance when predicting new data. In this paper, this approach is satisfactory with the following limitation: sampled data are not necessarily the real data, so small differences between predicted and sampled data do not necessarily correspond to high-quality predictions. Prediction quality can only be assessed against previously documented vegetation changes. In the Results section, a detailed exploration of documented meteorological data shows a close agreement between the historical information and the smoothed NDVI images. Comparisons of smoothed trend changes with other studies are scarce. The most relevant is [51
], where the authors investigate the NDVI trend changes that occurred in the Iberian Peninsula between 1981 and 2001 using GIMMS NDVI3g data, but with a pixel-by-pixel approach.