# Object-Based Classification of Grasslands from High Resolution Satellite Image Time Series Using Gaussian Mean Map Kernels

Dynafor, University of Toulouse, INRA, INPT, INPT-EI PURPAN, 31326 Castanet Tolosan, France

Team Mistis, INRIA Rhône-Alpes, LJK, 38334 Montbonnot, France

Author to whom correspondence should be addressed.

Received: 26 April 2017 / Revised: 12 June 2017 / Accepted: 29 June 2017 / Published: 4 July 2017

(This article belongs to the Section Remote Sensing in Agriculture and Vegetation)

This paper deals with the classification of grasslands using high resolution satellite image time series. Grasslands considered in this work are semi-natural elements in fragmented landscapes, i.e., they are heterogeneous and small elements. The first contribution of this study is to account for grassland heterogeneity while working at the object level by modeling the distribution of its pixels as a Gaussian distribution. To measure the similarity between two grasslands, a new kernel is proposed as a second contribution: the $\alpha$-Gaussian mean kernel. It allows one to weight the influence of the covariance matrix when comparing two Gaussian distributions. This kernel is introduced in support vector machines for the supervised classification of grasslands from southwest France. A dense intra-annual multispectral time series of the Formosat-2 satellite is used for the classification of grasslands’ management practices, while an inter-annual NDVI time series of Formosat-2 is used for old and young grasslands’ discrimination. Results are compared to other existing pixel- and object-based approaches in terms of classification accuracy and processing time. The proposed method is shown to be a good compromise between processing speed and classification accuracy. It can adapt to the classification constraints, and it encompasses several similarity measures known in the literature. It is appropriate for the classification of small and heterogeneous objects such as grasslands.

Grasslands are semi-natural elements that represent a significant source of biodiversity in farmed landscapes [1,2,3,4]. They provide many ecosystem services such as carbon storage, erosion regulation, food production, crop pollination and biological regulation of pests [5], which are linked to their plant and animal composition.

Different factors impact grassland biodiversity conservation. Among them, the age of a grassland (i.e., the time since last ploughing/sowing) is directly related to its plant and animal composition. Old “permanent” grasslands, often called semi-natural grasslands, hold a richer biodiversity than temporary grasslands [2,6,7,8]. Indeed, they had time to establish and stabilize their vegetation cover, in contrast to temporary grasslands, which are part of a crop rotation. Additionally, agricultural management of grasslands (i.e., mowing, grazing, fertilizing, reseeding, etc.) influences their structure and composition [9,10,11,12]. Management is essential for their biodiversity conservation because it prevents woody establishment. Conversely, an intensive use constitutes a threat to this biodiversity [12,13]. Therefore, it is important to know the age of a grassland and to identify the management practices in order to monitor their effect on biodiversity and related services. However, these factors are defined at different temporal scales: over the years for the age of a grassland and during a vegetation season (i.e., a year) for the management practice.

Usually, ecologists and agronomists characterize grasslands at the parcel scale through field surveys. However, these surveys require substantial human and material resources, the knowledge of the assessor and a sampling strategy, which make them expensive and time consuming [14]. They are thus limited in spatial extent and in temporal frequency, limiting grassland characterization to a local scale and over a short period of time.

Conversely, remote sensing offers the possibility to provide information on landscapes over large extents, thanks to the broad spatial coverage and regular revisit frequency of satellite sensors [15]. In this context, satellite images have already proven to be an appropriate tool to monitor vegetation over large areas with a high temporal resolution.

In the remote sensing literature, grasslands have been studied relatively little compared to other land covers like crops or forest [16]. Most of the studies focusing on grasslands have agronomic applications, such as estimating biomass productivity and growth rate [17,18,19] or deriving biophysical parameters like the Leaf Area Index (LAI), the Fraction of Photosynthetically Active Radiation (fPAR) and the chlorophyll content [20,21,22,23,24]. Studies with biodiversity conservation objectives, such as assessing plant diversity and plant community composition in a grassland, are usually based on ground spectral measurements or airborne acquisitions at a very high spatial resolution [25,26,27,28,29,30,31]. However, such acquisitions are time consuming and expensive, and thus, they do not allow for continuous monitoring of grasslands over the years.

Using satellite remote sensing images, grasslands have been widely studied at a regional scale with medium spatial resolution sensors (e.g., MODIS, 250 m/pixel [17,18,32]), where the Minimum Mapping Unit (MMU) is at least hundreds of meters. This scale is suitable for large, extensive, homogeneous and contiguous regions like steppes [33], but not for fragmented landscapes, which are usually found in Europe and in France particularly [34,35]. These fragmented landscapes are made of a patchwork of different land covers, which have a small area [35]. In these types of landscapes, grasslands can be smaller (less than 10,000 m^{2}) than the pixel resolution [36] (see Figure 1 for a graphical example). As a consequence, pixels containing grasslands are usually a mixture of other contributions, which can limit the analysis [37,38]. As examples, Poças et al. [39] had to select large contiguous areas of semi-natural grasslands in a mountain region of Portugal to be able to use SPOT-VEGETATION data (1-km resolution). Halabuk et al. [40] also had to select only one MODIS pixel per homogeneous sample site in Slovakia to detect cutting in hay meadows. A 30-m pixel resolution is still not sufficient for grassland characterization. Indeed, Lucas et al. [41] and Toivonen and Luoto [42] showed that it was more difficult to classify fragmented and complex elements [43], like semi-natural grasslands, than homogeneous habitats, using Landsat imagery. Price et al. [44] classified six grassland management types in Kansas using six Landsat images, but the accuracy of the classification was not satisfactory (less than 70%). Therefore, to detect small grasslands in fragmented landscapes, high spatial resolution images are required [36,45,46].

For high spatial resolution images (about 10 m/pixel), few intra-annual images are usually available for a given location [47]. However, Buck et al. [48] concluded that three RapidEye images per year were not enough to detect the mowing practices in grasslands. This was confirmed by Franke et al. [49], who classified grassland use intensity into four categories: semi-natural grassland, extensively-used grassland, intensively-used grassland and tilled grassland. They increased the classification accuracy when increasing the number of RapidEye images from three to five scenes. Additionally, Schmidt et al. [50] concluded that about seven to ten images, depending on the vegetation index used, are a good tradeoff between the amount of satellite data and classification accuracy of grassland use intensity. Some works report results with few images per year, such as Dusseux et al. [51], but they worked on LAI. In their study for mapping grassland habitat using intra-annual RapidEye imagery, Schuster et al. [52] concluded that the more acquisition dates are used, the better the mapping quality.

Given the heterogeneity of grasslands in fragmented landscapes, their phenological cycle and the punctual nature of anthropogenic events (e.g., mowing), dense high spatial resolution intra-annual time series are necessary to identify the grassland management types [36,52,53,54]. Moreover, to discriminate semi-natural grasslands from temporary grasslands, inter-annual time series are necessary. Until recently, satellite missions offering high revisit frequency (1–16 days) had coarse spatial resolution (e.g., NOAA AVHRR, 1 km; MODIS, 250/500 m). Conversely, high spatial resolution missions did not provide dense time series and/or were costly (e.g., QuickBird, RapidEye). For these reasons and compared to crops, grasslands’ differentiation through Earth observations is still considered a challenge [52]. However, new missions like Sentinel-2 [55], with a very high revisit frequency (five days) and high spatial resolution (10 m in four spectral channels, 20 m in six channels), provide new opportunities for grasslands’ monitoring over the years in fragmented landscapes [54] at no cost, thanks to the ESA free data access policy. For instance, the high spatial resolution is assumed to make possible the identification of grassland-only pixels in the image, and several pixels can belong to the same grassland plot. Hence, the analysis can be done at the object level, not at the pixel level, which is suitable for landscape ecologists and agronomists who usually study grasslands at the parcel scale [56]. Thus, object-oriented approaches are more likely to characterize grasslands ecologically [57,58]. Yet, many works consider pixel-based approaches without any spatial constraints [17,42,44,48,49,52,59].

At the object level, grasslands are commonly represented by their mean NDVI [18]. However, such a representation might be too simple since it does not account for the heterogeneity in a grassland. Sometimes, distributions of pixels as individual observations are still better than the mean value to represent grasslands, as in [54]. Lucas et al. [41] used a rule-based method on segmented areas for habitat mapping, but it did not work well on complex and heterogeneous land covers. Esch et al. [60] also used an object-oriented method on segmented elements then represented by their mean NDVI. These methods based on mean modeling do not capture grasslands’ heterogeneity well. Other representations can be found in the literature, taking the standard deviation and object texture features as variables [61], but they were not applied to time series. To our knowledge, these methods do not use the high spatial and the high temporal resolutions jointly. Moreover, all of these studies used vegetation indices as a variable, although it has been shown that classification results are better when using more spectral information [35,62].

To deal with the high spatio-spectro-temporal resolutions new satellite sensors are now offering, dimension reduction is usually performed through the use of a vegetation index such as NDVI [50,52,63,64], PCA [65] or spectro-temporal metrics [35,66]. However, a large amount of spectro-temporal information is lost with these solutions. Franke et al. [49] developed an indicator of the spectral variability of a pixel over the time series, the mean absolute spectral dynamics, but its efficiency was assessed using a decision tree algorithm. Decision trees are usually not recommended because they tend to over-fit the data [67]. Therefore, the high spatio-spectro-temporal resolutions have not really been addressed in the literature of remote sensing classification. Indeed, such time series bring new methodological and statistical constraints given the high dimension of data (i.e., number of pixels and number of spectral and temporal measurements). Dealing with more variables increases the number of parameters to estimate, increasing the computation time and making the computation unstable (i.e., ill-conditioned covariance matrices, etc.) [68,69]. Hence, conventional models are not appropriate if one wants to use all of the spectro-temporal information of time series with high spatial and temporal resolutions. Thus, classifying grasslands with this type of data is still considered as a challenge [52].

In the present study, we introduce a model suitable for the classification of grasslands using Satellite Image Time Series (SITS) with a high number of spectro-temporal variables (e.g., Sentinel-2 data). Two temporal scales are considered in this work: (i) an inter-annual time series of three years to discriminate old grasslands from young grasslands and (ii) an intra-annual time series to identify the management practices. Note that in this work, the objects are not found from segmentation [38], but from the existing dataset in a polygon form.

The first contribution of this study is to model a grassland at the object level while accounting for the spectral variability within a grassland. We consider that the distribution of the pixel spectral reflectance in a given grassland can be modeled by a Gaussian distribution. The second contribution is to propose a measure of similarity between two Gaussian distributions that is robust to the high dimension of the data. This method is based on the use of covariance through mean maps. The last contribution is the application of the method to the discrimination of old and young grasslands and to the classification of management practices, which are uncommon applications in remote sensing. Moreover, to our knowledge, mean maps have not yet been used on Gaussian distributions for supervised classification of SITS at the object level.

In the next section, the materials used for the experimental part of this study are presented. Then, the methods, including the different types of grassland modeling and the measures of similarity between distributions, are introduced in Section 3. Following that, we experiment with the proposed methods on the classification of a real dataset in Section 4. Finally, conclusions and prospects are given in Section 5.

The study site is located in southwest France, near the city of Toulouse (about 30 km), in a semi-rural area (center coordinates: ${43}^{\circ}{27}^{\prime}{36}^{\prime\prime}$N ${1}^{\circ}{8}^{\prime}{24}^{\prime\prime}$E; Figure 2). This region is characterized by a temperate climate with oceanic and Mediterranean influences. The average annual precipitation is 656 mm, and the average temperature is 13 °C. The north of the site, closer to the urban area of Toulouse, is flat, whereas the southwest of the site is hilly. The eastern part corresponds to the Garonne River floodplain, and this location is dominated by crop production. Within this study site, livestock farming is declining in favor of annual crop production. Grasslands are mostly used for forage or silage production. Some grasslands, located in the southwestern part of the area, are pastures for cattle or sheep. The extent of the area is included in the satellite image extent (Figure 2) and is about $24\times 24$ km^{2}.

Time series of Formosat-2 were used in this experiment. Formosat-2 has four spectral bands with an 8-m spatial resolution: B1 “Blue” (0.45–0.52 $\mathsf{\mu}$m), B2 “Green” (0.52–0.6 $\mathsf{\mu}$m), B3 “Red” (0.63–0.69 $\mathsf{\mu}$m), B4 “Near Infra-Red (NIR)” (0.76–0.9 $\mathsf{\mu}$m). The extent of an acquisition is 24 km × 24 km. The images were all acquired with the same viewing angle. They were orthorectified, radiometrically and atmospherically corrected by the French Space Agency (CNES). They were provided by the Center for the Study of the Biosphere from Space (CESBIO) in reflectance with a mask of clouds and shadows produced by the MACCS (Multi-sensor Atmospheric Correction and Cloud Screening) processor [70], in the frame of the Kalideos project.

For the inter-annual analysis, we used all of the acquisitions of the consecutive years 2012 (13 observations), 2013 (17 observations) and 2014 (15 observations) (Figure 3 and Figure S1 in the Supplementary Materials). The acquisitions of the year 2013 and of the year 2014 were used separately for the classification of management practices.

To reconstruct the time series where data were missing (clouds and their shadows), the Whittaker filter [71] was applied pixel-by-pixel on the reflectances in each spectral band for each year independently. The Whittaker filter is a non-parametric filter that has a smoothing parameter that controls the roughness of the reconstructed curve. It has been successfully applied to smooth NDVI time series in the literature [72,73,74,75]. The smoother was adapted for unequally-spaced intervals and accounted for missing data (see [62] for a detailed description of the method). The smoothing parameter was the same for all of the pixels. It was equal to ${10}^{5}$ for the year 2013 and to ${10}^{4}$ for 2012 and 2014, after an ordinary cross-validation done on a subset of the pixels for each year. An example of smoothing on a grassland pixel is provided in Figure 4. This pixel is hidden by a light cloud during one image acquisition (red cross). Notice that the smoothing is done at the cost of under-estimating the local maxima of the temporal profile.
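The idea of a weighted Whittaker smoother can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a regular (e.g., daily) grid on which cloudy or unobserved dates receive a weight of zero, and the function name `whittaker_smooth`, the `order` argument and the dense linear solve are choices made here for readability.

```python
import numpy as np

def whittaker_smooth(y, weights, lam, order=2):
    """Weighted Whittaker smoother on a regular grid (illustrative sketch).

    Minimises sum_t w_t * (y_t - z_t)^2 + lam * ||D z||^2, where D is the
    `order`-th finite-difference operator. Dates with weight 0 (clouds,
    shadows, no acquisition) are interpolated by the roughness penalty.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = y.size
    # Difference operator of shape (n - order, n).
    D = np.diff(np.eye(n), n=order, axis=0)
    # Missing values may be stored as NaN; neutralise them (their weight is 0).
    y_filled = np.where(w > 0, y, 0.0)
    # Normal equations of the penalised least-squares problem.
    return np.linalg.solve(np.diag(w) + lam * D.T @ D, w * y_filled)

# Hypothetical reflectance profile with two masked dates.
t = np.arange(12, dtype=float)
profile = 0.2 + 0.02 * t
profile[[3, 7]] = np.nan
valid = (~np.isnan(profile)).astype(float)
smoothed = whittaker_smooth(profile, valid, lam=1e4)
```

Placing the acquisition dates on a fine regular grid with zero weights elsewhere is one simple way to handle unequal spacing; the adaptation described in [62] may differ in its details.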

For the intra-annual time series, we used all of the spectral information. Therefore, the smoothed time series associated with each of the four spectral bands were concatenated to get a unique time series per pixel. For the inter-annual time series, as using all of the spectral bands would result in too large a number of variables to process, we worked on the NDVI, computed from the red and NIR bands.
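The two feature constructions can be sketched as below, using the standard NDVI definition (NIR − Red)/(NIR + Red). The function names, the array layout (pixels in rows, dates in columns) and the small `eps` guard against division by zero are assumptions made for this illustration.

```python
import numpy as np

def ndvi(red, nir, eps=1e-12):
    """Normalized Difference Vegetation Index from red and NIR reflectances."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    # eps is a hypothetical guard for pixels where both bands are zero.
    return (nir - red) / (nir + red + eps)

def stack_bands(series_by_band):
    """Concatenate per-band series, each (n_pixels, n_T), into (n_pixels, n_B * n_T)."""
    return np.concatenate(series_by_band, axis=1)

# Hypothetical intra-annual example: 4 bands, 17 dates, 5 pixels.
rng = np.random.default_rng(0)
bands = [rng.uniform(0.0, 0.5, size=(5, 17)) for _ in range(4)]
features = stack_bands(bands)            # one row of 4 * 17 values per pixel
ndvi_series = ndvi(bands[2], bands[3])   # B3 = red, B4 = NIR
```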

In this study, “old” grasslands are 14 years old or more, whereas “young” grasslands are less than five years old. The French agricultural land use database (Registre Parcellaire Graphique) was used to extract the grasslands depending on their age. It registers on an annual basis the cultivated areas declared by the farmers in a GIS. Grasslands are declared as “permanent” or “temporary”. Permanent grasslands are at least five years old, whereas temporary grasslands are less than five years old (Commission Regulation EU No. 796/2004). For every plot declared as a grassland in 2014, its age was computed from the previous years’ declarations. We kept only the grasslands that were at least 14 years old in 2014 (“old”) and the grasslands that were less than 5 years old in 2014 (“young”). A negative buffer of 8 m was then applied to all of the polygons to eliminate the edge effects (Figure 5). Then, they were rasterized using the GDAL command gdal_rasterize (http://www.gdal.org/gdal_rasterize.html) to obtain the pixels inside each grassland. Only the grasslands having an area of at least 1000 m^{2} were kept to ensure a minimum number of 16 pixels to represent each grassland. In the end, there were 59 old grasslands (at least 14 years old) and 416 young grasslands (Table 1), for an average area of about 26,600 m^{2}.

Information on the agricultural practices performed in the crops is not recorded in the land use database. Therefore, this dataset comes exclusively from field data. As mentioned in the Introduction, ground data are difficult to obtain in ecology since field work is tedious. A field survey was conducted in May 2015 to determine the past and current management practices of 52 grasslands by interviewing the farmers or grasslands’ owners. The practices remained stable for the years 2013 and 2014. Four management types during a vegetation cycle were identified: one mowing, two mowings, grazing and mixed (mowing then grazing). We eliminated the type “two mowings” from the dataset because of its under-representation (only three grasslands).

The management types were used as classes for the classification (Table 2). The grasslands were digitized manually after field work. A negative buffer of 8 m was then applied to eliminate the edge effects, before rasterizing the polygons. The average grassland surface area is about 10,000 m^{2}. The smallest grassland is 1632 m^{2} (which represents 25 Formosat-2 pixels), and the largest is 47,111 m^{2} (735 pixels) (Figure 6).

In this work, each grassland ${g}_{i}$ is composed of a given number ${n}_{i}$ of pixels ${\mathbf{x}}_{ik}\in {\mathbb{R}}^{d}$, where k is the pixel index such that $k\in \{1,\dots ,{n}_{i}\}$, $i\in \{1,\dots ,G\}$, G is the total number of grasslands, $N={\sum}_{i=1}^{G}{n}_{i}$ is the total number of pixels, $d={n}_{B}{n}_{T}$ is the number of spectro-temporal variables, ${n}_{B}$ is the number of spectral bands and ${n}_{T}$ is the number of temporal acquisitions. In the experimental part, when working on the intra-annual time series of 2013 using the four spectral bands, $d=4\times 17=68$. In 2014, $d=4\times 15=60$. When working on the inter-annual time series using NDVI, $d=1\times $ (13 + 17 + 15) = 45. Each grassland ${g}_{i}$ is associated with a matrix ${\mathbf{X}}_{i}$ of size $({n}_{i}\times d)$ and a response variable ${y}_{i}\in \mathbb{R}$, which corresponds to its class label.

In the following, two types of grassland modeling are discussed, at the pixel level and at the object level. A more informative object level modeling is then proposed. Then, similarity measures are discussed.

The representation of a grassland at the pixel level has been much used in the remote sensing literature [17,42,44,48,49,52,59]. The grassland can either be represented by all of its pixels or by one pixel when the spatial resolution of the pixel is too coarse; see, for instance, [39,40]. In this representation, a sample is a pixel. Therefore, with each ${\mathbf{x}}_{ik}$ is associated the response variable ${y}_{i}$ of ${g}_{i}$, but ${\mathbf{x}}_{ik}$ is processed independently of all other ${\mathbf{x}}_{i{k}^{\prime}}$ of ${g}_{i}$. However, this representation usually leads to aberrant classification results (e.g., salt and pepper effect) [38], which are not expected when working at the grassland level.

At the object level, the mean vector ${\mathit{\mu}}_{i}$ of the pixels belonging to ${g}_{i}$ is generally used to represent ${g}_{i}$. It is estimated empirically by:

$$\begin{array}{c}\hfill {\widehat{\mathit{\mu}}}_{i}=\frac{1}{{n}_{i}}\sum _{l=1}^{{n}_{i}}{\mathbf{x}}_{il}.\end{array}$$

In this case, a vector ${\widehat{\mathit{\mu}}}_{i}\in {\mathbb{R}}^{d}$ and a response variable ${y}_{i}\in \mathbb{R}$ are associated with each grassland. This representation might be limiting for heterogeneous objects such as grasslands since the spectro-temporal variability is not encoded. To illustrate this bias, Figure 7 shows on the left the set of pixel values in the NIR band for two grasslands (a and b). From this figure, it can be seen that while the mean vector captures the average behavior, higher variability can be captured by including the variance/covariance (middle and right plots). The figure shows that the first and second eigenvectors of the covariance matrix capture well the general trend in the grassland and the main variations due to different phenological behaviors in the grassland. This information cannot be recovered by considering the variance feature only: covariance must also be included.

In this study, to account for the spectro-temporal variability, we assume that the distribution of pixels ${\mathbf{x}}_{i}$ is, conditionally on grassland ${g}_{i}$, a Gaussian distribution $\mathcal{N}({\mathit{\mu}}_{i},{\mathbf{\Sigma}}_{i})$, where ${\mathbf{\Sigma}}_{i}$ is the covariance matrix estimated empirically by:

$$\begin{array}{c}\hfill {\widehat{\mathbf{\Sigma}}}_{i}=\frac{1}{{n}_{i}-1}\sum _{l=1}^{{n}_{i}}({\mathbf{x}}_{il}-{\widehat{\mathit{\mu}}}_{i}){({\mathbf{x}}_{il}-{\widehat{\mathit{\mu}}}_{i})}^{\top}.\end{array}$$

In this case, we associate with each ${g}_{i}$ its estimated distribution $\mathcal{N}({\widehat{\mathit{\mu}}}_{i},{\widehat{\mathbf{\Sigma}}}_{i})$ and a response variable ${y}_{i}\in \mathbb{R}$. The Gaussian modeling encodes first and second order information on the grassland by exploiting the variance-covariance information. It is worth noting that if we constrain ${\widehat{\mathbf{\Sigma}}}_{i}={\mathbf{I}}_{d}$, the identity matrix of size d, for $i\in \{1,\dots ,G\}$, the Gaussian modeling is reduced to the mean vector. In the following, $\mathcal{N}({\widehat{\mathit{\mu}}}_{i},{\widehat{\mathbf{\Sigma}}}_{i})$ is denoted by ${\mathcal{N}}_{i}$.
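As an illustration, the empirical mean vector and the unbiased covariance matrix of one grassland can be computed from its pixel matrix ${\mathbf{X}}_{i}$ as sketched below. The function name `gaussian_model` and the synthetic data are hypothetical.

```python
import numpy as np

def gaussian_model(X):
    """Empirical mean (d,) and unbiased covariance (d, d) of a pixel matrix X (n_i, d)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    centered = X - mu
    # Division by n_i - 1, as in the covariance estimator above.
    sigma = centered.T @ centered / (X.shape[0] - 1)
    return mu, sigma

# Synthetic grassland: 40 pixels, 6 spectro-temporal variables.
rng = np.random.default_rng(0)
X_i = rng.normal(size=(40, 6))
mu_i, sigma_i = gaussian_model(X_i)
```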

For classification purposes, a similarity measure between each pair of grasslands is required. With pixel-based or mean modeling approaches, conventional kernel methods such as Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel can be used since the explanatory variable is a vector. However, for a Gaussian modeling, i.e., when the explanatory variable is a distribution, specific derivations are required to handle the probability distribution as an explanatory variable.

Many similarity functions generally used to compare two Gaussian distributions (e.g., Kullback–Leibler divergence [76] and Jeffries–Matusita distance, which is based on Bhattacharyya distance [77]) require the inversion of the covariance matrices and the computation of their determinants. For a conventional multivariate Gaussian model, the number of parameters to estimate for each grassland is $d(d+3)/2$ (d parameters for the mean vector and $d(d+1)/2$ parameters for the symmetric covariance matrix). In the case where d is large, the number of parameters to estimate can be much larger than the number of samples, making the inverse problem ill-posed. This issue is faced in this study because grasslands are small elements of the landscape. They are characterized by a number of spectro-temporal variables, which is of about the same order as the number of pixels ${n}_{i}$ (see Figure 6). Therefore, most of the estimated covariance matrices are singular, and their determinants are null. Hence, conventional similarity measures used for moderate dimensional Gaussian distributions are not suitable for high dimensional Gaussian distributions. In the following, we propose to use mean map kernels, and we introduce a derivation of mean map kernels to weight the influence of the covariance matrix.
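A small numerical check makes the dimensionality issue concrete. The sketch below counts the Gaussian parameters for $d=68$ and shows, on synthetic data standing in for the smallest surveyed grassland (25 pixels), that the empirical covariance matrix has rank at most ${n}_{i}-1$ and is therefore singular. The random pixels are for illustration only.

```python
import numpy as np

def n_gaussian_params(d):
    """Free parameters of a d-variate Gaussian: d (mean) + d(d+1)/2 (covariance)."""
    return d * (d + 3) // 2

rng = np.random.default_rng(0)
n_pixels, d = 25, 68                   # 25 pixels, 4 bands x 17 dates
X = rng.normal(size=(n_pixels, d))     # synthetic pixels, illustration only
sigma = np.cov(X, rowvar=False)        # (d, d) empirical covariance
rank = np.linalg.matrix_rank(sigma)    # bounded by n_pixels - 1 after centering

print("parameters to estimate:", n_gaussian_params(d))
print("available scalar observations:", n_pixels * d)
print("covariance rank:", rank, "out of", d)
```

Because the rank is below $d$, the matrix cannot be inverted and its determinant vanishes, which is why measures such as KLD or JMD cannot be applied directly here.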

Mean map kernels are similarity measures that operate on distributions [78]. They have been used in remote sensing for semi-supervised pixel-based learning in [79]. In their work, the authors define the similarity between two distributions ${p}_{i}$ and ${p}_{j}$ as the average of all pairwise kernel evaluations over the available realizations of ${p}_{i}$ and ${p}_{j}$ (i.e., pixels that belong to grasslands ${g}_{i}$ or ${g}_{j}$). It corresponds to the empirical mean kernel (Equation (8) [79]):

$$\begin{array}{c}\hfill {K}^{e}({p}_{i},{p}_{j})=\frac{1}{{n}_{i}{n}_{j}}\sum _{l,m=1}^{{n}_{i},{n}_{j}}k({\mathbf{x}}_{il},{\mathbf{x}}_{jm}),\end{array}$$

where ${n}_{i}$ and ${n}_{j}$ are the number of pixels associated with ${p}_{i}$ and ${p}_{j}$, respectively, ${\mathbf{x}}_{il}$ is the l-th realization of ${p}_{i}$, ${\mathbf{x}}_{jm}$ is the m-th realization of ${p}_{j}$ and k is a semi-definite positive kernel function.
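A minimal sketch of the empirical mean kernel is given below. It assumes that k is the Gaussian kernel $\mathrm{exp}(-\frac{\gamma}{2}\parallel \mathbf{x}-{\mathbf{x}}^{\prime}{\parallel}^{2})$ used later in the text; the function names are hypothetical.

```python
import numpy as np

def gaussian_gram(A, B, gamma):
    """Pairwise Gaussian kernel exp(-gamma/2 * ||a - b||^2) between rows of A and B."""
    sq_dist = (
        (A ** 2).sum(axis=1)[:, None]
        + (B ** 2).sum(axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-0.5 * gamma * np.maximum(sq_dist, 0.0))

def empirical_mean_kernel(X_i, X_j, gamma):
    """Average of all n_i * n_j pairwise kernel evaluations between two pixel sets."""
    return gaussian_gram(X_i, X_j, gamma).mean()

# Synthetic grasslands with different numbers of pixels.
rng = np.random.default_rng(0)
X_a = rng.normal(size=(30, 4))
X_b = rng.normal(loc=0.5, size=(45, 4))
k_ab = empirical_mean_kernel(X_a, X_b, gamma=0.5)
```

Each evaluation costs ${n}_{i}{n}_{j}$ kernel computations, so the cost grows quickly with the size of the objects.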

It is possible to include prior knowledge on the distributions by considering the generative mean kernel [78]:

$$\begin{array}{ccc}\hfill {K}^{g}({p}_{i},{p}_{j})& =& {\int}_{{\mathbb{R}}^{d}}{\int}_{{\mathbb{R}}^{d}}k(\mathbf{x},{\mathbf{x}}^{\prime}){\widehat{p}}_{i}\left(\mathbf{x}\right){\widehat{p}}_{j}\left({\mathbf{x}}^{\prime}\right)d\mathbf{x}d{\mathbf{x}}^{\prime}.\hfill \end{array}$$

Note that Equation (3) acts on the realizations of ${p}_{i}$, while Equation (4) acts on its estimation. When dealing with a large number of samples, the latter can drastically reduce the computational load with respect to the former.

In our grassland modeling, ${p}_{i}$ and ${p}_{j}$ are assumed to be Gaussian distributions. In that case, if k is a Gaussian kernel such as $k(\mathbf{x},{\mathbf{x}}^{\prime})=\mathrm{exp}(-\frac{\gamma}{2}\parallel \mathbf{x}-{\mathbf{x}}^{\prime}{\parallel}^{2})$, Equation (4) reduces to the so-called Gaussian mean kernel [80]:

$$\begin{array}{c}\hfill {K}^{G}({\mathcal{N}}_{i},{\mathcal{N}}_{j})=\frac{\mathrm{exp}\left\{-0.5{({\widehat{\mathit{\mu}}}_{i}-{\widehat{\mathit{\mu}}}_{j})}^{T}{\left({\widehat{\mathbf{\Sigma}}}_{i}+{\widehat{\mathbf{\Sigma}}}_{j}+{\gamma}^{-1}{\mathbf{I}}_{d}\right)}^{-1}({\widehat{\mathit{\mu}}}_{i}-{\widehat{\mathit{\mu}}}_{j})\right\}}{|{\widehat{\mathbf{\Sigma}}}_{i}+{\widehat{\mathbf{\Sigma}}}_{j}+{\gamma}^{-1}{\mathbf{I}}_{d}{|}^{0.5}},\end{array}$$

where $\gamma $ is a positive regularization parameter coming from the Gaussian kernel k and $|\cdot |$ stands for the determinant.

This kernel is not normalized, i.e., ${K}^{G}({\mathcal{N}}_{i},{\mathcal{N}}_{i})\ne 1$, but the normalization can be achieved easily:

$$\begin{array}{ccc}\hfill {\tilde{K}}^{G}({\mathcal{N}}_{i},{\mathcal{N}}_{j})& =& \frac{{K}^{G}({\mathcal{N}}_{i},{\mathcal{N}}_{j})}{{K}^{G}{({\mathcal{N}}_{i},{\mathcal{N}}_{i})}^{0.5}{K}^{G}{({\mathcal{N}}_{j},{\mathcal{N}}_{j})}^{0.5}}\hfill \\ & =& {K}^{G}({\mathcal{N}}_{i},{\mathcal{N}}_{j})|2{\widehat{\mathbf{\Sigma}}}_{i}+{\gamma}^{-1}{\mathbf{I}}_{d}{|}^{0.25}{|2{\widehat{\mathbf{\Sigma}}}_{j}+{\gamma}^{-1}{\mathbf{I}}_{d}|}^{0.25}.\hfill \end{array}$$
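
The second line of the normalization can be checked directly from the definition of ${K}^{G}$: when both arguments are the same distribution, the difference of the means is zero and the exponential equals one, so only the determinant remains:

$$\begin{array}{c}\hfill {K}^{G}({\mathcal{N}}_{i},{\mathcal{N}}_{i})=\frac{\mathrm{exp}\left\{0\right\}}{|2{\widehat{\mathbf{\Sigma}}}_{i}+{\gamma}^{-1}{\mathbf{I}}_{d}{|}^{0.5}}=|2{\widehat{\mathbf{\Sigma}}}_{i}+{\gamma}^{-1}{\mathbf{I}}_{d}{|}^{-0.5}.\end{array}$$

Taking the square root gives ${K}^{G}{({\mathcal{N}}_{i},{\mathcal{N}}_{i})}^{0.5}=|2{\widehat{\mathbf{\Sigma}}}_{i}+{\gamma}^{-1}{\mathbf{I}}_{d}{|}^{-0.25}$, and dividing by this term and by its counterpart for ${\mathcal{N}}_{j}$ yields the two determinant factors with exponent 0.25.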

With respect to the Kullback–Leibler Divergence (KLD) and the Jeffries–Matusita Distance (JMD), the Gaussian mean kernel introduces a ridge regularization term ${\gamma}^{-1}{\mathbf{I}}_{d}$ in the computation of the inverse and of the determinant [81]. Thus, the Gaussian mean kernel is more suitable to measure the similarity in a high dimensional space than KLD and JMD. The value of $\gamma $ tunes the level of regularization. It is tuned during the training process as a conventional kernel parameter.

However, in the case of very small grasslands, two problems remain. The first lies in the ridge regularization: in this case, such low $\gamma $ values are selected that the term ${\gamma}^{-1}{\mathbf{I}}_{d}$ dominates, the kernel becomes over-regularized and the information is degraded. The second problem is that the estimation of the covariance matrix has a large variance when the number of samples used for the estimation is lower than the number of variables. Therefore, the covariance matrix becomes a poorly-informative feature. In the following, we propose a new kernel function that allows one to weight the covariance features with respect to the mean features.

Depending on the level of heterogeneity and the size of the grassland, the covariance matrix could be more or less important for the classification process. We propose a kernel including an additional positive parameter $\alpha $, which allows one to weight the influence of the covariance matrix, the $\alpha $-generative mean kernel:

$$\begin{array}{ccc}\hfill {K}^{\alpha}({p}_{i},{p}_{j})& =& {\int}_{{\mathbb{R}}^{d}}{\int}_{{\mathbb{R}}^{d}}k(\mathbf{x},{\mathbf{x}}^{\prime}){\widehat{p}}_{i}{\left(\mathbf{x}\right)}^{\left({\alpha}^{-1}\right)}{\widehat{p}}_{j}{\left({\mathbf{x}}^{\prime}\right)}^{\left({\alpha}^{-1}\right)}d\mathbf{x}d{\mathbf{x}}^{\prime}.\hfill \end{array}$$

When ${p}_{i}$ and ${p}_{j}$ are Gaussian distributions, k is a Gaussian kernel and the normalization is applied, the expression gives rise to the $\alpha $-Gaussian mean kernel:

$$\begin{array}{c}{\tilde{K}}^{\alpha}({\mathcal{N}}_{i},{\mathcal{N}}_{j})=\hfill \\ \hfill \frac{\mathrm{exp}\left\{-0.5{({\widehat{\mathit{\mu}}}_{i}-{\widehat{\mathit{\mu}}}_{j})}^{T}{\left(\alpha ({\widehat{\mathbf{\Sigma}}}_{i}+{\widehat{\mathbf{\Sigma}}}_{j})+{\gamma}^{-1}{\mathbf{I}}_{d}\right)}^{-1}({\widehat{\mathit{\mu}}}_{i}-{\widehat{\mathit{\mu}}}_{j})\right\}}{|\alpha ({\widehat{\mathbf{\Sigma}}}_{i}+{\widehat{\mathbf{\Sigma}}}_{j})+{\gamma}^{-1}{\mathbf{I}}_{d}{|}^{0.5}}|2\alpha {\widehat{\mathbf{\Sigma}}}_{i}+{\gamma}^{-1}{\mathbf{I}}_{d}{|}^{0.25}{|2\alpha {\widehat{\mathbf{\Sigma}}}_{j}+{\gamma}^{-1}{\mathbf{I}}_{d}|}^{0.25}.\end{array}$$

The proof is given in the Appendix. It is interesting to note that particular values of $\alpha $ and $\gamma $ lead to known results:

- $\alpha =0$: In this case, Equation (8) reduces to the Gaussian kernel between the mean vectors. It becomes therefore equivalent to an object modeling where only the mean is considered.
- $\alpha =1$: It corresponds to the Gaussian mean kernel defined in Equation (6).
- $\alpha \to +\infty $: We get a distance, which works only on the covariance matrices. It is therefore equivalent to an object modeling where only the covariance is considered.
- $\gamma \to +\infty $ and $\alpha =2$: The $\alpha $-Gaussian mean kernel simplifies to an RBF kernel built with the Bhattacharyya distance computed between ${\mathcal{N}}_{i}$ and ${\mathcal{N}}_{j}$.

This proposed kernel thus includes several similarity measures known in the literature. Furthermore, new similarity measures can be defined by choosing different parameter configurations. The $\alpha $-Gaussian mean kernel ($\alpha $GMK) is therefore more flexible since it can adapt to the classification constraints:

- Whether the heterogeneity of the object is relevant or not,
- Whether the ratio between the number of pixels and the number of variables is high or low.
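The normalized $\alpha $-Gaussian mean kernel and two of its special cases can be checked numerically. The sketch below is not the authors' implementation: the function names `alpha_gmk` and `random_gaussian` and the synthetic Gaussians are assumptions made for illustration. It evaluates the kernel through log-determinants, then checks that $\alpha =0$ gives $\mathrm{exp}(-0.5\gamma \parallel {\widehat{\mathit{\mu}}}_{i}-{\widehat{\mathit{\mu}}}_{j}{\parallel}^{2})$, as obtained by setting $\alpha =0$ in the expression of the kernel, and that $\alpha =2$ with a very large $\gamma $ approaches $\mathrm{exp}(-B)$, where B is the Bhattacharyya distance.

```python
import numpy as np


def alpha_gmk(mu_i, sigma_i, mu_j, sigma_j, alpha, gamma):
    """Normalized alpha-Gaussian mean kernel between two Gaussian distributions."""
    d = mu_i.shape[0]
    ridge = np.eye(d) / gamma                       # gamma^{-1} I_d
    diff = mu_i - mu_j
    pooled = alpha * (sigma_i + sigma_j) + ridge
    quad = diff @ np.linalg.solve(pooled, diff)
    # Log-determinants avoid overflow or underflow in high dimension.
    _, logdet_pooled = np.linalg.slogdet(pooled)
    _, logdet_i = np.linalg.slogdet(2.0 * alpha * sigma_i + ridge)
    _, logdet_j = np.linalg.slogdet(2.0 * alpha * sigma_j + ridge)
    log_k = -0.5 * quad - 0.5 * logdet_pooled + 0.25 * (logdet_i + logdet_j)
    return float(np.exp(log_k))


def random_gaussian(rng, d):
    """Synthetic mean vector and positive definite covariance (illustration only)."""
    a = rng.normal(size=(d, d))
    return rng.normal(size=d), a @ a.T + 0.1 * np.eye(d)


rng = np.random.default_rng(1)
mu_a, sigma_a = random_gaussian(rng, 4)
mu_b, sigma_b = random_gaussian(rng, 4)
gamma = 0.5

# alpha = 0: Gaussian kernel between the mean vectors only.
k_alpha0 = alpha_gmk(mu_a, sigma_a, mu_b, sigma_b, alpha=0.0, gamma=gamma)
k_means = float(np.exp(-0.5 * gamma * np.sum((mu_a - mu_b) ** 2)))

# alpha = 2 and gamma very large: exp(-B), with B the Bhattacharyya distance.
diff = mu_a - mu_b
sigma_m = 0.5 * (sigma_a + sigma_b)
b_dist = diff @ np.linalg.solve(sigma_m, diff) / 8.0 + 0.5 * (
    np.linalg.slogdet(sigma_m)[1]
    - 0.5 * np.linalg.slogdet(sigma_a)[1]
    - 0.5 * np.linalg.slogdet(sigma_b)[1]
)
k_bd = alpha_gmk(mu_a, sigma_a, mu_b, sigma_b, alpha=2.0, gamma=1e12)

# Normalization: the similarity of a distribution with itself is 1.
k_self = alpha_gmk(mu_a, sigma_a, mu_a, sigma_a, alpha=3.0, gamma=gamma)

print(k_alpha0, k_means)
print(k_bd, float(np.exp(-b_dist)))
print(k_self)
```

Computing the kernel in the log domain is a design choice of this sketch: with several hundred spectro-temporal variables, the determinants themselves would quickly leave the floating-point range.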

In this section, the experiments for grassland classification are detailed. We first introduce the seven competitive methods, then the classification protocol is described, and we finally present and discuss the results.

Several existing pixel-based and object-based classification methods using SVM are presented below. They are compared to assess the effectiveness of the proposed object-based method, which relies on the weighted use of the covariance matrix, $\alpha $GMK, for the classification of grasslands.

These conventional methods use an RBF kernel.

- PMV (Pixel Majority Vote): The pixel-based method was described in Section 3.1.1. It classifies each pixel with no a priori information on the object to which the pixel belongs. In order to compare to other object level methods, one class label is extracted per grassland by a majority vote done among the pixels belonging to the same grassland.
- $\mathit{\mu}$ (mean): The distribution of the pixels reflectance of ${g}_{i}$ is modeled by its mean vector ${\mathit{\mu}}_{i}$ (see Section 3.1.2).
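The majority vote used by PMV can be sketched in a few lines. This is an illustration under assumptions: the function name `majority_vote`, the toy labels and the tie-breaking rule (the label encountered first wins) are not specified in the article.

```python
from collections import Counter


def majority_vote(pixel_labels, pixel_to_grassland):
    """Return one label per grassland: the most frequent label among its pixels."""
    votes = {}
    for label, grassland in zip(pixel_labels, pixel_to_grassland):
        votes.setdefault(grassland, Counter())[label] += 1
    # most_common(1) returns the label with the highest count;
    # ties go to the label seen first (an arbitrary choice of this sketch).
    return {g: counts.most_common(1)[0][0] for g, counts in votes.items()}


# Toy example: six classified pixels belonging to two grasslands.
pixel_labels = ["mowing", "mowing", "grazing", "grazing", "grazing", "mixed"]
pixel_to_grassland = [0, 0, 0, 1, 1, 1]

object_labels = majority_vote(pixel_labels, pixel_to_grassland)
print(object_labels)
```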

These methods are based on a distance D between two Gaussian distributions. They are used in a Gaussian kernel such as ${K}_{D}({\mathcal{N}}_{i},{\mathcal{N}}_{j})=\mathrm{exp}(-\frac{{D}_{ij}^{2}}{\sigma})$, with $\sigma >0$:

- HDKLD (High Dimensional Kullback–Leibler Divergence): This method uses the Kullback–Leibler divergence for Gaussian distributions with a regularization on covariance matrices such as described in [82].
- BD (Bhattacharyya Distance): This method uses the Bhattacharyya distance in the case of Gaussian distributions: $$\begin{array}{c}\hfill B({\mathcal{N}}_{i},{\mathcal{N}}_{j})=\frac{1}{8}{({\widehat{\mathit{\mu}}}_{i}-{\widehat{\mathit{\mu}}}_{j})}^{\top}{\left(\frac{{\widehat{\mathbf{\Sigma}}}_{i}+{\widehat{\mathbf{\Sigma}}}_{j}}{2}\right)}^{-1}({\widehat{\mathit{\mu}}}_{i}-{\widehat{\mathit{\mu}}}_{j})+\frac{1}{2}\mathrm{ln}\left(\frac{|\frac{{\widehat{\mathbf{\Sigma}}}_{i}+{\widehat{\mathbf{\Sigma}}}_{j}}{2}|}{|{\widehat{\mathbf{\Sigma}}}_{i}{|}^{0.5}{|{\widehat{\mathbf{\Sigma}}}_{j}|}^{0.5}}\right).\end{array}$$ Small eigenvalues of the covariance matrices are shrunk to the value ${10}^{-5}$ to make the computation tractable [83].
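The Bhattacharyya distance and the distance-based kernel ${K}_{D}$ can be sketched as follows. The eigenvalue step is one possible reading of the shrinkage (eigenvalues below ${10}^{-5}$ are raised to ${10}^{-5}$); the exact procedure of [83] may differ, and the function names are assumptions.

```python
import numpy as np


def floor_eigenvalues(sigma, floor=1e-5):
    """Raise eigenvalues below `floor` to `floor` (assumed form of the shrinkage)."""
    eigval, eigvec = np.linalg.eigh(sigma)
    return (eigvec * np.maximum(eigval, floor)) @ eigvec.T


def bhattacharyya(mu_i, sigma_i, mu_j, sigma_j, floor=1e-5):
    """Bhattacharyya distance between two Gaussian distributions."""
    sigma_i = floor_eigenvalues(sigma_i, floor)
    sigma_j = floor_eigenvalues(sigma_j, floor)
    sigma_m = 0.5 * (sigma_i + sigma_j)
    diff = mu_i - mu_j
    quad = diff @ np.linalg.solve(sigma_m, diff) / 8.0
    _, ld_m = np.linalg.slogdet(sigma_m)
    _, ld_i = np.linalg.slogdet(sigma_i)
    _, ld_j = np.linalg.slogdet(sigma_j)
    return float(quad + 0.5 * (ld_m - 0.5 * ld_i - 0.5 * ld_j))


def distance_kernel(dist, sigma):
    """Gaussian kernel built on a distance: exp(-D^2 / sigma), with sigma > 0."""
    return float(np.exp(-dist ** 2 / sigma))


mu_a = np.zeros(3)
mu_b = np.array([1.0, 2.0, 0.0])
cov = np.diag([1.0, 2.0, 4.0])

# Equal covariances: the log term vanishes and B = (1/8) * diff^T cov^{-1} diff.
b_same = bhattacharyya(mu_a, cov, mu_b, cov)

# Rank-one covariance: singular without shrinkage, finite with it.
v = np.array([1.0, 1.0, 0.0])
b_singular = bhattacharyya(mu_a, np.outer(v, v), mu_b, cov)

print(b_same, b_singular, distance_kernel(b_same, sigma=1.0))
```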

These methods are based on mean map kernels presented in Section 3.2:

- EMK (Empirical Mean Kernel): This method uses the empirical mean map kernel of Equation (3) and it is pixel-based.
- GMK (Gaussian Mean Kernel): This method is based on the normalized Gaussian mean kernel (Equation (6)).
- $\alpha $GMK ($\alpha $-Gaussian Mean Kernel): This method is based on the proposed normalized $\alpha $-Gaussian mean kernel (Equation (8)).
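For comparison with the Gaussian-based kernels, the empirical mean kernel can be sketched as the average of a pixel-level kernel over all pairs of pixels of two grasslands. This assumes that Equation (3) takes the usual form of an empirical mean map kernel and that the pixel kernel is $k(\mathbf{x},{\mathbf{x}}^{\prime})=\mathrm{exp}(-0.5\gamma \parallel \mathbf{x}-{\mathbf{x}}^{\prime}{\parallel}^{2})$; any normalization applied in the experiments is omitted.

```python
import numpy as np


def empirical_mean_kernel(pixels_i, pixels_j, gamma):
    """Average Gaussian kernel over all pixel pairs of two grasslands.

    pixels_i: array of shape (n_i, d); pixels_j: array of shape (n_j, d).
    """
    sq_dist = (
        np.sum(pixels_i ** 2, axis=1)[:, None]
        + np.sum(pixels_j ** 2, axis=1)[None, :]
        - 2.0 * pixels_i @ pixels_j.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)   # guard against small negative round-off
    return float(np.exp(-0.5 * gamma * sq_dist).mean())


rng = np.random.default_rng(2)
pixels_a = rng.normal(size=(30, 5))      # synthetic grassland with 30 pixels
pixels_b = rng.normal(size=(50, 5))      # synthetic grassland with 50 pixels

k_ab = empirical_mean_kernel(pixels_a, pixels_b, gamma=0.1)
n_evaluations = pixels_a.shape[0] * pixels_b.shape[0]
print(k_ab, n_evaluations)
```

Each kernel value requires ${n}_{i}\times {n}_{j}$ pixel-level evaluations, whereas the Gaussian mean kernels need only the mean vectors and covariance matrices, which are estimated once per grassland.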

Figure 8 illustrates the relationships between the different methods. The characteristics of each method are synthesized in Table 3.

Because of memory issues during the SVM process, the number of pixels processed for the old and young grasslands’ classification was divided by 10 for the two methods based on pixels (PMV and EMK). Only one pixel out of 10 was kept per grassland.

We compared the efficiency in terms of classification accuracy and processing time of all of the presented methods by classifying the two grassland datasets on inter-annual and intra-annual time series (Section 2).

For each method, a Monte Carlo procedure was performed for 100 runs. For each run, the dataset was split randomly into training and testing datasets (75% for training and 25% for testing), preserving the initial proportions of each class. The same grasslands were selected for a given Monte Carlo repetition regardless of the method.
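The splitting protocol can be sketched as follows. This is an illustration under assumptions: the helper `stratified_split`, the toy label vector and the seed are hypothetical. Generating the 100 splits once and reusing them for every method is one way to guarantee that the same grasslands are selected for a given repetition.

```python
import numpy as np


def stratified_split(labels, train_fraction, rng):
    """Random train/test split that preserves the proportion of each class."""
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_train = int(round(train_fraction * idx.size))
        train_idx.extend(idx[:n_train])
        test_idx.extend(idx[n_train:])
    return np.sort(train_idx), np.sort(test_idx)


# Toy unbalanced dataset: 80 objects of class 0 and 20 objects of class 1.
labels = np.array([0] * 80 + [1] * 20)

rng = np.random.default_rng(42)
# One list of splits, shared by all of the methods under comparison.
splits = [stratified_split(labels, 0.75, rng) for _ in range(100)]

train, test = splits[0]
print(len(train), len(test), int(np.sum(labels[train] == 1)))
```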

During each repetition, the optimal parameters were tuned by cross-validation based on the best F1 score. Table 4 contains the parameter grid searched for each method. Note that a wide grid was searched for the parameter $\alpha $ of $\alpha $GMK to further analyze the distribution of selected values. The penalty parameter C of the SVM process was chosen empirically and fixed to $C=10$, after running several simulations. The classification accuracy for each repetition was assessed by the F1 score computed from the confusion matrix. The Overall Accuracy (OA) was also computed, but it is not presented here because it does not reflect the classification accuracy well when the datasets are unbalanced.
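The reason why OA is misleading on unbalanced data can be shown with a toy confusion matrix. In this sketch, the matrix values are invented for illustration, rows are taken as true classes and columns as predicted classes, and the per-class F1 scores are averaged without weighting; the averaging used in the article is not specified here.

```python
import numpy as np


def f1_per_class(confusion):
    """Per-class F1 = 2 TP / (2 TP + FP + FN) from a confusion matrix."""
    confusion = np.asarray(confusion, dtype=float)
    tp = np.diag(confusion)
    predicted = confusion.sum(axis=0)   # TP + FP for each class
    actual = confusion.sum(axis=1)      # TP + FN for each class
    denom = predicted + actual
    return np.divide(2.0 * tp, denom, out=np.zeros_like(tp), where=denom > 0)


def overall_accuracy(confusion):
    """Fraction of correctly classified samples."""
    confusion = np.asarray(confusion, dtype=float)
    return float(np.trace(confusion) / confusion.sum())


# Toy case: 90 samples of class 0, 10 of class 1, classifier always predicts class 0.
confusion = np.array([[90, 0],
                      [10, 0]])

oa = overall_accuracy(confusion)
f1 = f1_per_class(confusion)
print("OA:", oa, "F1 per class:", f1, "mean F1:", f1.mean())
```

The overall accuracy is high although the minority class is never detected, whereas the F1 score of that class is zero.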

In order to compare each pair of methods, a Wilcoxon rank-sum test was performed on the pair of distributions of the 100 F1 scores. This nonparametric test is designed for two independent samples that are not assumed to be normally distributed [84]. It tests if the two samples are drawn from populations having the same distribution.
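The test statistic can be sketched with the normal approximation of the rank-sum test, without tie correction. The F1 samples below are synthetic and only stand in for the 100 scores of two methods; the function names are assumptions. Under the null hypothesis the statistic is approximately standard normal, so 1.96 is the two-sided critical value at the 5% level.

```python
import math

import numpy as np


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(values.size)
    start = 0
    while start < values.size:
        stop = start
        while stop + 1 < values.size and sorted_vals[stop + 1] == sorted_vals[start]:
            stop += 1
        ranks[order[start:stop + 1]] = 0.5 * (start + stop) + 1.0
        start = stop + 1
    return ranks


def rank_sum_test(x, y):
    """Wilcoxon rank-sum z statistic and two-sided p-value (normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(np.concatenate([x, y]))
    w = ranks[:n1].sum()                               # rank sum of the first sample
    expected = n1 * (n1 + n2 + 1) / 2.0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - expected) / std
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return float(z), p_value


rng = np.random.default_rng(3)
f1_method_a = rng.normal(0.71, 0.03, size=100)   # synthetic scores, illustration only
f1_method_b = rng.normal(0.59, 0.03, size=100)

z, p_value = rank_sum_test(f1_method_a, f1_method_b)
print(z, p_value)
```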

The kernels and the SVM were implemented in Python using the scikit-learn library [85].

Figure 9 sums up the old and young grasslands’ classification results for each method over the 100 repetitions as a boxplot of F1 scores. The Kappa coefficients can be found in Figure S2 in the Supplementary Materials. Since the cross-validation was not based on the Kappa coefficient, the results are discussed in terms of F1 scores. The method reaching the best scores is $\alpha $GMK with an average F1 of 0.71, followed by PMV and GMK with an average of 0.69.

Table 5 contains the Wilcoxon rank-sum test statistics between each pair of methods. The test evaluates the null hypothesis that the two sets of observations are drawn from the same distribution. The null hypothesis is rejected if the test statistic is greater than 1.96, which corresponds to a significance level of $5\%$ (p-value < 0.05). In this case, the alternative hypothesis is accepted: values in one population are more likely to be larger than the values from the other. The two best methods, $\alpha $GMK and PMV, are not significantly different. However, $\alpha $GMK is significantly better than all of the other methods, whereas PMV is not significantly different from the mean map methods (EMK and GMK). The worst method is HDKLD with a mean F1 of 0.59.

In terms of processing load and time, the pixel-based methods are clearly the most demanding. Indeed, processing the 160,514 pixels was not possible with SVM, so we had to reduce the number of samples. Object-based methods do not face these issues. The fastest methods are $\mathit{\mu}$ and HDKLD, but they did not reach acceptable classification accuracies. The best method in terms of accuracy/processing time ratio is $\alpha $GMK. It is appropriate for processing a large number of grasslands.

The classification accuracies for management practices are shown in Figure 10 (F1 score) and in Figure S3 in the Supplementary Materials (Kappa coefficient) for year 2013 and for year 2014.

In terms of classification accuracy, methods based on divergences (BD and HDKLD) provided the worst results. Pixel-based methods, the mean modeling method and mean generative kernel methods provided similar results in terms of F1 score, except for PMV, which was significantly worse than the others for the year 2013. $\alpha $GMK provided the highest values in 2013 (average F1 of 0.65), but it was not significantly better than the others for this dataset. Indeed, due to the very low number of grasslands composing this dataset, confusion matrices were quite similar whatever the method. It is therefore difficult to compare the methods’ efficiency in this configuration.

Nevertheless, this dataset makes it possible to compare the methods in terms of processing times, because the same spectral information was used for all of them. Figure 11 illustrates the training processing time relative to that of PMV versus the average F1 score for each method. In terms of computational time, the pixel-based methods required the largest processing times. BD was also very slow, mainly because of the shrinkage procedure. Mean modeling was the fastest, followed closely by HDKLD. $\alpha $GMK and GMK were equivalent in terms of computational times. For this configuration with a low number of grasslands, the mean modeling was the most efficient in terms of accuracy/processing time ratio.

It is worth noting that the time series of 2014 produced higher classification accuracies (maximum F1 average of 0.73 for GMK) than the time series of 2013 (maximum F1 average of 0.65 for $\alpha $GMK).

The purpose of this work was to develop a model suitable for the classification of grasslands from dense inter- or intra-annual SITS and robust to the dimension of the data. The proposed method based on a weighted use of the covariance, namely $\alpha $GMK, was compared to several competitive methods.

The methods’ efficiency is discussed for the old and young grasslands’ classification, since the results provided with the other dataset are not significantly different, mostly because of the small dataset size.

The divergence methods (BD and HDKLD) provided the worst results, showing that they are not robust enough to a high dimensional space.

Although they provided results close to the best results, pixel-based methods (PMV and EMK) are the most demanding in terms of computational time, and they do not scale well with the number of pixels. Indeed, they have to process N pixels instead of G grasslands with $G\ll N$. Therefore, we had to reduce the number of pixels used for the classification. Using them on a large area might be difficult, as the old and young grasslands’ dataset showed.
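The scaling argument can be made concrete with a back-of-the-envelope computation. The sketch below assumes a dense kernel matrix stored in 64-bit floats, which an SVM solver does not necessarily hold in memory in full, and uses the figures reported in this article for the land use database (797 grassland plots, 252,472 pixels) purely as an order-of-magnitude illustration.

```python
def gram_matrix_bytes(n_samples, bytes_per_value=8):
    """Memory needed to store a dense n x n kernel matrix."""
    return n_samples * n_samples * bytes_per_value


n_pixels = 252_472      # N: pixels covered by the grassland plots
n_grasslands = 797      # G: grassland plots (objects)

pixel_level = gram_matrix_bytes(n_pixels)
object_level = gram_matrix_bytes(n_grasslands)

print(f"pixel level : {pixel_level / 1e9:.1f} GB")
print(f"object level: {object_level / 1e6:.1f} MB")
print(f"ratio (N/G)^2: {pixel_level / object_level:.0f}")
```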

Representing grasslands by the estimated distribution of their set of pixels decreases the complexity during the SVM process. Therefore, the object level methods offer a lower computational load when compared to empirical mean kernels and pixel-based methods.

The mean generative kernel methods performed significantly better than the mean-only method ($\mathit{\mu}$). Among them, $\alpha $GMK performed better than GMK. It was also one of the most stable methods.

In this context, including the covariance information helps to discriminate grasslands. However, if the dimensionality is not properly handled, it deteriorates the process (e.g., BD and HDKLD). In this case, it is preferable to use the mean values only. $\alpha $GMK offers the possibility to weight the influence of the covariance information compared to the mean. As a result, it provided better results than the mean modeling and than GMK, since it encompasses both.

It is furthermore interesting to analyze the optimal values of the weighting parameter $\alpha $ found during the cross-validation and the average of associated F1 scores (Figure 12). The highest F1 scores were reached for high values of $\widehat{\alpha}$. The worst F1 scores were obtained with $\widehat{\alpha}<2$, and the value $\widehat{\alpha}=0$ was never selected. It shows the importance of the covariance information in grasslands’ modeling: the heterogeneity in a grassland must be accounted for, and it is not entirely well represented by the mean only.

Following on from the methods’ discussion, the choice of modeling grasslands pixels’ distribution by a Gaussian distribution makes sense in this context. It is particularly appropriate for semi-natural grasslands, which are very heterogeneous, contrary to crops or annual “artificial” grasslands, which can be assimilated to crops.

However, modeling grasslands by the mean only produced results equivalent to those of the methods based on Gaussian modeling for the classification of management practices, contrary to the old and young grassland discrimination. Indeed, management practices are supposed to be uniform at the grassland scale. Therefore, the mean appears to be sufficient for this application, contrary to the old and young grasslands’ discrimination, which requires capturing more variations between the grasslands. The best modeling might be different depending on the application. Moreover, some grasslands are so small that the covariance matrix is too poorly estimated.

In the proposed kernel, this modeling was made flexible by regularizing the weight given to the covariance matrix. $\alpha $GMK benefits from its high level of adaptability to the object configuration: no choice has to be made between a Gaussian or a mean modeling since the method encompasses both. It also includes several object level methods known in the literature. However, this is at the cost of one more parameter to tune. Therefore, the classification process takes more time than GMK, for instance.

Above all, although it is the first application of generative mean kernels in remote sensing classification, the $\alpha $-Gaussian mean kernel proved its efficiency and stability in these experiments. The results suggest it is appropriate for grasslands’ classification.

For the management practice classification, using the time series of 2014 produced better results than using that of 2013. This might be explained by the acquisition dates in the time series. Although 2014 has fewer images, more clear images were acquired during spring than in 2013, which lacks acquisitions in April and May (Figure 3). Indeed, many studies showed that the best period to discriminate grasslands is the growing season [36,49,53,54]. Spring is the period of the vegetation cycle when the management practices begin. Therefore, it is easier to differentiate the practices during this period. This might thus explain the lower classification accuracy for the year 2013.

It is not shown in this experiment, but using only one or two years of acquisitions to discriminate old from young grasslands did not produce sufficient classification accuracies. This is the reason why three years of data were used. Old “permanent” grasslands are supposed to have a more stable phenology over the years than the young “temporary” grasslands, which have been recently sown (less than five years ago) [6]. The phenology of young grasslands is closer to that of crops in their very first years. We suppose this makes their discrimination possible with inter-annual SITS. However, the optimal number of years needed to discriminate these types of grasslands could constitute a research topic.

In general, the results could also be enhanced by removing some winter images, which can have a negative influence on the entire annual time series [40]. However, the scope of this study was to develop a method that is able to use a given time series, without having to process a date selection.

On the whole, the classification did not reach high accuracies (F1 maximum average of 0.73 for management practices and of 0.71 for old and young grasslands’ classification). This can be explained by the unbalanced dataset with under-representation of grazing and mixed grasslands in the first application and under-representation of old grasslands in the second one. These classes obtained the lowest producer and user accuracies (cf. Tables S1 and S2 in the Supplementary Materials) because of their limited number of samples for training the models. The methods should be tested on a more balanced dataset of grasslands’ classes.

Moreover, as emphasized several times, semi-natural grasslands (which are present in these datasets) are characterized by their high level of heterogeneity. Therefore, there might be a large amount of intra-class variability because of grasslands’ diversity. The discrimination might be improved by using more distinct classes: intensively-used grasslands against extensively-used grasslands, artificial (monospecific) grasslands against semi-natural grasslands, for instance.

To our knowledge, only the work of Möckel et al. [86] relates to the classification of grasslands’ age using remote sensing data. They reached a Kappa value of 0.77 in classifying three different grassland age-classes. However, they used airborne hyperspectral data from a single date. Their recommendation was to use multitemporal data to improve the classification or to use satellite hyperspectral data to monitor grasslands over wider areas. Our study was based on using multi-spectro-temporal satellite data, but our proposed method would also work with hyperspectral data.

As described in the Introduction, few studies have been carried out on the analysis of semi-natural grasslands using high spatio-spectro-temporal resolution SITS. Usually, methods were pixel-based, and they were applied on a few images or on a precise date selection to avoid dealing with the high dimension of data [42,44,49]. Schuster et al. [52] successfully classified grassland habitat using 21 RapidEye images on a pixel basis, but there was no mention of the processing times.

At the object level using a time series, grasslands were often represented by their mean NDVI, as in [60], whose authors noted the difficulty of discriminating grasslands from crops because of mean seasonal NDVI similarities. The closest configuration might be the work of Zillman et al. [35], who used an object-based analysis and spectral reflectances combined with seasonal statistics of vegetation indices for mapping grasslands across Europe. The seasonal statistics were particularly relevant in the classification, because they captured well the spectral diversity of the grassland phenology. The use of these metrics could be considered for discriminating grassland management practices, which affect the phenology. The authors also concluded that the object-based analysis improves the classification compared to a pixel-based classification. However, the objects were determined by segmentation.

To show the efficiency of $\alpha $GMK, we classified all of the grasslands from the French agricultural land use database (Registre Parcellaire Graphique) covered by the Formosat-2 time series to predict their management practice in 2014. All of the plots declared as grasslands in 2014, i.e., “permanent grassland” and “temporary grassland” regardless of their age, were selected. After applying a negative buffer of 8 m and rasterizing the polygons, we removed the plots representing less than 10 Formosat-2 pixels. In the end, there were 797 grassland plots covered by the extent of Formosat-2 for a total of 252,472 pixels.

The multispectral SITS of 2014 was used. The SVM was trained on the whole field data (Section 2.3.2) using the same grid search as in the experiments. The parameters chosen after cross-validation based on F1 score were $\widehat{\alpha}=5$ and $\widehat{\gamma}={2}^{-15}$. Then, the model was used to predict the management practices of the 797 grasslands of the land use database.

The classification accuracy could not be assessed since the true labels of the grasslands are not known. However, as described in the study site, a spatial distribution of the classes could be expected. Indeed, grazed and mixed grasslands should be found in the southwest of the site, whereas more mown grasslands should be in the north.

An extract of the classification result is shown in Figure 13. It represents the classified grasslands in their raster format. As expected, most of the grazed and mixed grasslands are located in the southwest of the image, whereas the north of the image is mostly composed of mown grasslands. Therefore, $\alpha $GMK was very likely able to classify the grassland management practices with an acceptable accuracy without any a priori geographic information. However, care should be taken, as not all of the possible management practices were predicted. For instance, grasslands mown twice or unused grasslands were not in the training dataset, but this does not mean that these management practices do not exist in the rest of the data. The method deserves to be tested with an exhaustive grassland typology to produce more detailed grassland maps.

In terms of processing times, the proposed method is able to classify 800 grasslands, representing more than 250,000 pixels, at the object level from a high spatial resolution SITS within a few seconds on a conventional personal computer.

This study aimed at developing a model for the classification of grasslands using satellite image time series with a high number of spectro-temporal variables. A grassland modeling at the object level was proposed. To deal with grasslands’ heterogeneity, their pixels distribution was modeled by a Gaussian distribution. Then, to measure the similarity between two grasslands, i.e., two Gaussian distributions, a kernel function based on mean maps was introduced, namely the $\alpha $-Gaussian mean kernel. The proposed method was compared to existing pixel-based and object-based classification methods for the supervised classification of grassland using inter- and intra-annual SITS. The Gaussian mean kernels provided the highest classification accuracies, showing that the covariance information must be accounted for. In terms of processing times, the object-based methods were much faster than pixel-based methods.

Several contributions have been made in this work. The first lies in the grasslands’ pixel distribution modeling at the object level. A flexible kernel was proposed to encompass both the Gaussian and mean modeling of grasslands, so no choice has to be made between these two modelings. It can therefore be used on homogeneous objects such as artificial grasslands or on very small objects, as well as on heterogeneous semi-natural grasslands. The second contribution is that this kernel is suitable for high dimensional data in a small ground sample size context. It enables the use of all of the multispectral data instead of a single vegetation index or the use of a long time series. Furthermore, it can be used on a whole time series without date selection. Indeed, this new kernel offers very low computational load. It can therefore be applied on a large dataset. With this kernel, we were able to process and to classify more than 250,000 pixels on a conventional personal computer within a few seconds. Even if it is the first application of generative mean kernels in remote sensing classification, the $\alpha $GMK proved its efficiency and stability in these experiments. It is a good compromise between processing speed and accuracy for the classification of grasslands.

The $\alpha $GMK deserves to be tested on a larger dataset with more balanced classes. Seasonal statistics could be used to improve the representation of grassland phenology. These ideas will be considered in the future. This method was designed to deal with the dense SITS that will be provided by Sentinel-2 and to efficiently produce maps from this type of data. Other applications of the method are still possible (e.g., small and heterogeneous objects, such as peatlands, urban areas, etc.).

The following are available online at www.mdpi.com/2072-4292/9/7/688/s1: Figure S1: True color composite images of the Formosat-2 time series of 2014, Figure S2: Boxplot of Kappa coefficient repartitions for the classification of the old and young grasslands, Table S1: Average User Accuracy (UA) and Producer Accuracy (PA) (%) over the 100 repetitions for each class; 1: old, 2: young, Figure S3: Boxplot of Kappa coefficient repartitions for classification of management practices using time series of year (a) 2013 and (b) 2014, Table S2: Average UA and PA (%) over the 100 repetitions for each class; 1: mowing, 2: mixed, 3: grazing.

This work was partially supported by a French National Institute for Agricultural Research (INRA) and French National Institute for Research in Computer Science and Automation (INRIA) Young Scientist Contract (CJS INRA-INRIA) and by the grant Défi Mastodons-CNRS. The authors would like to thank CNES and CESBIO for providing the pre-processed Formosat-2 data. Special thanks to Marc Lang for playing a major role in the field work, to Donatien Dallery for designing the processing chain to compute the grasslands age from the RPG and to Romain Carrié for his careful reviewing of the Introduction. We would like to thank the anonymous reviewers for their constructive comments.

M.L., M.F. and S.G. conceived of the model. M.L., M.F. and S.G. conceived of and designed the experiments. M.L. performed the experiments. M.L. and M.F. analyzed the data. M.L. and M.F. wrote the paper with feedback from S.G. and D.S.

The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript; nor in the decision to publish the results.

The following abbreviations are used in this manuscript:

| Abbreviation | Definition |
|---|---|
| BD | Bhattacharyya Distance |
| EMK | Empirical Mean Kernel |
| GIS | Geographic Information System |
| GMK | Gaussian Mean Kernel |
| HDKLD | High Dimensional Kullback–Leibler Divergence |
| JMD | Jeffries-Matusita Distance |
| KLD | Kullback–Leibler Divergence |
| LAI | Leaf Area Index |
| NDVI | Normalized Difference Vegetation Index |
| NIR | Near Infrared |
| PMV | Pixel Majority Vote |
| RBF | Radial Basis Function |
| SITS | Satellite Image Time Series |
| SVM | Support Vector Machine |
| $\alpha $GMK | $\alpha $-Gaussian Mean Kernel |

First, let us write the Gaussian distribution ${p}_{i}$ to the power of ${\alpha}^{-1}$:

$$\begin{array}{ccc}\hfill {p}_{i}{(\mathbf{x}|{\mathit{\mu}}_{i},{\mathbf{\Sigma}}_{i})}^{{\alpha}^{-1}}& =& \frac{1}{{\left(2\pi \right)}^{d/(2\alpha)}}\times \frac{1}{|{\mathbf{\Sigma}}_{i}{|}^{1/(2\alpha)}}\times \mathrm{exp}\left\{-0.5{(\mathbf{x}-{\mathit{\mu}}_{i})}^{\top}{\left(\alpha {\mathbf{\Sigma}}_{i}\right)}^{-1}(\mathbf{x}-{\mathit{\mu}}_{i})\right\}\hfill \\ & =& \frac{{\left(2\pi \right)}^{\frac{d}{2}(1-\frac{1}{\alpha})}}{{\left(2\pi \right)}^{d/2}}\times {\alpha}^{d/2}\times \frac{|{\mathbf{\Sigma}}_{i}{|}^{\frac{1}{2}(1-\frac{1}{\alpha})}}{|\alpha {\mathbf{\Sigma}}_{i}{|}^{1/2}}\times \mathrm{exp}\left\{-0.5{(\mathbf{x}-{\mathit{\mu}}_{i})}^{\top}{\left(\alpha {\mathbf{\Sigma}}_{i}\right)}^{-1}(\mathbf{x}-{\mathit{\mu}}_{i})\right\}\hfill \\ & =& {\alpha}^{d/2}{\left(2\pi \right)}^{\frac{d}{2}(1-\frac{1}{\alpha})}{\left|{\mathbf{\Sigma}}_{i}\right|}^{\frac{1}{2}(1-\frac{1}{\alpha})}\times p(\mathbf{x}|{\mathit{\mu}}_{i},\alpha {\mathbf{\Sigma}}_{i})\hfill \\ & =& C({\mathbf{\Sigma}}_{i},\alpha )p(\mathbf{x}|{\mathit{\mu}}_{i},\alpha {\mathbf{\Sigma}}_{i}).\hfill \end{array}$$

The factor ${\alpha}^{d/2}$ compensates for $|\alpha {\mathbf{\Sigma}}_{i}|={\alpha}^{d}|{\mathbf{\Sigma}}_{i}|$ in d dimensions.
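The identity ${p}_{i}{(\mathbf{x})}^{{\alpha}^{-1}}=C({\mathbf{\Sigma}}_{i},\alpha )\,p(\mathbf{x}|{\mathit{\mu}}_{i},\alpha {\mathbf{\Sigma}}_{i})$ can be verified numerically. In this sketch the constant is taken as $C(\mathbf{\Sigma},\alpha )={\alpha}^{d/2}{(2\pi )}^{\frac{d}{2}(1-\frac{1}{\alpha})}|\mathbf{\Sigma}{|}^{\frac{1}{2}(1-\frac{1}{\alpha})}$, where the power d/2 of $\alpha $ follows from $|\alpha \mathbf{\Sigma}|={\alpha}^{d}|\mathbf{\Sigma}|$. The test point, dimension and value of $\alpha $ are arbitrary choices for illustration.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma):
    """Density of a multivariate Gaussian N(mu, sigma) evaluated at x."""
    d = mu.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(sigma, diff)
    _, logdet = np.linalg.slogdet(sigma)
    return float(np.exp(-0.5 * (quad + logdet + d * np.log(2.0 * np.pi))))


def power_constant(sigma, alpha):
    """C(sigma, alpha) such that p(x)^(1/alpha) = C * N(x | mu, alpha * sigma)."""
    d = sigma.shape[0]
    _, logdet = np.linalg.slogdet(sigma)
    expo = 0.5 * (1.0 - 1.0 / alpha)
    return float(alpha ** (d / 2.0) * (2.0 * np.pi) ** (d * expo) * np.exp(expo * logdet))


rng = np.random.default_rng(4)
d, alpha = 3, 2.5
a = rng.normal(size=(d, d))
sigma = a @ a.T + 0.5 * np.eye(d)      # synthetic positive definite covariance
mu = rng.normal(size=d)
x = rng.normal(size=d)

lhs = gaussian_pdf(x, mu, sigma) ** (1.0 / alpha)
rhs = power_constant(sigma, alpha) * gaussian_pdf(x, mu, alpha * sigma)
print(lhs, rhs)
```

Since the constant does not depend on $\mathbf{x}$ and cancels in the normalized kernel, its exact value has no effect on the normalized $\alpha $-Gaussian mean kernel.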

Then, plugging Equation (A1) in Equation (7), we get:

$$\begin{array}{ccc}\hfill {K}^{\alpha}({\mathcal{N}}_{i},{\mathcal{N}}_{j})& =& C({\mathbf{\Sigma}}_{i},\alpha )C({\mathbf{\Sigma}}_{j},\alpha )\frac{\mathrm{exp}\left\{-0.5{({\widehat{\mathit{\mu}}}_{i}-{\widehat{\mathit{\mu}}}_{j})}^{T}{\left(\alpha {\widehat{\mathbf{\Sigma}}}_{i}+\alpha {\widehat{\mathbf{\Sigma}}}_{j}+{\gamma}^{-1}{\mathbf{I}}_{d}\right)}^{-1}({\widehat{\mathit{\mu}}}_{i}-{\widehat{\mathit{\mu}}}_{j})\right\}}{|\alpha {\widehat{\mathbf{\Sigma}}}_{i}+\alpha {\widehat{\mathbf{\Sigma}}}_{j}+{\gamma}^{-1}{\mathbf{I}}_{d}{|}^{0.5}},\hfill \end{array}$$

which is Equation (5) with the covariance matrix of the Gaussian distribution scaled with $\alpha $. The constants $C({\mathbf{\Sigma}}_{i},\alpha )$ and $C({\mathbf{\Sigma}}_{j},\alpha )$ are removed when normalizing the kernel, and we get Equation (8). ☐

- Eriksson, A.; Eriksson, O.; Berglund, H. Species Abundance Patterns of Plants in Swedish Semi-Natural Pastures. Ecography **1995**, 18, 310–317.
- Cousins, S.A.; Eriksson, O. The influence of management history and habitat on plant species richness in a rural hemiboreal landscape, Sweden. Landsc. Ecol. **2002**, 17, 517–529.
- Gardi, C.; Tomaselli, M.; Parisi, V.; Petraglia, A.; Santini, C. Soil quality indicators and biodiversity in northern Italian permanent grasslands. Eur. J. Soil Biol. **2002**, 38, 103–110.
- Critchley, C.; Burke, M.; Stevens, D. Conservation of lowland semi-natural grasslands in the UK: A review of botanical monitoring results from agri-environment schemes. Biol. Conserv. **2004**, 115, 263–278.
- Werling, B.P.; Dickson, T.L.; Isaacs, R.; Gaines, H.; Gratton, C.; Gross, K.L.; Liere, H.; Malmstrom, C.M.; Meehan, T.D.; Ruan, L.; et al. Perennial grasslands enhance biodiversity and multiple ecosystem services in bioenergy landscapes. Proc. Natl. Acad. Sci. USA **2014**, 111, 1652–1657.
- Austrheim, G.; Olsson, E.G.A. How does continuity in grassland management after ploughing affect plant community patterns? Plant Ecol. **1999**, 145, 59–74.
- Norderhaug, A.; Ihse, M.; Pedersen, O. Biotope patterns and abundance of meadow plant species in a Norwegian rural landscape. Landsc. Ecol. **2000**, 15, 201–218.
- Waldhardt, R.; Otte, A. Indicators of plant species and community diversity in grasslands. Agric. Ecosyst. Environ. **2003**, 98, 339–351.
- Hansson, M.; Fogelfors, H. Management of a semi-natural grassland; results from a 15-year-old experiment in southern Sweden. J. Veg. Sci. **2000**, 11, 31–38.
- Moog, D.; Poschlod, P.; Kahmen, S.; Schreiber, K.F. Comparison of species composition between different grassland management treatments after 25 years. Appl. Veg. Sci. **2002**, 5, 99–106.
- Zechmeister, H.; Schmitzberger, I.; Steurer, B.; Peterseil, J.; Wrbka, T. The influence of land-use practices and economics on plant species richness in meadows. Biol. Conserv. **2003**, 114, 165–177.
- Plantureux, S.; Peeters, A.; McCracken, D. Biodiversity in intensive grasslands: Effect of management, improvement and challenges. Agron. Res. **2005**, 3, 153–164.
- Muller, S. Appropriate agricultural management practices required to ensure conservation and biodiversity of environmentally sensitive grassland sites designated under Natura 2000. Agric. Ecosyst. Environ. **2002**, 89, 261–266.
- Rocchini, D.; Boyd, D.S.; Féret, J.B.; Foody, G.M.; He, K.S.; Lausch, A.; Nagendra, H.; Wegmann, M.; Pettorelli, N. Satellite remote sensing to monitor species diversity: Potential and pitfalls. Remote Sens. Ecol. Conserv. **2016**, 2, 25–36.
- Pettorelli, N.; Laurance, W.F.; O’Brien, T.G.; Wegmann, M.; Nagendra, H.; Turner, W. Satellite remote sensing for applied ecologists: Opportunities and challenges. J. Appl. Ecol. **2014**, 51, 839–848.
- Newton, A.C.; Hill, R.A.; Echeverría, C.; Golicher, D.; Rey Benayas, J.M.; Cayuela, L.; Hinsley, S.A. Remote sensing and the future of landscape ecology. Prog. Phys. Geogr. **2009**, 33, 528–546.
- Gu, Y.; Wylie, B.K.; Bliss, N.B. Mapping grassland productivity with 250-m eMODIS NDVI and SSURGO database over the Greater Platte River Basin, USA. Ecol. Indic. **2013**, 24, 31–36.
- Li, Z.; Huffman, T.; McConkey, B.; Townley-Smith, L. Monitoring and modeling spatial and temporal patterns of grassland dynamics using time-series MODIS NDVI with climate and stocking data. Remote Sens. Environ. **2013**, 138, 232–244.
- Gu, Y.; Wylie, B.K. Developing a 30-m grassland productivity estimation map for central Nebraska using 250-m MODIS and 30-m Landsat-8 observations. Remote Sens. Environ. **2015**, 171, 291–298.
- Friedl, M.A.; Michaelsen, J.; Davis, F.W.; Walker, H.; Schimel, D.S. Estimating grassland biomass and Leaf Area Index using ground and satellite data. Int. J. Remote Sens. **1994**, 15, 1401–1420.
- Wylie, B.; Meyer, D.; Tieszen, L.; Mannel, S. Satellite mapping of surface biophysical parameters at the biome scale over the North American grasslands: A case study. Remote Sens. Environ. **2002**, 79, 266–278.
- Darvishzadeh, R.; Skidmore, A.; Schlerf, M.; Atzberger, C.; Corsi, F.; Cho, M. LAI and chlorophyll estimation for a heterogeneous grassland using hyperspectral measurements. ISPRS J. Photogramm. Remote Sens. **2008**, 63, 409–426.
- He, Y.; Guo, X.; Wilmshurst, J.F. Reflectance measures of grassland biophysical structure. Int. J. Remote Sens. **2009**, 30, 2509–2521.
- Asam, S.; Fabritius, H.; Klein, D.; Conrad, C.; Dech, S. Derivation of leaf area index for grassland within alpine upland using multi-temporal RapidEye data. Int. J. Remote Sens. **2013**, 34, 8628–8652.
- Schmidtlein, S.; Sassin, J. Mapping of continuous floristic gradients in grasslands using hyperspectral imagery. Remote Sens. Environ. **2004**, 92, 126–138.
- Ishii, J.; Lu, S.; Funakoshi, S.; Shimizu, Y.; Omasa, K.; Washitani, I. Mapping potential habitats of threatened plant species in a moist tall grassland using hyperspectral imagery. Biodivers. Conserv.
**2009**, 18, 2521–2535. [Google Scholar] [CrossRef] - Fava, F.; Parolo, G.; Colombo, R.; Gusmeroli, F.; Marianna, G.D.; Monteiro, A.; Bocchi, S. Fine-scale assessment of hay meadow productivity and plant diversity in the European Alps using field spectrometric data. Agric. Ecosyst. Environ.
**2010**, 137, 151–157. [Google Scholar] [CrossRef] - Oldeland, J.; Wesuls, D.; Rocchini, D.; Schmidt, M.; Jürgens, N. Does using species abundance data improve estimates of species diversity from remotely sensed spectral heterogeneity? Ecol. Indic.
**2010**, 10, 390–396. [Google Scholar] [CrossRef] - Feilhauer, H.; Faude, U.; Schmidtlein, S. Combining Isomap ordination and imaging spectroscopy to map continuous floristic gradients in a heterogeneous landscape. Remote Sens. Environ.
**2011**, 115, 2513–2524. [Google Scholar] [CrossRef] - Duniway, M.C.; Karl, J.W.; Schrader, S.; Baquera, N.; Herrick, J.E. Rangeland and pasture monitoring: An approach to interpretation of high-resolution imagery focused on observer calibration for repeatability. Environ. Monit. Assess.
**2012**, 184, 3789–3804. [Google Scholar] [CrossRef] [PubMed] - Punalekar, S.; Verhoef, A.; Tatarenko, I.V.; van der Tol, C.; Macdonald, D.M.J.; Marchant, B.; Gerard, F.; White, K.; Gowing, D. Characterization of a Highly Biodiverse Floodplain Meadow Using Hyperspectral Remote Sensing within a Plant Functional Trait Framework. Remote Sens.
**2016**, 8, 112. [Google Scholar] [CrossRef] - Hilker, T.; Natsagdorj, E.; Waring, R.H.; Lyapustin, A.; Wang, Y. Satellite observed widespread decline in Mongolian grasslands largely due to overgrazing. Glob. Chang. Biol.
**2014**, 20, 418–428. [Google Scholar] [CrossRef] [PubMed] - Cao, R.; Chen, J.; Shen, M.; Tang, Y. An improved logistic method for detecting spring vegetation phenology in grasslands from MODIS EVI time-series data. Agric. For. Meteorol.
**2015**, 200, 9–20. [Google Scholar] [CrossRef] - Eriksson, O.; Cousins, S.A.; Bruun, H.H. Land-use history and fragmentation of traditionally managed grasslands in Scandinavia. J. Veg. Sci.
**2002**, 13, 743–748. [Google Scholar] [CrossRef] - Zillmann, E.; Gonzalez, A.; Herrero, E.J.M.; van Wolvelaer, J.; Esch, T.; Keil, M.; Weichelt, H.; Garzón, A.M. Pan-European Grassland Mapping Using Seasonal Statistics From Multisensor Image Time Series. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.
**2014**, 7, 3461–3472. [Google Scholar] [CrossRef] - Ali, I.; Cawkwell, F.; Dwyer, E.; Barrett, B.; Green, S. Satellite remote sensing of grasslands: From observation to management. J. Plant Ecol.
**2016**, 9, 649–671. [Google Scholar] [CrossRef] - Nagendra, H. Using remote sensing to assess biodiversity. Int. J. Remote Sens.
**2001**, 22, 2377–2400. [Google Scholar] [CrossRef] - Blaschke, T.; Hay, G.J.; Kelly, M.; Lang, S.; Hofmann, P.; Addink, E.; Feitosa, R.Q.; van der Meer, F.; van der Werff, H.; van Coillie, F.; Tiede, D. Geographic Object-Based Image Analysis—Towards a new paradigm. ISPRS J. Photogramm. Remote Sens.
**2014**, 87, 180–191. [Google Scholar] [CrossRef] [PubMed] - Poças, I.; Cunha, M.; Pereira, L.S. Dynamics of mountain semi-natural grassland meadows inferred from SPOT-VEGETATION and field spectroradiometer data. Int. J. Remote Sens.
**2012**, 33, 4334–4355. [Google Scholar] [CrossRef] - Halabuk, A.; Mojses, M.; Halabuk, M.; David, S. Towards Detection of Cutting in Hay Meadows by Using of NDVI and EVI Time Series. Remote Sens.
**2015**, 7, 6107–6132. [Google Scholar] [CrossRef] - Lucas, R.; Rowlands, A.; Brown, A.; Keyworth, S.; Bunting, P. Rule-based classification of multi-temporal satellite imagery for habitat and agricultural land cover mapping. ISPRS J. Photogramm. Remote Sens.
**2007**, 62, 165–185. [Google Scholar] [CrossRef] - Toivonen, T.; Luoto, M. Landsat TM images in mapping of semi-natural grasslands and analysing of habitat pattern in an agricultural landscape in south-west Finland. FENNIA Int. J. Geogr.
**2003**, 181, 49–67. [Google Scholar] - Nagendra, H.; Lucas, R.; Honrado, J.P.; Jongman, R.H.; Tarantino, C.; Adamo, M.; Mairota, P. Remote sensing for conservation monitoring: Assessing protected areas, habitat extent, habitat condition, species diversity, and threats. Ecol. Indic.
**2013**, 33, 45–59. [Google Scholar] [CrossRef] - Price, K.P.; Guo, X.; Stiles, J.M. Optimal Landsat TM band combinations and vegetation indices for discrimination of six grassland types in eastern Kansas. Int. J. Remote Sens.
**2002**, 23, 5031–5042. [Google Scholar] [CrossRef] - Gamon, J.A.; Field, C.B.; Roberts, D.A.; Ustin, S.L.; Valentini, R. Airbone Imaging Spectrometry Functional patterns in an annual grassland during an AVIRIS overflight. Remote Sens. Environ.
**1993**, 44, 239–253. [Google Scholar] [CrossRef] - Corbane, C.; Lang, S.; Pipkins, K.; Alleaume, S.; Deshayes, M.; Millán, V.E.G.; Strasser, T.; Borre, J.V.; Toon, S.; Michael, F. Remote sensing for mapping natural habitats and their conservation status—New opportunities and challenges. Int. J. Appl. Earth Obs. Geoinf.
**2015**, 37, 7–16. [Google Scholar] [CrossRef] - Wulder, M.A.; Hall, R.J.; Coops, N.C.; Franklin, S.E. High Spatial Resolution Remotely Sensed Data for Ecosystem Characterization. BioScience
**2004**, 54, 511–521. [Google Scholar] [CrossRef] - Buck, O.; Millán, V.E.G.; Klink, A.; Pakzad, K. Using information layers for mapping grassland habitat distribution at local to regional scales. Int. J. Appl. Earth Obs. Geoinf.
**2015**, 37, 83–89. [Google Scholar] [CrossRef] - Franke, J.; Keuck, V.; Siegert, F. Assessment of grassland use intensity by remote sensing to support conservation schemes. J. Nat. Conserv.
**2012**, 20, 125–134. [Google Scholar] [CrossRef] - Schmidt, T.; Schuster, C.; Kleinschmit, B.; Forster, M. Evaluating an Intra-Annual Time Series for Grassland Classification—How Many Acquisitions and What Seasonal Origin Are Optimal? IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.
**2014**, 7, 3428–3439. [Google Scholar] [CrossRef] - Dusseux, P.; Vertès, F.; Corpetti, T.; Corgne, S.; Hubert-Moy, L. Agricultural practices in grasslands detected by spatial remote sensing. Environ. Monit. Assess.
**2014**, 186, 8249–8265. [Google Scholar] [CrossRef] [PubMed] - Schuster, C.; Schmidt, T.; Conrad, C.; Kleinschmit, B.; Förster, M. Grassland habitat mapping by intra-annual time series analysis—Comparison of RapidEye and TerraSAR-X satellite data. Int. J. Appl. Earth Obs. Geoinf.
**2015**, 34, 25–34. [Google Scholar] [CrossRef] - Psomas, A.; Kneubuhler, M.; Huber, S.; Itten, K.; Zimmermann, N.E. Hyperspectral remote sensing for estimating aboveground biomassand for exploring species richness patterns of grassland habitats. Int. J. Remote Sens.
**2011**, 32, 9007–9031. [Google Scholar] [CrossRef] - Hill, M.J. Vegetation index suites as indicators of vegetation state in grassland and savanna: An analysis with simulated SENTINEL-2 data for a North American transect. Remote Sens. Environ.
**2013**, 137, 94–111. [Google Scholar] [CrossRef] - Drusch, M.; Bello, U.D.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P.; Meygret, A.; Spoto, F.; Sy, O.; Marchese, F.; Bargellini, P. Sentinel-2: ESA’s Optical High-Resolution Mission for GMES Operational Services. Remote Sens. Environ.
**2012**, 120, 25–36. [Google Scholar] [CrossRef] - Laliberte, A.S.; Fredrickson, E.L.; Rango, A. Combining decision trees with hierarchical object-oriented image analysis for mapping arid rangelands. Photogramm. Eng. Remote Sens.
**2007**, 73, 197–207. [Google Scholar] [CrossRef] - Brenner, J.C.; Christman, Z.; Rogan, J. Segmentation of Landsat Thematic Mapper imagery improves buffelgrass (Pennisetum ciliare) pasture mapping in the Sonoran Desert of Mexico. Appl. Geogr.
**2012**, 34, 569–575. [Google Scholar] - Stenzel, S.; Fassnacht, F.E.; Mack, B.; Schmidtlein, S. Identification of high nature value grassland with remote sensing and minimal field data. Ecol. Indic.
**2017**, 74, 28–38. [Google Scholar] [CrossRef] - Evans, J.; Geerken, R. Classifying rangeland vegetation type and coverage using a Fourier component based similarity measure. Remote Sens. Environ.
**2006**, 105, 1–8. [Google Scholar] [CrossRef] - Esch, T.; Metz, A.; Marconcini, M.; Keil, M. Combined use of multi-seasonal high and medium resolution satellite imagery for parcel-related mapping of cropland and grassland. Int. J. Appl. Earth Obs. Geoinf.
**2014**, 28, 230–237. [Google Scholar] [CrossRef] - Duro, D.C.; Franklin, S.E.; Dubé, M.G. A comparison of pixel-based and object-based image analysis with selected machine learning algorithms for the classification of agricultural landscapes using SPOT-5 HRG imagery. Remote Sens. Environ.
**2012**, 118, 259–272. [Google Scholar] [CrossRef] - Sheeren, D.; Fauvel, M.; Josipović, V.; Lopes, M.; Planque, C.; Willm, J.; Dejoux, J.F. Tree Species Classification in Temperate Forests Using Formosat-2 Satellite Image Time Series. Remote Sens.
**2016**, 8, 734. [Google Scholar] [CrossRef] - Ding, Y.; Zhao, K.; Zheng, X.; Jiang, T. Temporal dynamics of spatial heterogeneity over cropland quantified by time-series NDVI, near infrared and red reflectance of Landsat 8 OLI imagery. Int. J. Appl. Earth Obs. Geoinf.
**2014**, 30, 139–145. [Google Scholar] [CrossRef] - Pan, Z.; Huang, J.; Zhou, Q.; Wang, L.; Cheng, Y.; Zhang, H.; Blackburn, G.A.; Yan, J.; Liu, J. Mapping crop phenology using NDVI time-series derived from HJ-1 A/B data. Int. J. Appl. Earth Obs. Geoinf.
**2015**, 34, 188–197. [Google Scholar] [CrossRef] - Cingolani, A.M.; Renison, D.; Zak, M.R.; Cabido, M.R. Mapping vegetation in a heterogeneous mountain rangeland using Landsat data: An alternative method to define and classify land-cover units. Remote Sens. Environ.
**2004**, 92, 84–97. [Google Scholar] [CrossRef] - Müller, H.; Rufin, P.; Griffiths, P.; Siqueira, A.J.B.; Hostert, P. Mining dense Landsat time series for separating cropland and pasture in a heterogeneous Brazilian savanna landscape. Remote Sens. Environ.
**2015**, 156, 490–499. [Google Scholar] [CrossRef] - Mitchell, T.M. Machine Learning; McGraw-Hill: New York, NY, USA, 1997. [Google Scholar]
- Donoho, D.L. High-dimensional data analysis: The curses and blessings of dimensionality. In Proceedings of the AMS Conference on Math Challenges of the 21st Century, Los Angeles, CA, USA, 8 August 2000. [Google Scholar]
- Fauvel, M.; Tarabalka, Y.; Benediktsson, J.A.; Chanussot, J.; Tilton, J.C. Advances in spectral-spatial classification of hyperspectral images. Proc. IEEE
**2013**, 101, 652–675. [Google Scholar] [CrossRef] - Hagolle, O.; Huc, M.; Villa Pascual, D.; Dedieu, G. A multi-temporal method for cloud detection, applied to FORMOSAT-2, VENuS, LANDSAT and SENTINEL-2 images. Remote Sens. Environ.
**2010**, 114, 1747–1755. [Google Scholar] [CrossRef] - Eilers, P.H.C. A Perfect Smoother. Anal. Chem.
**2003**, 75, 3631–3636. [Google Scholar] [CrossRef] [PubMed] - Atzberger, C.; Eilers, P.H. A time series for monitoring vegetation activity and phenology at 10-daily time steps covering large parts of South America. Int. J. Digit. Earth
**2011**, 4, 365–386. [Google Scholar] [CrossRef] - Atzberger, C.; Eilers, P.H.C. Evaluating the effectiveness of smoothing algorithms in the absence of ground reference measurements. Int. J. Remote Sens.
**2011**, 32, 3689–3709. [Google Scholar] [CrossRef] - Nitze, I.; Barrett, B.; Cawkwell, F. Temporal optimisation of image acquisition for land cover classification with Random Forest and MODIS time-series. Int. J. Appl. Earth Obs. Geoinf.
**2015**, 34, 136–146. [Google Scholar] [CrossRef] - Shao, Y.; Lunetta, R.S.; Wheeler, B.; Iiames, J.S.; Campbell, J.B. An evaluation of time-series smoothing algorithms for land-cover classifications using MODIS-NDVI multi-temporal data. Remote Sens. Environ.
**2016**, 174, 258–265. [Google Scholar] [CrossRef] - Kullback, S. Letter to the Editor: The Kullback-Leibler distance. Am. Stat.
**1987**, 41, 340–341. [Google Scholar] - Richards, J.A.; Jia, X. Remote Sensing Digital Image Analysis: An Introduction, 3rd ed.; Springer: Secaucus, NJ, USA, 1999. [Google Scholar]
- Mehta, N.A.; Gray, A.G. Generative and Latent Mean Map Kernels. Available online: https://www.researchgate.net/publication/45915310_Generative_and_Latent_Mean_Map_Kernels (accessed on 1 July 2017).
- Gomez-Chova, L.; Camps-Valls, G.; Bruzzone, L.; Calpe-Maravilla, J. Mean Map Kernel Methods for Semisupervised Cloud Classification. IEEE Trans. Geosci. Remote Sens.
**2010**, 48, 207–220. [Google Scholar] [CrossRef] - Muandet, K.; Fukumizu, K.; Dinuzzo, F.; Schölkopf, B. Learning from distributions via support measure machines. In Advances in Neural Information Processing Systems 25; Curran Associates: Lake Tahoe, NV, USA, 2012; pp. 10–18. [Google Scholar]
- Tarantola, A. Inverse Problem Theory and Methods for Model Parameter Estimation; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2005. [Google Scholar]
- Lopes, M.; Fauvel, M.; Girard, S.; Sheeren, D. High dimensional Kullback–Leibler divergence for grassland management practices classification from high resolution satellite image time series. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 3342–3345. [Google Scholar]
- Ledoit, O.; Wolf, M. A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal.
**2004**, 88, 365–411. [Google Scholar] [CrossRef] - Wilcoxon, F. Individual Comparisons by Ranking Methods. Biometr. Bull.
**1945**, 1, 80–83. [Google Scholar] [CrossRef] - Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res.
**2011**, 12, 2825–2830. [Google Scholar] - Möckel, T.; Dalmayne, J.; Prentice, H.C.; Eklundh, L.; Purschke, O.; Schmidtlein, S.; Hall, K. Classification of Grassland Successional Stages Using Airborne Hyperspectral Imagery. Remote Sens.
**2014**, 6, 7732–7761. [Google Scholar] [CrossRef]

Class | No. of Grasslands | No. of Pixels |
---|---|---|
Old | 59 | 31,166 |
Young | 416 | 129,348 |
Total | 475 | 160,514 |

Class | No. of Grasslands | No. of Pixels |
---|---|---|
Mowing | 34 | 6265 |
Grazing | 10 | 1193 |
Mixed | 8 | 1170 |
Total | 52 | 8628 |

Method | PMV | EMK | $\mathit{\mu}$ | HDKLD | BD | GMK | $\mathit{\alpha}$GMK |
---|---|---|---|---|---|---|---|
Level | Pixel | Object | Object | Object | Object | Object | Object |
Explanatory variable | ${\mathbf{x}}_{ik}$ | ${\mathbf{x}}_{ik}$ | ${\mathit{\mu}}_{i}$ | ${\mathcal{N}}_{i}$ | ${\mathcal{N}}_{i}$ | ${\mathcal{N}}_{i}$ | ${\mathcal{N}}_{i}$ |
Kernel | RBF | RBF | RBF | ${K}_{\mathrm{HDKLD}}$ | ${K}_{\mathrm{B}}$ | ${\tilde{K}}^{G}$ | ${\tilde{K}}^{\alpha}$ |
Parameters | $\sigma $, C | $\sigma $, C | $\sigma $, C | $\sigma $, C | $\sigma $, C | $\gamma $, C | $\gamma $, $\alpha $, C |
No. of samples | 16,250/8628 | 16,250/8628 | 475/52 | 475/52 | 475/52 | 475/52 | 475/52 |

Method | Parameter Values, Inter-Annual Analysis | Parameter Values, Intra-Annual Analysis |
---|---|---|
PMV | $\sigma \in \{{2}^{0},{2}^{1},\dots ,{2}^{10}\}$ | $\sigma \in \{{2}^{-17},{2}^{-16},\dots ,{2}^{-10}\}$ |
EMK | $\sigma \in \{{2}^{0},{2}^{1},\dots ,{2}^{10}\}$ | $\sigma \in \{{2}^{-18},{2}^{-17},\dots ,{2}^{-10}\}$ |
$\mathit{\mu}$ | $\sigma \in \{{2}^{0},{2}^{1},\dots ,{2}^{10}\}$ | $\sigma \in \{{2}^{-18},{2}^{-17},\dots ,{2}^{-10}\}$ |
HDKLD | $\sigma \in \{{2}^{10},{2}^{11},\dots ,{2}^{20}\}$ | $\sigma \in \{{2}^{15},{2}^{16},\dots ,{2}^{25}\}$ |
BD | $\sigma \in \{{2}^{0},{2}^{1},\dots ,{2}^{10}\}$ | $\sigma \in \{{2}^{10},{2}^{11},\dots ,{2}^{18}\}$ |
GMK | $\gamma \in \{{2}^{0},{2}^{1},\dots ,{2}^{10}\}$ | $\gamma \in \{{2}^{-17},{2}^{-18},\dots ,{2}^{-10}\}$ |
$\alpha $GMK | $\gamma \in \{{2}^{0},{2}^{1},\dots ,{2}^{10}\}$ | $\gamma \in \{{2}^{-18},{2}^{-17},\dots ,{2}^{-13}\}$ |
$\alpha $GMK | $\alpha \in \{0,0.1,0.5,1,2,5,10,15,20,25,50\}$ | $\alpha \in \{0,{10}^{-3},{10}^{-2},{10}^{-1},0.3,0.5,0.7,0.9,1,2,5,10,15,20,25\}$ |
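The power-of-two grids in the table above can be enumerated programmatically. The following is a minimal sketch in plain Python, not the authors' code: it builds the inter-annual $\gamma $ and $\alpha $ grids for $\alpha $GMK and lists every candidate combination. The helper `powers_of_two` and the values in `c_grid` are assumptions made for illustration, since the table does not report the values searched for C.

```python
from itertools import product


def powers_of_two(lo, hi):
    """Return [2**lo, 2**(lo + 1), ..., 2**hi] as floats (hypothetical helper)."""
    return [2.0 ** k for k in range(lo, hi + 1)]


# Inter-annual grids for alpha-GMK, copied from the table above.
gamma_grid = powers_of_two(0, 10)
alpha_grid = [0, 0.1, 0.5, 1, 2, 5, 10, 15, 20, 25, 50]

# Assumption: the table does not list the C values, so these are placeholders.
c_grid = [1.0, 10.0, 100.0]

# One dictionary per (gamma, alpha, C) combination to evaluate.
candidates = [
    {"gamma": g, "alpha": a, "C": c}
    for g, a, c in product(gamma_grid, alpha_grid, c_grid)
]

print(len(gamma_grid), len(alpha_grid), len(candidates))
```

Each dictionary in `candidates` could then be scored by cross-validation with whichever kernel implementation is in use; the number of candidates grows multiplicatively with each added parameter, which is one reason the $\alpha $GMK search is more costly than the GMK search.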

Method | PMV | $\mathit{\mu}$ | HDKLD | BD | EMK | GMK | $\alpha $GMK |
---|---|---|---|---|---|---|---|
PMV | - | 3.52 ** | 8.66 ** | 4.83 ** | 1.93 | 0.98 | 1.32 |
$\mathit{\mu}$ | | - | 7.48 ** | 1.76 | 1.55 | 2.28 ** | 4.80 ** |
HDKLD | | | - | 5.68 ** | 8.26 ** | 8.65 ** | 9.77 ** |
BD | | | | - | 3.23 ** | 3.95 ** | 6.09 ** |
EMK | | | | | - | 0.94 | 3.35 ** |
GMK | | | | | | - | 2.42 ** |
$\alpha $GMK | | | | | | | - |

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).