Object-Based Classification of Grasslands from High Resolution Satellite Image Time Series Using Gaussian Mean Map Kernels

This paper deals with the classification of grasslands using high resolution satellite image time series. Grasslands considered in this work are semi-natural elements in fragmented landscapes, i.e., they are heterogeneous and small elements. The first contribution of this study is to account for grassland heterogeneity while working at the object scale by modeling its pixels distributions by a Gaussian distribution. To measure the similarity between two grasslands, a new kernel is proposed as a second contribution: the a-Gaussian mean kernel. It allows to weight the influence of the covariance matrix when comparing two Gaussian distributions. This kernel is introduced in Support Vector Machine for the supervised classification of grasslands from south-west France. A dense intra-annual multispectral time series of Formosat-2 satellite is used for the classification of grasslands management practices, while an inter-annual NDVI time series of Formosat-2 is used for 
permanent and temporary grasslands discrimination. Results are compared to other existing pixel- and object-based approaches in terms of classification accuracy and processing time. The proposed method shows to be a good compromise between processing speed and classification accuracy. It can adapt to the classification constraints and it encompasses several similarity measures known in the literature. It is appropriate for the classification of small and heterogeneous objects such as grasslands.

chlorophyll content [20][21][22][23][24]. Studies having biodiversity conservation schemes such as assessing plant 48 diversity and plant community composition in a grassland are usually based on ground spectral 49 measurements or airborne acquisitions at a very high spatial resolution [25][26][27][28][29][30][31]. However, such 50 acquisitions are time-consuming and expensive and thus they do not allow for continuous monitoring 51 of grasslands over the years.

52
Using satellite remote sensing images, grasslands have been studied a lot at a regional scale with 53 medium spatial resolution sensors (i.e., MODIS,250m/pixel [17,18,32]), where the Minimum Mapping 54 Unit (MMU) is at least of hundreds of meters. This scale is suitable for large, extensive, homogeneous 55 and contiguous regions like steppes [33], but not for fragmented landscapes which are usually found 56 in Europe and in France particularly [34,35]. These fragmented landscapes are made of a patchwork 57 of different land covers which have a small area [35]. In these types of landscapes, grasslands can 58 be smaller (less than 10,000m 2 ) than the pixel resolution [36] (see Figure 1 for a graphical example).

59
As a consequence, pixels containing grasslands are usually a mixture of other contributions, which 60 can limit the analysis [37,38]. As examples, Poças et al. [39] had to select large contiguous areas of 61 semi-natural grasslands in a mountain region of Portugal to be able to use SPOT-VEGETATION data 62 (1-km resolution). Halabuk et al. [40] also had to select only one MODIS pixel per homogeneous 63 sample site in Slovakia to detect cutting in hay meadows. A 30-m pixel resolution is still not sufficient 64 for grassland characterization. Indeed, Lucas et al. [41] and Toivonen and Luoto [42] showed that it 65 was more difficult to classify fragmented and complex elements [43], like semi-natural grasslands, than 66 homogeneous habitats, using Landsat imagery. Price et al. [44] classified six grassland management 67 types in Kansas using six Landsat images, but the accuracy of the classification was not satisfying (less 68 than 70%). Therefore, to detect small grasslands in fragmented landscapes, high spatial resolution 69 images are required [36,45,46]. estimate, increasing the computation time and making the computation unstable (i.e., ill-conditioned 121 covariance matrices...) [68,69]. Hence, conventional models are not appropriate if one wants to use 122 all the spectro-temporal information of time series with high spatial and temporal resolutions. Thus, 123 classifying grasslands with this type of data is still considered as a challenge [52]. 124 In the present study, we introduce a model suitable for the classification of grasslands using 125 satellite image time series (SITS) with a high number of spectro-temporal variables (e.g., Sentinel-2 126 data). Two temporal scales are considered in this work: (i) an inter-annual time series of three years to 127 discriminate old grasslands from young grasslands and (ii) an intra-annual time series to identify the 128 management practices. Note that in this work, the objects are not found from segmentation [38] but 129 from existing dataset in a polygon form.

130
The first contribution of this study is to model a grassland at the object level while accounting 131 for the spectral variability within a grassland. We consider that the distribution of the pixel spectral 132 reflectance in a given grassland can be modeled by a Gaussian distribution. The second contribution 133 is to propose a measure of similarity between two Gaussian distributions that is robust to the high 134 dimension of the data. This method is based on the use of covariance through mean maps. The 135 last contribution is the application of the method to old and young grasslands discrimination and of 136 management practices classification, which are non common applications in remote sensing. Moreover, 137 in our knowledge, mean maps have not yet been used on Gaussian distributions for supervised 138 classification of SITS at the object level.

139
In the next section, the materials used for the experimental part of this study are presented.

159
images were all acquired with the same viewing angle. They were orthorectified, radiometrically and  smoothing on a grassland pixel is provided in Figure 4. This pixel is hidden by a light cloud during 177 one image acquisition (red cross). Notice that the smoothing is done at the cost of under-estimating 178 the local maxima of the temporal profile.  In this study, "old" grasslands are 14 years old or more, whereas "young" grasslands are less 186 than five years old. The French agricultural land use database (Registre Parcellaire Graphique) was 187 used to extract the grasslands depending on their age. It registers on an annual basis the cultivated 188 areas declared by the farmers in a GIS. Grasslands are declared as "permanent" or "temporary".

189
Permanent grasslands are at least five years old, whereas temporary grasslands are less than five    mowings, grazing and mixed (mowing then grazing). We eliminated the type "two mowings" of the 207 dataset because of its under-representation (only three grasslands).

208
The management types were used as classes for the classification ( Table 2). The grasslands were 209 digitalized manually after field work. A negative buffer of 8 meters was then applied to eliminate the 210 edge effects, before rasterizing the polygons. The average grasslands surface area is about 10,000m 2 .

211
The smallest grassland is 1,632m 2 (which represents 25 Formosat-2 pixels) and the largest is 47,111m 2 212 (735 pixels) ( Figure 6).   In this work, each grassland g i is composed of a given number n i of pixels x ik ∈ R d , where k is the 216 pixel index such as k ∈ {1, ..., n i }, i ∈ {1, . . . , G}, G is the total number of grasslands, N = ∑ G i=1 n i is 217 the total number of pixels, d = n B n T is the number of spectro-temporal variables, n B is the number of 218 spectral bands and n T is the number of temporal acquisitions. In the experimental part, when working 219 on the intra-annual time series of 2013 using the four spectral bands, d = 4 × 17 = 68. In 2014, d = 220 4 × 15 = 60. When working on the inter-annual times series using NDVI, d = 1 × (13 + 17 + 15) = 45.

221
With each grassland g i are associated a matrix X i of size (n i × d) and a response variable y i ∈ R which 222 corresponds to its class label.

223
In the following, two types of grassland modeling are discussed, at the pixel level and at the 224 object level. A more informative object level modeling is then proposed. Then, similarity measures are 225 discussed.

Pixel level 227
The representation of a grassland at the pixel level has been used a lot in the remote sensing 228 literature [17,42,44,48,49,52,59]. The grassland can either be represented by all its pixels or by one pixel 229 when the spatial resolution of the pixel is too coarse, see for instance [39,40]. In this representation, 230 a sample is a pixel. Therefore, with each x ik is associated the response variable y i of g i , but x ik is At the object level, the mean vector µ i of the pixels belonging to g i is generally used to represent 236 g i . It is estimated empirically by: In this case, a vectorμ i ∈ R d and a response variable y i ∈ R are associated with each grassland.

238
This representation might be limiting for heterogeneous objects such as grasslands since the the variance feature only: covariance must also be included.

246
In this study, to account for the spectro-temporal variability, we assume that the distribution of 247 pixels x i is, conditionally to grassland g i , a Gaussian distribution N (µ i , Σ i ), where Σ i is the covariance 248 matrix estimated empirically by: In this case, we associate with each g i its estimated distribution N (μ i ,Σ i ) and a response variable 250 y i ∈ R. The Gaussian modeling encodes first and second order information on the grassland by 251 exploiting the variance-covariance information. It is worth noting that if we constraintΣ i = I d , the 252 identity matrix of size d, for i ∈ [1, . . . , G], the Gaussian modeling is reduced to the mean vector. In the For classification purposes, a similarity measure between each pair of grasslands is required.

257
With pixel-based or mean modeling approaches, conventional kernel methods such as Support Vector 258 Machine (SVM) with a Radial Basis Function (RBF) kernel can be used since the explanatory variable 259 is a vector. However for a Gaussian modeling, i.e., when the explanatory variable is a distribution, 260 specific derivations are required to handle the probability distribution as an explanatory variable.

261
Many similarity functions generally used to compare two Gaussian distributions (e.g., The y-axis is the NIR reflectance value. The plot on the left shows the evolution of all the pixels in the grassland and the temporal mean of these pixels in red. The plot in the middle shows the temporal mean in red, the temporal mean +0.2× the first eigenvector in blue and the temporal mean −0.2× the first eigenvector in black. The plot on the right shows the temporal mean in red, the temporal mean +0.2× the second eigenvector in blue and the temporal mean −0.2× the second eigenvector in black. for each grassland is d(d + 3)/2 (d parameters for the mean vector and d(d + 1)/2 parameters for the  evaluations over the available realizations of p i and p j (i.e., pixels that belong to grasslands g i or g j ). It 280 corresponds to the empirical mean kernel [79, eq.(8)]: where n i and n j are the number of pixels associated with p i and p j respectively, x il is the l th realization 282 of p i , x jm is the m th realization of p j and k is a semi-definite positive kernel function.
It is possible to include prior knowledge on the distributions by considering the generative mean 284 kernel [78]: Note that eq. (3) acts on the realizations of p i while eq. (4) acts on its estimation. When dealing with a 286 large number of samples, the latter can drastically reduce the computational load with respect to the 287 former.

288
In our grassland modeling, p i and p j are assumed to be Gaussian distributions. In that case, if k is where γ is a positive regularization parameter coming from the Gaussian kernel k and | · | stands for 292 the determinant.

293
This kernel is not normalized, i.e., K G (N i , N i ) = 1, but the normalization can be achieved easily: However, in the case of very small grasslands, two problems remain. The first lies in the ridge 300 regularization: in this case, so low γ values are selected that it becomes too much regularized and it 301 deteriorates the information. The second problem is that the estimation of the covariance matrix has 302 a large variance when the number of samples used for the estimation is lower than the number of 303 variables. Therefore, the covariance matrix becomes a poorly informative feature. In the following, we 304 propose a new kernel function that allows to weight the covariance features with respect to the mean 305 features. Depending on the level of heterogeneity and the size of the grassland, the covariance matrix could 308 be more or less important for the classification process. We propose a kernel including an additional 309 positive parameter α which allows to weight the influence of the covariance matrix, the α-generative 310 mean kernel: When p i and p j are Gaussian distributions, k is a Gaussian kernel and the normalization is applied, 312 the expression gives rise to the α-Gaussian mean kernel: The proof is given in the appendix. It is interesting to note that particular values of α and γ lead to Small eigenvalues of the covariance matrices are shrinked to the value 10 −5 to make the   Table 3.

364
For memory issues during the SVM process, the number of pixels processed for the old and young 365 grasslands classification was divided by 10 for the two methods based on pixels (PMV and EMK).

366
Only 1 pixel out of 10 was kept per grassland. αGMK consists in a general modeling of the grassland at the object level and it encompasses several known modelings. The underlined methods are tested in this study. PMV, EMK and µ are not based on Gaussian modeling while the others are.   368 We compared the efficiency in terms of classification accuracy and processing time of all the 369 presented methods by classifying the two grassland datasets on inter-annual and intra-annual time 370 series (section 2).

371
For each method, a Monte Carlo procedure was performed on 100 runs. For each run, the dataset 372 was split randomly into training and testing datasets (75% for training and 25% for testing), preserving 373 the initial proportions of each class. The same grasslands were selected for a given Monte Carlo 374 repetition regardless of the method.

375
During each repetition, the optimal parameters were tuned by cross-validation based on the best 376 F1 score. Table 4 contains the parameters grid search for all the methods. Note that a wide grid was  The kernels and the SVM were implemented in Python through the Scikit library [85].
388 Table 4. Parameters tested for each method during cross-validation.

404
Indeed, processing the 160,514 pixels was not possible with SVM, so we had to reduce the number 405 of samples. These issues are not faced with object-oriented methods. The fastest methods are µ and 406 HDKLD, but they did not reach acceptable classification accuracies. The best method in terms of ratio 407 accuracy/processing time is αGMK. It is appropriate for processing a large number of grasslands.    The divergence methods (BD and HDKLD) provided the worst results, showing that they are not 437 robust enough to a high dimensional space.

438
Although they provided results close to the best results, pixel-based methods (PMV and EMK) 439 are the most demanding in terms of computational time and they do not scale well with the number of 440 pixels. Indeed, they have to process N pixels instead of G grasslands with G N. Therefore, we had to reduce the number of pixels used for the classification. Using them on a large area might be difficult, 442 as the old and young grasslands dataset showed.

443
Representing grasslands by the estimated distribution of their set of pixels decreases the 444 complexity during the SVM process. Therefore, object level methods offer a lower computational load 445 when compared to empirical mean kernels and pixel-based methods.

446
The mean generative kernel methods performed significantly better than the mean-only method 447 (µ). Among them, αGMK performed better than GMK. It was also one of the most stable methods.

448
In this context, including the covariance information helps to discriminate grasslands. However,

449
if the dimensionality is not properly handled, it deteriorates the process (e.g., BD and HDKLD). In this 450 case, it is preferable to use the mean values only. αGMK offers the possibility to weight the influence 451 of the covariance information compared to the mean. As a result, it provided better results than the 452 mean modeling and than GMK, since it encompasses both.

453
It is furthermore interesting to analyze the optimal values of the weightening parameter α found Following on from the methods discussion, the choice of modeling grasslands pixels distribution 461 by a Gaussian distribution makes sense in this context. It is particularly well appropriate for 462 semi-natural grasslands, which are very heterogeneous, contrary to crops or annual "artificial" 463 grasslands which can be assimilated to crops.

464
However, modeling grasslands by the mean only produced equivalent results to the methods 465 based on Gaussian modeling for the classification of management practices, contrary to the old and 466 young grassland discrimination. Indeed, management practices are supposed to be uniform at the 467 grassland scale. Therefore the mean appears to be sufficient for this application, contrary to the old and 468 young grasslands discrimination, which requires capturing more variations between the grasslands.

469
The best modeling might be different depending on the application. Moreover, some grasslands are so 470 small that the covariance matrix is too badly estimated.

471
In the proposed kernel, this modeling was made flexible by regularizing the weight given to the 472 covariance matrix. αGMK benefits from its high level of adaptability in front of the object configuration: 473 no choice has to be made between a Gaussian or a mean modeling since the method encompasses both.

474
It also includes several object level methods known in the literature. However, this is at the cost of one 475 more parameter to tune. Therefore, the classification process takes more time than GMK for instance.

476
Above all, although it is the first application of generative mean kernels in remote sensing 477 classification, the α-Gaussian mean kernel proved its efficiency and stability in these experiments. The 478 results suggest it is appropriate for grasslands classification. It is not shown in this experiment, but using only one or two years of acquisitions to discriminate 488 old from young grasslands did not produce sufficient classification accuracies. This is the reason 489 why three years of data were used. Old "permanent" grasslands are supposed to have a more stable 490 phenology over the years than the young "temporary" grasslands which have been recently sown 491 (less than five years) [6]. The young grasslands phenology is closer to crops in their very first years.

492
We suppose this makes possible their discrimination with inter-annual SITS. However, the optimal 493 number of years needed to discriminate these types of grasslands could constitute a research topic.

494
In general, the results could also be enhanced by removing some winter images which can have 495 a negative influence on the entire annual time series [40]. However, the scope of this study was to 496 develop a method which is able to use a given time series, without having to process a date selection.  In our knowledge, only the work of Möckel et al. [86] relates to the classification of grasslands age 512 using remote sensing data. They reached a Kappa value of 0.77 in classifying three different grassland 513 age-classes. However, they used airborne hyperspectral data from a single date. Their recommendation 514 was to use multitemporal data to improve the classification or to use satellite hyperspectral data to 515 monitor grasslands over wider areas. Our study was based on using multispectro-temporal satellite 516 data, but our proposed method would also work with hyperspectral data.

517
As described in the introduction, few studies have been carried out on the analysis of semi-natural 518 grasslands using high spatio-spectro-temporal resolutions SITS. Usually, methods were pixel-based 519 and they were applied on a few images or on a precise date selection to avoid dealing with the high dimension of data [42,44,49]. Schuster et al. [52] successfully classified grassland habitat using 21 521 RapidEye images on a pixel basis, but there was no mention of the processing times.

522
At the object level using a time series, grasslands were often represented by their mean NDVI, 523 such as in [60], who noticed the difficulty to discriminate grasslands from crops because of mean

534
All the plots declared as grasslands in 2014, i.e., "permanent grassland" and "temporary grassland" 535 regardless of their age, were selected. After applying a negative buffer of 8m and rasterizing the 536 polygons, we removed the plots representing less than 10 Formosat-2 pixels. In the end, there were 797 537 grassland plots covered by the extent of Formosat-2 for a total of 252,472 pixels.

538
The multispectral SITS of 2014 was used. The SVM was trained on the whole field data 539 (section 2.3.2) using the same grid search as in the experiments. The parameters chosen after 540 cross-validation based on F1 score wereα = 5 andγ = 2 −15 . Then, the model was used to predict the 541 management practices of the 797 grasslands of the land use database.

542
The classification accuracy could not be assessed since the true labels of the grasslands are not 543 known. However, as described in the study site, a spatial distribution of the classes could be expected.

544
Indeed, grazed and mixed grasslands should be found in the south-west of the site, whereas more 545 mown grasslands should be in the north.

546
An extract of the classification result is shown in Figure 13. It represents the classified grasslands in 547 their raster format. As expected, most of the grazed and mixed grasslands are located in the south-west 548 of the image, whereas the north of the image is mostly composed of mown grasslands. Therefore, 549 αGMK was very likely able to classify with an acceptable accuracy the grasslands management 550 practices without any a priori geographic information. However, specific care should be considered, as 551 not all the possible management practices were predicted. For instance, grasslands mown twice or 552 unused grasslands were not in the training dataset, but it does not mean these managements do not 553 exist in the rest of the data. The method deserves to be tested with an exhaustive grassland typology 554 to produce more detailed grasslands maps.

555
In terms of processing times, the proposed method is able to classify 800 grasslands, representing 556 more than 250,000 pixels, at the object level from a high spatial resolution SITS within a few seconds 557 on a conventional personal computer.

559
This study aimed at developing a model for the classification of grasslands using satellite image 560 time series with a high number of spectro-temporal variables. A grassland modeling at the object 561 level was proposed. To deal with grasslands heterogeneity, their pixels distribution was modeled by 562 a Gaussian distribution. Then, to measure the similarity between two grasslands, i.e, two Gaussian must be accounted for. In terms of processing times, the object-based methods were much faster than 568 pixel-based methods.

569
Several contributions have been made in this work. The first lies in the grasslands pixels 570 distribution modeling at the object level. A flexible kernel was proposed to encompass both Gaussian 571 and mean modeling of grasslands, so no choice has to be made between these two modelings. It can 572 therefore be used on homogeneous objects such as artificial grasslands, or on very small objects, as well 573 as on heterogeneous semi-natural grasslands. The second contribution is that this kernel is suitable for 574 high dimensional data in a small ground sample size context. It enables the use of all the multispectral 575 data instead of a single vegetation index or the use of a long time series. Also, it can be used on a 576 whole time series without dates selection. Indeed, this new kernel offers very low computational load.

577
It can therefore be applied on a large dataset. With this kernel, we were able to process and to classify 578 more than 250,000 pixels on a conventional personal computer within a few seconds. Even if it is the 579 first application of generative mean kernels in remote sensing classification, the αGMK proved its 580 efficiency and stability in these experiments. It is a good compromise between processing speed and 581 accuracy for the classification of grasslands.

582
The αGMK deserves to be tested on a larger dataset with more balanced classes. Seasonal statistics 583 could be used to improve the representation of grassland phenology. These ideas will be considered in 584 the future. This method was designed to deal with the dense SITS which will be provided by Sentinel-2 585 and to efficiently produce maps from this type of data. Other applications of the method are still 586 possible (e.g., small and heterogeneous objects such as peatlands, urban areas...).

Supplementary Materials:
The following are available online at www.mdpi.com/link, Figure S1: True color 588 composite images of the Formosat-2 time series of 2014; Figure S2: Boxplot of Kappa coefficient repartitions for the 589 classification of the old and young grasslands; Table S1: Average user accuracy (UA) and producer accuracy (PA) 590 (%) over the 100 repetitions for each class, 1: Old, 2: Young; Figure  Mastodons-CNRS. The authors would like to thank CNES and CESBIO for providing the pre-processed Formosat-2 596 data. Special thanks to M. Lang for playing a major role in the field work, to D. Dallery for designing the processing 597 chain to compute the grasslands age from the RPG, and to R. Carrié for his careful reviewing of the introduction. 598 We would like to thank the anonymous reviewers for their constructive comments. The following abbreviations are used in this manuscript: Proof of eq. (8). First, let us write the Gaussian distribution p i to the power of α −1 : Then, plugging eq. (A1) in eq. (7), we get: |αΣ i + αΣ j + γ −1 I d | 0.5 , which is eq. (5) with the covariance matrix of the Gaussian distribution scaled with α. The constants 613 C(Σ i , α) and C(Σ j , α) are removed when normalizing the kernel and we get eq. (8).