Self-Supervised Learning of Satellite-Derived Vegetation Indices for Clustering and Visualization of Vegetation Types

Vegetation indices are commonly used for the retrieval of biophysical and chemical attributes of vegetation. This paper presents the potential of a self-supervised learning approach based on Autoencoders (AEs) and Convolutional Autoencoders (CAEs) for the decorrelation and dimensionality reduction of high-dimensional vegetation indices derived from satellite observations. This research was implemented in Mt. Zao and its base in northeast Japan, an area with a cool temperate climate, by collecting ground truth points belonging to 16 vegetation types (including some non-vegetation classes) in 2018. Monthly median composites of 16 vegetation indices were generated by processing all Sentinel-2 scenes available for the study area from 2017 to 2019. The performance of the AEs- and CAEs-based compressed images for the clustering and visualization of vegetation types was quantitatively assessed by computing bootstrap resampling-based confidence intervals. The AEs- and CAEs-based compressed images with three features showed improvements of around 4% and 9%, respectively, in the confidence intervals over the classical method. The CAEs, which use convolutional neural networks, showed better feature extraction and dimensionality reduction capacity than the AEs. The class-wise performance analysis also showed the superiority of the CAEs. This research highlights the potential of AEs and CAEs for attaining a fine clustering and visualization of vegetation types.


Introduction
Vegetation is an integral component of life, and the identification and classification of vegetation types provide valuable information for understanding the distribution and dynamics of vegetation in response to environmental changes. Spectral reflectance measured from remote sensing platforms provides crucial information for the identification and discrimination of vegetation types.
The reflectance measured by remote sensors varies with specific biophysical and chemical attributes such as plant type, leaf pigments, water content, and morphological characteristics of the plant canopy concerned [1,2]. Vegetation indices, which are arithmetic combinations of reflectance in multiple wavelengths, have been derived for detecting the biophysical and chemical attributes of vegetation [3]. Vegetation indices are commonly utilized for monitoring and evaluating the extent and coverage of vegetation types [4,5]. However, a large number of vegetation indices exist in the literature, and a large number of input variables complicates modelling and prediction and impairs accuracy, a problem known as the "curse of dimensionality" [6,7]. To cope with this problem, dimensionality reduction techniques, which transform a high-dimensional dataset into a lower-dimensional representation, have been proposed [8,9].
Machine learning is a commonly used technique for interpreting remote sensing images into vegetation parameters. A number of machine learning algorithms are available for dimensionality reduction. The Random Forests (RFs) algorithm, an ensemble of decision trees built by splitting the attributes of the data and averaging the output value of all trees, is one of the effective machine learning algorithms for learning non-linear data interactions [10,11]. The RFs algorithm also provides an effective statistical measure for determining variable importance [12][13][14]. Researchers have utilized the RFs-based retrieval of important variables as a means of reducing the dimensions of data [15][16][17][18] and for the classification of land cover and vegetation types [19][20][21].
Other classical dimensionality reduction techniques include principal component analysis [22][23][24], t-distributed stochastic neighbor embedding [25,26], and modified stochastic neighbor embedding [27]. Oliveira et al. [28] assessed the performance of classical techniques and proposed a fractal-based algorithm to remove redundant attributes accurately.
Artificial neural networks (ANNs) have demonstrated effectiveness in a number of climate change and ecological studies, such as change detection [29], plant identification [30,31], modeling the distribution of vegetation in past, present, and future climates [32], estimation of standing crop and fuel moisture content [33], and mixture estimation for vegetation mapping [34].
In recent years, the use of Autoencoders (AEs) to create low-dimensional projections of high-dimensional data has attracted increasing attention. AEs are ANNs designed for learning self-supervised latent representations of multidimensional data [35][36][37]. The AEs provide a latent-space representation with a reduced dimensionality through the process of compressing (encoding) and decompressing (decoding) the multi-dimensional data [38,39].
The major objective of this paper is to present a self-supervised learning approach based on Autoencoders (AEs) and Convolutional Autoencoders (CAEs) for the decorrelation and dimensionality reduction of high-dimensional vegetation indices derived from satellite observations. The compressed images are utilized for the clustering and visualization of vegetation types, and they are compared with the important features retrieved by the Random Forests (RFs) algorithm. The potential of this approach for the classification of vegetation types is also assessed using the RFs classifier.

Study Area
This research was implemented in Mt. Zao, which is located on the border between Yamagata and Miyagi prefectures in Japan. This region is characterized by a cool temperate climate with snowfall during winter. It represents a typical mountainous ecosystem in northeastern Japan. The location of the study area is shown in Figure 1.

Collection of Ground Truth Data
The performance of Autoencoders (AEs) and Convolutional Autoencoders (CAEs) for the clustering and classification of vegetation types was assessed with the support of ground truth data. The ground truth data were collected through a field survey conducted in 2018. The field survey was assisted by time-lapse images available in Google Earth. For each vegetation type, 107-300 sample points (longitudes and latitudes), each representing a homogeneous area of at least 30 × 30 m, were collected. The vegetation types present in the study region and dealt with in this research are listed in Table 1.

Processing of Satellite Data
Sentinel-2 scenes available for the study area from 2017 to 2019 (343 scenes in total) were processed. All images were processed for cloud removal and atmospherically corrected to obtain top of canopy reflectance using the Sen2Cor software (v2.8). For each Sentinel-2 scene, 16 vegetation indices (as shown in Table 2) were calculated, and the resulting vegetation index images were composited by computing monthly median values. In this manner, monthly median composites of the 16 vegetation indices were obtained, forming the 192-dimensional stack of vegetation indices used for the dimensionality reduction.
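The compositing step can be illustrated with a minimal sketch. It assumes NumPy, uses the normalized difference vegetation index (NDVI) as a stand-in because the 16 indices of Table 2 are not reproduced here, and represents cloud-masked pixels as NaN; the function names `ndvi` and `monthly_median_composite` and the toy reflectance values are hypothetical, not taken from the source.

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized difference of near-infrared and red reflectance (stand-in index)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)

def monthly_median_composite(index_scenes, months):
    """Per-pixel median of one index for each calendar month.

    index_scenes: array (n_scenes, height, width); NaN marks cloud-masked pixels.
    months:       array (n_scenes,) with the calendar month (1..12) of each scene.
    Returns an array (12, height, width); months without scenes stay NaN.
    """
    index_scenes = np.asarray(index_scenes, dtype=float)
    months = np.asarray(months)
    composite = np.full((12,) + index_scenes.shape[1:], np.nan)
    for month in range(1, 13):
        selected = index_scenes[months == month]
        if selected.shape[0] > 0:
            # nanmedian ignores the masked (NaN) observations of each pixel
            composite[month - 1] = np.nanmedian(selected, axis=0)
    return composite

# Toy example: three January scenes over a 1 x 2 pixel image, one masked pixel.
nir = np.array([[[0.5, 0.6]], [[0.4, 0.6]], [[0.3, np.nan]]])
red = np.array([[[0.1, 0.2]], [[0.1, 0.2]], [[0.1, np.nan]]])
scene_months = np.array([1, 1, 1])
composite = monthly_median_composite(ndvi(nir, red), scene_months)
print(composite[0])
```

Repeating this for every index and concatenating the monthly layers along the first axis would give one multi-layer image per pixel grid, which is the kind of stack a dimensionality reduction model takes as input.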

Dimensionality Reduction
We employed densely connected Autoencoders (AEs) and Convolutional Autoencoders (CAEs) for the decorrelation of high-dimensional vegetation indices. The model architectures utilized in this research are illustrated in Figure 2. The 192-dimensional stack of vegetation indices was fed into the AEs and CAEs models. The AEs were composed of three dense layers, whereas the CAEs were composed of three convolutional layers, with a fully connected (dense) layer used to collect the outputs from the final convolutional layer. Finally, low-dimensional latent vectors of multiple sizes (3, 5, and 10) were obtained from the final dense layer. For the self-supervised learning, we split the dataset into training (95%) and testing (5%) sets and tuned the parameters and hyper-parameters of the models, such as the learning rate, number of epochs, and batch size, through a repeated trial-and-error process.
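The encode-and-decode idea behind the dense AEs can be sketched as follows. This is a minimal illustration under stated assumptions, not the architecture of Figure 2: it uses NumPy only, synthetic data in place of the real 192-dimensional pixel vectors, hypothetical layer sizes (192-32-3-32-192), tanh activations, a mean squared reconstruction loss, and the Adam optimizer with arbitrary settings. Only the 192-dimensional input, the three-dimensional latent vector, and the 95%/5% split come from the text above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the pixel table: 400 pixels x 192 stacked index values,
# generated from 3 hidden factors so that a 3-D code can capture most structure.
n_pixels, n_features, n_hidden, n_latent = 400, 192, 32, 3
factors = rng.normal(size=(n_pixels, n_latent))
X = np.tanh(factors @ rng.normal(size=(n_latent, n_features)))
X += 0.05 * rng.normal(size=X.shape)

# 95% training / 5% testing split.
order = rng.permutation(n_pixels)
n_train = int(0.95 * n_pixels)
X_train, X_test = X[order[:n_train]], X[order[n_train:]]

def dense(n_in, n_out):
    """Weights and bias of one dense layer (scaled random initialization)."""
    return [rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out)), np.zeros(n_out)]

# Encoder 192 -> 32 -> 3 and decoder 3 -> 32 -> 192 (layer sizes are assumptions).
params = (dense(n_features, n_hidden) + dense(n_hidden, n_latent)
          + dense(n_latent, n_hidden) + dense(n_hidden, n_features))

def forward(X, p):
    W1, b1, W2, b2, W3, b3, W4, b4 = p
    h1 = np.tanh(X @ W1 + b1)
    z = h1 @ W2 + b2              # latent code (the compressed features)
    h2 = np.tanh(z @ W3 + b3)
    X_hat = h2 @ W4 + b4          # reconstruction of the input
    return h1, z, h2, X_hat

def gradients(X, p):
    """Backpropagation of the mean squared reconstruction error."""
    W1, b1, W2, b2, W3, b3, W4, b4 = p
    h1, z, h2, X_hat = forward(X, p)
    d_out = 2.0 * (X_hat - X) / X.size
    d_a3 = (d_out @ W4.T) * (1.0 - h2 ** 2)
    d_z = d_a3 @ W3.T
    d_a1 = (d_z @ W2.T) * (1.0 - h1 ** 2)
    return [X.T @ d_a1, d_a1.sum(0), h1.T @ d_z, d_z.sum(0),
            z.T @ d_a3, d_a3.sum(0), h2.T @ d_out, d_out.sum(0)]

def mse(X, p):
    return float(np.mean((forward(X, p)[3] - X) ** 2))

# Full-batch Adam updates; the input itself is the training target (self-supervision).
lr, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8
m = [np.zeros_like(w) for w in params]
v = [np.zeros_like(w) for w in params]
train_losses = [mse(X_train, params)]
for step in range(1, 301):
    for i, g in enumerate(gradients(X_train, params)):
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g ** 2
        m_hat = m[i] / (1 - beta1 ** step)
        v_hat = v[i] / (1 - beta2 ** step)
        params[i] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    train_losses.append(mse(X_train, params))

latent = forward(X, params)[1]    # one 3-D vector per pixel
print(train_losses[0], train_losses[-1], mse(X_test, params), latent.shape)
```

Applying the trained encoder to every pixel yields a three-band compressed image. A convolutional variant would replace the dense encoder layers with convolutions over image patches, which this sketch does not attempt.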

Quantitative Evaluation
The performance of the AEs- and CAEs-based compressed images for clustering and visualization was compared to the classical RFs-based retrieval of important features. The RFs algorithm has been employed as a classical approach for deriving variable importance [55]. The pixel values corresponding to the ground truth data (geolocation points) of each vegetation type were extracted from the compressed images (AEs, CAEs, and RFs) and utilized for the visualization and classification of vegetation types. We used 3D scatter plots to visualize the clusters of vegetation types and employed the RFs classifier for the classification of vegetation types.
Furthermore, the performance of the compressed images (AEs, CAEs, and RFs) in different dimensions (3, 5, and 10) for the classification of vegetation types was also assessed quantitatively. For the supervised classification, the Random Forests (RFs) classifier was trained on a 75% training set and validated on a 25% test set. For the quantitative evaluation, we computed the confidence interval by bootstrap resampling of the dataset 1000 times. The bootstrap resampling technique involves repeatedly drawing sample data with replacement from a data source and reduces bias in the estimated accuracy. The research procedure is illustrated in Figure 3.
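The bootstrap-based confidence interval can be sketched as below. The sketch rests on several assumptions that are not from the source: it uses NumPy only, synthetic three-class data in place of the compressed features, a nearest-centroid classifier as a lightweight stand-in for the Random Forests classifier, and one particular way of combining the 75%/25% split with resampling (a fresh random split per iteration, with each part resampled with replacement so that no point appears in both parts). The interval is the percentile interval of the collected test accuracies.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for compressed features: 3 classes, 3 features, 100 points each.
n_per_class, n_features = 100, 3
centers = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 1.0], [0.0, 2.0, -1.0]])
X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
y = np.repeat(np.arange(len(centers)), n_per_class)

def fit_centroids(X_train, y_train):
    """Stand-in classifier (NOT Random Forests): one mean vector per class."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(model, X_new):
    classes, centroids = model
    dist = np.linalg.norm(X_new[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]

def bootstrap_accuracies(X, y, n_boot=1000, train_fraction=0.75):
    """Test accuracy of n_boot models, each fitted on a resampled 75%/25% split."""
    n = len(y)
    n_train = int(train_fraction * n)
    accuracies = np.empty(n_boot)
    for b in range(n_boot):
        perm = rng.permutation(n)
        train_pool, test_pool = perm[:n_train], perm[n_train:]
        # Draw with replacement inside each part (assumed resampling scheme).
        train_idx = rng.choice(train_pool, size=train_pool.size, replace=True)
        test_idx = rng.choice(test_pool, size=test_pool.size, replace=True)
        model = fit_centroids(X[train_idx], y[train_idx])
        accuracies[b] = np.mean(predict(model, X[test_idx]) == y[test_idx])
    return accuracies

accuracies = bootstrap_accuracies(X, y)
lower, upper = np.percentile(accuracies, [2.5, 97.5])   # 95% percentile interval
print(f"95% interval of test accuracy: {lower:.3f} to {upper:.3f}")
```

A histogram of `accuracies` would correspond to a frequency plot of models by test accuracy, and swapping the stand-in classifier for a Random Forests implementation would not change the resampling logic.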

Clustering and Visualization
The discriminative ability of the lower-dimensional features can be visualized by plotting their distribution in a three-dimensional space. A three-dimensional scatter plot of the most important features retrieved by the RFs algorithm is shown in Figure 4. As seen in the figure, most of the clusters of different classes are close to each other. This indicates the shortcomings of the RFs-based important features in distinguishing most of the vegetation types.
An improvement in the clustering of vegetation types by the AEs-based compressed features over the RFs algorithm can be seen in Figure 5, where the clusters show wider inter-class variation.
Further improvement by the CAEs-based compressed features can be seen in Figure 6. The 3D cluster plot shows their ability to distinguish vegetation types that were not distinguished by the RFs-based important features.

Confidence Intervals
We employed the bootstrap resampling method to report the confidence interval of the CAEs-based classification approach. The bootstrap resampling was repeated 1000 times with 75% training and 25% testing data. The accuracy obtained with the test data was collected for each bootstrap resample, and the frequency of models yielding each test accuracy is plotted in Figure 8. We also computed the accuracy at a 95% confidence interval. The CAEs-based three features provided a test accuracy between 88.7% and 89.9% with a 95.0% confidence interval.
The distribution of feature importance obtained from bootstrap resampling of the CAEs-based three features is shown in Figure 9. For each bootstrap resample, the feature importance distribution showed a positive contribution of the features to the model.
Similarly, we calculated the test accuracy using ten features obtained from the CAEs, and the frequency of models yielding each test accuracy is plotted in Figure 10. The CAEs-based ten features provided a test accuracy between 95.0% and 96.2% with a 95.0% confidence interval. In addition, for each bootstrap resample, the feature importance distribution (10 features) showed a positive contribution of the features to the model (Figure 11).
Furthermore, we summarized the significance of the Autoencoders (AEs) and Convolutional Autoencoders (CAEs) over the Random Forests (RFs) by employing bootstrap resampling 1000 times with 75% training and 25% testing data. Table 3 shows the test accuracies computed with a 0.95 confidence interval. The test accuracies obtained from the bootstrap resampling also showed a higher performance of the AEs and CAEs over the RFs. Interestingly, the difference between them (RFs versus AEs or CAEs) started to decrease when the number of input features increased.
However, the main objective of this research was to compress the high-dimensional dataset into the fewest dimensions so as to visualize the inter-class variability of the vegetation types. Therefore, self-supervised learning with the Autoencoders (AEs) and Convolutional Autoencoders (CAEs) has met our objective of showing the inter-class variability of vegetation types at a lower dimension. The collection and preparation of ground truth data is very time-consuming and expensive for vegetation mapping projects. The ability of such self-supervised learning and visualization of satellite images should contribute to better interpretation and discrimination of vegetation types (for example, during the collection of ground truth data) as well as to subsequent supervised classification for the operational mapping of vegetation types at a broad scale.

Discussion
We implemented the Autoencoders (AEs) and Convolutional Autoencoders (CAEs)-based self-supervised learning approach for the decorrelation and dimensionality reduction of high-dimensional satellite-based features. Deep learning is a versatile technology specialized for big datasets. Once the high-dimensional features were compressed into lower-dimensional ones, we employed the Random Forests (RFs) classifier for the classification of vegetation types.
A significant processing challenge exists with the ever-increasing collection of huge volumes of remote sensing data with enhanced spatial and spectral resolution. To address this issue, dimensionality reduction techniques have been recommended for reducing the complexity of the data while retaining the information relevant to the analysis [56,57]. Dimensionality reduction of high-dimensional vegetation indices is therefore a relevant technique, given that a large number of vegetation indices exist in the literature.
Spectral vegetation indices have been used by many researchers for the clustering and classification of vegetation types. For example, Villoslada et al. [58] highlighted the need to utilize a wide array of vegetation indices for the improved classification of vegetation types in coastal wetlands. Similarly, Kobayashi et al. [59] utilized spectral indices calculated from a Sentinel-2 multispectral instrument for crop classification. Wang et al. [60] used Fourier transforms on multi-temporal vegetation indices for unsupervised clustering of crop types. These studies motivated us to conduct the clustering and classification of sixteen vegetation types (including non-vegetation classes) solely based on vegetation indices.
Previous studies have also attempted dimensionality reduction of remote sensing data for the classification and mapping of vegetation types. However, most of these studies employed classical dimensionality reduction techniques. For example, Alaibakhsh et al. [61] used principal component analysis (PCA) to delineate riparian vegetation from Landsat multi-temporal imagery. Similarly, Dadon et al. [62] used an improved PCA-based classification scheme to classify Mediterranean forest types in an unsupervised way. The t-distributed stochastic neighbor embedding (t-SNE) algorithm has been used to strengthen the quality of ground truth data used in the mapping of heterogeneous vegetation [63]. Some researchers have used self-organizing feature maps for the classification of crop types [64,65]. In this context, exploring the potential of deep, self-supervised learning approaches for the clustering and visualization of vegetation types is timely and important.

Conclusions
In this research, we showed that Autoencoders (AEs)-based self-supervised learning is a potential approach for the decorrelation and compression of high-dimensional vegetation indices in a cool temperate mountainous ecosystem in Japan. Compared to the classical Random Forests (RFs)-based dimensionality reduction method, the Autoencoders (AEs) and Convolutional Autoencoders (CAEs) showed superior performance in the clustering and classification of vegetation types. Given that the purpose of dimensionality reduction approaches is to represent the relevant information in the fewest dimensions, it is notable that the three-dimensional compression of vegetation indices using the CAEs method showed around a 9% increase in the confidence interval over the RFs. The RFs algorithm extracts the most important features out of the given features, whereas the AEs and CAEs generate compressed features through a self-supervised learning approach. Therefore, this research highlights the application of the CAEs method for the clustering and visualization of vegetation types. In the future, we will assess the efficiency of CAEs in other regions.