Identifying Spatiotemporal Patterns in Land Use and Cover Samples from Satellite Image Time Series

: The use of satellite image time series analysis and machine learning methods brings new opportunities and challenges for land use and cover changes (LUCC) mapping over large areas. One of these challenges is the need for samples that properly represent the high variability of land used and cover classes over large areas to train supervised machine learning methods and to produce accurate LUCC maps. This paper addresses this challenge and presents a method to identify spatiotemporal patterns in land use and cover samples to infer subclasses through the phenological and spectral information provided by satellite image time series. The proposed method uses self-organizing maps (SOMs) to reduce the data dimensionality creating primary clusters. From these primary clusters, it uses hierarchical clustering to create subclusters that recognize intra-class variability intrinsic to different regions and periods, mainly in large areas and multiple years. To show how the method works, we use MODIS image time series associated to samples of cropland and pasture classes over the Cerrado biome in Brazil. The results prove that the proposed method is suitable for identifying spatiotemporal patterns in land use and cover samples that can be used to infer subclasses, mainly for crop-types.


Introduction
The large amount of remote sensing images freely available nowadays with improved temporal and spatial resolution brings new opportunities for land use and cover mapping over large areas [1]. Many authors propose a paradigm shift, where change detection is replaced with continuous monitoring [2]. To achieve this goal, researchers need access to satellite image time series to detect complex underlying processes [3].
The use of machine learning methods has been the preferred approach for satellite image time series analysis to map land use and cover changes (LUCC) [1]. These methods are supervised approaches that require a training phase using samples labeled a priori. Comparative analysis of different machine learning methods shows that the quality of training samples has a large impact on classification accuracy [1,[4][5][6]. These results motivate our work, which aims to answer the following question: How can training samples be assessed to improve the accuracy of LUCC maps produced by machine learning classification methods that use them? Different approaches have been proposed to improve the quality of training samples [7]. Such strategies include best practices in collecting training data [7][8][9][10] and refinement methods of samples using satellite image time series [11][12][13]. Other approaches such as semi-supervised learning and active learning have been applied to support training sample acquisition [14][15][16][17][18]. However, most studies are limited to small areas and inter-annual extents [7], not being suitable for large areas due to the need for a large number of samples to characterize each class [19].
In large areas, the variability of land use and cover classes is high and intrinsic to different regions and periods due to heterogeneous biodiversity as well as distinct climatic conditions and management practices [20][21][22][23]. Therefore, it is essential to explore ways to obtain samples that properly represent high intra-class variability considering spatiotemporal variations in large areas and multiple years.
In many situations, experts use generic labels for training samples (e.g., "forest", "cropland", and "grassland"). In practice, the actual spatiotemporal variability of the time series data does not match such generic labels. For this reason, it is useful to distinguish subclasses of high-level labels that correspond to regions of separability on the attribute space. Based on this, this paper presents a method to identify subclasses in training samples of satellite image time series. The method distinguishes different types of land use and cover classes over large areas in a more detailed granularity than user-provided labels. Using phenological and spectral information provided by satellite images time series, the method refines the generic labels and improves the accuracy of resulting LUCC maps.
The proposed method is based on time series clustering. In general, time series clustering has been applied for exploratory analysis [24][25][26], characterization of spatiotemporal patterns [27][28][29][30][31], and to soften the lack of discriminated data of land use and cover types [18,32,33]. It is a promising approach to exploit spectral and phenological information and refine training samples to ensure an acceptable level of quality [11,13,18,22]. Spectral and phenological information can be considered to discriminate different types of land use and cover classes during the sample collecting and labeling, contributing to improving the quality of training data sets. However, time series clustering results can be difficult to interpret and visualize, especially when the training data have a high dimension [34].
Our algorithm uses self-organizing maps (SOMs) [35] combined with a hierarchical algorithm [36,37]. SOMs have been applied in spatiotemporal data analysis [28,29,31,[38][39][40][41][42] mainly due to two properties. SOMs map high-dimensional data into a low bi-dimensional grid, representing the input data in low dimension. They also preserve neighborhood information; similar patterns in attribute space tend to stay close in the 2D SOM space. After mapping the samples from the high-dimensional attribute space to the 2D SOM space, we apply a hierarchical algorithm to the SOM clusters. The resulting subclusters refine the original training data into subclasses. These subclasses have lower intra-class variability and higher inter-class variability than the original SOM clusters. Given the SOM proprieties, they provide refinement of the original training samples.
As a proof of concept, we present a case study using Moderate Resolution Imaging Spectroradiometer (MODIS) image time series of 7 years (2010-2017) associated with samples of cropland and pasture classes over the Cerrado biome in Brazil. The results show that the proposed method is suitable for identifying spatiotemporal patterns in land use and cover samples that can be used to infer subclasses mainly for crop-types.

Data
The case study presented in this paper uses samples of the Cerrado biome in Brazil. Cerrado is a large and dynamic landscape with an area of approximately 204 million hectares (Mha), covering almost 22% of the central area in Brazil. It is one of the largest and most diverse tropical savannas in the world [43,44]. However, half of the biome has been changed and deforested due to advanced agricultural production and livestock [45].
The dataset is a merge of samples collected by remote sensing specialists through visual interpretation of high-resolution images, samples collected by the specialist in field observations, and farmer interviews provided by the Brazilian National Institute for Space Research (INPE) team partners. Figure 1 illustrates the dataset used in our case study. The dataset includes 15,794 samples spread over the Cerrado biome from 2010 to 2017 and is divided into two classes: (1) cropland and (2) pasture. Most of the cropland samples are in the same locations associated with different years.
Each sample has a spatial location, a period containing the start date and end date according to an agricultural calendar (from August to September), a label describing the sample class, and the satellite image time series for each attribute (bands and vegetation indexes) associated with it. The time series were extracted from the product MOD13Q1 (Collection 6) of the MODIS sensor. The images were collected on an interval of 16 days with 250 m spatial resolution. For this dataset, we used two vegetation indices and two bands available in MOD13Q1: the Normalized Difference Vegetation Index (NDVI) and the Enhanced Vegetation Index (EVI), and near-infrared (NIR) and and mid-infrared (MIR).
The MOD13Q1 product is created by considering the best available pixel from all acquisitions of MODIS images in a 16-day period. This approach selects the best pixel available in 16 days, avoiding cloud, shadow, or low-quality pixels. We obtained the time series from the MOD13Q1 product without performing further processing.
The Cerrado biome has distinct soil and weather over its area. Thus, it has great and heterogeneous types of land use and cover classes. However, it is not trivial to separate these types using 250 m and 16-day MOD13Q1 data. For this reason, we decided to use only the cropland and pasture classes to show how the proposed method works.

Overview
Clustering methods are useful to extract spatiotemporal information from satellite image time series samples [24]. There are methods of clustering that use the approach of spatial clustering objects and spatial and temporal objects. Although our time series have a geo-location, we used only the time series points as a similarity measure to identify the clusters [46,47]. Then, the geo-localization and period of each time series were used to explore the spatiotemporal patterns. Figure 2 illustrates the proposed method to identify and analyze spatiotemporal patterns. Given a dataset of satellite image time series samples, we applied SOMs to group similar time series. SOMs map the input data (high dimensionality) onto a bi-dimensional grid (low dimensionality), keeping the topology of the neighborhood; that is, a similar time series tends to be close in 2D space [35]. The SOM grid contains units called neurons, where each neuron has a weight vector associated with it. At the end of the SOM process, each time series from the input dataset was assigned to a neuron. In this way, from the output of SOM, the clusters can be recognized and created. Once the class label of each sample was known, each neuron was labeled using the majority technique. To explore each cluster, the neurons with the same category were selected. The hierarchical method [48] was then applied to the weight vectors of these neurons to extract subclusters. Due to the high intra-class variability, the hierarchical method, as secondary clustering, was applied to the neurons' weight vectors with the same category. The use of hierarchical clustering allows visualization through dendrogram, which can be cut in a specific height to define the number of subclusters. Our method suggests the number of the cluster through an inter-cluster validation index. However, specialists can apply different numbers of groups and explore them according to their necessities. The subclusters were grouped through the time series similarity. However, since all samples have the spatial location and the period containing the start date and end date, these subclusters can indicate patterns in space or time.

Self-Organizing Map
SOM is an unsupervised learning method based on competitive learning that reduces a high-dimensional feature input space onto a lower-dimensional feature output space. A large dataset can be mapped and represented by a set of neurons through the weight vectors. An important characteristic of SOM is preserving the neighborhood topology; thus, similar input data are mapped to the same neuron or a nearby one.
Each neuron j of the output space has a weight vector w j = [w j1 , . . . , w jn ] containing the same dimension n of the input data x(t) = [x(t) 1 , . . . , x(t) n ]. The weight vectors can be initialized randomly or according to some heuristic. The algorithm has two main steps. First, the distances D j between a sample and each neuron of the SOM grid is computed. The neuron containing the smallest value of the distance d b is selected as the best matching unit (BMU) for this sample. The equations to compute the distance and BMU are given by (1) The second step is the adjustment of the weight vector of the BMU and its neighbors so that the neurons have similar characteristics. The equation of adjustment is given by where the parameters α is the learning rate and h b,j is the neighborhood function. The learning must be set as 0 < α < 1, and it controls the change level of the BMU and its neighbors during the training step. The neighborhood function limits the size of the BMU's neighborhood that must be updated. The diversity of neighborhood functions can be seen in [35,49]. These steps are performed T times for neurons to organize themselves to have a similar neighborhood between them. Then, each sample from the dataset is assigned to a neuron.
The use of a secondary clustering directly on the SOM grid is not well suited in our case. It can generate confusion between the classes because of the high variability of patterns in a specific class; therefore, the neurons were labeled previously. For this reason, we labeled the neurons before splitting the clusters through hierarchical clustering. According to Kohonen [50], when an input dataset has a specific number of classes, we can assume that each neuron belongs to one of these classes. The neurons were categorized according to the majority class label that occurs in each one. In cases of the tiebreaker, the neuron received the label of the majority neighborhood.
Kohonen et al. [51] argue that when the neurons are labeled according to the majority, the quality of the organization of the final map can be measured through the external criteria of clustering validation. In our approach, we consider a cluster as a set of neurons labeled with the same category. We use purity as the external criteria of cluster validation. Purity assesses the clusters according to the number of the most representative class assigned to them.

Hierarchical Clustering
Once all the neurons were labeled, we applied the hierarchical clustering on the weight vectors of neurons with the same category. This aids in extracting useful information, analyzing the spatiotemporal dynamics, and avoiding confusion between different classes' samples.
Hierarchical clustering is a method where the data are partitioned successively, building a hierarchy of clusters [37]. This type of representation facilitates the visualization in each step where the level similarity occurs. There are two types of hierarchical clustering, agglomerative and divisive. In the divisive algorithm, the entire dataset starts in one cluster, and then it is split into two more similar clusters. In the agglomerative method, as illustrated in Figure 2 (step 4), each weight vector (w 2 , w 3 , w 4 , w 6 , w 7 , w 8 ) starts in its own cluster, a similarity matrix is computed, and the two most similar groups are identified. At each step, the clusters are merged, and the hierarchy is built based on linkage criteria.
The linkage criteria present the distance measures between the clusters. There are several linkage criteria proposed in the literature. The most common methods are single, complete, and average linkage [36]. This paper uses the Euclidean distance and the average linkage to build the hierarchical clustering. The average linkage computes the mean distance from each element of a cluster to all the elements of the other cluster [36,37].
The hierarchical algorithms build a binary tree called a dendrogram. This structure represents the order of how the clusters were merged. A dendrogram divides the data into an internally homogeneous group. Through the hierarchy of the tree, it is possible to visualize the variability of the data. The dendrogram aims to explore and define the suitable number of clusters according to the analysis level. Figure 2 (step 5) illustrates a dendrogram and how the cluster can be defined. The dendrogram was cut at the height where three clusters were created. In our method, the internal cluster validity, the C-Index [52] criteria, were applied to define the number of cluster for each class from the dendrogram. It is defined by: where Sd is the summed distance between the neurons within the same cluster, Sd min is the sum of the smallest distance between the pair of objects within the same cluster, and Sd max is the sum of the largest distance. Finally, the new subclusters of class Z can be created, as shown in Figure 2 (step 6). The weights w 3 , w 4 and w 5 were merged in cluster 1, w 6 and w 7 in cluster 2, and w 8 in cluster 3.

Clustering Output
After the clustering process, an interpretation and comparison of the clusters are necessary to identify the data's features. Generally, the amount of data used in remote sensing analysis is large; therefore, interpreting and analyzing the data in an aggregated way can facilitate the analysis. In general, our method performs the computational processing that allows the summarization of the input data to facilitate searching for knowledge, information, and pattern discovery. Besides the spatial location and time available in each dataset sample, the clustering result provides information that is useful in further analysis. After obtaining the identifiers for each subcluster and the information of the neurons associated with it, a specialist can identify, in an easy way, patterns that need to be interpreted. Moreover, to facilitate the analyses, interactive visual tools can process the output information generated by the clustering methods and jointly with the input data that contain the spatial location and period, providing spatial and temporal information. Figure 3 illustrates an example of output after the clustering methods. For each sample (presented in Section 2.1), the result of clustering methods provides identifiers of neurons and subclusters and their labels generated by SOM and hierarchical clustering, respectively. Three samples of class Z were assigned to different neurons. However, these neurons were labeled as class Z because of the majority class. Although these samples belong to the same category's neurons, we can notice in neurons 1, 4, and 13 distinct patterns through the weight vector. When the hierarchical clustering was applied to the weight vectors of class Z, these samples were assigned to three different subclusters. Furthermore, through the SOM grid neighborhood, we notice that neuron 4 is an outlier within the neurons of the subcluster of class Z. However, it is necessary to understand whether this neuron is an error or represents a different pattern, e.g., in space or time, within the class Z. To evaluate the relative performance of the original and refined training data sets, we used 5-fold cross-validation [53]. We split the training dataset into training and test sample sets to avoid overfitting and biased data [53,54]. Using the 5-fold approach, we estimated the classification accuracy (overall, producer, and user accuracy). We chose the random forest classifier due to its robustness on land use and cover mapping [5,55] to evaluate the training datasets generated from the cluster analysis.

Results
This section presents a case study to show how the method presented in this paper works. Figure 4 presents the SOM results for the vegetation indices and spectral bands used in the training step. The parameters used in SOM to generate the maps were Euclidean distance as similarity metric, grid size = 14 × 14, learning rate initial = 0.5 and final = 0.1, number of iteration = 100 and the neurons were labeled according to majority voting. The vegetation indices and spectral bands were chosen based on the study conducted by Santos et al. [56]. While the input parameters of SOM were defined based on empirical experiments, the start points, such as the size grid and learning rate, were suggested by Vesanto and Alhoniemi [57]. Figure 4 illustrates the primary clustering generated by SOM. The 15,794 samples, shown in Section 2.1, are represented by a set of 196 neurons, where 139 neurons represent the cluster of cropland, and the cluster of pasture is represented by 57 neurons. Only by analyzing the grid generated by SOM do we notice the high variability of the patterns, mainly in cropland neurons. Moreover, the single and double cropping groups can be identified in the cropland neurons, and some neurons of pasture are considerably similar to neurons of cropland with single cropping types. The similarity between these neurons can be investigated by an expert in more detail considering the spatial and temporal information provided by the samples. Some hypotheses can also be created, indicating whether these samples are separable or not due to noise, i.e., clouds, type of sensor used (spatial resolution, temporal resolution), or mislabeled samples.  Figure 5 shows how the samples are spread over the SOM grid, identifying whether they are nearest or farthest of their neighborhood. Although the neurons are labeled as a specific class, they are not 100% pure. Samples of different types were associated with neurons for which the majority belong to other classes. Overall, the purity for cropland and pasture is, respectively, 99.6% and 99.4%. Most of the confusion occurs near the class boundary. We can identify where these samples were mapped through the SOM grid. For instance, the cropland neurons with single cropping patterns are neighbors of the pasture neurons. In addition, the grids show precisely where the cropland samples were mapped in neurons labeled as pasture and vice-versa.

Revealing the Patterns of Cropland
The hierarchical clustering was applied to 139 weight vectors of cropland. Then, the C-Index suggested 10 clusters for cropland, as illustrated in Figure 6.   Figure 7a illustrates the cropland's neurons partitioned into ten groups in the SOM grid. Note that the neighborhood within each subcluster tends to be similar. Each neuron contains a set of samples, and they are represented by a weight vector. Figure 7b illustrates the neurons weight vectors for each subcluster. From the patterns generated by the subclusters and geographical location provided by the samples, an initial analysis that experts conducted to infer subclasses for Cropland suggests the following: 1.
Cropland 1 represents samples of soy-fallow. These samples are mapped only in the state of Bahia. This region is known due to the mostly single cropping regimes [58]. Some samples are spread over Goiás and Tocantins states; however, they are originally labeled as pasture.

2.
Cropland 2 represents samples of fallow-cotton. This type of crop is mapped in the states of Mato Grosso and Bahia. The patterns of this group are well defined.

3.
Croplands 3,4, and 8 represent samples of soy-cotton. In this study, this type of crop is mapped only in the state of Mato Grosso. Through the temporal patterns (Figure 7b), we can notice small variations, particularly during the first cycle (soy crop). This difference may be due to the soybean variety. The soybean varieties planted in Brazil can be of early, medium, and late maturity. The average cycle ranges from 99 to 128 days [59].

4.
Cropland 5 represents samples of millet-cotton. In this study, this type of crop is mapped only in the state of Mato Grosso.

5.
Croplands 7, 9, and 10 represent samples of soy-corn. This type of crop is spread over the Cerrado biome. The variability of this class can be noticed through the temporal patterns extracted by SOM. It occurs due to the climatological and soil variations leading to differences in each area's agricultural calendar. 6.
Cropland 6 is not well defined. Notice in the SOM grid (Figure 7a) that most of the neighbors of Cropland 6 belong to other groups. The temporal signatures can be confounded between soy and millet during the first cycle due to noise, likely caused by clouds. It is necessary to look at these samples in more detail. In contrast, in the second cycle, we can notice patterns of cotton and corn. Additionally, this cluster contains samples initially labeled as pasture.  Some examples are presented throughout this section, detailing how knowledge extraction can be obtained from the output clustering methods. Figure 9 presents the temporal variability between the patterns of a single cropping soy-fallow. All the crop samples from Cropland 1 are mapped only in the west of Bahia state, a region with tradition in planting the single cropping system [58] (Figure 9b). Figure 9 shows the temporal dynamics of a point (−12.8875, −45.8769) named single cropping over four harvests from 2010 to 2014. According to the agricultural calendars of the National Supply Company (CONAB) and the Brazilian Institute of Geography and Statistics (IBGE), soybean planting in this region varies from October to January. For rained systems, soybean planting is linked to the rainy season [60]. Therefore, the planting season varies from year to year, and this can be seen in Figure 9c. It can be observed that the period in which the crop was planted in the soil also varied, and this may be an indication of the variety of soybean that was planted in this region (with early, medium, or late maturation).
In Figure 10, we highlighted different neurons (Figure 10b) that belong to these clusters with distinct spatial location, agricultural calendar, and patterns affected by clouds (Figure 10c) to illustrate the variability between the patterns of soy-corn. In neuron 189, some samples belong to the states of Tocantins and Maranhão (7 • S). The first cycle of these samples starts later than in Mato Grosso's state (neurons 106 and 195). This occurs because these regions have a different agricultural calendar. Despite the pattern of neuron 189 indicating a later beginning of cycle compared to samples in Mato Grosso, it also presents samples located in Mato Grosso. However, it likely happened because of the noise caused by clouds in Mato Grosso's sample, causing a pattern similar to that observed in Maranhão. Figure 10b,d present the weight vectors and the NDVI time series samples associated with each neuron. In addition to the differences at the beginning of the first cycle, the biggest difference is in the cycle's harvest period. The samples assigned to neuron 189 were harvested late. This indicates that the soybean planted there has worse variety than those planted in samples assigned neurons 106 and 195. In the second cycle, we observe the opposite behavior. Most of the corn samples attributed to neuron 189 are shorter than in neurons 106 and 195. This may have occurred because most samples of neuron 12 are located in western Bahia, Tocantins, and Maranhão states, where corn crops are shorter (early maturing variety) than in the state of Mato Grosso. Moreover, the corn crop in these regions was affected due to the negative weather conditions in 2017.  Figure 11 illustrates in more detail the cluster of Cropland 6. This cluster has three neurons; however, the weight vectors (see Figure 11b) and the time series assigned to each neuron (see Figure 11d) indicate that neuron 103 contains samples of soy-corn. In contrast, neurons 61 and 75 contain samples of soy-cotton. Neurons 61 and 75 are distant from the clusters that represent soy-cotton (Cropland 3,4, and 8) and near the neurons of pasture and soy-corn and millet-cotton (Figure 11a). This occurs because of the peaks caused by clouds that impact the soybean patterns, causing a decrease in the pattern's average values. This can lead to confusion with millet. Moreover, this confusion can occur mainly when pasture is planted (notice in Figure 11c, samples of pasture mapped in neurons 61 and 75), since its planting period is between September and March, often coinciding with millet's planting period. Furthermore, millet is often used as pasture in the crop-livestock integration system [61].

Revealing the Patterns of Pasture
In the SOM grid, there are 57 neurons classified as pasture. However, the samples of pasture do not have a high variability like those of cropland. We separated the dendrogram ( Figure 12) into two clusters, as suggested by the C-index. Although the C-Index suggests only two clusters, small variations can be found in cluster 2. However, an expert can explore these variations if necessary.  Figure 13 illustrates an overview of the pasture cluster. In Pasture 1, the weight vectors ( Figure 13b) present patterns that can be confounded with single cropping. This type of confusion may occur due to October to April belonging to the rainy season [62], and, as such, the spectral profiles in these months are low due to the noise of clouds and rain. Furthermore, it can be observed in the SOM grid in Figure 13a that the neighbors of the neurons labeled as Pasture 1 are classified as cropland.
The group of Pasture 2, on average, presents similar spectral patterns. Furthermore, this cluster has proportionally less confusion with cropland. In addition, there are some patterns with the lowest spectral values, e.g., NDVI values below 0.5. NDVI is an index related to biophysical variables that control vegetation productivity, such as net primary productivity and the leaf area index [63]. Therefore, there is a probability of these samples representing areas with lower productivity. Figure 14 illustrates the cropland samples mapped in Pasture clusters. These samples belong to the states of Mato Grosso and Bahia. The NDVI time series of the cropland samples assigned to Pasture 1 present a high incidence of clouds from September to February. Due to these peaks, it is not easy to distinguish which specific type of crop these samples belong to or if they really are pasture. It would be necessary to check each region's agricultural calendar and their respective years or high-resolution images in detail. The NDVI time series of cropland's samples mapped in Pasture 2 are noisy, and their temporal signatures are similar to those of pasture. These samples may have been affected by noise, or they may be mislabeled. In addition, in the work of [64], the authors presented a pattern of 370 samples of pasture in the state of Mato Grosso, also using time series of the product MOD13Q1, where it is also possible to observe much noise caused by clouds, similar to the patterns presented in the samples labeled as cropland that were assigned in Pasture 2.

Assessing the Performance of the Training Samples
After the cluster analysis, we evaluated the performance of two datasets generated by the clusters. First, we used the entire dataset, on which the samples mapped in pasture neurons maintain the cropland label. Second, we removed from the dataset samples of cropland that were mapped in pasture's clusters. Additionally, we divided Cropland 6 into soy-corn and soy-cotton (see Figure 11). Table 1 presents the number of samples associated with each cluster and the classes that were assigned to each one. Tables 2 and 3 present the confusion matrix for the samples relabeled according to the analysis provided by the clusters. For the entire dataset, the overall accuracy was 97% (Table 2), and for the filtered dataset, the overall was 98% (Table 3). Although the producer's accuracy to cropland, presented in Table 2, is low, few samples were classified as pasture. Both confusion matrices indicate the majority confusion among soy-corn, soycotton, and millet-cotton, as highlighted in Cropland 6 analysis. Soy-cotton and soy-fallow mix together; this can happen because their first cycle is the same, and small variations during the second cycle can affect the separability between them. The producer's accuracy regarding millet-cotton is also low, probably because the noise caused by clouds during the first cycle of soybean makes the separability between soybean and millet difficult. The confusions between fallow-cotton and soy-cotton can occur because of some land variation during the fallow cycle, and the same occurs to confusions between soy-fallow and soy-corn. Although we removed cropland samples mapped in pasture neurons (Table  3), the confusion between Pasture 2 and the double cropping types still occurs due to clouds' noise affecting the separability between them.  In addition, we evaluated the original dataset without the relabeled samples, as shown in Table 4. As expected, the training data performance was much better with nondiscriminated data. Despite the confusion between crop and pasture, the overall accuracy was 99%. When the labels were specified, the confusion was spread among the same types. However, we can see where the most significant confusion occurs, such as in Pasture 2 and double cropping and Pasture 1 and single cropping.

Discussions
In this study, we show how the proposed method can be applied. The aim is to assess the clusters provided by SOM combined with hierarchical clustering to analyze the possibility of generating subclasses. The method suggests ten subclusters for cropland and two subclusters for pasture. We relabeled the samples that were initially labeled as cropland into five subclasses according to the spatial, temporal, and phenological information. Besides the subclasses, it also possible to identify mislabeled or noisy samples in the training dataset, contributing to dataset enhancement. For instance, in Cropland 1, there are some samples of pasture. In this case, an analysis by experts could be conducted to verify why these samples are not well separated. This can be achieved using phenological metrics, checking high-resolution images or maps of agriculture and pasture. This analysis is essential to verify if the samples are not separable due to the sensors' spatial and temporal resolution, noise such as clouds, or mislabeling.
The output table provided by the clustering methods, illustrated in Figure 3, allows quantifying the samples' information, such as the number of pasture samples mapped in neurons labeled as Cropland 1. As shown in Table 4, the original dataset presents confusion between pasture and cropland. However, when the subclasses are inferred through the subclusters, the reason for this type of confusion becomes clearer. In our dataset, there is confusion between samples with single cropping and pasture. This occurs due to the similarity between their temporal patterns. This can be observed in further detail in Cropland 2 ( Figure 7b) and Pasture 1 (Figure 13b). Since the method indicates exactly which samples are being confounded and their information, such as spatial location (longitude and latitude) and time, an expert can check through high-resolution or LUCC maps to identify the real label. Thus, future errors can be avoided in the classification.
In general, the samples' patterns presented in the results section help significantly to distinguish the types of agriculture and pasture. Nevertheless, it is important to note that small variations within the subclusters can still be observed in Cropland 1. It is possible to note the different temporal patterns in each neuron of Cropland 1 and consequently the samples assigned to them, indicating variation in soybean maturity. This type of variation can be captured through the hierarchical clustering using the MODIS sensor if the number of subclusters provided by the dendrogram is higher than 10. Once we have 139 neurons to represent cropland samples, the number of subclusters provided by hierarchical clustering can be 139 if the dendrogram presented in Figure 6 is cut when the height = 0. In our case study, the C-index was applied to define the number of subclusters for cropland, but it could be defined manually according to the final user's real goal.
In contrast to soy-fallow, soy-corn and soy-cotton were distributed among different groups. The number of soy-corn and soy-fallow samples caused an unbalance in the dataset. For this reason, the differences between the soy-corn and soy-fallow neurons are quite small. The most significant differences are attributed to more distant neurons in the SOM grid. In this way, considering the cut in the dendrogram, the significant intra-class variations, such as spatial location, and agriculture calendar, are indicated in different subclusters. In addition, phenological metrics could be used as thresholds to distinguish the clusters and capture crop variations.
Although the neurons present intra-class variations, when identifying soybean variability until crop separability, it is important to highlight that the sensor's choice may present different results to that shown in this paper due to the spatial and temporal resolution. In addition, the choice of the similarity measure used in the clustering methods can also omit or highlight these variations [65].

Conclusions
There is a lack of high-quality training samples in the remote sensing field, mainly regarding change monitoring in large areas and multiples years due to the high intra-class variability of land use and cover types. A way to improve sample acquisition tasks is through temporal patterns. The use of satellite image time series provides phenological and spectral information that can be considered during the collection of samples or labeling process. However, for large datasets, it can be inadequate and complicated for humans analysis, for this reason support from computation methods is necessary.
The focus of this paper was to present the method and show how it can be applied. The proposed method combined SOM with hierarchical clustering to identify spatiotemporal patterns through exploratory analysis in training sample datasets. The method works well to infer subclasses using MODIS images of 250 m and 16 days, mainly for cropland classes presented in this dataset. From the subclusters, we were able to identify mislabeled samples and refine the generic labels to infer subclasses. The method is not free of generating uncertain labels; however, the phenological, spectro-temporal, and spatial information provided by the satellite image time series samples identified through the subclusters patterns assists the experts during the labeling process.
We explored Cropland and Pasture samples over the Cerrado biome in Brazil using MODIS because of their high variability in different regions and years. An initiative called Brazil Data Cube [66] is producing Analysis-Ready Data (ARD) and multidimensional data cubes from medium-resolution satellite images for all Brazilian territory. Using these data cubes, we intend to apply the proposed method to 10-20 meters and five days Sentinel 2 image time series in order to separate all distinct types of land use and cover classes in Cerrado.

Data and Code Availability
The method was implemented in the R package. The first step of SOM was implemented on package sits (Satellite Image Time Series), available on GitHub at https: //github.com/e-sensing/sits (accessed on 30 December 2020). The data set and the code used in this manuscript to generate the results presented in Section 3 is provided under the GNU General Public License v3.0 and is available on GitHub at https://github.com/ lorenalves/ST_patterns_time_series (accessed on 30 December 2020).

Conflicts of Interest:
The authors declare no conflict of interest.