Automatic Crop Classiﬁcation in Northeastern China by Improved Nonlinear Dimensionality Reduction for Satellite Image Time Series

: Accurate and timely information on the spatial distribution of crops is of great signiﬁcance to precision agriculture and food security. Many cropland mapping methods using satellite image time series are based on expert knowledge to extract phenological features to identify crops. It is still a challenge to automatically obtain meaningful features from time-series data for crop classiﬁcation. In this study, we developed an automated method based on satellite image time series to map the spatial distribution of three major crops including maize, rice, and soybean in northeastern China. The core method used is the nonlinear dimensionality reduction technique. However, the existing nonlinear dimensionality reduction technique cannot handle missing data, and it is not designed for subsequent classiﬁcation tasks. Therefore, the nonlinear dimensionality reduction algorithm Landmark–Isometric feature mapping (L–ISOMAP) is improved. The advantage of the improved L–ISOMAP is that it does not need to reconstruct time series for missing data, and it can automatically obtain meaningful featured metrics for classiﬁcation. The improved L–ISOMAP was applied to Landsat 8 full-band time-series data during the crop-growing season in the three northeastern provinces of China; then, the dimensionality reduction bands were inputted into a random forest classiﬁer to complete a crop distribution map. The results show that the area of crops mapped is consistent with o ﬃ cial statistics. The 2015 crop distribution map was evaluated through the collected reference dataset, and the overall classiﬁcation accuracy and Kappa index were 83.68% and 0.7519, respectively. The geographical characteristics of major crops in three provinces in northeast China were analyzed. This study demonstrated that the improved L–ISOMAP method can be used to automatically extract features for crop classiﬁcation. For future work, there is great potential for applying automatic mapping algorithms to other data or classiﬁcation tasks.


Introduction
A spatial distribution map of crops provides fundamental information for agriculture-related research, such as crop growth monitoring, yield estimation, planting structure optimization, etc., which are of great significance for national food security [1][2][3].The conventional methods of crop statistics or monitoring are not only labor-intensive and time-consuming, but they are also not precise enough.Nowadays, remote sensing has become the main means to obtain the spatial distribution map of crops at the regional or global scale [4][5][6].However, using remote sensing technology to obtain the accurate and timely spatial distribution map is also a challenge, because many crop types have similar spectral characteristics at specific phenological stages.This is especially obvious on multi-spectral images with a limited number of spectral bands [7,8].If only a single multi-spectral image within an inappropriate time is used, the accuracy of mapping may be severely limited [9][10][11][12].
To overcome this shortcoming, a sequence of satellite images taken from different dates throughout the growing season for crops have been used in many studies, namely satellite image time series (SITS) [13][14][15][16][17][18].The possibility of crops being successfully classified will increase by using SITS because they may exhibit different temporal characteristics at distinct phenological stages, which will translate into spectral separability to facilitate their discrimination on the images.In multi-temporal remote sensing, research studies based on vegetation index (VI) time-series data are impressive [19,20].Some studies established decision tree models that directly used original VI values from different periods of the growing season [21][22][23].This method is effective for vegetation types with unique temporal characteristics.However, some useful information in time series may be ignored because the ordered relationship of different temporal observation was not explicitly considered [24].A more prevalent method for processing VI time series is to extract feature metrics from the time series and then input them into a supervised classifier [25].Commonly used feature metrics include phenological and related statistical features, such as the maximum or minimum value of VI, as well as the quantile, mean, standard deviation, slope, and gradient of VI time-series curves [26][27][28].Compared to using original VI values, this may improve accuracy in classification tasks [29,30].Predefined mathematical models as more complex temporal feature extractors have been broadly adopted in vegetation phenology research and SITS classification [31].VI time series were described as a function or a batch of functions such as Savitzky-Golay filter [32,33], Kalman Filter [34], Fourier transform [35,36], wavelet transform [37,38], spline fitting [39,40], linear regression [41,42], Hidden Markov Model [43,44], or a merger of function with various forms [45].
It is worth noting that VI is often developed by two spectral bands, which means that the remaining bands in SITS are not utilized.It can be reasonably supposed that if all spectral bands in SITS are used as the feature set instead of VI, better classification accuracy can be achieved [46,47].This is because the remaining bands may also have obvious responses to vegetation dynamics.For example, short-wave infrared (SWIR) bands have been verified to be sensitive to vegetation moisture [48,49].The use of full-band SITS increases the possibility of discriminating different crop types [50].Nonetheless, the resulting high-dimensional SITS dataset also brings challenges to data processing and classification.In hyperspectral remote sensing, machine learning methods show good performance in high-dimensional data processing, especially when the number of training samples is limited [51,52].SITS have a very similar data structure to hyperspectral images, and they contain a large number of related spectral bands.Obviously, the similarity of the data structure allows machine learning techniques used in hyperspectral images to also be used for SITS [53].Dimensionality reduction (DR) is a machine learning method that is widely used to extract spectral features from hyperspectral images.There is great potential for applying this method to automatically extract spectral-temporal features from SITS [54].
DR is a data compression technique that projects high-dimensional data onto low-dimensional space.It can reduce noise while retaining as much useful information as possible [55,56].DR can be divided into linear and nonlinear methods.Linear DR methods may not be suitable for optical SITS because multiple scattering occurs between different scene components during the imaging process, so that the contribution of each scene component is not linearly proportional to its surface area.This makes optical remote sensing inherently show nonlinear characteristics [57].For SITS, the reflectance contribution of most scene components will change temporally, which further enhances its nonlinearity.
At present, only a few studies have applied nonlinear DR methods to SITS.Yan and Roy (2015) used the nonlinear DR algorithm Laplacian Eigenmap (LE) for time-series land-cover classification and achieved higher accuracy than the traditional metrics method.Zhai et al. (2018) also proposed a time-series land-cover classification method that integrates spectral-temporal and spatial features obtained from the LE algorithm.The above methods are only tested on the SITS of a single scene, and it may have problems when used in large-scale areas.This is because SITS frequently suffer from missing data due to cloud and snow coverage, and nonlinear DR algorithms have the limitation that they cannot directly handle missing data.It is necessary to reconstruct the missing data for SITS before completing classification [58].However, reconstructing the missing data does not increase the real information, but it brings uncertainty to the subsequent classification because of prediction errors.Therefore, in order to obtain true results from DR, a more reasonable way should be to delete the missing data directly while retaining all remaining data from SITS before DR.This requires a nonlinear DR method that can handle SITS with unequal lengths in the same periods.
In this paper, an automated crop classification method based on SITS is established.A nonlinear DR method that is commonly used for hyperspectral data has been improved for application to SITS with missing observations.The methodology is applicable to any SITS but takes Landsat data as the example in this study.The study area and the data used are first described in Section 2.Then, the automatic method for crop classification is proposed in Section 3. The isometric feature mapping (ISOMAP) nonlinear DR method, and how to improve it to deal with missing data in SITS, are also introduced in Section 3. Section 4 shows the result of applying the proposed method to three provinces in northeastern China for automatic crop classification.The performance of the improved DR method, and the effect of different proportions of training data on DR and classification, are also analyzed in this section.Section 5 discusses the reliability and uncertainty of applying the proposed method to crop classification in large-scale areas.Finally, Section 6 is the conclusion.

Study Area
Northeastern China consists of Heilongjiang, Jilin, and Liaoning Provinces, which have a total area of 787,100 km 2 , accounting for 8.2% of the land area in China.Northeastern China is located in the high latitude of China (38 • 43 N-53 • 33 N, 118 • 53 -135 • 05 ), and it has a temperate monsoon climate.Summers are short and warm.Winters are long and cold.The main crops are planted once per year, including soybean, maize, and rice.The main crop growth season is from April to September with mean temperatures of 14°C to 19°C and mean precipitation of 400-700 mm.The study area is the core area of "Golden Maize Belt" in China.The Golden Maize Belt of China is at the same latitude as the American maize belt and the Ukrainian maize belt.The three maize belts together are called the "Three Major Golden Maize Belts".The geographical position of the study area is shown in Figure 1.

Landsat Data
In this study, Landsat 8 OLI (Operational Land Imager) Level 1 terrain-corrected (L1T) products during the crop growth season in 2015 were used.All the data were acquired from the USGS website (http://earthexplorer.usgs.gov/).Sixty-eight scenes with different paths/rows of Landsat 8 data are needed for covering three provinces of northeastern China.Each scene corresponds to 11 or 12 different temporal images during the crop-growing season from April 1 to September 30, with a total of 775 images for this study.For each image, we used the Landsat 8 OLI bands with 30 m spatial resolution, which included band 1 (coastal), 2 (blue), 3 (green), 4 (red), 5 (nir), 6 (swir 1), 7 (swir 2), and two cloud mask files.Since it is sensitive to water absorption, band 9 (cirrus) was not used [59].The path/row of Landsat data was used, and the acquisition date and cloud coverage of all temporal images for each path/row are shown in Figure 2. The raw DN values were converted into surface reflectance using an atmospheric correction tool from the Landsat Ecosystem Disturbance Adaptive Processing System [60].

Landsat Data
In this study, Landsat 8 OLI (Operational Land Imager) Level 1 terrain-corrected (L1T) products during the crop growth season in 2015 were used.All the data were acquired from the USGS website (http://earthexplorer.usgs.gov/).Sixty-eight scenes with different paths/rows of Landsat 8 data are needed for covering three provinces of northeastern China.Each scene corresponds to 11 or 12 different temporal images during the crop-growing season from April 1 to September 30, with a total of 775 images for this study.For each image, we used the Landsat 8 OLI bands with 30 m spatial resolution, which included band 1 (coastal), 2 (blue), 3 (green), 4 (red), 5 (nir), 6 (swir 1), 7 (swir 2), and two cloud mask files.Since it is sensitive to water absorption, band 9 (cirrus) was not used [59].The path/row of Landsat data was used, and the acquisition date and cloud coverage of all temporal images for each path/row are shown in Figure 2. The raw DN values were converted into surface reflectance using an atmospheric correction tool from the Landsat Ecosystem Disturbance Adaptive Processing System [60].

Land-Cover Map
A land-cover map provides a theoretical reference for cultivated land distribution.The 2015 land-cover product, with 30 m spatial resolution provided by geographical information monitoring cloud platform (http://www.dsac.cn/),was adopted in this study to generate an initial cultivated land mask for subsequent use.This product series was first released in 1995 and updated every few years.

Land-Cover Map
A land-cover map provides a theoretical reference for cultivated land distribution.The 2015 land-cover product, with 30 m spatial resolution provided by geographical information monitoring cloud platform (http://www.dsac.cn/),was adopted in this study to generate an initial cultivated land mask for subsequent use.This product series was first released in 1995 and updated every few years.They are achieved by visual interpretation and human-computer interaction methods using Landsat satellite image data, which mainly includes six first-level categories: cultivated land, woodland, grassland, water bodies, built-up land, and unused land.The validation procedure shows that the overall accuracy of this product series is above 90% [61].

Field Data
Training and testing data were obtained from a field survey and Google Earth high-resolution images.During the entire growth period of the crop from April to September in 2015, the relevant researchers divided three groups into three provinces of the study area to carry out the field survey.As a result of the large scope of the study area, we choose two main agricultural cities in each province, including Qitaihe and Mudanjiang in Heilongjiang Province, Siping and Liaoyuan in Jilin Province, and Chaoyang and Panjin in Liaoning Province.The main content of the survey includes the latitude, longitude, and elevation information of the survey point, the soil type, the crop planting type, the crop growth period, and the crop growth condition.The sampling interval of the survey point is about 0.4 km, and the area of the survey sample is about 20 m × 20 m.One photo of each orientation survey point, including north, south, east, and west, was taken to assist in the identification of the crop category.Chaoyang in Liaoning Province has the largest number of field survey data, which is 436, and Panjin has the lowest number, which is 318.Besides the 2165 ground truth images from the field survey, the identification of crops was also replenished from Google Earth high-resolution images from May to August in 2015 through visual interpretation.Finally, a total of 3512 reference sites in 16 cities were obtained.The difference of geographic locations for these cities ensure the spatial heterogeneity of the training and testing data [62,63].The number of samples for each crop type is shown in Table 1.The cities where the training and testing samples were collected are shown in Figure 3.

Statistical Data
In addition to the field survey, we collected statistical yearbooks for three provinces.The China Statistical Yearbook 2016, published by the National Bureau of Statistics (NBS), systematically collects statistical data on all aspects of economic and social development at the country and province level for 2015 (National Statistical Bureau 2016).The relevant statistical data of agriculture (Chapter 12) in the yearbook are used as the partial validation data for this study.Statistical data for maize, rice, and soybean of northeastern China in 2015 are given in Table 2.

Methods
The overall procedure for mapping the distribution of maize, rice, and soybean in northeastern China is illustrated in Figure 4.The main procedure includes (i) time series construction, (ii) training samples learning, (iii) nonlinear dimensionality reduction, (iv) automatic crop mapping, and (v) accuracy assessment.

Statistical Data
In addition to the field survey, we collected statistical yearbooks for three provinces.The China Statistical Yearbook 2016, published by the National Bureau of Statistics (NBS), systematically collects statistical data on all aspects of economic and social development at the country and province level for 2015 (National Statistical Bureau 2016).The relevant statistical data of agriculture (Chapter 12) in the yearbook are used as the partial validation data for this study.Statistical data for maize, rice, and soybean of northeastern China in 2015 are given in Table 2.

Methods
The overall procedure for mapping the distribution of maize, rice, and soybean in northeastern China is illustrated in Figure 4.The main procedure includes (i) time series construction, (ii) training samples learning, (iii) nonlinear dimensionality reduction, (iv) automatic crop mapping, and (v) accuracy assessment.

Time-Series Construction
In order to effectively utilize the temporal characteristics of crops, we rearranged the satellite image time series according to the spectral-temporal mode.All of the temporal data for the same band were stacked together to construct the time series.A 2015 land-cover map was used to build a cultivated land mask.Then, we applied the mask to the spectral-temporal time series for obtaining a spectral-temporal time series of the cultivated land.For Landsat 8 satellite images with a low acquisition period, we often encounter some regions on the image that are covered by clouds.In this study, the cfmask bands are used to remove the missing data covered by clouds and snow in SITS,

Time-Series Construction
In order to effectively utilize the temporal characteristics of crops, we rearranged the satellite image time series according to the spectral-temporal mode.All of the temporal data for the same band were stacked together to construct the time series.A 2015 land-cover map was used to build a cultivated land mask.Then, we applied the mask to the spectral-temporal time series for obtaining a spectral-temporal time series of the cultivated land.For Landsat 8 satellite images with a low acquisition period, we often encounter some regions on the image that are covered by clouds.In this Remote Sens. 2020, 12, 2726 8 of 22 study, the cfmask bands are used to remove the missing data covered by clouds and snow in SITS, while retaining all available data.As a result, the lengths of time series for data points may not be equal.

ISOMAP
The ISOMAP DR method was selected for this study for the following reasons.First, it is a nonlinear DR algorithm that has been demonstrated to have better performance than linear DR algorithms when applied to hyperspectral data.Second, ISOMAP is a global optimization algorithm that uses geodesic distance to powerfully represent the true geometric structure of data.This feature is particularly attractive for remote sensing image classification because it can provide the greatest separation between classes to project the high-dimensional data into a low-dimensional space.Third, it is easy to incorporate different distance metrics because ISOMAP only uses the distance relationship between the data.This provides a possible way to handle SITS with missing data.The ISOMAP algorithm can be briefly described as follows: (a) Using the K nearest neighbor method to construct a neighborhood graph that can express the neighborhood structure of samples.
(b) The geodesic distances between all samples are calculated using Dijkstra's or Floyd's shortest path algorithm.
(c) Using these geodesic distances as input, the classical multidimensional scaling (MDS) algorithm is run to reconstruct the new sample points in a low-dimensional space.

L-ISOMAP
When the number of sample points increases, the computational cost of the ISOMAP algorithm will be very expensive.First, it requires obtaining the shortest paths from all the root nodes in the neighborhood graph with an order of O(kN 2 log N) using Dijkstra's algorithm, where N is the number of sample points and k is the size of the neighborhood.Second, it depends on the eigen-decomposition for the geodesic distances matrix, with a computational complexity of O(N 3 ).
To overcome this drawback, a modified version of ISOMAP, called Landmark ISOMAP (L-ISOMAP), was developed by Silva and Tenenbaum [64].
Let vector ∈ X, n ≤ N, and the low-dimensional embedding for the landmark dataset and non-landmark dataset is Y L and Y Q = Y/Y L , respectively.The L-ISOMAP algorithm includes four main steps.First, the neighborhood graph is constructed using K to the nearest neighbor method.If x i is a neighbor of x j defined by KNN, the weight of the edge between them is the Euclidean distance; otherwise, the weight is 0. Second, the geodesic distance matrix M n × N between landmark dataset X P and the original input data X are calculated using the neighborhood graph based on the shortest path distance algorithm.Clearly, only the geodesic distance matrix M n × N is calculated when the computational complexity is reduced from O(N 3 ) to O(nN log N).Third, the d-dimensional embedding of landmark dataset Y L is obtained by MDS.The solutions of Y L are the eigenvectors of the first d-largest eigenvalues of the inner-product matrix ∆ n × n = −HM G H/2, where M G is the squared distance matrix between pairwise landmarks, and H is the mean-centering matrix.Finally, the embedding of non-landmarks Y Q are found by the transformation via where Y + L is the pseudo inverse of landmark Y L , and M is the column mean of the geodesic distance matrix M n × N .In order to project non-landmark points uniquely in the embedded space, the number of landmarks n is larger than the dimension of embeddings d.

TL-ISOMAP-DTW
In this study, an improved version of L-ISOMAP, called Training Landmark-ISOMAP-Dynamic Time Warping (TL-ISOMAP-DTW), was proposed for crop classification using time-series remote sensing data in a large area.We have made two improvements to the original L-ISOMAP algorithm.
(1) The first improvement is to use a dynamic time warping (DTW) metric instead of Euclidean distance to construct the nearest neighbor graph and calculate the weight of the edge.Since the missing data was eliminated, and all the available data were retained during the construction of time series data in Section 3.1, the final TS data for subsequent DR and classification is not equal in length for data points.The original L-ISOMAP algorithm cannot handle unequal time series; therefore, we use the DTW metric instead of Euclidean distance to calculate the distance between points.DTW has been proven to be capable of comparing time series with unequal lengths and handling the temporal distortion and extension [65,66].
The methodology of DTW for time series is as follows.Assume a time series M of length t, and a time series N of length s.
We construct a t-by-s path matrix based on time series M and N where the (ith, jth) element of the matrix corresponds to a distance between the points m i and n j .The warping path is subject to several constraints such as: Endpoint constraint: the starting and ending points of the warping path must be the first point (m 1 , n 1 ) and the last point (m t , n s ) of the path matrix.
Continuity constraint: in the path matrix, each step forward of the path is confined to neighboring points, m i − m i−1 ≤ 1 and n j − n j−1 ≤ 1.
Monotonicity: the warping path must move forward monotonically with respect to time, m i−1 ≤ m i and n j−1 ≤ n j .
The cumulative distance γ(i, j) between two time series is calculated by DTW based on the following dynamic programming formulation: where d(i, j) denotes the distance between m i and n j ; for example, the square of the difference or the magnitude of the difference.
Among all possible paths in the path matrix, the cumulative distance of the warping path found by DTW is the minimum.The path matrix and the DTW metric (red squares) between time series M and N is shown in Figure 5.
(2) The second improvement is the use of training samples as landmark points.In the original L-ISOMAP, the DR result mainly depends on the quantity and quality of the selected landmark points.However, how to determine landmark points lacks a common standard.Choosing the training samples as landmark points not only meets the random principle of the landmark selection, but training samples are also strongly representative of what is beneficial to subsequent classification.In this study, training samples as landmark points of the SITS of each scene with different paths/rows for DR are obtained by an automatic learning method (details in Section 3.3).In addition, the parameter (k) in TL-ISOMAP-DTW needs to be set.The "graph growth" strategy [67] is used to automatically determine the value of k.This strategy adaptively selects the neighbor size for each point instead of selecting the same number of neighbors for all points.
( , ) − ( , ) − Among all possible paths in the path matrix, the cumulative distance of the warping path found by DTW is the minimum.The path matrix and the DTW metric (red squares) between time series M and N is shown in Figure 5.In addition, after applying the cultivated land mask, the number of data points retained in each scene is greatly reduced, which allows most scenes to be directly applied to the improve the DR algorithm.However, there are still several scenes where the number of data points retained is too much to directly apply the DR algorithm.For this problem, we adopt the solution of block processing; that is, the scene is further divided into subsets according to the spatial range, and the time series of each subset is independently processed for DR.

Training Samples Learning
In this study, the different paths/rows for the SITS of each scene require a certain number of training samples as landmark points for nonlinear DR.Due to the limited number of training samples collected for the study area, it is difficult to ensure that the SITS of each scene have enough training samples as landmark points.Therefore, a method of learning new training samples for the SITS of each scene is still needed.We have built an automatic sample learning method based on the dynamic time warping (DTW) metric, and a spectral-temporal curve database for the main crops in northeastern China.

Training Samples Learning Based on Spectral-Temporal Curve Database
The following procedures were adopted to learn of the new training samples, which serve as landmark points for the SITS of each scene with a different path/row: (a) Sampling.A certain percentage of field survey points (10% in this study) were randomly selected for each crop type, and then the spectral-temporal curve corresponding to each point was extracted.
(b) Construction of spectral-temporal curve.Since the samples of each crop type may come from SITS of different scenes, the dates of acquiring observations may be different.In order to obtain a complete spectral-temporal curve during the whole growing season, all the dates of acquiring observations are used in temporal order.This may cause the temporal interval of the constructed spectral-temporal curve to be inconsistent, but it has little effect on the DTW algorithm.If there are multiple available data for the same day, the mean value is calculated.The spectral-temporal curve of maize, soybean, and rice in Northeastern China is shown in Figure 6.
(c) Finding similar pixels.For the SITS of each scene with a different path/row, we used the DTW metric to search for a specified number of pixels closest to the spectral curve of each crop.
(d) Sorting by DTW distance.All similar pixels for the three different crop types were sorted by the DTW distance calculated in the last step.
(e) Choosing training samples.According to the number of landmark points required, the same number of similar pixels with the smallest DTW distance were reserved for SITS in each scene, with a different path/row as training samples for DR and classification.
(b) Construction of spectral-temporal curve.Since the samples of each crop type may come from SITS of different scenes, the dates of acquiring observations may be different.In order to obtain a complete spectral-temporal curve during the whole growing season, all the dates of acquiring observations are used in temporal order.This may cause the temporal interval of the constructed spectral-temporal curve to be inconsistent, but it has little effect on the DTW algorithm.If there are multiple available data for the same day, the mean value is calculated.The spectral-temporal curve of maize, soybean, and rice in Northeastern China is shown in Figure 6.

Determine Landmark Points Size
In order to determine the reasonable number of training samples as landmark points, we used 3512 verification points collected to test the effect of different sizes of landmark points in the TL-ISOMAP-DTW method on the classification accuracy.Different percentages (0.1%, 0.2%, 0.3%, 0.5%, 0.7%, 1%, 2%, 3%, and 5%) of samples were selected as landmark points, respectively.The results of DR using TL-ISOMAP-DTW were inputted into a random forest (RF) classifier.To reduce the possible errors caused by the classification process, the classifier also used different proportions of training samples (the proportion is the same as the percentage chosen for landmark points).The rest of the data were used as testing samples.Finally, the DR performance was evaluated by the classification accuracy.Each experiment was repeated 10 times, with the mean value as the final result.Among the results with the highest supervised classification accuracy, the minimum percentage was used in this study to determine the number of landmark points for the SITS of each scene with a different path/row.

Automatic Crop Mapping and Accuracy Assessment
We input the DR data of each scene with different paths/rows and the corresponding training samples from the last step into random forests (RFs) [68] classifiers to complete the classification for each scene, and we mosaic all of the classification results to obtain the final crop distribution map in Northeastern China.The RF classifier includes multiple tree classifiers, and each tree classifier has an equal vote to decide the class of the input vector [69].To run an RF classifier, two parameters have to be set, including the number of classification trees and the number of input variables used at each node.In this paper, we chose settings of 500 trees and 2 variables at each node for the RF classifier.Finally, the statistical data and field data were used to assess the classification accuracy.Since 10% of the field survey data is used when constructing the spectral-temporal curve, all remaining field data are used for accuracy assessment.

Algorithm Sensitivity to Landmark Points Size
The classification accuracy of landmark points with different percentages in TL-ISOMAP-DTW for 3512 verification points is shown in Figure 7.Meanwhile, the L-ISOMAP method is used for comparison.Since L-ISOMAP cannot handle SITS with missing data, we need to use temporal interpolation for these points before L-ISOMAP DR.All settings are the same with TL-ISOMAP-DTW except for the landmark points, which are randomly selected in L-ISOMAP instead of the activated learning method in TL-ISOMAP-DTW.
It can be seen from Figure 7 that the overall accuracy increases with the rising of the percentage of landmark points in TL-ISOMAP-DTW, which is especially obvious when the percentage of landmark points is less than 0.7%.With the error bar repeated 10 times for each experiment and the percentage of landmark points less than 0.7%, we find that the classification accuracy changes sharply.This shows that the DR results have a great impact on the classification accuracy.In other words, when the percentage of landmark points is less than 0.7%, it is difficult to obtain stable DR results.However, when the percentage of landmark points is greater than 0.7%, the overall classification accuracy tends to be stable, and the highest classification accuracy is almost the same for different percentages of landmark points.The range of the error bar for these percentages also becomes significantly smaller, which means that a stable DR result was obtained.Although the overall classification fluctuates as the proportion of training samples in the RF classifier changes, it is obvious that this fluctuation is not due to the results of DR, but from the classifier itself.Therefore, for the SITS of each scene with different paths/rows, 0.7% of the data points are selected by training samples learning as landmark points for TL-ISOMAP-DTW DR in this study.
words, when the percentage of landmark points is less than 0.7%, it is difficult to obtain stable DR results.However, when the percentage of landmark points is greater than 0.7%, the overall classification accuracy tends to be stable, and the highest classification accuracy is almost the same for different percentages of landmark points.The range of the error bar for these percentages also becomes significantly smaller, which means that a stable DR result was obtained.Although the overall classification fluctuates as the proportion of training samples in the RF classifier changes, it is obvious that this fluctuation is not due to the results of DR, but from the classifier itself.Therefore, for the SITS of each scene with different paths/rows, 0.7% of the data points are selected by training samples learning as landmark points for TL-ISOMAP-DTW DR in this study.
Figure 7 also shows the overall classification accuracy of the L-ISOMAP method using different percentages of landmark points.When the percentage of landmark points is less than 0.5%, the results of L-ISOMAP and TL-ISOMAP-DTW are very close.The error bar, after 10 repeated experiments for both methods, varies greatly.However, when the percentage of landmark points is greater than 0.5%, the TL-ISOMAP-DTW method shows a higher overall classification accuracy than the L-ISOMAP method.In addition, as an advanced classifier, RF can achieve higher classification accuracy with fewer training samples.From this experiment, we can see that when the proportion of training samples reaches 1%, the classification results of the RF classifier tend to stabilize.As the proportion of training samples increases, the classification accuracy does not increase significantly.In each of the SITS scenes with different paths/rows, the number of data points is significantly higher than the magnitude of that used in this experiment.It can be predicted that the proportion of training samples required by the classifier for each SITS scene to achieve stable accuracy will be further reduced.Therefore, considering the efficiency and accuracy of the algorithms in this study, the proportion of training samples required in RF classifier for the SITS of each scene is also set to 0.7%; that is, the training samples used by the RF classifier are the same as the landmark points used for TL-ISOMAP-DTW. Figure 7 also shows the overall classification accuracy of the L-ISOMAP method using different percentages of landmark points.When the percentage of landmark points is less than 0.5%, the results of L-ISOMAP and TL-ISOMAP-DTW are very close.The error bar, after 10 repeated experiments for both methods, varies greatly.However, when the percentage of landmark points is greater than 0.5%, the TL-ISOMAP-DTW method shows a higher overall classification accuracy than the L-ISOMAP method.In addition, as an advanced classifier, RF can achieve higher classification accuracy with fewer training samples.From this experiment, we can see that when the proportion of training samples reaches 1%, the classification results of the RF classifier tend to stabilize.As the proportion of training samples increases, the classification accuracy does not increase significantly.In each of the SITS scenes with different paths/rows, the number of data points is significantly higher than the magnitude of that used in this experiment.It can be predicted that the proportion of training samples required by the classifier for each SITS scene to achieve stable accuracy will be further reduced.Therefore, considering the efficiency and accuracy of the algorithms in this study, the proportion of training samples required in RF classifier for the SITS of each scene is also set to 0.7%; that is, the training samples used by the RF classifier are the same as the landmark points used for TL-ISOMAP-DTW.

Overall Classification Performance
The distribution of maize, rice, and soybeans in the study area is shown in Figure 8.The classification validation based on sample points from field surveys and Google Earth indicate that the crop map of northeastern China in 2015 had high accuracy, with an overall accuracy of 84.83% (Table 3).The producer accuracy and user accuracy of the three main crops including maize, rice, and soybeans are significantly higher than those of other crops.The producer accuracy of rice is the highest, which is 0.51% and 5.69% higher than that of maize and soybeans, respectively.In contrast, the user accuracy of maize is the highest, which is 4.33% and 15.58% higher than that of rice and soybeans, respectively.

Overall Classification Performance
The distribution of maize, rice, and soybeans in the study area is shown in Figure 8.The classification validation based on sample points from field surveys and Google Earth indicate that the crop map of northeastern China in 2015 had high accuracy, with an overall accuracy of 84.83% (Table 3).The producer accuracy and user accuracy of the three main crops including maize, rice, and soybeans are significantly higher than those of other crops.The producer accuracy of rice is the highest, which is 0.51% and 5.69% higher than that of maize and soybeans, respectively.In contrast, the user accuracy of maize is the highest, which is 4.33% and 15.58% higher than that of rice and soybeans, respectively.The specific crop area produced by this study is shown in Table 4.The area of each type = the number of pixels × the square of spatial resolution.The comparison and percentage difference between the classification results of this study and the 2015 statistical yearbook data is shown in Figure 9.The percentage difference of each type in each province = | the area difference | / the mean of corresponding type in the yearbook and classification results.From Figure 9, we find that the classified area of rice in all three provinces is higher than that in the statistical yearbook data, especially in Liaoning Province, where the difference is up to 150.5 (1000 ha).Rice has the greatest difference between the total classified area in northeastern China and the statistical yearbook data among these three main crops.The classified area of maize in Heilongjiang Province is higher than the statistical yearbook data, while in Jilin Province and Liaoning Province, it is lower than the yearbook data, which makes the total area difference of maize in northeastern China much smaller than that of rice.The classified area of soybeans is the closest to the statistical yearbook among the three crops, which is mainly because the planting area of soybeans is small.Therefore, it is necessary for us to further obtain the percentage difference based on the absolute value of difference and mean between statistic data and classification results.As can be seen from Figure 9, the percentage difference of maize is less than 10% in the three provinces, and it is also less than 5% in Heilongjiang Province.The percentage difference of rice and soybeans is the highest in Liaoning Province, with both of them exceeding 20%.For each province, the lowest percentage difference appears on the crop with the largest planting area, and the highest percentage difference occurs on the crop with the smallest planting area.12 The specific crop area produced by this study is shown in Table 4.The area of each type = the number of pixels × the square of spatial resolution.The comparison and percentage difference between the classification results of this study and the 2015 statistical yearbook data is shown in Figure 9.The percentage difference of each type in each province = | the area difference | / the mean of corresponding type in the statistical yearbook and classification results.From Figure 9, we find that the classified area of rice in all three provinces is higher than that in the statistical yearbook data, especially in Liaoning Province, where the difference is up to 150.5 (1000 ha).Rice has the greatest difference between the total classified area in northeastern China and the statistical yearbook data among these three main crops.The classified area of maize in Heilongjiang Province is higher than the statistical yearbook data, while in Jilin Province and Liaoning Province, it is lower than the yearbook data, which makes the total area difference of maize in northeastern China much smaller than that of rice.The classified area of soybeans is the closest to the statistical yearbook among the three crops, which is mainly because the planting area of soybeans is small.Therefore, it is necessary for us to further obtain the percentage difference based on the absolute value of difference and mean between statistic data and classification results.As can be seen from Figure 9, the percentage difference of maize is less than 10% in the three provinces, and it is also less than 5% in Heilongjiang Province.The percentage difference of rice and soybeans is the highest in Liaoning Province, with both of them exceeding 20%.For each province, the lowest percentage difference appears on the crop with the largest planting area, and the highest percentage difference occurs on the crop with the smallest planting area.

DR Performance
To verify the proposed method more intuitively, we chose 100 field samples for each of the three crops, i.e., maize, rice, and soybeans to complete DR.We chose the ISOMAP and L-ISOMAP algorithms to compare with TL-ISOMAP-DTW.Since ISOMAP and L-ISOMAP cannot handle time-series data with unequal length, we need to interpolate time-series data first for these two algorithms.Landmark points for L-ISOMAP and TL-ISOMAP-DTW were selected both at random and by the samples learning method designed in this study, respectively.For simplicity, the target dimensionality of all the DR algorithm is two.
Figure 10a displays the original distribution for bands 82 (red band in July) and 118 (NIR band in August) from time-series data constructed in Section 3.1.Figure 10b-d shows the projected data distribution using different DR algorithms.It is overly obvious that the results of ISOMAP and L-ISOMAP are very poor according to the separation of three classes.ISOMAP is a global algorithm, and although the global structure of the data can be retained, the DR results do little to the follow-up classification, since the various classes are still overlapped.In addition, the complexity of the ISOMAP algorithm is significantly higher than that of the other two algorithms.One possible reason for the poor performance of L-ISOMAP is that the landmark points are randomly selected and are not well representative of different crop classes.The TL-ISOMAP-DTW algorithm separates the three classes quite nicely.Mainly, this is not only because the selection of landmark points is very representative, but also because the DTW metric has a good adaptability to the loss and distortion of time-series data.

DR Performance
To verify the proposed method more intuitively, we chose 100 field samples for each of the three crops, i.e., maize, rice, and soybeans to complete DR.We chose the ISOMAP and L-ISOMAP algorithms to compare with TL-ISOMAP-DTW.Since ISOMAP and L-ISOMAP cannot handle timeseries data with unequal length, we need to interpolate time-series data first for these two algorithms.Landmark points for L-ISOMAP and TL-ISOMAP-DTW were selected both at random and by the samples learning method designed in this study, respectively.For simplicity, the target dimensionality of all the DR algorithm is two.
Figure 10a displays the original distribution for bands 82 (red band in July) and 118 (NIR band in August) from time-series data constructed in Section 3.1.Figure 10b-d shows the projected data distribution using different DR algorithms.It is overly obvious that the results of ISOMAP and L-ISOMAP are very poor according to the separation of three classes.ISOMAP is a global algorithm, and although the global structure of the data can be retained, the DR results do little to the follow-up classification, since the various classes are still overlapped.In addition, the complexity of the ISOMAP algorithm is significantly higher than that of the other two algorithms.One possible reason for the poor performance of L-ISOMAP is that the landmark points are randomly selected and are not well representative of different crop classes.The TL-ISOMAP-DTW algorithm separates the three classes quite nicely.Mainly, this is not only because the selection of landmark points is very representative, but also because the DTW metric has a good adaptability to the loss and distortion of time-series data.

Geographical Characteristics of Crop Distribution in Northeastern China
The maize planting area of Heilongjiang Province accounts for 45.7% of the total cultivated land area of the whole province, which was mainly distributed in Heihe City, Qiqihar City, Daqing City, Suihua City, and Harbin City in the southwest of Heilongjiang Province.For Hegang, Jiamusi, Mudanjiang, Qitaihe, and other eastern cities, there is also some maize planting.The rice planting area accounts for 24.8% of the total cultivated land area of the province, which is mainly concentrated in the east-central part of Heilongjiang Province, including Harbin, Jiamusi, and Qiqihar.Soybeans are mainly distributed in Mudanjiang, Suihua, Daqing, and other places.In addition, a number of other crops have been planted in the east and midwest.
Cultivated land in Jilin Province is mainly concentrated in the northwest of Jilin Province, including Chanchun City, Jilin City, Liaoyuan City, Siping City, and Baicheng City.These areas are rich in annual precipitation, sustained periods of sunshine, and fertile black land; thus, they become a very suitable zone for maize planting and growth.This area includes the southern part of Heilongjiang Province, the main cultivated land areas in Jilin Province, and the northern part of Liaoning Province.The Golden Maize Belt in Jilin Province includes Changchun City, Siping City, Songyuan City, Baicheng City, Liaoyuan City, and Jilin City.It encompasses a total of 6 cities and 22 counties, with a total area of more than 60,000 square kilometers, and the cultivated land area here accounts for about 75% of the total cultivated land area of Jilin Province.This provides a unique resource advantage and brings much wealth to Jilin Province.
The total area of cultivated land in Liaoning Province is the smallest among the three provinces in northeastern China; it is mainly concentrated in the central and northern part of Liaoning Province.The maize-planting area accounts for more than 50% of the total cultivated land in Liaoning Province, which is mainly distributed in Tieling, Fuxin, Chaoyang, Jinzhou, Panjin, Shenyang, and Benxi city.Rice is mainly distributed in Panjin, Anshan, and Fushun, and soybeans are only planted sporadically.

Reliability of Crop Mapping in Northeastern China
This study proposed the automated and reliable yearly 30-m crop mapping method in a large scale area and the improved DR method based on the L-ISOMAP algorithm and DTW metric for Landsat 8 time-series images.To our knowledge, this study was the first investigation to map the three main crops at 30-m spatial resolution in northeastern China.The implementation of this study can be attributed to the following factors: improved TS data, samples learning method, modified DR algorithm, and simple natural conditions.
First, the enhanced observation capability of Landsat 8 provides observation with high-temporal frequency, which has much higher advantages in terms of data availability than Landsat 5 and 7 in both the yearly scale and the transplanting phase [59,70].Landsat 8 TS data contain abundant phenological information of crop growth.Although the average missing (cloud/snow cover) data accounted for 33.73% of the total Landsat 8 TS images used in this study during the crop-growing period in 2015 (Figure 2), we used all available observations of individual pixels to heighten the representation of phenological information [71], instead of choosing scenes with less cloud/shadow.In addition, we rearranged the Landsat 8 TS data to obtain information about the variation of each spectral band over the whole crop growth, so that the phenological characteristics of crops can be used more effectively and directly.
Second, in remote sensing, the method of labeling samples is mainly achieved by in situ surveys or image photointerpretation [72].However, in the crop classification on a large regional scale, it is difficult to obtain a large number of samples, which leads to a large amount of human effort and time.Therefore, many studies adopted the samples learning method to label new training samples for supervised algorithms [73][74][75].However, there is often missing data in TS images, and the performance is poor by directly using the common samples learning methods based on the uncertainty and the diversity criteria [76].This study demonstrates that samples learning based on a spectral-temporal curve database extracted from SITS can label effective samples for crop classification in a large-scale area.
Third, data redundancy exists in time-series data, and DR is often required before classification.The nonlinear DR algorithms have been successfully applied to time-series data [54,77].However, because the complexity of the nonlinear DR algorithm is quite high, there is no research to apply the nonlinear DR algorithm to the time-series images of a large area.This study proved that the TL-ISOMAP-DTW method is reliable for a large area in northeastern China, which is helpful for applying the nonlinear DR algorithm to the time series images of a large area.
Fourth, compared with tropical regions, the simpler natural conditions in northeastern China are another important factor in the success of this study.Northeastern China is different from the natural conditions in tropical regions: there is lower planting density (generally single cropping), a longer crop growing season, and less cloud coverage in the study area.All of these variables provide opportunities that can aid in the completion of this study.

Uncertainty Analysis and Implications for Other Regions
This paper presents an automatic crop classification method that uses all available data and is not reliant on human intervention.The impact of cloud coverage on the accuracy of classification is difficult to quantify when using time-series data.Since the time-series data includes different regions, years, and frequency, the proportion of cloud coverage is quite different.It is difficult to show the complexity of cloud cover for satellite image time series, regardless of the total or average cloud cover of time-series data.This complexity is more evident in large-scale area studies, which include more time-series data with different paths and rows.In this study, the cloud cover of satellite image time series with different paths and rows varies greatly, and this brings some uncertainty to this method.In fact, this is also a problem that most image classification using satellite image time series will face [25].This paper uses a series of robust strategies: using a DTW metric to improve the nonlinear dimensionality reduction algorithm, and building a spectral-temporal curve database and learning training samples to limit the uncertainty caused by cloud cover, to a certain extent.
Furthermore, due to the low frequency of observations, this method is not recommended for satellite image time series covering only the beginning of the crop-growing season.It has been pointed out that Landsat 7 data are insufficient for annual crop mapping due to the limited observation capabilities and SLC (Scan Line Corrector)-off gaps [71].Consequently, many studies have focused on improving the temporal resolution of satellite image time series through data synthesis, which sheds light on classification methods that use temporal features because it can improve the availability of effective observations [78,79].This study demonstrates the feasibility of mapping crop distribution using Landsat 8 data during the growing season in northeastern China.However, if this method is used in tropical areas with higher cloud cover, such as southeast China, where planting is more dispersed and intensive, there may be more challenges.With the implementation of the sharing policy for earth observation by many countries, more and more satellite image time series are available free of charge, such as Sentinel-2 of ESA (European Space Agency), whose temporal resolution has been improved to 5 days, and HLS (Harmonized Landsat Sentinel-2) data synthesized using Sentinel-2 and Landsat 8 data, which has a temporal resolution of 2-3 days [80,81].Therefore, with the enrichment of time series, the wide application of the method proposed in this paper for other areas can be predicted and is encouraged.

Conclusions
In this study, we propose a new automated method based on Landsat 8 OLI time-series data to map the spatial distribution of three main crops including maize, rice, and soybeans in northeastern China.The most impressive feature of the automated method is that it reduces the need for reference data and expert knowledge in model training, thereby significantly lower the cost of crop mapping.This is especially important for high-resolution mapping at regional or larger scales.Considering the

23 Figure 1 .
Figure 1.The geographical location of the study area.

Figure 2 .
Figure 2. The Landsat path/row for the whole study area and the acquisition date and cloud coverage of all temporal images for each scene with different paths/rows in the crop-growing season.The cloud coverage percentage is calculated based on all the temporal images of each scene.(a) The acquisition date and cloud coverage for the paths/rows with 11 different temporal images during the crop-growing season.(b) The acquisition date and cloud coverage for the paths/rows with 12 different temporal images during the crop-growing season.The legend shows the acquisition date (month.day) of images.

Figure 3 .
Figure 3.The cities where the training and testing samples were collected.Photos of maize taken by the field survey group in Chaoyang and Liaoyuan city.

Figure 3 .
Figure 3.The cities where the training and testing samples were collected.Photos of maize taken by the field survey group in Chaoyang and Liaoyuan city.

Figure 4 .
Figure 4.The flowchart to extract the distribution of crops.

Figure 4 .
Figure 4.The flowchart to extract the distribution of crops.
the original input dataset and the corresponding low-dimensional embedding, where D and d denote input and embedding dimensions (d D).The landmark dataset is X P = x l p n l=1

Figure 5 .
Figure 5.The dynamic time warping (DTW) metric (red squares) between time series M and N.Figure 5.The dynamic time warping (DTW) metric (red squares) between time series M and N.

Figure 5 .
Figure 5.The dynamic time warping (DTW) metric (red squares) between time series M and N.Figure 5.The dynamic time warping (DTW) metric (red squares) between time series M and N.

Figure 7 .Figure 7 .
Figure 7. Classification accuracy and error bar of landmark points with different proportions for both Landmark-Isometric feature mapping (L-ISOMAP) and Training Landmark-ISOMAP-Dynamic Figure 7. Classification accuracy and error bar of landmark points with different proportions for both Landmark-Isometric feature mapping (L-ISOMAP) and Training Landmark-ISOMAP-Dynamic Time Warping (TL-ISOMAP-DTW) applied to 3512 verification points.Top row: landmark points with a percentage of 0.1%, 0.2%, or 0.3%.Middle row: landmark points with a percentage of 0.5%, 0.7%, or 1%.Bottom row: landmark points with a percentage of 2%, 3%, or 5%.The horizontal axis represents the proportion of training samples selected in random forest (RF) classifier.

Figure 8 .Table 3 .Figure 8 .
Figure 8.The distribution of the main crops cultivated area in northeastern China in 2015.Table 3. Confusion matrix of classification validation based on sample points from field survey and Google Earth.Overall Accuracy = 84.83%Class Reference Producer Accuracy (%) User Accuracy (%)

Figure 9 .
Figure 9.The area and percentage difference of three crops in three provinces from official statistics and classification produced by this study (HLJ: Heilongjiang province, JL: Jilin province, LN: Liaoning province).

Figure 9 .
Figure 9.The area and percentage difference of three crops in three provinces from official statistics and classification produced by this study (HLJ: Heilongjiang province, JL: Jilin province, LN: Liaoning province).

Table 1 .
Number of crop reference sites.

Crop Number of Field Survey Number of Google Earth
Remote Sens. 2020, 12, x FOR PEER REVIEW 7 of 23

Table 1 .
Number of crop reference sites.
CropNumber of Field Survey Number of Google Earth Maize 1028 657

Table 2 .
Statistical data of planting acreage for three major crops in northeastern China, 2015 (1000 ha).

Table 2 .
Statistical data of planting acreage for three major crops in northeastern China, 2015 (1000 ha).

Table 3 .
Confusion matrix of classification validation based on sample points from field survey and Google Earth.

Table 4 .
The crop area from classification results in this study in northeastern China, 2015 (1000 ha).

Table 4 .
The crop area from classification results in this study in northeastern China, 2015 (1000 ha).