Classification of the Complex Agricultural Planting Structure with a Semi-Supervised Extreme Learning Machine Framework

Abstract: Many approaches have been developed to analyze remote sensing images. However, for large-scale classification problems, most algorithms show low computational efficiency and low accuracy. In this paper, a newly developed semi-supervised extreme learning machine (SS-ELM) framework, which uses the k-means clustering algorithm for image segmentation and a co-training algorithm to enlarge the sample sets, was used to classify the agricultural planting structure of large-scale areas. Data sets collected from a small-scale area within the Hetao Irrigation District (HID) at the upper reaches of the Yellow River basin were used to evaluate the SS-ELM framework. The results of the SS-ELM algorithm were compared with those of the random forest (RF), ELM, support vector machine (SVM), and semi-supervised support vector machine (S-SVM) algorithms. The SS-ELM algorithm was then applied to analyze the complex planting structure of HID in 1986-2010 by comparing the remote sensing estimates with statistical data. In the small-scale case, the SS-ELM algorithm performed better than the RF, ELM, SVM, and S-SVM algorithms. For the SS-ELM algorithm, the average overall accuracy (OA) was in the range of 83.00-92.17%, whereas for the other four algorithms the average OA values ranged from 56.97% to 92.84%. In the classification of the planting structure of HID, the SS-ELM algorithm showed excellent classification accuracy and computational efficiency for the three major crops, namely maize, wheat, and sunflowers. The areas estimated by the SS-ELM algorithm from the remote sensing images were consistent with the statistical data, with differences within a range of 3-25%. This implies that the SS-ELM framework can serve as an effective method for the classification of complex planting structures, with relatively fast training, good generalization, universal approximation capability, and reasonable learning accuracy.


Introduction
Remote sensing image data provide material for the detailed interpretation of large-scale surface coverage [1,2]. Based on remote sensing data combined with current classification and recognition algorithms of surface features, an effective inversion of surface coverage can be achieved to some extent [3-5]. In particular, the effective classification of surface vegetation can provide key basic information for identifying surface cover types and plant growth conditions over large scales and for estimating regional evapotranspiration. All of these data are significant for regional crop management, crop yield estimation, and the protection of agroecosystems. The objectives of this study were (1) to evaluate the SS-ELM framework for the classification of planting structures under different labeled sample sizes, and (2) to apply the SS-ELM algorithm in a large-scale agricultural area with Landsat images to obtain highly accurate vegetation recognition.

Research Area
The Hetao Irrigation District (HID), located in the upper reaches of the Yellow River basin (latitude 40.1°N-41.4°N, longitude 106.1°E-109.4°E), was selected as the study area (see Figure 1). HID covers an area of 1.12 Mha, of which about 570,000 ha is irrigated farmland. As shown in Figure 2, the land-use categories can be further classified as saline-alkali land, sand dune, waterbody, residential area, bare land, marsh, greenhouse, and cropland. All of these land-use categories had been segmented before the classification of cropping areas in Section 3.1. Maize, spring wheat, sunflowers, and horticultural crops, e.g., watermelons, tomatoes, and peppers, are the main crops grown in HID in recent years. The growing season of spring wheat begins in late March and ends in mid-July. The maize growing season is from late April to late September. Sunflowers and vegetables are both planted in late May and harvested in mid-September and late August, respectively. Meanwhile, the landscape is often divided into small farms with fragmented cropping patterns due to the smallholder policy of farmland use rights.

Data and Preprocessing
Twenty-four Landsat Thematic Mapper (TM) and Operational Land Imager (OLI) images covering HID from 1986 to 2010 (see Table 1) were downloaded from https://earthexplorer.usgs.gov/. These images have six (TM) or eight (OLI) multispectral bands with a spatial resolution of 30 m × 30 m and one panchromatic band with a resolution of 15 m × 15 m. The L1T-level Landsat images (TM and OLI) had been geometrically corrected by the ground system to sub-pixel accuracy, so no further geometric correction was needed. To obtain radiometrically consistent data from the different sensors (TM and OLI), relative radiometric normalization was performed based on the Enhanced Thematic Mapper Plus (ETM+) images, and the normalization was carried out for the OLI images in 2000-2010. In the small-scale test case, Google Earth historical images with a spatial resolution of 1.6 m were used to improve the fitting accuracy. The statistical data of the planting areas of the three crops from 1986 to 2010 are available at the Bayannur Agricultural Information Network (http://www.bmagri.gov.cn), the Bayannur Statistics Bureau (http://tjj.bynr.gov.cn), and the Administration Bureau of the Hetao Irrigation District (http://www.zghtgq.com/).
The software ENVI 5.4 was employed for preprocessing the downloaded Landsat TM/OLI images. The preprocessing included band combination, atmospheric correction with the Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) tool, image mosaicking, and image subsetting. The purpose of the atmospheric correction is to eliminate the influence of the atmosphere and illumination on the reflectance of ground objects and to obtain near-surface reflectance.
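The relative radiometric normalization mentioned above was performed with ENVI tools; conceptually, it amounts to fitting a linear gain and offset that map one image's band values onto the radiometric scale of a reference image. The following NumPy sketch illustrates this idea only; the data and the helper name `relative_normalization` are hypothetical, not from the paper:

```python
import numpy as np

def relative_normalization(target, reference):
    """Linearly map `target` band values onto the radiometric scale of
    `reference` via a least-squares fit (a simplified stand-in for
    image-to-image relative radiometric normalization)."""
    t = target.ravel().astype(float)
    r = reference.ravel().astype(float)
    # Fit reference ~ gain * target + offset over the overlapping pixels.
    gain, offset = np.polyfit(t, r, deg=1)
    return gain * target + offset

# Toy example: a band whose radiometry differs from the reference
# by a known gain/offset; normalization should recover the reference scale.
rng = np.random.default_rng(0)
ref = rng.uniform(0.05, 0.4, size=(50, 50))   # reference reflectance
tgt = 1.8 * ref + 0.02                        # mis-calibrated band
norm = relative_normalization(tgt, ref)
print(float(np.abs(norm - ref).max()))        # near zero for an exact linear shift
```

In practice the fit would be restricted to pseudo-invariant pixels rather than all pixels, but the linear-mapping principle is the same.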

Methodology
To improve the accuracy and efficiency of traditional supervised learning-based classification, a newly developed semi-supervised extreme learning machine (SS-ELM) classification method, which combines image segmentation with a new self-label algorithm, was used in the following classification. The classification method includes three steps: image segmentation with k-means, self-labeling, and planting structure classification (see Figure 3). First, the non-agricultural regions are segmented out using the k-means unsupervised learning algorithm. Then, the co-training self-label algorithm based on the SVM and ELM classifiers is used to enlarge the sample set (see Table 2). Finally, the enlarged sample set is used to train the ELM classifier, which performs the planting structure classification.

Image Segmentation with k-Means
The k-means clustering algorithm, also known as a kind of semi-supervised learning which is simple and fast. It is based on an iterative process that divides the image into different clusters [25][26][27]. The data points or pixels are grouped exclusively. In this case, if a data point belongs to a certain cluster, then it will not belong to any other clusters. Conventional k-means clustering is selected where the clusters are fully dependent on the selection of the initial centroid point [28,29]. The k-means algorithm assumes Euclidean distance based on the similarity and/or dissimilarity. The Euclidean distance of the k-means algorithm can be expressed as where D is the Euclidean distance of the k-means algorithm, pk and qk are the k-th the pixel intensity of data objects p and q, respectively, and n is the number of clusters. Initial seed points for the kmeans clustering are randomly chosen for the entire image, and the distances between all the pixels and seed points are calculated. Pixels with minimum distance to the respective seed point are clustered together. A new mean value is calculated in each iteration. The interaction continues until there is no change or variation in the mean value.
In this study, different k values have been tested, and a stable clustering result can be obtained for k = 25. Thus, 25 clusters are selected for the k-means segmentation. The pixels of each region in the segmented image is statistically analyzed. After that, the adjacent clusters of non-agricultural regions are combined, and then further image segmentation is performed.
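The iterative assign-and-update procedure described above can be sketched in a few lines of NumPy. This is a generic Lloyd-style k-means on pixel vectors, not the authors' implementation; the toy data and function name are illustrative only:

```python
import numpy as np

def kmeans(pixels, k, iters=100, seed=0):
    """Minimal k-means on an (N, bands) pixel array using the Euclidean
    distance D = sqrt(sum_k (p_k - q_k)^2); returns labels and centroids."""
    rng = np.random.default_rng(seed)
    centroids = pixels[rng.choice(len(pixels), k, replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest centroid.
        d = np.linalg.norm(pixels[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each cluster mean (keep old centroid if cluster is empty).
        new = np.array([pixels[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):   # no change in the mean values
            break
        centroids = new
    return labels, centroids

# Toy image: two well-separated spectral groups should form two clusters.
rng = np.random.default_rng(1)
dark = rng.normal(0.1, 0.01, size=(100, 3))
bright = rng.normal(0.8, 0.01, size=(100, 3))
pixels = np.vstack([dark, bright])
labels, _ = kmeans(pixels, k=2)
print(len(set(labels[:100].tolist())), len(set(labels[100:].tolist())))
```

In the paper's workflow, the resulting cluster map (with k = 25) is then inspected so that clusters belonging to non-agricultural regions can be merged and masked out before classification.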

The ELM
The ELM has been widely adopted for various classification tasks because of its significant advantages of extremely fast training, good learning accuracy, and good generalization [30]. The standard ELM has the structure of a single-hidden-layer feed-forward neural network (SLFN) [30], which contains an input layer, a hidden layer, and an output layer. Implementing the ELM includes two steps: the first is to transform the input data into the hidden layer by the ELM feature mapping, and the second is to generate the results by ELM learning.

Table 2. The procedure of the co-training self-label algorithm (CTSLAL).
The relationship between the output and input of an SLFN with L hidden nodes can be expressed as

f(x) = Σ_{i=1}^{L} β_i G_i(c_i, b_i, x) = h(x)β,

where f(·) is the output of the neural network, x is the input vector, β = [β_1, ..., β_L]^T is the vector of output weights between the hidden layer with L nodes and the output layer with m nodes, h(x) = [G_1(c_1, b_1, x), ..., G_L(c_L, b_L, x)] is the output vector of the hidden layer, G_i(·) is the i-th hidden node activation function, c_i is the input weight vector connecting the input layer to the i-th hidden node, and b_i is the bias weight of the i-th hidden node. Different hidden neurons can adopt different activation functions, e.g., the Fourier, hard-limit, and Gaussian functions [31-33]. If different activation functions are selected, the resulting expressions are different; in this study, a single fixed activation function was used for all hidden nodes.

Different from traditional learning algorithms, the ELM emphasizes that the hidden neurons are fixed, and the ELM solution aims to find both the smallest training error and the smallest norm of the output weights [33]:

Minimize: ||β||_{s1}^{σ1} + C ||Hβ − T||_{s2}^{σ2},

where σ1 > 0, σ2 > 0, s1, s2 = 0, 1/2, 1, 2, ..., +∞, and C trades off the two terms. H is the hidden layer output matrix, the N × L matrix whose i-th row is h(x_i) = [G_1(c_1, b_1, x_i), ..., G_L(c_L, b_L, x_i)], where N is the number of training samples, n is the number of input nodes, and m is the number of output nodes. T is the training data-target matrix,

T = [t_1^T; ...; t_N^T].

The output weight β is calculated by

β = H† T,

where H† is the Moore-Penrose generalized inverse of the matrix H. With the above descriptions, fast and effective learning can be established. Given a training set Y = {(y_i, t_i) | y_i ∈ R^n, t_i ∈ R^m, i = 1, ..., N}, where y_i is the training data (the value of each band of an image pixel) and t_i is the class label of each sample (e.g., the label of vegetables, maize, wheat, or sunflowers), the ELM training algorithm can be summarized as follows (see the flowchart in Figure 4) [33]:

1. Randomly assign the hidden node parameters: the input weights c_i and the biases b_i.
2. Calculate the output matrix H of the hidden layer.
3. Calculate the output weight β = H† T.
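The three training steps above fit in a very small amount of code, which is much of the ELM's appeal. The following is a minimal NumPy sketch of a standard ELM classifier, not the authors' implementation; the class name, the sigmoid activation choice, and the toy data are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ELM:
    """Minimal single-hidden-layer ELM: random input weights c_i and biases
    b_i, hidden layer H = g(X C + b), output weights beta = pinv(H) @ T."""
    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, labels):
        n_features = X.shape[1]
        # Step 1: randomly assign the hidden node parameters.
        self.C = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Step 2: hidden layer output matrix H.
        H = sigmoid(X @ self.C + self.b)
        # One-hot target matrix T (one column per class).
        self.classes = np.unique(labels)
        T = (labels[:, None] == self.classes[None, :]).astype(float)
        # Step 3: beta = H† T via the Moore-Penrose pseudo-inverse.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        H = sigmoid(X @ self.C + self.b)
        return self.classes[(H @ self.beta).argmax(axis=1)]

# Toy spectral data: two classes with distinct band signatures.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.2, 0.05, (80, 6)), rng.normal(0.7, 0.05, (80, 6))])
y = np.array([0] * 80 + [1] * 80)
model = ELM(n_hidden=30).fit(X, y)
print((model.predict(X) == y).mean())  # training accuracy on separable data
```

Because the hidden-layer parameters are never tuned, training reduces to a single pseudo-inverse, which is the source of the "extremely fast training" cited above.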
The Co-Training Self-Label Algorithm
The co-training self-label algorithm (CTSLAL) is an effective solution for learning from a significant number of unlabeled samples and obtaining sufficient training data for fully supervised learning. The detailed procedure of the CTSLAL is presented in Figure 5 and Table 2. The CTSLAL includes two main processes: training and labeling. In the training process, the labeled set is first used as the initial training set to create the SVM and ELM classifiers; then, the enlarged labeled set is used as the training set to update the SVM and ELM classifiers. In the labeling stage, a proportion of the unlabeled samples are fed to the pre-trained SVM and ELM classifiers to output the confidence in each category. The CTSLAL combines the SVM and ELM classification results and labels the samples according to their most confident predictions. After enlarging the labeled sample set by the CTSLAL, the training process continues.

The training and labeling processes are performed iteratively. In the beginning, the labeled set (L) and the unlabeled set (U) are used as the input, and an enlarged set (EL) is obtained by combining the labeled samples (L) with the co-labeled set (CL). The SVM classifier (clf_svm), the ELM classifier (clf_elm), the SVM evaluator, and the ELM evaluator are all initially trained with L. After training the two independent classifiers, a set of samples from U is labeled by clf_svm and clf_elm, yielding two annotated sets. The two annotated sets are compared by the co-training-based evaluator, the samples with the same label are added to CL, and EL is updated accordingly. In the next loop, clf_svm and clf_elm are retrained with the updated EL and continue the labeling task through the co-training-based evaluator. The unlabeled samples are learned in this iterative manner, and the remaining unlabeled set is continuously learned until the number of samples in EL no longer increases.
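The agreement-based loop above can be sketched compactly. To keep the sketch self-contained, two simple classifiers (nearest-centroid and 1-nearest-neighbour) stand in for clf_svm and clf_elm; the function names and toy data are assumptions, and only the control flow mirrors the CTSLAL described in the text:

```python
import numpy as np

def fit_centroids(X, y):
    """Nearest-centroid classifier (stand-in for clf_svm)."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict_centroids(model, X):
    classes, centroids = model
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]

def predict_1nn(Xtr, ytr, X):
    """1-nearest-neighbour classifier (stand-in for clf_elm)."""
    d = np.linalg.norm(X[:, None, :] - Xtr[None, :, :], axis=2)
    return ytr[d.argmin(axis=1)]

def co_training_self_label(L_X, L_y, U_X):
    """CTSLAL sketch: both classifiers label the unlabeled set U; samples on
    which they agree join the enlarged set EL; repeat until EL stops growing."""
    EL_X, EL_y, U = L_X.copy(), L_y.copy(), U_X.copy()
    while len(U) > 0:
        clf_a = fit_centroids(EL_X, EL_y)
        pred_a = predict_centroids(clf_a, U)
        pred_b = predict_1nn(EL_X, EL_y, U)
        agree = pred_a == pred_b
        if not agree.any():          # EL no longer increases: stop
            break
        EL_X = np.vstack([EL_X, U[agree]])
        EL_y = np.concatenate([EL_y, pred_a[agree]])
        U = U[~agree]                # retrain on the updated EL next loop
    return EL_X, EL_y

# Toy two-class spectral data: 2 labeled samples per class, 38 unlabeled each.
rng = np.random.default_rng(3)
c0 = rng.normal(0.2, 0.03, (40, 4))
c1 = rng.normal(0.8, 0.03, (40, 4))
L_X, L_y = np.vstack([c0[:2], c1[:2]]), np.array([0, 0, 1, 1])
U_X = np.vstack([c0[2:], c1[2:]])
EL_X, EL_y = co_training_self_label(L_X, L_y, U_X)
print(len(EL_y))  # size of the enlarged labeled set
```

Requiring the two independent classifiers to agree is what keeps the self-labeling conservative: samples on which the views disagree stay unlabeled rather than polluting the training set.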

Evaluation and Application of the SS-ELM Method
To evaluate the performance and verify the effectiveness of the SS-ELM algorithm, an image collected from a small-scale area, covering about 0.17% of the total area of HID, was used. This image has 12,593 × 12,030 pixels and includes maize, sunflowers, wheat, vegetables, grove, bare land, and residential areas. The pixel size of the tested area is 1.6 m × 1.6 m. More specifically, the small-scale area was reclassified into seven categories, and 100,000 pixels were sampled randomly from each category for manual classification and marking. Six experiments were designed and conducted with the dataset of the image (see Table 3). As shown in Table 3, the randomly selected samples of the test dataset were 66,675 for wheat, 97,847 for maize, 54,963 for vegetables, and 49,516 for sunflowers. The randomly selected samples of the training set were about 0.1% and 0.01% of the image pixel number for experiments 1 and 2, respectively, whereas 16, 8, 4, and 2 samples were selected as the training set for experiments 3 to 6. After the evaluation with data from the small-scale area, the proposed SS-ELM algorithm was used to classify the planting structure of HID. In the classification of the HID planting structure, the number of manually labeled samples for each category was 20, and the size of the unlabeled set was 15,000.
To evaluate the classification capability of the SS-ELM algorithm, its results were compared with those of the random forest (RF) [2,10], ELM [5,33-36], support vector machine (SVM) [22], and semi-supervised support vector machine (S-SVM) [37] algorithms for each test. The overall accuracy (OA) [1,11] and the producer's accuracy were used as the criteria to evaluate the performance and effectiveness of the SS-ELM algorithm. OA is the percentage of properly classified samples over the total samples, whereas the producer's accuracy is the probability that a random sample on the ground is assigned to the same class by the classification.
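Both criteria are read directly off a confusion matrix: OA is the diagonal sum over the total, and the producer's accuracy divides each diagonal entry by its ground-truth row total. A small illustrative computation (the toy labels are hypothetical):

```python
import numpy as np

def confusion_matrix(truth, pred, n_classes):
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(truth, pred):
        cm[t, p] += 1
    return cm  # rows: ground truth, columns: classification result

def overall_accuracy(cm):
    # Properly classified samples (diagonal) over the total samples.
    return np.trace(cm) / cm.sum()

def producers_accuracy(cm):
    # Per class: correctly classified / all ground-truth samples of the class.
    return np.diag(cm) / cm.sum(axis=1)

truth = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
pred  = np.array([0, 0, 0, 1, 1, 1, 0, 2, 2, 1])
cm = confusion_matrix(truth, pred, 3)
print(overall_accuracy(cm))    # 7 correct of 10 -> 0.7
print(producers_accuracy(cm))  # per-class: [0.75, 2/3, 2/3]
```

The user's accuracy (diagonal over column totals) is the complementary per-class measure, though the paper reports only OA and the producer's accuracy.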
If OA is greater than 80%, the classification results are considered reliable and reasonable [36]. After the evaluation, the SS-ELM algorithm was used to detect the planting structures of typical years in HID, and the identified areas of each crop were compared against the statistical data.

Table 4 presents a comparison of the classification accuracy of the different algorithms. As shown in Table 4, the OA of the SS-ELM algorithm ranged from 83.00% to 89.53% in experiments 3-6, higher than the results of the other classification algorithms. These OA values exceeded the 80% criterion for a reasonable classification, indicating that the SS-ELM algorithm could obtain stable classification results even with very small training sets. Moreover, increasing the number of training samples increased the classification accuracy of the SS-ELM algorithm. In contrast, for the RF algorithm in experiments 2-6, the OA values were in the range of 56.97-73.76%, below 80%, indicating that the RF algorithm could not obtain reasonable classification results. Furthermore, for the other three algorithms, when the number of training samples was reduced below a certain level, their classification results could not meet the accuracy requirement: the OA values of the SVM and ELM algorithms were below 80% in experiments 3-6, as was that of the S-SVM algorithm in experiment 6. In summary, the SS-ELM algorithm showed the best performance for the classification of agricultural planting structure with small sample sizes. When the differences among the decision trees are obvious, the RF algorithm can show high classification accuracy for individual categories, e.g., maize and sunflowers. However, it could not produce reasonable overall classification results because the vegetables were mistakenly classified into other crop categories. Furthermore, for the cases with smaller sample sizes, the accuracy of the RF algorithm is extremely sensitive to sample selection, and improper selection of samples may result in poor classification.

Evaluation of the SS-ELM Framework
Among the different crop categories, the classification accuracy of vegetables was the lowest, which might be mainly due to the large differences among its internal sub-classes, e.g., tomatoes, green peppers, and squash. The images of some vegetable categories were similar to those of other crop categories such as maize, causing difficulty in classifying vegetables. In experiment 3, the accuracies for vegetables were 20.59%, 55.03%, 66.74%, and 58.66% for the RF, SVM, ELM, and S-SVM algorithms, respectively. In contrast, the value for the SS-ELM algorithm was 77.08%, indicating that the SS-ELM algorithm still had a certain recognition accuracy for the vegetable categories.
The classification maps of the different algorithms in experiment 4 are presented in Figure 6. Compared with the original map (Figure 6a) and the map of the handcrafted classification (Figure 6b), the classification map obtained by the SS-ELM algorithm (Figure 6g) was clearly the smoothest and clearest. In contrast, the RF algorithm could not detect the vegetable categories (Figure 6c), whereas the traditional SVM and ELM algorithms produced maps with obvious large-patch classification errors (Figure 6d,e). The S-SVM algorithm greatly improved the classification accuracy, but its map still had more noise and tiny speckles (Figure 6f) than the classification result of the SS-ELM algorithm (Figure 6g). Moreover, the classification maps obtained by the SS-ELM algorithm had much clearer boundary pixels between different categories. Thus, compared with the other classification algorithms, the SS-ELM algorithm obtained the maps closest to those of the handcrafted classification.

The classification maps of the SS-ELM algorithm for different training sample sizes are presented in Figure 7. The maps in Figure 7a-c were visually more accurate than those in Figure 7d-f, with the original map in Figure 6b used as the reference. The number of tiny speckles in the maps gradually decreased with increasing training sample size, indicating that a larger training set yields a higher classification accuracy.

The SS-ELM algorithm is superior to the other traditional supervised algorithms, especially for the cases with small sample sizes. A similar result was obtained by Huang et al. [30], who reported that the ELM algorithm could learn about a thousand times faster than the SVM algorithm. The main reason is that the SVM algorithm requires generating a large number of support vectors, which is difficult to implement in practical applications, whereas the ELM algorithm only requires very few hidden nodes for the same application. Meanwhile, the SS-ELM algorithm first uses k-means to segment the original image and then removes the non-agricultural land use from the calculations with the advantages of unsupervised learning; thus, it can significantly reduce the computation time. The RF algorithm has been shown to overfit in classifications with noise and to produce implausible weights in classifications with a large number of split variables [38]. Therefore, compared with the SVM and ELM algorithms, the RF algorithm is difficult to use for the classification of planting structure.
The semi-supervised learning algorithm can improve the classification accuracy for cases with small training sets by enlarging the samples in the training set [8]. Figure 8 shows the comparison of the statistical data and the planting areas estimated by remote sensing. The root mean squared errors (RMSEs) between the estimated values and the statistical data were within 9 ha. The coefficient of determination (R²) was 0.83, 0.87, and 0.95 for sunflowers, maize, and wheat, respectively, implying good estimations for these crops, whereas the R² of vegetables was only 0.56, indicating a relatively poor estimation. Bias measures the average tendency of the estimated values to be larger or smaller than the statistical data. The biases of wheat and sunflowers were positive, indicating that the remote sensing estimation overestimated the wheat and sunflower planting areas, whereas the biases of vegetables and maize were negative, suggesting underestimation of the vegetable and maize planting areas.
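The three agreement statistics used above (RMSE, R², bias) are standard and easy to compute; a short sketch with hypothetical planting-area values (not the paper's data) makes their definitions and sign conventions concrete:

```python
import numpy as np

def rmse(est, obs):
    return float(np.sqrt(np.mean((est - obs) ** 2)))

def r_squared(est, obs):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = np.sum((obs - est) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1 - ss_res / ss_tot)

def bias(est, obs):
    # Positive: remote sensing overestimates; negative: underestimates.
    return float(np.mean(est - obs))

# Hypothetical planting areas (10^4 ha): statistical data vs. RS estimates.
obs = np.array([10.0, 12.5, 15.0, 14.0, 16.5])
est = np.array([10.4, 12.1, 15.6, 14.5, 17.0])
print(rmse(est, obs), r_squared(est, obs), bias(est, obs))
```

Here the positive bias (0.32) corresponds to the "overestimated" case described for wheat and sunflowers, while a negative value would correspond to the underestimation seen for vegetables and maize.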

Detection of Cultivated Land Area
The classification maps of the total cultivated land area of HID in 1986, 1990, 1995, 2000, 2005, and 2010 are shown in Figure 9. These maps were obtained using the image segmentation method, with the green pixels representing the cultivated land. Based on the remote sensing estimation, the cultivated land area increased from 355,926 ha in 1986 to 553,923 ha in 2010, an increase of approximately 56% over this 24-year period. The fastest increase of cultivated land was found in 1995-2000, with an increasing rate of 20,158.5 ha/year. In contrast, the sand dune area (i.e., the pale-yellow pixels in Figure 9) decreased from 231,519 ha in 1986 to 132,950 ha in 2010. Figure 10 shows the comparison of the statistical data and the cultivated land area estimated by remote sensing. The estimated area of the cultivated land had a similar increasing trend as the statistical data but was smaller than the latter. This might be attributed to, first, the statistical data being slightly higher than the practical situation and, second, the Landsat data with a resolution of 30 m × 30 m or 15 m × 15 m not fully capturing the cultivation details.

Classification of Planting Structure
The SS-ELM algorithm was then used to classify the agricultural planting structure of HID from 1986 to 2010, and the results are presented in Figure 11 and Table 5. As shown in Figure 11, wheat, maize, and sunflowers, corresponding respectively to the yellow, green, and purple color blocks in the resulting maps, were the three major planting crops. Of the three crops, the proportion of the wheat planting area showed a significant decreasing trend, whereas the proportions of the maize and sunflower planting areas increased steadily during 1986-2010. Wheat had the largest planting area, accounting for 70% of the total planting area in 1990, but its share fell to only about 28% in 2010. Meanwhile, the proportions of the planting area were 12% for maize and 17% for sunflowers in 1990, and they increased to 28% and 44%, respectively, in 2010.
As shown in Table 5, during 1986-2010 the total planting area of the three main crops increased from 249,893 ha in 1986 to 381,425 ha in 2010, an increase of 52.64%. Of these crops, the wheat planting area increased before 1995 at a rate of 16,513 ha/year and then decreased in the following years at a rate of 12,999 ha/year. In contrast, the planting areas of maize and sunflowers increased continuously, and the 2010 values for maize and sunflowers were 2.9 and 2.7 times those in 1986, respectively. In particular, significant increases of 68% for maize from 2000 to 2005 and 49% for sunflowers from 1995 to 2000 could be identified. In addition, the remote sensing estimated areas of the three crops were basically consistent with the statistical data, and the average difference between the estimated values and the statistical data was within 14%. This result indicates that the SS-ELM algorithm has good accuracy for the classification of crop planting structure over large-scale areas. The remaining discrepancy might again be attributed to two factors: first, the remote sensing images might not fully capture the detailed cropping pattern, and second, a bias might exist between the statistical data and the actual cropping area.
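The rates and percentages quoted above follow from simple arithmetic on the Table 5 areas. A short sketch using the total-area figures from the text (`annual_rate` and `relative_difference` are hypothetical helper names introduced here for illustration):

```python
def annual_rate(area_start, area_end, year_start, year_end):
    """Average change in planting area (ha/year) over a period."""
    return (area_end - area_start) / (year_end - year_start)

def relative_difference(estimated, statistical):
    """Relative difference (%) of a remote sensing estimate vs. the statistical value."""
    return 100.0 * abs(estimated - statistical) / statistical

# Total planting area of the three main crops (ha), as reported in the text
rate = annual_rate(249_893, 381_425, 1986, 2010)
growth = 100.0 * (381_425 - 249_893) / 249_893
print(rate)               # average ha/year over 1986-2010
print(round(growth, 2))   # -> 52.64, matching the reported overall increase
print(relative_difference(110.0, 100.0))  # -> 10.0 (% difference example)
```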
The cultivated land area and the total planting area of the three main crops (wheat, maize, and sunflowers) in HID increased dramatically from 1986 to 2010. This might be attributed to the growth of the local population and the corresponding increase in food requirements. In addition, with economic development, expanding the cultivated land area and planting area is one of the major channels for local farmers to increase their income [39]. However, regional water-saving policies and the economic considerations of crop production might be two important reasons for the great changes in planting structure [40,41], causing the planting areas of maize and sunflowers to increase and the wheat planting area to decrease during 1986-2010. In particular, comprehensive water-saving practices, which aim to reduce the diversion of water from the Yellow River, have been adopted in HID since 2000 [40]. As one of the major practices, planting structure adjustment, i.e., increasing crops with high economic benefit and low water consumption (sunflowers and maize) and reducing crops with low economic benefit and high water consumption (wheat), has also been carried out [41]. This has largely alleviated the water shortage in HID while effectively protecting farmers' income. In addition to being used to observe land-use changes over the years, the agricultural planting structure maps can serve as basic data for hydrological modelling and the reproduction of other surface parameters.
Compared with the semi-supervised Laplacian extreme learning machine, the SS-ELM algorithm combined with the co-training self-label algorithm is more suitable for the classification of agricultural planting structure. The co-training algorithm trains two robust classifiers with the labeled samples in two independent feature subsets, and each classifier then selects the unlabeled samples it predicts with high confidence to augment the training set of the other. This approach not only alleviates the limited amount and quality of manually labeled data but also allows samples from low-resolution imagery to be labeled more reliably. The k-means image segmentation directly eliminates the interference of non-agricultural land from the classification of the agricultural planting structure, thereby reducing the time needed for classification.
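The co-training loop described above can be sketched as follows. This is an illustrative reimplementation, not the paper's code: scikit-learn's `SVC` and `LogisticRegression` stand in for the SVM and ELM base learners, and it runs on synthetic two-view data rather than remote sensing imagery.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

def co_train(X1, X2, y, labeled, n_rounds=3, n_add=5):
    """Co-training sketch: two classifiers on independent feature views
    exchange their most confident pseudo-labels each round."""
    y, labeled = y.copy(), labeled.copy()
    clf1 = SVC(probability=True)              # stand-in for the SVM base learner
    clf2 = LogisticRegression(max_iter=1000)  # stand-in for the ELM base learner
    for _ in range(n_rounds):
        clf1.fit(X1[labeled], y[labeled])
        clf2.fit(X2[labeled], y[labeled])
        for clf, X in ((clf1, X1), (clf2, X2)):
            unlabeled = np.flatnonzero(~labeled)
            if unlabeled.size == 0:
                break
            conf = clf.predict_proba(X[unlabeled]).max(axis=1)
            top = unlabeled[np.argsort(conf)[-n_add:]]  # most confident samples
            y[top] = clf.predict(X[top])                # pseudo-label them
            labeled[top] = True                         # enlarge the shared sample set
    return y, labeled

# Synthetic two-view data: well-separated classes, only five labels per class
X, y_true = make_blobs(n_samples=80, centers=[[-5, -5, -5, -5], [5, 5, 5, 5]],
                       cluster_std=0.5, random_state=0)
mask = np.zeros(80, dtype=bool)
for c in (0, 1):
    mask[np.flatnonzero(y_true == c)[:5]] = True
y0 = np.where(mask, y_true, -1)                 # -1 marks unlabeled samples
y_hat, lab = co_train(X[:, :2], X[:, 2:], y0, mask)
```

On well-separated data like this, the pseudo-labels added by each view agree closely with the true classes, which is what lets the enlarged sample set improve the final classifier.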
It should also be mentioned that the SS-ELM method performs classification iteratively. An iterative procedure is also adopted by active learning methods [42-45]. Both active learning and semi-supervised learning use labeled and unlabeled data to improve learning ability, but their main ideas differ. Active learning needs an external entity (e.g., a human annotator) to answer labeling requests, whereas semi-supervised learning requires no manual intervention. Active learning has shown good performance on hyperspectral image classification based on neural networks, graphs, spatial-prior fuzziness pools, or 3D-Gabor features. Its limitation is that it cannot establish a reasonable network, graph, fuzziness pool, or effective features for the classification of low-resolution images. The SS-ELM algorithm combines two robust classifiers, i.e., SVM and ELM, and shows excellent stability for cases with low-resolution images. Moreover, the classification results obtained with the SVM and ELM classifiers are even more reliable than manually classified results. This advantage is particularly prominent in the classification of complex agricultural planting structures.

Conclusions
In this paper, a semi-supervised extreme learning machine (SS-ELM) framework was improved and used for the classification of land cultivation and agricultural planting structure. The SS-ELM performs the classification by jointly using image segmentation, the self-label (co-training) algorithm, and the extreme learning machine on remote sensing data. The classification framework was evaluated in experiments with datasets collected from a small-scale area, and results with reasonable accuracy were achieved even with a small number of labeled samples. The SS-ELM algorithm was then used for the detection of land cultivation and the classification of the agricultural planting structure in HID, a large-scale agricultural area in the upper reaches of the Yellow River basin. The areas of both the cultivated land and the major planting crops estimated using the SS-ELM algorithm were consistent with the statistical data.
Compared with the traditional supervised and semi-supervised algorithms, the SS-ELM algorithm obtained agricultural planting classification results with much higher accuracy and efficiency. In particular, for cases without sufficient labeled samples for each crop category, the SS-ELM algorithm effectively solves the problem because the framework can label a sufficient number of unlabeled target samples and use the enlarged sample set for classification. It can thus improve the detection and classification ability and efficiency for land cultivation and planting structure.
However, in the classification of agricultural planting structure, crops with small growing areas, e.g., vegetables and oilseed crops in HID, often cannot be distinguished from crops with large growing areas, e.g., wheat, maize, and sunflowers in HID. In addition, the performance of the SS-ELM algorithm is sensitive to the accuracy of image recognition of planting structures. Further studies are therefore required to apply the high-resolution remote sensing images developed in recent years to improve the classification accuracy of agricultural planting structure. In addition, advanced computer vision techniques, e.g., deep learning, could also be used to improve the recognition accuracy and efficiency of land cultivation and agricultural planting structure.