1. Introduction
Accurate crop-type distribution data are a prerequisite for crop growth monitoring and yield forecasting [1,2,3,4,5,6]. Satellite remote sensing is now widely used to identify and map different land-use and crop types, provided that appropriate classification features and methods are selected [7,8].
Classification accuracy for a given target depends on which features are used (spectral reflectance, a vegetation index, texture, or a combination of several features) and on when the images were acquired. A growing number of studies combine multiple features [9,10,11] from multiple dates [12,13,14,15,16,17,18,19,20,21,22,23] to improve classification accuracy. As examples of multi-feature classification, Peña-Barragán et al. [9] used the green, red, near-infrared (NIR) and all six shortwave infrared (SWIR) bands over the entire cropland area of Yolo County, California, to map 12 major crops. Using the ratio vegetation index (RVI), normalized difference vegetation index (NDVI), Visible Atmospherically Resistant Index (VARI) and Normalized Difference Greenness Index (NDGI), Su et al. [11] identified mangrove species in the mangrove forest of Tieshan Port, Guangxi, China, with an overall accuracy of 95.37%. Among multi-temporal approaches, Jakubauskas [12] and Geerken [15] identified crop types and estimated their coverage from NDVI time series. Ma [19] compared single-date and multi-date classifications and found that the information in multi-date remote sensing data can greatly improve classification accuracy. Liu [20] and Hao [23] studied crop classification based on monthly NDVI time series in Hengshui City, Hebei Province, and on 14-date NDVI time series in the three northeastern provinces of China, respectively. Huang [24], Wang [25,26] and Zhong [27] mapped maize and soybean using multi-temporal, multi-feature data. An increasing number of feature variables can be used for crop classification [28,29]; however, more feature variables do not necessarily give better results. In some cases, a subset of variables achieves accuracy equal to or even higher than that of complex multivariate feature sets [30]. Wang [31], taking Sihong County, Jiangsu Province, as the study area, quantitatively evaluated the importance of the feature variables and the optimal number of features, and obtained good classification results with a random forest classifier. Brown et al. [32] used two vegetation indices (VIs) extracted from Moderate Resolution Imaging Spectroradiometer (MODIS) time series to identify cotton, soybean and maize in Brazil. Feature selection is therefore a critical issue in crop classification: appropriate features can both improve computational efficiency and increase classification accuracy.
Texture features are also frequently used in land-use and land-cover (LULC) classification and in crop classification based on high-resolution remote sensing images. Many texture extraction methods have been applied to such images [33,34,35,36,37,38], including the gray-level co-occurrence matrix (GLCM), the Fourier power spectrum, first-order statistics of gray-level differences and multifractal models. Bai et al. [33] used the GLCM for wood texture analysis. Zhang et al. [34] used GLCM texture features to identify grains. Cao et al. [35] identified weeds in wheat fields based on the color co-occurrence matrix. Texture analysis methods based on mathematical transforms have also been widely studied and applied. Wu et al. [37] used an improved Hough transform to detect crop lesions, enabling fast and effective detection of near-circular lesion targets. Majumdar et al. [38] used a green-band texture feature model to classify grains effectively. Local canopy differences between planting patterns or varieties of the same crop can likewise be reflected in texture information.
Classification methods are usually screened by comparing their accuracies. Commonly used algorithms include maximum likelihood (ML), neural networks, support vector machines (SVM), decision trees (DT) and random forests (RF) [39]. Among them, the random forest classifier proposed by Breiman [39] is widely used and generally achieves high classification accuracy; it tolerates outliers and noise well and is not prone to overfitting [40]. Ghosh et al. [41] used a random forest classifier to map LULC in Delhi, the capital of India, for 1998, 2002 and 2011, achieving an overall accuracy of about 80% for each annual level-I map. Liu et al. [42] compared the LULC classification accuracy of ML, SVM and RF classifiers using Environmental and Disaster Monitoring and Forecasting Small Satellite Constellation (HJ-1) and Beijing-1 (BJ-1) images of Xuzhou City, Jiangsu Province, China, and found that RF achieved the highest overall accuracy on both datasets, 89.08% and 88.62%, respectively. Using RF, Hao et al. [43] tested the effect of time series length on the classification accuracy of alfalfa, corn, sorghum, soybean and winter wheat in Kansas, USA, with MODIS data; accuracy reached 88.45% with a 5-month time series. Based on Landsat 7 Enhanced Thematic Mapper Plus (ETM+) images, Tatsumi [44] used an RF classifier to map a variety of crops in Peru, such as cotton, grape, maize, soybean and wheat.
A typical example of different planting patterns of the same crop is the grazing and mowing of pastures, which are often distinguished by change detection algorithms based on the reflectance, height and biomass of the herbage, as well as on the spatial geometric pattern of the pastures. Research on remote-sensing identification of different varieties of the same crop has mainly targeted perennial crops with few major varieties, such as grape, sugarcane and citrus, and auxiliary information such as crop distribution maps and meteorological and geographic data is needed to distinguish the varieties [45]. Despite advances in the remote-sensing classification of different crops, there is still a lack of in-depth research on whether satellite remote sensing can further identify different planting patterns or variety types of the same crop, which is crucial for large-scale staple crops such as maize, rice and wheat. First, the acreage of a particular pattern or type of these staple crops may be much larger than that of many minor crops. Second, there are large regional differences in agronomic traits, stress resistance and yield per unit area between different planting patterns or cultivar types of a staple crop. If these cannot be distinguished accurately, it is difficult to calibrate the quantitative remote sensing parameters of the crop, which reduces the accuracy of the growth, risk and yield monitoring that depends on them. Identifying different planting patterns or varieties of the same crop has therefore become a key scientific problem.
Maize seed-production areas provide a suitable scenario for this kind of remote-sensing identification research. Other crops, including common maize, are grown around the seed maize fields. To identify seed maize fields accurately in such areas, it is therefore necessary to distinguish seed maize from common maize as well as maize from other crops. The differences among seed maize, common maize and other crops are mainly expressed as plot-to-plot differences in the spectral reflectance, texture and other properties of the plant canopy, and as temporal changes across growth stages [46]. Selecting suitable classification features and methods can help identify the subtle differences between planting patterns and varieties of the same crop. Liu et al. [46] and Zhang et al. [47] explored decision-tree classification using time-series spectra and high-resolution texture features to identify seed maize and common maize in Linze County, Gansu Province, and Qitai County, Xinjiang Uygur Autonomous Region, China, respectively.
The above analysis shows that vegetation indices such as the enhanced vegetation index (EVI) and RVI, texture features of high-resolution satellite images such as those derived from the gray-level co-occurrence matrix, and the temporal variation of these features have the potential to identify crops with subtle differences. Because the random forest classifier tolerates outliers and noise well and is not prone to overfitting, it is suitable for building a classification model for seed maize field detection.
To explore methods for identifying two planting patterns (seed and common) and variety types (inbred and hybrid) of the same crop (maize), this paper selected two maize seed production bases in Huocheng County and Hutubi County, Xinjiang Uygur Autonomous Region, China, as the study areas, and used Landsat 8, Gaofen-1 (GF-1) and Gaofen-2 (GF-2) satellite images as data sources. Using a random forest classifier, we propose a seed maize identification method that combines multi-temporal spectral features and texture features. The method provides a reference for large-scale mapping of seed maize fields and for fine classification of other crops by remote sensing.
3. Methods
The research workflow is shown in Figure 3. It consists of five parts: data preprocessing, spectral feature optimization, texture feature calculation, random forest classification and accuracy assessment. First, the remote sensing data were preprocessed on an automatic processing platform developed by our team. Second, the spectral features were optimized using the correlation coefficient. Third, the texture features were calculated. Fourth, in the regions covered by GF-2 images, classification was performed twice, once with spectral features only and once with fused spectral and texture features. Finally, the 2014 land cover data were used to mask each result, and the accuracy of each was assessed separately.
3.1. Data Preprocessing
Both the GF-1 images and the field samples were stored in the Raster Dataset Clean and Reconstitution Multi-Grid (RDCRMG) grid system developed by China Agricultural University [48]. Using C# together with the Geospatial Data Abstraction Library (GDAL), radiometric calibration, ortho-rectification and image registration were performed for all data. Atmospheric correction and radiometric calibration were performed with the Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) tool [49]. An atmospheric model suited to the mid-latitude location of the study area and the image acquisition time was selected, and because urban land occupies only a small part of the study area, a rural aerosol model was used to correct aerosol effects. The GF satellite image products used in this study were supplied with rational polynomial coefficient (RPC) files, and the parameters in these files were used to ortho-rectify the high-resolution images. As a result, all multi-source and multi-temporal remote sensing images were accurately geo-referenced.
3.2. Spectral Feature Optimization
In this paper, Pearson correlation coefficients computed in Python were used to select the most useful VIs. The Pearson correlation coefficient, devised by the statistician Karl Pearson, measures the degree of linear correlation between variables. All field survey samples were included in this analysis.
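As a minimal sketch of correlation-based screening, assuming that redundancy between two VIs is judged by their absolute Pearson coefficient over the field samples, the snippet below keeps a VI only if its |r| with every VI already kept stays below a threshold. The threshold of 0.95, the greedy keep-first order and the names `pearson` and `drop_redundant` are hypothetical; the paper does not report its cut-off, and the sample values are toy numbers.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sample lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either input is constant


def drop_redundant(features, threshold=0.95):
    """Greedily keep features whose |r| with every kept feature is below threshold.

    features maps a VI name to its values at the same samples (hypothetical layout).
    Features are visited in insertion order, so earlier ones are preferred.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


samples = {
    "NDVI": [0.1, 0.4, 0.6, 0.8],
    "SAVI": [0.15, 0.6, 0.9, 1.2],  # toy values: exactly 1.5 x NDVI, so r = 1
    "NDWI": [0.3, -0.1, 0.2, -0.2],
}
print(drop_redundant(samples))  # ['NDVI', 'NDWI']
```

A greedy pass is order-dependent, so in practice one might order the candidates by their individual classification importance before screening.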
A VI quantifies vegetation properties by transforming the reflectance of two or more spectral bands [50]. Considering differences in phenology and season, as well as the significance and saturation resistance of different VIs, commonly used VIs can be divided into four categories: (1) indices reflecting the overall change in crop growth: the normalized difference vegetation index (NDVI) and enhanced vegetation index (EVI); (2) indices reflecting crop greenness: the triangular vegetation index (TVI), ratio vegetation index (RVI) and green normalized difference vegetation index (GNDVI); (3) indices reflecting the soil background of crops: the difference vegetation index (DVI) and soil-adjusted vegetation index (SAVI); and (4) an index reflecting crop canopy moisture content: the normalized difference water index (NDWI) [8]. The formulas are given in Table 2, where B, G, R and NIR are the reflectances of the blue, green, red and near-infrared bands, respectively, and L is the soil adjustment factor, set to 0.5.
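Table 2 is not reproduced in this section, so the sketch below computes the eight indices with commonly published formulas, which are assumptions to be checked against Table 2: EVI with the MODIS coefficients (2.5, 6, 7.5 and 1), TVI in the triangular form of Broge and Leblanc, and NDWI in McFeeters' green/NIR form, since no SWIR band appears among B, G, R and NIR. The function name `vegetation_indices` and the reflectance values are hypothetical.

```python
def vegetation_indices(b, g, r, nir, L=0.5):
    """Eight VIs for one pixel from surface reflectance in [0, 1].

    Formulas are common literature forms, assumed rather than copied from Table 2.
    L is the SAVI soil adjustment factor (0.5, as stated in the text).
    """
    return {
        "NDVI": (nir - r) / (nir + r),
        "EVI": 2.5 * (nir - r) / (nir + 6 * r - 7.5 * b + 1),
        "TVI": 0.5 * (120 * (nir - g) - 200 * (r - g)),  # triangular VI (assumed form)
        "RVI": nir / r,
        "GNDVI": (nir - g) / (nir + g),
        "DVI": nir - r,
        "SAVI": (1 + L) * (nir - r) / (nir + r + L),
        "NDWI": (g - nir) / (g + nir),  # McFeeters' green/NIR form (assumed)
    }


# Toy reflectances for a dense green canopy (not measured values).
vi = vegetation_indices(b=0.05, g=0.08, r=0.06, nir=0.40)
print(round(vi["NDVI"], 3), round(vi["RVI"], 2))  # 0.739 6.67
```

Applied per scene and per pixel, this yields the single-band VI images that are later stacked for classification.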
3.3. LBP-GLCM
The GLCM, first proposed by Haralick [52] in 1973, is one of the most common and widely used statistical texture analysis methods. Each element of the matrix is a joint conditional probability density of two gray levels: given a spatial distance d and a direction θ, it is the probability (i.e., relative frequency) that gray level j occurs at that offset from a starting pixel with gray level i. Fourteen texture features can be calculated from it, such as angular second moment, entropy, contrast and correlation. In this study, five features computed in Python were selected: mean, variance, contrast, entropy and correlation. Although seed maize fields share the same strip texture structure, the strip direction varies within a single remote sensing image. To eliminate the influence of crop planting direction, the image is first transformed into a rotation-invariant local binary pattern (LBP) before the GLCM is calculated. LBP is an operator that describes the local texture features of images and has notable advantages such as rotation invariance and gray-scale invariance.
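To make the five GLCM statistics concrete, the sketch below builds a normalized, symmetric GLCM for a single offset and computes mean, variance, contrast, entropy and correlation in plain Python. The offset (d = 1 pixel, θ = 0°), symmetric counting, natural-log entropy and the correlation value of 1 for a constant window are assumptions for illustration, not settings reported in the paper. When the input is an LBP image, `levels` is the number of distinct LBP codes (for example, P + 2 = 10 codes for the rotation-invariant uniform operator with P = 8).

```python
import math


def glcm(img, levels, dx=1, dy=0, symmetric=True):
    """Normalized gray-level co-occurrence matrix for one offset (dx, dy).

    img is a 2-D list of integer gray levels in [0, levels).
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = img[r][c], img[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    counts[j][i] += 1  # also count the reverse pair
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Mean, variance, contrast, entropy and correlation of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n) if p[i][j] > 0]
    mean_i = sum(i * v for i, _, v in cells)
    mean_j = sum(j * v for _, j, v in cells)
    var_i = sum((i - mean_i) ** 2 * v for i, _, v in cells)
    var_j = sum((j - mean_j) ** 2 * v for _, j, v in cells)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    entropy = -sum(v * math.log(v) for _, _, v in cells)
    if var_i > 0 and var_j > 0:
        cov = sum((i - mean_i) * (j - mean_j) * v for i, j, v in cells)
        correlation = cov / math.sqrt(var_i * var_j)
    else:
        correlation = 1.0  # convention chosen here for a constant window
    return {"mean": mean_i, "variance": var_i, "contrast": contrast,
            "entropy": entropy, "correlation": correlation}


stripes = [[0, 1, 0, 1],
           [0, 1, 0, 1]]  # toy image with vertical strips
f = glcm_features(glcm(stripes, levels=2))
print(f["contrast"], f["correlation"])  # 1.0 -1.0
```

Across the strips (θ = 0°) every neighbor pair differs, giving high contrast and perfect negative correlation; an offset along the strips (dx = 0, dy = 1) would give zero contrast, which is why orientation matters before LBP normalization.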
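To obtain the rotation-invariant LBP image, the binary pattern formed by the 8 neighbors of each pixel is rotated circularly, and the minimum of the resulting values is taken as the pixel's code. The original operator has three extensions for describing texture features, namely rotation invariant patterns ($LBP_{P,R}^{ri}$), uniform patterns ($LBP_{P,R}^{u2}$), and rotation invariant uniform patterns ($LBP_{P,R}^{riu2}$). The basic operator and its rotation invariant form are calculated using the following equations:

$$LBP_{P,R} = \sum_{p=0}^{P-1} s(g_p - g_c)\,2^{p}$$

$$LBP_{P,R}^{ri} = \min\{\,ROR(LBP_{P,R}, i) \mid i = 0, 1, \ldots, P-1\,\}$$

where $ROR(x, i)$ performs a circular bit-wise right shift of the $P$-bit number $x$ by $i$ positions. Uniformity is measured by the number of 0/1 transitions $U$ around the circle, and the rotation invariant uniform pattern is

$$U(LBP_{P,R}) = \left|s(g_{P-1} - g_c) - s(g_0 - g_c)\right| + \sum_{p=1}^{P-1} \left|s(g_p - g_c) - s(g_{p-1} - g_c)\right|$$

$$LBP_{P,R}^{riu2} = \begin{cases} \sum_{p=0}^{P-1} s(g_p - g_c), & U(LBP_{P,R}) \le 2 \\ P + 1, & \text{otherwise} \end{cases}$$

where $R$ is the neighborhood radius, $P$ is the number of pixels in the circular neighborhood of the LBP algorithm, $g_c$ is the gray value of the neighborhood center pixel, and $g_p$ ($p = 0, \ldots, P-1$) are the gray values of the other pixels in the neighborhood.

The threshold function is

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$

where $x = g_p - g_c$ is the difference between the neighbor pixel $g_p$ and the central pixel $g_c$. Each of the $P$ gray values $g_p$ ($p = 0, \ldots, P-1$) in the circular neighborhood is compared with the center gray value $g_c$: neighbors whose value is not less than the center are coded 1, and the others 0.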
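The following sketch computes the rotation-invariant code (LBP^ri) and the rotation-invariant uniform code (LBP^riu2) for P = 8 and R = 1, following the definitions in this section. Assumptions: the 8 neighbors are the adjacent pixels taken without the bilinear interpolation usually used for non-integer radii, the circular order starts at the east neighbor, and border pixels are skipped; all function names are hypothetical. The paper does not state which rotation-invariant variant fed the GLCM, so both are shown.

```python
# 8 neighbours at radius 1, in circular order starting from the east neighbour.
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1), (1, 0), (1, 1)]


def threshold_bits(img, r, c):
    """s(g_p - g_c) for p = 0..P-1: 1 if the neighbour is >= the centre, else 0."""
    gc = img[r][c]
    return [1 if img[r + dr][c + dc] - gc >= 0 else 0 for dr, dc in OFFSETS]


def lbp_ri(bits):
    """Rotation-invariant LBP: minimum code over all circular bit rotations."""
    P = len(bits)
    return min(sum(b << p for p, b in enumerate(bits[i:] + bits[:i]))
               for i in range(P))


def lbp_riu2(bits):
    """Rotation-invariant uniform LBP: number of 1s if U <= 2, otherwise P + 1."""
    P = len(bits)
    u = sum(bits[p] != bits[p - 1] for p in range(P))  # p - 1 wraps to P - 1 at p = 0
    return sum(bits) if u <= 2 else P + 1


def lbp_image(img, variant=lbp_riu2):
    """Apply an LBP variant to every interior pixel (border pixels are dropped)."""
    rows, cols = len(img), len(img[0])
    return [[variant(threshold_bits(img, r, c)) for c in range(1, cols - 1)]
            for r in range(1, rows - 1)]


east = [[0, 0, 0], [0, 5, 9], [0, 0, 0]]   # one bright neighbour to the east
north = [[0, 9, 0], [0, 5, 0], [0, 0, 0]]  # the same pattern rotated by 90 degrees
print(lbp_image(east), lbp_image(north))   # [[1]] [[1]]
```

The two toy windows differ only by rotation yet receive the same code, which is the property used here to suppress the effect of planting direction before the GLCM is computed.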
3.4. Random Forest Classification
Random forest (RF) is an ensemble algorithm of the bagging type. It combines multiple weak classifiers (decision trees) and obtains the final result by voting, which gives the overall model higher accuracy and better generalization [8].
Random forests handle high-dimensional data well and have clear advantages when there are many samples and features. In this study, the multi-temporal single-band VI images were stacked into a multi-layer data cube for further analysis. In Huocheng County, 8 vegetation indices were calculated for each of 12 scenes, giving a 96-band data cube; the corresponding data cube for Hutubi County has 80 bands.
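The data cube can be pictured as one feature vector per pixel, with the indices of every scene concatenated in acquisition order. The sketch below assumes that scene-major layout (the paper does not specify the band order) and reproduces the band counts: 8 indices × 12 scenes = 96 bands for Huocheng County, while 80 bands for Hutubi County corresponds to 80 / 8 = 10 scenes if the same 8 indices are used. The helper names are hypothetical.

```python
INDEX_NAMES = ["NDVI", "EVI", "TVI", "RVI", "GNDVI", "DVI", "SAVI", "NDWI"]


def pixel_feature_vector(scenes, index_names=INDEX_NAMES):
    """Concatenate one pixel's VI values across scenes (scene-major order).

    scenes: list of dicts mapping index name -> value, one dict per acquisition date.
    Band k of the cube holds index k % len(index_names) of scene k // len(index_names).
    """
    return [scene[name] for scene in scenes for name in index_names]


def band_name(k, index_names=INDEX_NAMES):
    """Readable label for band k of the cube, e.g. 'scene03_NDVI' for k = 16."""
    n = len(index_names)
    return f"scene{k // n + 1:02d}_{index_names[k % n]}"


huocheng = [{name: 0.0 for name in INDEX_NAMES} for _ in range(12)]
print(len(pixel_feature_vector(huocheng)))  # 96
print(band_name(95))                        # scene12_NDWI
```

Keeping the band order fixed matters because the trained classifier treats each position in the vector as one specific index on one specific date.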
In general, adding decision trees to an RF classifier improves its predictions until the accuracy levels off, but there is a trade-off between classification accuracy and computation time. We tested 10, 30, 50, 70, 100, 120, 150, 170 and 200 trees (Figure 4) and selected 150 trees for classifying the seed maize fields as the best balance between classification accuracy and time efficiency.
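The bagging-and-voting idea can be illustrated with a deliberately tiny random forest in plain Python. This is not Weka's RandomForest, which the study used: here every tree is a depth-1 stump trained with a Gini split on a bootstrap resample and a random subset of √d features, and the forest predicts by majority vote. The 150-tree default mirrors the number chosen above; the toy data and all function names are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(X, y, feature_ids):
    """Depth-1 tree: the (feature, threshold) split with the lowest weighted Gini."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # all values identical: always predict the majority class
        m = majority(y)
        return (feature_ids[0], float("inf"), m, m)
    return best[1:]


def train_forest(X, y, n_trees=150, seed=0):
    """Bagging plus random feature subsets: one stump per bootstrap sample."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    m = max(1, int(d ** 0.5))  # sqrt(d) features per stump, a common default
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        feats = rng.sample(range(d), m)
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Majority vote over all stumps."""
    return majority([left if row[f] <= t else right for f, t, left, right in forest])


# Toy two-feature samples (e.g. two VI values); not data from the study.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = ["common", "common", "common", "seed", "seed", "seed"]
forest = train_forest(X, y)
print(predict(forest, [0.05, 0.05]), predict(forest, [0.95, 0.95]))  # common seed
```

Real random forests grow full-depth trees and resample features at every split; stumps are used here only to keep the voting mechanism visible in a few lines.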
Remote-sensing crop classification is essentially based on pixel similarity. In this article, we combined C# with the Waikato Environment for Knowledge Analysis (Weka), an open-source, Java-based machine learning and data mining package, to build the datasets and classification models. First, the sample features were extracted from the remote sensing images. Then, the features from all dates of the time series were combined into a sample set, which served as the input for training the RF classifier. Finally, remote sensing data from the same dates were fed into the trained classifier to obtain the crop classification results.
Two experiments were designed: one based solely on spectral data and one fusing spectral and texture data. First, according to the phenological calendar and the cropping system of the main crops in the study area, a vegetation index system was constructed, the multi-temporal spectral characteristics were analyzed, and seed maize was preliminarily identified from the spectral features. Then, in the regions covered by GF-2 images, the classification results were refined by texture analysis of the high-spatial-resolution images. In this way, two seed maize classification results were obtained.
3.5. Accuracy Assessment
In this paper, the 2014 arable land data were first used as a mask. The samples were then randomly divided, with 114 samples (70%) used for training and the remaining 30% used for validation, and the classification results were assessed with a confusion matrix computed in Python. Four accuracy assessment indices can be derived from the confusion matrix: overall accuracy (OA), producer's accuracy (PA), user's accuracy (UA) and the kappa coefficient (K). Kappa analysis provides a measure of the agreement between predicted and actual class membership [53]. A kappa value of 0 indicates agreement no better than chance, i.e., a completely random classification, while a kappa value of 1 corresponds to perfect agreement between the reference and classification data. The indices are calculated as follows:

$$OA = \frac{TP + TN}{N}$$

$$PA = \frac{TP}{TP + FN}$$

$$UA = \frac{TP}{TP + FP}$$

$$K = \frac{OA - P_e}{1 - P_e}, \qquad P_e = \frac{(TP + FN)(TP + FP) + (FP + TN)(FN + TN)}{N^2}$$

where TP and FN are the numbers of samples whose true class is positive and that the model predicts as positive and negative, respectively; TN and FP are the numbers of samples whose true class is negative and that the model predicts as negative and positive, respectively; $P_e$ is the agreement expected by chance; and N is the total number of samples.
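With more than two classes, OA, PA, UA and kappa are computed from the full confusion matrix: PA divides each diagonal entry by its reference (row) total and UA divides it by its predicted (column) total. The sketch below assumes rows are reference classes and columns are predicted classes; with two classes it reduces to the usual TP/FN/TN/FP definitions. The function name and the matrix in the usage example are invented for illustration.

```python
def accuracy_report(matrix, labels):
    """OA, per-class PA and UA, and Cohen's kappa from a confusion matrix.

    matrix[i][j] = number of validation samples whose reference class is labels[i]
    and whose predicted class is labels[j].
    """
    k = len(labels)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(matrix[i]) for i in range(k)]                       # reference totals
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # predicted totals
    oa = sum(matrix[i][i] for i in range(k)) / n
    pa = {labels[i]: matrix[i][i] / row_tot[i] for i in range(k) if row_tot[i]}
    ua = {labels[j]: matrix[j][j] / col_tot[j] for j in range(k) if col_tot[j]}
    pe = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2      # chance agreement
    kappa = (oa - pe) / (1 - pe)
    return oa, pa, ua, kappa


# Invented two-class example: 40 TP, 10 FN, 5 FP, 45 TN.
oa, pa, ua, kappa = accuracy_report([[40, 10], [5, 45]], ["seed maize", "other"])
print(oa, pa["seed maize"], round(ua["seed maize"], 4), round(kappa, 4))  # 0.85 0.8 0.8889 0.7
```

Here OA is 0.85, but half of that agreement would be expected by chance (P_e = 0.5), so kappa is a more conservative 0.7.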
5. Discussion
The purpose of this research was to explore a method for identifying different planting patterns and varieties of the same crop. Based on the differences in planting methods and varieties between seed maize and common maize, six spectral vegetation indices and LBP-GLCM texture features were extracted from Landsat 8 and GF-1 WFV multispectral image time series and from GF-2 PMS panchromatic images with a spatial resolution of 1 m, and a random forest classification model was constructed to distinguish seed maize from common maize and other crops. The mapping accuracies in the two typical seed maize producing counties were 95.90% and 97.79%, respectively. Compared with the “spectral method”, adding texture features improved the classification accuracy by 9.19 and 9.36 percentage points, respectively.
Another example of different planting patterns of the same crop is the grazing and mowing of pastures. Both grazing and mowing change biomass and plant height, and mowing can also expose the soil [45]. Lopes et al. [54] distinguished mowing, grazing and mixed practices using an object-based classification of grasslands from high-resolution satellite image time series (Formosat-2) and Gaussian mean map kernels. In a case study in Brittany, France, Dusseux et al. [55] derived NDVI and two biophysical variables (leaf area index and fraction of vegetation cover) from a series of three SPOT images and field data to monitor pasture mowing and grazing. Compared with these methods, the method proposed in this paper integrates texture and spectral information to identify the two planting patterns of maize and makes better use of the visible differences between them. However, this study does not consider biomass differences between seed maize and common maize; estimating biomass in future studies could therefore improve identification accuracy.
Compared with previous exploratory studies on the remote-sensing identification of seed maize, the technical scheme adopted in this paper (the combination of data sources, spectral features, texture features and classification method) is more generally applicable. For example, the decision tree method based on multi-temporal NDVI and high-resolution texture used by Liu et al. [46] in Linze County, Gansu Province, China, achieved a mapping accuracy of 83% and is better suited to seed maize areas that are covered with plastic film before sowing. The decision tree method based on multi-temporal EVI and GLCM texture used by Zhang et al. [47] in Qitai County, Xinjiang, China, achieved a mapping accuracy of about 90%, but its manually set decision-tree thresholds introduce considerable subjectivity. Zhang et al. [51] used a random forest method based on 8 vegetation indices and LBP-GLCM texture in Huocheng County, Xinjiang, China, which required many input parameters and achieved a mapping accuracy of about 86%. In terms of texture feature extraction, the results of this study show that 1 m GF-2 panchromatic images also provide texture information sufficient to identify seed production plots. Compared with the 0.3 m GeoEye-1 [46] and 0.7 m KOMPSAT-3 [47] imagery used in previous studies, GF-2 provides significantly greater effective coverage of high-resolution data.
Although the method presented in this paper identifies seed maize with high accuracy, some seed maize is still omitted or misclassified. In the spectral-only classification, seed maize in Huocheng County had a relatively high omission error: 5.32%, 4.63% and 2.69% of seed maize was misclassified as common maize, cotton and rice, respectively. In Hutubi County, the commission error was higher: cotton, watermelon and tomato were misclassified as seed maize at rates of 3.92%, 2.61% and 1.92%, respectively. This is mainly because these crops have similar phenological calendars and biomass, and because seed maize accounts for a relatively large share of the samples in Huocheng County (65.33% of the total). Such an unbalanced sample structure tends to increase model training error and biases the classification toward the majority class. Future research therefore needs to account for the subtle differences among these crops and explore features better suited to distinguishing them. We will also increase the sample sizes of the minority classes and optimize the sample distribution, so that the number and spatial structure of the samples are more balanced.
The method achieved relatively high identification accuracy in two seed maize production counties in Xinjiang and can be used to compile statistics and maps of annual seed maize acreage in well-known seed production areas, and to estimate seed yield accurately on that basis. However, the method requires a large number of ground survey samples and images covering the full growing season. The algorithm and identification scheme therefore need further improvement for areas lacking samples and for estimating seed production acreage early in the growing season.
6. Conclusions
This paper explored the feasibility of using high-resolution satellite images to identify different planting patterns and varieties of the same crop, extending the scope of precise crop classification by remote sensing. The results show that a random forest classifier built on temporal spectral and texture information extracted from high-resolution satellite images, combined with ground samples, can distinguish two planting patterns (seed and common) and variety types (inbred and hybrid) of maize. The method achieved high accuracy in two representative maize seed-producing areas in Xinjiang, China, and can be used for remote-sensing mapping of large-scale maize seed production fields. The key findings are as follows:
By comparing and screening the vegetation indices used in modeling, we found that SAVI and GNDVI are redundant for distinguishing seed maize, common maize and other crops, and only six indices (NDVI, EVI, TVI, RVI, NDWI and DVI) are required. Recognition accuracy improved compared with the method that uses EVI as the spectral feature.
A second seed maize identification model was built by fusing spectral and texture features. Texture parameters calculated from the 1 m GF-2 panchromatic band accurately captured the canopy texture differences between seed maize and common maize, and accuracy improved significantly compared with using spectral information alone. In addition, GF-2 has a wider swath than other high-resolution satellites, which makes the method more scalable.
In this study, the remote-sensing identification of seed maize was taken as an example to explore methods for discriminating different planting patterns and varieties of the same crop, which offers a useful reference for the fine identification of similar crops. On this basis, the area of maize grown for seed can be quantified and mapped, and the growth, quality and yield of maize seed can be estimated by combining crop nutrient, disease and yield estimation models. However, the method still requires a large amount of sample data, so how to apply and improve it effectively in areas lacking samples requires further research.