Crop Identification Based on Multi-Temporal Active and Passive Remote Sensing Images

Abstract: Although vegetation index time series from optical images are widely used for crop mapping, it remains difficult to obtain sufficient time-series data in some areas because of satellite revisit time and weather. To address this situation, this paper took Wen County, Henan Province, Central China as the research area and fused multi-source features such as backscatter coefficient, vegetation index, and time series based on Sentinel-1 and -2 data to identify crops. Through comparative experiments, this paper studied the feasibility of identifying crops with multi-temporal data and fused data. The results showed that the accuracy of multi-temporal Sentinel-2 data increased by 9.2% compared with single-temporal Sentinel-2 data, and the accuracy of multi-temporal fusion data improved by 17.1% and 2.9%, respectively, compared with multi-temporal Sentinel-1 and Sentinel-2 data. Multi-temporal data characterize the phenological stages of crop growth well, thereby improving the classification accuracy. The fusion of Sentinel-1 synthetic aperture radar data and Sentinel-2 optical data provides sufficient time-series data for crop identification. This research can provide a reference for crop recognition in precision agriculture.


Introduction
As the world's population continues to grow and the COVID-19 pandemic brings uncertainties, food security has gained increasing attention [1][2][3]. At the same time, driven by the digital revolution in agriculture, two major changes have taken place in world agriculture: the development of precision agriculture, and the development of the agricultural digital economy [4][5][6]. Integrating GIS and remote sensing to obtain and analyze crop planting information is the key to realizing intelligent agriculture [7]. Remote sensing, with its rapidity, accuracy, and numerous other advantages, is now widely used to extract and classify crops, which is vital for intelligent agriculture [8].
Multi-temporal passive remote sensing plays an important role in agriculture [9]. Since the growth of crops follows seasonal rhythms and regular phenological variations, a time series of multi-temporal remote-sensing data can characterize crops as a function of time [10][11][12]. Sonobe et al. (2017) used data from the Landsat 8 Operational Land Imager to classify crops in Hokkaido, Japan and achieved an overall accuracy of 94.5% [13]. Yi et al. (2019) extracted various features from multi-temporal Sentinel-2 data for crop classification and obtained an overall accuracy of 94% [14].
The above Landsat 8 and Sentinel-2 optical data were obtained by passive remote sensing platforms [15]. Sentinel-2 is a multispectral high-resolution imaging mission that carries a Multispectral Instrument (MSI) [16]. It comprises two satellites, 2A and 2B, each of which observes a given location every 10 days under constant-observation conditions [16]. Together, the two satellites achieve a temporal resolution of 5 days. The MSI has 13 bands that cover from 442 to 2202 nm, and the highest spatial resolution is 10 m [16]. With its three bands in the red edge, Sentinel-2 can provide rich information for crop detection and thereby greatly improve the estimation accuracy for chlorophyll content, the fractional cover of forest canopies, and leaf area index [17]. In addition, Sentinel-2 data have high temporal resolution, so their use can provide rich temporal information for short-term crop identification over large areas [18]. Although vegetation index (VI) features derived from optical images provide an efficient means of crop identification, optical images are easily affected by cloud cover, so complete time series are difficult to obtain [19].
Since microwaves can penetrate clouds, active remote sensing can be done independently of weather conditions, which greatly facilitates the acquisition of continuous time-series images [20]. Sentinel-1 is a radar satellite launched by the European Space Agency with high spatial resolution and a short revisit period. It has four working modes: wave (WV), interferometric wide swath (IW), extra-wide swath (EW), and stripmap (SM) [21], and it provides multi-temporal, high-resolution (down to 5 m) C-band synthetic aperture radar (SAR) imaging data in single polarization (HH or VV) or dual polarization (HH + HV or VV + VH). Researchers have already applied Sentinel-1 data to crop identification [22][23][24][25]. Teja et al. used Sentinel-1 temporal SAR data to estimate Kharif rice planting areas and obtained an overall accuracy of 91% [25]. To better identify crops, many scholars have used both Sentinel-1 and -2 data [26][27][28]. However, few studies have focused on feature-level fusion, which is helpful for dealing with information redundancy [29].
The random forest (RF) classifier can classify fused remote sensing data [30]. RF is an ensemble learning method used to solve classification and regression problems [31]. Ensemble learning is a machine learning scheme that improves prediction accuracy by integrating multiple models to solve a given problem [32]. Multiple classifiers participating in ensemble classification produce more accurate results than a single classifier [32]. In addition, RF can extract multisource features through multiple decision trees, thus making the most of the features used in fused data. In particular, for time-series features, RF uses the statistical characteristics of time-series data to extract and analyze the time-series features of different samples [33]. Finally, due to the simple and efficient decision-making method of majority voting, RF classifiers can quickly classify large quantities of remote-sensing data [34].
This paper focuses on the autumn crops in Wen County, Jiaozuo City. Using multi-temporal Sentinel-1 and -2 data, this study fused the backscattering coefficient of Sentinel-1 and the normalized difference vegetation index (NDVI) of Sentinel-2 at the feature level, then used a GIS platform to create samples, and finally used RF for classification. The classification results of fused data, single-temporal Sentinel-2 data, multi-temporal Sentinel-1 data, and multi-temporal Sentinel-2 data were compared to elucidate the advantages of multi-temporal data and fused data for crop identification.

Study Area
Wen County, Jiaozuo City is situated in northwest Henan Province, China (112°57′39″-113°02′43″ E, 34°50′15″-34°57′37″ N) and has a warm temperate continental monsoon climate. The average temperature is between 14 and 15 °C, and the annual rainfall is between 550 and 700 mm. Most of Wen County is plain, and the average elevation (relative to mean sea level) is between 102.3 and 116.1 m. Repeated historical flooding of the Yellow River and the Qin River formed the distinctive landforms of Wen County: the south beach, the north depression, and the middle hill. The inland rivers belong to the Yellow River system. The Yellow River, the Qin River, and the drainage river system flow through the area, providing sufficient water resources and convenient irrigation. The main soil types in this area are yellow fluvo-aquic soil and cinnamon fluvo-aquic soil, and long-term farming gives this area significant production potential. This paper focuses on the autumn crops in Wen County, chiefly corn, peanut, and Chinese yam, which account for about 96% of the total crops. Other non-major crops, mainly sweet potatoes, oil crops, vegetables, and fruits, fall outside the scope of this paper. Figure 1 shows the geography of Wen County.

Sentinel-1, -2 Data
For this research, ground range detected (GRD) Sentinel-1 C-band (5.405 GHz) level-1 images in IW mode were downloaded from the website of ASF Data Search. IW mode supports the merging of wider strip widths (250 km), and the downloaded images have medium resolution (10 m). GRD products include two polarization modes, VV and VH. The range and azimuth resolutions are 20 and 22 m, respectively, and the pixel spacing is 10 m. Table 1 shows the data from Sentinel-1 imagery products. This study uses multi-temporal Sentinel-2 data for crop identification. Table 2 lists the dates of Sentinel-2 image acquisition. These Level-2A data cover different plant growth periods from June 2020 to October 2020, and the data were obtained from the PIE (Pixel Information Expert) Engine platform.

Field Data
To understand the local crop-planting situation in Wen County and obtain the training set and testing set labels for crop classification, this study conducted field visits at three different sites in the study area in September 2020. This study consulted the local agricultural production departments and farmers in detail about the planting phenology of autumn crops. The statistical results are given in Table 3, and Figure 2 shows representative crop images.
In the field visits, in addition to recording the crop attributes for each parcel, the center latitude and longitude coordinates of each parcel were recorded with a handheld differential GPS positioning tool using the WGS1984-UTM coordinate system (zone 49N), with less than 2 m positioning error. Using the ArcGIS 10.6 (Esri, Redlands, CA, USA) software platform, the GPS positions and attribute data were merged, converted into ESRI shapefile format, and matched to Google's high-resolution remote sensing images. This study used visual interpretation to label polygons on the Google imagery, with each shapefile point as the center. Given the ground sample distance of the Sentinel-1 and -2 data, each polygon was kept at least 10 m from the parcel boundary to avoid mixed pixels at the boundary. Some of the labels are shown in Figure 3. "Others" includes non-major crops, established areas, water, roads, and trees.
Table 4 lists the ratio of labeled area to corresponding crop area. The crop area data came from the 2020 Jiaozuo City Statistical Yearbook.

Methods
The research method used herein focused mainly on the fusion of Sentinel-1 and -2 data. This paper not only studied the role of multi-temporal data in crop identification but also examined the advantages of fusing active and passive remote-sensing images for crop classification. Figure 4 shows a flowchart of the process.

Time-Series Datasets
Sentinel-1 data were preprocessed using the open-source remote-sensing processing software SeNtinel Applications Platform (SNAP) of the European Space Agency and ENVI5.3 (Esri, Redlands, CA, USA). The main processes included removal of border noise and thermal noise, speckle filtering, radiometric calibration, terrain correction, conversion to decibels, and image clipping [35]. Preprocessing yielded time-series datasets of the backscattering coefficient in VV and VH polarization. This study normalized the features in both datasets: the pixel values of each image were transformed to a common scale while the inherent similarities and variations were preserved. Feature normalization can improve the performance of machine learning algorithms. Min-max normalization was used, with the 2% and 98% values (percentiles) taken as the minimum and maximum, respectively, to reduce sensitivity to outliers.
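The normalization step can be illustrated with a minimal sketch. It assumes that the 2% and 98% bounds are percentiles of the pixel values, that percentiles are linearly interpolated, and that values outside the bounds are clipped to [0, 1]; the function names `percentile` and `minmax_normalize` are hypothetical and do not come from SNAP, ENVI, or PIE Engine.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("empty input")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def minmax_normalize(values, low_pct=2.0, high_pct=98.0):
    """Scale values to [0, 1] between two percentile bounds (assumed 2% and 98%).

    Values below the lower bound map to 0 and values above the upper bound
    map to 1, which limits the influence of outliers.
    """
    ordered = sorted(values)
    lo = percentile(ordered, low_pct)
    hi = percentile(ordered, high_pct)
    if hi == lo:
        # Constant image: no spread to normalize (assumption: return zeros).
        return [0.0 for _ in values]
    return [min(1.0, max(0.0, (v - lo) / (hi - lo))) for v in values]


# Toy backscatter-like values 0..100; the bounds become 2 and 98.
normalized = minmax_normalize(list(range(101)))
```

In this toy case the midpoint value 50 maps to 0.5, while the extreme values 0 and 100 are clipped to 0 and 1.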
Sentinel-2 data were preprocessed using PIE Engine, the optical features were extracted, and a normalized difference vegetation index (NDVI) time-series dataset was constructed. Extracting the NDVI time-series features of Sentinel-2 is the key to studying crop mapping [14]. The NDVI correlates strongly with leaf area index and plant chlorophyll, and it is an important tool to study vegetation growth status and vegetation coverage and to eliminate radiation errors [36]. Its time-series curve reflects the growth cycle of crops, including sowing, germination, heading, maturity, and harvest [37][38][39]. NDVI is calculated as the normalized difference of the near-infrared and red reflectances, as given by Equation (1):
$$\mathrm{NDVI} = \frac{\rho_{NIR} - \rho_{R}}{\rho_{NIR} + \rho_{R}} \tag{1}$$
where $\rho_{NIR}$ is the reflectance in the near-infrared band and $\rho_{R}$ is the reflectance in the red band.
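As a small illustration of Equation (1), the following sketch computes NDVI for one pixel and for a short time series. The function names, the handling of a zero denominator, and the reflectance values are assumptions made for the example; the study itself performed this band math in PIE Engine.

```python
def ndvi(nir, red):
    """NDVI from near-infrared and red reflectance, as in Equation (1)."""
    total = nir + red
    if total == 0:
        # Assumption: return 0 where both reflectances are zero (e.g. no data).
        return 0.0
    return (nir - red) / total


def ndvi_time_series(nir_series, red_series):
    """NDVI for each acquisition date of one pixel."""
    if len(nir_series) != len(red_series):
        raise ValueError("NIR and red series must have the same length")
    return [ndvi(n, r) for n, r in zip(nir_series, red_series)]


# Hypothetical reflectances for one pixel on three dates.
series = ndvi_time_series([0.30, 0.45, 0.50], [0.20, 0.10, 0.05])
```

Dense green vegetation reflects strongly in the near-infrared and weakly in the red, so NDVI rises toward 1 as the canopy develops and falls again around harvest.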
To extract the study area, the vector boundaries of Wen County were read as feature classes in the PIE Engine and overlaid onto Sentinel-2 images. Then, this study performed operations such as cloud removal, NDVI band math, filtering, and batch exporting. In particular, the quality assessment band was used to detect clouds and cloud shadows for simple and efficient cloud detection and removal. To facilitate the calculations involved in machine learning, feature normalization was performed on the acquired NDVI time-series dataset.

Fusion of Active and Passive Remote Sensing
The fusion of different sensor data generally requires georeferencing and image fusion [40]. In this study, both active and passive remote sensing time-series datasets were projected to the WGS1984-UTM coordinate system (zone 49N), so georeferencing is not required before fusion.
Most studies that fuse optical and radar images adopt an early fusion strategy, in which the optical image time series and the radar image time series are stacked together in the form of a data cube [41][42][43][44]. This fusion method does not extract the vegetation index or the backscattering coefficient, so it is simple and easy to implement. However, the fused data contain too much information, most of which is not important for crop identification, so information redundancy is a problem [29]. Therefore, this study adopted a feature-level fusion method: the backscattering coefficient feature was selected from Sentinel-1 and the VI feature from Sentinel-2, and the fused data were obtained by stacking the corresponding time-series datasets. In more detail, the fused data contain 9 NDVI features, 10 VV polarization backscattering coefficients, and 10 VH polarization backscattering coefficients, for a total of 29 features per pixel.
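The stacking described above amounts to concatenating the three time series of each pixel into one feature vector. The sketch below assumes that every pixel is represented by plain Python lists that are already normalized and co-registered; the function names and the feature ordering (NDVI, then VV, then VH) are illustrative choices, not taken from the study.

```python
def fuse_pixel(ndvi_series, vv_series, vh_series):
    """Concatenate the time series of one pixel into a single feature vector."""
    return list(ndvi_series) + list(vv_series) + list(vh_series)


def fuse_dataset(ndvi_pixels, vv_pixels, vh_pixels):
    """Feature-level fusion for many pixels (one inner list per pixel)."""
    if not (len(ndvi_pixels) == len(vv_pixels) == len(vh_pixels)):
        raise ValueError("all inputs must describe the same pixels")
    return [fuse_pixel(n, vv, vh)
            for n, vv, vh in zip(ndvi_pixels, vv_pixels, vh_pixels)]


# One hypothetical pixel: 9 NDVI dates, 10 VV dates, 10 VH dates.
features = fuse_pixel([0.5] * 9, [0.3] * 10, [0.2] * 10)
n_features = len(features)  # 9 + 10 + 10 = 29
```

Because only one derived feature per date is kept for each sensor, the fused vector stays far shorter than a data cube that stacks every raw band.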

Random Forest Classifier
This study used a semantic segmentation algorithm based on the RF classifier to classify the fused Sentinel-1 and -2 data. Unlike instance segmentation, which treats each object as a distinct entity regardless of its category, semantic segmentation treats all objects of the same category as belonging to one entity [45,46]. Because of the complexity of remote-sensing image applications and the similar structure of crops, this study adopted a semantic segmentation approach to classification. RF is a supervised machine learning algorithm that meets the needs of semantic segmentation [47]; it consists of multiple decision trees, and the category it outputs for each pixel is the one that receives the most votes from the set of trees. RF is faster than other classifiers, easier to parameterize, and robust [48], which supports the analysis of the various features used herein. Figure 5 shows the structure of RF. The RF classifier involved four main steps: (1) From a sample set of size N, N samples were drawn one at a time with replacement, and these samples were used at the root node of a decision tree to train that tree; (2) Each sample has M features. When a node of the decision tree needed to be split, m << M features were selected at random from these M features, and the feature with the best classification ability among these m features was selected as the splitting feature of the node; (3) To form the decision tree, each node was split as per step 2 until the feature selected by the child node was the feature used when the parent node was split; that is, the child node was a leaf node. At this point, the splitting stopped. Note that each tree grew to the maximum extent, and no pruning was done during the formation of the decision tree; and (4) Steps 1-3 were repeated to build k decision trees, which together form an RF. Assuming that the set of categories was $\{c_1, c_2, \ldots, c_N\}$, the prediction output of decision tree $h_i$ for sample $x$ was expressed as an N-dimensional vector $(h_i^1(x), h_i^2(x), \ldots, h_i^N(x))^T$, where $h_i^j(x)$ represented the output of $h_i$ for category $c_j$, and the decision was made by majority voting (Equation (2)):
$$H(x) = \begin{cases} c_j, & \text{if } \sum_{i=1}^{k} h_i^j(x) > \frac{1}{2} \sum_{l=1}^{N} \sum_{i=1}^{k} h_i^l(x), \\ \text{reject}, & \text{otherwise.} \end{cases} \tag{2}$$
That is, if a category received more than half of the votes, that category was predicted; otherwise, the prediction was rejected.
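The majority-voting rule with rejection can be sketched in a few lines. This is an illustration of the rule as stated in the text, under the assumption that each tree casts exactly one vote per pixel; the function name `majority_vote` and the use of `None` to mark a rejected prediction are choices made for the example.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the category with more than half of the votes, else None.

    tree_predictions holds one predicted category per decision tree.
    None marks a rejected prediction (no category exceeds half the votes).
    """
    if not tree_predictions:
        return None
    label, count = Counter(tree_predictions).most_common(1)[0]
    return label if count > len(tree_predictions) / 2 else None


# Three hypothetical trees voting on one pixel.
winner = majority_vote(["maize", "maize", "peanut"])  # "maize" has 2 of 3 votes
```

With four trees voting "maize", "maize", "peanut", and "yam", no category exceeds half of the votes, so the function returns `None`.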
The RF classifier had two parameters to set: the number k of decision trees in the classifier and the number m of features to consider when finding the best split [49]. The more decision trees in the RF classifier, the better the classification, but the longer the calculation time; the smaller the number of features, the smaller the variance, although the bias increases. Therefore, classification accuracy and time efficiency were considered together. This study used 300 decision trees, and the square root of the total number of features was taken as the number of features considered at each split.
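To make the two parameters concrete, the sketch below draws a bootstrap sample (step 1) and a random feature subset for one split (step 2). It assumes that the square root is rounded down to an integer and that the fused data have 29 features per pixel; the function names and the training-set size of 1000 are hypothetical, and the code illustrates the sampling only rather than a full RF implementation.

```python
import math
import random

N_TREES = 300      # k, number of decision trees used in the study
N_FEATURES = 29    # M, assumed: 9 NDVI + 10 VV + 10 VH features


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement (step 1)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def split_candidates(n_features, rng):
    """Pick m = floor(sqrt(M)) distinct feature indices for one split (step 2)."""
    m = max(1, math.isqrt(n_features))  # rounding down is an assumption
    return sorted(rng.sample(range(n_features), m))


rng = random.Random(0)
# One bootstrap sample per tree, here for a hypothetical 1000-pixel training set.
samples_per_tree = [bootstrap_indices(1000, rng) for _ in range(N_TREES)]
candidates = split_candidates(N_FEATURES, rng)  # 5 feature indices
```

With 29 features, each split therefore examines 5 randomly chosen features, which keeps individual trees decorrelated while limiting the cost of each split.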

Training and Prediction
For classification training and testing, the sample labels were overlaid on the fused images to obtain the samples. To avoid the influence of spatial autocorrelation [50], samples from two sites in the study area were used as the training set, and the other samples were used as the testing set. Table 5 lists the number of pixels and parcels for the training set and the testing set. The training set data were input into the parameterized RF classifier for training, and then the trained RF model was used to make predictions from the testing-set images. To evaluate the advantages of fused data and multi-temporal data, the time series of backscatter coefficients of Sentinel-1, the NDVI time series of Sentinel-2, and the NDVI images of single-temporal Sentinel-2 (on this day, all categories are easily distinguished) were also used to train the classifier and to make predictions with the same parameters.

Accuracy
Four schemes were used to evaluate the accuracy of the predictions. The most commonly used technique to assess the accuracy of crop classification involves the confusion matrix [51]. The confusion matrix compares the predicted images of the testing set with the labels of the testing set and yields the overall accuracy (OA, %) and the Kappa coefficient (K) (Equations (3) and (4)), which give, respectively, the probability that a pixel is correctly classified and a measure of the consistency between the classification result and the actual result [48]. The OA was given by
$$\mathrm{OA} = \frac{\sum_{i=1}^{n} p_{i,i}}{\sum_{i=1}^{n} \sum_{j=1}^{n} p_{i,j}} \times 100\% \tag{3}$$
where $p_{i,j}$ is the total number of pixels belonging to category i and assigned to category j and n is the number of categories. The Kappa coefficient was given by
$$K = \frac{\mathrm{OA} - p_e}{1 - p_e}, \qquad p_e = \frac{a_1 b_1 + a_2 b_2 + \cdots + a_n b_n}{N^2} \tag{4}$$
where OA is expressed as a fraction, $p_e$ is the agreement expected by chance, N is the total number of pixels, $a_1, a_2, \ldots, a_n$ are the numbers of real pixels in each type, and $b_1, b_2, \ldots, b_n$ are the numbers of pixels predicted for each type [52]. In addition, to fully verify the classification results obtained, this study compared the classification results of the fused data with the 2020 Jiaozuo City Statistical Yearbook data.
Figure 6 shows the time-varying curves of the VV and VH polarization backscatter coefficients for different categories after feature normalization. The pixel value of the backscatter coefficient is the average value over the samples of each class.
Figure 7 shows the NDVI pixel values after feature normalization as a function of time for different categories. The NDVI pixel value is the average of the NDVI pixel values for all samples in that category. The results showed that the NDVI value of crops changed with time, which was consistent with the growth process of crops [11]. On 5 August 2020, the categories were easy to distinguish, so this study selected the Sentinel-2 data from this day as the single-temporal data.
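The overall accuracy and Kappa coefficient used in this study can be computed from a confusion matrix as in the following sketch. It assumes that rows hold the real categories and columns hold the predicted categories, and it uses the standard Cohen's Kappa definition; the toy matrix is invented for the example and is not taken from Table 7.

```python
def overall_accuracy(matrix):
    """Fraction of pixels on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa(matrix):
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    p_o = overall_accuracy(matrix)
    real = [sum(matrix[i]) for i in range(n)]                        # a_i
    predicted = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # b_j
    p_e = sum(a * b for a, b in zip(real, predicted)) / (total * total)
    # Undefined when p_e == 1 (all pixels real and predicted in one category).
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical two-class matrix: rows are real classes, columns are predictions.
toy = [[40, 10],
       [5, 45]]
oa = overall_accuracy(toy)   # 85 of 100 pixels are on the diagonal
k = kappa(toy)
```

For this toy matrix the chance agreement is 0.5, so an overall accuracy of 0.85 corresponds to a Kappa of 0.7, which shows how Kappa discounts the agreement that would occur by chance.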

Accuracy
The OAs and Kappa coefficients are given in Table 6. Compared with the 5 August 2020 Sentinel-2 data, the OA of the multi-temporal Sentinel-2 data increased by 6.3%, and the Kappa coefficient increased by 0.047. The fused multi-temporal Sentinel-1 and -2 data achieved the highest OA of 90.5% compared with multi-temporal Sentinel-2 (87.6%), 5 August 2020 Sentinel-2 (81.3%), and multi-temporal Sentinel-1 (73.4%). For further comparison, the confusion matrix of the fused data is shown in Table 7. Maize achieved the highest producer's accuracy at 91.5%, followed by other land cover (89.6%), yams (88.6%), and peanuts (85.2%). In terms of user's accuracy, maize obtained the highest accuracy (96.4%), followed by other land cover (81.4%), peanuts (79.5%), and yams (75.1%).
Table 7. Confusion matrix of fused multi-temporal Sentinel-1 and -2 data. The value displayed for each class is the number of pixels.
Figure 8 visually compares the different predictions with the testing set labels. On the one hand, multi-temporal Sentinel-2 data produced less image noise than single-temporal Sentinel-2 data and greater internal uniformity of the parcels. On the other hand, the fusion of multi-temporal Sentinel-1 and -2 data reduced the image noise of multi-temporal Sentinel-2 data and improved the prediction accuracy of multi-temporal Sentinel-1 data. With multi-temporal Sentinel-1 data, the edges of autumn crop parcels were the main areas of poor prediction, with minor linear objects, such as paths and streams, being incorrectly predicted as autumn crops.
Figure 9 shows the spatial distribution of autumn crops in the study area, as determined by the fusion of multi-temporal Sentinel-1 and -2 data. From a spatial point of view, peanut, maize, and Chinese yam are mainly distributed in areas outside the southeastern part of Wen County. In terms of area, maize covers the largest fraction of crop area (74.95%), followed by Chinese yam (15.27%) and peanut (9.48%).
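The producer's and user's accuracies reported with Table 7 follow directly from the rows and columns of a confusion matrix. The sketch below assumes that rows hold the real categories and columns hold the predicted categories; the toy matrix is invented for illustration and does not reproduce Table 7.

```python
def producers_accuracy(matrix, k):
    """Correctly classified pixels of class k divided by its real pixels (row sum)."""
    row_total = sum(matrix[k])
    return matrix[k][k] / row_total if row_total else float("nan")


def users_accuracy(matrix, k):
    """Correctly classified pixels of class k divided by pixels predicted as k."""
    col_total = sum(row[k] for row in matrix)
    return matrix[k][k] / col_total if col_total else float("nan")


# Hypothetical two-class matrix: rows are real classes, columns are predictions.
toy = [[40, 10],
       [5, 45]]
pa_class0 = producers_accuracy(toy, 0)  # 40 of 50 real pixels were found
ua_class0 = users_accuracy(toy, 0)      # 40 of 45 predicted pixels are right
```

Producer's accuracy reflects omission errors (real pixels of a class that were missed), whereas user's accuracy reflects commission errors (pixels wrongly assigned to the class), which is why the two rankings of crops can differ.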

Comparison with Government Data
As shown in Table 8, the areas of peanut, maize, and Chinese yam reported by the government are all within the range predicted herein based on the fused data. Therefore, the classification results obtained herein correlate significantly with the 2020 Jiaozuo Statistical Yearbook data.

Discussion of Results
This study focuses on crop classification based on multi-temporal remote sensing data. The results showed that the use of the multi-temporal Sentinel-2 data significantly improved the classification accuracy. This result is attributed to the fact that crops have certain similarities during growth due to their physicochemical properties such as moisture and chlorophyll content and thereby have highly similar spectral characteristics [53]. This means that different crops may have the same spectral characteristics. The use of multitemporal remote sensing imagery is thus crucial to clarify these ambiguities.
The time series of multi-temporal remote-sensing data characterized crops on a temporal basis, which provided a broader basis for identifying crop types, as demonstrated by the following four points: (1) The temporal dimension provided by the time-series data resolved the problem of different crops having the same spectral characteristics, or of different spectral characteristics corresponding to the same crop, and thereby allowed the accurate, time-resolved determination of crop characteristics [54]. (2) Crop identification based on time-series data is not limited by seasons or crop phenology. By reconstructing or decomposing time-series curves, the effects of crop phenological changes in different periods can be eliminated and more general growth characteristics of crops can be explored, thereby eliminating the spurious variations caused by seasonal factors [55]. (3) The variations detected based on time-series data reflected multi-year changes in crops [56], which makes it possible to analyze how crops evolve over time. (4) The high temporal resolution of the time-series data allowed the accurate extraction of the temporal variations in crops.
The results show that the Sentinel-2 data lead to much greater accuracy than the Sentinel-1 data [57,58]. The NDVI of Sentinel-2 is an important index for crop area inversion and is widely used to extract crops and other vegetation [14,36]. With Sentinel-1 data, crops can be extracted only according to the change in backscattering coefficient, which makes it difficult to detect and distinguish crops.
The fusion of Sentinel-1 and -2 data is the key to this study. The results show that, compared with other data, the fused Sentinel-1 and -2 data provided the highest classification accuracy. This study used a feature-level fusion method to construct a multi-source remote-sensing fusion model that integrates multi-source features such as spectrum, time series, and backscatter coefficient. On the one hand, the combination of Sentinel-1 and -2 data provided more features for identifying crops. On the other hand, SAR data has strong anti-interference ability and is not affected by cloud cover. It can be acquired during the day or night under different weather conditions. Using Sentinel-1 data to supplement Sentinel-2 data solves the problem of cloud cover and insufficient time series data.
Some identification errors occurred in this study. A comparison of the classification details revealed that the pixels that should be classified as peanuts are instead classified as Chinese yams, and vice versa. Considering the growth phenology and vegetation indexes of the two crops, it can be found that their growth states are similar, which explains why it is difficult to distinguish Chinese yam from peanuts using remote sensing. However, the classification of maize is better because the phenology and VIs of maize differ from those of other crops. In addition, maize covers a large area so the samples are abundant, which is conducive to training an RF classifier.
The crop classification map and government data show the distribution of crops in Wen County. Maize is most widely distributed, much more so than Chinese yam or peanuts, because maize is the main food crop and has strong market demand, whereas peanuts and Chinese yam are local-specialty cash crops with small market demand. Wen is a typical agricultural county in China, so the good classification obtained in this work means that the method proposed herein can be transferred to other regions of China.
Future research should consider the following three aspects: (1) In feature selection, the multi-source remote sensing fusion model should be studied by integrating the spectral band, spectral index, texture feature, spectral index variation, and their combinations. (2) A variety of machine learning algorithms can be tested in comparative research. (3) Finally, to improve classification accuracy, multi-source remote-sensing fused data can be further enriched by, for example, airborne hyperspectral data, point cloud data, elevation data, and high-resolution data.

Conclusions
Given the population growth of recent decades, the rational use of land resources for crop planting has gained importance, and remote sensing provides an effective method to monitor crops. Radar satellites can be used to monitor the Earth's surface on cloudy and rainy days, and optical data can be used to obtain vegetation indices to monitor crops. Therefore, compared with traditional remote-sensing data, the combination of radar and optical data can improve the accuracy with which crops are classified. The multiple sensors of the Copernicus program make such data fusion possible for improving crop identification. In this study, multi-source features such as backscatter coefficient, VIs, and time series were extracted from Sentinel-1 and Sentinel-2 data, and these features were fused at the feature level. By using an RF classifier, this study obtained a Kappa coefficient of 0.881 and an overall accuracy of 90.5%, which shows that the fusion of multi-temporal active and passive remote sensing data can improve the accuracy of crop classification.