1. Introduction
Canola (Brassica napus L.) is one of the major oilseed crops worldwide. It is the primary source of edible oil for human consumption and a biological feedstock for fuels [1]. Canola flowers are also an important tourism resource, promoting the local development of tourism agriculture [2,3]. However, the production and consumption of canola are unbalanced across world regions; for example, China's domestic canola production falls far short of its increasing demand [2]. Mapping and tracking canola planting areas is therefore of great importance for agricultural management and food security.
Remote-sensing technology has made significant progress in efficiently mapping crops because of its low cost, comprehensive coverage, and regular acquisition [4,5,6]. Crop mapping methods that use remote-sensing technology vary in complexity, from clustering [7], decision trees [8], and object-oriented methods [9] to machine-learning methods [10]. Most previous studies of crop identification have relied mainly on the availability of well-represented training samples [11,12]. Providing such training data is usually time-consuming, costly, and labor-intensive. Moreover, methods tied to specific training data may show low repeatability across years or regions [13]. Consequently, improving the automation level and robustness of crop mapping is an urgent need.
Enhanced imagery of crop characteristics gleaned from phenological information helps detect and map crops automatically [14,15]. These characteristics are distinctive and stable because most crops show a unique phenological pattern, even when they share a similar growing season [16,17]. Hence, once an automated algorithm derived from these characteristics is developed and published, it can provide training rules that can be applied directly and repeated year after year without retraining [13,18]. Several previous studies have successfully used phenology-derived characteristics for automatic crop mapping [13,19].
In the past, the mapping of canola received little attention [3]. In recent years, with the continuous rise in the economic value of canola, several studies have begun to address its remote sensing. For example, d'Andrimont et al. [20] and Mercier et al. [21] mapped canola flowering phenology in parts of Germany and France, respectively. Tao et al. [22] mapped the spatiotemporal dynamics of canola on the Jianghan Plain and the Dongting Lake Plain in China. Han et al. [23] published a canola mapping data product covering 33 countries from 2017 to 2019 based on Sentinel satellite images; however, that product did not include China.
One of the challenges of identifying canola by remote sensing is that winter wheat and canola have similar growth cycles and spectral reflectance characteristics throughout their growing seasons [14]. In China, canola fields and winter wheat fields are usually staggered and intermixed, which further increases the difficulty of identifying canola from remote-sensing images. Fortunately, the phenological characteristics of canola flowers provide an opportunity to distinguish canola from winter wheat, even though the flowering period is transitory, lasting only approximately one month. Several studies have attempted to map canola by detecting its bright yellow flowers in remote-sensing images [24]. For instance, Fang et al. [25] proposed a simple model for estimating canola flowers during the flowering season. Sulik and Long [3] proposed a green/blue band ratio to identify canola flowers and later [1] proposed a normalized difference yellowness index (NDYI) to estimate canola yields. Ashourloo et al. [11] proposed a canola index (CI) for the flowering period, computed as the near-infrared (NIR) band multiplied by the sum of the red and green bands. These studies have gradually enriched and improved the available canola indices.
Coarse imagery, such as that of MODIS and Landsat, can barely resolve the actual boundaries of farmland, whereas high-resolution sensors such as Sentinel-2 provide sufficient spatial resolution for canola mapping. Image data volume is closely related to spatial resolution [10]; therefore, a flood of data is encountered when analyzing Sentinel-2 images. The cloud-computing platform Google Earth Engine (GEE), however, provides a solution for processing massive volumes of remote-sensing data [26]. GEE stores the entire data products of major international remote-sensing satellites, such as Sentinel, and has good data-management practices [27].
In this study, the objectives were to (1) characterize the spectral reflectance of canola at a canopy scale during the canola flowering stage, (2) build a novel canola flower index (CFI) for automatic canola mapping, and (3) achieve automatic identification of canola on the GEE.
2. Materials and Methods
2.1. Study Area
For this study, three study areas were selected, as shown in Figure 1. The first was in Wuxue County, in the east of Hubei Province, China; the second was in Hanzhong County, Shaanxi Province, China; and the third was in Hanshou County and its surrounding areas, Hunan Province, China. The first study area was used to investigate the spectral reflectance of various objects and to construct the CFI; the other two were used to validate the effectiveness of the CFI. On remote-sensing imagery, canola fields and winter wheat fields show a mosaic or crisscross distribution. The canola fields are highly fragmented, with areas generally less than 1 hectare. In addition, canola and winter wheat share a similar growth cycle from October to May [28]. These factors increase the challenge of remotely sensing canola in the study areas.
The phenological calendars of canola and winter wheat in the study areas were investigated, as shown in Figure 2. Canola and wheat are usually sown in October. Canola generally enters the flowering stage in March, when winter wheat enters the stem-elongation stage. In May, both enter the mature stage.
2.2. Sentinel-2 Imagery
The Sentinel-2 satellite was launched by the European Commission and the European Space Agency [29]. Sentinel-2 images cover 13 wavebands [30,31]. The red, green, blue, and NIR wavebands have a spatial resolution of 10 m; the four red-edge wavebands and two shortwave-infrared wavebands have a spatial resolution of 20 m; and the remaining three wavebands have a spatial resolution of 60 m. The revisit period is 10 d. In this study, the red, green, blue, and NIR wavebands were used because of their high spatial resolution. In addition, according to previous research [17,32], the 20 m red-edge wavebands contribute little to improving the identification accuracy of canola.
In accordance with the principle of no cloud coverage, we selected six-phase Sentinel-2 images covering the first study area to investigate the spectral reflectance of various types of ground objects, such as canola, winter wheat, forest, bare land, and construction land. The imaging dates of the six-phase Sentinel-2 images are shown in Table 1. These dates covered the main growth stages of canola (seedling, wintering, budding, flowering, silique, and mature), as shown in Figure 2. For the second and third study areas, Sentinel-2 images were selected during the canola flowering stage because these areas were used to validate the effectiveness of the CFI. The imaging dates are also shown in Table 1.
All Sentinel-2 images used in the study came from the imagery collection "COPERNICUS/S2_SR" on the GEE platform. These images are surface reflectance data that have been atmospherically corrected [33,34].
2.3. Confirming the Optimum Period
During canola’s flowering period, canola fields and winter wheat fields have a significant visual difference; canola fields are yellow, and winter wheat fields are green. To confirm the specific optimum period in this study, the Fisher function of the spectrum between canola and other ground objects in various phases was computed. The Fisher function is [
11]:

F = (m₁ − m₂)² / (v₁ + v₂)

where m and v are the mean and variance of the spectral reflectance, respectively, and subscripts 1 and 2 represent two different categories.
The Fisher value describes the separability between classes: the greater the Fisher value, the greater the separability between categories. Using 3256 sets of pixel samples for each spectral band (i.e., the red, green, blue, and NIR wavebands), the Fisher values between canola and wheat, forest, bare land, and construction land were computed for six phases: 11 November 2019, 6 December 2019, 9 February 2020, 20 March 2020, 29 April 2020, and 19 May 2020. The phase with the maximum Fisher value was taken as the optimum period for canola mapping.
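As an illustration, the per-band Fisher value can be computed as below. This is a minimal Python sketch with synthetic reflectance samples; the function name and the sample means and variances are our own illustrative choices, not values from the study.

```python
import numpy as np

def fisher_value(class_a, class_b):
    """Fisher separability of two reflectance sample sets:
    F = (m1 - m2)**2 / (v1 + v2), with m the mean and v the variance."""
    m1, m2 = np.mean(class_a), np.mean(class_b)
    v1, v2 = np.var(class_a), np.var(class_b)
    return (m1 - m2) ** 2 / (v1 + v2)

# Synthetic green-band reflectance for two classes (illustrative values only).
rng = np.random.default_rng(0)
canola = rng.normal(0.18, 0.02, 3256)  # flowering canola is bright in green
wheat = rng.normal(0.08, 0.02, 3256)   # a wheat canopy is darker in green
print(fisher_value(canola, wheat))     # a large value means high separability
```

The phase whose imagery yields the largest such value across the bands is taken as the optimum mapping period.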
2.4. Building the Canola Flower Index
The spectral reflectance of various ground objects during the canola flowering period was plotted from Sentinel-2 images, as shown in Figure 3. The spectral data were obtained using ENVI software based on the Sentinel-2 images. Five ground objects were considered: canola, wheat, forest, construction land, and bare land. The imagery layers were the red, green, blue, and NIR bands and the normalized difference vegetation index (NDVI). The NDVI has the potential to distinguish canola from non-vegetation objects [35,36]. In Figure 3, the NDVI values of construction land and bare land are less than 0.2, whereas that of canola is more than 0.5; the NDVI value range of canola does not overlap with those of the other objects. The differences between canola and construction land and between canola and bare land are significant in the NDVI band. Therefore, the NDVI was taken as a component of the CFI.
The differences in spectral reflectance between canola and wheat and between canola and forest were discernible in the green and red bands, as shown in Figure 3. Therefore, two features could be constructed to amplify these differences. First, the sum of the red and green band reflectance for canola was greater than that for wheat and forest. Second, the difference in spectral reflectance between the green and blue bands for canola was greater than that for wheat and forest. Together with the NDVI, three features were thus available for building the CFI. Different combinations of these three features were used to construct candidate CFIs: (a) the three features were added; (b) the three features were multiplied; (c) the sum of any two features was multiplied by the third feature; and (d) the product of any two features was added to the third feature. Thereby, eight candidate CFIs were constructed.
To find the best of the eight CFIs, the Fisher values between canola and wheat, forest, bare land, and construction land were compared for each CFI equation using the same samples: 5219 pixels drawn from the three study areas. The CFI with the highest Fisher value was considered optimal.
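The eight candidate combinations can be sketched as follows. This is an illustrative Python fragment; the feature labels f1-f3 and the dictionary keys are our own naming, not notation from the paper, and each candidate would then be scored with the Fisher value as described above.

```python
def candidate_cfis(blue, green, red, nir):
    """Eight candidate CFIs built from the three flowering-period features."""
    f1 = red + green                 # red+green sum: higher for canola than wheat/forest
    f2 = green - blue                # green-blue difference of yellow flowers
    f3 = (nir - red) / (nir + red)   # NDVI: separates vegetation from bare/built land
    return {
        "f1+f2+f3": f1 + f2 + f3,        # (a) sum of the three features
        "f1*f2*f3": f1 * f2 * f3,        # (b) product of the three features
        "(f1+f2)*f3": (f1 + f2) * f3,    # (c) pairwise sum times the third
        "(f1+f3)*f2": (f1 + f3) * f2,
        "(f2+f3)*f1": (f2 + f3) * f1,
        "f1*f2+f3": f1 * f2 + f3,        # (d) pairwise product plus the third
        "f1*f3+f2": f1 * f3 + f2,
        "f2*f3+f1": f2 * f3 + f1,
    }
```

Applying the function to per-pixel band reflectances gives the eight candidate index values for that pixel.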
2.5. Classification Methods
To verify whether the optimal CFI enhances the image features of canola compared to the Sentinel-2 raw images, canola was extracted from both the Sentinel-2 raw images and the optimal CFI image derived from them. This was done using one unsupervised classification method, the IsoData clustering method [37,38], and two supervised classification methods, the support vector machine (SVM) [39,40] and random forest (RF) [41]. Because IsoData is hardly affected by subjective human factors and classifies automatically based on the characteristics of the image itself, its results could indicate to some extent whether the optimal CFI images were better than the raw images. The SVM and RF classifiers are widely used in the remote-sensing community because of their classification accuracy [39,41]. Therefore, the IsoData, SVM, and RF classifiers were selected to evaluate the performance of the CFI. The training samples were identical when extracting canola with the different classifiers from the CFI images and the Sentinel-2 raw images, and the testing samples were identical when evaluating the classification results of the different classifiers.
A decision tree model [42] was built in GEE to improve the automation level and robustness of canola mapping. Specifically, the CFI values of canola had more than one candidate threshold. For the optimal CFI images during the optimum period, the histogram of CFI values was counted and analyzed using 3517 canola and 3517 non-canola pixels in the first study area to determine a preset threshold. Because the preset threshold was obtained from samples in the first study area, it was necessary to verify its stability and reliability in other regions. Therefore, the values surrounding the preset threshold were taken one by one as the judgment threshold for distinguishing canola from non-canola, and the accuracies of the resulting classifications were compared using the confusion-matrix accuracy verification method [43,44] and all validation samples in the three study areas. The threshold corresponding to the highest accuracy was taken as the best judgment threshold for the model. The traversal rule was to move five steps, at intervals of 0.01, to each side of the preset threshold; thus, the best threshold was selected from 11 candidate thresholds according to their performance in the classification results.
Then the classification accuracies of the decision tree model, SVM, and RF methods were compared in the three study areas. The primary purpose was to test whether the decision tree model was better than the SVM or RF classifier for canola mapping. Another purpose was to test the applicability of the decision tree model because training samples were not used in the second and third study areas. If the classification accuracy of the decision tree model was satisfactory in the second and third study areas, automatic identification of canola would have been achieved without relying on training samples.
2.6. Accuracy Verification
In this study, 45 validation quadrats of 0.5 km × 0.5 km were randomly selected. Three steps were taken to produce these validation quadrats, as shown in Figure 4. First, the boundaries of the various ground objects within each quadrat were manually plotted according to Google imagery with a spatial resolution of 0.1 m × 0.1 m. Second, the attributes of the vector data for the different ground objects were determined and labeled from field-survey data: canola fields were labeled "canola", and all other ground objects were labeled "other". Third, the vector data were converted to raster data at a 10 m spatial resolution, matching the classification results. These raster data were regarded as the ground-truth samples. The confusion-matrix accuracy verification method [43,44] and the F1 score [45] were then used to verify the classification accuracy. The confusion-matrix accuracy parameters included overall accuracy, producer's accuracy, user's accuracy, and the kappa coefficient.
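These accuracy measures can be computed from a binary confusion matrix as sketched below in Python; the function and dictionary key names are our own.

```python
import numpy as np

def accuracy_metrics(truth, pred):
    """Confusion-matrix measures for a binary canola/other map: overall
    accuracy, producer's and user's accuracy for canola, kappa, and F1."""
    truth, pred = np.asarray(truth), np.asarray(pred)
    tp = np.sum((truth == 1) & (pred == 1))   # canola mapped as canola
    tn = np.sum((truth == 0) & (pred == 0))   # other mapped as other
    fp = np.sum((truth == 0) & (pred == 1))   # other mapped as canola
    fn = np.sum((truth == 1) & (pred == 0))   # canola mapped as other
    n = tp + tn + fp + fn
    oa = (tp + tn) / n
    pa = tp / (tp + fn)   # producer's accuracy (recall)
    ua = tp / (tp + fp)   # user's accuracy (precision)
    # Chance agreement for the kappa coefficient.
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (oa - pe) / (1 - pe)
    f1 = 2 * ua * pa / (ua + pa)
    return {"OA": oa, "PA": pa, "UA": ua, "kappa": kappa, "F1": f1}
```

Each classification result is compared pixel by pixel against the ground-truth rasters with such a routine.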
2.7. Comparison of CFI with Other Canola Indices
Some canola indices have been used for canola identification in previous studies. For example, Sulik and Long proposed a canola ratio index (CRI) [3] and a normalized difference yellowness index (NDYI) [1], and Ashourloo et al. [11] proposed a canola index (CI). The equations are as follows:

CRI = βgreen / βblue

NDYI = (βgreen − βblue) / (βgreen + βblue)

CI = βnir × (βred + βgreen)

where βgreen, βblue, βred, and βnir represent the spectral reflectance in the green, blue, red, and near-infrared wavebands, respectively.
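For reference, these indices can be computed per pixel as below; a minimal Python sketch in which the function name is our own.

```python
def canola_indices(blue, green, red, nir):
    """Existing canola indices compared against the CFI in this study:
    CRI (green/blue ratio), NDYI (normalized green-blue difference), and
    CI (NIR times the sum of red and green)."""
    cri = green / blue
    ndyi = (green - blue) / (green + blue)
    ci = nir * (red + green)
    return cri, ndyi, ci
```

Passing the four band reflectances of a pixel returns its three index values.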
To compare the performance of the CFI proposed in this study with these existing canola indices (CRI, NDYI, and CI), their classification accuracies under the different classification methods were also obtained.
4. Discussion
This study developed a novel CFI for automatic canola mapping. The CFI is a spectral index for detecting canola fields from remote-sensing data. Its advantages are computational simplicity, effectiveness, and a high level of automation.
The spectral difference between canola and winter wheat is slight during most of their growth stages, and wheat can seriously reduce the mapping accuracy of canola, especially in areas where wheat and canola are planted together. However, the yellow canola flowers provide an opportunity to distinguish canola from winter wheat, even though the flowering stage is transitory, lasting only about one month [3].
According to our observations, the yellow flowers of canola create a visual difference between canola fields and winter wheat fields. The greatest spectral differences between yellow and non-yellow petals occur in the green and red bands [46]. The carotenoid content of canola petals is very high; carotenoids absorb blue light and reflect green and red light [3]. At the vegetation canopy scale, the recorded spectral reflectance of canola comes from both its yellow flowers and its green leaves and stems, whereas that of other crops, including winter wheat, comes only from green leaves and stems [11,47]. As a result, the spectral reflectance of canola in the red and green bands was higher than that of other vegetation types during the canola flowering season, whereas in the blue and NIR bands the spectral reflectance values of canola and other vegetation types were similar.
The red-edge bands of Sentinel-2 images play an important role in estimating parameters such as the leaf area index [48]. However, our experimental results did not show that the red-edge bands made a significant contribution to the identification of canola. As Griffiths, Nendel, and Hostert [32] pointed out, the red-edge bands only slightly improve overall accuracy. In addition, the spatial resolution of the red-edge bands is 20 m × 20 m, lower than the 10 m × 10 m resolution of the blue, green, red, and NIR bands. Therefore, the red-edge bands were not used to build the CFI in this study.
The reflectivity of most ground objects in the blue waveband is usually low. Under a complex atmospheric environment, the reflectivity of some ground objects can approach zero in the blue band of an atmospherically corrected image. In that case, the NDYI of those objects becomes similar to that of canola flowers, even if their reflectivity in the green band is much lower. This phenomenon was found in the Sentinel-2 image of 12 April 2020 in the third study region. The CFI can effectively avoid such problems: for example, the NDYI could not completely distinguish forest from canola in some forest areas, whereas the CFI performed better at identifying canola, as shown in Figure 10.
During the canola flowering stage, the CFI values of the canola fields were greater than the threshold of 0.14, as shown in Figure 11, whereas the CFI value of canola did not exceed this threshold in the other growth stages. Therefore, the decision tree model based on CFI images can achieve automatic and accurate identification of canola.
The flowering period of canola lasts only approximately one month. If no remote-sensing images were available during the flowering period, for example because of cloud and rain, it would be difficult to identify the planting distribution of canola with the method proposed in this study. This is the limitation of using optical imagery to identify canola.
Of course, at the development stage, the automated algorithm requires substantial expert input and image analysis to isolate type-specific properties from inter-annual and inter-region variability [13]. An important direction for future research is therefore to evaluate the ability of the CFI across different years, climates, and other conditions to further verify the results of this study.
5. Conclusions
The spectral index, the CFI, proposed in this study, is extremely sensitive to yellow canola flowers. Therefore, the CFI has great potential to identify canola planting distribution accurately. The following conclusions were drawn:
The flowering stage of canola is the best time to identify its planting distribution by remote-sensing data, especially in mixed planting areas of different types of winter crops.
CFI integrates four kinds of spectral information: blue, green, red, and NIR wavebands. It dramatically reduces the dimensions and volume of remote-sensing data and enhances the image information of canola flowers.
The decision tree model based on CFI images can improve the classification accuracy of canola compared to other canola indices. In addition, this decision tree model has good universality. When this model is applied elsewhere, the model threshold does not need to be adjusted.