An Improved Multi-temporal and Multi-feature Tea Plantation Identification Method Using Sentinel-2 Imagery

As tea is an important economic crop in many regions, efficient and accurate methods for remotely identifying tea plantations are essential for the implementation of sustainable tea practices and for periodic monitoring. In this study, we developed and tested a method for tea plantation identification based on multi-temporal Sentinel-2 images and a multi-feature Random Forest (RF) algorithm. We used the phenological patterns of tea cultivation in China's Shihe District (such as the multiple annual growing, harvest, and pruning stages) to extract multi-temporal Sentinel-2 MSI bands, their first spectral derivatives, NDVI, and textures, along with topographic features. We then assessed feature importance using RF analysis, and the optimal combination of features was used as the input for RF classification to extract tea plantations in the study area. A comparison of our results with those achieved using the Support Vector Machine (SVM) method and with statistical data from local government departments showed that our method had a higher producer's accuracy (96.57%) and user's accuracy (96.02%). These results demonstrate that: (1) multi-temporal and multi-feature classification can improve the accuracy of tea plantation recognition, (2) RF feature importance analysis can effectively reduce feature dimensions and improve classification efficiency, and (3) the combination of multi-temporal Sentinel-2 images and the RF algorithm improves our ability to identify and monitor tea plantations.


Introduction
Tea is an economically significant crop in global agriculture [1,2] and an important economic engine in many developing countries [3]. The global tea industry has developed rapidly since the beginning of this century; according to the International Tea Commission, the global tea plantation area in 2015 was 4.52 million ha, a 70.6% increase over the 2.65 million ha in 2000 (http://www.inttea.com/). Tea production played an important role in the development of the Chinese agricultural economy [4], districts in China, and covers 1783 km² from 113°42′36″ E to 114°08′34″ E and from 31°24′06″ N to 32°33′00″ N. The area has a continental monsoon climate in the transition from the subtropical to the warm temperate zone, with an average temperature of 15.1 °C and an annual precipitation of 1109.11 mm. The elevation ranges from 54 to 906 m, with the highest terrain in the Tongbai and Dabie Mountains to the southwest and the lowest in the northeastern plains along the Huaihe River. This district is the largest county for green tea production in China and is the origin and main production area of the famous Chinese teas "Xinyang Maojian" and "Xinyang Red". In 2004, it was named the "Hometown of Chinese Tea" by the State Forestry Administration of the People's Republic of China. At present, tea plantations in the region cover 472.56 km² [17], accounting for 26.5% of the district's total area. Tea in China follows a regular annual growth cycle that begins in late March and goes through three growth stages and two rest stages before returning to dormancy in mid-to-late October [18]. Farmers usually pick tea during the growth stages; in the study area, spring tea is picked from late March or early April through mid-to-late May, summer tea from early June through early July, and autumn tea from early August through early October.
Tea plants in the study area are pruned three times a year to regulate their branching habits, promote a well-layered, healthy canopy, and prevent pests and diseases; proper pruning can prolong the productive life of stable, high-yielding, high-quality tea plants [19,20]. A deep pruning is usually conducted in May following the spring harvest, and two light prunings are conducted in August and September (Figure 2). Our field investigations showed a significant difference in appearance between tea plantations before and after deep pruning (Figure 3).
Sensors. 2019, 11, x FOR PEER REVIEW 4 of 16 Figure 2. Annual growth, picking, and pruning stages of tea in the study area delineated by the first ten (E), middle ten (M), and last ten (L) days of each month.

Figure 3. Effect of pruning on tea plantations in the study area: (a,b) field photos of tea plantations before and after pruning, respectively; (c,d) Sentinel-2 false colour images of tea plantations before and after pruning, respectively.

Sentinel-2 Image Data
The Sentinel-2 satellite images (Level-1C) were downloaded from the European Space Agency's (ESA) Sentinel Scientific Data Hub. We selected images from four different seasons (18 April 2018, 12 June 2018, 15 September 2017, and 19 December 2017) to capture changes in tea growth caused by picking and pruning, subject to weather conditions and image availability. We chose the blue (B2), green (B3), red (B4), and near-infrared (B8) bands with 10 m resolution and four red-edge (B5, B6, B7, and B8A) bands with 20 m resolution. Radiometric calibration and atmospheric correction of the images, as well as resampling of the red-edge bands from 20 m to 10 m, were carried out in ENVI 5.3 and ENVI 5.5.
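The text does not say which resampling kernel was used in ENVI. As a minimal sketch, assuming nearest-neighbour resampling and 20 m and 10 m grids that share the same origin, each 20 m red-edge pixel maps onto a 2 × 2 block of 10 m pixels. The function name `upsample_nearest_2x` is hypothetical, and bilinear or cubic convolution would be equally plausible choices.

```python
from typing import List

Grid = List[List[float]]


def upsample_nearest_2x(band_20m: Grid) -> Grid:
    """Nearest-neighbour resampling of a 20 m band onto a 10 m grid.

    Each 20 m pixel is copied into a 2 x 2 block of 10 m pixels, so the
    output has twice as many rows and columns as the input.
    """
    band_10m: Grid = []
    for row in band_20m:
        wide = [value for value in row for _ in range(2)]  # repeat along columns
        band_10m.append(wide)
        band_10m.append(list(wide))  # repeat along rows (independent copy)
    return band_10m


if __name__ == "__main__":
    b5_20m = [[0.12, 0.15], [0.10, 0.18]]  # toy B5 reflectances (hypothetical)
    for row in upsample_nearest_2x(b5_20m):
        print(row)
```

Nearest neighbour keeps the original reflectance values unchanged, which is convenient when band statistics matter; it adds no spatial detail, so the resampled red-edge bands still carry 20 m information.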

Digital Elevation Model (DEM) Data
We obtained 30 m DEM data from the NASA Shuttle Radar Topography Mission (SRTM) and used the elevation, together with the slope and aspect derived from it, as terrain feature variables for tea plantation identification and mapping.
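How slope and aspect were derived from the SRTM DEM is not specified. Below is a minimal sketch, assuming a north-up grid with 30 m cells and simple central differences (many GIS tools use a weighted 3 × 3 Horn kernel instead); `slope_aspect` is a hypothetical helper that handles interior cells only.

```python
import math
from typing import List, Optional, Tuple


def slope_aspect(dem: List[List[float]], r: int, c: int,
                 cell: float = 30.0) -> Tuple[float, Optional[float]]:
    """Slope (degrees) and aspect (degrees clockwise from north) at an interior cell.

    Uses central differences on a north-up grid: rows increase southward and
    columns increase eastward. Aspect is the compass direction the slope faces
    (the downslope direction); it is None on perfectly flat cells.
    """
    dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell)  # rise toward east
    dz_dy = (dem[r - 1][c] - dem[r + 1][c]) / (2 * cell)  # rise toward north
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if dz_dx == 0 and dz_dy == 0:
        return slope, None
    # The downslope vector is (-dz_dx, -dz_dy) in (east, north) components.
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect


if __name__ == "__main__":
    # Toy 30 m DEM rising 30 m per cell toward the east: a 45-degree, west-facing slope.
    dem = [[0.0, 30.0, 60.0] for _ in range(3)]
    print(slope_aspect(dem, 1, 1))
```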

Ground Survey Data and Sample Datasets
The sample bank for the study area was established using ground survey data and high-resolution Google Earth imagery. Ground surveys in April and June 2018 collected 410 samples of typical land-use types, including tea plantations, forest, cropland, built-up land, and water. Visual interpretation of Google Earth imagery by two researchers working independently yielded 2259 polygonal samples containing 29,321 pixels, of which 7828 were tea plantations and 21,493 were other categories. Tea plantation samples thus accounted for 27% of the total, consistent with the actual proportion of tea plantation area in the study area. Stratified random sampling was carried out in ArcGIS 10.2, with 70% of the samples used for training and the remainder for validation (Figure 4).
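Below is a minimal sketch of a stratified 70/30 split, assuming the split is applied to individual labelled pixels; the text does not say whether ArcGIS split polygons or pixels, and `stratified_split` is a hypothetical helper. Splitting within each class keeps the tea share (7828 / 29,321 ≈ 26.7%, rounded to 27% above) the same in the training and validation sets.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_split(labels: Sequence[Hashable], train_frac: float = 0.7,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices into training and validation sets class by class,
    so that each class keeps roughly the same share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, valid = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random order within the class
        cut = round(len(indices) * train_frac)
        train.extend(indices[:cut])
        valid.extend(indices[cut:])
    return sorted(train), sorted(valid)


if __name__ == "__main__":
    # Pixel counts from the text: 7828 tea pixels and 21,493 other pixels.
    labels = ["tea"] * 7828 + ["other"] * 21493
    train, valid = stratified_split(labels)
    tea_share = labels.count("tea") / len(labels)
    print(f"tea share: {tea_share:.1%}, train: {len(train)}, validation: {len(valid)}")
```

If whole polygons were split instead, pixels from one polygon would never appear in both sets; splitting at pixel level can let spatially adjacent, highly correlated pixels land in both sets, which may make validation accuracy optimistic.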

Methods
Based on the unique characteristics of tea plantations, we developed a method that uses multi-temporal, multi-feature Sentinel-2 imagery to distinguish tea plantations from the surrounding land cover (Figure 5).

Feature Analysis and Selection
The main LULC types in the study area were tea plantations, evergreen forests, deciduous forests, dry land, paddy fields, built-up land, and water. Our field investigations showed that many tea areas in the southeast were interplanted with agroforestry species (such as chestnuts), so tea plantations were subdivided into two types: monoculture and polyculture. We extracted the spectral reflectance of each LULC type from the multi-temporal Sentinel-2 images using the sample data, calculated the mean reflectance of each type, and analysed the spectral differences between tea plantations and the other types (Figure 6). In the blue (B2), green (B3), red (B4), and red-edge (B5) bands, the spectral characteristics of both tea plantation types were similar to those of evergreen forest, deciduous forest, paddy fields, and dry land. In the near-infrared (B8) and red-edge (B6, B7, B8A) bands, although the reflectance of water was clearly distinct, the two tea plantation types were confused with other types to varying degrees in different seasons. It was therefore difficult to identify tea plantations reliably using only the spectral features of these eight bands, so auxiliary information such as spectral derivatives, NDVI, textures, and topographic features was needed to improve identification accuracy.
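Computing class mean spectra such as those in Figure 6 amounts to averaging the sample pixels of each class band by band. A minimal sketch, with a hypothetical `class_mean_spectra` helper and made-up reflectance values for illustration only:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def class_mean_spectra(samples: Iterable[Tuple[str, List[float]]]) -> Dict[str, List[float]]:
    """Average the band reflectances of all sample pixels in each LULC class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for label, spectrum in samples:
        if label not in sums:
            sums[label] = [0.0] * len(spectrum)
        for band, value in enumerate(spectrum):
            sums[label][band] += value
        counts[label] += 1
    return {label: [total / counts[label] for total in totals]
            for label, totals in sums.items()}


if __name__ == "__main__":
    # Hypothetical (B4, B8) reflectances for a few sample pixels on one date.
    samples = [
        ("monoculture tea", [0.05, 0.30]),
        ("monoculture tea", [0.07, 0.34]),
        ("deciduous forest", [0.04, 0.40]),
    ]
    print(class_mean_spectra(samples))
```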

When the NDVI was plotted for the eight typical LULC types on each of the four imagery dates (Figure 7), three clear observations could be made. First, the NDVI of paddy fields, dry land, built-up land, and water was clearly different from that of tea plantations and forests. Second, the NDVI of monoculture tea plantations was similar to that of polyculture tea plantations, evergreen forest, and deciduous forest in April and September but was significantly lower in June.
This was because monoculture tea plantations lost much of their vegetative cover in early June (after harvest and extensive pruning), whereas in polyculture tea plantations this effect was buffered by the foliage of the interplanted (chestnut) trees. Third, the NDVI of monoculture tea plantations and evergreen forest was very similar in December, while that of polyculture tea plantations and deciduous forests was lower. This was because the tea plants were dormant in December but retained their leaves, so the NDVI of monoculture tea plantations was similar to that of evergreen forest, the NDVI of deciduous forests (after leaf fall) was lowest, and that of polyculture tea plantations reflected the combination of evergreen tea plants and deciduous interplanted trees (like chestnuts). In April, June, and September, chestnut trees were in the germination and leaf development, rapid growth, and fruit ripening stages, respectively, so their NDVI remained high; by December, they had dropped their leaves, pulling the NDVI of polyculture tea plantations downward. According to the field survey, most forest in the study area was deciduous, so the difference in NDVI between December and June can be used to distinguish monoculture tea plantations, polyculture tea plantations, and most forest areas. December was a good period in which to distinguish polyculture tea from other similar types.
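The December-minus-June NDVI difference discussed above can be sketched per pixel as follows. This assumes the usual Sentinel-2 NDVI from B8 (near-infrared) and B4 (red); the exact band pairing is not stated in this section, and the function names and reflectance values are hypothetical.

```python
import math


def ndvi(nir: float, red: float) -> float:
    """NDVI = (NIR - red) / (NIR + red); here NIR = B8 and red = B4 (an assumption)."""
    total = nir + red
    if total == 0:
        return math.nan  # undefined, e.g. for no-data pixels
    return (nir - red) / total


def winter_minus_summer_ndvi(b8_dec: float, b4_dec: float,
                             b8_jun: float, b4_jun: float) -> float:
    """December NDVI minus June NDVI for one pixel, analogous to a feature
    such as Ndvi_12-19 - Ndvi_6-12."""
    return ndvi(b8_dec, b4_dec) - ndvi(b8_jun, b4_jun)


if __name__ == "__main__":
    # Hypothetical reflectances: a pixel that keeps its leaves in December
    # but was pruned back in June gives a positive difference.
    print(round(winter_minus_summer_ndvi(0.30, 0.05, 0.20, 0.08), 3))
```

Following the reasoning above, monoculture tea (leafy in December, pruned in June) should give a positive difference, while deciduous forest (leafless in December) should give a negative one, with polyculture tea in between.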
The first derivative of spectral reflectance captures the rate of change of the original spectral curve and enhances subtle differences in slope for vegetation, better reflecting the essential characteristics of different growth stages and increasing the separability of land cover types [21,22]. Texture reflects the spatial structure of objects [23], and the spatial textural features of tea plantations were more distinctive than those of other LULC types. Adding these features to the tea plantation extraction process can therefore compensate for the lack of spatial information in spectral features and improve classification accuracy [24]. In the study area, tea plantations were mostly distributed in low mountainous and hilly areas, so topographic conditions including elevation, slope, and aspect directly affected the strip-like pattern of tea plantations established along contour lines. We therefore extracted a total of 325 features as input variables (Table 1): spectral, first-derivative, NDVI, and GLCM textural features from the four Sentinel-2 images, together with topographic features.
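The first-derivative features (the Der1_* variables) can be approximated by finite differences across the band centre wavelengths. The sketch below is an assumption, not the authors' exact procedure: it uses nominal Sentinel-2 band centres, neighbour differences for interior bands, and one-sided differences at B2 and B8A so that every band receives a value; `first_derivative` and `BAND_CENTRES_NM` are hypothetical names.

```python
from typing import List, Sequence

# Approximate Sentinel-2 MSI band centre wavelengths in nm (nominal values;
# exact centres differ slightly between Sentinel-2A and 2B).
BAND_CENTRES_NM = {"B2": 490, "B3": 560, "B4": 665, "B5": 705,
                   "B6": 740, "B7": 783, "B8": 842, "B8A": 865}


def first_derivative(wavelengths: Sequence[float],
                     reflectance: Sequence[float]) -> List[float]:
    """Approximate dR/dλ at every band.

    Interior bands use the difference between their two neighbours; the first
    and last bands use a one-sided difference so every band gets a value.
    """
    n = len(reflectance)
    derivative = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        derivative.append((reflectance[hi] - reflectance[lo])
                          / (wavelengths[hi] - wavelengths[lo]))
    return derivative


if __name__ == "__main__":
    bands = ["B2", "B3", "B4", "B5", "B6", "B7", "B8", "B8A"]
    wl = [BAND_CENTRES_NM[b] for b in bands]
    # Hypothetical vegetation spectrum: low in the visible, rising through the red edge.
    refl = [0.03, 0.06, 0.04, 0.10, 0.25, 0.32, 0.35, 0.36]
    for band, d in zip(bands, first_derivative(wl, refl)):
        print(f"Der1_{band}: {d:.5f} per nm")
```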

Classification Method
Random Forest (RF) is an ensemble learning algorithm proposed by Breiman that consists of multiple decision trees, namely classification and regression trees [25]. Each tree is trained on a random subset of the samples and features [26][27][28]. The basic RF classification procedure is as follows: (1) Using bootstrap sampling, about two-thirds of the data are drawn as training samples (the in-bag data), and the remaining one-third serve as validation samples (the out-of-bag (OOB) data), which can be used to estimate the internal error. (2) A classification and regression tree is constructed for each training sample set, producing a random forest of N trees. As each tree grows, m features are randomly selected from all M features at each split (usually m = √M), and the optimal splitting feature among these m features is chosen according to the Gini coefficient:
$$Gini = 1 - \sum_{i=1}^{C} p_i^{2}$$
where C is the number of classes and $p_i$ is the probability that a sample at the node belongs to class i. (3) The classification results of the N decision trees are combined, and the final class is determined by majority vote.
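The building blocks of steps (1) to (3) can be sketched with the Python standard library. This is an illustration rather than EnMAP-Box's implementation: the helper names are hypothetical, and m = ⌊√M⌋ is one common rounding of m = √M. A bootstrap sample of size n leaves out about 1/e ≈ 36.8% of the samples on average, which matches the roughly one-third OOB share.

```python
import math
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity of a node: 1 - sum_i p_i^2 over the classes present."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def bootstrap(n_samples: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw n_samples indices with replacement (in-bag); never-drawn indices are OOB."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    oob = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, oob


def feature_subset(n_features: int, rng: random.Random) -> List[int]:
    """Randomly choose m = floor(sqrt(M)) distinct candidate features for one split."""
    m = max(1, math.isqrt(n_features))
    return rng.sample(range(n_features), m)


def majority_vote(tree_votes: Sequence[Hashable]) -> Hashable:
    """Final class for one pixel: the label predicted by the most trees."""
    return Counter(tree_votes).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(42)
    in_bag, oob = bootstrap(1000, rng)
    print(f"OOB share: {len(oob) / 1000:.2f}")      # expected to be near 1/e
    print(len(feature_subset(325, rng)))            # 18
    print(gini(["tea", "tea", "forest", "forest"]))  # 0.5
    print(majority_vote(["tea", "forest", "tea"]))   # tea
```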
Multi-temporal and multi-category features help improve the recognition accuracy of LULC types, but a large feature dimension increases the computational complexity and reduces the efficiency of the classifier, and not every feature has a significant impact on classification accuracy. It is therefore necessary to assess feature importance and obtain as small a feature subset as possible by eliminating redundant or irrelevant features without significantly reducing classification accuracy. The RF algorithm calculates variable importance using OOB errors. First, for each tree i in the random forest, the error $errOOB1_{f}^{i}$ is calculated on its OOB data; then random noise is added to feature f in the OOB data, and the error $errOOB2_{f}^{i}$ is calculated again. The importance of feature f is:
$$VI(f) = \frac{1}{N}\sum_{i=1}^{N}\left(errOOB2_{f}^{i} - errOOB1_{f}^{i}\right)$$
If the classification accuracy of the OOB data decreases sharply when random noise is added (that is, errOOB2 increases), the feature has a clear impact on the prediction results; in other words, it is of high importance.
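A minimal sketch of this OOB importance score follows, assuming that the "noise" is a random permutation of feature f within each tree's OOB data, as in Breiman's original formulation. The tree is replaced by a hypothetical one-feature stump so the example stays self-contained; all names are illustrative.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = List[float]
Predict = Callable[[Row], int]
# One entry per tree i: (the tree's predict function, its OOB rows, their labels).
TreeOOB = Tuple[Predict, List[Row], List[int]]


def oob_error(predict: Predict, rows: Sequence[Row], labels: Sequence[int]) -> float:
    """Fraction of OOB samples the tree misclassifies."""
    wrong = sum(1 for row, label in zip(rows, labels) if predict(row) != label)
    return wrong / len(labels)


def permutation_importance(trees: Sequence[TreeOOB], f: int, seed: int = 0) -> float:
    """Mean over trees of (errOOB2 - errOOB1), where errOOB2 is measured
    after randomly permuting the values of feature f in the OOB rows."""
    rng = random.Random(seed)
    diffs = []
    for predict, rows, labels in trees:
        err1 = oob_error(predict, rows, labels)        # errOOB1 for tree i
        column = [row[f] for row in rows]
        rng.shuffle(column)                            # perturb feature f only
        noisy = [row[:f] + [value] + row[f + 1:] for row, value in zip(rows, column)]
        err2 = oob_error(predict, noisy, labels)       # errOOB2 for tree i
        diffs.append(err2 - err1)
    return sum(diffs) / len(diffs)


def stump(row: Row) -> int:
    """Hypothetical stand-in for a CART tree: it only looks at feature 0."""
    return 1 if row[0] > 0.5 else 0


if __name__ == "__main__":
    rows = [[i / 20, (7 * i % 20) / 20] for i in range(20)]
    labels = [stump(row) for row in rows]
    trees = [(stump, rows, labels)] * 10
    print(permutation_importance(trees, 0), permutation_importance(trees, 1))
```

Because the stump ignores feature 1, permuting it cannot change any prediction, so its importance is exactly zero, whereas scrambling feature 0 raises the OOB error.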

Determination of Random Forest Parameter
We built the RF classification model using EnMAP-Box 2.2 software [29,30], initially testing values of N from 1 to 500 decision trees. The experimental results (Figure 8) showed that overall accuracy (OA) followed a fluctuating upward trend as N increased, stabilizing by N = 70 at OA = 95.12% with a computation time of 3.5 minutes. For N > 70, classification accuracy did not improve appreciably, while computation time increased significantly, reducing computational efficiency. We therefore chose N = 70 to construct the RF classification model.
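The choice of N can be framed as finding the point where the OA curve plateaus. A minimal sketch, assuming a tolerance of 0.2 percentage points; the tolerance, the helper name `choose_n_trees`, and the OA values in the example are all hypothetical and are not the Figure 8 data.

```python
from typing import Dict


def choose_n_trees(oa_by_n: Dict[int, float], tol: float = 0.002) -> int:
    """Smallest tree count N whose OA, and the OA of every larger N tested,
    stays within `tol` of the best OA observed (i.e. the curve has plateaued)."""
    best = max(oa_by_n.values())
    ns = sorted(oa_by_n)
    for i, n in enumerate(ns):
        if all(best - oa_by_n[m] <= tol for m in ns[i:]):
            return n
    return ns[-1]  # fallback: no plateau within tol, so use the largest N tested


if __name__ == "__main__":
    # Hypothetical OA values for illustration only (not the study's Figure 8 data).
    oa_by_n = {10: 0.930, 30: 0.944, 50: 0.948, 70: 0.951,
               100: 0.952, 200: 0.951, 500: 0.952}
    print(choose_n_trees(oa_by_n))  # 70
```

In practice the computation time for each N would also be weighed, as the text does, since a marginally higher OA at a much larger N may not justify the extra cost.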

Accuracy Analysis of Multi-temporal and Multi-feature Tea Plantation Identification Method
To determine the best scheme for tea plantation identification, eight groups of feature models (Table 2) were designed based on multi-temporal and multi-feature characteristics.
Table 2. Eight groups of feature models used for accuracy analysis.
We then compared the classification results of the different models (Table 3). In general, classification accuracy increased as more types of feature variables were added. Compared with the single-temporal spectral feature models (S1-4), the multi-temporal spectral feature model S increased the producer's accuracies of the two tea plantation types and the overall accuracy by 18.01%, 22.08%, and 9.61%, respectively. For models S, S + NDVI + DEM, and S + NDVI + DEM + GLCM, the producer's accuracies for monoculture tea plantations were 93.47%, 93.91%, and 94.85%, those for polyculture tea plantations were 81.19%, 82.01%, and 82.24%, and the overall accuracies were 95.89%, 96.05%, and 96.33%, respectively. The different sources of information clearly complemented one another, increasing the separability of different LULC types and improving both the recognition accuracy of tea plantations and the overall classification.

Optimum Recognition Features for Tea Plantations
Although the S + NDVI + DEM + GLCM model had the highest classification accuracy, its large number of feature variables resulted in low computational efficiency, making it necessary to select the optimum recognition features from the full set of 325. There are several methods for finding the optimal feature combination. One is the backward feature elimination algorithm [31]. Another ranks features by importance, adds them to the classifier one by one, and selects the feature subset with the highest accuracy [32]. Because of the high feature dimension in this study, we adopted, for computational efficiency, a threshold segmentation method that considers both the feature importance value and the number of bands.
Following the RF feature importance analysis, we carried out experiments with different numbers of features. Because the RF algorithm selects samples and features randomly, each run gives slightly different results, so the mean of 10 runs was used to reduce random error. We then ranked the features by mean importance and used four importance classes to select the input features for RF classification (Table 4). Using the 10 features with average importance above 1.00 gave better performance than any single-temporal spectral feature model (Table 3), with an overall classification accuracy of 95.01%. Using the 17 features with average importance above 0.90 produced an overall accuracy of 95.68%. Using the 28 features with average importance above 0.80 produced results close to those achieved with all 325 features. This was due to the addition of optimized multi-temporal spectral, NDVI, textural, and topographic features, which increased the spectral differences and separability between different objects. Feature selection eliminated redundant information while retaining the bands that played a key role in classification; this greatly reduced the dimension of the input features and the computational complexity of the classifier while maintaining high classification accuracy.
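The threshold-based selection can be sketched as averaging each feature's importance over repeated runs and keeping those above a cut-off. The helper `select_by_importance`, the feature names, and the importance values below are hypothetical; only the thresholds (1.00, 0.90, 0.80) come from the text.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def select_by_importance(runs: Sequence[Dict[str, float]],
                         threshold: float) -> List[Tuple[str, float]]:
    """Average each feature's importance over repeated RF runs, keep features whose
    mean importance exceeds `threshold`, and return them most important first."""
    averaged = {name: mean(run[name] for run in runs) for name in runs[0]}
    kept = [(name, value) for name, value in averaged.items() if value > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical importances from two runs (the study averaged 10 runs).
    runs = [{"feat_a": 1.20, "feat_b": 0.92, "feat_c": 0.80, "feat_d": 0.50},
            {"feat_a": 1.00, "feat_b": 0.96, "feat_c": 0.90, "feat_d": 0.70}]
    for threshold in (1.00, 0.90, 0.80):
        print(threshold, [name for name, _ in select_by_importance(runs, threshold)])
```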
To further explore the impact of each feature variable on classification accuracy, we used the optimal feature combination to identify tea plantations in the study area and calculated the importance of all 28 feature variables, which varied greatly (Figure 9). Ele, Der1_B7-09-15, Ndvi_12-19, and Ndvi_12-19-Ndvi_6-12 had the greatest importance, indicating that elevation, the first derivative of the red-edge band (B7) on 15 September, winter NDVI, and the difference between winter and summer NDVI contributed most to the identification of tea plantations. Seven first spectral derivative and mean texture features (Der1_B8A-06-12, Der1_B2-04-18, Der1_B3-06-12, Der1_B8A-12-19, Der1_B8-06-12, Der1_B8A-04-18, and Mea_B2-06-12) contributed clearly and to a similar degree. Overall, there were 8 spectral features, 15 first derivative spectral features, 2 NDVI features, 1 terrain feature, and 1 texture feature; the mean was the only texture measure that contributed to classification, and its contribution was low. Features from all four dates contributed to tea plantation identification, but spectral features from June and September contributed the most.
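Mea_B2-06-12 is a GLCM mean texture. A minimal sketch of the GLCM mean for one window follows, assuming a symmetric, normalised co-occurrence matrix at a one-pixel horizontal offset; the window size, offset, and number of grey levels used in the study are not stated in this section, and the helper names `glcm` and `glcm_mean` are hypothetical. In practice the measure is computed in a moving window around each pixel.

```python
from collections import Counter
from typing import List


def glcm(image: List[List[int]], dx: int = 1, dy: int = 0,
         levels: int = 8) -> List[List[float]]:
    """Symmetric, normalised grey-level co-occurrence matrix for one pixel offset.

    `image` must already be quantised to integer grey levels 0..levels-1 and be
    large enough to contain at least one pixel pair at offset (dx, dy).
    """
    counts: Counter = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r][c], image[r2][c2]
                counts[(a, b)] += 1
                counts[(b, a)] += 1  # count the pair in both directions (symmetric)
    total = sum(counts.values())
    return [[counts[(i, j)] / total for j in range(levels)] for i in range(levels)]


def glcm_mean(p: List[List[float]]) -> float:
    """GLCM mean: sum over all (i, j) of i * P(i, j)."""
    return sum(i * p[i][j] for i in range(len(p)) for j in range(len(p[i])))


if __name__ == "__main__":
    window = [[0, 0, 1], [0, 1, 1]]  # toy window quantised to 2 grey levels
    print(glcm_mean(glcm(window, levels=2)))  # 0.5
```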
Figure 9. Feature importance ranking for the optimal feature combination.
Figure 10 shows the extraction result for tea plantations in the study area using the 28 optimal features with the RF algorithm. To assess the overall result, we merged both tea plantation types and analysed the confusion matrix of the classification results. Of the 2449 tea plantation pixels, 2365 were correctly extracted and 84 were misclassified as other LULC types, while 98 of the other 6307 pixels were misclassified as tea plantations.
The pixels misclassified as tea plantations were mainly forest, indicating serious confusion between these two types that affects the accuracy of tea plantation identification; this occurred mainly because tea is widely interplanted with other agroforestry species in the study area. In April, June, and September, the polyculture tea plantations were interplanted with, and almost covered by, agroforestry trees. Although the two can be distinguished using December imagery, some confusion remained between these very similar types.
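The producer's, user's, and overall accuracies reported for the merged tea class follow directly from these confusion-matrix counts. A short check, using the counts given above (6307 − 98 = 6209 correctly classified non-tea pixels); the helper name `accuracy_metrics` is hypothetical:

```python
from typing import Tuple


def accuracy_metrics(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float, float]:
    """Producer's, user's, and overall accuracy for a two-class (tea / other) matrix.

    tp: reference tea pixels classified as tea    fn: reference tea pixels missed
    fp: other pixels classified as tea            tn: other pixels classified as other
    """
    producers = tp / (tp + fn)  # share of reference tea that was mapped as tea
    users = tp / (tp + fp)      # share of mapped tea that really is tea
    overall = (tp + tn) / (tp + fn + fp + tn)
    return producers, users, overall


if __name__ == "__main__":
    pa, ua, oa = accuracy_metrics(tp=2365, fn=84, fp=98, tn=6307 - 98)
    print(f"PA={pa:.2%} UA={ua:.2%} OA={oa:.2%}")  # PA=96.57% UA=96.02% OA=97.92%
```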

Comparison and Analysis of Classification Method Accuracy
To evaluate the RF method's performance for tea plantation identification, we used the same data (the 28 optimal features selected above) to identify tea plantations with the SVM algorithm and compared the results (Table 5). The overall accuracy of the RF method was 1.49% higher, and its producer's and user's accuracies for tea plantations were 4.12% and 3.57% higher, respectively, while the classification accuracy of the other LULC types also improved. Comparing the extracted tea plantation areas with the 2017 Xinyang Statistical Yearbook [17] showed that RF and SVM extracted 44,198 ha and 41,829 ha of tea plantations, respectively, whereas the statistical area of tea plantations in the study area was 47,256 ha. The relative errors of the RF and SVM methods were 6.47% and 11.48%, respectively, demonstrating the better performance of RF for tea plantation identification. In addition, the highest tea plantation recognition accuracy reported in the existing literature reached 95.51% with high-resolution imagery and 88.2% with medium-resolution imagery; this shows that the RF classification algorithm combined with multi-temporal and multi-feature analysis of medium-resolution images was effective in extracting tea plantation areas.
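The relative errors follow from |extracted − reference| / reference, with the yearbook area as the reference. A short check using the areas given in the text (`relative_error` is a hypothetical helper):

```python
def relative_error(extracted_ha: float, reference_ha: float) -> float:
    """|extracted - reference| / reference."""
    return abs(extracted_ha - reference_ha) / reference_ha


if __name__ == "__main__":
    reference = 47256  # ha, statistical tea plantation area cited in the text
    for method, area in [("RF", 44198), ("SVM", 41829)]:
        print(f"{method}: {relative_error(area, reference):.2%}")
    # RF: 6.47%
    # SVM: 11.48%
```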

Conclusions
We developed a new approach to identifying and classifying tea plantations and tested it using multi-temporal Sentinel-2 remote sensing imagery from the Shihe District of Xinyang City, Henan Province, China. We used the distinct phenological cycles of tea management (multiple annual periods of growth, harvest, and pruning), as well as the distinct characteristics of monoculture and polyculture tea plantations, to extract the initial classification features for eight typical LULC types in the area. These features included spectral reflectance, first derivative spectral features, temporal variations in NDVI, and textural and topographic features. Feature selection was carried out using RF feature importance, and the RF classifier was then used to extract tea plantation areas. The main conclusions are as follows: (1) Combining multi-temporal and multi-feature classification improved the overall accuracy and the producer's and user's accuracies for tea plantations compared with using single-temporal spectral features. (2) Selecting features by RF importance reduced the dimension of the input features and the computational complexity, improving classification efficiency and accuracy. The 28 features with average importance >0.80 were selected as the optimal features, giving an overall classification accuracy of 97.92% and producer's and user's accuracies for tea plantations of 96.57% and 96.02%, respectively; this accuracy was similar to that achieved with the 325 initial features before feature selection. (3) Compared with the SVM method for tea plantation identification, the RF method's overall accuracy was 1.49% higher and its producer's and user's accuracies were 4.12% and 3.57% higher, respectively.
Further research should focus on two areas. First, both RF and SVM are shallow machine learning algorithms, so deep learning algorithms should be tested for tea plantation extraction to further improve recognition accuracy. Second, our method should be tested and verified in other tea-growing districts and over larger areas.