Finer Classification of Crops by Fusing UAV Images and Sentinel-2A Data

Abstract: Accurate crop distribution maps provide important information for crop censuses, yield monitoring and agricultural insurance assessments. Most existing studies apply low spatial resolution satellite images for crop distribution mapping, even in areas with a fragmented landscape. Unmanned aerial vehicle (UAV) imagery provides an alternative imagery source for crop mapping, yet its spectral resolution is usually lower than that of satellite images. In order to produce more accurate maps without losing spatial heterogeneity (e.g., the physical boundaries of land parcels), this study fuses Sentinel-2A and UAV images to map crop distribution at a finer spatial scale (i.e., land parcel scale) in an experimental site with various cropping patterns in Heilongjiang Province, Northeast China. Using a random forest algorithm, the original and the fused images are classified into 10 categories: rice, corn, soybean, buckwheat, other vegetations, greenhouses, bare land, water, roads and houses. In addition, we test the effect of UAV image choice by fusing Sentinel-2A with UAV images at multiple spatial resolutions: 0.03 m, 0.10 m, 0.50 m, 1.00 m and 3.00 m. Overall, the fused images achieved classification accuracies that were 10.58% to 16.39% higher than those of the original images. However, the fused image based on the finest UAV image (i.e., 0.03 m) does not result in the highest accuracy. Instead, the 0.10 m spatial resolution UAV image produced the most accurate map. When the spatial resolution is coarser than 0.10 m, accuracy decreases gradually as the resolution becomes coarser. The results of this paper not only indicate the possibility of combining satellite and UAV images for land parcel level crop mapping in fragmented landscapes, but also suggest a scheme for choosing the optimal spatial resolution when fusing UAV images and Sentinel-2A, with little to no adverse side-effects.


Introduction
Crop classification and identification is one of the classic research topics in remote sensing. Over the past decades, great efforts have been made to develop methods for crop classification using different remotely sensed data. These studies generally focus on crop composition surveys, and the classifications are conducted with low (MODIS, AVHRR) and medium (Landsat, Sentinel, HJ, GF) spatial resolution data. In these instances, the crop distribution maps produced are at relatively large scales, ranging from global [1][2][3], national [4][5][6] and provincial [7][8][9]

Remote Sens. 2019, 11, x FOR PEER REVIEW 3 of 19 of spatial resolution in fusing UAV images and Sentinel-2A, which has not been studied in previous work.

Study Area
The research site is in Heilongjiang Province in northeast China, located at the National Modern Agriculture Demonstration Park of Heilongjiang Academy of Agricultural Sciences, Harbin (45.73°N, 126.66°E) (Figure 1). The demonstration park has an area of 3 km × 2.5 km, covering 750 ha. The average elevation of Harbin is 151 m above sea level. Its climate is a typical mid-temperate continental monsoon climate, and the winter is long and cold, while summer is short and hot. Its annual precipitation and mean temperature are 524 mm and 3.5 °C, respectively. The precipitation is mainly concentrated in June-September, with the summer precipitation accounting for 60% of total annual precipitation. The snowfall period is from November to January of each year. The four seasons are distinct with an average temperature of -19 °C in January and an average temperature of approximately 23 °C in July [24][25][26].

Data Sources
The UAV images were acquired with a lightweight (4.4 kg), commercial Agristrong fixed-wing drone (Agristrong Corporation, Beijing, China, Figure 2) [27]. The camera mounted on the drone is a Sony α7R (Sony Corporation, Tokyo, Japan). This model has a 35 mm full-frame (35.9 × 24 mm) Exmor CMOS sensor and can acquire 36.4 megapixel (7360 × 4912 pixels) images. The UAV images have spatial resolutions of 0.03 m, 0.10 m, 0.50 m, 1.00 m and 3.00 m and include three bands for red, green and blue (RGB) wavelengths (Figure 5a). We captured the UAV images (0.03 m) at 2:00 pm on September 14th, 2018. In order to avoid the effects of environment and camera configuration on image acquisition (e.g., light conditions, shutter speed and white balance), we isolated the impact of pixel size [28] by sub-sampling the highest resolution images (0.03 m) to produce the lower resolution images (0.10 m, 0.50 m, 1.00 m and 3.00 m). Access to the study area (the National Modern Agriculture Demonstration Park) was limited due to government activities on September 15-16, 2018, so all of the UAV images were collected on September 14th, 2018. We confirmed that no change in planting structure occurred in the following two days.
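The sub-sampling step can be illustrated with a short sketch. The text does not state which resampling method was used, so block averaging is an assumption here, as are the function name `block_downsample`, the (rows, columns, bands) array layout and the restriction to integer scale factors (for example, 0.03 m to 3.00 m is a factor of 100; a non-integer ratio such as 0.03 m to 0.10 m would need an interpolating resampler instead).

```python
import numpy as np

def block_downsample(image, factor):
    """Average non-overlapping factor x factor blocks of a (rows, cols, bands) array.

    Block averaging is an assumed resampling method, not the one
    documented for the study.
    """
    rows, cols, bands = image.shape
    # Trim edge pixels that do not fill a complete block.
    rows_trim = (rows // factor) * factor
    cols_trim = (cols // factor) * factor
    trimmed = image[:rows_trim, :cols_trim, :].astype(np.float64)
    # Split each spatial axis into (block index, position inside block).
    blocks = trimmed.reshape(rows_trim // factor, factor,
                             cols_trim // factor, factor, bands)
    return blocks.mean(axis=(1, 3))

# Hypothetical 4 x 6 RGB image, coarsened by a factor of 2.
rgb = np.arange(4 * 6 * 3, dtype=np.float64).reshape(4, 6, 3)
coarse = block_downsample(rgb, 2)
print(coarse.shape)
```

Deriving every coarser image from the same 0.03 m acquisition keeps illumination and camera settings identical across resolutions, which is the stated reason for sub-sampling rather than flying again.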
The Sentinel-2A MSI image (processing level 1C) used in this study was acquired on September 16, 2018, and was downloaded from the ESA Sentinels Scientific Data Hub. Processing level 1C includes radiometric and geometric corrections with sub-pixel accuracy. The revisit frequency of each single satellite is 10 days and the combined constellation revisit is five days. Sentinel-2 data are acquired in 13 spectral bands in the VNIR and SWIR with spatial resolutions ranging from 10 to 60 m [29]. These spectral channels include [30]:
• Four bands at 10 m spatial resolution: blue (490 nm), green (560 nm), red (665 nm) and near infrared (842 nm).
• Six bands at 20 m spatial resolution: four narrow bands mainly used for vegetation characterization in the red edge (705 nm, 740 nm, 783 nm and 865 nm) and two wider SWIR bands (1610 nm and 2190 nm) for applications such as snow/ice/cloud detection or vegetation moisture stress assessment.
• Three bands at 60 m spatial resolution for applications such as cloud screening and atmospheric corrections (443 nm for aerosols, 945 nm for water vapour and 1375 nm for cirrus detection).
Ten bands (four bands at 10 m and six bands at 20 m) of the Sentinel-2A data were used in this study (Figure 5b). Additionally, we conducted a field survey to collect ground truth data (point observations) of land cover classes at the plot scale. We created a GIS layer of parcels in shapefile format to digitize complete fields for the study area (Figure 7a) and the validation area (Figure 8a). Our classification scheme included rice, corn, soybean, buckwheat, other vegetations, greenhouses, bare land, water, roads and houses. The number and area of these plots are listed in Table 1. The validation area included rice, corn, sorghum, green onions, vegetables and others. The UAV (0.03 m) and Sentinel-2A images for the validation area are shown in Figure 6a,b.

Methodology
We applied an image fusion, classification and accuracy assessment workflow in our analysis (Figure 3). First, we acquired the UAV images, the Sentinel-2A image and the ground truth data. All data were preprocessed, including camera calibration, photo alignment, dense point cloud generation and orthomosaic generation. For the dense point cloud, depth information is calculated for each camera from the estimated camera positions and combined into a single dense point cloud, which is then used to generate the orthomosaic. For the image fusion, we used the Gram-Schmidt (GS) transformation, which has been widely used and successfully applied in previous studies [31,32]. For crop classification, we used a Random Forest (RF) approach, a widely used ensemble machine learning algorithm [33]. We also explored the impact of the choice of UAV spatial resolution on crop classification. The individual steps of our approach are shown in Figure 3.

Data Fusion
According to the study by Jenerowicz [22], which fused Landsat 8 OLI multispectral data (30 m) with UAV images (0.04 m), a combination similar to the Sentinel-2A (10 and 20 m) and UAV (0.03 m) data used in our study, the Gram-Schmidt transformation is fast and easy to implement and generates fused images with high-quality integration of colour and spatial detail. The Gram-Schmidt (GS) transformation, introduced by Laben [34], is a commonly used method in multivariate statistics and linear algebra. Similar to the principal component transformation, it can transform a multidimensional image or matrix by orthogonal transformation to eliminate correlations between the bands of multispectral data. There are, however, essential differences between the Gram-Schmidt transformation and principal component analysis. In the Gram-Schmidt transformation, the components are only orthogonal, and the amount of information contained in each component is similar, which avoids excessive concentration of information in a single component. In principal component analysis, by contrast, the information is redistributed among the principal components so that the first principal component contains the most information. A diagram of the Gram-Schmidt pan-sharpening technique is shown in Figure 4. The steps are as follows:
(1) The UAV panchromatic band used in this study is produced by taking the mean value of all bands of the UAV images [21].
(2) Calculate the mean and standard deviation of the UAV panchromatic band.
(3) The Sentinel-2A data are combined into a single, simulated lower resolution panchromatic band. This simulated band is placed as the first band in front of the original low-resolution multispectral bands, and the resulting stack is the input to the Gram-Schmidt (GS) transform.
(4) Calculate the mean and standard deviation of the first band (GS1) of the images obtained by the GS transform.
(5) The UAV panchromatic band (UAV-PAN) is then stretched so that its mean digital count (μUAV-PAN) and standard deviation (σUAV) match the mean (μGS1) and standard deviation (σGS1) of the first GS band.
(6) The stretched high-resolution panchromatic band is then swapped for the first GS band, and the data are transformed back into the original multispectral band space, producing N+1 higher resolution multispectral bands.
For our analysis, the Sentinel-2A data were fused with five different UAV datasets at varying resolutions (0.03 m, 0.10 m, 0.50 m, 1.00 m and 3.00 m).
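The six steps can be sketched in NumPy as follows. This is an illustrative reading of the procedure, not the implementation used in the study. The function name `gram_schmidt_fuse` and the (bands, rows, cols) layout are hypothetical. The sketch assumes the Sentinel-2A bands have already been resampled to the UAV pixel grid, simulates the low-resolution panchromatic band as a plain band mean, centres every band before the transform, and returns only the sharpened multispectral bands.

```python
import numpy as np

def gram_schmidt_fuse(ms, pan):
    """Sharpen ms (bands, rows, cols) with pan (rows, cols) on the same grid."""
    n_bands = ms.shape[0]
    flat = ms.reshape(n_bands, -1).astype(np.float64)
    pan_flat = pan.reshape(-1).astype(np.float64)

    # Step 3: simulated low-resolution pan band (assumed: mean of all bands),
    # centred so that every GS component has zero mean.
    simulated = flat.mean(axis=0)
    gs = [simulated - simulated.mean()]
    means = flat.mean(axis=1)
    coeffs = []  # coeffs[k][j]: weight of GS component j in band k

    # Forward GS transform: subtract from each centred band its projection
    # onto all previously computed components.
    for k in range(n_bands):
        centered = flat[k] - means[k]
        residual = centered.copy()
        weights = []
        for comp in gs:
            energy = np.dot(comp, comp)
            weight = np.dot(centered, comp) / energy if energy > 0 else 0.0
            weights.append(weight)
            residual -= weight * comp
        coeffs.append(weights)
        gs.append(residual)

    # Steps 2, 4 and 5: stretch the pan band to the mean and standard
    # deviation of the first GS component, then swap it in.
    gs1 = gs[0]
    pan_std = pan_flat.std()
    gain = gs1.std() / pan_std if pan_std > 0 else 0.0
    gs[0] = (pan_flat - pan_flat.mean()) * gain + gs1.mean()

    # Step 6: inverse transform back to multispectral band space.
    fused = np.empty_like(flat)
    for k in range(n_bands):
        band = gs[k + 1] + means[k]
        for weight, comp in zip(coeffs[k], gs):
            band = band + weight * comp
        fused[k] = band
    return fused.reshape(ms.shape)

# Hypothetical data: 10 Sentinel-2A bands on a 6 x 6 UAV grid, 3 UAV bands.
rng = np.random.default_rng(0)
sentinel = rng.random((10, 6, 6))
uav_rgb = rng.random((3, 6, 6))
uav_pan = uav_rgb.mean(axis=0)  # step 1: mean of the UAV bands
fused = gram_schmidt_fuse(sentinel, uav_pan)
```

A useful sanity check on any such implementation is that fusing with the simulated panchromatic band itself returns the input bands unchanged, because the forward and inverse transforms then cancel exactly.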

Crop Classification
We tested multiple classification methods with the 0.10 m fused imagery: Random Forest, Support Vector Machine and Neural Net. The results showed that Random Forest produced the most accurate result (Overall Accuracy = 88.32%, Kappa Coefficient = 0.84), followed by Support Vector Machine (Overall Accuracy = 86.75%, Kappa Coefficient = 0.82) and Neural Net classification (Overall Accuracy = 85.34%, Kappa Coefficient = 0.81). There is little difference among the results of three classification algorithms, but since the Random Forest performed slightly better in our study area that was the method we chose for classification. More details can be found in Appendix A.
Random forest is a widely used machine learning method that can be applied to both classification and regression. The advantages of a random forest classifier are that it typically achieves high accuracy, is unlikely to over-fit, is less affected by noise, can process high-dimensional data and requires no feature selection. Our random forest classifier was trained using a 0.01% subset of our ground truth data, which included the following land cover classes: rice, corn, soybean, buckwheat, other vegetations, greenhouses, bare land, water, roads and houses. We used the ENVI RF module to perform the classification, inputting all bands. In addition to the five fused datasets, we also conducted classifications on the original UAV and Sentinel-2A images, separately.
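For readers without access to ENVI, a pixel-based random forest classification can be sketched with scikit-learn. This is a stand-in rather than the module used in the study: the function name `classify_pixels`, the number of trees, the random seed and the synthetic two-class image are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def classify_pixels(image, train_rows, train_cols, train_labels,
                    n_trees=100, seed=0):
    """Train on labelled pixels, then classify every pixel of image.

    image has shape (rows, cols, bands); all bands are used as features.
    n_trees and seed are illustrative defaults, not values from the study.
    """
    rows, cols, bands = image.shape
    model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    model.fit(image[train_rows, train_cols, :], train_labels)
    return model.predict(image.reshape(-1, bands)).reshape(rows, cols)

# Synthetic 10-band image: class 0 on the left half, class 1 on the right.
rng = np.random.default_rng(0)
truth = np.zeros((20, 20), dtype=int)
truth[:, 10:] = 1
image = np.where(truth[..., None] == 1, 0.8, 0.2)
image = image + rng.normal(0.0, 0.02, (20, 20, 10))

# Ten training pixels per class, a small fraction of all pixels.
train_rows = np.tile(np.arange(0, 20, 2), 2)
train_cols = np.repeat([4, 14], 10)
predicted = classify_pixels(image, train_rows, train_cols,
                            truth[train_rows, train_cols])
```

The same function applies unchanged to an original or a fused image, since only the number of bands per pixel differs.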

Accuracy Assessment
Based on verification samples (all truth data, Figure 7a) and classification results, we generated the confusion matrix (Table 2) of all categories and calculated the overall accuracy, kappa coefficient, user accuracy and producer accuracy. The kappa coefficient [35], a statistical measure of inter-rater reliability, is calculated as follows:

$$\mathrm{Kappa} = \frac{N \sum_{k=1}^{n} P_{kk} - \sum_{k=1}^{n} P_{k+} P_{+k}}{N^{2} - \sum_{k=1}^{n} P_{k+} P_{+k}} \tag{1}$$

where $N$ is the total number of samples, $n$ is the number of classes, $P_{kk}$ is the number of correctly classified samples of class $k$ (the diagonal of the confusion matrix), $P_{k+}$ is the total of row $k$ (samples assigned to class $k$ on the map) and $P_{+k}$ is the total of column $k$ (reference samples of class $k$).

Overall accuracy represents the classification quality of the entire map.

$$\text{User accuracy} = \frac{P_{kk}}{P_{k+}}$$

The user accuracy is the proportion of samples assigned to a class on the map that match the corresponding class in the reference data.

$$\text{Producer accuracy} = \frac{P_{kk}}{P_{+k}}$$

Producer accuracy is the proportion of reference samples of a class that are classified correctly on the map. We also calculated the average of each class's user and producer accuracy as a measure of the classification quality of that class.
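The accuracy measures defined by Equation (1) and the user and producer accuracy formulas can be computed directly from a confusion matrix. The sketch below assumes that rows hold the mapped classes and columns the reference classes, consistent with the row and column totals in those formulas. The function name `accuracy_metrics` and the 2 × 2 matrix are hypothetical and do not reproduce Table 2.

```python
import numpy as np

def accuracy_metrics(confusion):
    """confusion[i, j]: samples mapped as class i whose reference class is j."""
    confusion = np.asarray(confusion, dtype=np.float64)
    total = confusion.sum()              # N
    diagonal = np.diag(confusion)        # P_kk
    row_totals = confusion.sum(axis=1)   # P_k+
    col_totals = confusion.sum(axis=0)   # P_+k
    chance = np.sum(row_totals * col_totals)
    overall = diagonal.sum() / total
    kappa = (total * diagonal.sum() - chance) / (total ** 2 - chance)
    user = diagonal / row_totals
    producer = diagonal / col_totals
    return overall, kappa, user, producer

# Hypothetical two-class confusion matrix with 100 samples.
overall, kappa, user, producer = accuracy_metrics([[40, 10], [5, 45]])
print(overall, kappa, user, producer)
```

Here 85 of 100 samples lie on the diagonal, so the overall accuracy is 0.85, while kappa is lower because it discounts the agreement expected by chance. A class with no mapped or no reference samples would cause a division by zero and needs separate handling.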
In order to verify the reliability and generality of the study area results, we repeated the experiment in the verification area, using the same methodology as in Sections 3.1, 3.2 and 3.3 to classify the validation area images into rice, corn, sorghum, green onions, vegetables and others.

Influence of Different Resolution
The Sentinel-2A image was fused with UAV images of each resolution (0.03 m, 0.10 m, 0.50 m, 1.00 m and 3.00 m) in turn. A classifier was trained on each fused image and used to classify it. After classification, we compared the results to assess the influence of UAV image resolution.

UAV and Sentinel-2A Images Fusion
Fusing UAV images at 0.03 m (Figure 5a) with Sentinel-2A (Figure 5b) resulted in the image in Figure 5c. For the original UAV (0.03 m) and Sentinel-2A images, in terms of spatial resolution, the UAV images have high levels of textural and boundary information with clear, visual distinctions between image objects (Figure 5a). Conversely, the Sentinel-2A image displays objects much less distinctly with many pixels likely containing more than one land type (i.e., mixed pixels (Figure 5b)).
In terms of spectral resolution, the UAV images contain only three bands (RGB), making it difficult to distinguish features (e.g., fields) with high spectral similarity (e.g., corn and soybean, Figure 5a). The Sentinel-2A data contain 10 spectral bands, including near-infrared and red-edge bands at 10 m and 20 m resolution, which allows for the differentiation of similar features. The data fusion of the two data sources results in images with the same spatial resolution as the UAV images and with 10 spectral bands (Figure 5). Visually, the fused images have the characteristics of high resolution, high texture and multi-spectral information, ideal for fine-scale classification of cropland.
For the verification area, the UAV image (Figure 6a) was fused with Sentinel-2A (Figure 6b). The fused image (Figure 6c) has a higher spectral resolution than the original UAV image (Figure 6a) and a higher spatial resolution than the original Sentinel-2A image (Figure 6b).

Classification after Images Fusion
The classification results using the original 0.03 m resolution UAV image, the original Sentinel-2A data and the fused image are shown in Figure 7. For the UAV image, the ultra-high resolution of 0.03 m results in very clear class boundaries and rich texture information, with an overall classification accuracy of 76.77% and a Kappa coefficient of 0.68 (Figure 7b, Table 3). The field boundaries obtained from the classification of the Sentinel-2A image (Figure 7c) are relatively vague, which limits its usefulness for applications such as crop area estimation and yield forecasting. However, due to its multiple spectral bands, the recognition of large-area crops (i.e., rice, corn, soybean) is relatively good. The overall classification accuracy for the Sentinel-2A image is 71.93% with a Kappa coefficient of 0.65 (Table 3).
The fused images' classification results are substantially better than those obtained from the separate UAV and Sentinel-2A images. At the highest spatial resolution, the overall accuracy of the classification is 85.41% with a Kappa coefficient of 0.80 (Figure 7d, Table 3), an improvement of roughly 10 percentage points over the UAV and Sentinel-2A data on their own (8.64 over the UAV image and 13.48 over the Sentinel-2A image).
The classification accuracy (average of user and producer accuracy) of each image for the various land cover classes is shown in Table 4. Based on the overall accuracy and Kappa, the fused image is better than the original UAV and Sentinel-2A images. Within the fusion classification, each class has a higher accuracy than in the original images, except for soybean and road. The low accuracy of soybean may be due to the data collection time being mid-September, when soybeans are typically harvested, resulting in more background noise (i.e., soil that is also labelled as soybean). Under these circumstances, a higher resolution image may give a poorer result. For the road class, the fused image does not give a better result than the UAV image. Future studies should keep the time of data collection and the crop growth period in mind to improve the classification.
The result for the verification area classification is shown in Figure 8. For the UAV image, the classification scale is fine. However, because of the low spectral resolution, the salt and pepper effects are substantial. The sorghum and green onion classes are not classified well using the RGB bands alone (Figure 8b). For the Sentinel-2A image, the classification result is acceptable, but the scale is rather coarse, with borders being difficult to discern (Figure 8c). The fused image gives a better result in terms of both scale and accuracy.
The classification results in the verification area again show that the classification of the fused image with 0.03 m resolution is better than that of both the original UAV and Sentinel-2A images. The UAV image's classification accuracy is the lowest at 77.70% with a Kappa coefficient of 0.67 (Table 5). The Sentinel-2A image classification has a higher accuracy of 86.51% and a Kappa coefficient of 0.81 (Table 5). The fused image (0.03 m) achieved an even higher accuracy of 91.54%, with a Kappa coefficient of 0.87 (Table 5).

Classification at Different Resolutions
The classification results using each resolution of the fusion images are shown in Figure 9. The highest level of accuracy was achieved using a spatial resolution of 0.10 m (88.32%, Kappa = 0.84), followed by 0.50 m (87.3%, Kappa = 0.83), 1.00 m (85.8%, Kappa = 0.80) and 0.03 m (85.43%, Kappa = 0.80, Table 6). The coarsest spatial resolution of 3.00 m had the lowest accuracy level of the fused images (82.51%, Kappa = 0.77, Table 6). Somewhat counterintuitively, these results show that classification accuracy does not always improve with higher image resolution.
Based on the accuracy values, the 0.10 m fusion image is better than the 0.03 m, 0.50 m, 1.00 m and 3.00 m fusion images (Table 6). Within the 0.10 m fusion classification, each class has an accuracy ≥ 0.70, except for soybean (Table 7). This low accuracy holds for all the images and may be due to the data collection time being mid-September, when soybeans are typically harvested. Regardless, the 0.10 m fusion image has the highest accuracy for all classes (Table 7).
For the verification area, the classification results using each resolution of the fusion images are shown in Figure 10. Similar to the original area's results, the overall accuracy and the Kappa coefficient of the 0.10 m fusion data are the highest of the fused classification results (92.03%, Kappa = 0.88, Table 8).
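The accuracy gains quoted in the abstract (10.58% to 16.39%) appear to correspond to the difference between each fused image and the original Sentinel-2A classification of the study area (71.93%). This is an inference from the reported numbers rather than something the text states, and the short check below, with illustrative variable names, reproduces the arithmetic from the overall accuracies given for Table 3 and Table 6.

```python
# Overall accuracies (%) as reported in the text for the study area.
sentinel_accuracy = 71.93
fused_accuracy = {0.03: 85.43, 0.10: 88.32, 0.50: 87.3, 1.00: 85.8, 3.00: 82.51}

# Gain of each fused image over the Sentinel-2A baseline (assumed baseline).
gains = {res: round(acc - sentinel_accuracy, 2)
         for res, acc in fused_accuracy.items()}
best_resolution = max(fused_accuracy, key=fused_accuracy.get)

print(gains)
print(best_resolution)
```

The smallest gain comes from the 3.00 m fusion and the largest from the 0.10 m fusion, which matches the range in the abstract and the ranking reported above.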

Discussion
Our study provides a method of crop classification for medium to small field sizes. The method combines the high spatial resolution and rich texture information of UAV imagery with the spectral information of Sentinel-2A. Ultimately, this allows for accurate plot-level classification of crops in complex agricultural landscapes. UAV and Sentinel-2A data can be obtained at relatively low cost, which is important for future agricultural research. Our research methods provide basic data for precision agriculture, which is a fundamental part of smart agriculture.
Classifying crops from fused UAV and satellite data gives better results than using a single data source [21,22]. The free access to Sentinel-2A data, with a spatial resolution of up to 10 m and high spectral resolution, will be of great help for agricultural research in the future. Previous studies have established the value of UAV and satellite data fusion, but have not further discussed crop classification or the selection of UAV data. The choice of UAV data also affects the final crop classification results. Choosing suitable UAV data not only saves resources but also helps obtain reliable classification results. Our study identified that the optimal spatial resolution for UAV imagery to fuse with Sentinel-2A images for plot-level crop classification was not the highest resolution tested (0.03 m), but the next highest, 0.10 m.
Our study does have some limitations. We classify crops in a heterogeneous arable landscape using a fused image. While the main crops are corn, soybean and rice, areas with different crop types (such as wheat) may differ in terms of optimal UAV spatial resolution. The extent to which one can generalize optimal resolutions across crop types requires more study. However, our study provides a useful starting point for choosing a suitable spatial resolution when using UAV images. We have a GIS layer with the boundaries of agricultural parcels. However, we chose not to mask out the non-agricultural areas in order to verify the reliability and applicability of this crop classification method (fusing UAV and Sentinel-2A) despite the noise due to non-agricultural land cover. In our study, even with the interference of the non-agricultural land cover, our result was relatively accurate using the fused images. We believe that masking the non-agricultural areas would achieve an even higher accuracy. Future work should also test the benefits of including textural features, as they are often useful for high-resolution image classification. Due to the logistical constraints of our field data collection (e.g., acquisition time, data storage), we were not able to use time-series data for our UAV imagery. Although our study achieved a relatively good accuracy when using UAV images from a single date, repeat measurements at multiple dates may have yielded better results.
Our study highlights the feasibility of fusing UAV with Sentinel-2A images for crop classification and tests the optimal resolution of UAV images for fusing. With this method, Sentinel-2A's spectral information can be added to UAV images to improve the classification result. In addition, UAV data can provide crop height, which could be an important differentiating factor for classification.

Conclusions
Our results show that the images obtained by the fusion of Sentinel-2A data and UAV images combine the advantages of both data sources with little to no discernible disadvantages. In this study, combining Sentinel-2A's rich spectral information with the ultra-high spatial resolution and texture information of the low-cost UAV images yielded better classification results than using either data source separately. Fusing UAV with Sentinel-2A images can improve the accuracy of the classification and classify individual crops at the scale of the plot.
When comparing the different resolutions of the fused imagery, we found that the highest spatial resolution did not necessarily achieve the most accurate results. In our study, 0.10 m resolution UAV images fused with Sentinel-2A can generate good results with less input. Thus, one should consider the context and goal of the classification when choosing the spatial resolution.

Appendix A
From the test, Random Forest gets the best result with an overall accuracy of 88.32% and Kappa coefficient of 0.84, followed by Support Vector Machine (overall accuracy of 86.75% and Kappa coefficient of 0.82) and Neural Net (overall accuracy of 85.34% and Kappa coefficient of 0.81). There is little difference among the results of three classification algorithms, but since the Random Forest performed slightly better in our study area that was the method we chose for classification.