An Automated Machine Learning Framework in Unmanned Aircraft Systems: New Insights into Agricultural Management Practices Recognition Approaches

The recent trend of automated machine learning (AutoML) has been driving further significant technological innovation in the application of artificial intelligence through automated algorithm selection and hyperparameter optimization, which together yield deployable pipeline models for solving substantive problems. However, a knowledge gap remains in the integration of AutoML technology and unmanned aircraft systems (UAS) for image-based data classification tasks. Therefore, we employed a state-of-the-art (SOTA) and completely open-source AutoML framework, Auto-sklearn, which is built on one of the most widely used ML systems, Scikit-learn. It was combined with two novel AutoML visualization tools to focus particularly on the recognition and adoption of UAS-derived multispectral vegetation indices (VI) data across a diverse range of agricultural management practices (AMP). These include soil tillage methods (STM), cultivation methods (CM), and manure application (MA), applied in fields with four crop combinations (i.e., red clover-grass mixture, spring wheat, pea-oat mixture, and spring barley). These practices have not yet been efficiently examined, and accessible parameters for them are absent in UAS applications. We compared the performance of AutoML with that of three other common machine learning classifiers, namely Random Forest (RF), support vector machine (SVM), and artificial neural network (ANN). The results showed that AutoML achieved the highest overall classification accuracy after 1200 s of calculation. RF yielded the second-best classification accuracy, whereas SVM and ANN were less capable on some of the given datasets. Regarding the classification of AMPs, the best period for data capture was the crop vegetative growth stage (in May). CM yielded the best classification performance, followed by MA and STM.
Our framework presents new insights into plant–environment interactions with strong classification capabilities. It further illustrates that the automated system could become an important tool for advancing sustainable smart farming and field-based crop phenotyping research across a diverse range of agricultural environmental assessment and management applications.


Introduction
Unmanned Aerial Systems (UAS) are considered one of the most significant technologies for the further development of precision agriculture (PA) [1] and sustainable smart farming [2]. UAS are frequently employed for the surveillance of cultivated lands, providing effective solutions for accurate decision support, increasing farming efficiency, enhancing profitability, reducing environmental impacts, and driving further technological innovation [1,3,4]. UAS equipped with various novel sensor types can be exploited to improve agreement and synergy between imagery and field reference data. In addition, these systems can also identify the regional monitoring requirements, such as disease detection, growth observation, yield estimation, and weed management [5,6]. In PA, vegetation indices (VI) are one of the most widely used outputs from UAS imagery applications and assist in the delivery of dependable spatial and temporal information across multiple agricultural activities. VIs typically constitute mathematical combinations of individual or groups of bands from the electromagnetic spectrum and are intended to minimize the effect of external confounding factors while enhancing the detectability of vegetation characteristics [5,7]. Currently, UAS-based remote sensing techniques offer a notable contribution in field-based crop phenotyping investigations [8]. Immediate and accurate acquisition of crop phenotypic information in various agri-environments supports the exploration of genetic-environmental interactions from critical production traits to determine the inheritance information and expression patterns to increase crop yields and tolerance to abiotic/biotic stresses [9,10]. However, it is crucial to take into account that field conditions are notoriously diverse compared to experimental environments, such as greenhouses or laboratories. 
Moreover, the outputs and findings collected from controlled environments can be difficult to extrapolate onto field settings and can impair the interpretation and application of research schemes [10].
Therefore, a common approach when identifying multiple crop management procedures and their interaction with the environment involves a well-conducted randomized experimental design, in which different agricultural management practices (AMP) are imposed on crops [11]. Variety performance trials (VPT) are a valuable method to address this issue. VPTs are regularly implemented in AMP research activities to improve the understanding of diverse systems and develop environmental management recommendations for variety selection [12,13]. Concerning the AMPs trial criteria chosen and the recent growth in environmental protection awareness under the concepts of sustainable agriculture, the flexibility of environmentally friendly cultivation methods, such as reduced tillage and the application of various minerals and organic fertilizers, are being developed [14]. For example, tillage reduction is an essential characteristic of agricultural management that changes the soil either physically, chemically, mechanically, or biologically to create the appropriate conditions for seedling sprouting and healthy plant growth [15,16], whereas organic additions, such as manure or organic fertilizers, are widely used methods to enhance soil fertility [17]. Studying VPT datasets, however, provides unique analysis problems due to the structure, nature, and husbandry variations of each trial. The evaluation of differences in management practices could potentially be confounded due to their nested structure (e.g., as opposed to controlled replicated treatments) [18]. These AMPs have been increasingly proposed as an ecological method involving nutrient management, increased water holding capacity, and recoupled C and N cycling in agricultural ecosystems to improve sustainability [19,20]. 
Although the specification of weather, soil, and management practices in current cropping systems is vital for robust model simulation and evaluation, these data are usually unavailable for most cropping systems at adequate geospatial detail, which limits the ability to replicate measured yields of field crops that received the best possible AMPs across a broad range of environments [21]. Recently, the application of UAS combined with popular machine learning (ML) systems has made a significant contribution to VPT crop biomass estimation. These results strengthen the case for applying machine learning technology to diverse and complex AMP farmland classification applications.
Incorporating multisensory computing science approaches provides a wide range of valuable information for the expansion of precision farming practices [22]. ML techniques may not provide a universal solution in precision farming; however, these approaches enable better determination in verisimilitude scenarios with minimum human intervention. They provide not only a powerful and flexible framework for decision-making but also facilitate the integration of expert knowledge into the PA system [23]. Complexities become a drawback in VPTs since desired models need to contain training and testing databases and are often restricted by the number of pure line seeds, various AMPs (fertility test, tillage category, disease resistance, etc.), and confined areas with small sampling sizes to compensate for the labor-intensive fieldwork. Likewise, environmental factor interventions enhance obstacles in parameter selection in ML systems owing to the differences in location, climate, and soil properties [24]. Occasionally, even the same crop genotypes may not express similar spectral characteristics in UAS, which renders the models invalid. If the reference parameters exist to formulate relationship functions, the genuine implementation results are frequently unsatisfactory owing to mismatches between concepts and realities.
As an alternative, the concept of automated machine learning (AutoML) has arisen to reduce these data-driven costs and has become a significant topic as computing power continues to grow exponentially [25]. AutoML is defined as a combination of algorithm selection and hyperparameter optimization that aims to identify the combination of algorithm components with the best (cross-validated) performance, covering the path from raw datasets to a deployable ML pipeline model [26]. AutoML is designed to reduce the time demands on data scientists by enabling specialists to build ML applications automatically without requiring extensive knowledge of ML [27], and it entails the automated construction of an ML pipeline under a limited computational budget [28]. Recent AutoML systems, such as Auto-WEKA [29] and Auto-sklearn [30], are recommended as artificial intelligence-based solutions to the expanding challenge of ML applications; they combine a highly parametric ML framework with a Bayesian optimization method for a given dataset, significantly streamlining these steps for non-experts [30]. The standard procedure of ML modeling involves data pre-processing, feature engineering, feature extraction, feature selection, algorithm selection, and hyperparameter optimization to increase the model's predictive performance [31].
Although AutoML has achieved much in computer science and, recently, in UAS applications, for example, the approximation of root-zone soil moisture [32] with the AutoML interface H2O AutoML [33] and RGB-based crop phenotyping [34] with the neural architecture search system AutoKeras [35], it has not been widely applied in multispectral image analysis. A gap persists in the knowledge base for multispectral-based AMP analysis and agricultural land use studies, as well as in understanding the potential of remotely sensed solutions as field-based, multifunctional platforms for plant phenotyping and smart farming management. To address this knowledge gap, this study employed a state-of-the-art (SOTA) and completely open-source AutoML system, Auto-sklearn [30], which is built on Scikit-learn, one of the most widely used ML systems in the scientific Python community [36]. It was combined with two novel AutoML visualization tools to explore UAS-derived multispectral vegetation indices (VI) as an example for handling AMP classification tasks.
More precisely, the aims of this study were to (1) build an AutoML framework for UAS classification tasks; (2) explore the applicability of UAS sensors to recognize multiple AMP categories, namely soil tillage methods (STM), cultivation methods (CM), and manure application (MA), which have not yet been efficiently examined and lack accessible parameters in both the UAS and ML fields; and (3) compare AutoML's ability against different ML classifiers to identify image-based AMPs for diverse crop categories and their appropriate growth stages. To our knowledge, this paper is the first study to use an AutoML system with UAS-derived multispectral VIs for agricultural classification tasks. Moreover, this paper is the first to provide a novel AutoML framework across multiple AMP activities and to present new insights into UAS and ML optimization methods for future PA and crop phenotyping research.
Remote Sens. 2021, 13, 3190

Study Area and Experiment Layout
This study was conducted at the Agricultural Research Centre (ARC) in Kuusiku (58°58′52.7″N 24°42′59.1″E), Estonia (Figure 1a). The experimental area used in this study covered 226 hectares, of which the 2.87-hectare variety performance trial (VPT) area consisted of two soil types: Calcaric Cambisol and Calcaric-Leptic Regosol [37]. The experimental layout consisted of four types of common crops and their regular combinations in Estonia in 2019, i.e., Field 1: red clover 75% (Trifolium pratense L.) with grass 25% (Festuca pratensis) (RC + G); Field 2: spring wheat (SW); Field 3: pea and oat mixture (P + O); and Field 4: spring barley with under-sown red clover (SB + RC) (Figure 1b). This experimental design was developed to facilitate the understanding of the physiological conditions and yield performance capabilities of the chosen varieties and their combinations under three types of AMPs. To assess the UAS-based AMP detection capacity, the experiment was set up with three principal experimental factors (Figure 1c): (1) soil tillage methods (STM), with reduced tillage (R) (8-10 cm), ploughing (P) at a depth traditionally used in conventional tillage (18-20 cm), and disking (DP) (8-10 cm) as treatments; (2) cultivation methods (CM), with conventional farming with mineral fertilizer application (Cmin+), organic farming with mineral fertilizer application (Omin+), and organic farming without mineral fertilizer (Omin−) as treatments; and (3) manure applications (MA). Each field comprised 72 plots, which amounted to a total of 288 plots sampled within our study area. Figure 2 shows the workflow used to combine the UAS-based image collection, processing, sampling, and AutoML framework modified from [30]. A fixed-wing UAS eBee Plus (Sensefly Inc., Cheseaux-Lausanne, Switzerland) equipped with GNSS post-processed kinematic (PPK) capabilities was deployed with a Parrot Sequoia multispectral sensor (version 1.2.1, Parrot, Paris, France).
This UAS platform and sensor were used for image acquisition and captured imagery across four spectral bands: green (530-570 nm), red (640-680 nm), red-edge (730-740 nm), and near-infrared (770-810 nm). To facilitate seasonal image processing and AMP recognition, UAS images were captured on three dates in 2019 at the Kuusiku Research Center: 23 April (temperature: 16 °C, wind speed: 11 km h⁻¹ S, sunny), 30 May (temperature: 19 °C, wind speed: 12 km h⁻¹ WSW, overcast), and 10 July (temperature: 20 °C, wind speed: 3.6 km h⁻¹ NW, sun with minor cloud cover). The weather conditions in the 6 days prior to image acquisition are displayed in Supplementary Figure S1. The originally designed flight time was 37 min and 30 s per task over an area of 65.8 hectares (of which the area of interest in this study was 2.87 hectares). However, depending on the weather conditions and wind speed of the day, the eBee flight time may have differed slightly owing to the number of battery replacements (the endurance of one battery was approximately 20-30 min). This data capture protocol was designed to represent the reflectance spectrum characteristics of crops during different growth stages. Flight lines were set with a frontal image overlap of 80% and a lateral overlap of 75% at a target altitude of 120 m above ground level (AGL), resulting in a ground sampling distance (GSD) of 10 cm per pixel. All image data were captured between 10 a.m. and 2 p.m. to ensure consistent photo quality and to minimize lateral shading of crops within the VPT fields. An Airinov radiometric calibration target (Airinov, Paris, France) and a one-point calibration method [38] were used before each flight to enable post-flight radiometric correction of the multispectral imagery, removing dark current and lens vignetting effects during image post-processing [39].
Figure 2. Workflow combining the UAS-based image collection, processing, sampling, and AutoML framework modified from [30]. (b) The Auto-sklearn framework constructed ML pipelines automatically; these were proposed by the Bayesian optimization method with warm-started meta-learning, combined with a post hoc ensemble building approach to achieve robust performance (adapted from [30,40]). (c) The Yellowbrick visualization package was used for AutoML model evaluation. (d) PipelineProfiler, an interactive AutoML pipeline visualization tool that allows examination of the solution space of end-to-end ML pipelines, was used for pipeline visualization.

UAS Image Processing
For pre-processing UAS images, we used SenseFly eMotion 3, applying differential correction data (RINEX) provided by the GNSS CORS (Continuously Operating Reference Station) of Estonia for post-processed kinematic (PPK) corrections [41]. PPK has been reported to yield higher horizontal and vertical geotagging accuracy than ground control points (GCP) [42,43]. In our study, the correction reduced the UAS image positioning error from 5 m to under 0.06 m (less than one pixel). Pix4D v.4.3.31 ® (Pix4D SA, 1015 Lausanne, Switzerland) software was used to process and radiometrically correct the imagery (calibrated according to the differences between the measured values and the actual reflectance of the target [38]), as well as to generate the multispectral orthomosaics. These images were subsequently clipped with a one-meter inward buffer zone from each plot so that they represented only the extent of the VPT plots.
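The inward clipping step can be illustrated with a minimal sketch. It assumes, purely for illustration, that a plot is an axis-aligned rectangle given in pixel coordinates of the orthomosaic; the study itself digitized plot polygons in GIS software, and the function name and arguments below are hypothetical. With the reported GSD of 10 cm, a one-meter buffer corresponds to 10 pixels on each side.

```python
import numpy as np

GSD_M = 0.10     # ground sampling distance reported in the text (10 cm per pixel)
BUFFER_M = 1.0   # one-meter inward buffer reported in the text


def plot_mean_with_inward_buffer(raster, row0, row1, col0, col1,
                                 gsd_m=GSD_M, buffer_m=BUFFER_M):
    """Mean raster value inside a rectangular plot after shrinking it inwards.

    Assumption: the plot is the pixel window raster[row0:row1, col0:col1].
    """
    pad = int(round(buffer_m / gsd_m))  # buffer width expressed in pixels
    inner = raster[row0 + pad:row1 - pad, col0 + pad:col1 - pad]
    if inner.size == 0:
        raise ValueError("plot is too small for the requested buffer")
    return float(np.nanmean(inner))


# Synthetic plot: a noisy edge (zeros) surrounding a homogeneous interior (ones).
demo = np.zeros((40, 60))
demo[10:30, 10:50] = 1.0
demo_mean = plot_mean_with_inward_buffer(demo, 0, 40, 0, 60)
```

In this synthetic case the one-meter border is discarded entirely, so the plot mean reflects only the interior pixels.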

Vegetation Indices Calculation
In this study, nineteen VIs were chosen and calculated to address the issues of heterogeneous crop classes, soil types, and the current absence of valuable UAS-referenced parameters for AMPs (see Table 1). More specifically, Datt4, SRre, and NDVIre were selected due to their positive correlation with chlorophyll content [44][45][46]; MTVI, MSR, MSRre, RVIS, and WDRVI [47][48][49][50][51] are known to be sensitive to variations in leaf area index (LAI); GDVI was used for better estimation and characterization of lower vegetal land cover [52]; GIPVI
Table 1. Descriptions and formulas of the multispectral UAS-derived VIs used in this study. ρ_R refers to the reflectance of the red band, ρ_G to the reflectance of the green band, ρ_REG to the reflectance of the red-edge band, and ρ_NIR to the reflectance of the near-infrared band.

Vegetation Index Equation Reference
Datt4
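As a hedged illustration of how such indices are computed from band reflectances, the sketch below implements three of the indices named in the text using their commonly published forms: NDVI = (ρ_NIR − ρ_R)/(ρ_NIR + ρ_R), NDVIre = (ρ_NIR − ρ_REG)/(ρ_NIR + ρ_REG), and SRre = ρ_NIR/ρ_REG. These forms are an assumption here; the definitions in Table 1 take precedence. The function names are hypothetical.

```python
import numpy as np


def safe_ratio(numerator, denominator):
    """Element-wise ratio that returns NaN where the denominator is zero."""
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)
    out = np.full(np.broadcast(numerator, denominator).shape, np.nan)
    np.divide(numerator, denominator, out=out, where=denominator != 0)
    return out


def ndvi(nir, red):
    """Normalized difference vegetation index (assumed standard form)."""
    nir, red = np.asarray(nir, dtype=float), np.asarray(red, dtype=float)
    return safe_ratio(nir - red, nir + red)


def ndvi_re(nir, red_edge):
    """Red-edge NDVI (assumed standard form)."""
    nir, red_edge = np.asarray(nir, dtype=float), np.asarray(red_edge, dtype=float)
    return safe_ratio(nir - red_edge, nir + red_edge)


def sr_re(nir, red_edge):
    """Red-edge simple ratio (assumed standard form)."""
    return safe_ratio(nir, red_edge)


# Reflectance rasters would normally be 2-D arrays; scalars work the same way.
example = ndvi(np.array([0.5, 0.0]), np.array([0.1, 0.0]))
```

Pixels where both bands are zero (for example, no-data areas) yield NaN rather than raising a division warning, so they can be ignored when averaging per plot.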

Principal Component Analysis and VI Extraction
In this study, principal component analysis (PCA) was used to decrease the dimensionality of the data by calculating a series of new variables, or principal components, as linear combinations of the original parameters [71]. PCA was employed as an exploratory data analysis (EDA) technique to describe the relationship between three different agricultural management types (CM, MA, and STM) and the multispectral UAS-VIs. It was also used to test whether it could improve the classification efficiency of AMPs. PCA was conducted using R version 4.0.2 [72] and the FactoMineR package [73]. To extract the digital number (DN) values of each VI in the four experimental fields (72 plots in each field), a total of 288 plots were digitized in ArcGIS Pro 2.6.3 [74]. As stated previously, a one-meter buffer zone was applied inwards from each plot boundary to address potential edge effects from agricultural management, and the average VI values within the buffered plots were calculated. These extracted values were subsequently used for building the ML algorithms and for AutoML assessment and evaluation.
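The exploratory PCA can be sketched as follows. The study used R with FactoMineR; this is only a Python analogue with scikit-learn, run on synthetic data standing in for one field's table of 72 plots by 19 VIs. Standardizing the columns before PCA is an assumption of this sketch, and `explore_pca` is a hypothetical helper name.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def explore_pca(vi_table, n_components=2):
    """Return plot scores on the leading components and their variance shares."""
    scaled = StandardScaler().fit_transform(vi_table)  # assumed: unit-variance scaling
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(scaled)
    return scores, pca.explained_variance_ratio_


# Synthetic stand-in: 19 strongly collinear "indices" driven by two latent factors,
# mimicking the fact that all VIs derive from the same four spectral bands.
rng = np.random.default_rng(0)
latent = rng.normal(size=(72, 2))
mixing = rng.normal(size=(2, 19))
vi_table = latent @ mixing + 0.05 * rng.normal(size=(72, 19))

scores, ratio = explore_pca(vi_table)
print("variance captured by PC1 + PC2:", ratio.sum())
```

The `scores` array holds one row per plot and can be plotted with treatment labels to inspect how well AMP categories separate on PC1 and PC2.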

AutoML Modeling with Auto-Sklearn
Auto-sklearn [30], a robust and efficient AutoML system first introduced in 2015 and upgraded in 2020 [75], was used in this study. Auto-sklearn is built on the Python Scikit-learn machine learning package. It uses 15 classifiers, 14 feature pre-processing methods, and four data pre-processing methods, giving rise to a structured hypothesis space with 110 hyperparameters [76]. It improves on existing AutoML methods by automatically taking into account previous performance on similar datasets and by constructing ensembles from the models evaluated during the optimization process. At its core, this method combines the highly parametric ML framework with automatically constructed ML pipelines suggested by the Bayesian optimization method sequential model-based algorithm configuration (SMAC) [77]. SMAC can automatically construct ML pipelines that include feature selection (i.e., removing insignificant features), transformation (i.e., dimensionality reduction), classifier selection from support vector machines (SVM) [78], Random Forest (RF) [79], and other algorithms, hyperparameter optimization, etc. It then uses a Random Forest technique for swift cross-validation, evaluating one fold at a time and discarding poorly performing hyperparameter settings at an early stage. It achieves competitive classification accuracy and offers novel pipeline operators that significantly increase classification accuracy on the datasets [80]. Conventionally, highly correlated VIs are removed during the feature selection stage to eliminate the influence of collinearity. This step was omitted here since Auto-sklearn deals with low-dimensional optimization problems [81].
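The idea of searching jointly over pre-processors, classifiers, and their hyperparameters can be sketched with plain Scikit-learn. This is not Auto-sklearn and not SMAC: random search stands in for Bayesian optimization, there is no meta-learning or ensemble building, and the search space and synthetic data below are illustrative assumptions only.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# One pipeline template; the search swaps whole steps as well as their settings.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("features", "passthrough"),
    ("clf", SVC()),
])

feature_options = ["passthrough", SelectKBest(f_classif, k=8), PCA(n_components=5)]

# A small structured space: each dict is one algorithm with its own hyperparameters.
search_space = [
    {"features": feature_options,
     "clf": [RandomForestClassifier(random_state=0)],
     "clf__n_estimators": [50, 100, 200],
     "clf__min_samples_leaf": [1, 2, 4]},
    {"features": feature_options,
     "clf": [SVC()],
     "clf__C": [0.1, 1, 10, 100],
     "clf__gamma": ["scale", 0.01, 0.1]},
]

# Synthetic stand-in for one field: 72 plots, 19 features, three treatment classes.
X, y = make_classification(n_samples=72, n_features=19, n_informative=6,
                           n_redundant=8, n_classes=3, n_clusters_per_class=1,
                           class_sep=2.0, random_state=0)

search = RandomizedSearchCV(pipeline, search_space, n_iter=12, cv=3, random_state=0)
search.fit(X, y)
print(type(search.best_estimator_.named_steps["clf"]).__name__, search.best_score_)
```

A Bayesian optimizer such as SMAC differs from this sketch in that it uses the results of earlier evaluations to decide which configuration to try next instead of sampling blindly.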
In this study, all calculations were performed on the open-source operating system Linux with an Intel Core i5-1035G1 CPU (1.00 GHz) and 16 GB RAM. For the AutoML framework, the steps described in [30] were followed, with some modifications for this study (Figure 2b). First, the system used a supplementary approach of extensively applied meta-learning methods to train machine learning models over statistical attributes of datasets and estimated the model parameters that yielded the best precision [82]. Second, the system automatically built ensembles of the models considered by Bayesian optimization. Third, the system constructed a highly parameterized ML framework from high-performing classifiers and pre-processors implemented within the ML framework. Finally, the system performed broad empirical analysis using a diverse collection of datasets to demonstrate that the resulting Auto-sklearn system outperformed preceding AutoML methods. The major AutoML parameter settings of this study are described in Table 2. Due to computational resource constraints and to test the efficiency of AutoML, we first limited the CPU time for each run to 60 s and the running time for evaluating a single model to 10 s as an example of rapid model selection. We then used a total of 1200 s with a 10-s single-model computing time to represent more thorough processing of AutoML models. The data were analyzed separately for the four crop fields (F1-F4), with each field containing 72 plots (n = 72) split into training and validation sets (0.6/0.4) for classification modeling. A recent review of supervised ML methods applied in land-cover image classification, covering 220 related articles, found that Random Forest (RF), support vector machine (SVM), and artificial neural network (ANN) classifiers were among the most commonly used ML techniques [83].
Therefore, in this study, these popular ML classifiers were selected for comparison against the accuracy of AutoML (with a 60-s run and a 1200-s run of Auto-sklearn). These algorithms were programmed in Python with the ML library Scikit-learn (0.24.2) [76] using the following parameter settings: sklearn.ensemble.RandomForestClassifier (100 trees; min_samples_split (2); leaf_node (1)); sklearn.svm.SVC (cost (C = 500); gamma (0.5); epsilon (0.01)); and sklearn.neural_network.MLPClassifier (alpha (0.00005); maximum number of iterations (100,000)). Parameters not mentioned were left at the Scikit-learn default settings; the accuracy calculations are given in Table 3.
Table 3. The confusion matrix-based accuracy evaluation equations used throughout this study.
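The comparison setup described above can be sketched as follows, under several stated assumptions. The two time budgets are mapped onto Auto-sklearn's `time_left_for_this_task` and `per_run_time_limit` arguments, which is our reading of the text and not a confirmed copy of Table 2. "leaf_node (1)" is interpreted as `min_samples_leaf=1`. `SVC` has no `epsilon` parameter in Scikit-learn, so that setting is omitted. The stratified split and the synthetic data are also assumptions, and the helper names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Assumed mapping of the two reported budgets onto Auto-sklearn arguments (seconds).
AUTOML_BUDGETS = {
    "rapid": {"time_left_for_this_task": 60, "per_run_time_limit": 10},
    "extended": {"time_left_for_this_task": 1200, "per_run_time_limit": 10},
}


def make_baselines(seed=0):
    """Baseline classifiers with the settings listed in the text."""
    return {
        "RF": RandomForestClassifier(n_estimators=100, min_samples_split=2,
                                     min_samples_leaf=1, random_state=seed),
        "SVM": SVC(C=500, gamma=0.5),
        "ANN": MLPClassifier(alpha=0.00005, max_iter=100_000, random_state=seed),
    }


def make_automl(budget="rapid", seed=0):
    """Build an Auto-sklearn classifier; requires the auto-sklearn package."""
    from autosklearn.classification import AutoSklearnClassifier
    return AutoSklearnClassifier(seed=seed, **AUTOML_BUDGETS[budget])


def compare(X, y, models, seed=0):
    """Fit each model on 60% of the plots and score accuracy on the other 40%."""
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=seed)
    return {name: accuracy_score(y_va, model.fit(X_tr, y_tr).predict(X_va))
            for name, model in models.items()}


# Synthetic stand-in for one field (72 plots, 19 VIs, three treatment classes).
X, y = make_classification(n_samples=72, n_features=19, n_informative=6,
                           n_redundant=8, n_classes=3, n_clusters_per_class=1,
                           class_sep=2.0, random_state=0)
scores = compare(X, y, make_baselines())
```

To include AutoML in the same comparison, `make_automl("extended")` could be added to the dictionary passed to `compare`, at the cost of the full 1200-s budget per field and treatment factor.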

AutoML Model Evaluation and Visualization
For the visualization and evaluation of the Auto-sklearn model, the workflow generally included multiple iterations through feature engineering, algorithm selection, and hyperparameter tuning [84]. In this study, the open-source visual steering tool Yellowbrick (essentially a wrapper around the Scikit-learn API) was used for AutoML evaluation [85]. Yellowbrick helps to assess the stability and predictive value of ML models and delivers visualizations for our AutoML classification models. The accuracy evaluation of the AutoML classification was based on the confusion matrix, whose elements are defined as true positive (TP), false positive (FP), true negative (TN), and false negative (FN) and are well described in [86]. The equations used in this study are described in Table 3. The receiver operating characteristic (ROC) curve, with the false positive rate (FPR) on the x-axis and the true positive rate (TPR) on the y-axis, was used to show the relationship between specificity and sensitivity for each possible cut-off [87], and the area under the curve (AUC), which ranges from 0 to 1, was used to summarize the trade-off between the classifier's sensitivity and specificity [87,88]. Macro- and micro-averaged ROC curves were calculated to evaluate overall classifier performance in multi-class problems. In this approach, the ROC curve was calculated anew, based upon the true positive and false positive rates for the whole dataset (by weighting the curves by the relative frequencies of the classes and then averaging them) [89,90]. In addition, the precision-recall (PR) curve was calculated for different probability thresholds. PR curves were used in cases where the observations were imbalanced between the classes [91], as another classification evaluation standard to complement the ROC curve.
The prediction errors (confusion matrix) and a classification report displaying precision, recall, and F1-score [92] (Table 3) per class were presented as heatmaps in our study.
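The confusion-matrix measures and the averaged AUC values can be sketched directly with Scikit-learn's metric functions. The study produced its figures with Yellowbrick; the helper names below are hypothetical, the formulas are the standard definitions (precision = TP/(TP + FP), recall = TP/(TP + FN), F1 = 2·precision·recall/(precision + recall)), and the macro-average is taken here as the unweighted mean of the per-class AUCs, which is one common convention and an assumption of this sketch.

```python
import numpy as np
from sklearn.metrics import auc, confusion_matrix, roc_curve
from sklearn.preprocessing import label_binarize


def per_class_report(y_true, y_pred, classes):
    """Precision, recall, and F1 per class plus overall accuracy from the confusion matrix."""
    cm = confusion_matrix(y_true, y_pred, labels=classes)  # rows: true, columns: predicted
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = cm.sum() - tp - fp - fn
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1,
            "tn": tn, "accuracy": tp.sum() / cm.sum()}


def micro_macro_auc(y_true, proba, classes):
    """Micro-averaged AUC (pooled decisions) and macro-averaged AUC (mean over classes)."""
    y_bin = label_binarize(y_true, classes=classes)
    if y_bin.shape[1] == 1:  # two-class case: expand to one column per class
        y_bin = np.hstack([1 - y_bin, y_bin])
    fpr, tpr, _ = roc_curve(y_bin.ravel(), proba.ravel())
    micro = auc(fpr, tpr)
    per_class = []
    for k in range(len(classes)):
        fpr_k, tpr_k, _ = roc_curve(y_bin[:, k], proba[:, k])
        per_class.append(auc(fpr_k, tpr_k))
    return float(micro), float(np.mean(per_class))


# Tiny illustrative example with three classes (not data from the study).
classes = [0, 1, 2]
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
report = per_class_report(y_true, y_pred, classes)
perfect_scores = np.eye(3)[y_true]  # scores of a classifier that is always right
micro, macro = micro_macro_auc(y_true, perfect_scores, classes)
```

In practice, `proba` would come from the fitted model's `predict_proba` output on the validation plots, with columns ordered as in `classes`.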
Although the AutoML framework facilitates the construction of models, their black-box nature, the complexity of the underlying algorithms, and the large number of pipelines they derive reduce trust in AutoML pipeline systems [93]. Therefore, in our study, PipelineProfiler [94] was used for AutoML pipeline visualization. PipelineProfiler is a SOTA interactive visual analytics tool for AutoML that allows examination of the solution space of end-to-end ML pipelines. It offers a better understanding of how the AutoML pipelines are generated and insights into how they can be optimized. The resulting interactive AutoML pipeline matrix plots show the pipeline flowchart; the primitives used by the pipelines; one-hot-encoded hyperparameters for the primitives across pipelines; the accuracy ranking; the primitive contribution view; and the correlation score of class balancing with accuracy. These calculations and representations are described in detail in [94].
Figure 3 displays onsite observations of the crop VPTs (i.e., Field 1 (F1) (Figure 3a) and Field 2 (F2) (Figure 3b) with CM treatments) and one of the VIs (NDVI; Figure 3c) captured on 10 July from F1 and F2. The onsite photographs of F1 and F2 in July show that the AMP treatments were not readily distinguishable. In addition, the NDVI image shows heterogeneity within the plots, which may be caused by edge effects or uneven fertilization. For this reason, we used the plot-average value of the pixels remaining after inward boundary clipping to reduce the noise.

Monthly PCA Analysis in Various Crop Growth Periods
PCA was conducted as the first step of data exploration in this study to gain an understanding of the relationship between VIs and different AMP categories during the three flight periods (April, May, and July) with their corresponding growing stages (Figure 4). The results showed that on 30 May and 10 July, PC1 and PC2 captured most of the variation in fields F1 to F4, with 98.3%, 98.7%, 97.3%, and 97.6%, respectively, on 30 May (Figure 4b), and with 98.7%, 94.0%, 95.4%, and 95.4%, respectively, on 10 July (Figure 4c), followed by 23 April (Figure 4a). In addition, among the three flight periods, the PCA results in May and July provided better separation of the three AMP categories throughout the four crop cultivation areas, based on the colored concentration ellipses whose sizes were determined by a 0.95 probability level. In terms of the AMP category, the subclasses of CM (Cmin+ and the other two categories) and MA (M+ and M−) formed non-overlapping clusters more readily, followed by STM. In terms of crop types, F1 (RC + G) was better clustered in April, while F2 (SW), F3 (P + O), and F4 (SB + RC) were better clustered in May or July. Given the better clustering performance in May, follow-up AutoML analysis was conducted on the UAS-VI data of this month. In general, feature selection (finding the most relevant spectral bands) and extraction (deriving a reduced set of new significant variables) are commonly used to solve the collinearity and overfitting problems in the dimensionality reduction process [95]. However, in our preliminary experiments, feature extraction using PCA (95%) did not significantly improve the classification efficiency. Therefore, these PCA results were used simply as a reference basis for AutoML classification.
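A preliminary check of this kind can be sketched as a comparison between a classifier trained on all features and the same classifier trained on PCA components. The sketch assumes that "95%" refers to retaining the components that explain 95% of the variance; the Random Forest classifier, the three-fold cross-validation, and the synthetic data are likewise illustrative assumptions and not the authors' setup.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for one field: 72 plots, 19 partly redundant features.
X, y = make_classification(n_samples=72, n_features=19, n_informative=6,
                           n_redundant=8, n_classes=3, n_clusters_per_class=1,
                           class_sep=2.0, random_state=0)

raw = make_pipeline(StandardScaler(), RandomForestClassifier(random_state=0))
reduced = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),  # keep components explaining 95% variance
    RandomForestClassifier(random_state=0),
)

acc_raw = cross_val_score(raw, X, y, cv=3).mean()
acc_reduced = cross_val_score(reduced, X, y, cv=3).mean()
n_kept = reduced.fit(X, y).named_steps["pca"].n_components_
print(f"all features: {acc_raw:.2f}; {n_kept} PCA components: {acc_reduced:.2f}")
```

If the two accuracies are similar, as the text reports for the preliminary experiments, the extraction step adds little and the full VI set can be passed to the classifier.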

AutoML ROC and AUC Evaluation of AMP Recognition in May
The ROC/AUC results for the individual subclasses and their averages were calculated to evaluate the AMP classification ability of AutoML on the UAS-VIs captured in May (Figure 5). AUC values were categorized in this study as follows: AUC = 0.5, no discrimination; 0.7 ≤ AUC < 0.8, acceptable discrimination; 0.8 ≤ AUC < 0.9, excellent discrimination; and AUC ≥ 0.9, outstanding discrimination [87].

The AutoML results showed that the micro-average ROC for CM classification was highest in F1 (RC + G) and F2 (SW) (AUC = 0.95 and 0.92, respectively). In particular, for the subclass Omin−, the AUC reached 0.99 in both fields. These were followed by F4 and F3 (P + O), with micro-average AUCs of 0.86 and 0.75, respectively (Figure 5a). For MA, the micro-average AUC was higher in F3 and F4 (AUC = 0.83 and 0.89, respectively), followed by F1 (AUC = 0.71). F2 performance for MA was the worst (AUC = 0.51), with no discrimination ability (Figure 5b). In contrast, STM classification results were generally poor, with better results only present in F3, while the other fields showed larger divergence in classification results among the subclasses (DP, P, and R), as shown in Figure 5c. Overall, the AutoML classification ability from UAS-VIs was best for CM, followed by MA and STM.
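The micro-average AUC and the discrimination categories used in this section can be illustrated with the sketch below. It is not the evaluation code of the study: the function names and toy scores are hypothetical, boundary values are assigned to the higher category as a choice of this sketch, and the label for AUC values below 0.7 is an assumption because the text does not name that band.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def micro_average_auc(true_classes, class_scores, classes):
    """Pool every one-vs-rest (sample, class) decision, then compute one AUC."""
    labels, scores = [], []
    for truth, row in zip(true_classes, class_scores):
        for name in classes:
            labels.append(1 if truth == name else 0)
            scores.append(row[name])
    return binary_auc(labels, scores)


def describe_auc(auc):
    """Map an AUC to the discrimination categories used in the text."""
    if auc >= 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    return "below acceptable"  # band not named in the text; label is an assumption


# Toy scores for three CM subclasses (hypothetical numbers, not study data).
classes = ["Cmin+", "Omin+", "Omin-"]
truth = ["Cmin+", "Omin+", "Omin-", "Omin+"]
scores = [
    {"Cmin+": 0.7, "Omin+": 0.2, "Omin-": 0.1},
    {"Cmin+": 0.1, "Omin+": 0.5, "Omin-": 0.4},
    {"Cmin+": 0.2, "Omin+": 0.3, "Omin-": 0.5},
    {"Cmin+": 0.1, "Omin+": 0.3, "Omin-": 0.6},
]
toy_auc = micro_average_auc(truth, scores, classes)
toy_label = describe_auc(toy_auc)
```

Micro-averaging pools all subclass decisions before computing a single AUC, so subclasses with more samples weigh more heavily than they would under a per-class (macro) average.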

AutoML Precision-Recall, Prediction Error, and Classification Report of CM Recognition
Among the AMP classification results in May for the four crop types (Figure 5), CM yielded the best overall ROC/AUC performance. Therefore, we used the precision-recall (PR) curves, prediction error, and classification report plots to gain an in-depth understanding of the classification status of the CM treatments (Figure 6).

The PR curve of F4 CM shows the trade-off between a classifier's precision and recall from UAS VIs in May (Figure 6a), where a model with perfect performance is depicted at the coordinate (1, 1). A curve that tends towards the (1, 1) coordinate represents a well-performing model, whereas a no-skill classifier is depicted as a horizontal line on the plot with a precision that is proportional to the number of positive examples in the dataset. For a balanced dataset, this value ought to be 0.5 [96]. The results showed that the classifications of F1 and F2 were promising, with average PR values of 0.90 and 0.85, respectively, while the results of F3 and F4 were poor (0.50 and 0.49). The prediction error graph (Figure 6b) further shows that, in F3 and F4, the misclassification of Cmin+ was low, whereas confusion between Omin+ and Omin− was more common. The precision, recall, and F1-score results of the various cultivation method subclasses can also be compared in the heatmap (Figure 6c) to evaluate the classification accuracy.
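The per-class quantities behind the classification report and the no-skill baseline of the PR plot can be sketched as follows. The function names and the toy labels are hypothetical and do not reproduce the study's data. Note that the 0.5 baseline applies to a balanced two-class problem; for one class out of three balanced classes, the positive fraction, and hence the baseline, is 1/3.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """One-vs-rest precision, recall, and F1-score for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def no_skill_precision(y_true, positive):
    """Height of the no-skill line: the fraction of positive examples."""
    return sum(1 for t in y_true if t == positive) / len(y_true)


# Toy labels (hypothetical): Cmin+ is recognized, Omin+ and Omin- are confused.
y_true = ["Cmin+", "Omin+", "Omin-", "Omin+", "Omin-", "Cmin+"]
y_pred = ["Cmin+", "Omin-", "Omin-", "Omin+", "Omin+", "Cmin+"]

cmin_report = precision_recall_f1(y_true, y_pred, "Cmin+")
omin_plus_report = precision_recall_f1(y_true, y_pred, "Omin+")
omin_plus_baseline = no_skill_precision(y_true, "Omin+")
```

The toy labels imitate the pattern described for F3 and F4, where Cmin+ is classified reliably and most errors are swaps between Omin+ and Omin−.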

AutoML Pipeline Visualization
An interactive AutoML visualization tool, PipelineProfiler, was used in this study. Figure 7 shows the CM classification results across the four crop fields in May, including the accuracy of the AutoML pipelines with the running time set at 60 s, the comparison of primitives across pipelines, and the real-time hyperparameter selection strategy. The results demonstrated that the best classifier found for Field 1 was linear discriminant analysis (LDA) [97] (Figure 7a); for Field 2, it was the Extra Trees algorithm [98] (Figure 7b); for Field 3, it was LDA (Figure 7c); and for Field 4, it was Random Forest (RF) (Figure 7d). The hyperparameters found by AutoML for each classifier are also shown in the figures.

Comparison of Performance between AutoML and Other Machine Learning Technologies
Because AutoML requires extensive calculations and multiple classifier selections during its initial stage, a processing time of 60 s may not fully reflect its performance. To evaluate the effect of processing time, we ran AutoML with limits of 60 s (the original running time) and 1200 s and compared the AMP classification accuracy with that of the RF, SVM, and ANN algorithms (Table 4). The results demonstrated that, given the many combinations of ML algorithms included in AutoML, classification accuracy was poor with 60 s of computing time; AutoML (60 s) performed worst in the F1 CM, F2 STM, and F3 CM classifications compared to RF, SVM, and ANN. However, when the processing time was increased to 1200 s, the classification accuracy of AutoML improved. Overall, the AutoML (1200 s) and RF classifiers produced the best classification accuracy in 5 and 3 AMP cases, respectively (in bold black), and did not produce the worst accuracy (in bold red) in any instance. SVM and ANN performed best in 3 and 5 cases, respectively; however, these methods also consistently produced low-performing classifiers in other AMP cases.
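The counting of best and worst results per method described above can be sketched as follows. The function name is hypothetical, ties are credited to every tied method (matching the "best or equal best" wording used later in the text), and the accuracies are placeholders for illustration only; they are not the values in Table 4.

```python
def tally_best_and_worst(accuracy_table):
    """Count, per method, how often it ties for the best or worst accuracy."""
    methods = sorted({m for scores in accuracy_table.values() for m in scores})
    best = {m: 0 for m in methods}
    worst = {m: 0 for m in methods}
    for scores in accuracy_table.values():
        top, bottom = max(scores.values()), min(scores.values())
        for method, accuracy in scores.items():
            if accuracy == top:
                best[method] += 1
            if accuracy == bottom:
                worst[method] += 1
    return best, worst


# Placeholder accuracies for illustration only; these are NOT the values in Table 4.
placeholder_table = {
    "task A": {"AutoML (1200 s)": 0.80, "RF": 0.80, "SVM": 0.60, "ANN": 0.70},
    "task B": {"AutoML (1200 s)": 0.75, "RF": 0.70, "SVM": 0.65, "ANN": 0.90},
    "task C": {"AutoML (1200 s)": 0.85, "RF": 0.80, "SVM": 0.85, "ANN": 0.50},
}
best_counts, worst_counts = tally_best_and_worst(placeholder_table)
```

Counting both extremes per method separates classifiers that are occasionally best but unstable from those that are never the worst choice.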

Discussion
This paper is the first study to use an auto-learning system with UAS multispectral-derived VIs for agricultural classification purposes. The study provides a novel AutoML framework across multiple AMP activities and presents a UAS and ML methodology optimized for future PA and crop phenotyping research.

Applicability and the Impact of the AutoML Method in UAS
In this study, we employed a SOTA, open-source AutoML framework for automatic, rapid multispectral image classification and for assistance with otherwise problematic hyperparameter tuning. This technology brings several benefits and enhances the application of UAS to classification tasks in environmental and ecological research.
First, UAS-related classification research publications have significantly increased within recent years, with over a hundred articles published since 2017. This substantial adoption of UAS-related classification approaches demonstrates their impact and the mounting interest in such research issues [99]. Our UAS-AutoML framework may also be implemented in other UAS classification research, such as research employing multiple sensors (e.g., thermal, visible light, hyperspectral, radar, or light detection and ranging (Lidar) sensors) across a range of contemporary agricultural classification activities (e.g., weed management [100,101], crop phenotyping [9,102–104], and disease monitoring [105,106]), as well as research focused on ecological classification schemes, multispectral-based plant community mapping [107], and coastal wetland vegetation classification [108].
Second, the AutoML framework quickly provided usable classifiers and hyperparameter selections for previously unexplored UAS classification tasks. For example, in the current study, the parameters and applicable classifiers for AMPs were unknown a priori. Nevertheless, AutoML provided a promising and efficient performance ranking of classifiers for model selection. As the results in Figure 7 show, LDA (Figure 7a,c) and Extra Trees (Figure 7b) were chosen as the best classifiers for the corresponding VPT fields in the AMP recognition task. These ML methods have been less frequently applied and referenced in the field of UAS [83]. These findings illustrate that AutoML has the potential to locate alternative ML approaches that might customarily be ignored by investigators facing unfamiliar classification subjects.
Third, AutoML can be given a time limit, which gives the researcher the flexibility to find the most suitable model within the required time. In general, a longer time setting allows for increasingly accurate results with additional classifier combinations. Since our experiments did not involve substantially large datasets, we focused on a time setting close to the minimum limit of AutoML calculation (60 s of total CPU operation, which can be up to 3000 s, and 10 s for a single ML algorithm computation) to highlight the flexibility and rapid performance of AutoML.
Finally, the recently released interactive AutoML visualization system PipelineProfiler was employed in our research to assist in screening classifiers and to provide a reference for fine-tuning parameters when analyzing UAS data. This interaction included adjustable time, accuracy ranking, and selection of hyperparameters in response to the requirements of customized UAS modeling. Our results showed that AutoML computations within a 60 s run produced between 11 and 12 pipelines (Figure 7), which might offer a useful foundation for obtaining adequate outcomes in most cases with minimal attempts and time.
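The time limits mentioned above (60 s in total and 10 s per algorithm) map onto two constructor arguments of Auto-sklearn, `time_left_for_this_task` and `per_run_time_limit`. The sketch below shows one way to set them; the helper `automl_time_settings` is hypothetical, the per-run value for the 1200 s run is an assumption because the text does not state it, and the classifier is only instantiated if Auto-sklearn is installed.

```python
def automl_time_settings(total_seconds, per_run_seconds):
    """Keyword arguments for Auto-sklearn's two time limits."""
    if per_run_seconds <= 0 or per_run_seconds > total_seconds:
        raise ValueError("per-run limit must be positive and fit in the total budget")
    return {
        "time_left_for_this_task": total_seconds,  # overall search budget (s)
        "per_run_time_limit": per_run_seconds,     # cap for one pipeline fit (s)
    }


short_run = automl_time_settings(60, 10)   # settings reported in the text
long_run = automl_time_settings(1200, 10)  # per-run value here is an assumption

try:
    from autosklearn.classification import AutoSklearnClassifier
except ImportError:  # Auto-sklearn is optional for this sketch
    AutoSklearnClassifier = None

if AutoSklearnClassifier is not None:
    automl = AutoSklearnClassifier(**short_run)
    # automl.fit(X_train, y_train) would start the search; X_train and y_train
    # stand for the VI features and AMP labels and are not defined here.
```

Keeping the two limits in one place makes it straightforward to repeat the same experiment with the short and the long budget.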

The Impact of Algorithm Selection, Cultivated Period, and Crop Types in AutoML AMP Recognition
In terms of algorithm selection in our AMP classification results, AutoML suggested different classifiers as the best performers for different crop types, even within the same AMP category (Figure 7). We can conclude that applying AutoML to UAS-derived multispectral VI data allowed for the consideration of a variety of algorithm combinations to meet the complexity of the VPT field. We also compared the three most used ML algorithms in the UAS classification field (RF, SVM, and ANN) with AutoML (Table 4). The overall performance showed that AutoML (with a 1200 s CPU duration) provided the five best (or equal best) accuracy results (shown in bold black in Table 4). Interestingly, in all tests, the AutoML (1200 s) and RF methods were never found to be the worst-performing methods (shown in bold red). Moreover, although the ANN method provided five of the best classification accuracy results, it also produced five of the worst. Similar outcomes were observed for the SVM and AutoML 60 s runs.
From our results, we can deduce that increasing the computing time has the potential to improve the accuracy and stability of AutoML classification performance under certain AMP conditions. The results also highlight the potential to include AutoML methods in solving common classification problems. Similar rankings were reported in a study that compared numerous classifiers with Auto-sklearn, where the RF classifier presented the strongest performance and SVM showed robust performance for some datasets [30]. Since Auto-sklearn builds its classifiers on Scikit-learn, it should theoretically be able to find the hyperparameters of the RF algorithm that was selected for Table 4. Despite the strong performance of AutoML (1200 s), several results were inferior to those of the RF classifier (i.e., Field 1 MA; Field 3 MA and STM). Moreover, in a few cases, the accuracy of AutoML (1200 s) was even lower than that of the 60 s run (i.e., Field 1 MA, Field 3 STM, and Field 4 CM). It may be that the algorithm's computations involve factors other than accuracy, and that the model it uses to tune the parameters actively tries to avoid overfitting. This can lead to a situation where the most accurate model on the testing or training data is not the one that generalizes best to real data. In addition, developers from the Auto-sklearn team have previously described that, during the ensemble selection phase, the method can add numerous substandard models to the final ensemble, and unregularized selection may lead to overfitting with a small number of candidate models [40]. This result shows that there is still room for improvement in AutoML calculation methods.
In terms of cultivated period and crop type, the PCA results across the monthly crop growth stages indicated that the VPT showed better clustering performance during the flight in May, with a confidence level of 0.95 (Figure 4b). This flight period was therefore used for our AMP classification study. Conversely, in the case of the more homogeneous crop type (Field 3 (WS)), and despite the promising classification result in CM, the results of MA and STM were not as effective as for other crops (Figures 5 and 6). These results may suggest that higher heterogeneity of cultivation within the plots (i.e., F1, F3, and F4) does not necessarily affect the classification ability. However, concerning the Field 3 results from the PCA in May (stage of stem elongation) and July (stage of flowering), the MA clustering ability was better with a 0.95 confidence level in both months, and the accuracy was later improved in the classification analysis. Our results demonstrate that, although the feature selection stage of AutoML is a black box, we can still preliminarily determine the potential predictive ability of the AutoML model from the PCA results and reduce the cost of period selection, as we did in this study. In addition, this study has contributed evidence on the classification obstacles in the case of STM, which may be caused by the orientation of images taken over vegetation or soil with uniform texture and recursive patterns, a suboptimal flight configuration [109], or an unfavorable selection of VIs. Some studies also suggest that the use of grey-level co-occurrence matrix (GLCM)-based texture information [100,110], semantic segmentation [111], or edge computing [112] can improve the accuracy of UAS-ML classification of crop categories. These may be applicable technologies for AMP classification in the future. Improving the applicability and optimization of this framework, as well as the visualization of feature importance, will require further work by AutoML developers and feedback from UAS applications.
Currently, multispectral indices have been effectively applied at the satellite level in some AMP image analysis studies, using the color, texture, and shape factors of agricultural land. These include the identification of conservation tillage methods [113] and pixel-based or object-based classification of agricultural landscapes [114,115]. AMP applications are indispensable for environmental monitoring, for facilitating agricultural decision-making regarding the adoption of practices driven by the growing demand for conservation agriculture [116], and for their potential upscaling ability to accelerate land cover classification studies. Recently, combining commonly adopted management practices with UAS multispectral VI research has gradually gained attention and has been applied to cotton and sorghum fields [117]. Our study has shown the effective application of UAS sensors to recognize multiple AMP categories. More specifically, a UAS-AutoML approach can improve the classification ability under specific crop AMPs; at this study site, classification performed best for CM, followed by MA and STM.

The Limitations of Our Method
In this study, not all classifiers computed within the Auto-sklearn system could be backtracked and reviewed to investigate the individual feature importance rankings of VIs, which limits their ability to assist in the selection of suitable VIs for AMP classification tasks. However, our efforts to achieve a wide-ranging and well-considered predictor collection through a variety of VI combinations may lead to performance improvements. This study may also be limited by the location, the crop categories selected, and the varieties present at the study site. These issues can be addressed by including a wider range of VPTs at multiple study sites and across a greater diversity of crop types in future investigations. Given the characteristic complexity and repeatability of VPTs, we must recognize that the small sample size and the potential interaction effects between trials were not fully addressed. A potential solution worth pursuing may be to increase the VPT sample size and/or the number of segments per plot, ultimately increasing the training samples for AutoML calculation. The applicability of the AutoML framework will still require more UAS-based tests in the future to demonstrate its true potential and effectiveness.

Conclusions
First, our study demonstrated a novel UAS technology and a state-of-the-art AutoML framework across multiple AMP tasks through non-destructive and cost-effective approaches. The scientific merit of this article lies in utilizing artificial intelligence to replace human judgment in UAS classification analysis through its automated data pre-processing, model selection, feature engineering, and hyperparameter optimization capabilities. Furthermore, it provided innovative insights into agricultural management practices, accelerated the progress of intelligent in-field UAS monitoring systems, and established future crop phenotyping abilities. In our study, AutoML embodied "learning how to learn" for any given UAS subject, and this is the first study of its kind to apply an auto-learning system to AMP classification tasks using multispectral-derived VI data.
Second, in this study, we employed an AutoML workflow combined with two innovative visualization tools. We performed three multispectral UAS flights at the farm scale over VPTs of four crop types (RC + G, SW, P + O, and SB + RC) under three AMP treatments (CM, MA, and STM). In addition, we compared AutoML performance with that of three widely used ML methods. The comparison showed that AutoML achieved the largest number of best classification accuracies after 1200 s of calculation, without any of the worst-performing classifications on the given datasets. In terms of AMP classification, the best recognized period for data capture occurred in the crop vegetative growth stage (May, in Estonia). The results demonstrated that CM yielded the best classification performance, followed by MA and STM; the last was the worst-performing treatment. These conclusions may be attributed to the low heterogeneity of the spectral reflectance values in the corresponding AMP treatments.
Third, the flexibility of fixed-wing imaging technology provides longer flight durations and thus allows for larger applications, such as commercial farmland, grasslands, and forests. Furthermore, the multispectral dataset produces various precise VIs without the need for any supplementary sensors, which reduces measurement errors and significantly reduces costs. In addition, given AutoML's open-source platform and powerful automation capabilities, the complexities surrounding parameter selection in machine learning are greatly reduced, and AutoML also has the potential to select long-ignored but highly efficient ML algorithms. Regarding the choice of AutoML systems or interfaces, although many have been developed in succession (e.g., Auto-sklearn, H2O AutoML, AutoKeras), it is necessary to check whether they continue to be updated and maintained.
Lastly, this UAS-AutoML solution has the potential to be implemented across a variety of other UAS classification research, such as contemporary agricultural classification methods, multispectral-based plant community mapping, and ecological or wetland plant community recognition. Other remote sensing classification studies that lack background knowledge of algorithms and hyperparameters may also benefit from our findings and insights.
In summary, our study focused on the adoption and application of the AutoML method in UAS across a diverse range of agricultural environmental assessment and management applications. Our approach demonstrated that UAS combined with our AutoML framework can recognize multiple agricultural management practices under certain conditions, and that the integration of UAS technologies, geoprocessing methods, and automatic systems provides vital tools for increasing the knowledge of plant-environment interactions within the management of crops. The framework also contributes considerably towards simplifying the development of image-driven analytical pipelines for the VPT systems currently used in most countries. While this study was being prepared, Google Cloud AutoML was also released in 2019 for image-recognition use cases [118], showing that automatic learning will have a non-negligible impact in the UAS field and provide new insight into the potential of remotely sensed solutions for field-based and multifunctional platforms to meet the future demands of precision agriculture.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/rs13163190/s1, Figure S1: Daily climograph of the study area (Kuusiku) during the flying period, including the previous 6 days (a. 17-23 April, b. 24-30 May, and c. 4-10 July) in 2019. Blue bars and the red line represent the daily average of rainfall and temperature, respectively.
Author Contributions: All authors contributed significantly to the work presented in this paper. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the European Regional Development Fund within the Estonian National Programme for Addressing Socio-Economic Challenges through R&D (RITA): L180283PKKK and the Doctoral School of Earth Sciences and Ecology, financed by the European Union, European Regional Development Fund (Estonian University of Life Sciences ASTRA project "Value-chain based bio-economy").

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the data size.