Improving Potato Yield Prediction by Combining Cultivar Information and UAV Remote Sensing Data Using Machine Learning

: Accurate high-resolution yield maps are essential for identifying spatial yield variability patterns, determining key factors inﬂuencing yield variability, and providing site-speciﬁc management insights in precision agriculture. Cultivar differences can signiﬁcantly inﬂuence potato ( Solanum tuberosum L.) tuber yield prediction using remote sensing technologies. The objective of this study was to improve potato yield prediction using unmanned aerial vehicle (UAV) remote sensing by incorporating cultivar information with machine learning methods. Small plot experiments involving different cultivars and nitrogen (N) rates were conducted in 2018 and 2019. UAV-based multi-spectral images were collected throughout the growing season. Machine learning models, i.e., random forest regression (RFR) and support vector regression (SVR), were used to combine different vegetation indices with cultivar information. It was found that UAV-based spectral data from the early growing season at the tuber initiation stage (late June) were more correlated with potato marketable yield than the spectral data from the later growing season at the tuber maturation stage. However, the best performing vegetation indices and the best timing for potato yield prediction varied with cultivars. The performance of the RFR and SVR models using only remote sensing data was unsatisfactory (R 2 = 0.48–0.51 for validation) but was signiﬁcantly improved when cultivar information was incorporated (R 2 = 0.75–0.79 for validation). It is concluded that combining high spatial-resolution UAV images and cultivar information using machine learning algorithms can signiﬁcantly improve potato yield prediction than methods without using cultivar information. More studies are needed to improve potato yield prediction using more detailed cultivar information, soil and landscape variables, and management information, as well as more advanced machine learning models.


Introduction
Potato (Solanum tuberosum L.) is a high-yield food crop with high added values. It originated in the Andes Mountains of southern Peru [1]. It is resistant to cold, drought, and barrenness, and has high unit input production capacity and strong adaptability [2].
At present, there are 160 countries in the world that grow and produce potatoes, making it very important for world food security [3]. Potato ranks as the fourth largest

UAV System and Flight Parameters
The imagery used to monitor potato growth in these field trails was collected by a GEMS multispectral camera (Sentek Systems LLC, Minneapolis, MN, USA) mounted on a DJI Inspire 2 quadcopter. The UAV flight speed was 6m s -1 . The GEMS camera has four  The growing degree days (GDD) is a useful variable to determine harvest dates and yield in potato [33] and for combining data from different site-years and growth stages [10]. Temperature data was downloaded from the weather station close to the research site. Based on the annual temperature data, GDD was calculated from the planting date to each sensing date following Equation (1): T max represents the daily maximum temperature. T min represents the daily minimum temperature. 7 o C is the base temperature [33].

UAV System and Flight Parameters
The imagery used to monitor potato growth in these field trails was collected by a GEMS multispectral camera (Sentek Systems LLC, Minneapolis, MN, USA) mounted on a DJI Inspire 2 quadcopter. The UAV flight speed was 6 m s −1 . The GEMS camera has four wide spectral bands with the following center wavelengths and full-width, half-maximum (FWHM) bandwidths: Blue (450 nm center, 101 nm FWHM bandwidth), Green (553 nm center, 101 nm FWHM bandwidth), Red (615 nm center, 114 nm FWHM bandwidth), and NIR (811 nm center, 135 nm FWHM bandwidth). The camera has a 35 • horizontal field of view and can collect high spatial-resolution images with a ground sampling distance of approximately 2 cm from 40 m above the ground. The Drone Survey iOS flight planning application (Sentek Systems LLC, Minneapolis, MN, USA) was used to prepare flight paths for image collection that covered the trial areas with the desired speed, height, and sidelap. The survey flights used in this study had a minimum forward overlap of 85% and a minimum sidelap of 75%. The GEMS camera has an embedded navigation system that recorded camera position and orientation data at the precise instants that images were taken. The Cheetah software package (Sentek Systems LLC, Minneapolis, MN, USA) was used to process the GEMS imagery with embedded position and orientation data to generate 3D reconstructions, elevation maps, and vegetation index maps, along with the VNIR ortho-mosaic images that were used in this study.
Three reference panels with reflectance values of 7%, 16%, and 27% were put near the experiments during each flight and used to compute surface reflectance from digital number measurements according to Equation (2). The digital number values of the near Remote Sens. 2021, 13, 3322 5 of 18 infrared band became saturated during the later growing season (data collected in July and August), thus only the reference panel with 7% reflectance was used in the data analysis.
R ROI represents the average reflectance in a given band in a region of interest (ROI). DN ROI represents the digital number values in the ortho-mosaic images, averaged over the ROI. R panel is the reflectance of a reference panel. DN panel is the digital number value of the reference panel in the ortho-mosaic images. Reflectance values were averaged over each plot in the trial for analysis. The UAV multispectral images of the fields on different sensing dates are shown in Figure 2.
represents the average reflectance in a given band in a region of interest (ROI). represents the digital number values in the ortho-mosaic images, averaged over the ROI.
is the reflectance of a reference panel. is the digital number value of the reference panel in the ortho-mosaic images. Reflectance values were averaged over each plot in the trial for analysis. The UAV multispectral images of the fields on different sensing dates are shown in Figure 2.

Data Analysis
Selected VIs were calculated based on the multispectral UAV images (Table 2). These VIs were normalized using GDD (N VI ), which is particularly useful when combining data from different site-years and growth stages [10,34]. The N VI was computed by dividing the VI data by the accumulated GDD from planting to sensing, as shown in Equation (3).  [48] NIR: reflectance data of the NIR band. Red: reflectance data of the red band. Green: reflectance data of the green band. Blue: reflectance data of the blue band. R i indicates the reflectance data at wavelength i nm.
Simple regression models were used to determine the relationship (linear, power, quadratic, or exponential) between each vegetation index and potato tuber yield using Statistical Package for the Social Sciences version 18.0 (SPSS 18.0) (SPSS Inc., Chicago, IL, USA). In addition, two machine learning algorithms, i.e., support vector regression (SVR) and RFR, performed well for crop yield prediction in a previous study [28]. Both of them were used to predict potato yield in this research. The SVR and RFR were implemented using the scikit-learn Python ML library.
Random forests consist of decision trees and provide two algorithms, namely mean decrease impurity and mean decrease accuracy, to calculate the importance of features. The mean decrease impurity is calculated to split the dataset into two so that similar response values end up in the same set. Impurity is the measure based on which the (locally) optimal condition is chosen. For regression trees, impurity is the variance. The impurity decrease from each feature can be averaged and the features can be ranked according to this measure [49,50]. The mean decrease impurity algorithm was implemented by the scikitlearn package (https://scikit-learn.org/stable/, accessed on 21 July 2021). Furthermore, the mean decrease accuracy algorithm is used to pass the out-of-band (OOB) samples down the tree and record the prediction accuracy. A variable is then selected and its values in the OOB samples were randomly permuted. OOB samples were passed down the tree and Remote Sens. 2021, 13, 3322 7 of 18 the accuracy was computed again. A decrease in accuracy obtained by this permutation was averaged over all trees for each variable and provided the importance of that variable (the higher the decrease, the higher the importance) [50]. The mean decrease accuracy was calculated by the ELI5 package in Python library (https://eli5.readthedocs.io/en/latest/, accessed on 21 July 2021). Support vector regression (SVR) is characterized by the use of kernels, sparse solution, and Vapnik-Chervonenkis control of the margin, as well as the numbers of support vectors [51]. SVR has been proven to be an effective tool for real-value function estimation. As a supervised-learning approach, SVR trains using a symmetrical loss function, which equally penalizes high and low misestimates. It has excellent generalization capability with high prediction accuracy [52]. Therefore, they were used in this study to evaluate the potential of coupling UAV multispectral data with cultivar information in potato yield estimation. Cultivar information was pre-processed into dummy variables and introduced in the machine learning models together with the selected four N VI data to build the inter-annual yield prediction model.
The analysis process flow diagram of this study is given in Figure 3. In addition to a separate analysis of different years, growth stages, and cultivars, the small plot data from different years were also pooled together, of which 75% was used for model calibration and 25% for validation. The agreement between the observed and predicted parameters was evaluated using the coefficient of determination (R 2 ), root mean square error (RMSE), mean absolute error (MAE), and relative absolute error (RAE) in prediction, as shown in Equations (4-7). The models with the largest R 2 and the lowest RMSE (t ha -1 ), MAE (t ha -1 ), and RAE (%) in prediction were identified. In addition to a separate analysis of different years, growth stages, and cultivars, the small plot data from different years were also pooled together, of which 75% was used for model calibration and 25% for validation. The agreement between the observed and predicted parameters was evaluated using the coefficient of determination (R 2 ), root mean Remote Sens. 2021, 13, 3322 8 of 18 square error (RMSE), mean absolute error (MAE), and relative absolute error (RAE) in prediction, as shown in Equations (4)- (7). The models with the largest R 2 and the lowest RMSE (t ha −1 ), MAE (t ha −1 ), and RAE (%) in prediction were identified.
y m i represents the actual potato yield data (t ha −1 ) of the i sample. y p i represents the predicted potato tuber yield data (t ha −1 ) of the i sample by the yield estimation model. y a i indicates the mean value of the actual potato marketable yield (t ha −1 ) dataset. n is the sample size while i is the sample number.

Correlation Analysis between Potato Tuber Yield and VIs
The correlation coefficients between potato tuber yield and VIs varied throughout the growth period ( Figure 4). In general, the VIs collected in late June at the tuber initiation stage had the best relationship with the potato marketable yield across years, with the best correlation coefficients in June and the worst in August at the tuber maturation stage ( Figure 2). In June, many VIs showed significant correlation with yield data at the p < 0.05 level. The Visible Atmospherically Resistance Index (VARI) had the highest correlation coefficients with yield on June 26 and July 10 in 2018. On other sensing dates in 2018, the VIs and yield did not show a significant correlation. In 2019, there was a significant correlation between the Enhanced Vegetation Index (EVI) and yield on June 26, while no significant correlation was found for the other three sensing dates.

Correlation Analysis Between Potato Tuber Yield and VIs
The correlation coefficients between potato tuber yield and VIs varied throughout the growth period ( Figure 4). In general, the VIs collected in late June at the tuber initiation stage had the best relationship with the potato marketable yield across years, with the best correlation coefficients in June and the worst in August at the tuber maturation stage (Figure 2). In June, many VIs showed significant correlation with yield data at the p <0.05 level. The Visible Atmospherically Resistance Index (VARI) had the highest correlation coefficients with yield on June 26 and July 10 in 2018. On other sensing dates in 2018, the VIs and yield did not show a significant correlation. In 2019, there was a significant correlation between the Enhanced Vegetation Index (EVI) and yield on June 26, while no significant correlation was found for the other three sensing dates.

Simple Regression Models for Potato Tuber Yield Prediction
Yield models were developed based on the most strongly correlated VIs with tuber yield for each cultivar at each growth stage. The details of these models are presented in Tables 3 and 4.
In 2018, the model based on VARI had an R 2 of 0.71 and an RMSE of 6.48 t ha -1 on June 26. On July 10, the R 2 of the VARI-based model was 0.30 and the RMSE was 9.97 t ha -

Simple Regression Models for Potato Tuber Yield Prediction
Yield models were developed based on the most strongly correlated VIs with tuber yield for each cultivar at each growth stage. The details of these models are presented in Tables 3 and 4. In 2018, the model based on VARI had an R 2 of 0.71 and an RMSE of 6.48 t ha −1 on June 26. On July 10, the R 2 of the VARI-based model was 0.30 and the RMSE was 9.97 t ha −1 . On July 18 and August 2, there was no model that could achieve significant results. In 2019, the model based on EVI had an R 2 of 0.40 and an RMSE of 4.84 t ha −1 , whereas no models were significant for the other three sensing dates.
In order to investigate the correlation between the VIs and potato marketable yield of each cultivar, we developed the single VI-based model of potato yield prediction for each cultivar and investigated the variation of the correlation at different growth stages.
Apparently, the models performed better for cultivars Ivory Russet and Umatilla Russet at each growth stage in 2018, with R 2 ranging from 0.56 to 0.80. As for Clearwater Russet, the early growth stage was the best stage to predict yield with an R 2 value of 0.52. For Russet Burbank, the best stage to predict marketable yield was July 18.
In 2019, for Lamoka, the yield estimation model was the best in August as compared with those in June and July.

Machine Learning Models
To reduce the influence of different dates, years, and sites, the N VI data were calculated from the VIs and GDDs. General models that integrated potato cultivar information and N VI data were constructed using machine learning methods.
Two feature selection algorithms (mean decrease impurity and mean decrease accuracy) were implemented to select the best feature subset. Figure 5 shows the descending scores of N VI data derived by a preliminary random forest method, in addition to the two feature selection algorithms and accumulative variance of the features.
From the cumulative histogram (Figure 5b), we can see that the first seven variables contributed over 75% of the accumulative variance of the model. However, the accumulative variance decreased when adding more inputs than the first seven variables. The first seven variables with high scores in both mean decrease in impurity and mean decrease accuracy were selected. Therefore, the Normalized Modified Triangular Vegetation Index improved (N MTVII ). The Normalized Sum Green Index (N SGI ) and normalized VARI (N VARI ) obtained on the first sensing date, in addition to the N SGI acquired on the second sensing date, were chosen for the machine learning processes. contributed over 75% of the accumulative variance of the model. However, the accumulative variance decreased when adding more inputs than the first seven variables. The first seven variables with high scores in both mean decrease in impurity and mean decrease accuracy were selected. Therefore, the Normalized Modified Triangular Vegetation Index improved (NMTVII). The Normalized Sum Green Index (NSGI) and normalized VARI (NVARI) obtained on the first sensing date, in addition to the NSGI acquired on the second sensing date, were chosen for the machine learning processes.  According to the above-selected variables, the RFR and SVR models were developed. Figure 6 shows the scatterplots between the estimated yield and measured yield. Although the RFR method performed better than the SVR in the calibration models, similar performance was found in the prediction or validation results.
Remote Sens. 2021, 13, x FOR PEER REVIEW According to the above-selected variables, the RFR and SVR models were developed. Figure 6 shows the plots between the estimated yield and measured yield. Although the RFR method performed better than the SV calibration models, similar performance was found in the prediction or validation results. Figure 6. Relationships between the measured yield and the estimated yield by the SVR and RFR models using the sele NVIs data without including cultivar information. (a) The scatterplot between measured yield data and estimated y data for calibration dataset by SVR model; (b) The scatterplot between measured yield data and estimated yield data prediction dataset by SVR model; (c) The scatterplot between measured yield data and estimated yield data for calibra dataset set by RFR model; (d) The scatterplot between measured yield data and predicted yield data for prediction dat by RFR model. Figure 7 show the relationships between the measured y the estimated yield obtained by the two machine learning models using the poo based on the selected VIs and cultivar information. Figure 6. Relationships between the measured yield and the estimated yield by the SVR and RFR models using the selected N VI s data without including cultivar information. (a) The scatterplot between measured yield data and estimated yield data for calibration dataset by SVR model; (b) The scatterplot between measured yield data and estimated yield data for prediction dataset by SVR model; (c) The scatterplot between measured yield data and estimated yield data for calibration dataset set by RFR model; (d) The scatterplot between measured yield data and predicted yield data for prediction dataset by RFR model. Figure 7 show the relationships between the measured yield and the estimated yield obtained by the two machine learning models using the pooled data based on the selected VIs and cultivar information. Remote Sens. 2021, 13, x FOR PEER REVIEW 14 of Figure 7. Relationships between the measured yield and the estimated yield by the SVR and RFR models using the pooled vegetation indices across site-years, sensing dates, and cultivar information. The scatterplot between measured yield and estimated yield for calibration dataset (a) and validation dataset (b) using the SVR model, and for calibration dataset (c) and validation dataset (d) using the RFR model.

The scatterplots in
Evidently, the incorporation of cultivar information into the UAV remote sensin based models improved potato yield prediction (Figures 6,7). The R 2 of the predictio model increased from 0.51 to 0.79 and from 0.48 to 0.75 for the SVR and RFR method respectively. The RFR model had a smaller RMSE, MAE, and RAE than the SVR model both the calibration and prediction processes. However, the R 2 value (0.97) of the RF calibration model was much higher than that (0.75) of the RFR prediction model, where consistent R 2 values were identified for the SVR calibration and prediction models, ind cating that the RFR method might be more affected by the size of the dataset and larg calibration dataset might lead to higher accuracy of the RFR model. The RFR yield pred tion model was applied to the UAV images to produce potato yield distribution maps 2018 and 2019, which are shown in Figure 8. Relationships between the measured yield and the estimated yield by the SVR and RFR models using the pooled vegetation indices across site-years, sensing dates, and cultivar information. The scatterplot between measured yield and estimated yield for calibration dataset (a) and validation dataset (b) using the SVR model, and for calibration dataset (c) and validation dataset (d) using the RFR model.
Evidently, the incorporation of cultivar information into the UAV remote sensingbased models improved potato yield prediction (Figures 6 and 7). The R 2 of the prediction model increased from 0.51 to 0.79 and from 0.48 to 0.75 for the SVR and RFR methods, respectively. The RFR model had a smaller RMSE, MAE, and RAE than the SVR model in both the calibration and prediction processes. However, the R 2 value (0.97) of the RFR calibration model was much higher than that (0.75) of the RFR prediction model, whereas consistent R 2 values were identified for the SVR calibration and prediction models, indicating that the RFR method might be more affected by the size of the dataset and larger calibration dataset might lead to higher accuracy of the RFR model. The RFR yield prediction model was applied to the UAV images to produce potato yield distribution maps in 2018 and 2019, which are shown in Figure 8.

Discussion
This study indicated that the accuracy of the potato tuber yield estimation model was closely related to the sensing time and cultivar information. For the combined cultivar dataset, the Pearson correlation analysis indicated that UAV images collected in late June (44-52 days after planting dates) during the stage of plant full vegetative growth and tuber initiation were better for potato yield prediction than during later growth stages, similar to the findings reported by some previous studies [5,10]. For the machine learning models, the VIs obtained in June were more important for yield estimation than the VIs obtained later during the growing season. A similar result was reported at the field scale [10].
The yield predictive models based on VIs are cultivar and sensing date-specific [5,10]. Potato cultivar had an important effect on the yield estimation models. In general, for cultivars Clearwater Russet and Russet Burbank, UAV image data obtained early in the growing season (late June) was more suitable for yield estimation. For cultivars Ivory Russet and Lamoka, the UAV image data obtained later in the growing season performed better. The yield of MN13142 could not be estimated via single vegetation index-based models. For Umatilla Russet, the entire growing season was suitable for yield prediction. A crop growth model simulation study indicated that potato tuber yield responses to changes in the cultivar parameters was specific to the environment [53]. Cultivars differed mostly in terms of plant N uptake dynamics and belowground biomass in field experiments [54]. The canopy morphology of variable potato cultivars and nitrogen rates were

Discussion
This study indicated that the accuracy of the potato tuber yield estimation model was closely related to the sensing time and cultivar information. For the combined cultivar dataset, the Pearson correlation analysis indicated that UAV images collected in late June (44-52 days after planting dates) during the stage of plant full vegetative growth and tuber initiation were better for potato yield prediction than during later growth stages, similar to the findings reported by some previous studies [5,10]. For the machine learning models, the VIs obtained in June were more important for yield estimation than the VIs obtained later during the growing season. A similar result was reported at the field scale [10].
The yield predictive models based on VIs are cultivar and sensing date-specific [5,10]. Potato cultivar had an important effect on the yield estimation models. In general, for cultivars Clearwater Russet and Russet Burbank, UAV image data obtained early in the growing season (late June) was more suitable for yield estimation. For cultivars Ivory Russet and Lamoka, the UAV image data obtained later in the growing season performed better. The yield of MN13142 could not be estimated via single vegetation index-based models. For Umatilla Russet, the entire growing season was suitable for yield prediction. A crop growth model simulation study indicated that potato tuber yield responses to changes in the cultivar parameters was specific to the environment [53]. Cultivars differed mostly in terms of plant N uptake dynamics and belowground biomass in field experiments [54]. The canopy morphology of variable potato cultivars and nitrogen rates were different [27]. Medium maturing and late maturing cultivars were included in this study and it was noticed that there were differences in vine senescence later in the season among different maturity cultivars. The rate of greenness loss [55] and lodging property [27] for some cultivars also affected the performance of VIs in yield estimation in the late growth stages [55]. Senescence affected the vegetation coverage, which in turn influenced the variation of VIs and their relationships with potato yield [5]. The cultivars that grew well during the whole growth season tended to have a better correlation between the VIs and yield.
The spectral bands around 500, 550, and 720 nm were important when assessing potato agronomic properties [56]. The Sum Green Index (SGI) based on green band reflectance information in June was the most important feature for the pooled dataset. The SGI was the reflectance information of the green band. The good performance of SGI across cultivars in the early and middle growth stages should be attributed to the much less absorption of chlorophyll in the green region compared to those in the blue or red regions [57]. The correlation between yields and chlorophyll in the field conditions may explain the importance of the green band in the early growth stage [55,58]. The priority of the green band in crop yield estimation was also reported in other crops (e.g., maize) [59]. The high correlation between VARI and yield was also noticeable for each cultivar, especially on the first sensing date. A similar conclusion was reported for corn yield estimation using multispectral images by Garcia-Martinez et al. [60].
The results of this study indicated that single VI-based models are cultivar-specific. Although machine learning models were developed by combining several VIs, the accuracy of the yield models was still low ( Figure 5), similar to a previous study [27]. The machine learning models combining optimal VIs and cultivar information [53] showed good potential in yield estimation, which is consistent with several previous studies [6,11,28]. SVM and RFR are two popular machine learning models used in precision agriculture and remote sensing data analysis, and one was found to perform better than the other in different studies [6,19,31]. In this study, although the RFR performed better than the SVR for the calibration process of a larger dataset, they performed similarly for a smaller validation dataset. The performance of both models was significantly improved when cultivar information was added. This clearly demonstrates the importance of adding other important variables in addition to the use of sensing data. Other studies have also found that apart from sensor data, the incorporation of plant height [27], meteorological data [15], and other related information improved the accuracy of yield estimation models.
In this study, cultivar information was incorporated in the machine learning models as dummy variables. This is an initial attempt. Further studies are needed to use more detailed information to represent cultivar differences, such as maturity days, growing GDDs, or growing degree units. More studies are needed to develop potato yield prediction models that incorporate more ancillary information together with remote sensing data using more advanced machine learning algorithms.

Conclusions
This study investigated the potential of using UAV-multispectral images coupled with cultivar information to predict potato yield and generate potential field-scale yield maps. The maturity days for each cultivar were different and the responses of potato tuber yield to the N rate also varied with cultivars. The best performing VIs and best sensing timing for potato yield prediction varied with cultivars. For cultivars Clearwater Russet and Russet Burbank, the best sensing date was in late June (44-52 days after planting). For cultivars Ivory Russet and Lamoka, the optimum sensing time was later in the growing season. For Umatilla Russet, VI-based models performed consistently well across the growing season, while no suitable models were found for cultivar MN13142 based on single VI at any of the growth stages. The R 2 s of single VI-based models for different cultivars varied from 0.22 to 0.80. In general, UAV images acquired early in the season (late June) were more correlated with potato yield across cultivars. Using machine learning models (RFR and SVR) to combine selected VIs with cultivar information could significantly improve the accuracy of potato yield prediction (R 2 = 0.75-0.79 for validation) compared with machine learning models only using VIs (R 2 = 0.48-0.51 for validation). More studies are needed to improve potato yield prediction using more detailed cultivar information, weather data, soil and landscape variables, and management information.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not yet publicly available.