Tar Spot Disease Quantification Using Unmanned Aircraft Systems (UAS) Data

Abstract: Tar spot is a foliar disease of corn characterized by fungal fruiting bodies that resemble spots of tar. The disease emerged in the U.S. in 2015, and severe outbreaks in 2018 had an economic impact on corn yields throughout the Midwest. Adequate epidemiological surveillance and disease quantification are necessary to develop immediate and long-term management strategies. This study presents a measurement framework that evaluates the disease severity of tar spot using unmanned aircraft systems (UAS)-based plant phenotyping and regression techniques. UAS-based plant phenotypic information, such as canopy cover, canopy volume, and vegetation indices, was used as the explanatory variables. Visual estimations of disease severity were performed by expert plant pathologists on a per-plot basis and used as the response variables. Three regression methods, namely ordinary least squares (OLS), support vector regression (SVR), and multilayer perceptron (MLP), were compared to determine an optimal regression method for UAS-based tar spot measurement. The cross-validation results showed that the regression model based on MLP provided the highest accuracy of disease measurements. When trained and tested with spatially separated datasets, the proposed regression model achieved a Lin's concordance correlation coefficient (ρc) of 0.82 and a root mean square error (RMSE) of 6.42. This study demonstrated that the proposed UAS-based method can be used for the disease quantification of tar spot, which shows a gradual spectral response as the disease develops.


Introduction
Tar spot is a major disease of corn caused by the fungus Phyllachora maydis and is present in 17 countries throughout the Americas; it is an emerging threat to U.S. corn production [1]. Documented yield losses range from 11 to 46% in Latin America and 25 to 30% in the U.S. [2][3][4][5]. First reported in the U.S. in 2015, this disease is characterized by fungal fruiting bodies (stromata) resembling spots of tar on leaves, stems, and the husks of developing ears [1,6]. Under favorable conditions, the disease develops quickly from the late vegetative stage to the early reproductive stage, eventually reaching an exponential phase. Chemical protection has been proven to manage the disease effectively [7], despite a range in hybrid susceptibility and reaction to tar spot [1,5]. Nevertheless, reliable epidemiological surveillance and disease quantification will be essential to lay the foundation for developing immediate and long-term management strategies against tar spot.

Trial    Date          Number of Treatments   Number of Hybrids   Number of Fungicide Treatments   Tillage Type
Tar 1    9 June 2020   10                     1                   9 + 1 (non-treated)              Strip
Tar 2    6 June 2020   12                     3                   1 + 1 (non-treated)              Strip, conventional
Tar 3    9 June 2020   18                     1                   16 + 2 (non-treated)             Strip
Tar 4    8 June 2020   10                     1                   9 + 1 (non-treated)              Strip

Only the middle two rows of each plot were used for visual evaluation and the estimation of yield. The plant density in all experiments was 84,000 plants ha⁻¹. The length of each row was 10 m, and the distance between rows was 70 cm. The seeds were planted at 3 cm depth, and the distance between plants was 12 cm. In addition, supplementary irrigation was provided for the experiments at the PPAC location. Tar spot was the most prevalent disease throughout all experiments.
All of the research plots (Tar1-4) were designed to investigate the effects of different treatments on tar spot disease. In Tar1, the effect of nine fungicides plus a control was investigated. All fungicides were applied at the VT/R1 growth stage at the dose recommended by the manufacturer. In Tar2, the effects of tillage, three hybrids (two moderately susceptible and one susceptible), and fungicide application (applied and non-treated) were investigated. Fungicide was applied at VT/R1 at the dose recommended by the manufacturer. Twelve treatments were performed in total, including the control. In Tar3, the effect of two fungicides applied at different growth stages was investigated. Fungicides were applied at the first detection of the disease, at V8, VT, or R3, or at a combination of multiple growth stages, resulting in a total of 18 treatments, including two controls. In Tar4, the effect of a single fungicide applied at different growth stages was investigated. Fungicide applications were performed at V8, V10, VT, R2, R3, R4, R5, V8/VT, and 14 days after a warning system, plus a control. The dose recommended by the manufacturer was applied. The spatial distribution of tar spot treatment in the study area is briefly displayed in Figure 2.
Figure 2. Tar spot treatment applied in the research plots. The number in each experimental unit is the amount of fungicide applied in L/ha. The distance between research plots was modified for visual purposes. Fungicide treatments and hybrid were withheld due to confidentiality restrictions.

Tar Spot Visual Rating
In practice, multiple types of scales and methods can be found throughout the literature. Our selection was based on published work conducted over the last decade [32][33][34][35][36]. In this study, disease severity was defined as the proportion of diseased leaf area in total leaf area multiplied by 100 to obtain the percentage of disease severity [37].
Tar spot severity was estimated visually each week at the sub-subplot or plot level on the two middle rows. The diseased area included both the black stromata of the early disease stages and the additional chlorotic or necrotic symptoms that developed afterward on the leaf or canopy. For every experimental unit, a single disease severity estimate was recorded for each of the lower, middle, and upper canopies. Considering the ear leaf as leaf 0 (L0), leaves below or above L0 were identified with a minus (−) or plus (+) sign, respectively. The lower canopy extended from L−3 to the lowest leaf (L−n), the mid-canopy from L−2 to L+1, and the upper canopy from L+2 to the flag leaf (L+n). Visual severity evaluations were performed on 13-14 dates (Table 2). Evaluations started at VT (tassel) and continued to the R6 (physiological maturity) growth stage in all experiments. Instead of using the entire area of the planted plot, the visual assessment was conducted within the two middle rows to avoid potential treatment overlaps (Figure 3). However, the visual rating of the lower canopy was not used since optical sensors could not observe vegetation in the lower canopy (Figure A1). The tar spot visual ratings showed an increasing trend over time (Figure 4). The rate and amount of disease progress differed among research plots.
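The leaf-position convention and the severity definition above can be illustrated with a minimal sketch. The function names `canopy_layer` and `severity_percent` are hypothetical and not part of the study; the sketch only encodes the layer boundaries (L−3 and below, L−2 to L+1, L+2 and above) and the percentage definition stated in the text.

```python
def canopy_layer(leaf_index: int) -> str:
    """Map a leaf position relative to the ear leaf (L0) to a canopy layer.

    Negative indices are leaves below the ear leaf, positive indices above it.
    """
    if leaf_index <= -3:
        return "lower"   # L-3 down to the lowest leaf
    if leaf_index <= 1:
        return "middle"  # L-2 to L+1
    return "upper"       # L+2 up to the flag leaf


def severity_percent(diseased_area: float, total_area: float) -> float:
    """Disease severity as diseased leaf area over total leaf area, times 100."""
    if total_area <= 0:
        raise ValueError("total leaf area must be positive")
    return 100.0 * diseased_area / total_area
```

For example, `canopy_layer(0)` places the ear leaf in the middle canopy, and `severity_percent(5.0, 20.0)` evaluates to 25.0.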

Unmanned Aircraft Systems (UAS) Data Collection and UAS Data Preprocessing
Unmanned aircraft systems (UAS) data were acquired by a Phantom 4 Multispectral (DJI, Shenzhen, China) equipped with six 1/2.9″ CMOS (complementary metal-oxide-semiconductor) image sensors, including one RGB sensor and five monochrome sensors. The spectral bands of the five monochrome sensors are blue (450 nm ± 16 nm), green (560 nm ± 16 nm), red (650 nm ± 16 nm), red edge (730 nm ± 16 nm), and near-infrared (840 nm ± 26 nm). Flight altitude and image overlap were mostly set to 30 m and 75% to obtain fine-resolution orthomosaic and digital surface model (DSM) data with ground sampling distances (GSD) of approximately 1.5 cm and 3.0 cm, respectively. All of the UAS flights were conducted within a day of the dates when the visual rating was performed.
Radiometric calibration was performed on the multispectral UAS images. First, raw at-sensor irradiance was corrected using downwelling light sensor (DLS) orientation. Second, irradiance on the ground was computed as the sum of diffuse and direct sunlight components. Third, per-pixel radiance was calculated considering the effect of dark current, vignetting, and exposure time [38,39]. Finally, the reflectance was computed from per-pixel radiance and irradiance on the ground. Atmospheric correction was not performed on the UAS images since atmospheric attenuation in 0 m to 30 m elevation can generally be neglected [40].
We used multi-temporal UAS data to generate orthomosaic images and DSMs using the structure from motion (SfM) algorithm. SfM is a 3D reconstruction method widely used for large-scale UAS data collected by consumer-grade or survey-grade cameras. A conventional SfM workflow for UAS data comprises four major steps: finding common feature points in an image dataset, matching the feature points across multiple image pairs, ground control point (GCP)-based orientation to georeference the 3D model, and the iterative execution of bundle adjustment (BA) to recover camera external orientation parameters (EOP) and scene geometry [41,42]. This study used the SfM processing pipeline provided by Metashape (AgiSoft LLC, St. Petersburg, Russia) to generate orthomosaic images and DSMs.

Unmanned Aircraft Systems (UAS)-Based Plant Phenotyping
We used the orthomosaic images and DSMs to obtain plant phenotypes of each experimental unit (Figure 5). First, we generated raster images including (a) canopy and non-canopy classification by the Canopeo algorithm [43], (b) canopy height measured from the ground surface to the uppermost canopy, (c) excessive greenness (ExG), (d) NDVI, (e) the soil-adjusted vegetation index (SAVI), and (f) the modified soil-adjusted vegetation index (MSAVI). The definitions and implications of the vegetation indices in (c-f) can be found in previous publications [44][45][46]. Second, we created a rectangular grid of 9 m by 1.5 m for each experimental unit, named the level 1 grid (L1G, Figure 3). A total of 24 square grids of 0.75 m by 0.75 m (level 2 grid, L2G) were also created within each L1G area. As in the visual assessment of tar spot severity, we calculated UAS-derived plant phenotypes in the middle two rows. We designed the individual L2Gs to fit tightly between the adjacent planting rows, so that the vertical centerlines of the grids were aligned with the planting rows. Third, we calculated zonal statistics, including the sum, average, maximum, and standard deviation of the raster data from (a-f), using the L1G and L2G polygons. The numbers of L1G and L2G phenotypes were 14 and 336 (14 × 24), respectively.
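The index rasters and zonal statistics described above can be sketched per pixel as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the text defers the exact index definitions to [44][45][46], so the commonly used forms are assumed here (ExG on chromatic coordinates, SAVI with a soil factor of 0.5, and the closed-form MSAVI), and the population standard deviation is assumed for the zonal statistic. All function names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def exg(red: float, green: float, blue: float) -> float:
    """Excess greenness on chromatic coordinates: 2g - r - b (assumed form)."""
    total = red + green + blue
    if total == 0:
        return 0.0
    return (2 * green - red - blue) / total


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index."""
    denom = nir + red
    return (nir - red) / denom if denom else 0.0


def savi(nir: float, red: float, soil_factor: float = 0.5) -> float:
    """Soil-adjusted vegetation index; soil_factor = 0.5 is an assumed default."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def msavi(nir: float, red: float) -> float:
    """Modified SAVI in its closed form (assumed to be the variant used)."""
    return (2 * nir + 1 - math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2


def zonal_stats(pixel_values: list[float]) -> dict[str, float]:
    """Sum, average, maximum, and standard deviation of the pixels in one grid cell."""
    return {
        "sum": math.fsum(pixel_values),
        "avg": fmean(pixel_values),
        "max": max(pixel_values),
        "stdev": pstdev(pixel_values),  # population form is an assumption
    }
```

In practice, `pixel_values` would hold the raster values falling inside one L1G or L2G polygon, so each polygon yields one row of phenotypes per raster.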

Variable Selection and Data Standardization
Variable selection was conducted to identify relevant input variables for regression analysis using the L1G UAS phenotypes. For regression with a single input variable, the L1G phenotype with the highest positive or negative Pearson's correlation coefficient with the visual ratings was selected. For regression models that use multiple input variables, the best subset of input L1G phenotypes was chosen by the Bayesian information criterion (BIC) [47]. We first fitted ordinary least squares (OLS) models to all observed data for every possible combination of L1G phenotypes, and then chose the subset of input variables that provided the lowest BIC. The individual input variables were standardized before they were entered into the regression models. We transformed the distribution by subtracting the mean and then dividing by the standard deviation, and this standardization was applied separately to the training and test data sets.
Remote Sens. 2021, 13, 2567
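The exhaustive BIC search and the standardization step can be sketched as below. This is a hedged illustration rather than the authors' code: it assumes NumPy is available, uses the Gaussian-likelihood form BIC = n ln(RSS/n) + k ln(n) with k counting the intercept, and all function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def standardize(X):
    """Z-score each column: subtract the mean, divide by the standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)


def ols_bic(X, y):
    """BIC of an OLS fit with intercept, assuming Gaussian errors."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1]  # regression coefficients plus the constant term
    return n * np.log(rss / n) + k * np.log(n)


def best_subset(X, y, names):
    """Fit every non-empty subset of columns and keep the one with the lowest BIC."""
    best_bic, best_cols = np.inf, ()
    for size in range(1, X.shape[1] + 1):
        for cols in combinations(range(X.shape[1]), size):
            bic = ols_bic(X[:, list(cols)], y)
            if bic < best_bic:
                best_bic, best_cols = bic, cols
    return best_bic, [names[c] for c in best_cols]
```

Following the text, `standardize` would be called once on the training set and once on the test set. The exhaustive search grows as 2^p in the number of candidate phenotypes, which stays tractable for the small L1G candidate set considered here.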

Regression Methods
Three regression techniques were used to convert UAS phenotypes to tar spot severity. The ordinary least squares method was chosen because of the simplicity of model interpretation. In addition, we selected support vector regression for its accuracy and generalization capability, and the multilayer perceptron for its ability to model nonlinear processes.

Ordinary Least Squares (OLS)
Ordinary least squares (OLS) is a least squares technique that finds a regression model by minimizing the sum of squared errors between observed values and fitted values [48]. We used a linear regression model with a constant term, given by Equation (1):
$$ y_i = x_i^{\top} \beta + C + \varepsilon_i \qquad (1) $$
where $y_i$ is the response of the i-th observation (visual rating); $x_i$ is the i-th observation of the explanatory variables; $\beta$ is a vector of regression coefficients; $C$ is a constant term, and $\varepsilon_i$ is an error. OLS was used to estimate the visual ratings of the middle and upper canopy layers from single and multiple L1G phenotypes.

Support Vector Regression (SVR)
Support vector regression (SVR) is a regression method that finds an optimal hyperplane using the same principles as the support vector machine (SVM). SVR attempts to find a hyperplane that minimizes both the magnitude of the normal vector and the prediction error. The generalization capability of SVR is achieved by penalizing only the data points outside the ε-tube around the estimated function. The objective function of SVR can be written as Equation (2):
$$ \min_{w,\,\xi,\,\xi^{*}} \; \frac{1}{2} \lVert w \rVert^{2} + C \sum_{i=1}^{N} \left( \xi_i + \xi_i^{*} \right) \qquad (2) $$
where $w$ is a normal vector; $\xi_i$ and $\xi_i^{*}$ are the prediction errors beyond the ε-tube, either above or below the estimated function; $N$ is the number of observations, and $C$ is a regularization parameter that trades off the flatness of the hyperplane against the sum of the prediction errors. SVR can also solve nonlinear regression problems by mapping data points into a higher-dimensional space [49]. This study used a grid search to find the best values of ε, C, and the kernel parameters for three types of kernels: linear, polynomial, and radial basis function (RBF) (Table 3) [50,51]. The RBF kernel is defined by Equation (3):
$$ K(u, v) = \exp\left( -\gamma \lVert u - v \rVert^{2} \right) \qquad (3) $$
where $u$ and $v$ are n-dimensional vectors, and $\gamma$ corresponds to $1/(2\sigma^{2})$ in the Gaussian function.
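The two ingredients of the SVR formulation, the ε-insensitive penalty and the RBF kernel, can be made concrete with a short sketch. The function names are hypothetical, and the objective is evaluated for a given weight vector and set of residuals rather than optimized; at the optimum, the slack pair for each observation equals the amount by which the absolute residual exceeds ε, which is what `epsilon_insensitive` computes.

```python
import math


def rbf_kernel(u, v, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between u and v)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def epsilon_insensitive(residual, epsilon):
    """Slack for one observation: zero inside the epsilon-tube, linear outside."""
    return max(0.0, abs(residual) - epsilon)


def svr_objective(w, residuals, C, epsilon):
    """Flatness term plus C times the total slack, for fixed w and residuals."""
    flatness = 0.5 * sum(wi * wi for wi in w)
    total_slack = sum(epsilon_insensitive(r, epsilon) for r in residuals)
    return flatness + C * total_slack
```

A larger C makes points outside the tube more costly and yields a less flat function, while a larger ε widens the tube so that more residuals incur no penalty at all. This is why both appear in the grid search alongside the kernel parameters.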
The SVR models were used to calculate the visual ratings of the middle and upper canopy layers using multiple L1G phenotypes.

Multilayer Perceptron (MLP)
Multilayer perceptron (MLP) is a feedforward artificial neural network (ANN) that consists of an input layer, a hidden layer, and an output layer. Due to the simplicity of its structure and its nonlinear modeling capability, MLP has been widely used in various regression problems in plant sciences [52,53].
The MLP was used to model the relationship between the tar spot visual rating of three canopy layers and the L1G or L2G UAS phenotypes. The preliminary result showed that the MLP with a single hidden layer performed better than the MLP with 2-4 hidden layers (Figure 6). Therefore, a grid search was conducted to determine the number of nodes in the single hidden layer. We tested MLPs with 5, 10, 20, 40, 80, and 160 hidden nodes for L1G phenotypes and 3, 5, 10, 20, 40, and 80 hidden nodes for L2G phenotypes. In the training process, the mean square error (MSE) was minimized by the Adam algorithm (adaptive moment estimation) with a patience of 5. All processing nodes in the MLP model used the rectified linear unit (ReLU) as the activation function [54,55]. The optimal number of inputs and of processing nodes in the hidden layer varies according to the spatial resolution of the explanatory variables (level 1 grid or level 2 grid) and the response variable (visual rating in the middle or upper canopy layer).
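A single-hidden-layer MLP with ReLU units can be sketched in a few lines. The following is an illustration only: the function names are hypothetical, a linear output node is assumed for regression, and `stopping_epoch` interprets the stated patience of 5 as early stopping on a monitored loss, which is an assumption because the text does not specify what the patience applies to.

```python
def relu(value):
    """Rectified linear unit."""
    return value if value > 0 else 0.0


def mlp_forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: one ReLU hidden layer followed by a linear output node."""
    hidden = [
        relu(sum(w * xi for w, xi in zip(row, x)) + bias)
        for row, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


def n_parameters(n_inputs, n_hidden):
    """Trainable weights and biases of the single-hidden-layer network."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1)


def stopping_epoch(losses, patience=5):
    """Index of the epoch at which training stops under early stopping."""
    best, waited = float("inf"), 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, waited = loss, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(losses) - 1
```

As a sense of scale, `n_parameters(4, 80)` gives 481 trainable values for four inputs and 80 hidden nodes, which illustrates why the hidden-layer size was tuned separately for the 24-fold larger set of L2G inputs.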

Evaluating the Performance of Regression Models
The performance of the regression models was assessed by cross-validation and a transferability test. First, 3-fold cross-validation was repeated 30 times using the data from all study plots. The average values of the coefficient of determination (R²), root mean square error (RMSE), and Lin's concordance correlation coefficient (ρc) were calculated for each 3-fold cross-validation. Subsequently, the average and standard deviation of the 30 cross-validation results were obtained. An optimal regression model was chosen based on the statistics of these accuracy metrics. Second, a transferability test was conducted to obtain accuracy metrics with spatially separated training and test data. For example, the performance of the MLP model on Tar4 data was assessed by training the regression model with Tar1, Tar2, and Tar3 data and then testing it on Tar4 data.
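The three accuracy metrics and the spatially separated split can be sketched as follows. The function names are hypothetical. Lin's coefficient is written in its usual form with population (1/n) moments, and R² is assumed to be one minus the ratio of residual to total sum of squares, since the text does not state which definition was used.

```python
import math
from statistics import fmean


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(fmean([(o - p) ** 2 for o, p in zip(observed, predicted)]))


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def lin_ccc(observed, predicted):
    """Lin's concordance correlation coefficient with population moments."""
    n = len(observed)
    mean_o, mean_p = fmean(observed), fmean(predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    return 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


def leave_one_trial_out(trial_labels):
    """Yield (held-out trial, training indices, test indices) for each trial."""
    for held_out in sorted(set(trial_labels)):
        train = [i for i, t in enumerate(trial_labels) if t != held_out]
        test = [i for i, t in enumerate(trial_labels) if t == held_out]
        yield held_out, train, test
```

Unlike Pearson's correlation, ρc penalizes a systematic offset: predictions that are perfectly correlated with the visual ratings but shifted by a constant still score below 1, which makes it a suitable measure of agreement with the 1-to-1 line.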

Correlation Analysis with UAS-Derived Plant Phenotypes
The correlation coefficients between the L1G phenotypes and the visual ratings of the middle and upper canopy layers revealed that the average and maximum of MSAVI, NDVI, and SAVI had negative correlations below −0.8 (Table 4). This indicates that these vegetation indices are inversely related to tar spot disease severity. A higher magnitude of correlation was observed for the L1G average values than for the maximum values. Standard deviation statistics showed a weaker correlation than the other statistics. Canopy cover, canopy volume, and ExG-based statistics showed a weaker relationship with the visual ratings than MSAVI, NDVI, and SAVI. The strongest correlations, observed between MSAVI and the visual ratings of the middle and upper canopy, were −0.87 and −0.83, respectively.
Table 4. Pearson's correlation coefficient between visual ratings and level 1 grid (L1G) unmanned aircraft systems (UAS) phenotypes. * avg: average, ** max: maximum, *** stdev: standard deviation.
Multicollinearity among the explanatory variables (L1G phenotypes) was investigated using a correlation matrix. As a result, a statistically significant correlation was observed among the L1G phenotypes (Figure 7). For example, the correlation between the L1G MSAVI average and the L1G NDVI average was 0.99, indicating a very high positive relationship. Similar results were observed among the phenotypes of the same statistics (average, maximum, and standard deviation) of MSAVI, NDVI, and SAVI. Since the L1G phenotypes of multispectral vegetation indices contained redundant information, we only used L1G canopy cover, canopy volume, and the statistics of ExG and MSAVI in the variable selection process.
A high correlation, with Pearson's correlation coefficients over 0.90, was observed among the L2G phenotypes. For example, the L2G MSAVI averages from grid locations 1-24 showed a correlation of over 0.95. A slightly lower correlation coefficient was observed between phenotypes in the northern (grids 1, 13) and southern edges (grids 12, 24), with a Pearson's correlation of 0.95. The L2G MSAVI in locations 2-11 and 14-23 produced correlation coefficients above 0.96. Such multicollinearity was also observable in the other L2G UAS phenotypes.
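The redundancy check described above amounts to scanning the correlation matrix for highly correlated pairs of phenotypes. A minimal sketch follows; the function names and the 0.95 threshold are hypothetical, since the text reports the observed correlations but does not state a cut-off for treating two phenotypes as redundant.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def redundant_pairs(columns, threshold=0.95):
    """List phenotype pairs whose absolute correlation reaches the threshold."""
    names = list(columns)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) >= threshold:
                pairs.append((first, second, r))
    return pairs
```

Here `columns` maps a phenotype name to its values across experimental units. A pair such as the MSAVI and NDVI averages, with a reported correlation of 0.99, would be flagged, which is consistent with keeping only one of the multispectral indices for variable selection.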

Variable Selection
Variable selection was performed to select input L1G phenotypes for regression analysis. For a regression model with a single input variable, the L1G average of MSAVI was chosen due to the highest correlation with visual ratings in the middle and upper canopy (Table 4).
Multiple L1G variables were selected based on BIC, as shown in Tables 5 and 6. For the middle canopy, canopy cover, the maximum of ExG, the average of MSAVI, and the standard deviation of MSAVI were chosen. The variables selected for the upper canopy were canopy cover, the maximum of ExG, the average of MSAVI, the maximum of MSAVI, and the standard deviation of MSAVI. The average of MSAVI was included in the best subsets for both the middle and upper canopy layers, and its coefficient had the highest magnitude among the selected variables. This indicates that the average MSAVI is the predominant input variable and explains most of the variance in tar spot disease severity.

Hyperparameter Tuning of SVR and MLP Models by Grid Search
A set of optimal hyperparameters for each regression model was determined by grid search. The optimal parameters for the SVR models are shown in Table 7. For the MLP with L1G variables, the optimal numbers of nodes in the hidden layer were 80 and 40 for the middle and upper canopy layers, respectively. For the MLP models that use L2G variables, the optimal numbers of nodes were 3 and 5 for the middle and upper canopy, respectively.

Accuracy of Tar Spot Severity Measurement by Cross-Validation
The average RMSE of the repeated cross-validations showed that the MLP model obtained the most accurate results with multiple L1G phenotypes (Table 8). The average RMSE of the MLP-L1G model for the middle and upper canopy was 10.4 and 7.9, respectively. Similar accuracy was achieved by the MLP model with L2G phenotypes, with an average RMSE of 10.4 and 8.2 for the middle and upper canopies, respectively. The average RMSE values from OLS and SVR were substantially higher than those obtained by the MLP models. The standard deviation of the cross-validated RMSE of the MLP models was also generally lower than those of OLS and SVR, indicating a higher consistency of model performance.
The tar spot severity estimated by the optimal OLS, SVR, and MLP models generally increased with the ground reference data. Figure 8 displays the relationship between the tar spot visual ratings and the UAS-based measurements acquired by 3-fold cross-validation, where hollow red, orange, and blue circles represent each test set of the 3-fold cross-validation. Compared to the results from the MLP models, the OLS and SVR models tended to produce higher variance when the visual rating was in the 30-70 range. Moreover, the OLS and SVR models underestimated the disease severity when the visual rating was in the 60-100 range. Therefore, we conducted the transferability test based on the MLP model with L1G phenotypes.

Accuracy of Tar Spot Severity Measurement by Transferability Test
The transferability test demonstrated the applicability of UAS-based tar spot measurement under different field locations and management conditions (Figures 9-12). A linear trend between visual ratings and UAS measurements was observed in Figures 9a, 10b and 12b, with either overestimation or underestimation. A nonlinear trend was observed in Figures 9b and 11a,b, where the UAS measurement increased exponentially as the visual rating increased. Nevertheless, the concordance between visual ratings and UAS measurements indicated that the MLP model captured a statistical relationship between tar spot disease severity and spectral information.
To investigate the cause of the lower accuracy in the transferability test, we examined the relationship between visual ratings and the L1G average of MSAVI. The MSAVI average had the strongest negative correlation with the visual ratings. The scatter plots in Figure 13 revealed that the data distributions in the four study plots were comparable when the visual rating was in the 0-20 range. However, a positive offset of MSAVI was observed in Tar2 and Tar3 when the visual rating was in the 20-100 range. For the most part, Tar1 and Tar4 showed similar data distributions over the entire range of visual ratings. Therefore, we selectively used Tar1 and Tar4 data to test the transferability of the proposed approach. The MLP model with L1G phenotypes was trained with Tar1 data and tested on Tar4 data. As a result, the RMSEs of the UAS-based measurements in the middle and upper canopies were 7.61 and 6.42, respectively (Figure 14), and the trend line between the two measures was closer to a 1-to-1 line than in the previous results (Figures 9-12).
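The two agreement measures reported for the transferability test, RMSE and Lin's concordance correlation coefficient (ρc), can be computed as in the following minimal sketch. It is an illustration rather than the authors' code: the function names and the sample values are hypothetical, and ρc is computed with population (1/n) moments, which is one common convention.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean square error between visual ratings and UAS measurements."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


def lin_ccc(observed, predicted):
    """Lin's concordance correlation coefficient using population moments."""
    x = np.asarray(observed, dtype=float)
    y = np.asarray(predicted, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()  # ddof=0, i.e. 1/n normalisation
    cov_xy = np.mean((x - mean_x) * (y - mean_y))
    # The (mean_x - mean_y)^2 term penalises systematic over/underestimation.
    return float(2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2))


if __name__ == "__main__":
    # Hypothetical severity values (%), for illustration only.
    visual = [0, 5, 10, 20, 40, 60, 80]
    uas = [2, 4, 12, 18, 35, 55, 70]
    print("RMSE:", rmse(visual, uas))
    print("CCC :", lin_ccc(visual, uas))
```

Unlike the Pearson correlation, ρc decreases when predictions are shifted or scaled away from the 1-to-1 line, which is why it suits the comparison of trend lines against a 1-to-1 line described above.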

Discussion
Disease measurement based on a regression approach can perform well when the training and test data share a consistent statistical distribution. We found that differences in plot location and management practices can change the relationship between visual ratings and UAS-derived plant phenotypes. In addition, this relationship can change with climatic conditions and yearly fluctuations in epidemiological factors. As shown in the transferability test, a selective approach that uses the most relevant data as a training set was adequate, provided that enough data are available. Future research is needed to define the environmental parameters that govern the relationship between plant phenotype and disease severity so that an appropriate training dataset can be selected.
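One way to screen candidate training plots before fitting a model is to quantify the kind of vegetation-index offset that was identified visually in Figure 13. The sketch below is a hypothetical helper, not a procedure used in this study: it averages the difference in a vegetation index (for example, the L1G average of MSAVI) between a candidate plot and a reference plot within shared visual-rating bins. The function name, the bin edges, and the sample values are assumptions.

```python
import numpy as np


def binned_offset(ratings_ref, index_ref, ratings_other, index_other,
                  edges=(20, 40, 60, 80, 100)):
    """Mean index difference (other minus reference) over shared rating bins.

    Returns NaN when the two plots share no populated bin.
    """
    r_ref = np.asarray(ratings_ref, dtype=float)
    v_ref = np.asarray(index_ref, dtype=float)
    r_oth = np.asarray(ratings_other, dtype=float)
    v_oth = np.asarray(index_other, dtype=float)

    diffs = []
    n_bins = len(edges) - 1
    for i in range(n_bins):
        low, high = edges[i], edges[i + 1]
        last = i == n_bins - 1
        # Bins are [low, high), except the last bin, which includes its upper edge.
        in_ref = (r_ref >= low) & ((r_ref <= high) if last else (r_ref < high))
        in_oth = (r_oth >= low) & ((r_oth <= high) if last else (r_oth < high))
        if in_ref.any() and in_oth.any():
            diffs.append(v_oth[in_oth].mean() - v_ref[in_ref].mean())
    return float(np.mean(diffs)) if diffs else float("nan")


if __name__ == "__main__":
    # Hypothetical ratings (%) and index values, for illustration only.
    ref_ratings, ref_index = [25, 45, 65], [0.50, 0.40, 0.30]
    oth_ratings, oth_index = [30, 50, 70], [0.60, 0.50, 0.40]
    print(binned_offset(ref_ratings, ref_index, oth_ratings, oth_index))
```

A value near zero would suggest that the two plots share a similar index-to-rating relationship in the 20-100 range. Any threshold for accepting a plot as training data would have to be chosen empirically.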
There are several drawbacks to using UAS-based plant phenotypes for disease measurement. First, plant phenotype is not a direct measurement of tar spot intensity. Instead, plant phenotypes are determined by a combination of factors, including plant vigor, water stress, and disease stress. Second, the quality of plant phenotypic data acquired with the image-based approach can be reduced by strong winds, uneven illumination, and poor image alignment. Third, the size and number of spatial grids used in plant phenotyping remain ambiguous. Although this study tested two gridding schemes (L1G and L2G) and found that L1G produces more accurate results, more research may be required to determine an optimal grid geometry and size. Despite these drawbacks, disease measurement with image-based plant phenotypes has been one of the most frequently used methods in UAS research when the spatial resolution of the image is insufficient to capture individual disease lesions.
As a starting point for tar spot disease quantification using remotely sensed data, we hypothesized that UAS data can be used effectively to measure disease severity. Our data-driven approach provided a way to quantify tar spot severity effectively with regression techniques. However, the data were collected at a single location in a single year. Therefore, future studies are required to investigate the reproducibility of this method in space and time. Adding datasets from different years and locations could substantially help our approach to model the complex relationships between UAS phenotypes and the severity of tar spot.
As an alternative method of UAS-based disease measurement, a regression approach based on deep learning can be used. There are three significant advantages to using deep learning methods: (a) information loss during the phenotyping process can be minimized because deep learning uses the original pixel values; (b) a complex relationship between pixel values and visual ratings can be established; and (c) the phenotyping procedure can be omitted because deep learning models can be trained using only the orthomosaic and DSM as input.
In addition, we recommend developing a spectral disease index (SDI) for tar spot, which correlates a significant wavelength to tar spot-infected plants' biochemical or biophysical characteristics [24,56,57]. The generation of SDI using hyperspectral sensors will determine a functional spectral range throughout the different stages of the disease epidemic [24,58], which may improve our accuracy in tar spot disease detection and severity predictions.
This study does not address practical applications of the UAS-based tar spot measurement technology. In the future, we will explore the possibility of using cross-analysis and heterogeneous data to predict and manage plant diseases. The results and models reported in this study may be implemented and tested in next-generation decision support systems for mitigating tar spot disease.

Conclusions
This study presents UAS-based disease quantification of tar spot of corn based on spectral phenotyping and regression techniques. We showed that the highest accuracy of the proposed method was obtained by MLP models with a reduced number of lower resolution (L1G) phenotypes. The RMSE and ρc were 10.4 and 0.91 in the middle canopy layer and 7.9 and 0.90 in the upper canopy layer, respectively. In addition, the performance of MLP models that use 336 higher spatial resolution (L2G) phenotypes was comparable to the best results. Another important finding of this study was that UAS-based disease measurement was possible in the upper (L + 2 to flag leaf) and middle (L − 1 to L + 1) canopy layers. The cross-validation and transferability test results revealed that the accuracy of UAS-based tar spot measurement could be improved by training the model with a dataset containing sufficient statistical information on the relationship between spectral phenotypes and disease symptoms. We expect that the demonstrated approach could provide opportunities to detect and monitor plant diseases that show a gradual spectral response in the external plant structures as the disease develops.