Leaf Wetness Duration Models Using Advanced Machine Learning Algorithms: Application to Farms in Gyeonggi Province, South Korea

Leaf wetness duration (LWD) models have been proposed as an alternative to in situ LWD measurement, as they can predict leaf wetness using physical mechanism and empirical relationship with meteorological conditions. Applications of advanced machine learning (ML) algorithms in the development of empirical LWD model can lead to improvements in the LWD prediction. The current study developed LWD model using extreme learning machine, random forest method, and a deep neural network. Additionally, performances of these ML-based LWD models are evaluated and compared with existing models. Observed LWD and meteorological variable data are obtained from nine farms in South Korea. Temporal and geographical information were also used. Additionally, the priorities of the employed variables in the development of the ML-based LWD models were analyzed. As a result, the ML-based LWD models outperformed the existing models; the random forest led to the best performance for LWD prediction among the tested LWD models. Strengths of associations between input variables and leaf wetness were relative humidity, short wave radiation, air temperature, hour, latitude, longitude, and wind speed in descending order. Uses of the geographical and time information in development of LWD model can improve the performance of LWD model.


Introduction
Disease warning systems have been studied and developed to prevent and mitigate risks from diseases among crops. Such disease warning systems can be grouped into two categories: calendar-based systems and environment variable-based systems. While the former recommends chemical sprays based on fixed calendar dates or phenological stages, the latter recommends chemical sprays based on the level of disease risk estimated according to in situ environmental conditions [1,2]. The latter system may be a more efficient way for disease warning, as it can adaptively respond to the level of disease risk according to changes in environmental conditions. Relationships between leaf wetness (LW) and plant diseases have been studied in many literatures because risks of many bacterial and fungal diseases among a variety of crops have been associated with LW, particularly in conjunction with temperatures favorable to infection [3,4]. LW is defined as the presence of free water on the surface of a leaf [5]; it mainly comes from three sources: water intercepted by the leaf during a rainfall or fog event, overhead irrigation, or dew that is formed on the leaf. The duration of LW is termed as leaf wetness duration (LWD).  Table 2 presents the means and standard deviations of the observed meteorological variables, LW, and LWD for the nine sites. The means and standard deviations of T air and short-wave radiation is relatively consistent across the nine sites, whereas the means of RH, WS, and LW vary with the sites. The RH for sites 1 and 8 is smaller than for other sites; because sites 1 and 8 are respectively located in an urban area and near a highway, their locations may influence their RH. Means of WS for sites 1, 2, and 4 are larger than those for other sites. This may result from that these sites are located in coastal and near coastal regions. Means of LW ranges from 0.22 to 0.4 and its standard deviations ranges from 0.41 to 0.491. While the means of LW vary across the sites, the standard deviations are relatively consistent.   Table 2 presents the means and standard deviations of the observed meteorological variables, LW, and LWD for the nine sites. The means and standard deviations of Tair and short-wave radiation is relatively consistent across the nine sites, whereas the means of RH, WS, and LW vary with the sites. The RH for sites 1 and 8 is smaller than for other sites; because sites 1 and 8 are respectively located in an urban area and near a highway, their locations may influence their RH. Means of WS for sites 1, 2, and 4 are larger than those for other sites. This may result from that these sites are located in coastal and near coastal regions. Means of LW ranges from 0.22 to 0.4 and its standard deviations   Figure 2 presents time series plots of LWD observations for four sites in 2017. As shown in Figure 2, large values of LWD are observed from the 121st to the 301st day. Fifty percent of annual precipitation falls during the summer rainy season (from June, the 152nd day to September, the 273rd day) in South Korea because of Changma (summer monsoons) and typhoons. Many rainfall events and high humidity lead to frequent dew formation in this season. Accordingly, the large values of LWD in this summer season result from the frequent presence of water on leaves from both rainfall events and high humidity.
2, large values of LWD are observed from the 121st to the 301st day. Fifty percent of annual precipitation falls during the summer rainy season (from June, the 152nd day to September, the 273rd day) in South Korea because of Changma (summer monsoons) and typhoons. Many rainfall events and high humidity lead to frequent dew formation in this season. Accordingly, the large values of LWD in this summer season result from the frequent presence of water on leaves from both rainfall events and high humidity.

Classification and Regression Tree/Stepwise Linear Discriminant (CART)
Gleason, et al. [17] suggested the empirical LWD prediction model using CART. This model produces LW predictions using three hierarchical levels and two linear discriminants. Each

Classification and Regression Tree/Stepwise Linear Discriminant (CART)
Gleason, et al. [17] suggested the empirical LWD prediction model using CART. This model produces LW predictions using three hierarchical levels and two linear discriminants. Each hierarchical level has thresholds to determine the existence of dew on the leaves. T air , RH, the dew point temperature (T dew ), and WS are all used in this model. The CART model assigns hourly data to one of four categories according to the threshold values of 3.7 • C for T dew , 2.5 m/s for WS, and 87.8% for RH. In the first level, dryness is predicted when T dew is equal to or greater than 3.7 • C; otherwise, the second level is considered. In the second level, wetness is predicted when inequality (1) is met and WS is less than 2.5 m/s; otherwise consideration moves on to the third level. Subsequently, wetness is predicted when Inequality (2) is met and RH is equal to or greater than 87.8%; otherwise, dryness is predicted. Inequalities (1) and (2) are formulated as follows: where D is the difference between T air and T dew (D = T air − T dew ). Since this model requires a smaller number of meteorological variables than the physical model, e.g., PM, it is easy to implement from a practical perspective.

Number of Hours of Relative Humidity (NHRH)
This model is one of the simplest models used to predict LWD. When the duration for which RH is higher than 90% exceeds a specific threshold, this model predicts an existence of dew on leaves. Sentelhas, et al. [18] reported that this model can provide prediction performance comparable with the PM model after correcting the LWD predictions based on local characteristics.

Penman-Monteith (PM)
The PM model has been applied to estimate latent heat flux (LE), which was used to determine the period of wetness on a leaf [11]. Since the PM model is a physical model, it can be applied in any location wherein the required meteorological observations are available. The main advantage of the PM model is that, unlike other physical models, it does not require air temperature measurements at the leaf level to estimate dew amount and duration. The LE for a mock leaf can be estimated for each interval of time using the PM approach [33]. The PM model is used as a representative of the physics-based LWD prediction model and uses a larger number of input variables than the CART and NHRH models. In particular, the predictive performance of the PM model is largely influenced by the precision of the net radiation (R n ) observation that is utilized. Sentelhas, et al. [15] reported that the PM model has low spatial variation in various meteorological conditions and provides a good performance with high applicability. The equation of LE is given in Equation (3): where s, R n , e s , e a , γ, r b , and r a are the slope of the saturation vapor pressure curve (hPa), net radiation of the mock leaf (J/min/cm 2 ), the saturated vapor pressure at the weather station air temperature (hPa), the actual air vapor pressure (hPa), the modified psychrometer constant (0.64 kPa/K is adopted in the current study), the boundary layer resistance for heat transfer, and the boundary layer resistance for WS, respectively. r b can be calculated by Equation (4): where D is the effective dimension of the mock leaf (0.07 m in this study). r a is calculated by Equation (4): where Z s , d, Z 0 and WS* are the height of the wetness sensor (m), the displacement height (0.5 Z c ), the roughness length (0.13 Z c ), and the friction velocity (m/s), respectively. Z c is the crop height. For LE estimates at the weather station where the wetness sensor is at the same level as the T, RH, and WS sensors, the boundary layer resistance for WS is not required in Equation (3). In the current study, Equation (3) is used to predict LE, since the levels from the T air , RH, and WS sensors are different. When the LE is greater than zero, the LW is predicted at the specific time. Because the net radiation is not observed in the considered sites, it is estimated using the method suggested by Walter, et al. [34].

ML Algorithms
ELM, RF, and DNN for ML algorithms are used in the LWD prediction models. The input variables (known features) of the three models are T air , RH, short wave radiation, WS, latitude, longitude, and hour on the 24-h-clock. Of these variables, the first four are meteorological variables. Because these variables have been used in the existing models, they can represent the meteorological influences on the LWD. The LWD can be altered depending on the location and time [20]. Geographical and time variables are used as input variables: latitude, longitude, and hour on the 24-h-clock. The training and test data sets are the same for the three ML algorithms. The data of all sites from 2009 to 2016, excluding 2013, are employed for the training data set. The data of all sites in 2017 are used for the test data sets. The occurrence of dew is predicted by the ML-based LW prediction models for every hour. The employed ML algorithms are briefly described in the following subsections.

Regularized Extreme Learning Machine
ELM is a single layer network with randomly generated weights and a bias between the input and hidden layers [35]. Unlike the traditional neural networks, which must iteratively optimize weights, the ELM model can be trained in a single iteration because of the randomized weight and bias. The ELM can be formulized by the following equation: where Y, β, and H are the outputs, the weight matrix between hidden layer and the output layer, and the output vector of the hidden layer called nonlinear feature mapping, respectively. The outputs are calculated as follows: where f a (·), W, B, and x are the activation function, the weight matrix between the input layer and the hidden layer, the bias, and the inputs, respectively. Here, the sigmoid function (f a (x) = 1 1+exp(−x) ) is used as the activation function in the ELM. Since the weights (W) and bias (B) are randomly generated and the activation function (f a (·)) is known in the ELM, H is comprised of deterministic variables from a data set. Thus, only β needs to be estimated for the ELM.
In the ELM, finding the appropriate weight set that avoids overfitting is important, as in an artificial neural network. Tuning weights in the EML can be considered as fitting a linear regression model using ordinary least square. Ridge regression was employed to attenuate multicollinearity in the data set by adding the norm of the parameters in the parameter estimation of the regression model [36]. In weight tuning, the ELM model also adapted this strategy. The ELM attempts to achieve a better generalization performance by attaining not only the smallest training error, but also the smallest norm of output weights. This minimization problem can take the form of ridge regression or regularized least squares as [37]: where the first term of the objective function is the l 2 norm regularization term which controls the complexity of the model. The second term is the training error associated with the learned model. C > 0 is a tuning parameter. The ELM gradient equation can be analytically solved, and the closed-form solution can be written as follows:β where I is an identity matrix. The ELM model used herein is the regularized ELM model. A 10-fold cross validation is conducted to identify the optimal structure of the ELM model with the fixed tuning parameter (C = 0.5). The 10 folds are split with equal spans from the training data sets. Data of 9 folds are used to train the ELM model and data of the remaining fold that is not used for training is used to compute the root mean square error (RMSE) of the EML model between prediction and observation. Thus, 10 RMSEs are obtained from the 10-fold cross validation. The mean of the RMSE of the ELM model decreases as the number of nodes increases. The change in the RMSE mean is very small when the number of nodes is greater than 25. Twenty-five nodes would thus be the optimal number of nodes in the hidden layer for the ELM model of LWD prediction. The structure of the ELM model is presented in Figure 3.
observation. Thus, 10 RMSEs are obtained from the 10-fold cross validation. The mean of the RMSE of the ELM model decreases as the number of nodes increases. The change in the RMSE mean is very small when the number of nodes is greater than 25. Twenty-five nodes would thus be the optimal number of nodes in the hidden layer for the ELM model of LWD prediction. The structure of the ELM model is presented in Figure 3.

Random Forest
The RF method has been widely applied in regression and forecasting problems [38][39][40]. The RF was proposed by Breiman [41] and uses a bagging (also referred to as bootstrapping in the field of statistics) to build a number of decision trees with controlled variance. Thus, the RF method consists of an ensemble of simple decision trees. Each decision tree in the RF is grown using randomly selected samples. Subsequently, the nodes in each tree use randomly selected features (called input variables). The RF has two major steps: (1) randomness and (2) ensemble learning.
The randomness in the RF comes from the random sampling of the entire data set and the selection of features with which every classification and regression tree is built. The data set is randomly sampled with replacement to create a subset with which to train one classification and regression tree. At each node, the optimal split rule is determined by using one of the randomly selected features from the employed features. Approximately two-thirds of the entire data set will be chosen as the training subset. Random selection without replacement is also performed on all features. The data set not included in the training subset is termed 'out-of-bag' and is applied to evaluate the performance of the RF model as well as the importance of the features.
The ensemble learning method in the RF means that all individual decision trees in a collection of decision trees (called the ensemble) contribute to the final prediction. A training subset is created

Random Forest
The RF method has been widely applied in regression and forecasting problems [38][39][40]. The RF was proposed by Breiman [41] and uses a bagging (also referred to as bootstrapping in the field of statistics) to build a number of decision trees with controlled variance. Thus, the RF method consists of an ensemble of simple decision trees. Each decision tree in the RF is grown using randomly selected samples. Subsequently, the nodes in each tree use randomly selected features (called input variables). The RF has two major steps: (1) randomness and (2) ensemble learning.
The randomness in the RF comes from the random sampling of the entire data set and the selection of features with which every classification and regression tree is built. The data set is randomly sampled with replacement to create a subset with which to train one classification and regression tree. At each node, the optimal split rule is determined by using one of the randomly selected features from the employed features. Approximately two-thirds of the entire data set will be chosen as the training subset. Random selection without replacement is also performed on all features. The data set not included in the training subset is termed 'out-of-bag' and is applied to evaluate the performance of the RF model as well as the importance of the features.
The ensemble learning method in the RF means that all individual decision trees in a collection of decision trees (called the ensemble) contribute to the final prediction. A training subset is created after the random selection step. The classification and regression tree, without pruning, is used to construct a single decision tree. To grow K number of trees in an ensemble, this process (resampling a subset and training the individual tree) is repeated K times. The final predicted value comes from averaging the results of all individual trees. The ranger library in R package is used to construct a RF model in the current study [42]. The 10-fold cross validation is conducted to identify the optimal number of trees in the RF model. The RMSE mean of the RF decreases as the number of trees in the RF increases. The mean of the RMSE converges when the number of nodes is greater than 80. 80 may be the optimal number of trees for the RF model in this study this number is adopted for the number of trees in the RF model used to predict LWD.

Deep Neural Network
DNN is a multiple-hidden layer feed-forward network. The main difference between single-and multiple-hidden layer feed-forward networks is the number of hidden layers. A DNN has more than one hidden layer. The deep hidden layer of the DNN allows the emulation of a complex function relation between inputs and outputs by extracting the complicated feature structure [31]. In the current study, rectified linear unit (ReLU) function is used for the activation function of the DNN model. This application of the ReLU function provides a satisfactory performance in the neural network model with deep layers. For the employed DNN model, three hidden layers are adopted; each hidden layer has eighteen nodes. The structure of the DNN is manually optimized because there is no standard method to identify the optimal structure of DNN model. The structure of the employed DNN model is presented in Figure 4.
be the optimal number of trees for the RF model in this study this number is adopted for the number of trees in the RF model used to predict LWD.

Deep Neural Network
DNN is a multiple-hidden layer feed-forward network. The main difference between single-and multiple-hidden layer feed-forward networks is the number of hidden layers. A DNN has more than one hidden layer. The deep hidden layer of the DNN allows the emulation of a complex function relation between inputs and outputs by extracting the complicated feature structure [31]. In the current study, rectified linear unit (ReLU) function is used for the activation function of the DNN model. This application of the ReLU function provides a satisfactory performance in the neural network model with deep layers. For the employed DNN model, three hidden layers are adopted; each hidden layer has eighteen nodes. The structure of the DNN is manually optimized because there is no standard method to identify the optimal structure of DNN model. The structure of the employed DNN model is presented in Figure 4. To train the DNN model, the stochastic gradient descent method with momentum and an adaptive learning rate (known as the decaying rate) is used. Pre-training is conducted to avoid wrong training and reduce computation time for training. An auto-encoder is applied for the pre-training of the DNN model. The initial weights, which are obtained from the auto-encoder, are adopted in the fine tuning of the DNN model. To implement the DNN model and train the DNN algorithm, the H2O library in R package is used.

Evaluation Measures
As evaluation criteria, the RMSE, Pearson correlation (Cor), means absolute error (MAE), and mean bias (mBias) are employed. The RMSE measures the accuracy of a prediction compared to an observation based on the variance of prediction and the square of bias between the observation and To train the DNN model, the stochastic gradient descent method with momentum and an adaptive learning rate (known as the decaying rate) is used. Pre-training is conducted to avoid wrong training and reduce computation time for training. An auto-encoder is applied for the pre-training of the DNN model. The initial weights, which are obtained from the auto-encoder, are adopted in the fine tuning of the DNN model. To implement the DNN model and train the DNN algorithm, the H2O library in R package is used.

Evaluation Measures
As evaluation criteria, the RMSE, Pearson correlation (Cor), means absolute error (MAE), and mean bias (mBias) are employed. The RMSE measures the accuracy of a prediction compared to an observation based on the variance of prediction and the square of bias between the observation and prediction. Since the RMSE can simultaneously measure both the precision and bias of a prediction, the RMSE has often been employed to evaluate the performance of prediction models. A lower value of the RMSE indicates a better performance of the model. Equation (10) presents the equation of the RMSE: where E i , O i , and n are the ith LWD prediction, the ith LWD observation, and the number of data, respectively. The Pearson correlation can be calculated as in the following equation: where, E and O are the means of LWD prediction and observation, respectively. The MAE is a mean of absolute deviations between predictions and observations. The MAE provides information about average model-performance error. Equations of the MAE and mBias are given by Equations (12) and (13), respectively.

Results
The quantile-quantile density plots of six LWD prediction models for all stations are presented in Figure 5. Since the PM method can predict LWD without null WS (= 0 m/s), the null WSs are replaced by a very small WS (= 0.1 m/s) for LWD computation in the PM model. For other models, null WS data are used in LWD calculation. Approximately 70% of data are that have the value of LWD is zero. Since too many data have zero values, the quantile-quantile density plot may inappropriately represent the performances of LWD prediction models for non-zero LWD data. Hence, the correct predictions for a zero value of LWD are excluded in Figure 5. All employed models show good performances for LWD prediction, particularly for large values of LWD (larger than 6 h). High density is observed in the diagonal lines for the ML-based models. Based on this density, the ML-based models perform better models than the existing models. Particularly, the RF and DNN models outperform the other models including the ELM, CART, NHRH, and PM models. In the correlation calculation, all data including those with a zero value of LWD are used, while correlation of the PM model is estimated using the data set with zero values for LWD data and replacement of null wind speed. Correlation estimates of the ELM, RF, and DNN models are 0.776, 0.811, and 0.798, respectively. Subsequently, the CART, NHRH, and PM models provide 0.661, 0.725, and 0.656, respectively. The ML-based prediction models thus provide larger correlations than the existing models. Based on correlation estimates, the RF is the best model for LWD prediction. The difference between performances of the RF and DNN models is very small.
To assess the performances of the employed models, the evaluation measures including Cor, RMSE, mBias, and MAE for all data are computed and presented in Table 3. In calculating these evaluation measures, the data set with zero values for LWD and replacement of the null WS are used for the PM, whereas the data set with zero values for LWD and the null WS are used for the other models. Based on the results of the evaluation measures for all data, the ML-based models lead to better performances than the existing models. The RMSEs of the ML-based models are less than 4.    The results presented in Figure 5 may improperly represent the prediction performance of the PM model because of the replacement of the null WS. The null WS can be a very small WS that is unable to be measured by installed instruments. Hence, the LWD prediction excluding the null WS data are used to assess the prediction performance of the PM model and for comparisons with the other LWD prediction models. Figure 6 presents the quantile-quantile density plots of six LWD prediction models for all stations excluding the correct predictions for zero values of the LWD and data having a null WS. When the WS is small, the dew on leaves accumulates and remains more easily. Hence, high density for large values of LWD observations disappear. The correlation estimates of all employed models become larger than those presented in Figure 5. Particularly, the correlation estimate of the PM model largely increases compared to the increments of other models. The correlation estimate of the PM model is still smaller than that of the ML-based models and NHRH  The results presented in Figure 5 may improperly represent the prediction performance of the PM model because of the replacement of the null WS. The null WS can be a very small WS that is unable to be measured by installed instruments. Hence, the LWD prediction excluding the null WS data are used to assess the prediction performance of the PM model and for comparisons with the other LWD prediction models. Figure 6 presents the quantile-quantile density plots of six LWD prediction models for all stations excluding the correct predictions for zero values of the LWD and data having a null WS. When the WS is small, the dew on leaves accumulates and remains more easily. Hence, high density for large values of LWD observations disappear. The correlation estimates of all employed models become larger than those presented in Figure 5. Particularly, the correlation estimate of the PM model largely increases compared to the increments of other models. The correlation estimate of the PM model is still smaller than that of the ML-based models and NHRH model. Even if the data satisfying the condition of the PM model are used in evaluations of the model's performance, the PM model is unable to outperform to the other models.    Table 3. Overall, the MLbased models provide smaller RMSEs compared to the existing models. For the employed sites, excluding sites 1, 2, and 8, all ML-based models lead to smaller RMSEs than the existing models. Thus, the smallest RMSEs are observed in the ML-based models for all sites. The RF leads to the smallest RMSE for sites 1, 2, 4, 5, 6, 8, and 9. For sites 3 and 7, the DNN provides the smallest RMSE. Overall, the results of RMSE comparisons suggests that the RF model is best-suited to LWD predictions. Among the existing models, the NHRH model shows the best performance. Based on the results of the RMSE, the CART model leads to the worst performance among the employed LWD model based on the results of RMSE.
The mBiases of the six LWD models for all employed stations are presented in Figure 8. The used data sets are the same as those used in Table 3. Overall, the ELM, RF, and PM models show positive values of biases. These models overestimate the LWD. These overestimations can be found at small values of LWD observations (from 0 to 3 h) in Figure 5. The CART and NHRH models provide negative values of biases for all sites excluding sites 4, 8, and 9; overall, these models underestimate the LWD. These underestimations can be found from 3-to 6-h LWD observations. For sites 4, 8, and 9, all employed models overestimate LWD predictions.   Table 3. Overall, the ML-based models provide smaller RMSEs compared to the existing models. For the employed sites, excluding sites 1, 2, and 8, all ML-based models lead to smaller RMSEs than the existing models. Thus, the smallest RMSEs are observed in the ML-based models for all sites. The RF leads to the smallest RMSE for sites 1, 2, 4, 5, 6, 8, and 9. For sites 3 and 7, the DNN provides the smallest RMSE. Overall, the results of RMSE comparisons suggests that the RF model is best-suited to LWD predictions. Among the existing models, the NHRH model shows the best performance. Based on the results of the RMSE, the CART model leads to the worst performance among the employed LWD model based on the results of RMSE.
The mBiases of the six LWD models for all employed stations are presented in Figure 8. The used data sets are the same as those used in Table 3. Overall, the ELM, RF, and PM models show positive values of biases. These models overestimate the LWD. These overestimations can be found at small values of LWD observations (from 0 to 3 h) in Figure 5. The CART and NHRH models provide negative values of biases for all sites excluding sites 4, 8, and 9; overall, these models underestimate the LWD. These underestimations can be found from 3-to 6-h LWD observations. For sites 4, 8, and 9, all employed models overestimate LWD predictions. Note that the used data sets are the same as those used in Table 3.

Figure 8.
Mean bias (mBias) of the five LWD prediction models for all employed stations. Note that the used data sets are the same as those used in Table 3. Note that the used data sets are the same as those used in Table 3. Note that the used data sets are the same as those used in Table 3.

Figure 8.
Mean bias (mBias) of the five LWD prediction models for all employed stations. Note that the used data sets are the same as those used in Table 3.  Table 3. Figure 9 presents the MAEs of six LWD models for all employed stations. The used data sets are the same as those used in Table 3. Overall, results of the MAE are similar to the results of RMSE presented in Figure 7. The ML-based models lead to smaller MAE than the existing models for sites 3, 5, 6, 7, 8 and 9. Although the existing models have comparable performances to the ML-based models for sites 1, 2, and 4, the smallest MAEs are given by the RF that is one of the ML-based model. The best LWD prediction model is the RF model based on MAE. The RF model leads to the smallest MAEs for the employed sites except for sites 3 and 7. Values of the smallest MAEs for each site range from 1.5 to 3.2 h. The used LWD models provide good prediction performances for site 6. Overall values of MAEs for site 9 are larger than for other sites. Note that the used data sets are the same as those used in Table 3. Figure 9 presents the MAEs of six LWD models for all employed stations. The used data sets are the same as those used in Table 3. Overall, results of the MAE are similar to the results of RMSE presented in Figure 7. The ML-based models lead to smaller MAE than the existing models for sites 3, 5, 6, 7, 8 and 9. Although the existing models have comparable performances to the ML-based models for sites 1, 2, and 4, the smallest MAEs are given by the RF that is one of the ML-based model. The best LWD prediction model is the RF model based on MAE. The RF model leads to the smallest MAEs for the employed sites except for sites 3 and 7. Values of the smallest MAEs for each site range from 1.5 to 3.2 h. The used LWD models provide good prediction performances for site 6. Overall values of MAEs for site 9 are larger than for other sites.
To investigate the detailed performance of the LWD prediction models, the times series of observed and predicted LWDs are compared. Time series plots of the LWD observations and LWD predictions by the RF, DNN, and NHRH methods for sites 1 and 4 in 2017 are presented in Figure 10. The three presented models successfully predict the LWD for the two sites. They also successfully determine dry days. For site 1 (Figure 10a), the NHRH occasionally largely overestimates the LWD when large values of LWD are observed. The RF and DNN models have similar performances based on a comparison by visual inspection. Overall, the magnitudes of LWD prediction by the DNN are slightly larger than those by RF. Note that the used data sets are the same as those used in Table 3.
To investigate the detailed performance of the LWD prediction models, the times series of observed and predicted LWDs are compared. Time series plots of the LWD observations and LWD predictions by the RF, DNN, and NHRH methods for sites 1 and 4 in 2017 are presented in Figure 10. The three presented models successfully predict the LWD for the two sites. They also successfully determine dry days. For site 1 (Figure 10a), the NHRH occasionally largely overestimates the LWD when large values of LWD are observed. The RF and DNN models have similar performances based on a comparison by visual inspection. Overall, the magnitudes of LWD prediction by the DNN are slightly larger than those by RF.

Discussion
The size of the test data is more than three thousand (3285 data points). This number is limited to evaluate the performance of the model for reproducing long-term oscillation and trends. However, this number is sufficient to assess the performance of the model for their portability. Based on the results in this study, the ML-based models outperform existing models, such as the CART, NHRH, and PM models, for LWD prediction. Though the worst model of the ML-based models is the ELM model, its performance is better or equivalent to the existing models. The existing models are advantageous in that they can be applied for LWD predictions in any region. The PM model can predict LWDs without consideration of local characteristics when the required information is available. Additionally, many literatures reported that the performances of the CART and NHRH models are good enough for LWD prediction in many regions without any modifications [21,[43][44][45]. On the other hand, the ML-based models can be used to predict LWD in the region of interest where the observed data are used to train the ML algorithms. In the current study, the ML-based models for LWD prediction are built using local data from farms in Gyeonggi province, South Korea. Additionally, the geographical information, i.e., latitude and longitude, are used for input variables of the ML algorithms to improve the predictive performance of the ML-based models. Inclusion of this local and geographical information lead to improvements in the LWD prediction performances of the ML-based models. The performances of the ML-based models, built in the current study, can valid with LWD predictions in Gyeonggi province, South Korea. To use the current ML-based models

Discussion
The size of the test data is more than three thousand (3285 data points). This number is limited to evaluate the performance of the model for reproducing long-term oscillation and trends. However, this number is sufficient to assess the performance of the model for their portability. Based on the results in this study, the ML-based models outperform existing models, such as the CART, NHRH, and PM models, for LWD prediction. Though the worst model of the ML-based models is the ELM model, its performance is better or equivalent to the existing models. The existing models are advantageous in that they can be applied for LWD predictions in any region. The PM model can predict LWDs without consideration of local characteristics when the required information is available. Additionally, many literatures reported that the performances of the CART and NHRH models are good enough for LWD prediction in many regions without any modifications [21,[43][44][45]. On the other hand, the ML-based models can be used to predict LWD in the region of interest where the observed data are used to train the ML algorithms. In the current study, the ML-based models for LWD prediction are built using local data from farms in Gyeonggi province, South Korea. Additionally, the geographical information, i.e., latitude and longitude, are used for input variables of the ML algorithms to improve the predictive performance of the ML-based models. Inclusion of this local and geographical information lead to improvements in the LWD prediction performances of the ML-based models. The performances of the ML-based models, built in the current study, can valid with LWD predictions in Gyeonggi province, South Korea. To use the current ML-based models in other regions or other countries, the ML algorithms should be newly trained using the data in the regions of interest. The predictive performances of the existing models can be improved by adjusting parameters in the models based on the data in the regions of interest. Wang, et al. [24] reported that the performances of CART and NHRH models can be improved by adjusting parameters using local data. Because the ML-based models employed in this study can inherently reflect local and geographical characteristics in the LWD predictions, the performances of the ML-based models are superior to those of the existing models for the LWD data used here.
In the current study, the RF model is considered the best LWD prediction model in Gyeonggi province, South Korea. Based on evaluation results for all data, the RF model demonstrates the best performances. Based on evaluation results of individual sites, the RF model provides the best prediction performances. The RF model returns the smallest RMSE and MAE for seven sites. After the RF model, the DNN model gives the smallest RMSE and MAE. The CART model has proved its applicability and performance, with regard to the LWD predictions, across many literatures. Because the RF algorithm is an extension of the CART algorithm, the RF model should be able to provide more precise predictions than the CART algorithm with the same data set. When the input variables for the CART model, including T air , RH, T dew , and WS, are a part of the input variables for the RF model, performance of the RF model would be better than that of the CART model for LWD prediction. Therefore, among the ML algorithms, the RF model is a good candidate for implementation in LWD prediction models.
As mentioned in Section 3.2.2, the RF model can assess the importance of variables for target variable prediction. In the current study, permutation importance (also known as the mean decrease in accuracy) is used to assess the importance of each variable to LW by removing the association between that variable and the target. In permutation importance, the importance is calculated by randomly permuting the variable and measuring the resulting increase in error. Thus, the variable importance in the RF model indicates the mean of increments between mean square errors with and without the selected variable. Although the results of variable importance from the RF model may not represent the true associations between input variables and LW because of a lack of in-depth analysis, they can roughly show their associations.
The variable importance of the employed RF model is presented in Figure 11. In descending order, magnitudes of each variable importance are RH, short wave radiation, T air , hour, latitude, longitude, and WS. The first three variables in the order are meteorological variables. The meteorological information is the most important in explaining the LW phenomena. From this result, it can be inferred that the physical mechanism to produce LW and the relations between LW and meteorological variables may be consistent across the regions of interest in the current study. If they are not consistent, importance of the geographical information would be larger than those of the three meteorological variables. The most and least important variables are RH and WS, respectively. The RH is obviously an important variable in the LWD prediction as it is used as an input variable in all existing models. Furthermore, the NHRH, that is, the best model among the existing models considered in this study, uses only RH in its LWD prediction.
The short-wave radiation is not the common variables measured at weather stations. The portability of the model may be limited due to availability of short wave radiation observation. The short-wave radiation can be estimated by remote-sensing (satellite) and empirical equations. Although the use of short wave radiation estimate can worsen the performance of the model, the model with short wave radiation may lead to a better performance than the model without short wave radiation because it was considered as the second most critical variable among the tested input variables. In a further study, the difference between uses of short wave radiation observation and estimates should be examined for improving our understanding of the LW prediction. In addition, the model with variables having high availability should be developed for LW prediction. variables. Therefore, the importance of the time variable may be larger than that of the geographical information. Performance of the model without latitude and longitude was assessed and compared with the proposed model. The results were not presented in this study. As a result, uses of latitude and longitude for the input variable improved the performance of the model. This result indicates that the use of latitude and longitude in the model may allow that the model considers spatial dependence between LW and errors in the model. Considering the spatial dependence of the LW and errors would lead to the improvement in the performance of the model. The further study should investigate the influences of the spatial dependence in the LW and the errors on the model and compare with influences of other variables. In addition, the proposed model can be used for the stations in South Korea because of consideration of the spatial dependence. For other countries, the model should be developed using the data measured at areas of interests.

Conclusions
LWD prediction models are developed using advanced machine learning algorithms including ELM, RF, and DNN for LWD data from farms in Gyeonggi province, South Korea. Performances of the ML-based models are compared with those of existing models such as the NHRH, CART, and PM models to examine their applicability and suitability in LWD prediction. The following conclusions are ascertained: (1) The ML-based models outperform the existing models in LWD prediction for farms in Gyeonggi province, South Korea. The ML-based models provide better performances than the existing models based on all employed evaluation measures. Additionally, the worst model among the ML-based models has a comparable performance in LWD prediction to the existing models. Its performance is better than or equivalent to those of the existing models based on all employed evaluation measures. (2) Among the employed models, the RF model is the best for LWD based on the results of evaluation measures for all data. Additionally, it leads to the best performance for seven among nine sites. Although the DNN demonstrates the best performance for two sites, performance of the RF model would be nearly comparable to that of the DNN. Thus, for ML algorithms used in LWD prediction models, the RF algorithm is a good candidate. (3) The strengths of associations between input variables and LW are, in descending order, RH, short wave radiation, Tair, hour, latitude, longitude, and WS based on the results of variable Based on the presented results of variable importance, the hour variable, which represents the diurnal cycle of LW, is more important than the geographical information. Since meteorological variables such as T air and short-wave radiation have a diurnal cycle, they can represent the diurnal cycle. In fact, they may not represent a diurnal cycle because of the seasonal cycle of meteorological variables. Therefore, the importance of the time variable may be larger than that of the geographical information.
Performance of the model without latitude and longitude was assessed and compared with the proposed model. The results were not presented in this study. As a result, uses of latitude and longitude for the input variable improved the performance of the model. This result indicates that the use of latitude and longitude in the model may allow that the model considers spatial dependence between LW and errors in the model. Considering the spatial dependence of the LW and errors would lead to the improvement in the performance of the model. The further study should investigate the influences of the spatial dependence in the LW and the errors on the model and compare with influences of other variables. In addition, the proposed model can be used for the stations in South Korea because of consideration of the spatial dependence. For other countries, the model should be developed using the data measured at areas of interests.

Conclusions
LWD prediction models are developed using advanced machine learning algorithms including ELM, RF, and DNN for LWD data from farms in Gyeonggi province, South Korea. Performances of the ML-based models are compared with those of existing models such as the NHRH, CART, and PM models to examine their applicability and suitability in LWD prediction. The following conclusions are ascertained: (1) The ML-based models outperform the existing models in LWD prediction for farms in Gyeonggi province, South Korea. The ML-based models provide better performances than the existing models based on all employed evaluation measures. Additionally, the worst model among the ML-based models has a comparable performance in LWD prediction to the existing models. Its performance is better than or equivalent to those of the existing models based on all employed evaluation measures. (2) Among the employed models, the RF model is the best for LWD based on the results of evaluation measures for all data. Additionally, it leads to the best performance for seven among nine sites. Although the DNN demonstrates the best performance for two sites, performance of the RF model would be nearly comparable to that of the DNN. Thus, for ML algorithms used in LWD prediction models, the RF algorithm is a good candidate. (3) The strengths of associations between input variables and LW are, in descending order, RH, short wave radiation, T air , hour, latitude, longitude, and WS based on the results of variable importance from the RF model. The first three variables in this ordered list are meteorological variables; the meteorological information is the most important in explaining LW phenomena. The hour variable representing the diurnal cycle is more important than geographical information and WS data.
In the current study, all observed data are used to build the LWD prediction models using ML algorithms. The mechanisms producing LW may be different in rainy and dry days. Hence, to improve the predictive performance of the LWD prediction model using a ML algorithm, the LWD prediction models should be built separately for rainy and dry days. Additionally, though the RF leads to the best performance among the employed ML algorithms in this study, the best algorithm may vary across different regions of interest. Because performance of ML algorithms strongly relies upon used data, appropriateness of the ML algorithms for LWD prediction models should be assessed with LWD data in other regions or countries. This would allow a more in-depth investigation of the suitability of ML algorithms to LWD prediction.