Machine Learning-Based Hourly Frost-Prediction System Optimized for Orchards Using Automatic Weather Station and Digital Camera Image Data

Abstract: Spring frosts damage crops that have weakened freezing resistance after germination. We developed a machine learning (ML)-based frost-classification model and optimized it for orchard farming environments. First, logistic regression, decision tree, random forest, and support vector machine models were trained using balanced Korea Meteorological Administration (KMA) Automated Synoptic Observing System (ASOS) frost observation data for March from the last 10 years (2008–2017). Random forest and support vector machine models showed good classification performance and were selected as the main techniques, which were optimized for orchard fields based on initial frost occurrence times. The training period was then extended to March–April for 20 years (2000–2019). Finally, the model was applied to the KMA ASOS frost observation data from March to April 2020, which were not used in the previous steps, and to RGB data extracted from digital cameras installed in an orchard in Gyeonggi-do. The developed model successfully classified 117 of 139 frost observation cases from the domestic ASOS data and 35 of 37 orchard camera observations. The assumption of the initial frost occurrence time for training helped the most in improving the frost-classification model. These results clearly indicate that the frost-classification model using ML has applicable accuracy in orchard farming.


Introduction
Frost is a phenomenon in which water vapor in the atmosphere crystallizes when the temperature falls below zero. It occurs at a small scale near the surface and is difficult to predict, owing to its complicated growth process and the nonlinear interaction between the contact surface and the atmosphere. Late frost occurs in spring and considerably damages crops that have weakened freezing resistance after germination. As the risk of late-spring frosts increases, it is important to predict spring frost and share the information with farmers. Mosedale et al. [1], for example, expected the risk of late-spring frosts to increase because of the earlier timing of grapevine bud break in the UK under future climate scenarios. A temperature-based frost index was then developed for frost warnings [2][3][4].
Chevalier et al. [5] developed a frost-alarm system using a fuzzy expert model. Alongside the development of artificial intelligence (AI), studies have been conducted to predict frost days by applying weather information to machine learning (ML) techniques [6][7][8]. In recent years, there have been attempts to mitigate frost risk with hybrid AI methods that combine various internet-of-things (IoT) sensors [9][10][11][12]. Most ML models using the IoT showed good accuracy and precision, but IoT devices were mainly used in greenhouses, owing to the availability of internet and power.
Radiation frost is caused by radiative cooling on the surface of the ground at night, and advection frost is caused by the advection of cold air [13,14]. South Korea comprises small and complex agricultural lands, and many corresponding studies on frost mechanisms and trends have been conducted. Kwon et al. [15] analyzed the meteorological characteristics of frost occurrence over the past 30 years and concluded that frost is predominantly caused by radiative cooling in South Korea. Frost predominantly occurs from October to April, with the first and last frost days occurring late. Notably, the number of late-frost phenomena tends to increase; thus, it is difficult to predict crop damage caused by late frost [16]. Bae et al. [17] analyzed the temporal and spatial variations in the number of frost days using a climate-change scenario. As in previous studies that used observational data, the first frost days were delayed, and the late-frost days arrived sooner than expected. Kim et al. [18] expected that the flowering period for the growth of pears, apples, and peaches would occur earlier if it were calculated based on the climate-change scenario. This increases the frost risk for flowers, which have very weak freezing resistance compared with the dormant period, as deviations in low temperature increase after flowering.
Using frost observation data in South Korea for learning, Lee et al. [19] attempted to use logistic regression (LR) and decision tree (DT) techniques to predict frost, and Kim et al. [20] estimated the occurrence of frost using artificial neural networks, random forests (RFs), and support vector machines (SVMs). However, these studies did not verify the proposed models using field observations and had a low temporal resolution, being limited to daily frost prediction. One of the most common anti-frost techniques, sprinkler irrigation, requires approximately 2.5-5.1 mm/h of water. Additionally, for the wind-machine technique, a 65-75-kW power source is needed for each 4.0-4.5 ha [21]. Therefore, daily predictions are insufficient to mitigate frost risks for orchards from the viewpoint of management.
In this study, we employ four ML methods (i.e., LR, DT, RF, and SVM) to develop frost-classification models based on meteorological data uniformly observed at the 24 h-manned synoptic weather observation stations of the Korea Meteorological Administration (KMA) Automated Synoptic Observing System (ASOS). Subsequently, the model with the highest classification accuracy is selected and optimized for application in actual farming environments. Then, the performance of the developed model is verified using the KMA ASOS frost observation information from March to April 2020 and the frost image information obtained from an orchard in Gyeonggi-do, Korea.

Input Data
Currently, frost observations in South Korea are performed twice daily (a.m. and p.m.) at 22 stations of the 24 h-manned synoptic weather observation stations of the KMA ASOS. Data were collected from a total of 19 inland stations (Figure 1). Nighttime was set as 17:00-06:00 LST instead of 18:00-06:00 LST to link up with the 17:00 LST weather forecast of the KMA. Focusing on the late frosts in spring, which directly cause significant damage to crops, nighttime (17:00-06:00 LST) temperature, subzero duration, precipitation, wind speed, humidity, snowfall, three-hourly fresh snowfall, and ground temperature over 10 years (2008-2017) were used for model training. A value of "1" was assigned collectively to all hours from 17:00 LST on the previous day to 06:00 LST on the present day if frost was observed in the morning, and a value of "0" was assigned otherwise.
The subzero duration is a secondary variable derived from temperature. It is obtained by accumulating the duration of subzero temperature from 17:00 LST on the previous day to 06:00 LST on the present day in 1 h increments. When the observed temperature is restored to above zero, it is reset to "0". The idea was devised considering that, for frost crystals to develop into a frost layer, they must be cooled for a critical time [22,23], and builds on the frost index [2][3][4] by using a subzero duration.
For onsite verification, orchards in Buk-myeon (point A in Figure 1; located in Gapyeong-gun, Gyeonggi-do) and Wabu-eup (point B in Figure 1; located in Namyangju-si, Gyeonggi-do), each equipped with an automatic weather station (AWS), were selected as verification target sites. To observe the frost phenomenon, the fruit's growth stage, and farming activities that can affect weather observation, a camera was installed on the AWS to obtain 2560 × 1440-pixel images 10 times daily (05:00, 06:00, 07:00, 08:00, 09:00, 10:00, 12:00, 14:00, 16:00, and 18:00 LST) (Figure 2). Meteorological observation data for the period March-April 2020 from the verification target sites, the Buk-myeon station in Gapyeong-gun and the Wabu-eup station in Namyangju-si, were provided by the Gyeonggi-do Agricultural Research and Extension Services (GARES) (http://nongup.gg.go.kr (accessed on 29 June 2021)) and were used for verification.
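The subzero duration described in this section can be illustrated with a short sketch. It is not the authors' implementation (the models in this study were built in R); the function name `subzero_duration` and the sample temperatures are hypothetical, and treating a reading of exactly 0 °C as a reset is an assumption.

```python
from typing import List


def subzero_duration(hourly_temps_c: List[float]) -> List[int]:
    """Running count (in hours) of consecutive subzero readings.

    The input is assumed to hold one air temperature per hour from
    17:00 LST on the previous day to 06:00 LST on the present day.
    The counter is reset to 0 whenever the temperature is not below
    0 degC (treating exactly 0 degC as a reset is an assumption).
    """
    durations = []
    run = 0
    for temp in hourly_temps_c:
        run = run + 1 if temp < 0.0 else 0
        durations.append(run)
    return durations


# Hypothetical night with 14 hourly readings (17:00 ... 06:00 LST).
night = [3.1, 2.0, 0.8, -0.2, -0.9, -1.4, 0.3,
         -0.5, -1.1, -1.8, -2.2, -2.5, -2.1, -1.0]
print(subzero_duration(night))
# -> [0, 0, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7]
```

The brief return above zero at the seventh reading resets the counter, so the value at 06:00 LST reflects only the final uninterrupted subzero spell.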

Preprocessing of Input Data
Data quality is the most important factor in classification algorithm training. As an input data preprocessing step to increase the frost occurrence classification accuracy of the model, errors and missing values were processed, and data were categorized. By referring to the quality-control (QC) flag (0: normal, 1: error, 9: missing), hourly records with missing or erroneous values of precipitation, wind speed, humidity, or ground temperature were deleted. Snowfall and three-hourly fresh snowfall data with no observations (null) were replaced with "0."
According to Eltahir [24], wet soil moisture conditions tend to enhance the net terrestrial radiation at the surface via cooling, and precipitation increases the water content in the soil. KMA's observation policy distinguishes rain days and no-rain days. Precipitation data have a null value for a no-rain day, whereas 0 mm signifies light rainfall that cannot be measured by the sensor. Precipitation was classified into three categories to account for ground conditions, distinguishing precipitation from non-precipitation times, and to serve as a measure of sky conditions. Among the observations with a normal QC flag (0), those with no observed precipitation value (null) were classified as "no rain", whereas observations of 0 mm or more, and less than 1 mm, were classified as "light rain". Those of 1 mm or more were classified as "rain" (Table 1). Light rain wets the surface, and evaporation of this water removes heat from the land surface. Therefore, light rain can contribute to nighttime cooling compared to a dry surface. On days with continuous precipitation, there are overcast clouds and high relative humidity; such days were classified as "rain" to distinguish them from weather conditions prone to radiation frost.
The criteria for "balanced" and "unbalanced" data depend on the proportion of frost observation data. All data for the past 10 years to be used for model development were unbalanced because they had approximately 5.5 times more days of non-frost occurrence than frost events (Figure 3A). When the frost and non-frost data sets are 50:50, the data are balanced. The training data set considerably affects the accuracy of ML models, and balanced data are especially important for classification models. Unbalanced data can degrade model accuracy because the classification model is trained on far more non-frost days than frost days [25,26]. General methods of resolving data imbalance include giving more weight to the class with less data and applying techniques such as up-sampling, down-sampling, and the synthetic minority over-sampling technique to adjust and balance the data [27]. To resolve the imbalance in the input data, the ratio of frost to non-frost events was adjusted to 50:50 by applying the down-sampling method, which completely preserved the observed values of frost days (Figure 3B). The balanced data were then divided into training and testing sets at a ratio of 70:30 by applying a randomization method. They were used to quantitatively diagnose performance by calculating the evaluation indicators of model training and those of the trained model. Furthermore, verification with unbalanced data that had not been sampled was performed to verify the performance of the model, as frost phenomena occur disproportionately in the real world.
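The down-sampling and 70:30 split described above can be sketched as follows. This is an illustrative sketch rather than the procedure used in the study: the function name `downsample_and_split`, the `frost` label key, and the fixed random seed are hypothetical choices, and frost is assumed to be the minority class.

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, float]


def downsample_and_split(rows: List[Row], label_key: str = "frost",
                         train_frac: float = 0.7,
                         seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Balance the classes by down-sampling non-frost rows, then split.

    Every frost row (label 1) is kept; an equal number of non-frost
    rows (label 0) is drawn at random, giving a 50:50 data set. The
    balanced rows are shuffled and divided into training and test sets.
    """
    rng = random.Random(seed)
    frost = [r for r in rows if r[label_key] == 1]
    no_frost = [r for r in rows if r[label_key] == 0]
    # Assumes frost is the minority class, as in the data described here.
    sampled = rng.sample(no_frost, k=min(len(frost), len(no_frost)))
    balanced = frost + sampled
    rng.shuffle(balanced)
    n_train = round(train_frac * len(balanced))
    return balanced[:n_train], balanced[n_train:]


# Hypothetical data: 20 frost rows and 110 non-frost rows (about 1:5.5).
rows = [{"frost": 1, "temperature": -1.0} for _ in range(20)]
rows += [{"frost": 0, "temperature": 4.0} for _ in range(110)]
train, test = downsample_and_split(rows)
print(len(train), len(test))
```

Because only the majority class is sampled, no observed frost record is discarded, which matches the stated reason for choosing down-sampling.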

Variable Setting and ML Technique
The complex model proposed in this study includes eight variables of input data (i.e., temperature, subzero temperature duration, precipitation, wind speed, humidity, snowfall, three-hourly fresh snowfall, and ground temperature) based on the observation factors of the KMA ASOS. The simple model includes five variables of input data (i.e., temperature, subzero temperature duration, precipitation, wind speed, and humidity), which are major observation factors of AWS. Originally, a model in which dew-point temperature was included as a variable was selected; however, the variance inflation factor was the highest for the dew-point temperature in the multicollinearity test. Therefore, models with the corresponding factor removed were selected. The frost-classification model was built in the R language; the packages used according to the ML technique are summarized in Table 2 [28][29][30][31][32][33].
$$\text{Frost}_{\text{complex}} = \text{temperature} + \text{subzero duration} + \text{precipitation} + \text{wind speed} + \text{humidity} + \text{snowfall} + \text{three-hourly fresh snowfall} + \text{ground temperature} \tag{1}$$
Frost-classification models classify the presence or absence of frost phenomena. DT [34], RF [35,36], and SVM [37] methods are known to perform well with binary classification problems. LR models have lower predictability and accuracy than other ML classification methods; however, the prediction result is a probability value rather than a zero or a one. Therefore, LR has the advantage of allowing the threshold to be adjusted by verification to improve the prediction accuracy of frost occurrence.
We employed the tree, rpart, and party packages in the R language for DT. Each package differs in terms of its pruning method. The tree package uses the binary recursive partitioning method, and the rpart package uses the CART methodology to determine the pruning variables based on entropy and Gini coefficients. The party package uses the methodology of unbiased recursive partitioning based on permutation tests to determine the variables to be pruned based on the importance that passed the significance (p-value) test. The Gaussian radial basis function kernel is used for SVM.

Model Evaluation

Performance Evaluation Indicators
In this study, a confusion matrix (Table 3) was prepared to evaluate the performance of the frost-classification model. When frost is not observed, the matrix counts cases of unclassified frost (true negative (TN)) and classified frost (false positive (FP)); when frost is observed, it counts cases of unclassified frost (false negative (FN)) and classified frost (true positive (TP)). As frost is classified as the presence or absence of the phenomenon, accuracy (ACC), false-alarm ratio (FAR), probability of detection (POD), and critical success index (CSI) are selected as the verification indicators. Their respective equations are as follows:
$$\mathrm{ACC} = \frac{TP + TN}{TP + FP + FN + TN}, \qquad \mathrm{FAR} = \frac{FP}{TP + FP}, \qquad \mathrm{POD} = \frac{TP}{TP + FN}, \qquad \mathrm{CSI} = \frac{TP}{TP + FP + FN}$$
The ACC is the ratio of correct classifications to the total number of classifications, and the FAR is the proportion of frost classifications that were false alarms. The POD is the ratio of the frost cases classified by the model to the number of actually observed frost occurrences. The CSI is the hit rate of frost occurrence classifications excluding TN. In natural conditions, there are far fewer cases of frost phenomena than cases of nonoccurrence and, because predicting frost occurrence is more important than predicting nonoccurrence, the CSI is considered the most important indicator. The area under the curve (AUC) of the receiver operating characteristic curve was also calculated. The AUC has a value between 0.5 and 1; the closer it is to 1, the better the model performance is [38].
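The four verification indicators can be computed directly from the confusion-matrix counts. The sketch below assumes the standard forecast-verification definitions of ACC, FAR, POD, and CSI; the function name `verification_indices` and the example counts are hypothetical and are not taken from Tables 4 or 5.

```python
from typing import Dict


def verification_indices(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute ACC, FAR, POD, and CSI from confusion-matrix counts.

    Standard definitions are assumed:
      ACC = (TP + TN) / (TP + FP + FN + TN)
      FAR = FP / (TP + FP)
      POD = TP / (TP + FN)
      CSI = TP / (TP + FP + FN)
    A ratio whose denominator is zero is returned as NaN.
    """
    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "ACC": ratio(tp + tn, tp + fp + fn + tn),
        "FAR": ratio(fp, tp + fp),
        "POD": ratio(tp, tp + fn),
        "CSI": ratio(tp, tp + fp + fn),
    }


# Hypothetical counts for illustration only.
scores = verification_indices(tp=40, fp=10, fn=10, tn=140)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With these counts the ACC (0.900) is inflated by the 140 true negatives, whereas the CSI (0.667) ignores them, which is why the CSI is more informative when frost cases are rare.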

Performance Result
The confusion matrix (Table 4) and verification index (Table 5) for the test data for each classification model were calculated. The results of the DT technique were denoted Tree 1, Tree 2, and Tree 3 in the order of the tree, rpart, and party packages. In the case of the tree package (Tree 1), the same confusion matrix was obtained for the complex model and the simple model.
The ACC was the highest for the SVM complex model, followed by the RF complex model. The FAR was the highest for the RF complex model, and the POD was the highest for the SVM complex model. The CSI was in the following order: SVM (0.637), RF (0.62), LR (0.605), and DT series (Tree 3: 0.596, Tree 2: 0.575, Tree 1: 0.568). The AUC was in the order of RF (0.853), Tree 3 (0.836), LR (0.816), SVM (0.771), Tree 2 (0.753), and Tree 1 (0.708).
For all techniques, the complex model had a higher verification index value than the simple model; however, if the input data of the frost-classification model were to be replaced with numerical weather prediction output in the future, the reliability would not necessarily increase with the number of input variables, because each variable carries numerical model error.
When the performance indicators were synthesized according to the test data, the RF and SVM techniques were determined to be the most appropriate for frost classification. They were selected as the final classification techniques for the frost-classification model (v1.0).

Application and Optimization
Frost is a meteorological phenomenon that is significantly affected by topography and the environment on a small spatial scale. The frost-classification model developed in this study aims to predict frost in the natural environment of an orchard, not at a weather station, where the influence of topography and environment is minimized. To produce frost information at a level that can be used on farms, the frost-classification model (v1.0) based on the two ML techniques selected in Section 2.4 (RF and SVM) was optimized and applied to the orchard in the pilot service target site (Gyeonggi-do) to verify the model performance.

Model Optimization Method
To optimize the ML-based frost-classification model and improve its classification performance, first, an assumption was introduced regarding the initial frost occurrence time of the frost occurrence date, which was used as the learning data. Second, the night minimum temperature variable was added to the learning data. Third, the period of observation data used for learning was expanded from March for the previous 10 years (2008-2017) to March-April for 20 years (2000-2019). Sections 3.1.1-3.1.3 provide detailed descriptions of each method.

Assuming the Initial Time of Frost Occurrence
Frost observation data in South Korea record only whether frost was observed in the morning or in the afternoon. As the time at which the frost occurred cannot be known from the observation information, the pre-optimization model (v1.0) assigned the same value as the frost observation information to all nighttime hours when constructing the hourly training data. The result classified by the model trained in this way can be viewed as information regarding the occurrence of frost the next morning, not as a classification of the occurrence of frost over time. Considering that several previous studies assumed frost days to be days when the minimum temperature was below 0 °C [5,7,14-17], the time at which the temperature reached 0 °C at night was assumed to be the initial frost occurrence time when frost was observed in the morning (Figure 4). When the temperature remained higher than 0 °C throughout the night, frost was assumed to exist only at 06:00 LST, the last time of the input data for the day.
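The labeling rule that follows from this assumption can be sketched in a few lines. The sketch is illustrative only: the function name `hourly_frost_labels` is hypothetical, and keeping the label at 1 from the assumed onset through 06:00 LST, even if the temperature later rises above 0 °C, is an assumption made here rather than a detail stated in the text.

```python
from typing import List


def hourly_frost_labels(hourly_temps_c: List[float],
                        frost_observed: bool) -> List[int]:
    """Build hourly training labels for one night (17:00-06:00 LST).

    If frost was not observed in the morning, every hour is labeled 0.
    If frost was observed, hours before the temperature first reaches
    0 degC are labeled 0 and hours from that time onward are labeled 1
    (assumption: the label stays 1 until the last hour). If the
    temperature never reaches 0 degC, only the last hour (06:00 LST)
    is labeled 1.
    """
    labels = [0] * len(hourly_temps_c)
    if not frost_observed or not hourly_temps_c:
        return labels
    onset = next((i for i, t in enumerate(hourly_temps_c) if t <= 0.0), None)
    if onset is None:
        labels[-1] = 1
    else:
        for i in range(onset, len(labels)):
            labels[i] = 1
    return labels


# Hypothetical nights (shortened to five hourly readings for brevity).
print(hourly_frost_labels([2.0, 1.0, 0.0, -1.0, -2.0], frost_observed=True))
print(hourly_frost_labels([2.0, 1.5, 1.0, 0.8, 0.5], frost_observed=True))
```

By contrast, the pre-optimization labeling (v1.0) would assign 1 to all five hours in both examples.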

Minimum Temperature at Night
When analyzing the verification results of the frost-classification model in real cases, a factor that affected the classification accuracy of the model was the occurrence of frost when the daily minimum temperature was higher than 0 °C. Such cases often occurred, and the cause is surmised to be errors arising from the difference between the height of the observation station thermometer and the location at which the frost is observed; in addition, the observation data consist of hourly air temperatures, so lower temperatures that could occur between observation times were not considered. To compensate for this, the minimum night temperature was added as an input variable to the training data. The lower of the daily minimum temperature and the lowest hourly temperature observed at the station during nighttime (17:00-06:00 LST) was used as the minimum nighttime temperature. The daily minimum temperature occurred primarily in the morning; however, to reflect cases where the temperature was rather high in the morning, it was compared with the hourly nighttime temperatures, and the lower value was used.
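The minimum nighttime temperature variable amounts to taking the lower of two quantities. A minimal sketch, with a hypothetical function name and hypothetical sample values:

```python
from typing import List


def night_minimum_temperature(daily_min_c: float,
                              night_hourly_temps_c: List[float]) -> float:
    """Lower of the daily minimum and the lowest hourly nighttime reading.

    night_hourly_temps_c is assumed to hold the hourly air temperatures
    for 17:00-06:00 LST; daily_min_c is the reported daily minimum.
    """
    return min([daily_min_c, *night_hourly_temps_c])


# The daily minimum can fall between two hourly observations.
print(night_minimum_temperature(-0.4, [1.2, 0.3, 0.1]))
# The nighttime readings can also be lower than the reported daily minimum.
print(night_minimum_temperature(1.5, [2.0, 0.9]))
```

In the first call the hourly readings never drop below 0 °C, yet the resulting variable is subzero, which is the situation this variable was introduced to capture.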

Extension of Training Period
Increasing the amount of training data is a simple method to improve the performance of an ML model. However, increasing the training data too much can lead to overfitting and could deteriorate the model's performance. For this reason, the training data, which consisted of weather observation data for March for 10 years, were gradually extended to weather observation data for March and April for 20 years (2000-2019). Although there are few days when frost occurs in April, the training period was extended because April is the flowering period of fruit trees in South Korea, and frost then causes considerable damage. Initially, the goal of the training-period expansion plan was to include data for 30 years (1990-2019), but observations before 1999 had a number of cases in which temperature values were missing for more than 2 h in a row. As the missing temperatures would significantly affect the calculation of the subzero duration and the assumption of the initial frost occurrence time, weather observation data since 2000 were used.

Optimization Results: Case Period March-April 2020
Table 6 summarizes the contents of the phased optimization of the frost-classification model. The performance evaluation index values for each version, which were calculated using the test data reconstructed as balanced data, are shown in Table 7. When the initial frost occurrence time assumption was added to the pre-optimization version (v1.0) to produce Version 2.0, all indicators improved significantly. Furthermore, the difference in performance between the complex model requiring eight input variables and the simple model requiring five input variables, for the same ML technique, was also considerably reduced. The performances of the ML techniques were almost identical. In Version 2.1, in which the training period was extended to March-April 2008-2019, all verification indicators of both ML techniques were improved. Version 2.2, in which the minimum temperature at night was added to the training to increase the classification accuracy for cases in which frost was observed when the daily minimum temperature was higher than 0 °C, provided a slight improvement in accuracy compared with Version 2.1. In Version 2.3, in which the learning period was extended from 2008-2019 to 2000-2019, the verification index decreased compared with the previous version.
[…] 7, most stations showed noticeable performance improvements in Version 2.0. Pohang, Changwon, Busan, and Yeosu showed lower indicators compared to the total. These stations share a common topographical characteristic: they lie on the southern coast of the Korean peninsula (Figure 1). Kwon et al. [15] determined that the average daily minimum temperature for the spring frost occurrence days of 1973-2007 in this southern coastal region was 1.0 °C. As mentioned in Section 3.1.2, the limitations of the current model are prominent in these areas, where there are many cases (e.g., frost observed when the daily minimum temperature was higher than 0 °C) that the models did not classify well.

Case Verification Using KMA ASOS Data
The classification results for each version were compared with the actual frost observation data, using as input the weather observation values from 17:00-06:00 at 18 KMA ASOS locations from March to April 2020. Frost occurrence was classified as true only when both ML techniques classified frost as occurring. As the morning frost observation data were obtained daily, hourly verification was conducted assuming the first frost occurrence time. Furthermore, daily verification was conducted using only the 06:00 LST classification results. Table 8 shows the classification results of each version using the confusion matrix and verification indices. Unlike the test data of the learning DB, which are balanced, the ACC was high for all versions in the actual case because there are far fewer frost days than non-frost days. The FAR, which was a limitation of the pre-optimization model (Version 1.0), was considerably improved in Version 2.0 and later, which reflected the assumption of the initial frost occurrence time. If the training data of the pre-optimization model (Version 1.0) included hours labeled as frost because frost occurred the next day, even when frost occurrence was difficult to predict accurately at those hours, these data likely contributed excessively to frost classification. However, the versions from 2.0 onward, which reflected the initial frost occurrence time assumption, classified fewer frost occurrences overall: TP decreased and FN increased, resulting in a lower POD. Among these versions, the POD was the highest for Version 2.2, as was TP, the number of cases in which frost was classified on days when frost was observed. However, because there is still a tendency to overestimate frost occurrence, both the frost events themselves and their duration on the days classified as frost days were overestimated. Version 2.3, which learned from 20 years of meteorological observation data, classified fewer frost occurrence days than Version 2.2, which learned from 10 years of data. For the simple model, the FN of Version 2.3, which indicates failure to classify actually observed frost, was higher than that of Version 2.2. The overestimation that appeared in Versions 2.1 and 2.2 was reduced in Version 2.3 (a decline in FP). Thus, the ACC and FAR were improved in Version 2.3.
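The consensus rule and the verification indices discussed above can be sketched as follows. This is a minimal illustration, not the authors' code: the function and class names are hypothetical, and the index formulas are the standard forecast-verification definitions (ACC, POD, FAR as the false alarm ratio, CSI), which are assumed here to match those used in Table 8.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int = 0  # frost observed and frost classified
    fp: int = 0  # no frost observed but frost classified
    fn: int = 0  # frost observed but not classified
    tn: int = 0  # no frost observed and none classified


def consensus(rf_frost: bool, svm_frost: bool) -> bool:
    """Frost is classified only when both models signal frost."""
    return rf_frost and svm_frost


def tally(observed, rf_pred, svm_pred) -> Confusion:
    """Build a confusion matrix from observations and two model outputs."""
    c = Confusion()
    for obs, rf, svm in zip(observed, rf_pred, svm_pred):
        pred = consensus(rf, svm)
        if obs and pred:
            c.tp += 1
        elif pred:
            c.fp += 1
        elif obs:
            c.fn += 1
        else:
            c.tn += 1
    return c


def indices(c: Confusion) -> dict:
    """Assumed standard definitions of the verification indices."""
    def ratio(num, den):
        return num / den if den else float("nan")

    total = c.tp + c.fp + c.fn + c.tn
    return {
        "ACC": ratio(c.tp + c.tn, total),
        "POD": ratio(c.tp, c.tp + c.fn),
        "FAR": ratio(c.fp, c.tp + c.fp),
        "CSI": ratio(c.tp, c.tp + c.fp + c.fn),
    }


# Toy data for illustration only (not from the study).
observed = [True, True, False, False, False, True]
rf_pred = [True, True, True, False, True, False]
svm_pred = [True, False, True, False, False, False]
matrix = tally(observed, rf_pred, svm_pred)
scores = indices(matrix)
```

Requiring agreement between both models can only lower the number of classified frost cases relative to either model alone, which tends to reduce FP (and FAR) at the cost of more FN (and a lower POD).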

Verification of Orchard Cases in March-April 2020 Using Digital Cameras and AWS Observations
The frost-classification model was verified using meteorological observation data from the GARES AWS and digital camera image data from March-April 2020 for the orchards selected as verification targets. The verification period was from March 1 to April 19 in Buk-myeon, Gapyeong-gun, and from March 1 to April 22 in Wabu-eup, Namyangju-si. Five days on which frost could not be identified owing to dense fog (Buk-myeon, Gapyeong-gun: March 1, March 22; Wabu-eup, Namyangju-si: March 22, April 18, April 20) were excluded from the verification. For the orchard AWS, rainfall of 0.1 mm or more and less than 1 mm was categorized as "light rain" because, unlike in the KMA observations, rainfall of 0 mm was not used to distinguish precipitation days from non-precipitation days.
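The verification calendar and the orchard rainfall category described above can be sketched as follows. The periods, excluded fog days, and the 0.1-1 mm "light rain" band come from the text. The site keys, function names, and the labels "rain" (1 mm or more) and "no rain" (below 0.1 mm) are assumptions for illustration.

```python
from datetime import date, timedelta

# Verification periods and fog-excluded days as stated in the text.
PERIODS = {
    "Gapyeong": (date(2020, 3, 1), date(2020, 4, 19)),
    "Namyangju": (date(2020, 3, 1), date(2020, 4, 22)),
}
FOG_DAYS = {
    "Gapyeong": {date(2020, 3, 1), date(2020, 3, 22)},
    "Namyangju": {date(2020, 3, 22), date(2020, 4, 18), date(2020, 4, 20)},
}


def verification_days(site: str) -> list:
    """Return the days of the period that remain after removing fog days."""
    start, end = PERIODS[site]
    n_days = (end - start).days + 1
    days = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in days if d not in FOG_DAYS[site]]


def categorize_orchard_rain(rain_mm: float) -> str:
    """Categorize orchard AWS rainfall; only 'light rain' is from the text."""
    if rain_mm >= 1.0:
        return "rain"        # assumed label for 1 mm or more
    if rain_mm >= 0.1:
        return "light rain"  # 0.1 mm or more and less than 1 mm
    return "no rain"         # assumed label below 0.1 mm


gapyeong_days = verification_days("Gapyeong")
namyangju_days = verification_days("Namyangju")
```

Under these periods, 48 days remain for Gapyeong and 50 for Namyangju after the fog days are removed.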
While estimating the frost occurrence date in the orchard using digital camera images, the days when frost heave occurred and the days when a thick frost layer occurred were first classified (Figure 6). The RGB-based normalized difference snow index (RGB-NDSI), which uses RGB values to analyze snow cover [39,40], was calculated and used to determine frost on other days (Figure 7). Following Hinkler et al. [39], RGB-NDSI takes the form of the NDSI with the mid-infrared term replaced:

$$\text{RGB-NDSI} = \frac{RGB - MIR_{Replacement}}{RGB + MIR_{Replacement}},$$

where $RGB$ is the average of the RGB element values of a pixel, and $RGB_{Max}$ is the highest RGB value among pixels. In the equation that calculates $\tau$, a and b are empirical constants that are specific to each camera; Fedorov et al. [40] replaced this equation with $\tau = (RGB_{High})_{Mean}$. In this study, the average of the $RGB_{High}$ values was used as $\tau$ because the empirical constants, a and b, of the camera could not be obtained. The mid-infrared spectral band (MIR) value in the NDSI formula was replaced with $MIR_{Replacement}$, which was calculated from the RGB values. The images were analyzed using Python and the opencv-python library [41]. The orchard AWS did not observe all input data of the complex model; therefore, only the simple model was used for verification.
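The quantities named above can be sketched in plain Python. The NDSI is conventionally a normalized difference between a visible band and the MIR band, and the text states that the MIR term is replaced with MIR_Replacement. The formula for MIR_Replacement and the definition of the RGB_High pixels are not recoverable from this section, so both are taken as inputs here. All function names are hypothetical, and in practice the pixel values would come from an image loaded with opencv-python.

```python
from statistics import fmean


def pixel_rgb(r: float, g: float, b: float) -> float:
    """Average of the three channel values of one pixel."""
    return (r + g + b) / 3.0


def rgb_max(pixels) -> float:
    """Highest per-pixel RGB average among all (r, g, b) pixels."""
    return max(pixel_rgb(*p) for p in pixels)


def tau_from_high(rgb_high_values) -> float:
    """tau as the mean of the RGB_High values (RGB_High supplied by caller)."""
    return fmean(rgb_high_values)


def rgb_ndsi(rgb: float, mir_replacement: float) -> float:
    """NDSI-style normalized difference with MIR replaced (assumed form)."""
    denom = rgb + mir_replacement
    return (rgb - mir_replacement) / denom if denom else 0.0


# Toy pixel values for illustration only.
pixels = [(30, 60, 90), (200, 220, 240), (10, 10, 10)]
brightest = rgb_max(pixels)
tau = tau_from_high([200.0, 220.0, 240.0])
index = rgb_ndsi(60.0, 20.0)
```

The index is bounded between -1 and 1 for non-negative inputs, with values closer to 1 when the RGB brightness is large relative to the MIR replacement term.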
Table 9 shows the classification results of the simple model for each version using the confusion matrix and verification indices. As with the verification using the KMA ASOS, the pre-optimization version (1.0) showed a high POD but also a high FAR, and the ACC and FAR were improved in the optimized versions (Version 2.0 and later). After optimization, all versions alike classified 35 of 37 cases in the daily verification. As in the verification using the KMA ASOS, the overestimation was the smallest for Version 2.3, which was therefore judged to be more stable than the other versions.

Summary and Future Works
We developed an hourly frost-classification model using four types of ML methods (i.e., LR, DT, RF, and SVM) based on the frost observation information of the KMA ASOS for the past 20 years. Among them, the frost-classification model based on RF and SVM was selected. The basic assumptions of the model were altered, and the training data were increased to optimize the model for farming environments that are considerably affected by topography and environment. In addition, the frost-classification model was verified using the frost observation information of KMA's 24 h-manned synoptic weather observation stations and the camera-based frost observation information of orchards equipped with GARES AWS from March-April 2020; these verification data were unbalanced. During verification using the KMA ASOS observation information and orchard data, more frost occurrence days were classified in the pre-optimization version (1.0); however, it was difficult to apply the pre-optimization model to farms because it classified the frost phenomenon excessively. In the optimized version, these vulnerabilities were reduced, and the ACC, POD, and CSI were improved. Regarding frost events, a maximum of 117 of 139 domestic frost observations in the spring of 2020 were classified, and 35 of 37 cases were classified in the orchard verification using a camera.
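As a quick worked check of the counts quoted above, the share of observed frost cases that were classified can be computed directly. The helper name is hypothetical, and the percentages are simple ratios of the reported counts rather than values taken from Tables 8 and 9.

```python
def classified_share(classified: int, observed: int) -> float:
    """Percentage of observed frost cases that the model classified."""
    return round(100.0 * classified / observed, 1)


asos_share = classified_share(117, 139)    # KMA ASOS, spring 2020
orchard_share = classified_share(35, 37)   # orchard camera observations
```

This gives about 84.2% for the KMA ASOS cases and about 94.6% for the orchard cases.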
The assumption of the initial frost occurrence time greatly improved the performance of the frost-classification model using the ML method. If observations of the initial frost occurrence time or hourly frost observation data are used for training, the performance of the frost-classification model can be improved further. However, frost observation using a digital camera, as in this study, is limited in terms of hourly frost observations: it is impossible to capture pictures at night, and it is difficult to distinguish between reflected sunlight and frost crystals after sunrise. For this reason, frost observation using a thermal imaging camera may be a good alternative.
In this study, hourly observation data and their secondary variables were used as input data. Because hourly observation data were used as input, the training reflected the characteristics of variables with diurnal variation and of discontinuous variables such as wind speed and precipitation. The secondary variables, subzero temperature duration and categorized precipitation, were also used as input data. Categorized precipitation can be considered a variable that focuses more on sky conditions than on the amount of precipitation. However, the criteria for light rain and rain were determined empirically. Therefore, the contribution of wet surfaces to radiative cooling and frost occurrence needs further discussion.
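One of the secondary variables named above, subzero temperature duration, can be derived from hourly temperatures as sketched below. The exact definition used in the study is not given in this section. This sketch assumes a running count of consecutive hours with air temperature below 0 °C that resets whenever the temperature reaches 0 °C or higher. The function name is hypothetical.

```python
def subzero_duration(hourly_temps_c):
    """Running count of consecutive hours below 0 degC (assumed definition)."""
    durations = []
    run = 0
    for temp in hourly_temps_c:
        # Extend the run while below freezing; reset it otherwise.
        run = run + 1 if temp < 0.0 else 0
        durations.append(run)
    return durations


# Toy hourly temperatures in degC, for illustration only.
temps = [2.0, 0.5, -0.3, -1.2, 0.0, -0.5]
durations = subzero_duration(temps)
```

A derived variable of this kind carries information about the preceding hours into each hourly sample, which a classifier fed only instantaneous observations would otherwise lack.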
RF and SVM techniques have also evolved to perform nonlinear classification. However, the frost-classification model shows low performance for frost cases in which the daily minimum temperature is higher than 0 °C. Generally, frost phenomena are heavily influenced by temperature, but the data for these cases have nonlinearities. For the next step, we will classify these cases using a deep neural network-based algorithm that allows more diverse attempts at nonlinear classification via the adjustment of hidden layers and activation functions. Furthermore, the current frost-classification model has been used to predict frost based on the 17:00 LST KMA weather forecast as input data. The frost-prediction system can be further improved by considering frost-retaining conditions, which were not considered in the present study but will be in the near future.

Figure 2. AWS and digital camera (red circle) at the Gapyeong (A) and Namyangju (B) sites.

Figure 3. The number of events in the unbalanced data set (A) and balanced data set (B).

Figure 4. Example of assumed initial frost occurrence time.

Figure 5 presents the performance evaluation indicators for ASOS by version. The station-by-station confusion matrix is calculated based on the frost observations and the frost occurrence classified by the model. Data were classified as frost occurrence when the classification output of both ML techniques signified frost occurrence. As in Table 7, most stations show noticeable performance improvements in Version 2.0. Pohang, Changwon, Busan, and Yeosu showed lower indicators than the overall values. These stations share a topographical characteristic: they lie on the southern coast of the Korean peninsula (Figure 1). Kwon et al. [15] determined that the average daily minimum temperature on the spring frost occurrence days of 1973-2007 in this southern coastal region was 1.0 °C. As mentioned in Section 3.1.2, the limitations of the current model are prominent in these areas, where there are many cases (e.g., frost observed when the daily minimum temperature was higher than 0 °C) that the models did not classify well.

Figure 5. Performance evaluation indicators for each weather station shown in Figure 1.

Figure 6. Digital camera images on a frost day at Gapyeong (left) and Namyangju (right) shown in Figure 1.

Figure 7. Normalized difference snow index (NDSI) (red box) calculated from a frost day camera image at Namyangju.

Table 1. Categorization of precipitation observations.

Table 2. Package names and source codes for each classification method.

Table 4. Confusion matrix of original version for each classification model.

Table 5. Performance evaluation indicators of the original version for each classification model.

Table 6. Update note of frost-classification models.

Table 7. Performance evaluation indicators of the original and optimized models.

Table 8. Confusion matrix and performance evaluation indicators for the real case (KMA ASOS).

Table 9. Confusion matrix and performance evaluation indicators for the real case (orchard).