Machine Learning with Voting Committee for Frost Prediction

Vinícius Albuquerque de Almeida; Juliana Aparecida Anochi; José Roberto Rozante; Haroldo Fraga de Campos Velho

doi:10.3390/meteorology4010006

¹ Laboratory for Applied Meteorology, Federal University of Rio de Janeiro, Rio de Janeiro 21941-901, Brazil

² National Institute for Space Research, São José dos Campos 12227-010, Brazil

* Author to whom correspondence should be addressed.

Meteorology 2025, 4(1), 6; https://doi.org/10.3390/meteorology4010006


Abstract

A machine learning (ML)-based methodology for predicting frosts was applied to the southern and southeastern regions of Brazil, as well as to other countries including Uruguay, Paraguay, northern Argentina, and southeastern Bolivia. The machine learning model (using TensorFlow (TF)) was compared to the frost index (IG from the Portuguese: Índice de Geada) developed by the National Institute for Space Research (INPE, Brazil). The IG is estimated using meteorological variables from a regional weather numerical model (RWNM). After calculating the two indices using the ML model and the RWNM, a voting committee (VC) was trained to select between the computed outputs. The AdaBoostClassifier algorithm was employed to implement the voting committee. The study area was subdivided into three distinct subregions: R1 (outside Brazil), R2 (the south of Brazil), and R3 (southeastern Brazil). Two forecasting time scales were evaluated: 24 h and 72 h. The 24 h forecasts from both approaches (TF and RWNM) exhibited a similar performance in terms of the number of accurate predictions. However, in the region covering Uruguay and northern Argentina, the TensorFlow model demonstrated superior frost prediction accuracy. Additionally, the TensorFlow model outperformed the RWNM for the 72 h forecast horizon.

Keywords: frost index; frost prediction; deep learning; committee machine

1. Introduction

Various atmospheric phenomena in Brazil cause significant societal impacts; however, frost is considered one of the most detrimental to the country’s economy, particularly in sectors related to food production. In years with a high frost incidence, there is a marked decline in agricultural yields, which leads to a rise in prices due to product scarcity. Classic examples of reduced production and subsequent price increases have been extensively documented in studies on coffee crops (Margolis [1]; Hewitt [2]; Moricochi et al. [3]), wheat (Junges et al. [4]; Melo and Moro [5]), and corn (Tsunechiro and Miura [6]), among others.

The term frost is technically defined as the formation of ice crystals on exposed surfaces, either by freezing dew or by the phase transition from vapor to ice (Blanc et al. [7]; Bettencourt [8]; Mota [9]; Cunha [10]). However, this term is also used colloquially to characterize meteorological events that cause damage to various plant crops. There is no consensus in the literature regarding the definition of frost from a meteorological perspective. Consequently, several definitions can be found, including the following: (a) an air temperature less than or equal to 0 °C measured in a shelter at a height between 1.25 and 2.0 m (Hogg [11,12]; Lawrence [13]); (b) an air temperature below 0 °C without specifying the type and height of the shelter (Raposo [14]; Hewett [15]); and (c) a surface temperature below 0 °C (Cunha [16]).

Several methods (both passive and active) exist to minimize damage caused by frost (Snyder and Melor [17]); however, some of these methods can be expensive and require advance preparation time to implement. In this context, significant efforts have been applied to developing tools that can predict the occurrence of frost events in advance.

In recent years, machine learning methods have been widely used for frost prediction. Diedrichs et al. [18] developed a component for an IoT-enabled frost prediction system that uses machine learning algorithms, trained on previous readings from temperature and humidity sensors, to predict future temperatures. Ding et al. [19] proposed building predictive models with the support vector machine approach to capture possible causal relationships between several environmental factors and frost. Fuentes et al. [20] proposed a neural network model, trained by backpropagation, that predicts the minimum air temperature of the following day from air temperature, relative humidity, radiation, precipitation, and wind direction and speed, in order to detect radiative frost events. Maqsood et al. [21] presented a 24 h weather forecast for southern Saskatchewan, Canada, produced by a set of artificial neural networks, all trained with temperature, relative humidity, and wind speed data. Lira et al. [22] used a spatio-temporal neural network architecture and reported improvements over existing state-of-the-art methods for frost prediction. Talsma et al. [23] compared two neural network models, a fully connected network and a convolution-based model, against a Random Forest algorithm. Wassan et al. [24] also applied convolutional models to frost prediction, which illustrates the growing importance of deep learning approaches for this problem.

In recent studies, Rozante et al. [25] developed a frost index capable of predicting the possibility of frost up to five days in advance for three regions located in the south/southeast of Brazil and in parts of Argentina, Uruguay, and Paraguay. This index is obtained from multivariate statistical techniques applied to meteorological variables predicted by a regional model of high spatial and temporal resolution. According to the authors, a comparison between the forecasts of the regional model and the index indicated significant improvements by the index for all regions and forecasts analyzed. Rozante and co-authors [26] also presented a frost prediction system based on a multi-layer perceptron neural network. Two stochastic gradient descent optimization schemes were used for the learning process, and the system was applied to the South Region of Brazil.

The present study proposes a machine learning methodology for predicting frost in the south and southeast regions of Brazil, as well as in Uruguay, Paraguay, northern Argentina, and southeastern Bolivia.

The machine learning model developed in this study was compared with the frost index proposed by Rozante et al. [25], which is currently operational at the National Institute for Space Research (INPE: Instituto Nacional de Pesquisas Espaciais, Brazil). A key innovation of this research is the implementation of a committee machine [27], which integrates multiple machine learning algorithms to improve prediction accuracy. Two inputs are considered: the frost index computed by Rozante et al. [25], and a second index derived from a deep learning approach. The AdaBoostClassifier is employed as the voting committee, combining the strengths of both models to enhance the robustness and reliability of frost forecasts.

The paper is structured as follows: Section 2 provides a brief description of the dataset used in the research, the study area of interest, the experimental setup, the machine learning algorithms, and the evaluation metrics. Section 3 is dedicated to presenting the results, showcasing the key findings through figures, tables, and statistical analyses that illustrate the performance of the models under various conditions. Section 4 provides a detailed discussion of the results and concludes with a summary of the main contributions of the study.

2. Materials and Methods

Rozante et al. [25] define favorable situations for the occurrence of frost in two distinct classes: firstly, the current meteorological conditions; and secondly, conditions such as terrain exposure, proximity to forests, latitude, and altitude. In terms of atmospheric conditions, it is important to note the following: low temperature, clear sky, light winds, high atmospheric pressure, and low humidity.

To classify the atmospheric conditions, five predicted meteorological attributes extracted from the Eta regional meteorological model were used: temperature and relative humidity at 2 m, wind speed at 10 m, mean pressure at sea level, and cloudiness. The attributes were extracted from the 24 h forecasts of the Eta model. The time of the minimum temperature observed at the meteorological ground stations was used to select the Eta attributes from which the frost index was computed, both with the strategy of Rozante et al. [25] and with the TensorFlow deep learning approach.

There is no universal definition to characterize the occurrence of frost. Technically, frost is described as the formation of ice crystals on surfaces, either through the freezing of dew or the direct phase transition of water vapor into ice. However, the criteria for identifying frost vary depending on the context. In meteorology, some authors define frost as the occurrence of temperatures equal to or below 0 °C in a standard meteorological shelter, while others consider any air temperature below 0 °C, without specifying the type or height of the shelter. In agriculture, the characterization of frost is heavily dependent on the type of plant and its phenological stage, as each species has varying levels of tolerance to low temperatures. According to da Mota (1987) [9], leaf temperature thresholds for causing plant damage range from approximately −6 °C for more resistant plants to 0 °C for more sensitive ones. These leaf temperatures correspond to air temperatures between 0 °C and 6 °C when measured in a meteorological shelter, as described by [28]. Therefore, the chosen temperature range reflects the interaction between meteorological conditions and the potential damage to plants. For our studies and analyses, we adopted the criterion of minimum observed temperatures (Tobs) ≤ 6 °C. This approach aligned our methodology with the evidence reported in the literature and facilitated direct comparisons with previous studies, such as those presented by [25]. By doing so, we ensured a consistent and scientifically grounded framework for characterizing frost, addressing both agricultural impacts and the robustness of the adopted criteria.
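The labelling rule stated above can be written as a one-line test. The following is a minimal sketch; the function name and the station values are hypothetical and are not taken from the study.

```python
FROST_THRESHOLD_C = 6.0  # criterion adopted in the text: Tobs <= 6 °C


def is_frost_event(t_obs_min_c: float) -> bool:
    """Return True when the observed minimum shelter temperature meets the criterion."""
    return t_obs_min_c <= FROST_THRESHOLD_C


# Hypothetical station minima in °C (illustrative values only)
station_minima = {"station_a": -1.2, "station_b": 6.0, "station_c": 7.5}
labels = {name: is_frost_event(t) for name, t in station_minima.items()}
```

Note that the threshold is inclusive, so a minimum of exactly 6 °C is labelled as a frost event.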

The methodology for frost prediction presented here is general and uses some local conditions, for instance, topographic altitude and latitude. The case study area for frost forecasting with the machine learning approach corresponds to the south and southeast of Brazil and to neighboring countries: Uruguay, Paraguay, and parts of Argentina and Bolivia. The study area and topography are illustrated in Figure 1. The first region (R1) corresponds to northern Argentina, Uruguay, Paraguay, and southeast Bolivia. The second region (R2) encompasses the entire southern region of Brazil, covering three states: Rio Grande do Sul, Santa Catarina, and Paraná. Finally, the third region (R3) includes the states of São Paulo, Mato Grosso do Sul, and Rio de Janeiro, and the south of Minas Gerais. These regions encompass diverse landscapes. The Pampas plains in Uruguay and northern Argentina (R1) are characterized by flat, low-lying terrain that is prone to frost formation. The R2 region, located in southern Brazil, features a combination of flat terrain and areas with moderate relief, with altitudes ranging from low to medium. Additionally, this region is known for its extensive agricultural areas, such as the plains and fields of Rio Grande do Sul, which are particularly susceptible to frost formation during periods of low temperatures. The highlands of southeastern Brazil (R3) consist of mountainous areas and valleys, where the topography significantly influences temperature variations and frost occurrence.

Figure 1. Study area and topography.

2.1. Data

The data used in this study were obtained from the Eta model, which has been operational at the INPE since 1996. The Eta model, originally developed at the University of Belgrade [29,30], is a limited-area atmospheric model that utilizes the Arakawa E grid [31] and a terrain-following vertical coordinate system (η), making it well suited for regions with complex topography. The model covers the entire South American continent and surrounding oceanic areas. For this study, operational forecasts with a spatial resolution of 15 km and 50 vertical levels were employed. The model’s initial and lateral boundary conditions were provided by the analyses and forecasts of the Global Forecast System (GFS).

The analysis of frost patterns was conducted using two distinct data time series. The first dataset consisted of observed minimum temperature (Tobs) measurements collected from conventional meteorological stations distributed by the Global Telecommunication System (GTS) and provided by the National Institute of Meteorology (INMet). The second dataset comprised hourly numerical forecasts obtained from integrations of the regional Eta model. This model was initialized with conditions at 0000 and 1200 UTC, featuring a horizontal resolution of 15 km and 50 vertical levels, as described in Mesinger et al., Black et al., and Chou et al. [29,30,32].

The frost prediction experiments used 6 years of data (2012 to 2017). Data from 2012 to 2016 were used to calibrate the model, and 2017 was reserved for validating the index.
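A chronological split of this kind can be sketched as follows. The record layout (a dictionary with a `date` key) and the function name are assumptions made for illustration; only the year ranges come from the text.

```python
from datetime import date

TRAIN_YEARS = range(2012, 2017)  # 2012 to 2016 inclusive (calibration)
TEST_YEAR = 2017                 # held out for validation


def split_by_year(records):
    """Split records chronologically so that no 2017 sample is used for calibration."""
    train = [r for r in records if r["date"].year in TRAIN_YEARS]
    test = [r for r in records if r["date"].year == TEST_YEAR]
    return train, test


# Hypothetical records (illustrative values only)
records = [
    {"date": date(2012, 7, 10), "t_obs": 3.1},
    {"date": date(2016, 6, 2), "t_obs": 8.4},
    {"date": date(2017, 7, 17), "t_obs": -0.5},
]
train, test = split_by_year(records)
```

Splitting by year rather than at random keeps the evaluation closer to operational use, where the model is always applied to dates later than those it was calibrated on.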

2.2. IG-Frost Index

As already mentioned, Rozante and co-authors [25] established a frost index, IG (in Portuguese, Índice de Geada), for a region in South America to indicate whether or not frost will occur, based on meteorological variables associated with this event. Five meteorological variables, as predicted by the Eta limited-area meteorological model, are used in the IG calculation: temperature (T), humidity (H), sea level pressure (P), wind (V), and cloudiness (N).

Averages, indicated by the operator ⟨·⟩, and standard deviations of the five variables, given by Equations (1) and (2), respectively, were computed only for observed cases of frost:

⟨ {VAR}_{(i, j, h)} ⟩ = \frac{1}{n_{(i, j)}} \sum_{k = 1}^{n_{(i, j)}} {VAR}_{(i, j, h), k}

(1)

σ_{{VAR}_{(i, j, h)}} = \sqrt{\frac{1}{n_{(i, j)}} \sum_{k = 1}^{n_{(i, j)}} {[{VAR}_{(i, j, h), k} - ⟨ {VAR}_{(i, j, h)} ⟩]}^{2}}

(2)

where VAR = T, H, V, N, or P as predicted by the Eta model; (i, j) denotes the grid point nearest to the position of a weather station; n_{(i, j)} is the number of days with frost observations at that point; k indexes those days; h is the forecast lead time (24 h); and σ is the standard deviation of each variable.
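As a small numerical illustration of Equations (1) and (2), the sketch below computes the mean and standard deviation of one variable over frost days at a single grid point. It assumes the usual population form of the standard deviation (sum of squared deviations divided by n, then the square root); the function name and sample values are hypothetical.

```python
import math


def frost_day_statistics(values):
    """Population mean and standard deviation of one Eta variable over frost days."""
    n = len(values)
    if n == 0:
        raise ValueError("at least one frost day is required")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n  # divide by n, not n - 1
    return mean, math.sqrt(variance)


# Hypothetical 2 m temperatures (°C) forecast at one grid point on frost days
t2m_on_frost_days = [2.0, 4.0, 4.0, 6.0]
mean_t, sigma_t = frost_day_statistics(t2m_on_frost_days)
```

In practice the same computation would be repeated for each of the five variables and each station grid point.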

Finally, the IG is computed as a weighted linear combination over the five variables, using the averages and standard deviations from Equations (1) and (2):

{IG}_{(i, j, h)} = \sum_{u = 1}^{5} w_{u} \left[ \frac{⟨ {VAR}_{(i, j, h)}^{u} ⟩ - {VAR}_{(i, j, h)}^{u}}{σ_{{VAR}_{(i, j, h)}^{u}}} \right]

(3)

where u indicates the type of meteorological variable, and w_{u} are the weights. The calibration of the IG is described by Rozante et al. [25], where a set of thresholds L(i, j, h) is determined for each grid point and forecast hour for detecting a frost event:

\begin{cases} {IG}_{(i, j, h)} \geq L(i, j, h) & ⟹ \text{Occurrence}, \\ {IG}_{(i, j, h)} < L(i, j, h) & ⟹ \text{Non-occurrence}. \end{cases}

The threshold parameters L(i, j, h) depend on the (latitude, longitude) coordinates, the prediction time cycle, and other processes.
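The index and its threshold test can be illustrated with a short sketch for a single grid point and lead time. All numbers below (means, standard deviations, weights, forecast values, and thresholds) are hypothetical; the calibrated weights and thresholds are those of Rozante et al. [25] and are not reproduced here.

```python
def frost_index(forecast, mean, sigma, weights):
    """Weighted sum of standardized departures, following the form of Equation (3)."""
    return sum(
        weights[var] * (mean[var] - forecast[var]) / sigma[var]
        for var in weights
    )


def frost_expected(ig, threshold):
    """Occurrence when IG >= L, non-occurrence otherwise."""
    return ig >= threshold


# Hypothetical frost-day climatology and weights for one grid point
mean = {"T": 4.0, "H": 80.0, "P": 1020.0, "V": 2.0, "N": 0.2}
sigma = {"T": 2.0, "H": 10.0, "P": 5.0, "V": 1.0, "N": 0.1}
weights = {"T": 0.5, "H": 0.1, "P": 0.1, "V": 0.2, "N": 0.1}

# Hypothetical 24 h Eta forecast for the same grid point
forecast = {"T": 1.0, "H": 85.0, "P": 1022.0, "V": 1.5, "N": 0.1}
ig = frost_index(forecast, mean, sigma, weights)
```

With these illustrative numbers the colder, calmer, and clearer forecast raises the index, while the higher humidity and pressure terms lower it slightly; the sign and size of each contribution in the operational index depend on the calibrated weights.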

2.3. Neural Network

TensorFlow is a robust, open-source framework designed for the development and deployment of advanced machine learning algorithms [33]. It is applied as a high-level interface for the definition of complex models and as a scalable system optimized for executing computations on large datasets. Initially developed by the Google Brain team in 2011, TensorFlow was engineered to facilitate the exploration and application of large-scale deep neural networks, enabling both cutting-edge research and integration into a wide range of Google products.

TensorFlow is highly versatile, implementing a wide range of machine learning algorithms, particularly deep neural networks. It has been employed across diverse fields within computer science, as well as other disciplines, such as speech recognition, computer vision, robotics, natural language processing, and computational biology. The framework API and reference implementation were made publicly available in November 2015 under the Apache 2.0 license.

Through the utilization of TensorFlow, users can design diverse neural network architectures, which are typically organized with an input layer, one or more hidden layers, and an output layer (Figure 2). In addition to the number of layers, several parameters must be configured, such as the number of units in the hidden layers, the activation functions for each layer, the initial weights between connections, and the optimization algorithms used during training. These hyperparameters play a crucial role in determining the model’s overall performance.

Figure 2. Typical topology of a neural network.

Google Colaboratory (CoLab) [34] was used for prototyping the machine learning models. This platform is a product from Google Research that allows anybody to write and execute arbitrary Python code through the browser and is especially well suited to machine learning, data analysis, and education. More technically, CoLab is a hosted Jupyter notebook service that requires no setup, while providing free access to computing resources, including GPUs [34]. Figure 2 illustrates the topology of an artificial neural network with an input layer of eight neurons, two hidden layers of four neurons each, and an output layer with a single neuron.
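To make the 8-4-4-1 topology of Figure 2 concrete, the following framework-independent NumPy sketch runs a forward pass through a network of that shape. It is not the authors' TensorFlow code: the ReLU hidden activations, the sigmoid output, the weight initialization, and the random inputs are all assumptions chosen for illustration.

```python
import numpy as np

LAYER_SIZES = [8, 4, 4, 1]  # input, two hidden layers, output (as in Figure 2)


def init_params(rng):
    """Small random weights and zero biases for each pair of consecutive layers."""
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])
    ]


def forward(params, x):
    """Forward pass: ReLU in the hidden layers (assumed), sigmoid at the output (assumed)."""
    a = x
    for w, b in params[:-1]:
        a = np.maximum(a @ w + b, 0.0)
    w, b = params[-1]
    z = a @ w + b
    return 1.0 / (1.0 + np.exp(-z))  # value in (0, 1), read here as a frost probability


rng = np.random.default_rng(0)
params = init_params(rng)
x = rng.normal(size=(5, 8))  # five hypothetical samples with eight input attributes
probs = forward(params, x)
n_params = sum(w.size + b.size for w, b in params)
```

A network of this size has only 61 trainable parameters (36 + 20 + 5), which is small enough to train quickly on a few years of station data.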

2.4. Voting Committee

Voting committees (VC) are a type of committee machine [27]: a model trained to decide on the best forecast among an ensemble of models. The primary goal of a committee machine is to improve the overall prediction accuracy by combining the strengths of multiple individual models. Each model in the ensemble contributes to the final decision, typically through a voting mechanism. Here, the VC model is trained to determine which of the two indices should be used when the IG and TF forecasts diverge. In this research, the AdaBoostClassifier implementation available in the scikit-learn 1.6.1 Python module [35,36] was used.

Boosting rests on the principle that a very accurate prediction can be produced by combining several inaccurate models. The general idea is to develop the classifier ensemble incrementally, adding one classifier at a time. The classifier that joins the ensemble at a given step is trained on a dataset selectively sampled from the training dataset. The sampling distribution starts out uniform and is updated for each new classifier. The likelihood of selecting objects that were misclassified at the previous step is increased, so that they have a higher chance of entering the training sample of the next classifier. The algorithm is called AdaBoost [37], short for ADAptive BOOSTing [38].
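The reweighting step described in this paragraph can be sketched for a single boosting round. The code below follows the classic binary AdaBoost formulation (classifier weight α = ½ ln((1 − ε)/ε)); it is an illustration of the principle, not the exact update used inside scikit-learn, and the five-sample example is hypothetical.

```python
import math


def adaboost_round(weights, misclassified):
    """One reweighting step of classic binary AdaBoost.

    weights: current sampling distribution over the training samples (sums to 1).
    misclassified: booleans, True where the current weak classifier was wrong.
    """
    error = sum(w for w, m in zip(weights, misclassified) if m)
    if not 0.0 < error < 0.5:
        raise ValueError("weak learner must have weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - error) / error)
    # Misclassified samples gain weight, correctly classified samples lose weight
    updated = [
        w * math.exp(alpha if m else -alpha)
        for w, m in zip(weights, misclassified)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]


weights = [0.2] * 5                                  # uniform start
misclassified = [False, False, True, False, False]   # one error out of five
alpha, new_weights = adaboost_round(weights, misclassified)
```

After the update the single misclassified sample carries half of the total weight, so the next weak classifier is pushed to get it right.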

An AdaBoost classifier [37] is a meta-estimator that starts by fitting a classifier on the original dataset. It then fits additional copies of the classifier on the same dataset but adjusts the weights of incorrectly classified instances so that subsequent classifiers focus more on the difficult cases.
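A minimal sketch of how such a committee could be fitted with scikit-learn is given below. Everything about the data is synthetic and hypothetical: the six attribute columns, the rule that generates the frost label, and the 15% error rate of the two simulated member forecasts. Only the use of `AdaBoostClassifier` and the idea of feeding both member outputs as extra features come from the text; the hyperparameters are assumptions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(42)
n_samples = 400

# Hypothetical standardized Eta attributes (six columns) and a synthetic frost label
eta_attributes = rng.normal(size=(n_samples, 6))
truth = (eta_attributes[:, 0] < -0.3).astype(int)


def noisy_copy(labels, error_rate):
    """Simulate a member forecast that is wrong on a random fraction of samples."""
    flip = rng.random(labels.shape[0]) < error_rate
    return np.where(flip, 1 - labels, labels)


ig_pred = noisy_copy(truth, 0.15)  # stand-in for the IG forecast
tf_pred = noisy_copy(truth, 0.15)  # stand-in for the TensorFlow forecast

# Committee input: the attributes plus the two member forecasts
X = np.column_stack([eta_attributes, ig_pred, tf_pred])

committee = AdaBoostClassifier(n_estimators=50, random_state=0)
committee.fit(X[:300], truth[:300])  # "training period"

# Apply the committee only where the two members disagree in the held-out part
diverge = ig_pred[300:] != tf_pred[300:]
vc_pred = committee.predict(X[300:][diverge])
```

By default `AdaBoostClassifier` boosts depth-one decision trees (stumps), which matches the idea of combining many weak classifiers.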

2.5. Evaluation Metrics

A statistical evaluation was performed using the indices presented in Table 1. The distribution of observed and predicted cases for positive and negative events is shown in Table 2, which is used to calculate evaluation indices, such as CSI (Critical Success Index), POD (Probability of Detection), SR (Success Ratio), FAR (False Alarm Ratio), PC (Proportion Correct), and BIAS.

Table 1. Statistical indices for evaluation.

Table 2. Contingency table.

These metrics are commonly used in meteorology to assess the performance of forecast models, providing insights into their strengths and weaknesses in predicting specific events.
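For reference, the sketch below computes these indices from the four cells of a 2 × 2 contingency table using their standard definitions in forecast verification. The function name and the example counts are hypothetical.

```python
def forecast_scores(hits, false_alarms, misses, correct_negatives):
    """Standard categorical scores from a 2 x 2 contingency table."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    far = b / (a + b)
    return {
        "POD": a / (a + c),            # fraction of observed events that were forecast
        "FAR": far,                    # fraction of forecast events that did not occur
        "SR": 1.0 - far,               # success ratio
        "CSI": a / (a + b + c),        # hits relative to all forecast or observed events
        "PC": (a + d) / (a + b + c + d),
        "BIAS": (a + b) / (a + c),     # > 1 overforecasting, < 1 underforecasting
    }


# Hypothetical counts (illustrative values only)
scores = forecast_scores(hits=8, false_alarms=2, misses=2, correct_negatives=88)
```

The sketch does not guard against empty denominators, which occur when no event is forecast or observed. The example also shows why PC alone can be misleading for rare events: it reaches 0.96 here mainly because of the many correct negatives, while the CSI is only about 0.67.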

2.6. Description of Experiments

Experiments were performed using a meteorological dataset from 2012 to 2017. The dataset consists of several meteorological variables extracted from 24 h forecasts from the Eta Model, such as temperature, pressure, wind speed, cloud cover, humidity, and topography (height above sea level). Also, the observed temperature at several stations in the study area was used to define frost and non-frost events. These experiments consisted of creating frost forecast models using three different approaches: frost index, TensorFlow, and voting committee.

To perform the analysis, the dataset was divided into three regions, as illustrated in Figure 1. Two time periods were defined for the training and testing phases, corresponding to 2012–2016 and 2017, respectively. The TensorFlow model was trained using data from the 2012–2016 period, with the model configuration detailed in Table 3. Subsequently, the trained TensorFlow model was applied to the test dataset (2017) to compute the statistical metrics shown in Table 1.

Table 3. Final topology and other characteristics of neural networks in TensorFlow experiment.

The voting committee (VC) model was also trained using data from 2012 to 2016. This model uses the same input attributes as the TensorFlow (TF) and frost index (IG) approaches but incorporates the outputs of both models as additional input features. The VC model was specifically designed to improve forecasts in cases where the IG and TF predictions diverged. The trained VC model was then tested on the 2017 dataset, focusing on instances where divergences between IG and TF forecasts occurred. The statistical indices obtained from this application are presented in Table 1.
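The way the committee enters the final forecast can be sketched as a simple routing rule: where IG and TF agree, their common forecast is kept; where they diverge, the committee decides. The function name is hypothetical, and the fixed dictionary of votes stands in for a trained classifier.

```python
def combine_forecasts(ig_pred, tf_pred, vc_decide):
    """Keep the agreed forecast; call the committee only when IG and TF diverge."""
    final = []
    for i, (ig, tf) in enumerate(zip(ig_pred, tf_pred)):
        final.append(ig if ig == tf else vc_decide(i))
    return final


ig_pred = [1, 0, 1, 0]  # 1 = frost, 0 = no frost (hypothetical forecasts)
tf_pred = [1, 1, 0, 0]

# Hypothetical committee decisions for the two divergent cases (indices 1 and 2)
committee_votes = {1: 1, 2: 1}
final = combine_forecasts(ig_pred, tf_pred, vc_decide=committee_votes.__getitem__)
```

Because the committee is consulted only for divergent cases, its hits and misses are counted over that subset rather than over all stations.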

The total number of frost and no-frost occurrences is presented in Figure 3. Generally, the number of frost occurrences is lower than the number of no-frost occurrences. Additionally, it is observed that the training and validation period (2012 to 2016) contains a significantly higher number of frost cases compared to the test period (2017).

Figure 3. Distribution of frost and non-frost events across training, validation, and test periods.

A comparison of the 2017 results from the TensorFlow and voting committee models was performed against the frost index approach described by Rozante et al. [25]. Furthermore, the 24 h model trained with the 2012–2016 dataset was used in a case study to generate a 72 h forecast for 21 May 2018, enabling comparison with a similar case study presented in Rozante et al.

Table 3 presents the hyperparameters and other characteristics of the TensorFlow model.

3. Results

In this section, we present the results obtained with the TensorFlow and IG models, highlighting their performance in predicting these frost events and their potential applications in operational forecasting systems. Five days with the occurrence of frost were selected. According to the criteria presented in Section 2, 566 cases of recorded minimum temperature values (Tmin) were classified as frost events during July 2017.

3.1. 24 h Forecast

This subsection presents the results of the 24 h frost forecasts for the study area, which includes the southern and southeastern regions of Brazil, as well as parts of Uruguay, Paraguay, northern Argentina, and southeastern Bolivia. The predictions were obtained using a machine learning model based on TensorFlow and compared with the frost index. In addition to the individual model outputs, a voting committee approach was employed, to combine the predictions from both models.

3.1.1. Frost Predictions: 17 July 2017

The locations of frost events predicted by the IG model on 17 July 2017 are shown in Figure 4a for the 24 h forecast, while the results obtained with the TensorFlow model are presented in Figure 4b. The panels mark hits (green asterisks), correct negatives (green circles), misses (red asterisks), and false alarms (red circles). The IG model performed better on this date, successfully predicting all 85 frost events, with a Critical Success Index (CSI) of 0.83, a Probability of Detection (POD) of 0.91, and a low False Alarm Ratio (FAR) of 0.09. The TensorFlow model reached a slightly higher POD of 0.93 but a lower CSI of 0.77. Its Bias Score (BIAS) of 1.14 indicates a slight overprediction of frost events, which is also reflected in its higher FAR of 0.19 and suggests a trade-off between sensitivity and precision.
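The reported scores can be cross-checked against each other, because BIAS and CSI are fully determined by POD and FAR. In the worked equations below, a, b, and c denote hits, false alarms, and misses; this notation is an assumption, since the contingency table itself is not reproduced here. The small differences from the reported values (1.14 and 0.77) are within what rounding POD and FAR to two decimals can explain.

```latex
\mathrm{BIAS} = \frac{a + b}{a + c}
             = \frac{a / (a + c)}{a / (a + b)}
             = \frac{\mathrm{POD}}{\mathrm{SR}}
             = \frac{\mathrm{POD}}{1 - \mathrm{FAR}},
\qquad
\mathrm{CSI} = \frac{a}{a + b + c}
            = \left( \frac{1}{\mathrm{POD}} + \frac{1}{\mathrm{SR}} - 1 \right)^{-1}.

% TensorFlow model, 17 July 2017 (POD = 0.93, FAR = 0.19, so SR = 0.81):
\mathrm{BIAS} \approx \frac{0.93}{0.81} \approx 1.15,
\qquad
\mathrm{CSI} \approx \left( \frac{1}{0.93} + \frac{1}{0.81} - 1 \right)^{-1} \approx 0.76.
```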

Figure 4. Spatial distribution of the occurrence (★) and non-occurrence (○) of frost for 17 July 2017.

Figure 4c,d provide further insights into the differences between the IG and TensorFlow models. In Figure 4c, areas of significant disagreement between the models are highlighted. The IG model predicted 13 frost events (red circles) that the TensorFlow model did not detect, while the TensorFlow model identified 6 frost events (blue squares) missed by the IG model. This divergence illustrates the complementary nature of the two approaches, where one model compensates for the limitations of the other.

Figure 4d presents the results obtained using a voting committee approach, which combines the outputs of both models. This ensemble method achieved 12 hits and reduced the number of misses to seven, demonstrating the advantage of combining multiple machine learning models to improve forecast accuracy. By leveraging the strengths of both the IG and TensorFlow models, the voting committee approach enhances the robustness of frost predictions, minimizing false alarms while increasing the reliability of operational forecasts.

3.1.2. Frost Predictions: 18 July 2017

Figure 5 shows the 24 h forecast results on 18 July 2017. The results predicted by the IG model are shown in Figure 5a; Figure 5b shows the results obtained with the TensorFlow model. According to the criterion, 115 cases were registered as frost events. Among the 115 cases, both models were able to predict 113 frost events.

Figure 5. Spatial distribution of the occurrence (★) and non-occurrence (○) of frost for 18 July 2017.

Figure 5a shows that the frost index model performs well in identifying frost events. Hits, correct negatives, misses, and false alarms are visualized as green stars, green circles, red stars, and red circles, respectively. The model achieved a Proportion Correct (PC) of 0.95, a Probability of Detection (POD) of 0.94, and a False Alarm Ratio (FAR) of 0.04. Furthermore, the Bias Score (BIAS) of 0.98 indicates that the model is well calibrated, while the Critical Success Index (CSI) of 0.90 highlights its ability to correctly predict frost events. The TensorFlow model (Figure 5b) produced the same scores: a PC of 0.95, a POD of 0.94, a FAR of 0.04, a BIAS of 0.98, and a CSI of 0.90. These results indicate that both models are highly accurate on this date, with minimal errors and strong agreement in predicting frost events.

Figure 5c highlights the areas of model divergence, where predictions from the IG and TF models differ. Red circles indicate regions where the IG model predicted frost events (hits) while the TF model failed. In contrast, blue squares represent locations where the TF model successfully predicted frost events, but the IG model did not. The presence of only a few such divergent points suggests a high level of consistency between the two models, with discrepancies localized to specific areas. Finally, Figure 5d summarizes the results of a voting committee approach that combines the outputs of both models to enhance the prediction accuracy. This method identifies four confirmed frost events (hits), represented by green stars, and two missed events, marked as red dots. By consolidating the forecasts from IG and TF models, the voting committee reduces prediction errors and increases the robustness of the frost forecast.

3.1.3. Frost Predictions: 20 July 2017

Figure 6 presents the 24 h forecast for 20 July 2017, showing the performance of both the frost index (IG) model and the TensorFlow (TF) model. Figure 6a displays the frost prediction for the frost index model, while Figure 6b shows the prediction for the TensorFlow model. For this experiment, a total of 105 cases were classified as frost events. The TensorFlow model successfully predicted 86 events, while the frost index model predicted 85 events. This indicates that both models perform similarly, with TensorFlow achieving a slightly higher number of accurate predictions.

Figure 6. Spatial distribution of the occurrence (★) and non-occurrence (○) of frost for 20 July 2017.

In terms of performance metrics, the frost index model (Figure 6a) achieved a PC of 0.82, a POD of 0.70, and a FAR of 0.14. Additionally, the BIAS is 0.81, and the CSI is 0.62. These values suggest that the frost index model has a moderate performance, with a reasonable detection rate but a relatively high rate of false alarms. Similarly, the TensorFlow model (Figure 6b) shows nearly identical performance metrics, with a PC of 0.82, POD of 0.70, FAR of 0.14, BIAS of 0.82, and a CSI of 0.63. These metrics indicate that the TensorFlow model achieved a slightly better CSI.

Figure 6c illustrates the divergence between the two models, and Figure 6d shows the results of the voting committee approach that combines both models. In this figure, the green stars represent hits, while the red circles indicate misses. The voting committee approach led to 8 hits and 12 misses, demonstrating an improvement in prediction when combining the outputs of both models. In summary, the voting committee approach helps to refine the forecast and reduce the number of misses.

3.1.4. Frost Predictions: 21 July 2017

Figure 7 illustrates the 24 h frost forecast results for 21 July 2017, comparing the performance of the IG model and the TensorFlow model. Figure 7a displays the prediction results for the frost index model, and Figure 7b shows the results obtained with the TensorFlow model.

Figure 7. Spatial distribution of the occurrence (★) and non-occurrence (○) of frost for 21 July 2017.

In this case, a total of 91 frost events were recorded. The TensorFlow model demonstrated better performance, successfully predicting 55 frost events, while the frost index model identified only 47 frost events. This indicates that the TensorFlow model outperformed the IG model in detecting frost occurrences across the regions analyzed. In terms of metrics, the frost index model (Figure 7a) achieved a PC of 0.79, a POD of 0.45, and a FAR of 0.13. The BIAS of 0.52 suggests that the IG model underestimates frost events, while the CSI of 0.42 highlights its moderate success in predicting frost occurrences.

The TensorFlow model (Figure 7b) shows a similar overall performance, with a PC of 0.78, a slightly higher POD of 0.47, and a higher FAR of 0.22. Although its false alarm ratio is higher, its BIAS of 0.60 is closer to 1 than that of the IG model (0.52), indicating less underprediction. The CSI remains at 0.42, the same as for the IG model. Figure 7c depicts the divergence between the models. The larger number of blue squares shows that the TensorFlow model captures additional frost events missed by the IG model.

Figure 7d summarizes the results using the voting committee approach. The voting committee achieved 12 hits and 10 misses, indicating a slight improvement in detection accuracy when combining the outputs of both models. In conclusion, the TensorFlow model outperformed the frost index model on 21 July 2017, identifying a higher number of frost events.

3.1.5. Frost Predictions: 24 July 2017

Figure 8 presents the 24 h frost forecast results for 24 July 2017, comparing the IG model with the TensorFlow model. In Figure 8a, the IG model achieved a PC of 0.95, reflecting a high overall accuracy. However, its POD is notably low at 0.42, indicating that the model identified less than half of the frost events. Additionally, the FAR of 0.50 shows that 50% of the predicted frost cases did not occur, which, combined with a BIAS of 0.83, indicates an underprediction tendency. The CSI stands at 0.29, suggesting poor performance in accurately capturing frost occurrences due to missed events and false alarms.

Figure 8. Spatial distribution of the occurrence of frost (★) and non-occurrence (○) of frosts for 24 July 2017.

In Figure 8b, the TensorFlow model also achieved a PC of 0.95, maintaining similar overall accuracy. However, the POD of 0.58 demonstrates a significant improvement, as it detected 58% of the frost events compared to the IG model’s 42%. Despite this improvement, the FAR increased to 0.60, indicating a slightly higher rate of false alarms. The BIAS of 1.33 suggests that the TensorFlow model tended to overpredict frost events. Nonetheless, the CSI rose slightly to 0.33, reflecting an improvement in balancing hits, false alarms, and missed events.

The model divergence analysis in Figure 8c highlights specific regional differences in predictions. In the R1 region, where a single frost event occurred, the TensorFlow model successfully captured this event, whereas the IG model failed to detect it. In the R3 region, where seven frost events were recorded, the TensorFlow model correctly classified five cases. This underscores the TensorFlow model’s improved ability to predict frost events in critical regions where the IG model underperformed. Finally, Figure 8d illustrates the voting committee result and shows that both models combined correctly identified six frost events but missed six others.
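A divergence analysis of this kind amounts to tallying, station by station, which model matched the observation when the two forecasts disagree. The sketch below illustrates that bookkeeping under simple assumptions: forecasts and observations are binary (1 = frost, 0 = no frost), and the function name, labels, and sample values are hypothetical, not taken from the paper's data.

```python
from collections import Counter


def divergence_tally(observed, ig, tf):
    """Count agreements and, for disagreements, which model was right.

    With binary forecasts, exactly one model matches the observation
    whenever the two forecasts differ.
    """
    tally = Counter()
    for obs, ig_i, tf_i in zip(observed, ig, tf):
        if ig_i == tf_i:
            tally["agree"] += 1
        elif tf_i == obs:
            tally["tf_only_correct"] += 1
        else:
            tally["ig_only_correct"] += 1
    return tally


# Hypothetical values for five stations (not from the paper).
observed = [1, 1, 0, 0, 1]
ig_forecast = [0, 1, 1, 0, 0]
tf_forecast = [1, 1, 0, 0, 0]

tally = divergence_tally(observed, ig_forecast, tf_forecast)
```

Only the divergent cases are passed on to the voting committee; the cases counted under `agree` include those where both models were wrong, which no selection between the two can recover.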

In summary, Table 4 presents the results of the frost index and TensorFlow models applied over five days with the occurrence of frost. From the results shown in the table, the TensorFlow model proves to be competitive compared to the IG model, demonstrating similar performance across the metrics used. The POD values are relatively high, indicating its ability to accurately predict the occurrence of frost events. However, both models exhibit strengths and weaknesses on different dates, highlighting the importance of considering specific weather conditions when evaluating model performance.

Table 4. Results on divergence.

3.2. 72 h Forecast

The 72 h forecast evaluation compares the frost index and TensorFlow models across regions and metrics. Table 5 presents the results of both models applied to the test dataset. Specifically, in region R1, the frost index outperformed TensorFlow in CSI (0.68 vs. 0.60) and in POD (0.72 vs. 0.68); however, TensorFlow had an advantage in SR (0.84 vs. 0.82) and a lower FAR (0.16 vs. 0.18). In region R2, the frost index again led in CSI (0.57 vs. 0.54) and had a lower FAR (0.23 vs. 0.30), while TensorFlow performed better in POD (0.71 vs. 0.69). In region R3, the frost index maintained an edge in CSI (0.38 vs. 0.36) and FAR (0.42 vs. 0.47), with both models performing equally in POD (0.52). Regarding BIAS, the frost index underpredicted frost in all three regions (0.88 to 0.90), whereas TensorFlow underpredicted in R1 (0.81) and was close to unbiased in R2 (1.01) and R3 (0.99).

Table 5. Results.

Table 6 presents the results on divergence for two classes (frost and no frost) across three regions (R1, R2, and R3). Analyzing performance by region, in R1, both frost index and TensorFlow showed better performance compared to R2 and R3. Region R3 showed the worst results for both models, with low CSI values and high FAR values, indicating significant challenges in predicting events in this specific region.

Table 6. Results on divergence.

Figure 9 shows the 72 h forecast results on 21 May 2018. The results predicted by the IG model are shown in Figure 9a; Figure 9b shows the results obtained with the TensorFlow model; Figure 9c presents the model divergence analysis; and Figure 9d shows the voting committee approach. The frost index model achieved a PC of 72%, with a POD of 0.69, indicating that it successfully captured 69% of the frost events. However, it exhibited a FAR of 0.26, meaning that 26% of its frost forecasts were false alarms, and a BIAS of 0.93, indicating a slight underestimation of frost events. The CSI, which accounts for both false alarms and missed events, reached 0.56, reflecting moderate performance.

Figure 9. Spatial distribution of the occurrence of frost (★) and non-occurrence (○) of frosts for 21 May 2018.

In contrast, the TensorFlow model (Figure 9b) demonstrated noticeable improvements across all metrics. It achieved a PC of 77%, outperforming the IG model by 5 percentage points. The POD rose to 0.76, showing a better ability to capture frost events, while the FAR decreased to 0.22, representing a reduction in false alarms. The BIAS of 0.98 highlights a more balanced prediction of frost events, and the CSI improved significantly to 0.63. These results underscore the TensorFlow model’s superior performance in both detecting frost events and minimizing false predictions.

The model divergence analysis in Figure 9c further emphasizes this improvement, as the TensorFlow model correctly identified 43 frost events that were missed by the IG model, compared to only 33 hits exclusive to the IG predictions. This highlights TensorFlow’s ability to detect frost events in regions where the IG model failed. Finally, the voting committee approach in Figure 9d, which combines predictions from both models, resulted in 41 correct hits and 35 missed events. While this combined strategy reduces individual model weaknesses, the persistent missed events highlight the overall complexity of accurately predicting frost.

4. Discussion and Conclusions

Two methodologies for frost prediction were developed in this study: one leveraging deep learning through the TensorFlow (TF) platform and another based on the selection of two frost indexes estimated by the IG approach (see reference [25]). When these two methodologies converge in their predictions, the forecaster gains greater confidence in disseminating frost warnings. In instances of divergence, a machine learning-based voting committee is employed, reducing subjectivity by selecting the most accurate prediction based on a consensus-driven approach.
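The decision flow described above (accept the forecast when both methods agree, defer to the committee when they diverge) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `combine_forecasts`, the `tmin` feature, and the threshold rule standing in for the trained committee are all hypothetical. The abstract names AdaBoostClassifier for the committee's role; any callable that maps a station's features to a frost/no-frost decision could be plugged in here.

```python
from typing import Callable, List, Sequence


def combine_forecasts(
    ig: Sequence[int],
    tf: Sequence[int],
    features: Sequence[dict],
    committee: Callable[[dict], int],
) -> List[int]:
    """Return one final forecast per station (1 = frost, 0 = no frost)."""
    final = []
    for ig_i, tf_i, x_i in zip(ig, tf, features):
        if ig_i == tf_i:
            # Convergent case: both methods agree, keep their forecast.
            final.append(ig_i)
        else:
            # Divergent case: let the committee decide from the features.
            final.append(committee(x_i))
    return final


def placeholder_committee(x: dict) -> int:
    # Hypothetical stand-in for a trained classifier: a fixed threshold
    # on an assumed minimum-temperature feature (degrees Celsius).
    return 1 if x["tmin"] <= 2.0 else 0


# Hypothetical inputs for four stations (not from the paper).
ig_forecast = [0, 1, 0, 1]
tf_forecast = [0, 1, 1, 0]
station_features = [{"tmin": 4.0}, {"tmin": -1.0}, {"tmin": 1.5}, {"tmin": 6.0}]

final_forecast = combine_forecasts(
    ig_forecast, tf_forecast, station_features, placeholder_committee
)
```

Keeping the committee behind a plain callable separates the routing logic from the classifier, so the stand-in rule can be replaced by a trained model without touching the rest of the flow.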

The TensorFlow model demonstrated notable performance across several frost prediction cases, particularly on 18 and 20 July 2017, where it achieved a high accuracy, correctly classifying 113 of 115 cases on 18 July and 86 of 105 cases on 20 July. Furthermore, on 24 July 2017, the model successfully identified a frost event in region R1 that was not captured by the IG model. However, its performance varied across regions, with less satisfactory results observed in region R3 on 22 July, where accuracy was comparatively lower. These findings highlight the model’s ability to capture complex patterns in favorable conditions while underscoring the need for refinement to address challenges in regions with more complex climate dynamics.

In summary, the voting committee approach plays a crucial role in refining forecasts and reducing the number of misses, particularly in cases where individual methodologies provide divergent predictions. By integrating the strengths of both the TensorFlow and IG-based approaches, the voting committee enhances the overall forecast reliability and supports more accurate decision-making.

The integration of these methodologies provides a robust and flexible framework for frost prediction: the deep learning model excels in capturing intricate patterns in meteorological data, while the IG-based index contributes interpretability and domain knowledge.

The methodologies presented in this study, while developed with a focus on South America, demonstrate potential for global applicability. The framework can be adapted to various climatic and environmental conditions, making it a valuable tool for frost forecasting and risk management in diverse regions worldwide.

Future work could focus on extending the applicability of the voting committee to longer forecast periods and exploring additional machine-learning techniques to further improve prediction accuracy and reliability.

Author Contributions

Conceptualization: V.A.d.A., J.A.A., J.R.R. and H.F.d.C.V.; methodology: V.A.d.A., J.A.A., J.R.R. and H.F.d.C.V.; software development: V.A.d.A. and J.A.A.; validation: V.A.d.A., J.A.A. and H.F.d.C.V.; writing (original draft preparation): V.A.d.A., J.A.A. and H.F.d.C.V.; writing (review and editing): V.A.d.A., J.A.A., J.R.R. and H.F.d.C.V. All authors have read and agreed to the published version of the manuscript.

Funding

Author H.F.d.C.V. thanks the National Council for Scientific and Technological Development (CNPq, Brazil) for the research grant (CNPq: 315349/2023-9).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are available upon request.

Acknowledgments

The authors wish to thank the National Institute for Space Research and the National Council for Scientific and Technological Development (CNPq, Brazil).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Margolis, M.L. Green gold and ice: The impact of frost on the coffee growing region of Northern Paraná, Brazil. Mass Emergencies 1979, 4, 135–144.
2. Hewitt, K. Interpreting the role of hazards in agriculture. In Interpretations of Calamity; Hewitt, K., Ed.; Allen & Unwin: London, UK, 1983; pp. 123–139.
3. Moricochi, L.; Alfonsi, R.R.; Oliveira, E.D.; de Monteiro, J.L.M. Geadas e seca de 1994: Perspectivas do mercado cafeeiro. Informações Econômicas 1995, 25, 49–57.
4. Junges, A.; Fontana, D. Quebras de safra de trigo no estado do Rio Grande do Sul: Um estudo de caso. In Proceedings of the XVI Congresso Brasileiro de Agrometeorologia, Belo Horizonte, Brazil, 22–25 September 2009.
5. Melo, C.; Moro, L. Sazonalidade de Preços do Trigo no Paraná de 2000 a 2012. Revista de Política Agrícola. 22 June 2015. Available online: https://seer.sede.embrapa.br/index.php/RPA/article/view/852 (accessed on 15 December 2020).
6. Tsunechiro, A.; Miura, M. Segunda Estimativa de Oferta e Demanda de Milho no estado de São Paulo em 2009; Informações Econômicas: São Paulo, Brazil, 2009; Volume 39.
7. Blanc, M.L.; Geslin, H.; Holzberg, I.; Mason, B. Protection Against Frost Damage; WMO: Geneva, Switzerland, 1963.
8. Bettencourt, M.L. Contribuição para o Estudo das Geadas em Portugal Continental; Instituto Nacional de Meteorologia e Geofísica: Lisboa, Portugal, 1980.
9. Mota, F.S. Balanço hídrico. In Meteorologia Agrícola; Nobel: São Paulo, Brazil, 1987; pp. 279–309.
10. Cunha, F.R. O Problema da Geada Negra no Algarve; Series Divulgação. 12; INIA: Instituto Nacional de Investigacao Agraria: Lisbon, Portugal, 1982; 125p. (In Portuguese)
11. Hogg, W.H. Frequency of radiation and wind frosts during spring in Kent. Meteorol. Mag. 1950, 79, 42–49.
12. Hogg, W.H. Spring frosts. Agriculture 1971, 78, 28–31.
13. Lawrence, E.N. Frost investigation. Meteorol. Mag. 1952, 81, 65–74.
14. Raposo, J.R. A Defesa das Plantas Contra as Geadas. Junta de Colonização Interna, Estudos Técnicos. 1967; Volume 7, p. 111. (In Portuguese). Available online: https://livrariadalapa.com/categorias/2144-jose-rasquilho-raposo-a-defesa-das-plantas-contra-as-geadas (accessed on 7 October 2024).
15. Hewett, E.W. Preventing Frost Damage to Fruit Trees; New Zealand Department of Scientific and Industrial Research (DSIR) Information Series, No. 86; 1971; 55p. Available online: https://books.google.com.br/books?id=8eRL0AEACAAJ (accessed on 7 October 2024).
16. Cunha, J.M. Contribuição para o Estudo do Problema das Geadas em Portugal; 1952 Relatório Final do Curso de Engenheiro Agrónomo; Instituto Superior de Agronomia: Lisbon, Portugal, 1952. (In Portuguese)
17. Snyder, R.L.; de Melo-Abreu, J.P. Frost Protection: Fundamentals, Practice and Economics; Environmental and Natural Resources Series; FAO: Rome, Italy, 2005; Volume 1.
18. Diedrichs, A.L.; Bromberg, F.; Dujovne, D.; Brun-Laguna, K.; Watteyne, T. Prediction of frost events using machine learning and IoT sensing devices. IEEE Internet Things J. 2018, 5, 4589–4597.
19. Ding, L.; Noborio, K.; Shibuya, K. Frost forecast using machine learning-from association to causality. Procedia Comput. Sci. 2019, 159, 1001–1010.
20. Fuentes, M.; Cristóbal, C.; García-Loyola, S. Application of artificial neural networks to frost detection in central Chile using the next day minimum air temperature forecast. Chil. J. Agric. Res. 2018, 78, 327–338.
21. Maqsood, I.; Khan, M.R.; Abraham, A. An ensemble of neural networks for weather forecasting. Neural Comput. Appl. 2004, 13, 112–122.
22. Lira, H.; Martí, L.; Sanchez-Pi, N. A graph neural network with spatio-temporal attention for multi-sources time series data: An application to frost forecast. Sensors 2022, 22, 1486.
23. Talsma, C.J.; Solander, K.C.; Mudunuru, M.K.; Crawford, B.; Powell, M.R. Frost prediction using machine learning and deep neural network models. Front. Artif. Intell. 2023, 5, 963781.
24. Wassan, S.; Xi, C.; Jhanjhi, N.Z.; Binte-Imran, L. Effect of frost on plants, leaves, and forecast of frost events using convolutional neural networks. Int. J. Distrib. Sens. Netw. 2021, 17, 15501477211053777.
25. Rozante, J.R.; Gutierrez, E.R.; da Silva Dias, P.L.; de Almeida Fernandes, A.; Alvim, D.S.; Silva, V.M. Development of an index for frost prediction: Technique and validation. Meteorol. Appl. 2020, 27, e1807.
26. Rozante, J.R.; Ramirez, E.; Ramirez, D.; Rozante, G. Improved frost forecast using machine learning methods. Artif. Intell. Geosci. 2023, 4, 164–181.
27. Haykin, S. Neural Networks: A Comprehensive Foundation, 2nd ed.; Prentice Hall, Inc.: Hoboken, NJ, USA, 1999.
28. Pinto, H.C.; Tarifa, J.R.; Alfonsi, R.R.; Pedro, M.J.J. Estimation of Frost Damage in Coffee Trees in the State of São Paulo, Brazil; American Meteorological Society: Boston, MA, USA, 1977; pp. 37–38.
29. Black, T.L. The new NMC mesoscale Eta model: Description and forecast examples. Weather Forecast. 1994, 9, 265–278.
30. Mesinger, F.; Janjić, Z.I.; Ničković, S.; Gavrilov, D.; Deaven, D.G. The step-mountain coordinate: Model description and performance for cases of Alpine lee cyclogenesis and for a case of an Appalachian redevelopment. Mon. Weather Rev. 1988, 116, 1493–1518.
31. Arakawa, A.; Lamb, V.R. Computational design of the basic dynamical processes of the UCLA general circulation model. In Methods in Computational Physics: Advances in Research and Applications; Chang, J., Ed.; Elsevier: Amsterdam, The Netherlands, 1977; Volume 17, pp. 173–265.
32. Chou, S.C.; Tanajura, C.A.; Xue, Y.; Nobre, C.A. Validation of the coupled Eta/SSiB model over South America. J. Geophys. Res. Atmos. 2002, 107, LBA-56.
33. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. Available online: https://www.tensorflow.org/ (accessed on 21 June 2022).
34. Google. Google Colaboratory. Available online: https://research.google.com/colaboratory/faq.html (accessed on 13 September 2024).
35. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
36. Frank, E.; Hall, M.A.; Witten, I.H. The WEKA Workbench. Online Appendix for “Data Mining: Practical Machine Learning Tools and Techniques”. In Morgan Kaufmann, 4th ed.; The University of Waikato: Hamilton, New Zealand, 2016.
37. Freund, Y.; Schapire, R. A Decision-Theoretic Generalization of on-Line Learning and an Application to Boosting. J. Comput. Syst. Sci. 1997, 55, 119–139.
38. Kuncheva, L.I. Combining Pattern Classifiers: Methods and Algorithms; John Wiley & Sons: Hoboken, NJ, USA, 2014.

Figure 1. Study area and topography.

Figure 2. Typical topology of a neural network.

Figure 3. Distribution of frost and non-frost events across training, validation, and test periods.

Figure 4. Spatial distribution of the occurrence of frost (★) and non-occurrence (○) of frosts for 17 July 2017.

Figure 5. Spatial distribution of the occurrence of frost (★) and non-occurrence (○) of frosts for 18 July 2017.

Figure 6. Spatial distribution of the occurrence of frost (★) and non-occurrence (○) of frosts for 20 July 2017.


Table 1. Statistical indices for evaluation.

Index	Equation	Description	Reference Values
CSI	$\mathrm{CSI} = \frac{a}{a + b + c}$	Proportion of hits excluding correct “No” event forecasts	Perfect when 1
POD	$\mathrm{POD} = \frac{a}{a + c}$	Proportion of hits among observed “Yes” events	Perfect when 1
FAR	$\mathrm{FAR} = \frac{b}{a + b}$	Proportion of false alarms among forecast “Yes” events	Perfect when 0
SR	$\mathrm{SR} = 1 - \mathrm{FAR}$	Proportion of hits among forecast “Yes” events	Perfect when 1
BIAS	$\mathrm{BIAS} = \frac{a + b}{a + c}$	Ratio of forecast “Yes” events to observed “Yes” events	Perfect when 1
PC	$\mathrm{PC} = \frac{a + d}{a + b + c + d}$	Proportion of correctly classified events	Perfect when 1

Table 2. Contingency table.

	Observed
Predicted	Yes	No	Total
Yes	a	b	a + b
No	c	d	c + d
Total	a + c	b + d	n = a + b + c + d
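The indices in Table 1 follow directly from the four cells of the contingency table. The sketch below computes them in one place; the class and function names are hypothetical, the counts in the example are invented solely to exercise the formulas, and returning NaN for an empty denominator is a choice made here, not something the paper specifies.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Contingency:
    a: int  # hits: frost forecast and observed
    b: int  # false alarms: frost forecast but not observed
    c: int  # misses: frost observed but not forecast
    d: int  # correct negatives: no frost forecast, none observed


def _ratio(num: float, den: float) -> float:
    # Return NaN instead of raising when a denominator is zero.
    return num / den if den else float("nan")


def scores(t: Contingency) -> Dict[str, float]:
    """Evaluate the Table 1 indices for one contingency table."""
    far = _ratio(t.b, t.a + t.b)
    return {
        "CSI": _ratio(t.a, t.a + t.b + t.c),
        "POD": _ratio(t.a, t.a + t.c),
        "FAR": far,
        "SR": 1.0 - far,
        "BIAS": _ratio(t.a + t.b, t.a + t.c),
        "PC": _ratio(t.a + t.d, t.a + t.b + t.c + t.d),
    }


# Hypothetical counts (not from the paper).
example = Contingency(a=40, b=10, c=10, d=40)
result = scores(example)
```

With many more non-frost than frost cases, `d` dominates PC, which is why a forecast can reach a high PC while POD and CSI remain low.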

Table 3. Final topology and other characteristics of neural networks in TensorFlow experiment.

Hyperparameters	NN-TensorFlow
Version	2.0.0
Number of Inputs	7
Number of Layers	2
Number of hidden units (each layer)	25
Activation function (hidden layers)	ReLU
Activation function (output)	sigmoid
Optimizer	Adam ¹
Learning rate	0.001 (default)
Momentum	0.9 (default)
Epochs	1000

¹ https://keras.io/api/optimizers/adam/ accessed on 13 November 2024.
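To make the topology in Table 3 concrete, the sketch below builds a network of the same shape using only the Python standard library and runs a forward pass. It is not the authors' TensorFlow code: it assumes that "Number of Layers 2" means two hidden layers of 25 units each followed by a single sigmoid output, the weights are random and untrained, and all function names are hypothetical.

```python
import math
import random

# Assumed reading of Table 3: 7 inputs, two hidden layers of 25 units, 1 output.
LAYER_SIZES = [7, 25, 25, 1]


def init_params(sizes, seed=0):
    """Create small random weights and zero biases for each layer."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        params.append((weights, biases))
    return params


def forward(params, x):
    """Forward pass: ReLU on hidden layers, sigmoid on the output layer."""
    h = list(x)
    last = len(params) - 1
    for i, (weights, biases) in enumerate(params):
        z = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        if i < last:
            h = [max(0.0, v) for v in z]
        else:
            h = [1.0 / (1.0 + math.exp(-v)) for v in z]
    return h[0]


def count_params(params):
    """Total number of trainable weights and biases."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in params)


params = init_params(LAYER_SIZES)
n_params = count_params(params)    # (7*25 + 25) + (25*25 + 25) + (25*1 + 1) = 876
prob = forward(params, [0.0] * 7)  # zero input with zero biases gives sigmoid(0) = 0.5
```

Under this reading the network has 876 trainable parameters, and the sigmoid output can be interpreted as a frost probability to be thresholded into a frost/no-frost forecast.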

Table 4. Results on divergence.

Date	Class	PC	POD	FAR	SR	CSI	BIAS
2017071700	frost	0.55	0.13	0.67	0.33	0.10	0.38
2017071700	no frost	0.55	0.83	0.41	0.59	0.53	1.42
2017071800	frost	0.67	0.00	1.00	0.00	0.00	0.00
2017071800	no frost	0.67	0.67	0.00	1.00	0.67	0.67
2017072000	frost	0.29	0.31	0.60	0.40	0.21	0.77
2017072000	no frost	0.29	0.25	0.82	0.18	0.12	1.38
2017072100	frost	0.50	0.36	0.38	0.63	0.29	0.57
2017072100	no frost	0.50	0.70	0.56	0.44	0.37	1.60
2017072400	frost	0.50	0.00	1.00	0.00	0.00	2.50
2017072400	no frost	0.50	0.58	0.22	0.78	0.50	0.75

Table 5. Results.

Model	Region	CSI	POD	SR	FAR	BIAS
Frost Index	R1	0.68	0.72	0.82	0.18	0.88
TensorFlow	R1	0.60	0.68	0.84	0.16	0.81
Frost Index	R2	0.57	0.69	0.77	0.23	0.90
TensorFlow	R2	0.54	0.71	0.70	0.30	1.01
Frost Index	R3	0.38	0.52	0.58	0.42	0.89
TensorFlow	R3	0.36	0.52	0.53	0.47	0.99
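The values in Table 5 lend themselves to a quick consistency check and a region-by-region comparison. The sketch below transcribes the table and verifies that SR and FAR are complementary, then reports which model has the higher CSI and which has the BIAS closer to 1 in each region. The helper names are hypothetical, and "closer to 1" is used here as a simple proxy for a better-balanced forecast.

```python
# Values transcribed from Table 5, in the order (CSI, POD, SR, FAR, BIAS).
TABLE5 = {
    ("Frost Index", "R1"): (0.68, 0.72, 0.82, 0.18, 0.88),
    ("TensorFlow", "R1"): (0.60, 0.68, 0.84, 0.16, 0.81),
    ("Frost Index", "R2"): (0.57, 0.69, 0.77, 0.23, 0.90),
    ("TensorFlow", "R2"): (0.54, 0.71, 0.70, 0.30, 1.01),
    ("Frost Index", "R3"): (0.38, 0.52, 0.58, 0.42, 0.89),
    ("TensorFlow", "R3"): (0.36, 0.52, 0.53, 0.47, 0.99),
}
REGIONS = ("R1", "R2", "R3")

# By definition SR = 1 - FAR, so each row should satisfy SR + FAR = 1.
sr_far_consistent = all(
    abs(sr + far - 1.0) < 1e-9 for _, _, sr, far, _ in TABLE5.values()
)


def higher_csi(region):
    """Model with the larger CSI in the given region."""
    fi = TABLE5[("Frost Index", region)][0]
    tf = TABLE5[("TensorFlow", region)][0]
    return "TensorFlow" if tf > fi else "Frost Index"


def closer_to_unbiased(region):
    """Model whose BIAS is nearer to the ideal value of 1."""
    fi = abs(TABLE5[("Frost Index", region)][4] - 1.0)
    tf = abs(TABLE5[("TensorFlow", region)][4] - 1.0)
    return "TensorFlow" if tf < fi else "Frost Index"


csi_winner = {r: higher_csi(r) for r in REGIONS}
bias_winner = {r: closer_to_unbiased(r) for r in REGIONS}
```

The two comparisons do not rank the models the same way, which reflects the trade-off between detection and false alarms discussed for the 72 h horizon.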

Table 6. Results on divergence.

Class	Region	PC	POD	FAR	SR	CSI	BIAS
Frost	R1	0.55	0.65	0.40	0.60	0.45	1.08
No Frost	R1	0.55	0.44	0.51	0.49	0.30	0.90
Frost	R2	0.65	0.46	0.42	0.58	0.35	0.80
No Frost	R2	0.65	0.78	0.32	0.68	0.57	1.14
Frost	R3	0.65	0.55	0.54	0.46	0.33	1.21
No Frost	R3	0.65	0.70	0.23	0.77	0.58	0.90

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
