1. Introduction
Lightning discharges pose significant risks to human safety and industrial operations, particularly in open environments such as mining areas. Electrical storms can lead to fatal accidents, infrastructure damage, and costly interruptions in industrial production. Beyond mining, lightning also threatens ports, airports, construction sites, agriculture, wind energy operations, offshore activities, mountain-top stations, and power plants, where unexpected (and sometimes unmonitored) strikes can cause severe operational disruptions. In open-air mining environments, the risk is particularly high because equipment, personnel, and infrastructure are extensively exposed. Mining sites often span large, elevated, and remote areas with limited shelter and make heavy use of tall metallic structures such as drilling rigs, haul trucks, and conveyor belts, all of which are vulnerable to lightning strikes. Sudden storms can halt operations, damage high-value machinery, injure workers, or even cause fatalities, especially when evacuation protocols are delayed or unavailable. Furthermore, blasting operations must be suspended during thunderstorm activity, creating costly delays.
In the eastern Amazon, particularly in the Carajás Mineral Province, Santos et al. [1] have shown that topographic elevation and changes in land cover in mining areas can influence lightning occurrence, further reinforcing the need for localized, high-resolution forecasting in such regions. Accurate nowcasting is therefore essential for minimizing downtime and ensuring worker safety in real-time industrial decisions.
Lightning can also trigger wildfires, as reported by Pineda et al. [2], which emphasizes the need for reliable forecasting systems to mitigate such hazards. The lack of effective forecasting and prediction systems increases risk: missed alerts expose workers and infrastructure to potential damage, while false alarms disrupt operations unnecessarily, leading to economic losses. Developing accurate and reliable lightning prediction models is, therefore, essential for both safety and productivity.
Efforts to improve lightning monitoring and modeling also extend beyond industrial applications. Kovář et al. [3], for example, explored the use of very-low-frequency signals generated by lightning for long-range radio navigation, highlighting the broader applicability of lightning-based data in geolocation systems. Kákona et al. [4] performed mobile ground-based measurements across central Europe using high-speed cameras and radio receivers, offering new insights into lightning development and the limitations of field-based radiation detection. Although distinct in scope, such studies reinforce the scientific and technological relevance of accurately detecting and modeling lightning events across diverse fields.
Additionally, understanding the spatial distribution of lightning activity is essential for improving model performance in regions with distinct convective behavior. Albrecht et al. [5] provided a high-resolution, satellite-derived lightning climatology that identified global hotspots and revealed key geographical influences on lightning occurrence, such as topography and land–water contrasts. Although climatological in nature, these insights support the development of more localized, data-driven nowcasting strategies.
Recent advances in weather radar technology have provided high-resolution meteorological data for nowcasting severe weather events, including lightning. Various data sources, including ground-based detection networks, satellite observations, and numerical weather prediction (NWP) models, have been used for lightning forecasting. Studies have shown, however, that polarimetric weather radar data are particularly useful for analyzing storm microphysics and charge separation processes, which are central to lightning formation [6,7,8,9]. Integrating machine learning (ML) models with weather radar data has shown promising results in improving prediction accuracy by capturing nonlinear relationships between storm characteristics and lightning occurrence. Multi-source data integration has also been shown to enhance forecast performance [10,11], demonstrating that combining radar, satellite, and numerical models can improve prediction accuracy.
Several studies have explored different approaches to lightning forecasting using radar and ML techniques. Abreu et al. [6] demonstrated that increasing reflectivity values in the vertical structure of clouds correlate with lightning activity. Hayashi et al. [7] investigated the relationship between hydrometeor classification and lightning rates using dual-polarization radar, identifying ice-phase hydrometeors as key contributors to storm electrification. Capozzi et al. [12] proposed a multi-parameter approach for cloud-to-ground (CG) lightning detection, demonstrating that quadratic discriminant analysis (QDA) outperformed traditional and single-variable models. More recently, Rombeek, Leinonen, and Hamann [8] highlighted the importance of polarimetric radar variables in nowcasting severe weather hazards, showing that deep learning (DL) architectures can improve lightning predictions. Additionally, studies such as [13,14,15] have explored ML-based approaches using various meteorological inputs, achieving significant improvements in prediction accuracy.
Despite these advancements, there is still a need for optimized methodologies that integrate different radar-based predictive approaches to improve forecasting reliability, particularly in regions with complex meteorological dynamics such as the Brazilian Amazon. These studies did not directly compare alternative feature-engineering strategies on the same dataset to quantify incremental gains, provide a detailed theoretical analysis of why specific radar signatures drive electrification, or assess the operational trade-offs between model complexity and false-alarm rates in real-time systems. As a result, the methodological value and practical applicability of different radar-derived representations remain underexplored.
This study proposes an ML-based lightning prediction approach using polarimetric weather radar data, focused on nowcasting over a mining region in Pará, in the Brazilian Amazon. Unlike previous studies, this research leverages three different approaches for feature extraction and prediction: (i) grouping radar variables into temperature-based layers, (ii) computing descriptive statistics of reflectivity and polarimetric variables at different altitudes, and (iii) applying Principal Component Analysis (PCA) to multi-level radar data and combining multiple models into an ensemble. The main contributions include (i) evaluating different feature engineering strategies for lightning prediction, (ii) optimizing ML models for a high-risk industrial environment, and (iii) integrating the most effective model into the Lightning Early Warning Artificial Intelligence System (LEWAIS), an operational forecasting system [16]. This study aims to contribute to both scientific research and industrial safety, with a view to both worker safety and productivity.
This work is organized as follows:
Section 2 reviews related work on lightning prediction and radar-based modeling;
Section 3 presents the data sources, including weather radar and lightning datasets, briefly describes the physical properties of lightning, and details the three proposed modeling approaches;
Section 4 reports and discusses the results, including performance comparisons and integration with LEWAIS; and finally,
Section 5 provides the conclusions and outlines directions for future research.
2. Related Works
Previous studies have established a relationship between cloud microphysics and lightning occurrence using weather radar data. This is because the electrification process within a thunderstorm is directly linked to the hydrometeors that compose the cloud [6,7,9]. Understanding this relationship is crucial for developing accurate lightning prediction models, particularly those leveraging polarimetric radar data and ML techniques.
Several studies have explored the vertical structure of clouds and its association with lightning. Abreu et al. [6] analyzed the relationship between cloud structure and lightning frequency in northern Brazil using reflectivity profiles from the Tropical Rainfall Measuring Mission satellite radar. Their dataset consisted of reflectivity profiles with 80 vertical levels (one every 250 m), with values ranging from 0 to 80 dBZ. The study found that reflectivity values in the vertical profile increased with lightning frequency, demonstrating a clear connection between reflectivity and lightning occurrence. Similarly, Hayashi et al. [7] used dual-polarization radar data, a hydrometeor classification algorithm, and historical lightning data to investigate the microphysical properties associated with lightning rate in 10 isolated storm cases over the Kanto Plain, Japan. They found that ice particles within the 35 dBZ volume (V35IC) had the highest correlation coefficient and the lowest normalized root mean square error (NRMSE): r = 0.75 and NRMSE = 8.3% for CG lightning, and r = 0.69 and NRMSE = 8.1% for intra-cloud (IC) lightning.
A different approach was taken by Capozzi et al. [12], who developed a multi-parameter method to detect the CG lightning stroke rate in convective cells using a low-cost X-band single-polarization radar. Their findings showed that a QDA-based classification approach outperformed traditional single-parameter methods. QDA also surpassed Fuzzy Logic and Support Vector Machine (SVM)-based models, except on the Heidke Skill Score, for which an SVM with a Gaussian kernel performed best.
Advances in ML for lightning prediction have also brought significant improvements to the field. Rombeek, Leinonen, and Hamann [8] emphasized the importance of polarimetric variables in nowcasting thunderstorm hazards using recurrent-convolutional neural networks. By incorporating hydrometeor characteristics from multiple altitudes, their approach enhanced predictions of precipitation, hail, and lightning activity. This research supports the importance of analyzing radar-derived microphysical features, a key aspect of our layered analysis in Approach 1 and our height-based feature extraction in Approach 2, both described in the next section. ML-based lightning prediction methods have been explored in other regions as well. Mostajabi et al. [13] used ML to predict lightning risk within a 30 km radius around 12 meteorological stations in Switzerland. Their model, based on four meteorological variables (air pressure, temperature, relative humidity, and wind speed), was validated using lightning detection system data. Among the ML models tested, XGBoost produced the best results for lead times of up to 30 min.
Further evidence of ML effectiveness for lightning forecasting comes from Shan et al. [14], who applied several ML models to analyze the relationship between atmospheric radiation measurement data and lightning records from the Earth Networks Total Lightning Network. The study identified key variables influencing lightning formation, with Random Forest (RF) emerging as the best predictor: when convective clouds were detected, RF predicted lightning with 76.9% accuracy and an Area Under the Curve (AUC) of 0.850. Bao et al. [15] designed a deep learning-based lightning prediction system using a Multi-Layer Perceptron and ResNet50, achieving 88.2% accuracy, 92.2% precision, 81.5% recall, and an F1-score of 86.4%. These studies support the use of ML models in our study, particularly RF and XGBoost, which were among the models we evaluated.
More recently, research on lightning nowcasting has combined Doppler radar, NWP, and ML models to enhance forecasting accuracy. Fata et al. [17] explored CG lightning nowcasting by fusing remote sensing and NWP data, integrating geostationary meteorological sensors and Doppler radar. Their results showed that Gaussian Process Regression improved prediction lead times by up to 15 min with higher spatial confidence. Yin et al. [10] developed a model that integrates GNSS-derived precipitable water vapor, weather radar, and satellite data, demonstrating a 20% increase in prediction accuracy over radar-only or satellite-only models. Hosalikar et al. [18] focused on thunderstorms in eastern India, using Doppler weather radars and satellite data to improve CG lightning prediction. Additionally, Cintineo et al. [11] introduced the third version of the ProbSevere model, which integrates radar, lightning, and satellite data with ML for severe weather nowcasting, emphasizing the importance of radar-derived features. Pineda et al. [2] examined lightning-induced wildfires, using radar reflectivity and lightning data to characterize dry thunderstorms. Finally, Kundu et al. [19] conducted a radar-based analysis of severe lightning events in northeast India, offering insights into storm dynamics.
These studies demonstrate the increasing sophistication of lightning prediction methods, driven by advances in weather radar technology, data fusion, and ML techniques. Our study differs from previous works in three key aspects: (i) data representation: whereas most studies use integrated or single-level radar data, we analyze reflectivity and polarimetric variables at multiple altitudes to assess thunderstorm microphysics; (ii) geographic zone: our study is one of the few ML-based studies in the Amazon region, specifically in a mining zone where workers face high exposure to storms, and the region’s intense convection, diverse hydrometeor profiles, and strong electrification [1] distinguish it from previously studied areas; (iii) application and integration: beyond predicting lightning, we integrate the best-performing model into LEWAIS [16] to improve worker safety and operational efficiency. By addressing these aspects, our study contributes to advancing lightning prediction methodologies, particularly in regions with complex meteorological dynamics, while also providing practical applications for early-warning systems.
4. Results and Discussion
This section presents the most promising results obtained from the machine learning approaches for lightning prediction. The weather radar, installed in 2021, has been generating data over the mining area since then, and these data are currently being analyzed by multiple research teams. Accurate forecasting that minimizes both false positives (FP) and false negatives (FN) is crucial for enhancing safety and optimizing productivity: a missed alert poses significant safety risks to workers involved in outdoor activities, whereas a false alert disrupts operations, resulting in economic losses.
Approach 1 uses zh, zdr, kdp, and rhohv data within a 20 km radius of the point of interest, grouped into the warm, mixed 1, mixed 2, and cold layers, as input for training ML models. Because variable means differ across classes, missing values were imputed using the class mean, and StandardScaler was applied for feature scaling. To address class imbalance, RandomUnderSampler was used to balance the training set to 2084 samples per class, while the test set remained imbalanced (88,343 samples for class 0 and 2033 for class 1). The evaluated models included RF, LR, XGBoost, SVM, and EHGB. All models exceeded 97% accuracy, with both precision and recall above 98% for class 0; since class 0 accounts for about 97.8% of the test samples, accuracy mainly reflects performance on that class. The best-performing model was EHGB (best parameters: learning_rate = 0.05, max_depth = 5, max_iter = 200), while LR showed the weakest performance, particularly in recall.
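The preprocessing chain of Approach 1 (class-mean imputation, z-score scaling, random undersampling) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names are hypothetical, `random_undersample` mimics what imbalanced-learn's RandomUnderSampler does with its default strategy (reduce every class to the minority count, without replacement), and `standardize` mirrors StandardScaler fitted on the training set. The EHGB classifier itself is not reproduced.

```python
import numpy as np


def impute_with_class_mean(X, y):
    """Fill each NaN with its column mean computed over samples of the same class.

    The fill value depends on the label, so this only applies where y is known.
    """
    X = np.array(X, dtype=float)  # work on a copy
    y = np.asarray(y)
    for cls in np.unique(y):
        rows = y == cls
        block = X[rows]  # boolean indexing returns a copy
        col_means = np.nanmean(block, axis=0)
        nan_r, nan_c = np.nonzero(np.isnan(block))
        block[nan_r, nan_c] = col_means[nan_c]
        X[rows] = block  # write the filled block back
    return X


def random_undersample(X, y, seed=42):
    """Randomly drop samples until every class has as many as the minority class."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate(
        [rng.choice(np.flatnonzero(y == c), size=n_min, replace=False) for c in classes]
    )
    keep.sort()  # preserve the original sample order
    return np.asarray(X)[keep], y[keep]


def standardize(X_train, X_test):
    """Z-score scaling with statistics fitted on the training set only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # constant features are left unscaled
    return (X_train - mean) / std, (X_test - mean) / std


# Toy example: 5 samples, 2 features, hypothetical values.
X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 10.0], [7.0, 12.0], [5.0, 8.0]])
y = np.array([0, 0, 1, 1, 0])
X_filled = impute_with_class_mean(X, y)
X_bal, y_bal = random_undersample(X_filled, y)
```

In this toy case the missing class-0 value becomes 6.0 (mean of 4.0 and 8.0) and the missing class-1 value becomes 7.0, and the balanced set keeps two samples per class. Note that only the training split would be undersampled; the test split stays imbalanced, as in the text.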
Approach 2 takes a different perspective, computing descriptive statistics (min, mean, max, and std) of radar variables at 3 km, 6 km, and 9 km altitudes. As in the previous approach, only valid radar records were included in the analysis. The best results were obtained with training and test sets created by random splitting (70–30%), with missing values imputed using the median (SimpleImputer). To address class imbalance, SMOTEENN was applied, followed by GridSearchCV (cv = 10 folds) for hyperparameter tuning. The best model in this approach was the DecisionTreeClassifier (ccp_alpha = 0.001, max_depth = 10, random_state = 42), which achieved a recall of 0.88 for class 0 and 0.71 for class 1.
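The feature construction of Approach 2 can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: it assumes that the same four variables as in Approach 1 (zh, zdr, kdp, rhohv) are summarized at each altitude, and that the input is a dictionary of gate values inside the monitored area, with NaN marking gates without a valid echo. `median_impute` reproduces what SimpleImputer(strategy="median") does column by column.

```python
import numpy as np

# Assumption: the Approach 1 variables are also the ones summarized here.
VARIABLES = ("zh", "zdr", "kdp", "rhohv")
ALTITUDES_KM = (3, 6, 9)
STATS = {"min": np.min, "mean": np.mean, "max": np.max, "std": np.std}


def altitude_features(radar):
    """Build min/mean/max/std features for every (variable, altitude) pair.

    `radar` maps (variable, altitude_km) to values inside the monitored area.
    Levels with no valid echo produce NaN features for later imputation.
    """
    features = {}
    for var in VARIABLES:
        for alt in ALTITUDES_KM:
            values = np.asarray(radar.get((var, alt), []), dtype=float)
            valid = values[~np.isnan(values)]
            for name, func in STATS.items():
                key = f"{var}_{alt}km_{name}"
                features[key] = float(func(valid)) if valid.size else np.nan
    return features


def median_impute(rows):
    """Column-wise median fill, analogous to SimpleImputer(strategy="median")."""
    X = np.array(rows, dtype=float)
    medians = np.nanmedian(X, axis=0)
    r, c = np.nonzero(np.isnan(X))
    X[r, c] = medians[c]
    return X


# Hypothetical sample: only two zh levels have data.
sample = {("zh", 3): [20.0, 35.0, np.nan, 45.0], ("zh", 6): [10.0, 30.0]}
feats = altitude_features(sample)
table = median_impute([[1.0, np.nan], [3.0, 5.0], [np.nan, 7.0]])
```

Each sample yields 4 variables x 3 altitudes x 4 statistics = 48 features; `np.std` uses the population standard deviation (ddof = 0), which is one of several reasonable choices.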
In Approach 3, PCA proved to be an effective method for reducing dataset dimensionality while preserving essential radar information. The number of retained components is shown in Figure 2; the selection criterion was to capture over 95% of the total feature variance.
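The 95% variance criterion can be made concrete with a short NumPy sketch: center the data, take the singular values, and count components until the cumulative explained-variance ratio exceeds the threshold. This is an illustrative equivalent of passing a float to scikit-learn's `PCA(n_components=0.95)`, not the authors' code; the toy matrix is hypothetical and chosen so the answer can be checked by hand.

```python
import numpy as np


def n_components_for_variance(X, threshold=0.95):
    """Smallest number of principal components whose cumulative explained
    variance ratio is strictly greater than `threshold`."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)                # PCA operates on centered data
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    explained = s**2 / np.sum(s**2)          # explained variance ratio per component
    cumulative = np.cumsum(explained)
    k = int(np.searchsorted(cumulative, threshold, side="right")) + 1
    return min(k, len(s))


# Three orthogonal directions with variances proportional to 200, 18, and 2.
X = np.array([
    [10.0, 0.0, 0.0], [-10.0, 0.0, 0.0],
    [0.0, 3.0, 0.0], [0.0, -3.0, 0.0],
    [0.0, 0.0, 1.0], [0.0, 0.0, -1.0],
])
k95 = n_components_for_variance(X)        # 200/220 < 0.95 < 218/220, so 2
k90 = n_components_for_variance(X, 0.90)  # 200/220 > 0.90, so 1
```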
This approach is examined in more detail because it produced the best results. Models were trained with LR, DT, RF, and SVM, all optimized with GridSearchCV, and RF and SVM achieved the best results on the recall metric. Recall was considered the most appropriate metric for model selection in this application because it reflects the model’s ability to correctly identify actual lightning events, which is critical in operational warning systems. Using these two algorithms, 60 models were trained, and the top five were selected, with an average recall of 0.87855. These models were then combined into an ensemble to increase generalization and predictive robustness. An ensemble combines the predictions of multiple models to improve overall forecast accuracy; in this way, it can limit the variance and bias errors associated with single ML models. Ensembles can be built in different ways, such as bagging, boosting, and stacking. In this case, the predictions of the top five models were combined into a bagging-style ensemble whose class decision was made by majority vote. Bagging is known for reducing variance without increasing bias, while boosting reduces bias [30]. The central idea is that, by combining the strengths of multiple models, the ensemble can capture different patterns in the data and provide a more robust and accurate forecast.
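The selection and voting steps described above can be sketched in plain Python. In this hypothetical example, model names and recall values are invented, and each fitted RF or SVM member is represented only by its 0/1 predictions on the same samples; training of the 60 candidate models is not reproduced. With five members and binary labels a hard majority vote cannot tie.

```python
def select_top_k(recalls, k=5):
    """Names of the k models with the highest validation recall."""
    return sorted(recalls, key=recalls.get, reverse=True)[:k]


def majority_vote(member_predictions):
    """Hard majority vote over binary (0/1) predictions from an odd number of models."""
    n_models = len(member_predictions)
    if n_models % 2 == 0:
        raise ValueError("use an odd number of members so votes cannot tie")
    # zip(*...) groups the members' votes sample by sample
    return [int(sum(votes) > n_models // 2) for votes in zip(*member_predictions)]


# Hypothetical validation recalls for six candidate models.
recalls = {"rf_a": 0.88, "svm_a": 0.87, "rf_b": 0.90,
           "svm_b": 0.85, "rf_c": 0.86, "svm_c": 0.89}
top5 = select_top_k(recalls)

# Hypothetical member predictions on four samples (1 = lightning).
preds = {
    "rf_b":  [1, 0, 0, 1],
    "svm_c": [1, 0, 1, 1],
    "rf_a":  [0, 0, 1, 1],
    "svm_a": [1, 1, 0, 0],
    "rf_c":  [0, 0, 0, 1],
}
ensemble = majority_vote([preds[name] for name in top5])
```

Here the per-sample vote counts are 3, 1, 2, and 4, so the ensemble outputs [1, 0, 0, 1].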
Applying the ensemble to the 2022 dataset posed a significant challenge due to its large volume (about 75k records). To handle this, the data were split by month: January data were processed by the ensemble first, followed by February data, and so on. This stepwise approach allowed efficient memory management and ensured that all data were analyzed without compromising system integrity. The results for each month are given in Table 6. The ensemble model achieved a recall of 0.944 for class 0 and 0.658 for class 1.
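The month-by-month processing can be sketched with `itertools.groupby`, which yields one calendar month at a time from time-sorted records. The record layout `(timestamp, features, label)` and the 40 dBZ threshold model standing in for the ensemble are hypothetical; when `records` is a lazy iterator (for example, rows streamed from disk), only one month is held in memory at once.

```python
from datetime import datetime
from itertools import groupby


def evaluate_by_month(records, predict):
    """Apply `predict` one calendar month at a time and tally a confusion matrix per month.

    `records` must be sorted by timestamp; each item is (timestamp, features, label).
    """
    results = {}
    for month, group in groupby(records, key=lambda r: (r[0].year, r[0].month)):
        batch = list(group)  # materialize a single month only
        preds = predict([features for _, features, _ in batch])
        counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
        for (_, _, label), pred in zip(batch, preds):
            if pred and label:
                counts["tp"] += 1
            elif pred:
                counts["fp"] += 1
            elif label:
                counts["fn"] += 1
            else:
                counts["tn"] += 1
        results[month] = counts
    return results


def threshold_model(batch):
    """Hypothetical stand-in for the ensemble: alert when reflectivity >= 40 dBZ."""
    return [int(zh >= 40.0) for zh in batch]


records = [
    (datetime(2022, 1, 5, 14, 0), 45.0, 1),
    (datetime(2022, 1, 9, 10, 0), 20.0, 0),
    (datetime(2022, 2, 1, 16, 0), 50.0, 0),
    (datetime(2022, 2, 3, 12, 0), 10.0, 1),
]
monthly = evaluate_by_month(records, threshold_model)
```

Per-month recall for each class then follows from the tallies, e.g. tp / (tp + fn) for class 1.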
The results also vary by month and, on closer analysis, on days with few lightning strikes. A critical observation is that in months with low lightning activity, the model tends to generate a significant number of FP. To investigate misclassifications further, we conducted a case-by-case analysis of the 2022 test samples in which the model missed lightning events, i.e., where the model predicted no lightning but at least one lightning strike was observed within the 20 km radius of the mine, represented by the red circle. A few illustrative examples are shown in Figure 3.
In Figure 3a, only one CG lightning strike (red point) was detected within the 20 km monitored area, and reflectivity values in that region were low. Such events were common where the ensemble model failed: one or very few discharges occurred near the edge of the monitored circle, while radar reflectivity in the central area was low. A similar pattern is seen in Figure 3b, where reflectivity indicates possible convective activity, but this activity lies outside the monitored area. Since the ML model is spatially constrained to the 20 km radius, it does not consider surrounding convection that could still influence the region of interest. This reveals a limitation of the current model in accounting for storm cells developing just outside the monitoring boundary.
On the other hand, in Figure 3c, strong atmospheric activity is clearly observed in both radar reflectivity and lightning detections. However, this activity is centered outside the monitored radius. Due to natural dispersion, a few strikes (CG and IC) occurred within the zone, which led to misclassification by the model. Finally, Figure 3d presents an anomalous case in which strong atmospheric activity was evident in the region, including within the monitored area, but, for unknown reasons, the radar failed to register reflectivity values during that period. This may suggest a radar equipment malfunction or some kind of interference affecting data acquisition.
Overall, in all the FN cases revisited, the number of lightning strikes was very low, typically one or two events, as in the first three cases presented. This suggests that the model is generally effective at associating reflectivity signatures with lightning activity. However, since it only interprets atmospheric data within the 20 km area, it occasionally fails when lightning originates from more distant convective systems but still reaches the target zone.
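The FN cases above hinge on whether a strike falls inside the 20 km circle around the mine. A minimal sketch of how such a binary target could be derived is a great-circle (haversine) distance test; the mine coordinates below are a placeholder, not the actual site, and a spherical Earth of radius 6371 km is assumed. Strikes whose distance is just under 20 km are exactly the edge cases that produced isolated positive labels with little nearby reflectivity.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation
MINE = (-6.0, -50.0)      # hypothetical (lat, lon) placeholder for the point of interest


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def lightning_label(strikes, center=MINE, radius_km=20.0):
    """1 if at least one strike (lat, lon) lies within radius_km of center, else 0."""
    lat0, lon0 = center
    return int(any(haversine_km(lat0, lon0, lat, lon) <= radius_km for lat, lon in strikes))


d_north = haversine_km(-6.0, -50.0, -5.9, -50.0)  # 0.1 degree due north, about 11.1 km
```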
Finally, a comparison of the three approaches is summarized in Table 7, with the best results highlighted in bold. The findings indicate that polarimetric radar data are valuable for lightning prediction, with Approach 3 showing the best generalization capability.
These results confirm that Approach 3 (zh at 18 levels with PCA) provides the most reliable generalization, while Approach 1 (grouping by layers) prioritizes recall at the expense of precision. Approach 2 (descriptive statistics by height) proved efficient in capturing storm microphysics, with competitive recall. Future studies should refine feature extraction, hyperparameter tuning, and ensemble learning techniques to further improve prediction accuracy.
4.1. Integration with an Existing System
LEWAIS was recently proposed in [16] as an operational lightning warning system for the same area analyzed in this study (referred to as “P2”). The model divides the region into predefined quadrants (the spatial monitoring areas around the target area) and applies a two-step grid search to optimize conflicting operational goals: minimizing the false alarm rate (FAR), failure-to-warn (FTW), and operational downtime, while maximizing lead time. The Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) multicriteria method was used to rank the solutions. Historical lightning discharge data are used to fine-tune alert issuance strategies that account for both safety and productivity.
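To illustrate how TOPSIS ranks candidate alert strategies against the four criteria named above, here is a sketch of the classical formulation (vector normalization, weighting, distances to the ideal and anti-ideal solutions). The candidate scores and the equal weights are hypothetical; the weights and normalization actually used in [16] are not given here.

```python
import numpy as np


def topsis(scores, weights, benefit):
    """Closeness coefficients in [0, 1] for each alternative (higher is better).

    scores  : (n_alternatives, n_criteria) matrix
    weights : relative importance of each criterion
    benefit : True for criteria to maximize (lead time), False for costs (FAR, FTW, downtime)
    """
    M = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float) / np.sum(weights)
    is_benefit = np.asarray(benefit, dtype=bool)
    V = M / np.linalg.norm(M, axis=0) * w  # vector-normalize each column, then weight
    ideal = np.where(is_benefit, V.max(axis=0), V.min(axis=0))
    anti_ideal = np.where(is_benefit, V.min(axis=0), V.max(axis=0))
    d_best = np.linalg.norm(V - ideal, axis=1)
    d_worst = np.linalg.norm(V - anti_ideal, axis=1)
    return d_worst / (d_best + d_worst)


# Hypothetical strategies; columns are FAR, FTW, downtime fraction, lead time (min).
candidates = np.array([
    [0.2, 0.1, 0.05, 30.0],
    [0.4, 0.2, 0.10, 20.0],
    [0.6, 0.3, 0.15, 10.0],
])
closeness = topsis(candidates, weights=[1, 1, 1, 1], benefit=[False, False, False, True])
ranking = np.argsort(-closeness)  # indices of strategies, best first
```

The first strategy is best on every criterion, so it coincides with the ideal point and scores 1.0; the last coincides with the anti-ideal and scores 0.0.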
To use these metrics, we establish a connection between ML metrics and operational performance. FP contributes to FAR, as both represent instances where an unnecessary warning is issued. FN contributes to FTW, as both indicate missed lightning events. Higher precision reduces FAR by improving the trustworthiness of alerts, while recall is inversely related to FTW, as higher recall means fewer missed warnings. The F1-score balances these trade-offs, making it a useful metric for selecting the most suitable model for integration.
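When FAR and FTW are computed from the same contingency table as precision and recall, the relationships described above become exact complements: FAR = 1 - precision and FTW = 1 - recall. The sketch below makes this explicit; the counts are hypothetical. In LEWAIS the table is built from alerts and lightning events, so sample-level ML scores and alert-level rates need not coincide exactly, and sharing one table is an assumption of this illustration.

```python
def warning_scores(tp, fp, fn):
    """ML scores and their operational counterparts from one contingency table.

    tp: alerts followed by lightning, fp: false alerts, fn: missed lightning events.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "far": fp / (tp + fp),  # false alerts among all alerts issued = 1 - precision
        "ftw": fn / (tp + fn),  # missed events among all observed events = 1 - recall
    }


scores = warning_scores(tp=6, fp=2, fn=4)  # hypothetical counts
```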
To evaluate performance quantitatively, LEWAIS relies on a contingency table comparing system alerts with actual lightning events. The following metrics are derived from it: FAR, given by Equation (5), measures the proportion of false alerts among all alerts issued, and FTW, given by Equation (6), measures the proportion of missed events among all true lightning events. Operational downtime is the proportion of time (in hours) during which operations are suspended due to active alerts, relative to the total period. Lead time is the average time (in minutes) between alert issuance and the first subsequent lightning event, computed for correctly predicted cases.
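Downtime and lead time, as defined above, can be computed from alert intervals and strike timestamps. In this sketch, alerts are assumed to be non-overlapping `(start, end)` intervals, and a "correctly predicted case" is assumed to be an alert with at least one strike inside its window; both are interpretive assumptions, and all timestamps are hypothetical.

```python
from datetime import datetime


def downtime_fraction(alerts, period_start, period_end):
    """Fraction of the period covered by alert intervals (assumed non-overlapping)."""
    total = (period_end - period_start).total_seconds()
    covered = sum(
        (min(end, period_end) - max(start, period_start)).total_seconds()
        for start, end in alerts
        if end > period_start and start < period_end
    )
    return covered / total


def mean_lead_time_min(alerts, strikes):
    """Mean minutes from alert start to the first strike inside that alert.

    Alerts with no strike during their window (false alarms) are excluded.
    """
    leads = []
    for start, end in alerts:
        hits = [t for t in strikes if start <= t <= end]
        if hits:
            leads.append((min(hits) - start).total_seconds() / 60.0)
    return sum(leads) / len(leads) if leads else float("nan")


day = datetime(2022, 1, 1)
alerts = [
    (day.replace(hour=10), day.replace(hour=11)),
    (day.replace(hour=15), day.replace(hour=16, minute=30)),  # false alarm
    (day.replace(hour=18), day.replace(hour=18, minute=30)),
]
strikes = [day.replace(hour=10, minute=20),
           day.replace(hour=10, minute=40),
           day.replace(hour=18, minute=10)]
downtime = downtime_fraction(alerts, day, datetime(2022, 1, 2))  # 3 h of 24 h
lead = mean_lead_time_min(alerts, strikes)                       # mean of 20 and 10 min
```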
To integrate our radar-based Ensemble ML model into this framework, we replicated the original LEWAIS metrics for the year 2022 using lightning data from INPE and defined this as the baseline (Model 1). We then evaluated three integration strategies:
Model 2 (Conditional Ensemble): uses the ensemble model when radar data are available; otherwise, it defaults to LEWAIS;
Model 3 (Combined Alerts): triggers an alert if either LEWAIS or the ensemble generates one;
Model 4 (Ensemble Priority): prioritizes ensemble alerts; if only one of the two models triggers an alert, the ensemble’s decision is used.
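The three integration rules can be written as small decision functions over the two systems' outputs. In this sketch, `ensemble` is `None` when radar data are unavailable. The behavior of Model 4 without radar data is not specified in the text; the sketch assumes it does not fall back to LEWAIS (which keeps it distinct from Model 2), so under this reading Model 4 reduces to following the ensemble. That assumption is labeled in the code.

```python
from typing import Optional


def model2_conditional(lewais: bool, ensemble: Optional[bool]) -> bool:
    """Use the ensemble when radar data exist (ensemble is not None); else LEWAIS."""
    return lewais if ensemble is None else ensemble


def model3_combined(lewais: bool, ensemble: Optional[bool]) -> bool:
    """Alert if either system alerts; a missing ensemble output counts as no alert."""
    return lewais or bool(ensemble)


def model4_priority(lewais: bool, ensemble: Optional[bool]) -> bool:
    """When the systems disagree, follow the ensemble.

    Assumption: no fallback to LEWAIS when radar data are missing, so the decision
    reduces to the ensemble output; `lewais` is kept only for a uniform signature.
    """
    return bool(ensemble)


# Truth-table spot checks: (LEWAIS alert, ensemble alert or None if radar missing).
case_no_radar = (True, None)
case_disagree = (False, True)
```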
The results obtained for LEWAIS and the three integrated models are presented in Table 8, with the best results for each metric highlighted in bold.
Among the four models, Model 3 offered the lowest FTW and the longest lead time, making it the most appropriate for safety-critical environments such as mining operations. In contrast, Model 4 achieved the lowest FAR and downtime, which may benefit productivity, but its high FTW makes it less suitable when worker safety is the main concern. Model 2 provided moderate improvements in FTW but still had a high FAR. Overall, the results show that integrating radar-based ML predictions with existing systems such as LEWAIS can improve performance, but the choice of integration strategy must be aligned with operational priorities, balancing safety and efficiency.
4.2. Comparison with Literature
The results of this study confirm the effectiveness of ML models combined with polarimetric radar data for lightning prediction, especially in a high-risk region like Pará, Brazil. When compared with related works, our approach demonstrates both consistency with previous findings and relevant methodological advances.
Abreu et al. [6] identified a link between vertical reflectivity profiles and lightning frequency. Our Approach 2, which uses descriptive statistics at key altitudes, builds on this idea and achieved a recall of 0.71 for class 1 with a DT model, reinforcing the importance of altitude-based features. Hayashi et al. [7] showed that ice-phase hydrometeors identified by dual-polarization radar are strongly correlated with lightning. This supports our Approach 1, which grouped variables into different layers; its best model, EHGB, achieved a recall of 0.97, confirming the value of microphysical layer analysis. Capozzi et al. [12] demonstrated that multi-parameter classification outperforms traditional methods. Similarly, our Approach 3, using PCA and the ensemble model, achieved an average recall of 87.9% across the selected models, showing the benefit of dimensionality reduction and model combination. Additionally, Rombeek, Leinonen, and Hamann [8] focused on DL with polarimetric data across altitudes. Approach 3 follows a similar strategy by using 18 radar levels and shows good generalization through ensemble learning, even without DL architectures. Yin et al. [10] and Cintineo et al. [11] emphasized the benefit of fusing radar, satellite, and NWP data. Although our models rely solely on radar data, the results demonstrate that properly processed radar features can yield good lightning predictions. Finally, Shan et al. [14] used statistical summaries of atmospheric data to predict lightning, achieving high accuracy. Our Approach 2 applies a similar analysis to radar inputs with similar results, supporting the use of summarized features at different atmospheric levels.
In summary, this work aligns with and extends prior research by proposing three radar-based ML approaches, demonstrating good recall and generalization, and offering practical solutions for nowcasting in industrial contexts.
4.3. Sensitivity Analysis
A sensitivity analysis was conducted to evaluate how different preprocessing techniques, feature selection methods, and hyperparameter optimization strategies affected model performance in each approach.
In Approach 1, imputing missing values with the class mean was more effective than other strategies, as it preserved class-specific characteristics. StandardScaler normalization was essential for models sensitive to feature scale, such as SVM and LR. Class imbalance was addressed with RandomUnderSampler, which improved recall for the minority class (lightning at the target location) but reduced the training data volume. Grid search cross-validation and Boruta feature selection did not yield any significant improvement. In fact, while SVM with grid search improved precision, it also reduced recall, highlighting the trade-offs involved in optimizing specific metrics, as shown in [16]. In Approach 2, SMOTEENN outperformed undersampling by retaining more relevant samples. Median imputation with SimpleImputer provided robustness against outliers. The DT model achieved better results without the need for advanced feature selection (DTs also provide a feature importance ranking). This suggests that the altitude-based statistical features were informative, as also reported in the literature; see Abreu et al. [6]. In Approach 3, sensitivity centered on the number of principal components. Using 300 components captured nearly all variance without performance loss. The ensemble of the top five models significantly improved generalization and reduced variance, confirming the benefit of model aggregation.
Overall, while preprocessing techniques such as scaling and balancing were important, the choice of feature representation (by layer, altitude, or vertical profile) had the greatest impact on predictive performance. These findings highlight the importance of adapting the ML modeling pipeline to the specific characteristics of meteorological data.
5. Conclusions
This study presented and evaluated different machine learning-based approaches for short-term lightning prediction using dual-polarization weather radar data over a mining region in the eastern Amazon. It addressed a critical challenge for industrial operations exposed to open-air weather hazards: predicting lightning accurately enough to protect workers while minimizing unnecessary operational interruptions.
We explored three approaches to feature representation: (i) temperature-based atmospheric layers, (ii) altitude-based statistical summaries, and (iii) full vertical radar profiles processed via PCA. Among them, the PCA-based ensemble (Approach 3) showed the best generalization, while the layer-based method (Approach 1) delivered the highest recall, making it well suited to maximizing detection. The altitude-based statistical model (Approach 2) offered a lightweight yet effective alternative. These findings are consistent with other works highlighting the relevance of feature engineering in radar-based lightning prediction and the trade-offs between recall, precision, and model robustness.
Additionally, the study tested the integration of the ensemble model with the operational LEWAIS system, a quadrant-based lightning prediction technique. Among the four configurations evaluated (the LEWAIS baseline and three integration strategies), the one that triggered alerts when either LEWAIS or the ML ensemble predicted lightning (Model 3) yielded the lowest FTW rate and the longest lead time, achieving the best balance for operational safety. In contrast, prioritizing the ensemble model (Model 4) reduced FAR and operational downtime, a potential advantage in production-driven contexts, albeit at the expense of higher miss rates.
These results demonstrate the potential of radar-based ML models not only for improving forecasting performance but also for supporting decision making in operational systems. The integration of data science techniques into industrial safety frameworks, such as lightning alert systems, may offer an important path to reduce risks and optimize processes.
Despite the promising results presented in this study, some limitations should be acknowledged. First, the models were trained and tested exclusively with data from a single X-band radar installed in a specific region of the eastern Amazon. Moreover, the scanning strategy of this radar was not optimized for detecting storm severity, which may have negatively affected the results. These geographic and instrumental restrictions may limit the generalization of the models to other regions with different microphysical characteristics or convective regimes. Furthermore, the study did not consider integrating satellite data or numerical models, which could complement the radar information and improve the robustness of the forecasts.
It is also important to note that although the class balancing strategies reduced the influence of imbalance, lightning occurrence is naturally sparse, which may affect performance in extreme operational scenarios. Finally, the integrated warning system still depends on technological infrastructure and real-time data that may not be available in all industrial operations. These factors should be considered in future applications and extensions of the model. Future work can therefore explore data fusion with satellite or NWP inputs, real-time deployment, and DL architectures, aiming at greater scalability and operational intelligence.