Improving Weather Forecasting in Remote Regions Through Machine Learning
Abstract
1. Introduction
2. Literature Review
2.1. Weather Prediction Using Numerical Weather Prediction (NWP) Methods
2.2. Weather Prediction Using Machine Learning Techniques
2.3. Weather Prediction Using Deep Learning Techniques
2.4. Weather Prediction Using Large Language Models (LLMs)
2.5. Weather Prediction Using Generative AI
3. Methodology
3.1. Data Collection
Key Components of This Dataset Include
- Basic Weather Metrics: Temperature, humidity, pressure, wind direction, and wind speed, serving as the foundation of our weather forecasting model.
- Rain Indicator: A binary variable indicating the presence of rain, derived from detailed weather descriptions, to facilitate precise precipitation forecasting.
- Geographical and Environmental Features: Data on cities’ latitude, longitude, elevation, and their proximity to rivers, lakes, oceans, and deserts. These factors are crucial for understanding the local climate’s unique characteristics.
- City Attributes: Information on land cover, prevailing winds, and ocean currents, providing insights into the environmental and climatic influences on weather patterns.
- Socio-Environmental Indicators: Nearness to rivers, lakes, oceans, and deserts, alongside detailed descriptors of water body sizes and environmental settings, offering a nuanced view of each city’s susceptibility to various weather phenomena.
- Average Annual Temperature (°C): Providing insights into the typical temperature patterns experienced by each city throughout the year, influencing local climate dynamics and seasonal variations.
- Average Annual Rainfall (mm): Describing the amount of precipitation received by each city annually, a crucial factor in understanding its susceptibility to droughts, floods, and other weather-related phenomena.
- Humidity Levels (%): Reflecting the moisture content in the air, humidity levels play a vital role in determining the comfort level, as well as influencing precipitation and atmospheric stability.
- Population Density (people/km²): Highlighting the concentration of inhabitants within each city’s geographical area, population density serves as a proxy for urbanization, resource demand, and infrastructure resilience.
- Industrial Activity: Assessing the extent of industrialization and manufacturing activities within city limits, which can contribute to air pollution, heat island effects, and local microclimates.
- Average Air Quality Index (AQI): Quantifying the overall air quality based on pollutant levels, the AQI serves as a critical indicator of environmental health and public well-being, influencing respiratory health and weather patterns.
- Historical Weather Extremes: Documenting past instances of extreme weather events, including heatwaves, cyclones, and heavy rainfall, to gauge each city’s vulnerability and resilience to climate-related hazards.
- Solar Radiation (kWh/m²/day): Estimating the solar energy potential available in each city, solar radiation data informs renewable energy planning and sustainable development initiatives.
- Nearness to River/Lake/Ocean: Indicating the proximity of each city to significant water bodies, which can influence local weather patterns, especially during monsoon seasons.
- Size of River/Lake/Ocean: Providing information on the scale of nearby water bodies, which may affect humidity levels and precipitation rates.
- Nearness to Desert: Highlighting cities located near arid regions, which experience distinct weather phenomena and temperature extremes.
- Elevation in Meters: Reflecting the altitude of each city above sea level, which can impact temperature variations and atmospheric pressure.
- Land Cover and Vegetation: Describing the dominant land cover types in and around each city, influencing local microclimates and heat absorption.
- Prevailing Winds and Ocean Currents: Identifying prevailing wind directions and ocean currents that influence weather patterns, particularly along coastal regions.
3.2. Incorporating New Features for Enhanced Predictions
- Average Annual Temperature (°C)
- Average Annual Rainfall (mm)
- Humidity Levels (%)
- Population Density (people/km²)
- Industrial Activity
- Average Air Quality Index (AQI)
- Historical Weather Extremes
- Solar Radiation (kWh/m²/day)
3.3. Data Cleaning and Pre-Processing
- Loading Datasets: Utilizing the pandas library in Python, we imported the relevant datasets containing weather variables such as humidity, temperature, pressure, wind direction, and wind speed, along with additional city-specific features.
- Creating Target Variable: We created a binary target variable indicating the presence or absence of rain based on the weather description column. This involved labeling instances where the weather description contained keywords indicative of rain.
- Merging Datasets: We merged the weather description rain column with temperature, humidity, and pressure data for each city, consolidating the relevant variables into a single dataset.
- Handling Missing Values: To ensure the integrity of the dataset, any instances with missing values were dropped, mitigating potential biases in the analysis.
- Feature Scaling: As part of preprocessing, we standardized the numerical features using standard scaling to bring them to a common scale, preventing any feature from dominating the others during model training.
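The pre-processing steps listed above can be sketched in a few lines. The example below is a dependency-free illustration rather than the pipeline used in the study, which relied on pandas. The keyword list, the column names, and the sample rows are assumptions made for illustration only; the paper does not enumerate the exact keywords used to derive the rain indicator.

```python
from statistics import mean, pstdev

# Hypothetical keyword list; the exact terms used in the study are not stated.
RAIN_KEYWORDS = ("rain", "drizzle", "thunderstorm")


def rain_label(description):
    """Return 1 if the weather description mentions rain, else 0."""
    text = description.lower()
    return int(any(word in text for word in RAIN_KEYWORDS))


def preprocess(rows, numeric_cols):
    """Drop incomplete rows, derive the rain target, standardize numeric columns."""
    # Drop any row with a missing value in a required column.
    required = list(numeric_cols) + ["description"]
    complete = [r for r in rows if all(r.get(c) is not None for c in required)]

    # Binary target derived from the free-text weather description.
    labels = [rain_label(r["description"]) for r in complete]

    # Standard scaling per column: z = (x - mean) / population std.
    scaled = [dict() for _ in complete]
    for col in numeric_cols:
        values = [r[col] for r in complete]
        mu, sigma = mean(values), pstdev(values)
        for out, x in zip(scaled, values):
            out[col] = (x - mu) / sigma if sigma else 0.0
    return scaled, labels


# Illustrative rows only (not taken from the dataset).
rows = [
    {"temperature": 30.0, "humidity": 90.0, "pressure": 1002.0, "description": "light rain"},
    {"temperature": 34.0, "humidity": 40.0, "pressure": 1012.0, "description": "sky is clear"},
    {"temperature": None, "humidity": 75.0, "pressure": 1008.0, "description": "mist"},
    {"temperature": 32.0, "humidity": 65.0, "pressure": 1007.0, "description": "heavy intensity rain"},
]
features, target = preprocess(rows, ["temperature", "humidity", "pressure"])
```

The row with a missing temperature is discarded before scaling, so the mean and standard deviation are computed only from complete records.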
3.4. Model Development
- Rule-Based System (Traditional Programming): We initially explored rule-based systems using predefined thresholds and logic to predict rainfall probabilities based on individual weather parameters. While informative, these systems lacked the complexity to capture nuanced relationships between variables.
- Machine Learning (ML): We then transitioned to ML algorithms [52], such as logistic regression and decision trees, which enabled us to analyze historical weather data and make probabilistic rainfall predictions. By considering multiple variables simultaneously, ML models provided more comprehensive insights into rainfall patterns.
- Deep Learning: In the final phase, we leveraged deep learning techniques, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs), to further refine our predictive models. Deep learning excels at capturing complex patterns in large datasets, allowing us to generate more accurate and localized rainfall forecasts for Indian cities.
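To make the rule-based baseline concrete, the following minimal sketch combines a humidity rule, a pressure rule, and a weather description rule. All thresholds, the keyword list, and the two-out-of-three voting scheme are hypothetical placeholders; the paper does not report the values or the combination logic it used.

```python
# All values below are hypothetical placeholders, not values from the paper.
HUMIDITY_THRESHOLD = 80.0    # percent
PRESSURE_THRESHOLD = 1005.0  # hPa
RAIN_KEYWORDS = ("rain", "drizzle", "thunderstorm")


def rule_based_rain(humidity, pressure, description):
    """Predict rain (True/False) when at least two of the three rules fire."""
    high_humidity = humidity >= HUMIDITY_THRESHOLD
    low_pressure = pressure <= PRESSURE_THRESHOLD
    rainy_text = any(k in description.lower() for k in RAIN_KEYWORDS)
    # Assumed combination logic: simple majority vote over the three rules.
    votes = sum([high_humidity, low_pressure, rainy_text])
    return votes >= 2


print(rule_based_rain(90.0, 1000.0, "overcast clouds"))
print(rule_based_rain(50.0, 1015.0, "light rain"))
```

Because each rule inspects a single variable against a fixed cut-off, a system of this kind cannot represent interactions between variables, which is the limitation noted above.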
3.5. Feature Enhancement
3.6. Performance Evaluation
4. Results and Analysis
4.1. Rule-Based System (Traditional Programming)
4.1.1. High Humidity Rule
4.1.2. Low Pressure Rule
4.1.3. Weather Description Rule
4.2. Machine Learning
4.2.1. Data Pre-Processing
4.2.2. Standardization of Numerical Features
4.2.3. One-Hot Encoding of Categorical Features
4.2.4. Logistic Regression Model
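As an illustration of how a logistic regression classifier produces probabilistic rain predictions from standardized features, the sketch below fits the model by batch gradient descent on the log-loss. It is not the implementation used in the study, where a library such as scikit-learn would typically be used. The learning rate, the number of epochs, and the toy data are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample: p(rain | x) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(X, w, b):
    """Return the predicted rain probability for each feature vector."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


# Toy standardized features [humidity_z, pressure_z]; label 1 = rain.
X = [[1.2, -1.0], [0.8, -0.5], [1.0, -1.3], [-0.9, 0.7], [-1.1, 1.2], [-0.6, 0.9]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
probs = predict_proba(X, w, b)
```

On this toy data the fitted weight for humidity is positive and the weight for pressure is negative, so higher humidity and lower pressure both raise the predicted rain probability.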
4.2.5. Model Performance Metrics
4.2.6. Receiver Operating Characteristic (ROC) Curve Analysis
4.3. Deep Learning
4.3.1. Evaluating Deep Learning Effectiveness
4.3.2. Data Pre-Processing
4.3.3. Standardization of Numerical Features
4.3.4. Ordinal Encoding for Ordinal Features
4.3.5. Model Equation
4.3.6. DL Model: Neural Network
Data Pre-Processing
Neural Network Architecture
Model Training
Performance Insights
- Precision for ‘Rain’: The model achieved a precision of 40%, meaning that 40% of the instances it predicted as rain were actual rain events.
- Recall for ‘Rain’: The model exhibited a recall of 68%, indicating it successfully detected 68% of all actual rain events.
- F1-Score for ‘Rain’: Balancing precision and recall, the model achieved an F1-score of 50%.
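The relationship between these three figures can be checked directly, since the F1-score is the harmonic mean of precision and recall. The short sketch below computes the metrics from confusion-matrix counts. The counts are hypothetical, chosen only so that precision and recall match the reported 40% and 68%; they are not the study's actual counts.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 for the positive ('Rain') class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts that reproduce the reported rates:
# precision = 68 / (68 + 102) = 0.40, recall = 68 / (68 + 32) = 0.68.
tp, fp, fn = 68, 102, 32
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With precision 0.40 and recall 0.68, the harmonic mean is 2 × 0.40 × 0.68 / 1.08 ≈ 0.50, consistent with the reported F1-score.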
Model Performance Analysis
- Confusion Matrix: The deep learning model’s confusion matrix (see Figure 3a) reveals a significant improvement in identifying true positives for rain events compared to previous models. While the model demonstrates high accuracy, the occurrence of false positives (instances predicted as rain that did not actually result in rainfall) warrants further attention and optimization.
- ROC Curve: The model’s ROC curve (see Figure 2b) achieved an AUC of 0.86, illustrating an excellent ability to discriminate between rainy and non-rainy days. This superior AUC value, relative to the machine learning model, underscores the deep learning model’s refined prediction capabilities.
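For readers less familiar with these diagnostics, the sketch below shows how a binary confusion matrix and the area under the ROC curve can be computed from labels and predicted probabilities. The AUC is calculated through its rank interpretation: the probability that a randomly chosen rainy instance receives a higher score than a randomly chosen non-rainy one. The labels, scores, and the 0.5 decision threshold are illustrative assumptions and do not come from the study.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 = rain."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative labels and predicted rain probabilities (not study data).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.6, 0.7, 0.3, 0.2, 0.4, 0.55, 0.1]
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Unlike the confusion matrix, the AUC does not depend on the decision threshold, which is why a model can show a high AUC while still producing many false positives at a particular threshold.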
Model Training Dynamics
- Training vs. Validation Loss and Accuracy: Throughout the training epochs, we observed the model’s learning progression through its decreasing loss and stabilizing accuracy (see Figure 3c,d). Despite fluctuations in the validation metrics, indicative of the model’s responsiveness to the data’s complexity, there was a notable convergence in training loss, suggesting effective learning.
Model Predictive Performance
Precision and Recall Trade-Off
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Baer, F. Numerical weather prediction. Adv. Comput. 2000, 52, 91–157.
- Brotzge, J.A.; Berchoff, D.; Carlis, D.L.; Carr, F.H.; Carr, R.H.; Gerth, J.J.; Gross, B.D.; Hamill, T.M.; Haupt, S.E.; Jacobs, N.; et al. Challenges and Opportunities in Numerical Weather Prediction. Bull. Am. Meteorol. Soc. 2023, 104, E698–E705.
- Rasp, S.; Dueben, P.D.; Scher, S.; Weyn, J.A.; Mouatadid, S.; Thuerey, N. WeatherBench: A Benchmark Data Set for Data-Driven Weather Forecasting. J. Adv. Model Earth Syst. 2020, 12, 203.
- Cressman, G.P. Numerical Weather Prediction in daily use. Sci. New Ser. 1965, 148, 319–327.
- Robertson, D.E.; Shrestha, D.L.; Wang, Q.J. Post-processing rainfall forecasts from numerical weather prediction models for short-term streamflow forecasting. Hydrol. Earth Syst. Sci. 2013, 17, 3587–3603.
- Foemko, A.A. Numerical Method for Weather Forecasting Problem. EOLSS. 2022. Available online: https://www.eolss.net/sample-chapters/C02/E6-04-05-05.pdf (accessed on 10 October 2024).
- Hollingsworth, A.; Viterbo, P.; Simmons, A.J. The Relevance of Numerical Weather Prediction for Forecasting Natural Hazards and for Monitoring the Global Environment. 2002. Available online: https://www.ecmwf.int/sites/default/files/elibrary/2002/9989-relevance-numerical-weather-prediction-forecasting-natural-hazards-and-monitoring-global.pdf (accessed on 10 October 2024).
- Basu, A.; Halder, A. Importance of Numerical Weather Prediction in Variable Renewable Energy Forecast. 2017. Available online: https://regridintegrationindia.org/wp-content/uploads/sites/3/2017/09/6B_5_GIZ17_164_paper_Abhijit_Basu.pdf (accessed on 10 October 2024).
- Schulze, G.C. Atmospheric observations and numerical weather prediction. S. Afr. J. Sci. 2007, 103, 318–323.
- Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Available online: https://christophm.github.io/interpretable-ml-book/ (accessed on 10 October 2024).
- Bochenek, B.; Ustrnul, Z. Machine Learning in Weather Prediction and Climate Analyses—Applications and Perspectives. Atmosphere 2022, 13, 180.
- Tandon, S.; Patel, A.; Singh, P.K. Weather Prediction Using Machine Learning. 2021. Available online: https://ssrn.com/abstract=3836085 (accessed on 12 February 2021).
- Singh, S.; Kaushik, M.; Gupta, A.; Malviya, A.K. Weather Forecasting Using Machine Learning Techniques. 2019. Available online: https://ssrn.com/abstract=3350281 (accessed on 11 March 2019).
- Haque, S.; Eberhart, Z.; Bansal, A.; McMillan, C. Semantic Similarity Metrics for Evaluating Source Code Summarization. In Proceedings of the IEEE International Conference on Program Comprehension, IEEE Computer Society, Virtual, 16–17 May 2022; pp. 36–47.
- Jayasingh, S.K.; Mantri, J.K.; Pradhan, S. Smart Weather Prediction Using Machine Learning; Lecture Notes in Networks and Systems; Springer Science and Business Media Deutschland GmbH: Berlin/Heidelberg, Germany, 2022; pp. 571–583.
- Hemalatha, G.; Rao, K.S.; Kumar, D.A. Weather Prediction using Advanced Machine Learning Techniques. Phys. Conf. Ser. 2021, 2089, 012059.
- Hanoon, M.S.; Ahmed, A.N.; Zaini, N.A.; Razzaq, A.; Kumar, P.; Sherif, M.; Sefelnasr, A.; El-Shafie, A. Developing machine learning algorithms for meteorological temperature and humidity forecasting at Terengganu state in Malaysia. Sci. Rep. 2021, 11, 18935.
- Oshodi, I. Machine learning-based algorithms for weather forecasting. Int. J. Artif. Intell. Mach. Learn 2022, 2, 12–20.
- Fowdur, T.P.; Nassir-Ud-Diin, I.N.R.M. A real-time collaborative machine learning based weather forecasting system with multiple predictor locations. Array 2022, 14, 100153.
- Rahman, M.S.; Tumpa, F.A.; Islam, M.S.; Arabi, A.A.; Hossain, M.S.B.; Haque, M.S.U. Comparative Evaluation of Weather Forecasting using Machine Learning Models. arXiv 2024, arXiv:2402.01206.
- Brownlee, J. Deep Learning for Time Series Forecasting: Predict the Future with MLPs, CNNs and LSTMs in Python; Machine Learning Mastery: Melbourne, Australia, 2018.
- Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice; OTexts: Clayton, Australia, 2018.
- Zenkner, G.; Navarro-Martinez, S. A flexible and lightweight deep learning weather forecasting model. Appl. Intell. 2023, 53, 24991–25002.
- Kumar, A.; Sharma, S. Deep Learning Approaches for Weather Forecasting: A Comprehensive Review. Int. J. Comput. Appl. 2020, 177, 20–26.
- Gong, B.; Langguth, M.; Ji, Y.; Mozaffari, A.; Stadtler, S.; Mache, K.; Schultz, M.G. Temperature forecasting by deep learning methods. Geosci. Model Dev. 2022, 15, 8931–8956.
- Schultz, M.G.; Betancourt, C.; Gong, B.; Kleinert, F.; Langguth, M.; Leufen, L.H.; Stadtler, A.M.S. Can deep learning beat numerical weather prediction? Philos. Trans. R. Soc. Math. Phys. Eng. Sci. 2021, 379, 2194.
- Ren, X.; Li, X.; Ren, K.; Song, J.; Xu, Z.; Deng, K.; Wang, X. Deep Learning-Based Weather Prediction: A Survey. Big Data Res. 2021, 23, 100178.
- Espeholt, L.; Agrawal, S.; Sønderby, C.; Kumar, M.; Heek, J.; Bromberg, C.; Gazen, C.; Carver, R.; Andrychowicz, M.; Kalchbrenner, N.; et al. Deep learning for twelve hour precipitation forecasts. Nat. Commun. 2022, 13, 1–10.
- Weyn, J.A.; Durran, D.R.; Caruana, R. Improving Data-Driven Global Weather Prediction Using Deep Convolutional Neural Networks on a Cubed Sphere. J. Adv. Model Earth Syst. 2020, 12, 2109.
- Universitas Indonesia. In Proceedings of the ICACSIS 2015:2015 International Conference on Advanced Computer Science and Information Systems, Depok, Indonesia, 10–11 October 2015.
- Roy, D.S. Forecasting the Air Temperature at a Weather Station Using Deep Neural Networks. In Procedia Computer Science; Elsevier: Amsterdam, The Netherlands, 2020; pp. 38–46.
- Hewage, P.; Trovati, M.; Pereira, E.; Behera, A. Deep learning-based effective fine-grained weather forecasting model. Pattern Anal. Appl. 2021, 24, 343–366.
- Grönquist, P.; Yao, C.; Ben-Nun, T.; Dryden, N.; Dueben, P.; Li, S.; Hoefler, T. Deep learning for post-processing ensemble weather forecasts. Philos. Trans. R. Soc. 2021, 379, 20200092.
- Li, X.; Zhang, Y. Large Language Models in Weather Forecasting: Opportunities and Challenges. J. Artif. Intell. Weather. 2021, 38, 450–465.
- Suthar, T.; Shah, T.; Raja, M.K.; Raha, S.; Kumar, A.; Ponnusamy, M. Predicting Weather Forecast Uncertainty based on Large Ensemble of Deep Learning Approach. In Proceedings of the International Conference on Self Sustainable Artificial Intelligence Systems, ICSSAS 2023—Proceedings, Institute of Electrical and Electronics Engineers Inc., Erode, India, 18–20 October 2023; pp. 340–345.
- He, H.; Garcia, E.A. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
- Chang, C.; Wang, W.-Y.; Peng, W.-C.; Chen, T.-F. LLM4TS: Aligning Pre-Trained LLMs as Data-Efficient Time-Series Forecasters. arXiv 2023, arXiv:2308.08469.
- Jin, M.; Zhang, Y.; Chen, W.; Zhang, K.; Liang, Y.; Yang, B.; Wen, Q. Position Paper: What Can Large Language Models Tell Us about Time Series Analysis. preprint 2024, arXiv:2402.02713.
- Jin, M.; Zhang, Y.; Chen, W.; Zhang, K.; Liang, Y.; Yang, B.; Jindong, W.; Shirui, P.; Qingsong, W. Time-LLM: Time Series Forecasting by Reprogramming Large Language Models. Available online: https://openreview.net/forum?id=Unb5CVPtae (accessed on 10 October 2024).
- Jones, L.; Brown, M. Advancements in AI for Weather Prediction. Weather. Clim. 2019, 12, 112–128.
- Gupta, R.; Patel, S. Generative AI for Weather Data Analysis. IEEE Trans. Neural Netw. 2017, 29, 2100–2113.
- Bihlo, A. A generative adversarial network approach to (ensemble) weather prediction. Neural Netw. 2021, 139, 1–16.
- Li, L.; Carver, R.; Lopez-Gomez, I.; Sha, F.; Anderson, J. Generative Emulation of Weather Forecast Ensembles with Diffusion Models. 2024. Available online: https://www.science.org (accessed on 10 October 2024).
- Bilgin, O.; Mąka, P.; Vergutz, T.; Mehrkanoon, S. TENT: Tensorized Encoder Transformer for Temperature Forecasting. arXiv 2021, arXiv:2106.14742.
- Olarewaju, I.K.; Kim, K.K. Development of a Weather Prediction Device Using Transformer Models and IoT Techniques. J. Sens. Sci. Technol. 2022, 32, 164–168.
- Alerskans, E.; Nyborg, J.; Birk, M.; Kaas, E. A transformer neural network for predicting near-surface temperature. Meteorol. Appl. 2022, 29, 2098.
- Cachay, S.R.; Mitra, P.; Hirasawa, H.; Kim, S.; Hazarika, S.; Hingmire, D.; Rasch, P.; Singh, H.; Singh, K. ClimFormer-A Spherical Transformer Model for Long-term Climate Projections. In Proceedings of the Machine Learning and the Physical Sciences Workshop, NeurIPS 2022, New Orleans, LA, USA, 3 December 2022.
- Available online: https://www.kaggle.com/datasets/selfishgene/historical-hourly-weather-data/data (accessed on 10 October 2024).
- Ministry of Earth Sciences, Government of India. Climatic Classifications and Monthly Weather Reports. 2022. Available online: http://moes.gov.in/climate_data.html. (accessed on 10 October 2024).
- García, S.; Luengo, J.; Herrera, F. Data Preprocessing in Data Mining; Springer International Publishing: Berlin/Heidelberg, Germany, 2015.
- Zheng, A.; Casari, A. Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists; O’Reilly Media: Newton, MA, USA, 2018.
- Provides Comprehensive Guides and Examples for Machine Learning Models, Preprocessing, and Model Evaluation. TensorFlow Documentation. Available online: https://www.tensorflow.org/guide (accessed on 10 October 2024).
- World Meteorological Organization. Advances in Weather Data Collection and Analysis: A Comprehensive Review; WMO Technical Report; World Meteorological Organization: Geneva, Switzerland, 2021; Volume 567, pp. 1–50.
- Sharma, K.; Singh, P. Challenges in Weather Data Availability for Remote Areas in India. Environ. Res. Lett. 2020, 15, 034010.
- India Meteorological Department. Data Supply Portal: Access and Limitations. 2023. Available online: https://www.india.gov.in/website-data-supply-portal (accessed on 10 October 2024).
- James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning: With Applications in R; Springer: Berlin/Heidelberg, Germany, 2013.
- Chollet, F. Deep Learning with Python; Manning Publications: Shelter Island, NY, USA, 2017.
- Kumar, R.; Gupta, A. Impact of Historical Weather Data Availability on Weather Forecasting: A Case Study of Remote Cities in India. J. Remote. Sens. Clim. Stud. 2019, 25, 112–120.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yadav, K.; Malviya, S.; Tiwari, A.K. Improving Weather Forecasting in Remote Regions Through Machine Learning. Atmosphere 2025, 16, 587. https://doi.org/10.3390/atmos16050587