Machine Learning for Wind Pattern Estimation at Data-Scarce Coastal Ports: A Comparative Study Using Real Measurements
Abstract
1. Introduction
- We propose a unified, generalizable and multi-model framework for wind speed estimation in data-scarce coastal ports, integrating both deterministic statistical models and ML algorithms. This framework allows for the deployment of virtual wind stations where dense meteorological instrumentation is unavailable.
- We design a spatiotemporal data extraction and alignment methodology, which utilizes over five decades of hourly ERA5 reanalysis data from surrounding reference locations. By aggregating data across years for the same day and hour, we capture non-linear and recurring seasonal wind patterns that enrich the model training process.
- We implement and compare a set of deterministic models (e.g., simple averaging, correlation-weighted averaging) with five supervised ML models to assess their performance across different prediction horizons.
- Through extensive validation using real in situ wind speed measurements at the pilot port (Chalkida, Greece), we demonstrate that ensemble-based ML models, particularly GBR, offer significantly improved prediction accuracy in short-term forecasting scenarios. In long-term predictions, simpler tree-based models such as DTR remain competitive due to their lower variance and robustness to overfitting.
- Finally, we show that the proposed methodology is not tied to a specific model but can be periodically retrained and adapted to new data, enabling dynamic selection of the best-performing predictor depending on the forecasting horizon and environmental conditions.
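The two deterministic baselines named above, simple averaging and correlation-weighted averaging, can be illustrated with a minimal NumPy sketch. The weighting rule used here (Pearson correlation of each historical year with a target series, negative correlations clipped to zero) is an assumption made for illustration and is not necessarily the scheme used in this study. The function names and the toy data are hypothetical.

```python
import numpy as np


def simple_average(history):
    """Mean over years for each time step; history has shape (years, time steps)."""
    return np.asarray(history, dtype=float).mean(axis=0)


def correlation_weighted_average(history, target):
    """Average the years, weighting each by its Pearson correlation with `target`.

    Assumption: negative or undefined correlations get zero weight. If every
    weight is zero, fall back to the simple average.
    """
    history = np.asarray(history, dtype=float)
    target = np.asarray(target, dtype=float)
    weights = np.array([np.corrcoef(year, target)[0, 1] for year in history])
    weights = np.clip(np.nan_to_num(weights), 0.0, None)
    if weights.sum() == 0.0:
        return simple_average(history)
    return weights @ history / weights.sum()


# Toy data (m/s): one year tracks the target closely, the other is anti-correlated.
target = np.array([2.0, 3.5, 1.0, 4.0, 2.5])
history = np.stack([target + 0.5, 10.0 - target])

sam = simple_average(history)                        # both years count equally
wam = correlation_weighted_average(history, target)  # anti-correlated year is ignored
```

In this toy case the simple average is flat, because the two years cancel, while the weighted average follows the correlated year. This shows why a correlation-based weighting can help when some historical years resemble the target period more than others.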
2. Related Work
2.1. Wind Data Downscaling and Interpolation Techniques
2.2. Machine Learning in Meteorology
2.3. ERA5 Reanalysis Applications
2.4. Existing Gaps in Wind Prediction for Data-Scarce Ports
3. Materials and Methods
3.1. Telemetry System Installation at the Pilot Port
3.2. Dataset and Preprocessing
3.2.1. Data Sources
- Short-term Period: From 20 December to 31 December 2023;
- Long-term Period: From 26 February to 29 October 2024.
3.2.2. Variables of Interest
3.2.3. Preprocessing Steps
1. Timestamp Harmonization: ERA5 timestamps, stored as Greek-formatted strings, were converted to the ISO-standard 24 h date–time format to ensure consistent temporal indexing across all datasets.
2. Variable Transformation: Wind speed and direction were computed from the eastward (u) and northward (v) wind components at every time point for all locations.
3. Temporal Alignment: All datasets (ERA5 and target sensor) were aligned on the exact hour, day, and month to allow a fair comparison across years and locations. The ERA5 datasets are recorded at hourly resolution (i.e., on the hour: 00:00, 01:00, …, 23:00), whereas the Chalkida port telemetry station records measurements every 30 min (e.g., 00:00, 00:30, 01:00, etc.). To harmonize the datasets, we retained only the Chalkida measurements recorded at full hours, which match the ERA5 timestamps exactly, and discarded the intermediate half-hour values. After this step, all reference and target data points were synchronized on an hourly basis.
4. Seasonal Subset Extraction: For model training and evaluation, the periods of interest were isolated. Specifically, the calendar windows of the short-term evaluation period (from 20 December 2023 to 31 December 2023) and the long-term evaluation period (from 26 February 2024 to 29 October 2024) were extracted from all 54 years of ERA5 data to construct matching historical subsets. These subsets capture interannual variability while preserving daily characteristics.
5. 3D Matrix Reshaping: A three-dimensional matrix was constructed with one axis for the reference locations, one for the hourly time steps T (separately for the two periods of interest), and one for the years Y. This matrix allowed model-specific extraction of historical patterns, year-wise averaging, and spatiotemporal feature engineering.
6. Feature Matrix Construction: For the supervised learning models, the feature space was constructed by combining the wind speed values from the four reference points at each time step, while the target variable was the measured wind speed at the pilot port.
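The preprocessing steps above can be sketched in NumPy as follows. This is a minimal illustration on synthetic data, not the pipeline used in the study: the function names, array sizes, and the axis order (locations, time steps, years) are assumptions, and the direction formula uses the common meteorological convention (degrees the wind blows from, clockwise from north), which the text does not specify.

```python
import numpy as np


def wind_speed_direction(u, v):
    """Speed (m/s) and direction (degrees the wind blows FROM) from u, v components."""
    speed = np.hypot(u, v)
    direction = np.degrees(np.arctan2(-u, -v)) % 360.0
    return speed, direction


def keep_full_hours(timestamps, values):
    """Drop half-hour samples so a 30-min series lines up with an hourly grid."""
    ts = np.asarray(timestamps, dtype="datetime64[m]")
    on_the_hour = ts == ts.astype("datetime64[h]")  # truncating to hours changes :30 stamps
    return ts[on_the_hour], np.asarray(values)[on_the_hour]


# Synthetic stand-in for the reanalysis components (hypothetical sizes).
rng = np.random.default_rng(0)
n_locations, n_steps, n_years = 4, 6, 3
u = rng.normal(size=(n_locations, n_steps, n_years))
v = rng.normal(size=(n_locations, n_steps, n_years))

# 3D matrix of wind speed: (reference locations, hourly time steps, years).
cube, _ = wind_speed_direction(u, v)

# Year-wise averaging: one mean series per reference location.
yearly_mean = cube.mean(axis=2)

# Feature matrix for one year: one row per time step, one column per reference point.
X = cube[:, :, -1].T

# Station series sampled every 30 min, reduced to the hourly grid.
station_ts = np.array(
    ["2023-12-20T00:00", "2023-12-20T00:30", "2023-12-20T01:00"],
    dtype="datetime64[m]",
)
station_speed = np.array([3.1, 3.4, 2.9])
hourly_ts, hourly_speed = keep_full_hours(station_ts, station_speed)
```

Keeping the years on a separate axis makes year-wise averaging a single `mean(axis=2)` call, and slicing one year and transposing yields the time-by-location layout that most regression libraries expect.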
3.3. Data Overview and Characteristics
3.4. General Predictive Framework
3.5. Wind Strength Prediction Models
3.5.1. Deterministic Models
3.5.2. Machine Learning Models
4. Numerical Results and Performance Evaluation
4.1. Experimental Setup
4.2. Prediction Curves and Fitting Performance
4.3. Comparative Performance Between Different Models
4.4. Multi-Metric Performance Evaluation
4.5. Time Lag Effect and Feature Importance
5. Discussion
Limitations
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| ECMWF | European Centre for Medium-Range Weather Forecasts |
| MLR | Multivariate Linear Regression |
| DTR | Decision Tree Regression |
| SVR | Support Vector Regression |
| RFR | Random Forest Regression |
| GBR | Gradient Boosting Regression |
| IDW | Inverse Distance Weighting |
| GBM | Gradient Boosting Machine |
| ML | Machine Learning |
| DL | Deep Learning |
| AI | Artificial Intelligence |
| MSE | Mean Squared Error |
| ANN | Artificial Neural Network |
| SVM | Support Vector Machine |
| CNN | Convolutional Neural Network |
| LSTM | Long Short-Term Memory |
| HTML | Hypertext Markup Language |
| SAM-last | Simple Averaging Model based on the last available year |
| SAM-sig | Simple Averaging Model based on the significant-correlation years |
| WAM-all | Weighted Averaging Model based on all available years |
| WAM-sig | Weighted Averaging Model based on the significant-correlation years |

| Dataset Column | Description | Size (Samples) |
|---|---|---|
| Timestamp (t) | The timestamp of each sample in the form year-month-day-hour | 473,352 |
| Longitude (HGRS87 *) | Longitude of reference points in HGRS87 system | 4 |
| Latitude (HGRS87 *) | Latitude of reference points in HGRS87 system | 4 |
| Eastward wind speed (u) | Wind speed values (m/s) along the x-axis | 473,352 |
| Northward wind speed (v) | Wind speed values (m/s) along the y-axis | 473,352 |


| Model | MSE | MAE | RMSE | MAPE (%) | R² | WI (d) | SFS |
|---|---|---|---|---|---|---|---|
| SAM-last | 0.94 | 0.74 | 0.97 | 122.41 | −0.15 | 0.51 | −0.31 |
| SAM-sig | 1.48 | 1.04 | 1.21 | 184.91 | −0.80 | 0.55 | −1.33 |
| WAM-all | 0.94 | 0.74 | 0.97 | 122.17 | −0.15 | 0.51 | −0.29 |
| WAM-sig | 0.94 | 0.74 | 0.97 | 122.24 | −0.15 | 0.51 | −0.31 |
| MLR | 0.75 | 0.64 | 0.86 | 100.87 | 0.085 | 0.35 | −1.19 |
| DTR | 0.23 | 0.33 | 0.48 | 36.71 | 0.71 | 0.91 | 0.75 |
| SVR | 0.81 | 0.61 | 0.89 | 82.08 | 0.016 | 0.35 | −1.33 |
| RFR | 0.35 | 0.43 | 0.59 | 66.64 | 0.57 | 0.80 | 0.176 |
| GBR | 2.14 × 10⁻⁵ | 0.0035 | 0.005 | 1.17 | 0.97 | 0.96 | 0.96 |

| Metric | Wind Range * (m/s) | SAM-last | SAM-sig | WAM-all | WAM-sig | MLR | DTR | SVR | RFR | GBR |
|---|---|---|---|---|---|---|---|---|---|---|
| MSE | Low | 0.72 | 1.45 | 0.72 | 0.72 | 0.46 | 0.18 | 0.44 | 0.22 | 2.12 × 10⁻⁵ |
| MSE | High | 5.35 | 2.08 | 5.38 | 5.36 | 6.52 | 1.25 | 8.11 | 2.95 | 3.11 × 10⁻⁵ |
| MAE | Low | 0.67 | 1.04 | 0.67 | 0.67 | 0.55 | 0.30 | 0.51 | 0.37 | 0.0034 |
| MAE | High | 2.25 | 1.23 | 2.25 | 2.25 | 2.51 | 0.91 | 2.81 | 1.66 | 0.0037 |
| RMSE | Low | 0.85 | 1.20 | 0.85 | 0.85 | 0.68 | 0.43 | 0.67 | 0.47 | 0.004 |
| RMSE | High | 2.31 | 1.44 | 2.32 | 2.31 | 2.55 | 1.12 | 2.85 | 1.72 | 0.005 |
| MAPE (%) | Low | 115.73 | 182.63 | 115.47 | 115.54 | 92.78 | 27.41 | 72.69 | 57.91 | 1.11 |
| MAPE (%) | High | 142.83 | 197.66 | 136.33 | 136.31 | 113.87 | 39.76 | 91.69 | 71.22 | 1.81 |

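
Most of the error metrics reported in the tables above follow standard textbook definitions, which can be sketched as below. This is an illustrative implementation and not the evaluation code of the study: the function name and the toy inputs are hypothetical, WI (d) is taken to be Willmott's index of agreement in its original form, and SFS is omitted because its definition does not appear in this section.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Standard error metrics for a predicted vs. observed wind speed series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    obs_mean = y_true.mean()

    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - obs_mean) ** 2)
    # Willmott's index of agreement (assumed definition of WI (d)).
    potential = np.sum((np.abs(y_pred - obs_mean) + np.abs(y_true - obs_mean)) ** 2)

    return {
        "MSE": mse,
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(mse),
        # MAPE is undefined when an observation is exactly zero.
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),
        "R2": 1.0 - ss_res / ss_tot,
        "WI": 1.0 - ss_res / potential,
    }


# Toy example: every prediction is 1 m/s too high.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
metrics = regression_metrics(observed, predicted)
```

Because MAPE divides each error by the observed value, it grows quickly when observed wind speeds are close to zero. This is one possible reason why MAPE values above 100% can coexist with sub-1 m/s MAE values.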
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Giannopoulos, A.; Karditsa, A.; Hatzaki, M.; Trakadas, P. Machine Learning for Wind Pattern Estimation at Data-Scarce Coastal Ports: A Comparative Study Using Real Measurements. J. Mar. Sci. Eng. 2025, 13, 2375. https://doi.org/10.3390/jmse13122375

