Mathematics
  • This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
  • Article
  • Open Access

12 December 2025

From Black Box to Transparency: An Explainable Machine Learning (ML) Framework for Ocean Wave Prediction Using SHAP and Feature-Engineering-Derived Variable

Faculty of Engineering and Natural Sciences, Istanbul Medipol University, Beykoz, Istanbul 34181, Turkey
This article belongs to the Special Issue AI-Driven Computational Methods: Theories, Algorithms and Applications

Abstract

Accurate prediction of significant wave height (SWH) is central to coastal ocean dynamics, wave–climate assessment, and operational marine forecasting, yet many high-performing machine-learning (ML) models remain opaque and weakly connected to the underlying wave physics. We propose an explainable, feature-engineering-guided ML framework for coastal SWH prediction that combines extremal wave statistics, temporal descriptors, and SHAP-based interpretation. Using 30 min buoy observations from a high-energy, wave-dominated coastal site off Australia’s Gold Coast, we benchmarked seven regression models (Linear Regression, Decision Tree, Random Forest, Gradient Boosting, Support Vector Regression, K-Nearest Neighbors, and Neural Networks) across four feature sets: (i) Base (Hmax, Tz, Tp, SST, peak direction), (ii) Base + Temporal (lags, rolling statistics, cyclical hour/month encodings), (iii) Base + a physics-informed Wave Height Ratio, WHR = Hmax/Hs, and (iv) Full (Base + Temporal + WHR). Model skill was evaluated for full-year, 1-month, and 10-day prediction windows, resulting in 84 distinct configurations (7 models × 4 feature sets × 3 windows). Performance was assessed using R², RMSE, MAE, and bias, with the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) employed for multi-criteria ranking. Inclusion of WHR systematically improved performance, raising test R² from a baseline range of ~0.85–0.95 to values exceeding 0.97 and reducing RMSE by up to 86%, with the Random Forest model on the Base + WHR feature set achieving the top TOPSIS score (1.000). SHAP analysis identifies WHR and lagged SWH as the dominant predictors, linking model behavior to extremal sea states and short-term memory in the wave field. The proposed framework demonstrates how embedding simple, physically motivated features and explainable AI tools can transform black-box coastal wave predictors into transparent models suitable for geophysical fluid dynamics, coastal hazard assessment, and wave-energy applications.
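The engineered features named in the abstract (the Wave Height Ratio, lags, rolling statistics, and cyclical hour/month encodings) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the trailing-window convention for the rolling mean, the periods (24 for hour, 12 for month), and the sample values are all assumptions made for illustration.

```python
import math


def wave_height_ratio(hmax, hs):
    """WHR = Hmax / Hs, following the definition given in the abstract."""
    if hs <= 0:
        raise ValueError("Hs must be positive")
    return hmax / hs


def cyclical_encoding(value, period):
    """Map a periodic quantity (e.g. hour of day, month) onto the unit circle.

    The sine/cosine pair keeps the end of a cycle adjacent to its start,
    so hour 23.5 is close to hour 0 rather than far from it.
    """
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def lagged(series, k):
    """Return the series shifted back by k steps (k >= 0).

    The first k entries have no history and are filled with None.
    """
    values = list(series)
    n = len(values)
    k = min(k, n)
    return [None] * k + values[: n - k]


def rolling_mean(series, window):
    """Trailing mean over the last `window` samples, current one included.

    Entries without a full window are None. (Assumption: the paper may
    use a different window alignment.)
    """
    values = list(series)
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


# Made-up sample values, for illustration only (not the paper's buoy data).
hs = [1.2, 1.3, 1.5, 1.4]      # significant wave height, m
hmax = [2.0, 2.1, 2.6, 2.3]    # maximum wave height, m

whr = [wave_height_ratio(hm, h) for hm, h in zip(hmax, hs)]
hs_lag1 = lagged(hs, 1)
hs_roll2 = rolling_mean(hs, 2)
hour_sin, hour_cos = cyclical_encoding(23.5, 24)   # 30 min before midnight
month_sin, month_cos = cyclical_encoding(12, 12)   # December
```

With 30 min sampling, a lag of one step corresponds to the observation half an hour earlier; longer lags and windows follow the same pattern.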
