Abstract
Accurate prediction of significant wave height (SWH) is central to coastal ocean dynamics, wave–climate assessment, and operational marine forecasting, yet many high-performing machine-learning (ML) models remain opaque and weakly connected to the underlying wave physics. We propose an explainable, feature-engineering-guided ML framework for coastal SWH prediction that combines extremal wave statistics, temporal descriptors, and SHAP-based interpretation. Using 30-min buoy observations from a high-energy, wave-dominated coastal site off Australia’s Gold Coast, we benchmarked seven regression models (Linear Regression, Decision Tree, Random Forest, Gradient Boosting, Support Vector Regression, K-Nearest Neighbors, and Neural Network) across four feature sets: (i) Base (Hmax, Tz, Tp, SST, peak direction); (ii) Base + Temporal (lags, rolling statistics, cyclical hour/month encodings); (iii) Base + a physics-informed Wave Height Ratio, WHR = Hmax/Hs; and (iv) Full (Base + Temporal + WHR). Model skill was evaluated for full-year, 1-month, and 10-day prediction windows, yielding 84 distinct configurations (7 models × 4 feature sets × 3 windows). Performance was assessed using R², RMSE, MAE, and bias, and the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) was used for multi-criteria ranking. Including WHR systematically improved performance, raising test R² from a baseline range of ~0.85–0.95 to values exceeding 0.97 and reducing RMSE by up to 86%; the Random Forest model with the Base + WHR feature set achieved the top TOPSIS score (1.000). SHAP analysis identified WHR and lagged SWH as the dominant predictors, linking model behavior to extremal sea states and short-term memory in the wave field. The proposed framework demonstrates how embedding simple, physically motivated features and explainable-AI tools can turn black-box coastal wave predictors into transparent models suitable for geophysical fluid dynamics, coastal hazard assessment, and wave-energy applications.
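The multi-criteria ranking step can be illustrated with a minimal TOPSIS sketch in Python. This is not the authors' implementation: the function name `topsis`, the equal criterion weights, the use of absolute bias as a cost criterion, and the three rows of metric values are all assumptions chosen for illustration, and none of the numbers are results from the study.

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """Return TOPSIS closeness scores in [0, 1] for each row (alternative).

    matrix  : (n_alternatives, n_criteria) array of metric values
    weights : criterion weights (normalized internally to sum to 1)
    benefit : True where larger is better (e.g. R^2), False where smaller is
              better (e.g. RMSE, MAE, |bias|)
    """
    X = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    benefit = np.asarray(benefit, dtype=bool)

    # Vector-normalize each criterion column, then apply the weights.
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0  # guard against an all-zero column
    V = X / norms * w

    # Ideal and anti-ideal points depend on whether a criterion is benefit or cost.
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))

    # Euclidean distance of each alternative to both reference points.
    d_pos = np.linalg.norm(V - ideal, axis=1)
    d_neg = np.linalg.norm(V - anti, axis=1)

    # Relative closeness to the ideal solution (0 if all alternatives coincide).
    denom = d_pos + d_neg
    return np.divide(d_neg, denom, out=np.zeros_like(d_neg), where=denom > 0)

# Hypothetical metric table (illustrative values only, NOT from the study).
# Columns: R^2, RMSE, MAE, |bias|
metrics = [
    [0.98, 0.05, 0.04, 0.001],  # configuration A
    [0.93, 0.12, 0.09, 0.010],  # configuration B
    [0.86, 0.20, 0.15, 0.020],  # configuration C
]
scores = topsis(metrics, weights=[1, 1, 1, 1], benefit=[True, False, False, False])
ranking = np.argsort(-scores)  # indices of configurations, best first
```

In this toy table, configuration A is best on every criterion, so it coincides with the ideal point and its closeness score is exactly 1; a score of 1.000 therefore indicates an alternative that is not beaten on any of the criteria being ranked.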