Author Contributions
Conceptualization, G.M.M., A.G.S., G.T.D. and N.E.B.; data curation, G.M.M. and A.G.S.; formal analysis, G.M.M., A.G.S., G.T.D. and N.E.B.; investigation, G.M.M. and A.G.S.; methodology, G.M.M., A.G.S., G.T.D., N.E.B. and Y.M.; validation, G.M.M., A.G.S., G.T.D. and N.E.B.; resources, G.M.M. and A.G.S.; writing—original draft preparation, G.M.M.; writing—review and editing, G.M.M., A.G.S., G.T.D., N.E.B., E.O.G. and Y.M.; visualization, G.M.M., A.G.S. and N.E.B.; supervision, A.G.S., G.T.D. and E.O.G.; project administration, A.G.S. and E.O.G.; funding acquisition, E.O.G. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Map of the Hombole catchment located within the UAB, Ethiopia.
Figure 2.
Daily rainfall and discharge time series for the Hombole, Tulu Bolo, Ginchi, and Addis Ababa stations from 1981 to 2020. The plot shows strong seasonal rainfall and discharge patterns with clear wet and dry periods.
Figure 3.
Monthly discharge distribution at Hombole station (1981–2020). The plot shows pronounced wet-season peaks and dry-season lows, consistent with rainfall patterns across the UAB.
Figure 4.
Heatmap of cross-correlations between discharge, lagged discharge, and precipitation at the Hombole, Tulu Bolo, Ginchi, and Addis Ababa stations from 1981 to 2020.
Figure 5.
Methodological workflow for flood forecasting using deep learning.
Figure 6.
Learning curves for training and validation metrics of the (a) CNN, (b) LSTM, (c) GRU, (d) BiLSTM, and (e) Hybrid CNN–LSTM models.
Figure 7.
Time series plot of observed vs. predicted discharge at Hombole using the CNN model.
Figure 8.
Scatter plots of observed vs. predicted discharge for training, validation, and test sets using CNN.
Figure 9.
Time series plot of predicted vs. observed discharge using the LSTM model.
Figure 10.
Scatter plots of observed vs. predicted discharge for training, validation, and test sets using the LSTM model.
Figure 11.
Time series plot of predicted vs. observed discharge using the GRU model.
Figure 12.
Scatter plots of observed vs. predicted discharge for training, validation, and test sets using the GRU model.
Figure 13.
Time series of observed vs. predicted discharge using the BiLSTM model.
Figure 14.
Scatter plots of observed vs. predicted discharge for training, validation, and test sets using BiLSTM.
Figure 15.
Time series of observed vs. predicted discharge using the Hybrid CNN–LSTM model.
Figure 16.
Scatter plots of observed vs. predicted discharge for training, validation, and test sets using the Hybrid CNN–LSTM model.
Figure 17.
Time series of observed vs. predicted discharge using the climatology (long-term mean) model.
Figure 18.
Scatter plots of observed vs. predicted discharge for training, validation, and test sets using the climatology (long-term mean) model.
Figure 19.
Time series of observed versus simulated discharge using the HBV conceptual hydrological model.
Figure 20.
Scatter plots of observed versus simulated discharge for training, validation, and test sets using the HBV model.
Table 1.
Summary statistics for discharge (m³/s, at Hombole station) and rainfall (mm/day) at four locations.
| Variable | Count | Mean | Std | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|---|
| Discharge, Hombole (m³/s) | 14,610 | 44.69 | 79.27 | 0.40 | 4.22 | 8.074 | 40.37 | 803.10 |
| Rainfall, Hombole (mm/day) | 14,610 | 2.14 | 6.93 | 0.00 | 0.00 | 0.009 | 0.52 | 82.836 |
| Rainfall, Tulu Bolo (mm/day) | 14,610 | 2.93 | 6.25 | 0.00 | 0.00 | 0.082 | 2.57 | 83.248 |
| Rainfall, Ginchi (mm/day) | 14,610 | 3.08 | 5.91 | 0.00 | 0.00 | 0.176 | 3.75 | 66.537 |
| Rainfall, Addis Ababa (mm/day) | 14,610 | 2.77 | 5.58 | 0.00 | 0.00 | 0.103 | 3.00 | 86.808 |
Table 2.
Performance interpretation of model evaluation metrics used in this study.
| Metric | Range | Ideal Value | Performance Criteria |
|---|---|---|---|
| MAE | [0, ∞) | 0 | Lower values indicate higher accuracy |
| RMSE | [0, ∞) | 0 | Sensitive to large errors; lower is better |
| NSE | (−∞, 1] | 1 | >0.75 Excellent; 0.65–0.75 Good; 0.5–0.65 Satisfactory; <0.5 Poor |
| KGE | (−∞, 1] | 1 | >0.75 Excellent; 0.65–0.75 Good; 0.5–0.65 Satisfactory; <0.5 Poor |
| PBIAS (%) | (−∞, ∞) | 0 | \|PBIAS\| < 10 Excellent; 10–15 Good; 15–25 Satisfactory; >25 Poor |
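The metrics in Table 2 can be computed from paired observed and simulated discharge as in the minimal sketch below. It is an illustration rather than the authors' code: the function names `evaluate` and `rate`, the sample values, and the handling of the class boundaries are assumptions. The PBIAS sign convention (negative values indicate underestimation) is also an assumption, inferred from the climatology results, where a constant forecast yields negative PBIAS in the wet season and positive PBIAS in the dry season.

```python
import numpy as np

def evaluate(obs, sim):
    """Return MAE, RMSE, NSE, KGE and PBIAS for paired discharge series."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs
    r = np.corrcoef(obs, sim)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "NSE": 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2),
        "KGE": 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2),
        # Assumed sign convention: negative PBIAS means underestimation.
        "PBIAS": 100.0 * err.sum() / obs.sum(),
    }

def rate(score):
    """Map an NSE or KGE score to a Table 2 category (boundary handling assumed)."""
    if score > 0.75:
        return "Excellent"
    if score >= 0.65:
        return "Good"
    if score >= 0.5:
        return "Satisfactory"
    return "Poor"

# Made-up discharge values (m3/s) and a hypothetical model that overestimates by 10%.
obs = np.array([4.2, 8.1, 40.4, 120.0, 310.5, 95.3, 12.7])
sim = 1.1 * obs
metrics = evaluate(obs, sim)
print({k: round(float(v), 3) for k, v in metrics.items()}, rate(metrics["NSE"]))
```

For a uniform 10% overestimate, the correlation is 1 while the variability and bias ratios are both 1.1, so the KGE penalty comes entirely from the last two terms.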
Table 3.
Seasonal performance metrics of the CNN model across training, validation, and test sets.
| Dataset | Season | MAE | RMSE | NSE | KGE | PBIAS (%) |
|---|---|---|---|---|---|---|
| Training | Wet | 21.188 | 36.459 | 0.875 | 0.859 | −3.987 |
| | Dry | 3.684 | 5.925 | 0.708 | 0.645 | 26.592 |
| Validation | Wet | 24.876 | 40.34 | 0.809 | 0.83 | −4.940 |
| | Dry | 3.301 | 5.632 | 0.520 | 0.511 | 8.195 |
| Test | Wet | 34.936 | 58.559 | 0.735 | 0.786 | −6.070 |
| | Dry | 5.609 | 15.927 | 0.366 | 0.488 | 2.286 |
Table 4.
Top 30 peak-flow performance metrics for the CNN model.
| Dataset | MAE | RMSE | NSE | KGE | PBIAS (%) |
|---|---|---|---|---|---|
| Training | 174.537 | 196.601 | −5.702 | 0.294 | −31.677 |
| Validation | 125.643 | 140.605 | −7.409 | 0.294 | −31.499 |
| Test | 165.536 | 185.624 | −18.527 | −0.277 | −35.273 |
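A peak-flow evaluation of this kind can be sketched as follows, assuming that "top 30" refers to the 30 largest observed daily discharges in each split (the selection rule is not stated in the captions, and `top_k_peaks` is a hypothetical helper). The synthetic example also illustrates one plausible reason for strongly negative peak-flow NSE: within a narrow band of high flows the observed variance is small, so even a moderate relative bias produces a large normalized error.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency relative to the mean of the supplied observations."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def top_k_peaks(obs, sim, k=30):
    """Return observed and simulated values at the k largest observed flows."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    idx = np.argsort(obs)[-k:]   # indices of the k largest observations
    return obs[idx], sim[idx]

# Synthetic series and a hypothetical model that underestimates every flow by 30%.
obs = np.arange(100, dtype=float)
sim = 0.7 * obs

peak_obs, peak_sim = top_k_peaks(obs, sim, k=30)
full_nse = nse(obs, sim)            # positive over the full series
peak_nse = nse(peak_obs, peak_sim)  # negative on the peak subset
print(f"full-series NSE = {full_nse:.2f}, top-30 NSE = {peak_nse:.2f}")
```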
Table 5.
Seasonal performance metrics of the LSTM model across data splits.
| Dataset | Season | MAE | RMSE | NSE | KGE | PBIAS (%) |
|---|---|---|---|---|---|---|
| Training | Wet | 23.03 | 39.28 | 0.86 | 0.89 | −1.20 |
| | Dry | 2.45 | 5.78 | 0.72 | 0.79 | 4.92 |
| Validation | Wet | 24.26 | 39.89 | 0.82 | 0.86 | −2.62 |
| | Dry | 2.43 | 5.28 | 0.58 | 0.62 | −7.82 |
| Test | Wet | 34.35 | 56.57 | 0.75 | 0.84 | −1.46 |
| | Dry | 4.52 | 14.99 | 0.44 | 0.49 | −11.36 |
Table 6.
Top 30 peak-flow performance metrics for the LSTM model.
| Dataset | MAE | RMSE | NSE | KGE | PBIAS (%) |
|---|---|---|---|---|---|
| Training | 171.25 | 200.85 | −6.00 | 0.11 | −29.90 |
| Validation | 107.52 | 126.97 | −5.86 | 0.02 | −26.90 |
| Test | 136.94 | 163.82 | −14.21 | −0.46 | −28.62 |
Table 7.
Seasonal performance metrics of the GRU model across training, validation, and test sets.
| Dataset | Season | MAE | RMSE | NSE | KGE | PBIAS (%) |
|---|---|---|---|---|---|---|
| Training | Wet | 22.43 | 38.91 | 0.86 | 0.87 | −3.49 |
| | Dry | 3.24 | 5.78 | 0.72 | 0.68 | 23.30 |
| Validation | Wet | 24.18 | 40.34 | 0.82 | 0.83 | −4.29 |
| | Dry | 2.94 | 5.27 | 0.58 | 0.60 | 9.71 |
| Test | Wet | 33.95 | 57.40 | 0.75 | 0.81 | −5.65 |
| | Dry | 4.80 | 15.01 | 0.44 | 0.46 | −0.10 |
Table 8.
Top 30 peak-flow performance metrics for the GRU model.
| Dataset | MAE | RMSE | NSE | KGE | PBIAS (%) |
|---|---|---|---|---|---|
| Training | 168.74 | 201.28 | −6.03 | 0.13 | −30.29 |
| Validation | 115.45 | 133.99 | −6.64 | 0.00 | −28.94 |
| Test | 148.02 | 174.38 | −16.23 | −0.46 | −31.40 |
Table 9.
Seasonal performance metrics of the BiLSTM model across training, validation, and test sets.
| Dataset | Season | MAE | RMSE | NSE | KGE | PBIAS (%) |
|---|---|---|---|---|---|---|
| Training | Wet | 22.32 | 38.65 | 0.86 | 0.90 | −2.23 |
| | Dry | 2.18 | 5.27 | 0.77 | 0.83 | −1.02 |
| Validation | Wet | 25.03 | 40.90 | 0.81 | 0.84 | −3.27 |
| | Dry | 2.34 | 5.29 | 0.58 | 0.61 | −13.16 |
| Test | Wet | 35.05 | 59.96 | 0.72 | 0.86 | −0.99 |
| | Dry | 4.76 | 14.95 | 0.44 | 0.54 | −13.86 |
Table 10.
Top 30 peak-flow performance metrics for the BiLSTM model across all sets.
| Dataset | MAE | RMSE | NSE | KGE | PBIAS (%) |
|---|---|---|---|---|---|
| Training | 152.55 | 185.18 | −4.95 | −0.26 | −20.63 |
| Validation | 115.48 | 134.15 | −6.66 | 0.03 | −28.95 |
| Test | 133.51 | 163.45 | −14.14 | −1.22 | −22.20 |
Table 11.
Seasonal performance metrics of the Hybrid CNN–LSTM model across training, validation, and test sets.
| Dataset | Season | MAE | RMSE | NSE | KGE | PBIAS (%) |
|---|---|---|---|---|---|---|
| Training | Wet | 19.23 | 32.15 | 0.90 | 0.92 | −1.33 |
| | Dry | 2.75 | 5.08 | 0.79 | 0.76 | 16.43 |
| Validation | Wet | 25.98 | 42.04 | 0.80 | 0.84 | −3.22 |
| | Dry | 2.61 | 5.52 | 0.54 | 0.58 | 0.91 |
| Test | Wet | 35.98 | 58.15 | 0.74 | 0.84 | −1.44 |
| | Dry | 4.64 | 14.89 | 0.45 | 0.51 | −6.12 |
Table 12.
Top 30 peak-flow performance metrics for the Hybrid CNN–LSTM model across training, validation, and test sets.
| Dataset | MAE | RMSE | NSE | KGE | PBIAS (%) |
|---|---|---|---|---|---|
| Training | 133.34 | 157.59 | −3.31 | −0.19 | −15.15 |
| Validation | 121.00 | 135.70 | −6.83 | 0.12 | −30.34 |
| Test | 128.70 | 161.58 | −13.80 | −0.55 | −27.38 |
Table 13.
Seasonal performance metrics for the climatology (long-term mean) model.
| Dataset | Season | MAE | RMSE | NSE | KGE | PBIAS (%) |
|---|---|---|---|---|---|---|
| Training | Wet | 86.01 | 124.40 | −0.46 | −0.54 | −61.90 |
| | Dry | 35.85 | 36.57 | −10.83 | −3.62 | 440.06 |
| Validation | Wet | 82.54 | 117.83 | −0.50 | −0.54 | −61.25 |
| | Dry | 33.92 | 34.81 | −8.68 | −2.59 | 330.44 |
| Test | Wet | 103.47 | 146.53 | −0.71 | −0.57 | −68.76 |
| | Dry | 35.42 | 39.07 | −1.67 | −1.93 | 256.50 |
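The behaviour of a long-term mean baseline can be reproduced with the short sketch below, which assumes the baseline predicts the training-period mean discharge on every day (the discharge values are made up for illustration). By construction such a forecast gives NSE = 0 on the training data and NSE ≤ 0 elsewhere. If the undefined correlation of a constant forecast is taken as 0, which is an assumption about how the metric was handled, KGE reduces to 1 − √2 ≈ −0.41, matching the climatology KGE reported in Table 17.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency relative to the mean of the supplied observations."""
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

# Made-up discharge samples (m3/s) standing in for the training and test periods.
train_obs = np.array([5.0, 7.0, 30.0, 150.0, 400.0, 60.0, 9.0])
test_obs = np.array([4.0, 10.0, 55.0, 220.0, 35.0])

# Climatology baseline: the training mean is the forecast for every day.
mean_q = train_obs.mean()
train_pred = np.full_like(train_obs, mean_q)
test_pred = np.full_like(test_obs, mean_q)

nse_train = nse(train_obs, train_pred)   # zero by construction
nse_test = nse(test_obs, test_pred)      # cannot exceed zero for a constant forecast

# KGE of a constant forecast on its own training data:
# alpha = 0 (no variability), beta = 1 (unbiased), r assumed to be 0.
r, alpha, beta = 0.0, 0.0, 1.0
kge_train = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
print(f"NSE train = {nse_train:.2f}, NSE test = {nse_test:.2f}, KGE train = {kge_train:.2f}")
```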
Table 14.
Top 30 peak-flow performance metrics for the climatology (long-term mean) model.
| Dataset | MAE | RMSE | NSE | KGE | PBIAS (%) |
|---|---|---|---|---|---|
| Training | 508.05 | 513.69 | −44.76 | −0.69 | −92.21 |
| Validation | 355.95 | 359.23 | −53.89 | −0.67 | −89.24 |
| Test | 426.36 | 428.43 | −103.02 | −0.68 | −90.85 |
Table 15.
Seasonal performance metrics of the HBV model.
| Dataset | Season | MAE | RMSE | NSE | KGE | PBIAS (%) |
|---|---|---|---|---|---|---|
| Training | Wet | 61.35 | 97.03 | 0.11 | −0.55 | −11.22 |
| | Dry | 10.32 | 23.81 | −4.01 | −0.77 | 67.84 |
| Validation | Wet | 43.86 | 65.87 | 0.53 | −0.38 | −8.02 |
| | Dry | 10.12 | 19.47 | −2.03 | −2.03 | 39.49 |
| Test | Wet | 55.81 | 82.79 | 0.45 | 0.71 | −8.684 |
| | Dry | 12.73 | 30.33 | −0.61 | 0.12 | 39.1 |
Table 16.
Top 30 peak-flow performance metrics for the HBV model.
| Dataset | MAE | RMSE | NSE | KGE | PBIAS (%) |
|---|---|---|---|---|---|
| Training | 360.83 | 374.56 | −23.32 | −0.2 | −65.5 |
| Validation | 195.76 | 204.94 | −16.86 | −0.06 | −49.07 |
| Test | 158.14 | 188.52 | −19.14 | −1.45 | −27.58 |
Table 17.
Quantitative performance metrics of the deep learning models and the climatology and HBV baselines (Training/Validation/Test).
| Model | MAE (m³/s) | RMSE (m³/s) | NSE | KGE | PBIAS (%) |
|---|---|---|---|---|---|
| Climatology | 52.60/50.17/58.16 | 77.85/73.80/90.50 | 0.00/0.00/−0.01 | −0.41/−0.41/−0.43 | −0.01/−1.70/−20.39 |
| HBV | 27.37/21.4/27.12 | 59.36/41.26/27.12 | 0.42/0.69/0.64 | 0.70/0.78/0.82 | −1.47/−0.8/−1.55 |
| CNN | 9.54/10.55/15.46 | 21.63/24.27/36.34 | 0.92/0.89/0.84 | 0.90/0.88/0.85 | −0.16/−3.21/−4.82 |
| LSTM | 9.34/9.77/14.54 | 23.20/23.52/34.99 | 0.91/0.90/0.85 | 0.93/0.91/0.90 | −0.43/−3.30/−2.94 |
| GRU | 9.66/10.08/14.59 | 22.99/23.77/35.44 | 0.91/0.90/0.84 | 0.90/0.89/0.86 | −0.14/−2.44/−4.82 |
| BiLSTM | 8.91/9.97/14.94 | 22.76/24.10/36.83 | 0.91/0.89/0.83 | 0.93/0.90/0.91 | −2.08/−4.58/−2.91 |
| Hybrid CNN–LSTM | 8.26/10.46/15.17 | 19.05/24.79/35.82 | 0.94/0.89/0.84 | 0.94/0.90/0.90 | 0.89/−2.68/−2.14 |
Table 18.
Overall and extreme-flow classification performance using five discharge classes.
| Model | Accuracy (Training) | Accuracy (Validation) | Accuracy (Test) | Extreme-Class Accuracy (Test) |
|---|---|---|---|---|
| Climatology | 19.9% | 19.9% | 19.9% | 0% |
| HBV | 49.2% | 49.2% | 49.2% | 77.9% |
| CNN | 50.8% | 57.3% | 61.6% | 89.9% |
| LSTM | 66.9% | 73.1% | 73.3% | 90.5% |
| GRU | 54.0% | 61.3% | 66.7% | 90.7% |
| BiLSTM | 68.4% | 70.5% | 76.4% | 89.9% |
| Hybrid CNN–LSTM | 58.9% | 66.3% | 72.8% | 89.5% |
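A class-based evaluation of this kind can be sketched as below. The class boundaries are not given in the caption, so the sketch assumes five quantile-based classes with edges at the 20th, 40th, 60th, and 80th percentiles of observed discharge, and takes extreme-class accuracy to be the share of top-class observations that are also predicted in the top class. Both definitions and the helper `class_accuracy` are assumptions. Under them, a constant forecast scores about 20% overall and 0% on the extreme class, which is the pattern shown for the climatology baseline.

```python
import numpy as np

def class_accuracy(obs, sim, edges):
    """Return (overall accuracy, extreme-class accuracy) for binned discharge."""
    obs_cls = np.digitize(obs, edges)   # class indices 0 .. len(edges)
    sim_cls = np.digitize(sim, edges)
    overall = float(np.mean(obs_cls == sim_cls))
    top = len(edges)                    # index of the highest class
    extreme = obs_cls == top
    extreme_acc = float(np.mean(sim_cls[extreme] == top)) if extreme.any() else float("nan")
    return overall, extreme_acc

# Synthetic observations and a constant (long-term mean) forecast.
obs = np.arange(1.0, 101.0)
edges = np.percentile(obs, [20, 40, 60, 80])   # assumed quintile class boundaries
sim = np.full_like(obs, obs.mean())

overall, extreme_acc = class_accuracy(obs, sim, edges)
print(f"overall accuracy = {overall:.1%}, extreme-class accuracy = {extreme_acc:.1%}")
```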
Table 19.
Comparative analysis of flood forecasting studies using DL and ML approaches.
| Ref | Objective | Models Tested | Key Findings |
|---|---|---|---|
| [26] | Predict stream stage heights using multi-modal hydrometeorological data | ConvLSTM | Achieved approximately 26% improvement in model error over state-of-the-art models, effectively capturing spatiotemporal dynamics in flash flood-prone catchments. |
| [38] | Develop an interpretable hybrid model for flood forecasting | Transformer, LSTM, AGRS | The AGRS–LSTM–Transformer model enhanced interpretability and forecasting accuracy, particularly for extreme events. |
| [39] | Real-time flood prediction in the Red River of the North, USA | LSTM, 1D-CNN | LSTM outperformed 1D-CNN in predicting flood events, demonstrating better accuracy in capturing temporal dependencies. |
| [40] | Improve flood forecasting by incorporating vector direction into LSTM | LSTM with vector direction (VD) | The VD-LSTM model improved prediction accuracy by considering the directionality of flood processes. |
| [41] | Enhance flood prediction using climate parameters | LSTM | Incorporating climate parameters into LSTM models improved the prediction of extreme flood events. |
| [27] | Operational flood forecasting using ML models | Various ML models | ML models demonstrated potential in operational settings, with some outperforming traditional hydrological models in certain scenarios. |
| [42] | Utilize satellite data and ML for flood forecasting in the Ouémé basin | ML models with satellite data | Combining satellite data with ML models improved flood prediction accuracy in data-scarce regions. |
| [43] | Predict flood stages using deep neural networks | LSTM, Dense Neural Networks, CNN | LSTM models provided accurate near real-time flood stage predictions, outperforming other deep learning models. |