1. Introduction
Given the role of dams in water storage, flood management, and hydropower generation, inflow forecasting is of high importance. In addition to improving water management, these forecasts enable timely decision-making in critical situations such as floods. Forecasting inflow also helps decision makers to better manage operations such as allocating water resources for different uses, generating electricity, and operating reservoirs [
1]. By integrating meteorological parameters and relating them to inflow, more environmentally compatible hydrological models can be used to increase the accuracy of predictions.
Machine learning plays a key role in predicting dam inflows because it significantly increases prediction accuracy compared to traditional methods by modeling complex and nonlinear patterns. In recent years, the use of machine learning algorithms such as recurrent neural networks (RNNs), long short-term memory (LSTM), random forests (RFs), and Gaussian process regression (GPR) has grown significantly for predicting inflow into dams. These models can extract complex relationships among precipitation data, air temperature, soil moisture, and climatic indices, and predict temporal and spatial patterns of flow with high accuracy [
2]. Zhou et al. [
3] developed an RF model to predict daily inflows at two wastewater treatment plants in Ontario, Canada (the Humber plant and a confidential plant), accompanied by a novel probabilistic forecasting approach to quantify the associated uncertainties. Mahmoud et al. [
4] focused on predicting the daily flow of the Euphrates River upstream of the Haditha Reservoir in Anbar Province using two artificial neural network (ANN) models—a single-layer feedforward backpropagation network (FFBP) and a multilayer perceptron (MLP)—while accounting for the effects of climate change and precipitation patterns on water resources. Hybrid combinations of machine learning models have also been used to improve on standalone models. Khorram and Jehbez [
5] developed a hybrid model that combined long short-term memory (LSTM) with a convolutional neural network (CNN) architecture to predict inflow into the reservoir of the Dorudzan Dam in the Kor River Basin, Fars Province, Iran. They evaluated the model performance against various methods, including support vector machines (SVM), the adaptive neuro-fuzzy inference system (ANFIS), the variable infiltration capacity (VIC) model, and the autoregressive integrated moving average (ARIMA) applied to the flow data. The results showed that the combined CNN-LSTM approach provides better prediction capability for complex hydrological systems and improves reservoir inflow modeling and water resources management. Li et al. [
6] presented an interpretable hybrid hydrological model that integrates physical process decomposition with LSTM networks and the widely used HEC-HMS model to predict reservoir inflow in the cascade systems of the Missouri River Basin.
Decomposing a time series before modeling is an important step for understanding the structure of the data, selecting an appropriate model, and increasing forecast accuracy [
7,
8]. This process helps analyze various components of the time series, such as trend, seasonality, and random fluctuations [
9]. By removing noise and random fluctuations from the time series, the model is less likely to overfit and has better generalizability [
10]. This step is especially important in sensitive applications such as predicting dam inflow, electricity load, or market prices [
11]. Yu et al. [
12] presented a data-driven decomposition-based model called FT-SVR that integrates the Fourier transform (FT) with support vector regression (SVR) to predict monthly inflow into the Three Gorges Dam reservoir on the Yangtze River in China. By detecting multiscale fluctuations in the input flow time series, FT decomposes the series into interpretable frequency components. Independent SVR models are then developed for each decomposed component. The results demonstrate the improved predictive capability of FT-SVR by effectively capturing the physical meaning embedded in the frequency-based components. Jia et al. [
13] introduced an intelligent deformation prediction model for dams that effectively removes noise in deformation monitoring data. The original data were de-noised using empirical mode decomposition (EMD), symplectic geometric mode decomposition (SGMD), and wavelet denoising (WD). This combined approach increases the reliability of the prediction and provides an advanced tool for dam safety monitoring and management. Ahmadi et al. [
14] developed different models based on single and combined variational mode decomposition (VMD) with estimation methods such as K-star (K*), Gaussian process regression (GPR), and long short-term memory (LSTM) to predict long-term average monthly inflows for Barun Dam in northwestern Iran. The findings show that hybrid decomposition combined with LSTM enhances reservoir inflow prediction performance by effectively capturing nonlinear and fluctuating data features.
Compared to classical models, quantum machine learning (QML) models have a high potential to solve complex problems with greater speed and efficiency, especially in areas where the data is very large or its structure is complex and nonlinear. In problems with very large feature spaces or complex nonlinear relationships, classical models may suffer from performance degradation or require high computational resources, while quantum models can model these relationships more naturally. Grzesiak and Thakkar [
15] evaluated QML techniques to improve daily flood forecasts along the Wupper River in Germany in 2023. By integrating classical machine learning methods with quantum features such as superposition and entanglement, the hybrid model achieves improved prediction accuracy and competitive training times. Comparative analysis demonstrated the scalability and efficiency advantages of QML over traditional approaches. Zhen and Bărbulescu [
16] proposed a quantum neural network (QNN) approach for river discharge prediction. A comparative evaluation with various neural network models, including LSTM, BPNN, ELM, CNN-LSTM, SSA-BP, and PSO-ELM, showed the high performance of the QNN approach, especially in capturing peak discharge events. Also, Zhen and Bărbulescu [
17] used variational quantum regression (VQR) to predict river discharge series in Romania. The model performance was compared with quantum neural networks (QNNs) and classical AI methods on the same dataset. The results showed that quantum-based models have higher accuracy and generalizability in predicting maximum monthly discharge.
Accurate dam inflow prediction is essential for optimizing water resources management, particularly in transboundary basins where water allocation affects multiple stakeholders. However, inflow time series are inherently characterized by high volatility, non-linearity, and complex climatic interactions, which challenge the predictive capability of conventional models. While time-series decomposition methods and quantum machine learning have individually shown promise in hydrological forecasting, their synergistic integration remains largely unexplored, especially for dams exhibiting highly fluctuating behavioral patterns. Furthermore, existing quantum-based hydrological studies have focused predominantly on point predictions without systematically addressing how decomposition can enhance quantum model performance by isolating trend, seasonal, and residual components. The Khoda Afarin Dam, situated on the Iran–Azerbaijan border, presents a critical case where such methodological innovation is urgently needed. Despite the dam’s strategic importance for agriculture, energy, and water supply in both nations, no prior study has investigated inflow prediction for this specific reservoir, whose complex hydrological regime defies conventional forecasting approaches.
Therefore, this study introduces a dynamic framework for forecasting monthly inflows to the Khoda Afarin Dam, located on the border between Iran and Azerbaijan, by integrating two general modeling approaches. The SVMD model, a recent advancement in time series analysis, is coupled with the HQNN model to deliver highly reliable predictions. To assess performance and quantify improvements, the hybrid SVMD-HQNN model is compared against a standalone HQNN and a classical machine learning model (Random Forest, RF) under two distinct scenarios: one incorporating hydrological parameters and the other focusing on dam-related variables. Hence, by coupling SVMD with HQNN, the study demonstrates how adaptive decomposition can enhance quantum model interpretability, a critical advancement given the black-box criticism often leveled at quantum approaches. The SVMD preprocessing reduces noise and disentangles complex temporal patterns, allowing the quantum layer to focus on learning component-specific dynamics rather than raw, chaotic signals.
2. Materials and Methods
2.1. Overview of the Study Area
Khoda Afarin Dam is an earthen dam with geographical coordinates 39°9′35″ N, 46°56′05″ E, built on the Aras River and located on the border between Iran and Azerbaijan. It is located 8 km west of Khomarlu in East Azerbaijan Province of Iran and 14 km southwest of Soltanli in Jabra’il County, Azerbaijan [
18]. The dam serves two purposes: hydroelectric power generation with a capacity of 102 MW, and water storage for irrigation of up to 75,000 hectares. Its reservoir has a total capacity of approximately 1612 Mm³ and an area of about 20 km². In addition to agriculture and energy production, this dam plays a significant role in the water supply for the two countries. By storing about 2 billion cubic meters, this dam meets the irrigation needs of farms in Ardabil province of Iran and parts of Azerbaijan. On the other hand, it has improved environmental conditions, animal husbandry, and tourism, leading to economic development and the welfare of the residents of the region. The Khoda Afarin Dam reservoir is located in a predominantly mountainous region. The main source of water supply to the reservoir is the Aras River, whose flow regime is derived from multiple hydrological processes in this watershed. These processes include direct precipitation-runoff, groundwater base flow, and, most importantly, snowmelt from higher elevations, which usually creates a significant peak in inflow in the spring.
The climate of the dam site is classified as humid continental, with dry, hot summers and cold winters. The altitude is approximately 1900 m above sea level, and the average annual temperature is 13.7 °C. The maximum temperature is 33 °C in July, and the average temperature in winter is −8.3 °C. Annual rainfall is low, averaging about 25.5 mm, with April being the wettest month and August being the driest. Precipitation, which includes rain and snow, falls on approximately 78 days a year, mainly in spring and summer, with an average annual humidity of 54% [
19]. Sunshine hours, especially in late spring and summer, are relatively high, providing conditions for high evaporation.
The meteorological data used in this study, including precipitation and evaporation, were obtained from ground observation stations operated by the East Azerbaijan Water Organization in Iran. The station is located in Khoda Afarin County, Qiz Qale Si region. The data were recorded with daily temporal resolution and aggregated into monthly values to align with the purpose of inflow forecasting.
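As a brief illustration of the daily-to-monthly aggregation described above, daily station records can be resampled into monthly totals with pandas (the column name and values here are hypothetical, not the actual station data):

```python
import pandas as pd

# Hypothetical daily precipitation record; the column name is illustrative only.
days = pd.date_range("2020-01-01", "2020-02-29", freq="D")
daily = pd.DataFrame({"precipitation_mm": 1.0}, index=days)

# Aggregate daily values into monthly totals ("MS" anchors each bin at month start).
monthly = daily.resample("MS").sum()
```

Summing is appropriate for flux-like variables such as precipitation and evaporation; stock-like variables (e.g., reservoir level) would instead use a monthly mean.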
Figure 1 shows the location of the dam studied for inflow prediction.
By predicting the inflow of the Khoda Afarin Dam, important operations such as optimal water resource management, sustainable energy production, and flood control will be improved. This forecast will allow for timely adjustment of gates and help better meet the agricultural, drinking, and industrial needs of both Iran and Azerbaijan. Accurate forecasting makes it possible both to reduce flood risks and to plan for droughts in a timely manner. By creating a robust forecasting system that incorporates hydrological changes, conditions can be set for economic development and improving the well-being of residents [
20].
Beyond its primary functions for agriculture, energy, and water supply, the Khoda Afarin Dam reservoir faces several environmental challenges that are relevant to understanding its hydrological system [
21]. Sedimentation is a significant concern, as the reservoir acts as a sedimentation unit that progressively reduces storage capacity and affects inflow dynamics. Water quality issues are also prevalent, with studies documenting the presence of heavy metals, including lead, arsenic, cadmium, and mercury, in the reservoir’s water, sediments, and fish tissues. More recently, microplastic pollution has been identified as an emerging concern, with the reservoir potentially acting as a sink for microplastics transported from upstream urban areas. These environmental factors influence the reservoir’s ecological health and operational management, providing important context for inflow prediction modeling.
2.2. Overview of the Data Collection and Data Preprocessing
In order to predict the inflow of Khoda Afarin Dam, inflow and hydrological data were obtained from the Regional Water Organization for a period of 16 years, from 2009 to June 2024. The collected data includes evaporation, precipitation, reservoir volume, water level, reservoir surface area, and inflow to the reservoir.
Figure 2 shows the time series of inflow to Khoda Afarin Dam for a period of 16 years.
Table 1 presents the statistical summary of the selected parameters.
Data preprocessing before modeling is one of the important factors in increasing prediction accuracy. Accurate preprocessing not only increases accuracy but also reduces training time and prevents unnecessary model complexity. Therefore, after collecting the data, outliers were identified using the standard deviation criterion: any value deviating from the mean by more than three standard deviations of the entire data set was considered an outlier and corrected. Missing values were filled using the arithmetic mean. Then, Z-score normalization was applied to give the data a mean of zero and a standard deviation of one. Given that only about 2 percent of the data set had deficiencies, preprocessing was performed using common and simple methods.
The vast majority of outliers occurred sporadically, with no concentration in a specific season, and the proportion of removed data in any single month remained below 1%. A before-and-after comparison of monthly inflow climatology confirmed that the seasonal cycle (mean and standard deviation of monthly inflow) changed very little in all months, indicating that the outlier treatment did not materially distort the seasonal signal.
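A minimal sketch of the preprocessing chain described above (3σ outlier screening, mean imputation, and Z-score normalization). Clipping flagged values to the 3σ bound is an assumption for illustration; the paper does not state exactly how flagged outliers were corrected:

```python
import numpy as np

def preprocess(x):
    """3-sigma outlier screening, mean imputation, then Z-score normalization."""
    x = np.asarray(x, dtype=float)

    # Fill missing values with the arithmetic mean of the observed entries.
    mean_obs = np.nanmean(x)
    x = np.where(np.isnan(x), mean_obs, x)

    # Flag values deviating from the mean by more than three standard
    # deviations, and clip them to the 3-sigma bound (correction rule assumed).
    mu, sigma = x.mean(), x.std()
    x = np.clip(x, mu - 3 * sigma, mu + 3 * sigma)

    # Z-score normalization: zero mean, unit standard deviation.
    return (x - x.mean()) / x.std()
```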
After the data set was finalized, two modeling scenarios were created to predict monthly inflow in million cubic meters (MCM). The first scenario included variables such as evaporation, precipitation, and a three-month lag of inflow data. In contrast, the second scenario included reservoir parameters such as volume, water level, surface area, and a three-month lag of inflow. Incorporating these time lags, especially in monthly inflow forecasts, proved useful by capturing the short-term memory characteristics of the hydrological system. This strategy provided valuable insights into how past inflow trends influenced current inflow behavior. In this research, monthly data covering 16 years from 2009 to June 2024 were divided into training and testing subsets. The training set comprised 70% of the data, spanning from January 2009 to October 2019, while the remaining 30%, from November 2019 to June 2024, was allocated for testing.
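The lagged inputs and the chronological 70/30 split can be sketched as follows (the column name and helper function are illustrative assumptions; the exact feature table is not published):

```python
import pandas as pd

def build_lagged_dataset(df, target="inflow", lags=(1, 2, 3), train_frac=0.7):
    """Add 1-3 month inflow lags, then split chronologically (no shuffling)."""
    data = df.copy()
    for k in lags:
        data[f"{target}_lag{k}"] = data[target].shift(k)
    data = data.dropna()                    # the first max(lags) rows lack lags

    split = int(len(data) * train_frac)     # chronological split point
    return data.iloc[:split], data.iloc[split:]

# 186 monthly records span January 2009 through June 2024.
idx = pd.date_range("2009-01-01", periods=186, freq="MS")
df = pd.DataFrame({"inflow": range(186)}, index=idx)
train, test = build_lagged_dataset(df)
```

Splitting by position rather than by shuffling preserves the temporal order, so the test period strictly follows the training period.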
In predicting inflow, the hydrological and physical dynamics of the reservoir system can have different effects. Precipitation is one of the most important drivers of inflow, acting directly on both the reservoir and basin runoff. The evaporation parameter, on the other hand, causes additional losses, especially in arid and semi-arid regions. While the reservoir parameters are not causal drivers of inflow, they serve as integrative proxies for the cumulative hydrological conditions preceding each prediction point. For instance, reservoir volume and water level reflect the integrated history of inflows, outflows, and net basin supply, potentially encoding information about groundwater storage, soil moisture, or snowmelt carryover that is not captured by the available meteorological variables alone. Similarly, surface area directly influences exposure to evaporative losses and precipitation capture on the reservoir. Using reservoir state variables is also a practical consideration given the absence of comprehensive upstream human activity records for this transboundary catchment. This scenario comparison allows the study to assess whether reservoir state variables contain predictive information beyond what is provided by the limited available climatic data.
A three-month lag of inflow was incorporated in both modeling scenarios based on hydrological principles of system memory and antecedent conditions. According to the optimality framework for the Antecedent Precipitation Index (API) established by Li et al. [
22], the effective memory length of a hydrological system is governed by an attenuation coefficient that reflects the combined effects of catchment storage, groundwater contribution, and soil moisture retention. This coefficient determines how long past precipitation and flow conditions continue to influence current runoff generation. For catchments with significant groundwater components, snowmelt contributions, or substantial soil moisture storage, characteristics consistent with the mountainous Khoda Afarin Dam catchment, hydrological memory can extend to multiple months. Tian et al. [
23] demonstrated that optimal antecedent periods vary by hydrogeological setting, with longer memory (≥3 months) typically observed in systems where subsurface storage and delayed release mechanisms dominate. Furthermore, previous studies on low-flow forecasting have successfully employed 3-month antecedent conditions to capture baseflow recession and catchment storage dynamics. The selection of a three-month lag, therefore, balances the need to incorporate meaningful antecedent information while avoiding over-parameterization and multicollinearity in the model.
The modeling was carried out in three different processes. First, a hybrid model was created from two general methods: time series decomposition and quantum machine learning. In this approach, the SVMD algorithm for time series analysis was combined with the HQNN model. To prevent information leakage in the decomposition-forecasting framework, SVMD was applied exclusively to the training set. This ensures that no future information from the test period influenced model training, thereby avoiding the information leakage problem identified in traditional decomposition-prediction methods. For comparison with this hybrid model, a standalone HQNN was also modeled. In addition, a classical RF model was evaluated to benchmark quantum machine learning against a classical baseline. Finally, the results were analyzed using evaluation criteria and graphical charts. All steps were implemented in Python version 3.13.3 and run on Windows 11 with 16 GB of RAM.
Figure 3 summarizes the steps of the inflow prediction workflow.
2.3. Classic Machine Learning Model
An Overview of the RF Model
The Random Forest (RF) model is an ensemble-based machine learning algorithm that combines multiple decision trees. In this model, each tree is built on a random sample of the training data and using a random subset of features. This increases the diversity between trees and reduces the risk of overfitting [
24]. The final prediction of the model is made based on majority voting (in classification problems) or averaging the results of the trees (in regression problems), which results in higher accuracy and greater stability of the model than individual trees.
Formally, for regression, the prediction of the random forest model with $B$ trees for an input $x$ is the average of all tree predictions [25]:

$$\hat{f}(x) = \frac{1}{B} \sum_{b=1}^{B} T_b(x; \Theta_b, D)$$

where $T_b(x; \Theta_b, D)$ is the prediction of the $b$-th tree built with randomness $\Theta_b$ and training data $D$.
One of the important features of Random Forest is the use of bootstrap sampling in the construction of each tree, such that about 63% of the data is used to build each tree, while the remaining 37%, which is not used in building that tree, serves as out-of-bag (OOB) data to evaluate the performance of the model [
26]. In addition, at each node of the tree, the best partition is selected based on a random subset of features, which helps reduce the correlation between trees and improve efficiency. The Random Forest model is of great importance in predicting the inflow to a dam. Its high ability to model complex nonlinear relationships between inputs increases accuracy. By using a set of decision trees, RF can identify hidden and complex patterns in hydrological data and increase the accuracy of prediction compared to classical models. This feature is especially helpful in water resources management and dam planning, which is of great importance [
24].
The RF model was employed to predict dam inflow by leveraging its ensemble learning approach, which builds multiple decision trees and aggregates their outputs to improve prediction robustness and reduce overfitting. In this study, the model was configured with 7 trees (n_estimators = 7) and a fixed random seed (random_state = 140) to ensure reproducibility of results. The training process involved fitting the RF model to the input features and the observed dam inflow values, allowing the model to learn nonlinear relationships between predictors and the target variable for accurate inflow forecasting.
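The stated RF configuration can be reproduced with scikit-learn (assuming its `RandomForestRegressor`, which is a common choice; the paper does not name the library, and the feature matrix below is synthetic stand-in data):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Configuration reported in the study: 7 trees, fixed seed for reproducibility.
rf = RandomForestRegressor(n_estimators=7, random_state=140)

# Illustrative synthetic data standing in for the features and inflow series.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))        # e.g., precipitation, evaporation, lags
y = X[:, 0] * 2.0 + X[:, 1] + rng.normal(scale=0.1, size=120)

rf.fit(X[:84], y[:84])               # chronological 70/30 split
pred = rf.predict(X[84:])
```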
2.4. Quantum Machine Learning Model
An Overview of the HQNN Model
A Hybrid Quantum Neural Network (HQNN) is a type of neural network that uses quantum mechanical principles in the design and implementation of neurons and network layers. It uses features such as complex number weights and concepts such as quantum entanglement and parallelization to increase the representation capacity and improve the nonlinearity of the network. These models sometimes operate as a combination of classical and quantum processing [
27]. This hybrid combination provides increased accuracy, better convergence speed in training, and greater robustness of the model against complex data. HQNN is composed of quantum neural networks and quantum machine learning methods, which have wide applications in complex problems such as pattern recognition, big data, and computational problems that classical models are barely able to solve.
This integrated framework combines classical and quantum computation. Classical embeddings extract nonlinear patterns and contextual signals, while the quantum layer leverages principles such as superposition and entanglement to capture intricate dependencies that classical models struggle with [
28]. Practitioners typically design variational quantum circuits using several qubits (e.g., 4 or 9) and templates such as PennyLane's QAOAEmbedding or StronglyEntanglingLayers to encode and manipulate quantum states. Circuit outputs are measured and converted into predictions, with the overall system optimized through classical training routines.
A typical mathematical formulation of the HQNN hybrid model involves three main components: a data-encoding feature map $U_{\phi}(x)$, a parameterized quantum circuit (ansatz) $U(\theta)$, and measurement of an observable $M$. Classical input data $x$ is first encoded into an $n$-qubit quantum state by the feature map $U_{\phi}(x)$. The quantum circuit $U(\theta)$, parameterized by $\theta$, acts on this state to produce the output quantum state $|\psi(x;\theta)\rangle = U(\theta)\,U_{\phi}(x)\,|0\rangle^{\otimes n}$.
The prediction or output of the HQNN is obtained by measuring an observable operator $M$ on the output state, represented as the expectation value [29]:

$$f(x;\theta) = \langle \psi(x;\theta) | M | \psi(x;\theta) \rangle$$

This expectation value serves as the analog to the output of a classical neural network layer.
Training the HQNN entails optimizing the parameters $\theta$ to minimize a loss or cost function, often defined by the difference between predicted and true labels or energies, using classical optimization methods. The model typically incorporates nonlinearities induced by measurement and repetition of quantum-classical layers [30]. The cost function $C(\theta)$ for energy calculations can be written:

$$C(\theta) = \sum_{i} \langle \psi(x_i;\theta) | H_i | \psi(x_i;\theta) \rangle$$

where $|\psi(x_i;\theta)\rangle$ is the quantum state output for input $x_i$, and $H_i$ is the associated Hamiltonian operator.
Higher processing speed, high compatibility with noisier data, and better understanding of fluctuations in input data are important features of this model. HQNN can also adapt to multi-source and extensive data, which is of great importance in water resources management and dam design, and helps optimize resource use and reduce risks from flow fluctuations. Using this hybrid model provides more realistic and reliable predictions for the flow entering dams and is important in management decisions.
The HQNN model consists of several carefully designed layers and hyperparameters optimized for dam inflow prediction. The classical part starts with an input layer that takes the full set of input features from the training data. It then passes through two dense layers: the first dense layer has 22 neurons using the Rectified Linear Unit (ReLU) activation function to introduce nonlinearity and extract relevant features, followed by a second dense layer with a neuron count equal to the number of qubits (up to 4) using the tanh activation function, which scales outputs to the interval [−1, 1] suitable for encoding into quantum rotations. Central to the model is the quantum layer, implemented as a custom TensorFlow Keras layer, which simulates a variational quantum circuit on a quantum device with a specified number of qubits (limited to 4) and layers (2 layers). Each quantum layer applies parameterized quantum gates (RZ, RY, RZ rotations) to individual qubits, followed by entangling Controlled-NOT (CNOT) gates arranged in a ring topology to create quantum correlations (entanglement). The trainable quantum weights, sized layers × qubits × 3, correspond to the rotation angles of the gates and are optimized during training along with the classical weights. The quantum circuit outputs expectation values of Pauli-Z operators as quantum feature representations for further classical processing.
After the quantum layer, the model includes two additional dense layers with 136 and 34 neurons, respectively, both activated by ReLU functions to refine the extracted features, and finally outputs a linear activated neuron for regression prediction of dam inflow. The architecture is trained using the Adam optimizer with a learning rate of 0.0009, minimizing mean squared error (MSE), which is the loss function selected for continuous value prediction. Training occurs over 50 epochs with a batch size of 39 samples, using a validation split that preserves temporal order to ensure realistic time-series forecasting evaluation and prevent data leakage.
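To make the quantum layer concrete, the following NumPy statevector sketch simulates the circuit described above: per-qubit RZ-RY-RZ rotations, ring-topology CNOTs, and Pauli-Z expectation outputs. The RY angle encoding of the tanh-scaled inputs is an assumption, since the paper does not specify the encoding gate:

```python
import numpy as np

def _apply_1q(psi, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit statevector."""
    psi = np.moveaxis(psi.reshape([2] * n), q, 0)
    psi = np.tensordot(gate, psi, axes=([1], [0]))
    return np.moveaxis(psi, 0, q).reshape(-1)

def _cnot(psi, c, t, n):
    """Apply CNOT with control c and target t: swap target amps where c = 1."""
    psi = np.moveaxis(psi.reshape([2] * n), (c, t), (0, 1)).copy()
    psi[1, 0], psi[1, 1] = psi[1, 1].copy(), psi[1, 0].copy()
    return np.moveaxis(psi, (0, 1), (c, t)).reshape(-1)

def _rz(a):
    return np.array([[np.exp(-1j * a / 2), 0], [0, np.exp(1j * a / 2)]])

def _ry(a):
    return np.array([[np.cos(a / 2), -np.sin(a / 2)],
                     [np.sin(a / 2),  np.cos(a / 2)]])

def quantum_layer(x, weights):
    """Variational circuit: RY input encoding (assumed), then per layer
    RZ-RY-RZ rotations and a ring of CNOTs; returns per-qubit <Z>."""
    n_layers, n, _ = weights.shape
    psi = np.zeros(2 ** n, dtype=complex)
    psi[0] = 1.0                                  # start in |0...0>
    for q in range(n):                            # angle-encode the inputs
        psi = _apply_1q(psi, _ry(x[q]), q, n)
    for l in range(n_layers):
        for q in range(n):                        # trainable rotations
            psi = _apply_1q(psi, _rz(weights[l, q, 0]), q, n)
            psi = _apply_1q(psi, _ry(weights[l, q, 1]), q, n)
            psi = _apply_1q(psi, _rz(weights[l, q, 2]), q, n)
        for q in range(n):                        # entangling CNOT ring
            psi = _cnot(psi, q, (q + 1) % n, n)
    probs = np.abs(psi.reshape([2] * n)) ** 2
    z = []
    for q in range(n):                            # <Z_q> = P(q=0) - P(q=1)
        p = probs.sum(axis=tuple(i for i in range(n) if i != q))
        z.append(p[0] - p[1])
    return np.array(z)
```

In the study's configuration the weight tensor has shape (2, 4, 3): two layers, four qubits, three rotation angles per qubit. The four ⟨Z⟩ values then feed the subsequent classical dense layers.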
2.5. Decomposition Model
The Sequential Variational Mode Decomposition (SVMD) algorithm is an advanced method for analyzing signals and variable data that extracts intrinsic mode functions (IMFs) gradually and sequentially. Its advantage over similar techniques, such as Variational Mode Decomposition (VMD), is that it does not require the number of intrinsic mode functions to be specified before execution and extracts the modes sequentially [
31].
This algorithm is used in applications such as analyzing audio, video, and other time-frequency signals, and, due to the sequential nature of mode extraction, it has greater stability and accuracy in separating the variable modes of the signal. In addition, SVMD can reduce the amount of computation and better handle noise and non-stationary data. The core mathematical formulation of SVMD involves decomposing an input signal $f(t)$ into modes $u_k(t)$ and a residual $r_k(t)$ successively, where at each step [32]:

$$f(t) = u_k(t) + r_k(t)$$
The objective at each iteration is to extract a mode $u_k(t)$ that is narrowband and centered around a frequency $\omega_k$, by minimizing the mode's bandwidth while ensuring the residual energy around $\omega_k$ is minimized. This is typically formulated as an optimization problem minimizing a cost function involving the bandwidth of the mode and the residual signal energy, often solved using Lagrangian multipliers and iterative updates via methods like the Alternating Direction Method of Multipliers (ADMM) [32]. A representative optimization formulation for extracting the $k$-th mode $u_k$ is:

$$\min_{u_k,\,\omega_k} \; \alpha \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| r_k(t) \right\|_2^2$$

where $\partial_t$ is the partial time derivative, $*$ denotes convolution, $\delta(t)$ is the Dirac delta, $j$ is the imaginary unit, $\omega_k$ is the center frequency of the mode, $\alpha$ is a penalty parameter controlling bandwidth, and $r_k(t)$ is the residual after subtracting the previously extracted modes.
The SVMD-HQNN algorithm is of great importance in predicting dam inflow because it uses a combination of SVMD to extract features from complex signals and hybrid quantum neural networks for more accurate modeling and prediction.
The importance of SVMD-HQNN in dam inflow prediction lies in:
More accurate extraction of the time-frequency characteristics of inflow signals; SVMD sequentially extracts individual modes related to different flow fluctuations, which improves signal separation.
Increased prediction accuracy through the capabilities of hybrid quantum neural networks (HQNNs), which can better model complex nonlinear relationships in the data.
Reduced noise and a diminished impact of the non-stationary and complex behavior that is very common in hydrological data.
Optimized model performance by combining two advanced algorithms, which leads to more stable and reliable predictions in dam water resources management and operation planning.
Therefore, SVMD-HQNN, as an advanced forecasting model, has the potential to be applied in fields such as flood management, water allocation, and dam performance optimization, and can improve the accuracy and efficiency of dam inflow forecasting compared to classical models.
In the given SVMD configuration, several hyperparameters control the decomposition quality: alpha (800) governs the strength of the bandwidth constraint, determining the smoothness of the extracted modes; tau (0) represents the noise tolerance; K (5) specifies the number of modes to be extracted; DC (2) controls whether the decomposition includes a DC (zero-frequency) component; init (1) selects the initialization method for the mode center frequencies; and tolerance (1 × 10−7) sets the convergence tolerance for the optimization process. These parameters balance detail preservation and noise reduction, enabling the extraction of physically meaningful sub-signals that serve as cleaner inputs to machine learning models.
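The sequential extraction idea can be illustrated with a simplified frequency-domain sketch: each mode is obtained by a Wiener-type narrowband filter whose center frequency is updated to the spectral centroid, and is then subtracted from the residual. This is an illustrative approximation under the stated alpha, K, and tolerance values, not the full SVMD algorithm of [31]:

```python
import numpy as np

def extract_modes(f, K=5, alpha=800.0, tol=1e-7, max_iter=500):
    """Sequentially extract K narrowband modes from f (simplified SVMD sketch)."""
    N = len(f)
    omega = np.fft.fftfreq(N)                 # normalized frequencies
    residual = np.asarray(f, dtype=float).copy()
    modes = []
    for _ in range(K):
        Fr = np.fft.fft(residual)
        wk = 0.25                             # initial center-frequency guess
        for _ in range(max_iter):
            # Wiener-type filter concentrated around the current center wk.
            U = Fr / (1.0 + 2.0 * alpha * (np.abs(omega) - wk) ** 2)
            power = np.abs(U[: N // 2]) ** 2
            wk_new = np.sum(omega[: N // 2] * power) / (np.sum(power) + 1e-12)
            if abs(wk_new - wk) < tol:        # convergence of center frequency
                wk = wk_new
                break
            wk = wk_new
        u = np.real(np.fft.ifft(U))           # back to the time domain
        modes.append(u)
        residual = residual - u               # sequential: peel off the mode
    return np.array(modes), residual
```

By construction the extracted modes plus the final residual reconstruct the input exactly, mirroring the additive decomposition $f(t) = \sum_k u_k(t) + r(t)$.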
Once the time series is decomposed via SVMD, each mode or a recomposed series can be fed into the HQNN. The HQNN then leverages quantum and classical layers to learn complex nonlinear relationships more effectively than purely classical networks. This hybridization strategically combines SVMD’s signal decomposition capabilities to reduce non-stationarity and noise with HQNN’s expressive power to extract deep patterns, thereby improving forecasting accuracy and robustness for dam inflow prediction tasks.
The parameters of the SVMD algorithm were primarily adopted from the default values established in the literature, with additional verification through sensitivity analysis [
31]. Also, the quantum-specific hyperparameters of the HQNN were selected based on architectural considerations, empirical validation, and guidance from the quantum machine learning literature. A random search algorithm was chosen to tune the hyperparameters. Unlike grid search, which systematically evaluates every possible parameter combination, random search randomly samples configurations from a predefined space, providing more efficient coverage [
29,
33]. This type of tuning is particularly useful in high-dimensional settings where exhaustive search becomes computationally prohibitive. To improve the robustness of the results, the training process was repeated over 10 independent runs. To strike a balance between underfitting and overfitting, the model architecture, depth, and number of parameters were systematically varied. Each machine learning model was subjected to 50 random search iterations to explore the hyperparameter landscape while maintaining computational tractability. For repeatability, a fixed random seed was set before each search and training cycle.
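The tuning loop described above can be sketched in a few lines; the search space and objective below are illustrative placeholders, not the study's actual HQNN hyperparameters:

```python
import random

# Illustrative random-search tuning loop. The search space and objective are
# placeholders, not the study's actual HQNN hyperparameters.
search_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "hidden_units": [16, 32, 64, 128],
    "depth": [1, 2, 3],
}

def sample_config(space, rng):
    """Draw one value per hyperparameter at random; grid search would
    instead enumerate every combination."""
    return {name: rng.choice(values) for name, values in space.items()}

def random_search(space, objective, n_iter=50, seed=42):
    """Evaluate n_iter random configurations and keep the best (lowest) score.
    A fixed seed makes the search repeatable."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_iter):
        cfg = sample_config(space, rng)
        score = objective(cfg)  # e.g., validation RMSE of the trained model
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy objective standing in for "train the model and return validation error".
toy_objective = lambda c: c["depth"] / c["hidden_units"]
best, best_score = random_search(search_space, toy_objective, n_iter=50, seed=0)
```

With a fixed seed, repeated calls return the same configuration, which is what makes the 10 independent runs and 50 iterations reported above reproducible.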
2.6. Model Evaluation Metrics
Model performance evaluation is fundamental to interpreting the results obtained. Quantitative metrics provide a standardized basis for assessing how effectively a model predicts or classifies outcomes, enable systematic comparison of different modeling approaches, and support selection of the approach best suited to the characteristics of the dataset and the research objectives. This increases the reliability of model selection and enables informed decision-making. In the present study, five main metrics were used to assess model accuracy and performance: root mean square error (RMSE), coefficient of determination (R2), mean absolute percentage error (MAPE), Nash-Sutcliffe efficiency (NSE), and Kling-Gupta efficiency (KGE).
Root Mean Square Error (RMSE) quantifies the average magnitude of prediction errors by comparing forecasted values to actual observations, with a stronger penalty assigned to larger deviations. Lower RMSE values indicate greater predictive accuracy and fewer substantial errors. The coefficient of determination (R2) measures the proportion of variance in the observed data that is explained by the model, ranging from 0 to 1; higher values reflect a closer fit to the underlying data structure. Mean Absolute Percentage Error (MAPE) calculates the average percentage difference between predicted and actual values, offering an intuitive, scale-independent measure of accuracy. Smaller MAPE values correspond to more precise forecasts. Nash–Sutcliffe Efficiency (NSE) assesses the agreement between predicted and observed time series, particularly in hydrological modeling. An NSE of 1 denotes perfect predictive performance, values between 0 and 1 suggest acceptable accuracy, and negative values imply that the model performs worse than using the mean of the observed data as a predictor. The Kling–Gupta efficiency (KGE) is a goodness-of-fit metric widely used in hydrology to evaluate model performance by comparing simulated data with observed data. KGE values range from −∞ to 1, with 1 indicating a perfect match between simulations and observations. Unlike NSE, the interpretation of KGE includes the contributions of correlation, bias, and variability, offering a nuanced assessment of model quality.
where $\hat{y}_i$ and $y_i$ are the predicted and observed values, $\bar{y}$ and $\bar{\hat{y}}$ are the mean observed and predicted values, respectively, and $N$ is the total number of data points.
where $r$ is the Pearson correlation coefficient between simulated and observed data, $\alpha = \sigma_s / \sigma_o$ is the ratio of the standard deviation of simulations ($\sigma_s$) to that of observations ($\sigma_o$), representing variability, and $\beta = \mu_s / \mu_o$ is the ratio of the mean of simulations ($\mu_s$) to the mean of observations ($\mu_o$), representing bias. A typical interpretation considers KGE > −0.41 as better performance than simply using the mean observed value; KGE = 1 indicates a perfect fit. Values less than −0.41 indicate worse performance than the mean of observations [
34].
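For reference, the five metrics can be implemented directly from their standard definitions. This is a sketch and may differ in detail from the study's implementation (for instance, R2 is computed here as the squared Pearson correlation):

```python
import math

def rmse(obs, pred):
    """Root mean square error: penalizes large deviations quadratically."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mape(obs, pred):
    """Mean absolute percentage error (scale-independent, in %)."""
    return 100 / len(obs) * sum(abs((o - p) / o) for o, p in zip(obs, pred))

def r2(obs, pred):
    """Coefficient of determination, here the squared Pearson correlation."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return (cov / (so * sp)) ** 2

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is perfect, <0 is worse than the mean."""
    mo = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1 - ss_res / ss_tot

def kge(obs, pred):
    """Kling-Gupta efficiency: combines correlation (r), variability (alpha),
    and bias (beta); ranges from -inf to 1, where 1 is a perfect match."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred) / n)
    r = sum((o - mo) * (p - mp) for o, p in zip(obs, pred)) / (n * so * sp)
    alpha, beta = sp / so, mp / mo
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

For a perfect forecast, RMSE and MAPE are 0 while R2, NSE, and KGE all equal 1.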
3. Results
Figure 4 shows the correlation matrices between the input parameters used to predict the dam inflow under two different scenarios. The first scenario includes hydrological parameters, and the second scenario includes dam parameters. In both scenarios, inflow has the highest correlation with Lag1 (0.674), highlighting strong persistence and the importance of immediate past inflow in predicting current inflow values.
In Scenario 1, the meteorological parameters (precipitation and evaporation) show weak to moderate correlation with inflow (0.167 and 0.165), suggesting limited direct influence, possibly due to lag effects in the hydrological response. Scenario 2 substitutes meteorological parameters with reservoir operational variables. Here, inflow’s correlations with Volume (0.423), Water Surface Area (0.418), and Water Level (0.286) are moderate and comparable to those of Lag2 and Lag3, indicating these state variables also capture relevant system behavior for inflow prediction. The three lagged inflow variables (Lag1, Lag2, Lag3) play a crucial role in dam inflow prediction by capturing the temporal dependencies and persistence present in hydrological time series data.
The comparative performance of the three forecasting models, RF, HQNN, and SVMD-HQNN for predicting the inflow to Khoda Afarin Dam under two different scenarios is presented in
Table 2.
Scenario 1:
In this scenario, SVMD-HQNN achieved the highest test performance (R2 = 0.926, RMSE = 34.511 MCM, NSE = 0.919, MAPE = 11.84%, KGE = 0.878).
HQNN also provided reasonable prediction quality (R2 = 0.745, RMSE = 61.307, NSE = 0.745), but with noticeably higher errors compared to the hybrid SVMD-HQNN.
RF showed the lowest predictive power among the three (R2 = 0.675), with the highest RMSE (69.136), MAPE, and the lowest KGE, indicating greater uncertainty and bias in the predictions.
The SVMD-HQNN’s clear advantage suggests that its hybrid architecture (combining SVMD for preprocessing/decomposition with HQNN for learning) captures both linear and complex nonlinear patterns more effectively than the individual HQNN or RF approaches.
Scenario 2:
Performance improved for all models when additional reservoir state variables (volume, water level, water surface area) were included:
SVMD-HQNN’s accuracy increased further (R2 = 0.955, RMSE = 25.740, NSE = 0.945, MAPE = 8.98%, KGE = 0.931), reflecting superior ability to integrate diverse, highly correlated operating features.
HQNN and RF also improved, but SVMD-HQNN maintained its leading position (R2 values: HQNN 0.848, RF 0.773; RMSE: HQNN 47.219, RF 57.829).
The addition of reservoir parameters appears particularly beneficial for the hybrid SVMD-HQNN, which likely leverages both temporal and state information for robust, accurate inflow prediction.
By decomposing the signal before modeling, SVMD-HQNN removes irrelevant fluctuations, isolating clean and informative patterns. This is especially important for dam inflow, which is affected by abrupt rainfall events and operational changes.
The SVMD-HQNN model employs a powerful two-step approach: first, SVMD decomposes the dam inflow time series into cleaner sub-signals that isolate important patterns and remove noise; then, HQNN utilizes quantum-inspired neural techniques to learn complex relationships across these modes and input features. This combination allows SVMD-HQNN to excel in recognizing both short-term fluctuations and long-term trends, resulting in the highest accuracy, lowest error, and best reliability metrics in both scenarios, especially when richer reservoir data are included. By comparison, HQNN handles nonlinear relationships quite well but is less effective without the prior decomposition step, resulting in moderate outcomes. In contrast, RF offers robust and interpretable predictions for simpler patterns but struggles with the nonstationary and multi-scale dynamics inherent to dam inflows. These outcomes confirm that SVMD-HQNN is best suited for challenging hydrological forecasting tasks, HQNN provides a significant advance over classic methods, and RF serves as a reliable baseline model for less complex applications.
Figure 5 shows the predicted inflow time series of the three models, SVMD-HQNN, HQNN, and RF, under two scenarios. In Scenario 1, SVMD-HQNN closely tracks the peaks and troughs of actual values throughout the years, with minimal deviation, especially during high-flow (peak) periods. HQNN follows a similar pattern but tends to lag slightly during rapid changes and is somewhat less accurate during extreme inflow events. RF demonstrates noticeable divergence during abrupt inflow peaks (e.g., near 2023 and 2024), often overestimating or underestimating actual values. Its predictions are more volatile and less consistent during fluctuating periods and tend to converge only during low or steady flow conditions. Scenario 2 incorporates additional reservoir operational data alongside the previous inputs, resulting in a clear improvement in prediction accuracy across all three models. SVMD-HQNN maintains its lead, nearly overlapping with the actual inflow throughout the entire period. HQNN similarly aligns closely with the observed series, falling just short of SVMD-HQNN during the most rapid transitions. RF exhibits a notable performance improvement compared to Scenario 1, capturing both baseline and peak events more reliably; however, it occasionally overemphasizes or underemphasizes certain peaks and dips.
Figure 6 shows scatter plots that visualize the predictive performance of the SVMD-HQNN, HQNN, and RF models for dam inflow estimation under two scenarios. Each plot compares predicted inflow with actual observed inflow, with a best-fit regression line and associated R2 value indicating the strength of the linear relationship. Boxplots adjacent to each scatter plot show the distribution and spread of model predictions.
For Scenario 1, the SVMD-HQNN plot demonstrates a strong linear relationship between predicted and actual inflow points, densely clustered along the regression line. The R2 of 0.926 signals a high degree of accuracy and minimal bias across the entire inflow range, with the boxplot confirming tight distribution and few outliers. The HQNN scatter shows moderate spread around the line with some deviations for both lower and upper inflow values. Its R2 of 0.745 reflects lower prediction consistency with more scattering than SVMD-HQNN. The boxplot further highlights a broader distribution but still reasonable concentration. RF exhibits even greater dispersion in its scatter points and a lower R2 of 0.675, indicating weaker linear correspondence and less reliable predictions. The boxplot reveals a wider range and more variability, with predictions tending to deviate from actual values, especially at the extremes.
In Scenario 2, all models show improved predictive alignment. SVMD-HQNN achieves a near-perfect fit, with predicted points almost exactly on the regression line and an impressive R2 of 0.955. The boxplot illustrates extremely tight clustering, indicating both accuracy and stability with expanded features. HQNN significantly improves, with R2 = 0.848. Predicted values align closely with actuals, although a few points at high inflow magnitudes deviate. The boxplot suggests a more even distribution compared to Scenario 1, with reduced outliers. RF (bottom right) also benefits from the richer scenario, posting an increased R2 of 0.773. The scatter shows better concentration near the regression line, yet some points, especially at the upper inflow end, remain offset. The boxplot still indicates more variability relative to SVMD-HQNN and HQNN.
SVMD-HQNN consistently outperforms the other models, yielding the highest R2 values and the tightest scatter and boxplot distributions in both scenarios. This indicates excellent agreement between predicted and actual inflow, minimal bias, and robust generalization, especially when reservoir features are included. HQNN represents a significant improvement over RF, capturing nonlinearity and complex input-output relations but remaining slightly less reliable, particularly during extreme inflow events. RF, while functional, shows the weakest relationship, with the greatest prediction errors and dispersion, especially in the simpler scenario. It improves but still lags behind the neural models when given more comprehensive data.
Figure 7 presents ridgeline plots illustrating the distribution of predicted dam inflow values (in million cubic meters, MCM) across two distinct scenarios. Each scenario compares the three predictive models, SVMD-HQNN, HQNN, and RF, against observed inflow data to assess their ability to replicate the actual distribution.
In Scenario 1, SVMD-HQNN exhibits a sharp unimodal distribution closely aligned with the actual inflow (blue), particularly around the central peak (~300–400 MCM). This suggests high fidelity in capturing both the central tendency and dispersion of the observed data. HQNN shows a similar peak but with broader tails, indicating slight overestimation of extreme inflow events and reduced precision in low-probability regions. RF produces a flatter, more dispersed distribution, underrepresenting the peak and overestimating low inflow values (<200 MCM), which may reflect its tendency to smooth predictions and underfit nonlinear dynamics. In Scenario 2, SVMD-HQNN again demonstrates strong alignment with the actual inflow, capturing both the peak and tail behavior with minimal bias. The density curve suggests robust generalization under more variable conditions. HQNN retains a similar shape but shows heavier tails, implying increased uncertainty and reduced confidence in extreme predictions. RF diverges significantly, with a broad, low-amplitude distribution that fails to capture the observed peak. This may indicate poor adaptability to regime shifts or nonstationary inflow patterns.
In addition to the visual comparison in the ridgeline plots, distribution similarity was evaluated using Jensen–Shannon distance (JS), and Wasserstein (Earth Mover’s) distance between the predicted and observed inflow distributions for each model and scenario. Lower values of these metrics indicate closer agreement between the model’s distribution and the actual inflow, complementing the qualitative assessment from the ridgeline plots. Across both scenarios, the SVMD-HQNN consistently yielded the smallest JS and Wasserstein distances to the actual inflow distribution, followed by HQNN, while RF exhibited substantially larger distances, confirming the superior distributional fidelity of the SVMD-HQNN, as observed visually in
Figure 7.
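Both distances can be computed directly from the predicted and observed samples. The sketch below uses simplified pure-Python forms (equal-size samples for the 1-D Wasserstein distance; probability vectors over shared histogram bins, with base-2 logarithms, for the Jensen–Shannon distance), whereas the study may have used library routines such as those in SciPy:

```python
import math

# Simplified pure-Python forms of the two distributional metrics. Assumptions:
# equal-size samples for the 1-D Wasserstein distance, and probability vectors
# over shared bins (base-2 logs) for the Jensen-Shannon distance.
def wasserstein_1d(x, y):
    """1-D Earth Mover's distance between equal-size empirical samples:
    the mean absolute difference of the sorted values."""
    xs, ys = sorted(x), sorted(y)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)

def js_distance(p, q):
    """Jensen-Shannon distance (square root of the JS divergence) between
    two discrete probability vectors defined over the same bins."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return math.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))
```

Identical distributions give 0 for both metrics; with base-2 logarithms the Jensen–Shannon distance is bounded above by 1, reached for fully disjoint distributions.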
The SVMD-HQNN hybrid model consistently outperforms standalone HQNN and RF across both scenarios. Its ability to preserve the statistical structure of observed inflow, especially under regime shifts, suggests superior generalization and reduced epistemic uncertainty. The integration of SVMD likely enhances feature extraction and boundary delineation, while HQNN contributes nonlinear learning capacity. In contrast, RF’s ensemble averaging appears to dilute peak fidelity and inflate low-probability regions, limiting its utility in high-stakes hydrological prediction.
4. Discussion
SVMD preprocessing reduces one of the main challenges of quantum machine learning, which is sensitivity to high-dimensional and noisy input data. By providing clean, decomposed components to the quantum layer instead of raw input signals, SVMD enables HQNN to operate on more structured information and reduces the quantum circuit complexity required for accurate predictions. Conversely, HQNN’s ability to model complex nonlinear relationships suggests that decomposition alone cannot account for the full dynamics of the hydrological system. The decomposition-first approach allows the quantum network to devote its representational capacity to learning physical relationships in each frequency component, rather than wasting computational resources on noise separation. This makes SVMD-HQNN particularly well-suited for dam inflow prediction in transboundary watersheds where data quality may vary and hydrological processes are inherently complex.
HQNNs represent a novel deep learning architecture that integrates hierarchical feature extraction with quantum-inspired optimization. In hydrology, where inflow dynamics are influenced by nonlinear, multiscale interactions (e.g., rainfall-runoff processes, catchment heterogeneity, climate variability), HQNNs offer several distinct advantages.
HQNNs utilize multi-layered abstraction to capture both low-level and high-level hydrological patterns. This is crucial for modeling nonlinear dependencies between meteorological inputs (e.g., precipitation, temperature) and inflow responses. Unlike traditional models that may overfit to specific hydrological conditions, HQNNs are better at adapting to regime shifts, such as transitions between dry and wet seasons. Their hierarchical structure allows robust learning from diverse inflow scenarios, improving reliability in operational forecasting. HQNNs often incorporate quantum-inspired algorithms (e.g., amplitude encoding, entanglement-based feature selection) that enhance convergence and reduce local minima trapping. This leads to faster training and better predictive stability, especially in high-dimensional hydrological datasets.
Recent studies show HQNNs outperform classical models like RF, SVR, and even standard deep networks (e.g., LSTM, CNN) in accuracy, robustness, and uncertainty quantification. They excel in capturing peak inflow events, which are critical for flood risk management and reservoir operation. Khemapatapan and Thepsena [
35] extended the application of quantum machine learning classifiers to real weather data from the station behind Pa Sak Jonlasit Dam, using data from 2016 to 2022. A systematic evaluation of classical features and optimizers with different circuit iterations was performed. Three quantum classifiers, Quantum Support Vector Machine (QSVM), Quantum Neural Network (QNN), and Variational Quantum Classification (VQC), were tested. The QSVM achieved the highest accuracy at 85.3%, followed by VQC at 70.1%, and QNN at 52.1%. Zhen and Bărbulescu [
17] presented a novel approach using QNNs for river discharge forecasting, highlighting its superior ability to predict extreme discharge values compared to traditional neural networks such as LSTM, BPNN, ELM, and CNN-LSTM. The QNN approach was tested on both raw data and data cleaned of aberrant values, showing consistently lower prediction errors, particularly for peak discharge events, which are critical for flood hazard management. The lowest R2 was 84.36%, indicating the good performance of the quantum algorithm. Other studies have likewise reported that quantum models increase accuracy and improve predictive performance [
36,
37].
SVMD extracts modes one by one, dynamically determining the number of components during decomposition. This is particularly valuable in hydrology, where inflow signals often contain hidden periodicities, abrupt shifts, and overlapping frequencies due to rainfall variability, catchment response, and upstream regulation. This improves the signal-to-noise ratio and enhances the quality of input data for downstream predictive models, such as HQNN and other machine learning models.
Studies show that SVMD-based hybrid models outperform conventional setups in terms of forecast accuracy, peak event detection, and uncertainty reduction. Li et al. [
38] proposed a hybrid model combining adaptive VMD with Bi-LSTM for daily inflow data at the Baozhusi Hydropower Station in China, optimized using energy entropy, which effectively addresses the challenges of forecasting daily inflow for reservoirs. The notable performance metrics, with an RMSE of 64.783 and an NSE of 95.7%, demonstrate its robustness in handling highly nonlinear and nonstationary streamflow data. Ahmed et al. [
39] introduced a novel hybrid deep learning model (CVMD-CBiLSTM) for streamflow water level forecasting, integrating convolutional neural networks (CNNs), BiLSTM, and ant colony optimization (ACO) with a two-phase decomposition approach using CEEMDAN and VMD. By incorporating satellite-derived climate indices and ground-based meteorological data across nineteen gauging stations in the Australian Murray River, the model robustly extracts key features and improves forecast accuracy. The model demonstrates superior performance, with around 98% of prediction errors within ±0.020 m and a low relative RMSE of approximately 0.08%, outperforming benchmark models. Zhao et al. [
40] addressed challenges in monthly runoff prediction by proposing a model that integrates multiple fitness function optimizations to enhance decomposition and prediction accuracy. It uses K-means clustering combined with sample entropy (SE) to reconstruct high-frequency sequences obtained from CEEMDAN decomposition, followed by SVMD. The final prediction was made using a bidirectional long short-term memory (BiLSTM) network applied to the decomposed modal components. Case study results on Dongpu Reservoir runoff data demonstrate that the model, particularly using envelope kurtosis and permutation entropy as fitness functions, outperforms existing models in prediction accuracy, with Nash-Sutcliffe efficiency (NSE) and coefficient of determination (R2) improvements ranging from 6.5% to 17.9% and 4.0% to 9.3%, respectively.
The findings of previous research are consistent with the results obtained in this study [
33,
41]. In the first scenario, the HQNN model improved the inflow prediction over the classical method (RF) by 11.32%, and SVMD-HQNN improved over HQNN by 43.70%; in the second scenario, these improvements were 18.35% and 45.47%, respectively. This shows that the quantum model outperforms the classical model but, on its own, does not offer sufficient reliability. By incorporating SVMD in both scenarios, the accuracy increased significantly and the resulting predictions exhibit high reliability [
14,
42]. This approach effectively mitigates high-frequency noise and parameter selection issues in runoff modeling, providing robust support for accurate hydrological prediction.
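The improvement percentages above correspond to relative RMSE reductions computed from the values in Table 2 (an interpretation, as the basis of the figures is not stated explicitly in the text):

```python
# Relative RMSE reduction, using the RMSE values reported in Table 2.
# Note: interpreting the reported improvement percentages as RMSE reductions
# is an assumption; the basis of the figures is not stated explicitly.
def relative_improvement(rmse_baseline, rmse_model):
    """Percentage reduction in RMSE of a model relative to a baseline."""
    return (rmse_baseline - rmse_model) / rmse_baseline * 100

# Scenario 1
hqnn_vs_rf_s1 = relative_improvement(69.136, 61.307)    # ~11.3 %
svmd_vs_hqnn_s1 = relative_improvement(61.307, 34.511)  # ~43.7 %

# Scenario 2
hqnn_vs_rf_s2 = relative_improvement(57.829, 47.219)    # ~18.3 %
svmd_vs_hqnn_s2 = relative_improvement(47.219, 25.740)  # ~45.5 %
```

Under this reading, the four values reproduce the reported 11.32%, 43.70%, 18.35%, and 45.47% figures to within rounding.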
Forecasting reservoir inflow is critically important because it supports proper water storage and use and helps meet the needs of downstream areas. Accurate inflow predictions allow turbines to be managed more effectively and inlet and outlet valves to be controlled more precisely. Such forecasts also provide advance awareness of extreme weather events and allow timely planning to minimize damage. In addition, for agriculture, especially in arid and semi-arid regions, inflow forecasting helps identify periods of water abundance and scarcity, supporting better decisions about transferring water between basins. Furthermore, existing environmental challenges for the Khoda Afarin Reservoir, including progressive sedimentation that reduces storage capacity, heavy metal pollution from upstream industrial and agricultural sources, and emerging microplastic pollution, add complexities to water quality management that are inherently linked to inflow dynamics. Inflow predictions inform not only decisions regarding water quantity but also pollution abatement strategies, as the dilution capacity and transport of pollutants are a function of river discharge and reservoir volume.
The present study focused exclusively on the Khoda Afarin Dam, which operates within a specific climatic, geological, and land-use context. As Li et al. [
43] demonstrated, watershed characteristics, including soil properties, vegetation cover, topography, and geology, fundamentally control how catchments transform precipitation into runoff. The optimal decomposition parameters and quantum network architecture identified for this site may not be directly transferable to basins with different hydrological behavior, such as karst regions (where rapid subsurface flow dominates) or snowmelt-dominated systems (where seasonal lags are more pronounced). Regional heterogeneity in climate patterns, geological structure, and land-use evolution may likewise limit the applicability of the pre-trained SVMD-HQNN model without extensive recalibration. The decomposition characteristics learned from one catchment’s inflow signal may not correspond to the frequency components present in another basin’s hydrological time series. Therefore, while the methodological framework is transferable, the specific parameterizations are not. Future studies should integrate a broader range of physically meaningful variables to better capture catchment processes, including soil moisture indices, temperature (for snowmelt estimation), groundwater infiltration rates, and upstream water abstraction records where available. Following the API optimality framework, systematic optimization of antecedent conditions for multiple variables could enhance model performance.
5. Conclusions
Predicting dam inflow using a decomposition algorithm plus quantum model is not just a methodological innovation—it is a strategic necessity. This hybrid approach bridges the gap between signal clarity and learning power, ensuring forecasts that are both accurate and trustworthy. In practice, it strengthens flood preparedness, reservoir safety, and sustainable water resource management. Therefore, in this study, a classical model (RF) and a quantum model (HQNN) were used to predict the inflow of the Khoda Afarin Dam. Given the importance of time series analysis before modeling, the SVMD algorithm was used. In order to evaluate the importance of different parameters and their accessibility, two scenarios were evaluated. The first scenario included hydrological parameters and the second scenario included reservoir parameters. This approach showed that the combined SVMD-HQNN model, in addition to having less error, was able to predict the inflow with an accuracy of 88.16% in the first scenario and 91.02% in the second scenario.
Despite the results indicating the high performance of the hybrid approach, combining decomposition algorithms with quantum-inspired neural networks for dam inflow prediction has limitations. Although this combination increases prediction accuracy, it introduces modeling complexity, increases computational requirements, and requires fine-tuning of parameters. The hierarchical quantum neural network is a black box, making it difficult to explain and interpret the predictions to stakeholders such as reservoir managers or policymakers. This lack of transparency can hinder trust and acceptance in operational water resources management, where decision-makers often require interpretable outputs alongside accuracy. The generalizability of the proposed models across different catchments is also a limitation. The present study focuses on a specific dam and hydrological regime, and the transferability of this approach to other basins with distinct climatic, geological, and land-use characteristics cannot be assumed. Regional heterogeneity may limit the applicability of the models without additional calibration, and further testing is needed to ensure consistency across diverse hydrological conditions. Furthermore, this study emphasizes point predictions and distributional alignment but does not include formal uncertainty quantification, which is essential for risk-based decision-making.
Therefore, future research should consider additional parameters such as soil moisture, temperature, groundwater infiltration rate, and upstream water requirements. Advanced hybrid methods that combine decomposition algorithms with attention mechanisms, ensemble learning, or probabilistic prediction frameworks can increase both accuracy and interpretability. Applying this process to dams in different climates and of different sizes would also make the results more reliable and help decision makers and managers improve control operations. Creating an early warning system to prevent critical incidents can make the forecasting process more practical. Finally, time series cross-validation techniques tailored to hydrological data, such as Time Series Split or blocked cross-validation, can be explored to verify model robustness on limited samples and reduce the risk of overfitting.