RETRACTED: WLP-VBL: A Robust Lightweight Model for Water Level Prediction

Yi, Congqin; Huang, Wenshu; Pan, Haiyan; Dong, Jinghan

doi:10.3390/electronics12194048

Open Access Article

College of Information Technology, Shanghai Ocean University, Shanghai 201306, China

* Author to whom correspondence should be addressed.

Electronics 2023, 12(19), 4048; https://doi.org/10.3390/electronics12194048

Submission received: 16 August 2023 / Revised: 19 September 2023 / Accepted: 21 September 2023 / Published: 27 September 2023 / Retracted: 24 July 2025

(This article belongs to the Special Issue Applications of Deep Neural Network for Smart City)

Abstract

Accurate and reliable water level prediction plays a crucial role in the optimal management of water resources and reservoir scheduling. Water level data are volatile and time-dependent, and a single water level prediction model can typically be applied only to specific hydrological conditions and reservoirs. Therefore, in this paper, we present a robust lightweight model for water level prediction, namely WLP-VBL, which combines variational mode decomposition (VMD), the Bat algorithm (BA), and long short-term memory (LSTM). The proposed WLP-VBL model consists of three steps: first, the water level dataset is decomposed by empirical mode decomposition (EMD) to obtain the number of decomposition layers K, and VMD is then used to decompose the original water level dataset into K intrinsic mode functions (IMFs) to produce a clearer signal. Next, the IMF data are fed to an LSTM neural network optimized by BA for prediction, and finally the component predictions are superimposed to obtain the predicted value. In order to evaluate the effectiveness of the model, experiments were carried out on water level data for the Gan River. The results indicate that: (1) Compared with state-of-the-art methods, e.g., LSTM, VMD-LSTM, and EMD-LSTM, WLP-VBL exhibited the best performance. The MSE and MAE of WLP-VBL decreased by 69.6~74.7% and 45~98.5%, respectively. (2) The proposed model showed stronger robustness for water level prediction, and was able to handle highly volatile and noisy data.

Keywords: water level; variational mode decomposition; bat algorithm; long short-term memory

1. Introduction

Water level is the most critical indicator for assessing water bodies and water flow. It serves as a crucial controlling factor influencing variations in nutrients and primary productivity in large floodplain lake systems [1]. The study of water levels holds significant implications for water resource management, hydrological forecasting, water engineering design and operation, and ecological environment protection. It plays a pivotal role in advancing society sustainably and enhancing human well-being.

Traditional methods for predicting water level can be categorized into six types: (1) modified linear regression [2]; (2) the wavelet-ANFIS model [3]; (3) the hydrodynamic model [4]; (4) the particle filtering algorithm [5]; (5) the Bayesian vine copula (BVC) model [6]; and (6) the autoregressive integrated moving average (ARIMA) model [7]. Modified linear regression employs the partial least squares (PLS) regression technique, which involves assessing the interconnections among influential factors. In the wavelet-ANFIS model, prediction accuracy depends critically on choosing appropriate wavelet basis functions. The Hydrologic Engineering Center’s River Analysis System (HEC-RAS) is one of the most widely used hydrodynamic models in the United States and offers a user-friendly interface, but it has high computational requirements. In the particle filtering algorithm, the randomness of sampling and the limited number of samples (particles) significantly constrain the results: many particles are assigned low importance weights, which may even preclude further sampling. The effectiveness of the BVC model in representing extreme water level behavior and simulating the joint dependence structure of different variables is limited by its use of theoretical marginal distributions such as the Weibull distribution. Among these methods, the ARIMA model is one of the most widely adopted. It is a classical time series analysis method capable of capturing the trend, seasonality, and stochastic components of data. The key to the ARIMA model lies in establishing a mathematical model to describe time series data. It is simple and easy to implement and interpret, which makes it suitable for various types of time series data, including water level data.
However, traditional methods are easily influenced by noise in the raw data, which causes oscillations and reduces model accuracy. They also fail to fully explore and utilize the characteristics of water level data. Although traditional water level time series forecasting methods are applicable in certain situations, they have limitations, including the requirement for data stationarity, assumptions of linear relationships, and limited model accuracy.

Machine learning algorithms can automatically discover patterns and correlations in data, thereby improving prediction accuracy and efficiency. By learning from extensive data, machine learning can identify hidden features and leverage them for prediction tasks. Additionally, machine learning demonstrates remarkable flexibility and scalability, continuously enhancing model prediction accuracy and performance through training and optimization. Consequently, in the realm of water level prediction, an increasing number of scholars are turning to machine learning methods for precise water level forecasting. Wang et al. [8] introduced a new framework based on machine learning methods that theoretically analyzed the spatiotemporal autocorrelation of adjacent well water levels. They selected the water levels of neighboring wells and the river’s nearshore from the last time step as independent variables and employed a support vector machine (SVM) machine learning model with calibrated parameters to calculate the current time step’s water level. Feng et al. [9] established a multi-layer BP neural network water level prediction model to predict water levels in hydropower stations. In addition, to mitigate the influence of the hydropower station’s static curve on prediction results, a data-driven model was established. Results indicated that compared to the hydrological equilibrium model, the BP neural network model exhibited higher accuracy. Sun et al. [10] proposed a strategy that combines the seasonal autoregressive integrated moving average (SARIMA) and long short-term memory (LSTM) models for predicting sea level anomaly (SLA) time series. The SLA time series were decomposed into trends, seasonal terms, and random terms. The SARIMA model was then used to predict the trends and seasonal terms of sea level variations, while the LSTM was used to predict the random terms.
Du et al. [11] proposed a combination model for short-term water level forecasting based on the Prophet model and a long short-term memory (LSTM) network. The Prophet and LSTM models were initially constructed separately, and their results were combined by adopting different weights based on the least absolute error (LAE). The model was verified using water level data from Hongze Lake, and the experimental results showed that the proposed model effectively addressed the problem of nonlinear features of the raw data. Stateczny et al. [12] integrated features extracted from preprocessed remote sensing images to create a new hydraulic index. Subsequently, various classifiers such as neural networks (NN), support vector machines (SVM), and improved deep convolutional neural networks (DCNN) models were combined to predict groundwater levels based on remote sensing images. Sun et al. [13] studied 76 different hydrogeological properties in a belt area and focused on three commonly used data-driven models: autoregressive integrated moving average (ARIMA), backpropagation artificial neural network (BP-ANN), and long short-term memory (LSTM). The results demonstrated that LSTM performed well on both daily and monthly time scales. Zhang et al. [14] employed the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) method to decompose the original water level series into high-frequency, low-frequency, and residue sequences. A wavelet threshold denoising algorithm was applied to eliminate noise in the high-frequency sequence, and support vector regression (SVR), backpropagation (BP) neural network, and LSTM models were used to predict water level sequences. Finally, they integrated the predictions of the three machine learning models using a linear model to obtain final results. Experimental results indicated that the ensemble model exhibited better prediction accuracy and applicability than individual models.
Khanesar et al. [15] employed interval type 2 fuzzy logic systems with support vector regression to identify the prediction interval associated with data. In the proposed approach, a penalty term was added to the cost functions to exert full control over the width of the prediction interval. Fei et al. [16] proposed a new method for predicting tidal water levels by developing a hydrodynamic-hydrologic coupling model (H2C) and combining its output (upstream flow and water level, tidal level) with LSTM. The coupling model (H2C-XL) was driven by a satellite meteorological estimation dataset and TPXO9 tidal data, and it was tested within the tidal range, significantly improving water level prediction accuracy. Chaudhary et al. [17] presented a computer vision system that can estimate water depth using images from social media captured during flooding events. This system enables the creation of flood maps in near real-time. The proposed multitask (deep) learning approach involves training a model using both regression and pairwise ranking losses. Yuan et al. [18] proposed a two-stage modeling method to improve the accuracy and efficiency of simultaneous daily water level prediction at multiple stations along an inland river. Firstly, the data was divided into six clusters using dynamic time warping (DTW) and a hierarchical clustering algorithm (HCA). Then, multistation daily water level prediction (MSDWLP) models were constructed for each cluster using long short-term memory (LSTM) networks and seasonal autoregressive integrated moving average (SARIMA) models.

As can be seen from the above analysis, significant progress has been made in the field of water level prediction. However, several challenges still warrant further investigation: (1) Water level data generally contain noise and exhibit significant fluctuations, which makes it difficult for a single machine learning model to fully exploit their features. More robust models need to be investigated. (2) While some studies used hybrid models, most of them merely fuse the prediction results of different models. They overlook the strong randomness and volatility of water level data, which makes the models less applicable to different datasets. As a result, the models may not adapt well to the prediction demands of different regions, time periods, or hydrological conditions.

Therefore, the objective of this paper is to propose a novel combination model to improve the accuracy of water level prediction, as well as the robustness and applicability of the model.

The main contributions of our work are summarized as follows:

(1) We propose a novel model, namely WLP-VBL, which combines VMD, BA, and LSTM for water level prediction. The proposed combination model exhibits a significant improvement in accuracy compared to a single model.
(2) Unlike most studies, which use raw data directly, WLP-VBL takes into account the noise and fluctuations present in the original data and applies time-frequency processing and signal decomposition techniques, which yields greater robustness.

2. Materials and Methods

2.1. Materials

The Gan River runs through Jiangxi Province from south to north and is the largest river flowing into China’s largest freshwater lake, Poyang Lake, with a basin area of approximately 82,809 km². It is the seventh largest tributary of the Yangtze River, with a main stream length of 823 km. The Gan River Basin has a subtropical humid climate, and the Gan River is a typical nonregulated (natural) flow river: its flow is mainly affected by seasonal rainfall and is not affected by reservoir regulation. The main stream and tributaries of the Gan River are shown in Figure 1, which was downloaded from the National Earth System Science Data Center (http://loess.geodata.cn (accessed on 8 October 2022)). The water level data used in this study were obtained from the water level warning section of the website (http://www.xiaoyuka.com/water/ (accessed on 16 September 2023)). We selected the daily water level data of the Gan River from 2 March 2022 to 24 September 2022, as shown in Figure 2. We divided the data into two sets, using the first 80% of the data as the training set and the remaining 20% as the validation set.

To ensure the convergence speed and accuracy of the model, the data were first standardized as follows:

x_{\mathrm{new}} = \frac{x - \mu}{\sigma}

(1)

where μ represents the mean of the sample data and σ represents the standard deviation of the sample data.
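
The split and the standardization in Equation (1) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `split_and_standardize` and the sample values are hypothetical, and estimating μ and σ from the training portion only is an assumption (the text does not say which samples were used).

```python
from statistics import mean, pstdev


def split_and_standardize(levels, train_fraction=0.8):
    """Chronological split followed by z-score standardization (Equation (1))."""
    n_train = int(len(levels) * train_fraction)
    train, valid = levels[:n_train], levels[n_train:]
    # Assumption: mu and sigma come from the training portion only, so that
    # no information from the validation period leaks into training.
    mu, sigma = mean(train), pstdev(train)

    def scale(values):
        return [(x - mu) / sigma for x in values]

    return scale(train), scale(valid), mu, sigma


# Hypothetical daily water levels in metres (illustrative values only).
levels = [12.1, 12.4, 12.9, 13.6, 13.2, 12.8, 12.5, 12.3, 12.2, 12.0]
train_z, valid_z, mu, sigma = split_and_standardize(levels)
```

Keeping `mu` and `sigma` makes it possible to map predictions back to metres after forecasting.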

2.2. Methods

The flowchart of the proposed water level prediction method is shown in Figure 3. This method consists of three main components: time-frequency signal processing, parameter initialization, and model training and prediction. The time-frequency signal processing phase leverages both empirical mode decomposition (EMD) and variational mode decomposition (VMD) techniques to decompose the water level time series data into intrinsic mode functions (IMFs) across various time scales. This decomposition process effectively reduces the inherent randomness and volatility observed in the water level data [19]. Next, the Bat algorithm (BA) is employed to discover the optimal fitness value and initialize the parameters for the long short-term memory (LSTM) model. This initialization step is crucial for ensuring the LSTM model’s optimal performance. Subsequently, the LSTM model, enhanced through BA optimization, undergoes training and is utilized for prediction on each feature sequence, contributing to the accurate prediction of water levels.

2.2.1. Time-Frequency Domain Signal Processing Based on EMD and VMD

Empirical mode decomposition (EMD) is a technique for analyzing nonlinear and non-stationary time series. It extracts the fluctuation and trend components at various scales to obtain intrinsic mode functions (IMFs) [20]. Each IMF must satisfy two requirements: (1) the number of zero-crossings and the number of extrema must be equal or differ by at most one, and (2) the mean of the upper envelope, defined by the local maxima, and the lower envelope, defined by the local minima, must be zero at every point. Let $x(t)$ denote the time series, $\mathrm{imf}_{i}(t)$ the $i$th IMF component, and $r(t)$ the residual term. The EMD decomposition can be expressed as follows [21]:

x(t) = \sum_{i = 1}^{n} \mathrm{imf}_{i}(t) + r(t)

(2)
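
The first IMF requirement (the counts of zero-crossings and extrema differ by at most one) can be checked with a few lines of code. The sketch below is illustrative only: the function names and the test signals are hypothetical, and the second requirement on the envelope mean is not checked here.

```python
import math


def count_sign_changes(values):
    """Number of sign changes between consecutive non-zero values."""
    signs = [v > 0 for v in values if v != 0]
    return sum(a != b for a, b in zip(signs, signs[1:]))


def satisfies_imf_count_condition(x):
    """True if zero-crossings and extrema counts differ by at most one."""
    zero_crossings = count_sign_changes(x)
    # A local extremum shows up as a sign change of the first difference.
    diffs = [b - a for a, b in zip(x, x[1:])]
    extrema = count_sign_changes(diffs)
    return abs(zero_crossings - extrema) <= 1


t = [i / 1000 for i in range(1000)]
oscillation = [math.sin(2 * math.pi * 5 * ti + 0.3) for ti in t]
shifted = [v + 2.0 for v in oscillation]  # same extrema, but never crosses zero

print(satisfies_imf_count_condition(oscillation))
print(satisfies_imf_count_condition(shifted))
```

A zero-mean oscillation passes the check, while the same oscillation riding on a constant offset does not, which is why EMD removes the local mean during sifting.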

Variational mode decomposition (VMD) is an adaptive and completely non-recursive method for modal decomposition and signal processing. With the VMD algorithm, the original data can be processed non-recursively, so that a multicomponent signal is decomposed into multiple single-component signals [22]. The steps of VMD are as follows:

First, the VMD algorithm decomposes the original signal sequence into a series of band-limited mode functions, each of which is compact around its own center frequency. Each mode function is demodulated to baseband by mixing it with a complex exponential tuned to its estimated center frequency, and the sum of the estimated bandwidths of all modes is then minimized. To calculate the bandwidth of each mode function, the corresponding constrained problem becomes [23]:

\begin{cases} \min_{\{u_{k}\}, \{w_{k}\}} \left\{ \sum_{k = 1}^{K} \left\| \partial_{t} \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_{k}(t) \right] e^{-j w_{k} t} \right\|_{2}^{2} \right\} \\ \text{s.t.} \quad \sum_{k = 1}^{K} u_{k}(t) = f(t) \end{cases}

(3)

In the equation, $\delta(t)$ represents the Dirac delta function, $*$ denotes convolution, and $f(t)$ is the original signal. K represents the number of modes to be decomposed, $\{u_{k}\} = \{u_{1}, \dots, u_{K}\}$ represents the K mode function components obtained by decomposition, and $\{w_{k}\} = \{w_{1}, \dots, w_{K}\}$ represents the center frequency of each mode function.

Secondly, to solve this constrained optimization problem, a quadratic penalty factor α and a Lagrange multiplier λ are introduced, converting it into an unconstrained variational problem [24]:

\begin{array}{l} L(\{u_{k}\}, \{w_{k}\}, \lambda) = \\ \alpha \sum_{k} \left\| \partial_{t} \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_{k}(t) \right] e^{-j w_{k} t} \right\|_{2}^{2} \\ + \left\| f(t) - \sum_{k} u_{k}(t) \right\|_{2}^{2} \\ + \left\langle \lambda(t), f(t) - \sum_{k} u_{k}(t) \right\rangle \end{array}

(4)

Finally, the intrinsic mode functions (IMFs) are obtained by iteratively updating their spectra in the frequency domain:

\hat{u}_{k}^{n + 1}(w) = \frac{\hat{f}(w) - \sum_{g \neq k} \hat{u}_{g}(w) + \frac{\hat{\lambda}(w)}{2}}{1 + 2 \alpha (w - w_{k})^{2}}

(5)

The center frequency is updated as:

w_{k}^{n + 1} = \frac{\int_{0}^{\infty} w |\hat{u}_{k}^{n + 1}(w)|^{2} dw}{\int_{0}^{\infty} |\hat{u}_{k}^{n + 1}(w)|^{2} dw}

(6)

In these equations, $\hat{u}_{k}^{n + 1}$ is the Wiener-filtered estimate of the current residual $\hat{f}(w) - \sum_{g \neq k} \hat{u}_{g}(w)$, and $w_{k}^{n + 1}$ is the center frequency of the current mode function, computed as the center of gravity of its power spectrum.
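
The alternating updates in Equations (5) and (6) can be sketched with NumPy as below. This is a deliberately simplified illustration, not a full VMD implementation and not the authors' MATLAB code: the function name `vmd_sketch`, the penalty value `alpha=2000.0`, and the fixed iteration count are assumptions, the Lagrange multiplier is held at zero (no dual ascent), the signal is not mirror-extended, and frequencies are expressed in cycles per sample.

```python
import numpy as np


def vmd_sketch(f, K, alpha=2000.0, n_iter=50):
    """Simplified VMD iteration on the one-sided spectrum (Equations (5)-(6))."""
    f_hat = np.fft.rfft(f)
    w = np.fft.rfftfreq(len(f))                 # frequencies in cycles per sample
    u_hat = np.zeros((K, len(w)), dtype=complex)
    w_k = 0.5 * np.arange(K) / K                # evenly spread initial centers
    lam_hat = np.zeros(len(w), dtype=complex)   # multiplier held at zero here
    for _ in range(n_iter):
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Equation (5): Wiener-filter the residual around w_k.
            u_hat[k] = (f_hat - others + lam_hat / 2) / (
                1 + 2 * alpha * (w - w_k[k]) ** 2
            )
            # Equation (6): center of gravity of the mode's power spectrum.
            power = np.abs(u_hat[k]) ** 2
            w_k[k] = np.sum(w * power) / np.sum(power)
    modes = np.fft.irfft(u_hat, n=len(f), axis=1)
    return modes, w_k


# Synthetic two-tone signal with known frequencies 0.02 and 0.2 cycles/sample.
t = np.arange(1000)
signal = np.cos(2 * np.pi * 0.02 * t) + 0.5 * np.cos(2 * np.pi * 0.2 * t)
modes, centers = vmd_sketch(signal, K=2)
```

On this synthetic signal the two estimated center frequencies settle close to the two tone frequencies, which illustrates how each mode becomes compact around its own center.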

2.2.2. Parameter Optimization Based on Bat Algorithm (BA)

In the Bat algorithm (BA), bats are considered as potential solutions to optimization problems, and their foraging mechanisms in complex environments are simulated for solving optimization tasks. A comparative analysis of accuracy and efficiency with genetic algorithms and particle swarm optimization has shown that the Bat algorithm outperforms these two algorithms [25].

The frequency, velocity, and position of bat i are updated from time t to time t + 1 as follows [26]:

f_{i} = f_{\min} + (f_{\max} - f_{\min}) \beta

(7)

v_{i}^{t + 1} = v_{i}^{t} + (x_{i}^{t} - x^{*}) f_{i}

(8)

x_{i}^{t + 1} = x_{i}^{t} + v_{i}^{t + 1}

(9)

In the equations, $f_{i}$ represents the frequency of the sound wave emitted by bat i, where $f_{i} \in [f_{\min}, f_{\max}]$, β is a random parameter with a value in the range [0, 1], and $x^{*}$ denotes the current best position among all bats.

For local search, a new candidate solution is generated around the current best solution as follows:

x_{\mathrm{new}} = x_{\mathrm{old}} + \eta A_{t}

(10)

In the equation, $x_{\mathrm{old}}$ represents the current best solution, η is a random parameter with a value in the range [−1, 1], and $A_{t}$ is the average loudness of all bats at the current time step. $x_{\mathrm{new}}$ is thus a new solution obtained by adding a random perturbation term to the current best solution.

During the search process, to balance global exploration and local exploitation, the loudness $A_{i}^{t}$ and pulse emission rate $r_{i}^{t}$ of the sound waves emitted by each bat need to be self-adjusted. The calculation formulas are as follows [27]:

A_{i}^{t + 1} = \alpha A_{i}^{t}

(11)

r_{i}^{t + 1} = r_{i}^{0} (1 - \exp(-\gamma_{3} t))

(12)

In the equations, α is the loudness attenuation coefficient, where α ∈ [0, 1] (it is distinct from the penalty factor α used in VMD), $\gamma_{3}$ is a constant greater than 0, and $r_{i}^{0}$ represents the initial pulse emission rate.
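
Equations (7) to (12) can be put together in a short sketch. It is an illustration under stated assumptions, not the configuration used in the paper: the function name `bat_algorithm`, all default parameter values, the search bounds, and the sphere test objective are hypothetical, and η is drawn independently per dimension. In the paper's setting the objective would instead be the validation error of an LSTM trained with the candidate hyperparameters.

```python
import numpy as np


def bat_algorithm(objective, dim, n_bats=20, n_iter=200, bounds=(-5.0, 5.0),
                  f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9, seed=0):
    """Minimize `objective` with a basic Bat algorithm (Equations (7)-(12))."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_bats, dim))   # positions
    v = np.zeros((n_bats, dim))                   # velocities
    loudness = np.ones(n_bats)                    # A_i
    r0 = rng.uniform(0.0, 1.0, size=n_bats)       # initial pulse rates r_i^0
    rate = np.zeros(n_bats)                       # current pulse rates r_i
    fitness = np.array([objective(p) for p in x])
    best = x[np.argmin(fitness)].copy()
    best_fit = fitness.min()
    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()     # Eq. (7)
            v[i] = v[i] + (x[i] - best) * freq                # Eq. (8)
            candidate = np.clip(x[i] + v[i], lo, hi)          # Eq. (9)
            if rng.random() > rate[i]:
                # Eq. (10): random walk around the current best solution.
                eta = rng.uniform(-1.0, 1.0, size=dim)
                candidate = np.clip(best + eta * loudness.mean(), lo, hi)
            cand_fit = objective(candidate)
            if cand_fit <= fitness[i] and rng.random() < loudness[i]:
                x[i], fitness[i] = candidate, cand_fit
                loudness[i] *= alpha                          # Eq. (11)
                rate[i] = r0[i] * (1 - np.exp(-gamma * t))    # Eq. (12)
            if cand_fit <= best_fit:
                best, best_fit = candidate.copy(), cand_fit
    return best, best_fit


def sphere(p):
    """Stand-in objective with its minimum of 0 at the origin."""
    return float(np.sum(p ** 2))


best, best_fit = bat_algorithm(sphere, dim=2)
```

Because the best solution is only replaced by a candidate that is at least as good, the returned fitness can never be worse than that of the initial population.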

2.2.3. Long Short-Term Memory (LSTM)

LSTM is a neural network algorithm capable of remembering long and short-term information. It was proposed to address the issue of long-term dependencies and achieve better information representation in longer time series [28]. LSTM achieves this by utilizing three gates and memory cell logic to control the forgetting, input, and output of information. The three gates in LSTM are the forget gate, input gate, and output gate, as shown in Figure 4. The forget gate allows the model to selectively forget or retain information from the previous time step. The input gate allows the model to decide what new information to input into the memory cell. The output gate controls which parts of the memory cell contents should be outputted. Due to the presence of gate mechanisms, relevant information can be directly passed on to subsequent memory cells, alleviating the problems of vanishing and exploding gradients during the model training process. This ensures better learning and memory retention capabilities in LSTM. The principle of LSTM is as follows [29]:

The forget gate $F_{t}$ determines which part of the information from the previous time step should be forgotten:

F_{t} = \sigma(W_{f} \cdot [h_{t - 1}, x_{t}] + b_{f})

(13)

The input gate $I_{t}$ determines which part of the new information should be stored and used to update the cell state:

I_{t} = \sigma(W_{i} \cdot [h_{t - 1}, x_{t}] + b_{i})

(14)

At time t, the candidate cell state $\tilde{C}_{t}$ is defined as:

\tilde{C}_{t} = \tanh(W_{C} \cdot [h_{t - 1}, x_{t}] + b_{C})

(15)

At time t, the current cell state $C_{t}$ is defined as:

C_{t} = F_{t} * C_{t - 1} + I_{t} * \tilde{C}_{t}

(16)

The output gate $Q_{t}$ determines which part of the cell state is output and thereby the value of the next hidden state:

Q_{t} = \sigma(W_{o} \cdot [h_{t - 1}, x_{t}] + b_{o})

(17)

The output $h_{t}$ of the memory cell at time t is defined as:

h_{t} = Q_{t} * \tanh(C_{t})

(18)

In the equations, σ represents the sigmoid function, $*$ denotes element-wise multiplication, $W_{f}$, $W_{i}$, $W_{C}$, and $W_{o}$ are the corresponding connection weights, $h_{t - 1}$ is the output from the previous time step, $x_{t}$ is the input at the current time step, $[h_{t - 1}, x_{t}]$ represents the concatenation of $h_{t - 1}$ and $x_{t}$, and $b_{f}$, $b_{i}$, $b_{C}$, and $b_{o}$ are the bias terms for the respective gates.
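
A single LSTM time step following Equations (13) to (18) can be written directly in NumPy. The sketch below is only meant to make the data flow concrete: the function name `lstm_step`, the dictionary layout of the weights, the layer sizes, and the random initialization are illustrative assumptions, and no training is performed.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Equations (13)-(18)."""
    z = np.concatenate([h_prev, x_t])         # [h_{t-1}, x_t]
    F_t = sigmoid(W["f"] @ z + b["f"])        # forget gate, Eq. (13)
    I_t = sigmoid(W["i"] @ z + b["i"])        # input gate, Eq. (14)
    C_tilde = np.tanh(W["C"] @ z + b["C"])    # candidate state, Eq. (15)
    C_t = F_t * c_prev + I_t * C_tilde        # cell state, Eq. (16)
    Q_t = sigmoid(W["o"] @ z + b["o"])        # output gate, Eq. (17)
    h_t = Q_t * np.tanh(C_t)                  # hidden state, Eq. (18)
    return h_t, C_t


rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4
W = {g: rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in)) for g in "fiCo"}
b = {g: np.zeros(n_hidden) for g in "fiCo"}

# Run the cell over a short standardized sequence (illustrative values).
h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x in [0.1, 0.3, -0.2]:
    h, c = lstm_step(np.array([x]), h, c, W, b)
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate state to 0, so the cell state is simply halved at each step; this gives a quick sanity check of the gate arithmetic.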

3. Results

3.1. Experimental Environment

The experiment was conducted on a PC with an Intel(R) Core(TM) i5-10210U CPU @ 1.60 GHz 2.11 GHz, 64 GB RAM, and a 64-bit Windows 11 operating system. The experimental platform used MATLAB R2020a with the time-domain toolbox installed.

For the LSTM network training, the root mean square error (RMSE) was selected as the loss function and the Adam optimizer was used as the gradient descent optimizer. The input variable’s time window was set to K = 1, with one alternating time step. The number of epochs was set to 250. The initial learning rate was set to 0.005 and was reduced by a factor of 0.2 after 125 epochs of training. The number of iterations, hidden units, and learning rate were determined using the Bat algorithm and found to be 258, 242, and 0.0044, respectively.
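
The learning-rate schedule described above can be written as a one-line function. This sketch assumes that "reduced by a factor of 0.2" means the rate is multiplied by 0.2 once every 125 epochs, as in a piecewise-constant drop schedule; the function and parameter names are hypothetical.

```python
def learning_rate(epoch, initial=0.005, drop_factor=0.2, drop_period=125):
    """Piecewise-constant schedule: multiply by drop_factor every drop_period epochs."""
    return initial * drop_factor ** (epoch // drop_period)


# Learning rate for each of the 250 epochs (epochs are counted from 0 here).
schedule = [learning_rate(epoch) for epoch in range(250)]
```

Under this reading, the first 125 epochs use 0.005 and the remaining 125 epochs use approximately 0.001.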

3.2. Evaluation Indicators

In order to quantitatively evaluate the effectiveness of the proposed model, the experiment used the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) as the evaluation indicators. The calculation formulas for these evaluation metrics are as follows:

S_{\mathrm{MAE}} = \frac{1}{n} \sum_{i = 1}^{n} |p_{i} - p_{i}^{*}|

(19)

S_{\mathrm{RMSE}} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (p_{i} - p_{i}^{*})^{2}}

(20)

In the equations, n is the number of samples, $p_{i}^{*}$ represents the predicted water level of the Gan River, and $p_{i}$ represents the true water level of the Gan River.
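
Equations (19) and (20) translate directly into code. The observed and predicted values below are hypothetical and serve only to show the arithmetic.

```python
import math


def mae(true, pred):
    """Mean absolute error, Equation (19)."""
    return sum(abs(p - q) for p, q in zip(true, pred)) / len(true)


def rmse(true, pred):
    """Root mean square error, Equation (20)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(true, pred)) / len(true))


# Hypothetical water levels in metres (illustrative values only).
observed = [12.0, 12.5, 13.0, 12.8]
predicted = [12.1, 12.4, 13.2, 12.8]

print(mae(observed, predicted), rmse(observed, predicted))
```

Here the absolute errors are 0.1, 0.1, 0.2, and 0, so the MAE is 0.1 and the RMSE is the square root of 0.015. The RMSE is never smaller than the MAE because squaring gives more weight to the largest error.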

3.3. Results and Analysis

Considering the characteristics of the original data, we utilize the advantages of the EMD and VMD algorithms in frequency-domain decomposition to process the original data sequence. The choice of the number of modes (K) plays a crucial role in the effectiveness of VMD decomposition. When K is too small, there is a risk of losing vital information within the water level data, which can subsequently impact prediction accuracy. Conversely, if K is too large, the center frequency distance between neighboring mode components decreases, potentially leading to mode overlap or the presence of noise. To determine an appropriate value for K, we employ EMD decomposition. As illustrated in Figure 5, the original signal is divided into six intrinsic mode functions (IMFs). The first IMF typically represents the highest-frequency oscillatory mode in the signal, often consisting of noise or high-frequency components. The second IMF corresponds to the next highest-frequency oscillatory mode. Subsequent IMFs capture progressively lower-frequency oscillatory modes. The term “residual” refers to the signal portion that remains after decomposition, as it cannot be further divided into any IMF components.

The original data are decomposed into six intrinsic mode functions (IMFs) using the VMD algorithm. These IMFs are then fed into the constructed BA-LSTM model to obtain the predicted values, which are compared with the actual values in Figures 6–11. IMF1 reflects the overall trend of the water level series. IMF2–IMF6 are the fluctuation components, which reflect the random fluctuation details of the curve and are sorted from high frequency to low frequency. The first half of IMF1 is predicted well, while the weak high-frequency signals in the second half are difficult to separate. In the remaining figures, the intermediate-frequency components separated from the strong low-frequency component are still predicted with acceptable quality. The residuals are calculated from the predicted and true values. Finally, the six IMF predictions are summed to produce the final prediction shown in Figure 12. The two curves in the figure follow generally consistent trends, and although occasional deviations occur, the error remains within an acceptable range. The forecasts for days 0–8 and 35–40 are particularly accurate. The Root Mean Square Error (RMSE) of the prediction is 0.01183, with an error close to 1%. The maximum error between the predicted value and the observed value is 0.05 m, and the minimum error is 0. The presence of autocorrelation in the sequence may account for the larger errors. Overall, the model demonstrates a high level of accuracy and applicability.
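
The final superposition step can be sketched as a sum over the per-IMF predictions. The function name `superimpose` and all numbers are hypothetical, and the sketch assumes that the decomposition was applied to the standardized series, so that the summed prediction has to be mapped back to metres with the stored μ and σ.

```python
def superimpose(component_predictions, mu, sigma):
    """Sum per-IMF predictions, then undo the z-score standardization."""
    summed = [sum(values) for values in zip(*component_predictions)]
    return [s * sigma + mu for s in summed]


# Hypothetical predictions for three components over two days.
component_predictions = [
    [0.50, 0.60],    # trend-like component
    [-0.10, 0.00],   # fluctuation component
    [0.05, -0.05],   # fluctuation component
]
final = superimpose(component_predictions, mu=12.5, sigma=0.4)
```

In this example the standardized sums are 0.45 and 0.55, which correspond to 12.68 m and 12.72 m after rescaling.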

To validate the robustness and accuracy of the proposed model, its errors were compared with those of the LSTM, EMD-LSTM, VMD-LSTM, and BA-LSTM models, and the results are shown in Table 1. The experimental results show that the proposed VMD-BA-LSTM model (WLP-VBL) outperforms the compared models in all evaluation metrics. Its MSE and MAE are 69.6% to 74.7% and 45% to 98.5% lower, respectively, than those of the other models, indicating the strong accuracy and reliability of this model for effective water level prediction.
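
Percentage improvements of this kind are usually computed as a relative decrease with respect to each baseline. The helper below assumes that definition (the text does not state the formula), and the two error values are hypothetical placeholders, not entries from Table 1.

```python
def relative_decrease(baseline_error, proposed_error):
    """Percentage by which the proposed model lowers an error metric."""
    return (baseline_error - proposed_error) / baseline_error * 100.0


# Hypothetical MSE values for a baseline and the proposed model.
improvement = relative_decrease(baseline_error=0.0040, proposed_error=0.0010)
```

With these placeholder values the error drops by 75%; a range such as "69.6% to 74.7%" would then correspond to the smallest and largest decrease over the set of baselines.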

4. Discussion

In this section, ablation experiments were conducted to analyze the contribution of each component.

First, the BA algorithm was removed from the proposed model, and the resulting model is denoted as VMD-LSTM. As shown in Table 1, the accuracy of VMD-LSTM is notably lower than that of the proposed method. This is attributed to the fact that the BA algorithm optimizes the parameters of the LSTM model, thereby enhancing the generalization ability of the model. In the absence of the BA algorithm, both MSE and MAE increased significantly.

Secondly, in order to investigate the contribution of the VMD algorithm, it was removed from the proposed model, denoted as BA-LSTM. As observed in Table 1, compared with VMD-BA-LSTM, MSE and MAE increased by 0.001759 and 0.000677, respectively. It is evident that VMD performs better in processing water level data, as it exhibits superior noise reduction, more stable mode functions, and higher frequency resolution.

Next, both the VMD algorithm and the BA algorithm were removed from the proposed model, denoted as LSTM. As can be seen from Table 1, compared to the method proposed in this article, the accuracy of this method is inferior.

Furthermore, by comparing VMD-LSTM with EMD-LSTM, it is evident that VMD outperforms in processing water level data due to its superior noise reduction, more stable mode functions, and higher frequency resolution.

In conclusion, among the three components, VMD has the most significant impact on the results, followed by BA, while LSTM has the least influence on the outcomes.

Additionally, K represents the number of modes in VMD decomposition, determining how many intrinsic mode functions (IMFs) the original signal is divided into [30]. Choosing the appropriate K value is crucial for accurately capturing signal features and influencing prediction accuracy and model performance. Too small or too large K values can lead to information loss or overfitting, resulting in reduced prediction accuracy.

As depicted in Table 2, the model’s performance exhibits variation across different K values. Larger K values yield a greater number of IMFs, allowing for more precise capturing of signal intricacies. However, they also bring about a higher computational burden and may introduce noise into the analysis. Conversely, smaller K values may lead to information loss, potentially failing to adequately capture important signal features. This can result in overly simplified models with reduced prediction accuracy.

Therefore, in practical applications, experimenting with various K values and assessing the model’s performance can aid in selecting the most suitable K value for attaining optimal prediction accuracy and robustness.

5. Conclusions

In this paper, we propose a combination model to accurately predict water levels, addressing the issue of noise contamination in the original data. The model integrates three modules: VMD, BA, and LSTM. The original data are first decomposed using VMD to generate a clear signal. The BA algorithm is then applied to determine the optimal hyperparameters. Finally, an LSTM neural network is adopted to predict the water level time series. Diverging from most existing studies, which directly utilize the original water level data, our research uses a set of mode functions extracted from the original signal. This approach captures multifaceted information across multiple time scales. The proposed method was tested using the daily water level data of the Gan River from 2 March 2022 to 24 September 2022. The experimental results indicate that the proposed VMD-BA-LSTM model outperforms LSTM, BA-LSTM, EMD-LSTM, and VMD-LSTM in both MSE and MAE. MSE and MAE decreased by 69.6~74.7% and 45~98.5%, respectively. Both the VMD and BA modules contribute significantly to the accuracy of water level prediction. In the future, we plan to expand our dataset by incorporating additional water level data and extending the time frame. This effort aims to enhance the model’s generalizability, enabling it to be applied in a broader range of scenarios and yielding more precise prediction results.

Author Contributions

Conceptualization, C.Y. and W.H.; methodology, W.H. and H.P.; validation, C.Y., W.H. and J.D.; formal analysis, W.H.; investigation, C.Y.; data curation, J.D.; writing—original draft preparation, W.H.; writing—review and editing, C.Y. and H.P.; visualization, W.H.; supervision, H.P.; funding acquisition, H.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Open Fund of Key Laboratory of Urban Spatial Information, Ministry of Natural Resources (grant No. 2023PT004), the National Key R&D Program of China under grant 2018YFB0505400, and the National Natural Science Foundation of China under grants 41871325 and 42241164.

Acknowledgments

We would like to thank the reviewers for their valuable comments and suggestions. We acknowledge data support from the Loess Plateau Science Data Center, National Earth System Science Data Sharing Infrastructure, National Science & Technology Infrastructure of China (http://loess.geodata.cn (accessed on 16 September 2023)).

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

IMF	Intrinsic mode function
$x (t)$	Original signal
$r (t)$	Residual term
K	Number of decomposed patterns
${\hat{u}}_{k}^{n + 1}$	Wiener filtering for residual components
$w_{k}^{n + 1}$	Center frequency of modal function
$f_{i}$	Frequency of the sound wave
$x^{*}$	Best current position of bat
$A_{t}$	Average loudness at current time step
$A_{i}^{t}$	Pulse loudness
$h_{t - 1}$	Output from previous stage
$x_{t}$	Input from this stage
b	Bias terms for respective gates
W	Corresponding connection weights
$r_{i}^{t}$	Pulse emission frequency
$C_{t}$	Cell status
$X$	Current solution
$I_{t}$	Input gate
$Q_{t}$	Output gate
$F_{t}$	Forget gate
$p_{i}^{*}$	Predicted value
$p_{i}$	True value
Greek symbols
$\delta(t)$	Dirac delta function
α	Penalty term
λ	Lagrange multiplier
β	Random value within [0,1]
η	Random value within [−1,1]
σ	Sigmoid function

Figure 1. Map of Gan River Basin.

Figure 2. Water level data for the Gan River.

Figure 3. VMD-BA-LSTM flowchart.

Figure 4. Basic element structure of LSTM model.

Figure 5. Empirical mode decomposition (EMD).

Figure 6. IMF1 forecast and true value.

Figure 7. IMF2 forecast and true value.

Figure 8. IMF3 forecast and true value.

Figure 9. IMF4 forecast and true value.

Figure 10. IMF5 forecast and true value.

Figure 11. IMF6 forecast and true value.

Figure 12. Predicted value and true value.

Table 1. Accuracy comparison of each model.

Model	VMD-BA-LSTM	LSTM	BA-LSTM	EMD-LSTM	VMD-LSTM
MSE	0.000768	0.083367	0.034202	0.003031	0.002527
MAE	0.000827	0.009964	0.005952	0.055055	0.001504
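
The relative error reductions quoted in the abstract can be recomputed from Table 1. The short Python sketch below does this arithmetic; the dictionary layout and the helper name `reduction_pct` are illustrative choices and are not taken from the paper.

```python
# Values copied from Table 1.
table1 = {
    "VMD-BA-LSTM": {"MSE": 0.000768, "MAE": 0.000827},
    "LSTM":        {"MSE": 0.083367, "MAE": 0.009964},
    "BA-LSTM":     {"MSE": 0.034202, "MAE": 0.005952},
    "EMD-LSTM":    {"MSE": 0.003031, "MAE": 0.055055},
    "VMD-LSTM":    {"MSE": 0.002527, "MAE": 0.001504},
}

def reduction_pct(baseline, proposed):
    """Relative decrease of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline

proposed = table1["VMD-BA-LSTM"]
reductions = {
    name: {metric: round(reduction_pct(errors[metric], proposed[metric]), 1)
           for metric in ("MSE", "MAE")}
    for name, errors in table1.items()
    if name != "VMD-BA-LSTM"
}

for name, r in reductions.items():
    print(f"{name}: MSE -{r['MSE']}%, MAE -{r['MAE']}%")
```

For the two decomposition-based baselines this gives MSE reductions of about 74.7% (EMD-LSTM) and 69.6% (VMD-LSTM) and MAE reductions of about 98.5% and 45.0%, which matches the ranges 69.6~74.7% and 45~98.5%. Against plain LSTM and BA-LSTM the MSE reductions are larger, roughly 99.1% and 97.8%.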

Table 2. Model errors under different values of K.

K	4	5	6	7	8
MSE	0.033695	0.038153	0.000768	0.040341	0.032462
MAE	0.001012	0.001176	0.000827	0.001131	0.001016
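
Reading Table 2 as a selection problem, the number of modes K can be chosen as the candidate with the lowest error. The Python sketch below applies that rule to the tabulated values; the variable names are illustrative, and the paper itself obtains K from the EMD step rather than from this search.

```python
# Values copied from Table 2: K -> (MSE, MAE).
table2 = {
    4: (0.033695, 0.001012),
    5: (0.038153, 0.001176),
    6: (0.000768, 0.000827),
    7: (0.040341, 0.001131),
    8: (0.032462, 0.001016),
}

# Pick the K that minimizes each error metric.
best_k_by_mse = min(table2, key=lambda k: table2[k][0])
best_k_by_mae = min(table2, key=lambda k: table2[k][1])

print(best_k_by_mse, best_k_by_mae)
```

Both criteria select K = 6, the value used for the VMD-BA-LSTM results in Table 1.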

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
