ARIMA-Based Forecasting of Wastewater Flow Across Short to Long Time Horizons

Ye, Jiawen; Meng, Xulai; Wang, Haiying; Zhou, Qingdao; An, Siwei; An, Tong; Ghorbani Bam, Pooria; Rosso, Diego

doi:10.3390/math13132098



by Jiawen Ye ¹, Xulai Meng ¹, Haiying Wang ¹,*, Qingdao Zhou ¹, Siwei An ¹, Tong An ¹, Pooria Ghorbani Bam ² and Diego Rosso ²,³

¹ School of Science, China University of Geosciences (Beijing), Beijing 100083, China

² Department of Civil & Environmental Engineering, University of California, Irvine, CA 92697-2175, USA

³ Water-Energy Nexus Center, University of California, Irvine, CA 92697-2175, USA

* Author to whom correspondence should be addressed.

Mathematics 2025, 13(13), 2098; https://doi.org/10.3390/math13132098

Submission received: 20 May 2025 / Revised: 13 June 2025 / Accepted: 24 June 2025 / Published: 26 June 2025

(This article belongs to the Special Issue Evolutionary Algorithms and Applications)


Abstract

Improving the efficiency and quality of urban wastewater treatment is an urgent need for most cities. Accurate forecasting of the wastewater flowrate at a wastewater treatment plant (WWTP) is crucial for cutting energy use and reducing pollution. In this study, two hybrid models are proposed: ARIMA–Markov and ARIMA–LSTM–Transformer. The two models were validated and compared using 5 min-interval inlet flowrate data from a WWTP in 2024. Forecasts were made 1 day, 7 days, and 2 months ahead, and the accuracies of the models were compared. Stability was assessed with ten repeated runs on the same dataset, and ARIMA–LSTM–Transformer, which performed better, was selected. The Whale Optimization Algorithm (WOA), Particle Swarm Optimization (PSO) algorithm, and Sparrow Search Algorithm (SSA) were then used to optimize it, and WOA excelled in both accuracy and stability. Experimental results show that WOA–ARIMA–LSTM–Transformer forecasts wastewater flowrate better than the single Transformer model. The combined model enables efficient and accurate wastewater flowrate forecasting, which highlights its application potential.

Keywords:

autoregressive integrated moving average; long short-term memory; Markov; Transformer; wastewater flowrate forecast; wastewater treatment plant

MSC:

68V99

1. Introduction

With accelerating urbanization and the deterioration of existing urban drainage pipelines [1], stable and efficient WWTP operation has become challenging. Influent wastewater varies dynamically because of pipeline leakage, rainwater and groundwater intrusion, and other factors. Accurate wastewater flowrate forecasting is therefore crucial for optimizing WWTP operation [2].

However, traditional methods often rely on extensive historical data and empirical judgement [3]. This makes timely and accurate forecasting difficult when wastewater flowrate changes are complex and irregular. During extreme weather events or major festivals in particular [4], traditional forecasting methods cannot capture changes in wastewater flow quickly and effectively, which puts the WWTP at risk of overloading. Developing reliable wastewater flowrate forecasting models has therefore become a top priority for today’s WWTPs [5].

Current wastewater flow forecasting models are based mainly on traditional statistical models and deep learning models [6]. Traditional statistical models include the Autoregressive Integrated Moving Average (ARIMA), Prophet, and Markov models. ARIMA is a statistical model widely used in time series analysis and forecasting, and it can effectively capture the trend, seasonality, and noise characteristics of time series data [7]. Maleki et al. [8] applied ARIMA to time series of WWTP inflow characteristics to build short-term (7 days ahead) forecasting models. Bień et al. [9] combined ARIMA with the Long Short-Term Memory (LSTM) model and greatly improved the forecast accuracy of sludge production in sewage treatment plants; the hybrid model achieved a MAPE (Mean Absolute Percentage Error) of 9.4%. The Prophet model accounts for holiday effects in addition to trend, seasonality, and other factors. The Markov model is a mathematical model based on the Markov property and is mainly used to describe stochastic processes. Li et al. [10] used a discrete hidden Markov model to accurately predict water quality.

In addition, deep learning models such as LSTM and Transformer have gradually become mainstream forecasting methods in recent years. LSTM is a special type of recurrent neural network (RNN). It alleviates the vanishing and exploding gradient problems that traditional RNNs face on long sequences and can effectively capture long-term dependencies, which makes it suitable for nonlinear problems. Nitzan Farhi et al. [11] proposed a novel machine learning method based on the LSTM architecture. It combines bioreactor measurements sampled every minute with climate measurements and greatly improves forecast accuracy: when predicting nitrogen and nitrate concentrations, the AUC increased by 1% and 5%, respectively.

However, LSTM is highly complex: it takes a long time to train and requires a large amount of data [12]. J. Ali et al. [13] used Transformer and ensemble models to predict ammonium nitrogen levels in rivers and achieved good results, with the coefficient of determination (R²) and the Nash–Sutcliffe Efficiency (NSE) reaching 0.97 and 0.90, respectively. However, Transformer struggles to model local temporal relationships and lacks physical interpretability. Wastewater flow forecasting uses many external input features (such as weather data, holidays, and watershed conditions), and the high dimensionality of these data may affect the training efficiency and forecast accuracy of the Transformer model.

Hyperparameter optimization is essential for improving model accuracy [14]. Prominent optimization algorithms currently include the Whale Optimization Algorithm (WOA), Particle Swarm Optimization (PSO), and the Sparrow Search Algorithm (SSA). Cui et al. [15] proposed a load forecasting model that uses WOA to optimize the hyperparameters of an improved LSTM model and achieved excellent forecast results. Du et al. [16] proposed a model that combines LSTM with kernel density estimation (KDE) and uses PSO to optimize the hyperparameters of KDE. They demonstrated its superiority through comparison with numerous hybrid models. Zhang et al. [17] used SSA to optimize a Bidirectional Long Short-Term Memory (BiLSTM) network and formed a coupled VMD–SSA–BiLSTM model (VMD: Variational Mode Decomposition) to predict the monthly runoff of the Yellow River. Compared with the single BiLSTM model, R² increased by 0.53059, a large gain in accuracy.

In this study, an ARIMA–Markov model and an ARIMA–LSTM–Transformer model were developed for wastewater flow forecasting over different time horizons. After the two models were compared, the ARIMA–LSTM–Transformer model was chosen. WOA, PSO, and SSA were then applied to optimize its hyperparameters, and WOA performed best. The resulting WOA–ARIMA–LSTM–Transformer model achieves efficient and accurate wastewater flow forecasting over different time horizons.

The ARIMA–LSTM–Transformer model was chosen because its components have complementary strengths. ARIMA is good at capturing linear trends in time series, but it cannot effectively handle nonlinear features and performs poorly in long-term forecasting. Thanks to its memory cell and gating mechanism, LSTM captures long-term dependencies more accurately, but its stability is poor, so a Transformer module was added. Transformer excels at modeling long-term dependencies and complex nonlinear relationships, but it depends more heavily on data volume and computing resources. Integrating the three models alleviates the shortcomings of each to some extent and improves the overall accuracy and stability of wastewater flow prediction.

The paper is divided into five different parts: 1. Introduction; 2. Materials and Methods, including missing value handling, outlier identification, and the ARIMA–Markov model and ARIMA–LSTM–Transformer model; 3. Results and Discussion, including model validation, ablation experiments, and a stability test; 4. Algorithm Optimization, including parameter selection, accuracy testing, and a stability test; and 5. Conclusions.

In this study, Jiawen Ye contributed to the conceptualization, data curation, investigation, methodology, original draft writing, visualization, review and editing, and funding acquisition. Xulai Meng performed the formal analysis, data curation, methodology, original draft writing, and validation. Haiying Wang contributed to the conceptualization, methodology, supervision, funding acquisition, and review and editing. Qingdao Zhou and Siwei An contributed to the data curation, methodology, and original draft writing. Tong An participated in the investigation and in the review and editing. Pooria Ghorbani Bam and Diego Rosso were involved in the revision.

2. Materials and Methods

2.1. Missing Value Handling

Monitoring equipment failures, data transmission failures, and similar problems can leave missing values in the dataset used to train the model [18]. Approximately 0.421% of the values in this dataset (see Appendix A) are missing. Common approaches to handling missing values include mean, median, or mode imputation; Lagrange interpolation [19]; random forest (RF) [20]; and similar techniques. Mean, median, or mode imputation is easy to implement, but it distorts the original data distribution and fails to capture underlying trends. RF is an ensemble learning model composed of multiple decision trees. It therefore has many parameters, and aggregating the results of many trees makes training slow. Given the large amount of data in this study, RF would be computationally expensive and difficult to tune. An improved Lagrange interpolation method is therefore employed for imputation. Lagrange interpolation constructs a polynomial that passes through every known data point and can effectively fit a set of data. However, when too many points are used, Lagrange interpolation becomes computationally expensive and may oscillate, which leads to poor filling accuracy [21]. The dataset was therefore divided into 2108 subsets of 50 data points each.
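The blockwise filling described above can be sketched as follows. This is a minimal illustration and not the authors’ implementation. The paper does not say what makes its Lagrange method "improved", so two details here are assumptions: each gap is filled from at most `neighbours` known points on each side (which keeps the polynomial low-order and limits oscillation), and missing readings are marked with `None`. The names `lagrange_eval` and `impute_blockwise` are hypothetical.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def impute_blockwise(series, block_size=50, neighbours=3):
    """Fill None entries subset by subset with low-order local Lagrange interpolation.

    Only known points inside the same block are used, at most `neighbours` on each
    side of the gap (an assumed safeguard against high-degree oscillation).
    """
    filled = list(series)
    for start in range(0, len(series), block_size):
        block = range(start, min(start + block_size, len(series)))
        known = [i for i in block if series[i] is not None]
        for i in block:
            if series[i] is not None:
                continue
            left = [k for k in known if k < i][-neighbours:]
            right = [k for k in known if k > i][:neighbours]
            support = left + right
            if support:  # a block with no known values is left unfilled
                filled[i] = lagrange_eval(support, [series[k] for k in support], i)
    return filled
```

With `block_size=50`, the series is processed as independent 50-point subsets, as in the text. Restricting each polynomial to a few nearby points keeps the cost per gap constant regardless of the series length.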

2.2. Outlier Identification

Before choosing an outlier-cleaning method, it is necessary to check whether the data are approximately normally distributed. To this end, a quantile–quantile (QQ) plot [22] was constructed (Figure A1, see Appendix C), and the skewness and kurtosis of the data were measured (0.14 and −0.62, respectively). The points in the plot lie approximately along the reference line, with slight divergence on both sides. Given the large amount of data in this study, the data can be treated as approximately normally distributed. Common methods for cleaning outliers include the Z-Score method [23], the Median Absolute Deviation (MAD) method [24], and the K-means clustering algorithm [25]. MAD is simple to calculate, but it relies only on the median and absolute deviations and does not fully use the information in the data. The result of K-means clustering depends on the choice of initial cluster centers, and its high computational complexity makes it difficult to apply to large-scale data. The Z-Score method, by contrast, suits normal distributions, is less affected by extreme values when the dataset is large, and is simple to calculate. Therefore, the standard score (Z-Score) method is used for outlier cleaning.

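When the data roughly follow a normal distribution, the standard score of observation x_{i} is

Z_{i} = \frac{x_{i} - \mu}{\sigma}

where \mu and \sigma are the mean and standard deviation of the data. The probability that |Z_{i}| > 3 is only about 0.3%, which is extremely low. Observations with |Z_{i}| > 3 are therefore regarded as outliers and filtered out, and each one is replaced with the median wastewater flowrate.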
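A minimal sketch of the Z-Score cleaning step follows. It assumes a single global mean and population standard deviation. It also replaces each flagged point with the median of the remaining (non-outlier) values; the paper does not state whether the median is taken before or after outliers are removed. Both choices and the name `zscore_clean` are assumptions. The constant `TAIL_PROB` checks the two-sided normal tail probability behind the quoted 0.3% figure.

```python
import math
import statistics

# Two-sided tail probability P(|Z| > 3) for a standard normal: about 0.0027 (~0.3%).
TAIL_PROB = math.erfc(3 / math.sqrt(2))


def zscore_clean(values, threshold=3.0):
    """Flag points with |Z| > threshold and replace them with the median of the rest.

    Returns (cleaned_values, outlier_indices).
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        return list(values), []  # constant series: nothing can be an outlier
    outliers = [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]
    flagged = set(outliers)
    replacement = statistics.median(v for i, v in enumerate(values) if i not in flagged)
    cleaned = [replacement if i in flagged else v for i, v in enumerate(values)]
    return cleaned, outliers
```

Taking the median from the inliers keeps the replacement value from being pulled toward the spike being removed.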

2.3. ARIMA–Markov Model

In this study, a forecast model that combines the Autoregressive Integrated Moving Average (ARIMA) and Markov structures, the ARIMA–Markov model, is developed. ARIMA [26] is good at capturing the main linear dependencies, trends, and cyclical components of a time series from historical data and at predicting future wastewater flow [27]. Nevertheless, ARIMA alone struggles with nonlinear patterns in the data. This makes it hard to predict spikes in wastewater flow caused by heavy rainfall or special events such as festivals.

To address this issue, a Markov model [28,29] is introduced. The residuals between the ARIMA predictions and the actual observations are calculated and divided into multiple states according to their value ranges. A Markov chain is then constructed from the resulting residual state sequence, and the transition probability matrix P between states is estimated. This reveals how residuals move between states and thereby captures potential nonlinear dynamics in the data. The steady-state distribution of the Markov chain is then used to correct the residuals, and the corrected residuals are added back to the ARIMA forecast [Equation (1)]:

\hat{Y}_{t}^{\mathrm{adjusted}} = \hat{Y}_{t}^{\mathrm{ARIMA}} + \Delta

(1)

where \hat{Y}_{t}^{\mathrm{adjusted}} represents the adjusted predicted value of the time series, \hat{Y}_{t}^{\mathrm{ARIMA}} represents the value predicted by ARIMA, and \Delta represents the residual correction.

This correction accounts for both the main linear trends captured by ARIMA and the prediction bias caused by nonlinear factors and abrupt changes. ARIMA captures long-term linear relationships in the time series, and the Markov model describes nonlinear state switching and random jumps. Combining the two yields a hybrid forecasting strategy that draws on the strengths of both. The experimental results show that the corrected forecasts reduce the forecast error to some extent. The architecture diagram of ARIMA–Markov is shown in Figure 1.
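The residual-correction step of Section 2.3 can be sketched in plain Python as below. Several details are not specified in the text and are assumptions here:

- residuals are binned into `n_states` equal-width states;
- the steady-state distribution is obtained by power iteration;
- the correction Δ is taken to be the stationary-weighted mean residual, Δ = Σ_k π_k · (mean residual in state k).

All function names are hypothetical.

```python
from collections import Counter


def residual_states(residuals, n_states):
    """Map each residual to one of n_states equal-width bins over [min, max]."""
    lo, hi = min(residuals), max(residuals)
    width = (hi - lo) / n_states or 1.0  # all residuals equal: everything lands in state 0
    return [min(int((r - lo) / width), n_states - 1) for r in residuals]


def transition_matrix(states, n_states, fallback):
    """Row-normalised transition counts; a state never left in the sample uses `fallback`."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    return [[c / sum(row) for c in row] if sum(row) else list(fallback) for row in counts]


def stationary(P, start, iters=2000):
    """Power iteration on the lazy chain (P + I) / 2, which has the same stationary
    distribution as P but is aperiodic, so it converges even for alternating chains."""
    n = len(P)
    pi = list(start)
    for _ in range(iters):
        pi = [0.5 * pi[j] + 0.5 * sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def markov_correction(residuals, n_states=3):
    """Assumed correction: Delta = sum_k pi_k * (mean residual observed in state k)."""
    states = residual_states(residuals, n_states)
    freq = Counter(states)
    start = [freq[k] / len(states) for k in range(n_states)]  # empirical state frequencies
    P = transition_matrix(states, n_states, start)
    pi = stationary(P, start)
    means = []
    for k in range(n_states):
        members = [r for r, s in zip(residuals, states) if s == k]
        means.append(sum(members) / len(members) if members else 0.0)
    return sum(p * m for p, m in zip(pi, means))
```

In use, `markov_correction` would be applied to the ARIMA residuals on the training set, and the resulting Δ added to each ARIMA forecast as in Equation (1). Power iteration runs on the lazy chain because it shares the stationary distribution of P but cannot get stuck oscillating between strictly alternating states.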

2.4. ARIMA–LSTM–Transformer Model

A second forecast model, combining ARIMA, the Long Short-Term Memory (LSTM) model, and Transformer structures, is also developed and is known as the ARIMA–LSTM–Transformer model. The model uses ARIMA to model precipitation data and capture its trend components. The ARIMA forecast residuals are then passed to the LSTM as an additional feature, which allows the LSTM to capture more complex temporal dependencies [30,31]. Holiday data (see Appendix B) are also provided to the LSTM as an input feature to improve its forecasts on special dates. The LSTM computes a hidden state at each time step of the input and so captures local dependencies in the time series. Its output is a fixed-length sequence in which the output at each time step carries the "memory" of the current and previous time steps. The Transformer Encoder acts as a second-layer network that processes this sequence of LSTM outputs, which is already a chronological sequence of hidden-state features. Through its self-attention mechanism, the Transformer Encoder uses the relationships between different time steps to further enhance the LSTM output features. The schematic diagram of the multi-head attention mechanism is shown in Figure A2 (see Appendix C).
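The feature construction described above can be sketched as a sliding-window builder. At each time step, the ARIMA residual and a holiday indicator are stacked next to the flow. The following choices are assumptions, since the contents of Appendix B are not reproduced here: the window length, the one-step-ahead target, and encoding holidays as a 0/1 flag. `build_windows` is a hypothetical name.

```python
def build_windows(flow, arima_fitted, holiday, window):
    """Stack [flow, ARIMA residual, holiday flag] per time step into sliding windows.

    Each sample is (window x 3 feature rows, next-step flow target).
    """
    if not (len(flow) == len(arima_fitted) == len(holiday)):
        raise ValueError("input series must have equal length")
    # Residual = observed minus ARIMA prediction, fed to the LSTM as an extra feature.
    residual = [y - f for y, f in zip(flow, arima_fitted)]
    rows = [[y, r, float(h)] for y, r, h in zip(flow, residual, holiday)]
    return [(rows[t - window:t], flow[t]) for t in range(window, len(flow))]
```

Each sample's feature rows would form one input sequence for the LSTM, whose hidden states then feed the Transformer Encoder.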

To ensure that information is transferred efficiently through a multi-layer network, Transformer adds a residual connection around each sub-layer, such as self-attention and the feedforward network [32]. Layer normalization accelerates training and improves model stability. It normalizes each input so that the inputs to each layer have a mean of 0 and a variance of 1 before a learned scale and shift are applied. Finally, a fully connected layer maps this representation to the final output (the predicted value). The architecture diagram of ARIMA–LSTM–Transformer is shown in Figure 2.
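The toy sketch below shows the order of operations described above for one encoder sub-layer: self-attention over the LSTM hidden states, then a residual connection, then layer normalization. To stay self-contained, it uses a single attention head with Q = K = V = H and no learned projection, scale, or shift parameters. The model in the paper is multi-head and trainable, so this is an illustration rather than its implementation.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(H):
    """Single-head scaled dot-product attention with Q = K = V = H (no learned projections)."""
    d = len(H[0])
    out = []
    for q in H:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in H])
        out.append([sum(w * v[j] for w, v in zip(weights, H)) for j in range(d)])
    return out


def layer_norm(x, eps=1e-5):
    """Normalise one feature vector to zero mean and (almost) unit variance."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]


def encoder_sublayer(H):
    """Residual connection around self-attention, followed by layer normalisation."""
    A = self_attention(H)
    return [layer_norm([h + a for h, a in zip(h_row, a_row)]) for h_row, a_row in zip(H, A)]
```

Here `H` stands for the sequence of LSTM hidden states, one row per time step. Each output row mixes information from all time steps according to the attention weights.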

3. Results and Discussion

3.1. Model Validation

3.1.1. ARIMA–Markov’s Maximized Likelihood Function

Maximum likelihood estimation is a widely used method for parameter estimation in statistics. Given the observed data, it finds the set of model parameter values that maximizes the probability of observing those data. This helps identify the model configuration that best fits the underlying structure of the data.
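For reference, the quantity being maximized can be written out. The form below assumes Gaussian innovations and conditions on the first observations; the exact likelihood computed by most time-series software differs slightly. Under these assumptions, an ARIMA(p, d, q) model applied to the differenced series w_t has the following log-likelihood. Substituting the variance estimate gives the maximized value used to compare (p, q) pairs.

```latex
w_t = \nabla^{d} Y_t

w_t = c + \sum_{i=1}^{p} \phi_i w_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \varepsilon_{t-j},
\qquad \varepsilon_t \sim \mathcal{N}(0, \sigma^2)

\ell(c, \phi, \theta, \sigma^2) = -\frac{n}{2}\ln\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\sum_{t=1}^{n} \varepsilon_t^2

\hat{\sigma}^2 = \frac{1}{n}\sum_{t=1}^{n} \hat{\varepsilon}_t^2
\quad\Longrightarrow\quad
\ell_{\max} = -\frac{n}{2}\left[\ln\left(2\pi\hat{\sigma}^2\right) + 1\right]
```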

The series modeled by ARIMA must be stationary for every prediction period, but excessive differencing can cause information loss or oversimplify the model. The differencing order d was therefore limited to [0, 3]. Stationarity was assessed through graphical analysis, a unit root test, and the decay of the ACF/PACF. The results show that the differenced series becomes stationary at d = 2, so d = 2 was selected.
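Differencing of order d can be sketched as repeated first differences. The example below is illustrative and assumes a plain Python list. It shows why d = 2 is enough to remove a quadratic trend: the second difference of 3t² + 2t + 5 is the constant 6. The stationarity checks used in the paper (graphical analysis, a unit root test, and ACF/PACF decay) are not reproduced here.

```python
def difference(series, d=1):
    """Apply the first-difference operator d times: y'_t = y_t - y_{t-1}."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out
```

Each pass shortens the series by one value, so a d = 2 model loses the first two observations of every training window.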

For different values of p and q, the maximum log-likelihood was calculated for the three time windows required: short term (4 days), medium term (4 weeks), and long term (8 months). The training-to-test split was set at 4:1, so the periods listed in the table (4 days, 4 weeks, and 8 months) are four times the length of the corresponding test sets. Importantly, all parameter tuning and selection were conducted exclusively on the training set, and the test set was reserved solely for final model evaluation. Log-likelihood values were compared, and the highest point of the three-dimensional likelihood surface was selected as the optimal combination of p and q. The likelihood surface is shown in Figure 3, and the optimal values of p and q for each period are shown in Table 1.
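A chronological 4:1 split can be sketched as follows. The sketch assumes the test segment is the most recent part of each window; the text does not state the ordering explicitly. At 5 min resolution there are 288 readings per day, so a 5-day window yields a 4-day training segment and a 1-day test segment. `chronological_split` and `POINTS_PER_DAY` are hypothetical names.

```python
POINTS_PER_DAY = 24 * 60 // 5  # 288 readings per day at a 5 min interval


def chronological_split(series, train_parts=4, test_parts=1):
    """Split a time series in order (no shuffling) into train and test segments."""
    cut = len(series) * train_parts // (train_parts + test_parts)
    return series[:cut], series[cut:]
```

Keeping the split in time order prevents future readings from leaking into the training data.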

3.1.2. ARIMA–LSTM–Transformer’s Loss Function

A loss function measures the difference between the model’s predictions and the observations and so provides a quantitative metric for model evaluation [33]. The smaller the loss, the closer the model’s forecasts are to the observed values and the better the model performs. Among the many loss functions, the mean squared error (MSE) is commonly used for regression problems and is defined as follows [Equation (2)]:

\mathrm{Loss} = \frac{1}{n} \sum_{i=1}^{n} \left(y_{i} - \hat{y}_{i}\right)^{2}

(2)

where y_{i} is the observed value of wastewater flow, \hat{y}_{i} is the predicted value of wastewater flow, and n is the total number of samples.

Figure 4 shows the training loss (loss) and validation loss (val_loss) curves of the ARIMA–LSTM–Transformer model during training. The panels are arranged from left to right as follows:

- the first shows the validation loss on the 1-day test set and the training loss on the 4-day training set;
- the second shows the validation loss on the 1-week test set and the training loss on the 4-week training set;
- the third shows the validation loss on the 1-month test set and the training loss on the 4-month training set.

The training loss reflects the model’s forecast error on the training data and thus its ability to learn the features of that data. The validation loss is calculated on data independent of the training set and measures how well the model generalizes to unseen data.

Both the training and validation losses decrease and stabilize at relatively low values, and the gap between them does not widen significantly. This indicates that the model does not suffer from serious overfitting. Moreover, the losses converge within a finite number of iterations, which suggests that the learning process is effective.

3.2. Ablation Experiments

An ablation experiment is a method commonly used in scientific research [34], especially in machine learning and deep learning. It removes components or features from the model one at a time and observes how performance changes. This reveals the specific role and contribution of each component or feature to overall model performance. Each experiment was run five times per time interval; the most and least accurate forecasts were discarded, and the remaining three were averaged. The average predicted wastewater flow values of the ARIMA and ARIMA–Markov models were compared with the observed values (Figure 5), and the residual plot of the predicted flowrates for each model is shown in Figure 6.

The graph shows that the predictions of the ARIMA–Markov model are closer to the observed values. To quantify the comparison between ARIMA–Markov and ARIMA, the evaluation metrics of both models were calculated, and the results are shown in Table 2.
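The tables report R², MAE, and RMSE without restating their formulas. A minimal sketch using the standard definitions is given below: R² = 1 − SS_res/SS_tot, and RMSE is the square root of the MSE in Equation (2). That the paper uses exactly these definitions is an assumption, and `evaluate` is a hypothetical helper.

```python
import math


def evaluate(observed, predicted):
    """Return MAE, RMSE and R^2 for paired observed/predicted flow values."""
    n = len(observed)
    errors = [y - p for y, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    sse = sum(e * e for e in errors)  # residual sum of squares
    mse = sse / n  # Equation (2)
    mean_y = sum(observed) / n
    ss_tot = sum((y - mean_y) ** 2 for y in observed)  # total sum of squares
    r2 = 1 - sse / ss_tot if ss_tot else float("nan")
    return {"MAE": mae, "RMSE": math.sqrt(mse), "R2": r2}
```

MAE and RMSE are in the units of the flowrate, while R² is dimensionless, which is why the tables compare R² changes as percentages and MAE/RMSE changes as absolute values or percentages.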

As Table 2 shows, the ARIMA model achieves its highest R² (0.7501) and its lowest MAE and RMSE (193.1590 and 296.5314, respectively) at the 1-day forecast horizon. A single ARIMA model is thus more accurate over short horizons. However, it struggles to capture medium- and long-term patterns, which leaves room for improvement.

Adding the Markov module to correct the residuals significantly improves forecast accuracy over the short, medium, and long horizons. R² increases by an average of 17.52%, and MAE and RMSE decrease by averages of 29.67% and 28.81%, respectively. The long-term forecast improves most markedly: R² increases by 27.37%, and MAE and RMSE decrease by 43.60% and 43.06%, respectively. These results show that the hybrid model greatly improves the long-term forecasting ability of a single ARIMA model and confirm the accuracy and feasibility of combining ARIMA and Markov.

A second ablation experiment was based on the ARIMA, LSTM, and Transformer models. Again, each experiment was run five times per time interval; the most and least accurate forecasts were discarded, and the remaining three were averaged. The average predicted wastewater flow values of the LSTM, Transformer, LSTM–Transformer, and ARIMA–LSTM–Transformer models were compared with the observed values (Figure 7), and the residual plot of the predicted flowrates for each model is shown in Figure 8.

The graph shows that the predictions of the ARIMA–LSTM–Transformer model are the closest to the observed values. The evaluation metrics of these four models are listed in Table 3.

As Table 3 shows, the single Transformer and LSTM models both achieve their highest accuracy at the 2-month horizon. The Transformer reaches an R² of 0.8546, with an MAE of 189.9022 and an RMSE of 265.6617. In this experiment, therefore, a single Transformer or LSTM model forecasts long-term conditions more accurately. However, it struggles to capture short-term local features, which leaves room for improvement.

Fusing the Transformer and LSTM models greatly improves accuracy over the single models, especially for the medium-term forecast. The R² of the LSTM–Transformer model is 8.36% higher than that of the single Transformer model and 12.13% higher than that of the single LSTM model. Relative to the single Transformer model, the LSTM–Transformer model reduces MAE and RMSE by 24.36% and 17.37%, respectively. Relative to the single LSTM model, it reduces them by 28.46% and 22.19%, respectively.

However, the hybrid model’s accuracy does not improve significantly for short- and long-term forecasts. Compared with the single Transformer model, its R² increases by only 0.73% and 1.43%, respectively.

Because this model appeared limited in short-term forecasting, the ARIMA model was added. The resulting ARIMA–LSTM–Transformer hybrid model performs well in short-, medium-, and long-term forecasts. Its short-term R² is 4.32% higher than that of the LSTM–Transformer model, and its medium- and long-term forecasts are also strong, with an average R² improvement of 4.24%.

Overall, the ARIMA–LSTM–Transformer model is the best choice among these models, especially for long-term forecasting. Moreover, the medium- and long-term dataset used in this study includes periods of heavy precipitation. These results therefore further demonstrate the model’s robustness to disturbances and validate the applicability and superiority of integrating ARIMA, LSTM, and Transformer.

After each hybrid model was compared with its components, the two hybrid models were compared with each other. In the one-day short-term forecast, ARIMA–Markov is more accurate. Compared with the ARIMA–LSTM–Transformer model, its R² is higher by 0.0293, and its MAE and RMSE are lower by 28.9061 and 43.7782, respectively. This is probably because the short-term dataset is small. ARIMA directly models the linear relationship between the current value and recent historical values through its autoregressive terms. This makes it sensitive to short-term trends and naturally suited to small samples and local dependence.

Deep learning models such as LSTM and Transformer, by contrast, are good at capturing complex nonlinear relationships in long sequences but require large amounts of training data. With too little data they may overfit, which reduces accuracy. Accordingly, ARIMA–LSTM–Transformer is more accurate in the medium- and long-term forecasts. Compared with ARIMA–Markov, its average R² is higher by 0.0247, and its MAE and RMSE are lower by 14.1839 and 28.3393, respectively. Overall, ARIMA–LSTM–Transformer is the more accurate model.

3.3. Stability Test for ARIMA–LSTM–Transformer Model

Both hybrid models involve random processes:

- ARIMA–Markov uses a Markov transition probability matrix, so transitions between states are random;
- ARIMA–LSTM–Transformer uses weight matrices and the Adam optimization algorithm.

Stability is therefore also an important criterion for judging a forecast model. To test the stability of ARIMA–Markov and ARIMA–LSTM–Transformer, each of the three datasets selected above was forecast ten times with exactly the same parameters. The MAE, RMSE, and R² of the ten forecasts were collected into arrays of ten values each. The standard deviation of each array was then calculated to measure stability, as shown in Table 4.
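The stability measure can be sketched as the standard deviation of each metric across repeated runs. The paper does not state whether it uses the sample or the population standard deviation. The sketch assumes the sample version (`statistics.stdev`), and the input format (one dict of metrics per run) is hypothetical.

```python
import statistics


def stability(metric_runs):
    """Standard deviation of each metric across repeated runs (lower means more stable).

    metric_runs: list of dicts such as {"MAE": ..., "RMSE": ..., "R2": ...},
    one per repeated forecast (ten per dataset in the text).
    """
    keys = metric_runs[0].keys()
    return {k: statistics.stdev([run[k] for run in metric_runs]) for k in keys}
```

A model whose random initialization or state transitions barely affect its forecasts will show standard deviations close to zero for all three metrics.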

Table 4 shows that ARIMA–LSTM–Transformer is more stable than ARIMA–Markov in short-, medium-, and long-term forecasting, and that it is most stable for long-term forecasts. As shown above, ARIMA–LSTM–Transformer is comparable to ARIMA–Markov in short-term accuracy and more accurate in medium- and long-term forecasting. Considering both accuracy and stability, the ARIMA–LSTM–Transformer model therefore performs better for wastewater flow forecasting.

4. Algorithm Optimization

4.1. Parameter Selection

Although the ARIMA–LSTM–Transformer model achieves good results, it has many hyperparameters and a heavy computational load, so optimizing its hyperparameters is an important problem. Optimization algorithms (e.g., SSA) can guide hyperparameter selection and improve model accuracy [35]. In this study, three optimization algorithms, WOA [36], PSO [37], and SSA [38], were applied to the ARIMA–LSTM–Transformer model, and experiments showed that WOA worked best. The initial settings of each optimization algorithm are shown in Table A3 (see Appendix D), and the WOA optimization framework is shown in Figure 9. The figure shows the complete workflow, from preprocessing of the wastewater flow data, through model construction and feature extraction, to WOA-based hyperparameter optimization.
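For illustration, a compact version of the Whale Optimization Algorithm is sketched below. It follows the three standard update rules:

- encircling the best whale when |A| < 1;
- exploring around a random whale when |A| ≥ 1;
- a bubble-net spiral toward the best whale with constant b = 1.

The sketch minimizes a generic continuous objective. In the paper’s setting, the objective would presumably train the ARIMA–LSTM–Transformer model and return a validation error, with integer hyperparameters such as layer counts rounded. That wiring, the population size, the iteration count, and the function name `woa_minimize` are assumptions and are not the settings of Table A3.

```python
import math
import random


def woa_minimize(objective, bounds, n_whales=20, n_iter=200, seed=0):
    """Minimise `objective` over the box `bounds` with a basic Whale Optimization Algorithm.

    bounds: list of (low, high) pairs, one per dimension.
    Returns (best_position, best_value).
    """
    rng = random.Random(seed)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    whales = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_whales)]
    best = min(whales, key=objective)
    best_val = objective(best)
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decreases linearly from 2 towards 0
        for i, x in enumerate(whales):
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale; otherwise explore around a random whale.
                ref = best if abs(A) < 1 else whales[rng.randrange(n_whales)]
                new = [r - A * abs(C * r - v) for r, v in zip(ref, x)]
            else:
                # Bubble-net attack: logarithmic spiral around the best whale (b = 1).
                ell = rng.uniform(-1.0, 1.0)
                new = [abs(r - v) * math.exp(ell) * math.cos(2 * math.pi * ell) + r
                       for r, v in zip(best, x)]
            whales[i] = clip(new)
            val = objective(whales[i])
            if val < best_val:
                best, best_val = whales[i], val
    return best, best_val
```

Because `a` shrinks over the run, early iterations favour exploration (|A| can exceed 1) and later iterations favour exploitation around the best solution found so far.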

4.2. Accuracy Testing

As before, forecast horizons of one day, seven days, and two months were selected, and the corresponding R², RMSE, and MAE were calculated to assess the accuracy of the optimized models. Each experiment was repeated five times per time interval, and the best and worst forecasts were discarded. The average wastewater flowrate forecasts of the remaining three runs were recorded and used to plot the predicted flowrate curves shown in Figure 10. The residual plot of the predicted flowrates for each model is shown in Figure 11.
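The repeat-and-trim procedure can be sketched as below. The five forecasts are ranked by accuracy, the best and worst are dropped, and the remaining three are averaged point by point. Ranking by RMSE is an assumption, since the text does not say which metric defines the most and least accurate run. The function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / len(observed))


def trimmed_run_average(observed, runs):
    """Drop the most and least accurate runs (ranked by RMSE) and average the rest pointwise."""
    if len(runs) < 3:
        raise ValueError("need at least three runs to drop the best and worst")
    ranked = sorted(runs, key=lambda run: rmse(observed, run))
    kept = ranked[1:-1]
    return [sum(values) / len(kept) for values in zip(*kept)]
```

Dropping both extremes reduces the influence of an unusually lucky or unlucky random initialization on the plotted curve.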

To evaluate the performance of each optimization algorithm in the ARIMA–LSTM–Transformer model more comprehensively, the evaluation metrics were calculated, and the results are shown in Table 5.

A comparison of Table 3 and Table 5 shows that, relative to the unoptimized ARIMA–LSTM–Transformer model, adding SSA does not improve prediction accuracy. R² for the short- and long-term forecasts increases by only 0.19% and 0.41%, respectively, and R² for the medium-term forecast actually drops by 3.97%. Possible causes include the following:

- the model’s inherent complexity;
- a mismatch between the model and SSA’s optimization mechanism;
- the high nonlinearity introduced by precipitation in the dataset;
- external factors such as holidays.

In contrast, adding PSO or WOA improves the short-, medium-, and long-term forecasts, and WOA yields the larger gains. Compared with the unoptimized model, WOA increases the average R² by 2.78%, versus 1.49% for PSO. WOA also decreases MAE and RMSE by 12.99% and 11.22%, respectively, versus 5.99% and 6.46% for PSO. Among the three algorithms tested, WOA therefore provides the best optimization effect.

4.3. Stability Test for Optimization Algorithms

To test the stability of the optimized ARIMA–LSTM–Transformer models, the forecast on each of the three datasets selected above was repeated ten times with identical parameters. The $MAE$, $RMSE$, and $R^{2}$ of the ten forecasts were collected into arrays of ten values each, and the standard deviation of each array was calculated to compare stability. The results are shown in Table 6.

As shown in Table 6, adding an optimization algorithm does not improve stability uniformly relative to the unoptimized model (Table 4); SSA and PSO yield higher standard deviations than the unoptimized model in several cases. The standard deviations of $R^{2}$, $RMSE$, and $MAE$ for the WOA-optimized model are the lowest, or tied for the lowest, in every case except the 1-day $RMSE$, for which PSO is slightly lower (2.3340 versus 2.7386), indicating that the model optimized by WOA is more stable than those optimized by the other two algorithms. Compared with the unoptimized model, WOA optimization reduces the standard deviation of $R^{2}$ by an average of 0.0061, that of $RMSE$ by an average of 5.6039, and that of $MAE$ by an average of 4.0102. This indicates that the ARIMA–LSTM–Transformer hybrid model optimized by WOA has good stability in short-, medium-, and long-term forecasts.
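The averaged reductions in standard deviation follow directly from Table 4 (unoptimized ARIMA–LSTM–Transformer) and Table 6 (WOA). The sketch below reproduces them as the mean, over the three horizons, of the per-horizon difference between the two standard deviations.

```python
# Standard deviations over ten repeated forecasts, ordered (1 day, 7 days, 2 months).
BASE = {  # unoptimized ARIMA–LSTM–Transformer, Table 4
    "R2": [0.0157, 0.0123, 0.0036],
    "RMSE": [16.2493, 6.0679, 3.8584],
    "MAE": [12.9359, 5.2432, 4.9143],
}
WOA = {  # WOA-optimized model, Table 6
    "R2": [0.0037, 0.0076, 0.0019],
    "RMSE": [2.7386, 4.0882, 2.5371],
    "MAE": [2.8193, 6.1603, 2.0831],
}

def mean_decrease(base, optimized):
    """Average over horizons of (base std - optimized std); positive means more stable."""
    diffs = [b - o for b, o in zip(base, optimized)]
    return sum(diffs) / len(diffs)

decrease = {metric: round(mean_decrease(BASE[metric], WOA[metric]), 4) for metric in BASE}
# -> {"R2": 0.0061, "RMSE": 5.6039, "MAE": 4.0102}
```

The per-horizon differences are not all positive: the 7-day $MAE$ standard deviation is higher after WOA optimization (6.1603 versus 5.2432), so the average reduction of 4.0102 masks one increase.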

5. Conclusions

To address the complex problem of urban wastewater flowrate forecasting, WOA, ARIMA, LSTM, and Transformer were combined into a hybrid forecasting model, which achieved good results.

The experimental results show that ARIMA accurately captures the local characteristics and trends of short-term wastewater flow changes and passes them to LSTM as one of its input features. On this basis, LSTM captures more complex temporal dependencies, and features such as holidays were also quantified to enable multivariate regression. Transformer substantially enhances the forecast accuracy of the hybrid model for medium- and long-term data, and WOA optimizes the number of LSTM layers and the numbers of Transformer encoder and decoder layers, reducing the computational cost and the risk of overfitting of the hybrid model.

After the introduction of the WOA, ARIMA, and LSTM modules, $R^{2}$ increased by 10.86%, $RMSE$ decreased by 25.21%, and $MAE$ decreased by 30.02% compared with a single Transformer, a substantial improvement. The comparison between single and hybrid models shows that the hybrid models achieve higher accuracy and stability. However, the hybrid model still has limitations: it may fail to predict abnormal wastewater flow caused by high summer temperatures and water evaporation, or by water and power outages in specific areas of the city, which should be the focus of subsequent research.

Future work includes (1) further optimizing the structure and performance of the model, since, although the proposed model performed well in this study, its short-term forecasts of wastewater flow are still not sufficiently accurate; and (2) enhancing the model's robustness to disturbances by incorporating more factors that affect wastewater flow, such as urban population migration and urban water and power outages.

Author Contributions

Conceptualization, J.Y. and H.W.; formal analysis, X.M.; validation, X.M.; supervision, H.W.; data curation, J.Y., X.M., Q.Z. and S.A.; investigation, J.Y. and T.A.; methodology, J.Y., X.M., H.W., Q.Z. and S.A.; writing—original draft, J.Y., X.M., Q.Z. and S.A.; visualization, J.Y. and P.G.B.; writing—review and editing, J.Y., H.W., T.A., P.G.B. and D.R.; funding acquisition, J.Y. and H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the 2025 University Student’s Innovation and Entrepreneurship Training Project of China University of Geosciences, Beijing. It is also supported by the 2024 Subject Development Research Fund Project of China University of Geosciences, Beijing (grant No. 2024XK208) and 2025 Special Projects for Graduate Education and Teaching Reform of China University of Geosciences, Beijing (Grant No. JG2025031).

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

After missing-value imputation, the dataset includes both historical wastewater flow data and the corresponding real-time rainfall data, recorded at 1 min intervals throughout 2024. To obtain forecast results for different time spans, the forecast horizon was set to one day, seven days, and two months, giving short-, medium-, and long-term results for each model. Details of the data used to train the models are shown in Table A1. Because precipitation often contains short dry gaps, rainfall periods separated by gaps of 2 h or less were treated as a single precipitation event.
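The 2 h rule for merging rainfall periods into events can be sketched as below, assuming the input is a list of (timestamp, precipitation in mm) records with nonzero rainfall; the function name and record layout are hypothetical. The per-event totals are the quantity whose maximum corresponds to the largest single-event precipitation in Table A1.

```python
from datetime import datetime, timedelta

def group_rain_events(records, max_gap=timedelta(hours=2)):
    """Merge rainy records into events; a dry gap <= max_gap does not end an event.

    records: iterable of (timestamp, precipitation_mm) with precipitation > 0.
    Returns a list of (start, end, total_mm) tuples.
    """
    events = []  # each event is [start, end, total_mm]
    for t, mm in sorted(records):
        if events and t - events[-1][1] <= max_gap:
            events[-1][1] = t       # extend the current event
            events[-1][2] += mm     # accumulate its precipitation
        else:
            events.append([t, t, mm])  # gap too long: start a new event
    return [tuple(event) for event in events]

t0 = datetime(2024, 8, 12, 0, 0)
recs = [
    (t0, 1.0),
    (t0 + timedelta(minutes=5), 2.0),
    (t0 + timedelta(minutes=90), 0.5),  # 85 min gap: same event
    (t0 + timedelta(hours=4), 3.0),     # 2.5 h gap: new event
    (t0 + timedelta(hours=6), 1.0),     # exactly 2 h gap: same event
]
events = group_rain_events(recs)
max_event_total = max(total for _, _, total in events)
```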

Table A1. Dataset description.

Time Span	Range of Wastewater Flowrate ($m^{3}\,h^{-1}$)	Number of Rainfall Events	Maximum Single-Event Total Precipitation (mm)	Training Sample Size	Testing Sample Size
12–17 August 2024	858.70–5058.02	11	23.40	1052	288
4 May–8 June 2024	889.37–5061.88	59	98.40	8064	2016
15 January 2024–15 November 2024	723.27–5068.71	331	99.72	70,272	17,280

Appendix B

Holidays also affect wastewater flow, but unlike precipitation and other factors that can be read directly from the data, holiday information is a discrete categorical feature. Therefore, in this paper, the statutory public holidays in China are marked as a binary feature (1 for a holiday and 0 for a non-holiday). This binary (categorical) holiday feature is passed to the hybrid model as an additional input to help it capture the impact of holidays on wastewater flow.
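A binary holiday feature of this kind can be built from timestamps as sketched below with pandas. The holiday list is an illustrative subset of 2024 Chinese public holidays (New Year's Day, the Labor Day break, and the National Day break), not the full official calendar used by the authors, and the function name is hypothetical.

```python
import pandas as pd

# Illustrative subset of 2024 public holidays in China (not the full official calendar).
HOLIDAYS_2024 = pd.to_datetime(
    ["2024-01-01"]
    + [f"2024-05-0{d}" for d in range(1, 6)]   # Labor Day break, 1-5 May
    + [f"2024-10-0{d}" for d in range(1, 8)]   # National Day break, 1-7 October
)

def holiday_flag(index: pd.DatetimeIndex) -> pd.Series:
    """Return 1 where the timestamp falls on a listed holiday, else 0."""
    on_holiday = index.normalize().isin(HOLIDAYS_2024)  # compare calendar dates only
    return pd.Series(on_holiday.astype(int), index=index, name="holiday")

# 5 min timestamps around midnight on 30 April / 1 May 2024.
idx = pd.date_range("2024-04-30 23:50", "2024-05-01 00:10", freq="5min")
flags = holiday_flag(idx)
```

The resulting column can be joined to the flowrate and rainfall series as one more model input.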

Appendix C

Figure A1. QQ plot of the dataset.

This graph is used to show the relationship between the sample quantiles and the theoretical quantiles. If the sample data follow a normal distribution, the blue dots in the graph should approximately lie on the diagonal line.
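The points of such a plot can be computed as sketched below: the sorted, standardized sample is paired with standard normal quantiles at plotting positions $(i - 0.5)/n$, one common convention among several. If the data are close to normal, the pairs lie near the line $y = x$ and their correlation is close to 1. The function name is illustrative, and the synthetic sample stands in for the flowrate data.

```python
import numpy as np
from statistics import NormalDist

def qq_points(sample):
    """Return (theoretical standard normal quantiles, sorted standardized sample)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    probs = (np.arange(1, n + 1) - 0.5) / n                # plotting positions in (0, 1)
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    standardized = (x - x.mean()) / x.std(ddof=1)          # so the reference line is y = x
    return theoretical, standardized

rng = np.random.default_rng(42)
tq, sq = qq_points(rng.normal(loc=2000.0, scale=500.0, size=1000))
r = float(np.corrcoef(tq, sq)[0, 1])                       # near 1 for normal data
```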

As a core component of the Transformer, the multi-head attention mechanism plays a crucial role in capturing multi-dimensional information and enhancing the model’s expressive ability. Figure A2 briefly introduces the framework of the multi-head attention mechanism, including modules such as the linear transformation layer and the feature matrix.

Figure A2. Schematic diagram of the multi-head attention mechanism.
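The mechanism can be illustrated with a minimal NumPy sketch of scaled dot-product multi-head self-attention in its standard form: project the input into queries, keys, and values, split them into heads, apply softmax attention per head, concatenate, and apply an output projection. The random weights and dimensions are illustrative only and are unrelated to the trained model in this paper.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """Self-attention over x of shape (seq_len, d_model); each w_* is (d_model, d_model)."""
    seq_len, d_model = x.shape
    assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
    d_head = d_model // n_heads

    def split(m):  # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return m.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (n_heads, seq_len, seq_len)
    weights = softmax(scores, axis=-1)                     # each row sums to 1
    heads = weights @ v                                    # (n_heads, seq_len, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights

rng = np.random.default_rng(0)
d_model, n_heads, seq_len = 16, 4, 12
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(4))
out, attn = multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads)
```

Each head attends over the whole sequence in its own lower-dimensional subspace, which is what lets the Transformer combine several kinds of temporal dependency at once.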

Figure A3. This graphic abstract shows the entire process of modeling.

Appendix D

Table A2. Hyperparameter ranges for each model.

Name	Hyperparameters	Parameter Ranges
Markov	Number of states	[30, 50]
ARIMA	Autoregressive term order $p$	[2, 4]
	Differencing order $d$	[0, 3]
	Moving average term order $q$	4
Transformer	Number of attention heads	2, 4, 8
	Number of encoder layers	[1, 6]
	Hidden layer dimensions	256, 512, 1024
	Feedforward layer dimension	1024, 2048
LSTM	Number of neurons	32, 50, 64
	Dropout rate	[0, 0.5]

Table A3. The initial values of each optimization algorithm.

Algorithm Name	Parameters	Initial Values
WOA	Population size	50
	Maximum number of iterations	250
	Convergence constant	Decreasing linearly from 2 to 0
	Spiral shape parameter	1
PSO	Number of particles	40
	Maximum number of iterations	250
	Inertia weight	Decreasing linearly from 0.9 to 0.4
	Individual and social learning factors ($c_{1}$, $c_{2}$)	2, 2
SSA	Population size	50
	Maximum number of iterations	250
	Proportion of finders	[0.1, 0.3]
	Proportion of alerters	[0.1, 0.2]
	Safety threshold	[0.5, 1]

In this paper, the ARIMA, Markov, LSTM, and Transformer models involve many hyperparameters, and the SSA, PSO, and WOA optimization algorithms also have numerous parameters that need to be initialized. The hyperparameter ranges of each model and the initial values of each optimization algorithm are shown in Table A2 and Table A3, respectively.

Appendix E

Figure A4. The scatter plot illustrates the comparison between predicted and observed flow rates on the test set for different models. The x-axis represents the observed flow rate, the y-axis the predicted flow rate, and the black dashed line the ideal 1:1 reference line—points closer to this line indicate more accurate predictions. (a–c) The scatter plots formed by the ARIMA model and the ARIMA–Markov model for 1-day, 7-day, and 2-month predictions, respectively. (d–f) The scatter plots formed by the LSTM model, Transformer model, LSTM–Transformer, and ARIMA–LSTM–Transformer for 1-day, 7-day, and 2-month predictions, respectively. (g–i) The scatter plots formed by the ARIMA–LSTM–Transformer model optimized by SSA, PSO, and WOA algorithms for 1-day, 7-day, and 2-month predictions, respectively.

Appendix F

Because many models are used in this paper, many abbreviations appear. To make reading easier, the abbreviations are listed in Table A4 for cross-reference.

Table A4. Abbreviation table.

Abbreviation	Full Term
ARIMA	Autoregressive Integrated Moving Average
BiLSTM	Bidirectional Long Short-Term Memory
KDE	Kernel Density Estimation
LSTM	Long Short-Term Memory
MAD	Median Absolute Deviation
MAPE	Mean Absolute Percentage Error
NSE	Nash–Sutcliffe Efficiency
PSO	Particle Swarm Optimization
QQ	Quantile–Quantile (plot)
R²	Coefficient of Determination
RF	Random Forest
RNN	Recurrent Neural Network
SSA	Sparrow Search Algorithm
VMD	Variational Mode Decomposition
WOA	Whale Optimization Algorithm
WWTP	Wastewater Treatment Plant

Appendix G

1. Whale Optimization Algorithm

The Whale Optimization Algorithm (WOA) is a swarm intelligence optimization algorithm inspired by the hunting behavior of humpback whales. Its core idea is to combine two search mechanisms: encircling the prey and spiral updating. The algorithm steps are as follows:

First, the parameters are initialized: the whale population size ($N$) and the maximum number of iterations are set, and the positions of the whales are initialized. Then, the fitness value of each whale is calculated from the objective function, and the position of the current global optimal solution is updated accordingly. Based on the position of each whale and that of the global optimal solution, one of the following two strategies is used to update its position. (1) Encircling the prey: the whale approaches the prey by updating its position as follows:

X_{t + 1} = X_{t} + A \cdot D

(A1)

(2) Spiral update: the spiral movement of the whale around the prey is simulated by the following position update:

X_{t+1} = X_{p} + A \cdot \cos(2\pi l) \cdot D

(A2)

where $X_{t}$ is the position of the whale at iteration $t$; $X_{p}$ is the position vector of the prey, i.e., the target position the whale approaches; $A$ is a coefficient vector that determines the step size and direction of the position update; $D$ is the distance vector between the current whale position and the prey position; and $l$ is a random number in $[-1, 1]$.

When the stopping condition (e.g., reaching the maximum number of iterations or convergence of the fitness values) is met, the optimal solution is output; otherwise, the algorithm returns to the fitness-evaluation step.
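The loop above can be sketched as follows. The sketch uses the commonly cited formulation from the original WOA paper (Mirjalili and Lewis, 2016), which writes the encircling step as $X^{*} - A \cdot D$ with $D = |C \cdot X^{*} - X|$, adds a random-whale exploration branch when $|A| \geq 1$, and scales the spiral step by $e^{bl}$; its notation therefore differs slightly from Equations (A1) and (A2). The population size, iteration count, linearly decreasing convergence constant, and spiral parameter follow Table A3; the sphere test function is an assumption for demonstration.

```python
import numpy as np

def woa_minimize(f, lb, ub, n_whales=50, max_iter=250, b=1.0, seed=0):
    """Minimize f over the box [lb, ub] with a basic Whale Optimization Algorithm."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    X = rng.uniform(lb, ub, size=(n_whales, lb.size))       # initial whale positions
    fit = np.array([f(x) for x in X])
    best_x, best_f = X[np.argmin(fit)].copy(), float(fit.min())
    for t in range(max_iter):
        a = 2.0 - 2.0 * t / max_iter                         # convergence constant: 2 -> 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a                   # coefficient A in [-a, a]
            C = 2.0 * rng.random()                           # coefficient C in [0, 2]
            if rng.random() < 0.5:
                # |A| < 1: encircle the best whale (exploitation);
                # |A| >= 1: move relative to a random whale (exploration).
                ref = best_x if abs(A) < 1 else X[rng.integers(n_whales)]
                D = np.abs(C * ref - X[i])
                X[i] = ref - A * D
            else:
                l = rng.uniform(-1.0, 1.0)
                D = np.abs(best_x - X[i])                    # distance to the prey
                X[i] = D * np.exp(b * l) * np.cos(2 * np.pi * l) + best_x  # spiral update
        X = np.clip(X, lb, ub)
        fit = np.array([f(x) for x in X])
        if fit.min() < best_f:                               # keep the best solution found
            best_x, best_f = X[np.argmin(fit)].copy(), float(fit.min())
    return best_x, best_f

sphere = lambda x: float(np.sum(x ** 2))                     # minimum 0 at the origin
best_x, best_f = woa_minimize(sphere, [-10.0] * 5, [10.0] * 5)
```

In the hyperparameter setting, `f` would decode a position into model settings and return a validation error.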

2. Particle Swarm Optimization Algorithm

Particle Swarm Optimization (PSO) is a global optimization algorithm based on swarm intelligence. Its core idea is to represent candidate solutions to the problem as “particles,” which search for the optimal solution by adjusting their positions in the search space using both their own best experience and that of the swarm.

The formula for updating the velocity and position of particles is as follows:

Velocity update:

v_{id}^{k+1} = \omega \cdot v_{id}^{k} + c_{1} \cdot r_{1} \cdot (pbest_{id} - x_{id}^{k}) + c_{2} \cdot r_{2} \cdot (gbest_{d} - x_{id}^{k})

(A3)

Position update:

x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}

(A4)

where $v_{id}^{k}$ and $x_{id}^{k}$ are the velocity and position of particle $i$ in dimension $d$ at iteration $k$; $pbest_{id}$ is the best position found so far by particle $i$ in dimension $d$; $gbest_{d}$ is the best position found so far by the whole swarm in dimension $d$; $\omega$ is the inertia weight; $c_{1}$ and $c_{2}$ are the learning factors, which weight the pull toward the individual and global best positions; and $r_{1}$ and $r_{2}$ are random numbers uniformly distributed in [0, 1], which add randomness to the search.
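Equations (A3) and (A4), with the linearly decreasing inertia weight and the learning factors from Table A3, can be sketched as below. The velocity clamp (20% of the search range) and the sphere test function are assumptions added for demonstration; neither is specified in the paper.

```python
import numpy as np

def pso_minimize(f, lb, ub, n_particles=40, max_iter=250, w_start=0.9, w_end=0.4,
                 c1=2.0, c2=2.0, v_frac=0.2, seed=0):
    """Minimize f over the box [lb, ub] with inertia-weight PSO."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = lb.size
    v_max = v_frac * (ub - lb)                     # velocity clamp (assumption)
    x = rng.uniform(lb, ub, size=(n_particles, dim))
    v = rng.uniform(-v_max, v_max, size=(n_particles, dim))
    pbest = x.copy()
    pbest_f = np.array([f(p) for p in x])
    g = int(np.argmin(pbest_f))
    gbest, gbest_f = pbest[g].copy(), float(pbest_f[g])
    for k in range(max_iter):
        w = w_start - (w_start - w_end) * k / max(max_iter - 1, 1)  # 0.9 -> 0.4 linearly
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # Eq. (A3)
        v = np.clip(v, -v_max, v_max)
        x = np.clip(x + v, lb, ub)                                  # Eq. (A4)
        fx = np.array([f(p) for p in x])
        improved = fx < pbest_f                                     # update personal bests
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        g = int(np.argmin(pbest_f))
        if pbest_f[g] < gbest_f:                                    # update global best
            gbest, gbest_f = pbest[g].copy(), float(pbest_f[g])
    return gbest, gbest_f

sphere = lambda x: float(np.sum(x ** 2))                            # minimum 0 at the origin
best_x, best_f = pso_minimize(sphere, [-10.0] * 5, [10.0] * 5)
```

A large inertia weight early on favors exploration; shrinking it toward 0.4 lets the swarm settle around the best positions found.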

3. Sparrow Search Algorithm

The Sparrow Search Algorithm (SSA) is a swarm intelligence optimization algorithm that simulates the foraging and anti-predation behaviors of sparrows. Its core idea is to divide the sparrow population into different roles: finders, scavengers, and alerters. The algorithm steps are as follows:

First, the positions of the sparrow population are randomly generated, and the algorithm parameters, such as the maximum number of iterations and the population size, are set. The fitness value of each sparrow is then calculated from the objective function, and the population is ranked by fitness to determine the global optimal solution. The positions are updated repeatedly until the stopping condition is met, using the following formulas:

Finder Location Update:

Without predators:

X_{i,j}^{t+1} = X_{i,j}^{t} \cdot \exp\left(\frac{-i}{\alpha \cdot iter_{max}}\right)

(A5)

Predators detected:

X_{i,j}^{t+1} = X_{i,j}^{t} + Q \cdot L

(A6)

where $X_{i,j}^{t}$ is the current position of the $i$th sparrow in dimension $j$, $iter_{max}$ is the maximum number of iterations, $\alpha$ is a random number in (0, 1], $Q$ is a random number that follows a normal distribution, and $L$ is a matrix whose elements are all 1.

Scavenger position update (when a scavenger with poor fitness needs to find a new food source):

X_{i,j}^{t+1} = X_{i,j}^{t} \cdot \exp\left(\frac{X_{worst}^{t} - X_{i,j}^{t}}{i^{2}}\right)

(A7)

Otherwise, the scavengers follow the finder:

X_{i,j}^{t+1} = X_{P}^{t+1} + |X_{i,j}^{t} - X_{P}^{t+1}| \cdot A^{+} \cdot L

(A8)

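where $X_{P}^{t+1}$ is the optimal position of the finders, $X_{worst}^{t}$ is the current global worst position, $A^{+}$ is derived from a random matrix (in the standard SSA, $A^{+} = A^{T}(AA^{T})^{-1}$, where each element of $A$ is randomly 1 or −1), and $L$ is a matrix whose elements are all 1.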

Alerter position update (when an alerter whose fitness is worse than the current best needs to find a new food source):

X_{i,j}^{t+1} = X_{best}^{t} + \beta \cdot |X_{i,j}^{t} - X_{best}^{t}|

(A9)

Otherwise, the alerter moves randomly:

X_{i,j}^{t+1} = X_{i,j}^{t} + k \cdot \left(\frac{|X_{i,j}^{t} - X_{worst}^{t}|}{(f_{i} - f_{w}) + \varepsilon}\right)

(A10)

where $X_{best}^{t}$ is the current global optimal position, $\beta$ is the step-size control parameter, $k$ is a random number in [−1, 1], $f_{i}$ is the fitness value of the current sparrow, $f_{w}$ is the current global worst fitness value, and $\varepsilon$ is a small constant that prevents the denominator from being zero.
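A compact sketch of the three SSA roles is given below. It follows the standard SSA of Xue and Shen (2020) rather than reproducing the equations above term by term: each sparrow keeps the best position it has found, the poorly ranked scavengers use a normally distributed factor $Q$ in place of $X_{i,j}^{t}$ in Equation (A7), and $A^{+} = A^{T}(AA^{T})^{-1}$ with random ±1 entries. The finder proportion (0.2), alerter proportion (0.1), and safety threshold (0.8) are values chosen from the ranges in Table A3; the sphere test function is an assumption for demonstration.

```python
import numpy as np

def ssa_minimize(f, lb, ub, pop=50, max_iter=250, pd=0.2, sd=0.1, st=0.8, seed=0):
    """Minimize f over the box [lb, ub] with a basic Sparrow Search Algorithm."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = lb.size
    n_find, n_alert = int(pd * pop), int(sd * pop)
    P = rng.uniform(lb, ub, size=(pop, dim))        # best position found by each sparrow
    pfit = np.array([f(x) for x in P])
    for _ in range(max_iter):
        order = np.argsort(pfit)                    # rank sparrows, best first
        P, pfit = P[order], pfit[order]
        X, worst = P.copy(), P[-1].copy()
        r2 = rng.random()                           # alarm value
        # Finders, Eqs. (A5)/(A6); i + 1 is the sparrow's rank.
        for i in range(n_find):
            if r2 < st:                             # no predator detected
                alpha = 1.0 - rng.random()          # alpha in (0, 1]
                X[i] = P[i] * np.exp(-(i + 1) / (alpha * max_iter))
            else:                                   # predator detected
                X[i] = P[i] + rng.normal() * np.ones(dim)
        X[:n_find] = np.clip(X[:n_find], lb, ub)
        x_p = X[np.argmin([f(x) for x in X[:n_find]])].copy()   # best finder, X_P
        # Scavengers, Eqs. (A7)/(A8).
        for i in range(n_find, pop):
            if i + 1 > pop / 2:                     # low-ranked: search elsewhere
                X[i] = rng.normal() * np.exp((worst - P[i]) / (i + 1) ** 2)
            else:                                   # follow the best finder
                a = rng.choice([-1.0, 1.0], size=dim)
                a_plus = a / dim                    # A^T (A A^T)^-1, because A A^T = dim
                X[i] = x_p + (np.abs(P[i] - x_p) @ a_plus) * np.ones(dim)
        # Alerters, Eqs. (A9)/(A10): a random subset becomes aware of danger.
        g_x, g_f = P[0], pfit[0]                    # current global best
        for i in rng.choice(pop, size=n_alert, replace=False):
            if pfit[i] > g_f:
                X[i] = g_x + rng.normal(size=dim) * np.abs(P[i] - g_x)  # beta ~ N(0, 1)
            else:
                k = rng.uniform(-1.0, 1.0)
                X[i] = P[i] + k * np.abs(P[i] - worst) / ((pfit[i] - pfit[-1]) + 1e-50)
        X = np.clip(X, lb, ub)
        fx = np.array([f(x) for x in X])
        better = fx < pfit                          # keep improvements only
        P[better], pfit[better] = X[better], fx[better]
    best = int(np.argmin(pfit))
    return P[best].copy(), float(pfit[best])

sphere = lambda x: float(np.sum(x ** 2))            # minimum 0 at the origin
best_x, best_f = ssa_minimize(sphere, [-10.0] * 5, [10.0] * 5)
```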

References

[1] Liu, R.; Han, D.; Zhang, H.; Ma, Y.; Hao, X. Reviewing and diagnosing upgradation strategies of wastewater treatment plants in China. J. Water Process Eng. 2025, 71, 107267.
[2] Zhou, P.; Li, Z.; Zhang, Y.; Snowling, S.; Barclay, J. Online machine learning for stream wastewater influent flow rate prediction under unprecedented emergencies. Front. Environ. Sci. Eng. 2023, 17, 152.
[3] Cai, S.; Zhang, Z.; Yang, X.; Lv, Q.; Liu, X.; Lai, R.; Yu, X.; Hu, Y. The modified theoretical model for debris flows predication with multiple rainfall characteristic parameters. Sci. Rep. 2025, 15, 12402.
[4] Grimm, N.B.; Faeth, S.H.; Golubiewski, N.E.; Redman, C.L.; Wu, J.; Bai, X.; Briggs, J.M. Global change and the ecology of cities. Science 2008, 319, 756–760.
[5] Han, X.; Yu, T.; Wang, W.; Shu, S.; Zhu, Y.; Li, H.; Hu, C. Shock load effects on WWTP performance under low loads following capacity expansion for combined sewer overflow control. Desalination Water Treat. 2025, 322, 101147.
[6] Chen, X.; Zhang, K.; Ji, Z.; Shen, X.; Liu, P.; Zhang, L.; Wang, J.; Yao, J. Progress and Challenges of Integrated Machine Learning and Traditional Numerical Algorithms: Taking Reservoir Numerical Simulation as an Example. Mathematics 2023, 11, 4418.
[7] Gong, X.; Li, B.; Yang, Y.; Li, M.; Li, T.; Zhang, B.; Zheng, L.; Duan, H.; Liu, P.; Hu, X.; et al. Construction and application of optimized model for mine water inflow prediction based on neural network and ARIMA model. Sci. Rep. 2025, 15, 2009.
[8] Maleki, A.; Nasseri, S.; Aminabad, M.S.; Hadi, M. Comparison of ARIMA and NNAR models for forecasting water treatment plant’s influent characteristics. KSCE J. Civ. Eng. 2018, 22, 3233–3245.
[9] Bień, J.D.; Bień, B. Forecasting the municipal sewage sludge amount generated at wastewater treatment plants using some machine learning methods. Desalination Water Treat. 2023, 288, 265–272.
[10] Li, D.; Sun, Y.; Sun, J.; Wang, X.; Zhang, X. An advanced approach for the precise prediction of water quality using a discrete hidden markov model. J. Hydrol. 2022, 609, 127659.
[11] Farhi, N.; Kohen, E.; Mamane, H.; Shavitt, Y. Prediction of wastewater treatment quality using LSTM neural network. Environ. Technol. Innov. 2021, 23, 101632.
[12] Chen, J.; N’Doye, I.; Myshkevych, Y.; Aljehani, F.; Monjed, M.K.; Laleg-Kirati, T.-M.; Hong, P.-Y. Viral particle prediction in wastewater treatment plants using nonlinear lifelong learning models. NPJ Clean Water 2025, 8, 28.
[13] Ali, A.J.; Ahmed, A.A. Long-term AI prediction of ammonium levels in rivers using transformer and ensemble models. Clean. Water 2024, 2, 100051.
[14] Russo, S.; Lürig, M.; Hao, W.; Matthews, B.; Villez, K. Active learning for anomaly detection in environmental data. Environ. Modell. Softw. 2020, 134, 104869.
[15] Cui, X.; Zhu, J.; Jia, L.; Wang, J.; Wu, Y. A novel heat load prediction model of district heating system based on hybrid whale optimization algorithm (WOA) and CNN-LSTM with attention mechanism. Energy 2024, 312, 133536.
[16] Du, B.; Huang, S.; Guo, J.; Tang, H.; Wang, L.; Zhou, S. Interval forecasting for urban water demand using PSO optimized KDE distribution and LSTM neural networks. Appl. Soft Comput. 2022, 122, 108875.
[17] Zhang, X.; Wang, X.; Li, H.; Sun, S.; Liu, F. Monthly runoff prediction based on a coupled VMD-SSA-BiLSTM model. Sci. Rep. 2023, 13, 13149.
[18] Villez, K.; Vanrolleghem, P.A.; Corominas, L. A general-purpose method for Pareto optimal placement of flow rate and concentration sensors in networked systems—With application to wastewater treatment plants. Comput. Chem. Eng. 2020, 139, 106880.
[19] Dhar, B.; Sajid, M. Mathematical analysis of scrub typhus seasonal infection with re-scaled transmission rate considering Northeast India reported data from 2010 to 2022. Sci. Rep. 2025, 15, 10785.
[20] Yu, J.-H.; Choi, Y.-J.; Seo, S.-H.; Choi, S.-G.; Jeong, H.-Y.; Kim, J.-E.; Baek, M.-S.; You, Y.-H.; Song, H.-K. Improved Connected-Mode Discontinuous Reception (C-DRX) Power Saving and Delay Reduction Using Ensemble-Based Traffic Prediction. Mathematics 2025, 13, 974.
[21] Wu, Y.-G.; Wu, C.-H. Image vector quantization codec indices recovery using Lagrange interpolation. Image Vis. Comput. 2008, 26, 1171–1177.
[22] Augustin, N.H.; Sauleau, E.-A.; Wood, S.N. On quantile quantile plots for generalized linear models. Comput. Stat. Data Anal. 2012, 56, 2404–2409.
[23] Yaro, A.S.; Maly, F.; Prazak, P.; Malý, K. Outlier Detection Performance of a Modified Z-Score Method in Time-Series RSS Observation With Hybrid Scale Estimators. IEEE Access 2024, 12, 12785–12796.
[24] Shrestha, P.; Park, Y.; Kwon, H.; Kim, C.-G. Error outlier with weighted Median Absolute Deviation threshold algorithm and FBG sensor based impact localization on composite wing structure. Compos. Struct. 2017, 180, 412–419.
[25] Wang, J.; Zhang, Z.; Yue, S. A Validity Index for Clustering Evaluation by Grid Structures. Mathematics 2025, 13, 1017.
[26] Wang, H.; Song, S.; Zhang, G.; Ayantoboc, O.O. Predicting daily streamflow with a novel multi-regime switching ARIMA-MS-GARCH model. J. Hydrol. Reg. Stud. 2023, 47, 101374.
[27] Wang, X.; Dong, Y.; Yang, J.; Liu, Z.; Lu, J. A benchmark-based method for evaluating hyperparameter optimization techniques of neural networks for surface water quality prediction. Front. Environ. Sci. Eng. 2024, 18, 54.
[28] Krishna Iyer, P.V. k-State Markov chains. Nature 1962, 196, 912.
[29] Rabindrajit Luwang, S.; Rai, A.; Nurujjaman, M.; Prakash, O.; Hens, C. High-frequency stock market order transitions during the US–China trade war 2018: A discrete-time Markov chain analysis. Chaos Interdiscip. J. Nonlinear Sci. 2024, 34, 013118.
[30] Dash, N.; Chakravarty, S.; Rath, A.K.; Giri, N.C.; AboRas, K.M.; Gowtham, N. An optimized LSTM-based deep learning model for anomaly network intrusion detection. Sci. Rep. 2025, 15, 1554.
[31] Dai, Z.; Liu, S.; Liu, C. Predict the prevalence and incidence of Parkinson’s disease using fractal interpolation-LSTM model. Chaos Interdiscip. J. Nonlinear Sci. 2024, 34, 053105.
[32] Yang, Z.; Li, W.; Yuan, F.; Zhi, H.; Guo, M.; Xin, B.; Gao, Z. Hybrid CNN-BiLSTM-MHSA Model for Accurate Fault Diagnosis of Rotor Motor Bearings. Mathematics 2025, 13, 334.
[33] Jiang, J.; Wang, H.; Lin, J.; Wang, F.; Liu, Z.; Wang, L.; Li, Z.; Li, Y.; Li, Y.; Lu, Z. Nature-inspired hierarchical building materials with low CO₂ emission and superior performance. Nat. Commun. 2025, 16, 3018.
[34] Spellauge, M.; Doñate-Buendía, C.; Barcikowski, S.; Gökce, B.; Huber, H.P. Comparison of ultrashort pulse ablation of gold in air and water by time-resolved experiments. Light Sci. Appl. 2022, 11, 68.
[35] Yang, F.; Zhi, M.; An, Y. Revealing large-scale surface subsidence in Jincheng City’s mining clusters using MT-InSAR and VMD-SSA-LSTM time series prediction model. Sci. Rep. 2025, 15, 5726.
[36] Zemmit, A.; Loukriz, A.; Belhouchet, K.; Alharthi, Y.Z.; Alshareef, M.; Paramasivam, P.; Ghoneim, S.S.M. GWO and WOA variable step MPPT algorithms-based PV system output power optimization. Sci. Rep. 2025, 15, 7810.
[37] Fu, X.; Zheng, Q.; Jiang, G.; Roy, K.; Huang, L.; Liu, C.; Li, K.; Chen, H.; Song, X.; Chen, J.; et al. Water quality prediction of copper-molybdenum mining-beneficiation wastewater based on the PSO-SVR model. Front. Environ. Sci. Eng. 2023, 17, 98.
[38] Wang, Z.; Peng, Q.; Rao, W.; Li, D. An improved sparrow search algorithm with multi-strategy integration. Sci. Rep. 2025, 15, 3314.

Figure 1. ARIMA–Markov frame diagram.

Figure 2. ARIMA–LSTM–Transformer frame diagram.

Figure 3. Maximum likelihood function graphs.

Figure 4. Loss function graphs.

Figure 5. Observed and predicted flowrates using the ARIMA and ARIMA–Markov models for (a) 1-day, (b) 7-day, and (c) 60-day prediction horizons.

Figure 6. Predicted flowrates’ residuals using the ARIMA and ARIMA–Markov models for (a) 1-day, (b) 7-day, and (c) 60-day prediction horizons.

Figure 7. Observed and predicted flowrates using LSTM, Transformer, LSTM–Transformer, and ARIMA–LSTM–Transformer models for (a) 1-day, (b) 7-day, and (c) 60-day prediction horizons.

Figure 8. Predicted flowrates’ residuals using LSTM, Transformer, LSTM–Transformer, and ARIMA–LSTM–Transformer models for (a) 1-day, (b) 7-day, and (c) 60-day prediction horizons.

Figure 9. The WOA–ARIMA–LSTM–Transformer framework diagram.

Figure 10. Observed and predicted flowrates by implementing PSO, WOA, and SSA optimization models for (a) 1-day, (b) 7-day, and (c) 60-day prediction horizons.

Figure 11. Predicted flowrates’ residuals by implementing PSO, WOA, and SSA optimization models for (a) 1-day, (b) 7-day, and (c) 60-day prediction horizons.

Table 1. The optimal values of $p$ and $q$ in each period ($d = 2$).


Time Domain	$p$	$q$
4 days	2	4
4 weeks	3	4
8 months	4	4

Table 2. Evaluation indicators of different forecast models (1).

Model	Metric	1 Day	7 Days	2 Months
ARIMA	$M A E$	193.1590	233.2706	274.2393
	$R M S E$	296.5314	343.8097	377.0760
	$R^{2}$	0.7501	0.7337	0.7118
ARIMA–Markov	$M A E$	149.6994	188.4114	154.6708
	$R M S E$	225.8415	283.7057	214.6911
	$R^{2}$	0.8550	0.8187	0.9066

Table 3. Evaluation indicators of different forecast models (2).

Model	Metric	1 Day	7 Days	2 Months
Transformer	$M A E$	189.5547	233.8810	189.9022
	$R M S E$	274.5323	304.2115	265.6617
	$R^{2}$	0.7858	0.7914	0.8546
LSTM	$M A E$	197.9939	247.2158	202.7077
	$R M S E$	285.9588	323.0652	266.9309
	$R^{2}$	0.7676	0.7648	0.8532
LSTM–Transformer	$M A E$	182.7820	176.8608	181.0977
	$R M S E$	270.8501	251.3609	254.3222
	$R^{2}$	0.7915	0.8576	0.8668
ARIMA–LSTM–Transformer	$M A E$	178.6055	161.4459	153.2685
	$R M S E$	269.6197	230.9424	210.7759
	$R^{2}$	0.8257	0.8798	0.9178

Table 4. Standard deviation of the ARIMA–Markov and ARIMA–LSTM–Transformer models.

Model	Metric	1 Day ^a	7 Days ^a	2 Months ^a
ARIMA–Markov	$M A E$	28.7916	42.3466	25.1970
	$R M S E$	41.0260	27.3987	29.5438
	$R^{2}$	0.1019	0.0422	0.0378
ARIMA–LSTM–Transformer	$M A E$	12.9359	5.2432	4.9143
	$R M S E$	16.2493	6.0679	3.8584
	$R^{2}$	0.0157	0.0123	0.0036

^a The data in the table represent the standard deviations of the arrays obtained from ten forecasts of the corresponding indicators, not the specific values.

Table 5. The evaluation indicators of the optimized ARIMA–LSTM–Transformer model.

Optimization Algorithm	Metric	1 Day	7 Days	2 Months
SSA	$M A E$	167.2940	183.4126	147.3382
	$R M S E$	253.5052	270.6702	206.8195
	$R^{2}$	0.8273	0.8449	0.9216
PSO	$M A E$	168.7508	155.3685	139.6621
	$R M S E$	250.4343	227.5126	187.4444
	$R^{2}$	0.8317	0.8934	0.9374
WOA	$M A E$	162.7161	133.5755	132.8711
	$R M S E$	246.4888	200.2303	184.8365
	$R^{2}$	0.8373	0.9196	0.9394

Table 6. The standard deviation of the optimized ARIMA–LSTM–Transformer model.

Model	Metric	1 Day ^a	7 Days ^a	2 Months ^a
SSA	$M A E$	6.4156	7.4094	6.2892
	$R M S E$	4.3774	9.3836	9.3708
	$R^{2}$	0.0267	0.01676	0.0065
PSO	$M A E$	4.3836	16.3154	4.2719
	$R M S E$	2.3340	9.4720	4.1376
	$R^{2}$	0.0037	0.0207	0.0034
WOA	$M A E$	2.8193	6.1603	2.0831
	$R M S E$	2.7386	4.0882	2.5371
	$R^{2}$	0.0037	0.0076	0.0019

^a The data in the table represent the standard deviations of the arrays obtained from ten forecasts of the corresponding indicators, not the specific values.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ye, J.; Meng, X.; Wang, H.; Zhou, Q.; An, S.; An, T.; Ghorbani Bam, P.; Rosso, D. ARIMA-Based Forecasting of Wastewater Flow Across Short to Long Time Horizons. Mathematics 2025, 13, 2098. https://doi.org/10.3390/math13132098
