Research on Short-Term Passenger Flow Prediction of LSTM Rail Transit Based on Wavelet Denoising

Abstract: Urban rail transit offers advantages such as high safety, energy efficiency, and environmental friendliness. As cities expand rapidly, travelers increasingly rely on rail systems, which raises demands on passenger capacity and efficiency and puts pressure on these networks. Passenger flow forecasting is an essential part of transportation systems. Short-term passenger flow forecasting for rail transit can estimate future station volumes, providing valuable data to guide operations management and mitigate congestion. This paper investigates short-term forecasting for Shantang Street station in Suzhou. Its high commercial presence and distinct weekday versus weekend ridership patterns make it a representative subway station and an interesting test case. Wavelet denoising and Long Short-Term Memory (LSTM) were combined to predict short-term flows, and the results were compared with those of standalone LSTM, Support Vector Regression (SVR), Artificial Neural Network (ANN), and Autoregressive Integrated Moving Average (ARIMA) models. This study illustrates that the algorithms adopted exhibit good performance for passenger prediction. The LSTM model with wavelet denoising proved most accurate, demonstrating its applicability and practical significance for short-term rail transit forecasting. The research findings can provide fundamental recommendations for implementing appropriate passenger flow control measures at stations and offer effective references for predicting passenger flow and mitigating traffic pressure in various cities.


Introduction
The modern development of cities continually drives rapid growth in traffic demand [1]. Urban rail transit, as an important part of urban public transportation, is of great significance for improving urban passenger throughput and transportation efficiency, as well as alleviating traffic congestion. Scientific passenger flow prediction plays an extremely important role in feasibility studies for urban rail transit, layout planning of urban rail transit networks, and decision-making on the scale and level of urban rail transit construction. On the one hand, it helps disperse passengers, rationally arrange passenger flow lines within the station, and improve the quality of passenger flow organization. On the other hand, it helps the urban transportation system take timely response measures and ensure public safety [2,3].
Short-term passenger flow forecasting is a dynamic control method that forecasts future passenger flow based on existing passenger flow data [4]. Research on traditional forecasting models for rail transit passenger flow is quite mature. These models mainly include time series models, regression models, and related linear and nonlinear models. Roos [5] proposed a dynamic Bayesian network approach to forecast the short-term passenger flows of the urban rail network of Paris, which could deal with incomplete data caused by failures or a lack of collection systems. Zhao [6] used a support vector machine to predict the passenger flow of Xinzhuang subway station and concluded that the nonlinear support vector machine model predicts working days better. Anl [7] developed a long short-term memory-based (LSTM-based) deep learning model to predict short-term transit passenger volume on transport routes in Istanbul, using a dataset of the number of people who used different transit routes at one-hour intervals between January and December 2020, and compared it with popular models such as random forest (RF), support vector machines, autoregressive integrated moving average, multilayer perceptron, and convolutional neural networks. Taking the passenger flow of Chengdu East Railway Station as an example, Tan [8] verified the higher prediction accuracy and better prediction performance of a GRNN neural network model with GA-based parameter optimization compared with other models. Pekel [9] developed two hybrid forecasting methods, POA-ANN and IWD-ANN, to forecast passenger demand, compared the forecasting results with GA-ANN, and concluded that the new algorithms performed well for passenger prediction.
To improve prediction accuracy, many scholars have studied the application of neural networks and combined models to short-term passenger flow prediction. Alghamdi [10] proposed an end-to-end deep learning-based framework with a novel architecture to predict multi-step-ahead real-time travel demand along with uncertainty estimation. Asce [11] presented a novel nonparametric dynamic time-delay recurrent wavelet neural network model for forecasting traffic flow, which exploited the concept of wavelets to provide flexibility and extra adaptable translation parameters. Nagaraj [12] used a greedy layer-wise algorithm to feed the processed cluster data into long short-term memory models and a recurrent neural network to solve the passenger flow prediction problem in public transport. Ermagun [13] examined the spatiotemporal dependency between traffic links, proposed a two-step algorithm to search for and identify the best look-back time window for upstream links, and indicated that the best look-back time window depends on the travel time between two study detectors. Dong [14] used a genetic algorithm to optimize the BP model, which significantly improved the prediction accuracy of short-term passenger flow on Beijing Line 4. Mirzahossein [15] proposed a novel hybrid deep learning method, WND-LSTM, which combines a time window and a normal distribution with LSTM to estimate short-term traffic volume at three adjacent intersections; the MAPE obtained was 60-90% lower than that of ARIMA, LR, and other models.
Current research primarily focuses on the global prediction of passenger flow for all stations or an entire subway line. However, there is a lack of precise prediction research that takes into account the specific characteristics of individual stations. Furthermore, only a limited number of prediction models are available for comparison, and the existing models do not achieve a high level of accuracy. In general, the accuracy of passenger flow prediction is greatly influenced by the changing trends observed in previous data. Therefore, it is crucial to analyze the characteristics of subway stations in detail and evaluate multiple prediction models to enhance the accuracy and effectiveness of the predictions [16]. The combination of wavelet denoising and the LSTM model in this study has several benefits. Wavelet denoising enhances data quality by reducing noise interference, while the LSTM model effectively handles the time-series relationships and dynamic characteristics of non-stationary data. For complex AFC data, combining the two and comparing them with other related models allows the future trend of non-stationary data to be predicted more accurately and improves the accuracy and stability of the prediction results. In addition, this method provides a new approach to the prediction and analysis of non-stationary data related to subway passenger flow. Different suitable models are selected for different types of stations. Based on a cluster analysis of subway stations, this paper carries out a detailed prediction analysis for typical stations. This paper examines Shantang Street station in Suzhou, chosen for its high commercial nature and weekday/weekend passenger differences. Wavelet denoising was applied to the short-term flow data, which an LSTM model then used to predict volumes; the results were compared with standalone LSTM, SVR, ANN, and ARIMA. The wavelet-denoised LSTM model [17][18][19] significantly improved accuracy, indicating effectiveness for real-world rail transit forecasting.

Principle of Wavelet Denoising Analysis
Wavelet denoising analysis [20][21][22] has been successfully utilized in many fields. Due to the irregularity of short-term passenger flow data at stations, the prediction error for short-term rail transit passenger flow may be substantial.
The short-term passenger flow data of rail transit stations fluctuates constantly, with a certain level of noise. High-frequency signals can be denoised through threshold values, and the data can then be reconstructed to achieve denoising. The short-term passenger flow signal containing noise can be formulated as follows:
$$S(x) = f(x) + \sigma e(x)$$
$S(x)$: short-term passenger flow data of rail transit with noise signal
$f(x)$: data after noise removal
$e(x)$: contained noise
$\sigma$: noise intensity

Wavelet Denoising Process
The basic process of wavelet denoising analysis is shown in Figure 1. When wavelet denoising is used to analyze short-term passenger flow data for rail transit, the procedure can be simplified into five steps: selecting the wavelet function, the wavelet base order, the threshold function, and the number of decomposition layers, and then performing wavelet reconstruction.
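The decompose, threshold, and reconstruct cycle can be illustrated with a minimal sketch. It deliberately swaps in the Haar wavelet, the simplest orthogonal wavelet, so that it runs with the Python standard library alone; it is not the db6 wavelet used in this study. The function names, the fixed threshold `thr`, and the requirement that the signal length be divisible by 2^levels are assumptions made for illustration only.

```python
import math

def haar_dwt(signal):
    """One level of the Haar transform: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Invert one Haar level, interleaving the reconstructed samples."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out

def soft_threshold(coeffs, thr):
    """Shrink each coefficient toward zero by thr; small ones become zero."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def denoise(signal, levels=3, thr=1.0):
    """Decompose, soft-threshold the detail coefficients, then reconstruct.

    Assumes len(signal) is divisible by 2 ** levels.
    """
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)                      # high-frequency part per level
    details = [soft_threshold(d, thr) for d in details]
    for d in reversed(details):                # rebuild from coarsest level up
        approx = haar_idwt(approx, d)
    return approx
```

With `thr=0.0` the signal is reconstructed unchanged, while a very large threshold removes all detail and leaves only the mean level, which brackets the behavior a tuned threshold would show.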

Basic Principles of Long Short-Term Memory Networks

LSTM Process
The LSTM neural network [23] has four structures: the forget gate, input gate, output gate, and memory unit. The cell state of the unit is controlled through the forget and input gates. The LSTM process is shown in Figure 2. The arrows in the figure represent vectors, showing input from the previous node to the node the arrows point to. LSTM controls information flow through three gate structures, each consisting of a sigmoid activation function and a pointwise multiplication, with the sigmoid output lying between 0 and 1. The sigmoid activation function in the gates is given by Equations (2) and (3), and the tanh function by Equations (4) and (5). The symbols are defined as follows:
$C_{t-1}$: The cell state passed in from the previous time step.
$x_t$: The new information read at the present moment, which causes the module to generate a new memory.
$h_{t-1}$: The output value of the previous hidden neuron module.
$C_t$: The cell state output at the current time and transmitted to the next time step.
$h_t$: The new output at the current time.
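For reference, the standard definitions of the two activation functions and their derivatives are given below. These are the textbook forms; how they map onto the numbering of Equations (2) to (5) is an assumption, since the original equations are not reproduced here.

```latex
\begin{aligned}
\sigma(x) &= \frac{1}{1 + e^{-x}}, &\qquad \sigma'(x) &= \sigma(x)\,\bigl(1 - \sigma(x)\bigr) \\
\tanh(x) &= \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, &\qquad \tanh'(x) &= 1 - \tanh^{2}(x)
\end{aligned}
```

The sigmoid maps any input into the interval (0, 1), which is why it is suited to gating, while tanh maps into (-1, 1) and is used for the candidate memory and the output.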

Calculation of LSTM Forward Propagation
The LSTM forward propagation calculation proceeds from the forget gate to the input gate, then updates the cell state, and finally reaches the output gate [24].
The forget gate determines how much information is retained from the previous moment to the current one. After $h_{t-1}$ and $x_t$ pass through the activation function, $f_t$ is obtained, representing the degree of retention of the previous hidden neuron state. The activation function is the sigmoid $\sigma$, and the expression for $f_t$ is:
$$f_t = \sigma(W_f h_{t-1} + U_f x_t + b_f)$$
$W_f$: The weight applied in the forget gate to the output of the previous hidden neuron module.
$U_f$: The weight applied in the forget gate to the information value of the input layer.
$b_f$: The bias parameter of the forget gate.
The input gate determines how much new information is received: it generates the new candidate information and decides what proportion of it is used. With weights and biases defined analogously to those of the forget gate, the calculation process is as follows:
$$i_t = \sigma(W_i h_{t-1} + U_i x_t + b_i)$$
$$\tilde{C}_t = \tanh(W_c h_{t-1} + U_c x_t + b_c)$$
After passing through the input gate, the output of the input gate is:
$$i_t \odot \tilde{C}_t$$
To update the memory cell state, the output $f_t$ of the forget gate is multiplied by the cell state $C_{t-1}$ at the previous time and combined with the output of the input gate to obtain a new cell state $C_t$. The expression for $C_t$ is as follows:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
Finally, the information passes through the output gate, whose calculation has two parts. The first part, $o_t$, is computed from the output value $h_{t-1}$ of the previous hidden neuron module combined with the current input value $x_t$ and activated by the sigmoid function. The second part combines $o_t$ with the long-term memory $C_t$ to obtain $h_t$. The calculation process is as follows:
$$o_t = \sigma(W_o h_{t-1} + U_o x_t + b_o)$$
The final output of the LSTM model is as follows:
$$h_t = o_t \odot \tanh(C_t)$$
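The forward pass described above can be sketched for a single scalar unit. This is a minimal illustration of the standard LSTM cell, not the implementation used in the experiments; the `params` dictionary layout and the function name `lstm_step` are hypothetical, and a real network would use weight matrices and vectors rather than scalars.

```python
import math

def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of a single-unit LSTM cell (scalar sketch).

    params holds three dicts, "W", "U", "b", each keyed by
    "f" (forget), "i" (input), "c" (candidate), "o" (output).
    """
    W, U, b = params["W"], params["U"], params["b"]
    f_t = sigmoid(W["f"] * h_prev + U["f"] * x_t + b["f"])      # forget gate
    i_t = sigmoid(W["i"] * h_prev + U["i"] * x_t + b["i"])      # input gate
    c_hat = math.tanh(W["c"] * h_prev + U["c"] * x_t + b["c"])  # candidate memory
    c_t = f_t * c_prev + i_t * c_hat                            # new cell state
    o_t = sigmoid(W["o"] * h_prev + U["o"] * x_t + b["o"])      # output gate
    h_t = o_t * math.tanh(c_t)                                  # new hidden output
    return h_t, c_t
```

Running the step repeatedly over a window of past observations, feeding each returned `h_t` and `c_t` into the next call, gives the sequence processing on which the prediction relies.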

Reverse Calculation of LSTM
After forward propagation through the LSTM model, the weights and the relevant bias terms must be updated; this reverse calculation is performed by propagating the error backward through the layers.

Principles of Support Vector Machine Regression
The training samples of the SVR model [25][26][27] are $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, with $y_i \in \mathbb{R}$. The goal is to learn a model f(x) whose value is close to y. When f(x) exactly matches y, the loss is 0. In the SVR model, a deviation of at most ε between f(x) and y is tolerated: the loss is calculated only when the difference between f(x) and y is greater than ε; otherwise, it is ignored. This is equivalent to establishing a tolerance band of width 2ε centered on f(x). In Figure 3, the red line is the regression line f(x), and the two dashed lines mark the boundaries of the tolerance band (the soft margin). Data inside the band are shown as blue dots, and data outside it as white dots. If a sample falls within the tolerance band, the prediction is considered accurate.
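The ε-insensitive loss behind this tolerance band can be written in a few lines. The sketch below only illustrates the loss itself, under the assumption that predictions are already available; the function name is chosen for illustration and is not taken from any library.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Per-sample loss: zero inside the 2*epsilon band, linear outside it."""
    return [max(abs(y - f) - epsilon, 0.0) for y, f in zip(y_true, y_pred)]
```

For example, with ε = 0.1, a prediction that misses the true value by 0.05 incurs no loss, while one that misses by 0.5 incurs a loss of about 0.4.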

Principles of Artificial Neural Networks
An Artificial Neural Network (ANN) consists of an input layer, one or more hidden layers, and an output layer [28,29]. The state of the hidden layer remains unaffected by external factors; however, its state changes can lead to variations in the output. The back propagation algorithm is commonly utilized in ANN. It involves forward propagation, where the input is sequentially propagated through each layer, followed by back propagation, which adjusts the weights and related thresholds. This iterative process aims to minimize the error until the desired outcome is achieved. The calculation process is illustrated in the diagram [30] (Figure 4).
(2) Backward propagation process
The backward propagation process involves adjusting the parameters of the artificial neural network model to optimize its performance. The connection weights and cell thresholds are modified to minimize the error. This adjustment is performed iteratively to refine the model's performance.
(3) Training termination conditions
The training process can be terminated based on certain conditions. Once these conditions are met, the training process is concluded.

Basic Principles of Time Series Model
The ARIMA model is a statistical model used for analyzing and predicting time series data [32,33]. It is particularly effective in forecasting future values based on past observations and the autocorrelation within the series. The model consists of three main components: the autoregressive (AR) part, the differencing (I) part, and the moving average (MA) part. These components work together to capture the patterns and trends in the data, allowing for accurate predictions [34].
(1) Autoregressive Model (AR)
The autoregressive model uses the historical data of a variable to construct a predictive model for that same variable. It is important to note that the autoregressive model assumes the data to be stationary. The formula for a p-order autoregressive model is as follows:
$$y_t = \mu + \sum_{m=1}^{p} \gamma_m y_{t-m} + \varepsilon_t$$
$y_t$: The current value of the variable
$\mu$: Constant term
$p$: Order
$\gamma_m$: Autocorrelation coefficient
$\varepsilon_t$: Residual
(2) Moving Average Model (MA)
The moving average model expresses the current value as a linear combination of past residuals, capturing the magnitude of their fluctuations. The formula for a q-order moving average model is as follows:
$$y_t = \mu + \varepsilon_t + \sum_{m=1}^{q} \theta_m \varepsilon_{t-m}$$
where $\theta_m$ is the moving average coefficient.
(3) Auto-Regression and Moving Average Model (ARMA)
The ARMA model combines the autoregressive (AR) model and the moving average (MA) model. It expresses the relationship between the current value and both past values and past residuals. The formula for the ARMA model is as follows:
$$y_t = \mu + \sum_{m=1}^{p} \gamma_m y_{t-m} + \varepsilon_t + \sum_{m=1}^{q} \theta_m \varepsilon_{t-m}$$
(4) Integrated (I)
Before determining the parameters p and q in the ARIMA model, it is necessary to conduct a stationarity test on the data. If the data fails the test, differencing is performed. After differencing, the data should meet the stationarity condition.
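Two of these components are simple enough to sketch directly: differencing (the I part) and a one-step AR(p) forecast. The code below is an illustration only, assuming the coefficients are already known; it performs no parameter estimation, and the function names and example values are hypothetical.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y_t - y_{t-1}."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return list(series)

def ar_predict(history, mu, gammas):
    """One-step AR(p) forecast: mu + sum_m gamma_m * y_{t-m}.

    The residual term is replaced by its expected value of zero.
    gammas[0] multiplies the most recent observation.
    """
    p = len(gammas)
    lags = history[-p:][::-1]          # y_{t-1}, y_{t-2}, ..., y_{t-p}
    return mu + sum(g * y for g, y in zip(gammas, lags))
```

For instance, `difference([1, 4, 9, 16])` gives `[3, 5, 7]`, and differencing once more gives the constant series `[2, 2]`, showing how repeated differencing removes a trend.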

Data Preprocessing
This article examines short-term traffic predictions for the Shantang Street station of the Suzhou Rail Transit system. The AFC (automatic fare collection) system data used in this article is collected from the automatic ticket machines at various stations in the rail transit system, which record the card swipes of people entering and exiting the stations. The data were obtained from the Suzhou Rail Transit AFC system and include transaction time, ticket ID and type, inbound and outbound station codes and names, and inbound and outbound times. The experiments were conducted on a Windows 10 64-bit operating system. The hardware includes an AMD Ryzen 7 5800H with Radeon Graphics processor (3.20 GHz) and 16 GB of memory. The programming language used is Python 3.7, and Matplotlib 3.0.2 was used to generate plots.
MySQL was used to clean the raw data. Relevant database rules were applied to extract the required information, resulting in over 14 million data points for the month of July that were used in this analysis. Given the high commercial nature and distinct weekday versus weekend ridership patterns, outbound passenger traffic from Shantang Street station was selected as the prediction target.

LSTM Model Construction and Prediction Analysis
The dataset for the LSTM network consists of the inbound passenger flow data at Shantang Street for the month of July, and the objective is to predict the outbound passenger flow at Shantang Street. The training set includes the inbound passenger flow data from 1-27 July, while the test set comprises the inbound passenger flow data from 28-29 July (which correspond to a Monday and a Sunday). Given the size of the passenger flow dataset and the limits on computing resources, 5-fold cross-validation (K = 5) was selected: the training set was divided into 5 subsets, and in each fold 4 subsets were used to train the LSTM model while the remaining subset was used to validate it. The performance index of each fold was recorded.
Root Mean Squared Error (RMSE) was selected as the evaluation index, and the parameters were repeatedly evaluated and optimized. Ultimately, the output layer dimension was set to 1, the hidden layer size to 4, the number of iterations to 1000, and the historical time step length to 30 in order to capture traffic patterns. A batch size of 10 and dropout layers were used to improve accuracy and prevent overfitting. Sigmoid activation functions were used for all fully connected layers during training; the prediction results are compared in the final summary figure.
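A historical time step length of 30 means that each training sample pairs the previous 30 observations with the next one. A minimal sketch of building such samples from a flow series is given below; the function name `make_windows` is hypothetical, and the series is assumed to be a plain list of per-interval passenger counts.

```python
def make_windows(series, lookback=30):
    """Split a series into (inputs, targets) for one-step-ahead prediction.

    Each input is a run of `lookback` consecutive values and the matching
    target is the value that immediately follows it.
    """
    inputs, targets = [], []
    for end in range(lookback, len(series)):
        inputs.append(series[end - lookback:end])
        targets.append(series[end])
    return inputs, targets
```

A series of length n therefore yields n - 30 samples, so the first 30 intervals of the data serve only as context.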
This model configuration was trained and used to predict the test set ridership. The forecast results were compared with the actual values in the chart using the RMSE, MAE, and MAPE metrics for both weekdays and weekends. As seen in the figure, the LSTM model predictions did not match the true values closely, indicating that its performance needs improvement across all accuracy metrics.
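The three metrics can be computed as in the sketch below, which uses their standard definitions. Skipping zero actual values in MAPE is an assumption made here to avoid division by zero in intervals with no passengers; how such intervals were handled in the study is not stated.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Pairs whose actual value is zero are skipped (an assumption of this sketch).
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(ratios) / len(ratios)
```

RMSE penalizes large misses more heavily than MAE, while MAPE expresses the error relative to the flow level, which makes weekday and weekend results easier to compare.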

Steps of Model Construction and Prediction
To address these limitations, a wavelet denoising approach was applied prior to LSTM modeling. The key steps were:
1. Perform a 3-level discrete wavelet transform on the time series data using the db6 wavelet.
2. Decompose the signal into low- and high-frequency components.
3. Apply soft thresholding denoising to the three high-frequency signals.
4. Reconstruct the denoised signal from the thresholded components.
5. Split the data into training and test sets.
6. Train the LSTM model on denoised training data.
7. Validate model performance on denoised test data.
The visualizations below depict the original noisy data versus the smoothed denoised signal after wavelet decomposition and thresholding.

Predictive Analysis
Based on the aforementioned basic prediction steps, continuous validation is performed to determine the wavelet basis function and conduct wavelet decomposition. First, the db6 wavelet basis function is selected to perform a three-level wavelet decomposition of the July inbound passenger flow data of Shantang Street station at a time interval of 10 min; the results are shown in Figure 5. After wavelet decomposition and soft threshold denoising, the denoised data and the original data are visualized in Figure 6, where the blue curve represents the original data and the orange curve represents the data after noise removal. It can be seen that the denoised data are smoother.
The denoised inbound short-term passenger flow training set of Shantang Street station is used as the input of the LSTM network. Considering the data features and model performance after wavelet decomposition, RMSE is again selected as the evaluation metric, and cross-validation is employed to continuously assess and optimize the parameters. Ultimately, the following parameter settings are determined: the input layer has a dimension of 1, the time step length is set to 1, the output layer has a dimension of 1, the hidden layer is set to 8, the number of iterations is set to 3000, and the historical time step length is set to 30. For training, the batch_size is set to 10, and a dropout layer with a probability of 0.1 is added. Based on these settings, the model is trained, and the prediction results are compared in the final summary figure.
Comparing the forecast results with the chart and analyzing the metric values shows that there is almost no difference between the predictions for the test set and the denoised data, and that the prediction model performs markedly better on the processed data.
With the denoised data, the LSTM model was re-trained using the configuration described above. As evident in the figure, the predictions closely matched the denoised test set values, demonstrating significantly improved model performance compared to the non-denoised data. The RMSE, MAE, and MAPE were substantially lower for both weekday and weekend results, confirming the benefits of preprocessing with wavelets prior to LSTM modeling for this application.

SVR Model Construction and Prediction Analysis
According to the existing experimental results, the SVR model has good fitting ability and performs well on some complex nonlinear problems. Because the short-term passenger flow of rail transit is complex, a support vector machine model can be used to address short-term passenger flow prediction.
2. Train support vector machines (SVMs) with different kernels, selecting RBF based on best fit.
3. Initialize hyperparameter values for the penalty factor C and gamma.
4. Refine hyperparameters via grid search cross-validation to minimize MSE.
5. Assess the model on test data.

Predictive Analysis
The prediction step is set to 1. Experiments showed that using the previous 30 data points to predict the next data point gives a relatively small error. The candidate values of the penalty factor C were 1, 5, 10, 30, and 100, and those of the parameter gamma were 0.1, 0.12, 0.01, 0.05, 0.001, 1, 0.5, and 0.9. The RBF function was selected as the kernel function. RMSE was chosen as the evaluation metric, and cross-validation was used to repeatedly assess the model's performance and generalization ability while the hyperparameters were tuned. Finally, C = 5 and gamma = 0.1 were selected and used to predict the test set data; the prediction results are compared in the final summary figure.
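The search over these candidate values amounts to evaluating every (C, gamma) pair and keeping the one with the lowest error, as the sketch below shows. Here `score_fn` is a hypothetical callable standing in for the cross-validated RMSE of an RBF-kernel SVR trained with the given pair; the search loop is library-independent, and in practice a ready-made utility such as scikit-learn's `GridSearchCV` could take its place.

```python
from itertools import product

# Candidate values listed in the text.
C_VALUES = [1, 5, 10, 30, 100]
GAMMA_VALUES = [0.1, 0.12, 0.01, 0.05, 0.001, 1, 0.5, 0.9]

def grid_search(score_fn, c_values=C_VALUES, gamma_values=GAMMA_VALUES):
    """Return the (C, gamma) pair with the lowest score, and that score.

    score_fn(c, gamma) is assumed to return an error to minimize,
    for example a cross-validated RMSE.
    """
    best_params, best_score = None, float("inf")
    for c, gamma in product(c_values, gamma_values):
        score = score_fn(c, gamma)
        if score < best_score:
            best_params, best_score = (c, gamma), score
    return best_params, best_score
```

The grid above contains 5 × 8 = 40 combinations, so with 5-fold cross-validation an exhaustive search would fit the model 200 times.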
Compare the forecast results with the chart, and analyze the result index values.By calculating the predicted results, it can be seen that the predicted results do not deviate much from the actual values; however, there is still a certain gap compared with the LSTM model of wavelet denoising.However, in general, the SVR model is relatively reasonable for the prediction of short-term passenger flow.
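The grid search described in this subsection can be sketched as follows. This is a minimal illustration rather than the authors' code: the synthetic series, the helper `make_windows`, the two-day hold-out, and the use of `TimeSeriesSplit` with three folds are all assumptions, since the paper does not state its data scaling or cross-validation scheme. Only the window length (30), the RBF kernel, the RMSE criterion, and the C and gamma candidates are taken from the text.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR


def make_windows(series, window=30):
    """Turn a 1-D series into (X, y) pairs: `window` past values -> next value."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y


# Synthetic hourly series with a daily cycle (assumed stand-in for the scaled real data).
rng = np.random.default_rng(0)
hours = np.arange(24 * 20)
flow = 0.5 + 0.4 * np.sin(2 * np.pi * hours / 24) + 0.05 * rng.standard_normal(hours.size)

X, y = make_windows(flow, window=30)
split = len(X) - 48  # hold out the last two days as a test set (assumption)
X_train, y_train, X_test, y_test = X[:split], y[:split], X[split:], y[split:]

# Candidate values quoted in the text.
param_grid = {
    "C": [1, 5, 10, 30, 100],
    "gamma": [0.1, 0.12, 0.01, 0.05, 0.001, 1, 0.5, 0.9],
}
search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid,
    scoring="neg_root_mean_squared_error",  # select by cross-validated RMSE
    cv=TimeSeriesSplit(n_splits=3),         # folds keep temporal order (assumption)
)
search.fit(X_train, y_train)

pred = search.predict(X_test)
test_rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
print(search.best_params_, round(test_rmse, 4))
```

On synthetic data the selected pair need not match the C = 5 and gamma = 0.1 reported for the Shantang Street data.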

ANN Model Construction and Prediction Analysis
Using the sklearn library in Python, an artificial neural network model trained with the backpropagation algorithm was built. The model was evaluated using the Root Mean Squared Error (RMSE) metric. Cross-validation was performed to determine the relevant parameters, and parameter tuning was conducted to optimize the model. After experimentation, setting the number of iterations to 100 and the batch size to 1 yielded relatively good results. The prediction step was set to 1, the network layer had 12 neurons, and the activation function was sigmoid. The prediction results are compared in the final summary figure.
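One way such a configuration might look in sklearn is sketched below. The paper does not name the estimator class, so the use of `MLPRegressor`, the synthetic series, the window length of 12, and the one-day hold-out are assumptions. The 12 hidden neurons, sigmoid (`"logistic"`) activation, 100 iterations, and batch size of 1 follow the text.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPRegressor

# Synthetic hourly series with a daily cycle (assumed stand-in for the real data).
rng = np.random.default_rng(1)
hours = np.arange(24 * 8)
flow = 0.5 + 0.4 * np.sin(2 * np.pi * hours / 24) + 0.05 * rng.standard_normal(hours.size)

window = 12  # hypothetical look-back length; not stated for the ANN in the paper
X = np.array([flow[i:i + window] for i in range(len(flow) - window)])
y = flow[window:]
X_train, y_train, X_test, y_test = X[:-24], y[:-24], X[-24:], y[-24:]

model = MLPRegressor(
    hidden_layer_sizes=(12,),  # one hidden layer with 12 neurons
    activation="logistic",     # sigmoid activation
    batch_size=1,
    max_iter=100,
    random_state=0,
)
with warnings.catch_warnings():
    # 100 iterations may stop before convergence; silence that warning here.
    warnings.simplefilter("ignore", ConvergenceWarning)
    model.fit(X_train, y_train)

pred = model.predict(X_test)
rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
print(round(rmse, 4))
```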
The forecast results were evaluated using RMSE, MAE, and MAPE. Analysis of the prediction result graph shows that the artificial neural network model did not perform especially well: it failed to accurately predict sudden fluctuations in passenger flow and exhibited lower prediction accuracy. The predicted values were generally higher than the actual values.

ARIMA Model Construction and Prediction Analysis

Steps of ARIMA Model Construction and Prediction
Step 1: The Augmented Dickey-Fuller (ADF) test can be used to test for stationarity [35,36]. The ADF test examines whether the model has a unit root, that is, whether $b = 1$ in the autoregressive equation $y_t = b y_{t-1} + c + \varepsilon_t$. A unit root can create spurious relationships between independent and dependent variables. The ADF test takes the existence of a unit root as its null hypothesis and compares the test statistic against critical values at three significance levels (1%, 5%, and 10%).
White noise [37,38] is characterized by data that lack any discernible pattern, with values fluctuating around zero and no clear trend. It follows a normal distribution with a mean of 0 and a variance of $\sigma^2$. If testing shows the data to be white noise, they contain no useful information and modeling would be meaningless. Conversely, if the data are not white noise, they can be modeled.
Step 2: Determine the values of pmax and qmax. This can be achieved by examining the autocorrelation and partial autocorrelation plots of the original time series data. Table 2 can be used as a guide to determine appropriate values for pmax and qmax in the ARIMA model.
Step 3: Determine the final values of p and q by balancing a high maximum likelihood function value against a small number of parameters. The higher the likelihood function value, the better the model fits. Additionally, a model with fewer parameters has lower complexity and computational requirements. The optimal values of p and q can be determined by calculating the Bayesian Information Criterion (BIC) [39,40].
The BIC is a criterion based on Bayesian theory. Compared with the Akaike Information Criterion (AIC) [41], it provides a more accurate judgment, particularly for large sample sizes.
The BIC is calculated as
$$\mathrm{BIC} = k \ln(n) - 2 \ln(\hat{L}),$$
where $k$ is the number of parameters in the model, $n$ is the number of observations, and $\hat{L}$ is the maximum likelihood function value of the model.

Table 2.

| Model | Autocorrelation function (ACF) | Partial autocorrelation function (PACF) |
|---|---|---|
| AR(p) | Decays gradually toward 0 | Cuts off after order p (drops rapidly to 0 beyond that order) |
| MA(q) | Cuts off after order q | Decays gradually toward 0 |
| ARMA(p, q) | Decays toward 0 after order q | Decays toward 0 after order p |

Step 4: Test the model's validity using the Durbin-Watson (DW) test [42,43] and the QQ plot test [44]. The DW test assesses autocorrelation by calculating the DW value of the residuals of the fitted model. A DW value close to 0 or 4 indicates autocorrelation in the residuals, while a value close to 2 suggests no autocorrelation.
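As a minimal illustration of Step 4, the DW statistic can be computed directly from a residual sequence using its standard definition, $DW = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. The sketch below is not the authors' code. The function name and the two residual sequences are hypothetical and were chosen only to show values on either side of 2.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: squared successive differences over squared residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    return numerator / denominator


# Hypothetical residual sequences, not taken from the paper.
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]             # sign flips every step
persistent = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]   # long runs of one sign

dw_alternating = durbin_watson(alternating)  # above 2, toward 4: negative autocorrelation
dw_persistent = durbin_watson(persistent)    # below 2, toward 0: positive autocorrelation
print(dw_alternating, dw_persistent)
```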

Predictive Analysis
First, the short-term passenger flow series of Shantang Street is examined to judge whether the inbound passenger flow time series is stationary (Figure 7). The figure shows that the data are basically stationary, so unit root and stationarity tests are then carried out. The ADF test results are as follows: (−9.19, 2.11e−15, 18, 2986, {'1%': −3.43, '5%': −2.86, '10%': −2.57}, 25241.09), with values rounded to two decimal places. The test statistic (−9.19) is lower than the critical values at the 1%, 5%, and 10% significance levels, so the null hypothesis of a unit root is rejected. The calculated p-value of 2.11e−15 is far below 0.05, further supporting the conclusion that the data do not have a unit root. The series is also judged not to be white noise. Therefore, the data are stationary and suitable for ARIMA modeling analysis.
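The reading of the ADF output above can be made explicit with a short sketch. It assumes the reported tuple follows the layout returned by the `adfuller` function in statsmodels (test statistic, p-value, lags used, number of observations, critical values, information criterion). The paper does not name the library, and the helper `summarize_adf` is hypothetical.

```python
# ADF output as reported in the text (rounded to two decimal places).
adf_result = (
    -9.19,
    2.11e-15,
    18,
    2986,
    {"1%": -3.43, "5%": -2.86, "10%": -2.57},
    25241.09,
)


def summarize_adf(result, alpha=0.05):
    """Unpack an adfuller-style tuple and compare the statistic with each critical value."""
    statistic, p_value, used_lags, n_obs, critical_values, _ic_best = result
    below_critical = {
        level: statistic < value for level, value in critical_values.items()
    }
    return {
        "statistic": statistic,
        "p_value": p_value,
        "used_lags": used_lags,
        "n_obs": n_obs,
        "below_critical": below_critical,
        # Reject the unit-root null when both the p-value and the critical values agree.
        "reject_unit_root": p_value < alpha and all(below_critical.values()),
    }


summary = summarize_adf(adf_result)
print(summary["below_critical"], summary["reject_unit_root"])
```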
The autocorrelation and partial autocorrelation plots of the original sequence data for the Shantang Street inbound passenger flow training set were used to determine the values of p and q in the ARIMA model (Figure 8).
From the autocorrelation plot, the values approach 0 after the 10th order. Similarly, the partial autocorrelation plot shows that the values mostly approach 0 after the 4th order. Based on these observations, pmax = 10 and qmax = 5 were selected. The final values of p and q were determined using the Bayesian Information Criterion (BIC): the smallest BIC value was obtained when p = 3 and q = 3. Therefore, the ARIMA(3,0,3) model was established.
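The order selection by BIC can be sketched as below, using the standard form BIC = k ln(n) − 2 ln(L). The log-likelihood values and the sample size are made-up numbers for illustration only and are not results from the paper. Counting two extra parameters (a constant and the noise variance) on top of p + q is likewise an assumption, and the function names are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """BIC = k * ln(n) - 2 * ln(L), where log_likelihood is ln(L)."""
    return math.log(n_obs) * n_params - 2.0 * log_likelihood


def select_order(log_likelihoods, n_obs, extra_params=2):
    """Return the (p, q) pair with the smallest BIC, plus all BIC scores.

    log_likelihoods maps (p, q) to the maximized log-likelihood of ARMA(p, q).
    extra_params counts the constant and noise variance (an assumption).
    """
    scores = {
        (p, q): bic(ll, p + q + extra_params, n_obs)
        for (p, q), ll in log_likelihoods.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Made-up log-likelihoods for four candidate orders (illustration only).
candidates = {
    (1, 1): -5200.0,
    (2, 2): -5150.0,
    (3, 3): -5100.0,
    (4, 4): -5098.0,
}
best_order, bic_scores = select_order(candidates, n_obs=1000)
print(best_order)
```

In this toy example the (4, 4) model has the highest likelihood, but its extra parameters are penalized enough that (3, 3) has the smaller BIC, which is the trade-off the criterion is designed to capture.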
The residuals of the ARIMA(3,0,3) model were tested using the Durbin-Watson (DW) test and a QQ plot (Figure 9). The calculated DW value was approximately 2.008, which is close to 2. In the QQ plot, the red line is the reference line and the blue dots are the values from the test results; the points lie approximately on a straight line. These results indicate that the established model is reasonable. The trained model was then used to predict the short-term passenger flow of Suzhou Metro on 28-29 July, and the results are compared in the final summary figure. Based on the calculated MAE index, the error in predicting short-term passenger flow on weekdays is relatively small, and the average MAPE value is around 20%. Overall, the ARIMA model demonstrates relatively good prediction performance.

Comparison of Results
By utilizing various models to forecast the short-term passenger flow of Shantang Street station, the prediction results can be visually displayed, and the corresponding prediction indices can be summarized. First, a visual representation of the predicted results is presented (Figures 10 and 11). Subsequently, the RMSE, MAE, and MAPE indices of the predicted results for Sunday (28 July) and Monday (29 July) are calculated based on different feature days, allowing for an analysis of the accuracy of each model's predictions.
The Shantang Street station predictions reveal that the estimated volumes from all methods stayed relatively close to their true values. This suggests the selected techniques were appropriate for modeling this station's ridership. Across both weekday and weekend results, the denoised LSTM predictions aligned most tightly with the real data. The index values of prediction results of different models are shown in Table 3.
Analysis of the results in Table 3 shows that the LSTM method with wavelet denoising yields lower RMSE and MAE indices than the other methods, and its MAPE index is also significantly reduced. This method therefore has an advantage in prediction accuracy. Without wavelet denoising, the LSTM model performs best for weekday short-term passenger flow, followed by SVR and ARIMA, while the ANN model performs relatively poorly. For Sunday short-term passenger flow, both the LSTM and ARIMA models outperform the ANN model. Because Sundays typically see higher station traffic than Mondays, prediction errors are expected to be higher on Sundays than on Mondays. Considering both predictive power and practicality, LSTM with integrated wavelet denoising emerges as the superior methodology, demonstrating its applicability to real-world forecasting.
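For reference, the three indices used in this comparison can be computed as in the sketch below, which uses their standard definitions. The sample values are hypothetical and are not taken from Table 3. Skipping hours with zero actual flow in MAPE is an assumption made here to avoid division by zero; the paper does not say how such hours were handled.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error in percent, ignoring zero actual values."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Hypothetical hourly passenger counts (persons per hour).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 360.0]

scores = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "MAPE": mape(actual, predicted),
}
print(scores)
```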

Conclusions
This study applied different short-term forecasting techniques to predict the passenger flow at Shantang Street station of Suzhou Rail Transit. The goal was to analyze whether the proposed denoised LSTM method provided higher accuracy and effectiveness.
This paper examines Shantang Street station in Suzhou, chosen for its highly commercial surroundings and weekday/weekend passenger differences. For short-term prediction, the time series data were processed by wavelet denoising before LSTM modeling. Based on signal-to-noise ratios and rail transit passenger flow characteristics, noise was filtered out using a 3-level decomposition with soft thresholding and the db6 wavelet. The denoised data were used to train the LSTM model, and its forecasts were compared against those of the LSTM trained on the original noisy data, SVR, ANN, and ARIMA. This study confirms the necessity of selecting appropriate methods for predicting rail transit passenger flow. The wavelet-enhanced LSTM significantly improved prediction quality, providing a new perspective for rail transit volume forecasting. Leveraging big data and scientific modeling in this manner can produce practical gains, demonstrating the value of this integrated approach.
In this paper, single-step prediction is adopted when forecasting short-term passenger flow; multi-step prediction can be carried out in future research, which may save model computation time. Only the time series data of passenger flow are used in this paper. In future forecasting studies, features such as weather and geographical location can be added so that the factors considered are more comprehensive, which should help improve forecasting accuracy.
Author Contributions: Conceptualization, Q.Z. and L.Z.; methodology, X.F. and Y.W.; software, X.F.; data curation, Q.Z.; writing-original draft preparation, Q.Z. and X.F.; writing-review and editing, Q.Z. and Y.W. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by the National Natural Science Foundation of China, grant number NFC52075030.

Figure 3. Support vector machine regression display.
Parameters involved in support vector machine (SVM) regression include ε and C. ε is the parameter of the insensitive loss function and affects model precision and training speed. C is a penalty factor that balances model complexity against training error. A smaller C means lower model complexity and a weaker penalty. C should be neither too large nor too small; otherwise, overfitting or underfitting may occur.

Figure 4. Flowchart of artificial neural network.
(1) Forward propagation process. The output value of the input layer is denoted as O. The connection weights between the input layer and the hidden layer are represented as $w_{ij}$. The output value of the input layer is multiplied by the corresponding weights $w_{ij}$. The resulting values are then passed through an activation function, typically the sigmoid function, to obtain the output values of the hidden layer. This process is repeated for each subsequent layer until the output layer is reached [31].
(2) Backward propagation process. The backward propagation process involves adjusting the parameters of the artificial neural network model to optimize its performance. The connection weights and cell thresholds are modified to minimize the error. This adjustment is performed iteratively to refine the model's performance.
(3) Training termination conditions. The training process can be terminated based on certain conditions. Once these conditions are met, the training process is concluded.
The 1-27 July inbound passenger flow was used as the training set, while the 28-29 July (Sunday and Monday) data were held out as the test set. Based on the swipe card data extracted from MySQL for Shantang Street station, the inbound passenger flow at Shantang Street was calculated at a time interval of 1 h. Each column represents the hourly inbound passenger flow at Shantang Street throughout the day. The processed Shantang Street passenger flow data are summarized, and a portion of the hourly passenger flow data is shown in Table 1.

Figure 6. Comparison between denoised data and original data.

Figure 7. Time series diagram of short-term passenger flow in Shantang Street.

Figure 8. Autocorrelation and partial autocorrelation diagram.

Figure 9. QQ diagram test results.
The comparison of the prediction results of different models on 28 July is shown in Figure 10, and the comparison for 29 July is shown in Figure 11. In these figures, Xiaobo_predict (from the Chinese word for wavelet) denotes the prediction results of the LSTM model based on wavelet denoising. The x-axis shows the index of the input data points, and the y-axis shows the passenger flow in number of people. The green curve represents the true value, and the other curves in different colors represent the predicted values of the different models. This visual representation demonstrates each model's prediction accuracy and effectiveness.

Figure 10. Comparison of prediction results of different models at Shantang Street station on 28 July.

Mathematics 2023, 11, x FOR PEER REVIEW 14 of 17

Figure 11. Comparison of prediction results of different models at Shantang Street station on 29 July.

Table 1. Hourly passenger flow data for Shantang Street station (person/hour).

Table 3. Index values of prediction results under different models.