Civil Aviation Passenger Traffic Forecasting: Application and Comparative Study of the Seasonal Autoregressive Integrated Moving Average Model and Backpropagation Neural Network

Abstract: With the rapid development of China's aviation industry, accurate prediction of civil aviation passenger volume is crucial to the industry's sustainable development. However, current forecasts of civil aviation passenger traffic have not yet reached the desired accuracy, so improving prediction accuracy is particularly important. This paper explores and compares the effectiveness of the backpropagation (BP) neural network model and the seasonal autoregressive integrated moving average (SARIMA) model in predicting civil aviation passenger traffic. First, using data from 2006 to 2019, this study applies the two models separately to forecast civil aviation passenger traffic in 2019, and then combines them to forecast the same period. By comparing the mean relative error (MRE), mean square error (MSE), and root mean square error (RMSE), the prediction accuracies of the two single models and the combined model are evaluated and the best prediction method is identified. Next, the optimal method is applied to the 2006 to 2019 data to forecast civil aviation passenger traffic from 2020 to 2023. Finally, these forecasts are compared with the actual data to assess the impact of the epidemic on civil aviation passenger traffic. This paper improves the prediction accuracy of civil aviation passenger volume, and the results have practical significance for understanding and evaluating the impact of the epidemic on the aviation industry.


Introduction
China's civil aviation passenger volume has long been an important indicator of national economic development and people's living standards. With the issuance of the "14th Five-Year Plan for Civil Aviation Development", China has embarked on a new journey toward becoming a strong civil aviation country across many fields. At the same time, with the rapid development of China's economy and rising living standards, civil aviation passenger traffic has shown a trend of continuous growth. However, since the global epidemic broke out at the end of 2019, concerns about its possible impacts have led people to travel less, dealing an unprecedented blow to the aviation industry and sharply reducing civil aviation passenger volume.
Therefore, accurately predicting civil aviation passenger traffic is particularly important. It not only promotes the sustainable development of China's aviation industry but also helps airlines adjust flights and aircraft types in advance, thereby optimizing load factors and improving operational efficiency. Understanding the impact of the epidemic on passenger traffic is equally important, because it enables airlines to adjust their long-term and short-term strategies to the actual situation, including optimizing the route network and re-evaluating aircraft procurement plans.
The first task in predicting civil aviation passenger volume is model selection. Common prediction methods include time series analysis models such as the moving average method, the weighted moving average method, the simple exponential-smoothing method, and the ARIMA model. ARIMA is the most classic of these, is widely used in practice, and is one of the most common methods for univariate time series forecasting, because it requires only the series itself (endogenous variables) and no exogenous variables. Neural network models are also often used to predict passenger volume. The BP neural network is a commonly used artificial neural network with strong nonlinear modeling capability; it is trained with the backpropagation algorithm, which improves its prediction accuracy, and it can handle large amounts of data. Therefore, this paper selects the ARIMA and BP neural network models, analyzes their characteristics in depth, and combines them to achieve more accurate predictions.
The main idea of this paper is as follows. First, the SARIMA model and the BP neural network model are applied to predict and analyze civil aviation passenger traffic using data from 2006 to 2019. Then, the prediction residuals of the SARIMA model are used as the input data of the BP neural network, which captures the nonlinear characteristics of civil aviation passenger volume to obtain more accurate predictions. This study also uses this method to predict civil aviation passenger volume for the epidemic period from 2020 to 2023, compares the predictions with the actual data to analyze the specific impact of the epidemic, and puts forward corresponding suggestions. The main contributions of this paper are as follows: (1) The SARIMA-BP combined model is used to predict civil aviation passenger volume, improving prediction accuracy so that airlines can adjust flights and aircraft types in advance and improve operational efficiency. (2) By predicting passenger volume during the epidemic and comparing it with actual passenger data, the impact of the epidemic on civil aviation passenger volume is demonstrated. These results can provide reference information to help airlines develop effective strategies and measures to cope with possible future challenges.
The rest of this paper is organized as follows. Section 2 reviews the relevant literature. Section 3 describes the methods used to forecast civil aviation passenger traffic and the data processing. Section 4 details how the SARIMA model, the BP neural network model, and the combined model are applied to forecasting, and compares and analyzes the results. Section 5 presents forecasts made with the optimal method and analyzes the impact of the epidemic on civil aviation passenger traffic by comparison with the actual data. Section 6 presents the conclusions. Section 7 discusses the highlights and shortcomings of this paper.

Literature Review
Prediction methods fall mainly into three categories: traditional time series analysis, non-traditional time series analysis, and machine learning-based prediction. Traditional time series methods mainly include various regression models, the moving average method, the autoregressive integrated moving average (ARIMA) method, the Holt-Winters method (also known as Winters' method), and various exponential-smoothing methods. Non-traditional time series methods are relatively new forecasting approaches that draw on the multidisciplinary integration of statistics, system dynamics, and grey system theory.
Although traditional time series methods provide effective predictions to a certain extent, the exponential growth in flight-booking data, together with its nonlinear trends and high irregularity and volatility, means that these methods may struggle with modern, highly complex big data. Machine learning methods, with their powerful nonlinear modeling ability, offer a new and effective way to handle such complex and volatile flight demand forecasting problems. The neural network model based on the error backpropagation algorithm is one such prediction model with a strong nonlinear mapping ability.
The prediction of civil aviation passenger volume has been widely studied, and scholars usually divide the approaches into single-model prediction and combined-model prediction. Many single-model methods exist. Yu et al. [1] used the GM(1,1) model to predict civil aviation passenger traffic and corrected it with a GM(1,1) residual model, demonstrating the high accuracy of the resulting prediction formula. Zhang et al. [2] used a BP neural network to forecast civil aviation passenger traffic in Beijing based on four aspects: economy, tourism, competition, and airport operational capacity. Chen et al. [3] used an ELM prediction model to predict civil aviation passenger traffic. Wu et al. [4] used an LSTM model to predict civil aviation passenger traffic; their results show that the model is stable and outperforms existing fusion models. Meng et al. [5] used a fuzzy diagonal regression neural network to forecast civil aviation passenger traffic. Ma et al. [6] used a multiple linear regression model to analyze the factors influencing civil aviation passenger traffic in Gansu Province. Anupam et al. [7] used a NARX dynamic neural network to forecast civil aviation passenger traffic. Li [8] applied the SARIMA model and an LSTM neural network separately and found that the LSTM model predicted civil aviation passenger traffic better. Kanavos et al. [9] developed an air travel demand estimation and forecasting model using the classical autoregressive integrated moving average (ARIMA) model, its seasonal variant (SARIMA), and a deep learning neural network (DLNN). In addition, many scholars [10-14] have used the ARIMA model to forecast civil aviation passenger traffic.
Although single-model prediction methods are straightforward to implement, they often have inherent shortcomings that limit prediction accuracy. Therefore, some scholars use combined-model prediction to improve accuracy. Chen et al. [15] used a combined SARIMA-LR model to forecast civil aviation passenger traffic and analyze the impact of the epidemic on the civil aviation industry. Gan et al. [16] employed a bi-directional LSTM model and achieved high prediction accuracy. Al-Sultan [17] considered a wide range of time series prediction models, and an empirical analysis showed that the BSTS model outperforms the other models in predicting complex time series. Hu [18] used the nonadditive Choquet fuzzy integral to combine the predictions of four commonly used univariate grey prediction models. Yao et al. [19] used a combined ARIMA-BP model to predict civil aviation passenger volume, but the modeling process was cumbersome. Yu et al. [20] used an ARIMA-BP combined model to forecast short-term traffic flows, which effectively reduced the error.
The COVID-19 pandemic has had a profound impact on the development of civil aviation worldwide. Su et al. [21] examined the spatial distribution of outbreaks and civil aviation passenger throughput in China using COVID-19 statistics and socioeconomic data from Chinese cities, combining the Moran index with econometric models. Deveci et al. [22] investigated the economic ramifications of COVID-19 for the civil aviation sector. Wojcik et al. [23] built a behavioral model of flu-related searches based on survey data linked to users' online browsing data. The findings of the selected literature above are summarized in Table 1.

Table 1. Summary of selected literature.
Prediction type | Model | Data | Conclusion
… | ARIMA | … | The ARIMA model has a good fitting effect on the original data sequence.
Combined-model prediction | SARIMA-LR [15] | Monthly civil aviation passenger traffic | The combined model improves the prediction accuracy.
Combined-model prediction | Bi-directional LSTM [16] | Civil aviation passengers on the Kunming-Xishuangbanna route | The prediction accuracy of the model is high and the model is feasible.
Combined-model prediction | Nonadditive forecast combination (grey models) [18] | Annual civil aviation passenger volume | It is noticeably superior to other single models.
Combined-model prediction | ARIMA-BP [20] | Short-term forecasting of traffic flows | It effectively reduces the error.

Data Source and Processing
This paper uses the monthly national civil aviation passenger traffic data published by the National Bureau of Statistics from January 2006 to December 2019, collated and summarized as shown in Figure 1. As Figure 1 shows, the data points are distributed fairly continuously with no obvious outliers or anomalies, so no data cleaning is needed. In addition, every month's data are complete with no missing values, so no imputation is required.

SARIMA Model
SARIMA is a time series model for forecasting and analyzing data with seasonal patterns. It extends the ARIMA model to time series with seasonal components: three seasonal hyperparameters (P, D, Q) are added to ARIMA(p, d, q), together with a seasonal period s. The resulting model,
$$\mathrm{SARIMA}(p, d, q)(P, D, Q)_s, \tag{1}$$
has seven parameters in two groups: three non-seasonal parameters (p, d, q) and four seasonal parameters (P, D, Q) and s. Here, p and q are the orders of the non-seasonal autoregressive and moving average operators, P and Q are the orders of the seasonal autoregressive and moving average operators, d is the number of non-seasonal differences, D is the number of seasonal differences, and s is the length of the seasonal cycle.
We apply D seasonal differences (de-periodization) and d regular differences (detrending) to the time series {y_t} to obtain the new series x_t = (1 − B)^d (1 − B^s)^D y_t, and then model the differenced series {x_t} as
$$\phi_p(B)\,\Phi_P(B^s)\,x_t = \theta_q(B)\,\Theta_Q(B^s)\,\epsilon_t, \tag{2}$$
where B is the backshift operator (B y_t = y_{t−1}); φ_p(B) and θ_q(B) are the non-seasonal autoregressive and moving average polynomials; Φ_P(B^s) and Θ_Q(B^s) are the seasonal autoregressive and seasonal moving average polynomials; y_t is the observed value; and ε_t is white noise.
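As a worked illustration (our own expansion, not an equation from the original analysis), take d = D = 1 and s = 12, the setting used later for the D(X, 1, 12) series. Using B^k y_t = y_{t−k}, the differencing operator expands into a combination of four lagged observations:

```latex
\begin{aligned}
x_t &= (1 - B)(1 - B^{12})\, y_t \\
    &= (1 - B - B^{12} + B^{13})\, y_t \\
    &= y_t - y_{t-1} - y_{t-12} + y_{t-13}
\end{aligned}
```

Equivalently, x_t = (y_t − y_{t−1}) − (y_{t−12} − y_{t−13}): each differenced value compares a month-over-month change with the same change one year earlier.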

BP Neural Network Model
The backpropagation neural network, or BP network, has been widely used in many applications. It learns and stores a large number of input-output mapping relations. Its learning rule uses the steepest descent method to iteratively adjust the network's weights and thresholds through backpropagation so as to minimize the sum of squared errors. Because it relies on steepest descent, however, the BP neural network can suffer from slow learning convergence and low learning efficiency.

Fundamentals
• The neuron model is shown in Figure 2.
Sustainability 2024, 16, x FOR PEER REVIEW 5 of 18


A BP network consists of an input layer, a hidden layer, and an output layer. The input layer receives the input data, the hidden layer processes the information, and the output layer produces the output, which is the desired result. The weights from the input layer to the hidden layer are denoted by υ, and the weights from the hidden layer to the output layer are denoted by ω.
Figure 3 depicts a neural network with a single hidden layer. The BP process has two stages. The first is the forward propagation of the signal: the input data pass through the hidden layer and eventually reach the output layer. The second is the backward propagation of the error: the error is propagated from the output layer back through the hidden layer toward the input layer, which allows the hidden-to-output weights ω, the input-to-hidden weights υ, and the biases to be adjusted.
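The forward stage in Figure 3 amounts to two matrix products. The sketch below assumes NumPy, a sigmoid hidden layer, and a linear output layer (the original does not state the output activation); the names forward, upsilon, and omega are illustrative, with upsilon and omega playing the roles of υ and ω and sized for the 12-9-1 configuration used later in the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, upsilon, b_hidden, omega, b_out):
    """Forward stage of a single-hidden-layer BP network.

    x: (n_inputs,), upsilon: (n_inputs, n_hidden), omega: (n_hidden, n_outputs).
    """
    hidden = sigmoid(x @ upsilon + b_hidden)  # input layer -> hidden layer
    output = hidden @ omega + b_out           # hidden layer -> output layer (linear, assumed)
    return hidden, output

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 12, 9, 1
upsilon = rng.normal(scale=0.1, size=(n_in, n_hidden))
omega = rng.normal(scale=0.1, size=(n_hidden, n_out))
hidden, output = forward(rng.random(n_in), upsilon, np.zeros(n_hidden), omega, np.zeros(n_out))
print(hidden.shape, output.shape)  # (9,) (1,)
```

The backward stage reverses this flow: the output error first yields gradients for omega and is then pushed through the sigmoid to yield gradients for upsilon.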

• Backpropagation Algorithm
The neural network is trained with the backpropagation algorithm, which uses gradient descent to adjust the connection weights and biases so as to minimize the error between the network output and the actual values. Training alternates forward propagation with backward updating of the weights over many iterations.

• Activation Functions
Common activation functions such as Sigmoid, Tanh, and ReLU introduce nonlinearity so that the neural network can handle complex nonlinear relationships. This study uses the Sigmoid (logistic) function, also known as the S-shaped growth curve, which maps any real input into (0, 1) and is widely used, for example in classifiers:
$$\sigma(x) = \frac{1}{1 + e^{-x}} \tag{4}$$
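Backpropagation through a sigmoid layer needs both σ(x) and its derivative, which has the convenient closed form σ'(x) = σ(x)(1 − σ(x)). A minimal Python sketch with illustrative function names, written in a numerically stable form that avoids overflow for large negative inputs:

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function sigma(x) = 1 / (1 + e^{-x}), mapping any real x into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x) so math.exp never overflows.
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_derivative(x: float) -> float:
    """sigma'(x) = sigma(x) * (1 - sigma(x)), the factor applied when backpropagating through the layer."""
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid(0.0), sigmoid_derivative(0.0))  # 0.5 0.25
```

The derivative peaks at 0.25 when x = 0 and shrinks toward 0 as |x| grows, which is one reason inputs are usually scaled before training a sigmoid network.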


Training Process
Step 1 Input data: Input data from the training set are fed into the input layer of the network.
Step 2 Forward propagation: The output of each neuron is calculated by propagating the signal forward through the network.
Step 3 Error calculation: The network output is compared with the actual value and the error is calculated.
Step 4 Backpropagation: The error information is propagated backward to calculate the gradients, and the connection weights and biases are updated accordingly.
Step 5 Iteration: The training process is repeated over multiple iterations until the error converges to a satisfactory level.
The BP neural network therefore has a strong nonlinear fitting ability and suits complex problems; it learns well and handles large-scale data sets effectively. However, it is sensitive to the initial weights and the learning rate, and some problems may require larger training data sets.
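Steps 1 to 5 can be sketched as full-batch gradient descent on the mean squared error (a scaled sum of squared errors, so it has the same minimizer). This is a minimal NumPy illustration under stated assumptions, not the authors' code: the function name train_bp is hypothetical, the output layer is assumed linear, inputs and targets are assumed already normalized, and the defaults mirror the configuration reported later (9 hidden neurons, learning rate 0.01, 5000 iterations, error threshold 0.000001).

```python
import numpy as np

def train_bp(X, y, n_hidden=9, lr=0.01, max_iter=5000, tol=1e-6, seed=0):
    """Full-batch gradient descent for a 1-hidden-layer BP network.

    X: (n_samples, n_inputs) array; y: (n_samples,) array. Sigmoid hidden layer,
    linear output (assumption). Stops once the MSE falls below `tol` or after
    `max_iter` passes. Returns the parameters and the MSE recorded at each pass.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_inputs = X.shape
    upsilon = rng.normal(scale=0.5, size=(n_inputs, n_hidden))  # input -> hidden weights
    b1 = np.zeros(n_hidden)
    omega = rng.normal(scale=0.5, size=n_hidden)                # hidden -> output weights
    b2 = 0.0
    history = []
    for _ in range(max_iter):
        # Steps 1-2: feed the training inputs forward through the network.
        H = 1.0 / (1.0 + np.exp(-(X @ upsilon + b1)))
        out = H @ omega + b2
        # Step 3: error against the actual values.
        err = out - y
        mse = float(np.mean(err ** 2))
        history.append(mse)
        if mse < tol:
            break
        # Step 4: backpropagate d(MSE)/d(parameters) via the chain rule.
        g_out = 2.0 * err / n_samples
        g_omega = H.T @ g_out
        g_b2 = float(g_out.sum())
        g_hidden = np.outer(g_out, omega) * H * (1.0 - H)  # sigmoid derivative H(1 - H)
        g_upsilon = X.T @ g_hidden
        g_b1 = g_hidden.sum(axis=0)
        # Gradient-descent update; Step 5 is the surrounding loop.
        upsilon -= lr * g_upsilon
        b1 -= lr * g_b1
        omega -= lr * g_omega
        b2 -= lr * g_b2
    return (upsilon, b1, omega, b2), history

rng = np.random.default_rng(1)
X_toy = rng.random((60, 12))  # toy inputs already scaled to [0, 1]
y_toy = X_toy.mean(axis=1)    # toy target
params, history = train_bp(X_toy, y_toy)
print(len(history), history[0], history[-1])
```

Full-batch updates keep the sketch short and deterministic; many implementations instead update per sample or per mini-batch, which changes the convergence behavior but not the backpropagation formulas.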

SARIMA-BP Neural Network Forecasting Model
Because civil aviation passenger traffic has pronounced seasonal characteristics, this study first employs the seasonal ARIMA (SARIMA) model to describe its linear component. However, SARIMA uses differencing to isolate linear factors and does not adequately account for the nonlinear factors that drive fluctuations in the series, which can reduce its accuracy in tracking changes over time. The SARIMA model's prediction errors (residuals) are therefore used as the input to a BP neural network, which characterizes the nonlinear component of civil aviation passenger volume and corrects the SARIMA residuals to improve prediction accuracy. After the BP network learns the residual model through training, the final prediction is
$$\hat{y}_t = a_t + e_t, \tag{5}$$
where a_t is the prediction of the SARIMA model and e_t is the residual correction predicted by the BP neural network.
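Equation (5) can be sketched as follows. The function combine_sarima_bp and the residual_model callable are hypothetical: residual_model stands in for the trained BP network, and feeding it the previous 12 SARIMA residuals is an assumption that mirrors the 12-input window used for the BP model later in the paper, not a detail stated for the combined model.

```python
from typing import Callable, List, Sequence

def combine_sarima_bp(actual: Sequence[float], sarima_pred: Sequence[float],
                      residual_model: Callable[[Sequence[float]], float],
                      window: int = 12) -> List[float]:
    """One-step corrected predictions y_hat_t = a_t + e_t for t = window .. n-1.

    r_t = y_t - a_t are the SARIMA residuals; e_t is predicted from the `window`
    residuals immediately before t, so no residual at or after t is used.
    """
    if len(actual) != len(sarima_pred):
        raise ValueError("actual and sarima_pred must have the same length")
    residuals = [y - a for y, a in zip(actual, sarima_pred)]
    corrected = []
    for t in range(window, len(sarima_pred)):
        e_t = residual_model(residuals[t - window:t])
        corrected.append(sarima_pred[t] + e_t)
    return corrected

def mean_residual(recent: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained BP network: the mean of recent residuals."""
    return sum(recent) / len(recent)

actual = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
sarima = [9.0, 11.0, 10.0, 12.0, 11.0, 13.0]  # SARIMA is biased low by 1 throughout
print(combine_sarima_bp(actual, sarima, mean_residual, window=2))  # [11.0, 13.0, 12.0, 14.0]
```

In the toy example the systematic under-prediction shows up in the residuals, and the residual model adds it back, so the corrected values match the actual series exactly.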

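Evaluation Indicators
To evaluate the error and bias of the prediction results and the performance of each prediction method, this study uses five indicators, E_k, MRE, R², MSE, and RMSE, given by Equations (6)-(10), where n is the number of samples and \bar{T} is the mean of the true values:
$$E_k = \frac{y_k - T_k}{T_k} \times 100\% \tag{6}$$
$$\mathrm{MRE} = \frac{1}{n}\sum_{k=1}^{n} \lvert E_k \rvert \tag{7}$$
$$R^2 = 1 - \frac{\sum_{k=1}^{n} (T_k - y_k)^2}{\sum_{k=1}^{n} (T_k - \bar{T})^2} \tag{8}$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{k=1}^{n} (y_k - T_k)^2 \tag{9}$$
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} \tag{10}$$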
where y_k is the value predicted by the model; T_k is the true value; E_k is the relative error; MRE is the mean relative error; R² is the coefficient of determination; and MSE is the mean square error, the average squared deviation between predicted and true values. The smaller the MSE, the more accurately the prediction model describes the experimental data. RMSE, the root mean square error, measures the deviation between predicted and true values in the original units and is sensitive to outliers in the data.
In the line graph of the original series (Figure 1), civil aviation passenger volume grows over time, indicating that the time series has an obvious, roughly linear upward trend. Closer inspection shows that the series repeats the same fluctuation pattern every 12 time intervals, indicating strong periodicity in civil aviation passenger traffic with a cycle length of S = 12.

Model Application and Analysis of Results
Because the civil aviation passenger traffic series shows significant seasonal volatility, we remove its seasonality and trend by differencing. Denote the original series by X. First, a seasonal difference with a step size of 12 is applied, denoted D(X, 0, 12), as shown in Figure 4. Next, a first-order difference with a step size of 1 is applied, denoted D(X, 1, 12), as shown in Figure 5. These two operations bring the series closer to stationarity and make it easier to analyze and model in the following steps.
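The two differencing operations can be sketched in plain Python. The helper d_transform is a hypothetical name that mimics the D(X, d, s) notation: one seasonal difference at lag s followed by d regular differences. Each difference at lag k drops the first k observations, so the 168 monthly values from January 2006 to December 2019 become 156 values after D(X, 0, 12) and 155 after D(X, 1, 12).

```python
from typing import List, Sequence

def difference(series: Sequence[float], lag: int = 1) -> List[float]:
    """Apply (1 - B^lag): x_t = y_t - y_{t-lag}; the first `lag` values have no partner and are dropped."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def d_transform(series: Sequence[float], d: int, s: int) -> List[float]:
    """Hypothetical helper for D(X, d, s): one seasonal difference at lag s, then d first-order differences."""
    out = difference(series, s)
    for _ in range(d):
        out = difference(out, 1)
    return out

# Toy monthly series: linear trend (0.5 per month) plus a fixed 12-month pattern.
pattern = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]
y = [0.5 * t + pattern[t % 12] for t in range(48)]
print(d_transform(y, 0, 12)[:3])  # [6.0, 6.0, 6.0]: seasonality removed, trend left as a constant
print(d_transform(y, 1, 12)[:3])  # [0.0, 0.0, 0.0]: the constant removed by the regular difference
```

Because the operators (1 − B) and (1 − B^12) commute, applying the regular difference first would give the same result.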
Next, an ADF unit root test is performed on the differenced sequence D(X, 1, 12); the results are detailed in Table 2. The absolute value of the ADF t-statistic exceeds the absolute values of the critical values at the 1%, 5%, and 10% significance levels, and the p-value is 0.0000, far below the usual significance level of 0.05, so the unit root hypothesis is rejected. Combining Figure 5 with the unit root test, the sequence D(X, 1, 12) is stationary.
We identified the model using the Box-Jenkins identification method, which assumes that the process generating the time series can be approximated by an ARMA model if the series is stationary, or by an ARIMA model if it is non-stationary. Two diagnostic plots help select the p and q parameters of an ARMA or ARIMA model: the autocorrelation function (ACF) and the partial autocorrelation function (PACF). The ACF plot summarizes the correlation between the observations and their lagged values. The PACF plot summarizes the correlation between the observations and their lagged values that is not explained by shorter lags. If the ACF drops sharply to near 0 (cuts off) at a small lag k while the PACF decays gradually toward 0 (tails off), an MA model can be used. If the PACF cuts off at a small lag k while the ACF tails off, an AR model can be used. If neither the ACF nor the PACF cuts off sharply but both eventually decay toward 0, an ARMA model is more appropriate. Here, a sharp decline means a cliff-like drop; it does not require the values to converge to 0, and they may rise again at later lags.
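The sample ACF and PACF behind such plots can be computed directly. The sketch below assumes NumPy and uses the Durbin-Levinson recursion for the PACF, which is one standard estimator (statistical packages may default to others, so values can differ slightly); the function names acf and pacf are illustrative.

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelations rho_0..rho_nlags (rho_0 = 1), using the full-sample variance as denominator."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - k], x[k:]) / denom for k in range(nlags + 1)])

def pacf(x, nlags):
    """Sample partial autocorrelations via Durbin-Levinson; entry k is phi_{k,k}."""
    rho = acf(x, nlags)
    out = np.zeros(nlags + 1)
    out[0] = 1.0
    phi = np.zeros(nlags + 1)  # phi[1:k] holds phi_{k-1,1..k-1} from the previous step
    for k in range(1, nlags + 1):
        num = rho[k] - np.dot(phi[1:k], rho[1:k][::-1])
        den = 1.0 - np.dot(phi[1:k], rho[1:k])
        phi_kk = num / den
        updated = phi.copy()
        updated[1:k] = phi[1:k] - phi_kk * phi[1:k][::-1]
        updated[k] = phi_kk
        phi = updated
        out[k] = phi_kk
    return out

# AR(1) check: the PACF should cut off after lag 1 while the ACF tails off.
rng = np.random.default_rng(42)
noise = rng.normal(size=2000)
series = np.zeros(2000)
for t in range(1, 2000):
    series[t] = 0.7 * series[t - 1] + noise[t]
print(np.round(acf(series, 3), 2), np.round(pacf(series, 3), 2))
```

For this simulated AR(1) process the ACF decays roughly like 0.7^k while the PACF is near 0.7 at lag 1 and close to zero afterwards, which is the "PACF cuts off, ACF tails off" pattern that points to an AR model.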
The ACF and PACF of the D(X, 1, 12) sequence are shown in Figure 6. The white noise test gives a p-value below 0.05, so the sequence is not white noise and is worth modeling. The autocorrelations converge to 0 after the second lag with some tailing, and the partial autocorrelations also show some tailing; a preliminary decision is therefore made to select an ARMA model.
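A common choice for this white noise check is the Ljung-Box test (the text does not name the test at this step, so this is an assumption). Its statistic is Q = n(n + 2) Σ_{k=1}^{h} ρ_k² / (n − k), which is compared with a χ² distribution with h degrees of freedom (fewer when the test is applied to the residuals of a fitted model). A plain-Python sketch of Q follows; the p-value can then be obtained from a χ² survival function such as scipy.stats.chi2.sf, which this sketch does not depend on.

```python
from typing import Sequence

def ljung_box_q(x: Sequence[float], h: int) -> float:
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} rho_k^2 / (n-k)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, h + 1):
        rho_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

# A strictly alternating series is strongly autocorrelated at lag 1 (rho_1 = -0.9 here).
q = ljung_box_q([1.0, -1.0] * 5, h=1)
print(round(q, 6))  # 10.8, well above the 5% chi-square critical value of about 3.84 for 1 degree of freedom
```

A large Q (small p-value) rejects the white noise hypothesis, which is the situation described above in which the differenced series still carries structure worth modeling.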


Model Ordering and Parameter Estimation
In this section, we analyze the ACF and PACF plots to determine p and q. The autoregressive order p is determined from the PACF plot: if all bars after lag k are close to zero, p = k can be chosen, so the last significant lag before the PACF cuts off is a candidate value for p. The moving average order q is determined from the ACF plot in the same way: if all bars after lag k are close to zero, q = k can be chosen, so the last significant lag before the ACF cuts off is a candidate value for q.
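Reading candidate orders off a plot usually means checking which bars fall outside the approximate 95% band ±1.96/√n. The sketch below applies that rule to a list of ACF or PACF values indexed by lag; the function names and the constant band are illustrative assumptions (plotting libraries may draw different, lag-dependent bands).

```python
import math
from typing import List, Sequence

def significant_lags(values: Sequence[float], n_obs: int, z: float = 1.96) -> List[int]:
    """Lags k >= 1 whose ACF or PACF value lies outside the band +/- z / sqrt(n_obs)."""
    bound = z / math.sqrt(n_obs)
    return [k for k, v in enumerate(values) if k >= 1 and abs(v) > bound]

def cutoff_order(values: Sequence[float], n_obs: int, z: float = 1.96) -> int:
    """Last significant lag: a candidate p (from a PACF) or q (from an ACF) if the plot cuts off after it."""
    lags = significant_lags(values, n_obs, z)
    return max(lags) if lags else 0

# Hypothetical PACF values for lags 0..5 from a series of 100 observations (band = 0.196).
pacf_values = [1.0, 0.5, 0.3, 0.05, -0.02, 0.25]
print(significant_lags(pacf_values, 100), cutoff_order(pacf_values, 100))  # [1, 2, 5] 5
```

Isolated spikes at seasonal lags (such as 11 and 12 in Figure 6) make the last significant lag much larger than the short-lag structure alone would suggest, so such candidates are typically compared by fitting several models.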
It can be seen from Figure 6 that candidate values for the AR order are 2, 11, and 12, and candidate values for the MA order are 2, 3, 11, and 12. Because the autocorrelation function (ACF) of the series shows a significant correlation at the first lag after each seasonal cycle, a seasonal moving average term is needed to capture this seasonal effect, so the SMA order is set to 1. Through model tuning, we obtain the model parameters in Table 3. The SARIMA(12, 1, 12)(0, 1, 1)_12 model is the most appropriate; the D(X, 1, 12) sequence, that is, the series after first-order differencing and seasonal differencing at lag 12, is modeled as follows, where B is the backshift operator, ϵ_t is white noise, and the coefficients φ_i, θ_j, and Θ_1 are estimated from the data:

$$\Big(1-\sum_{i=1}^{12}\phi_i B^{i}\Big)(1-B)\big(1-B^{12}\big)X_t=\Big(1+\sum_{j=1}^{12}\theta_j B^{j}\Big)\big(1+\Theta_1 B^{12}\big)\epsilon_t \qquad (11)$$

Model Testing
• Residual Analysis: According to Figure 7, the residual autocorrelation plot of the model's residuals shows no obvious pattern or trend; the residuals are, indeed, white noise.
• Ljung-Box Test: According to the residual autocorrelation plot in Figure 8, the p-value is less than the significance level (usually 0.05), which indicates that there is autocorrelation in the residual series.
• AIC Comparison: Comparing the fitting performance of different SARIMA models with the Akaike Information Criterion (AIC) shows that the SARIMA(12, 1, 12)(0, 1, 1)_12 model has the minimum AIC.
• Prediction Performance: The model is trained on historical data and then used to predict future data. Table 4 shows the prediction results and the relative errors. Figure 8 and Table 4 show that the relative errors are small and that the model predicts well.

For the BP neural network, the values of the previous 12 periods are used to predict the value of the next period. Specifically, the values of periods 1-12 serve as the network input and the value of period 13 as the output; likewise, the values of periods 2-13 serve as the input and the value of period 14 as the output, and so on. According to a common rule of thumb, the number of hidden-layer neurons is 2/3 of the number of input-layer neurons plus 1/3 of the number of output-layer neurons, which gives 2/3 × 12 + 1/3 × 1 ≈ 8.33, that is, 8 or 9 neurons. An empirical comparison is then used to set the number of hidden-layer neurons to nine. The final network therefore consists of 12 input-layer neurons, 9 hidden-layer neurons, and 1 output-layer neuron, with the Sigmoid function as the activation function. The training process uses 5000 iterations, an error threshold of 0.000001, and a learning rate of 0.01.
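The sliding-window setup, the rule-of-thumb hidden-layer size, and a 12-9-1 network trained by backpropagation can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' MATLAB implementation: the linear output neuron, the uniform(-0.5, 0.5) weight initialization, per-sample (online) gradient descent, and the helper names (`make_windows`, `hidden_size_rule_of_thumb`, `BPNetwork`) are assumptions, while the iteration limit (5000), error threshold (0.000001), and learning rate (0.01) follow the text. Inputs are assumed to be scaled to a small range such as [0, 1] before training.

```python
import math
import random
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], n_lags: int = 12) -> Tuple[List[List[float]], List[float]]:
    """Sliding window (1-based periods): periods s+1..s+n_lags are the input, period s+n_lags+1 the target."""
    X = [list(series[s:s + n_lags]) for s in range(len(series) - n_lags)]
    y = [series[s + n_lags] for s in range(len(series) - n_lags)]
    return X, y


def hidden_size_rule_of_thumb(n_in: int, n_out: int) -> Tuple[int, int]:
    """2/3 of the input neurons plus 1/3 of the output neurons, rounded down and up."""
    h = 2 * n_in / 3 + n_out / 3
    return math.floor(h), math.ceil(h)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class BPNetwork:
    """Hypothetical n_in-n_hidden-1 network: sigmoid hidden layer, linear output (an assumption)."""

    def __init__(self, n_in: int = 12, n_hidden: int = 9, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hj for w, hj in zip(self.w2, h)) + self.b2

    def predict(self, x: Sequence[float]) -> float:
        return self._forward(x)[1]

    def mse(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
        return sum((self.predict(x) - t) ** 2 for x, t in zip(X, y)) / len(X)

    def train(self, X: Sequence[Sequence[float]], y: Sequence[float],
              epochs: int = 5000, lr: float = 0.01, goal: float = 1e-6) -> int:
        """Online backpropagation; stops early once the training MSE falls below `goal`.
        Returns the number of epochs actually run."""
        for epoch in range(1, epochs + 1):
            for x, target in zip(X, y):
                h, out = self._forward(x)
                err = out - target  # derivative of err^2 / 2 with respect to the output
                for j, hj in enumerate(h):
                    delta = err * self.w2[j] * hj * (1.0 - hj)  # uses w2[j] before its update
                    self.w2[j] -= lr * err * hj
                    self.b1[j] -= lr * delta
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * delta * xi
                self.b2 -= lr * err
            if self.mse(X, y) < goal:
                return epoch
        return epochs
```

Here `make_windows(series, 12)` reproduces the pairing of periods 1-12 with period 13, periods 2-13 with period 14, and so on, and `hidden_size_rule_of_thumb(12, 1)` returns (8, 9).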

BP Neural Network Prediction Results
After training on the sample data, the network's output values and their fit to the actual values are shown in Table 5 and Figure 9. The relative error between the network outputs and the actual values is small, suggesting that the BP neural network can be effectively applied to predicting China's civil aviation passenger traffic.


SARIMA-BP Neural Network Prediction Model for Civil Aviation Passenger Traffic Volume
The results show that each single prediction method has limited accuracy. Therefore, the SARIMA and BP models are combined to forecast civil aviation passenger traffic volume. The residuals of the seasonal ARIMA predictions serve as the desired output of the BP neural network, and the original civil aviation passenger traffic data are used for training, so that the trained network yields predicted values of the residual sequence. Finally, MATLAB R2023b outputs the prediction results of the combined SARIMA-BP model. As Figure 10 shows, the predicted values closely match the true values, and the combination substantially reduces the prediction error and improves prediction accuracy.
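The combination step can be sketched as follows, assuming the usual additive residual-correction form in which the combined forecast equals the SARIMA forecast plus the BP-predicted residual; the text above does not spell out the combination formula. The residual learner is deliberately abstracted as a callable (`ResidualModel`, a hypothetical name) so that any trained BP network could be plugged in, and the SARIMA fitted values and forecasts are assumed to come from a separately fitted model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: maps the n_lags most recent observations to a predicted residual.
ResidualModel = Callable[[List[float]], float]


def residual_training_set(actual: Sequence[float], sarima_fitted: Sequence[float],
                          n_lags: int = 12) -> Tuple[List[List[float]], List[float]]:
    """Input: the n_lags previous observations of the original series.
    Target (desired BP output): the SARIMA residual actual - fitted at the next period."""
    residuals = [a - f for a, f in zip(actual, sarima_fitted)]
    X = [list(actual[t - n_lags:t]) for t in range(n_lags, len(actual))]
    y = residuals[n_lags:]
    return X, y


def hybrid_forecast(sarima_forecast: float, history: Sequence[float],
                    residual_model: ResidualModel, n_lags: int = 12) -> float:
    """One-step combined forecast: SARIMA forecast plus the predicted residual correction."""
    correction = residual_model(list(history[-n_lags:]))
    return sarima_forecast + correction
```

Keeping the residual learner behind a plain callable separates the combination logic from the choice of network, so the same function works for a BP network or any other residual model.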


Comparison and Analysis of Results
The relative errors of the three models were compared and analyzed; the results of the comparison are shown in Figure 11, and the evaluation indicators (MRE, R², MSE, and RMSE) of the three models are compared in Table 6. Figure 11 shows that all relative errors of the combined model are below 5%, whereas each single prediction model exhibits some large relative errors.
Table 6 shows that the predictions of the combined model agree well with the actual civil aviation passenger volume data. Its mean relative error is 1.6906%, and its R² value is 0.9816, very close to 1, indicating a good fit. Its mean square error (MSE) and root mean square error (RMSE) are also markedly lower than those of the other models, which further supports its superiority. The SARIMA-BP model combines the advantages of the two models and makes effective use of the prediction information of each, which greatly improves prediction accuracy and the reliability of the results. Therefore, the SARIMA-BP model was chosen to predict civil aviation passenger volume during the epidemic period (2020-2023).
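The evaluation indicators in Table 6 can be computed with a few lines of Python. The formulas are not spelled out in this section, so this sketch assumes the common definitions: MRE as the mean absolute relative error in percent, and R² as 1 - SSE/SST. The function name `evaluate` is hypothetical.

```python
import math
from typing import Dict, Sequence


def evaluate(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """MRE (%), MSE, RMSE and R^2 for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)
    mean_a = sum(actual) / n
    sst = sum((a - mean_a) ** 2 for a in actual)
    mse = sse / n
    return {
        # Mean of absolute relative errors, expressed as a percentage (assumed definition).
        "MRE": 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # Coefficient of determination: 1 - SSE / SST.
        "R2": 1.0 - sse / sst,
    }
```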

Analysis of the Impact of the Epidemic on Passenger Transport Volume
We compared the forecast of civil aviation passenger traffic during the epidemic period (2020-2023) with the actual data, as shown in Figure 12, and observed the severe impact of the epidemic on the aviation industry. Over the whole period, the cumulative loss of civil aviation passenger traffic was approximately 1347.2 million passengers. The loss peaked in February 2022, at 87.62% (approximately 55.8 million passengers); February 2020, at the beginning of the outbreak, also saw a large loss of 85.11% (approximately 49.81 million passengers). From the beginning of 2023, passenger traffic gradually recovered: the smallest loss, 13.37% (about 9.64 million passengers), occurred in July 2023, after which traffic gradually returned to normal levels.
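The loss figures above can be reproduced from the forecast and actual series with a short sketch. It assumes that the monthly loss is the SARIMA-BP forecast (the traffic expected without the epidemic) minus the actual traffic, and that the loss percentage is measured relative to the forecast; this section does not state the denominator explicitly. The function name and the values in the test are hypothetical.

```python
from typing import Dict, Sequence


def epidemic_losses(forecast: Sequence[float], actual: Sequence[float]) -> Dict[str, object]:
    """Loss = counterfactual forecast - actual; loss rate = loss / forecast, in percent (assumed)."""
    losses = [f - a for f, a in zip(forecast, actual)]
    rates = [100.0 * loss / f for loss, f in zip(losses, forecast)]
    peak = max(range(len(rates)), key=rates.__getitem__)  # month with the largest loss rate
    low = min(range(len(rates)), key=rates.__getitem__)   # month with the smallest loss rate
    return {"losses": losses, "rates": rates, "total": sum(losses),
            "peak_index": peak, "min_index": low}
```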
Presently, with the risk of the epidemic receding and civil aviation passenger traffic growing steadily, people's willingness to travel abroad has increased significantly. Given the constraints of road and railway transportation, airlines have the opportunity to attract more passengers to fly by launching promotional activities and improving cabin comfort. In addition, airlines can open new routes in response to changes in market demand, or optimize or even discontinue existing routes, to meet passengers' needs more effectively and enhance their market competitiveness.

Conclusions
Based on the comparative study of the BP neural network model and the SARIMA model in predicting civil aviation passenger volume, and on the results of combining the two models, we draw the following conclusions.
Firstly, in predicting civil aviation passenger volume in 2019, the SARIMA-BP combination model performed best, achieving higher prediction accuracy than either the BP neural network model or the SARIMA model alone. This shows that combining prediction methods according to the characteristics of each single model can improve the accuracy and stability of prediction.
Secondly, to predict civil aviation passenger volume from 2020 to 2023, we used the SARIMA-BP combination model, which had been validated as the best method. Comparison with actual data showed that the epidemic significantly impacted the aviation industry, causing substantial losses in civil aviation passenger traffic. The decline peaked in February 2022, and the initial outbreak in February 2020 also caused a sharp drop. However, over time, especially from early 2023, civil aviation passenger volume gradually rebounded and eventually returned to normal levels. Airlines can adjust their long-term and short-term strategies according to the actual situation, for example by optimizing and adjusting the route network and re-evaluating aircraft procurement plans.
In summary, this study demonstrates the effectiveness of combination models in predicting civil aviation passenger volume and provides an in-depth analysis of the epidemic's impact on the aviation industry. These findings offer valuable insights for airlines and government departments, enabling them to develop effective response strategies and measures to address similar crises which may arise in the future. Future research could focus on exploring alternative prediction models or integrating multiple methods to enhance the precision and stability of predictions, thereby better adapting to the ever-changing market environment.

Discussion
Civil aviation passenger volume shows a significant linear growth trend. The SARIMA model has high prediction accuracy for time series with regular growth, while the BP neural network shows excellent predictive ability for nonlinear sequences. Combining the two models can therefore further improve prediction accuracy. The research literature shows that combined prediction models usually achieve higher accuracy than single models, and the examples in Table 7 verify this advantage of combined models in prediction performance.

Application | Reported result of the combined model
Annual civil aviation passenger volume | Noticeably superior to other single models
Short-term forecasting of traffic flows | Prediction accuracy improved compared with a single model
In this paper, the combination of SARIMA and a BP neural network is used to predict civil aviation passenger volume, and the prediction accuracy is improved. However, this paper has the following shortcomings:
1. It does not try a variety of model combinations in the prediction.
2. In predicting civil aviation passenger volume, it does not take economic, demographic, or other external factors into account.
3. The amount of data used is relatively limited: only monthly data are included, not annual data.
In future research, we can consider introducing more external factors and expanding the types and quantities of data to improve the accuracy of prediction. In addition, the combined-model method can also be applied to the prediction of highway and railway passenger volume.

Figure 1. The trend in China's civil aviation passenger traffic from January 2006 to December 2019.
Sustainability 2024, 16, x FOR PEER REVIEW 11 of 18


Figure 9. Fitting results of the BP neural network model.

Figure 10. Fitting results of the combined model's predictions.

Figure 12. Comparison between predicted and actual data from 2020 to 2023.

Table 1. The research results of some works in the literature.

Table 5. BP neural network model prediction results.

Table 6. Comparison of the prediction results of the three models.