Research on Aviation Safety Prediction Based on Variable Selection and LSTM

Accurate prediction of aviation safety levels is significant for the efficient early warning and prevention of incidents. However, the causal mechanisms and temporal character of aviation accidents are complex and not fully understood, which increases the cost of accurate aviation safety prediction. This paper adopts an innovative statistical method combining the least absolute shrinkage and selection operator (LASSO) with long short-term memory (LSTM). We compiled and calculated 138 months of aviation insecure-event records collected from the Aviation Safety Reporting System (ASRS) and took minor accidents as the predicted variable. Firstly, this paper introduces group variables and a weight matrix into LASSO to realize adaptive variable selection. Furthermore, it feeds the selected variables into a multistep stacked LSTM (MSSLSTM) to predict the monthly accidents in 2020. Finally, the proposed method is compared with multiple existing variable selection and prediction methods. The results demonstrate that the RMSE (root mean square error) of the MSSLSTM is reduced by 41.98% compared with the original model; in addition, the key variables selected by the adaptive sparse group LASSO (ADSGL) reduce the elapsed time by 42.67% (13 s). This shows that aviation safety prediction based on ADSGL and MSSLSTM can improve the prediction efficiency of the model while keeping excellent generalization ability and robustness.


Introduction
The aviation industry is of great economic value, and to adapt to the current demand for intelligent and refined safety management, aviation safety mitigation strategies have been shifting from reactive to proactive and predictive methods. Therefore, it is of great significance to clarify the causal mechanisms of aviation accidents and take corresponding early-warning measures to promote the development of aviation safety.
Accurate aviation safety prediction is the basis of effective accident early warning and has become a hot research topic. Recently, aviation safety prediction methods have used machine learning to train on the features of historical causes and accident samples, construct a mathematical analytical model, and measure the change trend of the safety level. For example, Liang et al. [1] used the BP (backpropagation) neural network model to predict the monthly incidents per 10,000 flight hours of an airline. Puranik et al. [2] built an online predictive model based on a Random Forest regression algorithm to predict landing performance metrics. Rosa et al. [3] constructed an aviation safety risk assessment model based on the Bayesian inference mechanism. Lukacova et al. [4] proposed a model for accident severity prediction based on classification and regression trees. Zhang et al. [5] proposed a combined method involving SVM (support vector machine) and deep neural networks to quantify the risk. Qiao et al. [6] constructed an RBF (radial basis function) neural network model to predict hard landings. Machine learning not only has strong self-learning ability and robustness but can also fit the nonlinear relationships among complex variables well. However, the input variables of traditional machine learning models are generally treated as independent; that is, the temporal features are not fully taken into consideration, which adds interference to the prediction results. To fully extract the temporal features, Xiong et al. [7] used an LSTM neural network model to train on and predict the sign data of American bird-strike accidents. However, a single-layer structure was adopted in that paper, and it is difficult for such a structure to accurately integrate the nonlinear relationships. Zhou et al. [8] used ACARS (aircraft communications addressing and reporting system) accident report records as the research object, applying the LSTM model to effectively capture the long-term dependency of samples and enhance the accuracy and robustness of safety measurement. Zhang [9] applied sequential deep learning techniques based on LSTM to perform a prognosis of adverse events. However, since aviation accident data typically form a small sample, it is difficult to efficiently learn the sample characteristics by peer-to-peer prediction.
In addition to the choice of prediction method, the selection of input variables affects the prediction effect. Classic accident-inducement identification models are represented by the SHEL model [10], REASON [11], HFACS [12] (human factors analysis and classification system), OHFAM [13] (occurrence human factors analysis model), and the 24 model [14]. Expanding the inducement indexes is theoretically conducive to a more accurate causality description but accordingly brings the curse of dimensionality: the higher the dimension of the input set, the more the efficiency of safety prediction is reduced. Existing studies have made some explorations in variable selection: Paul et al. [15], in the form of questionnaire interviews, intuitively modeled the greatest threats from the perspective of pilots. Cui et al. [16] used DEA (data envelopment analysis) and a Malmquist index to calculate the civil aviation safety efficiency of Chinese airlines from 2008 to 2012, showing that the quality of personnel training is the most important factor affecting aviation safety. In view of high-dimensional samples with multicollinearity, some scholars have tried to introduce the LASSO penalty term into the classic regression model. LASSO can greatly compress the values of the independent variables to dilute the high-dimensional aviation data, whose feasibility has been verified in the field of fuel consumption [17]. However, variable selection for aviation accident inducements remains unexplored. Meanwhile, the classic LASSO punishes the coefficients of key variables uniformly, which weakens the feature identification of those key variables.
In view of the fact that existing methods insufficiently describe the importance of aviation accident inducements and have low prediction efficiency, this paper proposes a new method of aviation safety prediction based on variable selection and LSTM. The contributions of the proposed method can be described as follows: (a) the dimensions of the safety sample data are sufficiently reduced [18] by ADSGL, which lowers the running cost of the predictor and the elapsed time; (b) the multistep stacked LSTM model is constructed to explore the deep temporal features and complex nonlinear relationships of aviation safety samples, achieving a smaller RMSE than the original model.
The rest of this paper is organized as follows: Section 2 describes the required individual algorithms and the process for building the proposed methods. Section 3 demonstrates a case study, followed by the training process for the constructed model. Section 4 provides a comparative analysis to verify the feasibility and effectiveness of the proposed method, and Section 5 presents the conclusions.

The Whole Process of the Proposed Method
The process of the ADSGL-MSSLSTM method is depicted in Figure 1. The detailed descriptions are given as follows: (1) Data preprocessing. Firstly, the aviation safety data are collected by incident type from the ASRS [19] reporting records, and the text data are converted into structured data. Next, the aviation safety data are normalized into the range [0, 1] to eliminate the interference of dimensional differences in the data analysis.
(2) Key variable selection based on ADSGL. In view of the fact that there are too many candidate variables to efficiently build predictive models, ADSGL is used to select the key variables. The selected key variables are regarded as carrying the most interpretable information about incidents, which effectively reduces the operating cost and improves the calculation efficiency.
(3) Aviation safety prediction based on the MSSLSTM. In view of the fact that predictors that ignore temporal dependence lose interpretability and accuracy, a multistep prediction model of aviation safety based on a multi-layer LSTM is constructed. Firstly, the white noise and stationarity of the data are tested. Secondly, the sample set is reconstructed to match the supervised learning mode. Thirdly, to enhance the ability of nonlinear fitting and temporal-characteristic extraction, the hidden-layer depth and learning step are adjusted, with minimization of the learning error as the criterion for establishing the optimal parameters.

LASSO Penalty Operator
For the classic linear regression model [20] for the aviation safety level, we define

y = Xω + ε,

where y = (y_1, y_2, ..., y_n)^T is the n-dimensional response-variable vector, X = (X_1, X_2, ..., X_l) is the n×l explanatory-variable matrix, ω = (ω^(1), ω^(2), ..., ω^(l))^T is the regression-coefficient vector, and the error follows ε ~ N(0, σ²I_n). Taking the sum of squared errors as the loss function, the penalized likelihood estimation [21] (objective function) is defined as

ω̂ = argmin_ω ‖y − Xω‖₂² + λ ∑_{j=1}^{l} |ω^(j)|^q,

where ‖·‖₂ is the 2-norm and the penalized expression for a single input variable is usually of the form |ω^(j)|^q. The penalty term is called the LASSO penalty when q = 1. λ is the adjustable coefficient, which can be tuned to obtain the optimal solution and scale the penalty term appropriately. The advantages brought by the LASSO penalty into the inducement-variable selection of aviation incidents can be illustrated as follows [22].
(1) Increased sparsity of the regression-coefficient matrix. Sparsity refers to the existence of several coefficients with a value of 0, which conforms to the actual selection of incident-inducement variables: when a candidate variable is an irrelevant interference variable, the corresponding regression coefficient is 0. ω^(i) is strictly equal to 0 when λ takes a large value, realizing the sparsity of the coefficient matrix and achieving variable selection.
(2) Increased model stability. The input-variable matrix often exhibits multicollinearity, so the obtained solution may be unstable and only locally optimal. Daubechies, Defrise, and De Mol [23] prove that any penalty on the coefficient matrix satisfying 1 ≤ p ≤ 2 makes the solution of the original model more stable.
However, the classic LASSO algorithm still has an obvious defect: in univariate selection, LASSO applies the same shrinkage multiple to every regression coefficient, which imposes excessive penalty on large coefficients, resulting in inconsistency and bias in the dimensionality-reduced model.
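To make the penalty concrete, the following is a minimal pure-Python sketch (not the paper's implementation) of LASSO estimation by coordinate descent: each coefficient is updated in turn using the soft-thresholding operator induced by the ℓ1 penalty. The toy data and λ value are illustrative.

```python
# Coordinate-descent LASSO on a toy problem: minimizes
#   ||y - X w||_2^2 + lam * sum_j |w_j|
# An illustrative sketch, not the paper's implementation.

def soft_threshold(z, t):
    """Soft-thresholding: closed-form solution of the 1-D LASSO step."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding variable j
            r = [y[i] - sum(X[i][k] * w[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n))
            norm = sum(X[i][j] ** 2 for i in range(n))
            # soft-threshold the correlation, then rescale
            w[j] = soft_threshold(rho, lam / 2.0) / norm
    return w
```

With a large enough λ, coefficients of irrelevant variables are driven exactly to zero, which is the sparsity property described in point (1) above.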

Construction of the Algorithm for Aviation Safety Variable Selection
To address these limitations of the classic LASSO algorithm, this paper adopts a variable selection method based on the adaptive sparse group LASSO: when minimizing the loss function, the optimization is univariate; that is, to adapt to the various inducement variables and their vastly different penalty strategies, only one variable is optimized at a time, cycling repeatedly until all variables converge.
The l explanatory variables are divided into N non-overlapping groups. The objective function of ADSGL is formulated as

ω̂ = argmin_ω ‖y − Xω‖₂² + λ₁ ∑_{k=1}^{N} τ_k ‖ω^(k)‖₂ + λ₂ ∑_{j=1}^{l} ξ_j |ω_j|,

where [τ]_{1×N} is the weight matrix for the group variables and [ξ]_{1×l} is the weight matrix for the univariate terms. The weight matrices introduced by ADSGL effectively reduce the penalty on large coefficients, which keeps the model consistent and unbiased and improves the accuracy of variable selection. The algorithmic steps are summarized as follows:
Step 1: The regression coefficients are initialized.
Step 2: For k = 1, 2, ..., N, if group k satisfies condition (5), then ω̂^(k) = 0; otherwise, go to Step 3.
Step 3: Within the k-th group of regression coefficients, for j = 1, 2, ..., l_k, if coefficient j satisfies condition (8), then ω̂_j^(k) = 0; otherwise, go to Step 4.
Step 4: ω̂_i^(k) is substituted into (9) to solve for the optimized regression-coefficient values.
Step 5: A threshold on the absolute error is set, and Steps 2-4 are cycled until convergence. The convergence criterion is expressed in terms of an objective f_0 + ∑_j f_j(ω), where f_0 is a strictly convex function and each f_j(ω) is differentiable. The algorithm avoids premature convergence and falling into a local optimum [24], and dynamically adjusts the penalty term according to the size of the regression coefficients, effectively reducing calculation errors.
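As an illustration of Steps 2-4, the following sketch implements the standard proximal (thresholding) update used in sparse group LASSO block coordinate descent, under the assumption that conditions (5) and (8) correspond to the usual group-level and elementwise thresholds; the function names, weights, and inputs are illustrative, not taken from the paper.

```python
# Two-stage thresholding for one coefficient group, as in (adaptive) sparse
# group LASSO block coordinate descent. tau_k and xi_block mirror the weight
# matrices [tau] and [xi] in the text; the values used are illustrative.
import math

def soft(z, t):
    """Elementwise soft-thresholding operator."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def group_update(z_block, lam1, lam2, tau_k, xi_block, step=1.0):
    """One proximal update for group k: elementwise soft-threshold for the
    univariate penalty, then group-level shrinkage that can zero out the
    whole group at once. Returns the updated coefficient block."""
    # Step 3 analogue: univariate soft-thresholding with adaptive weights xi
    s = [soft(z * step, step * lam2 * xi) for z, xi in zip(z_block, xi_block)]
    norm = math.sqrt(sum(v * v for v in s))
    thr = step * lam1 * tau_k
    if norm <= thr:            # Step 2 analogue: zero the entire group
        return [0.0] * len(s)
    scale = 1.0 - thr / norm   # otherwise shrink the group toward zero
    return [scale * v for v in s]
```

The group-level test mirrors Step 2 (an entire group may be set to zero), while the elementwise threshold mirrors Step 3.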

LSTM Architecture
To evaluate the selected key variables in improving prediction accuracy, the LSTM architecture is used as the aviation safety predictor. The LSTM architecture has a memory unit and a forget gate, which update the cell state in real time: the previous output (h_{t−1}) and the current input (x_t) enter the forget gate, which determines how much information is forgotten from the previous state (c_{t−1}). Accordingly, the input gate determines how much of the updated state is retained in the current state (c_t), and the output gate determines how much information c_t exports, thus continuously updating the state parameters over time. The network architecture of LSTM is shown in Figure 2, where f is the forget gate, i is the input gate, and o is the output gate. σ is the activation function, which generally takes the sigmoid function, and tanh is the hyperbolic tangent function. They can be implemented as

σ(x) = 1 / (1 + e^{−x}),  tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x}).

The update formulas [25] for each gate and the cell state can be implemented as

f_t = σ(W_f x_t + U_f h_{t−1} + b_f),
i_t = σ(W_i x_t + U_i h_{t−1} + b_i),
o_t = σ(W_o x_t + U_o h_{t−1} + b_o),
c̃_t = tanh(W_c x_t + U_c h_{t−1} + b_c),
c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t,
h_t = o_t ⊙ tanh(c_t),

where W is the weight of x_t, U is the weight of h_{t−1}, b is the bias, and c̃_t is the candidate state of the cell.
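The gate equations above can be traced with a minimal scalar LSTM step in pure Python; the weights here are illustrative placeholders, not trained values.

```python
# One LSTM cell step with scalar inputs and states, following the gate
# equations in the text. W, U, b are dicts keyed by gate: 'f' (forget),
# 'i' (input), 'o' (output), 'c' (candidate state).
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    f = sigmoid(W['f'] * x_t + U['f'] * h_prev + b['f'])  # forget gate
    i = sigmoid(W['i'] * x_t + U['i'] * h_prev + b['i'])  # input gate
    o = sigmoid(W['o'] * x_t + U['o'] * h_prev + b['o'])  # output gate
    c_tilde = math.tanh(W['c'] * x_t + U['c'] * h_prev + b['c'])
    c_t = f * c_prev + i * c_tilde     # cell-state update
    h_t = o * math.tanh(c_t)           # hidden state / output
    return h_t, c_t
```

With all weights and biases zero, every gate outputs σ(0) = 0.5, so the cell state is simply halved at each step, making the gating mechanism easy to inspect.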
When training an RNN, the number of BPTT (backpropagation through time) iterations grows with the number of time steps, which is likely to cause the gradient to vanish so that the parameters cannot be accurately updated. However, the unique gate structure of the LSTM can better alleviate this problem. In BPTT, the gradient of the cumulative training loss J_t (from the start to the current moment) with respect to early states involves a product of terms ∂h_t/∂h_{t−1}; in a plain RNN this product shrinks or explodes exponentially, whereas in the LSTM the derivative of the cell state, ∂c_t/∂c_{t−1} ≈ f_t, can be controlled at about 1, preventing gradient vanishing or explosion and ensuring accurate parameter updates.

Prediction Process
(1) Stationarity and white noise test. The research object of the LSTM predictor is a temporal aviation safety sample, so the stationarity and white-noise properties of the sample should be tested first; only samples that pass these tests are analyzed and predicted by the predictor.
A stationary sequence refers to a temporal sample with no obvious volatility, trend, or cyclicity. A white-noise sequence is randomly generated and mutually uncorrelated, and it is characterized by three features: zero mean, constant variance, and zero autocorrelation between distinct time points. A schematic drawing of a white-noise sequence is shown in Figure 3.
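A quick screen along the three features above can be coded directly; this is an informal check (sample mean and lag-1 autocorrelation), not the formal stationarity and white-noise tests used in practice.

```python
# Informal white-noise screen: sample mean near zero and negligible lag-1
# autocorrelation. A sketch, not a formal test such as Ljung-Box.

def lag1_autocorr(xs):
    """Lag-1 sample autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[t] - mean) * (xs[t + 1] - mean) for t in range(n - 1))
    return cov / var
```

A strongly trending sequence yields a lag-1 autocorrelation close to +1, while an alternating sequence yields a value close to -1; a white-noise sequence sits near 0.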
(2) Refactoring the data. Input and output feature vectors (x_t, y_t) are refactored to fit the sequence-to-sequence learning mode of the LSTM network: x_t and y_{t−1} (t = 2, ..., n) are merged as the model input, and the training set is divided according to each step size into windows of p consecutive inputs paired with the following q outputs. The test set is refactored in the same windowed form, where n is the total number of time steps, p is the length of the input sample, and q is the length of the output sample.
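The windowed refactoring described above can be sketched as a sliding-window routine; the function name is illustrative.

```python
# Refactor a series into supervised (input, target) pairs: each window of
# p past inputs is paired with the next q outputs, as described in the text.

def make_windows(x, y, p, q):
    """Return [((x_{t-p},...,x_{t-1}), (y_t,...,y_{t+q-1})), ...]."""
    pairs = []
    for t in range(p, len(x) - q + 1):
        pairs.append((tuple(x[t - p:t]), tuple(y[t:t + q])))
    return pairs
```

For a series of n points this produces n − p − q + 1 training pairs, which is how the training and test sets are later divided with p = q = 4.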
(3) Constructing a multistep stacked predictor of aviation safety. The peer-to-peer prediction adopted by the original LSTM cannot accurately describe the changes of the aviation safety level, and a single-layer network lacks the ability to capture the temporal dependence of the dataset. To address these problems, the training layers and step size of the classic LSTM model are adjusted, and the MSSLSTM model is constructed in turn, as shown in Figure 4. In the same training batch, the input step size is set as p, the output step size as q, the number of hidden-layer neurons as k, and the number of hidden layers as l. The output of the previous hidden layer is used as the input of the current layer, a dropout layer is set between adjacent hidden layers to reduce model complexity, and the last hidden layer and the output termination are fully connected [26]. Finally, the n-dimensional output vector is activated by the softmax function. The fully connected layer can be implemented as y = softmax(W_fc h_l + b_fc), where h_l is the output of the last hidden layer.
The multistep stacked LSTM predictor not only increases the input size, which helps calculate the digital features of multiple historical samples in the input gate and improves the usage of existing samples, but also increases the length of the output sample, which helps describe the change trend of the aviation safety level more intuitively. Compared with the original LSTM model, the multistep stacked LSTM predictor strengthens the cell's learning of temporal features, explains trends in the safety level more visually, and supports better forward-looking and real-time prediction. Additionally, the stacked architecture not only greatly strengthens the deep-learning ability but also improves the robustness and adaptability of the predictor.

(4) Optimizing hyperparameters
The learning rate is optimized by the Adam algorithm with weight decay, the numbers of hidden layers and nodes are traversed by step search, the MAE (mean absolute error) values of the training results are compared under different parameter combinations, and the combination with the minimum MAE is recorded.
MAE can be implemented as

MAE = (1/n) ∑_{i=1}^{n} |y_i − ŷ_i|,

where y_i is the test value and ŷ_i is the predicted value. The Adam optimization process is as follows. Firstly, the first-order and second-order moment estimates of the gradient g_t are calculated:

m_t = β₁ m_{t−1} + (1 − β₁) g_t,  v_t = β₂ v_{t−1} + (1 − β₂) g_t²,

where β₁ and β₂ are the decay rates of the first-order and second-order moment estimates. Secondly, the bias-corrected moment estimates are calculated:

m̂_t = m_t / (1 − β₁^t),  v̂_t = v_t / (1 − β₂^t).

Finally, the learning-rate update value is calculated from the corrections:

θ_t = θ_{t−1} − η₀ m̂_t / (√v̂_t + ε),

where η₀ is the initial learning rate. Adam keeps the updated learning rate within a measurable range, which accelerates convergence and ensures the robustness of the model. To intuitively evaluate the accuracy of the prediction model, the single-layer multistep LSTM, the original LSTM, ML-RNN, TCN (temporal convolutional network), DT (decision tree), SVM, and ARMA (autoregressive moving average) predictors are used as control models in the same experimental environment. RMSE is used as the accuracy-evaluation indicator and can be implemented as

RMSE = √( (1/n) ∑_{i=1}^{n} (y_i − ŷ_i)² ).
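The Adam update rules above can be sketched on a toy quadratic objective; β₁, β₂, ε, and the learning rate are common defaults standing in for the paper's unspecified settings.

```python
# Scalar Adam with bias correction, applied to minimizing f(w) = (w - 3)^2.
# The hyperparameters are the common defaults, used here for illustration.
import math

def adam_minimize(grad, w0, eta0=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= eta0 * m_hat / (math.sqrt(v_hat) + eps)
    return w

# gradient of (w - 3)^2 is 2 * (w - 3); minimum is at w = 3
w_star = adam_minimize(lambda w: 2 * (w - 3), w0=0.0)
```

Because the per-step update is bounded by roughly η₀, Adam walks toward the minimum at a controlled rate regardless of the raw gradient magnitude, which is the "measurable range" property noted above.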

Data Preprocessing
The study sample was taken from the ASRS. A complete report record [27] includes Time, Environment, Aircraft, Component (faulty parts, for accidents recorded with mechanical failure), Person, Events, Assessments, and Synopsis. A text data example is shown in Table 1. According to the descriptions of Assessments and Synopsis in the reports, the variables of the accident dataset can be classified into 12 input variables: aircraft (A), company policy (CP), procedure (P), weather (W), communication breakdown (CB), confusion (C), distraction (D), human-machine interface (HMC), situational awareness (SA), time pressure (TP), training/qualification (TQ), and workload (WL), and 1 output variable: minor accident (MA). Based on this, the monthly statistics of accidents from June 2010 to Nov 2020 were collected.
To eliminate the interference from dimensional differences [28], the dataset was normalized by min-max normalization, scaling the index values linearly between 0 and 1:

x* = (x − x_min) / (x_max − x_min),

where x_max and x_min represent the maximum and minimum values of the sample, x is the original value, and x* is the normalized value. Normalized results are shown in Table 2. The differences in value distributions were eliminated in the normalized sample, and the numerical features were retained at equal scale.
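The min-max normalization defined above reduces to a one-line sketch:

```python
# Min-max normalization: scale a sample linearly into [0, 1].
def min_max(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]
```

The minimum maps to 0, the maximum to 1, and all other values keep their relative spacing, which is why the numerical features are retained at equal scale.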

Setting of the Key Parameters of the Variable Selection Algorithm
The minor accident (MA) series was selected as the dependent variable and denoted by {y_t}, t ∈ (0, 138]; the remaining 12 types of insecure events were selected as the independent variables and denoted by {x_t^(i)}, i ∈ (0, 12], t ∈ (0, 138]. To improve the usage of the sample while avoiding overfitting [29], 10-fold cross-validation was used to divide the dataset. Given the actual sample size, the number of epochs was set at 100, and the number of groups was set at 2. To ensure the credibility of the experimental results, the variable selection was performed with different relaxation parameters (α). The parameter settings are shown in Table 3.

Analysis of Experimental Results
The minimum MSE (mean square error) is used as the criterion for the final selection results. MSE can be implemented as

MSE = (1/n) ∑_{i=1}^{n} (y_i − ŷ_i)².
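The three error metrics used in this paper (MSE for variable selection, MAE for hyperparameter tuning, RMSE for model comparison) follow directly from their definitions; a minimal sketch:

```python
# The three error metrics used in the paper, implemented from their
# definitions: mean absolute error, mean square error, root mean square error.
import math

def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mse(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

def rmse(y, y_hat):
    return math.sqrt(mse(y, y_hat))
```

RMSE is simply the square root of MSE, so the two rank models identically; MAE differs in that it penalizes large deviations less heavily.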
The experimental results are shown in Figure 5, which demonstrates the robustness of ADSGL in terms of error convergence; a strong positive correlation is implied between alpha and convergence speed. The LASSO fitting-coefficient trace diagram for α = 0.1 is shown in Figure 6.
From Figure 6, it can be seen that one group of variables' coefficients (W, CB, C, D, HMC, SA, TP, TQ, WL) converged at a higher rate than the other group (A, CP, P), which indicates that the variables in the latter group remain highly interpretable for the output variable at the optimal penalty weight. The absolute values of the converged coefficients of the first group are depicted in Figure 6.
From Figure 7, it can be found that the values of the compressed regression coefficients are relatively consistent as α is adjusted, which demonstrates that the ADSGL method is robust in the contraction of regression coefficients. Furthermore, the common intersection of the four experimental results is {A, CP, P}, whose regression coefficients are non-zero under the largest penalty and which therefore have the greatest impact on MA. Those three variables are thus selected as the input variables of the aviation safety predictor.
The training step is generally fixed in the learning process and set based on actual demand. Given the significant temporal dependence of aviation accidents, the lengths of the input and output samples were set to 4: the historical data of the last 4 months were used to predict the safety level in the next 4 months. The training and test sets were divided accordingly: {(X_{t−4}, ..., X_{t−1}), (Y_t, ..., Y_{t+3})}, t = 5, ..., 131, and {(X_{131}, ..., X_{134}), (Y_{135}, ..., Y_{138})}.

Stability Test and White Noise
To visually present the trend features of the sample over time, each variable was analyzed. The sequence diagram of the insecure events is presented in Figure 8. Figure 8a shows that the sample variance of the minor accident is stable overall. The sequence has little volatility and no obvious trend or cyclicity, so it can be regarded as a stationary sequence. Additionally, the expectation of the sample is much larger than 0, which does not satisfy the zero-mean property of a white-noise sequence. Thus, the sample passed the white-noise test.
The other variables were then observed according to their sequence diagrams; the results show that all input and output variables were stationary, non-white-noise sequences.

Hyperparameter Optimization
To observe the prediction effect of the model in different batches, the number of nodes and the number of layers were successively adjusted. To alleviate overfitting, neurons were randomly discarded at a rate of 65% in each training session (dropout_rate = 0.65).
Furthermore, we adjusted the number of layers, constructed the MSSLSTM aviation safety predictor, and traversed the training error (MAE). Figure 9 shows that the MAE decreases as the number of layers increases to 4, with a minimum MAE of about 0.1962, which demonstrates that the stacked structure enhances the deep-learning ability. The MAE increases when the number of layers exceeds 4; the possible cause is overfitting due to model complexity. Thus, the number of hidden layers in this case was set as 4.
On this basis, the training errors for different node combinations in the first two layers were traversed, as shown in Figure 10a; the global minimum MAE is about 0.1917 when the numbers of nodes were set as 9 and 7. Based on the optimization of the node numbers in the first two layers, we adjusted the node numbers in the last two layers; the training errors are accordingly shown in Figure 10b, and the global minimum MAE is about 0.1751 when the numbers of nodes were set as 5 and 5. From the parameters described above, the optimal numbers of nodes were set as 9, 7, 5, and 5.

Effect of Predictors on the Experimental Results
The model proposed in this paper based on the MSSLSTM is compared with the single-layer multistep LSTM, the original LSTM, MSSRNN, TCN, DT, SVR, and ARMA prediction models. The compared models were all used to predict the aviation safety level in the same experimental environment. The comparison results of each model are shown in Figure 11.

Effect of Predictors on the Experimental Results
The model proposed in this paper based on the MSSLSTM will be compared with prediction models such as LSTM , LSTM, MSSRNN, TCN, DT, SVR, and ARMA. Those compared models were all used to predict the value of aviation safety level in the same experimental environment. The comparison experiment results of each model are shown in Figure 11.
Figure 11. Comparison of the predictive model results.
From Figure 11, it can be seen that: (1) The MSSLSTM predictor outperforms the original LSTM in the fitting effect. (2) Recurrent neural networks (RNNs) are better than the ARMA model, as they demonstrate a better ability to recognise and understand trends through their memory units. (3) For DT and SVR, the prediction errors have a much wider range of deviations compared with RNNs, which indirectly proves that RNNs are relatively robust. (4) TCN fits the increasing and decreasing trends of the predicted value well but has a large relative error.
The RMSE distribution of the 10 experiments is recorded. As shown in Figure 12: (1) The error of MSSLSTM is the least, and its RMSE is below 0.058, reduced by 41.977% and 28.37% compared with LSTM and TCN, respectively. This further proves that the robustness of the MSSLSTM proposed in this paper has been strengthened. (2) The accuracy of the MSSLSTM is slightly higher than that of the MSSRNN, which shows that the gate structure helps improve the accuracy of the multi-layer predictor.
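The error metric and percentage reductions reported above follow the standard definitions; the sketch below shows both, with illustrative RMSE values chosen to be consistent with the reported ~42% reduction over the LSTM baseline (the baseline value 0.100 is an assumption for illustration, not a figure from the paper).

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root mean square error between observed and predicted series.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def reduction(baseline, improved):
    # Percentage reduction of the improved model's error over the baseline.
    return 100.0 * (baseline - improved) / baseline

# Illustrative values consistent with the reported comparison:
# MSSLSTM RMSE ~0.058 versus an assumed LSTM baseline of ~0.100.
print(round(reduction(0.100, 0.058), 1))  # 42.0
```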

Effect of Variable Selection Methods on the Experimental Results
To more intuitively verify the effectiveness and robustness of ADSGL for improving safety prediction efficiency, LASSO, Group LASSO, ridge regression (RR), and random forest (RF) are introduced into the proposed MSSLSTM predictor, respectively, and the input variables determined by each variable selection method are tested. The RMSE, elapsed time (t), and the number of selected variables (Variable_number) are used as evaluation indicators.
The comparison experiment results of each model are shown in Table 4. It can be seen from Table 4 that: (1) The predicted RMSE of ADSGL is 0.058, significantly lower than that of the original operators (LASSO, RR), and its convergence error is the minimum. This shows that the accuracy of the proposed ADSGL is greatly improved compared with the original operators. (2) The accuracy of ADSGL is better than that of RF, with the error decreased by more than 35.53%, which verifies the effectiveness of the penalty term in improving accuracy. (3) Compared with the original LASSO, the elapsed time is shortened by nearly 13 s, which proves that the proposed ADSGL greatly improves time efficiency. (4) ADSGL selects only 3 variables; under the premise of ensuring prediction accuracy, the sample dimension is reduced by about 63.3%, giving better interpretability.
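The combined group-level and univariate shrinkage that distinguishes ADSGL from plain LASSO can be illustrated with the proximal operator of the sparse group lasso penalty: element-wise soft-thresholding followed by group-wise soft-thresholding, with per-group weights mimicking the adaptive weighting. This is an illustrative sketch under assumed names (`prox_sparse_group_lasso`, `lam1`, `lam2`), not the paper's full ADSGL estimator.

```python
import numpy as np

def prox_sparse_group_lasso(beta, groups, lam1, lam2, weights=None):
    # One proximal step combining element-wise (L1) and group-level
    # shrinkage -- the two penalty levels that sparse group lasso (and,
    # via `weights`, its adaptive variant) builds on.
    beta = np.asarray(beta, dtype=float)
    if weights is None:
        weights = np.ones(len(groups))
    out = np.zeros_like(beta)
    # Element-wise soft-thresholding (lasso part).
    soft = np.sign(beta) * np.maximum(np.abs(beta) - lam1, 0.0)
    # Group-wise soft-thresholding (group lasso part), with adaptive
    # per-group weights scaling the group penalty.
    for g, w in zip(groups, weights):
        v = soft[g]
        norm = np.linalg.norm(v)
        if norm > 0.0:
            out[g] = max(0.0, 1.0 - w * lam2 / norm) * v
    return out

beta = np.array([3.0, 4.0, 0.1, -0.1])
shrunk = prox_sparse_group_lasso(beta, groups=[[0, 1], [2, 3]],
                                 lam1=0.05, lam2=0.5)
print(shrunk)  # weak group [2, 3] is zeroed; strong group [0, 1] survives
```

The weak second group is shrunk to exactly zero while the strong first group survives, which is the whole-group-plus-within-group selection behaviour the comparison above evaluates.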

Conclusions
This paper presents a new aviation safety prediction method based on ADSGL-MSSLSTM. In this study, ADSGL is used to select key variables by shrinking coefficients at both the group level and the univariate level, penalizing the coefficients differently depending on their adaptive weights. In this way, the estimates are ensured to be unbiased and consistent, which improves the selection model's accuracy and interpretability. Subsequently, the MSSLSTM is trained on the dataset after dimension reduction, and it is then used to predict the changing trend in minor accidents. The multi-step stacked structure enhances the predictor's ability to generalize and analyze complex temporal dependencies and nonlinear relationships, which helps further improve the prediction efficiency and robustness.
The feasibility of ADSGL-MSSLSTM is demonstrated by case studies of data collected in the ASRS from June 2010 to November 2020, with several existing methods used as control models. Experimental results indicate that the proposed method can efficiently conduct key variable selection and improve the prediction performance for aviation incidents.
As part of the future work, firstly, given the difference in safety among various models of aircraft, we will apply the proposed method to predict the safety level of a specific model, thus extracting more tailored safety information. Additionally, appropriately increasing the coefficient penalty will be adopted to speed up the error convergence rate while satisfying the accuracy demand.