Prediction of Construction and Production Safety Accidents in China Based on Time Series Analysis Combination Model

At present, data on construction safety accidents in China are characterized by small sample size, large fluctuation, large base, and nonlinearity. Overcoming the limitations of traditional prediction models in handling such time series and reducing the contingency of prediction results is of great significance for accident prevention. The purpose of this study was to use trend decomposition to reduce the fluctuation of non-stationary time series and to combine an autoregressive integrated moving average (ARIMA) model with a grey model based on fractional-order accumulation to predict construction accidents accurately. This paper analyzed the numbers of production safety accidents and deaths in housing municipal engineering in China from 2009 to 2019, made monthly, quarterly, and annual forecasts, and compared the forecast results. The fractional-order accumulation grey model (FOAGM), optimized with a genetic algorithm, can forecast individual months separately, which improves the overall forecast of accident numbers. For the annual accident forecast, where the sample is small, rolling forecasting was used to provide a way of hierarchically optimizing the forecast results. The study also emphasized that the prediction of the death toll is affected by larger and above accidents; these impacts were quantified with the CRITIC method and used to revise the prediction results.


Introduction
The construction industry is an important pillar industry in China. In the continuous marketization of the economy, it serves as a benchmark for reform in other industries and is the first to face the various problems brought about by institutional reform [1]. At present, production safety accidents in housing municipal engineering in China still occur frequently and in large numbers [2]. From the perspective of prevention, emphasizing accident prediction in safety work can effectively prevent accidents [3]. Scientific methods are needed to accurately predict the short-term development trend of accidents in the construction process, which will help the state make appropriate policy adjustments for the construction industry as a whole. Based on the predicted safety situation of the industry, construction enterprises can set safety production goals, carry out early warning work effectively, and prepare emergency measures in advance; safety supervision departments can make macro-level decisions; and the lives and health of participants in construction activities can be better protected.
Currently, domestic and foreign scholars have studied the prediction of production safety accidents in various industries. Because research fields differ, the accident objects studied have different data characteristics and development trends, which affect the selection and optimization of the prediction model. This study analyzed and predicted the numbers of accidents and deaths in building construction as time series. Because different research objects produce different time series, scholars often select or construct a single prediction model suited to a field after exploring the long-term patterns of its safety accidents, and then improve prediction accuracy through continuous refinement and optimization. For example, Rajaprasad [4] applied the autoregressive integrated moving average (ARIMA) model, which predicts future values of a series from its past and present values, to the time series of plant accidents. Using readily available and relatively complete sample data, the model produced long-term forecasts for the next six years.
The key to this approach was the stationarity of the sequence, and its prediction accuracy was confirmed. Yang Wenzhong [5] addressed the problems of missing relevant factors and insufficient data mining in traffic accident prediction by introducing time series relationships and modeling them with gradient boosting regression trees. The results indicated that introducing time series information improves the prediction accuracy of the model and can inform the decision-making of traffic management departments. The modeling approach also offers a useful reference for similar predictions.
For non-stationary sequences, Li Yafei [6] applied preprocessing based on Mann-Kendall trend analysis and abrupt-change (mutation) analysis. The trends of accidents and casualties in different flight phases of civil aviation were studied, and a time series model was constructed from long-term data to predict the numbers of global civil aviation accidents and casualties. Li Ji [7] analyzed the causes of accidents and preprocessed the data by quantifying indicators, combining them with BP neural networks to optimize a multi-factor prediction model. Noor Wahida Md Junus [8] estimated and predicted the number of road traffic accidents using a structural time series model for decomposition and analyzed the residuals at each step, which made the predictions more reflective of the real situation. Preprocessing the data, and especially decomposing its trends, therefore provides strong support for accurate predictions. The benefits of such decomposition include the ability to handle any type of data, allowing components to change over time at a variable rate, and insensitivity to outliers.
For decomposed time series, the choice of recombination method is also very important. Huang Yue [9] used the wavelet transform to decompose and recombine the time series, obtaining a fuzzy range for the predicted number of mining production safety accidents, which also helped reduce randomness and uncertainty in the forecast. This demonstrated the effectiveness of building a model that incorporates decomposition of the original data. In addition, as one of the most common models for accident prediction, the grey prediction method can handle limited historical data and incomplete sequences, but it is weak at processing long-term data and reflects only a single pattern of change. To address this, Dong Lifei [10] considered influencing factors such as the number of construction enterprises and the number of employees and incorporated them into the model. The results showed that improving the on-site management systems and overall management level of construction enterprises can reduce accidents, and that these measures are key to controlling the number of deaths in construction safety accidents. In summary, preprocessing and analyzing the time series within a single prediction model and optimizing it with artificial intelligence algorithms can reduce data volatility and improve prediction accuracy.
Even an optimized single prediction model is not systematic and comprehensive. A combined prediction model, by contrast, can make the most of individual prediction results, avoid the loss of useful information, and reduce the impact of accidental factors. Current research on improving the accuracy and stability of predictions still focuses on weakening the volatility of the series. For example, Barman et al. [11] introduced seasonal effects into short-term power load forecasting based on the influence of variables. These variables were effectively integrated into a combined firefly algorithm and support vector machine model, showing that adding other factors makes predictions more realistic than a single model does. To overcome the difficulties caused by non-stationary sequences, Wu et al. [12] used different models to predict signal sequences of different frequencies, demonstrating that the EMD-GM-ARMA combination model can accurately predict short- and medium-term mine safety production conditions. Guo Jinping [13] used the X-12-ARIMA model with seasonal factors to predict the evolution of mine production safety conditions, and the predictions agreed with the actual trend of coal mine accidents. To make the fluctuation trend more realistic, Wang Yuli [14] used a time series Markov (TSM) model that accounts for the impact of the recent situation; the final prediction error of the TSM model was significantly smaller than that of the time series (TS) model, confirming its higher precision.
In general, the above studies mainly analyzed a particular factor in a specific field over a long time range, optimizing and combining classical prediction models to form accident prediction models suited to that field. Among them, the ARIMA model needs only endogenous variables and does not rely on exogenous variables (it depends only on the data itself, unlike regression, which requires other variables), and it performs well on non-stationary time series. However, it cannot capture nonlinear relationships. Grey forecasting accumulates (or otherwise transforms) the original data to obtain an approximately exponential pattern and then models that pattern, which makes it applicable when historical data are scarce and the sequence is incomplete or of low reliability. However, its long-term forecasts are weak, and it cannot handle volatile data [15]. Since the numbers of accidents and deaths in building construction are characterized by insufficient sample data and instability, the two methods are combined here to address the problem.
This paper divided the prediction of construction and production safety accidents into three granularities: annual, quarterly, and monthly. First, each series was processed by trend decomposition, which splits the time series into several components to reduce the interference caused by nonlinear components. Changes in a time series are mainly driven by four factors: long-term trend, seasonal variation, cyclical variation, and irregular variation, and these were also considered when selecting the decomposition components. The combination of the ARIMA model and a fractional-order grey accumulation model optimized by a genetic algorithm was used to predict the monthly and quarterly numbers of accidents, for which more historical data are available. For annual accidents, where the data volume is small, the rolling forecast method was used: the model is re-estimated in each iteration and produces a prediction; a new observation is then added to the end of the series, and the process continues until there are no more data to add. In predicting the death toll, the impact of larger and above accidents must also be considered. This paper aimed to establish an accident prediction model suited to the construction field and to accurately predict the development trend of construction and production safety accidents.
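The rolling (expanding-window) procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the forecaster is passed in as a function, `linear_trend` is a hypothetical placeholder standing in for the fractional-order grey model, and the "new observation" appended at each step is assumed to be the actual value for that period.

```python
from typing import Callable, List, Sequence

# A forecaster receives the history observed so far and returns a one-step-ahead forecast.
Forecaster = Callable[[Sequence[float]], float]


def rolling_forecast(series: Sequence[float], n_initial: int,
                     fit_predict: Forecaster) -> List[float]:
    """Expanding-window rolling forecast.

    For each time t from n_initial onwards, the model is re-estimated on all
    observations before t and forecasts t; the actual value at t then joins the
    history, and the loop stops when the data run out.
    """
    if not 1 <= n_initial < len(series):
        raise ValueError("n_initial must leave at least one point to forecast")
    predictions: List[float] = []
    for t in range(n_initial, len(series)):
        history = series[:t]                      # everything observed before t
        predictions.append(fit_predict(history))  # re-estimate, then forecast t
    return predictions


def linear_trend(history: Sequence[float]) -> float:
    """Hypothetical placeholder model: least-squares straight line, extrapolated one step."""
    n = len(history)
    if n == 1:
        return float(history[0])
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(history))
    return y_mean + (sxy / sxx) * (n - x_mean)
```

For example, `rolling_forecast([1.0, 2.0, 3.0, 4.0, 5.0], 2, linear_trend)` returns `[3.0, 4.0, 5.0]`, because the placeholder recovers an exact straight line at every step. Passing the model as a function keeps the rolling logic independent of whichever grey or ARIMA model is plugged in.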
Accidents can be avoided because they have certain characteristics and patterns; as long as these are understood and reasonably applied, effective measures can be taken in advance to control them, preventing accidents and reducing the losses they cause. Predictions based on analyzing these characteristics and patterns support future safety efforts. Actively grasping the general direction of accident prevention can help curb the occurrence of accidents to a certain extent.

Appl. Sci. 2022, 12, 11124 4 of 19

Time Series
Time series forecasting has the following characteristics: it takes the continuation of the past development trend into the future as its premise; the data involved are irregular; and it sets aside the causal relationships among the factors driving development [16]. As an important indicator for studying the development of accidents, the number of accidents determines, to a certain extent, the future safety status of the entire building construction field, and analyzing and forecasting it as a time series can help illustrate accident trends. The number of deaths follows a development trend similar to that of the number of accidents, but accident prediction tends to focus on the number of accidents, so the number of deaths is mostly treated only indirectly in later stages. Predicting the number of fatalities is more complex than predicting the number of accidents.
According to past statistics, construction accidents have remained frequent because of poor material selection, insufficient training and safety awareness of personnel, and chaotic on-site management. Figure 1 shows the number of housing municipal engineering production safety accidents and deaths in China from 2009 to 2019. From 2012 to 2015, the numbers of accidents and deaths decreased as the state issued and improved the relevant laws and regulations of the construction industry, standardizing every stage from design and construction to acceptance and raising the management level to a certain extent [17]. However, since 2016, the numbers of accidents and deaths have continued to "double rise", which was related to the adjustment of accident reporting methods by the State Administration of Work Safety, the neglect of work and responsibilities related to hidden dangers in construction enterprises [18], and soaring housing prices in some parts of the country.
Looking at the global development of accidents, the construction industry and other industries in each country have their own typical emergencies, and it is precisely because of such major shocks that trends change significantly. For example, the COVID-19 pandemic is expected to increase pressure on project completion in the construction industry by shortening construction periods, which in turn will lead to more unsafe behaviors and conditions and therefore more accidents. Proactive measures need to be taken at an early stage to avoid these phenomena. This is also consistent with the theory that political and economic influences are the primary drivers of safe and unsafe behaviors. Such special circumstances should therefore be considered in research, which means that adding seemingly unrelated indicators may also affect accident prediction. This is one of the reasons why the nature of the accident is considered when predicting the number of deaths in Section 3.3, and it is an inevitable choice to quantify other influencing factors and highlight their role in forecasting.
At the same time, construction and production safety accidents show a nonlinear periodicity, which causes fluctuations in the quarterly and monthly accident data. Pre-processing the series, rather than predicting it directly, better reflects the real situation. The development of construction accidents is mainly affected by the long-term trend, seasonal fluctuations, and other fluctuations, which are independent of one another. An additive model based on seasonal and trend decomposition using Loess (locally weighted regression), known as STL, was therefore chosen to divide the time series into three parts, $Y_t = T_t + S_t + R_t$: the trend component $T_t$, the seasonal component $S_t$, and the remainder $R_t$. This "predict separately, then integrate" approach can effectively improve prediction accuracy. Here, "trend" refers to the tendency or state in which a phenomenon continues to develop and change over a long period; "seasonal" refers to regular changes in the level of a phenomenon caused by seasonal changes; and "remainder" (residual) refers to the combined impact of many chance factors on the time series.
Since the monthly data present the trend most intuitively, the monthly numbers of production safety accidents and deaths in China's housing municipal engineering from 2009 to 2019 were selected for trend decomposition; the decompositions are shown in Figure 2.
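STL itself smooths the series with repeated Loess fits, which is too long to reproduce here (the Python library statsmodels ships an implementation as `statsmodels.tsa.seasonal.STL`). The sketch below instead uses the simpler classical additive decomposition, a different technique, to illustrate the same idea $Y_t = T_t + S_t + R_t$: a centered 2×12 moving average gives the trend, month-wise averages of the detrended values give the seasonal component, and what is left is the remainder. It assumes the series starts in January; the function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def classical_additive_decompose(
    y: Sequence[float], period: int = 12
) -> Tuple[List[Optional[float]], List[float], List[Optional[float]]]:
    """Classical additive decomposition y_t = T_t + S_t + R_t (a simpler stand-in for STL).

    Assumes y[0] falls at position 0 of the seasonal cycle (e.g. January).
    Trend and remainder are None at the ends, where the centered average is undefined.
    """
    n, half = len(y), period // 2
    if n < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    trend: List[Optional[float]] = [None] * n
    for t in range(half, n - half):
        window = y[t - half:t + half + 1]
        if period % 2 == 0:
            # 2 x period moving average: the two end points get half weight
            total = sum(window[1:-1]) + 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[t] = total / period
    # Average the detrended values for each position in the cycle (e.g. each month)
    buckets: List[List[float]] = [[] for _ in range(period)]
    for t, tr in enumerate(trend):
        if tr is not None:
            buckets[t % period].append(y[t] - tr)
    raw = [sum(b) / len(b) for b in buckets]
    shift = sum(raw) / period  # centre the seasonal pattern so it sums to zero
    pattern = [s - shift for s in raw]
    seasonal = [pattern[t % period] for t in range(n)]
    remainder = [None if tr is None else y[t] - tr - seasonal[t]
                 for t, tr in enumerate(trend)]
    return trend, seasonal, remainder
```

Each component can then be forecast separately and the forecasts added back together. Unlike STL, this method leaves the first and last six months without a trend value and forces the seasonal pattern to be identical every year.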

Relationship Analysis
The data come from the records of housing municipal engineering production safety accidents in China released by the Ministry of Housing and Urban-Rural Development, which document the monthly numbers of accidents and deaths from 2009 to 2019 in detail. Data from before 2009 and after 2019 were not considered because of the development stage of China's construction industry and the impact of the COVID-19 epidemic, respectively. Over the 11 years from 2009 to 2019, the long-term development trends of accidents and deaths were similar and strongly correlated. However, the residual volatility of the number of deaths was larger than that of the number of accidents, indicating that the factors affecting deaths are more complex and diverse, and that more factors must be considered when forecasting deaths than when forecasting accidents.
To explore the linear correlation between the number of accidents and the number of deaths, both time series need to be decomposed. The values of the long-term trend components, which reflect the development of the accident situation over a considerable period, obtained by decomposing the quarterly and monthly data, are summed to generate annual data separately. The newly generated annual data on accidents and deaths are then compared with the actual annual data. Four sets of data were thus obtained from the trend decomposition: the annual sums of the quarterly accident trend, the monthly accident trend, the quarterly death trend, and the monthly death trend. The Pearson correlation coefficient was selected to measure the correlation between the data. It equals the cosine of the angle between the two mean-centered data vectors and measures the degree of linear correlation: the closer its absolute value is to 1, the stronger the linear relationship. As can be seen from Table 1, for the number of accidents, the newly generated annual data from both the quarterly and the monthly trend decompositions are extremely strongly correlated with the real annual data, with Pearson correlation coefficients exceeding 0.9. Among the three sets of data related to the number of deaths, only the newly generated annual data from the quarterly decomposition were strongly correlated with the real data; the other two were not highly correlated. It follows that the number of deaths cannot be predicted directly as a single time series, and the impact of other factors should also be considered. The following sections therefore quantify the impact of larger and above accidents and use it to supplement the prediction of the death toll.
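The aggregation and correlation steps can be sketched in plain Python. The sketch assumes the trend component is available for every month or quarter (as STL provides) and that the series starts at the beginning of a year; the function names are hypothetical, and no data from Table 1 are reproduced.

```python
import math
from typing import List, Sequence


def to_annual(values: Sequence[float], periods_per_year: int) -> List[float]:
    """Sum consecutive blocks (12 for monthly, 4 for quarterly data) into yearly totals."""
    if len(values) % periods_per_year:
        raise ValueError("series must cover whole years")
    return [sum(values[i:i + periods_per_year])
            for i in range(0, len(values), periods_per_year)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson r: cosine of the angle between the mean-centred vectors x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    dot = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    return dot / norm
```

For example, `pearson(to_annual(monthly_trend, 12), annual_actual)` would give one entry of a table like Table 1, where `monthly_trend` (132 monthly trend values) and `annual_actual` (11 yearly totals) are hypothetical inputs.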

Model Selection
For a stationary time series, autocorrelation means that historical data can be used to predict the future. To judge whether a time series is stationary, its autocorrelation function (ACF) and partial autocorrelation function (PACF) plots are examined. The ACF measures the correlation between the current value of the series and its past values. The PACF measures the correlation between the series and a given lag after removing the effects already explained by the shorter lags. Both plots include a confidence band; autocorrelations inside the band are not considered significantly different from zero, so the plots can serve as a basis for judging stationarity: the ACF of a stationary series typically falls into the band quickly, whereas that of a non-stationary series decays slowly. By observing the ACF and PACF plots of the quarterly and monthly sample data, the original time series were found to be non-stationary. The autoregressive integrated moving average (ARIMA) model was therefore selected for its ability to handle data whose characteristics change over time and non-stationary stochastic processes [19]. However, given its disadvantages, namely that the time series must be stationary or made stationary by differencing and that only linear relationships can be captured, it should be combined with other models. The idea of the grey model is to fit the data by accumulating the original series, which is suitable for predicting data with a small sample size and a single variable [20]. In the monthly forecast, the holiday effect of the Spring Festival had to be considered, and the January and February data were processed separately. Since these data lie within a relatively fixed margin of error and the series is short, a fractional-order grey accumulation model was introduced.
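A small sketch of the quantities shown in the ACF and PACF plots: the sample ACF, the PACF computed with the Durbin-Levinson recursion, and the approximate 95% band ±1.96/√n used for white noise. It is illustrative only; the function names are hypothetical, and plotting packages may use slightly different band formulas.

```python
import math
from typing import List, Sequence


def acf(x: Sequence[float], nlags: int) -> List[float]:
    """Sample autocorrelations r_0..r_nlags, with r_0 = 1."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n / c0 for k in range(nlags + 1)]


def pacf(x: Sequence[float], nlags: int) -> List[float]:
    """Partial autocorrelations via the Durbin-Levinson recursion (index 0 is 1)."""
    r = acf(x, nlags)
    result = [1.0]
    phi: List[float] = []  # phi[j] = coefficient of lag j+1 in the AR(k-1) fit
    for k in range(1, nlags + 1):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


def band(n: int, z: float = 1.96) -> float:
    """Approximate 95% band +/- z/sqrt(n); values inside are not significantly non-zero."""
    return z / math.sqrt(n)


def lags_outside_band(x: Sequence[float], nlags: int) -> List[int]:
    """Lags whose sample autocorrelation falls outside the band."""
    b = band(len(x))
    return [k for k, r in enumerate(acf(x, nlags)) if k > 0 and abs(r) > b]
```

A trending series such as `[float(t) for t in range(200)]` has every one of its first ten lags outside the band, which is the slow decay that signals non-stationarity.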

ARIMA Model
The idea of the ARIMA model is to transform a non-stationary original sequence into a stationary one by differencing, which suits the time series in this paper. The specific steps are as follows. First, the stationarity of the original time series is checked with the augmented Dickey-Fuller (ADF) test, which determines whether a sequence has a unit root: a stationary sequence has no unit root, whereas a non-stationary one does. The null hypothesis of the ADF test is that a unit root exists; if the test statistic is smaller than the critical values at the 10%, 5%, and 1% significance levels, the null hypothesis can be rejected with 90%, 95%, and 99% confidence, respectively. Second, the model is identified and its parameters are estimated. The principle is to convert the non-stationary time series into a stationary one and then regress the dependent variable only on its own lagged values and on the present and lagged values of the random error term. In the ARIMA(p, d, q) model, d is the differencing order; AR denotes the autoregressive part, which is suitable only for phenomena related to their own previous values, and p is the number of autoregressive terms; MA denotes the moving average part, which focuses on the accumulation of error terms and effectively eliminates random fluctuations in predictions, and q is the number of moving average terms [21]. The model is represented as follows:
$$\left(1 - \sum_{i=1}^{p} \phi_i L^{i}\right)(1 - L)^{d} X_t = \left(1 + \sum_{i=1}^{q} \theta_i L^{i}\right)\varepsilon_t,$$
where $X_t$ is the sequence integrated of order d; $L$ is the lag operator, so that $(1 - L)^{d}$ is the d-order differencing operator; $\phi_i$ are the autoregressive coefficients; $\theta_i$ are the moving average coefficients; and $\varepsilon_t$ is the random term. Finally, the Akaike Information Criterion (AIC), which builds on the concept of entropy, weighs the complexity of the estimated model against its goodness of fit: $\mathrm{AIC} = 2k - 2\ln\hat{L}$, where $k$ is the number of estimated parameters and $\hat{L}$ is the maximized likelihood. A better fit and fewer parameters both give a smaller AIC value, and the smaller the AIC value, the better the model.
The criterion therefore favors the model that best explains the data with the fewest free parameters. To avoid overfitting, the parameter combination with the smallest AIC value is selected as the optimal ARIMA model [22].
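A minimal sketch of AIC-based order selection, assuming d = 1 and, for brevity, only pure autoregressive candidates ARIMA(p, 1, 0) fitted by ordinary least squares, so the Gaussian AIC reduces to n ln(RSS/n) + 2k up to a constant. A full ARIMA(p, d, q) search would also include MA terms and estimate them by maximum likelihood, for example with a statistics package; all function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def difference(x: Sequence[float], d: int = 1) -> List[float]:
    """Apply (1 - L)^d by repeated first differencing."""
    out = list(x)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))) / m[r][r]
    return sol


def ar_aic(w: Sequence[float], p: int, max_p: int) -> float:
    """OLS fit of AR(p) with intercept; Gaussian AIC = n ln(RSS/n) + 2k (constants dropped).

    Every candidate uses the same sample (t >= max_p) so the AIC values are comparable.
    """
    rows = [[1.0] + [w[t - i] for i in range(1, p + 1)] for t in range(max_p, len(w))]
    y = [w[t] for t in range(max_p, len(w))]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yt - sum(bi * ri for bi, ri in zip(beta, r))) ** 2 for r, yt in zip(rows, y))
    n = len(y)
    return n * math.log(rss / n) + 2 * k


def select_ar_order(x: Sequence[float], d: int = 1,
                    max_p: int = 4) -> Tuple[int, List[float]]:
    """Return the p of ARIMA(p, d, 0) with the smallest AIC, plus all AIC values."""
    w = difference(x, d)
    aics = [ar_aic(w, p, max_p) for p in range(max_p + 1)]
    best = min(range(len(aics)), key=aics.__getitem__)
    return best, aics
```

Fitting all candidates on a common sample matters: dropping a different number of initial points for each p would change n and make the AIC values incomparable.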

Fractional Order Grey Accumulation Model
When the original sequence shows weak regularity and large fluctuations, parameters obtained by combining first-order accumulation with least squares fit the actual data poorly, and the model needs to be optimized. Fractional-order accumulation operators can overcome this poor fit, but they tend to average out the data, which makes overfitting likely. Moreover, analysis of the original time series shows that individual data points can affect the accuracy of the ARIMA model's predictions. It may be more reasonable to treat these individual data separately, and the grey model is effective in correcting them. In addition, for annual forecasts, where trend decomposition cannot be applied, the ARIMA model is no longer suitable; introducing a fractional-order grey accumulation model with rolling prediction avoids results that are accidental because of too little sample data.
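The steps to build the model are as follows. First, let the r-order accumulation generated sequence be $X^{(r)} = \left(x^{(r)}(1), x^{(r)}(2), \ldots, x^{(r)}(n)\right)$, where $x^{(r)}(k)$ is the r-order grey accumulation of the original sequence $X^{(0)}$, defined through the gamma function:
$$x^{(r)}(k) = \sum_{i=1}^{k} \frac{\Gamma(r + k - i)}{\Gamma(k - i + 1)\,\Gamma(r)}\, x^{(0)}(i), \quad k = 1, 2, \ldots, n.$$
Next, take the mean of adjacent terms, $Y^{(r)}(k) = \frac{1}{2}\left(x^{(r)}(k) + x^{(r)}(k-1)\right)$, $k = 2, 3, \ldots, n$, and establish the grey differential equation, i.e., the $\mathrm{GM}_r(1,1)$ model [23]:
$$x^{(r)}(k) - x^{(r)}(k-1) + a\,Y^{(r)}(k) = b,$$
where $x^{(r)}(k)$ is the r-order grey accumulation generated sequence and $Y^{(r)}(k)$ is the mean of its adjacent terms. Since least squares estimation minimizes the sum of squared errors, it is used to obtain the parameters $a$ and $b$, which satisfy the following matrix relationship:
$$[a, b]^{T} = \left(B^{T}B\right)^{-1}B^{T}D, \quad B = \begin{bmatrix} -Y^{(r)}(2) & 1 \\ -Y^{(r)}(3) & 1 \\ \vdots & \vdots \\ -Y^{(r)}(n) & 1 \end{bmatrix}, \quad D = \begin{bmatrix} x^{(r)}(2) - x^{(r)}(1) \\ x^{(r)}(3) - x^{(r)}(2) \\ \vdots \\ x^{(r)}(n) - x^{(r)}(n-1) \end{bmatrix}.$$
From this, the whitening differential equation of the $\mathrm{GM}_r(1,1)$ model is obtained:
$$\frac{\mathrm{d}x^{(r)}(t)}{\mathrm{d}t} + a\,x^{(r)}(t) = b.$$
With $a$ and $b$ estimated, the whitening equation is discretized and solved to obtain the time response sequence and the restored values [24]:
$$\hat{x}^{(r)}(k+1) = \left(x^{(0)}(1) - \frac{b}{a}\right)e^{-ak} + \frac{b}{a}, \quad k = 1, 2, \ldots,$$
$$\hat{x}^{(0)}(k) = \hat{x}^{(1)}(k) - \hat{x}^{(1)}(k-1),$$
where $\hat{x}^{(r)}(k)$ are the predicted values before restoration, $\hat{x}^{(0)}(k)$ are the predicted values after restoration, and $\hat{X}^{(1)}$ is the (1 - r)-order accumulation of $\hat{X}^{(r)}$, which brings the sequence to first-order accumulation before the final first-order differencing. Because the weight of $x^{(0)}(1)$ in $x^{(r)}(1)$ is 1, the initial value satisfies $\hat{x}^{(r)}(1) = x^{(r)}(1) = x^{(0)}(1)$.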
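The accumulation, estimation, and restoration steps above can be checked numerically with a short Python sketch. It is an illustration under stated assumptions, not the authors' implementation: function names are hypothetical; the gamma-function weights are generated with the equivalent recurrence $c_j = c_{j-1}(r + j - 1)/j$ rather than by calling the gamma function, which also works for negative orders; and restoration uses the (-r)-order accumulation, which equals the (1 - r)-order accumulation followed by first differencing.

```python
import math
from typing import List, Sequence, Tuple


def frac_accumulate(x: Sequence[float], r: float) -> List[float]:
    """r-order accumulation: x_r(k) = sum_{i<=k} Gamma(r+k-i) / (Gamma(k-i+1) Gamma(r)) * x(i).

    The Gamma ratios are built with the recurrence c_j = c_{j-1} * (r + j - 1) / j,
    which stays valid for r <= 0, so r = -s gives the s-order inverse accumulation.
    """
    c = [1.0]
    for j in range(1, len(x)):
        c.append(c[-1] * (r + j - 1) / j)
    return [sum(c[k - i] * x[i] for i in range(k + 1)) for k in range(len(x))]


def fgm_fit(x0: Sequence[float], r: float) -> Tuple[float, float]:
    """Least-squares a, b of the grey equation x_r(k) - x_r(k-1) + a * Y_r(k) = b."""
    xr = frac_accumulate(x0, r)
    ys = [0.5 * (xr[k] + xr[k - 1]) for k in range(1, len(xr))]  # Y_r(k), adjacent means
    ds = [xr[k] - xr[k - 1] for k in range(1, len(xr))]
    # Closed-form solution of (B^T B)[a, b]^T = B^T D with rows of B equal to [-Y_r(k), 1]
    m, s_y, s_d = len(ys), sum(ys), sum(ds)
    s_yy = sum(v * v for v in ys)
    s_yd = sum(v * d for v, d in zip(ys, ds))
    det = m * s_yy - s_y * s_y
    a = (s_y * s_d - m * s_yd) / det
    b = (s_yy * s_d - s_y * s_yd) / det
    return a, b


def fgm_predict(x0: Sequence[float], r: float, horizon: int = 0) -> List[float]:
    """Restored fitted values for the sample, followed by `horizon` forecasts."""
    a, b = fgm_fit(x0, r)  # raises ZeroDivisionError below if a == 0
    steps = len(x0) + horizon
    # Time response: xr_hat(k+1) = (x0(1) - b/a) e^{-a k} + b/a, so xr_hat(1) = x0(1)
    xr_hat = [(x0[0] - b / a) * math.exp(-a * k) + b / a for k in range(steps)]
    # Restoration: the (-r)-order accumulation undoes the r-order accumulation
    return frac_accumulate(xr_hat, -r)
```

With r = 1 the sketch reduces to the classical GM(1,1); for a geometric series such as `[10.0 * 1.05 ** k for k in range(10)]` its fitted values stay within a fraction of a percent of the data.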

Grey Model Based on Genetic Algorithm Optimization
Determining the optimal order with the least squares method involves cumbersome computations and can get stuck in local optima, leading to unsatisfactory results. Instead, the optimal order of the fractional-order grey accumulation model can be searched intelligently with a genetic algorithm. The algorithm has strong search capability, and its mutation mechanism helps it avoid falling into local optima. It introduces probabilistic natural selection, selects individuals at random, and combines easily with other algorithms to approximate the optimal solution of the function or a good local optimum. Because it works with a set of candidate solutions rather than a single candidate, crossover and mutation make new candidates differ from previous ones; as long as population diversity is maintained and premature convergence is avoided, a globally optimal solution may emerge. First, candidate solutions are encoded as individuals in the population, and a fitness function is constructed [25]. Until the maximum number of iterations is reached, selection, crossover, and mutation are repeated to retain individuals with good fitness, and these operations are repeated on the subpopulation until the optimal solution is found [26]. Figure 3 shows a flowchart of using the genetic algorithm to optimize the grey model.
First, the parameters, including the population size, crossover rate, mutation rate, number of generations, and degree-of-variation factor, are initialized. After the initial population is generated, the grey model described above is used to calculate fitness: decoding an individual's coding string yields its phenotype, from which the value of the objective function for that individual can be calculated. Depending on the type of optimization problem, the individual's fitness is derived from the objective function value by a transformation rule, and the ultimate purpose is to determine the optimal order.
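A compact sketch of the genetic-algorithm search for the accumulation order r. Several choices here are assumptions not stated in the text: real-valued encoding (so decoding is the identity), the search interval [0.05, 1.0], binary tournament selection, arithmetic crossover, Gaussian mutation, elitism, and the mean absolute percentage error (MAPE) of the fitted values as the objective to minimize. The grey-model helpers are repeated so the block runs on its own; all names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def frac_accumulate(x: Sequence[float], r: float) -> List[float]:
    """r-order accumulation via c_j = c_{j-1} (r + j - 1) / j; r < 0 gives the inverse."""
    c = [1.0]
    for j in range(1, len(x)):
        c.append(c[-1] * (r + j - 1) / j)
    return [sum(c[k - i] * x[i] for i in range(k + 1)) for k in range(len(x))]


def fitted_values(x0: Sequence[float], r: float) -> List[float]:
    """In-sample fitted values of the r-order grey model GM_r(1,1)."""
    xr = frac_accumulate(x0, r)
    ys = [0.5 * (xr[k] + xr[k - 1]) for k in range(1, len(xr))]
    ds = [xr[k] - xr[k - 1] for k in range(1, len(xr))]
    m, s_y, s_d = len(ys), sum(ys), sum(ds)
    s_yy = sum(v * v for v in ys)
    s_yd = sum(v * d for v, d in zip(ys, ds))
    det = m * s_yy - s_y * s_y
    a = (s_y * s_d - m * s_yd) / det
    b = (s_yy * s_d - s_y * s_yd) / det
    xr_hat = [(x0[0] - b / a) * math.exp(-a * k) + b / a for k in range(len(x0))]
    return frac_accumulate(xr_hat, -r)


def mape(x0: Sequence[float], r: float) -> float:
    """Objective to minimise: mean absolute percentage error (x0 must be non-zero)."""
    try:
        fit = fitted_values(x0, r)
    except (ZeroDivisionError, OverflowError):
        return math.inf
    err = sum(abs(f - x) / abs(x) for f, x in zip(fit, x0)) / len(x0)
    return err if math.isfinite(err) else math.inf


def tournament(rng: random.Random, pop: List[float], scores: List[float]) -> float:
    """Binary tournament selection: the fitter of two random individuals wins."""
    i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
    return pop[i] if scores[i] <= scores[j] else pop[j]


def ga_optimal_order(x0: Sequence[float], lo: float = 0.05, hi: float = 1.0,
                     pop_size: int = 30, generations: int = 60,
                     crossover_rate: float = 0.8, mutation_rate: float = 0.2,
                     mutation_scale: float = 0.1, seed: int = 0) -> Tuple[float, float]:
    """Search the accumulation order r in [lo, hi] that minimises the fitting MAPE."""
    rng = random.Random(seed)
    # Each individual is a candidate order r; r = 1 (classical GM(1,1)) is seeded in
    pop = [min(hi, max(lo, 1.0))] + [rng.uniform(lo, hi) for _ in range(pop_size - 1)]
    for _ in range(generations):
        scores = [mape(x0, r) for r in pop]
        best = pop[min(range(pop_size), key=scores.__getitem__)]
        children = [best]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1, p2 = tournament(rng, pop, scores), tournament(rng, pop, scores)
            child = p1
            if rng.random() < crossover_rate:  # arithmetic crossover
                w = rng.random()
                child = w * p1 + (1 - w) * p2
            if rng.random() < mutation_rate:  # Gaussian mutation
                child += rng.gauss(0.0, mutation_scale)
            children.append(min(hi, max(lo, child)))
        pop = children
    best = min(pop, key=lambda r: mape(x0, r))
    return best, mape(x0, best)
```

Because r = 1 is in the initial population and the elite is always kept, the returned order fits the training data at least as well as the classical integer-order GM(1,1); a fixed seed makes runs reproducible.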

Data Training
Taking the monthly accident data from 2009 to 2019 as an example, the accident number was first used directly as the original time series, without trend decomposition; this series is non-stationary. In the augmented Dickey–Fuller (ADF) test, a smaller p-value gives stronger evidence that the series is stationary, and the test statistic t should be less than the critical value. EViews 10 was used to perform the time series tests. As shown in Table 2, the p-value after first-order differencing is much smaller than after second-order differencing, and the gap between the test statistic and the critical value is larger, so first-order differencing was applied to the data. Because an ARIMA (p, 1, q) model is equivalent to an ARMA (p, q) model fitted to the first-differenced series, the ARMA model was applied to the differenced series to predict the accidents in 2019, and the differencing parameter d of the ARIMA model was set to 1.

Table 2. Comparison of first-order versus second-order differencing results.
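The differencing decision can be sketched without EViews. This hypothetical helper computes the ADF t-statistic by ordinary least squares (a constant plus a fixed number of lagged differences) and compares it with the asymptotic 5% critical value of about −2.86 for the constant-only case. EViews reports MacKinnon p-values and can select lags automatically, so the numbers will not match exactly. The paper chose d = 1 by comparing the first- and second-order results; returning the smallest d that rejects a unit root is an assumed rule of thumb.

```python
import numpy as np


def adf_tstat(y, lags=1):
    """ADF t-statistic from a regression with a constant and `lags` lagged differences."""
    y = np.asarray(y, float)
    dy = np.diff(y)
    n = len(dy) - lags
    cols = [np.ones(n), y[lags:-1]]                     # constant and lagged level
    cols += [dy[lags - i:len(dy) - i] for i in range(1, lags + 1)]
    X = np.column_stack(cols)
    target = dy[lags:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se                                 # t-ratio on the lagged level


def integration_order(y, max_d=2, crit=-2.86, lags=1):
    """Smallest d whose d-times differenced series rejects a unit root at about 5%."""
    for d in range(max_d + 1):
        if adf_tstat(np.diff(y, n=d), lags) < crit:
            return d
    return None
```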

Figure 4 shows the autocorrelation and partial autocorrelation plots of the differenced series, where AC and PAC denote the autocorrelation and partial autocorrelation coefficients, respectively, and the lag order is listed vertically. As the figure shows, the series is significant at lag 1, drops sharply and becomes insignificant from lag 2, becomes significant again around lag 12, and then remains at a low level of significance. Therefore, p and q were both initially set to 12, giving an ARMA (12, 12) process. The values of p and q were then reduced in turn, preferring the combination with the smallest AIC, which yielded the optimal combination ARMA (11, 12).

Figure 4. Autocorrelation plots and partial autocorrelation plots.
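The AIC-guided narrowing of p and q can be approximated by an exhaustive grid search. Exact maximum-likelihood ARMA estimation needs a dedicated library, so this sketch substitutes the two-stage Hannan–Rissanen regression: a long autoregression supplies proxies for the unobserved shocks, and OLS on lagged values and lagged shock proxies then estimates the ARMA (p, q) model. The long-AR order and the function names are assumptions, and the AIC values will differ from those reported by EViews. All candidates share one estimation sample so that their AIC values are comparable.

```python
import itertools

import numpy as np


def lagged(x, lags, start):
    """Columns x[t-1], ..., x[t-lags] for t = start, ..., len(x) - 1."""
    cols = [x[start - i:len(x) - i] for i in range(1, lags + 1)]
    return np.column_stack(cols) if cols else np.empty((len(x) - start, 0))


def arma_aic(y, p, q, long_ar, start):
    """AIC of an ARMA(p, q) fitted by the two-stage Hannan-Rissanen regression."""
    y = np.asarray(y, float) - np.mean(y)
    # Stage 1: a long AR fitted by OLS gives residuals that proxy the unobserved shocks
    X1 = lagged(y, long_ar, long_ar)
    phi, *_ = np.linalg.lstsq(X1, y[long_ar:], rcond=None)
    eps = np.zeros_like(y)
    eps[long_ar:] = y[long_ar:] - X1 @ phi
    # Stage 2: regress y_t on its own lags and on the lagged shock proxies
    X2 = np.column_stack([lagged(y, p, start), lagged(eps, q, start)])
    target = y[start:]
    resid = target
    if X2.shape[1]:
        beta, *_ = np.linalg.lstsq(X2, target, rcond=None)
        resid = target - X2 @ beta
    n = len(target)
    return n * np.log(resid @ resid / n) + 2 * (p + q)


def select_arma_order(y, max_p=12, max_q=12, long_ar=20):
    """Return the (p, q) pair with the smallest AIC, plus all scores."""
    start = long_ar + max(max_p, max_q)                 # common estimation sample
    scores = {(p, q): arma_aic(y, p, q, long_ar, start)
              for p, q in itertools.product(range(max_p + 1), range(max_q + 1))}
    return min(scores, key=scores.get), scores
```

Applied to the first-differenced monthly series, `select_arma_order(np.diff(monthly))` would search the same 0–12 range that the text starts from.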
After the model is established, dynamic prediction [27] makes multi-step-ahead forecasts over the estimation interval, feeding earlier forecasts back in as lagged values, whereas static prediction [28] makes one-step-ahead forecasts, using the actual values instead of the predicted values for the lagged terms at each step. The model was trained on the monthly accident data from 2009 to 2019, and the number of accidents in each month of 2019 was predicted. Prediction performance is judged by the Theil inequality coefficient: the closer it is to 0, the closer the forecasts are to the real data and the better the predictive ability. A smaller variance proportion in its decomposition further indicates that the model better reproduces the fluctuations of the actual series. The Theil coefficient of the dynamic prediction is 0.36, indicating good predictive performance. The static predicted values are more volatile, but their Theil coefficient of 0.20 indicates a better simulation of the actual fluctuations. As shown in Figure 5, the relative errors of the two methods for the 2019 accident prediction are 3.10% and 8.28%, their correlation coefficients with the original data are 0.943 and 0.879, and the degree of fit R² exceeds 0.8.
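To make the two forecasting modes and the evaluation criterion concrete, the sketch below uses a pure AR(p) model with known coefficients (a simplification of the ARMA models in the text) and computes the Theil inequality coefficient together with its standard bias, variance, and covariance proportions, which sum to one. The function names are hypothetical.

```python
import numpy as np


def ar_forecasts(y, const, phi, start):
    """Static and dynamic forecasts of an AR(p) model for t = start, ..., len(y) - 1 (start >= p)."""
    y = np.asarray(y, float)
    p = len(phi)
    static, path = [], list(y[:start])                  # `path`: actual data, then dynamic forecasts
    for t in range(start, len(y)):
        # Static: one step ahead, always using the actual lagged values
        static.append(const + sum(phi[i] * y[t - 1 - i] for i in range(p)))
        # Dynamic: multi-step, feeding earlier forecasts back in as lagged values
        path.append(const + sum(phi[i] * path[t - 1 - i] for i in range(p)))
    return np.array(static), np.array(path[start:])


def theil(forecast, actual):
    """Theil inequality coefficient U and its bias, variance, and covariance proportions."""
    f, a = np.asarray(forecast, float), np.asarray(actual, float)
    mse = np.mean((f - a) ** 2)
    u = np.sqrt(mse) / (np.sqrt(np.mean(f ** 2)) + np.sqrt(np.mean(a ** 2)))
    sf, sa = f.std(), a.std()                           # population standard deviations
    r = np.corrcoef(f, a)[0, 1]
    bias = (f.mean() - a.mean()) ** 2 / mse
    variance = (sf - sa) ** 2 / mse
    covariance = 2 * (1 - r) * sf * sa / mse
    return u, bias, variance, covariance
```

With y = [8, 0, 0, 0], φ = 0.5 and start = 1, the static forecasts reset to the observed zeros (4, 0, 0), while the dynamic forecasts keep decaying from their own output (4, 2, 1).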

Monthly Forecast
The number of accidents in January and February is the lowest of the year because the Spring Festival in China mainly falls in these two months, and this holiday effect increases the overall fluctuation of the time series. Because the date of the Spring Festival varies from year to year and the number of accidents in January and February is generally small, these two months were extracted and forecast separately, without further considering how the Spring Festival's impact is distributed within each year. Following the same procedure as in the data training, the optimal combination ARMA (10, 10) was found. For the remaining 10 months of 2019, the relative errors of the two predictions are 4.6% and 15.9%, their correlation coefficients with the true values are 0.824 and 0.862, and their degrees of fit are 0.738 and 0.591. Although the degree of fit and the correlation coefficients decreased, overfitting was avoided to a certain extent.
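The separation of the Spring Festival months amounts to a reshaping step, sketched below with hypothetical helper names. It assumes the monthly series covers whole calendar years starting in January: January and February become two yearly series for the grey model, the other ten months are concatenated in time order for the ARIMA model, and the separate forecasts are then reassembled into a 12-month year.

```python
import numpy as np

HOLIDAY_MONTHS = (0, 1)  # January and February, zero-based


def split_holiday_months(monthly, holiday=HOLIDAY_MONTHS):
    """Split whole years of monthly data into one yearly series per holiday month
    and a single time-ordered series of the remaining months."""
    table = np.asarray(monthly, float).reshape(-1, 12)  # years x months
    others = [m for m in range(12) if m not in holiday]
    return {m: table[:, m] for m in holiday}, table[:, others].ravel()


def assemble_year(holiday_forecasts, other_forecasts, holiday=HOLIDAY_MONTHS):
    """Rebuild a 12-month forecast from the holiday-month and remaining-month forecasts."""
    year = np.empty(12)
    year[list(holiday)] = [holiday_forecasts[m] for m in holiday]
    year[[m for m in range(12) if m not in holiday]] = other_forecasts
    return year
```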
Using this optimal combination, an ARIMA (10, 1, 10) process was set up to predict the number of accidents from March to December, and the numbers for January and February were predicted with the fractional-order accumulation grey model. The relative error between the predicted and actual numbers of accidents in 2019 fell from 3.23% to 2.33%, indicating that this approach improves prediction accuracy. Table 3 shows the forecasts for 2019 and 2020 before and after the model revision. The same procedure was then repeated after first performing trend decomposition on the original time series and treating the trend component as a new time series. With trend decomposition, the best prediction process uses an ARIMA (4, 2, 12) process to predict the long-term trend, then adds the seasonal component and the remainder predicted by an ARIMA (6, 0, 6) model. Table 4 shows the forecast results after trend decomposition. The relative error between the predicted and true monthly accident numbers in 2019 decreases from 3.23% to 2.46%, while the correlation coefficient increases from 0.861 to 0.932 and the degree of fit is 0.868, showing that the forecast trend is closer to the real situation. The model therefore reflects the monthly development trend of accidents effectively and can guide monthly safety rectification in the construction industry. To prevent accidents, relevant departments should focus on months with large changes in the data and strengthen safety management during those periods. In particular, although the workload in some months falls because of holidays, and accidents fall accordingly, this is often accompanied by relaxed safety awareness or a lack of supervision.
This is also a key reason for the large variation, and it is consistent with the prediction results, which supports the validity of the prediction. Because of the impact of the COVID-19 pandemic, the actual data for 2020 have little reference value, so the 2020 predictions are not compared with actual values.
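The text does not state which trend decomposition method was used; its trend/seasonal/remainder terminology matches STL. As a simpler stand-in, the sketch below performs a classical additive decomposition with a centered 2×12 moving average. To forecast, the trend and the remainder would each be extrapolated by their own ARIMA model (ARIMA (4, 2, 12) and ARIMA (6, 0, 6) above) and the seasonal pattern added back. The function name is hypothetical.

```python
import numpy as np


def classical_decompose(y, period=12):
    """Additive decomposition y = trend + seasonal + remainder with a centered moving average.
    The trend (and hence the remainder) is NaN for the first and last period/2 points."""
    if period % 2:
        raise ValueError("this sketch handles even periods only")
    y = np.asarray(y, float)
    n, half = len(y), period // 2
    weights = np.r_[0.5, np.ones(period - 1), 0.5] / period   # 2 x period centered MA
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(y, weights, mode="valid")
    detrended = y - trend
    # Average detrended value at each position in the cycle, centered to sum to zero
    pattern = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    pattern -= pattern.mean()
    seasonal = np.resize(pattern, n)
    return trend, seasonal, y - trend - seasonal
```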

Quarterly Forecast
With the quarterly accident data as the time series, the series is more stationary after second-order differencing, so an ARMA (7, 3) model was established; the prediction results are shown in Figure 6. The relative error is 3.49% for the dynamic prediction and 4.01% for the static prediction. The dynamic error is close to that of the monthly forecast, while the static error is roughly halved. The reason may be that quarterly data are accumulated from monthly data, which to some extent "shaves peaks and fills valleys", so quarterly data are less volatile than monthly data, which helps improve forecast accuracy.
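The "peak shaving and valley filling" effect of aggregation can be illustrated by summing months into quarters and comparing the coefficient of variation before and after. The function names are hypothetical, and the test data are contrived so that swings inside each quarter cancel completely; real data are only partly smoothed.

```python
import numpy as np


def monthly_to_quarterly(monthly):
    """Sum consecutive triples of months (series must start in January and cover whole quarters)."""
    return np.asarray(monthly, float).reshape(-1, 3).sum(axis=1)


def coefficient_of_variation(x):
    """Scale-free volatility measure: standard deviation divided by the mean."""
    x = np.asarray(x, float)
    return x.std() / x.mean()
```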
After trend decomposition, an ARIMA (4, 2, 3) process was set up to predict the trend component and an ARIMA (2, 0, 1) process to predict the remainder. As shown in Table 5, the correlation coefficient between the true and predicted quarterly accident numbers is 0.957, the degree of fit is 0.915, and the relative error is only 0.39%, indicating that the summed quarterly forecasts are better than the monthly forecasts. This also reflects the need for quarterly rectification: because a month is short and environmental conditions and festivals differ greatly from month to month, monthly accident numbers contain a large random element. Dividing the data by quarter strengthens the regularity of the predictions and reduces the prediction error, so the results have better guiding value.
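The evaluation metrics used throughout can be written down explicitly. The text does not define "degree of fit"; this sketch assumes it is the squared Pearson correlation, which closely matches several reported pairs (0.906² ≈ 0.821, 0.994² ≈ 0.988, 0.957² ≈ 0.916 versus 0.915 reported) but not every pair in the text, so treat it as an assumption. The Table 5 totals reproduce the reported relative error: |770 − 773| / 773 ≈ 0.39%.

```python
import numpy as np


def relative_error(predicted_total, actual_total):
    """Absolute relative error of a predicted total against the actual total."""
    return abs(predicted_total - actual_total) / actual_total


def pearson_r(predicted, actual):
    return np.corrcoef(predicted, actual)[0, 1]


def degree_of_fit(predicted, actual):
    """Assumption: squared Pearson correlation (the text does not give a definition)."""
    return pearson_r(predicted, actual) ** 2
```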
At the end of each quarter, the accident situation of that quarter should be analyzed promptly, experience and lessons summarized, safety education and supervision strengthened for subsequent work, and safety continuously improved.
Table 5. Quarterly forecast and summed results of housing municipal engineering production safety accidents after trend decomposition.
| Year | Trend | Seasonal | Remainder | Predicted | Actual | Relative Error |
|---|---|---|---|---|---|---|
| 2019 | 786 | 0 | −16 | 770 | 773 | 0.39% |
| 2020 | 870 | 0 | 8 | 878 | -- | -- |

Annual Forecast
For the annual data, because the sample is small and fluctuates widely, the grey forecast gave a small error but did not fully reflect the real accident trend, so its result may be coincidental. Instead, rolling forecasts [29] were made on quarterly data: starting from 2009, forecasts were made with two-year, three-year, and four-year windows, and the procedure was then repeated starting from 2010, and so on. After each step, the updated results were used to re-forecast, and the forecast quarterly values were summed to produce the hierarchical annual forecasts shown in Figure 7.

The correlation between each prediction path's results and the real data was calculated, and these correlations were normalized to obtain weights for the predicted values of the different paths, yielding the revised annual accident predictions. The correlation coefficient between the rolling predictions and the true values is 0.963; the degree of fit of a single grey forecast to the raw values is 0.792, whereas that of the rolling forecast is as high as 0.928. Figure 8 also shows that the rolling forecast tracks the accident trend more accurately than the single grey forecast and better reflects the actual occurrence of accidents.
This approach of continuous forecasting with constantly updated recent data resembles short-term forecasting and is particularly suitable when only limited annual data have accumulated. At the same time, it compensates for the drawback that short-term forecasts are strongly affected by nearby values and by the overall trend. The combined model avoids the weakness of a single grey prediction, which is unsuitable for analyzing and predicting targets with large fluctuations, and supports the validity of the prediction results.
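The rolling, correlation-weighted annual forecast can be sketched under stated assumptions: each "path" is a rolling window of two, three, or four years of quarterly data; a classic GM(1,1) forecasts the next four quarters, which are summed into an annual value; and the paths are then weighted by their Pearson correlation with the observed annual counts, normalized to sum to one. The window scheme, the common first target year, the zero weight for negatively correlated paths, and all function names are hypothetical interpretations of the text.

```python
import numpy as np


def gm11_forecast(x, horizon):
    """Classic GM(1,1): fit on x and return the next `horizon` values."""
    x = np.asarray(x, float)
    x1 = np.cumsum(x)
    z = 0.5 * (x1[1:] + x1[:-1])                        # background values
    B = np.column_stack([-z, np.ones_like(z)])
    (a, b), *_ = np.linalg.lstsq(B, x[1:], rcond=None)
    k = np.arange(len(x) + horizon)
    x1_hat = (x[0] - b / a) * np.exp(-a * k) + b / a
    return np.diff(x1_hat)[len(x) - 1:]                 # restored values beyond the sample


def rolling_path(quarterly, window_years, first_target, n_years):
    """Annual forecasts for 0-based years first_target .. n_years-1, each using the
    preceding `window_years` years of quarterly data."""
    q = np.asarray(quarterly, float)
    out = []
    for year in range(first_target, n_years):
        window = q[4 * (year - window_years):4 * year]
        out.append(gm11_forecast(window, 4).sum())      # sum four quarters -> annual value
    return np.array(out)


def correlation_weighted(paths, actual):
    """Weight each path by its Pearson correlation with the actual values (weights sum to 1)."""
    paths = np.asarray(paths, float)
    r = np.array([np.corrcoef(p, actual)[0, 1] for p in paths])
    r = np.clip(r, 0, None)                             # assumption: negative correlation -> zero weight
    w = r / r.sum()
    return w, w @ paths
```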

Death Toll Prediction
The above results show that decomposing the time series into its trend components before predicting gives higher accuracy than forecasting directly, so only prediction after trend decomposition was explored for the number of deaths. Using ARIMA (4, 2, 11) and ARIMA (4, 0, 4) processes, the monthly forecasts of construction deaths in 2019 and 2020 are shown in Table 6. The correlation coefficient between the true and predicted monthly deaths was 0.906, the degree of fit was 0.821, and the relative error was 0.33%. Thus, although the relative error of the 2019 total was extremely small when deaths were predicted as a single time series, the death-toll prediction reflects the real situation less well than the accident-number prediction. Analysis of the death data themselves shows that, unlike the number of accidents, the number of deaths is strongly influenced by individual severe accidents, so accurate predictions cannot be made from the time series alone. Therefore, this paper further adjusted the prediction results to account for larger and above accidents. For example, one accident with 9 deaths and three accidents with 3 deaths each contribute equally to the death count, but in reality the accident with 9 deaths has a far greater impact, and the warning role each plays in later construction differs [30]. Therefore, the larger and above accidents in 2010–2019, for which detailed monthly records are available, were rescaled to highlight their impact on the development trend of construction and production safety accidents.
The rescaled quantities are defined as follows. X: the impact of the number of larger and above accidents in each month of each year. Y: the impact of the number of deaths in larger and above accidents in each month of each year. m_i: the number of deaths in each larger or above accident in a month. The CRITIC weighting method [31], an objective weighting method that quantifies the variability of each indicator and its correlation with the others, is applied to the rescaled data. The larger an indicator's standard deviation, the greater its fluctuation and the higher the weight assigned; the smaller its correlations with the other indicators, the greater the conflict and the higher the weight assigned [32]. The two measures are multiplied to obtain each indicator's information content, which is normalized to give the final weight. The specific steps are as follows. First, of the six pairwise correlation combinations among the four elements (the total number of accidents, the total number of deaths, the number of larger and above accidents, and the death toll of larger and above accidents), the four that did not reach a strong correlation were selected: Combination A, the number of larger and above accidents versus the total number of accidents; Combination B, the death toll of larger and above accidents versus the total number of accidents; Combination C, the number of larger and above accidents versus the total number of deaths; and Combination D, the death toll of larger and above accidents versus the total number of deaths.
Second, after the relevant data were corrected, the new data were lagged to generate new correlations, and forward and reverse normalization were applied so that all indicator values lie in [0, 1]. For a positive indicator, a larger value is better; for a negative indicator, a larger value is worse. The indicators are as follows. M1: the initial correlation coefficient of the two elements in each combination. M2: the correlation coefficient of the two elements after the impact of larger and above accidents is lagged by one month. M3: the correlation coefficient of the two elements after the impact of larger and above accidents is lagged by two months. M4: the corrected correlation coefficient of the two elements in each combination. M5: the corrected correlation coefficient after the impact of larger and above accidents is lagged by one month. M6: the corrected correlation coefficient after the impact of larger and above accidents is lagged by two months.
Of these, M1 and M4 are positive (forward) indicators, and the others are negative (reverse) indicators.
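The normalization and CRITIC steps can be sketched as follows. Each indicator is min-max normalized forward or in reverse; its contrast intensity is its standard deviation; its conflict is the sum of (1 − r) over its correlations with all indicators; and the normalized product of the two gives the weights. The row layout of the input matrix (samples by indicators) and the function names are assumptions; the paper's exact arrangement of Tables 7 and 8 is not reproduced, and a constant column would make the normalization divide by zero.

```python
import numpy as np


def minmax(column, positive=True):
    """Forward (positive) or reverse (negative) min-max normalization onto [0, 1]."""
    c = np.asarray(column, float)
    span = c.max() - c.min()
    return (c - c.min()) / span if positive else (c.max() - c) / span


def critic_weights(X, positive):
    """CRITIC weights for an (n_samples x n_indicators) matrix.
    positive[j] is True for a forward indicator and False for a reverse one."""
    X = np.asarray(X, float)
    Z = np.column_stack([minmax(X[:, j], positive[j]) for j in range(X.shape[1])])
    sigma = Z.std(axis=0, ddof=1)                       # contrast intensity of each indicator
    R = np.corrcoef(Z, rowvar=False)                    # correlations between indicators
    conflict = (1 - R).sum(axis=0)                      # weak correlation -> high conflict
    info = sigma * conflict                             # information content
    return info / info.sum()
```

For M1 to M6, `positive` would be `[True, False, False, True, False, False]`, since only M1 and M4 are forward indicators.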
Taking Combination A as an example, the value of each of its indicators is shown in Table 7. Third, CRITIC weights are computed from these values, and the case with the largest weight is selected. Table 8 shows that when the impact of larger and above accidents is corrected and lagged by one month, its influence on the death-toll forecast accounts for the largest proportion. Accordingly, 25% of the monthly changes in the number and death toll of larger and above accidents up to 2020 was added to the remainder of the original trend decomposition. Finally, the occurrence of larger and above accidents can serve as a warning to the construction field and strengthen education and management, but it can also increase psychological pressure and aggravate equipment aging, among other effects. In prediction terms, this appears as two possible situations, a decrease or an increase in accident deaths, so the revised death-toll prediction is expressed as an interval.
The calculation shows that the correlation coefficient between the predicted and raw monthly deaths in 2010–2019 is 0.904 without correction and rises to 0.994 after correction, while the degree of fit increases from 0.817 to 0.988, indicating that the revised forecasts reflect the real situation. Table 9 shows that the relative error of the predicted range of deaths in 2019 lies between −8.05% and 7.41%. Given the strong correlation and fit between the predicted and true values, the revised data have more reference value than the original forecasts, so this revision method can be used to predict deaths in national housing municipal engineering production safety accidents in 2020. Because the changes and impacts of larger and above accidents are taken into account when predicting fatalities, the final results can reflect the differing focus of safety supervision in the industry. Focusing only on larger and above accidents may improve the safety situation from a single perspective but cannot achieve comprehensive improvement. At present, the control of serious construction accidents in China has achieved certain results, but the control of frequent accidents has been relaxed, which will cause the total number of accidents to trend upward; this is consistent with the predicted results. Therefore, current construction safety management should maintain its control of more harmful accidents while strengthening the management of more frequent general accidents.

Conclusions
In this paper, production safety accidents in housing municipal engineering were analyzed. Because the long-term trends of the annual, quarterly, and monthly numbers of accidents and deaths remain highly correlated, the time series at these three granularities were predicted separately. Predicting the same object at different data granularities yields different accuracies, and the combined model was adjusted for the time series in each case to improve accuracy step by step. The conclusions are as follows: (1) Monthly and quarterly accident data from 2009 to 2019 were used to train ARMA models, dynamic and static forecasts for 2019 were compared, and the optimal parameters of the ARIMA model were then determined. The monthly forecast must also account for the Spring Festival, so a fractional-order accumulation grey model optimized by a genetic algorithm was constructed to predict accidents in January and February separately, reducing the relative error of the 2019 accident prediction from 3.23% to 2.33%. The time series was then decomposed, and the trend component and remainder were predicted separately. The Pearson correlation coefficients between the raw values and the summed monthly and quarterly predictions exceed 0.9, indicating strong correlation, and using quarterly data reduces the relative error of the forecast to 0.39%. This confirms the model's effectiveness for monthly and quarterly prediction and its ability to reflect the development trend of accidents. At the monthly level, attention should focus on periods with large changes, and safety management should be strengthened during those periods.
At the quarterly level, the accident situation of the current quarter should be analyzed promptly and lessons summarized at the end of each quarter.
(2) When forecasting annual accident numbers, the small amount of sample data and the lack of regularity make the ARIMA model unsuitable. Instead, rolling forecasts with two-year, three-year, and four-year windows starting from 2009 were carried out, and the final forecasts for 2009–2019 were revised by weighting the results of the different forecast paths. The correlation coefficient between the predicted and raw values increased from 0.890 to 0.963, and the degree of fit increased from 0.792 to 0.928. This method compensates for the fact that the relative error is affected by the magnitude of the raw values themselves, and that grey-model predictions of accidents tend toward the average and do not reflect the actual situation well, so the predictions are closer to reality. The combined model avoids the weakness of a single grey prediction, which is unsuitable for analyzing and predicting targets with large fluctuations, and supports the validity of the prediction results.
(3) Analysis of larger and above accidents nationwide showed that if their relationship with the number of accidents is not considered when predicting deaths, the results do not reflect the real development trend of construction safety accidents. Their impact was therefore highlighted and quantified to distinguish the strength of the warning effect that larger and above accidents in different periods have on production and daily life, on the personnel involved, and on management. Because larger and above accidents affect different regions of the country differently, the corrected residual values were added to the forecasts using positive and negative extreme values. The correlation coefficient between the predicted and raw deaths in 2019 is 0.994 and the degree of fit is 0.988, showing that the two are highly correlated and closely fitted, and a prediction interval for deaths in construction and production safety accidents in 2020 was obtained. Relaxed control of frequent accidents will cause the total number of accidents to trend upward, which is consistent with the predicted results. Current construction safety management should therefore maintain its control of more harmful accidents while strengthening the management of more frequent general accidents.
In summary, the data preprocessing method and the combined prediction model constructed in this study can predict construction safety production accidents in China well. Although the COVID-19 epidemic has significantly affected China's construction industry since 2020, the data are analyzed and processed as time series and the combined model has a wide range of applications, so the proposed prediction method may also be applicable in other countries and fields.
There are many studies on accident prediction, especially in the construction field, but they are often based on relatively stable accident trends, and this is also the limitation of this study. The impact of the COVID-19 epidemic is unprecedented: in the short term its negative impact is large, while in the long run the epidemic will strongly promote the development of the construction industry. Over the past three years, the epidemic has affected all sectors of the country; its effect on safety directly affects the construction field, and its effect on the economy indirectly changes the safety situation in construction.
Therefore, future research on building safety can further integrate political, economic, and global development factors, combining various indicators into a semi-open system that is continuously supplemented with forecasting-related indicators. For prediction, a more ideal and universal model can be constructed that incorporates elements from multiple fields, in order to achieve breakthroughs in accident prevention and make the research more practical. Further study in this direction is under way and will be reported in our future work.