Dynamic Warning Method for Structural Health Monitoring Data Based on ARIMA: Case Study of Hong Kong–Zhuhai–Macao Bridge Immersed Tunnel

Structural health monitoring (SHM) is gradually replacing traditional manual detection and is becoming a focus of the research devoted to the operation and maintenance of tunnel structures. However, in the face of massive SHM data, the autonomous early warning method is still required to further reduce the burden of manual analysis. Thus, this study proposed a dynamic warning method for SHM data based on ARIMA and applied it to the concrete strain data of the Hong Kong–Zhuhai–Macao Bridge (HZMB) immersed tunnel. First, wavelet threshold denoising was applied to filter noise from the SHM data. Then, the feasibility and accuracy of establishing an ARIMA model were verified, and it was adopted to predict future time series of SHM data. After that, an anomaly detection scheme was proposed based on the dynamic model and dynamic threshold value, which set the confidence interval of detected anomalies based on the statistical characteristics of the historical series. Finally, a hierarchical warning system was defined to classify anomalies according to their detection threshold and enable hierarchical treatments. The illustrative example of the HZMB immersed tunnel verified that a three-level (5.5 σ, 6.5 σ, and 7.5 σ) dynamic warning schematic can give good results of anomalies detection and greatly improves the efficiency of SHM data management of the tunnel.


Introduction
The deterioration of an immersed tunnel is a natural process with the increase in its service time. This may result in extensive deterioration and may affect the tunnel's serviceability [1][2][3][4][5][6][7][8]. Real-time SHM of the immersed tunnel is an effective way to solve this problem. It can monitor the dynamic changes of a tunnel's structure by embedding all kinds of monitoring equipment in the tunnel structure during the construction process. Further, the SHM system help to show dynamic responses of infrastructure whenever they are under heavy traffic load together with the obtained parameters depending on the traffic conditions [9]. The collected data from the equipment can then be analyzed and guide refined tunnel operation and management [10][11][12][13][14][15][16].
However, for many years, tunnel operators relied on their judgment or simple fixed alarm threshold to monitor structural data and determine maintenance measures. If this threshold is set too high, it is difficult to carry out an effective warning as the system will not set the alarm bell even if the data behave abnormally. On the other hand, if the threshold is too low, the alarm frequency will be very high, which may lead to resource mismatch. Clearly, the fixed threshold method oversimplifies the issue and results in the low reliability of anomaly detection. Moreover, it does not take into account the aging of the tunnel structure and does not make full use of the massive monitoring data.
As the digital transformation in infrastructure is accelerating, the data-driven anomaly detection method has become a research hotspot. Among them, the regression model based on ARIMA is widely used [17]. The idea is to have the model capture the normal behavior of the time series, whereas a significant deviation from this model is supposed to be an outlier.
ARIMA is a classic model for time series forecasting in the engineering field. Huang et al. [18] concluded the ARIMA model is capable of predicting the evolution of tunnel deformational performance with low computational cost and satisfying accuracy in the short term. Bian et al. [19] designed a forecasting model on the strain monitoring data of the SHM system of a cable-stayed bridge by combining the empirical mode decomposition method with the ARIMA and showed that the proposed method performs better in all statistical indicators. Liu et al. [20] built an ARIMA model to predict the concrete damage failure of the service tunnels under sulfate erosion. It can be seen that the ARIMA model has been widely applied in predicting the evolution of tunnels and bridges, and it can be combined with other models to achieve a better prediction effect [21][22][23][24][25].
Some researchers in other fields further adapted the ARIMA method to detect anomalies and issue warnings. Li et al. [26] combined the ARIMA model and the transferable belief model to realize the early warning of coal and gas outbursts in coal mining production. Zeng et al. [27] proposed an improved ARIMA model to detect the outliers based on dynamic variance. By selecting an appropriate and adjustable threshold, the algorithm successfully reduced the warning error of optical fiber sensors caused by the normal fluctuations.
Nowadays, although the government has attached great importance to tunnel maintenance and digital transformation, the research on anomaly detection of the immersed tunnel based on mass monitoring data is still insufficient. This study has aimed to propose a more intelligent and reliable warning method for immersed tunnel structures, which is of great significance to improve the safety of the structure, ensure the safety of the operation, reduce the maintenance cost, forecast disasters, and improve the overall service level. The innovation of this study lies in the design of a dynamic warning method for SHM data based on ARIMA prediction so as to timely and accurately detect anomalies. The model is dynamic, which means it can be automatically updated to capture the sequence trend in real-time. At the same time, the threshold is set to be dynamic so that the local characteristics of the sequence are considered in the process of judging outliers.
The paper is organized as follows: Section 2 will introduce the basic information about the investigated project-the Hong Kong-Zhuhai-Macao Bridge immersed tunnel and its SHM system. After that, the denoising method and the classification of anomalies will be introduced. Section 3 will discuss the methodology, which includes the static ARIMA model, the dynamic ARIMA model, and the anomaly detection mechanism, and apply this method to concrete strain monitoring data to verify the effectiveness of the proposed method. Section 4 gives the concluding remarks.

Outlines of the HZMB Immersed Tunnel
The Hong Kong-Zhuhai-Macao Bridge (HZMB), spanning Lingdingyang Bay, is a 55-km-long mega project. It includes three components: the main project of the bridge, island, and undersea tunnel; the ports of Hong Kong, Zhuhai, and Macau; and the connecting line between the three cities. As a part of the main project, the most challenging task is the 6.7-km-long undersea tunnel [28]. The immersed tunnel section has a length of 5.67 km and consists of 33 elements, as shown in Figure 1. Among all the elements, E28-E33 are located on the flat curve with a radius (R) of 5500 m, and the rest are located on the straight section [29]. The closure joint is between E29 and E30, and the water depth at the bottom of the closure joint is 27.9 m.
Sensors 2022, 22, x FOR PEER REVIEW 3 of 17 task is the 6.7-km-long undersea tunnel [28]. The immersed tunnel section has a length of 5.67 km and consists of 33 elements, as shown in Figure 1. Among all the elements, E28-E33 are located on the flat curve with a radius (R) of 5500 m, and the rest are located on the straight section [29]. The closure joint is between E29 and E30, and the water depth at the bottom of the closure joint is 27.9 m.  As a complex sea-crossing project, the HZMB immersed tunnel has a transition section at the head of the artificial islands where different foundation solutions are adopted. Additionally, the thickness of the placed backfill is uneven [30]. Meanwhile, the deposition of sediments should also be taken into consideration during the operation life of the structure, which increases the overlying load on the tunnel structure. Load changes in the tunnel's longitudinal direction will also result in a larger internal force in the tunnel. All the above situations will make the operational safety of the immersed tunnel's structure a great challenge.

Overview of Tunnel SHM System
HZMB is a mega infrastructure project with huge investment and significant impact, and a designed life of 120 years. In order to manage the maintenance and ensure the safety of this underwater structure, a SHM system is introduced to the project.
SHM is a comprehensive multidisciplinary system that encompasses techniques in sensing operation, electronic engineering, signal analysis, network communication, computer engineering, pattern recognition, civil engineering, and so on. Its main functions include: (1) Conduct effective monitoring and effective maintenance measures throughout the whole operation process; (2) Develop a system that works coherently in each stage of construction and operation; (3) Provide necessary and practical information for the operation staff on the structural state; (4) Predict the health state of the structure based on the systematic risk analysis and structural response and carry out the corresponding maintenance measures to guarantee structural safety. task is the 6.7-km-long undersea tunnel [28]. The immersed tunnel section has a length of 5.67 km and consists of 33 elements, as shown in Figure 1. Among all the elements, E28-E33 are located on the flat curve with a radius (R) of 5500 m, and the rest are located on the straight section [29]. The closure joint is between E29 and E30, and the water depth at the bottom of the closure joint is 27.9 m.  As a complex sea-crossing project, the HZMB immersed tunnel has a transition section at the head of the artificial islands where different foundation solutions are adopted. Additionally, the thickness of the placed backfill is uneven [30]. Meanwhile, the deposition of sediments should also be taken into consideration during the operation life of the structure, which increases the overlying load on the tunnel structure. Load changes in the tunnel's longitudinal direction will also result in a larger internal force in the tunnel. All the above situations will make the operational safety of the immersed tunnel's structure a great challenge.

Overview of Tunnel SHM System
HZMB is a mega infrastructure project with huge investment and significant impact, and a designed life of 120 years. In order to manage the maintenance and ensure the safety of this underwater structure, a SHM system is introduced to the project.
SHM is a comprehensive multidisciplinary system that encompasses techniques in sensing operation, electronic engineering, signal analysis, network communication, computer engineering, pattern recognition, civil engineering, and so on. Its main functions include: (1) Conduct effective monitoring and effective maintenance measures throughout the whole operation process; (2) Develop a system that works coherently in each stage of construction and operation; (3) Provide necessary and practical information for the operation staff on the structural state; (4) Predict the health state of the structure based on the systematic risk analysis and structural response and carry out the corresponding maintenance measures to guarantee structural safety. As a complex sea-crossing project, the HZMB immersed tunnel has a transition section at the head of the artificial islands where different foundation solutions are adopted. Additionally, the thickness of the placed backfill is uneven [30]. Meanwhile, the deposition of sediments should also be taken into consideration during the operation life of the structure, which increases the overlying load on the tunnel structure. Load changes in the tunnel's longitudinal direction will also result in a larger internal force in the tunnel. All the above situations will make the operational safety of the immersed tunnel's structure a great challenge.

Overview of Tunnel SHM System
HZMB is a mega infrastructure project with huge investment and significant impact, and a designed life of 120 years. In order to manage the maintenance and ensure the safety of this underwater structure, a SHM system is introduced to the project.
SHM is a comprehensive multidisciplinary system that encompasses techniques in sensing operation, electronic engineering, signal analysis, network communication, computer engineering, pattern recognition, civil engineering, and so on. Its main functions include: (1) Conduct effective monitoring and effective maintenance measures throughout the whole operation process; (2) Develop a system that works coherently in each stage of construction and operation; (3) Provide necessary and practical information for the operation staff on the structural state; (4) Predict the health state of the structure based on the systematic risk analysis and structural response and carry out the corresponding maintenance measures to guarantee structural safety.
According to the above functional requirements, the system includes the following subsystems: (1) Automatic sensing system, which includes the following three modules. The first is the sensor module. This module serves to control various types of sensing equipment, read structural load data and structural response data, and convert these values into voltage, electric current, or frequency. The second is the data acquisition and transmission module, performing the function of converting the collected electrical signal into a digital signal that can be recognized by the computer and transmitting it to the data processing and control subsystem through the wired and wireless network. The third is the data processing and control module, whose function is to complete data pre-processing, post-processing, archiving, display, and storage. (2) Inspection and maintenance management system. This subsystem formulates the regulations of structure inspection and maintenance, arranges personnel to carry out periodic, quantitative, standard, and systematic inspections according to the tasks set by the software. (3) Structural evaluation and early warning system: The main function of this subsystem is to make precise assessments of the structural operation state with the help of highperformance computing equipment, a variety of static and dynamic analysis software, and damage inspection results. Then, provide technical support to the management department by preparing and submitting the monitoring report regularly. (4) Central database system: This subsystem manages and stores the static information and dynamic monitoring data of the whole monitoring system. (5) User interface subsystem: Display all kinds of data to users and accept users' control and input of the system.
The relationships between subsystems are shown in Figure 3.
According to the above functional requirements, the system includes the following subsystems: (1) Automatic sensing system, which includes the following three modules. The first is the sensor module. This module serves to control various types of sensing equipment, read structural load data and structural response data, and convert these values into voltage, electric current, or frequency. The second is the data acquisition and transmission module, performing the function of converting the collected electrical signal into a digital signal that can be recognized by the computer and transmitting it to the data processing and control subsystem through the wired and wireless network. The third is the data processing and control module, whose function is to complete data pre-processing, post-processing, archiving, display, and storage. (2) Inspection and maintenance management system. This subsystem formulates the regulations of structure inspection and maintenance, arranges personnel to carry out periodic, quantitative, standard, and systematic inspections according to the tasks set by the software. (3) Structural evaluation and early warning system: The main function of this subsystem is to make precise assessments of the structural operation state with the help of highperformance computing equipment, a variety of static and dynamic analysis software, and damage inspection results. Then, provide technical support to the management department by preparing and submitting the monitoring report regularly. (4) Central database system: This subsystem manages and stores the static information and dynamic monitoring data of the whole monitoring system. (5) User interface subsystem: Display all kinds of data to users and accept users' control and input of the system.
The relationships between subsystems are shown in Figure 3. Five types of monitoring data, namely ground motion, joint deformation, concrete strain, temperature, and humidity data, are adopted to discuss the proposed anomaly detection method. The details of SHM contents and corresponding sensors are listed in Table  1. The original sampling frequency of the five types of sensors is 50 Hz.  Five types of monitoring data, namely ground motion, joint deformation, concrete strain, temperature, and humidity data, are adopted to discuss the proposed anomaly detection method. The details of SHM contents and corresponding sensors are listed in Table 1. The original sampling frequency of the five types of sensors is 50 Hz.

Noise and Anomalies in SHM Data
In the process of data acquisition, the SHM system will inevitably be affected by environmental noise, the unstable connection between devices, an aging electric grid, unstable wireless transmission speed, channel faults, and so on, which will result in poor  Figure 4 displays an enlarged picture of the concrete strain data. The burr signals can be seen in the sequence, and the changing patterns of data are easily submerged in noise.
by applying the method of calculating the SI index mentioned in the author's previo paper [14]. In particular, by applying the basis wavelet (Symlet 12) to decompose the or inal data into five layers, a string of appropriate coefficients and detail coefficients obtained. Then, a fixed threshold method is used to calculate the wavelet threshold, wh is a common method for setting the wavelet threshold in wavelet analysis. The formul T = 2log(N), where N is the signal length. The detail coefficients lower than the thre old are viewed as meaningless noise and set to zero so that the noisy part of the data removed. Finally, the denoising sequence is reconstructed by an inverse wavelet ope tion using the detail coefficients of each layer and the appropriate coefficients of the fi layer.  Figure 4 shows the time series before and after denoising. It can be seen that wavelet denoising can effectively remove the redundant information brought into the s tem by the external environmental noise, which helps reveal the true characteristics of sequence.
It should be noted that the denoising procedure does wipe off the point anoma (classified in Section 2.4) within the sequence. These point anomalies can be easily spot by setting thresholds according to moving windows. However, a previous study h pointed out that anomalies in SHM data not only come from poor data quality but are a caused by structural damage [31]. In this study, it is assumed that the point anoma only refer to sources of poor data quality because the immersed tunnel was newly bu and in good condition. Therefore, such anomalies can be marked down before the noising process. The above denoising method serves the purpose of removing the inter rential noise in the original data and thus improving its intelligibility and analyzabil The denoised data are ready for anomaly detection, especially with contextual anoma and collective anomalies.

Classification of Anomalies
Anomalies refer to data deviating from the normal behavior of data. According to characteristics of the anomalies, they can be divided into the following three categorie The wavelet threshold denoising method is adopted to eliminate the noise. This study selects the optimal wavelet basis and the number of wavelet decomposition layers by applying the method of calculating the SI index mentioned in the author's previous paper [14]. In particular, by applying the basis wavelet (Symlet 12) to decompose the original data into five layers, a string of appropriate coefficients and detail coefficients are obtained. Then, a fixed threshold method is used to calculate the wavelet threshold, which is a common method for setting the wavelet threshold in wavelet analysis. The formula is T = 2 log(N), where N is the signal length. The detail coefficients lower than the threshold are viewed as meaningless noise and set to zero so that the noisy part of the data is removed. Finally, the denoising sequence is reconstructed by an inverse wavelet operation using the detail coefficients of each layer and the appropriate coefficients of the fifth layer. Figure 4 shows the time series before and after denoising. It can be seen that the wavelet denoising can effectively remove the redundant information brought into the system by the external environmental noise, which helps reveal the true characteristics of the sequence.
It should be noted that the denoising procedure does wipe off the point anomalies (classified in Section 2.4) within the sequence. These point anomalies can be easily spotted by setting thresholds according to moving windows. However, a previous study has pointed out that anomalies in SHM data not only come from poor data quality but are also caused by structural damage [31]. In this study, it is assumed that the point anomalies only refer to sources of poor data quality because the immersed tunnel was newly built and in good condition. Therefore, such anomalies can be marked down before the denoising process. The above denoising method serves the purpose of removing the interferential noise in the original data and thus improving its intelligibility and analyzability. The denoised data are ready for anomaly detection, especially with contextual anomalies and collective anomalies.

Classification of Anomalies
Anomalies refer to data deviating from the normal behavior of data. According to the characteristics of the anomalies, they can be divided into the following three categories: (1) Point anomaly: If an individual data instance differs greatly from other data, it will be regarded as a point anomaly. The maximum and minimum values in the statistical distribution may likely be considered point anomalies. Figure 5 shows an example of a point anomaly in the structural temperature data. It can be seen that the value in the red box is much higher than the other data.
(1) Point anomaly: If an individual data instance differs greatly from other data, it will be regarded as a point anomaly. The maximum and minimum values in the statistical distribution may likely be considered point anomalies. Figure 5 shows an example of a point anomaly in the structural temperature data. It can be seen that the value in the red box is much higher than the other data. (2) Contextual anomaly: If an individual data instance differs greatly from its nearby data or within a certain context, it is called a contextual anomaly. Figure 6 shows a contextual anomaly in the concrete strain data. It can be seen that the value in the box has a downward trend against the context where the previous sequence trend is relatively stable. (3) Collective anomalies: Collective anomalies refer to the situation where a collection of related data instances is anomalous, while the individual instance within the collective anomalies may not be anomalous by themselves. In other words, the collective anomalies happen only in the form of a collective group. Figure 7 shows the collective anomalies in the concrete strain data. In Figure 7a, the frequency of the data instances within the red box is significantly higher than that before or after the red box. Meanwhile, judging from Figure 7b-d separately, there is no anomaly detected.
In the three types of anomalies discussed above, point anomalies can be identified by setting appropriate thresholds based on the statistical parameters of the data within a certain time window. However, for the recognition of contextual and collective anomalies, both the temporal and spatial character attributes need to be considered, and the method of setting thresholds may be insufficient, so more complex methods need to be proposed. (2) Contextual anomaly: If an individual data instance differs greatly from its nearby data or within a certain context, it is called a contextual anomaly. Figure 6 shows a contextual anomaly in the concrete strain data. It can be seen that the value in the box has a downward trend against the context where the previous sequence trend is relatively stable.
(1) Point anomaly: If an individual data instance differs greatly from other data, it will be regarded as a point anomaly. The maximum and minimum values in the statistical distribution may likely be considered point anomalies. Figure 5 shows an example of a point anomaly in the structural temperature data. It can be seen that the value in the red box is much higher than the other data. (2) Contextual anomaly: If an individual data instance differs greatly from its nearby data or within a certain context, it is called a contextual anomaly. Figure 6 shows a contextual anomaly in the concrete strain data. It can be seen that the value in the box has a downward trend against the context where the previous sequence trend is relatively stable. (3) Collective anomalies: Collective anomalies refer to the situation where a collection of related data instances is anomalous, while the individual instance within the collective anomalies may not be anomalous by themselves. In other words, the collective anomalies happen only in the form of a collective group. Figure 7 shows the collective anomalies in the concrete strain data. In Figure 7a, the frequency of the data instances within the red box is significantly higher than that before or after the red box. Meanwhile, judging from Figure 7b-d separately, there is no anomaly detected.
In the three types of anomalies discussed above, point anomalies can be identified by setting appropriate thresholds based on the statistical parameters of the data within a certain time window. However, for the recognition of contextual and collective anomalies, both the temporal and spatial character attributes need to be considered, and the method of setting thresholds may be insufficient, so more complex methods need to be proposed. (3) Collective anomalies: Collective anomalies refer to the situation where a collection of related data instances is anomalous, while the individual instance within the collective anomalies may not be anomalous by themselves. In other words, the collective anomalies happen only in the form of a collective group. Figure 7 shows the collective anomalies in the concrete strain data. In Figure 7a, the frequency of the data instances within the red box is significantly higher than that before or after the red box. Meanwhile, judging from Figure 7b-d separately, there is no anomaly detected.
In the three types of anomalies discussed above, point anomalies can be identified by setting appropriate thresholds based on the statistical parameters of the data within a certain time window. However, for the recognition of contextual and collective anomalies, both the temporal and spatial character attributes need to be considered, and the method of setting thresholds may be insufficient, so more complex methods need to be proposed.

Introduction of ARIMA
The ARMA model is a widely used classical time series prediction model. ARIMA contains both an autoregression part and a moving average part so that it can both reflect the memory of the system and the noise entering the system. The ARMA model is usually written as ARMA (p, q), where p is the autoregression (AR) parameter, indicating the number of lag observations, and q is the moving average (MA) parameter, indicating the moving average window length. The general form of the ARMA model is as follows: where is the measured value at period t, is the error terms, and are the parameters of the autoregression part and the moving average part.
A backshift operator B is defined as shifting the sequence backward by one time period.
performs the operation to shift the sequence backward for k periods.
Then, Formula (1) can be equivalently written as: where The ARMA model is a widely used classical time series prediction model. ARIMA contains both an autoregression part and a moving average part so that it can both reflect the memory of the system and the noise entering the system. The ARMA model is usually written as ARMA (p, q), where p is the autoregression (AR) parameter, indicating the number of lag observations, and q is the moving average (MA) parameter, indicating the moving average window length. The general form of the ARMA model is as follows: where X t is the measured value at period t, ε t is the error terms, ϕ p and θ q are the parameters of the autoregression part and the moving average part. A backshift operator B is defined as shifting the sequence backward by one time period. B k performs the operation to shift the sequence backward for k periods.
Then, Formula (1) can be equivalently written as: where The ARMA model is only applicable to simulate stationary time series. If the time series is non-stationary, the differencing method is commonly used to turn the non-stationary sequence into stationary ones. For time series data, "stationary" means that the mean of the data can be considered to be statistically invariant, while "non-stationary" means that the mean of the data cannot be considered to be statistically invariable [32]. The ARMA model with a differencing process is also called the ARIMA model. "I" stands for integrated, which signifies that AR and MA techniques are combined into a single model.
We define the differentiated series as the difference between the current period and the prior period: The difference operator ∇ = 1 − B is introduced here, and the ARIMA model of order (p, d, q) can be expressed as: where d is the number of times that a raw sequence is differenced, indicating the number of differences needed to be stationary.

Time Series Pre-Processing
If the time series is non-stationary, statistical models based on historical data cannot predict the future. If the sequence is white noise, which means there is no correlation between the values of the sequences, the sequence is random and has no value for analysis. Therefore, the time series suitable for the ARIMA model must both be stationary and non-white noise after difference. This section will discuss the stationarity and randomness of time series and conduct necessary data pre-processing.
Considering that ARIMA is suitable for short-term prediction, and the historical data used for modeling is generally less than 10 (that is, the value of p and q is generally less than 10), the paper takes 100 observations (100 s as the time window) to fit the model. It should be noted that the time window of 100 s is obtained subjectively after many attempts by considering the prediction accuracy of the model and the significances of many statistical tests during the modeling procedure. It is not necessarily the best model for all time points. The denoised concrete strain data of the immersed tunnel of the HZMB on 2 June 2020 is taken as an example for modeling, as shown in Figure 8.
The ARMA model is only applicable to simulate stationary time series. If series is non-stationary, the differencing method is commonly used to turn the tionary sequence into stationary ones. For time series data, "stationary" means mean of the data can be considered to be statistically invariant, while "non-sta means that the mean of the data cannot be considered to be statistically invaria The ARMA model with a differencing process is also called the ARIMA model. " for integrated, which signifies that AR and MA techniques are combined into model.
We define the differentiated series as the difference between the current pe the prior period:

∇ =
The difference operator ∇ = 1 is introduced here, and the ARIMA mod der (p, d, q) can be expressed as: where d is the number of times that a raw sequence is differenced, indicating the of differences needed to be stationary.

Time Series Pre-processing
If the time series is non-stationary, statistical models based on historical dat predict the future. If the sequence is white noise, which means there is no correl tween the values of the sequences, the sequence is random and has no value for Therefore, the time series suitable for the ARIMA model must both be stationary white noise after difference. This section will discuss the stationarity and rando time series and conduct necessary data pre-processing.
Considering that ARIMA is suitable for short-term prediction, and the histor used for modeling is generally less than 10 (that is, the value of p and q is gene than 10), the paper takes 100 observations (100 s as the time window) to fit the m should be noted that the time window of 100 s is obtained subjectively after m tempts by considering the prediction accuracy of the model and the significances statistical tests during the modeling procedure. It is not necessarily the best mod time points. The denoised concrete strain data of the immersed tunnel of the HZ June 2020 is taken as an example for modeling, as shown in Figure 8. The timing graph of the initial series and differenced series, ACF diagram, an diagram are shown in Table 2. There is no specific trend, amplitude, or frequenc in the timing diagram of the second-order difference sequence, so it is tentativel that the sequence has met the requirements of stationarity. The timing graph of the initial series and differenced series, ACF diagram, and PACF diagram are shown in Table 2. There is no specific trend, amplitude, or frequency change in the timing diagram of the second-order difference sequence, so it is tentatively judged that the sequence has met the requirements of stationarity.

Type of Plots Initial Series First Difference Series Second Difference Series
Timing graph ACF plot

PACF plot
The ADF test was used to conduct a statistical test on the data, as shown in Table 3. It can be seen that the original sequence is not stable with the first-order difference sequence, and the second-order difference sequence is stable in the statistical sense with a p-value of 0.0045. Therefore, the order of d is determined to be two. The Ljung-Box (LB) test can be used to determine whether a sequence is white noise. The LB test assumes that the test sequence is a white noise sequence, so the correlation coefficients between time steps are zero, and the test statistics should obey the Chi-square distribution. If the calculated p-value is greater than 0.05, it is 95% sure of rejecting the null hypothesis and judging that the sequence is not a white noise sequence. The p-value of the second difference sequence is 2.29 × 10 , indicating that the sequence is not white noise and is worthy of further analysis.

Model Identification
The next step is to set the number of AR and MA terms, also called model identification. A common method is to visually inspect the autocorrelation function (ACF) plot and partial autocorrelation function (PACF) plot. Note that for the second difference sequence (Table 2), the ACF plot decays slowly, while the PACF plot only has a significant spike at lag 2. This indicates that the autocorrelation pattern of the sequence is more clearly explained by adding AR terms instead of MA terms, and the lag at which the PCAF cuts off is the appropriate number of AR terms.
However, it is not feasible to select the p and q parameters manually in the context of dynamic modeling. In this paper, the Akaike Information Criteria (AIC) and Bayesian

Type of Plots Initial Series First Difference Series Second Difference Series
Timing graph ACF plot

PACF plot
The ADF test was used to conduct a statistical test on the data, as shown in Table 3. It can be seen that the original sequence is not stable with the first-order difference sequence, and the second-order difference sequence is stable in the statistical sense with a p-value of 0.0045. Therefore, the order of d is determined to be two. The Ljung-Box (LB) test can be used to determine whether a sequence is white noise. The LB test assumes that the test sequence is a white noise sequence, so the correlation coefficients between time steps are zero, and the test statistics should obey the Chi-square distribution. If the calculated p-value is greater than 0.05, it is 95% sure of rejecting the null hypothesis and judging that the sequence is not a white noise sequence. The p-value of the second difference sequence is 2.29 × 10 , indicating that the sequence is not white noise and is worthy of further analysis.

Model Identification
The next step is to set the number of AR and MA terms, also called model identification. A common method is to visually inspect the autocorrelation function (ACF) plot and partial autocorrelation function (PACF) plot. Note that for the second difference sequence (Table 2), the ACF plot decays slowly, while the PACF plot only has a significant spike at lag 2. This indicates that the autocorrelation pattern of the sequence is more clearly explained by adding AR terms instead of MA terms, and the lag at which the PCAF cuts off is the appropriate number of AR terms.
However, it is not feasible to select the p and q parameters manually in the context of dynamic modeling. In this paper, the Akaike Information Criteria (AIC) and Bayesian

Type of Plots Initial Series First Difference Series Second Difference Series
Timing graph ACF plot

PACF plot
The ADF test was used to conduct a statistical test on the data, as shown in Table 3. It can be seen that the original sequence is not stable with the first-order difference sequence, and the second-order difference sequence is stable in the statistical sense with a p-value of 0.0045. Therefore, the order of d is determined to be two. The Ljung-Box (LB) test can be used to determine whether a sequence is white noise. The LB test assumes that the test sequence is a white noise sequence, so the correlation coefficients between time steps are zero, and the test statistics should obey the Chi-square distribution. If the calculated p-value is greater than 0.05, it is 95% sure of rejecting the null hypothesis and judging that the sequence is not a white noise sequence. The p-value of the second difference sequence is 2.29 × 10 , indicating that the sequence is not white noise and is worthy of further analysis.

Model Identification
The next step is to set the number of AR and MA terms, also called model identification. A common method is to visually inspect the autocorrelation function (ACF) plot and partial autocorrelation function (PACF) plot. Note that for the second difference sequence (Table 2), the ACF plot decays slowly, while the PACF plot only has a significant spike at lag 2. This indicates that the autocorrelation pattern of the sequence is more clearly explained by adding AR terms instead of MA terms, and the lag at which the PCAF cuts off is the appropriate number of AR terms.
However, it is not feasible to select the p and q parameters manually in the context of dynamic modeling. In this paper, the Akaike Information Criteria (AIC) and Bayesian

Type of Plots Initial Series First Difference Series Second Difference Series
Timing graph ACF plot

PACF plot
The ADF test was used to conduct a statistical test on the data, as shown in Table 3. It can be seen that the original sequence is not stable with the first-order difference sequence, and the second-order difference sequence is stable in the statistical sense with a p-value of 0.0045. Therefore, the order of d is determined to be two. The Ljung-Box (LB) test can be used to determine whether a sequence is white noise. The LB test assumes that the test sequence is a white noise sequence, so the correlation coefficients between time steps are zero, and the test statistics should obey the Chi-square distribution. If the calculated p-value is greater than 0.05, it is 95% sure of rejecting the null hypothesis and judging that the sequence is not a white noise sequence. The p-value of the second difference sequence is 2.29 × 10 , indicating that the sequence is not white noise and is worthy of further analysis.

Model Identification
The next step is to set the number of AR and MA terms, also called model identification. A common method is to visually inspect the autocorrelation function (ACF) plot and partial autocorrelation function (PACF) plot. Note that for the second difference sequence (Table 2), the ACF plot decays slowly, while the PACF plot only has a significant spike at lag 2. This indicates that the autocorrelation pattern of the sequence is more clearly explained by adding AR terms instead of MA terms, and the lag at which the PCAF cuts off is the appropriate number of AR terms.
However, it is not feasible to select the p and q parameters manually in the context of dynamic modeling. In this paper, the Akaike Information Criteria (AIC) and Bayesian

Type of Plots Initial Series First Difference Series Second Difference Series
Timing graph ACF plot

PACF plot
The ADF test was used to conduct a statistical test on the data, as shown in Table 3. It can be seen that the original sequence is not stable with the first-order difference sequence, and the second-order difference sequence is stable in the statistical sense with a p-value of 0.0045. Therefore, the order of d is determined to be two. The Ljung-Box (LB) test can be used to determine whether a sequence is white noise. The LB test assumes that the test sequence is a white noise sequence, so the correlation coefficients between time steps are zero, and the test statistics should obey the Chi-square distribution. If the calculated p-value is greater than 0.05, it is 95% sure of rejecting the null hypothesis and judging that the sequence is not a white noise sequence. The p-value of the second difference sequence is 2.29 × 10 , indicating that the sequence is not white noise and is worthy of further analysis.

Model Identification
The next step is to set the number of AR and MA terms, also called model identification. A common method is to visually inspect the autocorrelation function (ACF) plot and partial autocorrelation function (PACF) plot. Note that for the second difference sequence (Table 2), the ACF plot decays slowly, while the PACF plot only has a significant spike at lag 2. This indicates that the autocorrelation pattern of the sequence is more clearly explained by adding AR terms instead of MA terms, and the lag at which the PCAF cuts off is the appropriate number of AR terms.
However, it is not feasible to select the p and q parameters manually in the context of dynamic modeling. In this paper, the Akaike Information Criteria (AIC) and Bayesian

Type of Plots Initial Series First Difference Series Second Difference Series
Timing graph ACF plot

PACF plot
The ADF test was used to conduct a statistical test on the data, as shown in Table 3. It can be seen that the original sequence is not stable with the first-order difference sequence, and the second-order difference sequence is stable in the statistical sense with a p-value of 0.0045. Therefore, the order of d is determined to be two. The Ljung-Box (LB) test can be used to determine whether a sequence is white noise. The LB test assumes that the test sequence is a white noise sequence, so the correlation coefficients between time steps are zero, and the test statistics should obey the Chi-square distribution. If the calculated p-value is greater than 0.05, it is 95% sure of rejecting the null hypothesis and judging that the sequence is not a white noise sequence. The p-value of the second difference sequence is 2.29 × 10 , indicating that the sequence is not white noise and is worthy of further analysis.

Model Identification
The next step is to set the number of AR and MA terms, also called model identification. A common method is to visually inspect the autocorrelation function (ACF) plot and partial autocorrelation function (PACF) plot. Note that for the second difference sequence (Table 2), the ACF plot decays slowly, while the PACF plot only has a significant spike at lag 2. This indicates that the autocorrelation pattern of the sequence is more clearly explained by adding AR terms instead of MA terms, and the lag at which the PCAF cuts off is the appropriate number of AR terms.
However, it is not feasible to select the p and q parameters manually in the context of dynamic modeling. In this paper, the Akaike Information Criteria (AIC) and Bayesian

Type of Plots Initial Series First Difference Series Second Difference Series
Timing graph ACF plot

PACF plot
The ADF test was used to conduct a statistical test on the data, as shown in Table 3. It can be seen that the original sequence is not stable with the first-order difference sequence, and the second-order difference sequence is stable in the statistical sense with a p-value of 0.0045. Therefore, the order of d is determined to be two. The Ljung-Box (LB) test can be used to determine whether a sequence is white noise. The LB test assumes that the test sequence is a white noise sequence, so the correlation coefficients between time steps are zero, and the test statistics should obey the Chi-square distribution. If the calculated p-value is greater than 0.05, it is 95% sure of rejecting the null hypothesis and judging that the sequence is not a white noise sequence. The p-value of the second difference sequence is 2.29 × 10 , indicating that the sequence is not white noise and is worthy of further analysis.

Model Identification
The next step is to set the number of AR and MA terms, also called model identification. A common method is to visually inspect the autocorrelation function (ACF) plot and partial autocorrelation function (PACF) plot. Note that for the second difference sequence (Table 2), the ACF plot decays slowly, while the PACF plot only has a significant spike at lag 2. This indicates that the autocorrelation pattern of the sequence is more clearly explained by adding AR terms instead of MA terms, and the lag at which the PCAF cuts off is the appropriate number of AR terms.
However, it is not feasible to select the p and q parameters manually in the context of dynamic modeling. In this paper, the Akaike Information Criteria (AIC) and Bayesian

Type of Plots Initial Series First Difference Series Second Difference Series
Timing graph ACF plot

PACF plot
The ADF test was used to conduct a statistical test on the data, as shown in Table 3. It can be seen that the original sequence is not stable with the first-order difference sequence, and the second-order difference sequence is stable in the statistical sense with a p-value of 0.0045. Therefore, the order of d is determined to be two. The Ljung-Box (LB) test can be used to determine whether a sequence is white noise. The LB test assumes that the test sequence is a white noise sequence, so the correlation coefficients between time steps are zero, and the test statistics should obey the Chi-square distribution. If the calculated p-value is greater than 0.05, it is 95% sure of rejecting the null hypothesis and judging that the sequence is not a white noise sequence. The p-value of the second difference sequence is 2.29 × 10 , indicating that the sequence is not white noise and is worthy of further analysis.

Model Identification
The next step is to set the number of AR and MA terms, also called model identification. A common method is to visually inspect the autocorrelation function (ACF) plot and partial autocorrelation function (PACF) plot. Note that for the second difference sequence (Table 2), the ACF plot decays slowly, while the PACF plot only has a significant spike at lag 2. This indicates that the autocorrelation pattern of the sequence is more clearly explained by adding AR terms instead of MA terms, and the lag at which the PCAF cuts off is the appropriate number of AR terms.
However, it is not feasible to select the p and q parameters manually in the context of dynamic modeling. In this paper, the Akaike Information Criteria (AIC) and Bayesian

Type of Plots Initial Series First Difference Series Second Difference Series
Timing graph ACF plot

PACF plot
The ADF test was used to conduct a statistical test on the data, as shown in Table 3. It can be seen that the original sequence is not stable with the first-order difference sequence, and the second-order difference sequence is stable in the statistical sense with a p-value of 0.0045. Therefore, the order of d is determined to be two. The Ljung-Box (LB) test can be used to determine whether a sequence is white noise. The LB test assumes that the test sequence is a white noise sequence, so the correlation coefficients between time steps are zero, and the test statistics should obey the Chi-square distribution. If the calculated p-value is greater than 0.05, it is 95% sure of rejecting the null hypothesis and judging that the sequence is not a white noise sequence. The p-value of the second difference sequence is 2.29 × 10 , indicating that the sequence is not white noise and is worthy of further analysis.

Model Identification
The next step is to set the number of AR and MA terms, also called model identification. A common method is to visually inspect the autocorrelation function (ACF) plot and partial autocorrelation function (PACF) plot. Note that for the second difference sequence (Table 2), the ACF plot decays slowly, while the PACF plot only has a significant spike at lag 2. This indicates that the autocorrelation pattern of the sequence is more clearly explained by adding AR terms instead of MA terms, and the lag at which the PCAF cuts off is the appropriate number of AR terms.
However, it is not feasible to select the p and q parameters manually in the context of dynamic modeling. In this paper, the Akaike Information Criteria (AIC) and Bayesian The ADF test was used to conduct a statistical test on the data, as shown in Table 3. It can be seen that the original sequence is not stable with the first-order difference sequence, and the second-order difference sequence is stable in the statistical sense with a p-value of 0.0045. Therefore, the order of d is determined to be two. The Ljung-Box (LB) test can be used to determine whether a sequence is white noise. The LB test assumes that the test sequence is a white noise sequence, so the correlation coefficients between time steps are zero, and the test statistics should obey the Chi-square distribution. If the calculated p-value is greater than 0.05, it is 95% sure of rejecting the null hypothesis and judging that the sequence is not a white noise sequence. The p-value of the second difference sequence is 2.29 × 10 −22 , indicating that the sequence is not white noise and is worthy of further analysis.

Model Identification
The next step is to set the number of AR and MA terms, also called model identification. A common method is to visually inspect the autocorrelation function (ACF) plot and partial autocorrelation function (PACF) plot. Note that for the second difference sequence (Table 2), the ACF plot decays slowly, while the PACF plot only has a significant spike at lag 2. This indicates that the autocorrelation pattern of the sequence is more clearly explained by adding AR terms instead of MA terms, and the lag at which the PCAF cuts off is the appropriate number of AR terms.
However, it is not feasible to select the p and q parameters manually in the context of dynamic modeling. In this paper, the Akaike Information Criteria (AIC) and Bayesian Information Criteria (BIC) are adopted to select the p and q automatically. To be specific, model identification was carried out on the data at different times of the day, each with a 100 s time window. Table 4 shows the information criteria values under a range of combination of p and q parameters ranging from 0 to 10. According to the two criteria, most of the data performed best when (p, d, q) are set as (5, 2, 0). This result is logical. On the one hand, setting the number of AR terms as 5 indicates that the second difference sequence still has strong autocorrelation, which is consistent with the ACF figure in Table 2. On the other hand, the wavelet threshold denoising implemented in the previous chapter filters out the effect of residuals so that the order of MA is 0. It should be noted that due to the time-consuming process of model identification, (p, d, q) is selected as (5, 2, 0) in this paper to reduce the modeling time. Therefore, the model in this paper can be expressed as:

Parameter Estimation
Maximum likelihood estimation is selected for parameter estimation. It is suitable for estimation with large samples as the idea of this method is that the current training sample distribution can represent the population distribution. Figure 9 shows the results of parameter estimation. Among them, coef. Is the list of the model coefficients, where ar.L1 to ar.L5 correspond to ϕ 1 to ϕ 5 and sigma 2 corresponds to ε t in Equation (8). Information Criteria (BIC) are adopted to select the p and q automatically. To be specific, model identification was carried out on the data at different times of the day, each with a 100 s time window. Table 4 shows the information criteria values under a range of combination of p and q parameters ranging from 0 to 10. According to the two criteria, most of the data performed best when (p, d, q) are set as (5, 2, 0). This result is logical. On the one hand, setting the number of AR terms as 5 indicates that the second difference sequence still has strong autocorrelation, which is consistent with the ACF figure in Table 2. On the other hand, the wavelet threshold denoising implemented in the previous chapter filters out the effect of residuals so that the order of MA is 0. It should be noted that due to the time-consuming process of model identification, (p, d, q) is selected as (5, 2, 0) in this paper to reduce the modeling time. Therefore, the model in this paper can be expressed as:

Parameter Estimation
Maximum likelihood estimation is selected for parameter estimation. It is suitable for estimation with large samples as the idea of this method is that the current training sample distribution can represent the population distribution. Figure 9 shows the results of parameter estimation. Among them, coef. is the list of the model coefficients, where ar.L1 to ar.L5 correspond to to and sigma2 corresponds to in Equation (8). Figure 9. Static ARIMA result. Figure 9. Static ARIMA result.

Model Checking
First, check whether each term in Equation (8) has a real effect on the dependent variable. In Figure 9, the column shows that for all AR coefficients, the probabilities that the null hypothesis (no significance) is true are close to zero, verifying that each AR parameter is significant. However, for sigma 2, the probability of the null hypothesis being true is much greater than zero. This is because the absolute value of its coefficient is too small. Since the influence of sigma 2 on the dependent variable can be ignored, the non-significance of this parameter will not affect the overall model performance.
Then, the model's significance should be examined. The well-fitting model can extract the information contained in the time series, so its residual sequence should be close to the white noise sequence without any specific patterns. Figure 10a shows the timing diagram of the residual sequence, which exhibits strong randomness. Figure 10b,c are a residual distribution plot and quantile-quantile plot, respectively, indicating that the residual basically presents the characteristics of a normal distribution. Figure 10d is the autocorrelation plot of the residuals, which shows there is no obvious autocorrelation between the residuals. In addition, LB statistics were used for the statistical test, and the p-value was 5.1698 × 10 −5 , much smaller than 0.05. All of these verify that the residual could be considered as a white noise sequence. Therefore, the model is significant.  (8) has a real effect on the dependent variable. In Figure 9, the column shows that for all AR coefficients, the probabilities that the null hypothesis (no significance) is true are close to zero, verifying that each AR parameter is significant. However, for sigma 2, the probability of the null hypothesis being true is much greater than zero. This is because the absolute value of its coefficient is too small. Since the influence of sigma 2 on the dependent variable can be ignored, the non-significance of this parameter will not affect the overall model performance.
Then, the model's significance should be examined. The well-fitting model can extract the information contained in the time series, so its residual sequence should be close to the white noise sequence without any specific patterns. Figure 10a shows the timing diagram of the residual sequence, which exhibits strong randomness. Figure 10b,c are a residual distribution plot and quantile-quantile plot, respectively, indicating that the residual basically presents the characteristics of a normal distribution. Figure 10d is the autocorrelation plot of the residuals, which shows there is no obvious autocorrelation between the residuals. In addition, LB statistics were used for the statistical test, and the pvalue was 5.1698 × 10 , much smaller than 0.05. All of these verify that the residual could be considered as a white noise sequence. Therefore, the model is significant.

Model Forecast
The data at different times of the day are selected for static ARIMA modeling. Figure  11 shows the point prediction and prediction intervals of the model performing a ten-step forecast. It can be seen that within five time steps, ARIMA's prediction results are good, but with the extension of the prediction period, the accuracy gradually decreases.

Model Forecast
The data at different times of the day are selected for static ARIMA modeling. Figure 11 shows the point prediction and prediction intervals of the model performing a ten-step forecast. It can be seen that within five time steps, ARIMA's prediction results are good, but with the extension of the prediction period, the accuracy gradually decreases.  This is because, in essence, multi-step prediction is based on single-step prediction, which means the single-step prediction is used as an input for multi-step prediction. Therefore, the error in single-step prediction will accumulate in each subsequent prediction, resulting in inaccurate multi-step prediction results. For a SHM system, real-time monitoring data are available. In order to improve the accuracy of model prediction and improve the reliability of anomaly detection, the ARIMA algorithm will obtain the measured values in real-time so as to realize a rolling single-step forecast.

Dynamic ARIMA Model
Apparently though, as time goes by, the ARIMA model fitted on the previous data will no longer be suitable for the prediction, and the model parameters need to be updated. A crude way to perform this is to re-create the ARIMA model at each second, but it will be too time-consuming. Moreover, the model has high accuracy in a certain period, so there is no need to update parameters second by second. Therefore, the model error monitoring method will be adopted in this study to determine when the model requires being updated. Specifically, the system will keep track of all observations and predictions in the last 10 min. When the average error, which is the difference between the observations and the predictions, is greater than a certain threshold, the model is considered to be inaccurate. The system will automatically update the parameters based on the latest 100 s observations. Meanwhile, in order to improve the efficiency of model refitting, the This is because, in essence, multi-step prediction is based on single-step prediction, which means the single-step prediction is used as an input for multi-step prediction. Therefore, the error in single-step prediction will accumulate in each subsequent prediction, resulting in inaccurate multi-step prediction results. For a SHM system, real-time monitoring data are available. In order to improve the accuracy of model prediction and improve the reliability of anomaly detection, the ARIMA algorithm will obtain the measured values in real-time so as to realize a rolling single-step forecast.

Dynamic ARIMA Model
Apparently though, as time goes by, the ARIMA model fitted on the previous data will no longer be suitable for the prediction, and the model parameters need to be updated. A crude way to perform this is to re-create the ARIMA model at each second, but it will be too time-consuming. Moreover, the model has high accuracy in a certain period, so there is no need to update parameters second by second. Therefore, the model error monitoring method will be adopted in this study to determine when the model requires being updated. Specifically, the system will keep track of all observations and predictions in the last 10 min. When the average error, which is the difference between the observations and the predictions, is greater than a certain threshold, the model is considered to be inaccurate. The system will automatically update the parameters based on the latest 100 s observations. Meanwhile, in order to improve the efficiency of model refitting, the parameters trained at the previous moment are taken as the initial parameter of the new model to reduce the time for parameter searching. In other words, the model will update based on its actual fitting performance. The mechanism of the dynamic ARIMA model is shown in Section 3.3.
The test results show that when the update threshold is selected as 5 × 10 −8 , the model is updated 29 times in one day and the error sequence fluctuates steadily within a day, as shown in Figure 12. The fact that errors do not increase significantly over time indicates that the model error monitoring method can ensure the accuracy of the dynamic model. parameters trained at the previous moment are taken as the initial parameter of the new model to reduce the time for parameter searching. In other words, the model will update based on its actual fitting performance. The mechanism of the dynamic ARIMA model is shown in Section 3.3. The test results show that when the update threshold is selected as 5 × 10 , the model is updated 29 times in one day and the error sequence fluctuates steadily within a day, as shown in Figure 12. The fact that errors do not increase significantly over time indicates that the model error monitoring method can ensure the accuracy of the dynamic model.

Anomaly Detection Method
In this section, this study will design an anomaly detection method based on dynamic model prediction. It should be noted that the method does not involve a multi-step prediction but a rolling single-step prediction by constantly rolling forward the time window and adding the observations into the input of the model.
If the observation deviates too far from the prediction, it is considered to violate time continuity and will be identified as an anomaly. The warning threshold here should be dynamically selected according to the state of the external environment and tunnel structure. Inspired by PauTa Criterion [33,34], or the 3 σ rule, the threshold can be set according to historical data in a certain period. As the judgment is based on statistical characteristics of the data, it requires that the sample data are approximately normally distributed, so the referential period should be sufficiently long. Moreover, it is accurate to set a fixed threshold for evaluation only under stable conditions, so the reference period should not be too long. After comprehensive consideration, the reference period is set as one hour. It is assumed that the system state is relatively stable within two hours, so the threshold of this hour can be set based on the standard deviation of the previous hour.
The following flow chart in Figure 13 summarizes the whole process of anomaly detection based on a dynamic ARIMA model.
As shown in Figure 14, when the threshold takes different standard deviation coefficients, the proportion of anomalies recognized changes. When the std. coefficient is larger, the threshold is larger, which means the criteria are stricter, so the number of detected anomalies is smaller.
In order to adapt corresponding treatments to the anomalies with different severity, hierarchical warnings are defined by classifying the anomalies by their detection thresholds, as shown in Table 5. The std. coefficients for these three levels are defined according to the percentage of abnormality in the total data; 5.5, 6.5, and 7.5 times correspond to 0.1%, 0.05%, and 0.02%, respectively. As the std. coefficient goes up, they are labeled as Level 3 to Level 1 warning points, with Level 1 being the most urgent one.

Anomaly Detection Method
In this section, this study will design an anomaly detection method based on dynamic model prediction. It should be noted that the method does not involve a multi-step prediction but a rolling single-step prediction by constantly rolling forward the time window and adding the observations into the input of the model.
If the observation deviates too far from the prediction, it is considered to violate time continuity and will be identified as an anomaly. The warning threshold here should be dynamically selected according to the state of the external environment and tunnel structure. Inspired by PauTa Criterion [33,34], or the 3 σ rule, the threshold can be set according to historical data in a certain period. As the judgment is based on statistical characteristics of the data, it requires that the sample data are approximately normally distributed, so the referential period should be sufficiently long. Moreover, it is accurate to set a fixed threshold for evaluation only under stable conditions, so the reference period should not be too long. After comprehensive consideration, the reference period is set as one hour. It is assumed that the system state is relatively stable within two hours, so the threshold of this hour can be set based on the standard deviation of the previous hour.
The following flow chart in Figure 13 summarizes the whole process of anomaly detection based on a dynamic ARIMA model.
As shown in Figure 14, when the threshold takes different standard deviation coefficients, the proportion of anomalies recognized changes. When the std. coefficient is larger, the threshold is larger, which means the criteria are stricter, so the number of detected anomalies is smaller.
In order to adapt corresponding treatments to the anomalies with different severity, hierarchical warnings are defined by classifying the anomalies by their detection thresholds, as shown in Table 5. The std. coefficients for these three levels are defined according to the percentage of abnormality in the total data; 5.5, 6.5, and 7.5 times correspond to 0.1%, 0.05%, and 0.02%, respectively. As the std. coefficient goes up, they are labeled as Level 3 to Level 1 warning points, with Level 1 being the most urgent one.

Std. Coefficient
Warning Level Colors * 5.5 Level three Yellow 6.5 Level two Orange 7.5 Level one Red * The color of warning level intuitively shows its urgent level. Red is the most urgent situation, followed by orange and yellow. Figure 15 shows the detected anomalies with the above three levels of thresholds. The model gives warning signals at both daily minima and maxima. As the data are changing more dramatically, the warning level is more severe. It should be noted that there are many physical causes of a warning identified by the proposed method, such as

Std. Coefficient
Warning Level Colors * 5.5 Level three Yellow 6.5 Level two Orange 7.5 Level one Red * The color of warning level intuitively shows its urgent level. Red is the most urgent situation, followed by orange and yellow. Figure 15 shows the detected anomalies with the above three levels of thresholds. The model gives warning signals at both daily minima and maxima. As the data are changing more dramatically, the warning level is more severe. It should be noted that there are many physical causes of a warning identified by the proposed method, such as   Figure 15 shows the detected anomalies with the above three levels of thresholds. The model gives warning signals at both daily minima and maxima. As the data are changing more dramatically, the warning level is more severe. It should be noted that there are many physical causes of a warning identified by the proposed method, such as the drastic changes in the surrounding environment of the tunnel, the failure of monitoring sensors, or the abnormal conditions of the tunnel structure. Since the actual conditions corresponding to the abnormal signals have not been identified one by one, we can only inform the operation and maintenance authorities that they should pay close attention to the further development of the monitoring value and be prepared for possible structural defects. the drastic changes in the surrounding environment of the tunnel, the failure of monitoring sensors, or the abnormal conditions of the tunnel structure. Since the actual conditions corresponding to the abnormal signals have not been identified one by one, we can only inform the operation and maintenance authorities that they should pay close attention to the further development of the monitoring value and be prepared for possible structural defects. It can be counted in Figure 15 that there were more than 10 anomalies in a single day, which may be too many in practice. We have acknowledged that the identified anomaly does not necessarily represent structural damage, it could just be an overweight vehicle passing through a tunnel, or sensor drift errors. In order to avoid ineffective work caused by handling excessive warnings, we give the following recommendations for anomaly detection of SHM data: in a certain time period, the number of warnings should vary within a certain range. If the number of warnings identified on a given day is significantly higher than that in the past few days, it can be assumed that some problems may really have happened, which requires a comprehensive and detailed inspection of structures, sensors, etc.

Conclusions
This study introduces specific processes of data anomaly detection based on the dynamic ARIMA model. The conclusions are drawn below: (1) Based on the analysis of concrete strain SHM data of the HZMB immersed tunnel, three types of anomalies can be classified and should be detected. In addition, the classification of data anomalies caused by poor data quality and structural damage requires further study. (2) The static ARIMA model is established according to the normative steps, and the model is tested to ensure its validity. (3) Considering the requirement of real-time warning of the SHM system, the method of dynamic modeling and setting dynamic threshold value is discussed. It is suggested to adopt the multiple standard deviations of the previous time period as the dynamic threshold. (4) A dynamic warning schematic was established with a hierarchical grading standard, from Level 1 to Level 3 warnings, to verify and apply to detect anomalies of the concrete strain data of the HZMB immersed tunnel. It is found that the proposed method is able to give good results in anomaly detection and greatly improve the efficiency of tunnel operators, which demonstrates its ability to be applied to major infrastructure structural health monitoring. It can be counted in Figure 15 that there were more than 10 anomalies in a single day, which may be too many in practice. We have acknowledged that the identified anomaly does not necessarily represent structural damage, it could just be an overweight vehicle passing through a tunnel, or sensor drift errors. In order to avoid ineffective work caused by handling excessive warnings, we give the following recommendations for anomaly detection of SHM data: in a certain time period, the number of warnings should vary within a certain range. If the number of warnings identified on a given day is significantly higher than that in the past few days, it can be assumed that some problems may really have happened, which requires a comprehensive and detailed inspection of structures, sensors, etc.

Conclusions
This study introduces specific processes of data anomaly detection based on the dynamic ARIMA model. The conclusions are drawn below: (1) Based on the analysis of concrete strain SHM data of the HZMB immersed tunnel, three types of anomalies can be classified and should be detected. In addition, the classification of data anomalies caused by poor data quality and structural damage requires further study. (2) The static ARIMA model is established according to the normative steps, and the model is tested to ensure its validity. (3) Considering the requirement of real-time warning of the SHM system, the method of dynamic modeling and setting dynamic threshold value is discussed. It is suggested to adopt the multiple standard deviations of the previous time period as the dynamic threshold. (4) A dynamic warning schematic was established with a hierarchical grading standard, from Level 1 to Level 3 warnings, to verify and apply to detect anomalies of the concrete strain data of the HZMB immersed tunnel. It is found that the proposed method is able to give good results in anomaly detection and greatly improve the efficiency of tunnel operators, which demonstrates its ability to be applied to major infrastructure structural health monitoring.