An Integrated Method for Tunnel Health Monitoring Data Analysis and Early Warning: Savitzky–Golay Smoothing and Wavelet Transform Denoising Processing

A tunnel health monitoring (THM) system ensures safe operations and effective maintenance. However, how to effectively process and denoise the various data collected by THM remains to be addressed, as does the problem of safety early warning. Thus, an integrated method combining Savitzky–Golay smoothing (SGS) and Wavelet Transform Denoising (WTD) was used to smooth data and filter noise, and the coefficient of non-uniform variation method was proposed for early warning. The proposed method was applied to THM data from four types of sensors. Firstly, missing values, outliers, and trend terms in the data were processed, and then the data were smoothed by SGS. Furthermore, data denoising was carried out by selecting wavelet basis functions and decomposition scales and then reconstructing the signal. Finally, the coefficient of non-uniform variation was employed to calculate the yellow and red thresholds. In data smoothing, it was found that the Signal Noise Ratio (SNR) and Root Mean Square Error (RMSE) of SGS were superior to those of moving average smoothing and five-point cubic smoothing by approximately 10% and 30%, respectively. An interesting phenomenon was discovered: the maximum and minimum values of the denoising effects with different wavelet basis functions differed significantly, with the SNR differing by 14%, the RMSE by 8%, and the smoothness r by up to 80%. It was found that the selected wavelet basis functions vary, while the decomposition scale is consistently three layers. SGS and WTD can effectively reduce the complexity of the data while preserving its key characteristics, giving a good denoising effect. The yellow and red warning thresholds are categorized into conventional and critical controls, respectively. This early warning method dramatically improves the efficiency of tunnel safety control.


Introduction
In recent years, highway tunnel diseases have become increasingly prominent, especially in tunnels with seriously adverse geological locations, significant structural damage, mechanical fractures [1,2], or hidden dangers, which are prone to bridge breaks, tunnel collapse, mudslides, and other accidents [3–5]. These accidents can cause substantial economic losses and loss of life. In addition, tunnels have high operational safety risks under complex geological conditions [6]. Establishing a health monitoring system for tunnels is an effective way to prevent disease and danger. It not only monitors settlement, cross-section, and surface stress in real time but also reduces preventive maintenance costs [7,8]. Therefore, implementing a tunnel health monitoring (THM) system is imperative. The THM data for the Haozhuangliang tunnel in northwest China were used to validate the method. The sensors for health monitoring include hydrostatic levelling, laser range finders, surface strain gauges, and surface crack gauges. In the last step, the coefficient of non-uniform variation method was adopted to provide safety early warning for tunnels.

Outlines of the Haozhuangliang Tunnel
The Haozhuangliang tunnel is located on the Yan'an-Xi'an Expressway in Tongchuan, Shaanxi Province (Figure 2a), and is well equipped with ventilation, firefighting, and lighting facilities (Figure 2b). It is a dual-carriageway tunnel with four lanes in each direction. It is divided into two sections: (K125+230) to (K127+120) on the upper segment, and (K125+325) to (K127+060) on the lower part, measuring 1983 m and 1901 m, respectively (Figure 2c). Not only is it the longest completed high-grade highway tunnel in Shaanxi, but it also holds this distinction in Northwest China. Construction of the tunnel officially began in March 1998, and the entire line was opened to traffic on 29 April 2001.

Overview of THM System
The THM system for the Haozhuangliang tunnel deploys hydrostatic levelling (HL), laser range finders (LRF), surface strain gauges (SSG), and surface crack gauges (SCG) as the primary sensors, meeting the needs of tunnel structure settlement monitoring and experimental testing. The THM system provides (1) dynamic monitoring and transmission of data, allowing the real-time status of the tunnel to be observed; (2) highly accurate monitoring, with a distance measurement accuracy of up to 0.5 mm; and (3) collection and storage of data, providing an early warning service when data fall outside the acceptable range. The details of the THM contents and sensors are listed in Table 1.
There are four types of monitoring: settlement, cross-section convergence, lining surface stress, and crack width.
(1) Settlement monitoring. Settlement is measured with hydrostatic levelling at nine monitoring sites every 200 m, with problematic sections located every 50 m. Data are collected once a minute, for 527,040 data points per year.
(2) Cross-section convergence. A total of 32 laser range finders are deployed to monitor this content, with regular and problematic sections located at intervals of 200 and 50 m, respectively. Data are collected once an hour, for a total of 8041 data points.
(3) Lining surface stress. It is monitored using 131 surface strain gauges, with intervals following the same pattern as for settlement monitoring. There are five cross-sections per section, and 27 monitoring sites on the left and right sides.
(4) Lining surface crack width. It is observed through 22 surface crack gauges, with one sensor selected for 25% of the total cracks. Data are collected once an hour, for a total of 8785 data points.

To avoid repetition, only one raw signal collected from the THM system is shown for each sensor type, as displayed in Figure 3a-d.

Data Processing and Analysis
The four sensor types share similar data processing and analysis procedures in THM, and SSG is the most widely distributed sensor. Thus, this study employed SSG as a representative example of data processing and analysis.

Missing Value Processing
Sensor records may contain missing values due to various factors, such as power supply disruptions, maintenance checks, and other ambient factors. To facilitate further analysis, filling in these missing values is essential. The missing data of some sensors are listed in Table A1. Figure 4 shows the missing values of SSG-L7-1, where "L" denotes the left side of the tunnel; each affected sensor has 744 missing values. In comparison, SSG-R (SSG installed on the right side) exhibits no missing data, indicating higher data quality in this respect. The leading causes of the missing data could be sensor failure or transmission network issues, which led to no data being collected for that month. Mean filling was used to fill in the missing values, which keeps the computational load low and limits the effect on data variance. The filling result is shown in Figure 4, where the red dashed lines indicate the extent of the missing and filled data, and (a) and (b) show SSG-L7-1 before and after filling, respectively.
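As an illustration of mean filling, the following minimal Python sketch replaces missing readings with the mean of the observed ones. The function name `fill_missing_with_mean`, the use of `None`/NaN as the missing-value marker, and the sample readings are assumptions made for this example; the source does not describe its implementation.

```python
import math
from statistics import fmean


def fill_missing_with_mean(values):
    """Replace missing readings (None or NaN) with the mean of the observed ones."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    mean_value = fmean(observed)
    return [mean_value if is_missing(v) else v for v in values]


# Hypothetical strain readings with a gap, for illustration only.
readings = [10.0, 12.0, None, float("nan"), 14.0]
filled = fill_missing_with_mean(readings)
```

Because every gap receives the same value, this approach is cheap but flattens the filled interval, which is consistent with the flat segment one would expect between the red dashed lines in Figure 4.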

Outlier Value Processing
Data that deviate from the overall sampled values are considered outliers; they arise from problems in the signal transmission system or sensors and from conditions such as strong electromagnetic interference. The presence of outliers can bias the data analysis. The outliers for some of the sensors are listed in Table A2. Standard methods for handling outliers include the Pauta criterion (3σ rule) [19] and the Grubbs criterion [20]. The Grubbs criterion is complex and challenging to implement, whereas the 3σ criterion meets the requirements for outlier detection in THM. Therefore, the 3σ criterion was selected to remove the outliers in this study. The distribution of the tunnel SSG data is depicted by the box plot in Figure 5, which shows that the majority of SSG medians are centered around 0, with SSG-L2-2, SSG-L2-5, and SSG-L4-2 being exceptions with medians around −50. These sensors are mostly located at the top and waist of the vault, which are prone to stress changes. Multiple outliers were observed for SSG-L13-3, SSG-R17-3, and SSG-R21-3, and these were replaced by the mean values.
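The 3σ rule with mean replacement can be sketched as follows. This is a minimal example under stated assumptions: the function name `replace_outliers_3sigma` is hypothetical, the population standard deviation is used, and outliers are replaced by the mean of the remaining (non-outlier) samples, since the source does not specify which mean is taken.

```python
from statistics import fmean, pstdev


def replace_outliers_3sigma(values, k=3.0):
    """Flag samples farther than k standard deviations from the mean and
    replace them with the mean of the remaining samples (an assumption)."""
    mu = fmean(values)
    sigma = pstdev(values)
    is_outlier = [abs(v - mu) > k * sigma for v in values]
    inliers = [v for v, bad in zip(values, is_outlier) if not bad]
    fill = fmean(inliers) if inliers else mu
    cleaned = [fill if bad else v for v, bad in zip(values, is_outlier)]
    return cleaned, is_outlier


# Hypothetical series: small oscillation plus a single spike.
series = [1.0, -1.0] * 10 + [100.0]
cleaned, flags = replace_outliers_3sigma(series)
```

Note that a large spike inflates both the mean and the standard deviation used for detection, so in short records a single pass may miss smaller outliers; repeating the pass is one possible refinement.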

Detrending Processing
Seasonal changes in ambient factors or sensor performance can make sensors susceptible to low-frequency noise. This component is known as the trend term of the signal and must be removed for accurate signal analysis. Polynomial fitting is adopted to calculate the trend component of the time-series data [21], and the detrended data are obtained by subtracting the trend component from the original data. The commonly used orders for polynomial fitting in detrending are 2-4 [22–24]. Using an excessively high order can lead to overfitting, while too low an order results in insufficient fitting accuracy [25]. Based on the above research, a quadratic polynomial function was selected to fit the trend.
The analysis showed that the two SSG data series contained trend, seasonality, and residual components, as shown in Figure 6. The SSG-L1-1 (a) and SSG-L1-3 (b) data exhibit a decreasing trend from January to March and an increasing trend from June to September, possibly due to ambient factors, resulting in a noisy signal over time. Polynomial fitting was used to eliminate the trend term, as illustrated in Figure 6. SSG-L1-1 (a) and SSG-L1-3 (b) had different trend term changes. SSG-L1-1 was installed at the arch foot on the left lining, while SSG-L1-3 was at the arch crown. The different installation positions are the main reason for the differing trend changes.
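Quadratic detrending of this kind can be sketched with NumPy's `polyfit` and `polyval`. The function name `detrend_polynomial`, the use of the sample index as the time axis, and the synthetic test signal are assumptions for illustration only.

```python
import numpy as np


def detrend_polynomial(signal, order=2):
    """Fit a polynomial trend by least squares and subtract it from the signal."""
    signal = np.asarray(signal, dtype=float)
    t = np.arange(signal.size, dtype=float)   # sample index used as time axis
    coeffs = np.polyfit(t, signal, deg=order)
    trend = np.polyval(coeffs, t)
    return signal - trend, trend


# Synthetic example: slow quadratic drift plus a faster oscillation.
t = np.arange(200, dtype=float)
synthetic = 0.001 * (t - 100.0) ** 2 + 5.0 * np.sin(2 * np.pi * t / 20.0)
residual, trend = detrend_polynomial(synthetic, order=2)
```

Because the fit includes a constant term, the detrended series has zero mean, which is convenient for the SNR and smoothing comparisons that follow.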

Signal Evaluation Indexes
Signal Noise Ratio (SNR) and Root Mean Square Error (RMSE) are commonly used signal evaluation indexes. SNR is the ratio between signal power and noise power. The larger the SNR, the better the data smoothing. The SNR [26] is defined in Equation (1).
$$\mathrm{SNR} = 10\log_{10}\left(\frac{\sum_{i=1}^{n} f(i)^{2}}{\sum_{i=1}^{n}\left[f(i)-s(i)\right]^{2}}\right) \tag{1}$$
RMSE is another index used to evaluate the denoising performance of the signal. It represents the square root of the variance of the smoothed data from the original data [27], as given in Equation (2). A lower RMSE indicates better denoising performance.
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[f(i)-s(i)\right]^{2}} \tag{2}$$
where s(i) and f(i) denote the denoised and original signals, respectively, and n is the number of collected data points.
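Both indexes are straightforward to compute. The sketch below assumes that the SNR is ten times the base-10 logarithm of the ratio between the power of the original signal f and the power of the residual f − s, and that the RMSE is the root of the mean squared residual; the function names are hypothetical.

```python
import math


def snr_db(original, denoised):
    """SNR in decibels: power of the original over power of the residual."""
    signal_power = sum(f * f for f in original)
    noise_power = sum((f - s) ** 2 for f, s in zip(original, denoised))
    if noise_power == 0:
        return math.inf  # identical signals: no residual noise
    return 10.0 * math.log10(signal_power / noise_power)


def rmse(original, denoised):
    """Root mean square error between the original and denoised signals."""
    n = len(original)
    return math.sqrt(sum((f - s) ** 2 for f, s in zip(original, denoised)) / n)


# Tiny hand-checkable example.
f = [3.0, 4.0]
s = [3.0, 3.0]
example_snr = snr_db(f, s)
example_rmse = rmse(f, s)
```

Both measures compare the processed signal with the raw one, so they reward fidelity to the raw data; this is why the text later adds a smoothness index to guard against under- or over-denoising.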

Introduction of Savitzky-Golay Smoothing
Sensor data may contain signal glitches and lack smoothness due to external factors such as manual operation and environmental effects. These interference signals negatively affect monitoring and analysis. Data smoothing by numerical averaging is frequently used to mitigate the impact of external factors and simplify data analysis, enabling clearer visualization and summary of long-term trends.
Savitzky-Golay smoothing (SGS) was proposed by Savitzky and Golay in 1964 [28]. It fits a local polynomial by least squares within a moving window, retaining most of the original data information while providing a smoother distribution. The SGS [29,30] process is as follows:
$$\hat{y}_j = \sum_{i=-m}^{m} a_i\, x_{j+i} \tag{3}$$
where $\hat{y}_j$ denotes the smoothed data set, $x_{j+i}$ the collected data set, $a_0$ and $a_i$ the smoothing coefficients (i.e., the weight of $x_{j+i}$ at offset i within the smoothing window), n the number of data points in the sliding window, and m the half-width of the window, with n = 2m + 1.
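The smoothing coefficients follow directly from the least-squares fit: for a window of n = 2m + 1 points, the weights are the row of the pseudo-inverse of the Vandermonde matrix that evaluates the fitted polynomial at the window center. The sketch below is a minimal NumPy version; the function names and the edge-padding rule at the boundaries are assumptions for illustration, not the source's implementation.

```python
import numpy as np


def sg_coefficients(m, order):
    """Weights a_{-m}..a_{m} of a Savitzky-Golay smoother with n = 2m + 1 points."""
    if order >= 2 * m + 1:
        raise ValueError("polynomial order must be smaller than the window length")
    offsets = np.arange(-m, m + 1, dtype=float)
    # Column k of the design matrix holds offsets**k.
    design = np.vander(offsets, order + 1, increasing=True)
    # Row 0 of the pseudo-inverse gives the fitted value at offset 0.
    return np.linalg.pinv(design)[0]


def sg_smooth(x, m, order):
    """Apply the smoother as a weighted moving sum over each window."""
    x = np.asarray(x, dtype=float)
    weights = sg_coefficients(m, order)
    padded = np.pad(x, m, mode="edge")  # assumption: repeat edge values
    # Reversing the kernel turns convolution into the correlation sum a_i * x_{j+i}.
    return np.convolve(padded, weights[::-1], mode="valid")


five_point_quadratic = sg_coefficients(m=2, order=2)
```

For m = 2 and a quadratic fit this yields the classic weights (−3, 12, 17, 12, −3)/35. Away from the boundaries, any polynomial of degree not exceeding the fitting order passes through the smoother unchanged, which is how SGS preserves local peaks better than a plain moving average.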

Signal Smoothing
To demonstrate the superiority and applicability of SGS, we selected moving average smoothing (MAS) [31] and five-point cubic smoothing (FTS) [32], which are relatively common time-series smoothing methods, for comparison, and used SNR and RMSE to evaluate the smoothing effects. The SNR and RMSE values are displayed in Figure 7a,b, respectively, and the smoothing results of two sensors, SSG-L1 and SSG-R26, are shown in Table 2. The SNR of MAS and FTS is distinctly smaller than that of SGS, and the RMSE of SGS is smaller than that of MAS and FTS. The main reason is that SGS allows direct specification of the smoothing window size, making it suitable for signals of different frequencies. In contrast, the window sizes are fixed for FTS and MAS, which lack flexibility. In addition, by calculating the mean values of SNR and RMSE across all sensors, it was found that the SNR and RMSE of SGS were superior to those of MAS and FTS by approximately 10% and 30%, respectively. Figure 8 shows the smoothing results, with Figure 8a-d representing the raw data, SGS, MAS, and FTS results, respectively. Compared to the raw data, the SGS data do not lose much detail, while the MAS and FTS data differ considerably from the original data, for example over the sampling points (0, 1000) and (7000, 8000).
The fluctuation ranges of SSG vary across different periods, as shown by changes in the smoothed data over the months. The data fluctuate between −20 and 20 µε for most of the sampling points from 0 (around January) to 5000 (around August). However, the variation is larger, from −60 to 20 µε, between about point 500 (February) and about point 1500 (around March). From point 5000 (August) to 8600 (December), the data range lies primarily between −40 µε and 20 µε. The effectiveness of SGS in preserving data features such as trends and fluctuations, while eliminating overlap and stacking, is clearly evident.
There are three standard parameters in the SGS procedure: window length (Win_len), fitting order (Orders), and sample interval (Delta). Following the relevant literature [30,31,33–35], we used an enumeration method to smooth the data with every combination of Win_len (3-11, odd values only), Orders (1-5), and Delta (1-5). SNR and RMSE were used to evaluate the smoothing effect. The relationship between the parameters and the evaluation metrics is shown in Figures 9 and 10. Figure 9a-c and Figure 9d-f show the SNR and RMSE values of SGS for different window lengths, fitting orders, and sample intervals, respectively, and Figure 10a-c and Figure 10d-f show the SNR and RMSE values for the parameter pairs window length and order, window length and sample interval, and order and sample interval, respectively. The blue dots in Figure 9 represent the SGS-processed values for different parameters. The smoothing results indicate that too large a sample interval results in poorly spaced samples in this study. The insensitivity of the SGS effect to the sample interval is primarily attributed to the fact that the commonly used sample intervals do not differ considerably. Therefore, the sample interval was removed to obtain the simplified Table 3.
As shown in Table 3, which is indexed by window length and order, the smoothing effect of the parameter combination (Win_len = 7, Orders = 5) is better than that of the others. The bolded portion of the table indicates the parameters with the best results, and this applies to the subsequent tables as well. The results are more sensitive to the window length than to the fitting order, because the window length directly controls how many data points are included in each smoothed value. A longer window has more points and smooths more aggressively. An interesting phenomenon was discovered: when the window length of SG smoothing was greater than or equal to 5, the smoothing effects of fitting orders 2 and 3 did not differ much. This could be because the fitting order of the data was around 2-3, which is also similar to the polynomial fitting order in Section 3.1.3.
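The enumeration over window length and fitting order can be sketched as a small grid search. This is an illustrative NumPy version under stated assumptions: the helper names are hypothetical, the smoother pads the edges by repeating the end values, the SNR uses the original-signal power in the numerator, and combinations with Orders > Win_len − 2 are skipped because a polynomial of degree Win_len − 1 interpolates the window exactly and performs no smoothing. The sample interval is omitted, in line with its removal from Table 3.

```python
import itertools
import numpy as np


def sg_smooth(x, win_len, order):
    """Savitzky-Golay smoothing with a symmetric window (edge values repeated)."""
    m = win_len // 2
    offsets = np.arange(-m, m + 1, dtype=float)
    weights = np.linalg.pinv(np.vander(offsets, order + 1, increasing=True))[0]
    padded = np.pad(np.asarray(x, dtype=float), m, mode="edge")
    return np.convolve(padded, weights[::-1], mode="valid")


def evaluate(original, smoothed):
    """Return (SNR in dB, RMSE) of the smoothed signal against the original."""
    residual = original - smoothed
    snr = 10.0 * np.log10(np.sum(original ** 2) / np.sum(residual ** 2))
    return snr, np.sqrt(np.mean(residual ** 2))


def grid_search(signal, win_lens=(3, 5, 7, 9, 11), orders=(1, 2, 3, 4, 5)):
    """Enumerate parameter pairs and collect (win_len, order, snr, rmse) rows."""
    signal = np.asarray(signal, dtype=float)
    rows = []
    for win_len, order in itertools.product(win_lens, orders):
        if order > win_len - 2:
            continue  # exact interpolation or ill-posed fit: no smoothing
        snr, err = evaluate(signal, sg_smooth(signal, win_len, order))
        rows.append((win_len, order, snr, err))
    return rows


# Deterministic synthetic signal: slow wave plus a fast ripple.
t = np.arange(200, dtype=float)
demo = np.sin(t / 10.0) + 0.1 * np.cos(1.7 * t)
table = grid_search(demo)
best = max(table, key=lambda row: row[2])  # highest SNR
```

One property worth keeping in mind when reading such a table: for a symmetric window, the smoothing weights obtained with fitting orders 2 and 3 coincide (as do those for orders 4 and 5), because the odd-power terms do not contribute to the fitted value at the window center. This is consistent with the near-identical results reported for orders 2 and 3.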

Introduction of WTD Processing
The coupling effect of various factors, including traffic behavior and ambient conditions, often affects the tunnel structure. Data transmission inevitably degrades signal quality, leading to significant noise in the time-domain data. Thus, noise reduction processing is necessary for the data. The Fourier transform for frequency-domain analysis and the wavelet transform for time-frequency analysis are typical denoising approaches. The Fourier transform is unsuited to non-stationary, non-linear signals and is less sensitive to time changes [36]. On the other hand, the wavelet transform (WT) can handle both non-stationary and noisy signals effectively [37]. WT combines the benefits of time-domain and frequency-domain analysis methods and characterizes the local features of the signal in time-frequency analysis. It is frequently used to denoise tunnel monitoring data [8,38,39], and the results demonstrate the feasibility and effectiveness of applying WT to tunnel monitoring data processing. On this basis, this study employs WT to reduce noise.
WT decomposes the raw signal into approximation coefficients (cA) and detail coefficients (cD). At each level, the low-frequency approximation coefficients are further decomposed into a new set of approximation coefficients and high-frequency detail coefficients, and the decomposition stops once the denoising requirements are met. The signal is then reconstructed by retaining the high-frequency detail coefficients from each level and the approximation coefficients from the final level. This achieves the denoising effect on the signal. The mathematical relationship between the wavelet reconstructed signal and its cA and cD [8,40] is shown in Equation (4) as follows:
$$s(i) = cA_n + \sum_{j=1}^{n} cD_j \tag{4}$$
where s(i) is the denoised signal, and $cA_n$ and $cD_j$ are the approximation coefficients from the n-th decomposition and the detail coefficients from the j-th decomposition, respectively.
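The additive structure behind Equation (4) can be illustrated with a deliberately simple multilevel decomposition. The sketch below uses the Haar wavelet purely for brevity (the study itself selects other basis functions), and it returns time-domain components, i.e., each level's contribution already mapped back to the original sampling grid, rather than raw coefficient arrays. The function name `haar_components` and the power-of-two length restriction are assumptions of this example.

```python
import numpy as np


def haar_components(x, levels):
    """Split x into one approximation component and `levels` detail components.

    Haar averaging/differencing is used only to keep the sketch short; the
    signal length must be a multiple of 2**levels.
    """
    x = np.asarray(x, dtype=float)
    if x.size % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = x.copy()
    details = []
    for j in range(1, levels + 1):
        pairs = approx.reshape(-1, 2)
        half_diff = (pairs[:, 0] - pairs[:, 1]) / 2.0  # detail at level j
        approx = pairs.mean(axis=1)                    # coarser approximation
        half = 2 ** (j - 1)
        # Each detail value is +d on the first half of its block, -d on the second.
        sign = np.concatenate([np.ones(half), -np.ones(half)])
        details.append(np.repeat(half_diff, 2 * half) * np.tile(sign, half_diff.size))
    return np.repeat(approx, 2 ** levels), details


signal = np.sin(np.linspace(0.0, 4.0 * np.pi, 64))
cA3, (cD1, cD2, cD3) = haar_components(signal, levels=3)
reconstructed = cA3 + cD1 + cD2 + cD3
```

With unmodified components the sum reproduces the input exactly. Denoising in practice shrinks or thresholds the detail terms before summing, so the reconstructed signal keeps the level-n approximation and only the significant part of each detail level.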

Evaluation Index of WTD Processing
In signal denoising, pursuing only an increase in SNR may result in over-denoising and the loss of valuable information [41]. To avoid this problem, the smoothness metric is introduced as a signal evaluation index. Smoothness (r) directly reflects the stability and continuity of the denoised signal and can evaluate how well the intrinsic characteristics of the signal are preserved after denoising [42]. It is a more intuitive evaluation metric. Combining the smoothness metric with the SNR and RMSE metrics allows for a more comprehensive and objective judgment of the effects of denoising algorithms [43].
r is the ratio of the root of variance of the first-order difference of the denoised signal to that of the original signal. It is a physical quantity concerned with signal approximation information [42,44]. The smaller the r, the better the denoising. Here, s(i) and f(i) denote the denoised and original signals, respectively, and n is the number of collected data points.
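A smoothness index of this kind can be sketched as follows. The example assumes one commonly used form, the ratio of the summed squared first-order differences of the denoised signal to that of the original signal; the exact expression used in the study (for instance, whether a square root is applied) is not recoverable from the text, and the function name `smoothness` is hypothetical.

```python
def smoothness(original, denoised):
    """Ratio of summed squared first differences (denoised over original).

    Assumed form; values well below 1 mean the denoised signal varies less
    from sample to sample than the original.
    """
    num = sum((denoised[i + 1] - denoised[i]) ** 2 for i in range(len(denoised) - 1))
    den = sum((original[i + 1] - original[i]) ** 2 for i in range(len(original) - 1))
    if den == 0:
        raise ValueError("original signal is constant; smoothness is undefined")
    return num / den


# Hand-checkable example: halving the oscillation amplitude quarters the index.
f = [0.0, 2.0, 0.0, 2.0]
s = [0.0, 1.0, 0.0, 1.0]
r_value = smoothness(f, s)
```

Under this assumed form, taking a square root would change the numerical value but not the ranking of candidate denoising settings.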
To resolve the problem that the three evaluation indicators may lead to different decisions without a clear reference [45], we introduced the coefficient of variation method to determine the weight of each indicator. In this study, the coefficient of variation weighting method was used to determine the optimal number of layers for wavelet decomposition; it objectively reflects the complexity of calculating each index. When an index is difficult to estimate, it has a larger coefficient of variation and is assigned a higher weight.
$T_j$ is used to evaluate the selection of the wavelet basis function and the decomposition scale. To calculate $T_j$, each index is first normalized according to its correlation with the smoothing result [44]. The weights are then calculated and linearly combined with the normalized indexes to obtain the composite index $T_j$: the smaller the $T_j$, the better the wavelet decomposition [44,45].
(1) Calculate the coefficient of variation of each index, e.g., $CV_{SNR} = \sigma_{SNR} / \mu_{SNR}$ for the SNR.
(2) Calculate the weight assigned to each index, e.g., $W_{SNR}$.
(3) Obtain $T_j$ by linear combination.
Here, $\sigma_{SNR}$ and $\mu_{SNR}$ represent the standard deviation and mean of the SNR series, respectively.
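The three steps can be sketched in a few lines of Python. Several details here are assumptions, because the corresponding equations are not available in the text: min-max normalization is used and oriented so that 0 is always the best candidate (SNR is inverted because larger is better), the coefficient of variation is computed on the normalized series, and each weight is the index's coefficient of variation divided by the sum over the three indexes. The function names and the candidate values are hypothetical.

```python
from statistics import fmean, pstdev


def normalize(values, higher_is_better):
    """Min-max scale to [0, 1] so that 0 always marks the best candidate."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    if higher_is_better:
        return [(hi - v) / (hi - lo) for v in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_index(snr, rmse, r):
    """Return (T, weights): coefficient-of-variation weighted sum per candidate."""
    scaled = {
        "SNR": normalize(snr, higher_is_better=True),
        "RMSE": normalize(rmse, higher_is_better=False),
        "r": normalize(r, higher_is_better=False),
    }
    cv = {}
    for name, series in scaled.items():
        mean = fmean(series)
        cv[name] = pstdev(series) / mean if mean else 0.0  # step (1)
    total = sum(cv.values())
    weights = {name: (c / total if total else 1.0 / len(cv))  # step (2)
               for name, c in cv.items()}
    t = [sum(weights[name] * scaled[name][j] for name in scaled)  # step (3)
         for j in range(len(snr))]
    return t, weights


# Hypothetical index values for three candidate settings (not from the study).
t_values, w = composite_index(
    snr=[20.0, 22.0, 21.0], rmse=[1.5, 1.0, 1.2], r=[0.30, 0.10, 0.40]
)
best_candidate = t_values.index(min(t_values))
```

The candidate with the smallest composite value is selected, which matches the rule stated in the text that a smaller $T_j$ indicates a better wavelet decomposition.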

Selection of Wavelet Basis Function
The selection of the wavelet basis function is a crucial factor in wavelet noise reduction, as each wavelet basis function has a unique effect on wavelet decomposition. Commonly used wavelet basis functions include coif1-5, db2-9, and sym3-9. This study compared the performance of the coifN, dbN, and symN wavelet basis functions using the evaluation indexes of Section 3.3.2, as presented in Figure 11a-d. Figure 11 depicts the values of the four metrics for evaluating the noise reduction effect of the wavelet transform under different wavelet basis functions, where the horizontal axis represents the wavelet basis functions. Specifically, the green line represents the SNR in Figure 11a, the blue line the RMSE in Figure 11b, the orange line r in Figure 11c, and the pink line $T_j$ in Figure 11d. The red part of each line and the red points mark the wavelet basis function that achieves the best noise reduction for that metric. Decomposition using coif2 produces the highest SNR and a smaller RMSE. The r index of db9 is smaller than that of the other functions. Notably, decomposition using sym9 obtained the smallest $T_j$ index, with a higher SNR and smaller RMSE and r. Based on these results, sym9 was selected as the wavelet basis function for subsequent wavelet noise reduction. Interestingly, the maximum and minimum values of the denoising effects differed greatly across the basis functions, with the SNR differing by 14%, the RMSE by 8%, and r by up to 80%. This was mainly because the different wavelet basis function families could all fit the original signal, but their detail parts were not very similar.

Determination of Wavelet Decomposition Scale
The choice of wavelet decomposition scale (number of layers) is also an important factor affecting wavelet noise reduction. A high decomposition scale may filter out the local response signal, while a low decomposition scale may retain some of the noise, leading to suboptimal results. Several studies [46-48] indicate that the decomposition scale is chosen mainly based on the data, with most adopting fewer than 7 layers. As the number of decomposition layers continues to increase, the amplitude of the wavelet coefficients of the useful signal remains almost unchanged [40], implying that the noise reduction no longer improves. In this paper, decomposition scales of 2 to 7 layers were therefore tested for each sensor. The signal at each decomposition scale was assessed using the comprehensive evaluation index Tj, and the results are shown in Table 4. HL, LRF, SSG, and SCG all have the smallest Tj at a decomposition scale of 3 layers, which means that the optimal noise reduction effect occurs at 3 layers.
To further support the decomposition scale of 3 layers from the data distribution perspective, plots of the decomposition results for the original SSG data at 2, 3, 4, and 5 decomposition layers are depicted in Figure 12a-e. At 3 layers, the data neither retain the significant amount of noise seen when decomposing at 1 and 2 layers nor over-filter local signals as when decomposing at 4 and 5 layers. Therefore, a decomposition scale of 3 layers not only yields a better noise reduction performance for the original signal, but also preserves the overall variation trend.
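The selection by composite index can be illustrated as follows. This is a sketch under stated assumptions rather than the paper's formula: the weights are taken as each index's coefficient of variation (standard deviation over mean) normalized to sum to one, every index is min-max scaled so that smaller is better, and Tj is their weighted sum. The metric values per layer are hypothetical placeholders, not the entries of Table 4.

```python
from statistics import mean, pstdev


def cv_weights(columns):
    """Weight each index by its coefficient of variation (assumed scheme)."""
    cvs = [pstdev(col) / abs(mean(col)) for col in columns]
    total = sum(cvs)
    if total == 0.0:
        return [1.0 / len(columns)] * len(columns)  # all indexes constant
    return [cv / total for cv in cvs]


def min_max_cost(col, higher_is_better):
    """Scale an index to [0, 1] so that 0 is the best candidate."""
    lo, hi = min(col), max(col)
    if hi == lo:
        return [0.0] * len(col)
    if higher_is_better:
        return [(hi - v) / (hi - lo) for v in col]
    return [(v - lo) / (hi - lo) for v in col]


def composite_index(snr, rmse, r):
    """Weighted sum of cost-oriented indexes; smaller is better."""
    weights = cv_weights([snr, rmse, r])
    costs = [
        min_max_cost(snr, higher_is_better=True),
        min_max_cost(rmse, higher_is_better=False),
        min_max_cost(r, higher_is_better=False),
    ]
    return [sum(w * c[i] for w, c in zip(weights, costs)) for i in range(len(snr))]


# Hypothetical index values for decomposition scales of 2 to 7 layers.
levels = [2, 3, 4, 5, 6, 7]
snr = [21.0, 24.0, 23.0, 22.0, 21.5, 21.0]
rmse = [0.30, 0.22, 0.25, 0.27, 0.29, 0.31]
r = [0.40, 0.15, 0.20, 0.25, 0.30, 0.35]

t = composite_index(snr, rmse, r)
best_level = levels[t.index(min(t))]
```

Weighting by the coefficient of variation gives more influence to the index that discriminates most strongly between candidates, which is one way to keep a single index from dominating the choice.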

Reconstruction of the Wavelet Decomposition
The selected wavelet basis function and decomposition scale decompose the original signal into its corresponding wavelet coefficients. The threshold compromise function was selected in this study to threshold the wavelet coefficients, which were then reconstructed to obtain the denoised signal. The results of detrending, SGS, and denoising are illustrated in Figure 13. Comparing Figure 13a-c shows that the integrated method of SGS and WTD removes noise more effectively than detrending and SGS alone, while preserving the characteristics of the original data. Moreover, the reconstructed data are essentially consistent with the measured data in both trend and singular point location. This suggests that the integrated method has notable benefits in reducing signal noise.
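The decompose, threshold, and reconstruct sequence described above can be sketched as follows. Several choices here are assumptions made to keep the example self-contained: the Haar wavelet stands in for the selected basis (sym9 for SSG), the compromise function is written in a commonly cited form that interpolates between hard (alpha = 0) and soft (alpha = 1) thresholding, and the threshold is the universal threshold with a median-based noise estimate. None of these is confirmed as the authors' exact setting.

```python
import math
from statistics import median


def haar_decompose(signal, levels):
    """Multi-level Haar transform; returns (approximation, details per level)."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    s = math.sqrt(2.0)
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / s for a, b in pairs])
        approx = [(a + b) / s for a, b in pairs]
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest level."""
    s = math.sqrt(2.0)
    current = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(current, detail):
            rebuilt.append((a + d) / s)
            rebuilt.append((a - d) / s)
        current = rebuilt
    return current


def compromise_threshold(w, lam, alpha=0.5):
    """Assumed compromise rule: zero below lam, shrink by alpha*lam above it."""
    if abs(w) < lam:
        return 0.0
    return math.copysign(abs(w) - alpha * lam, w)


def denoise(signal, levels=3, alpha=0.5):
    """Threshold the detail coefficients and reconstruct the signal."""
    approx, details = haar_decompose(signal, levels)
    # Noise level from the finest details (median absolute value / 0.6745).
    sigma = median([abs(d) for d in details[0]]) / 0.6745
    lam = sigma * math.sqrt(2.0 * math.log(len(signal)))
    shrunk = [[compromise_threshold(d, lam, alpha) for d in level] for level in details]
    return haar_reconstruct(approx, shrunk)
```

With a wavelet library, the two Haar helpers would be replaced by that library's multi-level decomposition and reconstruction calls, while the thresholding step keeps the same shape.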

Summary of Data Processing and Analysis
The data collected by the HL, LRF, and SCG sensors for THM were processed in the same way as described above for the SSG, and the results are shown in Table 5. The HL and LRF data have no missing values, whereas the SCG data do. Outliers were identified in the data of all three of these sensor types, but none of them contained trend terms. Regarding the selected optimal wavelet basis function, coif4, sym8, and db4 were utilized for the LRF, HL, and SCG data, respectively. This demonstrates that the most suitable wavelet basis function must be chosen for each sensor's data when performing wavelet decomposition and reconstruction.

Safety Early Warning
Determining the early warning thresholds for each monitoring sensor is integral to THM. Thus, this study introduced an early warning safety index for tunnels based on the processed and denoised data, as presented in Section 4.1. The red and yellow early warning lines were calculated with the SSG sensors as an example and later extended to the other sensors.

Selection of Early Warning Index
In early warning of changes in THM, the coefficient of non-uniform variation (CNV), which reflects the degree of curve similarity in the grey correlation analysis method, is used to express the degree of non-uniform variation of sensors at different locations [49,50]. The CNV gives the degree of correlation between different indexes. The larger the correlation degree, the greater the correlation between the current and standard data sequences. If the correlation degree is small, it indicates that non-uniform changes have occurred between the displacement sensors or in the tunnel structure, that accumulated damage may exist, or that there may be sensor failures [51]. The CNV employs the slope correlation index, which is calculated as follows [49,50]: where x_0(i) and x_i(i) denote the standard series and the current compared series, respectively, and a(x_i(j)) denotes the cumulative reduction of the series x(j).

Evaluation of Safety Early Warning
The CNV in Section 4.1 was adopted to delineate the safety thresholds, and yellow and red early warning thresholds were then defined. When the THM sensors reach the yellow early warning threshold, the management and maintenance departments must pay attention to the tunnel environment, loads, and overall structural condition and arrange maintenance during this period. If the monitoring values exceed the red early warning threshold, the management and maintenance departments need to respond immediately by arranging an inspection and assessment of the tunnel's structural safety to ensure its operation. Early warning thresholds differ among sensors of the same type installed at different locations in the tunnel. Therefore, this study determined the early warning thresholds separately for every THM sensor in the tunnel, using SSG as an example. Figure 14 shows the early warning thresholds for each SSG sensor. The red early warning thresholds for the CNV mostly fall between 0.50 and 0.55, with an overall range of 0.35 to 0.7. The yellow early warning thresholds mostly fall between 0.3 and 0.4, with an overall range of 0.23 to 0.51. Consequently, the red early warning thresholds vary more across sensors than the yellow early warning thresholds.
One sensor of each type (HL, LRF, SSG, and SCG) was selected for a day-by-day analysis against the early warning thresholds; the distribution for each sensor is shown in Figure 15a-d, respectively. Most monitored values of the four sensors are within the safety threshold, with few reaching the yellow early warning line. The latter may be due to the vibration caused by heavy traffic in the tunnel on that day. On a few days, the monitored values reached the red early warning line, potentially due to extreme weather conditions, such as high winds and heavy rainfall. In such instances, it is crucial to dispatch personnel promptly for inspection and maintenance to avoid endangering the tunnel structure and to ensure the safe operation of the tunnel. Overall, the Haozhuangliang tunnel remains generally safe, and few safety hazards are found during daily inspections.
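The day-by-day check against the two warning lines amounts to a simple per-sensor classification, sketched below. The direction of the comparison (a larger monitored index is more severe) is an assumption inferred from the red thresholds being numerically higher than the yellow ones, and the threshold pair shown for SSG-L1-1 is a hypothetical value chosen inside the reported ranges, not a value read from Figure 14.

```python
from enum import Enum


class WarningLevel(Enum):
    NORMAL = "normal"
    YELLOW = "yellow"
    RED = "red"


def classify(value, yellow, red):
    """Classify one monitored index value against a sensor's thresholds."""
    if yellow >= red:
        raise ValueError("yellow threshold must be below red threshold")
    if value >= red:
        return WarningLevel.RED
    if value >= yellow:
        return WarningLevel.YELLOW
    return WarningLevel.NORMAL


def daily_levels(daily_values, yellow, red):
    """Return the warning level for each day's value, in order."""
    return [classify(v, yellow, red) for v in daily_values]


# Hypothetical per-sensor thresholds as (yellow, red).
thresholds = {"SSG-L1-1": (0.35, 0.55)}

yellow, red = thresholds["SSG-L1-1"]
levels = daily_levels([0.12, 0.40, 0.61], yellow, red)
```

Keeping the thresholds in a per-sensor mapping reflects the observation that sensors of the same type at different locations have different warning lines.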

Validation
To verify the effectiveness of this method for tunnel health monitoring systems, we selected crack gauge data for comparative analysis. Since crack gauges measure the width of cracks in the tunnel, they best represent whether fatigue damage exists in the tunnel structure. We therefore referred to the relevant Chinese standard, "Technical Specifications for Maintenance of Highway Tunnels" [52], and several literature studies [53,54]. These divide the lining surface-crack width into different warning levels, and the resulting warning thresholds for crack width are shown in Table 6. As shown in Figure 16a,b, there is a clear gap between the yellow warning line and the monitoring data. This means that none of the monitoring data from the SCG of the THM system has reached the warning value, indicating that the tunnel's structural damage has not reached the warning line. Therefore, the proposed early warning method is shown to be practical and feasible.

Discussion
The signals sampled by the sensors in the THM system are affected by various factors, such as environmental loads, material aging, and human traffic behaviors, so the sampled signals contain multiple components. In addition, the operation of the sensors, the stability of the transmission networks, and the continuity of power and energy supply also greatly affect the quality of the acquired signals. To study structural safety warnings, the structural response signals need to be extracted from these complex signals, as they are an important factor in evaluating the safe operation of tunnel structures. Thus, signal processing methods based on SGS and WTD have been proposed. However, the raw signals contain many interfering factors, such as missing values, outliers, trend items, signal spikes, and low-frequency noise. To address these factors, corresponding methods have been proposed, including filling missing values with the mean, handling outliers using the 3σ rule, removing trend items by polynomial fitting, smoothing out signal spikes using Savitzky-Golay smoothing, and reducing noise in the signals by wavelet transforms. The limitations and strengths of these methods are as follows.
Firstly, in filling missing values, mean filling is used directly to reduce computation and lower signal residuals, but supervised machine learning methods could be used in future studies to predict missing values. Secondly, in outlier handling, using only the 3σ rule may introduce some bias; in other words, the appropriate quantiles for time-series data with large fluctuations may not be the same. Next, in detrending, the parameters of the polynomial regression need to be further refined and placed on a more rigorous footing. Also, as an improved least squares algorithm, SGS requires more delicate parameter tuning. Finally, in wavelet denoising, we propose jointly adjusting the wavelet basis function and decomposition scale to optimize the effect, and using threshold shrinkage functions to constrain the decomposition coefficients. This method shows clear noise reduction effects and robustness. However, there is still room for optimization in parameter selection. Further studies may use particle swarm optimization (PSO) to search for the global optimum and achieve the best denoising outcome. Currently, the assignment of parameters is still ambiguous, which makes it an intriguing and promising research direction.
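The preprocessing steps discussed above can be sketched compactly. This is an illustrative sketch, not the study's code: missing samples are assumed to be marked as None or NaN, outliers are replaced by the series mean, and the smoother is a Savitzky-Golay filter fixed at a 7-point window with a quadratic/cubic fit, whereas the study tunes the window length and order. Polynomial detrending is omitted for brevity, and the function names are hypothetical.

```python
import math
from statistics import mean, pstdev

# Standard 7-point quadratic/cubic Savitzky-Golay weights (sum to 21).
SG7 = (-2.0, 3.0, 6.0, 7.0, 6.0, 3.0, -2.0)


def is_missing(v):
    """A sample counts as missing if it is None or NaN (assumed marking)."""
    return v is None or math.isnan(v)


def fill_missing_with_mean(values):
    """Replace missing samples with the mean of the observed samples."""
    m = mean(v for v in values if not is_missing(v))
    return [m if is_missing(v) else v for v in values]


def replace_outliers_3sigma(values):
    """Replace samples more than 3 standard deviations from the mean."""
    m, s = mean(values), pstdev(values)
    return [m if abs(v - m) > 3.0 * s else v for v in values]


def savgol_smooth_7(values):
    """Smooth interior samples; the first and last three are left as-is."""
    out = list(values)
    for i in range(3, len(values) - 3):
        window = values[i - 3:i + 4]
        out[i] = sum(c * v for c, v in zip(SG7, window)) / 21.0
    return out


def preprocess(values):
    """Fill gaps, replace 3-sigma outliers, then smooth."""
    return savgol_smooth_7(replace_outliers_3sigma(fill_missing_with_mean(values)))
```

On short series the 3σ rule may flag nothing at all, because a single extreme value inflates the standard deviation it is compared against; this is one source of the bias noted above.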
For the study of tunnel safety early warnings, a method describing the interrelationships between sensors is proposed. By calculating the non-uniform variation coefficients of monitoring sensors across various categories in the THM system, changes in tunnel structural damage or monitoring errors between sensors can be observed, and the tunnel early warning situation is analyzed on this basis. Crack gauges in the THM system are used to represent structural damage conditions and to verify the feasibility of the non-uniform variation coefficient method, mainly because the width of lining cracks is the most direct and obvious damage precursor. However, the results show that the crack widths do not reach the early warning threshold specified in the Chinese industry standard "Technical Specifications for Maintenance of Highway Tunnels". Therefore, further validation of this method will be conducted in future studies on structures with significant damage or those reaching the warning thresholds.

Conclusions
This study introduced the integration of SGS and wavelet transform for data processing and denoising. The coefficient of non-uniform variation (CNV) was then employed to determine the safety early warning thresholds. According to the results, the following conclusions can be drawn:
(1) The problems in the THM data for the Haozhuangliang tunnel, namely missing values, outliers, and trend terms, were addressed by filling with mean values, replacing with mean values, and polynomial fitting, respectively.
(2) Tunnels are subject to multifaceted coupling effects such as traffic behavior and the environment, and there is unavoidable signal loss during data transmission, which can introduce large amounts of noise into the data. While effectively preserving data features, the integrated method of SGS and WTD can eliminate the issues of data overlap and stacked sections. By comparison, SGS is found to be better than the equivalent moving average smoothing (MAS) and five-point cubic smoothing (FTS). The mean values of SNR and RMSE of SGS smoothing were superior to those of MAS and FTS by approximately 10% and 30%, respectively.
(3) Three single evaluation indexes were coupled using the coefficient of variation method to obtain the composite index Tj, which avoids overly extreme evaluation results. For instance, based on Tj, three layers are recommended for wavelet decomposition for all four sensors, although the recommendations from the individual indexes may differ. Moreover, the maximum and minimum values of the denoising effects after selection differed greatly, with the SNR differing by 14%, the RMSE by 8%, and r by up to 80%.
(4) A CNV method was proposed for safety early warning in the THM system, resulting in yellow and red early warning lines for the four sensors. The method enables daily monitoring of tunnel safety risks, and it was validated with data sampled by the surface crack gauges.
This paper has provided a detailed description of the processing of sampled THM data, using the health monitoring data from the Haozhuangliang tunnel as a case study. Missing values, outliers, trend terms, signal spikes, and signal noise were each processed in turn. This helps to fill the gap left by the lack of comprehensive, systematic methods for dealing with tunnel health monitoring data. However, the proposed method can still be improved: more advanced methods can be adopted to predict missing values, the SGS parameters can be optimized using neural networks or PSO algorithms, and richer data are still needed for early warning.
Compared with previous studies, this paper combines several modules, namely data preprocessing, SGS, WTD, and early warning, in the THM system. Among them, the proposed SGS and WTD show good adaptability and robustness on the THM data, especially in their parameter adjustment process. It is expected that more researchers will refer to the method proposed in this paper to solve practical engineering problems.

Figure 1. The flowchart of the organization for this study.

Figure 2. The location and composition of the Haozhuangliang tunnel. (a) Location and composition. (b) Haozhuangliang Tunnel. (c) Dimensions.

3.1. Data Processing
3.1.1. Missing Value Processing
Sensors may contain missing values due to various factors, such as power supply disruptions, maintenance checks, and other ambient factors. To facilitate further analysis,
Sensors 2023, 23, 7460

Figure 4. The missing data of SSG-L7-1 sensors. (a) the missing data, (b) the filling data.

term changes. SSG-L1-1 was installed at the arch foot on the left lining, while SSG-L1-3 was at the arch crown. The various installation positions are the main reason for the differing trend changes.

Figure 7. The SNR and RMSE line chart for SSG with three signal smoothing procedures. (a) the SNR of data smoothing, (b) the RMSE of data smoothing.
and Figure 9d-f show the SNR and RMSE values of SGS for different window lengths, fitting orders, and sample intervals, respectively, and Figure 10a-c and Figure 10d-f show the SNR and RMSE values of SGS for the combinations of window length and order, window length and sample interval, and order and sample interval, respectively. The blue dots in Figure 9 represent the SGS processed values for different parameters. The smoothing results clearly indicate that too large a number results in poorly spaced samples in this study. The insensitivity of the SGS effect to the sample interval is primarily attributed to the fact that the commonly utilized sample intervals do not differ considerably. Therefore, the sample interval was removed, giving the simplified Table 3.

Figure 8. The result of four types of signal smoothing methods for SSG-L1-1. (a) the raw data, (b) SGS data, (c) MAS data, (d) FTS data.

Figure 9. The relationship between mono-parameter and individual indicator. (a) SNR for different win_len, (b) SNR for different orders, (c) SNR for different delta; (d) RMSE for different win_len, (e) RMSE for different orders, (f) RMSE for different delta.

Figure 10. The relationship between multi-parameter and individual indicator. (a) SNR for different win_len and orders; (b) SNR for different win_len and delta; (c) SNR for different orders and delta; (d) RMSE for different win_len and orders; (e) RMSE for different win_len and delta; (f) RMSE for different orders and delta.

Figure 14. Early warning points for different SSG sensors.

Figure 16. The early warning lines of SCG: (a) right tunnel monitoring for SCG; (b) left tunnel monitoring for SCG.

Table 1. Health monitoring contents and sensors.

Table 2. Three types of index of signal smoothing.

Table 3. SGS effects with different parameter combinations.

Table 4. Tj for four sensors at different decomposition scales.

Table 5. Summary of data processing and denoising for four sensors.

Table 6. The warning threshold for crack width (mm).

Table A1. The missing value of SSG sampled signals.

Table A2. The outlier of SSG sampled signals.