Article

A Data-Driven Approach to W-Beam Barrier Monitoring Data Processing: A Case Study of Highway Congestion Mitigation Strategy

Weiguang Mu 1 and Chengzhu Gong 2,*
1 TECH Traffic Engineering Group Corporation, Beijing 100089, China
2 School of Economics and Management, China University of Geosciences, Wuhan 430074, China
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(5), 4078; https://doi.org/10.3390/su15054078
Submission received: 5 January 2023 / Revised: 8 February 2023 / Accepted: 11 February 2023 / Published: 23 February 2023

Abstract:
In this paper, a data-driven approach is used to process W-Beam barrier monitoring data, with the goal of achieving online estimation of the number of trucks and accurate identification of barrier impact events. Analysis of the data features revealed significant noise in the original data that obscured the useful information, so this paper proposes an improved wavelet thresholding algorithm for data denoising. As there is no prior study of this application, this paper compares three commonly used data fault diagnosis algorithms: Principal Component Analysis (PCA), Partial Least Squares (PLS) and Fisher Discriminant Analysis (FDA). Comparison experiments show that the PCA model is more suitable for estimating the number of trucks, while the FDA model is more suitable for identifying barrier impact events. The data processing results are shared with the highway operation management system as a trigger condition for a no-truck-overtaking strategy. Long-term application shows that, after adopting this paper's method, highway capacity improved by 12.7% and the congestion index and emissions were slightly reduced.

1. Introduction

Transportation carbon emissions are characterized by a significant share, fast growth rate and slow peak attainment [1]. Traffic congestion generates more carbon emissions [2].
When there is a high ratio of trucks in the traffic flow, it leads to a drop in highway capacity, which in turn results in traffic congestion. Additionally, traffic accidents, especially those with vehicles hitting the W-Beam barrier, can also result in traffic congestion. To reduce the congestion caused by these two situations, innovations in and new practices of accident detection technologies are encouraged [3]. We developed a W-Beam barrier vibration detector based on inclination sensors in 2019. During its use, we found that there was a large amount of natural noise in the sensor data, making it impossible to distinguish whether a section of anomalous data was due to noise or to the passage of a truck. In addition, the method of setting thresholds by experience led to a significant false alarm rate when identifying barrier impact events.
Therefore, several data-driven methods are used in this paper. First, noise reduction is applied to the sensor data, and the number of trucks over a given period is identified and counted. Online monitoring is then conducted to identify any incidents of barrier collision. The results are shared with the established highway operation management system.
When no accident occurs, the ratio of trucks is one source of traffic congestion. As the ratio of trucks increases, the moving-bottleneck effect caused by trucks increases sharply, with the traffic flow forming gathering and dissipating waves, resulting in a rapid decrease in the average speed of the traffic flow. This increases the average delay and slows the traffic stream, which can cause a traffic breakdown in severe cases [4]. At such moments, traffic management strategies need to be adjusted in real time; these include prohibiting trucks from overtaking, active speed limits, the temporary opening of shoulders, etc. [5]. The trigger conditions of existing highway management policies are set by managers based on subjective judgment. The method proposed in this paper can serve as one of the automatic trigger conditions for daily management policies.
The barrier vibration monitoring device has now been tested on a cumulative total of more than 100 km of highway in China. The device consists of a three-axis inclination sensor (X-axis facing the driving direction, Y-axis facing the vertical ground direction and Z-axis facing the outside of the road), an IoT module and a 5 V battery. The sensors are fixed in the steel posts of the W-Beam barrier, and two sets are deployed every 200 m (one set each in the inner and outer barriers, deployed symmetrically). A control machine is deployed every 1.5 km, with IoT communication between the control machine and each detector. We encourage operators to set up the control machine together with road cameras so that the power and fiber-optic communication systems can be shared conveniently. During installation, the starting position is set as the initial value, and the collected raw data are the offset values between each axis and this initial value.
Through field data collected in the non-accident state, we found that a slight change occurs on the Y-axis when a truck passes the W-Beam barrier vibration sensor, and that the frequency of the offset value on the Y-axis increases with the number of nearby trucks. Using this characteristic, we aim to estimate the approximate proportion of trucks passing each sensor from the barrier monitoring data.
However, barrier crash data are not easily available in practice, so the data are sourced from experiments with human-simulated crashes to verify the recognition accuracy. Barrier crash events will be reflected in all axes: serious crash events will lead to large offset values in the Y and Z axes and will not return to the initial value. Less serious events will not lead to extensive deformation of the barrier, but will still be reflected in the Y-axis at the time of crash.
In China, there are legal restrictions on active speed limits and the opening of shoulders, so not all highways can implement those strategies [6]. However, no-truck-overtaking management has no such restriction [7]. The device determines when to turn on the no-truck-overtaking policy downstream based on the percentage of large vehicles upstream. The judgment condition for enabling the overtaking ban is: upstream, five consecutive sensors (covering 1 km) each detect a truck passing at least every 10 s, and this situation lasts for more than 15 min, indicating that at least 100 trucks have passed through the 1 km section during this period.
In this case, we suggest that the highway operation management system post a message on the variable message sign (VMS) at the upstream interchange of the section to alert truck drivers that trucks are forbidden to overtake on the road ahead. The VMS message is automatically cancelled when the barrier detection data register fewer than 100 trucks in a 15 min period. The field data come from a highway in Guangdong, China, with four lanes in both directions and a speed limit of 120 km/h. The ratio of trucks on this section stays at 10–20% all year round. The maximum measured capacity is 1800 pcu/h/ln, far from the design capacity of 2200 pcu/h/ln. The pollution caused by congestion is calculated using the indicators of Wang [2]. By comparing the data for one month with and one month without the device, it was found that highway capacity increased by around 12.7% and the environmental pollution index due to congestion was significantly reduced. There were no barrier impact accidents during the experiment period.
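To illustrate, the trigger and cancellation logic described in the two preceding paragraphs can be sketched as follows. This is a minimal sketch under stated assumptions: each sensor's output is reduced to truck-passage timestamps, and the names SensorWindow and should_ban_overtaking are illustrative, not from the deployed system.

```python
# Illustrative sketch of the overtaking-ban trigger described above.
# Assumption: each upstream sensor reports truck-passage timestamps (seconds).
class SensorWindow:
    """Tracks whether one sensor has seen a truck at least every 10 s."""

    def __init__(self, max_gap_s=10.0):
        self.max_gap_s = max_gap_s
        self.last_truck_ts = None
        self.condition_since = None  # start of the current "truck every 10 s" run

    def on_truck(self, ts):
        if self.last_truck_ts is not None and ts - self.last_truck_ts <= self.max_gap_s:
            if self.condition_since is None:
                self.condition_since = self.last_truck_ts
        else:
            self.condition_since = None  # gap too long: restart the run
        self.last_truck_ts = ts

    def sustained_for(self, now, duration_s):
        return self.condition_since is not None and now - self.condition_since >= duration_s

def should_ban_overtaking(sensors, now, duration_s=15 * 60):
    # All five consecutive sensors (covering 1 km) must have sustained the
    # condition for more than 15 min before the VMS message is posted; the
    # message is cancelled again once the condition no longer holds.
    return all(s.sustained_for(now, duration_s) for s in sensors)
```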
Therefore, the aim of this paper is to improve congestion management for daily highway operations and barrier collisions. The approach is to decide whether to activate the no-truck-overtaking management strategy by estimating the proportion of large vehicles upstream, and to accurately identify barrier collisions so as to support rapid highway decongestion. In terms of method, this paper uses Chang's wavelet multiscale algorithm to achieve noise filtering after analyzing the original data [8]. By comparing Gertler's PCA [9], Lee's PLS [10] and Gharavian's FDA [11] algorithms, we found that FDA is more suitable for monitoring collision events, while PCA is more suitable for estimating the approximate proportion of trucks. Therefore, we use two sets of algorithms to estimate the approximate proportion of large trucks and to identify crash events, respectively.
This paper is an attempt to support the goal of carbon neutrality in transportation in China. In addition to significantly reducing the pollution caused by congestion, the proposed approach can serve as an alternative to solutions based on video and radar technology, reducing the cost of highway electromechanical equipment and contributing to the sustainable development of China's highway network.

2. Literature Review

When observing the data from sensors, we found that the useful information is often covered by natural noise. As such, we needed to select a suitable method to filter out the noise and then extract the vibration information and collision information generated by the truck passing by. According to the characteristics of the sensor data, the process of barrier monitoring is regarded as a kind of “process monitoring”, and the data generated by vibration or collision is regarded as a kind of “data fault”. As such, the data-driven method was used to quickly extract useful data from the huge amounts of monitoring data.

2.1. Denoising Algorithm

Mallat [12] found, through the diagnosis of fault data, that noise information and fault information occur in different frequency bands of the signal, and that so-called sensor noise often appears in the high-frequency band. As such, the original data can be decomposed by the wavelet algorithm and reconstructed after eliminating the high-frequency signal, thereby denoising the original data. Donoho [13] proposed soft-thresholding of the wavelet detail coefficients to suppress the noise. Chang [8] proposed a new method to remove wavelet-domain noise with unknown variance in microarrays: the noisy microarray is decomposed into wavelet sub-bands, smoothed within each highest sub-band, and then reconstructed with the corrected wavelet coefficients.

2.2. Data-Driven Fault Diagnosis Algorithms

Data-driven fault diagnosis methods can be classified into multivariate statistical methods, machine learning methods, signal processing methods and rough set methods. The multivariate statistical methods primarily include Principal Component Analysis (PCA), Partial Least Squares (PLS) and Fisher Discriminant Analysis (FDA) [14].
The classical PCA technique for data fault diagnosis is the contribution graph method, which uses data containing various types of fault information as training data and derives the vectors indicating the directions of largest variation in the data [15]. It is because of these properties that PCA models are used in many applications for process data monitoring in multi-sensor environments.
Wang et al. [16] used the PCA model and a sensor validation model to generate reconstructed sensor values, which demonstrated enhanced monitoring performance.
Yin [17] studied a data processing and fault diagnosis model based on PCA theory. Test data were input into the model, the presence of a failure was determined by comparing thresholds, and which sensor failed and what kind of fault occurred were identified.
E.P. Tao’s study results show that the PCA model can be used to detect faults. The loadings of principal components can well represent the contributions of variables to the principal components, and the scores of principal components give a clear indication of the faulty samples [18].
However, the PCA model has a weakness: the training data X can only reflect the few fault directions with the greatest variation, and if the gap between these directions is small, it will not serve the purpose of fault classification [19]. The PLS model can handle this situation, as it requires Y to contain fault information in addition to the training data X. The fault directions can then be determined by maximizing the covariance between X and Y, which is more conducive to fault classification. The PCA and PLS models can use the same methods for fault detection, identification and classification, such as the T² statistic, the Q statistic and the contribution map method. Therefore, depending on the application environment, which is more suitable can be determined using the same comparison methods.
The FDA model, in common with PCA and PLS, constructs a reduced-dimensional space using training data, then projects online data into that space and uses the projected feature parameters for fault diagnosis [20]. However, the FDA model differs in that it requires not only normal training data, but also data with labeled fault conditions. Z. Ge [21] used an FDA-based model in the Tennessee Eastman chemical process; the model performed online fault diagnosis tasks well with 50 sensors generating more than 250 samples per minute. Sugiyama M [22] demonstrated that the FDA model maintains quantitative performance even when the labeled fault sample data are relatively small, a finding in general agreement with Chapelle et al.

3. Methods

In this paper, we first reduce the noise at multiple scales using wavelet thresholding. Second, we compare the principles of the candidate algorithms, select a suitable algorithm for estimating the proportion of trucks, and compare the experimental results under non-accident conditions with video to confirm the algorithm's accuracy. Using the simulated impact data, the accuracy of each fault diagnosis algorithm is compared, and the most suitable one is merged with the wavelet thresholding algorithm to form the core algorithm for W-Beam barrier collision identification.

3.1. Sensor Data Analysis

We placed 20 monitoring points at 200 m intervals on a freeway in Guangdong Province, China, and collected one set of data per second. Data were collected for about 1 h during the flat-peak period. There were no crashes during the data collection, and the ratio of trucks exceeded 15% for about 15 min. In Figure 1, data from three monitoring points were selected to compare about 1350 sets of data during the peak hours (a) as well as the flat-peak time period (b).
Since there were no crashes, the data contain a large amount of noise and vibration data. This noise can come from disturbances in nature, geomagnetic interference and inconsistencies in the operating properties of the front-end sensor software and hardware. As the amount of data increases, significant noise obscures useful information at multiple scales. For example, sensor drift tends to show a slowly varying, constant-deviation pattern, while the vibration data generated when a truck passes by show a high-frequency, abruptly varying pattern. Such data are often masked by noise.

3.2. Improved Wavelet Threshold Noise Removal Algorithm

The improved wavelet threshold noise removal method proposed in this paper builds on previous research. Since those results were designed for other kinds of detectors, the threshold value needs to be adjusted downward and the sensitivity enhanced when applied to barrier monitoring devices. In this paper, three aspects of the method in the literature [13], namely the noise standard deviation estimation, the threshold setting and the wavelet coefficient adjustment function, are modified to make it more suitable for barrier monitoring data.

3.2.1. Noise Standard Deviation Estimation Formula

The noise standard deviation measures the intensity of the noise to be separated from the signal. After decomposing the signal to scale J, there is a high-frequency signal and a low-frequency signal at each scale. Denoting the high-frequency coefficients at scale j by $Wa_{j,k}$, the noise standard deviation estimation formula is defined as Equation (1):

$$\sigma_j = \frac{1}{0.6745} \times \frac{1}{N} \sum_{k=1}^{N} \left| Wa_{j,k} \right|, \quad 1 \le j \le J \tag{1}$$
where j is the scale, N is the total number of wavelet coefficients at scale j, and k is the number of current wavelet coefficients.

3.2.2. Threshold Setting Functions

Depending on the actual signal-to-noise ratio of the original signal, the threshold setting needs to change with the situation. The universal threshold equation given in the literature [13] is:

$$\lambda_{1,j} = \sigma_j \sqrt{2 \log N} \tag{2}$$

After decomposing the signal to scale J, J sets of high-frequency coefficients are obtained. Each set of wavelet coefficients is arranged from smallest to largest absolute value to obtain a vector $P = \{ |Wa_{j,n}| \}$, $1 \le n \le N$. This vector is used to calculate the evaluation vector for the jth layer of wavelet coefficients, $R = \{ r_n \}$, $1 \le n \le N$, where:

$$r_n = \sum_{k=1}^{n} Wa_{j,k}^2 + (N - n)\, Wa_{j,n}^2 + (N - 2n)\, \sigma_j^2 \tag{3}$$

The entries of the evaluation vector are then sorted, the smallest value is taken as the approximation error, and the corresponding wavelet coefficient $Wa_{j,m}$ is used to set the threshold at the jth layer of the wavelet decomposition: $\lambda_{a,j} = |Wa_{j,m}|$. The threshold selection function at the jth layer is:

$$\lambda_j = \begin{cases} \lambda_{1,j}, & P_{a,j} - \sigma_j^2 < \rho_{N,j} \\ \min(\lambda_{1,j}, \lambda_{a,j}), & P_{a,j} - \sigma_j^2 \ge \rho_{N,j} \end{cases} \tag{4}$$

where $P_{a,j}$ is the average of the absolute values of the wavelet coefficients and $\rho_{N,j}$ is the minimal energy level of this wavelet coefficient vector:

$$P_{a,j} = \frac{1}{N} \sum_{k=1}^{N} \left| Wa_{j,k} \right| \tag{5}$$

3.2.3. Calculation of Wavelet Coefficient Estimates

Wavelet coefficients whose magnitude is suspected to be caused by noise need to be shrunk toward the signal. The original wavelet coefficient values are replaced by estimated values through the following calculation, and the wavelets are then reconstructed to achieve denoising. A coefficient $\Gamma_{\sigma_j}$, which reflects the noise intensity of the high-frequency signal of the jth layer wavelet, is introduced and calculated as $\Gamma_{\sigma_j} = \sigma_j / A_j$, where $A_j$ denotes the coefficient amplitude of the high-frequency part of the jth layer wavelet. The estimated wavelet coefficients are given in Equation (6):

$$\hat{w}_{j,k} = \begin{cases} w_{j,k} - \Gamma_{\sigma_j} \lambda_j, & w_{j,k} > \lambda_j \\ w_{j,k} + \Gamma_{\sigma_j} \lambda_j, & w_{j,k} < -\lambda_j \\ 0, & -\lambda_j \le w_{j,k} \le \lambda_j \end{cases} \tag{6}$$
The detailed steps of the improved wavelet threshold denoising algorithm are described in the literature [13].
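To make the three modifications concrete, the following Python sketch strings Equations (1) through (6) together, assuming the PyWavelets library for decomposition and reconstruction. The minimal energy level $\rho_{N,j}$ and the amplitude $A_j$ in $\Gamma_{\sigma_j} = \sigma_j / A_j$ are implemented with assumed conventions (a heuristic-SURE-style bound and the maximum coefficient magnitude, respectively), so this is a sketch of the algorithm's structure rather than the paper's exact parameterization.

```python
import numpy as np
import pywt  # PyWavelets

def improved_wavelet_denoise(signal, wavelet="db4", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    out = [coeffs[0]]                                 # keep approximation coefficients
    for w in coeffs[1:]:                              # high-frequency bands, scales j = 1..J
        N = len(w)
        sigma = np.mean(np.abs(w)) / 0.6745           # Eq. (1): noise std estimate
        lam_1 = sigma * np.sqrt(2 * np.log(N))        # Eq. (2): universal threshold
        ws = np.sort(np.abs(w))                       # |Wa_{j,n}| sorted ascending
        n = np.arange(1, N + 1)
        risk = np.cumsum(ws**2) + (N - n) * ws**2 + (N - 2 * n) * sigma**2  # Eq. (3)
        lam_a = ws[np.argmin(risk)]                   # adaptive threshold lambda_{a,j}
        P_a = np.mean(np.abs(w))                      # Eq. (5)
        rho = sigma**2 * (np.log2(N) ** 1.5) / np.sqrt(N)   # assumed form of rho_{N,j}
        lam = lam_1 if P_a - sigma**2 < rho else min(lam_1, lam_a)  # Eq. (4)
        gamma = sigma / (np.max(np.abs(w)) + 1e-12)   # Gamma_{sigma_j}; A_j assumed = max amplitude
        out.append(np.where(w > lam, w - gamma * lam,           # Eq. (6): scaled soft threshold
                   np.where(w < -lam, w + gamma * lam, 0.0)))
    return pywt.waverec(out, wavelet)[: len(signal)]
```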

3.3. PCA-Based Fault Diagnosis Model

The PCA model is a typical data dimensionality reduction method that linearly combines the relevant multidimensional variables to extract their characteristic parameters, which are then used to determine fault information. The PCA-based fault diagnosis model works by building a statistical model from a large amount of normal offline data, dividing the measurement space into two orthogonal and complementary subspaces: the principal component subspace and the residual subspace. During online monitoring, any sample $x(t)$ can be decomposed and projected onto both spaces, i.e., $x = \hat{x} + \tilde{x}$, with $\hat{x} = P P^T x \in R_p$ and $\tilde{x} = (I - P P^T) x \in R_r$, where $\hat{x}$ is the modeled part and $\tilde{x}$ is the unmodeled part [23].
The squared prediction error (SPE) and Hotelling's $T^2$ are usually used to detect whether a process is abnormal when normal data must be distinguished from faulty data [24]. The SPE metric measures the change in the projection of the sample vector in the residual space: $SPE = \| (I - P P^T) x \|^2 \le \delta_\alpha^2$. When the statistic of faulty data exceeds the statistical control limit, the change in correlation between the data is reflected in the change in the SPE value. As shown in Figure 2, $\delta$ is the radius of the control domain, and the black dots indicate the SPE values of the data points in the residual space; if a data point falls outside the domain, a fault is considered to have occurred [25].
The $T^2$ statistic measures the variation of variables in the principal component space: $T^2 = x^T P \Lambda^{-1} P^T x \le \tau_\alpha^2$. A sample is considered faulty when its projection in the principal component space lies at a distance exceeding the radius of the control domain.
Since SPE and T 2 indicators have different monitoring priorities, a phenomenon will often occur: the T 2 indicator for a sample is outside its control domain while the SPE indicator for that sample is within the normal range. This indicates that the data for that sample may be a data fault, or there may be a change in the measurement range that requires other means to aid in the decision making of fault information.
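As a numerical illustration, the following sketch fits a PCA monitor on normal training data and computes the two statistics for a new sample. It is a minimal sketch, assuming samples in the rows of a NumPy array; for simplicity the control limits are taken as empirical quantiles of the training statistics rather than the chi-square/F approximations used in the literature, and the function names are illustrative.

```python
import numpy as np

# Minimal PCA monitoring sketch: fit on normal data, flag new samples whose
# T^2 or SPE statistic exceeds an empirical training-set control limit.
def fit_pca_monitor(X, n_pc=2, alpha=0.99):
    mu = X.mean(axis=0)
    Xc = X - mu
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pc].T                          # loading matrix (principal subspace)
    lam = (s[:n_pc] ** 2) / (len(X) - 1)     # principal component variances (Lambda)

    def stats(x):
        xc = x - mu
        t = P.T @ xc                         # projection onto the principal subspace
        t2 = t @ (t / lam)                   # Hotelling's T^2
        spe = np.sum((xc - P @ t) ** 2)      # SPE = ||(I - P P^T) x||^2
        return t2, spe

    train = np.array([stats(x) for x in X])
    t2_lim, spe_lim = np.quantile(train, alpha, axis=0)
    return stats, t2_lim, spe_lim

# Usage: flag a new sample as a data fault when either statistic exceeds its limit.
# stats, t2_lim, spe_lim = fit_pca_monitor(X_train)
# t2, spe = stats(x_new); fault = (t2 > t2_lim) or (spe > spe_lim)
```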

3.4. PLS-Based Fault Diagnosis Model

In the practical application of a barrier monitoring system, only one or two kinds of sensor data need to be focused on to judge a collision event; the other sensors are only auxiliary. The PLS model can achieve this. The PLS model builds on the PCA model: from the variables $x_t$, the variables $R_t$ of interest are selected, and the space X is PLS-decomposed into two subspaces, $S_p$ and $S_r$, in which fault detection is carried out. If a fault occurs in the variable space Y, i.e., the fault affects the quality change, the fault must appear in the subspace $S_p$; if the fault does not affect the quality change, it must appear in the subspace $S_r$. Usually, the $T^2$ indicator is used to detect faults in $S_p$, and the Q indicator is used to detect faults in $S_r$ [26].
The calculation [27] decomposes the variable $x = \hat{x} + \tilde{x}$ into the two subspaces $\hat{x} = P R^T x \in S_p = \mathrm{Span}\{P\}$ and $\tilde{x} = (I - P R^T) x \in S_r$. Unlike in the PCA model, $\hat{x}$ and $\tilde{x}$ are not orthogonal: $\hat{x}$ is the oblique projection of x along the subspace $\mathrm{Span}\{R\}$ onto $\mathrm{Span}\{P\}$, while $\tilde{x}$ is the oblique projection of x along the subspace $\mathrm{Span}\{P\}$ onto $\mathrm{Span}\{R\}$. During online monitoring, the principal-space scores and residuals of a newly obtained sample are decomposed as $t_{new} = R^T x_{new}$ and $\tilde{x}_{new} = (I - P R^T) x_{new}$, and the control limit indices are calculated as $T^2 = t_{new}^T \Lambda^{-1} t_{new}$ and $Q = \| \tilde{x}_{new} \|^2$.
In the above calculation, a phenomenon appears: $T^2$ and Q are correlated, so a given fault can appear in both subspaces at the same time. This is similar to the use of the SPE and $T^2$ indicators in the PCA model, and this type of phenomenon likewise requires other means to assist the decision making for fault information. Additionally, fault diagnosis using the PLS model can be more accurate and efficient than using the PCA model only when there is a good understanding of the monitored system.
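The same statistics can be sketched with an off-the-shelf PLS implementation. The sketch below assumes scikit-learn's PLSRegression and a quality variable y (for instance, the axis channel used to judge impacts); it illustrates the $T^2$ and Q computation rather than the paper's exact implementation, and the control limits are again taken as empirical quantiles.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Minimal PLS monitoring sketch. Assumptions: X holds sensor samples in rows,
# y is the quality variable of interest; limits are empirical quantiles.
def fit_pls_monitor(X, y, n_comp=2, alpha=0.99):
    mu, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    Xs = (X - mu) / sd                                 # standardize once, up front
    pls = PLSRegression(n_components=n_comp, scale=False).fit(Xs, y)
    lam = np.var(pls.transform(Xs), axis=0, ddof=1)    # score variances (diagonal of Lambda)

    def stats(x):
        xs = (x - mu) / sd
        t = pls.transform(xs.reshape(1, -1))[0]        # scores t_new = R^T x_new
        t2 = t @ (t / lam)                             # T^2 statistic in S_p
        q = np.sum((xs - t @ pls.x_loadings_.T) ** 2)  # Q = ||(I - P R^T) x||^2 in S_r
        return t2, q

    train = np.array([stats(x) for x in X])
    t2_lim, q_lim = np.quantile(train, alpha, axis=0)
    return stats, t2_lim, q_lim
```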

3.5. FDA-Based Fault Diagnosis Model

The FDA model, like the PCA and PLS models, is a method that also uses training data to construct a reduced dimensional space. It then projects online data into that space and uses the projected feature parameters for fault diagnosis of the data. However, this model differs in that it requires both the training of normal data and data under labeled fault conditions [20]. If the data from barrier impacts are considered as a type of data fault, various types of impact events can be artificially created and the FDA model can be trained with both the collected data and the normal data.
Principle of the FDA-based fault diagnosis model [28]: the space X consisting of the variables $x_t$ mentioned previously is the normal data. The created crash event monitoring data are divided by class into a fault event data space G constructed from $g_t$. G is incorporated into X, $X = [x_1\ x_2 \cdots x_n]^T \in R^{n \times m}$, to train the FDA model. The total scatter of the data is defined as:

$$S_t = \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T$$

where $\bar{x}$ is the mean vector of the n samples. Define $X_j$ as the set of sample vectors belonging to the jth class. The within-class scatter $S_j$ of the jth class and the total within-class scatter $S_w$ are given by Equation (7):

$$S_j = \sum_{x_i \in X_j} (x_i - \bar{x}_j)(x_i - \bar{x}_j)^T, \qquad S_w = \sum_{j=1}^{p} S_j \tag{7}$$

In Equation (7), $\bar{x}_j$ is the mean vector of class j. The between-class scatter is defined as $S_b = \sum_{j=1}^{p} n_j (\bar{x}_j - \bar{x})(\bar{x}_j - \bar{x})^T$, and the total scatter equals the sum of the within-class and between-class scatter: $S_t = S_w + S_b$. The FDA model minimizes the within-class scatter while maximizing the between-class scatter, i.e., the optimization objective function is Equation (8):

$$\max_{v \ne 0} \frac{v^T S_b v}{v^T S_w v} \tag{8}$$

In Equation (8), assuming $S_w$ is invertible, the FDA vectors are obtained from the generalized eigenvalue problem $S_b w_i = \lambda_i S_w w_i$. Since the rank of $S_b$ is less than p, there are at most p−1 nonzero eigenvalues, and the computed FDA vectors, arranged by column, form the projection matrix $W_p \in R^{m \times (p-1)}$. Thus, a sample $x_i$ can be projected into the (p−1)-dimensional FDA space to obtain $z_i = W_p^T x_i$, thereby achieving optimal separation of the data. During online monitoring, an online sample x is projected into the low-dimensional space spanned by the column vectors of $W_p$ to obtain its FDA score $z = W_p^T x$. This is combined with a metric such as the Mahalanobis distance of the literature [29] to find the fault source to which x belongs and achieve fault diagnosis.
Since the FDA model takes the data under fault conditions into consideration during the training process, its fault diagnosis accuracy is theoretically better than both the PCA and PLS models. Additionally, it arranges the modeled data while satisfying the criteria of minimizing intra-class dispersion and maximizing inter-class dispersion to complete fault diagnosis, avoiding the uncertainty between PCA and PLS fault diagnosis indexes.
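The training and diagnosis steps can be sketched compactly as follows, assuming the labeled classes (normal data first, then each simulated impact class) are given as NumPy arrays with samples in rows. For brevity the sketch assigns an online sample to the nearest class center in the FDA space by Euclidean distance, whereas the paper combines the scores with a Mahalanobis-type distance [29]; the small ridge term is an assumption added to keep $S_w$ invertible.

```python
import numpy as np
from scipy.linalg import eigh

# Minimal FDA sketch: `classes` is a list of (n_j, m) arrays, one per class
# (class 0 = normal data, classes 1.. = labeled fault/impact data).
def fit_fda(classes):
    X = np.vstack(classes)
    xbar = X.mean(axis=0)
    m, p = X.shape[1], len(classes)
    Sw = np.zeros((m, m))                        # total within-class scatter, Eq. (7)
    Sb = np.zeros((m, m))                        # between-class scatter
    for Xj in classes:
        xj = Xj.mean(axis=0)
        D = Xj - xj
        Sw += D.T @ D
        d = (xj - xbar).reshape(-1, 1)
        Sb += len(Xj) * (d @ d.T)
    # Generalized eigenproblem S_b w = lambda S_w w from Eq. (8); keep p-1 vectors.
    evals, evecs = eigh(Sb, Sw + 1e-8 * np.eye(m))   # ridge: assumed regularization
    Wp = evecs[:, np.argsort(evals)[::-1][: p - 1]]  # projection matrix W_p
    centers = [Wp.T @ Xj.mean(axis=0) for Xj in classes]

    def diagnose(x):
        z = Wp.T @ x                             # FDA score z = W_p^T x
        return int(np.argmin([np.linalg.norm(z - c) for c in centers]))

    return diagnose  # returns the index of the nearest class (0 = normal)
```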

4. Experimental Design

In order to verify the theoretical analysis and compare the accuracy of the various fault diagnosis models, simulation experiments were designed for the PCA-, PLS- and FDA-based data fault diagnosis models in MATLAB. The experimental procedure was designed based on the principles of the three algorithms. As shown in Figure 3, each uses its own statistical model to create its respective residual and principal component spaces from the training data, then projects the online data into the two spaces and uses the projection results for fault diagnosis. The purpose of the experiment is to select which algorithm is more suitable for identifying the approximate number of trucks and which is more suitable for identifying that a crash event has occurred.

4.1. Training Data

Two sets of training data were prepared for the experiment. The first set used the flat-peak data described in the previous section. The second set contained impact barrier event data obtained through simulation. The simulation method was as follows: 10 adjacent monitoring points were selected and numbered S1 to S10, three sections of barrier between S2 and S5 were chosen, and the experimenter randomly hit the barrier using around 10 kg of steel bars. The PCA- and PLS-based data fault diagnosis models were trained with the first set of training data; the FDA-based model was trained using both sets.

4.2. Online Data

At the barriers between monitoring points S1 and S2, random positions were selected and the barriers impacted using the same method. The complete measurement data were recorded and used as the input for the PCA, PLS and FDA fault diagnosis models after wavelet threshold noise removal.

4.3. Error Indicators

Define the false alarm rate $\mu_f$, the missed alarm rate $\mu_g$ and the fault algorithm accuracy $\Phi$, where the false alarm rate describes the case in which a data point is not a fault but is wrongly reported as one, and the missed alarm rate describes the case in which a data point should have been reported as a fault but was not detected. The calculations are Equations (9)–(11):

$$\mu_f = \frac{n}{N} \tag{9}$$

$$\mu_g = \frac{m}{N} \tag{10}$$

$$\Phi = \left( 1 - \sqrt{\frac{\mu_f^2 + \mu_g^2}{2}} \right) \times 100\% \tag{11}$$
where n is the number of false data points, m is the number of missed data points and N is the total number of data points.
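Equations (9) to (11) reduce to a few lines of code; the sketch below assumes per-data-point boolean arrays of ground truth and model flags (the function name is illustrative).

```python
import numpy as np

# Sketch of Equations (9)-(11). Assumption: `truth` and `flagged` are boolean
# arrays over all N data points (True = fault).
def error_indicators(truth, flagged):
    truth, flagged = np.asarray(truth), np.asarray(flagged)
    N = len(truth)
    n = np.sum(~truth & flagged)          # false alarms: flagged but not faults
    m = np.sum(truth & ~flagged)          # missed alarms: faults not flagged
    mu_f, mu_g = n / N, m / N             # Eqs. (9) and (10)
    phi = (1 - np.sqrt((mu_f**2 + mu_g**2) / 2)) * 100  # Eq. (11), accuracy in %
    return mu_f, mu_g, phi
```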

5. Results and Discussion

5.1. PCA Fault Diagnosis Model Results

The PCA-based fault diagnosis model requires the SPE and $T^2$ statistics to achieve fault diagnosis. The results of each control limit indicator are shown in Figure 4.
In order to determine the location of the sensor where the fault occurred, the contribution map method was used and the results are shown in Figure 5.
Due to the large amount of interference from vibration information, the PCA model cannot diagnose crash events but is more sensitive to vibration information. The number of data faults counted was 90, which means the system believes that 90 trucks passed through the experimental section.

5.2. PLS Fault Diagnosis Model Results

The PLS-based fault diagnosis model requires the SPE and Q statistics to achieve fault diagnosis. The results of each control limit indicator are shown in Figure 6. To determine the location of the sensor where the data fault occurred, the contribution map method was used; the results are shown in Figure 7.
Compared with the PCA model, the PLS model also has a high false alarm rate for collision event recognition, and it suppresses vibration information, leading to a larger error in estimating the proportion of trucks. The PLS model indicates that 35 trucks passed during the experiment, which differs significantly from the ground truth.

5.3. FDA Fault Diagnosis Model Results

Figure 8 shows the results of the FDA-based fault diagnosis. Since noise and fault data were included together with normal data during training, the faults were accurately located at S1, S2 and S3 when they occurred, as shown on the right side of Figure 8. The fault location may be closer to S1, which is effectively consistent with the simulated online data.

5.4. Error Comparison Results

Comparison Experiment 1: the algorithm suitable for truck proportion estimation was selected from the PCA, PLS and FDA models by comparing the number of detected fault data points with a manual traffic count (466 pcu of traffic during the experiment, including 75 large vehicles, a large-vehicle proportion of about 15%), using the accuracy rate as the criterion. The results are shown in Table 1.
Comparison Experiment 2: the algorithm suitable for identifying barrier impacts was selected from the PCA, PLS and FDA models by comparing the diagnosis results against the simulated impact events. The results are shown in Table 2.
As can be seen from the results: the PCA model is more suitable for truck proportion estimation, while the FDA model is more suitable for barrier impact event monitoring.

5.5. Application Effect Comparison

In 2021, we installed barrier monitoring devices on an approximately 3 km long section of highway. The section runs from a system interchange (the starting point) to a service interchange (the ending point). The service interchange has a toll gate, with two ETC gantries in the upstream section and two in the downstream section, which can be used to count traffic volume. There is a VMS upstream of the system interchange that is used to post information. The experiment first extracted historical data from December 2020 to January 2021, as shown in Table 3. The Average Daily Congestion Index for this month was calculated, and Wang's method [2] was used to derive the CO2, CO, HC and NOX emissions per hour from the congestion index.
In December 2021, we loaded the algorithm studied in this paper into the outfield computer of the barrier monitoring devices. When the devices within 1 km detect 100 trucks passing within a 15 min period, the message "No overtaking trucks on the road ahead" is posted upstream of the system interchange. When the situation does not recur within 30 min, the message is automatically cancelled. This control policy operates between 6:00 and 21:00: because traffic flow is low at night, trucks overtaking then does not reduce highway capacity.
Based on the statistical results of ETC gantry data, the average daily traffic volume, truck volume, average daily congestion index and emissions for the same months in both years were compared. The results are shown in Table 3.
Comparing the traffic volumes, congestion indices and emissions for the same period in the two years, the reduction in emissions was found to be insignificant, but the average daily traffic volume improved by 12.7% despite a significant increase in the proportion of trucks, and highway congestion was reduced. This may be due to the strong correlation between Wang's emission calculation method [2] and the congestion index.

6. Conclusions

The purpose of the W-Beam barrier monitoring device is the real-time estimation of the truck ratio and the real-time, accurate identification of traffic events in which vehicles hit the barrier. The monitoring results are shared with the highway operation management system and used to implement strategies restricting trucks from overtaking, which can improve highway capacity, relieve congestion and detect traffic accidents in real time. However, when the sensor sensitivity is too low, many events go undetected even though the raw data contain little noise; when the sensitivity is too high, a large amount of noise in the raw data obscures the useful information. Therefore, a data-driven approach is needed to process the data when the sensor sensitivity is turned up.
We found the following. (1) The wavelet thresholding algorithm can effectively remove noise from the original data, but the determination of the threshold value needs further optimization; this paper found that, with an adjusted threshold, the denoised data can support subsequent data fault diagnosis. (2) The PCA, PLS and FDA models have been used in many process data monitoring applications, but there has been no research on barrier monitoring. In this paper, these three representative models were selected and tested according to the characteristics of the monitoring data. The experimental results show that the PCA model is more suitable for estimating the truck proportion and the FDA model is more suitable for accurately identifying collision events. This does not mean that no other suitable algorithms exist, and we will extend this research in the future. (3) Regarding long-term application, adopting the barrier monitoring device proposed in this paper resulted in a 12.7% increase in highway capacity and slight reductions in the congestion index and emissions, despite an increase in the proportion of trucks. The data-driven processing method for barrier monitoring devices proposed in this paper provides an implementable solution for the sustainable development of highways.

Author Contributions

Conceptualization, W.M. and C.G.; methodology, W.M.; software, W.M.; validation, W.M.; formal analysis, W.M.; investigation, W.M.; resources, W.M.; data curation, W.M.; writing—original draft preparation, W.M.; writing—review and editing, C.G.; visualization, W.M. and C.G.; supervision, C.G.; project administration, W.M. and C.G.; funding acquisition, C.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Bai, C.; Zhou, L.; Xia, M.; Feng, C. Analysis of the spatial association network structure of China's transportation carbon emissions and its driving factors. J. Environ. Manag. 2020, 253, 109765.
2. Wang, J.; Chi, L.; Hu, X.; Zhou, H. Urban Traffic Congestion Pricing Model with the Consideration of Carbon Emissions Cost. Sustainability 2014, 6, 676–691.
3. Xu, B.; Liu, X.; Yang, Y.; Li, J.; Postolache, O. Optimization for a Multi-Constraint Truck Appointment System Considering Morning and Evening Peak Congestion. Sustainability 2021, 13, 1181.
4. Wei, X.; Xu, C.; Wang, W.; Yang, M.; Ren, X. Evaluation of average travel delay caused by moving bottlenecks on highways. PLoS ONE 2017, 12, e0183442.
5. Song, G.; Lei, Y.; Wang, Z. Aggregate Fuel Consumption Model of Light-Duty Vehicles for Evaluating Effectiveness of Traffic Management Strategies on Fuels. J. Transp. Eng. 2009, 135, 611–618.
6. Ying, L.; Chow, A.; Cassel, D.L. Optimal Control of Motorways by Ramp Metering, Variable Speed Limits, and Hard-Shoulder Running. J. Transp. Res. Board 2014, 2470, 122–130.
7. Zhong, L.; Zhou, Y.; Wu, K. Truck Management Strategies on Freeways during Holidays. In Proceedings of the Twelfth COTA International Conference of Transportation Professionals, Beijing, China, 3–6 August 2012.
8. Chang, S.G.; Yu, B. Adaptive wavelet thresholding for image denoising and compression. IEEE Trans. Image Process. 2000, 9, 1532.
9. Gertler, J.; Jin, C. PCA-based fault diagnosis in the presence of control and dynamics. AIChE J. 2010, 50, 388–402.
10. Lee, G.; Song, S.O.; Yoon, E.S. Multiple-Fault Diagnosis Based on System Decomposition and Dynamic PLS. Ind. Eng. Chem. Res. 2003, 42, 6145–6154.
11. Gharavian, M.H.; Ganj, F.A.; Ohadi, A.R.; Bafroui, H.H. Comparison of FDA-based and PCA-based features in fault diagnosis of automobile gearboxes. Neurocomputing 2013, 121, 150–159.
12. Mallat, S.G. A Wavelet Tour of Signal Processing; Academic Press: Waltham, MA, USA, 1999; pp. 40–42.
13. Donoho, D.L. De-noising by soft-thresholding. IEEE Trans. Inf. Theory 2002, 41, 613–627.
14. Venkatasubramanian, V.; Rengaswamy, R.; Yin, K.W.; Kavuri, S.N. A review of process fault detection and diagnosis Part I: Quantitative model-based methods. Comput. Chem. Eng. 2003, 27, 293–311.
15. Miller, P.; Swanson, R.E.; Heckler, C.E. Contribution Plots: A Missing Link in Multivariate Quality Control. Appl. Math. Comput. Sci. 1998, 8, 775–792.
16. Wang, Z.; Du, H.; Lv, F.; Du, W. The fault detection of multi-sensor based on multi-scale PCA. In Proceedings of the 2013 25th Chinese Control and Decision Conference (CCDC), Guiyang, China, 25–27 May 2013.
17. Yin, S.; Steven, X.D.; Naik, A.; Deng, P.; Haghani, A. On PCA-based fault diagnosis techniques. In Proceedings of the 2010 Conference on Control and Fault-Tolerant Systems (SysTol), Nice, France, 6–8 October 2010.
18. Tao, E.; Shen, W.; Liu, T.; Chen, X. Fault diagnosis based on PCA for sensors of laboratorial wastewater treatment process. Chemom. Intell. Lab. Syst. 2013, 128, 49–55.
19. Kumar, M.; Sharma, R.K.; Jindal, M.K. PCA Based Offline Handwritten Gurmukhi Character Recognition System. Smart Comput. Rev. 2013, 3, 346–357.
20. Chiang, L.H. Fault Detection and Diagnosis in Industrial Systems; Springer: London, UK, 2001; pp. 25–36.
21. Ge, Z.; Zhong, S.; Zhang, Y. Semi-supervised Kernel Learning for FDA Model and its Application for Fault Classification in Industrial Processes. IEEE Trans. Ind. Inform. 2016, 12, 1403–1411.
22. Sugiyama, M.; Idé, T.; Nakajima, S.; Sese, J. Semi-supervised local Fisher discriminant analysis for dimensionality reduction. Mach. Learn. 2010, 78, 35.
23. Zhao, C. Fault subspace selection and analysis of relative changes based reconstruction modeling for multi-fault diagnosis. IEEE Trans. Control Syst. Technol. 2015, 24, 928–939.
24. Dunia, R.; Joe Qin, S. Joint diagnosis of process and sensor faults using principal component analysis. Control Eng. Pract. 1998, 6, 457–469.
25. Chatterjee, C.; Kang, Z.; Roychowdhury, V.P. Algorithms for accelerated convergence of adaptive PCA. IEEE Trans. Neural Netw. 2000, 11, 338–355.
26. Zhou, D.H. Total projection to latent structures for process monitoring. AIChE J. 2010, 56, 168–178.
27. Kresta, J.V.; Macgregor, J.F.; Marlin, T.E. Multivariate statistical monitoring of process operating performance. Can. J. Chem. Eng. 1991, 69, 35–47.
28. Chiang, L.H. Fault diagnosis based on Fisher discriminant analysis and support vector machines. Comput. Chem. Eng. 2004, 28, 1389–1401.
29. Feng, J.; Wang, J.; Zhang, H.; Han, Z. Fault Diagnosis Method of Joint Fisher Discriminant Analysis Based on the Local and Global Manifold Learning and Its Kernel Version. IEEE Trans. Autom. Sci. Eng. 2016, 13, 122–133.
Figure 1. Comparison of the expansion of detection data in peak hours (a) and flat hours (b).
Figure 2. Schematic diagram of PCA fault detection based on SPE indicators.
Figure 3. Experimental flow chart.
Figure 4. PCA-based fault diagnosis result chart.
Figure 5. PCA-based contribution map.
Figure 6. PLS-based fault diagnosis result chart.
Figure 7. PLS-based contribution map.
Figure 8. FDA-based fault diagnosis result chart.
Table 1. Truck Identification Accuracy Comparison.

| Model Type | Statistical Quantity | Number of Trucks | Accuracy Rate (%) |
|---|---|---|---|
| PCA | T² | 80 | 91.08 |
| PCA | SPE | 90 | |
| PLS | T² | 35 | 45.22 |
| PLS | SPE | 38 | |
| FDA | z | 15 | 12.75 |
Table 2. Comparison of Accuracy of Barrier Impact Event Identification Results.

| Model Type | Statistical Quantity | False Alarm Rate (%) | Missed Alarm Rate (%) | Accuracy Rate (%) |
|---|---|---|---|---|
| PCA | T² | 67.75 | 8.33 | 32.21 |
| PCA | SPE | 52.10 | 4.12 | |
| PLS | T² | 33.13 | 7.87 | 65.62 |
| PLS | SPE | 28.12 | 0.12 | |
| FDA | z | 18.45 | 1.73 | 88.23 |
Table 3. Comparison of application effects.

| Time | Passenger Vehicle Volume (veh/day) | Truck Volume (veh/day) | Average Daily Congestion Index | Emissions (kg) |
|---|---|---|---|---|
| 2020/12/01–2021/01/01 | 23386 | 26812 | 2.3 | CO2: 40000; CO: 210.9; HC: 18.58; NOX: 26.37 |
| 2021/12/01–2022/01/01 | 26101 | 39840 | 1.8 | CO2: 47000; CO: 178.33; HC: 20.21; NOX: 19.17 |
