1. Introduction
Extracting vehicle configurations on bridges is essential for tasks such as detecting overweight trucks and acquiring site-specific traffic information. This information can also be used for structural health monitoring (SHM). Early weight control relied on static scales. However, static weighing takes a long time, and measurement stations can easily be bypassed. An alternative approach uses sensors on the pavement that acquire dynamic measurements. This approach reduces the weighing time, but the sensors are exposed to more stress, which makes system maintenance difficult. Current solutions use the bridge itself as a weighing scale [1,2,3]. These systems are called bridge weigh-in-motion (BWIM). A part of the bridge, such as a girder, is usually used to measure strain or acceleration during a vehicle crossing, which is henceforth referred to as an event. Axle count, axle spacing, and driving speed are needed to determine the vehicle’s weight. Pavement sensors are generally used to acquire these three parameters. Nothing-on-road (NOR) BWIM systems dispense with pavement sensors entirely by attaching sensors only to the lower side of the bridge. This setup makes it more challenging to acquire data about the vehicle configuration. In addition, the lower side of the bridge has to be accessible, which is not always the case. Remote NOR BWIM has barely been studied, except by Ojio et al. [3], who use cameras for contactless measurements.
Ground-based radar (GBR) has frequently been investigated in the context of bridge monitoring in recent years [4,5,6,7,8]. It achieves comparable or even better accuracy than established, conventional strain or acceleration sensors [7], with the advantage of being remote and non-invasive. Thus, GBRs can be set up quickly for measurements. As they measure the bridge deflection and displacement, their data can be applied directly to many SHM methods, such as determining the dynamic input factor [9]. Furthermore, GBR has been investigated for the detection of bridge crossing events and, to some extent, for vehicle type classification [10,11]. Arnold and Keller [12] show that it is possible to differentiate between single- and multi-presence events from GBR deflection data using machine learning (ML), despite challenges such as variable-length time series and dataset imbalance. Yet, GBR has never been explicitly studied in the context of BWIM. In this paper, we investigate the potential of GBR bridge displacement time series for BWIM by applying standard signal processing methods, such as bandpass filtering and the continuous wavelet transform (CWT), and by using data-driven ML.
Our study is motivated by the use of remote displacement measurements with GBR in the context of BWIM. We develop an output-only method using GBR for analysis and an unmanned aerial vehicle (UAV) to gather ground-truth data. Measuring the displacement is challenging, so it has barely been studied for BWIM. Yet, it is regarded as a relevant feature for SHM [13]. To our knowledge, no study to date has tried to extract traffic information solely from real-world bridge deflection. Since GBR-based bridge monitoring has many advantages, such a system would be beneficial. Therefore, we focus on two objectives:
(1) We analyze and implement ML approaches to determine relevant BWIM vehicle configurations, including vehicle type, lane, locus, speed, axle count, and axle spacing.
(2) We investigate signal processing techniques in the context of axle-related parameters.
An important aspect of BWIM is the determination of the traffic load. This study does not investigate load estimation since we do not have ground-truth load data. However, Ojio et al. [3] showed that it is possible to extract the vehicle load from bridge displacement data. Furthermore, our approach is currently limited to single-vehicle events. Otherwise, the number of vehicles during an event would have to be extracted from the bridge displacement data, and, to date, no study in this regard has been conducted. Therefore, events during which multiple vehicles are present simultaneously have been discarded, as Arnold and Keller [12] indicate that a data-driven differentiation between single- and multi-presence events is possible.
First, we describe the related work relevant to this study in Section 2. In Section 3, we explain the setup for the GBR measurements and describe the dataset used. In Section 4, we first cover the basics of the wavelet transform. Then, the proposed feature extraction, the ML models, and the methodology for each task are detailed. The results of our approach are laid out in Section 5, and a comprehensive discussion is given in Section 6. Finally, Section 7 summarizes the main findings of this study.
2. Related Work
Time-series data have never matched the prominence of images in ML, although the variety in the data poses an exciting challenge. The wide range in scaling, length, sampling frequency, number of channels, etc., makes it challenging to develop versatile models that handle such complexity. The UCR dataset provides extensive datasets and is often used as a benchmark for new approaches [14]. However, at the time of writing, only six of its datasets are both multi-variate and of unequal length. Moreover, the differences in signal length are minor. Oftentimes, unequal length is circumvented by rescaling or padding in the time domain or by extracting features, for instance, by fitting an autoregressive integrated moving average model [15]. Utilizing the UCR dataset, ref. [16] gives a detailed overview of the most relevant state-of-the-art methods for time-series classification. Random convolutional kernel transform (ROCKET), introduced by [17], is highlighted for its training speed and because it achieves the best results, especially for multi-variate time-series classification. ROCKET uses random convolutional kernels to extract features, which are then used as input data for a linear ridge classifier. Dempster et al. [18] further refined ROCKET into a version called MiniRocket. MiniRocket is faster than ROCKET and mostly deterministic while achieving comparable results. The authors also contributed a version of MiniRocket that can handle unequal-length time series to the Python library sktime [19]. Ref. [20] took an approach based on convolutional neural networks (CNNs) in combination with padding and masking for acoustic scene classification. To this end, they implemented a global pooling layer that supports masking to prevent the model from learning the padding. Arnold and Keller [12] used hand-crafted features for tree-based learners and MiniRocket for bridge event classification on variable-length GBR displacement data. Additionally, they investigated the potential of data augmentation in this context. Overall, they achieve a balanced accuracy of over 90% when classifying crossings into single- or multi-presence events.
Several studies have used ML or deep learning (DL) methods to extract vehicle information from NOR BWIM time series data. Kawakatsu et al. [21] attached a single strain sensor to the span of a 300 m long concrete bridge in Japan. They used CNNs to detect vehicles and estimate the speed, locus, and axle count. As input, 8
windows of strain data are passed to the network. They use a traffic surveillance system for their ground-truth data, resulting in up to 996,093 samples depending on the classification task. In [
22], they expand their dataset by a steel bridge and investigate acceleration sensors as an alternative to strain sensors. The mean absolute error (MAE) for locus estimation is
or
for strain data depending on the driving direction. Kawakatsu et al. [
2] extend their previous work by load estimation using multi-task CNNs. A load meter provides the necessary ground-truth data. They also increase the sequence length of the input data to 20
. For a 74
m long steel bridge, they achieve an MAE of
/
for speed, and an
for axle spacing. The effects of using more than one sensor are investigated in [23]. The signals of eleven sensors are used as input to different CNN architectures for several bridges, further improving the previous results. For a two-span bridge, they achieve a balanced accuracy of 91.92% for an imbalanced lane estimation. The results for the other parameters are comparable to or slightly better than those of their previous studies.
For axle detection, simple peak detection algorithms are often used [22]. Yu et al. [24] applied a combination of wavelet transform and peak detection to strain data from finite element method (FEM) simulations. Although they only use FEM data and do not transfer their approach to real-world data, they make several relevant findings. Firstly, they show that axle information can be extracted from global bridge responses. Furthermore, the sampling frequency significantly impacts the identification accuracy since a higher sampling frequency leads to sharper peaks. For example, at 200 Hz, the axle spacing identification errors for a three-axle vehicle traveling at 30
/
are 25.3% and 97.1%, respectively. At a sampling frequency of 500
Hz, these errors decrease to 0.2% and 1.43%. Finally, they show that road surface conditions can severely impact the results. Lechner et al. [25] also use wavelets for BWIM based on crack displacement sensor data. They measure the width changes of an existing crack during traffic loading. With this local response, they can successfully obtain vehicle speed, axle count, and axle distances. Using the influence line, they can also compute individual axle loads. Zhao et al. [26] use free-of-axle detectors (FAD) in combination with wavelets for improved strain-based axle detection. FAD BWIM uses additional FAD sensors attached to the lower side of the bridge. They show that axle-induced peaks in the FAD strain signal are more easily distinguished after applying Daubechies wavelets. While such FAD BWIMs give good results, they need many sensors, leading to a higher probability of failure.
Concerning contactless BWIM, Ojio et al. [3] investigate the potential of cameras for bridge displacement measurements. They can extract the axle loads of a few reference vehicles with known weights from the bridge displacement using the influence line. However, for other parameters such as speed, lane, and axle spacing, they rely on a second camera instead of extracting these parameters directly from the displacement data. An overview of BWIM studies relevant to this work is presented in Table 1.
4. Methodology
In this section, we briefly introduce the wavelet transform. Then, we explain our preprocessing and feature extraction steps and, afterward, the data-driven approaches and methods for each vehicle parameter. First, we investigate the distinction between cars and trucks. For all other tasks, we investigate both vehicle types together to have a more extensive dataset, as well as trucks only, since trucks are more relevant to SHM.
4.1. Prerequisite Concerning Wavelets
We only briefly explain the concept of wavelets. For a more in-depth explanation, please see, e.g., [29]. One motivation behind wavelets is to acquire local frequency information while maintaining a high temporal resolution, which is impossible using the Fourier transform. The method is analogous to the Fourier transform in that the original signal is expressed in terms of a family of functions. These functions are constructed from a so-called mother wavelet $\psi$:

$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right)$$

Different coefficients are generated from the input signal by varying the scaling a and the time delay b. One example of a mother wavelet is the Gaussian wavelet

$$\psi(t) = C\,e^{-t^{2}} \tag{5}$$

where C is an order-dependent normalization factor. Another mother wavelet is the Gaussian derivative wavelet, the
m-th order derivative of Equation (5), where
m lies between 1 and 8. It is defined as
where
$\Gamma$ represents the gamma function [30]. The Gaussian derivative wavelet is used in this study, as implemented by Lee et al. [31].
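The construction above can be made concrete with a small sketch. It is not the implementation of Lee et al. [31]: it evaluates the m-th derivative of exp(-t²) through the Hermite recurrence, omits the normalization factor C, and computes the coefficients at a single scale by direct summation. The function names and the `support` parameter are hypothetical.

```python
import math


def gaussian_derivative(t, m):
    """m-th derivative of exp(-t**2), without the normalisation factor C.

    Uses d^m/dt^m exp(-t^2) = (-1)^m * H_m(t) * exp(-t^2), where H_m is the
    physicists' Hermite polynomial: H_0 = 1, H_1 = 2t,
    H_{n+1}(t) = 2t * H_n(t) - 2n * H_{n-1}(t).
    """
    if m == 0:
        return math.exp(-t * t)
    h_prev, h_curr = 1.0, 2.0 * t
    for n in range(1, m):
        h_prev, h_curr = h_curr, 2.0 * t * h_curr - 2.0 * n * h_prev
    return (-1) ** m * h_curr * math.exp(-t * t)


def cwt_single_scale(signal, dt, scale, m=7, support=5.0):
    """Naive CWT coefficients at one scale a, one value per time delay b."""
    half = round(support * scale / dt)  # kernel half-width in samples
    kernel = [gaussian_derivative(k * dt / scale, m) for k in range(-half, half + 1)]
    norm = dt / math.sqrt(scale)  # 1/sqrt(a) times the integration step
    coeffs = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < len(signal):  # samples outside the signal count as zero
                acc += signal[idx] * weight
        coeffs.append(norm * acc)
    return coeffs
```

Because the derivative wavelets have zero mean, a constant displacement offset produces coefficients close to zero away from the signal edges, which is why such a transform responds to local changes rather than to the slow deflection itself.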
4.2. Preprocessing and Feature Extraction
We keep preprocessing to a minimum: removing the offset of each time series is the only step, except for axle counting and axle spacing estimation, where we additionally apply a high-pass filter before feature extraction. This offset comes from long-term drifts in the signal due to environmental influences [10]. Otherwise, no filtering is applied since we want to maintain high-frequency information.
Figure 5 shows the methodology for our feature-based approaches. Among others, we test various models in combination with manually crafted features. Table 3 summarizes all features used in this study and how they are calculated. Each feature is calculated for each time series used. Only a subset of these features is used for a specific task so that irrelevant features do not make the ML predictions more difficult. This selection is stated in the corresponding subsections. The input features passed to the ML models also differ because the number of reflectors varies depending on the task. Since tree-based models are scale-independent, no scaling is applied to the input features for random forest (RF) [32] and gradient boosting (GB) [33]. The input features are scaled in the case of k-nearest neighbors (KNN) [34].
Unlike manual feature selection, MiniRocket [18] uses random convolutional kernels to extract 9996 features. The kernels have a fixed length of 9, and their weights are restricted to two values: $-1$ and $2$. Dilation, i.e., the spread of a kernel over the time series, lies within the range of $\lfloor 2^{0} \rfloor$ and $\lfloor 2^{max} \rfloor$, where $max = \log_2\left((l_{input} - 1)/(l_{kernel} - 1)\right)$, with $l_{kernel}$ and $l_{input}$ being the kernel length and the input length. Padding is fixed due to the convolution and alternates between no padding and zero padding. Biases are calculated from the convolution output of randomly selected training examples and are therefore the only non-deterministic aspect of the MiniRocket approach. The final extracted feature is the proportion of positive values (PPV), which can be calculated with

$$PPV(Z) = \frac{1}{N}\sum_{i=1}^{N} [z_i > 0]$$

where $Z$ is the convolution result between the input signal and a kernel, $z_i$ are its elements, and $N$ represents the non-zero signal length [17].
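To illustrate the PPV feature, the following minimal sketch convolves a signal with one dilated kernel of length 9 and counts the share of positive outputs. It is not the sktime implementation: only the unpadded case is shown, the bias is simply added to the convolution output (the sign convention is an assumption), and the function names are hypothetical.

```python
def dilated_convolution(x, weights, dilation, bias=0.0):
    """Unpadded convolution of x with a dilated kernel, plus a bias term."""
    span = (len(weights) - 1) * dilation  # distance covered by the kernel
    return [
        sum(w * x[i + k * dilation] for k, w in enumerate(weights)) + bias
        for i in range(len(x) - span)
    ]


def ppv(z):
    """Proportion of positive values in the convolution output z."""
    return sum(1 for value in z if value > 0) / len(z)


# One kernel with weights restricted to -1 and 2. In MiniRocket, three of the
# nine weights are 2 and six are -1, so the weights sum to zero.
kernel = [-1, -1, 2, -1, -1, 2, -1, -1, 2]
ramp = list(range(20))
feature = ppv(dilated_convolution(ramp, kernel, dilation=1))
```

Since the kernel weights sum to zero, a constant input yields an output of zero everywhere, so the feature reacts to the shape of the signal rather than to its level.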
Theoretically, the number of kernels can be regarded as a hyperparameter, but Dempster et al. [18] recommend using 10,000 kernels since they do not observe a significant impact on the accuracy for different values. Because the PPV is used and the biases are drawn from the convolution results, normalization of the input is unnecessary.
We split our data in an manner for training and testing, and we use stratified sampling for classification tasks to maintain class frequency. During training, we apply a 5-fold cross-validation grid search to find the best hyperparameters for each regression and classification task. For MiniRocket, we use the configuration recommended by the authors, which also includes a ridge regressor or classifier as the final step. During grid search, we optimize the mean squared error (MSE) for regression and the balanced accuracy for classification.
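Stratified sampling, as mentioned above, can be sketched as follows. This is a minimal illustration and not the code used in the study; the function name, the `seed` argument, and the test fraction of 0.25 in the usage lines are hypothetical and do not reflect the split ratio of the paper.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Split sample indices so each class keeps roughly its share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one test sample per class.
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Illustrative, imbalanced label set: many cars, few trucks.
labels = ["car"] * 80 + ["truck"] * 20
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)
```

Splitting per class keeps rare classes, such as trucks with many axles, represented in the test set, which a purely random split does not guarantee.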
4.3. Vehicle Type
Since we distinguish between cars and trucks in the following tasks, we investigate, for completeness, whether a data-driven distinction is possible. Although it seems trivial to use a threshold, the driving lane also comes into play. Therefore, we use reflectors 2 and 4 (see Section 3.1). As input features, we use features 2, 3, and 9 of Table 3. So, for each time series, we extract the minimum, mean, and power using the corresponding calculation methods. As we use two reflectors and extract each feature for each reflector, six features are extracted in this task. These features are then passed to RF and GB directly and to KNN after scaling to predict the vehicle type.
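The six-element feature vector described above could be assembled as in the sketch below. The exact definitions are given in Table 3 of the paper; here, the signal power is assumed to be the mean of the squared samples, and all function names are hypothetical.

```python
def signal_minimum(x):
    """Smallest displacement value of the event."""
    return min(x)


def signal_mean(x):
    """Average displacement over the event."""
    return sum(x) / len(x)


def signal_power(x):
    """Assumed definition: mean of the squared samples."""
    return sum(value * value for value in x) / len(x)


def vehicle_type_features(reflector_2, reflector_4):
    """Minimum, mean, and power for each of the two reflectors (six values)."""
    features = []
    for series in (reflector_2, reflector_4):
        features.extend(
            [signal_minimum(series), signal_mean(series), signal_power(series)]
        )
    return features
```

The resulting vector can be passed to any tabular classifier; only distance-based models such as KNN need the additional scaling step.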
4.4. Lane and Locus
The offset of the reflectors in the y direction, as depicted in Figure 2, makes it possible to discern the vertical driving position of a vehicle. In the first step, the lane is determined in a classification task using reflectors 2 and 4. As a baseline, which we refer to as POWER, we compare the signal power (feature 9 in Table 3) of both reflectors. The lane is determined by which reflector has the higher value: if the signal power of reflector 2 is greater, the vehicle is assumed to drive in lane 2; otherwise, it is assumed to drive in lane 1. The input for the feature-based models consists of features 2, 3, and 9. In addition to the lane, the locus is estimated by regression to obtain a more precise vehicle localization. For locus regression, we use all five reflectors with the same features.
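The POWER baseline reduces to a single comparison, sketched below. As before, the power is assumed to be the mean of the squared samples, and the assignment of the remaining case to lane 1 is an assumption based on the two-lane setup; the function names are hypothetical.

```python
def mean_square_power(x):
    """Assumed power definition: mean of the squared samples."""
    return sum(value * value for value in x) / len(x)


def power_baseline_lane(reflector_2, reflector_4):
    """Return 2 if reflector 2 carries more signal power, otherwise 1."""
    if mean_square_power(reflector_2) > mean_square_power(reflector_4):
        return 2
    return 1
```

Such a rule needs no training data, which makes it a useful reference point for judging how much the learned models actually add.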
4.5. Speed
Global responses, such as displacement, require a different approach than usual to calculate the speed since extracting it via peak detection, as is done for local responses, is impossible. Therefore, we use data-driven ML to extract the vehicle speed. Arnold and Keller [10] and Arnold et al. [11] can extract vehicle crossings, but the length of such an event alone is not sufficient since the vehicle length also plays an important role. To show this, we additionally train a linear regression (LR) for speed estimation on the signal length only as a baseline. A one-dimensional LR tries to find the coefficient w and the intercept c such that

$$y = w\,x + c$$

where y is the target vector and x the input vector. In our case, x represents the event length, and y represents the vehicle speed. The feature-based models are trained with features 1 to 11 from Table 3. As the input time series, we use reflector 3.
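For reference, the one-dimensional least-squares fit behind such an LR baseline has a closed-form solution, shown in the sketch below. The function names are hypothetical, and the numbers in the usage lines are purely illustrative, not measured data.

```python
def fit_line(x, y):
    """Ordinary least squares for y = w * x + c with a single input feature."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    w = s_xy / s_xx
    c = mean_y - w * mean_x
    return w, c


def predict(w, c, x):
    """Apply the fitted line to new inputs."""
    return [w * xi + c for xi in x]


# Illustrative values only: event length (x) against vehicle speed (y).
w, c = fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

With only the event length as input, vehicles of different length but equal speed are indistinguishable, which is the limitation this baseline is meant to expose.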
4.6. Axle Count
We treat the determination of the axle count as a classification problem with either 2, 3, 4, or 5+ axles. With more data, a more precise classification might be possible. All predictions are made using only the signal of reflector 3.
Unlike in the previous task, we do not use the displacement directly as input or for feature extraction. Instead, we filter the signal with a 50th-order forward-backward Butterworth high-pass filter with a cut-off frequency of 45 Hz. This is done so that the models learn high-frequency features and do not rely on low frequencies. For our feature-based approaches, we use features 1 to 10 in Table 3. We drop feature 11 since the signal is already high-pass filtered.
Apart from using ML, we also investigate deterministic methods using a bandpass filter (BANDPASS) and the CWT (WAVELET) as novel approaches. Their pipelines are depicted in Figure 6 and Figure 7, respectively. They are similar except for the first step of the pipeline, where the corresponding filtering or transformation is applied. As input, they receive unfiltered GBR bridge displacement data from one reflector. The output consists of a list of the positions of the detected peaks. For BANDPASS, we apply a forward-backward Butterworth bandpass filter in the first step. The order of the filter is 50, and the critical frequencies are (45 Hz, 65 Hz). We choose gaus7 as implemented by Lee et al. [31] and the
coefficient for our WAVELET approach. This corresponds to $m = 7$ regarding Equation (6). These parameters have been determined as part of the parameter tuning process but outside the grid search. The bandpass-filtered signal and the wavelet transform result are then squared and smoothed by a weighted moving-average filter to obtain distinguishable peaks. Lastly, the signals are normalized to their highest value before searching for peaks to achieve generalization over all vehicles. We use the find_peaks function of the Python package SciPy for peak detection [35]. The window size of the moving-average filter and the distance and prominence of a peak during peak detection are regarded as hyperparameters and deduced using the training dataset. Depending on the driving side and thus the driving direction, we discard peaks in the half of the signal during which the vehicle neither enters nor leaves the bridge. As seen from Figure 8 and Figure 9, only the entering or leaving process is relevant for peak detection. Finally, we treat the length of the list of detected peak positions as the axle count. If no peak or only one is detected, we assume two axles.
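The steps after filtering (squaring, smoothing, normalizing, peak picking, counting) can be sketched in plain Python as follows. This is a simplified stand-in and not the pipeline of the study: the moving average is uniform rather than weighted, and a height threshold with a minimum distance replaces the prominence criterion of SciPy's find_peaks. All names and parameter values are hypothetical.

```python
def moving_average(x, window):
    """Centered, uniformly weighted moving average (shorter window at the edges)."""
    half = window // 2
    smoothed = []
    for i in range(len(x)):
        segment = x[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def normalized_envelope(filtered, window):
    """Square the filtered signal, smooth it, and scale its maximum to 1."""
    smoothed = moving_average([value * value for value in filtered], window)
    peak = max(smoothed)
    return [value / peak for value in smoothed] if peak > 0 else smoothed


def pick_peaks(envelope, min_height, min_distance):
    """Local maxima above min_height, keeping the strongest within min_distance."""
    candidates = [
        i
        for i in range(1, len(envelope) - 1)
        if envelope[i] > envelope[i - 1]
        and envelope[i] >= envelope[i + 1]
        and envelope[i] >= min_height
    ]
    kept = []
    for i in sorted(candidates, key=lambda idx: envelope[idx], reverse=True):
        if all(abs(i - j) >= min_distance for j in kept):
            kept.append(i)
    return sorted(kept)


def axle_count(peaks):
    """Number of detected peaks, with two axles as the assumed minimum."""
    return max(2, len(peaks))
```

Normalizing the envelope to its maximum lets one height threshold serve vehicles of very different weight, at the cost of making weak axle responses next to a dominant one harder to detect.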
4.7. Axle Spacing
Finally, we investigate how well the distance between axles can be determined. We treat this as a multi-output regression. Our BANDPASS and WAVELET procedures return a list of detected peak positions x (see Figure 6 and Figure 7). A peak at position i in the list is interpreted as axle number i. The temporal distance between the peaks i and j can be calculated by subtracting the consecutive positions $x_i$ and $x_j$ and then dividing by the GBR sampling rate of 200 Hz. With the speed $v$ from the UAV data, this can finally be transformed into the axle spacing $d_{i,j}$ between axles i and j according to

$$d_{i,j} = v\,\frac{x_j - x_i}{200\ \mathrm{Hz}}$$

We assume the speed can be correctly extracted as described in Section 4.5. We also assume that the axle count is known and ignore additionally detected spacings during evaluation. The same goes for our ML approaches. We also use the same inputs as in Section 4.6.
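The conversion from peak positions to axle spacings is a short calculation, sketched below under the assumption that the peak positions are sample indices and the speed is given in meters per second. The function name is hypothetical, and the numbers in the usage lines are illustrative.

```python
def axle_spacings(peak_positions, speed_m_per_s, sampling_rate_hz=200.0):
    """Spacing in meters between consecutive detected axles."""
    return [
        speed_m_per_s * (later - earlier) / sampling_rate_hz
        for earlier, later in zip(peak_positions, peak_positions[1:])
    ]


# Illustrative example: two peaks 30 samples apart at 20 m/s.
spacings = axle_spacings([30, 60], speed_m_per_s=20.0)
```

At 200 Hz, one sample corresponds to 5 ms, so at 20 m/s a shift of a single sample changes the estimated spacing by 0.1 m; errors in peak localization and in the speed estimate therefore carry over directly to the spacing.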
6. Discussion
The central focus of this study is the potential of GBR displacement signals for a remote and data-driven BWIM. This section will discuss the results stated in the previous section. First, we will analyze the potential of ML for all classification and regression tasks. Afterward, our filter approaches regarding axle configuration identification are discussed.
6.1. Machine Learning for Displacement-Based BWIM
As the results for vehicle-type classification are flawless for feature-based methods, it is possible to consider only trucks for SHM if desired. Trucks cause a significantly greater deflection than cars. Therefore, the promising results are unsurprising. The issue of the driving lane can be avoided by using one reflector per lane as input data. Our distinction between vehicle types during the vehicle configuration tasks is thus well founded.
It seems that, for all tasks except axle spacing estimation, ML models can extract vehicle configurations. MiniRocket, which uses the raw time series data and extracts features via convolutional kernels, shows especially promising results. The best-performing model varies from task to task, but MiniRocket either performs best or follows close behind. Interpretability is an important aspect of bridge monitoring [36,37,38]. The overall satisfactory results of our models indicate that the extracted configurations are identifiable from global bridge displacement data using data-driven ML approaches.
To acquire the vertical position of a vehicle, we test both a lane classification and a locus regression. The lane can be extracted successfully for both vehicle categories. Trucks can be classified almost perfectly, with only one vehicle misclassified. This is unsurprising, as our reflectors are spread along the y axis (see Figure 2). Accordingly, the maximum displacement during an event correlates with the driving side. Especially for heavy vehicles, a clear distinction can be made. In Figure 2, e.g., reflectors 3 to 5 show a steeper curve than reflectors 1 and 2. This suggests that the vehicle drives in lane 1, which corresponds to the bus from Figure 8 that caused the bending. Thus, for trucks, it seems sufficient to compare the signal power of reflectors 2 and 4 to acquire the lane.
The locus regression shows a similar behavior in that the results improve when discarding cars. Due to KNN’s outstanding performance, we surmise that scaling our features will help with this task. This makes sense, as the maxima ratio within one event is more relevant than their absolute values.
There is a significant difference in speed between all vehicles and only trucks. The reason for this might lie in the dataset composition. As cars are more frequent and with less variation in the vehicle configurations, like the length, the correlation between event duration and speed is more prominent. Thus, for some models, the decreases while the MAE improves. For MiniRocket, and MAE decrease when discarding cars, suggesting that it does not mainly look at the event duration.
The results for axle count classification imply that our features from Table 3 are not helpful for this task, as all models that depend on them have a BA of less than 51%. Only MiniRocket achieves satisfactory results, especially for trucks. The high OA but low BA shows that the imbalance in the dataset, when including cars, leads to more misclassified trucks.
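The gap between OA and BA on imbalanced data can be reproduced with a toy example. The sketch below assumes that OA denotes the overall fraction of correct predictions and that BA is the mean of the per-class recalls; the function names and the label values are hypothetical.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of samples that are classified correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so every class has the same weight."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Toy data: nine two-axle vehicles, one five-axle truck, and a model that
# always predicts two axles.
y_true = [2] * 9 + [5]
y_pred = [2] * 10
oa = overall_accuracy(y_true, y_pred)
ba = balanced_accuracy(y_true, y_pred)
```

In this toy case, the model never recognizes the truck, yet the OA stays high because cars dominate, while the BA drops to the chance level for two classes.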
Comparing both datasets, it can be said that, although having more data generally helps, cars drastically change the distribution of many configurations. This makes it more challenging for models to learn truck properties. Regarding BWIM, where only trucks are considered relevant, this can be seen as a disadvantage. The truck dataset is very small; for example, it contains only one truck with seven axles. Since trucks vary considerably in axle count and spacing, our models naturally have difficulties generalizing. The dilemma of high imbalance and a small dataset is also apparent for axle spacing regression. Here, the MSE is very high for all vehicles compared to the MAE, as the axle spacing for cars is very closely distributed (see Figure 4). Consequently, models mainly learn the car distribution and regard trucks as outliers due to the highly imbalanced data points.
6.2. Filtering for Axle Configuration Identification
Figure 8, Figure 9 and Figure 10 indicate that axle information is present in a bridge’s global displacement time series. Individual examples produce promising results concerning axle count and axle spacing. To our knowledge, this has not been presented before, as displacement is barely exploited in BWIM. However, the results for BANDPASS and WAVELET in Table 8 and Table 9 show that it is challenging to find a general procedure. The negative $R^2$ values in Table 9 demonstrate this in particular, as they mean that simply using the mean of the data as a prediction is a better fit than our methods. Neither BANDPASS nor WAVELET can be described as the better approach overall, as their respective performances vary heavily between samples. Unfortunately, we can only show a small portion of all the recorded vehicles, as there are cases in which no axle-induced peaks are visible in the bandpass signal, but they exist with WAVELET.
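Why a negative coefficient of determination indicates a fit worse than the mean follows directly from its definition, as the short sketch below shows. The function name is hypothetical, and the values in the usage lines are illustrative.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


targets = [3.0, 4.0, 5.0]
mean_prediction = [4.0, 4.0, 4.0]   # always predict the mean of the targets
poor_prediction = [5.0, 4.0, 3.0]   # systematically wrong predictions
r2_mean = r2_score(targets, mean_prediction)
r2_poor = r2_score(targets, poor_prediction)
```

Predicting the mean gives a value of exactly zero, so any method whose squared error exceeds the variance of the targets ends up below zero.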
While a high SNR is necessary to measure these small vibrations, a good SNR alone will not lead to valuable measurements concerning axle detection. Interestingly, the weight of a vehicle or axle does not seem to be the decisive factor: for some cars, the peaks are easily recognizable, whereas they are not visible for trucks. Therefore, we assume that the relative axle weight within one event is relevant. The truck in Figure 10 appears heavier in the front than in the back. Accordingly, the peaks for the rear axles are less visible. However, this is only a conjecture since no axle load data have been recorded during this study. Also, the driving direction appears unrelated to how well axles can be detected. More peaks might have been detected with a higher sampling frequency, or at least their distinction might have been more straightforward for CWT. Yu et al. [24] have demonstrated that a sampling frequency of only 200 Hz leads to large errors. These vibrations seem to be caused at the junction between the bridge and the street. Consequently, this behavior might be specific to this bridge. Since we only monitored one field, we cannot say if the same vibrations can be measured for the other junction. Both fields seem too loosely coupled to transmit the vibrations, if they exist. For one-span bridges, both leaving and entering might be observable. In such cases, our BANDPASS and WAVELET approaches could also extract the vehicle speed. In this study, however, this was not possible.
For vehicles with clearly visible peaks, like in Figure 8, the axle-induced response could also be observed in the signals of other reflectors. This suggests that the vibration is not induced by the specific reflector or its attachment but is a high-frequency vibration in the bridge displacement. Also, as mentioned in Section 3.1, we monitor the bridge with two GBRs that measure different components. There have not been visible vibrations or peaks for the GBR measuring the z and y components, although the SNR lies in an adequate range. This indicates that the vibration mainly occurs in the x-component. However, a more detailed investigation is necessary.
7. Conclusions and Outlook
In this study, we discuss a novel data-driven, displacement-based BWIM approach. The data were recorded at a two-span bridge in Germany using GBR, with a UAV providing ground-truth data. We investigate the potential of both classic signal processing and ML to extract the vehicle configurations from bridge-crossing events. These configurations include vehicle type, speed, lane, locus, axle count, and axle spacing. One challenge herein is that displacement is a non-local bridge response. Furthermore, our dataset is imbalanced and consists of variable-length time series. We evaluate all approaches on all vehicles and on trucks only.
As ML approaches, we test four different models, three of which depend on manually crafted features. The fourth model, MiniRocket, uses the raw time series data. Over all configurations, MiniRocket achieves the most promising results, comparable to other studies such as Kawakatsu et al. [21]. While speed, lane, and locus can be extracted, axle count classification is more challenging for all models. Only MiniRocket can classify the truck axle count, with a BA of 76.7%. Finally, the results for axle spacing regression suffer from the small dataset. Therefore, more bridges should be monitored. In addition, more complex models and the use of extracted values such as speed as input features could lead to better results.
We have recorded a high-frequency vibration for this bridge that coincides with axles crossing the junction. Using bandpass filtering and wavelet transform, we could demonstrate examples of this behavior. Based on this finding, we tried two approaches for axle count and axle spacing determination based purely on signal processing. While the information seems to be available, we have not found a comprehensive procedure for the automatic extraction of axle configurations. A more sophisticated approach exploiting CNNs might be more successful, as CNNs can learn more generalizable features. For this, however, either the dataset needs to be enlarged or data augmentation needs to be applied, as investigated in Arnold and Keller [12]. Furthermore, it must be investigated whether other bridges show similar behavior in the high-frequency range. Ideally, GBRs with a higher sampling rate will be used for these measurements.
We showed that a purely data-driven BWIM exploiting GBR displacement time series data is possible. However, these promising results must be refined using a more extensive dataset. Ojio et al. [3] indicated that determining axle loads is possible with bridge displacement signals and the vehicle configurations extracted in this study. Thus, our results work towards a fully remote and data-driven BWIM.