1. Introduction
Pipelines serve as critical infrastructure for transporting water resources over long distances. However, these structures are susceptible to various threats along their entire length, including aging, corrosion, ground movement, and third-party intrusion (TPI), such as accidental impacts from construction activities [1,2]. For specific types, such as prestressed concrete cylinder pipes (PCCPs), which are often used in large-scale water transport systems, failures such as wire breaks also pose significant risks [3]. Pipeline ruptures not only waste water resources but also loosen the surrounding soil and reduce the foundation-bearing capacity, thereby affecting both residential and industrial activities. Unlike gradual degradation processes such as leakage, sudden anomalous events such as impacts are often unpredictable and can lead to immediate and catastrophic failure, necessitating effective real-time monitoring [4]. Oil and gas pipelines face even greater challenges from TPI. According to statistics from the China Gas Association, nearly 60% of pipeline incidents are caused by TPI [5]. Therefore, protecting structural integrity against impact damage caused by construction activities is crucial for preventing environmental and social problems.
Traditionally, pipeline intrusion monitoring has relied on point-type sensors, such as accelerometers [6], hydrophones [7], microphones [8], and fiber Bragg gratings [9]. Chen et al. [10] utilized a piezoceramic transducer array to capture impact-induced stress waves and estimated the time of arrival using instantaneous phase analysis, enabling simultaneous axial and circumferential localization of impact sources on a pipeline. Such sensors are installed at intervals along the pipeline to detect the surrounding vibration. However, because vibration waves attenuate during propagation, these sensors need to be densely deployed to ensure an adequate signal-to-noise ratio in practical applications, resulting in high installation costs.
Recently, distributed acoustic sensing (DAS) technology has offered advantages such as long-distance continuous monitoring, a high spatial sampling density, and low underwater protection requirements [11,12]. DAS detects the phase change in the lightwave within an optical fiber laid along the pipeline, so that the fiber both captures environmental vibrations and transmits the signals [13]. Because every section of the fiber acts as a sensor, vibrations are recorded close to where they occur, and the signals detected by DAS are far less affected by propagation attenuation than those from point-type sensors. By monitoring the vibration signals recorded by the optical cable deployed along the pipeline, vibration events can be detected and identified. In 2009, Tanimola et al. [14] applied DAS to intrusion threat recognition. Recent advancements in DAS signal processing have enabled vibration event detection and classification through deep learning architectures such as convolutional neural networks (CNNs) [15,16], long short-term memory (LSTM) networks [17], and hybrid models such as CLDNN [18], demonstrating superior performance in distinguishing pipeline intrusions and mechanical disturbances.
Following the identification of vibration events, accurately locating their sources is essential for assessing pipeline integrity. Hussels et al. [19] employed DAS to track propagating acoustic waves along the pipe wall and identify different acoustic modes, achieving transient impact localization. Huang et al. [20] proposed a pipeline inspection gauge (PIG) positioning method, which detects the vibration signals generated by the collision between the PIG and pipeline welds by integrating a clustering algorithm and the Hough transform. In addition, time-domain [21] and frequency-domain [22] cross-correlation algorithms are used for the localization of pipeline vibration events.
However, the application of DAS to impact localization in large-diameter water pipelines remains challenging, mainly because of the difficulties in deploying the sensing fiber. Existing research often utilizes an external deployment approach, typically burying the fiber alongside the pipe [14,15,16,17,18,20,22], which is suitable for new installations or shallow-buried pipelines. Another approach involves wrapping the fiber around the pipeline [19,21]. However, these approaches are impractical for existing, deeply buried water pipelines due to the high cost and restrictions of excavation. An alternative is to deploy the fiber inside the pipeline. For instance, Higgins [23] and Lisbel [24] laid the fiber inside the pipeline without any fixation to detect wire breaks in PCCPs. However, this internal deployment approach is susceptible to background noise introduced by water flow. This noise field exhibits strong spatiotemporal variability along the fiber, rendering fixed filters ineffective. Consequently, transient impact signals, which are the target of this study, can be easily obscured, presenting a major challenge for accurate impact source localization.
The aforementioned studies on DAS signal processing and localization have not been validated under the noisy conditions associated with internal deployment in operational water pipelines. Although many algorithms have been developed for noise suppression in underwater environments, their inherent limitations render them unsuitable for the specific challenges encountered in this study. For instance, wavelet threshold denoising [25] requires prior knowledge to select the appropriate wavelet basis function and decomposition level, which becomes impractical in scenarios involving water flow noise. Furthermore, the least mean square algorithm [26] requires a reference noise signal, which is often unavailable when noise and target signals are inherently coupled. Another widely used method, Empirical Mode Decomposition [27], while capable of decomposing signals into intrinsic mode functions (IMFs), is known to suffer from mode mixing, particularly when processing signals with transient features. Variational Mode Decomposition (VMD) [28] is an adaptive signal processing method designed to decompose a given signal into a predefined number of IMFs. Its principal advantage is the effective suppression of mode mixing, attributed to its non-recursive structure and narrow-band filtering mechanism. By decomposing a noisy signal into a set of modes, VMD effectively separates the signal components of interest from those primarily containing noise. In pipeline leakage detection, an optimized VMD approach has been shown to denoise noise-contaminated acoustic signals effectively, enhancing feature extraction for further analysis [29].
The arrival time is typically defined as the point at which a noticeable difference emerges between the noise and the effective signal. The Autoregressive-Akaike Information Criterion (AR-AIC) algorithm [30] is a commonly used method for selecting the arrival time of seismic signals. It places the optimal division point between the noise and the effective signal at the minimum of the AIC function. However, this method yields only a single segmentation point for a given time series, whereas the actual arrival time may correspond to a local minimum of the AIC value. Pruned Exact Linear Time (PELT) [31] can identify and segment all potential target vibration signals within long time series. Nonetheless, the segments produced by PELT have unequal lengths, which reduces the effectiveness of conventional metrics, such as Euclidean distance, when comparing features across different sequences.
Therefore, this paper proposes a novel two-step method designed for robust impact localization using a DAS-recorded signal from internally deployed, unfixed cables amidst strong water flow noise. The first step employs the VMD algorithm combined with a stable impulsiveness metric, Short-Time Energy Entropy (STEE), to adaptively extract the impact signal component from the noise. Subsequently, the second step achieves accurate automated selection of arrival time by applying the PELT algorithm, followed by an unsupervised learning method that uses Dynamic Time Warping (DTW) and clustering to identify the impact’s onset based on shape similarity, overcoming the limitations of traditional pickers in complex scenarios. The practical errors of this integrated approach are validated through field experiments on an operational water pipeline.
2. Sensing Principle of the DAS System
The DAS system is based on phase-sensitive optical time-domain reflectometry and mainly consists of an interrogator unit and an optical sensing fiber. Its measurement principle is shown in Figure 1. The DAS interrogator injects a probe laser pulse into the fiber. During the forward propagation of the pulse, Rayleigh backscattering (RBS) signals are generated at different positions and return with different round-trip times. Deformation of the sensing fiber caused by external vibration leads to a phase change in the RBS signal, enabling the DAS system to measure dynamic strain changes along the fiber [13].
Within a segment of sensing fiber of length l, the phase delay ϕ of the returning lightwave before deformation is as follows:
where β is the wave vector in vacuum and nref is the fiber refractive index.
After deformation is applied to this segment of fiber, its length and refractive index change by Δl and Δnref, respectively. The corresponding phase change can be expressed as follows [32]:
where x is the location along the fiber, t is time, ε(x, t) = Δl/l is the dynamic strain of the fiber, Cε = Δnref/ε is the constant coefficient of variation in the refractive index with strain, and l can be regarded as the spatial resolution of the DAS system, indicating the ability to separate the RBS from different positions. Therefore, the DAS system receives a vibration signal at every spatial resolution length, thereby converting the entire length of the fiber into a distributed array of sensing channels.
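As a rough sketch of how the strain dependence arises, the phase delay can be expanded to first order in the changes of length and refractive index. The round-trip factor of 2 below is an assumption (conventions differ between references), as is the first-order expansion; the remaining symbols follow the definitions given in this section.

```latex
\begin{aligned}
\phi &= 2\beta n_{ref}\, l \\
\Delta\phi(x,t) &\approx 2\beta\left(n_{ref}\,\Delta l + l\,\Delta n_{ref}\right) \\
&= 2\beta l\left(n_{ref} + C_{\varepsilon}\right)\varepsilon(x,t)
\end{aligned}
```

Under these assumptions, the measured phase change is proportional to the dynamic strain over the length l, which is why each resolution cell can be treated as one strain-sensing channel.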
Since the purpose of this study is to locate the impact signal along the longitudinal axis of the pipeline, we used the time-of-arrival method, which was previously adopted by Hong for thunder source location [33]. Compared with point-type sensors, the key advantage of this approach lies in the high spatial sampling density of DAS. When a sufficiently strong impact occurs, numerous vibration arrival times can be selected within its wavefield range using DAS. This abundance of data reduces localization errors. As shown in Figure 1, after an impact occurs, stress waves are generated instantaneously and act on the fiber. Assuming that the stress waves propagate at a uniform speed v from the vibration source along the pipeline in the upstream and downstream directions, each sensing channel receives the vibration signal sequentially, from the closest to the farthest. The optimal vibration source location x0 can be obtained by minimizing the misfit between the calculated and selected travel times:

$$x_0 = \arg\min_{x_0,\,t_0} \sum_{ch \in CH_{sel}} \left| t_{ch}^{cal} - \left( t_{ch}^{sel} - t_0 \right) \right|^2, \qquad t_{ch}^{cal} = \frac{\left| x_{ch} - x_0 \right|}{v} \tag{3}$$

where $t_{ch}^{cal}$ is the calculated travel time at channel ch, xch is the coordinate of channel ch along the pipeline, $t_{ch}^{sel}$ is the selected arrival time of the vibration signal at channel ch, t0 is the calculated initial time of vibration, and CHsel is the set of selected sensing channels within the wavefield range of the vibration.
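The time-of-arrival fit can be illustrated with a minimal grid search. This is a sketch under stated assumptions rather than the implementation used in the study: the misfit is taken as a sum of squared residuals, the wave speed v is treated as known, and the function name `locate_source` and all numerical values are hypothetical.

```python
import numpy as np

def locate_source(x_ch, t_sel, v, x_grid):
    """Grid-search the source position x0 that best explains the picked arrivals.

    For each candidate x0 the travel time is |x_ch - x0| / v, and the origin
    time t0 that minimizes the squared misfit is the mean of (t_sel - travel).
    """
    x_ch = np.asarray(x_ch, dtype=float)
    t_sel = np.asarray(t_sel, dtype=float)
    best = (np.inf, None, None)
    for x0 in x_grid:
        travel = np.abs(x_ch - x0) / v
        t0 = np.mean(t_sel - travel)               # closed-form least-squares t0
        misfit = np.sum((travel - (t_sel - t0)) ** 2)
        if misfit < best[0]:
            best = (misfit, x0, t0)
    return best[1], best[2]

# Synthetic check with made-up values (not the field data).
rng = np.random.default_rng(0)
x_ch = np.arange(0.0, 500.0, 5.0)                  # channel coordinates, m
true_x0, true_t0, v = 212.0, 0.35, 1200.0          # assumed source and wave speed
t_sel = true_t0 + np.abs(x_ch - true_x0) / v + rng.normal(0.0, 2e-4, x_ch.size)
x0_hat, t0_hat = locate_source(x_ch, t_sel, v, np.arange(0.0, 500.0, 0.5))
```

Because channels on both sides of the source contribute, a shift in x0 cannot be absorbed by t0, which is what makes the position identifiable from arrival times alone.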
3. Method
3.1. Basic Process
In this study, we propose a two-step method for locating impact signals within noisy DAS-recorded signals. The first step focuses on adaptively extracting the impact signal component from water flow noise. This is achieved by using VMD to decompose the raw signal into various IMFs and then applying STEE to quantify the impulsiveness of these IMFs and identify the specific IMF containing the primary impact vibration.
Once the relevant impact signal component is selected, the second step automatically selects its arrival time. The PELT algorithm segments the extracted signal by detecting all significant change points. Next, DTW measures the shape similarity between these variable-length segments. Finally, an unsupervised clustering method uses these DTW distances to group the segments, distinguishing the impact signal segment(s) from noise clusters based on structural dissimilarities and allowing the precise onset time to be determined. This process yields accurate arrival times, which are essential for source localization. The flowchart of the proposed method is shown in Figure 2. The proposed method was developed with Python 3.9.
3.2. Step 1: Adaptive Extraction of Impact Signals from Water Flow Noise Based on VMD-STEE
In this study, the VMD algorithm is employed to process signals recorded by DAS, enabling the extraction of impact-induced vibration signals from background water flow noise.
The main purpose of VMD is to identify several modes, each band-limited around a specific central frequency, such that the sum of the modes accurately reconstructs the original input signal. This decomposition is achieved by formulating and solving a variational problem, where the primary objective is to minimize the sum of the bandwidths of all extracted modes. When the vibration signal ϕ(t) is decomposed into K modes, the objective function of VMD can be formulated as follows:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = \phi(t) \tag{4}$$

where $u_k(t)$ denotes the k-th mode decomposed from ϕ(t), $\omega_k$ denotes the center frequency of the k-th mode, $\delta(t)$ is the Dirac delta function, $j$ is the imaginary unit, the asterisk $*$ signifies the convolution operation, and $\partial_t$ represents the partial derivative with respect to time.
By incorporating the quadratic penalty factor α and a Lagrange multiplier λ(t), the constrained variational problem in Equation (4) is transformed into an unconstrained variational problem. The resulting augmented Lagrangian function is

$$\mathcal{L}\left(\{u_k\},\{\omega_k\},\lambda\right) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| \phi(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, \phi(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{5}$$

The saddle point of the augmented Lagrangian function in Equation (5) is then found using the Alternating Direction Method of Multipliers (ADMM). ADMM is an iterative optimization technique that updates the modes $u_k$, their corresponding center frequencies $\omega_k$, and the Lagrange multiplier λ in the frequency domain until the convergence criteria on relative and absolute tolerance are satisfied. In this study, the penalty factor was set to 2500, and the relative and absolute tolerances were set to 5 × 10⁻³ and 5 × 10⁻⁶, respectively.
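The frequency-domain updates can be sketched in a few lines of NumPy. This is a simplified illustration, not the implementation used in the study: it omits the boundary mirroring common in published VMD codes, uses a single relative-change stopping rule instead of separate relative and absolute tolerances, and defaults the dual step `tau` to zero (no Lagrange multiplier update). The function name `vmd_sketch`, its defaults other than the penalty factor, and the test signal are assumptions.

```python
import numpy as np

def vmd_sketch(signal, K, alpha=2500.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD; returns (modes, center frequencies in cycles/sample)."""
    x = np.asarray(signal, dtype=float)
    n = x.size
    freqs = np.fft.fftfreq(n)
    pos = freqs >= 0
    f_plus = np.where(pos, np.fft.fft(x), 0.0)     # one-sided spectrum
    omega = 0.5 * np.arange(K) / K                 # uniform initial centers
    u_hat = np.zeros((K, n), dtype=complex)
    lam = np.zeros(n, dtype=complex)
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Wiener-filter update of mode k around its current center frequency
            u_hat[k] = (f_plus - others + lam / 2) / (
                1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Center frequency = power-weighted mean of the positive frequencies
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.dot(freqs[pos], power) / (power.sum() + 1e-30)
        lam = lam + tau * (f_plus - u_hat.sum(axis=0))   # dual ascent (off if tau=0)
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (
            np.sum(np.abs(u_prev) ** 2) + 1e-30)
        if change < tol:
            break
    modes = 2.0 * np.real(np.fft.ifft(u_hat, axis=1))    # back to real signals
    order = np.argsort(omega)
    return modes[order], omega[order]

# Two-tone test signal at 0.05 and 0.2 cycles/sample (illustrative only).
idx = np.arange(1000)
sig = np.sin(2 * np.pi * 0.05 * idx) + np.sin(2 * np.pi * 0.2 * idx)
modes, centers = vmd_sketch(sig, K=2)
```

Each mode update is a narrow-band filter centered on the current estimate of its center frequency, which is the mechanism credited above with suppressing mode mixing.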
Pre-setting an appropriate value for K is essential because it directly determines the quality of the IMFs. Specifying too small a K value leads to under-decomposition, where water flow noise contaminates the IMFs containing the impact signal. Conversely, too large a K value results in over-decomposition, splitting the impact signal across multiple IMFs. Both scenarios impede the accurate selection of vibration arrival times in the next step. Various generalized information entropy measures, such as spectral entropy [34] and frequency band entropy [35], are widely applied across different fields, often to quantify specific characteristics of time series data. Since the target is an impact signal with transient characteristics, this study uses the concept of STEE [36] to quantify the impulsiveness of the IMFs derived from VMD. STEE is further utilized to adaptively determine the optimal value of K for VMD. Short-time energy reflects the temporal variations in signal energy. STEE is derived by calculating the information entropy of the energy distribution across all short-time windows within the signal segment. Its specific definition is provided below.
For the k-th IMF $u_k$ divided into N windows of length w, the short-time energy $E_{k,n}$ in the n-th window is

$$E_{k,n} = \sum_{i=(n-1)w+1}^{nw} u_k^2(i), \qquad n = 1, \dots, N \tag{6}$$

STEE is then computed as

$$H_k = -\frac{1}{\ln N} \sum_{n=1}^{N} P_{k,n} \ln P_{k,n}, \qquad P_{k,n} = \frac{E_{k,n}}{\sum_{m=1}^{N} E_{k,m}} \tag{7}$$

where $u_k(i)$ is the i-th sample of the k-th IMF, $H_k$ represents the STEE of the k-th IMF, N is the total number of short-time windows, and $P_{k,n}$ denotes the proportion of energy in the n-th window relative to the total energy across all windows for the k-th IMF. Based on this definition, the STEE value is bounded between 0 and 1. Using STEE, the IMFs obtained from VMD can be classified into noise components and effective vibration components. A higher degree of transient behavior in an IMF, with energy concentrated within a few short-time windows, results in a lower STEE value. Conversely, for a perfectly stationary signal where energy is evenly distributed ($P_{k,n} = 1/N$ for all n), the STEE value reaches its maximum of 1.
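A minimal sketch of the STEE computation follows. It assumes non-overlapping windows and normalization by ln N so that the value lies in [0, 1], as described above; the function names and the choice to drop a trailing partial window are assumptions.

```python
import numpy as np

def short_time_energy(x, w):
    """Energy in consecutive non-overlapping windows of w samples.

    Samples left over after the last full window are dropped.
    """
    x = np.asarray(x, dtype=float)
    n_win = x.size // w
    return np.sum(x[: n_win * w].reshape(n_win, w) ** 2, axis=1)

def stee(x, w):
    """Short-time energy entropy, normalized by ln(N) so that 0 <= H <= 1."""
    energy = short_time_energy(x, w)
    n_win = energy.size
    if n_win < 2 or energy.sum() == 0.0:
        raise ValueError("need at least two windows and non-zero energy")
    p = energy / energy.sum()
    p = p[p > 0]                          # treat 0 * ln(0) as 0
    return float(-np.sum(p * np.log(p)) / np.log(n_win))

flat = np.ones(1000)                      # energy spread evenly over all windows
spike = np.zeros(1000)
spike[500] = 1.0                          # all energy in a single window
h_flat, h_spike = stee(flat, 50), stee(spike, 50)
```

The two test signals mark the extremes of the scale: evenly distributed energy gives the maximum entropy, while energy confined to one window gives the minimum.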
Kurtosis is a classical metric used to characterize the impulsiveness of a signal. However, its stability as an assessment tool can be limited in complex scenarios involving multiple impacts. In contrast, the proposed STEE exhibits lower sensitivity to the number of impacts, rendering it potentially more suitable as a stable indicator in such multi-impact scenarios. To illustrate this, consider a transient signal fT(t) modeled as a damped sine wave, formulated as follows:
where κ is the time constant and fc is the central frequency of the signal, set to 100 and 500 Hz, respectively.
Figure 3 displays simulated waveforms within a 1 s window containing five instances of this impact signal. The corresponding Short-Time Energy (STE) sequences are plotted alongside. To simulate ambient noise, white Gaussian noise with a power of −30 dBW was added to the signal. The STE sequences clearly show the concentration of signal energy around the impact events, exhibiting distinct peaks at these instants. Furthermore, Figure 4 presents the calculated kurtosis and STEE values for different numbers of impacts. The results reveal a monotonic decrease in kurtosis as the number of impacts increases from one to five (from 82.39 to 27.15). In contrast, the STEE value shows only a modest increase (from 0.499 to 0.579). The baseline signal with zero impacts (representing background noise) exhibits the lowest kurtosis (2.90) and the highest STEE (0.995), approaching 1, as theoretically expected. The STEE values span a considerably smaller range than the kurtosis values, which suggests that STEE offers superior stability when evaluating signals containing multiple transient impact events.
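The comparison between kurtosis and STEE can be reproduced qualitatively with a short simulation. Several settings here are assumptions that the text does not state: the damped sine is taken as exp(−κt)·sin(2πfct) with unit amplitude, the sampling rate is 5000 Hz, the short-time window is 50 samples (10 ms), and impacts are spaced 0.2 s apart. The exact values will therefore differ from those reported for Figure 4.

```python
import numpy as np

FS = 5000                     # assumed sampling rate, Hz
KAPPA, FC = 100.0, 500.0      # decay constant and central frequency from the text
W = 50                        # assumed short-time window: 50 samples = 10 ms

def damped_sine(t):
    # Assumed waveform: unit-amplitude exponentially damped sine
    return np.exp(-KAPPA * t) * np.sin(2 * np.pi * FC * t)

def kurtosis(x):
    # Pearson (non-excess) kurtosis; equals 3 for Gaussian data
    d = x - x.mean()
    return float(np.mean(d ** 4) / np.mean(d ** 2) ** 2)

def stee(x, w):
    n_win = x.size // w
    energy = np.sum(x[: n_win * w].reshape(n_win, w) ** 2, axis=1)
    p = energy / energy.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)) / np.log(n_win))

def simulate(n_impacts, rng):
    t = np.arange(FS) / FS
    x = rng.normal(0.0, np.sqrt(1e-3), FS)       # -30 dBW noise: variance 1e-3
    for onset in np.arange(n_impacts) * 0.2 + 0.1:
        idx = int(round(onset * FS))
        x[idx:] += damped_sine(t[: FS - idx])
    return x

rng = np.random.default_rng(1)
kurt, ent = [], []
for m in range(6):                               # 0 to 5 impacts
    x = simulate(m, rng)
    kurt.append(kurtosis(x))
    ent.append(stee(x, W))
```

Under these assumptions, the kurtosis should fall steeply as impacts are added, while STEE changes comparatively little, which mirrors the trend described for Figure 4.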
To effectively apply VMD to extract relevant vibration signals from water flow data, this study employs a stepwise incremental approach to determine the optimal number of modes K. K starts at a minimum of 2 and is increased iteratively. At each step, the STEE is calculated for all resulting IMFs. If the minimum STEE value among all IMFs is less than a predefined threshold, the search stops and the IMF corresponding to this minimum STEE is identified as the most prominent vibration component. This selected IMF is then utilized for subsequent arrival time determination. To prevent the potential misidentification of valid components caused by signal over-decomposition, the maximum value of K is set to 14. Furthermore, the characterization of signal impulsiveness by STEE is influenced by the choice of short-time window length w. A detailed analysis of this parameter will be presented in the following sections.
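The stepwise search over K can be sketched as a simple loop. To keep the example self-contained, a hypothetical stand-in decomposer (`band_split`, which cuts the spectrum into K contiguous bands) takes the place of VMD; the threshold of 0.6, the window length, and the toy signal are likewise assumptions chosen for illustration.

```python
import numpy as np

def stee(x, w):
    """Short-time energy entropy, normalized to [0, 1] by ln(number of windows)."""
    x = np.asarray(x, dtype=float)
    n_win = x.size // w
    energy = np.sum(x[: n_win * w].reshape(n_win, w) ** 2, axis=1)
    p = energy / energy.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)) / np.log(n_win))

def band_split(signal, K):
    """Hypothetical stand-in for VMD: split the spectrum into K contiguous bands."""
    spec = np.fft.rfft(signal)
    modes = []
    for idx in np.array_split(np.arange(spec.size), K):
        part = np.zeros_like(spec)
        part[idx] = spec[idx]
        modes.append(np.fft.irfft(part, n=len(signal)))
    return np.array(modes)

def select_impact_imf(signal, decompose, w, threshold, k_min=2, k_max=14):
    """Increase K until some IMF has STEE below the threshold.

    Returns (K, imf), or (None, None) if no IMF qualifies up to k_max.
    """
    for K in range(k_min, k_max + 1):
        imfs = decompose(signal, K)              # expected shape: (K, n_samples)
        scores = [stee(imf, w) for imf in imfs]
        best = int(np.argmin(scores))
        if scores[best] < threshold:
            return K, imfs[best]
    return None, None

# Toy signal: 50 Hz tone plus noise, with one 700 Hz burst starting at 0.4 s.
fs = 5000
t = np.arange(fs) / fs
rng = np.random.default_rng(2)
x = np.sin(2 * np.pi * 50 * t) + rng.normal(0.0, 0.03, fs)
x[2000:] += np.exp(-100 * t[: fs - 2000]) * np.sin(2 * np.pi * 700 * t[: fs - 2000])
K_opt, imf = select_impact_imf(x, band_split, w=50, threshold=0.6)
```

In this toy case the search stops at the first K for which the burst is separated from the continuous tone; with real VMD the `decompose` argument would be replaced by the actual decomposition routine.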
3.3. Step 2: Automatic Arrival Time Selection Based on PELT and DTW-AHC
The PELT method builds upon the optimal partitioning method [37] by introducing a pruning strategy, allowing for the detection of change points with low computational cost and high accuracy. Its core idea is to find a segmentation that ensures the homogeneity of the statistical properties within each segment while maximizing the distinction between adjacent segments. Specifically, for a time series $y_{1:T} = (y_1, \dots, y_T)$ (representing the effective vibration signal), the set of segmentation points $\tau = \{\tau_1, \dots, \tau_M\}$ is determined by minimizing the sum of fitting costs $C(\cdot)$ within all segments, as shown in the following equation:

$$F(T) = \min_{\tau} \sum_{m=1}^{M+1} \left[ C\left( y_{(\tau_{m-1}+1):\tau_m} \right) + \gamma \right] = \min_{\tau_M} \left\{ F(\tau_M) + C\left( y_{(\tau_M+1):T} \right) + \gamma \right\} \tag{9}$$

where $C(\cdot)$ represents the cost function for a segment, γ is a penalty term for the number of change points, and $\tau_0 = 0$ and $\tau_{M+1} = T$ by convention. The equation recursively determines the last change point, $\tau_M$, based on the optimal partitioning up to that point. To improve computational efficiency, candidate change points τ that provably cannot be part of the optimal solution are pruned during each recursive step. Following the AR-AIC, the vibration signal is modeled as normally distributed with a segment-specific variance, leading to a cost function defined as twice the negative log-likelihood.
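The recursion and pruning rule can be illustrated with a compact implementation for changes in variance. This is a sketch under assumptions, not the code used in the study: the data are treated as zero-mean Gaussian, the cost is twice the negative log-likelihood up to an additive constant, and the minimum segment length, penalty value, and function name `pelt_variance` are hypothetical choices.

```python
import numpy as np

def pelt_variance(y, penalty, min_size=10):
    """PELT change-point detection for zero-mean Gaussian data with changing variance."""
    y = np.asarray(y, dtype=float)
    n = y.size
    csum = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def cost(s, t):
        # Twice the negative log-likelihood of y[s:t], up to an additive constant
        m = t - s
        var = max((csum[t] - csum[s]) / m, 1e-12)
        return m * np.log(var)

    F = np.full(n + 1, np.inf)          # F[t]: optimal penalized cost of y[:t]
    F[0] = -penalty
    last = np.zeros(n + 1, dtype=int)   # last[t]: final change point before t
    candidates = [0]
    for t in range(min_size, n + 1):
        vals = np.array([F[s] + cost(s, t) + penalty for s in candidates])
        best = int(np.argmin(vals))
        F[t] = vals[best]
        last[t] = candidates[best]
        # Pruning: drop s that can never again be the optimal last change point
        candidates = [s for s, v in zip(candidates, vals) if v - penalty <= F[t]]
        candidates.append(t - min_size + 1)
    cps, t = [], n
    while last[t] > 0:                  # backtrack from the end of the series
        t = int(last[t])
        cps.append(t)
    return cps[::-1]

# Quiet / loud / quiet test series with true changes at samples 300 and 400.
rng = np.random.default_rng(3)
y = np.concatenate([rng.normal(0, 0.1, 300),
                    rng.normal(0, 1.0, 100),
                    rng.normal(0, 0.1, 200)])
change_points = pelt_variance(y, penalty=10 * np.log(y.size))
```

Unlike a single-minimum picker, this returns every change point that survives the penalty, so the onset of a burst and its end are both recovered as segment boundaries.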
Following segmentation by PELT, the DTW algorithm [38] is utilized to quantify the shape similarity between the resulting segments of varying lengths. DTW finds the optimal alignment between two time series by non-linearly warping the time axis, thereby minimizing the cost required to match their shapes. Given two segments obtained by PELT, $A = (a_1, \dots, a_I)$ and $B = (b_1, \dots, b_J)$, a warping path $W = (w_1, \dots, w_L)$ with $w_l = (i_l, j_l)$ defines the alignment between their corresponding elements. The goal of DTW, subject to monotonicity constraints, is to find the path that minimizes the cumulative distance $D_{DTW}(A, B)$ over all possible paths, expressed as follows:

$$D_{DTW}(A, B) = \min_{W} \frac{1}{L} \sum_{l=1}^{L} d\left( a_{i_l}, b_{j_l} \right) \tag{10}$$

where L is the length of the warping path. Normalizing by L accounts for the fact that longer signals tend to accumulate larger total distances, providing the distance per unit path length. $d(a_i, b_j)$ signifies the Euclidean distance between elements $a_i$ and $b_j$. The optimal warping path is found recursively by backtracking through the distance matrix from the final cell (aligning the full sequences) to the initial cell. The cumulative distance $D(i, j)$ for each cell is calculated as follows:

$$D(i, j) = d(a_i, b_j) + \min\left\{ D(i-1, j),\; D(i, j-1),\; D(i-1, j-1) \right\} \tag{11}$$

This formula indicates that the cumulative distance is the sum of the distance at the current cell and the minimum cumulative distance from the valid neighboring cells to $(i, j)$. The DTW distance between the two sequences is the value of the final element of the cumulative distance matrix, $D(I, J)$, signifying the minimum cost to align them.
After calculating the pairwise distances between all PELT-generated segments using DTW, these distances are used for clustering.
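A minimal DTW sketch with path-length normalization is given below. One simplification is worth stating: rather than minimizing the normalized cost over all paths, it finds the standard minimum-cost path by dynamic programming and then divides by the length of that path, which is a common approximation. The function name and the test sequences are assumptions.

```python
import numpy as np

def dtw_distance(a, b):
    """Path-length-normalized DTW distance between two 1-D sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = a.size, b.size
    D = np.full((n + 1, m + 1), np.inf)           # cumulative cost matrix
    steps = np.zeros((n + 1, m + 1), dtype=int)   # path length to each cell
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])          # Euclidean distance in 1-D
            prev = [(D[i - 1, j - 1], steps[i - 1, j - 1]),
                    (D[i - 1, j], steps[i - 1, j]),
                    (D[i, j - 1], steps[i, j - 1])]
            best_cost, best_steps = min(prev)     # ties go to the shorter path
            D[i, j] = d + best_cost
            steps[i, j] = best_steps + 1
    return float(D[n, m] / steps[n, m])

a = np.sin(np.linspace(0, 2 * np.pi, 50))
b = np.sin(np.linspace(0, 2 * np.pi, 80))         # same shape, different length
d_same = dtw_distance(a, a)
d_stretch = dtw_distance(a, b)
d_flip = dtw_distance(a, -a)                      # inverted shape
```

The stretched copy stays close to the original despite the length mismatch, whereas the inverted waveform does not, which is the property needed when comparing PELT segments of unequal length.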
For impact signal segment detection, this study employs an unsupervised learning framework that combines DTW with a clustering method. The onset time of the detected impact segment is taken as the arrival time. Based on the DTW distance matrix, an agglomerative hierarchical clustering (AHC) algorithm [39] is applied to structure the time segment data, which can be visualized as a dendrogram. This procedure operates on the assumption that noise segments tend to cluster together at lower distance thresholds, forming a large main cluster. In contrast, signal segments, due to their distinctiveness, remain separate and merge with the main noise cluster only at a much larger distance. Consequently, the significant separation distance between the impact signal cluster and the noise cluster can be used as an indicator to distinguish them; for instance, the maximum inter-cluster merge distance can be used for this purpose. This unsupervised approach avoids the need to preset the number of clusters and directly leverages the structural dissimilarity within the data to identify the impact signal. Finally, the arrival time of the signal can be selected automatically.
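The clustering step can be sketched with a small agglomerative routine operating directly on a precomputed distance matrix. The choices below are assumptions made for illustration: average linkage is used, the dendrogram is cut at the final (largest) merge, and the smaller of the two resulting clusters is labeled as the impact candidate. The toy distance matrix stands in for DTW distances.

```python
import numpy as np

def split_by_largest_merge(dist):
    """Average-linkage agglomerative clustering on a distance matrix.

    Returns (labels, heights): labels are 0/1 for the two clusters joined by
    the final merge, with 1 marking the smaller cluster; heights lists the
    merge distances in order.
    """
    dist = np.asarray(dist, dtype=float)
    if dist.shape[0] < 2:
        raise ValueError("need at least two segments")
    clusters = [[i] for i in range(dist.shape[0])]
    heights = []
    while len(clusters) > 2:
        best = (np.inf, 0, 1)
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                h = dist[np.ix_(clusters[p], clusters[q])].mean()
                if h < best[0]:
                    best = (h, p, q)
        h, p, q = best
        heights.append(h)
        clusters[p] = clusters[p] + clusters[q]   # merge the closest pair
        del clusters[q]
    heights.append(dist[np.ix_(clusters[0], clusters[1])].mean())
    small = min(clusters, key=len)
    labels = np.zeros(dist.shape[0], dtype=int)
    labels[small] = 1
    return labels, heights

# Toy example: five similar "noise" segments and one outlier (index 3).
feature = np.array([0.0, 0.1, 0.05, 5.0, 0.12, 0.02])
dist = np.abs(feature[:, None] - feature[None, :])
labels, heights = split_by_largest_merge(dist)
```

The gap between the last merge height and the preceding ones is what signals a distinct segment; when that gap is small, no segment stands out and the result should be treated with caution.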
4. Field Test
In this study, an impact experiment was conducted on an in-service water pipeline to verify the proposed localization method, as shown in Figure 5. The pipeline mainly consists of PCCPs. It has an inner diameter of 4 m and comprises individual sections of 5 m in length. The pipe sections are connected by bell-and-spigot joints. The burial depth varies from 3 m to 9 m. During the experiment, the water flow rate within the pipeline was maintained at 21 m³/s. This corresponds to an average internal flow velocity of approximately 1.67 m/s, which is close to the upper limit of the economical flow velocity range of 0.9 to 1.8 m/s [40]. The tested pipeline section is designed for a maximum flow rate of 25 m³/s, so the experimental flow condition reflects a typical operating condition for this pipeline.
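The quoted flow velocity follows directly from the flow rate and the pipe cross-section, as the short calculation below shows. The kinematic viscosity used for the Reynolds number is an assumed value for water at about 20 °C and is not given in the text.

```python
import math

Q = 21.0        # volumetric flow rate, m^3/s (from the text)
D = 4.0         # inner diameter, m (from the text)
NU = 1.0e-6     # kinematic viscosity of water, m^2/s (assumed, about 20 degC)

area = math.pi * D ** 2 / 4          # cross-sectional area, m^2
velocity = Q / area                  # mean flow velocity, m/s
reynolds = velocity * D / NU         # Reynolds number based on diameter
```

With these inputs the mean velocity is about 1.67 m/s and the Reynolds number is about 6.7 × 10⁶, far above the range usually associated with the transition to turbulence.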
The DAS system is manufactured by Ningbo AllianStream Photonics Technology Co., Ltd., Ningbo, China. The type of DAS interrogator used in the test was ixDAS-2000, which recorded signals along the fiber with a spatial resolution of 4.91 m and a sampling frequency of 5000 Hz. The system operated with a pulse width of 50 ns, and the acquisition card had a sampling frequency of 250 MHz. The refractive index of the optical fiber was 1.467. The sensing optic cable, encasing the fiber, was deployed longitudinally inside the pipeline without being fixed, and mainly recorded the vibration signals propagating through the water medium. It is a custom cable, suitable for use in drinking water and designed with strong waterproofing and mechanical strength to withstand the internal pipeline environment. The cable was threaded through a special cable entry assembly installed on the steel pipe section, and laid in the downstream direction.
To simulate potential TPI events, impacts were manually induced on the pipeline using two types of rebound hammers with different impact energy (IE) levels: a concrete rebound hammer (IE = 2.2 J) and a high-strength concrete rebound hammer (IE = 4.5 J). The rebound hammers are manufactured by Beijing Hichance Technology Co., Ltd., Beijing, China. The varying impact energy levels provided by the two hammer types were intended to simulate different threat scenarios, such as manual excavation or mechanical drilling activities near the pipeline. Prior to the experiment, the rebound hammers were calibrated to ensure consistent IE delivery. Impact positions were selected at two accessible air valve chambers (Source 1# and Source 2#), as indicated in Figure 5. These two impact locations were situated 2430 m apart along the pipeline axis. At each location, impacts were induced 10 times using each type of hammer. The location results from the two distinct impact points allowed for cross-validation of the method's errors against the known ground truth positions of the impacts.
6. Discussion
A novel two-step method was presented for localizing impact vibration sources on pipelines using DAS-recorded data. The main challenge lies in accurately detecting and timing the arrival of transient impact signals, particularly in operational scenarios characterized by high spatiotemporal noise variability and overlapping transient signals, when using internally deployed, unfixed sensing cables.
In the first step, the proposed VMD-STEE method proved suitable for adaptively extracting the relevant impact signal component observed from the operational pipeline. As shown in Figure 6, the background noise exhibits significant spatiotemporal variability across different sensing channels. The use of STEE provided a metric for quantifying the impulsiveness of each mode, allowing the impact component to be reliably distinguished from the more continuous flow-induced noise regardless of its specific amplitude or frequency content. ROC analysis yielded an optimal short-time window length for the STEE calculation, maximizing the precision and specificity needed to reliably identify the impact component while minimizing false positives due to noise misclassification. As shown in Figure 9, IMF1, which is associated with the impact event and exhibits the lowest STEE, is clearly distinguishable from background noise components such as IMF4. This distinction facilitates accurate arrival time determination for subsequent localization.
The second step, combining PELT and DTW-AHC, focused on accurate arrival time selection by segmenting signals into homogeneous regions and clustering them based on shape similarity. Under typical conditions, as shown in Figure 16, the amplitude of the selected impact signal segments (Segment 2) significantly exceeded that of the noise segments. Nevertheless, the arrival time selection method proposed in this study also performs robustly in complex-noise environments. During the experiment, the water flow rate was maintained at 21 m³/s, with a Reynolds number of 6.7 × 10⁶, indicating a fully turbulent flow regime. Under such hydrodynamic conditions, the optical cable oscillated continuously due to the water flow, leading to dynamic deformations in the optical fiber. As shown in Figure 10, despite the signal extraction by VMD-STEE, considerable noise persisted in the extracted impact signal. Notably, Segments 6, 13, and 15 exhibited higher amplitudes than other segments. Despite this, the subsequent arrival time selection step, depicted in Figure 11, effectively identified the true impact. Hierarchical clustering shows that Segment 26 displayed a markedly larger merging distance than other clusters, confirming it as the primary impact signal. These results validate that the proposed method captures the correct onset of impact signals, overcoming the limitations of threshold-dependent or global-optimization-based approaches in complex vibration environments.
Field tests under operational pipeline conditions validated the practical applicability of the proposed method. Localization results were consistent across two distinct source locations and two impact energy levels, achieving standard deviations typically between 1.42 m and 1.98 m. For repeated impacts at the same location, the maximum difference between calculated positions did not exceed 6.55 m, which is shorter than the combined length of two 5 m pipe sections. These results provide a reliable foundation for structural assessments of pipelines.
The results also indicate the need for further investigation. While STEE demonstrated reliable performance in detecting manual impacts, its effectiveness in identifying low-energy vibrations has yet to be validated. The integration of machine learning classifiers trained on a variety of anomalous vibration patterns may extend the method's applicability to broader scenarios. Additionally, as shown in Figure 15, the analysis of travel time residuals highlights the complex physics of wave interaction within the pipe structure and the fluid medium. The trend of negative residuals near the source and positive residuals farther away suggests a transition in the dominant wave-propagation path. This observation indicates the limitation of assuming a constant wave velocity for all propagation paths and distances, although the near-zero mean of the residuals suggests that this assumption provides a reasonable average fit.