1. Introduction
Phase-sensitive optical time-domain reflectometry (Φ-OTDR) stands as a cornerstone advancement in distributed optical fiber sensing. The analysis of phase perturbations in the Rayleigh backscattered light within the optical fiber enables high-sensitivity, long-range, and real-time localization and monitoring of vibration events along the sensing cable [
1]. Its unique, fully distributed sensing architecture requires only a single fiber to achieve coverage extending over tens of kilometers. This technology also boasts remarkable advantages, including immunity to electromagnetic interference, high concealment, and excellent environmental adaptability [
2]. Consequently, Φ-OTDR has been widely deployed in diverse fields such as security early warning systems for oil and gas pipelines [
3,
4,
5], seismic wave detection [
6,
7,
8], perimeter security for border areas [
9,
10,
11], and underwater activity monitoring [
12,
13].
When external vibrations act upon an optical fiber, they alter the fiber’s refractive index or length, resulting in minute phase shifts in the backscattered Rayleigh light [
14]. However, the randomly distributed scattering points in the optical fiber can cause destructive interference of the Rayleigh backscattered light from the coherent detection pulse. This interference may introduce noise into the Φ-OTDR intensity signal, resulting in significant errors during phase demodulation and potentially leading to false alarms [
15,
16]. Therefore, accurately locating the positions of interference fading points in optical fibers becomes a critical prerequisite for mitigating their adverse effects. Given that fading points manifest as statistical outliers with amplitudes significantly deviating from normal points, outlier detection methods such as the standard deviation approach or Grubbs’ test have been proposed as alternative solutions. However, the standard deviation method [
17] is constrained by its assumption of a normal distribution and its sensitivity to multiple outliers, while Grubbs’ test [
18] performs poorly in handling large volumes of outliers and involves subjective selection of the significance level.
Given the density characteristics observed in the data points, the application of a density-based clustering algorithm is highly appropriate. This method is capable of identifying high-density areas and marking low-density points as noise or outliers. Commonly used algorithms include DBSCAN, FDBSCAN, and DBSCAN++. Among them, DBSCAN is a foundational algorithm proposed by Ester et al. in 1996. It is designed to discover clusters of arbitrary shapes within spatial or high-dimensional data that contain noise [
19]. The DBSCAN algorithm requires two global parameters: the neighborhood radius (eps) and the minimum number of points (MinPts). An object is classified as a core point if there are at least MinPts neighbors within its eps. All objects within the neighborhood of a core point are directly density-reachable and form an initial cluster. This cluster is then expanded by incorporating the neighborhoods of any other core points within it. Objects inside a cluster that do not meet the core point condition are labeled as border points, while any object that is not reachable from any core point is categorized as noise. FDBSCAN eliminates redundant neighborhood evaluations by directly assigning core points to clusters. These clusters may either retain their initial assignments or expand through density-reachable overlaps with adjacent clusters [
20]. If a border object is adjacent to a core point, the algorithm assigns it to the cluster containing that core point. Otherwise, it is classified as noise. DBSCAN++ is a straightforward refinement of the original DBSCAN algorithm, which requires density calculations only for a selected subset of points [
21]. Its core principle operates on the principle of estimating densities for a subset of *m* data points (where *m* << *n*), rather than for the entire dataset of size *n*. Subsequently, standard DBSCAN operations are performed on this subset. However, these density-based clustering algorithms exhibit low computational efficiency when processing the high-dimensional data acquired by Φ-OTDR systems.
To address the aforementioned challenges, this paper proposes the AP-DBSCAN algorithm, which can autonomously determine its parameters, rapidly process high-dimensional data, and accurately identify the locations of fading points. The work begins with a detailed analysis of distributed optical fiber sensing, interference fading, and the differential principle of fading. Subsequently, the AP-DBSCAN algorithm is applied to tackle the problem of efficient fading point discrimination. To handle the high-dimensional data, principal component analysis (PCA) is employed to reduce dimensionality and extract salient features [
22]. The two parameters, eps and MinPts, are determined adaptively. Specifically, the eps value is derived by analyzing the k-distance distribution across the entire dataset [
23], whereas the MinPts parameter is determined through an analysis of the density distribution based on the pre-computed eps threshold. Following this, the identification of fading points is performed by applying the DBSCAN++ algorithm to the data. Following the identification of fading points, the corresponding locations in the phase signal are flagged as missing values. For fading suppression, existing methods include suppressing interference fading based on differential phase shift pulse technology [
24] and suppressing interference fading by using phase shift transformation [
25]. In this paper, the nearest-neighbor interpolation method is utilized. The positions of the fading points of the phase signal are set as missing values, and then the missing points are filled with the nearest non-missing values to eliminate the influence of interference fading. The superiority of the proposed AP-DBSCAN method was first validated through simulation experiments, demonstrating its advantages over DBSCAN, FDBSCAN, and DBSCAN++. Finally, an experimental system was set up to collect actual disturbance signals in order to verify the processing capability of the AP-DBSCAN method for actual signals.
3. AP-DBSCAN Principle and Fading Point Discrimination
Based on the characteristics of the data, principal component analysis (PCA) is employed for data processing. PCA performs unsupervised dimensionality reduction through eigendecomposition of the covariance matrix. It derives orthogonal components from eigenvectors of standardized data, ordered by descending eigenvalues. This transforms correlated variables into uncorrelated low-dimensional features with minimal information loss. Applied to Φ-OTDR signals, PCA extracts dominant components through dimensionality reduction. This critical preprocessing step significantly enhances subsequent fading point identification efficiency.
The adaptive determination of eps utilizes the K-distance method, where K = dimensionality + 1. This configuration ensures adaptive K-value selection while avoiding self-distance, thereby preventing local density misjudgment. For each data point, the distances to its K nearest neighbors are computed and sorted in ascending order. The eps value is then defined as a sufficiently high percentile of these sorted K-distances across the entire dataset. Subsequently, with eps determined, each sample point in the dataset is traversed to count the number of points lying within its eps-radius neighborhood. The arithmetic mean of these neighborhood counts across all samples is computed, and its ceiling (upward-rounded integer value) is designated as the MinPts parameter.
The two adaptively determined parameters are then utilized in conjunction with DBSCAN++. The core principle of DBSCAN++ involves data sampling, for which a uniform sampling strategy is adopted in this study. This approach preserves the original DBSCAN’s robustness to noise and its ability to identify clusters of arbitrary shapes while notably reducing computational complexity, thereby offering significant advantages when processing large-scale, high-dimensional data. A schematic diagram of the algorithm is presented in
Figure 3.
In summary, the AP-DBSCAN algorithm integrates PCA for initial data processing. The processed data subsequently undergoes clustering via DBSCAN++ with adaptively determined parameters, enabling rapid identification of fading point locations. Analysis of the differentiated amplitude information, as illustrated in
Figure 4a, reveals that normal points exhibit high density and form compact regions, whereas fading points demonstrate low density and sparse distribution. The red areas indicate fading regions, while the green areas represent normal regions. The result of applying the AP-DBSCAN algorithm to process the differentiated amplitude information is shown in
Figure 4b, where Cluster 1 corresponds to normal points and other clusters denote fading points. Following the identification of fading points, the nearest-neighbor interpolation method is employed to reconstruct the impaired data at these locations. The overall flowchart of the fading suppression process is presented in
Figure 5.
4. Experiments
4.1. Experiment on Fading Suppression in Vibration Signals
The experiment was conducted on a sensing fiber with a total length of 70,080 m. A PZT vibration at 500 Hz was applied to the fiber segment between 48,610 m and 48,850 m. The type of optical fiber used is G652D 9/125um single-mode optical fiber. The manufacturer is Fiberhome. To validate the algorithm’s capability for both long-range and short-range fading point identification, a segment from 50,000 m to 59,000 m was selected for long-range detection, and a segment from 35,000 m to 39,000 m was chosen for short-range detection, thereby verifying the algorithm’s feasibility and accuracy. As shown in
Figure 6a,b, which display the second derivative of the amplitude information for the long-range and short-range sections, respectively, the red areas indicate fading regions and the green areas represent normal regions.
The DBSCAN, FDBSCAN, DBSCAN++, and AP-DBSCAN algorithms were, respectively, applied to cluster the long-range and short-range regions. For DBSCAN, FDBSCAN, and DBSCAN++, the parameters were set as Eps = 5 and MinPts = 5. In contrast, AP-DBSCAN determined its parameters adaptively. Regarding clustering performance, as shown in
Figure 7a,b, the solid blue dots represent the fading locations identified by DBSCAN, FDBSCAN, and DBSCAN++, while the hollow red circles denote the fading locations identified by AP-DBSCAN. It can be observed that all four algorithms identified fading points at identical locations and in equal quantities across both the long- and short-range regions, confirming the feasibility of AP-DBSCAN.
In terms of clustering efficiency,
Table 1 presents the computational time required by the four clustering algorithms when processing the differentiated data. Compared to the three traditional clustering algorithms, the proposed AP-DBSCAN achieves identical accuracy in identifying fading locations. Among the conventional methods, DBSCAN exhibits the longest processing time while maintaining the highest accuracy in fading point localization. FDBSCAN reduces computational time compared to DBSCAN without compromising accuracy, primarily due to the elimination of redundant operations in overlapping regions of core objects’ neighborhoods. DBSCAN++ demonstrates significant improvement in processing speed over DBSCAN, owing to its data sampling procedure. The proposed AP-DBSCAN algorithm achieves the fastest processing performance by incorporating PCA technology, which substantially enhances computational speed while enabling autonomous parameter determination. This improvement results in a 30.28% to 72.51% increase in clustering efficiency compared to the other methods.
Following the identification of interference fading points, this study employs the nearest-neighbor interpolation method to efficiently reconstruct the impaired data at these locations, enabling rapid and accurate detection and correction of fading effects. As shown in
Figure 8a, from 48,610 m to 48,850 m is the vibration zone. Due to the influence of interference fading, the vibration signal is drowned out.
Figure 8b displays the reconstructed signal, where the vibration region becomes distinctly visible.
Figure 8c shows the phase difference information before correction, exhibiting chaotic phase jumps. After reconstruction, clear signal characteristics can be observed in
Figure 8d. Due to the fading effects, the original signal achieves a signal-to-noise ratio (SNR) of only 2.5985 dB prior to correction. Following reconstruction, the SNR improves to 2.6389 dB. The SNR is calculated as follows:
where
denotes the signal power, and
represents the noise power.
The amplitude signal is first analyzed through differential operation and AP-DBSCAN clustering to detect the locations of signal fading. If no fading regions are detected, the signal remains unaltered. For identified fading segments, the corresponding regions undergo correction based on neighborhood interpolation. The affected areas will be shielded, and the intervals of this shielding will subsequently be replaced by adjacent non-fading data points. When no adjacent values are available on the left side, the gaps are filled entirely using values from the right side, and vice versa. This process ensures phase continuity at all corrected locations, ultimately yielding a reconstructed phase signal.
4.2. Experiment on Fading Suppression in Underwater Acoustic Signals
The experiment utilized hydroacoustic signals collected from a lake, with the setup shown in
Figure 9. A Φ-OTDR system was shore-deployed with a 120 m coiled optical fiber cable. The optical fibers used in this study are specially customized ones. The cable’s end was anchored across the bank with a 10 m rope, stabilized at 1 m depth using 5 m spaced floats and counterweights. A kayak-mounted UPS-powered acoustic source traversed the cable path, emitting specific frequencies at designated locations.
The boat was positioned 50 m perpendicular and 27 m parallel to the cable front. A 36 V/180 Hz acoustic signal was emitted. Based on the fading discrimination framework and the proposed method, the fading point locations were rapidly identified. Interference fading was suppressed using the nearest-neighbor interpolation method.
Figure 10 shows the second-order derivative plot of the signal.
For the processed underwater acoustic signal after fading differentiation, DBSCAN, FDBSCAN, DBSCAN++, and AP-DBSCAN were applied to compare their performance in terms of processing time and accuracy for fading point identification. As shown in
Figure 11, the fading locations identified by the four algorithms are illustrated.
As illustrated in the figure, DBSCAN achieves relatively accurate identification of fading points in both high- and low-density regions. FDBSCAN exhibits an increased misjudgment rate compared to DBSCAN. This performance degradation stems from its limited capability in handling high-dimensional and large-volume datasets. Although FDBSCAN simplifies computational procedures by verifying whether core objects share labeled neighbors to facilitate cluster merging, this strategy introduces sequential dependencies: erroneous labeling in early stages can propagate inaccuracies, adversely impacting subsequent merging processes. Among the four algorithms, DBSCAN++ demonstrates the highest misjudgment rate. Its core approach involves uniform data sampling, which reduces dataset scale and mitigates redundancy, but introduces critical sensitivity to the eps and MinPts parameters. In particular, inadequate sampling density frequently leads to misclassification.
In comparison, the proposed AP-DBSCAN algorithm achieves fully autonomous parameter determination while maintaining a very low error rate. It adaptively configures eps and MinPts according to variations in data dimensionality and volume, thereby substantially enhancing detection accuracy. Relative to conventional DBSCAN, AP-DBSCAN attains an accuracy of 99.92%.
As can be observed from
Table 2, the processing time required by AP-DBSCAN is significantly shorter than that of the other algorithms. Overall, while maintaining accuracy, AP-DBSCAN demonstrates a substantial advantage in discrimination time, considerably reducing the overall computational overhead. This improvement results in a 67.33% to 76.29% increase in clustering efficiency compared to the other methods. Following the identification of fading points, the nearest-neighbor interpolation method is similarly applied to mitigate the effects of interference-induced fading, as illustrated in
Figure 12.
Figure 12a displays the uncorrected signal, where the overall signal is significantly affected by fading. Multiple locations exhibit substantial phase jumps due to fading interference. Moreover, the resulting fading noise has submerged the disturbance signal, with only a vague outline of the disturbance signal discernible in the figure.
Figure 12b presents the corrected signal, where most fading points have been eliminated, allowing for clear identification of the disturbance location.
Figure 12c shows the phase before correction, revealing severe phase jumps. In
Figure 12d, after correction, the phase jumps are normalized, exhibiting a sinusoidal pattern. Due to interference fading, the SNR before correction was 2.1713 dB. After correction, the SNR improved to 3.0647 dB.
5. Conclusions
This paper presents an AP-DBSCAN algorithm specifically designed for locating the fading points in φ-OTDR systems. To address the issue of low computational efficiency in processing high-dimensional signals acquired by Φ-OTDR systems, the method incorporates principal component analysis for feature extraction and dimensionality reduction. By integrating the K-distance method into DBSCAN++, it achieves adaptive determination of the eps and MinPts parameters. Simulation experiments demonstrate the superiority of AP-DBSCAN over DBSCAN, FDBSCAN, and DBSCAN++ in terms of temporal efficiency for fading point identification, and its effectiveness is further validated using underwater acoustic signals. Experimental results indicate that AP-DBSCAN achieves the optimal processing time compared to DBSCAN, FDBSCAN, and DBSCAN++, with recorded processing times of 21.163 min, 18.744 min, 15.361 min, and 5.017 min, respectively. This confirms that the proposed method significantly improves the efficiency of fading point discrimination while maintaining an accuracy rate of 99.92%. The interference fading effect on the signal is subsequently eliminated using nearest-neighbor interpolation. AP-DBSCAN successfully resolves the time-consuming challenges associated with processing high-dimensional signals and realizes full parameter adaptivity. This work holds significant importance for the efficient discrimination of fading points, suppression of interference fading, and reconstruction of signals in practical Φ-OTDR systems.