1. Introduction
Modern perimeter security for critical infrastructure requires reliable, real-time detection and classification of events over long distances. Fiber-optic distributed acoustic sensing (DAS) is well suited to this task because a single optical fiber acts as a dense array of sensors, allowing continuous monitoring with high sensitivity, fine spatial resolution, and robustness to environmental conditions [1]. The operating principle of DAS relies on Rayleigh backscattering from microscopic refractive-index fluctuations along an optical fiber [2]. External perturbations such as strain or vibration modulate the fiber’s optical path length, imprinting phase and amplitude changes in the backscattered field. By coherently demodulating the amplitude and phase relative to the launched pulse, a spatially resolved backscatter profile along the fiber is reconstructed, enabling the detection, localization, and classification of events occurring along the fiber. In addition to perimeter security, DAS technology has found extensive application in various other domains [3], such as seismology [4] and structural health monitoring [5,6].
Extracting informative features from raw DAS data is a critical step in the processing pipeline to discriminate between event types such as human movement, vehicle activity, and attempted unauthorized actions near the sensor. Reflectograms captured by these systems contain detailed spatio-temporal information that characterizes the observed events. However, robust feature extraction is challenged by nonstationary and environmental noise [7], variability in signal propagation conditions, and the wide diversity of event signatures [8]. Several approaches have been proposed to suppress noise in DAS measurements. Frequency-domain methods based on the Fourier transform decompose signals into task-specific spectral bands guided by the underlying physics and application requirements [9]. Another common strategy for reducing the noise in raw data is to apply machine learning models, including convolutional and recurrent neural networks and support vector machines [8,10,11]. These models effectively learn complex characteristic features from raw data, suppressing noise in the resulting representation and enhancing event classification accuracy. Previously, we showed that semi-supervised learning can substantially improve classification performance on limited datasets when using Fourier-derived features [12].
As an alternative signal processing method, the wavelet transform has significant potential due to its adaptive time-frequency localization and multi-resolution analysis capabilities [13]. Like the Fourier transform, wavelet analysis enables the decomposition of the original signal into frequency-localized components corresponding to different frequency bands. Furthermore, because each component is downsampled, the resulting representation has a substantially lower temporal sampling rate than the original reflectograms, which yields a more compact data representation [14]. Wavelet analysis is a powerful tool for multilevel signal and image processing, providing precise localization of features in both the time and frequency domains. Its application to reflectogram analysis can greatly enhance the quality of feature extraction and increase robustness against noise and variability in event-related signals.
The wavelet transform has been widely used to process data acquired from distributed acoustic sensors. The study [15] investigates the application of one- and two-dimensional wavelet transforms for efficient compression of DAS data. In [16], a complex wavelet transform-based approach was developed to enhance the signal-to-noise ratio while preserving information throughout the full frequency range of the original DAS signals. The works [17,18] explore methods for coherent noise attenuation in seismic datasets using the continuous wavelet transform. Furthermore, Ref. [19] proposes a hybrid technique that combines wavelet transforms with convolutional neural networks for adaptive noise prediction and suppression tailored to DAS data.
In this study, we propose a novel approach for extracting informative features from distributed acoustic sensor reflectograms using wavelet packet decomposition (WPD) to support event classification in perimeter security applications for critical infrastructure. The methodology decomposes the original reflectogram into multiple spectral channels corresponding to distinct frequency bands, from which only those containing the most relevant information are selected. Although conceptually related to Fourier analysis, WPD offers several advantages, including potentially lower computational complexity, an essential factor for real-time processing under hardware limitations, and greater flexibility in tuning filtering parameters such as the wavelet function and decomposition level. Despite this scientific and practical potential, the application of WPD to DAS signal processing remains underexplored.
To validate the informational value of the selected wavelet channels, we implemented a convolutional neural network for binary classification and trained it exhaustively on all channels, using a large experimental dataset from real perimeter security systems, to confirm their relevance. Beyond identifying informative channels, we extract high-level features from the resulting wavelet images that enable the clustering of reflectograms by event class. Using these features, we develop a prefiltering mechanism based on principal component analysis, which effectively highlights key data characteristics, combined with a logistic regression classifier known for its efficiency and interpretability. This cascade approach reduces computational load and improves the overall accuracy of event classification.
2. Methods
2.1. Wavelet Packet Decomposition
Wavelet transforms are well suited to analyzing non-stationary processes [20] and have been widely applied to periodic signal analysis, DAS data processing [14], signal filtering [13], and data compression [21]. In digital signal processing, the discrete wavelet transform implements a two-channel filter bank that splits a signal into low- and high-frequency components, each downsampled by a factor of two relative to the original signal [22].
Wavelet packet decomposition extends the discrete wavelet transform by iteratively applying the two-channel filter bank to both approximation and detail branches. At the decomposition level $L$, the signal is partitioned into $2^L$ sub-bands, each representing a specific frequency interval within the range from 0 to the Nyquist frequency of the original signal. Due to downsampling at each stage, the sampling rate within each sub-band is reduced by a factor of $2^L$ relative to the original. A schematic example of the decomposition tree for a fixed level $L$ and input sampling rate $f_s$ (in Hz) is shown in Figure 1. In this notation, the symbol “a” represents approximation coefficients, while “d” indicates detail coefficients within the wavelet packet tree structure.
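The recursive splitting described above can be sketched in a few lines. This is a minimal illustration that assumes the Haar wavelet (chosen only for brevity; the wavelet function is a tunable parameter of the method) and assumes that time runs along the last array axis. The function names `haar_split` and `haar_wpd` are hypothetical. Node paths such as `aad` follow the a/d notation of the wavelet packet tree; the returned nodes are in natural tree order, which is not the same as ascending frequency order.

```python
import numpy as np

def haar_split(x):
    """One two-channel Haar filter-bank stage along the last axis.

    Returns (approximation, detail), each downsampled by a factor of two.
    """
    even, odd = x[..., 0::2], x[..., 1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def haar_wpd(x, level):
    """Full wavelet packet decomposition down to `level`.

    Returns a dict mapping node paths (e.g. 'aad') to coefficient arrays.
    """
    x = np.asarray(x, dtype=float)
    if x.shape[-1] % (2 ** level) != 0:
        raise ValueError("signal length must be divisible by 2**level")
    nodes = {"": x}
    for _ in range(level):
        next_nodes = {}
        # Unlike the plain DWT, both branches of every node are split again.
        for path, coeffs in nodes.items():
            approx, detail = haar_split(coeffs)
            next_nodes[path + "a"] = approx
            next_nodes[path + "d"] = detail
        nodes = next_nodes
    return nodes

# Illustrative input: 1024 samples of synthetic noise, decomposed to L = 3.
rng = np.random.default_rng(0)
signal = rng.normal(size=1024)
bands = haar_wpd(signal, level=3)
```

With $L = 3$ the sketch yields $2^3 = 8$ sub-bands of 128 samples each, so every sub-band is sampled at one eighth of the input rate.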
Since WPD decomposes the original signal into spectral sub-bands or channels with subsequent temporal downsampling, it is well suited for extracting informative features from raw reflectograms. The feature-extraction pipeline applies WPD to the reflectogram, then selects only those sub-bands that carry the most salient information about the recorded event. A detailed description of this procedure follows in the subsequent sections. Although conceptually related to the short-time Fourier transform, WPD offers advantages that are critical for minimizing computational resources required for signal filtering and subsequent processing in real-time operation. In particular, once informative wavelet channels are identified, a reflectogram can be decomposed only along those channels of interest, avoiding a full decomposition into all possible sub-bands.
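The saving from decomposing only along channels of interest can be illustrated with a short sketch. It again assumes the Haar wavelet and time along the last axis, and the function name `haar_node` is hypothetical. Only the filters on the path from the root to the requested node are applied.

```python
import numpy as np

def haar_node(x, path):
    """Coefficients of a single wavelet packet node, e.g. path='aad'.

    Each character selects the approximation ('a') or detail ('d') branch
    of one Haar filter-bank stage; sibling branches are never computed.
    """
    coeffs = np.asarray(x, dtype=float)
    for step in path:
        even, odd = coeffs[..., 0::2], coeffs[..., 1::2]
        if step == "a":
            coeffs = (even + odd) / np.sqrt(2.0)
        elif step == "d":
            coeffs = (even - odd) / np.sqrt(2.0)
        else:
            raise ValueError("path may contain only 'a' and 'd'")
    return coeffs
```

For a signal of $n$ samples, one path filters $n + n/2 + n/4 + \dots < 2n$ samples, whereas the full tree filters $n$ samples at each of the $L$ levels.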
2.2. PSNR-Based Wavelet Channel Selection
After applying WPD to the raw reflectograms, the next step is to identify the most informative wavelet channels among all the resulting wavelet sub-bands for the subsequent classification task. To quantify the informativity of the image associated with a given wavelet sub-band, a single-image modification of the peak signal-to-noise ratio (PSNR) is used, a metric widely adopted for image quality and noise assessment [23,24]. Specifically,
$$\mathrm{PSNR}(X) = 10 \log_{10} \frac{\left(X_{\max} - X_{\min}\right)^2}{\sigma_X^2},$$
where $X$ denotes the image corresponding to a given wavelet channel, $X_{\max}$ and $X_{\min}$ are its maximum and minimum pixel values, and $\sigma_X^2$ is the sample variance of the pixel intensities. This modified PSNR metric is measured in decibels (dB) and can take values from approximately 6 dB to $+\infty$. The lower bound of the modified PSNR corresponds to an image in which half of the pixels have maximum intensity and the other half have minimum intensity: in that case the variance is approximately $(X_{\max} - X_{\min})^2 / 4$, and $10 \log_{10} 4 \approx 6$ dB. Hereinafter, PSNR refers to this modified single-image variant.
If an image $X$ corresponds to an informative wavelet channel, its PSNR value should be markedly higher than that of a non-informative, noisy channel. Informative images tend to contain regions with pixel intensities that are extreme relative to the background, reflecting the presence of a significant event. In other words, for an informative image $X$, the range $X_{\max} - X_{\min}$ substantially exceeds the standard deviation of the pixel values $\sigma_X$, resulting in a high $\mathrm{PSNR}(X)$. In contrast, images from non-informative channels are expected to lack extreme values, with pixel distributions resembling Gaussian noise, and thus exhibit considerably lower PSNR than informative channels.
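A minimal sketch of the single-image PSNR follows. It assumes the metric is the squared pixel range divided by the sample variance, expressed in dB, which is consistent with the stated lower bound of about 6 dB. The function name `single_image_psnr` and the synthetic test images are hypothetical and serve only as illustration.

```python
import numpy as np

def single_image_psnr(image):
    """Modified single-image PSNR in dB: 10*log10((max - min)^2 / variance)."""
    x = np.asarray(image, dtype=float)
    variance = x.var(ddof=1)  # sample variance of the pixel intensities
    if variance == 0.0:
        raise ValueError("PSNR is undefined for a constant image")
    return 10.0 * np.log10((x.max() - x.min()) ** 2 / variance)

# Lower-bound case: half of the pixels at the minimum, half at the maximum.
two_level = np.array([[0.0, 1.0]] * 50)

# Gaussian background alone, and the same background with a strong local response.
rng = np.random.default_rng(0)
noise_only = rng.normal(size=(64, 64))
with_event = noise_only.copy()
with_event[30:34, 30:34] += 40.0
```

Here `single_image_psnr(two_level)` is close to 6 dB, and the image containing the localized response scores higher than the pure-noise image, which is the behaviour the channel-selection criterion relies on.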
2.3. Convolutional Neural Network Classifier
To empirically validate the hypothesis of informative channels, a convolutional neural network (CNN) based on the ResNet architecture [25] was employed for binary classification of events. Residual networks are well established for the robust learning of image representations and have demonstrated strong performance in a range of classification tasks [26]. This approach was chosen due to its combination of architectural simplicity and a relatively modest parameter count, which supports stable training and reduces the risk of overfitting compared to more complex or deeper alternatives. Using this model, all wavelet channels were systematically evaluated to identify those that yielded the highest classification accuracy. The input of the neural network comprises images generated by applying WPD to the raw reflectograms. The CNN architecture is presented in Figure A1 and is described in detail in Appendix A.
The CNN was implemented using the open-source PyTorch framework (ver. 2.1.0), trained with the Adam optimizer and the binary cross-entropy (BCE) loss function, and supported by a learning-rate scheduler that reduced the learning rate following plateaus in validation loss. To further protect against overfitting, a pruning strategy based on performance on the validation set was employed. Training was conducted for 80 epochs on a GIGABYTE GeForce RTX 4090 WINDFORCE GPU. The model had a total of 2095 trainable parameters. The tuned hyperparameters are summarized in Table A1.
2.4. Prefiltering Pipeline
In this work, we demonstrate that WPD not only facilitates the extraction of informative features from raw reflectograms, but also enables effective clustering of distinct event types based on characteristic wavelet image features. Leveraging these features, we developed a reflectogram filtering method that preemptively discards samples classified as non-target events before processing by the neural network. This preliminary filtering significantly reduces the computational burden on the classifier and contributes to improved overall event classification accuracy. The following subsections provide a detailed description of the data prefiltering methodology applied prior to neural network input for the event classes considered in this study.
2.4.1. Standard Deviation Analysis of Pixel Intensities in Wavelet Channel Images
The methodology outlined in Section 2.2 enables the separation of wavelet channels according to their informativity, facilitating the identification of those channels that capture the core signal structure associated with relevant events. The subsequent analysis focuses on extracting quantitative features from these informative wavelet channels to enable robust classification of specific event types. In this study, our objective is to identify feature representations that best discriminate reflectograms assigned to target class 1 (human_digging = True) from those assigned to non-target class 0 (human_digging = False).
To develop a data filtering strategy tailored to specific event types, it is essential to identify which characteristics of the wavelet channel images effectively distinguish representatives of the target class from those of the non-target class. In this work, the filtering algorithm was designed based on the standard deviation (STD) of pixel intensities within the wavelet channel images, as this quantity reflects the energy deviation from the mean and serves as a discriminative statistical feature [14,27]. Given that wavelet images from different event classes correspond to distinct physical interactions with the DAS, the energy distributions captured by these images are expected to vary accordingly. To illustrate these differences, Figure 2 depicts histograms of the standard deviation of pixel intensities for several wavelet channels, computed across the entire training dataset and shown separately for the two considered classes. Quantitative comparison of class-specific distributions was performed using the Jensen-Shannon divergence (JSD) metric [28], which ranges from 0 to 1 and increases with the dissimilarity between distributions. As shown in Figure 2, some wavelet channels exhibit pronounced differences in standard deviation distributions between the two classes, whereas other channels show less differentiation.
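The per-channel STD feature and the class-wise JSD comparison can be sketched as follows. This is an illustration under stated assumptions: the JSD is computed on histograms with a shared binning and a base-2 logarithm, which gives the 0 to 1 range mentioned above, and the number of bins (50) is a hypothetical choice. The function names are likewise hypothetical.

```python
import numpy as np

def channel_std_features(channel_images):
    """Standard deviation of pixel intensities for each wavelet channel image."""
    return np.array([np.std(np.asarray(img, dtype=float)) for img in channel_images])

def jensen_shannon_divergence(samples_a, samples_b, bins=50):
    """JSD (base-2 logarithm, hence in [0, 1]) between histograms of two samples."""
    lo = min(np.min(samples_a), np.min(samples_b))
    hi = max(np.max(samples_a), np.max(samples_b))
    # Shared bin edges so that the two histograms are directly comparable.
    p, _ = np.histogram(samples_a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(samples_b, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(u, v):
        mask = u > 0  # terms with u = 0 contribute nothing
        return np.sum(u[mask] * np.log2(u[mask] / v[mask]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))
```

In use, `samples_a` and `samples_b` would hold the STD values of one wavelet channel for class 1 and class 0 respectively; channels with a JSD close to 1 separate the classes well on this feature.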
2.4.2. Principal Component Analysis and Logistic Regression
Principal component analysis (PCA) is a widely used data preprocessing technique that identifies the main directions of variability within multivariate datasets, allowing dimensionality reduction, noise suppression, and the discovery of intrinsic data structures. These enhancements often lead to improved performance of subsequent machine learning models [29,30]. Using the significant dimensionality reduction afforded by PCA, we employed a logistic regression model to perform binary classification of events [31]. Logistic regression, a fundamental statistical method for predicting binary outcomes, is widely used in various domains such as medicine [32] and marketing [33] due to its simplicity and interpretability. The logistic regression model is mathematically expressed as follows:
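$$p = \sigma\!\left(\beta_0 + \sum_{i=1}^{n} \beta_i x_i\right),$$
where $p$ is the predicted pseudoprobability of the target class (human_digging), ranging from 0 to 1, $x_i$ are input features derived from the reflectogram, $\beta_i$ are the model parameters, and $\sigma(u) = 1 / (1 + e^{-u})$ denotes the sigmoid function.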
2.4.3. Overall Classification Workflow
To implement the preliminary reflectogram filtering algorithm, PCA was integrated with logistic regression to form an effective classification pipeline. This overall workflow, which combines PCA-based dimensionality reduction with subsequent binary classification via logistic regression, is illustrated in Figure 3 and is shown as a sequential list below.
Application of WPD
The input reflectogram is decomposed using WPD to generate multiple wavelet channels representing different frequency sub-bands.
Selection of input features for PCA
The standard deviations of pixel intensities are extracted as a feature vector $(s_1, \dots, s_N)$ from $N$ wavelet channels.
Application of PCA
PCA is applied to reduce the $N$-dimensional feature vector to a set of $k$ principal components $(z_1, \dots, z_k)$, with $k$ varied between 1 and $N$ to optimize the representational fidelity.
Polynomial feature generation
Polynomial features up to degree $m$ are generated from the principal components, where $m$ is a tunable hyperparameter. For example, with $k = 2$ and $m = 2$, the feature set includes $z_1, z_2, z_1^2, z_1 z_2, z_2^2$.
Classification via logistic regression
The polynomial features form the input to the logistic regression-based filtering classifier, which outputs a pseudoprobability $p$.
Preliminary filtering using threshold
Reflectograms with the classifier output $p < t$, where $t$ is a predefined threshold, are classified as non-target (class 0) and filtered out, bypassing further processing. This step rapidly excludes a portion of the events, reducing computational demand on the subsequent CNN classifier.
Final classification using CNN
Reflectograms with $p \geq t$ undergo further analysis by the CNN described previously, which produces the final classification decision.
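The inference side of this workflow can be sketched with NumPy alone. This is an illustrative sketch, not the implementation used in the study: PCA is computed via a singular value decomposition, the logistic regression parameters `beta0` and `beta` are assumed to have been fitted elsewhere, and all function names are hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement

def fit_pca(features, k):
    """Return the feature mean and the top-k principal directions (k x N)."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:k]

def polynomial_features(z, degree):
    """All monomials of the columns of z with total degree 1..degree."""
    columns = []
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(z.shape[1]), d):
            columns.append(np.prod(z[:, list(idx)], axis=1))
    return np.column_stack(columns)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def prefilter(std_features, mean, components, beta0, beta, degree, threshold):
    """Return pseudoprobabilities and a mask of reflectograms passed to the CNN."""
    z = (std_features - mean) @ components.T   # projection onto k components
    x = polynomial_features(z, degree)         # polynomial expansion
    p = sigmoid(beta0 + x @ beta)              # logistic regression output
    return p, p >= threshold                   # False means filtered out as class 0
```

For two components and degree 2 the expansion produces five features, in the order $z_1, z_2, z_1^2, z_1 z_2, z_2^2$. Only reflectograms whose mask entry is `True` would be forwarded to the CNN.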
During the training of the logistic regression model, a class weighting parameter w was introduced to balance the influence of the target class relative to the non-target class within the loss function. This weight was incorporated into the computation of the BCE loss, effectively penalizing misclassifications of the minority class more heavily. Adjusting w enables the optimization of the prefiltering classifier to achieve a high recall for the target events while maintaining effective rejection of non-target samples. In this study, w was tuned over a wide range, spanning ratios from 1:1 to 800:1, to identify the optimal balance that preserves sensitivity to target events without excessive false positives.
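The effect of the class weight can be illustrated with a small NumPy sketch of a weighted BCE loss and a plain gradient-descent fit. The assumptions here are that $w$ multiplies only the target-class term of the loss and that simple batch gradient descent is adequate for illustration; the optimizer, learning rate, and step count are hypothetical and are not taken from the study.

```python
import numpy as np

def weighted_bce(p, y, w):
    """Mean binary cross-entropy with target-class (y = 1) terms scaled by w."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return float(np.mean(-(w * y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

def fit_logistic(x, y, w, lr=0.1, steps=2000):
    """Batch gradient descent on the weighted BCE; returns (beta0, beta)."""
    beta0, beta = 0.0, np.zeros(x.shape[1])
    sample_weight = np.where(y == 1, w, 1.0)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(beta0 + x @ beta)))
        grad = sample_weight * (p - y)   # derivative of the loss w.r.t. the logit
        beta0 -= lr * grad.mean()
        beta -= lr * (x.T @ grad) / len(y)
    return beta0, beta
```

Increasing `w` shifts the decision boundary toward the non-target class, so more target events exceed a given threshold at the cost of more non-target events being passed on.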
2.4.4. Performance Metrics for Preliminary Filtering
The performance of the preliminary filtering stage was primarily evaluated using the Recall metric, which quantifies the proportion of correctly identified target-class events relative to the total number of target-class events. In the context of perimeter security systems, a high Recall is critical, as it ensures that the vast majority of potentially hazardous human-digging events near the protected perimeter are detected, minimizing missed detections (false negatives). This is particularly important given that the cost associated with false negatives greatly exceeds that of false positives in such applications. Mathematically, Recall is defined as:
$$\mathrm{Recall} = \frac{\text{True Positive}}{\text{Total Positive}},$$
where True Positive denotes the number of target-class events correctly classified, and Total Positive represents the total number of actual target-class events in the dataset. The Recall metric ranges from 0 to 1.
Additionally, the DropRate metric was used during classifier training to assess the proportion of events discarded at the preliminary filtering stage. DropRate is calculated as the ratio of events classified as non-target (class 0) to the total number of events in the dataset. It also ranges between 0 and 1, with higher values indicating a greater data reduction achieved before more computationally intensive processing.
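Both metrics follow directly from the prefilter outputs, as the short sketch below shows. It assumes that an event is kept when its pseudoprobability is at or above the threshold; the function name is hypothetical.

```python
import numpy as np

def recall_and_drop_rate(p, y, threshold):
    """Recall of the target class and fraction of events discarded at `threshold`."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=int)
    kept = p >= threshold                       # events forwarded to the CNN
    total_positive = np.sum(y == 1)
    recall = np.sum(kept & (y == 1)) / total_positive
    drop_rate = np.mean(~kept)                  # share classified as class 0
    return float(recall), float(drop_rate)

# Five events, three of them in the target class.
scores = [0.9, 0.2, 0.6, 0.1, 0.4]
labels = [1, 0, 1, 0, 1]
recall, drop_rate = recall_and_drop_rate(scores, labels, threshold=0.5)
```

In this toy case two of the three target events are kept (Recall of 2/3) and three of the five events are discarded (DropRate of 0.6).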
2.4.5. Hyperparameter Optimization
Using the Recall and DropRate metrics, the hyperparameter tuning of the filtering classifier was formulated as an optimization problem: the objective is to maximize the DropRate while constraining Recall to a fixed, acceptable level. This strategy aims to discard the maximum number of non-target events before forwarding data to the neural network, without significantly compromising the retention of target-class events.
The hyperparameters under consideration include the number of principal components $k$, the degree of polynomial feature expansion $m$, and the target class weight $w$. For fixed values of $k$, $m$, and $w$, the classification threshold $t$ is varied within the interval $[0, 1]$. For each $t$, the corresponding DropRate and Recall metrics of the trained classifier are computed, enabling the construction of a DropRate-Recall curve.
Introducing the DropRate-Recall Area Under the Curve (DRR AUC) metric, defined as the area under the DropRate-Recall curve, allows reformulating the optimization problem of maximizing DropRate at a fixed Recall as the maximization of the DRR AUC metric within a fixed Recall range. In this study, the hyperparameters of the filtering classifier were optimized by maximizing the DRR AUC. Using this integral metric enables a more comprehensive and precise selection of hyperparameters, improving the overall filtering performance of the model.
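One way to compute such an integral metric is sketched below. Several details are assumptions rather than the study's definition: DropRate is treated as a function of Recall, the threshold is swept on a uniform grid, the area is taken by the trapezoidal rule, and the Recall range is `[recall_min, 1]`. The function name `drr_auc` and the default parameter values are hypothetical.

```python
import numpy as np

def drr_auc(p, y, recall_min=0.95, num_thresholds=201):
    """Area under the DropRate-Recall curve for Recall >= recall_min."""
    p = np.asarray(p, dtype=float)
    y = np.asarray(y, dtype=int)
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    kept = p[None, :] >= thresholds[:, None]          # one row per threshold
    recall = (kept & (y == 1)).sum(axis=1) / np.sum(y == 1)
    drop_rate = 1.0 - kept.mean(axis=1)
    # Recall is non-increasing in the threshold, so reversing sorts it ascending.
    recall, drop_rate = recall[::-1], drop_rate[::-1]
    selected = recall >= recall_min                   # grid points inside the range
    r, d = recall[selected], drop_rate[selected]
    # Trapezoidal rule; segments with equal Recall contribute zero area.
    return float(np.sum(0.5 * (d[1:] + d[:-1]) * np.diff(r)))
```

A hyperparameter search would then evaluate `drr_auc` on validation data for each candidate combination of $k$, $m$, and $w$ and keep the combination with the largest value. Because Recall changes in steps, the integration starts at the first grid point inside the Recall range rather than exactly at `recall_min`.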
5. Discussion
Regarding physical interpretation, as noted in Section 4.1, the dominance of low-frequency channels in the WPD is physically consistent with the characteristic frequency content of key events such as human footsteps and digging, which generate predominantly low-frequency vibrations detectable by the distributed acoustic sensor. High-frequency channels exhibit elevated PSNR values due to non-ideal filter responses inherent in wavelet filter banks, resulting in artifacts that do not correspond to meaningful event-related information. These filter-induced high-frequency deviations underscore the importance of careful channel selection to mitigate the influence of spurious spectral components. Employing enhanced metrics that incorporate underlying physical models or applying more sophisticated modifications to the PSNR calculation could further attenuate such artifacts, thereby refining the distinction between informative and non-informative channels. This consideration is critical to accurately capture event-relevant information.
From a practical standpoint, the selective decomposition of DAS reflectograms along informative wavelet channels offers significant advantages under real-time processing constraints common in perimeter security applications. By restricting computational efforts to a subset of channels containing pertinent event information, the proposed method substantially reduces processing latency and resource consumption. This selective WPD approach addresses the challenges of limited hardware resources, enabling deployment in operational environments that require prompt and reliable event detection. Importantly, the ability to perform a partial WPD without exhaustively computing the entire channel set offers a direct path to scalable real-time implementations. These efficiencies are crucial for maintaining high-throughput monitoring over extended fiber optic sensor arrays.
It should be recognized that the variability in environmental and operational conditions between the four recording stations manifested in clustering differences, reflecting the influence of site-specific factors on the characteristics of the DAS signal. Such station-to-station variability poses challenges to model generalization and may necessitate adaptive calibration or transfer learning for robust cross-site applicability. Furthermore, the inherent class imbalance present in the dataset (favoring non-target events) requires careful tuning of classification thresholds and weighting during model training to prioritize event recall. The trained models demonstrated high recall, but may still face difficulty in generalizing to entirely novel sensing locations or event types absent from the training data. Future robustness evaluations should include expanded datasets over diverse geographic and seasonal conditions.
While our current framework is developed and validated for binary classification tasks, we acknowledge that real-world DAS applications often require multi-class classification to distinguish among multiple event types with potentially overlapping frequency characteristics. To address this, we propose a strategy that decomposes the multi-class problem into multiple binary classification tasks, treating each target class independently against all other classes. This approach enables the identification of class-specific informative wavelet channels and facilitates training dedicated neural network models optimized for each event type. Subsequently, predictions from these specialized models can be aggregated to deliver a final multi-class decision regarding the detected event. This modular framework offers scalability and flexibility in handling diverse and overlapping spectral features across events.
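The aggregation step of such a one-vs-rest scheme could look like the sketch below. The aggregation rule is an assumption made for illustration: the class with the highest binary score wins, and an event is labeled as background when no score reaches a rejection level. The function name, the class names, and the rejection level of 0.5 are hypothetical.

```python
import numpy as np

def aggregate_one_vs_rest(scores, class_names, reject_below=0.5):
    """Combine per-class binary scores into one multi-class label per event.

    scores: array of shape (n_events, n_classes), one column per binary model.
    """
    scores = np.asarray(scores, dtype=float)
    best = scores.argmax(axis=1)
    labels = []
    for event, cls in enumerate(best):
        if scores[event, cls] >= reject_below:
            labels.append(class_names[cls])
        else:
            labels.append("background")  # no binary model is confident enough
    return labels

# Three events scored by two hypothetical binary models.
example = aggregate_one_vs_rest(
    [[0.9, 0.2], [0.1, 0.3], [0.4, 0.8]],
    ["human_digging", "vehicle"],
)
```

Each column would come from a dedicated model trained on the wavelet channels that are informative for that class.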
Regarding possible future research, to further enhance classification performance, fine-tuning the convolutional neural network on datasets preprocessed via the proposed preliminary filtering method offers a promising avenue for improved feature refinement. Investigating ensemble frameworks where the preliminary classifier and neural network operate concurrently rather than sequentially may yield synergistic accuracy gains. Additionally, advanced data augmentation strategies informed by the physical characteristics of DAS signals can enhance model robustness and reduce overfitting [37,38,39]. Another promising direction is detailed reflectogram clustering based on station-specific signal variability, including transformations that map data distributions between sites to facilitate higher-level feature extraction and dataset expansion. Such approaches may contribute to enhanced detection capabilities and broader operational adaptability in large-scale distributed sensing networks.
6. Conclusions
This work demonstrates the efficiency of wavelet packet decomposition as a preprocessing technique for distributed acoustic sensing in perimeter security applications. The proposed methodologies, including novel reflectogram WPD-based processing strategies, modified peak signal-to-noise ratio channel selection, and a convolutional neural network classifier, collectively achieve substantial dimensionality reduction while retaining the most significant features for robust event recognition. The CNN-based model yielded a classification accuracy of up to 97.88% on the experimental dataset from four operational perimeter security systems, effectively validating the informativity rankings obtained via the PSNR metric for wavelet channels.
Another key contribution of this study is the introduction of a multi-stage data filtering approach. By constructing high-level features using statistical properties of the wavelet-domain representations and implementing a prefiltering step using principal component analysis and logistic regression, a portion of non-discriminative samples was excluded prior to computationally intensive neural network inference. This led to measurable improvements in classification performance, especially for more challenging wavelet channel combinations, with the highest recorded test accuracy reaching 98.03%. These findings confirm that the incorporation of targeted wavelet prefiltering and feature engineering can improve both accuracy and efficiency in large-scale DAS deployments.