1. Introduction
In typical maritime electronic reconnaissance scenarios, radar emissions are reflected by scatterers such as the sea surface, vessel hulls, coastal terrain, islands, and floating objects, creating numerous multipath propagation paths. The electronic support measure (ESM) system receiver simultaneously intercepts both the direct-path pulses and multipath-scattered pulses originating from the same emitter. It generates a Pulse Descriptor Word (PDW) stream containing core parameters—including Radio Frequency (RF), pulse width (PW), Direction of Arrival (DOA), pulse amplitude (PA), and Time of Arrival (TOA)—which is then passed to the sorting and processing unit.
Pulse sorting methods are already mature in fielded ESM systems; most of them exploit the stable inter-pulse modulation characteristics of emitters and separate pulses by clustering and statistical analysis. First, multi-parameter correlation or clustering methods such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and fuzzy C-means (FCM) are used to group pulses according to RF, TOA, or time difference of arrival (TDOA). Then, the pulse repetition interval (PRI) is used to extract the complete sequence of each emitter from the pre-sorted pulse stream. After decades of development, PRI analysis has settled into two classical families of algorithms: statistical histograms and transform-domain analysis. The statistical histogram methods include the TOA difference histogram, the cumulative difference histogram (CDIF), and the sequence difference histogram (SDIF). The transform-domain methods include the PRI transform and the plane transform, among others [1]. Other sorting techniques include Kalman filtering, graph theory, and intelligent sorting algorithms based on deep learning [2,3,4,5,6,7,8,9,10,11,12,13,14,15].
Multipath-scattering pulses introduce spurious detections, false associations, sorting ambiguity, and missed associations in conventional PRI statistical sorting. Most operational sorting chains primarily separate pulses from different emitters and suppress multipath returns as interference, for example, by spatial filtering or DOA thresholding. Such suppression simplifies deinterleaving but also removes propagation-path information that may be useful for scatterer localization. Recent work has therefore considered multipath information as a positioning resource rather than only as a disturbance. Related studies have used multipath-scattering centers as virtual receivers or have introduced learning-based approximations for multipath-assisted localization [16]. These studies, however, generally assume either known environmental geometry, available multipath pairing, or features beyond the PDW fields routinely produced by ESM receivers.
Dalveren [17] used multipath-scattering centers as virtual receivers, thereby converting a single-receiver localization problem into a TDOA-like multi-sensor formulation. That line of work demonstrates the physical value of multipath returns, but it assumes that the multipath observations used by the localization algorithm have already been identified with sufficient reliability.
Catak [18] investigated machine-learning-assisted passive emitter localization using multipath signals in an ESM context. The method reduces computational burden relative to ray-tracing-assisted localization, but it still relies on assumptions about the surrounding scattering environment. These assumptions are difficult to satisfy in non-cooperative maritime reconnaissance when the receiver has limited prior knowledge of the emitter and nearby scatterers.
These observations motivate a narrower but necessary preprocessing problem: before multipath-scattered pulses can be used for localization, direct-path and scattered-path pulses belonging to the same emitter must be separated with sufficient reliability. Existing approaches to this path-level sorting problem often depend on prior azimuth information, environmental geometry, or hand-designed timing constraints. Such assumptions are restrictive for non-cooperative ESM operation. Zhang et al. [19] used initial pulse phase as an in-pulse fingerprint and identified suspected multipath-scattered pulse pairs through phase-difference analysis. Although the reported simulation results verify the feasibility of that approach, the required phase-related features are not standard PDW outputs in many practical ESM receivers.
Long Short-Term Memory (LSTM) networks are well-suited to ordered pulse sequences because they can model temporal dependence over multiple time steps. Recent radar-signal studies have combined recurrent networks with CNN, Bi-LSTM, attention, and feature-fusion modules for emitter recognition, modulation-related sorting, and deinterleaving under false-pulse or missing-pulse conditions [20,21,22]. These studies establish that sequence-learning models are effective for radar pulse analysis. However, their primary target is usually emitter-level separation or signal type recognition, not the direct/scattered path discrimination of pulses that have already been assigned to the same emitter.
In this work, we address the direct/scattered pulse discrimination stage that lies between conventional emitter-level pulse sorting and multipath-assisted passive localization. Unlike deep-learning radar sorting studies that primarily separate different emitters or modulation types, this study keeps the emitter identity fixed and evaluates whether PDW sequences can separate propagation-path classes of the same emitter. We provide a clear formulation of this same-emitter direct/multipath-scattered pulse classification task for maritime ESM and establish a unified PDW-sequence benchmark that allows fair comparison between clustering methods, rule-based temporal analysis, shallow LSTM, and residual LSTM approaches. We also present robustness, repeated-seed, and component-ablation evidence for the current simulation setting, as well as a measured-data workflow that explicitly separates pseudo-label-controlled transfer from held-out diagnostic visualization. The interpretation is deliberately limited: the measured-data results support workflow plausibility and diagnostic use, but they do not constitute calibrated AIS-error evaluation or manually labeled real-data classification accuracy.
2. Multipath Signal Propagation Model
The propagation model is used to generate and interpret PDW-level timing, angle, and amplitude differences rather than to reconstruct the full electromagnetic field. Unless otherwise stated, the simulation assumes narrowband pulse propagation, stationary scatterers within one processing window, dominant single-bounce scattering, negligible Doppler within a PRI, synchronized receiver timing, and known receiver hardware parameters. These assumptions simplify the maritime channel so that the direct/scattered classification problem can be evaluated in isolation.
In the considered maritime ESM scenario, the receiver is mounted on a shipborne or buoy-based platform and passively intercepts emissions from non-cooperative radars. As illustrated in Figure 1, in addition to the direct free-space path, the electromagnetic wave may propagate through multiple independent single-bounce scattering paths produced by the sea surface, ships, islands, reefs, or other maritime objects. The receiver therefore records both direct-path pulses and multipath-scattered pulses from the same emitter and outputs PDW sequences containing TOA, DOA, RF, PW, PA, and related measurement parameters.
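Let the baseband pulse sequence transmitted by the non-cooperative radar be

$$s(t) = \sum_{n=0}^{N-1} A\,\mathrm{rect}\!\left(\frac{t - nT_r}{\tau}\right)\exp\!\left[j\left(2\pi f_c t + \varphi_0\right)\right]$$

where $A$ is the transmitted pulse amplitude, $\tau$ is the pulse width, $T_r$ is the pulse repetition interval, $f_c$ is the carrier frequency, $\varphi_0$ is the pulse initial phase, $N$ is the single-frame pulse sequence length, and $\mathrm{rect}(\cdot)$ is the rectangular window function.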
The direct-path signal received by the ESM receiver is a delayed and attenuated copy of the transmitted sequence after free-space propagation:

$$r_d(t) = \sum_{n=0}^{N-1} A_d\,\mathrm{rect}\!\left(\frac{t - \tau_d - nT_r}{\tau}\right)\exp\!\left[j\left(2\pi f_c (t - \tau_d) + \varphi_0\right)\right] + n_d(t)$$

In this expression, $\tau_d = R_d / c$ is the direct-path propagation delay, where $R_d$ is the radar-to-receiver distance and $c$ is the speed of light in vacuum. The direct-path received amplitude $A_d$ is determined by the transmitted amplitude, the propagation loss, and the transmitter and receiver antenna gains. The additive term $n_d(t)$ denotes channel noise and is modeled as white Gaussian noise for the present simulation.
For a multipath-scattered pulse, the received signal of the $k$-th single-bounce scattering path is

$$r_k(t) = \sum_{n=0}^{N-1} A_k\,\mathrm{rect}\!\left(\frac{t - \tau_d - \Delta\tau_k - nT_r}{\tau}\right)\exp\!\left[j\left(2\pi f_c (t - \tau_d - \Delta\tau_k) + \varphi_0\right)\right] + n_k(t)$$

In this expression, $\Delta\tau_k$ is the additional delay of the $k$-th path relative to the direct path. It is assumed to be much smaller than one PRI ($\Delta\tau_k \ll T_r$), so that every scattered pulse falls within the same PRI as its corresponding direct-path pulse. The scattering-path received amplitude $A_k$ is always lower than the direct-path amplitude because of the two-hop propagation loss and the finite radar cross section (RCS) of the scatterer, and $n_k(t)$ is the channel noise of that path. The received signals of all scattering paths together constitute the multipath-scattered pulse set considered in this study.
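Direct-path received power (dBm):

$$P_d = P_t + G_t + G_r + 20\log_{10}\!\left(\frac{\lambda}{4\pi R_d}\right)$$

In this expression, $P_t$ is the radar transmit power (dBm), $G_t$ and $G_r$ are the gains of the transmitting and receiving antennas (dB), $\lambda$ is the signal wavelength, and $R_d$ is the radar-to-receiver distance.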
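Scattering-path received power (dBm):

$$P_k = P_t + G_t + G_r + 10\log_{10}\!\left(\frac{\lambda^2 \sigma_k}{(4\pi)^3 R_{1,k}^2 R_{2,k}^2}\right)$$

In this expression, $\sigma_k$ is the radar cross section (RCS) of the $k$-th scatterer, and $R_{1,k}$ and $R_{2,k}$ are the propagation distances from the radar to the scatterer and from the scatterer to the receiver, respectively. The transmit power, antenna gains, and wavelength are the same as in the direct-path expression.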
The propagation model shows that direct-path and multipath-scattered pulses from the same emitter have the same RF, PW, and PRI modulation law, but differ in TOA, DOA, and PA because of path length, arrival direction, and scattering loss. These PDW-level differences provide the physical basis for direct/scattered classification.
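The amplitude and timing differences described above can be illustrated with a short numerical sketch. It assumes the standard one-way Friis form for the direct path and the bistatic radar equation for a single-bounce path; the function names and all numerical values (transmit power, gains, ranges, RCS, carrier frequency) are hypothetical and are not taken from the simulator settings.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def direct_power_dbm(pt_dbm, gt_db, gr_db, wavelength_m, r_direct_m):
    """One-way (Friis) received power in dBm for the direct path."""
    return pt_dbm + gt_db + gr_db + 20.0 * math.log10(
        wavelength_m / (4.0 * math.pi * r_direct_m))


def scattered_power_dbm(pt_dbm, gt_db, gr_db, wavelength_m, rcs_m2, r1_m, r2_m):
    """Bistatic radar-equation received power in dBm for one single-bounce path."""
    ratio = (wavelength_m ** 2) * rcs_m2 / (
        ((4.0 * math.pi) ** 3) * (r1_m ** 2) * (r2_m ** 2))
    return pt_dbm + gt_db + gr_db + 10.0 * math.log10(ratio)


def extra_delay_us(r_direct_m, r1_m, r2_m):
    """Additional delay of the scattered path relative to the direct path, in microseconds."""
    return (r1_m + r2_m - r_direct_m) / C * 1e6


# Hypothetical scenario (illustrative values only).
pt_dbm, gt_db, gr_db = 90.0, 30.0, 0.0
wavelength = C / 9.4e9                          # assumed X-band carrier
r_direct, r1, r2 = 20_000.0, 18_000.0, 5_000.0  # metres
rcs = 1_000.0                                   # square metres

p_direct = direct_power_dbm(pt_dbm, gt_db, gr_db, wavelength, r_direct)
p_scattered = scattered_power_dbm(pt_dbm, gt_db, gr_db, wavelength, rcs, r1, r2)
delay_us = extra_delay_us(r_direct, r1, r2)
print(f"direct: {p_direct:.1f} dBm, scattered: {p_scattered:.1f} dBm, "
      f"extra delay: {delay_us:.2f} us")
```

In this hypothetical geometry the scattered pulse arrives about 10 μs after the direct pulse and is several tens of dB weaker, which is the kind of TOA and PA offset that the classifier is expected to exploit.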
3. Classification Problems and Definition of Evaluation Metrics
3.1. Definition of a Classification Problem
The classification task considered here starts from a pulse stream that has already been associated with one emitter by a preceding emitter-level sorting stage. The objective is to assign each pulse to either the direct-path or multipath-scattered class, thereby providing timing references and scattered-path observations for subsequent multipath-assisted localization. The study evaluates this intermediate same-emitter path-classification problem rather than a complete multi-emitter ESM deinterleaving chain.
This study formulates same-emitter multipath-scattered pulse separation as a supervised binary classification task. The sample label set is defined as $\mathcal{Y} = \{0, 1\}$, where 1 denotes direct-path pulses and 0 denotes multipath-scattered pulses. For the $i$-th pulse received by the ESM receiver, its Pulse Descriptor Word (PDW) can be represented as a 6-dimensional feature vector, $\mathbf{x}_i = [\mathrm{RF}_i, \mathrm{PA}_i, \mathrm{PW}_i, \mathrm{PRI}_i, \mathrm{TOA}_i, \mathrm{DOA}_i]$, which contains RF [MHz], PA [dBm], PW [μs], PRI [μs], TOA [μs], and DOA [°]. The derived temporal interval ΔTOA [μs] is appended to these six parameters to form the seven-dimensional model input. All continuous features are standardized using training-set statistics before being arranged into TOA-ordered sequences of length 32. PRI is obtained from the simulated emitter timing model or from the estimated local repetition interval in the preprocessed PDW stream, and DOA is treated as a scalar azimuth angle in degrees.
The core task of multipath pulse sorting and pairing is to learn, with a classification model, the mapping between PDW features and pulse class labels, $f: \mathbf{x}_i \mapsto y_i \in \{0, 1\}$, so that the category of each pulse can be predicted and direct-path pulses can be separated from scattered pulses.
3.2. Definition of Evaluation Indicators
In the physical environment of maritime electronic reconnaissance, the electromagnetic waves emitted by a radar generate numerous scattering paths as they propagate—in addition to the single direct path—due to the complex sea surface, ship hulls, reefs, and floating objects at sea. This means that for every direct-path pulse, an ESM receiver typically intercepts multiple multipath-scattering pulses associated with it. This inherent “one-to-many” physical characteristic results in an imbalance in the distribution of pulse types within the PDW data captured by ESM systems, meaning that the number of multipath-scattering pulse samples will exceed that of direct-path pulses.
Given this characteristic, this paper uses the macro-average F1 score as the primary metric to ensure a balanced evaluation of the recognition quality for both types of pulses:
Let $y_i \in \{0, 1\}$ denote the true class label of the $i$-th pulse, and $\hat{y}_i \in \{0, 1\}$ denote the label predicted by the algorithm. Here, 1 represents direct-path pulses, and 0 represents multipath-scattered pulses. For a test set containing $N$ pulses to be classified, the elements of the confusion matrix $\mathbf{C}$ are defined as

$$C_{ab} = \sum_{i=1}^{N} \mathbb{1}(y_i = a)\,\mathbb{1}(\hat{y}_i = b), \quad a, b \in \{0, 1\}$$

In the equation, $a$ represents the true class of the pulse, $b$ represents the class determined by the algorithm, and $\mathbb{1}(\cdot)$ is the indicator function.

The total number of samples in category $c$ can be expressed as $N_c = \sum_{b} C_{cb}$.

To objectively evaluate the performance of the algorithm in classifying the two types of pulses, the recall $R_c$ and precision $P_c$ for each class $c$ can be defined as

$$R_c = \frac{C_{cc}}{\sum_{b} C_{cb}}, \qquad P_c = \frac{C_{cc}}{\sum_{a} C_{ac}}$$

In particular, $R_c$ reflects the proportion of samples that actually belong to class $c$ and were correctly classified by the algorithm; $P_c$ reflects the proportion of samples classified as class $c$ that actually belong to class $c$.

The F1 score for class $c$ is defined as the harmonic mean of precision and recall, providing a comprehensive measure of the algorithm's classification performance for class $c$:

$$F1_c = \frac{2 P_c R_c}{P_c + R_c}$$

The overall macro-average F1 score is the arithmetic mean of the two per-class F1 scores:

$$F1_{\mathrm{macro}} = \frac{1}{2}\left(F1_0 + F1_1\right)$$
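As a worked illustration of the metric, the following sketch builds the 2×2 confusion matrix from label lists and computes the macro-average F1 score. The function names and the ten-pulse toy labels are illustrative assumptions, not data from the experiments.

```python
def confusion_matrix(y_true, y_pred, n_classes=2):
    """c[a][b] counts pulses of true class a that were predicted as class b."""
    c = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        c[t][p] += 1
    return c


def macro_f1(c):
    """Arithmetic mean of the per-class F1 scores derived from confusion matrix c."""
    scores = []
    for k in range(len(c)):
        hits = c[k][k]
        actual = sum(c[k])                     # row sum: pulses truly in class k
        predicted = sum(row[k] for row in c)   # column sum: pulses predicted as class k
        recall = hits / actual if actual else 0.0
        precision = hits / predicted if predicted else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example: 1 = direct-path pulse, 0 = multipath-scattered pulse.
y_true = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred)
score = macro_f1(cm)
print(cm, round(score, 4))
```

Here the scattered class has F1 = 6/7 and the direct class has F1 = 2/3, so the macro average is about 0.762 even though the plain accuracy is 0.8; the minority class pulls the score down, which is the behavior the metric is chosen for.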
4. Residual LSTM-Based Pulse Sorting Method
Within this PDW-sequence classification framework, the residual two-layer unidirectional LSTM provides a temporally ordered and deployment-oriented sequence learner against which clustering methods, DOA-period temporal rules, and a Single-LSTM baseline can be compared under identical labels and input features. The overall process of the algorithm is shown in Figure 2:
4.1. Construction of a Composite Temporal Feature Space
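To fully utilize the spatio-temporal coupling and correlation of multipath signals in the time domain, the model input vector $\tilde{\mathbf{x}}_i$ includes not only the 6 core physical parameters defined in Section 3.1, but also incorporates a key time-series difference feature:

$$\tilde{\mathbf{x}}_i = [\mathrm{RF}_i, \mathrm{PA}_i, \mathrm{PW}_i, \mathrm{PRI}_i, \mathrm{TOA}_i, \mathrm{DOA}_i, \Delta\mathrm{TOA}_i]$$

Specifically, the differential Time of Arrival $\Delta\mathrm{TOA}_i = \mathrm{TOA}_i - \mathrm{TOA}_{i-1}$ quantifies the interval between consecutive pulses; the first pulse in the sequence has no predecessor, so its value is fixed by definition.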
Each feature is first standardized with the training-set mean and standard deviation to eliminate differences in scale. The standardized vectors are then stacked in TOA order and batched to form a time series $\mathbf{X} \in \mathbb{R}^{T \times 7}$ with $T = 32$. Before entering the LSTM hidden layer, the model applies Layer Normalization (LayerNorm) to the input sequence to enhance numerical stability during the early stages of training and accelerate convergence.
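The feature-construction steps in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions: the first-pulse ΔTOA is set to 0.0, windows are taken with a stride of one pulse, and the synthetic 40-pulse stream and all function names are hypothetical rather than the authors' implementation.

```python
import math


def add_delta_toa(toa_us, first_value=0.0):
    """Differences between consecutive TOAs; first_value is an assumed placeholder."""
    return [first_value] + [b - a for a, b in zip(toa_us, toa_us[1:])]


def fit_standardizer(rows):
    """Per-feature mean and standard deviation, computed on training rows only."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against constant features
    return means, stds


def standardize(rows, means, stds):
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def make_windows(rows, length=32, stride=1):
    """TOA-ordered sliding windows; the stride is an assumption of this sketch."""
    return [rows[i:i + length] for i in range(0, len(rows) - length + 1, stride)]


# Synthetic stream of 40 pulses: [RF, PA, PW, PRI, TOA, DOA] plus the derived dTOA.
toa = [100.0 * i for i in range(40)]
dtoa = add_delta_toa(toa)
stream = [[9400.0, -40.0 - 0.1 * i, 1.0, 100.0, toa[i], 30.0, dtoa[i]]
          for i in range(40)]

means, stds = fit_standardizer(stream[:32])   # statistics from the training part only
windows = make_windows(standardize(stream, means, stds))
print(len(windows), len(windows[0]), len(windows[0][0]))
```

Fitting the statistics on the training portion and reusing them for later pulses mirrors the chronological split used in the experiments and avoids leaking validation statistics into the input scaling.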
4.2. A Two-Layer LSTM with a Residual Structure
The hidden layer of the algorithm model consists of two layers of LSTM units, designed to extract long-range temporal dependencies in PDW sequences.
The first LSTM layer is designed to capture local evolution patterns between adjacent pulses, and its output serves as the input to the second LSTM layer:

$$\mathbf{H}^{(1)} = \mathrm{LSTM}_1\!\left(\mathrm{LayerNorm}(\mathbf{X})\right)$$

where $\mathbf{X}$ is the input PDW sequence. Based on the output of the first layer, the second LSTM layer further captures global temporal dependencies across long sequences, uncovering stable delay relationships between direct and multipath-scattered pulses. The output of the second layer is

$$\mathbf{H}^{(2)} = \mathrm{LSTM}_2\!\left(\mathbf{H}^{(1)}\right)$$

At the same time, a residual connection sums the outputs of the first-layer LSTM with those of the second-layer LSTM, enabling the network to preserve the low-level features of the original pulse while incorporating higher-order multipath correlation features:

$$\mathbf{H} = \mathbf{H}^{(1)} + \mathbf{H}^{(2)}$$

The fused hidden state at the last time step, $\mathbf{h}_T$, is fed into the fully connected classification layer for subsequent global feature extraction and class probability calculation.
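The data flow of the residual two-layer structure can be traced with a framework-free forward pass. The sketch below uses untrained random weights, a hidden size of 8 instead of the 128-dimensional features used in the paper, and a LayerNorm without learnable scale or shift; the class and function names are hypothetical, and a practical implementation would normally rely on a deep-learning library.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer_norm(vec, eps=1e-5):
    """Normalize one feature vector to zero mean and unit variance (no affine terms)."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


class LSTMLayer:
    """Unidirectional LSTM layer with random, untrained weights (illustration only)."""

    def __init__(self, n_in, n_hid, rng):
        self.n_hid = n_hid
        # One weight row per gate unit; columns cover input, previous hidden state, bias.
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hid + 1)]
                  for _ in range(4 * n_hid)]

    def forward(self, seq):
        n = self.n_hid
        h, c = [0.0] * n, [0.0] * n
        outputs = []
        for x in seq:
            z = list(x) + h + [1.0]
            pre = [sum(w * v for w, v in zip(row, z)) for row in self.w]
            i_gate = [sigmoid(p) for p in pre[0:n]]
            f_gate = [sigmoid(p) for p in pre[n:2 * n]]
            g_cand = [math.tanh(p) for p in pre[2 * n:3 * n]]
            o_gate = [sigmoid(p) for p in pre[3 * n:4 * n]]
            c = [f * ck + i * g for f, ck, i, g in zip(f_gate, c, i_gate, g_cand)]
            h = [o * math.tanh(ck) for o, ck in zip(o_gate, c)]
            outputs.append(h)
        return outputs


def residual_lstm_features(seq, layer1, layer2):
    """LayerNorm -> LSTM 1 -> LSTM 2 -> element-wise sum -> last time step."""
    normed = [layer_norm(x) for x in seq]
    h1 = layer1.forward(normed)
    h2 = layer2.forward(h1)
    fused = [[a + b for a, b in zip(u, v)] for u, v in zip(h1, h2)]
    return fused[-1]


rng = random.Random(0)
hidden = 8                                   # small size for the demo only
layer1 = LSTMLayer(7, hidden, rng)
layer2 = LSTMLayer(hidden, hidden, rng)      # equal sizes so the sum is defined
sequence = [[rng.gauss(0.0, 1.0) for _ in range(7)] for _ in range(32)]
features = residual_lstm_features(sequence, layer1, layer2)
print(len(features))
```

The element-wise sum requires both layers to share the same hidden size, which is why the second layer is built with matching input and output dimensions.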
4.3. Loss Functions and Training Strategies
To address the issue of class imbalance, this paper employs a cross-entropy loss function with class weights. By increasing the loss weight for direct-path pulses, the model's classification bias toward the more prevalent scattered pulses is mitigated. The loss function is defined as

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} w_{y_i}\,\log p_{i, y_i}$$

Here, $N$ represents the number of samples in the batch, $y_i$ represents the true label of the $i$-th sample, $p_{i, y_i}$ represents the model's predicted probability for the true class, and $w_c$ represents the weight of class $c$, which is inversely proportional to the number of samples of that class in the training set; that is, the weight of direct-path pulses is higher than that of scattered pulses.
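A small numerical sketch of the class-weighted loss follows. The weight normalization (total count divided by the number of classes times the class count) is one common convention and is an assumption here, as are the function names; the class counts are those of the complete dataset reported in Section 5.1 and are used only for illustration, since the weights in the paper are derived from the training split.

```python
import math


def inverse_frequency_weights(class_counts):
    """Weights inversely proportional to class counts (assumed normalization)."""
    total = sum(class_counts.values())
    k = len(class_counts)
    return {c: total / (k * n) for c, n in class_counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Mean over the batch of -w[y] * log p[y]; probs[i][c] is P(class c | sample i)."""
    loss = 0.0
    for p, y in zip(probs, labels):
        loss -= weights[y] * math.log(p[y])
    return loss / len(labels)


# 0 = multipath-scattered, 1 = direct-path (full-dataset counts, illustration only).
counts = {0: 362_290, 1: 237_710}
weights = inverse_frequency_weights(counts)

# Hypothetical softmax outputs for a batch of three pulses.
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 1]
loss = weighted_cross_entropy(probs, labels, weights)
print(weights, round(loss, 4))
```

With these counts the direct-path weight is about 1.5 times the scattered-path weight, so a misclassified direct-path pulse (the third sample) contributes more to the loss than an equally wrong scattered pulse would.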
The model is trained using the following strategies:
The Adam optimizer is used with a base learning rate of 0.0001 and weight decay to limit overfitting.
The ReduceLROnPlateau learning-rate scheduler is adopted; if the validation loss does not decrease for 3 consecutive epochs, the learning rate is multiplied by 0.5 to improve convergence.
Early stopping is enabled with a patience of 5; if the validation loss does not decrease for 5 consecutive epochs, training is terminated to prevent overfitting.
During training, three data augmentation strategies (Gaussian noise, random jitter, and random scaling) are applied to improve generalization and robustness to disturbances.
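The interaction between the learning-rate reduction and early stopping can be traced on a hypothetical validation-loss curve. The sketch below is a plain-Python paraphrase of the two rules as stated above, not a library scheduler; the function name and the loss values are illustrative assumptions, and library implementations may count patience slightly differently.

```python
def run_schedule(val_losses, lr=1e-4, lr_patience=3, lr_factor=0.5, stop_patience=5):
    """Return the per-epoch learning rates and the epoch at which training stops."""
    best = float("inf")
    since_best = 0   # epochs without improvement, used for early stopping
    since_lr = 0     # epochs without improvement since the last LR change
    history = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            since_best = 0
            since_lr = 0
        else:
            since_best += 1
            since_lr += 1
        if since_lr >= lr_patience:
            lr *= lr_factor      # halve the learning rate on a plateau
            since_lr = 0
        history.append((epoch, lr))
        if since_best >= stop_patience:
            return history, epoch
    return history, None


# Hypothetical validation losses: improvement stops after epoch 3.
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.38, 0.39, 0.40, 0.41]
history, stopped_at = run_schedule(losses)
print(history, stopped_at)
```

On this curve the learning rate is halved at epoch 6, after three epochs without improvement, and training stops at epoch 8, after five.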
4.4. Fully Connected Layers and Output Layer
The classification head is a three-layer fully connected network that performs dimension mapping and classification decisions on the temporal features output by the LSTM. It maps the 128-dimensional features output by the two-layer LSTM, layer by layer, to the 2 output categories corresponding to direct-path pulses and multipath-scattered pulses.
The output layer uses a Softmax activation function to output the class probability distribution for each pulse. The predicted category is the class with the highest probability, enabling end-to-end pulse classification.
4.5. Component-Ablation Study
Before the perturbation experiments, a component-level ablation was conducted to verify how the main design choices affect the residual LSTM under the clean validation setting. The evaluated variants remove ΔTOA, LayerNorm, residual fusion, class-weighted loss, or data augmentation from the full configuration. All variants keep the same train/validation split and binary direct/scattered label definition.
Table 1 shows that removing ΔTOA produces the largest degradation in mean macro-F1, from 0.9995 to 0.7679, which indicates that inter-pulse timing difference is the dominant cue in the simulated validation set. Removing LayerNorm, residual fusion, class weighting, or augmentation has a limited effect on the clean validation score, but these components are retained in the final configuration because they support convergence stability, class-imbalance handling, and robustness under degraded PDW conditions.
5. Experimental Validation and Analysis of Results
To validate the path-classification model, the experimental section is organized around the complete current workflow: single-emitter simulator construction, component ablation, controlled method comparison under pulse loss and parameter jitter, class-balance analysis, simulation-measured domain-discrepancy assessment, high-consensus measured-data pseudo-label quality control, controlled transfer learning, and held-out measured-data diagnostic visualization. The quantitative performance claims are made on the simulated validation and robustness protocols. The measured-data results are reported as pseudo-label consistency, diagnostic classification outputs, and qualitative localization visualization.
5.1. Dataset Construction
The simulation dataset was generated with a single-emitter direct/scattered maritime ESM simulator. The relevant key settings of the simulator are shown in Table 2 and Table 3. The simulator assumes that emitter-level sorting has already been completed and therefore focuses on path-level discrimination within one emitter stream. It outputs the same eight PDW fields used by the downstream classifiers: time, frequency [MHz], amplitude [dBm], pulse width [μs], repetition interval [μs], TOA [μs], DOA [°], and pulse type. The complete dataset contains 600,000 PDW records, including 362,290 scattered pulses and 237,710 direct-path pulses. A chronological split by TOA is used, with 480,000 records for training and 120,000 records for validation. This split preserves temporal ordering and avoids randomly mixing adjacent pulse sequences across training and validation.
5.2. Comparison of Experimental Settings
In this experiment, all comparison methods use the same simulated train/validation CSV files and the same direct/scattered label definition. The comparison includes three non-neural baselines (FCM, DBSCAN, and TSA), a Single-LSTM sequence baseline, and the residual two-layer LSTM. For FCM and DBSCAN, training-set labels are used only after clustering to map cluster indices to semantic classes; this mapping is a supervised calibration step for otherwise unsupervised baselines. TSA uses fixed temporal and angular reference values and therefore represents a prior-dependent rule baseline rather than a fully adaptive learning method.
The controlled comparison is implemented in two layers. First, all methods are trained or calibrated on the explicit simulated training split. Second, the validation stream is perturbed by six pulse-loss rates (0%, 10%, 20%, 30%, 40%, and 50%) and six normalized parameter-jitter levels (0.0 to 1.0), producing 36 coupled degradation scenarios. Macro-F1 is used as the primary measure because both direct and scattered classes are required for subsequent multipath-assisted localization.
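The 36-scenario perturbation grid can be sketched as below. The way the normalized jitter level maps to a perturbation (zero-mean Gaussian noise with standard deviation equal to the level times a per-feature scale) and the independent random dropping of pulses are assumptions of this illustration; the paper does not specify either mechanism here, and the function and variable names are hypothetical.

```python
import itertools
import random

LOSS_RATES = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
JITTER_LEVELS = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]


def degrade(stream, loss_rate, jitter, feature_scale, rng):
    """Drop pulses at random, then perturb the features of the surviving pulses."""
    kept = [row for row in stream if rng.random() >= loss_rate]
    return [[v + rng.gauss(0.0, jitter * s) for v, s in zip(row, feature_scale)]
            for row in kept]


rng = random.Random(0)
# Hypothetical two-feature validation stream of 1000 pulses.
stream = [[float(i), 30.0] for i in range(1000)]
feature_scale = [1.0, 2.0]

scenarios = list(itertools.product(LOSS_RATES, JITTER_LEVELS))
clean = degrade(stream, 0.0, 0.0, feature_scale, rng)
severe = degrade(stream, 0.5, 1.0, feature_scale, rng)
print(len(scenarios), len(clean), len(severe))
```

The clean corner leaves the stream unchanged, while the most severe corner keeps roughly half of the pulses and perturbs every remaining feature, so each method is evaluated on the same degraded stream at every grid coordinate.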
5.2.1. FCM Clustering Method
FCM is an unsupervised soft-clustering method based on fuzzy set theory. In this experiment, a seven-dimensional standardized PDW feature space is used, and a two-cluster fuzzy partition is estimated by batch iteration. The fuzzy coefficient is set to m = 2, and the batch size is set to 10,000 to support large-scale pulse data processing. Training labels are used only to associate the resulting clusters with the two semantic classes and to reduce the effect of class imbalance in the calibration stage. After clustering, TOA difference, cumulative difference histogram (CDIF), and sequence difference histogram (SDIF) information are used for temporal consistency verification. The core parameter settings are shown in Table 4.
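For reference, the standard FCM membership update with fuzzy coefficient m can be written in a few lines. The sketch covers only the membership step for a single point with fixed cluster centers, not the batch iteration or the temporal verification used in the experiment; the function name and the two-dimensional example are hypothetical.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster for fuzzy coefficient m."""
    dists = [math.dist(point, c) for c in centers]
    if any(d == 0.0 for d in dists):
        # The point coincides with a center: assign full membership there.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]


# Two hypothetical cluster centers in a standardized 2-D feature space.
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fcm_memberships((1.0, 0.0), centers, m=2.0)
print(u)
```

With m = 2 the memberships scale with the inverse squared distance, so a point at distances 1 and 3 from the two centers receives memberships of 0.9 and 0.1.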
5.2.2. DBSCAN Clustering Method
DBSCAN is an unsupervised density-based clustering method. The neighborhood radius defines the local search region, and the minimum number of samples in the neighborhood determines core, boundary, and noise points. Density connectivity is then used to assign high-density pulse samples to clusters. To handle the large dataset, mini-batch processing is used. During calibration, cluster centers are mapped to pulse categories according to the label distribution of the training set; during inference, a 1-nearest-neighbor rule assigns test samples to the calibrated clusters. The same seven standardized PDW features are used for comparison with the LSTM-based methods. The core parameter settings are shown in Table 5.
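The supervised calibration step shared by the clustering baselines can be sketched as a majority vote followed by a nearest-center lookup. The majority-vote rule, the convention that noise points carry the cluster id -1 (as in scikit-learn's DBSCAN), and all names and toy values below are assumptions of this illustration rather than the exact procedure used in the experiments.

```python
import math
from collections import Counter


def map_clusters_to_classes(cluster_ids, labels, noise_id=-1):
    """Assign each cluster the majority training label of its members."""
    votes = {}
    for cid, y in zip(cluster_ids, labels):
        if cid == noise_id:          # assumed convention: noise points are skipped
            continue
        votes.setdefault(cid, Counter())[y] += 1
    return {cid: cnt.most_common(1)[0][0] for cid, cnt in votes.items()}


def nearest_cluster(point, centers):
    """1-nearest-neighbor rule over calibrated cluster centers."""
    return min(centers, key=lambda cid: math.dist(point, centers[cid]))


# Toy calibration data: cluster ids from clustering, labels from the training set.
cluster_ids = [0, 0, 0, 1, 1, 1, 1, -1]
labels = [1, 1, 0, 0, 0, 0, 1, 1]      # 1 = direct-path, 0 = scattered
mapping = map_clusters_to_classes(cluster_ids, labels)

centers = {0: (0.0, 0.0), 1: (5.0, 5.0)}
predicted_class = mapping[nearest_cluster((4.0, 4.5), centers)]
print(mapping, predicted_class)
```

Because the labels enter only through this mapping, the clustering itself remains unsupervised, which is the sense in which the baselines are described as calibrated rather than trained.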
5.2.3. TSA Method
The TSA method is a rule-based classification method that incorporates physical prior knowledge of the pulse signal and offers strong interpretability. First, core physical features such as DOA, TOA difference, and PRI are extracted from the basic PDW features, and derived features such as period stability, TOA-difference stability, and period proximity are then calculated with a sliding window. Taking the physical characteristics of the direct-path pulse as a fixed reference, feature constraint boundaries are constructed from the period-stability threshold and the DOA threshold. A three-condition combined decision rule that fuses spatial orientation, temporal stability, and periodic consistency then completes the pulse classification. The core parameter settings are shown in Table 6.
5.2.4. LSTM Method
The residual LSTM method constructs seven-dimensional standardized PDW features into sequences of length 32. Two unidirectional LSTM layers are used to extract temporal dependencies, with LayerNorm and dropout applied after recurrent processing. Residual fusion is introduced when the hidden dimensions match, so that lower-level temporal features can be preserved while higher-level dependencies are extracted. To address class imbalance, the cross-entropy loss is weighted inversely to the number of samples in each class. Gaussian noise, random jitter, and random scaling are applied as data augmentation. The Adam optimizer, ReduceLROnPlateau scheduler, and early-stopping strategy are used during training. The core parameter settings are shown in Table 7.
To provide a closer neural baseline, a single-layer unidirectional LSTM was evaluated. It uses the same seven standardized PDW features, sequence length (32), classification head, class-weighted loss, and data augmentation as the residual LSTM, but removes the second recurrent layer and residual fusion. This baseline isolates the effect of the deeper residual recurrent structure from the general use of an LSTM classifier.
5.3. Comparison of Experimental Results and Analysis
To evaluate performance under degraded PDW quality, a two-factor controlled test was conducted. Parameter jitter was varied over six normalized levels (0.0–1.0), where larger values indicate stronger perturbations of the PDW parameters. Pulse loss was varied over six levels from 0% to 50%. The full design therefore includes 36 coupled pulse-loss/parameter-jitter scenarios. Macro-F1 is used as the primary metric because of the direct/scattered class imbalance; class accuracy and single-batch inference time are also reported.
5.3.1. Influence of Pulse Parameter Jitter on Sorting Performance
To evaluate sensitivity to pulse-parameter jitter, the pulse-loss rate was fixed and macro-F1 was compared across jitter levels, as shown in Figure 3.
Under jitter-free conditions, the TSA method can achieve competitive performance because its fixed timing assumptions are consistent with the simulated pulse structure. As parameter jitter and pulse loss increase, however, the residual LSTM is less dependent on fixed reference values and generally maintains higher macro-F1 than FCM, DBSCAN, and TSA. The Single-LSTM baseline follows the same sequence-learning strategy but shows a larger performance drop under degraded PDW conditions, indicating that the second recurrent layer and residual fusion improve robustness in the tested scenarios. At 0% pulse loss, the residual LSTM macro-F1 decreases from 0.9827 at jitter 0.0 to 0.9693, 0.9200, 0.8630, 0.8139, and 0.7759 as jitter rises to 0.2, 0.4, 0.6, 0.8, and 1.0. The total decrease along this clean-loss jitter edge is therefore 0.2068. The corresponding Single-LSTM edge drops from 0.9959 to 0.6576, a decrease of 0.3384.
These results show that the residual LSTM remains comparatively stable in the low-jitter region, where the PDW parameter perturbation is weak and the temporal relationship among adjacent pulses is still preserved. At jitter 0.0, increasing pulse loss from 0% to 50% changes the residual-LSTM macro-F1 from 0.9827 to 0.9678, a decrease of only 0.0149. At 50% loss and jitter 0.0, the scattered-pulse class accuracy is 0.9647 and the direct-pulse class accuracy is 0.9758. This indicates that observation thinning alone is less damaging than a strong distortion of the PDW feature values in the current simulator.
At 50% pulse loss, the residual-LSTM macro-F1 decreases from 0.9678 at jitter 0.0 to 0.9417, 0.8823, 0.8242, 0.7829, and 0.7459 as jitter rises to 0.2, 0.4, 0.6, 0.8, and 1.0. In the strongest degradation corner, its scattered-pulse class accuracy is still 0.9423, while direct-pulse class accuracy falls to 0.5299. This class-level number is important: the model retains many scattered candidates, but the direct-path reference becomes much harder to identify under combined jitter and missingness.
The comparison methods show different numerical failure modes under jitter. At 0% loss and jitter 0.0, TSA has macro-F1 of 0.9034, but it falls to 0.7114 at jitter 0.2 and 0.5004 at jitter 1.0. DBSCAN changes from 0.7766 to 0.5382 along the same edge, and FCM changes from 0.6987 to 0.5189. These values show that TSA is strongly tied to its fixed DOA-period reference assumptions, while clustering methods are limited by static feature geometry. The residual LSTM is also affected by jitter, but its clean-loss jitter edge remains higher than the three non-recurrent baselines at every jitter level.
The Single-LSTM baseline confirms that recurrent modeling itself is beneficial for this task. Its clean setting is the strongest single coordinate, with macro-F1 of 0.9959 at 0% loss and jitter 0.0. However, its surface contracts faster when jitter becomes large: at 0% loss and jitter 1.0 it gives 0.6576, and at 50% loss with jitter 1.0 it gives 0.6211. The residual LSTM gives 0.7759 and 0.7459 at those two coordinates. The observed advantage is a numerical robustness advantage of the tested residual implementation under the present PDW-sequence protocol.
5.3.2. Influence of Pulse Loss on Sorting Performance
With the parameter-jitter level fixed, the variation of the macro-average F1 score of each method across pulse-loss rates is analyzed. The results for the core scenarios are shown in Figure 4:
Along the pulse-loss direction, the residual LSTM generally shows a smoother decline than along the high-jitter direction when the remaining PDW parameters are only lightly or moderately perturbed. At jitter 0.4, macro-F1 changes from 0.9200 at 0% loss to 0.8823 at 50% loss, a decrease of 0.0377. At jitter 0.6, it changes from 0.8630 to 0.8242, a decrease of 0.0388. These decreases are smaller than the loss-0 jitter-edge decrease of 0.2068. This supports the interpretation that, in this experiment, parameter distortion is the dominant stressor, while missing pulses mainly amplify the damage when jitter is already high.
This does not mean that missing pulses are unimportant. The downstream localization use requires both reliable direct-pulse references and reliable scattered-pulse candidates. At the strongest degradation point, the residual LSTM has a macro-F1 of 0.7459, but the direct-pulse class accuracy is only 0.5299 compared with the scattered-pulse class accuracy of 0.9423. Severe missingness therefore has a practical cost even when macro-F1 remains higher than the baselines, because the later geometry step depends on enough correct direct/scattered combinations.
Under high jitter, missing pulses further erode temporal continuity and reduce the reliability of both learned recurrent representations and fixed-rule TSA references. For example, TSA falls from 0.9034 in the clean setting to 0.4728 at 50% loss and jitter 1.0, while Single-LSTM falls from 0.9959 to 0.6211.
DBSCAN retains comparatively smooth behavior in some regions because density connectivity is less sensitive to exact sequence order. Its macro-F1 varies from 0.7777 at its best surface coordinate to 0.5374 at the strongest degradation corner. At 50% loss and jitter 1.0, DBSCAN keeps scattered-pulse class accuracy of 0.9695 but direct-pulse class accuracy is only 0.1858.
FCM remains limited by its static fuzzy partition of the standardized PDW space. Its clean macro-F1 is 0.6987 and falls to 0.5078 at 50% loss and jitter 1.0. The severe-corner class accuracies are 0.8274 for scattered pulses and 0.2291 for direct-path pulses. The fuzzy clusters do not adequately preserve the minority direct-path class across the perturbation grid.
5.3.3. Comprehensive Robustness Analysis Under Coupled Pulse Loss and Jitter
Using the 36 coupled scenarios in Figure 5, the robustness of the tested methods was evaluated across simultaneous pulse loss and parameter jitter. The comparison includes FCM, DBSCAN, TSA, Single-LSTM, and residual LSTM.
The residual LSTM surface retains the broadest high-score region and shows the smoothest degradation trend under coupled perturbations. Numerically, its macro-F1 ranges from 0.9827 to 0.7459 across the 36 coordinates. It remains at or above 0.90 in 16 scenarios and at or above 0.80 in 27 scenarios. By comparison, Single-LSTM reaches at least 0.90 in 12 scenarios and at least 0.80 in 18 scenarios, while DBSCAN and FCM do not reach 0.80 in any of the 36 scenarios. The residual LSTM has the broadest usable surface region under the current protocol.
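The surface statistics used in this comparison (score range and the number of scenarios at or above a threshold) can be computed from a macro-F1 grid as sketched below. The function names and the small grid are hypothetical; the grid is not the paper's 36-scenario data and serves only to show the counting rule.

```python
def count_at_or_above(surface, threshold):
    """Count grid coordinates whose macro-F1 is at or above `threshold`."""
    return sum(score >= threshold for row in surface for score in row)


def surface_range(surface):
    """Return (maximum, minimum) macro-F1 over the whole grid."""
    flat = [score for row in surface for score in row]
    return max(flat), min(flat)


# Hypothetical 2 x 3 surface (rows: loss rate, columns: jitter); NOT the paper's data.
toy_surface = [
    [0.98, 0.91, 0.78],
    [0.95, 0.85, 0.75],
]

n90 = count_at_or_above(toy_surface, 0.90)
n80 = count_at_or_above(toy_surface, 0.80)
best, worst = surface_range(toy_surface)
```

Applying the same two functions to each method's 6 × 6 surface would give the per-method counts and ranges that the comparison relies on.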
The Single-LSTM surface provides the closest neural baseline and confirms the value of recurrent sequence modeling. It starts from a higher clean-corner value than the residual LSTM, 0.9959 versus 0.9827, but its minimum value is lower, 0.6211 versus 0.7459. Its clean-to-severe drop is 0.3749, compared with 0.2368 for the residual LSTM. The stronger contraction in the severe-loss and high-jitter area supports the use of the deeper residual recurrent implementation for this particular path-classification task.
Among the non-recurrent baselines, DBSCAN forms the strongest density-based reference in several degraded regions, with a surface range of 0.5374–0.7777. TSA has a high clean-corner value of 0.9034, but only one of its 36 coordinates is at or above 0.90 and its strongest-degradation value is 0.4728. FCM stays in a lower band of 0.5063–0.6987 and never reaches 0.70 in the current grid.
5.3.4. Class Balance Analysis
Class balance is operationally important in this task for two reasons. First, the dataset is moderately imbalanced: without pulse loss, direct-path pulses account for only 39.62% of samples, while scattered pulses account for 60.38%. A classifier trained only to maximize overall accuracy may therefore under-detect the minority direct class. Second, both classes are needed for downstream localization. Direct-path pulses provide timing references, and scattered pulses carry information about propagation geometry and scatterer position. Severe missed detection in either class can invalidate subsequent multipath exploitation. Macro-F1 and class-wise recall are therefore more informative than overall accuracy alone.
We use per-class recall as the main measure of class balance. Recall is computed separately over the true samples of each class, so it is not affected by the class proportions and directly reflects how completely the model recognizes each class. The absolute difference between the two class recalls then quantifies classification balance: the smaller the difference, the less the model is biased toward either class. The results are shown in Figure 6.
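As a concrete illustration of why per-class recall and macro-F1 are preferred over overall accuracy here, the sketch below computes them for a two-class problem. The function name, the label encoding (0 for direct, 1 for scattered), and the toy labels are assumptions made for illustration; they are not the paper's evaluation code or data.

```python
def class_metrics(y_true, y_pred, labels=(0, 1)):
    """Per-class precision/recall/F1, macro-F1, and the absolute recall gap (two classes)."""
    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}
    macro_f1 = sum(m["f1"] for m in per_class.values()) / len(labels)
    recall_gap = abs(per_class[labels[0]]["recall"] - per_class[labels[1]]["recall"])
    return per_class, macro_f1, recall_gap


# Toy example: 4 direct (0) and 6 scattered (1) pulses; half the direct pulses are missed.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]

per_class, macro_f1, recall_gap = class_metrics(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case overall accuracy is 0.8 even though only half of the direct-path pulses are recovered; the recall gap of 0.5 and the lower macro-F1 expose the imbalance that accuracy hides.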
Because both classes are needed for the subsequent geometry-based processing, the method comparison emphasizes scenario-level macro-F1, macro precision, macro recall, and class-level behavior rather than overall accuracy alone or a single cross-scenario average. At 50% loss and jitter 1.0, the residual LSTM gives macro precision of 0.8053, macro recall of 0.7361, and macro-F1 of 0.7459. At the same coordinate, Single-LSTM gives 0.7823, 0.6353, and 0.6211; DBSCAN gives 0.7220, 0.5776, and 0.5374. These values indicate that the residual LSTM better preserves balanced path-class recognition in the most difficult tested corner.
5.4. Measured Data Verification and Transfer Learning
Significant domain gaps exist between simulated and measured data. The measured apparent gap rate reaches 0.4541, far exceeding the simulated value of 0.0986. Key features such as pulse width and pulse repetition interval also show substantial distribution differences. A Residual LSTM model trained purely on simulated data performs poorly on measured data, with a macro-F1 score of only 0.4972. To address this issue, we use conservative multi-constraint high-consistency rules to construct high-confidence pseudo-labels. These rules are derived from bistatic radar geometry and signal propagation characteristics, and their reliability is further verified through manual expert sampling inspection. This approach enables few-shot transfer of the simulation-trained model to the measured domain.
To strictly prevent data leakage, we divide 570,772 measured data samples into three completely isolated partitions. The first 342,463 samples form the transfer partition, which is used exclusively for pseudo-label screening and transfer training. The middle 57,077 samples serve as the guard partition, acting as an isolation buffer to prevent sequence windows from crossing training and test boundaries during time-block splitting. The last 171,232 samples constitute the held-out prediction partition, which is used only for final classification diagnosis and visualization and does not participate in any training or label generation processes.
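The three partition sizes are consistent with a contiguous 60%/10%/30% chronological split with integer truncation. That ratio is an inference from the reported counts, not something the text states, and the function below is a hypothetical sketch of such a split rather than the authors' code.

```python
def chronological_partitions(n_samples, transfer_tenths=6, guard_tenths=1):
    """Split [0, n_samples) into contiguous transfer / guard / held-out index ranges.

    The 6/1/3 tenths default is an assumption inferred from the reported sizes.
    Integer arithmetic avoids floating-point rounding at the boundaries.
    """
    transfer_end = n_samples * transfer_tenths // 10
    guard_end = transfer_end + n_samples * guard_tenths // 10
    return (
        range(0, transfer_end),          # pseudo-label screening and transfer training
        range(transfer_end, guard_end),  # isolation buffer, never used for training or testing
        range(guard_end, n_samples),     # held-out prediction partition
    )


transfer, guard, held_out = chronological_partitions(570_772)
print(len(transfer), len(guard), len(held_out))
```

Because the guard partition is far longer than any plausible sequence window, a window that starts in the transfer partition cannot reach the held-out partition, which is the leakage path the buffer is meant to close.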
Within the transfer partition, we screen high-confidence pseudo-labels using multi-constraint rules that integrate Time of Arrival (TOA) difference and Direction of Arrival (DOA) geometric features. Under the default threshold configuration (0.88 for direct-path pulses, 0.84 for scattered pulses), we obtain 54,157 high-confidence samples (31,277 direct, 22,880 scattered). The overall coverage rate is 15.81%, and the conflict rate is only 1.64%. Threshold sensitivity analysis shows this configuration achieves the best balance between coverage and label quality. The strict configuration reduces the conflict rate to 0.01% but limits coverage to 13.15%, while the relaxed configuration increases coverage to 56.93% but raises the conflict rate to 3.56%.
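The screening step can be pictured as a two-threshold filter. The sketch below is a simplified stand-in: it assumes each pulse already has a direct-path confidence and a scattered-path confidence in [0, 1] produced by the multi-constraint rules, and it counts a pulse as a conflict when both confidences pass their thresholds. The score inputs, the function name, and this definition of conflict rate are all assumptions; the text does not specify how the rules score a pulse or how conflicts are counted.

```python
def screen_pseudo_labels(direct_scores, scattered_scores,
                         direct_thr=0.88, scattered_thr=0.84):
    """Keep only pulses that pass exactly one class threshold.

    Returns (labels, coverage, conflict_rate); rejected or conflicting
    pulses get the label None. Both rates are fractions of all input pulses.
    """
    labels, conflicts = [], 0
    for d, s in zip(direct_scores, scattered_scores):
        is_direct, is_scattered = d >= direct_thr, s >= scattered_thr
        if is_direct and is_scattered:
            conflicts += 1          # both rules fire: treated as a conflict and dropped
            labels.append(None)
        elif is_direct:
            labels.append("direct")
        elif is_scattered:
            labels.append("scattered")
        else:
            labels.append(None)     # neither rule is confident enough
    n = len(labels)
    kept = sum(label is not None for label in labels)
    return labels, kept / n, conflicts / n


# Hypothetical scores for four pulses (not measured data).
labels, coverage, conflict_rate = screen_pseudo_labels(
    [0.95, 0.10, 0.90, 0.50],
    [0.20, 0.90, 0.85, 0.30],
)

# The reported coverage follows from the reported counts.
reported_coverage = (31_277 + 22_880) / 342_463
```

Raising the thresholds shrinks both the kept set and the conflict count, which is the coverage versus label-quality trade-off described by the strict and relaxed configurations.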
Using the screened high-confidence pseudo-labels, we perform controlled transfer learning with a chronological time-block splitting strategy. We compare three standard progressive fine-tuning strategies: classifier head only, last LSTM layer unfrozen, and full parameter unfreezing. Full-parameter fine-tuning achieves the best performance. It reaches a macro-F1 score of 0.8932 on the pseudo-label test set, a 79.6% relative improvement over the simulation-only baseline of 0.4972. The recall rates for direct and scattered pulses are 94.88% and 86.32%, respectively. The training parameters for full-parameter fine-tuning are listed in Table 8.
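The three fine-tuning strategies differ only in which parameters remain trainable. The framework-free sketch below selects parameter names per strategy. It assumes a model whose recurrent stack is a single two-layer PyTorch-style `nn.LSTM` stored under a hypothetical attribute `lstm` (so second-layer parameters end in `_l1`) and whose classifier is stored under a hypothetical attribute `head`. The actual residual architecture may be organized differently, in which case the name patterns would need adjusting.

```python
STRATEGIES = ("head_only", "last_lstm", "full")


def trainable_names(param_names, strategy,
                    head_prefix="head.", last_layer_suffix="_l1"):
    """Return the parameter names left trainable under a fine-tuning strategy."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    selected = []
    for name in param_names:
        if strategy == "full":
            keep = True                                   # unfreeze everything
        elif name.startswith(head_prefix):
            keep = True                                   # the head is always adapted
        else:
            # Only the last recurrent layer is unfrozen, and only for "last_lstm".
            keep = strategy == "last_lstm" and name.endswith(last_layer_suffix)
        if keep:
            selected.append(name)
    return selected


# Hypothetical parameter names following the nn.LSTM naming convention.
names = [
    "lstm.weight_ih_l0", "lstm.weight_hh_l0",
    "lstm.weight_ih_l1", "lstm.bias_hh_l1",
    "head.weight", "head.bias",
]
head_only = trainable_names(names, "head_only")
last_lstm = trainable_names(names, "last_lstm")
full = trainable_names(names, "full")

# With a PyTorch module, the selection could be applied as:
#   keep = set(trainable_names([n for n, _ in model.named_parameters()], strategy))
#   for n, p in model.named_parameters():
#       p.requires_grad = n in keep
```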
It is worth noting that high-consistency rules are sparse label screening protocols rather than full classifiers, and their outputs are used solely for transfer learning. Pseudo-label consistency metrics only reflect label matching within the transfer partition and cannot be directly interpreted as measured classification performance under human annotation conditions.
The fine-tuned model is applied to the measured dataset. The resulting classification of direct and multipath-scattered pulses is subsequently used for coarse localization of scatterers in the vicinity. This is achieved by applying the external illuminator radar equation in conjunction with the known position of the non-cooperative emitter. The main positioning constraint is

$$\sqrt{R_e^2 + R_s^2 - 2 R_e R_s \cos(\theta_s - \theta_e)} + R_s - R_e = c\,\Delta\tau$$

where the ESM receiver is the origin of the polar coordinates; $(R_e, \theta_e)$ is the polar coordinate of the non-cooperative emitter; $(R_s, \theta_s)$ are the polar coordinates of the scattering target; $c$ is the speed of light; and $\Delta\tau$ is the delay difference between the multipath and direct-path pulse. The measured angle of arrival of the scattered pulse gives $\theta_s$, so the distance $R_s$ of the scattering target can be solved.
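The delay-difference constraint can be solved in closed form for the scatterer range. The sketch below uses notation that is assumed for illustration: the emitter is at range `emitter_range_m` and bearing `emitter_bearing_rad` from the receiver, the scattered pulse arrives from bearing `scatter_bearing_rad`, and `delay_s` is the extra delay of the scattered pulse relative to the direct pulse. Requiring that the emitter-to-scatterer path plus the scatterer-to-receiver path exceeds the direct path by `c * delay_s`, and applying the law of cosines, gives the expression in the code. This is a geometric sketch under a flat two-dimensional assumption, not the authors' implementation.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def scatterer_range(emitter_range_m, emitter_bearing_rad, scatter_bearing_rad, delay_s):
    """Receiver-to-scatterer range from the bistatic delay-difference constraint.

    With L = R_e + c * delay and dtheta the bearing difference:
        R_s = (L^2 - R_e^2) / (2 * (L - R_e * cos(dtheta)))
    """
    if delay_s <= 0:
        raise ValueError("the scattered pulse must arrive after the direct pulse")
    path_sum = emitter_range_m + C * delay_s   # total length of the scattered path
    cos_dtheta = math.cos(scatter_bearing_rad - emitter_bearing_rad)
    return (path_sum**2 - emitter_range_m**2) / (
        2.0 * (path_sum - emitter_range_m * cos_dtheta)
    )


# Round-trip check on a hypothetical geometry: emitter 20 km away at bearing 0,
# scatterer 5 km away at bearing 90 degrees.
true_range = 5_000.0
emitter_to_scatterer = math.hypot(20_000.0, true_range)
delay = (emitter_to_scatterer + true_range - 20_000.0) / C
estimated_range = scatterer_range(20_000.0, 0.0, math.pi / 2, delay)
```

The denominator stays positive for any positive delay, since `path_sum` then exceeds the emitter range, so the solution is unique for a given bearing.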
The comparison between the localization plot and the real-time open Automatic Identification System (AIS) chart data is shown in Figure 7. The obtained scatterer positions are broadly consistent with AIS-marked ship positions and surrounding environmental information. This result supports the feasibility of using the classified direct and scattered pulses as inputs to multipath-assisted localization, but it should be interpreted as qualitative validation because calibrated localization error metrics are not available in the present dataset.
The measured-data experiment suggests that the temporal representation learned from simulation can be transferred to real PDW sequences after limited fine-tuning. However, the measured labels are not manually annotated or AIS-calibrated, and the localization stage is not evaluated using RMSE, false-alarm rate, missed-detection rate, or matching success rate.
5.5. Cross-Scenario Summary and Repeated-Seed Stability
Table 9 compares the performance of all methods across the 36 combined scenarios. Under complex degradation conditions with both pulse loss and parameter jitter, our proposed residual two-layer LSTM achieves the highest average macro-F1 score of 0.8717. It outperforms the second-best method, the Single-LSTM baseline, by 2.96 percentage points, the traditional DBSCAN clustering method by 16.47 percentage points, and the rule-based TSA method by 24.40 percentage points. This result indicates a clear advantage of deep sequence learning methods for classifying disturbed PDW sequences.
Among non-neural network baselines, DBSCAN shows relatively good robustness. It achieves an average macro-F1 score of 0.7070 with small performance fluctuations (standard deviation = 0.0791). This is because DBSCAN’s density-based clustering is less sensitive to pulse loss than sequence-order-dependent methods, so it maintains reasonable classification performance even when some observations are missing. In contrast, the DOA-Period (TSA) method achieves high performance under ideal conditions (macro-F1 = 0.9034) but is extremely sensitive to parameter jitter. Its performance standard deviation of 0.1410 is the highest among all methods. This matches the design of DOA-Period, which relies on fixed DOA and period reference values. When actual signal parameters deviate from these preset values, its classification performance drops sharply.
The FCM method has the lowest average performance, with an average macro-F1 score of only 0.5766, although TSA falls below it at the strongest degradation corner (0.4728 versus 0.5078). FCM is limited because its global fuzzy partitioning assumption cannot effectively distinguish direct and scattered pulses from the same emitter. The two pulse types have almost identical core parameters (RF, PW, PRI) and differ only slightly in TOA, DOA, and PA. These small differences do not form clearly separable clusters in the standardized PDW feature space.
Table 10 summarizes the training stability of the residual LSTM model on the original validation set. Across four independent training runs with different random seeds (42, 2024, 2025, 3407), the model’s macro-F1 scores remain stable between 0.9811 and 0.9827, with a mean of 0.9821 ± 0.0007 and a narrow 95% confidence interval of ±0.0007. This extremely small variation shows that under the current simulated dataset setup, the model’s validation results are insensitive to random initialization, and the training process is highly reproducible and stable.
6. Discussion
The observed advantage of the residual LSTM is consistent with the fundamental structure of the same-emitter path-classification problem. Direct and scattered pulses can share nearly identical RF, PW, and nominal PRI patterns, while their TOA, DOA, PA, and ΔTOA behavior changes systematically with propagation geometry and observation continuity. Ordered PDW sequences therefore contain path-dependent regularities that are difficult to capture with static clustering alone. The revised ablation result supports the usefulness of ΔTOA as an informative temporal cue but does not justify treating it as the only dominant feature within the broader PDW-sequence representation.
The measured-data results require particularly cautious interpretation. The high-consistency pseudo-labels used for transfer learning are derived from the TSA method, which our own experiments show performs poorly under non-ideal conditions. Although we apply a conservative quality-control protocol that retains only samples with high agreement and confidence, and spot-check the rules through expert sampling inspection, these labels cannot be fully and independently verified without manual annotation. The held-out prediction partition reduces leakage between transfer and diagnosis, but it does not supply ground-truth labels. Therefore, pseudo-label consistency, entropy, predicted class ratio, and localization plots are presented solely as diagnostic outputs. They support the feasibility of the proposed workflow but do not allow a calibrated statement of real-data precision, recall, RMSE, or mean distance error.
Our simulation-measured distribution analysis reveals substantial domain discrepancies, including Kolmogorov-Smirnov (KS) statistics of 0.7270 for PW and 0.6496 for PRI, and a much higher measured apparent-gap rate of 0.4541 compared with 0.0986 in simulation. These differences explain why direct simulation-only deployment is not feasible and motivate the use of transfer learning. The transfer learning comparison also challenges the common assumption that frozen feature extractors are sufficient for domain adaptation. Head-only fine-tuning improves over simulation-only inference but is clearly inferior to recurrent-layer adaptation, indicating that the temporal dependencies learned in simulation are related to, but not identical with, those in measured data. In practical deployment, the recurrent feature extractor should therefore be adapted under quality-controlled measured samples whenever reliable labels or high-confidence pseudo-labels are available.
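For reference, a two-sample KS statistic of the kind quoted for PW and PRI is the largest vertical gap between the empirical distribution functions of the simulated and measured samples. The sketch below is a minimal pure-Python version; it is assumed, not confirmed by the text, that the reported KS values use this standard two-sample definition, and the sample values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # Both empirical CDFs are step functions, so the maximum gap
    # is attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical pulse-width samples in microseconds (not the paper's data).
simulated_pw = [1.0, 2.0, 3.0, 4.0]
measured_pw = [3.0, 4.0, 5.0, 6.0]
shifted = ks_statistic(simulated_pw, measured_pw)
identical = ks_statistic(simulated_pw, simulated_pw)
disjoint = ks_statistic([1.0, 2.0], [10.0, 11.0])
```

A value of 0 means the two empirical distributions coincide and 1 means they do not overlap at all, so values near 0.65 to 0.73 indicate that simulated and measured PW and PRI occupy largely different ranges.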
7. Conclusions
This paper studies a critical maritime ESM preprocessing task in which same-emitter pulses are separated into direct-path and multipath-scattered classes before multipath-assisted passive localization. Using an updated single-emitter simulator and an explicit chronological train/validation split that preserves temporal ordering, we evaluate FCM, DBSCAN, TSA, Single-LSTM, and residual LSTM methods under a common PDW-sequence protocol. The 36 coupled pulse-loss/parameter-jitter scenarios are interpreted through updated macro-F1 response surfaces rather than a single scenario-averaged ranking, providing a more nuanced understanding of performance under different degradation conditions.
The numerical surface analysis shows that the residual LSTM maintains the broadest stable performance region across all tested scenarios. Its macro-F1 changes from 0.9827 in the clean setting to 0.7459 at 50% pulse loss and maximum jitter, remaining at or above 0.90 in 16 scenarios and at or above 0.80 in 27 scenarios. By comparison, Single-LSTM has 12 and 18 such scenarios respectively, while DBSCAN and FCM have no scenarios at or above 0.80. At the strongest degradation coordinate, the residual LSTM gives a macro-F1 of 0.7459, compared with 0.6211 for Single-LSTM, 0.5374 for DBSCAN, 0.5078 for FCM, and 0.4728 for TSA. The ablation records show that removing ΔTOA decreases macro-F1 from 0.9995 to 0.7679, which supports ΔTOA as a useful temporal cue but does not establish it as the only dominant feature.
Simulation-measured distribution statistics reveal substantial domain discrepancies that necessitate careful transfer learning. Controlled transfer learning experiments show that head-only adaptation is insufficient for the observed domain shift, and that unfreezing all trainable layers gives the best pseudo-label consistency macro-F1 of 0.8932. Held-out measured-data classification and localization visualization are reported as diagnostic outputs only, because the current measured data do not provide manual pulse labels or calibrated AIS numerical trajectories for objective real-data precision/recall or localization-error evaluation. Overall, our results demonstrate that deep sequence learning methods are well-suited for the direct/scattered pulse classification task and that multipath returns can be retained as structured information at the sorting stage to enable subsequent scatterer localization.
8. Future Work
These results lay the groundwork for several important improvements that address the limitations identified in this study. Future work will first focus on establishing a fully validated measured-data benchmark with manually annotated pulse labels and synchronized numerical AIS trajectories, which will allow objective evaluation of real-data classification performance and localization accuracy using quantitative metrics such as RMSE and mean distance error. We will also extend the simulation environment to include more realistic maritime effects such as dynamic sea clutter, multi-emitter crosstalk, hardware nonlinearities, and advanced frequency-agile behaviors, and conduct a more comprehensive analysis of the simulation-measured domain discrepancy to inform the development of principled domain adaptation methods. Finally, we will further investigate the transfer learning process to better understand how temporal dependencies learned from simulation transfer to real-world data, and develop more efficient adaptation strategies that balance performance and computational cost for embedded ESM deployment.