Weakly Supervised Transformer for Radar Jamming Recognition

Abstract: Radar jamming recognition is a key step in electronic countermeasures, and accurate and sufficient labeled samples are essential for supervised learning-based recognition methods. However, in real practice, collected radar jamming samples often have weak labels (i.e., noisy-labeled or unlabeled ones), which degrade recognition performance. Recognition performance is further hindered by limitations in capturing the global features of radar jamming. The Transformer (TR) has advantages in modeling long-range relationships; therefore, a weakly supervised Transformer is proposed to address the performance degradation that occurs under weak supervision. Specifically, a complementary label (CL) TR, called RadarCL-TR, is proposed to improve radar jamming recognition accuracy with noisy samples. CL learning and a cleansing module are successively utilized to detect and remove potentially noisy samples, mitigating their adverse influence. Additionally, a semi-supervised learning (SSL) TR, called RadarSSL-PL-TR, is proposed to boost recognition performance with unlabeled samples via pseudo labels (PLs). Network generalization is improved by training with unlabeled samples assigned pseudo labels. Moreover, RadarSSL-PL-S-TR is proposed to further promote recognition performance, where a selection module identifies reliable pseudo-labeling samples. The experimental results show that the proposed RadarCL-TR and RadarSSL-PL-S-TR outperform the comparison methods in recognition accuracy by at least 7.07% and 6.17% with noisy and unlabeled samples, respectively.


Introduction
As the battlefield environment becomes increasingly complex, radar jamming has become more prevalent. Radar jamming is a serious threat to radar systems in electronic warfare, disrupting target detection, identification, and tracking [1,2]. Therefore, to acquire a real target and its parameters, anti-jamming techniques are widely used to improve the performance of radar systems. As a fundamental prerequisite, the effective recognition of radar jamming types provides significant support for implementing appropriate anti-jamming techniques. Namely, accurate radar jamming recognition plays a critical role in ensuring the survival and effectiveness of radar systems on the battlefield [3,4].
The existing methods for the radar jamming recognition task mainly focus on supervised learning, which depends on strong supervision information (i.e., adequate training samples with high-quality labels). However, the samples collected in real applications often come with weak supervision information (i.e., noisy or unlabeled training samples), making it challenging to achieve good recognition performance. Weakly supervised learning has shown effectiveness under weak supervision in the areas of natural language processing [5], computer vision [6], and imaging radar applications [7]. Yet, there are no weakly supervised learning methods specific to the radar jamming recognition task. Next, a comprehensive overview of the existing research is presented.
Supervised learning-based methods for radar jamming recognition: In recent decades, many supervised learning methods have proven effective for radar jamming recognition tasks. These methods can mainly be divided into two categories: traditional feature extraction-based methods and deep learning-based methods. Traditional feature extraction-based methods proceed as follows: First, distinguishing features are extracted from the time domain [8], frequency domain [9], time-frequency domain [10], and transform domain [11]. Then, these features are provided to a machine learning model for training [12,13]. However, traditional feature extraction-based methods usually rely on hand-crafted feature extraction designed by experts, which makes it difficult to guarantee excellent recognition performance.
To overcome the shortcomings of traditional feature extraction-based methods, deep convolutional neural network (CNN)-based methods [14] were introduced in the field of radar jamming recognition. These methods have the advantage of extracting powerful features. Specifically, in contrast to traditional feature extraction-based methods, CNN-based methods work through a layer-by-layer feature transformation process in which the features of the original data are progressively transformed into a new feature space. That is, CNN-based methods can automatically extract more discriminative features and thus usually achieve superior recognition performance [15]. According to the format of the input radar jamming signal, these CNN-based methods can be divided into one-dimensional methods [16,17] and two-dimensional methods [18,19]. For one-dimensional methods, radar jamming echo sequences are directly fed into deep-learning models [20,21]. Unlike one-dimensional methods, two-dimensional methods usually transform echo sequences into radar jamming images using a short-time Fourier transform. Then, these images are used as inputs for models [22,23].
Transformer-based methods for radar jamming recognition: CNN-based methods extract features by sliding convolutional kernels over the radar jamming signal, and each convolutional kernel can only extract local features within its receptive field [24]. Therefore, the aforementioned one-dimensional and two-dimensional CNN-based methods are restricted in learning long-range relationships between sampling points of radar jamming. Recently, the Transformer (TR) [25] has attracted wide attention due to its global dependency extraction capability and has achieved impressive results in many fields [26]. The self-attention mechanism allows the TR to capture dependencies across arbitrary sampling points. Thus, the TR enables the extraction of global features of the radar jamming signal without being limited by the distance between sampling points. The advantages of the TR have prompted researchers to explore more suitable frameworks to improve radar jamming recognition accuracy. Analogously, based on the format of the input radar jamming signal, TR-based methods can also be divided into one-dimensional methods [27] and two-dimensional methods [28], which achieve higher accuracy than CNN-based methods owing to the TR's remarkable capability of modeling long-distance relationships. Thus, the TR is a more suitable choice for long radar jamming sequences.
In summary, the aforementioned deep learning-based methods (i.e., CNN-based methods and TR-based methods) all belong to supervised learning, which depends on high-quality labels and enough labeled training samples. Although these supervised learning methods have shown superior performance in radar jamming recognition tasks, the radar jamming samples collected in practice often lack good supervision information, leading to a drop in recognition performance. Specifically, on the one hand, accurately labeling samples is challenging due to human and nonhuman factors. On the other hand, it is extremely difficult to obtain sufficient labeled training samples on the complex battlefield because of security and technological limitations, among others. Accordingly, the obtained radar jamming samples may contain noisy samples (i.e., mislabeled samples) and plenty of unlabeled samples. Both noisy samples and unlabeled samples negatively impact recognition accuracy. That is to say, noisy samples or a lack of sufficient labeled training samples makes it more challenging to accurately recognize radar jamming types. Therefore, it is highly desirable to address this problem with specialized methods.
Weakly supervised learning: Weakly supervised learning is an umbrella term for approaches that lighten the requirement of strong labels by learning with mislabeled or unlabeled data, and it consists of three typical types: incomplete supervision, inexact supervision, and inaccurate supervision [29]. To this end, weakly supervised learning provides a direction for alleviating the reduction in recognition performance caused by radar jamming samples with weak labels. According to the definition of weak supervision in [29], the radar jamming recognition task in the presence of noisy samples and unlabeled samples belongs to inaccurate supervision and incomplete supervision, respectively.
For inaccurate supervision, mislabels (i.e., unreliable labels) are called noisy labels, where the annotated labels differ from the ground-truth labels [30]. As research on inaccurate supervision has deepened, noisy labels have been widely discussed [31]. In detail, due to manual labeling errors (a lack of professionalism), raw data noise, and other issues, data with noisy labels are inevitable [32]. Moreover, when there are noisy labels in the dataset, the model tends to overfit these noisy labels, making it less generalizable. To improve recognition performance, the samples with noisy labels should first be identified. Then, the harmful effects of noisy samples on recognition performance should be reduced during the training process.
For incomplete supervision, there are only a few labeled samples, and the rest are unlabeled [33]. The degradation of recognition performance caused by inadequate labeled training samples must be addressed. Semi-supervised learning (SSL) is a powerful tool for solving this problem. It uses the model itself to assign pseudo labels (PLs) to unlabeled samples. Following this, high-confidence pseudo-labeling samples (those above a defined confidence threshold) are treated as labeled samples and added to the training set. Therefore, instead of using labeled samples alone, both labeled and unlabeled samples are utilized in SSL to train a recognizer. As a result, the underlying data properties are better captured in SSL, which improves recognition performance when plenty of unlabeled samples are available. SSL has been widely applied and has made remarkable progress in many fields, including image recognition [34], language translation [35], and so on.
Many weakly supervised learning methods have shown great capability in deep learning-based signal processing. In recent years, many researchers in radar signal processing have devoted themselves to emerging research on weak supervision for radar tasks. These methods have achieved excellent performance in the radar field, including synthetic aperture radar (SAR) [36], inverse SAR (ISAR) [37], and high-resolution range profiles (HRRPs) [38]. However, to the best of our knowledge, there is currently no relevant research on the application of weak supervision in non-imaging radar signal processing.
In summary, on the one hand, identifying radar jamming types with noisy or unlabeled samples is an application requirement of the actual battlefield. On the other hand, a methodology for radar jamming tasks under weak supervision information is still lacking. Motivated by this, it is necessary to improve radar jamming recognition accuracy under weak supervision. The main contributions of this study are summarized as follows: (1) Complementary label (CL) learning is introduced for the first time to recognize radar jamming types with noisy samples. Specifically, a novel framework called RadarCL-TR is devised to reduce the risk of incorrect information (i.e., noisy labels) and increase the discrimination of features, which boosts radar jamming recognition performance with noisy samples. (2) Semi-supervised learning is specifically designed, for the first time, for radar jamming recognition tasks in the presence of plenty of unlabeled radar jamming samples. Specifically, an elaborate semi-supervised learning method with a pseudo-labeling Transformer (i.e., RadarSSL-PL-TR) mitigates recognition performance degradation by making good use of unlabeled radar jamming samples via pseudo labels generated by the model itself. (3) Moreover, based on the RadarSSL-PL-TR network, to avoid the negative impact of ambiguous pseudo-labeling samples on recognition performance, a radar semi-supervised learning method with pseudo-labeling sample selection, called RadarSSL-PL-S-TR, is further explored to achieve higher recognition accuracy.
The rest of this paper is organized as follows: The proposed RadarCL-TR and RadarSSL-PL-S-TR methods for the radar jamming recognition task are presented in Sections 2 and 3, respectively. The datasets and experimental settings, experimental results, and discussions are reported in Sections 4 and 5. Section 6 concludes the paper.

The Proposed RadarCL-TR Framework for Radar Jamming Recognition with Noisy Samples
Figure 1 shows an overview of the proposed RadarCL-TR framework, which can be divided into a training process and a testing process.
The training process consists of three main stages: noisy radar jamming sample detection, noisy radar jamming sample cleansing, and radar jamming recognition. (1) In the first stage, "noisy radar jamming sample detection", the "Transformer Architecture" is used to extract features from the noisy training set D_n, and noisy samples are detected by learning from complementary labels. (2) In the second stage, "noisy radar jamming sample cleansing", based on the noisy samples detected in the first stage, the noisy samples are filtered out by comparing the output probability of each sample with a set threshold. Thus, only clean radar jamming training samples, whose labels are accurate, are selected as the training set D_s. (3) In the third stage, "radar jamming recognition", after cleansing the radar jamming samples with noisy labels, the "Transformer Architecture" trained in the first stage is used to further extract features from all of the selected clean radar jamming samples in the training set D_s.

During the testing process, the "Transformer Architecture" that has undergone two rounds of training (i.e., the "Transformer Architecture" from the third stage) is used as a feature extractor to extract features and then recognizes the tested radar jamming types to achieve the goal of radar jamming recognition.

Noisy Radar Jamming Sample Detection
Suppose that D = {(x_i, y_i)} represents the radar jamming training set, where i = 1, ..., N, and N is the total number of radar jamming training samples. For each radar jamming training sample x_i, the accurate label y_i ∈ {1, ..., K} may be flipped into an inaccurate label (i.e., a noisy label) based on a noise transition matrix T_jk, where K is the total number of classes in the radar jamming dataset. T_jk can be defined as follows:

T_jk = p(ỹ_i = k | y_i = j, x_i)    (1)

where ỹ_i denotes the noisy label, and p(ỹ_i = k | y_i = j, x_i) is the probability that x_i changes from the correct label class j to the noisy label class k. Here, η stands for the noise ratio of the training set. Then, the radar jamming training set with noisy labels can be defined as D_n = {(x_i, ỹ_i)}, where the mislabeled samples are called noisy samples.

Different from ordinary supervised learning, complementary label learning is an indirect learning method that attempts to train the feature extraction network (i.e., the Transformer architecture shown in Figure 1, which will be described in Section 2.2) by providing less but correct information from complementary labels. Taking "pure noise", whose true label is class 1, as an example, its complementary label should be "not class 1". Specifically, for a class-j radar jamming sample (x_i, y_i = j), its complementary label ȳ_i can be obtained by randomly selecting a label from a candidate label list, which consists of the other K − 1 classes except j. The process can be formulated as follows:

ȳ_i = Rand({1, ..., K} \ {j})    (2)

where Rand(·) denotes uniform random selection from a set. Based on Equation (2), the radar jamming training set with complementary labels can be denoted as D_c = {(x_i, ȳ_i)}. To ensure that the recognizer trained with complementary labels converges to the optimal recognizer trained with true labels, the modified loss function can be described as follows:

L(f, ȳ) = −(1/N) Σ_{i=1}^{N} Σ_{k=1}^{K} ȳ_k log(1 − p_k)    (3)

where i and N indicate the i-th radar jamming training sample x_i and the total number of radar jamming training samples, respectively. ȳ_k and p_k are the one-hot vector of the complementary label and the prediction, respectively. According to Equation (3), the value of L(f, ȳ) becomes smaller as the prediction probability of the complementary label approaches 0, which in turn increases the probability values of the other classes. In this way, the noisy radar jamming samples can be detected.
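The complementary-label draw of Equation (2) and the modified loss of Equation (3) can be sketched as follows (a minimal NumPy illustration; the function names are ours, and the network's softmax outputs are stood in for by a plain probability array):

```python
import numpy as np

def complementary_labels(y, num_classes, rng):
    """Draw one complementary label per sample: a class the sample
    is guaranteed NOT to belong to (Equation (2))."""
    comp = np.empty_like(y)
    for i, yi in enumerate(y):
        candidates = [k for k in range(num_classes) if k != yi]
        comp[i] = rng.choice(candidates)
    return comp

def complementary_loss(probs, comp, eps=1e-12):
    """Modified loss (Equation (3)): small when the predicted
    probability of the complementary label is close to 0."""
    p_comp = probs[np.arange(len(comp)), comp]
    return float(np.mean(-np.log(1.0 - p_comp + eps)))
```

Minimizing this loss pushes the probability of each complementary label toward 0, which implicitly raises the probabilities of the remaining classes.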

TR-Based Feature Extraction for Radar Jamming
Figure 2 shows the Transformer architecture in Figure 1, which is the key part for extracting discriminative features of radar jamming. Given a radar jamming sample x_i ∈ R^l with l sampling points, it is equally split into a series of non-overlapping pieces x'_i. The series of non-overlapping pieces of radar jamming is defined as

x'_i = [x_i^(1), x_i^(2), ..., x_i^(l/p)]

where p indicates the size of a single piece. After linear projection, a trainable class token is appended to the piece tokens of radar jamming. Then, these tokens, along with their position information, form the initial input. Next, the input is fed to the Transformer encoder for feature extraction, which consists of L Transformer encoder blocks L_1, L_2, ..., L_L. Specifically, one Transformer encoder block contains two layer-normalization layers, two element-wise addition operations, multi-head self-attention (MHSA), and a multi-layer perceptron. Note that the MHSA is used for integrating long-range features of radar jamming, and each attention head can be formulated as follows:

Attention(Q, K, V) = softmax(QK^T / √(d_k)) V    (4)

where Q ∈ R^{d_q}, K ∈ R^{d_k}, and V ∈ R^{d_v} are the query, key, and value, respectively. Finally, the correlations between piece tokens of radar jamming are continuously aggregated into the class token from a global perspective. The predictions are then produced by the fully connected layer via a softmax function.
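The piece splitting and the scaled dot-product attention at the core of the MHSA can be sketched as follows (a single-head NumPy illustration with caller-supplied projection matrices; it omits the multi-head concatenation, class token, and position embeddings):

```python
import numpy as np

def split_into_pieces(x, p):
    """Split a length-l jamming sequence into l/p non-overlapping pieces."""
    assert len(x) % p == 0
    return x.reshape(-1, p)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: every piece token attends to
    every other token, giving the global receptive field discussed above."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V
```

With l = 4000 and p = 16, a sample splits into 250 piece tokens, and the attention matrix relates every pair of pieces regardless of their distance in time.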

Noisy Radar Jamming Sample Cleansing
Considering the fact that the model is prone to fitting noisy labels, erroneous directions are provided when training the model, which leads to poor generalization performance. To address this problem and further avoid memorizing wrong information from noisy labels, the noisy radar jamming samples are removed in the proposed methods. Based on the assumption that a higher predicted probability indicates a prediction closer to the ground truth, the threshold α is used to filter out those radar jamming samples with noisy labels, as illustrated in Figure 1. Let P_i denote the logits after softmax for the radar jamming sample x_i. P_i is the foundation for the predicted probability S_i, which can be formulated as follows:

S_i = P_i(ỹ_i)    (5)

where ỹ_i denotes the given label of the radar jamming sample x_i, and S_i is the probability in P_i corresponding to the class ỹ_i of the radar jamming sample x_i.
Let D_s = {(x_s, y_s)} (D_s ⊆ D_n) represent the selected clean radar jamming training set, where s = 1, ..., M, M < N, and M denotes the total number of selected clean radar jamming training samples. The clean radar jamming samples are defined as those with accurate labels. Meanwhile, the output probability S_i of each radar jamming sample is compared with the set threshold α, which can be expressed as follows:

(x_i, ỹ_i) ∈ D_s, if S_i ≥ α; otherwise, (x_i, ỹ_i) is judged noisy    (6)

According to Equation (6), the original noisy radar jamming training set D_n can first be divided into clean and noisy radar jamming training samples depending on the predictions and the threshold α. Next, the judged clean and noisy radar jamming training samples are retained and discarded, respectively. After that, the noisy radar jamming samples have been cleansed, and only the selected clean radar jamming training samples are used for the eventual recognition task. Noisy radar jamming sample cleansing is a useful tool for alleviating the negative impact of noisy radar jamming samples, resulting in better recognition performance.
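The cleansing rule of Equation (6) amounts to a simple threshold filter, sketched below (a pure-Python illustration; `probs` stands in for the softmax outputs P_i, and the function name is ours):

```python
def cleanse(samples, labels, probs, alpha):
    """Keep sample i only if the softmax probability assigned to its
    given (possibly noisy) label reaches the threshold alpha (Equation (6))."""
    keep = [i for i in range(len(labels)) if probs[i][labels[i]] >= alpha]
    clean_x = [samples[i] for i in keep]
    clean_y = [labels[i] for i in keep]
    return clean_x, clean_y
```

Samples whose given label receives a low probability under the complementary-label-trained model are treated as noisy and discarded.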

Radar Jamming Recognition
After removing the noisy samples from the given noisy radar jamming training set D_n = {(x_i, ỹ_i)}, the selected clean radar jamming samples are fed into the feature extraction network to complete the recognition task. Two networks, the CNN and the TR, are used to form RadarCL-CNN and RadarCL-TR, respectively. However, the CNN has difficulty modeling long-range dependencies because of the limited convolutional kernel size. Compared with the CNN, the TR can capture distant relationships in radar jamming. According to the above analysis, the TR is more suitable for radar jamming recognition tasks.
All samples in D_s are regarded as clean radar jamming training samples whose labels are correct. Then, these samples are used to train the feature extraction network with the ordinary cross-entropy (CE) loss function.
Note that when minimizing the CE loss using the gradient descent algorithm, the closer the prediction probability of the true label is to 1, the smaller the value of loss (closer to 0), which is different from Equation (3).
Finally, the radar jamming testing set is fed into the trained feature extraction network to categorize each radar jamming sample into different groups based on their features and then generate recognition results.

The Proposed RadarSSL-PL-S-TR Framework for Radar Jamming Recognition with Labeled and Unlabeled Samples
As shown in Figure 3, the designed RadarSSL-PL-S-TR method consists of four main steps: (1) pre-training with the labeled radar jamming set, (2) generating pseudo labels for unlabeled radar jamming samples, (3) retraining with the labeled and pseudo-labeling radar jamming set, and (4) selecting reliable pseudo-labeling radar jamming samples and then looping back to step 3. First, the labeled radar jamming samples are used to pre-train the feature extraction network, providing a foundation for obtaining the pseudo labels of unlabeled samples. Next, the unlabeled radar jamming samples are fed to the model trained in step 1 to generate pseudo labels. Then, the pseudo-labeling and labeled radar jamming samples are combined to train the recognizer. Finally, the reliable pseudo-labeling radar jamming samples are regarded as labeled samples and added to the original labeled set to form the new labeled radar jamming training set. Steps 3 and 4 are then repeated in a loop.
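The four-step loop can be sketched end to end as follows (a toy NumPy illustration in which a nearest-centroid classifier stands in for the Transformer recognizer; the AROC-based reliability selection of step 4 is omitted for brevity, and all names are ours):

```python
import numpy as np

def centroid_fit(X, y, num_classes):
    # One class centroid per jamming type (toy stand-in for "training").
    return np.stack([X[y == k].mean(axis=0) for k in range(num_classes)])

def centroid_predict(C, X):
    # Predict the class of the nearest centroid.
    d = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=-1)
    return d.argmin(axis=1)

def pseudo_label_loop(Xl, yl, Xu, num_classes, rounds=3):
    C = centroid_fit(Xl, yl, num_classes)   # step 1: pre-train on labeled set
    for _ in range(rounds):
        yu = centroid_predict(C, Xu)        # step 2: generate pseudo labels
        X = np.concatenate([Xl, Xu])        # step 3: retrain on labeled +
        y = np.concatenate([yl, yu])        #         pseudo-labeling samples
        C = centroid_fit(X, y, num_classes) # steps 2-3 repeat each round
    return C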



Pre-training Labeled Radar Jamming Set
Due to the challenge of labeling radar jamming samples in practical applications, there may be a large number of unlabeled samples but few labeled samples in the radar jamming dataset. In general, the number of labeled radar jamming samples has a great impact on recognition performance, and recognition performance will be heavily hampered when only a few labeled radar jamming samples are used to train the network. To overcome the issue of recognition performance degradation under limited labeled radar jamming samples, unlabeled radar jamming samples should be fully utilized during the training process with the aim of achieving better recognition performance.
Let D_l = {x_l, y_l}_{l=1}^{m} and D_u = {x_u}_{u=m+1}^{m+t} (t ≫ m) denote the labeled radar jamming training set with m labeled samples and the unlabeled radar jamming set with t unlabeled samples, respectively. As shown in Figure 3, step 1 aims to optimize the deep feature extraction model with the labeled radar jamming samples using the CE loss. After that, the unlabeled radar jamming samples are matched to their corresponding pseudo labels by reusing the trained model.

Generating Pseudo Labels for Unlabeled Radar Jamming Samples
Note that the connection between labeled and unlabeled radar jamming samples can be established through the feature representation space of a trained model (defined as g(·)). In this way, pseudo labels are assigned to unlabeled radar jamming samples by the model itself. Specifically, feeding an unlabeled radar jamming sample x_u to g(·) outputs a series of predicted probabilities for the different classes. Then, the category corresponding to the maximum prediction is regarded as the pseudo label for the given unlabeled radar jamming sample, which can be written as follows:

ŷ_u = argmax_k g(x_u)_k    (7)

where Y represents the pseudo labels for the unlabeled radar jamming samples D_u, and each ŷ_u in Y corresponds to the class with the highest predicted probability. In this way, the pseudo-labeling radar jamming training set is created as D_p = {x_u, ŷ_u}_{u=m+1}^{m+t}. Pseudo labels provide a way to make use of these unlabeled radar jamming samples, which enables recognition performance to be improved by learning from both labeled radar jamming samples and unlabeled radar jamming samples with pseudo labels instead of only learning from a few labeled radar jamming samples. Training the model with a large number of unlabeled radar jamming samples and their generated pseudo labels can significantly promote the generalization ability of the model.
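The pseudo-label assignment above reduces to an argmax over the model's class probabilities, as sketched below (NumPy illustration; returning the maximum probability alongside the label is our addition, useful as a confidence score):

```python
import numpy as np

def generate_pseudo_labels(probs):
    """Pseudo label = class with the highest predicted probability.
    The probability itself is also returned as a confidence score."""
    return probs.argmax(axis=1), probs.max(axis=1)
```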

Retraining Labeled and Pseudo-Labeling Radar Jamming Set
The model gains prediction ability from learning the labeled radar jamming samples and applies it to produce pseudo labels for unlabeled radar jamming samples. Once the pseudo labels are obtained, the unlabeled radar jamming samples can participate in training the network, which helps capture richer and more discriminative features of radar jamming. That is, both the labeled radar jamming training set D_l and the pseudo-labeling radar jamming training set D_p are employed in the training process. In particular, taking the loss function of the pseudo-labeling samples into consideration, the overall loss function combining the losses of the labeled and pseudo-labeling radar jamming samples can be computed as follows:

L = L_l + β L_p    (8)

where L_l and L_p are the losses for the labeled and unlabeled radar jamming samples with pseudo labels, respectively, and L(·) is the CE loss function. β = (i/epoch) · γ is a hyperparameter that weights the pseudo-labeling radar jamming samples by γ and changes over training, where i and epoch represent the i-th epoch and the total number of epochs, respectively.
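The overall loss and its epoch-dependent weight can be sketched as follows (plain-Python illustration; the function names are ours, and the linear ramp is our reading of the hyperparameter described above):

```python
def beta_schedule(epoch_i, total_epochs, gamma):
    """beta = (i / total_epochs) * gamma: the weight on the pseudo-label
    loss ramps up linearly as training progresses."""
    return (epoch_i / total_epochs) * gamma

def overall_loss(loss_labeled, loss_pseudo, epoch_i, total_epochs, gamma):
    """L = L_l + beta * L_p, combining labeled and pseudo-labeling losses."""
    return loss_labeled + beta_schedule(epoch_i, total_epochs, gamma) * loss_pseudo
```

Early in training the model relies mostly on the labeled samples; as its pseudo labels become more trustworthy, their contribution to the loss grows.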

Selecting Reliable Pseudo-Labeling Radar Jamming Samples
Although the pseudo labels are generated based on the model's high-confidence predictions, they are sometimes inaccurate. To mitigate the misguidance of incorrect pseudo labels to the model, it is necessary to select unlabeled radar jamming samples with reliable pseudo labels. A simple yet effective method is used to select high-quality pseudo labels. Specifically, first, a clustering algorithm, i.e., the approximate rank-order clustering (AROC) algorithm, is adopted to generate pseudo labels based on the assumption that radar jamming samples belonging to the same group tend to possess the same label. Then, each unlabeled radar jamming sample x_u in D_u acquires a pseudo label according to the AROC clusters together with the labeled radar jamming samples; these cluster-based pseudo labels are defined as Y′. The pseudo labels in Y and Y′ are compared element by element, and the pseudo-labeling radar jamming samples are then split into reliable and unreliable ones. In general, a pseudo label is regarded as reliable when it is the same in both Y and Y′; otherwise, it is considered unreliable.
Next, according to the comparison results, the selected reliable pseudo-labeling radar jamming samples are collected into the set {(x_u, ŷ_u) ∈ D_p : ŷ_u = ŷ′_u}, where ŷ_u and ŷ′_u are the pseudo labels of x_u in Y and Y′, respectively.
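The element-wise comparison of the network pseudo labels Y with the cluster-derived labels Y′ can be sketched as follows (NumPy illustration; the function name is ours):

```python
import numpy as np

def select_reliable(Xu, y_net, y_cluster):
    """Keep a pseudo-labeling sample only when the network prediction (Y)
    and the cluster-derived label (Y') agree; the rest are unreliable."""
    agree = np.asarray(y_net) == np.asarray(y_cluster)
    return np.asarray(Xu)[agree], np.asarray(y_net)[agree]
```

Only the agreeing samples are promoted to the labeled set for the next retraining round; the disagreeing ones remain in the unlabeled pool.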

Data Description
The radar jamming simulation dataset is used to verify the effectiveness of the proposed methods. In the dataset, a linear frequency modulation (LFM) signal is used as the radar transmission waveform, which can be described as follows:

s(t) = rect(t/T) · exp(jπkt²)

where k = B/T is the frequency modulation slope, B is the bandwidth, and T denotes the pulse width; their values are 10 MHz and 20 µs, respectively. The rect(t/T) function can be expressed as follows:

rect(t/T) = 1 for |t| ≤ T/2, and 0 otherwise

According to the different jamming effects, the radar jamming simulation dataset can be divided into pure noise, suppression jamming (i.e., aiming jamming (AJ), blocking jamming (BJ), and sweep jamming (SJ)), deception jamming (i.e., distance deception jamming (DDJ) and dense false target jamming (DFTJ)), novel jamming (i.e., interrupted-sampling repeater jamming (ISRJ) and smart noise jamming (SNJ)), passive jamming (i.e., chaff jamming (CJ)), and compound jamming. In summary, 12 types of radar jamming signals (i.e., the value of K is 12) are created, each containing 500 samples, for a total of 6000 samples (i.e., the value of N is 6000). Figure 4 shows their time domain waveforms, where the 12 types of radar jamming signals are labeled sequentially as C1 to C12.
The sampling rate is 20 MHz, and the pulse repetition interval (PRI) is 100 µs. Consequently, each sample has 2000 complex sampling points. The real and imaginary parts of the complex sampling points are then concatenated, so each sample has 4000 sampling points (i.e., l = 4000), where the former and latter 2000 points correspond to the real part and the imaginary part, respectively. At the same time, the piece size p is set to 16.
(1) Aiming Jamming (AJ)
Compared to the other types of radar jamming, the bandwidth of AJ is narrow. It is randomly set to 2~4 times the LFM signal bandwidth B (i.e., 20~40 MHz). AJ is usually used with noise amplitude modulation jamming, which can be expressed as follows: where U 0 is the direct current bias. The modulation noise U n (t) is a generalized stationary random process distributed in the interval [−U 0, ∞) with a mean of 0. φ j is a random variable uniformly distributed over the interval [0, 2π) and independent of the modulation noise U n (t).
(2) Blocking Jamming (BJ)
BJ can be generated quickly, but it requires the jammer to have higher power. Its bandwidth is randomly set to 5~8 times the LFM signal bandwidth (i.e., 50~80 MHz). BJ is usually used with noise frequency modulation jamming, which can be expressed as follows: where U j and K FM are the amplitude and the frequency modulation slope of the noise frequency modulation jamming, respectively.
(3) Sweep Jamming (SJ)
The principle of SJ is the same as that of BJ, except that its bandwidth range is larger and changes periodically. When the jamming bandwidth exceeds the bandwidth of the receiver, the receiver cannot receive the jamming signal, so there is no jamming at positions where the jamming bandwidth is greater than the receiver bandwidth.
It can be observed in Figure 4e that the amplitude of the noise frequency modulation in SJ is constant within the bandwidth of the radar receiver, and the frequency is an LFM signal superimposed with a random frequency.
(4) Distance Deception Jamming (DDJ)
DDJ intercepts radar signals and then forwards them after modulation. To achieve the deception effect as far as possible, only the time delay and the amplitude are modulated, so the forwarded jamming signal is similar to the radar signal waveform. Specifically, where A is the amplitude of DDJ, t 0 denotes the time delay corresponding to the target echo, and τ represents the additional time delay of jammer forwarding. The jamming amplitude A varies randomly from 0.5 to 2, and the time delay of DDJ is randomly selected from 1 µs to 10 µs.
(5) Dense False Target Jamming (DFTJ)
Unlike DDJ, which forwards one false target, DFTJ forwards multiple false targets. It can be expressed as follows: where F is the number of forwarded false targets, and its value is set to 3~5. Additionally, the time delays of the target and the f-th false target are denoted as t 0 and τ f, respectively. A f is the amplitude of the f-th false target.
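Both deception types can be sketched as delayed, amplitude-scaled replicas of the radar signal (a simplified discrete-time illustration; the function names and toy signal are ours):

```python
import numpy as np

def ddj(s, fs, A, tau):
    """One false target: a copy of s delayed by tau and scaled by amplitude A."""
    d = int(round(tau * fs))
    j = np.zeros_like(s)
    j[d:] = A * s[:len(s) - d]
    return j

def dftj(s, fs, amps, taus):
    """Dense false targets: the sum of F delayed, scaled copies."""
    return sum(ddj(s, fs, A, tau) for A, tau in zip(amps, taus))

s = np.ones(10)
one_target = ddj(s, fs=1.0, A=2.0, tau=3.0)
three_targets = dftj(s, fs=1.0, amps=[1.0, 1.0, 1.0], taus=[1.0, 2.0, 3.0])
```

In this toy setting the single false target appears three samples after the true echo, while the dense version overlaps three such copies.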
(6) Interrupted-Sampling Repeater Jamming (ISRJ)
ISRJ captures part of the target signal and repeatedly forwards it, creating false targets that harm the performance of radar systems. The time-domain expression for ISRJ is as follows: where A and M are the amplitude and the number of repeated forwarding processes, respectively. τ denotes the time delay of the ISRJ. s(t) is the radar signal defined in Equation (9). T j and ⊗ are the sampling duration and the convolution operation, respectively. T s and N represent the PRI and its number.
(7) Smart Noise Jamming (SNJ)
The sampling and forwarding modes of SNJ are similar to those of ISRJ, and they both belong to novel radar jamming. Gaussian white noise n(t) is adopted to generate SNJ, which can be expressed as follows:
(8) Chaff Jamming (CJ)
Radar active jamming can destroy the function of acquiring target information against a strong jamming background by actively transmitting high-power electromagnetic radiation signals. However, this active jamming method also has hidden dangers: the active jamming signal can itself become the target of the enemy's counter-confrontation and attack. In addition to active jamming, passive jamming methods can also be used to produce deception and suppression effects. CJ is a commonly used kind of passive jamming, which can be expressed as follows: where I is the number of chaffs in the chaff cloud, randomly selected from the range of 1000 to 2000. f ′ di is the Doppler shift of the i-th chaff. The Doppler frequency shift caused by chaff cloud translation is 20~50, and the Doppler variance of a single chaff is 10~20.
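The interrupted-sampling-and-forwarding behaviour of ISRJ described above can be sketched as follows (a discrete-time toy version; the parameter handling is our simplification of the time-domain expression):

```python
import numpy as np

def isrj(s, fs, A, tau, T_j, M):
    """Capture a slice of duration T_j and forward it M times at delay tau."""
    n = int(T_j * fs)
    sampled = s[:n]                         # intercepted signal segment
    j = np.zeros_like(s)
    for m in range(1, M + 1):
        start = m * int(tau * fs)           # m-th forwarded copy
        end = min(start + n, len(s))
        if start < len(s):
            j[start:end] += A * sampled[:end - start]
    return j

s = np.arange(8.0)
jam = isrj(s, fs=1.0, A=1.0, tau=2.0, T_j=2.0, M=2)
```

Each forwarded copy lands at a multiple of the delay, which is what produces the train of false targets after pulse compression.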
(9) Compound Jamming
In a complex battlefield environment, the cooperative operation of multiple jammers means the radar receives not only single jamming but also additive compound jamming containing two or more jamming signals. Therefore, this article combines the above single radar jamming processes to obtain three compound jamming types: DFTJ + SNJ, CJ + ISRJ, and DDJ + SJ.
The jamming-to-noise ratio (JNR) of AJ, BJ, SJ, ISRJ, and SNJ is randomly selected in the interval [30 dB, 60 dB].

Evaluation Metrics
To evaluate recognition performance, two quantitative metrics widely used in recognition tasks, the overall accuracy (OA) and the kappa coefficient (KC), are employed. Higher values of these metrics indicate better recognition performance. The average per-class accuracy (APA) is defined as the mean accuracy for each type of radar jamming. Furthermore, to mitigate the impact of random errors and guarantee the reliability of the recognition results, the results are reported as the mean and standard deviation over five independent experiments.
OA is the proportion of correctly recognized test samples t N among the total number of test samples T N, which is defined as OA = t N / T N. KC quantifies the consistency between the model-predicted outcomes and the actual results, defined as KC = (p o − p e)/(1 − p e), where p o = OA and p e = (Σ k g k h k)/T N², in which g k and h k are the number of actual test samples and the number of predicted test samples in class k, respectively, and K is the total number of classes.
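Both metrics can be computed directly from per-class counts. A minimal sketch, using the standard confusion-matrix form of the kappa coefficient consistent with the g k and h k counts above:

```python
import numpy as np

def overall_accuracy(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(y_true == y_pred)        # t_N / T_N

def kappa(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    T_N = len(y_true)
    p_o = overall_accuracy(y_true, y_pred)  # observed agreement (OA)
    # expected agreement from actual (g_k) and predicted (h_k) class counts
    p_e = sum((y_true == c).sum() * (y_pred == c).sum()
              for c in np.union1d(y_true, y_pred)) / T_N ** 2
    return (p_o - p_e) / (1 - p_e)
```

For four samples with one class-1 sample misassigned to class 0, OA is 0.75 while KC drops to 0.5, since KC discounts agreement expected by chance.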

Experimental Setup
(1) Experimental Setup for Recognition with Noisy-Labeled Samples
For recognition with noisy-labeled samples, in each class, 30 samples and another 3 samples are randomly chosen as the training set and the validation set, respectively, and the remaining samples are used as the testing set. Widely used advanced methods based on noise-robust loss functions, including the generalized cross entropy (GCE) [39] and symmetric cross entropy (SCE) [40], are used as comparison methods. The hyperparameter q in [39] is set to 0.5. In [40], the hyperparameters alpha and beta are set to 0.1 and 1.0, respectively.
In the following experiments, the symmetric noise ratio η is set to 0.1 and 0.2 to compare the performance of the different methods. The results are reported as the mean and standard deviation over ten independent runs, with training samples randomly selected in each run.
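Symmetric label noise is conventionally generated by flipping a fraction η of the labels uniformly to one of the other K − 1 classes; the exact generation procedure below is our assumption:

```python
import numpy as np

def add_symmetric_noise(labels, eta, K, rng):
    """Flip round(eta * N) labels to a uniformly chosen different class."""
    noisy = np.array(labels)
    idx = rng.choice(len(noisy), int(round(eta * len(noisy))), replace=False)
    for i in idx:
        others = [c for c in range(K) if c != noisy[i]]
        noisy[i] = rng.choice(others)
    return noisy

rng = np.random.default_rng(0)
noisy = add_symmetric_noise(np.zeros(100, dtype=int), eta=0.2, K=12, rng=rng)
```

Because a flipped label never stays in its original class, exactly η · N labels end up corrupted.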
For the proposed RadarCL-TR method, the number of Transformer encoder blocks L and parallel attention heads h are set to 5 and 6, respectively. During training, the initial learning rate in the first stage of noisy radar jamming sample detection is set to 0.0009. Considering the difference in output probability between noisy and clean samples, the threshold α is set to 0.5 in the second stage. Additionally, a learning rate schedule using cosine annealing from 0.001 to 0.0001 is adopted in the final radar jamming recognition stage.
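The cosine-annealed schedule in the last stage can be sketched as follows (only the 0.001 → 0.0001 range comes from the text; the epoch count is illustrative):

```python
import math

def cosine_lr(epoch, total_epochs, lr_max=1e-3, lr_min=1e-4):
    """Anneal the learning rate from lr_max down to lr_min over training."""
    cos_term = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + cos_term)
```

The rate starts at lr_max, decays slowly at first, then drops faster mid-training before flattening out at lr_min.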
(2) Experimental Setup for Recognition with Labeled and Unlabeled Samples
For recognition with unlabeled samples, 10 training samples per class (denoted as R, with R = m/K) are selected as the labeled training set to explore the recognition performance of the different methods. The proposed RadarSSL-PL-S-TR is compared with popular semi-supervised recognition methods, such as label propagation (LP) [41] and the Laplacian support vector machine (LapSVM) [42].
For the proposed RadarSSL-PL-S-TR, the initial learning rate and the number of epochs in the pretraining stage are set to 0.001 and 100, respectively. The learning rate and the number of epochs in the retraining stage are set to 0.00007 and 50, respectively.
To guarantee a fair comparison, all experiments are conducted on an NVIDIA GeForce RTX 3060.

The Recognition Results of the RadarCL-TR Framework
From Figure 5, the following observations can be made: (1) In the early stage of the training process, the clean and noisy samples are mixed, with their output probabilities both at low values. (2) In the middle stage, as training progresses, the output probabilities of the clean samples become larger and gradually shift to the right of the histograms. (3) In the later stage, the clean and noisy samples can be well separated as training continues. After that, the noisy samples are discarded.
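The cleansing step that follows this detection stage can be sketched as a simple threshold rule on the output probability (α = 0.5 as in the experimental setup; the toy probabilities are ours):

```python
import numpy as np

def cleanse(output_probs, alpha=0.5):
    """Keep samples whose output probability reaches alpha; drop the rest."""
    p = np.asarray(output_probs)
    return np.flatnonzero(p >= alpha), np.flatnonzero(p < alpha)

keep, drop = cleanse([0.9, 0.2, 0.7, 0.4], alpha=0.5)
```

Once the two populations separate as in the later-stage histograms, this single threshold suffices to filter the noisy samples.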

(2) Recognition Performance Compared with Other Methods
Table 1 shows the results of the comparison methods and the proposed methods. Specifically, in Table 1, the results of the proposed method and the highest per-class APA are presented in bold.
In Table 1, it can be clearly seen that the proposed complementary learning methods (i.e., RadarCL-CNN and RadarCL-TR) achieve higher average recognition accuracies than the other CNN-based and TR-based methods, respectively. Furthermore, when compared to the highest accuracies achieved by the CNN-based and TR-based comparison methods (i.e., SCE-CNN and SCE-TR), RadarCL-CNN and RadarCL-TR outperform them by 1.29% and 7.07% in terms of the OA, respectively. Moreover, it is worth noting that the proposed RadarCL-TR method achieves the highest recognition accuracy in terms of both the OA and KC at all ratios of symmetric noise, which illustrates that the TR, with its ability to capture long-range relationships, is more suitable for radar jamming recognition with noisy labels.
Experiments with different numbers of training samples are used to further evaluate the robustness of the proposed methods. As shown in Figure 6, the proposed RadarCL-TR method still outperforms the other advanced methods in terms of the OA with 20 training samples per class at various ratios of symmetric noise.
(3) Feature Visualization
Figure 7 shows the t-distributed stochastic neighbor embedding (t-SNE) visualization using 30 training samples per class with a noise ratio η of 0.1, which aims to illustrate the feature distributions of the proposed methods. Here, different radar jamming types (i.e., C1-C12) correspond to different colors. It can be seen that, compared with the other comparison methods (i.e., (a), (b), and (c) and (e), (f), and (g)), the proposed RadarCL-based methods (i.e., (d) RadarCL-CNN and (h) RadarCL-TR) obtain more discriminative features under the CNN-based and TR-based frameworks, respectively. Obviously, the proposed RadarCL-TR method demonstrates superior recognition results owing to its ability to decrease the intra-class distance and enlarge the inter-class distance.
(4) A Complexity Analysis of the Methods with Noisy Samples
Table 2 shows the complexity analysis of the methods with 30 training samples when the ratio of symmetric noise is 0.2. The results in Table 2 reveal that the proposed RadarCL-based methods require more training time. However, it is worth noting that, despite the longer training time of the RadarCL-based methods, the inference time is comparable to that of the other methods during the
testing process. Additionally, the proposed RadarCL-based methods maintain higher recognition accuracy without requiring excessive computing resources. Furthermore, compared to RadarCL-CNN, the testing time of the proposed RadarCL-TR is only slightly increased (i.e., by 0.24 ms), but the OA is significantly improved (i.e., by 12.71%).

The Recognition Results of the RadarSSL-PL-S-TR Framework
(1) A Sensitivity Analysis for the Parameter γ
The proposed RadarSSL-PL-TR and RadarSSL-PL-S-TR methods include one important parameter in the retraining stage, the weight γ. Figure 8 shows the accuracy for different values of γ: 0.1, 0.3, 0.5, 0.7, 0.9, 1, 2, and 3. It can be seen that the highest accuracy for RadarSSL-PL-TR and RadarSSL-PL-S-TR is achieved when the weight is set to 0.3 and 3, respectively.
It can be observed from Figure 8 that when the value of the weight γ is small, the proposed RadarSSL-PL-TR demonstrates good effectiveness. Conversely, for the proposed RadarSSL-PL-S-TR, higher recognition accuracy corresponds to larger values of γ. The reason for this phenomenon can be described as follows.
In the case of the proposed RadarSSL-PL-TR, due to inherent limitations in recognition technology, the pseudo labels assigned to the unlabeled samples may not always be correct. When these pseudo-labeled samples are incorporated into a new training set with a lower weight during training, some of the negative impact caused by incorrect pseudo labels is mitigated. Thus, the proposed RadarSSL-PL-TR achieves good recognition performance with a lower weight γ.
Conversely, in the proposed RadarSSL-PL-S-TR, the introduction of reliable sample selection reduces the number of samples with erroneous pseudo labels, and the quality of the new training set is significantly enhanced. Therefore, the recognition accuracy is improved with a higher weight γ.
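The role of γ can be sketched as a weighted two-term objective: cross-entropy on labeled samples plus γ times the cross-entropy on pseudo-labeled samples. The exact combination form below is our assumption, not taken from the paper:

```python
import math
import numpy as np

def cross_entropy(probs, labels):
    """Mean negative log-probability of the given labels."""
    probs = np.asarray(probs, dtype=float)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))

def retrain_loss(p_lab, y_lab, p_pl, y_pl, gamma):
    # labeled-sample loss plus gamma-weighted pseudo-labeled-sample loss
    return cross_entropy(p_lab, y_lab) + gamma * cross_entropy(p_pl, y_pl)

loss = retrain_loss([[0.5, 0.5]], [0], [[0.5, 0.5]], [1], gamma=1.0)
```

A small γ downweights potentially wrong pseudo labels (as in RadarSSL-PL-TR), while a large γ lets the cleaner selected pseudo labels contribute fully (as in RadarSSL-PL-S-TR).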
(2) Feature Visualization
Figure 9 illustrates the 2D visualization of the output features using the t-SNE algorithm. As one can observe, the proposed RadarSSL-PL-based methods (i.e., RadarSSL-PL-CNN, RadarSSL-PL-S-CNN, RadarSSL-PL-TR, and RadarSSL-PL-S-TR) show better class separability than the other baselines, which helps boost recognition performance. Note that the proposed RadarSSL-PL-S-TR shows the best clustering, indicating that different types of radar jamming can be effectively distinguished. This is consistent with the observation that the proposed RadarSSL-PL-S-TR achieves the highest recognition accuracy, as shown in Table 3.
(3) Recognition Performance Compared with Other Methods
Table 3 shows the recognition results of the comparison methods and the RadarSSL-PL-based recognition methods obtained using the simulation dataset. In Table 3, the results of the proposed RadarSSL-PL-TR and RadarSSL-PL-S-TR are presented in bold green and bold red, respectively. First, it can be seen that the proposed TR-based methods achieve higher accuracies than the LP-, LapSVM-, and CNN-based methods. In particular, the TR-based methods achieve higher recognition accuracies than the CNN-based methods, which is consistent with the conclusion from the RadarCL-based experiments that the TR shows superior radar jamming recognition ability.
Then, the results of two clustering-based methods (i.e., K-means and AROC) are reported. As shown in Table 3, compared to the AROC-based methods (i.e., AROC-CNN and AROC-TR), the K-means-based methods (i.e., K-CNN and K-TR) achieve lower recognition accuracy. Specifically, AROC-CNN improves the OA by 1.56% compared to K-CNN, while AROC-TR increases the OA by 0.83% compared to K-TR. The superiority of the AROC algorithm for the radar jamming recognition task can be attributed to the following reasons: (1) The AROC algorithm calculates similarity by comparing nearest-neighbor rankings, making it more robust to noise and outliers in the radar jamming signals. (2) The AROC algorithm automatically determines the number of clusters through nearest-neighbor relationships and thresholds, which promotes better clustering of radar jamming signals.
(3) The AROC algorithm can handle non-convex-shaped clusters, which better match the actual distribution of radar jamming signals.
Next, applying the PL-based methods alone (i.e., RadarSSL-PL-CNN and RadarSSL-PL-TR) obtains suboptimal recognition performance within the CNN-based and TR-based frameworks, respectively.
Finally, combining PL with the reliable pseudo-labeling sample selection module, called RadarSSL-PL-S-CNN and RadarSSL-PL-S-TR, further improves recognition performance. Moreover, the proposed RadarSSL-PL-S-TR achieves the best performance compared to the other methods, outperforming RadarSSL-PL-TR by 6.17% and 6.74% in terms of the OA and KC when the number of labeled samples per class is 10.
(4) A Complexity Analysis for the Methods with Labeled and Unlabeled Samples
Table 4 presents a complexity analysis of the methods in the presence of unlabeled samples. From Table 4, the following can be observed. First, compared to the RadarSSL-PL-based methods, the training time of the RadarSSL-PL-S-based methods is increased by approximately two times, while the OA is increased by 2.37% and 6.17% under the CNN and TR frameworks, respectively. Next, the inference time for a single sample and the number of model parameters are the same within the CNN-based and TR-based methods, respectively, because the architectures used for inference are identical. Specifically, for the CNN-based methods, the test time is 0.12 ms and the number of parameters is 0.09 M; for the TR-based methods, the test time is 0.36 ms and the number of parameters is 0.27 M. Finally, compared to the other methods, the training time and computing resources of the proposed RadarSSL-PL-TR are increased, but the recognition performance is significantly improved, while the inference time is only slightly increased.

Conclusions
In this paper, the idea of complementary label learning was first explored for radar jamming recognition with noisy labels. The proposed RadarCL-TR method was found to be effective for radar jamming recognition in the presence of noisy samples. More specifically, by learning from complementary labels, the proposed RadarCL-based methods, including RadarCL-CNN and RadarCL-TR, reduced the negative impact of noisy labels on recognition accuracy and obtained the highest accuracy among the CNN-based and TR-based methods, respectively. Moreover, owing to its superior ability to capture the global dependencies of radar jamming, RadarCL-TR increased the OA by 14.55% and 12.71% compared to RadarCL-CNN at noise ratios of 0.1 and 0.2, respectively.
Next, to address the issue of poor radar jamming recognition performance with few labeled but plenty of unlabeled samples, RadarSSL-PL-TR was investigated. In detail, RadarSSL-PL-TR adopted pseudo labels to utilize unlabeled radar jamming samples, thereby increasing the number of training samples. Simultaneously, long-range dependencies were well captured by the TR. Thus, radar jamming recognition performance was significantly improved under conditions with plenty of unlabeled samples. Furthermore, a selection module was designed to select reliable pseudo-labeled samples, which helped capture more distinctive features. Hence, RadarSSL-PL-S-TR showed an improvement of 6.17% in the OA compared to RadarSSL-PL-TR.
This research has opened a new door for further studies to explore weakly supervised radar jamming processing.

Figure 1
Figure 1 shows an overview of the proposed RadarCL-TR framework, which can be divided into a training process and a testing process.


Figure 1 .
Figure 1. An overview of the proposed RadarCL-TR framework for radar jamming recognition with noisy samples. The training process consists of three main stages: noisy radar jamming sample detection, noisy radar jamming sample cleansing, and radar jamming recognition. (1) In the first stage, "noisy radar jamming sample detection", the "Transformer Architecture" is used to extract features from the noisy training set D n, and noisy samples are detected by learning from complementary labels. (2) In the second stage, "noisy radar jamming sample cleansing", based on the noisy samples detected in the first stage, the noisy samples are filtered out by comparing the output probability of each sample with a set threshold. Thus, only clean radar jamming training samples, whose labels are accurate, are selected as the training set D s. (3) In the third stage, "radar jamming recognition", after cleansing the samples with noisy labels, the "Transformer Architecture" trained in the first stage is used to further extract features from all of the selected clean samples in the training set D s. During the testing process, the "Transformer Architecture" that has undergone two rounds of training (i.e., the one from the third stage) is used as a feature extractor, and the extracted features are used to recognize the tested radar jamming types.

Figure 2 .
Figure 2. The Transformer architecture for extracting features of radar jamming.
and W O ∈ R^(h×d_model×d_v) are learnable weight matrices, where d k = d v = d model/h, and h represents the number of parallel attention heads.

Figure 3 .
Figure 3.An overview of the proposed RadarSSL-PL-S-TR framework for radar jamming recognition with labeled and unlabeled samples.

Then, the unlabeled radar jamming samples with reliable and unreliable labels are added to the labeled radar jamming training set and the pseudo-labeling radar jamming training set, respectively. In this way, the labeled radar jamming training set D l and the pseudo-labeling radar jamming training set D p are updated and redefined as D l = D l + D sp and D p = D p − D sp, respectively. The radar jamming recognition performance is gradually improved by iterating step 3 and step 4.
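The set update in each iteration can be sketched with index sets standing in for the actual sample collections (a minimal illustration):

```python
def update_sets(D_l, D_p, D_sp):
    """Move reliable pseudo-labeled samples D_sp from D_p into D_l."""
    return D_l | D_sp, D_p - D_sp           # D_l = D_l + D_sp, D_p = D_p - D_sp

# toy example: sample 4 is judged reliable and migrates into the labeled set
D_l, D_p = update_sets({1, 2}, {3, 4, 5}, {4})
```

Iterating this migration shrinks the pseudo-labeled pool while steadily growing the trusted labeled set.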

(1) Training Process of RadarCL-TR in the Presence of Label Noise
Generally, CNN and TR exhibit poor generalization capability in the presence of noisy labels. To gain more insight into the proposed CL-based method, taking RadarCL-CNN and RadarCL-TR with 10% label noise as examples, distribution histograms of the output probabilities were created and are demonstrated in Figure 5. Blue and orange in Figure 5 indicate the distribution histograms of the output probability for the clean and noisy samples, respectively.

Figure 5. (a-c) and (d-f) are the distribution histograms of the output probability under RadarCL-CNN and RadarCL-TR during the early, middle, and later stages of the training process, respectively.

Figure 6. Radar jamming recognition accuracy for each class with 20 training samples at various ratios of symmetric noise.

(4) A Complexity Analysis of the Methods with Noisy Samples

Table 2 presents a complexity analysis of the methods with noisy samples.

Figure 8. A parameter sensitivity analysis of the weight in the proposed RadarSSL-PL-TR and RadarSSL-PL-S-TR methods.

(3) The AROC algorithm can handle non-convex-shaped clusters, which better match the actual distribution of radar jamming signals. Next, applying only the PL-based methods (i.e., RadarSSL-PL-CNN and RadarSSL-PL-TR) yields suboptimal recognition performance among the CNN-based and TR-based methods, respectively. Finally, combining PL with a reliable pseudo-labeling sample selection module, yielding RadarSSL-PL-S-CNN and RadarSSL-PL-S-TR, further improves the recognition performance. Moreover, the proposed RadarSSL-PL-S-TR achieves the best performance among all compared methods: it outperforms RadarSSL-PL-TR by 6.17% and 6.74% in terms of the OA and K when the number of samples per class is 10.

(4) A Complexity Analysis for the Methods with Labeled and Unlabeled Samples

Table 4 presents a complexity analysis of the methods in the presence of unlabeled samples.

Table 1. Average recognition accuracies and standard deviations of different methods using 30 training samples per class with various ratios of symmetric noise. The results of the proposed method and the highest accuracy of the APA for each class are presented in bold.

Table 2. A complexity analysis of the methods with noisy samples.

Table 3. Average recognition accuracies and standard deviations of different methods using 10 (i.e., R = 10) labeled training samples per class. The results with optimal and suboptimal accuracy are presented in bold red and bold green, respectively. The highest accuracy of the APA for each class is presented in bold.

Table 4. A complexity analysis for the methods with labeled and unlabeled samples.