Cooperative Electromagnetic Data Annotation via Low-Rank Matrix Completion

Abstract: Electromagnetic data annotation is one of the most important steps in many signal processing applications, e…


Introduction
As radar has been widely used on the battlefield, radar signal reconnaissance plays an important role in electronic warfare (EW). Typically, the first step of a radar reconnaissance system is to annotate the intercepted radar pulses with key parameters, such as pulse width, carrier frequency, pulse repetition interval, and direction of arrival (DOA); this parameter set is also known as the pulse description word (PDW). By analyzing the range and variation characteristics of these parameters, the working mode and behavior of the radar can be recognized. Therefore, accurate annotation is one of the key steps in radar countermeasures [1,2]. However, with the appearance of advanced multi-function radar systems, the electromagnetic environment has become increasingly complex, and annotation is facing unprecedented challenges [3]. First, the electromagnetic spectrum is congested, and the pulse density of radar signals has surged: the pulse density in a typical environment may now exceed millions or even tens of millions of pulses per second. Second, advanced radar transmitters are programmable, networked, and intelligent, which leads to agile and overlapping parameters. The traditional fixed pulse patterns (such as a fixed carrier frequency, a fixed pulse repetition frequency, and unmodulated pulses) tend to be replaced by more complex time-varying patterns in modern radar systems. In addition, to improve anti-reconnaissance and anti-jamming capabilities, more complex inter-pulse modulation patterns are adopted, which makes it hard to annotate the parameters accurately from the interception. Moreover, the strong antagonism between the two sides of this non-cooperative game and the demand for a fast real-time response lead to incomplete and even wrong characteristic parameters in the radar signals obtained by reconnaissance. Therefore, annotating the parameters of radar pulses accurately and stably is crucial for radar countermeasures.
Apart from radar countermeasures, data annotation is also commonly encountered in other fields, e.g., image and text data processing. At present, most annotations still rely on traditional manual methods. Manual annotation is often labor-intensive, tedious, and inefficient, and it suffers from differences in personal experience and a lack of effective information. Heuristic rule-based and pattern matching-based annotation methods are also commonly used in image and text data processing [4][5][6][7]. Heuristic rule-based annotation has low accuracy and generality, and it cannot add semantic annotations to all the extracted data [7]. The pattern matching method annotates data in a complementary manner using pre-established pattern matching relationships [8], but in general the correctness of these relationships is difficult to guarantee. Because of these shortcomings, traditional annotation methods are difficult to adapt to reconnaissance electromagnetic data obtained under non-cooperative, strongly confrontational conditions. Moreover, reconnaissance data obtained by multiple heterogeneous platforms often suffer from poor data quality, a low annotation rate, and a serious lack of annotation information, which obstructs subsequent analysis and processing. Realizing automatic annotation efficiently and accurately is therefore particularly important for radar countermeasures.
In this work, we consider radar reconnaissance data intercepted by multiple reconnaissance platforms; because of interference and noise, each platform may have only partial, incomplete annotations of the radar pulses. Our goal is to use these partial annotations to cooperatively obtain an accurate and complete annotation. To this end, we exploit two key observations: (1) radar reconnaissance data are often inherently correlated in the time-frequency domain; (2) interceptions from multiple platforms are highly correlated since they come from the same targets. Based on these two observations, we expect the data collected from multiple platforms to exhibit a low-rank structure. Low-rank matrix representation is an important data representation that has been widely used in research areas such as robust principal component analysis [8,9] and matrix completion [10][11][12][13]. It can also be combined with sparse optimization for image restoration [14][15][16]. Low-rank matrix recovery can be regarded as a generalization of compressed sensing, that is, it asks how to recover the original matrix from observed data under a low-rank condition [17][18][19]. Based on the theory of low-rank matrix completion and recovery, the redundancy in the data can be exploited to fill in missing elements or correct erroneous annotations. While low-rank matrix completion has been widely used in other fields, e.g., image recovery [20][21][22][23][24] and matrix completion [25][26][27][28][29][30][31][32][33], to the best of our knowledge there is no work applying it to electronic reconnaissance data annotation, especially in radar countermeasure applications. In this work, we first formulate the cooperative annotation problem as a low-rank matrix completion problem and then develop two efficient optimization algorithms: one based on convex relaxation and the other based on non-convex maximum-rank decomposition. Simulations on synthetic and real data demonstrate the efficacy of the proposed methods in comparison with conventional methods.
The outline of this paper is as follows. Section 2 presents the problem formulation. Section 3 proposes a rank-minimization algorithm for annotation completion. Section 4 proposes a maximum-rank-decomposition algorithm. Section 5 compares the two proposed methods numerically with several state-of-the-art algorithms. Finally, Section 6 concludes the paper.

Problem Formulation
Suppose that there are $n_1$ reconnaissance receivers/platforms and $n_2$ emitters/targets, e.g., radars, in the observation area within a certain time range. For each target, there are $n_3$ measured parameters, including time, location (such as longitude, latitude, and height), speed, frequency band, signal intensity, etc. An illustration of the measured parameters is given in Table 1, which records the annotation information of different platforms, where "**" represents a received value of a measured parameter. The data in Table 1 can be written as a matrix $X \in \mathbb{R}^{m_1 \times n_3}$ by arranging the measured parameters in the order of platforms, where

$$ m_1 = n_1 n_2. \tag{1} $$

In general, it is difficult for every platform to collect target information at all times, and the parameters (annotation information) detected by different platforms are not exactly the same because of the heterogeneity of different types of platforms. In addition, different platforms may be in different states, such as "work" or "maintenance", at the same time. All these facts lead to missing characteristic information in Table 1 and in the matrix $X$, as shown in Figure 1, where the small black squares represent missing annotation information. Our goal is to recover the missing elements of $X$ from the partially observed data, i.e., annotation completion. By the definition of $X$, the row vectors of characteristic parameters belonging to the same target should be highly correlated; therefore, the rank of $X$ does not exceed the number of targets $n_2$, i.e., $r = \mathrm{rank}(X) \le n_2$. The matrix $X$ is low-rank if there are enough monitoring platforms and enough categories of characteristic parameters, i.e., $r = \mathrm{rank}(X) \ll \min\{m_1, n_3\}$. Thus, annotation completion can be formulated as a low-rank matrix recovery problem, in which each row or column of the matrix can be expressed as a linear combination of other rows or columns. When the rank of the matrix and the number of known elements satisfy certain conditions, the missing data can be recovered perfectly with high probability by exploiting this redundancy [10,22,23]. Therefore, it is theoretically feasible to use low-rank matrix recovery theory for annotation completion. To put it into context, let $D \in \mathbb{R}^{m_1 \times n_3}$ be the observation matrix of $X$, which contains the known annotation information of $X$. The annotation completion problem based on low-rank matrix recovery can be modeled as

$$ \min_{X} \; \|X - D\|_0 \quad \text{s.t.} \quad \mathrm{rank}(X) \le r, \tag{2} $$

where $\|X - D\|_0$ is the $\ell_0$-norm of $X - D$, i.e., the number of non-zero elements in $X - D$. This is a difficult non-convex optimization problem because both the $\ell_0$-norm objective and the rank constraint are non-convex, so its global optimum is hard to obtain. To address this, a min-rank-based convex approximation algorithm and a max-rank-decomposition-based non-convex algorithm are employed to find approximate solutions of problem (2). The frequently used notations are summarized in Table 2.
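To make the row-stacking concrete, the sketch below builds $X$ with one row per (platform, target) pair under a hypothetical model in which every platform sees the same per-target feature vector up to a platform-dependent gain. The model, the gains, and the function name `build_annotation_matrix` are illustrative assumptions, not the simulator used in the experiments; the point is only that rows of the same target span at most $n_2$ directions, so $\mathrm{rank}(X) \le n_2$.

```python
import numpy as np


def build_annotation_matrix(target_features, platform_gains):
    """Stack one row per (platform, target) pair, platform by platform.

    target_features: (n2, n3) array, one feature vector per target (assumed model).
    platform_gains:  (n1,) array of hypothetical per-platform scalings.
    Returns X of shape (n1 * n2, n3), i.e. m1 = n1 * n2 rows.
    """
    blocks = [g * target_features for g in platform_gains]
    return np.vstack(blocks)


rng = np.random.default_rng(0)
n1, n2, n3 = 10, 10, 100
F = rng.standard_normal((n2, n3))          # per-target feature vectors
g = 1.0 + 0.1 * rng.standard_normal(n1)    # per-platform gains
X = build_annotation_matrix(F, g)

# Observation D: keep about half of the entries; `observed` marks the index set Omega.
observed = rng.random(X.shape) < 0.5
D = np.where(observed, X, 0.0)

print(X.shape, np.linalg.matrix_rank(X))  # (100, 100) 10
```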
Table 2. The notation of symbols.

The Rank-Minimization-Based Convex Approximation Algorithm
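In this section, a rank-minimization-based convex algorithm is proposed to solve problem (2). First, let $\Omega \subseteq \{1, 2, \ldots, m_1\} \times \{1, 2, \ldots, n_3\}$ denote the set of indices associated with the known annotations in $X$. Define the linear projection operator $\mathcal{P}_\Omega : \mathbb{R}^{m_1 \times n_3} \to \mathbb{R}^{m_1 \times n_3}$ as

$$ [\mathcal{P}_\Omega(D)]_{i,j} = \begin{cases} D_{i,j}, & (i,j) \in \Omega, \\ 0, & \text{otherwise}, \end{cases} \tag{3} $$

where $D_{i,j}$ represents the element in the $i$-th row and $j$-th column of $D \in \mathbb{R}^{m_1 \times n_3}$. Then, problem (2) can be recast as the following matrix rank minimization problem:

$$ \min_{X} \; \mathrm{rank}(X) \quad \text{s.t.} \quad \mathcal{P}_\Omega(X) = \mathcal{P}_\Omega(D), \tag{4} $$

where $\mathrm{rank}(\cdot)$ is the rank function. Problem (4) is still non-convex, so we consider its convex relaxation. $\mathrm{rank}(X)$ is the number of non-zero singular values of $X$, i.e., the $\ell_0$-norm of its singular value vector. Since the $\ell_0$-norm is non-convex, the $\ell_1$-norm is used as its convex approximation, which gives rise to the nuclear norm of $X$ (the sum of its singular values) as the convex approximation of $\mathrm{rank}(X)$. By introducing the matrix slack variable $E \in \mathbb{R}^{m_1 \times n_3}$, problem (4) can be approximated by the convex problem

$$ \min_{X, E} \; \|X\|_* \quad \text{s.t.} \quad X + E = D, \;\; \mathcal{P}_\Omega(E) = 0, \tag{5} $$

where $\|X\|_*$ is the nuclear norm of $X$. To solve problem (5), we employ the alternating direction method of multipliers (ADMM). Specifically, the augmented Lagrangian function is

$$ L_c(X, E, \Lambda) = \|X\|_* + \mathrm{Tr}\{\Lambda^T (D - X - E)\} + \frac{c}{2} \|D - X - E\|_F^2, \tag{6} $$

where $c > 0$ is the penalty factor, $\Lambda \in \mathbb{R}^{m_1 \times n_3}$ is the Lagrangian multiplier matrix, $\mathrm{Tr}\{\cdot\}$ is the trace of a matrix, and $\|\cdot\|_F$ is the Frobenius norm. Then, problem (5) can be solved by alternately updating $X$, $E$, and $\Lambda$ as follows:

$$ \begin{aligned} X_{k+1} &= \arg\min_{X} L_c(X, E_k, \Lambda_k), \\ E_{k+1} &= \arg\min_{E:\, \mathcal{P}_\Omega(E) = 0} L_c(X_{k+1}, E, \Lambda_k), \\ \Lambda_{k+1} &= \Lambda_k + c\,(D - X_{k+1} - E_{k+1}). \end{aligned} \tag{7} $$

The updates in (7) are detailed in the following.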
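A direct NumPy transcription of the projection operator and of the augmented Lagrangian of problem (5) can serve as a reference when checking an implementation. In this sketch, a Boolean mask `observed` (True for indices in $\Omega$) stands in for the index set, the function names are illustrative, and the multiplier term is assumed to enter as $+\mathrm{Tr}\{\Lambda^T(D - X - E)\}$.

```python
import numpy as np


def P_omega(M, observed):
    """Projection onto Omega: keep entries where `observed` is True, zero the rest."""
    return np.where(observed, M, 0.0)


def nuclear_norm(M):
    """Sum of the singular values of M."""
    return np.linalg.svd(M, compute_uv=False).sum()


def aug_lagrangian(X, E, Lam, D, c):
    """L_c(X, E, Lambda) = ||X||_* + Tr{Lambda^T R} + (c/2)||R||_F^2 with R = D - X - E."""
    R = D - X - E
    trace_term = np.sum(Lam * R)  # equals Tr{Lambda^T R}
    return nuclear_norm(X) + trace_term + 0.5 * c * np.linalg.norm(R, "fro") ** 2
```

When the constraint $X + E = D$ holds, the last two terms vanish and $L_c$ reduces to $\|X\|_*$, which is a quick sanity check for any implementation.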

Updating X
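The update of $X \in \mathbb{R}^{m_1 \times n_3}$ is obtained by solving the following problem:

$$ \min_{X} \; \|X\|_* + \mathrm{Tr}\{\Lambda_k^T (D - X - E_k)\} + \frac{c}{2} \|D - X - E_k\|_F^2. \tag{8} $$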
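To solve (8), an auxiliary matrix $A_k \in \mathbb{R}^{m_1 \times n_3}$ is introduced, defined as

$$ A_k = D - E_k + c^{-1} \Lambda_k, \tag{9} $$

so that, after completing the square, (8) is equivalent to $\min_X \frac{1}{c}\|X\|_* + \frac{1}{2}\|X - A_k\|_F^2$. Let the singular value decomposition of $A_k$ be

$$ A_k = U_k \Sigma_k V_k^T, \tag{10} $$

where $U_k \in \mathbb{R}^{m_1 \times m_1}$ and $V_k \in \mathbb{R}^{n_3 \times n_3}$ are the left and right singular matrices, respectively, and $\Sigma_k \in \mathbb{R}^{m_1 \times n_3}$ is the rectangular diagonal matrix

$$ \Sigma_k = \mathrm{diag}(\sigma_1, \ldots, \sigma_p), \quad \sigma_1 \ge \cdots \ge \sigma_p \ge 0, \;\; p = \min\{m_1, n_3\}. \tag{11} $$

Then, the optimal solution of problem (8) is given by singular value thresholding [28]:

$$ X_{k+1} = U_k \, \mathrm{diag}\big(\max\{\sigma_1 - c^{-1}, 0\}, \ldots, \max\{\sigma_p - c^{-1}, 0\}\big) \, V_k^T. \tag{12} $$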
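The $X$-update above is the singular value thresholding (SVT) operator applied to $A_k$ with threshold $c^{-1}$. A minimal NumPy sketch follows; it uses the thin SVD (`full_matrices=False`), which gives the same result as the full SVD of $A_k$ because the extra singular vectors are multiplied by zeros. The names `svt` and `update_X` are illustrative.

```python
import numpy as np


def svt(A, tau):
    """Singular value thresholding: argmin_X tau*||X||_* + 0.5*||X - A||_F^2."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # soft-threshold each singular value
    return (U * s_shrunk) @ Vt            # U diag(s_shrunk) V^T via broadcasting


def update_X(D, E, Lam, c):
    """X_{k+1} = SVT of A_k = D - E_k + Lambda_k / c with threshold 1/c."""
    A = D - E + Lam / c
    return svt(A, 1.0 / c)
```

For example, `svt(np.diag([5.0, 2.0, 0.5]), 1.0)` shrinks the diagonal to `[4, 1, 0]`, so small singular values (and hence rank) are removed.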

Updating E
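The update of $E \in \mathbb{R}^{m_1 \times n_3}$ is obtained by solving

$$ \min_{E} \; \mathrm{Tr}\{\Lambda_k^T (D - X_{k+1} - E)\} + \frac{c}{2} \|D - X_{k+1} - E\|_F^2 \quad \text{s.t.} \quad \mathcal{P}_\Omega(E) = 0. \tag{13} $$

Up to a constant, the objective equals $\frac{c}{2}\|E - (D - X_{k+1} + c^{-1}\Lambda_k)\|_F^2$, so the optimal solution $E_{k+1}$ of problem (13) equals $D - X_{k+1} + c^{-1}\Lambda_k$ for elements not in the set $\Omega$ and zero for elements in $\Omega$. Thus,

$$ E_{k+1} = \mathcal{P}_{\bar{\Omega}}\big(D - X_{k+1} + c^{-1} \Lambda_k\big), \tag{14} $$

where $\bar{\Omega}$ is the complement of $\Omega$ and $\mathcal{P}_{\bar{\Omega}}$ is defined analogously to (3). The whole procedure for solving problem (5) is summarized in Algorithm 1.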
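The $E$-update is an unconstrained minimizer followed by zeroing the entries in $\Omega$. A minimal sketch, again with a Boolean mask `observed` standing in for $\Omega$ (an illustrative choice):

```python
import numpy as np


def update_E(D, X_next, Lam, c, observed):
    """E_{k+1}: D - X_{k+1} + Lambda_k / c outside Omega, zero inside Omega."""
    E = D - X_next + Lam / c   # unconstrained minimizer of the quadratic
    E[observed] = 0.0          # enforce P_Omega(E) = 0
    return E
```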

Algorithm 1 The rank-minimization-based algorithm
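Input: observation matrix $D$, index set $\Omega$, penalty factor $c > 0$.
Initialize: $E_0$, $\Lambda_0$, $k = 0$.
Repeat
 1: Compute $A_k = D - E_k + c^{-1}\Lambda_k$ and its SVD $A_k = U_k \Sigma_k V_k^T$;
 2: Update $X_{k+1}$ by singular value thresholding of $A_k$ with threshold $c^{-1}$;
 3: Update $E_{k+1}$ as $D - X_{k+1} + c^{-1}\Lambda_k$ outside $\Omega$ and zero on $\Omega$;
 4: Update $\Lambda_{k+1} = \Lambda_k + c(D - X_{k+1} - E_{k+1})$;
 5: $k \leftarrow k + 1$;
Until some stopping criteria are satisfied;
Output: $X_k$.

From Algorithm 1, we see that the computational cost is dominated by the update of $X$ because of the singular value decomposition of $A_k$. The total computational complexity of Algorithm 1 is on the order of O(max{m_1, …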

The Maximum-Rank-Decomposition-Based Non-Convex Algorithm
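In this section, we consider an alternative way to tackle the annotation completion problem (2) from the perspective of maximum-rank decomposition. Specifically, the maximum-rank (full-rank) decomposition of $X \in \mathbb{R}^{m_1 \times n_3}$ with $\mathrm{rank}(X) = m_2$ is given by

$$ X = UV, \tag{15} $$

where $U \in \mathbb{R}^{m_1 \times m_2}$ has full column rank and $V \in \mathbb{R}^{m_2 \times n_3}$ has full row rank. Substituting (15) into problem (2) and replacing the $\ell_0$-norm with the $\ell_1$-norm yields

$$ \min_{X, U, V} \; \|X - D\|_1 \quad \text{s.t.} \quad X = UV. \tag{16} $$

As before, we employ the ADMM approach to handle problem (16). Specifically, the augmented Lagrangian function of (16) is

$$ L_c(X, U, V, \Phi) = \|X - D\|_1 + \mathrm{Tr}\{\Phi^T (X - UV)\} + \frac{c}{2} \|X - UV\|_F^2, \tag{17} $$

where $\Phi \in \mathbb{R}^{m_1 \times n_3}$ is the Lagrangian multiplier matrix and $c > 0$ is the penalty factor. The ADMM algorithm repeats the following updates until the stopping criteria are satisfied:

$$ \begin{aligned} X_{k+1} &= \arg\min_X L_c(X, U_k, V_k, \Phi_k), \\ U_{k+1} &= \arg\min_U L_c(X_{k+1}, U, V_k, \Phi_k), \\ V_{k+1} &= \arg\min_V L_c(X_{k+1}, U_{k+1}, V, \Phi_k), \\ \Phi_{k+1} &= \Phi_k + c\,(X_{k+1} - U_{k+1} V_{k+1}). \end{aligned} \tag{18} $$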
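For intuition about the maximum-rank decomposition $X = UV$, one standard way to obtain such factors from a known matrix is through its truncated SVD, splitting the singular values evenly between the two factors. This is only an illustration of the decomposition itself (the function name and the relative tolerance are assumptions); the ADMM iterations estimate $U$ and $V$ from incomplete data instead.

```python
import numpy as np


def full_rank_factors(X, tol=1e-10):
    """Factor X = U V with U (m1 x m2) full column rank and V (m2 x n3) full row rank."""
    Us, s, Vt = np.linalg.svd(X, full_matrices=False)
    m2 = int(np.sum(s > tol * s[0]))       # numerical rank relative to the largest singular value
    root = np.sqrt(s[:m2])
    U = Us[:, :m2] * root                  # m1 x m2
    V = root[:, None] * Vt[:m2, :]         # m2 x n3
    return U, V
```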

Updating X
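The update of $X$ is obtained by solving

$$ \min_{X} \; \|X - D\|_1 + \mathrm{Tr}\{\Phi_k^T (X - U_k V_k)\} + \frac{c}{2} \|X - U_k V_k\|_F^2. \tag{19} $$

By the first-order optimality condition, we have

$$ 0 \in \partial \|X - D\|_1 + \Phi_k + c\,(X - U_k V_k), \tag{20} $$

where $\partial \|X - D\|_1$ represents the subdifferential of $\|X - D\|_1$, which is characterized by a vector $e \in \mathbb{R}^{m_1 \times 1}$ with $\|e\|_1 \le 1$. Solving (20) for $X$ then gives the closed-form update $X_{k+1}$.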
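If $\|\cdot\|_1$ is read as the entrywise $\ell_1$-norm, the $X$-subproblem separates over entries: with $B = U_k V_k - c^{-1}\Phi_k$, it becomes $\min_X \|X - D\|_1 + \frac{c}{2}\|X - B\|_F^2$ up to a constant, whose solution is $X = D + \mathcal{S}_{1/c}(B - D)$ with entrywise soft-thresholding $\mathcal{S}$. The sketch below follows that reading and additionally restricts the $\ell_1$ term to the observed set $\Omega$ (on unobserved entries $X = B$), an assumption made so that missing entries stored as zeros in $D$ do not pull $X$ toward zero. It is not necessarily the closed form used by the authors.

```python
import numpy as np


def soft_threshold(M, tau):
    """Entrywise shrinkage: sign(M) * max(|M| - tau, 0)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)


def update_X_l1(D, U, V, Phi, c, observed):
    """argmin_X ||P_Omega(X - D)||_1 + Tr{Phi^T (X - U V)} + (c/2)||X - U V||_F^2."""
    B = U @ V - Phi / c                      # centre of the quadratic term
    X = B.copy()                             # unobserved entries: no l1 term, X = B
    X[observed] = D[observed] + soft_threshold(B[observed] - D[observed], 1.0 / c)
    return X
```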

Updating U
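The update of $U \in \mathbb{R}^{m_1 \times m_2}$ is obtained by solving

$$ \min_{U} \; \mathrm{Tr}\{\Phi_k^T (X_{k+1} - U V_k)\} + \frac{c}{2} \|X_{k+1} - U V_k\|_F^2. \tag{23} $$

Since problem (23) is an unconstrained quadratic program, its optimal solution follows from the first-order optimality condition $-\Phi_k V_k^T - c\,(X_{k+1} - U V_k) V_k^T = 0$, which gives

$$ U_{k+1} = \big(X_{k+1} + c^{-1} \Phi_k\big) V_k^T \big(V_k V_k^T\big)^{-1}. \tag{24} $$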
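Numerically, the $U$-update is better computed with a linear solve than with an explicit inverse: since $V_k V_k^T$ is symmetric, $U_{k+1}$ satisfies $(V_k V_k^T)\,U_{k+1}^T = V_k (X_{k+1} + c^{-1}\Phi_k)^T$. A minimal sketch (the function name is illustrative; it assumes $V_k$ has full row rank so that the $m_2 \times m_2$ system is nonsingular):

```python
import numpy as np


def update_U(X_next, V, Phi, c):
    """U_{k+1} = (X_{k+1} + Phi_k / c) V_k^T (V_k V_k^T)^{-1}, computed by a linear solve."""
    M = X_next + Phi / c
    # (V V^T) U^T = V M^T, an m2 x m2 system
    return np.linalg.solve(V @ V.T, V @ M.T).T
```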

Updating V
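The update of $V \in \mathbb{R}^{m_2 \times n_3}$ is obtained by solving

$$ \min_{V} \; \mathrm{Tr}\{\Phi_k^T (X_{k+1} - U_{k+1} V)\} + \frac{c}{2} \|X_{k+1} - U_{k+1} V\|_F^2. \tag{25} $$

Similar to problem (23), its optimal solution is

$$ V_{k+1} = \big(U_{k+1}^T U_{k+1}\big)^{-1} U_{k+1}^T \big(X_{k+1} + c^{-1} \Phi_k\big). \tag{26} $$

The whole procedure of the ADMM algorithm for problem (16) is summarized in Algorithm 2.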
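The loop below strings the $X$, $U$, $V$, and $\Phi$ updates together as a sketch of the maximum-rank-decomposition ADMM. Several choices are assumptions not fixed by the text: the $\ell_1$ term acts only on $\Omega$, $U$ and $V$ are initialized from the truncated SVD of the zero-filled observations, the penalty $c$ and the iteration count are fixed, and `np.linalg.lstsq` replaces the explicit inverses (it coincides with the closed forms when $U$, $V$ have full rank and stays defined otherwise). Convergence is not guaranteed for this non-convex problem.

```python
import numpy as np


def soft_threshold(M, tau):
    """Entrywise shrinkage sign(M) * max(|M| - tau, 0)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)


def max_rank_completion(D, observed, m2, c=1.0, n_iter=300):
    """ADMM sketch for min ||P_Omega(X - D)||_1 s.t. X = U V (U: m1 x m2, V: m2 x n3)."""
    D = np.where(observed, D, 0.0)              # missing entries stored as 0
    Us, s, Vt = np.linalg.svd(D, full_matrices=False)
    U = Us[:, :m2] * np.sqrt(s[:m2])            # assumed initial U (m1 x m2)
    V = np.sqrt(s[:m2])[:, None] * Vt[:m2, :]   # assumed initial V (m2 x n3)
    Phi = np.zeros_like(D)
    X = D.copy()
    for _ in range(n_iter):
        # X-update: soft-threshold toward D on Omega, copy of B elsewhere
        B = U @ V - Phi / c
        X = B.copy()
        X[observed] = D[observed] + soft_threshold(B[observed] - D[observed], 1.0 / c)
        M = X + Phi / c
        # U-update: least-squares fit of U V_k to M
        U = np.linalg.lstsq(V.T, M.T, rcond=None)[0].T
        # V-update: least-squares fit of U_{k+1} V to M
        V = np.linalg.lstsq(U, M, rcond=None)[0]
        # multiplier update
        Phi = Phi + c * (X - U @ V)
    return X, U, V


# Usage: hide 40% of a random rank-2 matrix and measure the recovery error of U V.
rng = np.random.default_rng(2)
L = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
obs = rng.random(L.shape) < 0.6
X, U, V = max_rank_completion(L, obs, m2=2)
rel_err = np.linalg.norm(U @ V - L) / np.linalg.norm(L)
```

Because $m_2$ is an explicit input, the rank must be supplied or searched over; the matrix sizes involved in each update are only $m_1 \times m_2$ and $m_2 \times n_3$.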
The computational complexity of Algorithm 2 is determined by its update steps. Note that $U_k$ is of size $m_1 \times m_2$ and $V_k$ is of size $m_2 \times n_3$, and, according to the low-rank assumption, $m_2 \ll m_1$ and $m_2 \ll n_3$. The computational complexity of updating $X_k$, $U_k$, and $V_k$ is on the order of $O(m_1 m_2 n_3)$, since the $m_2 \times m_2$ matrix inversions in the $U$- and $V$-updates cost only $O(m_2^3)$. Hence, the non-convex algorithm (Algorithm 2) has a lower per-iteration complexity than the convex algorithm (Algorithm 1).
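As a rough illustration with the sizes used in the synthetic experiments ($m_1 = n_3 = 100$, $m_2 = 10$), and assuming the standard $O(pq\min\{p, q\})$ operation count for a dense SVD of a $p \times q$ matrix, the per-iteration costs compare as follows:

```latex
\begin{aligned}
\text{Algorithm 2:}\quad & m_1 m_2 n_3 = 100 \cdot 10 \cdot 100 = 10^{5},\\
\text{SVD of } A_k \text{ in Algorithm 1:}\quad & m_1 n_3 \min\{m_1, n_3\} = 100 \cdot 100 \cdot 100 = 10^{6}.
\end{aligned}
```

Constants are ignored, so this only indicates roughly an order-of-magnitude gap per iteration at these sizes.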
Note that both proposed methods are designed to recover real-valued feature parameters: the parameter values and all auxiliary variables in the algorithms are real. Therefore, the methods cannot be applied directly to complex-valued parameters.

Numerical Experiments and Discussion
In this section, the performance of the two proposed methods is tested on synthetic and real data, and a comparison with three other methods is also given. To evaluate performance, the mean squared error (MSE) is adopted as the performance metric, defined as

$$ \mathrm{MSE} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \big(X_{i,j} - \hat{X}_{i,j}\big)^2, \tag{27} $$

where $X$ is the original matrix of size $m \times n$, $\hat{X}$ is the recovered matrix, $i = 1, 2, \ldots, m$, and $j = 1, 2, \ldots, n$.
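The MSE metric is a plain mean over all entries; a minimal helper (the name `mse` is illustrative) is:

```python
import numpy as np


def mse(X_true, X_hat):
    """MSE = (1 / (m n)) * sum_ij (X_ij - Xhat_ij)^2."""
    X_true = np.asarray(X_true, dtype=float)
    X_hat = np.asarray(X_hat, dtype=float)
    return float(np.mean((X_true - X_hat) ** 2))


# The three first-row values quoted below for X (1.1639, 1.2384, 1.0438) and for Algorithm 2
# (1.1643, 1.1978, 1.0437) give an MSE of about 5.5e-4.
example = mse([1.1639, 1.2384, 1.0438], [1.1643, 1.1978, 1.0437])
```

Applied to the first-row entries reported in the synthetic test, this gives roughly $5.5 \times 10^{-4}$, consistent with the stated bound of $10^{-3}$.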

Synthetic Data Test of Proposed Methods
The synthetic data are generated by a radar target simulator with 10 platforms and 10 targets over $t = (t_1, \ldots, t_{10})$; for each target, 10 features are used, and each feature is normalized. This forms the original data matrix $X$ of size $100 \times 100$ with rank $r = 10$. To test the performance of the proposed methods under different missing ratios, the observation matrix $D$ is obtained by randomly dropping elements of each row of $X$ at different ratios and setting them as empty. Part of the elements of $X$ are shown in Table 3, and part of the observation matrix $D$ with 50% of the annotations of $X$ randomly removed is shown in Table 4.
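The row-wise dropping protocol can be sketched as follows. Since the radar target simulator is not available, a random rank-10 product of Gaussian matrices is used here as an assumed stand-in for $X$, and empty annotations are marked with NaN; the function name `drop_per_row` is hypothetical.

```python
import numpy as np


def drop_per_row(X, missing_ratio, rng):
    """Remove round(missing_ratio * n) randomly chosen entries from every row of X.

    Returns (D, observed): D holds NaN for empty annotations, `observed` marks Omega.
    """
    m, n = X.shape
    n_drop = int(round(missing_ratio * n))
    observed = np.ones((m, n), dtype=bool)
    for i in range(m):
        observed[i, rng.choice(n, size=n_drop, replace=False)] = False
    D = np.where(observed, X, np.nan)
    return D, observed


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10)) @ rng.standard_normal((10, 100))  # rank-10 stand-in
D, observed = drop_per_row(X, 0.5, rng)
```

Dropping a fixed number of entries per row, rather than sampling entries independently, guarantees that every row keeps the same amount of information at a given missing ratio.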
Tables 5 and 6 give the annotations completed by Algorithms 1 and 2, respectively. The missing elements are recovered after matrix completion, and a comparison with the original matrix $X$ shows that the proposed methods recover $X$ efficiently. Take the first row of $X$ as an example: the fourth, fifth, and sixth elements in Table 5 are recovered by Algorithm 1 with values 1.1639, 1.2384, and 1.0438, which are exactly the same as those in $X$; i.e., they are perfectly recovered. Meanwhile, the corresponding values recovered by Algorithm 2 in Table 6 are 1.1643, 1.1978, and 1.0437, with MSE ≤ $1 \times 10^{-3}$, which suggests that the proposed methods can fill in the missing annotations efficiently. Figure 2 shows the MSE of the two proposed methods under different missing ratios. The MSE decreases as the missing ratio decreases, which suggests that both proposed methods can efficiently recover missing elements or correct wrong elements in $D$. Comparing the two methods, Algorithm 1 has a lower MSE when the missing ratio is below 0.7; the main reason is that the completion by maximum-rank decomposition in Algorithm 2 introduces additional error. In the discussion above, we have assumed $\mathrm{rank}(X) = 10$ as prior knowledge. In practice, the rank of $X$ is generally unknown and needs to be estimated jointly. The rank minimization in Algorithm 1 cannot estimate the rank of $D$ directly, whereas Algorithm 2 can predict the rank directly through the maximum-rank decomposition of $D$. Figure 3 compares the rank estimated by Algorithm 2 with the true rank of $X$; the estimated rank is consistent with the true rank. In fact, we find that when the missing ratio is ≤ 50%, the curve of rank setting vs. estimated rank is consistent with the curve in Figure 3, mainly because fewer missing records yield better recovery. When the missing ratio exceeds 50%, the estimated rank becomes unstable and inconsistent with the rank setting, mainly because more missing records can lead to rank variation.

Real Data Test of Proposed Methods
Apart from the synthetic data test, we verify the performance of the proposed methods with real data, namely PDW records from real radars. For the real data test, the missing ratio is about 30%. The missing information is set as empty; moreover, certain errors are added to verify the error correction capability of the proposed methods. Part of the real data $X$ and the observation data $D$ are illustrated in Tables 7 and 8, respectively, and the annotations completed by Algorithms 1 and 2 are given in Tables 9 and 10, respectively. From the two tables, both methods fill in the missing annotations accurately. Specifically, for the carrier frequency annotation in the first column, the MSE is ≤ $1 \times 10^{-3}$; for the pulse width annotation in the second column, the MSE is about $1 \times 10^{-2}$; for the amplitude annotation in the fourth column, the error is about $1 \times 10^{-2}$; and for the AOA parameter in the last column, the error is about $1 \times 10^{-3}$. The correction of wrong PDW records by Algorithms 1 and 2 is also validated. In the real data $X$ in Table 7, the underlined PW and AOA records of Target "19" for platform "1" are "0.2200" and "0.3533", which are wrong and totally different from the records of the other platforms. From Table 9, Algorithm 1 corrects the PW and AOA to "1.7863" and "461.7466", which are close to the records of platform "2". The results of Algorithm 2 are consistent with those of Algorithm 1, which suggests that the proposed methods can correct wrong records efficiently.
In addition, the run times of Algorithms 1 and 2 under different missing ratios are compared in Figure 4. The run time of Algorithm 2 is stable across missing ratios and much lower than that of Algorithm 1 when the missing ratio exceeds 0.3, which is consistent with the complexity analysis at the end of Section 4.3. Finally, Figure 5 shows the number of iterations of Algorithms 1 and 2 under different missing ratios: Algorithm 2 needs fewer iterations than Algorithm 1, and its iteration count is stable across missing ratios, which is consistent with the running time and complexity analysis.

Comparison Test
In this section, the proposed methods are compared with three state-of-the-art methods for electromagnetic data annotation completion. The three compared methods are:
1. The K-nearest neighbor method (KNN) in [32], which predicts a missing annotation from its K nearest neighbors;
2. The augmented Lagrange multiplier method for low-rank matrix recovery (ALM) in [27], where annotation completion is formulated as a convex optimization model solved by the ALM algorithm;
3. The nuclear norm regularized method for annotation completion (NNLS) in [28], where annotation completion is formulated as an optimization model solved by the accelerated proximal gradient algorithm.
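For reference, a generic row-based KNN imputer is sketched below: each missing entry is filled with the mean of that column over the K nearest rows, with distances computed on commonly observed columns. This is a common textbook variant chosen for illustration; the distance and averaging used in [32] may differ, and `knn_impute` is a hypothetical name.

```python
import numpy as np


def knn_impute(D, observed, k=3):
    """Fill each missing entry with the column mean over the k nearest rows.

    Distance between two rows is the RMS difference over columns observed in both.
    """
    D = np.where(observed, D, 0.0)
    m = D.shape[0]
    X = D.copy()
    for i in range(m):
        missing = ~observed[i]
        if not missing.any():
            continue
        dists = np.full(m, np.inf)
        for j in range(m):
            common = observed[i] & observed[j]
            if j != i and common.any():
                dists[j] = np.sqrt(np.mean((D[i, common] - D[j, common]) ** 2))
        order = np.argsort(dists)                 # nearest rows first, inf last
        for col in np.flatnonzero(missing):
            donors = [j for j in order if observed[j, col] and np.isfinite(dists[j])][:k]
            if donors:
                X[i, col] = D[donors, col].mean()
    return X
```

Unlike the low-rank methods, this baseline uses only local row similarity and no global structure, which is the contrast drawn in the comparison.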
For the comparison test, synthetic data are used, generated by the radar target simulator with 10 platforms and 10 targets over $t = [t_1, \ldots, t_{10}]$, with 10 features for each target; this forms the original data matrix $X$ of size $100 \times 100$ with rank $r = 10$.
Next, the performance of the proposed and compared methods is discussed. The MSE of the five methods under different missing ratios is shown in Figure 6. The MSE roughly increases with the missing ratio for all methods. The MSEs of the proposed Algorithms 1 and 2 are roughly the same and much lower than those of the KNN, ALM, and NNLS methods, which demonstrates the superior recovery performance of the proposed ADMM algorithms. Comparing KNN with Algorithms 1 and 2 shows that exploiting the low-rank structure recovers missing annotations efficiently. In addition, the average MSE of the five compared methods is presented in Table 11; for each missing ratio, the feature parameters are dropped randomly ten times to compute the average MSE of each method. Finally, the running times of the different methods are given in Figure 7. The proposed Algorithm 1 is more time-consuming than the other compared methods because of the SVD, and its running time increases considerably with the missing ratio. The KNN method has the lowest running time because of its low computational load. The running times of the NNLS and ALM methods are lower than those of the proposed Algorithms 1 and 2; the main reason is that the SVD in the proposed algorithms is time-consuming.

Figure 2. The MSE of two proposed methods under different missing ratios.

Figure 3. The rank setting vs. estimated rank of Algorithm 2.

Figure 4. The running time comparison of Algorithms 1 and 2 under different missing ratios.

Figure 5. The iteration number comparison of Algorithms 1 and 2 under different missing ratios.

Figure 6. The MSE comparison for different methods under different missing ratios.

Figure 7. The running time comparison for different methods under different missing ratios.

Table 1. An illustration of annotation information of electronic reconnaissance data.

Table 3. The original annotated matrix X.

Table 4. The partially annotated matrix D.

Table 7. Real data X.

Table 8. Recorded real data D with missing annotations.

Table 11. The average MSE of five compared methods.