Clustered Multi-Task Learning for Automatic Radar Target Recognition

Model training is a key technique for radar target recognition. Traditional model training algorithms in the framework of single-task learning ignore the relationships among multiple tasks, which degrades recognition performance. In this paper, we propose a clustered multi-task learning method, which can reveal and share the multi-task relationships for radar target recognition. To make full use of these relationships, the latent multi-task relationships in the projection space are also taken into consideration. Specifically, a constraint term in the projection space is proposed, the main idea of which is that multiple tasks within a close cluster should be close to each other in the projection space. In the proposed method, the cluster structures and multi-task relationships can be autonomously learned and utilized in both the original and the projected space. In view of the nonlinear characteristics of radar targets, the proposed method is extended to a nonlinear kernel version and the corresponding nonlinear multi-task solving method is proposed. Comprehensive experimental studies on a simulated high-resolution range profile dataset and the MSTAR SAR public database verify the superiority of the proposed method over several related algorithms.


Introduction
Automatic target recognition (ATR) systems are used to identify one or a group of target objects in a scene. ATR systems detect and classify targets using various image and signal processing techniques. Because radar sensors operate in all weather, 24 hours a day, and are robust to environmental conditions, automatic target recognition based on radar images has drawn much attention from researchers. Usually, radar images can be divided into one-dimensional high-resolution range profile (HRRP) images and two-dimensional images, such as synthetic aperture radar (SAR) images. In recent years, radar images have been intensively studied for ATR in civilian and military fields [1][2][3][4][5][6][7]. Nevertheless, HRRPs and SAR images are sensitive to pose variation and speckle noise. How to recognize specified radar targets still requires further study and exploration.
Radar target recognition generally consists of three main separate stages: detection, discrimination, and classification. The first stage aims to approximately determine the location of targets by using the amplitude information of radar signals. The second stage excludes the interference of background clutter. The last stage predicts the category of targets using classifiers. In this paper, classifier design is emphasized. Many classical classifiers have been implemented for radar target classification, including k-nearest neighbors (KNN), support vector machine (SVM) [8], AdaBoost [9] and so on. Zhou incorporated the reconstructive power and discriminative power of dictionary atoms for radar target recognition [2]. A complementary spatial pyramid coding (CSPC) approach for radar target recognition has also been studied. Among the contributions of this paper, the APG method is used for solving the non-linear extension of multi-task learning, which guarantees convergence and can be implemented with parallel computing.
The rest of the paper is organized as follows: in Section 2, the clustered multi-task learning for radar target recognition is proposed. The experimental results and analysis are provided in Section 3, and the paper is concluded in Section 4.

Preliminaries
For radar target classification, the model learning for m target categories can be considered as m tasks. During the training phase, a set of training samples {(x_j^i, y_j^i)}_{j=1}^{n_i} is given for model learning, where x_j^i ∈ R^d is a d-dimensional training sample, y_j^i ∈ {0, 1} is the label, and n_i is the number of samples in the ith task. In this paper, the goal is to learn a nonlinear predictive function

f_i(x_j^i) = w_i^T φ(x_j^i) + b_i,

where φ(x_j^i) is the nonlinear mapping of sample x_j^i, w_i is the model parameter and b_i is the offset of the ith task. Let W = [w_1, ..., w_m]; then the objective function can be formulated as the sum of an empirical loss f(W) and a regularizer Ω(W), where the loss of each task is weighted by 1/n_i to alleviate the data imbalance among different tasks. Ω(W) is a clustering-based regularization that constrains the information shared among different tasks, and it has been the focus of many researchers. For example, in [17] the authors assume that the parameter vector of each task is similar to the average parameter vector, and Ω(W) is formulated as

Ω(W) = tr(W L W^T),

where L is the Laplacian matrix defined on the graph with edge weights equal to 1/2m; this penalizes the deviation of each w_i from the average parameter vector. In [18], the authors assume that all tasks can be grouped into r < m clusters, and Ω(W) is defined as

Ω(W) = Σ_{c=1}^{r} Σ_{i∈I_c} ||w_i − w̄_c||² = tr(W (I − F F^T) W^T),

where I_c is the index set of the cth cluster, w̄_c denotes the mean parameter vector of the cth cluster, and F ∈ R^{m×r} is an orthogonal cluster indicator matrix with F_{i,c} = 1/√n_c if i ∈ I_c and F_{i,c} = 0 otherwise. These methods assume that the cluster structures or the multi-task relationships are known in advance. Nevertheless, these model assumptions may be incorrect, which can even hurt performance. Thus, learning the task relationships from the data automatically is a more favorable choice. In [14], a multi-task relationship learning (MTRL) method is proposed, which can autonomously learn positive and negative task correlations.
The Ω(W) of MTRL is given as:

Ω(W) = tr(W S^{-1} W^T), s.t. S ⪰ 0, tr(S) = 1, (5)

where S is defined as a task covariance matrix and W is the model parameter matrix.
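The two families of regularizers above can be checked numerically. The sketch below, with toy sizes and random parameters (all names are illustrative, not from the paper), verifies that the mean-pulling penalty of [17] equals a Laplacian quadratic form, and evaluates the MTRL penalty of Equation (5) for a simple feasible S:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 4
W = rng.standard_normal((d, m))      # column w_i: model parameters of task i

# RMTL-style penalty [17]: pull every task parameter towards the mean vector.
w_bar = W.mean(axis=1, keepdims=True)
omega_mean = np.sum((W - w_bar) ** 2)
# Equivalent Laplacian form tr(W L W^T) with L = I - (1/m) 1 1^T (up to scale).
L = np.eye(m) - np.ones((m, m)) / m
assert np.isclose(omega_mean, np.trace(W @ L @ W.T))

# MTRL-style penalty, Equation (5): tr(W S^{-1} W^T) with a task covariance S
# constrained to be PSD with unit trace; here S = I/m is a feasible choice.
S = np.eye(m) / m
omega_mtrl = np.trace(W @ np.linalg.inv(S) @ W.T)
assert np.isclose(omega_mtrl, m * np.sum(W ** 2))   # since S^{-1} = m I
```

The identity in the first assertion is why the mean-regularization of [17] can be written with a graph Laplacian.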

Proposed Clustered Multi-Task Learning
In the MTRL method, the multi-task relationships among different model parameters are fully utilized. To further improve the classification performance of MTRL, we assume that tasks with a close relationship should also be close to each other in the projection space X^T W. That is to say, the task covariance matrix S should reflect the multi-task relationships in both the original and the projected space of the model parameters. The proposed Ω(W) can be formulated as:

Ω(W) = (1/2)||W||²_F + (λ₂/2) tr(W S^{-1} W^T) + (λ₃/2) tr(P S^{-1} P^T)
s.t. S ⪰ 0, tr(S) = 1, p_j^i = (x_j^i)^T w_i, (6)

where P = [P_1, ..., P_m], P_i = [p_1^i, ..., p_{n_i}^i], λ₂ and λ₃ are regularization parameters, and S denotes the task covariance matrix. The target features hidden in radar images are usually nonlinear; thus a nonlinear kernel version of the proposed method is obtained by replacing x_j^i with its nonlinear mapping, i.e., p_j^i = φ(x_j^i)^T w_i. The first term penalizes the complexity of W. The second term restricts the distance between w_i and w_k in the model parameter space, and the third term controls the distance between the projections of φ(X_i) and φ(X_k) in the projected space. The latter two regularization terms imply that the distance between a pair of tasks T_i and T_k should be as small as possible in both the original and the projected space if they belong to the same cluster. To sum up, the objective function combines the empirical loss with this regularizer.
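A minimal sketch of evaluating the proposed regularizer (6), assuming for simplicity that every task has the same number of samples n so that P stacks each task's projections as one column (sizes and random data are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, n = 6, 3, 8                                    # feature dim, tasks, samples/task
lam2, lam3 = 0.1, 0.001                              # regularization weights
W = rng.standard_normal((d, m))                      # columns w_i
X = [rng.standard_normal((d, n)) for _ in range(m)]  # X[i]: samples of task i as columns

# A feasible task covariance: symmetric PSD with unit trace (constraints of (6)).
A = rng.standard_normal((m, m))
S = A @ A.T
S /= np.trace(S)
S_inv = np.linalg.inv(S)

# Projections p_j^i = (x_j^i)^T w_i; column i of P collects task i's projections.
P = np.column_stack([X[i].T @ W[:, i] for i in range(m)])   # shape (n, m)

omega = (0.5 * np.sum(W ** 2)
         + 0.5 * lam2 * np.trace(W @ S_inv @ W.T)
         + 0.5 * lam3 * np.trace(P @ S_inv @ P.T))

assert omega > 0.0
assert np.isclose(np.trace(S), 1.0)
```

The third trace term is what distinguishes the proposed regularizer from MTRL: it couples the tasks through their projections as well as through their parameters.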

Proposed Optimization Method
The objective function of problem (8) is convex in all variables, but it is not easy to optimize with respect to all of them simultaneously. Hence an alternating method is adopted to solve the problem: first, W and b are updated with S fixed; then S is updated with W and b fixed.
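The alternating scheme can be sketched on a simplified surrogate. The toy below (all data and sizes are illustrative) uses a least-squares loss in place of the paper's loss, solves the coupled ridge system for all w_i jointly when S is fixed, and then updates S in closed form in the MTRL style, keeping S PSD with unit trace:

```python
import numpy as np

def sqrtm_psd(M):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

rng = np.random.default_rng(2)
d, m, n, lam2 = 6, 3, 20, 0.1
# Toy per-task least-squares data standing in for the paper's actual loss.
X = [rng.standard_normal((n, d)) for _ in range(m)]
y = [np.sign(rng.standard_normal(n)) for _ in range(m)]

S = np.eye(m) / m                                  # feasible start: S PSD, tr(S) = 1
for _ in range(10):
    # Step 1: with S fixed, solve the coupled system for all w_i at once;
    # tr(W S^{-1} W^T) contributes the Kronecker block lam2 * (S^{-1} ⊗ I_d).
    Sinv = np.linalg.inv(S + 1e-8 * np.eye(m))
    A = lam2 * np.kron(Sinv, np.eye(d))
    for i in range(m):
        A[i*d:(i+1)*d, i*d:(i+1)*d] += X[i].T @ X[i]
    b = np.concatenate([X[i].T @ y[i] for i in range(m)])
    W = np.linalg.solve(A, b).reshape(m, d).T      # columns w_i
    # Step 2: with W fixed, update S in closed form as in MTRL [14].
    M = sqrtm_psd(W.T @ W)
    S = M / np.trace(M)

assert np.isclose(np.trace(S), 1.0)
assert np.all(np.linalg.eigvalsh(S) >= -1e-10)
```

The paper instead solves the W-subproblem in the dual (Equations (9)-(15)); the loop structure, however, is the same.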
Specifically, when S is fixed, the optimization problem for updating W and b can be stated as problem (9). To facilitate a kernel extension of the proposed method, the optimization problem is transformed into a dual form (10), where α_j^i and β_j^i are the Lagrange multipliers associated with the jth training sample of the ith task. Setting the derivatives of L with respect to W, b_i, ξ_j^i and P equal to zero, we obtain Equation (11), where e_i is the ith column vector of I_{m×m} and β_j^i is the (j, i)th element of B. Plugging Equation (11) into Equation (10), the form (12) is obtained, where K is the multi-task kernel matrix defined on all the training samples; for any two training samples, the corresponding multi-task kernel entry is denoted K(·, ·). Since Equation (12) is an unconstrained optimization problem with respect to β, the derivative of E with respect to β is given by Equation (13). Setting this derivative to zero, we obtain Equation (14), where Q = −(K + S/λ₃). Substituting Equation (14) into Equation (12), we obtain problem (15). So far, problem (9) has been converted into a familiar form. In most of the literature [14,19], Equation (15) is solved by an SMO algorithm, similar to the least-squares SVM [15]. However, multiple variables must be heuristically selected when the SMO method is used for solving the non-linear extension of multi-task learning; in this setting the SMO approach cannot guarantee convergence because of the mutual interference among the heuristically selected variables. In the proposed solving method, problem (15) is efficiently solved by the widely applied APG method [16]. Specifically, the objective function is divided into a smooth part f(α) = (1/2) α^T K α and a non-smooth part. The gradient of f(α) is Lipschitz continuous with a Lipschitz constant L satisfying L ≥ ||K||₂. Given L, a surrogate function T(α, α_k) is defined as in Equation (16), where ∇f(α_k) denotes the derivative of f(α) with respect to α at α = α_k.
After omitting the constant terms, Equation (16) can be rewritten as Equation (17), which can be decoupled into m subproblems, the ith of which is formulated in terms of z = K α_k, with z_i the subvector of z corresponding to the ith task. It is not difficult to see that Equation (17) can easily be implemented with parallel computing. Based on the Lagrange multiplier method, we can obtain the analytical solution of problem (17), where Y = (y_1^1, ..., y_{n_m}^m). When W and b are obtained, the subproblem for minimizing Equation (10) over S can be stated as problem (20). Similar to [14], the analytical solution for S is given by Equation (21). Then the nonlinear predictive function f_i(x_j^i) = φ(x_j^i)^T w_i + b_i can be obtained as Equation (22), where T_j^i = α^T k_j^i − y_j^i and k_j^i is the column of K corresponding to x_j^i. The proposed method is pictorially shown in Figure 1, and the corresponding recognition procedure is given in Algorithm 1.
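The APG iteration the paper relies on can be sketched on a generic composite problem. The toy below (a FISTA-style loop; the kernel matrix, the linear term and the ℓ1 non-smooth part are stand-ins, since the paper's exact non-smooth term is not reproduced here) uses the same ingredients named in the text: a smooth part with gradient Kα, a Lipschitz constant bounded by the spectral norm of K, and a proximal step on the surrogate:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 40
B = rng.standard_normal((n, n))
K = B @ B.T / n                       # PSD stand-in for the multi-task kernel matrix
c = rng.standard_normal(n)
lam = 0.1

grad_f = lambda a: K @ a - c          # gradient of the smooth part f(α) = ½αᵀKα − cᵀα
Lip = np.linalg.norm(K, 2)            # Lipschitz constant of ∇f: L ≥ ||K||₂
prox = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)   # prox of t·||·||₁

alpha = np.zeros(n)
v, t = alpha.copy(), 1.0
for _ in range(300):                  # accelerated proximal gradient (FISTA) loop
    alpha_new = prox(v - grad_f(v) / Lip, lam / Lip)
    t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    v = alpha_new + (t - 1.0) / t_new * (alpha_new - alpha)
    alpha, t = alpha_new, t_new

obj = lambda a: 0.5 * a @ K @ a - c @ a + lam * np.abs(a).sum()
assert obj(alpha) <= obj(np.zeros(n)) + 1e-9   # no worse than the zero start
```

Because the proximal step acts coordinate-wise here (and task-wise in the paper's Equation (17)), each iteration parallelizes naturally, which is the property the text emphasizes.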

Experimental Results and Analysis
In order to verify the effectiveness and robustness of the proposed method, simulated HRRP datasets and the MSTAR SAR public database are used for the tests. In the following studies, a one-against-all framework is adopted, where each one-against-all classification problem is considered as a task. To quantitatively assess the performance, several state-of-the-art algorithms, including KNN, SVM and some other multi-task learning methods, summarized in Table 1, are used as references.
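A small sketch of how the one-against-all framework turns one multi-class problem into m binary tasks (toy labels; the {0, 1} convention matches the preliminaries):

```python
import numpy as np

# Toy multi-class labels for three target categories (0, 1, 2).
labels = np.array([0, 1, 2, 1, 0, 2, 2, 1])

# One-against-all: task i treats category i as positive (1) and all other
# categories as negative (0); each task is then trained as its own classifier.
tasks = {i: (labels == i).astype(int) for i in range(3)}

assert tasks[1].tolist() == [0, 1, 0, 1, 0, 0, 0, 1]
# Each sample is positive in exactly one of the m tasks.
assert sum(tasks.values()).tolist() == [1] * len(labels)
```

This construction is also the source of the class imbalance discussed later: with m categories, each task sees roughly one positive for every m − 1 negatives.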

Algorithm 1. The proposed recognition procedure.
…
5: Reformulate the optimization problem (9) into the dual form (12)
6: Update β by Equation (14)
7: Solve problem (15) using the APG method
8: Update S by using Equation (21)
9: end while
10: Output W, b and S.

Table 1. The reference methods to be studied.

Methods Description
KNN K-nearest neighbor classifier.
SVM [8] Support vector machine learning.
Trace-norm regularized multi-task learning (Trace) [20] The Trace method assumes that all models share a common low-dimensional subspace.
Regularized multi-task learning (RMTL) [17] RMTL assumes that all tasks are similar and that the parameter vector of each task is close to the average parameter vector.
Clustered multi-task learning (CMTL) [18] CMTL assumes that multiple tasks follow a clustered structure and that this structure is known a priori. In the experiments, we perform multiple single-task learning runs to obtain trained model parameters, from which the clustered structure is derived.
Multi-task relationship learning (MTRL) [14] MTRL can autonomously learn positive and negative task correlations.

Investigations Based on a Simulated Database
In the simulation experiments, three categories of tank models are considered as the radar targets. HRRP samples are obtained by performing an IFFT on RCS samples generated by the FEKO software, whose electromagnetic simulation parameters are listed in Table 2. The three targets and their corresponding HRRPs are shown in Figure 2. In the simulation, the profiles generated at a depression angle of 15° are randomly divided into two equal parts, one for training and the other for testing. For each target, a 50-dimensional feature is extracted by principal component analysis (PCA). In this section, the influences of the model parameters on the recognition performance are discussed, and then two comparative experiments are conducted. In each numerical simulation, twenty Monte Carlo runs are performed to obtain the average recognition results.
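The PCA feature-extraction step can be sketched as follows, with random stand-ins for the HRRPs (the 128-cell profile length is illustrative; only the 50-dimensional output matches the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
profiles = rng.standard_normal((200, 128))   # 200 toy HRRPs, 128 range cells each

# PCA via SVD of the centered data; keep the 50 leading principal components.
mean = profiles.mean(axis=0)
Xc = profiles - mean
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
features = Xc @ Vt[:50].T                    # 50-dimensional features per profile

assert features.shape == (200, 50)
# The principal directions are orthonormal, so the feature covariance is diagonal.
cov = features.T @ features
assert np.allclose(cov - np.diag(np.diag(cov)), 0.0, atol=1e-8)
```

At test time the same `mean` and `Vt[:50]` learned from the training split would be applied to the held-out profiles.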

Two metrics, ACC (accuracy) and AUC (area under curve), are adopted to validate the proposed method. The average results are shown in Figure 3. Figure 3a shows that the best recognition rate is reached when λ₂ and λ₃ are equal to 10⁻¹ and 10⁻³, respectively. The optimal value of λ₃ is less than that of λ₂ when our method achieves the best performance. This indicates that the distances among the three tasks in the model parameter space are larger than those in the projected space. The penalty on the distance in the model parameter space is heavier than that in the projected space if λ₂ > λ₃, which promotes the generalization ability. Figure 3b shows that the value of AUC approximately equals one for most combinations of λ₂ and λ₃, which means that our model has a strong ranking ability for the samples. Furthermore, an ideal AUC value combined with a poor ACC value indicates that many negative samples are classified as positive. That is because the negative and positive samples are unbalanced. For example, in the task of classifying tank 1 against tank 2 and tank 3, the samples of tank 1 are set as positive and those of tank 2 and tank 3 as negative, which leads to a sample-imbalance problem. However, when an appropriate combination of λ₂ and λ₃ is chosen, the imbalance problem disappears.
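The point that a perfect AUC can coexist with a mediocre ACC under class imbalance can be reproduced numerically. The sketch below (toy scores and a hypothetical fixed threshold) computes AUC via the rank-sum statistic:

```python
import numpy as np

def auc(scores, y):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = y == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Imbalanced one-against-all toy task: 2 positives, 6 negatives.
y = np.array([1, 1, 0, 0, 0, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2])

# A fixed threshold of 0.55 also accepts two negatives...
pred = (scores > 0.55).astype(int)
acc = (pred == y).mean()
# ...yet the ranking of positives above negatives is perfect, so AUC = 1.
assert auc(scores, y) == 1.0 and acc == 0.75
```

This mirrors the paper's observation: AUC measures ranking quality, while ACC also depends on where the decision boundary falls relative to the imbalanced classes.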

Comparison of Single Task and Multiple Task
To verify the effectiveness of multi-task learning, single-task learning and multiple (three) task joint learning are compared, where λ₁ is set to 1, λ₂ to 10⁻¹, and λ₃ to 10⁻², 10⁻³, 10⁻⁴ and 10⁻⁵. The ACC results of the three tanks are shown in Figure 4. As shown in Figure 4, the overall recognition rate of the three-task joint training method is 0.5856, 0.9915, 0.8933 and 0.8830, respectively, for the different values of λ₃, which is 2.26%, 6.11%, 3.70% and 4.59% better than that of the single-task learning method. This result shows that by jointly learning the three tasks we can reveal and share the relationships among different tasks, which helps to discriminate tanks with similar patterns. In our method, the relationships among different tasks are obtained automatically from the learned covariance matrix S. The correlation coefficients of the three tasks, for λ₃ equal to 10⁻³, are shown in Figure 5. The relationship matrix shows that task 1 and task 2 have a high negative correlation with task 3. In the proposed three-task joint learning method, these relationships can be accurately described and utilized in the parameter space and the projected space; therefore, a 99.15% average recognition accuracy is achieved. However, when these relationships are not properly handled, such strong correlations will degrade the model learning.
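Converting a learned task covariance S into the correlation coefficients displayed in such a relationship matrix is a one-line normalization. A sketch with a hypothetical three-task S (symmetric, unit trace; the values are illustrative, not the learned ones):

```python
import numpy as np

# A toy learned task covariance S for three tasks: PSD, symmetric, trace 1.
S = np.array([[0.40,  0.10, -0.20],
              [0.10,  0.35, -0.15],
              [-0.20, -0.15, 0.25]])

# Correlation coefficients: R_ik = S_ik / sqrt(S_ii * S_kk).
d = np.sqrt(np.diag(S))
R = S / np.outer(d, d)

assert np.isclose(np.trace(S), 1.0)
assert np.allclose(np.diag(R), 1.0)
# Tasks 1 and 2 come out negatively correlated with task 3, as in Figure 5.
assert R[0, 2] < 0 and R[1, 2] < 0
```

The diagonal of R is always one, so only the off-diagonal entries carry the inter-task relationship information.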
Figure 4 shows that when λ₃ equals 10⁻², the ACC of tank 3 obtained by multi-task learning is poor. The reason is that when a tight constraint is imposed on the projected space, the generalization ability of the model declines, which makes it difficult to accurately classify the highly correlated tank 3. Nevertheless, for most values of λ₃, the performance of multi-task learning is better than that of single-task learning.

Comparison against the State of the Art
To evaluate the performance of the proposed method, it is compared with the reference methods; the results are shown in Figure 6 and Table 3. Figure 6 shows that multi-task learning with trace-norm regularization has the lowest recognition rate. This is because the Trace method learns a linear predictive function, which cannot accurately describe the nonlinear structures of HRRP data. This result suggests that it is necessary to extend multi-task learning methods to nonlinear domains. Table 3 shows that the overall recognition rate of our method is 0.9915, compared to 0.9674 for MTRL, 0.9570 for CMTL, 0.8918 for RMTL, 0.6944 for Trace, 0.7543 for SVM and 0.8640 for KNN; that is, 2.41%, 3.45%, 9.97%, 29.71%, 23.72% and 12.75% better than the competitors MTRL, CMTL, RMTL, Trace, SVM and KNN, respectively. The simulation results show that our method can accurately describe and utilize the relationships among the three tasks in the parameter space and the projected space, which helps improve the recognition rate of radar targets with highly similar patterns.

Investigations Based on MSTAR Database
To further verify the effectiveness of the proposed method, extensive studies have been conducted on the MSTAR public database, a gallery collected using a 10-GHz SAR sensor with 1 ft × 1 ft resolution in range and azimuth. Images are captured at various depression angles over the 0–359° range of aspect angle. The images are all around 128 × 128 pixels in size. To further avoid the influence of clutter, the images are cropped to 64 × 64 pixels. In this paper, the intensity of the raw image is adopted as the feature: each raw image is concatenated into a 4096-dimensional vector, and a 200-dimensional feature is then extracted by PCA. In the following numerical simulations, the parameters λ₁, λ₂ and λ₃ of our method are set to 1, 0.1 and 0.001, respectively.
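The MSTAR preprocessing pipeline described above (center-crop, flatten, PCA) can be sketched with random stand-ins for the SAR chips; only the 64 × 64 crop, the 4096-dimensional flattening and the 200-dimensional output follow the paper, everything else is illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
images = rng.random((300, 128, 128))          # toy stand-ins for 128x128 MSTAR chips

# Center-crop to 64 x 64 pixels to suppress the surrounding clutter.
lo, hi = (128 - 64) // 2, (128 + 64) // 2
cropped = images[:, lo:hi, lo:hi]

# Flatten each chip into a 4096-dimensional raw-intensity vector.
vectors = cropped.reshape(len(cropped), -1)
assert vectors.shape == (300, 4096)

# Reduce to 200 dimensions with PCA (SVD of the centered data).
Xc = vectors - vectors.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
features = Xc @ Vt[:200].T
assert features.shape == (300, 200)
```

As with the HRRP features, the crop window, mean and projection basis would be fitted on the 17° training images and reused unchanged on the test images.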

Target Recognition under Standard Operating Conditions (SOC)
We first consider target recognition under SOC. Images acquired at a 17° depression angle are used to train the model, while those captured at a 15° depression angle are used for testing, as shown in Table 4. All ten targets are employed, and their optical and SAR images are given in Figure 7. Among these vehicles, BMP2 and T72 have several variants with small structural modifications (denoted by serial number). Only the standard versions (SN_9563 for BMP2 and SN_132 for T72) captured at 17° depression are available for training.


Figure 7. The optical and SAR images of the ten targets to be recognized (2S1, BRDM2, BTR60, D7, T62, ZIL131, ZSU23/4, T72-132, BMP2-C21, BTR70). The descriptions of these vehicles can be referred to [21].

In this section, the single-task learning method is compared with other training methods: two-task joint learning, five-task joint learning, and ten-task joint learning. Taking the five-task joint learning method as an example, it is realized by dividing the ten tasks into two equal groups; the tasks within the same group are learned jointly. The other multi-task learning configurations are formed similarly. The recognition results with the different training methods are shown in Figure 8, and the learned ten-task relationship matrix is shown in Figure 9. As shown in Figure 8, the overall recognition rate of the ten targets with ten models jointly trained is increased by 6.37%, 2.74%, and 2.14%, respectively, compared with one model individually trained, two models jointly trained, and five models jointly trained. Moreover, ten-model joint learning gives a more robust ACC result than the other training methods. This is because the ten-model joint learning method imposes a unified sparse constraint on all ten tasks and achieves a global balance in the process of training the ten models. Figure 9 shows that two groups of tasks ('2S1' and 'BMP2'; 'T62' and 'T72') are highly correlated. The single-task learning method cannot appropriately handle and utilize these relationships, and thus cannot reach a recognition rate as good as that of the multi-task learning method.

Figure 8. The recognition rates of ten vehicles with different learning methods. The term 'Task/1' denotes that each of the ten models (classifiers) is individually learned (trained), while 'Task/2' ('Task/5', 'Task/10') means that every two (five, ten) models are jointly learned.

Figure 9. Correlation coefficients of the ten different tasks. The term '2S1' means the task of recognizing vehicle 2S1; the meanings of the other terms are similar.
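The Task/1, Task/2, Task/5 and Task/10 protocols amount to partitioning the ten tasks into equal groups that are trained jointly. A minimal sketch of that grouping (target names from Figure 7; the grouping into consecutive names is an assumption, since the paper does not state how the groups were drawn):

```python
# The ten MSTAR target tasks of Figure 7.
targets = ['2S1', 'BRDM2', 'BTR60', 'D7', 'T62',
           'ZIL131', 'ZSU23/4', 'T72-132', 'BMP2-C21', 'BTR70']

def group_tasks(names, group_size):
    """Split the task list into equal groups; each group is learned jointly."""
    return [names[i:i + group_size] for i in range(0, len(names), group_size)]

# 'Task/5': two groups of five tasks, each group trained jointly.
groups = group_tasks(targets, 5)
assert len(groups) == 2 and all(len(g) == 5 for g in groups)
# 'Task/1' degenerates to single-task learning; 'Task/10' trains all jointly.
assert len(group_tasks(targets, 1)) == 10
assert len(group_tasks(targets, 10)) == 1
```

Only Task/10 lets the covariance matrix S couple every pair of tasks, which is why it can exploit the '2S1'/'BMP2' and 'T62'/'T72' correlations visible in Figure 9.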
Figure 10 and Table 5 show the ACC results of the proposed method and the reference methods. From Table 5, we can see that the overall recognition rate of our method is 0.9734, compared to 0.9584 for MTRL, 0.9391 for CMTL, 0.9209 for RMTL, 0.7504 for Trace, 0.9017 for SVM, and 0.9271 for KNN; that is, 1.50%, 3.43%, 5.25%, 22.30%, 7.17%, and 4.63% better than MTRL, CMTL, RMTL, Trace, SVM, and KNN, respectively. The results show that jointly training the ten models under a unifying classification framework improves the recognition rate. Figure 10 also shows that the ACC of target '2S1' is lower than that of the other targets for most of the reference methods. This may be because, as shown in Figure 9, tasks '2S1' and 'BMP2' are highly correlated, and most of the reference methods cannot accurately describe and utilize this relationship, which degrades their recognition performance. In contrast, the proposed method can utilize this relationship and achieves better recognition accuracy.
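For clarity, the "better than" figures quoted above are absolute percentage-point differences between the overall recognition rates. The short sketch below reproduces them from the values quoted from Table 5 (variable names are illustrative):

```python
# Absolute percentage-point gains of the proposed method over each
# reference method, reproducing the figures quoted from Table 5.
ours = 0.9734
reference = {'MTRL': 0.9584, 'CMTL': 0.9391, 'RMTL': 0.9209,
             'Trace': 0.7504, 'SVM': 0.9017, 'KNN': 0.9271}

gains = {name: round((ours - acc) * 100, 2) for name, acc in reference.items()}
# gains == {'MTRL': 1.5, 'CMTL': 3.43, 'RMTL': 5.25,
#           'Trace': 22.3, 'SVM': 7.17, 'KNN': 4.63}
```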
To assess the practicability of our method, the recognition performance under different depression angles is evaluated. Three vehicle targets, 2S1, BRDM2, and ZSU23/4, are used. The images captured at a depression angle of 17° are used to train the algorithm, while those collected at depression angles of 30° and 45° are used for testing, as shown in Table 6. In this test, single-task learning is compared with three-task joint learning. The ACC results and the correlation coefficient matrix of the three vehicles are shown in Figures 11 and 12, respectively. Figure 11 indicates that the overall recognition rate of three models jointly learned is 1.63% and 1.90% better than that of a single model individually learned under the 30° and 45° testing depression angles, respectively. Figure 12 shows that the three tasks are connected with each other, although the task connections in the EOC test are not as close as the relationships shown in Figure 5.
One reason is that the SAR images contain a lot of speckle noise, while the simulated HRRP data are noise-free. Nevertheless, the improvement of the overall recognition rate achieved by three-task joint learning is significant in both the MSTAR data test and the HRRP data test. All these results corroborate that multi-task learning is superior to single-task learning.
Figure 11. Recognition rates of three vehicles with different learning methods. The term 'E30/1' ('E45/1') denotes that each of the three models is individually learned at a depression angle of 17° and tested at a depression angle of 30° (45°). Similarly, the term 'E30/3' ('E45/3') denotes that all three models are jointly learned.
Figure 12. Correlation coefficients of three different tasks. The term '2S1' means the task to recognize vehicle 2S1, and the meanings of the other terms are similar.
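The per-target recognition rates and overall ACC used throughout these comparisons can be computed as in the sketch below. This is a minimal illustration: the function name is assumed, and the choice of averaging per-class rates (rather than pooling all test samples) is an assumption, since the paper does not state which convention it uses.

```python
import numpy as np

def per_class_and_overall_acc(y_true, y_pred, classes):
    """Per-target recognition rates and their average (overall ACC).

    y_true, y_pred: integer label arrays; classes: list of class labels.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    # Fraction of samples of each class that were classified correctly
    per_class = {c: float(np.mean(y_pred[y_true == c] == c)) for c in classes}
    overall = float(np.mean([per_class[c] for c in classes]))
    return per_class, overall

# Toy example with 3 hypothetical targets (0='2S1', 1='BRDM2', 2='ZSU23/4')
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2]
per_class, overall = per_class_and_overall_acc(y_true, y_pred, [0, 1, 2])
# per_class == {0: 0.75, 1: 1.0, 2: 1.0}; overall is their mean
```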

(B) Comparison against the State of the Art
To evaluate the robustness of the proposed method, the reference methods are compared with ours under two different depression angles. The ACC results are shown in Figure 13 and Table 7. The overall recognition rate of the proposed method is 0.9824, compared to 0.9546 for MTRL, 0.9472 for CMTL, 0.9203 for RMTL, 0.6742 for Trace, 0.8673 for SVM, and 0.9142 for KNN at a depression angle of 30°; that is, 2.78%, 3.52%, 6.21%, 30.82%, 11.51%, and 6.82% better than MTRL, CMTL, RMTL, Trace, SVM, and KNN, respectively. When the algorithms are further evaluated on images captured at a depression angle of 45°, the performance of all methods degrades, especially that of KNN, whose recognition rate drops from 0.9142 to 0.6363. In contrast, the multi-task learning methods, such as RMTL, CMTL, MTRL, and our method, still maintain a higher recognition rate owing to their ability to share multi-task relationships. Among all the approaches, the proposed method achieves the highest recognition rate of 0.9731, which is 2.29%, 7.83%, 6.73%, 39.90%, 40.61%, and 33.68% better than MTRL, CMTL, RMTL, Trace, SVM, and KNN, respectively. This improvement in recognition accuracy is significant and demonstrates that the proposed method is much more robust to depression-angle variation than the reference methods.
From the above extensive analyses of the experimental results, we can draw the following conclusions. Multi-task relationship information can indeed improve classification performance, and automatically learning the task relationships from data is the more favorable option. Furthermore, a detailed and accurate description of the multi-task relationships in both the original and the projected space of the model parameters improves radar target recognition performance more than a rough hypothesis about the multi-task relationships.

Conclusions
In this paper, we propose a radar target recognition method based on the theory of clustered multi-task learning. In order to learn more useful and accurate relationships among multiple tasks, the potentially useful relationships in the projection space are further explored. The proposed method assumes that tasks within the same cluster should be close to each other in both the original and the projected space, which helps discriminate radar targets with similar patterns. Studies on the simulated HRRP data show that the proposed method can fully discover the multi-task relationships in the projected space and accurately classify targets with similar structures. Extensive comparative experiments on the MSTAR data further demonstrate the superiority and robustness of the proposed method. The experimental results under SOC and EOC demonstrate that the proposed method can properly reveal the latent relationships among multiple tasks and performs better than single-task learning. Moreover, all of the comparisons against the state-of-the-art methods confirm the superiority of the proposed method.