Recognition of Targets in SAR Images Based on a WVV Feature Using a Subset of Scattering Centers

This paper proposes a robust method for feature-based matching with potential application to synthetic aperture radar (SAR) automatic target recognition (ATR). The scarcity of measured SAR data available for training classification algorithms leads to the replacement of such data with synthetic data. As attributed scattering centers (ASCs) extracted from a SAR image reflect the electromagnetic scattering phenomena of the target, they are effective for classifying targets even when purely synthetic SAR images are used as templates. In the classification stage, after preparing the extracted template ASC dataset, some of the template ASCs were subsampled by amplitude and by the neighbor-matching algorithm to focus on the points related to the test ASCs. The subset of ASCs was then reconstructed into the world view vector (WVV) feature set, considering point similarity and structure similarity simultaneously. Finally, the matching scores between the two sets were calculated using weighted bipartite graph matching and combined with several weights for the overall similarity. Experiments on the publicly available Synthetic and Measured Paired Labeled Experiment (SAMPLE) dataset were conducted to verify the effectiveness and robustness of the proposed method. The proposed method can be used in practical SAR ATR systems trained using simulated images.


Introduction
Over the last few years, automatic target recognition using synthetic aperture radar (SAR ATR) has become increasingly important as a crucial means of surveillance [1][2][3][4]. SAR images can be obtained in most weather conditions, day and night, and at a high resolution [5][6][7][8]. Based on these characteristics, SAR ATR algorithms have been evaluated on several targets, including ground-based vehicles, aircraft, and vessels, which are challenging for military operations [9]. However, the sensitivity of SAR to sensor parameters, target configurations, and environmental conditions makes it challenging to implement SAR ATR [10][11][12].
SAR ATR algorithms can be divided into three basic steps: detection, discrimination, and classification [13,14]. The first two steps are intended to extract potential target areas and remove false alarms [15]. The purpose of target classification is to automatically classify each input target image obtained by target detection and discrimination [16]. A large number of efforts have been made to achieve robust SAR ATR. However, the classification performances under extended operating conditions (EOCs) are insufficient for practical applications. In real-world cases, most targets are likely to be camouflaged or blocked by surrounding obstacles [11,17]. To improve the performance under EOCs within a SAR ATR system, this paper focused on the classification stage.
Target classification methods are mainly divided into model-based and feature-based methods [5]. Feature-based methods involve pattern recognition and rely solely on features to represent the target, with many SAR target templates stored in advance.

The previously proposed WVV-based method has a robust ATR capability using all extracted ASCs [35]. The WVV is insensitive to translation, rescaling, random perturbation, and random addition and deletion of points. However, the method is sensitive to partial differences that may arise due to simulation limitations. Further, local feature matching under imbalance from two sets is still a challenging and important point for classification problems [36,37]. It is necessary to design a classification method that is less sensitive to the partial difference between real and synthetic images. Therefore, we propose an improved WVV-based ATR method using a subset of ASCs instead of using all ASCs to focus on local similarity.

Target Classification with a WVV-Based Feature Set
To classify the target in SAR images, we designed the classification algorithm in terms of local feature matching. The ASCs were used as the unique features of the target in the SAR image. Figure 2 illustrates a flowchart of the proposed method, which is divided into five main steps: extraction of the scattering centers, amplitude-based subsampling of scattering centers, neighbor matching, WVV-based feature reconstruction, and similarity measurement. The ASCs of the template dataset, extracted from each template image, were prepared offline in advance; the value of T in Figure 2 is the number of template samples. In the classification algorithm, the scattering centers were first extracted from the test image. To address the local similarity and the imbalance problem between the two ASC sets, some of the template ASCs were then selected by amplitude-based subsampling, where the number of test ASCs was exploited to adjust the number of template ASCs. Afterwards, neighbor matching was applied to concentrate on the related points between the test ASCs and template ASCs. The subset of ASCs was used to reconstruct the world view vector feature set, thus considering the point similarity and structure similarity simultaneously. Regarding the WVV-based similarity as the weight of bipartite graph matching, we then found the optimal matching between the two sets. Finally, the overall similarity was determined to recognize the target by combining the matching score with several weights related to the matched/unmatched and selected/unselected numbers of ASCs, which the WVV-based similarity alone could not capture. By repeating the process T times, the test image was consequently classified.


Extraction of Scattering Centers
Several algorithms are available for extracting scattering centers in the image domain [38,39]. CLEAN is one of the most widely used algorithms for extracting scattering centers from SAR images [16,17,19,24,26,27,40-43]. The CLEAN algorithm employs a filter derived from the point spread function (PSF), which is given as follows, where (x, y) denotes the position of the scattering center; c is the speed of light; f_c is the center frequency; θ_c is the center azimuth; B is the frequency bandwidth of the radar; and Ω is the azimuth aperture. To extract the scattering centers properly, the filter used in the CLEAN algorithm has to incorporate the same smoothing window ω(x, y) used during image formation, resulting in h(x, y) = psf(x, y)ω(x, y). This work used the −35 dB Taylor window employed by the SAMPLE dataset [31].
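The display equation for psf(x, y) did not survive extraction. For reference, a commonly used closed form for a SAR PSF with rectangular spectral support, consistent with the variables listed above (c, f_c, B, Ω) but not necessarily the paper's exact expression, is:

```latex
\mathrm{psf}(x,y) =
  \operatorname{sinc}\!\left(\frac{2B}{c}\,x\right)
  \operatorname{sinc}\!\left(\frac{2 f_c \Omega}{c}\,y\right),
\qquad
\operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u},
```

where x is the (slant-)range coordinate and y is the cross-range coordinate; the sinc nulls fall at the range resolution c/(2B) and cross-range resolution c/(2 f_c Ω).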
The CLEAN algorithm searches for the highest-amplitude pixel in the SAR image and records its amplitude A i and its image coordinates (x i , y i ). Then, the filter h(x, y) shifts to the center of the pixel location and is multiplied by A i . Assuming a point spread function (PSF) with a corresponding amplitude, the response of the imaging system is removed from the complex image [44]. In general, the iterative process is repeated with the residual image until a predetermined number of ASCs are extracted, or the amplitude of the extracted scattering point is less than the threshold value.
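The iterative loop described above can be sketched as follows. This is a minimal illustration: `clean_extract`, the real-valued amplitude image, and the delta-like `psf` argument are hypothetical simplifications, whereas the actual algorithm operates on the complex image with the filter h(x, y).

```python
def clean_extract(image, psf, n_max, threshold):
    """Minimal sketch of CLEAN extraction on a real-valued amplitude image.

    `image` is a list of rows; `psf` is a small, odd-sized 2-D list with its
    peak (value 1) at the center.  Returns a list of (A_i, x_i, y_i) tuples.
    """
    residual = [row[:] for row in image]
    h = len(psf) // 2
    centers = []
    for _ in range(n_max):
        # 1. Locate the highest-amplitude pixel of the residual image.
        y, x = max(((r, c) for r in range(len(residual))
                    for c in range(len(residual[0]))),
                   key=lambda rc: residual[rc[0]][rc[1]])
        amp = residual[y][x]
        if amp < threshold:   # stop once the peak falls below the threshold
            break
        centers.append((amp, x, y))
        # 2. Shift the filter to the peak, scale it by the amplitude, and
        #    subtract the imaging-system response from the residual image.
        for dy in range(-h, h + 1):
            for dx in range(-h, h + 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy < len(residual) and 0 <= xx < len(residual[0]):
                    residual[yy][xx] -= amp * psf[dy + h][dx + h]
    return centers
```

The loop terminates either after `n_max` extractions or when the residual peak drops below the amplitude threshold, mirroring the two stopping criteria described above.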
In this study, we extracted scattering centers located in the target region if their amplitude was greater than or equal to a threshold, which was intended to limit the number of ASCs. The target region and shadow of a SAR image can be separated by conventional segmentation algorithms, including K-means clustering, Otsu's method, and iterated conditional modes (ICM). In this experiment, the ICM in Ref. [45] was used, but its processing speed was improved by using the initial K-means segmentation, instead of the SAR image itself, as the input data of the ICM. Figure 3 shows the results of our scattering-center extraction using CLEAN. The reconstructed SAR image, using a total of N scattering centers and shown in Figure 3b, was very similar to the real SAR image. The segmentation results and the extracted scattering centers located only inside the target region are shown in Figure 3c. At the end of the extraction process, the scattering centers were stored in an N × 3 matrix, {A_i, x_i, y_i | i = 1, 2, . . . , N}. For the extracted scattering centers, the range (i.e., slant-range) coordinates were converted to ground range to facilitate the matching analysis of scattering centers between SAR images taken from different depression angles, as x_i^g = x_i / cos(θ_depression), where θ_depression denotes the depression angle. Hereafter, x_i refers to the ground-range coordinate, x_i^g. In this paper, we used the ASC locations (x_i, y_i) and normalized amplitudes (A_i^Norm) for the SAR ATR. The locations describe the spatial distribution of the ASCs, and the normalized amplitudes reflect the relative intensities of different ASCs. Therefore, the direct relevance to the target's physical structure benefits the ATR performance.
The N-scattering center feature set G was defined as follows [26]:
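The construction of the N × 3 feature set can be sketched as below. The normalization of amplitudes by the maximum value and the slant-to-ground conversion by division by cos(θ_depression) are assumptions drawn from the text; `build_feature_set` and its tuple layout are illustrative, not the paper's implementation.

```python
import math

def build_feature_set(centers, depression_deg):
    """Sketch of the N x 3 feature set {A_i^Norm, x_i^g, y_i}.

    `centers` is a list of (A_i, x_i, y_i) tuples in slant-range
    coordinates.  Amplitudes are normalized by the maximum amplitude, and
    the range coordinate is converted from slant range to ground range by
    dividing by cos(theta_depression).
    """
    a_max = max(a for a, _, _ in centers)
    cos_d = math.cos(math.radians(depression_deg))
    return [(a / a_max, x / cos_d, y) for a, x, y in centers]
```

At a depression angle of 0° the ground range equals the slant range; at larger depression angles the ground-range coordinate stretches by 1/cos(θ), which is what makes ASC sets from different depression angles comparable.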


Amplitude-Based Selection
The apparent difference between the amplitude distributions of synthetic and real SAR images is a challenge when training on synthetic data. The statistical differences between the synthetic and measured data of SAMPLE were investigated in Ref. [33] by plotting histograms of the image means and variances: the synthetic data tended to have a lower mean and variance than the measured data. The mean image difference was approximately 0.1 when we extracted scattering centers from real and synthetic images. There was also a difference in the number of SCs extracted from the synthetic and real images. This could be due to clear differences in the overall structure of the targets, as well as fine differences in the target details; however, it could also be due to the use of unequal minimum amplitudes to alleviate the difference in amplitude values. Before neighbor matching, a subset of the scattering centers in the template image was therefore selected in the order of their amplitude values.
In the simplest case, the number of scattering centers in the subset would equal the number in the test image. However, the amplitudes of the scattering centers extracted from SAR images vary with subtle changes in the target's pose and imaging geometry (e.g., depression angle), so it is desirable to select the points with a small buffer according to A_ratio, as follows, where N and M are the numbers of scattering centers in the test and template images, respectively. More ASCs in the template (synthetic) image can be expected (i.e., A_ratio > 1), considering the possible partial and random occlusion of the test image and the full set of scattering centers of the synthetic image (at least no occlusion). In addition, a maximum value should be set to control the imbalance between the numbers of ASCs extracted from the real and synthetic images. In our experiments, the best results were achieved when A_ratio was around 1.3, which was used in the following analysis.
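The amplitude-based selection step can be sketched as follows. The exact rounding rule (here, a ceiling) is an assumption; the paper only states that a small buffer with A_ratio ≈ 1.3 worked best.

```python
import math

def select_by_amplitude(template_centers, n_test, a_ratio=1.3):
    """Sketch of amplitude-based subsampling of template ASCs.

    Keeps the ceil(a_ratio * n_test) strongest template scattering centers
    (capped at the template size), where n_test is the number of test ASCs
    and each center is an (amplitude, x, y) tuple.
    """
    k = min(math.ceil(a_ratio * n_test), len(template_centers))
    # Sort by amplitude (first tuple element), strongest first, and truncate.
    return sorted(template_centers, key=lambda c: c[0], reverse=True)[:k]
```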


Neighbor Matching
As shown in Figure 1, scattering centers with strong amplitudes in synthetic images are often invisible in real SAR images. In addition, the scattering centers of part of the target may not appear in the real image because of occlusion of the target. Therefore, considering the above reasons (differences from synthetic images and target occlusion), we focused on local similarity rather than identification using all scattering centers. For local similarity, the WVV descriptor should be reconstructed using scattering centers in the target overlap area (i.e., adjacent scattering centers). In this study, after selecting only the adjacent points between two sets of scattering centers through the neighbor-matching algorithm, we evaluated the similarity. First, the positions of the test ASCs were taken as the centers of circles, whose union formed a binary region. When a template ASC fell inside the constructed binary region, it was selected; otherwise, it was discarded. The roles were then reversed, with the template ASCs taken as the circle centers. The radius set for neighbor matching was chosen as {0.3, 0.4, 0.5 m} based on the resolution of MSTAR images, as well as experimental observations [19,27]. Figure 4 presents an illustration of neighbor matching with the radius set to 0.5 m. Neighbor matching was first conducted on the test ASCs (Figure 4a) and then on the template ASCs (Figure 4b), with the corresponding matching results shown in each panel. Matched template ASCs, unmatched template ASCs, matched test ASCs, and unmatched test ASCs are represented by different markers. As shown in Figure 4c, the ratio of unmatched to matched points can be used to distinguish different targets: the number of selected template ASCs was clearly smaller when the target type was not similar. Therefore, the matching results can provide discriminability for correct target recognition.
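One pass of the circle-union test described above can be sketched as follows; `neighbor_match` and its tuple representation are illustrative choices, and the full algorithm runs the pass twice with the roles of the two sets swapped.

```python
def neighbor_match(centers_a, centers_b, radius):
    """Sketch of one pass of neighbor matching.

    Points in `centers_b` (here (x, y) tuples) are kept if they fall inside
    the union of circles of the given radius centered on the points of
    `centers_a`.  Returns the (matched, unmatched) subsets of centers_b.
    """
    matched, unmatched = [], []
    for bx, by in centers_b:
        # Inside the union of circles iff within `radius` of any center.
        inside = any((bx - ax) ** 2 + (by - ay) ** 2 <= radius ** 2
                     for ax, ay in centers_a)
        (matched if inside else unmatched).append((bx, by))
    return matched, unmatched
```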

WVV-Based Feature Reconstruction
To resolve the difficulties of similarity calculation mentioned above, the correspondence was established based on the existing feature descriptor, the WVV [26]. Test ASCs and matched template ASCs were used to construct each WVV-based feature set. The detailed procedure of the WVV-based feature reconstruction is given in Algorithm 1 [26]. First, the WVV establishes a polar coordinate system, taking the ith scattering center as the origin; the locations of the remaining SCs are represented by their polar radii and polar angles. Next, WVV_i is defined by sorting the polar radii according to their polar angles, and the radii are linearly interpolated at 1° intervals, mapping the WVV into a vector of length 360. Finally, to avoid sensitivity to rescaling, the elements in the interpolated WVV are normalized by the maximum element. After iterating over all scattering centers, the scattering center feature set becomes the WVV-based feature set. Figure 5 shows an example of WVV-based feature reconstruction using 2S1 ASCs from the SAMPLE dataset. Figure 5a,b show the 56-ASC set and the interpolated 23rd WVV, respectively. In Figure 5b, the blue point at the origin is the 23rd ASC; WVV_23 comprises the polar radii of the remaining points. Subsequently, WVV_23 was linearly interpolated at 1° intervals, and the elements were normalized by the maximum element. After all iterations, the WVV-based feature set contains 56 interpolated WVVs and their normalized amplitudes. When some ASCs are randomly removed, the WVV-based features, consisting of the remaining ASCs, can maintain their spatial structures. Therefore, the proposed method is not sensitive to random removal of points.
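The per-point WVV construction can be sketched as below. The wrap-around linear interpolation between the two angularly neighboring samples is an assumption about how the 1° resampling handles the 0°/360° boundary; the function name and point representation are illustrative.

```python
import math

def wvv(points, i):
    """Sketch of building the i-th world view vector (WVV).

    With point i as the origin, the remaining points are expressed in polar
    coordinates; the polar radii are ordered by polar angle, linearly
    interpolated at 1-degree steps onto a length-360 vector, and normalized
    by the maximum element.
    """
    ox, oy = points[i]
    polar = sorted(
        (math.degrees(math.atan2(y - oy, x - ox)) % 360.0,
         math.hypot(x - ox, y - oy))
        for j, (x, y) in enumerate(points) if j != i)
    vec = []
    for deg in range(360):
        # Find the neighboring samples (wrapping past 360) and interpolate.
        nxt = next((k for k, (a, _) in enumerate(polar) if a >= deg), 0)
        prv = nxt - 1                      # negative index wraps around
        a0, r0 = polar[prv]
        a1, r1 = polar[nxt]
        span = (a1 - a0) % 360.0 or 360.0
        t = ((deg - a0) % 360.0) / span
        vec.append(r0 + t * (r1 - r0))
    m = max(vec)
    return [v / m for v in vec]
```

Because each element is a radius relative to the maximum, the resulting vector is invariant to translation and rescaling of the point set, which is the property the text relies on.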


Similarity Calculation
In ATR, it is important to design a similarity measurement between the input test image and archive templates. The structural similarity between two sets of scattering centers is obtained through WVV reconstruction of the scattering centers, and several weights related to the number of scattering centers and the number of matches are proposed.

Matching Score
After the WVV-based feature reconstruction, target classification based on point matching was conducted. The WVV provides a robust description of scattering centers. The Euclidean distance between the locations of the scattering centers is generally used when evaluating the point-to-point similarity between two sets of SCs. Meanwhile, spatial structure relationships were characterized by WVV-based features.
Comparing G_test, the WVV-based feature set of the test image, with each template dataset B_k (k = 1, . . ., T) using S(G_test, B_k) in Equation (4) below, the most similar template was selected.
where g_l ∈ G_test and b_l ∈ B_k are the matching pairs, and N and M are the numbers of SCs in the test and the k-th template, respectively.
There was no strict one-to-one correspondence between the two sets, so we needed to find an optimal assignment maximizing S between the two point sets. Based on weighted bipartite graph matching (WBGM), the similarity of the matching pairs was calculated as follows, where D(g_i, b_j) is the Euclidean distance between the test ASC and the template ASC. If D is greater than R, the radius (in meters) used for neighbor matching, s(g_i, b_j) is set to 0 to prevent matching between points that are too far apart. Accordingly, the number of matched pairs, K_match, was applied to normalize the similarity measurement.
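The assignment step can be illustrated with a toy brute-force search over one-to-one pairings. This is a sketch under the assumption of very small sets with no more rows than columns; a practical WBGM implementation would use the Hungarian algorithm instead of enumerating permutations.

```python
from itertools import permutations

def best_matching_score(sim):
    """Toy substitute for weighted bipartite graph matching.

    `sim` is a similarity matrix (rows: test ASCs, columns: template ASCs)
    with len(sim) <= len(sim[0]).  The optimal one-to-one assignment is
    found by brute force over column permutations, and the maximal total
    similarity is returned.
    """
    rows, cols = len(sim), len(sim[0])
    k = min(rows, cols)        # assumes rows <= cols; every row gets a match
    best = 0.0
    for perm in permutations(range(cols), k):
        best = max(best, sum(sim[r][perm[r]] for r in range(k)))
    return best
```

Dividing the returned total by the number of matched pairs would give the K_match-normalized score described above.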

Weight Design
The previously obtained WVV-based matching score did not consider the difference in the number of scattering centers that occurs when the types of targets are different. Therefore, it was necessary to properly design the weights based on the number of matches and the number of points that were not matched, and reflect them in the overall degree of similarity.
Even if the template type differed from the test, when the template points were gathered around the test points, the WVV-based similarity computed after neighbor matching was high (Figure 6a). The WVV-based matching score is higher in Figure 6a than in Figure 6b, but in Figure 6a, the difference between the numbers of test and template scattering centers is substantial. Because the number of points in the template will be similar for targets of the same type, it is necessary to consider the difference in the number of points between the test and the template.
Unmatched test scattering centers (MA: missing alarm) and unmatched template scattering centers (FA: false alarm) were considered in the matching score. Here, we used the following quadratic weights presented in Ref. [24], where s and q represent the numbers of MAs and FAs, respectively, and s + q can be calculated as N + m − 2 × K_match, where m is the number of template ASCs matched by the neighbor-matching algorithm. Because of noise and extraction errors, it is common to observe a few MAs and FAs; as the proportion of MAs and FAs increases, the weight w_a decreases more quickly. In addition, when the test and template are of the same type, the number of scattering centers within the radius tends to be large during neighbor matching. Therefore, along with S_Norm and w_a, the ratio of the number of selected points to the total number of points was used as a weight. The overall matching score adopted in this study was defined as in Equation (11), where n/N is the ratio of the number of test ASCs adjacent to the template ASCs (n) to the number of scattering centers extracted from the test image (N), and m/M is the corresponding ratio for the template ASCs.
The procedure for the proposed WVV-based matching method is shown in Figure 2. The entire algorithm was programmed in MATLAB® R2021a (9.10), employing MATLAB's Parallel Computing Toolbox. First, because it is impossible to define a one-to-one correspondence between the test and template, subsampling based on amplitude and neighbor matching was conducted to reduce the imbalance. The similarity matrix of the WVV sets was obtained using Equation (7), s(g_i, b_j), and then all the similarities were employed as weights for the WBGM. By repeating the above process over the number of template databases, the type of the test target could be determined as the category of the template with the maximum matching score.
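The outer loop over the T templates reduces to an argmax over matching scores, as sketched below; `classify`, the callable `test_score_fn`, and the dictionary layout are hypothetical stand-ins for the paper's MATLAB pipeline.

```python
def classify(test_score_fn, templates):
    """Sketch of the outer classification loop.

    `templates` maps a class label to its template feature set, and
    `test_score_fn(template)` returns the overall matching score of the
    test image against that template.  The test image is assigned the
    label of the template with the maximum matching score, after
    T = len(templates) comparisons.
    """
    return max(templates, key=lambda label: test_score_fn(templates[label]))
```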

Experimental Settings
To evaluate the classification performance of the proposed method, experiments were conducted on SAMPLE datasets under standard operating conditions (SOC) and EOC, including random and partial occlusions. The SAMPLE dataset consists of real and synthetic SAR images generated using CAD models of 10-class MSTAR targets, which are listed in Table 1 [31]. Furthermore, the optical images of the 10-class targets are shown in Figure 7; they are ground vehicles, carriers, and trucks (more targets in SAR images and types of targets can be seen in Refs. [46][47][48][49]). The data had a spatial resolution of one foot. Unfortunately, only some parts of the SAMPLE dataset are publicly available, which is appropriate for small-scale operations. They were collected at azimuth angles of 10-80° with depression angles of 14-17°. To validate the proposed method for the 10 targets, we used target chips with depression angles of 16° and 17°. In Table 1, the number of SAMPLE images for each target class is listed according to the depression angle.

When using CLEAN, scattering centers with a minimum amplitude or higher were extracted. The distribution between the amplitude in the clutter around the target and that of the target may be considered when selecting the threshold value for the extraction of scattering centers. Because a difference in amplitude exists between real and synthetic images, scattering centers with an amplitude of 0.25 or higher for real images and 0.14 or higher for synthetic images were extracted, considering the difference in amplitude between the synthetic and real data of SAMPLE [33,50]. The statistical values of the number of extracted scattering centers are shown in Figure 8; evidently, the number of scattering centers extracted by CLEAN was different for each target. BMP2 and BTR70 had an average of 30-50 scattering centers, which was less than that in the other targets. Meanwhile, M548 and M60 had numerous scattering centers. Thus, this imbalance in the number of scattering centers made matching challenging, although valuable information was obtained for target identification.
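The domain-specific thresholding above can be sketched in a few lines of Python (the thresholds 0.25 and 0.14 come from the text; the function name and the (x, y, amplitude) tuple layout are illustrative assumptions).

```python
def extract_by_amplitude(ascs, domain):
    """Keep scattering centers at or above the domain-specific threshold.

    ascs   : iterable of (x, y, amplitude) tuples from CLEAN extraction
    domain : 'real' or 'synthetic' (SAMPLE amplitudes differ between them)
    """
    threshold = {'real': 0.25, 'synthetic': 0.14}[domain]
    return [p for p in ascs if p[2] >= threshold]
```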
For neighbor matching, the radius must be determined. Although the WVV descriptor is not affected by the translation of the scattering point, translation may affect the identification performance because only a part of the template remains based on the test point after neighbor matching. In Refs. [17,24], the researchers applied several different radii (e.g., 0.3, 0.4, and 0.5) for neighbor matching, and then the averages of all the similarities were employed to determine the final similarity between the test and its corresponding template.
In the experiment of this study, when the number of extracted scattering centers was small, the identification rate increased when a high radius value of 0.5 was used rather than a low value of 0.3. Conversely, when the number of scattering centers was large, using the high value of 0.5 produced no change in the identification rate. We may need to consider the possibility of a translation transformation between the target and template before applying the neighbor-matching algorithm. Therefore, we used a single value of 0.5 so that the effect of the centering error of the test image could be mitigated.

Figure 9 shows an example of the processing according to the proposed algorithm (Figure 2), in which the test image BTR70 was correctly recognized between the ambiguous template images (BTR70 and ZSU23). First, the scattering points of each template image were selected in order of amplitude so that their number did not exceed 48 × 1.3 (=62.4), where 48 is the number of test scattering points and the factor 1.3 is the ratio A_ratio mentioned in Section 2.2. Consequently, all 46 scattering points extracted from template BTR70 were selected, and only 62 of the 69 scattering points of template ZSU23 were selected. The point ratio selected by neighbor matching had a lower value for template ZSU23 than for BTR70. The test ASCs and matched template ASCs were used to reconstruct the WVV sets, and the similarity between the test and template WVV sets was measured. The WVV-based similarity (S_Norm) of template ZSU23 was higher than that of template BTR70. In contrast, the number of pairs matched by the WBGM for template ZSU23 was smaller than that for template BTR70; therefore, the w_a of ZSU23 was lower. Finally, after WBGM, the matching score was calculated using Equation (11), giving a value of 0.63 for the template image of BTR70 and 0.59 for ZSU23.
Although the WVV-based similarity alone did not show good recognition results, the performance was considerably improved using the proposed weights.
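The two-stage template subsampling described above (amplitude-based selection capped at A_ratio times the number of test points, then neighbor matching within a fixed radius) can be sketched as follows. This is a minimal Python illustration, not the MATLAB implementation; the function name and the (x, y, amplitude) tuple layout are assumptions, while a_ratio = 1.3 and radius = 0.5 follow the values reported in the text.

```python
import math

def subsample_template(test_ascs, tmpl_ascs, a_ratio=1.3, radius=0.5):
    """Select a subset of template ASCs for matching.

    1) Amplitude subsampling: keep the strongest template ASCs, at most
       a_ratio times the number of test ASCs.
    2) Neighbor matching: keep only template ASCs lying within `radius`
       of at least one test ASC.
    ASCs are (x, y, amplitude) tuples.
    """
    limit = int(len(test_ascs) * a_ratio)
    strongest = sorted(tmpl_ascs, key=lambda p: p[2], reverse=True)[:limit]
    matched = [t for t in strongest
               if any(math.hypot(t[0] - s[0], t[1] - s[1]) <= radius
                      for s in test_ascs)]
    return matched
```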

Standard Operating Condition
The proposed method was first evaluated under SOC on 10 classes of targets for overall classification accuracy. An actual image was used as the test target chip and a simulated image was used as the template. In the identification performance experiment using a synthetic image as a template, a simulated image can be used under the same observation conditions (depression angle) as the real image. However, owing to the characteristics of the SAR image, the distribution of scattering centers may vary depending on the imaging parameters, such as the depression angle or subtle changes in the pose of the target. In addition, because there is a difference between synthetic and real images, it is more advantageous for target identification to use not only the same observation angle data, but also adjacent observation angle data, as the template image. The merged training data were generated by combining the synthetic data with depression angles of 16° and 17° for 1052 chips. Tables 2 and 3 show the confusion matrix of the proposed method using a real 17° and 16° as the test image and a synthetic 16°/17° as the template. The performance was expressed by the percentage of correct classifications (PCCs). The average PCCs of all 10 targets in Table 2 were 90.7%, whereas the M2 target was recognized with PCCs under 80%, and the remaining targets were over 86%. For the result of real 16° data (Table 3), not only M2 but also BMP2 and M548 showed PCCs under 80%. Meanwhile, T72 and ZSU23 showed higher PCCs than the real 17° data. For real 16° and 17° data identifications, when only synthetic 16° and 17° data corresponding to each other were used as template data, the recognition rates were 87.0% and 88.0%, respectively. When synthetic 16°/17° data were used to identify real 16° and 17° data, the identification rates were improved to 87.3% and 90.7%, respectively. 
With the merged data from synthetic 16° and 17° angles, the real 17° data improved the recognition performance by 2.7%, while the real 16° data improved by only 0.3%.

We attempted to prove the effectiveness of our proposed method using a subset of ASCs. Three types of ASC sets were compared: ASC_all means that all the test and template ASCs are used in the WVV-based feature reconstruction, as in Ref. [26]; ASC_subset(neighbor) indicates that the subset of ASCs is selected by neighbor matching; and ASC_subset(amp,neighbor) indicates that the subset of ASCs is selected by amplitude-based subsampling followed by neighbor matching. Figure 10 shows the recognition rates of the 10 targets according to the ASC types used for classification. As described above, the merged data from the synthetic 16° and 17° angles were used as a single template. The figure shows the result of averaging the respective recognition performances obtained by using the real 16° and 17° data as the test images. The overall performance of ASC_all, using all extracted ASCs as suggested by Ref. [26], was lower than the results achieved by our proposed methods. When using ASC_all, the recognition rate of BMP2 was very low, at about 70%. ASC_subset(amp,neighbor) showed a high recognition rate of about 85% or more for all targets except M2, and its performance variance over the 10 targets was the smallest among the three types of ASC. The results of ASC_subset(amp,neighbor) were appropriate for the 10-class target classification; in particular, the recognition rates of BMP2, BTR70, and T72 were greatly improved, by more than 5% compared to ASC_all.

Occlusion and Random Missed Pixels
In the real world, the occlusion of a target by the external environment, such as artificial or natural objects, can always occur. The performance of the proposed method was investigated using directional occlusions. Similarly, the test samples in Table 1 were simulated to obtain samples with different levels of directional occlusion for classification. Figure 11 shows the recognition rate for each method with varying levels of directional occlusions. When the target was partially occluded, only part of the local structure was discriminative for the target. The WVV-based reconstruction with ASCs selected by neighbor matching made it possible to consider the local similarity between the test and the template.
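As a rough illustration of how directional occlusion can be simulated on a set of scattering centers, the sketch below removes the fraction of points lying furthest along a chosen direction. The function name, tuple layout, and this particular occlusion model are assumptions for illustration; the text does not specify the exact simulation procedure.

```python
import math

def occlude_directional(ascs, direction_deg, level):
    """Simulate directional occlusion on scattering centers (sketch).

    Removes the fraction `level` (0..1) of points lying furthest along
    the direction `direction_deg`. ASCs are (x, y, amplitude) tuples.
    """
    d = math.radians(direction_deg)
    ux, uy = math.cos(d), math.sin(d)
    # Rank points by their projection onto the occlusion direction.
    ranked = sorted(ascs, key=lambda p: p[0] * ux + p[1] * uy)
    keep = len(ascs) - int(round(len(ascs) * level))
    return ranked[:keep]
```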


When the test was occluded randomly from different directions, the performance of the proposed method, ASC_subset(amp,neighbor), remained at a high level of over 70% until the occlusion level reached 20%. Compared to the overall recognition rate of ASC_all, it was improved by 1-4%. When the WVV was reconstructed with the subset selected by neighbor matching rather than the entire set of scattering centers, better results were obtained with less than 20% occlusion. In addition, the method applying ASC_subset(amp,neighbor) remained approximately 3% higher than the neighbor subset, although there was no considerable difference in value. However, when more than 25% occlusion occurred, it was better to use ASC_all.
When the occlusion reached 20% or more, the average recognition rate dropped to 70% or less; that is, the overall reliability of the performance was too low for practical use. Therefore, assuming a low occlusion rate, it is better to use a subset of the ASCs rather than all of them to obtain a high recognition rate.
The ASCs could be interrupted by noise and differences in image resolution. A sensitivity analysis for randomly missed points was also performed. We randomly removed them according to the percentage of missed points. The remaining SCs were used to reconstruct WVV-based features that were matched with the templates. The percentage of missed points varied from 0 to 50%, and 10 Monte Carlo simulations were conducted for each percentage.
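The Monte Carlo protocol above can be sketched as follows. This is an illustrative Python harness, not the experimental code; the function names are assumptions, and `classify` stands in for the full WVV matching pipeline, returning True when the target is recognized correctly.

```python
import random

def random_removal_trials(ascs, pct, classify, trials=10, seed=0):
    """Monte Carlo sensitivity test for randomly missed scattering centers.

    Removes `pct` percent of the test ASCs at random, classifies the
    remainder, and returns the recognition rate over `trials` runs,
    mirroring the 10-trial protocol described in the text.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    correct = 0
    for _ in range(trials):
        keep = len(ascs) - int(len(ascs) * pct / 100)
        subset = rng.sample(ascs, keep)
        correct += bool(classify(subset))
    return correct / trials
```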
As shown in Figure 12, ASC_subset(neighbor) always showed a higher recognition rate than ASC_all, verifying its robustness to random occlusion. The high performance of over 80% for ASC_subset(neighbor) was maintained until a removal level of 45%. On the other hand, ASC_subset(amp,neighbor) showed better results than ASC_all only up to 15% random occlusion. This is because the ASCs in the test image were removed randomly, whereas the ASCs in the template image were removed based on their amplitude, so the relevance between the remaining pixels decreased significantly as the percentage of random occlusion increased.
We achieved high performances using the SAMPLE dataset, with 100% synthetic data in the training set. When the experiments were conducted by changing the types of ASCs used, the overall recognition rate under SOC (no occlusion and no random missed pixels) was improved by about 4% compared to ASC_all. Additionally, the proposed method was less sensitive to a small amount of partial occlusion and random pixel removal.
We compared the proposed method with existing methods [39] to illustrate its performance. The experimental results, where the average recognition rate was 24.97% when solely synthetic data were used in the training batch, were initially presented by Ref. [31], the creators of the SAMPLE dataset. Their algorithm was based on a convolutional neural network (CNN). They then achieved average accuracies of 51.58% by training DenseNet with the assistance of a generative adversarial network (GAN) in their most recent work [25]. Their deep learning-based performances seem to be superior to our proposed method in Refs. [51,52]. However, the accuracy dropped notably below 85% when they used 100% synthetic data in training. Consequently, our proposed method had a higher performance, which was also more stable when using 100% synthetic data as the training dataset.
In terms of validity under EOCs, our performance was also compared to previous works that used the SAMPLE dataset for experiments, although their data settings (number of classes, depression angle, etc.) were not the same as ours. For example, the target recognition rate decreased by 30% under 50% random occlusion in Ref. [17], where a measured dataset (MSTAR) was used for both testing and training, but only by 10.6% in our study using ASC_subset(neighbor). This was the best performance under random occlusion among the previous works. Compared to the results of the other methods, our similarity measure can effectively overcome the recognition difficulties caused by random occlusion when ASC_subset(neighbor) is used, though the same conditions as our dataset were not applied in those methods. We believe that the advantage of the proposed algorithm is that it mostly focuses on the local features related to the intersecting target parts, improving the feasibility of the SAR ATR system in a realistic scenario. However, there are several limitations resulting from human intervention: the empirical threshold for amplitude-based subsampling and the need to set a specific radius in neighbor matching, which can cause significant performance degradation when most parts of the target are occluded. To resolve these limitations, we will consider how to integrate global similarities with the proposed method in the case of target occlusion.

Conclusions
A robust algorithm is required to identify partial differences between real and synthetic images when performing target identification based on a dataset of synthetic images, such as SAMPLE. We proposed an improved WVV-based ATR method using a subset of the template's ASCs, selected using the ASCs' amplitudes and the proximity of the scattering center locations, which is less susceptible to the partial differences between two ASC sets. The SAMPLE dataset, with 10 classes of military targets, was used in the experiments. With the merged template of synthetic 16°/17° images, the recognition rates were 87.3% for the 16° real images and 90.7% for the 17° real images. The performance with occlusion remained above 70% until the occlusion level reached 20%. In addition, the subset of ASCs selected by neighbor matching achieved recognition rates greater than 80% until the proportion of randomly missed points reached 45%. Therefore, we expect that the proposed method will be useful in practical SAR ATR systems using synthetic images with partial and random differences in ASCs due to occlusion.

Data Availability Statement: The SAMPLE dataset was obtained in accordance with the instructions contained in Ref. [31] and is available online: https://github.com/benjaminlewis-afrl/SAMPLE_dataset_public (accessed on 25 August 2022).