Double Weight-Based SAR and Infrared Sensor Fusion for Automatic Ground Target Recognition with Deep Learning

This paper presents a novel double weight-based synthetic aperture radar (SAR) and infrared (IR) sensor fusion method (DW-SIF) for automatic ground target recognition (ATR). IR-based ATR can provide accurate recognition because of its high image resolution but it is affected by the weather conditions. On the other hand, SAR-based ATR shows a low recognition rate due to the noisy low resolution but can provide consistent performance regardless of the weather conditions. The fusion of an active sensor (SAR) and a passive sensor (IR) can lead to upgraded performance. This paper proposes a doubly weighted neural network fusion scheme at the decision level. The first weight (α) can measure the offline sensor confidence per target category based on the classification rate for an evaluation set. The second weight (β) can measure the online sensor reliability based on the score distribution for a test target image. The LeNet architecture-based deep convolution network (14 layers) is used as an individual classifier. Doubly weighted sensor scores are fused by two types of fusion schemes, such as the sum-based linear fusion scheme (αβ-sum) and neural network-based nonlinear fusion scheme (αβ-NN). The experimental results confirmed the proposed linear fusion method (αβ-sum) to have the best performance among the linear fusion schemes available (SAR-CNN, IR-CNN, α-sum, β-sum, αβ-sum, and Bayesian fusion). In addition, the proposed nonlinear fusion method (αβ-NN) showed superior target recognition performance to linear fusion on the OKTAL-SE-based synthetic database.


Introduction
Automatic target surveillance and recognition is important for protecting borders and countries. Among the several sensors available, infrared (IR) cameras, particularly in the mid-wave infrared band (3-5 µm), are used frequently in military applications because of their day and night operation capability [1,2]. The scope of this paper is limited to ground target recognition, assuming that the target regions or locations have already been detected by IR only [3][4][5], synthetic aperture radar (SAR) only [6][7][8], or fused sensors [9][10][11]. Many military applications prefer IR-based target recognition because IR sensors are passive and offer high image resolution. On the other hand, IR images are sensitive to weather and imaging conditions, and many IR-based automatic target recognition (ATR) methods have attempted to overcome these limitations. Since the 1980s, model-based approaches have been popular, in which targets are recognized by alignment methods such as geometric hashing [12]. Moreover, various learning-based target recognition methods have been proposed that consider both feature extractors and classifiers to cope with IR variations. The Markov tree feature [13], IR wavelet feature [14], scale invariant feature transform (SIFT) [15], histogram of oriented gradients (HOG) [16], and moment features [17,18] are infrared features that have shown promising recognition results in their respective applications. Simple machine learning-based classifiers, such as the nearest neighbor classifier [15], Bayesian classifier, conventional neural network, Adaboost [19], and support vector machine (SVM) [16], are used frequently to discriminate the target features.
SAR can measure the electromagnetic scattering property of targets under any weather and light conditions [20]. SAR is used frequently to recognize a range of targets because it provides a strong radar cross section (RCS) and shape information for non-stealth targets. On the other hand, it produces many false recognitions due to speckle noise [21]. Various SAR features, such as polarimetry and transformations (log-polar, Fourier, and wavelet), are used to discriminate SAR targets [20,22,23,24]. The standard deviation, fractal dimension, and weighted-rank fill ratio are the basic SAR features proposed at the Lincoln Laboratory [22]. A genetic algorithm-based SAR feature selection method has been used to select the optimal features for target recognition [25]. Targets are then recognized by classifying these features. Template matching is popular in military applications because of its simplicity [26]. A joint sparse representation-based method was used to tackle the 3D problem in template matching using multiple views [27]. A model-based approach using a scattering center model was proposed to reduce the database size [28].
The fusion of SAR and IR imagery enables a combination of complementary information, such as the thermal signatures in the IR case and RCS signatures in the SAR case. Both sensors have day and night capabilities, whereas only the SAR sensor is weather-independent. Therefore, it is reasonable to use both SAR and IR sensors to recognize targets stably by sensor fusion [29]. There are almost no published works on SAR-IR fusion-based ATR except [30,31], owing to security reasons. The method in [30,31] fuses hypotheses derived from both SAR and IR features. IR target scores provide hints of possible target models. If the corresponding SAR target features are correlated with the model features, the hypothesized target model is boosted. In SAR-IR fusion-based ATR research, there are three issues: preparation of the database (DB), fusion level, and fusion architecture. Based on these issues, the contributions of this paper can be summarized as follows. The first contribution is the preparation of a new SAR-IR database of 16 ground targets using the OKTAL-SE simulator [32][33][34]. OKTAL-SE is the only simulator that can synthesize SAR and IR images for the same background and targets. The second contribution is a novel SAR-IR fusion architecture based on the double weight-based linear sum scheme at the decision level. The third contribution is the adoption of a state-of-the-art classifier, a deep convolutional neural network consisting of 14 layers.
The remainder of this paper is organized as follows. Section 2 introduces the background of SAR-IR fusion levels and fusion methods. Section 3 explains the overall structure of the proposed method, including the SAR-IR database construction method, the deep convolutional neural network-based base classifier, and the double weight-based SAR and IR sensor fusion for ground target recognition. Section 4 explains the composition of the DB and classifiers, and Section 5 evaluates the target recognition performance of the proposed method by comparing it with Bayesian fusion. The paper is concluded in Section 6.

Background of SAR-IR Fusion Level and Fusion Method
Ground targets can be recognized by SAR and IR sensor fusion. The recognition of targets from multisensor data can be described by the level at which the data are combined [35]. According to the sensor fusion strategy, there are three types of fusion schemes in SAR and IR fusion-based recognition, as shown in Figure 1 [36,37].

[Figure 1: pixel-level fusion, feature-level fusion, and decision-level fusion schemes]

The pixel-level fusion scheme is used frequently for homogeneous sensors governed by the same underlying physical mechanisms, such as CCD and IR, or in visualization for human understanding [38]. In pixel-level fusion, the data are combined first, and the features are then extracted. Li et al. conducted the first trial of SAR-IR fusion at the pixel level in 1996 [39]. The HNC (the name of the software company) algorithm was proposed to fuse SAR and IR images based on the stereo human visual system to enhance the image contrast [40]. In 1997, Novak et al. at the Lincoln Laboratory proposed super-resolution by the pixel-level fusion of SAR and IR images, which improved the target classification performance [41]. In satellite-based applications, wavelet transformation-based fusion improved earth surface classification [42,43]. Recently, a compressed sensing-based SAR-IR fusion method using a discrete cosine transformation (DCT) and sparse coefficient fusion was presented, which showed better image contrast [38].
In contrast, in feature-level fusion, feature extraction is performed individually on the data of each sensor, and the resulting feature vectors are then combined. The classifiers use the concatenated feature vector of SAR and IR [44]. The first SAR-IR fusion at the feature level was presented in 1996 using the DCT feature and a distance measure [35]. In 2005, Lei et al. fused the shape information of the electro-optical (EO) sensor and the fractal dimension of the SAR sensor and classified targets by fuzzy C-means [45]. Lehureau et al. fused Gabor features for EO and log-cumulants for SAR at the feature level [44]. An SVM classified the fused vector into target classes of buildings, roads, and forests [46]. Recently, a multiple kernel learning (MKL)-based recognition method was proposed that selects the features automatically for the earth surface classification of Landsat images [47].
Individual classifier decisions can be fused at the decision level. The initial work on decision-level SAR-IR fusion was conducted in 1999 using sequential fusion [30]. The multi-frame IR sensor provides a hypothetical target model and the SAR sensor recognizes the final targets [31]. Radar and IR fusion at the decision level improves target recognition and tracking accuracy [48]. In 2002, evidence reasoning (Dempster-Shafer theory, DST) was extended to target recognition by fusing SAR and EO images at the decision level [49]. DST was used to fuse multiple sensors for target detection by generalizing Bayesian fusion. The Bayesian sensor fusion method was also used for target recognition [50]. In 2007, Waske et al. proposed a fused SVM classifier that combines the SVM result of each sensor and builds a new SVM classifier [51]. Recently, majority voting (MV)-based approaches were proposed and showed promising results in SAR-EO-based target recognition. Each target feature is assigned to a classifier, which votes for a recognized target identity (ID). The target with the maximal number of votes is finally recognized [52][53][54]. A logical AND/OR operation after Bayesian classification was proposed to fuse SAR-EO images for earth surface classification [55].
The first issue is how to select the optimal fusion level for SAR-IR-based target recognition. Understanding the SAR and IR image acquisition scenario is very important for enhancing the recognition performance. Figure 2a shows a scenario of target detection and recognition in an inaccessible area using several sensor types, such as IR and SAR. A SAR sensor mounted on an airplane can image an inaccessible area at a small depression angle, and an IR sensor mounted on a satellite or unmanned aerial vehicle (UAV) can image the same region in a top-down view. The SAR sensor must move to image the region, which requires some processing time. On the other hand, the IR sensor can record the targets in real-time video. In addition, it is assumed that the targets (e.g., T72, AMX10) to be recognized can be stationary or moving. Therefore, the imaged SAR and IR targets are not aligned in the spatial and time domains. As shown in Figure 2b, pixel-level fusion is suitable for visualizing homogeneous sensors but requires strict subpixel and time alignment. Although feature-level fusion can provide a powerful feature vector that can improve the target classification rate, it also requires pixel and time alignment. On the other hand, decision-level fusion is suitable for heterogeneous sensor fusion, such as SAR-IR, and does not require pixel/time alignment because each sensor is processed independently. Therefore, this study uses decision-level SAR and IR fusion for ground target recognition. To focus on fusion-based target recognition, it is assumed that the detected SAR and IR target regions are registered correctly using a manual selection or automatic registration method [9].
The second issue is what type of fusion framework should be adopted at the decision level. This paper proposes two kinds of fusion schemes: a linear sum and a nonlinear neural network. Linear sum-based classifier fusion is used frequently because of its simplicity and good performance. There are two strategies in linear sum-based classifier fusion: hard classification-based fusion and soft classification-based fusion [54]. MV belongs to hard classification, where each classifier provides 1 for the recognized label and 0 for the others. The final classification decision is made by selecting the class label that receives the maximal number of votes. The weighted vote (WV) is a modified version of MV in which the vote of 1 is replaced by a weighted vote based on the classifier reliability [56]. Bayesian fusion, Dempster-Shafer theory-based fusion, and Adaboost belong to soft classification-based fusion, where the weighted outputs of the classifiers are summed to make the final decision [57]. Linear sum fusion based on probability can provide a class confidence between 0 and 1, which is more informative than a vote of 0 or 1. Therefore, this paper adopts the linear sum fusion approach using the classifier reliability as the baseline fusion scheme. In addition, a nonlinear fusion scheme based on a multi-layer neural network is adopted because it can adjust the weights adaptively according to the current statistics of the SAR and IR data.
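The difference between hard and soft classifier fusion described above can be illustrated with a minimal Python sketch. The function names (`majority_vote`, `weighted_vote`, `soft_sum`) are hypothetical and chosen only for illustration; they are not taken from the cited works, and the sketch makes no claim about how those works implement voting.

```python
from collections import Counter


def majority_vote(labels):
    """Hard fusion: each classifier casts one vote (1) for its top label."""
    counts = Counter(labels)
    # most_common(1) returns the label with the highest vote count.
    return counts.most_common(1)[0][0]


def weighted_vote(labels, reliabilities):
    """Weighted vote: the vote of 1 is replaced by the classifier reliability."""
    totals = {}
    for label, weight in zip(labels, reliabilities):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


def soft_sum(prob_vectors, weights):
    """Soft fusion: weighted sum of class-probability vectors, then argmax."""
    num_classes = len(prob_vectors[0])
    fused = [
        sum(w * p[i] for p, w in zip(prob_vectors, weights))
        for i in range(num_classes)
    ]
    label = max(range(num_classes), key=fused.__getitem__)
    return label, fused
```

With two classifiers that disagree, a hard vote is a tie, whereas the soft sum still separates the classes through the probability values, which is the motivation given above for preferring probability-based fusion.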

Proposed Double Weight-Based SAR-IR Fusion for ATR
In linear/nonlinear sensor fusion, estimating the sensor reliability is important for successful target recognition in non-cooperative environments. This paper proposes a novel Double Weight-based SAR-IR Fusion (DW-SIF) framework, as shown in Figure 3. The first weight vectors for the SAR sensor (α_SAR) and IR sensor (α_IR) are obtained by applying the evaluation DB to the trained classifiers (SAR-CNN and IR-CNN). The first weight vector measures, offline, the reliability of the classifier for each target label based on the classification accuracy. In the test mode, each trained classifier produces a target probability vector (P_SAR, P_IR) for each test image. The second weight (a scalar, β_SAR or β_IR) measures the confidence of the individual sensor based on the entropy calculated from the target probability distribution. The multi-sensor information is fused using the proposed linear sensor fusion I (double weighted linear sum of the SAR and IR sensors) or nonlinear sensor fusion II (double weighted multi-layer neural network). A target is finally recognized by applying the maximum operation to the fused vector. The following subsections provide details of the database preparation method, the deep learning-based individual classification method, and the sensor fusion method.

SAR-IR Database Construction
The proposed DW-SIF system requires three types of SAR-IR databases for training, evaluation, and testing. In SAR and IR fusion-based target recognition, the most difficult part is preparing the SAR and IR DB for the same target and background environments to validate a range of recognition algorithms. According to our survey, there are no public SAR/IR databases, owing to security reasons. Four types of DB preparation methods can be considered. The first strategy is to use a real IR camera and SAR sensor mounted on an airplane. This is the most accurate and useful method, but it is also the most expensive because of the sensors and acquisition platform. The second strategy is to use satellites, such as TerraSAR-X for SAR and KOMPSAT-3A for IR. However, identifying various military target images for both SAR and IR sensors is also very difficult and expensive. The third strategy is to develop a SAR and IR simulator for DB generation, which is outside the current research scope and would require considerable development time. The final strategy is to purchase commercially available software that can synthesize both SAR and IR images for the same scenario. Several simulators that work on a specific spectrum are available: DIRSIG and SensorVision for IR and Xpatch for radar. OKTAL-SE, which is a proven synthesizing tool, is the only simulator that can generate both SAR and IR images [32,34].
As shown in Figure 4, the user parameters and atmospheric file are fed into the SE-SCENARIO program, which can manipulate the SAR/IR sensor platform and locate the targets in a specific background. The SAR and IR images are generated simultaneously for the same scenario using the SE-RAY-IR and SE-RAY-SAR software. The synthesized raw data can be modified further by adding sensor noise in the SE-SIGNAL-VIEWER module. In particular, SE-AGETIM/SE-FFT deals with the 3D representation and production of terrain and target models. SE-PHYSICAL-MODELER defines the material properties for EM and IR rendering. The surface temperature and RCS can be modified depending on the target types and spectral bands. SE-ATMOSPHERE generates an atmospheric transmission graph from parameters such as weather conditions, time of day, wavelength, altitude, star irradiance, sky radiance, ground radiance, and season. SE-THERMAL generates a thermal database from a material database and atmospheric files. SE-SCENARIO is the main simulator that can modify the background, targets, and scenarios. SAR and IR sensors can be mounted on an airplane and moved along specific trajectories. Synthetic SAR and IR images are generated through SE-RAY-SAR (SE-RAY-NBSAR) and SE-RAY-IR. Figure 5 provides partial examples of SAR and IR image generation for the 16 targets. In SAR image generation, the center frequency is set to 34.25 GHz with a bandwidth of 500 MHz in horizontal polarization. In IR image generation, the mid-wave band (3-5 µm) is used for the same scenario file. The generated SAR image shows very strong speckle noise and the IR image shows a bright intensity around the engine location.

14 Layered-Deep Convolutional Neural Network Classifier
Many hand-crafted features (e.g., SIFT, HOG, ACF) and classifiers (e.g., SVM, Adaboost, Bayesian, random forest) are available, as discussed in the introduction. Recently, deep learning-based algorithms, which learn features and classifiers simultaneously, have been proposed and have shown superior performance in RGB-based object classification on ImageNet [58] and the CIFAR image database [59]. CNN-based approaches show better performance on visual object recognition than the stacked autoencoder [60,61]. The stacked autoencoder (SAE) is composed of several individual autoencoders, which are optimized to reduce the dimension of 1D data (speech). A convolutional neural network (CNN) is composed of spatial convolution layers and fully connected layers. The main advantage of the convolution layer is that it establishes local connectivity through the correlation of neighboring pixels, which is well suited to 2D data (images). According to experimental comparisons, the CNN outperforms the SAE [61]. Therefore, this paper adopts a deep learning approach as the base classifier for individual SAR and IR target recognition. Among the several deep learning architectures, the LeNet-based deep convolutional neural network architecture [62] is used, with the input size and number of layers changed for the recognition of 16 targets, as shown in Figure 6. The deep learning architecture used in [63] consists of fully convolutional layers without fully connected layers. The MatConvNet toolbox [64] is used for training because this study focuses on SAR-IR fusion for target recognition.
The architectures of SAR-CNN and IR-CNN are the same except for the input image size and related data size, as shown in Figure 6. Domain-specific SAR-CNN methods can be adopted depending on the outliers [65] and terrain classification [66]. The architecture consists of 14 layers: 1 input layer, 4 convolutional layers, 3 pooling layers, 4 Rectified Linear Unit (ReLU) layers, 1 fully connected (FC) layer, and 1 output layer. The SAR-CNN architecture can be explained in terms of data flow and operational flow. The input layer receives a SAR image, 64 × 64 in size. The first convolution operation, using 32 kernels with a 5 × 5 support region, stride 1, and padding 2, produces 32 feature maps with a 64 × 64 resolution, as shown in Figure 6a.
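The statement that a 5 × 5 kernel with stride 1 and padding 2 preserves the 64 × 64 resolution follows from the standard output-size relation for a convolution layer. The short sketch below checks that arithmetic; the helper name `conv_output_size` is hypothetical, and only the first convolution layer is covered because the remaining layer parameters are not given in the text.

```python
def conv_output_size(input_size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension.

    Uses the standard relation floor((W + 2P - K) / S) + 1.
    """
    return (input_size + 2 * padding - kernel) // stride + 1


# First convolution layer as described in the text:
# 5 x 5 kernel, stride 1, padding 2.
sar_out = conv_output_size(64, kernel=5, stride=1, padding=2)  # 64 x 64 SAR input
ir_out = conv_output_size(96, kernel=5, stride=1, padding=2)   # 96 x 96 IR input
print(sar_out, ir_out)
```

The same relation applies to the 96 × 96 IR input, assuming the IR-CNN uses the same kernel settings as stated for the shared architecture.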

Proposed Double Weight-Based SAR-IR Fusion Method
As explained in Figure 3, the first sensor weights represent the sensor reliability for each target label, evaluated offline. Given a trained SAR-CNN, the base classifier provides the accuracy (α_SAR^i) of the i-th target label, as defined in Equation (1), where m_SAR^i denotes the total number of samples of the i-th target in the evaluation set and n_SAR^i denotes the total number of correct recognitions.
If the number of target labels is N, the first weight vector (α_SAR) for the SAR sensor is defined as in Equation (2), which represents the reliability of the SAR sensor for each target label. The classifier can be regarded as reliable for the i-th label if α_SAR^i ≈ 1, that is, if the recognition accuracy is high.
Similarly, given a trained IR-CNN, the base classifier provides the accuracy (α_IR^i) of the i-th target label, as defined in Equation (3), where m_IR^i denotes the total number of samples of the i-th target in the evaluation set and n_IR^i denotes the total number of correct recognitions.
The first weight vector (α_IR) for the IR sensor is defined in Equation (4), which represents the reliability of the IR sensor for each target label. The classifier can be regarded as reliable for the i-th label if α_IR^i ≈ 1, that is, if the recognition accuracy is high.
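The offline weights described above are per-label accuracies, so they can be sketched directly from the definitions of the correct-recognition count and sample count. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, labels are assumed to be integers from 0 to N-1, and `normalize_pair` is an optional step that is not stated in the text (it is included only because a later worked example quotes SAR and IR weights that sum to one).

```python
def offline_weights(true_labels, predicted_labels, num_classes):
    """Per-label accuracy alpha^i = n^i / m^i on an evaluation set.

    m^i: number of evaluation samples of label i.
    n^i: number of those samples that were recognized correctly.
    """
    m = [0] * num_classes
    n = [0] * num_classes
    for truth, predicted in zip(true_labels, predicted_labels):
        m[truth] += 1
        if truth == predicted:
            n[truth] += 1
    # A label with no evaluation samples gets weight 0.0 (an assumption).
    return [n[i] / m[i] if m[i] else 0.0 for i in range(num_classes)]


def normalize_pair(alpha_sar, alpha_ir):
    """Assumed optional step: scale each label's weights so SAR + IR = 1."""
    sar, ir = [], []
    for a_s, a_i in zip(alpha_sar, alpha_ir):
        total = a_s + a_i
        if total == 0:
            sar.append(0.5)
            ir.append(0.5)
        else:
            sar.append(a_s / total)
            ir.append(a_i / total)
    return sar, ir
```

Applying `offline_weights` once with the SAR-CNN predictions and once with the IR-CNN predictions on the evaluation DB would give the two weight vectors, one entry per target label.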
Figure 7 shows the estimated alpha vectors for the SAR and IR sensors, obtained by applying the trained SAR-CNN and IR-CNN to the evaluation DB, which is explained in the experimental section. According to the offline evaluation, the IR sensor is more reliable (higher α) than the SAR sensor except for the jeep, sa9ichcamo, and tmm targets, for which the IR sensor shows poorer recognition performance than the SAR sensor because of their similar visual shapes. After estimating the offline sensor reliability, the next step is to measure the online sensor confidence (the second weight) for a test image. Although the probability value provides the level of target confidence, additional information on sensor confidence is useful in SAR-IR fusion-based target recognition because the recognition capability changes dynamically according to the target input. This paper proposes an online confidence estimation based on the entropy of each sensor. If the probability distribution is uniform or ambiguous, the entropy is high, as shown in Figure 8 (left). Conversely, the entropy is low if the probability distribution shows a peak at a specific target label. Based on this property, the novel confidence measures (β_SAR, β_IR) for the SAR and IR sensors are defined in Equations (5) and (6), respectively,
where H_SAR and H_IR represent the SAR and IR sensor entropies defined in Equations (7) and (8), respectively. The SAR and IR recognition probabilities (P_SAR, P_IR) for a test pair are obtained by normalizing the scores produced by SAR-CNN and IR-CNN. The final recognized target identity (ID) is obtained using either the proposed linear fusion method defined in Equation (9) or the multi-layer neural network-based nonlinear fusion method. In the linear fusion scheme, the offline sensor reliability vectors (α_SAR, α_IR) and online sensor confidence scalars (β_SAR, β_IR) are multiplied by the base classifier probabilities (P_SAR, P_IR) and summed. The target recognition is completed by applying the max operation to the fused vector, as shown in Figure 8 (right).
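The online weights and the linear fusion rule can be sketched as follows. The equations themselves are not reproduced in this text, so the forms used here are assumptions: the entropy is the Shannon entropy with the natural logarithm, and each sensor's β is the complementary normalized entropy, β_SAR = H_IR / (H_SAR + H_IR) and β_IR = H_SAR / (H_SAR + H_IR). This form was chosen because it gives the sensor with the lower entropy the higher weight and because it reproduces the weights of about 0.49 and 0.51 quoted for entropies of 2.7685 and 2.6877 in the worked example of this section. All function names are hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy (natural log, assumed) of a probability vector."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def online_weights(h_sar, h_ir):
    """Assumed form: the sensor with lower entropy receives the higher weight."""
    total = h_sar + h_ir
    if total == 0:
        return 0.5, 0.5  # both distributions are one-hot; treat as equal
    beta_sar = h_ir / total
    beta_ir = h_sar / total
    return beta_sar, beta_ir


def alpha_beta_sum(alpha_sar, alpha_ir, p_sar, p_ir):
    """Double weighted linear sum followed by the max operation."""
    beta_sar, beta_ir = online_weights(entropy(p_sar), entropy(p_ir))
    fused = [
        a_s * beta_sar * ps + a_i * beta_ir * pi
        for a_s, ps, a_i, pi in zip(alpha_sar, p_sar, alpha_ir, p_ir)
    ]
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused
```

Setting every α entry to 1 reduces this sketch to a β-only sum, and fixing both β values to 0.5 reduces it to an α-only sum, which mirrors the α-sum and β-sum baselines named in the abstract.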
In the multi-layer neural network fusion scheme, the input data are prepared by concatenating the double weighted SAR vector (α_SAR β_SAR P_SAR) and IR vector (α_IR β_IR P_IR), as shown in Equation (10). The multi-layer neural network consists of an input layer (32 nodes, i.e., two 16-dimensional weighted vectors), a hidden layer (40 nodes), and an output layer (16 nodes) because the number of targets to recognize is 16 (see Figure 3). The same double weights are used in this scheme.
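The data flow of the neural network fusion scheme (32 inputs, 40 hidden nodes, 16 outputs) can be sketched in plain Python. Only the layer sizes and the concatenated input come from the text; the tanh hidden activation, softmax output, random initialization, and all function names are assumptions made for illustration, and training is omitted.

```python
import math
import random


def fusion_input(alpha_sar, beta_sar, p_sar, alpha_ir, beta_ir, p_ir):
    """Concatenate the double weighted SAR and IR vectors (16 + 16 = 32)."""
    sar = [a * beta_sar * p for a, p in zip(alpha_sar, p_sar)]
    ir = [a * beta_ir * p for a, p in zip(alpha_ir, p_ir)]
    return sar + ir


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (assumed initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output node."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(z):
    """Numerically stable softmax."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, hidden_layer, output_layer):
    """Forward pass: 32 -> 40 (tanh, assumed) -> 16 (softmax, assumed)."""
    hidden = [math.tanh(v) for v in dense(x, *hidden_layer)]
    return softmax(dense(hidden, *output_layer))


rng = random.Random(0)
hidden_layer = init_layer(40, 32, rng)
output_layer = init_layer(16, 40, rng)
```

The recognized label would be the index of the largest entry of the output vector, matching the max operation used in the linear scheme.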
The effect of the offline weights (α_SAR, α_IR) is visualized in Figure 9, where the x-axis represents the target labels and the y-axis represents the probability of target recognition. Each curve represents the target recognition probability for one recognition method: the SAR sensor only, the IR sensor only, and the proposed linear DW-SIF (αβ-sum). According to the IR-CNN probability distribution, the test input (ID = 9, jeep) is misrecognized as ID = 3 (audi) because the two targets have similar shapes in the IR domain. On the other hand, the SAR-CNN provides the correct answer. The proposed linear DW-SIF (αβ-sum) can correct the IR sensor information using the SAR-CNN information and the offline weight of SAR (α_SAR), as indicated in Figure 7, where the weight of the SAR (0.5692) is higher than that of the IR (0.4308). Table 1 provides the details of the double weight-based SAR-IR fusion flow. In this case, the incorrect recognition using only the IR sensor is corrected by the linear fusion scheme.
The effect of the online weights (β_SAR, β_IR) is visualized in Figure 10, where each curve represents the target recognition probability for one sensor type. Given a test input (ID = 5, bus), the IR-CNN provided the correct answer, while the SAR-CNN failed to recognize it because the SAR signatures of the bus (ID = 5) and oil tanker (ID = 11) were similar. The entropies of SAR and IR were estimated to be 2.7685 and 2.6877, respectively, from the SAR and IR probability distributions using Equations (7) and (8). The SAR sensor shows higher entropy than the IR sensor because the probability distribution of the SAR sensor is flatter (closer to equally probable) than that of the IR sensor. The corresponding online weights (β_SAR, β_IR) were then calculated using Equations (5) and (6). The online weight of the IR sensor (β_IR = 0.51) was higher than that of the SAR sensor (β_SAR = 0.49), which could correct the SAR-CNN information. Note that the offline weights of the SAR and IR sensors for ID = 5 were similar, as shown in Figure 7.

Composition of the SAR-IR Target Database
OKTAL-SE can generate a range of SAR-IR target images by varying the SAR and IR sensor settings (spectral band, detector size, field of view, depression angle, and height), atmospheric setting, and target pose (aspect angle). The simulation scenario was assumed to be ground surveillance from an unmanned aerial vehicle. The OKTAL-SE SAR simulator can generate target images with a resolution of 30 cm × 30 cm per pixel. The IR spectral band was mid-wave IR (MWIR), and the other camera parameters were set to produce a resolution of 5 cm × 5 cm per pixel.
Table 2 lists the composition of the target DB for training and testing the DW-SIF. The total number of targets was 16; among them, 10 were military targets (BMP3, T72, AMX10, AMX10RC, Leclerc, Jeep, TMM, Rada Camo, SA9 Inch Camo, and VAB OBS) and 6 were non-military targets (Audi, Bus, Clio, Firetruck, Oil tanker, and Ford transit). In the training DB, the SAR target templates had a 64 × 64 image resolution with depression angles of 10°, 15°, 20°, and 25° and an aspect angle interval of 5°. The total number of SAR training images was 4608 (16 targets × 72 aspect angles × 4 depression angles). Similarly, the IR target templates had a 96 × 96 image resolution with depression angles of 65°, 70°, 75°, and 80° and an aspect angle interval of 5°. The total number of IR training images was 4608, which is the same as the number of SAR templates. Figure 11a shows the 16 SAR targets (top) and 72 aspect views for T72 at a depression angle of 20° (bottom). Figure 11b presents the corresponding IR target templates of T72 at a depression angle of 75°.
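The template counts quoted above follow from enumerating every combination of target, depression angle, and aspect angle. The sketch below reproduces that arithmetic; the function name `template_grid` and the use of integer target indices are illustrative assumptions, and the aspect angles are assumed to start at 0° and cover the full circle in 5° steps.

```python
from itertools import product

NUM_TARGETS = 16
SAR_DEPRESSION_ANGLES = (10, 15, 20, 25)  # degrees
IR_DEPRESSION_ANGLES = (65, 70, 75, 80)   # degrees


def template_grid(num_targets, depression_angles, aspect_step=5):
    """All (target, depression angle, aspect angle) combinations."""
    aspect_angles = range(0, 360, aspect_step)  # 72 views for a 5 degree step
    return list(product(range(num_targets), depression_angles, aspect_angles))


sar_templates = template_grid(NUM_TARGETS, SAR_DEPRESSION_ANGLES)
ir_templates = template_grid(NUM_TARGETS, IR_DEPRESSION_ANGLES)
print(len(sar_templates), len(ir_templates))
```

Each tuple in the grid corresponds to one rendered template, so the list length equals the size of the training DB for that sensor.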
An evaluation DB is required to measure the classification reliability of the SAR and IR sensors, which are used to estimate the offline weights.As explained in the third row of Table 2, 4608 composite images were used in each sensor by setting the specific parameters, such as PSNR, blurring level (σ), rotation jitter, and translation jitter.
In the case of the test DB, four types of DB were prepared by adding noise, blur, rotation jitter, and translation jitter to determine the effects of the image variations on the fusion-based target recognition.
The number of test images per test condition was 4608, which is the same as the number of target templates for an individual sensor.

Comparison of Base Classifiers
A comparison of the base classifiers should be conducted before the fusion-based ATR evaluation. Three types of base classifiers were selected considering the recognition framework. The first base classifier was the IR-CNN presented in this paper, based on the LeNet architecture [62]. The IR-CNN learns the feature extractor and classifier simultaneously in one network. The second base classifier was transfer learning using AlexNet and an SVM [67,68]. The features were extracted from the pre-trained AlexNet and the classification was conducted by the SVM. This method can be useful when the training DB is small. The last base classifier was the HOG-SVM, which is popular in object detection and classification [16]. HOG is a hand-crafted orientation feature based on SIFT. The IR-CNN is a fully automatic classifier and the HOG-SVM is a classical classifier. Transfer learning is a hybrid framework that compromises between the two. The three base classifiers were trained using the training DB (4608 images), as shown in Table 2. Test images (4608) were prepared by adding Gaussian noise only, as shown in Figure 12 (bottom). According to the evaluation, the HOG-SVM showed the worst performance, followed by transfer learning, as shown in Figure 12. The proposed IR-CNN showed noise-resistant performance owing to the stochastic gradient descent and dropout used during deep learning. The HOG-SVM relies on gradient information to estimate the orientation, which makes it sensitive to noise. In the case of transfer learning, the feature extractor, AlexNet, was learned from millions of RGB images rather than IR images, which results in poor recognition performance at a low SNR. Therefore, this study used the IR-CNN as the base classifier for IR ATR and the SAR-CNN for SAR ATR in the performance evaluation of fusion-based target recognition.

Analysis of CNN Training
Parameter analysis related to CNN training is important for successful target recognition. In this subsection, the effects of the database size on the recognition rate and training time were evaluated because the total number of weight parameters of the IR-CNN is large (668,656). Figure 13a shows the training effect according to the DB size. In this evaluation, the IR-CNN was trained using a batch size of 150 and 40 epochs, with the DB size varied in steps of 450. A noisy test set (4608 images) was applied to check the training level. If the number of training images is larger than 4000, the IR-CNN shows a 99% correct recognition rate. Figure 13b shows the required training time according to the number of training images. Training on 4608 images took 156.8 s on a deep learning platform (GPU: NVIDIA GTX1080ti, CPU: i7-5820K, RAM: 128 GB). In the testing phase, approximately 5 ms was needed to recognize a 96 × 96 test image on the same platform.

Experimental Results
The SAR-CNN and IR-CNN were trained using the 4608 SAR images and 4608 IR images. Figure 14 presents the training curves in terms of the objective function, top 1 error, and top 5 error. Although the IR-CNN showed a top 1 error of 0 after 20 epochs, the SAR-CNN showed some residual error (0.3) after 80 epochs, which originated from the low signal-to-noise ratio due to the SAR speckle noise. In the performance evaluation, two baseline classifiers (SAR-CNN only (SC) and IR-CNN only (IC)) and six fusion frameworks (fusion using alpha only (α-sum), fusion using beta only (β-sum), fusion using alpha+beta (αβ-sum, proposed linear fusion scheme I), Bayesian fusion (BF) [57], fusion using a neural network (NN), and fusion using an alpha+beta neural network (αβ-NN, proposed nonlinear fusion scheme II)) were compared. In the implementation of the Bayesian fusion, the prior is assumed to be uniform and the SAR and IR sensors are assumed to be independent, which allows Bayesian fusion to be computed as the product of the two output probability distributions. In the case of the neural network, the weights in the layers are learned using the evaluation set, as shown in Table 2. Four types of test sets (noise, blur, rotation jitter, and translation jitter) were prepared, as indicated in Table 2 (bottom row). These image variations occur frequently in outdoor ATR applications because of the sensor noise, imaging platform, and weather conditions. The default control parameters for the SAR and IR test DB were set as follows to reflect moderately noisy environments: Peak Signal-to-Noise Ratio (PSNR) = 22 dB, σ (Gaussian blur) = 1, rotation jitter (uniform) = ±1°, and translation jitter (uniform) = ±1 pixel.
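The Bayesian fusion baseline described above, with a uniform prior and independent sensors, reduces to an element-wise product of the two probability vectors. The following is a minimal sketch of that reading; the renormalization step and the fallback for an all-zero product are assumptions added for numerical safety, and the function name is hypothetical.

```python
def bayesian_fusion(p_sar, p_ir):
    """Product of two class-probability vectors under a uniform prior.

    With independent sensors and a uniform prior, the posterior is
    proportional to P_SAR(i) * P_IR(i) for each label i.
    """
    joint = [ps * pi for ps, pi in zip(p_sar, p_ir)]
    total = sum(joint)
    if total == 0:
        # Assumed fallback: no label is supported by both sensors.
        posterior = [1.0 / len(joint)] * len(joint)
    else:
        posterior = [j / total for j in joint]
    label = max(range(len(posterior)), key=posterior.__getitem__)
    return label, posterior
```

Renormalizing does not change which label is selected by the max operation; it only keeps the fused vector interpretable as a probability distribution.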
The first evaluation was conducted by varying the SAR/IR image noise. Speckle noise and thermal noise exist in the SAR and IR images, respectively. Figure 15 compares the noise distribution of a real SAR image (MSTAR) and a synthesized SAR image (SE-RAY-SAR). Note that the synthesized SAR images already follow a K-distribution, so adding Gaussian noise is a second-best choice adopted to keep the experiments simple. The PSNR was varied from 14.3 dB to 18.5 dB, and 4608 images per PSNR level were generated by adding Gaussian noise to the training DB, as shown in Figure 16b. Figure 16a presents the results of the performance comparison. The SC method showed the worst target recognition results, followed by IC. Among the linear fusion schemes, the proposed linear fusion scheme (αβ-sum) showed the best performance. Interestingly, α-sum fusion (offline) was better than β-sum fusion when the PSNR was below 17.2 dB, whereas the situation was reversed when the PSNR was higher than 17.2 dB, as shown in Figure 16a. In a noisy environment, alpha (offline sensor reliability) plays a crucial role in SAR-IR fusion. Bayesian fusion (BF) showed similar performance to α-sum fusion. Although αβ-sum fusion performed well compared with the other linear fusion schemes, nonlinear fusion using a neural network (NN) outperformed it. When double weights were used in the neural network input (αβ-NN), the recognition performance improved by 1.13 percentage points, on average, at a PSNR of 16.3 dB. Figure 17 compares the confusion matrices of the eight ATR methods, obtained at PSNR = 16.3 dB, so that the recognition accuracy of each target can be compared for each method. Note that the proposed linear fusion scheme I (αβ-sum) improved the recognition performance for all the labels among the linear fusion schemes. For example, the recognition accuracy of the leclerc target was 34.4%, 71.9%, 79.9%, 78.8%, 87.8%, and 83.0% for SC, IC, α-sum, β-sum, αβ-sum, and BF, respectively. In addition, the proposed nonlinear fusion scheme II (αβ-NN) showed the best performance, exceeding αβ-sum fusion by 8.94 percentage points, on average.
The second evaluation was conducted by varying the SAR/IR image blur. In the real world, the SAR image can be blurred by target motion, and the IR image can be blurred by sensor motion, the lens, and atmospheric conditions. The image blurring was performed with a Gaussian filter by changing the σ parameter, as shown in Figure 18b. The blur parameter (σ) was varied from 1.0 to 3.4, and 4608 images per parameter value were generated by applying Gaussian filtering to the training DB. Figure 18a shows the results of the performance comparison. The SC method showed the worst target recognition results, followed in order by IC, α-sum, and β-sum. The proposed linear fusion scheme (αβ-sum) showed the best performance among the linear fusion schemes, as shown in Figure 18a. Although αβ-sum fusion performed well compared with the other linear fusion schemes, nonlinear fusion using a neural network (NN) outperformed them all. When double weights were used in the neural network input (αβ-NN), the recognition performance improved by 1.04 percentage points, on average, at σ = 2.7. Figure 19 compares the confusion matrices of the eight ATR methods, obtained at σ = 2.7, so that the recognition accuracy of each target can be compared for each method. Note that the proposed linear fusion scheme I (αβ-sum) improved the recognition performance for most labels among the linear fusion schemes. For example, the recognition accuracy of the leclerc target was 74.7%, 63.9%, 80.2%, 85.1%, 89.2%, and 85.1% for SC, IC, α-sum, β-sum, αβ-sum, and BF, respectively. In addition, the proposed nonlinear fusion scheme II (αβ-NN) showed the best performance, exceeding αβ-sum fusion by 3.86 percentage points, on average.
The third evaluation was conducted by varying the SAR/IR image rotation (θ). Ground targets can move and rotate; therefore, a performance analysis on rotational variations is required. The training DB consists of 360° views at an interval of 5°. Therefore, the rotated SAR-IR DB can be generated by adding uniform rotation noise between −k and +k (θ ∼ U(−k, +k)), where the maximum value of k is 5. Figure 20b provides partial examples of rotated SAR and IR images. The rotation parameter (k) was varied from 0 to 5, and 4608 images per parameter value were generated by an image transformation with bilinear interpolation, assuming a fixed PSNR (17.0 dB). Figure 20a shows the results of the performance comparison. The SC method showed the worst target recognition results, followed by IC, β-sum, BF, and α-sum. The proposed linear fusion scheme I (αβ-sum) showed the best performance among the linear fusion schemes in all the test images, regardless of the rotation parameter k, which is attributed to the composition of the training DB. Although αβ-sum fusion performed well compared with the other linear fusion schemes, nonlinear fusion using a neural network (NN) outperformed them all. When double weights were used in the neural network input (αβ-NN), the recognition performance improved by 0.19 percentage points, on average, at a rotation jitter of 3.5°. Figure 21 compares the confusion matrices of the eight ATR methods, obtained at k = 3.5, so that the recognition accuracy of each target can be compared for each method. The proposed nonlinear fusion scheme II (αβ-NN) showed the best performance under rotational variation, followed in order by αβ-sum, α-sum, and BF. For example, the recognition accuracy of the tmm target was 36.1%, 11.1%, 25.3%, 25.3%, 66.3%, 25.3%, 86.1%, and 92.0% for SC, IC, α-sum, β-sum, αβ-sum (proposed linear fusion scheme I), BF, NN, and αβ-NN (proposed nonlinear fusion scheme II), respectively.
The final evaluation was conducted by varying the SAR/IR image translation (t_x, t_y). SAR and IR target images can be translated by an inaccurate automatic target detection (ATD) system; therefore, the effect of image translation on target recognition should be analyzed. The synthetic image translation was performed by applying an image transformation with bilinear interpolation. The x-axis translational parameter (t_x) and y-axis translational parameter (t_y) followed a uniform distribution between −l and +l ((t_x, t_y) ∼ U(−l, +l)). Figure 22b shows the synthesized SAR and IR images at l = 2 and 3 pixels. The translational parameter (l) was varied from 0.2 to 3.0, and 4608 images per parameter value were generated assuming a fixed PSNR (17 dB). Figure 22a presents the results of the performance comparison. The SC method showed the worst target recognition results, followed by IC, β-sum, α-sum, BF, and αβ-sum fusion (proposed linear fusion scheme I), as shown in Figure 22a. Although αβ-sum fusion performed well compared with the other linear fusion schemes, nonlinear fusion using a neural network (NN) outperformed the other methods. When double weights were used in the neural network input (αβ-NN), the recognition performance improved by 0.18 percentage points, on average, at a translation jitter of 1.5 pixels. Figure 23 compares the confusion matrices of the eight ATR methods, obtained at l = 1.5, so that the recognition accuracy of each target can be compared for each method. Note that the proposed αβ-NN improved the recognition performance for most labels. For example, the recognition accuracy of the tmm target was 33.0%, 8.0%, 25.0%, 25.3%, 71.9%, 25.3%, 86.5%, and 92.7% for SC, IC, α-sum, β-sum, αβ-sum (proposed linear fusion scheme I), BF, NN, and αβ-NN (proposed nonlinear fusion scheme II), respectively.
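As a rough illustration of how such test sets could be synthesized, the sketch below adds Gaussian noise at a target PSNR and samples the rotation and translation jitter from the uniform distributions described above. It assumes images scaled to [0, 1] (peak = 1) and uses the standard definition PSNR = 10·log10(peak²/MSE); the function names are hypothetical, and the actual image warping with bilinear interpolation is left out.

```python
import numpy as np

def add_noise_at_psnr(image, psnr_db, peak=1.0, rng=None):
    """Add zero-mean Gaussian noise whose variance matches the target PSNR."""
    rng = np.random.default_rng() if rng is None else rng
    # PSNR = 20*log10(peak / sigma)  =>  sigma = peak / 10**(PSNR/20)
    sigma = peak / (10.0 ** (psnr_db / 20.0))
    # No clipping here: clipping to [0, peak] would shift the realised PSNR.
    return image + rng.normal(0.0, sigma, size=image.shape)

def measured_psnr(clean, noisy, peak=1.0):
    """Empirical PSNR in dB between a clean and a degraded image."""
    mse = np.mean((clean - noisy) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def sample_jitter(k_deg, l_pix, rng=None):
    """Draw theta ~ U(-k, +k) degrees and (t_x, t_y) ~ U(-l, +l) pixels."""
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(-k_deg, k_deg)
    t_x, t_y = rng.uniform(-l_pix, l_pix, size=2)
    return theta, t_x, t_y

# Example: one 96 x 96 placeholder image at the noise level used for Figure 17.
rng = np.random.default_rng(0)
clean = rng.random((96, 96))
noisy = add_noise_at_psnr(clean, psnr_db=16.3, rng=rng)
theta, t_x, t_y = sample_jitter(k_deg=3.5, l_pix=1.5, rng=rng)
```

The sampled `theta`, `t_x`, and `t_y` would then be passed to an image-warping routine with bilinear interpolation, which is outside the scope of this sketch.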

Conclusions
Recognizing ground targets at any time and under any weather conditions is important for homeland security and defense. This paper presented a novel SAR (active) and IR (passive) image information fusion method for automatic target recognition based on an offline classifier reliability and online classifier confidence in a weighted sum-based fusion framework, called the Double Weight-based SAR and IR Fusion (DW-SIF). A novel SAR and IR image database (4608 images per sensor) was constructed using an OKTAL-SE simulator for 16 ground targets. A LeNet-based classifier architecture was presented; the optimized IR-CNN showed much better target classification performance than transfer learning and HOG-SVM in the noise test. The offline weights (α_SAR, α_IR) were estimated by applying SAR-CNN and IR-CNN to the evaluation DB. The online weights (β_SAR, β_IR) were estimated by applying entropy to the SAR/IR probability distributions. According to the performance evaluation results for the four types of image variation experiments (noise, blur, rotation jitter, and translation jitter), the proposed linear fusion scheme I (αβ-sum) showed the best target recognition performance among the linear schemes compared (SAR-CNN only, IR-CNN only, fusion by alpha, fusion by beta, fusion by alpha+beta, and Bayesian fusion). Furthermore, the proposed nonlinear fusion scheme II (αβ-NN) showed much better performance than the linear fusion approaches.

Figure 2. Necessity for decision-level SAR and IR sensor fusion: (a) operational concept of automatic target recognition; (b) pros and cons for each fusion level.

Figure 3. Proposed DW-SIF system for ground target recognition.

Figure 4. Simultaneous SAR and IR image generation flow using the OKTAL simulation environment (OKTAL-SE).

Figure 5. Examples of SAR and IR target generation using OKTAL-SE.

Figure 8. Offline confidence and online confidence-based SAR and IR fusion flow.

Figure 11. Composition of the SAR-IR target database: (a) 16 SAR targets (top) and 72 aspect views of T72 at the depression angle of 20° (bottom); (b) corresponding 16 IR targets (top) and 72 aspect views of T72 at the depression angle of 75° (bottom).

Figure 12. Performance comparison between IR-CNN, transfer learning, and HOG-SVM on the IR database.

Figure 13. Training parameter analysis: (a) recognition rate vs. training DB size; (b) training time vs. training DB size.

Figure 15. Analysis of SAR noise: (a) K-distribution of an MSTAR SAR image; (b) K-distribution of a synthesized SAR image.

Figure 16. Performance evaluation results for noise variations: (a) recognition rate vs. peak signal-to-noise ratio [PSNR]; (b) test examples of the SAR and IR images at different PSNRs.

Figure 18. Performance evaluation results for blur variation: (a) recognition rate vs. blurring level [σ]; (b) test examples of SAR and IR images at different blur levels.

Figure 20. Performance evaluation results for the rotational variation: (a) recognition rate vs. rotation angle [°]; (b) test examples of SAR and IR images at different rotation levels.

Figure 22. Performance evaluation results for the translational variation: (a) recognition rate vs. the translation level [pixel]; (b) test examples of SAR and IR images at different translation levels.

Table 3 compares the performance in terms of the average recognition accuracy extracted from the confusion matrices shown in Figures 17, 19, 21, and 23. Fusion by α improves the accuracy by using the offline classifier reliability, and fusion by β also improves on SAR-CNN and IR-CNN by using the online classifier confidence. The proposed linear fusion scheme I (αβ-sum) showed the best target recognition performance among the linear fusion schemes for the four types of image variations. The proposed nonlinear fusion scheme II (αβ-NN) showed improved performance compared to the linear fusion schemes.
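To make the roles of the two weights concrete, the following sketch shows one possible form of a doubly weighted sum. This section does not restate the exact formulas, so several parts are assumptions: α is taken as the per-class classification rate read from an evaluation-set confusion matrix (rows = true class), β is taken as one minus the normalized entropy of a sensor's score vector, and the fused score is the sum of the α- and β-weighted sensor scores. All function names are hypothetical.

```python
import numpy as np

def offline_alpha(confusion):
    """Assumed offline weight: per-class classification rate,
    i.e. diagonal / row total of an evaluation-set confusion matrix."""
    confusion = np.asarray(confusion, dtype=float)
    row_totals = confusion.sum(axis=1)
    # Classes without evaluation samples get a weight of 0.
    return np.divide(np.diag(confusion), row_totals,
                     out=np.zeros_like(row_totals), where=row_totals > 0)

def entropy_beta(p, eps=1e-12):
    """Assumed online weight: 1 - H(p)/log(K); about 1 for a one-hot
    score vector and about 0 for a uniform one."""
    p = np.asarray(p, dtype=float)
    entropy = -np.sum(p * np.log(p + eps))
    return 1.0 - entropy / np.log(p.size)

def alpha_beta_sum(p_sar, p_ir, alpha_sar, alpha_ir):
    """Doubly weighted linear fusion of the SAR and IR score vectors."""
    p_sar = np.asarray(p_sar, dtype=float)
    p_ir = np.asarray(p_ir, dtype=float)
    return (np.asarray(alpha_sar, dtype=float) * entropy_beta(p_sar) * p_sar
            + np.asarray(alpha_ir, dtype=float) * entropy_beta(p_ir) * p_ir)

# Example with three hypothetical classes: the SAR scores are uninformative,
# so the entropy weight lets the IR scores dominate the decision.
alpha = np.ones(3)
fused = alpha_beta_sum([1/3, 1/3, 1/3], [0.1, 0.8, 0.1], alpha, alpha)
predicted_class = int(np.argmax(fused))
```

In this form, α encodes how much each sensor can be trusted for each target class before deployment, while β down-weights a sensor whose current output is close to uniform.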

Table 1. Details of the recognition improvement by the offline weights.

Table 2. Composition of the train, evaluation, and test SAR-IR DB.

Table 3. Performance comparison of the ATR methods in terms of the average accuracy for the four types of image variations.