SAR Target Classification Based on Deep Forest Model

Synthetic aperture radar (SAR) has become one of the most important means of information acquisition and shows great potential in many fields, and target identification and classification in SAR images is a major research focus. With the rapid development of deep learning, many researchers have applied it to SAR target classification to obtain a more automatic process and more accurate results. In this paper, a novel deep forest model built on the multi-grained cascade forest (gcForest), which differs from traditional neural network (NN) models, is employed to classify ten types of SAR targets in the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset. Because the targets in input images may be off-center and of different sizes in practical applications, two improved models that weight features according to image characteristics are proposed, and both obtain better results. A series of experiments was conducted to optimize the model parameters, and the final results on the MSTAR dataset show that both improved models outperform the original gcForest model. This is the first attempt to classify SAR targets using a non-NN model.


Introduction
As an active microwave remote-sensing device, synthetic aperture radar (SAR) can provide high-resolution images regardless of weather conditions and sunlight illumination, and it is of great value in military reconnaissance, environmental monitoring, geological exploration, disaster prediction, and other fields [1][2][3][4][5]. High-resolution SAR images contain a wealth of detailed information, which greatly widens their application areas [6][7][8][9]; however, this also makes SAR image interpretation extremely complicated. The same landforms or targets appear with completely different characteristics in SAR images than in optical images, so classification algorithms that succeed on optical images cannot be applied directly to SAR images. In addition, a lack of effective features, geometric distortion, false targets, speckle noise, and other issues make SAR images even more difficult to interpret [5,10,11]. As a result, conventional means of image interpretation alone can hardly meet the growing application demand. In recent years, with the development of deep learning, increasingly intelligent methods have been applied to SAR image interpretation. Deep learning is regarded as a feature learning tool that takes image data directly as input, without additional manual image processing or other complex operations. It can independently extract the features that best express the target from SAR image data and classify images and targets automatically and efficiently [12][13][14][15][16][17][18][19][20].
Images from the MSTAR dataset used in this paper were collected by a high-resolution spotlight SAR operating in the X band with single (HH) polarization. The images are chips of static targets. In SAR images, continuous strong reflection sources usually correspond to man-made targets [35], so the contrast between targets and background can be used to identify such objects. However, because of clutter, this contrast is usually too low to distinguish objects and their details, so the image contrast must be adjusted before the images are fed to the classifier. This paper therefore applies a two-step Gamma transformation to the original images for enhancement.
Equation (1) represents a Gamma transformation of a greyscale image I, where I(x, y) is the intensity of the pixel at location (x, y) of the input image, and c and γ are positive constants. With γ < 1, the transformation maps a narrow range of dark input grey levels to a wider range of output grey levels, and the opposite holds for bright input values. In other words, if γ > 1, low-intensity pixels are darkened, and if γ < 1, they are brightened. In this experiment, a Gamma transformation with γ > 1 is first applied to the images, which effectively highlights the area containing the target. The resulting image is then divided dynamically according to its grey values to obtain a slice of the target area. Finally, a Gamma transformation with γ < 1 is applied to the slice to enhance the dark details of the target; this also increases the contrast at low grey levels, which makes image details easier to distinguish.
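The two-step preprocessing can be sketched in Python with NumPy as below. This is a minimal sketch, not the authors' code: it assumes the standard power-law form O(x, y) = c·I(x, y)^γ applied to intensities normalized to [0, 1], and the γ values, the relative threshold, and the bounding-box rule standing in for the "dynamic division" step are all hypothetical choices.

```python
import numpy as np

def gamma_transform(img, gamma, c=1.0):
    """Gamma transformation O = c * I**gamma for intensities in [0, 1]."""
    return c * np.power(img, gamma)

def two_step_gamma(img, gamma_high=1.5, gamma_low=0.6, rel_thresh=0.5):
    """Hypothetical sketch of the two-step Gamma preprocessing."""
    img = np.asarray(img, dtype=np.float64)
    # Normalise to [0, 1] so that the power law keeps values in range.
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)
    # Step 1: gamma > 1 darkens low-intensity clutter, so bright target returns stand out.
    enhanced = gamma_transform(img, gamma_high)
    # Step 2 (assumed rule): crop to the bounding box of pixels above a relative threshold.
    mask = enhanced >= rel_thresh * enhanced.max()
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    target_slice = enhanced[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    # Step 3: gamma < 1 brightens dark details inside the target slice.
    return gamma_transform(target_slice, gamma_low)
```

The thresholded bounding box is only one simple way to "divide the image by grey value"; any segmentation that yields a target slice could be substituted.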

Introduction of the Deep Forest Construction Method
Multi-grained cascade forest (gcForest) is an approach to constructing a deep forest [32]. It is a novel ensemble method with a cascade structure of decision-tree forests, which enhances its representation learning ability. The deep forest consists of two parts, multi-grained scanning and cascade forest, which are introduced below. Since gcForest is the main method for building a deep forest, the name gcForest is used instead of deep forest in the following sections.

Multi-Grained Scanning
As shown in Figure 1, sliding windows are used to scan the original input images. The original d×d image is partitioned into n×n panes of size m×m, where m is the side length of the sliding window and n is computed as $n = \lfloor (d - m)/p \rfloor + 1$, where p is the step size of the sliding window. The resulting n² instances are then fed into classifiers, such as random forests, that output class probabilities. The classification probabilities of the input instances obtained by each classifier constitute the first-stage feature vectors: with k classes, each instance yields a k-dimensional vector, so each classifier produces n² k-dimensional first-stage feature vectors. The first-stage feature vectors of each classifier are sorted by class and rearranged into k second-stage feature vectors of dimension n×n, as shown in Figure 2. As shown in Figure 3, the second-stage feature vectors are pooled to obtain the final feature vectors of the multi-grained scanning stage. The side length of the pooling block (framed by a yellow dotted line in Figure 3) is t, so each pooling block contains t² sliding blocks of the second-stage feature vectors. The final feature vectors are obtained by averaging the predicted probabilities of the second-stage feature vectors within each pooling block.
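The window count, the rearrangement into k class maps, and the average pooling can be sketched with NumPy as follows. The function names are hypothetical, the probability matrix stands in for the output of any scanning classifier (for example a random forest's `predict_proba`), and the pooling stride is assumed equal to t (non-overlapping blocks), which the text does not specify.

```python
import numpy as np

def num_windows(d, m, p):
    """Windows per side: n = floor((d - m) / p) + 1."""
    return (d - m) // p + 1

def scan(img, m, p):
    """Cut a d x d image into n*n flattened m x m instances (row-major order)."""
    d = img.shape[0]
    n = num_windows(d, m, p)
    patches = [img[i * p:i * p + m, j * p:j * p + m].ravel()
               for i in range(n) for j in range(n)]
    return np.stack(patches), n

def second_stage_maps(probs, n):
    """Rearrange n*n k-dimensional probability vectors into k maps of size n x n."""
    k = probs.shape[1]
    return probs.T.reshape(k, n, n)

def average_pool(maps, t):
    """Average each t x t block of every class map (stride t is an assumption)."""
    k, n, _ = maps.shape
    s = n // t  # rows/columns that do not fill a whole block are dropped
    blocks = maps[:, :s * t, :s * t].reshape(k, s, t, s, t)
    return blocks.mean(axis=(2, 4))
```

For a 96 × 96 chip scanned with m = 84 and an assumed step p = 2, `num_windows` gives n = 7, i.e. 49 instances per image.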

Cascade Forest
Cascade forest is a cascade structure, as illustrated in Figure 4; this layer-by-layer configuration improves the representation learning ability of the deep forest. Each layer of the cascade is an ensemble of classifiers that receives the feature vectors (i.e., the classification probabilities) produced by the classifiers of the previous layer. These feature vectors are fed into the classifiers of the current layer, and the resulting classification probabilities are passed on to the next layer. In principle, the classifiers used here can be of any form, such as random forest or logistic regression. To lower the risk of overfitting, the cascade forest generates every class vector by k-fold cross-validation. More concretely, each input instance is used as training data k-1 times, producing k-1 class vectors, which are averaged to form the output class vector used as enhanced features. After cross-validation, a new layer of the cascade forest is generated. Because gcForest determines the number of cascade levels automatically, training terminates when a new layer brings almost no improvement. The overall framework of gcForest is illustrated in Figure 5.
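The k-fold class-vector generation and the automatic growth of cascade levels can be sketched as below. The sketch is illustrative only: `CentroidClassifier` is a hypothetical stand-in for the forests used in the paper, folds are assigned by a simple stratified round-robin, and each training instance receives the class vector of the fold model that held it out (the common out-of-fold variant), while test vectors average all k fold models. Following the original gcForest design, each layer's class vectors are concatenated with the input features before the next layer; stopping on a held-out validation set with tolerance `tol` is an assumption.

```python
import numpy as np

class CentroidClassifier:
    """Hypothetical stand-in base classifier; any fit/predict_proba model would do."""
    def fit(self, X, y, n_classes):
        self.centroids = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
        return self

    def predict_proba(self, X):
        dist = ((X[:, None, :] - self.centroids[None]) ** 2).sum(axis=2)
        e = np.exp(-dist + dist.min(axis=1, keepdims=True))  # softmax of -distance
        return e / e.sum(axis=1, keepdims=True)

def class_vectors(make_clf, X, y, X_test, n_classes, k=3):
    """k-fold class vectors: out-of-fold for training data, fold-average for test data."""
    folds = np.empty(len(y), dtype=int)
    for c in range(n_classes):                       # stratified round-robin folds
        idx = np.flatnonzero(y == c)
        folds[idx] = np.arange(len(idx)) % k
    train_vec = np.zeros((len(y), n_classes))
    test_vec = np.zeros((len(X_test), n_classes))
    for f in range(k):
        held_out = folds == f
        clf = make_clf().fit(X[~held_out], y[~held_out], n_classes)
        train_vec[held_out] = clf.predict_proba(X[held_out])
        test_vec += clf.predict_proba(X_test) / k
    return train_vec, test_vec

def grow_cascade(make_clfs, X, y, X_val, y_val, n_classes, max_layers=10, tol=1e-3):
    """Add layers until validation accuracy stops improving by more than tol."""
    feats, feats_val = X, X_val
    best_acc, n_layers = -1.0, 0
    while n_layers < max_layers:
        outs = [class_vectors(mk, feats, y, feats_val, n_classes) for mk in make_clfs]
        val_prob = np.mean([o[1] for o in outs], axis=0)
        acc = float((val_prob.argmax(axis=1) == y_val).mean())
        if acc <= best_acc + tol:
            break                                    # new layer brings (almost) no gain
        best_acc, n_layers = acc, n_layers + 1
        # Class vectors of this layer are concatenated with the original features.
        feats = np.hstack([X] + [o[0] for o in outs])
        feats_val = np.hstack([X_val] + [o[1] for o in outs])
    return n_layers, best_acc
```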

Optimized Deep Forest
To extract image features, multi-grained scanning processes the original image with sliding panes of various sizes, as shown in Figure 6. As a result, the proportion of the target contained in each sliding pane differs, which affects the classification probabilities of the second-stage feature vectors in the pooling block. However, these differences are ignored in the pooling process. Such plain average pooling is inaccurate for samples whose target lies in the center, which directly affects whether the generated features are sufficiently representative and accurate. To overcome this problem, two improved pooling methods are proposed that effectively improve classification accuracy without affecting training speed: Euclidean distance weighted pooling (distance gcForest) and overlap degree weighted pooling (overlap gcForest).

Distance gcForest
Distance gcForest measures the proportion of the target in a sliding pane in a relatively straightforward way: it calculates the Euclidean distance d between O (the center of the image) and P (the center of the sliding block), as shown in Figure 7a. If the distance between P and O is below a certain value, the pooling block can be assumed to contain the entire target, and no distance weighting is needed. Accordingly, the second-stage feature vectors in the pooling blocks are weighted differently depending on the magnitude of d. The range of d, from its minimum to its maximum value, is divided into two parts, as described in the first column of Table 1, where l is the side length of the input image and q is a parameter that can be adjusted according to the average proportion of the target in the input image. A and B are the weighted areas shown in Figure 7b. Inspired by the weight-updating scheme of the Boosting classifier [36], the weights are updated as follows.
The initial weight of each feature vector in the block is uniform, w_i = 1/(m·n), where m and n are the dimensions of the pooling block (usually m = n). The weights are then updated to obtain the distribution W_d, where k is an adjustable parameter, e_i is the misjudgment probability of the i-th second-stage feature vector in the pooling block, and Z_d is the scaling factor that makes W_d a probability distribution.
In summary, Table 1 lists the weighting mode for each range of d.
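The following NumPy sketch illustrates the mechanics of distance-weighted pooling under stated assumptions, not the paper's exact weighting rule: the boundary between regions A and B is taken as l/q, panes in region A keep the uniform initial weight, and panes in region B receive a hypothetical Boosting-style update w_i·exp(−k·e_i) before normalization by Z_d. The function names are illustrative.

```python
import numpy as np

def pane_centers(n, m, p):
    """Centres P of the n x n sliding panes (side m, step p), row-major order."""
    offsets = np.arange(n) * p + (m - 1) / 2.0
    rr, cc = np.meshgrid(offsets, offsets, indexing="ij")
    return np.stack([rr.ravel(), cc.ravel()], axis=1)

def distance_weights(centers, side, q, errors, k=1.0):
    """Hypothetical distance-weighted pooling weights for the panes of one pooling block."""
    o = np.full(2, (side - 1) / 2.0)                  # image centre O
    d = np.linalg.norm(centers - o, axis=1)           # Euclidean distance |OP| per pane
    w = np.full(len(centers), 1.0 / len(centers))     # uniform initial weights
    far = d > side / q                                # region B; boundary l/q is assumed
    # ASSUMED Boosting-style update, applied only in region B.
    w = np.where(far, w * np.exp(-k * np.asarray(errors, dtype=float)), w)
    return w / w.sum()                                # divide by Z_d -> probability distribution

def weighted_pool(block_vectors, weights):
    """Weighted average of the block's k-dimensional vectors (replaces plain averaging)."""
    return weights @ np.asarray(block_vectors)
```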

Overlap gcForest
Overlap gcForest weights the second-stage feature vectors in the pooling block according to their overlap degree. As shown in Figure 8, S_W is the intersection area of the sliding pane and the target region (indicated by the yellow area), and S is the area of the target (indicated by the red box). The overlap degree I is defined as the ratio of S_W to S:
$I = S_W / S$  (8)
Similar to the previous section, each initial weight u_i is uniform (one divided by the number of panes in the pooling block), and the weights are then updated to obtain the overlap-degree weight distribution W_I, where k is an adjustable parameter and Z_I is the scaling factor that makes W_I a probability distribution.
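The overlap degree of Equation (8) is straightforward to compute for axis-aligned boxes. In the sketch below, boxes are given as (row0, col0, row1, col1) with exclusive ends (a convention chosen for illustration), the initial weights are uniform, and the exponential update exp(k·I_i) is a hypothetical stand-in for the paper's update; only the normalization by Z_I follows directly from the text.

```python
import numpy as np

def overlap_degree(pane, target):
    """I = S_W / S for axis-aligned boxes (row0, col0, row1, col1), ends exclusive."""
    h = max(0, min(pane[2], target[2]) - max(pane[0], target[0]))
    w = max(0, min(pane[3], target[3]) - max(pane[1], target[1]))
    s = (target[2] - target[0]) * (target[3] - target[1])   # target area S
    return h * w / s                                         # S_W / S

def overlap_weights(panes, target, k=1.0):
    """Uniform initial weights u_i, a HYPOTHETICAL exp(k * I_i) update, then division by Z_I."""
    overlap = np.array([overlap_degree(p, target) for p in panes])
    u = np.full(len(panes), 1.0 / len(panes))
    w = u * np.exp(k * overlap)
    return w / w.sum()
```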

MSTAR Dataset
The experimental data used in this paper were acquired by the SAR sensor of Sandia National Laboratories, a spotlight SAR with 0.3 m × 0.3 m resolution. The Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) jointly sponsored the data collection, which is a vital part of the MSTAR project. The project gathered hundreds of thousands of SAR images of various ground military targets, covering variations in target type, azimuth, pitch, turret (barrel) rotation, and configuration.
The dataset used in this paper includes 10 classes of ground military vehicles: 2S1 (self-propelled howitzer), BMP2 (infantry fighting vehicle), BRDM2 (armored reconnaissance vehicle), BTR60 (armored transport vehicle), BTR70 (armored transport vehicle), D7 (bulldozer), T62 (tank), T72 (tank), ZIL131 (cargo truck), and ZSU234 (self-propelled anti-aircraft gun), as shown in Figure 9. Targets of the same type in the training and test sets have the same model number, but their imaging elevation and azimuth angles differ. The training images were collected at a pitch angle of 17° and the test images at 15°; this 2° difference can be considered negligible. The number of samples per target type differs and is listed in Table 2. The original samples are 128 × 128 pixels and are cropped to 96 × 96 to reduce the amount of computation.
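The 128 × 128 to 96 × 96 reduction can be done with a simple crop. The sketch below assumes a center crop, which the text does not state explicitly; the function name is hypothetical, and it works for any chip at least 96 pixels on a side.

```python
import numpy as np

def center_crop(img, size=96):
    """Keep the central size x size region of a chip (centred crop is an assumption)."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]
```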

Classification Results of Optimized gcForest
The experiments in this paper were run on a computer with an Intel Core i5-6500 CPU at 3.20 GHz, 24 GB of memory, and a 64-bit operating system. Notably, gcForest and the optimized gcForest run efficiently without GPU acceleration. Table 3 shows the confusion matrix of the classification results on the MSTAR dataset using the parameter-optimized gcForest described in Section 4. The rows of the confusion matrix represent the actual target category, and the columns represent the predicted category. The results demonstrate that the deep forest achieves high classification accuracy in most categories. Applying deep forest to SAR target classification is therefore feasible, but the algorithm still needs further improvement: the classification accuracies of BMP2 and BTR60 are not satisfactory, and other categories are prone to be misclassified as T72 and ZIL131. The main reason is that different incidence angles of the electromagnetic wave contribute differently to the final image, so the same vehicle can look completely different.
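A confusion matrix with this row/column convention, and the per-class accuracies read from its diagonal, can be computed with a few lines of NumPy. The function names are hypothetical and the labels in the usage below are toy values, not MSTAR results.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the actual class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def per_class_accuracy(cm):
    """Diagonal over row sums: fraction of each actual class predicted correctly."""
    return cm.diagonal() / cm.sum(axis=1)

def overall_accuracy(cm):
    return np.trace(cm) / cm.sum()

cm = confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 0], n_classes=3)
print(cm)
print(per_class_accuracy(cm), overall_accuracy(cm))
```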

Effect of Optimized gcForest and Improved gcForests
The MSTAR dataset is then used to evaluate the performance of the two improved methods. The first four rows of Table 4 present the classification results for each class using several variants of gcForest, where the initial gcForest is the model with the original parameters. Comparing the initial and optimized gcForests shows that parameter adjustment greatly improves the classification accuracy of gcForest; this is explained in detail in the next section. The distance and overlap gcForests, which use the optimized parameters, further improve the classification accuracy, and the latter improves BMP2 and BTR60 significantly. Table 4 also lists the results of comparative methods, namely MSCAE, Non-negative Matrix Factorisation (NMF) [37], ED-AE, Sparse Representation of Monogenic Signal (MSR) [38], and Riemannian Manifolds [39]. All results of the comparison methods reported here were obtained on the dataset without artificial augmentation. Table 4 shows that the classification accuracies of the optimized gcForest and the two improved gcForests are higher than those of the other algorithms.

Discussion
Based on the working process of gcForest, this section discusses the influence of classifier and parameter selection on multi-grained scanning and on the cascade forest, and then selects the most suitable training parameters for the MSTAR dataset.
The default parameters of gcForest are listed in Table 5. With them, the training and test set classification accuracies are 98.18% and 81.77%, respectively. Training time is divided into two parts: multi-grained scanning takes 567,858 s and the cascade takes 126,179 s, for a total of 8 days, 47 minutes, and 27 seconds. The accuracy obtained with the default parameters is unsatisfactory and training is inefficient, so the default configuration is far from adequate for a specific classification task. It is therefore necessary to adjust the parameters to achieve better classifier performance at lower time cost. In addition, comparing and analyzing the results helps to clarify the working process of gcForest and to demonstrate the feasibility and superiority of the proposed method.

Factors Affecting Multi-Grained Scanning
This stage involves feature extraction, which enhances the ability of the network to process input image data. The quality of feature extraction directly affects the accuracy and performance of the model, and it also influences the training time of the whole network. The following subsections explain the influence of the sliding pane size, classifier type and classifier parameters on accuracy.

Size of the Sliding Pane
In this experiment, ExtraTreesClassifier is used for classification to observe how the pane size affects accuracy and training time.
The size of the sliding pane determines the number and completeness of features. Panes that are too small split the target into scattered fragments, and most of them do not contain any part of the target at all. The results in Table 6 show that a small pane size not only lowers accuracy but also significantly reduces training efficiency, because smaller panes produce more windows per image for the scanning classifiers to process. Too large a pane may contain a lot of useless background information, which also reduces accuracy to some extent, although the effect is small.

Type of Classifiers
In principle, gcForest can use any kind of classifier, and it combines various base classifiers to enhance model diversity. Unlike the original setting, this experiment adds LogisticRegression at the multi-grained scanning stage, with the sliding pane size set to 84; the parameter settings of the classifiers are listed in Table 7. Table 8 gives the accuracy of each classifier. RandomForestClassifier and ExtraTreesClassifier achieve high accuracy on the training set but slightly lower accuracy on the test set, whereas LogisticRegression achieves better accuracy on both sets. However, in the multi-grained scanning phase, LogisticRegression significantly increases the memory footprint and reduces training speed, so it is not used on a large scale.

Size of Classifier
The parameter adjustment of the classifier focuses mainly on the number of estimators and the maximum depth, which affect both classification accuracy and training speed. Five groups of parameters are used in this experiment to observe their effects, scaled up and down proportionally around n_estimators = 500, max_depth = 100. Table 9 shows their influence on accuracy, and Figure 10 shows the corresponding time consumption. The results show that for a given classifier, varying the parameters has little effect on classification accuracy, but different classifiers with the same parameters differ considerably in accuracy. ExtraTreesClassifier takes less time and performs better than RandomForestClassifier. Provided that classification accuracy, especially on the test set, is maintained, the final model at this stage uses the parameters with the shorter training time. Finally, n_estimators = 500, max_depth = 100 is chosen for RandomForestClassifier and n_estimators = 600, max_depth = 120 for ExtraTreesClassifier. Combined with the previous discussion, the classifiers selected for the multi-grained scanning stage and their parameter settings are listed in Table 10. As can be seen from Table 17 in Section 4.3, the improvement of multi-grained scanning plays a significant role in shortening the training time.
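The forest settings chosen in this paragraph can be recorded as a small configuration. The dict layout and both helpers are a hypothetical convention (not the schema of any gcForest release) and cover only the two forests named above, not every entry of Table 10; `build_estimators` assumes scikit-learn is installed and instantiates `sklearn.ensemble.RandomForestClassifier` and `ExtraTreesClassifier` by name.

```python
# Scanning-stage forest settings from the text; the dict layout is a hypothetical convention.
SCAN_ESTIMATORS = [
    {"type": "RandomForestClassifier", "n_estimators": 500, "max_depth": 100},
    {"type": "ExtraTreesClassifier", "n_estimators": 600, "max_depth": 120},
]

def split_config(cfg):
    """Separate the estimator class name from its keyword arguments."""
    params = {key: value for key, value in cfg.items() if key != "type"}
    return cfg["type"], params

def build_estimators(configs, n_jobs=-1):
    """Instantiate the forests with scikit-learn (imported lazily; must be installed)."""
    from sklearn import ensemble
    models = []
    for cfg in configs:
        name, params = split_config(cfg)
        models.append(getattr(ensemble, name)(n_jobs=n_jobs, **params))
    return models
```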

Factors Affecting Cascade
The cascade forest takes the predicted class probabilities produced by multi-grained scanning as newly extracted features for training the classifiers. This section discusses in detail the effects of the type, parameter settings, and number of classifiers inside the cascade forest.

Type of Classifier
According to ensemble learning theory, introducing diverse individual classifiers improves the overall performance of the ensemble, which is also the foundation of the gcForest algorithm. Therefore, a new classifier, SGDClassifier, is introduced to improve the performance of gcForest; its parameter setting is shown in Table 11. The average classification accuracy of each layer is shown in Table 12: the first five rows list the accuracies of the built-in classifiers, and the last row gives that of gcForest configured with these parameters. Figures 11 and 12 record the changes in training and test accuracy at each layer for the built-in classifiers and gcForest. In the training stage, SGDClassifier performs worse than the other four base classifiers, but in the test stage it achieves the best accuracy. Conversely, ExtraTreesClassifier and RandomForestClassifier, which performed better in training, turned out to be worse in testing. gcForest integrates these five kinds of classifiers and achieves high accuracy and stable performance in both stages.
Because the classification accuracies of ExtraTreesClassifier and RandomForestClassifier fall short of those of the other three classifiers and of gcForest, an experiment was carried out with a new classifier combination that uses all of the above classifiers except ExtraTreesClassifier and RandomForestClassifier. With this combination, gcForest achieved 98.43% accuracy on the training set and 95.84% on the test set, only slightly better than the result above, which was obtained without parameter optimization. To some extent, this result also shows that increasing the diversity of classifiers in gcForest is beneficial to its performance: the classifiers in the network support each other and play complementary roles.

Parameter Selection of Built-In Classifiers
This part examines the effects of classifier size and loss function selection on the experimental results from three perspectives.
i. Size of Two Forest Classifiers
As in the experiment of the previous section, several combinations of parameters are selected to compare their effects on the test set. Their influence on classification accuracy is shown in Figures 13 and 14, where different numbers of estimators are represented by different colors (with the corresponding descriptions listed below the figures) and the max depths corresponding to each number of estimators are listed on the Y-axis. These parameter settings have only a slight effect on accuracy, while training time increases with the number of estimators and the max depth. Excluding the first two parameter sets, the average time per layer is 33.4 s for RandomForestClassifier and 14.2 s for ExtraTreesClassifier, which indicates that ExtraTreesClassifier runs faster. Considering time consumption and the slight differences in accuracy, max_depth = 100, n_estimators = 600 is selected for RandomForestClassifier and max_depth = 100, n_estimators = 500 for ExtraTreesClassifier in the following experiments.
ii. Size of XGBClassifier
The performance of XGBClassifier is verified in three groups according to the number of estimators (500, 750, and 1000), distinguished by three colors in Figure 15. For each group, three max depths are tested: 5, 6, and 10. Figure 15 shows the max depth required for each group to obtain the best classification result; the bars show the accuracy and the line shows the average training time. The training time of XGBClassifier depends closely on the number of estimators: classifiers with the same number of estimators but different max depths take almost the same time. Therefore, the training times of classifiers with the same number of estimators are averaged to represent the time cost.
The combined chart shows that classification accuracy can be improved by using more estimators, but more estimators also take more time. Considering both accuracy and time consumption, the candidate parameters are max_depth = 10, n_estimators = 750 and max_depth = 5, n_estimators = 500.
iii. Loss Function of SGDClassifier
SGDClassifier is a simple and effective classification method. Because stochastic gradient descent is highly sensitive to the scale of the data, its parameters must be tuned. In this experiment, the candidate loss functions are modified_huber and log. modified_huber is a smoothed version of the hinge loss (the hinge loss yields a linear support vector machine), and log yields logistic regression. With modified_huber, the model parameters are updated only when a sample violates the margin constraint. Table 13 shows that modified_huber is less accurate than log on both the training and test sets, so log is selected as the final loss function of the SGDClassifier.
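The difference between the two losses can be seen by writing them as functions of the margin z = y·f(x) with labels y ∈ {−1, +1}. The pure-Python sketch below uses the standard definitions (modified Huber: max(0, 1 − z)² for z ≥ −1 and −4z otherwise; logistic: log(1 + e^(−z))) and shows that the modified Huber gradient is exactly zero once z ≥ 1, which is why that loss updates the parameters only on margin violations. Recent scikit-learn releases spell the logistic option `loss="log_loss"`; the older `loss="log"` name used in the text has been deprecated.

```python
import math

def modified_huber(z):
    """Modified Huber loss of the margin z = y * f(x)."""
    return max(0.0, 1.0 - z) ** 2 if z >= -1.0 else -4.0 * z

def modified_huber_grad(z):
    """d loss / dz: exactly zero once the margin constraint z >= 1 holds."""
    if z >= 1.0:
        return 0.0
    return -2.0 * (1.0 - z) if z >= -1.0 else -4.0

def log_loss(z):
    """Logistic loss log(1 + exp(-z)), written to avoid overflow."""
    return max(0.0, -z) + math.log1p(math.exp(-abs(z)))

def log_loss_grad(z):
    """d loss / dz = -1 / (1 + exp(z)); never exactly zero."""
    if z >= 0:
        e = math.exp(-z)
        return -e / (1.0 + e)
    return -1.0 / (1.0 + math.exp(z))

for z in (-2.0, 0.0, 0.5, 2.0):
    print(f"z={z:+.1f}  mhuber={modified_huber(z):.3f} grad={modified_huber_grad(z):+.3f}  "
          f"log={log_loss(z):.3f} grad={log_loss_grad(z):+.3f}")
```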

Number of Classifiers
The structure of gcForest shows that the type and number of classifiers used in the cascade forest affect the results. The accuracy and generalization performance of gcForest improve as the number of individual classifiers in the model increases, but too many individual classifiers reduce the diversity among classifiers, which degrades performance and increases the computational cost. In addition, if the cascade forest contains only a few classifiers, it produces few feature vectors, and these are swamped by the large number of features generated by multi-grained scanning. It is therefore particularly important to choose the number and types of classifiers appropriately.
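A back-of-the-envelope count makes the imbalance concrete. Each cascade classifier contributes one k-dimensional class vector, whereas each scanning classifier contributes k·n² values before pooling. The sketch below ignores pooling and assumes a step p = 1, so with d = 96 and m = 84 (the pane size used earlier) n = 13; the classifier counts are illustrative only.

```python
def cascade_feature_share(n_scan_clfs, n_cascade_clfs, k, n):
    """Fraction of a cascade layer's input that comes from cascade class vectors."""
    scan_dim = n_scan_clfs * k * n * n      # unpooled multi-grained scanning features
    cascade_dim = n_cascade_clfs * k        # one k-dim class vector per cascade classifier
    return cascade_dim / (scan_dim + cascade_dim)

d, m, p = 96, 84, 1                         # p = 1 is an assumed step size
n = (d - m) // p + 1                        # 13 windows per side
share = cascade_feature_share(n_scan_clfs=2, n_cascade_clfs=4, k=10, n=n)
print(f"n = {n}, cascade share = {share:.1%}")
```

With two scanning forests, four cascade classifiers, and k = 10 classes, the class vectors make up 40 of 3,420 features, about 1.2%.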
The analysis in the previous two sections provides a reliable basis for adjusting the number and types of classifiers. Because XGBClassifier performs well in gcForest, its number is increased when selecting the base classifiers. The number of each classifier in each combination is given in Table 14, their parameters are listed in Table 15, and the classification accuracies of the different combinations are shown in Figure 16. To reach the best outcome for any data size, gcForest determines its number of training layers automatically. The overall training time of gcForest therefore cannot be determined in advance and is not the focus here, so only the average training time per layer is recorded.
The results show that a suitable parameter selection can greatly improve classification accuracy. Moreover, moderately increasing the number of XGBClassifiers in the cascade forest improves classification accuracy to a certain extent, but it also increases the training time per layer.

Results Contrast
The final parameters of the gcForest model are listed in Table 16. Tables 4 and 17 compare the classification results and training times before and after optimization. Table 4 shows that parameter optimization greatly improves classification accuracy. Furthermore, on the non-augmented MSTAR dataset, the optimized and improved gcForests are also more accurate than methods proposed in recent years. Because gcForest terminates training automatically, the number of layers and the training time of the cascade forest are not fixed in advance. Therefore, although the total time consumption of the cascade forest deserves attention, the time consumption per layer is more important. Since the optimized gcForest, distance gcForest, and overlap gcForest use the same features generated by multi-grained scanning, they have the same training time at the multi-grained scanning stage. Although the two improved algorithms take slightly more time, they improve classification accuracy, especially for BMP2 and BTR60.

Conclusions
In this paper, a novel deep forest method implemented by gcForest is used to explore SAR target classification. This is the first time this method has been applied to SAR target classification, and it achieved satisfactory results. At present, most NN classification methods need to expand the dataset by rotation, stretching, and arbitrary clipping to improve their classification accuracy, whereas the experimental results in this paper use no artificially expanded dataset. gcForest is not sensitive to hyper-parameters and achieves a good classification effect even with the original parameters. Nevertheless, as the experiments in this paper show, a more suitable selection of parameters improves classification accuracy for a specific task. Compared with NN methods, gcForest requires far less computation, trains much faster, and its working process is much easier to understand. The gcForest model automatically determines the optimal number of layers and terminates training. In addition, this paper proposes two preliminary improved methods, distance gcForest and overlap gcForest, which are tailored to the data characteristics and yield further improvements. Compared with several classification methods evaluated without dataset enlargement, the optimized gcForest and the improved gcForests achieve the best accuracy. In future studies, more targeted improvements will be developed for SAR target characteristics and applied to a wider range of extended operating conditions.