Multi-Pixel Simultaneous Classification of PolSAR Image Using Convolutional Neural Networks

Convolutional neural networks (CNNs) have achieved great success in optical image processing. Because of their excellent performance, more and more CNN-based methods are being applied to polarimetric synthetic aperture radar (PolSAR) image classification. Most CNN-based PolSAR image classification methods can classify only one pixel at a time. Because all the pixels of a PolSAR image are classified independently, the inherent interrelation of different land covers is ignored. We use a fixed-feature-size CNN (FFS-CNN) to classify all pixels in a patch simultaneously. The proposed method has two advantages. First, FFS-CNN classifies all the pixels in a small patch simultaneously, so it is faster than common CNNs when classifying a whole PolSAR image. Second, FFS-CNN is trained to learn the interrelation of different land covers in a patch, so it can use this interrelation to improve the classification results. FFS-CNN is evaluated on a Chinese Gaofen-3 PolSAR image and two other real PolSAR images. Experimental results show that FFS-CNN is comparable with state-of-the-art PolSAR image classification methods.


Introduction
Synthetic aperture radar (SAR) is one of the most important means of earth observation. It can operate under all weather conditions, covers a large area and has a certain penetration capability. Modern SAR systems can provide polarimetric SAR (PolSAR) images by transmitting and receiving fully polarized radar waves [1]. In recent years, PolSAR has developed rapidly in China. With the launch of the Chinese Gaofen-3 (GF-3) satellite on 10 August 2016, China's earth observation capability has improved significantly. GF-3 carries a C-band SAR sensor with multiple polarizations and operates in 12 different imaging modes, so it can provide single-, dual- and quad-polarization images. GF-3 is expected to greatly support research on SAR image processing in the next few years.
PolSAR image classification, in which each pixel of a PolSAR image is assigned to one class, is one of the most important applications of PolSAR image processing. It plays an important role in urban planning, agriculture, disaster prevention and other fields [2][3][4]. Methods for PolSAR image classification fall into two main categories: traditional statistical modeling [5,6] and machine learning. For a long time, machine learning methods for PolSAR image classification were mainly non-neural methods [7], such as the support vector machine (SVM) [8] and random forest [9]. These methods have achieved good results [10,11], but their classification accuracy depends on how discriminative the feature representation is, and the features usually have to be designed and tuned manually. The handcrafted features need a long time

Framework of the CNN
Typically, a CNN is a stack of convolutional layers, pooling layers and fully connected layers. All the layers are connected in series, and the input of each layer is the output of the previous layer. The input of the first layer is an image or low-level features. Because of this deep, layered structure, the CNN can extract high-level features from low-level features. The convolutional layer convolves the feature maps of the previous layer with learnable kernels and passes the results through an activation function to generate the output feature maps [30]:

$$x_j^l = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\right)$$

where $M_j$ denotes the set of input feature maps, $k_{ij}^l$ denotes the convolutional kernel, $b_j^l$ denotes the bias and $x_j^l$ denotes the output feature map. $f(\cdot)$ is the nonlinear activation function, such as the sigmoid function or the Rectified Linear Unit (ReLU) [31].
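To make the convolution equation concrete, here is a minimal NumPy sketch of one convolutional layer. It assumes every output map is connected to all input maps (so $M_j$ contains every input map), ReLU as $f$, stride 1 and no padding. Like Caffe and most other frameworks, it actually computes cross-correlation (the kernel is not flipped), which makes no difference for learned kernels. The function name `conv_layer` is illustrative.

```python
import numpy as np

def conv_layer(x_prev, kernels, biases, f=lambda v: np.maximum(v, 0.0)):
    """One convolutional layer: x_j = f(sum_i x_i * k_ij + b_j).

    x_prev:  (C_in, H, W) input feature maps x_i^{l-1}
    kernels: (C_out, C_in, K, K) learnable kernels k_ij^l
    biases:  (C_out,) biases b_j^l
    Stride 1, no padding ('valid'), so the output is (C_out, H-K+1, W-K+1).
    """
    c_out, c_in, k, _ = kernels.shape
    _, h, w = x_prev.shape
    out = np.empty((c_out, h - k + 1, w - k + 1))
    for j in range(c_out):                      # each output map x_j^l
        acc = np.full(out.shape[1:], biases[j], dtype=float)
        for i in range(c_in):                   # sum over input maps i in M_j
            for r in range(out.shape[1]):
                for c in range(out.shape[2]):
                    window = x_prev[i, r:r + k, c:c + k]
                    acc[r, c] += np.sum(window * kernels[j, i])
        out[j] = f(acc)                         # nonlinear activation (ReLU by default)
    return out
```

The explicit loops mirror the summation in the equation term by term; real frameworks use much faster im2col or FFT-based implementations of the same operation.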
The convolutional layer is usually followed by a pooling layer. The pooling layer reduces the dimension of the feature maps and helps prevent overfitting. It computes one value from each local window of the input feature map, and different pooling layers use different rules to do so. The most commonly used is max pooling, which outputs the maximum value of each local window. Other pooling layers include average pooling, stochastic pooling and Spatial Pyramid Pooling (SPP) [32].
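Max and average pooling can be illustrated with a short NumPy sketch. The 2 × 2 window with stride 2 is only an example setting, and incomplete windows at the border are dropped (floor rounding); frameworks differ on this point, and Caffe, for instance, rounds the output size up.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Pool each channel of x, shape (C, H, W), over size x size windows.
    Windows that would extend past the border are dropped (floor rounding)."""
    c, h, w = x.shape
    oh, ow = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.empty((c, oh, ow))
    reduce = np.max if mode == "max" else np.mean   # max or average pooling
    for r in range(oh):
        for col in range(ow):
            window = x[:, r * stride:r * stride + size, col * stride:col * stride + size]
            out[:, r, col] = reduce(window, axis=(1, 2))
    return out
```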
The fully connected layer is usually at the top of a CNN. It multiplies the input features by learnable weights to generate the output features:

$$x_j^l = f\left(\sum_{i \in M_j} w_{ij}^l x_i^{l-1} + b_j^l\right)$$

where $M_j$ is the set of input features, $x_j^l$ is the output feature, $w_{ij}^l$ are the learnable weights and $b_j^l$ is the bias. $f(\cdot)$ is again the nonlinear activation function. In classification tasks, the last fully connected layer outputs a 1D feature vector and passes it to the softmax layer; the dimension of this vector equals the number of classes.
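The fully connected equation is a matrix-vector product followed by the activation. In this sketch the input features are flattened first and ReLU is assumed for $f$; storing $w_{ij}$ as `weights[i, j]` is a layout chosen for illustration.

```python
import numpy as np

def fully_connected(x_prev, weights, bias, f=lambda v: np.maximum(v, 0.0)):
    """x_j^l = f(sum_i w_ij^l x_i^{l-1} + b_j^l).

    x_prev:  input features of any shape, flattened to a vector of length N
    weights: (N, M) matrix with weights[i, j] = w_ij
    bias:    (M,) biases b_j
    """
    v = np.ravel(x_prev)            # feature maps -> 1D vector
    return f(v @ weights + bias)
```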
In classification tasks, the softmax layer is the classifier of a CNN. It is defined as

$$\sigma_i(z) = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}}, \quad i = 1, \dots, n$$

where $z_i$ is the $i$-th output of the previous layer and $n$ is the number of classes. $\sigma_i(z)$ is nonnegative and the values sum to one over all classes, so it can be read as the probability of class $i$. The softmax layer calculates the probabilities of all classes, and the class with the maximum probability is the final classification result.

The most commonly used CNN for PolSAR image classification is Lenet-5 [25] or an improved Lenet-5 [26,27]. Very deep CNNs such as AlexNet [13], VGGNet [33], GoogLeNet [34] and ResNet [35] are designed for images of large size. CNNs need many samples to train their weights, but the PolSAR images available from a single SAR sensor are usually insufficient. In the PolSAR image classification field, the PolSAR images are therefore split into a large number of patches, and some patches are randomly selected as the training samples. The patches are usually small, such as 7 × 7, 9 × 9 or 15 × 15. When such small patches pass through multiple convolutional and pooling layers, the final feature maps become even smaller and may shrink to zero size. Therefore, considerable time is needed to tune the parameters of the very deep CNNs mentioned above to meet the requirements of PolSAR image classification. Lenet-5 is composed of 2 convolutional layers, 2 pooling layers, 2 fully connected layers and a softmax layer, as shown in Figure 1.

In optical image processing, some CNN models output pixel-wise predictions for all the pixels of an image simultaneously. Take [14] as an example: the authors proposed fully convolutional networks (FCN) for semantic segmentation. With a classifier adapted to dense prediction, FCN outputs a pixel-wise two-dimensional class map of an input image.
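The softmax definition translates into a few lines of NumPy. Subtracting the maximum score before exponentiating leaves $\sigma_i(z)$ unchanged (the common factor cancels in the ratio) but prevents overflow for large scores.

```python
import numpy as np

def softmax(z, axis=-1):
    """sigma_i(z) = exp(z_i) / sum_j exp(z_j), computed along `axis`."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - np.max(z, axis=axis, keepdims=True))   # stable: largest exponent is 0
    return e / np.sum(e, axis=axis, keepdims=True)

scores = np.array([2.0, 1.0, 0.1])       # outputs z of the previous layer, 3 classes
probs = softmax(scores)
predicted_class = int(np.argmax(probs))  # class with the maximum probability
```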
Generally, most CNN models for classification, including Lenet-5 and FCN, can be divided into two parts: the feature extraction part and the classifier. The architecture of the feature extraction part differs from model to model, as in Lenet-5, AlexNet and ResNet. The most commonly used classifier for CNNs is softmax. Softmax can output a single class, as in Lenet-5, or a two-dimensional map of classes, as in FCN. FFS-CNN can likewise be divided into a feature extraction part and a classifier. The feature extraction part of FFS-CNN borrows from Lenet-5, and the classifier of FFS-CNN is the same as that of FCN. The detailed structure of FFS-CNN is described in Section 2.2.

Fixed-Feature-Size Convolutional Neural Networks
The structure of FFS-CNN is shown in Figure 2. Because the input patches are small, the feature extraction part of FFS-CNN should be as simple as that of Lenet-5. It contains 4 convolutional layers, 2 fully connected layers and a reshape layer. The classifier of FFS-CNN is softmax, which produces pixel-wise predictions for all pixels in a patch. Each convolutional layer has a 3 × 3 kernel, a stride of 1 and a padding of 1, so its output feature maps have the same size as its input feature maps. The size of the input patches is denoted by w × w, and each patch has 9 channels. First, the input patches pass through the 4 convolutional layers; the output feature maps of the fourth convolutional layer are w × w with 100 channels. The feature maps then pass through the 2 fully connected layers, which output a 1D vector of length w × w × n, where n denotes the number of classes. To match the input format of the softmax layer and classify all pixels simultaneously, the 1D feature vector is reshaped into a 2D feature matrix of size n × (w × w), in which each of the w × w columns holds the n class scores of one pixel. Finally, the softmax layer uses this matrix to calculate the probability of each class for each of the w × w pixels, and the class with the maximum probability is the classification result of that pixel.
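The layer sequence described above can be traced with the following NumPy sketch of an FFS-CNN forward pass using random, untrained weights. Only the 3 × 3 / stride 1 / pad 1 convolutions, the 9 input channels, the 100 channels of the fourth convolutional layer, the w × w × n output of the second fully connected layer, the reshape and the per-pixel softmax come from the text. The channel widths of conv1 to conv3 (32, 32, 64), the 128-unit first fully connected layer, the ReLU activations and the initialization are hypothetical choices for illustration; this is not the authors' Caffe model.

```python
import numpy as np

def conv3x3_same(x, k, b):
    """3x3 convolution, stride 1, zero padding 1, so H x W is preserved.
    x: (C_in, H, W); k: (C_out, C_in, 3, 3); b: (C_out,)."""
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((k.shape[0], h, w), dtype=x.dtype)
    for dy in range(3):
        for dx in range(3):
            # contribution of kernel tap (dy, dx) at every output position
            out += np.einsum("oc,chw->ohw", k[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + w])
    return out + b[:, None, None]

def relu(v):
    return np.maximum(v, 0)

def init_ffs_cnn(w, n_classes, channels=(9, 32, 32, 64, 100), hidden=128, seed=0):
    """Random weights; widths other than 9 (input) and 100 (conv4) are hypothetical."""
    rng = np.random.default_rng(seed)
    conv = [(rng.normal(0, 0.05, (co, ci, 3, 3)).astype(np.float32), np.zeros(co, np.float32))
            for ci, co in zip(channels[:-1], channels[1:])]
    fc1 = (rng.normal(0, 0.01, (hidden, channels[-1] * w * w)).astype(np.float32),
           np.zeros(hidden, np.float32))
    fc2 = (rng.normal(0, 0.01, (n_classes * w * w, hidden)).astype(np.float32),
           np.zeros(n_classes * w * w, np.float32))
    return {"conv": conv, "fc1": fc1, "fc2": fc2}

def ffs_cnn_forward(patch, params, n_classes):
    """patch: (9, w, w) -> per-pixel class probabilities, shape (n_classes, w, w)."""
    x = patch
    for k, b in params["conv"]:
        x = relu(conv3x3_same(x, k, b))                 # feature size stays w x w
    v = relu(params["fc1"][0] @ x.ravel() + params["fc1"][1])
    v = params["fc2"][0] @ v + params["fc2"][1]         # 1D vector of length w*w*n
    w = patch.shape[-1]
    scores = v.reshape(n_classes, w, w)                 # reshape layer: n x (w x w)
    e = np.exp(scores - scores.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)             # softmax over classes, per pixel

w, n = 15, 4
params = init_ffs_cnn(w, n)
patch = np.random.default_rng(1).normal(size=(9, w, w)).astype(np.float32)
probs = ffs_cnn_forward(patch, params, n)
labels = probs.argmax(axis=0)                           # w x w class map for the patch
```

Because every convolution preserves the w × w size, only the two fully connected layers grow with w; with these hypothetical widths the first fully connected layer alone holds 128 × 100 × 15 × 15 = 2,880,000 weights.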
There are three kinds of input data, which are training samples, testing samples and the data for classification. In Figure 2, the red arrows and blue arrows show the training procedure and green arrows show the classification procedure. The training samples and their labels are used to train the weights of all layers through back propagation (BP) and gradient descent. The testing samples and their labels are checked during training to monitor the progress and coarse accuracy of the model, but are never used for gradient descent. In the procedure of classification, all patches from a PolSAR image are input to the trained model to get the classification results of the pixels in the image. In the training procedure, the image patch and the labels of all pixels in the patch are used to train the FFS-CNN.
In this way, FFS-CNN can learn the interrelation of land covers in the patch. For example, if the neighboring pixels of a pixel are all water, then this pixel is most probably water as well. As mentioned in Section 2.1, the feature extraction part of FFS-CNN borrows from Lenet-5. To keep the feature size unchanged, the two pooling layers of Lenet-5 are replaced by convolutional layers, and the parameters of the convolutional and fully connected layers are specially designed. Following [14], where the softmax layer is used to classify multiple pixels, FFS-CNN also uses the softmax layer to classify the pixels in a patch simultaneously.
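The text does not state the training loss explicitly. A natural assumption for training on the labels of all pixels in a patch is the per-pixel cross-entropy averaged over the patch, which is the kind of loss Caffe's SoftmaxWithLoss layer computes when its input keeps spatial dimensions. The sketch below illustrates that assumed loss; `ignore_label` is a hypothetical way to skip unlabeled ground-truth pixels.

```python
import numpy as np

def multi_pixel_cross_entropy(probs, labels, ignore_label=None):
    """Mean negative log-likelihood over the pixels of one patch.

    probs:  (n_classes, w, w) softmax output of the model
    labels: (w, w) integer ground-truth class of every pixel
    Pixels whose label equals ignore_label are excluded from the mean.
    """
    n, h, w = probs.shape
    labels = np.asarray(labels)
    rows, cols = np.indices((h, w))
    mask = np.ones((h, w), dtype=bool) if ignore_label is None else labels != ignore_label
    safe = np.where(mask, labels, 0)            # valid index for ignored pixels (masked out below)
    p_true = probs[safe, rows, cols]            # probability of each pixel's true class
    return float(-np.mean(np.log(np.clip(p_true[mask], 1e-12, 1.0))))
```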
The FFS-CNN has two main characteristics:
1. FFS-CNN classifies multiple pixels in a patch simultaneously, as illustrated in Figure 3. A patch contains at most n land covers, where n is the number of classes, so classifying its w × w pixels into no more than n classes is a manageable task for a CNN. Because FFS-CNN classifies multiple pixels simultaneously and its structure is simple, it is much faster than Lenet-5 when classifying a whole PolSAR image.
2. FFS-CNN can use the interrelation of different land covers. In the training procedure, FFS-CNN uses the labels of the pixels in a patch to learn this interrelation. In the classification procedure, the learned interrelation is used to predict the classes of the pixels in a patch.

Based on the architecture of FFS-CNN, the pixels in a patch can be classified simultaneously, so a sliding window is used to classify the entire PolSAR image. In this paper, the window slides w/4 pixels at a time. In other words, about 3/4 of the pixels in each window overlap the previous window and are classified again. Figure 4 shows this 3/4 overlap classification strategy. Because each pixel of a PolSAR image is classified multiple times, its class probabilities are averaged, and the class with the maximum average probability is taken as the final classification result. Similarly, in the video activity recognition section of [36], the authors averaged the label probabilities across all frames of a video to choose the most probable label.
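The overlapped sliding-window strategy can be sketched as follows: the probabilities from every window covering a pixel are summed, divided by the number of windows, and the class with the highest average is chosen. `predict_patch` stands for any trained multi-pixel model and is hypothetical here. Two details are assumptions: the stride is `round(w * (1 - overlap))`, since w/4 = 3.75 is not an integer for w = 15, and a final window is aligned with the image border so that every pixel is covered.

```python
import numpy as np

def window_starts(length, w, stride):
    """Window offsets along one axis; the last window is aligned to the image edge."""
    starts = list(range(0, length - w + 1, stride))
    if starts[-1] != length - w:
        starts.append(length - w)
    return starts

def sliding_window_classify(image, predict_patch, w, n_classes, overlap=0.75):
    """image: (C, H, W) with H, W >= w; predict_patch(patch) -> (n_classes, w, w).
    Returns the (H, W) label map and the number of windows covering each pixel."""
    _, h, wd = image.shape
    stride = max(1, int(round(w * (1 - overlap))))
    prob_sum = np.zeros((n_classes, h, wd))
    counts = np.zeros((h, wd))
    for r in window_starts(h, w, stride):
        for c in window_starts(wd, w, stride):
            prob_sum[:, r:r + w, c:c + w] += predict_patch(image[:, r:r + w, c:c + w])
            counts[r:r + w, c:c + w] += 1
    mean_prob = prob_sum / counts              # every pixel is covered at least once
    return mean_prob.argmax(axis=0), counts
```

With a 15 × 15 window and an overlap of 3/4, this rounding gives a stride of 4, so an interior pixel is covered by between 9 and 16 windows whose predictions are averaged.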

Input Data of FFS-CNN
A PolSAR image can be expressed by the polarimetric coherency matrix $T_3$, which has the following form:

$$T_3 = \begin{bmatrix} T_{11} & T_{12} & T_{13} \\ T_{21} & T_{22} & T_{23} \\ T_{31} & T_{32} & T_{33} \end{bmatrix}$$

where $T_{11}$, $T_{22}$ and $T_{33}$ are real numbers and the other elements are complex numbers. $T_{12}$ is the complex conjugate of $T_{21}$, $T_{13}$ is the complex conjugate of $T_{31}$, and $T_{23}$ is the complex conjugate of $T_{32}$. To make full use of the polarimetric information, the matrix $T_3$ is used to generate the input data of FFS-CNN. For each pixel, the polarimetric data are defined as a vector $t_p$ with 9 elements.
The vectors $t_p$ of all the pixels in a patch are then stacked into a 9 × w × w array that serves as the input of FFS-CNN, as shown in Figure 5. Each of the 9 channels is normalized.
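The sketch below builds the 9 × w × w input from a field of coherency matrices. The text does not list the elements of $t_p$; here we assume one natural choice: the three real diagonal elements plus the real and imaginary parts of $T_{12}$, $T_{13}$ and $T_{23}$, which gives 9 values because the lower off-diagonal elements are conjugates and add no information. The normalization method is also unspecified, so per-channel z-score normalization over the whole image is assumed.

```python
import numpy as np

def t3_to_features(T3):
    """T3: complex array (H, W, 3, 3) of coherency matrices -> real array (9, H, W).
    Assumed channel order: T11, T22, T33, Re/Im T12, Re/Im T13, Re/Im T23."""
    chans = [T3[..., 0, 0].real, T3[..., 1, 1].real, T3[..., 2, 2].real]
    for a, b in [(0, 1), (0, 2), (1, 2)]:       # upper off-diagonal elements only
        chans += [T3[..., a, b].real, T3[..., a, b].imag]
    return np.stack(chans, axis=0)

def normalize_channels(x, eps=1e-12):
    """Assumed normalization: zero mean and unit variance per channel over the image."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True)
    return (x - mean) / (std + eps)

def extract_patch(features, row, col, w):
    """9 x w x w input patch whose top-left pixel is (row, col)."""
    return features[:, row:row + w, col:col + w]
```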

Materials
Three real PolSAR datasets, including two spaceborne PolSAR images and one airborne PolSAR image, are used to verify the performance of FFS-CNN. The detailed data information is presented in Table 1.

RADARSAT-2 Flevoland Dataset
This spaceborne dataset was acquired by the C-band RADARSAT-2 (RS-2) PolSAR system in fine quad-pol mode. It covers Flevoland in the Netherlands, with an image size of 1400 × 1200 pixels. The spatial resolution is 12 m in the range direction and 8 m in the azimuth direction. Four classes are identified in this dataset: water, forest, farmland and buildings. Figure 6a shows the Pauli RGB image. Figure 6b shows the ground truth map, which was created manually based on very high resolution optical images.

AIRSAR Flevoland Dataset
The airborne dataset is NASA/JPL AIRSAR L-band four-look fully polarimetric data. The Pauli color-coded image is shown in Figure 6e. This scene also covers Flevoland, the Netherlands, with an image size of 750 × 1024 pixels and a spatial resolution of 6 × 12 m. Since [37], this dataset has been widely used in land cover classification with a well-established ground truth map, which is shown in Figure 6f. A total of 11 classes are identified: eight crop classes and three other classes, namely bare soil, water and forest.

Gaofen-3 Wuhan Dataset
The other spaceborne dataset was acquired by the C-band GF-3 PolSAR system in quad-polarized strip I (QPSI) mode. The scene used in this paper covers a local area of Wuhan, China, with an image size of 1050 × 1000 pixels and a spatial resolution of 5.20 m in the range direction and 2.25 m in the azimuth direction. It has four classes: water, forest, farmland and buildings. The Pauli RGB image and ground truth map are shown in Figure 6c,d. The ground truth map was labeled manually according to the high resolution optical image shown in Figure 7.
For each PolSAR image, a w × w sliding window is used to generate a large number of image patches, which serve as the training and testing samples. For each dataset, the training and testing samples are selected randomly from the generated patches, and the numbers of training and testing samples are slightly different. The numbers of samples for the three datasets are listed in Table 1.
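Patch generation and random sample selection can be sketched in a few lines of standard-library Python. The stride of 1 used to enumerate patches and the fixed seed are assumptions; the text only says that the patches come from a w × w sliding window and that the samples are drawn randomly.

```python
import random

def patch_origins(height, width, w, stride=1):
    """Top-left corners of all w x w windows that fit inside the image."""
    return [(r, c) for r in range(0, height - w + 1, stride)
                   for c in range(0, width - w + 1, stride)]

def random_split(origins, n_train, n_test, seed=0):
    """Draw disjoint random training and testing patches from the generated ones."""
    rng = random.Random(seed)
    chosen = rng.sample(origins, n_train + n_test)   # sampling without replacement
    return chosen[:n_train], chosen[n_train:]
```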

Results
All the experiments in this paper were run on our deep learning acceleration computing service, which has an Intel i7-7700 CPU, an NVIDIA GTX 1080 Ti graphics card and 16 GB of RAM. The operating system is Ubuntu 16.10, and all the CNN models are trained with Caffe [38]. The size of the input patches, w, is set to 15. The overall accuracy (OA) and the kappa coefficient (kappa) are used to evaluate the performance of the models; both are calculated from the final classification results of each dataset.
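OA and kappa are both computed from the confusion matrix between the ground truth and the classification result. The sketch assumes that only pixels with a ground-truth label are evaluated; kappa is Cohen's kappa, $(p_o - p_e)/(1 - p_e)$, where $p_o$ is the OA and $p_e$ is the agreement expected by chance from the row and column totals.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows: ground-truth class, columns: predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(y_true).ravel(), np.asarray(y_pred).ravel()), 1)
    return cm

def overall_accuracy(cm):
    return np.trace(cm) / cm.sum()

def kappa(cm):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    total = cm.sum()
    p_o = np.trace(cm) / total
    p_e = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total ** 2
    return (p_o - p_e) / (1 - p_e)

cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
print(overall_accuracy(cm), kappa(cm))   # 0.75 0.5
```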

Results of RS-2 Flevoland
For RS-2 Flevoland, there are two kinds of training and testing samples, named Samples 1 and Samples 2. Samples 1 are randomly selected from all the patches, and the whole PolSAR image is used to evaluate the classification results. Samples 2 are randomly selected from the patches generated from the top half of the PolSAR image, and only the bottom half of the image is used for evaluation. Because no training samples come from the bottom half, the classification results there are completely independent of the training samples and show the generalization ability of the models more clearly. Samples 1 and Samples 2 contain the same number of samples. In [39], the authors used the RS-2 Flevoland dataset to evaluate their method. Because the ground truth map in [39] differs from ours, the result in [39] can be used as a reference but not as a benchmark for judging the performance of the models. Figure 8 and Table 2 show the classification results and accuracies on the RS-2 Flevoland dataset. For both kinds of training samples, the accuracies of all 4 classes are higher for FFS-CNN than for Lenet-5. With Samples 1, the OA of FFS-CNN is 3.44% higher than that of Lenet-5. With Samples 2, the OAs of both FFS-CNN and Lenet-5 decrease, but the OA of FFS-CNN is still 3.92% higher than that of Lenet-5. Hence, FFS-CNN learns a more discriminative feature representation than Lenet-5. The OAs of FFS-CNN are also higher than that of the method in [39].
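The spatially disjoint Samples 2 split can be expressed as a restriction on patch positions. This sketch assumes that a training patch must lie entirely inside the training region (the top half of the image for RS-2 Flevoland); the same hypothetical helper can select the right 1/5 of the columns for the GF-3 Wuhan split by changing the column range.

```python
def origins_in_region(w, row_start, row_end, col_start, col_end, stride=1):
    """Top-left corners of w x w windows lying entirely inside
    rows [row_start, row_end) and columns [col_start, col_end)."""
    return [(r, c) for r in range(row_start, row_end - w + 1, stride)
                   for c in range(col_start, col_end - w + 1, stride)]

# Example: training patches from the top half of a 1400 x 1200 image (w = 15);
# evaluation then uses only rows 700 and below.
top_half = origins_in_region(15, 0, 1400 // 2, 0, 1200)
```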

Results of AIRSAR Flevoland
Because the labeled land covers of AIRSAR Flevoland are irregular, the image cannot simply be divided into 2 parts for sample selection. All the training and testing samples are randomly selected from all the generated patches. Figure 9 and Table 3 show the classification results and accuracies on AIRSAR Flevoland. The accuracies of all 11 classes are higher for FFS-CNN than for Lenet-5, and the OA of FFS-CNN is 2.62% higher than that of Lenet-5. The AIRSAR Flevoland dataset is widely used in other papers, such as [26,27]. In [26], the authors proposed a dual-branch CNN and compared it with PauliRGB-CNN, which uses only the Pauli RGB image as input. In [27], the authors proposed a complex-valued CNN (CV-CNN) and compared it with a real-valued CNN (RV-CNN). The ground truth maps of AIRSAR Flevoland in [26,27] differ from ours, so their results can be used as a reference but not as a benchmark for judging the performance of the models. The results are shown in Table 4. The OA of FFS-CNN is the highest.

Results of GF-3 Wuhan
For the GF-3 Wuhan dataset, there are also two kinds of samples, named Samples 1 and Samples 2, as for the RS-2 Flevoland dataset. Samples 1 are selected from all the patches, and the whole PolSAR image is used to evaluate the classification result. Samples 2 are selected from the patches generated from the right 1/5 of the PolSAR image, and the left 4/5 of the image is used to evaluate the classification result. Samples 1 and Samples 2 contain the same number of samples.
The classification results and accuracies on GF-3 Wuhan are shown in Figure 10 and Table 5. With Samples 1, the accuracies of all 4 classes are much higher for FFS-CNN than for Lenet-5, and the OA of FFS-CNN is 7.00% higher than that of Lenet-5. With Samples 2, the accuracies of all 4 classes are also higher for FFS-CNN, and its OA is 4.85% higher than that of Lenet-5. This again shows that FFS-CNN learns a more discriminative feature representation than Lenet-5.

Discussion
The feature extraction part of FFS-CNN borrows from Lenet-5, yet FFS-CNN performs much better than Lenet-5. Moreover, with Samples 2 the evaluated regions of the PolSAR images contain no training samples, and FFS-CNN still outperforms Lenet-5. This shows that FFS-CNN learns a more discriminative feature representation. Three factors contribute to the good results of FFS-CNN. First, the feature size of all layers in FFS-CNN is fixed. Second, FFS-CNN is trained to use the interrelation of land covers in a patch. Third, a sliding-window classification strategy is used to classify a whole PolSAR image. The following subsections discuss how these three factors affect the classification accuracy of FFS-CNN.

Discussion on Feature Size
The feature size of every layer of FFS-CNN is unchanged. To compare its performance with that of a CNN whose feature size decreases, the second and fourth convolutional layers of FFS-CNN are changed to pooling layers. We call this network the decreased-feature-size CNN; its structure is shown in Figure 11. The layers in the red box are the same as in Lenet-5, while the layers in the blue boxes are the same as in FFS-CNN, so the decreased-feature-size CNN can also classify multiple pixels in a patch simultaneously. If the input patches are 15 × 15, the output features of the second pooling layer are 4 × 4. Because all the pixels are classified by the softmax layer, the input features of the softmax layer must be n × 15 × 15, so the first fully connected layer upsamples the features to a size of n × 15 × 15. The other experimental parameters are the same as in Section 3. The classification results are shown in Table 6. For all datasets, the OAs of FFS-CNN are much higher than those of the decreased-feature-size CNN. There are two reasons. First, because the feature size is kept fixed by using convolutional layers instead of pooling layers, FFS-CNN has more weights in its convolutional layers. Second, FFS-CNN does not need to upsample the features. As a result, FFS-CNN learns a more discriminative feature representation and achieves better classification results than the decreased-feature-size CNN.
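The 4 × 4 figure can be checked with the usual output-size formulas. This sketch assumes 2 × 2 pooling with stride 2 (the pooling kernel size is not stated in the text) and that the remaining convolutions keep the FFS-CNN setting of 3 × 3, stride 1, pad 1. Caffe's pooling layer rounds the output size up, which gives 15 → 8 → 4; with floor rounding (PyTorch's default) the same layers would give 15 → 7 → 3.

```python
import math

def conv_out(size, k=3, stride=1, pad=1):
    """Convolution output size (floor rounding)."""
    return (size + 2 * pad - k) // stride + 1

def caffe_pool_out(size, k=2, stride=2, pad=0):
    """Pooling output size with Caffe's ceil rounding."""
    return math.ceil((size + 2 * pad - k) / stride) + 1

w = 15
ffs = w
for _ in range(4):
    ffs = conv_out(ffs)          # FFS-CNN: every 3x3/pad-1 conv keeps 15 x 15

dec = conv_out(w)                # conv1: 15
dec = caffe_pool_out(dec)        # pooling in place of conv2: 8
dec = conv_out(dec)              # conv3: 8
dec = caffe_pool_out(dec)        # pooling in place of conv4: 4
```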

Discussion on Interrelation of Land Covers
FFS-CNN can use the interrelation of the land covers in a patch. In this additional experiment, special patches in which all pixels belong to the same land cover are randomly selected from all the generated patches as training samples. We call them one-class training samples. In this way, the effect of the interrelation of different land covers is removed. In Section 3, the pixels in a training patch may belong to different land covers, so those samples are called multi-class training samples. The other experimental parameters are the same as in Section 3. Table 7 shows the results of FFS-CNN and Lenet-5 for the two types of training data. For the RS-2 Flevoland dataset, when the models are trained with one-class training samples, the OA of Lenet-5 decreases by around 1% and the OA of FFS-CNN decreases by around 3%, so the OA of FFS-CNN decreases much more. Lenet-5 classifies one pixel at a time and does not use the interrelation of land covers, so its OA decreases only a little. Because the one-class training samples contain no interrelation of different land covers, FFS-CNN can neither learn this interrelation nor use it to classify the pixels in a patch; hence, its OA decreases considerably. For the AIRSAR Flevoland dataset, the OAs of Lenet-5 and FFS-CNN both decrease by around 1-2% when the models are trained with one-class training samples, and the OA of FFS-CNN still decreases more than that of Lenet-5. For the GF-3 Wuhan dataset, the OA of Lenet-5 increases but the OA of FFS-CNN decreases by 1.38% when the models are trained with one-class training samples. This again shows that the interrelation of different land covers improves the accuracy of FFS-CNN but does not benefit Lenet-5.
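One-class training samples can be separated from multi-class ones by checking whether every pixel label in a patch is identical. The sketch assumes that unlabeled pixels carry label 0 and that a patch containing any unlabeled pixel is not treated as one-class; both conventions are hypothetical.

```python
import numpy as np

def is_one_class(label_patch, unlabeled=0):
    """True if every pixel in the patch has the same, labeled class."""
    first = label_patch.flat[0]
    return bool(first != unlabeled) and bool(np.all(label_patch == first))

def split_by_type(label_map, origins, w, unlabeled=0):
    """Partition patch origins into one-class and multi-class (or partly unlabeled) patches."""
    one_class, multi_class = [], []
    for r, c in origins:
        patch = label_map[r:r + w, c:c + w]
        target = one_class if is_one_class(patch, unlabeled) else multi_class
        target.append((r, c))
    return one_class, multi_class
```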
The classification results of the three datasets show that FFS-CNN can use the interrelation of land covers to improve multi-pixel classification results.

Discussion on Overlap Ratio
In Sections 3, 4.1 and 4.2, the overlap ratio is 3/4. To discuss the effect of the overlap ratio, in this section the overlap ratio is set to 0, 1/4, 1/2, 3/4 and 7/8, where 0 means no overlap. The classification time is also recorded on the hardware and software platform of our deep learning acceleration computing service, with no special optimization, and only one patch is input to the models at a time. Table 8 lists the OAs of the three datasets. In particular, Figure 12 shows the classification results of a local area of RS-2 Flevoland, which shows the details of the classification results more clearly.

With no overlap, the OA of FFS-CNN on RS-2 Flevoland is almost the same as that of Lenet-5, but the classification result of FFS-CNN has a mosaic effect, which can be seen in Figure 12. For the AIRSAR Flevoland dataset, the OA of FFS-CNN is a little lower than that of Lenet-5. There are two reasons. First, Lenet-5 uses one input patch to predict the class of only one pixel, while FFS-CNN uses the same input to predict the classes of w × w pixels. Second, the performance of FFS-CNN depends on the interrelation of land covers in a patch, as discussed in Section 4.2. The labeled pixels of AIRSAR Flevoland are scattered, so FFS-CNN cannot learn enough of this interrelation, loses its edge over Lenet-5 and obtains a slightly lower OA. For the GF-3 Wuhan dataset, the OA of FFS-CNN is much higher than that of Lenet-5. With an overlap ratio of 1/2, the classification results of FFS-CNN improve significantly and are all much better than those of Lenet-5, and the mosaic effect almost disappears. With an overlap ratio of 3/4, the OAs of FFS-CNN increase further and the classification maps are very fine. Table 9 lists the classification times for the three datasets.
Two observations follow. First, when the overlap ratio is 1/2, the OAs of FFS-CNN are much higher than those of Lenet-5, while the classification times of FFS-CNN are almost ten times shorter, so FFS-CNN is much faster than Lenet-5. Second, when the overlap ratio increases from 0 to 3/4, the OAs increase dramatically and the classification time also increases considerably, yet the classification times of FFS-CNN are still about half of those of Lenet-5. When the overlap ratio increases to 7/8, the OAs increase only slightly, but the classification times increase almost tenfold. Therefore, if classification speed has priority, the overlap ratio can be set to 1/2; if classification accuracy is preferred, an overlap ratio of 3/4 is the best choice. As long as the overlap ratio does not exceed 3/4, FFS-CNN is faster than Lenet-5 for classifying a whole PolSAR image, and there is no need to increase the overlap ratio beyond 3/4.
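The speed difference comes mainly from the number of forward passes: Lenet-5 needs one pass per pixel, while FFS-CNN needs one pass per window position. The sketch counts both for the RS-2 Flevoland image size (1400 × 1200) with w = 15, assuming the stride is `round(w * (1 - overlap))` and the last window is aligned with the image edge. It counts passes only; an FFS-CNN pass costs more than a Lenet-5 pass, so pass counts alone would overstate the speed-up compared with the times reported in Table 9.

```python
import math

def n_windows(length, w, stride):
    """Window positions along one axis, last window aligned to the image edge."""
    return math.ceil((length - w) / stride) + 1

def ffs_passes(h, width, w, overlap):
    stride = max(1, round(w * (1 - overlap)))   # rounding rule is an assumption
    return n_windows(h, w, stride) * n_windows(width, w, stride)

h, width, w = 1400, 1200, 15                    # RS-2 Flevoland size, w = 15
lenet_passes = h * width                        # Lenet-5: one forward pass per pixel
for overlap in (0, 1/4, 1/2, 3/4, 7/8):
    print(f"overlap {overlap:.3f}: FFS-CNN {ffs_passes(h, width, w, overlap):>9,d} passes, "
          f"Lenet-5 {lenet_passes:,d} passes")
```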

Visualization of Outputs of Convolutional Layers
This section visualizes the output feature maps of the convolutional layers. These feature maps have multiple channels, and there is no standard method for visualizing such features in PolSAR image classification. In this paper, all channels of a feature map are simply summed and the resulting values are mapped to the range 0 to 255, so each feature map is shown as a grayscale image.
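The visualization rule can be sketched as follows. The text says only that the channels are summed and the values are mapped to 0 to 255; a linear min-max mapping applied separately to each feature map is assumed here.

```python
import numpy as np

def feature_map_to_gray(features):
    """features: (C, H, W) output of a convolutional layer -> uint8 (H, W) image.
    Channels are summed, then linearly rescaled to [0, 255] (assumed mapping)."""
    summed = features.sum(axis=0)
    lo, hi = summed.min(), summed.max()
    if hi == lo:                                 # constant map: avoid division by zero
        return np.zeros(summed.shape, dtype=np.uint8)
    scaled = (summed - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)
```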
We select four one-class training patches of size 15 × 15 from the RS-2 Flevoland dataset as inputs of FFS-CNN. Figure 13 shows their Pauli RGB images and the output feature maps of the four convolutional layers. The blurry textures in the output feature maps of the fourth convolutional layer are consistent with the Pauli RGB images, and the output feature maps of different land covers differ distinctly. We also select a multi-class training patch of size 15 × 15 from the RS-2 Flevoland dataset as the input of FFS-CNN. Figure 14 shows its Pauli RGB image, ground truth map and classification result, together with the output feature maps of the four convolutional layers. The ground truth map contains 2 land covers, while the classification result contains 3. The Pauli RGB image shows three areas: a water area in the middle and a farmland area on either side of it. The output feature maps of all four convolutional layers also show three distinct areas, which are finally classified as forest, water and farmland. Compared with the ground truth map, the classification result in the top-left corner is incorrect, but it is consistent with the Pauli RGB image.
The above analysis indicates that FFS-CNN can extract discriminative features of different land covers for the simultaneous classification of multiple pixels in a patch.

Future Works
Although FFS-CNN achieves good classification results on the three datasets, it also has a disadvantage. When a whole PolSAR image is classified without overlap, the accuracies are only comparable with those of Lenet-5 and the results show a mosaic effect. For RS-2 Flevoland, the OAs of FFS-CNN and Lenet-5 are almost the same; for AIRSAR Flevoland, the OA of FFS-CNN is a little lower than that of Lenet-5; and for GF-3 Wuhan, the OA of FFS-CNN is higher than that of Lenet-5. The reason is that the classification results are not continuous across the edges of adjacent image patches.
There are two main research directions for further improving the classification results. First, conditional random fields (CRFs) can be added to FFS-CNN. CRFs have been shown to improve the accuracy of pixel-wise predictions for optical images [28], so we believe that a CRF can improve the classification results of FFS-CNN as well. Second, more advanced CNN architectures, such as the inception unit of GoogLeNet and the residual unit of ResNet, can be introduced into the feature extraction part of FFS-CNN.

Conclusions
In this paper, the proposed FFS-CNN classifies all pixels in a patch simultaneously and achieves good results. On all three real PolSAR datasets, the OAs of FFS-CNN surpass those of Lenet-5. The experiments support the following conclusions. First, the interrelation of different land covers in a patch is indeed helpful for multi-pixel classification. Second, the relationship between the overlap ratio and the classification accuracy was analyzed. When the overlap ratio is 1/2, the classification times are about 10 times shorter than those of Lenet-5; when the overlap ratio is 3/4, they are about half of those of Lenet-5. In particular, with an overlap ratio of 3/4, the classification results of FFS-CNN are the best and are much better than those of Lenet-5.