Satellite Image Classification Using a Hierarchical Ensemble Learning and Correlation Coefficient-Based Gravitational Search Algorithm

Satellite image classification is widely used in various real-time applications, such as military operations, geospatial surveys, surveillance and environmental monitoring. Effective classification of satellite images is therefore required to improve classification accuracy. In this paper, the combination of a Hierarchical Framework and Ensemble Learning (HFEL) and optimal feature selection is proposed for the precise identification of satellite images. The HFEL uses three different types of Convolutional Neural Networks (CNN), namely AlexNet, LeNet-5 and a residual network (ResNet), to extract appropriate features from the images of the hierarchical framework. Additionally, the optimal features are selected from the feature set using the Correlation Coefficient-Based Gravitational Search Algorithm (CCGSA). Further, a Multi Support Vector Machine (MSVM) classifies the satellite images using the features extracted from the fully connected layers of the CNNs and the features selected by the CCGSA. Hence, the combination of HFEL and CCGSA is used to obtain precise classification on different datasets, namely SAT-4, SAT-6 and Eurosat. The performance of the proposed HFEL–CCGSA is analyzed in terms of accuracy, precision and recall. The experimental results show that the HFEL–CCGSA method provides effective classification of satellite images. The classification accuracy of the HFEL–CCGSA method is 99.99%, which is higher than that of AlexNet, LeNet-5 and ResNet.


Introduction
Remote sensing images are widely considered an essential source of data on the Earth's surface. Information about the underlying land cover from remote sensing images is required for classification applications [1,2]. The advantages of remote sensing techniques are their low cost and their ability to cover large areas [3,4]. Image classification is used to recognize and detect relevant information in satellite images [5], since satellite images contain sufficient information for land cover mapping that delivers data at local, national and international scales [6]. Remotely sensed satellite imagery is used in various applications, such as forestry, regional planning, agriculture and geology, to examine and manage human activities and natural resources [7][8][9]. The biophysical cover of the Earth's surfaces is considered one of the most important climate

• The refined and semantic features are extracted from the fully connected layers of AlexNet, LeNet-5 and ResNet. Further, these extracted features are concatenated together to obtain multiple ensembles of features.
• From the extracted features, the optimal set of features is selected using the CCGSA technique. Hence, the combination of multiple ensemble features from the HFEL and optimal features selected from the CCGSA is used to increase the classification accuracy of satellite images.
• Three different datasets, i.e., SAT-4, SAT-6 and Eurosat datasets, are considered to analyze the performance of the HFEL-CCGSA method.
The following existing works address classification on the SAT-4 and SAT-6 datasets.
Unnikrishnan et al. [16] presented a deep learning architecture with hyperparameter tuning of the network, reducing the input bands to two (i.e., red and near-infrared (NIR)). Deep learning architectures were designed for three networks, namely VGG, AlexNet and ConvNet, and the hyperparameter tuning was applied to the filters of each convolutional layer. Classification was performed using the modified architecture with a reduced number of filters and two-band information. However, the hyperparameter-tuned deep learning architecture achieved lower classification accuracy. Jiang et al. [17] developed the Double-Channel Convolutional Neural Network (DCCNN) model for classifying RGB-NIR images using the correlation among the R, G, B and NIR bands. The DCCNN had two independent CNN networks for describing the RGB and NIR image features. Feature fusion was performed at the fully connected layer, and classification was performed at the final layer. This configuration enabled effective use of the various features of RGB-NIR images. Moreover, overfitting was avoided using the net dropout technique, which eliminated 60% of the neurons in the fully connected layer. However, the classification accuracy of the double-channel CNN model with the net dropout technique was lower.
Yang et al. [18] presented a training approach, namely greedy DropSample, for speeding up the optimization process of the Convolutional Neural Network (CNN) during image classification. This method mainly focused on the samples that generated the highest gradients. Moreover, the activations of the network were biased in the training process due to the absence of certain training samples. The samples with lower losses were filtered out to increase the speed of the CNN. However, DropSample failed to consider the similarity between the classes. Weng et al. [19] developed the Multi-Dimensional Multi-Grained Scanning Structure (MGSS) method to classify land cover/land use in remote sensing images. The MGSS was used to extract spatial and spectral information from the images. Next, the prediction was obtained by mapping the probability feature vectors in the residual forest structure. The MGSS required fewer parameters to optimize than a CNN. However, in the MGSS, the gradient passed from the higher levels could be affected by certain events. Zhong et al. [20] presented an agile CNN structure, namely SatCNN, for classifying High-Spatial-Resolution Remote-Sensing (HSR-RS) images. Here, the intrinsic features of the HSR-RS images were captured by pre-processing with the z-score method. The SatCNN balanced the training efficiency and the generalization ability of the model. However, the performance of the SatCNN was sensitive to the testing ratio; specifically, it was affected by the intra- and inter-class complexity of the HSR-RS images.
Downloaded from mostwiedzy.pl Remote Sens. 2021, 13, 4351
The following existing works address classification on the EuroSAT dataset. Yamashkin et al. [21] addressed the classification of HSR-RS images using deep learning methods under conditions of labeled data scarcity. The GeoSystemNet model solved the classification problem based on the genetic uniformity of spatially neighboring objects at various scales and hierarchical levels. However, the GeoSystemNet model required a large number of degrees of freedom to maintain its classification performance. Syrris et al. [22] developed SatImNet, a collection of open training data that is structured and harmonized according to certain rules. A CNN was then modeled to classify the satellite images.
Finally, satellite image classification on the Office-31 dataset and the NWPU-Merced-Land satellite image dataset was performed as follows. Hu et al. [23] presented Coordinate Partial Adversarial Domain Adaptation (CPADA) to perform unsupervised satellite image classification. CPADA provides a partial transfer learning technique in which negative transfer is suppressed by a coordinate loss that down-weights outlier satellite images. The classification performance of CPADA improved because of the domain-invariant features it obtained. However, the features were misaligned due to the deviation between the predicted and ideal weights.
Accurate classification of satellite images requires addressing the following limitations of existing methods. The hyperparameter-tuned deep learning architecture obtained lower accuracy during classification due to the vanishing gradient problem [16]. The net dropout technique used to avoid overfitting reduces classification accuracy because it removes relevant features [17]. The DropSample method developed for the CNN failed to consider the similarity between classes, which may affect the classification performance [18]. Moreover, the GeoSystemNet model requires a large number of degrees of freedom to maintain its classification performance [21].

Solution
In this paper, the classification accuracy of satellite images is increased by using multiple ensemble features together with the optimal features selected by the CCGSA technique, which keeps the relevant features and helps avoid overfitting. The correlation coefficient used in the feature selection process excludes irrelevant features from the feature set. The classification accuracy of the HFEL-CCGSA method is further improved by using image data from the hierarchical framework, which helps maintain the gradient in the network.
The overall organization of the paper is as follows: a detailed explanation of the HFEL-CCGSA method is given in Section 2. Section 3 provides the results and discussion of the HFEL-CCGSA method. Finally, Section 4 presents the conclusions.

HFEL-CCGSA Method
In the HFEL-CCGSA method, satellite image classification is performed using HFEL and CCGSA-based feature selection. In the case 'without CCGSA', refined and semantic features are extracted from the fully connected layers of AlexNet, LeNet-5 and ResNet. Next, the CCGSA-based feature selection selects the optimal features from the extracted features. In general, input images have unknown characteristics, and an effective model is required to handle them. The developed method randomly selects images for classification. The proposed method analyzes the effect of pre-processing, such as normalization and augmentation. Then, k-fold cross-validation is applied to analyze the performance of the developed method in remote sensing classification. Both the extracted and the selected features are given as input to the MSVM for classifying the satellite images. Therefore, hierarchical images, CNN feature extraction and optimal feature selection together improve the classification performance. Figure 1 shows a block diagram of the HFEL-CCGSA method. The hierarchical framework provides the data in various forms for feature learning, the CCGSA method selects features based on correlation, and ensemble learning selects the feature set based on the MSVM model.
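The text applies k-fold cross-validation but does not describe how the folds are formed. The sketch below shows one common way to build them, for example with k = 5 as in the caption of Table 10. The seeded shuffle, the interleaved assignment of shuffled indices to folds and the function name `k_fold_indices` are assumptions made for illustration, not details taken from the paper.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n samples."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    folds = [indices[i::k] for i in range(k)]  # interleaved split; fold sizes differ by at most 1
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


# Example: 5-fold split of 10 samples; each test fold holds 2 indices.
for train, test in k_fold_indices(10, 5):
    print(len(train), len(test))
```

Each sample appears in exactly one test fold, so every image is used for testing once across the k runs.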

Figure 1. Block diagram of HFEL-CCGSA method.

Image Acquisition
To evaluate the HFEL-CCGSA method, tests were carried out on three datasets, namely SAT-4, SAT-6 and Eurosat. Both SAT-4 and SAT-6 were acquired from the NAIP dataset [24]. The SAT-4 multispectral dataset has 500,000 images in four classes: trees, barren land, grassland and all other land covers. The SAT-6 dataset has 405,000 images in six classes: water bodies, grassland, barren land, buildings, trees and roads. The images in both SAT-4 and SAT-6 are 28 × 28 pixels with 1 m spatial resolution, and each image has four bands: red, green, blue and near-infrared (NIR). Figure 2 shows sample images from the SAT-4 and SAT-6 datasets.
In the Eurosat dataset [25], the satellite images are acquired from European cities distributed over 34 countries. The Eurosat dataset has 27,000 labeled and georeferenced images, each of size 64 × 64. It contains 10 classes: residential buildings, industrial buildings, seas and lakes, herbaceous vegetation, highways, pastures, rivers, annual crops, permanent crops and forests, with 2000-3000 images per class. The Eurosat images have 13 bands: red, blue, green, aerosols, red edge 1, red edge 2, red edge 3, red edge 4, shortwave infrared 1, shortwave infrared 2, NIR, water vapor and cirrus. Figure 3 shows sample images from the Eurosat dataset.

Hierarchical Framework
In a hierarchical framework, an ordered set of tasks is performed to generate coarse representations of the images. An illustration of the hierarchical framework used in the HFEL-CCGSA method is shown in Figure 4. For example, the input image can be given directly to the CNN for feature extraction, or it can first be pre-processed. The hierarchy arranges items so that the data are represented at various levels or at the same level. Here, the images are arranged as raw images, pre-processed images, and pre-processed and augmented images, from which features are extracted for a better representation.

Image Pre-Processing
After image acquisition, normalization and histogram equalization are performed to enhance the image quality. Normalization, also referred to as contrast stretching, improves the visual quality of satellite images by changing the range of the pixel values. Equation (1) shows the common formula of the normalization technique.
$$x_2 = (x_1 - \min)\,\frac{newmax - newmin}{\max - \min} + newmin \qquad (1)$$
where x1 is the input image to be pre-processed; min and max are the minimum and maximum intensity values (i.e., 0 and 255); x2 is the normalized image; and newmin and newmax are the new minimum and maximum values, respectively. After normalization, the image values range from 0 to 1, and histogram equalization is then applied to adjust the image contrast using the histogram values. Histogram equalization is considered an effective technique for obtaining a better image without losing information such as points, image patches and edges [26].
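A minimal sketch of these two pre-processing steps on a single-band image stored as a list of rows is shown below. It assumes that min and max in the normalization default to the image's own extremes (passing `old_min=0, old_max=255` uses the fixed 8-bit range instead), and that histogram equalization uses the usual cumulative-histogram mapping over 256 bins. The function names are hypothetical; this is not the authors' MATLAB implementation.

```python
def min_max_normalize(img, old_min=None, old_max=None, new_min=0.0, new_max=1.0):
    """Contrast stretching: map [old_min, old_max] linearly onto [new_min, new_max].

    If old_min/old_max are not given, the image's own minimum and maximum are used.
    """
    flat = [p for row in img for p in row]
    lo = min(flat) if old_min is None else old_min
    hi = max(flat) if old_max is None else old_max
    if hi == lo:  # constant image: nothing to stretch
        return [[new_min for _ in row] for row in img]
    return [[(p - lo) * (new_max - new_min) / (hi - lo) + new_min for p in row] for row in img]


def histogram_equalize(img, levels=256):
    """Histogram equalization of an image with values in [0, 1], using `levels` bins."""
    bins = [[min(int(p * (levels - 1) + 0.5), levels - 1) for p in row] for row in img]
    hist = [0] * levels
    for row in bins:
        for b in row:
            hist[b] += 1
    cdf, running = [], 0  # cumulative histogram
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min, n = next(c for c in cdf if c > 0), running
    if n == cdf_min:  # a single occupied bin cannot be spread out
        return [[0.0 for _ in row] for row in img]
    return [[(cdf[b] - cdf_min) / (n - cdf_min) for b in row] for row in bins]


x2 = histogram_equalize(min_max_normalize([[0, 128], [255, 64]]))
print(x2)
```

For multispectral images, the same functions would be applied band by band.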

Data Augmentation
After pre-processing, the image x2 is subjected to data augmentation. The pre-processed image is rotated by 0°, 90°, 180° and 270°, and each rotated image is also flipped from left to right. In this way, eight augmented images are generated from a single pre-processed image.
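The eight-fold augmentation can be sketched as four rotations, each kept both as-is and mirrored left to right. The sketch below assumes a single-band image stored as a list of rows and counter-clockwise rotation; the helper names are hypothetical.

```python
def rotate90(img):
    """Rotate a 2-D image (list of rows) by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*img)][::-1]


def flip_lr(img):
    """Mirror a 2-D image from left to right."""
    return [row[::-1] for row in img]


def augment_eight(img):
    """Return 8 variants: rotations by 0, 90, 180 and 270 degrees, each with and without a left-right flip."""
    variants = []
    current = [list(row) for row in img]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_lr(current))
        current = rotate90(current)
    return variants


print(len(augment_eight([[1, 2], [3, 4]])))  # 8
```

For an image without symmetries, these eight variants are all distinct: they are exactly the eight symmetries of a square.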
This hierarchical framework produces three image outputs: the raw input image x1, the pre-processed image x2, and the pre-processed and augmented image x3. The same hierarchical framework is applied to all N input images. These outputs are then passed to the feature extraction process, which is explained in the following section.

Feature Extraction from the CNN
In the HFEL-CCGSA method, ensemble learning is performed by concatenating the features from the CNNs. Three different CNNs, namely AlexNet, LeNet-5 and ResNet, are used to extract features from the input images x1, x2 and x3, respectively. The feature vectors from the fully connected layers are obtained and concatenated as follows.

AlexNet
In the HFEL-CCGSA method, AlexNet [27,28] is used to extract feature vectors from the given input. The network has eight layers, i.e., five convolutional layers and three fully connected layers. AlexNet uses an effective activation function, the rectified linear unit (RELU), which mitigates the vanishing gradient problem: the RELU's gradient is 1 whenever its input is positive, so gradients do not shrink as they pass through active units. The RELU also improves the training speed, and Equation (2) defines the RELU activation function.
$$y = \max(0, x_1) \qquad (2)$$
where y is the neuron output and x1 is the input. AlexNet can be viewed as containing many small sub-networks, which may overfit. Dropout is therefore applied: in each training iteration, only a random subset of neurons is active and trained. This improves generalization by reducing co-adaptation among neurons, and the output of AlexNet approximates an average over all the sub-networks. Therefore, dropout is used to increase robustness. The convolutional layers extract features automatically, and the pooling layers reduce the size of the extracted feature maps. The convolution of AlexNet is represented in Equation (3).
where the input image is represented as x1; the height and width of the input image are h and w, respectively; the convolutional kernel is represented as m; and the width and height of the convolutional kernel are b and c, respectively. The convolution acquires features from the image, and the kernel parameters are shared across image positions to reduce the model's complexity. The feature maps are then reduced using max pooling. Finally, cross-channel normalization (i.e., a local normalization technique) is used to generalize the features.
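The convolution step can be illustrated with a "valid" 2-D convolution of an h × w image with a b × c kernel, followed by the RELU of Equation (2). This is a sketch under the assumption, common in CNN libraries, that "convolution" means cross-correlation (the kernel is not flipped) with stride 1 and no padding; Equation (3) itself is not reproduced in the text, and the function names are hypothetical.

```python
def conv2d_valid(x, kernel, bias=0.0):
    """'Valid' 2-D convolution of an h x w image with a b x c kernel.

    Like most CNN libraries, this computes cross-correlation (the kernel is not
    flipped); the output has size (h - b + 1) x (w - c + 1).
    """
    h, w = len(x), len(x[0])
    b, c = len(kernel), len(kernel[0])
    out = []
    for i in range(h - b + 1):
        row = []
        for j in range(w - c + 1):
            s = sum(x[i + p][j + q] * kernel[p][q] for p in range(b) for q in range(c))
            row.append(s + bias)
        out.append(row)
    return out


def relu(fmap):
    """Element-wise RELU, y = max(0, x), as in Equation (2)."""
    return [[max(0.0, v) for v in row] for row in fmap]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 1], [1, 1]]
print(relu(conv2d_valid(image, kernel, bias=-14.0)))  # [[0.0, 2.0], [10.0, 14.0]]
```

The same kernel is slid over every position, which is the parameter sharing that keeps the number of weights independent of the image size.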
Equation (4) is the Softmax function, which is used as the activation function of the final fully connected layer to map the outputs to the range between 0 and 1. The feature vectors $f_a$ are then taken from the outputs of the fully connected layers of AlexNet and used for ensemble learning.
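Equation (4) is not reproduced in the text. The sketch below assumes the usual Softmax definition, softmax(z)_i = exp(z_i) / Σ_j exp(z_j), and subtracts the maximum score before exponentiating, a standard trick that avoids overflow without changing the result.

```python
import math


def softmax(z):
    """Map a vector of scores to probabilities in (0, 1) that sum to 1."""
    m = max(z)  # subtract the max for numerical stability; the result is unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]
```

Without the shift, `math.exp(1000.0)` would raise an `OverflowError`.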

LeNet-5
LeNet-5 [29] is a classic CNN architecture that is applied to the pre-processed image (x2) to extract feature vectors. Here, LeNet-5 has two convolutional layers and two max pooling layers followed by a fully connected layer. The process of LeNet-5 is described as follows.
First, features are extracted by the convolutional layers, each of which has a number of convolutional kernels. In this layer, the input matrix is convolved with the convolution kernel. Consider the input $x_2 = \{x_2^{ij} \mid i = 1, 2, \ldots, I;\ j = 1, 2, \ldots, J\}$, where I and J are the numbers of rows and columns of the input image, respectively. The convolution kernel of LeNet-5 is $CK = \{ck_{p,q} \mid p = 0, 1, \ldots, S_{CK} - 1;\ q = 0, 1, \ldots, S_{CK} - 1\}$, where $S_{CK}$ is the size of the convolution kernel. Equation (5) gives the output of the convolutional layer.
$$oc_{i,j} = f\left(\sum_{p=0}^{S_{CK}-1}\sum_{q=0}^{S_{CK}-1} x_2^{\,i+p,\,j+q}\, ck_{p,q} + ot\right) \qquad (5)$$
where the output of the convolution is represented as $oc_{i,j}$, the offset (bias) term as $ot$ and the activation function as $f(\cdot)$. As in AlexNet, LeNet-5 uses the RELU as its activation function.
The pooling layer reduces the dimension of the data. Here, max pooling is used, as expressed in Equation (6), where $mp^{l}_{n}$ and $mp^{l-1}_{n}$ represent the lth layer and its preceding layer, respectively, and n denotes the nth sample. In general, the fully connected layer is the last layer of the CNN. Each of its neurons uses the RELU activation function and is connected to all neurons of the previous layer. The fully connected layer combines the local information that differentiates the classes from one another. Equation (7) shows the output of the fully connected layer l.
$$fc^{l}_{n} = f\left(ck^{l} \cdot fc^{l-1}_{n} + ot^{l}\right) \qquad (7)$$
Hence, the integration of the convolution layer, pooling layer, RELU activation function and fully connected layer is used to extract features from the input image.
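The pooling and fully connected steps can be sketched as below. The 2 × 2 window with stride 2 is an assumption, since the text does not state the pooling size. `fully_connected` follows the form of Equation (7), with one weight row per output neuron and RELU as f; the function names are hypothetical.

```python
def max_pool2d(fmap, size=2, stride=2):
    """Max pooling: keep the largest value in each size x size window."""
    h, w = len(fmap), len(fmap[0])
    return [
        [max(fmap[i + p][j + q] for p in range(size) for q in range(size))
         for j in range(0, w - size + 1, stride)]
        for i in range(0, h - size + 1, stride)
    ]


def fully_connected(prev, weights, bias):
    """fc^l = f(ck^l . fc^(l-1) + ot^l) with f = RELU.

    `weights` holds one row per output neuron, so every output neuron is
    connected to all neurons of the previous layer.
    """
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, prev)) + b)
            for row, b in zip(weights, bias)]


fmap = [[1, 3, 2, 0], [4, 2, 1, 5], [0, 1, 7, 2], [3, 2, 1, 0]]
pooled = max_pool2d(fmap)                  # [[4, 5], [3, 7]]
flat = [v for row in pooled for v in row]  # [4, 5, 3, 7]
print(fully_connected(flat, [[1, 0, 0, 0], [0, -1, 0, 0]], [0.5, 1.0]))  # [4.5, 0.0]
```

Flattening the pooled map before the fully connected layer is what lets it combine local information from every position.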

ResNet
ResNet [30] is used to obtain semantic features from the input image by means of its deeper network structure built from residual blocks. The ResNet used here has five convolutional stages comprising 101 layers, which are deeper with less redundancy. Owing to this deeper architecture, ResNet effectively extracts features from the pre-processed and augmented image (x3). The output of the ResNet layers is expressed in Equation (8).
$$H(x_3) = F(x_3) + x_3 \qquad (8)$$
where $H(x_3)$ is the required output feature of ResNet and $F(x_3)$ is the residual mapping learned by the network. The ResNet features are thus the features extracted from the residual network. The features from AlexNet and LeNet-5 are concatenated with the features extracted by ResNet (i.e., ensemble learning). The concatenation of all features (FV1) from AlexNet, LeNet-5 and ResNet is expressed in Equation (9).
$$FV_1 = [f_a, f_c, H] \qquad (9)$$
The AlexNet features $f_a$, LeNet-5 features $f_c$ and ResNet features H are combined for the feature selection process. FV1 is the feature vector obtained from the combination of the hierarchical framework and ensemble learning, and it contains the appropriate features of the input images. FV1 is given as one input to the MSVM, and the features selected from this vector are given as an additional input to the MSVM. The feature selection process over the concatenated features is explained in the following section.
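The residual connection and the ensemble concatenation are both simple vector operations. A minimal sketch follows; the residual mapping here is a hypothetical toy function standing in for the learned layers, and the vectors are tiny, whereas the paper reports 1000-dimensional CNN features per image.

```python
def residual_block(x, residual_fn):
    """H(x) = F(x) + x: add the identity shortcut to the learned residual F."""
    return [f + xi for f, xi in zip(residual_fn(x), x)]


def concat_features(f_a, f_c, h):
    """FV1: concatenate AlexNet (f_a), LeNet-5 (f_c) and ResNet (H) feature vectors."""
    return list(f_a) + list(f_c) + list(h)


# Toy residual mapping F (hypothetical): halve every element.
h = residual_block([1.0, 2.0], lambda v: [0.5 * t for t in v])
print(h)                                      # [1.5, 3.0]
print(concat_features([0.1], [0.2, 0.3], h))  # [0.1, 0.2, 0.3, 1.5, 3.0]
```

Because the shortcut passes x through unchanged, the block only has to learn the difference F, which helps gradients flow through deep stacks.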

Feature Selection Using CCGSA
In the HFEL-CCGSA method, CCGSA-based feature selection is used to select the optimal features in each stage, minimizing the redundancy and interference caused by redundant features. In some existing feature selection strategies [31,32], classification accuracy is taken as the primary criterion for selecting features. In contrast, the GSA-based feature selection here uses the correlation coefficient [33] as the primary criterion. Classification accuracy is not used because it would require an additional classifier inside the feature selection process. The HFEL-CCGSA method already contains two levels, i.e., HFEL and feature selection for classification. Instead of further increasing the complexity of the method, the GSA considers the correlation coefficient of the features; hence, the CCGSA is developed for selecting the features. A solution to the feature selection problem is represented as a binary vector, in which each element indicates whether the corresponding feature is selected. The GSA [34] is generally designed for optimization problems in a continuous search space, so to use the CCGSA for searching features, the continuous solutions must be converted into binary form. This conversion uses the transfer function expressed in Equation (10).
Each element of the velocity vector then defines the probability of flipping the corresponding feature from selected to not selected, or vice versa. Flipping is likely when a dimension has a high velocity value and unlikely when it has a low velocity value.
where TF is the transfer function and $v_i^d(t)$ is the velocity of the ith agent in dimension d at iteration t. The output of Equation (10) is used to set the solution position to 1 or 0, as shown in Equation (11).
where r is a random number between 0 and 1.
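Equations (10) and (11) are not reproduced in the text. The sketch below assumes the V-shaped transfer function TF(v) = |tanh(v)| used in common binary GSA formulations: a bit flips when r < TF(v). It also assumes the standard GSA mass and acceleration updates, with Hamming distance between bit strings, attraction from all other agents rather than only the best ones, and a velocity clamp of 6. All function names and parameter values are hypothetical. The CCGSA fitness of Equation (12) would be passed as `fitness`.

```python
import math
import random


def transfer(v):
    """Assumed V-shaped transfer function TF(v) = |tanh(v)|: probability of flipping a bit."""
    return abs(math.tanh(v))


def flip_update(position, velocity, rng):
    """Flip bit d when r < TF(v_d) (r uniform in [0, 1)); otherwise keep it."""
    return [1 - x if rng.random() < transfer(v) else x
            for x, v in zip(position, velocity)]


def bgsa_step(positions, velocities, fitness, t, max_iter, rng,
              g0=1.0, alpha=10.0, v_max=6.0, eps=1e-9):
    """One iteration of a binary GSA that maximizes `fitness` over bit vectors."""
    fits = [fitness(p) for p in positions]
    best, worst = max(fits), min(fits)
    if best == worst:  # all agents equally good: equal masses
        masses = [1.0 / len(positions)] * len(positions)
    else:  # better solutions get larger (normalized) masses
        raw = [(f - worst) / (best - worst) for f in fits]
        total = sum(raw)
        masses = [m / total for m in raw]
    g = g0 * math.exp(-alpha * t / max_iter)  # gravitational "constant" decays over time
    new_pos, new_vel = [], []
    for i, (xi, vi) in enumerate(zip(positions, velocities)):
        acc = [0.0] * len(xi)
        for j, xj in enumerate(positions):
            if j == i:
                continue
            dist = sum(a != b for a, b in zip(xi, xj))  # Hamming distance
            pull = rng.random() * g * masses[j] / (dist + eps)
            for d in range(len(xi)):
                acc[d] += pull * (xj[d] - xi[d])  # attraction towards agent j
        v = [max(-v_max, min(v_max, rng.random() * vd + ad)) for vd, ad in zip(vi, acc)]
        new_vel.append(v)
        new_pos.append(flip_update(xi, v, rng))
    return new_pos, new_vel


# Usage with a toy fitness that rewards selecting the first three features only.
rng = random.Random(0)
pos = [[rng.randint(0, 1) for _ in range(6)] for _ in range(5)]
vel = [[0.0] * 6 for _ in range(5)]
for t in range(20):
    pos, vel = bgsa_step(pos, vel, lambda p: sum(p[:3]) - sum(p[3:]), t, 20, rng)
```

A V-shaped function is symmetric in v, so large velocities in either direction make a flip likely, matching the flip-probability reading of the velocity given above.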
Deriving the fitness function is important for selecting the appropriate features from the feature set FV1. In the CCGSA, the quality of each solution is represented by its mass, i.e., better feature subsets have a larger mass than worse ones. The optimal feature selection considers both the number of features in the subset and the correlation coefficient: the selected subset should contain fewer features with low inter-correlation. The fitness function used in the CCGSA is expressed in Equation (12).
where the first term represents the correlation coefficient; R is the number of selected features; A is the total number of features in FV1; and $r_{ff}$ and $r_{cf}$ are the average feature-to-feature inter-correlation and the average feature-to-class correlation, respectively. This feature selection therefore eliminates redundant and irrelevant features from the feature vectors. The selected features (FV2) are divided into M sets and given as the second input to the voting-based prediction. The features from the CNN models have a size of n × 1000, and the CCGSA feature selection model selects the features with a correlation of 0.8 from the extracted features.
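Equation (12) is not reproduced in the text. As a hedged sketch only, the code below assumes that its correlation term is the well-known correlation-based feature selection (CFS) merit, R·r_cf / sqrt(R + R(R − 1)·r_ff). It combines this merit with a reward of (1 − R/A) for using fewer features through a hypothetical weight of 0.9. Absolute Pearson correlation against numeric class labels is a further simplification. None of these choices is confirmed by the paper.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 0.0 if va == 0 or vb == 0 else cov / math.sqrt(va * vb)


def subset_fitness(mask, columns, labels, weight=0.9):
    """Hypothetical fitness: weighted CFS-style merit plus a reward for using fewer features.

    mask    -- bit vector, 1 = feature selected
    columns -- one list of values per feature (A features in total)
    labels  -- numeric class label per sample
    """
    selected = [col for bit, col in zip(mask, columns) if bit]
    r, a = len(selected), len(columns)
    if r == 0:
        return 0.0
    r_cf = sum(abs(pearson(col, labels)) for col in selected) / r  # mean feature-class correlation
    pairs = list(combinations(selected, 2))
    r_ff = sum(abs(pearson(x, y)) for x, y in pairs) / len(pairs) if pairs else 0.0  # mean inter-correlation
    merit = r * r_cf / math.sqrt(r + r * (r - 1) * r_ff)
    return weight * merit + (1 - weight) * (1 - r / a)


labels = [0, 0, 1, 1]
cols = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 0, 1, 0]]  # relevant, redundant copy, irrelevant
for mask in ([1, 0, 0], [1, 1, 0], [1, 0, 1]):
    print(mask, round(subset_fitness(mask, cols, labels), 4))
```

In this toy case, the relevant feature alone scores highest. Adding its redundant copy brings no merit gain and costs the size reward. Adding the irrelevant feature lowers the merit.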

Classification Using MSVM
Finally, the features from 'without CCGSA' (i.e., the concatenated features) and HFEL-CCGSA (i.e., the selected features) are given as input to the MSVM to classify the satellite images. The MSVM is created by combining multiple binary Support Vector Machine (SVM) classifiers [35]. Here, the One Versus All (OVA) method is used to extend the SVM to multiple classes: for C classes, C binary SVM models are trained, each separating one class from all the others.
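The OVA scheme can be sketched with C linear binary SVMs and an arg-max over their decision values. The sketch swaps in a deliberately simple trainer, full-batch subgradient descent on the regularized hinge loss. It is not the solver, kernel or MATLAB code used in the paper, and the function names and hyperparameters are hypothetical.

```python
def train_binary_svm(xs, ys, lam=0.01, lr=0.1, epochs=200):
    """Linear soft-margin SVM (labels in {-1, +1}) fitted by full-batch
    subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(xs, ys):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) < 1:  # inside the margin
                for d in range(dim):
                    grad_w[d] -= y * x[d] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_ova(xs, labels, classes):
    """One Versus All: one binary SVM per class, that class (+1) against all others (-1)."""
    return {c: train_binary_svm(xs, [1 if l == c else -1 for l in labels]) for c in classes}


def predict_ova(models, x):
    """Pick the class whose binary SVM returns the largest decision value."""
    def score(c):
        w, b = models[c]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(models, key=score)


xs = [(0, 4), (1, 5), (-1, 5), (4, 0), (5, 1), (5, -1), (-4, -4), (-5, -4), (-4, -5)]
labels = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
models = train_ova(xs, labels, ["a", "b", "c"])
print([predict_ova(models, p) for p in [(0, 5), (5, 0), (-4.5, -4.5)]])
```

Taking the arg-max of the raw decision values, rather than counting positive votes, resolves cases where several or none of the binary SVMs claim a sample.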

Results and Discussion
The results and discussion of satellite image classification using the HFEL-CCGSA method are given in this section. The HFEL-CCGSA method was implemented and simulated using MATLAB R2019a on a PC with an Intel Core i9 processor, 128 GB RAM, a 3 TB hard disk and the Windows 10 (64-bit) operating system. The HFEL-CCGSA method was analyzed on three datasets, namely SAT-4, SAT-6 and Eurosat. The performance is analyzed using the metrics accuracy, precision and recall, which are defined as follows.
(i) Accuracy
Accuracy is defined as the proportion of correct predictions among all predictions made by the HFEL-CCGSA method, as expressed in Equation (13).
$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \qquad (13)$$
where TP and TN are the true positives and true negatives, and FP and FN are the false positives and false negatives, respectively.

(ii) Precision
Precision is the ratio of true positives to the sum of true positives and false positives, as expressed in Equation (14).
$$Precision = \frac{TP}{TP + FP} \qquad (14)$$

(iii) Recall
Recall is the ratio of true positives to the sum of true positives and false negatives, as shown in Equation (15).
$$Recall = \frac{TP}{TP + FN} \qquad (15)$$
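Equations (13)-(15) can be computed per class by treating one class as positive and all others as negative. The plain-Python sketch below does this. The helper names are hypothetical, and averaging the per-class values (macro averaging) is an assumption, since the text does not say how class-wise values are combined.

```python
def per_class_counts(y_true, y_pred, cls):
    """TP, TN, FP, FN for one class treated as positive against all others."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, tn, fp, fn


def metrics(y_true, y_pred, cls):
    """Accuracy, precision and recall as in Equations (13)-(15)."""
    tp, tn, fp, fn = per_class_counts(y_true, y_pred, cls)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def macro_average(y_true, y_pred):
    """Unweighted mean of the per-class (accuracy, precision, recall)."""
    classes = sorted(set(y_true) | set(y_pred))
    per = [metrics(y_true, y_pred, c) for c in classes]
    return tuple(sum(m[i] for m in per) / len(per) for i in range(3))


y_true = ["tree", "tree", "water", "road"]
y_pred = ["tree", "water", "water", "road"]
print(metrics(y_true, y_pred, "tree"))  # (0.75, 1.0, 0.5)
```

Here one "tree" image is predicted as "water". That lowers recall for "tree" and precision for "water", while "road" is unaffected.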

Quantitative Analysis on SAT-4 Dataset
This section provides the quantitative analysis on the SAT-4 dataset, which contains 500,000 satellite images, of which 70% were used for training and 30% for testing. Four classes were considered for classification, namely grassland, trees, barren land and a class covering all land covers other than grassland, trees and barren land. The quantitative analysis of the HFEL-CCGSA method on the SAT-4 dataset was carried out in three different ways, as follows. Table 1 compares the case 'without CCGSA' with the HFEL-CCGSA method on the SAT-4 dataset; here, the case 'without CCGSA' was processed individually using the MSVM classifier. Table 1 shows that the combined features in the case 'without CCGSA' performed better than the individual experiments, because this case uses multiple feature ensembles at the same time. Moreover, the features from the hierarchical framework images lead to improved classification. Tables 2 and 3 compare the individual CNNs with HFEL-CCGSA (AlexNet + LeNet-5 + ResNet) and the different feature selection methods with HFEL-CCGSA, respectively. The combination of CNNs, i.e., AlexNet + LeNet-5 + ResNet, provides better classification because it extracts the feature vectors from the fully connected layers of all three CNNs. Since the satellite images in the datasets are captured from different angles and perspectives, different types of CNN are used in the HFEL-CCGSA to obtain refined and semantic features from the images. Hence, the combination of AlexNet, LeNet-5 and ResNet yields better classification.
Moreover, PSO and BDA were used as alternative feature selection methods for comparison, because, like the CCGSA, these optimization algorithms use velocity to update the positions of the population. Table 3 shows that the classification performance improved with the CCGSA, because it selects features mainly based on their correlation coefficients. Hence, the CCGSA excludes irrelevant image features, which helps raise the accuracy of satellite image classification to 99.99%.

Quantitative Analysis on SAT-6 Dataset
This section provides the quantitative analysis on the SAT-6 dataset, which has 405,000 satellite images covering six land cover classes: grassland, trees, barren land, roads, buildings and water bodies. Of these images, 70% were used for training and 30% for testing. The performance analysis of the HFEL-CCGSA method on the SAT-6 dataset was carried out in three different ways, as follows.
Table 4 presents the comparison between the case 'without CCGSA' and the HFEL-CCGSA method on the SAT-6 dataset, and shows that the HFEL-CCGSA achieves higher classification accuracy than the case 'without CCGSA'. The ensemble of multiple semantic and refined features yields a classification accuracy of 99.99%, higher than that of the case 'without CCGSA'. The performance comparisons of the individual CNNs with HFEL-CCGSA (AlexNet + LeNet-5 + ResNet), and of the different feature selection methods with CCGSA, on the SAT-6 dataset are shown in Tables 5 and 6, respectively. Tables 5 and 6 show that the HFEL-CCGSA method performs better than the individual CNNs and the PSO and BDA methods. The reduction of outliers in the ensemble learning (i.e., AlexNet + LeNet-5 + ResNet) improves the classification performance of the HFEL-CCGSA method. In addition, the correlation coefficient-based feature selection extracts refined features from the extracted feature vectors, which increases the accuracy to 99.99%, higher than with the other feature selection methods.

Quantitative Analysis on Eurosat Dataset
This section provides the quantitative analysis on the Eurosat dataset, which contains 27,000 satellite images; 70% of the data were used for training and 30% for testing. The Eurosat dataset has 10 land use and land cover classes, namely residential buildings, industrial buildings, highways, permanent crops, pastures, forests, herbaceous vegetation, annual crops, rivers, and seas and lakes. The quantitative analysis of the HFEL-CCGSA method on the Eurosat dataset was performed in three ways, and Tables 7-9 show the performance comparisons for the different experiments, CNNs and feature selection methods, respectively. The analysis shows that the HFEL-CCGSA method performs better than the other experimental configurations, the individual CNNs, and the PSO and BDA feature selection methods. For example, the classification accuracy of the HFEL-CCGSA method is 99.49%, which is higher than that obtained in the case 'without CCGSA'. The HFEL-CCGSA method achieves better performance owing to its ensemble of multiple features and the appropriate selection of features from the images.
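The methods are compared in terms of accuracy, precision and recall. Assuming precision and recall are macro-averaged over classes (the paper does not specify the averaging), all three can be computed from a confusion matrix as in this sketch; `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(confusion):
    """Accuracy, macro precision and macro recall from a square confusion matrix.

    confusion[i][j] = number of test images of true class i predicted as class j.
    Classes that are never predicted (or never present) contribute 0.0.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    precisions, recalls = [], []
    for c in range(n_classes):
        tp = confusion[c][c]
        predicted_c = sum(confusion[r][c] for r in range(n_classes))  # column sum
        actual_c = sum(confusion[c])  # row sum
        precisions.append(tp / predicted_c if predicted_c else 0.0)
        recalls.append(tp / actual_c if actual_c else 0.0)
    return correct / total, sum(precisions) / n_classes, sum(recalls) / n_classes
```

For the two-class matrix `[[8, 2], [1, 9]]`, accuracy and macro recall are both 0.85, and macro precision is (8/9 + 9/11)/2, approximately 0.854.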

Comparative Analysis
This section compares the HFEL-CCGSA method with existing satellite image classification methods, namely the two-band AlexNet [16], hyperparameter-tuned AlexNet [16], two-band ConvNet [16], hyperparameter-tuned ConvNet [16], two-band VGG [16], hyperparameter-tuned VGG [16], DCCNN [17], MGSS [19] and GeoSystemNet [21]. The comparison was made on three datasets, namely the SAT-4, SAT-6 and Eurosat datasets, and the results are given in Table 10. From Table 10 and Figure 5, it can be concluded that the HFEL-CCGSA method performs well compared with the existing satellite image classification techniques. The classification accuracy of the HFEL-CCGSA method is 99.99% on both the SAT-4 and SAT-6 datasets, which is higher than that of the existing techniques. Additionally, the classification accuracy of the HFEL-CCGSA method on the Eurosat dataset is 99.49%, which is high when compared to the GeoSystemNet model [21]. The DCCNN [17] achieves lower classification accuracy because of the dropout technique used in the network to avoid overfitting. Moreover, the GeoSystemNet model [21] maintains its classification performance only when using a very large number of degrees of freedom. In contrast, the HFEL-CCGSA method improves classification accuracy by using both low- and high-level image data and an ensemble of multiple features, and further by selecting the optimal features during classification.
Table 10. Comparative analysis of the HFEL-CCGSA method using 5-fold cross-validation.
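Table 10 reports results under 5-fold cross-validation. As a minimal sketch of that protocol, assuming shuffled, non-stratified folds (the paper does not describe how its folds were built, and `k_fold_indices` is a hypothetical helper), the indices of a dataset can be partitioned so that each fold serves as the test set exactly once:

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Items are shuffled once, then split into k nearly equal folds; each fold is
    the test set exactly once while the other k - 1 folds form the training set.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # The first (n_items % k) folds get one extra item when n_items is not divisible by k.
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size
```

For the 27,000 Eurosat images, each of the five folds holds 5,400 test images and 21,600 training images. The reported figure is then typically the mean over the five folds, although the paper does not say how fold scores were aggregated.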

Conclusions
In this paper, the combination of HFEL and CCGSA is used for the precise classification of satellite images from the SAT-4, SAT-6 and Eurosat datasets. The hierarchical framework includes data pre-processing and augmentation, which are used to generate the images. Subsequently, three different CNNs, namely AlexNet, LeNet-5 and ResNet, are used to extract appropriate features from the hierarchical images. The CCGSA is then used to eliminate redundant features from the extracted features based on the correlation coefficient. Further, the MSVM classifies the satellite images using both the multiple ensemble features and the selected features. The performance analysis shows that the HFEL-CCGSA method performs better than the existing methods. The classification accuracy of the HFEL-CCGSA method on the Eurosat dataset is 99.49%, which is lower than that of the GeoSystemNet model.

Author Contributions: Investigation, resources, data curation, writing (original draft preparation), writing (review and editing) and visualization were performed by K.T. and M.M.A. Conceptualization, methodology, software, validation and formal analysis were performed by A.S. Supervision, project administration and final approval of the version to be published were conducted by P.B.D. and H.K.L. All authors have read and agreed to the published version of the manuscript.