Squirrel Search Optimization with Deep Transfer Learning-Enabled Crop Classification Model on Hyperspectral Remote Sensing Imagery

With recent advances in remote sensing image acquisition and the increasing availability of fine spectral and spatial information, hyperspectral remote sensing images (HSI) have received considerable attention in several application areas such as agriculture, environment, forestry, and mineral mapping. HSIs have become an essential means of distinguishing crop classes and monitoring growth information for precision agriculture, owing to their fine spectral response to crop attributes. Recent advances in computer vision (CV) and deep learning (DL) models allow for the effective identification and classification of different crop types on HSIs. This article introduces a novel squirrel search optimization with a deep transfer learning-enabled crop classification (SSODTL-CC) model on HSIs. The proposed SSODTL-CC model aims to identify the crop type in HSIs correctly. To accomplish this, the SSODTL-CC model first applies a MobileNet with an Adam optimizer for feature extraction. In addition, an SSO algorithm with a bidirectional long short-term memory (BiLSTM) model is employed for crop type classification. To demonstrate the performance of the SSODTL-CC model, a wide-ranging experimental analysis is performed on two benchmark datasets, namely dataset-1 (WHU-Hi-LongKou) and dataset-2 (WHU-Hi-HanChuan). The comparative analysis showed better outcomes for the SSODTL-CC model than for other models, with a maximum accuracy of 99.23% and 97.15% on test datasets 1 and 2, respectively.


Introduction
Due to advancements in remote sensing image acquisition mechanisms and the increasing availability of rich spatial and spectral data from various sensors, hyperspectral imaging has become more prominent [1]. In particular, hyperspectral remote sensing image (HSI) classification has become a major tool for real-time applications in fields such as mineral mapping, agriculture, environment, and forestry [2,3]. Usually, an HSI is captured at a large number of contiguous narrow spectral bands for improved analysis of earth objects. Since the spectral resolution can be on the order of nanometers, hyperspectral sensors greatly facilitate data analysis [4] for many humanitarian tasks, including precision agriculture for improved farming practices and discrimination among vegetation classes for better treatment [5]. The current study focuses on using and analyzing HSI in agriculture. Conventional techniques, such as statistical analyses and field surveys, are time-consuming [6]. Cutting-edge remote sensing technologies involving HSI offer a suitable alternative and can fill the gap with solutions such as crop classification. In the HSI framework, classification has the common objective of automatically assigning each pixel (spectral pattern or signature) to a predetermined class [7]. Classification is carried out using either transformed features or the original features. An HSI has numerous features, which makes it hard to handle with a single convolutional kernel size. When the number of model layers is increased, many useful features are lost [8][9][10].
The authors of [11] proposed a rotation-invariant local binary pattern-based weighted generalized closest neighbor (RILBP-WGCN) approach for HSI classification. RILBP is an improved texture-based classification paradigm that applies LBP filters to selected bands to produce a broad description of spatial texture. The WGCN approach maintains spatial consistency among adjacent pixels by means of a local weighting scheme and point-to-set distances. Meng et al. [12] concentrated on DL-based crop mapping using one-shot hyperspectral satellite imagery, in which three CNN techniques (1D-CNN, 2D-CNN, and 3D-CNN) were used for end-to-end crop mapping. Furthermore, a manifold learning-based visualization method, t-distributed stochastic neighbor embedding (t-SNE), was used to demonstrate the discriminative capability of the deep semantic features extracted by the different CNN approaches.
In [13], a hybrid model was established for estimating the chlorophyll content of crops using HSI segmentation with active learning, which consists of two stages. First, a sparse multinomial logistic regression (SMLR) method learns the class posterior probability distribution with quadratic programming or joint probability distributions. Second, the information obtained in the preceding step is used to segment the HSI with a Markov random field. Farooq et al. [14] examined patch-based weed identification using HSI. A CNN was evaluated and compared with a histogram of oriented gradients (HoG) for this task, suitable patch sizes were examined, and the limitations of RGB imagery were established. In [15], a deep one-class crop (DOCC) structure comprising a DOCC extraction module and an OCC extraction loss module was presented for large-scale OCC mapping. The DOCC structure takes only samples of one target class as input, extracts the crop of interest through positive and unlabeled learning, and automatically extracts the features for OCC mapping.
In [16], a low-altitude UAV hyperspectral remote sensing platform was built to collect high-spatial-resolution remote sensing images of degraded grassland. The GDIF-3D-CNN classifier was used to classify the pure-pixel and all-pixel datasets, and its accuracy and performance were improved by optimizing eight parameters of the method. Wei et al. [17] presented a fine classification approach based on multi-feature fusion and DL. In this case, morphological profiles, GLCM texture, and endmember abundance features were used to exploit the spatial information of the HSI. Next, the spatial information was fused with the original spectral information to produce a classification result using a DNN with a conditional random field (DNN + CRF). Specifically, the DNN is a deep recognition method that extracts deep features and mines the latent information.
For small sample sizes and high-dimensional HSIs, it becomes very difficult to learn comprehensive image features; consequently, it becomes hard to recognize complex HSIs precisely. UAV-borne HSIs have rich spatial information, and their spatial resolution reaches the centimeter level; however, the higher spatial resolution causes serious spatial heterogeneity and spectral variability. Nowadays, deep learning (DL) is extensively employed in image processing because of its effective feature learning abilities [9]. Currently, the most common DL-based network framework is the convolutional neural network (CNN). CNNs have the properties of parameter sharing, equivariant mapping, and sparse interaction, which reduce the number of training parameters and the complexity of the network. These properties give the algorithm a certain degree of invariance to scaling, shifting, and distortion, as well as fault tolerance and stronger robustness [10]. Consequently, CNNs have been extensively employed in HSI classification.
This article introduces a novel squirrel search optimization with a deep transfer learning-enabled crop classification (SSODTL-CC) model on HSIs. The proposed SSODTL-CC model first applies a MobileNet with an Adam optimizer for feature extraction. The Adam optimizer allows for effective adjustment of the hyperparameters of the MobileNet model. In addition, a bidirectional long short-term memory (BiLSTM) method is employed for crop type classification. To enhance the classification efficiency of the BiLSTM model, the SSO algorithm is employed for hyperparameter optimization, which constitutes the novelty of the work. To demonstrate the performance of the SSODTL-CC model, a wide-ranging experimental analysis is performed on two benchmark datasets.

Materials and Methods
In this article, a new SSODTL-CC model has been developed to identify the crop type in HSIs properly. To do so, the proposed SSODTL-CC model performed feature extraction using MobileNet with an Adam optimizer. In addition, the BiLSTM model received feature vectors and performed crop type classification. To enhance the classifier efficiency of the BiLSTM model, the SSO algorithm was employed for hyperparameter optimization. Figure 1 illustrates the block diagram of the SSODTL-CC technique.
Appl. Sci. 2022, 12, x FOR PEER REVIEW

Data Collection
In this section, the experimental validation of the proposed model is performed on two datasets [18], namely dataset-1 (WHU-Hi-LongKou) and dataset-2 (WHU-Hi-HanChuan). Dataset-1 comprises a total of 9000 samples with nine class labels, with 1000 samples per class. Dataset-2 comprises a total of 16,000 samples with 16 class labels, with 1000 samples per class. Figure 2 shows sample HSIs from various classes, such as water spinach, soybean, strawberry, corn, sesame, and broad-leaf soybean.
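The results later in the article are reported on 70% training and 30% testing portions of each dataset. The sketch below shows one way such a split could be drawn for the dataset-1 layout described above (nine classes, 1000 samples each). Whether the original split was stratified by class is not stated, so the per-class stratification, the function name `stratified_split`, and the seed are all assumptions made for illustration.

```python
import numpy as np

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx) so that every class keeps the same train/test ratio."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_parts, test_parts = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut them at the requested fraction.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        cut = int(round(train_fraction * idx.size))
        train_parts.append(idx[:cut])
        test_parts.append(idx[cut:])
    return np.concatenate(train_parts), np.concatenate(test_parts)

# Dataset-1 layout from the text: nine classes with 1000 samples each.
labels_ds1 = np.repeat(np.arange(9), 1000)
train_idx, test_idx = stratified_split(labels_ds1)
print(len(train_idx), len(test_idx))  # 6300 2700
```

With this layout, each class contributes 700 training and 300 testing samples.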



Feature Extraction: MobileNet Model
During feature extraction, the HSIs are passed into the MobileNet model to generate feature vectors. MobileNet is a CNN-based architecture that is widely applied to classification tasks. Its main benefit is that it needs considerably less computation than a conventional CNN, which makes it suitable for mobile devices and for computers with limited computational capability. The architecture is built from convolution layers and exposes two controllable hyperparameters that trade off accuracy against latency. It is also valuable for reducing the size of the model.
The MobileNet structure is effective with a small number of parameters and has been applied to tasks such as palmprint detection. It relies on depth-wise convolution. The basic architecture is built from separate abstraction layers, i.e., modules of distinct convolution layers that factorize a standard convolution [19]. The resolution multiplier $\omega$ is added to reduce the dimensionality of the input data and, correspondingly, of the internal representation of every layer.
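To make the depth-wise and point-wise steps concrete, the following is a minimal NumPy sketch of a depth-wise separable convolution with valid padding and stride 1. It is an illustration only, not the MobileNet implementation used in the article; the function names and the tiny all-ones input are hypothetical.

```python
import numpy as np

def depthwise_conv(x, kernels):
    """Filter each channel with its own k x k kernel.

    x: (H, W, C) input, kernels: (k, k, C). Valid padding, stride 1.
    """
    k = kernels.shape[0]
    h_out, w_out = x.shape[0] - k + 1, x.shape[1] - k + 1
    out = np.zeros((h_out, w_out, x.shape[2]))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[i:i + k, j:j + k, :]
            # Sum over the spatial window only; channels stay separate.
            out[i, j, :] = np.sum(patch * kernels, axis=(0, 1))
    return out

def pointwise_conv(x, weights):
    """1 x 1 convolution that mixes channels. x: (H, W, C_in), weights: (C_in, C_out)."""
    return x @ weights

# Toy input: an 8 x 8 image with 3 channels, 3 x 3 depth-wise kernels, 4 output channels.
x = np.ones((8, 8, 3))
dw_kernels = np.ones((3, 3, 3))
pw_weights = np.ones((3, 4))
out = pointwise_conv(depthwise_conv(x, dw_kernels), pw_weights)
print(out.shape)  # (6, 6, 4); every value is 27 (9 per channel, summed over 3 channels)
```

The depth-wise step never mixes channels, so all cross-channel interaction is deferred to the cheap 1 × 1 step.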
The feature map is of size $F_m \times F_m$, and the filter is of size $F_s \times F_s$. The input variable is denoted by $p$, and the output variable is denoted by $q$. For the basic abstraction layer of the structure, the total computational cost is denoted by $c_e$. The multiplier $\omega$ takes values from one to $n$, and the variable resolution multiplier is denoted by $\alpha$. The computational effort is denoted by $cost_e$. The proposed approach combines point-wise and depth-wise convolutions, and the resulting reduction in computation is expressed by the variable $d$.
The two hyperparameters, the resolution and width multipliers, allow the window size to be adjusted for accurate prediction depending on the context. The third value of the input dimensions indicates that the input has three channels. The principle behind the MobileNet structure is to replace a complicated convolutional layer with a depth-wise convolutional layer with 3 × 3 filters applied to the input, followed by a point-wise convolutional layer of size 1 × 1 that combines the filtered values to construct a new feature.
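The cost expressions themselves are not reproduced in this text. As a hedged illustration, the sketch below uses the standard MobileNet cost model and reads $p$ and $q$ as the numbers of input and output channels, which is an assumption. A standard convolution then costs $F_s \cdot F_s \cdot p \cdot q \cdot F_m \cdot F_m$ multiply-accumulates, and the depth-wise plus point-wise pair costs $F_s \cdot F_s \cdot p \cdot F_m \cdot F_m + p \cdot q \cdot F_m \cdot F_m$. The width and resolution multipliers are left out for simplicity, and the function names are hypothetical.

```python
from fractions import Fraction

def standard_conv_cost(f_s, p, q, f_m):
    """Multiply-accumulates of a standard F_s x F_s convolution on an F_m x F_m map."""
    return f_s * f_s * p * q * f_m * f_m

def separable_conv_cost(f_s, p, q, f_m):
    """Depth-wise (one F_s x F_s filter per channel) plus point-wise (1 x 1) cost."""
    depthwise = f_s * f_s * p * f_m * f_m
    pointwise = p * q * f_m * f_m
    return depthwise + pointwise

def reduction_ratio(f_s, p, q, f_m):
    """Separable cost divided by standard cost; algebraically 1/q + 1/F_s**2."""
    return Fraction(separable_conv_cost(f_s, p, q, f_m),
                    standard_conv_cost(f_s, p, q, f_m))

ratio = reduction_ratio(f_s=3, p=32, q=64, f_m=112)
print(ratio, float(ratio))  # 73/576, i.e. roughly 0.127
```

For 3 × 3 filters the ratio is dominated by the $1/F_s^2 = 1/9$ term, so under this cost model the separable layer needs roughly one eighth to one ninth of the computation of a standard layer.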
To optimally tune the hyperparameters of the MobileNet (MobileNetv2) model, the Adam optimizer is used. Adam estimates an adaptive learning rate for each parameter while the parameters of the DNN are trained [20]. It is a well-designed and efficient first-order gradient method with low memory requirements for stochastic optimization. It is suited to ML problems with high-dimensional parameter spaces and massive datasets, and it computes the learning rates for different parameters from estimates of the first- and second-order moments of the gradient. Adam builds on gradient descent (GD) and the momentum technique. With $g_t$ the gradient at step $t$ and $\beta_1$, $\beta_2$ the exponential decay rates, the first moment is obtained using Equation (4): $m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$. The second moment is expressed as: $v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$.
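The moment estimates can be turned into a parameter update as in the following minimal NumPy sketch of the standard Adam rule, including the usual bias correction. The default values (`lr`, `beta1`, `beta2`, `eps`) are the commonly used ones and are assumptions here, since the article does not report its settings; the quadratic objective is only a toy stand-in for a network loss.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter used for bias correction."""
    m = beta1 * m + (1.0 - beta1) * grad        # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)              # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy objective f(theta) = sum(theta**2), whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t, lr=0.01)
print(theta)  # both entries end up close to zero
```

Because the update divides by the square root of the second moment, the first step moves every coordinate by roughly `lr` regardless of the gradient scale.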

Crop Type Classification: BiLSTM Model
For image classification, the extracted feature vectors are fed into the BiLSTM model, which receives the feature vector as input and performs the classification. The LSTM is a variant of the RNN that addresses the vanishing-gradient problem of the RNN by introducing a gating mechanism and a memory unit [21]. Here, x denotes the network input at different times, y refers to the network output, h stands for the hidden layer (HL), u refers to the weights from the input to the HL, w denotes the weights from the HL at the previous node to the HL at the current node, and v denotes the weights from the HL to the output layer.
In the actual implementation of the LSTM technique, the LSTM unit is updated at each time $t$. Here, $\odot$ stands for the element-wise product, and $\sigma$ denotes the sigmoid function. $x_t$ is the input vector at time $t$. $h_t$ is the HL vector, also called the output vector, which stores the information from time $t$ and the preceding times. $b_i$, $b_f$, $b_c$, $b_o$ are the bias vectors. $W_i$, $W_f$, $W_c$, $W_o$ are the weights of the respective gates applied to the HL vector $h_t$. $U_i$, $U_f$, $U_c$, $U_o$ are the weights applied to the input vector $x_t$. The subscripts $i$, $f$, $c$, and $o$ stand for the input gate, forget gate, cell unit, and output gate, respectively. Using this three-gate structure, the LSTM allows the recurrent network to retain task-relevant information in the memory units during training, thereby avoiding the vanishing-gradient problem of the RNN while capturing long-range information.
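The update equations are not reproduced in this text, so the following NumPy sketch uses the standard LSTM formulation as an assumption, keeping the convention stated above that the $U$ matrices act on the input $x_t$ and the $W$ matrices act on the hidden vector. The helper `init_params` and the small random weights are hypothetical and serve only to make the example runnable.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM update: U_* act on the input x_t, W_* on the previous hidden vector."""
    U, W, b = params["U"], params["W"], params["b"]
    i_t = sigmoid(U["i"] @ x_t + W["i"] @ h_prev + b["i"])      # input gate
    f_t = sigmoid(U["f"] @ x_t + W["f"] @ h_prev + b["f"])      # forget gate
    o_t = sigmoid(U["o"] @ x_t + W["o"] @ h_prev + b["o"])      # output gate
    c_tilde = np.tanh(U["c"] @ x_t + W["c"] @ h_prev + b["c"])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde                          # element-wise products
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical helper: small random weights and zero biases for the four gates."""
    rng = np.random.default_rng(seed)
    gates = "ifco"
    return {
        "U": {g: rng.normal(0.0, 0.1, (hidden_dim, input_dim)) for g in gates},
        "W": {g: rng.normal(0.0, 0.1, (hidden_dim, hidden_dim)) for g in gates},
        "b": {g: np.zeros(hidden_dim) for g in gates},
    }

params = init_params(input_dim=4, hidden_dim=3)
h, c = np.zeros(3), np.zeros(3)
for x_t in np.ones((5, 4)):  # a toy sequence of five identical input vectors
    h, c = lstm_step(x_t, h, c, params)
print(h.shape)  # (3,)
```

The cell state `c` is updated additively, which is what lets information persist over many steps.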
In addition to processing the sequence in the forward direction, the BiLSTM adds a backward pass, unlike a standard LSTM. This backward pass exploits the subsequent elements of the sequence. Finally, both the forward and the reverse computations are carried out, and their values are passed to the output layer simultaneously; as a result, the sequence information is captured in both directions, an approach that is also used for a variety of natural language processing tasks.
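A minimal sketch of the bidirectional idea follows: one LSTM reads the sequence forward, a second LSTM with its own weights reads it in reverse, and the two hidden sequences are concatenated per time step. The stacked gate layout, the function names, and the random weights are assumptions for illustration and do not reproduce the article's trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def run_lstm(xs, U, W, b):
    """Run an LSTM over xs (T, input_dim) and return the hidden states (T, hidden).

    The four gates are stacked row-wise in U, W, and b in the order i, f, o, c.
    """
    hidden = W.shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    outputs = []
    for x_t in xs:
        z = U @ x_t + W @ h + b
        i, f, o = (sigmoid(z[k * hidden:(k + 1) * hidden]) for k in range(3))
        g = np.tanh(z[3 * hidden:])
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def bilstm(xs, fwd, bwd):
    """Concatenate forward states with backward states re-aligned to the original order."""
    h_fwd = run_lstm(xs, *fwd)
    h_bwd = run_lstm(xs[::-1], *bwd)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)

def random_weights(input_dim, hidden, rng):
    """Hypothetical helper returning (U, W, b) with small random values."""
    return (rng.normal(0.0, 0.1, (4 * hidden, input_dim)),
            rng.normal(0.0, 0.1, (4 * hidden, hidden)),
            np.zeros(4 * hidden))

rng = np.random.default_rng(0)
xs = rng.normal(size=(6, 8))  # toy sequence: 6 steps of 8-dimensional features
fwd, bwd = random_weights(8, 5, rng), random_weights(8, 5, rng)
features = bilstm(xs, fwd, bwd)
print(features.shape)  # (6, 10)
```

At every time step the first half of the output has seen only the past, while the second half has seen only the future, so the concatenation carries context from both directions.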

Hyperparameter Tuning: SSO Algorithm
To enhance the classification efficiency of the BiLSTM model, the SSO algorithm is employed for hyperparameter optimization. The SSO technique is inspired by the foraging behavior of flying squirrels and by gliding, the efficient mode of locomotion that these small mammals use to travel. Following the food foraging hierarchy of squirrels [22], the SSO algorithm is developed iteratively as a mathematical model. The important parameters of SSO are the population size $NP$, the maximum number of iterations $Iter_{max}$, the predator presence probability $P_{dp}$, the number of decision variables $n$, the gliding constant $G_c$, the scaling factor $s_f$, and the upper and lower limits of the decision variables, $FS_U$ and $FS_L$.
The position of each squirrel is initialized at random within the search space, between $FS_L$ and $FS_U$, where rand( ) denotes a random value in [0, 1]. The fitness values $f = (f_1, f_2, \ldots, f_{NP})$ of the squirrel positions are computed by substituting the decision variables into the fitness function (FF). The quality of each food source is then given by the fitness value of the corresponding squirrel position. The food sources are subsequently organized into hickory trees, oak trees (acorn nuts), and normal trees. The optimal food source (lowest fitness) is taken to be the hickory nut tree ($FS_{hr}$), the next best food sources are taken to be acorn nut trees ($FS_{ar}$), and the rest are called normal trees ($FS_{nt}$).
The three scenarios that describe the dynamic gliding behavior of the squirrels are as follows.
Scenario 1. The squirrel resides in an acorn nut tree and jumps to a hickory nut tree. The new location is computed from the gliding distance $d_g$, the gliding constant $G_c$, and $R_1$, a function that returns a value drawn from a uniform distribution between 0 and 1.
Scenario 2. The squirrel resides in a normal tree and moves to an acorn nut tree to gather the food it needs. The new location is determined in the same manner, where $R_2$ is a function that returns a value drawn from a uniform distribution in [0, 1].
Scenario 3. Squirrels on normal trees move to hickory nut trees once they have met their routine requirements. The new position is determined in the same manner, where $R_3$ is a function that returns a value drawn from a uniform distribution in [0, 1].
The unscaled gliding distance is large and can cause excessive perturbation. To obtain acceptable behavior, a scaling factor ($sf$) is used as a divisor of $d_g$.
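The position updates are not reproduced in this text. The sketch below follows the commonly published form of the squirrel search algorithm, in which a squirrel moves toward its target tree by $d_g \cdot G_c$ times the position difference unless a predator appears (random number below $P_{dp}$), in which case it relocates at random. Every numeric setting here (`p_dp=0.1`, `g_c=1.9`, the $d_g$ range, three acorn trees), the 50/50 choice between Scenarios 2 and 3, the clipping to the bounds, and the sphere fitness function are assumptions for illustration; in the article the fitness would come from the BiLSTM's performance.

```python
import numpy as np

def init_squirrels(n_squirrels, lower, upper, rng):
    """Uniform random start: FS = FS_L + rand() * (FS_U - FS_L)."""
    return lower + rng.random((n_squirrels, lower.size)) * (upper - lower)

def glide(current, target, lower, upper, rng, p_dp=0.1, g_c=1.9):
    """Glide toward target; if a predator appears (R < P_dp), relocate at random."""
    if rng.random() >= p_dp:
        d_g = rng.uniform(0.5, 1.11)  # assumed range of the already scaled gliding distance
        new = current + d_g * g_c * (target - current)
    else:
        new = lower + rng.random(current.size) * (upper - lower)
    return np.clip(new, lower, upper)  # keeping positions inside the bounds is an assumption

def sso_iteration(positions, fitness_fn, lower, upper, rng, n_acorn=3):
    """One generation: rank the squirrels, then apply the three gliding scenarios."""
    fitness = np.apply_along_axis(fitness_fn, 1, positions)
    positions = positions[np.argsort(fitness)]  # ascending: lower fitness is better
    hickory = positions[0].copy()               # best squirrel (hickory nut tree)
    acorn = positions[1:1 + n_acorn].copy()     # next best squirrels (acorn nut trees)
    new_positions = positions.copy()            # the hickory squirrel itself is kept
    for k in range(1, 1 + n_acorn):             # Scenario 1: acorn -> hickory
        new_positions[k] = glide(positions[k], hickory, lower, upper, rng)
    for k in range(1 + n_acorn, len(positions)):
        if rng.random() < 0.5:                  # Scenario 2: normal -> acorn
            target = acorn[rng.integers(n_acorn)]
        else:                                   # Scenario 3: normal -> hickory
            target = hickory
        new_positions[k] = glide(positions[k], target, lower, upper, rng)
    return new_positions

def sphere(x):
    """Toy fitness function standing in for the classifier's error."""
    return float(np.sum(x ** 2))

rng = np.random.default_rng(0)
lower, upper = np.full(2, -5.0), np.full(2, 5.0)
swarm = init_squirrels(20, lower, upper, rng)
initial_best = min(sphere(p) for p in swarm)
for _ in range(50):
    swarm = sso_iteration(swarm, sphere, lower, upper, rng)
final_best = min(sphere(p) for p in swarm)
print(initial_best, final_best)
```

Because the best squirrel is carried over unchanged in this sketch, the best fitness in the population can never get worse from one generation to the next.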
The foraging behavior of flying squirrels depends on the season, which changes over time. Therefore, a seasonal monitoring condition is checked so that the search does not become trapped in a local optimum. The seasonal constant $S_c$ is monitored, and its minimum value is given by $S_{c\min} = \dfrac{10\mathrm{E}{-6}}{365^{\,Iter/(Iter_{max}/2.5)}}$. When $S_c^t < S_{c\min}$, winter is over, and the squirrels that have lost their ability to explore are relocated to search for food sources at new locations. The Lévy distribution is employed in this relocation to improve the global search. The procedure stops when the maximum number of iterations is reached. Otherwise, the generation of new locations and the seasonal monitoring check are repeated.
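The seasonal check can be sketched as below. Only the expression for the minimum seasonal constant comes from the text; the Euclidean-distance form of the seasonal constant, the Mantegna-style Lévy step with $\beta = 1.5$, and the relocation rule are assumptions taken from the commonly published squirrel search formulation, and all function names are hypothetical.

```python
import math
import numpy as np

def seasonal_constant(acorn_position, hickory_position):
    """Assumed form: Euclidean distance between an acorn squirrel and the hickory squirrel."""
    return float(np.sqrt(np.sum((acorn_position - hickory_position) ** 2)))

def seasonal_minimum(iteration, max_iteration):
    """S_cmin = 10E-6 / 365 ** (Iter / (Iter_max / 2.5)), as given in the text."""
    return 10e-6 / (365.0 ** (iteration / (max_iteration / 2.5)))

def levy(size, rng, beta=1.5):
    """Mantegna-style Levy step: 0.01 * r_a * sigma / |r_b| ** (1 / beta)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    r_a = rng.normal(0.0, 1.0, size)
    r_b = rng.normal(0.0, 1.0, size)
    return 0.01 * r_a * sigma / np.abs(r_b) ** (1 / beta)

def relocate(lower, upper, rng):
    """Assumed relocation rule: FS_new = FS_L + Levy(n) * (FS_U - FS_L), kept in bounds."""
    new = lower + levy(lower.size, rng) * (upper - lower)
    return np.clip(new, lower, upper)

rng = np.random.default_rng(0)
lower, upper = np.full(2, -5.0), np.full(2, 5.0)
acorn, hickory = np.array([1e-7, 0.0]), np.zeros(2)
if seasonal_constant(acorn, hickory) < seasonal_minimum(iteration=10, max_iteration=100):
    relocated = relocate(lower, upper, rng)  # winter is over: move a stuck squirrel
else:
    relocated = acorn
print(relocated)
```

The threshold shrinks rapidly with the iteration count, so relocation becomes rarer as the run proceeds and the search shifts from exploration to exploitation.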


Result Analysis of SSODTL-CC Model
This section investigates the performance of the proposed model on test images. Figure 3 shows sample classification results obtained by the SSODTL-CC model. The figure indicates that the proposed model obtained effective classification results. In addition, some of the regions misclassified by the SSODTL-CC model are marked with blue circles.
Table 1 reports the detailed crop classification outcomes of the SSODTL-CC model on all of dataset-1. The experimental values indicate that the SSODTL-CC model performed well on every individual class. For instance, in the corn class, the SSODTL-CC model offered $accu_y$, $prec_n$, and $reca_l$ of 99.24%, 97.55%, and 95.60%, respectively. Similarly, in the mixed weed class, the model reached $accu_y$, $prec_n$, and $reca_l$ of 99.27%, 96.70%, and 96.70%, respectively. Overall, the SSODTL-CC model showed average $accu_y$, $prec_n$, and $reca_l$ of 99.20%, 96.43%, and 96.40%, respectively.
Table 2 gives the crop classification outcomes of the SSODTL-CC approach on the 70% training portion of dataset-1. For instance, in the corn class, the model offered $accu_y$, $prec_n$, and $reca_l$ of 99.19%, 97.04%, and 95.49%, respectively. In the mixed weed class, it obtained $accu_y$, $prec_n$, and $reca_l$ of 99.27%, 96.67%, and 96.67%, respectively. Overall, the SSODTL-CC model demonstrated average $accu_y$, $prec_n$, and $reca_l$ of 99.19%, 96.38%, and 96.35%, respectively.
Table 3 gives the detailed crop classification outcomes of the SSODTL-CC model on the 30% testing portion of dataset-1. For instance, in the corn class, the model presented $accu_y$, $prec_n$, and $reca_l$ of 99.37%, 98.68%, and 95.85%, respectively. Furthermore, in the mixed weed class, it reached $accu_y$, $prec_n$, and $reca_l$ of 99.26%, 96.76%, and 96.76%, respectively. Overall, the SSODTL-CC model achieved average $accu_y$, $prec_n$, and $reca_l$ of 99.23%, 96.54%, and 96.53%, respectively.
Table 4 gives the detailed crop classification outcomes of the SSODTL-CC model on all of dataset-2. For instance, in class 1, the model obtained $accu_y$, $prec_n$, and $reca_l$ of 97.39%, 79.57%, and 78.30%, respectively. In class 16, it gained $accu_y$, $prec_n$, and $reca_l$ of 97.17%, 74.98%, and 82.10%, respectively. Overall, the SSODTL-CC model achieved average $accu_y$, $prec_n$, and $reca_l$ of 97.13%, 74.98%, and 82.10%, respectively.
Table 5 reports the crop classification outcomes of the SSODTL-CC model on the 70% training portion of dataset-2. For instance, in class 1, the model offered $accu_y$, $prec_n$, and $reca_l$ of 97.43%, 79.74%, and 79.29%, respectively. In class 16, it reached $accu_y$, $prec_n$, and $reca_l$ of 97.21%, 74.08%, and 81.65%, respectively. Overall, the SSODTL-CC methodology exhibited average $accu_y$, $prec_n$, and $reca_l$ of 97.13%, 77.08%, and 77.05%, respectively.
Table 6 gives the detailed crop classification outcomes of the SSODTL-CC technique on the 30% testing portion of dataset-2. For instance, in class 1, the model offered $accu_y$, $prec_n$, and $reca_l$ of 97.29%, 79.15%, and 75.93%, respectively. In the same way, in class 16, it reached $accu_y$, $prec_n$, and $reca_l$ of 97.06%, 76.80%, and 82.99%, respectively. Overall, the SSODTL-CC approach showed average $accu_y$, $prec_n$, and $reca_l$ of 97.15%, 77.25%, and 77.09%, respectively.
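The article does not state how the per-class accuracy is computed. One reading that would be consistent with the reported numbers is a one-vs-rest accuracy, where true negatives from all other classes are counted. For example, with 16 balanced classes of 1000 samples and roughly 230 false negatives and 230 false positives per class, the one-vs-rest accuracy is about 1 − 460/16,000 ≈ 97.1%, even though precision and recall are near 77%. The sketch below computes the three metrics under that assumption from a small hypothetical confusion matrix; the matrix values are invented for illustration and are not the article's results.

```python
import numpy as np

def per_class_metrics(conf):
    """One-vs-rest accuracy, precision, and recall for every class.

    conf[i, j] is the number of samples of true class i predicted as class j.
    """
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp   # predicted as the class but belonging to another
    fn = conf.sum(axis=1) - tp   # belonging to the class but predicted as another
    tn = total - tp - fp - fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

# Hypothetical three-class confusion matrix with ten samples per class.
conf = np.array([[8, 1, 1],
                 [2, 7, 1],
                 [0, 2, 8]])
accuracy, precision, recall = per_class_metrics(conf)
print(accuracy.round(4), precision, recall)
```

In this toy case the one-vs-rest accuracy of every class is higher than its precision and recall, which mirrors the pattern in Tables 1 to 6.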

Discussion
To confirm the improved crop classification results of the SSODTL-CC model, a comparison with recent models on the two datasets is given in Table 7 [22,23]. The SSODTL-CC model was compared with existing approaches on dataset-2. The outcomes indicated that the SVM model performed worst, with the lowest accuracy of 77.34%. The FNEA-OO model achieved a higher accuracy of 86.49%. The SVRFMC, CNN, and CNN-CRF models reached 86.95%, 87.72%, and 94.67%, respectively. Finally, the SSODTL-CC methodology demonstrated superior performance with an accuracy of 97.15%. From these results and discussions, it is evident that the SSODTL-CC model is capable of attaining improved crop classification outcomes on HSIs.

Conclusions
In this article, a new SSODTL-CC model was developed to identify the crop type in HSIs correctly. To do so, the proposed SSODTL-CC model performed feature extraction using MobileNet with an Adam optimizer. In addition, the BiLSTM model received the feature vectors and performed crop type classification. To enhance the classification efficiency of the BiLSTM model, the SSO algorithm was employed for hyperparameter optimization. To demonstrate the performance of the SSODTL-CC model, a wide-ranging experimental analysis was performed on two benchmark datasets, namely dataset-1 (WHU-Hi-LongKou) and dataset-2 (WHU-Hi-HanChuan). The comparative analysis showed better outcomes for the SSODTL-CC model than for recent approaches, with a maximum accuracy of 99.23% and 97.15% on test datasets 1 and 2, respectively. Therefore, the SSODTL-CC model can be used for effective crop type classification on HSIs. In the future, the classification performance of the SSODTL-CC model can be enhanced through the design of hybrid DL models.
Data Availability Statement: Data sharing not applicable to this article as no datasets were generated during the current study.