Novel Vision Transformer–Based Bi-LSTM Model for LU/LC Prediction—Javadi Hills, India

Abstract: Continuous monitoring and observation of the earth's environment has become an active research area in the field of remote sensing. Many researchers have provided Land Use/Land Cover (LU/LC) information for the past, present, and future for their study areas around the world. This research work builds a novel Vision Transformer-based bidirectional long short-term memory (Bi-LSTM) model for predicting the LU/LC changes by using the LISS-III and Landsat bands for the forest- and non-forest-covered regions of Javadi Hills, India. The proposed Vision Transformer model achieves good classification accuracy, with an average of 98.76%. The impact of the Land Surface Temperature (LST) map on the LU/LC classification map provides good validation results, with an average accuracy of 98.38%, during the Bi-LSTM-based prediction analysis. The authors also introduce an application-based explanation of the predicted results through the Google Earth Engine platform of Google Cloud so that the predicted results will be more informative and trustworthy to the urban planners and forest department for taking proper action in the protection of the environment. In this work, the features of the LST map and the LU/LC change classification map were used for evaluating the LU/LC prediction map for Javadi Hills. The LST map shows the high- and low-temperature values of the earth's surface: high-temperature values indicate less vegetation, and low-temperature values indicate high-vegetation areas. The impact of the LST map on the LU/LC change classification map provides good accuracy during the LU/LC prediction process. The relationship between the values of the LST and LU/LC maps is shown in Section 5.1.


Introduction
The Land Use/Land Cover (LU/LC) prediction is one of the most significant applications of remote sensing and GIS technology. The main causes of LU/LC changes are agricultural/crop damage, wetland change, deforestation, urban expansion, and vegetation loss. Several researchers working in this application area for many years have reported different findings for their study areas around the world. The importance of this LU/LC prediction research is to provide information about the landscape changes of a specific study area to the government officials, forest department, urban planners, and social workers for the protection of the LU/LC environment [1][2][3]. Remote sensing technology provides satellite data and helps in performing the LU/LC prediction research effectively. Researchers have used different remote sensing satellite systems for acquiring data, including Advanced Land Imager (ALI), Hyperion data, Linear Imaging Self-Scanning Sensor III (LISS-III), Linear Imaging Self-Scanning Sensor IV (LISS-IV), the Landsat series, Sentinel-2A and -2B, Moderate Resolution Imaging Spectroradiometer (MODIS), Rapid Eye Earth Imaging System (REIS), and ASTER Global DEM (Digital Elevation Model). Other data for performing LU/LC prediction research can be acquired through aerial photographs, Google Earth images, government records, and field or ground survey data. Satellite and airborne data have been used in many application areas such as oceanography, landscape monitoring, weather forecasting, biodiversity conservation, forestry, cartography, surveillance, and warfare [4][5][6][7][8][9][10]. The different bands in multispectral data, covering the visible (red, green, and blue), near-infrared (NIR), and short-wave infrared domains, have been widely used in monitoring the LU/LC changes around the world. Many researchers have been motivated by, and contributed to, the significant problem of LU/LC prediction analysis.
The LU/LC change detection for past, present, and future analysis has been a key research topic for understanding environmental change on the earth's surface. Hence, LU/LC feature extraction has emerged as an essential research aspect, and therefore, a standard and accurate methodology for LU/LC classification and prediction should be established. By using satellite system technology, we can perform our research on LU/LC change analysis. The main purpose of this research is to assist land-resource management, government officials, the forest department, and urban planners in taking action to protect the earth's environment. From a brief survey of different classification and prediction algorithms, we found that the sustainable growth of the LU/LC environment for time-series data requires an accurate classification and prediction map, which was the strong motivation for our study. The main contributions of our work are as follows:

• A novel Vision Transformer-based bidirectional long short-term memory (Bi-LSTM) model is proposed for predicting the LU/LC changes of Javadi Hills, India.

• The use of the LST map with the Vision Transformer-based LU/LC classification map provides the main advantage of achieving good validation accuracy with less computational time during the Bi-LSTM-based LU/LC prediction analysis.

• The impacts of the Multi-Satellite System (LISS-III multispectral with the Landsat TIRS, RED, and NIR bands) on the proposed LU/LC prediction model for Javadi Hills, India, are analyzed.

• Explainable Artificial Intelligence (XAI), an application-based explanation, is also introduced for validating the predicted results through the Google Earth Engine platform of Google Cloud, so that the predicted results will be more informative and trustworthy to the urban planners and forest department to take appropriate measures in the protection of the environment.

Materials and Methods
This section elaborates the various stages of our proposed prediction model: (i) the study area and data acquisition, (ii) proposed Vision Transformer-based LULC classification, (iii) description of expression for calculating and analyzing the LST map, (iv) Bi-LSTM model for LULC prediction, and (v) description of explainable AI and its importance.

Study Area and Data Acquisition
The study area in our research work is the forest- and non-forest-covered area of Javadi Hills, with geographic coordinates falling between 78.75° E, 12.5° N and 79.0° E, 12.75° N. Our study area is located across the Eastern Ghats of the Vellore and Tiruvannamalai districts, Tamil Nadu, India. The UTM (Universal Transverse Mercator) GCS (geographic coordinate system)/WGS (World Geodetic System) 1984 (44 N) projection system was applied to the extracted satellite data. The location of the Javadi Hills map was extracted from Google Earth Engine (https://www.google.com/earth/ (accessed on 10 November 2021)). The map view of our study area was prepared by using ArcGIS (Version 10.1, developed by ESRI (http://www.esri.com/software/arcgis)) geospatial software, and it is shown in Figure 1.
The multispectral LISS-III satellite images for the years 2012 and 2015 were collected from the Bhuvan Indian Geo-Platform of ISRO (www.bhuvan.com (accessed on 9 December 2019)). The extracted LISS-III multispectral data of Javadi Hills were used for the LU/LC classification process. The TIRS, RED, and NIR bands of Landsat 8 (Band 10) and Landsat 7 (Band 6) were collected from the United States Geological Survey (USGS), United States (https://earthexplorer.usgs.gov (accessed on 16 December 2019)) and were used for the estimation of LST. There is no TIRS band in the LISS-III sensor, so we extracted the TIRS image from the Landsat satellite data for our study area; the TIRS band provides the impact of LST on Javadi Hills for the years 2012 and 2015. Table 1 shows the source and characteristics of the remotely sensed satellite images. In our research work, atmospheric corrections were made to provide good visibility in the extracted LISS-III multispectral satellite image of Javadi Hills. Scan-line error correction was made for filling the gaps in the extracted Landsat TIRS image of Javadi Hills. Geometric correction was made to extract the Region of Interest (ROI) coordinates in the forest- and non-forest-covered area of Javadi Hills that falls between 78.80° E, 12.56° N and 78.85° E, 12.60° N.

Proposed Vision Transformer Model for LU/LC Classification
A transformer is a deep-learning model that has emerged through the self-attention mechanism. The transformer follows the encoder-decoder architecture by processing the sequential data in parallel without depending on any recurrent network. It has been widely used in the scientific fields of NLP and computer vision. The Vision Transformer architecture has attracted interest from researchers in recent years by showing good performance in the area of machine- and deep-learning applications. The Vision Transformer has been used in the area of image classification, providing state-of-the-art performance and outperforming standard classification models. The Vision Transformer uses the encoder module of the transformer for performing image classification by mapping the sequence of image patches to the classified label. The attention mechanism of the Vision Transformer goes through all areas of the image and integrates the information into the full-sized image [47][48][49][50][51]. The end-to-end Vision Transformer model for the classification of satellite images is shown in Figure 6. The Vision Transformer classification model was experimented on the preprocessed LISS-III satellite images of Javadi Hills for the years 2012 and 2015. The Vision Transformer architecture is composed of an embedding, encoder, and classifier layer. Equations (1) and (2) represent the first step of analyzing and dividing the training images into a sequence of patches. Let S represent a set of r training satellite images, where X_i is a satellite image; y_i represents the class label {y_i ∈ 1, 2, ..., m} associated with X_i, and m denotes the number of defined LU/LC classes for that set.
In the first step of the Vision Transformer model, an image X_i from the training set is divided into non-overlapping patches of fixed size. Each patch is treated by the Vision Transformer as an individual token. Thus, from an image X_i of size h × w × c (where h is the height, w is the width, and c is the number of channels), we extract patches of dimension p × p × c (p is the patch size). The extracted patches are converted to a sequence of images (x_1, x_2, x_3, ..., x_n) of length n through flattening.
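The patch-splitting step above can be sketched in a few lines of NumPy; the tile size, patch size, and channel count here are illustrative, not the values used in the paper.

```python
import numpy as np

def extract_patches(image, p):
    """Split an image (h, w, c) into non-overlapping p x p patches,
    then flatten each patch into a vector of length p*p*c."""
    h, w, c = image.shape
    assert h % p == 0 and w % p == 0, "image must divide evenly into patches"
    # (h//p, p, w//p, p, c) -> (h//p, w//p, p, p, c): group rows/cols into blocks
    patches = image.reshape(h // p, p, w // p, p, c).swapaxes(1, 2)
    n = (h // p) * (w // p)
    return patches.reshape(n, p * p * c)

# Example: a 6x6 3-band tile with patch size 3 yields 4 patches of length 27.
tile = np.arange(6 * 6 * 3).reshape(6, 6, 3)
seq = extract_patches(tile, 3)
print(seq.shape)  # (4, 27)
```

Each row of `seq` is one flattened patch token, ready for the linear embedding described next.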
The image patches are linearly projected into vectors of model dimension, d, using the learned embedding matrix, E. The concatenation of the embedded representations is processed along with the trainable classification token, v_class, for performing the classification task. The positional information, E_pos, is computed and added to the patch representations. The spatial arrangements of the trained image patches are preserved through positional embedding. The resulting sequence of image patches from positional embedding with token z_0 is given in Equation (3).
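The embedding step of Equation (3) can be illustrated as follows; the dimensions and random parameters are placeholders for the learned values, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, patch_dim, d = 4, 27, 8                  # n patches, flattened patch size, model dim

patches = rng.normal(size=(n, patch_dim))   # flattened patches (x_1 .. x_n)
E = rng.normal(size=(patch_dim, d))         # learned embedding matrix
v_class = rng.normal(size=(1, d))           # trainable classification token
E_pos = rng.normal(size=(n + 1, d))         # positional embeddings

# z_0 = [v_class; x_1 E; ...; x_n E] + E_pos   (Equation (3))
z0 = np.concatenate([v_class, patches @ E], axis=0) + E_pos
print(z0.shape)  # (5, 8)
```

The extra first row is the classification token; its final-layer representation is what the head classifier reads.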
The resulting sequence of embedded image patches, z_0, is passed into the transformer encoder with L identical layers. Each layer has a multi-head self-attention (MSA) block and a fully connected feed-forward MLP (Multilayer Perceptron) block with the GeLU activation function. The two subcomponents of the encoder work with residual skip connections through the normalization layer (LN). The representation of the two main components of the encoder is given in Equations (4) and (5). From the last layer of the encoder, the first element in the sequence, z_L^0, is passed into the head classifier for attaining the LU/LC classes.
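The residual structure of Equations (4) and (5) can be sketched as below; the attention block is passed in as a function (an identity placeholder in the demo), and the MLP weights are random stand-ins for learned parameters.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    mu = x.mean(-1, keepdims=True)
    sd = x.std(-1, keepdims=True)
    return (x - mu) / (sd + eps)

def gelu(x):
    # tanh approximation of the GeLU activation
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

def encoder_layer(z, msa, W1, b1, W2, b2):
    # Equation (4): z' = MSA(LN(z)) + z   (residual skip connection)
    z = msa(layer_norm(z)) + z
    # Equation (5): z_out = MLP(LN(z')) + z'
    h = gelu(layer_norm(z) @ W1 + b1)
    return h @ W2 + b2 + z

rng = np.random.default_rng(1)
z = rng.normal(size=(5, 8))
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 8)), np.zeros(8)
out = encoder_layer(z, lambda x: x, W1, b1, W2, b2)  # identity msa for illustration
print(out.shape)  # (5, 8)
```

In the full model the placeholder `msa` is the multi-head self-attention block described below, and L such layers are stacked.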
The transformer block for the classification model is shown in Figure 7. The MSA block of the encoder is considered the central component of the transformer. The MSA block determines the importance of a single patch embedding relative to the other embeddings in the sequence. There are four layers in the MSA block: the linear layer, the self-attention layer, the concatenation layer, and a final linear layer. The attention weight is computed by calculating the weighted sum of all values in the sequence. The query-key-value scaling dot product is computed by the self-attention (SA) head through the attention weights. The Q (query), K (key), and V (value) matrices are generated by multiplying the elements against three learned matrices, U_QKV (Equation (7)). For determining the significance of the elements in the sequence, the dot product is taken between the Q vector of one element and the K vectors of the other elements. The result shows the importance of the image patches in the sequence. The outcomes of the dot product are scaled and passed into a Softmax (Equation (8)).
[Q, K, V] = z U_QKV, U_QKV ∈ R^(d × 3D_k) (7)

The scaling-dot-product process achieved by the SA block is related to the standard dot product, but it includes the dimension of the key, D_K, as a scaling factor. The patches with high attention scores (Equation (8)) are processed by multiplying the outputs of Softmax with the values of each patch embedding vector. The results of all the attention heads are concatenated and provided to the MLP classifier for attaining the pixel-value representation of the feature map (Equation (10)). The resampling was performed for adjusting the size of the feature map so that the output classified image would be represented in the standardized form during the accuracy assessment. The training data with the different parameters that define the Vision Transformer classification model of our research work are presented in Section 5.1. The LU/LC classification map for the years 2012 and 2015 is shown in Figure 8. The accuracy assessment for the feature-extraction-based classification model is shown in Section 5.2. The evaluation of the LU/LC classification map was achieved through the accuracy assessment. The percentage of LU/LC change between the years 2012 and 2015 for our study area was calculated. Based on the good accuracy results, the LU/LC change classification map was processed for further findings of the LU/LC prediction map.
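A single SA head with the scaled dot product of Equations (7) and (8) can be sketched as follows; the sequence length and head dimension are arbitrary demonstration values.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def self_attention(z, U_qkv, d_k):
    """One SA head: [Q, K, V] = z @ U_qkv (Eq. (7)); scaled dot product + Softmax (Eq. (8))."""
    qkv = z @ U_qkv                        # (n, 3*d_k)
    Q, K, V = np.split(qkv, 3, axis=-1)
    A = softmax(Q @ K.T / np.sqrt(d_k))    # attention weights; each row sums to 1
    return A @ V                           # weighted sum of value vectors

rng = np.random.default_rng(2)
z = rng.normal(size=(5, 8))                # 5 tokens of model dimension 8
U_qkv = rng.normal(size=(8, 3 * 4))        # learned projection (random stand-in), d_k = 4
out = self_attention(z, U_qkv, d_k=4)
print(out.shape)  # (5, 4)
```

In the MSA block, several such heads run in parallel and their outputs are concatenated before the final linear layer.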

Land Surface Temperature
The LST measures the skin temperature of the earth's surface in the field of remote sensing. It displays the cold and hot temperatures of the earth's surface through the radiant energy emitted from the surface. Thermal-infrared remote-sensing data are used for measuring the LST. The TIRS data help in recognizing the mixture of bare-soil and vegetation temperatures through LST [52][53][54]. In our research work, we estimated the LST from the TIRS bands of Landsat 7 and 8. Equations (11)-(13) represent the estimation of LST for the TIRS image of Landsat 7. The conversion of the Digital Number (DN) value to the radiance of the TIRS image is calculated by using Equation (11). The conversion of radiance into brightness temperature is shown in Equation (12). The degree conversion from Kelvin (K) to Celsius (°C) is shown in Equation (13).
L_λ = ((LMAX_λ − LMIN_λ)/(QCALMAX − QCALMIN)) × (QCAL − QCALMIN) + LMIN_λ (11)

where L_λ represents the spectral radiance in Watts/(m² × sr × µm), QCAL represents the quantized calibrated pixel value, QCALMAX represents the maximum quantized calibrated pixel value, QCALMIN represents the minimum quantized calibrated pixel value, LMAX_λ represents the spectral radiance scaled to QCALMAX, and LMIN_λ represents the spectral radiance scaled to QCALMIN.

T_K = K2 / ln((K1/L_λ) + 1) (12)

T_C = T_K − 273.15 (13)

where T_K represents the effective at-satellite temperature in Kelvin, and K1 and K2 represent calibration constants 1 and 2, respectively. For Landsat 7, the calibration constant values of K1 and K2 are 666.09 and 1282.71, respectively.
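The Landsat 7 chain of Equations (11)-(13) can be sketched as below; K1 and K2 are the constants stated in the text, while the LMAX/LMIN and QCAL range values are metadata-dependent placeholders, not the scene values used in this study.

```python
import numpy as np

K1, K2 = 666.09, 1282.71        # Landsat 7 Band 6 calibration constants (from the text)
LMAX, LMIN = 17.04, 0.0         # example radiance scaling values (scene-metadata assumption)
QCALMAX, QCALMIN = 255.0, 1.0   # 8-bit DN range

def landsat7_lst_celsius(dn):
    """DN -> spectral radiance (Eq. 11) -> brightness temperature in K (Eq. 12) -> Celsius (Eq. 13)."""
    radiance = (LMAX - LMIN) / (QCALMAX - QCALMIN) * (dn - QCALMIN) + LMIN
    t_kelvin = K2 / np.log(K1 / radiance + 1.0)
    return t_kelvin - 273.15

dn = np.array([120.0, 150.0, 180.0])   # sample DN values
print(np.round(landsat7_lst_celsius(dn), 2))
```

Higher DN values map to higher radiance and therefore higher surface temperature, which is what the LST map visualizes.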
Equations (14)-(20) represent the estimation of LST for the TIRS image of Landsat 8. By using the radiance rescaling factor, the conversion to Top of Atmosphere (TOA) spectral radiance is shown in Equation (14). By using the thermal-infrared constant values in the metadata file of the satellite image, the spectral radiance data are converted to the TOA brightness temperature, and the expression is shown in Equation (15). The NDVI is calculated for differentiating the near-infrared and visible reflectance of the vegetation cover of the satellite data. The expression for NDVI is shown in Equation (16). The Land Surface Emissivity (LSE) is derived from NDVI values for displaying the average emissivity of the earth's surface. The expressions are shown in Equations (17) and (18). By using the results of the TOA brightness temperature, emitted radiance wavelength, and LSE, the LST was calculated and is shown in Equation (19).
TL_λ = ML × QCAL + AL − O_i (14)

where TL_λ represents the TOA spectral radiance in Watts/(m² × sr × µm), ML represents the radiance multiplicative band rescaling factor of the TIRS image, QCAL represents the quantized calibrated pixel value, AL represents the radiance additive band rescaling factor of the TIRS image, and O_i represents the correction value of the TIRS band of Landsat 8.
BT_P = K2 / ln((K1/TL_λ) + 1) − 273.15 (15)

where BT_P represents the TOA brightness temperature in Celsius, and K1 and K2 represent calibration constants 1 and 2, respectively. For Landsat 8, the calibration constant values of K1 and K2 are 774.8853 and 1321.0789, respectively.
NDVI = (NIR − RED)/(NIR + RED) (16)

where NDVI represents the Normalized Difference Vegetation Index, NIR represents the reflectance values of the near-infrared band, and RED represents the reflectance values of the red band.
PV = ((NDVI − NDVI_min)/(NDVI_max − NDVI_min))² (17)

E = 0.004 × PV + 0.986 (18)

where E represents the Land Surface Emissivity, PV represents the Proportion of Vegetation, NDVI represents the reflectance values of the NDVI image, NDVI_max represents the maximum reflectance value of the NDVI image, and NDVI_min represents the minimum reflectance value of the NDVI image.
LST = BT_P / (1 + (λ × BT_P / ρ) × ln(E)) (19)

ρ = pk × vl / bc (20)

where LST represents the Land Surface Temperature, BT_P represents the TOA brightness temperature in Celsius (°C), λ represents the wavelength of the emitted radiance, pk represents Planck's constant value of 6.626 × 10⁻³⁴ J s, vl represents the velocity of light, 2.998 × 10⁸ m/s, and bc represents the Boltzmann constant value of 1.38 × 10⁻²³ J/K. The statistical modeling of the TIRS bands present in the Landsat satellite images was used for analyzing the LU/LC surface temperature of Javadi Hills, and it helps in improving the performance of the LU/LC prediction model. The LST map of Javadi Hills for the years 2012 and 2015 was analyzed by using the TIRS bands of Landsat 7 and 8. The flow of the LST calculation for our area of Javadi Hills is shown in Figure 9. The LST map for the years 2012 and 2015 is shown in Figure 10.
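The full Landsat 8 chain of Equations (14)-(19) can be sketched as below; K1 and K2 are the constants stated in the text, while ML, AL, O_i, the Band 10 wavelength, and the sample pixel values are typical/assumed metadata values, not this study's scene values.

```python
import numpy as np

K1, K2 = 774.8853, 1321.0789                 # Landsat 8 Band 10 constants (from the text)
ML, AL, O_i = 3.342e-4, 0.1, 0.29            # rescaling factors and TIRS correction (assumed)
PLANCK, LIGHT, BOLTZMANN = 6.626e-34, 2.998e8, 1.38e-23
WAVELENGTH = 10.895e-6                       # Band 10 center wavelength in meters (assumed)

def landsat8_lst(dn, nir, red):
    toa = ML * dn + AL - O_i                                      # Eq. (14): TOA radiance
    bt = K2 / np.log(K1 / toa + 1.0) - 273.15                     # Eq. (15): brightness temp (deg C)
    ndvi = (nir - red) / (nir + red)                              # Eq. (16)
    pv = ((ndvi - ndvi.min()) / (ndvi.max() - ndvi.min())) ** 2   # Eq. (17): proportion of vegetation
    emissivity = 0.004 * pv + 0.986                               # Eq. (18): land surface emissivity
    rho = PLANCK * LIGHT / BOLTZMANN                              # Eq. (20)
    return bt / (1 + (WAVELENGTH * bt / rho) * np.log(emissivity))  # Eq. (19): LST

dn = np.array([22000.0, 25000.0, 28000.0])   # sample 16-bit thermal DN values
nir = np.array([0.40, 0.30, 0.20])           # sample NIR reflectance
red = np.array([0.10, 0.15, 0.20])           # sample red reflectance
print(np.round(landsat8_lst(dn, nir, red), 2))
```

The emissivity term only nudges the brightness temperature, but it is what separates bare-soil from vegetated pixels in the final LST map.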

Bidirectional Long Short-Term Memory Model for LU/LC Prediction
The LSTM model is considered an advanced form of RNN, in which long-term dependencies can be learned for sequence prediction problems. The long-term vanishing-gradient problems are prevented by using the LSTM models. The key elements of the LSTM model are the input, forget, and output gates [55][56][57]. Figure 11 displays the working principle of the LSTM model. In Figure 11, the vector operations represent element-wise multiplication (*) and element-wise summation (+), respectively. The time step (t) indicates the length of the input sequence in all of Equations (21)-(26).

f_t = σ(W_f × [h_{t−1}, x_t] + b_f) (21)

Equation (21) shows the mathematical expression of the forget gate, where f_t represents the memory gate's output at time t, σ represents the sigmoid function (0 < σ < 1), W_f represents the weight value of the ANN, h_{t−1} is the output value of the previous cell, x_t represents the input values, and b_f denotes the bias weight values of the ANN. At the output of the equation, the value 1 will keep the information and the value 0 will forget the information.

I_t = σ(W_i × [h_{t−1}, x_t] + b_i) (22)

In Equation (22), I_t represents the output of the input gate, σ represents the sigmoid function, W_i represents the weight values stored in the memory of the ANN, h_{t−1} is the output value of the previous cell, x_t represents the input values, and b_i denotes the bias weight values of the ANN.
In Equation (23), c t represents the output of ANN with the normalized tanh function that outputs the value between −1 and +1, W c represents the weight values stored in the memory of ANN, h t−1 is the output value of the previous cell, x t represents the input values, and b c denotes the bias weight values of the ANN.
Equation (24) shows the mathematical expression of the updated gate, where the memory is updated. The ANN learns the stored or forgotten information from the memory and then updates the newly added information from Equations (21)- (23). Equation (25) shows the mathematical expression of the output gate, where W O represents the weight values stored in the memory of ANN, h t−1 is the output value of the previous cell, x t represents the input values, and b O denotes the bias weight values of the ANN. The output value, h t , was calculated in Equation (26).
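One LSTM time step combining Equations (21)-(26) can be sketched as follows; the gate weights are random stand-ins for learned parameters, and the dimensions are arbitrary.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step; W and b hold the four gate parameter sets f, i, c, o."""
    z = np.concatenate([h_prev, x_t])           # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])          # forget gate, Eq. (21)
    i_t = sigmoid(W["i"] @ z + b["i"])          # input gate,  Eq. (22)
    c_hat = np.tanh(W["c"] @ z + b["c"])        # candidate memory, Eq. (23)
    c_t = f_t * c_prev + i_t * c_hat            # updated cell state, Eq. (24)
    o_t = sigmoid(W["o"] @ z + b["o"])          # output gate, Eq. (25)
    h_t = o_t * np.tanh(c_t)                    # hidden output, Eq. (26)
    return h_t, c_t

rng = np.random.default_rng(3)
d_in, d_h = 3, 4
W = {k: rng.normal(size=(d_h, d_h + d_in)) for k in "fico"}
b = {k: np.zeros(d_h) for k in "fico"}
h, c = np.zeros(d_h), np.zeros(d_h)
h, c = lstm_step(rng.normal(size=d_in), h, c, W, b)
print(h.shape)  # (4,)
```

Because h_t is an output gate times a tanh of the cell state, every component of the hidden output stays in (−1, 1).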
The uniform LU/LC classes were generated through the Vision Transformer classification model, and the features of the LST map were extracted for the years 2012 and 2015. In this research work, we used the spatial features of the LST map and the LU/LC change classification map for evaluating the LU/LC prediction map, using the Bi-LSTM model. The idea of Bi-LSTM is to process the sequence data in both the forward and backward directions. The Bi-LSTM algorithm was used in our research for extracting the spatial and temporal features of the fifteen-year time-series data from 2012 to 2027 for the area of Javadi Hills. Figure 12 displays the working principle of the Bi-LSTM prediction model. The trained model was used to project the predicted maps for the years 2021 (t + 9) and 2024 (t + 12) successfully. The features (LC(j_m,n)) define the LU/LC classes with the LST temperature values for each time step at defined coordinates. The input set of combined features of the LU/LC and LST maps from Javadi Hills was split in the ratio of 8:2 for the training and validation of the model. The parameters were adjusted through a trial-and-error approach for acquiring good prediction accuracy. The tanh activation function was used for the Bi-LSTM layers, whereas the Softmax activation function was used for the last layer to calculate the probabilities between the LU/LC classes of Javadi Hills. Through repeated forward- and back-propagation processes, the parameters are adjusted until the cost function is minimized. The validation method is part of training the prediction model and adjusting the parameters; it uses a small portion of the data to validate and update the model parameters at each training epoch. This approach ensures that the prediction model is learning from the data correctly by minimizing the cost function during the training and validation process. The training data with the parameters that run the Bi-LSTM prediction model for our research work are presented in Section 5.
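The forward/backward processing and Softmax output layer described above can be sketched in NumPy; all weights, the shared bias, the sequence length, and the class count are illustrative stand-ins, not the paper's trained model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_pass(seq, W, b, d_h):
    """Run an LSTM over a sequence and return the hidden state at every time step."""
    h, c = np.zeros(d_h), np.zeros(d_h)
    outputs = []
    for x_t in seq:
        z = np.concatenate([h, x_t])
        f = sigmoid(W["f"] @ z + b["f"])
        i = sigmoid(W["i"] @ z + b["i"])
        c = f * c + i * np.tanh(W["c"] @ z + b["c"])
        o = sigmoid(W["o"] @ z + b["o"])
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(4)
d_in, d_h, n_classes = 5, 6, 4         # per-step features (e.g., LU/LC class + LST), hidden, classes
seq = rng.normal(size=(3, d_in))       # three time steps of combined LU/LC + LST features
Wf = {k: rng.normal(size=(d_h, d_h + d_in)) for k in "fico"}   # forward-direction weights
Wb = {k: rng.normal(size=(d_h, d_h + d_in)) for k in "fico"}   # backward-direction weights
b = {k: np.zeros(d_h) for k in "fico"}

fwd = lstm_pass(seq, Wf, b, d_h)              # forward pass over time
bwd = lstm_pass(seq[::-1], Wb, b, d_h)[::-1]  # backward pass, re-aligned to time order
merged = np.concatenate([fwd, bwd], axis=1)   # Bi-LSTM output at each time step

W_out = rng.normal(size=(n_classes, 2 * d_h))
probs = softmax(W_out @ merged[-1])           # Softmax over LU/LC classes at the last step
print(probs)  # class probabilities, summing to 1
```

Concatenating the two directions gives each time step context from both past and future observations, which is the core advantage of Bi-LSTM over a one-directional LSTM for this time-series prediction.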

Application-Based Explainable Artificial Intelligence and Its Importance
The XAI provides knowledge to humans about the outcomes achieved by machine- or deep-learning models. The XAI has been used for providing knowledge on the extracted time-series LU/LC information to the urban planners, forest department, and government officials. XAI improves the user's understanding and trust in the products or services. There are many ways of explaining the model through XAI, and the techniques of explaining the model differ for each application area around the world [58][59][60]. In our research work, we used application-based XAI, and it was observed to be the easiest and fastest way of obtaining knowledge with finite compute resources. The knowledge about the outcomes of the prediction model can be accessed through online applications. Technically, the application-based XAI can be understood by the end-users through third-party applications. In our prediction model, we used the Google Earth Engine (https://www.google.com/earth/ (accessed on 10 November 2021)) platform for explaining our results to urban planners, forest departments, and government officials. The LU/LC predicted results for the years 2018 and 2021 were tested against the Google Earth Engine time-series imagery. We achieved good testing accuracy for our prediction model. Through the XAI of the Google Earth Engine platform, the end-users can also access and check the LU/LC information. We have shown the model structure of XAI through the Google Earth Engine platform for our research work in Figure 15. The XAI on Google Earth will convey the LU/LC information to the government, forest department, and urban planners to take action in regard to protecting the LU/LC area.


Proposed LU/LC Prediction Using Vision Transformer-Based Bi-LSTM Model
This research work aimed to identify the LU/LC changes in the forest-covered (high-vegetation) and non-forest-covered (less-vegetation) regions of the proposed study area. The flow of LU/LC change analysis for our study area is shown in Figure 16. The proposed work is summarized in the following points:

•
The relationship between the spatial features of the LST map and the LU/LC classification map was used to provide good validation results during the prediction process.

•
The Bi-LSTM model was successfully applied to forecast the future LU/LC changes of Javadi Hills for the years 2018, 2021, 2024, and 2027.

•
The LU/LC changes that occurred in our study area will assist the urban planners and forest department to take proper actions in the protection of the environment through XAI.


Algorithm to Construct the Vision Transformer-Based Bi-LSTM Model for LU/LC Prediction
Our research builds a Vision Transformer-based Bi-LSTM model for LU/LC prediction of Javadi Hills, India. From the analysis and validation, we found that the impact of the TIRS-based LST map on the LU/LC classified map provides a good percentage of results with a lower misclassification rate. The detailed steps of our proposed model are presented in Algorithm 1. Each process in the proposed algorithm provides a different aspect of the LU/LC information of Javadi Hills. A brief explanation of the input data, training data, parameter settings, and accuracy assessment of our proposed model is given in Section 5.

Algorithm 1: Vision Transformer-based Bi-LSTM model for LU/LC prediction.
1 Input data (I_P):
2 Initialize the input data
3 Extract LISS-III multispectral images (M = I_1, I_2)
4 Extract Landsat bands (T = IR_1, IR_2)
5 Return input data (I_P)
6
7 Preprocessed data (PR_I):
8 Initialize the preprocessing for the input data I_P of M and T
9 For each initialized input image of M and T

10 Calculate the geometric coordinates of the study area G_I (georeferencing)
11 Reduce the atmospheric (haze) effects A_I of the georeferenced image
12 Correct the radiometric errors R_I for the haze-reduced image
13 End for
14 Return preprocessed data (PR_I)
15
16 LU/LC classification (LU_I):
17 Perform the Vision Transformer-based LU/LC classification by using the preprocessed image PR_I
18 For each input image of PR_I
19 Load the training data T_i and initialize the parameters
20 Split the image into patches of fixed size
21 Flatten the image patches
22 Perform the linear projection of the flattened patches
23 Include the positional embeddings
24 Feed the sequences as input to the transformer encoder
25 Fine-tune the multi-head self-attention block in the encoder
26 Concatenate the outputs of all attention heads and provide the MLP classifier for attaining the pixel-value representation of the feature map
27 Generate the LU/LC classification map
28 End for
29 Return LU/LC classification (LU_I)
30
31 Accuracy assessment (AA_I):
32 Perform the accuracy assessment for the feature-extraction-based LU/LC classification map LU_I
33 For each classified map of LU_I
34 Compare the labels of each classified datum LU_I with the Google Earth data
35 Build the confusion matrix
36 Calculate overall accuracy, precision, recall, and F1-score
37 Summarize the performance of the classified map LU_I
38 End for
39 Return accuracy assessment (AA_I)
40
41 Change detection (CD_I):
42 Perform the LU/LC change detection by using the time-series LU/LC change classification maps (LU_I)
43 For each classified map of LU_I
44 Calculate the percentage of change between the time-series classified maps of LU_I
45 End for
46 Return change detection (CD_I)
47
48 Extracting LST map (LST_I):
49 Initialize the I_P of T
50 For each preprocessed image of T
51 Calculate the Land Surface Temperature using the Landsat bands (TIRS, RED, and NIR)
52 Extract the spatial features
53 End for
54 Return LST (LST_I)
55
56 LU/LC prediction (LP_I):
…
65 Apply the tanh activation function for each Bi-LSTM layer
66 Decide the output layer by using the Softmax activation function
67 Update the parameters until the loss function is minimized
68 Obtain the output of the predicted time-series data
69 Validate the results
70 End for
71 Return LU/LC prediction map LP_I {PR_1, PR_2, . . .}
72 Analyze the growth patterns of the LU/LC prediction maps
73
74 Explain the predicted results to the urban planners, forest department, and government officials, using application-based XAI
End
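Steps 48 to 54 (LST extraction) follow the widely used single-channel workflow for Landsat thermal data: brightness temperature from the TIRS band, then an NDVI-based emissivity correction. The following is a hedged sketch, assuming the commonly published Landsat 8 band-10 constants; the paper does not list the exact values it used.

```python
import numpy as np

# Assumed constants: commonly published Landsat 8 band-10 values.
K1, K2 = 774.8853, 1321.0789   # thermal conversion constants
LAMBDA = 10.895e-6             # band-10 effective wavelength (m)
RHO = 1.438e-2                 # h*c/sigma (m K)

def lst_from_bands(tirs_radiance, red, nir):
    """Single-channel LST sketch: brightness temperature from the TIRS
    band, emissivity from an NDVI-based vegetation proportion."""
    # Brightness temperature (Kelvin) from top-of-atmosphere radiance
    bt = K2 / np.log(K1 / tirs_radiance + 1.0)
    # NDVI and proportion of vegetation
    ndvi = (nir - red) / (nir + red)
    pv = ((ndvi - ndvi.min()) / (ndvi.max() - ndvi.min() + 1e-12)) ** 2
    # Emissivity, then emissivity-corrected surface temperature
    emis = 0.004 * pv + 0.986
    return bt / (1.0 + (LAMBDA * bt / RHO) * np.log(emis))
```

Low LST values then correspond to high-vegetation pixels and high values to less-vegetation pixels, which is the relationship the prediction model exploits.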

Results and Discussion
This research work presented the study of LU/LC prediction in Javadi Hills. The LISS-III multispectral and Landsat TIRS, RED, and NIR satellite images were used for predicting the vegetation in the forest- and non-forest-covered regions of Javadi Hills. All the research experiments were processed on an Intel Xeon processor.

Training Data and Parameter Settings
For appropriate mapping of the input features to the output features by a machine-learning or deep-learning model, the training data and parameters were tuned. Algorithm 1 shows the detailed procedure of our research on LU/LC prediction, and the training data and parameter settings are listed in Table 2. The output extracted at the end of the fully connected layer was used as the LU/LC classified map for further processing.
After the classification, each classified sample was tested against the referenced data of Google Earth images. The LU/LC classified image (LU_I) was tested through the referenced Google Earth image, and each reference datum was labeled according to the respective LU/LC classes of Javadi Hills. The LU/LC classes considered in our research work are the high- and less-vegetation regions of the forest- and non-forest-covered areas of Javadi Hills. For better understanding, the validation of the point shapefile with the Google Earth images is shown in Figure 18, and the class values associated with each coordinate of the trained image are shown in Table 3. The accuracy assessment was calculated for the Vision Transformer model, and the results are shown in Section 5.2. The percentage of LU/LC change detection was calculated for the LU/LC classified image, and the results are shown in Section 5.3. Based on the good accuracy, the LU/LC classification map was processed further to obtain the LU/LC prediction map.
The LST maps for the years 2012 and 2015 were calculated to extract the spatial features of Javadi Hills; the estimation of the LST map is explained in Section 3.3. The LST map shows the high- and low-temperature values of the earth's surface of Javadi Hills: high temperature values indicate less vegetation, and low temperature values indicate a high-vegetation area. The LST map (LST_I) and the LU/LC classification map (LU_I) were used as inputs for predicting the LU/LC map of Javadi Hills. We combined the time-series features of the LST and LU/LC maps, and the impact of LST on the LU/LC map provides good results during the prediction process. For a better understanding, the impact of a few LST and LU/LC features is shown in Figure 19, and the values are listed in Table 4. The impact of the LST map on the LU/LC map strengthens our proposed prediction model with good validation results.
The training parameters of the Bi-LSTM model are listed in Table 5. The combined features of the LU/LC and LST maps were used as the training features during the Bi-LSTM training. Each pixel value was identified manually through the latitude and longitude coordinates of Javadi Hills from the combined features of the LU/LC and LST maps, and each pixel holds either high or less vegetation for its defined coordinates. A few combined values are shown in Table 4, and the combined features map is shown in Figure 20. The accuracy results for the prediction model are shown in Section 5.2. The results were also cross-verified with the time-series Google Earth Engine to acquire the validation accuracy of our model. With the impact of the LST map on the LU/LC map, good validation accuracy was obtained with a lower misclassification rate.
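The combined-feature construction described above (per-pixel LU/LC class value plus LST value for each time step, followed by the 8:2 training/validation split) can be sketched as follows; the function name and array layout are illustrative assumptions, not the authors' code.

```python
import numpy as np

def build_sequences(lulc_maps, lst_maps, split=0.8, seed=0):
    """Stack per-pixel LU/LC class values and LST values for each time
    step into (n_pixels, n_timesteps, 2) feature sequences, then split
    the pixels 8:2 into training and validation sets."""
    # lulc_maps, lst_maps: lists of equally sized 2-D arrays, one per year
    feats = np.stack(
        [np.stack([lu.ravel(), lst.ravel()], axis=-1)
         for lu, lst in zip(lulc_maps, lst_maps)], axis=1)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(feats.shape[0])    # shuffle pixels before splitting
    cut = int(split * len(idx))
    return feats[idx[:cut]], feats[idx[cut:]]  # training, validation
```

Each row of the resulting arrays is one pixel's time series, which is the per-pixel sequence format a Bi-LSTM consumes.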

Validation of Vision Transformer-Based Bi-LSTM Model
The Google Earth images and the LU/LC classified images were evaluated for the accuracy assessment. By using the time-series images of the Google Earth Engine, the accuracy assessment was calculated for the LU/LC classified image of Javadi Hills, and all the pixel values of the LU/LC classified image were validated against the Google Earth images. A total of 1008 random training samples were loaded, and the confusion matrix was obtained during the accuracy assessment. Table 6 presents the confusion matrix for the years 2012 and 2015. The accuracy assessment result is 0.9891 for the year 2012 and 0.9861 for 2015; Table 7 presents the LU/LC accuracy assessment for both years. The LU/LC prediction was then performed, and the results were analyzed and processed; the validation and testing accuracies are listed in Table 8. The validation accuracy refers to the results on the non-trained datasets of the model, whereas the testing accuracy refers to the results of the complete model. We used the LU/LC maps of 2012 and 2015, along with the predicted LU/LC maps of 2018 and 2021, as inputs for predicting the LU/LC maps for the years 2024 and 2027. The short-term prediction was performed until the year 2027 for our study area. As the Google Earth Engine provides time-series images only up to the current date, the validation and testing accuracy for the predicted LU/LC maps of 2024 and 2027 were not calculated. With good validation accuracy for all the LU/LC predicted maps of Javadi Hills, our prediction model provides a lower misclassification rate.
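The accuracy-assessment step (confusion matrix, overall accuracy, precision, recall, and F1-score, as in steps 31 to 39 of Algorithm 1) reduces to a few array operations. A minimal sketch, not the authors' code:

```python
import numpy as np

def assessment(y_true, y_pred, n_classes=2):
    """Confusion matrix with overall accuracy and per-class
    precision, recall, and F1-score."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                    # rows: reference, columns: predicted
    overall = np.trace(cm) / cm.sum()
    precision = np.diag(cm) / np.maximum(cm.sum(axis=0), 1)
    recall = np.diag(cm) / np.maximum(cm.sum(axis=1), 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return cm, overall, precision, recall, f1
```

In the paper's setting, y_true would come from the Google Earth reference labels and y_pred from the classified map, over the 1008 random samples mentioned above.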
The average classification and prediction accuracy for the time-series LU/LC data was calculated by using Equation (27):

A_avg = (1/T) * sum_{Y=1}^{n} A_Y, (27)

where A_Y represents the accuracy value of year Y in {1, . . . , n}, and T represents the total number of years. The reported performance of the model depends on these average classification and prediction results. The computational complexity defines the total time taken by the computer for running an algorithm. The computational complexity of the Vision Transformer model is O(nC), where n is the size of the input, and C is the number of classified LU/LC classes. The computational complexity of the Bi-LSTM prediction model is O(nkC + 1), where k is the size of the spatial maps (LST) associated with the input data n. Hence, the total computational time of our proposed algorithm, C_c, is the arithmetic sum of the classification and prediction terms, given in Equation (28):

C_c = O(nC) + O(nkC + 1). (28)
Although the proposed Vision Transformer-based Bi-LSTM prediction model shows significant performance, its training phase requires the determination of class values associated with spatial maps for each pixel in the n images, and this is computationally expensive.
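Equation (27) is simply the arithmetic mean of the per-year accuracies. For example, averaging the 2012 and 2015 classification accuracies quoted above (0.9891 and 0.9861) reproduces the 98.76% average classification accuracy reported for the Vision Transformer model:

```python
def average_accuracy(yearly):
    """Equation (27): A_avg = (1/T) * sum of A_Y over the T years."""
    return sum(yearly) / len(yearly)

# 2012 and 2015 Vision Transformer accuracies from the text
avg = average_accuracy([0.9891, 0.9861])   # 0.9876, i.e., 98.76%
```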

Growth Pattern of the LU/LC Area of Javadi Hills
The growth patterns of LU/LC change in the area of Javadi Hills were analyzed for the years 2012 to 2027, and the results are shown in Table 9. In 2012, the LU/LC multispectral classified map showed 1651.04 ha (hectares) of high vegetation and 736.85 ha of less vegetation. In 2015, the LU/LC multispectral classified map showed 1601.22 ha of high vegetation and 786.67 ha of less vegetation. In 2018, the LU/LC predicted map showed 1621.18 ha of high vegetation and 766.71 ha of less vegetation. In 2021, the LU/LC predicted map showed 1596.04 ha of high vegetation and 791.85 ha of less vegetation. In 2024, the LU/LC predicted map showed 1568.23 ha of high vegetation and 819.66 ha of less vegetation. In 2027, the LU/LC predicted map showed 1553.17 ha of high vegetation and 834.72 ha of less vegetation. It was observed that LU/LC changes occur at three-year intervals in the area of Javadi Hills. The results of the LU/LC changes that occurred between the years 2012 and 2027 are shown in Table 10, and the comparison chart of the LU/LC area statistics for the time-series data from 2012 to 2027 is shown in Figure 21.
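The three-year change statistics above reduce to simple area differences. A small sketch using the high-vegetation figures quoted from Table 9 (2012 versus 2027); the function name is illustrative:

```python
def area_change(prev_ha, curr_ha):
    """Absolute (ha) and percentage change between two time steps,
    as in the change-detection step of Algorithm 1."""
    delta = curr_ha - prev_ha
    return delta, 100.0 * delta / prev_ha

# High vegetation, 2012 -> 2027 (values quoted from Table 9)
delta, pct = area_change(1651.04, 1553.17)   # about -97.87 ha, about -5.93%
```

The gain in less vegetation over the same period (736.85 to 834.72 ha) mirrors this loss exactly, so the total classified area (2387.89 ha) is conserved across the time series.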


Comparative Analysis
In this research work, we have proposed the Vision Transformer-based Bi-LSTM prediction model for analyzing the past, present, and future changes of Javadi Hills, India. We also infer that the LU/LC prediction accuracy of our model provides a lower error rate, i.e., below 5%. From the thorough analysis, we infer that the use of the LST map has a high impact on the LU/LC environment, and it was considered an important spatial feature for the prediction of the LU/LC vegetation map.
We have compared our model with CNN, DWT, and standard LU/LC classification and prediction techniques for the area of Javadi Hills. Our model outperforms the other standard classification and prediction algorithms in terms of accuracy and computational efficiency. We have executed the standard LU/LC algorithms (DWT [22], CNN [27], SVM [1], MLC [2], and RFC [25]) and provided a comparative analysis with the Vision Transformer model for our study area of Javadi Hills in Table 11. We have also presented the comparative accuracy of the classification models in Figure 22 and the comparative analysis of our prediction model with the hybrid machine-learning models [7] for the area of Javadi Hills in Table 12.

Table 11. Comparative analysis of the proposed Vision Transformer model with other algorithms for the area of Javadi Hills, India.

Algorithm | Average Accuracy (%)
Ours (Vision Transformer) | 98.76
CNN [27] | 96.42
DWT [22] | 94.21
SVM [1] | 97.71
MLC [2] | 94.40
RFC [25] | 95.60

Our model outperforms the hybrid machine-learning models [7] and provides good prediction accuracy. We have validated the use of the LST map against other spatial maps, including slope, aspect, and distance from the road map [7], for our prediction model. From the thorough analysis, we infer that the use of the LST map has a high impact on the LU/LC environment, and it has been considered an important spatial feature for the prediction of the LU/LC vegetation map. A few comparisons of the validation results of the LU/LC prediction methods using LST, slope, aspect, and distance from the road map for the area of Javadi Hills are shown in Table 13.

Table 12. Comparative analysis of LU/LC prediction models for the area of Javadi Hills, India.

Study Area | Algorithm | Prediction Accuracy (%)
Javadi Hills, India | Vision Transformer-based Bi-LSTM model (ours) | 98.38
Javadi Hills, India | RFC-based MC-ANN-CA model [7] | 93.41

We also show a few comparative analyses of overall prediction models for different study areas in Table 14. We observed that there is a performance variation in the prediction results for each study area around the world. This variation of the LU/LC classification and prediction results is due to the selection of the study area, satellite data, environmental data, and LU/LC classes. A variation of results was also observed for our study area with the assessment of multi-satellite datasets through the proposed algorithm. We delivered a clear view of the importance of Vision Transformer-based LU/LC classification and Bi-LSTM-based prediction for forecasting the time-series LU/LC vegetation map. The advantage of our proposed work lies in using only the LST map as the spatial data for predicting the LU/LC vegetation map, and we achieved a good prediction accuracy of 98.38%. Our proposed algorithm can be applied to other study areas around the world for predicting the LU/LC vegetation map.
Moreover, our proposed model is useful to urban planners, forest departments, and government officials for analyzing the LU/LC information through XAI and taking the necessary actions in the protection of the LU/LC environment.

Conclusions
LU/LC prediction modeling is considered an important research area in remote sensing. In this research work, the multispectral LISS-III and Landsat satellite images of Javadi Hills for the periods 2012 and 2015 were downloaded and performed