Urban Overheating Assessment through Prediction of Surface Temperatures: A Case Study of Karachi, Pakistan

: Global climate has been radically affected by the urbanization process in recent years. Karachi, Pakistan’s economic hub, is also showing signs of swift urbanization. Owing to the construction of infrastructure projects under the China-Pakistan Economic Corridor (CPEC) and associated urbanization, Karachi’s climate has been signiﬁcantly affected. The associated replacement of natural surfaces by anthropogenic materials results in urban overheating and increased local temperatures leading to serious health issues and higher air pollution. Thus, these temperature changes and urban overheating effects must be addressed to minimize their impact on the city’s population. For analyzing the urban overheating of Karachi city, LST (land surface temperature) is assessed in the current study, where data of the past 20 years (2000–2020) is used. For this purpose, remote sensing data from the Advanced Spaceborne Thermal Emission and Reﬂection Radiometer Global Digital Elevation Model (ASTER GDEM) and Moderate-Resolution Imaging Spectroradiometer (MODIS) sensors were utilized. The long short-term memory (LSTM) model was utilized where the road density (RD), elevation, and enhanced vegetation index (EVI) are used as input parameters. Upon comparing estimated and measured LST, the values of mean absolute error (MAE), mean square error (MSE), and mean absolute percentage error (MAPE) are 0.27 K, 0.237, and 0.15% for January, and 0.29 K, 0.261, and 0.13% for May, respectively. The low MAE, MSE, and MAPE values show a higher correlation between the predicted and observed LST values. Moreover, results show that more than 90% of the pixel data falls in the least possible error range of − 1 K to +1 K. The MAE, MSE and MAPE values for Support Vector Regression (SVR) are 0.52 K, 0.453 and 0.18% and 0.76 K, 0.873, and 0.26%. The current model outperforms previous studies, shows a higher accuracy, and depicts greater reliability to predict the actual scenario. In the future, based on the accurate LST results from this model, city planners can propose mitigation strategies to reduce the harmful effects of urban overheating and associated Urban Heat Island effects (UHI).


Introduction
Human activities and excessive construction have altered the surface structure and materials of the Earth, thereby disturbing the natural environment. These changes disturb the Earth's surface energy balance and the atmospheric composition, disturbing the local climate. Rapid urbanization and associated human settlement lead to a distinguished local climate referred to as urban climate [1]. Urbanization, one of the major human activities, transform the natural landscape into an impervious and built-up area. The impervious area infrared (TIR) remote sensing is used, which relies on radiances from TIR sensors to record surface temperatures [26].
The main focus of the SUHI studies is analyzing the relationship between the intensity of urbanization and LST [19,27]. The variation in temperature (or climate) is recorded using the information of the vegetative cover [28]. Multiple studies report a negative correlation between LST and vegetation [8,29]. The areas with higher vegetation constantly produce cooler temperatures than non-vegetated areas. This decreases the daily mean temperature of residential areas and streets by 0.5 K and 0.9 K, respectively [30]. LST is the canopy temperature at maximum spectral vegetation or complete canopy cover [31]. Thus, proving the inverse correlation between the vegetation fraction and LST. This correlation is stronger than the relationship between the normalized difference vegetation index (NDVI) and LST [32]. The current study uses LST retrieved from Moderate-Resolution Imaging Spectroradiometer (MODIS) thermal infrared data to get information about vegetative cover in the study region. MODIS products contain two vegetation indices: the standard NDVI and the Enhanced Vegetation Index (EVI). NDVI is measured using the radiation of visible and near-infrared bands. EVI is measured similar to NDVI but with improved vegetation monitoring by de-coupling the canopy background signal, improved sensitivity to high biomass regions, and reduced atmospheric influences [33].
Road density (RD) is referred to as the density of a road per unit area and is among the key parameters that influence an area's LST. The relationship between LST and RD is direct, and therefore, it can be interpreted that an increase in the RD increases LST [34]. This behavior is due to the energy exchanges between resulting modified land surfaces due to urbanization. Accordingly, this energy interaction results in an LST rise due to its dependence on the surface heat fluxes [35]. Thus, it can be deduced that a rise in surface temperatures and a decline in green vegetation cover is associated with rising LST and RD owing to urbanization [34]. This concept is used in the current study.
Studies prove that LST cannot be predicted by only assessing the vegetative cover of the area. Instead, the surrounding environment also has to be taken into consideration [36]. During the analysis of substantially large territories for urban overheating where the ground level is inconsistent, elevation becomes one factor affecting LST. By analyzing the environmental lapse rate (ELR), it is observed that LST is considerably affected by variation in elevation. The ELR varies from 5-10 • C with every 1000-m change in elevation. These variations in LST are observed to be inversely related to elevation [34,37]. However, considering the lesser variation of elevation in urban cities, the effect of elevation can be ignored. This is particularly applicable to Karachi, which is mostly plain and inland with negligible changes of elevation.
Previous studies have estimated urban overheating and associated UHI based on air temperature and LST [38,39]. However, using air temperature to investigate these phenomena is challenging. In developing regions like Pakistan, there is a lack of meteorological stations and ground-based data for recording air temperatures [38]. Therefore, the researchers became more inclined towards urban overheating studies based on LST acquired from remote sensing (RS) data because of its spatial extent and temporal availability [40]. Moreover, previous research studies mostly used the correlation of mean LST with two key land use/land cover (LULC) types (green spaces (GS) and impervious surfaces (IS)) to investigate the urban overheating and UHI effect in various mountainous cities [41,42], coastal cities [43,44] and inland cities [44,45].
In addition, researchers have recently employed several other techniques to get a more comprehensive understanding of urban overheating and UHI distribution. For example, studies have used the urban-rural gradient analysis to investigate the temperature variations between rural and urban regions [41,44,46]. The drawback of this approach is that the UHI's directional deviation cannot be acquired through the gradient analysis. Due to this reason, various researchers have applied the multidirectional and multitemporal UHI profile analysis in the diagonal and orthogonal directions of their study areas [41,47]. Other studies have used the grid-based method to explore the LST in their study areas [43,48,49].
Recently, the studies analyzing urban overheating and UHI using LST prediction showed superior results when Support Vector Regression (SVR) and Artificial Neural Network (ANN) models were included [26,50].
Similarly, forecasting approaches based on deep learning have been employed in studies on time-series data [51,52]. Importantly, it is noted that in time-series analysis, recurrent neural networks (RNNs) show comparatively more adaptability and improved prediction results. However, RNN has some drawbacks. These include quick losses of longterm memory and exploding and vanishing gradients. Nevertheless, these problems have been addressed by long short-term memory (LSTM) models, which is a variant of RNN. LSTM addresses the problems of RNN effectively and represents the short and long-term dependencies more effectively. These LSTM models have been employed successfully in studies on time-series data in different disciplines, such as speech recognition, machine translation, and traffic flow forecasting [53]. However, the use of these models to study urban overheating and associated UHI effect and LST of areas has not been reported yet, thus presenting a gap targeted in the current study. This research aims to develop an LSTM architecture that uses temperatures of previous years and parameters like vegetation, RD, and surface elevation to predict the LST of Karachi, Pakistan. It should be noted that these parameters vary independently, and despite having some correlation, developing any model for assessing their interrelationship is challenging. LSTM model can be used for such integration that has been utilized in the current study. The resulting LST, predicted through LSTM, is compared with SVR and ANN models used in previous studies to check the accuracy of results.
Thus, this research contributes to the existing literature by providing a novel and efficient LSTM-based approach for LST forecasting. The proposed model helps in prediction when acquiring data using remote sensing observations is not possible due to the cloud cover or other hindrances. Furthermore, the model proposed in the current study can be used during the planning process of a town or a city so that the urban overheating and associated UHI effects are scaled-down, thus, reducing the associated energy demand.
Similarly, this study is critical to the China-Pakistan Economic Corridor (CPEC) project, where Karachi is the focus region. CPEC has been a gamechanger for policymakers alongside the general masses of both China and Pakistan. It offers advantages in terms of economic development and the development of the lifestyle for the individual inhabitants of the region. Despite the benefits mentioned above and the eminence of this project, it has some repercussions. The increase in the infrastructure and concrete cover in the region due to projects under CPEC and rapid urbanization has increased the average temperature of Karachi. This has resulted in the urban overheating of Karachi. This has been causing some serious health issues, including heatstroke, respiratory difficulties, and heat exhaustion coupled with higher air pollution and energy costs (for cooling, etc.). Considering that Karachi is a major industrial and metropolitan city of Pakistan, it has become imperative that the ramifications of CPEC projects are investigated in Karachi. This has been targeted in the current study. Accordingly, mitigation and adaptation strategies must be developed for such urban overheating and other associated UHI effects in advance to minimize the effects on the local population's quality of life. Accordingly, the current study humbly contributes to assessing urban overheating and UHI effects and is among the first studies targeting this useful study area.
The rest of the paper is organized as follows: Section 2 presents the study area and its characteristics. Section 3 discusses the three-step method adopted in the current study. In the first step of this section, data is acquired from MODIS and Advanced Spaceborne Thermal Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM) sensors. In the second step, an LSTM model is developed that is used to forecast the urban overheating for Karachi in the third step. Section 4 presents the results and discussions of the study where the LST, EVI, RD, and elevation results obtained from the developed model are presented and discussed. Finally, Section 5 concludes the study, presents the limitations, key takeaways, and the direction for futuristic expansion based on the current study.

Study Area
The study area chosen for this paper is Karachi, Pakistan (24 • 51 35 N-67 • 0 36 E), as shown in Figure 1. Karachi, previously known as Khurachee Scinde, is located in the south of Pakistan along the Arabian sea. It is the hub of land and sea transportation in Pakistan and has two large seaports: Port of Karachi and Port Bin Qassim. The city also contains Pakistan's busiest airport, the Jinnah International Airport. This area has two small ranges, Khasa and Multi hills. These hills are separated from each other with a wide coastal plain containing dry riverbeds and water channels. It is the capital of the province Sindh, with a population of 23.5 million living in the urban area [54]. Despite being a coastal area and having cold sea breezes, the city is comparatively hot in a large portion of a year: March to August. The winter plays its part from December to February. Karachi experiences the highest to lowest temperatures between 285.928 K to 307.594 K, with an average precipitation of 170 mm annually [55]. In the current study, the boundaries set by the urban planning authorities of Karachi and those defined by the government of Pakistan and Sindh development authorities have been used to define the case study area. The urbanization in Karachi is rapidly expanding. Further, infrastructure construction and built-up areas are increasing, resulting in the decline of its natural environment. Being an economic hub, the city has a huge population density that is a key factor for this urban expansion. To accommodate the population and provide them with better resources, the agricultural land in Karachi is regularly replaced by urban areas over time. This urban development and construction are having detrimental effects on Karachi's environment and its local climate [56]. Over the last decade, this urbanization has increased the local temperature in Karachi due to urban overheating, resulting in serious health concerns for the city's inhabitants [57]. An extreme event was observed in June 2015 where a heatstroke claimed more than 1200 lives in 10 days [58]. Moreover, the severe temperatures of up to 45 • C due to urban overheating, zero wind activity, and high humidity coupled with 10-18-h power outages have also caused havoc and panic in Karachi's residents.
Considering that Karachi is a major industrial and metropolitan city of Pakistan, it is important to prevent such instances from occurring in the future by building city resilience or reducing the causative factors for such catastrophic events or disasters [59,60]. For this reason, the urban overheating effect, along with its associated factors such as the UHI, must be studied and catered for in Karachi. Thus, it is chosen as the case study area in the current study. Also, Karachi is an important part of the CPEC one belt one road program that will increase the road infrastructure in the region. This presents a significant threat to the city's environment, where the built-up area will be further increased. This will ultimately raise the LST and add to the urban overheating of Karachi. This provides more impetus to study urban overheating in Karachi and cater to its causation factors to build its resilience.
Various studies have investigated the heat waves and associated heath strokes in Karachi over the last decade. These include vulnerability assessment of urban expansion and associated heatwaves [3], the socioeconomic impacts of heatwaves [61], health impacts of heatwaves and heat accumulation [62], and generic management of heat waves [62]. However, no study has investigated the urban overheating and associated UHI effects in the city that is bound to grow and cause more detrimental effects due to CPEC and other development projects if not catered for appropriately. Therefore, the current study targets this aspect of Karachi, Pakistan, and uses it as a case study to assess the effects of urban overheating on the city and proposes LSTM to assess the LST of the city. Accordingly, mitigation measures can be planned and implemented to deal with such urban overheating and associated UHI effects in Karachi and similar regions.

Methodology
The methodology adopted in the current study consists of three steps: data acquisition, LSTM network development, and LSTM forecasting model development. In step 1, data from MODIS and ASTER are acquired and used for pertinent analyses. Based on this data, in step 2, an LSTM network is developed where back-propagation through a time algorithm is conducted. Further, the weights are updated using a gradient-based optimization algorithm. Finally, in step 3, an LSTM forecasting model is developed based on the LSTM network developed in the previous step. In this model, data processing modules, network prediction, model evaluation, and network training are incorporated. The data acquisition, model development, and forecasting based on the developed model are explained subsequently.

Data Acquisition
The first step of the method adopted in this study deals with data acquisition. The data used in this research is taken from MODIS's derived thermal indices based on ASTER sensors. Eight-day 1 km MYD11A2, LST, and emissivity product of the MODIS (resolution 926.626 m) are used in the current study. Further, the sixteen days vegetation indices MYD13A2 product of MODIS (resolution 926.626 m) of overlapping dates with MYD11A2 have also been used for this study, as shown in Table 1. Data of twenty years (2000 to 2020) with two fixed intervals of January (009-016 days) and May (137-144 days) have been used. A security flag available with MYD11A2 has been used to select the best quality pixels for the analysis with the allowable error of 2K. It helped filter out any unreliable or poor-quality data that may be acquired from the sensors.  Table 1 provides the details of remote sensing data acquired for this study. Accordingly, two sensors have been used for this study, MODIS, and ASTER, to get data from satellite platforms of Aqua and Terra. Aqua and Terra are two of NASA's scientific research satellites orbiting around the Earth. The orbits of these satellites are timed such that Aqua passes over the equator from south to north in the afternoon. Similarly, Terra passes from north to south direction in the morning time over the equator. The MODIS sensor is an important instrument on board with the platforms/satellites of Aqua and Terra. It has been designed to observe the land surface, ocean, and Earth's atmosphere. The use of MODIS and ASTER sensors on the Aqua and Terra satellites results in a significant improvement in the spatial resolution of the acquired data. Spatial resolution is highly important as it contributes to how sharply the objects can be seen in an image. It represents the size of the tiniest feature captured by a satellite sensor or portrayed in a satellite photo. It is commonly expressed as a single number representing the length of one of the sides of a square (grid).
The elevation pixels of the study area have been acquired from the ASTER GDEM map. ASTER GDEM standard data products are produced with 30-m postings and have Z accuracy between 10 m and 25 m root mean square error (RMSE). To generate scene-based GDEMs, automated processing, including stereo-correlation of the ASTER GDEM scenes, is employed. To produce the final pixel value, residual bad values and outliers are eliminated, and selected data are averaged. The residual anomalies are then corrected, and the data is partitioned or split into 1 × 1-degree tiles.

LSTM Network Development
In step two of the method for the current study, an LSTM network is developed. LSTM is an improved form of the standard RNN with three important threshold functions under its hidden layer of cell structure, as shown in Figure 2 [15].
In Figure 2, C t−1 , H t−1 , X t , are the three inputs to the LSTM cell. X t , represents the input of the current time step. H t−1 represents the output of the preceding LSTM unit. C t−1 is the "memory" of the previous unit and is the most important input. h t is the unit output while C t is the memory of the current network. Thus, this single unit decides on a new output based on previous memory, previous output, and current input. Afterward the memory is modified and stored. Other symbols mentioned in Figure 2 include: X = Element/point-wise multiplication + = Element/point-wise addition Tanh = Hyperbolic tangent σ = Sigmoid The tan h activation function is utilized for the regulation of the values flowing through the network. It compresses the values to fall within −1 and 1. The gates use sigmoid (σ) activations. The σ and tan h activations are quite similar. However, unlike tan h activation, the σ activation compresses the values to fall within 0 and 1. This is useful in updating or forgetting data because every value multiplied by 0 yields a null value. This eventually results in such values being forgotten. Similarly, every number that gets multiplied by 1 yields the same value again. Thus, the value remains unchanged or is "kept as it is." In this way, the network filters out the important data by forgetting or keeping the data.
The LSTM's capability of learning long-term dependencies makes it reliable for assessments than other methods. Accordingly, it has been used in multiple studies related to remote sensing and wider urban research. Some examples include, object-based change detection [63], water depth assessment [64], crop estimation [65], building extraction from images [66], disaster management [67] and others. LSTM threshold functions, known as gates, control information storage and data interaction in forward propagation. These gates comprise the forget gates, denoted by f t in Figure 2, whose function is to keep track of the data to be kept or discarded. The values of the forget gates are calculated through Equation (1). The other gates include the input gate, denoted by i t , which controls the data entering a cell. In other words, it decides which data should be processed or discarded. This is computed through Equation (2). Finally, LSTM network consists of the output gate, denoted by o t , which controls and decides the best alternative among many possible outcomes to be used for further analyses. This is calculated through Equation (3). Here, Equation (4) express the updated current time (C t ) equation for the cell state information.
where c is the state and tan h is the activation function. The combined form of forgetting and input gate for the current time point of the cell state is shown in Equation (5). Equation (6) shows the output, i.e., the information that is returned to the hidden layer.
The LSTM models are trained by back-propagation through a time algorithm [19]. Firstly, the output is calculated using Equations (1)-(6) for forward propagation. In the next step, the error term for each cell is calculated. These include error terms for longitudinal propagation through layers and temporal transverse propagation. Lastly, based on each error term, the gradient of each weight is calculated. These weights are updated using a gradient-based optimization algorithm to give the final model.

LSTM Forecasting Model
An LSTM model was designed based on RNN and univariate time series data with limited samples for the current study. Figure 3 shows the LSTM model with a basic structure consisting of data processing, network prediction, model evaluation, and network training as the main modules. Using this model, the time series can be predicated as follow. First, previous data is processed and divided into test and training sets. Then, an LSTM model is designed and trained on the training sets to make suitable predictions based on the test data set. Finally, the model's accuracy is assessed and compared with existing models to highlight any potential improvements. In the data processing stage, the examined time-series data can be converted into controlled learning problems by generating overdue input observation and forecast labels. For a given lag step k, the forecasting value at time t is computed by the input vector (y t−k , y t−k+1 , . . . , y t−1 ). After that, the loss function of mean square error (MSE) is used to train the model on the LSTM network and conduct the forward calculation. Finally, the ultimate forecasting model, as shown in Equation (7) is used.
y t = f (y t−k , y t−k+1 , . . . , y t−1 ) (7) where k represents the number of neurons in the input layer of LSTM structure, f is the forward calculation, t is a period, and y t is the forecasted value. Forecasted values are calculated through iterations. Accordingly, an initial forecast on the test data set is taken to create a new input vector and compute the next forecasted value. This is continued until all forecasted values are obtained, then these values are compared with the test data set to check the accuracy of the forecasting model.

Results and Discussion
The following section encompasses the research results regarding EVI, RD, and elevation's effect on LST (which eventually causes urban overheating and associated UHI effects). The results are verified using the year 2020 s data. Mean absolute error (MAE), MSE, and mean absolute percentage error (MAPE) are used to investigate the accuracy of the developed model. Moreover, the proposed LSTM model has been compared with the previously used SVR and ANN models to highlight the increase in the accuracy and efficiency of prediction. This is followed by model evaluation, where the number of pixels having errors within allowable limits is calculated.

Land Surface Temperature (LST)
The urban overheating and associated SUHI effect for the summer and winter seasons of the study area has been analyzed and carried out using eight-day night-time LST data of the years 2000 to 2020. As a sample, LST images of Karachi for data of the year 2020 are shown in Figures 4 and 5. In Figures 4 and 5, the data from January represents the winter season, and data from May represents summer. The x and y-axis in both Figures 4 and 5 depict the latitude and longitude of the map, while the kilometers bar has been added as the reference scale for the map. Both Figures 4 and 5 depict that the center part of the city, the Central Business District (CBD), has more built spaces and concrete structures compared to the previous years. This has resulted in a comparatively higher LST as the built-up regions have increased compared to the previous years. Interestingly, the non-built-up region (outside the urban blue boundary) has higher LST pixels than the build-up area (inside the urban blue boundary). This is because these rural/non-built-up areas of Karachi comprise dry mountains and barren sandy land, resulting in higher LST.  Regardless of this, the main takeaway from these figures is that the relative increase in the concrete cover in the central region of the city is resulting in the rise of LST. If this trend carries on, this rising LST value of the central region will soon surpass the LST value of the surrounding rural region. The figures also show the actual and predicted values of the region. For the central part of the city, the predicted and actual temperatures are approximately 293.76 K and 292.92 K respectively in winter, i.e., January, and 302.6 K and 303.06 K temperatures in summer, i.e., in May. This is in line with previous studies [68][69][70]. The authors of pertinent studies have reported that LST will increase with more constructed areas due to high built-up areas, low vegetation, and high use of anthropogenic materials [71]. In the current study area, due to the progress of CPEC and its associated initiatives, the urban and constructed area of Karachi has significantly increased. These have contributed to the rise in LST values and the corresponding urban overheating and UHI effects in Karachi. This has the potential for a further increase due to more construction and urbanization in the future.
The LST pattern of Karachi has not changed for two periods since 2019 (January and May). However, there is an increasing pattern for mean LST values of January and May from 2000 to 2020. Moreover, urban overheating and UHI intensities, which are the most quantitative and simplest indicators for the temperature modifications due to urbanization [72], have been defined by the variation between the urban and rural regions' respective temperatures. In our study area, the average urban overheating intensity for May and January is 0.1435 and 0.17615, respectively. The overall mean urban overheating intensity of the study area for 20 years of the two months (January and May) is 0.1598. Several parameters like EVI, RD, and elevation affect the LST. Accordingly, these are used as input parameters to the LSTM model developed in the current study to predict LST values. The results are subsequently presented and discussed.

Enhanced Vegetation Index (EVI)
The EVI analysis of Karachi is shown in Figure 6 Figure 6 shows abundant intermediate vegetation compared to a shallow blue area (urban and constructed land). The area inside the urban boundary is brown, representing lower EVI values due to wider built-up areas with an impermeable surface containing little vegetative cover. Previous studies have also shown that an increase in EVI value means an increase in the vegetation cover and vice versa [73]. In vegetation index images, vegetation index values of rural areas correspond to the combination of harvested agricultural land and the vegetated ground. It consists of lands with residual plant parts and bare soil. The vegetation index values are low in summer due to the low water supply and high temperature, causing a decrease in the vegetation of Karachi. This is in line with Mathew et al. [10] and Wang et al. [74], who reported that vegetation index value is lowest in the summer season due to lower water supplies and higher evaporations. The vegetation index can be increased by planting more trees, pocket parks, and changing human behaviors [75,76]. The associated urban overheating and UHI effects can be mitigated by these strategies coupled with urban ventilation in the city [77].

Road Density (RD)
The road network of Karachi is more than 9500 km in length, as shown in Figure 7. The high and low values are mentioned to define the range of the RD values. From Figure 7, it can be seen that the road network is very extensive in the urban areas of Karachi. It has a much higher road length and density compared to the rural areas (area outside the urban boundary). The high and low values of RD of Karachi ranges from 0 to 62.45 km/sq.km, having an average RD of 36 km/sq.km.
Despite its many advantages to the city and country, the CPEC projects have significantly increased the city's road infrastructure, which has a prominent effect on the surrounding environment of the area under study (Karachi). This has led to increased urban overheating, UHI effects, and elevated temperatures in the city. As discussed by Hart and Sailor [78] and Ge et al. [79], through Normalized Difference Built-up Index (NDBI), roads and constructed environments act as crucial factors in determining the urban overheating effects. These may lead to higher temperatures if not catered for. This can leave the area uninhabitable for the residents. Accordingly, strategies like more tree plantation, urban forest landscapes, structural-taxonomic attributes, and using cooler materials in road construction have been suggested to mitigate the harmful effects of urban overheating and the associated UHI effect [75,80].

Elevation
Correlation analysis was conducted to investigate the impacts of elevation changes on LST in the study area. The results show that elevation and LST have a significant correlation value of 0.681 for this region. Thus, the effect of elevation must be taken into consideration in similar studies. The pixels of the elevation model of the study area are shown in Figure 8a. These have been produced with ASTER GDEM data at 30 m resolution. Figure 8 shows the study area elevation to be in the range of 400 m and −19 m. It must be noted here that the LST image has a resolution of 90 m. To compare any two two data sets, it is necessary to use uniform spatial resolution for all images. Thus, the ASTER GDEM image has been resampled to compare LST with elevation, as shown in Figure 8b. Due to the averaging of neighborhood pixels, the highest and lowest elevations of the study area changed from 400 to 326 m and from −19 to −3 m, respectively. The GDEM and LST images are captured to represent the same area on the ground.
A similar technique was applied in previous studies using a resampling tool to get an ASTER image by Khandelwal et al. [81]. The authors observed a change in elevation in the resampled images due to averaging of neighboring pixels. As Mathew et al. [10] discussed, elevation enhances urban overheating; thus, it should be considered when conducting urban overheating and UHI effect studies. Table 2 compares the actual LST values of the study area and the predicted values from the developed LSTM model. A very close resemblance in minimum, maximum, and standard deviation values can be observed. Furthermore, the mean values and the standard deviations are closely aligned, showing higher accuracy of the developed model. This is complemented by the results from Figures 4 and 5, which show similarities in actual and predicted LST values. Accordingly, it can be inferred that the LSTM model performed well for the whole set of periods in the study area. Thus, the LSTM model can predict LST values of different pixels with high accuracy. Such models can be used to analyze the urban overheating and associated UHI effects. Mitigations plans can be developed accordingly to reduce the harmful effects and make the cities more livable and resilient.  LST values (observed and predicted) have been plotted using 200 randomly selected pixels from the study area in this study. This is done to assess the temporal variation of the model's observed and predicted LST values. Further, the model testing using randomly selected values from the study area assures that the model develops no harmony or affinity towards the selected study area. Thus, it can accurately predict LST in various global cases. The graphical plotting of the values in Figure 9 for January and May, shows a clearer picture regarding the model's accuracy. It plots the LST values against the pixel numbers for both predicted and actual values of the developed model. As a result, the model shows a consensus or closer proximity for the measurements of the observed and predicted values. This shows that the model predicts higher as well as lower LST values with greater accuracy.

Model Validation
The observed LST values have been compared with the predicted LST values, and the MAE is calculated using Equation (8). The principles of statistics used to examine and gauge the performance of this proposed model are MAE, MAPE, and MSE. For absolute error calculation in the model, the difference of both observed and predicted LST values have been taken for each pixel. In Equation (8), Y represents the observed values, and Q refers to the predicted values. MAE is calculated by taking the mean of absolute errors using Equations (9) and (10).
Here E A1 + E A2 + E A3 + . . . + E AN are the absolute errors at each selected pixel, and N refers to the number of pixels. Similarly, the PE, which refers to the percentage error, is calculated using the ratio of absolute error (E A ) to respective observed LST values (Y). The mean of the absolute percentage errors (MAPE) is given by Equation (11).
Minitab ® was used to calculate the regression coefficient using the fit regression model. The MAE, MSE, MAPE for the periods of 2020 are shown in Table 3. From Table 3, it can be observed that the MAE values of both months are low (0.27 K and 0.29 K). The MAE values represent a consensus between the predicted and observed LST values of the proposed model. Similarly, MSE values for both months are also lower, i.e., 0.237 and 0.261. Furthermore, the largest MAPE value from the proposed model is 0.15%, representing very little variation among the two streams.  [26]. In comparison, the proposed model has comparatively fewer values of all these parameters. Thus, the current model outperforms these relevant studies and shows a higher accuracy.
The performance of the developed model for January and May is presented in Table 4. In Table 4, the pixels are categorized based on the error in the prediction of LST values using the proposed model. The positive error depicts that the observed value is greater than the predicted LST value. On the other hand, the negative error indicates a higher predicted value as compared to the observed value. The data in Table 4 shows that most of the pixel data from the study area have errors falling within the 1 K range (−1 K to +1 K), which is the least possible error. The percentage of pixels having errors within 1 K is 75% for January and 55% for May. Most of the pixels having lower error mean less difference between the actual and predicted values. Thus, it is inferred that the proposed model can accurately predict the values.  Figure 10 represents the error in the prediction of the spatial data for the study period. Distinct colors are used to depict the different ranges of the error between actual and predicted values. For example, the red color shows the values or areas where the error was more than 1 K (in the range of 1 K-2 K). From Figure 10, it is observed that the prediction with a 1 K error is scattered over the study area. Therefore, very few pixels fall outside the 1 K value for error between the observed and predicted LST values. The few pixels with comparatively greater error predominantly fall outside the urban parameters (i.e., rural areas). Thus, in the urban region, the observed and predicted values show close similarity, hence depicting higher reliability of the proposed model.
The reason for the greater error in rural areas is that different seasons and variations in rainfall intensity cause a significant annual variation to EVI of the rural areas compared to urban areas [10]. It was noted that the EVI value increased in the warm season and decreased in cold seasons [26], which provides proof that seasonal variation has a significant effect on EVI and the associated urban overheating effects. Another factor for higher error in the rural areas can be its RD. In this study, one RD data is used. There is also a possibility of change in the ground features of the rural areas due to the development process, such as the road construction due to CPEC projects causing the RD to change. On the other hand, such significant changes are not observed in the urban area as the major constructional features are already built. Thus, the change in RD in the urban area is only due to the widening of existing roads. Furthermore, a higher density of errors in pixel of urban land beyond 1 K is observed during May compared to January. This is because May falls in the summer in Karachi. The temperatures are higher in the summer season, bringing in the heat waves and increasing the associated urban overheating and UHI effects, air pollution, anthropogenic heat, etc. [82]. Despite these factors, major LST data predicted is within the 1 K error range, which is reliable for predicting the LST and addressing the detrimental effects of urban overheating and associated UHI in Karachi.
Overall, Table 4, Figures 9 and 10 are interrelated as these compare the actual and predicted data. The difference is that Figure 9 represents variations (i.e., error) at each pixel. Table 4 distributes the pixels (percentage-wise) in various classes based on error ranges. While Figure 10 is based on the spatial extent and shows the prediction error of various points in the study area map.
From the above-stated observations and analysis, it can be concluded that the proposed model is suitable for the prediction of LST. Based on this model, urban overheating and associated UHI effects can be analyzed and predicted in advance for Karachi and similar contexts to minimize the detrimental effects and make the cities more resilient and livable.

Conclusions
In this paper, LST data of the past 20 years, along with elevation, RD, and EVI, are used as input parameters for LST prediction to assess urban overheating. An LSTM model is developed to predict the LST for the study area, i.e., Karachi, Pakistan. The results show a higher correlation between the actual and predicted data sets of the developed model. The MAE, MSE, and MAPE values for January are calculated as 0.27 K, 0.237, and 0.15% and 0.29 K, 0.261, and 0.13%, respectively for May. The low MAE, MSE, and MAPE values depict that there is very little variation among the two streams of data set (i.e., predicted and observed values). The proposed LSTM model is also compared with the results of the previously published ANN and SVR models. The MAE, MSE and MAPE values for ANN model are 0.76 K, 0.873, and 0.26% while for SVR are 0.52 K, 0.453 and 0.18%. These values are comparatively higher than the value from the proposed model. This shows the superior performance of the proposed model. Hence, it is more reliable compared to the previously developed models.
The developed model has the least possible prediction error (1 K), and most of the pixels of the model have errors in the range of −1 K to +1 K. This error is within allowable limits and shows that the pixels are of supreme quality. The MODIS LST quality criteria also declare the pixels with errors below or equal to ±2 K as good quality for studies related to urban overheating and associated UHI effects. In this paper, more than 90% of the pixels lie within the allowed limit of ±2 K. Thus, it is concluded that the proposed LSTM can be utilized for predicting LST of various areas to assess urban overheating and UHI effects. Thus, it can be used in the planning phase of city development projects for areas with greater urban overheating and UHI issues.
The current study is limited to a single city (Karachi) in a developing country. Furthermore, the results have not been validated in the field. However, based on several theoretical validations, it is expected that the proposed LSTM model will give reliable results and identical results to those recorded for the site temperature in the field.
In the future, based on the accurate LST results from this model, city planners can propose mitigation strategies to reduce the urban overheating and associated UHI effect and control different anthropogenic heat sources. In this context, mitigation strategies such as tree plantation, pocket parks, urban ventilation, and others are proposed. Similar studies can be conducted in a developed country, and the results can be compared with the current study to highlight the causes of urban overheating and associated UHI effects in different contexts. This way, a holistic prediction model can be developed to assess the LST and associated urban overheating effect for both developing and developed countries. Further, the study only considers elevation, RD, EVI, and LST. In the future, other parameters may be assessed and integrated into a holistic LSTM model to enhance its accuracy and prediction capabilities.