Downscaling of ASTER Thermal Images Based on Geographically Weighted Regression Kriging

: The lower spatial resolution of thermal infrared (TIR) satellite images and derived land surface temperature (LST) is one of the biggest challenges in mapping temperature at a detailed map scale. An extensive range of scientiﬁc and environmental applications depend on the availability of ﬁne spatial resolution temperature data. All satellite-based sensor systems that are equipped with a TIR detector depict a spatial resolution that is coarser than most of the multispectral bands of the same system. Certain studies may therefore be not feasible if applied in areas that depict a high spatial variation in temperature at small spatial scales, such as urban centers and ﬂooded pristine areas. To solve this problem, this study applied an image downscaling method to enhance the spatial resolution of LST data by combining TIR, multispectral images, and derived data, such as Normalized Difference Vegetation Index (NDVI), according to the geographically weighted regression (GWRK) and area-to-point kriging of regressed residuals. The resulting LST images of the natural and anthropogenic urban areas of the Brazilian Pantanal are very highly correlated to the reference LST images. The approach, combining ASTER TIR with ASTER visible/infrared (VNIR) and Sentinel-2 images according to the GWRK method, performed better than all of the remaining state-of-the-art downscaling methods.


Introduction
The precise estimation of the spatiotemporal variation of the land surface temperature (LST) is a key aspect employed to describe the processes of surface energy balance [1,2], soil surface moisture, and evapotranspiration [3][4][5], as well as to understand trends in future climatic change on various spatial scales [6,7]. High spatial and temporal resolution LST data have been extensively used in previous decades in environmental studies, mostly to estimate surface temperatures in urban areas, to evaluate urban heat islands (UHI), and to investigate the temporal trends of temperature in disturbed ecosystems [8][9][10][11]. Additionally, LST data have been used to evaluate the soil surface moisture and evapotranspiration [3][4][5]. Therefore, estimating the LST and assessing its spatiotemporal variation are useful methods to understand various environmental and ecological processes.
The current thermal infrared (TIR) orbital sensors suitable for studying urban and natural regions that are characterized by high spatial variability at small spatial scales, such as Landsat-8 TIRS and ASTER TIR, depict a much coarser spatial resolution than is required to perform detailed studies. However, both ASTER and Landsat-8 missions provide LST data with the highest spatial resolution among all of the optical satellite missions and depict spatial resolutions of 90 and 100 m, respectively. Regardless, the current satellite TIR images exhibit low spatial and temporal resolutions, which limit Wang et al. [26] presented an adaptation of the ATPKR method (AATPRK), by suggesting the use of a non-stationary regression model between the dependent (TIR) and the auxiliary data (panchromatic band). According to the authors, the spatial structure of land-cover/land-use (LCLU) sometimes demands a non-stationary model due to the spatial variation in the LCLU along the image [26]. It can therefore be inferred that the relationship between the coarse band and the PAN band may not be sufficiently explained by a global regression model. However, the construction of the non-stationary model for AATPRK was based on a single independent variable (PAN band). In the case of ASTER data, which provides three high spatial resolution spectral bands in the VNIR range, AATPRK downscaling assuming non-collinearity between the VNIR bands can be achieved by applying multiple AATPRKs, having bands 1, 2, and 3, as well as band-derived products (e.g., NDVI).
Alternatively, using multiple independent variables based on geographically weighted regressions may help solve a major problem related to the non-stationary regression between MS (Multispectral) and PAN bands that occurs when working with just one independent band (PAN). According to Wang et al. [26], there is significant variance in the LCLU along the image. However, the correlation between the images is always limited to the coarse (MS) and the fine resolution (PAN) bands. Although limited to the correlation between one MS and the PAN bands, non-stationary regression enables the weights of the regression to be changed spatially. This problem can be solved by adopting a non-stationary geographically weighted regression (GWR) approach, in which both the weights and the dependent variables (PAN images) change along the image [30]. Wang et al. [26] employed an adaptive ATPRK method based on a single predictor (PAN band). Nonetheless, the authors concluded that other additional data could be incorporated into both ATPRK and AATPRK to enhance the utility of the approach. The relationship between the multiple covariates and observed coarse data can be built using multiple regression, which can be achieved by extending the linear GWR model. The AATPRK method employed by Wang et al. [26] was operated using adaptive regression windows at fixed sizes, with local windows sizes varying from 3 × 3 to 11 × 11 (pixels), and the best results were achieved using a 5 × 5 pixels window size. The method is similar to the ATPRK presented in Wang et al. [28]; however, it takes into account the analysis of the regressions within the local moving windows of specific W × W sizes, allowing for the generation of non-stationary regression models. The GWR method is analogous to the adaptive regression method adopted by Wang et al. [26]. However, in GWR the weights are computed from a weighting scheme (kernel) as a function of the location of the dependent variable in relation to the independent variable [31]. Thus, according to GWR, a number of kernels are possible and the more common schemes comprise exponential or Gaussian shapes [32][33][34]. The downscaling of thermal image data based on GWR, combined with ATPRK of residuals, could offer a suitable alternative for the precise refinement of LST data.
Given the eminent need for high spatial resolution LST data to mitigate environmental problems, the main goal of this research was to apply GWRK (geographically weighted regression), DCK, and ATPRK methods to generate 30-m spatial resolution, day and night TIR images of the Brazilian Pantanal. The results obtained by the geostatistical-based downscaling methods were compared with state-of-the-art pan-sharpening methods, namely, Principal Component Analysis (PCA), Gram Schmidt (GS), ATWT (À Trous Wavelet), and Multi-Directional and Multi-Resolution (MDMR). We expect that this research will contribute to the generation of high spatial resolution LST data with minimum spectral distortion, enriching the study of areas where the spatial variability of the temperature is characterized by high heterogeneity at small spatial scales.

ATPRK and GWRK Downscaling Methods
The ATPRK and GWR with kriging (here referred to as GRWK) methods adopted in this research are similar to those presented by Wang et al. [28] and Wang et al. [26]. However, in the present research we adapted the methods presented by these authors by using multiple predictors and GWR. Both GWRK and ATPRK methods are comprised of two phases: multiple global or multiple GWR regression, and ATPK of the residuals of the regressions. The method first requires performing a regression of each coarse band on the fine bands and then performing ATPK to downscale the band residuals from the regression model.

Multiple Linear Regression between Fine and Coarse Bands
As suggested by Rodriguez-Galiano et al. [35], the use of all fine bands as covariates does not necessarily lead to an improvement in the prediction of the target variable (temperature). Thus, some ancillary variables may have small correlations with the primary variable and might not provide useful fine spatial resolution information. However, we observed that the correlation between coarse and fine bands is highly spatially variable. Therefore, LST, for example, might have a high correlation with NVDI in urban areas and a weak correlation in flooded pristine regions, which is the case in this study. As such, we adopted multiple regressions as an effort to reduce the influence of the residuals in the ATPK step. In this research, we selected multiple predictors (PAN bands) for the ATPRK and GWRK downscaling methods. Since the DCK method demands more computational time than the other geostatistical-based approaches, we downscaled the fine band by using the most correlated (highest coefficient of correlation) band with the coarse thermal band [36].
Let Z l c (x i ) be the measurements (LST values) of pixel C centered at x i (i = 1, . . . , M,) where M is the number of pixels in the coarse band l (l = 1, ..., B, where B is the number of bands), and let Z F x j be the measurements of pixel F centered at x j (j = 1, ..., MG 2 ), where G is the spatial resolution ratio between the coarse and PAN bands. The notations F and C denote the fine and coarse data, respectively. The goal of the LST downscale is to predict target variables Z l F (x) for all fine pixels in all B coarse bands.
The first step of the regression comprises the upscaling of each ancillary band (Z F ) to the coarser resolution thermal bands (Z C ). The PAN band (Z F ) is first upscaled to Z C to match the spatial resolution of the coarse bands, which is achieved here by the application of convolution filters: where h c (x) is the point spread function (PSF) for the multispectral fine resolution band and * is the convolution operator. The relationship between the predictors (Z c ) and the observed coarse thermal image (Z l c (x)) is modeled by linear multiple regression, and for the l-th band, the regression prediction Z l c (x) is calculated as: In Equation (2), R(x) is a spatially dependent residual term used in the ATPK step. The a l coefficients are estimated by multiple ordinary least square regressions [37]. Based on the assumption of scale-invariance, a l estimated at coarse spatial resolutions in Equation (2) are used for regression prediction at the fine spatial resolution according to Equation (3). With the available fine spatial resolution multispectral bands in the studied areas, the fine spatial resolution regression prediction at a specific location, x 0 , that is,Ẑ l F1 (x), can be defined as:

Geographically Weighted Regressions between Fine and Coarse Bands
In the AATPRK method proposed by Wang et al. [26], for each coarse pixel, a linear regression model is fitted using the coarse and upscaled PAN pixels within a local window with a specific fixed size (W × W). The regression coefficients are estimated on a coarse pixel basis and are functions of the pixel locations. According to this method, the Z l C (x) at a specific location for any fine pixel in l Remote Sens. 2018, 10, 633 5 of 20 bands are predicted within the local windows of W × W sizes (4). The GWR regression considers the spatial dependence of the correlation between the coarse and the fine resolution bands within the local windows [32][33][34]. According to this approach, the nearness and similarity must be considered in the regression [31]. Therefore, we might expect that if we wish to estimate parameters for a model at some location x 0 , the observations that are closer to that location should have a greater weight in the estimation, when compared to observations that are further away. The relationship between Z C and the upscaled l bands in GWRK is then modeled by linear regression for specific locations, x. Therefore, the GWR model is described as the modification of Equation (2) by: The estimator for this model is similar to the OLS (ordinary least squares regression) global model. Thus, the weights, a l , are based on the location, x, relative to the other observations in the dataset and, hence, change for each location. Equation (4) describes the GWR regression in the upscaled multispectral bands. For any fine pixel in l bands, centered at x 0 , the regression predictionẐ l F1 (x 0 ) at the fine spatial resolution becomes: where x 0 is the center of the coarse pixel in which the fine pixel is centered at x 0 . In GWR the parameters a l are obtained by the weighted least squares estimation, and the parameter estimation in the i-th sample point can be written as:â where W i is a matrix of circular shape, and each element of the diagonal matrix is a function of the distance between the position of the observation and the regression point. Theâ l (x) in Equation (6) is the result of parameter estimation. The common distance functions include the distance threshold method, inverse distance method, Gauss function, and the bi-square function. W(x) is a matrix of weights relative to the position of x in the images; X T W(x)X is the geographically weighted variance-covariance matrix (the estimation requires its inverse to be obtained), and Y is the vector of the values of the dependent variable. The weights are computed from a kernel-weighting scheme. A number of kernels are possible; a typical one has a Gaussian shape: where W i (x) is the geographical weight of the i-th observation in the dataset relative to the location x, d i (x) is some measure of the distance between the i-th observation and the location x, and h is the bandwidth. The bandwidth in the kernel is expressed in the same units as the coordinates used in the image dataset. As the bandwidth becomes larger, the weights approach the global average and the local GWR model becomes a global OLS model. Therefore, different local to regional bandwidths were tested in this study. After the GWR regression modeling step, coarse residual images are obtained for each coarse band. The ATPK is performed to downscale the residual of each LST image to the targeted fine spatial resolution residualsẐ l F1 (x 0 ) according to Equation (5).

Geographically Weighted Regression Kriging
The regression step in both methods described above results in higher residuals for high bandwidth (h) values (7). Therefore, ATPK-based residual downscaling was adopted to maintain the spectral integrity of the LST images. ATPK is a downscaling technique that predicts values in an image with smaller spatial dimensions than that of the original data [38][39][40]. The ATPK method takes into account the size of the support, spatial correlation, and the PSF of the base image. Thus, ATRK was Remote Sens. 2018, 10, 633 6 of 20 used to interpolate the residuals of both regression models, ATPRK and GWRK. The residuals in a specific l band Z l c (x), are given by: Based on ATPK, the fine residual at a specific location x 0 ,Ẑ l F2 (x 0 ), is a linear combination of the observed coarse residuals, as described in (9): where λ i is the weight for the i-th coarse residual centered at x i and N is the number of coarse observations used in the prediction, such as a 6 × 6 window of coarse pixels surrounding a fine pixel. The N weights λ i ; ...; λ N in Equation (9) are calculated by minimizing the prediction error variance, and the corresponding kriging matrix is: In Equation (10), the term λ l CC x i , x j is the coarse-to-coarse residual semivariogram between coarse pixels centered at x i and x j in band l, λ l FC x 0 , x j is the fine-to-coarse residual semivariogram between fine and coarse pixels centered at x 0 and x j in band l, and θ is the Lagrange multiplier.
Let s be the Euclidean distance between the centroids of any two pixels, λ l FF (s) the fine-to-fine residual semivariogram between two fine pixels, and h l C (s) the PSF for band l. λ l CC (s) and λ l FC (s) in (10) are calculated by combining λ l FF (s) with the PSF h l C (s) as follows: In which the coarse pixel value is the average of the fine pixel values: where S c is the size of pixel C and C(x) is the spatial support of pixel C centered at x. Given the PSF in (13), the calculations of λ l FC x 0 , x j and λ l CC x i , x j are simplified as: where σ = G 2 is the pixel size ratio between the coarse and fine pixels, S m is the distance between the centroid x 0 of fine pixel F and the centroid of any fine pixel within the coarse pixel C centered at x i , and S mm is the distance between the centroid of any fine pixel within the coarse pixel centered at x i and the centroid of any fine pixel within the coarse pixel centered at x j . The fine-to-fine residual semivariogram, λ l FF (S), is derived by deconvolution of the coarse residual semivariogram of band l denoted as λ l C (s), as presented in Equation (16): where N(s) is the number of paired pixels at a specific lag from the center pixel x. The empirical approach of semivariogram deconvolution was discussed in the work of Wang et al. [28] and Wang et al. [26]. These authors suggested that the semivariogram function of λ l FF (s) could be described by two parameters, which are sill and range, and with a zero nugget effect. Therefore, the ATPK step applied in this research, for both the GWRK and ATPRK methods, was the same as that proposed by Wang et al. [26]. First, the base parameters are generated from λ l FF (s) by referring to the known λ l c (s) for each coarse LST band. For each parameter of λ l FF (s), two multipliers are defined empirically to generate an interval for selecting the optimal multiplier. We adopted multipliers similar to those presented by Wang et al. [26], which have the punctual sill set between 1 and 3 times the range of λ l c (s), whereas the interval of the punctual range was set between 0.5 and 2.5 times the range of λ l c (s). The optimal λ l FF (s) is determined as the parameter combination leading to the smallest difference between λ R FF (s) and λ l c (s), where R represents the regularized semivariograms [26].

Other Downscaling Methods
In this paper, we adopted another image downscaling method, DCK, which is based on a geostatistical approach [13,35,41,42]. A DCK with multiple predictors might lead to expansive computational time. Therefore, the DCK of LST data was carried out with just one predictor comprising the NDVI (multiplied by −1) in the Campo Grande region, and the Sentinel-2 band 12 in the Nhecolândia region. The other downscaling methods adopted in this paper also used the same panchromatic bands as those used in the DCK method, which comprise the following pan-sharpening methods: PCA [43], GS [44], ATWT [45], MDMR [46], and TSHARP [12]. This last one was specifically designed for the downscaling of thermal images.

Datasets and Study Locations
In this study, two areas were selected for the evaluation of downscaling methods of ASTER TIR images. The areas are located in the Brazilian Pantanal, comprising the regions of the city of Campo Grande located near the southern boundary of the Pantanal biome, and an area of natural seasonally flooded fields located in the Pantanal's sub-region of Nhecolândia ( Figure 1). The Pantanal wetlands of South America are one of the largest and most important tropical wetland ecosystems, covering an area of approximately 140,000 km 2 during maximum inundation in the rainy season [47]. The rainfall patterns of the region support an annual flood regime that varies both temporally and spatially. Thus, the Pantanal wetland is composed of a number of floodplain sub-regions with particular biodiversity, hydrological, and geomorphological characteristics [48,49].
Given the high variability in environmental characteristics of this vast region [48,[50][51][52][53][54], two areas with remarkably different patterns of land-cover/land-use (LCLU) were selected. The urban area of Campo Grande has been marked by an intense process of urbanization in the last decades, while in the Nhecolândia region the most remarkable features are thousands of small lakes/ponds composed of crystalline and salt waters [51], areas of natural vegetation (scrubs and grasslands), and scattered areas of pasture. Thus, these two areas were considered due to the high variety of anthropic features in the Campo Grande region and the remarkable nocturnal emissivity of the Pantanal saline lakes, easily observed in TIR satellite images acquired at night.
Grande located near the southern boundary of the Pantanal biome, and an area of natural seasonally flooded fields located in the Pantanal's sub-region of Nhecolândia ( Figure 1). The Pantanal wetlands of South America are one of the largest and most important tropical wetland ecosystems, covering an area of approximately 140,000 km 2 during maximum inundation in the rainy season [47]. The rainfall patterns of the region support an annual flood regime that varies both temporally and spatially. Thus, the Pantanal wetland is composed of a number of floodplain sub-regions with particular biodiversity, hydrological, and geomorphological characteristics [48,49]. Given the high variability in environmental characteristics of this vast region [48,[50][51][52][53][54], two areas with remarkably different patterns of land-cover/land-use (LCLU) were selected. The urban area of Campo Grande has been marked by an intense process of urbanization in the last decades, while in the Nhecolândia region the most remarkable features are thousands of small lakes/ponds Some studies focused on hydrological processes require nighttime water emissivity data [51]. Therefore, the downscaling of ASTER data in the Nhecolândia region was carried out using ASTER scenes collected during the nighttime and the ancillary data (PAN bands) derived from 20-m SWIR and NIR bands of the Sentinel-2 satellite, acquired in March 2016. The Sentinel satellite was selected because it provided the acquisition period most close to the ASTER TIR acquisition, with a cloud-free scene within the studied area. The ASTER scenes were acquired in October 2014 and January 2016 for the Campo Grande and Nhecolândia regions, respectively. The downscaling of ASTER TIR data in the Campo Grande region was carried out using both TIR and VNIR channels of the same ASTER scene [35]. The Campo Grande region comprises an area of about 2570 km 2 , which corresponds to a subset of the original ASTER scene with dimensions of approximately 50 by 50 km. The Nhecolândia region covers an area of 2700 km 2 with dimensions of 50 by 53 km.
The ASTER bands were provided by the USGS (United States Geological Survey) [55] at an L1T processing level, which comprises precision terrain corrected, registered at-sensor radiance. The VNIR and TIR products were provided as top of atmosphere (TOA) radiance (W m −2 sr −1 µm −1 ). The Sentinel-2 images were also provided in TOA radiance (W m −2 sr −1 µm −1 ) at an L1C processing level, which includes radiometric and geometric corrections comprising ortho-rectification and spatial registration on a global reference system (WGS 84) with sub-pixel accuracy [56]. Before merging the data, we carried out a correlation analysis between ancillary and TIR images of the studied areas, in order to select ancillary bands with correlation coefficients of up to 0.85. This analysis was necessary to avoid collinearity in the regression step of the GWRK and ATPRK fusion methods, which would increase computational time without adding any relevant information to the fused images. Thus, we selected the ancillary ASTER bands 1 and NDVI (r 2 : 0.72 between these two bands) for the Campo Grande region and the ancillary Sentinel-2 bands 8 and 11 for the Nhecolândia region (r 2 : 0.81 between these two bands).

Experimental Setup
The highest spatial resolution possible for both studied areas can be achieved by merging the 90-m ASTER TIR images with the 15-m ASTER VNIR bands or 20-m Sentinel bands to produce 15-m and 20-m TIR images in the Campo Grande and Nhecolândia regions, respectively. However, by adopting this strategy, no reference image would be available at the end of the process to assess the accuracy of the fusion methods. Given this, the merging process must be carried out by taking into account the availability of perfect fine spatial resolution reference images, which is achieved by degrading the original ASTER TIR, ASTER VNIR, and Sentinel images, while respecting the spatial resolution ratio of the images, as shown in Figure 2. The 90-m ASTER bands 13 and 14 were upscaled by factors of 6 and 4.5 for the Campo Grande and Nhecolandia regions, respectively, in order to generate merged images Remote Sens. 2018, 10, 633 9 of 20 with a 90-m spatial resolution (Figure 2). The merging approaches were then implemented to merge the 540-and 405-m TIR bands with the 90-m ASTER VNIR and Sentinel-2 bands, which created images comparable to the reference 90-m TIR bands for the precise assessment of merging quality [35,57].
Following image downscaling, the ASTER TIR images were converted from radiance to surface LST. Image header parameters (conversion units) were used to convert digital numbers to at-sensor radiances. The thermal bands (TIR) 13 and 14 were calibrated to LST according to the method discussed by Jiménez-Muñoz [58]. The LST data retrieval from ASTER TIR bands is expected to be more accurate in the spectral region between 10 and 12 µm, which corresponds to bands 13 and 14. These bands show higher atmospheric transmissivity and, consequently, a good correlation with LST. Therefore, bands 13 and 14 were used for LST retrieval. The at-sensor radiances from bands 13 and 14 were converted to LST (T s ) according to the SC JM&S method detailed in Jiménez-Muñoz [58]. The final LST images represent the surface temperature in degrees Kelvin derived from bands 13 and 14. Both the reference and the merged images were converted in order to assess the quality of the downscaled images. Both the reference and the merged images were converted in order to assess the quality of the downscaled images. The three geostatistical methods, namely GWRK, ATPRK, and DCK, were compared to five state-of-the-art algorithms summarized in Vivone et al. [59] and Wang et al. [26]. They are ATWT, GS, MDMR, PCA, and TSHARP. Four indices were used for the quantitative evaluation of the spectral and spatial qualities of the LST downscaled bands 13 and 14, including the correlation coefficient (CC), relative global-dimensional synthesis error (ERGAS) [45], universal image quality index (UIQI) [60], and the spatial quality (SM) [61] index.
Given two images x and y, the CC is given as: where x and y are the mean gray values of the reference and the fused bands, respectively, and CC is estimated globally for each LST image. The result of this equation shows similarities between the small structures of the original and fused images. The resulting values ranged from −1 to 1. ERGAS (18) is an acronym in French for "Erreur relative globale adimensionnelle de synthese", which translates to relative dimensionless global error in synthesis. This index is used to calculate the The three geostatistical methods, namely GWRK, ATPRK, and DCK, were compared to five state-of-the-art algorithms summarized in Vivone et al. [59] and Wang et al. [26]. They are ATWT, GS, MDMR, PCA, and TSHARP. Four indices were used for the quantitative evaluation of the spectral and spatial qualities of the LST downscaled bands 13 and 14, including the correlation coefficient (CC), relative global-dimensional synthesis error (ERGAS) [45], universal image quality index (UIQI) [60], and the spatial quality (SM) [61] index.
Given two images x and y, the CC is given as: where x and y are the mean gray values of the reference and the fused bands, respectively, and CC is estimated globally for each LST image. The result of this equation shows similarities between the small structures of the original and fused images. The resulting values ranged from −1 to 1. ERGAS (18) is an acronym in French for "Erreur relative globale adimensionnelle de synthese", which translates to relative dimensionless global error in synthesis. This index is used to calculate the amount of spectral distortion between fused and reference images and is given by the following equation: where h/l is the ratio between the spatial resolution of the panchromatic image and the MS image, N is the number of spectral bands of the fused image, µ 0 is the mean value of each spectral band of the fused composition, and RMSE (relative root mean square error) represents the difference between the standard deviation and mean of the fused and original images. The best possible value for ERGAS is zero [45]. The UIQI (19,) proposed by Zhou and Bovik [60], considers the luminance, contrast, and structural differences between each band of the fused and original multispectral images, resulting in a global index representing the fusion quality.
U IQI (x, y) = (4σ xy xy) where x and y are two non-negative image signals (gray values) for the fused and reference bands, respectively, while σ xy , σ x , and σ y are the covariance and the variances of images x and y, respectively. The closer the UIQI index is to 1, the better the fused image. The SM index is useful for evaluating the spatial quality of the downscaled images. The spatial features are extracted, by applying a Laplacian filter over all fused and reference LST images, prior to implementing the SM algorithm. The CC is then estimated for corresponding bands and the derived values are averaged, which results in an overall spatial measure. The SM index is mathematically expressed as follows: CCi (20) where N is the total number of LST images, and CC represents the correlation coefficient obtained by (21): where X and Y are the i-th bands in the reference and fused images, respectively, * is the convolution operator, and l is a Laplacian filter, set by Otazu et al. [61] as: The SM values fall within a range of −1 to 1, where 1 indicates the best spatial quality. Quantitative evaluation algorithms (17)- (20) result in a set of global quality indices for each fused LST band. Nevertheless, depending on the fusion algorithm, regional variations in fused bands might occur due to differences related to specific image features. Accordingly, the numerical expression of the fusion quality by means of a global value might lead to biased results, especially when considering fusion methods that cause a high standard deviation between the reference and fused bands. Given this, we adopted a regionalized approach by calculating the quality index within zonal moving windows ( Figure 2) with dimensions 30 times greater than the resolution of the reference image (e.g., the zonal widow is a regular grid with dimensions of 2.7 by 2.7 km for the 90-m ASTER TIR images). The results of the quantitative approaches implemented for each fusion method were compared using descriptive statistics. The mean LST value of the downscaled TIR bands 13 and 14 were used for the quantitative evaluation of downscaling quality due to the very high spectral similarities between these two bands.

Results
Figures 3 and 4 summarize the image fusion results of the three geostatistical approaches (ATPRK, GWRK, and DCK) as well as the five other downscaling methods previously described, for the Campo Grande and Nhecolândia regions, respectively. The kriging methods ATPRK, DCK, and GWRK (Figure 4b-d) resulted in very similar temperature images when compared to the reference image (Figure 4a) in the Nhecolândia region. However, greater variability in temperature values was observed for the Campo Grande study area as a function of the different geostatistical approaches used (Figure 3b-d). As can be observed in Figure 3b, the ATPRK method caused a temperature overestimation in the urban area of Campo Grande, with some pixels showing reddish values in the central area of the city that are not visible in the reference image (Figure 3a). The ATPRK results showed a much higher contrast between low and high temperature values compared to the reference image (blue to red). While this effect was also observed in the image merged by DCK (Figure 3c), the scale had a lower contrast when compared to the ATPRK image (Figure 3b). The images merged by GWRK show a higher spectral similarity to the reference images of both studied areas (Figures 3d and 4d).
Remote Sens. 2018, 10, x FOR PEER REVIEW 11 of 20 bands 13 and 14 were used for the quantitative evaluation of downscaling quality due to the very high spectral similarities between these two bands.

Results
Figures 3 and 4 summarize the image fusion results of the three geostatistical approaches (ATPRK, GWRK, and DCK) as well as the five other downscaling methods previously described, for the Campo Grande and Nhecolândia regions, respectively. The kriging methods ATPRK, DCK, and GWRK (Figure 4b-d) resulted in very similar temperature images when compared to the reference image (Figure 4a) in the Nhecolândia region. However, greater variability in temperature values was observed for the Campo Grande study area as a function of the different geostatistical approaches used (Figure 3b-d). As can be observed in Figure 3b, the ATPRK method caused a temperature overestimation in the urban area of Campo Grande, with some pixels showing reddish values in the central area of the city that are not visible in the reference image (Figure 3a). The ATPRK results showed a much higher contrast between low and high temperature values compared to the reference image (blue to red). While this effect was also observed in the image merged by DCK (Figure 3c), the scale had a lower contrast when compared to the ATPRK image (Figure 3b). The images merged by GWRK show a higher spectral similarity to the reference images of both studied areas (Figures 3d  and 4d).   Regarding to the non-geostatistical solutions, the image merged by TSHARP in Campo Grande (Figure 3i) presented the best results, when both the spatial and spectral coherence are considered. The ATWT (Figure 4e,g) and MDMR methods resulted in blurred images that masked the border features, thus causing a significant spectral mixture between the riparian forest at the center of the images (pattern of blue irregular lines) and the building zones of the city of Campo Grande (yellow to red patterns). In contrast, the images that most resembled the reference image in the Nhecolândia region were obtained using the ATWT (Figure 4e) and MDMR (Figure 4g) methods, followed by the GS method (Figure 4f). The poorest results for both areas were obtained by the PC method ( Figures  3h and 4h), which caused the complete inversion of temperature values in the Nhecolândia study Regarding to the non-geostatistical solutions, the image merged by TSHARP in Campo Grande (Figure 3i) presented the best results, when both the spatial and spectral coherence are considered. The ATWT (Figure 4e,g) and MDMR methods resulted in blurred images that masked the border features, thus causing a significant spectral mixture between the riparian forest at the center of the images (pattern of blue irregular lines) and the building zones of the city of Campo Grande (yellow to red patterns). In contrast, the images that most resembled the reference image in the Nhecolândia region were obtained using the ATWT (Figure 4e) and MDMR (Figure 4g) methods, followed by the GS method (Figure 4f). The poorest results for both areas were obtained by the PC method (Figures 3h  and 4h), which caused the complete inversion of temperature values in the Nhecolândia study area (Figure 4h) and saturation in the Campo Grande region (Figure 3h). Similar effects were also observed in the GS downscaled image of Campo Grande (Figure 3f). Figures 5 and 6 show the quantitative evaluation of the downscaling methods for the Campo Grande and Nhecolândia study areas, respectively. The quantitative assessment, which was based on Equations (17)- (20), was carried out by assessing the correlation within a regular grid. As such, the results are expressed as the mean, median, quartiles, outliers, and extreme values (Figure 2).
The ideal values of each index are provided at the top (CC, SM, and UIQI) and bottom (ERGAS) of the charts in order to facilitate the comparison of the fusion methods. With the exception of the TSHARP method that presented values close to those obtained by DCK, the three geostatistical approaches (ATPRK, DCK, and GWRK) outperformed the five state-of-the-art algorithms (Table 1 and Figures 5  and 6). The CC, ERGAS, SM, and UIQI values of the GWRK were closer to the ideal values in all evaluated cases, considering both spectral (CC, ERGAS, UIQI) and spatial (SM) coherences. Moreover, the GWRK method resulted in fewer outliers and extreme values, due to its superior performance within all image targets of the fused TIR bands. If we compare, for example, ATPRK (the second-best method) and GWRK in the Campo Grande region (Figure 5), the CC and UIQI values are close (Table 1). However, there was a clear occurrence of extreme values and outliers in the ATPRK, with some values of CC and UIQI reaching 0.35. On the other hand, the extremely low values in the CC and UIQI for the GWRK were limited to 0.65, showing a better spectral coherence of this method within the entire merged image for the Campo Grande region (Figure 6). The overall quality of merged images in the Campo Grande region ( Figure 5) was higher when compared with the Nhecolândia region ( Figure 6). No negative quality values were observed in the Campo Grande region for any of the fusion methods when using the CC, SM, and UIQI indices (Table 1). On the other hand, we observed a large number of negative values in Nhecolândia, mostly for the PC fusion method, with some negative outliers also related to the GS, MDMR, and ATWT methods ( Figure 6). Despite the complex surface of the city of Campo Grande imparted by buildings and roads, the performance of the fusion methods was better due to a higher correlation between ancillary and TIR bands resulting from the images being collected by the same satellite platform (ASTER) in the same acquisition period. However, for the Nhecolândia region we used images from different satellites collected at different periods (day and night), which clearly caused a higher discrepancy between the reference and merged products ( Figure 6). Moreover, the high nocturnal thermal emissivity of Nhecolândia lakes, typical of hyperalkaline water bodies [52], might cause an abnormal spectral response that is not observed in other satellite spectral bands, such as those in the VIS, NIR, and SWIR spectral ranges of the ASTER and Sentinel-2 satellites.

Discussion
GWRK is similar to the AATPRK described by Wang et al. [26]. However, a distance-weighting scheme was also considered in this study. The authors highlighted the significant improvement in the quality of merged images resulting from the adoption of local windows, which was reflected in a coefficient of correlation close to 1 [26]. However, the authors did not consider distance-weighting schemes within the local regression windows. Therefore, the best method for refining ASTER temperature data without losing or distorting the original coarse information (Figures 7 and 8), is to combine local regression windows, weighting distance schemes (Gaussian) within local windows, and a final area-to-point kriging of regressed residuals using multiple covariates (Figure 9). Moreover, the results are of higher quality when compared with other downscaling methods based on regression and kriging (GWK, ATPRK, and DCK), as well as the regression and interpolation of residuals (TSHARP). Further, the downscaling resulted in the generation of more ideal CC, ERGAS, SM, and UIQI values. Finally, the experiments demonstrated the great utility of using GWRK to merge data with high spectral sensibility. The results presented herein show great potential for estimating temperature at high spatial resolution, which can be useful when studying local phenomena, such as the occurrence of heat islands in dense urban areas (Figure 7) or evaluating nocturnal variability of water temperature ( Figure 8) in flooded pristine regions.
It is important to highlight that the proposed GWRK method was compared to other state-of-the-art merging solutions based on kriging [13,35,42]. For the case of cokriging, we observed that the need to compute both the auto-semivariogram and the cross-semivariogram for each coarse band significantly increased the computational time, which made it impossible to use multiple covariates. On the other hand, GWRK and ATPRK are simpler and faster solutions since the mandatory process of cross-correlation between dependent and explanatory variables for kriging solutions is achieved by simply regressing the models globally (ATRPK) or locally with distance weighting (GWRK). Therefore, the kriging step of these methods, which calculates each coarse band separately, only requires the adjustment of the auto-semivariogram residuals for each coarse band, which was also observed by Wang et al. [28] for the ATRPK solution. As such, both ATPRK and GWRK are much faster to implement.

Discussion
GWRK is similar to the AATPRK described by Wang et al. [26]. However, a distance-weighting scheme was also considered in this study. The authors highlighted the significant improvement in the quality of merged images resulting from the adoption of local windows, which was reflected in a coefficient of correlation close to 1 [26]. However, the authors did not consider distance-weighting schemes within the local regression windows. Therefore, the best method for refining ASTER temperature data without losing or distorting the original coarse information (Figures 7 and 8), is to combine local regression windows, weighting distance schemes (Gaussian) within local windows, and a final area-to-point kriging of regressed residuals using multiple covariates ( Figure 9). Moreover, the results are of higher quality when compared with other downscaling methods based on regression and kriging (GWK, ATPRK, and DCK), as well as the regression and interpolation of residuals (TSHARP). Further, the downscaling resulted in the generation of more ideal CC, ERGAS, SM, and UIQI values. Finally, the experiments demonstrated the great utility of using GWRK to merge data with high spectral sensibility. The results presented herein show great potential for estimating temperature at high spatial resolution, which can be useful when studying local phenomena, such as the occurrence of heat islands in dense urban areas (Figure 7) or evaluating nocturnal variability of water temperature ( Figure 8) in flooded pristine regions.
It is important to highlight that the proposed GWRK method was compared to other state-ofthe-art merging solutions based on kriging [13,35,42]. For the case of cokriging, we observed that the need to compute both the auto-semivariogram and the cross-semivariogram for each coarse band significantly increased the computational time, which made it impossible to use multiple covariates. On the other hand, GWRK and ATPRK are simpler and faster solutions since the mandatory process of cross-correlation between dependent and explanatory variables for kriging solutions is achieved by simply regressing the models globally (ATRPK) or locally with distance weighting (GWRK). Therefore, the kriging step of these methods, which calculates each coarse band separately, only requires the adjustment of the auto-semivariogram residuals for each coarse band, which was also observed by Wang et al. [28] for the ATRPK solution. As such, both ATPRK and GWRK are much faster to implement.  In order to improve the quality of the merged images, fusion methods with multiple explanatory panchromatic bands were tested ( Figure 9). The ability to combine multiple covariates improved the quality of the methods based on regression (ATPRK and GWRK), as shown in Figure 9, which highlights the quality of the regressed surfaces using GWR prior to the kriging of the regressed residuals. The impact of multiple covariates on improving the quality of the merged images was clearer in the Nhecolândia region (Figure 9b) where GWR with just one covariate (Sentinel-2 band 12), resulting in a generally lower correlation within the local GWR windows. When using just one covariate (NDVI), the correlation values were higher in the Campo Grande region than in Nhecolândia. Further improvements were noted when multiple covariates were adopted for the Campo Grande region ( Figure 9a); however, they were not as significant as in Nhecolândia ( Figure  9b). Given this, it is advised to use multiple covariates when applying GWRK to merge images, since a significant improvement can be achieved, especially when working with panchromatic and multispectral bands with lower correlations, as is the case for the Nhecolândia region (Figure 9b).  In order to improve the quality of the merged images, fusion methods with multiple explanatory panchromatic bands were tested (Figure 9). The ability to combine multiple covariates improved the quality of the methods based on regression (ATPRK and GWRK), as shown in Figure 9, which highlights the quality of the regressed surfaces using GWR prior to the kriging of the regressed residuals. The impact of multiple covariates on improving the quality of the merged images was clearer in the Nhecolândia region (Figure 9b) where GWR with just one covariate (Sentinel-2 band 12), resulting in a generally lower correlation within the local GWR windows. When using just one covariate (NDVI), the correlation values were higher in the Campo Grande region than in Nhecolândia. Further improvements were noted when multiple covariates were adopted for the Campo Grande region ( Figure 9a); however, they were not as significant as in Nhecolândia (Figure 9b). Given this, it is advised to use multiple covariates when applying GWRK to merge images, since a significant improvement can be achieved, especially when working with panchromatic and multispectral bands with lower correlations, as is the case for the Nhecolândia region (Figure 9b). (c) result of GWRK pan-sharpening of TIR bands without degradation, with a 20-m spatial resolution. Lower and higher temperatures range from blue to red, respectively.
In order to improve the quality of the merged images, fusion methods with multiple explanatory panchromatic bands were tested (Figure 9). The ability to combine multiple covariates improved the quality of the methods based on regression (ATPRK and GWRK), as shown in Figure 9, which highlights the quality of the regressed surfaces using GWR prior to the kriging of the regressed residuals. The impact of multiple covariates on improving the quality of the merged images was clearer in the Nhecolândia region (Figure 9b) where GWR with just one covariate (Sentinel-2 band 12), resulting in a generally lower correlation within the local GWR windows. When using just one covariate (NDVI), the correlation values were higher in the Campo Grande region than in Nhecolândia. Further improvements were noted when multiple covariates were adopted for the Campo Grande region ( Figure 9a); however, they were not as significant as in Nhecolândia ( Figure  9b). Given this, it is advised to use multiple covariates when applying GWRK to merge images, since a significant improvement can be achieved, especially when working with panchromatic and multispectral bands with lower correlations, as is the case for the Nhecolândia region ( Figure 9b).  Multiple covariates combined with GWRK provided the best results for downscaling temperature data either using images collected by the same sensor and period, which was the case of the Campo Grande area, or using images acquired by different sensor systems at different time periods (daytime or nighttime acquisitions), which was the case for the Nhecolândia region. Moreover, the combination of multiple covariates and GWRK demonstrated high performance over a range of different image targets, such as urban areas, areas of native vegetation and agriculture, as well as floodplain regions.

Conclusions
Nowadays, there are still a few remote sensing satellites equipped with thermal sensors that acquire imagery data at high spatial and temporal resolutions. A few exceptions include Landsat 8 TIR and ASTER TIRS sensors, which provide data at a 90-m spatial resolution. However, some studies require data at more refined spatial scales. Therefore, in this research a series of state-of-the-art pan-sharpening methods were tested and a fusion approach based on GWR and area-to-point kriging of residuals was presented. The proposed method incorporates multiple covariates that are regressed by GWR, and the resulting residuals are refined to the resolution of the covariate bands by kriging.
The evaluation of the quality of the merging methods was carried out in two different regions with a high variety of natural and anthropogenic features, for which images from different satellites and sensor systems were merged. Following quantitative and qualitative analyses of the resulting images, the following conclusions were drawn: (i) the proposed GWRK outperformed all of the other evaluated methods, followed by the ATRPK method. The better performance of the proposed method was confirmed by quantitative analysis carried within zonal windows. The zonal CC, ERGAS, SM, and UIQI quality indices indicated that the GWRK method had the best performance in all evaluated scenarios considering both studied regions of Campo Grande and Nhecolândia; (ii) the quantitative evaluation of fused images is more precise when calculated zonally, instead of using global evaluation schemes; (iii) the use of multiple covariates is advised due to the significant improvement observed in the regressed surface from GWR as compared to regression with a single covariate, especially in areas with low correlation between panchromatic and TIR data.