Downscaling Land Surface Temperature in Complex Regions by Using Multiple Scale Factors with Adaptive Thresholds

Many downscaling algorithms have been proposed to address the issue of coarse-resolution land surface temperature (LST) derived from available satellite-borne sensors. However, few studies have focused on improving LST downscaling in urban areas with several mixed surface types. In this study, LST was downscaled by a multiple linear regression model between LST and multiple scale factors in mixed areas with three or four surface types. The correlation coefficients (CCs) between LST and the scale factors were used to assess the importance of the scale factors within a moving window. CC thresholds determined which factors participated in the fitting of the regression equation. The proposed downscaling approach, which involves an adaptive selection of the scale factors, was evaluated using the LST derived from four Landsat 8 thermal imageries of Nanjing City in different seasons. Results of the visual and quantitative analyses show that the proposed approach achieves relatively satisfactory downscaling results on 11 August, with coefficient of determination and root-mean-square error of 0.87 and 1.13 °C, respectively. Relative to other approaches, our approach shows the similar accuracy and the availability in all seasons. The best (worst) availability occurred in the region of vegetation (water). Thus, the approach is an efficient and reliable LST downscaling method. Future tasks include reliable LST downscaling in challenging regions and the application of our model in middle and low spatial resolutions.


Introduction
As an important parameter for characterizing the balance of surface energy, land surface temperature (LST) serves a key function in biophysical-chemical processes [1] and has been widely used in common applications, such as soil moisture estimation [2,3], forest fire detection [4], and urban heat environment monitoring [5][6][7]. Thermal infrared remote sensing (TIRS) can detect surface temperature and describe the spatial differences and diversity in LST [8] dynamically and macroscopically. A large amount of thermal remote-sensing data, including those from NOAA/AVHRR, Landsat TM/ETM+, MODIS, ASTER, and Landsat 8 TIRS, have been used to retrieve LST. Landsat satellites are frequently used in LST retrieval because of their high spatial resolution and wide availability of the data to the public, however, the LST retrieved from Landsat data is usually mixed with pixel temperature. Moreover, urban surfaces are characterized by high heterogeneity [5]. In this case, the LST retrieved from satellite-borne sensors has an insufficient spatial resolution for some urban applications. Downscaling may be applied to enhance the spatial resolution of thermal images with relatively low resolution [9]. mixed areas and adaptively selects the scale factors within a moving window based on the Pearson's correlation coefficients between LST and the scale factors. The rest of this paper is organized as follows: Section 2 discusses the proposed method. Section 3 presents the study area and data. Section 4 evaluates the downscaling results. Section 5 discusses the findings. Section 6 concludes the paper.

Downscaling Methods
LST can be retrieved using thermal infrared images with coarse spatial resolutions. Regression models between ancillary environmental predictors and LST have been widely established to enhance LST resolution. If the relationships between LST and the predictors do not change with the variation in the spatial resolution, a detailed LST with a high resolution can be estimated by the predictors using such relationships.
The four scale factors and LST image with coarse resolution are regressed into the model given by: where SAVI C , NMDI C , MNDWI C , NDBI C , and LST F are the SAVI, NMDI, MNDWI, NDBI, and fitted coarse-resolution LST, respectively. The subscript "C" indicates the variable in the coarse resolution, and the subscript "F" refers to the variable fitted by others. The coefficients a, b, c, and d change with the moving window.
Owing to the sparse ground observations and the limited spatial representativeness of ground measurements, the LST retrieved using TIRS images was used as the reference of LST, considering the reliable accuracy of retrieval, which has an average bias of less than 1 • C [46]. The residual temperature (e) became the difference between the retrieved LST (LST R ) and the LST F , as shown in Equation (1). This difference was due to the spatial variability in LST/SAVI and LST/NDBI: Therefore, from the coarse-resolution LST, the simulated LST with coarse resolution (LST C ) could be estimated as LST C = aSAV I C + bN MDI C + cMNDW I C + dNDBI C + e.
Owing to the scale invariance, the relationships between LST and the scale factors with coarse resolutions were applied to the four scale factors with high resolutions. Subsequently, a simulated LST with a high resolution (LST H ) is obtained, which is given by: where SAVI H , NDBI H , MNDWI H , and NMDI H are the SAVI, NDBI, MNDWI, and NMDI with high resolutions, respectively. For convenience, LST H (LST C ) is regarded as the downscaled (simulated) LST, whereas LST R is regarded as the retrieved LST. The given relationships were fitted by all scale factors because of mixed pixels. Nevertheless, the heterogeneity in a pixel decreases with the spatial resolution. Therefore, in the images with high resolutions, not all the scale factors were used in this study for the regression in every pixel. A multi-scale-factor downscaling approach based on adaptive threshold (MSFAT) was developed to solve this problem. Compared with other traditional approaches, the developed approach did not involve all scale factors in fitting the regression model. Furthermore, the actual cover types were considered in the selection of scale factors in our approach.
CCs were used to compute automatically the importance scores of the four scale factors, and CC thresholds were estimated to determine which scale factors would be involved in fitting the regression model. The estimation process of the CC threshold is as follows (Figure 1). First, the CC between each scale factor and LST was calculated within every moving window until the entire image was scanned. Second, the CCs of every scale factor were sorted into several levels (the number of windows) in ascending order. Third, at a given level, the scale factors were selected within every moving window according to the CC level, and the multiple regression model was fitted using the selected scale factors (Equation (4)). Fourth, the simulated LST with a high resolution at every level was evaluated using some evaluation measures. Fifth, the CC threshold at the optimal level for every scale factor was determined using the evaluation measures. Finally, assuming that the relationships between LST and the scale factors did not change with the scale, the pixels whose CCs were higher than the threshold values were downscaled in the corresponding multiple linear regression model (Equation (4)), whereas the LSTs of the other pixels were downscaled using the most relevant scale factors in the linear simulation. Therefore, in our downscaling method, not all land cover types were involved in the regression equation fitting, but only the scale factors of the main land cover types within a moving window were involved in the regression fitting. The correlation threshold of each scale factor was estimated to determine which scale factors should be involved in the downscaling model within a moving window. multi-scale-factor downscaling approach based on adaptive threshold (MSFAT) was developed to solve this problem. Compared with other traditional approaches, the developed approach did not involve all scale factors in fitting the regression model. Furthermore, the actual cover types were considered in the selection of scale factors in our approach. CCs were used to compute automatically the importance scores of the four scale factors, and CC thresholds were estimated to determine which scale factors would be involved in fitting the regression model. The estimation process of the CC threshold is as follows (Figure 1). First, the CC between each scale factor and LST was calculated within every moving window until the entire image was scanned. Second, the CCs of every scale factor were sorted into several levels (the number of windows) in ascending order. Third, at a given level, the scale factors were selected within every moving window according to the CC level, and the multiple regression model was fitted using the selected scale factors (Equation (4)). Fourth, the simulated LST with a high resolution at every level was evaluated using some evaluation measures. Fifth, the CC threshold at the optimal level for every scale factor was determined using the evaluation measures. Finally, assuming that the relationships between LST and the scale factors did not change with the scale, the pixels whose CCs were higher than the threshold values were downscaled in the corresponding multiple linear regression model (Equation (4)), whereas the LSTs of the other pixels were downscaled using the most relevant scale factors in the linear simulation. Therefore, in our downscaling method, not all land cover types were involved in the regression equation fitting, but only the scale factors of the main land cover types within a moving window were involved in the regression fitting. The correlation threshold of each scale factor was estimated to determine which scale factors should be involved in the downscaling model within a moving window.

Determination of the Moving Window Size
In the calculations of the regression models and CCs, the moving window (operation window) used in downscaling determines the accuracy of the estimated subpixel temperature. The moving window is also directly related to the complexity of the downscaling operations. The criteria for selecting the appropriate MWS according to the types and characteristics of the land cover in the study area should be determined.
Meanwhile, semivariance function is an important tool for understanding the spatial structure of local areas [26,27,35,[47][48][49]. In our study, the semivariance function uses variable ranges to represent the spatial variations in the four scale factors in the entire region.

Determination of the Moving Window Size
In the calculations of the regression models and CCs, the moving window (operation window) used in downscaling determines the accuracy of the estimated subpixel temperature. The moving window is also directly related to the complexity of the downscaling operations. The criteria for selecting the appropriate MWS according to the types and characteristics of the land cover in the study area should be determined.
Meanwhile, semivariance function is an important tool for understanding the spatial structure of local areas [26,27,35,[47][48][49]. In our study, the semivariance function uses variable ranges to represent the spatial variations in the four scale factors in the entire region.
A total of eight values of the variable range, R, in the horizontal and vertical directions for the four scale factors are calculated as [50]: where R is a variable range of a scale factor; i = 1, . . . , m, where m is the number of all the pixels for comparison; x i is the location of a pixel; Z(x i ) is the value of the pixel; and h, which varies from 0 to 300, is the step length of curve fit. Subsequently, R can be determined using h when the semivariance fitting curve stabilizes, and R can be regarded as the MWS.

Evaluation Measures
Two measures, namely, coefficient of determination (R 2 ) and root-mean-square error (RMSE) [19,35], were used to evaluate the downscaling effect of the MSFAT algorithm and compare the proposed algorithm with three other downscaling methods.
In the equation below, R 2 is the coefficient of determination between the original and downscaled images. A high R 2 indicates a satisfactory downscaling. This coefficient is given by: where LST S is the simulated LST (Equations (3) and (4)), LST R is the retrieved LST with the same number of pixels as that of LST R , and LST R is the average of LST R in the entire image. Meanwhile, RMSE was used to test the errors between the original LST image and the downscaled image. The calculation formula for RMSE is given by where M and N represent the number of rows and columns of the image, respectively. When the threshold values of the scale factors are set too high, no scale factors can be fitted, and some pixels cannot be included in the simulation of LST. Therefore, the number of pixels in the solution to Equation (3) was used as the index of algorithm availability.

Study Area and Data Description
Nanjing (31 • 14"-32 • 37" N, 118 • 22"-119 • 14" E) is the capital of Jiangsu Province, and is in one of the largest economic zones in China, the Yangtze River Delta [51]. Nanjing covers seven districts and a total area of 6587 km 2 . It has a humid subtropical climate, which is influenced by the East Asian monsoon. The annual average rainfall and air temperature in Nanjing are 979 mm and 15.9 • C, respectively. July and August are the hottest and the most humid months, in which the average maximum air temperature is 32 • C [52]. Accordingly, LST peaks in the hot summer. The experimental area is a subset of Nanjing City ( Figure 2) based on the land use and land cover map in 2010. The area, which is characterized by heterogeneous urban landscape patterns, has four main land cover types, namely, water, vegetation, bare soil (Yangtze River Beach and idle lands), and impervious surfaces. The Landsat 8 Operational Land Imager (OLI) and TIRS images of Nanjing City were acquired on 11 August 2013 and then used in this study. The Landsat 8 datasets, which were provided by the United States Geological Survey, included OLI and TIRS images with 30 and 100 m spatial resolutions, respectively [53]. At the retrieval moment, the air temperature, humidity, pressure, visibility, wind direction, and wind speed were 38.0 °C , 44%, 1008 hpa, 12 km north, and 2.0 m/s, respectively. Except for the image in the summer, the other images under clear sky were also acquired on 28 March 2016, 14 October 2013, and 20 December 2014 to reveal the availability of our approach in the other seasons (spring, autumn, and winter).
The four main land cover types were identified using the high-resolution images ( Figure 3); these were then classified by maximum likelihood classification [54]. The accuracy of the classification method was evaluated by comparison with the field survey data. With a kappa coefficient of 0.915, the maximum likelihood classification thus achieved high classification accuracy. The most dominant land cover type is impervious surface area, followed by vegetation and water. No obvious law governs the spatial distribution of the four land cover types. The Landsat 8 Operational Land Imager (OLI) and TIRS images of Nanjing City were acquired on 11 August 2013 and then used in this study. The Landsat 8 datasets, which were provided by the United States Geological Survey, included OLI and TIRS images with 30 and 100 m spatial resolutions, respectively [53]. At the retrieval moment, the air temperature, humidity, pressure, visibility, wind direction, and wind speed were 38.0 • C, 44%, 1008 hpa, 12 km north, and 2.0 m/s, respectively. Except for the image in the summer, the other images under clear sky were also acquired on 28 March 2016, 14 October 2013, and 20 December 2014 to reveal the availability of our approach in the other seasons (spring, autumn, and winter).
The four main land cover types were identified using the high-resolution images ( Figure 3); these were then classified by maximum likelihood classification [54]. The accuracy of the classification method was evaluated by comparison with the field survey data. With a kappa coefficient of 0.915, the maximum likelihood classification thus achieved high classification accuracy. The most dominant land cover type is impervious surface area, followed by vegetation and water. No obvious law governs the spatial distribution of the four land cover types.

Data Processing
The data processing in this study was divided into three parts, namely, data preprocessing, downscaling processing, and simulation validation. Data preprocessing aims to unify the image resolution. Downscaling processing executes the MSFAT algorithm, and simulation validation evaluates MSFAT.
In the data preprocessing, the OLI images were initially adjusted with the Fast Line-of-sight Atmospheric Analysis of Hypercubes (FLAASH) atmospheric correction algorithm. Owing to the unavailability of a validation reference for LST simulation, the OLI and TIRS images were upscaled to ensure that the LST simulation using Equation (4) could be validated by the LST retrieved from the TIRS images with the original resolution ( Figure 4). For convenience, the TIRS images with 100 m resolution were resampled into 90 m images by the nearest neighbor method, whereas the OLI images with 30 m resolution were resampled into 90 m images by aggregation [55]. The 90 m OLI and TIRS images were also resampled by aggregation into 360 m images. The 90 m OLI and TIRS images were high-resolution images, whereas the 360 m OLI and TIRS images were coarse-resolution images. The coarse-resolution images were used to construct the relationship model in Equation (3), whereas the 90 m OLI images were used to simulate the 90 m LST in Equation (4). The 90 m retrieved LST was used to validate the 90 m simulated LST. Subsequently, the four scale factors with 90 m (360 m) resolution were then estimated from the 90 m (360 m) OLI images, whereas the LST with 90 m (360 m) resolution was retrieved from band 10 of Landsat 8 TIRS by using the generalized single-channel method in consideration of the effect of stray light [56]. Worth Noting, the final resolution of the downscaled result is 90 m.
In the downscaling process, the moving window was first adopted to explore the relationships between LST and the scale factors, and then estimated using the semivariance curve. Next, the relevant scale factors were selected, and the 360 m retrieved LST was downscaled to the 90 m simulated LST following the approach in Section 2.1.
In the simulation validation, the simulated downscaled 90 m LST image was evaluated and compared with the retrieved 90 m LST. The downscaling accuracy and the spatial distribution of the simulation error were subsequently determined.

Data Processing
The data processing in this study was divided into three parts, namely, data preprocessing, downscaling processing, and simulation validation. Data preprocessing aims to unify the image resolution. Downscaling processing executes the MSFAT algorithm, and simulation validation evaluates MSFAT.
In the data preprocessing, the OLI images were initially adjusted with the Fast Line-of-sight Atmospheric Analysis of Hypercubes (FLAASH) atmospheric correction algorithm. Owing to the unavailability of a validation reference for LST simulation, the OLI and TIRS images were upscaled to ensure that the LST simulation using Equation (4) could be validated by the LST retrieved from the TIRS images with the original resolution ( Figure 4). For convenience, the TIRS images with 100 m resolution were resampled into 90 m images by the nearest neighbor method, whereas the OLI images with 30 m resolution were resampled into 90 m images by aggregation [55]. The 90 m OLI and TIRS images were also resampled by aggregation into 360 m images. The 90 m OLI and TIRS images were high-resolution images, whereas the 360 m OLI and TIRS images were coarse-resolution images. The coarse-resolution images were used to construct the relationship model in Equation (3), whereas the 90 m OLI images were used to simulate the 90 m LST in Equation (4). The 90 m retrieved LST was used to validate the 90 m simulated LST. Subsequently, the four scale factors with 90 m (360 m) resolution were then estimated from the 90 m (360 m) OLI images, whereas the LST with 90 m (360 m) resolution was retrieved from band 10 of Landsat 8 TIRS by using the generalized single-channel method in consideration of the effect of stray light [56]. Worth Noting, the final resolution of the downscaled result is 90 m.
In the downscaling process, the moving window was first adopted to explore the relationships between LST and the scale factors, and then estimated using the semivariance curve. Next, the relevant scale factors were selected, and the 360 m retrieved LST was downscaled to the 90 m simulated LST following the approach in Section 2.1.
In the simulation validation, the simulated downscaled 90 m LST image was evaluated and compared with the retrieved 90 m LST. The downscaling accuracy and the spatial distribution of the simulation error were subsequently determined.

Spatial Distribution of LST and Scale Factors
The four scale factors, namely, SAVI, NMDI, MNDWI, and NDBI, were extracted from the OLI image ( Figure 5). A comparison of Figures 2 and 4 shows that the spatial distributions of the four scale factors and four types of land cover (vegetation, soil, water, and impervious surface) were consistent. Thus, the four scale factors can accurately characterize the four types of land cover.
The Yangzi River, flowing through the northwestern part of the study area, exhibited an MNDWI higher than 0.8, thus indicating a water area. A similar MNDWI was located in the southwest area, which corresponded to the Xuanwu Lake. In the southern part of the region, an area with SAVI of more than 1.0 was located in the Zijing Mountain with dense trees. Several building zones with NDBI of more than 0 were sporadically distributed in the northern part. Furthermore, mixed-land covers occupied the other pixels of the study area.
The distribution of LST (90 m retrieved values) is presented in Figure 6a. The average temperature in the study area was 37.2 °C . The lowest temperature (approximately 30 °C ) was detected in the Yangzi River and Xuanwu Lake, which had high MNDWIs. Relatively low temperatures (32 °C -34 °C ) were also recorded in the Zijing Mountain, which had a high SAVI, whereas the highest temperature (higher than 40 °C ) was sporadically located in the northern industrial zones with high NDBI. The LST distribution was evidently related to the scale factors. Figure 6b shows the 360 m retrieved LST, whose distribution is similar to that of the 90 m retrieved LST. However, detailed LST information cannot be provided in the 360 m resolution, particularly for urban areas. Thus, the downscaling approach should be applied because of the absence of high-resolution LST.

Spatial Distribution of LST and Scale Factors
The four scale factors, namely, SAVI, NMDI, MNDWI, and NDBI, were extracted from the OLI image ( Figure 5). A comparison of Figures 2 and 4 shows that the spatial distributions of the four scale factors and four types of land cover (vegetation, soil, water, and impervious surface) were consistent. Thus, the four scale factors can accurately characterize the four types of land cover.
The Yangzi River, flowing through the northwestern part of the study area, exhibited an MNDWI higher than 0.8, thus indicating a water area. A similar MNDWI was located in the southwest area, which corresponded to the Xuanwu Lake. In the southern part of the region, an area with SAVI of more than 1.0 was located in the Zijing Mountain with dense trees. Several building zones with NDBI of more than 0 were sporadically distributed in the northern part. Furthermore, mixed-land covers occupied the other pixels of the study area.
The distribution of LST (90 m retrieved values) is presented in Figure 6a. The average temperature in the study area was 37.2 • C. The lowest temperature (approximately 30 • C) was detected in the Yangzi River and Xuanwu Lake, which had high MNDWIs. Relatively low temperatures (32 • C-34 • C) were also recorded in the Zijing Mountain, which had a high SAVI, whereas the highest temperature (higher than 40 • C) was sporadically located in the northern industrial zones with high NDBI. The LST distribution was evidently related to the scale factors.
The temperature distribution matched those of the scale factors. The lowest temperature corresponded to the high MNDWIs in some western water regions (Yangzi River and Xuanwu Lake), whereas the highest temperature corresponded to the high NDBI in the northern building region. A low temperature was related to the high SAVI in the southern forest region (Zijing Mountain).   The temperature distribution matched those of the scale factors. The lowest temperature corresponded to the high MNDWIs in some western water regions (Yangzi River and Xuanwu Lake), whereas the highest temperature corresponded to the high NDBI in the northern building region. A low temperature was related to the high SAVI in the southern forest region (Zijing Mountain).    Figure 6b shows the 360 m retrieved LST, whose distribution is similar to that of the 90 m retrieved LST. However, detailed LST information cannot be provided in the 360 m resolution, particularly for urban areas. Thus, the downscaling approach should be applied because of the absence of high-resolution LST.
The temperature distribution matched those of the scale factors. The lowest temperature corresponded to the high MNDWIs in some western water regions (Yangzi River and Xuanwu Lake), whereas the highest temperature corresponded to the high NDBI in the northern building region. A low temperature was related to the high SAVI in the southern forest region (Zijing Mountain).

Analysis of the MWS
The R value of each horizontal and vertical scaling factor was estimated using the exponential model of semivariance. As shown in Figure 7, the R values of the four scale factors were similar in the vertical and horizontal directions. When the step length was approximately 5 horizontally or vertically, the fitting curve of the semivariance function became stabilized, as indicated by the red line. Therefore, the average of the eight R values of the four land-cover types in the vertical and horizontal directions was 5. The R values of every scale factor in the horizontal and vertical directions were closely related to the spatial distribution characteristics. Therefore, the average values of the eight R values in the vertical and horizontal directions were used to determine the MWS, which was 5 × 5 pixels in this study.

Analysis of the MWS
The R value of each horizontal and vertical scaling factor was estimated using the exponential model of semivariance. As shown in Figure 7, the R values of the four scale factors were similar in the vertical and horizontal directions. When the step length was approximately 5 horizontally or vertically, the fitting curve of the semivariance function became stabilized, as indicated by the red line. Therefore, the average of the eight R values of the four land-cover types in the vertical and horizontal directions was 5. The R values of every scale factor in the horizontal and vertical directions were closely related to the spatial distribution characteristics. Therefore, the average values of the eight R values in the vertical and horizontal directions were used to determine the MWS, which was 5 × 5 pixels in this study.

CC Threshold Value
The CC threshold values were estimated to select the major scale factors that fit the regression model within a moving window. The CCs were sorted into 1545 levels for every scale factor from the smallest to the largest in the study area ( Figure 8). The number of levels depended on the number of moving windows in the entire 360 m image. Only the scale factors with CCs higher than the corresponding CC thresholds were considered in fitting the downscaled regression model. Clearly, the higher the CC threshold is, the fewer the pixels that can be fitted. Consequently, the pixels with low CCs for every scale factor cannot be simulated. The 90 m LST was estimated using the major scale factors, which were selected on the basis of the CC thresholds. The downscaling result is shown in Figure 9.

CC Threshold Value
The CC threshold values were estimated to select the major scale factors that fit the regression model within a moving window. The CCs were sorted into 1545 levels for every scale factor from the smallest to the largest in the study area ( Figure 8). The number of levels depended on the number of moving windows in the entire 360 m image. Only the scale factors with CCs higher than the corresponding CC thresholds were considered in fitting the downscaled regression model. Clearly, the higher the CC threshold is, the fewer the pixels that can be fitted. Consequently, the pixels with low CCs for every scale factor cannot be simulated. The 90 m LST was estimated using the major scale factors, which were selected on the basis of the CC thresholds. The downscaling result is shown in Figure 9.

CC Threshold Value
The CC threshold values were estimated to select the major scale factors that fit the regression model within a moving window. The CCs were sorted into 1545 levels for every scale factor from the smallest to the largest in the study area (Figure 8). The number of levels depended on the number of moving windows in the entire 360 m image. Only the scale factors with CCs higher than the corresponding CC thresholds were considered in fitting the downscaled regression model. Clearly, the higher the CC threshold is, the fewer the pixels that can be fitted. Consequently, the pixels with low CCs for every scale factor cannot be simulated. The 90 m LST was estimated using the major scale factors, which were selected on the basis of the CC thresholds. The downscaling result is shown in Figure 9.   The values of R 2 and RMSE (90 m retrieved LST versus the scale factors) as well as the number of fitting pixels were used to determine the CC thresholds. Missing information (black pixels) existed because of the excessively high CCs of the scale factors in these pixels. When the CC threshold increased, R 2 also increased with decreasing RMSE and decreasing number of available pixels. A low CC threshold resulted in the low accuracy of the simulated LST and a large number of available pixels, whereas a high CC threshold resulted in the high accuracy of the simulated LST and a small number of available pixels. The evaluation measures, R 2 , RMSE, and the number of available pixels, varied stably at level 440, where the second derivative of the curves for the measures was approximate to 0, without a dramatic variation. At this level, the average R 2 was relatively large (i.e., 0.87), the RMSEs had a low average value (i.e., 1.15 • C), and more than 95% of the pixels were involved in the fitting. This meant that the accuracy was relatively unsatisfactory below this level, whereas the number of available pixels sharply decreased at this level. The fitting information of LST was lost in most of the areas with high CC thresholds (Figure 8c,d). For example, we selected four typical CC threshold levels (i.e., 0, 440, 810, and 1545) to present the discrepancies in the evaluation measures for different CC levels, because they were the beginning, stable point, dramatic variation, and final level, respectively. As shown in Table 1, when the CC threshold levels increased from 0 to 440, 810, and 1545, R 2 increased from 0.84 to 0.87, 0.88, and 0.92, respectively, whereas the RMSE (number of available pixels) decreased from 1.  Furthermore, the CCs at level 440 implied that the downscaled result of the MSFAT algorithm was the best. Furthermore, the optimal thresholds of SAVI, NMDI, MNDWI, and NDBI were 0.623, 0.773, 0.311, and 0.775, respectively. At these thresholds, most of the pixels can be involved in the downscaling process. In such case, the missing pixels mainly located in the Zijing Mountain, and the low CCs are probably related to the influence of uneven topography. The LST in the few noninvolved pixels had to be downscaled using most of the relevant scale factors because of the weak correlations of LST with the scale factors. Therefore, within every moving window, the related scale factors were selected, and the multiple linear regression model (Equation (3)) was established according to the respective thresholds of SAVI (i.e., 0.623), NMDI (i.e., 0.773), MNDWI (i.e., 0.311), and NDBI (i.e., 0.775).

Downscaling Performance
After the fitting of the most relevant scale factors in the pixels, the final simulated result was obtained (Figure 10a). Compared with Figure 9b, Figure 10a already provides the missing information to obtain the LST in the mountain area. The average simulated temperature in the study area was 37.0 • C. A comparison of Figure 10a with Figure 5b shows that our downscaling method obviously improved the spatial resolution of the original LST image, especially in the northern region, in which high LSTs are indicated in red and yellow, corresponding to the industrial zones and the mixed areas, respectively. Our simulated LST image could also identify the bridge above the Yangzi River and the four islands in the Xuanwu Lake. The distribution of LSTs in Figure 10a is similar to those in the simulated 90 m image (Figure 9b) and retrieved 90 m image (Figure 6a), with the lowest, relatively low, and highest temperatures detected in the water, vegetation, and building areas, respectively. Therefore, our 90 m downscaled LST showed spatial reliability and provided more detailed information than the 90 m fitting LST in Figure 9b and the 360 m retrieved LST in Figure 6b.

Validation of the Downscaling Results
Compared with the 90 m retrieved LST, the 90 m simulated LST had a relatively satisfactory accuracy for the entire image, with pixel-average R 2 and RMSE of 0.87 and 1.13 °C , respectively ( Figure 11a). The pixels with LST errors of −1.0 °C -1.0 °C , −2.0 °C -−1.0 °C , 1.0 °C -2.0 °C , lower than −2.0 °C , and higher than 2.0 °C accounted for 73%, 6%, 14%, 2%, and 5% of all the pixels, respectively ( Table 2). In most of the pixels, the discrepancies between the retrieved and simulated LSTs were less than 1 °C and within the scope of the retrieved accuracy [46]. Thus, reliable downscaling results appeared in most parts of the area. Table 2. Error probability of downscaled LST for our approach in all seasons.

Validation of the Downscaling Results
Compared with the 90 m retrieved LST, the 90 m simulated LST had a relatively satisfactory accuracy for the entire image, with pixel-average R 2 and RMSE of 0.87 and 1.13 • C, respectively (Figure 11a). The pixels with LST errors of −1.0 • C-1.0 • C, −2.0 • C-−1.0 • C, 1.0 • C-2.0 • C, lower than −2.0 • C, and higher than 2.0 • C accounted for 73%, 6%, 14%, 2%, and 5% of all the pixels, respectively ( Table 2). In most of the pixels, the discrepancies between the retrieved and simulated LSTs were less than 1 • C and within the scope of the retrieved accuracy [46]. Thus, reliable downscaling results appeared in most parts of the area. Table 2. Error probability of downscaled LST for our approach in all seasons. As shown in Figure 12, a systemic underestimation occurred in the region with a low temperature of approximately 30 • C, specifically in the locality near to the shores of the Yangzi River. This phenomenon may have been induced by the improper recognition of this region, such that the land-water mixed shores might have been directly mistaken for water. In addition, there were a small number of pixels with LST overestimation in the industrial zone in the northern part of the city, where dense tall factory buildings were located. This outcome might have resulted from the inaccurate estimation of land surface emissivity in the area; accordingly, the reliability of the retrieved LST was reduced [46]. In order to reveal the overall accuracy of the downscaling result, we also analyzed its accuracy depending on the different types of surfaces. The RMSE of results in the regions of water, vegetation, impervious surface, and bail soil were 1.18 • C, 0.83 • C, 1.08 • C and 1.10 • C, respectively. Our results thus demonstrate the higher accuracy in the region of vegetation than in water areas. As shown in Figure 12, a systemic underestimation occurred in the region with a low temperature of approximately 30 °C , specifically in the locality near to the shores of the Yangzi River. This phenomenon may have been induced by the improper recognition of this region, such that the land-water mixed shores might have been directly mistaken for water. In addition, there were a small number of pixels with LST overestimation in the industrial zone in the northern part of the city, where dense tall factory buildings were located. This outcome might have resulted from the inaccurate estimation of land surface emissivity in the area; accordingly, the reliability of the retrieved LST was reduced [46]. In order to reveal the overall accuracy of the downscaling result, we also analyzed its accuracy depending on the different types of surfaces. The RMSE of results in the regions of water, vegetation, impervious surface, and bail soil were 1.18 °C , 0.83 °C , 1.08 °C and 1.10 °C , respectively. Our results thus demonstrate the higher accuracy in the region of vegetation than in water areas.

LST Error (
In addition, at the Ruijin site, the ground observation of LST was 35.02 °C . The retrieved LST and downscaling result in the pixel located at the site were 34.05 °C and 34.49 °C , respectively. Compared with ground observation, the error of downscaling result was 0.53 °C , which was within In addition, at the Ruijin site, the ground observation of LST was 35.02 • C. The retrieved LST and downscaling result in the pixel located at the site were 34.05 • C and 34.49 • C, respectively. Compared with ground observation, the error of downscaling result was 0.53 • C, which was within the accuracy of LST retrieval. Therefore, the direct validation also reveals the availability of our approach. In summary, the 90 m downscaled LST proved to be reliable (with a bias of less than 1 • C) in approximately three-quarters of the area, except for the shores of the Yangzi River and the industrial zone in the northeast region. In the entire area, the pixel-average R 2 and RMSE reached 0.87 and 1.13 • C, respectively.

Comparison of Approaches
As shown in Figure 13, all downscaling methods obviously improved the spatial resolution of the original LST image (Figure 6b). Some detailed information within the same land cover was found in the downscaled images (Figure 7a,c,e); in comparison, the same cannot be found in the original image (Figure 5b). The downscaled LST images can maintain the thermal characteristics and spatial distribution characteristics of the original LST image. Relative to the 90 m retrieved LST, regardless of the water area, the downscaling result of DisTrad and TsHARP approach had a R 2 (RMSE) of 0.86 (1.01 °C ) and 0.82 (1.14 °C ), when the value was 0.85 (1.04 °C ) for our approach. Hence, the accuracy of our approach proved to be better than that of TsHARP approach, and it is also similar to that of DisTrad approach.
In detail, most errors of three methods ranged from −1 °C to 1 °C. Most of the errors were less than 1 °C for MSFAT algorithm, whereas there were more errors greater than 1 °C for the TsHARP method (Table 3). DisTrad method had the least errors of more than 3 °C but less errors ranging from −1 °C to 1 °C compared with MSFAT. The accuracy of our approach ranged between those of the DisTrad and TsHARP approaches in the regions of vegetation and impervious surface. The accuracies of these three approaches are similar in the region of bail soil. Worth noting, our approach can downscale the LST in water area, whereas the two other approaches have no ability of downscaling in the water area.

Comparison of Approaches
As shown in Figure 13, all downscaling methods obviously improved the spatial resolution of the original LST image (Figure 6b). Some detailed information within the same land cover was found in the downscaled images (Figure 7a,c,e); in comparison, the same cannot be found in the original image ( Figure 5b). The downscaled LST images can maintain the thermal characteristics and spatial distribution characteristics of the original LST image. Relative to the 90 m retrieved LST, regardless of the water area, the downscaling result of DisTrad and TsHARP approach had a R 2 (RMSE) of 0.86 (1.01 • C) and 0.82 (1.14 • C), when the value was 0.85 (1.04 • C) for our approach. Hence, the accuracy of our approach proved to be better than that of TsHARP approach, and it is also similar to that of DisTrad approach.
In detail, most errors of three methods ranged from −1 • C to 1 • C. Most of the errors were less than 1 • C for MSFAT algorithm, whereas there were more errors greater than 1 • C for the TsHARP method (Table 3). DisTrad method had the least errors of more than 3 • C but less errors ranging from −1 • C to 1 • C compared with MSFAT. The accuracy of our approach ranged between those of the DisTrad and TsHARP approaches in the regions of vegetation and impervious surface. The accuracies of these three approaches are similar in the region of bail soil. Worth noting, our approach can downscale the LST in water area, whereas the two other approaches have no ability of downscaling in the water area.

Availability of MSFAT in Different Seasons
Except for the situation in the summer, the downscaling results of MSFAT algorithm in the other three seasons are shown in Figure 10. Compared with the 90 m retrieved LST, the 90 m simulated LST had a relatively satisfactory accuracy for the entire image, with pixel-average R 2 (RMSE) of 0.86 (1.31 °C ), 0.83 (1.28 °C ), and 0.63 (0.91 °C ) in the spring, autumn, and winter, respectively ( Figure 11). Meanwhile, the R 2 and RMSE were 0.87 and 1.13 °C , respectively, in the summer. Obviously, MSFAT algorithm had the best downscaling capability in summer than in the

Availability of MSFAT in Different Seasons
Except for the situation in the summer, the downscaling results of MSFAT algorithm in the other three seasons are shown in Figure 10. Compared with the 90 m retrieved LST, the 90 m simulated LST had a relatively satisfactory accuracy for the entire image, with pixel-average R 2 (RMSE) of 0.86 (1.31 • C), 0.83 (1.28 • C), and 0.63 (0.91 • C) in the spring, autumn, and winter, respectively ( Figure 11). Meanwhile, the R 2 and RMSE were 0.87 and 1.13 • C, respectively, in the summer. Obviously, MSFAT algorithm had the best downscaling capability in summer than in the three other seasons. In comparison, winter seemed to be incompatible with our proposed approach. This phenomenon is probably related to LST. Higher LST is always accompanied by better downscaling capability. When LST is too low, the ice and snow on the surface generally affect the scale factors and reduce the accuracy of our approach to some degree.
In detail, the accuracy of our approach was best in the region of vegetation in all seasons with RMSE of 0.73 • C-0.94 • C (Figure 12). In the region of water, the R 2 (the result of our approach vs. the retrieved LST) in the winter (0.78) was obviously higher than those in other seasons (0.22-0.32), which may be related to the greater number of pure pixels in the regions of dried-up water shores. The opposite situation for RMSE appeared in the regions of impervious surface and bail soil.
Generally, the MSFAT algorithm can be applied in all seasons, especially in the summer. However, more careful application in the winter is required. Meanwhile, the best (worst) availability occurred in the region of vegetation (water) in all seasons.

Discussion
LST information derived from TIR images is essential in ecology, meteorology, and hydrology research [57][58][59]. Many applications in urban ecology require high-resolution TIR remote sensing data, and downscaling facilitates the acquisition of such data [60,61]. The primary objective of this study is to develop an adaptive selection approach for the relevant scale factors to downscale the LST maps in heterogeneous regions. In view of this objective, the usefulness of the MSFAT should be evaluated. The results of the Landsat 8 downscaling experiments indicated the effectiveness of this method. In this study, fine-scale predicted LSTs obtained by the MSFAT algorithm are compared with the retrieved LST from the original TIRS images.
The evaluation results showed relatively satisfactory RMSE and R 2 statistics. Unlike in other linear regression approaches to LST downscaling, only the scale factors with CCs higher than the thresholds are involved, and the appropriate scale factors are selected adaptively in our regression. In other approaches, all the scale factors are considered without setting thresholds. As a result, all scale factors within each moving window are all included in the regression model [62,63]. A poor downscaling effect is normally attained in the pixel of a single/few land cover types if all scale factors are involved in the regression. However, in other approaches, only a single scale factor is considered; as a result, the mixed areas with numerous types of land cover cannot attain a satisfactory downscaling effect [16][17][18]. Therefore, the MSFAT algorithm has the advantages of multiple scale factors, adaptive selection of factors, and satisfactory accuracy.
However, our algorithm also has some limitations in LST downscaling in urban areas, especially those containing dense tall building blocks and river shores. This phenomenon in the regions of dense building blocks is partly related to the insensitivity of NDBI in these areas [45]. In reality, the unsatisfactory performance could be ascribed to the underestimation (overestimation) of land surface emissivity (temperature) estimation. Meanwhile, the limitation in the region of river shores (near the northwest water body) can be mainly ascribed to various mixed pixels existing in the inundated region. The dense tall building blocks and river shores occupy only few areas in ordinary urban regions. Thus, despite certain limitations, MSFAT remains an effective approach to rapidly and accurately downscaling LST in urban mixed areas because it uses adaptive thresholds and multi-scale factors. In addition, MSFAT is reliable in regions with uneven topography. However, in the regions with wavy terrain, other scale factors that can represent terrain characteristics may have to be integrated into the MSFAT algorithm. The incapability of fitting in the Zijing Mountain implies the limitation of MSFAT in mountainous areas. Furthermore, the edge effects in the downscaled LST image are apparent in the overlapping parts in the moving windows, which is due to different regression models existing in various moving windows. The edge effects may be reduced if the step size of the moving window is increased. Therefore, reliable LST downscaling in building and shore areas as well as in uneven regions will be our future goal.
Meteorological conditions differ significantly in different seasons, accompanying with the drastic variation of the state of the underlying surface. The vegetation grows luxuriantly in the summer, while the water freezes in the winter. Furthermore, the soil water content fluctuates in the rainy or dry season. Those variations possibly have an influence on the discrepancy of the availability of MSFAT in different seasons. Thus, the relationships of underling surface variations (meteorological conditions) and the availability of MSFAT will be researched in the future.
LST downscaling in middle and high spatial resolutions has been realized using our algorithm. However, limited by the low temporal resolution of the downscaled LST images and the influence of the clouds, the images are unsuitable for dynamic surface heat island analysis [47,63]. Thus, LST downscaling in middle and low spatial resolutions as well as high temporal resolution is necessary in estimating daily LST variation continuously [19,64]. The combination of LST downscaling in various resolutions contributes to the accurate monitoring of regional thermal environments [65].

Conclusions
This paper presents a strategy for downscaling LST in an area with various land cover types by using four scale factors, which are adaptively selected according to the CCs between LST and the scale factors within every moving window. The comparison results based on two statistical measures and visual analyses show that MSFAT achieves a satisfactory downscaling performance, regardless of whether it is used for vegetation area, impervious surface area, water body, and mixed area. The R 2 and RMSE values between the 90 m downscaled result and the 90 m retrieved image are 0.87 and 1.13 • C, respectively. Except for the overestimation in the industrial zone in the northern region and the underestimation in the Yangzi River, the differences between the retrieved LST and simulated LST are less than 1 • C in approximately three-quarters of the study area. Spatially, compared with the 360 m retrieved LST, the 90 m downscaled result can present detailed LST information; furthermore, a similar distribution appears in both the 90 m downscaled and 90 m retrieved LSTs.
Compared with other algorithms that have been proven to provide high downscaling accuracy in our studies, MSFAT has the advantages of similar accuracy, availability of downscaling in water area, multiple scale factors, adaptive selection of factors, and a relatively credible downscaling performance. MSFAT also shows availability in all seasons, especially in the summer. Furthermore, the best availability occurred in the region of vegetation, while the worst one appeared in the region of water. Thus, MSFAT has considerable potential in generating useful LST information from thermal images of mixed areas with an improved spatial resolution. Furthermore, MSFAT can select the scale factors adaptively according to land cover types by using CCs. Thus, MSFAT can be applied to more LST data, such as MODIS/LST and ASTER/LST, in other seasons, and in other regions with flat terrain. In our future research, we intend to develop a method for performing reliable LST downscaling in building and shore areas as well as in uneven regions. Through such a method, we aim to reduce the edge effects and apply MSFAT in middle or low spatial resolutions.