This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (

Monitoring of (surface) urban heat islands (UHI) is possible through satellite remote sensing of the land surface temperature (LST). Previous UHI studies are based on medium and high spatial resolution images, which are available about four times per day in the best case. This is not adequate for monitoring diurnal UHI development. High temporal resolution LST data (a few measurements per hour) over a whole city can be acquired by instruments onboard geostationary satellites. In northern Germany, geostationary LST data are available in pixels of 3,300 by 6,700 m. For UHI monitoring, this resolution is too coarse; it should instead be comparable to the width of a building block, usually not more than 100 m. Thus, an LST downscaling is proposed that enhances the spatial resolution by a factor of about 2,000, which is much higher than in any previous study. The case study presented here (Hamburg, Germany) yields promising results. The downscaled LST, available every 15 min at 100 m spatial resolution, showed a high explained variance (R^{2}: 0.71) and a relatively low root mean square error (RMSE: 2.2 K). For lower resolutions the downscaling scheme performs even better (R^{2}: 0.80, RMSE: 1.8 K for 500 m; R^{2}: 0.82, RMSE: 1.6 K for 1,000 m).

Increasing urbanization has caused changes in the heat balance in densely built urban areas [

In terms of UHI monitoring, a spatial resolution of 1 km allows coarse scale temperature mapping and limits the analysis of relationships between the UHI and

Instruments in geostationary orbit, such as the Spinning Enhanced Visible and Infrared Imager (SEVIRI) aboard the Meteosat Second Generation (MSG) satellites, on the other hand have an appropriate temporal resolution but too coarse a spatial resolution (the standard retrieval interval of SEVIRI is 15 min; one SEVIRI pixel over the case study area of Hamburg measures approximately 3,300 by 6,700 m). Many studies are thus based on instruments in polar orbits that have a wide swath (over 2,000 km) and a revisit time of approximately one day (one daytime and one night-time image). Such instruments have a nominal spatial resolution of about 1,000 m and are therefore considered a reasonable compromise between the two previously described options.

This paper examines the possibility of using an alternative approach based on the fusion of data with different temporal and spatial resolutions. Such an approach should provide results at high temporal (15 min) and high spatial (100 m) resolution. This should be possible because the spatial distribution of LST is well correlated with numerous parameters, many of which can be observed by remote sensing at high spatial resolution. If the correlation between these parameters in high spatial resolution (HR) and LST in low spatial resolution (LR) is known, LST in HR can be estimated. In previous studies the downscaling factor was not larger than 100. This study aims to examine whether LST downscaling can be used with a higher downscaling factor in order to fulfill the needs of urban climatology (Section 2.1). Furthermore, it investigates which predictors are most suitable for implementing such a downscaling scheme and how the errors depend on the target resolution (Section 3). Finally, possible improvements are discussed (Section 4).

Downscaling, image fusion, spatial sharpening and disaggregation are known methods to “improve” the spatial resolution of the input data using predictors of higher spatial resolution. In the first attempts of image fusion, the goal was the production of sharper multispectral images based on information obtained from the panchromatic band (e.g., [

Therefore, LST downscaling is possible using physical or statistical based methods. Among physical based LST downscaling methods, the dual band method, often used in thermal anomaly monitoring [

More common are statistical based methods. Most of them rely on the linear correlation between predictors (see following paragraph) and LST, e.g., [

Statistically based downscaling can only be successful if appropriate HR predictors are available. Many biophysical parameters are well correlated to the LST spatial distribution. As most of them can be observed in finer resolution than LST, they are suitable as downscaling predictors. The most renowned is the negative correlation between LST and normalized difference vegetation index (NDVI) [

The southern part of Hamburg, Germany (53.38–53.63°N, 9.75–10.38°E) was chosen for the case study. It comprises the river Elbe (flowing from south-east to north-west), rural areas with a high percentage of agricultural land use, forested areas in the south-west and east, and most of the city of Hamburg, including the port areas (in the center), the central business district north of the river Elbe, and urban areas of different building densities (ranging from single family houses to very compact morphologies). Hamburg has a population of about 1.8 million inhabitants and is situated in the northern German lowland between the North Sea and the Baltic Sea. The domain of the study was limited by the availability of the morphological data (see below).

The entire set of 296 predictors was processed from different sensors and data sources. All predictors were first prepared on a 100 m UTM grid. For further analysis the predictors were resampled to lower resolutions between 200 m and 1,000 m in 100 m steps using an area weighted average. The spatial resolution of 100 m is appropriate because it is approximately equal to the width of a building block in a city. Coarser resolutions are more suitable for monitoring broader areas like city districts.
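The resampling step described above can be illustrated with a short sketch. It assumes that the coarse grid is aligned with the 100 m grid and that the target resolution is an integer multiple of 100 m, in which case an area weighted average reduces to a plain block mean. The function name `block_average` and the sample values are hypothetical; the tool actually used for the resampling is not stated here.

```python
import numpy as np

def block_average(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2-D grid.

    For aligned grids and integer factors every fine cell lies entirely
    inside one coarse cell, so the area weighted mean is the block mean.
    """
    rows, cols = grid.shape
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    blocks = grid.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

# Illustrative 4 x 4 predictor on the 100 m grid, aggregated to 200 m.
hr = np.arange(16, dtype=float).reshape(4, 4)
lr = block_average(hr, 2)
print(lr)
```

Calling `block_average(hr, k)` for `k` from 2 to 10 would yield the 200 m to 1,000 m series, provided the grid dimensions are multiples of each `k`.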

Topographic predictors derived from NEXTMap® Interferometric Synthetic Aperture Radar (IFSAR) data include simple statistics of the height distribution [

Multispectral predictors were derived from multitemporal data from the Thematic Mapper (TM) on board Landsat 4 and 5 and the Enhanced Thematic Mapper Plus (ETM+) on board Landsat 7. Since the predictors are only used in empirical models, calibration, atmospheric correction and conversion to reflectance were not considered necessary [

Further, some land cover data were included, more specifically the proportion of water per pixel (

For calibration of the downscaling scheme the LSA SAF operational LST product was used. Their LST is produced every 15 min for cloud-free pixels [

The independent HR LST used for validation came from ASTER on board the Terra satellite, which acquires data approximately 15 min after Landsat 7 in the same orbit. ASTER measures five thermal bands (between 8 and 11 μm) at 90 m horizontal resolution. Its LST product is calculated with the “temperature and emissivity separation algorithm” [

Downscaling starts with the aggregation of HR predictors to the same grid as LR LST. The input LST is given for a SEVIRI pixel that has an “irregular” shape according to the HR predictors in UTM projection. Instead of a simple geometrical resampling, the predictors were aggregated using the point spread function (PSF) of the sensor [
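A minimal sketch of a PSF weighted aggregation is given below. It assumes a separable Gaussian PSF centred on the LR pixel, which is a common simplification and not necessarily the PSF model used in this study; the function name, the widths `sigma_row` and `sigma_col` (in HR pixels), and the sample grid are all hypothetical.

```python
import numpy as np

def psf_weighted_mean(hr, center_row, center_col, sigma_row, sigma_col):
    """Aggregate an HR grid to one LR value using Gaussian PSF weights.

    ASSUMPTION: the sensor PSF is approximated by a separable Gaussian;
    the real SEVIRI PSF is not specified in this sketch.
    """
    rows, cols = np.indices(hr.shape)
    weights = np.exp(-0.5 * (((rows - center_row) / sigma_row) ** 2
                             + ((cols - center_col) / sigma_col) ** 2))
    # Normalise so that the weights sum to one.
    return float((weights * hr).sum() / weights.sum())

# Illustrative predictor that increases linearly from west to east.
hr = np.indices((5, 5))[1].astype(float)
value = psf_weighted_mean(hr, center_row=2, center_col=2,
                          sigma_row=2.0, sigma_col=1.0)
print(value)
```

Unlike a simple box average, HR cells near the pixel centre contribute more than cells at its edge, and cells slightly outside the nominal footprint still receive a small weight.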

To find optimal predictors for the downscaling scheme, the performances of different predictor sets were compared. First, predictors were grouped according to their origin in order to evaluate which data sources contain relevant information. Then, single predictors with high potential were tested and sets were combined by expert knowledge. Further sets were compiled by different feature selection algorithms (described in the following three paragraphs) in order to eliminate covariant, irrelevant, redundant and noisy variables and find the most meaningful predictors.

The Minimum Redundancy Maximal Relevance approach (MRMR) was first used in bioinformatics for genome classification [

Forward selection is a linear step-by-step algorithm that has been successfully used by many researchers in order to build robust prediction models [^{2}) is selected as the second input. This step is repeated for all predictors. Finally, among the obtained subsets, the subset with optimum R^{2} is selected as the model input subset. This means adding further variables to this subset does not significantly increase the R^{2} [
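The greedy procedure can be sketched as follows. This is a generic forward selection on ordinary least squares, not the exact implementation used in the study; the stopping threshold `min_gain` and the synthetic data are assumptions introduced for illustration.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    return 1.0 - (residual @ residual) / ((y - y.mean()) ** 2).sum()

def forward_selection(X, y, min_gain=0.01):
    """Add the predictor that raises R^2 most until the gain is negligible.

    ASSUMPTION: 'no significant increase' is approximated by a fixed
    R^2 gain threshold (min_gain) instead of a formal significance test.
    """
    selected, best_r2 = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        r2, j = max((r_squared(X[:, selected + [j]], y), j) for j in remaining)
        if r2 - best_r2 < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best_r2 = r2
    return selected, best_r2

# Synthetic example: only columns 1 and 3 carry information about y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 1] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)
selected, best_r2 = forward_selection(X, y)
print(selected, round(best_r2, 3))
```

Because the criterion is evaluated on the training data only, a subset that maximises R^{2} in the LR domain is not guaranteed to transfer to the HR domain.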

An alternative to MRMR and forward selection would be the principal component analysis (PCA) as applied in LST downscaling by Zakšek and Oštir [
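As an illustration of how principal components can condense a stack of correlated scenes into a few predictors, the sketch below computes PCA by singular value decomposition. The function name and the synthetic data are hypothetical, and the preprocessing applied in the cited work is not reproduced here.

```python
import numpy as np

def principal_components(scenes, n_components):
    """Return PC scores and explained variance ratios.

    scenes: array of shape (n_pixels, n_scenes), one column per scene.
    """
    centered = scenes - scenes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / (s ** 2).sum()
    return scores, explained[:n_components]

# Synthetic stack: four scenes sharing one spatial pattern plus noise.
rng = np.random.default_rng(0)
pattern = rng.normal(size=500)
scenes = np.outer(pattern, [1.0, 0.8, 1.2, 0.9]) + 0.01 * rng.normal(size=(500, 4))
scores, explained = principal_components(scenes, 1)
print(explained)
```

In this synthetic case the first component captures nearly all of the variance, so it could serve as a single aggregated predictor in place of the four correlated scenes.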

a predictor, in the example MAST

the variance in the LR domain is much lower for both predictors and dependent variable (which is plausible since the upscaling is essentially a low pass filtering operation), and

the linear relation is similar for both domains.

Hence, linear functions are calibrated by multiple linear regression in the LR domain and then applied to the HR predictors. However, it can also be seen that the linear equation explains only part of the variance and that certain areas show a divergent thermal behavior.
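The calibrate-in-LR, apply-in-HR principle can be sketched as follows. The example uses a single synthetic predictor with an exactly linear relation to LST, so the HR field is recovered perfectly; with real data only part of the variance is explained. All names and values are hypothetical, and a plain block mean stands in for the PSF based aggregation.

```python
import numpy as np

def fit_linear(predictors_lr, lst_lr):
    """Least squares fit of LST on predictors (with intercept), LR domain."""
    A = np.column_stack([np.ones(len(lst_lr)), predictors_lr])
    coef, *_ = np.linalg.lstsq(A, lst_lr, rcond=None)
    return coef

def apply_linear(coef, predictors_hr):
    """Apply the LR coefficients to the HR predictors."""
    return coef[0] + predictors_hr @ coef[1:]

def to_lr(grid, factor=10):
    """ASSUMPTION: block mean used as a stand-in for the real aggregation."""
    r, c = grid.shape
    return grid.reshape(r // factor, factor, c // factor, factor).mean(axis=(1, 3))

# Synthetic HR predictor and an exactly linear "true" HR LST (in K).
rng = np.random.default_rng(1)
predictor_hr = rng.uniform(0.0, 40.0, size=(60, 60))
lst_hr_true = 280.0 + 0.5 * predictor_hr

# Calibrate on the aggregated fields, then predict on the HR grid.
coef = fit_linear(to_lr(predictor_hr).reshape(-1, 1), to_lr(lst_hr_true).ravel())
lst_hr_est = apply_linear(coef, predictor_hr.reshape(-1, 1)).reshape(60, 60)
print(coef)
```

The sketch works because averaging is a linear operation: a relation that is linear in HR stays linear, with the same coefficients, after aggregation to LR.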

The results of the linear regression models with different predictor sets are presented in ^{2}) for the training set in the LR domain but produced some high errors (mean absolute error (MAE), root mean square error (RMSE)) as well as small R^{2} for the independent validation data in the HR domain. Conversely, regression with merely a single TIR predictor (selected for the highest correlation in LR) reached an RMSE of 2.63 K and an R^{2} of 0.53 for the HR validation data. The ACP predictors performed much better, with RMSE = 2.17 K and R^{2} = 0.71 for MAST only and RMSE = 2.21 K and R^{2} = 0.68 for MAST and YAST. Similarly, the principal components of all TIR scenes reached very good results in the downscaling scheme, with the first component alone slightly outperforming larger sets (RMSE = 2.17 K, R^{2} = 0.70 for the first PC; RMSE = 2.37 K, R^{2} = 0.63 for the first three PCs; and RMSE = 2.42 K, R^{2} = 0.61 for the first five PCs). The sets chosen by expert knowledge also performed quite well (RMSE = 2.35 K, R^{2} = 0.63 for “expert1”, consisting of YAST, ^{2} = 0.57 and RMSE = 2.31 K, R^{2} = 0.65), the addition of ^{2} = 0.54), the forward selection failed in selecting suitable downscaling predictor sets (RMSE = 4.18 K, R^{2} = 0.19 with 3 TIR bands, one green band and a morphological closing from the profile).

Hence, it can be stated that predictor sets should not be larger than a few predictors and that aggregated patterns of multitemporal thermal data are very powerful for downscaling LST.
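For reference, the validation measures used in this comparison can be computed as in the sketch below. Two conventions are assumptions made here rather than statements from the text: the bias is taken as the mean of estimate minus reference, and R^{2} is taken as the squared Pearson correlation between estimate and reference. The function name and sample values are hypothetical.

```python
import numpy as np

def validation_metrics(estimated, reference):
    """MAE, RMSE, R^2 and bias of an estimate against reference data."""
    diff = estimated - reference
    return {
        "MAE": float(np.abs(diff).mean()),
        "RMSE": float(np.sqrt((diff ** 2).mean())),
        # ASSUMPTION: R^2 as squared Pearson correlation.
        "R2": float(np.corrcoef(estimated, reference)[0, 1] ** 2),
        # ASSUMPTION: positive bias means the estimate is too warm.
        "BIAS": float(diff.mean()),
    }

# Illustrative case: an estimate that is uniformly 1 K too warm.
reference = np.array([290.0, 292.0, 295.0, 299.0])
estimated = reference + 1.0
metrics = validation_metrics(estimated, reference)
print(metrics)
```

With these conventions a constant offset leaves R^{2} at 1 while MAE, RMSE and bias all equal the offset, which shows that R^{2} alone does not reveal a systematic error.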

The equation presented below for the ACP downscaling is evidently only valid for the specific acquisition and the coefficients are in fact a function of time:

Since the downscaling factor between SEVIRI and ASTER is quite large (approximately a factor of 33 in East-West direction and 67 in North-South), both predictors and testing data were resampled to lower resolutions between 200 m and 1,000 m using an area weighted average in order to evaluate the performance of the scheme for different resolutions.

Given the large downscaling factor between SEVIRI and ASTER data, the results of the Hamburg case study are very promising. Three proposed predictor sets reached an RMSE of about 2.2 K and an explained variance of about 0.7. An RMSE of less than 2 K was reached by three predictor sets at a target resolution of 300 m; at 1,000 m resolution, RMSEs of about 1.65 K and R^{2} of about 0.8 were reached. These results are very good compared with the accuracy of the SEVIRI LST retrieval of 2 K [^{2} (0.77) for their Yokohama (Japan) case study. However, they downscaled from MODIS to ASTER resolution, which is a much smaller downscaling factor (about 100); comparable results were achieved in this study for a resolution of 300 to 400 m, which is still a higher downscaling factor. The downscaling from ASTER to 10 m resolution in San Juan (Puerto Rico) suggested by Dominguez ^{2} = 0.81) and higher error (RMSE = 2.8 K) than in this study. Stathopoulou and Cartalis [^{2} of about 0.94) but their error is also higher than ours (RMSE = 2.5 K). Keramitsoglou [

Although the presented results are encouraging compared with the listed studies, improvements are still possible. The random error of the downscaling scheme is considerably lower once the large systematic bias of about 1.3 K, which occurs for all models and resolutions, is taken into account. Thus, the overall error could be substantially decreased if an independent estimate of the bias were available. The reason for the bias is the geometry of data retrieval. The geostationary SEVIRI views Europe from the south and therefore sees a considerable share of the southern vertical sides of objects. These are warmer, while northern facades of buildings are not seen by SEVIRI at all. On board a low Earth orbiter, ASTER can be pointed off-nadir at low angles and thus often observes surfaces facing west or east, resulting in an azimuth dependent LST. However, the viewing geometry of ASTER is still “largely perpendicular” compared with SEVIRI (for instance, a pointing angle of 5.71° for the ASTER scene used here instead of almost 60° for SEVIRI). Hence, mostly horizontal objects (especially roofs and streets) are seen by ASTER. This bias can also be seen in the scatterplots in
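The share of the total error that is due to the bias can be estimated with the standard decomposition RMSE^{2} = bias^{2} + σ^{2}, where σ is the standard deviation of the errors. The sketch below applies it to the rounded values quoted in the text (RMSE of about 2.2 K, bias of about 1.3 K); the function name is hypothetical, and the result is only indicative because the inputs are rounded.

```python
import math

def random_error(rmse, bias):
    """Standard deviation of the errors once the mean error is removed.

    Uses RMSE^2 = bias^2 + sigma^2, which holds when RMSE and bias are
    computed over the same sample.
    """
    return math.sqrt(rmse ** 2 - bias ** 2)

sigma = random_error(2.2, 1.3)  # values in K, taken from the text
print(round(sigma, 2))  # 1.77
```

Under these assumptions a perfectly known bias would reduce the error from about 2.2 K to roughly 1.8 K.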

In general, the aggregated TIR parameters (ACP,

Heating patterns acquired during different seasons can be assumed to show comparable effects. Hence, the correlations between the ASTER LST and Landsat TIR data from different times of the year (also from different years) were investigated. In ^{2} while the thermal patterns from winter acquisitions are substantially different. Hence, predictor sets of more than one pattern are expected to be more stable in order to produce heating patterns for different times of the day.

NDVI predictors were expected to be suitable for the downscaling scheme. The correlation between single NDVI predictors and ASTER LST (see also ^{2} = 0.36). Secondly, NDVI should not be linearly upscaled but recomputed from the upscaled red and NIR bands. Thirdly and most importantly, the low correlations of the different NDVI result from the high fraction of water coverage in Hamburg. LST of the water bodies is largely dominated by their heat storage capacity as well as advection from upstream areas. For the given situation the water temperatures are lower than those of the vegetated areas. Conversely, water has a low NDVI like impervious surfaces (water bodies and adjacent sealed areas show LST deviations of more than 20 K). Hence, NDVI shows a much higher explained variance (e.g., R^{2} = 0.31 instead of R^{2} = 0.09 for scene #LT41960231989185XXX0) if the water pixels are excluded. Thus, we expect the fraction of active vegetation (besides those of impervious surfaces and water) to be a better LST predictor than NDVI. This is in agreement with Weng
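The point about upscaling NDVI can be illustrated numerically. Because NDVI is a ratio, the mean of the HR NDVI values generally differs from the NDVI computed from the averaged red and NIR bands. The band values below are made up for illustration (two vegetated and two sealed HR cells inside one coarse cell) and are not taken from the Landsat data of this study.

```python
import numpy as np

def ndvi(red, nir):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

# Hypothetical band values for four HR cells within one LR cell.
red = np.array([[0.05, 0.30], [0.05, 0.30]])
nir = np.array([[0.50, 0.35], [0.50, 0.35]])

ndvi_mean_of_hr = ndvi(red, nir).mean()         # linear upscaling of NDVI
ndvi_from_means = ndvi(red.mean(), nir.mean())  # recomputed from upscaled bands
print(ndvi_mean_of_hr, ndvi_from_means)
```

The two values differ whenever the band sum varies within the coarse cell, which is typical for mixed urban pixels.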

To answer the question of which predictors are suitable for the downscaling scheme, the explained variances of single predictors in LR and HR were compared. The results are displayed in ^{2} in the LR domain, which is again a consequence of the water bodies that cover only small proportions of single SEVIRI pixels. Hence, the basic assumption of the downscaling is not fulfilled for these predictors. Conversely, the ACP (red circles), the first TIR PC (magenta asterisk at lower left), and ^{2} in both HR and LR, which are almost identical (indicated by the black dotted line). Hence, they fulfill the basic assumption and are suitable predictors, which is in good agreement with their results in

This study demonstrates that it is possible to downscale land surface temperature (LST) by a factor of about 2,000. Despite this spatial enhancement, the resulting accuracy (root mean square error = 2.2 K) remained comparable to the accuracy of LST retrieved by geostationary satellites. The most suitable downscaling predictors are aggregated from thermal infrared data. A surprising outcome of the case study is that the normalized difference vegetation index failed to explain the LST variability. Eventually, we aim to achieve a time dependent downscaling scheme, which presupposes validation data from

We are grateful to H. Peng and colleagues for the MRMR code, the Machine Learning Group at the University of Waikato for WEKA, and Jürgen Böhner and Olaf Conrad for SAGA-GIS. Further, we thank NASA, the EEA, OpenStreetMap, and Intermap Technologies for the predictor raw data and LSA SAF for the SEVIRI LST data. We thank Tetsuro Nishimura for his patience and helpful advice with the ASTER acquisition request. For proofreading and useful advice on the comprehensibility of the manuscript we thank Claire Nattrass, Leonie Pick, Eleonore Rauch and Michael Bock.

The Cluster of Excellence CliSAP (EXC177) is hosted by the KlimaCampus and funded by the German Federal and the Hamburg state Government. This research has been supported also by a grant from the German Science Foundation (DFG) number ZA659/1-1. The Centre of Excellence for Space Sciences and Technologies SPACE-SI is an operation partly financed by the European Union, European Regional Development Fund and Republic of Slovenia, Ministry of Higher Education, Science and Technology.

Relation between the dependent variable land surface temperature (LST) in the high spatial resolution (HR) (blue) and low spatial resolution (LR) (red) domains and different predictors.

Resolution dependent performance of the downscaling scheme for different predictor sets.

Variance in ASTER LST data (day indicated by dotted line) that can be explained by Landsat data (TIR and NDVI) from different seasons.

Explained variance R^{2} of single predictors with LST in LR and HR.

Downscaling accuracy with linear regression for different predictor sets. Number of predictors in the set (Num), explained variance (R^{2}) of the model in LR domain, mean absolute error (MAE), root mean square error (RMSE), R^{2} and BIAS for the independent validation data in HR domain. Best predictor sets are highlighted blue.

Predictor set | Num | R^{2} (LR) | MAE (K) | RMSE (K) | R^{2} (HR) | BIAS (K) |
---|---|---|---|---|---|---|
heighstat | 6 | 0.722 | 4.31 | 6.09 | 0.04 | 1.40 |
LUCshare | 3 | 0.634 | 2.23 | 2.96 | 0.33 | 1.33 |
NDVI | 32 | 0.825 | 6.40 | 9.15 | 0.00 | 1.46 |
TIR | 32 | 0.867 | 3.36 | 4.46 | 0.11 | 1.42 |
bestNDVI[lres] | 1 | 0.356 | 2.51 | 3.73 | 0.03 | 1.36 |
bestTIR[lres] | 1 | 0.628 | 2.08 | 2.63 | 0.53 | 1.32 |
MAST | 1 | 0.468 | 1.76 | 2.17 | 0.71 | 1.29 |
ACP | 2 | 0.593 | 1.82 | 2.21 | 0.68 | 1.31 |
tirpca1 | 1 | 0.575 | 1.79 | 2.17 | 0.70 | 1.31 |
tirpca3 | 3 | 0.622 | 1.89 | 2.37 | 0.63 | 1.37 |
tirpca5 | 5 | 0.633 | 1.89 | 2.42 | 0.61 | 1.38 |
expert1 | 7 | 0.698 | 1.86 | 2.35 | 0.63 | 1.35 |
expert2 | 2 | 0.514 | 1.85 | 2.47 | 0.57 | 1.34 |
expert3 | 2 | 0.591 | 1.82 | 2.31 | 0.65 | 1.33 |
MRMR2 | 2 | 0.641 | 1.98 | 2.51 | 0.54 | 1.31 |
MRMR6 | 6 | 0.702 | 2.03 | 2.64 | 0.52 | 1.35 |
MRMR10 | 10 | 0.717 | 2.12 | 2.91 | 0.43 | 1.35 |
ForwSel | 5 | 0.799 | 3.07 | 4.18 | 0.19 | 1.42 |