Multi-Resolution Study of Thermal Unmixing Techniques over Madrid Urban Area: Case Study of TRISHNA Mission

This work is linked to the future Indian–French high spatio-temporal TRISHNA (Thermal infraRed Imaging Satellite for High-resolution natural resource Assessment) mission, which includes shortwave and thermal infrared bands, and is devoted amongst other things to the monitoring of urban heat island events. In this article, the performance of seven empirical thermal unmixing techniques applied on simulated TRISHNA satellite images of an urban scenario is studied across spatial resolutions. For this purpose, Top Of Atmosphere (TOA) images in the shortwave and Thermal InfraRed (TIR) ranges are constructed at different resolutions (20 m, 40 m, 60 m, 80 m, and 100 m) and according to TRISHNA specifications (spectral bands and sensor properties). These images are synthesized by correcting and undersampling DESIREX 2008 Airborne Hyperspectral Scanner (AHS) images of Madrid at 4 m resolution. This allows to compare the Land Surface Temperature (LST) retrieval of several unmixing techniques applied on different resolution images, as well as to characterize the evolution of the performance of each technique across resolutions. The seven unmixing techniques are: Disaggregation of radiometric surface Temperature (DisTrad), Thermal imagery sHARPening (TsHARP), Area-To-Point Regression Kriging (ATPRK), Adaptive Area-To-Point Regression Kriging (AATPRK), Urban Thermal Sharpener (HUTS), Multiple Linear Regressions (MLR), and two combinations of ground classification (index-based classification and K-means classification) with DisTrad. Studying these unmixing techniques across resolutions also allows to validate the scale invariance hypotheses on which the techniques hinge. Each thermal unmixing technique has been tested with several shortwave indices, in order to choose the best one. It is shown that (i) ATPRK outperforms the other compared techniques when characterizing the LST of Madrid, (ii) the unmixing performance of any technique is degraded when the coarse spatial resolution increases, (iii) the used shortwave index does not strongly influence the unmixing performance, and (iv) even if the scale-invariant hypotheses behind these techniques remain empirical, this does not affect the unmixing performances within this range of resolutions.


Introduction
In the context of urban spread, 68% of the world population is projected to live in urban areas by 2050 according to the United Nations, and climate change is very likely to increase both the frequency and intensity of heat waves, hence the need for methodologies to monitor urban climates and UHI (urban heat islands) becomes critical.Remotely sensed data from space in the thermal infrared (TIR) is quite well adapted for this purpose, but systems combining both high spatial resolution and revisit capacities are still missing.Different projects are currently under study such as TRISHNA (Thermal infraRed Imaging Satellite for High-resolution natural resource Assessment) developed in cooperation between French CNES (Centre National d'Etudes Spatiales) and Indian ISRO (Indian Space Research organisation) space agencies [1], LSTM (Land Surface Temperature Mission) at ESA [2] (European Space Agency), and HyspIRI (Hyperspectral InfraRed Imager) in NASA/JPL (National Aeronautics and Space Administration/Jet Propulsion Laboratory) (https://hyspiri.jpl.nasa.gov/)[3].All of them are aiming at spatial resolutions in the Thermal InfraRed (TIR) close to 50 m.Nevertheless, the extreme fragmentation of the city structures with small size elements (such as buildings, streets, etc. . . ) and the variability of artificial materials used makes accessing information at smaller scale mandatory-a few meters or tens of meters.
During recent decades, disaggregation methods based on combining low resolution TIR data with high resolution data in the Visible and Near InfraRed (VNIR) domain have been developed to derive high resolution products in the TIR [4][5][6].All these methodologies rely on the hypothesis of conservation at finer resolution of relationships established at coarser resolution between temperature and a feature estimated using the VNIR bands.Different refinements on features and relationships were then brought, making a number of unmixing methods now available [7].Thus, Disaggregation of radiometric surface Temperature (DisTrad), which was first presented to study agricultural landscapes [8], supposes a second order polynomial behavior (which can be reduced to linear) between Normalized Difference Vegetation Index (NDVI) and Land Surface Temperature (LST).Thermal imagery sHARPening (TsHARP), introduced by Agam et al. 2007 for the study of different types of crops [4,9,10], supposes both linear and polynomial behavior between NDVI and LST and a non-linear relationship between Fractional Cover (FC) and LST.Wang et al. 2015 presented Area-To-Point Regression Kriging (ATPRK) and Adaptive Area-To-Point Regression Kriging (AATPRK), which assume a linear behavior between fine resolution observables (in our case shortwave features) and coarse resolution observables (in our case LST) in both urban and rural scenarios [6,11].Other techniques have been developed in the last years, such as Urban Thermal Sharpener (HUTS) hypothesizing polynomial behavior between NDVI, albedo and LST [5], or Multiple Linear Regressions (MLR) assuming linear behavior between LST and multiple Land-Use/Land-Cover (LULC) types [12].Despite the huge work developed in the last years, the performances of many of these methods have never been studied across spatial resolutions and the scale hypotheses on which they rely have been studied superficially for DisTrad and in the case of MODIS (MODerate-resolution Imaging Spectroradiometer) satellite [13,14].In addition, from the best of our knowledge, the group of techniques analyzed in this article has not been compared before, even if some of them have been compared between them.The behavior of the unmixing performances in function of the used shortwave features has already been studied for DisTrad in Essa et al. 2012 [15].However, from our knowledge this study has not been performed for the other empirical techniques used in the present work.
This paper focusses on urban areas, which present a more difficult case of study due to the high diversity of materials and structures, leading to highly heterogeneous situations.It first proposes a comprehensive literature review of seven thermal unmixing methods based on the TIR-VNIR combination.Their performances are then compared in the case of Madrid city in Spain when retrieving fine resolution LST from different coarse resolutions.The TIR image resolutions (coarse resolutions), from which the unmixing is performed, are: 40 m, 60 m, 80 m, and 100 m and the fine resolution of the VISible-Near InfraRed (VIS-NIR) image is 20 m.These images are obtained by applying atmospheric correction and spatial aggregation to DESIREX 2008 AHS data at 4 m resolution [16], and by selecting only the spectral bands needed to model TRISHNA.The spatial aggregation used to synthesize the satellite images takes into account the Signal to Noise Ratios (SNR) and the Modulation Transfer Functions (MTF) modelling TRISHNA.Each thermal unmixing technique has been tested with several shortwave features, in order to choose the most performant.Once the index has been chosen, the unmixing performances of the techniques are compared at each resolution.The evolution of the performances across resolutions is also studied, as well as the validity of the hypotheses behind each technique.The evaluation of the unmixing methods is conducted over different districts in Madrid to characterize the impact of the urban structure on their performances.
In Section 2, the theoretical aspects, hypotheses, and application methodology of the seven unmixing techniques compared along the article are presented.In Section 3, a brief description of DESIREX 2008 AHS data is given (Section 3.1), the configuration of the modelled satellites is outlined (Section 3.2), and the processing theory and methodology from AHS to multispectral satellite data are also explained (Section 3.3).In Section 4, the evaluation criteria used to compare the unmixing performances are presented.In Section 5, the unmixing results for the different empirical techniques and for the different synthetic satellite images at different coarse resolutions are shown.First, the LST characterization of the city is shown from the simulated satellite images at different resolutions without unmixing (Section 5.1).Second, the LST characterization of the city is shown for several unmixing tehcniques applied on the different satellite resolutions (Section 5.2).In Section 6, the validity of the unmixing techniques hypotheses is discussed together with the obtained unmixing performances and the best choice of the shortwave index used to perform the unmixing.Finally, in Section 7, the conclusions and future perspectives are presented.

Temperature Unmixing Procedures
All the unmixing procedures presented in this section rely on the base that shortwave optical domain images are acquired at a better resolution than TIR domain ones, and consequently, the finer information present in the shortwave domain can be used to unmix the TIR images.To this end, the methods below express the surface temperature as a function of a given shortwave feature, T = f (I).These features can be reflectance or radiance measures at a given wavelength or indices mixing measures at different wavelengths.Throughout this article only indices are used as shortwave features.Depending on the nature of the function f , two main kinds of methods can be distinguished: those where f is linear, and those where f is not.

Theoretical Aspects
This family of thermal unmixing techniques, among which the most famous are: DisTrad [8], TsHARP [4,10], and more recently, ATPRK [11] and AATPRK [6], are based on the hypothesis that LST at a given resolution is linearly related with a given feature extracted from the shortwave optical range at the same resolution.Following this hypothesis, we can express the LST at a generic resolution l as: where T l is the LST and I l is a feature obtained from the shortwave wavelengths, both at resolution l.
The second hypothesis is that the linear relationship between the surface temperature and the shortwave feature is the same at any resolution, i.e., surface temperature behaves, in function of this feature, identically across different resolutions.Then, the linear parameters a l (intercept) and b l (slope) do not depend on the scale and they can be noted a and b.Equation (1) becomes: where l indicates a generic resolution l.This hypothesis represents the scale invariance of the linear relationship [5,6].This family of unmixing procedures presents four different steps in its application.
• First, using the TIR spectral bands of the image, the LST T L (x L ) is obtained at a given coarse resolution L. At the same time, using the shortwave spectral bands, a feature I is obtained at a finer resolution η (I η (x η )).By aggregating (spatially averaging) the shortwave spectral bands, this feature is also obtained at the coarse resolution L (I L (x L )).Both, LST and shortwave features are functions of the pixel locations x (x L at the coarse scale and x η at the fine scale).

•
Second, the linear parameters a and b are obtained by linear regression at the coarse resolution L (Figure 1a)).
• Third, this linear relationship is applied pixel by pixel to deduce the temperature T η at the fine scale (Figure 1b)).
• Fourth, the residuals ∆T η of the unmixed LST (T η ) are estimated and used to correct T η (Figure 1c)).
The sharpening procedures belonging to this family mainly differ on the shortwave feature used, or in the residuals estimation.We classify the main methods of this family in three groups depending on the residuals estimation and the locality of the linear parameters estimation.

DisTrad, TsHARP, and Variations
In this group, a third hypothesis appears concerning the residuals ∆T η estimation.The residual estimated for a coarse pixel at resolution L is supposed to be the residual for all the fine pixels at resolution η within this coarse pixel [13]: This hypothesis is called scale invariance of the residuals [13].
Once, Equations ( 3) and ( 4) have been used to obtain the unmixed temperature T η .The LST residuals can be estimated pixel by pixel at the coarse scale using two different methods.On one hand, the coarse residuals are defined, pixel by pixel, as the measured temperature minus the linear regression predicted temperature [15]: On the other hand, the residual of each coarse pixel is estimated as the difference between the measured temperature at coarse resolution and the mean of the unmixed LSTs inside the pixel [5]: where N η is the number of fine pixels within a coarse pixel.
The application procedure for all this group of techniques is the same, being the only difference between the techniques the shortwave feature used.Classically, DisTrad uses NDVI [8,13] and TsHARP uses FC [4,13].In general, this procedure can be applied with a generic feature.The two conditions for the feature are: • the feature must be obtained at a finer resolution η in order to be able to perform the unmixing, • at the different resolutions between L and η, the temperature should behave "linearly" in function of the feature.
Essa et al. 2012 presented some studies on the unmixing performance of these methods for different features [15], showing that for Dublin in spring from 60 m to 30 m resolution the most performant feature is impervious percentage, followed by some vegetation indices such as NDVI and FC.

ATPRK
The first steps of this technique are exactly the same as those of the DisTrad family procedure: Equations (3)- (5), respectively.However, Area-To-Point Regression Kriging [6,11] presents a new residuals estimation methodology based on the area-to-point spatial interpolation of Kyriakidis [17].The residuals at the coarse resolution are estimated following Equation (7).From these coarse resolution residuals, the residual of each fine pixel x η is obtained by: where N is the number of coarse pixels (x L,i ) surrounding the fine pixel (x η ) and taken into account in the estimation (see Figure 2), λ i is the weight of the coarse pixel x L,i in the residual estimation of fine pixel x η , and ∆T L (x L,i ) is the residual of the coarse pixel x L,i .This residual estimation at the fine resolution, as a weighted linear combination of the coarse residuals around, can be interpreted as a smoothing of the fine residual function ∆T η (x η ), i.e., a weighted mean filtering that smooths the possible aberrant values of the fine residuals.
In order to estimate the λ i values, two constraints are imposed: the bias has to be zero (under spatial stationarity hypothesis), i.e., the sum of the weights must be normalised to one: and the variance of the estimator has to be minimal.Thus, solving the optimization problem, by using Lagrange multipliers method, results in the Kriging system: where θ is the Lagrange multiplier used in the error minimization, γ LL (x L,i , x L,j ) is the coarse-to-coarse semivariogram between coarse pixels x L,i and x L,j and γ ηL (x η , x L,j ) is the fine-to-coarse semivariogram between fine pixel x η and coarse pixel x L,j .
xL,i xη Figure 2. Gray pixels are the N pixels x L,i used to compute the residual of green pixel x η .The number of coarse pixels N can vary (in this exemple N = 25).The same N coarse pixels are used to compute the residuals of all the fine pixels within the center coarse one (gridded in the exemple).However, the weights (λ i ) can be different for each fine pixel, as the distances vary.
Before obtaining γ LL and γ ηL semivariograms, the coarse residual semivariogram at Euclidean distance s is estimated as: with N s the number of pixels at a distance s from x L,0 .This coarse residual semivariogram is then deconvoluted to obtain the fine-to-fine semivariogram γ ηη (s), with s being now the distance between two fine pixels.Then, from this fine-to-fine semivariogram, the fine-to-coarse and the coarse-to-coarse semivariograms are respectively: with s m the distance between fine pixel x η and any fine pixel within the coarse pixel x L,j , s mm the distance between any fine pixel within the coarse pixel x L,i and any fine pixel within the coarse pixel x L,j and σ the pixel size ratio between fine and coarse resolutions.Equations ( 13) and ( 14) are obtained by assuming that the coarse pixel value is the average of the fine pixel values within it [6].Finally, these fine-to-coarse and coarse-to-coarse semivariograms are injected in Equation (11).
To obtain the fine-to-fine semivariogram γ ηη which allows to obtain γ ηL and γ LL , Wang et al. [6,11] propose an empirical approach.First, they suppose that γ ηη depends only on two parameters: sill and range, with the nugget being equal to zero [18].Then, they suppose γ ηη to have the same shape as γ L but with different sill and range parameters.In this article, γ ηη (s) has been chosen to behave exponentially on the distance s.Finally, they search the sill and range parameters of γ ηη by minimizing γ L (s) − γ R L (s).Where γ R L (s) is the regularized semivariogram defined as γ R L (s) = γ LL (s) − γ LL (0), with γ LL defined in Equation ( 14) which depends on γ ηη .The equation to minimize is then: with γ (sill, range) ηη the fine-to-fine semivariogram we want to estimate and s (s) mm and s (0) mm the distances between any couple of fine pixels within coarse pixels x L,i and x L,j at distance s (||x L,i − x L,j || = s) and 0 (||x L,i − x L,j || = 0), respectively.Once sill and range are obtained via the minimization of Equation ( 15), γ ηη is defined and Equations ( 13) and ( 14) can be used to obtain the fine-to-caorse and coarse-to-coarse semivariograms.

AATPRK
Adaptive Area-To-Point Regression Kriging was presented by Wang et al. [6] to take into account the non spatial stationarity of the image.The global linear regression of Equation ( 3) cannot take into account local variations of the image, which can lead to different regression values for different sites.In order to consider the effect of local variations, AATPRK estimates the linear regression of Equation ( 3) for the pixels inside a sliding window of size N × N, and assigns the obtained values to the center pixel (x L,0 ) of the window.The residual estimation procedure of this methodology is the same as the ATPRK one.Consequently, the procedure steps are: First, linear regressions at the coarse resolution are achieved inside a moving window of size N × N to obtain the linear parameters a(x L,0 ) and b(x L,0 ): Second, the linear relationship is applied pixel by pixel to obtain the temperature T η at the fine scale.
where x η refers to the fine pixels inside x L,0 .Third, the residual of each coarse pixel is estimated as the difference between the measured temperature at coarse resolution and the linear regression predicted temperature: Fourth, the residuals ∆T η of the unmixed temperature (T η ) are estimated using Equation ( 9) and used to correct T η . T

More Complex Relationships Between Temperature and Shortwave Features
All the above presented methods suppose linear behavior of LST on a given shortwave feature, at least within a spatial window.However, other procedures not assuming this linear behavior have been recently developed.

Bi-Linear
This technique, called bi-linear in this work (simplest version of MLR [12]), supposes that the LST behaves linearly in two different shortwave features I (1) and I (2) : In addition, this technique supposes the scale invariance of the regression parameters a 0 , a 1 , and a 2 .The steps of this method are the same as those of the DisTrad family techniques by substituting the linear behavior of that family by the bi-linear relation of Equation (20).

HUTS
The original version of High-resolution Urban Thermal Sharpener [5] considers that land surface temperature behaves as a fourth order polynomial of NDVI and albedo.It is well known that NDVI and albedo are strongly correlated quantities [19][20][21], and therefore redundant information will be provided to the relationship.In order to study other possible shortwave combinations, a generic fourth order polynomial relationship between LST and a given couple, feature I (that can be NDVI or another) and albedo (α), is written: HUTS also supposes that the parameters p i (with i ∈ [1,15]) on the expression remain constant across resolutions.The fundaments of HUTS are the same as those of the DisTrad family procedure.First, a regression is performed at the coarse resolution to obtain the parameters p i .Then, the regression parameters are injected in Equation ( 21) at scale η (where the albedo and the shortwave feature have been measured) to obtain the land surface temperature T η .The residuals at the coarse scale can be obtained by using the HUTS version (the linear regression is replaced by the polynomial one) of Equation (7) or Equation (8).These residuals at the coarse scale are newly supposed to be constant across resolutions (Equation ( 6)) and are used to correct the LST estimation.
Due to the complex polynomial relationship between LST and shortwave features, some unmixed temperatures can have aberrant values, specially in the regions of the temperature-features space with a low density of data points.For this reason, HUTS procedure includes a quality control for the final land surface temperature [5].The lower limit for acceptable unmixed temperatures is fixed at the water surface temperature and the upper limit is fixed to be 5 K higher than the highest value of the coarse measured temperature.If any unmixed pixel temperature is out of these limits, the surface temperature of the pixel is obtained as the distance-weighted mean value of the 5 × 5 pixel block surrounding this aberrant pixel.

Classification-Unmixing Techniques
Classification-unmixing techniques are based on the possibility of combining a ground classification of the image with the above unmixing methods.Thus, the unmixing can be performed on each different class separately [15].This combination of classification-unmixing assumes that the pixels of the same class (c i ) present LSTs that behave following a given function f c i (I), which is different than the functions of the other classes f c j (I) with j = i.Consequently, this approach supposes that the unmixed methodologies can achieve better performances being applied separately on the diffferent classes.
Two different pixel classifications have been performed, both of them followed by DisTrad unmixing applied on the different classes separately.The two classifications used in this article are: index based classification using NDVI and NDBI (Normalized Difference Built-up Index) to differenciate vegetation from impervious surfaces, and albedo to differenciate high albedo impervious surfaces from low albedo impervious surfaces.A NDVI higher than 0.15 and a NDBI lower than −0.15 allow to select vegetation pixels.Those pixels with NDV I ≤ 0.15 and NDBI ≥ −0.15 are differenciated between low albedo (α < 0.2) and high albedo (α > 0.2) impervious surfaces.

Datasets and Preprocessing
In this section, the Aircraft Hyperspectral Scanner (AHS) data used to model satellite images are presented (Section 3.1).Then, the different modelled satellite configurations are introduced (Section 3.2).Finally, the AHS image processing used to obtain shortwave indices and LSTs at different spatial resolutions is explained (Section 3.3).

DESIREX Dataset
Airborne hyperspectral scanner data taken over the city of Madrid, during the DESIREX campaign 2008 [16,[22][23][24] (summer), at 4 m resolution over 80 spectral channels (0.443-13.4 µm) is used.The data correspond to three urban and suburban images of Madrid from Getafe to Universidad Autónoma (see Figure 3) obtained at 2497 m altitude from: Together with these aircraft hyperspectral images, atmospheric characterizations were performed each day.On one hand, temperature profiles from ground level to 25 Km altitude, and atmospheric water vapor content were measured with soundings several times a day at three different emplacements.On the other hand, relative himidity and air temperature evolution were measured in 6 fixed masts located in rural, urban-medium and urban-dense spots.During the DESIREX campaign LST and emissivity ground measurements were also acquired.LST was measured every five minutes in the same 6 fixed masts as air temperature, and thermal images were acquired at different emplacements and simultaneously with the AHS flights.A spectral characterization, reflectivity and emissivity measures, of urban surfaces was also done during the DESIREX campaign.The complete description of the dataset can be found in the DESIREX 2008 final report [16,[22][23][24].

Simulated Satellite Configurations
From these hyperspectral images at 4 m pixel resolution, the images of 4 multispectral satellites with 40 m, 60 m, 80 m, and 100 m pixel resolutions in the TIR bands respectively, and 20 m pixel resolution in the shortwave bands are synthesized.All the modelled satellites present the same spectral configuration with five bands in the VNIR-SWIR domain and four bands in the TIR domain.The used AHS bands are those closer to the present set-up (pending resolution) of the future french-indian mission TRISHNA [1], see Table 1.

Satellite Multispectral Modelling
From AHS radiance measures, satellite radiance images at different resolutions are synthesized by using direct and inverse radiative transfer solutions COCHISE and COMANCHE [25] and next applying a spatial undersampling.The preprocessing from the initial AHS data to the final multispectral satellite images is different for the shortwave bands and for the thermal bands as the radiative transfer equation and the TRISHNA sensors specifications (SNR and MTF) are different, see Figure 4.In this work, to complete the DESIREX atmospheric description, the mid-latitude summer temperature profile of MODTRAN is used from 25 Km to the top of atmosphere.In addition, the MODTRAN urban aerosols model is used with a visibility of 23 Km, as indicated in the DESIREX technical report for those days.In the shortwave optical domain (0.4-2.5 µm) the radiance measured by a given sensor aiming at nadir at wavelength λ is: where R λ s is the radiance at the sensor level, R λ BOA is the radiance at ground level, τ (atm,s) λ is the total atmospheric transmission between ground and sensor (in shortwave optical range, direct and diffuse transmission are taken into account), R (atm↑,s) λ is the upwelling atmospheric radiance between ground and sensor, and R (envi↑,s) λ is the upwelling environment radiance.Due to environment terms and diffuse transmission, which are important in the shortwave range, a direct estimation of Top Of Atmosphere (TOA) radiance from AHS radiance is not desirable at these wavelengths.As an alternative, first the Bottom Of Atmosphere (BOA) reflectance is estimated from AHS radiance using COCHISE: to consecutively obtain the TOA radiance (COMANCHE), see green lines of Figure 4: Once R λ TOA images have been obtained, they are spatially aggregated.The spatial aggregation procedure aims at decreasing the resolution by increasing the pixel size, but taking also into account the TRISHNA sensor parameters to model the satellite measures, see Figure 4.These parameters are the modulation transfert function and the signal to noise ratio.Both the MTF and the SNR have been defined according to the first evaluations of the TRISHNA instrument characteristics.
The bottom of atmosphere reflectances ρ λ on the VNIR-SWIR bands can be obtained by applying COCHISE on the TOA radiance satellite data.With the multispectral satellite configuration described in Table 1, several different indices are selected (see Table 2).

TIR Processing: Land Surface Temperature
In the thermal infrared domain (8.0-12.0µm), the radiance measured by a given sensor at nadir and wavelength λ is: where R λ s is the radiance at the sensor level, R λ BOA is the radiance at ground level, τ (atm,s) λ is the direct atmospheric transmission between ground and sensor (in thermal infrared range, diffuse tansmission is negligible), and R (atm↑,s) λ is the upwelling atmospheric radiance between ground and sensor.Equation ( 25) is valid for aircraft (R λ s = R λ AHS ) and satellites (R λ s = R λ TOA ).Being R λ BOA the same for both sensors, it is possible to write: which is related to a linear behaviour of R .These slope and intercept only depend on the atmosphere conditions and the observation angle (here near nadir).For a given atmosphere (characterized by the DESIREX campaign measures together with MODTRAN models), R λ TOA and R λ AHS are computed using COMANCHE [25] for 75 different emissivity spectra at 6 different temperatures.Thus, it is possible to obtain 75 × 6 = 450 radiances at each altitude (aircraft and satellite) from emissivities going from very weak values to almost 1 and for temperatures going from 0 • C to 60 • C.These ranges of emissivities and temperatures are supposed to be representative of the DESIREX urban scenario, and 450 radiances are supposed to be enough to define the linear behavior of Equation ( 26) well.The intercept and slope parameters of Equation ( 26) for a given atmosphere can be then computed by performing a least square estimation.Finally, using Equation ( 26), the obtained slope and intercept are applied to the radiance AHS images to model the TOA data.The spatial aggregation procedure is the same for shortwave and thermal ranges, and only differs on the used parameters.
For the temperature estimation, Temperature and Emissivity Separation (TES) methodology [26] is used, which has been presented as the most convenient during the DESIREX campaign, with robust LST estimations and without the need of shortwave information, previous emissivity knowledge, or day-night acquisitions [16,22,23].In addition, the convenience of TES for both LST and emissivity retrieval was studied in Oltra-Carrio 2013 [24] when processing AHS data of Madrid acquired during the DESIREX campaign.DESIREX ground measurements were used in [24] to characterize the TES performances.Furthermore, TES performances when retrieving LST and emissivity has been also studied in A. Michel et al. 2019 for simulated TRISHNA data of Madrid [27].Both works show that TES outperforms other methods such as Split-Window.
Before applying the TES methodology to obtain the land surface temperature, Equation ( 25) is used to perform an atmospheric correction and obtain, from the satellite radiance measure R λ TOA , the ground radiance R λ BOA , see Figure 4. TES method presents three steps:

•
First, temperature is estimated at each sensor wavelength by considering the emissivity to be constant and equal to a reference value ( re f = 0.99 [26]).
• Second, the Maximum Minimum Difference (MMD) value of the emissivities is calculated.

•
Third, the minimum value of the emissivity is estimated following: where the three parameters α 1 , α 2 and α 3 depend on the sensor.Next, the emissivity value is redefined to have the minimal emissivity estimated in the last step.

Evaluation Criteria
Land surface temperature measured at 20 m resolution using TES (T 20m ) is used as reference along the article.Then, to quantitatively compare the different unmixing techniques, root mean square error (RMSE, Equation ( 28)), mean bias error (MBE, Equation ( 29)), and cross correlation (R, Equation ( 30)) are computed for the unmixed LSTs at 20 m resolution ( T20m ) obtained from each different coarse resolution and for each shortwave index.
where T 20m (x 20m,i ) is the reference temperature at 20 m resolution of pixel i, T20m (x 20m,i ) is the unmixed temperature at 20 m resolution for the same pixel and N is the number of pixels.Finally, in order to compare the similarity between reference LST and unmixed LST images, Structural Similarity Index (SSIM, Equation ( 31)) [35] between reference LST and the unmixed LSTs at 20 m resolution is also computed.
where µ T and µ T are the local sample means of T 20m and T20m respectively, σ T and σ T are the local sample standard deviations of T 20m and T20m respectively, and σ T, T is the sample cross correlation of T 20m and T20m .C 1 , C 2 , and C 3 are small positive constants to stabilize each term.In this article, local means, standard deviations, and correlations are computed within a sliding Gaussian Kernel with standard deviation equal to 1.5 pixels.Hence, to quantify the performance of the LST unmixing, three classical statistical tools are used, RMSE, MBE and R, together with an image similarity measure, SSIM.These image comparisons support the visual analysis.Table 3 summarizes the different compared techniques and evaluation criteria.

Results: Comparative Study of Performances
In this section, the LST unmixing performances of DisTrad, ATPRK, AATPRK, Bi-linear, and HUTS methodologies (Section 2) are compared for different coarse resolutions and with different shortwave indices.Classification-unmixing method has been also tested with two different ground classifications and DisTrad applied on each class.First, the LST is estimated from the satellite TIR images without unmixing (Section 5.1).This analysis will allow us to later compare the thermal unmixing improvements.Next, 7 different unmixing techniques are applied to the 4 coarse resolution TIR images.The technique performances are compared for a given resolution and the evolution of the performances across resolutions is also studied (Section 5.2).In this section, only the results for the June 28th noon image are shown.However, the July 1st and 4th noon images have been also studied leading to very similar quantitative results, see Appendix A.

LST Retrieval from Simulated Satellite Data
A decrease in the quality of the LST characterization when the resolution decreases is illustrated in Figure 5, which shows Madrid city center LST at different resolutions, and Figure 6, which shows Atocha train station and Gran Via avenue LSTs, both at different resolutions.The increase of the ground sample distance leads to a loss of detail in the visual inspection: the Atocha train station roof begins to dissapear between 40 m and 60 m resolution (Figure 6) and the Gran Via Avenue-Ricardo Leon Street discrimination is impossible at 40 m resolution (Figure 6).This decrease in the quality of the LST characterization is quantified in Table 4, which shows RMSE, MBE, R and SSIM estimated between the 20 m LST reference and the uniformly disaggregated (UniTrad) [5,13] versions of the coarser images.RMSE, R, and SSIM show a decrease in the LST characterization performance when the resolution decreases.Madrid City Center Surface Temperature

LST Retrieval from Unmixing
Each unmixing technique has been tested with 14 different shortwave indices, see Table 2.In this section, only the results obtained with the most performant index are presented.This index can be different depending on the unmixing procedure.The comparison of the 7 unmixing methods at different resolutions leads to 2 main conclusions: one on the best technique for a given resolution, and another, on the evolution of each technique's performance across resolutions.
Quantitative and visual analysis have been performed for the 4 different coarse resolutions 40 m, 60 m, 80 m, and 100 m.However due to the constant dynamic across resolutions of the unmixing performances, which decrease when the resolution decreases, and to avoid redundancies in the article, only unmixed LST images for 60 m and 100 m coarse resolutions are shown.
Figures 7 and 8 show the unmixed LST of Madrid city center obtained with different unmixing techniques from 60 m and 100 m resolution respectively.Visual analysis corroborates that LST unmixing allows to discriminate the fine structure of the city (Figures 7 and 8), which was indiscernible in the initial coarse resolution images (Figure 5).The net of small streets in the neighborhood at north of the Retiro park can be observed, and Madrid old town (red region at west of Retiro Park) can be differenciated from the larger avenues at north and south.Visual analysis also shows a smoothing of the temperature values, i.e., the maximal (minimal) values of the unmixed temperature are lower (higher) than the maximal (minimal) values of the reference temperature.In addition, a spatial blurring of the unmixed temperatures, for regions with small scale objects such as narrow streets, is observed with AATPRK.This effect is specially evident for the 100 m coarse resolution.
Figures 9 and 10 zoom around Atocha train station, Figures 11 and 12 around Madrid old town, and Figures 13 and 14 around Gran Via avenue.These zooms allow a finer visual inspection of the unmixing performance of each technique.In Figures 9 and 10 the train station structure is better conserved by AATPRK and ATPRK methods.However, the small streets near the train station are almost lost with AATPRK due to the observed spatial blurring.In the old town images (Figures 11 and 12), it is difficult to visually state which method performs the better, with ATPRK and DisTrad giving results that are more similar to the reference image, but closely followed by other techniques such as HUTS or AATPRK.The spatial blurring found with AATPRK, for regions composed of small scale objects, is highlighted in these images.In the Gran Via images (Figures 13 and 14), ATPRK and AATPRK are the methods that better delineate the large avenues.Nevertheless, this time other techniques such as HUTS, or DisTrad also have good performances.
Figure 15a,b shows the 7 boxplots (one for each unmixing technique) of the 20 m unmixed LST ( T20m ) together with the boxplot of the measured LST at 20 m (T 20m ), used as reference.Figure 15c,d shows the boxplots of the difference ∆T = T 20m − T20m .Boxplots are calculated taking into account the pixels of the whole image.As the visual study, this statistical analysis is shown for 60 m to 20 m unmixing (Figure 15a,c) and 100m → 20 m unmixing (Figure 15b,d).For a given unmixing resolution, the statistical differences between methods are small.The best methods ATPRK and AATPRK present temperature boxplots closer to the reference LST one, and consequently narrower differences (∆T) around zero.The smoothing of the temperature values observed visually, is also shown by the boxplots: T20m boxplots are always narrower than T 20m one.The boxplots of the LST difference ∆T for 100 m → 20m unmixing are wider than those of the 60 m → 20 m unmixing, showing a worse performance in the unmixing from 100 m, as expected.
Finally, RMSE, MBE, R, and SSIM for the 7 unmixing techniques are calculated.Table 5 shows this quantities calculated on the whole image and for the 4 coarse resolutions.Table 6 shows the results obtained on Atocha (Figures 9 and 10), Madrid old town (Figures 11 and 12), and Gran Via (Figures 13 and 14) for 60 m and 100 m coarse resolutions.For each unmixing technique and resolution, the most performant index is also indicated.The results shown in these tables support the statistical and the visual analysis.SSIM, R, and RMSE show that ATPRK is the most performant procedure at any tested coarse resolution, followed by AATPRK and DisTrad (all the three estimated with NDBI index) and Bi-linear (with FC and NDBI).In addition, the performances of the different methodologies (as shown by RMSE, MBE, R, and SSIM) degrade similarly across resolutions.
The comparison between the results of Tables 5 and 6, and the results obtained without unmixing in Table 4, leads to conclude that applying ATPRK (the most performant unmixing technique) improves the resolution of the image.

Limits of the Validity of the Hypotheses
Unmixing methods of Section 2 are based on the hypothesis that LST can be expressed as a function of a shortwave index, T = f (I).In the case of linear behavior (T = a + bI), Figure 16 shows this relationship at 20 m resolution for any of the 14 tested indices.The linear relationship is most of the time hard to observe with R 2 < 0.5 for any index.However, some indices such as FC (f), NDVI (i) or NDBI (e) present a behavior closer to linear than others such as PISI (l), BRBA (n), or NBI (m).The validity of this hypothesis is studied in [14] for different indices and resolutions, showing that both NDVI and NDBI are strongly correlated with LST for resolutions ranging from 30 m to 960 m when studying the urban scenario of Nanjing city.
All the unmixing methods used along this article present a second common hypothesis: the regression parameters obtained at the coarse scale are supposed to be the same at any resolution.For the simplest case of a linear relationship between temperature and shortwave feature (Equation (2)), Figure 17 (for NDVI), Figure 18 (for NDBI), and Figure 19 (for FC) show that this independance of the regression parameters (intercept and slope) on the resolution is false for the range of scales studied here.Variations on the regression parameters across resolutions can be shown for any of the 14 tested shortwave indices.These variations are more or less important depending on the index.
To test the error in the LST unmixing due to the non validity of this last hypothesis, the unmixing is performed by taking the regression parameters obtained at 20 m (finer scale) and arguing that they are constant across resolutions, as the hypothesis states.The performances of the different techniques and for the different coarse resolutions are shown in the "scale-invariance hypothesis" columns of Table 5.In the case of linear regression (DisTrad, ATPRK, or classification-DisTrad) the obtained performances are almost the same.The hypothesis is then considered good enough.For the case of more complex relationships between temperature and index (HUTS) when the ratio between coarse and fine resolutions increases (80 m and 100 m coarse resolutions), the unmixing performances are slightly degraded due to the non-validity of the hypothesis.Taking the regression parameters obtained at 20 m results in a more performant unmixing, because the regression parameters across resolutions are different enough.
All the unmixing methods include a third hypothesis in the residuals estimation step.This hypothesis varies from one method to another, see Section 2. Figures 20 and 21 show the LST obtained without residuals correction ( ˆ T) for the 7 different unmixing techniques and for coarse scales of 60 m and 100 m respectively.In addition, Table 7 shows the RMSE, MBE, R and SSIM calculated between the 20 m reference LST and ˆ T obtained from 60 m and 100 m coarse scales.Both, the figures and the table show the importance of the residual correction step, and consequently the importance of the adopted hipothesis.The performances of the unmixing techniques drastically deteriorate if no residual correction is done.Furthermore, the choosen residual correction hypothesis strongly influences the unmixing performance.The linear regression case (DisTrad and ATPRK) is specially interesting, since both have the same ˆ T but very different residual hypotheses, leading to very different final unmixing performances (Table 5).These differences between DisTrad and ATPRK unmixing performances are also shown in Table 6 for different subsets: Atocha, Madrid old town and Gran Via Avenue.It can be seen that ATPRK always outperforms DisTrad.Nevertheless, the performance differences are weaker when the analyzed region is more homogeneous (Madrid old town) and when the coarse TIR scale is larger (100 m).In this linear case, Figure 22 shows the boxplot of the residuals at 20 m measured at this scale (used as reference), and the boxplots of the residuals at 20 m obtained during the unmixing process with Equation (8) (DisTrad1), Equations (7) (DisTrad2) and (9) (ATPRK) from coarse scales 40 m, 60 m, 80 m, and 100 m.All the boxplots take into account the pixels of the whole image.Equation (8), Equations ( 7) and ( 9) present very similar residuals distribution, which are close to the reference one.However, the residuals distributions obtained during the unmixing are always narrower than the reference.The residuals obtained with Equations ( 8) and (7) show also a small shift on the mean and median that does not appear in Equation ( 9) residuals.The mean and median of the reference residuals at 20 m are 0 K and −0.06 K respectively.The mean and median of DisTrad1 residuals obtained in the unmixing from 60 m are 0.12 K and 0.15 K.The mean and median of DisTrad2 residuals obtained in the unmixing from 60 m are 0.01 K and 0.02 K.While the mean and median of the residuals obtained with ATPRK from the same coarse resolution are 0.01 K and 0 K, being thus closer to the reference ones.The only difference between DisTrad and ATPRK is the residuals estimation, and then the differences in performances are only due to these residuals differences.We can conclude that the ATPRK hypothesis for residuals estimation is better than DisTrad hypothesis.It is important to note that the DisTrad residual hypothesis (Equation ( 6)) is included in the area-to point possibilities (Equation 9) when λ i for the coarse pixel containing the fine pixel is equal to 1 (and consequently the other λ i are equal to 0, see Equation (10)).
Table 7 also shows that for AATPRK, with local linear regressions, the residuals correction leads to less important improvements of the unmixing performances.However, even in this case, the performances are improved.This supports the importance of the residuals correction and the convenience of area-to-point residuals correction methods.8), DisTrad2 to the residuals obtained with Equation ( 7), and ATPRK with those obtained with Equation (9).

Assessment of the Best Index
The validity of the first and second hypotheses, linear behavior of T vs I and scale invariance of the regression parameters respectively, will define the best index for unmixing.Figure 16 shows that the first hypothesis is, in the best-case, poorly respected by some indices with R 2 values between 0.36 and 0.42.Consequently, this first hypothesis does not make the difference between indices.However, NDBI is, amongst the 14 tested indices, one of those that more respect the second hypothesis, with variations fewer than 2 K in the slope and a constant behavior for the intercept between 20 m and 100 m resolutions (Figure 18).This "agreement" of NDBI with the invariance hypothesis for the regression parameters can explain why NDBI is the most performant index for the unmixing procedures supposing linear regression.We can also see that FC, despite presenting the best R 2 value for the linear regression, does not respect the second hypothesis (Figure 19).

Evaluation of the Unmixing Performance
The unmixing performances obtained through these empirical approaches indicate that shortwave ground characterization and LST are related and that the first can be used to unmix the second.
Amongst the tested unmixed techniques, ATPRK presents the best performances.In the residual estimation, it takes into account the close environment of the fine pixel and not only the coarse pixel in wich the fine one is contained, and consequently it allows a larger domain of variation of the LST and a higher heterogeneity of the area.This result shows the importance of the residuals estimation step and the convenience of the area-to-point residuals estimation.
The LST spatial blurring produced by AATPRK when the unmixed region is close to homogeneous is due to the chosen window size in the method (in this article 5 × 5 coarse pixels).AATPRK estimates the regression parameters within this window, and when the LST values inside the window are close, but not equal, this method tends to "average" them, as the slope of the temperature-index relationship is close to zero (Figures 11 and 12).On the opposite, when the values of LST within the window are different enough, this method leads to a high slope in the temperature-index relationship of the window's pixels and consequently to a more important contrast between unmixed pixels, as it can be seen in the Atocha train station images (Figures 9 and 10).
The used classification-unmixing techniques do not lead to better results, indicating either that LST does not present different linear behaviors on the index for the different defined classes, or that more adequated ground classifications should be tested, i.e., classifications defining classes containing pixels with the same temperature behavior.

Conclusions
In this work, we modelled, from AHS data of Madrid at 4 m resolution obtained during the DESIREX 2008 campaign, the corresponding data as would be obtained by four different multispectral satellites with the same spectral configuration and spatial resolution in the shortwave optical domain and different resolution in the TIR range.The TIR resolution of each modelled satellite is 40 m, 60 m, 80 m, and 100 m, while the shortwave resolution is 20 m.From these satellite data, we estimate the LST at the TIR resolution and shortwave indices characterizing the ground use at the shortwave resolution.
Then, seven empirical unmixing techniques were selected, each one with 14 different indices, to obtain the LST at 20 m resolution from the estimated LST at TIR resolution.The unmixing performances of each technique applied on the different TIR resolutions were compared, and also the evolution of these performances across resolutions.ATPRK with NDBI as shortwave feature appears as the best LST unmixing technique for any tested resolution, with performances which are slightly better than those of AATPRK and DisTrad.Furthermore, visually ATPRK gives images that are closer to the reference LST than AATPRK, which presents a spatial blurring of the LST in near (but not) homogeneous regions.AATPRK being created to face spatial non-stationarities discriminates better the shape of objects with a special temperature behavior as the Atocha train station but homogenize regions with a behavior close to, but not, homogeneous.Other techniques, such as HUTS, K-means-DisTrad, or index-classification-DisTrad with a more complex hypotheses on the LST-Index relationship, present worse performances.This can indicate the relative validity of the linear LST-Index relationship approach.The combination of two indices in the regression (Bi-linear technique) does not improve the unmixing performances either.However, Bi-linear methodology can be generalized to n features (MLR) [12]: T l = a 0 + a 1 I The choice of the features should be adequate to not incorporate redundant information in the equation.The more the number of features does not mean the better the unmixing.So, finding a good index combination to improve the unmixing performance should be studied in the future.
The shortwave and TIR data used throughout this paper were acquired at the same time and by the same sensor (AHS, 4 m resolution).Consequently, shortwave and TIR pixel locations matched perfectly.However, the use of different satellites, or the use of different aquisition times for shortwave and TIR ranges can be envisaged, leading to new error sources in the unmixing.These error sources merit future study.
Finally, the validity of the hypotheses of the unmixing methods was studied, determining that even if the hypotheses are not completely matched, in the case of linear models the impact on the technique performances is not important.
We can conclude that, in the LST characterization of a mid-latitude city during summer (Madrid, end of June beggining of July), ATPRK is the best performing unmixing technique.This result indicates that the linear behavior of LST on NDBI together with the area-to-point residuals estimation configure the best hypotheses for the unmixing.Performances of ATPRK with NDVI are close to those with NDBI, indicating the needless of a SWIR band to study urban LST.
The performances of the above unmixing techniques have also been studied for DESIREX 2008 data from July the first and July the fourth leading to equivalent results and then supporting the above conclusions.
Despite the agreement of the results obtained for the city of Madrid at three different dates of summer 2008, unmixing performances could be dependent of season and climatological parameters as well as urban structure and urban material differences.So, unmixing performance differences could be found for heterogeneous cases.A generalization for arbitrary urban areas appears as an interesting issue for future research.

Figure 1 .
Figure 1.Main unmixing steps of the seven empirical techniques presented in this work (Linear case exemple).(a) Linear regression (red line) at the coarse scale L and coarse residuals estimation ∆T L .(b) Unmixed Land Surface Temperature (LST) estimation without residual correction T η .(c) Residual correction of T η with the fine scale residuals ∆T η .

• the 11 :
53 flight line of 28th June.No wind and clear sky.• the 11:44 flight line of 1st July.Moderate south-west wind and hazy atmosphere.• the 11:32 flight line of 4th July.Moderate north-east wind and some high clouds.

NFigure 3 .Figure 4 .
Figure 3. Madrid Airborne Hyperspectral Scanner (AHS) RGB image from Getafe to Universidad Autónoma at 4 m resolution.The red rectangles correspond to the regions shown in the results section for visual analysis.

Figure 5 .
Figure 5. Madrid city center land surface temperature.From left to right at 20 m, 40 m, 60 m, 80 m, 100 m resolutions.

Figure 6 .
Figure 6.Atocha train station and Gran Via avenue land surface temperature.From left to right at 20 m, 40 m, 60 m, 80 m, 100 m resolutions.

Figure 17 .
Figure 17.Evolution across scales of linear regression parameters for the LST-NDVI relationship.

Figure 18 .
Figure 18.Evolution across scales of linear regression parameters for the LST-NDBI relationship.

Figure 19 .
Figure 19.Evolution across scales of linear regression parameters for the LST-FC relationship.

Figure 20 .Figure 21 .Figure 22 .
Figure 20.Madrid city center T (LST without residuals correction) at 20 m resolution.For the unmixed temperatures the initial coarse scale was 60 m.

Table 1 .
Spectral and spatial characterization of the four modelled multispectral satellites.
λ TOA in function of R λ AHS with slope

Table 3 .
Summary of unmixing techniques, shortwave features used for each technique, scales used in the performance estimation, and performance evaluation criteria.

Table 4 .
Root mean square error (RMSE), mean bias error (MBE), cross correlation (R), and Structural Similarity Index (SSIM) between 20 m LST reference and UniTrad versions of LST satellite images from 40 m, 60 m, 80 m, and 100 m resolutions.

Table 5 .
RMSE, MBE, R, and SSIM for LST images unmixed from 40 m, 60 m, 80 m, and 100 m resolutions with different unmixing techniques: by taking the regression parameters obtained at the coarse resolution (Unmixing Columns) and by taking the regression parameters obtained at 20 m (Scale-invariance hypothesis Columns).For each technique and coarse resolution, these mathematical quantities are estimated for the most performant shortwave feature, which is also indicated.A 20 m LST image is used as reference.

Table 6 .
RMSE, MBE, R, and SSIM for LST images of Atocha, Gran Via, and Madrid old town at 20 m unmixed from 40 m, 60 m, 80 m, and 100 m resolutions with different unmixing techniques.For each technique and coarse resolution, these mathematical quantities are estimated for the most performant shortwave feature, which is also indicated.A 20 m LST image is used as reference.

Table 7 .
RMSE, MBE, R, and SSIM for T (LST without residuals correction) images at 20 m unmixed from 40 m, 60 m, 80 m, and 100 m resolutions with different unmixing techniques.For each technique and coarse resolution, these mathematical quantities are estimated for the most performant shortwave feature, which is also indicated.A 20 m LST image is used as reference.

Table A3 .
RMSE, MBE, R, and SSIM for LST images unmixed from 40 m, 60 m, 80 m, and 100 m resolutions with different unmixing techniques.For each technique and coarse resolution, these mathematical quantities are estimated for the most performant shortwave feature, which is also indicated.A 20 m LST image is used as reference.Results for 04-07-2008.