Monitoring 10-m LST from the Combination MODIS / Sentinel-2, Validation in a High Contrast Semi-Arid Agroecosystem

: Downscaling techniques o ﬀ er a solution to the lack of high-resolution satellite Thermal InfraRed (TIR) data and can bridge the gap until operational TIR missions accomplishing spatio-temporal requirements are available. These techniques are generally based on the Visible Near InfraRed (VNIR)-TIR variable relations at a coarse spatial resolution, and the assumption that the relationship between spectral bands is independent of the spatial resolution. In this work, we adopted a previous downscaling method and introduced some adjustments to the original formulation to improve the model performance. Maps of Land Surface Temperature (LST) with 10-m spatial resolution were obtained as output from the combination of MODIS / Sentinel-2 images. An experiment was conducted in an agricultural area located in the Barrax test site, Spain (39 ◦ 03 (cid:48) 35” N, 2 ◦ 06 (cid:48) W), for the summer of 2018. Ground measurements of LST transects collocated with the MODIS overpasses were used for a robust local validation of the downscaling approach. Data from 6 di ﬀ erent dates were available, covering a variety of croplands and surface conditions, with LST values ranging 300–325 K. Di ﬀ erences within ± 4.0 K were observed between measured and modeled temperatures, with an average estimation error of ± 2.2 K and a systematic deviation of 0.2 K for the full ground dataset. A further cross-validation of the disaggregated 10-m LST products was conducted using an additional set of Landsat-7 / ETM + images. A similar uncertainty of ± 2.0 K was obtained as an average. These results are encouraging for the adaptation of this methodology to the tandem Sentinel-3 / Sentinel-2, and are promising since the 10-m pixel size, together with the 3–5 days revisit frequency of Sentinel-2 satellites can fulﬁll the LST input requirements of the surface energy balance methods for a variety of hydrological, climatological or agricultural applications. However, certain limitations to capture the variability of extreme LST, or in recently sprinkler irrigated ﬁelds, claim the necessity to explore the implementation of soil moisture or vegetation indices sensitive to soil water content as inputs in the downscaling approach. The ground LST dataset introduced in this paper will be of great value for further reﬁnements and assessments.


Introduction
Time series of fine spatial and temporal resolution Thermal Infrared Images (TIR) are essential in a variety of agricultural applications, water resources management or irrigation scheduling, based on surface energy balance modeling [1][2][3][4]. However, spatio-temporal resolution of the operational TIR satellite sensors results are insufficient for some applications and services, including agriculture. The importance of high-resolution TIR images is being claimed [5][6][7][8][9]. The limitation in the TIR domain remains, since the revisit time for high spatial resolution TIR sensors is typically poor, while the spatial resolution for those with a high revisit frequency is too coarse. In practice, the spatial resolution requirements of satellite-derived surface temperature for agricultural applications are <50 m to face certain physical limitations related to the sensor's point spread function in TIR observations [2,10,11]. As for the temporal resolution, daily TIR observations are desired, although this requirement could be relaxed to 3 days as a minimum threshold [7,11].
The Copernicus conceptual mission LSTM [12] could complement other planned high-resolution TIR missions (e.g., the JPL-NASA Landsat 9-10 or the Indian-French TRISHNA mission [13]) and fulfill the spatio-temporal requirements stated above. In the meantime, downscaling methods are contributing to filling this gap by downscaling the TIR coarse resolution to finer resolutions [3,[14][15][16][17]. Several techniques have been proposed in the literature to enhance the spatial resolution of the TIR domain over vegetated areas by linking TIR and reflectance information in the Visible Near Infrared (VNIR) [18][19][20][21]. These techniques are generally based on the assumption that there exists a relation between the vegetation cover and the LST. According to these approaches, a relation between the TIR and VNIR bands is first obtained at coarse spatial resolution, and then applied at the finer resolution of the VNIR bands, assuming that this relation is scale invariant.
The Normalized Difference Vegetation Index (NDVI) or the Fractional Vegetation Cover (FVC) are the most commonly used inputs in sharpening techniques, although some studies have recently explored the possibility of implementing other combinations of reflectance values that can better characterize the surface response [17,22,23]. There are also some efforts attempting to integrate soil moisture delineated vegetation indices [24], and even radar-derived soil moisture [25] in the formulations of the LST downscaling.
For years, the Moderate Resolution Imaging Spectroradiometer (MODIS) or the Advanced Along-Track Scanning Radiometer (AATSR) were combined with Landsat or Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) imagery to downscale LST from 1000 m × 1000 m to~1 ha (10,000 m 2 ) scale. Higher resolution VNIR sensors, such as Formosat or the Satellite pour l´Observation de la Terre (SPOT), have been also used to improve the disaggregated LST pixel size [10,26].
The synergistic use of Copernicus Sentinel-2 (S2) and Sentinel-3 (S3) imagery could offer the desired solution of high spatial and temporal resolution [8,26,27]. Although no TIR information is provided, the Sentinel-2A and -2B tandem offers a 3-5-day repeat cycle, and a 10-20 m spatial resolution in the VNIR bands. Revisit time for S3 reduces to 1-2 days, with a spatial resolution of 1000 m for their thermal channels. The relationship between TIR and VNIR bands could be extracted from S3 and then applied to S2 VNIR data. Sobrino et al. [27] explored the conceptual combination of the MultiSpectral Instrument (MSI), on board the Sentinel-2, and the Ocean and Land Color Instrument/Sea and Land Surface Temperature Radiometer (OLCI/SLSTR), on board the Sentinel-3, to show an improvement in LST products derived from AATSR at that time, before the Sentinel-3 data were available. High spatial resolution data from S2 was used to improve the characterization of the sub-pixel heterogeneity through a better parameterization of surface emissivity, although no downscaling was applied by these authors. A machine learning algorithm was proposed by Guzinski and Nieto [8] to sharpen low-resolution TIR observations from S3 using high-resolution VNIR S2 imagery. Huryna et al. [28] applied the methodology introduced by Agam et al. [19] to the combination of S3-S2 imagery. However, the methodology was tested using Terra/MODIS or Landsat observations in both works, due to the lack of high-resolution TIR data to use for cross-validation.
Despite the extraordinary growth of downscaling studies in the past decade, the assessment of the thermal sharpening techniques has been traditionally conducted by cross-validation with derived LST products at original Landsat or ASTER TIR spatial resolutions, 60-100 m and 90 m, respectively [1,8,17,21,28]. Comprehensive ground validations of disaggregated LST are quite scarce, due to the lack of robust datasets covering high contrast heterogeneous areas.
This paper continues the work initiated by Bisquert et al. [1]. These authors tested the application of different downscaling techniques in an experimental site in Barrax (Albacete, Spain) from the combination MODIS-Landsat to provide LST at fine spatial and temporal resolutions, to fulfill the requirements in the estimation of surface energy fluxes and evapotranspiration in the agricultural areas of semi-arid regions, where small land holdings dominate. Bisquert et al. [1] analyzed both classical methods based on the VNIR-LST regression, as well as more advanced approaches based on Neural Networks (NN) or Data Mining (DM). Linear, quadratic and exponential relationships, proposed in the literature, were tested and results were compared to those obtained by applying NN and regression trees in a DM approach using reflectance values from all the spectral bands available. These authors observed that NN and DM, as well as the nonlinear regression tested, have the risk of overfitting, being very sensitive to noise in the samples. They concluded that the simpler NDVI-LST linear regression led to the better results in this case. Bisquert et al. [1] explored the technique results for the different land covers in the Barrax area, and found the largest uncertainties for irrigated croplands, especially in summer when cover heterogeneity and irrigation effects are stressed. As a follow-up, Bisquert et al. [26] extracted disaggregated LST maps at a 10-m spatial resolution for the first time, using high-resolution SPOT-5 images in the framework of the Spot-5 Take 5 project. Results shown in [26] were encouraging for the further application of the model to operational S2 images.
In this context, the objective of this paper is to revise and adapt the downscaling technique to the combination MODIS-S2 to derive operational LST maps with a spatial resolution of 10 m. Some adjustments to the original formulation of the approach were introduced to reduce the model uncertainty by adding an additional image-based parametrization of the residual as a function of the VNIR response. Ground LST data from an experimental campaign carried out in the summer of 2018 were used for the model evaluation. The variety of croplands and the contrast in the surface conditions during the experiment in the selected area allowed a comprehensive analysis of the performance of the downscaling technique, not achieved before. Strengths and limitations of the models were discussed, and also some guidelines for the optimal use of this technique with Sentinel-3 and Sentinel-2 imagery are given. This paper is structured as follows. Section 2 describes the study site, the field measurements and the satellite imagery used, as well as the downscaling methodology. Results of the ground validation and distributed assessment are shown in Section 3. Interpretation of the results and comparison with previous studies comprise Section 4. Finally, Section 5 summarizes the main conclusions of this work.

Study Site and Measurements
This work was conducted in the semi-arid area of Barrax, southeast Spain (39 • 03 N, 2 • 06 W). This is a very flat area with an average altitude of 700 m a.s.l, close to Albacete (Figure 1), traditionally used by ESA (European Space Agency) as a test site in different international campaigns [29][30][31]. Irrigated and rainfed crops combine in this agroecosystem, with field size ranging from small terrains below 1 ha to large pivots over 50 ha (Figure 1). This large variety makes Barrax a perfect target to assess the performance of a downscaling technique and explore its strengths and weaknesses.
Ground measurements of LST (LST g ) were registered in "Las Tiesas" experimental farm during the summer of 2018, concurrent with EOS-Terra/MODIS overpasses, and covering a total of 10 different crops in 13 independent fields ( Figure 1). The temperatures were measured using four hand-held infrared radiometers (IRTs) Apogee MI-210. These radiometers have a broad thermal band (8-14 µm), with a 22 • field of view and an accuracy of ±0.2 K, according to the manufacturer (Apogee Instruments, Inc.). In fact, the similar Apogee SI-121 radiometers (same radiometer, but with a field of view of 18 • and without datalogger) were calibrated against a National Institute of Standards and Technology (NIST) blackbody, during a comparison of TIR radiometers carried out in Miami by the Committee on Earth Observation Satellites (CEOS), and the accuracy was established at 0.2 K [32]. Special care was taken with the ground measurements in the sparse crops (vineyard and almond orchards), by averaging soil and canopy component temperatures to obtain representative values of the target LST. The radiometers were manually carried back and forth along transects on the fields pointing at nadir view, at a height of 1.5-2 m above the ground surface. Temperatures were registered at a rate of 5-10 measurements/min, in transect distances of 30-50 m/min, and then covering several hectares with each IRT. The 10-min averages centered at the satellite overpass time were considered. Radiometric temperatures were corrected from atmospheric and emissivity effects [33]. Downwelling sky radiance was measured with each radiometer and emissivity data were obtained through the Temperature-Emissivity Separation (TES) procedure [34,35] from in situ thermal radiance measurements using a multispectral radiometer CIMEL Electronique CE 312-2 [36]. Instruments, Inc.). In fact, the similar Apogee SI-121 radiometers (same radiometer, but with a field of view of 18° and without datalogger) were calibrated against a National Institute of Standards and Technology (NIST) blackbody, during a comparison of TIR radiometers carried out in Miami by the Committee on Earth Observation Satellites (CEOS), and the accuracy was established at 0.2 K [32]. Special care was taken with the ground measurements in the sparse crops (vineyard and almond orchards), by averaging soil and canopy component temperatures to obtain representative values of the target LST. The radiometers were manually carried back and forth along transects on the fields pointing at nadir view, at a height of 1.5-2 m above the ground surface. Temperatures were registered at a rate of 5-10 measurements/min, in transect distances of 30-50 m/min, and then covering several hectares with each IRT. The 10-min averages centered at the satellite overpass time were considered. Radiometric temperatures were corrected from atmospheric and emissivity effects [33]. Downwelling sky radiance was measured with each radiometer and emissivity data were obtained through the Temperature-Emissivity Separation (TES) procedure [34,35] from in situ thermal radiance measurements using a multispectral radiometer CIMEL Electronique CE 312-2 [36].  Table, together with indication of crop type and field size.

Satellite images
Terra/MODIS images with near nadir observations of the study site (field of view <25 ) were selected to minimize the bowtie effect [37]. Six different dates were used for this work (Table 1). MODIS VNIR and TIR data were extracted from the MOD09GQ, and MOD11_L2 products, respectively, downloaded from the NASA Earthdata Search tool. MOD09GQ offers surface Red and NIR reflectivity values at a 250 m spatial resolution. MOD11_L2 product provides LST, atmospherically corrected with a split-window algorithm, at a 1000 m spatial resolution [38].
Sentinel-2A and Sentinel-2B images concurrent or within ±1-day timing difference with MODIS were used (see dates in Table 1). S2 Level-2A products were downloaded from the Copernicus Open Access Hub, and they contain 10-m surface reflectance values. Bands 4 and 8 were used to compose the Normalized Difference Vegetation Index (NDVI). Figure 2 shows an example of the spatial distribution of the NDVI over the study site. Note the wide range in NDVI values available in the area during the experiment. Plot in Figure 3 shows the NDVI values for the different crop fields in the selected dates.  Table, together with indication of crop type and field size.

Satellite Images
Terra/MODIS images with near nadir observations of the study site (field of view <25 • ) were selected to minimize the bowtie effect [37]. Six different dates were used for this work (Table 1). MODIS VNIR and TIR data were extracted from the MOD09GQ, and MOD11_L2 products, respectively, downloaded from the NASA Earthdata Search tool. MOD09GQ offers surface Red and NIR reflectivity values at a 250 m spatial resolution. MOD11_L2 product provides LST, atmospherically corrected with a split-window algorithm, at a 1000 m spatial resolution [38].
Sentinel-2A and Sentinel-2B images concurrent or within ±1-day timing difference with MODIS were used (see dates in Table 1). S2 Level-2A products were downloaded from the Copernicus Open Access Hub, and they contain 10-m surface reflectance values. Bands 4 and 8 were used to compose the Normalized Difference Vegetation Index (NDVI). Figure 2 shows an example of the spatial distribution of the NDVI over the study site. Note the wide range in NDVI values available in the area during the experiment. Plot in Figure 3 shows the NDVI values for the different crop fields in the selected dates. Landsat-7/ETM+ overpasses were also available for 3 of the selected dates (9, 16 and 25 July). Although ETM+ TIR band has a resolution of 60 m, a cubic convolution resampling to 30 m is applied for user distribution. Thus, these images, with a 30-m spatial resolution, were used as a reference for an extended validation of the disaggregated LST. For the VNIR bands the Landsat Surface Reflectance (CDR) product was used, whereas the original TIR data in band 6 were corrected from atmospheric and emissivity effects following the method proposed by Galve et al. [39].    Figure 1, and the images/dates listed in Table 1.

Downscaling Approach
Bisquert et al. [1] tested different downscaling methods with pairs of Landsat/MODIS images in this Barrax area. A modification of the sharpening method presented by Agam et al. [19] showed the best results. This method is based on the linear relationship established between NDVI and LST at the MODIS 1000 m resolution (NDVIMOD and LSTMOD, respectively). The approach outlined by   Table 1.
Landsat-7/ETM+ overpasses were also available for 3 of the selected dates (9, 16 and 25 July). Although ETM+ TIR band has a resolution of 60 m, a cubic convolution resampling to 30 m is applied for user distribution. Thus, these images, with a 30-m spatial resolution, were used as a reference for an extended validation of the disaggregated LST. For the VNIR bands the Landsat Surface Reflectance (CDR) product was used, whereas the original TIR data in band 6 were corrected from atmospheric and emissivity effects following the method proposed by Galve et al. [39].

Downscaling Approach
Bisquert et al. [1] tested different downscaling methods with pairs of Landsat/MODIS images in this Barrax area. A modification of the sharpening method presented by Agam et al. [19] showed the best results. This method is based on the linear relationship established between NDVI and LST at the MODIS 1000 m resolution (NDVI MOD and LST MOD , respectively). The approach outlined by Bisquert et al. [1] has been revised and adapted to the combination MODIS-S2 and used in this work as a basis to derive 10-m LST maps.
The flowchart in Figure 4 shows the main steps and calculations of this downscaling algorithm that can be summarized as follows: 1.
The aggregation of the VNIR bands was carried out by averaging the reflectance values in the red and NIR bands of the 10-m S2 pixels, and 250-m MODIS pixels within an equivalent 1000 m MODIS pixel; 2.
Differences between S2 and MODIS VNIR data due to spectral resolution, atmospheric correction, viewing angle or pixel footprint were corrected through a normalization extracted from the 1000 m NDVI, then applied to 10-m S2 NDVI (NDVI N ); 4.
The 1000 m coarse spatial resolution required a previous selection of "pure" pixels for the NDVI-LST adjustment. This selection was based on a confidence value calculated from the comparison between NDVI MOD and aggregated NDVI N . This confidence value was computed as the ratio between the standard deviation from the 4 × 4 pixels belonging to each 1000 m pixel, and its mean value, as suggested by [18]. Pixels with confidence values within the lowest quartile were selected in this step; 5.
A linear regression was established between NDVI MOD and LST MOD at 1000 m, using data from those "pure" pixels, and then applied to the NDVI N values to obtain a prime estimate of 10-m LST (LST prime ); 6.
The Bisquert et al. [1] algorithm included a residual (R LST ) correction to account for the local conditions, and to correct the possible deviations produced by the NDVI-LST equation. This residue was calculated as the difference between the original and predicted LST at a coarse resolution, and this residue value was then added equally to all high-resolution pixels composing a coarse pixel. Since this residual correction leads to some boxy effect, Bisquert et al. [1] used a Gaussian filter to smooth. This final step was revised, and a modification is introduced in this work by adding a smoothing based on a linearization between the residual R LST and the NDVI MOD itself from 1000 m data. This linear relationship between the residue and the NDVI was then applied to 10-m NDVI N ( Figure 4); 7.
Finally, 10-m LST values were obtained by adding this residual R LST to original 10-m LST prime data from step 5. This new protocol to derive the residue values was expected to reduce the LST deviation, particularly in small size fields surrounded by a different cover, and then contribute to an overall improvement in the model performance.
to an overall improvement in the model performance. This downscaling procedure was applied to pairs of MODIS-S2 images. When no Sentinel-2 image was available concurrent with the MODIS overpass, close in time images (±1 day) were used, under the assumption of minimum changes in NDVI. Note the normalization procedure applied in step 3 reduced possible differences at this point. The assessment of the MODIS-S2 downscaling method was carried at both local and distributed scales, by comparison with ground measurements and Landsat-7/ETM+ LST products, respectively. In this last case, the comparison was established at the 30-m spatial resolution provided by U.S. Geological Survey (USGS). Following Gao et al. [20], the aggregation of 10-m S2 LST was done through the Stefan-Boltzmann law, with the assumption of similar emissivity values for adjacent pixels. Some differences may arise due to the 20-30 min delay in acquisition time between MODIS and ETM+ sensors. A normalization procedure was applied to minimize these discrepancies in LST values [1]. A linear regression between the aggregated Landsat and MODIS images at 1000 m was obtained, and then applied at 30-m spatial resolution, for each pair of MODIS-ETM+ images.
The model performance was quantified in terms of classical statistical metrics, such as the determination coefficient (r 2 ), the root mean square difference (RMSD), the systematic difference parameter (Bias), the mean absolute deviation (MAD), or the mean absolute deviation in percentage This downscaling procedure was applied to pairs of MODIS-S2 images. When no Sentinel-2 image was available concurrent with the MODIS overpass, close in time images (±1 day) were used, under the assumption of minimum changes in NDVI. Note the normalization procedure applied in step 3 reduced possible differences at this point.
The assessment of the MODIS-S2 downscaling method was carried at both local and distributed scales, by comparison with ground measurements and Landsat-7/ETM+ LST products, respectively. In this last case, the comparison was established at the 30-m spatial resolution provided by U.S. Geological Survey (USGS). Following Gao et al. [20], the aggregation of 10-m S2 LST was done through the Stefan-Boltzmann law, with the assumption of similar emissivity values for adjacent pixels. Some differences may arise due to the 20-30 min delay in acquisition time between MODIS and ETM+ sensors. A normalization procedure was applied to minimize these discrepancies in LST values [1]. A linear regression between the aggregated Landsat and MODIS images at 1000 m was obtained, and then applied at 30-m spatial resolution, for each pair of MODIS-ETM+ images.
The model performance was quantified in terms of classical statistical metrics, such as the determination coefficient (r 2 ), the root mean square difference (RMSD), the systematic difference parameter (Bias), the mean absolute deviation (MAD), or the mean absolute deviation in percentage (MADP) [40]. Following Schneider et al. [41], other statistics considered more robust and less influenced by outliers were also calculated, such as the median bias (Me), robust standard deviation (RSD) and robust RMSD (R-RMSD). The skewness and kurtosis were also included, which quantitatively describe the distribution of the differences between the estimated and observed values.

Ground Validation
The field scale assessment was performed using the ground data as a reference. IRT radiometric temperatures were corrected from atmospheric and emissivity effects, and average values for each 10-min transect/field were calculated. A total of 51 LST g data were available for this study (Table 1). Measured LST g values were in the range 297-327 K. The lowest values were observed for fully vegetated crops (grass, potato, or maize), whereas the largest values corresponded to bare soil, soil dominated Remote Sens. 2020, 12, 1453 8 of 16 crops (vineyard and almond orchard) and senescence cereals. Standard deviation of LST g data per crop field were <±1.5 K in 90% of the dataset, with a maximum value of ±1.9 K, showing the thermal homogeneity of the fields and the thermal stability during the 10-min interval.
The methodology described above was applied to the six pairs of MODIS-S2 images listed in Table 1. Mean values of 5 × 5 high-resolution pixel arrays, centered in the location of the ground transects, were calculated and plotted against LST g (see Figure 5). Disaggregated LST values ranged from 302 to 322 K, pointing to a certain limitation of the disaggregation technique to reproduce extreme low and high temperatures. Values of the standard deviation for the 5 x 5 pixel averages were <± 2.0 K.
All parcels in this study were provided with sprinkler irrigation system, except the vineyard and almond orchard, where drip irrigation was supplied. Irrigation was scheduled and frequently applied during the study period. For a few hours after an irrigation event, a cooling effect occurs consequence of the wetted surface. This effect is stressed when sprinklers are used. This was the case of 20% of our dataset, with 12 ground transects collected just a few hours/minutes after irrigation events. These points are plotted with non-filled circles in Figure 5. Note the evident overestimation of the disaggregated LST compared to LST g values, with differences >10 K in some cases. These results reinforce Bisquert et al. [26] findings pointing a shortcoming of the method over wet soil areas.
Certain limitations were also observed for the highest LST values. By stablishing a threshold of 325 K, only five data were excluded corresponding to fallow and tilled barley or poppy.
The field scale assessment was performed using the ground data as a reference. IRT radiometric temperatures were corrected from atmospheric and emissivity effects, and average values for each 10-min transect/field were calculated. A total of 51 LSTg data were available for this study (Table 1). Measured LSTg values were in the range 297-327 K. The lowest values were observed for fully vegetated crops (grass, potato, or maize), whereas the largest values corresponded to bare soil, soil dominated crops (vineyard and almond orchard) and senescence cereals. Standard deviation of LSTg data per crop field were <±1.5 K in 90% of the dataset, with a maximum value of ±1.9 K, showing the thermal homogeneity of the fields and the thermal stability during the 10-min interval.
The methodology described above was applied to the six pairs of MODIS-S2 images listed in Table 1. Mean values of 5 × 5 high-resolution pixel arrays, centered in the location of the ground transects, were calculated and plotted against LSTg (see Figure 5). Disaggregated LST values ranged from 302 to 322 K, pointing to a certain limitation of the disaggregation technique to reproduce extreme low and high temperatures. Values of the standard deviation for the 5 x 5 pixel averages were <± 2.0 K.
All parcels in this study were provided with sprinkler irrigation system, except the vineyard and almond orchard, where drip irrigation was supplied. Irrigation was scheduled and frequently applied during the study period. For a few hours after an irrigation event, a cooling effect occurs consequence of the wetted surface. This effect is stressed when sprinklers are used. This was the case of 20% of our dataset, with 12 ground transects collected just a few hours/minutes after irrigation events. These points are plotted with non-filled circles in Figure 5. Note the evident overestimation of the disaggregated LST compared to LSTg values, with differences >10 K in some cases. These results reinforce Bisquert et al. [26] findings pointing a shortcoming of the method over wet soil areas.
Certain limitations were also observed for the highest LST values. By stablishing a threshold of 325 K, only five data were excluded corresponding to fallow and tilled barley or poppy. Focusing on LST data lower than 325 K, and excluding points corresponding to recently irrigated conditions, a good agreement (r 2 = 0.90) is observed between disaggregated LST 10m and LST g values ( Figure 5). Differences range between ±4.0 K, with a systematic deviation of 0.2 K and a RMSD value of ±2.2 K (see Table 2). The kurtosis values (~−1) indicate a behavior close to the normal distribution, while the negligible skewness observed indicates a LST-difference distribution closely centered at 0.
The plot in Figure 6 superposes results obtained running the Bisquert et al. [1] algorithm. Good agreement is also observed by this original formulation, although some scatter is added, with an increase in the RMSD value up to ±2.7 K in this case.
MODIS LST values are superposed to plot in Figure 6 too, showing a large scatter (RMSD=±8.0 K) and discrepancies >10 K. This deviation is stressed for low temperatures registered in small size vegetated parcels that are surrounded by bare soil or other croplands with higher LST. This effect was already observed by Bisquert et al. [26]. increase in the RMSD value up to ±2.7 K in this case.
MODIS LST values are superposed to plot in Figure 6 too, showing a large scatter (RMSD=±8.0 K) and discrepancies >10 K. This deviation is stressed for low temperatures registered in small size vegetated parcels that are surrounded by bare soil or other croplands with higher LST. This effect was already observed by Bisquert et al. [26].

Distributed Assessment
Beyond the ground validation at a field scale, the model performance was assessed at a larger distributed scale by using the three concurrent Landsat-7/ETM+ images as a reference. The Single-Band Atmospheric Correction (SBAC) tool, recently introduced by Galve et al. [39], was used in this work for the correction of the TIR data.
Prior to the downscaling assessment, the feasibility of the Landsat-derived LST data needs to be tested. Ground data were also used to evaluate the Landsat-7/ETM+ LST estimates. The plot in Figure 7 shows the comparison between estimated and ground-measured LST values for a total of 21 data available for the three Landsat dates/images. Differences ranged within ± 3.5 K, except four cases corresponding to 1.1 and 1.2 sites. Note that these are small size parcels (<2 ha), for which the spatial resolution of Landsat 7/ETM+ is not fine enough, resulting in an overestimation of the LST values in these vegetated targets (potato and grass). Excluding these data from the analysis, a good matching with the 1:1 line was observed, with a coefficient of determination of r 2 = 0.96. An average error of RMSD = ±1.8 K was obtained. These results are in agreement with those reported by Galve et al. [39], using data from 2015-2016 in this same agricultural area. A RMSD value of ±1.6 K was obtained by these authors using ground LST data measured in six of the crop fields within "Las Tiesas" experimental farm also used in the present work. Note significant differences in terms of LST between MODIS and high-resolution sensors (Landsat or ASTER) up to 2-3 • C have been reported, induced by difference in the retrieval algorithm, atmospheric correction, sensor performance, acquisition time, view geometry, or spectral response function [10,[42][43][44][45]. Weng et al. [44] pointed out that comparison of thermal data from different sensors requires some pre-processing procedure. In this work, the differences in the spectral characteristics and overpass time (20-30 min difference) between Landsat and MODIS were minimized by applying a normalization process to the Landsat bands [1]. The disaggregated LST 10m were aggregated to the equivalent 30 m Landsat pixels (LST 30m ) by a 3 × 3 pixel averaging, based on the Stefan-Boltzmann law, following Gao et al. [20].  Figure 8 shows the comparison between the original 30-m LST derived from L7-ETM+, and LST disaggregation products at 10-m spatial resolution, for a subset of 10 x 10 km 2 centered in the "Las Tiesas" experimental farm. Visual inspection points the significant improvement in the capacity to discriminate the different field borders. Although the real potential of the downscaling approach is revealed when focusing on parcels <5 ha, where 3 × 3 thermal pixels of L7-ETM+ can be hardly fit in, being these areas the main responsible of the scatter (r 2 = 0.82) observed in the regression plot in Figure 9.
To quantify the performance of the downscaling approach at a full scene perspective, pixel-topixel differences were calculated at the 30 m spatial resolution for the selected subset of 10 x 10 km 2 ( Figure 9). Statistical metrics of the differences are listed in Table 3. Considering more than 270.000 pixel/data, an average RMSD of ±2.0 K was obtained, with a minor overestimation of 0.3 K. Table 3. Quantitative analysis of the differences plotted in Figure 9. Statistical metrics as defined in Table 2.    Figure 8 shows the comparison between the original 30-m LST derived from L7-ETM+, and LST disaggregation products at 10-m spatial resolution, for a subset of 10 × 10 km 2 centered in the "Las Tiesas" experimental farm. Visual inspection points the significant improvement in the capacity to discriminate the different field borders. Although the real potential of the downscaling approach is revealed when focusing on parcels <5 ha, where 3 × 3 thermal pixels of L7-ETM+ can be hardly fit in, being these areas the main responsible of the scatter (r 2 = 0.82) observed in the regression plot in Figure 9.
To quantify the performance of the downscaling approach at a full scene perspective, pixel-to-pixel differences were calculated at the 30 m spatial resolution for the selected subset of 10 × 10 km 2 (Figure 9). Statistical metrics of the differences are listed in Table 3. Considering more than 270,000 pixel/data, an average RMSD of ±2.0 K was obtained, with a minor overestimation of 0.3 K. Table 3. Quantitative analysis of the differences plotted in Figure 9. Statistical metrics as defined in Table 2.

Discussions
Agromomy management decisions based on TIR data need confidence in LST estimates. An absolute uncertainty <1.5 K is traditionally reported as a requirement [7,46]. The translation of this uncertainty to ET accuracy depends on the model but ranges between 10% and 20% [47,48]. Based on this threshold, the results obtained in this work (average RMSD = ±2.2 K) are encouraging. Additionally, the 10-m pixel size and the revisit frequency of the MODIS data, much better than the 3-5 days revisit frequency of S2A and S2B satellites, can fulfill the LST input requirements of the surface energy balance methods for a variety of hydrological, climatological or agricultural applications. At this point, no significant differences in the model performance were observed in connection with the collocation delay between the MODIS overpass and the Sentinel-2 image used, i.e., ±1-day mismatch seems not to have had an effect on the disaggregation.
With the new treatment of the residuals (R LST ) introduced in this work, as part of the downscaling scheme, an improvement around 20% was obtained in the performance of the original formulation of the model [1] that simply included a Gaussian filter, as suggested by Anderson et al. [2].
Ground validation results are in agreement with those obtained by Bisquert et al. [26] using MODIS-Spot 5 pairs in this case. These authors reported a bias of 0.2 K and a RMSD of ±2.4 K based on the comparison between disaggregated 10-m LST and ground-measured LST values for 10 different dates and five different fields. Regarding the distributed assessment, our results are similar to the RMSD value of ±2.6 K reported by Bisquert et al. [26] at a scene scale. In a first work using MODIS-Landsat combination [1], these authors reported an average RMSD = ±2.0 K for disaggregated 60-m LST in this case.
In this context, Agam et al. [19] tested a sharpening model (TsHARP) over extensive corn/soybean fields in central Iowa, USA. RMSD values between ±0.7 and ±1.4 • C were obtained by sharpening down simulated MODIS thermal maps at 1000 m to 250 m and between ±1.8 and ±2.4 • C by sharpening simulated thermal Landsat maps from 60 and 120 m to a VNIR 30 m resolution. Also using TsHARP, Duan and Li [49] disaggregated MODIS LST from 1000 m to 90 m, with an uncertainty of ±2.7 • C. Jeganathan et al. [14] tested TsHARP from MODIS over a heterogeneous agricultural landscape in India, and found uncertainties ranging ±2-3 K, using ASTER thermal data as a reference. Eswar et al. [23] used a thermal sharpening technique with five different indices to downscale MODIS LST from 960 m to 120 m, and compared this with the Landsat 7 LST data at different sites in India. These authors found that NDVI/FVC showed better result for wet areas, whereas the Normalized Difference Water Index (NDWI) was found better for dry areas. Yang et al. [17] used the multiple linear regression models to downscale the aggregated Landsat TIRS (360 m) image to 90 m, using a relation of LST with multiple scale factors in an area of mixed land covers (water, vegetation, bare soil, impervious surface), and then compared with the pure Landsat LST. The result found was satisfactory with coefficient of determination of 0.87 and RMSD of ±1.13 K. Merlin et al. [10] used a time series of higher resolution Formosat-2 images to test a new disaggregation procedure of kilometric thermal data over an irrigated cropping area in northwestern Mexico during an agricultural season. RMSD values about ±3 • C were obtained by these authors.
Many of these previous studies already pointed larger uncertainties in disaggregated LST over irrigated lands [1,10,26]. This is a weekness that remains in the present work since, although affected, VNIR reflectivity data does not fully capture the cooling effect produced in a wetted surface. Therefore, the downscaling technique still fails at reproducing LST values for spots with an undergoing irrigation or recently irrigated targets. In an attempt to face these limitations, some works incorporate additional reflectance information in the regression algorithms. Gosh and Joshi [22] tested several regression algorithms using EO1-Hyperion hyperspectral data over different land use land cover scenes. These authors used three pairs of coincident Hyperion and Landsat 7-ETM+ images as a reference for the assessment. Liu et al. [24] compared the performances of a thermal disaggregation technique, based on three different indices: temperature vegetation dryness index (TVDI), NDVI and fractional vegetation cover (FVC), over a humid agriculture region. These authors found the smallest RMSD using TVDI, with an improvement of 0.2 K in comparison to the results obtained using NDVI or FVC. A similar reduction of the uncertainty in 0.2 K was obtained by Amazirh et al. [25], thanks to the inclusion of Sentinel-1 radar data, linked to surface soil moisture, in a new formulation to improve the LST disaggregation methodology. These authors used Sentinel-1 imagery to derive 100-m resolution LST, and the results were compared with Landsat LST, used as a reference over two heterogeneous sites (irrigated and rainfed). However, average RMSD values for the six dates of study resulted over ±3.0 K, with even worse accuracy during summer. So, further efforts are still required to improve this soil moisture integration. Further works should also explore the inclusion of additional Sentinel-2 bands in the shortwave infrared (SWIR) in the sharpening scheme, since they might account for vegetation and soil water content [50].
Another finding is this work is the difficulty of the downscaling approaches to reproduce excepcionally high LST when these conditions are constrained to small parcels in the image, and there is a lack of coarse original MODIS pixels showing this homogeneous thermal conditions. The modeled relationship LST-VNIR reflectivities may not fully cover these conditions, leading to an underestimation of LST.
Focusing on the combination of S3-S2 images, very few quantitative studies have been conducted. In a first attempt, Huryna et al. [28] applied TsHARP sharpening. These authors reported LST differences of ±1.3-1.5 K, when compared with sharpened to 60-m S3 temperature with reference Landsat 8 temperature at 60 m, with a positive bias of 0.3-0.6 K, depending on the study site. However, the lack of local measurements prevented these authors from conducting a ground assessment. Further research should merge Sentinel-2 and Sentinel-3 imagery and conduct robust assessment of the downscaling results. Collections of ground LST measurements under a variety of surface and environmental conditions are then required, and the dataset gathered in the framework of this work is potentially attractive for this aim.

Conclusions
This work adds to the previous literature dealing with thermal infrared downscaling. The 10-m LST maps generated from the combination MODIS-S2 can contribute to fill the gap until high spatial-temporal resolution TIR images are available. The linear relation NDVI-LST was adopted as a basis for the downscaling approach. Results obtained encourage the parametrization of the residual as a function of the NDVI as a key step in the algorithm (an improvement of 0.5 K was achieved). The variety of surface conditions and the wide range of NDVI and LST values in the semi-arid area of Barrax allowed a robust assessment of the downscaling approach. An average estimation error of ±2.2 K in LST 10m resulted from the ground validation. This evaluation was reinforced by the pixel-to-pixel comparison of rescaled LST 30m with Landsat-7/ETM+ LST estimates, showing a similar RMSD of ±2.0 K for the distributed assessment.
Findings in this study highlight the limitations of the methodology to capture the variability of extreme LST, and the problems in recently sprinkler irrigated fields. Results indicate the need for caution, since disaggregated LST under these conditions may result artificially higher than expected.
Despite the weaknesses, this work gives promising insights for the adaptation of this methodology to the tandem S3-S2 in coming works. Further research could also benefit from the ground LST dataset introduced in this paper for a comprehensive performance assessment.
Finally, note that the benefits of this research may extend to other applications, such as monitoring volcanic activity and wildfire, estimating evapotranspiration or assessing drought severity.