TS 2 uRF : A New Method for Sharpening Thermal Infrared Satellite Imagery

Thermal infrared (TIR) imagery is normally acquired at coarser pixel resolution than that of shortwave sensors on the same satellite platform. TIR resolution is often not suitable for monitoring crop conditions of fragmented farming lands, e.g., the accurate estimates of evapotranspiration (ET) based on surface energy balance from remote sensing for irrigation water management. Consequently, thermal sharpening techniques have been developed to sharpen TIR imagery to a shortwave band pixel resolution. However, most methods concentrate on the visual effects of the thermal sharpened images, and they treat the pixels as independent samples without considering their spatial context, which can give rise to adverse effects such as artifacts. In this work, a new thermal sharpening method called TS2uRF is proposed. The potential of superpixels (SP) combined with regression random forest (RRF) have been used to augment the spatial resolution of the Landsat 8 TIR (100 m) imagery to their visible (VIS) spatial resolution (30 m). The SP has allowed the contextual information on the land cover to be integrated, and RRF has allowed the relationship between five spectral indices and TIR data to be integrated into a single model. The TIR sharpened images obtained using the TS2uRF were compared with images obtained using the TsHARP, one of the most classic thermal sharpening techniques, evaluating the root-mean-square error (RMSE) and structural similarity index (SSIM) for measuring image quality. In all of the cases evaluated, the RMSE and SSIM of the images sharpened using the TS2uRF method outperform those obtained using TsHARP. In particular, the TS2uRF method has an average error of 1.14 ◦C (RMSE) lower than TsHARP, regarding SSIM, TS2uRF outperforms TsHARP on average by 0.218. From the visual comparison, it has been shown that the TS2uRF methodology avoids the artifacts that appear in the enhanced images using the TsHARP method.


Introduction
The pressure on agricultural land due to the limited resources of arable land and water scarcity, is greater than ever.In this scenario, an improvement of agricultural productivity is required [1].One of the most important factors in achieving this goal is suitable irrigation water management.To this end, an accurate estimation of crop evapotranspiration (ET) is necessary.ET represents the total loss of water due to transpiration and evaporation phenomena that take place in vegetation cover and soil in a crop area.Nowadays it is possible to estimate ET for different crops, by providing spatial and temporarily distributed information over a wide area, using information gathered from aircraft or satellite platforms.One of the most commonly used methods for ET estimation from remote sensing are based on residual methods using the surface energy balance (e.g., SEBAL, METRIC, SEBI, ETMA) [2][3][4][5].The most critical input in the surface energy balance are visible (VIS), near infrared (NIR), and thermal infrared TIR remote sensed imagery.TIR imagery has been expanding rapidly, playing an important role in numerous agricultural environments: nursery monitoring, soil salinity stress detection, plant disease detection, yield estimation and irrigation scheduling, among others [6].
TIR imagery is normally acquired at a coarser pixel resolution than that of shortwave sensors on the same satellite platform, and the TIR resolution is often not suitable for monitoring the crop conditions of individual fields or the impacts of land cover changes that are at significantly finer spatial scales [7].Consequently, techniques for improving the spatial resolution of the thermal band to shortwave band pixel resolutions have been developed; this resolution often being fine enough for field-scale applications [8][9][10].Zhan et al. [10] classified these techniques into two groups based on a robust bibliography: Temperature Unmixing (TUM) and Thermal Sharpening (TSP).TUM is characterized as a generic process by which component temperatures within a pixel are broken down based on multi-temporal, spatial, spectral and/or angular observations.TSP refers to any procedure through which thermal images are enhanced for interpretation purpose, i.e., enhancement of low resolution TIR using spatially-distributed auxiliary data that are statistically correlated to the TIR pixel by pixel or region by region.Whilst the differences between the TSP and TUM are subtle, in [10] the authors point out that the TSP is used to obtain the land surface temperature (LST) of smaller resolution cells while the TUM is used to obtain the LST of elements within large resolution cells.This work focuses on TSP methods for improving the spatial resolution of TIR images.
Most of methods in the TSP group concentrate on the visual effects of the thermal sharpened images, which may not always be useful for quantitative remote sensing applications [11].One of the most classic thermal sharpening techniques is TsHARP [12,13].TsHARP exploits the inverse relationship between land surface temperature (LST) and the normalized vegetation index (NDVI).Through TsHARP it is possible to obtain TIR images at NDVI spatial resolution.The TsHARP technique assumes that fractional vegetation cover, which is related to NDVI, is one of the primary factors affecting land surface temperature variations over a given area [14].The implementation and operation of TsHARP is accessible and easy, and has a low computational cost.However, in [15] some disadvantages of the methodology were presented.In particular, the scale effect of the LTS and NDVI relationship is not considered, which could lead to significant errors in homogeneous areas (e.g., natural vegetation areas).Furthermore, there are several studies providing a single relationship between LST and NDVI which are not well defined for complex land covers (e.g., soils with moisture or bare soils that experience cooling due to evaporation or the presence of water) [16,17].
Other thermal sharpening techniques use Data Mining (DM) [18].In [18], the authors proposed an approach based on Regression Trees (RT) built using TIR and shortwave spectral reflectances.The authors demonstrate that the DM approach reduces the artificial like-box pattern in land surface temperature generated by TsHARP.The DM framework is more flexible and adaptable for automated operational data production, allowing auxiliary data (e.g., digital terrain model, soil texture or moisture) to be included to improve the spatial resolution of the TIR images.In Bai et al. [11], a novel data fusion method, based on a learning machine algorithm [19] for a neural network regression model is proposed to enhance the 60 m Landsat 7 ETM + TIR band imagery to a 30 m resolution.MODIS LST and enhanced Landsat 7 ETM + TIR data, are also combined to improve their temporal resolution.The authors concluded that the synthetic LST imagery obtained through this methodology can be used for monitoring the variation of land surface temperature (e.g., in urban heat-island studies).The main shortcoming of the aforementioned methods is that they treat pixels as independent samples without considering their spatial context, which can give rise to adverse effects such as a lack of precision of the estimates [20].In this regard, Zhan et al. [21] presented a study for downscaling land surface temperature based on multi-spectral and multi-resolution images and the methodology proposed in [8] in which they included spatial context features through different mobile window sizes.The results obtained showed that the sharpened images increase in quality as the spatial context information considered during processing increases.The use of square windows is a common practice in image processing applications, however this procedure constrains the context to an artificial structure, i.e., a square, without considering that they could belong to different land covers.Recently, some studies [22][23][24][25][26][27] have demonstrated that the use of superpixels (SP) (i.e., perceptually-uniform regions in the image of a similar-size) can solve the aforementioned problems.Our working hypothesis states that it is possible to improve the spatial resolution of the TIR images by modeling a relationship between various spectral indices of vegetation and soil and TIR images, but unlike the methods mentioned above, our method incorporates information from the pixel context without the restrictions of artificial structures such as square windows.In this work, the potential of SP combined with Regression Random Forest (RRF) is used to augment the spatial resolution of the Landsat 8 TIR (TIR 1 and TIR 2 ) imagery to their VIS spatial resolution.The use of SP allows the authors to consider the contextual information on the land cover with respect to a pixel, while RRF allows the relationship between five spectral indices and TIR data to be integrated into a single model.

Study Site
The study site is located in the region of Biobío, Chile.The area is made up of rivers, different annual crops and orchards, and alluvial soils, which allows high production levels for different crops.The climate is warm temperate, with an annual mean temperature of 14 • C. Four different land covers are included in the area: urban, agricultural vegetation, forest and bare soil.An image was acquired by the ETM+ sensor on board of the Landsat-8 satellite (path 233, row 85), and downloaded from the USGS Glovis official site (http://glovis.usgs.gov),with an L1T preprocessing level of standard field correction.A color composite of the multispectral image registered on 16 February 2016 (summer season) is displayed in Figure 1.The size of the scene is 674 × 470 pixels.Each pixel represents an area of 30 m × 30 m for all spectral bands, except for the TIR band.The TIR sensors contain two thermal bands, which measure land surface temperature at 100 m resolution (Figure 2).However, the product provided by USGS Glovis is resampled and delivered as 30 m (Table 1).

Methodology
The objective of the proposed methodology is to improve the spatial resolution of the TIR bands of the Landsat 8 dataset by finding a relationship between spectral features and brightness temperature (BT) of the TIR values.These spectral features include vegetation and soil spectral indices, widely used in remote sensing monitoring of agriculture, as well as information on the spatial context of the pixels.This methodology exploits the SP to provide perceptibly uniform regions in the TIR images, which are useful for obtaining image features at different scales, as well as providing contextual information of pixels.While an RRF model is trained from characteristics calculated using the context provided by SP to find a relationship between them and the BT values of the TIR images.The trained model is able to provide a pixel-level estimation of the TIR value in response to new input values (which must come from the same distribution as those used to generate the model).Thus, the model can be used to improve the resolution of the TIR image using a set of similar characteristics but with a higher spatial resolution as input.
Figure 3 illustrates the workflow of the proposed methodology called TIR Sharpening imagery using Superpixels and Random Forest (TS 2 uRF).Its main steps are described in the next sections.

Preprocessing
The first step consists of converting digital numbers (DNs) of the Landsat 8 bands in the visible range to the top of atmosphere (TOA) reflectance, followed by a simple atmospheric correction using the Dark Object Subtraction 1 (DOS1) method [28].Once this is carried out, the reflectance of the thermal bands (TIR 1 and TIR 2 ) are converted to at-satellite brightness temperature in Celsius.All of these processes are carried out using the Semi-automatic Classification Plugin [29] for Quantum GIS software [30].This step assumes that the TIR imagery have been previously resampled to the pixel size of the VIS.

Superpixel Generation
As mentioned above, SP are used to provide contextual information at multiple scales.Unlike window-based approaches that have a predetermined environment, i.e., a square, SP are capable of providing an adjustable-size environment that is adapted to the characteristics of the image, particularly to the shapes of objects (Figure 4).An SP is a small, local, and coherent cluster that contains a statistically homogeneous image region in accordance with certain criteria such as color and texture, among others [31].There are different approaches to computing SP, one of the most popular is the SLIC algorithm [32], which is based on the well-known k-means method which groups pixels into a conventional color space.The SLIC method generates SP similar in size in accordance with two criteria: spectral similarity (three channels) and spatial proximity.The SLIC has two parameters k and c, the first one is related to the size of the SP, whilst the second one regulates their compactness.Thus, SLIC is able to provide SP at different scales by fixing the value of the compactness parameter (c), but varying the size of the SP (k).In this work, a modified version of SLIC [33] is used which works with any number of channels.Prior to segmentation, the image is filtered to reduce the noise and to preserve most of the image edges.The filter chosen for this purpose is the Rolling Guidance Filter, proposed by Zhang et al. [34], which is based on the bilateral filter.Because the objective of using SP in the proposed methodology is to obtain homogeneous temperature areas to establish relationships with spectral indices, this step is only applied to TIR bands sampled at the pixel size of the VIS image.

Integration Process
Since the idea of this work is to estimate the brightness temperature (BT) values of the TIR images at a higher resolution than the original one by means of an RRF model, it is necessary to define a set of features that allows better estimations to be obtained.In this regard, SP are used to provide a context to pixels at different scales (Figure 5).Because SP are well suited to the edges of objects present in the images (land covers), they are suitable for providing the context of pixels of TIR images.The BT value associated to each SP is defined as the average value of its neighborhood (made up of pixels within the same SP of all the given scales and a particular spectral characteristic TIR band (TIR 1 or TIR 2 ).Thus, the brightness value associated with the ith pixel at scale jth will be denoted by BT SP i j j , where SP i j represents the SP where ith pixel is contained at the scale jth (j = 1, . . ., n).In this way, the spatial variability of each pixel in respect to their spatial context given by SP i j is captured at different scales.Because this work is intended for agricultural purposes, spectral indices related to agricultural applications have been used as features.The spectral indices used are summarized in Table 2 and correspond to the Normalized Difference Vegetation Index (NDVI), the Enhanced Vegetation Index (EVI), the Normalized Difference Water Index (NDWI), the Normalized Difference Moisture Index (NDMI), and the Bare Soil Index (BSI).
To augment the spatial resolution of the Landsat 8 TIR (TIR 1 and TIR 2 ) imagery to their VIS spatial resolution, the BT values at scale n (BT ) have been related with five aforementioned spectral indices.To accomplish this, each SP i j is intersected with the corresponding pixels belonging to each spectral index images, then mean value of these pixels are calculated.Thus, for each spectral index a n-components feature vector Feat i 1−n is obtained.It is defined as in Equation ( 1) Then, at the end of the process, a dataset for scale n is obtained, which can be represented by the features vector in Equation ( 2): where n are the results of using the spectral indices (Table 2) for characterizing the ith pixel at the n-scale.
As in the work of Gonzalo-Martín et al. [33], in this work the size of the SP has been selected to follow a dyadic scale.

Regression Random Forest Model Generation
To generate the RRF model [36], each feature vector compounded by the aforementioned spectral indices at the VIS spatial resolution (Equation ( 2)), is associated with its corresponding BT target value obtained from TIR 1 or TIR 2 at scale n and sampled at the pixel size of the VIS image (Section 2.2.3).An RRF model is trained to learn a function ( f ) representing the relationship between a feature vector and BT values, considering the context pixels information provided by the SP approach, from TIR band (Equation (3)).
Thus, an RRF model for each TIR band and for each specific scale n is obtained after training.The optimal number of trees used to build the model is selected by means of the out of bag (OOB).The model has been implemented using the TreeBagger package of Matlab R .TreeBagger function allows for the growth of decision trees in the ensemble using bootstrap samples of the data, thus reducing the effects of overfitting and improving the generalization [37].The main TreeBagger function parameters used in this work were: regression method, 50 trees and surrogate splits.To generate a TIR image, the trained model is fed using as input a features vector at the VIS spatial resolution, as defined in Equation (3).In this way, it is possible to obtain BT values to generate a thermal-sharpened image at the same resolution of the input vector.

Results
To evaluate the quality of the BT from TIR sharpened images generated by the TS 2 uRF, we simulated multi-resolution thermal data by degrading the Landsat 8 channel TIR 1 and TIR 2 from the original 100 m spatial resolutions (Table 1) to 200, 300, 400, 500, 600, 700, 800, 900, and 1000 m, using Lanczos 2D filter, implemented in Matlab R .This data set allows the authors to apply the proposed method to sharpen the degraded TIR images back to a resolution of 100 m; they images are then compared against the original thermal image (100 m).Following the methodology proposed and illustrated in Figure 3, to obtain the TIR images at 100 m resolution, VIS data from the target resolution is required, therefore, the VIS image is also re-sampled at 100 m.It is important to note that each degraded TIR image is segmented once it is resampled at 100 m.In Figure 6, heatmaps representing the SSIM and RMSE indices for TIR 1 (Figure 6a,b) and TIR 2 (Figure 6c,d) are shown.The SSIM index takes decimal values in the range [−1, 1], where 1 is the ideal value, when the two compared images are equal, while RMSE is a prediction error.These quality indices are calculated for sharpened TIR images (100 m) obtained from nine degraded images; spatial resolutions are represented in the y-axis while on the x-axis the effect of different spatial context information was considered for the sharpened TIR images (Scale).The minimal spatial context information is considered when generating the images at a pixel level.On the other hand, a maximum spatial context is considered when information from Scale 1 to Scale 5 is integrated into the sharpened process (Scale 1−5 ).A visual comparison of the best results obtained by sharpening the degraded images at 200, 600 and 1000 m is shown in Figure 7.As can be seen, TS 2 uRF is able to preserve the quality of the brightness temperature at degraded spatial resolutions (200, 600 and 1000 m).Since similar results were obtained for TIR 2 , from here the analysis will focus on the results of sharpening the TIR 1 image.To compare the TS 2 uRF method with one of the most classic methodologies of TsHARP, described in Section 1, thermal sharpened images were generated by TsHARP, applying the same degradation levels (200 m, . . ., 1000 m) and considering a minimal spatial context, i.e., the pixel level.In Table 3, it is possible to compare the quality evaluation indices (SSIM and RMSE).In all cases, the RMSE and SSIM values of the images sharpened using the TS 2 uRF method were better than for TsHARP.The greatest RMSE difference appears in the 200 m degraded TIR 1 image, obtaining a RMSE TS 2 uRF = 0.490 • C and RMSE TsH ARP = 1.906 • C. The same applies to the structural index; the SSI M TS 2 uRF = 0.824 and the SSI M TsH ARP = 0.412.This outcome was confirmed by a comparison between the TIR 1 reference image and sharpened TIR 1 image using TS 2 uRF and TsHARP methods.Figure 8 shows the comparison for sharpening an image degraded to 200 m.The reference image (Figure 8a) and the image sharpened by TS 2 uRF (Figure 8b), the brightness temperature varies over a similar range of values.However, the TIR 1 image sharpened using TsHARP (Figure 8c) is no longer able to estimate the original values for a brightness temperature of more than 30 • C and lower than 20 • C.This behavior is also reflected in the scatter plot between reference and sharpened images using TS 2 uRF (Figure 8d), where their goodness of fit index (r 2 ) is 0.87; while for TsHARP (Figure 8e) it is 0.48.Due to the lack of actual 30 m data from the TIR images, it is not possible to quantitatively compare the results of carrying the TIR image from 100 m to 30 m by using the VIS Landsat data.However, the result of this experiment can be seen in Figure 9.It shows a visual comparison between the reference TIR images of 30 m (TIR 1 Figure 9a and TIR 2 Figure 9b) and the sharpened TIR images at 30 m, obtained using TS 2 uRF (Figure 9c,e) and TsHARP (Figure 9d,e).As can be seen, images sharpened with TS 2 uRF have a better spatial and brightness temperature than those obtained with TsHARP.
The influence of spatial context information on the sharpened images using TS 2 uRF is shown in Figure 10.The original TIR 1 image at 100 m spatial resolution (resampled at 30 m) is shown in Figure 10a.Whereas the TIR 1 images sharpened at 30 m spatial resolution using TS 2 uRF at a pixel level, Scale 1−3 and Scale 1−5 are shown in Figure 10b-d By making a visual comparison between the TIR 1 image of 30 m (Figure 10a) and sharpened image (Figure 10b-d), it is possible to state that the sharpened image obtained at Scale 1−5 and the original images have a similar brightness temperature behavior.Moreover, the model using information at Scale 1−5 does not exhaustively capture the edge (spatial information) from the source images.This is due to the Scale 1−5 integrating a lot of spatial information, working as a low-pass filter, and thus smoothing the edges.As a result, the quality of the brightness temperature is high (compared to the original images), at the expense of reducing the spatial resolution of the sharpened images.From the results obtained both in the first experiment (Table 3) and this experiment (Figure 10), the authors recommend using Scale 1−3 to sharpen Landsat 8 TIR images from 100 m to 30 m.

Discussion
The results obtained in this research allow the authors to make considerable improvements in the sharpening thermal infrared satellite imagery, with respect to the state-of-the-art methods, proving the hypothesis established in this work.The most important contribution of this work was the use of the SP, which allowed the authors to consider the contextual information on the BT values from TIR images.
The results shown in Figure 6, for both the TIR 1 and TIR 2 images, demonstrate the same behavior for all degraded images under analysis.The best SSIM values are obtained with Scale 1−5 .For degraded images of more than 500 m, less spatial context is required.The RMSE has a similar behavior for both TIR 1 and TIR 2 sharpened images.These results are similar to that set out by [21], in which it is shown that the RMSE values of the TIR sharpened images decrease as the spatial context information (windows size) considered in the sharpening process increases.
In Table 3, the results show that the TS 2 uRF methodology provides a better reconstruction of the original image than TsHARP.
It can be seen in Table 3 that the SSIM and RMSE values obtained using TsHARP are almost unchanged when changing the level of degradation.Whereas the quality of the results obtained by TS 2 uRF vary according to contextual information, the results shown in Figure 9 indicate that the TsHARP method offers good spatial details, but a poor BT estimation.This is consistent with the work of Bai et al. [11], which states that most of the TSP methods focus on the visual effects of thermal sharpened images which may not always be useful for quantitative remote sensing applications.
Future research should aim to evaluate their usefulness for ET estimation.Moreover, this research focused solely on the agricultural landscape.Therefore, it may be interesting to evaluate the TS 2 uRF in urban applications (e.g., urban-heat-island).

Conclusions
In this work, a new method for thermal sharpening called TS 2 uRF has been proposed to overcome the shortcomings of the methods found in literature.The set of sharpening experiments and their results confirm that it is possible to improve the spatial resolution of TIR images by modeling a relationship between vegetation and soil spectral indices and TIR values, as well as spatial context information given by SP.Based on the evaluation of the results obtained by sampling the TIR images at different resolutions, it can be concluded that the TS 2 uRF method improves the results obtained by the TsHARP method.The results obtained from sharpening nine images of different resolutions (from 200 m to 1000 m) to a resolution of 100 m show that the TS 2 uRF method has an average error of 1.14 • C (RMSE) lower than TsHARP.Unlike TsHARP, the TS 2 uRF sharpened image has a brightness temperature that varies in a range similar to that of the original images.In addition, the TS 2 uRF methodology avoids artifacts that are visually perceived in improved images using the TsHARP method, which results in better spatial (edges) and spectral (brightness temperature) quality.In this regard, the TIR images sharpened using TS 2 uRF can be useful to improve the spatial resolution of agricultural applications, such as estimating ET maps, which will be a research motive in future works.

Figure 1 .
Figure 1.A location map and a false color composite (NIR, Green, and Blue bands) from Operational Land Imager (OLI) of the study site, located in the region of Biobío (Chile) and acquired on 16 February 2016.

Figure 2 .
Figure 2. Thermal Infrared bands (TIR) of the study site, located in the region of Biobío (Chile) and registered on 16 February 2016.

Figure 3 .
Figure 3. Flowchart of the methodology used to sharpen TIR satellite imagery by TS 2 uRF.
(a)Superpixels at fine scale (b)Superpixels at coarse scale

Figure 5 .
Figure 5. Information about the context of a pixel at different scales.

Figure 6 .
Figure 6.Heatmaps of SSIM (a,c) and RMSE (b,d) indices comparing the original TIR images (100 m) and the sharpened TIR, obtained by TS 2 uRF, from degraded images and considering different spatial context information.

Figure 7 .
Figure 7. Visual comparison between degraded TIR 1 images and their corresponding sharpened images (100 m).The images in the left column correspond to the degraded versions at 200 (a); 600 (c) and 1000 m (e) respectively.Whereas in the right column, the images correspond their respective sharpened versions using the following contextual features: Scale 1−5 (b); Scale 1−4 (d); and Scale 1−3 (f).

Figure 9 . 5 Figure 10 .
Figure 9. Visual comparison between (a) the original image TIR 1 and (b) TIR 2 at 100 m spatial resolution and sharpened TIR 1 (c) and TIR 2 (d) at 30 m by TS 2 uRF and sharpened TIR 1 (e) and TIR 2 (f) at 30 m by TsHARP, and considering a Scale 1−3 as a spatial context information.

Table 3 .
Quality evaluation between reference image TIR 1 (100 m) and sharpened TIR 1 image obtained by TS 2 uRF and TsHARP using images degraded at different levels.