New Scheme for Validating Remote-Sensing Land Surface Temperature Products with Station Observations

Continuous land-surface temperature (LST) observations from ground-based stations are an important reference dataset for validating remote-sensing LST products. However, a lack of evaluations of the representativeness of station observations limits the reliability of validation results. In this study, a new practical validation scheme is presented for validating remote-sensing LST products that includes a key step: assessing the spatial representativeness of ground-based LST measurements. Three indicators, namely, the dominant land-cover type (DLCT), relative bias (RB), and average structure scale (ASS), are established to quantify the representative levels of station observations based on the land-cover type (LCT) and LST reference maps with high spatial resolution. We validated MODIS LSTs using station observations from the Heihe River Basin (HRB) in China. The spatial representative evaluation steps show that the representativeness of observations greatly differs among stations and varies with different vegetation growth and other factors. Large differences in the validation results occur when using different representative level observations, which indicates a large potential for large error during the traditional T-based validation scheme. Comparisons show that the new validation scheme greatly improves the reliability of LST product validation through high-level representative observations.


Introduction
Land-surface temperature (LST) is an important parameter related to the surface energy and water balance at local and global scales and has principal significance for applications such as monitoring the climate, hydrological cycle, and vegetation [1].Satellite remote sensing provides a repetitive synoptic view in short intervals of the global land surface and is a vital tool for monitoring the LST of the Earth.With the development of remote-sensing technology, many LST products have been provided by different groups based on retrieval from different satellite data [2][3][4][5].The first long-term global sensing LST dataset, the NOAA/NASA Pathfinder AVHRR Land dataset (PAL) [2], was released in 1994.The second generation AVHRR Land Pathfinder Π (PALΠ) was a refinement product from the PAL released in 2000 [3].Sun and Pinker estimated LST products from a Geostationary Operational Environmental Satellite (GEOS) in 2003 [4].The LSTs from Spinning Enhanced Visible and Infrared Imager (SEVIRI) onboard the Meteosat Second Generation (MSG) was retrieved in 2008 [6].As a part of NASA Earth Observing System (EOS) project, MODIS LSTs have played an important role in recent studies, especially in regional studies, because of the suitable temporal and spatial resolution, acceptable accuracy, and accessibility of these LSTs.Therefore, MODIS daily LST/LSE products with 1-km spatial resolution are validated in this study.
Remotely sensed LSTs must be appropriately and precisely evaluated to ensure effective application [7].Mainly two types of methods exist for validating LST products retrieved from thermal-infrared satellite data: temperature-based methods (T-based) and radiance-based methods (R-based) [8][9][10][11][12][13][14].The main advantage of R-based methods is that they work during both the daytime and nighttime because in situ LST observations are not required, and finding validation sites with small spatial variations in land-surface emissivity is relatively easy [14].However, the atmospheric and water vapor profiles at validation sites from radiosonde balloons that are synchronously launched with the satellite are a necessary dataset, which limits the implementation of this method for long-term and large-region validation.Therefore, T-based methods remain common, and ground-based measurements are still the primary source of datasets to directly validate remotely sensed LSTs.However, we cannot perform a direct comparison with a pixel grid, especially for a coarse-resolution product over heterogeneous areas, because of the spatial heterogeneity and different scales between ground-based observations and remotely sensed LST pixels.
A generally accepted method is a systematic site-to-network method, which deeply develops an in situ sampling strategy and upscaling approaches to acquire the truth at the pixel scale over a heterogeneous surface based on multiscale, multi-platform and multi-source observations [15].This approach employs both field measurements from nodes of a wireless sensor network (WSN) and high-resolution remote-sensing data from synchronous high-resolution satellites or airborne sensors to establish a site-specific relationship and generate high-resolution LST reference maps over the validation area [15].These LST reference maps are then treated as benchmarks to obtain multiscale validation datasets by upscaling methods [16][17][18].However, only a few high-resolution LST reference maps can be synchronously obtained for a given region because of cost limitations and the revisiting cycles of satellites with high resolution, which is the greatest challenge towards the global validation of LST products, especially in terms of temporal consistency in product validation.
In contrast to more complicated R-based, site-to-network methods with limited LST reference maps, simple T-based methods are directly based on existing global, long-term ground LST measurements and are an important supplement for validation.Simple T-based methods have been widely used to validate remotely sensed LST products at homogeneous stations [8,13,19].When directly validating LST products with spatial resolutions above hundreds or even thousands of meters by ground-based measurements, the error from the scale mismatch changes with the land-cover type (LCT) and the proportions of mixtures in pixel grids reduce the reliability of the validation and hinder the application of ground-based LST measurements during the validation of remotely sensed LST products.Coll et al. have pointed out that during the day, LST can vary by 10 K or more over a few meters in a heterogeneous surface [9].Ground-based LST measurements from two types LST observation instruments with different field of view (FOV) were selected to discuss the scale mismatch implications for validation of remote sensing LST products in the study by Yu et al., and the validation results show that there is an extra 26.9% in the error >3 K range caused by the 41.5 FOV difference [20].Therefore, we must assess the spatial representativeness of station observations at a given spatial resolution to reliably validate remotely sensed LSTs.Recently, several methods have been used to assess the spatial representativeness of different land-surface parameters, such as the leaf area index [21], surface solar radiation [22], bidirectional reflectance distribution function (BRDF)/albedo [23], air temperature [24] and air quality [25], which are observed by ground stations.These methods are based on two factors: the point-to-area consistency and the spatial heterogeneity [21].The point-to-area consistency indicator can be calculated through two methods.The first involves directly comparing the footprint of the ground-station observations to the corresponding product pixels [26] or the average value of the corresponding area [27].In the second approach, the observational representativeness is determined by the average difference between a given station and its neighboring stations based on multi-temporal observations from multiple stations [9,28].The semi-variance is usually selected to describe the spatial representativeness by analyzing the spatial heterogeneity around the stations [29].A first-order statistical algorithm is an important spatial heterogeneity indicator, for example, using window-size analysis to assess the spatial variation of the landscape around a given station [30].Spatial representativeness assessments have been widely implemented to validate satellite-albedo, evapotranspiration, and LAI products [21,23,26,31,32].However, few representativeness assessments exist for station LST observations, which increases the uncertainty of the validation of LST products, particularly for simple T-based implementations, and hinders the application of station observations.This paper presents a new methodology for validating LST products that focuses on quantifying the spatial representativeness of station observations to improve the accuracy and reliability of LST product validation.The term "spatial representativeness" refers to measurements of the degree to which ground-based observations can resolve the surrounding LST by extending to the satellite footprint.This validation technique assesses the spatial characteristics of the LST, and the seasonal representativeness changes within a statistical framework.A scheme that is based on spatial representativeness indicators is presented in this paper, and then the grading criteria are outlined in detail.All the stations from the Heihe Watershed Allied Telemetry Experimental Research (HiWATER) [33] are selected for applying the validation strategy.The study area and data-processing procedure are introduced in Section 3. In Section 4, the representativeness of the given station observations is assessed to validate MODIS V5 daily LST products (MOD/MYD11A1).The representativeness assessment and LST product validation are also analyzed and discussed in this Section.Finally, the conclusions are summarized.

New Validation Scheme
The LST is a land-surface parameter with great spatial and temporal heterogeneity, which creates many challenges for "point-to-pixel" comparisons.Local changes in the surface temperature within and between different ecosystems introduce scale mismatch errors.Moreover, these patterns change seasonally and are particularly difficult to identify during periods of rapidly changing surface conditions.Therefore, "point" measurements alone are not sufficient to validate satellite-derived LST retrievals, especially remote-sensing LST products with moderate and low resolution (illustrated in Figure 1).This temporal mismatch can be solved by increasing the observation-acquisition frequency of stations.The scheme that is developed here to validate remote-sensing LST products (see Figure 1) attempts to solve spatial mismatch effects during validation.The key in this scheme is to assess the spatial representativeness, which is based on remote-sensing data with high spatial resolution that are closely related to the LST.Indicators are proposed to quantify the spatial representativeness, and then the grading criteria are designed before selecting appropriate ground-based measurements for validation.
Remote Sens. 2017, 9, 1210 3 of 24 heterogeneity around the stations [29].A first-order statistical algorithm is an important spatial heterogeneity indicator, for example, using window-size analysis to assess the spatial variation of the landscape around a given station [30].Spatial representativeness assessments have been widely implemented to validate satellite-albedo, evapotranspiration, and LAI products [21,23,26,31,32].However, few representativeness assessments exist for station LST observations, which increases the uncertainty of the validation of LST products, particularly for simple T-based implementations, and hinders the application of station observations.This paper presents a new methodology for validating LST products that focuses on quantifying the spatial representativeness of station observations to improve the accuracy and reliability of LST product validation.The term "spatial representativeness" refers to measurements of the degree to which ground-based observations can resolve the surrounding LST by extending to the satellite footprint.This validation technique assesses the spatial characteristics of the LST, and the seasonal representativeness changes within a statistical framework.A scheme that is based on spatial representativeness indicators is presented in this paper, and then the grading criteria are outlined in detail.All the stations from the Heihe Watershed Allied Telemetry Experimental Research (HiWATER) [33] are selected for applying the validation strategy.The study area and data-processing procedure are introduced in Section 3. In Section 4, the representativeness of the given station observations is assessed to validate MODIS V5 daily LST products (MOD/MYD11A1).The representativeness assessment and LST product validation are also analyzed and discussed in this Section.Finally, the conclusions are summarized.

New Validation Scheme
The LST is a land-surface parameter with great spatial and temporal heterogeneity, which creates many challenges for "point-to-pixel" comparisons.Local changes in the surface temperature within and between different ecosystems introduce scale mismatch errors.Moreover, these patterns change seasonally and are particularly difficult to identify during periods of rapidly changing surface conditions.Therefore, "point" measurements alone are not sufficient to validate satellite-derived LST retrievals, especially remote-sensing LST products with moderate and low resolution (illustrated in Figure 1).This temporal mismatch can be solved by increasing the observation-acquisition frequency of stations.The scheme that is developed here to validate remote-sensing LST products (see Figure 1) attempts to solve spatial mismatch effects during validation.The key in this scheme is to assess the spatial representativeness, which is based on remote-sensing data with high spatial resolution that are closely related to the LST.Indicators are proposed to quantify the spatial representativeness, and then the grading criteria are designed before selecting appropriate ground-based measurements for validation.

Indicators for Assessing Spatial Representativeness
Three indicators are proposed to describe the spatial characteristics for a specific parameter, mainly including the consistency from points to pixels and the spatial heterogeneity within pixels.These indicators are calculated on high-resolution images, which are much easier to access, thus simplifying the process.The LST is a direct parameter for assessing the representativeness of a given station's LST observations.However, obtaining temporal high-spatial-resolution LST matches for LST products with lower resolution is difficult.Therefore, high-resolution land-cover type (LCT) and the normalized difference vegetation index (NDVI), which can indicate the surface conditions and their changes, are chosen as additional supporting parameters to evaluate ground-based LST observations in addition to LST data with a high spatial resolution.
If the LCTs observed by stations do not match the dominant LCTs in the pixels, these station LST observations cannot represent the LST of the LCTs in the pixels.Thus, the dominant LCT (DLCT), which is given by the percentage of the observed LCT throughout the pixel's area, is defined as where s is the product pixel, M(s) is the area with the LCT that is observed by the given station, and N(s) is the total area of the LCTs in the LST product's pixel grid.When using a high-resolution LCT map, the DLCT can also be described as the percentage of fine pixel numbers covered by station-observed LTC to total LCT pixel numbers in the LST product's pixel range.A high DLCT indicates that the LCT observed by a station is consistent with that in the LST product's pixel and low heterogeneity within the product pixel because the mixing rate of LCTs in the pixel is not large.We developed a relative bias (RB) indicator to assess how close a ground-based LST measurement is to the value of the corresponding pixel area.According to the high-spatial-resolution LST reference images, the relative bias is used to describe the difference between the LST value T(s) at a station and the average LST value T(s) in the product pixel's area.If we consider the comparability between different ranges of LST values, the RB is defined as where s is the product pixel and RB depends on both its resolution and the resolution of the reference LST image.This indicator can quantify the certainty of the ground-based measurements to the product pixel area LST values.A smaller RB indicates more spatially representative observations for the corresponding pixel at the specific spatial resolution of s.
The two above indicators mainly measure the point-to-area value consistency, so the heterogeneity of the spatial distribution of the LST within a pixel, which is correlated with the vegetation growth, must be seasonally quantified.In terms of the spatial autocorrelation of LST parameters, semivariogram is one of the most commonly used and efficient geostatistical analysis tools for quantitatively evaluating spatial variations.Semi-variance and related geostatistical kriging were developed from mining research during the late 1950s and have been widely used after a publication by Journel and Huijbregts in 1978 [34,35].These geostatistical techniques have been used in many scientific projects, such as in describing the distribution and density of plants and animals [36,37] and in determining the spatial scales of variation and sampling strategies in remote sensing [38][39][40].In regionalized variable theory, the semi-variance measures the dissimilarity of a spatial variable observed at different locations.The semi-variance is calculated by the average squared difference between observations Z(x i ) and Z(x j ), which are separated by distance h, as described below: where 2N(h) is the number of observation pairs, which are separated by a distance h, or lag, as intervals to calculate the semi-variance.In this study, the variogram estimator r(h) is computed on discretized point values from high-spatial resolution LST pixels, and then a variogram model is established as a parametric functional approximation based on these semi-variance values.Several theoretical variogram model types exist, including linear models, spherical models, exponential models, and Gaussian models.Among these models, spherical models are the most widely used variogram models for their strong fitting and generalization capabilities and are recommended for assessing the spatial representativeness of observations [23,32].The isotropic spherical variogram that is used to estimate the variogram is as follows: In Equation ( 4), it is obvious that the r sph (h) generally increases from a nonzero value to a relatively stable constant value with h.When h = 0, the r sph (h) value is a nonzero value c 0 , namely, r sph (0) = c 0 , and c 0 is the nugget of the variogram.The stable constant value is the sill (c 0 + c) of the variogram.When r sph (h) reaches the sill, the value of the variable h is a, namely, the range of the variogram.The key parameters range (a), nugget (c 0 ), and sill (c 0 + c) can be obtained from fitting Equation (4).a is the maximal distance between the two correlated points and indicates the average structural scale (ASS) of the given area.The nugget c 0 indicates the level of Z(x i )'s randomness, which may be caused by internal variations in Z(x i ) over a smaller distance h than the sampling distance or may be derived from the sampling error.The sill (c 0 + c) represents the largest extent of the regionalized variation.Li and Reynolds [41] introduced the proportion of structural variation, which is based on subtracting the variogram nugget (c 0 ) from the sill (c) and then dividing by the sill, to discuss the definition and quantification of ecological heterogeneity.However, the heterogeneity of LST parameters considerably varies over time when compared to that of ecological parameters, and reference high-resolution LST maps may not be completely synchronous with the validated LST products.Therefore, the ASS is introduced based on the range of the variogram model, which reflects the average structural scale of the given area and the size of the homogeneous area.A large ASS value indicates that the station observations represent a large homogeneous area.

Ground-Based LST Measurements
In this study, we selected the Heihe River Basin (HRB), which is the second largest inland valley in China's arid regions, to evaluate MODIS LSTs with 1-km resolution.The HRB is located in the northern arid region within 97.1 • E-102.0 • E and 37.7 • N-42.7 • N. Glaciers, frozen soils, alpine meadows, forests, irrigated crops, riparian ecosystems, deserts, and gobi are distributed from upstream to downstream regions (see Figure 2).The HRB was selected as an experimental watershed to reveal the processes and mechanisms of the ecohydrological system in an inland river basin.Allied telemetry experiments such as the Heihe Basin Field Experiment (HEIFE) [42] and Watershed Allied Telemetry Experimental Research (WATER) [43] have been conducted in the HRB, and the HiWATER [33] project has been ongoing since 2012.The stations that collect watershed hydrological observations cover a wider range than those in previous studies and provide a large amount of ecohydrological data for evaluation.Eighteen atmospheric stations from HiWATER are scattered around the HRB region.Longwave-radiation data for eighteen stations from 2013 to 2014 were selected to obtain ground-based LSTs to evaluate the MODIS LSTs.The locations of the stations are shown in Figure 2. The information for the stations is listed in Table 1, and environmental photos of these sites are shown in Figure 3.The ARC, ARS, ARY, DSL, JYL, HZS, HCG, and EBZ stations are located in the upstream area; the DMZ, GBZ, HZZ, SDZ, and SSW stations are located in the midstream area; and the downstream area contains the SDQ, HJL, HYZ, NTZ, and LTZ stations.The LCTs of these stations are all typical types in the three significantly different areas.
Eighteen meteorological towers are located in the HRB region, consisting of three superstations and fifteen ordinary stations.Pyrgeometers are deployed at these 10-m to 35-m high meteorological station towers to measure longwave radiation (see Figure 3 and Table 1).At least two pyrgeometers are positioned on a single tower: one facing upward and the other facing downward.The field-ofview (FOV) of the upward-facing pyrgeometer is nearly 180°, while that of the downward-facing pyrgeometer is 150°.Therefore, the effective diameter of the FOV of the pyrgeometers on a 10-m to 35-m tower is approximately 2.5-24 m with a 6-m average mounting height, and the diameters of the ground-observation footprints are shown in Table 1.The pyrgeometers are sensitive to the spectral range from 4.5 to 42 µm in the longwave band.All the instruments at each station were calibrated  1 "Footprint" refers to the diameter of the footprint; "Height" indicates the installation height.
Eighteen meteorological towers are located in the HRB region, consisting of three superstations and fifteen ordinary stations.Pyrgeometers are deployed at these 10-m to 35-m high meteorological station towers to measure longwave radiation (see Figure 3 and Table 1).At least two pyrgeometers are positioned on a single tower: one facing upward and the other facing downward.The field-of-view (FOV) of the upward-facing pyrgeometer is nearly 180 • , while that of the downward-facing pyrgeometer is 150 • .Therefore, the effective diameter of the FOV of the pyrgeometers on a 10-m to 35-m tower is approximately 2.5-24 m with a 6-m average mounting height, and the diameters of the ground-observation footprints are shown in Table 1.The pyrgeometers are sensitive to the spectral range from 4.5 to 42 µm in the longwave band.All the instruments at each station were calibrated before and after field deployment.Field-routing exams were implemented once per month.Assurance and quality control provided the best possible data for the level-2 daily data.All these data and related information can be found at the HiWATER website [44].All the ground-based measured data from the eighteen HiWATER sites were 10-min averaged values.The longwave radiation data were selected according to the field viewing time for the MODIS LST.The LST is related to land-surface longwave radiation according to the Stefan-Boltzmann law [1,45]: where L ↑ is the surface upwelling longwave radiation, ε b is the surface broadband emissivity, δ is Stefan-Boltzmann's constant (5.67 , and L ↓ is the atmospheric downwelling longwave radiation at the surface.Therefore, the ground-measured LST can be estimated from station longwave-radiation observations by the following equation: In Equation ( 6), L ↑ and L ↓ are obtained from the ground-based measurements.Seven narrowband emissivities exist in MOD/MYD11B1 LST/LSE products.ε b can be estimated from these MODIS narrowband emissivities [45]: where ε b is the broadband emissivity, and ε 29 , ε 31 , and ε 32 are the narrow emissivity products of MODIS bands 29, 31, and 32 that are retrieved from the MODIS day/night LST algorithm (MOD/MYD11B1 LST/LSE).
Remote Sens. 2017, 9, 1210 7 of 24 before and after field deployment.Field-routing exams were implemented once per month.
Assurance and quality control provided the best possible data for the level-2 daily data.All these data and related information can be found at the HiWATER website [44].All the ground-based measured data from the eighteen HiWATER sites were 10-min averaged values.The longwave radiation data were selected according to the field viewing time for the MODIS LST.The LST is related to land-surface longwave radiation according to the Stefan-Boltzmann law [1,45]: where ↑ is the surface upwelling longwave radiation, is the surface broadband emissivity, is Stefan-Boltzmann's constant (5.67 × 10 −8 W•m −2 •K −4 ), and ↓ L is the atmospheric downwelling longwave radiation at the surface.Therefore, the ground-measured LST can be estimated from station longwave-radiation observations by the following equation: In Equation ( 6

MODIS Data
As a component of NASA's Earth Observing System (EOS) project, two MODIS instruments were placed onboard the Terra and Aqua satellite platforms to provide information for global atmosphere-, land-and oceanic-process studies [46].MOD/MYD22_L2, MOD/MYD11A1, MOD/MYD11B1 and MOD/MYD07_L2 are all daily LST products based on thermal-infrared data from MODIS.MOD/MYD11_L2 was retrieved by a generalized split-window algorithm with 1-km

MODIS Data
As a component of NASA's Earth Observing System (EOS) project, two MODIS instruments were placed onboard the Terra and Aqua satellite platforms to provide information for global atmosphere-, land-and oceanic-process studies [46].MOD/MYD22_L2, MOD/MYD11A1, MOD/MYD11B1 and MOD/MYD07_L2 are all daily LST products based on thermal-infrared data from MODIS.MOD/MYD11_L2 was retrieved by a generalized split-window algorithm with 1-km spatial resolution [47].MOD/MYD11A1 is tile-based and gridded in the sinusoidal projection from MOD/MYD11_L2.MOD/MYD11B1 was obtained using the day/night LST algorithm at 5-km spatial resolution [48].MOD/MYD07_L2 was retrieved by the atmospheric team using statistical regression methods [49].In this study, we focus on the collection of 5 MOD/MYD11A1 products, which are more widespread.Uncertainties from the satellite measurements and improvements in the original MODIS LSTs for cloudy days are beyond the scope of this paper.To eliminate effects from the inversion algorithm and clouds, only pixels with high-quality MODIS LSTs were selected for the evaluation based on a quality control flag value of 0. The narrow emissivity products from MOD/MYD11B1 LST/LSE were selected to estimate the land-surface broadband emissivity ε b and obtain ground-based LSTs [45].The narrow emissivities from MOD/MYD11B1 LST/LSE at 5-km resolution were resampled to 1-km spatial resolution to match the evaluated MODIS LSTs.

High-Spatial-Resolution Images
Land-cover images with a spatial resolution of 30 m (data doi:10.3972/hiwater.155.2014.db,downloaded from the HiWATER land cover datasets [50], which were produced by Zhong et al. [51,52], were selected to obtain the DLCT in a 1-km LST product pixel.This dataset was mainly based on charge-coupled device (CCD) data from the Huan Jing 1 (HJ-1) satellite, which was launched on 6 September 2008, by the China Center for Resources Satellite Data and Application (CRESDA).HJ-1/CCD has three visible bands and one near-infrared band [53].This dataset provides monthly land-cover maps of the HRB from 2011 to 2015 with 30-m spatial resolution.LCTs change with the seasons, and these changes are similar across consecutive years, so the DLCT products in 2013 were collected to calculate the DLCT indicator.
Landsat 8, which is called the Landsat Data Continuity Mission, is extending the distinguished 40-year records of Landsat-series satellites and has enhanced capabilities, such as adding new spectral bands in the visible and thermal-infrared wavelengths and improving the signal-to-noise ratio and radiometric resolution of the sensor [54].The Landsat 8 satellite includes two instruments: an Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS).High-resolution LST maps were retrieved from the Landsat 8 TIRS data and OLI data.Before retrieving the LST maps, the Landsat OLI and TIRS images were preprocessed, including radiometric calibration and atmospheric correction based on the correction model in the ENVI software.Monthly NDVI maps were obtained to assess the relationships between the monthly changes in the indicators and vegetation growth.The NDVI maps were based on the visible red band (R, band 4) and the near-infrared band (NIR, band 5) according to the following equation: In this study, the LST data were estimated from the TIRS aboard Landsat 8 based on a practical split-window (SW) algorithm developed by Du et al. [55].The SW algorithm can be expressed as follows: where T is LST, T i and T j are the TOA brightness temperatures in the thermal-infrared channels i and j, respectively; ε is the average emissivity of the two channels (i.e., ε = 0.5(+ε j )); ∆ε is the channel emissivity difference (i.e., ε = 0.5(ε i − ε j )); and b k (k = 0, 1, . . .7) are the algorithm coefficients from the simulated dataset.In this algorithm, the coefficients were determined based on atmospheric water-vapor subranges, which were obtained through a modified split-window covariance-variance ratio method.The channel emissivities were acquired from newly released global land-cover products at 30 m and a fraction of the calculated vegetation cover from visible and near-infrared images that were obtained by Landsat 8.The effect of heterogeneity changes depending on the season, so we selected Landsat 8 data from September 2013 to August 2014 with a 16-day temporal resolution to calculate the RB, and ASS.A total of 92 Landsat 8 images for all the stations in the HRB were downloaded from the following USGS website [56].The statistical results were based on per-month averages to eliminate invalid data from cloud cover.

Spatial Representativeness Classification
Since the DLCT and RB can measure point-to-pixel value consistency, and ASS can assess the spatial patterns for a given station, respectively.Therefore, all three indicators were combined to describe the representativeness including point-to-pixel consistency and spatial heterogeneity.The spatial representativeness was also classified based on these three indicators.The DLCT indicator determines the representativeness of the station's LCT in the product pixel.When the LCT in the view footprint of a station is not dominant within the product pixel, the station observations cannot be representative of the pixel, and the LST value of all other vegetation-cover types may be ignored, even if the point-to-area LST consistency is high at the station sometime.When a pixel has a large DLCT value, the RB and ASS subsequently would determine the spatial representativeness together.Presumably, the station-observed LSTs represent ideal data for LST product validation if the RB value is small and the ASS value is large.If the RB and are ASS value are both small, the station may have some spatial representation.In some extreme cases, the station observing area is an average heterogeneity sub-areas in the products pixel, which means the station observations are representative for pixels, although the surface is heterogeneous in these products pixels.Finally, if the RB is large but the ASS is small, the observations probably differ from the values of the pixel.
Reasonable thresholds for the DLCT, RB and ASS are required to determine the representativeness level of a given station's observations.The emissivity products of the version 5 collection of MODIS LST/LSE products are retrieved based on the LCT of a pixel, and the LCT is classified as the land cover of this pixel based on the classification rule for MODIS land-cover products (MCD12Q1) if the area percent of one LCT in a pixel is higher than 60% [57].Thus, 60% was defined as the threshold of the DLCT in this study.The RB was used to evaluate the difference between the land-based measurements within the view footprint at each station (in Table 1, the view footprints are shown in the sixth column) and the mean pixel value at the station locations.The ideal RB value is close to zero.However, the threshold of the RB is not unique and depends on the spatial resolution of the LST products, the view footprint of the station measurements, and the retrieval accuracy of the high-spatial-resolution LST maps that are used to evaluate the representativeness.Thus, a reasonable threshold for the RB was 0.5% in this study for MODIS LST products with 1-km spatial resolution, a station view footprint above 30 m and a 1-K retrieval error from the high-resolution LST map itself [55].The ASS is calculated from variogram models based on the semi-variance in a 3-km × 3-km area that is centered at a given station and can indicate the greatest distance over which the value at a point on the surface is related to the value at another point.The ASS defines the maximum neighborhood over which control points should be selected to estimate a grid node to take advantage of the statistical correlation among observations and can describe the spatial distribution of LSTs and quantify the average LST spatial structures in the given area.In this study, the measurements from stations were used to evaluate the MODIS LSTs with a 1-km spatial resolution.Therefore, a reasonable ASS indicator should be larger than the spatial resolution of the LST products, that is, 1 km in this study.
The spatial representativeness of the station's LST observations was classified into five different levels based on the difference-constraining degrees of the three indicators and their thresholds.The levels and their descriptions are presented in Table 2.

Representativeness Grading of the HiWATER Station Observations
The representativeness of the LST observations for the eighteen HiWATER stations was graded based on the three indicators and level in Table 2. Figure 4 shows the DLCT indicator results for the stations, and the red dashed line shows the threshold value of the DLCT in the figure.In the upstream area, all the stations were covered by grass or meadows, except for HZS, which was covered by wheat.The DLCT values within a 1-km pixel indicated that five stations were covered by a single LCT and one station (DSL) was covered by a dominant grass type, with a DLCT value above 80%.In the upstream area of the HRB, only one station (HZS) had a DLCT value below 60%.Among the five stations in the midstream area of the HRB, SSW, HZZ, and GBZ were covered by a single LCT and SDZ was dominated by one type within a 1-km pixel.The LCTs within a 1-km pixel were diverse and broken at DMZ, and the DLCT value of the station observations was less than 60%.The LCTs of the five downstream stations were more complex within a 1-km pixel, and their DLCT values were all less than 60%.According to the DLCT threshold, the observations of seven stations, including HZS, DMZ, SDQ, HJL, HYL, NTZ, and LTZ, were graded at Level 5 (Table 2).Eleven remaining stations exhibited vegetation-type cover, matching the types in the corresponding MODIS LST pixels, but their representative levels varied depending on the RB and ASS values.

Representativeness Grading of the HiWATER Station Observations
The representativeness of the LST observations for the eighteen HiWATER stations was graded based on the three indicators and level in Table 2. Figure 4 shows the DLCT indicator results for the stations, and the red dashed line shows the threshold value of the DLCT in the figure.In the upstream area, all the stations were covered by grass or meadows, except for HZS, which was covered by wheat.The DLCT values within a 1-km pixel indicated that five stations were covered by a single LCT and one station (DSL) was covered by a dominant grass type, with a DLCT value above 80%.In the upstream area of the HRB, only one station (HZS) had a DLCT value below 60%.Among the five stations in the midstream area of the HRB, SSW, HZZ, and GBZ were covered by a single LCT and SDZ was dominated by one type within a 1-km pixel.The LCTs within a 1-km pixel were diverse and broken at DMZ, and the DLCT value of the station observations was less than 60%.The LCTs of the five downstream stations were more complex within a 1-km pixel, and their DLCT values were all less than 60%.According to the DLCT threshold, the observations of seven stations, including HZS, DMZ, SDQ, HJL, HYL, NTZ, and LTZ, were graded at Level 5 (Table 2).Eleven remaining stations exhibited vegetation-type cover, matching the types in the corresponding MODIS LST pixels, but their representative levels varied depending on the RB and ASS values.The annual vegetation growth at stations can severely affect changes in their spatial representativeness.When stations have homogeneous vegetation types and density, the representativeness of stations may not vary with vegetation growth.Instead, the representativeness changes with the level of the point-to-area LST match, vegetation-density homogeneity and spatial contexture of the pixel during different seasons.The spatial representativeness of station The annual vegetation growth at stations can severely affect changes in their spatial representativeness.When stations have homogeneous vegetation types and density, the representativeness of stations may not vary with vegetation growth.Instead, the representativeness changes with the level of the point-to-area LST match, vegetation-density homogeneity and spatial contexture of the pixel during different seasons.The spatial representativeness of station observations can vary with vegetation growth in a year.When the vegetation type is uniform and the density is homogeneous, the representativeness at different growth stages may not significantly change.Otherwise, the representativeness may change because of changes in the point-to-area consistency and spatial heterogeneity.Seasonal changes in the spatial representativeness are similar in consecutive years when not interfered with by outside influences.Therefore, the RB and ASS values of the station observations were calculated monthly and are shown in Figures 5 and 6.

ARC ARY ARS DSL EBZ HCG HZS JYL DMZ SSW HZZ SDZ GBZ SDQ HJL HYL NTZ LTZ
observations can vary with vegetation growth in a year.When the vegetation type is uniform and the density is homogeneous, the representativeness at different growth stages may not significantly change.Otherwise, the representativeness may change because of changes in the point-to-area consistency and spatial heterogeneity.Seasonal changes in the spatial representativeness are similar in consecutive years when not interfered with by outside influences.Therefore, the RB and ASS values of the station observations were calculated monthly and are shown in Figures 5 and 6   The RB values of the stations in the upstream area, midstream area, and downstream area and their monthly changes are shown in Figure 5a-c, respectively.The eight upstream stations did not show a consistent trend with vegetation growth.As shown in Table 1, the LCTs of these stations were mostly grass or meadows.The DLCT values of these stations were all greater than 80%, with the exception of HZS, and the area of the heat-shadow effect in the thermal-infrared image was wider than the actual area.The RB depended on the homogeneity of the vegetation density for the stations that were covered by only one vegetation type and its changes with the seasons.Many factors can influence the density's homogeneity and its changes with vegetation growth, such as non-uniform rain or snow, random grazing in the grass, and uneven growth.Thus, complex factors are likely the observations can vary with vegetation growth in a year.When the vegetation type is uniform and the density is homogeneous, the representativeness at different growth stages may not significantly change.Otherwise, the representativeness may change because of changes in the point-to-area consistency and spatial heterogeneity.Seasonal changes in the spatial representativeness are similar in consecutive years when not interfered with by outside influences.Therefore, the RB and ASS values of the station observations were calculated monthly and are shown in Figures 5 and 6.The RB values of the stations in the upstream area, midstream area, and downstream area and their monthly changes are shown in Figure 5a-c, respectively.The eight upstream stations did not show a consistent trend with vegetation growth.As shown in Table 1, the LCTs of these stations were mostly grass or meadows.The DLCT values of these stations were all greater than 80%, with the exception of HZS, and the area of the heat-shadow effect in the thermal-infrared image was wider than the actual area.The RB depended on the homogeneity of the vegetation density for the stations that were covered by only one vegetation type and its changes with the seasons.Many factors can influence the density's homogeneity and its changes with vegetation growth, such as non-uniform rain or snow, random grazing in the grass, and uneven growth.Thus, complex factors are likely the The RB values of the stations in the upstream area, midstream area, and downstream area and their monthly changes are shown in Figure 5a-c, respectively.The eight upstream stations did not show a consistent trend with vegetation growth.As shown in Table 1, the LCTs of these stations were mostly grass or meadows.The DLCT values of these stations were all greater than 80%, with the exception of HZS, and the area of the heat-shadow effect in the thermal-infrared image was wider than the actual area.The RB depended on the homogeneity of the vegetation density for the stations that were covered by only one vegetation type and its changes with the seasons.Many factors can influence the density's homogeneity and its changes with vegetation growth, such as non-uniform rain or snow, random grazing in the grass, and uneven growth.Thus, complex factors are likely the main reasons for the inconsistency and volatility of the stations' changing RB trends.The chance of an RB value below 0.5% was greatest for stations with a DLCT above 60%.
The monthly changes in the RB at the stations in the midstream and downstream areas of the HRB could be divided into two types (Figure 5b,c).The first type included GBZ, SSW and HZZ, with DLCT values of 100%.Their RB values showed similar trends to those of the upstream stations, with inconsistency and volatility.The RB values of these stations were mostly smaller than 0.5%.The RB values of the other midstream and downstream stations, which had smaller DLCT values, increased with the NDVI (shown in Figure 6), although the RB's maximum peak point appeared within, before or after the month of the maximum NDVI peak.The RB values of these stations were all larger than 0.5% during the growing season.The land within a 1-km pixel area at these stations was covered by more than two LCTs, and the LST difference increased with vegetation growth and increasing solar radiation, especially at the maximum peak during summer.For instance, the midstream SDZ station was mainly covered by water and seeds.During winter, the station was predominantly covered by water, resulting in a small RB value below 0.5%.As the seeds began to grow, the seed area became larger, and the LST difference between seed and water increased as the solar radiance became stronger, creating a larger RB value.After reaching its maximum, the RB value decreased as vegetation growth and the solar radiance simultaneously decreased.The above analysis shows that the RB is relatively consistent with the DLCT and that an RB threshold is reasonable to effectively distinguish each level for a 1-km pixel.
The monthly ASS values of all the stations in the HRB are shown in Figure 7, with the ASS plots of stations in the upstream area in Figure 7a, the ASS plots of stations in the midstream area in Figure 7b, and the ASS plots of stations in the downstream area in Figure 7c.The black dashed lines in Figure 7 are the threshold of the ASS.The ASS plots of each station in Figure 7 did not show consistent trends as the seasons changed.However, when the DLCT values of the stations were less than 60%, the corresponding ASS values were less than 1000 m.Even for stations with DLCT values of 100%, the ASS value could be smaller than 1000 m, such as JYL in January and February (see Figures 4a and 6a).The ASS results for the stations were determined by the heterogeneity in the 1-km LST pixel.The heterogeneity of stations with a DLCT below 60% was mainly caused by the LST differences among different LCTs in the pixel.By contrast, the heterogeneity of stations with greater DLCT values, including values of 100%, was determined by the density heterogeneity of the dominant vegetation in the pixel, which is influenced by many factors, including uneven vegetation growth, human activity, such as random grazing, and localized weather, such as non-uniform rainfall.Moreover, the results were affected by the combination of these factors, leading to fluctuating ASS plot trends for stations with DLCT values of 100%.
main reasons for the inconsistency and volatility of the stations' changing RB trends.The chance of an RB value below 0.5% was greatest for stations with a DLCT above 60%.
The monthly changes in the RB at the stations in the midstream and downstream areas of the HRB could be divided into two types (Figure 5b,c).The first type included GBZ, SSW and HZZ, with DLCT values of 100%.Their RB values showed similar trends to those of the upstream stations, with inconsistency and volatility.The RB values of these stations were mostly smaller than 0.5%.The RB values of the other midstream and downstream stations, which had smaller DLCT values, increased with the NDVI (shown in Figure 6), although the RB's maximum peak point appeared within, before or after the month of the maximum NDVI peak.The RB values of these stations were all larger than 0.5% during the growing season.The land within a 1-km pixel area at these stations was covered by more than two LCTs, and the LST difference increased with vegetation growth and increasing solar radiation, especially at the maximum peak during summer.For instance, the midstream SDZ station was mainly covered by water and seeds.During winter, the station was predominantly covered by water, resulting in a small RB value below 0.5%.As the seeds began to grow, the seed area became larger, and the LST difference between seed and water increased as the solar radiance became stronger, creating a larger RB value.After reaching its maximum, the RB value decreased as vegetation growth and the solar radiance simultaneously decreased.The above analysis shows that the RB is relatively consistent with the DLCT and that an RB threshold is reasonable to effectively distinguish each level for a 1-km pixel.
The monthly ASS values of all the stations in the HRB are shown in Figure 7, with the ASS plots of stations in the upstream area in Figure 7a, the ASS plots of stations in the midstream area in Figure 7b, and the ASS plots of stations in the downstream area in Figure 7c.The black dashed lines in Figure 7 are the threshold of the ASS.The ASS plots of each station in Figure 7 did not show consistent trends as the seasons changed.However, when the DLCT values of the stations were less than 60%, the corresponding ASS values were less than 1000 m.Even for stations with DLCT values of 100%, the ASS value could be smaller than 1000 m, such as JYL in January and February (see Figures 4a and 6a).The ASS results for the stations were determined by the heterogeneity in the 1-km LST pixel.The heterogeneity of stations with a DLCT below 60% was mainly caused by the LST differences among different LCTs in the pixel.By contrast, the heterogeneity of stations with greater DLCT values, including values of 100%, was determined by the density heterogeneity of the dominant vegetation in the pixel, which is influenced by many factors, including uneven vegetation growth, human activity, such as random grazing, and localized weather, such as non-uniform rainfall.Moreover, the results were affected by the combination of these factors, leading to fluctuating ASS plot trends for stations with DLCT values of 100%.The monthly representativeness levels of the multi-temporal LST observations of all the stations in the HRB are shown in Figure 8 based on the DLCT values and the monthly RB and ASS values, The monthly representativeness levels of the multi-temporal LST observations of all the stations in the HRB are shown in Figure 8 based on the DLCT values and the monthly RB and ASS values, which were calculated using the high-resolution LST images from the Landsat 8 images.Eight stations were present in the upstream area, so the upstream stations were divided into two figures, namely, Figure 8a,b, to clearly show the monthly representativeness.The monthly representativeness levels of the midstream and downstream stations are shown in Figure 8c,d.All the downstream stations had DLCT values below 60%, so all the stations are listed as Level 5 in Figure 8d.The spatial representativeness varied by month for the different stations.Among the 216 months of observations (18 stations in 12 months from September 2013 to September 2014), only 73 months of observations were graded as Level 1, so the number of station observations that perfectly represented the pixel were limited.
Remote Sens. 2017, 9, 1210 14 of 24 which were calculated using the high-resolution LST images from the Landsat 8 images.Eight stations were present in the upstream area, so the upstream stations were divided into two figures, namely, Figure 8a,b, to clearly show the monthly representativeness.The monthly representativeness levels of the midstream and downstream stations are shown in Figure 8c,d.All the downstream stations had DLCT values below 60%, so all the stations are listed as Level 5 in Figure 8d.The spatial representativeness varied by month for the different stations.Among the 216 months of observations (18 stations in 12 months from September 2013 to September 2014), only 73 months of observations were graded as Level 1, so the number of station observations that perfectly represented the pixel were limited.

Traditional Validation Results without Spatial Representative Assessment
Traditional initial validation results, which neglect the evaluation of the spatial representativeness of the station observations, are presented first to demonstrate the effectiveness of our proposed scheme.The scatter plots in Figures 9 and 10 show the daytime and nighttime comparison results between the MODIS LST and ground-based LST measurements.As shown in Figure 8, the scatters in half of the sub-plots were close to the 1:1 line (the black line in the scatter plot).By contrast, the scatters for ARY, JYL, SDZ, SSW, SDQ, HJL, HYZ, NTZ, and LTZ were more decentralized.In Figure 10, the scatters of these sites were closer to the 1:1 line than those in Figure 9. Table 3 presents a statistical comparison of the daytime and nighttime results between the groundbased measurements and MODIS LST products that were retrieved from the MODIS data onboard Terra and Aqua, respectively.The bias (ground-based LST measurements-MODIS LST) at eighteen sites during the daytime ranged between −4.72 K and 6.46 K for the MOD11A1 LST/LSE products and between −3.44 K and 6.24 K for the MYD11A1 LST/LSE products; the root mean square error (RMSE) during the daytime ranged from 1.74 K to 5.50 K for the Terra MODIS LSTs (MOD11A1 LST/LSE products) and from 1.75 K to 5.94 K for the Aqua MODIS LSTs (MYD11A1 LST/LSE products).By contrast, the bias and RMSE between the Terra MODIS LST and ground-based measurements at nighttime ranged from −0.51 K to 3.86 K and from 1.19 K to 2.64 K, respectively; for

Traditional Validation Results without Spatial Representative Assessment
Traditional initial validation results, which neglect the evaluation of the spatial representativeness of the station observations, are presented first to demonstrate the effectiveness of our proposed scheme.The scatter plots in Figures 9 and 10 show the daytime and nighttime comparison results between the MODIS LST and ground-based LST measurements.As shown in Figure 8, the scatters in half of the sub-plots were close to the 1:1 line (the black line in the scatter plot).By contrast, the scatters for ARY, JYL, SDZ, SSW, SDQ, HJL, HYZ, NTZ, and LTZ were more decentralized.In Figure 10, the scatters of these sites were closer to the 1:1 line than those in Figure 9. Table 3 presents a statistical comparison of the daytime and nighttime results between the ground-based measurements and MODIS LST products that were retrieved from the MODIS data onboard Terra and Aqua, respectively.The bias (ground-based LST measurements-MODIS LST) at eighteen sites during the daytime ranged between −4.72 K and 6.46 K for the MOD11A1 LST/LSE products and between −3.44 K and 6.24 K for the MYD11A1 LST/LSE products; the root mean square error (RMSE) during the daytime ranged from 1.74 K to 5.50 K for the Terra MODIS LSTs (MOD11A1 LST/LSE products) and from 1.75 K to 5.94 K for the Aqua MODIS LSTs (MYD11A1 LST/LSE products).By contrast, the bias and RMSE between the Terra MODIS LST and ground-based measurements at nighttime ranged from −0.51 K to 3.86 K and from 1.19 K to 2.64 K, respectively; for the Aqua MODIS LST, the bias ranged from −0.39 K to 3.48 K and the RMSE ranged from 1.94 K to 3.82 K.
Remote Sens. 2017, 9, 1210 15 of 24 the Aqua MODIS LST, the bias ranged from −0.39 K to 3.48 K and the RMSE ranged from 1.94 K to 3.82 K.The initial results revealed that the MODIS LSTs fit better with MODIS at nighttime than during the daytime, and the comparison results greatly differed between the daytime and nighttime for some stations, especially HZS, HYZ, NTZ, and LTZ, which are all Level-5 stations.The LST had stronger heterogeneity during the daytime than at nighttime.During the day, LSTs can vary by 10 K or more over a few meters depending on the nature of the surface and the local meteorological conditions, with the variability being lower at night [9].When comparing the unrepresentative observations of these stations to the MODIS LSTs, the stronger heterogeneity resulted in greater differences in the bias and RMSEs of these stations between the daytime and nighttime because of the constant accuracy of the MODIS LST products within the same pixel.If these results were considered the final results for the accuracy of the MODIS LST, an obvious error would be introduced into the validation because The initial results revealed that the MODIS LSTs fit better with MODIS at nighttime than during the daytime, and the comparison results greatly differed between the daytime and nighttime for some stations, especially HZS, HYZ, NTZ, and LTZ, which are all Level-5 stations.The LST had stronger heterogeneity during the daytime than at nighttime.During the day, LSTs can vary by 10 K or more over a few meters depending on the nature of the surface and the local meteorological conditions, with the variability being lower at night [9].When comparing the unrepresentative observations of these stations to the MODIS LSTs, the stronger heterogeneity resulted in greater differences in the bias and RMSEs of these stations between the daytime and nighttime because of the constant accuracy of the MODIS LST products within the same pixel.If these results were considered the final results for the accuracy of the MODIS LST, an obvious error would be introduced into the validation because of the unrepresentativeness of the station observations when compared to the corresponding scale of the remote-sensing observations.The stronger heterogeneity during the daytime affected the Aqua MODIS LST and Terra MODIS LST validation process in a similar manner, so no significant differences in accuracy existed between the LST products from Terra MODIS and the LST products from Aqua MODIS when compared to the ground-measured LSTs at these sites, as shown in Table 3.

MODIS LST Product Validation Considering the Influence of Spatial Representativeness Evaluation
All the station observation validation results and representative station observation validation results were calculated and compared to discuss the effect of representativeness on the validation of satellite-retrieved LSTs.In total, 7527 station observations from eighteen stations were evaluated based on the Landsat 8 LST reference maps according to the satellite-overpass times and quality control flags of the MODIS LSTs.The RMSE and bias of the MOD/MYD11A1 LST products based on all the observations and the different levels of observations are listed in Table 4.For all the observations, the bias of the MODIS LST ranged from −0.27 K to 2.39 K, and the RMSE ranged from 3.32 K to 4.93 K.The RMSEs were the smallest for Level 1, which contained 2472 station observations.During both the daytime and nighttime, the bias between the station observations and MODIS LSTs was better than 1 K and the RMSE was less than 2 K.No significant differences existed between the daytime and nighttime comparison results for the Level 1 observations.These results using Level 1 observations are inconsistent with those from the traditional validation scheme in Section 4.3 and are more reasonable based on the retrieval accuracy of the MODIS LSTs during the daytime and nighttime for the same pixel.The comparison results between the MODIS LSTs and LST observations for Levels 2 and 3 had a similar bias and RMSE, which were not larger than those for all the observations.The results for the Level-4 observations had a larger bias and RMSE than those for Levels 3 and 2. The largest RMSE values were obtained when using Level 5 observations.The RMSE value increased from Level 1 to Level 5. Large differences existed in the bias and RMSE between the daytime and nighttime at all levels, except for Level 1, because of the greater spatial heterogeneity during the daytime.Table 4 indicates that the highly representative validated observations usually had good results.Thus, highly representative observations can more accurately describe satellite-retrieved LST values and limit spatial mismatch errors from point-station observations.Therefore, validating all the stations without grading their representativeness usually underestimated the accuracy of the satellite-retrieved LSTs; these underestimated accuracy values were larger for the daytime validation than the nighttime validation.The MOD columns show the results of a comparison between the LSTs from the MODIS onboard Terra and the ground-based measurements at the eighteen sites; b The MYD columns show the results of a comparison between the LSTs from the MODIS onboard Aqua and the ground-based measurements at the eighteen sites.

Other Potential Factors during the LST Validation Process
Other factors may potentially influence the accuracy of the validation results.In this study, we only focused on the sensor view zenith angle (VZA) and broadband sensitivity issues, which are the main factors that affect T-based validation for MODIS LSTs in addition to the representativeness of the station observations.The effects of these two factors are discussed in detail below.

Dependence of the LST Error on the Sensor VZA
The relationship between the error (the absolute difference between MODIS and ground-based LSTs) and the sensor VZA was first investigated at all HiWATER sites to analyze the potential factors that create large errors at certain sites.The VZAs differed between ground-based instruments and the MODIS sensor: the ground-measured LSTs were obtained at a VZA of 0 • , whereas the MODIS observations were acquired over a large range of VZAs (0-65 • ).Sufficient ground-based measurements were available to statistically analyze multiple VZAs using data from the eighteen sites.Scatterplots of the errors and sensor VZAs are presented in Figure 11.The average errors did not significantly depend on the VZAs during the daytime (see Figure 11a).By contrast, the probability of a larger error increased with the VZA for the nighttime scatterplot (Figure 11b).The average absolute bias for observations that were acquired with a smaller VZA (≤30 • ) was 0.2 K lower than that for those at greater VZAs (>30 • ).Greater errors for LSTs observed under larger sensor VZAs were also observed in a prior validation study [8,13], and the remote sensor may see different percentages of shadows at different VZAs during the daytime, when larger heterogeneities occur [58].The larger LST heterogeneity at certain sites was the most likely cause of the daytime scatterplot (Figure 11a), rather than the obvious dependence of the error of the LST on the sensor VZA.

Broadband Sensitivity Issue
The accuracy of LSTs from ground-based longwave-radiation measurements is another important factor that influences the evaluation of MODIS LSTs.The accuracy of longwave-radiation measurements and the accuracy of the broadband emissivity, which is used to calculate LSTs from ground-based longwave-radiation measurements, are the two major factors for ground-based LST estimation.The accuracy of longwave-radiation measurements depends on the sensor calibration.Calibration is typically performed before and after field deployment and during monthly fieldrouting exams [59], so the estimation accuracy of the broadband emissivity is a key parameter that influences the estimation accuracy of ground-based LST observations.Our previous study indicated that the maximum mean absolute error (MAE) of the LST was less than 0.3 K when the estimation emissivity MAE was ≤0.01, which was estimated by adding a series bias of ±0.001, ±0.002, ±0.003, ±0.004, ±0.005, and ±0.01 based on the ε values at the stations in the northern arid region of China, including ARC, HZZ, and DMZ (also called YK) [60].In this study, the satellite-estimated broadband-emissivity method was used based on the MODIS narrowband emissivities.A suitable set of coefficients for all the data was used to calculate the broadband emissivity, as outlined in Equation ( 7), according to the diversity of the land cover at the sites.In a study by Wang et al., the broadband emissivity that was estimated by this method was compared to three years of ground-based emissivity measurements at Gaize (32.30°N and 84.06°E, with an elevation of 4420 m) in the western Tibetan Plateau.The results showed that the broadband-emissivity calculation from the MODIS narrowband emissivities reasonably matched the ground measurements, with a standard deviation of 0.0085 and a bias of 0.0015 [45].Therefore, the average estimation error of the broadband emissivity based on this method was less than 0.037 K, which is the maximum average absolute error value of all the stations in northern arid China for an emissivity bias of 0.02.

Conclusions
In traditional validation schemes, as discussed in Section 4.3, station observations are usually considered as the true LST values of the corresponding pixels.Therefore, a source of uncertainty for the validation results is introduced when the representativeness of station observations is not evaluated.The spatial representativeness assessment of station observations is a key step to reliably validate satellite-retrieved LST products, which has barely been discussed by previous researchers.In this study, a new validation scheme was proposed that adds a key step to evaluate the spatial representativeness of station LST observations.Three indicators, namely, the DLCT, RB, and ASS, were constructed in this new scheme to assess and grade the representativeness of station observations.The representativeness of the station observations was divided into five levels for a 1-km spatial resolution pixel according to the values of these three indicators.Then, the proposed method

Broadband Sensitivity Issue
The accuracy of LSTs from ground-based longwave-radiation measurements is another important factor that influences the evaluation of MODIS LSTs.The accuracy of longwave-radiation measurements and the accuracy of the broadband emissivity, which is used to calculate LSTs from ground-based longwave-radiation measurements, are the two major factors for ground-based LST estimation.The accuracy of longwave-radiation measurements depends on the sensor calibration.Calibration is typically performed before and after field deployment and during monthly field-routing exams [59], so the estimation accuracy of the broadband emissivity is a key parameter that influences the estimation accuracy of ground-based LST observations.Our previous study indicated that the maximum mean absolute error (MAE) of the LST was less than 0.3 K when the estimation emissivity MAE was ≤0.01, which was estimated by adding a series bias of ±0.001, ±0.002, ±0.003, ±0.004, ±0.005, and ±0.01 based on the ε b values at the stations in the northern arid region of China, including ARC, HZZ, and DMZ (also called YK) [60].In this study, the satellite-estimated broadband-emissivity method was used based on the MODIS narrowband emissivities.A suitable set of coefficients for all the data was used to calculate the broadband emissivity, as outlined in Equation ( 7), according to the diversity of the land cover at the sites.In a study by Wang et al., the broadband emissivity that was estimated by this method was compared to three years of ground-based emissivity measurements at Gaize (32.30 • N and 84.06 • E, with an elevation of 4420 m) in the western Tibetan Plateau.The results showed that the broadband-emissivity calculation from the MODIS narrowband emissivities reasonably matched the ground measurements, with a standard deviation of 0.0085 and a bias of 0.0015 [45].Therefore, the average estimation error of the broadband emissivity based on this method was less than 0.037 K, which is the maximum average absolute error value of all the stations in northern arid China for an emissivity bias of 0.02.

Conclusions
In traditional validation schemes, as discussed in Section 4.3, station observations are usually considered as the true LST values of the corresponding pixels.Therefore, a source of uncertainty for the validation results is introduced when the representativeness of station observations is not evaluated.The spatial representativeness assessment of station observations is a key step to reliably validate satellite-retrieved LST products, which has barely been discussed by previous researchers.In this study, a new validation scheme was proposed that adds a key step to evaluate the spatial representativeness of station LST observations.Three indicators, namely, the DLCT, RB, and ASS, were constructed in this new scheme to assess and grade the representativeness of station observations.The representativeness of the station observations was divided into five levels for a 1-km spatial resolution pixel according to the values of these three indicators.Then, the proposed method was used to evaluate the spatial representativeness of HiWATER LST observations for 1-km-pixel daily MODIS LST products.The analysis showed that the new validation scheme can effectively limit the error that is introduced by spatial mismatches between the station observations and remote-sensing products.This spatial representative assessment synthesized the three indicators, providing quantitative and accurate descriptions and reliably evaluating the observations' representativeness, although reasonable thresholds for the three indicators should consider the LST products' spatial resolution, the theoretical accuracy of the retrieval algorithm, and instrument error.Therefore, several conclusions can be drawn.
First, the traditional validation results when using all the station observations showed obvious errors and RMSE values between the daytime and nighttime validations, especially when using observations from the downstream stations.The retrieval method for the MODIS LST products could obtain LST products with similar accuracy during the daytime and nighttime.Therefore, the error was likely introduced by unrepresentative station observations.According to the HRB traditional validation sample, this effect caused a maximum bias of 8.03 K and a maximum RMSE of 3.25 K according to the difference between the daytime and nighttime validation results from the same stations.
Second, the monthly representativeness analysis illustrated that the spatial contexture heterogeneity and point-to-pixel consistency of the LST changed with variations in plant growth and other factors, such as human activity, the solar-radiation intensity, and the local climate, which created irregular changes in the monthly spatial representativeness of the stations.The monthly changes in all the factors that influenced the representativeness were not always similar between consecutive years, so the spatial representativeness in the same month of different years may have greatly changed for a given station.Therefore, the spatial representativeness for consecutive years could be reassessed through a rigorous LST product-validation process.
Third, the RMSE of the MOD/MYD11A1 products increased from 1.58 K to 6.27 K from Level 1 to Level 5, and the RMSE values from Level 1 to Level 3 were smaller than those across all the observations.For all levels except Level 1, the RMSE values were larger during the daytime than those at nighttime because of the stronger LST heterogeneity during the daytime.The obvious differences in the bias and RMSE values among the levels indicated that the representativeness method could sufficiently differentiate the spatial representativeness level within 1-km pixels.Moreover, the Level-1 LST observations were acceptable validation data for the MODIS 1-km LST products.While the RMSE difference between all observations (using in the traditional validations) and Level 1 observations validations are 3.0 K in daytime and 1.7 K in nighttime, which indicates the error introduced by the traditional validation without the representativeness assessment.Therefore, the error of the MOD/MYD11A1 products was better than 1 K and the RMSE was less than 2 K.Moreover, no obvious differences existed in the accuracy of the MODIS LST products among the four daily times that Terra and Aqua passed.
Finally, the bias increased from −0.19 K at Level 1 to 3.6 K at Level 5, and the RMSE increased from 1.58 to 6.27.Thus, a bias difference of 3.79 K and a RMSE difference of 4.69 K existed between Level 1, which was the most representative level, and Level 5, which was the least representative level.The dependence of the absolute biases on the sensor VZA during the daytime was strongly influenced by the land-surface heterogeneity at heterogeneous sites.During the nighttime, the LSTs of these sites were more homogeneous and any surface-heterogeneity effects were smaller, so the average absolute bias for observations that were acquired under lower VZA (≤30 • ) was 0.2 K lower than those that were acquired at greater VZAs (>30 • ).The estimated broadband emissivities from the narrowband MODIS LST/LSE products retrieved by the day/night algorithm varied with the land-surface conditions, such as vegetation growth and land cover, with a bias of 0.0015.Therefore, the estimation error of the broadband emissivities was less than 0.004 K. Compared to the effects of the representativeness, the errors from these last two factors were very small.Thus, the evaluation of spatial representativeness is a key step to reliably validate LST products with special spatial resolution, which is the greatest advantage of our proposed validation scheme.
A more reasonable and accurate scheme to validate remotely sensed LST products was proposed in this study.However, areas with more heterogeneous LSTs are not suitable for validation with single-point ground-based measurements and require conditional upscaling by multi-point ground-based measurements.Therefore, this scheme can be further improved for different conditions, and can be actually used to study potential new sites for LST validation.Furthermore, the thresholds of the three indicators were suitable for grading the levels of station observations in a 1-km pixel.This scheme can also be used to evaluate other LST products by renewing the thresholds and evaluating the spatial representativeness in other pixel grids.

Figure 1 .
Figure 1.New scheme for land surface temperature (LST) validation based on the assessment of local spatial representativeness (site level).In the scheme, LCT is the abbreviation of land-cover type, and NDVI is the abbreviation of normalized difference vegetation index.

Figure 1 .
Figure 1.New scheme for land surface temperature (LST) validation based on the assessment of local spatial representativeness (site level).In the scheme, LCT is the abbreviation of land-cover type, and NDVI is the abbreviation of normalized difference vegetation index.

Figure 2 .
Figure 2. Study area and locations of the validation stations.

Figure 2 .
Figure 2. Study area and locations of the validation stations.

Figure 3 .
Figure 3. Photos of the 18 Heihe Watershed Allied Telemetry Experimental Research (HiWATER) stations.The two pyrgeometers that were used to record the longwave radiation were deployed at an average height of 6 m, with one facing upwards and the other facing downwards.

Figure 3 .
Figure 3. Photos of the 18 Heihe Watershed Allied Telemetry Experimental Research (HiWATER) stations.The two pyrgeometers that were used to record the longwave radiation were deployed at an average height of 6 m, with one facing upwards and the other facing downwards.

Figure 4 .
Figure 4.The dominant land cover type (DLCT) for the HiWATER stations.

Figure 5 .
Figure 5. Monthly relative bias (RB) of the observations from the 18 stations within 1-km MODIS LST pixels.(a), (b) and (c) respectively show the monthly RB of the stations in the upstream, midstream and downstream of the Heihe River Basin (HRB).

Figure 6 .
Figure 6.Monthly normalized difference vegetation index (NDVI) at the eighteen sites from 2013 to 2014.

Figure 5 .
Figure 5. Monthly relative bias (RB) of the observations from the 18 stations within 1-km MODIS LST pixels.(a), (b) and (c) respectively show the monthly RB of the stations in the upstream, midstream and downstream of the Heihe River Basin (HRB).

Figure 5 .
Figure 5. Monthly relative bias (RB) of the observations from the 18 stations within 1-km MODIS LST pixels.(a), (b) and (c) respectively show the monthly RB of the stations in the upstream, midstream and downstream of the Heihe River Basin (HRB).

Figure 6 .
Figure 6.Monthly normalized difference vegetation index (NDVI) at the eighteen sites from 2013 to 2014.

Figure 6 .
Figure 6.Monthly normalized difference vegetation index (NDVI) at the eighteen sites from 2013 to 2014.

Figure 7 .
Figure 7. Monthly average structure scale (ASS) of the stations.(a), (b) and (c) respectively show the monthly ASS of the stations in the upstream, midstream and downstream of the HRB.

Figure 7 .
Figure 7. Monthly average structure scale (ASS) of the stations.(a), (b) and (c) respectively show the monthly ASS of the stations in the upstream, midstream and downstream of the HRB.

Figure 8 .
Figure 8. Grading results of the stations.(a,b) show the monthly representativeness level of the upstream stations; (c,d) show the monthly representativeness level of midstream and downstream stations

Figure 8 .
Figure 8. Grading results of the stations.(a,b) show the monthly representativeness level of the upstream stations; (c,d) show the monthly representativeness level of midstream and downstream stations

Figure 9 .
Figure 9. Scatterplots of the daytime comparison results of the LSTs from MODIS and the LST measurements at the eighteen HiWATER sites.

Figure 9 .
Figure 9. Scatterplots of the daytime comparison results of the LSTs from MODIS and the LST measurements at the eighteen HiWATER sites.

Figure 10 .
Figure 10.Scatterplots of the nighttime comparison results of the LSTs from MODIS and the LST measurements at the eighteen HiWATER sites.

Figure 10 .
Figure 10.Scatterplots of the nighttime comparison results of the LSTs from MODIS and the LST measurements at the eighteen HiWATER sites.

a
The MOD columns show the results of a comparison between the LSTs from the MODIS onboard Terra and the ground-based measurements at the eighteen sites; b The MYD columns show the results of a comparison between the LSTs from the MODIS onboard Aqua and the ground-based measurements at the eighteen sites.

Figure 11 .
Figure 11.Relationships between the bias and zenith angle of MODIS when viewing the pixels: (a) relationship during the daytime; and (b) relationship at nighttime.

Figure 11 .
Figure 11.Relationships between the bias and zenith angle of MODIS when viewing the pixels: (a) relationship during the daytime; and (b) relationship at nighttime.

Table 1 .
Information for the stations in this study.

Table 1 .
Information for the stations in this study.

Table 2 .
Grading of the spatial representativeness.

Table 2 .
Grading of the spatial representativeness.

Table 3 .
Summary of the statistical comparison results between the MODIS LSTs and the LST measurements from the HiWATER sites.

Table 4 .
Validation results for MOD/MYD11A1 based on the spatial representativeness levels of the station observations.