Spectral emissivity (SE) measurement uncertainties across 2.5–14 μm derived from a round-robin study made across international laboratories

Abstract: Information on spectral emissivity (SE) is vital when retrieving and evaluating land surface temperature (LST) estimates from remotely sensed observations. SE measurements often come from spectral libraries based upon laboratory spectroscopic measurements, with uncertainties typically derived from repeated measurements. To go further, we organised a ‘round-robin’ inter-comparison exercise involving SE measurements of three samples collected at seven different international laboratories. The samples were distilled water, which has a uniformly high spectral emissivity, and two artificial samples (aluminium and gold sheets laminated in polyethylene), with variable emissivities and largely specular and Lambertian characteristics respectively. Large differences were observed between some measurements, with standard deviations over 2.5–14 μm of ± 0.092, 0.054 and 0.028 emissivity units (15.98%, 7.56% and 2.92%) for the laminated aluminium sheet, laminated gold sheet and distilled water respectively. Wavelength shifts of up to 0.09 μm were evident between spectra from different laboratories for the specular sample, attributed to system design interacting with the angular behaviour of emissivity. We quantified the impact of these SE differences on satellite LST estimation and found that emissivity differences resulted in LSTs differing by at least 3.5 K for each artificial sample and by more than 2.5 K for the distilled water. Our findings suggest that variations and a gold-coated integrating sphere (mercury-cadmium-telluride)

results. derivation, the sample and reference samples are mounted simultaneously (e.g. through multiple ports or use of the internal sphere wall as the reference), their measurements made consecutively (e.g. through use of an internal rotating mirror), with the sample spectral reflectance again calculated through their ratio. Theoretically, the comparative method should provide some benefit since it avoids a known limitation in the substitution method (the so-called 'substitution error'), where changes in the total internal sphere reflectance between measurements of the reference and the sample cause underestimation of sample reflectances (and thus overestimation of emissivity) as discussed in [30]. Hardy and Pineo [31] determined that the substitution error could be as much as 25% for low reflectance samples and 12% for samples with medium reflectance. Corrections have been developed [32,33] but even these are known to include errors of up to 1% from approximations in the calculations. However, using both the substitution and comparative methods with the setup at UT-ITC, Hecker et al. [27] observed differences between the derived SEs, with those calculated using the substitution method in closer agreement with other spectra, thus questioning the assumption that the comparative method provides improved results. They attributed these differences to variations in the measurement geometries between the reference and sample measurements made using the comparative method. The measurement setup at the KCL laboratory has been designed to attempt to overcome this issue, and

Two artificial SE samples were used in this study, one specular and one diffuse, to test the setup performances for samples with different scattering properties. Sample 1 was a thin 60 mm × 60 mm aluminium sheet showing specular reflective behaviour and laminated in polyethylene (PE).

Sample 2 was a thin 47 mm × 57 mm diffusely reflective gold sheet also laminated in PE. The samples are shown in Figure 2 and were selected primarily for their robustness and their ability to be used in multiple different laboratory setups having different sample holder sizes and different measurement alignments (e.g. side-looking and down-looking instrumentation). Additionally, both have known spectrally varying properties over the 2.5–14 μm spectral region since the metal foils have low emissivity but the PE film has spectral regions of high absorptivity that result in spectral regions of high emissivity [34]. Due to the nature of the materials used in these reference samples, measurements of SE that require heating of the samples are inappropriate and only laboratories where SE measurements

In addition to the two artificial SE standards shown in Figure 2, the laboratories participating in the Round Robin were also requested to make measurements of a sample of distilled water if their setup permitted. Distilled water is widely available and its SE is well known and available in the ECOSTRESS spectral library, with SE measurements of distilled water featuring in other inter-comparison studies including [27]. Additionally, since distilled water has a low spectral reflectance of only a few percent, the retrieval of accurate SE poses a challenge for the signal-to-noise capability of setups that measure in DHR mode [28], thus providing a good test of a laboratory's capability.

water spectrum measured by Johns Hopkins University from the ECOSTRESS spectral library [25], also considered in [27] and which was found by [28] to be within 0.17% of a theoretical spectrum of distilled water calculated using optical constants of water in the infrared [35]. This spectrum is identifiable as ESL.

The setup and instruments used within each of the seven laboratories vary in design, interferometer type, age, reference standard (typically gold), and general measurement protocols. A detailed description of each setup is provided in Appendix A, with an overview in Table 1. Note that CSIRO were able to use two different instruments in the Round Robin - a Bruker VERTEX 80v with an integrating sphere with multiple possibilities of port placement for the sample and reference standards, both halfway (Oct 2018) and at the end (Sept 2019) of the exercise to be re-measured on the same setup and with the same methodology to check for absolute changes to the samples. ONERA's Infragold reference standard was additionally sent to KCL to be measured using the KCL setup to determine the impact of using a different laboratory's reference standard to derive reflectance.

The SEs of the two artificial samples and distilled water measured using the setups listed in Table 2 are shown in Figures Table 2.

bands for CO2 and H2O compared to the LWIR region - and the differences in how each setup compensates for these atmospheric effects (if at all). There seems to have been an issue with the CO2 purging in the DLR setup when measuring the artificial samples, as these results report an increase in emissivity in the CO2 absorption band (~4.3 μm) which is not present in the measurements from the other laboratories, as can be seen in Figure 4. This is not apparent in the DLR measurement of distilled water, however (Figure 5).

Table 3. Mean (μ) and standard deviation (σ) of SE (ε) for each sample averaged over a specified wavelength range, calculated using all spectra. The number in brackets shows the standard deviation as a percentage of the mean. The subscripts a, b and c refer to the wavelength ranges averaged over,

As shown by the standard deviations in Table 3, SE differences are largest for the specular sample (Sample 1), where the standard deviation of the measurements is ± 0.092 over the full wavelength range (2.5–14 μm). The maximum observed difference between two measurements (DLR and CSIRO BG-W_S-T, Sample 1) is 0.762 emissivity units, but this occurs around 4.3 μm and is thus within the CO2 absorption band and seems likely to be associated with insufficient atmospheric compensation in the DLR system, as discussed earlier. However, DLR and CSIRO consistently produce the highest and lowest emissivities respectively, as evident from Table 4, which presents the mean absolute differences of each individual measurement from the mean of all measurements. The measurements made at DLR have the greatest positive bias compared to the mean while all measurements made at CSIRO using the VERTEX 80v FTIR spectrometer (with the exception of that with the reference in the lower port and the

Fewer participants made SE measurements of distilled water (Figure 5), but there is greater agreement among these than for the artificial samples, with reduced standard deviations in all spectral regions (Table 3). Much of the variation in the distilled water spectra appears due to noise, given the

Table 4. At CSIRO, all the SE measurements made using the Bruker VERTEX 80v FTIR spectrometer were derived using the comparative method, with sample and reference simultaneously

highlighting the wavelength shifts that appear in the 9.8–11 μm spectral region. The mean SE of all measurements is also shown.

A more likely cause of the shifts was identified as the different incident angles in each method. This is because, for a specular sample, the resonance wavelength will change with incident angle, assuming a cavity effect due to the thin layer coating. For example, the ONERA measurement sequence

Table 2.

Other potential causes of the wavelength shifts could be changes in the water vapour and CO2 conditions between the sample and reference measurements, non-uniformity in the PE film structure and thickness for Sample 1, different sample orientations at time of measurement as in [44], or differences in the spectral data interval as detailed in [29], caused for example by different zero-filling factor settings. To evaluate the impact of sample orientation on the position of the spectral features, measurements were made of Sample 1 at KCL at different orientations (0° to 315° in increments of 45°). The locations of the spectral features in this spectral region agreed between measurements at 0°, 90°, 180° and 270° but small wavelength shifts (Δλ ≈ 0.04 μm) were observed between these measurements and the measurements at 45°, 135°, 225° and 315° (which were in agreement with each other). It is therefore likely that the different sample orientations or differences in the illumination angles used within the different measurement setups could at least partly explain the spectral shifts observed.
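A wavelength shift of this kind can be quantified by locating the same spectral feature in two spectra and differencing the positions. The sketch below is illustrative only and is not the procedure used in the study: the function names, the choice of the emissivity minimum as the tracked feature, and the default 9.8–11 μm window are all assumptions.

```python
import numpy as np

def feature_wavelength(wavelength_um, emissivity, window=(9.8, 11.0)):
    """Return the wavelength (um) of the emissivity minimum inside `window`.

    Hypothetical helper: tracking the minimum is an assumption; a maximum or
    a fitted band centre could be used instead.
    """
    wl = np.asarray(wavelength_um, dtype=float)
    em = np.asarray(emissivity, dtype=float)
    mask = (wl >= window[0]) & (wl <= window[1])
    if not mask.any():
        raise ValueError("no spectral samples fall inside the window")
    return wl[mask][np.argmin(em[mask])]

def feature_shift(wl_a, em_a, wl_b, em_b, window=(9.8, 11.0)):
    """Shift (um) of the feature in spectrum B relative to spectrum A."""
    return (feature_wavelength(wl_b, em_b, window)
            - feature_wavelength(wl_a, em_a, window))

# Synthetic example: two Gaussian dips offset by 0.04 um
wl = np.linspace(9.5, 11.5, 2001)
spectrum_0deg = 1.0 - 0.3 * np.exp(-((wl - 10.40) / 0.1) ** 2)
spectrum_45deg = 1.0 - 0.3 * np.exp(-((wl - 10.44) / 0.1) ** 2)
shift = feature_shift(wl, spectrum_0deg, wl, spectrum_45deg)
```

The resolution of such an estimate is limited by the spectral sampling interval, which is one reason differing zero-filling settings can themselves produce apparent shifts.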

To identify whether the cause of the differences could be attributed to different reference standards used in the substitution approach, a comparison of SEs calculated using two different laboratories' reference standards was conducted, shown in Figure 11. Using ONERA's reference standard (with absolute reflectance provided by ONERA) within the KCL setup in substitution mode reduces the differences between the measured emissivities of the artificial samples by between 10 and 50%. However, it does not equalise them, with the KCL measured emissivities - including those derived using the

The range of the LSTs calculated using the distilled water emissivities is reduced compared to those calculated using the artificial sample emissivities, reflecting the greater agreement between the SE measurements of distilled water. However, the range of the LSTs calculated using distilled water spectra convolved to the ASTER TIR bands is still ~2.5 K in all bands, thus still exceeding the GCOS target accuracy requirements.
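To give a feel for how an emissivity difference propagates into temperature, the following sketch inverts the Planck function at a single wavelength. It is a deliberately simplified illustration and not the LST retrieval used in this study: it assumes a monochromatic channel at 10 μm, no atmosphere, and no reflected downwelling radiance (which in practice partly offsets emissivity errors). All function names are hypothetical.

```python
import math

C1 = 1.191042972e8  # 2hc^2 in W m^-2 sr^-1 um^4
C2 = 1.438776877e4  # hc/k in um K

def planck_radiance(wavelength_um, temperature_k):
    """Blackbody spectral radiance in W m^-2 sr^-1 um^-1."""
    return C1 / (wavelength_um ** 5
                 * (math.exp(C2 / (wavelength_um * temperature_k)) - 1.0))

def brightness_temperature(wavelength_um, radiance):
    """Invert the Planck function for temperature in K."""
    return C2 / (wavelength_um
                 * math.log(C1 / (wavelength_um ** 5 * radiance) + 1.0))

def lst_error_from_emissivity(wavelength_um, true_lst_k,
                              true_emissivity, assumed_emissivity):
    """LST error (K) when the surface emits with `true_emissivity` but the
    retrieval assumes `assumed_emissivity` (simplified: emitted term only)."""
    emitted = true_emissivity * planck_radiance(wavelength_um, true_lst_k)
    retrieved = brightness_temperature(wavelength_um,
                                       emitted / assumed_emissivity)
    return retrieved - true_lst_k

# Example: emissivity assumed 0.03 too low at 10 um for a 300 K surface
error_k = lst_error_from_emissivity(10.0, 300.0, 0.96, 0.93)
```

Under these assumptions an underestimated emissivity yields an overestimated temperature. A band-integrated calculation with sensor response functions and an atmospheric term would be needed to reproduce the figures reported here.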

Results from all three samples considered indicated that three of the four CSIRO measurements reference target in the lower port and the sample in the top port (CSIRO BG-B_S-T) the only one in agreement with the others. Given the similarity in the spectral shapes measured by this setup and those measured at other laboratories for all three samples, this could suggest that CSIRO may need to re-characterise their reflectance standards for the other three configurations used with the VERTEX 80v spectrometer. However, these differences may also be due to the different optical path lengths of the sample beam with each permutation, or to directional reflectance effects of the sample at the different incident angles of each permutation. Further investigation is recommended if CSIRO wish to use any of the three biased configurations (e.g. to measure liquid samples, which is not currently possible with the sample in top port configuration). Removing these three measurements from the analysis reduces standard deviations to ± 0.089 (14.69%), ± 0.038 (5.16%) and ± 0.008 (< 1%) for Sample 1, Sample 2 and distilled water respectively across the 2.5–14 μm wavelength range. Furthermore, the impacts on LST are reduced considerably without these measurements, with the range of the LSTs calculated using distilled water spectra convolved to the ASTER TIR bands reduced to < 0.45 K in all bands.
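The effect of excluding measurements on the reported spread can be sketched as follows. This is one plausible reading of the Table 3 statistic, not a confirmed reproduction of it: the sketch assumes the standard deviation is taken across measurements at each wavelength (sample standard deviation, `ddof=1`) and then averaged over the wavelength range, with the percentage expressed relative to the mean emissivity. The function name and the `exclude` parameter are hypothetical.

```python
import numpy as np

def spectral_spread(wavelength_um, spectra, band=(2.5, 14.0), exclude=()):
    """Mean emissivity, mean inter-measurement std, and std as % of mean.

    spectra: array of shape (n_measurements, n_wavelengths), all sampled
    on the common grid `wavelength_um`.
    exclude: row indices of measurements to leave out (assumed interface).
    """
    wl = np.asarray(wavelength_um, dtype=float)
    stack = np.asarray(spectra, dtype=float)
    dropped = set(exclude)
    keep = [i for i in range(stack.shape[0]) if i not in dropped]
    mask = (wl >= band[0]) & (wl <= band[1])
    subset = stack[keep][:, mask]
    mean = subset.mean()
    # Std across measurements at each wavelength, then averaged over the band
    std = subset.std(axis=0, ddof=1).mean()
    return mean, std, 100.0 * std / mean

# Toy data: three flat spectra, the third strongly biased low
wl = np.array([2.5, 5.0, 10.0, 14.0, 15.0])
spectra = [[0.9] * 5, [1.0] * 5, [0.5] * 5]
mean_all, std_all, pct_all = spectral_spread(wl, spectra)
mean_sub, std_sub, pct_sub = spectral_spread(wl, spectra, exclude=(2,))
```

In the toy data, dropping the biased spectrum lowers the spread, mirroring qualitatively the reduction described above.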

Differences were also observed between measurements made on the KCL setup with the reference

The increased variability in the MWIR relative to the LWIR is likely due to the increased atmospheric effects in this region, with the DLR measurements of Samples 1 and 2 clearly impacted in the CO2 region (Figure 4). An alternative explanation for the reduced variability in the LWIR for both artificial samples could be that this is an area of high emissivity (and thus low reflectance), which Hecker et al. [27] observed to be areas of better agreement in their intercomparison of emissivity spectra from different laboratories.

This latter interpretation could also be why the distilled water measurements (with uniformly high emissivity) had reduced variability compared to the artificial sample SE measurements (which had variable emissivities between 0 and 1). However, it is more likely that the increased variability of the SE measurements of the artificial samples is due to the composition of these samples and their interaction with different setups. Tsilingiris [50] provides a transmittance spectrum for polyethylene (PE), and considering this against the measured spectra of Samples 1 and 2 in Figure 4, it is clear that variability amongst the different measurements is lower in the regions where PE has a low transmittance (~3.5 μm, 6.9 μm and 13.8 μm). In all other spectral regions, the PE forms a multilayer system which is potentially sensitive to directional illumination characteristics. Differences in the incident angles upon the samples within the different measurement setups could therefore at least partly explain the SE variations seen. Given that Sobrino and Cuenca [51] observed that emissivities in field measurements tended to decrease with increases in observation angle, results from this study indicate that future work should be conducted to explore whether emissivities from DHR setups in the laboratory similarly correlate with incident angle. Materials with expected directional behaviour should in particular be considered.

The observed spectral shifts between different measurements of the specular sample also raise interesting questions about the impact of incident angles on SE measurements. While this may not be an issue for non-specular samples without coating (and therefore for most natural samples), these the incident angle on the spectral stability and absolute emissivity. Spectral shifts of the magnitude observed in this study will have more implications when working with data from hyperspectral rather than multispectral thermal imagers. Conversely airborne hyperspectral instruments such as NASA such TES approaches often rely on laboratory emissivity spectra to derive empirical relationships used within the algorithm, and so their accuracy is still important even though for each hyperspectral image pixel the emissivity is directly retrieved [10]. Such wavelength shifts may also have implications for use of spectral emissivity features in e.g. mineral identification studies [54], and also may affect in situ LST measurements given that radiometers commonly used in LST validation studies are affected, such as the Heitronics KT15.85 IIP radiometer with a spectral range of 9.6–11.5 μm [16,55].
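For a broadband radiometer, what matters is the emissivity averaged over the instrument band. The sketch below computes such a band average under the loudly stated assumption of a flat (boxcar) spectral response between 9.6 and 11.5 μm; a real instrument's response function is not flat and would need to be used as a weighting. The function name is hypothetical.

```python
import numpy as np

def band_average_emissivity(wavelength_um, emissivity, band=(9.6, 11.5)):
    """Mean emissivity over `band`, assuming a flat spectral response."""
    wl = np.asarray(wavelength_um, dtype=float)
    em = np.asarray(emissivity, dtype=float)
    mask = (wl >= band[0]) & (wl <= band[1])
    if mask.sum() < 2:
        raise ValueError("need at least two spectral samples inside the band")
    wl_b, em_b = wl[mask], em[mask]
    # Trapezoidal integral divided by the wavelength interval covered
    integral = np.sum(0.5 * (em_b[1:] + em_b[:-1]) * np.diff(wl_b))
    return integral / (wl_b[-1] - wl_b[0])

# Example: a spectrally flat sample returns its own emissivity
wl = np.linspace(8.0, 14.0, 601)
flat_band_mean = band_average_emissivity(wl, np.full_like(wl, 0.95))
```

Comparing band averages of two spectra that differ only by a wavelength shift gives a quick estimate of how much the shift matters for a given band.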

Comparing the measurements from the different laboratories we found that the inter-setup variability of the SE measurements was larger than anticipated, with differences in magnitude and spectral shape. Standard deviations of ± 0.092 (15.98%) and ± 0.054 (7.56%) were identified across the 2.5–14 μm spectral range for Samples 1 and 2 respectively. Repeated measurements using the same measurement setup at different times confirmed that observed SE differences were not attributable to changes in the sample properties over the course of the study but were rather due to the different setups and measurement procedures used in the various laboratories. Variability was greater in the MWIR than in the LWIR spectral region, likely due to differing efficiencies of atmospheric purging which impact this region more. SE differences across the LWIR atmospheric window (8–14 μm), which is the most important for the remote sensing of LST, were ± 0.046 and ± 0.037 respectively for Samples 1 and the specular sample (Sample 1) over this region was attributed to spectral shifts, with differences between identification of spectral maxima and minima of up to 0.09 μm between different setups, and up to 0.13 μm between different positional permutations on one setup, observed in the 9.8–11 μm spectral range in particular. Investigation indicated potential causes of these spectral shifts to be different sample orientations during measurements or differences in the incident angles within the different measurement setups. The latter cause was also identified as a potential cause of the absolute emissivity differences. Further investigation is therefore recommended into the impact of directional effects in laboratory measurements of emissivity (particularly for materials with known directional behaviour) given recent advances in understanding the angular dependence of emissivity for field

Use of different reference standards was found to contribute to the observed SE differences between different laboratory measurements but not to be the sole factor. Nonetheless, the differences observed from use of a different reference standard suggest that uncertainty in the reference standard calibration is a key factor in emissivity uncertainty in laboratory measurements. Regular calibration of the reference standards is recommended to reduce this uncertainty.

SE variability was comparatively lower for distilled water than for the artificial reference samples, with a mean emissivity of 0.962 ± 0.028 determined over the 2.5–14 μm spectral range. These

The impact of the determined spectral emissivity differences on LST retrieval was evaluated

Funding: Support for this research came partly from NERC National Capability funding to the National Centre
setup at King's College London.

The measurement system and processes of each laboratory that participated in the Round Robin spectral emissivity inter-comparison are summarised below in the order that the measurements were made.

Figure A1). There are two internal sources (an air-cooled globar for the MIR and a tungsten lamp for the NIR) and an external high-power water-cooled globar for the MIR. The external IR source was used for this study for improved signal-to-noise ratio.

In addition to the MCT detector, the sphere is equipped with an external InGaAs detector to extend coverage down to the NIR, thus enabling spectral measurements from 0.7 to 16 μm. The entire system (including the integrating sphere) is continuously purged with H2O- and CO2-free air at a flow rate of at least 200 L/h to reduce atmospheric features in the spectra and prevent degradation of the KBr beamsplitter. The incident beam has to be convergent at an angle of 3–4°. The incidence angle is 12° in order to prevent the specularly reflected part from escaping through the entrance hole. The measured sample area is about 25 mm in diameter. Spectrometer settings used in this study are given in Table Figure

The design of the sphere at King's allows for both the substitution and the comparative calibration procedures, with an internal rotating mirror to enable the comparative method, as shown in Figure A2. To perform a reference measurement in comparative mode, the folding mirror of the sphere (which also acts as a baffle) is rotated such that the incoming energy is reflected onto the gold-coated sphere wall instead of the sample. After the reference measurement, the folding mirror is rotated back and the sample in the sample port is measured.

The number of scans collected depends on the signal available for each material, with a larger number of measurements made for low signal spectra (e.g. open port or distilled water). In this study,

The instrument setup and general measurement procedures are explained in depth in [27].

Currently PSL operates two identical Bruker VERTEX 80v vacuum FTIR spectrometers; one spectrometer is equipped with aluminium mirrors optimized for the UV, visible and near-IR spectral ranges, the second features gold-coated mirrors for the near to far IR spectral range. Hemispherical reflectance is measured under purging conditions, covering the 0.2 μm to above 200 μm spectral range. The two instruments share the collection of detectors, beamsplitters, and optical accessories that are available in their equipment to cover a very wide spectral range, and this facilitates the cross-calibration procedures.

Figure A7. The laboratory setup at PSL

The instruments and the optical accessory units used are fully automated and the data calibration and reduction are made with quality-controlled software, developed following the DLR quality management rules. Figure A7 shows the PSL laboratory. Two integrating spheres (one with gold-coated surfaces, the other with PTFE coating) are available for hemispherical reflectance measurements. Reflectance measurements are calibrated by comparing with spectroscopic measurements of well characterized references (PTFE for UV, Spectralon for VIS, Infragold for MIR). Figure A8 shows the integrating sphere that was used to measure the two Optosol samples (1 and 2), the distilled water sample, and the reference (Infragold) that was used for calibration.

Figure A8. Gold-coated integrating sphere and reference used for the experiment at PSL (DLR).

The measurement process is as follows. Firstly, a reference spectrum is acquired for each set-up. Then the spectrometer software measures the sample and directly divides its spectrum by the measured reference spectrum. As a result, one has the relative reflectance of the sample to the reference used (Infragold in this case). By correcting for the true Infragold reflectance spectrum (provided by the producer of the reference itself), one gets the absolute hemispherical reflectance of the sample.
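The chain from raw signals to emissivity can be sketched as below. This is a minimal illustration under stated assumptions rather than the PSL software: it assumes the relative reflectance is scaled by the reference's absolute reflectance (the usual convention for a non-ideal reference), and that the sample is opaque so that emissivity equals one minus the hemispherical reflectance, consistent with the link between reflectance and emissivity noted for the substitution error. The function names are hypothetical.

```python
import numpy as np

def absolute_reflectance(sample_signal, reference_signal, reference_reflectance):
    """Absolute hemispherical reflectance from sample/reference signals.

    reference_reflectance: absolute reflectance of the reference standard
    (e.g. a supplier-provided Infragold spectrum) on the same spectral grid.
    """
    sample_signal = np.asarray(sample_signal, dtype=float)
    reference_signal = np.asarray(reference_signal, dtype=float)
    reference_reflectance = np.asarray(reference_reflectance, dtype=float)
    relative = sample_signal / reference_signal  # reflectance relative to reference
    return relative * reference_reflectance      # assumed scaling convention

def emissivity_from_reflectance(reflectance):
    """Emissivity of an opaque sample, assuming Kirchhoff's law (1 - R)."""
    return 1.0 - np.asarray(reflectance, dtype=float)

# Example: sample returns half the reference signal; reference reflects 96%
reflectance = absolute_reflectance([0.5], [1.0], [0.96])
emissivity = emissivity_from_reflectance(reflectance)
```

Written this way, any error in the assumed reference reflectance propagates linearly into the sample reflectance, which is why reference standard calibration matters for the comparison between laboratories.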

The number of scans depends on the spectral resolution. At 0.5 cm⁻¹ resolution, 300 scans were

The last Bruker conformity certificate was obtained on 22/11/2017, with a 2-year validity. The principal parameters of the set-up are summarised in Table A4.

The measurement is a directional (13° incidence) hemispherical reflectance, obtained using four successive acquisitions of 256 scans each to compensate for the substitution error (see Figure A10).

Hemispherical reflectance denoted dh is then retrieved from these successive measurements

The demineralised water was measured in the Bruker-supplied black aluminium sample cups and placed in the bottom port.