1. Introduction
Gaofen-5 01A (GF-5A) is a key component of China’s civilian High-Resolution Earth Observation System, designed to support comprehensive environmental monitoring and resource assessment. Launched on 8 December 2022, the satellite carries three primary payloads: the Advanced Hyperspectral Imager (AHSI), the Environmental Trace Gas Monitoring Instrument (EMI), and the Wide-swath Thermal Infrared Imager (WTI) [1]. WTI employs a whiskbroom cross-track scanning configuration and provides an exceptionally wide swath of 1500 km with a ground sampling distance of 100 m across four longwave infrared bands. This combination of wide coverage and multiple thermal-window channels enables enhanced spatiotemporal observations for applications such as land surface temperature (LST), sea surface temperature (SST), drought monitoring, and urban thermal environment assessment [2]. Thermal infrared (TIR) remote sensing retrieves key parameters—such as land surface temperature, surface emissivity, and atmospheric characteristics—by measuring longwave radiance signals [3]. These products are widely used in climate diagnostics, surface energy budget analysis, disaster monitoring, and environmental assessment. Such applications require radiometric calibration that is accurate, temporally stable, and traceable across long-term satellite missions. Historical sensors such as the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), MODIS, and Landsat have demonstrated the importance of maintaining consistent and comparable TIR records [4,5,6,7].
TIR sensors are typically equipped with an onboard radiometric calibration subsystem. For GF-5A WTI, two temperature-controlled internal blackbodies are configured as the primary radiance reference sources. The high-temperature blackbody is maintained at an elevated operating temperature through dedicated heaters and thermal control components, whereas the low-temperature blackbody, in addition to the same types of components as the high-temperature blackbody, requires a heat dissipation surface and a heat pipe to regulate its temperature at a lower level. During routine operational imaging, the instrument periodically observes the onboard blackbodies for calibration, enabling on-orbit calibration coefficients to be updated and applied to the digital number (DN)-to-radiance conversion for product generation. Nevertheless, onboard calibration alone cannot fully eliminate long-term radiometric drift; residual systematic biases may still arise from stray-light contamination, orbit-driven thermal-environment perturbations, limitations in thermal control and thermometry stability, and instrument aging [8,9]. Therefore, vicarious calibration remains necessary to provide an independent, traceable external pathway for verifying and, when required, correcting the onboard-calibrated radiometric response [10,11,12].
Common approaches for TIR vicarious calibration include lake- or water-body–based methods using uniform temperature/radiance targets, as well as radiance-based methods employing ground-based or airborne radiometers. For example, the U.S. Geological Survey (USGS) Earth Resources Observation and Science (EROS) Center has established a comprehensive on-orbit calibration and validation framework for the Landsat 8 Thermal Infrared Sensor (TIRS), which demonstrates sub-Kelvin radiometric stability under favorable observing conditions and thereby enhances the consistency of long-term thermal data records [13]. Additionally, studies over Qinghai Lake have conducted vicarious calibration and uncertainty evaluation for Fengyun-4A (FY-4A) Advanced Geostationary Radiation Imager (AGRI) TIR channels, quantifying the brightness temperature biases and stability characteristics of individual bands [14].
To enhance calibration frequency and repeatability without substantially increasing costs, the Radiometric Calibration Network (RadCalNet), established under the Committee on Earth Observation Satellites (CEOS) Working Group on Calibration and Validation (WGCV), has provided an operational service framework for radiometric calibration in the visible-to-shortwave infrared (VNIR–SWIR) spectral region. This framework integrates standardized automated sites with unified processing chains and enables cross-mission and cross-sensor radiometric-consistency assessments [15,16]. Meanwhile, methodologies based on automated stations and standardized processing pipelines have already been operationalized. For instance, the automated reflectance-based method designed for the Landsat-8 Operational Land Imager (OLI) significantly reduces field deployment requirements while maintaining timely and robust calibration updates [17]; similarly, a ground-based radiance-based method for the Sentinel-2A MultiSpectral Instrument (MSI) has demonstrated a long-term radiometric-stability evaluation framework driven by automated observations [18]. Using the Baotou Desert automated calibration site, time-series vicarious calibration of the Ziyuan-3 (ZY-3) multispectral camera (MUX) has also verified the effectiveness of the automated workflow and its capability to close the uncertainty budget [19].
However, automated vicarious calibration systems for the thermal infrared domain remain limited. Existing lake-based campaigns are typically short in duration, seasonally restricted, and difficult to maintain year-round. Furthermore, automated networks such as RadCalNet do not currently extend to the TIR spectral region, where radiance is more sensitive to atmospheric variability and surface emissivity. This gap highlights the need for a persistent, automated, traceable TIR calibration framework suitable for long-term operational missions.
Several recent efforts have extended automated vicarious calibration to the TIR domain. Over radiometrically homogeneous lake surfaces, automated buoys, radiometers, and uncrewed surface vehicles (USV) have been used to reduce manual intervention and increase temporal sampling. For example, USV-based observations over Lake Qinghai have enabled multi-scene radiometric consistency assessments for the FY-4A/AGRI TIR channels [14]. Similarly, the Ziyuan-1 02E (ZY-1-02E) Infrared Spectrometer sensor (IRS) has been calibrated using coordinated ground measurements from a Fourier transform infrared (FTIR) spectrometer and an SI-111 infrared thermometer (Apogee Instruments, Inc., Logan, UT, USA), achieving an accuracy better than 0.6 K under diverse surface and atmospheric conditions [20]. In addition, Hu et al. established a lake-based ground-to-satellite synchronous calibration workflow for the Sustainable Development Goals Satellite-1 (SDGSAT-1) Thermal Infrared Spectrometer (TIS), utilizing high-resolution ground radiometer observations and atmospheric radiative transfer simulations to quantify systematic brightness temperature biases (approximately 0.3–1.1 K) across the three TIR bands, confirming the feasibility of routine on-orbit calibration and performance monitoring enabled by automated lake-based observation sites [21].
Despite these advances, lake-based calibration campaigns are limited in operational continuity. Their reliance on water bodies also restricts spatial scalability and limits radiometric coverage across gain states, because the limited dynamic range of water-surface temperature cannot adequately span the sensor’s full gain range. In contrast, the extensive Gobi regions of northwestern China offer naturally stable, spatially uniform, and low-humidity surfaces, making them ideal pseudo-invariant calibration sites (PICS) for thermal infrared sensors. However, no operational, year-round, multi-site automated calibration system has yet been established over these Gobi surfaces. Addressing this gap is essential for enabling persistent, traceable, and high-frequency vicarious calibration of modern TIR satellite missions.
To address this gap and to develop a long-term, low-maintenance, and traceable TIR calibration framework, we deployed automated ground-based observation systems at three Gobi sites in October 2023 and initiated a dedicated field experiment for GF-5A WTI. The framework integrates continuous contact surface-temperature measurements, field-measured emissivity spectra, radiosonde atmospheric profiles, and MODTRAN v5.2 radiative-transfer simulations to construct a complete surface–atmosphere–sensor radiometric chain. Using this chain, top-of-atmosphere radiances are computed, collocated with satellite observations, and used to derive on-orbit calibration coefficients. Although previous studies have investigated TIR vicarious calibration, no multi-site, year-round automated system has been established over Gobi pseudo-invariant calibration surfaces. The present work provides the first implementation of such a system and demonstrates its feasibility for routine TIR calibration.
This study proposes a long-term and traceable framework to support operational TIR vicarious calibration. We establish three year-round Gobi calibration sites selected under PICS-candidate criteria and deploy an automated ground-based observation system, whose accuracy is verified through metrological traceability and radiometric closure. By integrating ground observations with IGRA radiosonde profiles and MODTRAN v5.2 radiative-transfer simulations, we implement a traceable “surface–atmosphere–satellite” radiometric chain to derive on-orbit calibration coefficients for GF-5A/WTI in each band and gain mode. The resulting calibration is further validated by independent on-orbit samples and a comprehensive uncertainty budget.
The remainder of this paper is organized as follows: Section 2 provides detailed descriptions of the study area, data sources, measurement systems, and calibration methods. The calibration results are presented in Section 3. Section 4 discusses the findings, and Section 5 concludes the study.
3. Results
3.1. Accuracy Verification of Ground-Based Temperature Measurements
To evaluate the performance of the ground-based contact temperature sensor across different thermal transition regimes, two field comparison experiments were conducted under warming (20 September 2024, 11:00–12:30 local time) and cooling (27 September 2024, 22:00–23:30 local time) conditions. In each experiment, the ground-based contact sensor was compared synchronously with a JM standard thermometer traceable to the National Institute of Metrology of China. The temperature difference time series ΔT, defined in Equation (1) as the contact-sensor reading minus the JM reference reading, was used to characterize the measurement bias between the two sensors.
Figure 6 presents the synchronous temperature measurements and the corresponding bias series. The two temperature profiles exhibit high temporal coherence, indicating consistent thermal response behavior between the contact sensor and the JM reference. During the warming experiment, surface temperature increased from approximately 28.6 °C to 36.6 °C, with instantaneous deviations ΔT generally confined within a narrow band around zero; the corresponding bias statistics (mean bias, standard deviation, and RMSE) confirm the close agreement. During the cooling experiment, temperature decreased from 8.57 °C to 5.77 °C, with somewhat larger but still small deviations. Both experiments indicate that the bias remains small throughout. Such behavior suggests that the contact sensor and the JM standard thermometer maintain robust agreement under both heating and cooling conditions. The slight positive bias may arise from the thermal interface between the probe and the surface: higher contact pressure or marginally thicker thermal paste may cause the contact probe to sense heating slightly earlier than the standard thermometer.
Given the intrinsic single-point measurement uncertainty of the contact sensor (approximately ±0.1 °C) and the experiment-derived statistics, the observed deviation is far smaller than typical uncertainties in thermal infrared inversion or on-orbit radiometric calibration. Thus, the accuracy is sufficient for validating TIR products and evaluating retrieval algorithms.
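The bias statistics used throughout this comparison (mean bias, standard deviation, RMSE) can be sketched as follows; the function name and the sample values are illustrative assumptions, not the actual field data:

```python
import numpy as np

def bias_metrics(t_contact, t_reference):
    """Mean bias, standard deviation, and RMSE between two
    synchronized temperature series (same units, e.g. degrees C)."""
    d = np.asarray(t_contact, dtype=float) - np.asarray(t_reference, dtype=float)
    bias = d.mean()                  # systematic offset
    std = d.std(ddof=0)              # random scatter about the mean bias
    rmse = np.sqrt((d ** 2).mean())  # total deviation; rmse^2 == bias^2 + std^2
    return bias, std, rmse

# Synthetic warming-phase readings (values are illustrative only)
contact = [28.61, 30.15, 32.40, 34.88, 36.62]
standard = [28.57, 30.10, 32.33, 34.80, 36.55]
bias, std, rmse = bias_metrics(contact, standard)
```

Note that with the population standard deviation (ddof=0), the identity RMSE² = Bias² + Std² holds exactly, which is useful for cross-checking reported statistics.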
To independently assess the suitability of contact temperature measurements as reference inputs in radiative-transfer calculations, a radiometric-closure experiment was performed. This experiment evaluates whether temperatures derived from FTIR-based radiometric inversion are consistent with direct contact measurements under controlled surface and atmospheric conditions. Turbo FT radiance spectra and contact temperatures were collected simultaneously, and the ISSTES algorithm was applied to retrieve the radiometric temperature [31]. The radiometric consistency metric, defined in Equation (2) as the difference between the contact measurements and the ISSTES-derived temperatures, was then used to quantify their agreement.
The experiments covered a high-temperature case (approximately 40 °C, 20 September 2024) and a low-temperature case (approximately 7 °C, 26 September 2024). As shown in Figure 7, the deviations between the contact and ISSTES-retrieved temperatures are generally confined within a small range, and the overall statistics indicate that the contact temperatures are consistent with the ISSTES-retrieved radiometric temperatures at a level well within the accuracy requirements for thermal infrared calibration in this study.
Across both dynamic thermal transitions and radiometrically retrieved temperature ranges, the contact sensors demonstrate high consistency. This accuracy level is fully sufficient for thermal infrared calibration requirements and supports on-orbit and field-based radiometric calibration as well as emissivity algorithm validation. Based on these results, the contact temperature measurements are adopted as the reference temperature input for subsequent surface temperature retrieval and calibration procedures.
3.2. Analysis of Surface Temperature Variations
In October 2023, the automated ground measurement systems were successfully deployed at the three calibration sites of Dunhuang, Dachaidan, and Golmud, and stable data acquisition has since been achieved. Analysis of the 2024 contact temperature time series enables an assessment of the long-term suitability of the three Gobi sites for persistent vicarious calibration. The annual thermal dynamics provide insight into the stability, representativeness, and radiometric diversity required for calibrating thermal infrared sensors across their full dynamic range.
As shown in Figure 8, all three sites exhibit a typical seasonal cycle characterized by low temperatures in winter and high temperatures in summer. A sustained high-temperature plateau lasting approximately two months is observed in summer, while maximum winter temperatures do not exceed 20 °C. The annual warmest period occurs in late July, and the coldest temperatures appear in January (Dachaidan) and December (Dunhuang and Golmud), consistent with regional radiative climate characteristics. These seasonal behaviors enable seasonal calibration strategies, with high-temperature calibration in summer and low-temperature calibration in winter.
Due to planned maintenance and system switching, data gaps occur between 1 November and 15 November 2024; however, the overall coverage and continuity remain sufficient for statistical analysis and do not affect the interpretation of seasonal patterns or thermal extremes. The annual temperature dynamic range is large across all sites, with Dunhuang spanning 87.3 °C, Dachaidan 95.3 °C, and Golmud 89.2 °C. This broad thermal range ensures that the three sites collectively cover the full spectrum of surface temperatures from cold to hot throughout the year. Consequently, ample calibration samples can be obtained to support dynamic-range extension and long-term drift monitoring of thermal infrared sensors, providing continuous and traceable ground-based reference measurements for radiometric calibration.
3.3. Radiometric Calibration Results
GF-5A WTI imagery acquired from 1 February 2024 to 31 July 2024 was selected for calibration analysis (182 days in total). During this period, approximately six WTI scenes per day were available over the calibration sites, yielding 1092 candidate scenes. Following the satellite–ground matchup and quality-control procedure described in Section 2.2.3, 903 unsuitable scenes were excluded, and a total of 189 valid overpass samples remained across the three calibration sites.
This dataset provides both a sufficiently wide radiometric dynamic range and a representative distribution of multiple gain states, meeting the requirements of linear regression–based calibration. Notably, GF-5A/WTI employs multiple gain modes within a given band, and gain switching introduces gain-dependent offsets and response variations. Consequently, the DN–radiance relationship is not globally continuous across gain modes, and pooling samples from different gains in a single regression would bias the fitted coefficients. Therefore, we perform gain-stratified regression and derive the calibration coefficients for each gain mode using the linear model in Equation (7).
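The gain-stratified regression described above can be sketched as follows, assuming a linear model L = a·DN + b fitted independently per gain mode; the sample tuples, mode labels, and function name are hypothetical, not the actual matchup data:

```python
import numpy as np
from collections import defaultdict

def gain_stratified_fit(samples):
    """Fit a separate linear model L = a*DN + b for each gain mode.

    `samples` is an iterable of (gain_mode, dn, radiance) tuples.
    Pooling gain modes in one regression would bias the coefficients,
    so each mode is fitted on its own subset.
    """
    grouped = defaultdict(list)
    for gain, dn, rad in samples:
        grouped[gain].append((dn, rad))
    coeffs = {}
    for gain, pairs in grouped.items():
        dn = np.array([p[0] for p in pairs], dtype=float)
        rad = np.array([p[1] for p in pairs], dtype=float)
        a, b = np.polyfit(dn, rad, deg=1)  # slope, intercept
        coeffs[gain] = (a, b)
    return coeffs

# Illustrative samples: two gain modes with different linear responses
samples = [("Z2", 100, 6.0), ("Z2", 200, 8.0), ("Z2", 300, 10.0),
           ("Z3", 100, 3.0), ("Z3", 200, 4.0), ("Z3", 300, 5.0)]
coeffs = gain_stratified_fit(samples)
```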
Figure 9 presents the on-orbit calibration results for WTI channels B1–B4 under different gain modes (Z2, Z3, Z4). For each band–gain combination, linear regression was performed using all valid matched samples, and the regression equation and corresponding coefficient of determination (R²) were obtained; a larger R² indicates a stronger linear correspondence between DN and radiance. Overall, all channel–gain combinations exhibit clear and statistically significant linear responses, with consistently high coefficients of determination and RMSE values ranging from 0.12 to 0.24. These results demonstrate that the on-orbit response of GF-5A WTI is well described by a first-order linear model across its operational dynamic range, with no evidence of nonlinear distortion.
From a channel-by-channel perspective, distinct patterns emerge in sample size n and residual characteristics (RMSE) across gain states. Band 1 under gain mode Z2 exhibits the best overall performance (RMSE = 0.18), indicating a favorable signal-to-noise ratio and sufficient radiance coverage, making it the most stable channel in the current calibration framework. Band 2 demonstrates excellent cross-gain consistency between Z2 and Z3, with identical coefficients of determination and RMSE values of 0.12 and 0.24, respectively. This suggests that transitioning between gain states does not introduce systematic response deviations and that the linear relationship remains portable and internally consistent.
Bands 3 and 4 also maintain strong linearity under the Z3/Z4 modes (Band 3: RMSE = 0.16/0.23; Band 4: RMSE = 0.14/0.23). The slightly higher residuals observed in the high-gain modes are primarily attributable to the limited number of samples and the broader distribution of operating radiance levels, reflecting the influence of sample size and radiance coverage on regression stability rather than any intrinsic nonlinearity of the sensor.
In summary, all WTI channels exhibit highly consistent linear behavior across gain states, with strong model fits and well-controlled residuals, meeting the requirements of on-orbit radiometric calibration. To further evaluate cross-gain coherence and the applicability of the derived calibration coefficients, the next section analyzes error characteristics in overlapping gain regions and validation results from independent imagery.
4. Discussion
4.1. Selection of Calibration Sites
To ensure that automated ground-based observation systems provide long-term, high-quality data for satellite radiometric calibration and validation, calibration sites must satisfy several core criteria, including spatial homogeneity, temporal stability, flat terrain, and low cloud cover. The overarching principle is to prioritize surface types with highly stable radiometric properties and minimal anthropogenic disturbance—such as desert and gravel–gobi surfaces—and to include regions that span different thermal regimes so that sensor performance can be evaluated under diverse surface temperature conditions. Spatially uniform regions with minimal topographic variation and weak spatial gradients in reflectance/emissivity help improve the geometric consistency between satellite and ground observations, whereas persistent low cloudiness and low aerosol loading ensure high ratios of valid overpasses and robust temporal continuity.
For candidate site selection, this study builds upon the preliminary site inventory of Hu et al. (2020), which identified 32 Pseudo-Invariant Calibration Sites (PICS) [25]. A regional search and field surveys were conducted across the arid to semi-arid regions of northwestern China. To further refine site suitability, we used the MODIS MOD09GA cloud-mask product from 2001–2020 to derive 20-year clear-sky statistics for all PICS candidates, using the annual mean clear-sky fraction averaged over the 20-year period. MOD09GA provides two sets of cloud-related flags (cloud state and internal cloud algorithm flag). Based on these, two clear-sky indicators were defined:
Clear-Sky Probability 1: fraction of pixels within the 10 km × 10 km window that satisfy cloud state = 0 or internal cloud algorithm flag = 0;
Clear-Sky Probability 2: fraction of pixels within the 10 km × 10 km window that satisfy cloud state = 0 and internal cloud algorithm flag = 0.
These metrics jointly characterize the long-term clear-sky availability under “cloud-free/low-cloud” (Clear-Sky Probability 1) and “strictly cloud-free” (Clear-Sky Probability 2) conditions.
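A minimal sketch of the two indicators, assuming the MOD09GA flags have already been decoded into integer arrays for the 10 km × 10 km window (the array and function names are hypothetical):

```python
import numpy as np

def clear_sky_probabilities(cloud_state, internal_cloud_flag):
    """Compute the two clear-sky indicators over a window of pixels.

    `cloud_state` and `internal_cloud_flag` are decoded integer flag
    arrays for one window; 0 means 'clear' in both flag sets.
    """
    clear_state = (np.asarray(cloud_state) == 0)
    clear_internal = (np.asarray(internal_cloud_flag) == 0)
    p1 = np.mean(clear_state | clear_internal)  # cloud-free / low-cloud (OR)
    p2 = np.mean(clear_state & clear_internal)  # strictly cloud-free (AND)
    return p1, p2

# Illustrative 4-pixel window
state = np.array([0, 0, 1, 0])
internal = np.array([0, 1, 0, 1])
p1, p2 = clear_sky_probabilities(state, internal)
```

The OR condition yields the looser Clear-Sky Probability 1, while the AND condition yields the stricter Clear-Sky Probability 2, so p1 ≥ p2 always holds.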
To meet the spatial homogeneity requirements, we further calculated the mean slope and slope standard deviation within a 10 km × 10 km window around each candidate site using SRTM digital elevation model (DEM) data, providing quantitative constraints on terrain flatness and spatial uniformity.
As shown in Table 2, the three selected sites—Golmud, Dachaidan, and Dunhuang—perform well in terms of spatial homogeneity and clear-sky frequency. Although Dunhuang exhibits slightly higher cloudiness, its low slope and high spatial uniformity make it an excellent candidate. Golmud and Dachaidan, both located within typical gravel–gobi regions, show consistently high clear-sky probabilities and therefore provide highly favorable conditions for long-term, stable ground-based calibration observations.
4.2. Spatial Uniformity of Surface Thermal Infrared Emission
To evaluate the spatial uniformity of surface thermal infrared emission over the candidate calibration regions, thermal infrared band images from the GF-5A WTI instrument were analyzed within a 2 km × 2 km area centered on each candidate site. Spatial radiometric uniformity was quantified using the coefficient of variation (CV), a dimensionless statistic widely used in calibration-site characterization studies [25], defined as CV = σ/μ, where σ represents the standard deviation of digital numbers (DN) within the region of interest and μ denotes the corresponding mean DN value. A smaller CV indicates a more spatially homogeneous thermal radiance field.
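The CV statistic can be sketched as follows; the region-of-interest values are illustrative only:

```python
import numpy as np

def coefficient_of_variation(dn_window):
    """CV = sigma / mu over the DN values of the region of interest.
    A smaller CV indicates a more homogeneous radiance field."""
    dn = np.asarray(dn_window, dtype=float)
    return dn.std(ddof=0) / dn.mean()

# Illustrative 2x2 DN window
roi = np.array([[1000, 1010],
                [990, 1000]])
cv = coefficient_of_variation(roi)  # fraction; multiply by 100 for percent
```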
As summarized in Table 3, all three candidate calibration sites exhibit excellent spatial radiometric uniformity, with CV values consistently below 2% across all thermal infrared bands. The Dunhuang site shows the lowest CV values among the three regions for both daytime (0.3%–0.9%) and nighttime (0.18%–0.63%) scenes, indicating superior radiance homogeneity. The Golmud site ranks second (daytime 0.6%–1.6%, nighttime 0.42%–0.98%), while the Dachaidan site displays slightly higher CV values but remains well within acceptable uniformity thresholds (daytime 0.6%–1.8%, nighttime 0.3%–1.1%).
Across all sites, nighttime CV values are systematically lower than their daytime counterparts, suggesting that surface thermal emission fields are more stable under weak or absent solar heating. Taken together, the three regions exhibit favorable characteristics—uniform surface composition, minimal temporal variability, flat terrain, and a high frequency of clear-sky conditions—supporting their suitability for establishing persistent vicarious calibration fields and meeting the radiometric accuracy requirements of high-resolution thermal infrared sensors.
4.3. Analysis of Calibration Coefficients
To accommodate calibration requirements across different gain modes within the same spectral band, calibration coefficients were derived independently for each mode. During the calibration process, it was observed that DN values differ across gain states even when corresponding to the same physical radiance L.
To compare calibration consistency across gain settings under equivalent radiance levels, this study defines the overlap region as the radiance interval between the minimum radiance observed in a lower-gain mode and the maximum radiance observed in the adjacent higher-gain mode. Within this interval, radiance residuals were computed using the calibration coefficients listed in Table 4, following ΔL = (a_i^j · DN + b_i^j) − L, where a_i^j and b_i^j denote the calibration slope and intercept of the i-th spectral band under the j-th gain mode and L is the reference radiance. For each overlap interval, three statistical metrics—systematic bias (Bias), standard deviation (Std), and root-mean-square error (RMSE)—were used to characterize the error distribution and evaluate whether different gain settings maintain consistent radiometric responses under identical radiance conditions.
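As a minimal sketch of this overlap-region evaluation, the snippet below computes Bias, Std, and RMSE of calibrated-minus-reference radiance for samples inside a given overlap interval; the linear model L = a·DN + b, the sample values, and the function name are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

def residual_stats(samples, a, b, lo, hi):
    """Bias / Std / RMSE of calibrated-minus-reference radiance for
    samples whose reference radiance falls inside the overlap [lo, hi].
    Each sample is (dn, reference_radiance); L = a*DN + b is assumed."""
    res = np.array([(a * dn + b) - rad
                    for dn, rad in samples if lo <= rad <= hi])
    bias = res.mean()
    std = res.std(ddof=0)
    rmse = np.sqrt((res ** 2).mean())
    return bias, std, rmse

# Illustrative: one gain mode evaluated over the overlap interval [4.0, 6.0];
# the first and last samples fall outside the interval and are excluded.
samples = [(100, 3.0), (200, 4.1), (300, 5.0), (400, 5.9), (500, 7.0)]
bias, std, rmse = residual_stats(samples, a=0.01, b=2.0, lo=4.0, hi=6.0)
```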
Figure 10 illustrates the radiometric consistency across gain modes for each spectral band within their respective overlap intervals. These results provide a systematic basis for evaluating whether different gain configurations maintain comparable statistical behavior and response stability under identical radiance conditions, thereby testing the robustness of the calibration model across the instrument’s operational dynamic range. A few extreme samples are present in the overlap region; they are retained because they reflect gain-state behavior near the overlap boundary.
Across-band comparisons reveal distinct cross-mode behaviors. For Band 2, gain mode Z2 shows a clear statistical advantage in the overlap region: its RMSE, Std, and Bias are 0.1695, 0.0182, and 0.1685, respectively, all lower than the corresponding values for Z3 (0.2145, 0.0595, and 0.2060). Under the current calibration model and sample distribution, this indicates that the calibration samples for Band 2 provide denser coverage on the low-radiance side (e.g., nighttime or low-temperature conditions), so that the linear fit and noise characteristics in this radiance interval are better constrained. Since Z2 is typically used for lower-radiance observations, the present dataset suggests that low-brightness samples exert a stronger constraint on the Z2 calibration parameters, leading to smaller apparent errors and higher stability for this gain mode under the current calibration setup.
For Band 3, the overlap-region statistics show a different pattern: Z4 outperforms Z3 across all three metrics, with the RMSE decreasing from 0.2788 to 0.1971, the Std from 0.1232 to 0.0436, and the Bias from 0.2501 to 0.1923. Under the current calibration model and dataset, this indicates that the available calibration samples in the overlap region are more concentrated in the higher-radiance range (e.g., daytime or stronger surface emission), so that the high-radiance responses of Z4 are more fully constrained by the data. This feature suggests that, for Band 3 under the present calibration model and sample conditions, the high-radiance samples provide a more pronounced constraint on the Z4 fit, with system noise and potential nonlinearity being more effectively suppressed for this gain mode in the overlap interval.
Band 4 exhibits behavior similar to Band 3 within the overlap region. With only a small difference in dispersion (Std differing by about 0.006), Z4 achieves smaller Bias and RMSE than Z3. Under the present sample size and calibration procedure, this indicates that Z4 yields smaller fitting errors for the higher-radiance samples in the Band 4 overlap region.
Overall, the overlap-region statistics for the three bands jointly reflect the behavior of the current calibration parameters across different radiance levels. For Band 2, the better error statistics of Z2 indicate that, within the present dataset, nighttime or low-radiance samples provide more effective constraints for this gain mode. For Bands 3 and 4, the smaller errors of Z4 suggest that, under the current calibration model and sample distribution, high-radiance samples in the overlap regions exert stronger constraints on the corresponding fits. Taken together, these results indicate that, for GF-5A WTI under the present calibration framework and sample conditions, consistent responses and good error convergence are achieved across different radiance regimes.
4.4. Validation of Calibration Coefficients
To assess the applicability and radiometric accuracy of the derived calibration coefficients under real observational conditions, thermal infrared imagery acquired by the GF-5A WTI instrument from 1 August to 10 September 2024 was used for validation. For each scene, the WTI spectral response functions, the MODTRAN-derived atmospheric radiative-transfer components (including upward transmittance and path radiance), and in situ surface temperature and emissivity measurements were combined to compute the band-equivalent at-sensor radiance, which was then converted to the simulated brightness temperature T_sim using the Planck function. In parallel, the satellite-side brightness temperature T_sat was obtained by applying the calibration coefficients derived in Section 3.3 to the corresponding WTI observations to compute the at-sensor radiance, followed by inversion of the Planck function. Comparison of the two yields the brightness-temperature deviation ΔT_B = T_sat − T_sim.
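The brightness-temperature conversion can be sketched with a monochromatic Planck inversion at an effective band wavelength; this is a simplification of the band-integrated inversion performed with the WTI spectral response functions, and the radiance and wavelength values below are illustrative only:

```python
import math

# Physical constants for Planck's law (SI units)
H = 6.62607015e-34   # Planck constant, J*s
C = 2.99792458e8     # speed of light, m/s
KB = 1.380649e-23    # Boltzmann constant, J/K

def radiance_to_bt(radiance, wavelength_um):
    """Invert Planck's law at a single effective wavelength.

    `radiance` is spectral radiance in W / (m^2 * sr * um); this
    monochromatic form approximates the band-integrated inversion.
    """
    lam = wavelength_um * 1e-6   # um -> m
    l_si = radiance * 1e6        # per-um -> per-m
    c1 = 2.0 * H * C ** 2
    c2 = H * C / KB
    return c2 / (lam * math.log(1.0 + c1 / (lam ** 5 * l_si)))

# Deviation between satellite-retrieved and simulated brightness temperature
bt_sat = radiance_to_bt(9.0, 11.0)   # illustrative at-sensor radiance
bt_sim = radiance_to_bt(8.8, 11.0)   # illustrative simulated radiance
delta_bt = bt_sat - bt_sim
```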
After removing samples affected by cloud contamination or unstable surface temperature conditions, 201 out of 246 candidate samples were excluded, and the remaining 45 valid cases were retained for quantitatively assessing the stability and accuracy of the calibration coefficients under on-orbit conditions.
As shown in Figure 11, the brightness temperature deviations for the four thermal infrared bands exhibit strong consistency across the 45 valid samples. The deviations are small for all bands, and the maximum RMSE does not exceed 0.76 K, indicating that the calibration results achieved in this study are both accurate and reliable. The error distributions for each channel are relatively compact, further demonstrating that the calibration model remains stable and that the retrieval uncertainties are well controlled.
From a per-band perspective, although slight differences exist in deviation magnitude and distribution shape, the overall behavior is coherent across bands. Band 1 shows a nearly symmetric deviation distribution, indicating a stable radiometric response without obvious systematic shifts. Band 2 exhibits a small negative bias and achieves the smallest RMSE among all bands, suggesting that this channel currently attains the highest calibration accuracy. Band 3 presents a small positive bias with tightly clustered residuals and limited random dispersion. Band 4 behaves similarly to Band 3, consistent with a stable linear response and controlled random errors.
Overall, the mean absolute deviation and the average RMSE across all four channels remain well below 1 K. These results demonstrate that the calibration coefficients derived in this study are highly reliable and applicable under on-orbit observation conditions. Furthermore, after applying the calibration coefficients, the GF-5A WTI instrument maintains strong radiometric consistency and effective noise control across all channels.
4.5. Uncertainty Analysis
The uncertainty of brightness temperature estimation in this study primarily originates from four contributing factors: (1) contact temperature measurement, (2) surface emissivity measurement, (3) atmospheric profile acquisition, and (4) radiative transfer modeling using MODTRAN v5.2. These components collectively determine the overall accuracy of the surrogate radiometric calibration.
(1) Uncertainty of Contact Temperature Measurement
To quantify the measurement uncertainty of the contact temperature sensor, synchronous heating–cooling comparison experiments were conducted using a JM standard thermometer traceable to national metrology institutes. The results show that the RMSE between the contact thermometer and the standard reference thermometer lies within 0.173–0.177 K. This RMSE already includes practical factors such as thermal coupling between the probe and the target surface and differences in response time. Therefore, the RMSE is treated as the standard uncertainty of the contact temperature sensor under the conditions of this study:

$u_{T} \approx 0.18\ \text{K}$
(2) Uncertainty of Surface Emissivity Measurement
Surface emissivity was measured in the field using a portable Fourier-transform infrared spectrometer. As defined by Equation (5), emissivity uncertainty arises primarily from radiance measurement errors and temperature measurement errors propagated through the emissivity formulation, together with the algorithmic uncertainty of the emissivity retrieval. When these effects are propagated through the surface–atmosphere–sensor radiative transfer chain, the resulting brightness temperature uncertainty is less than 0.5 K. Therefore, the emissivity-related uncertainty is taken as:

$u_{\varepsilon} = 0.5\ \text{K}$
(3) Uncertainty of Atmospheric Profile Data
Atmospheric correction in this study relies on radiosonde profiles from the Integrated Global Radiosonde Archive (IGRA). These profiles undergo operational quality control and exhibit good temporal and spatial consistency [29,30]. Under cold and dry conditions, radiosonde relative humidity biases are typically within 5–10%. Previous studies show that a 10% relative increase in precipitable water vapor (PWV) leads to a brightness temperature deviation of approximately 0.13 K [32]. As the study area is located in an arid to semi-arid Gobi region with generally low and slowly varying PWV, the atmospheric-profile uncertainty is taken as:

$u_{\text{atm}} = 0.13\ \text{K}$
(4) Uncertainty of the Radiative Transfer Model (MODTRAN)
The uncertainty of the radiative transfer model mainly arises from band-model approximations, absorption-line parameter uncertainties, and numerical-integration discretization. Within the 8–14 μm atmospheric window, the inherent MODTRAN v5.2 model error is generally below 2%. Therefore, we apply a ±2% relative perturbation to the MODTRAN-simulated band-equivalent TOA radiance for each channel and retrieve the corresponding brightness temperature using the inverse Planck relation. The brightness–temperature differences before and after perturbation are then averaged to quantify this uncertainty term.
Based on our dataset, the mean brightness–temperature uncertainties induced by MODTRAN v5.2 are 0.9 K (Band 1), 0.9 K (Band 2), 1.2 K (Band 3), and 1.3 K (Band 4). Accordingly, the MODTRAN v5.2-related brightness-temperature uncertainty is taken conservatively as the largest band value:

$u_{\text{RTM}} = 1.3\ \text{K}$
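The perturbation-and-inversion procedure described above can be sketched with a monochromatic Planck inversion. This is a simplification: the study uses band-equivalent radiances weighted by each channel's spectral response, whereas the sketch below evaluates Planck's law at a single assumed band-center wavelength; the 10.5 µm wavelength and 300 K scene temperature are illustrative values only:

```python
import math

C1 = 1.191042e-16  # first radiation constant for spectral radiance, W·m^2·sr^-1
C2 = 1.4387769e-2  # second radiation constant, m·K

def planck_radiance(wl_m, temp_k):
    """Spectral radiance B(λ, T) in W·m^-3·sr^-1 (wavelength in meters)."""
    return C1 / (wl_m ** 5 * (math.exp(C2 / (wl_m * temp_k)) - 1.0))

def inverse_planck(wl_m, radiance):
    """Brightness temperature (K) from monochromatic radiance: inverse Planck."""
    return C2 / (wl_m * math.log(C1 / (wl_m ** 5 * radiance) + 1.0))

def modtran_bt_uncertainty(wl_m, toa_radiance, rel_err=0.02):
    """Mean |ΔT_b| induced by a ±rel_err relative radiance perturbation."""
    t0 = inverse_planck(wl_m, toa_radiance)
    t_hi = inverse_planck(wl_m, toa_radiance * (1.0 + rel_err))
    t_lo = inverse_planck(wl_m, toa_radiance * (1.0 - rel_err))
    return 0.5 * (abs(t_hi - t0) + abs(t_lo - t0))

# Example: a 2% radiance error near a 300 K scene at 10.5 µm
wl = 10.5e-6
L = planck_radiance(wl, 300.0)
dT = modtran_bt_uncertainty(wl, L, rel_err=0.02)  # ≈ 1.3 K
```

At longwave-infrared wavelengths and terrestrial temperatures, a 2% radiance error maps to roughly 1–1.3 K of brightness temperature, consistent in magnitude with the band values quoted above.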
(5) Total Uncertainty
The combined standard uncertainty of the at-sensor brightness temperature derived via the radiative transfer chain is computed using the root-sum-of-squares (RSS):

$u_{\text{total}} = \sqrt{u_{T}^{2} + u_{\varepsilon}^{2} + u_{\text{atm}}^{2} + u_{\text{RTM}}^{2}}$

Substituting the adopted values $u_{T} \approx 0.18\ \text{K}$, $u_{\varepsilon} = 0.5\ \text{K}$, $u_{\text{atm}} = 0.13\ \text{K}$, and $u_{\text{RTM}} = 1.3\ \text{K}$ yields:

$u_{\text{total}} \approx 1.41\ \text{K}$
Overall, brightness temperature retrievals based on the radiative transfer model exhibit a conservative combined standard uncertainty of approximately 1.41 K under the conditions of this study. Among the individual contributors, the MODTRAN-related uncertainty dominates the total error budget, followed by surface emissivity uncertainty, while uncertainties from contact temperature measurements and atmospheric profiles are comparatively minor. This indicates that even in dry, low-water-vapor environments, radiative transfer modeling uncertainty can be a primary contributor to the absolute brightness–temperature uncertainty when propagated in the brightness–temperature domain.
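The RSS combination can be verified numerically. The snippet below takes the emissivity and atmospheric terms quoted in this section (0.5 K and 0.13 K), the contact-thermometer term rounded to ≈0.18 K, and the MODTRAN term at its largest band value (1.3 K); these last two are assumptions consistent with the stated 1.41 K total:

```python
import math

def combined_standard_uncertainty(components):
    """Root-sum-of-squares combination of independent uncertainty terms (K)."""
    return math.sqrt(sum(u * u for u in components))

u_contact = 0.18  # contact temperature measurement (K)
u_emiss = 0.50    # surface emissivity (K)
u_atm = 0.13      # atmospheric profile (K)
u_rtm = 1.30      # MODTRAN radiative transfer model, largest band value (K)

u_total = combined_standard_uncertainty([u_contact, u_emiss, u_atm, u_rtm])
# u_total ≈ 1.41 K; the RTM term alone contributes the bulk of the budget
```

Because the components add in quadrature, the largest term (here the radiative-transfer model) dominates: removing the two smallest terms would change the total by only a few hundredths of a kelvin.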
5. Conclusions
In response to the need for long-term stability and high-frequency calibration of TIR satellite sensors, this study developed a continuous, automated ground-based observation system centered on Gobi pseudo-invariant calibration sites and dedicated in situ observations. The framework has been validated using the GF-5A WTI instrument through comprehensive analyses of calibration-coefficient stability, inter-gain consistency, and on-orbit brightness–temperature comparisons.
Three calibration sites located in arid to semi-arid Gobi regions were selected through comprehensive screening based on terrain flatness, cloud-occurrence probability, and TIR spatial uniformity. Using metrology-traceable reference thermometers together with dual validation from Turbo FT emissivity measurements and ISSTES-based temperature–emissivity separation, the deployed automated ground system was demonstrated to provide a reliable, traceable surface-temperature reference suitable for long-term TIR radiometric calibration.
By integrating high-accuracy emissivity measurements, IGRA atmospheric-sounding profiles, and MODTRAN v5.2-based radiative-transfer modeling, an end-to-end “surface–atmosphere–sensor” calibration chain was established, enabling routine updates of on-orbit gain and offset parameters at a cadence compatible with operational practice and supporting a physically consistent, traceable radiometric link between ground measurements and satellite observations.
Results derived from GF-5A WTI indicate that the linear calibration coefficients exhibit highly stable responses across all bands and gain states, with consistently high coefficients of determination. Overlap-region analyses further show that inter-gain brightness–temperature differences (bias and RMSE) are generally well constrained. Band-dependent behaviors are also observed: Band 2 calibration benefits from low-radiance samples (nighttime or low-temperature conditions), whereas Bands 3 and 4 achieve improved model fitting under high-radiance conditions (daytime or warm-surface cases). Independent on-orbit image validation confirms that the RMSE for all bands remains below 0.76 K, which is smaller than the conservatively estimated total uncertainty of approximately 1.41 K derived from the full radiative chain.
Overall, the proposed continuous ground-based surrogate calibration approach provides GF-5A WTI with a low-maintenance, traceable, and operationally practical on-orbit calibration solution. The methodology offers a reproducible technical pathway for quality assurance, long-term stability assessment, and cross-sensor or multi-mission consistency analysis for future high-resolution thermal infrared satellite missions.