2.3.1. Simulation of Hyperspectral Data from Satellite Hyperspectral Data
When satellite hyperspectral data are used to simulate data from a new hyperspectral sensor, it is necessary to evaluate and, if required, implement spectral band adjustment (including data preprocessing, Spectral Response Function (SRF) calculation, and band generation), spatial resampling, and radiometric noise addition/reduction. In this study, we used PRISMA and EnMAP data to simulate CHIME data.
Data pre-processing
For PRISMA, VNIR and SWIR image cubes were first merged into a single continuous dataset, and radiance units were converted from W/m2/sr/μm to W/m2/sr/nm to match the EnMAP format that was used as a reference for the CHIME simulated data.
Then, to ensure spectral consistency across the VNIR–SWIR detector overlap, bands within the overlapping region (940–970 nm for PRISMA and 900–1000 nm for EnMAP) that did not exhibit a smooth, continuous response across the dataset were removed. Specifically, four bands were removed for PRISMA (two from the VNIR and two from the SWIR detector) and twelve bands were removed for EnMAP. This automated filtering removed approximately 1.7% of the total spectral channels for PRISMA and 5% for EnMAP. Because the spectral sampling in these regions is dense, the removal improved the continuity of the spectra without loss of critical information, facilitating a more stable SRF adjustment (see Section 2.3.1). This VNIR–SWIR mismatch was most noticeable in the PRISMA data but was also evident in water pixels of the EnMAP data. These inconsistencies are linked to variations in calibration, optical paths and noise characteristics of each subsystem, and they had to be corrected to prevent the artifacts from propagating into the CHIME simulations.
Spectral Response Function (SRF) calculation
The SRF describes how sensitive a sensor band is to incoming radiance as a function of wavelength. In practice, the measured signal in each band is the integral of the true spectrum weighted by the SRF. SRFs can be approximated by a Gaussian shape, parameterized by the band’s central wavelength (CW) and bandwidth (Full Width at Half Maximum, FWHM), and normalized so that their integral equals one, as given by:
$$\mathrm{SRF}(\lambda) = \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(\lambda - \mathrm{CW})^{2}}{2\sigma^{2}}\right), \qquad \sigma = \frac{\mathrm{FWHM}}{2\sqrt{2\ln 2}}$$
Band generation
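The proposed method for spectral adjustment reconstructs a high-resolution continuous spectrum $\mathbf{x}$ that is consistent with the original sensor’s band values $\mathbf{y}_{\mathrm{src}}$. The spectrum $\mathbf{x}$ is inferred from the measured band values of the original sensor and their SRFs, and is then forward-modeled through the target SRFs to yield the simulated CHIME bands $\mathbf{y}_{\mathrm{CHIME}}$:
$$\mathbf{y}_{\mathrm{src}} = \mathbf{S}_{\mathrm{src}}\,\mathbf{x}, \qquad \mathbf{y}_{\mathrm{CHIME}} = \mathbf{S}_{\mathrm{CHIME}}\,\mathbf{x}$$
where $\mathbf{x}$ is the unknown high-resolution spectrum, $\mathbf{S}_{\mathrm{src}}$ is the original sensor SRF matrix, and $\mathbf{S}_{\mathrm{CHIME}}$ the CHIME SRF matrix.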
This method allows spectral harmonization between sensors with differing spectral characteristics, provided that a sufficiently high-resolution spectral representation is available and the SRFs are known or accurately approximated.
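A minimal sketch of this SRF-based band adjustment is given below. It assumes Gaussian SRFs and, since the inversion scheme is not detailed here, uses a smoothness-regularized least-squares solve as a stand-in; the function names, the regularization weight `smooth`, and the synthetic band configurations are all hypothetical.

```python
import numpy as np

def gaussian_srf_matrix(centers_nm, fwhm_nm, grid_nm):
    """Return an (n_bands, n_grid) matrix of Gaussian SRFs.

    Each row is normalized to sum to one, so `S @ spectrum` is the
    SRF-weighted mean of the spectrum for that band.
    """
    centers = np.asarray(centers_nm, dtype=float).reshape(-1, 1)
    fwhm = np.asarray(fwhm_nm, dtype=float).reshape(-1, 1)
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    grid = np.asarray(grid_nm, dtype=float)[None, :]
    srf = np.exp(-0.5 * ((grid - centers) / sigma) ** 2)
    return srf / srf.sum(axis=1, keepdims=True)

def adjust_bands(y_src, s_src, s_tgt, smooth=1e-3):
    """Infer a smooth high-resolution spectrum from the source bands,
    then forward-model it through the target SRFs.

    Assumption: the ill-posed inversion is stabilized with a
    second-difference (smoothness) penalty of weight `smooth`.
    """
    n_grid = s_src.shape[1]
    d2 = np.diff(np.eye(n_grid), n=2, axis=0)          # second-difference operator
    design = np.vstack([s_src, np.sqrt(smooth) * d2])  # data rows + penalty rows
    rhs = np.concatenate([y_src, np.zeros(d2.shape[0])])
    x_hat, *_ = np.linalg.lstsq(design, rhs, rcond=None)
    return s_tgt @ x_hat, x_hat

# Synthetic check with hypothetical band configurations (not real sensor tables).
grid = np.arange(400.0, 1001.0)                        # 1 nm grid
truth = 50.0 + 20.0 * np.sin(grid / 100.0)             # smooth test spectrum
s_src = gaussian_srf_matrix(np.arange(410.0, 991.0, 10.0), 12.0, grid)
s_tgt = gaussian_srf_matrix(np.arange(420.0, 981.0, 8.0), 10.0, grid)
y_tgt, x_hat = adjust_bands(s_src @ truth, s_src, s_tgt)
```

For real data, the SRF matrices would be built from the published central wavelengths and FWHM of each sensor, and the solve would be applied per pixel (or to all pixels at once by passing a matrix of band values).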
Spatial resampling
In the simulation of satellite-to-satellite datasets, spatial resampling, either downscaling or upscaling depending on the relative GRD of the source and target sensors, is typically required. However, in this study, as both the source instruments (PRISMA and EnMAP) and the target mission (CHIME) share a native spatial resolution of 30 m, no change in pixel size was required.
Rather than resampling, the methodology prioritized maintaining spatial consistency and radiometric integrity through precise georeferencing and grid alignment. This was achieved by reprojecting the PRISMA datasets to match the EnMAP Coordinate Reference System (CRS) and grid geometry.
Radiometric noise addition/reduction
When simulating new sensors, adding or reducing radiometric noise may be required to match the target sensor’s SNR specifications. In this study, no synthetic noise had to be added to, or removed from, the simulated CHIME data. Published SNR specifications for CHIME, PRISMA, and EnMAP indicate that CHIME’s performance (e.g., average SNR of ~328 in the VNIR, ~190 in the 1–2 μm SWIR, and ~95 in the 2–2.5 μm SWIR) is comparable to that of the reference instruments for the radiance levels observed. Therefore, the simulated data are expected to be representative of realistic CHIME measurements without extra noise modeling.
However, if noise reduction were necessary, the enhancement methodology described by [27] would be implemented. Conversely, if noise had to be introduced to match the target sensor specifications, a Gaussian noise model would be applied, as detailed in Section 2.3.2.
2.3.2. Simulation of Hyperspectral Data from Airborne Hyperspectral Data
The simulation of satellite images from airborne hyperspectral imagery follows a structured, physically based workflow. The primary objective is to convert very high-resolution airborne data (VNIR-only in our case) into products that are spectrally, radiometrically and spatially consistent with satellite observations at the TOA (CHIME in our case) and at the satellite spatial resolution (30 m in our case). The processing chain consists of five main components: (i) atmospheric and radiometric correction of the airborne data, (ii) spectral adjustment using satellite spectral response functions (SRFs), (iii) spatial resampling to satellite-like resolution using a point spread function (PSF) model, (iv) noise addition/reduction, and (v) forward atmospheric simulation using a radiative transfer model. The following paragraphs illustrate the application of these steps to the simulation of CHIME data from HySpex airborne data.
Atmospheric and radiometric corrections
First, atmospheric corrections were applied to all airborne hyperspectral datasets to derive Bottom-of-Atmosphere (BOA) reflectance from the initial radiance images. This was performed using ATCOR-4 v6.2, which utilizes radiative transfer modeling based on MODTRAN-derived look-up tables. This process simulates the interaction of solar radiation with atmospheric gases, aerosols, and the underlying terrain before reaching the sensor. For each study region, ATCOR-4 was configured with an atmospheric profile appropriate to local conditions (water vapor, aerosols, and ozone), ensuring that the retrieved reflectance is physically consistent with the scene’s atmosphere. ATCOR-4 also allows the definition of a “custom sensor” using central wavelengths and FWHM to derive SRFs for atmospheric correction of the airborne sensor. This configuration was applied consistently across all flight lines within a region to guarantee radiometric homogeneity.
Bidirectional reflectance effects can be particularly pronounced over water in multi-angle airborne acquisitions. To account for Bidirectional Reflectance Distribution Function (BRDF) distortions in datasets where they were visible (Milano, Margherita, Lago Morto), a nadir normalization (or across-track illumination) correction was applied via ATCOR-4. This method calculates brightness as a function of the scan angle and multiplies each pixel by the reciprocal function.
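The idea behind the nadir normalization can be sketched as follows. This is an illustrative approximation only and not the ATCOR-4 implementation: it assumes that image columns correspond to across-track positions and that the brightness function is a low-order polynomial fitted to the column means; the function name and the polynomial degree are hypothetical.

```python
import numpy as np

def nadir_normalize(band, scan_angle_deg, degree=2):
    """Across-track brightness normalization for one band.

    band           : (rows, cols) array, columns = across-track positions
    scan_angle_deg : (cols,) scan angle of each column, 0 = nadir
    """
    col_mean = np.nanmean(band, axis=0)                    # brightness vs. scan angle
    coeffs = np.polyfit(scan_angle_deg, col_mean, degree)  # smooth brightness function
    profile = np.polyval(coeffs, scan_angle_deg)
    nadir_value = np.polyval(coeffs, 0.0)
    # Multiply each column by the reciprocal of the fitted profile,
    # rescaled so that nadir pixels keep their original level.
    return band * (nadir_value / profile)[None, :]

# Synthetic example: a uniform surface with an artificial across-track gradient.
angles = np.linspace(-16.0, 16.0, 65)
gradient = 1.0 + 0.01 * angles + 0.0005 * angles**2
distorted = 0.05 * gradient[None, :] * np.ones((40, 65))
corrected = nadir_normalize(distorted, angles)
```

Over heterogeneous scenes, the column means would normally be computed per land-cover class or over a masked subset (e.g., water only), so that real surface variation is not mistaken for an angular effect.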
Finally, to mosaic the flight runs, pixels with a scan angle greater than 12 degrees were masked to remove obvious edge distortions in each run. Each flight run was then radiometrically harmonized bandwise using the mean values of each band of overlapping sections. The final mosaic was generated using the Orfeo ToolBox v9.1.1 (OTB) mosaic algorithm, which manages the blending between overlapping regions. For water surfaces, the “large” blending mode was utilized to ensure seamless transitions across maximum overlap areas. For inland areas, the “slim” blending mode was preferred, as it applies blending over a defined transition distance, avoiding the blur effects that can occur due to slight geometric distortions.
Spectral adjustment with CHIME SRFs
The airborne hyperspectral sensor used in this study acquires data only in the VNIR channels, spanning approximately 403–993 nm. Consequently, only CHIME bands with central wavelengths falling within this range can be simulated from the airborne data. Bands in the SWIR cannot be generated due to the absence of corresponding spectral information in the source imagery.
For bands within the valid VNIR range, the same SRF-based adjustment methodology used for satellite data was applied. Each airborne spectrum was convolved with the CHIME SRFs to ensure spectral consistency between the high-resolution airborne input and the target satellite output.
Spatial resampling and PSF modeling
Airborne data were acquired at a very high spatial resolution of 0.5 m, whereas CHIME operates at 30 m spatial resolution. To bridge this gap, a spatial resampling strategy is implemented that mimics the sensor’s point spread function rather than relying on simple pixel averaging.
First, a Gaussian blur is applied to the high-resolution images to emulate the effect of CHIME’s PSF. This Gaussian kernel introduces controlled spatial smoothing and reduces high-frequency detail, approximating the optical degradation and mixing that would occur in a real satellite observation. The width of the Gaussian (expressed through its standard deviation or FWHM) is chosen based on CHIME’s modulation transfer function (MTF) requirements. In particular, the kernel is selected so that the MTF at the Nyquist frequency remains above the mission’s specified threshold.
In practice, a Gaussian PSF with a FWHM of 30 m was adopted in both across- and along-track directions. This choice yields isotropic spatial blurring that matches CHIME’s expected spatial mixing while remaining compliant with the MTF constraint. Once the Gaussian blur has been applied, the images are downsampled to the CHIME pixel size by taking samples at the appropriate spacing, resulting in 30 m airborne-derived products that realistically approximate CHIME’s spatial characteristics.
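The blur-then-sample step can be sketched as below, assuming an isotropic Gaussian PSF applied with `scipy.ndimage.gaussian_filter` and sampling at the centre of each 30 m cell. The function names are hypothetical, and the MTF helper uses the analytical transfer function of a Gaussian PSF, $\mathrm{MTF}(f) = \exp(-2\pi^{2}\sigma^{2}f^{2})$.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))

def simulate_coarse_band(band, src_gsd_m=0.5, tgt_gsd_m=30.0, psf_fwhm_m=30.0):
    """Blur a high-resolution band with a Gaussian PSF, then sample it
    at the target pixel spacing."""
    sigma_px = psf_fwhm_m * FWHM_TO_SIGMA / src_gsd_m   # PSF width in source pixels
    blurred = gaussian_filter(band.astype(float), sigma=sigma_px, mode="nearest")
    step = int(round(tgt_gsd_m / src_gsd_m))            # 60 source pixels per target pixel
    offset = step // 2                                  # sample at the cell centre
    return blurred[offset::step, offset::step]

def gaussian_mtf_at_nyquist(psf_fwhm_m, tgt_gsd_m):
    """MTF of a Gaussian PSF evaluated at the Nyquist frequency 1 / (2 * GSD)."""
    sigma_m = psf_fwhm_m * FWHM_TO_SIGMA
    f_nyquist = 1.0 / (2.0 * tgt_gsd_m)
    return float(np.exp(-2.0 * (np.pi * sigma_m * f_nyquist) ** 2))

# Small synthetic example: a 120 m x 120 m uniform tile at 0.5 m.
tile = np.full((240, 240), 3.0)
coarse = simulate_coarse_band(tile)
mtf_nyq = gaussian_mtf_at_nyquist(30.0, 30.0)
```

With a FWHM equal to the 30 m pixel size, this Gaussian model gives an MTF of roughly 0.41 at Nyquist; that value can be compared against the mission threshold when selecting the kernel width.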
Simulation of atmospheric effects
To simulate what CHIME would actually measure at TOA, atmospheric effects need to be reintroduced in a controlled, physically consistent way. For this forward simulation, the 6S v2.1 radiative transfer model is used. 6S computes atmospheric transmission, path radiance, atmospheric reflectance, spherical albedo and direct/diffuse irradiance for each spectral channel, given an atmospheric state and geometry. In this study, atmospheric profiles were built from the ERA5 reanalysis of the European Centre for Medium-Range Weather Forecasts (ECMWF) [28] (water vapor and ozone) and from CAMS global reanalysis products [29] (aerosol optical depth at 550 nm).
For each scene, solar zenith and azimuth angles were derived from the acquisition time and location, while the sensor zenith angle was fixed at 0° at the image center, reflecting CHIME’s nadir-viewing geometry. To account for varying viewing geometries across the full swath, a series of Look-Up Tables (LUTs) was generated and spatially interpolated for each pixel. The model was configured with CHIME’s SRFs, so that the resulting TOA radiances correspond directly to CHIME bands. BOA reflectance from the upscaled hyperspectral data was then converted to TOA radiance ($L_{\mathrm{TOA}}$) using the 6S outputs. For CHIME spectral channels within the VNIR range, this conversion was performed as follows:
$$L_{\mathrm{TOA}} = L_{\mathrm{path}} + T_{\uparrow}\,\frac{\left(E_{\mathrm{dir}}\,\mu_{s} + E_{\mathrm{dif}}\right)\rho}{\pi\left(1 - S\,\rho\right)}$$
where $L_{\mathrm{path}}$ is path radiance, $T_{\uparrow}$ upward transmittance, $E_{\mathrm{dir}}$ and $E_{\mathrm{dif}}$ the direct and diffuse irradiance at the surface, $\mu_{s}$ the cosine of the solar zenith angle, $S$ the atmospheric spherical albedo, and $\rho$ the surface reflectance.
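As an illustration, the sketch below applies a per-band conversion of this kind, assuming the common Lambertian form in which the surface-leaving radiance is $(E_{\mathrm{dir}}\mu_{s} + E_{\mathrm{dif}})\rho / (\pi(1 - S\rho))$ and is attenuated by the upward transmittance before path radiance is added. The function and argument names are hypothetical, and the numerical inputs are placeholders rather than 6S outputs.

```python
import numpy as np

def boa_reflectance_to_toa_radiance(rho, l_path, t_up, e_dir, e_dif, sza_deg, s_alb):
    """Convert surface reflectance to TOA radiance for one band.

    Assumption: e_dir is the direct irradiance on a plane normal to the
    solar beam, so it is projected with cos(solar zenith angle).
    All arguments may be scalars or arrays that broadcast together.
    """
    mu_s = np.cos(np.deg2rad(sza_deg))
    e_ground = e_dir * mu_s + e_dif                          # total irradiance at the surface
    # The (1 - S * rho) term accounts for multiple surface-atmosphere reflections.
    l_surface = e_ground * rho / (np.pi * (1.0 - s_alb * rho))
    return l_path + t_up * l_surface

# Placeholder values for a single band and a small reflectance image.
rho_img = np.array([[0.0, 0.1], [0.3, 0.5]])
l_toa = boa_reflectance_to_toa_radiance(
    rho_img, l_path=10.0, t_up=0.8, e_dir=1000.0, e_dif=100.0, sza_deg=60.0, s_alb=0.1
)
```

In practice, each atmospheric term would be read per pixel and per band from the interpolated LUTs rather than passed as constants.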
Radiometric noise addition/reduction
CHIME is characterized by a high SNR, particularly across the VNIR spectrum (average SNR of 328:1). This exceeds the nominal specifications of the HySpex sensor, which has a reported threshold of 255:1. However, the simulation workflow, specifically the Gaussian blurring and spatial averaging performed during the 30 m resampling, significantly enhances the effective SNR of the intermediate products. By comparing homogeneous regions in the original aerial imagery with those in the processed simulations, we determined that the SNR improved by a factor of 4 to 5 across most bands.
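To align the radiometric quality of the simulated products with the official CHIME specifications, we applied a noise addition process. A random noise value, drawn from a Normal (Gaussian) distribution, was added to each pixel. The standard deviation ($\sigma_{\mathrm{add}}$) required to reach the target noise level was calculated from the relationship between the initial simulated noise ($\sigma_{\mathrm{sim}}$) and the target CHIME noise ($\sigma_{\mathrm{CHIME}}$), assuming independent noise contributions whose variances add:
$$\sigma_{\mathrm{add}} = \sqrt{\sigma_{\mathrm{CHIME}}^{2} - \sigma_{\mathrm{sim}}^{2}}$$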
Finally, to simulate realistic sensor artifacts, a small fraction of pixels was randomly selected and assigned extreme values (outliers) or null values (e.g., NaN or zero). This step ensures the resulting datasets replicate the defective sensor readings typically encountered in satellite images.
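The noise and defect injection can be sketched as follows, assuming that the added noise is independent of the noise already present (so that variances add) and that defects are split evenly between null values and saturated outliers. The function names, the defect fraction, and the outlier level are hypothetical choices made for illustration.

```python
import numpy as np

def added_noise_sigma(sigma_sim, sigma_target):
    """Std. dev. of the extra Gaussian noise needed to go from sigma_sim to
    sigma_target, assuming independent noise sources (variances add)."""
    return float(np.sqrt(max(sigma_target**2 - sigma_sim**2, 0.0)))

def add_noise_and_defects(band, sigma_sim, sigma_target, defect_fraction=1e-4, rng=None):
    """Add Gaussian noise to a band and flag a random fraction of pixels as defective."""
    rng = np.random.default_rng() if rng is None else rng
    sigma_add = added_noise_sigma(sigma_sim, sigma_target)
    noisy = band.astype(float) + rng.normal(0.0, sigma_add, size=band.shape)

    n_bad = int(round(defect_fraction * band.size))
    bad = rng.choice(band.size, size=n_bad, replace=False)  # flat pixel indices
    outlier_value = 10.0 * float(np.max(band))              # hypothetical "extreme" level
    noisy.flat[bad[: n_bad // 2]] = np.nan                  # null readings
    noisy.flat[bad[n_bad // 2:]] = outlier_value            # extreme outliers
    return noisy
```

A typical call would pass the per-band noise levels estimated over homogeneous regions, e.g. `add_noise_and_defects(band, sigma_sim, sigma_chime, rng=np.random.default_rng(0))`, with a fixed seed for reproducibility.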
2.3.3. Simulation of Thermal Data from Thermal Satellite Data
In this study, the simulation of new satellite thermal data from existing satellite data is illustrated by simulating the five 50 m TIR bands of the future LSTM mission from Landsat and ASTER thermal data. Landsat 8/9 provides only two TIR bands at 100–120 m, and ASTER provides five TIR bands at 90 m. The proposed framework tackles both missing bands and coarse resolution through a two-step chain:
Thermal Infrared Spectral Super-Resolution (TIR-SSR): Generation of LSTM-equivalent TIR bands at 100 m from Landsat and ASTER.
Thermal Infrared Spatial Super-Resolution (TIR-SpSR): Downscaling of the 100 m LSTM-like bands to 50 m using deep learning.
Together, these steps produce physically consistent, high-resolution LSTM-like thermal products.
Thermal Infrared Spectral Super-Resolution
The SSR module generates a full set of LSTM-equivalent thermal bands by integrating Landsat/ASTER TIR radiances, ASTER GED emissivity, and detailed atmospheric simulations. The thermal emission spectrum in the 8–12 μm window is governed by first principles: Planck’s law and the thermal radiative transfer equation (RTE) [30]. At any given wavelength λ, the Top-of-Atmosphere (TOA) radiance measured by the sensor is defined by the RTE [30,31]:
$$L_{\mathrm{TOA}}(\lambda) = \tau(\lambda)\left[\varepsilon(\lambda)\,B(\lambda, T_{s}) + \bigl(1 - \varepsilon(\lambda)\bigr)\,L^{\downarrow}(\lambda)\right] + L^{\uparrow}(\lambda)$$
where τ(λ) is the atmospheric transmittance, $B(\lambda, T_{s})$ is the Planck radiance at the surface temperature $T_{s}$, ε(λ) is the surface emissivity, L↓(λ) is the downwelling sky radiance incident on the surface, and L↑(λ) is the thermal emission originating from the atmosphere along the sensor line of sight. Accurate simulation of these terms requires a physically realistic atmospheric profile describing water vapor, temperature and ozone concentration. The blackbody radiance $B(\lambda, T)$ is determined by Planck’s law:
$$B(\lambda, T) = \frac{2hc^{2}}{\lambda^{5}}\,\frac{1}{\exp\!\left(\dfrac{hc}{\lambda k T}\right) - 1}$$
where h is Planck’s constant, c the speed of light and k the Boltzmann constant.
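A minimal sketch of the forward model is given below, assuming the standard form of the thermal RTE in which the surface-leaving radiance (emitted plus reflected downwelling) is attenuated by the atmospheric transmittance and the upwelling path radiance is added. The function names and the atmospheric values in the example are placeholders, not libRadtran outputs.

```python
import numpy as np

H = 6.62607015e-34   # Planck constant [J s]
C = 2.99792458e8     # speed of light [m s-1]
K = 1.380649e-23     # Boltzmann constant [J K-1]

def planck_radiance(wavelength_um, temperature_k):
    """Blackbody spectral radiance in W m-2 sr-1 um-1."""
    lam = np.asarray(wavelength_um, dtype=float) * 1e-6     # um -> m
    radiance_per_m = 2.0 * H * C**2 / lam**5 / np.expm1(H * C / (lam * K * temperature_k))
    return radiance_per_m * 1e-6                            # per metre -> per micrometre

def toa_thermal_radiance(wavelength_um, t_surf_k, emis, tau, l_up, l_down):
    """TOA radiance from the thermal radiative transfer equation."""
    surface_leaving = emis * planck_radiance(wavelength_um, t_surf_k) + (1.0 - emis) * l_down
    return tau * surface_leaving + l_up

# Placeholder atmosphere for a single wavelength near the centre of the window.
b_300 = planck_radiance(10.0, 300.0)
l_toa_thermal = toa_thermal_radiance(10.0, 300.0, emis=0.97, tau=0.85, l_up=1.2, l_down=2.0)
```

For band-integrated quantities, the same expression is applied with the SRF-convolved transmittance and radiances of each instrument.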
Once the surface temperature, emissivity, and atmospheric state are known, the shape of the thermal radiance spectrum is highly constrained, smooth, and low-dimensional. This makes the TIR region particularly well suited for a purely physics-based spectral super-resolution approach, without needing any machine learning to “hallucinate” missing bands [32,33].
To obtain the atmospheric terms required, libRadtran v2.0.6 [12] is used to compute spectral transmittance, upwelling and downwelling thermal radiances at fine spectral resolution. For each Landsat and ASTER acquisition, ERA5 reanalysis profiles of temperature, pressure, water vapor and ozone [28] are extracted at the scene center and vertically interpolated to the pressure grid needed by libRadtran. These profiles are used within an AFGL Mid-Latitude Summer baseline atmosphere, with water vapor and ozone scaled to match ERA5 total columns. Simulations cover the 8–12 μm range with 1 nm spectral sampling and are repeated across the relevant set of view zenith angles. The resulting high-resolution spectra of τ(λ), L↑(λ) and L↓(λ) are then convolved with the spectral response functions of Landsat 8/9, ASTER and LSTM to generate band-integrated lookup tables for each instrument.
Surface emissivity is taken from the ASTER GED v3 [34] emissivity mosaics at 100 m resolution. The five ASTER TIR channels act as anchor points for building a continuous emissivity spectrum between 8 and 12 μm via shape-preserving cubic splines. Over land, this emissivity is further refined using Landsat NDVI to apply a fractional vegetation correction, while water emissivity values are held fixed. From this, emissivity maps for the spectral configurations of Landsat, ASTER and LSTM are derived. Because emissivity in the TIR spectrum, like emitted radiance, varies slowly and does not exhibit sharp spectral features for most natural surfaces, this interpolated curve provides a physically realistic representation of the true emissivity spectrum [32,33].
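The spline step can be sketched with SciPy’s `PchipInterpolator`, which is one shape-preserving cubic interpolant. The sketch assumes nominal ASTER TIR band centres and holds the edge values constant outside the anchor range, since the treatment of the 8.0–8.3 μm and 11.3–12.0 μm edges is not specified here; the emissivity values are placeholders.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Nominal centre wavelengths of ASTER TIR bands 10-14 (approximate), in um.
ASTER_TIR_CENTERS_UM = np.array([8.30, 8.65, 9.10, 10.60, 11.30])

def emissivity_spectrum(aster_emis, grid_um):
    """Continuous emissivity on grid_um from five ASTER anchor values.

    Assumption: outside the anchor range the nearest anchor value is held
    constant instead of extrapolating the spline.
    """
    pchip = PchipInterpolator(ASTER_TIR_CENTERS_UM, np.asarray(aster_emis, dtype=float))
    clipped = np.clip(grid_um, ASTER_TIR_CENTERS_UM[0], ASTER_TIR_CENTERS_UM[-1])
    return pchip(clipped)

# Placeholder emissivities for one pixel (not taken from ASTER GED).
anchors = np.array([0.955, 0.960, 0.952, 0.972, 0.975])
grid_um = np.arange(8.0, 12.0001, 0.01)
emis_curve = emissivity_spectrum(anchors, grid_um)
```

Because PCHIP does not overshoot, the interpolated curve stays within the range of the anchor values, which avoids non-physical emissivities between bands.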
Surface temperature is then retrieved from Landsat and ASTER TIR data. For Landsat 8/9, only Band 10 is used, in line with USGS recommendations [35,36], as Band 11 is affected by stray-light contamination. All ASTER TIR bands are exploited. The workflow consists of converting Level-1 digital numbers to TOA radiances using the calibration coefficients, resampling those radiances to a common 100 m grid, interpolating the band-integrated τ, L↑ and L↓ values from the libRadtran lookup tables at each pixel’s viewing geometry, and then applying the thermal radiative transfer equation to isolate the surface-leaving radiance. Planck’s law is inverted at the effective wavelength (around 10.9 μm) to obtain kinetic surface temperature per pixel. This inversion is nonlinear because of the Planck term, so it is solved iteratively using Newton–Raphson. Where possible, all available TIR bands are used to stabilize the solution, which is particularly effective for the multi-band ASTER configuration [33]. The outcome is a spatially consistent surface temperature map, physically tied to the ERA5-libRadtran atmospheric state.
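The iterative inversion can be sketched as below for a single pixel. The sketch assumes that the surface-leaving radiance per band has already been isolated, and it combines several bands through a least-squares (Gauss–Newton) update, which reduces to the classical Newton–Raphson step when only one band is used; the exact multi-band scheme is not specified here, and all names and numerical values are hypothetical.

```python
import numpy as np

H, C, K = 6.62607015e-34, 2.99792458e8, 1.380649e-23  # SI units

def planck_and_derivative(wavelength_um, t_k):
    """Planck radiance [W m-2 sr-1 um-1] and its derivative with respect to T."""
    lam = np.asarray(wavelength_um, dtype=float) * 1e-6
    x = H * C / (lam * K * t_k)
    b = 2.0 * H * C**2 / lam**5 / np.expm1(x) * 1e-6
    db_dt = b * (x / t_k) * np.exp(x) / np.expm1(x)
    return b, db_dt

def retrieve_surface_temperature(l_surf, wavelength_um, emis, l_down,
                                 t0=290.0, tol=1e-4, max_iter=50):
    """Solve emis * B(T) + (1 - emis) * l_down = l_surf for a single T."""
    l_surf = np.atleast_1d(np.asarray(l_surf, dtype=float))
    emis = np.atleast_1d(np.asarray(emis, dtype=float))
    l_down = np.atleast_1d(np.asarray(l_down, dtype=float))
    t = float(t0)
    for _ in range(max_iter):
        b, db_dt = planck_and_derivative(wavelength_um, t)
        resid = l_surf - (emis * b + (1.0 - emis) * l_down)   # misfit per band
        jac = emis * db_dt                                    # d(model)/dT per band
        step = float(np.sum(jac * resid) / np.sum(jac * jac))
        t += step
        if abs(step) < tol:
            break
    return t

# Synthetic round trip with placeholder band centres and emissivities.
wl = np.array([8.30, 8.65, 9.10, 10.60, 11.30])
eps = np.array([0.95, 0.96, 0.96, 0.97, 0.97])
b_true, _ = planck_and_derivative(wl, 305.3)
l_obs = eps * b_true + (1.0 - eps) * 2.0
t_multi = retrieve_surface_temperature(l_obs, wl, eps, 2.0)
t_single = retrieve_surface_temperature(l_obs[3:4], wl[3:4], eps[3:4], 2.0)
```

With noise-free synthetic radiances both calls recover the input temperature; with real data, the multi-band update averages out band-specific noise, which is the stabilizing effect described above.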
Once surface temperature and emissivity are known, the BOA thermal spectrum is reconstructed and propagated back through the atmosphere to produce TOA radiances for the five LSTM bands on a 100 m grid. The outcome is therefore a set of five LSTM-equivalent thermal bands at 100 m, representing the radiances that LSTM would have seen under the same surface and atmospheric conditions as the original Landsat or ASTER acquisition.
Thermal Spatial Super-Resolution
The second step, thermal downscaling, aims to increase the spatial resolution of the 100 m LSTM-like bands to the 50 m resolution planned for the mission. Current TIR sensors are too coarse to resolve many fine-scale features of interest, such as small river plumes, localized industrial discharges or intra-urban heat patterns. To address this, a deep-learning framework based on a SwinIR single-image super-resolution network is used.
Thermal downscaling methods have evolved to overcome the resolution constraints of TIR sensors by leveraging high-resolution auxiliary data. Classical approaches have transitioned from data fusion and pansharpening [37,38] to regression-based models [39,40] that establish statistical relationships between land surface properties and temperature. While effective for large-scale patterns, these methods often struggle with complex, nonlinear thermal gradients in heterogeneous landscapes.
The advent of Deep Learning (DL) has significantly advanced the field, moving from early convolutional neural networks (CNNs) [41,42,43] to models incorporating attention mechanisms [44,45,46] that better reconstruct structural details. However, CNNs are often limited by the local nature of convolutional kernels. To address this, we selected the SwinIR model [47], a state-of-the-art architecture based on Swin Transformers. Unlike traditional CNNs, SwinIR utilizes shifted-window self-attention to capture both local textures and long-range spatial dependencies, making it well suited for enhancing 100 m thermal fields to 50 m while maintaining the strict radiometric integrity required for the LSTM mission.
Network Architecture and Training
SwinIR [47] is based on Swin Transformer blocks and follows a U-Net-like architecture. The network’s architecture is composed of three parts: shallow feature extraction, deep feature extraction, and high-resolution image reconstruction. The core consists of a stack of Swin Transformer blocks that process the imagery at multiple scales to capture complex spatial patterns. Residual connections between the input and the deep feature representations stabilize training and ensure efficient gradient flow.
While high-resolution thermal imagery represents the ideal ground truth for downscaling models, such datasets are not easily available. Therefore, training was conducted using existing high-resolution satellite datasets (e.g., 90 m ASTER, 100 m Landsat 8/9) by applying Wald’s protocol [48] to construct scale-invariant training and validation sets. Wald’s protocol is a widely used validation framework for resolution enhancement methods, where high-resolution data are artificially degraded to a lower resolution and then reconstructed. The reconstructed product is subsequently compared against the original high-resolution data to assess performance. The 100 m LSTM-like thermal products generated in the SSR phase were spatially degraded to 200 m to create input–target pairs (200 m → 100 m). This methodology follows the principle that a valid downscaling framework must be capable of reconstructing the original high-resolution signal from a degraded version of itself.
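The construction of input–target pairs under Wald’s protocol can be sketched as follows. The degradation operator is not specified here, so the sketch assumes simple 2 × 2 block averaging to go from 100 m to 200 m; the function names are hypothetical.

```python
import numpy as np

def block_mean(image, factor):
    """Degrade an image by averaging non-overlapping factor x factor blocks."""
    rows = (image.shape[0] // factor) * factor
    cols = (image.shape[1] // factor) * factor
    trimmed = image[:rows, :cols]                      # drop incomplete edge blocks
    return trimmed.reshape(rows // factor, factor, cols // factor, factor).mean(axis=(1, 3))

def wald_pair(hr_image, factor=2):
    """Return (degraded input, reference target) for supervised training."""
    rows = (hr_image.shape[0] // factor) * factor
    cols = (hr_image.shape[1] // factor) * factor
    target = hr_image[:rows, :cols]                    # e.g., 100 m reference
    return block_mean(target, factor), target          # e.g., 200 m input

# Tiny worked example on a 4 x 4 grid.
demo = np.arange(16, dtype=float).reshape(4, 4)
lr_demo, hr_demo = wald_pair(demo)
# lr_demo == [[2.5, 4.5], [10.5, 12.5]]
```

A degradation that includes a PSF-like blur before averaging would mimic sensor behaviour more closely; block averaging is used here only to keep the example short.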
The dataset is split at the scene level into 80% for training and 20% for validation to avoid spatial leakage and bias. The images are tiled into overlapping patches (e.g., 64 × 64 pixels at 200 m), and extensive data augmentation (rotations, flips, random crops) is applied, increasing the effective dataset size to approximately 51,200 patches. Thermal values are normalized using min–max scaling. SwinIR was trained end-to-end on a workstation equipped with an Intel® Core™ i9-14900K CPU (Intel Corporation, Santa Clara, CA, USA), 128 GB RAM, and an NVIDIA GeForce RTX™ 5090 GPU with 32 GB VRAM (NVIDIA Corporation, Santa Clara, CA, USA), with SSD-based storage for data handling. Training took approximately 8 h, using the Adam optimizer with a standard L1 reconstruction loss to prioritize pixel-level radiometric accuracy over artificial perceptual sharpness. Hyperparameters such as window size, number of attention heads, feature depth, and the weighting between content and adversarial losses were tuned to trade off reconstruction quality against computational cost.
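The patch preparation described above can be sketched as follows. The stride, the normalization bounds, and the function names are hypothetical; in particular, the sketch assumes fixed global bounds for min–max scaling so that the same transform can be inverted after inference.

```python
import numpy as np

def minmax_scale(values, lo, hi):
    """Map values from [lo, hi] to [0, 1]."""
    return (values - lo) / (hi - lo)

def minmax_unscale(scaled, lo, hi):
    """Inverse of minmax_scale: map [0, 1] back to physical units."""
    return scaled * (hi - lo) + lo

def extract_patches(image, size=64, stride=32):
    """Tile an image into overlapping size x size patches.

    The image must be at least `size` pixels in each dimension.
    """
    patches = [
        image[r:r + size, c:c + size]
        for r in range(0, image.shape[0] - size + 1, stride)
        for c in range(0, image.shape[1] - size + 1, stride)
    ]
    return np.stack(patches)

# Synthetic radiance image with placeholder bounds (W m-2 sr-1 um-1).
rng = np.random.default_rng(0)
radiance = rng.uniform(6.0, 12.0, size=(128, 128))
scaled = minmax_scale(radiance, lo=0.0, hi=20.0)
patches = extract_patches(scaled)          # shape (9, 64, 64) for this image
```

Augmentations such as `np.rot90` and `np.flip` would then be applied to input and target patches jointly so that each pair stays aligned.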
Inference and Integration
During inference, the trained models are applied to the 100 m LSTM-simulated bands produced previously. The images are processed in overlapping patches to avoid edge artifacts, and the outputs are then mosaicked and denormalized back to physical units (W m−2 sr−1 μm−1). The resulting 50 m thermal fields show substantially enhanced spatial detail compared with the original 100 m products: thermal gradients around urban structures, water bodies and coastal plumes are sharper and more coherent, while large-scale radiometric patterns remain consistent with the input. In this way, the combination of physics-based spectral reconstruction and transformer/GAN-based spatial sharpening yields a robust, operationally realistic framework for producing 50 m LSTM-like thermal imagery that respects both the underlying physics and the expected spatial structure of the mission’s data.