Review

RTM Surrogate Modeling in Optical Remote Sensing: A Review of Emulation for Vegetation and Atmosphere Applications

1 Image Processing Laboratory (IPL), University of Valencia, C/Catedrático José Beltrán 2, Paterna, 46980 Valencia, Spain
2 State Forestry and Grassland Administration Key Laboratory of Forest Resources and Environmental Management, Beijing Forestry University, Beijing 100083, China
3 State Key Laboratory of Remote Sensing and Digital Earth, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
4 Secretary of Research and Graduate Studies, CONACYT-UAN, Tepic 63155, Mexico
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(21), 3618; https://doi.org/10.3390/rs17213618
Submission received: 9 September 2025 / Revised: 17 October 2025 / Accepted: 28 October 2025 / Published: 31 October 2025

Highlights

What are the main findings?
  • Emulation via machine learning regression algorithms (MLRAs) accurately reproduces vegetation and atmospheric RTMs while accelerating computations by a factor of 10²–10⁶.
  • Dimensionality reduction (e.g., PCA, autoencoders) with scalable MLRAs (GPR, NN, DLNN) optimizes the accuracy–efficiency trade-off for hyperspectral and coupled models.
What is the implication of the main finding?
  • Emulation enables fast global sensitivity analysis, scene generation, and large-scale inversion applications.
  • Anticipated advances: physics-informed/explainable emulation, reliable uncertainty layers, and emulation of water and soil RTMs.

Abstract

Radiative transfer models (RTMs) are foundational to optical remote sensing for simulating vegetation and atmospheric properties. However, their significant computational cost, especially for 3D RTMs and large-scale applications, severely limits their utility. Emulation, or surrogate modeling, has emerged as a highly effective strategy, accurately and efficiently replicating RTM outputs. This review comprehensively surveys recent developments in emulating vegetation and atmospheric RTMs. We discuss the methodological underpinnings, including suitable machine learning regression algorithms (MLRAs), effective training sampling strategies (e.g., Latin Hypercube Sampling, active learning), and spectral dimensionality reduction (DR) methods (e.g., PCA, autoencoders). Emulators commonly achieve 10²–10⁶× per-evaluation acceleration, but accuracy–efficiency trade-offs remain inherently context-dependent, governed by the MLRA design and the coverage/quality of training data. DR consistently shifts this trade-off toward lower cost at comparable accuracy, positioning latent-space training as a pragmatic choice for hyperspectral applications. We synthesize key emulation applications such as global sensitivity analysis, synthetic scene generation, scene-to-scene translation (e.g., multispectral-to-hyperspectral), and retrieval of geophysical variables using remote sensing data. The paper concludes by outlining persistent challenges in generalizability, interpretability, and scalability, while also proposing future research avenues: investigating advanced deep learning algorithms (e.g., physics-informed and explainable architectures), developing multimodal/multitemporal frameworks, and establishing community benchmarks, tools and libraries. Emulation ultimately empowers remote sensing workflows with unparalleled scalability, transforming previously unmanageable tasks into viable solutions for operational Earth observation applications.


1. Introduction

In optical remote sensing, radiative transfer models (RTMs) are fundamental physical tools. They describe the complex interactions of electromagnetic radiation with Earth’s surface and atmosphere, providing a theoretical framework for understanding observed signals [1,2,3]. Specifically, vegetation RTMs simulate light interactions within plant canopies to characterize biophysical properties [4], while atmospheric RTMs describe how radiation is modified by atmospheric constituents, enabling atmospheric correction and sensor signal interpretation [5]. These models collectively serve as the theoretical backbone for understanding and interpreting the complex interactions between radiation and the Earth’s surface and atmosphere, and are essential for generating synthetic observations and designing robust retrieval algorithms [6]. The most commonly used RTM for vegetation is PROSAIL [7,8], which simulates leaf and canopy reflectance as a function of biophysical variables such as leaf area index (LAI) and chlorophyll content [9,10]. Based on the principles of PROSAIL, the energy balance model SCOPE (Soil Canopy Observation, Photosynthesis and Energy balance) [11] extends this approach by additionally modeling energy fluxes and solar-induced chlorophyll fluorescence (SIF) [12]. Both RTMs represent vegetation as a turbid medium, i.e., a one-dimensional (1D) representation of a canopy, which simplifies canopy architecture. In recent years, more structurally explicit 3D RTMs have gained traction, such as DART (Discrete Anisotropic Radiative Transfer) [13] and LESS (LargE-Scale remote sensing data and image Simulation framework) [14], which account for detailed canopy structure and enable more realistic scene simulations. These models have likewise evolved in simulating multiple types of radiation, such as SIF and laser scanning [15,16]. 
This increase in realism, however, comes at the cost of significantly higher computational demands, often resulting in long rendering times for a single simulation. RTMs also play a key role in accounting for atmospheric effects on solar radiation in remote sensing applications. The 6S model (Second Simulation of a Satellite Signal in the Solar Spectrum) [17] and MODTRAN (MODerate resolution atmospheric TRANsmission) [18] are two of the most frequently employed RTMs for simulating atmospheric transmittance, path radiance, and surface-atmosphere interactions. Similar to MODTRAN, libRadtran (library for radiative transfer) [19] offers a flexible and high-accuracy radiative transfer framework that supports spectral calculations in both the solar and thermal domains, allowing for detailed simulation of various atmospheric scenarios, including aerosols, clouds, and surface reflectance anisotropy. These models are essential for atmospheric correction of satellite observations and retrieval of surface reflectance [20].
Despite their inherent strengths, RTMs, particularly structurally explicit 3D canopy models, incur substantial computational demands. This limitation becomes especially pronounced when RTMs are applied in large-scale or iterative frameworks such as global mapping, operational biophysical retrieval, uncertainty quantification (UQ), or data assimilation. Depending on the model’s complexity, a single RTM simulation can take from milliseconds (e.g., PROSAIL) to several minutes or even hours (e.g., 3D DART or LESS simulations). Atmospheric RTMs (e.g., 6S, MODTRAN, libRadtran) exhibit similar challenges: their high dimensionality and detailed parameterizations make them computationally intensive, particularly in applications involving inversion, time-series reconstruction, or coupled canopy-atmosphere modeling [20]. Even when a single run is relatively fast, the need to execute these RTMs hundreds of thousands or even millions of times across vast parameter spaces, entire satellite images (e.g., millions of pixels), or long time series renders them impractical for real-time or near-real-time scenarios. These computational demands often stem from the complex physical equations and iterative numerical solutions required to simulate light interactions, especially across high-dimensional input spaces and spectral ranges [21].
To address these limitations, emulation has emerged as a promising strategy [22]. Emulation, also known as surrogate modeling, involves constructing a simplified, computationally inexpensive statistical model (an emulator or surrogate model) that accurately approximates the input-output relationship of a more complex, computationally demanding model or data transformation [23]. This is typically achieved by capitalizing on statistical learning techniques to train the emulator on a limited, yet carefully selected, set of simulations (i.e., from a physical model) or empirical observations (i.e., from data) [24,25]. Once trained, these emulators provide predictions orders of magnitude faster, frequently completing computations in microseconds, while preserving high accuracy. This drastic speed-up transforms previously intractable problems into feasible ones, opening new avenues for research and operational applications, such as in optical remote sensing science [26,27].
Driven by advancements in statistical learning, emulators have emerged rapidly in remote sensing science over the past two decades, with applications expanding across multiple domains in Earth observation (EO). To capture recent developments and anticipate future directions, this review examines how emulation contributes to the growing effectiveness of RTMs and remote sensing workflows, especially in vegetation and atmospheric EO studies. By reviewing and synthesizing recent scientific literature, we highlight key methodological advances in emulation, such as the selection of suitable machine learning regression algorithms (MLRAs) and efficient sampling strategies, alongside emerging image processing applications, including sensitivity analysis, synthetic scene generation, image-to-image transformations, and retrieval applications. We close by outlining promising directions for future research in this rapidly evolving field.

2. The Challenge of Computationally Expensive RTMs in EO Applications

Vegetation RTMs are essential for interpreting remote sensing data by simulating light interaction with plant canopies, with complexity varying across models [3]. PROSAIL, a combination of PROSPECT and SAIL, serves as a benchmark RTM due to its moderate complexity and widespread use [9,10]. PROSAIL couples leaf-level optical properties with a turbid medium representation of the canopy. Building upon this model, SCOPE represents a significant upgrade, integrating radiative transfer with detailed biophysical processes, e.g., photosynthesis and the full energy budget, while still relying on turbid medium principles [11,12]. At the highest level of complexity are RTMs that use discrete ordinate methods or ray tracing to explicitly simulate photon paths through detailed, heterogeneous 3D scenes. These models, such as DART, precisely account for all orders of scattering, clumping, and shadowing, offering a physically rigorous representation of radiative transfer [13,15,16]. Likewise aiming at physically realistic radiative transfer over heterogeneous scenes, LESS uses Monte Carlo (MC) ray tracing to simulate photon interactions within highly detailed 3D vegetation canopies [14], often reconstructed from LiDAR data [28]. Designed for scalability, LESS emphasizes computational efficiency while retaining realism in canopy architecture and light transport by using a lightweight boundary-based leaf cluster description approach [29] or a semi-empirical radiative transfer acceleration technique [30]. This makes it especially suited for simulating remote sensing signals, including images, LiDAR point clouds, and SIF, over large forested or agricultural landscapes. See also Table 1.
While powerful, these advanced RTMs impose significant computational bottlenecks in high-throughput tasks such as pixel-wise inversion over satellite scenes or global sensitivity analyses. For instance, a typical 10-m resolution Sentinel-2 (S2) image covering 100 × 100 km contains 10⁸ pixels (10,000 × 10,000). Performing a full RTM inversion for each pixel, which might involve iterative optimization or look-up table (LUT) searches, would be computationally burdensome for operational product generation, potentially taking days or even weeks. This challenge is further compounded when considering time series analysis, where the same pixels need to be processed across many dates.
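As a back-of-the-envelope check of these figures, the sketch below counts the pixels in such a tile and converts a per-evaluation cost into serial wall-clock time. The number of forward runs per pixel and the two per-run timings (1 ms for a fast RTM, 1 µs for an emulator) are illustrative assumptions chosen within the ranges quoted in this review, not measured benchmarks.

```python
# Back-of-the-envelope cost of pixel-wise inversion over one Sentinel-2 tile.
# runs_per_pixel and seconds_per_run are illustrative assumptions.

def n_pixels(extent_km: float, resolution_m: float) -> int:
    """Number of pixels in a square tile of side extent_km at resolution_m."""
    side = round(extent_km * 1000 / resolution_m)
    return side * side

def wall_clock_days(pixels: int, runs_per_pixel: int, seconds_per_run: float) -> float:
    """Serial (single-core) wall-clock time in days."""
    return pixels * runs_per_pixel * seconds_per_run / 86400.0

pixels = n_pixels(100, 10)  # 100 km x 100 km tile at 10 m resolution
rtm_days = wall_clock_days(pixels, runs_per_pixel=10, seconds_per_run=1e-3)
emu_days = wall_clock_days(pixels, runs_per_pixel=10, seconds_per_run=1e-6)

print(f"pixels per tile: {pixels:.0e}")
print(f"RTM inversion: {rtm_days:.1f} days; emulator: {emu_days * 24 * 60:.1f} minutes")
```

Under these assumptions the serial RTM cost lands in the range of days, while the emulated version takes minutes; slower RTMs or more forward runs per pixel scale the gap proportionally.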
Introducing atmospheric RTMs (e.g., 6S, MODTRAN, libRadtran) adds yet another layer of complexity (see Table 1). Atmospheric RTMs simulate how radiation is modified by atmospheric gases (e.g., water vapor, ozone), aerosols (e.g., type, optical depth), and viewing geometry (solar and observation angles). These models are essential for atmospheric correction, converting top-of-atmosphere (TOA) radiance into surface reflectance. When these atmospheric RTMs are subsequently coupled with vegetation RTMs, e.g., to simulate TOA reflectance from surface properties and atmospheric conditions for sensor design or retrieval algorithm training at the TOA scale, the resulting simulation chains become even slower and higher-dimensional. The interaction of numerous input parameters, such as varying aerosol loads, water vapor content, and viewing geometries, can easily result in parameter spaces spanning tens of dimensions [20].
Consequently, traditional inversion based on brute-force iterative optimization (e.g., downhill simplex, Levenberg-Marquardt) becomes impractical at scale [40,41], while limiting the size of LUTs to enhance processing speed typically leads to a decline in retrieval accuracy [42]. Even advanced MC sampling methods, commonly used for UQ, normally demand thousands to millions of model runs, rendering them infeasible for large-scale applications without substantial computational resources [43]. These inherent challenges underscore the critical need for fast emulators that can accurately and efficiently approximate RTM outputs across these vast and complex domains.

3. Emulation as a Surrogate Modeling Strategy

3.1. General Principles and Core Emulation Approaches

The principle of emulation entails constructing a computationally efficient surrogate model that approximates the behavior of a complex RTM or other deterministic, physically-based models [22]. For several decades, statistical learning techniques have been employed in climate and environmental sciences to emulate complex system dynamics [24,44,45,46,47,48,49,50,51]. In remote sensing, emulation functions as an inverse regression model. While a typical regression model uses spectral data to predict vegetation or atmospheric properties, emulation instead takes atmospheric or biophysical RTM parameters as an input to generate synthetic spectral data. In the context of RTMs, these emulators aim to replicate RTM outputs with high accuracy while drastically reducing computation time—often achieving speed-ups of several orders of magnitude [6,27]. See also Figure 1 for a visual illustration of the RTM emulation concept. The conceptual figure shows a parameter sampling scheme that defines input combinations for a computationally-intensive RTM (e.g., SCOPE), which simulates hyperspectral reflectance and fluxes. The resulting dataset supports two modelling branches: (i) a direct retrieval route (top), where spectra—optionally compressed by principal component analysis (PCA)—are linked to biophysical traits via regression models, and (ii) an emulation route (bottom), where a surrogate model learns to reproduce SCOPE outputs from inputs, effectively approximating the RTM outputs. This efficiency gain is especially advantageous in computationally intensive workflows, such as replicating outputs from advanced RTMs, model inversion, sensitivity analysis, data assimilation, and operational near-real-time applications.
A common characteristic of these emulators is their foundation in adaptive and flexible MLRAs [22,52]. MLRAs enable the modeling of non-linear relationships between input and output parameters. Owing to their relatively low computational cost, these algorithms allow emulators to generate spectral outputs far more rapidly than running full RTMs. Emulators are typically trained on a representative set of RTM simulations, which span the range of relevant input parameters. Once trained, they can rapidly predict outputs for unseen input combinations with negligible computational overhead. This enables the efficient exploration of high-dimensional parameter spaces that would be prohibitively slow using the original RTMs [6,27]. Importantly, emulators are usually validated against the original RTM to ensure fidelity to physical laws, which enhances generalizability, especially when extrapolating slightly beyond the training domain [26,27].
Emulators are thus fundamentally built upon statistically learned models. Recent advances in MLRAs, and more recently in deep learning, have significantly enhanced the predictive capabilities and broadened the applications of emulators [53]. A wide range of emulation approaches has been employed to approximate RTMs efficiently, spanning from classical data-driven regression algorithms to the latest advanced MLRAs. MLRAs such as Neural Networks (NNs) and, more recently, deep learning NNs (DLNNs), Gaussian Process Regression (GPR), Random Forests (RF), Support Vector Regression (SVR), Kernel Ridge Regression (KRR), and Polynomial Chaos Expansion (PCE) have been successfully used to emulate various types of physically based models (see Table 2 and Table 3 for details). Although PCE has seen little use in RTM emulation, it is worth highlighting due to its established role in emulation studies in other fields [54,55,56,57,58]. The suitability of each method depends on factors including the RTM’s complexity and non-linearity, the dimensionality of the input space, interpretability needs, and whether UQ is required [52].
Table 2 summarizes the key characteristics, advantages, and limitations of established MLRAs used for RTM emulation. GPR remains a strong choice for its inherent UQ and effectiveness with small datasets, though its high computational cost hampers scalability. NNs and Deep Learning NNs (DLNNs) are well-suited for capturing complex, nonlinear relationships and scale efficiently to large datasets. DLNNs offer state-of-the-art performance in high-dimensional and data-rich scenarios, yet they require significant training data, are more opaque in their interpretability, and only provide approximate uncertainty estimates unless explicitly designed to do so. RFs are valued for their robustness, fast training, and interpretability via feature importance measures. Although RFs can approximate predictive uncertainty through ensemble variance, they lack a principled probabilistic UQ framework and may underperform on fine-grained, highly continuous outputs. SVR achieves high accuracy and robustness in medium-sized datasets, though it is sensitive to kernel and hyperparameter settings and struggles with large-scale datasets. KRR offers competitive accuracy with fewer tuning demands than GPR and SVR—and therefore trains faster and is more efficient in application—but lacks native uncertainty estimates. Finally, PCE excels in analytical UQ and global sensitivity analysis, but is constrained by the curse of dimensionality (yet, see also Section 4.2) and has limited flexibility in capturing strong nonlinearities. Overall, the selection of an emulation method depends on trade-offs between accuracy, scalability, interpretability, and the need for UQ, with different approaches better suited to different RTMs and application settings.
Two examples of GPR emulators are presented in Figure 2. GPR was selected as the emulator since it achieved the highest accuracy among the evaluated MLRAs when trained and validated on the original spectra (results not shown; illustrative purpose only). In comparison to the computational demands of the original simulations—requiring about half an hour to nearly a full day on a single CPU to generate (1) top-of-canopy (TOC) reflectance with LESS or (2) TOA radiance spectra with the coupled PROSAIL–libRadtran models—the GPR emulators reproduced, respectively, one thousand TOC reflectance and TOA radiance spectra within seconds. Training of each emulator was completed in under 20 s, underscoring the orders-of-magnitude computational savings.
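To make the appeal of GPR concrete, the following minimal sketch fits an exact Gaussian process to a handful of runs of a toy function and returns both a predictive mean and a predictive standard deviation. It is not the emulator used for Figure 2: `toy_rtm` is a hypothetical one-dimensional stand-in for a single RTM output, and the kernel hyperparameters are fixed by hand, whereas in practice they are usually optimized (e.g., by maximizing the marginal likelihood).

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3, variance=1.0):
    """Squared-exponential kernel between point sets a (n, d) and b (m, d)."""
    sq_dist = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gpr_fit_predict(x_train, y_train, x_test, noise=1e-6):
    """Exact GP regression: posterior mean and standard deviation at x_test."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_st @ alpha
    v = np.linalg.solve(chol, k_st.T)
    var = np.diag(rbf_kernel(x_test, x_test)) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

def toy_rtm(x):
    """Hypothetical stand-in for one scalar RTM output (not a real RTM)."""
    return np.exp(-3.0 * x[:, 0]) + 0.1 * np.sin(6.0 * x[:, 0])

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(25, 1))   # 25 "simulations"
y_train = toy_rtm(x_train)

x_test = np.linspace(0.0, 1.0, 200)[:, None]    # inside the training domain
mean, std = gpr_fit_predict(x_train, y_train, x_test)
rmse = float(np.sqrt(np.mean((mean - toy_rtm(x_test)) ** 2)))

# Far outside the training domain the predictive uncertainty grows again.
_, std_far = gpr_fit_predict(x_train, y_train, np.array([[2.0]]))
print(f"RMSE inside domain: {rmse:.2e}; max std inside: {std.max():.2e}; "
      f"std at x=2: {std_far[0]:.2f}")
```

The Cholesky factorization of the n × n training kernel matrix is the step that scales cubically with the number of training samples, which is the scalability limitation noted above.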

3.2. Proof-of-Concept Studies Demonstrating the Potential of Emulation in Approximating RTMs

Initial proof-of-concept studies effectively illustrated the capability of emulators to replicate RTM outputs with high precision and notable computational efficiency. While emulation has a longer history in other scientific fields, its application to RTMs began around a decade ago with two pioneering works. First, Rivera et al. [26] introduced a statistical learning-based emulator toolbox for approximating SCOPE outputs, specifically reflectance and SIF. Among the evaluated MLRAs, KRR and NNs achieved high reconstruction accuracy, with relative errors below 0.5% when trained on a small set of 500 simulations. Using PCA for dimensionality reduction (DR), NN and KRR emulators ran approximately 50× and 800× faster, respectively, than the original SCOPE model. In parallel, Gómez-Dans et al. [27] demonstrated emulation use cases for both vegetation (PROSAIL) and atmospheric (6S) RTMs. Their GPR emulators reproduced model outputs and finite-difference-based gradients accurately, achieving speed-ups between 10,000 and 50,000 times compared to the original models. These foundational studies spurred broader research in exploring distinct RTMs, MLRAs, and output variables. For instance, Verrelst et al. [61] systematically evaluated the impact of machine learning type, integration with DR, and LUT size. Their results showed that well-configured GPR and NN emulators could reconstruct SCOPE outputs with relative errors below 2% for reflectance and 4% for SIF, while running about 250 times faster than SCOPE. Vicent et al. [42] subsequently benchmarked combinations of MLRAs and DR against classical interpolation techniques for PROSAIL and MODTRAN. Emulation consistently outperformed interpolation techniques in spectral reconstruction, with GPR achieving up to tenfold higher accuracy while maintaining competitive speed. Additionally, GPR, formulated within a Bayesian framework, inherently provides associated uncertainty estimates.

3.3. Recent Progress in Emulation in Vegetation and Atmospheric RTMs

MLRA-based emulation of vegetation RTMs has now become an established strategy for accelerating inversion workflows and facilitating large-scale applications. For example, Shi et al. [63] applied multiple regression algorithms (RF, ANN, and SVR) to emulate a coupled soil–canopy–atmosphere RTM ACRM (Atmospheric Correction Reflectance Model) [81] and 6S, enabling accurate and efficient retrievals of vegetation biophysical variables from satellite data. Extending to 3D canopy structures, Makhloufi and Kallel [82] developed an ANN-based emulator of the DART model, integrated with a continuous MC inversion framework. Applied to S2 imagery, this approach significantly reduced computational burden while supporting uncertainty-aware retrievals. Alternatively, ref. [30] proposed a semi-empirical acceleration approach, based on four-stream theory, that accurately simulates reflectance from 400 nm to 2500 nm using a few predefined soil, branch, and leaf optical properties. This opens the possibility of emulating only a few bands to replicate the full 400–2500 nm range of LESS-like simulations, with an acceleration of more than 320 times. These three recent studies demonstrate the potential of emulators to support accurate, scalable, and efficient RTM applications in vegetation remote sensing. However, despite the growing importance of 3D and coupled RTMs, applications of surrogate modeling remain scarce. The limited use of dedicated emulators for 3D RTMs highlights an important research gap and a promising opportunity to advance robust emulation frameworks for complex vegetation RTMs. By contrast, emulation efforts have advanced more substantially in atmospheric applications.
Recent advances in atmospheric RTM emulation have capitalized on MLRAs to significantly improve computational efficiency and flexibility. Fundamental work by Brodrick et al. [64] introduced a generalized NN emulator for radiative transfer in imaging spectroscopy, enabling flexible reflectance retrievals from complex scenes. Veerman et al. [83] developed an NN surrogate for the gas optics component of RRTMGP (Rapid Radiative Transfer Model for General circulation–Parallel), achieving near-original accuracy with a fourfold speed-up. In the Earth system modeling context, Belochitski and Krasnopolsky [84] demonstrated that emulated RT components can be integrated into hybrid general circulation models (GCMs) while maintaining stable long-term behavior. Ukkonen [67] systematically explored design choices for neural RTM emulators, showing that architecture, input representation, and training strategies have a strong impact on emulation accuracy. Vicent Servera et al. [73] proposed a multifidelity GPR framework for emulating atmospheric RTMs, which integrates information from different fidelity levels to improve predictive performance and reduce training costs. Zhong et al. [85] applied an emulated RRTMG within the WRF (Weather Research and Forecasting) model, demonstrating notable runtime reductions without degrading forecast performance. Jasso-Garduño et al. [70] presented a DLNN emulator for the 6S RTM, enabling fast approximations of atmospheric corrections. Emulation of MODTRAN-based TOA reflectance for MODIS bands using RF and NNs was conducted by Gonzalez et al. [86], yielding accurate and efficient surrogates. Similarly, Lamminpää et al. [87] employed GPR to replicate NASA’s OCO-2 forward model with instrument-level accuracy and orders-of-magnitude speed-ups. Zucker et al. [88] further demonstrated that physics-informed neural networks (PINNs) can directly solve the radiative transfer equation, yielding physically consistent and accurate solutions (see Section 4.3). Additionally, emulation has been applied to aerosol optics in E3SM (Energy Exascale Earth System Model) using randomly wired NNs [89], and probabilistic surrogates have been proposed for CRTM (Community Radiative Transfer Model) in support of satellite data assimilation [90]. Finally, Sgattoni et al. [91] presented an emulation approach for the FORUM (Far-infrared Outgoing Radiation Understanding and Monitoring) satellite mission, selected as ESA’s 9th Earth Explorer mission, aiming to approximate the inverse retrieval of atmospheric properties from far-infrared spectra using simulated data and NNs. Altogether, these studies underscore the increasing maturity and breadth of atmospheric RTM emulation, establishing it as a powerful approach across remote sensing, forecasting, and climate modeling domains.

3.4. Trends in MLRAs for RTM Emulation Applications

In light of the above studies, we can now evaluate the suitability of established MLRAs for emulation. Table 3 qualitatively compares commonly used MLRAs for RTM emulation based on four key characteristics: (1) accuracy: how well the emulator replicates RTM outputs; (2) UQ: the ability to estimate prediction confidence; (3) scalability: how efficiently the method handles increasing data size or complexity; and (4) interpretability: how transparently model behavior and predictions can be understood. Both standard NNs and DLNNs offer high accuracy and scalability, with DLNNs particularly suited for learning complex input–output mappings; yet, they are limited in interpretability and only provide approximate UQ. GPR excels in terms of UQ and accuracy, although scalability is limited. RF is robust and scalable and offers moderate-to-high accuracy, with empirical rather than intrinsic UQ. KRR and SVR offer balanced accuracy and interpretability, although without native UQ, and differ in scalability and hyperparameterization. KRR often surpasses SVR in both accuracy and practical scalability, benefiting from the simplicity of tuning a single regularization parameter. PCE stands out for its analytical UQ and interpretability, making it ideal for sensitivity studies, albeit less suited to complex, high-dimensional problems. This likely explains why PCE has seen little use for emulating RTM outputs, which often span hundreds of spectral bands. However, integrating PCE with DR techniques could help overcome this limitation (see Section 4.2). Altogether, selecting an appropriate MLRA requires balancing trade-offs among accuracy, UQ, scalability, and interpretability according to the application’s needs. In most emulator designs, accuracy is the primary driver, which often makes (DL)NNs the preferred choice. Conversely, when robust UQ is essential, GPR is often favored for its combination of high accuracy and intrinsic probabilistic framework, though its limited scalability constrains the feasible size of training datasets.

3.5. Emulation Applications Beyond RTMs: LSMs, ESMs, and DGVMs

While this review primarily addresses the emulation of RTMs in the optical domain, it is worth noting that emulation techniques have a long-standing history in other areas of Earth observation and environmental modeling. In fact, they have been widely used to accelerate computationally demanding environmental models for over a decade [24,44,45,46,47,48,49,50,51]. Surrogate modeling has become instrumental in accelerating simulations and enabling UQ in complex Earth system components, including Earth system models (ESMs), land surface models (LSMs), and dynamic global vegetation models (DGVMs). These large models often involve highly nonlinear, computationally intensive processes that simulate biogeophysical, hydrological, and biogeochemical dynamics across spatiotemporal scales by making use of EO data. This section briefly reviews recent advances in emulating Earth system models to demonstrate that the same MLRA families are employed as in RTM emulation. For instance, Lu and Ricciuto [92] employed MLRA-based surrogates, including NNs and gradient-boosted decision trees (GBDTs), to emulate components of ESMs. In this context, DR techniques such as singular value decomposition (SVD) (see also Section 4.2) were applied to compress high-dimensional outputs before training the surrogate. Duffy et al. [93,94] proposed a general deep learning emulation framework for numerical models, demonstrating its applicability in satellite remote sensing and suggesting broader potential for Earth system and land surface model emulation. Regarding the domain of LSMs, Baker et al. [95] used sparse GPR (see also Section 4.3) to emulate outputs of the high-resolution JULES (Joint UK Land Environment Simulator) model. Their approach demonstrated that GPR can offer accurate surrogate representations even for fine-scale simulations at high speed, while enabling uncertainty estimation. Similarly, Xu et al. [96] used a GPR-based surrogate to support a Bayesian calibration framework for runoff generation in E3SM, and Watson-Parris et al. [97] introduced ESEm (Earth System Emulator), an open and scalable emulator platform combining ensemble learning and DR for ESM calibration. Recent advancements extend emulation to complex processes like wildfire modeling and plant functional type dynamics. For example, Zhu et al. [98] developed a DLNN emulator for wildfire activities in ESMs using a fully connected NN, whereas Li et al. [99] applied XGBoost (see also Section 4.3) to emulate plant coexistence dynamics in the ELM-FATES model (a demographic vegetation model that operates within the E3SM land model framework (ELM)). At the climate system level, Beusch et al. [100,101] introduced MESMER (Modular Earth System Model Emulator with spatially Resolved output), a statistical framework to emulate temperature responses across spatial scales using a combination of pattern scaling and autoregressive modeling. Further, Bouabid et al. [102] proposed FaIRGP, a GPR emulator for global surface temperature projections with UQ, and ref. [103] presented Graph Convolutional NNs (see also Section 4.3) as surrogate models for spatially explicit climate simulations with UQ.
Taken together, these examples highlight that MLRAs provide efficient, flexible surrogates for complex, high-dimensional Earth-system models. For spectrally or spatially structured outputs, DR (e.g., PCA/SVD) is commonly applied to compress the target space, improving training efficiency and simplifying the emulator.

4. Trends and Advances in Emulation Methodologies

4.1. Empirical vs. RTM-Based Emulation: The Role of Training Data Sampling

One key distinction among emulators lies in the nature of their training data. Some emulators are trained on purely empirical relationships between inputs and observed spectral data, essentially learning directly from real-world data, e.g., for scene-to-scene translation (see also Section 5.3). However, the most robust approaches train emulators on simulated outputs from RTMs. This latter approach offers several critical advantages: it ensures consistency with known physical principles, allows for generalization across sensors and across vegetation and atmospheric variability (as the underlying physics remains constant), and can even permit extrapolation to conditions not yet observed in real data. The capability to accurately replicate an RTM’s output is crucial for broad applicability and predictive power in novel environments. To reach high output accuracy, the quality and representativeness of an emulator’s training data are paramount.
To train RTM emulators efficiently, it is crucial to use sampling strategies that cover the high-dimensional input space broadly and uniformly and thereby provide the model with a diverse set of training data. Ideally, the parameter space is sampled with good coverage, including boundaries and rare but plausible combinations. To achieve this, space-filling sampling designs such as (1) Latin Hypercube Sampling (LHS) [104], (2) Sobol sequences [105], or (3) Halton sequences [106] are commonly employed. See Table 4 for a qualitative comparison of their key properties. These methods spread samples evenly across the entire input domain, thereby avoiding clustering and providing adequate coverage of all dimensions, which ultimately leads to better model generalization. For instance, LHS divides the range of each input variable into as many equal-probability strata as there are samples and places exactly one sample in each stratum, providing more uniform coverage than simple random sampling. Given its simplicity, flexibility, and ability to produce well-distributed samples across high-dimensional input spaces, LHS has become the most widely adopted sampling strategy in RTM emulation studies [26,42,61].
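As a minimal sketch of how such space-filling designs can be generated, the following Python snippet draws LHS, Sobol, and Halton designs with SciPy's `scipy.stats.qmc` module and rescales them to physical ranges. The three input variables, their bounds, and the sample size are illustrative assumptions, not values taken from the studies cited above.

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical RTM inputs and bounds (illustrative assumptions only).
names = ["LAI", "Cab", "SZA"]
lower = np.array([0.0, 5.0, 0.0])
upper = np.array([8.0, 80.0, 60.0])
d, n = len(names), 128  # a power of two keeps the Sobol design balanced

samplers = {
    "LHS": qmc.LatinHypercube(d=d, seed=0),
    "Sobol": qmc.Sobol(d=d, scramble=True, seed=0),
    "Halton": qmc.Halton(d=d, scramble=True, seed=0),
}

unit_designs, designs = {}, {}
for label, sampler in samplers.items():
    unit = sampler.random(n)                        # points in the unit hypercube
    unit_designs[label] = unit
    designs[label] = qmc.scale(unit, lower, upper)  # rescale to physical ranges
    # Lower discrepancy indicates more uniform coverage of the unit hypercube.
    print(f"{label:6s} discrepancy = {qmc.discrepancy(unit):.5f}")

# LHS property: each of the n equal-width strata of every input holds one sample.
strata = np.floor(unit_designs["LHS"] * n).astype(int)
lhs_is_stratified = all(len(np.unique(strata[:, j])) == n for j in range(d))
print("LHS one sample per stratum:", lhs_is_stratified)
```

Each row of `designs[...]` would then be passed to the RTM to produce one training spectrum. The discrepancy values only compare how evenly the points fill the unit hypercube; they say nothing about emulator accuracy.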
Alternatively, active learning and adaptive sampling techniques have emerged to further optimize the training process. Instead of pre-generating all training samples, these methods iteratively select new samples based on the current emulator’s uncertainty. For example, measures of Euclidean-based diversity (see review for regression applications: [107]) or regions of high predictive variance can guide the sampling process. This allows focusing sampling efforts on areas where the model is least confident or where the RTM response exhibits high nonlinearity. For instance, Ma et al. [72] explored the active subspace method [108], which finds a low-dimensional linear subspace (spanned by combinations of input variables) that captures most of the important variation in the output. Another notable implementation of this principle is the AMOGAPE (Active Multi-Output Gaussian Process Emulator) framework, introduced by Svendsen et al. [109], which combines GPR-based emulation with an active sampling strategy tailored for RTM spectral outputs. The acquisition function balances exploration and exploitation by targeting inputs that yield high predictive uncertainty or rapid output variation. Their results demonstrated that this strategy can substantially reduce the number of RTM evaluations needed to construct accurate emulators, especially in complex or high-dimensional settings. Although AMOGAPE is built upon GPR, active learning and adaptive sampling techniques more generally offer a promising direction for building efficient surrogate models.
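The variance-guided idea described above can be illustrated with a deliberately small sketch. It is not the AMOGAPE algorithm: the acquisition rule here uses predictive variance only, the GP has fixed (assumed) hyperparameters, and `toy_rtm` is a hypothetical one-dimensional stand-in for an expensive RTM.

```python
import numpy as np

def toy_rtm(x):
    """Hypothetical stand-in for an expensive RTM: one input, one output."""
    return np.sin(6.0 * x) * np.exp(-x)

def rbf(a, b, length=0.15):
    """Squared-exponential kernel between two 1-D input arrays."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Posterior mean and variance of a zero-mean GP with fixed hyperparameters."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    mean = Ks.T @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.clip(var, 0.0, None)

candidates = np.linspace(0.0, 1.0, 201)  # pool of inputs that could be simulated
x_train = np.array([0.0, 0.5, 1.0])      # small initial design
y_train = toy_rtm(x_train)

_, var0 = gp_posterior(x_train, y_train, candidates)  # uncertainty before sampling
for _ in range(10):
    _, var = gp_posterior(x_train, y_train, candidates)
    x_new = candidates[np.argmax(var)]            # most uncertain candidate
    x_train = np.append(x_train, x_new)
    y_train = np.append(y_train, toy_rtm(x_new))  # one additional "RTM run"

mean, var = gp_posterior(x_train, y_train, candidates)
rmse = np.sqrt(np.mean((mean - toy_rtm(candidates)) ** 2))
print(f"training points: {len(x_train)}, max variance: {var.max():.2e}, RMSE: {rmse:.2e}")
```

Because a GP's predictive variance depends only on where samples are located, this pure-variance rule behaves like an adaptive space-filling design; acquisition functions that also react to rapid output variation, as described for AMOGAPE, would additionally need information from the simulated outputs.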

4.2. The Role of Spectral Dimensionality Reduction (DR) in Emulation of RTMs

Spectral dimensionality reduction (DR) is frequently applied when emulating RTM spectral output spaces (reflectance, radiance, SIF) that may comprise hundreds of bands. Owing to strong collinearity in hyperspectral data [110], these outputs can be compressed into a compact latent space—a lower-dimensional representation that preserves the dominant spectral variability—using (1) principal component analysis (PCA) [111], (2) singular value decomposition (SVD) [112], or (3) autoencoders (AEs) [113]. These are the most commonly used DR approaches in RTM emulation, especially for compressing spectral data and large LUTs (see Table 5 for a qualitative comparison). When applied to mean-centered data, PCA and SVD yield identical principal directions and projections [111,114]. In practice, PCA is the default DR in RTM emulation studies, while AEs are integral to many NN-based designs and provide a nonlinear latent space capable of capturing curved spectral manifolds; SVD is more often discussed in broader environmental emulation contexts (see Section 3.5).
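The stated equivalence of PCA and SVD on mean-centered data can be checked numerically. The sketch below uses synthetic, strongly collinear "spectra" (an assumption made purely for illustration) and compares PCA computed from the covariance matrix with a direct SVD of the centered data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic collinear "spectra": 200 samples x 50 bands driven by 3 latent factors.
factors = rng.normal(size=(200, 3)) * np.array([5.0, 2.0, 0.5])
Y = factors @ rng.normal(size=(3, 50)) + 0.01 * rng.normal(size=(200, 50))
Yc = Y - Y.mean(axis=0)                  # mean-centering is the key precondition
n = len(Yc)

# Route 1: PCA as an eigendecomposition of the sample covariance matrix.
eigval, eigvec = np.linalg.eigh(Yc.T @ Yc / (n - 1))
order = np.argsort(eigval)[::-1]         # sort by decreasing explained variance
eigval, eigvec = eigval[order], eigvec[:, order]

# Route 2: SVD of the centered data matrix.
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)

k = 3
# Principal directions agree up to an arbitrary sign: |cosine| is 1 per component.
alignment = np.abs(np.sum(eigvec[:, :k] * Vt[:k].T, axis=0))
# Explained variances agree: eigenvalue_i = s_i^2 / (n - 1).
variances_match = np.allclose(eigval[:k], s[:k] ** 2 / (n - 1))
# Projections (scores) agree up to the same sign ambiguity.
scores_match = np.allclose(np.abs(Yc @ eigvec[:, :k]), np.abs(U[:, :k] * s[:k]), atol=1e-6)
print(alignment, variances_match, scores_match)
```

The sign ambiguity is harmless for emulation as long as the same basis is used for compression and reconstruction.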
A common and effective workflow is to train in latent, reconstruct to full: the emulator learns to predict a small set of latent coefficients (e.g., 10–50) instead of the full spectrum, and a fixed decoder reconstructs the complete spectral output, using the PCA/SVD loadings for linear DR or the AE decoder for nonlinear DR. This reduces target dimensionality, accelerates training and inference, and often improves generalization by acting as a structural regularizer, with limited loss in accuracy when the latent basis captures the relevant variance. In pipelines where inputs are themselves spectral (rather than physical RTM parameters), the same DR strategy can be applied to both input and output spaces, with reconstruction on each side (see also Section 5.3); this dual-side compression further reduces computational burden and stabilizes learning in very high-dimensional settings.
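The "train in latent, reconstruct to full" workflow can be sketched in a few lines of NumPy. Everything specific here is an assumption for illustration: `toy_rtm` is a hypothetical two-input spectral model, the emulator is a hand-written kernel ridge regression with fixed hyperparameters, and k = 10 components is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)
wl = np.linspace(400.0, 2500.0, 300)   # wavelength grid (nm)

def toy_rtm(X):
    """Hypothetical stand-in for an RTM: 2 inputs in [0, 1] -> 300-band spectra."""
    a, b = X[:, :1], X[:, 1:2]
    nir = 1.0 / (1.0 + np.exp(-(wl - 720.0) / 20.0))              # red-edge-like step
    vis = np.exp(-0.5 * ((wl - (660.0 + 40.0 * a)) / 60.0) ** 2)  # shifting feature
    return 0.05 + 0.45 * (1.0 - np.exp(-3.0 * b)) * nir * (1.0 - 0.5 * vis)

def rbf(A, B, length=0.3):
    """Squared-exponential kernel between two sets of input vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length**2)

X_train, X_test = rng.uniform(size=(200, 2)), rng.uniform(size=(500, 2))
Y_train, Y_test = toy_rtm(X_train), toy_rtm(X_test)

# 1) Compress: PCA of the training spectra via SVD of the mean-centered matrix.
k = 10
mean = Y_train.mean(axis=0)
_, _, Vt = np.linalg.svd(Y_train - mean, full_matrices=False)
basis = Vt[:k]                           # k x 300 loadings (the fixed decoder)
Z_train = (Y_train - mean) @ basis.T     # 200 x k latent targets

# 2) Train in latent: kernel ridge regression from RTM inputs to latent coefficients.
alpha = np.linalg.solve(rbf(X_train, X_train) + 1e-6 * np.eye(len(X_train)), Z_train)

# 3) Predict latent coefficients and reconstruct full spectra with the decoder.
Z_pred = rbf(X_test, X_train) @ alpha
Y_pred = Z_pred @ basis + mean

rmse_emulator = np.sqrt(np.mean((Y_pred - Y_test) ** 2))
Y_proj = ((Y_test - mean) @ basis.T) @ basis + mean   # truncation error only
rmse_pca_only = np.sqrt(np.mean((Y_proj - Y_test) ** 2))
print(f"targets per sample: {Y_train.shape[1]} bands -> {k} latent coefficients")
print(f"RMSE emulator: {rmse_emulator:.2e}, RMSE of PCA truncation alone: {rmse_pca_only:.2e}")
```

Comparing the two RMSE values separates the error introduced by truncating the basis from the error of the regression itself, which helps when choosing the number of components.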
Beyond PCA/SVD/AEs, functional PCA (FPCA) [108]—a PCA variant for functional data observed over a continuous domain (e.g., wavelength) [115]—is a promising, though less commonly used, option for spectral emulation. By contrast, popular manifold learners such as t-SNE [116], Isomap [117], UMAP [118], and Kernel PCA [119] generally lack a straightforward inverse map; reconstructing spectra would require solving a pre-image problem or training a separate inverse model, complicating end-to-end emulation. In summary, latent-space DR offers a pragmatic path to scalable RTM surrogates: compress spectra to a low-dimensional latent space, train the emulator on latent targets (and inputs, if spectral), and decode to the full spectrum for downstream use.
Table 5. Comparison of common DR techniques used in RTM emulation workflows.
| Property | PCA [111] | SVD [112] | Autoencoder [113] |
|---|---|---|---|
| Type | Linear projection | Linear matrix factorization | Nonlinear encoder-decoder |
| Learning | Unsupervised (closed-form) | Unsupervised (closed-form) | Unsupervised (trained with backpropagation) |
| Nonlinearity | No | No | Yes |
| Interpretability | High (ordered by variance) | Moderate (singular vectors) | Low (latent variables) |
| Scalability | Fast, memory-limited at scale | Efficient, scalable SVD libs. exist | Scales well; training cost higher |
| Compression | Effective for linear variance | Good for general matrices | Strong for nonlinear manifolds |
| Reconstruction | Inverse projection from PCs | Matrix product of truncated SVD | Decoder reconstructs from latent space |
| Accuracy | Good for linear data | Similar to PCA | Better for nonlinear data |
| Complexity | Simple, widely used | Simple, widely available | Requires architecture and tuning |
| RTM Use | Common for LUT/input reduction | Rare, yet applicable (similar to PCA) | Increasing use for LUT compression |
| References | [60,120] | | [66,69] |

4.3. Advanced Machine Learning for Emulation

Whereas the above sections demonstrated that established MLRAs are already used in the emulation of vegetation and atmosphere RTMs, several more novel algorithms, though not yet widely adopted in RTM applications, hold promise for developing more scalable, accurate, and uncertainty-aware surrogate models. These include scalable variants of GPR, deep learning architectures (e.g., CNNs, transformers, GANs), and advanced gradient-boosted decision-tree ensembles (e.g., XGBoost, LightGBM, CatBoost). Bayesian methods such as BART and Bayesian NNs (BNNs) offer native UQ. Table 6 summarizes the most relevant methods, their main strengths, and associated references. These emerging approaches could serve as a foundation for future developments in RTM emulation workflows. Recent studies outside the strict RTM domain already illustrate the potential of these advanced methods. Sparse GPR has been used to emulate high-resolution outputs of the JULES land surface model [95]. XGBoost has been applied to emulate plant coexistence dynamics in the ELM-FATES demographic vegetation model within the E3SM framework [99]. Graph Convolutional NNs (GCNNs) have been proposed as surrogate models for spatially explicit climate simulations with integrated UQ [103]. More recently, physics-informed NNs (PINNs) have been employed to directly solve the radiative transfer equation, enabling physically consistent and accurate approximations [88]. In the context of RTM emulation, choosing a suitable advanced MLRA would depend on factors such as the size and dimensionality of RTM outputs (although the methods could be combined with DR), the availability of training data, the need for interpretability or uncertainty estimates, and computational constraints.

4.4. Emulator Performance Trade-Offs

Given the above-described emulation literature, reported per-evaluation speed-ups (single forward call, excluding training) typically fall in the range of 10² to 10⁶ times faster, depending on model complexity, target dimensionality/resolution, and hardware baselines, with DR (e.g., latent projections via PCA/SVD/AEs) generally pushing results toward the upper end of this range [111,112,113,114]. However, these accelerations (and emulator accuracy/runtime more broadly) are inherently context-dependent, jointly governed by interacting design choices (algorithm family), training-data volume and coverage [104,134], effective output dimensionality (with or without DR) [111,113], and implementation details (sampling design, normalization, loss, early stopping). Because these factors differ across studies and interact nonlinearly, a universal quantitative ranking is neither stable nor meaningful. Instead, results consistently map onto a Pareto-like trade space [135] in which gains in accuracy are exchanged for higher computational cost and/or stronger data requirements. DR schemes systematically shift this trade space by lowering the effective output dimensionality, enabling near-equivalent accuracy at markedly reduced cost; this shift alone makes cross-paper numbers incomparable unless DR settings are matched [26,42,111,113]. On the algorithmic side, scalable GPR variants (sparse/variational/structured GPRs) provide principled accuracy–efficiency compromises [122,123,136,137], while modern NN architectures benefit from training in latent spaces for high-dimensional spectra [113]. Consequently, quantitative comparison only becomes informative under a shared experimental protocol (same RTM, same sampling and DR pipeline, same target variables and metrics, same hardware/budget constraints), as can be achieved in standardized community challenges (see Section 6). At the review level, no emulator emerges as universally optimal; selection should be guided by task-specific priorities across the accuracy–efficiency–uncertainty trade space.
Table 7 makes these trade-offs explicit by: (i) contrasting the most widely used MLRAs in two regimes—full hyperspectral outputs (exemplified by ~500 bands) versus a dimensionality-reduced latent space (exemplified by ~10 components); (ii) using a harmonized star scale where more stars always indicate better relative performance; (iii) comparing the key decision axes side-by-side (training/prediction speed, accuracy/fidelity, memory/storage efficiency, resistance to overfitting, interpretability); and (iv) adding concise remarks to state typical operating regimes and assumptions. This table can thus be read as a design map: DR consistently shifts configurations toward higher efficiency and robustness with limited loss in fidelity/accuracy, while the preferred algorithm depends on whether UQ, nonlinearity, or throughput is paramount. A pragmatic strategy is to train in a reduced latent space and reconstruct to full spectra, choosing the algorithm according to use-case priorities: (1) GPR(+DR) when UQ and interpretability are central; (2) NN(+DR) for highly nonlinear, data-rich emulation models; and (3) RF/KRR(+DR) for rapid prototyping, benchmarking, or ensemble aggregation.

5. Applications of Emulation

Having introduced the principles of emulators in optical remote sensing, we now turn to their practical applications. In RTM emulation, an MLRA is trained on RTM simulations, with RTM input variables serving as predictors and RTM spectra as the outputs. Because spectral outputs are inherently high-dimensional, DR is typically applied: the emulator is trained on compressed spectral components, which are later reconstructed into full spectra. This approach enables efficient, accurate surrogates for complex RTMs. In the following sections, we discuss four key emulation application domains: (1) global sensitivity analysis, (2) synthetic scene generation, (3) scene-to-scene emulation, and (4) retrieval.

5.1. Emulation for Global Sensitivity Analysis of RTMs

Global Sensitivity Analysis (GSA) quantifies how much variation in an RTM’s output is attributable to each input parameter, including their interactions. Variance-based methods such as Sobol’s sensitivity indices (i.e., first-order and total Sobol indices) provide a comprehensive decomposition of output variance but traditionally require thousands to millions of costly RTM runs, making GSA practically infeasible for complex models [138]. Emulators can substantially alleviate this computational bottleneck by replacing the full RTM with a fast surrogate, enabling variance-based GSA to be performed in minutes or hours instead of weeks. This acceleration allows researchers to: (1) identify influential RTM input parameters such as LAI, chlorophyll content, soil brightness, or atmospheric variables like aerosol optical depth and water vapor; (2) reveal complex nonlinear interactions between parameters—e.g., the synergistic effect of chlorophyll and LAI on reflectance; and (3) optimize input parameter ranges to ensure efficient and representative simulation campaigns.
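To make the variance-based procedure concrete, the sketch below estimates first-order and total Sobol indices by Monte Carlo with the commonly used Saltelli-type and Jansen-type estimators. The function `emulator` is a hypothetical stand-in for a trained surrogate; here it is the Ishigami test function, chosen only because its sensitivity indices are known analytically.

```python
import numpy as np

def emulator(X):
    """Hypothetical stand-in for a trained emulator: the Ishigami test function."""
    return np.sin(X[:, 0]) + 7.0 * np.sin(X[:, 1]) ** 2 + 0.1 * X[:, 2] ** 4 * np.sin(X[:, 0])

rng = np.random.default_rng(0)
n, d = 100_000, 3
A = rng.uniform(-np.pi, np.pi, size=(n, d))   # two independent sample matrices
B = rng.uniform(-np.pi, np.pi, size=(n, d))
fA, fB = emulator(A), emulator(B)
var_y = np.var(np.concatenate([fA, fB]))      # total output variance

first_order, total_order = np.empty(d), np.empty(d)
for i in range(d):
    AB = A.copy()
    AB[:, i] = B[:, i]                        # matrix A with column i taken from B
    fAB = emulator(AB)
    first_order[i] = np.mean(fB * (fAB - fA)) / var_y        # Saltelli-type estimator
    total_order[i] = 0.5 * np.mean((fA - fAB) ** 2) / var_y  # Jansen-type estimator

print("model evaluations:", n * (d + 2))
print("first-order:", np.round(first_order, 3))
print("total-order:", np.round(total_order, 3))
```

The analysis needs n(d + 2) model evaluations, here 500,000 for only three inputs, which illustrates why running it on the original RTM is often impractical while running it on a fast surrogate is not. The gap between total and first-order indices for the first and third inputs reflects their interaction.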
As a first demonstration of using emulators for GSA, Verrelst et al. [60] applied GPR, NN, and KRR emulators to the PROSAIL and MODTRAN5 models (see also Figure 3). Their work demonstrated the high accuracy of the emulators, and then successfully identified the key drivers of spectral variability. Extending this work, Verrelst et al. [139] performed a detailed emulator-based GSA on a coupled leaf-canopy-atmosphere RTM system (PROSAIL + MODTRAN5). Their GPR emulator achieved high fidelity (<2.5% relative error), allowing the identification of dominant contributors to TOA radiance. Results revealed that vegetation parameters (e.g., leaf chlorophyll, water thickness, LAI) dominated over atmospheric ones, demonstrating the feasibility of direct biophysical retrieval from TOA data without prior atmospheric correction. Similarly, in the atmospheric domain, Vicent Servera et al. [73] proposed a multifidelity GPR emulator framework for MODTRAN to support GSA while balancing computational cost and accuracy. Progressing along this line, Vicent Servera et al. [74] subsequently introduced a physics-aware feature selection framework for RTM emulation. By aligning GPR feature selection with variance-based GSA, the method pinpointed key atmospheric variables such as solar zenith angle, water vapor, and aerosol properties, contributing to more interpretable emulation pipelines. In the context of operational atmospheric correction, Zhou et al. [140] used MODTRAN and libRadtran to build LUTs, which were then emulated using an RF model to enable rapid GSA and subsequent surface reflectance estimation. Their GSA highlighted visibility and water vapor as dominant parameters affecting surface reflectance. It was concluded that emulator integration not only improved computational efficiency but also enhanced understanding of parameter sensitivity, supporting practical implementation in large-scale processing workflows. 
Overall, these works confirm that emulator-enabled GSA turns sensitivity analysis from a theoretical possibility into a practical and fast tool for RTM understanding, inversion optimization, and operational remote sensing retrieval model development.

5.2. RTM Emulation for Synthetic Scene Generation

RTM emulators also offer a fast alternative to generate synthetic reflectance or radiance scenes over simulated vegetated landscapes, vastly outperforming full RTM execution in speed. Importantly, emulated scenes can be tailored to specific sensor characteristics—such as spectral band configurations, signal-to-noise ratio, and spatial resolution—making them notably valuable within the context of satellite mission design. Those emulators can be subsequently integrated into end-to-end simulation frameworks [141,142] to assess system capabilities and optimize payload specifications. By emulating responses across large parameter spaces, emulators can form the core to enable fast exploration of “what-if” scenarios and foster a deeper understanding of how biophysical or environmental factors over a surface influence observed signals from space. This includes identifying sensitive spectral regions, investigating parameter interactions, or predicting sensor responses under unobserved conditions. In this context, a notable example is provided by Verrelst et al. [61], who developed GPR- and NN-based emulators of the SCOPE model to reproduce canopy reflectance and SIF spectra. Integrated into a so-called GUI Automated Scene Generator Module, the emulators produced synthetic sensor-specific reflectance and SIF images with <2% error for reflectance and <4% for SIF, thereby reducing processing time from days to minutes (see also Figure 4). These emulators support the simulation of realistic reflectance and SIF scenes over mapped landscapes and have been used in the context of ESA’s upcoming hyperspectral missions FLEX (FLuorescence EXplorer) [143] and CHIME (Copernicus Hyperspectral Imaging Mission) [144].
RTM-based emulators can also serve in spectral retrieval schemes that rely on physical principles. Pursuing this approach, Pato et al. [145] introduced a MODTRAN-based ML emulator that directly predicts at-sensor radiances in the O2-A absorption band, optimized for SIF retrieval. This emulator combines the physical radiative transfer principles embedded in MODTRAN with a fourth-degree polynomial regression model to provide accurate and efficient estimates of radiance, facilitating improved SIF inversion from airborne or satellite hyperspectral sensor data. In summary, RTM-based emulators enable fast, scalable, and sensor-specific scene generation for EO application development, satellite mission design, and the processing of satellite imagery. However, the fidelity of any emulated scene is bounded by the emulator’s validated accuracy and calibration within the intended domain of use; insufficient validation or out-of-domain inputs can propagate biases downstream. Aside from the studies mentioned above, the use of emulators for producing spatially explicit synthetic scenes remains largely unexplored, marking a clear opportunity for further research in end-to-end simulation frameworks and satellite image processing, e.g., in the context of developing atmospheric correction and retrieval pipelines [73,140].

5.3. Scene-to-Scene Emulation

Scene-to-scene emulation involves transforming spatially explicit remote-sensing products from one domain to another—for example, converting multispectral to hyperspectral reflectance, or airborne to satellite-scale SIF. This application is a higher-level form of emulation, distinct from traditional RTM emulators. Rather than reproducing an RTM’s forward simulations, it uses emulators trained on high-quality reference data to map directly and efficiently between image domains, enabling rapid generation of realistic, sensor-specific radiometric products. The workflow typically applies DR at both input and output stages, with back-projection used to reconstruct full-spectrum outputs.
Regarding the emulation of reflectance imagery, Verrelst et al. [139] presented a prototype of reflectance scene emulation in which S2 multispectral imagery was transformed into hyperspectral imagery using GPR models trained on empirical HyPlant hyperspectral reflectance observations. The resulting maps maintained physical consistency and spectral fidelity, enabling a first demonstration of hyperspectral scene reconstruction from operational satellite data. Building upon this, Morata et al. [146] employed NNs to emulate hyperspectral reflectance (402–2356 nm) from S2 imagery (see also Figure 5). The models achieved high accuracy (R² = 0.75–0.90, low NRMSE) and could process full S2-like hyperspectral tiles (e.g., 5490 × 5490 pixels) in seconds. Model uncertainty was quantified using NN dropout, allowing predictive confidence to be mapped spatially. In another application, Barrou Dumont et al. [147] developed an emulator for historical SPOT satellite imagery that converts S2 data to SPOT spectral and radiometric characteristics. This enabled the training of deep learning classifiers for snow and cloud classification on SPOT images without requiring reference data for SPOT, thereby overcoming limitations of historical archives and supporting long-term ecosystem monitoring.
Scene-to-scene emulation has also proven to be a promising approach for reconstructing and upscaling full-spectrum SIF. Morata et al. [76] developed an emulator trained on HyPlant airborne radiance data to estimate SIF, enabling fast and accurate reconstruction of SIF maps from radiance measurements. Building upon this, Morata et al. [120] presented a PCA-based approach to reconstruct full-spectrum SIF from HyPlant O2A and O2B band signals simulated with the SCOPE model. A KRR emulator was then trained to upscale the full-spectrum SIF to satellite scale at 30 m and 300 m resolution via PRISMA satellite reflectance spectra, producing FLEX-compatible synthetic full-spectrum SIF products. Importantly, their method incorporated uncertainty propagation throughout the reconstruction and upscaling steps, providing quantified confidence bounds on the emulated full-spectrum SIF estimates. These advances illustrate the potential of scene-to-scene emulation for generating realistic, high-resolution SIF products across platforms. The developed workflow supports mission calibration and validation, enabling the flexible generation of satellite-like SIF datasets from airborne or ground-based measurements, thereby supporting preparatory activities for upcoming missions such as FLEX.
Together, these studies demonstrate the emerging potential of scene-to-scene emulation as a powerful and computationally efficient approach for transforming remote sensing data across scales, sensors, and spectral domains (with fidelity bounded by validated in-domain accuracy). This supports a wide range of applications, from algorithm training and data fusion to satellite mission design and validation.

5.4. Emulation-Based Retrieval of Vegetation and Atmospheric Products

Finally, emulators have been applied to accelerate and enhance retrieval workflows for EO products, i.e., in mapping applications of vegetation and atmospheric variables using remote sensing data. By replacing computationally intensive RTMs with fast, accurate surrogate models, emulation enables large-scale, high-resolution inversion of multi/hyperspectral imagery. The traditional approach to inverting RTMs in image processing employs iterative optimization [148], minimizing a cost function that measures the mismatch between observed and simulated variables (e.g., reflectance). Direct application to images is often infeasible due to the high computational cost of per-pixel iterations. Substituting the RTM with an accurate emulator can greatly accelerate inversion, restoring its practicality for large-scale retrievals. When the emulator preserves the realism of the original model, inversions can not only run faster but also deliver improved estimates of vegetation properties. This principle was demonstrated by Verrelst et al. [78] using a KRR emulator of the DART model to numerically invert key vegetation variables such as LAI, leaf chlorophyll content (LCC), and fractional vegetation cover (FVC) over a forest as observed by the airborne hyperspectral sensor HyPlant (see also Figure 6). Likewise, Shi et al. [63] employed soil–canopy–atmosphere RTM emulators based on RF and NNs to retrieve multiple vegetation variables from S2 TOA satellite observations with enhanced computational efficiency. Similarly, Makhloufi and Kallel [82] coupled an ANN-based emulator of the DART model with MC inversion, enabling uncertainty-aware crop monitoring from S2 data.
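The iterative inversion described above amounts to a per-pixel least-squares problem with the emulator as the forward model. The sketch below shows the mechanics for a single pixel using `scipy.optimize.least_squares`; the function `emulator`, its two parameters, the bounds, and the noise level are hypothetical choices for illustration and do not reproduce any of the cited studies.

```python
import numpy as np
from scipy.optimize import least_squares

wl = np.linspace(400.0, 2500.0, 200)               # wavelength grid (nm)
nir = 1.0 / (1.0 + np.exp(-(wl - 720.0) / 20.0))   # red-edge-like step

def emulator(theta):
    """Hypothetical stand-in for a trained emulator: (LAI, LCC) -> reflectance."""
    lai, lcc = theta
    return (0.04 + 0.45 * (1.0 - np.exp(-0.6 * lai)) * nir
            + 0.12 * np.exp(-0.05 * lcc) * (1.0 - nir))

rng = np.random.default_rng(0)
theta_true = np.array([3.0, 40.0])
observed = emulator(theta_true) + rng.normal(scale=1e-3, size=wl.size)  # noisy pixel

def residuals(theta):
    """Mismatch between emulated and observed spectra (the cost is its sum of squares)."""
    return emulator(theta) - observed

fit = least_squares(residuals, x0=[1.0, 20.0], bounds=([0.0, 5.0], [8.0, 80.0]))
theta_hat = fit.x
print("true:", theta_true, "retrieved:", np.round(theta_hat, 2), "evaluations:", fit.nfev)
```

Each optimizer iteration calls the forward model several times, so the per-evaluation speed-up of the emulator translates almost directly into a speed-up of the retrieval when the procedure is repeated over every pixel of an image.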
Atmospheric product retrievals also benefit from emulation. Vicent Servera et al. [73] developed multifidelity GPR emulators for atmospheric RTMs, enabling efficient inversion for aerosol optical depth and water vapor with UQ. Likewise, Zhou et al. [140] combined atmospheric correction with machine learning emulators to accelerate surface reflectance retrieval from hyperspectral data, demonstrating improved processing throughput without sacrificing accuracy. Overall, emulation-based retrieval approaches represent a transformative avenue for rapid, scalable, and potentially uncertainty-aware mapping of EO vegetation and atmospheric products (with reliability bounded by validated in-domain accuracy), making emulators highly relevant for current and future satellite missions.

6. Ongoing Challenges and Future Outlook

To end this review, we offer some suggestions on ongoing trends and the future outlook. Recent developments in machine learning are pushing the boundaries of what is possible for emulation in remote sensing, pointing to several promising directions:
  • Robust emulators: A persistent challenge in RTM emulation is maintaining high predictive accuracy when applied to conditions outside the training domain. Strategies to address this include: (1) Physically informed sampling or adaptive sampling [149], ensuring training LUTs span the relevant parameter space; (2) Domain adaptation and transfer learning [150,151,152] to adjust emulators for new sensors, locations, or observation conditions; (3) Physics-informed constraints that embed RTM equations or invariants into learning architectures [131,153]; (4) Regularization and UQ to reduce overfitting and detect when predictions are extrapolations [154,155,156]; and (5) Cross-domain validation, testing on independent datasets with different distributions to evaluate robustness. Combining these strategies improves resilience to domain shifts and enhances emulator applicability in operational settings.
  • Community Resources and Benchmarking: The growth of open-source libraries, pre-trained emulators, user-friendly toolboxes, and collaborative benchmarks is a critical enabler for the field. ARTMO’s (Automated Radiative Transfer Models Operator) Emulator Toolbox has been available since 2015 and continues to be expanded with MLRAs and application tools (e.g., emulation of RTMs, GSA, scene generation) [26,61,76,120,139,146]. Emulator tools have also been prepared specifically for atmospheric RTMs within the ALG (Automated Lookup table Generator) toolbox [20,73,74,157]. Both GUI toolboxes are downloadable at https://artmotoolbox.com/. At the same time, initiatives such as the development of standardized Python packages (e.g., Surrogate Modeling Toolbox: SMT https://github.com/SMTorg/SMT [158,159]) or specific modules within larger machine learning libraries (e.g., PySMO: Python-based Surrogate Modeling Objects, as part of IDAES, https://idaes-pse.readthedocs.io/) lower the barrier to entry for researchers. Beyond individual studies, community emulation challenges, for example hackathon-style benchmarks such as the RTM emulation dataset (https://huggingface.co/datasets/isp-uv-es/rtm_emulation, all above websites accessed on 30 October 2025), can foster innovation, enable standardized comparison, and accelerate robust, generalizable solutions. By fixing RTMs/targets, train–validation–test splits and reporting protocols, such challenges promote fair evaluations across MLRAs, sampling designs and DR choices. These shared resources are key to consolidating best-performing emulators for common use cases, scaling applications, and broadening impact across the remote sensing community.
  • Physics-Informed Neural Networks (PINNs): As discussed, PINNs are gaining traction as a paradigm shift [160,161]. By embedding known physical relationships (e.g., spectral absorption features, conservation laws) directly into the neural network’s loss function, PINNs can achieve higher accuracy with less training data, extrapolate more reliably, and offer greater physical consistency than purely data-driven NNs. This blend of machine learning with physical constraints or knowledge represents a powerful direction for creating more robust and scientifically grounded emulators.
  • Explainable AI (XAI) for RTM Emulators: As emulators become more complex, especially deep learning-based ones, there is an increasing demand for explainable AI (XAI) techniques [162,163]. It can be expected that future work will focus on developing explainable methods to interpret how emulators make predictions, identify which input parameters are most influential for specific outputs, and understand the internal logic of the models. This will build trust in emulator-derived products and facilitate scientific discovery by elucidating complex RTM behaviors.
  • Multimodal and Multitemporal Emulation: Future emulators may move beyond single RTMs or single output types. Multimodal emulation involves models that jointly emulate multiple outputs or modalities (e.g., simultaneous prediction of reflectance, SIF, and thermal emissions from a single set of inputs), or fuse information across different sensor types (e.g., optical and thermal RTMs). This holistic approach supports integrated ecosystem monitoring and can help bridge gaps between diverse observations and process-based understanding. Moreover, the temporal aspect has so far been ignored in RTM emulation. In this respect, multitemporal emulation is a promising direction for dynamic vegetation models: learning the evolution of parameters and signals over time is essential for understanding phenology, crop growth, or ecological succession.
  • Extending emulation to underrepresented RTM domains: water and soil. While emulation has flourished for vegetation and atmospheric RTMs, other RTM domains (radiative transfer in natural waters and in soils) remain largely unexplored. In aquatic optics, HydroLight/EcoLight/WASI simulations are mostly used to train MLRAs for retrieving water constituents and inherent optical properties, rather than forward emulators of remote-sensing reflectance (R_rs(λ)) [164,165,166,167]. Similarly, soils are often simplified via static libraries or brightness scalings; dedicated forward soil RTM emulators are still absent despite mature forward models for soil reflectance/BRDF, including SOILSPECT/Hapke variants and the multilayer MARMIT family [168,169,170,171]. Looking ahead, developing fast, uncertainty-aware surrogates for water and soil RTMs would enable geometry-aware spectral generation and realistic background coupling, supporting large-scale retrieval and end-to-end uncertainty propagation across observation conditions.
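Returning to the PINN item in the list above, the core idea of embedding a physical relationship in the loss function can be sketched without any neural network. The example below only evaluates a composite loss (data misfit plus physics penalty) for two candidate curves; the governing relation dI/dτ + I = 0 (simple exponential attenuation), the observation locations, and the penalty weight are assumptions chosen for illustration, not the formulation of any cited PINN study.

```python
import numpy as np

tau = np.linspace(0.0, 3.0, 61)        # optical depth grid (step 0.05)
obs_idx = np.array([0, 20, 40, 60])    # only four depths carry "observations"
rng = np.random.default_rng(0)
obs = np.exp(-tau[obs_idx]) + 0.01 * rng.normal(size=obs_idx.size)

def composite_loss(pred, weight=1.0):
    """Data misfit plus a penalty on the residual of dI/dtau + I = 0."""
    data_term = np.mean((pred[obs_idx] - obs) ** 2)
    residual = np.gradient(pred, tau) + pred   # finite-difference physics residual
    physics_term = np.mean(residual ** 2)
    return data_term + weight * physics_term, data_term, physics_term

physical = np.exp(-tau)                          # obeys the assumed physics
wiggly = physical + 0.1 * np.sin(np.pi * tau)    # same values at the observed depths

loss_phys, data_phys, phys_phys = composite_loss(physical)
loss_wig, data_wig, phys_wig = composite_loss(wiggly)
print(f"physical curve: data={data_phys:.2e}, physics={phys_phys:.2e}")
print(f"wiggly curve:   data={data_wig:.2e}, physics={phys_wig:.2e}")
```

Both curves fit the four observations equally well, so a purely data-driven loss cannot distinguish them; the physics term penalizes the oscillating curve between observations, which is the mechanism by which such constraints can reduce the amount of training data needed.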

7. Conclusions

Emulation, or surrogate modeling, has emerged as a transformative approach in optical remote sensing, offering fast, scalable, and potentially uncertainty-aware alternatives to traditional RTMs. Established MLRAs such as GPR, KRR, RF, and (DL)NNs have demonstrated high-fidelity RTM approximations with speed-ups of $10^{2}$–$10^{5}\times$. GPR provides probabilistic UQ, while (DL)NNs excel at rapidly predicting high-dimensional outputs. Combining an MLRA with DR and back-projection greatly simplifies the reconstruction of contiguous spectral outputs. Crucially, accuracy and runtime depend jointly on interacting design factors: algorithmic complexity, the dimensionality of the RTM input–output space, the size and coverage of the training dataset, and the chosen DR strategy. Because these factors interact nonlinearly, there is no universally “best” emulator; instead, practitioners navigate a continuum of trade-offs among accuracy, generalization capability, and computational efficiency. In practice, training in a low-dimensional latent space and reconstructing to full spectra offers a pragmatic balance, with the preferred MLRA determined by task priorities (e.g., UQ vs. large-scale throughput).
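The latent-space workflow summarized above can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the implementation of any toolbox cited in this review: `toy_rtm` is a hypothetical analytic stand-in for an RTM, PCA is computed with a plain SVD, and kernel ridge regression (KRR) with a fixed RBF length scale serves as the MLRA. The class name `PcaKrrEmulator` and all hyperparameter values are chosen only for illustration.

```python
import numpy as np


def toy_rtm(X, wl):
    """Hypothetical stand-in for an RTM: 3 inputs -> spectrum over `wl`."""
    a, b, c = X[:, [0]], X[:, [1]], X[:, [2]]
    peak = a * np.exp(-((wl - 0.55) / 0.1) ** 2)
    return peak + b * wl + 0.1 * np.exp(-c * wl)


def rbf_kernel(A, B, length_scale):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)


class PcaKrrEmulator:
    """PCA on the spectra, KRR from inputs to PC scores, then back-projection."""

    def __init__(self, n_components=6, length_scale=0.5, ridge=1e-6):
        self.n_components = n_components
        self.length_scale = length_scale
        self.ridge = ridge

    def fit(self, X, Y):
        # Dimensionality reduction: keep the leading principal directions.
        self.mean_ = Y.mean(axis=0)
        _, _, Vt = np.linalg.svd(Y - self.mean_, full_matrices=False)
        self.components_ = Vt[: self.n_components]
        scores = (Y - self.mean_) @ self.components_.T
        # Regression in the latent space: one KRR solve shared by all scores.
        self.X_train_ = X
        K = rbf_kernel(X, X, self.length_scale)
        self.alpha_ = np.linalg.solve(K + self.ridge * np.eye(len(X)), scores)
        return self

    def predict(self, X):
        scores = rbf_kernel(X, self.X_train_, self.length_scale) @ self.alpha_
        # Back-projection from PC scores to the full spectrum.
        return scores @ self.components_ + self.mean_


rng = np.random.default_rng(0)
wl = np.linspace(0.4, 2.5, 200)  # assumed wavelength grid in micrometres
X_train = rng.uniform(0.0, 1.0, size=(300, 3))
X_test = rng.uniform(0.0, 1.0, size=(100, 3))

emulator = PcaKrrEmulator().fit(X_train, toy_rtm(X_train, wl))
Y_pred = emulator.predict(X_test)
rmse = float(np.sqrt(np.mean((Y_pred - toy_rtm(X_test, wl)) ** 2)))
print(f"test RMSE over {wl.size} bands: {rmse:.2e}")
```

Here the regressor learns 6 latent targets instead of 200 spectral bands, which is where the training-time saving comes from. Swapping KRR for GPR would add predictive uncertainty at a higher cost, while a neural network would scale better to large training sets, mirroring the trade-offs described above.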
On the application side, emulators have proven particularly valuable in enabling global sensitivity analysis, synthetic scene generation, uncertainty-aware scene-to-scene spectral translation (e.g., multispectral to hyperspectral), and retrieval of vegetation and atmospheric products from remote sensing data. Such applications are vital for optimizing satellite mission design, streamlining retrieval workflows, and fostering novel data-driven EO solutions. Looking ahead, challenges remain in ensuring generalization beyond training domains, improving interpretability, and establishing standardized benchmarking protocols. Continued development of community tools and integration of explainable MLRAs will be key to mainstream adoption. As both machine learning and physical models continue to advance, emulators are poised to become indispensable in operational remote sensing pipelines.

Author Contributions

Conceptualization, J.V.; methodology, J.V.; software, J.P.R.-C., M.M., J.Q. and Y.S.; validation, J.L.G.-S. and Y.S.; formal analysis, J.V.; investigation, J.V.; resources, J.V.; data curation, J.V.; writing—original draft preparation, J.V., M.M., J.L.G.-S. and J.Q.; writing—review and editing, J.V. and M.M.; visualization, J.V., M.M. and J.L.G.-S.; supervision, J.V. and J.Q.; project administration, J.V.; funding acquisition, J.V. All authors have read and agreed to the published version of the manuscript.

Funding

The research was funded by the European Research Council (ERC) under the FLEXINEL project: grant number 101086622. The views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Goody, R.M.; Yung, Y.L. Atmospheric Radiation: Theoretical Basis; Oxford University Press: Oxford, UK, 1995. [Google Scholar]
  2. Liou, K.N. An Introduction to Atmospheric Radiation; Elsevier: Amsterdam, The Netherlands, 2002; Volume 84. [Google Scholar]
  3. Myneni, R.B.; Ross, J. Photon-Vegetation Interactions: Applications in Optical Remote Sensing and Plant Ecology; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012. [Google Scholar]
  4. Myneni, R.; Maggion, S.; Iaquinta, J.; Privette, J.; Gobron, N.; Pinty, B.; Kimes, D.; Verstraete, M.; Williams, D. Optical remote sensing of vegetation: Modeling, caveats, and algorithms. Remote Sens. Environ. 1995, 51, 169–188. [Google Scholar] [CrossRef]
  5. Lenoble, J. Radiative Transfer in Scattering and Absorbing Atmospheres: Standard Computational Procedures; A. Deepak: Hampton, VA, USA, 1985; Volume 300. [Google Scholar]
  6. Verrelst, J.; Camps-Valls, G.; Muñoz-Marí, J.; Rivera, J.P.; Veroustraete, F.; Clevers, J.G.; Moreno, J. Optical remote sensing and the retrieval of terrestrial vegetation bio-geophysical properties–A review. ISPRS J. Photogramm. Remote Sens. 2015, 108, 273–290. [Google Scholar] [CrossRef]
  7. Verhoef, W. Light scattering by leaf layers with application to canopy reflectance modeling: The SAIL model. Remote Sens. Environ. 1984, 16, 125–141. [Google Scholar] [CrossRef]
  8. Jacquemoud, S.; Baret, F. PROSPECT: A model of leaf optical properties spectra. Remote Sens. Environ. 1990, 34, 75–91. [Google Scholar] [CrossRef]
  9. Jacquemoud, S.; Verhoef, W.; Baret, F.; Bacour, C.; Zarco-Tejada, P.J.; Asner, G.P.; François, C.; Ustin, S.L. PROSPECT + SAIL models: A review of use for vegetation characterization. Remote Sens. Environ. 2009, 113, S56–S66. [Google Scholar] [CrossRef]
  10. Berger, K.; Atzberger, C.; Danner, M.; D’Urso, G.; Mauser, W.; Vuolo, F.; Hank, T. Evaluation of the PROSAIL model capabilities for future hyperspectral model environments: A review study. Remote Sens. 2018, 10, 85. [Google Scholar] [CrossRef]
  11. van der Tol, C.; Verhoef, W.; Timmermans, J.; Verhoef, A.; Su, Z. An integrated model of soil-canopy spectral radiances, photosynthesis, fluorescence, temperature and energy balance. Biogeosciences 2009, 6, 3109–3129. [Google Scholar] [CrossRef]
  12. van der Tol, C.; Vilfan, N.; Dauwe, D.; Cendrero-Mateo, M.P.; Yang, P. The scattering and re-absorption of red and near-infrared chlorophyll fluorescence in the models Fluspect and SCOPE. Remote Sens. Environ. 2019, 232, 111292. [Google Scholar] [CrossRef]
  13. Gastellu-Etchegorry, J.; Martin, E.; Gascon, F. DART: A 3D model for simulating satellite images and studying surface radiation budget. Int. J. Remote Sens. 2004, 25, 73–96. [Google Scholar] [CrossRef]
  14. Qi, J.; Xie, D.; Yin, T.; Yan, G.; Gastellu-Etchegorry, J.P.; Li, L.; Zhang, W.; Mu, X.; Norford, L.K. LESS: LargE-Scale remote sensing data and image simulation framework over heterogeneous 3D scenes. Remote Sens. Environ. 2019, 221, 695–706. [Google Scholar] [CrossRef]
  15. Gastellu-Etchegorry, J.P.; Yin, T.; Lauret, N.; Cajgfinger, T.; Gregoire, T.; Grau, E.; Feret, J.B.; Lopes, M.; Guilleux, J.; Dedieu, G.; et al. Discrete anisotropic radiative transfer (DART 5) for modeling airborne and satellite spectroradiometer and LIDAR acquisitions of natural and urban landscapes. Remote Sens. 2015, 7, 1667–1701. [Google Scholar] [CrossRef]
  16. Gastellu-Etchegorry, J.P.; Lauret, N.; Yin, T.; Landier, L.; Kallel, A.; Malenovský, Z.; Al Bitar, A.; Aval, J.; Benhmida, S.; Qi, J.; et al. DART: Recent advances in remote sensing data modeling with atmosphere, polarization, and chlorophyll fluorescence. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 2640–2649. [Google Scholar] [CrossRef]
  17. Vermote, E.F.; Tanré, D.; Deuze, J.L.; Herman, M.; Morcette, J.J. Second simulation of the satellite signal in the solar spectrum, 6S: An overview. IEEE Trans. Geosci. Remote Sens. 1997, 35, 675–686. [Google Scholar] [CrossRef]
  18. Berk, A.; Bernstein, L.; Anderson, G.; Acharya, P.; Robertson, D.; Chetwynd, J.; Adler-Golden, S. MODTRAN cloud and multiple scattering upgrades with application to AVIRIS. Remote Sens. Environ. 1998, 65, 367–375. [Google Scholar] [CrossRef]
  19. Mayer, B.; Kylling, A. The libRadtran software package for radiative transfer calculations-description and examples of use. Atmos. Chem. Phys. 2005, 5, 1855–1877. [Google Scholar] [CrossRef]
  20. Vicent, J.; Verrelst, J.; Sabater, N.; Alonso, L.; Rivera-Caicedo, J.P.; Martino, L.; Muñoz Marí, J.; Moreno, J. Comparative analysis of atmospheric radiative transfer models using the Atmospheric Look-up table Generator (ALG) toolbox (version 2.0). Geosci. Model Dev. 2020, 13, 1945–1957. [Google Scholar] [CrossRef] [PubMed]
  21. Verrelst, J.; Malenovský, Z.; Van der Tol, C.; Camps-Valls, G.; Gastellu-Etchegorry, J.P.; Lewis, P.; North, P.; Moreno, J. Quantifying vegetation biophysical variables from imaging spectroscopy data: A review on retrieval methods. Surv. Geophys. 2019, 40, 589–629. [Google Scholar] [CrossRef] [PubMed]
  22. Kennedy, M.C.; O’Hagan, A. Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B (Statistical Methodol.) 2001, 63, 425–464. [Google Scholar] [CrossRef]
  23. Forrester, A.; Sobester, A.; Keane, A. Engineering Design via Surrogate Modelling: A Practical Guide; John Wiley & Sons: Hoboken, NJ, USA, 2008. [Google Scholar]
  24. Castelletti, A.; Galelli, S.; Ratto, M.; Soncini-Sessa, R.; Young, P. A general framework for dynamic emulation modelling in environmental problems. Environ. Model. Softw. 2012, 34, 5–18. [Google Scholar] [CrossRef]
  25. Castelletti, A.; Galelli, S.; Restelli, M.; Soncini-Sessa, R. Data-driven dynamic emulation modelling for the optimal management of environmental systems. Environ. Model. Softw. 2012, 34, 30–43. [Google Scholar] [CrossRef]
  26. Rivera, J.P.; Verrelst, J.; Gómez-Dans, J.; Muñoz-Marí, J.; Moreno, J.; Camps-Valls, G. An Emulator Toolbox to Approximate Radiative Transfer Models with Statistical Learning. Remote Sens. 2015, 7, 9347–9370. [Google Scholar] [CrossRef]
  27. Gómez-Dans, J.L.; Lewis, P.E.; Disney, M. Efficient Emulation of Radiative Transfer Codes Using Gaussian Processes and Application to Land Surface Parameter Inferences. Remote Sens. 2016, 8, 119. [Google Scholar] [CrossRef]
  28. Zhao, X.; Qi, J.; Yu, Z.; Yuan, L.; Huang, H. Fine-scale quantification of absorbed photosynthetically active radiation (APAR) in plantation forests with 3D radiative transfer modeling and LiDAR data. Plant Phenomics 2024, 6, 0166. [Google Scholar] [CrossRef] [PubMed]
  29. Qi, J.; Xie, D.; Jiang, J.; Huang, H. 3D radiative transfer modeling of structurally complex forest canopies through a lightweight boundary-based description of leaf clusters. Remote Sens. Environ. 2022, 283, 113301. [Google Scholar] [CrossRef]
  30. Qi, J.; Jiang, J.; Zhou, K.; Xie, D.; Huang, H. Fast and accurate simulation of canopy reflectance under wavelength-dependent optical properties using a semi-empirical 3D radiative transfer model. J. Remote Sens. 2023, 3, 0017. [Google Scholar] [CrossRef]
  31. Yang, P.; Van der Tol, C.; Campbell, P.K.E.; Middleton, E.M. Unraveling the physical and physiological basis for the solar-induced chlorophyll fluorescence and photosynthesis relationship using continuous leaf and canopy measurements of a corn crop. Biogeosciences 2021, 18, 441–465. [Google Scholar] [CrossRef]
  32. Pacheco-Labrador, J.; El-Madany, T.S.; van der Tol, C.; Martin, M.P.; Gonzalez-Cascon, R.; Perez-Priego, O.; Guan, J.; Moreno, G.; Carrara, A.; Reichstein, M.; et al. senSCOPE: Modeling mixed canopies combining green and brown senesced leaves. Evaluation in a Mediterranean Grassland. Remote Sens. Environ. 2021, 257, 112352. [Google Scholar] [CrossRef]
  33. North, P. Three-dimensional forest light interaction model using a Monte Carlo method. IEEE Trans. Geosci. Remote Sens. 1996, 34, 946–956. [Google Scholar] [CrossRef]
  34. Hernández-Clemente, R.; North, P.; Hornero, A.; Zarco-Tejada, P. Assessing the effects of forest health on sun-induced chlorophyll fluorescence using the FluorFLIGHT 3-D radiative transfer model to account for forest structure. Remote Sens. Environ. 2017, 193, 165–179. [Google Scholar] [CrossRef]
  35. Yang, X.; Wang, Y.; Yin, T.; Wang, C.; Lauret, N.; Regaieg, O.; Xi, X.; Gastellu-Etchegorry, J.P. Comprehensive LiDAR simulation with efficient physically-based DART-Lux model (I): Theory, novelty, and consistency validation. Remote Sens. Environ. 2022, 272, 112952. [Google Scholar] [CrossRef]
  36. Zhou, K.; Xie, D.; Qi, J.; Zhang, Z.; Bo, X.; Yan, G.; Mu, X. Explicitly reconstructing RAMI-V scenes for accurate 3-dimensional radiative transfer simulation using the LESS model. J. Remote Sens. 2023, 3, 0033. [Google Scholar] [CrossRef]
  37. Kotchenova, S.Y.; Vermote, E.F. Validation of a vector version of the 6S radiative transfer code for atmospheric correction of satellite data. Part II. Homogeneous Lambertian and anisotropic surfaces. Appl. Opt. 2007, 46, 4455–4464. [Google Scholar] [CrossRef] [PubMed]
  38. Berk, A.; Anderson, G.; Acharya, P.; Bernstein, L.; Muratov, L.; Lee, J.; Fox, M.; Adler-Golden, S.; Chetwynd, J.; Hoke, M.; et al. MODTRAN™ 5: 2006 Update; SPIE: Bellingham, WA, USA, 2006; Volume 6233. [Google Scholar] [CrossRef]
  39. Emde, C.; Buras-Schnell, R.; Kylling, A.; Mayer, B.; Gasteiger, J.; Hamann, U.; Kylling, J.; Richter, B.; Pause, C.; Dowling, T.; et al. The libRadtran software package for radiative transfer calculations (version 2.0.1). Geosci. Model Dev. 2016, 9, 1647–1672. [Google Scholar] [CrossRef]
  40. Lin, Y.; O’Malley, D.; Vesselinov, V.V. A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses. Water Resour. Res. 2016, 52, 6948–6977. [Google Scholar] [CrossRef]
  41. Kennedy, B.E.; King, D.J.; Duffe, J. Comparison of empirical and physical modelling for estimation of biochemical and biophysical vegetation properties: Field scale analysis across an Arctic bioclimatic gradient. Remote Sens. 2020, 12, 3073. [Google Scholar] [CrossRef]
  42. Vicent, J.; Verrelst, J.; Rivera-Caicedo, J.P.; Sabater, N.; Muñoz-Marí, J.; Camps-Valls, G.; Moreno, J. Emulation as an Accurate Alternative to Interpolation in Sampling Radiative Transfer Codes. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 4918–4931. [Google Scholar] [CrossRef]
  43. Shapiro, A. Monte Carlo sampling methods. Handbooks Oper. Res. Manag. Sci. 2003, 10, 353–425. [Google Scholar]
  44. Petropoulos, G.; Wooster, M.; Carlson, T.; Kennedy, M.; Scholze, M. A global Bayesian sensitivity analysis of the 1D SimSphere soil vegetation atmospheric transfer (SVAT) model using Gaussian model emulation. Ecol. Model. 2009, 220, 2427–2440. [Google Scholar] [CrossRef]
  45. Rohmer, J.; Foerster, E. Global sensitivity analysis of large-scale numerical landslide models based on Gaussian-Process meta-modeling. Comput. Geosci. 2011, 37, 917–927. [Google Scholar] [CrossRef]
  46. Carnevale, C.; Finzi, G.; Guariso, G.; Pisoni, E.; Volta, M. Surrogate models to compute optimal air quality planning policies at a regional scale. Environ. Model. Softw. 2012, 34, 44–50. [Google Scholar] [CrossRef]
  47. Razavi, S.; Tolson, B.A.; Burn, D.H. Numerical assessment of metamodelling strategies in computationally intensive optimization. Environ. Model. Softw. 2012, 34, 67–86. [Google Scholar] [CrossRef]
  48. Villa-Vialaneix, N.; Follador, M.; Ratto, M.; Leip, A. A comparison of eight metamodeling techniques for the simulation of N2O fluxes and N leaching from corn crops. Environ. Model. Softw. 2012, 34, 51–66. [Google Scholar] [CrossRef]
  49. Lee, L.; Pringle, K.; Reddington, C.; Mann, G.; Stier, P.; Spracklen, D.; Pierce, J.; Carslaw, K. The magnitude and causes of uncertainty in global model simulations of cloud condensation nuclei. Atmos. Chem. Phys. 2013, 13, 8879–8914. [Google Scholar] [CrossRef]
  50. Bounceur, N.; Crucifix, M.; Wilkinson, R. Global sensitivity analysis of the climate–vegetation system to astronomical forcing: An emulator-based approach. Earth Syst. Dyn. Discuss. 2014, 5, 901–943. [Google Scholar] [CrossRef]
  51. Ireland, G.; Petropoulos, G.; Carlson, T.; Purdy, S. Addressing the ability of a land biosphere model to predict key biophysical vegetation characterisation parameters with Global Sensitivity Analysis. Environ. Model. Softw. 2015, 65, 94–107. [Google Scholar] [CrossRef]
  52. Oakley, J.; O’Hagan, A. Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika 2002, 89, 769–784. [Google Scholar] [CrossRef]
  53. Bocquet, M. Surrogate modeling for the climate sciences dynamics with machine learning and data assimilation. Front. Appl. Math. Stat. 2023, 9, 1133226. [Google Scholar] [CrossRef]
  54. Bazargan, H.; Christie, M.; Elsheikh, A.H.; Ahmadi, M. Surrogate accelerated sampling of reservoir models with complex structures using sparse polynomial chaos expansion. Adv. Water Resour. 2015, 86, 385–399. [Google Scholar] [CrossRef]
  55. Kim, Y.J. Comparative study of surrogate models for uncertainty quantification of building energy model: Gaussian Process Emulator vs. Polynomial Chaos Expansion. Energy Build. 2016, 133, 46–58. [Google Scholar] [CrossRef]
  56. Laloy, E.; Jacques, D. Emulation of CPU-demanding reactive transport models: A comparison of Gaussian processes, polynomial chaos expansion, and deep neural networks. Comput. Geosci. 2019, 23, 1193–1215. [Google Scholar] [CrossRef]
  57. Massoud, E.C. Emulation of environmental models using polynomial chaos expansion. Environ. Model. Softw. 2019, 111, 421–431. [Google Scholar] [CrossRef]
  58. Rajabi, M.M. Review and comparison of two meta-model-based uncertainty propagation analysis methods in groundwater applications: Polynomial chaos expansion and Gaussian process emulation. Stoch. Environ. Res. Risk Assess. 2019, 33, 607–631. [Google Scholar] [CrossRef]
  59. Haykin, S. Neural Networks–A Comprehensive Foundation, 2nd ed.; Prentice Hall: Hoboken, NJ, USA, 1999. [Google Scholar]
  60. Verrelst, J.; Sabater, N.; Rivera, J.; Muñoz-Marí, J.; Vicent, J.; Camps-Valls, G.; Moreno, J. Emulation of Leaf, Canopy and Atmosphere Radiative Transfer Models for Fast Global Sensitivity Analysis. Remote Sens. 2016, 8, 673. [Google Scholar] [CrossRef]
  61. Verrelst, J.; Rivera Caicedo, J.P.; Muñoz-Marí, J.; Camps-Valls, G.; Moreno, J. SCOPE-Based Emulators for Fast Generation of Synthetic Canopy Reflectance and Sun-Induced Fluorescence Spectra. Remote Sens. 2017, 9, 927. [Google Scholar] [CrossRef]
  62. Bue, B.D.; Thompson, D.R.; Deshpande, S.; Eastwood, M.; Green, R.O.; Natraj, V.; Mullen, T.; Parente, M. Neural network radiative transfer for imaging spectroscopy. Atmos. Meas. Tech. 2019, 12, 2567–2578. [Google Scholar] [CrossRef]
  63. Shi, H.; Xiao, Z.; Tian, X. Exploration of Machine Learning Techniques in Emulating a Coupled Soil-Canopy-Atmosphere Radiative Transfer Model for Multi-Parameter Estimation from Satellite Observations. IEEE Trans. Geosci. Remote Sens. 2019, 57, 8522–8533. [Google Scholar] [CrossRef]
  64. Brodrick, P.G.; Thompson, D.R.; Fahlen, J.E.; Eastwood, M.L.; Sarture, C.M.; Lundeen, S.R.; Olson-Duvall, W.; Carmon, N.; Green, R.O. Generalized radiative transfer emulation for imaging spectroscopy reflectance retrievals. Remote Sens. Environ. 2021, 261, 112476. [Google Scholar] [CrossRef]
  65. Lecun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  66. Basener, A.A.; Basener, B.B. Deep learning of radiative atmospheric transfer with an autoencoder. In Proceedings of the 2022 12th Workshop on Hyperspectral Imaging and Signal Processing: Evolution in Remote Sensing (WHISPERS), Rome, Italy, 13–16 September 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–7. [Google Scholar]
  67. Ukkonen, P. Exploring Pathways to More Accurate Machine Learning Emulation of Atmospheric Radiative Transfer. J. Adv. Model. Earth Syst. 2022, 14, e2021MS002875. [Google Scholar] [CrossRef]
  68. Ojaghi, S.; Bouroubi, Y.; Foucher, S.; Bergeron, M.; Seynat, C. Deep Learning-Based Emulation of Radiative Transfer Models for Top-of-Atmosphere BRDF Modelling Using Sentinel-3 OLCI. Remote Sens. 2023, 15, 835. [Google Scholar] [CrossRef]
  69. Aghdami-Nia, M.; Shah-Hosseini, R.; Homayouni, S.; Rostami, A.; Ahmadian, N. Surrogate Modeling of MODTRAN Physical Radiative Transfer Code Using Deep-Learning Regression. Environ. Sci. Proc. 2023, 29, 16. [Google Scholar]
  70. Jasso-Garduño, A.E.; Muñoz-Máximo, I.; Pinto, D.; Ramírez-Cortés, J.M. Deep Learning Based Emulation of Radiative Transfer Code for Atmospheric Correction of Satellite Images. Comput. Sist. 2024, 28, 2327–2341. [Google Scholar] [CrossRef]
  71. Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; The MIT Press: Cambridge, MA, USA, 2005. [Google Scholar] [CrossRef]
  72. Ma, P.; Mondal, A.; Konomi, B.A.; Hobbs, J.; Song, J.J.; Kang, E.L. Computer Model Emulation with High-Dimensional Functional Output in Large-Scale Observing System Uncertainty Experiments. Technometrics 2022, 64, 65–79. [Google Scholar] [CrossRef]
  73. Vicent Servera, J.; Martino, L.; Verrelst, J.; Camps-Valls, G. Multifidelity Gaussian Process Emulation for Atmospheric Radiative Transfer Models. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–10. [Google Scholar] [CrossRef]
  74. Vicent Servera, J.; Martino, L.; Verrelst, J.; Rivera-Caicedo, J.P.; Camps-Valls, G. Multioutput Feature Selection for Emulation and Sensitivity Analysis. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–11. [Google Scholar] [CrossRef]
  75. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  76. Morata, M.; Siegmann, B.; Morcillo-Pallarés, P.; Rivera-Caicedo, J.; Verrelst, J. Emulation of Sun-Induced Fluorescence from Radiance Data Recorded by the HyPlant Airborne Imaging Spectrometer. Remote Sens. 2021, 13, 4368. [Google Scholar] [CrossRef] [PubMed]
  77. Suykens, J.; Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 1999, 9, 293–300. [Google Scholar] [CrossRef]
  78. Verrelst, J.; Rivera-Caicedo, J.; Moreno, J. Progress in Emulation for Radiative Transfer Modeling and Mapping. In Proceedings of the IGARSS 2018–2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1688–1691. [Google Scholar]
  79. Vapnik, V.; Golowich, S.; Smola, A. Support vector method for function approximation, regression estimation, and signal processing. Adv. Neural Inf. Process. Syst. 1997, 9, 281–287. [Google Scholar]
  80. Xiu, D.; Karniadakis, G.E. The Wiener–Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 2002, 24, 619–644. [Google Scholar] [CrossRef]
  81. Kuusk, A. A two-layer canopy reflectance model. J. Quant. Spectrosc. Radiat. Transf. 2001, 71, 1–9. [Google Scholar] [CrossRef]
  82. Makhloufi, A.; Kallel, A. Inversion of a new designed ANN-based 3-D-RTM emulator by continuous MCMC technique to monitor crop biophysical properties using sentinel-2 images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–14. [Google Scholar] [CrossRef]
  83. Veerman, M.A.; Pincus, R.; Stoffer, R.; Van Leeuwen, C.M.; Podareanu, D.; Van Heerwaarden, C.C. Predicting atmospheric optical properties for radiative transfer computations using neural networks. Philos. Trans. R. Soc. A 2021, 379, 20200095. [Google Scholar] [CrossRef] [PubMed]
  84. Belochitski, A.; Krasnopolsky, V. Stable emulation of an entire suite of model physics in a state-of-the-art GCM using a neural network. arXiv 2021, arXiv:2103.07028. [Google Scholar]
  85. Zhong, X.; Ma, Z.; Yao, Y.; Xu, L.; Wu, Y.; Wang, Z. WRF–ML v1.0: A bridge between WRF v4.3 and machine learning parameterizations and its application to atmospheric radiative transfer. Geosci. Model Dev. 2023, 16, 199–209. [Google Scholar] [CrossRef]
  86. Gonzalez, J.; Dipu, S.; Sourdeval, O.; Simeon, A.; Camps-Valls, G.; Quaas, J. Emulation of Forward Modeled Top-of-Atmosphere MODIS-Based Spectral Channels Using Machine Learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 1896–1911. [Google Scholar] [CrossRef]
  87. Lamminpää, O.; Susiluoto, J.; Hobbs, J.; McDuffie, J.; Braverman, A.; Owhadi, H. Forward model emulator for atmospheric radiative transfer using Gaussian processes and cross validation. Atmos. Meas. Tech. 2025, 18, 673–694. [Google Scholar] [CrossRef]
  88. Zucker, S.; Batenkov, D.; Rozenhaimer, M.S. Physics-informed neural networks for modeling atmospheric radiative transfer. J. Quant. Spectrosc. Radiat. Transf. 2025, 331, 109253. [Google Scholar] [CrossRef]
  89. Geiss, A.; Ma, P.L.; Singh, B.; Hardin, J.C. Emulating aerosol optics with randomly generated neural networks. Geosci. Model Dev. 2023, 16, 2355–2370. [Google Scholar] [CrossRef]
  90. Howard, L.; Subramanian, A.C.; Thompson, G.; Johnson, B.; Auligne, T. Probabilistic Emulation of the Community Radiative Transfer Model Using Machine Learning. arXiv 2025, arXiv:2504.16192. [Google Scholar] [CrossRef]
  91. Sgattoni, C.; Sgheri, L.; Chung, M. A data-driven approach for fast atmospheric radiative transfer inversion. Inverse Probl. 2025, 41, 085006. [Google Scholar] [CrossRef]
  92. Lu, D.; Ricciuto, D. Efficient surrogate modeling methods for large-scale Earth system models based on machine-learning techniques. Geosci. Model Dev. 2019, 12, 1791–1807. [Google Scholar] [CrossRef]
  93. Duffy, K.; Vandal, T.; Wang, W.; Nemani, R.; Ganguly, A.R. Deep Learning Emulation of Multi-Angle Implementation of Atmospheric Correction (MAIAC). arXiv 2019, arXiv:1910.13408. [Google Scholar]
  94. Duffy, K.; Vandal, T.; Wang, W.; Nemani, R.; Ganguly, A. A Framework for Deep Learning Emulation of Numerical Models With a Case Study in Satellite Remote Sensing. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 3345–3356. [Google Scholar] [CrossRef] [PubMed]
  95. Baker, E.; Harper, A.B.; Williamson, D.; Challenor, P. Emulation of high-resolution land surface models using sparse Gaussian processes with application to JULES. Geosci. Model Dev. 2022, 15, 1913–1929. [Google Scholar] [CrossRef]
  96. Xu, D.; Bisht, G.; Sargsyan, K.; Liao, C.; Leung, L.R. Using a surrogate-assisted Bayesian framework to calibrate the runoff-generation scheme in the Energy Exascale Earth System Model (E3SM) v1. Geosci. Model Dev. 2022, 15, 5021–5043. [Google Scholar] [CrossRef]
  97. Watson-Parris, D.; Williams, A.; Deaconu, L.; Stier, P. Model calibration using ESEm v1.1.0 – an open, scalable Earth system emulator. Geosci. Model Dev. 2021, 14, 7659–7672. [Google Scholar] [CrossRef]
  98. Zhu, Q.; Li, F.; Riley, W.J.; Xu, L.; Zhao, L.; Yuan, K.; Wu, H.; Gong, J.; Randerson, J. Building a machine learning surrogate model for wildfire activities within a global Earth system model. Geosci. Model Dev. 2022, 15, 1899–1911. [Google Scholar] [CrossRef]
  99. Li, L.; Fang, Y.; Zheng, Z.; Shi, M.; Longo, M.; Koven, C.D.; Holm, J.A.; Fisher, R.A.; McDowell, N.G.; Chambers, J.; et al. A machine learning approach targeting parameter estimation for plant functional type coexistence modeling using ELM-FATES (v2.0). Geosci. Model Dev. 2023, 16, 4017–4040. [Google Scholar] [CrossRef]
  100. Beusch, L.; Gudmundsson, L.; Seneviratne, S.I. Emulating Earth system model temperatures with MESMER: From global mean temperature trajectories to grid-point-level realizations on land. Earth Syst. Dyn. 2020, 11, 139–159. [Google Scholar] [CrossRef]
  101. Beusch, L.; Gudmundsson, L.; Seneviratne, S.I. Crossbreeding CMIP6 earth system models with an emulator for regionally optimized land temperature projections. Geophys. Res. Lett. 2020, 47, e2019GL086812. [Google Scholar] [CrossRef]
  102. Bouabid, S.; Sejdinovic, D.; Watson-Parris, D. FaIRGP: A Bayesian energy balance model for surface temperatures emulation. J. Adv. Model. Earth Syst. 2024, 16, e2023MS003926. [Google Scholar] [CrossRef]
  103. Potter, K.; Martinez, C.; Pradhan, R.; Brozak, S.; Sleder, S.; Wheeler, L. Graph Convolutional Neural Networks as Surrogate Models for Climate Simulation. arXiv 2024, arXiv:2409.12815. [Google Scholar]
  104. McKay, M.; Beckman, R.; Conover, W. A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output From a Computer Code. Technometrics 1979, 21, 239–245. [Google Scholar] [CrossRef]
  105. Sobol’, I. On the distribution of points in a cube and the approximate evaluation of integrals. USSR Comput. Math. Math. Phys. 1967, 7, 86–112. [Google Scholar] [CrossRef]
  106. Halton, J.H. On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals. Numer. Math. 1960, 2, 84–90. [Google Scholar] [CrossRef]
  107. Berger, K.; Hank, T.; Halabuk, A.; Rivera-Caicedo, J.P.; Wocher, M.; Mojses, M.; Gerhátová, K.; Tagliabue, G.; Dolz, M.M.; Venteo, A.B.P.; et al. Assessing Non-Photosynthetic Cropland Biomass from Spaceborne Hyperspectral Imagery. Remote Sens. 2021, 13, 4711. [Google Scholar] [CrossRef]
  108. Wang, J.L.; Chiou, J.M.; Müller, H.G. Functional data analysis. Annu. Rev. Stat. Appl. 2016, 3, 257–295. [Google Scholar] [CrossRef]
  109. Svendsen, D.H.; Martino, L.; Camps-Valls, G. Active emulation of computer codes with Gaussian processes–Application to remote sensing. Pattern Recognit. 2020, 100, 107103. [Google Scholar] [CrossRef]
  110. Hughes, G. On the mean accuracy of statistical pattern recognizers. IEEE Trans. Inf. Theory 1968, 14, 55–63. [Google Scholar] [CrossRef]
  111. Jolliffe, I.T. Principal Component Analysis, 2nd ed.; Springer Series in Statistics; Springer: Berlin/Heidelberg, Germany, 2002. [Google Scholar] [CrossRef]
  112. Golub, G.H.; Van Loan, C.F. Matrix Computations; JHU Press: Baltimore, MD, USA, 2013. [Google Scholar]
  113. Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef]
  114. Shlens, J. A tutorial on principal component analysis. arXiv 2014, arXiv:1404.1100. [Google Scholar] [CrossRef]
  115. Ramsay, J.O.; Silverman, B.W. Functional Data Analysis; Springer: Berlin/Heidelberg, Germany, 2005. [Google Scholar]
  116. Maaten, L.v.d.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
  117. Tenenbaum, J.B.; Silva, V.d.; Langford, J.C. A global geometric framework for nonlinear dimensionality reduction. Science 2000, 290, 2319–2323. [Google Scholar] [CrossRef] [PubMed]
  118. McInnes, L.; Healy, J.; Melville, J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv 2018, arXiv:1802.03426. [Google Scholar]
  119. Schölkopf, B.; Smola, A.; Müller, K.R. Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput. 1998, 10, 1299–1319. [Google Scholar] [CrossRef]
  120. Morata, M.; Siegmann, B.; García-Soria, J.L.; Rivera-Caicedo, J.P.; Verrelst, J. On the potential of principal component analysis for the reconstruction of full-spectrum SIF emission and emulated airborne-to-satellite upscaling. Remote Sens. Environ. 2025, 328, 114865. [Google Scholar] [CrossRef]
  121. Datta, A.; Banerjee, S.; Finley, A.O.; Gelfand, A.E. Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. J. Am. Stat. Assoc. 2016, 111, 800–812. [Google Scholar] [CrossRef]
  122. Quinonero-Candela, J.; Rasmussen, C.E. A unifying view of sparse approximate Gaussian process regression. J. Mach. Learn. Res. 2005, 6, 1939–1959. [Google Scholar]
  123. Hensman, J.; Fusi, N.; Lawrence, N.D. Gaussian processes for big data. In Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI), Bellevue, WA, USA, 11–15 July 2013; pp. 282–290. [Google Scholar]
  124. Damianou, A.; Lawrence, N.D. Deep Gaussian processes. In Proceedings of the Artificial Intelligence and Statistics, Scottsdale, AZ, USA, 29 April–1 May 2013; PMLR: Cambridge, MA, USA; pp. 207–215. [Google Scholar]
  125. Chipman, H.A.; George, E.I.; McCulloch, R.E. BART: Bayesian additive regression trees. Ann. Appl. Stat. 2010, 4, 266–298. [Google Scholar] [CrossRef]
  126. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar] [CrossRef]
  127. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 2017, 30, 52. [Google Scholar]
  128. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 2018, 31. [Google Scholar]
  129. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  130. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  131. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707. [Google Scholar] [CrossRef]
  132. Blundell, C.; Cornebise, J.; Kavukcuoglu, K.; Wierstra, D. Weight uncertainty in neural networks. In Proceedings of the 32nd International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015; PMLR: Cambridge, MA, USA, 2015; pp. 1613–1622. [Google Scholar]
  133. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 8–13 December 2014; Volume 27, pp. 2672–2680. [Google Scholar]
  134. Sacks, J.; Welch, W.J.; Mitchell, T.J.; Wynn, H.P. Design and Analysis of Computer Experiments. Stat. Sci. 1989, 4, 409–435. [Google Scholar] [CrossRef]
  135. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197. [Google Scholar] [CrossRef]
  136. Wilson, A.; Nickisch, H. Kernel interpolation for scalable structured Gaussian processes (KISS-GP). In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; PMLR: Cambridge, MA, USA, 2015; pp. 1775–1784. [Google Scholar]
  137. Gardner, J.R.; Pleiss, G.; Weinberger, K.Q.; Bindel, D.; Wilson, A.G. GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration. In Proceedings of the Advances in Neural Information Processing Systems 31 (NeurIPS), Montreal, QC, Canada, 2–8 December 2018. [Google Scholar]
  138. Saltelli, A.; Ratto, M.; Andres, T.; Campolongo, F.; Cariboni, J.; Gatelli, D.; Saisana, M.; Tarantola, S. Global Sensitivity Analysis: The Primer; John Wiley & Sons: Hoboken, NJ, USA, 2008. [Google Scholar]
  139. Verrelst, J.; Rivera Caicedo, J.; Vicent, J.; Morcillo Pallarés, P.; Moreno, J. Approximating Empirical Surface Reflectance Data through Emulation: Opportunities for Synthetic Scene Generation. Remote Sens. 2019, 11, 157. [Google Scholar] [CrossRef]
  140. Zhou, Q.; Wang, S.; Liu, N.; Townsend, P.; Jiang, C.; Peng, B.; Verhoef, W.; Guan, K. Towards operational atmospheric correction of airborne hyperspectral imaging spectroscopy: Algorithm evaluation, key parameter analysis, and machine learning emulators. ISPRS J. Photogramm. Remote Sens. 2023, 196, 386–401. [Google Scholar] [CrossRef]
  141. Vicent, J.; Sabater, N.; Tenjo, C.; Acarreta, J.R.; Manzano, M.; Rivera, J.P. FLEX End-to-End Mission Performance Simulator. IEEE Trans. Geosci. Remote Sens. 2016, 54, 4215–4223. [Google Scholar] [CrossRef]
  142. Tenjo, C.; Rivera-Caicedo, J.P.; Sabater, N.; Vicent Servera, J.; Alonso, L.; Verrelst, J.; Moreno, J. Design of a Generic 3-D Scene Generator for Passive Optical Missions and Its Implementation for the ESA’s FLEX/Sentinel-3 Tandem Mission. IEEE Trans. Geosci. Remote Sens. 2018, 56, 1290–1307. [Google Scholar] [CrossRef]
  143. Coppo, P.; Taiti, A.; Pettinato, L.; Francois, M.; Taccola, M.; Drusch, M. Fluorescence imaging spectrometer (FLORIS) for ESA FLEX mission. Remote Sens. 2017, 9, 649. [Google Scholar] [CrossRef]
  144. Celesti, M.; Rast, M.; Adams, J.; Boccia, V.; Gascon, F.; Isola, C.; Nieke, J. The Copernicus Hyperspectral Imaging Mission for the Environment (CHIME): Status and Planning. In Proceedings of the IGARSS 2022–2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 5011–5014. [Google Scholar] [CrossRef]
  145. Pato, M.; Buffat, J.; Alonso, K.; Auer, S.; Carmona, E.; Maier, S.; Müller, R.; Rademske, P.; Rascher, U.; Scharr, H. Physics-based Machine Learning Emulator of At-sensor Radiances for Solar-induced Fluorescence Retrieval in the O2-A Absorption Band. In IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing; IEEE: Piscataway, NJ, USA, 2024; pp. 18566–18576. [Google Scholar] [CrossRef]
  146. Morata, M.; Siegmann, B.; Pérez-Suay, A.; García-Soria, J.L.; Rivera-Caicedo, J.P.; Verrelst, J. Neural Network Emulation of Synthetic Hyperspectral Sentinel-2-Like Imagery With Uncertainty. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 762–772. [Google Scholar] [CrossRef]
  147. Barrou Dumont, Z.; Gascoin, S.; Inglada, J. Snow and Cloud Classification in Historical SPOT Images: An Image Emulation Approach for Training a Deep Learning Model Without Reference Data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 5541–5552. [Google Scholar] [CrossRef]
  148. Jacquemoud, S.; Baret, F.; Andrieu, B.; Danson, F.; Jaggard, K. Extraction of vegetation biophysical parameters by inversion of the PROSPECT + SAIL models on sugar beet canopy reflectance data. Application to TM and AVIRIS sensors. Remote Sens. Environ. 1995, 52, 163–172. [Google Scholar] [CrossRef]
  149. Karniadakis, G.E.; Kevrekidis, I.G.; Lu, L.; Perdikaris, P.; Wang, S.; Yang, L. Physics-informed machine learning. Nat. Rev. Phys. 2021, 3, 422–440. [Google Scholar] [CrossRef]
  150. Pan, S.J.; Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359. [Google Scholar] [CrossRef]
  151. Fernández, A.; García, S.; Galar, M.; Prati, R.C.; Krawczyk, B.; Herrera, F. Learning from Imbalanced Data Sets; Springer: Berlin/Heidelberg, Germany, 2018; Volume 10. [Google Scholar]
  152. Elshamli, A.; Taylor, G.W.; Areibi, S. Multisource domain adaptation for remote sensing using deep neural networks. IEEE Trans. Geosci. Remote Sens. 2019, 58, 3328–3340. [Google Scholar] [CrossRef]
  153. Willard, J.; Jia, X.; Xu, S.; Steinbach, M.; Kumar, V. Integrating physics-based modeling with machine learning: A survey. arXiv 2020, arXiv:2003.04919. [Google Scholar]
  154. Higdon, D.; Gattiker, J.; Williams, B.; Rightley, M. Computer model calibration using high-dimensional output. J. Am. Stat. Assoc. 2008, 103, 570–583. [Google Scholar] [CrossRef]
  155. Murphy, K.P. Probabilistic Machine Learning: Advanced Topics; MIT Press: Cambridge, MA, USA, 2023. [Google Scholar]
  156. García-Soria, J.L.; Morata, M.; Berger, K.; Pascual-Venteo, A.B.; Rivera-Caicedo, J.P.; Verrelst, J. Evaluating epistemic uncertainty estimation strategies in vegetation trait retrieval using hybrid models and imaging spectroscopy data. Remote Sens. Environ. 2024, 310, 114228. [Google Scholar] [CrossRef]
  157. Servera, J.; Rivera-Caicedo, J.; Verrelst, J.; Munoz-Mari, J.; Sabater, N.; Berthelot, B.; Camps-Valls, G.; Moreno, J. Systematic Assessment of MODTRAN Emulators for Atmospheric Correction. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–17. [Google Scholar] [CrossRef]
  158. Bouhlel, M.A.; Hwang, J.T.; Bartoli, N.; Lafage, R.; Morlier, J.; Martins, J.R.R.A. A Python surrogate modeling framework with derivatives. Adv. Eng. Softw. 2019, 135, 102662. [Google Scholar] [CrossRef]
  159. Saves, P.; Lafage, R.; Bartoli, N.; Diouane, Y.; Bussemaker, J.; Lefebvre, T.; Hwang, J.T.; Morlier, J.; Martins, J.R.R.A. SMT 2.0: A Surrogate Modeling Toolbox with a focus on Hierarchical and Mixed Variables Gaussian Processes. Adv. Eng. Softw. 2024, 188, 103571. [Google Scholar] [CrossRef]
  160. Pang, G.; Lu, L.; Karniadakis, G.E. fPINNs: Fractional physics-informed neural networks. SIAM J. Sci. Comput. 2019, 41, A2603–A2626. [Google Scholar] [CrossRef]
  161. Cuomo, S.; Di Cola, V.S.; Giampaolo, F.; Rozza, G.; Raissi, M.; Piccialli, F. Scientific machine learning through physics–informed neural networks: Where we are and what’s next. J. Sci. Comput. 2022, 92, 88. [Google Scholar] [CrossRef]
  162. Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; García, S.; Gil-López, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef]
  163. Dwivedi, R.; Dave, D.; Naik, H.; Singhal, S.; Omer, R.; Patel, P.; Qian, B.; Wen, Z.; Shah, T.; Morgan, G.; et al. Explainable AI (XAI): Core ideas, techniques, and solutions. ACM Comput. Surv. 2023, 55, 1–33. [Google Scholar] [CrossRef]
  164. Mobley, C.D. Light and Water: Radiative Transfer in Natural Waters; Academic Press: New York, NY, USA, 1994. [Google Scholar]
  165. Gege, P. WASI-2D: A software tool for regionally optimized analysis of imaging spectrometer data from deep and shallow waters. Comput. Geosci. 2014, 62, 208–215. [Google Scholar] [CrossRef]
  166. Hadjal, M.; Paterson, R.; McKee, D. Neural networks to retrieve in-water constituents applied to radiative transfer models simulating coastal water conditions. Front. Remote Sens. 2023, 4, 973944. [Google Scholar] [CrossRef]
  167. Mobley, C.D. Fast light calculations for ocean ecosystem and inverse models. Opt. Express 2011, 19, 18927–18944. [Google Scholar] [CrossRef]
  168. Jacquemoud, S.; Baret, F.; Hanocq, J.F. Modeling spectral and bidirectional soil reflectance. Remote Sens. Environ. 1992, 41, 123–132. [Google Scholar] [CrossRef]
  169. Liang, S.; Townshend, J.R.G. A modified Hapke model for soil bidirectional reflectance. Remote Sens. Environ. 1996, 55, 1–10. [Google Scholar] [CrossRef]
  170. Bablet, A.; Vu, P.V.H.; Jacquemoud, S.; Viallefont-Robinet, F.; Fabre, S.; Briottet, X.; Sadeghi, M.; Whiting, M.L.; Baret, F.; Tian, J. MARMIT: A multilayer radiative transfer model of soil reflectance to estimate surface soil moisture content in the solar domain (400–2500 nm). Remote Sens. Environ. 2018, 217, 1–17. [Google Scholar] [CrossRef]
  171. Dupiau, A.; Jacquemoud, S.; Briottet, X.; Fabre, S.; Viallefont-Robinet, F.; Philpot, W.; Biagio, C.D.; Hébert, M.; Formenti, P. MARMIT-2: An improved version of the MARMIT model to predict soil reflectance as a function of surface water content in the solar domain. Remote Sens. Environ. 2022, 272, 112951. [Google Scholar] [CrossRef]
Figure 1. Workflow comparison between direct regression and emulation of RTM simulations with PCA compression.
Figure 2. Example of original simulations of (a) LESS TOC reflectance and (c) PROSAIL coupled with libRadtran TOA radiance, alongside 1000 randomly generated spectral outputs from their corresponding GPR emulators (b,d). Colors are for visual distinction only.
Figure 3. Total sensitivity results of TOA radiance using a GPR emulator of a 12-variable PROSAIL-MODTRAN RTM. The GPR emulator was run in ARTMO’s GSA tool with 1000 samples per variable; processing took less than 40 s. Figure is adapted from [139]; see [139] for details and interpretation.
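As a rough illustration of how total sensitivity indices of the kind shown in Figure 3 can be computed once a fast emulator is available, the sketch below applies the Jansen estimator for total-order Sobol indices. It is not the ARTMO implementation: `toy_emulator` is a hypothetical two-input stand-in for a trained emulator, the inputs are assumed independent and uniform on [0, 1], and plain pseudo-random sampling is used instead of a low-discrepancy sequence.

```python
import random


def toy_emulator(x):
    """Hypothetical stand-in for a trained emulator (additive in two inputs)."""
    return x[0] + 2.0 * x[1]


def total_sobol_indices(model, n_inputs, n_samples, seed=0):
    """Estimate total-order Sobol indices with the Jansen estimator.

    Assumes independent inputs, each uniform on [0, 1].
    """
    rng = random.Random(seed)
    # Two independent sample matrices A and B.
    a = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    y_a = [model(row) for row in a]
    mean = sum(y_a) / n_samples
    variance = sum((y - mean) ** 2 for y in y_a) / (n_samples - 1)

    indices = []
    for i in range(n_inputs):
        acc = 0.0
        for row_a, row_b, y in zip(a, b, y_a):
            # Row of A with only input i replaced by its value from B.
            mixed = list(row_a)
            mixed[i] = row_b[i]
            acc += (y - model(mixed)) ** 2
        indices.append(acc / (2.0 * n_samples * variance))
    return indices


# For this additive toy model the analytic total indices are 0.2 and 0.8.
st = total_sobol_indices(toy_emulator, n_inputs=2, n_samples=20000)
```

Each index needs one extra batch of model runs per input variable, which is why such analyses are only practical when the model call is as cheap as an emulator prediction.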
Figure 4. Illustration of A-SGM scene generation using emulators. First, a land cover map is arbitrarily created. Then, these classes are filled with the input variables as described in [61]. Next, the emulators are run to generate the spectral datacubes. For illustration, output images at a few wavelengths are shown. General statistics (mean, SD, min-max) per class are derived. Figure is adapted from [61]; see [61] for details.
Figure 5. Workflow for emulating hyperspectral data from S2 multispectral data using an MLRA in combination with PCA DR and back-projection. Figure is adapted from [146]; see [146] for details.
Figure 6. Illustration of emulator-based numerical inversion over a forested terrain. A KRR emulator was trained on DART simulations. The function ‘lsqnonlin’ of MATLAB’s Optimization Toolbox was used for the inversion. The traits FVC, LCC, and LAI were successfully retrieved, while the RMSE gives insight into retrieval quality. See [78] for details.
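The idea behind the emulator-based inversion in Figure 6 can be sketched with a small nonlinear least-squares loop. This is only an illustration under stated assumptions: `toy_emulator` is a hypothetical two-parameter function standing in for a trained KRR emulator, and the hand-written Levenberg-Marquardt routine `invert` stands in for a library solver such as ‘lsqnonlin’; neither reproduces the setup of [78].

```python
import math

# Hypothetical normalized band positions for a 10-band spectrum.
WAVELENGTHS = [j / 9.0 for j in range(10)]


def toy_emulator(params):
    """Hypothetical stand-in for a trained emulator: 2 inputs -> 10 bands."""
    amplitude, decay = params
    return [amplitude * math.exp(-decay * w) for w in WAVELENGTHS]


def residuals(params, observed):
    return [m - o for m, o in zip(toy_emulator(params), observed)]


def cost(params, observed):
    return sum(r * r for r in residuals(params, observed))


def invert(observed, start, n_iter=100, step=1e-6):
    """Two-parameter Levenberg-Marquardt with a forward-difference Jacobian."""
    p = list(start)
    damping = 1e-2
    current = cost(p, observed)
    for _ in range(n_iter):
        r = residuals(p, observed)
        # Jacobian columns: sensitivity of each residual to each parameter.
        cols = []
        for k in range(2):
            shifted = list(p)
            shifted[k] += step
            r_shift = residuals(shifted, observed)
            cols.append([(rs - ri) / step for rs, ri in zip(r_shift, r)])
        # Damped normal equations: (J^T J + damping * diag(J^T J)) delta = -J^T r.
        a11 = sum(c * c for c in cols[0])
        a22 = sum(c * c for c in cols[1])
        a12 = sum(c0 * c1 for c0, c1 in zip(cols[0], cols[1]))
        g1 = sum(c * ri for c, ri in zip(cols[0], r))
        g2 = sum(c * ri for c, ri in zip(cols[1], r))
        d11 = a11 * (1.0 + damping)
        d22 = a22 * (1.0 + damping)
        det = d11 * d22 - a12 * a12
        delta = [(-g1 * d22 + g2 * a12) / det, (-g2 * d11 + g1 * a12) / det]
        trial = [p[0] + delta[0], p[1] + delta[1]]
        trial_cost = cost(trial, observed)
        if trial_cost < current:
            # Accept the step and trust the local linearization more.
            p, current = trial, trial_cost
            damping = max(damping / 10.0, 1e-12)
        else:
            # Reject the step and fall back toward gradient descent.
            damping *= 10.0
    rmse = math.sqrt(current / len(observed))
    return p, rmse


# Synthetic "observation" generated from known parameters, then inverted.
truth = [0.6, 1.5]
retrieved, rmse = invert(toy_emulator(truth), start=[0.3, 0.5])
```

Because every iteration calls the forward model several times, and the loop is repeated for every pixel, the cost of a single model call dominates; replacing the RTM with an emulator is what makes this per-pixel optimization affordable. The returned RMSE plays the same role as the retrieval-quality map mentioned in the caption.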
Table 1. Overview of representative RTMs in vegetation and atmosphere domains. refs: references.
Model | Key Features | Outputs | Key Refs.
Canopy RTMs
PROSAIL | Leaf and canopy optics | Reflectance | [7,8,9,10]
SCOPE | Energy balance, photochemistry | Reflectance, SIF, fluxes | [11,12,31,32]
FLIGHT | 3D canopy architecture, detailed scattering | Reflectance, SIF | [33,34]
DART | 3D voxel, facet and ray tracing, heterogeneous scenes | Reflectance, radiance, LiDAR, SIF | [13,15,16,35]
LESS | 3D voxel, facet and ray tracing, heterogeneous scenes | Reflectance, SIF, LiDAR, fluxes | [14,29,36]
Atmosphere RTMs
6S | Atmospheric correction, multiple scattering | TOA radiance/reflectance, transmittance | [17,37]
MODTRAN | Spectral transmission, path radiance; correlated-k | Radiance/irradiance, transmittance, path terms | [18,38]
libRadtran | Flexible RT driver with DISORT solver, trace gases | High-res. radiance/irradiance, actinic flux | [19,39]
Table 2. Comparison of MLRAs used for emulating RTMs, ordered from the most applied to the least.
Method | Description | Pros | Cons | Example Emulation Use
Neural Networks (NNs) [59] | Flexible models that learn non-linear mappings through interconnected layers. | High scalability, captures complex patterns, optimized inference speed. | Requires large datasets, limited interpretability, approximate UQ, sensitive to hyperparameters. | Emulating complex RTMs (e.g., coupled vegetation-atmosphere models), scene-to-scene inversion [26,42,60,61,62,63,64].
Deep Learning NNs (DLNNs) [65] | Advanced NN architectures including Convolutional NNs, Recurrent NNs, autoencoders, transformers, and physics-informed NNs. Designed for high-dimensional, spatiotemporal, or structured data emulation. | Extremely flexible, handles high-dimensional inputs, learns spatial/temporal structure, enables end-to-end inversion, supports uncertainty via dropout or ensembles. | Computationally demanding to train, requires large annotated datasets, reduced interpretability, overfitting risk. | Scene-level RTM emulation, spatiotemporal flux retrievals, hybrid physical–DL models (e.g., MODTRAN emulation with CNNs) [66,67,68,69,70].
Gaussian Process Regression (GPR) [71] | Kernel-based probabilistic model providing both mean and variance predictions; ideal for small datasets and inherent UQ. | High accuracy, strong UQ, robust with small data, interpretable. | Scales as $O(N^3)$, memory-intensive, less suited for large datasets. | Emulating PROSAIL, SCOPE, MODTRAN in applications where accuracy and UQ are critical [26,27,42,60,61,63,72,73,74].
Random Forests (RF) [75] | Ensemble of decision trees that aggregate outputs for robust prediction. Well-suited for tabular and structured data. | Robust, fast training, interpretable via feature importance, handles noise. | No inherent UQ, only empirical variance. Slower prediction at scale due to multiple decision trees; tends to reduce the impact of outliers due to its averaging nature. | Used as alternative RTM emulators in some benchmarking studies [61,76].
Kernel Ridge Regression (KRR) [77] | Ridge regression in a kernel-transformed space; similar to GPR but deterministic. | Competitive accuracy, captures non-linearity, less sensitive to hyperparameters. | Scales as $O(N^3)$, no native UQ, less popular than GPR. | Alternative to GPR for mid-sized RTMs where UQ is not essential [60,61,69,78].
Support Vector Regression (SVR) [79] | Finds a hyperplane with $\epsilon$-insensitive loss; effective in high-dimensional spaces with kernel trick. | Accurate, robust to outliers, generalizes well with good kernel choice. | Scales as $O(N^2)$, sensitive to kernel and hyperparameters, lacks native UQ. | Used for spectral emulation tasks with moderate-sized datasets [61,76].
Polynomial Chaos Expansion (PCE) [80] | Expands model output in orthogonal polynomials based on input distributions, allowing for analytical UQ and sensitivity analysis. | Provides analytical UQ, Sobol indices, interpretable, and efficient for low dimensions. | Suffers from curse of dimensionality, basis tied to distribution, struggles with strong non-linearity. | Used in global sensitivity and UQ analysis of deterministic models. Not applied to RTMs.
Table 3. Comparison of common MLRAs for emulating RTMs: accuracy, UQ, scalability, interpretability, and key references. UQ refers to whether the method natively provides principled predictive uncertainty or only empirical approximations.
Method | Accuracy | UQ | Scalability | Interpretability | RTM Studies
NNs | High | Approx. (MC dropout, ensembles) | High | Low | [26,42,60,61,62,63,64]
DLNNs | Very High | Approx. (MC dropout, deep ensembles) | Very High | Very Low | [66,67,68,69,70]
GPR | High | Yes (Bayesian predictive distribution) | Limited | Medium | [26,27,42,60,61,63,72,73,74]
RF | Moderate–High | Empirical (ensemble variance) | High | Medium | [61,63,69]
KRR | High | No | High | Medium | [26,42,60,61,78]
SVR | Moderate–High | No | Medium | Medium | [61,76]
PCE | Moderate | Yes (analytical) | Moderate | High | No RTM studies
Table 4. Comparison of common space-filling sampling designs for emulator training in RTM applications.
Property | Latin Hypercube Sampling (LHS) [104] | Sobol Sequence [105] | Halton Sequence [106]
Type | Stratified random sampling | Quasi-random (low-discrepancy) sequence | Quasi-random (low-discrepancy) sequence
Space-filling Quality | Good in all dimensions (by construction) | Excellent for moderate to high dimensions | Good in low dimensions; deteriorates with higher dimensions
Uniformity | Random, but forced stratification ensures uniform marginal distributions | Highly uniform; minimizes gaps and clusters | Uniform in low dimensions; suffers from correlation in higher dimensions
Determinism | Stochastic (can vary by seed) | Deterministic | Deterministic
Scalability | Easily scalable to high dimensions and sample sizes | Efficient in high-dimensional settings; extensible | Less scalable; performance degrades beyond 10–20 dimensions
Implementation Simplicity | Simple and widely implemented | Slightly more complex; supported in numerical libraries | Relatively simple but less widely used
Suitability for Emulator Training | Common choice due to flexibility and randomness | Preferred for high-dimensional RTMs due to uniformity and extensibility | Suitable for low-dimensional problems, less ideal for complex RTMs
Reproducibility | Depends on random seed | Fully reproducible | Fully reproducible
Use | Widely used for training | Used for running emulators in global sensitivity analysis (see also Section 4.2) | No RTM emulation studies
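To make the stratified-versus-deterministic distinction in Table 4 concrete, the following minimal sketch builds a Latin hypercube sample and the first points of an unscrambled Halton sequence using only the Python standard library. The function names and the variable ranges passed to `scale` are illustrative assumptions, not taken from any RTM toolbox. Sobol sequences are omitted because they rely on tabulated direction numbers and are better taken from a numerical library.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Latin hypercube sample on the unit cube: one point per stratum and dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # Draw one value inside each stratum [k/n, (k+1)/n), then shuffle the order.
        col = [(k + rng.random()) / n_samples for k in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    # Transpose columns into a list of points.
    return [list(point) for point in zip(*columns)]


def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def halton(n_samples, bases=(2, 3)):
    """First n_samples points of the unscrambled Halton sequence (one prime base per dimension)."""
    return [[radical_inverse(i, b) for b in bases] for i in range(1, n_samples + 1)]


def scale(points, lower, upper):
    """Map unit-cube points to physical variable ranges."""
    return [[lo + v * (hi - lo) for v, lo, hi in zip(p, lower, upper)] for p in points]


lhs = latin_hypercube(8, 2)   # changes with the seed
hal = halton(4)               # identical on every run
# Hypothetical ranges for two RTM input variables.
training_inputs = scale(lhs, lower=[0.0, 5.0], upper=[8.0, 80.0])
```

The LHS output depends on the seed, whereas the Halton points are fixed and can be extended by simply continuing the index, which mirrors the determinism and reproducibility rows of the table.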
Table 6. Overview of promising machine learning methods for RTM emulation. While not yet widely applied in RTM studies, these approaches offer potential due to scalability, UQ, or structural flexibility. Refs: references.
ML Method | Type | Strengths for Emulation | Refs.
Scalable GPR; e.g., Sparse GPR, nearest neighbor GPR (NNGPR), stochastic variational (SVGP) | Probabilistic kernel regression | UQ; scalable to large datasets via approximation | [121,122,123]
Deep GPR | Deep kernel-based regression | Captures hierarchical structure; better handles complex non-stationarity | [124]
Bayesian Additive Regression Trees (BART) | Bayesian ensemble trees | Probabilistic output; interpretable; handles nonlinear relationships well | [125]
XGBoost | Gradient-boosted decision trees (GBDTs) | Fast and accurate; robust to overfitting; interpretable | [126]
LightGBM | GBDTs with histogram splits | Very fast; handles large-scale input efficiently | [127]
CatBoost | GBDTs with ordered boosting | Effective with categorical inputs; competitive accuracy | [128]
CNNs | Deep learning (spatial) | Strong at extracting local spectral/spatial patterns; good for hyperspectral data | [129]
Transformers | Deep learning (attention) | Captures long-range interactions; suited to structured inputs (e.g., spectra) | [130]
PINNs | Physics-informed NNs | Incorporates RTM physics in training (e.g., spectral absorption features, conservation laws); enables physically consistent emulation | [131]
Bayesian Neural Networks (BNNs) | Probabilistic deep learning | Uncertainty-aware emulation; flexible for complex nonlinearities | [132]
Generative Adversarial Networks (GANs) | Generative deep learning | Capable of high-fidelity synthetic spectral generation; potential for inversion/data augmentation | [133]
Table 7. Qualitative comparison of emulator performance for hyperspectral RTMs (~500 bands) and their dimensionality-reduced counterparts (~10 components). Stars (★ ★ ★) denote relative performance (higher is better).
Emulator Type | Output Dimensionality | Training Data Need | Training Speed | Prediction Speed | Accuracy/Fidelity | Memory & Storage Efficiency | Resistance to Overfitting | Interpretability | Key Remarks
RF | 500 bands | Moderate | ★ ★ ✩ | ★ ★ ★ | ★ ★ ✩ | ★ ★ ✩ | ★ ★ ✩ | Medium–high | Stable baseline; struggles with very high-D outputs
RF + DR | 10 comps | Moderate | ★ ★ ★ | ★ ★ ★ | ★ ★ ✩ | ★ ★ ★ | ★ ★ ★ | Medium–high | DR reduces redundancy; improved generalization
KRR | 500 bands | Moderate | ★ ★ ✩ | ★ ★ ✩ | ★ ★ ✩ | ★ ★ ✩ | ★ ★ ✩ | Medium | Kernel choice critical; moderate scalability
KRR + DR | 10 comps | Moderate | ★ ★ ★ | ★ ★ ★ | ★ ★ ✩ | ★ ★ ★ | ★ ★ ★ | Medium | DR improves conditioning and robustness
GPR | 500 bands | Low–moderate | ★ ✩ ✩ | ★ ✩ ✩ | ★ ★ ★ | ★ ✩ ✩ | ★ ★ ★ | High | High fidelity and UQ; limited by cubic scaling
GPR + DR | 10 comps | Low–moderate | ★ ★ ✩ | ★ ★ ✩ | ★ ★ ★ | ★ ★ ✩ | ★ ★ ★ | High | Latent-space GPR balances efficiency and reliability
NN | 500 bands | High | ★ ✩ ✩ | ★ ★ ★ | ★ ★ ✩ | ★ ★ ✩ | ★ ★ ✩ | Low | Powerful for nonlinear RTMs; costly, prone to overfit
NN + DR (latent-AE) | 10 comps | High | ★ ★ ✩ | ★ ★ ★ | ★ ★ ✩ | ★ ★ ★ | ★ ★ ✩ | Low | DR acts as structural regularizer; efficient and stable
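For the PCA case, the difference between the full-spectrum rows and the "+ DR" rows of Table 7 can be written compactly. The notation below is our own shorthand for the standard PCA-compressed emulation scheme (it does not cover the autoencoder variant): $B$ is the number of bands (about 500), $K$ the number of retained components (about 10), $\boldsymbol{\mu}$ the mean training spectrum, $\mathbf{v}_k$ the principal directions, and $\hat{f}_k$ the regression model fitted to the $k$-th score.

```latex
% Training: project each simulated spectrum y_i (from RTM inputs x_i) onto K components
z_{ik} = \mathbf{v}_k^{\top}\left(\mathbf{y}_i - \boldsymbol{\mu}\right),
\qquad k = 1,\dots,K, \quad K \ll B

% Fit one regression model per component on the pairs (x_i, z_{ik})
\hat{f}_k : \mathbf{x} \mapsto z_k

% Prediction: evaluate the K models and back-project to the B spectral bands
\hat{\mathbf{y}}(\mathbf{x}) = \boldsymbol{\mu} + \sum_{k=1}^{K} \hat{f}_k(\mathbf{x})\,\mathbf{v}_k
```

Only $K$ regression models are trained and stored instead of $B$, which is where the gains in training speed and memory efficiency in the table come from, at the price of the reconstruction error of the discarded components.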
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Verrelst, J.; Morata, M.; García-Soria, J.L.; Sun, Y.; Qi, J.; Rivera-Caicedo, J.P. RTM Surrogate Modeling in Optical Remote Sensing: A Review of Emulation for Vegetation and Atmosphere Applications. Remote Sens. 2025, 17, 3618. https://doi.org/10.3390/rs17213618
Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.