1. Introduction
Spectral reconstruction from RGB images—that is, generating multispectral or hyperspectral images from simple three-channel images (red, green, and blue)—is a topic that has received increasing attention in the last decade. Native hyperspectral images, acquired with hyperspectral cameras, provide richer information than traditional RGB images, capturing data across tens to hundreds of narrow spectral bands. This allows chemical and physical properties of targets in the scene, linked to their spectral signatures, to be detected without any damage or invasive sampling. This capability enables advanced applications in various sectors, including remote sensing, agriculture, geology, medicine, environmental monitoring, and cultural heritage preservation [
1]. Within environmental monitoring, for instance, hyperspectral images are among the best sources of information for studying air pollution. However, hyperspectral cameras are generally complex, expensive, and often impractical for large-scale applications or consumer contexts. To overcome these limitations, a line of research has emerged that aims to reconstruct detailed spectral information from RGB images by leveraging advanced computational techniques and machine learning models [
2]. Generally, the reconstruction of hyperspectral images from RGB images is a complex challenge due to information loss and limitations in data quality. The first problem is the non-uniqueness of the solution: reconstructing a spectrum from an RGB image is an ill-posed task, since different spectral distributions can correspond to the same RGB values; given the sensor response, multiple input spectra can lead to identical integral values. A second problem is that the spectral (frequency) response of the RGB camera is usually unknown, so we do not know exactly how the camera integrates the incoming spectrum. This is an important limitation, because many models require knowledge of the spectral response of the sensor (e.g., attention-block models) [
1], which is not available for common RGB cameras. In addition, we must cope with data degradation: raw image values encode pixel responses directly related to the incoming electromagnetic waves, but JPEG compression, white balance, and gamma correction degrade the original spectral information. Nevertheless, recent studies (from regression models to deep learning models) demonstrate promising results, including predictions of spectral information beyond the visible spectrum, such as in the NIR (near-infrared) region. The methods proposed in the literature can primarily be divided into two general categories: prior-based methods and data-driven methods. Prior-based methods exploit statistical information and spatial or spectral characteristics, such as sparsity or spectral correlation [
1]. On the other hand, data-driven methods mainly rely on deep neural networks, which have demonstrated a great ability to model complex relationships between RGB data and corresponding spectral information [
3]. Recent research has significantly focused on the use of deep learning techniques to address the problem of spectral reconstruction. Approaches based on deep convolutional neural networks (CNNs), as well as more advanced architectures, such as Generative Adversarial Networks (GANs), have shown remarkable effectiveness, especially when trained on large datasets [
1]. A critical issue in this field concerns the physical accuracy of reconstructed images. Many current methods, although achieving good spectral accuracy, may generate results that do not respect physical plausibility—meaning that reconstructed images do not precisely yield the original RGB values when reintegrated with the camera’s spectral sensitivities. To address this challenge, recent studies have developed physically plausible reconstruction methods, introducing constraints based on the decomposition of spectral information into fundamental metameric components. This allows for a reconstruction that is physically coherent and robust to exposure variations [
4]. A further advancement in research involves the introduction of category-specific prior information—additional details describing particular object types or surfaces present in the images—to further improve reconstruction quality [
5]. Finally, an important driver for developing these approaches has been the creation of new datasets, such as ARAD-HS [
6], which offers a wide variety of natural hyperspectral images. These datasets enable rigorous validation and performance comparisons among the various developed methods [
7]. It is important to note that in the analyzed studies, RGB and hyperspectral images can be treated either as directly measured luminance data or as reflectance data, where the acquired signal is normalized against the light source and environmental conditions. However, it should be emphasized that many commonly cited datasets, such as ARAD-HS, CAVE, ICVL, and BGU-HS [
1], do not provide detailed calibration procedures for precise conversion to reflectance, one of the most important steps for extracting from the images numerical information related to the status of plants and fruits. The only dataset explicitly described with rigorous calibration via a standard white panel is the KAUST-HS dataset [
1,
6]. Consequently, the use of reflectance data in studies based on these datasets should be approached cautiously, recognizing potential limitations in their accuracy and generalizability. Applying these models to images acquired under uncontrolled conditions remains an open challenge but represents a significant area of research for food-quality analysis and other natural material assessments. This study presents a detailed analysis of the topics previously introduced, beginning with the current state of the art in conventional hyperspectral imaging (HSI), with particular attention to methodologies, algorithms, and their limitations. On this basis, we then examine spectral reconstruction, highlighting the advantages it provides as well as the constraints that characterize it, with specific reference to the earliest applications reported in the agri-food domain.
3. Traditional HSI On-Field
HSI is emerging as a key technology across a wide range of agri-food applications. The main advantage of this technology lies in its non-invasive approach, as it enables the analysis of objects within a scene by simultaneously combining imaging and spectroscopy. The result of this measurement is a three-dimensional data cube, or hypercube, consisting of two spatial dimensions and one spectral dimension. Each acquired pixel is characterized by a complete spectrum—reflected, transmitted, or absorbed by the sample—commonly referred to as its spectral signature. Beyond laboratory and industrial systems, hyperspectral imaging can also be implemented on different platforms, including ground-based setups, UAVs, airborne systems, and satellites, thus covering a wide range of spatial scales and application domains.
The development of regression and classification models based on hyperspectral imaging (HSI) in agriculture generally follows a consolidated methodological pipeline, which, despite its modularity, remains consistent across most studies as shown in
Figure 1. The first critical decision concerns the acquisition system, where the choice of the hyperspectral camera and platform determines the spectral range, resolution, and signal-to-noise characteristics of the resulting datacube. Indoor gantry systems and imaging boxes are frequently employed for controlled experiments and calibration datasets, while ground-based vehicles, such as tractors or unmanned ground vehicles (UGVs), allow for direct deployment in the field. Unmanned aerial vehicles (UAVs) and manned airborne platforms extend coverage to larger areas, with satellites providing long-term and wide-scale monitoring. Selecting the sensor technology (e.g., pushbroom, whiskbroom, snapshot, or spectral scanning cameras) is equally decisive, as each modality entails specific trade-offs in terms of spatial resolution, acquisition speed, and robustness to motion. Once data are acquired, extensive pre-processing is typically required to ensure reliability. This includes radiometric corrections, such as white/dark reference calibration and spectral normalization, geometric adjustments for distortion removal and georeferencing, and, in the case of airborne acquisitions, atmospheric corrections. Following pre-processing, a crucial step is segmentation and region-of-interest (ROI) extraction, where the objects of interest (leaf, fruit, canopy, or plot) are separated from background elements, such as soil, shadows, or crop residues. This is achieved through a variety of techniques, including morphological operations, thresholding, vegetation indices, spectral clustering approaches such as Principal Component Analysis (PCA), or more recently, deep neural segmentation methods. A critical aspect at this stage is the choice of the analytical strategy, which generally follows two main approaches [
8]. The first relies on the ROI average spectrum, where the mean spectral signature of a region is used to represent the entire sample. This method is computationally efficient and suitable for relatively homogeneous materials or when only global classification is required, yet it inevitably overlooks local heterogeneity and may obscure subtle adulterations or localized defects. Conversely, the pixel-wise approach exploits the full spatial resolution by analyzing each pixel spectrum individually, thus enabling the visualization of chemical maps and the detection of spatially localized anomalies. While this strategy preserves the distinctive advantage of hyperspectral imaging, it also introduces substantial computational demands, higher sensitivity to noise, and the need for advanced chemometric or deep learning models. Ultimately, the choice between the two reflects a trade-off between simplicity and representativeness, with recent studies often favoring the ROI approach for its practicality, even at the cost of underutilizing the spatial richness of hyperspectral data. The next phase concerns dimensionality reduction and feature extraction, necessary to condense the large spectral–spatial information into more manageable and discriminative representations. Beyond PCA, other methods, such as Minimum Noise Fraction (MNF), Independent Component Analysis (ICA), Partial Least Squares (PLS), or band selection strategies, are employed, while deep learning-based embeddings, such as autoencoders and Convolutional Neural Networks (CNNs), have gained increasing prominence. Building a robust dataset requires linking hyperspectral measurements to ground-truth labels obtained through expert annotation, field sampling and laboratory analysis—for instance, by measuring biochemical or physiological parameters (e.g., chlorophyll, nitrogen, soluble solids) through standard methods such as spectrophotometry, refractometry, or HPLC (High-Performance Liquid Chromatography). This is complemented by careful definition of training, validation, and test splits, with data augmentation techniques, such as geometric transformations (e.g., rotation, flipping, cropping), spectral perturbations (e.g., noise addition, spectral shifts), or synthetic sample generation via generative models, used to increase dataset variability and reduce overfitting in limited datasets. For modeling, regression tasks typically rely on algorithms such as Partial Least Squares Regression (PLSR), Random Forests (RF), or CNNs, targeting biochemical and physiological parameters. Classification problems—such as crop/weed discrimination or stress detection—are addressed with Support Vector Machines (SVMs), two- or three-dimensional CNNs, or hybrid models. Lightweight versions of these algorithms are increasingly adopted for real-time, on-board inference on GPUs (Graphics Processing Units) or FPGAs (Field-Programmable Gate Arrays). Model evaluation combines classical statistical measures with domain-specific metrics. Regression models are commonly assessed with the coefficient of determination (R²), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), or the Ratio of Performance to Deviation (RPD). Classification models are evaluated through accuracy, precision, recall, F1-score, and confusion matrices. For spectral fidelity and image reconstruction, metrics such as the Spectral Angle Mapper (SAM), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index (SSIM) are also employed. Finally, deployment translates model outputs into actionable tools. In agriculture, this includes the production of operational maps for nutrients or water stress, the integration of real-time predictions into precision spraying or micro-dosing systems, or the use of embedded pipelines for sorting and grading applications. Comparable workflows are reported in other domains—such as food quality and safety, medical imaging, and water or flood monitoring—where analogous sequences of acquisition, segmentation, feature extraction, and modeling are applied with domain-specific adaptations. For example, indoor calibration protocols are optimized for food safety, tissue- or biomarker-driven feature sets guide medical imaging, and water–soil–vegetation spectral separation is exploited in environmental monitoring [
9,
10].
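To make the calibration and ROI steps of the pipeline concrete, the following minimal Python sketch converts raw digital numbers to relative reflectance using white/dark reference cubes and then extracts an ROI average spectrum. Array shapes, variable names, and the synthetic data are illustrative assumptions, not taken from any cited study.

```python
import numpy as np

def calibrate_reflectance(raw, white, dark):
    """Flat-field calibration: convert raw digital numbers (DN) to
    relative reflectance using white and dark reference cubes of the
    same shape (rows, cols, bands). A small epsilon guards against
    division by zero in saturated or dead pixels."""
    eps = 1e-9
    return (raw - dark) / (white - dark + eps)

def roi_mean_spectrum(cube, mask):
    """Average the spectra of all pixels selected by a boolean mask
    of shape (rows, cols), returning a single (bands,) signature."""
    return cube[mask].mean(axis=0)

# Illustrative usage with synthetic data (all shapes/values invented).
rng = np.random.default_rng(0)
raw = rng.uniform(100, 4000, size=(64, 64, 200))
dark = np.full_like(raw, 100.0)
white = np.full_like(raw, 4000.0)
reflectance = calibrate_reflectance(raw, white, dark)

mask = np.zeros((64, 64), dtype=bool)
mask[20:40, 20:40] = True                 # hypothetical leaf/fruit ROI
signature = roi_mean_spectrum(reflectance, mask)
print(signature.shape)                    # (200,)
```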
Table 2 compares the main platforms employed for hyperspectral data acquisition in precision agriculture and related domains. Ground-based systems (gantries, imaging boxes, UGVs) stand out for their high stability and controlled illumination, making them particularly suitable for calibration studies and reference dataset collection. UAVs offer a balance between flexibility and spatial detail, with established applications in vineyards, rice fields, and horticultural crops. Manned airborne systems provide regional coverage but are constrained by high costs and logistical complexity, whereas satellite missions deliver long-term time series on a global scale, albeit with a ground sampling distance (GSD) inadequate for micro-plots. Multi-UAV systems enable simultaneous coverage of multiple plots but require sophisticated coordination and data management strategies [
10].
Table 3 highlights how the choice of acquisition modality directly affects data quality and suitability for specific scenarios. Pushbroom imagers remain the most widely adopted in agriculture due to their compact design and high signal-to-noise ratio (SNR), although they require careful motion correction. Whiskbroom scanners, despite their high spectral accuracy, are less suited to dynamic scenes or UAV deployment. Spectral scanning and snapshot systems are typically used in indoor or industrial sorting contexts, where rapid acquisition and reduced costs are prioritized.
Table 4 synthesizes the main families of algorithms applied to HSI analysis. Traditional machine learning approaches (PLSR, SVM, and RF) remain highly relevant for biochemical parameter estimation thanks to their efficiency with limited datasets and selected bands. Deep learning models (1D, 2D, 3D CNNs) leverage both spectral and spatial information, achieving superior accuracy in discrimination tasks such as weed detection or stress identification. Generative approaches (GANs, autoencoders) have emerged more recently to mitigate overfitting and generate synthetic samples. This convergence suggests that methodological innovations are increasingly transferable across application domains.
Table 5 illustrates the wide spectrum of applications reported in the literature, ranging from nutrient and water stress monitoring to crop/weed discrimination and pest detection. The availability of software tools (e.g., MATLAB, Python libraries, such as SPy and PlantCV, and embedded SDKs) is shown to play a decisive role, often requiring custom pipelines tailored to specific tasks. In sorting and quality control, embedded software is critical for enabling real-time implementation. The same software frameworks can be adapted across domains with different performance requirements—for instance, rapid decision-making in surgical guidance versus high-throughput processing in food sorting.
Table 6 emphasizes the trade-offs among computing platforms for managing hyperspectral datasets. GPUs remain the standard for training and inference of deep learning models, whereas FPGAs offer extremely low latency and high throughput in sorting and compression tasks, albeit at the expense of design complexity. Embedded systems (e.g., Jetson TX1/TX2/Xavier) are emerging solutions for UAVs and UGVs, enabling on-board processing but constrained by limited memory. Broader challenges are linked to costs and massive dataset management, where HPC clusters and dedicated storage infrastructures become necessary, particularly in cross-domain applications such as medical imaging or water quality monitoring.
Beyond vineyard use, detailed in numerous recent studies [
39,
40,
41,
42,
43,
44], ground-based HSI has been successfully applied to wheat, vegetables, stone fruit, and citrus, as well as to quality assessment and defect detection in fresh produce, such as apples, potatoes, and tomatoes [
45,
46]. For example, the review by Dale [
45] and the overview provided by Benelli [
46] report multiple sub-studies documenting the use of HSI for apple ripeness classification, tomato damage detection, and non-destructive evaluation of quality parameters in kiwis and bell peppers.
A wide range of predictive and classification approaches were employed in the studies reported in
Table 7. Among regression techniques, Partial Least Squares Regression (PLSR) was by far the most commonly applied method, often combined with preprocessing or variable selection procedures. Other linear approaches such as multiple linear regression and optimized vegetation indices were also adopted, while non-linear methods, including convolutional neural networks (CNNs), support vector machines (SVMs), multilayer perceptrons (MLPs), and discriminant analysis, were implemented particularly for classification tasks, such as variety discrimination, disease detection, and weed recognition. With respect to predictive performance, most studies reported moderate to good accuracy, with determination coefficients (R²) typically ranging from 0.70 to 0.88 for chlorophyll and nitrogen estimation, and classification accuracies above 90% in tasks such as fruit maturity assessment and weed resistance detection. Even though environmental factors (e.g., illumination variability, shadows, canopy structure, or wind) often affected data quality, the majority of models achieved acceptable and reproducible results, demonstrating that ground-based hyperspectral imaging can reliably support both quantitative predictions and qualitative classifications in real field conditions.
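For reference, the two regression metrics just mentioned can be computed in a few lines; the sketch below uses NumPy, with invented values standing in for predicted versus laboratory-measured chlorophyll content.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root Mean Square Error, in the units of the measured parameter."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Invented example: predicted vs. lab-measured chlorophyll values.
y_true = np.array([32.1, 40.5, 28.7, 45.2, 38.0])
y_pred = np.array([30.8, 41.9, 30.2, 43.5, 36.6])
print(f"R2 = {r_squared(y_true, y_pred):.3f}, RMSE = {rmse(y_true, y_pred):.2f}")
```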
In
Table 7, some on-field agricultural studies using hyperspectral technology are listed. The only type of hardware-related information we aim to include, in order to provide a broader overview of these applications, is economic: the overall equipment required for hyperspectral image acquisition in the field comes at a significantly high cost. The acquisition systems listed in Table 7 have an average cost of €20,000 and can reach up to €40,000 for the camera body alone, without lens/objective.
Despite its great potential, the use of HSI on-field entails several practical and technical challenges. Below, we analyze the most important ones, with a greater focus on those less related to the specific hardware in use.
Table 8 summarizes the various issues investigated. The first three are mainly tied to the type of technology currently available in research centers and on the market; prices and technical specifications may well improve in the coming years to match those of current RGB cameras. The remaining issues concern data acquisition and processing methods for raw field data. In what follows, each issue is analyzed separately.
3.1. Limited Acquisition Speed
Current HSI systems are often characterized by low acquisition speeds, especially when using push-broom or line-scan spectroradiometers, which acquire one line at a time and rely on internal moving parts to generate the spectral bands [
40,
44,
45]. This strongly limits the extent of monitoring campaigns, particularly when large areas need to be covered or when working with moving platforms. For instance, studies reported by [
46] highlight that scanning a complete tree canopy may take several minutes, making large-scale use or use in high temporal variability contexts difficult. Limited acquisition speed is not only a technological bottleneck but also a major barrier for operational scalability. In UAV-based applications, line-scanning systems require precise synchronization with platform velocity, which increases operational complexity and makes long campaigns prone to incomplete or misaligned coverage [
9]. However, the use of portable NIR spectrometers allows for faster data collection, making monitoring more efficient, as shown by [
43]. On-the-go HSI systems also enable data acquisition on entire vineyard rows while in motion, further reducing the operational time [
44]. Beyond agriculture, similar constraints are observed in food quality inspection and medical imaging, where rapid decision-making is critical, and pushbroom systems are often replaced by snapshot cameras, despite their lower spectral fidelity [
10]. Another limiting factor for acquisition speed, in addition to the intrinsic behavior of the hyperspectral camera, is illumination. Under field conditions, reduced light levels—caused, for example, by cloud cover—often require longer exposure times to achieve an acceptable signal. Even snapshot cameras, which are designed for rapid cube reconstruction, suffer from low light sensitivity and reduced signal-to-noise ratios, thus constraining their effective speed in outdoor deployments. Furthermore, when the target is located several meters away from the sensor, as in many ground-based applications, increasing the exposure time inevitably leads to motion blur. This issue is commonly mitigated by reducing the platform’s travel speed, which in turn further decreases the overall acquisition efficiency.
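The interplay between exposure time, platform speed, and blur can be checked with simple arithmetic; all numbers in the sketch below are assumed, purely for illustration.

```python
# Rough speed/exposure trade-off (all values assumed for illustration):
# a target moving at v m/s drifts v * t_exp metres during one exposure;
# dividing by the ground sampling distance gives the blur in pixels.
v = 1.5        # platform speed, m/s
t_exp = 0.02   # exposure time, s (lengthened, e.g., under cloud cover)
gsd = 0.01     # ground sampling distance, m/pixel

blur_px = v * t_exp / gsd
print(f"motion blur ~= {blur_px:.1f} px")   # 3.0 px, already noticeable
# Halving the platform speed halves the blur, which is why longer
# exposures force slower travel and lower overall acquisition efficiency.
```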
3.2. Low Robustness in Outdoor Conditions
This lack of robustness is consistently reported across the literature as one of the key obstacles for field deployment. According to Ram et al. [
9], the combination of platform vibrations and atmospheric interference severely impacts the reliability of outdoor acquisitions, particularly for UAVs. For ground-based applications, vibrations due to ground unevenness can compromise acquisition quality, causing blurring, distortions, and misalignment between spectral bands [
40,
46]. For example, in vineyard campaigns, it has been observed that even minor oscillations in the mounts lead to registration errors, which are difficult to correct in post-processing. Some authors suggest using anti-vibration supports or active stabilization systems [
46].
3.3. Low Spatial Resolution
Despite the high spectral detail, many portable HSI systems offer lower spatial resolution compared to conventional cameras [
43,
45].
Spatial resolution represents a critical parameter in hyperspectral imaging, as it directly affects the ability to capture meaningful spatial and spectral information. Four main challenges are typically encountered: (i) the trade-off between pixel size and optical quality, where small pixels may be limited by diffraction or optical aberrations, while large pixels reduce spatial detail; (ii) the balance between resolution and noise, since operating near the resolution limit often increases noise, requiring strategies such as spatial or spectral binning; (iii) the need to match the spatial resolution with the size of the target particles or contaminants, as insufficient resolution produces mixed pixels that dilute relevant signals; and (iv) the influence of surface irregularities and geometry, which introduce reflections, shadowing, and other artifacts that degrade effective resolution. When these limitations prevent reliable pixel-wise analysis, many studies have resorted to using the ROI average spectrum [
61]. This approach reduces noise, simplifies data structures, and enhances computational feasibility, making it particularly attractive in applied and industrial contexts. Over the last five years, this trade-off has resulted in more than 80% of published studies adopting the ROI-based strategy, as it compensates for spatial resolution constraints and improves the robustness of classification models [
8]. Nevertheless, this choice inevitably limits the exploitation of hyperspectral imaging’s unique capacity to map spatial heterogeneity, thereby reducing its potential for detecting localized defects, adulterants, or contaminants.
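As an illustration of the binning strategy mentioned in point (ii) above, a minimal NumPy sketch that average-pools a hypercube by integer factors in the spatial and spectral dimensions might look as follows (shapes and factors are arbitrary):

```python
import numpy as np

def bin_cube(cube, spatial=2, spectral=4):
    """Average-pool a hypercube (rows, cols, bands) by integer factors.
    Binning trades resolution for SNR: averaging n values reduces
    uncorrelated noise by roughly sqrt(n)."""
    r, c, b = cube.shape
    # Crop edges so each dimension is divisible by its binning factor.
    r, c, b = r - r % spatial, c - c % spatial, b - b % spectral
    cube = cube[:r, :c, :b]
    return cube.reshape(r // spatial, spatial,
                        c // spatial, spatial,
                        b // spectral, spectral).mean(axis=(1, 3, 5))

cube = np.random.rand(101, 103, 202)        # arbitrary test shape
print(bin_cube(cube).shape)                 # (50, 51, 50)
```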
3.4. HSI Data Management and Complexity
Another critical issue relates to the size and complexity of HSI data. Each acquisition generates large volumes of multi-band data, often tens or hundreds of MB per image, posing significant challenges in terms of storage, transfer, and processing time [
39,
45,
60]. Embedded platforms such as Jetson boards struggle to process full hypercubes in real time, necessitating dimensionality reduction or band selection prior to modeling, highlighting that data management is often a greater challenge than data acquisition itself [
9]. Das et al. [
60] emphasize that efficient data management is one of the key aspects for operational deployment of HSI, suggesting the adoption of automated pipelines for data cleaning and compression. In some cases, the cited literature suggests that the use of machine learning techniques for feature selection and dimensionality reduction is essential for handling large datasets [
62]. Moreover, advanced processing techniques, such as parallel and distributed architectures and high-performance computing, can help reduce computational and storage costs [
63].
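As one concrete example of such dimensionality reduction, the hedged sketch below applies PCA to an unfolded synthetic hypercube with scikit-learn; the endmember construction, cube shape, and 99% variance threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic, spectrally correlated cube: a few latent "endmembers"
# mixed per pixel plus noise, mimicking the strong band-to-band
# correlation of real hyperspectral data (all shapes invented).
rng = np.random.default_rng(0)
endmembers = rng.random((5, 200))                  # 5 latent spectra, 200 bands
abundances = rng.dirichlet(np.ones(5), 128 * 128)  # per-pixel mixing weights
pixels = abundances @ endmembers + rng.normal(0, 0.01, (128 * 128, 200))

# Keep enough components to explain 99% of the spectral variance; for
# correlated spectra a handful of scores replaces hundreds of bands.
pca = PCA(n_components=0.99)
scores = pca.fit_transform(pixels)
print(scores.shape[1])                     # typically a handful of components
reduced = scores.reshape(128, 128, -1)     # compact cube for modeling
```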
3.5. Lighting Control
Natural light, variable in intensity and direction, is one of the main sources of error in outdoor HSI imaging [
9,
39,
43,
45]. The presence of clouds, shadows, reflections, or rapid light variations can alter the sensor’s spectral response, making it difficult to compare repeated acquisitions. Various methods (e.g., use of reference targets for reflectance calibration or normalization algorithms) have been proposed in the literature, but standardizing conditions remains problematic [
45,
46].
3.6. Radiometric Image Calibration
Radiometric calibration is an essential phase in the acquisition and analysis of hyperspectral and multispectral images. This refers to the set of procedures that transform raw digital values (digital numbers, DN) produced by a sensor into absolute radiometric measurements, such as reflectance or scene radiance [
64,
65,
66], considering both sensor response and lighting conditions. In the case of hyperspectral and multispectral images, calibration serves two purposes: ensuring data comparability across different acquisitions, sensors, and lighting conditions; and enabling reliable quantitative information extraction, essential for biophysical monitoring, species or material identification, diagnostics, and automatic classification.
Calibration requires stable, controlled conditions, which are difficult to maintain in the field [
45,
46]. Calibration errors can propagate throughout the data-processing chain, significantly affecting the accuracy of qualitative and quantitative estimates. Benelli et al. [
46] underline that the lack of standardization in calibration procedures among research groups still limits widespread outdoor HSI use. The evolution of methods, from classic reference-based strategies to computational and AI-based solutions, has expanded operational capabilities, allowing for use across a wide range of scenarios. However, the choice of calibration procedure must be carefully adapted to operating conditions, sensor type, and application objectives, always considering the challenges and limitations reported in the recent literature.
The most widespread method, using physical references (on-field and laboratory-based), involves reference panels with known reflectance (Spectralon or similar materials), acquired before, during, and after the acquisition session [
64,
66,
67,
68,
69]. This strategy can be applied both in the lab and directly in the field and includes the following steps: acquisition of panel images under the same operating conditions as the scene; estimation of the sensor’s radiometric response function for each band, typically through linear regression between panel values and their certified reflectances [
64]; application of the calibration curve to all operational images to obtain radiometrically corrected data [
64,
67,
69].
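The regression step just described can be sketched in a few lines of Python; the panel reflectances, band count, and noise level below are invented for illustration and do not correspond to any certified reference panel.

```python
import numpy as np

def fit_calibration(panel_dn, panel_refl):
    """Per-band linear fit DN -> reflectance from calibration panels.
    panel_dn: (n_panels, n_bands) mean DN of each panel per band;
    panel_refl: (n_panels, n_bands) certified panel reflectances.
    Returns gain and offset arrays of shape (n_bands,)."""
    n_bands = panel_dn.shape[1]
    gain, offset = np.empty(n_bands), np.empty(n_bands)
    for b in range(n_bands):
        # polyfit returns [slope, intercept] for degree 1
        gain[b], offset[b] = np.polyfit(panel_dn[:, b], panel_refl[:, b], 1)
    return gain, offset

def apply_calibration(cube_dn, gain, offset):
    """Apply the per-band calibration curve to a (rows, cols, bands) cube."""
    return cube_dn * gain + offset

# Invented example: white/gray/black panels, 100 bands, mild sensor noise.
panel_refl = np.tile(np.array([[0.95], [0.50], [0.05]]), (1, 100))
panel_dn = panel_refl * 3800 + 120 + np.random.normal(0, 5, panel_refl.shape)
gain, offset = fit_calibration(panel_dn, panel_refl)
refl_cube = apply_calibration(np.random.rand(32, 32, 100) * 4000, gain, offset)
```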
Panel choice (pure white, gray, black, multiple) and operational handling strongly affect calibration quality and final measurement accuracy [
67,
69].
3.6.1. Calibration Based on Illumination Estimation and Compensation
Recently, techniques that estimate the spectral distribution of the light source directly from the images, and then compensate for its effects on the data, have become widespread [
70,
71,
72,
73,
74]. These methods include using reference patches in the scene to extract the SPD (Spectral Power Distribution) of the light [
70,
75]; applying color constancy algorithms adapted to multispectral data, such as gray world, max-RGB, shades of gray, or gray edge [
72,
73]; and finally using statistical or machine learning (deep learning) methods for automatic, per-pixel estimation of the illuminant [
74].
These approaches are particularly useful where physical references cannot be included, or in dynamic scenes and complex operational environments [
70,
71,
74].
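As an example of the color constancy family cited above, the following sketch extends the gray-world assumption to an n-band cube: the illuminant is estimated as the per-band scene mean and divided out. This is a simplified illustration, not a reference implementation of any cited method.

```python
import numpy as np

def gray_world_correction(cube):
    """Gray-world constancy extended to n bands: assume the scene
    average is spectrally neutral, estimate the illuminant as the
    per-band mean, and normalize each band by it."""
    illuminant = cube.reshape(-1, cube.shape[-1]).mean(axis=0)  # (bands,)
    corrected = cube / (illuminant + 1e-9)
    return corrected / corrected.max(), illuminant

cube = np.random.rand(64, 64, 31)     # hypothetical multispectral scene
corrected, estimated_spd = gray_world_correction(cube)
```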
3.6.2. Calibration for Non-Conventional or Consumer Instruments
The spread of consumer digital cameras has stimulated the development of calibration procedures adapted to RGB sensors or non-scientific devices, especially under low-light conditions [
66,
76]. Adopted methods include using reference panels, grayscale charts, and careful management of acquisition parameters (ISO, exposure time, dark correction) [
66] and spectral calibration strategies using diffraction filters and simplified physical models [
76]. The analyzed literature highlights several recurring issues in the radiometric calibration of multispectral/hyperspectral images: variations in natural light [
64,
67,
68,
75]; errors due to panel misplacement, shadowing, surface contamination, or non-optimal angles [
64,
67]; non-linear sensor response and temporal drift [
66,
69,
77]; absence of physical references [
71,
72,
73]; multiple or non-uniform illumination effects [
74,
75,
78]; and application requirements and sector-specific constraints [
77,
78,
79].
In
Table 9, several of the radiometric calibration methods previously discussed are summarized. Particular attention is given to the last two columns, which concern the practical aspects of their implementation. In many cases, the lack of knowledge of the sensor spectral response represents a major limitation, especially when non-scientific or consumer-grade cameras are employed. Conversely, methods requiring a reference target in the field may complicate certain in situ applications, where the deployment of calibration panels is not always feasible.
4. Spectral Super-Resolution: Methods, Challenges, and Perspectives
Spectral Super-Resolution (SSR), that is, the reconstruction of hyperspectral images from RGB images, is a central topic in modern computer vision, with applications ranging from precision agriculture to medical diagnostics and cultural heritage preservation.
Nevertheless, it is important to emphasize that spectral reconstruction only replaces the traditional acquisition stage: once the hyperspectral cube is reconstructed, all subsequent steps—such as data preprocessing, dimensionality reduction, and the development of predictive models—remain essentially the same as in the traditional HSI pipeline, along with their inherent difficulties.
A phenomenon with direct implications for spectral reconstruction is metamerism. Metamerism arises from the fact that imaging devices reduce the continuous spectral information of light to a small number of channels (red, green, and blue). As a result, two different spectral reflectance functions may produce identical RGB values and thus appear as the same color, despite their underlying physical differences. This phenomenon highlights the inherent information loss in color imaging systems, where a high-dimensional spectral signal is projected onto a low-dimensional color space. Formally, the response $\rho_k$ of a given channel $k \in \{R, G, B\}$ can be expressed as a weighted integral over the visible spectrum $\Lambda$:

$$\rho_k = \int_{\Lambda} R(\lambda)\, E(\lambda)\, S_k(\lambda)\, d\lambda,$$

where $R(\lambda)$ denotes the surface spectral reflectance, $E(\lambda)$ the spectral power distribution of the illuminant, and $S_k(\lambda)$ the spectral sensitivity function of the channel. Two distinct reflectance spectra, $R_1(\lambda)$ and $R_2(\lambda)$, are said to be metameric under a given illuminant if they produce identical responses in all channels:

$$\int_{\Lambda} R_1(\lambda)\, E(\lambda)\, S_k(\lambda)\, d\lambda = \int_{\Lambda} R_2(\lambda)\, E(\lambda)\, S_k(\lambda)\, d\lambda, \quad k \in \{R, G, B\}.$$

In this case, although $R_1(\lambda) \neq R_2(\lambda)$, the recorded color will be indistinguishable, which represents the essence of the metamerism problem. Recovering the full incident spectrum from only three RGB responses is intrinsically an ill-posed inverse problem. Since infinitely many different spectra can lead to the same set of RGB values, the mapping from RGB space back to the hyperspectral domain is not unique.
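This non-uniqueness is easy to verify numerically. The sketch below builds toy Gaussian channel sensitivities (stand-ins for real camera curves, with the integral approximated by a discrete sum), perturbs a spectrum along a null-space direction of the sensor matrix, and confirms that two visibly different spectra produce identical RGB responses.

```python
import numpy as np

# Discretize the visible range and build three Gaussian sensitivities
# (toy stand-ins for real camera curves).
lam = np.linspace(400, 700, 301)
def gauss(mu, sigma):
    return np.exp(-0.5 * ((lam - mu) / sigma) ** 2)
S = np.stack([gauss(600, 40), gauss(540, 40), gauss(460, 40)])  # (3, 301)
E = np.ones_like(lam)                    # flat illuminant for simplicity

R1 = 0.5 + 0.3 * np.sin(lam / 30.0)      # arbitrary smooth reflectance

# Add to R1 a component from the null space of the sensor matrix:
# it changes the spectrum but is invisible to all three channels.
A = S * E                                # effective channel weights (rows)
_, _, Vt = np.linalg.svd(A)
null_direction = Vt[-1]                  # lies in the null space of A
R2 = R1 + 5.0 * null_direction

rgb1, rgb2 = A @ R1, A @ R2
print(np.max(np.abs(R1 - R2)) > 1e-2)    # True: the spectra differ
print(np.allclose(rgb1, rgb2))           # True: identical RGB responses
```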
Despite this problem, the recent literature highlights the significant progress enabled by statistical models and, especially, deep learning approaches [
1,
3].
Regression [
80], one of the earliest techniques, has become popular because it is straightforward and fast and admits an accurate, closed-form solution. RGB values and their spectral estimates are related, in the most basic “linear regression” [
80] by a single linear transformation matrix. Moreover, polynomial and root-polynomial regression [
81,
82] expand the RGB into polynomial/root-polynomial terms, which are subsequently transferred to spectra using a linear transform, in order to add non-linearity. Regressions that minimize the mean squared error (MSE) in the training set are sometimes referred to as “least-squares” regressions. Li et al. [
83] proposed a locally linear embedded sparse coding approach for spectral reconstruction from RGB images, achieving competitive results on various benchmarks. Similarly, the A+ method based on Adjusted Anchored Neighborhood Regression, developed by Timofte et al. [
84], is a reference in traditional super-resolution and is often used as a baseline in SSR applications.
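A minimal sketch of the closed-form least-squares mapping described above is given below; the camera response and training spectra are random placeholders, so the example demonstrates only the algebra, not realistic reconstruction accuracy.

```python
import numpy as np

# Minimal "linear regression" spectral reconstruction: learn a single
# 3 -> n_bands matrix M minimizing ||M R - H||^2 in closed form, then
# apply it to new RGB pixels. Shapes and data are illustrative only.
rng = np.random.default_rng(1)
n_train, n_bands = 10_000, 31
H = rng.random((n_bands, n_train))       # training spectra (columns)
A = rng.random((3, n_bands))             # toy camera response
R = A @ H                                # corresponding RGB values

M = H @ np.linalg.pinv(R)                # closed-form least-squares fit
H_hat = M @ R                            # reconstructed spectra
print(np.mean((H - H_hat) ** 2))         # training MSE

# Polynomial/root-polynomial variants expand each RGB triplet with
# terms such as r*g or r**0.5 before the same linear solve, adding
# non-linearity while keeping the closed-form solution.
```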
Several advanced architectures have been proposed in recent years, such as HSCNN+ [
85] reported in
Figure 2, attention-based models, adversarial networks [
86,
87], and techniques that exploit prior knowledge (e.g., the camera’s spectral response) [
2,
5]. In particular, Yan et al. [
5] suggest using informative priors on material categories, meta-learning, and dimensionality reduction techniques to improve generalization and model efficiency.
The New Trends in Image Restoration and Enhancement (NTIRE) challenges, held annually alongside the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), provide standardized benchmarks and competitive venues for low-level vision problems. They foster rapid progress by releasing carefully curated datasets and clear evaluation protocols, enabling fair, reproducible comparisons among state-of-the-art architectures in spectral reconstruction.
As shown in
Table 10, most of the submitted solutions rely on deep learning architectures, predominantly convolutional and transformer-based models, whereas approaches grounded in sparse recovery are almost entirely absent from the recent NTIRE challenges. In the earliest editions (e.g., NTIRE 2018), convolutional neural networks rapidly supplanted traditional sparse coding or regression-based techniques, which had previously been considered effective for spectral recovery. From NTIRE 2020 onwards, virtually all competitive entries relied on increasingly deeper CNNs, often incorporating residual or dense connections to improve stability and representation capacity. More recent challenges (e.g., NTIRE 2022) show a further shift towards transformer-based designs, such as MST++ and hybrid CNN–Transformer frameworks, reflecting the field’s convergence with broader trends in image restoration. Another noticeable trend in recent NTIRE challenges is the increasing integration of architectures that are sensitive not only to spectral dependencies but also to spatial features. Transformer-based models and hybrid CNN–attention frameworks explicitly leverage spatial context to improve reconstruction fidelity, marking a shift from purely spectral, pixel-wise mappings to more context-aware designs. Notably, dictionary learning and sparse recovery approaches are almost entirely absent from these recent competitions, confirming that they no longer represent competitive baselines when large-scale datasets and efficient GPU implementations enable the training of complex neural models. This progression underscores the decisive role of deep learning in setting the state of the art for spectral reconstruction.
The summary reported in
Table 11 shows not only the technical diversity of the methods submitted to NTIRE competitions but also the gradual evolution of the benchmarking protocols. A distinctive element of these challenges is the division into tracks, each designed to emulate different acquisition conditions or levels of task difficulty. For instance, the NTIRE 2018 and 2020 spectral recovery competitions defined two parallel tracks: the Clean track, which considered RGB images generated from hyperspectral data through a known and noise-free camera response function, and the real-world track, which simulated the practical scenario of JPEG-compressed RGB images obtained with an unknown response function. This dual setup provided insight into both the theoretical upper bound of algorithmic performance and the more challenging case of uncontrolled acquisition. In the NTIRE 2022 spectral recovery challenge, the organizers opted for a single track with a more realistic camera simulation pipeline, which incorporated automatic exposure, sensor noise, basic in-camera processing, and compression. This choice reflects a shift in focus from idealized conditions to scenarios that better approximate real-world usage. Overall, the design of tracks across NTIRE competitions illustrates the organizers’ intention to balance scientific rigor with practical relevance. By progressively moving from clean synthetic scenarios to complex, noise-affected, and compression-degraded data, the NTIRE challenges have established themselves as a benchmark series that both advances methodological innovation and pushes models closer to real-world applicability.
Table 12 summarizes both the official NTIRE datasets and the most widely used public hyperspectral benchmarks. A clear distinction can be observed between the NTIRE collections and traditional datasets. The NTIRE datasets (BGU HS, ARAD 1K, and LDV 2.0) were specifically curated to support competitive benchmarking, providing standardized train/validation/test splits and evaluation protocols. They are generally smaller in spatial resolution (e.g., ARAD 1K at 482 × 512 px) but offer carefully controlled acquisition pipelines, such as realistic camera simulations and JPEG compression, which are crucial for fair comparison across methods. In contrast, public hyperspectral datasets such as ICVL, CAVE, and Harvard remain indispensable for algorithm development due to their higher spatial resolution and scene diversity, though they lack standardized challenge settings. Remote sensing datasets (e.g., Chikusei, Houston, Pavia, Washington DC Mall, Botswana, and Cuprite) typically consist of large single-scene acquisitions with high spectral dimensionality, providing opportunities to test scalability but posing challenges for generalization. More recent UAV-based benchmarks, such as WHU-Hi, reflect a growing interest in domain-specific applications, particularly agriculture and crop monitoring. Overall, while NTIRE datasets drive methodological innovation through controlled and competitive benchmarking, the broader public datasets remain essential for testing robustness and adaptability of spectral reconstruction methods across varied imaging conditions and application domains. Many public datasets used for training and testing models do not include standardized calibration procedures, further complicating the generalization of results [
1,
4]. Chen et al. [
89] highlight that deep networks achieve excellent results on benchmarks, but also that maintaining high performance on unseen data or under different acquisition conditions remains a challenge.
It is important to note that the NTIRE challenges are not specifically oriented toward agri-food applications. Rather, they are conceived as general-purpose benchmarks for low-level vision tasks, such as spectral recovery, image restoration, and super-resolution, using standardized datasets and evaluation protocols. While the methodological advances achieved in NTIRE competitions are of potential relevance for agri-food imaging—particularly in terms of spectral fidelity and reconstruction accuracy—their datasets are typically composed of generic natural scenes or video content. Consequently, transferring these approaches to agri-food requires additional validation under domain-specific conditions, including outdoor acquisition variability, crop heterogeneity, and application-driven performance metrics.
In this regard, it is also important to acknowledge recent contributions that go beyond the NTIRE framework and explicitly address some of the physical shortcomings of current spectral reconstruction methods. Lin and Finlayson [
81] demonstrated that most deep learning architectures, while highly accurate on fixed benchmark datasets, are strongly dependent on exposure conditions. Their analysis revealed that leading CNN-based models fail to generalize when illumination or camera settings vary, a scenario that is unavoidable in field acquisitions. To mitigate this limitation, they revisited regression-based approaches and introduced root-polynomial regression, a model that is inherently exposure-invariant. Although simpler than deep neural networks, this method was shown to maintain stable performance across a range of exposure conditions, thus highlighting the importance of robustness over architectural complexity. Building on this line of work, Lin and Finlayson [
4] proposed a physically plausible formulation of spectral reconstruction, directly addressing another fundamental limitation of existing methods. In conventional pipelines, reconstructed hyperspectral signals often fail to reproduce the RGB values that were originally captured, meaning that the mapping from RGB to spectra is not physically consistent with the image formation process. To resolve this, the authors decompose each spectrum into two components: a
fundamental metamer, which is uniquely determined by the measured RGB values, and a
metameric black, which represents the residual spectral degrees of freedom invisible to the camera. This guarantees that reconstructed spectra always integrate back to the exact RGB measurements, ensuring zero colorimetric error while still allowing learning-based methods to predict the hidden spectral variation. Beyond improved spectral fidelity, this approach enhances robustness to changes in illumination and camera response, thereby aligning reconstruction with the physical constraints of image capture. Taken together, these studies suggest that in application domains such as agri-food, where imaging is frequently conducted in uncontrolled outdoor conditions, advances in exposure invariance and physical plausibility may be as critical as improvements in benchmark accuracy. Consequently, progress in this field cannot rely solely on NTIRE-driven deep learning pipelines, but must also integrate physics-based constraints and robustness criteria to ensure reliable deployment in real-world agricultural monitoring and food quality assessment. A critical aspect concerns model portability: the output of RGB cameras heavily depends on the sensor’s spectral response, calibration, white balance, and compression processes [
3,
4,
100]. As noted by Koundinya et al. [
100], models trained on data from a specific camera may degrade significantly when applied to data acquired with different devices. Normalization using a white reference and the incorporation of physical constraints on the plausibility of the reconstructed response are promising strategies. Overall, spectral super-resolution remains an open challenge: excellent results have been achieved in controlled conditions and on standard datasets, but operational application requires robust pipelines for calibration, validation, and critical selection of data and models [
1,
80]. All these factors make reconstruction highly dependent on, among other things, the dataset, the camera and its sensor, the resolution, and the FOV [
1,
4,
81]. The validity of an SSR model on cameras/datasets different from those used in training is often limited. In particular, the SSR model depends on the spectral response function of the specific camera used, in addition to the nature of the scene and the acquisition methodology. In datasets acquired in reflectance (where the response is normalized to a white standard), this dependence is mitigated, but considering the RGB response of each system remains fundamental [
4].
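The decomposition proposed by Lin and Finlayson [4] can be illustrated with basic linear algebra. In the hedged sketch below, a toy discrete sensor matrix (an assumption, not their actual implementation) is used to show that the fundamental metamer reproduces the measured RGB exactly, while the metameric black is invisible to the camera.

```python
import numpy as np

# Given an effective sensor matrix A (3 x n_bands), any spectrum r
# splits into a fundamental metamer r_fund, fixed by the RGB
# measurement, plus a metameric black in the null space of A.
rng = np.random.default_rng(2)
n_bands = 31
A = rng.random((3, n_bands))              # toy camera response
r = rng.random(n_bands)                   # some ground-truth spectrum
rgb = A @ r

# Projection onto the row space of A: r_fund = A^T (A A^T)^-1 rgb.
r_fund = A.T @ np.linalg.solve(A @ A.T, rgb)
black = r - r_fund                        # residual, invisible component

print(np.allclose(A @ r_fund, rgb))       # True: exact color fidelity
print(np.allclose(A @ black, 0))          # True: invisible to the sensor
# A learning-based model then only predicts the black component;
# adding it to r_fund can never change the reconstructed RGB values.
```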
The main challenges of spectral super-resolution concern the need for large and heterogeneous datasets for effective training, the limited physical interpretability of deep learning models, and the difficulty of ensuring generalization across different cameras and datasets. In summary, spectral super-resolution represents a promising pathway to broaden spectral analysis applications in real-world scenarios; however, it remains an open scientific and technological problem with several hurdles still to be addressed. The following section examines recent applications of spectral reconstruction and super-resolution, both in controlled laboratory settings and under uncontrolled field conditions.