1. Introduction
Hyperspectral sensing facilitates the acquisition of spatially resolved spectral information across tens to hundreds of contiguous wavelength bands. This capability enables precise material identification, chemical composition analysis, and subtle feature discrimination that far exceeds the limitations of conventional red–green–blue (RGB) or multispectral imaging [
1]. By capturing the unique spectral signature of an object, this technology has become indispensable in diverse fields. Specifically, it is extensively utilized in remote sensing for environmental and agricultural monitoring [
2], where hyperspectral measurements enable material discrimination and condition mapping based on spectral signatures. In industrial settings, it supports food safety and grain quality control by enabling rapid, non-destructive screening and classification of products [
3,
4]. Furthermore, hyperspectral imaging is applied to the non-destructive analysis of historical art [
5] for pigment identification and conservation-state mapping. In biomedical diagnostics and surgery, it provides spectral contrast for tissue differentiation to assist cancer detection and intraoperative guidance [
6,
7,
8]. However, the transition of hyperspectral technology from specialized laboratory instruments to ubiquitous portable devices has been historically impeded by the intrinsic complexity and bulk of traditional system architectures [
9]. Conventional hyperspectral imaging systems predominantly rely on spatial dispersion principles. These instruments typically employ diffraction gratings, prisms, or interferometric components combined with high-precision scanning mechanisms, such as pushbroom or whiskbroom configurations [
10]. While these architectures deliver superior spectral resolution and radiometric accuracy, they necessitate extended optical path lengths to separate wavelengths and require complex mechanical assemblies for scanning. Consequently, such systems are characteristically large, power-intensive, and sensitive to mechanical vibrations. These physical constraints have restricted the deployment of hyperspectral sensing in emerging applications that demand compactness, robustness, and scalable integration, such as uncrewed aerial vehicles (UAVs) and mobile consumer electronics [
1,
9].
In response to these challenges, the field has witnessed a concerted effort toward system miniaturization. The initial phase of this evolution focused on the structural scaling of classical optical elements [
11]. Engineers successfully reduced the footprint of spectrometers by employing folded optical geometries, micro-gratings, and micro-electromechanical systems (MEMS) tunable filters [
12,
13,
14]. These developments enabled a significant reduction in volume. Nevertheless, these approaches essentially shrink the physical dimensions of discrete optical components without altering the fundamental physics of dispersion or interference. As device dimensions decrease, unavoidable trade-offs emerge among spectral resolution, optical throughput, and signal-to-noise ratio (SNR)—the reduced optical path length limits dispersion capability, while smaller apertures restrict photon collection [
12]. To circumvent the physical limits of structural scaling, computational hyperspectral imaging emerged as a complementary strategy. Techniques such as coded aperture snapshot spectral imaging (CASSI) and integral field spectroscopy replace direct spectral measurement with multiplexed encoding. By modulating the incident light field and employing inverse algorithms, these snapshot architectures reconstruct the whole spectral data cube from compressed measurements [
15,
16,
17]. While this paradigm successfully eliminates the need for mechanical scanning and enhances temporal resolution, it introduces a heavy reliance on calibration accuracy and computational resources. Furthermore, the image sensor in these systems remains a passive intensity detector. The spectral selectivity is still determined by external optical encoding elements rather than the sensing device itself [
18]. Recent advancements in nanomanufacturing and materials science have catalyzed a fundamental paradigm shift toward integrating spectral discrimination capabilities directly at the sensor level [
19]. This emerging direction seeks to eliminate the physical separation between optical preprocessing and photodetection. By embedding wavelength discrimination functionalities within the pixel architecture, these sensor-level integrated devices promise to realize truly compact and “optics-free” hyperspectral sensing. This integration is primarily achieved through two mechanisms: pixel-level spectral filtering using nanophotonic structures or colloidal nanomaterials [
20,
21], and intrinsic material selectivity, where the optoelectronic properties of the active layer itself are engineered to distinguish wavelengths [
22,
23].
While previous reviews have extensively covered component-based miniaturization and computational reconstruction strategies, a unified perspective that connects these developments with the recent rise of sensor-integrated paradigms remains valuable. To complement the existing literature on compact hyperspectral systems, this review provides an integrated perspective on the transition from component-based miniaturization to sensor-integrated hyperspectral devices while placing additional emphasis on emerging optics-free architectures in which wavelength selectivity is implemented at the pixel or device level. Starting from traditional dispersive optics,
Section 2 and
Section 3 review component miniaturization and computational snapshot schemes.
Section 4 focuses on the frontier of sensor-level integration, highlighting advances in pixel-level filtering and the development of intrinsically spectrally selective materials, such as compositionally engineered perovskites and electrically tunable organic or two-dimensional (2D) heterostructures. Finally, this review addresses persistent fabrication challenges and outlines future directions toward scalable, optics-free hyperspectral vision systems.
2. Miniaturized Dispersive and Filtering Optics for Hyperspectral Systems
The miniaturization of hyperspectral imaging systems has historically followed a trajectory of structural scaling. Early efforts focused on reducing the physical dimensions of classical optical elements, such as gratings, prisms, and filters, while maintaining the high spectral fidelity characteristic of benchtop instruments [
11,
12]. This section reviews the evolution of these miniaturized optical components and examines how traditional dispersive and filtering architectures have been adapted for compact integration. Although substantial progress has been achieved through advances in microfabrication, these approaches continue to rely on external spectral separation before photodetection, which imposes inherent constraints on achievable system volume and photon efficiency [
12,
24].
Dispersive spectroscopy, which relies on diffraction gratings or prisms to spatially separate polychromatic light, has historically served as the gold standard for high-precision spectral analysis. As qualitatively compared in
Figure 1a, this approach provides hundreds of contiguous narrow bands, offering significantly finer spectral discrimination capabilities than the three broad bands typical of RGB imaging [
25]. The performance of these systems is fundamentally governed by the principles of spatial dispersion, where spectral resolution is inextricably linked to the optical path length and the dispersive power of the element [
24]. Historically, high-end hyperspectral imaging has been dominated by macroscopic spatial dispersion systems employing scanning architectures, primarily categorized into whiskbroom (point-scanning) and pushbroom (line-scanning) configurations, the operating principles of which are schematically illustrated in
Figure 1b and
Figure 1c, respectively. The whiskbroom architecture (
Figure 1b), exemplified by the classic airborne visible/infrared imaging spectrometer, utilizes an oscillating optomechanical mirror to scan a single pixel’s instantaneous field of view across the scene, acquiring spectral data point by point [
26]. In contrast, as depicted in
Figure 1c, the pushbroom architecture captures an entire spatial line simultaneously; light passes through a narrow entrance slit and is dispersed onto a 2D focal plane array (FPA), where one axis maps spatial position and the orthogonal axis maps spectral information [
25]. While these scanning paradigms deliver superior spectral uniformity and radiometric accuracy, their reliance on complex optomechanical assemblies and extended focal lengths to satisfy dispersion requirements strictly imposes a significant barrier to miniaturization [
12].
To reconcile the conflict between high spectral performance and the stringent size, weight, and power constraints of portable applications, miniaturization strategies have bifurcated into two distinct paths. The first strategy focuses on the structural scaling of free-space optics. To accommodate the extended optical path length required for sufficient dispersion within a limited volume, engineers have widely adopted folded optical geometries. As visualized in
Figure 2a,b, these designs utilize precision-engineered mirrors to guide light through a convoluted trajectory [
11,
12], effectively packing the performance of a benchtop instrument into a handheld footprint without sacrificing the fundamental mechanism of spatial dispersion. Complementing this geometric folding is the miniaturization of the dispersive element itself, which involves replacing bulky macroscopic ruled gratings with micro-gratings fabricated via lithographic processes that can be seamlessly integrated into these compact modules [
12,
14]. Second, more radical advancement involves transitioning from free-space optics to integrated guided-wave architectures. The first class of these devices operates within a continuous slab waveguide, where light propagates freely in a planar layer before reflecting off a curvilinear boundary patterned with lithographically etched facets. This architecture, typically exemplified by the planar echelle grating (
Figure 2c), effectively embeds the functionality of a concave diffraction grating directly into the chip, utilizing the Rowland circle configuration to simultaneously disperse and focus spectral components without the need for external lenses [
28]. In parallel, as depicted in
Figure 2d, arrayed waveguide gratings (AWGs) offer an alternative mechanism based on multi-beam interference. Unlike the continuous propagation in slab designs, AWGs split light into an array of discrete waveguides with incrementally increasing path lengths [
29]. These precise length differences introduce phase shifts that separate wavelengths through constructive interference at the output couple. Both architectures enable the realization of rugged, alignment-free spectrometers on silicon photonic chips with footprints reduced to the millimeter scale.
Miniaturized dispersive systems are fundamentally constrained by a trade-off among spectral resolution, optical throughput, and SNR [
24]. As device dimensions are reduced, the available optical path length decreases accordingly, which limits the achievable angular or spatial dispersion. To preserve spectral resolution under these conditions, the entrance slit width must be reduced, leading to a proportional decrease in collected photon flux. This coupling between geometric scaling and photon throughput results in degraded SNR, particularly in low-light or high-resolution operating regimes, and represents a fundamental limitation of dispersive architectures in aggressively miniaturized implementations [
12].
In parallel with dispersive optics, filtering-based approaches have been widely adopted to circumvent the path-length requirements of spatial dispersion. By selectively transmitting specific wavelengths while rejecting others, filtering components enable a significant reduction in the optical track length, facilitating the transition from three-dimensional (3D) volumetric instruments to quasi-2D planar devices. To achieve hyperspectral resolution without relying on bulky mechanical filter wheels, electronically tunable filters were developed. As conceptually illustrated in
Figure 3a, one dominant class involves solid-state devices, primarily represented by liquid crystal tunable filters (LCTF) [
30]. These components utilize the birefringence induced by electric fields to modulate the spectral transmission window [
31]. Another established approach in this category is the acousto-optic tunable filter [
32], which achieves similar tuning capabilities via acoustic waves. While these solid-state architectures enable rapid, random-access tuning suitable for vibration-sensitive applications like UAV-based remote sensing, they often suffer from low transmission efficiency due to polarization sensitivity and high power consumption, limiting their utility in energy-constrained microsystems [
33]. Driven by the need for structural compactness, a distinct micro-electromechanical approach has emerged in the form of the MEMS Fabry–Pérot interferometer (FPI).
Figure 3b shows the design principle of such a device, which functions based on physical actuation utilizing a vertically integrated cavity with a variable air gap. By applying an electrostatic voltage, the top mirror is mechanically displaced to dynamically tune the resonance wavelength, allowing a single detector to scan through the spectral range over time [
34]. This architecture effectively compresses a scanning spectrometer into a chip-scale module, finding widespread adoption in mobile gas sensing. However, this reliance on moving micro-structures introduces distinct reliability challenges, particularly regarding susceptibility to mechanical shock and vibration compared to fully solid-state alternatives. Furthermore, FPIs maintain an intrinsic sensitivity to the incident light angle, often necessitating additional collimating optics that reintroduce volume to the system, limiting the degree of monolithic integration achievable [
11].
3. Computational Snapshot Hyperspectral Imaging Architectures
While physical miniaturization has successfully reduced the device footprint, traditional architectures inevitably sacrifice light throughput to achieve spectral resolution. To address this limitation, snapshot hyperspectral imaging seeks to capture the full 3D spectral datacube (x, y,
) within a single integration time. This objective necessitates projecting volumetric data onto a conventional 2D FPA, effectively mapping the spectral dimension onto spatial detector coordinates. The core innovation lies in replacing the direct measurement of specific wavelengths with the multiplexed encoding of the entire light field. Instead of discarding photons via filters or scanning them via slits, these architectures modulate the incident light to mix spatial and spectral information into a single compressed measurement [
18,
35]. Consequently, the recovery of the original scene is formulated as an inverse problem, requiring computational decoding algorithms to reconstruct the 3D signal from the 2D projection [
36]. A prominent architecture utilizing this approach is the CASSI system. As schematically illustrated in
Figure 4, in a typical CASSI configuration, the scene is imaged onto a coded aperture, typically taking the form of a random binary mask, which spatially modulates the incident light field [
18]. This modulated signal subsequently passes through a dispersive element, such as a prism or grating, inducing a wavelength-dependent shear relative to the mask pattern. The detector thus captures a superposition where the coded spatial intensity is inextricably mixed with the dispersed spectral information. In the absence of such coding, the dispersion of an extended scene would result in an ambiguous superposition where spatial and spectral signals are indistinguishable. The coded aperture resolves this ambiguity by introducing a known high-frequency modulation pattern into the light field. This structure allows reconstruction algorithms, often leveraging compressive sensing theories, to mathematically disentangle spatial overlaps from spectral shifts [
37,
38].
In contrast to the dispersive encoding employed by CASSI, the snapshot hyperspectral imaging Fourier transform (SHIFT) spectrometer adopts an interferometric approach based on aperture division, as schematically illustrated in
Figure 5 [
18]. Unlike early implementations that relied on vibration-sensitive Michelson interferometers, the SHIFT design utilizes a birefringent common-path interferometer, typically realized using Nomarski prisms positioned behind a microlens array. In this configuration, the lenslet array partitions the scene into multiple sub-images, while a slight rotation of the prisms relative to the detector creates a step-wise variation in the optical path difference (OPD) across these sub-images. This clever arrangement effectively maps the temporal scanning of traditional Fourier transform spectrometer onto the spatial domain, allowing the entire 3D interferogram cube to be captured in a single snapshot [
39]. By sharing the same optical path for interfering beams, this birefringent design substantially enhances intrinsic stability and robustness against environmental vibrations, offering a compact and reliable solution for real-time spectral sensing. The final hyperspectral datacube is reconstructed by performing a Fourier transformation along the spatially sampled OPD axis [
18,
39].
Recovering high-fidelity spectra from these encoded projections requires solving a severely ill-posed inverse problem, particularly in compressive architectures where the number of detector pixels is significantly lower than the voxels in the datacube. Traditional reconstruction relies on iterative optimization algorithms that enforce hand-crafted priors, such as total variation minimization or sparsity constraints, to converge on a physical solution [
40]. Recently, deep learning approaches have begun to supersede these iterative methods, utilizing convolutional neural networks to learn the complex, non-linear mapping from 2D measurements to 3D spectra with improved speed and accuracy [
41,
42,
43]. As reconstruction becomes the dominant computational block, its latency, power consumption, and memory footprint increasingly shape practical miniaturization and deployment. Iterative regularized methods are often too slow for real-time recovery of high-dimensional datacubes, while learned models can support sub-second inference on graphics processors and enable video-rate operation. This motivates on-device acceleration using embedded graphics processors or field-programmable gate arrays for low-latency operation and supports hybrid edge–cloud workflows in which lightweight inference runs locally while more complex reconstruction is executed remotely when the communication budget allows [
40]. Notwithstanding these algorithmic advancements, practical performance remains constrained by physical realities. The reconstruction quality is strictly contingent on the accuracy of the system calibration, as any deviation between the physical optical path and the mathematical forward model results in systematic artifacts. Moreover, while optical multiplexing theoretically increases light throughput, it simultaneously distributes photon noise across reconstructed channels, often degrading the SNR in low-light conditions [
44]. Taken together, these factors show that the effective resolution of snapshot systems is set not only by hardware design, but also by the efficiency of the optical encoding and the strength of the reconstruction prior [
40]. Fundamentally, while computational snapshot approaches successfully replace mechanical complexity with algorithmic sophistication, they rely on the sensor as a passive intensity recorder. The spectral discrimination capability is not intrinsic to the detection mechanism but is “borrowed” from the external optical encoding and “repaid” by computational resources. This dependence on precise optical alignment and complex priors invites a search for deeper integration, where the sensing material itself plays an active role in spectral selection.
4. Nanomanufactured Sensor-Level Devices with Integrated Spectral Selectivity
The trajectory of miniaturization outlined in the preceding sections indicates that both the structural scaling of discrete optical components and the complexity of computational multiplexing are approaching fundamental physical limits. Consequently, the frontier of innovation has shifted toward embedding spectral discrimination capabilities directly into the photodetector architecture, effectively eliminating the physical separation between optical preprocessing and signal detection [
19]. This section reviews the evolution of such sensor-level integrated devices, organized by their underlying operational mechanisms. We first examine pixel-level spectral filtering, where nanophotonic structures, metasurfaces, and colloidal nanomaterials are integrated onto image sensors to realize compact spectral filter arrays (SFAs). Subsequently, we explore intrinsic spectral selectivity, a paradigm shift that leverages wavelength-dependent material absorption and active optoelectronic tunability to achieve truly optics-free hyperspectral sensing without reliance on passive external filters.
The most direct pathway to monolithic integration involves migrating the spectral selection mechanism from the macroscopic aperture to the microscopic pixel level. As conceptually visualized in
Figure 6a, this approach is typically realized as SFAs, effectively functioning as the hyperspectral analog of the Bayer pattern found in standard RGB cameras [
45,
46]. However, unlike the broad dye-based filters typical of consumer electronics, hyperspectral SFAs require strict narrowband selectivity. This necessitates the implementation of interference-based or nanophotonic structures fabricated with high-precision lithography techniques [
47]. Distinct from dynamic MEMS, these architectures employ compact all-dielectric film arrays integrated directly onto complementary metal-oxide-semiconductor (CMOS) sensors [
45,
46]. Advanced implementations utilize complex multi-layer stacks fabricated via electron beam evaporation and lift-off cycles. These nanomanufacturing techniques enable the creation of variable-height and curved film morphologies induced by shadow effects, facilitating sophisticated multi-peak and broadband spectral encoding rather than simple narrowband filtering [
46]. This “wavelength-coded” approach achieves extreme miniaturization but requires robust reconstruction algorithms to decouple the spatially multiplexed spectral information [
46]. A distinct implementation of this pixel-level spectral selection architecture utilizes solution-processable nanomaterials, particularly colloidal quantum dots (CQDs), as depicted in
Figure 6b, to construct absorptive filter arrays directly on the sensor. Unlike the interference-based dielectric stacks discussed above, these semiconductor nanocrystals function as absorptive filters with a size-tunable bandgap governed by the quantum confinement effect. By simply varying the particle diameter during synthesis, their absorption peak can be precisely engineered from the ultraviolet to the shortwave infrared. Serving as a “liquid semiconductor,” CQDs can be spin-coated or inkjet-printed directly onto CMOS readout integrated circuits [
48]. This hybrid integration capability bypasses the stringent lattice matching requirements of traditional epitaxial growth, enabling the fabrication of high-density multispectral mosaics where individual pixels are functionalized with distinct quantum dot inks, thereby achieving comparable spectral encoding capabilities through a cost-effective, solution-based manufacturing process.
Complementing these thickness-modulated film architectures, plasmonic and dielectric metasurfaces exploit sub-wavelength lateral patterning to achieve spectral control, offering a strictly planar alternative [
49]. As exemplified in
Figure 6c, plasmonic implementations typically pattern sub-wavelength nanohole arrays directly on the photodiode surface to selectively transmit specific wavelengths via extraordinary optical transmission resonances [
50,
51]. In parallel, dielectric metasurfaces, as depicted in
Figure 6d, utilize arrays of nanopillars to manipulate the optical response with lower absorption losses compared to their metallic counterparts [
52]. These nanophotonic filters offer the advantage of extreme compactness, often reducing the optical stack height to a few hundred nanometers. This ultrathin profile significantly minimizes the optical crosstalk inherent to thicker multi-layer interference architectures, where oblique light leakage between adjacent pixels is more pronounced. Moreover, metasurface manufacturing demonstrates high compatibility with standard semiconductor planar processes, thereby facilitating wafer-scale production [
53].
Figure 6.
Pixel-level spectral selection via monolithic integration. (
a) Schematic diagram of a hyperspectral imager featuring a micrometer-scale nanophotonic film array integrated directly onto a CMOS image sensor. Reprinted with permission from Ref. [
46]. Copyright 2025 American Chemical Society. (
b) Schematic illustration of a CQD spectrometer (PQDF, perovskite quantum dot filter). Reproduced from Ref. [
48] under the Creative Commons Attribution 4.0 International License. (
c) Tunable plasmonic metasurface filter based on a GeSbTe-embedded nanohole array Reproduced from Ref. [
51] under the Creative Commons Attribution 4.0 International License. (
d) All-dielectric metasurface filter based on a pixelated nanopillar array [
52]. Reproduced from Tittl et al., Science,
https://doi.org/10.1126/science.aas9768 (2018), AAAS [
52].
Figure 6.
Pixel-level spectral selection via monolithic integration. (
a) Schematic diagram of a hyperspectral imager featuring a micrometer-scale nanophotonic film array integrated directly onto a CMOS image sensor. Reprinted with permission from Ref. [
46]. Copyright 2025 American Chemical Society. (
b) Schematic illustration of a CQD spectrometer (PQDF, perovskite quantum dot filter). Reproduced from Ref. [
48] under the Creative Commons Attribution 4.0 International License. (
c) Tunable plasmonic metasurface filter based on a GeSbTe-embedded nanohole array Reproduced from Ref. [
51] under the Creative Commons Attribution 4.0 International License. (
d) All-dielectric metasurface filter based on a pixelated nanopillar array [
52]. Reproduced from Tittl et al., Science,
https://doi.org/10.1126/science.aas9768 (2018), AAAS [
52].
Despite the success of pixel-level integration in compressing the spectrometer into a chip-scale format, this approach essentially extends the passive structural paradigm to the micro-scale without altering the fundamental detection logic, consequently inheriting several critical limitations. First, whether utilizing narrowband rejection or the broadband encoding mentioned above, these passive structures inherently operate by blocking, absorbing, or scattering a portion of the incident photon flux to achieve spectral discrimination. As pixel sizes shrink toward the sub-micron scale, optical crosstalk, namely light leakage between adjacent pixels, becomes increasingly pronounced and degrades both spatial resolution and spectral purity [
54]. At the same time, the subtractive nature of passive filtering imposes an intrinsic photon-efficiency penalty, because each filter rejects a large fraction of the incident flux to isolate a spectral band, which in turn reduces the SNR under low-light conditions [
47]. Second, the spatial multiplexing necessitated by mosaic arrays creates an unavoidable trade-off between spatial and spectral resolution. Recovering the full resolution requires complex demosaicing or reconstruction algorithms, which frequently introduce aliasing artifacts at high-frequency edges [
45]. Finally, both static multi-layer films and planar nanostructures exhibit intrinsic sensitivity to the angle of incidence. This dependence often necessitates the inclusion of bulky telecentric coupling optics to maintain spectral accuracy, a requirement that paradoxically adds volume back to the system and partially negates the benefits of monolithic integration [
48].
While filter-based architectures rely on external optical elements to reject unwanted photons, a more fundamental approach involves engineering the photosensitive material itself to inherently discriminate between wavelengths. This strategy shifts the burden of spectral selection from the optical domain to the optoelectronic domain, leveraging the intrinsic absorption properties and bandgap engineerability of novel semiconductor materials. By integrating these spectrally selective absorbers directly with the readout circuitry, this paradigm eliminates the need for passive lossy filters and enables true monolithic multispectral detection [
55,
56]. A notable commercial implementation of this concept exploits the wavelength-dependent absorption depth of silicon, as schematically illustrated in
Figure 7. In bulk silicon, shorter wavelengths such as blue light are absorbed near the surface, while longer wavelengths such as red light penetrate deeper into the substrate before generating electron-hole pairs. By stacking multiple P-N junctions vertically at precise depths within a single pixel, distinct spectral bands can be extracted simultaneously without lateral subsampling [
57]. While this vertical integration successfully demonstrates the preservation of spatial resolution often lost in mosaic patterns, the inherently broad absorption spectrum of silicon fundamentally limits spectral separability. Consequently, such devices are typically restricted to broadband multispectral sensing rather than the high-resolution narrowband discrimination requisite for hyperspectral imaging.
To overcome the limited spectral discrimination of silicon while retaining the compactness and co-integration advantages of detector-level spectral selection, recent efforts have focused on intrinsically wavelength-selective photodiodes whose narrowband response is generated inside the photosensitive stack rather than by external filters. Metal-halide perovskites are attractive for optics-free spectral selection because their sharp absorption edges and tunable transport allow for self-filtering, yielding a narrow band-edge response while suppressing out-of-band signals. A representative implementation utilizing MAPbX
3 single crystals is illustrated in
Figure 8a,b, where short-wavelength photons are absorbed and recombine in a wider-bandgap substrate before reaching the active layer, effectively filtering out high-energy components. Using this mechanism, Liao et al. achieved an ultra-narrowband photoresponse with a full width at half maximum of 15 nm and a spectral rejection ratio of around 300, tuning the response window from 440 to 560 nm through halide-composition bandgap engineering [
58]. Beyond self-filtering, resonant microcavity engineering provides a complementary route to narrowband selectivity with higher collection efficiency. As depicted in the schematic of
Figure 8c, by embedding the perovskite absorber in a Fabry–Pérot–like cavity defined by reflective electrodes or dielectric mirrors, the optical field is selectively enhanced at resonance while off-resonant wavelengths are suppressed. This architecture produces efficient narrowband detection without adding a separate optical filter element. In line with this concept, Ooi et al. demonstrated that by optimizing the cavity structure, the responsivity and specific detectivity can be significantly enhanced at specific resonant wavelengths (
Figure 8d), tuning the response window from 560 to 660 nm through cavity engineering at the pixel level [
59]. Together, self-filtering and cavity-resonant perovskite devices enable filter-free narrowband detection at the pixel level, offering a scalable alternative to passive color-filter arrays.
Organic semiconductors offer a complementary, fabrication-friendly route to electrically tunable responsivity using manufacturable thin-film stacks. Zhu et al. developed a flexible, solution-processable single-detector organic spectrometer that leverages the bias-dependent charge collection efficiency to achieve tunable spectral response. By sweeping the bias voltage, the device generates a unique set of photocurrent responses that allow for the computational reconstruction of incident spectra with high resolution. They further demonstrated the capability of this approach for hyperspectral imaging by scanning a colored portrait, as visualized in
Figure 9a–c. Specifically, the system successfully recovered distinct spectral images at various wavelengths (
Figure 9a), enabling the full reconstruction of the target scene (
Figure 9b) and precise spectral extraction at specific spatial points (
Figure 9c), which showed excellent agreement with commercial spectrometer measurements [
60]. Pushing this organic strategy toward wider bandwidth and practical high-speed operation, Schrickx et al. introduced a bias-tunable tandem organic photodetector, whose architecture is illustrated in
Figure 9d. In this design, two sub-cells with opposing polarity and complementary absorption jointly generate a voltage-dependent responsivity basis. The resulting single-pixel spectrometer covers 400–1000 nm and supports the reconstruction of both narrowband and broadband spectra (
Figure 9e), while maintaining a responsivity of 0.27 A W
−1, a detectivity of 1.4 × 10
12 Jones, and microsecond-scale rise/fall times of 2.82/3.72 µs at operating voltages below 1 V [
61].
Another direction aims at electrically tunable, single-pixel computational spectrometers that reconstruct spectra from a set of responsivity curves generated under different electrical biases, thereby avoiding dispersive optics, mechanical scanning, and filter arrays. Yoon et al. demonstrated a miniaturized spectrometer based on a tunable van der Waals junction whose transport-modulated spectral response enabled a peak-wavelength accuracy of about 0.36 nm and a spectral resolution of about 3 nm across 405–845 nm, showing that a single electrically reconfigurable optoelectronic element can approach conventional spectroscopy performance when combined with reconstruction algorithms [
62]. To extend the detectable range and simplify operation, Uddin et al. developed a van der Waals tunnel-diode spectrometer based on a MoS
2/black-phosphorus junction. The general working principle of such a diode-based computational spectrometer is illustrated in
Figure 9. During the learning stage (
Figure 10a), the device is characterized under known monochromatic inputs at varying drain-source biases to construct a bias-dependent responsivity matrix. In the testing stage (
Figure 10b), the electrical response to an unknown input spectrum is measured across the same bias range. Finally, the incident spectrum is computationally recovered (
Figure 10c) using the learned matrix and the measured photocurrents. Utilizing this approach, Uddin et al. achieved a broadband spectral range from 500 to 1600 nm while maintaining a peak-wavelength accuracy of about 2 nm in a compact footprint of roughly 30 × 20 μm
2. Their design relies on bias-controlled switching of the dominant transport mechanism, enabling electrically reconfigurable spectral sensing at low operating voltages without a separate gate electrode [
63]. Cui et al. further extended van der Waals spectrometers by using multi-parameter electrical tuning to enrich the response basis and improve spectral identifiability. As illustrated in
Figure 10d, they fabricated a tunable optoelectronic interface based on an InSe/NbTe
2 van der Waals heterojunction with gate control and drain–source bias tuning. Unlike previous designs that rely solely on bias sweeps, this architecture simultaneously modulates both the drain-source bias and the gate voltage. This dual-parameter control generates a high-dimensional photoresponse matrix, significantly increasing the diversity of the spectral features that can be resolved.
Figure 10e further conceptually shows the device’s operation, illustrating how it classifies materials by encoding their optical spectra into unique high-dimensional electrical signatures for database matching, which demonstrates that expanding tuning dimensionality significantly enhances spectral reconstruction accuracy [
64].
Collectively, these material- and device-driven approaches demonstrate that spectral discrimination can be fundamentally embedded within the optoelectronic response of the sensor itself, rather than imposed through external or pixel-level optical filtering. By exploiting wavelength-dependent absorption depth, bandgap tunability, and electrically controllable carrier dynamics, vertically stacked semiconductors and low-dimensional materials enable multispectral and hyperspectral detection within a monolithic device footprint. Compared with filter-based architectures, these strategies offer intrinsic advantages in photon utilization efficiency, spatial resolution preservation, and integration density. At the same time, they introduce new challenges related to material uniformity, interface engineering, long-term stability, and large-area manufacturability. Continued progress in nanomanufacturing, heterogeneous integration, and device–circuit co-design is therefore critical for translating these concepts into scalable sensing platforms. As these challenges are addressed, intrinsic spectral selectivity based on material and device physics is expected to play a central role in the development of fully integrated and optics-free hyperspectral sensors, enabling compact, robust, and functionally reconfigurable spectral imaging systems beyond the limits of conventional optical architectures.