1. Introduction
Decomposing images into reflectance and illumination components has proven useful for many computer vision tasks, including object detection, tracking, and face recognition in low-light conditions, where strong color consistency and accurate separation of intrinsic factors are essential [1,2,3,4,5]. While Retinex theory remains fundamental for illumination-reflectance decomposition, traditional formulations differ from current neuroscientific knowledge and often do not ensure accurate signal/image reconstruction [6,7]. One of the main reasons for this is that these methods analyze color channels separately, neglecting the vector nature of color vision and cross-channel relationships [8].
Retinex decomposition is naturally ill-posed, causing ambiguity and making it hard to find a unique solution [9]. Traditional methods often produce halo artifacts and color distortions around high-contrast edges, which can harm perceptual quality [10,11]. Early methods like Single-Scale Retinex (SSR) and Multi-Scale Retinex with Color Restoration (MSRCR) improved basic models by using Gaussian or logarithmic filtering but caused halo artifacts, over-enhancement, and color shifts under difficult lighting conditions [12,13]. Variational methods provide better stability with smoothness priors, but they can sometimes oversmooth details or increase noise [14,15]. Fusion-based strategies combine multiple enhancement estimates but struggle with significant illumination variations [16]. These models typically assume uniform lighting, which inadequately represents scenes with complex lighting conditions, including shadows and multiple light sources [17,18]. Additionally, Retinex techniques depend heavily on heuristic parameter tuning, leading to inconsistent performance [19], and more advanced models often have high computational complexity, which limits real-time use [20]. They also amplify image noise, reducing their effectiveness on noisy inputs [15], and perform poorly in extremely dark or saturated areas, creating unnatural artifacts. Addressing these issues one at a time often causes new problems, highlighting the need for a fundamental improvement, such as the quaternion-based framework introduced here [21].
Neuroscientific evidence exposes key limitations in traditional Retinex models. Spatial processing begins in the early visual pathways (retina, LGN), where mechanisms such as spatial frequency tuning and center-surround interactions facilitate coarse-to-fine analysis [22]. Furthermore, Retinex lacks spatial-frequency modeling, as it decomposes images pixel-by-pixel without accounting for broader spatial structures. Higher brain regions combine color and spatial information, with ventral areas such as V4 showing stronger responses to color boundaries [23].
This motivates quaternion-based representations that maintain inter-channel relationships via the Hamilton product and color space rotations [24,25,26]. Hypercomplex algebra unifies color operations, reduces parameter redundancy, and shows promise for restoration, segmentation, and correction tasks [27]. Parallel research showcases the effectiveness of wavelet transforms in managing both low- and high-frequency components, aiding applications such as multi-scale feature extraction, denoising, and contrast enhancement [28,29,30]. Although quaternion-wavelet methods show promise, they often lack precise reconstruction guarantees and incur high computational costs, making them less ideal for tasks that require exact pixel-level processing. Despite their widespread use, Retinex decomposition models have fundamental limitations that drive our quaternion-based approach.
To overcome the limitations of conventional Retinex models, we introduce a quaternion algebra-based unified mathematical framework for Retinex decomposition. Key contributions are:
Quaternion Retinex Framework (QRetinex-Net): A novel Retinex model defined in the quaternion domain, where ⊗ denotes the Hamilton product. By representing RGB channels as quaternions and modeling image formation through the Hamilton product, our approach achieves: (1) comprehensive color processing that maintains inter-channel relationships, (2) biological plausibility consistent with opponent-color perception, and (3) invertible, reliable color reconstruction.
Reflectance Consistency Index (RCI): A new metric for evaluating reflectance stability and illumination invariance.
Multi-Task Validation: Demonstrated across low-light crack detection, infrared–visible fusion, and face detection under variable lighting—showing superior segmentation, fusion, and detection results.
State-of-the-Art Performance: Outperforms RetinexNet, KIND++, U-RetinexNet, and Diff-Retinex, with higher PSNR/SSIM, improved perceptual quality (LPIPS ≈ 0.0001), and strong reflectance consistency (RCI ≈ 0.988).
The rest of this paper is organized as follows.
Section 2 reviews Retinex theory and quaternion algebra;
Section 3 details QRetinex-Net and RCI;
Section 4 presents experimental results and ablations;
Section 5 discusses insights and future directions;
Section 6 concludes the study.
2. Background
Color perception depends on the interactions between illumination and the reflectance properties of the surfaces. As illustrated in Figure 1, the process begins with an illumination source emitting light with a specific spectral distribution, which then interacts with object surfaces that selectively reflect particular wavelengths. This reflected light enters the human eye, where photoreceptors with distinct spectral sensitivities detect it. Different illuminant-surface combinations can produce identical color signals, leading to inherent ambiguity in the visual system. The overlapping spectral sensitivities of the cone cells in the human retina further intensify this ambiguity. Humans typically possess three types of cones, categorized as L (long-wavelength), M (medium-wavelength), and S (short-wavelength) cones. The peak sensitivity of L-cones typically ranges between 564 and 580 nm, M-cones between 534 and 545 nm, and S-cones between 420 and 440 nm. While this overlapping sensitivity brings advantages for everyday vision, it significantly complicates accurate color reproduction.
Retinex is considered one of the most influential approaches to understanding color constancy in particular and human vision in general, and has been applied to image enhancement [6,7]. Retinex processing decomposes visual signals into illumination and reflectance components. The term “Retinex” itself is a combination of “retina” and “cortex,” suggesting a processing pathway that spans from the initial detection of light in the retina to higher-level analysis in the visual cortex. The theory proposes that our perception of color and lightness depends largely on factors beyond the specific illumination conditions, so that objects’ perceived colors and lightness remain relatively constant even as illumination varies. In its original formulation, Retinex theory postulated a degree of independence in processing the color channels red, green, and blue, corresponding to the L, M, and S cones in our eyes. This initial simplification assumed that each channel is processed in isolation, with each cone system calculating its lightness values without direct influence from the other channels.
Retinex theory provides a fundamental framework for understanding how the human visual system maintains consistent color perception despite changes in lighting. Recognized as one of the most influential models in the study of color constancy and visual perception, Retinex was initially developed to explain the stability of perceived color across different lighting conditions. Beyond its theoretical importance, Retinex has practical uses in image enhancement and computer vision, where it improves visual quality by correcting lighting inconsistencies. Mathematically, Retinex theory is often expressed as:

S(x, y) = R(x, y) ∘ L(x, y),

where S(x, y) represents the observed image intensity at position (x, y), ∘ is element-wise multiplication, R indicates the reflectance of the surface, and L denotes the illumination.
In digital imaging, Retinex-based methods seek to enhance visual content by separating these components, thereby improving contrast and color accuracy.
Figure 2 shows a Retinex decomposition of two images with slightly different lighting conditions, demonstrating that while illumination varies, reflectance remains consistent, thereby emphasizing the core principle of the theory.
Decomposing the observed image S into its reflectance R and illumination L components is, however, a significant computational challenge due to the ill-posedness of the inverse problem. To manage this inherent ambiguity, researchers need to include prior assumptions, such as the spatial smoothness of illumination, and impose constraints, such as requiring the illumination to be at least as bright as the measured intensity [31]. These assumptions help constrain the solution space and make the decomposition tractable.
Early implementations of the Retinex theory often processed the illumination channel using logarithmic or Gaussian filtering. SSR used a center/surround filtering mechanism to enhance local contrast [12]. Although SSR offered improved color consistency, it could not simultaneously handle significant dynamic range compression and faithful tone rendering, frequently producing halo artifacts near edges. The MSRCR attempted to mitigate color distortions by integrating multiple scales and applying color recovery [13].
While MSRCR improved overall color fidelity relative to SSR, it still tended to introduce artifacts around high-contrast boundaries. Path-based Retinex algorithms explored pixel intensities along specific paths, but they proved computationally expensive, vulnerable to sampling noise, and highly dependent on path selection [19]. Subsequent classical variations, such as random spray Retinex (RSR) and bright-pass filtering, sought more efficient approximations of illumination. Nonetheless, these methods continued to struggle in the presence of strongly non-uniform illumination or intense noise, often resulting in over-enhancement or the loss of fine details [14,15].
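SSR's center/surround mechanism is compact enough to sketch directly. The following minimal single-channel version is an illustration only: a box-filter surround stands in for the Gaussian used in the original method, and the function names are ours.

```python
import math

def box_blur(img, radius):
    """Crude surround estimate: mean over a (2r+1)^2 window (SSR uses a Gaussian)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[yy][xx]
                    for yy in range(max(0, y - radius), min(h, y + radius + 1))
                    for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = sum(vals) / len(vals)
    return out

def single_scale_retinex(img, radius=1, eps=1e-6):
    """log(center) - log(surround): enhances local contrast, but produces
    halo artifacts near strong edges, as discussed above."""
    surround = box_blur(img, radius)
    return [[math.log(img[y][x] + eps) - math.log(surround[y][x] + eps)
             for x in range(len(img[0]))] for y in range(len(img))]
```

On a uniform patch the output is zero everywhere (no local contrast), while an isolated bright pixel yields a positive response, which is exactly the behavior that causes halos around high-contrast boundaries.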
To address the inherent shortcomings of classical Retinex, variational-based frameworks have introduced smoothness priors on illumination and piecewise-continuous constraints on the reflectance map. Notable examples include total variation models (TVM) and weighted variational approaches (WV-SIRE), which demonstrated enhanced decomposition performance by enforcing regularity [20]. However, these approaches sometimes over-smooth key edges and details in darker regions. Fusion-based methods similarly aimed to obtain better local detail enhancement by decomposing an image into reflectance and illumination and then combining multiple contrast-enhanced versions of the illumination map [16].
Recent deep learning approaches have advanced Retinex decomposition. KinD divided the workflow into decomposition, reflectance restoration, and illumination adjustment modules [33]. URetinexNet mapped iterative optimization into a trainable network [34]. Transformer-based solutions like RetinexFormer integrated attention mechanisms [35], while diffusion models combined generative capabilities with Retinex principles [36,37]. Despite these advances, fundamental limitations remain: methods still struggle to balance fidelity of reconstruction, artifact suppression, and component independence.
The main challenges stem from the limitations of simple or independent channel-processing methods and the ill-posed nature of the Retinex equation, which allows infinitely many combinations of reflectance and illumination to produce the same observed image. This mathematical ambiguity raises important questions: (i) Which surface properties are most relevant for real-world applications? (ii) What can the human visual system realistically determine about true surface reflectance given only three cone types? (iii) How can surface reflectance be accurately estimated from the light entering sensors?
Quaternion algebra offers a promising solution by enabling unified representation of both color and geometric information within a single mathematical framework. This approach delivers key advantages: (i) unified rod and cone response representation with natural vision type transitions and efficient spatial computation; (ii) more accurate visual response modeling leading to better performance predictions and enhanced processing systems; and (iii) systematic examination of visual adaptation mechanisms and rod-cone interactions. The framework’s extensibility includes chromatic adaptation, temporal dynamics, and spatial interactions, making it versatile for both research and practical vision science applications.
3. Materials and Methods
In this section, we present a comprehensive framework that decomposes color images into reflectance and illumination components using quaternion mathematics. We begin with an overview of quaternion color representation, followed by the introduction of our novel quaternion decomposition method. Next, we describe the architecture of our decomposition network and provide implementation details. Finally, we introduce the Reflectance Consistency Index (RCI) metric for assessing decomposition quality.
3.1. Quaternion Algebra Fundamentals
Quaternions extend complex numbers to four dimensions. A quaternion q ∈ ℍ is expressed as:

q = q₀ + q₁i + q₂j + q₃k,

where q₀, q₁, q₂, q₃ ∈ ℝ and the basis units satisfy:

i² = j² = k² = ijk = −1.

Multiplication on quaternions is defined through the Hamilton product. For two quaternions p = p₀ + p₁i + p₂j + p₃k and q = q₀ + q₁i + q₂j + q₃k, the Hamilton product is defined as [38]:

p ⊗ q = (p₀q₀ − p₁q₁ − p₂q₂ − p₃q₃) + (p₀q₁ + p₁q₀ + p₂q₃ − p₃q₂)i + (p₀q₂ − p₁q₃ + p₂q₀ + p₃q₁)j + (p₀q₃ + p₁q₂ − p₂q₁ + p₃q₀)k.
Quaternions provide a mathematical structure for color representation, allowing RGB color values to be modeled as a 3D vector component, with the scalar component used for luminance information. This 4D representation naturally accommodates rotations and transformations in color space and can effectively model the spectral distribution and geometric relationships of color signals in a unified framework.
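The Hamilton product and the basis identities can be checked directly in a few lines. A minimal pure-Python sketch, with quaternions represented as (w, x, y, z) tuples (a convention of ours, not the paper's notation):

```python
def hamilton(p, q):
    """Hamilton product of quaternions p and q, given as (w, x, y, z) tuples."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw*qw - px*qx - py*qy - pz*qz,   # real part
            pw*qx + px*qw + py*qz - pz*qy,   # i component
            pw*qy - px*qz + py*qw + pz*qx,   # j component
            pw*qz + px*qy - py*qx + pz*qw)   # k component

# Non-commutativity encodes directed cross-channel interactions:
# i * j = k, but j * i = -k.
```

For two purely imaginary (color) quaternions, every output component mixes contributions from all input channels, which is the coupling property exploited in the decomposition below.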
3.2. Quaternion-Based Color Representation
The quaternion-based modeling draws inspiration from human visual perception, specifically the relationship between rod and cone cells. A quaternion q can be represented as q = α r + β v, where r represents the rod response (scalar component) and v represents the cone responses (vector component). The quaternion representation of visual response can be expressed as:

q = α r + β (l i + m j + s k),

where the rod component is α r (with r being the scalar response magnitude under given illumination), and the cone component is β (l i + m j + s k) (where l, m, and s represent the responses of the three cone types). The parameters α and β are light-dependent factors representing rod and cone contributions, respectively, with the normalization constraint α + β = 1.
3.3. Mathematical Formulation of the Quaternion Retinex
Let S ∈ ℍ^{H×W} represent a quaternion-valued matrix for an RGB image of height H and width W, with color channels R, G, and B for red, green, and blue components, respectively [27,38]. Each element of S is a purely imaginary quaternion representing a single pixel:

S(x, y) = 0 + R(x, y)i + G(x, y)j + B(x, y)k,

where x ∈ {1, …, H}, y ∈ {1, …, W}.
The Quaternion Retinex decomposition aims to express the observed image as:

Ŝ = (S_ref + N₁) ⊗ (S_ill + N₂),

where Ŝ is the reconstructed color image, S_ref is a reflectance-like component invariant to illumination changes, S_ill captures illumination information, ⊗ is the Hamilton product, and N₁ and N₂ represent noise terms.
Unlike element-wise or concatenation-based operations that treat RGB channels separately, the Hamilton product inherently couples all color channels through multiplicative interactions, enabling the network to model inter-channel dependencies directly. In the quaternion representation q = 0 + Ri + Gj + Bk, the basis units (i,j,k) interact under fixed algebraic rules that naturally encode opponent-color mechanisms (e.g., red–green, blue–yellow), consistent with biological color vision. This coupling ensures that illumination variations affect all channels coherently, constraining the solution space of the ill-posed Retinex decomposition problem and producing reflectance maps that better capture the correlated nature of human cone responses.
Our objective is to decompose S into two quaternion matrices S_ref and S_ill such that their element-wise Hamilton product yields a reconstruction approximating S. This presents a significant computational challenge, as it is an ill-posed inverse problem that requires careful regularization and constraints.
3.4. Illumination-like Map Definition and Goals
To solve this ill-posed problem, we first define an illumination-like map (S_ill), which serves several specific goals in the quaternion-based Retinex decomposition framework.
Primary Objective—Lighting Isolation: The illumination-like map isolates and represents the lighting conditions that affect the scene. It captures: (1) light intensity variations across the image, (2) color temperature of the illumination source (warm/cool lighting), (3) spatial distribution of light (shadows, highlights, gradients), and (4) spectral characteristics of the illumination.
Mathematical Objective: From the initial formulation S(x, y) = 0 + R(x, y)i + G(x, y)j + B(x, y)k, the illumination-like map initially retains the original RGB magnitudes before refinement. This means it: (1) preserves the absolute intensity values from the original image, (2) maintains color information that varies due to lighting conditions, and (3) captures illumination-dependent color shifts.
To facilitate the quaternion-based Retinex decomposition, we first express each RGB pixel as two quaternions S_ref(x, y) and S_ill(x, y). The reflectance-like map captures normalized color ratios that are invariant to intensity:

S_ref(x, y) = (R(x, y)i + G(x, y)j + B(x, y)k) / ‖S(x, y)‖,

where ‖S(x, y)‖ = √(R² + G² + B²). The illumination-like map retains the original magnitudes:

S_ill(x, y) = R(x, y)i + G(x, y)j + B(x, y)k.
Functional Goals: The illumination-like map enables: (1) illumination invariance by separating lighting effects into S_ill, making the reflectance component more invariant to lighting changes, (2) independent modification or correction of illumination information without affecting surface properties, (3) accurate reconstruction of the original image through the Hamilton product S_ref ⊗ S_ill, and (4) preservation of complex spectral relationships between different color channels.
Practical Applications: The illumination-like map enables white balance correction by modifying illumination components, lighting normalization across different scenes, shadow/highlight adjustment without affecting material properties, and color constancy improvements in computer vision tasks.
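The per-pixel split into a unit-norm reflectance-like quaternion and a magnitude-preserving illumination-like quaternion can be illustrated in a few lines. This is a sketch under the assumptions stated above (normalized color ratios for the reflectance-like part, raw magnitudes for the illumination-like part); the function name is ours.

```python
import math

def split_pixel(r, g, b, eps=1e-8):
    """Split one RGB pixel into a unit-norm reflectance-like quaternion
    (0, r/n, g/n, b/n) and an illumination-like quaternion (0, r, g, b)."""
    n = math.sqrt(r * r + g * g + b * b) + eps  # eps avoids division by zero
    s_ref = (0.0, r / n, g / n, b / n)
    s_ill = (0.0, r, g, b)
    return s_ref, s_ill
```

Scaling a pixel by a global gain (a crude model of an illumination change) leaves the reflectance-like quaternion essentially unchanged, which is the intensity invariance the decomposition targets.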
3.5. Wavelet-Domain Decomposition Framework
To effectively exploit multi-scale information, we operate in the Haar wavelet domain [31,39]. We apply the discrete Haar wavelet transform (DWT) to each channel of S_ref and S_ill, yielding the transformation {W_ref, W_ill}, where W_ref and W_ill contain the low- and high-frequency sub-bands (LL, LH, HL, HH) per channel.
A trainable decomposition network refines these wavelet-domain representations to better disentangle the underlying reflectance structure from illumination effects. Finally, we apply the inverse wavelet transform to the coefficients of each channel to obtain the final outputs S_ref and S_ill. The goal of the network is to obtain two quaternions whose Hamilton product reconstructs the input image and that can be interpreted, respectively, as disentangled reflectance and illumination factors. This design preserves full reconstruction fidelity through the invertibility of the wavelet transform and the explicit quaternion splitting, while also leveraging inter-channel dependencies to improve color constancy and robust feature extraction.
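The invertibility claim rests on the Haar DWT being exactly reversible. A minimal one-level 2-D Haar transform for a single channel, written in pure Python as an illustration (function names and the /2 normalization are our choices):

```python
def haar_dwt2(ch):
    """One-level 2-D Haar DWT of a single channel (even height and width).
    Returns the (LL, LH, HL, HH) sub-bands, each half the input size."""
    h, w = len(ch), len(ch[0])
    LL, LH, HL, HH = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            a, b = ch[y][x], ch[y][x + 1]
            c, d = ch[y + 1][x], ch[y + 1][x + 1]
            LL[y // 2][x // 2] = (a + b + c + d) / 2.0  # low-pass approximation
            LH[y // 2][x // 2] = (a - b + c - d) / 2.0  # horizontal detail
            HL[y // 2][x // 2] = (a + b - c - d) / 2.0  # vertical detail
            HH[y // 2][x // 2] = (a - b - c + d) / 2.0  # diagonal detail
    return LL, LH, HL, HH

def haar_idwt2(LL, LH, HL, HH):
    """Exact inverse: reconstruction from the four sub-bands is lossless."""
    h, w = 2 * len(LL), 2 * len(LL[0])
    ch = [[0.0] * w for _ in range(h)]
    for y in range(len(LL)):
        for x in range(len(LL[0])):
            ll, lh, hl, hh = LL[y][x], LH[y][x], HL[y][x], HH[y][x]
            ch[2 * y][2 * x] = (ll + lh + hl + hh) / 2.0
            ch[2 * y][2 * x + 1] = (ll - lh + hl - hh) / 2.0
            ch[2 * y + 1][2 * x] = (ll + lh - hl - hh) / 2.0
            ch[2 * y + 1][2 * x + 1] = (ll - lh - hl + hh) / 2.0
    return ch
```

Running the forward and inverse transforms in sequence returns the input exactly (up to floating-point precision), which is what allows the framework to refine coefficients in the wavelet domain without sacrificing reconstruction fidelity.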
3.6. Decomposition Network
The proposed quaternion decomposition network (Algorithm 1) generates the wavelet-domain representations whose inverse wavelet transforms satisfy (4). To begin, the input image S is used to form two initial quaternions S_ref and S_ill using the normalized and original quaternion color representations (5) and (7) (Figure 3). Each quaternion consists of four channels: the real part and the three imaginary parts. A discrete Haar wavelet transform is applied independently to every channel, yielding wavelet-domain tensors for each channel of S_ref and S_ill. Concatenating these results produces a 32-channel representation, where the 16 channels of S_ref and 16 channels of S_ill capture multiscale information in the LL, LH, HL, and HH sub-bands.
Once in the wavelet domain, the network refines the representations through separate convolutional branches that learn higher-level features for each quaternion. A Symmetric Cross Attention module is employed to enforce coherent decomposition.
| Algorithm 1 Quaternion Retinex Network Decomposition |
| Input: RGB image S |
| Output: Reflectance S_ref, Illumination S_ill |
1. Initialize quaternion representations: form S_ref (normalized ratios) and S_ill (original magnitudes).
2. Wavelet decomposition: apply the DWT and extract the LL, LH, HL, and HH sub-bands for each quaternion channel.
3. Feature extraction: extract features for the reflectance and illumination branches.
4. Symmetric cross-attention: the reflectance branch attends to illumination features and the illumination branch attends to reflectance features, each followed by residual fusion.
5. Refinement and reconstruction: concatenate and fuse the branch features, apply sharpening with a Laplacian kernel, and apply the inverse wavelet transform.
6. Post-processing: apply channel-wise smoothing.
7. Return S_ref, S_ill.
Cross Attention module: Let the wavelet-domain feature map for the reflectance branch be F_ref and for the illumination branch be F_ill. A learnable set of pointwise convolutions projects F_ref and F_ill into query Q, key K, and value V tensors. The reflectance branch computes its queries from F_ref and its keys and values from F_ill; the illumination branch computes the converse (the inner working of the block is illustrated in Figure 3). Each query, key, or value is reshaped so that the spatial dimensions (H × W) become a single index, and let d_h be the per-head dimensionality. Then cross-attention for the reflectance branch is defined as:

A = softmax(Q_ref K_illᵀ / √d_h),  O_ref = A V_ill.

Here A is the attention matrix, and O_ref is the resulting aggregation from the illumination branch back into F_ref. A similar cross-attention is formed to refine F_ill. These cross-branch features are reshaped back to match the original spatial dimensions and fused with a residual skip-connection:

F̂_ref = F_ref + W_o(O_ref),

where W_o is a learned 1 × 1 convolution consolidating the multi-head outputs. This enables each branch to attend to features in the other branch, thereby strengthening the decomposition of shared edges and color dependencies.
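One direction of the cross-attention (reflectance queries attending to illumination keys and values) can be sketched with plain matrices. This is a single-head illustration with the 1 × 1 convolution projections omitted and spatial positions already flattened to rows; all names are ours.

```python
import math

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax_rows(M):
    """Row-wise softmax with max-subtraction for numerical stability."""
    out = []
    for row in M:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out

def cross_attention(Q_ref, K_ill, V_ill):
    """O_ref = softmax(Q_ref K_ill^T / sqrt(d)) V_ill: each reflectance
    position aggregates illumination-branch features."""
    d = len(Q_ref[0])
    scores = matmul(Q_ref, [list(col) for col in zip(*K_ill)])  # Q K^T
    scaled = [[v / math.sqrt(d) for v in row] for row in scores]
    return matmul(softmax_rows(scaled), V_ill)
```

When a query strongly matches one illumination key, the output row is dominated by the corresponding value row, i.e., the reflectance branch selectively imports illumination-branch evidence, which is then added back via the residual fusion described above.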
The QRetinex decomposition network operates on an 8-channel input: 4 channels each for S_ref and S_ill. When wavelets are enabled, the Haar DWT quadruples the channel count to 32. The initial 3 × 3 convolution projects the input to a base feature dimension of 32 channels. Each decomposition branch consists of two sequential 3 × 3 convolutional blocks with ReLU activation and residual connections, maintaining the 32-channel dimension throughout. The Symmetric Cross-Attention module employs single-head attention of size 32 with 1 × 1 pointwise convolutions for query, key, and value projections in both directions, followed by softmax-normalized attention weights and residual aggregation. After cross-attention, features are concatenated (64 channels) and fused via a 1 × 1 convolution back to 32 channels, then projected to the output space (8 or 32 channels depending on wavelet usage) through a final 3 × 3 convolution. A sharpening layer applies a 3 × 3 Laplacian-initialized convolution to enhance edge detail, followed by the inverse wavelet transform when applicable. Finally, channel-wise post-smoothing blocks (3 × 3 depth-wise separable convolutions with residual connections) are applied independently to the S_ref and S_ill outputs to suppress grid artifacts.
3.7. Implementation Details
We train the QRetinex decomposition network on the LOLv1 dataset, which contains 485 paired low-light and normal-light images for training [40]. During training, we extract random 256 × 256 patches from each image pair and process mini-batches of size 8. The model is optimized using AdamW with an initial learning rate of 3 × 10⁻⁴, momentum parameters β₁ = 0.9 and β₂ = 0.999, and no weight decay [41]. We employ a two-phase learning rate schedule: a linear warmup over the first 10 epochs (starting from 10⁻⁹) followed by cosine annealing decay to a minimum of 10⁻⁷ over the remaining 990 epochs. To stabilize training, we apply gradient clipping with a maximum norm of 1.0. All experiments, including both training and testing, are conducted on a single NVIDIA RTX 5080 GPU (NVIDIA Corporation, Santa Clara, CA, USA). This computation procedure is summarized in Algorithm 1. The entire network contains approximately 75 K trainable parameters, enabling efficient end-to-end optimization within standard GPU memory constraints.
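The warmup-plus-cosine schedule described above can be written compactly. The following sketch matches the stated values (10-epoch linear warmup from 10⁻⁹ to 3 × 10⁻⁴, then cosine decay to 10⁻⁷ over the remaining 990 of 1000 epochs); the function name and parameter defaults are ours.

```python
import math

def lr_at(epoch, base=3e-4, warm=10, total=1000, start=1e-9, floor=1e-7):
    """Linear warmup for `warm` epochs, then cosine annealing to `floor`."""
    if epoch < warm:
        # linear ramp from `start` to `base`
        return start + (base - start) * epoch / warm
    # progress through the cosine decay phase, in [0, 1]
    t = (epoch - warm) / (total - warm)
    return floor + 0.5 * (base - floor) * (1 + math.cos(math.pi * t))
```

At epoch 10 the rate reaches the base value exactly, and at the final epoch it bottoms out at the floor, so the two phases join continuously.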
Our training objective combines reconstruction fidelity with Retinex-inspired regularization terms. For each mini-batch, we decompose both the low-light input S_low and its normal-light counterpart S_high into quaternion reflectance-illumination pairs (S_ref^low, S_ill^low) and (S_ref^high, S_ill^high). Beyond standard paired reconstruction, we enforce cross-reconstruction constraints, reconstructing S_high from the low-light reflectance and vice versa, to ensure illumination-invariant reflectance learning. Additionally, we augment each low-light image with two photometric jitters (random gain in the range [0.8, 1.2], per-channel color cast in the range [0.9, 1.1], and gamma correction in the range [0.9, 1.1]) and penalize inconsistency between their reflectance estimates with a combined L1 and cosine similarity loss (weight 0.05), promoting robustness to illumination variations. The total loss is a weighted sum of the paired reconstruction, cross-reconstruction, and augmentation-consistency terms, together with the regularizers defined below, where the reconstruction terms sum over pixels and channels, the consistency term uses the cosine similarity of flattened ℓ2-normalized tensors, and magnitudes are computed pixel-wise on the quaternion vector part.
The mutual (cross) reconstructions swap reflectance and illumination across lighting conditions,
limiting degenerate factorizations and encouraging stability under illumination changes.
Illumination regularization: Edge-aware smoothness down-weights smoothing at strong reflectance edges:

L_smooth = ‖∇S_ill ⊙ exp(−λ |∇S_ref|)‖₁,

with ∇ implemented by fixed finite-difference stencils and ⊙ denoting element-wise multiplication. We also include anisotropic total variation, L_tv = ‖∇ₓ S_ill‖₁ + ‖∇_y S_ill‖₁.
Finally, to explicitly suppress high-frequency content in the illumination, we penalize the energy that remains after applying a radial high-pass mask to the FFT of the quaternion magnitude:

L_freq = ‖M_ρ ⊙ F(|S_ill|)‖₂²,

where |S_ill| = √(R² + G² + B²), F is the 2-D Fourier transform, and M_ρ keeps frequencies with radius greater than a cutoff ρ.
Reflectance consistency: Beyond simple equality between the two reflectance estimates, we combine (i) cosine similarity between flattened, ℓ2-normalized reflectances; (ii) gradient-level alignment of their luminance channels; and (iii) magnitude consistency, which discourages arbitrary rescalings of the albedo vector. These terms reinforce illumination-invariant reflectance estimates.
Quaternion constraints: Because only the imaginary parts are rendered, we softly suppress the real parts of the Hamilton products and encourage unit-norm quaternions, which stabilizes optimization without over-constraining the model.
Augmentation-based invariance (auxiliary): During training, we additionally sample two global photometric jitters (gain, per-channel color, and gamma) of each low-light input and penalize discrepancies between the corresponding reflectances with an L1 + cosine loss (weight 0.05); this promotes illumination invariance of the reflectance under plausible exposure/color changes.
3.8. The RCI Metric
To assess the stability of a decomposition under varying illumination, we introduce the Reflectance Consistency Index (RCI). We begin by linearly interpolating between a low-light image S_low and a normal-light image S_high, yielding a sequence S_t for t ∈ [0, 1]:

S_t = (1 − t) S_low + t S_high.
Applying the chosen decomposition to each S_t produces reflectance maps R_t. If the reflectance captures intrinsic scene properties, it should remain nearly invariant across different illumination conditions.
Let P denote the set of all pixel locations p = (x, y). For each pixel, we compute the variance of its reflectance values R_t(p) over t:

Var(p) = (1/|T|) Σ_{t∈T} (R_t(p) − R̄(p))²,

where T is the discrete set of interpolation steps and R̄(p) is the mean reflectance over T. Since the reflectance is assumed normalized to the range [0, 1], the maximum possible variance is 1/4. We then define the RCI by taking the supremum of these variances over P:

RCI = 1 − 4 · sup_{p∈P} Var(p).
An RCI value of 1 corresponds to zero variance and indicates perfectly consistent reflectance across all illumination levels. Lower values signify greater deviations in the reflectance. This worst-case perspective ensures that even localized inconsistencies in the reflectance map are penalized. By focusing on the supremum of the variance, RCI captures the most critical breakdown in reflectance invariance, so that methods yielding stable reflectance at most pixels, yet large errors in a few regions, are still penalized, reflecting the core Retinex principle that reflectance should remain consistent under changes in illumination. The metric computation process is illustrated in Figure 4.
We compute RCI using 11 uniformly spaced interpolation steps (t = 0, 0.1, …, 1), which provides sufficient sampling density while remaining computationally efficient. The per-pixel variance is computed independently for each RGB channel.
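The RCI computation follows directly from its definition. A minimal single-channel sketch (the paper evaluates each RGB channel independently; the function name is ours), taking a sequence of reflectance maps, one per interpolation step:

```python
def rci(reflectance_seq):
    """Reflectance Consistency Index: 1 - 4 * sup over pixels of the variance
    of reflectance values (normalized to [0, 1]) across interpolation steps.
    A value of 1 means perfectly stable reflectance; lower values indicate
    illumination leaking into the reflectance estimate."""
    T = len(reflectance_seq)                     # number of interpolation steps
    h, w = len(reflectance_seq[0]), len(reflectance_seq[0][0])
    worst = 0.0
    for y in range(h):
        for x in range(w):
            vals = [reflectance_seq[t][y][x] for t in range(T)]
            mean = sum(vals) / T
            var = sum((v - mean) ** 2 for v in vals) / T
            worst = max(worst, var)              # supremum over pixels
    return 1.0 - 4.0 * worst                     # scale by max variance 1/4
```

A perfectly stable decomposition (identical reflectance at every step) scores 1.0, while a single pixel flipping between 0 and 1 reaches the maximum variance of 1/4 and drags the score to 0, reflecting the worst-case design of the metric.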
4. Results
We evaluate the proposed quaternion-based Retinex decomposition across diverse low-light vision tasks, including low-light crack detection, zero-shot day–night adaptation for object detection, and infrared-visible image fusion. We compare our approach with existing state-of-the-art methods, analyzing both decomposition accuracy and downstream task performance.
4.1. Decomposition Experiments
We evaluate decomposition fidelity and component disentanglement on the LOLv1 [40] validation subset (15 paired images) and, for cross-dataset robustness, the LOLv2 test set (100 images). All baseline methods are evaluated using their official pretrained checkpoints and recommended hyperparameters. We maintain consistent preprocessing (normalization to [0, 1]) and apply no post-processing to any method’s outputs. All metrics are computed on raw network predictions at native resolution (400 × 600 for LOLv1, 512 × 512 for LOLv2) to ensure fair comparison.
We employ multiple complementary metrics to comprehensively evaluate decomposition quality: PSNR (dB) and SSIM measure reconstruction fidelity and structural similarity [42]; LPIPS quantifies perceptual quality, with lower values indicating better perceptual similarity [43]; CORR (correlation coefficient) measures the relationship between the illumination and reflectance components [44], where lower CORR values indicate better decomposition quality, as illumination and reflectance should ideally be independent; and, finally, our newly proposed RCI metric measures reflectance stability.
Table 1 presents the RCI scores for deep learning-based decomposition methods. Our approach achieves an RCI of 0.988, substantially outperforming competing methods: RetinexNet (0.779), URetinexNet (0.824), Diff-Retinex (0.774), and KIND++ (0.605). This dramatic improvement indicates that our reflectance component remains far more invariant when illumination changes, highlighting the strength of modeling color channels jointly via quaternion operations.
Table 2 provides a comprehensive comparison across all metrics, including both traditional methods (JIEP, STAR) and deep learning approaches. The results reveal several key findings: Our method (without wavelet transform) achieves exceptional reconstruction performance with PSNR of 65.20 ± 2.19 dB and SSIM of 0.9998 ± 0.0001 on low-light images, surpassing all competitors by substantial margins. Even our wavelet variant achieves strong performance (PSNR: 52.37 ± 2.39 dB). Our method achieves the lowest LPIPS scores (0.00005 ± 0.00002 for low-light, 0.00005 ± 0.00003 for normal-light), indicating superior perceptual similarity to the ground truth compared to methods like URetinex (0.10339) and DiffRetinex (0.21248 for low-light). The CORR metric reveals that our method maintains low correlation between reflectance and illumination (0.009 ± 0.051 for the non-wavelet variant, 0.015 ± 0.038 for wavelet), confirming effective separation of these components. Traditional methods like JIEP show much higher correlation (0.536 ± 0.271), while KIND++ exhibits the poorest separation (0.750 ± 0.191).
The competing deep learning methods show varying limitations: RetinexNet and URetinex suffer from poor reconstruction accuracy (PSNR 35.49–44.40 dB) and high CORR values, while DiffRetinex shows the worst performance on low-light images (PSNR: 23.47 ± 10.95 dB, LPIPS: 0.21248). Traditional methods like STAR achieve moderate PSNR (40.61 ± 3.48 dB) but fail to properly separate illumination and reflectance (CORR: 0.322 ± 0.243).
Figure 5 provides visual evidence of these quantitative improvements. RetinexNet (Figure 5b) produces noticeable halos and color shifts in the reflectance component, particularly visible in the highly saturated yellow regions. KIND++ (Figure 5c) shows improved noise handling but loses fine texture details in the reflectance. URetinexNet (Figure 5d) better suppresses noise but introduces mild color distortions around high-contrast edges, especially visible in the reflectance map. Diff-Retinex (Figure 5e) shows similar artifacts with additional smoothing that compromises detail preservation.
In contrast, our quaternion-based method (Figure 5f) effectively preserves both texture and color fidelity. The reflectance component (shown in the i, j, k quaternion channels) retains structural details without halos or color shifts, while the illumination (QI) remains smooth and artifact-free. Notably, our reflectance remains stable even under simulated illumination changes, as evidenced by the high RCI score. The quaternion representation naturally captures inter-channel color relationships, producing more consistent reflectance maps than methods that process RGB channels independently.
These results confirm that the proposed quaternion-based Retinex framework excels at capturing cross-channel dependencies while ensuring near-perfect reconstruction. The combination of highest PSNR and SSIM metrics, lowest LPIPS scores, minimal illumination-reflectance correlation (CORR), and superior reflectance consistency (RCI) underscores the robustness and effectiveness of our decomposition approach—a critical advantage for downstream tasks in low-light imaging.
Table 3 demonstrates the robust generalization of the proposed quaternion-based Retinex framework on the LOLv2 dataset, revealing performance patterns consistent with those on LOLv1 and highlighting the method’s cross-dataset stability. The quaternion approach achieves exceptional reconstruction quality, with the non-wavelet variant reaching 66.55 ± 0.18 dB PSNR and 0.9999 SSIM on low-light images—dramatically outperforming competing methods by margins exceeding 20 dB over the second-best performer (JIEP: 46.67 dB). Notably, the wavelet variant demonstrates near-perfect component independence on normal-light images (CORR: 0.000 ± 0.000), indicating complete separation of illumination and reflectance factors under favorable conditions. The perceptual quality metrics further underscore the method’s superiority, with LPIPS scores as low as 0.00003—two to three orders of magnitude better than traditional methods (JIEP: 0.02664, STAR: 0.07849) and deep learning competitors (URetinex: 0.11100). While competing methods exhibit substantial performance degradation between LOLv1 and LOLv2 (e.g., URetinex drops from 35.49 to 34.85 dB PSNR on low-light images), the quaternion framework maintains consistently high metrics across both datasets, validating its architectural robustness to variations in image content, illumination distributions, and capture conditions.
4.2. Low-Light Crack Segmentation
Detecting and segmenting cracks in concrete structures is critical for structural health monitoring, yet such tasks are often complicated by poor visibility, noise, and uneven illumination. CrackNex approaches the problem by using a Retinex-based decomposition to extract illumination-invariant reflectance features and few-shot learning to deal with the scarcity of labeled crack images [48]. In the original CrackNex pipeline, an input image is decomposed into reflectance and illumination components. A few-shot support prototype is then learned from limited labeled exemplars, while a reflectance prototype is extracted from the reflectance component. These prototypes are integrated into a dedicated module to simultaneously include crack features and reflectance information, followed by an Atrous Spatial Pyramid Pooling (ASPP) module to enhance multi-scale features for final segmentation. The original implementation uses a pretrained network from RetinexNet for decomposition [40]. We replace it with our decomposition, in which the RGB channels are treated as a single quaternion entity; this quaternion is then decomposed into a quaternion-valued reflectance and a quaternion-valued illumination.
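The Hamilton-product image model underlying this decomposition can be sketched as follows. The (w, x, y, z) component layout and the helper `rgb_to_quaternion` are illustrative conventions of ours, not the paper's code:

```python
import numpy as np

def hamilton(q, p):
    """Hamilton product of quaternion arrays whose components (w, x, y, z)
    are stacked on the last axis. Broadcasting follows NumPy rules."""
    w1, x1, y1, z1 = np.moveaxis(q, -1, 0)
    w2, x2, y2, z2 = np.moveaxis(p, -1, 0)
    return np.stack([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ], axis=-1)

def rgb_to_quaternion(img):
    """Encode an H x W x 3 RGB image as a pure quaternion field: the real
    (w) part is zero and R, G, B occupy the three imaginary axes."""
    zeros = np.zeros(img.shape[:-1] + (1,), dtype=img.dtype)
    return np.concatenate([zeros, img], axis=-1)
```

Unlike per-channel scalar multiplication, the Hamilton product mixes all components, which is exactly the cross-channel coupling the quaternion formulation exploits.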
We evaluate two variations: (i) the standard quaternion-based decomposition in the spatial domain, and (ii) the output of the decomposition network before the inverse wavelet transform. Only the reflectance component is used to learn the reflectance prototype. All other components of CrackNex, including few-shot prototype learning, prototype fusion, and ASPP segmentation, remain intact; the only change is that our quaternion-based reflectance constraint replaces the original scalar constraint.
We benchmark the modified CrackNex on the LCSD dataset. As summarized in Table 4, the quaternion-based methods achieve superior segmentation results: our spatial-domain Quaternion Retinex yields a +2% boost in mean Intersection-over-Union (mIoU) and a +3% increase in F1-score compared to the original CrackNex. When further incorporating the wavelet-domain decomposition, we observe an even larger improvement. Qualitative results in Figure 6 show that our Quaternion Retinex significantly enhances crack boundary visibility and suppresses noise in dark regions.
4.3. Zero-Shot Day–Night Domain Adaptation for Object Detection
Object detectors trained on well-lit, daytime images often struggle under low-light or nighttime conditions due to the considerable domain gap created by poor illumination, high noise, and color distortions. Assembling large-scale labeled datasets for night scenes is costly, which drives the development of zero-shot domain adaptation approaches that enable generalization to nighttime without requiring actual nighttime data during training. Du et al. proposed a “DArk-Illuminated Network” (DAI-Net) that pairs normal-light images with their artificially darkened counterparts [49]. This work also utilizes the decomposition from RetinexNet, thereby directing the model to concentrate on reflectance features that are less influenced by changes in illumination. To enhance these features, DAI-Net employs a loss function that encourages consistent reflectance estimates across synthetic and real variations in illumination. We replace RetinexNet with our Quaternion Retinex: the synthetic low-light image and its well-lit counterpart are jointly decomposed into quaternion-valued reflectance and illumination components, and we adapt DAI-Net’s interchange and recomposition steps to operate on these quaternion components, applying the same reconstruction losses in quaternion space. Apart from the decomposition module, the downstream detection pipeline (e.g., YOLOv3 or DSFD) is left unchanged.
We evaluate our approach on the Dark Face dataset under zero-shot adaptation settings [50]. Table 5 compares the original DAI-Net with our DAI-Net + Quaternion Retinex variant, reporting mean Average Precision (mAP) on two benchmark transfer settings. Integrating Quaternion Retinex boosts performance by +1.2% mAP on the Wider-Face → Dark Face setup and +0.8% mAP on the COCO → Dark Face transfer.
4.4. Retinex-Based Infrared and Visible Image Fusion
Infrared-visible (IR-VI) fusion is crucial for night vision, surveillance, and target recognition. IR images capture thermal information, while visible images provide texture and color details. Recent methods use Retinex decomposition to separate visible-image reflectance from illumination, then fuse with infrared data at the feature or pixel level. Wang et al. proposed RDMFuse, an original decomposition model independent of pretrained networks [51]. It fuses infrared with visible reflectance to preserve structural details, using an illumination-adaptive module to enhance contrast in low-light conditions. We replace the standard reflectance-illumination separation with our quaternion-based wavelet-domain decomposition. Specifically:
We decompose the visible image using our quaternion wavelet model, yielding quaternion-valued reflectance and illumination components.
We convert the infrared image into a quaternion representation by assigning its intensity values to the imaginary components of the quaternion and setting the real component to zero.
The existing contrast texture module and reflectance-fusion function in RDMFuse now operate on these quaternion-valued components.
The illumination-adaptive module remains in place but processes our quaternion-based illumination instead of the original scalar illumination map. Minor normalization layers are introduced to handle the four-dimensional hypercomplex features.
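The infrared-to-quaternion encoding in the second step above can be sketched as follows; replicating the intensity across all three imaginary axes is our reading of "assigning its intensity values to the imaginary components":

```python
import numpy as np

def ir_to_quaternion(ir):
    """Encode a single-channel IR intensity map (H x W) as a pure
    quaternion field: the real (w) component is zero and the intensity is
    replicated on the three imaginary axes (an assumed interpretation;
    the paper only states that intensity fills the imaginary parts)."""
    q = np.zeros(ir.shape + (4,), dtype=ir.dtype)
    q[..., 1:] = ir[..., None]  # broadcast intensity to x, y, z
    return q
```

This keeps the IR signal in the same hypercomplex space as the visible-image reflectance, so the fusion modules can combine the two without a separate scalar pathway.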
We evaluated our approach on the LLVIP dataset using four visual quality metrics:
EN (Entropy): information content; higher values indicate richer detail.
SF (Spatial Frequency): overall sharpness; larger values imply sharper images.
AG (Average Gradient): edge clarity and contrast; higher values indicate better edge detail.
STD (Standard Deviation): intensity spread; higher values indicate better contrast.
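These four metrics have standard formulations in the fusion literature; the sketch below uses common definitions for grayscale images normalized to [0, 1], and details such as the histogram bin count are our choices rather than the paper's:

```python
import numpy as np

def entropy(img, bins=256):
    """EN: Shannon entropy of the intensity histogram (richer detail -> higher)."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def spatial_frequency(img):
    """SF: root of mean squared row and column differences (sharper -> larger)."""
    rf = np.diff(img, axis=0) ** 2
    cf = np.diff(img, axis=1) ** 2
    return np.sqrt(rf.mean() + cf.mean())

def average_gradient(img):
    """AG: mean local gradient magnitude (better edge detail -> higher)."""
    gx = np.diff(img, axis=1)[:-1, :]
    gy = np.diff(img, axis=0)[:, :-1]
    return np.sqrt((gx ** 2 + gy ** 2) / 2.0).mean()

def std_metric(img):
    """STD: intensity spread around the mean (better contrast -> higher)."""
    return img.std()
```

All four are no-reference metrics: they score the fused image alone, which is why they are standard for IR-VI fusion where no ground-truth fusion exists.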
Table 6 reports the average metric scores on the LLVIP test subset. Our method exceeds RDMFuse across all four metrics, suggesting improved detail preservation, edge clarity, and contrast. Hence, quaternion retinex demonstrates more robust low-light fusion performance than RDMFuse, aided by quaternion-based wavelet decomposition and hypercomplex illumination handling.
4.5. Ablation Studies
To validate the design choices in QRetinex-Net, we conducted comprehensive ablation experiments on the LOLv1 dataset.
Table 7 presents the impact of key architectural components and loss functions on model performance.
The ablation results reveal several counterintuitive but important findings. Removing the wavelet transform improves PSNR by 1.10 dB, suggesting that for the quaternion representation, direct spatial processing may be more effective than frequency-domain decomposition. However, we retain the wavelet transform in our full model, as it provides better perceptual quality and reduced artifacts in the qualitative evaluation, despite a marginal PSNR decrease. The post-smoothing module proves critical; its removal results in the largest performance drop (−3.97 dB in PSNR), confirming its importance in suppressing grid artifacts introduced by the quaternion decomposition.
The wavelet/spatial trade-off reflects a fundamental tension between pixel-level reconstruction fidelity and perceptual quality. While the spatial variant achieves 1.10 dB higher PSNR through direct optimization, it produces subtle grid artifacts and ringing near sharp edges, which are perceptually noticeable yet not heavily penalized by PSNR. The wavelet decomposition naturally suppresses these high-frequency artifacts through its multi-scale representation, yielding smoother illumination fields and cleaner reflectance boundaries. This trade-off favors the wavelet approach for downstream vision tasks where edge quality matters more than pixel-perfect reconstruction.
The swap loss, which enforces cross-reconstruction consistency between low/high image pairs, shows a significant impact (−1.60 dB when removed), validating our paired training strategy. The quaternion-specific regularizations (real-part suppression and unit norm) contribute substantially to stability (−1.26 dB), while the reflectance consistency terms improve illumination invariance (−1.25 dB).
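The swap loss can be sketched as follows; the elementwise product stands in for the Hamilton product used in the actual model, and the L1 penalty is our assumption about the exact penalty form:

```python
import numpy as np

def swap_loss(R_low, L_low, R_high, L_high, I_low, I_high):
    """Sketch of the cross-reconstruction ('swap') loss. Each image is
    recomposed from its own illumination and the *other* image's
    reflectance; if reflectance is truly illumination-invariant, both
    swapped recompositions still reconstruct the originals.
    Assumptions: elementwise product as a scalar stand-in for the
    Hamilton product, and an L1 reconstruction penalty."""
    swap_low = L_low * R_high    # low illumination with high-light reflectance
    swap_high = L_high * R_low   # high illumination with low-light reflectance
    return np.abs(swap_low - I_low).mean() + np.abs(swap_high - I_high).mean()
```

The loss is zero exactly when the two reflectance estimates agree and each composes with its paired illumination back to the observed image, which is the illumination-invariance property the ablation measures.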
The RetinexNet-style configuration, using only classical Retinex losses, underperforms our complete quaternion-regularized model by 1.50 dB, demonstrating the value of our quaternion-specific constraints. The minimal configuration combining architectural simplification with classical losses shows the largest degradation (−1.94 dB), confirming that both our architectural innovations and loss design contribute meaningfully to performance.
These ablations confirm that QRetinex-Net’s performance stems from the synergy among the quaternion representation, targeted architectural components (particularly post-smoothing), and carefully designed loss functions that respect both Retinex principles and quaternion algebra constraints.
5. Discussion
This work revisits Retinex decomposition through a quaternion formulation that jointly models RGB channels via quaternions and Hamilton product operation, addressing fundamental limitations of classical Retinex per-channel processing. The decomposition results and performance on downstream vision tasks demonstrate that hypercomplex color representation, when combined with explicit reconstruction constraints, yields substantial improvements in reflectance stability, color fidelity, and task performance.
The advantage of the quaternion formulation: classical Retinex implementations process RGB channels independently, neglecting the correlated nature of human cone responses and introducing artifacts when illumination varies. By encoding color information with quaternions and modeling image formation as a Hamilton product of quaternion-valued reflectance and illumination, our framework enables the network to learn inter-channel dependencies that scalar multiplications cannot capture. This holistic treatment produces smoother illumination fields and reflectances with superior chromatic stability, particularly near material boundaries where traditional methods exhibit halos and color shifts.
Reflectance consistency and perceptual grounding: The proposed Reflectance Consistency Index (RCI) quantifies worst-case reflectance variance across interpolated illumination levels, providing a task-agnostic measure of illumination invariance. Our method’s RCI of 0.988 indicates near-perfect stability even under extreme lighting variations, substantially exceeding state-of-the-art approaches with RCI ranging between 0.605 and 0.824. The high RCI scores translate directly to downstream task improvements: in crack segmentation, the illumination-invariant reflectance features enable more robust boundary detection in zero-shot day–night adaptation, stable reflectance supports better cross-domain generalization, and in IR-VI fusion, preserved structural details enhance multi-modal integration across all quality metrics.
Biological plausibility and interpretability: The current implementation of QRetinex encodes images as purely imaginary quaternions, with network losses suppressing spurious real components in the reconstructed image. This choice ensures reconstruction fidelity but leaves room for more explicit biological alignment. Future work could assign achromatic signals to the scalar component and align the vector part with LMS cone axes or opponent-color channels, potentially strengthening both the biological plausibility and interpretability of the learned representations. Such extensions would bridge computational efficiency with perceptual modeling, offering insights into how the visual system maintains color constancy across diverse illumination conditions.
Architectural insights and trade-offs: The ablation studies reveal several noteworthy design choices. Post-smoothing proves critical for suppressing grid artifacts from quaternion decomposition. Removing the wavelet transform slightly improves PSNR metrics, yet qualitative evaluation favors the wavelet-based approach for its superior artifact suppression and perceptual smoothness. This suggests that optimal quaternion processing may benefit from adaptive, content-dependent decisions about spatial versus frequency-domain decomposition—an avenue for future investigation. The swap loss, which enforces cross-reconstruction between paired images, effectively constrains the solution space and promotes illumination-invariant reflectance learning, validating our paired training strategy.
Limitations and Impact:
Ill-posed Ambiguity and Potential Remedies: Retinex decomposition remains fundamentally ill-posed: multiple reflectance and illumination factors can produce the same observations, especially under complex or textured lighting. We discuss how physics-based priors (e.g., wavelength-dependent reflectance modeling) and semantic constraints could further regularize this ambiguity and guide future extensions of QRetinex-Net.
Paired-Training Dependency: The current training protocol depends on paired low/normal-light images, which limits its scalability in areas where paired data is unavailable, such as historical archives, astronomy, or cultural heritage imaging. We propose approaches like unpaired, self-supervised, and cycle-consistent training to improve generalization and lower data collection costs.
Failure Modes under Extreme Imaging Conditions: The framework assumes moderate smoothness of illumination. Under severe saturation, specular highlights, or scenes where reflectance and illumination share similar spatial frequencies, the accuracy of decomposition decreases. We suggest adaptive, frequency-aware regularization as a possible way to address these challenging cases.
Practical Impact and Generalization Potential: Despite the previous limitations, QRetinex-Net shows strong zero-shot transfer capabilities in real-world situations—such as day-to-night object detection and few-shot crack segmentation—highlighting its practical relevance for resource-limited environments where labeled nighttime data are limited. Additionally, its bio-inspired opponent-process design offers a promising link between computational neuroscience and future vision models.
Visual decomposition quality and artifact analysis: Figure 7, Figure 8 and Figure 9 reveal characteristic limitations of existing Retinex decomposition methods and demonstrate the advantages of the quaternion-based approach when applied to the same scene under different lighting conditions.
The decomposition comparison in Figure 7 illustrates several critical failure modes of existing methods. Traditional approaches (JIEP, STAR) achieve moderate reconstruction metrics (PSNR 40–42 dB) but exhibit significant artifacts in their reflectance maps. JIEP produces overly smooth reflectance with loss of fine texture details, while STAR generates noticeable color bleeding artifacts, particularly visible in the transition regions between the blue wall and wooden door frame.
Deep learning methods show varying degrees of degradation. RetinexNet maintains reasonable PSNR (36–39 dB) but introduces characteristic halo artifacts around high-contrast edges in the reflectance component. KIND++ demonstrates improved noise handling compared to RetinexNet, but suffers from over-smoothing that eliminates crucial texture information in darker regions. URetinex exhibits the poorest reconstruction quality (PSNR 24–27 dB), with severe color distortions in the reflectance maps that manifest as unnatural yellow-green shifts. Diff-Retinex shows inconsistent performance between low and normal light conditions, with better normal-light reconstruction (PSNR 38.72 dB) but degraded low-light performance (PSNR 30.80 dB).
The quaternion decomposition visualizations in Figure 8 and Figure 9 provide insight into how the proposed method maintains consistency across lighting conditions. In the normal-light case (Figure 8), the quaternion channels (i, j, k) of the reflectance component show clear separation of color information while maintaining structural coherence. The illumination component exhibits smooth spatial variation without the grid artifacts common in wavelet-based methods. The real component remains effectively suppressed, validating our loss design.
The low-light decomposition (Figure 9) demonstrates the method’s robustness under challenging conditions. Despite severe underexposure in the input, the quaternion reflectance maintains similar structural patterns to the normal-light case, confirming the high RCI score. The illumination component correctly captures the non-uniform lighting distribution while avoiding over-enhancement artifacts. Notably, the quaternion representation preserves inter-channel relationships even in near-black regions where traditional per-channel processing would amplify noise independently.
A consistent pattern emerges across competing methods: they struggle to balance three competing objectives—reconstruction fidelity, artifact suppression, and reflectance consistency. Methods optimizing for high PSNR (JIEP) sacrifice perceptual quality through over-smoothing. Those preserving texture (RetinexNet) introduce halos and color shifts. Methods attempting to suppress noise (KIND++) lose critical details. The quaternion framework’s joint color modeling through Hamilton product operations naturally addresses these trade-offs by maintaining inter-channel coherence throughout the decomposition process.
The visual evidence corroborates our quantitative findings: quaternion-based decomposition achieves superior reflectance stability (RCI 0.988) while maintaining high reconstruction quality. This visual consistency translates directly to improved performance in downstream tasks, as the illumination-invariant features extracted from our reflectance maps provide more reliable cues for crack detection, face recognition, and image fusion applications.
The spatial-wavelet trade-off merits deeper discussion. Our ablation reveals a fundamental tension between reconstruction fidelity and perceptual quality. The non-wavelet variant achieves 65.20 dB PSNR by operating directly in the spatial domain, where gradient-based losses can precisely optimize pixel values. However, this pixel-level optimization fails to constrain spectral characteristics, allowing high-frequency noise and grid patterns to leak into the reflectance. The wavelet variant (52.37 dB PSNR) trades pixel-perfect reconstruction for better frequency localization—its multi-scale decomposition explicitly separates and regularizes different frequency bands, suppressing artifacts that manifest as ringing or aliasing. For applications prioritizing perceptual quality over PSNR (e.g., medical imaging, surveillance), the wavelet approach proves superior despite metrics suggesting otherwise.
6. Conclusions
This paper introduced a novel quaternion-based Retinex decomposition that effectively addresses the limitations of traditional scalar methods. By representing RGB channels as unified hypercomplex entities, our framework holistically captures cross-channel dependencies using quaternion algebra. The decomposition, modeled via the Hamilton product of quaternion-valued reflectance and illumination, ensures perfect image reconstruction while preserving crucial color information. Our model demonstrates state-of-the-art performance, achieving a PSNR of up to 62.51 dB and an SSIM of 0.9998 on the LOLv1 dataset. The proposed Reflectance Consistency Index (RCI) reached 0.988, quantitatively validating the model’s ability to maintain stable reflectance under challenging illumination. Across diverse applications, including low-light crack detection, nighttime face detection, and infrared-visible fusion, our framework consistently outperformed leading methods by 2–11%. This superior performance stems from its ability to maintain the cross-channel correlations that scalar methods ignore, ensuring enhanced color fidelity and structural preservation in difficult visual conditions.
The proposed framework offers a strong foundation for advancing low-light image enhancement. Future research should focus on developing perceptual frameworks that include opponent color processing and CIE-aligned quaternion representations to better emulate the human visual system. It should also extend the framework to video sequences by incorporating quaternion-based temporal consistency constraints, ensuring smooth and stable enhancement across frames. Additionally, designing GPU-accelerated kernels and lightweight network architectures will optimize the model for real-time use and deployment on mobile or edge devices. Additionally, enhancing robustness under extreme lighting and weather conditions and ensuring the model generalizes well across different camera sensors and imaging systems remain vital areas for ongoing research.