Article

A No-Reference Multivariate Gaussian-Based Spectral Distortion Index for Pansharpened Images

School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710129, China
* Authors to whom correspondence should be addressed.
Sensors 2026, 26(3), 1002; https://doi.org/10.3390/s26031002
Submission received: 24 December 2025 / Revised: 28 January 2026 / Accepted: 29 January 2026 / Published: 3 February 2026
(This article belongs to the Special Issue Remote Sensing Image Fusion and Object Tracking)

Abstract

Pansharpening is a fundamental image fusion technique used to enhance the spatial resolution of remote sensing imagery; however, it inevitably introduces spectral distortions that compromise the reliability of downstream analyses. Existing no-reference (NR) quality assessment methods often fail to exclusively isolate these spectral errors from spatial artifacts or lack sensitivity to specific radiometric inconsistencies. To address this gap, this paper proposes a novel No-Reference Multivariate Gaussian-based Spectral Distortion Index (MVG-SDI) specifically designed for pansharpened images. The methodology extracts a hybrid feature set, combining First Digit Distribution (FDD) features derived from Benford’s Law in the hyperspherical color space (HCS) and Color Moment (CM) features. These features are then used to fit Multivariate Gaussian (MVG) models to both the original multispectral and fused images, with spectral distortion quantified via the Mahalanobis distance between their statistical parameters. Experiments on the NBU dataset showed that the MVG-SDI correlates more strongly with standard full-reference benchmarks (such as SAM and CC) than existing NR methods like QNR. Tests with simulated distortions confirmed that the proposed index remains stable and accurate even when facing specific spectral degradations like hue shifts or saturation changes.

1. Introduction

Satellite systems face inherent limitations in imaging, storage, and data transmission, leading to a trade-off between spectral and spatial resolutions in remote sensing images [1,2]. Despite technological progress, spaceborne sensors typically capture high spatial resolution (HR) panchromatic (PAN) images alongside low spatial resolution (LR) multispectral (MS) images, rather than direct HR MS data.
Pansharpening, a fundamental image fusion technique in remote sensing, addresses this by combining the high spatial resolution of PAN images with the rich spectral information of MS images to produce high spatial resolution multispectral (HRMS) products [3]. This process enhances the interpretability and utility of satellite imagery, enabling applications such as land cover classification, urban planning, environmental monitoring, disaster management, precision agriculture, visual analysis, change detection, and mapping. The pansharpening field has evolved through key phases since the 1980s [4]: early techniques like Intensity-Hue-Saturation (IHS) and High-Pass Filtering (HPF) [5]; 1990s advancements in Multiresolution Analysis (MRA) [6,7]; and post-2000 innovations including variational optimization (VO) [8,9], deep learning (DL) [10], and unified frameworks [11,12]. Recent advancements in this area have further refined these deep learning approaches by exploring interactions between the spatial and frequency domains to improve feature extraction and fusion accuracy [13].
However, the fusion process frequently introduces artifacts, with spectral distortion being one of the most critical and pervasive issues. Spectral distortion refers to the alteration of the original spectral properties in the fused image, manifesting as color shifts, radiometric inconsistencies, or loss of spectral fidelity. Visually, this may appear as unnatural hues or brightness variations, but its impact extends far beyond aesthetics. In quantitative remote sensing tasks, such as vegetation index calculation (e.g., NDVI), mineral mapping, or change detection, precise spectral signatures are paramount. Even subtle distortions can lead to erroneous interpretations, compromising the reliability of downstream analyses. For instance, altered spectral bands may misrepresent vegetation health or soil composition, leading to inaccurate decision-making in fields like agriculture or ecology.
Despite advances in pansharpening algorithms, categorized broadly into component substitution (CS), MRA, VO, and machine learning (ML) approaches, spectral distortion remains a persistent challenge, exacerbated by factors such as sensor misalignment, atmospheric effects, or algorithmic assumptions about spectral–spatial relationships. CS and MRA represent traditional methods, differing in spatial detail extraction: CS uses spectral transforms to separate and substitute intensity components with histogram-matched PAN data; MRA employs multiscale decompositions (e.g., à trous wavelet or generalized Laplacian pyramid) to inject high-frequency PAN details. VO recasts fusion as an optimization problem with fidelity and regularization terms, drawing from super-resolution and restoration techniques. ML, especially convolutional neural networks (CNNs), excels at managing non-linearities, balancing spatial and spectral quality in satellite imagery.
Quality assessment (QA) in pansharpening is vital for ensuring image reliability and accuracy [14]. It guides fusion algorithm selection, informs new method development, sets standards for downstream applications (e.g., classification or detection), and boosts the commercial appeal of fused products. Effective QA mitigates issues from spectral–spatial trade-offs, preventing distortions that could compromise results. Assessment approaches fall into three main types: qualitative (visual inspection) [15], application-based (task performance, e.g., classification) [16], and quantitative. The latter is considered the most objective, targeting spectral distortion (color alterations) and spatial distortion (artifact introduction or detail mismanagement).
These are subdivided into full-reference (FR) and no-reference (NR) categories. FR QA usually relies on Wald’s protocol [14], emphasizing consistency (degraded fused image matches original LR MS) and synthesis (fused image mimics hypothetical HR MS capture). Reduced-resolution (RR) assessment applies this by downscaling inputs, using the original LR MS as a reference, with filters matching sensor modulation transfer function (MTF) [17]. FR metrics, such as the Spectral Angle Mapper (SAM), Spectral Information Divergence (SID), and Cross-Correlation (CC), require a high-resolution ground truth image for comparison, which is often unavailable in real-world scenarios. However, RR suffers from scale-invariance failures at high ratios, filter biases, and MTF instability from sensor aging.
In contrast, NR methods evaluate the HR MS image directly, avoiding such assumptions, though protocols like Quality with No Reference (QNR) [18] and its variants (e.g., FQNR and MQNR) face criticism for spectral–spatial coupling and a lack of standardization [19]. Traditional no-reference image quality assessment (NR-IQA) or blind IQA includes opinion-aware methods (e.g., BIQI [20], DIIVINE [21], BRISQUE [22], BLIINDS-II [23]) trained on distorted natural images with subjective scores, limiting generalization for pansharpening. Opinion-unaware approaches, like NIQE [24] and IL-NIQE [25], fit features to Multivariate Gaussian (MVG) models and measure distances from pristine benchmarks, an approach promising for adaptation to remote sensing. For pansharpening-specific NR QA, evaluations occur at the PAN scale without HR MS references, using innovative distortion measures. QNR-like indices [26] assess band relationships via the Universal Image Quality Index (UIQI), with variants like hybrid QNR (HQNR) [27] and regression-based QNR (RQNR) [28] incorporating MTF and consistency for spectral fidelity. Quality Estimation by Fitting (QEF) [15] extrapolates RR metrics, enhanced by Kalman Filter-based (KQEF) [29] and Combiner-based (CQEF) [19] versions, but these depend on accurate down-sampling, suffer from scale-invariance issues, and show uneven performance across scenes and distortions. DL-based NR-QA advances include CNN architectures like the Deep Feature Similarity Measure Network (DFSM-net) [30] and the Three-Branch Neural Network (TBN-PSI) [31], which learn distortions without hand-crafted features, improving correlations but requiring large datasets and computational resources, reducing interpretability. MVG-based NR methods extract features (e.g., NDVI, NDWI, ASM, CON) from pristine MS images to train benchmark models, then compute distances for test images [32,33]. While effective, these often produce global scores, conflating spatial and spectral distortions.
The existing literature highlights several limitations in current NR spectral QA methods. For example, QNR-based indices assume that spectral relationships remain consistent across resolutions, ignoring the non-stationary nature of remote sensing imagery, which includes diverse land covers like vegetation, water bodies, and urban areas. Other MVG-based models, while effective for general image quality assessment, do not incorporate features tailored to spectral artifacts, such as deviations from natural statistical distributions or Color Moments. Our previous studies on datasets like the Ningbo University (NBU) pansharpening database [34], comprising imagery from sensors such as IKONOS (IK), WorldView-2 (WV-2), WorldView-3 (WV-3), and WorldView-4 (WV-4), reveal that pristine MS images adhere to statistical laws like Benford’s Law in their First Digit Distributions (FDDs) within hyperspherical color domains (HCDs), but fused images deviate markedly due to spectral alterations. Simulated distortions (e.g., hue shifts, saturation changes, non-linear intensity mismatches) further amplify these deviations, underscoring the need for metrics sensitive to such changes. Despite these insights, no dedicated NR metric exists that exclusively isolates spectral distortion while leveraging a comprehensive statistical model to capture both local and global spectral characteristics.
Recently, a spectral quality assessment method based on Benford’s Law was proposed, showing a strong correlation with visual perception [35]. Expanding on this concept, a subsequent study developed an NR metric specifically for fused hyperspectral imagery [36]. The findings from this later work indicate that the technique offers superior stability and robustness compared to alternative NR metrics, yielding results that align more closely with full-reference benchmarks.
Various other NR techniques employ a Multivariate Gaussian (MVG) model trained on features extracted from pristine, undistorted images [31,32]. However, these MVG-driven approaches typically generate a single aggregate quality score, failing to distinguish between spatial and spectral distortions.
Despite these strides, methods based on MVG, DL, and sparse coding frequently struggle to generalize across diverse sensors and scenes. Traditional FR protocols rest on flawed assumptions, existing NR metrics like QNR tend to couple distinct distortion types, and DL approaches demand prohibitively large datasets. These limitations highlight a critical need for advanced NR frameworks capable of isolating spectral distortion through specialized statistical features, which serves as the primary motivation for this paper.
To address this gap, this paper proposes a novel No-Reference Multivariate Gaussian-based Spectral Distortion Index (MVG-SDI) specifically designed for pansharpened images. Building on the MVG framework, the method extracts a hybrid feature set from non-overlapping image patches: 9-dimensional FDD features derived from Benford’s Law in the hyperspherical color space (HCS) to detect statistical deviations in angular components, and 12-dimensional Color Moment (CM) features (mean, standard deviation, and skewness across RGB-NIR channels) to quantify perceptual color shifts. These features are concatenated into a 21-dimensional vector per patch, forming a spectral feature matrix. Separate MVG models are fitted to the original MS (reference) and fused (test) images, with spectral distortion quantified via the Mahalanobis distance between their parameters. This approach ensures sensitivity to localized distortions while accounting for feature interdependencies, outperforming existing NR metrics in isolating spectral errors without confounding them with spatial artifacts.
The contributions of this work are threefold: (1) it introduces the first NR index dedicated solely to spectral distortion in pansharpened images using an MVG model, decoupling it from spatial quality assessment; (2) it integrates FDD and CM features within an MVG model, providing a robust statistical representation validated on diverse sensor data; and (3) extensive experiments on the NBU dataset demonstrate superior correlation with FR benchmarks (SAM, SID, CC) compared to QNR variants, highlighting its practical utility for algorithm optimization. This work provides targeted feature engineering and experimental protocols that allow future research and practitioners to extend and adapt the MVG framework to new satellites, sensors, or domain requirements.
The remainder of this paper is organized as follows. Section 2 details the proposed MVG-SDI methodology, including patching, feature extraction, model fitting, and score computation. Section 3 describes the experiments, including the datasets description, fusion algorithms, evaluation protocols, and analysis. Section 4 presents discussions, followed by conclusions in Section 5.

2. Proposed Method

This section details the No-Reference Multivariate Gaussian-based Spectral Distortion Index (MVG-SDI). The proposed approach operates on the hypothesis that spectral distortions in pansharpened images manifest as statistical deviations from the natural spectral characteristics inherent in the original MS data.
The MVG model is a statistical method that describes an image’s pixel distribution using a mean vector and a covariance matrix, which capture the average pixel values and the correlation between spectral bands, respectively. This model, first introduced for blind assessment of natural images by Mittal et al. [24], has proven effective at capturing the statistical regularities of natural scenes. In pansharpening assessment, the MVG model is fitted to both the original and the fused images. By comparing the statistical parameters of the two distributions, the model can quantify how well the pansharpened image preserves the original spectral information. For a d-dimensional feature vector f, the probability density function (PDF) of the MVG distribution is given by
g(\mathbf{f}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Psi}|^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{f} - \boldsymbol{\mu})^{T} \boldsymbol{\Psi}^{-1} (\mathbf{f} - \boldsymbol{\mu}) \right)
where \mathbf{f} \in \mathbb{R}^{d \times 1} is the feature vector, \boldsymbol{\Psi} \in \mathbb{R}^{d \times d} is the covariance matrix, \boldsymbol{\mu} \in \mathbb{R}^{d \times 1} is the mean vector, d is the dimensionality of the distribution, and the superscript T denotes the transpose. Throughout, X denotes the feature matrix formed from the image patches and K the number of image patches.
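As a concrete illustration, the density in Equation (1) and the fitting of (μ, Ψ) from a K × d feature matrix can be sketched in a few lines of NumPy (an illustrative sketch only; the paper does not prescribe an implementation language, and the feature matrix below is a random placeholder):

```python
import numpy as np

def mvg_pdf(f, mu, psi):
    """Evaluate the d-dimensional multivariate Gaussian density g(f) of Equation (1)."""
    d = mu.shape[0]
    diff = f - mu
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * np.linalg.det(psi) ** 0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(psi) @ diff)

# Fit (mu, psi) from a K x d feature matrix X, as described in the text
X = np.random.default_rng(0).normal(size=(200, 3))   # placeholder feature matrix
mu = X.mean(axis=0)                                  # mean vector
psi = np.cov(X, rowvar=False)                        # covariance matrix
density = mvg_pdf(X[0], mu, psi)
```

The density is maximal at f = μ, where the exponential term equals one, which provides a quick sanity check of the normalization constant.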
This work adapts the MVG framework specifically for spectral distortion by focusing on features sensitive to color and radiometric changes, while excluding spatial-oriented ones. As illustrated in Figure 1, the method compares the statistical characteristics of the fused image against an ideal reference derived from the original MS data. The process uses the MS image as the training reference and the fused image as the testing sample. Both undergo identical processing: patch division, spectral feature extraction (combining FDD from Benford’s Law and CMs), and aggregation into a Spectral Features Matrix. Two MVG models are then fitted: a training model ( μ ref , Ψ ref ) from the MS data, representing undistorted spectral properties, and a testing model ( μ test , Ψ test ) from the fused image. The Mahalanobis distance between these models quantifies distortion, with smaller values indicating better spectral preservation. This adaptation enhances robustness by accounting for feature covariances, enabling detection of subtle, inter-dependent spectral artifacts that simpler distances overlook.

2.1. Patch-Based Decomposition

The fused and MS images are decomposed into a uniform grid of non-overlapping 32 × 32 patches; several patch sizes were tested empirically, and 32 × 32 proved optimal. This patch-based approach directly addresses the spectral variability common in HR remote sensing imagery, where adjacent land-cover types such as vegetation, water, and urban surfaces display unique spectral characteristics. In contrast, a global statistical model would blend these diverse responses, potentially concealing specific distortions caused by pansharpening. The selected 32 × 32 dimension strikes a practical balance: it is expansive enough to ensure reliable statistical calculations for FDD and CMs, yet compact enough to maintain local uniformity. As a result, the MVG-SDI is capable of detecting fine, location-specific color variations or radiometric discrepancies that broader metrics would typically miss.
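A minimal sketch of this decomposition (assuming NumPy arrays of shape H × W × C, and discarding any border pixels that do not fill a complete patch, a detail the text does not specify):

```python
import numpy as np

def to_patches(img, size=32):
    """Split an H x W x C image into non-overlapping size x size patches.
    Edge rows/columns that do not fill a whole patch are discarded
    (an assumption; the paper does not state how borders are handled)."""
    h = img.shape[0] - img.shape[0] % size
    w = img.shape[1] - img.shape[1] % size
    img = img[:h, :w]
    return [img[r:r + size, c:c + size]
            for r in range(0, h, size)
            for c in range(0, w, size)]

ms = np.zeros((256, 256, 4))   # e.g. an NBU MS image with 4 bands
patches = to_patches(ms)       # 8 x 8 grid -> 64 patches
```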

2.2. Spectral Feature Extraction

The performance of the proposed index depends on features that are highly sensitive to the spectral distortions typically introduced during the pansharpening process. To achieve this, a hybrid feature set was employed that combines FDD features derived from Benford’s Law to capture deviations in spectral statistics with CM features, which characterize the global color distribution through the mean, standard deviation, and skewness of each channel. This combination yields a comprehensive 21-dimensional feature vector for each image patch, effectively representing both local spectral consistency and global chromatic variation.

2.2.1. First Digit Distribution Features

Distortion causes an image’s statistical characteristics to stray from their expected norms. By extracting these statistics as features and measuring their divergence, image quality can be assessed without any reference images. One widely used feature is Benford’s Law, which has been employed in natural image quality assessment [37,38]. Unlike natural images, remote sensing data often include multiple bands and rich spectral information, such as MS and hyperspectral images, making spectral distortion measurement in pansharpened MS images essential for evaluating sharpening performance.
To ensure the reproducibility of the proposed metric, it is important to note that no radiometric normalization, clipping, or dynamic range rescaling is applied to the input images prior to feature extraction. The HCS transform is applied directly to the raw pixel values of the fused and reference images. The only normalization performed is the scaling of the angular components θ k by the constant 2 / π , as defined in Equation (3) to map them to the [ 0 , 1 ] interval required for consistent First Digit Distribution analysis.
The experiments on the NBU database show that the FDD of the angular components of pristine MS images in the HCD adheres to standard Benford’s Law. As shown in Figure 2, for high-quality (undistorted, unprocessed) images, the FDD features of four different MS bands align almost perfectly with the theoretical Benford distribution.
Figure 3 illustrates that once the same images are processed by fusion algorithms known to introduce spectral errors (such as the BT-H, TV, and PWMBF methods), their first digit frequencies deviate markedly. Similarly, Figure 4 shows that simulated spectral degradations like hue, saturation, and non-linear intensity mismatch break the Benford pattern even more severely, causing the distribution to skew away from the theoretical curve. These distortions confirm that only pristine MS images in the HCD truly follow Benford’s Law; any filtering or spectrally altering processing disrupts the natural first digit statistics. Building on this, a nine-dimensional feature vector was extracted based on Benford’s Law to quantify spectral distortion.
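For reference, standard Benford’s Law gives the theoretical first-digit probabilities P(a) = log10(1 + 1/a) against which the extracted FDD features are compared; a quick sketch:

```python
import math

# Theoretical Benford first-digit probabilities P(a) = log10(1 + 1/a), a = 1..9
benford = [math.log10(1 + 1 / a) for a in range(1, 10)]
# Digit 1 is expected for ~30.1% of values, digit 9 for only ~4.6%
```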
First, the hyperspherical color space (HCS) transform is employed to map the N-band pansharpened image from its original space to the hyperspherical color space. This process separates the intensity component from the angular components, yielding one intensity component and N 1 angular components. Specifically, the intensity component characterizes the spatial information of the pansharpened image, while the angular components represent its spectral information.
Let the intensity component of the pansharpened image M ^ = { M ^ 1 , M ^ 2 , , M ^ N } in the HCD be denoted as I , and the angular components be denoted as θ = { θ 1 , θ 2 , , θ N 1 } . The dimensions of θ are H × W pixels, and the value range of each pixel in θ is [ 0 , π / 2 ] . The hyperspherical color transform is calculated as follows:
I = \sqrt{\hat{M}_1^2 + \hat{M}_2^2 + \cdots + \hat{M}_N^2}, \qquad \theta_1 = \tan^{-1}\!\left( \frac{\sqrt{\hat{M}_2^2 + \cdots + \hat{M}_N^2}}{\hat{M}_1} \right), \qquad \theta_k = \tan^{-1}\!\left( \frac{\sqrt{\hat{M}_{k+1}^2 + \cdots + \hat{M}_N^2}}{\hat{M}_k} \right), \qquad \theta_{N-1} = \tan^{-1}\!\left( \frac{\hat{M}_N}{\hat{M}_{N-1}} \right)
The raw angular components θ k fall within the range [ 0 , π / 2 ] . Before feature extraction, these are normalized to the range [ 0 , 1 ] to obtain θ ¯ k :
\bar{\theta}_k = \frac{2}{\pi}\, \theta_k
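Equations (2) and (3) can be sketched directly in NumPy (a sketch under stated assumptions: the bands form a non-negative float array of shape (N, H, W), and arctan2 is used so zero denominators are handled gracefully):

```python
import numpy as np

def hcs_transform(bands):
    """Hyperspherical color transform of an N-band image (Equation (2)).
    Returns the intensity I and the N-1 angular components,
    normalized to [0, 1] by 2/pi (Equation (3))."""
    bands = np.asarray(bands, dtype=float)        # shape (N, H, W)
    n = bands.shape[0]
    intensity = np.sqrt((bands ** 2).sum(axis=0))
    thetas = []
    for k in range(n - 1):
        tail = np.sqrt((bands[k + 1:] ** 2).sum(axis=0))
        thetas.append(np.arctan2(tail, bands[k])) # in [0, pi/2] for non-negative inputs
    theta = np.stack(thetas)                      # shape (N-1, H, W)
    return intensity, theta * (2.0 / np.pi)
```

For a two-band image with equal band values, the single angle is π/4, so the normalized component is exactly 0.5.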
Figure 5 displays the heatmaps of the HCD normalized angular components for the MS image and the IHS pansharpened image, which exhibits severe spectral distortion. It can be observed that the normalized angular component heatmaps of the IHS pansharpened image differ significantly from those of the MS image. This demonstrates that the normalized angular components can effectively reflect spectral distortion.
Next, the FDD features are determined. For every pixel in the normalized component θ ¯ k , the first non-zero digit is extracted. The probability of each digit a { 1 , 2 , , 9 } is calculated as
P_k(a) = \frac{Q_a}{H \times W}
where Q_a is the count of pixels whose first non-zero digit equals a, and H \times W is the total number of pixels in the patch.
This creates a 9-dimensional FDD feature vector for each angle component. These are then averaged across the N 1 angles to produce a single 9-dimensional FDD feature vector x F D D for the patch:
\mathbf{x}_{FDD} = \frac{1}{N-1} \sum_{k=1}^{N-1} \left[ P_k(1), P_k(2), \ldots, P_k(9) \right]^{T}
where the superscript T represents the transpose of the vector.
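A sketch of Equations (4) and (5) (assuming NumPy; zero-valued pixels carry no first digit and are skipped, while the probabilities are still normalized by the full patch size H × W as in Equation (4)):

```python
import numpy as np

def first_digit(x):
    """First non-zero decimal digit of each positive value."""
    x = np.asarray(x, dtype=float).ravel()
    x = x[x > 0]                              # zeros carry no first digit
    exp = np.floor(np.log10(x))
    mantissa = x / 10.0 ** exp                # nominally in [1, 10)
    # rounding guards against floating-point mantissas like 2.999...96
    return np.clip(np.round(mantissa, 9), 1, 9).astype(int)

def fdd_features(theta_norm):
    """9-dim FDD vector: P(a), a = 1..9, averaged over the N-1
    normalized angular components (array of shape (N-1, H, W))."""
    probs = []
    for comp in theta_norm:
        counts = np.bincount(first_digit(comp), minlength=10)[1:10]
        probs.append(counts / comp.size)      # divide by H*W as in Equation (4)
    return np.mean(probs, axis=0)
```

A constant component of 0.5, for instance, yields P(5) = 1 and zero for all other digits.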

2.2.2. Color Moment Features

While the FDD features capture the underlying statistical distribution, they are complemented by CM features to provide a more direct and perceptually relevant measure of the image’s color profile. This feature set is designed to capture the global color distribution within each patch and is highly sensitive to the primary artifacts of spectral distortion, such as changes in brightness, chromaticity, and illumination. Pansharpening algorithms can often introduce radiometric shifts (biasing brightness), alter the gain between bands (changing color balance), or cause non-linear saturation, all of which are effectively quantified by this feature set.
For each patch, the first three statistical moments are calculated for each of the red (R), green (G), blue (B), and Near-Infrared (NIR) channels. This process forms a compact and highly descriptive 12-dimensional feature vector (4 channels × 3 moments/channel):
  • Mean ( μ ): The first-order moment. This represents the average color intensity of a channel, directly reflecting the image’s overall brightness or any radiometric bias introduced during fusion.
  • Standard deviation ( σ ): The second-order moment. This measures the contrast or dynamic range within a channel. A higher σ indicates more variation in pixel intensities, a property often compressed or unnaturally expanded by fusion.
  • Skewness ( γ ): The third-order moment. This captures the asymmetry of the pixel distribution. It is highly sensitive to non-linear distortions, indicating whether the channel is biased toward darker or brighter tones, which often results from pixel value clipping or saturation.
Thus, each image block yields the concatenated 12-dimensional feature vector:
\mathbf{x}_{CM} = \left[ \mu_R, \sigma_R, \gamma_R, \mu_G, \sigma_G, \gamma_G, \mu_B, \sigma_B, \gamma_B, \mu_{NIR}, \sigma_{NIR}, \gamma_{NIR} \right]^{T}
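The 12-dimensional CM vector of Equation (6) reduces to three moments per band; a sketch assuming a NumPy patch of shape 32 × 32 × 4 with channels ordered R, G, B, NIR:

```python
import numpy as np

def color_moments(patch):
    """12-dim CM vector: mean, standard deviation, and skewness of each
    of the R, G, B, NIR channels of an S x S x 4 patch (Equation (6))."""
    feats = []
    for c in range(patch.shape[-1]):
        vals = patch[..., c].ravel().astype(float)
        mu = vals.mean()
        sigma = vals.std()
        # third standardized moment; defined as 0 for a constant channel
        gamma = ((vals - mu) ** 3).mean() / sigma ** 3 if sigma > 0 else 0.0
        feats += [mu, sigma, gamma]
    return np.asarray(feats)                  # length 12
```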

2.3. Spectral Feature Matrix Construction

Following the extraction of the two distinct feature sets from each 32 × 32 patch, the next step is to combine them into a single, powerful descriptor. For each individual image patch, the FDD features (a 9-dimensional vector) and the CM features (a 12-dimensional vector) are concatenated end-to-end. This fusion of features is crucial, as it creates a single, comprehensive vector that simultaneously describes the patch’s underlying statistical “naturalness” (from FDD) and its direct perceptual color profile (from CMs).
This process results in a final 21-dimensional spectral feature vector for each patch, defined as
x = [ x F D D , x C M ] .
This procedure is repeated for all K non-overlapping patches extracted from the image, where K denotes the number of patches. The resulting K feature vectors are then aggregated to form the comprehensive K × 21 spectral feature matrix. This final matrix statistically represents the complete spectral properties of the entire image.

2.4. Model Fitting

The core of the quality assessment lies in comparing the statistical characteristics of the fused image against those of the original MS image using the MVG framework. To achieve this, two distinct models are constructed:
  • Training MVG model ( μ r e f , Ψ r e f ): This model is constructed from the spectral feature matrix (containing both FDD and CM features) extracted from the original, LR distortion-free MS image. Its parameters, the mean vector μ r e f and covariance matrix Ψ r e f , serve as the “ground truth” statistical benchmark, representing the ideal spectral properties that the fused image should replicate.
  • Testing MVG model ( μ t e s t , Ψ t e s t ): In parallel, a second model is built for the fused image under evaluation. Its mean vector μ t e s t and covariance matrix Ψ t e s t are computed from its own spectral feature matrix, which also contains the FDD and Color Moment features. This model represents the actual spectral statistics of the final fused product, including any distortions.
The main steps of the proposed method can be summarized in Algorithm 1.
Algorithm 1 Pseudocode of the proposed MVG-SDI method
Require: Multispectral image I_MS, fused image I_Fused
Require: Block size S = 32, feature dimension d = 21
Ensure: Spectral Distortion Index Q

Step 1: Feature extraction
Function ExtractFeatures(Image):
    Divide Image into K non-overlapping patches of size S × S
    for k = 1 to K do
        // Extract FDD features (9 dimensions)
        Convert patch to HCS to get angular components θ
        Normalize angles: θ̄ ← θ × (2/π)
        Compute digit probabilities x_FDD based on Benford’s Law
        // Extract CM features (12 dimensions)
        Select first 4 bands (RGB + NIR)
        Compute mean (μ), standard deviation (σ), and skewness (γ) for each band
        Construct vector x_CM = [μ1, σ1, γ1, …, μ4, σ4, γ4]
        // Concatenate feature vector
        f_k ← [x_FDD, x_CM]
    end for
    return Feature matrix F = [f_1, f_2, …, f_K]^T

Step 2: Model construction
F_ref ← ExtractFeatures(I_MS)
F_test ← ExtractFeatures(I_Fused)
Compute training MVG model parameters: μ_ref ← mean(F_ref), Ψ_ref ← cov(F_ref)
Compute testing MVG model parameters: μ_test ← mean(F_test), Ψ_test ← cov(F_test)

Step 3: Distance calculation
Compute pooled covariance: Ψ ← (Ψ_ref + Ψ_test) / 2
Compute difference vector: Δμ ← μ_ref − μ_test
Compute Mahalanobis distance: Q ← √(Δμ^T Ψ^{−1} Δμ)
return Q

2.5. Quality Score Computation

The spectral distortion is formally quantified by calculating the Mahalanobis distance (D) between the statistical parameters of the two fitted MVG models, utilizing the formulation provided in Equation (8). This distance measures the dissimilarity between the mean feature vector of the test model ( μ t e s t ) and the mean of the ideal reference model ( μ r e f ).
By incorporating the pooled covariance matrix, this metric offers a more robust assessment than simple Euclidean distance. It explicitly normalizes for statistical variance and accounts for the complex inter-dependencies between distinct spectral features, such as the correlation between FDD variations and Color Moments.
D = \sqrt{ (\boldsymbol{\mu}_{ref} - \boldsymbol{\mu}_{test})^{T} \boldsymbol{\Psi}^{-1} (\boldsymbol{\mu}_{ref} - \boldsymbol{\mu}_{test}) }
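Putting the model fitting and Equation (8) together, the final score can be sketched as follows (a sketch, not the authors' code; np.linalg.pinv is used here purely as a numerical safeguard in case the pooled covariance is close to singular):

```python
import numpy as np

def mvg_sdi(F_ref, F_test):
    """Mahalanobis distance between the two fitted MVG models, using the
    pooled covariance (Psi_ref + Psi_test) / 2 as in Algorithm 1.
    F_ref, F_test: K x d spectral feature matrices."""
    mu_ref, mu_test = F_ref.mean(axis=0), F_test.mean(axis=0)
    psi = (np.cov(F_ref, rowvar=False) + np.cov(F_test, rowvar=False)) / 2.0
    diff = mu_ref - mu_test
    return float(np.sqrt(diff @ np.linalg.pinv(psi) @ diff))
```

When the test features match the reference features exactly, the mean difference vanishes and the index is zero, consistent with smaller values indicating better spectral preservation.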

3. Experiments

3.1. Datasets

Experimental validation was performed using the publicly available large-scale NBU dataset [34], which comprises 1200 diverse image pairs acquired by four distinct satellite sensors: IK, WV-2, WV-3, and WV-4. Each sample pair contains a high-resolution PAN image ( 1024 × 1024 pixels) and a corresponding low-resolution MS image ( 256 × 256 pixels).
Table 1 provides a detailed breakdown of the image pairs and spectral bands, while Figure 6 displays representative examples from each sensor subset.

3.2. Fusion Algorithms

The evaluation employed 19 distinct pansharpening algorithms, sourced from two public MATLAB toolkits [39] to ensure standardized implementation. These methods were selected to represent a diverse cross-section of established techniques, which is crucial for assessing the proposed index’s performance across various types of spectral artifacts. The set includes five CS, nine MRA, four VO, and four ML methods, covering the most prominent categories in the field. Table 2 provides a summary of the algorithms utilized in this study.

3.3. Implementation Details

To ensure the reliability of the proposed MVG-SDI metric and facilitate future comparisons, all key implementation parameters, including patch size, normalization constants, and feature dimensionality, have been standardized. The specific values used in this study are detailed in Table 3 below.

3.4. Spectral Quality Assessment Metrics

To validate the performance of the proposed MVG-SDI, its results were compared against a comprehensive suite of established evaluation metrics. This suite included both competing NR methods and benchmark FR measures.
For the NR comparison, the spectral distortion components of several prominent QNR-like indices, namely QNR λ , FQNR λ , and MQNR λ , were employed. These indices are designed to assess how well spectral characteristics are preserved in pansharpened images without requiring a ground truth reference.
In addition, three widely used FR evaluation measures, the Spectral Angle Mapper (SAM), Spectral Information Divergence (SID), and the correlation coefficient (CC), were incorporated as benchmarks. These metrics quantify spectral angular consistency, information-theoretic spectral divergence, and statistical correlation, respectively.
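Of these benchmarks, SAM is the most directly spectral: it measures the angle between corresponding spectral vectors of the reference and fused images. A compact sketch (assuming NumPy and H × W × N arrays; the small eps term guards against zero-norm pixels):

```python
import numpy as np

def sam(ref, fused, eps=1e-12):
    """Mean Spectral Angle Mapper (in degrees) between two H x W x N images:
    the angle between corresponding spectral vectors, averaged over pixels."""
    r = ref.reshape(-1, ref.shape[-1]).astype(float)
    f = fused.reshape(-1, fused.shape[-1]).astype(float)
    cos = (r * f).sum(axis=1) / (
        np.linalg.norm(r, axis=1) * np.linalg.norm(f, axis=1) + eps)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))).mean())
```

Note that SAM is invariant to per-pixel scaling of the spectral vectors, so it isolates angular (chromatic) distortion from radiometric gain.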
Moreover, to properly evaluate these metrics, this study employs a dual-scale experimental protocol consisting of NR and FR validation. This approach is essential for comparing the proposed index against other NR metrics at the original scale, while validating its performance against FR benchmarks at a reduced scale. The two protocols are summarized as follows.

3.4.1. No-Reference Assessment

This protocol compares the proposed spectral index against existing NR spectral indices ( QNR λ , FQNR λ , and MQNR λ ). The fusion algorithms are applied at the original full resolution (producing 1024 × 1024 fused images), and all NR metrics are computed directly on these outputs to evaluate performance under realistic conditions.

3.4.2. Reduced-Resolution Validation (Wald’s Protocol)

This protocol validates the proposed NR index against reference-based scores using Wald’s protocol. Following standard degradation and upsampling procedures, fusion is performed at a reduced scale ( 256 × 256 images). The FR metrics (SAM, SID, and CC) are computed by comparing the fused results against the original 256 × 256 MS image, which serves as the ground truth. The proposed NR spectral index is simultaneously calculated on these reduced-resolution images to analyze its correlation with the FR benchmarks.
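The degradation and upsampling steps of Wald's protocol can be sketched as follows. This is a simplified illustration only: block averaging stands in for the sensor MTF filter, and nearest-neighbour expansion stands in for the polynomial interpolation actually used in the fusion toolboxes.

```python
import numpy as np

def degrade(ms: np.ndarray, ratio: int = 4) -> np.ndarray:
    """Reduce the resolution of an (H, W, B) image by block averaging
    (a crude stand-in for MTF filtering followed by decimation)."""
    h, w, b = ms.shape
    h2, w2 = h // ratio * ratio, w // ratio * ratio
    x = ms[:h2, :w2].reshape(h2 // ratio, ratio, w2 // ratio, ratio, b)
    return x.mean(axis=(1, 3))

def upsample(ms_lr: np.ndarray, ratio: int = 4) -> np.ndarray:
    """Expand a reduced-resolution image back to the original grid
    (real protocols use polynomial/spline interpolation instead)."""
    return ms_lr.repeat(ratio, axis=0).repeat(ratio, axis=1)
```

Fusion is then performed on the degraded inputs, and the original MS image serves as the reference for the FR metrics, exactly as described above.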

3.5. Numerical Evaluation

A comprehensive analysis was conducted to benchmark the proposed MVG-SDI against state-of-the-art metrics across diverse fusion categories (CS, MRA, VO, and ML). In the reported results, the optimal performance within each category is highlighted in red, while the poorest performance is marked in blue.
A critical observation from this comparison is the strong alignment between the proposed method and the advanced FQNR λ model. As evidenced in Table 4, Table 5, Table 6 and Table 7, the proposed index exhibits a high degree of correlation with FQNR λ in identifying extreme performers. Specifically, both metrics converge on the same best-performing algorithm in the WV-2 dataset, identify identical best and worst algorithms in the WV-3 dataset, and consistently flag the same poorest performer in the WV-4 dataset.
Furthermore, the numerical results validate the proposed method’s capability to assess spectral distortion effectively. It demonstrates performance superior to MQNR λ and comparable to established NR benchmarks, while maintaining logical consistency with FR metrics such as the CC.
The numerical results for the IK dataset are presented in Table 4. PNN achieved the best scores for CC, SAM, and SID, indicating superior spectral fidelity with respect to the reference. Conversely, GS yielded the lowest CC value, while PWMBF recorded the poorest performance for both SAM and SID. Among the QNR-based indices, optimal selection varied: QNR λ favored A-PNN, FQNR λ selected SR-D, and MQNR λ identified GS as the top-performing algorithm. Notably, the proposed method identified PRACS as the best fusion model, aligning more closely with the preferences of MQNR λ for CS-based methods. Regarding the poorest results, SR-D exhibited the highest distortion levels according to the proposed metric, whereas MQNR λ flagged MTF-GLP-HPM-H as the worst performer.
Quantitative results for the WV-2 dataset are presented in Table 5. PNN-IDX achieved the best scores for both SAM and SID, indicating superior spectral fidelity, while MTF-GLP-HPM-H recorded the highest CC value. Conversely, MF yielded the lowest correlation coefficient, while PRACS and AWLP exhibited the highest spectral distortion across both SAM and SID measures. Among the QNR-based indices, optimal selection varied: QNR λ favored BT-H, whereas both MQNR λ and FQNR λ identified GS as the top-performing algorithm. Notably, the proposed method aligned with these latter indices, consistently identifying GS as the best fusion model. Regarding the poorest results, the proposed metric flagged SR-D as the worst performer, while MQNR λ identified PNN-IDX, the algorithm with the best ground truth spectral fidelity, as the poorest model, further highlighting the divergence between FR and NR assessments.
The numerical assessment results for the WV-3 dataset are summarized in Table 6. MTF-GLP achieved the highest CC, while BT-H and A-PNN-FT recorded the best performances for SID and SAM, respectively, indicating superior spectral preservation. Conversely, PNN-IDX consistently yielded the poorest results across all full-reference metrics, exhibiting the lowest correlation and highest spectral distortion. Among the QNR-based indices, optimal selection varied significantly: QNR λ and MQNR λ favored BT-H, aligning with the SID results, whereas FQNR λ selected GS as the top-performing algorithm. Notably, the proposed method also identified GS as the best fusion model. Regarding the poorest results, a distinct contradiction was observed: the proposed metric flagged BT-H as the worst performer (0.2899) despite it achieving the best SID score, while MQNR λ aligned with the FR benchmarks by identifying PNN-IDX as the poorest model.
The quantitative evaluation for the WV-4 dataset is detailed in Table 7. FE-HPM achieved the best scores for CC, SAM, and SID, consistently demonstrating superior spectral fidelity across all full-reference benchmarks. Conversely, BT-H recorded the lowest correlation coefficient, while SR-D and PWMBF exhibited the highest spectral distortion in terms of SID and SAM, respectively. Among the QNR-based indices, the optimal selection varied: QNR λ favored BDSD, MQNR λ selected PRACS, and FQNR λ identified SR-D as the top-performing algorithm. Notably, the proposed method identified PWMBF as the best fusion model; however, this presents a significant contradiction, as PWMBF yielded the poorest SAM value in the reference-based assessment. Regarding the poorest results, a rare consensus was observed: all NR metrics, including the proposed method, flagged BT-H as the worst performer, aligning with its lowest ranking in the CC evaluation.

3.6. Visual Evaluation

The visual performance of the proposed index was evaluated against the benchmark metrics ( QNR D λ , FQNR D λ , and MQNR D λ ) using the GS fusion method, selected for its tendency to induce noticeable spectral distortions.
Figure 7 illustrates the distortion maps for the IK dataset. The GS fused image (f) exhibits characteristic spectral shifts relative to the MS reference (e). The maps for QNR D λ (a) and MQNR D λ (c) display identical patterns concentrated almost exclusively along high-frequency edges. This indicates a bias toward spatial features rather than true spectral deviations. Meanwhile, FQNR D λ (b) appears almost entirely blue, failing to register the distortion. Conversely, the proposed index (d) generates a distinct heatmap with high-intensity values (red and yellow) distributed across broad object surfaces, effectively highlighting the spectral errors that competing metrics mistake for spatial structures.
This behavior is further validated on the WV-2 dataset, as shown in Figure 8. Here, the GS method introduces spectral deviations across varied urban materials. Consistent with the previous dataset, QNR D λ (a) and MQNR D λ (c) remain nearly indistinguishable, focusing narrowly on high-contrast features such as bright rooftops, while FQNR D λ (b) significantly underestimates the error magnitude. The proposed index (d), however, captures the widely distributed inconsistencies, producing a heatmap that correlates robustly with the global spectral degradation inherent to the component substitution process.
The analysis of the WV-3 dataset shown in Figure 9 demonstrates the robustness of the proposed index in scenes with high dynamic range. Visually, the GS image (f) shows spectral shifts in both bright blue industrial rooftops and deep shadowed regions. The competing metrics exhibit a strong radiance bias: QNR D λ and MQNR D λ detect artifacts only on the bright rooftops, leaving the background and shadows unassessed. In contrast, the proposed index (d) identifies spectral degradation across the entire dynamic range, showing significant responsiveness even in the low-radiance shadowed areas that other metrics fail to register.
Finally, the WV-4 dataset, as shown in Figure 10, offers the most striking validation. The GS fusion (f) suffers from severe global spectral distortion, appearing as an unnatural reddish-brown hue shift compared to the reference (e). Despite this obvious degradation, FQNR D λ (b) remains unresponsive, and QNR D λ (a) and MQNR D λ (c) display only scattered, low-intensity noise. The proposed index (d) is the only metric to produce a high-intensity response with prominent hotspots aligning precisely with the most distorted regions, proving its superior capability in quantifying severe spectral artifacts.

3.7. Visual Analysis of Fusion Results

A comprehensive visual inspection of the fusion outcomes across the different datasets, presented in Figure 11, Figure 12, Figure 13 and Figure 14, reveals distinct variations in algorithmic performance. This qualitative analysis highlights the critical trade-off between spatial enhancement and spectral preservation, demonstrating how certain methods generalize more robustly across diverse sensor platforms than others.
For the IK dataset illustrated in Figure 11, the BT-H and MTF-GLP methods distinguish themselves with superior visual performance. These algorithms effectively inject high-frequency spatial details while maintaining rigorous spectral fidelity, resulting in images that are sharp yet natural. The PNN method also delivers competent results, striking a commendable balance between detail enhancement and artifact suppression. In contrast, the BDSD algorithm delivers only mediocre results, failing to match the spatial crispness of the top performers. Meanwhile, the GS and AWLP methods occupy a middle ground; while their outputs are acceptable, they lack the refined clarity and spectral accuracy observed in the leading models.
In the case of the WV-2 dataset presented in Figure 12, a sharp disparity in spectral preservation capabilities is evident. The BT-H and GS algorithms, alongside the deep learning-based family (PNN, PNN-IDX, and A-PNN), produce the most visually convincing results. These methods are characterized by the precise rendering of spatial features and the maintenance of natural color distributions. Conversely, both BDSD and AWLP suffer from severe spectral degradation that compromises image utility. Specifically, BDSD introduces a pervasive, unnatural green hue, whereas AWLP manifests a distinct brownish cast, indicating a significant failure to preserve the original spectral distribution of the scene.
The visual assessment of the WV-3 dataset in Figure 13 highlights the robustness of BT-H, GS, and PNN-IDX, which consistently provide very good visual results with high spatial clarity. AWLP also performs well in this scenario. However, spectral distortions remain a challenge for other methods: BDSD again exhibits a strong green bias, while PNN produces an unclear, brownish output. Furthermore, A-PNN fails to retain visual quality, resulting in a generally poor fusion product.
Finally, for the WV-4 dataset presented in Figure 14, the traditional methods largely outperform the learning-based approaches. BT-H, BDSD, GS, MTF-GLP, and MTF-GLP-HPM-FS all demonstrate good fusion capabilities, balancing spatial enhancement with spectral accuracy. AWLP, however, results in noticeably darker imagery, suggesting a loss of luminance. Notably, the deep learning models (PNN, PNN-IDX, and A-PNN) struggle significantly with this dataset, collectively exhibiting severe spectral shifts towards yellow and green tones, rendering them less suitable for this specific sensor data.

3.8. Quality Assessment with Spectral Degradations

Different pansharpening methods can introduce various color-related artifacts, known as spectral degradations. Common spectral degradations such as color shifting, intensity mismatch, and oversaturation significantly reduce the spectral fidelity of the fused image. To test the proposed method, these spectral degradations were generated manually to verify that it correctly identifies and ranks them.

3.8.1. Hue and Saturation Shift

These artifacts represent one of the most direct and perceptually obvious sources of spectral distortion. They are particularly common in CS methods, most notably the IHS fusion family. The core issue arises from a fundamental spectral mismatch between the two source images. In the IHS method, the HR MS image is transformed into the IHS color space, and its “intensity” (I) component is replaced by the high-resolution PAN image. The problem is that the broad spectral response of the PAN sensor is not a perfect representation of the synthetic intensity component, which is calculated from the narrower MS bands. When this spectrally inconsistent PAN image is substituted and transformed back to the original color space, it introduces brightness levels that do not align with the original hue (H) and saturation (S) information. This mismatch leads to significant and unrealistic color shifts, altering the appearance of features like vegetation or water [5].
To systematically simulate and test the proposed index’s sensitivity to this degradation, the image is converted to the HSV (or HSI) color space, where the chromatic (H, S) and brightness (V) components can be manipulated independently. A hue shift is simulated by adding a constant offset, Δ H , to the entire hue channel. This effectively “rotates” all colors on the color wheel, creating a global color cast (e.g., shifting all blues toward purple).
$H_{\mathrm{distorted}} = (H_{\mathrm{original}} + \Delta H) \bmod 1$
A saturation shift is simulated by multiplying the saturation channel by a scaling factor, α S . This modifies the “purity” or “vividness” of the colors.
$S_{\mathrm{distorted}} = S_{\mathrm{original}} \times \alpha_S$
The severity of the distortion is precisely controlled by the magnitude of Δ H (larger offsets produce stronger global color casts) and by how far the scaling factor α S deviates from one (with α S > 1 causing oversaturation and α S < 1 causing desaturation or “washout”).
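The two manipulations can be sketched directly on the H and S channels of an HSV decomposition. This is a minimal NumPy sketch; clipping the scaled saturation back to [0, 1] is an implementation assumption not stated in the equations.

```python
import numpy as np

def shift_hue(h: np.ndarray, delta_h: float) -> np.ndarray:
    """Rotate the hue channel (values in [0, 1)) by a constant offset,
    wrapping around the colour wheel: H' = (H + dH) mod 1."""
    return np.mod(h + delta_h, 1.0)

def scale_saturation(s: np.ndarray, alpha_s: float) -> np.ndarray:
    """Scale the saturation channel, S' = S * alpha_S, then clip to the
    valid [0, 1] range (the clipping is an implementation assumption)."""
    return np.clip(s * alpha_s, 0.0, 1.0)
```

For example, a hue of 0.95 shifted by Δ H = 0.10 wraps around to 0.05 rather than leaving the valid range.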

3.8.2. Non-Linear Intensity Mismatch

This artifact represents a complex form of spectral distortion where the brightness of the image is altered in a non-linear fashion. Unlike a simple, uniform brightening or darkening, this mismatch disproportionately affects different intensity levels, often compressing or expanding the mid-tones while leaving the darkest and brightest pixels relatively unchanged. This frequently leads to a “washed-out” or, conversely, an “overly dark” and “crushed” appearance.
This type of distortion is a common failure mode in MRA methods. These algorithms work by extracting high-frequency spatial details from the PAN image and then using an “injection model” to add them to the up-sampled MS bands. The amount of detail added is controlled by injection gains. When these gains are incorrectly calculated, either by “over-injecting” (adding too much detail) or “under-injecting” (adding too little), the resulting change in brightness is not uniform. Such non-linear intensity shifts can significantly alter the image’s radiometric values, which is particularly detrimental for quantitative analysis as it corrupts the accuracy of derived products like vegetation indices [6]. This non-linear degradation is effectively simulated by applying a power function, commonly known as gamma correction, to the image. To isolate the brightness component from the color information, this operation is performed on the value (V) channel in the HSV color space:
$V_{\mathrm{distorted}} = (V_{\mathrm{original}})^{\gamma}$
The severity of this mismatch is controlled by the gamma parameter ( γ ). A γ value of 1.0 results in no change. However, a value of γ < 1 brightens the image by boosting the mid-tones, simulating the washed-out effect of over-injection. Conversely, a value of γ > 1 darkens the image by compressing the mid-tones, mimicking the effect of under-injection. The further γ deviates from 1.0, the more severe the non-linear spectral distortion.
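The gamma manipulation above can be sketched as follows, assuming a value channel normalized to [0, 1]:

```python
import numpy as np

def gamma_distort(v: np.ndarray, gamma: float) -> np.ndarray:
    """Apply V' = V ** gamma to a value channel normalized to [0, 1].

    gamma < 1 brightens the image by boosting the mid-tones (simulating
    over-injection); gamma > 1 darkens it by compressing the mid-tones
    (simulating under-injection). The endpoints 0 and 1 are unchanged,
    which is what makes the distortion non-linear rather than a uniform
    brightness shift.
    """
    return np.power(v, gamma)
```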

3.9. Experimental Results with Spectral Degradations

3.9.1. Objective Evaluation

This experiment evaluates the robustness of the proposed metric against manually induced spectral degradations. The primary objective is to verify the monotonicity of the metric; that is, the quality score should exhibit a consistent increase as the severity of the spectral distortion (specifically the hue shift, saturation shift, and non-linear intensity gamma) increases. A higher score must reliably indicate a strictly worse result without suffering from premature saturation or insensitivity.
The assessment of the continuous trends presented in Figure 15, Figure 16, Figure 17 and Figure 18 benchmarks the performance of the proposed method against established indices: QNR λ , MQNR λ , and FQNR λ .
For the GS method on the IK dataset, as shown in Figure 15, the proposed method demonstrates superior capability in quantifying spectral distortions.
In the saturation shift analysis, the FQNR λ curve exhibits excessive sensitivity, marked by a steep initial spike to a near-maximum score (>0.8) at the lowest degradation level ( α S = 1.20 ), indicating premature saturation. In contrast, the proposed method follows a nearly linear trajectory with a consistent, gradual rise in penalty. Furthermore, in the intensity gamma evaluation ( γ > 1 ), standard QNR λ displays a significant lack of sensitivity, remaining flat, whereas FQNR λ saturates rapidly. The proposed method avoids these extremes, yielding a balanced and monotonic curve that accurately reflects the increasing magnitude of degradation.
An assessment of the WV-2 dataset, as shown in Figure 16, reinforces these findings. The FQNR λ curve again exhibits excessive sensitivity to saturation shifts, spiking early, while the proposed method maintains a consistent, gradual trajectory. Similarly, in the intensity gamma evaluation ( γ > 1 ), standard QNR λ remains insensitive (score < 0.2 at γ = 1.20 ), while FQNR λ rises sharply. The proposed method provides a stable compromise, offering a balanced and monotonic response.
The results for the WV-3 dataset, shown in Figure 17, corroborate the robust stability of the proposed index, particularly where others fail. Here, MQNR λ exhibits a critical failure in the saturation shift test, plateauing immediately (>0.9) at the initial degradation level ( α S = 1.20 ).
Conversely, the proposed method initiates at a moderate penalty (∼0.3) and rises monotonically. A similar trend is observed for the intensity gamma, where the proposed metric bridges the gap between the insensitivity of QNR λ and the premature saturation of MQNR λ , ensuring a reliable assessment of radiometric consistency.
Finally, an evaluation on the WV-4 dataset, as shown in Figure 18, demonstrates the proposed method’s balanced sensitivity. While FQNR λ reacts disproportionately to initial saturation shifts (score > 0.75 ) and QNR λ remains largely unresponsive, the proposed method exhibits a moderate, monotonic trajectory (reaching 0.38 at α S = 1.20 ). In intensity gamma tests, it avoids the negligible response of QNR λ and the sharp spikes of FQNR λ and MQNR λ , providing a strictly monotonic response curve that effectively characterizes varying degrees of spectral degradation.

3.9.2. Visual Analysis with Spectral Degradations

Figure 19 provides a visual analysis of a GS-fused image from the IK dataset, showing three types of manually generated spectral distortions at different severity levels. The figure is organized in a 3 × 3 grid, with each row dedicated to one type of artifact.
The top row (a, b, c) illustrates the hue shift distortion. The degradation begins subtly in (a) with a slight, unnatural color cast ( Δ H = 0.05 ), barely perceptible in the image. In (b), the hue shift becomes more pronounced, reaching a noticeable color alteration ( Δ H = 0.10 ). By (c), the effect is most severe, resulting in a complete misrepresentation of the original scene’s colors, which now present a significant deviation from natural hues, disrupting the visual integrity of the image.
The middle row (d, e, f) is dedicated to the saturation shift effect. A slight, noticeable oversaturation is introduced in (d) ( α S = 1.2 ). This effect intensifies in (e) ( α S = 1.4 ) and culminates in (f) ( α S = 1.6 ), where colors appear excessively saturated, creating an almost “cartoon-like” quality with noticeable color bleeding, which severely alters the image’s realism.
The bottom row (g, h, i) demonstrates the intensity gamma distortion (non-linear mismatch), focusing on the impact of over-brightening ( γ < 1.0 ). The row begins with (g) ( γ = 0.4 ), where a severe brightening effect is observed, making the image appear “washed-out,” with a significant loss of contrast. In (h) ( γ = 0.6 ), the image is slightly less washed out, but still lacks the depth and richness of the original. By (i) ( γ = 0.8 ), the image shows minimal distortion and is closest to the original contrast, though still exhibiting slight differences in brightness levels. This gamma shift highlights how small adjustments in intensity can cause perceptible distortions in image clarity.

3.10. Consistency Analysis of NR and FR Metrics

To quantitatively validate the performance of the proposed NR metric, its scores must be benchmarked against established FR metrics. In this evaluation, the FR metrics (CC, SAM, and SID) are treated as the objective “ground truth” for image quality, as determined in the reduced-resolution validation protocol.
The agreement between the proposed NR metric and these FR metrics is assessed using three standard statistical criteria: the Spearman Rank-Order Correlation Coefficient (SROCC), the Pearson Linear Correlation Coefficient (PLCC), and the Root Mean Square Error (RMSE). These metrics constitute the standard protocol for validating image quality assessment algorithms [41,42] and are extensively employed in evaluating recent remote sensing and multi-focus image fusion frameworks [43,44].
Before calculating the PLCC and RMSE, a non-linear logistic regression is applied to the raw NR metric scores ( u i ) to map them onto the same scale as the FR scores. This results in a mapped predicted score, y i . This step is necessary because the raw NR scores and the FR ground truth scores x i may not be linearly related, even if they are monotonically associated.
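A common choice for this non-linear mapping, assumed here for illustration, is a monotonic four-parameter logistic function fitted by least squares, as used in standard IQA validation protocols:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic4(u, b1, b2, b3, b4):
    """Monotonic 4-parameter logistic used to map raw metric scores."""
    return (b1 - b2) / (1.0 + np.exp(-(u - b3) / abs(b4))) + b2

def map_scores(u: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Fit the logistic so the mapped NR scores y_i best match the FR
    ground-truth scores x_i, then return the mapped scores."""
    p0 = [x.max(), x.min(), float(u.mean()), float(u.std()) + 1e-6]
    params, _ = curve_fit(logistic4, u, x, p0=p0, maxfev=20000)
    return logistic4(u, *params)
```

Because the logistic is monotonic, this mapping changes the scale of the NR scores without altering their rank order, so the SROCC is unaffected while the PLCC and RMSE become meaningful.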

3.10.1. Pearson Linear Correlation Coefficient (PLCC)

The PLCC measures the prediction accuracy of the NR metric after non-linear mapping. It quantifies the linear correlation between the mapped NR metric scores y i and the FR ground truth scores x i . The PLCC is calculated as
$\mathrm{PLCC} = \dfrac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}$
where x ¯ , y ¯ are the means of the ground truth scores and the mapped predicted scores, respectively.
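A direct NumPy sketch of this formula:

```python
import numpy as np

def plcc(x: np.ndarray, y: np.ndarray) -> float:
    """Pearson linear correlation between ground-truth scores x and
    mapped predicted scores y."""
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))
```

Any affine transform y = a x + b with a > 0 yields a PLCC of exactly one, which is why the logistic mapping is applied first.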

3.10.2. Spearman Rank-Order Correlation Coefficient (SROCC)

The SROCC measures the prediction monotonicity of the NR metric. It is a non-parametric test that assesses how well the rank order of the NR scores matches the rank order of the FR scores, without assuming a linear relationship. This is crucial for quality assessment, as a good metric must at least agree on which images are better or worse than others.
The SROCC is calculated as
$\mathrm{SROCC} = 1 - \dfrac{6 \sum_{i=1}^{N} (v_i - p_i)^2}{N (N^2 - 1)}$
where N is the total number of fused images; v i is the rank of the ground truth FR score x i ; p i is the rank of the predicted NR score u i .
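Assuming no tied scores (with ties, the Pearson correlation of the ranks should be used instead), the rank-difference formula can be sketched as:

```python
import numpy as np

def srocc(x: np.ndarray, y: np.ndarray) -> float:
    """Spearman rank-order correlation via the rank-difference formula.
    Assumes no tied values; ranks are obtained by double argsort."""
    n = len(x)
    vx = np.argsort(np.argsort(x))  # ranks 0..n-1 of the FR scores
    vy = np.argsort(np.argsort(y))  # ranks 0..n-1 of the NR scores
    d2 = float(((vx - vy) ** 2).sum())
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))
```

Any strictly increasing transform of the scores leaves the SROCC at one, which is exactly the monotonicity property the text describes.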

3.10.3. Root Mean Square Error (RMSE)

The RMSE measures the prediction error. After applying the non-linear logistic mapping, the RMSE quantifies the average magnitude of the error (or residuals) between the mapped NR scores y i and the FR ground truth scores x i . The RMSE is calculated as
$\mathrm{RMSE} = \sqrt{\dfrac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2}$
For a high-performing NR method, the SROCC and PLCC values should be high (close to one), while the RMSE value should be as low as possible.
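The RMSE between the mapped NR scores and the FR ground truth can be sketched in one line:

```python
import numpy as np

def rmse(x: np.ndarray, y: np.ndarray) -> float:
    """Root mean square error between FR ground-truth scores x and
    mapped NR scores y (lower is better)."""
    return float(np.sqrt(np.mean((x - y) ** 2)))
```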

3.11. Quantitative Validation Results

To quantitatively assess the proposed MVG-SDI, its performance was benchmarked against the competing NR metrics, FQNR λ and MQNR λ . This evaluation was conducted using the RR validation protocol, treating the FR metrics CC, SAM, and SID as the ground truth. The performance was measured using the SROCC for monotonicity, the PLCC for accuracy after non-linear mapping, and the RMSE for prediction error. A superior NR metric should exhibit high SROCC and PLCC values alongside low RMSE values. The results for the IK and WV-2 datasets (Table 8) and for the WV-3 and WV-4 datasets (Table 9) are discussed below, with the best value for each comparison highlighted in red in the corresponding tables.
The proposed MVG method demonstrated consistently strong performance, particularly when benchmarked against the CC and SAM metrics. Against CC, the proposed method achieved the top SROCC, PLCC, and RMSE values on the IK, WV-2, and WV-3 datasets. Against SAM, it secured the best performance across all three correlation criteria (SROCC, PLCC, RMSE) on the IK, WV-2, and WV-3 datasets. While its performance against SID was generally strong (e.g., best SROCC and PLCC on WV-3), the MQNR λ metric showed slightly better correlation and lower error against SID on the IK and WV-2 datasets. On the WV-4 dataset, the proposed method performed well but was outperformed by FQNR λ . Overall, the proposed index proved to be a highly effective and generally consistent metric, especially for predicting CC and SAM.
In contrast, FQNR λ showed variable performance. Its SROCC and PLCC values were often lower than the proposed method, especially against CC on the IK (SROCC 0.7308) and WV-3 (PLCC 0.5306) datasets. It exhibited a notably high RMSE (0.3686) when compared against SAM on the WV-2 dataset, indicating significant prediction error in that scenario. However, FQNR λ performed exceptionally well on the WV-4 dataset, achieving the best SROCC, PLCC, and RMSE against CC, the best PLCC and RMSE against SAM (tying for SROCC), and tying for the best performance against SID. This indicates FQNR λ can be highly accurate under certain conditions but lacks the overall consistency of the proposed method.
MQNR λ exhibited highly inconsistent performance. It consistently performed extremely well when benchmarked against SID, achieving the top SROCC, PLCC, and RMSE values on the IK and WV-2 datasets. However, its performance against CC and SAM was often poor. Against CC, it yielded very low SROCC and PLCC values on the WV-2 and WV-3 datasets. Against SAM, it produced a high RMSE on the WV-3 (0.4130) and WV-2 (0.3345) datasets, indicating significant prediction errors. Despite these weaknesses, MQNR λ performed very strongly on the WV-4 dataset, tying for the best SROCC against CC and SAM. This confirms that MQNR λ is highly effective for predicting SID but is unreliable for predicting CC and SAM across different datasets.

3.12. Computational Complexity Analysis

To evaluate the practical efficiency of the proposed MVG-SDI, a runtime comparison against three widely used NR metrics, QNR λ , MQNR λ , and FQNR λ , was conducted. The computational complexity of the proposed method is primarily determined by the feature extraction and statistical fitting processes.
Theoretically, the complexity of extracting the FDD and CM features is linear with respect to the number of pixels N, i.e., O ( N ) . The subsequent MVG modeling involves calculating the covariance matrix of the feature vectors. With a fixed feature dimension ( D = 21 ) and a number of patches M proportional to the image size, the fitting process is efficient, scaling as O ( M · D 2 ) . Consequently, the overall computational complexity of the algorithm remains O ( N ) , ensuring it scales linearly with image resolution.
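The fitting and comparison stages whose cost is analyzed above can be sketched as follows. This is an illustrative sketch only: the actual FDD and CM feature extraction is omitted, and the pooled-covariance Mahalanobis form (as used in NIQE-style MVG comparisons) is an assumption about the exact distance formulation.

```python
import numpy as np

def fit_mvg(features: np.ndarray):
    """Fit an MVG model to an (M, D) array of per-patch feature vectors.
    Cost is dominated by the covariance computation, O(M * D^2)."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    return mu, cov

def mvg_distance(mu_ref, cov_ref, mu_fus, cov_fus, eps: float = 1e-9) -> float:
    """Mahalanobis-style distance between the reference and fused MVG
    models, using the pooled covariance; eps regularizes the solve."""
    diff = mu_ref - mu_fus
    pooled = (cov_ref + cov_fus) / 2.0 + eps * np.eye(len(diff))
    return float(np.sqrt(diff @ np.linalg.solve(pooled, diff)))
```

With the feature dimension fixed at D = 21, the distance computation itself is O(D^3) but constant in the image size, so the O(N) feature extraction dominates, consistent with the analysis above.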
To validate this empirically, the average execution time across the entire IKONOS dataset, which comprises 200 images with a spatial resolution of 1024 × 1024 pixels, was measured. All experiments were performed on a computer equipped with an Intel Core i5-1035G1 CPU @ 1.19 GHz and 16 GB of RAM, running on a 64-bit operating system. The algorithms were implemented in MATLAB R2024a.
Table 10 presents the average execution times. The results indicate that the proposed MVG-SDI is computationally efficient for practical applications. While it requires slightly more processing time than the simpler MQNR λ and QNR λ indices due to the statistical modeling involved, it is approximately 35% faster than the advanced FQNR λ metric. This demonstrates that the proposed method offers a favorable trade-off, providing sophisticated spectral distortion detection with a runtime comparable to established benchmarks.

4. Discussion

4.1. Decoupling Spectral Distortion from Spatial Artifacts

The most significant limitation of existing NR metrics, specifically QNR and its variants ( QNR λ , MQNR λ ), is their inability to effectively decouple spectral distortion from spatial information. As evidenced by the visual error maps in Figure 7, Figure 8, Figure 9 and Figure 10, these legacy metrics exhibit a strong “edge bias,” where high distortion scores are concentrated almost exclusively along high-frequency spatial edges (e.g., building boundaries). This suggests that these metrics are conflating spatial sharpening artifacts with spectral fidelity errors, leading to false positives in quality assessment.
In contrast, the proposed MVG-SDI generates error maps that align with the physical surfaces of objects rather than their outlines. By utilizing patch-based statistical fitting rather than pixel-to-pixel differences, the proposed method captures the distributional shift of spectral information. This confirms that the combination of FDD features derived from Benford’s Law and CMs successfully isolates radiometric inconsistencies from spatial sharpening effects, solving a persistent issue in the QNR framework.
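For illustration, the first-digit statistics underlying the FDD features can be sketched as follows. This is a generic sketch on arbitrary non-zero values; the paper computes these distributions on HCS angular components, which is omitted here.

```python
import numpy as np

def first_digit_distribution(values: np.ndarray) -> np.ndarray:
    """Empirical distribution of the leading decimal digits 1..9 of the
    non-zero entries of an array (returned as a length-9 vector)."""
    v = np.abs(values[values != 0]).astype(np.float64)
    exponents = np.floor(np.log10(v))
    first = np.floor(v / 10.0 ** exponents).astype(int)
    return np.bincount(first, minlength=10)[1:10] / first.size

# Benford's Law reference distribution: P(d) = log10(1 + 1/d), d = 1..9.
benford = np.log10(1.0 + 1.0 / np.arange(1, 10))
```

A deviation measure between the empirical distribution and the Benford reference then provides a naturalness cue that is invariant to spatial structure, consistent with the decoupling behavior discussed above.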

4.2. Metric Stability and Linearity

For a quality metric to be practically useful in algorithm optimization, it must exhibit monotonicity and linearity, meaning the score should degrade proportionally as the image quality worsens. The degradation simulations in Figure 15, Figure 16, Figure 17 and Figure 18 reveal a critical flaw in the state-of-the-art FQNR λ metric: it suffers from premature saturation. In saturation shift experiments, FQNR λ spikes to near-maximum error values with even minor degradations ( α S = 1.20 ), rendering it useless for fine-tuning fusion algorithms, as it cannot distinguish between “slightly bad” and “terrible”.
The proposed MVG-SDI, however, demonstrates a consistent, monotonic response across hue, saturation, and non-linear intensity gamma distortions. This linearity is crucial: it implies that the MVG-SDI provides a usable numerical gradient that accurately reflects the magnitude of error.

4.3. Generalization and Robustness

A critical advantage of the proposed MVG-SDI is its intrinsic capability to generalize across different sensor types and diverse imaging conditions without requiring retraining or parameter tuning. This robustness stems from the method’s unsupervised, instance-specific learning framework.
Unlike deep learning-based quality metrics that rely on fixed weights learned from a specific training distribution, which thus often suffer from domain shifts when applied to unseen sensors, the proposed method dynamically fits a unique MVG model for every individual image pair. The “ground truth” statistical reference ( μ r e f , Ψ r e f ) is derived directly from the original MS image of the scene under evaluation. Consequently, the specific spectral response of the sensor (e.g., the four-band configuration of IK vs. the eight-band configuration of WV-2) and the specific imaging conditions (e.g., atmospheric haze, solar angle, or seasonal vegetation changes) are automatically incorporated into the reference model.
Furthermore, the feature set employed, combining FDD based on Benford’s Law and CMs, captures fundamental statistical regularities of natural scenes rather than sensor-specific artifacts. The validity of this approach is empirically supported by the experimental results presented in Section 3, where the metric demonstrated consistent performance across four distinct satellite sensors (IK, WV-2, WV-3, and WV-4) covering varying spatial resolutions and spectral band configurations.

4.4. Discrepancies Between NR and FR Metrics

While Table 8 and Table 9 demonstrate strong overall correlations between MVG-SDI and FR metrics (e.g., SROCC up to 0.9000 with CC on WV-3), specific ranking disagreements in Table 4, Table 5, Table 6 and Table 7 raise questions about their origins. For example, on the WV-3 dataset, the MVG-SDI identifies GS as the best performer but flags BT-H (strong in SID) as the worst; on WV-4, it ranks PWMBF highest despite its suboptimal SAM. These discrepancies do not necessarily indicate a limitation of the MVG-SDI but rather highlight inherent constraints in the FR protocol itself. Wald’s RR protocol, while a standard benchmark, relies on downscaling assumptions that may not fully capture real-world full-resolution distortions. Issues such as scale-invariance failures at high resolution ratios, biases from MTF filters, and sensor aging effects (as noted in prior work [17]) can lead FR metrics to over- or under-penalize certain algorithms, particularly those introducing non-linear spectral changes not evident at reduced scales.
In contrast, the MVG-SDI’s NR design, focusing on multivariate statistical deviations (e.g., Mahalanobis distance between patch-based FDD and CM features), detects global and localized spectral inconsistencies without these scaling artifacts, making it more robust to such biases. That said, the MVG-SDI’s exclusive emphasis on spectral distortion (as a deliberate design choice) may contribute to disagreements in cases where spatial artifacts indirectly influence spectral statistics, though this is mitigated by its patch-based approach and decoupling from edge-biased features (Section 4.1). Ultimately, these divergences suggest that FR protocols, while valuable for validation, have limitations in generalizing to operational scenarios without a ground truth, positioning the MVG-SDI as a complementary tool that addresses these gaps rather than a flawed alternative.

4.5. Limitations and Future Research

4.5.1. Limitations

The primary limitation of the proposed MVG-SDI is its exclusive focus on spectral fidelity. By design, the feature extraction process, which applies Benford's Law to angular components and computes Color Moments, is engineered strictly to isolate radiometric inconsistencies and color shifts. While this effectively resolves the spectral–spatial conflation found in legacy metrics such as QNR, it renders the index blind to spatial distortions. Consequently, the MVG-SDI cannot detect structural degradations such as blurring or ghosting artifacts. For a holistic quality assessment, the index must currently be paired with a separate, dedicated spatial quality metric.

4.5.2. Future Work

Future research will focus on three strategic directions to maximize the utility of the MVG framework and advance the standardization of NR assessment:
Unified Quality Framework: The immediate goal is to develop a comprehensive Multivariate Gaussian framework capable of quantifying both spectral and spatial distortions within a single model. We aim to construct a complementary MVG-based spatial distortion index built on features such as gradient profile sharpness and Laplacian statistics. The critical challenge will be integrating these features without re-introducing the "coupling" effects that plague current benchmarks. By combining this future spatial index with the current MVG-SDI, we intend to create a holistic "MVG-QNR" metric that maintains the statistical robustness demonstrated in this work, specifically the stability against hue and saturation shifts, while independently penalizing spatial errors.
Loss Function for Deep Learning: Given that the MVG-SDI relies on differentiable statistical operations, a promising avenue is to integrate it as a perceptual loss function for training pansharpening CNNs. Unlike standard L1 or L2 losses that minimize pixel-wise errors, an MVG-SDI-based loss would force the network to explicitly prioritize the preservation of statistical spectral distributions, potentially mitigating the spectral color shifts often observed in deep learning-based fusion.
Extension to Hyperspectral and Multi-modal Imaging: While the current MVG-SDI is designed for pansharpening, the underlying multivariate statistical framework shows promise for other fusion tasks, such as infrared and visible image fusion. Specifically, the statistical foundation of this method, FDD in the HCS, is theoretically scalable to higher-dimensional data. Future work will investigate adapting the feature set to capture the specific statistical characteristics of hyperspectral pansharpening and thermal–visible fusion. In these domains, the preservation of precise radiometric signatures is paramount for downstream tasks such as material identification and classification.
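To make the loss-function direction concrete, the forward computation of such a term might look as follows. This is a hypothetical numpy sketch under stated assumptions: the penalty compares only per-band means and covariances (a simplification of the full MVG-SDI), the weight `lam` is illustrative, and in practice the loss would be written in an autograd framework, where every operation below is differentiable.

```python
import numpy as np

def mvg_spectral_penalty(fused, reference):
    """Forward pass of a hypothetical MVG-style spectral loss term.
    Pixels are treated as C-dimensional spectral samples; the penalty
    compares the means and covariances of the two fitted Gaussians.
    fused, reference: (H, W, C) arrays."""
    f = fused.reshape(-1, fused.shape[-1])
    r = reference.reshape(-1, reference.shape[-1])
    mu_gap = np.sum((f.mean(axis=0) - r.mean(axis=0)) ** 2)
    cov_gap = np.sum((np.cov(f, rowvar=False) - np.cov(r, rowvar=False)) ** 2)
    return float(mu_gap + cov_gap)

def fusion_loss(fused, reference, lam=0.1):
    """Pixel-wise L1 term plus the spectral-statistics penalty."""
    l1 = float(np.abs(fused - reference).mean())
    return l1 + lam * mvg_spectral_penalty(fused, reference)

rng = np.random.default_rng(3)
reference = rng.random((32, 32, 4))   # stand-in for an MS reference patch
shifted = reference.copy()
shifted[..., 0] += 0.2                # constant radiometric shift in one band
print(fusion_loss(reference, reference))  # 0.0: identical images
print(fusion_loss(shifted, reference))    # positive: spectral statistics differ
```

Unlike a pure L1 term, the statistical penalty grows even when a distortion is a smooth global shift of one band, which is precisely the class of errors the MVG-SDI is designed to flag.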

5. Conclusions

This paper presents the No-Reference Multivariate Gaussian-based Spectral Distortion Index (MVG-SDI), a specialized metric for evaluating spectral distortions in pansharpened remote sensing images. By overcoming limitations in existing NR methods, such as conflating spectral and spatial artifacts, the MVG-SDI isolates spectral fidelity through patch-based analysis, combining FDD features from Benford’s Law in HCS and CM features. These are fitted to MVG models for the original MS and fused images, with distortion measured via Mahalanobis distance.
Experiments on the NBU dataset across four sensors (IKONOS, WorldView-2/3/4) show the MVG-SDI outperforming FQNRλ and MQNRλ in correlations with FR metrics such as CC, SAM, and SID. It achieves high SROCC and PLCC values (e.g., 0.8089 SROCC and 0.9719 PLCC against CC on IK), with robust sensitivity to simulated distortions such as hue and saturation shifts.

Author Contributions

B.O.A.A.: conceptualization, methodology, software, validation, formal analysis, investigation, data curation, writing—original draft preparation, writing—review and editing, visualization, and project administration; X.L.: conceptualization, resources, writing—review and editing, supervision, project administration, and funding acquisition; J.W. and X.H.: validation and writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Practice and Innovation Funds for Graduate Students of Northwestern Polytechnical University and the Key Research and Development Program of Shaanxi Province (No. 2025CY-YBXM-079).

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found in [34].

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this paper.

References

  1. Yan, H.-F.; Zhao, Y.-Q.; Chan, J.C.-W.; Kong, S.G.; El-Bendary, N.; Reda, M. Hyperspectral and multispectral image fusion: When model-driven meet data-driven strategies. Inf. Fusion 2025, 116, 102803. [Google Scholar] [CrossRef]
  2. Wu, J.; Li, X.; Hao, X. A review of full resolution quality assessment for multispectral pansharpening. In Proceedings of the 2022 10th International Conference on Information Systems and Computing Technology (ISCTech), Guilin, China, 28–30 December 2022; pp. 35–42. [Google Scholar]
  3. Perretta, M.; Delogu, G.; Funsten, C.; Patriarca, A.; Caputi, E.; Boccia, L. Testing the impact of pansharpening using PRISMA hyperspectral data: A case study classifying urban trees in Naples, Italy. Remote Sens. 2024, 16, 3730. [Google Scholar] [CrossRef]
  4. Liang, J.; Zhao, Z. Remote sensing image fusion method in NSST domain combining multiscale morphological gradient and neural network. Proc. SPIE 2025, 13506, 135062F. [Google Scholar]
  5. Meng, G.; Huang, J.; Wang, Y.; Fu, Z.; Ding, X.; Huang, Y. Progressive high-frequency reconstruction for pan-sharpening with implicit neural representation. Proc. AAAI Conf. Artif. Intell. 2024, 38, 4122–4130. [Google Scholar] [CrossRef]
  6. Ciotola, M.; Guarino, G.; Vivone, G.; Poggi, G.; Chanussot, J.; Plaza, A.; Scarpa, G. Hyperspectral pansharpening: Critical review, tools and future perspectives. IEEE Geosci. Remote Sens. Mag. 2024, 12, 1–25. [Google Scholar] [CrossRef]
  7. Ahmad, M.; Ghous, U.; Usama, M.; Mazzara, M. WaveFormer: Spectral–Spatial Wavelet Transformer for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5. [Google Scholar] [CrossRef]
  8. Chen, Y.; Wan, Z.; Chen, Z.; Wei, M. CSLP: A novel pansharpening method based on compressed sensing and L-PNN. Inf. Fusion 2025, 118, 103002. [Google Scholar] [CrossRef]
  9. Deka, B.; Mullah, H.U.; Barman, T.; Rajan, K.S. Joint sparse representation-based single image super-resolution for remote sensing applications. IEEE Trans. Geosci. Remote Sens. 2023, 16, 2352–2365. [Google Scholar] [CrossRef]
  10. Sustika, R.; Suksmono, A.B.; Danudirdjo, D.; Wikantika, K. Remote sensing image pansharpening using deep internal learning with residual double-attention network. IEEE Access 2024, 12, 141627–141637. [Google Scholar] [CrossRef]
  11. Vivone, G.; Deng, L.-J.; Deng, S.; Hong, D.; Jiang, M.; Li, C.; Li, W.; Shen, H.; Wu, X.; Xiao, J.-L.; et al. Deep learning in remote sensing image fusion: Methods, protocols, data, and future perspectives. IEEE Geosci. Remote Sens. Mag. 2025, 13, 269–310. [Google Scholar] [CrossRef]
  12. Su, Z.; Yang, Y.; Huang, S.; Wan, W.; Sun, J.; Tu, W.; Chen, C. STCP: Synergistic transformer and convolutional neural network for pansharpening. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–15. [Google Scholar] [CrossRef]
  13. Wen, X.; Ma, H.; Li, L. A three-branch pansharpening network based on spatial and frequency domain interaction. Remote Sens. 2025, 17, 13. [Google Scholar] [CrossRef]
  14. Wald, L.; Ranchin, T.; Mangolini, M. Fusion of satellite images of different spatial resolutions: Assessing the quality of resulting images. Photogramm. Eng. Remote Sens. 1997, 63, 691–699. [Google Scholar]
  15. Gilbertson, J.K.; Kemp, J.; Niekerk, A.V. Effect of pan-sharpening multi-temporal landsat 8 imagery for crop type differentiation using different classification techniques. Comput. Electron. Agric. 2017, 134, 151–159. [Google Scholar] [CrossRef]
  16. Ren, K.; Sun, W.; Meng, X.; Yang, G.; Du, Q. Fusing China GF-5 hyperspectral data with GF-1, GF-2 and sentinel-2A multispectral data: Which methods should be used? Remote Sens. 2020, 12, 882. [Google Scholar] [CrossRef]
  17. Núñez, J.; Otazu, X.; Fors, O.; Prades, A.; Palà, V.; Arbiol, R. Multiresolution-based image fusion with additive wavelet decomposition. IEEE Trans. Geosci. Remote Sens. 1999, 37, 1204–1211. [Google Scholar]
  18. Alparone, L.; Aiazzi, B.; Baronti, S.; Garzelli, A.; Nencini, F.; Selva, M. Multispectral and panchromatic data fusion assessment without reference. Photogramm. Eng. Remote Sens. 2008, 74, 193–200. [Google Scholar] [CrossRef]
  19. Aiazzi, B.; Alparone, L.; Baronti, S.; Carlà, R.; Garzelli, A.; Santurri, L. Full-scale assessment of pansharpening methods and data products. In Image and Signal Processing for Remote Sensing; Bruzzone, L., Ed.; SPIE: Bellingham, WA, USA, 2014; Volume 9244, pp. 1–12. [Google Scholar]
  20. Choi, J.; Yu, K.; Kim, Y. A new adaptive component-substitution-based satellite image fusion by using partial replacement. IEEE Trans. Geosci. Remote Sens. 2011, 49, 295–309. [Google Scholar] [CrossRef]
  21. Aiazzi, B.; Alparone, L.; Baronti, S.; Garzelli, A.; Selva, M. MTF-tailored multiscale fusion of high-resolution MS and pan imagery. Photogramm. Eng. Remote Sens. 2006, 72, 591–596. [Google Scholar] [CrossRef]
  22. Aiazzi, B.; Alparone, L.; Baronti, S.; Garzelli, A. Context-driven fusion of high spatial and spectral resolution images based on oversampled multiresolution analysis. IEEE Trans. Geosci. Remote Sens. 2002, 40, 2300–2312. [Google Scholar] [CrossRef]
  23. Alparone, L.; Wald, L.; Chanussot, J.; Thomas, C.; Gamba, P.; Bruce, L.M. Comparison of pansharpening algorithms: Outcome of the 2006 GRS-S data fusion contest. IEEE Trans. Geosci. Remote Sens. 2007, 45, 3012–3021. [Google Scholar] [CrossRef]
  24. Restaino, R.; Mura, M.D.; Vivone, G.; Chanussot, J. Context-adaptive pansharpening based on image segmentation. IEEE Trans. Geosci. Remote Sens. 2017, 55, 753–766. [Google Scholar] [CrossRef]
  25. Vivone, G.; Restaino, R.; Mura, M.D.; Licciardi, G.; Chanussot, J. Contrast and error-based fusion schemes for multispectral image pansharpening. IEEE Geosci. Remote Sens. Lett. 2014, 11, 930–934. [Google Scholar] [CrossRef]
  26. Vivone, G.; Restaino, R.; Chanussot, J. A regression-based high-pass modulation pansharpening approach. IEEE Trans. Geosci. Remote Sens. 2018, 56, 984–996. [Google Scholar] [CrossRef]
  27. Restaino, R.; Vivone, G.; Dalla Mura, M.; Chanussot, J. Fusion of multispectral and panchromatic images based on morphological operators. IEEE Trans. Image Process. 2016, 25, 2882–2895. [Google Scholar] [CrossRef]
  28. Vivone, G.; Simões, M.; Dalla Mura, M.; Restaino, R.; Bioucas-Dias, J.; Licciardi, G.A.; Chanussot, J. Pansharpening based on semiblind deconvolution. IEEE Trans. Geosci. Remote Sens. 2015, 53, 1997–2010. [Google Scholar]
  29. Palsson, F.; Sveinsson, J.R.; Ulfarsson, M.O. A new pansharpening algorithm based on total variation. IEEE Geosci. Remote Sens. Lett. 2014, 11, 318–322. [Google Scholar] [CrossRef]
  30. Stępień, I.; Oszust, M. Three-branch neural network for no-reference quality assessment of pan-sharpened images. Eng. Appl. Artif. Intell. 2025, 139, 109594. [Google Scholar] [CrossRef]
  31. Zhou, B.; Shao, F.; Meng, X.; Fu, R.; Ho, Y.S. No-reference quality assessment for pansharpened images via opinion-unaware learning. IEEE Access 2020, 8, 112101–112111. [Google Scholar] [CrossRef]
  32. Meng, X.; Bao, K.; Shu, J.; Zhou, B.; Shao, F.; Sun, W.; Li, S. A blind full-resolution quality evaluation method for pansharpening. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–14. [Google Scholar] [CrossRef]
  33. Vivone, G.; Alparone, L.; Chanussot, J.; Dalla Mura, M.; Garzelli, A.; Licciardi, G.; Restaino, R.; Wald, L. A Critical Comparison Among Pansharpening Algorithms. IEEE Trans. Geosci. Remote Sens. 2015, 53, 2565–2586. [Google Scholar] [CrossRef]
  34. Meng, X.; Xiong, Y.; Shao, F.; Shen, H.; Sun, W.; Yang, G.; Yuan, Q.; Fu, R.; Zhang, H. A Large-Scale Benchmark Data Set for Evaluating Pansharpening Performance: Overview and Implementation. IEEE Geosci. Remote Sens. Mag. 2021, 9, 18–52. [Google Scholar] [CrossRef]
  35. Wu, J.; Li, X.; Wei, B.; Li, L. A No-Reference Spectral quality assessment method for Multispectral pansharpening. In Proceedings of the IGARSS 2023–2023 IEEE International Geoscience and Remote Sensing Symposium, Pasadena, CA, USA, 16–21 July 2023; pp. 5595–5598. [Google Scholar]
  36. Hao, X.; Li, X.; Wu, J.; Wei, B.; Song, Y.; Li, B. A No-Reference quality assessment method for Hyperspectral Sharpened images via Benford’s Law. Remote Sens. 2024, 16, 1167. [Google Scholar] [CrossRef]
  37. Varga, D. No-Reference Image Quality Assessment Based on the Fusion of Statistical and Perceptual Features. J. Imaging 2020, 6, 75. [Google Scholar] [CrossRef] [PubMed]
  38. Varga, D. Analysis of Benford’s Law for No-Reference Quality Assessment of Natural, Screen-Content, and Synthetic Images. Electronics 2021, 10, 2378. [Google Scholar] [CrossRef]
  39. Arienzo, A.; Vivone, G.; Garzelli, A.; Alparone, L.; Chanussot, J. Full-Resolution Quality Assessment of Pansharpening: Theoretical and hands-on approaches. IEEE Geosci. Remote Sens. Mag. 2022, 10, 168–201. [Google Scholar] [CrossRef]
  40. Wei, Y.; Yuan, Q.; Shen, H.; Zhang, L. Boosting the accuracy of multi-spectral image pan-sharpening by learning a deep residual network. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1795–1799. [Google Scholar] [CrossRef]
  41. Sheikh, H.R.; Sabir, M.F.; Bovik, A.C. A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. Image Process. 2006, 15, 3440–3451. [Google Scholar] [CrossRef]
  42. Horé, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 20th International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369. [Google Scholar]
  43. Li, L.; Song, S.; Lv, M.; Jia, Z.; Ma, H. Multi-Focus Image Fusion Based on Fractal Dimension and Parameter Adaptive Unit-Linking Dual-Channel PCNN in Curvelet Transform Domain. Fractal Fract. 2025, 9, 157. [Google Scholar] [CrossRef]
  44. Zhang, Z.; Zhang, S.; Meng, X.; Chen, L.; Shao, F. Perceptual quality assessment for pansharpened images based on deep feature similarity measure. Remote Sens. 2024, 16, 4621. [Google Scholar] [CrossRef]
Figure 1. Flowchart of the proposed MVG-SDI method for spectral distortion assessment.
Figure 2. FDD features of pristine MS images from various sensors in NBU database.
Figure 3. FDD of fused images from various sensors in NBU database.
Figure 4. FDD features of distorted images from various sensors in NBU database.
Figure 5. Heatmaps of HCS normalized angular components of the pansharpening results. (a) MS θ̄1, (b) MS θ̄2, (c) MS θ̄3, (d) IHS θ̄1, (e) IHS θ̄2, and (f) IHS θ̄3.
Figure 6. MS and PAN image samples from IK (a,b), WV-2 (c,d), WV-3 (e,f), and WV-4 (g,h).
Figure 7. Visual comparison on the IK dataset: (a) QNR Dλ, (b) FQNR Dλ, (c) MQNR Dλ, (d) proposed, (e) MS, and (f) GS.
Figure 8. Visual comparison on the WV-2 dataset: (a) QNR Dλ, (b) FQNR Dλ, (c) MQNR Dλ, (d) proposed, (e) MS, and (f) GS.
Figure 9. Visual comparison on the WV-3 dataset: (a) QNR Dλ, (b) FQNR Dλ, (c) MQNR Dλ, (d) proposed, (e) MS, and (f) GS.
Figure 10. Visual comparison on the WV-4 dataset: (a) QNR Dλ, (b) FQNR Dλ, (c) MQNR Dλ, (d) proposed, (e) MS, and (f) GS.
Figure 11. Close-ups of the main fusion results for the IK dataset (bands: RGB): (a) BT-H, (b) BDSD, (c) GS, (d) AWLP, (e) MTF-GLP, (f) MTF-GLP-HPM-FS, (g) PNN, (h) A-PNN, and (i) A-PNN-FT.
Figure 12. Close-ups of the main fusion results for the WV-2 dataset (bands: RGB): (a) BT-H, (b) BDSD, (c) GS, (d) AWLP, (e) MTF-GLP, (f) MTF-GLP-HPM-FS, (g) PNN, (h) A-PNN, and (i) A-PNN-FT.
Figure 13. Close-ups of the main fusion results for the WV-3 dataset (bands: RGB): (a) BT-H, (b) BDSD, (c) GS, (d) AWLP, (e) MTF-GLP, (f) MTF-GLP-HPM-FS, (g) PNN, (h) A-PNN, and (i) A-PNN-FT.
Figure 14. Close-ups of the main fusion results for the WV-4 dataset (bands: RGB): (a) BT-H, (b) BDSD, (c) GS, (d) AWLP, (e) MTF-GLP, (f) MTF-GLP-HPM-FS, (g) PNN, (h) A-PNN, and (i) A-PNN-FT.
Figure 15. The evaluation of the degradations on the IK dataset using the GS method.
Figure 16. The evaluation of the degradations on the WV-2 dataset using the GS method.
Figure 17. The evaluation of the degradations on the WV-3 dataset using the GS method.
Figure 18. The evaluation of the degradations on the WV-4 dataset using the GS method.
Figure 19. Visual examples of simulated spectral degradations. The top row (a–c) shows hue shift artifacts with ΔH values of 0.05, 0.10, and 0.15. The middle row (d–f) shows saturation shift applied with a factor αS of 1.2, 1.4, and 1.6. The bottom row (g–i) shows intensity gamma correction applied with γ values of 0.4, 0.6, and 0.8.
Table 1. Details of the NBU dataset used for validation.
Sensor | Image Pairs | PAN Dimensions | MS Dimensions | MS Bands
IK | 200 | 1024 × 1024 pixels | 256 × 256 pixels | 4
WV-2 | 500 | 1024 × 1024 pixels | 256 × 256 pixels | 8
WV-3 | 160 | 1024 × 1024 pixels | 256 × 256 pixels | 8
WV-4 | 500 | 1024 × 1024 pixels | 256 × 256 pixels | 4
Table 2. Summary of typical pansharpening methods used for quality assessment.
Category | Fusion Method | Description
CS | BT-H | Brovey transform with haze correction [17]
 | BDSD | Band-dependent spatial-detail-based [9]
 | BDSD-PC | BDSD with physical constraints [10]
 | GS | Gram–Schmidt [7]
 | PRACS | Partial replacement adaptive CS [20]
MRA | AWLP | Additive wavelet luminance proportional [11]
 | C-MTF-GLP-CBD | MTF-GLP-CBD [21,22,23] with local parameter estimation exploiting clustering [24]
 | MTF-GLP | GLP with modulation transfer function-matched filter [12]
 | MTF-GLP-CBD | MTF-GLP [21,22] context-based decision with regression-based injection model [23]
 | MTF-GLP-FS | MTF-GLP with a full-scale regression-based injection model [40]
 | MTF-GLP-HPM | MTF-GLP high-pass modulation injection model [15]
 | MTF-GLP-HPM-H | MTF-GLP-HPM with haze correction [16]
 | MTF-GLP-HPM-R | MTF-GLP-HPM [21,25] with preliminary regression-based spectral matching phase [26]
 | MF | Non-linear decomposition scheme exploiting half-gradient morphological filters [27]
VO | FE-HPM | Filter estimation based on a semi-blind deconvolution framework and HPM injection model [28]
 | SR-D | Sparse representation of injected details [14]
 | PWMBF | Principal component analysis/wavelet model-based fusion [18]
 | TV | Total variation pansharpening [29]
ML | PNN | Pansharpening neural network [19]
 | PNN-IDX | PNN with input auxiliary indexes [19]
 | A-PNN | Advanced PNN [19]
 | A-PNN-FT | A-PNN with fine tuning [19]
Table 3. Fixed parameters and implementation settings for MVG-SDI.
Parameter | Value | Description
Patch size | 32 × 32 | Size of non-overlapping blocks for local feature extraction.
HCS normalization | 2/π | Constant used to normalize angular components θk to [0, 1].
FDD bins | 9 | Digits {1, …, 9} used for First Digit Distribution analysis.
CM features | 12 | Mean, std. dev., and skewness calculated for 4 channels (R, G, B, NIR).
FDD features | 9 | Probability of leading digits derived from HCS angular components.
Total dimensionality | 21 | Concatenated feature vector size per patch.
Table 4. The numerical evaluation results on the IK dataset (best results in red, worst in blue).
FR evaluation: CC, SAM, SID. NR evaluation: QNRλ, MQNRλ, FQNRλ, Ours.
Category | Fusion Model | CC | SAM | SID | QNRλ | MQNRλ | FQNRλ | Ours
CS | BT-H | 0.9366 | 4.0570 | 0.0081 | 0.0930 | 3.2136 | 0.3108 | 0.1320
 | BDSD | 0.9296 | 4.2885 | 0.0095 | 0.0630 | 1.9446 | 0.0616 | 0.1001
 | BDSD-PC | 0.9297 | 4.2824 | 0.0094 | 0.0387 | 1.9316 | 0.0611 | 0.0986
 | GS | 0.9005 | 4.5496 | 0.0102 | 0.0439 | 1.8151 | 0.1346 | 0.0442
 | PRACS | 0.9105 | 4.3131 | 0.0090 | 0.0384 | 2.6681 | 0.0436 | 0.0365
MRA | AWLP | 0.9105 | 4.3131 | 0.0090 | 0.1288 | 2.9369 | 0.0250 | 0.1146
 | C-MTF-GLP-CBD | 0.9333 | 4.0767 | 0.0085 | 0.0872 | 2.4064 | 0.0271 | 0.0834
 | MTF-GLP | 0.9246 | 4.3186 | 0.0095 | 0.1465 | 2.8717 | 0.0306 | 0.1330
 | MTF-GLP-CBD | 0.9275 | 4.2574 | 0.0090 | 0.0980 | 2.5289 | 0.0282 | 0.0979
 | MTF-GLP-FS | 0.9285 | 4.2224 | 0.0089 | 0.1027 | 2.5313 | 0.0283 | 0.1015
 | MTF-GLP-HPM | 0.9306 | 4.0504 | 0.0083 | 0.1425 | 2.9444 | 0.0286 | 0.1314
 | MTF-GLP-HPM-H | 0.9366 | 4.0601 | 0.0081 | 0.1145 | 3.3280 | 0.0262 | 0.1031
 | MTF-GLP-HPM-R | 0.9341 | 3.9898 | 0.0081 | 0.0969 | 2.3714 | 0.0265 | 0.0975
 | MF | 0.9285 | 4.0357 | 0.0083 | 0.1225 | 2.9477 | 0.0356 | 0.1126
VO | FE-HPM | 0.9345 | 4.0220 | 0.0082 | 0.1134 | 2.8001 | 0.0295 | 0.0914
 | SR-D | 0.9284 | 4.0172 | 0.0083 | 0.1008 | 2.2626 | 0.0138 | 0.1466
 | PWMBF | 0.9129 | 4.6427 | 0.0107 | 0.1621 | 2.5900 | 0.0982 | 0.0789
 | TV | 0.9355 | 3.8442 | 0.0079 | 0.0305 | 2.4211 | 0.0419 | 0.0799
ML | PNN | 0.9509 | 3.3943 | 0.0062 | 0.0195 | 2.3445 | 0.0225 | 0.1017
 | PNN-IDX | 0.9508 | 3.4613 | 0.0064 | 0.0275 | 2.6387 | 0.0233 | 0.1028
 | A-PNN | 0.9505 | 3.3978 | 0.0064 | 0.0056 | 2.2108 | 0.0216 | 0.1027
 | A-PNN-FT | 0.9344 | 3.7313 | 0.0073 | 0.0103 | 2.2794 | 0.0215 | 0.1059
Table 5. The numerical evaluation results on the WV-2 dataset (best results in red, worst in blue).
FR evaluation: CC, SAM, SID. NR evaluation: QNRλ, MQNRλ, FQNRλ, Ours.
Category | Fusion Model | CC | SAM | SID | QNRλ | MQNRλ | FQNRλ | Ours
CS | BT-H | 0.9476 | 6.5261 | 0.0323 | 0.0075 | 2.0961 | 0.1981 | 0.0655
 | BDSD | 0.9362 | 7.6800 | 0.0513 | 0.0453 | 3.8600 | 0.2084 | 0.0895
 | BDSD-PC | 0.9434 | 7.2859 | 0.0453 | 0.0174 | 3.2862 | 0.1545 | 0.0398
 | GS | 0.9137 | 7.5721 | 0.0432 | 0.0172 | 1.9843 | 0.0157 | 0.0258
 | PRACS | 0.9215 | 8.3073 | 0.0529 | 0.0189 | 4.0798 | 0.0613 | 0.0395
MRA | AWLP | 0.9215 | 8.3073 | 0.0529 | 0.0554 | 5.4484 | 0.0222 | 0.1054
 | C-MTF-GLP-CBD | 0.9370 | 7.1024 | 0.0437 | 0.0334 | 2.9036 | 0.0279 | 0.0539
 | MTF-GLP | 0.9458 | 6.7286 | 0.0369 | 0.0696 | 3.4910 | 0.0245 | 0.0942
 | MTF-GLP-CBD | 0.9382 | 7.3092 | 0.0438 | 0.0579 | 3.4599 | 0.0249 | 0.0871
 | MTF-GLP-FS | 0.9395 | 7.2155 | 0.0428 | 0.0604 | 3.4824 | 0.0248 | 0.0905
 | MTF-GLP-HPM | 0.9467 | 6.8251 | 0.0369 | 0.0653 | 3.0096 | 0.0250 | 0.1002
 | MTF-GLP-HPM-H | 0.9479 | 6.4964 | 0.0320 | 0.0703 | 3.6573 | 0.0246 | 0.1003
 | MTF-GLP-HPM-R | 0.9387 | 7.4596 | 0.0447 | 0.0548 | 3.4259 | 0.0255 | 0.0936
 | MF | 0.9093 | 7.2319 | 0.0400 | 0.0662 | 3.2228 | 0.0317 | 0.1050
VO | FE-HPM | 0.9183 | 7.2296 | 0.0401 | 0.0621 | 3.0923 | 0.0270 | 0.0999
 | SR-D | 0.9257 | 6.8961 | 0.0387 | 0.0492 | 2.7420 | 0.0373 | 0.1506
 | PWMBF | 0.9322 | 7.5824 | 0.0447 | 0.1018 | 3.2240 | 0.0644 | 0.0955
 | TV | 0.9444 | 6.5577 | 0.0363 | 0.0180 | 4.0350 | 0.0337 | 0.1053
ML | PNN | 0.9444 | 6.0717 | 0.0264 | 0.0607 | 5.3151 | 0.0489 | 0.0369
 | PNN-IDX | 0.9452 | 5.9437 | 0.0262 | 0.0797 | 5.7830 | 0.0520 | 0.0373
 | A-PNN | 0.9409 | 6.1417 | 0.0280 | 0.0687 | 4.8982 | 0.0425 | 0.0448
 | A-PNN-FT | 0.9332 | 6.3961 | 0.0318 | 0.0546 | 4.1734 | 0.0457 | 0.0587
Table 6. The numerical evaluation results on the WV-3 dataset (best results in red, worst in blue).
FR evaluation: CC, SAM, SID. NR evaluation: QNRλ, MQNRλ, FQNRλ, Ours.
Category | Fusion Model | CC | SAM | SID | QNRλ | MQNRλ | FQNRλ | Ours
CS | BT-H | 0.9488 | 6.3993 | 0.0316 | 0.0175 | 1.7406 | 0.4923 | 0.2899
 | BDSD | 0.9493 | 6.7784 | 0.0405 | 0.0318 | 2.9233 | 0.1881 | 0.0989
 | BDSD-PC | 0.9504 | 6.6868 | 0.0378 | 0.0567 | 2.1396 | 0.1762 | 0.0852
 | GS | 0.9381 | 7.6466 | 0.0409 | 0.0275 | 1.9514 | 0.0225 | 0.0243
 | PRACS | 0.9436 | 6.8305 | 0.0391 | 0.0244 | 2.6945 | 0.1000 | 0.0316
MRA | AWLP | 0.9436 | 6.8305 | 0.0391 | 0.0721 | 2.8280 | 0.0372 | 0.0981
 | C-MTF-GLP-CBD | 0.9442 | 6.9864 | 0.0424 | 0.0401 | 2.6732 | 0.0465 | 0.0485
 | MTF-GLP | 0.9507 | 6.4342 | 0.0330 | 0.0983 | 2.9332 | 0.0426 | 0.0917
 | MTF-GLP-CBD | 0.9498 | 6.4429 | 0.0338 | 0.0797 | 2.6882 | 0.0429 | 0.0785
 | MTF-GLP-FS | 0.9498 | 6.4494 | 0.0340 | 0.0831 | 2.7234 | 0.0428 | 0.0825
 | MTF-GLP-HPM | 0.9448 | 6.4866 | 0.0360 | 0.0866 | 2.8600 | 0.0446 | 0.1140
 | MTF-GLP-HPM-H | 0.9497 | 6.3894 | 0.0319 | 0.1091 | 3.3463 | 0.0421 | 0.1075
 | MTF-GLP-HPM-R | 0.9486 | 6.6016 | 0.0372 | 0.0665 | 2.7434 | 0.0443 | 0.0994
 | MF | 0.9370 | 6.6055 | 0.0361 | 0.0917 | 2.9459 | 0.0427 | 0.0992
VO | FE-HPM | 0.9447 | 6.5691 | 0.0360 | 0.0843 | 2.8014 | 0.0425 | 0.0982
 | SR-D | 0.9381 | 6.7831 | 0.0668 | 0.0689 | 2.2704 | 0.0251 | 0.1525
 | PWMBF | 0.9403 | 7.2251 | 0.0403 | 0.1579 | 3.6927 | 0.0939 | 0.1069
 | TV | 0.9485 | 6.7804 | 0.0489 | 0.0249 | 2.9708 | 0.0458 | 0.1308
ML | PNN | 0.9248 | 7.5878 | 0.0426 | 0.0490 | 3.6573 | 0.0831 | 0.1112
 | PNN-IDX | 0.9048 | 9.2221 | 0.0966 | 0.0615 | 4.6988 | 0.1403 | 0.1358
 | A-PNN | 0.9207 | 6.8288 | 0.0407 | 0.0696 | 3.9070 | 0.0571 | 0.1146
 | A-PNN-FT | 0.9444 | 6.3169 | 0.0353 | 0.0507 | 3.7100 | 0.0672 | 0.0857
Table 7. The numerical evaluation results on the WV-4 dataset (best results in red, worst in blue).
FR evaluation: CC, SAM, SID. NR evaluation: QNRλ, MQNRλ, FQNRλ, Ours.
Category | Fusion Model | CC | SAM | SID | QNRλ | MQNRλ | FQNRλ | Ours
CS | BT-H | 0.9558 | 3.7117 | 0.0083 | 0.0894 | 2.2341 | 0.3840 | 0.1517
 | BDSD | 0.9582 | 3.8653 | 0.0089 | 0.0174 | 1.6269 | 0.1075 | 0.0363
 | BDSD-PC | 0.9585 | 3.8471 | 0.0088 | 0.0174 | 1.6269 | 0.1075 | 0.0363
 | GS | 0.9571 | 3.7687 | 0.0082 | 0.0359 | 1.7188 | 0.1549 | 0.0286
 | PRACS | 0.9629 | 3.9529 | 0.0092 | 0.0264 | 1.5953 | 0.0859 | 0.0205
MRA | AWLP | 0.9629 | 3.9529 | 0.0092 | 0.0370 | 1.7995 | 0.0293 | 0.0361
 | C-MTF-GLP-CBD | 0.9596 | 3.9070 | 0.0096 | 0.0359 | 1.9339 | 0.0310 | 0.0188
 | MTF-GLP | 0.9609 | 3.6775 | 0.0090 | 0.0619 | 1.9296 | 0.0326 | 0.0397
 | MTF-GLP-CBD | 0.9628 | 3.8338 | 0.0090 | 0.0586 | 1.9327 | 0.0319 | 0.0344
 | MTF-GLP-FS | 0.9625 | 3.8248 | 0.0090 | 0.0585 | 1.9313 | 0.0319 | 0.0344
 | MTF-GLP-HPM | 0.9615 | 3.6997 | 0.0080 | 0.0609 | 2.0237 | 0.0327 | 0.0391
 | MTF-GLP-HPM-H | 0.9571 | 3.8501 | 0.0090 | 0.0364 | 1.7022 | 0.0326 | 0.0466
 | MTF-GLP-HPM-R | 0.9626 | 3.9861 | 0.0094 | 0.0565 | 2.0015 | 0.0320 | 0.0348
 | MF | 0.9618 | 3.6794 | 0.0079 | 0.0674 | 2.1406 | 0.0371 | 0.0433
VO | FE-HPM | 0.9659 | 3.6680 | 0.0078 | 0.0587 | 2.0166 | 0.0307 | 0.0380
 | SR-D | 0.9585 | 3.9080 | 0.0212 | 0.0256 | 1.6037 | 0.0144 | 0.0680
 | PWMBF | 0.9584 | 4.4891 | 0.0109 | 0.0612 | 2.0441 | 0.0812 | 0.0136
 | TV | 0.9628 | 4.0023 | 0.0095 | 0.0279 | 1.9946 | 0.0383 | 0.0288
Table 8. Consistency analysis results of NR and FR metrics on the IK and WV-2 datasets (best results are highlighted in red).
FR Metric | NR Metric | SROCC (IK) | PLCC (IK) | RMSE (IK) | SROCC (WV-2) | PLCC (WV-2) | RMSE (WV-2)
CC | Ours | 0.8089 | 0.9719 | 0.0032 | 0.7000 | 0.9594 | 0.0036
 | FQNRλ | 0.7308 | 0.8602 | 0.0113 | 0.6000 | 0.8791 | 0.0951
 | MQNRλ | 0.6070 | 0.5602 | 0.0113 | 0.5000 | 0.0022 | 0.0129
SAM | Ours | 0.8903 | 0.8765 | 0.0752 | 0.8801 | 0.9510 | 0.0095
 | FQNRλ | 0.8810 | 0.7720 | 0.0993 | 0.8605 | 0.7718 | 0.3686
 | MQNRλ | 0.6000 | 0.7933 | 0.0951 | 0.7000 | 0.8168 | 0.3345
SID | Ours | 0.7800 | 0.8326 | 0.0004 | 0.8704 | 0.8313 | 0.0066
 | FQNRλ | 0.8700 | 0.8276 | 0.0004 | 0.8505 | 0.7412 | 0.0061
 | MQNRλ | 0.9000 | 0.9369 | 0.0002 | 0.9000 | 0.8800 | 0.0035
Table 9. Consistency analysis results of NR and FR metrics on the WV-3 and WV-4 datasets (best results are highlighted in red).
FR Metric | NR Metric | SROCC (WV-3) | PLCC (WV-3) | RMSE (WV-3) | SROCC (WV-4) | PLCC (WV-4) | RMSE (WV-4)
CC | Ours | 0.9000 | 0.9937 | 0.0005 | 0.6669 | 0.9183 | 0.0009
 | FQNRλ | 0.8000 | 0.5306 | 0.0039 | 0.9747 | 0.9878 | 0.0004
 | MQNRλ | 0.3000 | 0.4301 | 0.0046 | 0.9747 | 0.5610 | 0.0024
SAM | Ours | 0.9000 | 0.9839 | 0.1251 | 0.6669 | 0.8017 | 0.0496
 | FQNRλ | 0.8000 | 0.9628 | 0.3444 | 0.9747 | 0.9968 | 0.0066
 | MQNRλ | 0.3000 | 0.1331 | 0.4130 | 0.9747 | 0.9929 | 0.0098
SID | Ours | 0.7800 | 0.9484 | 0.0011 | 0.7591 | 0.8909 | 0.0003
 | FQNRλ | 0.7000 | 0.9459 | 0.0011 | 0.8751 | 0.9939 | 0.0001
 | MQNRλ | 0.4000 | 0.9459 | 0.0011 | 0.8721 | 0.9929 | 0.0002
Table 10. Average execution time (in seconds) on the IKONOS dataset.
Metric | Average Time (s)
MQNRλ | 0.6898
QNRλ | 1.1077
Ours | 1.9955
FQNRλ | 3.0951