New Combined Metric for Full-Reference Image Quality Assessment

Mariusz Frackiewicz; Łukasz Machalica; Henryk Palus

doi:10.3390/sym16121622

Department of Data Science and Engineering, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland

* Author to whom correspondence should be addressed.

Symmetry 2024, 16(12), 1622; https://doi.org/10.3390/sym16121622

This article belongs to the Section Computer

Abstract

In recent years, many new metrics highly correlated with the Mean Opinion Score (MOS) have been proposed for assessing image quality through Full-Reference Image Quality Assessment (FR-IQA) methods, such as MDSI, HPSI, and GMSD. Eight of these selected metrics, which compare reference and distorted images in a symmetrical manner, are briefly described in this article, and their performance is evaluated using correlation criteria (PLCC, SROCC, and KROCC), as well as RMSE. The aim of this paper is to develop a new, efficient quality index based on a combination of several high-performance metrics already utilized in the field of Image Quality Assessment (IQA). The study was conducted on four benchmark image databases (TID2008, TID2013, KADID-10k, and PIPAL) and identified the three best-performing metrics for each database. The paper introduces a New Combined Metric (NCM), which is a weighted sum of three component metrics, and demonstrates its superiority over each of its component metrics across all the examined databases. An optimization method for determining the weights of the NCM is also presented. Additionally, an alternative version of the combined metric, based on the fastest metrics and employing symmetric calculations for pairs of compared images, is discussed. This version also demonstrates strong performance.

Keywords:

image quality assessment; quality metrics; combined metrics; image databases; mean opinion scores

1. Introduction

Every day, a huge number of digital cameras generate a vast stream of images. Due to the multitude of applications of imaging devices in areas such as vision-based quality control of components in manufacturing processes, security monitoring, object detection systems in automotive applications, and analysis of diagnostic images in medicine, there has been a strong increase in the demand for Image Quality Assessment (IQA) methods.

Image quality can be assessed either subjectively or objectively. Subjective methods rely on the perceptual evaluation of image quality by human observers, which means that conducting these assessments incurs significant financial costs and requires a large number of participants. In contrast, objective methods utilize mathematical models to determine the values of various metrics related to image quality. Among Image Quality Assessment (IQA) methods, the most advanced are those that perform a symmetrical comparison between distorted images and their originals, referred to as Full-Reference IQA (FR-IQA) techniques. The scores generated by each IQA metric can be evaluated against subjective assessments, such as the Mean Opinion Score (MOS) or the Difference Mean Opinion Score (DMOS), derived from human viewers. For FR-IQA methods, the correlation coefficients obtained from comparisons with MOS indicate the effectiveness of the metric: the higher the coefficient, the more closely the metric aligns with human perception. For many years, efforts have been made in the field of FR-IQA to improve and refine existing quality metrics. Significant importance is attached to attempts to combine one metric with other quality measures to increase the correlation of the resulting quality index with MOS. Meanwhile, the rapid development of machine-learning and deep-learning techniques provides new methods for image quality assessment. The FR-IQA problem can be understood as a challenge in developing mathematical models that can perceptually assess the image quality in alignment with human judgment.

Over the years, numerous metrics for FR-IQA have been proposed that take various aspects of the Human Visual System (HVS) into account. Recently, attempts have been made to enhance the effectiveness of FR-IQA by combining existing metrics to create a “super” index. Theoretical foundations for such metric fusion can be found in Liu’s work [], where it is applied to classical metrics such as PSNR, VSNR, SSIM, and VIF. In the paper by Okarma [], the properties of three FR-IQA metrics (MS-SSIM, VIF, and R-SVD) were analyzed, and a combined quality metric based on their product was proposed. This Combined Quality Metric (CQM) achieved higher values of all three correlation coefficients with MOS than its individual component metrics.

Later, this concept was further developed using optimization or regression techniques to determine the optimal weights or exponents in the products of existing FR-IQA indices [,]. The Combined Image Similarity Index (CISI) proposed by Okarma in [] employs metrics similar to those used in the CQM. However, instead of the R-SVD metric, it utilizes the FSIMc metric. The CISI index demonstrates a higher correlation with Mean Opinion Score (MOS) compared to the CQM index. In the 2013 article by Okarma [], a new EHIS metric is introduced, which is based on the product of four multipliers: two familiar from CQM (MS-SSIM and VIF) and two new ones (WFSIMc and RFSIM). This approach improves the correlation with MOS over the combined CISI metric.

Another metrics fusion strategy was proposed by the author of [], who presented several versions of a linear combination (weighted sum) based on metrics selected from a dozen FR-IQA metrics. He referred to these new combined metrics as Linearly Combined Similarity Measures (LCSIM). The use of linear combination metrics requires determining the weighting factors for the FR-IQA metrics, which is achieved by solving the RMSE error minimization task using a genetic algorithm.

Another line of work in creating methods based on the fusion of metrics utilizes machine-learning techniques. One example can be seen in [], where the results of six traditional FR-IQA metrics (FSIMc, PSNR-HMA, PSNR-HVS, SFF, SR-SIM, and VIF) were used as a feature vector for training and testing a four-layer neural network. The output results produced by the neural network demonstrate a significant improvement over those achieved by the input metrics. Currently, deep neural networks, particularly CNNs, can learn the best combinations of metrics to optimize image quality assessment [,].

Combining metrics leads to increased computational complexity, which can be an issue in the context of real-time applications. Nevertheless, the fusion of quality metrics for FR-IQA is a promising research direction that could significantly improve the correlation of objective quality assessments with MOS.

In this paper, we consider a new combined metric for FR-IQA, based on metrics that are highly correlated with MOS and well-known from the literature. The structure of this paper is as follows: following the introduction, Section 2 provides an overview of eight relatively new and promising FR-IQA metrics. Section 3 presents their linear combination, along with the experimental results. Finally, Section 4 concludes the paper.

2. Overview of Highly Correlated FR-IQA Metrics

2.1. Feature SIMilarity Index FSIMc (Color Version)

In [], the Feature SIMilarity Index (FSIM) was introduced as a quality assessment metric for grayscale images, along with its color version, FSIMc. The local quality of the assessed image,

f_{2}

, when symmetrically compared to the reference image,

f_{1}

, is represented by two low-level similarity maps derived from phase congruences (

P C_{1}

,

P C_{2}

) []:

S_{P C} (x, y) = \frac{2 P C_{1} (x, y) \cdot P C_{2} (x, y) + T_{1}}{P C_{1}^{2} (x, y) + P C_{2}^{2} (x, y) + T_{1}},

(1)

and gradient magnitudes (

G_{1}

,

G_{2}

):

S_{G} (x, y) = \frac{2 G_{1} (x, y) \cdot G_{2} (x, y) + T_{2}}{G_{1}^{2} (x, y) + G_{2}^{2} (x, y) + T_{2}},

(2)

where

G_{1}

and

G_{2}

are the gradient magnitudes of the two images computed with the Scharr operator, and

T_{1}

and

T_{2}

are positive constants introduced to enhance the stability of the formulas. Phase Congruence (PC) quantifies the presence and intensity of local features, including edges, corners, and textures. It is derived from the analysis of phase information in the frequency domain of the image (specifically, the phase of its Fourier transform). As an additional weighting factor for the similarity maps, the following

P C_{m}

is used:

P C_{m} (x, y) = \max (P C_{1} (x, y), P C_{2} (x, y)) .

(3)

The final expression for the proposed image quality index is given by:

F S I M = \frac{\sum_{(x, y) \in Ω} S_{P C} (x, y) \cdot S_{G} (x, y) \cdot P C_{m} (x, y)}{\sum_{(x, y) \in Ω} P C_{m} (x, y)} .

(4)

FSIMc, the color extension of FSIM, incorporates the chrominance components I and Q. The calculation of the index values starts by decomposing the compared images,

f_{1}

and

f_{2}

, into their YIQ color components, where Y represents luminance and I and Q represent chrominance. Similar to the earlier defined similarity maps, additional similarity maps for the I and Q components were introduced:

S_{I} (x, y) = \frac{2 I_{1} (x, y) \cdot I_{2} (x, y) + T_{3}}{I_{1}^{2} (x, y) + I_{2}^{2} (x, y) + T_{3}},

(5)

S_{Q} (x, y) = \frac{2 Q_{1} (x, y) \cdot Q_{2} (x, y) + T_{4}}{Q_{1}^{2} (x, y) + Q_{2}^{2} (x, y) + T_{4}},

(6)

where

T_{3}

and

T_{4}

are positive constants. The overall chrominance similarity map is given by:

S_{C} (x, y) = S_{I} (x, y) \cdot S_{Q} (x, y) .

(7)

The inclusion of chromatic components in the FSIM index results in the following version of the formula for color images:

F S I M c = \frac{\sum_{(x, y) \in Ω} S_{P C} (x, y) \cdot S_{G} (x, y) \cdot {[S_{C} (x, y)]}^{λ} \cdot P C_{m} (x, y)}{\sum_{(x, y) \in Ω} P C_{m} (x, y)},

(8)

where the positive value of the

λ

exponent highlights the significance of chrominance in the color image quality assessment process. For subsequent studies utilizing FSIMc, the following parameter values were employed, as specified in []:

T_{1} = 0.85

,

T_{2} = 160

,

T_{3} = T_{4} = 200

, and

λ = 0.03

.
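As a rough illustration of Equations (1) to (4), the sketch below pools two similarity maps with the phase-congruency weights. It assumes that the phase congruency maps and gradient magnitude maps have already been computed elsewhere; the function names `similarity` and `fsim_from_maps` and the random test arrays are hypothetical and serve only to show the pooling step, not a complete FSIM implementation.

```python
import numpy as np

def similarity(a, b, t):
    """Pointwise similarity (2ab + t) / (a^2 + b^2 + t), the form of Eqs. (1)-(2)."""
    return (2.0 * a * b + t) / (a**2 + b**2 + t)

def fsim_from_maps(pc1, pc2, g1, g2, t1=0.85, t2=160.0):
    """Pool precomputed PC and gradient maps as in Eq. (4).

    pc1, pc2: phase congruency maps (assumed to be given).
    g1, g2:   gradient magnitude maps (assumed to be given).
    """
    s_pc = similarity(pc1, pc2, t1)      # Eq. (1)
    s_g = similarity(g1, g2, t2)         # Eq. (2)
    pc_m = np.maximum(pc1, pc2)          # Eq. (3), weighting map
    return float(np.sum(s_pc * s_g * pc_m) / np.sum(pc_m))

# Hypothetical stand-in maps, used only to exercise the function.
rng = np.random.default_rng(0)
pc = rng.random((8, 8))
g = rng.random((8, 8)) * 255.0
score_identical = fsim_from_maps(pc, pc, g, g)
score_distorted = fsim_from_maps(pc, rng.random((8, 8)),
                                 g, rng.random((8, 8)) * 255.0)
```

For identical inputs every similarity value equals one, so the pooled score is one; any mismatch between the maps lowers it.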

2.2. Mean Deviation Similarity Index MDSI

Many IQA metrics work as follows: they determine local distortions in the images, build similarity maps, and implement a pooling strategy based on the mean, weighted mean, standard deviation, etc. An example of this approach to IQA index modeling is the Mean Deviation Similarity Index (MDSI), described in []. The calculation of MDSI starts with converting the RGB color space components of the input images to a luminance component:

L = 0.2989 R + 0.5870 G + 0.1140 B

(9)

and two chromaticity components:

\begin{bmatrix} H \\ M \end{bmatrix} = \begin{bmatrix} 0.30 & 0.04 & - 0.35 \\ 0.34 & - 0.60 & 0.17 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} .

(10)

This index is derived from the calculation of gradient similarity (

G S

) for structural distortions and chromaticity similarity (

C S

) for color distortions. The local structural similarity map is typically computed using gradient values. Traditionally, structural similarity maps are obtained by calculating the gradient values separately for the original and distorted images. The traditional approach for the MDSI metric has been improved by integrating the gradient value map, which combines the luminance channel values from both images:

f = 0.5 (L_{r} + L_{d}),

(11)

where f represents the fused luminance image, r is the reference image, and d refers to the distorted image. The formulas for the proposed structural similarity are given below:

G S_{r f} (x) = \frac{2 G_{r} (x) G_{f} (x) + C_{2}}{G_{r}^{2} (x) + G_{f}^{2} (x) + C_{2}},

(12)

G S_{d f} (x) = \frac{2 G_{d} (x) G_{f} (x) + C_{2}}{G_{d}^{2} (x) + G_{f}^{2} (x) + C_{2}},

(13)

\hat{G S} (x) = G S (x) + [G S_{d f} (x) - G S_{r f} (x)] .

(14)

Here, GS(x) denotes the conventional gradient similarity computed directly between the reference and distorted images, which has the same form as Equations (12) and (13) but uses the gradient magnitudes of r and d. The gradient magnitude is calculated using the simple Prewitt operator. Additionally, the authors of the MDSI index have adjusted the method for evaluating local chromaticity similarity. Metrics such as FSIM or VSI assess chromaticity similarity separately for the two chrominance components and then combine the results. In the case of MDSI, it was suggested to calculate the color similarity for both chrominance components simultaneously, using the following formula:

\hat{C S} (x) = \frac{2 (H_{r} (x) H_{d} (x) + M_{r} (x) M_{d} (x)) + C_{3}}{H_{r}^{2} (x) + H_{d}^{2} (x) + M_{r}^{2} (x) + M_{d}^{2} (x) + C_{3}},

(15)

where

C_{3}

is a constant introduced for numerical stability. The joint color similarity map,

\hat{C S} (x)

, is then combined with the

\hat{G S} (x)

map using a weighted mean:

\hat{G C S} (x) = α \hat{G S} (x) + (1 - α) \hat{C S} (x),

(16)

where

α

determines the relative importance of the similarity maps

\hat{G S} (x)

and

\hat{C S} (x)

. The final step involves converting the resulting

\hat{G C S}

map into an MDSI score through deviation pooling, that is, the mean absolute deviation of the fourth-root values from their mean:

M D S I = {[\frac{1}{N} \sum_{i = 1}^{N} | {\hat{G C S}}_{i}^{1 / 4} - (\frac{1}{N} \sum_{i = 1}^{N} {\hat{G C S}}_{i}^{1 / 4}) |]}^{1 / 4} .

(17)

The original article [] provides suggestions for selecting various parameters that influence the performance of the MDSI index.
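The first and last steps of MDSI, the color conversion of Equations (9) and (10) and the deviation pooling of Equation (17), can be sketched as follows. The function names are hypothetical. Casting the map to complex values is an assumption of this sketch, made so that the fractional power stays defined if the combined map contains negative entries.

```python
import numpy as np

# Rows: L (Eq. 9), then H and M (Eq. 10); columns: R, G, B.
RGB_TO_LHM = np.array([
    [0.2989, 0.5870, 0.1140],
    [0.30, 0.04, -0.35],
    [0.34, -0.60, 0.17],
])

def rgb_to_lhm(img):
    """Convert an (height, width, 3) RGB array to the L, H and M planes."""
    lhm = img.astype(float) @ RGB_TO_LHM.T
    return lhm[..., 0], lhm[..., 1], lhm[..., 2]

def deviation_pooling(gcs):
    """Deviation pooling of Eq. (17) applied to a combined similarity map."""
    q = np.power(gcs.astype(complex), 0.25)   # fourth root of every entry
    mean_abs_dev = np.mean(np.abs(q - q.mean()))
    return float(mean_abs_dev ** 0.25)

# A gray pixel (R = G = B = 1) has almost no chromaticity.
lum, hue, mix = rgb_to_lhm(np.ones((1, 1, 3)))
# A two-valued map: fourth roots are 1 and 2, so the mean deviation is 0.5.
toy_score = deviation_pooling(np.array([[1.0, 16.0]]))
```

A perfectly uniform similarity map gives a score close to zero, which reflects that MDSI measures the spread of local similarity rather than its average level.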

2.3. Haar Wavelet Perceptual Similarity Index HPSI

The Haar Wavelet-based Perceptual Similarity Index (HPSI) [] is a relatively novel and computationally efficient similarity metric for FR-IQA. HPSI uses coefficients obtained from Haar wavelet decomposition for assessing local similarities between two images. The one-dimensional Haar filters are given by:

h_{1}^{1 D} = \frac{1}{\sqrt{2}} \cdot [1, 1],

(18)

g_{1}^{1 D} = \frac{1}{\sqrt{2}} \cdot [- 1, 1],

(19)

where

h_{1}^{1 D}

represents the low-pass scaling filter and

g_{1}^{1 D}

refers to the corresponding high-pass wavelet filter. For any scale,

j \in \mathbb{N}

, two-dimensional Haar filters can be constructed as follows:

g_{j}^{(1)} = g_{j}^{1 D} \otimes h_{j}^{1 D},

(20)

g_{j}^{(2)} = h_{j}^{1 D} \otimes g_{j}^{1 D},

(21)

where the symbol ⊗ denotes the outer product, and the one-dimensional filters

h_{j}^{1 D}

and

g_{j}^{1 D}

for

j > 1

are defined as:

g_{j}^{1 D} = h_{1}^{1 D} * {(g_{j - 1}^{1 D})}_{↑ 2},

(22)

h_{j}^{1 D} = h_{1}^{1 D} * {(h_{j - 1}^{1 D})}_{↑ 2},

(23)

where

↑ 2

is the dyadic upsampling operator and ∗ denotes the one-dimensional convolution operator. To effectively predict the perceptual similarity perceived by human viewers, it may be beneficial to apply an additional nonlinear mapping to the local similarities derived from the high-frequency Haar wavelet filter responses. This nonlinearity is represented by a logistic function, defined with a parameter

α > 0

, as follows:

l_{α} (x) = \frac{1}{1 + e^{- α x}} .

(24)

For two grayscale images,

f_{1}

and

f_{2}

, the local similarity measure employed to calculate the HPSI is derived from the first two steps of the two-dimensional discrete Haar wavelet transform, as expressed by the following formula:

H S_{f_{1}, f_{2}}^{(k)} [x] = l_{α} (\frac{1}{2} \sum_{j = 1}^{2} S (| (g_{j}^{(k)} * f_{1}) [x] |, | (g_{j}^{(k)} * f_{2}) [x] |, C)),

(25)

where

C > 0

,

k \in {1, 2}

selects either horizontal or vertical Haar wavelet filters, S denotes the similarity measure, and ∗ is the two-dimensional convolution operator. Similar to FSIMc, HPSI also applies a specific weighting map, which is derived here from the response of a single low-frequency Haar wavelet filter:

W_{f}^{(k)} [x] = | (g_{3}^{(k)} * f) [x] |,

(26)

where

k \in {1, 2}

again differentiates between horizontal and vertical filters. The final expression for the HPSI for grayscale images

f_{1}

and

f_{2}

is provided as a weighted average of the local similarity map,

H S_{f_{1}, f_{2}}^{(k)}

:

H P S I_{f_{1}, f_{2}} = l_{α}^{- 1} {(\frac{\sum_{x} \sum_{k = 1}^{2} H S_{f_{1}, f_{2}}^{(k)} [x] \cdot W_{f_{1}, f_{2}}^{(k)} [x]}{\sum_{x} \sum_{k = 1}^{2} W_{f_{1}, f_{2}}^{(k)} [x]})}^{2},

(27)

where:

W_{f_{1}, f_{2}}^{(k)} [x] = max (W_{f_{1}}^{(k)} [x], W_{f_{2}}^{(k)} [x]) .

(28)

HPSI can be extended for color images in the YIQ color space using a third local similarity map based on the chrominance components I and Q. This map,

H S_{f_{1}, f_{2}}^{(3)}

, is defined as:

H S_{f_{1}, f_{2}}^{(3)} [x] = l_{α} (\frac{1}{2} (S (| (m * f_{1}^{I}) [x] |, | (m * f_{2}^{I}) [x] |, C) + S (| (m * f_{1}^{Q}) [x] |, | (m * f_{2}^{Q}) [x] |, C))),

(29)

where m is a

2 \times 2

mean filter and then:

W_{f_{1}^{Y}, f_{2}^{Y}}^{(3)} [x] = \frac{1}{2} (W_{f_{1}^{Y}, f_{2}^{Y}}^{(1)} [x] + W_{f_{1}^{Y}, f_{2}^{Y}}^{(2)} [x]) .

(30)

The final form of the HPSI for color images is defined as follows:

H P S I c_{f_{1}, f_{2}} = l_{α}^{- 1} {(\frac{\sum_{x} \sum_{k = 1}^{3} H S_{f_{1}, f_{2}}^{(k)} [x] \cdot W_{f_{1}^{Y}, f_{2}^{Y}}^{(k)} [x]}{\sum_{x} \sum_{k = 1}^{3} W_{f_{1}^{Y}, f_{2}^{Y}}^{(k)} [x]})}^{2} .

(31)

The fast computation time of the HPSI may explain its high usefulness in various tasks.
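The filter construction of Equations (18) to (24) can be illustrated with a short sketch. It builds the one-dimensional Haar filters at scale j by repeated dyadic upsampling and convolution, and forms the two-dimensional filters as outer products. The function names and the value of α used in the example are hypothetical, and no claim is made here about which of the two 2D filters is the horizontal one.

```python
import numpy as np

H1 = np.array([1.0, 1.0]) / np.sqrt(2.0)    # low-pass filter, Eq. (18)
G1 = np.array([-1.0, 1.0]) / np.sqrt(2.0)   # high-pass filter, Eq. (19)

def upsample2(f):
    """Dyadic upsampling: insert one zero between consecutive samples."""
    out = np.zeros(2 * len(f) - 1)
    out[::2] = f
    return out

def haar_1d(j):
    """Return (h_j, g_j) obtained from the recursions of Eqs. (22)-(23)."""
    h, g = H1, G1
    for _ in range(j - 1):
        h, g = np.convolve(H1, upsample2(h)), np.convolve(H1, upsample2(g))
    return h, g

def haar_2d(j):
    """Return the two 2D filters of Eqs. (20)-(21) as outer products."""
    h, g = haar_1d(j)
    return np.outer(g, h), np.outer(h, g)

def logistic(x, alpha):
    """Logistic nonlinearity of Eq. (24)."""
    return 1.0 / (1.0 + np.exp(-alpha * x))

h2, g2 = haar_1d(2)          # scale-2 filters of length 4
filt_a, filt_b = haar_2d(3)  # scale-3 filters of size 8 x 8
mid_value = logistic(0.0, alpha=4.2)   # alpha chosen arbitrarily for the demo
```

At scale 2 the recursion yields the familiar length-four Haar filters, a constant low-pass filter and a step-shaped high-pass filter.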

2.4. Visual Saliency with Color Appearance and Gradient Similarity Index VCGS

The VCGS index [] uses the CIELAB color space and combines three feature similarity maps: the visual saliency with color appearance similarity map,

S_{V C}

, gradient similarity map,

S_{G}

, and chrominance similarity map,

S_{C}

. The first of these maps is calculated using a formula based on visual saliency with color appearance (VC) for both images:

S_{V C} = \frac{2 V C_{1} \cdot V C_{2} + K_{V C}}{V C_{1}^{2} + V C_{2}^{2} + K_{V C}},

(32)

where

K_{V C}

is a small constant that controls the numerical stability of the formula. The gradient similarity map using the Scharr operator applied to the L component is calculated according to the formula:

S_{G} = \frac{2 G_{1} \cdot G_{2} + K_{G}}{G_{1}^{2} + G_{2}^{2} + K_{G}},

(33)

where

K_{G}

is a small constant that controls the numerical stability of the formula. The third map measures the similarity of the

a^{*}

and

b^{*}

chrominance components in the CIELAB color space and is given by the formula:

S_{C} = \frac{2 a_{1} \cdot a_{2} + K_{C}}{a_{1}^{2} + a_{2}^{2} + K_{C}} \cdot \frac{2 b_{1} \cdot b_{2} + K_{C}}{b_{1}^{2} + b_{2}^{2} + K_{C}},

(34)

where

K_{C}

is a small constant that controls the numerical stability of the formula. The final form of the VCGS metric is given by the following formula:

VCGS = \frac{\sum_{Ω} S_{V C} \cdot {(S_{G})}^{α} \cdot {(S_{C})}^{λ} \cdot V C_{m}}{\sum_{Ω} V C_{m}},

(35)

where

Ω

is the spatial domain,

V C_{m} = \max (V C_{1}, V C_{2})

is used to weight the relevance of two maps in overall similarity, and

α

and

λ

are exponents that control the relative importance of the gradient and chrominance similarity maps, respectively.

2.5. SuperPixel SIMilarity Index SPSIM

Superpixel-based SIMilarity (SPSIM) [] utilizes superpixel segmentation for feature extraction. Superpixels are clusters of neighboring pixels that share similar characteristics, such as color, intensity, or structure. This pixel grouping results in a mosaic consisting of a significantly smaller number of superpixels, which facilitates faster subsequent processing. A key advantage of using superpixel-based segmentation over other oversegmentation algorithms is the ability to predefine the number of generated superpixels. Additionally, superpixel segmentation improves the distinction of perceptually significant regions in the image. Among the various superpixel generation algorithms, we can distinguish between graph-based, gradient-based, clustering-based, and watershed-based methods, among others []. The shape and size of superpixels depend on the applied algorithm, with each pixel belonging to exactly one superpixel. These algorithms control the number and properties of the superpixels, such as compactness and minimum size. One of the most popular and efficient algorithms for superpixel segmentation is the k-means-based Simple Linear Iterative Clustering (SLIC) algorithm []. This algorithm is notable for producing superpixels with a consistent shape and size. A key benefit of SLIC is that segmentation only requires specifying the desired number of superpixels in the output image. Consequently, the SLIC algorithm is used in the SPSIM quality index discussed in this paper. For each superpixel, the algorithm calculates the mean CIELAB color values and the Local Binary Pattern (LBP) features. Superpixels are initially generated on the reference image and then applied to both the reference and distorted images.

The SPSIM index calculation algorithm relies on pixel gradient similarity and luminance-chrominance superpixel similarity. The YUV color space, rather than RGB, is utilized for SPSIM computation, where Y represents luminance and U and V denote chrominance components. If

s_{i}

is used to represent a superpixel containing pixel i, the following formulas can be written for luminance

L_{i}

and luminance similarity

M_{L} (i)

:

L_{i} = \frac{1}{|s_{i}|} \sum_{j \in s_{i}} Y (j), M_{L} (i) = \frac{2 L_{r} (i) L_{d} (i) + T_{1}}{L_{r}^{2} (i) + L_{d}^{2} (i) + T_{1}},

(36)

where

Y (j)

represents the luminance of pixel j and

L_{r} (i)

and

L_{d} (i)

denote the average luminance values for superpixel

s_{i}

in the reference and distorted images, respectively.

T_{1}

is a positive constant introduced to prevent instability in the equation. Similar expressions can be formulated for both the U and V chrominance components:

U_{i} = \frac{1}{|s_{i}|} \sum_{j \in s_{i}} U (j), M_{U} (i) = \frac{2 U_{r} (i) U_{d} (i) + T_{1}}{U_{r}^{2} (i) + U_{d}^{2} (i) + T_{1}},

(37)

V_{i} = \frac{1}{|s_{i}|} \sum_{j \in s_{i}} V (j), M_{V} (i) = \frac{2 V_{r} (i) V_{d} (i) + T_{1}}{V_{r}^{2} (i) + V_{d}^{2} (i) + T_{1}} .

(38)

The chrominance similarity,

M_{C}

, can then be calculated as shown below:

M_{C} (i) = M_{U} (i) M_{V} (i) .

(39)

The gradient similarity,

M_{G}

, is described by the following formula:

M_{G} (i) = \frac{2 G_{r} (i) G_{d} (i) + T_{2}}{G_{r}^{2} (i) + G_{d}^{2} (i) + T_{2}},

(40)

where the gradient magnitude, G, is composed of two components calculated using a simple Prewitt operator, and

T_{1}

and

T_{2}

are constants selected by the authors of the algorithm to account for contrast-related errors. Further information on the determination of

T_{1}

and

T_{2}

can be found in []. The formula for calculating the similarity of superpixel i in both images is as follows:

M (i) = M_{G} (i) {[M_{L} (i)]}^{α} e^{β (M_{C} (i) - 1)},

(41)

where the parameters

α

and

β

represent the weights for the luminance and chrominance components, respectively. Finally, the SPSIM index is calculated as a weighted average of

M (i)

, where the weights are determined from the texture complexity (

T C

), described by the standard deviation (

s t d

) and kurtosis (

K u r t

) of the superpixels:

T C_{r} (i) = \frac{s t d (S_{r} (i))}{K u r t [S_{r} (i)] + 3}, T C_{d} (i) = \frac{s t d (S_{d} (i))}{K u r t [S_{d} (i)] + 3},

(42)

w (i) = \exp (0.05 \cdot | T C_{d} (i) - T C_{r} (i) |),

(43)

S P S I M = \frac{\sum_{i = 1}^{N} M (i) w (i)}{\sum_{i = 1}^{N} w (i)},

(44)

where

S_{r} (i)

and

S_{d} (i)

are, respectively, the superpixels in the reference and distorted images that contain the i-th pixel.
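The superpixel averaging of Equation (36) and the weighted pooling of Equation (44) can be sketched as below. The label map is assumed to come from a superpixel algorithm such as SLIC run on the reference image, with labels numbered consecutively from zero. The function names, the toy arrays, and the value used for the stability constant are hypothetical, and the texture-complexity weights of Equations (42) and (43) are taken as given rather than computed.

```python
import numpy as np

def superpixel_mean(channel, labels):
    """Replace every pixel by the mean of its superpixel (L_i in Eq. 36)."""
    flat = labels.ravel()
    sums = np.bincount(flat, weights=channel.ravel().astype(float))
    counts = np.bincount(flat)
    return (sums / counts)[labels]

def similarity(a, b, t):
    """Similarity (2ab + t) / (a^2 + b^2 + t) used for M_L, M_U, M_V and M_G."""
    return (2.0 * a * b + t) / (a**2 + b**2 + t)

def weighted_pool(m, w):
    """Weighted average of Eq. (44)."""
    return float(np.sum(m * w) / np.sum(w))

# Toy 2 x 4 image with two superpixels (left half and right half).
labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
y_ref = np.array([[10.0, 20.0, 100.0, 120.0],
                  [30.0, 40.0, 110.0, 130.0]])
y_dist = 0.5 * y_ref                      # a darkened "distorted" image

l_ref = superpixel_mean(y_ref, labels)    # 25 on the left, 115 on the right
l_dist = superpixel_mean(y_dist, labels)
m_l = similarity(l_ref, l_dist, t=1.0)    # t is an arbitrary demo value
uniform_score = weighted_pool(m_l, np.ones_like(m_l))
```

Because the same label map is applied to both images, the superpixel means are compared region by region, which is what makes the luminance and chrominance terms robust to small pixel-level misalignments.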

2.6. Local Global Variation Index LGV and Saliency Weighted Local Global Variation Index SWLGV

Varga [] introduced new quality indices that utilize both gradients in the image and Grünwald–Letnikov fractional derivatives. While gradients capture local variations within the image, fractional derivatives describe global variations, represented by the following global similarity map:

S_{G} (x, y) = \frac{2 \cdot {}^{G L}D^{α} R (x, y) \cdot {}^{G L}D^{α} D (x, y) + c_{1}}{{({}^{G L}D^{α} R (x, y))}^{2} + {({}^{G L}D^{α} D (x, y))}^{2} + c_{1}},

(45)

where

R (x, y)

is the reference image,

D (x, y)

is the distorted image,

{}^{G L}D^{α}

is the

α

-order Grünwald–Letnikov fractional derivative, and

c_{1}

is a constant number that provides numerical stability. The order of the fractional derivative was set to

α = 0.6

. The

3 \times 3

Scharr operator was used to compute the gradients for the local gradient map,

S_{L} (x, y)

:

S_{L} (x, y) = \frac{2 \cdot G_{R} (x, y) \cdot G_{D} (x, y) + c_{2}}{G_{R}^{2} (x, y) + G_{D}^{2} (x, y) + c_{2}},

(46)

where

c_{2}

is a constant that ensures numerical stability. The similarity map between the two compared images was calculated using the two previously defined gradient maps and the exponential coefficient

λ = 0.7

:

S (x, y) = {(S_{G} (x, y))}^{λ} \cdot {(S_{L} (x, y))}^{1 - λ} .

(47)

Finally, the similarity map is averaged over all pixel positions. The resulting index is referred to as the Local Global Variation (LGV):

L G V = \frac{1}{M \cdot N} \sum_{x = 1}^{M} \sum_{y = 1}^{N} S (x, y),

(48)

where

M \cdot N

is the number of pixels in each image.

The SWLGV index, in contrast to LGV, also incorporates the mechanism of visual saliency. It emphasizes the differences between the reference and distorted images in the most distinctive regions. By labeling the maps of the distinguishing regions as

S M_{R} (x, y)

for the reference image and

S M_{D} (x, y)

for the distorted image, we can create a formula for the image pair:

S M (x, y) = \max (S M_{R} (x, y), S M_{D} (x, y)),

(49)

where

S M_{R} (x, y)

and

S M_{D} (x, y)

are visual saliency maps built as proposed in []. The SWLGV index is defined as the weighted average of

S (x, y)

and

S M (x, y)

, where

S M (x, y)

represents the weights:

S W L G V = \frac{\sum_{x = 1}^{M} \sum_{y = 1}^{N} S M (x, y) \cdot S (x, y)}{\sum_{x = 1}^{M} \sum_{y = 1}^{N} S M (x, y)} .

(50)
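The difference between the two pooling schemes of Equations (48) and (50) is easy to see on a small example. The sketch below assumes that the global and local similarity maps and the saliency maps are already available; the function names and the toy values are hypothetical.

```python
import numpy as np

def combine(s_global, s_local, lam=0.7):
    """Combine the global and local maps with exponents lam and 1 - lam, Eq. (47)."""
    return s_global**lam * s_local**(1.0 - lam)

def lgv(s):
    """Plain average over all pixels, Eq. (48)."""
    return float(s.mean())

def swlgv(s, sm_ref, sm_dist):
    """Saliency-weighted average, Eqs. (49)-(50)."""
    sm = np.maximum(sm_ref, sm_dist)
    return float(np.sum(sm * s) / np.sum(sm))

# Toy similarity map: the right column is degraded.
s = np.array([[1.0, 0.5],
              [1.0, 0.5]])
# Toy saliency maps: the degraded column is also the salient one.
saliency = np.array([[0.1, 0.9],
                     [0.1, 0.9]])

plain = lgv(s)                            # 0.75
weighted = swlgv(s, saliency, saliency)   # 0.55
```

When the distortion falls on a salient region, the saliency-weighted score drops further than the plain average, which is the behavior SWLGV is meant to capture.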

2.7. Gradient Magnitude Similarity Deviation Index GMSD

GMSD [] is a relatively simple metric, which is based on a gradient similarity map and uses a

3 \times 3

Prewitt filter. The magnitudes of the gradients of images r and d at position i, denoted by

m_{r} (i)

and

m_{d} (i)

, are calculated as follows:

m_{r} (i) = \sqrt{{(r \otimes h_{x})}^{2} (i) + {(r \otimes h_{y})}^{2} (i)},

(51)

m_{d} (i) = \sqrt{{(d \otimes h_{x})}^{2} (i) + {(d \otimes h_{y})}^{2} (i)},

(52)

where ⊗ denotes a convolution operation. The magnitude gradient similarity map,

G M S (i)

, is then calculated as follows:

G M S (i) = \frac{2 m_{r} (i) m_{d} (i) + c}{m_{r}^{2} (i) + m_{d}^{2} (i) + c},

(53)

where c is a constant that provides numerical stability. The formulas above treat the reference and distorted images symmetrically. The mean value of the

G M S (i)

map was then determined as:

G M S M = \frac{1}{N} \sum_{i = 1}^{N} G M S (i),

(54)

where N is the number of pixels in the image. Finally, the GMSD index is defined as the standard deviation of the GMS map:

G M S D = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(G M S (i) - G M S M)}^{2}} .

(55)
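Equations (51) to (55) translate almost directly into code. The following is a minimal sketch for grayscale arrays with values in the range 0 to 255. The 1/3 scaling of the Prewitt kernels and the value c = 170 are assumptions of this sketch, and any preprocessing such as downsampling is omitted. The helper `filter3x3` is hypothetical and computes a correlation rather than a convolution, which changes only the sign of the responses and therefore not the gradient magnitude.

```python
import numpy as np

# 3 x 3 Prewitt kernels; the 1/3 scaling is an assumption of this sketch.
HX = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=float) / 3.0
HY = HX.T

def filter3x3(img, kernel):
    """'Valid' 3 x 3 correlation implemented with array slicing."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * img[dy:dy + h - 2, dx:dx + w - 2]
    return out

def gradient_magnitude(img):
    """Gradient magnitude of Eqs. (51)-(52)."""
    return np.sqrt(filter3x3(img, HX)**2 + filter3x3(img, HY)**2)

def gmsd(ref, dist, c=170.0):
    """GMSD of Eqs. (53)-(55); c = 170 is an assumed value for 8-bit images."""
    m_r, m_d = gradient_magnitude(ref), gradient_magnitude(dist)
    gms = (2.0 * m_r * m_d + c) / (m_r**2 + m_d**2 + c)   # Eq. (53)
    return float(gms.std())       # population standard deviation, Eq. (55)

rng = np.random.default_rng(0)
ref = rng.random((16, 16)) * 255.0
noisy = ref + rng.normal(0.0, 20.0, ref.shape)
score_same = gmsd(ref, ref)
score_noisy = gmsd(ref, noisy)
```

Unlike the similarity indices above, GMSD is a distortion measure: identical images give zero, and larger values indicate lower quality.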

2.8. Evaluation Criteria for IQA

Individual IQA metrics are commonly compared with the subjective ratings of specific images. To assess the linearity, monotonicity, and accuracy of these predictions, four criteria are used: the Pearson Linear Correlation Coefficient (PLCC), the Spearman Rank Order Correlation Coefficient (SROCC), the Kendall Rank Order Correlation Coefficient (KROCC), and the Root Mean Squared Error (RMSE). The formulas for these comparisons are provided below:

P L C C = \frac{\sum_{i = 1}^{N} (p_{i} - \bar{p}) (s_{i} - \bar{s})}{\sqrt{\sum_{i = 1}^{N} {(p_{i} - \bar{p})}^{2} \sum_{i = 1}^{N} {(s_{i} - \bar{s})}^{2}}},

(56)

where

p_{i}

and

s_{i}

represent the raw values of the subjective and objective measures, respectively, and

\bar{p}

and

\bar{s}

are the mean values of the subjective and objective measures.

Spearman Rank Order Correlation Coefficient is given by the formula:

S R O C C = 1 - \frac{6 \sum_{i = 1}^{N} d_{i}^{2}}{N (N^{2} - 1)},

(57)

where

d_{i}

represents the difference between the ranks of both measures for the i-th observation and N is the total number of observations.

Kendall Rank Order Correlation Coefficient (KROCC) is provided by the formula:

K R O C C = \frac{N_{c} - N_{d}}{0.5 (N - 1) N},

(58)

where

N_{c}

and

N_{d}

denote the counts of concordant and discordant pairs.

Root Mean Squared Error (RMSE) is given by the following equation:

R M S E = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(p_{i} - s_{i})}^{2}},

(59)

where

p_{i}

and

s_{i}

are defined as above.

The above correlation coefficients are useful tools for objectively assessing the agreement between IQA computational models and subjective MOS assessments. However, they capture specific aspects of this relationship, such as linearity in the case of PLCC or monotonicity in the cases of SROCC and KROCC. SROCC and KROCC are suitable for scenarios where the relationship between variables is nonlinear, with KROCC offering high robustness to small changes in the data. RMSE, on the other hand, is a measure of error primarily used to evaluate the accuracy of a model’s predictions. Unlike PLCC, SROCC, and KROCC, it is not a measure of correlation, and its role is fundamentally different. RMSE quantifies the average distance between predicted and actual values. It is sensitive to the magnitude of errors because the differences are squared before averaging, meaning that larger deviations have a disproportionate impact on the final value. High RMSE values indicate poor agreement between predicted and actual values, but RMSE does not provide information about the type of relationship (e.g., whether it is monotonic, linear, or otherwise).
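The four criteria of Equations (56) to (59) can be written in a few lines of plain Python. This is a minimal sketch that assumes no tied values, since Equation (57) and the simple pair counting of Equation (58) hold in that form only without ties. The score lists are made up for illustration.

```python
import math

def plcc(p, s):
    """Pearson linear correlation coefficient, Eq. (56)."""
    mp, ms = sum(p) / len(p), sum(s) / len(s)
    num = sum((a - mp) * (b - ms) for a, b in zip(p, s))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) *
                    sum((b - ms) ** 2 for b in s))
    return num / den

def ranks(values):
    """1-based ranks in ascending order (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def srocc(p, s):
    """Spearman rank order correlation coefficient, Eq. (57)."""
    n = len(p)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(p), ranks(s)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def krocc(p, s):
    """Kendall rank order correlation coefficient, Eq. (58)."""
    n = len(p)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (p[i] - p[j]) * (s[i] - s[j])
            concordant += sign > 0
            discordant += sign < 0
    return (concordant - discordant) / (0.5 * (n - 1) * n)

def rmse(p, s):
    """Root mean squared error, Eq. (59)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, s)) / len(p))

# Made-up subjective scores and predictions with one swapped pair.
mos = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.2, 1.9, 3.5, 3.4, 5.1]
```

In this toy example a single swapped pair lowers SROCC to 0.9 and KROCC to 0.8, while PLCC and RMSE additionally reflect how far the predicted values are from the subjective ones.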

As recommended in [], a nonlinear mapping was applied to calculate PLCC and RMSE. This process involves the use of a fitting function, usually a logistic function with five beta parameters,

β_{1}, β_{2}, β_{3}, β_{4}, β_{5}

, to better represent the relationship between the objective metric score, x, and MOS:

p (x, β) = β_{1} (\frac{1}{2} - \frac{1}{1 + \exp (β_{2} (x - β_{3}))}) + β_{4} x + β_{5} .

(60)
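The mapping of Equation (60) is simple to evaluate once the five parameters are known. The sketch below only evaluates the function; the parameter values are invented for illustration. In practice the parameters would be fitted to the data, for example with a nonlinear least-squares routine such as `scipy.optimize.curve_fit`, which is not shown here.

```python
import math

def logistic5(x, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping of Eq. (60)."""
    return b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (x - b3)))) + b4 * x + b5

# Hypothetical parameter values, not fitted to any database.
betas = (4.0, 10.0, 0.8, 0.5, 3.0)

at_center = logistic5(0.8, *betas)   # logistic term vanishes at x = b3
below = logistic5(0.7, *betas)
above = logistic5(0.9, *betas)
```

At x = β3 the logistic term is zero and only the linear part β4·x + β5 remains, which gives a quick sanity check for an implementation.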

3. The New Combined Metric (NCM) and Its Experimental Research

The New Combined Metric (NCM) is defined as a weighted sum of three component metrics:

N C M = α \cdot M_{1} + β \cdot M_{2} + γ \cdot M_{3},

(61)

where

M_{1}

,

M_{2}

, and

M_{3}

are the selected FR-IQA metrics for a given dataset and

α

,

β

, and

γ

are the optimized weights.
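To make Equation (61) concrete, the sketch below combines three metric score lists and searches for weights on a coarse grid. This is only an illustrative stand-in and not the optimization method used in this paper. The constraint that the weights are non-negative and sum to one, the use of the absolute PLCC as the objective, the function names, and all score values are assumptions of the sketch.

```python
import math
from itertools import product

def ncm(m1, m2, m3, alpha, beta, gamma):
    """Weighted sum of three metric score lists, Eq. (61)."""
    return [alpha * a + beta * b + gamma * c for a, b, c in zip(m1, m2, m3)]

def plcc(p, s):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mp, ms = sum(p) / len(p), sum(s) / len(s)
    num = sum((a - mp) * (b - ms) for a, b in zip(p, s))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) *
                    sum((b - ms) ** 2 for b in s))
    return num / den if den > 0 else 0.0

def grid_search(m1, m2, m3, mos, steps=10):
    """Try all weight triples on a grid with alpha + beta + gamma = 1."""
    best_score, best_w = -1.0, None
    for i, j in product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue
        w = (i / steps, j / steps, (steps - i - j) / steps)
        score = abs(plcc(ncm(m1, m2, m3, *w), mos))
        if score > best_score:
            best_score, best_w = score, w
    return best_score, best_w

# Made-up scores of three metrics for five images.
mos = [1.0, 2.0, 3.0, 4.0, 5.0]
m1 = [0.30, 0.35, 0.55, 0.50, 0.80]
m2 = [0.20, 0.45, 0.40, 0.70, 0.75]
m3 = [0.10, 0.30, 0.45, 0.65, 0.60]
best_score, best_w = grid_search(m1, m2, m3, mos)
```

Because the grid contains the three corner points where a single metric receives all the weight, the best combination found this way can never correlate worse with MOS than the best individual metric on the same data.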

3.1. Selected IQA Databases

Four benchmark databases, TID2008 [], TID2013 [], KADID-10k [], and PIPAL [], were chosen for the research. These databases are distinguished by a large set of reference images, diverse distortion types, and varying levels of their presence in the images. For each image in the databases, Mean Opinion Scores (MOSs) are experimentally gathered by collecting assessments from multiple human observers.

The TID2008 image database consists of 1700 distorted images, generated using 17 different distortion types, each applied at four levels to 25 reference images (Figure 1). MOS was provided based on the work of 838 human observers and 256,428 comparisons. All images have a resolution of 512 × 384 pixels.

Figure 1. Reference images of the TID2008 and TID2013 databases [].

The TID2013 image database is an updated and expanded version of TID2008. It retains the same set of reference images (Figure 1), but the number of distortion types has been increased to 24, and the distortion levels have been raised to five. The database includes 3000 distorted digital images. Additionally, the size of the research group from which the average subjective ratings were derived has been enlarged. MOS ratings were collected from 524,340 comparisons made by 971 observers. The image resolution remains unchanged.

Online crowdsourcing for image assessment has enabled the creation of larger databases. One such large database, KADID-10k (Konstanz Artificially Distorted Image Quality Database) [], contains 10,125 distorted images with subjective quality scores (MOSs), which were collected from 2209 crowd workers. This database includes 81 reference images (Figure 2), 25 artificial distortion types, and five levels for each distortion type. Recently, KADID-10k has become widely used in the development of deep-learning models for image quality assessment []. The artificial distortions present in the KADID-10k database include spatial distortions, noise, blurs, and more. The image resolution is the same as in the TID databases.

Figure 2. Reference images of the KADID-10k database [].

PIPAL is a large IQA dataset, first introduced in 2020 [], that increased the number of reference images to 250. These reference images are 288 × 288 fragments of images from the DIV2K and Flickr2K high-resolution image collections (Figure 3). The number of distortion types was increased to 40 and the number of distorted images to 29,000, and the database contains 1,130,000 human ratings. In this image database, the Elo rating system was used to assign the Mean Opinion Scores (MOSs). Currently, the PIPAL dataset is used in many challenges as a benchmark for IQA algorithms.

Figure 3. Examples of reference images from the PIPAL database [].
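The Elo scheme mentioned for PIPAL can be illustrated with a minimal sketch of the standard Elo update for a single pairwise comparison. The starting rating of 1500 and the factor K = 32 are illustrative assumptions, not the settings used to build PIPAL, and the function names are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Probability that image A is preferred over image B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a, rating_b, a_wins, k=32.0):
    """Return the updated ratings of A and B after one comparison.

    k is an assumed update factor chosen only for illustration.
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_wins else 0.0
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Two images start with equal (assumed) ratings; an observer prefers A.
rating_a, rating_b = elo_update(1500.0, 1500.0, a_wins=True)
print(rating_a, rating_b)
```

After many such comparisons, the accumulated ratings play the role of subjective scores for the compared images.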

The key information regarding the selected IQA benchmark databases is presented in Table 1.

Table 1. Comparison of the selected IQA databases.

3.2. Experimental Tests

The experimental study began with the determination of the PLCC, SROCC, and KROCC correlation coefficients and the RMSE values for the eight selected highly correlated metrics. The results of these tests for the four databases are included in Table 2. The three highest correlation coefficients and the three lowest RMSE values are shown in different colors (the best results in red, the second-best in green, and the third-best in blue).

Table 2. Values of correlation coefficients and RMSE for FR-IQA metrics.
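As a rough illustration of the four criteria, the sketch below computes PLCC, SROCC, KROCC, and RMSE for a toy pair of score vectors using only the Python standard library. It works directly on the raw scores; IQA studies frequently fit a nonlinear mapping between metric values and MOS before computing PLCC and RMSE, and that step is not modelled here. The Kendall coefficient is the simple tau-a variant, which ignores ties. The data and function names are made up for the example.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation (SROCC): Pearson on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """Kendall rank correlation (KROCC), tau-a variant without tie correction."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def rmse(pred, target):
    """Root mean squared error between predicted and subjective scores."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


# Toy data (not taken from any database): subjective scores and metric outputs.
mos = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.0, 3.0, 2.0, 5.0, 4.0]
print(pearson(pred, mos), spearman(pred, mos), kendall_tau_a(pred, mos), rmse(pred, mos))
```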

The three correlation coefficients and the RMSE value were then aggregated into a single score. A point scale from 1 to 8 was adopted: for each criterion, the metric with the highest correlation coefficient (or the lowest RMSE) received 8 points, and the metric with the lowest correlation coefficient (or the highest RMSE) received 1 point. This ranking is shown in Table 3, where the points for each database are summed in the TOTAL rows. The three metrics with the highest totals, highlighted in bold, were chosen as the component metrics for that database.

Table 3. Ranking of FR-IQA metrics.
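The point scheme can be sketched in a few lines. The example below ranks the TID2008 values printed in Table 2 and sums the points per metric. Because the printed values are rounded to three decimals, ties can occur (HPSI and SPSIM share an SROCC of 0.910); here ties are broken by column order, which is an assumption of this sketch, whereas the published ranking was presumably computed from unrounded values. Function names are hypothetical.

```python
METRICS = ["FSIMc", "MDSI", "HPSI", "VCGS", "SPSIM", "LGV", "SWLGV", "GMSD"]

# TID2008 rows of Table 2, as printed (rounded to three decimals).
TID2008 = {
    "PLCC":  [0.876, 0.916, 0.907, 0.878, 0.893, 0.865, 0.874, 0.879],
    "SROCC": [0.884, 0.921, 0.910, 0.897, 0.910, 0.881, 0.889, 0.891],
    "KROCC": [0.699, 0.751, 0.737, 0.717, 0.730, 0.696, 0.711, 0.709],
    "RMSE":  [0.647, 0.538, 0.566, 0.643, 0.605, 0.674, 0.652, 0.640],
}


def points(values, higher_is_better=True):
    """Assign 1..n points per metric; the best value receives n points."""
    # sorted() is stable, so equal values keep their column order (assumed tie rule).
    order = sorted(
        range(len(values)),
        key=lambda i: values[i] if higher_is_better else -values[i],
    )
    pts = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        pts[idx] = rank
    return pts


def total_points(table):
    """Sum the points over all four criteria for every metric."""
    totals = [0] * len(METRICS)
    for criterion, values in table.items():
        pts = points(values, higher_is_better=(criterion != "RMSE"))
        totals = [t + p for t, p in zip(totals, pts)]
    return dict(zip(METRICS, totals))


totals = total_points(TID2008)
top_three = sorted(totals, key=totals.get, reverse=True)[:3]
print(totals)
print(top_three)
```

With this tie rule, the totals agree with the TID2008 TOTAL row of Table 3, and the three selected components are MDSI, HPSI, and SPSIM.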

The three metrics selected from Table 3 served as the components $M_{1}$, $M_{2}$, and $M_{3}$ of the linear combination that determines the New Combined Metric, as defined in Formula (61).

Determining the NCM value requires calculating the weights $α$, $β$, and $γ$ present in this formula. These weights are optimized in the MATLAB environment using the fmincon function. In the optimization task, the PLCC linear correlation coefficient is maximized. The obtained values of the weights for each dataset are given in Table 4. Based on these weights and component metrics, the values of the combined NCM metric were determined. The results are shown in Table 5. The results for the three component metrics are shown in red, while green is used for the best score achieved by the combined NCM metric in each case.

Table 4. Optimized weight values for the NCM metric.

Table 5. Results of NCM metric for the three best component metrics.
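The weight search can be sketched as follows. This is not the fmincon setup used in the study: it replaces the constrained optimizer with a coarse exhaustive search, and it assumes that the weights are non-negative and sum to one (which is consistent with the values in Table 4 but is not stated as a constraint) and that all three component scores are oriented the same way. The data are synthetic and the function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def combine(weights, m1, m2, m3):
    """Weighted sum alpha*M1 + beta*M2 + gamma*M3, evaluated per image."""
    alpha, beta, gamma = weights
    return [alpha * p + beta * q + gamma * r for p, q, r in zip(m1, m2, m3)]


def search_weights(m1, m2, m3, mos, steps=20):
    """Grid search over alpha + beta + gamma = 1 (all >= 0) maximizing PLCC."""
    best_w, best_r = None, -2.0
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = (i / steps, j / steps, (steps - i - j) / steps)
            r = pearson(combine(w, m1, m2, m3), mos)
            if r > best_r:
                best_w, best_r = w, r
    return best_w, best_r


# Synthetic stand-ins for MOS and three component metrics (not real data).
rng = random.Random(0)
mos = [rng.uniform(0.0, 9.0) for _ in range(200)]
m1 = [v + rng.gauss(0.0, 1.0) for v in mos]
m2 = [v + rng.gauss(0.0, 1.5) for v in mos]
m3 = [v + rng.gauss(0.0, 2.0) for v in mos]

weights, plcc = search_weights(m1, m2, m3, mos)
print(weights, plcc)
```

Because the grid contains the corner points (1, 0, 0), (0, 1, 0), and (0, 0, 1), the PLCC returned by the search can never be lower than that of the best single component on the same data. A gradient-based constrained optimizer would refine the weights beyond the grid resolution.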

To visualize the quality of the proposed NCM index, scatter plots of the eight considered metrics and the NCM for the tested databases are shown in Figure 4, Figure 5, Figure 6 and Figure 7. The scatter plots and their fitted curves show that the proposed combined NCM metric closely matches the MOS values for each of the databases.

Figure 4. Scatter plots of subjective MOS against IQA metrics obtained from the TID2008 database.

Figure 5. Scatter plots of subjective MOS against IQA metrics obtained from the TID2013 database.

Figure 6. Scatter plots of subjective MOS against IQA metrics obtained from the KADID-10k database.

Figure 7. Scatter plots of subjective MOS against IQA metrics obtained from the PIPAL database.

A study of computation times for the considered FR-IQA metrics was also conducted. The average computation times for each of the databases are shown in Table 6. The three fastest metrics are highlighted in bold. The NCM computation time, which is not included in Table 6, is approximately equal to the sum of the computation times of its component metrics.

Table 6. Computation times (s) for IQA metrics.
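Since the NCM time is stated to be roughly the sum of its component times, an estimate can be derived directly from Table 6. The sketch below adds up the listed times for the three best components (taken from Table 4) and for the three fastest metrics. These sums are estimates only, not measurements, and the function name is hypothetical.

```python
# Average computation times (s) copied from Table 6.
TIMES = {
    "TID2008":   {"FSIMc": 0.089, "MDSI": 0.014, "HPSI": 0.028, "VCGS": 0.212,
                  "SPSIM": 0.128, "LGV": 0.435, "SWLGV": 5.986, "GMSD": 0.019},
    "TID2013":   {"FSIMc": 0.104, "MDSI": 0.016, "HPSI": 0.037, "VCGS": 0.214,
                  "SPSIM": 0.113, "LGV": 0.339, "SWLGV": 6.130, "GMSD": 0.019},
    "KADID-10k": {"FSIMc": 0.136, "MDSI": 0.026, "HPSI": 0.046, "VCGS": 0.299,
                  "SPSIM": 0.143, "LGV": 0.451, "SWLGV": 5.890, "GMSD": 0.028},
    "PIPAL":     {"FSIMc": 0.172, "MDSI": 0.011, "HPSI": 0.012, "VCGS": 0.091,
                  "SPSIM": 0.176, "LGV": 0.619, "SWLGV": 2.628, "GMSD": 0.009},
}

# Three best components per database, as listed in Table 4.
BEST = {
    "TID2008":   ("MDSI", "HPSI", "SPSIM"),
    "TID2013":   ("MDSI", "VCGS", "SPSIM"),
    "KADID-10k": ("MDSI", "HPSI", "SPSIM"),
    "PIPAL":     ("FSIMc", "HPSI", "GMSD"),
}
FASTEST = ("MDSI", "HPSI", "GMSD")


def estimated_ncm_time(database, components):
    """Approximate NCM time as the sum of its component times."""
    return sum(TIMES[database][name] for name in components)


for db in TIMES:
    best = estimated_ncm_time(db, BEST[db])
    fast = estimated_ncm_time(db, FASTEST)
    print(f"{db:10s} three best: {best:.3f} s   three fastest: {fast:.3f} s")
```

On TID2008, for example, the estimate drops from about 0.170 s for the three best components to about 0.061 s for the three fastest ones.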

For the three fastest FR-IQA metrics, highlighted in red in Table 7, a linear combination was formed by redetermining the optimal weights $α$, $β$, and $γ$ (see Table 4) so as to maximize PLCC. The resulting NCM outperformed each of its three component metrics on every database, as marked in green in Table 7, and achieved the best PLCC and RMSE among all eight metrics; on TID2013, SPSIM and VCGS retained slightly higher SROCC and KROCC values.

Table 7. Results of the NCM metric for the three fastest component metrics.

The study was conducted in the MATLAB R2024a programming environment on a computer with the specifications provided in Table 8.

Table 8. Parameters of desktop computer used for experiments.

The best results obtained using the NCM were additionally compared with those of other combined metrics presented in the literature [,]. The comparison is given in Table 9, where the best results are highlighted in bold. The authors of [] used combined metrics (MFMOGP3, MFMOGP4) based on an additive combination of 8 to 10 component metrics. In [], combined metrics (OFIQA) based on a product form involving between 4 and 17 factor metrics were proposed. For Table 9, we selected the best results from both of these works. For the TID2008 database, the results are comparable, while for the TID2013 database the proposed NCM index achieves the highest correlation coefficients. We conducted the comparison on the TID2008 and TID2013 databases only, as neither of the older works on combined metrics considered the newer databases (KADID-10k, PIPAL).

Table 9. Comparison of the proposed NCM metric with other combined metrics.

4. Conclusions

From the existing literature on FR-IQA metrics, it is evident that there is no single metric that significantly outperforms the others. Therefore, the idea of creating a linear or nonlinear combination of several top metrics has emerged. The proposed approach uses an additive combination of the three metrics with the highest correlation coefficients and the lowest RMSE. The component metrics of the resulting combined NCM metric depend on the selected database. NCM achieved the best results among all tested metrics across all tested databases. In addition, a case was examined where the three fastest metrics, i.e., MDSI, HPSI, and GMSD, were selected as components. The combined metric obtained in this case also achieved the best PLCC and RMSE values among all the tested metrics on every database. Potential extensions of the proposed approach include replacing the linear combination of metrics with a nonlinear one and exploring alternative methods for optimizing the weights.

Author Contributions

Conceptualization, M.F. and H.P.; methodology, M.F.; software, Ł.M.; validation, M.F. and H.P.; investigation, M.F. and Ł.M.; resources, M.F.; data curation, M.F.; writing—original draft preparation, M.F.; writing—review and editing, M.F.; visualization, M.F.; supervision, H.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Polish Ministry for Science and Education under internal grant 02/070/BK_24/0055 for the Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

IQA	Image Quality Assessment
FR-IQA	Full-Reference Image Quality Assessment
MOS	Mean Opinion Score
FSIMc	Feature SIMilarity (color version)
MDSI	Mean Deviation Similarity Index
HPSI	Haar wavelet Perceptual Similarity Index
VCGS	Visual saliency with Color appearance and Gradient Similarity
SPSIM	SuperPixel SIMilarity
LGV	Local Global Variation
SWLGV	Saliency Weighted Local Global Variation
GMSD	Gradient Magnitude Similarity Deviation
NCM	New Combined Metric
PSNR	Peak Signal-to-Noise Ratio
VSNR	Visual Signal-to-Noise Ratio
VIF	Visual Information Fidelity index
MS-SSIM	Multi Scale Structural SIMilarity index
PLCC	Pearson Linear Correlation Coefficient
SROCC	Spearman Rank Order Correlation Coefficient
KROCC	Kendall Rank Order Correlation Coefficient
RMSE	Root Mean Squared Error
SVD	Singular Value Decomposition
HVS	Human Visual System
DMOS	Differential Mean Opinion Score
TID	Tampere Image Database
KADID-10k	Konstanz Artificially Distorted Image quality Database
PIPAL	Perceptual Image Processing ALgorithms database
WFSIMc	Weighted FSIM (color version) index
RFSIM	Riesz-transform-based Feature SIMilarity index
CQM	Combined Quality Metric
CISI	Combined Image Similarity Index
LCSIM	Linearly Combined Similarity Measures

References

Liu, M.; Yang, X. A new image quality approach based on decision fusion. In Proceedings of the 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery, Jinan, China, 18–20 October 2008; Volume 4, pp. 10–14.
Okarma, K. Combined full-reference image quality metric linearly correlated with subjective assessment. In Proceedings of the Artificial Intelligence and Soft Computing: 10th International Conference, ICAISC 2010, Zakopane, Poland, 13–17 June 2010; Part I 10. Springer: Berlin/Heidelberg, Germany, 2010; pp. 539–546.
Okarma, K. Combined image similarity index. Opt. Rev. 2012, 19, 349–354.
Okarma, K. Extended hybrid image similarity–combined full-reference image quality metric linearly correlated with subjective scores. Elektron. Ir Elektrotech. 2013, 19, 129–132.
Oszust, M. Full-reference image quality assessment with linear combination of genetically selected quality measures. PLoS ONE 2016, 11, e0158333.
Lukin, V.; Ponomarenko, N.; Ieremeiev, O.; Egiazarian, K.; Astola, J. Combining full-reference image visual quality metrics by neural network. In Proceedings of the Human Vision and Electronic Imaging XX, San Francisco, CA, USA, 9–12 February 2015; Volume 9394, pp. 172–183.
Bosse, S.; Maniry, D.; Müller, K.R.; Wiegand, T.; Samek, W. Deep neural networks for no-reference and full-reference image quality assessment. IEEE Trans. Image Process. 2017, 27, 206–219.
Varga, D. A combined full-reference image quality assessment method based on convolutional activation maps. Algorithms 2020, 13, 313.
Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. FSIM: A feature similarity index for image quality assessment. IEEE Trans. Image Process. 2011, 20, 2378–2386.
Kovesi, P. Image features from phase congruency. Videre J. Comput. Vis. Res. 1999, 1, 1–26.
Nafchi, H.Z.; Shahkolaei, A.; Hedjam, R.; Cheriet, M. Mean deviation similarity index: Efficient and reliable full-reference image quality evaluator. IEEE Access 2016, 4, 5579–5590.
Reisenhofer, R.; Bosse, S.; Kutyniok, G.; Wiegand, T. A Haar wavelet-based perceptual similarity index for image quality assessment. Signal Process. Image Commun. 2018, 61, 33–43.
Shi, C.; Lin, Y. Full Reference Image Quality Assessment Based on Visual Salience with Color Appearance and Gradient Similarity. IEEE Access 2020, 8, 97310–97320.
Sun, W.; Liao, Q.; Xue, J.H.; Zhou, F. SPSIM: A superpixel-based similarity index for full-reference image quality assessment. IEEE Trans. Image Process. 2018, 27, 4232–4244.
Stutz, D.; Hermans, A.; Leibe, B. Superpixels: An evaluation of the state-of-the-art. Comput. Vis. Image Underst. 2018, 166, 1–27.
Achanta, R.; Shaji, A.; Smith, K.; Lucchi, A.; Fua, P.; Süsstrunk, S. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2274–2282.
Varga, D. Full-Reference Image Quality Assessment Based on Grünwald–Letnikov Derivative, Image Gradients, and Visual Saliency. Electronics 2022, 11, 559.
Imamoglu, N.; Lin, W.; Fang, Y. A saliency detection model using low-level features based on wavelet transform. IEEE Trans. Multimed. 2012, 15, 96–105.
Xue, W.; Zhang, L.; Mou, X.; Bovik, A.C. Gradient magnitude similarity deviation: A highly efficient perceptual image quality index. IEEE Trans. Image Process. 2013, 23, 684–695.
Sheikh, H.R.; Sabir, M.F.; Bovik, A.C. A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. Image Process. 2006, 15, 3440–3451.
Ponomarenko, N.; Lukin, V.; Zelensky, A.; Egiazarian, K.; Carli, M.; Battisti, F. TID2008—A database for evaluation of full-reference visual quality assessment metrics. Adv. Mod. Radioelectron. 2009, 10, 30–45.
Ponomarenko, N.; Jin, L.; Ieremeiev, O.; Lukin, V.; Egiazarian, K.; Astola, J.; Vozel, B.; Chehdi, K.; Carli, M.; Battisti, P.; et al. Image database TID2013: Peculiarities, results and perspectives. Signal Process. Image Commun. 2015, 30, 57–77.
Lin, H.; Hosu, V.; Saupe, D. KADID-10k: A large-scale artificially distorted IQA database. In Proceedings of the 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX), Berlin, Germany, 5–7 June 2019; pp. 1–3.
Gu, J.; Cai, H.; Chen, H.; Ye, X.; Ren, J.; Dong, C. PIPAL: A large-scale image quality assessment dataset for perceptual image restoration. In European Conference on Computer Vision (ECCV); Springer: Cham, Switzerland, 2020; pp. 633–651.
Varga, D. Composition-preserving deep approach to full-reference image quality assessment. Signal Image Video Process. 2020, 14, 1265–1272.
Merzougui, N.; Djerou, L. Multi-measures fusion based on multi-objective genetic programming, for full-reference image quality assessment. arXiv 2017, arXiv:1801.06030.
Varga, D. An optimization-based family of predictive, fusion-based models for full-reference image quality assessment. J. Imaging 2023, 9, 116.

Table 1. Comparison of the selected IQA databases.

Database	Year	No. of Ref. Images	No. of Dist. Types	Environment	No. of Dist. Images
TID2008	2008	25	17	lab	1700
TID2013	2013	25	24	lab	3000
KADID-10k	2019	81	25	crowdsourcing	10,125
PIPAL	2020	250	40	crowdsourcing	29,000

Table 2. Values of correlation coefficients and RMSE for FR-IQA metrics.

Database	Metric	FSIMc []	MDSI []	HPSI []	VCGS []	SPSIM []	LGV []	SWLGV []	GMSD []
TID2008	PLCC	0.876	0.916	0.907	0.878	0.893	0.865	0.874	0.879
	SROCC	0.884	0.921	0.910	0.897	0.910	0.881	0.889	0.891
	KROCC	0.699	0.751	0.737	0.717	0.730	0.696	0.711	0.709
	RMSE	0.647	0.538	0.566	0.643	0.605	0.674	0.652	0.640
TID2013	PLCC	0.877	0.909	0.893	0.900	0.909	0.778	0.797	0.855
	SROCC	0.851	0.890	0.873	0.893	0.904	0.807	0.807	0.804
	KROCC	0.667	0.712	0.692	0.717	0.725	0.638	0.641	0.634
	RMSE	0.596	0.518	0.557	0.541	0.517	0.779	0.749	0.642
KADID-10k	PLCC	0.851	0.864	0.885	0.868	0.874	0.815	0.835	0.805
	SROCC	0.854	0.885	0.885	0.871	0.874	0.820	0.840	0.847
	KROCC	0.665	0.702	0.699	0.683	0.687	0.630	0.655	0.664
	RMSE	0.568	0.544	0.505	0.538	0.525	0.627	0.595	0.643
PIPAL	PLCC	0.615	0.598	0.641	0.554	0.578	0.529	0.543	0.629
	SROCC	0.589	0.585	0.589	0.534	0.562	0.519	0.536	0.583
	KROCC	0.416	0.408	0.417	0.370	0.391	0.359	0.372	0.414
	RMSE	0.104	0.106	0.101	0.110	0.108	0.112	0.111	0.103

Table 3. Ranking of FR-IQA metrics.

Database	Metric	FSIMc []	MDSI []	HPSI []	VCGS []	SPSIM []	LGV []	SWLGV []	GMSD []
TID2008	PLCC	3	8	7	4	6	1	2	5
	SROCC	2	8	6	5	7	1	3	4
	KROCC	2	8	7	5	6	1	4	3
	RMSE	3	8	7	4	6	1	2	5
	TOTAL	10	32	27	18	25	4	11	17
TID2013	PLCC	4	7	5	6	8	1	2	3
	SROCC	4	6	5	7	8	3	2	1
	KROCC	4	6	5	7	8	2	3	1
	RMSE	4	7	5	6	8	1	2	3
	TOTAL	16	26	20	26	32	7	9	8
KADID-10k	PLCC	4	5	8	6	7	2	3	1
	SROCC	4	8	7	5	6	1	2	3
	KROCC	4	8	7	5	6	1	2	3
	RMSE	4	5	8	6	7	2	3	1
	TOTAL	16	26	30	22	26	6	10	8
PIPAL	PLCC	6	5	8	3	4	1	2	7
	SROCC	8	6	7	2	4	1	3	5
	KROCC	7	5	8	2	4	1	3	6
	RMSE	6	5	8	3	4	1	2	7
	TOTAL	27	21	31	10	16	4	10	25

Table 4. Optimized weight values for the NCM metric.

Database	Weight	Value (Three Best)	Metric (Three Best)	Value (Three Fastest)	Metric (Three Fastest)
TID2008	$α$	0.578	MDSI []	0.618	MDSI []
	$β$	0.285	HPSI []	0.372	HPSI []
	$γ$	0.136	SPSIM []	0.010	GMSD []
TID2013	$α$	0.459	MDSI []	0.680	MDSI []
	$β$	0.086	VCGS []	0.310	HPSI []
	$γ$	0.455	SPSIM []	0.010	GMSD []
KADID-10k	$α$	0.386	MDSI []	0.342	MDSI []
	$β$	0.404	HPSI []	0.486	HPSI []
	$γ$	0.210	SPSIM []	0.172	GMSD []
PIPAL	$α$	0.320	FSIMc []	0.317	MDSI []
	$β$	0.507	HPSI []	0.584	HPSI []
	$γ$	0.174	GMSD []	0.099	GMSD []

Table 5. Results of NCM metric for the three best component metrics.

Database	Metric	FSIMc []	MDSI []	HPSI []	VCGS []	SPSIM []	LGV []	SWLGV []	GMSD []	NCM	Components $M_{1}$, $M_{2}$, $M_{3}$
TID2008	PLCC	0.876	0.916	0.907	0.878	0.893	0.865	0.874	0.879	0.924	SPSIM []
	SROCC	0.884	0.921	0.910	0.897	0.910	0.881	0.889	0.891	0.924	MDSI []
	KROCC	0.699	0.751	0.737	0.717	0.730	0.696	0.711	0.709	0.759	HPSI []
	RMSE	0.647	0.538	0.566	0.643	0.605	0.674	0.652	0.640	0.514
TID2013	PLCC	0.877	0.909	0.893	0.900	0.909	0.778	0.797	0.855	0.922	SPSIM []
	SROCC	0.851	0.890	0.873	0.893	0.904	0.807	0.807	0.804	0.906	MDSI []
	KROCC	0.667	0.712	0.692	0.717	0.725	0.638	0.641	0.634	0.732	VCGS []
	RMSE	0.596	0.518	0.557	0.541	0.517	0.779	0.749	0.642	0.481
KADID-10k	PLCC	0.851	0.864	0.885	0.868	0.874	0.815	0.835	0.805	0.896	SPSIM []
	SROCC	0.854	0.885	0.885	0.871	0.874	0.820	0.840	0.847	0.897	MDSI []
	KROCC	0.665	0.702	0.699	0.683	0.687	0.630	0.655	0.664	0.717	HPSI []
	RMSE	0.568	0.544	0.505	0.538	0.525	0.627	0.595	0.643	0.480
PIPAL	PLCC	0.615	0.598	0.641	0.554	0.577	0.529	0.543	0.629	0.653	GMSD []
	SROCC	0.589	0.585	0.589	0.534	0.562	0.519	0.536	0.583	0.608	FSIMc []
	KROCC	0.416	0.408	0.417	0.370	0.391	0.359	0.372	0.414	0.432	HPSI []
	RMSE	0.104	0.106	0.101	0.110	0.108	0.112	0.111	0.103	0.100

Table 6. Computation times (s) for IQA metrics.

Database	FSIMc []	MDSI []	HPSI []	VCGS []	SPSIM []	LGV []	SWLGV []	GMSD []
TID2008	0.089	0.014	0.028	0.212	0.128	0.435	5.986	0.019
TID2013	0.104	0.016	0.037	0.214	0.113	0.339	6.130	0.019
KADID-10k	0.136	0.026	0.046	0.299	0.143	0.451	5.890	0.028
PIPAL	0.172	0.011	0.012	0.091	0.176	0.619	2.628	0.009

Table 7. Results of the NCM metric for the three fastest component metrics.

Database	Metric	FSIMc []	MDSI []	HPSI []	VCGS []	SPSIM []	LGV []	SWLGV []	GMSD []	NCM	Components $M_{1}$, $M_{2}$, $M_{3}$
TID2008	PLCC	0.876	0.916	0.907	0.878	0.893	0.865	0.874	0.879	0.922	GMSD
	SROCC	0.884	0.921	0.910	0.897	0.910	0.881	0.889	0.891	0.923	MDSI
	KROCC	0.699	0.751	0.737	0.717	0.730	0.696	0.711	0.709	0.757	HPSI
	RMSE	0.647	0.538	0.566	0.643	0.605	0.674	0.652	0.640	0.519
TID2013	PLCC	0.877	0.909	0.893	0.900	0.909	0.778	0.797	0.855	0.912	GMSD
	SROCC	0.851	0.890	0.873	0.893	0.904	0.807	0.807	0.804	0.892	MDSI
	KROCC	0.667	0.712	0.692	0.717	0.725	0.638	0.641	0.634	0.716	HPSI
	RMSE	0.596	0.518	0.557	0.541	0.517	0.779	0.749	0.642	0.508
KADID-10k	PLCC	0.851	0.864	0.885	0.868	0.874	0.815	0.835	0.805	0.897	GMSD
	SROCC	0.854	0.885	0.885	0.871	0.874	0.820	0.840	0.847	0.897	MDSI
	KROCC	0.665	0.702	0.699	0.683	0.687	0.630	0.655	0.664	0.719	HPSI
	RMSE	0.568	0.544	0.505	0.538	0.525	0.627	0.595	0.643	0.479
PIPAL	PLCC	0.615	0.598	0.641	0.554	0.577	0.529	0.543	0.629	0.655	GMSD
	SROCC	0.589	0.585	0.589	0.534	0.562	0.519	0.536	0.583	0.608	MDSI
	KROCC	0.416	0.408	0.417	0.370	0.391	0.359	0.372	0.414	0.432	HPSI
	RMSE	0.104	0.106	0.101	0.110	0.108	0.112	0.111	0.103	0.100

Table 8. Parameters of desktop computer used for experiments.

Processor	Intel(R) Core(TM) i5-7400 CPU @ 3.00 GHz (4 cores)
RAM	32 GB
OS	Windows 10
Environment	MATLAB R2024a

Table 9. Comparison of the proposed NCM metric with other combined metrics.

Database	Metric	MFMOGP3 []	MFMOGP4 []	OFIQA []	NCM
TID2008	PLCC	0.925	0.902	0.910	0.922
	SROCC	0.923	0.911	0.915	0.923
	KROCC	0.757	0.727	0.738	0.757
	RMSE	0.511	0.580	0.557	0.519
TID2013	PLCC	0.883	0.914	0.906	0.922
	SROCC	0.868	0.902	0.890	0.923
	KROCC	0.688	0.725	0.713	0.757
	RMSE	0.581	0.503	0.526	0.519

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
