Article

DualNetIQ: Texture-Insensitive Image Quality Assessment with Dual Multi-Scale Feature Maps

by Adel Agamy, Hossam Mady, Hamada Esmaiel, Abdulrahman Al Ayidh, Abdelmageed Mohamed Aly and Mohamed Abdel-Nasser
1 Electrical Engineering Department, Faculty of Engineering, Aswan University, Aswan 81542, Egypt
2 Electrical Engineering Department, College of Engineering, King Khalid University, Abha 61411, Saudi Arabia
* Author to whom correspondence should be addressed.
Electronics 2025, 14(6), 1169; https://doi.org/10.3390/electronics14061169
Submission received: 10 February 2025 / Revised: 10 March 2025 / Accepted: 11 March 2025 / Published: 17 March 2025

Abstract

The precise assessment of image quality in a manner that matches human perception remains a major challenge in digital imaging, where digital images play a crucial role in many technological and media applications. Existing deep convolutional neural network (CNN)-based image quality assessment (IQA) methods have advanced considerably, but there remains a critical need to improve their performance while maintaining explicit tolerance to visual texture resampling and texture similarity. This paper introduces DualNetIQ, a novel full-reference IQA method that leverages the strengths of deep learning architectures to remain robust against resampling effects on visual textures. DualNetIQ comprises two main stages: feature extraction from the reference and distorted images, and similarity measurement based on a combination of global texture and structure similarity metrics. In particular, DualNetIQ extracts features from the input images using a group of hybrid multi-scale feature maps carefully chosen from the pre-trained VGG19 and SqueezeNet CNN models to capture differences in texture and structure between the reference and distorted images. The weights of the combined global texture and structure similarity metrics are tuned with the Grey Wolf Optimizer (GWO). A unique advantage of the proposed method is that it requires no training or fine-tuning of the CNN backbones. Comprehensive experiments and comparisons on five databases covering various distortion types demonstrate the superiority of the proposed method over state-of-the-art models, particularly in image quality prediction and texture similarity tasks.

1. Introduction

Visual perception plays a vital role in our daily lives, and over the past century, there has been a significant technological shift in the field of imaging. Advances in mobile multimedia technology now allow people to access numerous high-resolution images on their smartphones or other devices at any time. However, various processes in an image system such as capture, reproduction, compression, storage, transmission, and restoration can introduce noise, distortions, or artifacts. As a result, monitoring and assessing image quality is essential. The demand for accurate image quality assessment (IQA) has grown with the increasing reliance on digital imaging in fields like communications, surveillance, medical imaging, and visual information. Thus, precise IQA methods are fundamental for many image-based applications. The objective of IQA is to develop algorithms that automatically assess digital image quality in a manner consistent with human perception [1,2].
Notably, existing IQA methods are generally classified into three categories: full-reference (FR), reduced-reference (RR), and no-reference (NR) methods. The FR-IQA techniques, such as mean square error (MSE) and the structural similarity index (SSIM) [3], evaluate image quality by directly comparing the distorted image with a pristine reference image. However, these methods often fail to align with human visual perception, especially when comparing images with similar textures that may differ in the exact arrangement of features but appear nearly identical to the human eye (see Figure 1). This discrepancy arises from their dependence on pixel-wise accuracy, making them overly sensitive to point-by-point differences between images that share the same overall texture.
In response to the limitations of FR-IQA methods, and with the rise of deep learning across various fields, deep learning has significantly impacted IQA [7,8,9]. Deep learning-based models have shifted the paradigm by leveraging large-scale datasets to automatically learn complex distortion patterns, eliminating the need for manually crafted features that traditional methods rely on. Moreover, the use of convolutional neural networks (CNNs) for IQA tasks has been further enhanced by transfer learning, where pre-trained networks initially trained on large-scale image recognition tasks exhibit strong capabilities in capturing intricate patterns and textures, closely resembling the processing of the human visual system [9,10,11,12].
Despite the advancements in the realm of IQA, there remains a need for further improvements in IQA models, particularly regarding texture similarity and sensitivity to texture resampling effects. The key challenge lies in developing an IQA model that can accurately assess image quality while withstanding the impact of the texture resampling commonly encountered during image processing tasks. Models such as deep image structure and texture similarity (DISTS) [11] and deep distance correlation (DeepDC) [12] have explored IQA models that are less influenced by texture resampling, but such models must also remain closely aligned with human visual perception. It is important to note that both DISTS and DeepDC utilize the pre-trained VGG model [13] for feature extraction. However, relying on a single deep feature extractor like VGG may not fully capture the complete range of image appearance, structure, and texture, especially under various distortion types. While DISTS and DeepDC have achieved satisfactory IQA results, there remains room for improving the performance of IQA methods, particularly on traditional image quality assessment benchmarks.
This paper introduces a novel full-reference IQA method, termed DualNetIQ, designed to be resilient to resampling effects on visual textures by leveraging the capabilities of deep learning architectures. The proposed DualNetIQ method has two steps: first, it extracts features from both the reference image and the distorted image; second, it measures similarity by combining global texture and structure similarity metrics. In particular, the method extracts features from the input images using hybrid feature maps carefully chosen from the pre-trained VGG19 and SqueezeNet CNN models to capture differences in texture and structure. The hybrid feature maps drawn from each deep CNN capture complementary patterns, ensuring a comprehensive description of the textures and structures of both the reference and distorted images under different types of distortion. The proposed scheme then evaluates the similarity between the reference and distorted images by calculating a weighted combination of global texture and structure similarity metrics, with the weights optimized using the Grey Wolf Optimizer (GWO). This research highlights the following key contributions:
  • Proposing a novel IQA method called DualNetIQ for texture-insensitive full-reference IQA. Unlike existing IQA methods that employ a single deep feature extractor, the proposed method utilizes hybrid multi-scale feature maps selected from robust pre-trained CNN models (VGG19 and SqueezeNet) to assess the quality of textured images effectively and withstand the effects of texture resampling under different distortion types. DualNetIQ builds a comprehensive description of the textures and structures of the reference and distorted images under different types of distortion, addressing the limitations of similar existing models, particularly in handling various types of distortions.
  • Proposing an efficient method for assessing the similarity between the reference and distorted images by optimally combining structure and texture similarity metrics utilizing the GWO algorithm, allowing the model to align closely with human perceptual judgement.
  • Providing an extensive ablation study and comparative analysis against existing methods on five IQA datasets (LIVE [14], CSIQ [15], TID2013 [16], KADID-10k [17], and PIPAL [18]), one texture similarity dataset, SynTEX [4], and one perceptual similarity dataset, BAPPS [10]. These datasets encompass a wide range of distortions, including traditional synthetic artifacts and those created by modern image processing algorithms. The employment of such datasets provides a rigorous benchmark that confirms the adaptability and robustness of the DualNetIQ method across various evaluation scenarios.
The remainder of this study includes four sections: Section 2 presents a brief review of existing research on IQA methods. Section 3 introduces the proposed DualNetIQ model in detail. Section 4 provides experimental comparisons and discussion to evaluate the model’s performance. Lastly, Section 5 concludes the study.

2. Related Work

The field of IQA has seen significant advancements in recent years. It began with traditional IQA methods, which were primarily based on modeling the human visual system. In the last decade, IQA techniques have evolved to include deep learning approaches, whether through training neural networks from scratch or leveraging pre-trained models. Additionally, hybrid models that combine knowledge-driven and data-driven methods have emerged. Below, we provide a brief overview of the most prominent traditional approaches and deep learning-based IQA methods.

2.1. Traditional Full-Reference IQA Methods

For over five decades, knowledge-driven techniques have been the primary approach in FR-IQA models. Early FR-IQA methods, such as peak signal-to-noise ratio (PSNR) and mean squared error (MSE), compare images deterministically by computing pixel-wise differences. However, these methods demonstrated poor correlation with the human visual system (HVS) [19]. Wang et al. introduced the SSIM metric [3], recognizing that earlier models failed to capture perceptual image quality effectively. This marked a paradigm shift, incorporating structural information, contrast, and luminance into the evaluation process. Despite these advances, SSIM and its variants, such as multi-scale SSIM (MS-SSIM) [20], information weighting SSIM (IW-SSIM) [21], and complex wavelet SSIM (CW-SSIM) [22], faced challenges in handling complex distortions and accurately representing textures.
Additional techniques, such as the visual saliency-induced index (VSI) [23], visual information fidelity (VIF) [24], and feature similarity (FSIM) [25], have been proposed to improve performance. While these models have shown significant improvements, they still face challenges in accurately handling visual textures and certain distortion types.

2.2. Deep Learning-Based IQA Methods

Deep learning-based models have significantly outperformed traditional metrics in capturing high-level perceptual cues. In recent years, numerous deep learning-based IQA methods have been proposed. For example, Liang et al. [26] introduced a dual-path CNN network that processes both distorted and reference image patches to predict corresponding quality scores. Kim and Lee [7] developed an IQA method called DeepQA in which a CNN learns the behavior of the HVS from the underlying data distribution of IQA datasets.
The successful application of pre-trained networks like AlexNet [27], SqueezeNet [28], and VGG [13] across various vision tasks has inspired the use of deep features in FR-IQA, significantly enhancing the accuracy and reliability of these estimations [8]. Gao et al. [9] introduced the DeepSim IQA model, which utilizes a deep CNN pre-trained on the ImageNet database [29]. In their approach, pairs of distorted and reference images are fed into the CNN, and each output layer serves as a feature map. Global image quality scores are then computed by pooling local similarities between the feature maps from the distorted and reference images. The authors of LPIPS [10] used pre-trained CNN models to extract feature stacks from various network layers and normalize them in the channel dimension to compute the distance between reference and distorted image patches.
The development of transformers has greatly energized IQA and ushered in a new era in artificial intelligence. You and Korhonen [30] employed transformers to effectively manage images of diverse resolutions and improve their focus on global features. Building on this, Ke et al. [31] presented a novel technique that combines scale embedding and hash-based 2D spatial embedding to handle images of varying sizes and aspect ratios more easily. Many vision transformer-based methods have since emerged [32,33,34,35] that have shown excellent performance in image quality assessment.
While all these models have achieved acceptable results, they did not address the issue of sensitivity to texture resampling.
Recent studies have aimed to address the issue of texture resampling sensitivity. One of the first deep learning-based models designed to explicitly handle this problem is the DISTS model [11]. DISTS employs a pre-trained VGG model to extract features from both the reference and corrupted images. It then constructs a similarity measure by combining texture and structural terms across all feature maps. The authors of [36] proposed an improved version of DISTS, called A-DISTS, by introducing adaptive strategies to calculate the structure and texture similarities; they employed a dispersion index to adaptively localize the texture regions at different scales. DeepDC [12] uses distance correlation to measure the similarity between the reference and corrupted images in the deep feature domain.
Although the methods mentioned above can address texture resampling and texture similarity, they still have limitations, such as sensitivity to specific distortion types, largely due to their reliance on a single pre-trained CNN feature extractor. According to the no-free-lunch theorem, using one pre-trained model (e.g., a pre-trained VGG network) for feature extraction may work well for certain distortions but not for all. Specifically, DISTS and DeepDC perform relatively poorly on the large-scale IQA dataset TID2013, which contains 3000 images spanning 24 types of distortion. These methods yield limited results when dealing with high-frequency noise, impulse noise, local block-wise distortion, color saturation, and changes in contrast. In contrast, this paper proposes the DualNetIQ model, which utilizes carefully selected multi-scale feature maps from multiple powerful pre-trained models, enabling it to handle various distortion types. DualNetIQ offers improved computational efficiency and reliability in image quality assessment and texture similarity tasks.

3. Proposed Method

3.1. DualNetIQ Model

Figure 2 presents an overview of the architecture of the proposed DualNetIQ model. The process involves two main steps: extracting features with multi-scale hybrid feature maps, and measuring similarity in the deep feature domain with the proposed optimal combined similarity metric. Given the clean reference image (I) and the distorted image (G), DualNetIQ uses robust pre-trained models to select the most suitable multi-scale feature maps to represent the input images. Notably, effectively capturing the perceptual quality of images requires a robust image representation approach. Since perceptual distances are non-uniform in pixel space [37,38], the goal of an image representation function is to transform the pixel representation into a space that is more perceptually uniform. In the proposed DualNetIQ model, we carefully select hybrid multi-scale feature maps from powerful pre-trained CNN models to extract robust representations of the pristine reference and distorted images across various distortion types. Through an exhaustive ablation study [39] on different CNN models, including VGG19 [13], ResNet [40], AlexNet [27], and SqueezeNet [28], we found that VGG19 and SqueezeNet performed best for feature representation. We also expanded the comparative study to include a stage-by-stage analysis to identify the most effective stages for our task. Additionally, this study included some transformer-based models for further evaluation.
As shown in Figure 2, DualNetIQ includes two identical branches: the top branch extracts a representation of the reference image, while the bottom branch extracts a representation of the corrupted image. Each branch consists of eight stages: the first stage represents the input image, the second and third stages are feature maps extracted from the pre-trained VGG19 model, and the last five stages are feature maps extracted from the pre-trained SqueezeNet model. The feature vector produced by each feature map is passed to the optimal combined similarity metric (OCSM) to compute the similarity score.
The feature representation of the input reference image, $f_I$, and the representation of the distorted image, $f_G$, are formed by concatenating the features produced by the feature maps of the top and bottom branches:

$f_I = \left\{ \tilde{I}_j^{(i)};\ i = 0, \ldots, m;\ j = 1, \ldots, n_i \right\}$ (1)

$f_G = \left\{ \tilde{G}_j^{(i)};\ i = 0, \ldots, m;\ j = 1, \ldots, n_i \right\}$ (2)

Here, $m$ represents the number of feature maps in each branch, $f$ stands for the feature representation, $n_i$ denotes the dimension of the features extracted from the $i$-th feature map, and $\tilde{I}^{(0)}$ and $\tilde{G}^{(0)}$ are the input reference and corrupted images.
Table 1 provides a description of the stages of the proposed model as well as the output feature dimension for each stage. The first stage corresponds to the input image. The second and third stages correspond to the convolutional responses from conv1_8 and conv2_4 layers of the VGG19 network. The fourth stage captures the convolutional response from the conv1_1 layer of SqueezeNet. The remaining four stages correspond to the responses from fire modules 2, 4, 6, and 8, respectively. As described in [28], a fire module consists of a squeeze convolution layer with 1 × 1 filters, followed by an expanding layer that combines both 1 × 1 and 3 × 3 convolution filters.
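For illustration, the following PyTorch sketch shows one way such hybrid multi-scale stages could be assembled from torchvision's pre-trained VGG19 and SqueezeNet backbones. The slice indices are assumptions chosen to reproduce the stage output dimensions listed in Table 1; they are not necessarily the authors' exact configuration.

import torch
import torch.nn as nn
from torchvision import models

# Assumed cut points that slice the torchvision feature extractors into the seven
# convolutional stages of Table 1 (two VGG19 stages followed by five SqueezeNet stages).
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features
squeeze = models.squeezenet1_1(weights=models.SqueezeNet1_1_Weights.IMAGENET1K_V1).features

stages = nn.ModuleList([
    vgg[:18],        # VGG19 stage 1 -> 256-channel response at 1/4 resolution
    vgg[18:27],      # VGG19 stage 2 -> 512-channel response at 1/8 resolution
    squeeze[:2],     # SqueezeNet stage 1 -> first conv + ReLU (64 channels)
    squeeze[2:5],    # SqueezeNet stage 2 -> first pair of fire modules (128 channels)
    squeeze[5:8],    # SqueezeNet stage 3 -> next pair of fire modules (256 channels)
    squeeze[8:11],   # SqueezeNet stage 4 -> 384-channel fire modules
    squeeze[11:13],  # SqueezeNet stage 5 -> 512-channel fire modules
])

# Freeze everything: DualNetIQ neither trains nor fine-tunes the backbones.
for stage in stages:
    for p in stage.parameters():
        p.requires_grad = False

def extract_features(x: torch.Tensor) -> list[torch.Tensor]:
    """Return [x, h1, ..., h7]: the input plus one response per stage.
    The two VGG19 stages are chained; the SqueezeNet branch restarts from the image."""
    feats, h = [x], x
    for k, stage in enumerate(stages):
        h = stage(x if k == 2 else h)  # stage index 2 is the first SqueezeNet stage
        feats.append(h)
    return feats

Each returned response plays the role of one row of Table 1 and is consumed by the optimal combined similarity metric described next.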

3.2. Optimal Combined Similarity Metric

At each stage, DualNetIQ computes global texture and structure similarity measurements between the corresponding feature maps of the reference and distorted images. The overall image quality score, $D(I, G)$, is determined through a weighted summation of the similarity scores across all stages:

$D(I, G) = 1 - \sum_{i=0}^{m} \sum_{j=1}^{n_i} \left[ \kappa_{ij}\, l\!\left(\tilde{I}_j^{(i)}, \tilde{G}_j^{(i)}\right) + \xi_{ij}\, s\!\left(\tilde{I}_j^{(i)}, \tilde{G}_j^{(i)}\right) \right]$ (3)

where $\{\kappa_{ij}, \xi_{ij}\}$ are positive learned weights constrained such that

$\sum_{i=0}^{m} \sum_{j=1}^{n_i} \left( \kappa_{ij} + \xi_{ij} \right) = 1$ (4)

$\tilde{I}_j^{(i)}$ and $\tilde{G}_j^{(i)}$ are defined in Equations (1) and (2), respectively. $l\!\left(\tilde{I}_j^{(i)}, \tilde{G}_j^{(i)}\right)$ and $s\!\left(\tilde{I}_j^{(i)}, \tilde{G}_j^{(i)}\right)$ denote the texture similarity and the structure similarity, respectively, and are given by:

$l\!\left(\tilde{I}_j^{(i)}, \tilde{G}_j^{(i)}\right) = \dfrac{2\mu_{\tilde{I}_j^{(i)}}\,\mu_{\tilde{G}_j^{(i)}} + c_1}{\mu_{\tilde{I}_j^{(i)}}^2 + \mu_{\tilde{G}_j^{(i)}}^2 + c_1}$ (5)

$s\!\left(\tilde{I}_j^{(i)}, \tilde{G}_j^{(i)}\right) = \dfrac{2\sigma_{\tilde{I}_j^{(i)}}\,\sigma_{\tilde{G}_j^{(i)}} + c_2}{\sigma_{\tilde{I}_j^{(i)}}^2 + \sigma_{\tilde{G}_j^{(i)}}^2 + c_2}$ (6)

where $\mu_{\tilde{I}_j^{(i)}}$ and $\sigma_{\tilde{I}_j^{(i)}}$ denote the global mean and standard deviation of the feature map $\tilde{I}_j^{(i)}$ (and likewise for $\tilde{G}_j^{(i)}$), and $c_1$ and $c_2$ are small positive constants included for numerical stability.
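To make Equations (3), (5), and (6) concrete, the short PyTorch sketch below computes the global texture and structure terms from per-channel means and standard deviations of each feature map and accumulates the weighted score; the stabilizing constants and helper names are illustrative assumptions rather than the authors' exact implementation.

import torch

C1, C2 = 1e-6, 1e-6  # assumed small stabilizing constants

def ocsm_score(feats_ref, feats_dist, kappa, xi):
    """Optimal combined similarity metric over a list of feature maps.

    feats_ref / feats_dist: lists of tensors shaped (B, C, H, W), one per stage.
    kappa / xi: lists of per-stage weight tensors shaped (C,) summing to 1 overall.
    """
    quality = 0.0
    for f_r, f_d, k, w in zip(feats_ref, feats_dist, kappa, xi):
        # Global statistics of each feature map (spatial mean and std per channel).
        mu_r, mu_d = f_r.mean(dim=(2, 3)), f_d.mean(dim=(2, 3))
        sd_r, sd_d = f_r.std(dim=(2, 3)), f_d.std(dim=(2, 3))

        # Texture term, Eq. (5): similarity of the global means.
        l = (2 * mu_r * mu_d + C1) / (mu_r ** 2 + mu_d ** 2 + C1)
        # Structure term, Eq. (6): similarity of the global standard deviations.
        s = (2 * sd_r * sd_d + C2) / (sd_r ** 2 + sd_d ** 2 + C2)

        quality = quality + (k * l + w * s).sum(dim=1)

    return 1.0 - quality  # Eq. (3): lower D(I, G) means higher predicted quality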
The parameters κ and ξ in Equation (3) determine the relative weights of the texture and structure similarity terms, so they play an important role in the model's performance. It should be noted that we explored many optimization methods to select the most appropriate tuning approach for our model, including Random Search [41], BlendSearch [42], Ax Search, Nevergrad Search [43], Bayesian Optimization HyperBand (BOHB) Search [44], and Bayesian Optimization (BO) Search within the Ray Tune framework [45]. Moreover, we investigated several metaheuristic algorithms, such as Particle Swarm Optimization [46], Grey Wolf Optimization (GWO) [47], and the Tree-structured Parzen Estimator [48]. Ultimately, GWO yielded the best results, so we adopted it for tuning κ and ξ.
Grey wolves are apex predators that exhibit a complex hunting behavior in which they encircle their prey. This behavior is mathematically modeled in GWO with the following equations:

$\vec{D} = \left| \vec{C} \cdot \vec{X}_p(t) - \vec{X}(t) \right|$ (7)

$\vec{X}(t+1) = \vec{X}_p(t) - \vec{A} \cdot \vec{D}$ (8)

where $\vec{X}_p(t)$ is the position of the prey (the best solution obtained so far), $\vec{X}(t)$ is the position vector of a grey wolf, and $\vec{A}$ and $\vec{C}$ are coefficient vectors calculated as:

$\vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a}, \qquad \vec{C} = 2\vec{r}_2$ (9)

where $\vec{a}$ decreases linearly from 2 to 0 over the course of the iterations, and $\vec{r}_1$ and $\vec{r}_2$ are random vectors in $[0, 1]$.
Grey wolves live in a social hierarchy made up of four kinds of wolves: α, β, δ, and ω. This hierarchy coordinates group hunting and survival, and GWO exploits it when solving complex optimization problems. In the GWO algorithm, the best solution is treated as alpha (α), the second-best as beta (β), the third-best as delta (δ), and the remaining solutions as omegas (ω). The α, β, and δ wolves guide the ω wolves towards promising areas of the search space while maintaining a balance between exploration and exploitation. The wolves update their positions with respect to the three best solutions obtained so far (α, β, δ), mimicking how the pack tracks, chases, and attacks prey. The positions are updated as follows:
$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1 \cdot \left| \vec{C}_1 \cdot \vec{X}_\alpha - \vec{X} \right|$ (10)

$\vec{X}_2 = \vec{X}_\beta - \vec{A}_2 \cdot \left| \vec{C}_2 \cdot \vec{X}_\beta - \vec{X} \right|$ (11)

$\vec{X}_3 = \vec{X}_\delta - \vec{A}_3 \cdot \left| \vec{C}_3 \cdot \vec{X}_\delta - \vec{X} \right|$ (12)

$\vec{X}(t+1) = \dfrac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}$ (13)
Figure 3 shows the steps of using GWO to fine-tune our model parameters. The fitness function for this method is
$\mathrm{fitness} = \left| 1 - \rho(\mathrm{gt\_scores}, \mathrm{d\_scores}) \right|$ (14)
where $\rho(\mathrm{gt\_scores}, \mathrm{d\_scores})$ is Spearman's rank correlation coefficient (SRCC) between the ground truth scores and the scores computed by our model with the candidate κ and ξ values suggested by the GWO algorithm. Equation (14) is the objective that GWO seeks to minimize in order to obtain the optimal values of κ and ξ, i.e., those that produce the best alignment between the human subjective scores and the quality predictions of our model. GWO is well suited to this task because its ability to explore complex, high-dimensional search spaces helps refine the parameters that influence the model's outputs.
We conduct this parameter learning on a rich dataset containing various types of image distortions. This ensures that the tuned parameters enable DualNetIQ to perform well across various types of distortions, thereby enhancing its usability and dependability in practical applications. The values of κ and ξ that yield the smallest value of 1 − SRCC are taken as the final optimal values. This confirms not only that the quality predictions made by the model are accurate but also that they are in tune with the subtle perceptual preferences of human observers.
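To make the tuning loop concrete, here is a minimal GWO sketch for minimizing |1 − SRCC| over the concatenated κ and ξ weights. It assumes a callable predict(params, images) that maps a candidate parameter vector to DualNetIQ scores for a validation set; the population size, iteration count, and 16-dimensional search space follow the settings reported in Section 4.1, while everything else is an illustrative simplification rather than the authors' exact code.

import numpy as np
from scipy.stats import spearmanr

def gwo_tune(fitness, dim=16, n_wolves=4, n_iters=50, lo=0.0, hi=1.0, seed=0):
    """Grey Wolf Optimizer: returns the parameter vector with the lowest fitness found."""
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(lo, hi, size=(n_wolves, dim))
    scores = np.array([fitness(w) for w in wolves])
    best, best_score = wolves[scores.argmin()].copy(), scores.min()

    for t in range(n_iters):
        order = np.argsort(scores)                      # alpha, beta, delta = 3 best wolves
        alpha, beta, delta = wolves[order[0]], wolves[order[1]], wolves[order[2]]
        a = 2.0 - 2.0 * t / n_iters                     # decreases linearly from 2 to 0

        for i in range(n_wolves):
            candidates = []
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2 * a * r1 - a, 2 * r2           # coefficient vectors, Eq. (9)
                D = np.abs(C * leader - wolves[i])      # encircling distance, Eq. (7)
                candidates.append(leader - A * D)       # Eqs. (10)-(12)
            wolves[i] = np.clip(np.mean(candidates, axis=0), lo, hi)  # Eq. (13)
            scores[i] = fitness(wolves[i])

        if scores.min() < best_score:
            best, best_score = wolves[scores.argmin()].copy(), scores.min()

    return best

def make_fitness(gt_scores, predict, images):
    """Fitness = |1 - SRCC|, Eq. (14); `predict` is a hypothetical wrapper that maps
    a candidate (kappa, xi) vector to DualNetIQ scores for the validation images."""
    def fitness(params):
        d_scores = predict(params, images)
        rho, _ = spearmanr(gt_scores, d_scores)
        return abs(1.0 - rho)
    return fitness

# Example use (assuming gt_scores, predict, and images exist):
# best_params = gwo_tune(make_fitness(gt_scores, predict, images))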

3.3. Implementation of DualNetIQ

The implementation of DualNetIQ, as outlined in Algorithm 1, begins by loading the tuned parameters κ and ξ, followed by the pre-trained VGG19 and SqueezeNet models, partially freezing their parameters to preserve the learned features while reducing computational complexity. Code is available at https://github.com/Hossam-Mady/DualNetIQ.
Algorithm 1: Pseudocode of DualNetIQ
Load tuned κ and ξ from files
Load pretrained VGG19 and SqueezeNet models
Define stages as sequential models for different layers of VGG19 and SqueezeNet
Freeze model parameters
Set mean and standard deviation for input normalization
Define channel sizes list ([3, 256, 512, 64, 128, 256, 384, 512])

def forward_once(x):
    Normalize x using mean and std
    # Pass x through the VGG19 stages
    Pass x through VGG19_stage1 -> h1
    Pass h1 through VGG19_stage2 -> h2
    # Pass x through the SqueezeNet stages
    Pass x through SqueezeNet_stage1 -> h3
    Pass h3 through SqueezeNet_stage2 -> h4
    Pass h4 through SqueezeNet_stage3 -> h5
    Pass h5 through SqueezeNet_stage4 -> h6
    Pass h6 through SqueezeNet_stage5 -> h7
    Return [x, h1, h2, h3, h4, h5, h6, h7]

def forward(x, y, require_grad = False, batch_average = False):
    Normalize kappa and xi
    term1 = 0
    term2 = 0
    for i in range(len(channel_sizes)):
        x_mean, y_mean = Compute mean of features from x and y
        S1 = Compute similarity using x_mean, y_mean
        term1 += kappa[i] * S1
        x_var, y_var = Compute variance of features from x and y
        S2 = Compute similarity using x_var, y_var
        term2 += xi[i] * S2
    score = 1 - (term1 + term2).squeeze()
    if batch_average:
        return score.mean()
    else:
        return score
The VGG19 and SqueezeNet stages are then defined, with each stage representing specific layers within each architecture. Additionally, we specify input normalization, which provides the mean and standard deviation used to properly scale the input images before feeding them into the networks. The forward_once function passes an input image through the defined stages of VGG19 and SqueezeNet: it first normalizes the image, then feeds it sequentially through the VGG19 stages, extracting features h1 and h2, and through the SqueezeNet stages, extracting features h3 to h7. The function returns a list of features from both models, which subsequently serve as the basis for computing similarity. The core of the algorithm is the forward function, which computes the similarity between the features extracted from two images, x and y. It first normalizes the κ and ξ parameters and initializes two terms, term1 and term2, which accumulate the score. For each stage, it calculates the mean and variance of the features from both images. It then computes two similarities, S1 (texture similarity) and S2 (structure similarity), weighted by kappa[i] and xi[i], respectively. The combination of these two weighted terms produces an image quality score, normalized to fall within a specific range.
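For readers who want to see the flow end to end, the sketch below wires together the hypothetical extract_features and ocsm_score helpers from the earlier sketches (illustrative stand-ins, not the functions exposed by the linked repository); the ImageNet normalization constants are an assumption that mirrors the pseudocode's "Set mean and standard deviation" step.

import torch

# Assumed ImageNet normalization statistics for the pre-trained backbones.
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def dualnetiq_score(ref, dist, kappa, xi):
    """ref, dist: (B, 3, 256, 256) tensors in [0, 1]; returns one quality score per pair."""
    with torch.no_grad():
        feats_ref = extract_features((ref - MEAN) / STD)
        feats_dist = extract_features((dist - MEAN) / STD)
    return ocsm_score(feats_ref, feats_dist, kappa, xi)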

3.4. Evaluation Metrics

The correlation between objective IQA quality scores and mean opinion scores (MOSs) is a commonly used measure of the objective performance of an IQA method. Three metrics are typically used to quantify this correlation: the Spearman rank correlation coefficient (SRCC), the Kendall rank correlation coefficient (KRCC), and the Pearson linear correlation coefficient (PLCC).
While PLCC gauges the linearity of the predicted quality scores, SRCC and KRCC assess their monotonicity. We use SRCC and KRCC, since PLCC requires the predicted scores to be linearly related to the subjective ratings; higher SRCC and KRCC values indicate better prediction performance.
$\mathrm{SRCC} = 1 - \dfrac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$ (15)

$\mathrm{KRCC} = \dfrac{2(C - D)}{n(n - 1)}$ (16)

where $d_i$ is the difference between the $i$-th image's ranks in the objective and subjective ratings, $C$ is the number of concordant pairs, $D$ is the number of discordant pairs, and $n$ is the number of observations.
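For reference, both coefficients can be computed directly from paired objective and subjective scores; a minimal SciPy example with toy arrays (not the authors' evaluation script) is shown below.

from scipy.stats import spearmanr, kendalltau

objective = [0.91, 0.85, 0.40, 0.77, 0.62]   # model predictions (toy values)
subjective = [4.5, 4.1, 1.9, 3.8, 3.0]       # corresponding MOS values (toy values)

srcc, _ = spearmanr(objective, subjective)   # rank-order monotonicity
krcc, _ = kendalltau(objective, subjective)  # concordant vs. discordant pairs
print(f"SRCC = {srcc:.3f}, KRCC = {krcc:.3f}")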

4. Results and Discussion

In this section, we discuss the comprehensive tests conducted to validate the effectiveness of our suggested model. These consisted of standard image quality prediction evaluation, perceptual similarity measurement, and texture similarity assessment. Finally, we conducted ablation studies to demonstrate the impact of each component on the model.

4.1. Parameter Setting

As previously mentioned, we optimized the weights using the GWO method. It was configured with a population size of four and a maximum of 50 iterations. The dimensionality of the search space was set to 16, reflecting the number of parameters (the total number of kappa and xi values over all stages). Parameter values were restricted to the range [0, 1]. The error function was defined as the absolute value of (1 − SRCC), and the goal was to minimize this value.
We performed all computations on an NVIDIA GeForce GTX 1650 GPU using PyTorch version 2.2.0, with all computations offloaded to the CUDA-enabled GPU to enhance performance and reduce runtime. Our system ran Windows 10 64-bit with CUDA version 12.1 for GPU acceleration. The experiment was conducted on a machine with an Intel(R) Core(TM) i7-9700K processor (3.60 GHz) and 32 GB of RAM.
Algorithm 1 shows the steps taken on the reference image and the distorted image to calculate the final assessment of the quality of the distorted image.

4.2. Performance in Quality Prediction and Texture Similarity

In this section, we compare DualNetIQ with 11 FR-IQA models on five standard IQA datasets (LIVE [14], CSIQ [15], TID2013 [16], KADID-10k [17], and PIPAL [18]), as well as on a specialized texture similarity dataset, SynTEX [4], which contains 105 synthesized texture images created using five different methods of texture synthesis based on 21 high-quality textures. The 11 FR-IQA methods include six knowledge-driven models (PSNR, SSIM [3], MS-SSIM [20], FSIM [25], VIF [24], and NLPD [49]) and five data-driven CNN-based models (PieAPP [50], LPIPS [10], DISTS [11], A-DISTS [36], and DeepDC [12]). For consistent evaluation, all models were evaluated using their original implementations, and we selected the default implementation of LPIPS, which is LPIPS-VGG-lin.
Table 2 presents the comparative results, which offer several important insights. Across all of the IQA datasets, the proposed DualNetIQ is consistently strong, outperforming older methods such as PSNR, SSIM, and MS-SSIM as well as newer CNN-based methods such as PieAPP and LPIPS. Notably, DualNetIQ outperforms DISTS, A-DISTS, and DeepDC on most of the IQA datasets, demonstrating its ability to conduct a more comprehensive structural and textural analysis, a crucial step in accurately evaluating image quality. DualNetIQ achieves the second-highest SRCC and KRCC values on the LIVE, TID2013, and KADID-10k datasets. Notably, the difference between its performance and that of the top-performing model on KADID-10k (DeepDC) is only 0.008. DualNetIQ clearly improves performance on TID2013, achieving strong results compared to other methods, especially those that are insensitive to texture resampling. On the CSIQ dataset, DualNetIQ delivers comparable results (SRCC = 0.930 and KRCC = 0.764); furthermore, when evaluating individual distortions on CSIQ, it outperforms the other models. On the PIPAL dataset, which contains many challenging GAN-generated images whose synthesized textures are absent from the reference images, DualNetIQ also performs well.
With SRCC = 0.938 and KRCC = 0.792, DualNetIQ clearly surpasses both knowledge-driven and advanced data-driven models in assessing texture similarity on the SynTEX dataset. DualNetIQ's superior feature extraction, which effectively captures key attributes of visual texture, accounts for this performance edge. This aspect is a challenge for traditional models, which rely on precise pixel correspondence. It is also worth noting that DualNetIQ outperforms modern methods known for their strength in texture similarity assessment, such as DISTS and DeepDC.

4.3. Performance on Perceptual Similarity Measurement

In addition to the traditional image quality evaluation, DualNetIQ performance was also evaluated for the perceptual similarity measurement on the BAPPS dataset. BAPPS contains a diverse collection of image patches that includes distortions produced by real-world algorithms such as super resolution, video deblurring, and frame interpolation, as well as synthetic ones, whether traditional or CNN-based. The evaluation of perceptual similarities in BAPPS follows a Two-Alternative Forced Choice (2AFC) experimental setup. This score quantifies the consistency between the model and human preference; the higher the score, the more consistent the model is, providing a strong indication of its perceptual alignment with human visual perception.
The results shown in Table 3 highlight the excellent performance of DualNetIQ for both synthetic and real-world distortions. The proposed model outperformed all other methods on distortions generated by real-world algorithms, achieving the highest 2AFC score of 0.661; it also surpassed all methods on synthetic distortions, with the highest 2AFC score of 0.797, and thus achieved the highest overall 2AFC score of 0.693.

4.4. Performance Comparison Across Diversity of Distortion Types

Table 4 compares the performance of DualNetIQ with that of DISTS, A-DISTS, and DeepDC for a subset of distortion types in the TID2013 dataset, using SRCC and KRCC. The results show that our proposed model achieves strong generalization across a wide range of distortion types, consistently obtaining the best or at least the second-best result for each type of distortion and demonstrating remarkable versatility and robustness.
Figure 4 shows a boxplot of SRCC values for IQA metrics for different types of distortion in the TID2013, CSIQ, and LIVE datasets. It is easy to see that our suggested model, DualNetIQ, had the highest median and upper whisker. DeepDC demonstrated a slightly lower yet comparable median; however, it presented outliers that suggest limitations associated with certain distortion types. Conversely, A-DISTS recorded the lowest median, and despite the absence of visible outliers, the behavior of its lower whisker was comparable to or worse than that of other metrics’ outliers. DISTS had a median slightly lower than both DualNetIQ and DeepDC, albeit still comparable, and it also exhibited outliers that highlighted limitations for specific distortion types.
Furthermore, Figure 5 provides a similar analysis for KRCC values. DualNetIQ also had the highest median, as well as the highest upper and lower whiskers. DeepDC exhibited a slightly lower median and the smallest lower whisker, suggesting that it performs poorly under certain distortion classes. A-DISTS had the lowest median, with its lower whisker slightly higher than DeepDC’s. Furthermore, DISTS exhibited a marginally lower median than both DualNetIQ and DeepDC, all while maintaining a lower whisker.
In summary, it is evident that DualNetIQ performs better than the other models, resolving issues and improving generalizability across different kinds of distortion.

4.5. Ablation Study

We conducted a series of ablation experiments to delve into the influences of various CNNs acting as feature extractors in the DualNetIQ framework. Specifically, the aim was to study how different CNN selections affect the performance of the model.
First, we carefully tuned κ and ξ parameters in each configuration using the GWO algorithm to ensure the validity and reliability of our findings. This study was essential because it provided a means to identify settings that would result in optimum performance for each CNN network and ensure fair comparison.
Second, we tested the performance of each CNN variant (VGG19, SqueezeNet, AlexNet, and ResNet50) on a wide range of five datasets, including four benchmark IQA datasets and one texture similarity dataset. Table 5 shows how each architecture performed in terms of evaluating texture similarity and image quality, offering a detailed comparison of the performance metrics for each CNN. It also presents the results of combining the VGG19 and SqueezeNet networks (i.e., the proposed DualNetIQ model).
As one can see in Table 5, the VGG19 and SqueezeNet networks achieved SRCC values higher than 0.9 with the LIVE and SynTEX datasets. However, AlexNet obtained good results with LIVE, CSIQ, TID2013, and KADID-10k, and it achieved limited results with the SynTEX dataset (KRCC values < 0.72). In turn, ResNet50 obtained limited results with the SynTEX dataset (SRCC values < 0.7 and KRCC values < 0.5). According to the results, VGG19 demonstrated commendable performance, approaching that of DualNetIQ in certain cases, and even outperforming in the case of CSIQ. However, it is important to note that DualNetIQ outperformed VGG19 across all datasets if we consider the different distortion types separately in CSIQ. In addition, DualNetIQ reduced the computational load of the model by selecting and utilizing only selective layers from VGG19, particularly by replacing some of the computation-intensive layers from VGG19 with more efficient counterparts from SqueezeNet. The quality prediction became much better, especially on the TID2013 dataset, where it achieved SRCC = 0.865 and KRCC = 0.678, compared to SRCC = 0.833 and KRCC = 0.648 when VGG19 was used alone. Additionally, on the SynTEX dataset, adding efficient SqueezeNet layers increased SRCC to 0.938 and KRCC to 0.792, which was better than the SRCC of 0.928 and KRCC of 0.733 that was achieved with VGG19 alone.

4.6. Complexity Comparison

We compared our proposed method, DualNetIQ, with a set of IQA methods in terms of computational complexity, a perspective that is crucial for real-time applications. Two measures were used in the comparison: frames per second (FPS) and floating-point operations (FLOPs). As shown in Table 6, DualNetIQ achieved a competitive FPS value of 29.8 and 46.92 billion FLOPs, the second-best result after DISTS, which achieved slightly better figures. However, DualNetIQ outperforms DISTS in image quality assessment and texture similarity, particularly when dealing with certain types of distortions.
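As an illustration of how such figures can be obtained, the sketch below times repeated forward passes for FPS and uses fvcore's FLOP counter for an approximate operation count; this is a generic benchmarking recipe under assumed settings (256 × 256 inputs, CUDA device), not the authors' measurement script, and the absolute numbers depend heavily on hardware.

import time
import torch
from fvcore.nn import FlopCountAnalysis

def benchmark(metric, input_shape=(1, 3, 256, 256), n_runs=100, device="cuda"):
    """Rough FPS / GFLOPs estimate for a full-reference metric taking (ref, dist) tensors."""
    ref = torch.rand(input_shape, device=device)
    dist = torch.rand(input_shape, device=device)
    metric = metric.to(device).eval()

    flops = FlopCountAnalysis(metric, (ref, dist)).total()  # operations for one forward pass

    with torch.no_grad():
        for _ in range(10):                 # warm-up iterations
            metric(ref, dist)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_runs):
            metric(ref, dist)
        torch.cuda.synchronize()
        fps = n_runs / (time.perf_counter() - start)

    return fps, flops / 1e9                  # frames per second and GFLOPs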
It should be noted that one limitation of the model is that it has not been extensively tested on images with unusual colors such as night shots and underwater. Also, we did not present results for real systems (e.g., in surveillance cameras or vision systems) in this study. Additionally, the performance of the proposed model with other transformer architectures [52] could be studied. All these points will be addressed in future studies.

5. Conclusions and Future Work

In this paper, we proposed DualNetIQ, an FR-IQA model with explicit tolerance to texture resampling. Our model utilizes two powerful CNNs, VGG19 and SqueezeNet, to extract features from both reference and distorted images. The eight feature maps are then combined using the optimal combined similarity metric, with its weighting parameters fine-tuned using GWO. Our model architecture enables it to address the limitations of traditional metrics as well as deep learning-based metrics that rely on a single CNN for feature extraction. DualNetIQ was tested on IQA, texture similarity, perceptual similarity, and different types of distortion. The results show that our model works better and more efficiently than other existing models.
Future work may involve incorporating additional metrics to calculate the similarity at each stage of the DualNetIQ model, with the aim of enhancing its performance. Additionally, different methods for parameter tuning may be investigated. The effectiveness of various architectures, such as EfficientNet or Swin Transformer, with the proposed model will be also assessed. Moreover, we will conduct real-world deployment tests and evaluate the model’s performance in handling images with unusual colors.

Author Contributions

Conceptualization, A.A., H.M., H.E. and M.A.-N.; methodology, A.A., H.M., A.M.A. and M.A.-N.; software, A.A., M.A.-N. and H.M.; validation, H.E., A.M.A. and A.A.A.; formal analysis, A.A. and A.A.A.; investigation, H.M., H.E. and A.A.A.; resources, A.A. and H.M.; data curation, A.A.A. and A.M.A.; writing—original draft preparation, A.A., H.M. and M.A.-N.; writing—review and editing, H.E., A.A.A. and A.M.A.; visualization, H.E., A.M.A., A.A.A. and M.A.-N.; supervision, A.A., A.M.A. and M.A.-N.; project administration, A.A., A.A.A. and M.A.-N.; funding acquisition, H.E. and A.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

The authors extend their appreciation to the Deanship of Research and Graduate Studies at King Khalid University for funding this work through the Large Research Project under grant number RGP2/301/46.

Data Availability Statement

The authors declare that the data used to support the findings of this study will be available from the corresponding author upon request.

Conflicts of Interest

There are no conflicts of interest.

References

  1. Bosse, S.; Maniry, D.; Müller, K.R.; Wiegand, T.; Samek, W. Deep Neural Networks for No-Reference and Full-Reference Image Quality Assessment. IEEE Trans. Image Process. 2018, 27, 206–219.
  2. Ma, C.; Shi, Z.; Lu, Z.; Xie, S.; Chao, F.; Sui, Y. A Survey on Image Quality Assessment: Insights, Analysis, and Future Outlook. arXiv 2025, arXiv:2502.08540.
  3. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
  4. Golestaneh, S.A.; Subedar, M.M.; Karam, L.J. The effect of texture granularity on texture synthesis quality. In Proceedings of the SPIE Optical Engineering + Applications, San Diego, CA, USA, 9–13 August 2015; Volume 9599, pp. 356–361.
  5. Efros, A.A.; Freeman, W.T. Image quilting for texture synthesis and transfer. In Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, Los Angeles, CA, USA, 12–17 August 2001; pp. 341–346.
  6. Wei, L.Y.; Levoy, M. Fast Texture Synthesis using Tree-structured Vector Quantization. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, New Orleans, LA, USA, 23–28 July 2000; pp. 479–488.
  7. Kim, J.; Lee, S. Deep Learning of Human Visual Sensitivity in Image Quality Assessment Framework. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1969–1977.
  8. Varga, D. A Combined Full-Reference Image Quality Assessment Method Based on Convolutional Activation Maps. Algorithms 2020, 13, 313.
  9. Gao, F.; Wang, Y.; Li, P.; Tan, M.; Yu, J.; Zhu, Y. DeepSim: Deep similarity for image quality assessment. Neurocomputing 2017, 257, 104–114.
  10. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595.
  11. Ding, K.; Ma, K.; Wang, S.; Simoncelli, E.P. Image Quality Assessment: Unifying Structure and Texture Similarity. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 2567–2581.
  12. Zhu, H.; Chen, B.; Zhu, L.; Wang, S.; Lin, W. DeepDC: Deep Distance Correlation as a Perceptual Image Quality Evaluator. arXiv 2023, arXiv:2211.04927v2.
  13. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
  14. Sheikh, H.R.; Wang, Z.; Cormack, L.; Bovik, A.C. Image and Video Quality Assessment Research at LIVE. Available online: http://live.ece.utexas.edu/research/quality/ (accessed on 3 October 2024).
  15. Chandler, D.M.; Larson, E.C. Most apparent distortion: Full-reference image quality assessment and the role of strategy. J. Electron. Imaging 2010, 19, 011006.
  16. Ponomarenko, N.; Jin, L.; Ieremeiev, O.; Lukin, V.; Egiazarian, K.; Astola, J.; Vozel, B.; Chehdi, K.; Carli, M.; Battisti, F.; et al. Image database TID2013: Peculiarities, results and perspectives. Signal Process. Image Commun. 2015, 30, 57–77.
  17. Lin, H.; Hosu, V.; Saupe, D. KADID-10k: A Large-scale Artificially Distorted IQA Database. In Proceedings of the 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX), Berlin, Germany, 5–7 June 2019; pp. 1–3.
  18. Gu, J.; Cai, H.; Chen, H.; Ye, X.; Ren, J.; Dong, C. PIPAL: A Large-Scale Image Quality Assessment Dataset for Perceptual Image Restoration. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Volume 16, pp. 633–651.
  19. Lin, W.; Li, D.; Xue, P. Discriminative Analysis of Pixel Difference towards Picture Quality Prediction. In Proceedings of the IEEE International Conference on Image Processing, Barcelona, Spain, 14–17 September 2003; Volume 3, pp. 193–196.
  20. Wang, Z.; Simoncelli, E.P.; Bovik, A.C. Multi-Scale Structural Similarity For Image Quality Assessment. In Proceedings of the IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 9–12 November 2003; pp. 1398–1402.
  21. Wang, Z.; Li, Q. Information content weighting for perceptual image quality assessment. IEEE Trans. Image Process. 2011, 20, 1185–1198.
  22. Wang, Z.; Simoncelli, E.P. Translation insensitive image similarity in complex wavelet domain. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Philadelphia, PA, USA, 23 March 2005; Volume 2.
  23. Zhang, L.; Shen, Y.; Li, H. VSI: A visual saliency-induced index for perceptual image quality assessment. IEEE Trans. Image Process. 2014, 23, 4270–4281.
  24. Sheikh, H.R.; Bovik, A.C. A Visual Information Fidelity Approach To Video Quality Assessment. In Proceedings of the First International Workshop on Video Processing and Quality Metrics for Consumer Electronics, Scottsdale, AZ, USA, 23–25 January 2005; Volume 7.
  25. Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. FSIM: A Feature Similarity Index for Image Quality Assessment. IEEE Trans. Image Process. 2011, 20, 2378–2386.
  26. Liang, Y.; Wang, J.; Wan, X.; Gong, Y.; Zheng, N. Image quality assessment using similar scene as reference. In Proceedings of the Computer Vision—ECCV 2016, Amsterdam, The Netherlands, 11–14 October 2016; Volume 9909, pp. 3–18.
  27. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
  28. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
  29. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
  30. You, J.; Korhonen, J. Transformer For Image Quality Assessment. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 1389–1393.
  31. Ke, J.; Wang, Q.; Wang, Y.; Milanfar, P.; Yang, F. MUSIQ: Multi-scale Image Quality Transformer. In Proceedings of the IEEE International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 5128–5137.
  32. Cheon, M.; Yoon, S.J.; Kang, B.; Lee, J. Perceptual image quality assessment with transformers. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Nashville, TN, USA, 19–25 June 2021; pp. 433–442.
  33. Golestaneh, S.A.; Dadsetan, S.; Kitani, K.M. No-Reference Image Quality Assessment via Transformers, Relative Ranking, and Self-Consistency. In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2022, Waikoloa, HI, USA, 3–8 January 2022; pp. 3989–3999.
  34. Yang, S.; Wu, T.; Shi, S.; Lao, S.; Gong, Y.; Cao, M.; Wang, J.; Yang, Y. MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, New Orleans, LA, USA, 19–20 June 2022; pp. 1190–1199.
  35. Keshari, A.; Subudhi, B. Multi-Scale Features and Parallel Transformers Based Image Quality Assessment. arXiv 2022, arXiv:2204.09779.
  36. Ding, K.; Liu, Y.; Zou, X.; Wang, S.; Ma, K. Locally Adaptive Structure and Texture Similarity for Image Quality Assessment. In Proceedings of the 29th ACM International Conference on Multimedia, Chengdu, China, 20–24 October 2021; pp. 2483–2491.
  37. Wang, Z.; Simoncelli, E.P. Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual quantities. J. Vis. 2008, 8, 8.
  38. Berardino, A.; Ballé, J.; Laparra, V.; Simoncelli, E.P. Eigen-Distortions of Hierarchical Representations. arXiv 2018, arXiv:1710.02266.
  39. Mady, H.; Agamy, A.; Aly, A.M.; Abdel-Nasser, M. A Comparative Analysis of CNN Feature Extractors and Parameter Tuning with Ray Tune Search Algorithms for Image Quality Assessment. Aswan Univ. J. Sci. Technol. 2024, 4, 132–148.
  40. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
  41. Bergstra, J.; Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305.
  42. Wang, C.; Wu, Q.; Weimer, M.; Zhu, E. FLAML: A Fast and Lightweight AutoML Library. Proc. Mach. Learn. Syst. 2021, 3, 434–447.
  43. Rapin, J.; Teytaud, O. Nevergrad—A Gradient-Free Optimization Platform. Available online: https://GitHub.com/FacebookResearch/Nevergrad (accessed on 10 March 2025).
  44. Falkner, S.; Klein, A.; Hutter, F. BOHB: Robust and Efficient Hyperparameter Optimization at Scale. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 1437–1446.
  45. Liaw, R.; Liang, E.; Nishihara, R.; Moritz, P.; Gonzalez, J.E.; Stoica, I. Tune: A Research Platform for Distributed Model Selection and Training. arXiv 2018, arXiv:1807.05118.
  46. Kennedy, J.; Eberhart, R. Particle Swarm Optimization. In Proceedings of the ICNN'95—International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948.
  47. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey Wolf Optimizer. Adv. Eng. Softw. 2014, 69, 46–61.
  48. Watanabe, S. Tree-Structured Parzen Estimator: Understanding Its Algorithm Components and Their Roles for Better Empirical Performance. arXiv 2023, arXiv:2304.11127.
  49. Laparra, V.; Ballé, J.; Berardino, A.; Simoncelli, E.P. Perceptual image quality assessment using a normalized Laplacian pyramid. Electron. Imaging 2016, 2016, 1–6.
  50. Prashnani, E.; Cai, H.; Mostofi, Y.; Sen, P. PieAPP: Perceptual Image-Error Assessment through Pairwise Preference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1808–1817.
  51. Xue, W.; Zhang, L.; Mou, X.; Bovik, A.C. Gradient Magnitude Similarity Deviation: A Highly Efficient Perceptual Image Quality Index. IEEE Trans. Image Process. 2013, 23, 684–695.
  52. Rehman, M.U.; Nizami, I.F.; Ullah, F.; Hussain, I. IQA Vision Transformed: A Survey of Transformer Architectures in Perceptual Image Quality Assessment. IEEE Access 2024, 12, 183369–183393.
Figure 1. Three images of the same texture from the SynTEX dataset [4] accompanied by MOS scores and quality assessments from some IQA models. (a) Reference image (Curtain). (b) Synthesized image using Algorithm 1 [5]. (c) Synthesized image using Algorithm 3 [6]. It is obvious from the results that PSNR and SSIM assign quality scores that do not correlate well with MOSs, as they give low scores to the synthesized images. DISTS assigns the same score to both synthesized images. DeepDC assigns quality scores that are higher than those reflected in the MOSs. DualNetIQ assigns scores that most closely match the MOSs.
Figure 2. The architecture of the proposed DualNetIQ for full-reference IQA.
Figure 3. Constructing the optimal combined similarity metric using the GWO algorithm.
Figure 4. Boxplot of SRCC values for some IQA metrics across various distortion types in TID2013, CSIQ, and LIVE datasets.
Figure 5. Boxplot of KRCC values for some IQA metrics across various distortion types in the TID2013, CSIQ, and LIVE datasets.
Table 1. Description of the stages of the proposed model.
Stage (i) | Description | Output Feature Dimension (n_i)
1 | Input image | 3 × 256 × 256
2 | VGG19 stage 1: conv1_8 | 256 × 64 × 64
3 | VGG19 stage 2: conv2_4 | 512 × 32 × 32
4 | SqueezeNet stage 1: conv1_1 | 64 × 127 × 127
5 | SqueezeNet stage 2: conv2_6 (fire1 + fire2) | 128 × 64 × 64
6 | SqueezeNet stage 3: conv3_6 (fire3 + fire4) | 256 × 32 × 32
7 | SqueezeNet stage 4: conv4_6 (fire5 + fire6) | 384 × 16 × 16
8 | SqueezeNet stage 5: conv5_6 (fire7 + fire8) | 512 × 16 × 16
Table 2. Performance comparison of the DualNetIQ model against multiple state-of-the-art FR-IQA models on five standard IQA datasets, as well as one texture similarity dataset.
Method | LIVE [14] (SRCC/KRCC) | CSIQ [15] (SRCC/KRCC) | TID2013 [16] (SRCC/KRCC) | KADID-10k [17] (SRCC/KRCC) | PIPAL [18] (SRCC/KRCC) | SynTEX [4] (SRCC/KRCC)
PSNR | 0.873/0.680 | 0.809/0.599 | 0.687/0.496 | 0.676/0.488 | 0.407/0.233 | 0.320/0.211
SSIM [3] | 0.931/0.763 | 0.872/0.678 | 0.720/0.527 | 0.724/0.537 | 0.498/0.276 | 0.620/0.446
MS-SSIM [20] | 0.951/0.805 | 0.906/0.730 | 0.786/0.605 | 0.826/0.635 | 0.552/0.291 | 0.632/0.454
FSIMc [25] | 0.965/0.837 | 0.931/0.768 | 0.851/0.666 | 0.854/0.665 | 0.569/0.304 | 0.081/0.077
VIF [24] | 0.964/0.828 | 0.911/0.743 | 0.677/0.518 | 0.679/0.507 | 0.443/0.261 | 0.606/0.492
NLPD [49] | 0.937/0.778 | 0.932/0.769 | 0.800/0.625 | 0.812/0.623 | 0.469/0.255 | 0.606/0.464
PieAPP [50] | 0.919/0.750 | 0.892/0.715 | 0.876/0.683 | 0.836/0.647 | 0.700/0.492 | 0.715/0.532
LPIPS [10] | 0.932/0.765 | 0.876/0.689 | 0.670/0.497 | 0.843/0.653 | 0.573/0.323 | 0.663/0.478
DISTS [11] | 0.954/0.811 | 0.939/0.780 | 0.830/0.639 | 0.887/0.709 | 0.624/0.433 | 0.923/0.759
A-DISTS [36] | 0.955/0.812 | 0.942/0.796 | 0.836/0.642 | 0.890/0.715 | 0.622/0.431 | 0.760/-
DeepDC [12] | 0.940/0.781 | 0.937/0.774 | 0.844/0.651 | 0.905/0.733 | 0.684/0.467 | 0.896/0.727
DualNetIQ (ours) | 0.955/0.815 | 0.930/0.764 | 0.865/0.678 | 0.897/0.719 | 0.635/0.452 | 0.938/0.792
Table 3. Performance comparison of various IQA models on the BAPPS [10] dataset. The table displays the 2AFC scores for each model, which quantify the agreement with human judgments on image quality. Scores range from [0, 1], with higher values indicating better performance and closer alignment with human perceptual evaluations.
Method | Synthetic: Traditional | Synthetic: CNN-Based | Synthetic: All | Real-World: Super Resolution | Real-World: Video Deblurring | Real-World: Colorization | Real-World: Frame Interpolation | Real-World: All | All
Human | 0.808 | 0.844 | 0.826 | 0.734 | 0.671 | 0.688 | 0.686 | 0.695 | 0.739
PSNR | 0.573 | 0.801 | 0.687 | 0.642 | 0.590 | 0.624 | 0.543 | 0.614 | 0.633
SSIM [3] | 0.605 | 0.806 | 0.705 | 0.647 | 0.589 | 0.624 | 0.573 | 0.617 | 0.640
MS-SSIM [20] | 0.585 | 0.768 | 0.676 | 0.638 | 0.589 | 0.524 | 0.572 | 0.596 | 0.617
FSIMc [25] | 0.627 | 0.794 | 0.710 | 0.660 | 0.590 | 0.573 | 0.581 | 0.615 | 0.640
VSI [23] | 0.630 | 0.818 | 0.724 | 0.668 | 0.592 | 0.597 | 0.568 | 0.622 | 0.648
VIF [24] | 0.556 | 0.744 | 0.650 | 0.651 | 0.594 | 0.515 | 0.597 | 0.603 | 0.615
NLPD [49] | 0.550 | 0.764 | 0.657 | 0.655 | 0.584 | 0.528 | 0.552 | 0.600 | 0.615
GMSD [51] | 0.609 | 0.772 | 0.690 | 0.677 | 0.594 | 0.517 | 0.575 | 0.613 | 0.633
DeepIQA [1] | 0.703 | 0.794 | 0.748 | 0.660 | 0.582 | 0.585 | 0.598 | 0.615 | 0.650
PieAPP [50] | 0.727 | 0.770 | 0.746 | 0.684 | 0.585 | 0.594 | 0.598 | 0.627 | 0.659
LPIPS [10] | 0.714 | 0.814 | 0.764 | 0.705 | 0.605 | 0.625 | 0.630 | 0.641 | 0.692
DISTS [11] | 0.749 | 0.824 | 0.786 | 0.705 | 0.600 | 0.629 | 0.625 | 0.649 | 0.685
DeepDC [12] | 0.757 | 0.825 | 0.796 | 0.712 | 0.608 | 0.631 | 0.626 | 0.655 | 0.692
DualNetIQ (ours) | 0.739 | 0.832 | 0.797 | 0.719 | 0.606 | 0.648 | 0.631 | 0.661 | 0.693
Table 4. Performance of DualNetIQ against texture-resampling-insensitive methods over a subset of distortion types in the TID2013 dataset.
Noise Type | DISTS [11] (SRCC/KRCC) | A-DISTS [36] (SRCC/KRCC) | DeepDC [12] (SRCC/KRCC) | DualNetIQ (SRCC/KRCC)
Quantization Noise | 0.831/0.644 | 0.799/0.620 | 0.872/0.682 | 0.862/0.672
Gaussian Blur | 0.938/0.776 | 0.938/0.774 | 0.954/0.808 | 0.941/0.780
Contrast Change | 0.488/0.335 | 0.433/0.289 | 0.472/0.297 | 0.472/0.303
JPEG Compression | 0.904/0.702 | 0.907/0.709 | 0.917/0.725 | 0.921/0.736
JPEG 2000 Compression | 0.944/0.795 | 0.945/0.797 | 0.949/0.801 | 0.946/0.798
Additive Gaussian Noise | 0.877/0.677 | 0.858/0.651 | 0.868/0.667 | 0.883/0.688
High-Frequency Noise | 0.878/0.659 | 0.872/0.651 | 0.879/0.663 | 0.895/0.677
Impulse Noise | 0.713/0.506 | 0.708/0.503 | 0.693/0.480 | 0.800/0.591
Image Denoising | 0.905/0.736 | 0.892/0.714 | 0.912/0.755 | 0.923/0.762
Mean Shift | 0.801/0.594 | 0.801/0.598 | 0.737/0.538 | 0.804/0.604
Change of Color Saturation | 0.814/0.612 | 0.839/0.644 | 0.717/0.518 | 0.836/0.642
Multiplicative Gaussian Noise | 0.830/0.616 | 0.795/0.578 | 0.823/0.611 | 0.826/0.612
Table 5. Ablation study results: comparative performance of DualNetIQ using different CNN architectures as standalone feature extractors.
CNN Model | LIVE [14] (SRCC/KRCC) | CSIQ [15] (SRCC/KRCC) | TID2013 [16] (SRCC/KRCC) | KADID-10k [17] (SRCC/KRCC) | SynTEX [4] (SRCC/KRCC)
VGG19 | 0.954/0.813 | 0.941/0.786 | 0.833/0.648 | 0.896/0.718 | 0.928/0.773
SqueezeNet | 0.953/0.812 | 0.896/0.714 | 0.858/0.668 | 0.881/0.695 | 0.930/0.778
AlexNet | 0.942/0.797 | 0.900/0.716 | 0.833/0.640 | 0.860/0.666 | 0.892/0.713
ResNet50 | 0.862/0.681 | 0.816/0.633 | 0.774/0.578 | 0.801/0.612 | 0.631/0.490
VGG19 + SqueezeNet (DualNetIQ) | 0.955/0.815 | 0.930/0.764 | 0.865/0.678 | 0.897/0.719 | 0.938/0.792
Table 6. FPS and FLOPS results for five IQA models including DualNetIQ.
Model | FPS | FLOPs (Billion)
LPIPS [10] | 9.68 | 401
PieAPP [50] | 0.36 | 827.3
DISTS [11] | 32.9 | 40.125
DeepDC [12] | 27.77 | 56.69
DualNetIQ (ours) | 29.8 | 46.92