No-Reference Image Quality Assessment Based on Dual-Domain Feature Fusion

Image quality assessment (IQA) aims to devise computational models that evaluate image quality in a perceptually consistent manner. In this paper, a novel no-reference image quality assessment model based on dual-domain feature fusion, dubbed DFF-IQA, is proposed. Firstly, in the spatial domain, features based on the weighted local binary pattern, naturalness, and spatial entropy are extracted, where the naturalness features are represented by the fitting parameters of a generalized Gaussian distribution. Secondly, in the frequency domain, spectral entropy, the oriented energy distribution, and the fitting parameters of an asymmetric generalized Gaussian distribution are extracted. Thirdly, the features extracted in the two domains are fused to form a quality-aware feature vector. Finally, quality regression by random forest is conducted to build the relationship between the image features and the quality score, yielding a measure of image quality. The resulting algorithm is tested on the LIVE database and compared with competing IQA models; the experimental results indicate that the proposed DFF-IQA method is more consistent with the human visual system than the competing IQA methods.


Introduction
Many image processing tasks (e.g., image acquisition, compression, transmission, restoration, etc.) introduce different types of distortion at different levels, so perceptual quality assessment has been receiving more and more attention [1]. It can be divided into subjective IQA and objective IQA. The conventional way of measuring image quality is to solicit the opinions of human observers. However, such subjective IQA methods are cumbersome and time-consuming, making them difficult to incorporate into automatic systems. Therefore, objective IQA methods are of greater practical significance [2,3].
According to their scope of application, current NR-IQA methods can be roughly divided into two categories: distortion-specific methods [20][21][22] and general-purpose methods that handle various types of distortion [23][24][25][26][27][28][29][30][31]. Since distortion-specific algorithms need to know the type of distortion (e.g., blur, noise, compression), their scope of application is limited. Therefore, research on general-purpose methods has become a hot topic in the IQA field, including two-stage framework models and global framework models.
The main contributions of this paper are as follows: (1) we analyze and extract perceptual features in two domains, and the fused quality-aware feature vector is verified to improve the performance of quality evaluation; (2) we compare representative FR/NR-IQA models with our DFF-IQA model, and the experimental results show that the proposed method achieves better performance and good consistency with human subjective perception.
The remainder of this paper is organized as follows. In Section 2, the proposed model is presented. In Section 3, we illustrate and discuss the experimental results. Finally, we conclude our paper in Section 4.

Proposed DFF-IQA Method
As shown in Figure 1, this paper proposes a novel NR-IQA algorithm based on dual-domain feature fusion, dubbed DFF-IQA. Firstly, in the spatial domain, we extract features based on the weighted local binary pattern (WLBP) [32], naturalness, and entropy from the distorted image. WLBP describes the texture characteristics of distorted images via the statistical features of a gradient-weighted LBP histogram, the naturalness features are represented by the fitting parameters of a generalized Gaussian distribution (GGD), and spatial entropy is computed over image blocks. Secondly, in the frequency domain, the DCT and curvelet transforms are applied to the distorted image block-wise. In the DCT domain, a spectral entropy feature is extracted from the image blocks. In the curvelet domain, an asymmetric generalized Gaussian distribution (AGGD) model is employed to summarize the distribution of the curvelet coefficients, and an oriented energy distribution (OED) feature is further extracted to describe the curvelet coefficients. Thirdly, the features extracted in the two domains are fused to build the quality-aware feature vector. Finally, quality regression by random forest is conducted to build the relationship between the image features and the quality score, yielding a measure of image quality.
Local binary pattern (LBP) is an operator that effectively describes local texture features of images and has shown good performance in IQA tasks [33]. In this paper, LBP coding with the rotation-invariant uniform mode is employed, and the gradient magnitude is adopted for weighting. Here, the gradient magnitude image G_I is obtained using the Prewitt filter:

G_I = sqrt((I * P_h)^2 + (I * P_v)^2),

where I is the input image, P_h and P_v are the Prewitt filters in the horizontal and vertical directions respectively, "*" denotes the convolution operation, and G_I is the gradient magnitude image of I.

We calculate the local rotation-invariant uniform LBP operator L_P,R by:

L_P,R = sum_{p=1}^{P} z(G_p - G_c),  if ψ(L_P,R) <= 2;  P + 1, otherwise,

where R is the radius and P is the number of neighboring points. G_c denotes the center pixel at position (x_c, y_c), and G_p is a neighboring pixel at (x_p, y_p) sampled at distance R from G_c, with p ∈ {1, 2, ..., P}. Here, z(θ) is the step function defined by:

z(θ) = 1, if θ >= T;  0, otherwise,

where T is the threshold value. In addition, ψ(·) counts the number of bitwise transitions around the circular neighbor pattern:

ψ(L_P,R) = |z(G_P - G_c) - z(G_1 - G_c)| + sum_{p=2}^{P} |z(G_p - G_c) - z(G_{p-1} - G_c)|,

and rotation invariance is obtained via the circular bit-wise right shift operator ROR(β, k), which shifts the tuple β by k positions, k ∈ {1, 2, ..., P}:

L_P,R^{ri} = min{ ROR(L_P,R, k) | k = 1, 2, ..., P }.

Finally, we obtain L_P,R with a length of P + 2.
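The WLBP computation above can be sketched as follows (a simplified sketch, not the authors' implementation: grayscale input is assumed, the circular neighbours are taken by nearest-neighbour sampling rather than interpolation, and the threshold is T = 0):

```python
import numpy as np

def prewitt_gradient(img):
    """Gradient magnitude G_I from horizontal/vertical Prewitt filters."""
    ph = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)
    pv = ph.T
    pad = np.pad(img.astype(float), 1, mode="edge")
    h, w = img.shape
    gh = np.zeros((h, w))
    gv = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 3, j:j + 3]
            gh[i, j] = np.sum(win * ph)
            gv[i, j] = np.sum(win * pv)
    return np.sqrt(gh ** 2 + gv ** 2)

def lbp_riu2(img, P=8, R=1, T=0.0):
    """Rotation-invariant uniform LBP: codes 0..P for uniform patterns, P+1 for the rest."""
    img = img.astype(float)
    h, w = img.shape
    codes = np.full((h, w), P + 1, dtype=int)   # border pixels keep the catch-all code
    for i in range(R, h - R):
        for j in range(R, w - R):
            bits = []
            for p in range(P):
                theta = 2 * np.pi * p / P
                y = int(np.rint(i + R * np.sin(theta)))   # nearest-neighbour sampling
                x = int(np.rint(j + R * np.cos(theta)))
                bits.append(1 if img[y, x] - img[i, j] >= T else 0)
            # number of 0/1 transitions around the circle (the psi function)
            trans = sum(bits[p] != bits[(p + 1) % P] for p in range(P))
            codes[i, j] = sum(bits) if trans <= 2 else P + 1
    return codes

def wlbp_histogram(img, P=8, R=1):
    """Gradient-weighted histogram O_H(c) over the P+2 LBP codes, unit-normalized."""
    codes = lbp_riu2(img, P, R)
    weights = prewitt_gradient(img)
    hist = np.array([weights[codes == c].sum() for c in range(P + 2)])
    return hist / (hist.sum() + 1e-12)
```

With P = 8 this yields the 10-bin gradient-weighted histogram the paper uses as its 10-dimensional WLBP feature.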
Prewitt filters in the horizontal, vertical, main-diagonal, and secondary-diagonal directions are used to obtain four directional gradient images by convolution, defined as O_d = I * P_d, d = 1, 2, 3, 4, where P_d is the Prewitt filter in direction d, I denotes the input image, and O_d is the gradient image in direction d. We use the maximum gradient magnitude O(i, j), calculated by Equation (7), as the LBP weight of each pixel:

O(i, j) = max_{d=1,2,3,4} O_d(i, j), (7)

where O_d(i, j) is the gradient value in direction d at pixel (i, j). This yields the final weight map. The gradient magnitudes of pixels sharing the same WLBP pattern are then accumulated into O_H(c), which can be regarded as the gradient-weighted WLBP histogram:

O_H(c) = sum_{i=1}^{h} sum_{j=1}^{w} O(i, j) · δ(L_P,R(i, j) = c),
where O(i, j) is the maximum directional gradient response, i.e., the weight map; w and h are the width and height of the image, respectively; and c ∈ [0, C] indexes the LBP coding patterns. In this paper, the LBP of rotation-invariant uniform mode is used, and we extract 10-dimensional statistical features from pattern 1 to pattern 10 at a single scale. As shown in Figure 2d, the WLBP distribution differs significantly across distortion types. The abscissa of the histogram is the LBP coding pattern, from pattern 1 to pattern 10. In the histogram of the pristine natural image, most occurrences fall into patterns 1, 2, 5 and 10, while the other patterns are relatively rare. When the image is distorted, the pattern distribution changes significantly. Therefore, the spatial distortion of the image can be described by the WLBP features.

The locally mean-subtracted and contrast-normalized (MSCN) coefficients have been successfully applied to measure image naturalness [30]. For each distorted image, its MSCN coefficients are calculated by:

Î(i, j) = (I(i, j) - µ(i, j)) / (σ(i, j) + C_1), (10)

where I is the input image, C_1 is a constant that prevents instabilities when the denominator tends to zero, and µ and σ are the local mean and standard deviation of the distorted image, given by Equations (11) and (12):

µ(i, j) = sum_{k=-K}^{K} sum_{l=-L}^{L} ω_{k,l} I(i + k, j + l), (11)

σ(i, j) = sqrt( sum_{k=-K}^{K} sum_{l=-L}^{L} ω_{k,l} (I(i + k, j + l) - µ(i, j))^2 ), (12)
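A minimal sketch of the MSCN computation follows, assuming a grayscale float image; the Gaussian window width σ = 7/6 and C_1 = 1 are conventional BRISQUE-style choices, not values stated in this paper, and the local deviation here averages per-pixel demeaned squares rather than re-centering each window, a slight simplification:

```python
import numpy as np

def gaussian_kernel(K=3, L=3, sigma=7.0 / 6.0):
    """2-D circularly-symmetric Gaussian window, rescaled to unit volume."""
    y, x = np.mgrid[-K:K + 1, -L:L + 1]
    w = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return w / w.sum()

def mscn(img, C1=1.0):
    """Locally mean-subtracted, contrast-normalized coefficients (cf. Eq. (10))."""
    img = img.astype(float)
    w = gaussian_kernel()
    h, wd = img.shape
    pad = np.pad(img, 3, mode="reflect")
    mu = np.zeros((h, wd))
    for i in range(h):                      # weighted local mean, Eq. (11)
        for j in range(wd):
            mu[i, j] = np.sum(pad[i:i + 7, j:j + 7] * w)
    dev = np.pad((img - mu) ** 2, 3, mode="reflect")
    sigma = np.zeros((h, wd))
    for i in range(h):                      # weighted local deviation, cf. Eq. (12)
        for j in range(wd):
            sigma[i, j] = np.sqrt(np.sum(dev[i:i + 7, j:j + 7] * w))
    return (img - mu) / (sigma + C1)
```

On a constant image the coefficients vanish, matching the intuition that MSCN removes local mean and contrast.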
where ω = {ω_{k,l} | k = -K, ..., K; l = -L, ..., L} is a 2D circularly-symmetric Gaussian weighting function sampled out to three standard deviations and rescaled to unit volume. In our implementation, we set K = L = 3. Figure 3a,b shows a JP2K-distorted image and its MSCN coefficients, respectively; Figure 3c,d shows the statistical distribution of the distorted image and the histogram of its MSCN coefficients, respectively. As shown in Figure 3c,d, the distribution of MSCN coefficients differs markedly from the pixel distribution of the distorted image and approximates a Gaussian distribution. Therefore, the distribution of MSCN coefficients can be fitted by a GGD to represent the degree of naturalness [30].

Figure 4 plots histograms of MSCN coefficients for a pristine natural image and for various distorted versions of it. Notice how the pristine image exhibits a Gaussian-like appearance, while each distortion modifies the statistics in its own characteristic way. For example, blur creates a more Laplacian appearance, while white-noise distortion reduces the weight of the tails of the histogram. We have found that a generalized Gaussian distribution (GGD) can effectively capture a broad spectrum of distorted image statistics, which often exhibit changes in the tail behaviour (i.e., kurtosis) of the empirical coefficient distributions. The GGD with zero mean is given by:

f(x; α, σ^2) = (α / (2 β Γ(1/α))) exp( -(|x| / β)^α ),

where

β = σ sqrt( Γ(1/α) / Γ(3/α) ),

and Γ(·) is the gamma function:

Γ(a) = ∫_0^∞ t^(a-1) e^(-t) dt, a > 0.
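The GGD parameters (α, σ) can be estimated by moment matching: the ratio E[x²]/E[|x|]² equals ρ(α) = Γ(1/α)Γ(3/α)/Γ(2/α)², so α is recovered by inverting ρ over a grid. A minimal sketch (the grid resolution is an illustrative choice, not the paper's):

```python
import math
import numpy as np

def fit_ggd(x):
    """Moment-matching estimate of the GGD shape alpha and scale sigma."""
    x = np.asarray(x, dtype=float)
    sigma_sq = np.mean(x ** 2)
    e_abs = np.mean(np.abs(x))
    rho_hat = sigma_sq / (e_abs ** 2 + 1e-12)   # empirical E[x^2] / E[|x|]^2
    grid = np.arange(0.2, 10.0, 0.001)           # candidate shape parameters
    rho = np.array([math.gamma(1 / a) * math.gamma(3 / a) / math.gamma(2 / a) ** 2
                    for a in grid])
    alpha = grid[np.argmin((rho - rho_hat) ** 2)]
    return alpha, math.sqrt(sigma_sq)
```

For Gaussian data the estimate converges to α ≈ 2, since ρ(2) = π/2 matches the Gaussian moment ratio.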

Local Spatial Entropy and Spectral Entropy
Although global entropy can reflect the overall information in the image, it cannot reflect the details in the image. Therefore, this paper uses entropies computed from local image blocks, on both the block spatial scale responses and also on the block DCT coefficients [25].
The spatial entropy is computed by:

E_s = - sum_v p(v) log_2 p(v),

where v are the pixel values in a local block, with empirical probability density p(v).
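A short sketch of the block-wise spatial entropy and the mean/skewness statistics used as features (an 8-bit integer image and non-overlapping blocks are assumed; the paper's exact block geometry may differ):

```python
import numpy as np

def block_spatial_entropy(img, block=8):
    """Shannon entropy of the pixel values within each non-overlapping block."""
    img = np.asarray(img)
    h, w = img.shape
    ents = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            v = img[i:i + block, j:j + block].ravel()
            p = np.bincount(v.astype(int), minlength=256) / v.size  # empirical p(v)
            p = p[p > 0]
            ents.append(-np.sum(p * np.log2(p)))
    return np.array(ents)

def entropy_features(ents):
    """Mean and skewness of the per-block entropies, as used for quality features."""
    mu = ents.mean()
    sd = ents.std() + 1e-12
    skew = np.mean(((ents - mu) / sd) ** 3)
    return mu, skew
```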
The block DCT coefficient matrix M_C is first computed on 8 × 8 blocks. Using the DCT rather than the DFT reduces block-edge energy in the transform coefficients. The DCT coefficients are normalized by:

P(i, j) = M_C(i, j)^2 / sum_{i,j} M_C(i, j)^2,

where 1 ≤ i ≤ 8, 1 ≤ j ≤ 8, and (i, j) ≠ (1, 1), i.e., the DC coefficient is excluded. Then the local spectral entropy is computed by:

E_f = - sum_{i,j} P(i, j) log_2 P(i, j).

To illustrate the behavior of the local spatial and spectral entropies against different degrees and types of distortion, we conducted a series of validation experiments. As shown in Figure 5, the undistorted image (org) has a left-skewed spatial entropy histogram, and the spectral entropy histogram has a similar distribution. We find that different types of distortion (JP2K and JPEG compression, noise, blur and fast fading) exert systematically different influences on the local spatial and spectral entropies. Therefore, we use their skewness and mean as features to measure image quality.
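The spectral entropy step can be sketched with an explicit orthonormal DCT-II basis so the example stays self-contained (building the 8 × 8 DCT matrix by hand rather than calling a transform library is an implementation choice of this sketch):

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix: D @ D.T == I."""
    k, t = np.mgrid[0:n, 0:n]
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    D[0, :] /= np.sqrt(2.0)
    return D

def block_spectral_entropy(img, block=8):
    """Entropy of the normalized DCT power spectrum per block, DC excluded."""
    D = dct_matrix(block)
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    ents = []
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            C = D @ img[i:i + block, j:j + block] @ D.T   # 2-D block DCT
            C2 = C ** 2
            C2[0, 0] = 0.0                                 # exclude the DC coefficient
            P = C2 / (C2.sum() + 1e-12)                    # local spectral probability map
            P = P[P > 0]
            ents.append(-np.sum(P * np.log2(P)))
    return np.array(ents)
```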

AGGD Fitting Parameter of Curvelet Coefficients
The curvelet transform is a higher-dimensional generalization of the wavelet transform designed to represent images at different scales and angles [26]; it is therefore well suited to capturing information along image edges. Taking f[t_1, t_2] (0 ≤ t_1, t_2 < n) as the input in the Cartesian coordinate system, the discrete curvelet transform of the 2-D function f[t_1, t_2] is computed by:

c(j, l, k) = sum_{0 ≤ t_1, t_2 < n} f[t_1, t_2] · conj(ϕ_{j,l,k}[t_1, t_2]),

where ϕ_{j,l,k} is a curvelet of scale j at position index k with angle index l, and t_1, t_2 denote coordinates in the spatial domain [34].
After the curvelet transform is applied to the distorted image, we obtain the curvelet coefficients, from which we compute MSCN coefficients according to Equation (10) in Section 2.1.2. While MSCN coefficients are distinctly more homogeneous for pristine images, the signs of adjacent coefficients also exhibit a regular structure, which is disturbed in the presence of distortion. We capture this structure using the empirical distributions of pairwise products of neighboring MSCN coefficients along four orientations: horizontal (H), vertical (V), main-diagonal (D1), and secondary-diagonal (D2), each at a distance of 1 pixel, as depicted in Figure 6. Denoting the MSCN coefficients of the curvelet coefficient map by Î(i, j), the products are:

H(i, j) = Î(i, j) Î(i, j + 1),
V(i, j) = Î(i, j) Î(i + 1, j),
D1(i, j) = Î(i, j) Î(i + 1, j + 1),
D2(i, j) = Î(i, j) Î(i + 1, j - 1).

To visualize how the paired products vary in the presence of distortion, Figure 7 plots histograms of the paired products along each of the four orientations, for a reference image and for distorted versions of it; Figure 7a-d correspond to the horizontal, vertical, main-diagonal, and secondary-diagonal products, respectively. We fit the statistical distribution of each of the four histograms with a zero-mean asymmetric generalized Gaussian distribution (AGGD):

f(x; α, β_l, β_r) = (α / ((β_l + β_r) Γ(1/α))) exp( -(-x / β_l)^α ),  x < 0;
f(x; α, β_l, β_r) = (α / ((β_l + β_r) Γ(1/α))) exp( -(x / β_r)^α ),   x ≥ 0,

where α is the shape parameter controlling the statistical distribution, and β_l and β_r are the scale parameters of the left and right tails, respectively. When β_l = β_r, the AGGD reduces to the generalized Gaussian distribution (GGD). In addition, we use the three parameters above to compute η as an additional feature:

η = (β_r - β_l) Γ(2/α) / Γ(1/α).
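The paired products and a moment-matching AGGD fit can be sketched as follows; the fit uses the standard BRISQUE-style estimator (left/right standard deviations plus an inverse lookup of the generalized Gaussian ratio), which is a common choice but not necessarily the authors' exact procedure:

```python
import math
import numpy as np

def paired_products(mscn):
    """Products of each MSCN coefficient with its H, V, D1, D2 neighbour."""
    H  = mscn[:, :-1]  * mscn[:, 1:]
    V  = mscn[:-1, :]  * mscn[1:, :]
    D1 = mscn[:-1, :-1] * mscn[1:, 1:]
    D2 = mscn[:-1, 1:]  * mscn[1:, :-1]
    return H, V, D1, D2

def fit_aggd(x):
    """Moment-matching AGGD fit returning (alpha, beta_l, beta_r, eta)."""
    x = np.asarray(x, dtype=float).ravel()
    sigma_l = math.sqrt(np.mean(x[x < 0] ** 2))    # left-tail deviation
    sigma_r = math.sqrt(np.mean(x[x >= 0] ** 2))   # right-tail deviation
    gamma_hat = sigma_l / (sigma_r + 1e-12)
    r_hat = np.mean(np.abs(x)) ** 2 / (np.mean(x ** 2) + 1e-12)
    R_hat = r_hat * (gamma_hat ** 3 + 1) * (gamma_hat + 1) / (gamma_hat ** 2 + 1) ** 2
    grid = np.arange(0.2, 10.0, 0.001)
    rho = np.array([math.gamma(2 / a) ** 2 / (math.gamma(1 / a) * math.gamma(3 / a))
                    for a in grid])
    alpha = grid[np.argmin((rho - R_hat) ** 2)]
    c = math.sqrt(math.gamma(1 / alpha) / math.gamma(3 / alpha))
    beta_l, beta_r = sigma_l * c, sigma_r * c
    eta = (beta_r - beta_l) * math.gamma(2 / alpha) / math.gamma(1 / alpha)
    return alpha, beta_l, beta_r, eta
```

For symmetric data the fit gives β_l ≈ β_r, so η ≈ 0, consistent with the AGGD reducing to a GGD.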

Oriented Energy Distribution (OED)
Cortical neurons are highly sensitive to oriented energy in images, whereas image distortion can modify the oriented energy distribution in an unnatural manner. The curvelet transform is a rich source of orientation information on images and their distortions [26]. To describe changes of the energy distribution in the curvelet domain, we take the mean of the logarithm of the magnitudes of the curvelet coefficients at each scale as an energy measure, and compute the energy differences between adjacent layers and between interval layers. The energy statistic e_j at scale j is calculated by:

e_j = E[log_10 θ_j], j = 1, 2, 3, 4, 5,

where θ_j is the set of coefficient magnitudes at scale index j, and d_1, d_2, ..., d_6 denote the energy differences between the adjacent layers and the interval layers. The curvelet transform also carries rich directional information about the reference and distorted images, and the magnitude of the oriented energy differs across distortion categories; the average kurtosis m is therefore selected as a quality feature.
Previous studies [35][36][37][38][39] have found that image distortion affects image anisotropy. To capture this, we calculate the coefficient of variation of the non-cardinal orientation energies [40]:

c_v = σ_so / µ_so,

where µ_so and σ_so are the sample mean and standard deviation of the non-cardinal orientation energies; c_v captures the degree of anisotropy of the image and is used as a quality feature. Thus, we obtain an eight-dimensional feature group describing the oriented energy distribution, referred to as f_OED = [m, c_v, d_1, d_2, d_3, d_4, d_5, d_6].
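Given per-scale curvelet coefficient sets, the OED statistics can be sketched as below. Note two assumptions made here that the paper leaves implicit: the split of the six differences into four adjacent pairs (d1–d4) and two interval pairs over scales 1–3 and 3–5 (d5, d6) is one plausible reading, and the kurtosis m is computed over per-orientation energies supplied by the caller:

```python
import numpy as np

def oed_differences(scale_coeffs):
    """d1..d6: energy differences between adjacent and interval curvelet scales.

    scale_coeffs: list of 5 arrays of coefficients, one per scale j = 1..5.
    """
    # per-scale energy e_j = E[log10 |theta_j|]
    e = np.array([np.mean(np.log10(np.abs(np.asarray(c)) + 1e-12))
                  for c in scale_coeffs])
    d_adj = [e[j + 1] - e[j] for j in range(4)]   # adjacent layers: d1..d4
    d_int = [e[2] - e[0], e[4] - e[2]]            # interval layers (assumed pairing): d5, d6
    return np.array(d_adj + d_int)

def anisotropy_features(orient_energy):
    """m: kurtosis of the per-orientation energies; c_v = sigma_so / mu_so."""
    oe = np.asarray(orient_energy, dtype=float)
    mu, sd = oe.mean(), oe.std()
    m = np.mean(((oe - mu) / (sd + 1e-12)) ** 4)
    c_v = sd / (mu + 1e-12)
    return m, c_v
```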

Pooling Strategy
The features extracted from the two domains are fused to form a multi-dimensional feature vector [41][42][43][44][45][46]. After feature extraction, the quality regression from feature space to image quality is conducted, which can be denoted as

Q = f_Q(F_f),

where f_Q(·) is a quality regression function realized by the feature pooling strategy, F_f is the extracted feature vector, and Q is the quality of the tested image. At present, learning-based methods [25,26,30] are widely used in the feature pooling stage of IQA, such as support vector regression (SVR), random forest (RF) and BP neural networks. SVR is relatively fast in regression but prone to over-fitting, and the BP neural network is rarely employed because of its high complexity. In the learning process of RF, a large number of decision trees is generated; each tree gives its own prediction, and the final regression score is obtained by averaging the predictions of all trees. Many studies have shown that RF has higher prediction accuracy and is less prone to over-fitting [47][48][49], performing better than SVR on color images. Therefore, RF is used in this paper to learn the mapping from feature vectors to the Mean Opinion Score (MOS), yielding the final quality score.
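The pooling stage can be sketched with scikit-learn's random forest regressor (an assumed toolchain; the feature dimension of 32 and the synthetic surrogate targets below are purely illustrative, standing in for the fused quality-aware vectors and MOS labels):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 32))        # stand-in fused dual-domain feature vectors
y_train = X_train[:, 0] * 10 + 50           # surrogate MOS targets for illustration

# Each tree regresses features -> MOS; the forest averages the per-tree outputs.
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

q = rf.predict(rng.normal(size=(5, 32)))    # predicted quality scores for 5 test images
```

In practice F_f would be the concatenation of the WLBP, naturalness, entropy, AGGD and OED feature groups, and y_train the database's subjective scores.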

Database and Evaluation Criterion
To verify the effectiveness of the proposed algorithm, we tested the performance of DFF-IQA on the LIVE IQA database [50], which contains 29 reference images distorted by five distortion types (white noise, JPEG and JP2K compression, Gaussian blur, and fast Rayleigh fading), yielding 779 distorted images. Each distorted image is provided with a Difference Mean Opinion Score (DMOS), representative of the human subjective rating of the image. The DMOS ranges from 0 to 100; the larger the DMOS, the more severe the distortion. Figure 8 shows some examples of reference scenes in the LIVE database, including the "bikes", "buildings", "caps", "lighthouse2", "monarch", "ocean", "parrots", "plane" and "rapids" scenes (not all scenes are listed, for layout reasons).
The Pearson linear correlation coefficient (PLCC), Spearman rank-order correlation coefficient (SROCC) and root-mean-square error (RMSE) were used to measure the correlation between a set of predicted quality scores Q_pre and a set of subjective quality scores Q_sub. Better correlation with human perception corresponds to an RMSE close to 0 and PLCC and SROCC close to 1. PLCC, SROCC and RMSE are computed by Equations (30)-(32), respectively:

PLCC = cov(Q_pre, Q_sub) / (σ(Q_pre) σ(Q_sub)), (30)

where cov(·,·) is the covariance between Q_pre and Q_sub, and σ(·) is the standard deviation;

SROCC = 1 - (6 sum_{i=1}^{N} d_i^2) / (N (N^2 - 1)), (31)

where d_i is the rank difference of the i-th evaluation sample in Q_pre and Q_sub, and N is the number of samples;

RMSE = sqrt( (1/N) sum_{i=1}^{N} (x_i - y_i)^2 ), (32)

where N is the number of samples, x_i is the subjective score (MOS/DMOS), and y_i is the score predicted by the IQA model.
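The three criteria translate directly into code; this sketch implements Equations (30)-(32) with numpy (the rank-difference form of SROCC assumes no tied scores):

```python
import numpy as np

def plcc(q_pre, q_sub):
    """Pearson linear correlation coefficient, Eq. (30)."""
    a = np.asarray(q_pre, float)
    b = np.asarray(q_sub, float)
    return np.cov(a, b)[0, 1] / (a.std(ddof=1) * b.std(ddof=1))

def srocc(q_pre, q_sub):
    """Spearman rank-order correlation, Eq. (31); assumes no ties."""
    r1 = np.argsort(np.argsort(np.asarray(q_pre)))
    r2 = np.argsort(np.argsort(np.asarray(q_sub)))
    d = r1 - r2
    n = len(d)
    return 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

def rmse(x, y):
    """Root-mean-square error, Eq. (32)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    return np.sqrt(np.mean((x - y) ** 2))
```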

Performance Analysis of Different Features
Overall, the proposed method extracts five types of quality-aware features from the distorted image, as tabulated in Table 1. Table 2 shows the performance comparison of the individual features on the specific distortion types of the LIVE database. The overall performance of the combination of the five feature groups is better than that of any single feature group, which shows that the design of each feature is reasonable and that the features are complementary.

Overall Performance Analysis
We compared the performance of the proposed algorithm (DFF-IQA) with three FR-IQA models (PSNR, SSIM [4] and VIF [7]) and five NR-IQA algorithms (BIQI [23], DIIVINE [24], BLIINDS-II [28], BRISQUE [30] and SSEQ [25]) on individual distortion types over the LIVE database. For a fair comparison, since the FR algorithms need no training, we performed a similar random 20% test-set selection 1000 times to obtain their median performance indices, and we tested the FR approaches only on the distorted images (excluding the reference images of the LIVE IQA database). For the NR approaches, the same random 80%-20% train-test trials were conducted, and the median performance was taken as the overall performance index. The results are shown in Tables 3-6: Tables 3-5 report SROCC, PLCC and RMSE, respectively. When evaluating the whole LIVE database, the proposed method achieves a median SROCC of 0.9576 and a median PLCC of 0.9671, both the best among the compared methods. We also calculated the standard deviations (STD) of the performance indices to judge algorithm stability, reported in Table 6; higher PLCC and SROCC with lower STD and RMSE indicate better quality prediction, and the proposed model also shows the best stability. In addition, the VIF model outperforms the proposed method in median RMSE; however, its application is limited because it is an FR-IQA model. In summary, considering practical applications, the proposed DFF-IQA model is more consistent with the human visual system than the other competing IQA methods. To further verify the superiority of the proposed DFF-IQA method, we also conducted a statistical significance analysis following the approach in [16].
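The repeated-split evaluation protocol can be sketched as follows. To keep the example self-contained, a trivial 1-nearest-neighbour regressor stands in for the random-forest pooling stage, and the synthetic features/DMOS below are illustrative only:

```python
import numpy as np

def _srocc(a, b):
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    n = len(a)
    return 1 - 6 * np.sum((ra - rb) ** 2) / (n * (n ** 2 - 1))

def _plcc(a, b):
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / (np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)) + 1e-12)

def median_performance(features, dmos, n_trials=100, train_frac=0.8, seed=0):
    """Median SROCC/PLCC over repeated random train-test splits."""
    features = np.asarray(features, float)
    dmos = np.asarray(dmos, float)
    rng = np.random.default_rng(seed)
    n = len(dmos)
    sroccs, plccs = [], []
    for _ in range(n_trials):
        idx = rng.permutation(n)
        cut = int(train_frac * n)
        tr, te = idx[:cut], idx[cut:]
        # 1-NN regression as a stand-in for the random-forest pooling stage
        dist = np.linalg.norm(features[te][:, None, :] - features[tr][None, :, :], axis=2)
        pred = dmos[tr][np.argmin(dist, axis=1)]
        sroccs.append(_srocc(pred, dmos[te]))
        plccs.append(_plcc(pred, dmos[te]))
    return np.median(sroccs), np.median(plccs)
```

With the paper's setup, `n_trials` would be 1000 and the predictor the trained RF model.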
The comparison results among the nine metrics are shown in Table 7 in terms of the median PLCC. The proposed method is clearly superior to the other competing IQA models, which is consistent with the data in Table 4. Table 7. Statistical significance tests of different IQA models in terms of PLCC. A value of '1' (highlighted in green) indicates that the model in the row is significantly better than the model in the column, while a value of '0' (highlighted in purple) indicates that the model in the row is not significantly better than the model in the column. The symbol "-" (highlighted in blue) indicates that the models in the row and column are statistically indistinguishable.

PSNR SSIM VIF BIQI DIIVINE BLIINDS-II BRISQUE SSEQ Proposed
To further analyze the prediction performance of the proposed model, Figure 9 provides a scatter plot of subjective ratings (DMOS) versus the objective scores obtained by DFF-IQA, where each point ('+') represents one test image. The red curve in Figure 9 is obtained by a logistic function. The points cluster tightly around the curve, which means that the model correlates well with the subjective ratings. From Figure 10, we find that the proposed method is superior to the other competing FR-IQA and NR-IQA models, which is consistent with the median PLCC in Table 4. From the box plot we can also see that the VIF model performs as well as the proposed method; however, its application is limited because it is a full-reference method. The experimental results therefore further confirm that the proposed DFF-IQA model has good performance and practical applicability.


Conclusions
In this paper, we proposed a novel NR-IQA metric, dubbed DFF-IQA, based on features extracted in two domains. The basic idea is to exploit known properties of the human visual system (HVS) to build an IQA model useful for blind quality evaluation of color images. To this end, the proposed DFF-IQA model characterizes image quality from both the spatial and frequency domains. In the spatial domain, weighted local binary pattern (WLBP), naturalness, and spatial entropy features are extracted. In the frequency domain, spectral entropy, asymmetric generalized Gaussian distribution (AGGD) fitting parameters, and the oriented energy distribution (OED) of the curvelet coefficients are extracted. The features extracted in the two domains are then fused to form a feature vector. Finally, random forest (RF) is adopted to build the relationship between the image features and quality scores, yielding a measure of image quality. Experiments on the LIVE database demonstrate the superiority of the proposed DFF-IQA model. In the future, we will consider further improving the performance of the algorithm by extracting more effective perceptual features.