Full-Reference Image Quality Assessment Based on Grünwald–Letnikov Derivative, Image Gradients, and Visual Saliency

Abstract: The purpose of image quality assessment is to estimate the perceptual quality of digital images coherently with human judgement. Over the years, many structural features have been utilized or proposed to quantify the degradation of an image in the presence of various noise types. The image gradient is an obvious and very popular tool in the literature to quantify these changes. However, the gradient characterizes an image only locally. On the other hand, results from previous studies indicate that the global contents of a scene are analyzed by the human visual system before the local features. Relying on these properties of the human visual system, we propose a full-reference image quality assessment metric that characterizes the global changes of an image by the Grünwald–Letnikov derivative and the local changes by image gradients. Moreover, visual saliency is also utilized to weight the changes in the images and to emphasize those areas which are salient to the human visual system. To demonstrate the efficiency of the proposed method, extensive experiments were carried out on publicly available benchmark image quality assessment databases.


Introduction
Image quality assessment (IQA) is still a serious research challenge due to the difficulty of modelling the enormous complexity of the human visual system and perception. Presently, IQA algorithms are divided into two distinct classes, i.e., subjective and objective IQA. Specifically, subjective IQA focuses on collecting subjective quality scores from human participants in a laboratory environment [1] or an online crowdsourcing experiment [2]. Subsequently, the users' individual quality ratings are averaged into mean opinion scores (MOS) that are later considered a direct measure of image quality. In addition, subjective IQA studies in detail the effects of viewing distances, display devices, lighting conditions, and the participants' demographic and physical characteristics. Many benchmark IQA databases [3][4][5], which are the results of subjective quality experiments, can be found online. Specifically, these databases consist of a number of digital images with their corresponding MOS values.
In contrast to subjective IQA, the aim of objective IQA is to devise mathematical algorithms and methods which are capable of predicting perceptual image quality. In the literature, objective IQA is classified into three broad groups. The first group is full-reference image quality assessment (FR-IQA), where the algorithms estimate the quality of distorted images with full access to the distortion-free, reference images. In contrast, no information is available about reference images in no-reference image quality assessment (NR-IQA). Finally, reduced-reference image quality assessment (RR-IQA) corresponds to a transition between NR-IQA and FR-IQA. Although full information about the reference images is not available, some features derived from them can be applied in RR-IQA.
Over the years, many structural features have been utilized or proposed to quantify image degradations. The image gradient, which characterizes an image locally, is a very popular tool in the literature for this purpose [6][7][8][9]. Results of previous studies indicate that the global contents of a scene are analyzed by the human visual system before the local features [10]. The main contribution of this study is an FR-IQA metric that characterizes the global changes of an image by the Grünwald-Letnikov derivative and the local changes by image gradients. Thus, a combined approach is proposed in this regard. Moreover, visual saliency is also utilized to weight the changes in the images and to emphasize those image regions which are salient to the human visual system.

Literature Review
In the literature, numerous FR-IQA algorithms and metrics have been proposed in recent decades [11]. These methods can be divided into five classes: (i) error visibility, (ii) structural similarity, (iii) information-theoretic, (iv) learning-based, and (v) fusion-based methods. The main idea of error visibility methods is to devise a distance measure between pixel values, or between transformed representations, of the reference and the distorted images to quantify perceptual quality. The most well-known example is the simple mean square error, which correlates weakly with perceptual quality but is still widely used owing to its simplicity [12]. Another well-known example is the peak signal-to-noise ratio (PSNR), which is commonly applied to quantify the quality of image reconstruction and lossy compression [13]. Ponomarenko et al. further developed PSNR by incorporating the discrete cosine transform (DCT) coefficients and the contrast sensitivity function [14].
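As a concrete illustration of the error visibility idea, MSE and PSNR can be computed in a few lines. The sketch below is in Python/NumPy rather than any reference implementation, and the 8-bit peak value of 255 is an assumption:

```python
import numpy as np

def mse(ref, dist):
    """Mean square error between two equally sized images."""
    ref = np.asarray(ref, dtype=np.float64)
    dist = np.asarray(dist, dtype=np.float64)
    return np.mean((ref - dist) ** 2)

def psnr(ref, dist, peak=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    err = mse(ref, dist)
    if err == 0:
        return np.inf  # identical images
    return 10.0 * np.log10(peak ** 2 / err)
```

Note that both measures compare pixels independently of their spatial context, which is exactly why they correlate only weakly with perceived quality.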
Structural similarity methods try to measure the similarity between the corresponding image regions of the reference and the distorted image. The representative example of this approach, and probably the most well-known FR-IQA metric, is the structural similarity index measure (SSIM) [15], which compares the reference and the distorted images with respect to luminance, contrast, and structure. Over the years, many extensions and modifications of SSIM have been proposed in the literature. For example, Wang et al. [16] calculated SSIM over multiple scales of an input image. In contrast, Li and Bovik [17] determined SSIM for three distinct image regions, i.e., textures, edges, and smooth regions, and took their weighted average as a perceptual quality metric. Later, this approach was further developed by dividing edges into preserved and changed categories [18]. To achieve higher accuracy, Liu et al. [19] computed SSIM in the wavelet domain. This approach was further developed in the complex wavelet domain by Sampat et al. [20]. Wang and Li [21] measured the information content of the input images and used it to weight SSIM. Sun et al. [22] proposed to first segment the reference and distorted images into superpixels [23], since they provide a more meaningful representation of images than rectangular pixel grids. This method was further improved by Frackiewicz et al. [24] by using other color spaces and comparing the similarity maps with the mean deviation similarity index.
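To make the luminance-contrast-structure idea tangible, the following sketch computes a single-window ("global") SSIM value. The real metric evaluates the same expression over a sliding window, and the constants follow the customary (kL)² choice for 8-bit images; both are assumptions of this sketch rather than details taken from [15]:

```python
import numpy as np

def global_ssim(x, y, c1=6.5025, c2=58.5225):
    """Single-window SSIM: luminance and contrast/structure terms combined.
    c1 = (0.01*255)^2 and c2 = (0.03*255)^2 are the customary constants."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mx, my = x.mean(), y.mean()          # luminance statistics
    vx, vy = x.var(), y.var()            # contrast statistics
    cov = ((x - mx) * (y - my)).mean()   # structure statistic
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

For identical inputs the value is exactly 1; any luminance shift or structural change pulls it below 1.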
Information-theoretic FR-IQA approaches measure some kind of mutual information between the reference and the distorted image to quantify perceptual image quality. A representative example is the visual information fidelity (VIF) model [25]. Specifically, the authors applied Gaussian scale mixtures in the wavelet domain to model the reference and the distorted images. Mutual information was measured between the two Gaussian scale mixtures to quantify perceptual quality.
Recently, deep learning has gained popularity in the field of visual quality assessment as well [26][27][28]. Learning-based methods apply some kind of machine or deep learning algorithm to learn the relationship between image features and perceptual quality. For example, Tang et al. [29] extracted spatial and frequency domain features from reference-distorted image pairs and combined them. The obtained features were projected onto perceptual quality scores with the help of a trained random forest regressor. In contrast, Bosse et al. [30] used a convolutional neural network (CNN) as a feature extractor. More specifically, deep features were extracted from a distorted and a reference image patch by a CNN and fused together. Subsequently, the fused feature vectors were projected onto patch-wise quality scores. To get the perceptual quality of an input image, the arithmetic mean of the patch-wise scores was determined. In contrast, Ahn et al. [31] predicted a distortion sensitivity map with a three-stream CNN using the distorted image, the reference image, and the spatial error map as input. To get the perceptual quality, the sensitivity map is multiplied by the spatial error map.
Fusion-based methods take existing FR-IQA metrics and compile a new image quality evaluator from them. The main idea behind fusion-based methods is similar to that of boosting in machine learning. For example, Okarma et al. [32] studied the properties of the MS-SSIM, VIF, and R-SVD FR-IQA metrics thoroughly and proposed the fusion of these three metrics by a particular arithmetic expression containing products and powers. Later, Okarma proposed different regression techniques for a more effective fusion of FR-IQA metrics [33,34]. Based on the results of Okarma, Oszust [35] and Yuan et al. [36] introduced other regression-based fusion techniques. Specifically, in [35] traditional FR-IQA metrics were used as predictor variables in a multiple linear regression model, while Yuan et al. [36] utilized kernel ridge regression for combining predefined local structures and local distortion measurements. In [37], a support vector regression based fusion was carried out over ten FR-IQA metrics. In contrast, Lukin et al. [38] trained a neural network to fuse the results of six traditional FR-IQA metrics. Instead of machine learning techniques, Oszust [39] implemented a genetic algorithm for the decision fusion of multiple metrics. This approach was further developed in [40] by applying multi-gene genetic programming. Amirshahi et al. [41] compared the feature maps of the reference and the distorted image, extracted from an AlexNet [42] convolutional neural network, using traditional FR-IQA metrics. To obtain the perceptual quality of the distorted image, the quality scores of the feature maps were aggregated using different types of averages, such as the arithmetic and geometric mean.

Organization of the Paper
The remainder of this study is organized as follows. After this introduction and literature review, Section 2 briefly introduces the mathematical preliminaries, i.e., the Grünwald-Letnikov derivative, and describes our proposed method in detail. Next, Section 3 gives the definitions of the applied evaluation metrics and presents a comprehensive comparison to the state-of-the-art. Lastly, the paper is concluded in Section 4.

Preliminaries
In this section, some mathematical concepts and definitions are introduced which are of vital importance for our proposed FR-IQA metrics. The Grünwald-Letnikov derivative, introduced by the Austrian mathematician Anton Karl Grünwald and the Russian mathematician Aleksey Vasilievich Letnikov, is a basic extension of the definition of the derivative in fractional calculus. Specifically, it makes it possible to take the derivative of a function a non-integer number of times [47]. In the literature, the definition of the Grünwald-Letnikov derivative is deduced from integer-order calculus. The starting point is the definition of the first-order derivative of a one-dimensional signal f(x), which is determined as:

f'(x) = \lim_{h \to 0} \frac{f(x) - f(x-h)}{h}. (1)

Based on this, the second-order derivative can be expressed as

f''(x) = \lim_{h \to 0} \frac{f(x) - 2f(x-h) + f(x-2h)}{h^2}. (2)

In general, for any positive integer n, we can derive the following formula

f^{(n)}(x) = \lim_{h \to 0} \frac{1}{h^n} \sum_{k=0}^{n} (-1)^k \binom{n}{k} f(x-kh), (3)

where

\binom{n}{k} = \frac{n!}{k!(n-k)!}. (4)

Eliminating the restriction that n must be a positive integer by allowing a non-integer order α, it is reasonable to define

{}^{GL}D^{\alpha}_{x_0,x} f(x) = \lim_{h \to 0} \frac{1}{h^{\alpha}} \sum_{k=0}^{[(x-x_0)/h]} (-1)^k \binom{\alpha}{k} f(x-kh), (5)

where x and x_0 represent the upper and lower bounds, respectively. Moreover, [·] stands for the rounding operator.
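A short sketch can make the Grünwald-Letnikov construction concrete. The code below (Python, not the authors' MATLAB; fixing the step at h = 1 by default and truncating each sum at the first available sample are assumptions of the sketch) generates the coefficients (−1)^k·C(α, k) by a stable recurrence and applies the memory summation to a sampled signal:

```python
def gl_coefficients(alpha, n_terms):
    """Coefficients c_k = (-1)^k * C(alpha, k), generated by the stable
    recurrence c_0 = 1, c_k = c_{k-1} * (k - 1 - alpha) / k."""
    coeffs = [1.0]
    for k in range(1, n_terms):
        coeffs.append(coeffs[-1] * (k - 1 - alpha) / k)
    return coeffs

def gl_derivative_1d(signal, alpha, h=1.0):
    """Discrete Grünwald-Letnikov derivative of a sampled 1-D signal with step h.
    Each output sample sums over *all* earlier samples: the 'memory' property."""
    c = gl_coefficients(alpha, len(signal))
    return [sum(c[k] * signal[x - k] for k in range(x + 1)) / h ** alpha
            for x in range(len(signal))]
```

For α = 1 the coefficients collapse to (1, −1, 0, 0, ...) and the sketch reduces to ordinary backward differences; for α = 0 it returns the signal unchanged, so the fractional orders interpolate between the two.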
Since the generalized binomial coefficient can be written as

\binom{\alpha}{k} = \frac{\Gamma(\alpha+1)}{\Gamma(k+1)\Gamma(\alpha-k+1)},

where \Gamma(·) is the Gamma function, we can define the Grünwald-Letnikov derivative as:

{}^{GL}D^{\alpha}_{x_0,x} f(x) = \lim_{h \to 0} \frac{1}{h^{\alpha}} \sum_{k=0}^{[(x-x_0)/h]} (-1)^k \frac{\Gamma(\alpha+1)}{\Gamma(k+1)\Gamma(\alpha-k+1)} f(x-kh). (6)

It is essential to highlight one important difference between the ordinary and the Grünwald-Letnikov derivative. As one can see from Equation (6), the calculation of the Grünwald-Letnikov derivative of f(x) at x requires all function values from x_0 to x. As a consequence, the Grünwald-Letnikov derivative is considered to have memory. In the literature, this property is also formulated as the Grünwald-Letnikov derivative requiring non-local information [48]. As an illustration, Figure 1 depicts the fractional derivatives of the sine function with orders between 0.1 and 0.9. Next, we have to define the Grünwald-Letnikov derivative of a two-dimensional signal I(x, y), 1 ≤ x ≤ M, 1 ≤ y ≤ N, where M and N stand for the number of rows and columns of I(x, y). Similarly to the ordinary derivative, the Grünwald-Letnikov derivative has to be defined in two dimensions, i.e., in the x- and y-directions [49,50]. In the x-direction, it can be defined as follows

GL^{\alpha}_x I(x, y) = \sum_{k=0}^{x-1} (-1)^k \frac{\Gamma(\alpha+1)}{\Gamma(k+1)\Gamma(\alpha-k+1)} I(x-k, y). (7)

Similarly, in the y-direction

GL^{\alpha}_y I(x, y) = \sum_{k=0}^{y-1} (-1)^k \frac{\Gamma(\alpha+1)}{\Gamma(k+1)\Gamma(\alpha-k+1)} I(x, y-k). (8)

Hence, the magnitude of the Grünwald-Letnikov fractional derivative can be given as

GL^{\alpha} I(x, y) = \sqrt{\left(GL^{\alpha}_x I(x, y)\right)^2 + \left(GL^{\alpha}_y I(x, y)\right)^2}. (9)

Figure 2 shows a grayscale test image and its Grünwald-Letnikov derivatives with different values of α.
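The directional definitions above can be sketched for a discrete image as follows (Python/NumPy, not the authors' MATLAB; the unit step and the truncation of each sum at the image border are assumptions of this illustration):

```python
import numpy as np

def gl_coeffs(alpha, n):
    """c_k = (-1)^k * C(alpha, k) via the recurrence c_k = c_{k-1}(k-1-alpha)/k."""
    c = np.empty(n)
    c[0] = 1.0
    for k in range(1, n):
        c[k] = c[k - 1] * (k - 1 - alpha) / k
    return c

def gl_derivative_2d(img, alpha):
    """Directional GL derivatives of a 2-D array along columns (x) and rows (y),
    each sum truncated at the image border, combined into a magnitude map."""
    img = np.asarray(img, dtype=np.float64)
    m, n = img.shape
    cx, cy = gl_coeffs(alpha, n), gl_coeffs(alpha, m)
    dx, dy = np.zeros_like(img), np.zeros_like(img)
    for k in range(n):                 # shift along the x-direction (columns)
        dx[:, k:] += cx[k] * img[:, :n - k]
    for k in range(m):                 # shift along the y-direction (rows)
        dy[k:, :] += cy[k] * img[:m - k, :]
    return np.hypot(dx, dy)
```

With α = 1 the map again reduces to backward-difference gradients, which gives a quick sanity check of the implementation.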

Proposed Metric
Results of previous studies indicate that the global contents of a scene are analyzed by the human visual system before the local features [10]. In this study, we propose an FR-IQA metric that combines global and local information of an image by applying Grünwald-Letnikov derivatives and ordinary image derivatives (a high-level overview is depicted in Figure 3). In what follows, R(x, y) stands for the pristine, reference image, while D(x, y) denotes the distorted image generated from R(x, y). The global similarity (denoted by S_G(x, y)) between R(x, y) and D(x, y) is expressed as the similarity between the Grünwald-Letnikov derivatives of R(x, y) and D(x, y):

S_G(x, y) = \frac{2 \, GL^{\alpha}_R(x, y) \, GL^{\alpha}_D(x, y) + c_1}{\left(GL^{\alpha}_R(x, y)\right)^2 + \left(GL^{\alpha}_D(x, y)\right)^2 + c_1}, (10)

where c_1 is a constant to ensure numerical stability [15]. In our MATLAB implementation, a fractional derivative order of α = 0.6 was used. To characterize the similarity between local changes, gradient operators are applied. The literature [51,52] recommends the Scharr operator, since it performs well in image quality estimation. Specifically, a 3 × 3 Scharr operator was applied in our method, whose horizontal (S_x) and vertical (S_y) templates are given as

S_x = \begin{bmatrix} 3 & 0 & -3 \\ 10 & 0 & -10 \\ 3 & 0 & -3 \end{bmatrix}, \quad S_y = \begin{bmatrix} 3 & 10 & 3 \\ 0 & 0 & 0 \\ -3 & -10 & -3 \end{bmatrix}. (11)

These templates can be applied separately to obtain the gradient components of an image I in each orientation:

G_x(x, y) = (S_x * I)(x, y), \quad G_y(x, y) = (S_y * I)(x, y), (12)

where * stands for the convolution operator. These can be put together to get the gradient magnitude:

G(x, y) = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}. (13)

To characterize the local similarity (S_L(x, y)) between the reference and distorted images, the gradient magnitudes are utilized as follows:

S_L(x, y) = \frac{2 \, G_R(x, y) \, G_D(x, y) + c_2}{G_R(x, y)^2 + G_D(x, y)^2 + c_2}, (14)

where G_R(x, y) and G_D(x, y) stand for the gradient magnitude maps of the reference and distorted images, respectively. Moreover, c_2 is a constant to ensure numerical stability. Using the preceding equations, the similarity map (denoted by S(x, y)) between a reference and a distorted image is defined as

S(x, y) = \lambda \, S_G(x, y) + (1 - \lambda) \, S_L(x, y), (15)

where λ is used to fine-tune the respective weights of global and local information. In our MATLAB implementation, λ = 0.7 was applied.
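The similarity-map computation can be sketched as follows (Python/NumPy, not the authors' MATLAB implementation; the zero-padded convolution, the constant values, and the linear λ-weighting in `combined_similarity` are assumptions of the sketch):

```python
import numpy as np

# Integer 3x3 Scharr templates; any common scale factor cancels in the ratio below.
SCHARR_X = np.array([[3, 0, -3], [10, 0, -10], [3, 0, -3]], dtype=np.float64)
SCHARR_Y = SCHARR_X.T

def conv2_same(img, kernel):
    """'Same'-size 2-D convolution with zero padding, spelled out to avoid SciPy."""
    img = np.asarray(img, dtype=np.float64)
    k = np.flipud(np.fliplr(kernel))      # true convolution flips the kernel
    pad = kernel.shape[0] // 2
    p = np.pad(img, pad)
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(p[i:i + kernel.shape[0], j:j + kernel.shape[1]] * k)
    return out

def gradient_magnitude(img):
    """Gradient magnitude from the two Scharr responses."""
    return np.hypot(conv2_same(img, SCHARR_X), conv2_same(img, SCHARR_Y))

def similarity_map(a, b, c=0.01):
    """SSIM-style pointwise similarity of two feature maps; c keeps the ratio
    stable where both maps are close to zero."""
    return (2.0 * a * b + c) / (a ** 2 + b ** 2 + c)

def combined_similarity(s_global, s_local, lam=0.7):
    """Linear lambda-weighting of the global and local similarity maps."""
    return lam * s_global + (1.0 - lam) * s_local
```

The same `similarity_map` helper serves for both the Grünwald-Letnikov magnitudes and the gradient magnitudes, since the two similarity definitions share the same form.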
To get the local global variation (LGV) quality score, we take the average of S(x, y). Formally, it can be written as

LGV = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} S(x, y). (16)

In the saliency-weighted local global variation (SWLGV) quality score, the visual attention mechanism is also taken into account. Namely, the differences between the reference and the distorted images are emphasized in the salient regions. Let the saliency maps of the reference and distorted images be denoted by SM_R(x, y) and SM_D(x, y), respectively. In our metric, the algorithm of Imamoglu et al. [53] was used to generate the saliency maps. The saliency map of a reference-distorted image pair (denoted by SM(x, y)) is the element-wise maximum of SM_R(x, y) and SM_D(x, y):

SM(x, y) = \max(SM_R(x, y), SM_D(x, y)). (17)
Specifically, SWLGV corresponds to the weighted average of S(x, y) with SM(x, y) as the weights. Formally, it can be written as:

SWLGV = \frac{\sum_{x=1}^{M} \sum_{y=1}^{N} S(x, y) \, SM(x, y)}{\sum_{x=1}^{M} \sum_{y=1}^{N} SM(x, y)}. (18)
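The two pooling steps can be sketched as follows (Python/NumPy; the small `eps` guarding against an all-zero saliency map is an assumption of the sketch):

```python
import numpy as np

def lgv_score(S):
    """LGV: plain average of the similarity map S(x, y)."""
    return float(np.mean(S))

def swlgv_score(S, sm_ref, sm_dist, eps=1e-12):
    """SWLGV: saliency-weighted average of S(x, y); the weight map is the
    element-wise maximum of the reference and distorted saliency maps."""
    SM = np.maximum(sm_ref, sm_dist)
    return float(np.sum(S * SM) / (np.sum(SM) + eps))
```

Taking the element-wise maximum of the two saliency maps ensures that a region counts as salient if it attracts attention in either the reference or the distorted image.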

Experimental Results and Analysis
This section presents our experimental results. First, Section 3.1 describes the applied evaluation metrics and protocol. Subsequently, the benchmark IQA databases used are introduced in Section 3.2. Finally, a comparison of LGV and SWLGV to the state-of-the-art is presented in Section 3.3.

Evaluation Metrics and Protocol
In the literature, the performance of an FR-IQA metric is characterized by correlation coefficients measured between predicted and ground-truth quality scores [54]. With this end in view, three correlation coefficients, i.e., Pearson's linear correlation coefficient (PLCC), Spearman's rank order correlation coefficient (SROCC), and Kendall's rank order correlation coefficient (KROCC), are widely used in the literature. Thus, these evaluation metrics are also applied in this paper. The PLCC between two vectors (denoted by x and y) is defined as:

PLCC(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}. (19)

Following the recommendations of Sheikh et al. [55], a non-linear mapping is applied between the predicted and ground-truth scores before the calculation of PLCC, using the following formula:

Q = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + \exp(\beta_2 (Q_p - \beta_3))} \right) + \beta_4 Q_p + \beta_5, (20)

where the β_i's (i = 1, ..., 5) represent the fitting parameters. Moreover, Q_p and Q denote the predicted and mapped quality scores, respectively. Similarly, SROCC is defined as:

SROCC(x, y) = \frac{\sum_{i=1}^{n} (r_{x,i} - \bar{r}_x)(r_{y,i} - \bar{r}_y)}{\sqrt{\sum_{i=1}^{n} (r_{x,i} - \bar{r}_x)^2 \sum_{i=1}^{n} (r_{y,i} - \bar{r}_y)^2}}, (21)

where r_{x,i} and r_{y,i} stand for the middle ranks of x_i and y_i, respectively. Finally, KROCC is defined as

KROCC(x, y) = \frac{C - D}{\frac{1}{2} n (n - 1)}, (22)

where C stands for the number of concordant pairs between x and y, while D denotes the number of discordant pairs. The proposed methods were implemented in MATLAB R2020a on a STRIX Z270H Gaming personal computer with an Intel(R) Core(TM) i7-7700K CPU at 4.20 GHz (8 cores) and 15 GB memory.
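The three coefficients and the five-parameter logistic mapping can be sketched in pure Python (fitting the β parameters, normally done by nonlinear least squares, is omitted; the helper names are illustrative):

```python
import math

def plcc(x, y):
    """Pearson's linear correlation coefficient between two score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

def middle_ranks(v):
    """1-based ranks; tied values share the mean ('middle') of their positions."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def srocc(x, y):
    """Spearman's rank order correlation: PLCC computed on the middle ranks."""
    return plcc(middle_ranks(x), middle_ranks(y))

def krocc(x, y):
    """Kendall's rank order correlation from concordant/discordant pair counts."""
    n = len(x)
    c = d = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            c += s > 0
            d += s < 0
    return (c - d) / (0.5 * n * (n - 1))

def logistic_5(q, b1, b2, b3, b4, b5):
    """Five-parameter mapping of Sheikh et al., applied before computing PLCC."""
    return b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (q - b3)))) + b4 * q + b5
```

Note that SROCC and KROCC depend only on rank order, which is why the monotonic logistic mapping matters only for PLCC.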

Databases
Benchmark IQA databases used for developing, testing, and ranking FR-IQA methods contain a small group of reference images whose perceptual quality is believed to be flawless. Moreover, distorted images are artificially generated from the reference images by applying several types and levels of distortion, such as motion blur, JPEG compression, or salt-and-pepper noise, and MOS values are associated with the distorted images. In this study, we utilized four popular IQA benchmark databases, i.e., KADID-10k [5], TID2013 [3], TID2008 [56], and CSIQ [57], to evaluate the proposed LGV and SWLGV metrics. The empirical MOS distributions of these databases are depicted in Figure 5, while their main properties are outlined in Table 1. Figure 6 depicts some sample distorted images from KADID-10k [5] as an illustration of IQA databases.

Comparison to the State-of-the-Art
As can be seen in the previous section, the proposed method possesses several adjustable parameters to determine the global and local similarity between the reference and the distorted images: α and h (Equation (6)) for the Grünwald-Letnikov derivative, and λ, which is applied to weight the importance of global and local information.
To determine optimal values for these parameters, eight random reference images and their corresponding 544 distorted counterparts were taken, and numerical experiments were carried out on this subset of TID2008 [56]. Namely, α and λ were varied from 0 to 1 in steps of 0.1, while h was varied from 10 to 100 in steps of 10. During the numerical experiments, we monitored the SROCC values. Finally, we chose α = 0.6, λ = 0.7, and h = 80, where the maximum SROCC was measured.
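The parameter search described above amounts to an exhaustive sweep over the three grids. A sketch of that sweep (the callable `evaluate_srocc` is a hypothetical stand-in for running the metric on the tuning subset and computing SROCC) could look like:

```python
import itertools

def grid_search(evaluate_srocc):
    """Exhaustive sweep over (alpha, lambda, h); keeps the triple with the
    highest SROCC reported by the supplied evaluation callable."""
    alphas = [round(0.1 * i, 1) for i in range(11)]   # 0.0, 0.1, ..., 1.0
    lambdas = [round(0.1 * i, 1) for i in range(11)]
    hs = list(range(10, 101, 10))                     # 10, 20, ..., 100
    best_score, best_params = float("-inf"), None
    for a, lam, h in itertools.product(alphas, lambdas, hs):
        score = evaluate_srocc(a, lam, h)
        if score > best_score:
            best_score, best_params = score, (a, lam, h)
    return best_params, best_score
```

With 11 × 11 × 10 = 1210 parameter triples, the sweep stays cheap as long as a single SROCC evaluation on the tuning subset is fast.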
To compare the previously presented LGV and SWLGV FR-IQA metrics to the state-of-the-art, nine other state-of-the-art FR-IQA metrics were collected, i.e., 2stepQA [58], CSV [59], DISTS [60], GSM [8], MAD [57], MS-SSIM [16], ReSIFT [61], RVSIM [62], and SSIM [15], whose source codes were made available online for the research community. The results measured on KADID-10k [5] and TID2013 [3] are outlined in Table 2, while the results on TID2008 [56] and CSIQ [57] are presented in Table 3. From the presented results, it can be concluded that the SWLGV metric provides the best outcomes in terms of SROCC and KROCC on KADID-10k [5] and TID2008 [56]. Furthermore, it gives the best PLCC value and the second best SROCC and KROCC values on TID2013 [3]. Interestingly, the saliency weighting step does not improve the performance of estimation on CSIQ [57], while it significantly improves the estimation accuracy on the other applied databases. Table 4 summarizes the direct and weighted averages of the PLCC, SROCC, and KROCC values measured on KADID-10k [5], TID2013 [3], TID2008 [56], and CSIQ [57]. It can be observed that the proposed SWLGV provides the best results in terms of SROCC and KROCC, while the proposed LGV gives the second best results in terms of these metrics.

Table 3. Comparison of LGV and SWLGV to the state-of-the-art on TID2008 [56] and CSIQ [57]. The highest values are typed in bold, while the second highest ones are underlined.

Tables 5 and 6 summarize the SROCC values which were measured separately on the distortion levels of TID2013 [3] and TID2008 [56]. TID2013 [3] and TID2008 [56] have five and four different distortion levels, respectively. It can be observed that LGV and SWLGV in general give higher performance on higher distortion levels. Moreover, SWLGV provides the second best SROCC values on 4 out of 5 distortion levels of TID2013 [3], while it is the best performing method on all distortion levels of TID2008 [56].
On the other hand, LGV provides the second best result on the lowest distortion level of TID2013 [3] and the second highest SROCC values on all distortion levels of TID2008 [56]. Tables 7 and 8 present the results on TID2013 [3] and TID2008 [56] in detail for every distortion type found in these IQA benchmark databases. TID2008 [56] contains a narrower set of distortions than TID2013 [3]; specifically, it includes the first 17 distortion types of TID2013 [3]. It can be seen that SWLGV and LGV are the best performing methods on 5 out of 24 distortion types of TID2013 [3]. On the other hand, SWLGV provides the highest SROCC values on 8 out of 17 distortion types of TID2008 [56] and gives the second best results on 6 distortion types.

Conclusions
In the present study, an innovative FR-IQA metric was proposed relying on the Grünwald-Letnikov derivative, image gradients, and visual saliency. The starting point was the observation from previous studies that the human visual system analyzes the global features of a scene before the local ones. However, image gradients, which are very popular in the literature for quantifying image degradations, characterize an image only locally. Our main contribution was a metric that describes the global changes of an image relying on the Grünwald-Letnikov derivative, while the local changes are quantified by image gradients. Next, the combination of local and global changes was weighted by visual saliency to estimate perceptual image quality. The proposed metric was compared with several other state-of-the-art algorithms on major standard IQA databases. It was demonstrated that the proposed method is able to surpass or approach the state-of-the-art performance.