A No Reference Image Quality Assessment Metric Based on Visual Perception †

Nowadays, how to evaluate image quality reasonably is a basic and challenging problem. Existing no reference evaluation methods cannot accurately reflect human visual perception of image quality. In this paper, we propose an efficient general-purpose no reference image quality assessment (NRIQA) method based on visual perception, which effectively integrates human visual characteristics into the NRIQA field. First, a novel algorithm for salient region extraction is presented: two feature maps, texture and edge, computed from the original image are added to the Itti model. Because the normalized luminance coefficients of natural images obey a generalized Gaussian probability distribution, we utilize this characteristic to extract statistical features in the regions of interest (ROI) and regions of non-interest, respectively. Then, the extracted features are fused into an input used to establish the support vector regression (SVR) model. Finally, the IQA model obtained by training is used to predict the quality of images. Experimental results show that this method has good predictive ability and that its evaluation performance is better than that of existing classical algorithms. Moreover, the predicted results are more consistent with human subjective perception, accurately reflecting human visual perception of image quality.


Introduction
With the rapid development of information technologies and the widespread use of smartphones, digital imaging has become an increasingly important medium for acquiring information and communicating with other people. Digital images generally suffer impairments during acquisition, compression, transmission, processing and storage, and these impairments create great difficulties in studying images and understanding the objective world. Because of this, image quality assessment (IQA) has become a fundamentally important and challenging task. Effective IQA metrics can play important roles in applications such as dynamic monitoring and adjustment of image quality, optimizing the parameter settings of image processing systems, and searching for high quality images in medical imaging [1]. For one thing, high quality images can help doctors judge the severity of disease in the medical field. For another, applications such as high-definition television, personal digital assistants, internet video streaming, and video on demand require the means to evaluate the quality of their images. Therefore, in order to obtain high quality and high fidelity images, the study of image quality evaluation has important practical significance [2,3].
Because human beings are the terminal observers for the majority of processed digital images, subjective assessment is the ultimate criterion and the most reliable image quality assessment method. However, its disadvantages, being time-consuming, expensive, complex and laborious, hinder its application in practice. Thus, the goal of objective IQA is to automatically evaluate the quality of images as close to human perception as possible [2].
Based on the availability of a reference image, objective IQA metrics can be classified into full reference (FR), reduced reference (RR), and no reference (NR) approaches [4]. Only recently did FR-IQA methods reach a satisfactory level of performance, as demonstrated by high correlations with human subjective judgments of visual perception. SSIM [5], MS-SSIM [6] and VSNR [7] are examples of successful FR-IQA algorithms. These metrics are based on measuring the similarity between the distorted image and its corresponding original image. However, in real-world applications where the original is not available, FR metrics cannot be used, which strictly limits their application domain; in such cases, NR-IQA is the only possible approach. However, a number of NR-IQA metrics have been shown not to always correlate with perceived image quality [8,9].
Presently, NR-IQA algorithms generally follow one of two trends: distortion-specific and general-purpose methods. The former evaluate distorted images of a specific type, while the latter directly measure image quality without knowing the type of distortion. Since the result of an evaluation ultimately depends on the perception of the observers, an image evaluation that better matches the actual quality must be based on human visual and psychological characteristics and must organically combine subjective and objective evaluation methods. A large number of studies have shown that evaluation methods that consider the human visual system (HVS) perform better than those that do not [10][11][12][13]. However, existing no-reference image quality assessment algorithms fail to give full consideration to human visual features [10].
At present, there are several problems in the field of image quality assessment:
1. The original image cannot be obtained.
2. It is difficult to determine which type of distortion, if any, a distorted image contains.
3. Existing methods lack consideration of human visual characteristics.
4. It is hard to ensure that the extracted features do not differ significantly between images with the same degree of distortion but different distortion types.
In this paper, based on existing image quality evaluation methods, we bring the human visual attention mechanism and the characteristics of the HVS into the no reference image quality assessment framework, and propose a universal no-reference image quality assessment method based on visual perception.
The rest of this paper is organized as follows. In Section 2, we describe current research on no reference image quality assessment methods and visual saliency. In Section 3, we introduce the extraction model of the visual region of interest. In Section 4, we provide an overview of the method. In Section 5, we describe the blind/referenceless image spatial quality evaluator (BRISQUE) algorithm, and the prediction model is provided in Section 6. We present the results in Section 7, and we conclude in Section 8.
A number of researchers have devoted time to improving the assessment accuracy of NR IQA methods by taking advantage of known natural scene statistics. These methods fail to give full consideration to human visual characteristics [2]. There is also a deviation between the predicted qualities and the real qualities, since the human visual system is the ultimate assessor of image quality. In addition, almost all no reference image quality assessment methods are based on gray scale images and do not make full use of the color characteristics of the image.
Through the above analysis of NR IQA methods, at present, in both distortion-specific and general-purpose methods, almost all approaches use the statistical characteristics of natural images to establish the quality evaluation model, and they have achieved good evaluation results. However, the statistical characteristics of natural images can hardly reflect all the regularities of image quality, so there is still a gap in consistency with subjective human vision.
Since the HVS is the ultimate assessor of image quality, a number of image quality assessment methods based on an important feature of the HVS, namely visual perception, are emerging [10]. Among them, the study of the region of interest (or visual saliency) is a major branch of HVS research. In the field of image processing, the ROI of an image comprises the areas that differ significantly from their adjacent regions. They immediately attract our eyes and capture our attention; therefore, the ROI is a very important region in image quality assessment. Accordingly, how to build a computational visual saliency model has been attracting tremendous attention in recent years [32].
Nowadays, a number of computational models simulating human visual attention have been studied, and some powerful models have been proposed; for further details on ROI extraction, refer to [9][10][11][12][32][33][34][35][36]. In [33], Itti et al. proposed the first influential and best known visual saliency model, which we call the Itti model in this paper. The Itti model mainly contains two ideas. Firstly, it introduced image pyramids for feature extraction, which makes the visual saliency computation efficient. Secondly, Itti et al. proposed the biologically inspired "center-surround difference" operation to compute feature-dependent saliency maps across scales [32]. This model effectively breaks down the complex problem of scene understanding through rapid selection. Following the Itti model, Harel et al. proposed the graph-based visual saliency (GBVS) model [37]. The GBVS introduced a novel graph-based normalization and combination strategy and is a bottom-up visual saliency model. It mainly consists of two steps: the first is forming activation maps on certain feature channels, and the second is normalizing them in a way which highlights saliency and admits combination with other maps. Experimental results show that the GBVS predicts the ROI more reliably than other algorithms.
In this paper, we mainly study human visual characteristics, extract the region of interest, and combine this with the statistical characteristics of natural images as the measurement index of the distorted image. Finally, experiments on the LIVE database [38] show that the feature extraction and the learning method are reasonable and correlate well with human perception.

Extraction Model of Visual Region of Interest
With the continuing development of information technology, images have become the main medium of information transmission, and how to analyze and process large amounts of image information effectively and accurately is an important subject. Researchers have found that the most important information in an image is often concentrated in a few key areas, called salient regions or regions of interest. Extracting these salient regions accurately greatly improves the efficiency and accuracy of image processing and analysis. So far, many region of interest extraction algorithms exist; ROI detection has gradually developed from interaction-based techniques to techniques based on visual features. The salient map model proposed by Itti is the most typical ROI detection method based on visual features [33]. This method only considers bottom-up signals and is simple and easy to implement; it is the most influential visual attention model.

The Principle of Itti Model
In 1998, Itti et al. proposed an algorithm based on a salient map model. The algorithm defines three kinds of visual characteristics, color, brightness and direction, and uses a Gaussian pyramid to build the scale space. The model uses a center-surround computation strategy to form a set of feature maps, and these collections are then normalized and merged to create the salient map. It includes three processes: feature acquisition, salient map computation, and selection and transfer of the region of interest [34][35][36]. Figure 1 shows the implementation process of the Itti-Koch model.


Algorithm Improvement
In this paper, we propose an improved model based on the idea of the Itti model. We add texture features and edge structure characteristics to the original model. The model block diagram is shown in Figure 2.


Texture Feature Extraction
Texture is an important visual cue and a common yet difficult feature of images. The texture feature is consistent with the human visual perception process and plays an important role in the region of interest. There are currently many methods to extract texture features, such as wavelet extraction methods, gray level co-occurrence matrix (GLCM) extraction methods and so on. In recent years, wavelet methods have been widely used in many fields. In [39,40], the authors introduced wavelet methods into the medical field and achieved considerable success. In the former article, the authors used the discrete wavelet packet transform (DWPT) to extract wavelet packet coefficients from MR brain images. In the latter article, to improve the directional selectivity impaired by the discrete wavelet transform, they proposed a dual-tree complex wavelet transform (DTCWT), implemented by two separate two-channel filter banks. The DTCWT obtains directional selectivity by using approximately analytic wavelets; at each scale of a two-dimensional DTCWT, it produces a total of six directionally selective sub-bands (±15°, ±45°, ±75°) for both real and imaginary parts, and can thus obtain more information about the directions in an image. The Gabor wavelet, however, is also an important method of texture feature extraction. It not only extracts texture features effectively but also eliminates redundant information. In addition, the frequency and direction of the Gabor filter are close to those of the human visual system, so Gabor filters are widely used for texture representation and description. A series of filtered images can be obtained by convolving the image with Gabor filters, and each filtered image describes the image information at one scale and direction [41]. In this paper, we use the Gabor wavelet to extract the texture features of each filtered image. The two-dimensional Gabor function g(x, y) can be expressed as:

g(x, y) = (1/(2πσ_x σ_y)) exp[−(1/2)(x²/σ_x² + y²/σ_y²) + 2πjWx],

where σ_x and σ_y are the scales of the Gaussian envelope and W is the modulation frequency of the Gaussian function. In accordance with the human visual system, W = 0.5. The Fourier transform of g(x, y) can be expressed as:

G(u, v) = exp{−(1/2)[(u − W)²/σ_u² + v²/σ_v²]},

where σ_u = 1/(2πσ_x) and σ_v = 1/(2πσ_y). The frequency spectrum is shown in Figure 3.
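As an illustration, the Gabor function above can be sampled directly to build a filter kernel. The following numpy sketch is ours, not the paper's implementation; the function name and parameter defaults are illustrative choices:

```python
import numpy as np

def gabor_kernel(size=15, sigma_x=2.0, sigma_y=2.0, W=0.5, theta=0.0):
    """Sample the 2-D Gabor function g(x, y) on a size x size grid.

    W is the modulation frequency (W = 0.5 per the text); theta rotates
    the coordinates as in the self-similar filter bank."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotated coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-0.5 * (xr**2 / sigma_x**2 + yr**2 / sigma_y**2))
    carrier = np.exp(2j * np.pi * W * xr)           # complex sinusoid along x
    return envelope * carrier / (2.0 * np.pi * sigma_x * sigma_y)

# Filtering an image with |convolution| of such kernels at several scales and
# directions yields the multi-scale, multi-direction texture responses.
kernel = gabor_kernel(theta=np.pi / 4.0)
```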

To obtain a set of self-similar filters, we use g(x, y) as the generating function and apply appropriate scale dilation and rotation transformations to it. That is, the Gabor wavelet is:

g_mn(x, y) = a^(−m) g(x′, y′), a > 1,

where x′ = a^(−m)(x cos θ + y sin θ), y′ = a^(−m)(−x sin θ + y cos θ), θ = nπ/k, k is the number of directions, and m and n are the scale and direction indices respectively, with n ∈ [0, k]. Figure 4 shows the Gabor wavelet in four different directions. We then use the Gabor wavelet transform to extract the texture features of the image. For an image I(x, y), its Gabor wavelet transform can be defined as:

W_mn(x, y) = ∫ I(x₁, y₁) g*_mn(x − x₁, y − y₁) dx₁ dy₁,

where * represents the complex conjugate, and the texture features are calculated as shown in Formula (5). In the model, the center-surround operation is implemented as the difference between fine and coarse scales. The center is a pixel at scale c ∈ {2, 3, 4}, and the surround is the corresponding pixel at scale s = c + δ, δ ∈ {3, 4}. The across-scale difference (denoted "⊖" below) between two maps is obtained by interpolation to the finer scale and point-by-point subtraction. The resulting feature maps are combined through across-scale addition, "⊕", which consists of reducing each map to scale four and point-by-point addition, to obtain the texture salient map:

T(c, s) = |T(c) ⊖ T(s)|,

T̄ = ⊕_{c=2}^{4} ⊕_{s=c+3}^{c+4} N(T(c, s)),

where N(·) denotes the map normalization operator.
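The across-scale machinery can be sketched as follows. This is a simplified stand-in of our own, not the paper's code: it uses a mean-pooling pyramid instead of a Gaussian pyramid, a plain min-max rescaling instead of the full Itti N(·) operator, and assumes image dimensions divisible by the relevant powers of two:

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 mean pooling (a stand-in for the Gaussian pyramid)."""
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def upsample_to(img, shape):
    """Nearest-neighbour interpolation of a coarse map to a finer scale."""
    ry, rx = shape[0] // img.shape[0], shape[1] // img.shape[1]
    return np.kron(img, np.ones((ry, rx)))

def normalize(m):
    """Simplified N(.): rescale to [0, 1]. (The full Itti operator additionally
    promotes maps with a few strong peaks; that refinement is omitted here.)"""
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else m * 0.0

def feature_saliency(feature, levels=9):
    """T(c, s) = |T(c) (-) T(s)| for c in {2,3,4}, s = c + {3,4}; across-scale
    addition reduces every difference map to scale 4 before summation."""
    pyr = [np.asarray(feature, dtype=float)]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    out = np.zeros_like(pyr[4])
    for c in (2, 3, 4):
        for delta in (3, 4):
            s = c + delta
            diff = np.abs(pyr[c] - upsample_to(pyr[s], pyr[c].shape))
            while diff.shape[0] > out.shape[0]:   # reduce to scale 4
                diff = downsample(diff)
            out += normalize(diff)
    return out

sal = feature_saliency(np.random.default_rng(0).random((512, 512)))
```

The same combination applies unchanged to the intensity, color, orientation and edge channels.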

Edge Structure Feature Extraction
Edge is one of the most important features of an image. Currently, there are many kinds of edge detection methods, such as the Roberts operator, the Sobel operator, the Prewitt operator and so on. Compared with these, the Canny method is not easily disturbed by noise and can detect true weak edges. Its advantage is that two different thresholds are used to detect strong and weak edges, and weak edges are included in the output image only when they are connected to strong edges. In this paper, we use the Canny edge detection algorithm to extract the edge structure features of the image. The steps are as follows: (1) We use a Gauss filter to smooth the image. The Canny method first uses the two-dimensional Gauss function to smooth the image:

G(x, y) = (1/(2πσ²)) exp[−(x² + y²)/(2σ²)].

Its gradient vector is ∇G = (∂G/∂x, ∂G/∂y)ᵀ, where σ is the spread parameter of the Gauss filter.
(2) We calculate the magnitude and direction of the gradient, and perform non-maximum suppression on the gradient magnitude. (3) We apply a dual-threshold method to detect and connect edges. E(x, y) is the final edge structure map.
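The three steps can be sketched in a simplified pure-numpy form. This is illustrative only, not the paper's implementation: production Canny detectors (e.g., in OpenCV) use interpolated non-maximum suppression and iterative hysteresis, while this sketch quantizes the gradient direction and links weak edges in a single pass:

```python
import numpy as np

def smooth(img, sigma=1.0, radius=2):
    """Step (1): separable Gaussian smoothing with reflect padding."""
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    conv = lambda v: np.convolve(np.pad(v, radius, mode="reflect"), g, "valid")
    out = np.apply_along_axis(conv, 1, img.astype(float))
    return np.apply_along_axis(conv, 0, out)

def canny_sketch(img, low=0.1, high=0.2):
    """Steps (2)-(3): gradient magnitude/direction, non-maximum suppression,
    and dual-threshold linking (weak pixels kept only if 8-adjacent to a
    strong pixel; a full implementation links iteratively)."""
    s = smooth(img)
    gy, gx = np.gradient(s)                       # d/drow, d/dcol
    mag = np.hypot(gx, gy)
    ang = (np.rad2deg(np.arctan2(gy, gx)) + 180.0) % 180.0
    nms = np.zeros_like(mag)
    H, W = mag.shape
    for i in range(1, H - 1):                     # suppress non-maxima along gradient
        for j in range(1, W - 1):
            a = ang[i, j]
            if a < 22.5 or a >= 157.5:
                n1, n2 = mag[i, j - 1], mag[i, j + 1]
            elif a < 67.5:
                n1, n2 = mag[i - 1, j + 1], mag[i + 1, j - 1]
            elif a < 112.5:
                n1, n2 = mag[i - 1, j], mag[i + 1, j]
            else:
                n1, n2 = mag[i - 1, j - 1], mag[i + 1, j + 1]
            if mag[i, j] >= n1 and mag[i, j] >= n2:
                nms[i, j] = mag[i, j]
    hi, lo = high * nms.max(), low * nms.max()    # dual thresholds
    strong = nms >= hi
    weak = (nms >= lo) & ~strong
    near_strong = np.zeros_like(strong)
    near_strong[1:-1, 1:-1] = (
        strong[:-2, :-2] | strong[:-2, 1:-1] | strong[:-2, 2:] |
        strong[1:-1, :-2] | strong[1:-1, 2:] |
        strong[2:, :-2] | strong[2:, 1:-1] | strong[2:, 2:])
    return (strong | (weak & near_strong)).astype(float)
```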
The edge structure feature maps are calculated as:

E(c, s) = |E(c) ⊖ E(s)|.

These feature maps are combined to obtain the edge structure salient map:

Ē = ⊕_{c=2}^{4} ⊕_{s=c+3}^{c+4} N(E(c, s)).

In the end, the final image salient region map is obtained by linear addition of the five salient maps:

S = (1/5) [N(Ī) + N(C̄) + N(Ō) + N(T̄) + N(Ē)],

where Ī, C̄ and Ō represent the intensity, color and orientation feature maps, respectively.

Overview of the Method
Current no reference image quality evaluation algorithms can be divided into two categories: distortion-specific methods and universal methods. The distortion-specific methods are limited in practical application because the type of the distorted image must be determined in advance. The general-purpose methods do not depend on the distortion type and have better prospects for practical application. However, the existing algorithms almost all extract natural statistical characteristics of the image and rarely take into account the visual characteristics of the human eye; thus, in practical applications, there is a deviation between the evaluation result and human perception. As shown in Figures 5 and 6, due to the visual characteristics of the human eye, the same degree of quality damage applied to different regions of the same image (such as interest and non-interest areas) produces different visual experiences. Under the same Gaussian blur, the visual effect of Figure 5 is better than that of Figure 6.

When observing an image, the human eye first pays attention to the regions whose visual characteristics are more prominent. Although an image may suffer the same damage everywhere, the subjective feeling of the human eye differs depending on the region in which the damage occurs. Against this background, this paper proposes an image quality evaluation method based on visual perception. Firstly, we extract the region of interest. Secondly, we extract features from the regions of interest and regions of non-interest. Finally, the extracted features are fused effectively, and the image quality evaluation model is established. We call the proposed approach the region of interest blind/referenceless image spatial quality evaluator (ROI-BRISQUE). Figure 7 shows the framework of the proposed method.

The method of this paper proceeds as follows. Firstly, the LIVE image database is divided into two categories, training images and testing images, and we extract the region of interest and the features of the training images. Then, we use the feature vectors as the input of the ε-SVR, with the DMOS of the corresponding images as the output target, to train the image quality evaluation model. Finally, the trained image quality prediction model is used to predict the quality of distorted images.

This paper focuses on the extraction of the image region of interest. The traditional Itti model calculates the salient region of the image by extracting the low-level color, brightness and direction features. It can find the attention region, but there are still some shortcomings; for instance, the contour of the salient region is not clear. In order to extract the region of interest more accurately, we add texture and edge structure features to the Itti model. We then use the improved model to extract the region of interest and region of non-interest, and use the BRISQUE algorithm to extract the natural statistical features of each region. The features of the two regions are fused to obtain the measurement factor of the image. We train the image quality evaluation model with the measurement factor as the input and the DMOS value of the corresponding image as the output target of the SVR. Finally, the quality of the distorted image is predicted by the trained evaluation model.
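The training stage described above might be sketched with scikit-learn's ε-SVR as follows. The features and DMOS values here are synthetic placeholders standing in for the fused ROI/non-ROI BRISQUE features and the LIVE subjective scores; the kernel and hyperparameters are illustrative assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-ins: rows of `features` play the role of the fused
# ROI / non-ROI feature vectors, `dmos` the subjective quality scores.
rng = np.random.default_rng(0)
features = rng.normal(size=(120, 36))
dmos = 10.0 * features[:, 0] + 50.0 + rng.normal(scale=0.5, size=120)

train_x, test_x = features[:100], features[100:]
train_y, test_y = dmos[:100], dmos[100:]

# epsilon-SVR with an RBF kernel, a common choice for BRISQUE-style regressors
model = SVR(kernel="rbf", C=100.0, epsilon=0.1, gamma="scale")
model.fit(train_x, train_y)
pred = model.predict(test_x)
```

In practice, Spearman and Pearson correlations between `pred` and the held-out DMOS are the figures reported.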

Feature Extraction of BRISQUE Algorithm
Ruderman [30] found that the normalized luminance coefficients of natural images obey a unit generalized Gaussian probability distribution, and that image distortion changes the statistical characteristics of these normalized coefficients. By measuring the change in the statistical features, the distortion type can be predicted and the visual quality of the image can be evaluated. Based on this theory, Mittal et al. proposed the BRISQUE algorithm, which operates on spatial statistical features [29]. Their spatial approach to NR IQA can be summarized as follows.

Image Pixel Normalization
Given a (possibly distorted) image, we compute locally normalized luminances via local mean subtraction and divisive normalization. Formula (11) is applied to the intensity image I(i, j):

Î(i, j) = (I(i, j) − µ(i, j)) / (σ(i, j) + c),    (11)

where I(i, j) represents the gray value of the original image, i ∈ {1, 2, ..., M}, j ∈ {1, 2, ..., N}; M and N are the image height and width, respectively; c is a small constant that prevents instability when the denominator approaches 0; and µ(i, j) and σ(i, j) are the weighted local mean and standard deviation,

µ(i, j) = Σ_{k=−K..K} Σ_{l=−L..L} ω_{k,l} I(i + k, j + l),
σ(i, j) = sqrt( Σ_{k=−K..K} Σ_{l=−L..L} ω_{k,l} [I(i + k, j + l) − µ(i, j)]² ),

where ω = {ω_{k,l} | k = −K, ..., K; l = −L, ..., L} is a two-dimensional circularly symmetric Gaussian weighting function. The normalized luminance values Î(i, j) are called MSCN (Mean Subtracted Contrast Normalized) coefficients.
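The paper's experiments were run in Matlab; purely as an illustration, the normalization of Formula (11) can be sketched in Python with NumPy. The 7×7 window with σ = 7/6 follows the common BRISQUE setting and is an assumption here, as is c = 1 for images on a [0, 255] scale.

```python
import numpy as np

def gaussian_kernel(size=7, sigma=7/6):
    """2-D circularly symmetric Gaussian weighting function, normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def convolve2d(I, w):
    """'Same'-size 2-D correlation with symmetric boundary padding."""
    K = w.shape[0] // 2
    P = np.pad(I, K, mode="symmetric")
    out = np.zeros_like(I, dtype=np.float64)
    for dy in range(-K, K + 1):
        for dx in range(-K, K + 1):
            out += w[dy + K, dx + K] * P[K + dy : K + dy + I.shape[0],
                                         K + dx : K + dx + I.shape[1]]
    return out

def mscn_coefficients(image, c=1.0):
    """MSCN coefficients of Formula (11): (I - mu) / (sigma + c)."""
    I = np.asarray(image, dtype=np.float64)
    w = gaussian_kernel()
    mu = convolve2d(I, w)                                  # weighted local mean
    sigma = np.sqrt(np.abs(convolve2d(I**2, w) - mu**2))   # weighted local std
    return (I - mu) / (sigma + c)
```

On a constant image the numerator vanishes, so the MSCN map is (numerically) zero, as expected from the formula.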

Spatial Feature Extraction
Distortion destroys the regularity of adjacent MSCN coefficients. The features extracted from the normalized coefficients comprise the generalized Gaussian distribution parameters and the correlations of adjacent coefficients.
(1) Generalized Gaussian distribution characteristics: the model can be expressed as Formula (14),

f(x; α, σ²) = α / (2β Γ(1/α)) · exp(−(|x| / β)^α),    (14)

where β = σ · sqrt(Γ(1/α) / Γ(3/α)) and Γ(·) is the gamma function. The shape parameter α controls the shape of the generalized Gaussian model, and σ² is the variance. We use the fast matching method of [42] to estimate (α, σ²) as the generalized Gaussian distribution features.
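The fast matching estimator of [42] is not reproduced in the paper; the sketch below is a common moment-matching variant, given only as an assumption-laden illustration. It picks α so that the theoretical ratio Γ(2/α)² / (Γ(1/α)Γ(3/α)) matches the empirical ratio (E|x|)² / E[x²].

```python
import numpy as np
from math import gamma

def fit_ggd(x):
    """Moment-matching estimate of the GGD parameters (alpha, sigma^2).

    For a zero-mean GGD, (E|x|)^2 / E[x^2] equals
    Gamma(2/a)^2 / (Gamma(1/a) * Gamma(3/a)); we invert this relation
    with a simple grid search over candidate alpha values.
    """
    x = np.asarray(x, dtype=np.float64)
    alphas = np.arange(0.2, 10.0, 0.001)
    rho = np.array([gamma(2 / a) ** 2 / (gamma(1 / a) * gamma(3 / a))
                    for a in alphas])
    r_hat = np.mean(np.abs(x)) ** 2 / np.mean(x ** 2)
    alpha = alphas[np.argmin((rho - r_hat) ** 2)]
    return alpha, np.mean(x ** 2)   # variance = second moment (zero mean)
```

A Gaussian sample should recover α ≈ 2 and a Laplacian sample α ≈ 1, the two classical special cases of the GGD.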
(2) Correlation of adjacent coefficients: we compute pairwise-product images along four orientations: horizontal, vertical, main diagonal, and secondary diagonal [43], and fit each with an asymmetric generalized Gaussian distribution (AGGD). We use the fast matching method [44] to estimate the parameters (η, ν, σ_l, σ_r) for each orientation, giving 16 estimated parameters over the four orientations as the adjacent-coefficient correlation features.
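As a sketch of the neighbourhood structure (the AGGD fitting itself is omitted), the four directional product images can be formed from the MSCN map as follows; the variable names are ours.

```python
import numpy as np

def pairwise_products(mscn):
    """Products of horizontally, vertically, main-diagonally and
    secondary-diagonally adjacent MSCN coefficients; an AGGD is then
    fitted to each of the four product images."""
    mscn = np.asarray(mscn, dtype=np.float64)
    horizontal = mscn[:, :-1] * mscn[:, 1:]
    vertical   = mscn[:-1, :] * mscn[1:, :]
    diag_main  = mscn[:-1, :-1] * mscn[1:, 1:]
    diag_sec   = mscn[:-1, 1:] * mscn[1:, :-1]
    return horizontal, vertical, diag_main, diag_sec
```

Each product image is one cell smaller than the MSCN map along the shifted axes, since only complete neighbour pairs are used.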
However, because natural images exhibit multiscale statistical characteristics, the authors found it more reasonable to extract the 2 generalized-distribution features and the 16 adjacent-coefficient correlation features at two scales. The total number of extracted features is therefore (2 + 16) × 2 = 36.

Prediction Model
By establishing the relationship between image features and subjective evaluation values, we propose an objective image quality evaluation model based on ROI-BRISQUE. The measure factor of the distorted image (V_all), obtained by fusing the region-of-interest feature vector (V_roi) and the region-of-non-interest feature vector (V_non-roi), is used as the input of the SVR model, and the subjective quality score of the corresponding image as the output target. The fusion is given by Formula (15),

V_all = λ · V_roi + (1 − λ) · V_non-roi,    (15)

where the parameter λ is the weight of the image region of interest. Training on the LIVE image database shows that the model has good learning accuracy and predictive ability.
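Assuming Formula (15) is the convex combination of the two 36-dimensional feature vectors (consistent with λ being the ROI weight), a minimal NumPy sketch follows, with λ = 0.7, the value later selected via Table 1:

```python
import numpy as np

def fuse_features(v_roi, v_non_roi, lam=0.7):
    """Formula (15): V_all = lam * V_roi + (1 - lam) * V_non_roi,
    where lam weights the region-of-interest features."""
    v_roi = np.asarray(v_roi, dtype=np.float64)
    v_non_roi = np.asarray(v_non_roi, dtype=np.float64)
    return lam * v_roi + (1.0 - lam) * v_non_roi
```

The fused 36-dimensional vector is then the input of the ε-SVR.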

Experimental Results of the Region of Interest
In this paper, we select some pictures from the image database to test the improved Itti model. For comparison, the saliency maps of the Itti and GBVS models are also listed; the results are shown in Figures 8-10. From panels (b), (c), and (d), we can see that the regions of interest obtained by the method of this paper are more accurate than the others, and agree well with human perception and the human visual process.

To verify the accuracy of the regions of interest obtained by the improved Itti model, we use eye-tracking data [45], which was collected to better understand how people look at images when assessing image quality. The database consists of 40 original images; each was further processed to produce four different versions, resulting in a total of 160 images used in the experiment. Participants performed two eye-tracking experiments: one with a free-looking task and one with a quality assessment task. With the eye tracker it was possible to track the eye movements of the viewers and map the salient regions of images under different tasks and different levels of image quality. Some meaningful progress in the design of eye-tracking data is reported in the literature [46]. Since recording eye movements is so far the most reliable way of studying human visual attention [10], it is highly desirable to use these "ground truth" visual attention data to evaluate the ROI extraction model. We use the images of the eye-tracking database to extract the regions of interest and compare them with the eye-tracking results. The resulting ROI are illustrated in Figures 11 and 12.
In Figures 11 and 12, (e) shows, as an example, a saliency map derived from the eye-tracking data obtained in experiment I (free-looking task) for one of the original images, and (i) shows the map obtained in experiment II (scoring task) for the same image. It can be seen that the improved Itti model achieves comparable results and approaches the performance of the eye-tracking data; it is able to predict ROI in close agreement with human judgments.

Experiments and Related Analysis of Image Quality Evaluation

LIVE IQA Database
We use the LIVE IQA database2 [38] to test the performance of the ROI-BRISQUE method. The database consists of 29 reference images and 779 distorted images spanning several distortion categories: JPEG compressed images (169 images), JPEG2000 compressed images (175 images), Gaussian blur (145 images), and Rayleigh fading channel (fast fading, 145 images). Further, each distorted image has an associated difference mean opinion score (DMOS), which represents the perceived quality of the image.
To calibrate the relationship between the extracted features, the ROI, and DMOS, the ROI-BRISQUE method requires two subsets of images. We randomly divide the LIVE IQA database2 into two non-overlapping sets: 80% for training and 20% for testing. This ensures that the results do not depend on features extracted from known distorted images. Further, to eliminate performance bias, we repeat this random 80% train / 20% test procedure 1000 times; the figures reported below are the median values of performance across these 1000 iterations.
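One iteration of the random, non-overlapping 80%/20% split can be sketched as follows (index-level only; mapping indices onto the LIVE images is omitted):

```python
import numpy as np

def split_indices(n_images, train_frac=0.8, rng=None):
    """One random train/test split of image indices with no overlap."""
    rng = rng or np.random.default_rng()
    perm = rng.permutation(n_images)
    cut = int(train_frac * n_images)
    return perm[:cut], perm[cut:]
```

Repeating this 1000 times with fresh permutations and taking the median of the per-split performance gives the reported figures.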
We use four commonly used performance indexes to evaluate the IQA algorithms: Spearman's rank-order correlation coefficient (SROCC), Kendall's rank-order correlation coefficient (KROCC), Pearson's linear correlation coefficient (PLCC), and root mean squared error (RMSE). The first two measure the prediction monotonicity of an IQA algorithm; the latter two are computed after passing the algorithmic score through a logistic nonlinearity as in [47].
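The four indexes can be sketched in NumPy as below. The logistic mapping of [47], applied before PLCC and RMSE, is omitted here, and the rank computation ignores ties for brevity.

```python
import numpy as np

def plcc(x, y):
    """Pearson's linear correlation coefficient."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

def _ranks(a):
    """Ranks 0..n-1 (no tie handling, for brevity)."""
    order = np.argsort(a)
    ranks = np.empty(len(a))
    ranks[order] = np.arange(len(a))
    return ranks

def srocc(x, y):
    """Spearman's rank-order correlation: PLCC of the ranks."""
    return plcc(_ranks(x), _ranks(y))

def krocc(x, y):
    """Kendall's rank-order correlation via pairwise concordance."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, c = len(x), 0.0
    for i in range(n):
        for j in range(i + 1, n):
            c += np.sign(x[i] - x[j]) * np.sign(y[i] - y[j])
    return 2.0 * c / (n * (n - 1))

def rmse(x, y):
    """Root mean squared error between predicted and subjective scores."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.sqrt(np.mean((x - y) ** 2)))
```

A perfectly monotonic prediction gives SROCC = KROCC = 1 even when the absolute errors (RMSE) are nonzero, which is why both families of indexes are reported.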

Performance Analysis
In this paper, all procedures are implemented in Matlab (R2011a). We use the LIBSVM package [48] to implement the SVR, with a radial basis function (RBF) kernel for regression. The operating environment is an Intel® Core™ i5 2.67 GHz processor with 6 GB of RAM.
First of all, for the proposed method to perform well, we need to choose a reasonable weight λ in Formula (15). We test the proposed model with different values of λ; the results are shown in Table 1. As observed from Table 1, the experimental results are best when λ = 0.7. In addition, we also use the Itti and GBVS models to extract the ROI and test the proposed model; the performance indices are tabulated in Tables 2 and 3, respectively. Since the Itti and GBVS models cannot accurately delineate the region of interest, their performance is inferior to that of the improved Itti model.
To better assess the proposed algorithm, we report the performance of two FR IQA algorithms, peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM); because these two algorithms perform well in FR IQA, they have been used as benchmarks for many years. We also select three general-purpose NR IQA algorithms for comparison, BLIINDS-II, DIIVINE, and BRISQUE, which perform well in the DCT, wavelet, and spatial domains, respectively. These performance indices are tabulated in Table 4.
As seen from Table 4, ROI-BRISQUE correlates quite well with human perception, beating the listed full-reference and no-reference image quality assessment indices. Ideally, a scatter plot of subjective versus objective scores should form a tight cluster, meaning the two are strongly correlated, since an ideal IQA algorithm should accurately track the subjective score, such as the DMOS [49]. The scatter plots of the different IQA models are shown in Figure 13, where each point represents one test image, with the vertical and horizontal axes representing its predicted score and the given subjective quality score, respectively. The scatter plots of the full-reference methods PSNR and SSIM are dispersed, while those of the no-reference methods DIIVINE, BLIINDS-II, and BRISQUE are evenly distributed along the curve at low distortion levels, where the prediction is relatively accurate; when the distortion is severe, the predictions deviate from the true values and the points become more dispersed. The proposed method predicts images with different degrees of distortion accurately, and its points cluster along the curve.
We also plot the distribution of the SROCC values across the 1000 experimental trials in Figure 15; the error bars indicate the 95% confidence interval for each algorithm [50]. Obviously, the narrower the confidence interval and the higher the median SROCC, the better the performance. The plots show that ROI-BRISQUE is not statistically significantly different in performance. In addition, to evaluate the statistical significance of the performance of each algorithm, we apply a one-sided t-test to the correlation values [50]. The results of this statistical analysis are tabulated in Table 6. The null hypothesis is that the mean correlation of the row is equal to the mean correlation of the column at the 95% confidence level [51]; the alternative hypothesis is that the mean correlation of the row is greater or less than that of the column. In Table 6, a value of "1" indicates that the row method is statistically superior to the column method, "−1" that it is inferior, and "0" that the row and column methods are equivalent.

Figure 1 .
Figure 1.General architecture of the model.

Figure 2 .
Figure 2. Improved architecture of the model.


Figure 5 .
Figure 5. Damaged figure of non-interest region.


Figure 6 .
Figure 6.Damaged figure of interest region.

Figure 7 .
Figure 7. Model of Region of Interest-Blind/Reference Image Spatial Quality Evaluator.


In Figures 8-10, (a) is the original image; (b), (c), and (d) are the salient overlap maps of the Graph-Based Visual Saliency (GBVS) model, the Itti model, and this paper's model, respectively; (e), (f), and (g) are the corresponding saliency maps. The white areas represent the region of interest and the black areas the region of non-interest; the brighter the white part, the higher the degree of interest of the human eye. From panels (b), (c), and (d), we can see that the regions of interest obtained by the method of this paper are more accurate than the others, and agree well with human perception and the human visual process.


Figure 8 .
Figure 8.Comparison of different visual saliency models.(a) Original image; (b-d) salient overlap maps of GBVS model, Itti model and improved Itti model respectively; (e-g) salient maps of GBVS model, Itti model and improved Itti model respectively.

Figure 9 .
Figure 9.Comparison of different visual saliency models.(a) Original image; (b-d) salient overlap maps of GBVS model, Itti model and improved Itti model respectively; (e-g) salient maps of GBVS model, Itti model and improved Itti model respectively.

Figure 10 .
Figure 10.Comparison of different visual saliency models.(a) Original image; (b-d) salient overlap maps of GBVS model, Itti model and improved Itti model respectively; (e-g) salient maps of GBVS model, Itti model and improved Itti model respectively.



Figure 11 .
Figure 11. Comparison of different visual saliency models. (a) Original image; (b-d) salient overlap maps of GBVS model, Itti model, and improved Itti model, respectively; (f-h) salient maps of GBVS model, Itti model, and improved Itti model, respectively; (e) salient map of the free-looking task; (i) salient map of the scoring task.

Figure 12 .
Figure 12. Comparison of different visual saliency models. (a) Original image; (b-d) salient overlap maps of GBVS model, Itti model, and improved Itti model, respectively; (f-h) salient maps of GBVS model, Itti model, and improved Itti model, respectively; (e) salient map of the free-looking task; (i) salient map of the scoring task.


Figure 15 .
Figure 15.Mean SROCC value of the algorithms evaluated in Table 4, across 1000 train-test on the LIVE IQA database2.The error bars indicate the 95% confidence interval.


Table 1 .
Performance of ROI-BRISQUE with ROI extracted by the improved Itti model. The numbers in bold are the best values.

Table 2 .
Performance of ROI-BRISQUE with ROI extracted by the Itti model. The numbers in bold are the best values in the corresponding column.

Table 3 .
Performance of ROI-BRISQUE with ROI extracted by the GBVS model. The numbers in bold are the best values.

Table 4 .
Performance comparison of some IQA methods and the method of this paper. Italicized algorithms are NR IQA algorithms; the others are FR IQA algorithms.