No-Reference Image Quality Assessment with Local Gradient Orientations

Image processing methods often introduce distortions, which affect the way an image is subjectively perceived by a human observer. To avoid inconvenient subjective tests in cases in which reference images are not available, it is desirable to develop an automatic no-reference image quality assessment (NR-IQA) technique. In this paper, a novel NR-IQA technique is proposed in which the distributions of local gradient orientations in image regions of different sizes are used to characterize an image. To evaluate the objective quality of an image, its luminance and chrominance channels are processed, as well as their high-order derivatives. Finally, statistics of used perceptual features are mapped to subjective scores by the support vector regression (SVR) technique. The extensive experimental evaluation on six popular IQA benchmark datasets reveals that the proposed technique is highly correlated with subjective scores and outperforms related state-of-the-art hand-crafted and deep learning approaches.

The recent advancement of digital imaging has stimulated a tremendous growth in the use of visual information for communication [1][2][3].Therefore, it is essential to develop reliable automatic image quality assessment (IQA) measures for the evaluation of results of image processing methods for the acquisition, storage, transmission, restoration, or enhancement.The main role of IQA measures is to provide the objective assessment and replace cumbersome tests with human subjects [4].The IQA measures are divided into three categories, based on the availability of reference images [1,5,6]: full-reference (FR), reduced-reference (RR), and no-reference (NR) approaches.In an FR-IQA measure, a distorted image is compared with its reference image, while only some statistics of the distortion-free image are available in the RR-IQA case.The peak signal-to-noise ratio (PSNR) is often used as an FR-IQA model due to its simplicity.However, it weakly correlates with human perception [5].Therefore, more suitable measures have been developed that employ structural information [7], image statistical properties [8], visual saliency maps [9,10], structure and contrast changes [11], phase congruency [12], distortion distribution [13], or other measures [14,15].The RR-IQA measures use only a part of reference data [16].
In this work, the discussion is confined to NR approaches, which are considered challenging and highly desired due to their applicability in absence of reference images.Many measures are devoted to evaluating the perceptual quality of images distorted by Gaussian white noise, JPEG compression, contrast change, or Gaussian blur [17,18].Since their practical application is limited and based on the prior knowledge of distortion types, general-purpose NR methods have been developed.Many of them assume that statistical regularities of natural images can be reflected by natural scene statistics (NSS), as the Human Visual System (HVS) is very sensitive to local regularities [19].Consequently, NSS characteristics of distorted images are used for the IQA in many domains, e.g., Discrete Cosine Transform (DCT) [20], wavelet [21,22], or spatial [23].A variety of gradient-based features are often employed to model NSS [24].Furthermore, the use of perceptual features [25][26][27] or image patches [28] can be found in the literature.Since the supervised learning bridges image statistics with the perceptual quality, it is often applied to obtain a model used for the quality prediction.For the learning, the Support Vector Regression (SVR) [20,21,23,27], neural networks [24], or random forests [29] are applied.In methods that do not use supervised learning, distortion types are modeled with a set of centroids of quality levels or NSS from multiple cues [30,31].Another direction is to employ a pseudo-reference image which is created and compared with a distorted image with blockiness, sharpness, and noisiness metrics [32].Recently, many NR-IQA approaches which use deep neural network (DNN) architectures have been introduced.They merge the feature extraction and quality prediction steps.However, they suffer from a small number of training examples available in IQA benchmark datasets or use complex architectures that are devoted to image recognition tasks.To overcome these limitations, most of them use image patches [33,34], train models using FR-IQA measures instead of subjective scores [34,35], or perform fine-tuning to adopt an architecture to the IQA [36].Interestingly, some DNN-based approaches use features introduced in earlier methods [35].
The HVS is sensitive to local structures, which are often described using local binary patterns (LBP) and gradient-based statistics.However, a spatial distribution of LBP may not be able to capture more complex structures [37].Thus, statistics extracted from gradient maps often occur in conjunction with other approaches to improve the IQA performance [27,38].Such techniques use global distributions of gradient magnitude maps [25], relative gradient orientations or magnitude [24,39].
To describe an image and efficiently take into account local gradient orientations, Histogram of Oriented Gradients (HOG) descriptor can be used [40].However, the HOG produces high-dimensional feature vectors, which are devoted to object recognition tasks due to their discriminative capabilities.Consequently, considering its application to the IQA, it is worth noticing that an original image content of an assessed image, which is described using the HOG, may influence the quality prediction performance.The descriptor also strongly depends on the size of processed image blocks and the described neighborhood.Since the image gradient orientation captured by the HOG and its relevance for the NR-IQA is interesting and still seems largely uninvestigated, in this paper, a novel no-reference technique for the image quality assessment with a SEt of Histogram of OriEnted GRadients (HOG) descriptors [40] (SEER) is introduced.In the SEER, in contrary to a widely accepted application of the HOG, an image is described by a set of descriptors which are obtained taking into account different local neighborhoods.In other words, each feature vector is composed of histograms of gradient orientations calculated for image regions (cells), which are arranged together with their neighboring cells in blocks.The descriptors in the set consider different sizes of image regions and blocks.Then, each descriptor is characterized by a histogram, seen as perceptual features.In a typical image recognition system, high-dimensional descriptors are often compared with each other.However, to apply the HOG to the NR-IQA, such comparison cannot take place since a distortion-free image and its descriptors are unavailable.Furthermore, the feature vectors in the HOG are designed to discriminate objects in images.Therefore, in the SEER, to train the SVR model, statistics of descriptors are employed instead of high-dimensional vectors.To improve the IQA performance, the method processes luminance and chrominance channels of an image, as well as their high-order derivatives.
The extensive experimental evaluation of the introduced NR measure against the related state-of-the-art techniques on six popular large-scale IQA benchmark datasets, which contain various distortion types, demonstrates that the SEER provides the higher quality prediction accuracy than compared NR models and is consistent with subjective scores.The method was evaluated against hand-crafted and deep learning NR measures.
The rest of this paper is arranged as follows.Section 1 reviews previous work on NR-IQA and Section 2 describes the proposed measure.Section 3 presents and discusses the experimental results obtained for the SEER and the related measures on TID2013 [41], TID2008 [42], CSIQ [43], LIVE [7], and LIVE In the Wild Image Quality Challenge (LIVE WIQC) [44] datasets.Finally, Section 4 concludes the paper.

Related Work
In this section, a brief review of previous studies closely related to the introduced work is presented.
The introduced SEER is based on gradient processing.However, there are many works which employ other features.For example, Moorthy and Bovik proposed a two-stage framework, in which distortion type is predicted and then an image is evaluated [21].Saad et al. trained a probabilistic model with DTC-based NSS as a single-stage framework [20].The generalized Gaussian distribution used for capturing NSS with locally normalized luminance coefficients is employed in Blind/Referenceless Image Spatial QUality Evaluator (BRISQUE) [23].A scheme which combines artifact-specific metrics and employs a generalized Laplace distribution of the difference of two adjacent pixel values in an image was introduced by Fang et al. [45].The measure uses three transductive 2 k-nearest neighbor algorithms to map the metrics into subjective scores.Normalized gradients magnitude and Laplacian of Gaussian responses were jointly used by Xue et al. [25].Le et al. [46], in turn, used a histogram of local binary patterns (LBP) obtained for a gradient map.They also used LBP extracted from texture and structural maps [38].A more advanced gradient-based image descriptor, Speeded-Up Robust Features (SURF), is employed in the measure proposed by Oszust [27].In that work, the sample mean, standard deviation, entropy, skewness, kurtosis, and histogram variance for the assessed image, the image filtered with Prewitt operators, and their SURF features are used.In Optimized filteRing with binAry desCriptor for bLind imagE quality assessment technique (ORACLE) [47], in turn, a data-driven filtering based on the appearance of Features From Accelerated Segment Test (FAST) in grayscale images is proposed.The histograms of Fast Retina Keypoint (FREAK) descriptors for keypoints detected in filtered images are used to characterize the assessed content.The sample mean, standard deviation, and histogram variance of raw local patches describing FAST keypoints in images filtered with the bilaplacian operator in the YCbCr color space are used in stATistics of pixel blocks of local fEatuRes (RATER) measure [48].Unlabeled data for learning Gabor features and modeling an image using the soft-assignment coding with the max pooling are employed in [49].In another method that uses a codebook, High Order Statistics Aggregation (HOSA) [28], K-means clustering of normalized image patches and their description with the low and high order statistics are considered.The HOSA uses the soft assignment for an image representation and trains the SVR model for the prediction.Screen content images were assessed by Lu and Li [50] using orientation selectivity mechanism for extraction of orientation features.
Gradient-based techniques are often employed to provide effective IQA measures [11,24,25,46].These measures use global distributions of gradient magnitude maps [25], relative gradient orientations or magnitude [24,39].For example, in the Oriented Gradients Image Quality Assessment (OG-IQA) index, in which a correlation structure of image gradient orientations [24] is employed to train a quality model with AdaBoosting-backpropagation neural network.In that work, histograms of gradient magnitude, relative gradient orientation, and relative gradient magnitude maps are used and characterized using the histogram variance.These gradient-based measures do not take into account local gradient distributions.Such distributions are used in the HOG to provide a feature vector composed of 1-D histograms of gradient directions of pixels within image regions (cells) [40].In the literature, there are FR-IQA measures which compare a reference image with the distorted image, calculating a distance between corresponding HOG vectors [51] or produce a weight map for the FR SSIM index [52].In the technique proposed by Ahn et al. [18], in turn, blurred images are assessed by comparing HOG vectors approximated by a random sample consensus set (RANSAC).
Taking into account the referred works, it can be stated that the effectiveness of the HOG in NR image quality prediction remains largely uninvestigated, and a promising application of this descriptor to the NR-IQA is introduced for the first time in this paper.

Proposed NR Measure
In this section, the feasibility of using the HOG for the NR-IQA is investigated.The image processing steps in the SEER are explained in details in Figure 1, while in Figure 2 the block diagram of the proposed method is shown.As illustrated, the measure uses the HOG and provides feature vectors for luminance and chrominance channels of a distorted image, as well as for channels filtered using two bilaplacian operators.The measure uses differently sized image regions and the way the regions are combined together in the HOG descriptors.In the SEER, the K HOG features are obtained for each channel and their histograms are used to train a quality prediction model.

Local Gradient Orientations
The HVS is sensitive to variations in local structures [19,39].Such sensitivity and the resultant subjective perception of an image are related to local semantic structural information which forms primitives in V1 [39].Local distributions of gradients are used in the HOG to characterize an image [40].Specifically, a 2D image I is convolved with 1-D Prewitt filters in the horizontal (h Ph = [1, 0, −1]) and vertical (h Pv = [1, 0, −1] T ) directions to obtain gradients.Then, the edge magnitude is calculated as g = g 2 x + g 2 y , and its orientation is obtained as θ = arctan(g y /g x ).The orientation is then transformed to [0, 180] degrees range, ensuring that the opposite directions are assigned the same angle.The I is divided into adjacent, non-overlapping cells of size C C 1 ×C 2 and, for each cell, the gradient orientations are binned into o bins with votes based on their magnitudes.To reduce aliasing, each pixel contributes to adjacent bins a fraction of its gradient magnitude.To reduce contrast changes, in turn, histograms for cells are normalized.Since gradient magnitudes carry the information about the described object in a region, to preserve the information, cells are grouped into overlapping blocks.There are B B 1 ×B 2 cells in each block.Then, B 1 × B 2 cell histograms are concatenated and normalized using L2-norm.Finally, the obtained features for blocks are concatenated and normalized.This normalization makes the descriptor invariant against overall image contrast [40].The division of an image into blocks and cells is shown in Figure 3.The number of overlapping cells between adjacent blocks is equal to The number of dimensions in the resulting feature vector D is calculated as: where M and N denote the height and width of an image, respectively.If

Feature Extraction
In the introduced NR measure, a distorted RGB image is converted to the YCbCr color space.The YCbCr color space is specified in ITU-R BT.601 and selected as the preferred format for video broadcasting with the efficient use of the channel bandwidth [53].Following the finding that the image filtering can enhance the image quality prediction of NR-IQA measures [47] and the bilapacian operator can be used to capture more information about described image regions for the IQA than it can be achieved with other filters [48], the SEER uses the YCbCr color space and filters color channels of the assessed image with two bilaplacian kernels ∆ 2 a and ∆ 2 b .The bilaplacian is obtained by convolving two Laplacian kernels.The following kernels are used: Finally, where " * " denotes the convolution.Consequently, a filtered channel (e.g., ∆ 2 a (Y)) is obtained using the convolution ∆ 2 a * Y.To show that the HOG technique provides different feature vectors for YCbCr components, Figure 4 presents an exemplary distorted image and the visualization of the HOG obtained for a small square image patch.For the visualization, a grid of rose plots is used.In the grid, each rose expresses the distribution of gradient orientations within a cell.The length of a petal indicates the contribution of a given orientation within the cell histogram and displays two times o petals.As presented, the feature vectors for the descriptor D(C 2×2 , B 2×2 ) have different shapes for different YCbCr channels.The figure also contains image patches filtered with two used bilaplacian operators ∆ 2 a and ∆ 2 b .As shown in Figure 4, the differences between descriptors for channels seem to justify the need for their joint application in order to extract more information about the described image regions.Furthermore, since the size of a local structure in an image cannot be determined in advance, the method uses several HOG descriptors with cells and blocks of different sizes.The HOG descriptor produces high-dimensional features, depending on its focus on a local appearance of objects within an image.Thus, in this paper, each obtained feature vector is characterized using its histogram h(D).The histogram is used since it captures distributions of natural images [24].The values in the feature vectors for images are in the range [0, 1].Therefore, the number of bins in the histogram and its variance is calculated by dividing the range into d intervals and using them for the determination of bin centers.
To illustrate that the used statistics can characterize the distortion severity of images, they are computed for two sequences of images distorted with Gaussian blur and Gaussian noise [41].Here, the descriptors D(C 2×2 , B 2×2 ) and D(C 4×4 , B 2×2 ) are used.For the computation of the histogram, 10 bins are applied.As shown in Figure 5, the histogram responds consistently to the distortion type and its severity.
In the SEER, for YCbCr channels and channels filtered with two bilaplacian operators, feature vectors for the HOG descriptors are obtained and concatenated.This can be written as .Finally, the perceptual feature vector is obtained as ].The parameters of HOG descriptors used in the SEER are discussed in details in Section 3.6.Since the distortions affect images across scales [23], the method also processes an input image which is downsampled by a factor of two to improve the quality prediction.Finally, the concatenated histograms for an input image and its downscaled version are used to train the SVR model.The SVR model which maps the vector F into subjective ratings S is obtained using LIBSVM library [54].Here, as in many NR techniques [24,46], the radial basis function (RBF) kernel is employed.Finally, given an input image, the model predicts its quality and provides the objective score Q.

Datasets and Protocol
In this work, to evaluate an NR metric, six publicly available large IQA datasets were used: TID2013 [41], TID2008 [42], CSIQ [43], LIVE [7], LIVE WIQC [44], and the Waterloo Exploration Database (WE) [55].The first four datasets are typically selected to evaluate recently introduced measures [34,56,57], while the WE dataset, similar to the Group MAD Competition [58,59], is tied with a novel evaluation methodology.The LIVE WIQC dataset contains subjective scores collected in an uncontrolled manner using the Amazon Mechanical Turk.However, it contains images captured with a mobile camera and can be used for the evaluation of NR measures using real images.The TID2013 dataset is the largest and the most demanding public IQA benchmark.It consists of 3000 distorted images and covers 24 image distortion types.The other datasets contain only half (TID2008) or less than one-third of the number of the images in TID2013 (CSIQ and LIVE).The LIVE dataset contains popular distortions, such as JPEG compression, JPEG2000 compression, Gaussian blur, white noise, or simulated fast fading Rayleigh channel.It is worth noticing that some distortion types in TID2013 can be regarded as multiple, e.g., lossy compression of noisy images.Interestingly, most existing general-purpose NR measures are designed to provide an acceptable performance on the LIVE [32].As shown in Section 3.3, some of them experience a drop in the performance on datasets that contain more diverse distortions.The datasets contain high-quality images, their distorted images, and related subjective scores.The subjective scores obtained in tests with human subjects are denoted as mean opinion scores (MOS) or differential MOS (DMOS).The WE dataset contains images distorted with JPEG compression, JPEG2000 compression, white Gaussian noise, and Gaussian blur.However, it does not contain subjective scores.The datasets are characterized in Table 1.To measure the consistency of the prediction results provided by an IQA measure with subjective ratings, the following four indices were considered [60]: the Spearman's Rank Correlation Coefficient (SRCC), Kendall Rank order Correlation Coefficient (KRCC), Pearson linear Correlation Coefficient (PCC), and Root Mean Square Error (RMSE).PCC and RMSE were calculated after a nonlinear mapping between the vectors of objective and subjective scores, Q p and S (MOS or DMOS), respectively.For the mapping, the following function was used [60]: ] are parameters of the fitted regression model, Q is the objective score, and Q p is the fitted score.
The evaluation protocol associated with the WE dataset requires: the calculation of pristine/distorted image discriminability test (D-test), listwise ranking consistency test (L-test), and pairwise preference consistency test (P-test) [55].The D-test uses predictions of a model to classify distorted and pristine images.Larger values of D denote better separability.The L-test, in turn, adopts the average SRCC to quantify the ranking consistency among distorted images.Finally, the P-test compares preference predictions of IQA models on pairs of images whose quality is clearly discriminable [55], dividing the number of image pairs with correctly predicted concordance by the number of image pairs.The values obtained in tests lie in [0, 1].

Model Training
The proposed method should be trained to obtain the SVR model used for the quality prediction.Therefore, a typical protocol used for the validation of NR techniques was adopted, in which each image dataset was divided into disjoint training and testing subsets, i.e., distorted images of 80% reference images were used in training and the remaining 20% of images were used for testing [28,32,46].Then, to avoid bias and fairly compare a measure with other measures, the performance of each method was reported in terms of the median values of SRCC, KRCC, PCC, and RMSE over 100 training-testing iterations [61].

Performance on Individual Datasets
The performance of the presented NR measure was compared with those of the related state-of-the-art techniques.The following NR measures were considered: (i) HOSA [28]; (ii) BPRI [32]; (iii) BRISQUE [24]; (iv) IL-NIQE [31]; (v) SISBLIM [62]; (vi) OG-IQA [24]; (vii) GWH-GLBP [46]; and (viii) RATER [48].Among NR measures, which, to some extent, are similar to the SEER, the RATER, HOSA, and IL-NIQE assess color images, while the OG-IQA incorporates image gradients.The GWH-GLBP and SISBLIM are designed for the evaluation of images with multiple distortions.The RATER and BPRI are recently introduced general-purpose measures.However, due to distortion-specific steps in BPRI, its range of applicability is larger than those of distortion-specific measures, but it seems to be confined to several distortion types.As reported by Zhang et al., the IL-NIQE is superior to the BLIINDS2, DIIVINE, CORNIA, NIQE, BRISQUE, and QAC [31].The HOSA, in turn, is reported to outperform the GM-LOG, BRISQUE, or IL-NIQE [28].
The experimental evaluation was conducted on five IQA datasets using the protocol shown in Section 3.1.For the fair comparison, the parameters of all learning-based techniques were obtained aiming at their best performance [25,31].The methods were run using publicly available implementations.For the SVR, the popular LIBSVM library was used [54].The parameters of AdaBoosting BP neural network in the OG-IQA were also determined.The methods that do not require training (BPRI, IL-NIQE, and SISBLIM) were evaluated using the defined testing subsets of images in datasets.The SEER was run with o = 36 and d = 30 (see Section 3.6).
Table 2 summarizes the results on IQA datasets, where the best result for each performance index is written in bold.The table also contains average values for SRCC, KRCC, and PCC.The RMSE was not averaged due to the different range of values in the benchmarks.
As demonstrated, the introduced NR measure outperformed the state-of-the-art measures on four IQA benchmarks, i.e., the TID2013, TID2008, CSIQ, and LIVE.In the case of the LIVE WIQC dataset, the RATER and BRISQUE were slightly better than SEER, which was the third-best measure.Remarkably, the proposed measure had the highest average performance across the datasets, which confirms its usability.
To determine whether the relative performance differences between the measures are statistically significant, the Wilcoxon rank-sum test was used.The test measures the equivalence of the median values of independent samples with a 5% significance level.Here, the null hypothesis assumes that the SRCC values of compared metrics are drawn from a population with equal medians.The results are shown in Table 3, where the symbols "−1", "0" and "1" denote that the IQA measure in the column was statistically better with a confidence greater than 95%, indistinguishable, or worse than the SEER on a given IQA dataset, respectively.The findings are consistent with conclusions drawn from the previous experiments, i.e., the SEER was statistically better than other measures on the TID2013, TID2008, and LIVE.Taking into account the results for the CSIQ, the SEER was on pair with the RATER, as well as on pair with the RATER and BRISQUE for the LIVE WIQC.The measures were evaluated using the methodology associated with the WE database [55].Here, the BRISQUE and HOSA were compared with the SEER.Since the IL-NIQE is reported to provide superior performance using this methodology [55], it is also added to the comparison.The results reported for BRISQUE, HOSA, and IL-NIQE in [55] are presented in Table 4.The SEER, as other techniques, was trained on LIVE dataset.Interestingly, the IL-NIQE, which is much less correlated with subjective scores on LIVE than other measures, yielded encouraging L and P values.This may indicate that the relationship between distortion types, their levels, and used tests in the used methodology may require further attention, leading to its possible improvement.For example, the P values of the measures whose performances on the databases with subjective scores were significantly different seem to be close to its upper limit.Furthermore, the assumption that image quality degrades monotonically with the distortion levels may not be true for all distortion types and content of images, which may influence the results of the L-test [55].However, it can be seen that all measures provided acceptable performance in the tests on the WE database.[31] 0.9084 0.9926 0.9927 HOSA [28] 0.9175 0.9647 0.9983 SEER 0.8547 0.9466 0.9895 The comparative evaluation of the SEER with state-of-the-art NR methods that use DNN or other neural networks (NN) architectures was based on published results.Due to the large complexity of the models, and the unavailability of learning source codes for some of them, such comparison is very popular.Consequently, many papers report the performance of measures on the basis of results obtained in 10 random training-testing splits on only one or two IQA benchmarks.Furthermore, the coherent comparison of DNN-based methods is often impeded by the exclusion of some distortion types, which can make the use of the largest IQA datasets, such as the TID2013 or TID2008, superfluous.Table 5 contains the comparison of published median values of SRCC and PCC for NN-based NR methods with those obtained for the SEER in 10 training-testing splits.Other performance indices, i.e., KRCC and RMSE, are seldom reported in the referenced works.The results for the TID2008 are not presented since the referenced works were not evaluated on this dataset.As reported, the SEER clearly outperformed other measures on the most demanding datasets, such as the TID2013 or CSIQ.The NN-based measures were better than the SEER on the LIVE dataset.The worse results of the compared models on the datasets that contain considerably more distortions can be attributed to the lack of a sufficient number of training samples or imperfections of used architectures [36,57].It can be concluded that the introduced hand-crafted NR measure is highly competitive to the recently introduced techniques based on DNN or NN architectures.

Metric Configuration and Contribution of Features
In the SEER, as explained in Section 2.2, an evaluated image is characterized using the histogram of K HOG descriptors.Since the HOG can be used with different parameters, their choice and the resulting IQA performance of the introduced measure requires investigation.SRCC was used as the quality index, taking into account that the remaining criteria similarly indicate the performance of the method.
Figure 6a contains SRCC values obtained for the SEER with only one HOG descriptor, using the protocol introduced in the previous section, on the TID2013 dataset.TID2013 was used due to its size and the number of image distortions.In the experiment, the performance of the SEER with many different HOG parameters was investigated.The presented HOG descriptors were run with o = 36 and d = 30.In the IQA, a detailed description of a pixel neighborhood is needed, confirmed by various NR approaches using LBP in which a pixel is characterized by its 8 neighbors.Therefore, HOG configurations with smaller pixel areas were taken into account to determine the IQA performance of the SEER with them.The nine HOG descriptors presented in Figure 6a were selected to be used jointly.In the case of single application of the descriptor D(C 1×1 , B 1×1 ), the method resembles approaches in which statistics for a gradient map, among other features, are used for the quality prediction (e.g., [27]).To show that such joint application of the HOG descriptors is beneficial, Figure 6b reports the performance of the method with 1, 2, . . ., 9 HOG descriptors.Since the presentation of all possible combinations of these descriptors is unfeasible, the performance is reported for the descriptors added in the order presented in Figure 6a.Interestingly, the SEER with only two HOG descriptors delivered promising performance, outperforming the state-of-the-art measures evaluated in the previous section.It can be assumed that an application of dimensionality reduction or feature selection techniques, which would indicate the most influential descriptors or their statistics, may provide better results.However, these techniques can also deepen the dataset-dependency.Therefore, in this study, the selected K = 9 HOG descriptors were used in the SEER.
In the proposed method, the descriptors D(C 1×3 , B 1×3 ), D(C 3×1 , B 3×1 ), D(C 1×1 , B 1×1 ) and D(C 2×2 , B 1×1 ) contributed the most to its performance (see Figure 6a).Apart from the size of the cell and the arrangement of cells in the block, the computation of the feature vector F in the SEER requires two additional parameters, o and d.Therefore, the performance of the measure was evaluated taking into account their variability.The results in terms of SRCC on the TID2013 are shown in Figure 6c.They reveal that these two parameters almost did not affect the performance of the SEER.In other experiments with the method, o = 36 and d = 30 are used.
Since the measure uses YcbCr color space and filtered images, Table 8 presents their contribution to its performance in terms of SRCC on the TID2013.As reported, color channels contributed similarly.However, the channels filtered using the bilaplacian operators carry the most information that can be used for the quality prediction.As reported, the channels alone could not provide satisfactory SRCC performance, which confirms their complementary relationship and justifies their joint application in the SEER.

Conclusions
In this work, a novel NR-IQA technique has been presented.The introduced SEER incorporates the distribution of local intensity gradients.The histogram of a set of these descriptors, obtained for YCbCr channels of a distorted image, as well as its channels filtered with bilaplacian operators, are used as perceptual features to train an SVR model for the image quality prediction.It has been shown that the use of a descriptor, which takes into account different sizes of described image regions and their mutual relationship, is beneficial to the IQA, as obtained feature vectors respond consistently to the distortion type and its severity.Furthermore, it has been demonstrated that, to deliver superior performance, the information extracted from YCbCr channels in the bilaplacian domain should be employed.The introduced technique was evaluated and compared with the state-of-the-art NR hand-crafted and NN-based measures on six IQA datasets.The experimental results demonstrate that overall the SEER outperforms the compared NR measures in terms of prediction accuracy and generalization capability.
The future work on the SEER will focus on an application of a keypoint detector that indicates image regions for the description [27,67].Another promising direction of research is to train a deep learning model using features provided by the SEER, similar to the work in [35].
The Matlab code of the proposed approach is publicly available at http://marosz.kia.prz.edu.pl/SEER.html.

Figure 1 .
Figure 1.Image processing steps in the SEER.
A 1 is equal to B 1 , or A 2 to B 2 , the length of the vector is calculated without the subtraction in the denominator.For an exemplary image I 512×512 , the lengths of the feature vectors for the descriptors D(C 2×2 , B 2×2 ) and D(C 4×4 , B 2×2 ), calculated with o = 9, are 2,340,900 and 580,644, respectively.Such high-dimensional feature vectors are designed for robust object recognition and they cannot be used to train a quality model with the SVR.Therefore, it is shown in this paper that their histograms are sensitive to image degradation, consequently leading to the efficient application of the HOG to the NR-IQA.

Figure 2 .
Figure 2. Block diagram of the method.

Figure 3 .
Figure 3. Division of an image into blocks and cells in the HOG.

∆ 2 b]
or D i , where i = 1, . . ., 9.There are nine HOG descriptors per ith image, with different C and B. Hence, the HOG descriptors for the I can be written as D I = [D I 1 , D I 2 , . . ., D I 9 ]

Figure 4 .
Figure 4. Influence of the Gaussian blur on HOG descriptors obtained for YCbCr channels of an image.The D(C 2×2 , B 2×2 ) descriptor is visualized for the selected image patch.

Figure 5 .
Figure 5. Influence of distortions on features used in the SEER for exemplary images distorted with: Gaussian blur (a-e); and Gaussian noise (l-p).The images are ordered by the distortion severity (from left to right).Below distorted images, the histograms for two descriptors are shown: D(C 2×2 , B 2×2 ) denoted by D a (f-h,r-t); and D(C 4×4 , B 2×2 ) denoted by D b (i-k,u-w).Colors are assigned to distorted images.

Figure 6 .
Figure 6.Influence of parameters of the SEER on its performance: (a) sizes of cells and their arrangement in blocks in the HOG descriptor; (b) the number of used descriptors added in the precedence shown in (a); and (c) the number of orientation bins o and intervals d, respectively.

Table 1 .
Image quality assessment datasets.

Table 2 .
Performance evaluation on individual datasets.

Table 5 .
Performance comparison between the SEER and NR measures which use NN architectures.
Note: The comparison is based on published median values of SRCC and PCC over 10 training-testing iterations.

Table 8 .
Contribution of YCbCr channels with filtering to the performance of the SEER on the TID2013 in terms of SRCC.