On the Application of LBP Texture Descriptors and Their Variants for No-Reference Image Quality Assessment

Automatically assessing the quality of an image is a critical problem for a wide range of applications in the fields of computer vision and image processing. For example, many computer vision applications, such as biometric identification, content retrieval, and object recognition, rely on input images with a specific range of quality. Therefore, an effort has been made to develop image quality assessment (IQA) methods that are able to automatically estimate quality. Among the possible IQA approaches, No-Reference IQA (NR-IQA) methods are of fundamental interest, since they can be used in most real-time multimedia applications. NR-IQA methods are capable of assessing the quality of an image without using the reference (or pristine) image. In this paper, we investigate the use of texture descriptors in the design of NR-IQA methods. The premise is that visible impairments alter the statistics of texture descriptors, making it possible to estimate quality. To investigate whether this premise is valid, we analyze the use of a set of state-of-the-art Local Binary Patterns (LBP) texture descriptors in IQA methods. In particular, we present a comprehensive review with a detailed description of the considered methods. Additionally, we propose a framework for using texture descriptors in NR-IQA methods. Our experimental results indicate that, although not all texture descriptors are suitable for NR-IQA, many can be used for this purpose, achieving good accuracy with the advantage of low computational complexity.


Introduction
With the fast growth of imaging systems, a large number of digital images are being generated every day. These images are often altered in the acquisition, transmission, or compression stages. These alterations can introduce distortions that may affect how humans and machines understand the image content. Therefore, multimedia and computer vision applications can greatly benefit from automatic tools that are capable of assessing image quality. More specifically, image quality assessment (IQA) methods can be used, for example, to determine optimal codec parameters [1], find the best perceptual coding schemes [2][3][4][5][6], and design efficient image watermarking algorithms [7,8]. Moreover, a recent report by Conviva shows that viewers are demanding a higher quality of delivered multimedia content [9]. As users' demands increase, so does the importance of designing automatic tools to predict the quality of the visual stimuli.
In the context of computer vision (CV), the quality of the input images can affect the performance of the algorithms. For instance, Kupyn et al. [10] have shown that object detection methods based on deep learning approaches are greatly affected by the quality of the input images, as can be seen in Figure 1, which shows object detection using YOLO [18] on distorted (left) and pristine (right) images taken from the GoPro dataset [19]; the detection effectiveness of YOLO is remarkably impaired by the quality of the input image. Moreover, Dodge and Karam [11] demonstrated that deep neural networks are susceptible to image quality distortions, particularly blur and noise. Other examples of well-known CV algorithms that are affected by the quality of the input images include finger vein detection [12], biometric sensor spoofing [13], face recognition [14], video stream recognition systems [15], deep learning reconstruction of magnetic resonance imaging (MRI) [16], and multi-view activity recognition [17].
There are mainly two ways of measuring image quality. The first consists of performing psychophysical experiments in which humans rate the quality of a set of images. These experiments use standardized experimental methodologies to obtain quality scores for a broad range of images processed with a diverse set of algorithms and procedures. Since these experiments use human subjects, this approach is known as subjective quality assessment and is considered the most accurate way to estimate quality [20]. Unfortunately, subjective methods are expensive and time-consuming and, therefore, unsuitable for most real-time applications. The second approach consists of using computer algorithms to obtain a quality estimate. Since it does not require human subjects, this approach is often called 'objective quality assessment'. If a given objective method produces results that are well correlated with the quality scores provided by human viewers, it can be used to replace subjective methods.
Objective IQA methods are classified according to the amount of reference information they require. If the full reference image (pristine content) is required to estimate quality, the method is classified as full-reference (FR). If the method only requires a limited amount of information regarding the reference image, it is a reduced-reference (RR) method. Since requiring full or limited reference information can be a severe impediment for many applications, one solution is to use no-reference (NR) methods, which evaluate the quality of images without requiring any information about the reference image. Objective methods can also be classified according to their target applications. Methods designed for specific applications are known as distortion-specific (DS) methods. DS methods can be designed to estimate the amount of sharpness [21][22][23], JPEG/JPEG2000 degradations [24,25], blockiness artifacts [26], contrast distortions [27], and enhancement [28] in an image. Although DS methods can be useful in specific scenarios, they have limited applicability in the real world. An alternative to DS methods is the distortion-generic (DG) methods, which do not require prior knowledge of the type of distortion and, therefore, are more adequate for diverse scenarios. As expected, the design of DG methods is more challenging [29,30].
According to Hemami and Reibman [31], the design of IQA methods requires three major steps: measuring, pooling, and mapping. Measuring refers to the extraction of a set of specific physical attributes of the image. In other words, the method must compute a set of image features that describes visual quality. Pooling refers to the combination of these measurements to create a link between the image features and its quality. Mapping refers to the model of correspondence between the result of the pooling and the subjective scores. Most existing works focus on the measuring stage, where quality-aware features are designed to measure the level of image distortion. These features are usually based on natural scene statistics (NSS) [32][33][34][35], assuming that pristine natural images have particular statistical properties that are disturbed by distortions. NSS-based methods can extract features in different domains, such as the discrete cosine transform (DCT) domain [36][37][38], the discrete wavelet transform (DWT) domain [39][40][41], the spatial domain [42], etc. More recently, convolutional neural networks (CNN) have also been used in the design of NR-IQA methods [43][44][45]. CNN-based methods use the direct correspondence between the hierarchy of the human visual system and the layers of a CNN [46][47][48][49][50].
Another trend has been the use of saliency models (SM) [51][52][53][54]. SMs provide a measure of the perceptual importance of each image region, which allows quality assessment methods to weight the distortion according to region importance. In other words, quality and saliency are inherently associated because both of them depend on how the human visual system perceives the content and, consequently, on how (suprathreshold) distortions are detected [53]. Some investigators have studied how to include saliency information into existing visual quality metrics in order to boost their performance [52,[55][56][57][58]. Nevertheless, most of these investigations are targeted at either FR or DS image quality metrics.
In this paper, we investigate the suitability of texture descriptors for assessing image quality. This paper is inspired by the studies of Ciocca et al. [59] and Larson and Chandler [60]. The premise is that visible impairments alter the statistics of texture descriptors, making it possible to estimate image quality. To investigate this premise, we analyze the use of a set of state-of-the-art texture descriptors in quality assessment methods. Additionally, we propose a framework to use these texture descriptors for NR-IQA. The framework is based on a supervised machine learning (ML) approach that takes into account how impairments affect the statistics of the texture descriptors. These statistics are used as feature vectors of a random forest regression algorithm that learns the predictive quality model via regression [61].
The rest of this paper is organized as follows. Section 2 presents a brief review of the texture descriptors investigated in this paper. Section 3 describes the proposed framework, the experimental setup, all simulation results, and a discussion of these results. Finally, Section 4 presents the conclusions.

Texture Descriptors
Texture is a fundamental attribute of images, but there is no consensus on its definition. Petrou and Garcia-Sevilla, for instance, define texture as a variation of the visual stimuli at scales smaller than the scale of interest [62]. Davies associates texture with patterns exhibiting both randomness and regularity [63]. In this paper, texture refers to area characteristics that are perceived as combinations of basic image patterns. These basic patterns present a certain regularity that can be captured by statistical measures.
To characterize a texture, texture analysis methods identify and select a set of relevant texture features. Over the years, several texture analysis methods have been proposed, using a variety of approaches [62,63], including gray level run-length matrices (GLRLM) [64], gray level co-occurrence matrices (GLCM) [65], the texture spectrum [66], and textons [67]. Among the most popular texture operators is the local binary pattern (LBP) [68], which describes the local textures of an image by performing simple operations. More specifically, the textures are labeled according to the relationships between each pixel and its neighbors. One of the advantages of the LBP descriptor is that it unifies traditional texture analysis models.
There are several modifications of the LBP operator [69,70]. Most of them try to improve the performance of the LBP in specific applications (e.g., texture classification, face recognition, object detection, etc.). However, few works have systematically compared the performance of the LBP (and its variants) within a single application. This paper is inspired by the work of Hadid et al. [69], who compared the performance of 13 different LBP-based methods in gender recognition applications. Our focus is to test the performance of LBP-based descriptors in IQA applications. This section describes the basic LBP descriptor and the state-of-the-art LBP variants considered in this work.

Basic Local Binary Patterns (LBP)
The Local Binary Pattern (LBP) is arguably one of the most powerful texture descriptors. It was first proposed by Ojala et al. [68] and it has since been proven to be an effective feature extractor for texture-based problems. The traditional LBP descriptor takes the following form:

$$\mathrm{LBP}_{R,P}(I_c) = \sum_{p=0}^{P-1} S(I_p - I_c)\, 2^p, \quad (1)$$

where

$$S(t) = \begin{cases} 1, & t \geq 0, \\ 0, & t < 0. \end{cases} \quad (2)$$

In Equation (1), $I_c = I(x, y)$ is an arbitrary central pixel at the position $(x, y)$ and $I_p = I(x_p, y_p)$ is a neighboring pixel surrounding $I_c$, where

$$x_p = x + R\cos(2\pi p / P), \quad (3)$$
$$y_p = y - R\sin(2\pi p / P), \quad (4)$$

and $P$ is the total number of neighboring pixels $I_p$, sampled with a distance $R$ from $I_c$. Figure 2 illustrates examples of symmetric samplings with different numbers of neighboring points ($P$) and radius ($R$) values. Figure 3 illustrates the steps for applying the LBP descriptor on a single pixel ($I_c = 8$) located in the center of a 3 × 3 image block, as shown in the bottom-left of this figure. The numbers in the yellow squares of the block represent the order in which the descriptor is computed (counter-clockwise direction, starting from 0). In this figure, we use a unit neighborhood radius ($R = 1$) and eight neighboring pixels ($P = 8$). After calculating $S(t)$ (Equation (2)) for each neighboring pixel $I_p$ ($0 \leq p \leq 7$), we obtain a binary output for each $I_p$, as illustrated in the upper-left position of Figure 3. The black circles correspond to '0' and the white circles to '1'. These binary outputs are stored in a binary number, according to their positions (yellow squares). The LBP output for $I_c$ is the decimal number obtained by converting this binary number. After the LBP is applied to all pixels in an image, we get a set of labels that compose the LBP channel. Figure 4 shows examples of LBP channels for the image 'Baboon', obtained using different radius values and different numbers of neighbors. When an image is rotated, the sampled values $I_p$ move along the perimeter of the circumference around $I_c$, generating a circular shift in the resulting binary number. As a consequence, a different decimal $\mathrm{LBP}_{R,P}(I_c)$ value is obtained. To remove this
effect, we can use the following rotation invariant (ri) descriptor, defined as:

$$\mathrm{LBP}^{ri}_{R,P}(I_c) = \min_{k}\; \mathrm{ROTR}\big(\mathrm{LBP}_{R,P}(I_c),\, k\big), \quad (5)$$
where $k = \{0, 1, 2, \cdots, P-1\}$ and $\mathrm{ROTR}(x, k)$ is the circular bit-wise right shift operator that shifts the tuple $x$ by $k$ positions. Due to the crude quantization of the angular space and to the occurrence of specific frequencies in individual patterns, the $\mathrm{LBP}_{R,P}$ and $\mathrm{LBP}^{ri}_{R,P}$ descriptors do not always provide a good discrimination [71]. To improve the discriminability, Ojala et al. [68] proposed a 'uniform' descriptor that captures fundamental pattern properties:

$$\mathrm{LBP}^{u}_{R,P}(I_c) = \begin{cases} \sum_{p=0}^{P-1} S(I_p - I_c), & \text{if } U(\mathrm{LBP}_{R,P}) \leq 2, \\ P + 1, & \text{otherwise}, \end{cases} \quad (6)$$

where

$$U(\mathrm{LBP}_{R,P}) = \big|S(I_{P-1} - I_c) - S(I_0 - I_c)\big| + \sum_{p=1}^{P-1} \big|S(I_p - I_c) - S(I_{p-1} - I_c)\big| \quad (7)$$

counts the number of bitwise 0/1 transitions in the circular pattern. In addition to a better discriminability, the uniform LBP descriptor has the advantage of generating fewer distinct LBP labels. While the 'nonuniform' descriptor (Equation (1)) produces $2^P$ different output values, the 'uniform' descriptor produces only $P + 2$ distinct output values. Finally, once the LBP mask is calculated using any of the LBP approaches described above, we compute its histogram. Next, we present some of the LBP variants that have been proposed to improve the robustness and discriminability of the original descriptor.
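The operations above are simple enough to sketch directly. The following minimal Python sketch (the neighbor ordering and the helper names are our own illustrative choices, not from the original papers) computes the basic 8-bit LBP code of a 3 × 3 block, its rotation-invariant version, and the uniformity measure U:

```python
import numpy as np

def lbp_code(block, R=1):
    """Basic LBP for the centre pixel of a 3x3 block (R = 1, P = 8).
    Neighbours are read counter-clockwise starting at the right
    (an assumed ordering)."""
    c = block[R, R]
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    bits = [1 if block[R + dy, R + dx] >= c else 0 for dy, dx in offsets]
    return sum(b << p for p, b in enumerate(bits))

def rotation_invariant(code, P=8):
    """LBP^ri: minimum over all circular right shifts of the P-bit code."""
    mask = (1 << P) - 1
    return min(((code >> k) | (code << (P - k))) & mask for k in range(P))

def uniformity(code, P=8):
    """U(x): number of 0/1 transitions in the circular bit string."""
    bits = [(code >> p) & 1 for p in range(P)]
    return sum(bits[p] != bits[(p + 1) % P] for p in range(P))
```

A flat block yields the all-ones code, and two codes that are circular shifts of each other map to the same rotation-invariant value.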

Local Ternary Patterns (LTP)
The LTP operator is an extension of the LBP descriptor that assumes up to three coded values ({−1, 0, 1}). This is achieved by changing the step function S in the following manner:

$$\hat{S}(t) = \begin{cases} 1, & t \geq \tau, \\ 0, & -\tau < t < \tau, \\ -1, & t \leq -\tau, \end{cases} \quad (9)$$

where τ is a threshold that determines how sharp an intensity change must be to be considered an edge. After computing the ternary codes, each ternary pattern is split into two codes: a positive (upper pattern) code and a negative (lower pattern) code, which are treated as two separate channels. Figure 5 illustrates the basic feature extraction procedure for a single pixel using the LTP descriptor. The numbers in the yellow squares represent the order in which the step function is computed (Equations (2) and (9)). In this example, we consider a unit neighborhood radius (R = 1), eight neighboring pixels (P = 8), and a threshold τ equal to five. While in the LBP the binary code takes only two values (0 or 1, represented by the colors black and white), the LTP descriptor generates three possible values (see Equation (9)), which are represented by the colors black ($\hat{S}(t) = 1$), white ($\hat{S}(t) = 0$), and red ($\hat{S}(t) = -1$). We split the LTP code into two LBP codes (with only non-negative values). First, we create the upper pattern by converting the negative codes to zero. Next, we create the lower pattern by setting the positive values to zero and converting the negative values to positive. Comparing Figures 3 and 5, we notice that the LTP descriptor generates two texture information maps, which are two separate LBP channels. Finally, we compute independent histograms and similarity measures for each of these maps and combine these histograms to generate the feature vector.
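As a sketch of the splitting step, the following Python function (the neighbor ordering and the function name are illustrative assumptions) produces the upper and lower LBP codes of a 3 × 3 neighborhood directly from the ternary values:

```python
import numpy as np

def ltp_codes(block, tau=5):
    """Split a 3x3 neighbourhood into the LTP upper and lower codes.
    Neighbours are read counter-clockwise from the right (assumed order)."""
    c = int(block[1, 1])
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    upper = lower = 0
    for p, (dy, dx) in enumerate(offsets):
        d = int(block[1 + dy, 1 + dx]) - c
        if d >= tau:        # ternary value +1 -> bit in the upper pattern
            upper |= 1 << p
        elif d <= -tau:     # ternary value -1 -> bit in the lower pattern
            lower |= 1 << p
    return upper, lower
```

A flat neighborhood produces (0, 0); a neighbor that exceeds the center by τ sets a bit only in the upper pattern, and one that falls below by τ sets a bit only in the lower pattern.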

Local Phase Quantization (LPQ)
A limitation of the LBP is its sensitivity to blur. To tackle this problem, the local phase quantization (LPQ) descriptor was proposed [72]. The LPQ descriptor performs a quantization of the Fourier transform phase in local neighborhoods. Assume that $G(u)$, $F(u)$, and $H(u)$ are the discrete Fourier transforms (DFT) of the blurred image $g(z)$, the original image $f(z)$, and the blur point spread function $h(z)$, respectively, which are related by the following equation:

$$G(u) = F(u) \cdot H(u). \quad (10)$$

Assuming that $h(x) = h(-x)$, its DFT is always real and its phase assumes only two values, namely:

$$\angle H(u) = \begin{cases} 0, & \text{if } H(u) \geq 0, \\ \pi, & \text{if } H(u) < 0. \end{cases} \quad (11)$$

For the LPQ descriptor, the phase is computed in the local neighborhood $N_z$, for each pixel position of $f(z)$. The local spectrum is computed with the following equation:

$$F(u, z) = \sum_{y} f(z - y)\, w_R(y)\, e^{-j 2\pi u^T y}, \quad (12)$$

where $u$ is the frequency and $w_R$ is a window given by:

$$w_R(y) = \begin{cases} 1, & \text{if } y \in N_z, \\ 0, & \text{otherwise}. \end{cases} \quad (13)$$

The local Fourier coefficients are computed at four frequencies for each pixel position, i.e.,

$$\bar{F}(x) = \big[F(u_1, x),\, F(u_2, x),\, F(u_3, x),\, F(u_4, x)\big]^T, \quad (14)$$

where $u_1 = [a, 0]^T$, $u_2 = [0, a]^T$, $u_3 = [a, a]^T$, and $u_4 = [a, -a]^T$. In these cases, $a$ is sufficiently small to satisfy $H(u_i) > 0$.
The phase of the Fourier coefficients is given by the signs of the real and imaginary parts of each component of $\bar{F}(x)$, computed by scalar quantization:

$$q_j = \begin{cases} 1, & \text{if } g_j \geq 0, \\ 0, & \text{otherwise}, \end{cases} \quad (15)$$

where $g_j$ is the j-th component of $G(x) = \big[\mathrm{Re}\{\bar{F}(x)\},\, \mathrm{Im}\{\bar{F}(x)\}\big]$. After generating the binary coefficients $q_j$, the feature vector is generated using the same technique used in the LBP.
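The following Python sketch illustrates the quantization idea under simplifying assumptions: it uses a uniform rectangular window, computes the four local Fourier coefficients by separable filtering, and skips the coefficient decorrelation step used in the full LPQ method. The function name and the default a = 1/win are our own choices:

```python
import numpy as np

def lpq_codes(img, win=3, a=1.0 / 3):
    """Toy LPQ: an 8-bit code per pixel from the signs of the real and
    imaginary parts of four local Fourier coefficients."""
    r = win // 2
    n = np.arange(-r, r + 1)
    w0 = np.ones(win, dtype=complex)       # DC weights
    w1 = np.exp(-2j * np.pi * a * n)       # weights at frequency a
    # separable filter pairs standing in for the four frequencies u_1..u_4
    filts = [(w1, w0), (w0, w1), (w1, w1), (w1, np.conj(w1))]
    H, W = img.shape
    codes = np.zeros((H - 2 * r, W - 2 * r), dtype=np.uint8)
    for i in range(H - 2 * r):
        for j in range(W - 2 * r):
            patch = img[i:i + win, j:j + win].astype(float)
            bits = 0
            for k, (wy, wx) in enumerate(filts):
                F = wy @ patch @ wx        # one local Fourier coefficient
                bits |= int(F.real >= 0) << (2 * k)
                bits |= int(F.imag >= 0) << (2 * k + 1)
            codes[i, j] = bits
    return codes
```

The two sign bits per frequency give 8 bits in total, so, as in the LBP, each pixel receives a label in [0, 255] and the feature vector is the histogram of these labels.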

Binarized Statistical Image Features (BSIF)
The binarized statistical image features (BSIF) is a descriptor proposed by Kannala and Rahtu [73] that does not use a manually predefined set of filters. Instead, it learns the filters from the statistics of natural images. BSIF is among the best texture descriptors for face recognition and texture classification applications [69,73]. Differently from the previous descriptors, which operate on pixels, BSIF works on patches of pixels. Given an image patch X of size l × l pixels and a linear symmetric filter $W_i$ of the same size, the filter response $s_i$ is obtained by computing the following expression:

$$s_i = \mathbf{w}_i^T \mathbf{x}, \quad (16)$$

where the vectors $\mathbf{w}_i$ and $\mathbf{x}$ contain the pixels of $W_i$ and X, respectively. The binarized feature is obtained using the following function:

$$b_i = \begin{cases} 1, & \text{if } s_i > 0, \\ 0, & \text{otherwise}. \end{cases} \quad (17)$$

The filters $W_i$ are learned via independent component analysis (ICA). The binarized features $b_i$ are aggregated following the same procedure described for generating the LBP labels. The descriptive features are obtained by computing the histogram of the aggregated data.
Similarly to the LBP, which generates LBP channels, the BSIF generates coded images. These coded images are the set of labels generated after the binarized features are computed using Equation (17) and aggregated using Equation (1). The aggregation of the BSIF results is based on a selected number of bits, instead of the number of neighbors of the labeled pixel. The labeling depends on the relationship between the patch size l and the number of binarized features $b_i$. Figure 6 shows the BSIF coded images corresponding to the same reference image, obtained using different BSIF parameters. As can be seen in this figure, the textured information depends on the patch size l and on the number of bits. The number of bits is less than or equal to $l^2 - 1$. This is the reason why the second column does not contain BSIF coded images for 9, 10, 11, or 12 bits. Figure 6 also shows that the choice of the number of bits and patch size is important for texture analysis algorithms. Therefore, multiscale approaches that incorporate several combinations of these parameters are of interest [74][75][76][77].
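A minimal sketch of the BSIF labeling follows. Since the ICA-learned filters are not reproduced here, random zero-mean filters stand in for them (a loud assumption); only the response-and-binarize mechanics of Equations (16) and (17) match the description above:

```python
import numpy as np

rng = np.random.default_rng(0)

def bsif_code(patch, filters):
    """BSIF sketch: one bit per filter, aggregated into an integer label.
    Real BSIF filters are learned with ICA on natural-image patches;
    random filters stand in for them here."""
    x = patch.ravel().astype(float)
    code = 0
    for i, W in enumerate(filters):
        s = W.ravel() @ x              # filter response s_i = w_i^T x
        code |= int(s > 0) << i        # binarised feature b_i as bit i
    return code

l, n_bits = 3, 6                       # patch size and number of filters
filters = [rng.standard_normal((l, l)) for _ in range(n_bits)]
patch = rng.standard_normal((l, l))
label = bsif_code(patch, filters)      # integer label in [0, 2**n_bits)
```

With n_bits filters, each patch receives a label in [0, 2^n_bits), and the image-level feature is the histogram of these labels.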

Rotated Local Binary Patterns (RLBP)
For some applications, image rotation affects the LBP results because of the fixed order of its weights. Since the weights are distributed in a circular way, the effect of rotation can be eliminated by rotating the weights by the same angle. When the rotation angle is not known, an adaptive arrangement of weights, based on a locally computed reference direction, can be used. Mehta and Egiazarian [78] proposed the rotated local binary pattern (RLBP) descriptor, which considers that, if an image is rotated, the descriptor weights should be rotated by the same angle.
The RLBP makes the LBP invariant to rotation by circularly shifting the weights according to the dominant direction (D). In the neighborhood of a pixel $I_c$, D is the index of the neighbor whose difference to $I_c$ is maximum, i.e.,

$$D = \underset{p \in \{0, 1, \ldots, P-1\}}{\mathrm{argmax}}\; (I_p - I_c). \quad (18)$$

Since D is taken as the reference, the weights are assigned with respect to it. The RLBP descriptor is computed as follows:

$$\mathrm{RLBP}_{R,P}(I_c) = \sum_{p=0}^{P-1} S(I_p - I_c)\, 2^{(p - D)\ \mathrm{mod}\ P}, \quad (19)$$

where $i\ (\mathrm{mod}\ j)$ is the remainder of the division of $i$ by $j$.
Figure 7 depicts the effect of a rotation on the LBP and RLBP descriptors. Notice that the LBP code changes under rotation. The red color indicates pixels with values above the threshold, while the yellow color indicates the pixel with the maximum difference to $I_c$ (the dominant direction D). The position D takes the smallest weight, while the other positions get weights that correspond to circular shifts with relation to D. From Figure 7g, we notice that the weight corresponding to D is the same for both the original and rotated images, even when these pixels are at different angles. Therefore, the RLBP values for two rotated neighborhoods are the same. Figure 8 shows the effect of rotation after generating the LBP and RLBP channels. The first row shows the LBP and RLBP maps of the original image and their corresponding histograms. The second row shows the same information for a version of the original image rotated by 90 degrees. To compare the differences between the LBP and RLBP histograms, before and after the rotation, we use three statistical divergence measures: the Kullback-Leibler divergence (KLD) [80], the Jensen-Shannon divergence (JSD) [81], and the chi-square distance (CSD) [82]. The KLD, JSD, and CSD of the LBP histograms are 2.92 × 10⁻², 6.96 × 10⁻³, and 2.11 × 10², respectively. These divergences for the RLBP histograms are 2.06 × 10⁻⁴, 5.12 × 10⁻⁵, and 1.57, respectively. Therefore, the statistical divergences for the LBP are about two orders of magnitude higher than those for the RLBP.
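The weight-shifting mechanics can be sketched as follows; the function name is ours, and the dominant direction is taken as the neighbor with the maximum signed difference, as described above:

```python
def rlbp_code(center, neighbors):
    """RLBP sketch: LBP weights circularly shifted so that the dominant
    direction D (the neighbour with the largest difference to the centre)
    always receives the smallest weight, 2^0."""
    diffs = [int(n) - int(center) for n in neighbors]
    P = len(neighbors)
    D = max(range(P), key=lambda p: diffs[p])   # dominant direction
    return sum((diffs[p] >= 0) << ((p - D) % P) for p in range(P))
```

Rotating the neighborhood (a circular shift of the neighbor list) shifts D by the same amount, so the resulting code is unchanged.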

Complete Local Binary Patterns (CLBP)
The LBP descriptor considers only the signs of the local differences between each pixel and its neighbors. The complete local binary patterns (CLBP) consider both the signs (S) and magnitudes (M) of the local differences, as well as the original intensity value of the center pixel [83]. Therefore, the CLBP feature is a combination of three descriptors, namely $\mathrm{CLBP}_S$, $\mathrm{CLBP}_M$, and $\mathrm{CLBP}_C$. Figure 9 illustrates the computation of the CLBP feature. The $\mathrm{CLBP}_S$ and $\mathrm{CLBP}_M$ components are computed using the local difference sign-magnitude transform (LDSMT), which is defined as:

$$d_p = I_p - I_c = s_p \cdot m_p, \quad (20)$$

where $s_p = \mathrm{sign}(I_p - I_c)$ and $m_p = |I_p - I_c|$. The $s_p$ is the sign component used to compute $\mathrm{CLBP}_S$, i.e., $\mathrm{CLBP}_S$ is the same as the original LBP and it is used to code the sign information of the local differences. $\mathrm{CLBP}_M$ is used to code the magnitude information of the local differences:

$$\mathrm{CLBP}_M(I_c) = \sum_{p=0}^{P-1} t(m_p, c)\, 2^p, \quad (21)$$

where

$$t(x, c) = \begin{cases} 1, & x \geq c, \\ 0, & x < c, \end{cases} \quad (22)$$

and $c$ is a threshold set as the mean value of $m_p$ over the whole image. Finally, $\mathrm{CLBP}_C$ is used to code the information of the original center gray level value:

$$\mathrm{CLBP}_C(I_c) = t(I_c, c_I), \quad (23)$$

where $c_I$ is set as the mean gray level of the whole image. The three descriptors, $\mathrm{CLBP}_S$, $\mathrm{CLBP}_M$, and $\mathrm{CLBP}_C$, are combined: individual histograms are computed and concatenated, and this joint histogram is used as the CLBP feature.
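A per-pixel sketch of the three CLBP components follows; the two global thresholds are passed in as parameters (in the full method they are computed over the whole image), and the function name is ours:

```python
def clbp_components(center, neighbors, c, c_img):
    """CLBP sketch: sign, magnitude, and centre codes for one pixel.
    c is the global mean of the magnitudes m_p and c_img is the mean
    image intensity (both supplied by the caller in this sketch)."""
    d = [int(n) - int(center) for n in neighbors]
    s = [1 if dp >= 0 else 0 for dp in d]           # sign component
    m = [abs(dp) for dp in d]                       # magnitude component
    clbp_s = sum(b << p for p, b in enumerate(s))   # same as original LBP
    clbp_m = sum((mp >= c) << p for p, mp in enumerate(m))
    clbp_c = 1 if center >= c_img else 0
    return clbp_s, clbp_m, clbp_c
```

For a flat neighborhood every sign is non-negative, so the sign code saturates while the magnitude code stays at zero whenever the threshold is positive.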

Local Configuration Patterns (LCP)
Local configuration patterns (LCP) is a rotation-invariant image descriptor proposed by Guo et al. [84] to achieve a higher discriminability. LCP decomposes the image information into two levels: local structural information and microscopic configuration information. The local structural information is captured by LBP features, while the microscopic configuration (MiC) information is determined by the image configuration and by the pixel-wise interaction relationships.
To model the image configuration, we estimate the optimal weights, associated with the neighboring pixels, that linearly reconstruct the central pixel intensity for each pattern type. This can be expressed by the following equation:

$$E(a_0, a_1, \ldots, a_{P-1}) = \left| I_c - \sum_{p=0}^{P-1} a_p I_p \right|, \quad (24)$$

where $I_c$ and $I_p$ denote the intensity values of the center pixel and the neighboring pixels, $a_p$ are the weighting parameters associated with $I_p$, and $E(a_0, a_1, \ldots, a_{P-1})$ is the reconstruction error with respect to the model parameters. To minimize the reconstruction error, the optimal parameters for each pattern are determined by a least-squares estimation. Suppose the number of occurrences of a particular pattern type $j$ is $f_j$, i.e., there are $f_j$ pixels in the image with the pattern $j$. We denote the intensities of those $f_j$ pixels as $c_{j,i}$, where $i = 0, 1, \ldots, f_j - 1$. These intensities are organized into a vector:

$$C_j = [c_{j,0}, c_{j,1}, \ldots, c_{j,f_j-1}]^T. \quad (25)$$

We denote the intensities of the neighboring pixels with respect to each $c_{j,i}$ as $v_{i,0}, \ldots, v_{i,P-1}$, which are organized into a matrix with the following form:

$$V_j = \begin{bmatrix} v_{0,0} & \cdots & v_{0,P-1} \\ \vdots & \ddots & \vdots \\ v_{f_j-1,0} & \cdots & v_{f_j-1,P-1} \end{bmatrix}. \quad (26)$$

To minimize the reconstruction error (Equation (24)), the unknown parameters $a_p$ are organized as a vector:

$$A_j = [a_0, a_1, \ldots, a_{P-1}]^T, \quad (27)$$

and the optimal parameters are determined by solving the following equation:

$$A_j = (V_j^T V_j)^{-1} V_j^T C_j. \quad (28)$$

After determining $A_j$, we apply the Fourier transform to the estimated parameters, which can be expressed by:

$$H_j(k) = \sum_{p=0}^{P-1} A_j(p)\, e^{-j 2\pi p k / P}, \quad (29)$$

where $H_j(k)$ is the k-th element of $H_j$ and $A_j(p)$ is the p-th element of $A_j$. The magnitude of each element of the vector $H_j$ is taken as the resulting MiC, which is defined by:

$$|H_j| = \big[\,|H_j(0)|, |H_j(1)|, \ldots, |H_j(P-1)|\,\big]. \quad (30)$$

The LCP feature is formed by both the pixel-wise interaction relationships and the local shape information, which is expressed as:

$$\mathrm{LCP} = [\,|H_j|,\, O_j\,], \quad (31)$$

where $|H_j|$ is computed using Equation (30) with respect to the j-th pattern and $O_j$ is the number of occurrences of the j-th LBP label.
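The least-squares estimation and the Fourier magnitude step can be sketched with NumPy as follows; the toy data are synthetic and the function name is ours:

```python
import numpy as np

def mic_features(C, V):
    """MiC sketch for one pattern type j: least-squares weights A_j that
    reconstruct the centre intensities C from the neighbour matrix V,
    followed by the magnitudes of their Fourier transform, |H_j|."""
    A, *_ = np.linalg.lstsq(V, C, rcond=None)  # least-squares solve for A_j
    return np.abs(np.fft.fft(A))               # rotation-invariant |H_j|

# toy data: f_j = 5 occurrences of a pattern with P = 4 neighbours each
rng = np.random.default_rng(1)
V = rng.standard_normal((5, 4))
a_true = np.array([0.4, 0.1, 0.3, 0.2])
C = V @ a_true          # centres built to be exactly reconstructible
h = mic_features(C, V)
```

Because the toy centers are exactly reconstructible, the recovered weights match a_true and the MiC vector equals the Fourier magnitudes of those weights.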

Opposite Color Local Binary Patterns (OCLBP)
To combine both texture and color information into a joint descriptor, Mäenpää [85] proposed the Opponent Color Local Binary Pattern (OCLBP) descriptor. This descriptor improves on the descriptor proposed by Jain and Healey [86] by substituting the Gabor filters with a variant of the LBP descriptor, decreasing its computational cost. The OCLBP descriptor uses two approaches. In the first, the LBP descriptor is applied individually to each color channel, instead of being applied only to a single luminance channel. This approach is called 'intra-channel' because the central pixel and the corresponding sampled neighboring points belong to the same color channel. In the second approach, called 'inter-channel', the central pixel belongs to one color channel and its corresponding sampled neighboring points belong to another color channel. More specifically, for an $\mathrm{OCLBP}_{MN}$ descriptor, the central pixel is positioned in the channel M, while the neighborhood is sampled in the channel N. For a three-channel color space, such as RGB, there are six possible combinations of channels: $\mathrm{OCLBP}_{RG}$, $\mathrm{OCLBP}_{GR}$, $\mathrm{OCLBP}_{RB}$, $\mathrm{OCLBP}_{BR}$, $\mathrm{OCLBP}_{GB}$, and $\mathrm{OCLBP}_{BG}$.
Figure 10 depicts the sampling approach of OCLBP when the central pixel is sampled in the R channel. From this figure, we can notice that two combinations are possible: $\mathrm{OCLBP}_{RG}$ (left) and $\mathrm{OCLBP}_{RB}$ (right). In $\mathrm{OCLBP}_{RG}$, the gray circle in the red channel is the central point, while the green circles in the green channel correspond to '0' sampling points and the white circles correspond to '1' sampling points. Similarly, in $\mathrm{OCLBP}_{RB}$, the blue circles correspond to '0' sampling points and the white circles correspond to '1' sampling points. After computing the OCLBP descriptor for all pixels, a total of six texture channels is generated. As depicted in Figure 11, three LBP intra-channels ($\mathrm{LBP}_R$, $\mathrm{LBP}_G$, and $\mathrm{LBP}_B$) and three LBP inter-channels ($\mathrm{OCLBP}_{RG}$, $\mathrm{OCLBP}_{RB}$, and $\mathrm{OCLBP}_{GB}$) are generated. Although all possible combinations of the opposing color channels would allow six distinct inter-channels, we observed that the symmetric opposing pairs are very redundant (e.g., $\mathrm{OCLBP}_{RG}$ is equivalent to $\mathrm{OCLBP}_{GR}$). Due to this redundancy, only the three most descriptive inter-channels are used.
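A sketch of the inter-channel sampling, assuming R = 1, P = 8, and our own neighbor ordering:

```python
import numpy as np

def oclbp_inter(img_m, img_n, y, x):
    """Inter-channel OCLBP sketch (R = 1, P = 8): the centre pixel comes
    from channel M and the neighbours from channel N (assumed ordering)."""
    c = int(img_m[y, x])
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    return sum((int(img_n[y + dy, x + dx]) >= c) << p
               for p, (dy, dx) in enumerate(offsets))
```

With a brighter neighbor channel every comparison fires, and with a darker one none does, which illustrates how the M/N ordering changes the code.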

Three-Patch Local Binary Patterns (TPLBP)
Wolf et al. [87] proposed a family of LBP-related descriptors designed to encode additional types of local texture information. While variants of the LBP descriptor use short binary strings to encode information about the local micro-texture pixel-by-pixel, the authors considered capturing information that is complementary to that computed pixel-by-pixel. These patch-based descriptors are named Three-Patch LBP (TPLBP) and Four-Patch LBP (FPLBP).
TPLBP considers a w × w patch centered on a pixel and S additional patches distributed uniformly on a ring of radius r around it, as illustrated in Figure 12. For a given angle α, we get a set of neighboring patches along the circle and compare their values with those of the central patch. More specifically, the TPLBP code is given by:

$$\mathrm{TPLBP}_{r,S,w,\alpha}(p) = \sum_{i=0}^{S-1} f\Big(d(C_i, C_p) - d(C_{i+\alpha \bmod S}, C_p)\Big)\, 2^i,$$

where $C_p$ is the patch centered on the pixel $p$, $C_i$ is the i-th patch along the ring, and

$$f(x) = \begin{cases} 1, & x \geq \tau, \\ 0, & x < \tau. \end{cases}$$

The function d(x, y) is any distance function between two patches under a vector representation. Examples of d(x, y) are the Manhattan [88], Mahalanobis [89], and Minkowski [90] distances. The parameter τ is set slightly larger than zero to provide some stability in uniform regions.
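A sketch of the TPLBP code for one pixel, assuming the patches have already been extracted and using the Euclidean distance as a concrete choice of d(x, y):

```python
import numpy as np

def tplbp_code(center_patch, ring_patches, alpha=2, tau=0.01):
    """TPLBP sketch: compare the distances of pairs of ring patches
    (alpha positions apart) to the central patch; one bit per pair.
    Euclidean distance stands in for the generic d(x, y)."""
    S = len(ring_patches)
    d = [np.linalg.norm(p.ravel() - center_patch.ravel())
         for p in ring_patches]
    return sum((d[i] - d[(i + alpha) % S] > tau) << i for i in range(S))
```

In a perfectly uniform region all patch distances are equal, so no difference exceeds τ and the code is zero, which is exactly the stability that τ provides.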

Four-Patch Local Binary Patterns (FPLBP)
In FPLBP, two rings centered on the pixel are used, instead of the single ring used in TPLBP. As depicted in Figure 13, two rings of radii $r_1$ and $r_2$ (centered on the central pixel) are considered, with S patches of size w × w equally distributed on each ring, positioned α patches away along the circle. We compare the two center-symmetric patches in the inner ring with the two center-symmetric patches in the outer ring. The bit in each coded pixel is set according to which of the two pairs is being compared. Therefore, the FPLBP code is computed as follows:

$$\mathrm{FPLBP}_{r_1,r_2,S,w,\alpha}(p) = \sum_{i=0}^{S/2-1} f\Big(d\big(C1_i, C2_{i+\alpha \bmod S}\big) - d\big(C1_{i+S/2}, C2_{i+S/2+\alpha \bmod S}\big)\Big)\, 2^i,$$

where $C1_i$ and $C2_i$ denote the i-th patches on the inner and outer rings, respectively, and $f$ and $d$ are defined as in the TPLBP.
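Analogously, a sketch of the FPLBP code, again with pre-extracted patches and the Euclidean distance; the pairing of inner and outer patches follows the description above:

```python
import numpy as np

def fplbp_code(inner, outer, alpha=1, tau=0.01):
    """FPLBP sketch: for each of S/2 centre-symmetric pairs, compare the
    inner-to-outer patch distance with that of the symmetric pair;
    one bit per comparison. Euclidean distance stands in for d(x, y)."""
    S = len(inner)
    code = 0
    for i in range(S // 2):
        d1 = np.linalg.norm(inner[i].ravel()
                            - outer[(i + alpha) % S].ravel())
        d2 = np.linalg.norm(inner[i + S // 2].ravel()
                            - outer[(i + S // 2 + alpha) % S].ravel())
        code |= (d1 - d2 > tau) << i
    return code
```

Note that only S/2 bits are produced, so the FPLBP code is shorter than the TPLBP code for the same S.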

Multiscale Local Binary Patterns (MLBP)
The multiscale local binary pattern (MLBP) is an extension of the LBP, designed with the goal of extracting image quality information [91]. A block diagram of the MLBP descriptor is depicted in Figure 14 and it is computed as follows. First, we generate several LBP channels, by varying the parameters R and P and performing a symmetrical sampling. For the smallest possible radius, R = 1, there are two possible P values that produce rotational symmetrical sampling (P = 4 and P = 8). When R = 2, there are three possible P values (P = 4, P = 8, and P = 16). In general, for a given radius R, there is a total of R + 1 distinct LBP channels. Figure 14a depicts the feature extraction for R = 1. The unit radius generates only two distinct symmetrical patterns (P = 4 and P = 8). Each pattern generates a distinct LBP channel (see Figure 4). For a radius R, the LBP maps are generated and combined:

$$L_R = \big\{\mathrm{LBP}^u_{R,P_1}, \mathrm{LBP}^u_{R,P_2}, \ldots, \mathrm{LBP}^u_{R,P_{R+1}}\big\},$$

where $\mathrm{LBP}^u_{R,P}$ is computed according to Equation (6) and $L_R$ contains R + 1 elements. From these LBP channels, the texture features are obtained by computing the histogram of each member of $L_R$:

$$H_{R,P}(l_i) = \sum_{(x,y)} \delta\big(\mathrm{LBP}^u_{R,P}(x, y),\, l_i\big),$$

where

$$\delta(a, b) = \begin{cases} 1, & \text{if } a = b, \\ 0, & \text{otherwise}. \end{cases}$$

In the above equations, (x, y) indicates the position of a given point of $\mathrm{LBP}^u_{R,P}$ and $l_i$ is the i-th LBP label. Notice that we use 'uniform' LBP descriptors (Equation (6)) since their histograms provide a better discrimination of the texture properties.
To obtain the feature vector, we vary the radius and compute all possible symmetric LBP patterns and their histograms, as illustrated in Figure 14b. For a radius R, we generate a vector of histograms by concatenating all individual LBP histograms:

$$h_R = H_{R,P_1} \oplus H_{R,P_2} \oplus \cdots \oplus H_{R,P_{R+1}},$$

where ⊕ denotes the concatenation operator.
The steps for computing the multiscale LBP histogram are summarized in Figure 15. For R = N, the final feature vector is generated by concatenating the histograms of the LBP channels with radius values less than or equal to N:

$$x_N = h_1 \oplus h_2 \oplus \cdots \oplus h_N,$$

where N is the maximum radius value and $x_N$ is the resulting feature vector.
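The pooling step (one histogram per LBP channel, then concatenation) can be sketched as follows; the per-histogram normalization is our own choice:

```python
import numpy as np

def multiscale_histograms(lbp_channels, n_labels):
    """MLBP pooling sketch: one normalised histogram per LBP channel,
    concatenated into a single feature vector (the concatenation '⊕').
    For uniform LBP codes, n_labels = P + 2."""
    feats = []
    for ch in lbp_channels:
        h, _ = np.histogram(ch, bins=n_labels, range=(0, n_labels))
        feats.append(h / max(h.sum(), 1))   # normalise each histogram
    return np.concatenate(feats)

# toy example: two 4x4 channels of uniform LBP labels (P = 8 -> 10 labels)
rng = np.random.default_rng(2)
chans = [rng.integers(0, 10, size=(4, 4)) for _ in range(2)]
x = multiscale_histograms(chans, 10)        # feature vector of length 20
```

Each normalized histogram sums to one, so the length of the final vector grows linearly with the number of channels while each channel contributes equally.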

Multiscale Local Ternary Patterns (MLTP)
In general, the LTP threshold τ is adjusted for the target application. Anthimopoulos et al. [92] related the τ values to the gradient of the image. The choice of τ may affect the discrimination of edge and non-edge pixels, which is an important step in texture analysis. We proposed [93] an optimal set of thresholds to be used in the multilevel edge description operation, which makes it possible to cluster gradient PDFs. The procedure is described as follows. First, the image gradients are fitted with an exponential distribution:

$$\mathrm{PDF}_e(t) = \lambda e^{-\lambda t}, \quad (42)$$

where λ is the rate parameter of the distribution. Then, the average value of the image gradient, $\lambda^{-1}$, is computed. The thresholds are then obtained from the inverse cumulative distribution function of $\mathrm{PDF}_e$:

$$\tau_i = -\frac{\ln(1 - \Delta_i)}{\lambda}, \quad (43)$$

where:

$$\Delta_i = \frac{i}{L + 1}, \quad (44)$$

$i \in \{1, 2, \cdots, L\}$, and L is the number of levels. In other words, the thresholds $\tau_i$ are taken at equally spaced values of $\Delta_i$. The feature extraction process is illustrated in Figure 16. We decompose the image into LTP channels, generated by varying the τ values according to Equations (42)-(44). Since for a single image the LTP descriptor produces two channels, L values of $\tau_i$ produce 2L LTP channels. For example, in Figure 17, we use L = 4, generating eight distinct LTP channels. In the proposed LTP approach, instead of computing the differences between $I_c$ and its neighbors on the grayscale image, we take the maximum difference over the R, G, and B channels. After the aforementioned steps are completed, we obtain a set of LTP channels with 2 × L elements:

$$\mathcal{C} = \big\{C_1^{up}, C_1^{lo}, C_2^{up}, C_2^{lo}, \ldots, C_L^{up}, C_L^{lo}\big\}. \quad (45)$$

In this set, the subscript index corresponds to the i-th τ value, while the superscript index indicates whether the element is an upper (up) or lower (lo) pattern. For each LTP channel $C_i^j$, where $j \in \{up, lo\}$, we compute the corresponding LTP histogram $H_i^j$. These histograms are used to build the feature vector. If we simply concatenate these histograms, we generate a feature vector with $2^P \times 2 \times L$ dimensions. Depending on the L and P parameters, the number of features can be very high, which has a direct impact on the performance of the proposed algorithm.
In order to limit the number of dimensions, the number of bins of the LTP histograms is reduced by requantizing each histogram into n equal-width bins, where ⌊·⌉ is the operation of rounding to the nearest integer, n defines the number of equal-width bins in the given range, and k j i is the reduced number of bins of histogram H j i . After this quantization, we acquire a set of quantized histograms {h j i }. This new set is used to generate the feature vector associated with the image I. More specifically, the feature vector x is generated by concatenating the quantized histograms, x = h up 1 ⊕ h lo 1 ⊕ · · · ⊕ h up L ⊕ h lo L , where ⊕ is the concatenation operator.
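As an illustration, the threshold-selection and bin-reduction steps above can be sketched as follows. The exact rounding rule and the spacing of the ∆ i values are assumptions based on the description in the text, not the exact formulation.

```python
import numpy as np

def ltp_thresholds(gradients, L=4):
    """Fit the image gradient magnitudes with an exponential distribution
    and take L thresholds at equally spaced points of its inverse CDF
    (assumed spacing: Delta_i = i / (L + 1))."""
    lam = 1.0 / np.mean(gradients)          # rate parameter; mean gradient is 1/lambda
    deltas = np.arange(1, L + 1) / (L + 1)  # equally spaced Delta_i in (0, 1)
    return -np.log(1.0 - deltas) / lam      # inverse CDF of Exp(lambda)

def quantize_histogram(h, n=16):
    """Reduce a 2^P-bin LTP histogram to n equal-width bins by summing
    groups of adjacent bins (hypothetical reduction scheme)."""
    edges = np.rint(np.linspace(0, len(h), n + 1)).astype(int)
    return np.array([h[edges[i]:edges[i + 1]].sum() for i in range(n)])
```

With L = 4 this yields four increasing thresholds, and a 256-bin (P = 8) histogram is compressed to n bins while preserving the total count.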

Local Variance Patterns (LVP)
The Local Variance Pattern (LVP) is an extension of the LBP descriptor proposed in this work. This descriptor was developed specifically for quality assessment tasks. The LVP descriptor computes the texture local energy, V R,P (I c ), and estimates the spread of this energy. By measuring the texture energy, the LVP descriptor is able to estimate the effect that specific impairments have on the texture. For example, a Gaussian blurring impairment decreases the local texture energy, while a noise impairment increases it. Figure 18 shows a comparison of the steps used to extract texture information using the LBP and LVP descriptors, assuming that R = 1 and P = 8. The numbers in the yellow squares represent the order in which the steps are computed. The LBP descriptor generates two possible values (see Equation (2)), which are represented by the colors white (S(t) = 1) and black (S(t) = 0). Next, we use Equation (6) to compute the LBP label and Equation (47) to compute the LVP label. After computing the LBP and LVP labels for all pixels of a given image, we obtain two channels for each image. These channels, C LBP and C LVP , correspond to the LBP and LVP patterns, respectively. Examples of these channels are shown in Figure 19. The first row of this figure shows the unimpaired reference image and three impaired images, degraded with different types of distortions. The second and third rows show the C LBP and C LVP channels for each image, respectively. Observing the C LBP and C LVP patterns in Figure 19, we notice that textures are affected differently by the different impairments. Comparing the C LBP channels corresponding to the noisy, blurry, and JPEG2k-compressed images (second row of Figure 19), we can notice that they are very different from each other. The C LBP channels corresponding to the blurry and JPEG2k images are also very different from the C LBP channel corresponding to the reference (unimpaired) image. Nevertheless, the C LBP channels corresponding to the noisy and reference images are visually similar. This similarity makes it difficult to discriminate between unimpaired and impaired images, which affects the quality prediction. In contrast, the C LVP channels clearly show the differences between impaired and reference images, as can be seen in the third row of Figure 19.
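Since Equation (47) is not reproduced here, the following sketch only illustrates the idea behind LVP for R = 1 and P = 8: the label of each pixel measures the spread (variance) of the differences between the pixel and its eight neighbors, so noise raises the response while blur lowers it. The exact labeling rule is an assumption.

```python
import numpy as np

def lvp_map(img):
    """Illustrative Local Variance Pattern for R = 1, P = 8: per-pixel
    variance of the differences between the 8 neighbors and the center.
    Border pixels reuse edge values via padding."""
    img = img.astype(float)
    pad = np.pad(img, 1, mode='edge')
    h, w = img.shape
    diffs = []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            diffs.append(pad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w] - img)
    return np.var(np.stack(diffs, axis=0), axis=0)
```

A perfectly flat (heavily blurred) region yields a zero response, while additive noise produces a strictly positive one, which is the behavior the descriptor exploits.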

Orthogonal Color Planes Patterns (OCPP)
The Orthogonal Color Planes Pattern (OCPP) descriptor extends the LBP to make it more sensitive to color and contrast distortions. Consider a pixel t c = I(x, y, z) of a tri-dimensional (XYZ) color image I. This image can be decomposed into a set of individual XY planes stacked along the Z-axis, a set of YZ planes stacked along the X-axis, or a set of XZ planes stacked along the Y-axis. In this work, we concatenate the LBP descriptors corresponding to the XY, XZ, and YZ planes to build an orthogonal color planes pattern (OCPP) texture descriptor.
As can be noticed from the aforementioned formulation, the LBP descriptors corresponding to the XY, XZ, and YZ planes can be computed independently to generate three LBP maps: LBP XY , LBP XZ , and LBP YZ . However, since the spatial dimensions of the XY, XZ, and YZ planes are generally different, the radii (R X , R Y , and R Z ) and the numbers of sampled points (P XY , P XZ , and P YZ ) corresponding to each of the LBP maps can vary. Figure 20a illustrates how the points along the tri-dimensional HSV color space are sampled, while Figure 20b-d illustrate how each of the XY, XZ, and YZ planes is sampled. Considering R Z = 1 and R X = R Y = R, the coordinates of the neighboring points in the XY, XZ, and YZ orthogonal planes are computed for each plane. We then compute the LBP of each plane using these coordinates, and the OCPP descriptor is built by concatenating these individual LBP descriptors.
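A minimal sketch of the OCPP idea follows, with a hand-rolled R = 1, P = 8 LBP applied to the planes of an H × W × 3 color volume. The parameter choices and the per-plane histogram normalization are illustrative assumptions, not the exact formulation.

```python
import numpy as np

def lbp8(plane):
    """Basic LBP with R = 1 and P = 8 on a 2D array (sketch)."""
    plane = plane.astype(float)
    pad = np.pad(plane, 1, mode='edge')
    h, w = plane.shape
    code = np.zeros((h, w), dtype=int)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = pad[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        code += (neighbor >= plane).astype(int) << bit
    return code

def ocpp_features(img):
    """Concatenate LBP histograms computed over the planes orthogonal
    to the Z, X, and Y axes of an H x W x 3 image volume."""
    feats = []
    for axis in (2, 0, 1):                  # slice the volume along each axis
        planes = np.moveaxis(img, axis, 0)
        hist = np.zeros(256)
        for p in planes:
            hist += np.bincount(lbp8(p).ravel(), minlength=256)
        feats.append(hist / hist.sum())     # normalize per orientation
    return np.concatenate(feats)
```

The three per-orientation histograms are concatenated into a single 768-dimensional feature vector, mirroring the concatenation step described above.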

Salient Local Binary Patterns (SLBP)
The salient local binary pattern (SLBP) is an extension of the LBP designed to be used in image quality assessment methods. The descriptor incorporates visually salient information, given that recent results show that visual attention models improve the performance of visual quality assessment methods [52,58].
To estimate the saliency of the different areas of an image I, we use a computational visual attention model. More specifically, to keep the computational complexity low, we chose the Boolean map-based saliency (BMS) model [94]. When compared with other state-of-the-art visual attention models, BMS is noticeably faster, while still providing a good performance.
After computing the LBP descriptor for all pixels of image I, we obtain an LBP map L, where each L[x, y] gives the local texture associated with the pixel I[x, y]. Similarly, the output of BMS is a saliency map W, where each element W[x, y] corresponds to the probability that the pixel I[x, y] attracts the attention of a human observer. The first, second, and third columns of Figure 21 depict a set of original images I, their corresponding LBP maps L, and their corresponding saliency maps W, respectively. We generate the feature vector by computing the histogram of L weighted by W, i.e., each pixel contributes its saliency value W[x, y] to the bin of its LBP label L[x, y], instead of a unit count. The number of bins of this histogram is equal to the number of distinct LBP patterns of L. We can then remap each L[i, j] to its weighted form, generating the map S displayed in Figure 21d. This figure depicts a heatmap representing the importance of each local texture. We name this weighted LBP map the "Salient Local Binary Patterns" (SLBP).
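The saliency-weighted histogram described above can be sketched in a few lines; the LBP map and the BMS saliency map are assumed to be given.

```python
import numpy as np

def slbp_histogram(lbp_map, saliency, n_labels=256):
    """Histogram of the LBP map in which each pixel contributes its
    saliency value W[x, y] instead of a unit count."""
    return np.bincount(lbp_map.ravel(),
                       weights=saliency.ravel(),
                       minlength=n_labels)
```

With a uniform saliency map this reduces to an ordinary (scaled) LBP histogram, so the descriptor departs from the plain LBP only where the attention model predicts salient regions.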

Multiscale Salient Local Binary Patterns (MSLBP)
The multiscale salient local binary pattern (MSLBP) is an extension of SLBP in combination with MLBP. The idea behind MSLBP is to obtain finer information about the image texture by varying the parameters of the LBP and combining the multiple generated LBP maps with saliency maps. In other words, we vary the SLBP parameters to obtain multiple maps, as illustrated in Figure 22. For each combination of radius (R) and number of sampled points (P), we have an associated histogram H R,P .

No-Reference Image Quality Assessment Using Texture Descriptors
In the previous section, we presented a series of texture descriptors. Most of them were designed for pattern recognition and computer vision applications. We also presented a set of proposed descriptors (MLBP, MLTP, LVP, OCPP, and SLBP), which were specially designed for visual quality assessment. Our goal is to investigate which descriptors are more suitable for no-reference (blind) image quality assessment (NR-IQA) methods. Moreover, we are interested in the relation between the type of descriptor and the prediction accuracy of the IQA method. After generating the labeled database formed by the set of pairs (I k , v k ), the features are extracted in order to generate the IQA model. For each image I k , we compute the histogram of the given LBP variant, H k , and concatenate all histograms to produce the feature vector. Therefore, the training data is composed of the set (H k , v k ). The model is created using (H k , v k ), which is formed by a matrix H ∈ R K×Q and a vector v ∈ R 1×K . In this case, K is the number of training entries (rows of H) and Q is the number of features (columns of H, i.e., the number of bins of H k ).
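The assembly of the training data described above amounts to stacking the per-image histograms; a minimal sketch (the function name is ours):

```python
import numpy as np

def build_training_data(histograms, scores):
    """Stack per-image feature histograms H_k into the K x Q matrix H
    and the subjective scores v_k into the vector v."""
    H = np.vstack([np.asarray(h, dtype=float) for h in histograms])
    v = np.asarray(scores, dtype=float)
    assert H.shape[0] == v.shape[0]  # one score per training image
    return H, v
```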

Training and Testing Stages
The prediction model is built using a regression model, which maps each H k into a real value v k that predicts the corresponding quality score. The chosen regression model is the random forest (RF) regressor [61]. RF was chosen based on the results of Fernandez-Delgado et al. [95], who conducted an exhaustive evaluation of several machine learning methods and concluded that the best results are achieved by the family of RF methods.
The quality assessment task is depicted in Figure 24. After generating the prediction model, the image quality can be estimated using the model trained in the previous stage. The procedure is the same used for the images in the training set. In other words, the same feature (the LBP histogram) is computed using the test image as input and, using this feature, the trained model predicts the quality score.
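The training and testing stages can be sketched with scikit-learn's random forest regressor; the features and scores below are synthetic stand-ins for the LBP histograms and MOS values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
H_train = rng.random((80, 32))              # K x Q matrix of training features
v_train = 5.0 * H_train[:, 0]               # synthetic quality scores
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(H_train, v_train)                 # training stage

H_test = rng.random((20, 32))               # features of the test images
predicted = model.predict(H_test)           # one predicted score per image
```

In the actual framework, H_train and v_train would be the concatenated texture-descriptor histograms and the subjective scores of the training subset.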

Test Setup
Results were generated using an Intel i7-4790 processor at 3.60 GHz. To assess the performance of the proposed NR-IQA methods, we compute the Spearman Rank-Ordered Correlation Coefficient (SROCC) between the mean opinion scores (MOS) and the predicted scores. Although other correlation coefficients (such as KRCC and PCC) could be added to the analysis, we decided to report the results using only the SROCC to prevent this article from becoming too lengthy. The proposed methods are compared with the fastest state-of-the-art NR-IQA methods, including BRISQUE [42], CORNIA [96], CQA [97], SSEQ [98], and LTP [93]. These methods were chosen because they are all based on machine learning techniques, making the comparison with the proposed methods straightforward. Moreover, the codes of these methods are publicly available for download.
For all machine learning NR-IQA methods, we use the same procedure for training and testing. In order to avoid content overlap between the training and testing stages, we divide the benchmark databases into content-independent training and testing subsets. Specifically, image contents used in the training subset are not used in the testing subset, and vice-versa. The division is made in such a way that 80% of the images are used for training and 20% for testing. This split is a common procedure used by several ML-based NR-IQA methods [42,96,97]. For the machine learning NR-IQA methods that are based on SVR, we use the LibSVR implementation accessed via the Python interface provided by the Sklearn library [99]. The optimal SVR meta-parameters (C, γ, ν, etc.) are found using the exhaustive grid-search methods provided by Sklearn's API. No optimized search methods are used for the RF version of the proposed method.
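The evaluation protocol, a content-independent 80/20 split followed by SROCC computation, can be sketched as follows; the per-image content identifiers are assumed to be known.

```python
import numpy as np
from scipy.stats import spearmanr

def content_split(content_ids, train_frac=0.8, seed=0):
    """Split image indices so that all images sharing a source content
    fall in the same subset (content-independent 80/20 split)."""
    ids = np.unique(content_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(ids)
    n_train = int(round(train_frac * len(ids)))
    train_ids = set(ids[:n_train].tolist())
    train_mask = np.array([c in train_ids for c in content_ids])
    return train_mask, ~train_mask

def srocc(mos, predicted):
    """Spearman's rank-ordered correlation between MOS and predictions."""
    return spearmanr(mos, predicted).correlation
```

Splitting by content identifier rather than by image guarantees that distorted versions of the same scene never appear in both subsets.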
The tests were performed using three image quality databases (LIVE2, CSIQ, and TID2013), which include subjective scores collected from psychophysical experiments.

Results for Basic Descriptor with Varying Parameters
In order to test the LBP and its variants, we vary some parameters of each algorithm. Specifically, we vary the parameters of LBP, BSIF, CLBP, and LPQ. For the other tested variants, we choose the parameters R = 1 and P = 8. Table 1 depicts the parameters used by the tested algorithms.
To investigate the suitability of the basic LBP descriptor, we vary the parameters R and P using the Rotation Invariant LBP (LBP ri ), the Uniform LBP (LBP u ), and the Uniform LBP with Rotation Invariance (LBP riu2 ), which are described in Section 2.1. Figure 25 depicts the distribution of SROCC over the simulations for the general case (i.e., when all distortions are considered). Table 2 shows the average SROCC values for 100 simulations following the aforementioned protocol. In this table, STD represents the standard deviation and ∆ is the difference between the maximum and minimum values in a given row or column. From Table 2, we can notice that the basic LBP descriptor is suitable for predicting quality. This suitability is indicated by the high correlation indices obtained on the LIVE2 database. On this database, the average SROCC varies from 0.8034 to 0.9532 in the general case, from 0.6459 to 0.9054 for the FF distortion, from 0.8771 to 0.9666 for the GB distortion, from 0.9285 to 0.9794 for the WN distortion, from 0.7812 to 0.9423 for the JPEG2k distortion, and from 0.7716 to 0.9306 for the JPEG distortion. These values suggest that the basic LBP variations are well suited to modeling the quality of images under WN and GB distortions. Regarding the LBP parameters, the prediction performances for WN and GB are less affected by these parameters when compared with the other distortions (see the variance and ∆ values).
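A parameter sweep of this kind can be reproduced with scikit-image, whose local_binary_pattern function implements the rotation-invariant uniform variant (method='uniform' corresponds to LBP riu2 ). The (R, P) pairs below are the usual multiresolution choices and the input image is synthetic.

```python
import numpy as np
from skimage.feature import local_binary_pattern

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64))

features = {}
for R, P in [(1, 8), (2, 16), (3, 24)]:
    codes = local_binary_pattern(img, P, R, method='uniform')
    # LBP^riu2 produces P + 2 distinct labels: codes 0..P for the uniform
    # patterns, plus one bin collecting all non-uniform patterns.
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2))
    features[(R, P)] = hist / hist.sum()
```

Each normalized histogram is the feature vector for one (R, P) configuration, which is what Table 2 compares across parameter settings.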
Although the basic LBP works well for the WN and GB distortions independently of its parameters, the performance for the other distortions varies according to the parameters. This variation is also observed in the CSIQ and TID2013 databases. For example, on the CSIQ database, the SROCC values vary from 0.8073 to 0.8912 in the best case (JPEG) and from 0.2093 to 0.5901 in the worst case (CD). These values indicate that the prediction performance is affected by the basic LBP parameters. In fact, this is the premise used by Freitas et al. [91], who assume that different LBP parameters can be combined to achieve a better performance. In their work, an aggregation of features obtained with different LBP parameters results in a more robust quality assessment model.

Results for Variants of Basic Descriptors
Once it has been demonstrated that the basic LBP variants are suitable descriptors for image quality, we examine the performance of the other LBP variants described in Section 2. To perform the tests, we vary the parameters of the BSIF, LPQ, and CLBP descriptors. For the remaining extensions (i.e., LCP, LTP, RLBP, TPLBP, FPLBP, LVP, OCLBP, OCPP, SLBP, MLBP, MLTP, and MSLBP), we do not vary the parameters. Figure 26 depicts the distribution of SROCC for the general case using the tested LBP variants (100 simulations).
To investigate the suitability of the basic BSIF descriptor, we performed simulations by changing the patch size and the number of selected binarized features (see Section 2.4). The results of the simulations performed on the LIVE2, CSIQ, and TID2013 databases are depicted in Table 3. Based on the results in Table 3, we notice that BSIF is a valuable descriptor for IQA. On the LIVE2 database, BSIF performs well for almost all configurations. However, the results are better for smaller patch sizes. In these cases, the average SROCC values are higher and have a low variance. As shown in Table 3, the performance of BSIF decreases for the CSIQ database. When compared with the LIVE2 database, the average SROCC values are lower and the variance is higher. The values in Table 3 indicate that there is a relationship between the patch size and the number of bits. More specifically, the larger the patch size, the higher the number of bits required to obtain a good quality prediction. For example, on both the LIVE2 and CSIQ databases, using a 3 × 3 patch, the best performance is obtained using 8 bits and the worst performance is obtained when only 5 bits are used.
Table 4 shows the results of simulations using seven different LPQ configurations, corresponding to different LPQ parameters. The main parameters of the LPQ descriptor are the size of the local window and the method used for local frequency estimation. The size of the local window was fixed at 3 × 3 and the tests were performed by varying the method used for local frequency estimation. Table 4 shows that the performance of LPQ is high for the LIVE2 database, with mean SROCC values above 0.9 for all distortions, independently of the configuration. The low variance and the high average of the SROCC values for LIVE2 indicate that LPQ is a suitable descriptor for measuring the quality of JPEG, JPEG2k, WN, GB, and FF distortions. However, the prediction performance decreases for the CSIQ and TID2013 databases. This is probably due to the presence of contrast and color distortions in the CSIQ and TID2013 databases. Table 5 shows the average SROCC of simulations using CLBP as the texture descriptor. For this descriptor, we tested the influence of each combination of feature sets (see CLBP S , CLBP M , and CLBP C in Figure 9) on the image quality prediction. From Table 5, we can notice that the feature sets CLBP M and CLBP C are individually unsatisfactory for measuring image quality, as shown by the low SROCC scores obtained for the three tested databases. On the other hand, CLBP S is the dominant feature set for quality description, since it presents the highest SROCC values in almost all cases.
Interestingly, the combination of CLBP feature sets produces a better performance, as indicated by the results of CLBP SM (CLBP S + CLBP M ) and CLBP SMC (CLBP S + CLBP M + CLBP C ). From Table 5, we can observe that the mean SROCC value of the overall case increases from 0.91 (CLBP S ) to 0.93 (CLBP MC and CLBP SMC ) for the LIVE2 database. The combination of feature sets also improves the average SROCC values on the TID2013 database, increasing from 0.35 (CLBP S ) to 0.44 (CLBP MC and CLBP SMC ). The average values on the CSIQ database show that the best performance is obtained using CLBP MC . Based on these SROCC values, we can conclude that CLBP MC is the best combination of features for assessing image quality, since the incorporation of CLBP C does not improve, and may even deteriorate, the general prediction performance. The exceptions are TPLBP and FPLBP, which presented a mean SROCC below 0.65, poorer than the other methods. Based on the average values of the mean SROCC on LIVE2, the methods LTP, RLBP, LCP, LVP, MLTP, SLBP, OCLBP, MLBP, MSLBP, and OCPP are in ascending order of performance. For the CSIQ and TID2013 databases, the methods perform similarly, but RLBP performs worse than LTP on CSIQ.
It is noticeable that the multiscale approaches (MLBP, MLTP, and MSLBP) present the best results. For the three tested databases, the results are in agreement with the assumptions made by Freitas et al. [91], who demonstrated that combining multiple LBP descriptor parameters increases the prediction performance. However, we can observe that the OCPP descriptor presents the best performance when compared with any other tested descriptor, even when compared with the multiscale approaches. Although the performance of the OCPP descriptor is similar to that of the MSLBP descriptor for the LIVE2 database, MSLBP does not maintain this good performance on the other databases. While MSLBP presents an average SROCC value of 0.8147 for the CSIQ database, OCPP presents an average SROCC value of 0.9140 for the same database. Similarly, for the TID2013 database, the average SROCC values obtained with MSLBP and OCPP are 0.5919 and 0.7035, respectively. When we observe the results obtained per distortion for the CSIQ database, we can notice that the superiority of OCPP is due to the good performance obtained for the contrast distortions. While the quality prediction of contrast-distorted images has a mean SROCC value of 0.5299 when using MSLBP, the mean SROCC value for these same images is 0.7753 when using OCPP. Similarly, for the TID2013 database, OCPP presents a superior performance for several types of distortions, especially the color- and contrast-related distortions (AGC, AGN, CA, CC, CCS, etc.). Table 7 depicts the results of six IQA methods, including two established full-reference metrics (PSNR and SSIM) and four state-of-the-art no-reference metrics (BRISQUE, CORNIA, CQA, and SSEQ). From this table, we can notice that CORNIA and SSEQ present the best performance on LIVE2. By comparing Table 7 with Tables 4-6, we can notice that LBP-based NR-IQA approaches also present a better performance for the CSIQ and TID2013 databases. For the CSIQ database, we can observe that, on
average, the best state-of-the-art NR-IQA method is BRISQUE, followed by SSEQ and CORNIA. The average SROCC scores are 0.7406, 0.6979, and 0.6886 for BRISQUE, SSEQ, and CORNIA, respectively. However, the LPQ, BSIF, LVP, OCLBP, OCPP, SLBP, MLBP, MLTP, and MSLBP descriptors present better results for the CSIQ database. Similarly, for the TID2013 database, the best state-of-the-art method is CORNIA, which presents an average SROCC of 0.5361. This value is outperformed by several LBP-based descriptors, such as LVP (0.5428), OCLBP (0.5902), OCPP (0.7035), MLBP (0.5284), MLTP (0.5652), MSLBP (0.5919), and LPQ (0.5518).

Prediction Performance on Cross-Database Validation
To investigate the generalization capability of the studied methods, we performed a cross-database validation. This validation consists of training the ML algorithm using all images of one database and testing it on the other databases. Table 8 depicts the SROCC values obtained using LIVE2 as the training database and TID2013 and CSIQ as the testing databases. To perform a straightforward cross-database comparison, only the shared subset of distortions is selected from each database. Based on the results in Table 8, we can notice that OCPP outperforms the other methods for almost all types of distortions. For TID2013, OCPP outperforms the other methods for 3 out of the 5 distortions, while for CSIQ it outperforms the other methods for 4 out of the 5 distortions. OCPP is followed by MSLBP, which achieves the best results in the cases where OCPP is not the best. The cross-database validation test indicates that, in general, texture descriptors have a better generalization capacity when compared to the tested state-of-the-art methods.

Simulation Statistics
In order to investigate the stability of the mean over the simulations, we generated the box plots depicted in Figure 28. We chose BSIF, LCP, CLBP, and LPQ because these descriptors were among the best performing in the previous section. Based on this figure, we can notice that the mean changes over the simulations. More specifically, the inter-quartile ranges increase over the simulations for BSIF and LPQ on the LIVE2 database. On the other hand, this behavior is not observed for the LCP and CLBP descriptors. The patterns on CSIQ and TID2013 are more similar. Further studies concerning the number of simulations that generates a stable distribution are suggested as future work.

Conclusions
In this paper, we compared three basic LBP variants (LBP ri , LBP u , and LBP riu2 ) with eight different parameter combinations each. This comparison was performed to verify whether the LBP can be used as a feature descriptor in image quality assessment applications. Preliminary results show that, although the LBP can be used in image quality assessment, its performance varies greatly with the distortion type and the descriptor parameters. Based on these results, we investigated 14 other texture descriptors, which are variants of the basic LBP. When tested using the proposed framework, BSIF, LPQ, LVP, and CLBP present good mean correlation values for the LIVE2 database, but their performances decrease for the CSIQ and TID2013 databases due to color and contrast distortions. Results show that multiscale approaches have a substantially better quality prediction performance. Among the tested multiscale approaches, the MSLBP descriptor, which incorporates visual saliency, has the best performance. While MSLBP has a performance similar to that of the OCPP descriptor for the LIVE2 database, OCPP presents the best performance for the remaining databases.

Figure 1. Object detection using YOLO [18] on the distorted (left) and pristine (right) images, taken from the GoPro dataset [19]. The detection effectiveness of YOLO is remarkably impaired by the quality of the input image.

Figure 2. Circularly symmetric P neighbors extracted at a distance R.

Figure 4. Reference image and its corresponding Local Binary Pattern (LBP) channels, computed using three different radius (R) values.

Figure 5. Illustration of the basic Local Ternary Pattern descriptor.

Figure 7. Rotation effect on the LBP and RLBP descriptors: (a) original image and its rotated version; (b) illustration of the neighbors' rotation for the same pixel '63'; (c) thresholded neighbors, with values above the threshold shown in red; (d) the weights corresponding to the thresholded neighbors; (e) LBP values; (f) thresholded neighbors for RLBP, with the reference denoted in yellow; (g) the weights of the thresholded neighbors; (h) the RLBP values for the original and rotated images are the same [79].

Figure 8. Effect of rotation on LBP and RLBP information.

Figure 10. Sampling scheme for the OCLBP RG and OCLBP RB descriptors.

Figure 11. Original images and their output channels, computed using the OCLBP descriptor.

Figure 16. Illustration of the process of extracting the feature vector x with L = 2.

Figure 19. Reference image, its impaired versions, and their respective LBP and LVP maps (C LBP and C LVP ).

Figure 23 depicts the training stage of the set of IQA methods proposed in this work. First, we collect the subjective scores corresponding to each image of a training set. This procedure generates a set of labeled images, where each training set entry is composed of an image and its associated MOS (mean opinion score). In other words, to the k-th unlabeled image I k the algorithm associates a real value v k , which corresponds to the overall quality of I k .

Figure 27 depicts the SROCC box plots for the different no-reference IQA methods.

Table 3. Average SROCC of 1000 runs of simulations on the tested databases using BSIF variations.

Table 4. Average SROCC of 1000 runs of simulations on the tested databases using LPQ variations.

Table 5. Average SROCC of 1000 runs of simulations on the tested databases using CLBP variations.

Table 6 depicts the mean SROCC values of simulations using other LBP variants. From this table, we can notice that almost all variants present an acceptable performance for the LIVE2 database.

Table 6. Average SROCC of 100 runs of simulations on the tested image databases using other LBP variations.

Table 8. SROCC cross-database validation, when models are trained on LIVE2 and tested on CSIQ and TID2013.