Visual Perceptual Quality Assessment Based on Blind Machine Learning Techniques

Takam Tchendjou, Ghislain; Simeu, Emmanuel

doi:10.3390/s22010175

Open AccessArticle

Visual Perceptual Quality Assessment Based on Blind Machine Learning Techniques

by

Ghislain Takam Tchendjou

^*

and

Emmanuel Simeu

^*

Univ. Grenoble Alpes, CNRS, Grenoble INP (Institute of Engineering Univ. Grenoble Alpes), TIMA, 38000 Grenoble, France

^*

Authors to whom correspondence should be addressed.

Sensors 2022, 22(1), 175; https://doi.org/10.3390/s22010175

Submission received: 25 November 2021 / Revised: 14 December 2021 / Accepted: 18 December 2021 / Published: 28 December 2021

(This article belongs to the Special Issue Image Sensing and Processing with Convolutional Neural Networks)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

This paper presents the construction of a new objective method for estimation of visual perceiving quality. The proposal provides an assessment of image quality without the need for a reference image or a specific distortion assumption. Two main processes have been used to build our models: The first one uses deep learning with a convolutional neural network process, without any preprocessing. The second objective visual quality is computed by pooling several image features extracted from different concepts: the natural scene statistic in the spatial domain, the gradient magnitude, the Laplacian of Gaussian, as well as the spectral and spatial entropies. The features extracted from the image file are used as the input of machine learning techniques to build the models that are used to estimate the visual quality level of any image. For the machine learning training phase, two main processes are proposed: The first proposed process consists of a direct learning using all the selected features in only one training phase, named direct learning blind visual quality assessment

D L B Q A

. The second process is an indirect learning and consists of two training phases, named indirect learning blind visual quality assessment

I L B Q A

. This second process includes an additional phase of construction of intermediary metrics used for the construction of the prediction model. The produced models are evaluated on many benchmarks image databases as

T I D 2013

,

L I V E

, and

L I V E

in the wild image quality challenge. The experimental results demonstrate that the proposed models produce the best visual perception quality prediction, compared to the state-of-the-art models. The proposed models have been implemented on an

F P G A

platform to demonstrate the feasibility of integrating the proposed solution on an image sensor.

Keywords:

blind image quality assessment; regression technique; non-distortion-specific; visual perception; convolutional neural network; deep learning; ILBQA; FPGA implementation

1. Introduction

Digital images are increasingly used in several vision application domains of everyday life, such as medical imaging [1,2], object recognition in images [3], autonomous vehicles [4], Internet of Things (

I o T

) [5], computer-aided diagnosis [6], and

3 D

mapping [7]. In all these applications, the produced images are subject to a wide variety of distortions during acquisition, compression, transmission, storage, and displaying. These distortions lead to a degradation of visual quality [8]. The increasing demand for images in a wide variety of applications involves perpetual improvement of the quality of the used images. As each domain has different thresholds in terms of visual perception needed and fault tolerance, so, equally, does the importance of visual perception quality assessment.

As human beings are the final users and interpreters of image processing, subjective methods based on the human ranking score are the best processes to evaluate image quality. The ranking consists of asking several people to watch images and rate their quality. In practice, subjective methods are generally too expensive, time-consuming, and not usable in real-time applications [8,9]. Thus, many research works have focused on objective image quality assessment

I Q A

methods aiming to develop quantitative measures that automatically predict image quality [2,8,10]. The objective

I Q A

process is illustrated in Figure 1. This process has been introduced in the paper [11].

In digital image processing, objective

I Q A

can be used for several roles, such as dynamic monitoring and adjusting the image quality, benchmark and optimize image processing algorithms, and parameter setting of image processing [8,12]. Many research investigations explore the use of machine learning (

M L

) algorithms in order to build objective

I Q A

models in agreement with human visual perception. Recent methods include, artificial neural network (

A N N

), support vector machine (

S V M

), nonlinear regression (

N L R

), decision tree (

D T

), clustering, and fuzzy logic (

F L

) [13]. After the propulsion of deep learning techniques in 2012 [14], researchers were also interested in the possibility of using these techniques in the image quality assessment. Thus, in 2014, the first studies emerged on the use of convolutional neural networks in

I Q A

[15]. Many other works have followed, and we find in the literature increasingly efficient

I Q A

models [16,17,18].

The

I Q A

evaluation methods can be classified into three categories according to whether or not they require a reference image: full-reference (

F R

), reduced-reference (

R R

), and no-reference (

N R

)

I Q A

approaches. Full reference image quality assessment (

F R

-

I Q A

) needs a complete reference image in order to be computed. Among the most popular

F R

-

I Q A

, we can cite the peak signal to noise ratio (

P S N R

), structure similarity index metric (

S S I M

) [8,19], and visual information fidelity (

V I F

) [20]. In reduced reference image quality assessment (

R R

-

I Q A

), the reference image is only partially available, in the form of a set of extracted features, which help to evaluate the distorted image quality; this is the case of reduced reference entropic differencing (

R R E D

) [21]. In many real-life applications, the reference image is unfortunately not available. Therefore, for this application, the need of no-reference image quality assessment (

N R

-

I Q A

) methods or blind

I Q A

(

B I Q A

), which automatically predict the perceived quality of distorted images, without any knowledge of reference image. Some

N R

-

I Q A

methods assume the type of distortions are previously known [22,23], these objective assessment techniques are called distortion specific (

D S

)

N R

-

I Q A

. They can be used to assess the quality of images distorted by some particular distortion types. As example, the algorithm in [23] is for

J P E G

compressed images, while in [22] it is for

J P E G 2000

compressed images, and in [24] it is for detection of blur distortion. However, in most practical applications, information about the type of distortion is not available. Therefore, it is more relevant to design non-distortion specific (

N D S

)

N R

-

I Q A

methods that examine image without prior knowledge of specific distortions [25]. Many existing metrics are the base units used in

N D S

methods, such as

D I I V I N E

[26],

I L

-

N I Q E

[27],

B R I S Q U E

[28],

G M L O G Q A

[29],

S S E Q

[30],

F R I Q U E E

[31], DIQA [32], and DIQa-NR [33].

The proposal of this paper is a non-distortion-specific

N R

-

I Q A

approach, where the extracted features are based on a combination of the natural scene statistic in the spatial domain [28], the gradient magnitude [29], the Laplacian of Gaussian [29], as well as the spatial and spectral entropies [30]. These features are trained using machine learning methods to construct the models used to predict the perceived image quality. The process we propose for designing the evaluation of perceived no-reference image quality models is described in Figure 2. The process consists of

extracting the features from images taken in the $I Q A$ databases,
removing the superfluous features according to the correlation between the extracted features,
grouping the linearly independent features to construct some intermediate metrics, and
using the produced metrics to construct the estimator model for perceived image quality assessment.

Finally, we compare the designed models with the state-of-the-art models using the features extraction, but also the process using deep learning with convolutional neural network; as shown in Section 4.1.

To evaluate the performances of the produced models, we measure the correlation between objective and subjective quality scores using three correlation coefficients:

(1): Pearson’s linear correlation coefficient ( $P L C C$ ), which is used to measure the degree of the relationship between linear related variables.
(2): Spearman’s rank order correlation coefficient ( $S R C C$ ), which is used to measure the prediction monotony and the degree of association between two variables.
(3): Brownian distance ( $d C o r$ ), which is a measure of statistical dependence between two random variables or two random vectors of arbitrary, not necessarily equal dimension.

The paper is organized as follows. Section 2 presents the feature extraction methods. Section 3 explains the feature selection technique based on feature independence analysis. The construction of intermediate metrics is also presented in this section. Section 4 explains the experimental results and their comparison. Section 5 presents the

F P G A

implementation architectures and results. Finally, Section 6 draws a conclusion and perspectives for future investigations.

2. Feature Extraction

Feature extraction in this paper is based on four principal axes: natural scene statistic in the spatial domain, gradient magnitude, Laplacian of Gaussian, and finally spatial and spectral entropies.

2.1. Natural Scene Statistic in the Spatial Domain

The extraction of the features based on

N S S

in spatial domain starts by normalization of the image represented by

I (i, j)

, to remove local mean displacements from zero log-contrast, and to normalize the local variance of the log contrast as observe in [34]. Equation (1) presents normalization of the initial image.

\hat{I} (i, j) = \frac{I (i, j) - μ (i, j)}{λ (i, j) + 1}

(1)

where i and j are the spatial indices, M and N are the image dimensions,

i \in 1, 2, \dots, M

and

j \in 1, 2, \dots, N

.

μ (i, j)

that denotes the local mean is represented by (2) and

λ (i, j)

that estimates the local contract is expressed by (3).

μ (i, j) = \sum_{k} \sum_{l} ω_{k, l} I (i + k, j + l)

(2)

λ (i, j) = \sqrt{\sum_{k} \sum_{l} ω_{k, l} {[I (i + k, j + l) - μ (i, j)]}^{2}}

(3)

where

k = - K, \dots, K

, and

l = - L, \dots, L

.

ω_{k, l}

is a 2D circularly-symmetric Gaussian weighting function sampled out to 3 standard deviations (

K = L = 3

) and rescaled to unit volume [28].

The model produced in (1) is used as the mean-subtracted contrast normalized (

M S C N

) coefficients. In [28], they take the hypothesis that the

M S C N

coefficients have characteristic statistical properties that are changed by the presence of distortion. Quantifying these changes helps predict the type of distortion affecting an image as well as its perceptual quality. They also found that a Generalized Gaussian Distribution (

G G D

) could be used to effectively capture a broader spectrum of a distorted image, where the

G G D

with zero mean is given by (4).

f (x; α, σ^{2}) = \frac{α}{2 β Γ (1 / α)} exp (- {(x / β)}^{α})

(4)

where

β

is represented by (5) and

Γ (.)

is expressed by (6).

β = σ \sqrt{\frac{Γ (1 / α)}{Γ (3 / α)}}

(5)

Γ (a) = \int_{0}^{\infty} t^{a - 1} e^{- t} d t a > 0 .

(6)

The parameter

α

controls the shape of the distribution while

σ^{2}

controls the variance.

In [28], they also give the statistical relationships between neighboring pixels along four orientations:

H = \hat{I} (i, j) \hat{I} (i, j + 1); V = \hat{I} (i, j) \hat{I} (i + 1, j); D 1 = \hat{I} (i, j) \hat{I} (i + 1, j + 1)

and

D 2 = \hat{I} (i, j) \hat{I} (i + 1, j - 1)

. This is used with asymmetric density function to produce a practical alternative to adopt a general asymmetric generalized Gaussian distribution (

A G G D

) model [35]. Equation (7) gives the AGGD with zero mode.

f (x; γ, σ_{l}^{2}, σ_{r}^{2}) = \{\begin{matrix} \frac{γ}{(β_{l} + β_{r}) Γ (1 / γ)} exp (- {(\frac{- x}{β_{l}})}^{γ}) \forall x \leq 0 \\ \frac{γ}{(β_{l} + β_{r}) Γ (1 / γ)} exp (- {(\frac{x}{β_{r}})}^{γ}) \forall x \geq 0 \end{matrix}

(7)

where

β_{a}

(with

a = l o r r

) is given by (8).

β_{a} = σ_{a} \sqrt{\frac{Γ (1 / γ)}{Γ (3 / γ)}}

(8)

The parameter

γ

controls the shape of the distribution, while

σ_{l}^{2}

and

σ_{r}^{2}

are scale parameters that control the spread on each side of the mode. The fourth asymmetric parameter is

η

given by (9)

η = (β_{r} - β_{l}) \frac{Γ (2 / γ)}{Γ (1 / γ)}

(9)

Finally, the founded parameters are composed of the symmetric parameters (

α

and

σ^{2}

) and the asymmetric parameters (

η, γ, σ_{r}^{2}

, and

σ_{l}^{2}

), where the asymmetric parameters are computed for the four orientations, as shown in Table 1. All the founded parameters are also computed for two scales, yielding 36 features (2 scales × [2 symmetric parameters

+ 4

asymmetric parameters

\times 4

orientations]). More details about the estimation of these parameters are given in [28,36].

2.2. Gradient Magnitude and Laplacian of Gaussian

The second feature extraction method is based on the joint statistics of the Gradient Magnitude (

G M

) and the Laplacian of Gaussian (

L o G

) contrast. These two elements

G M

and

L o G

are usually used to get the semantic structure of an image. In [29], they also introduce another usage of these elements as features to predict local image quality.

By taking an image I(i,j), its GM is represented by (10).

G_{I} = \sqrt{{[I \otimes h_{x}]}^{2} + {[I \otimes h_{y}]}^{2}}

(10)

where ⊗ is the linear convolution operator, and

h_{d}

is the Gaussian partial derivative filter applied along the direction

d \in x, y

, represented by (11).

h_{d} (x, y σ) = - \frac{1}{2 π σ^{2}} \frac{d}{σ^{2}} exp (- \frac{x^{2} + y^{2}}{2 σ^{2}})

(11)

Moreover, the LoG of this image is represented by (12).

L_{I} = I \otimes h_{L o G}

(12)

where

h_{L o G} (x, y σ) = - \frac{1}{2 π σ^{2}} \frac{x^{2} + y^{2} - 2 σ^{2}}{σ^{4}} exp (- \frac{x^{2} + y^{2}}{2 σ^{2}})

(13)

To produce the used features, the first step is to normalize the

G M

and

L o G

features map as in (14).

\begin{matrix} {\bar{G}}_{I} = G_{I} / (N_{I} + ϵ) \\ {\bar{L}}_{I} = L_{I} / (N_{I} + ϵ) \end{matrix}

(14)

where

ϵ

is a small positive constant, used to avoid instabilities when

N_{I}

is small, and

N_{I}

is given by (15).

N_{I} (i, j) = \sqrt{\sum_{l} \sum_{k} ω (l, k) F_{I}^{2} (l, k)}

(15)

where

(l, k) \in Ω_{i, j}

; and

F_{I}

is given by (16).

F_{I} (i, j) = \sqrt{G_{I}^{2} (i, j) + L_{I}^{2} (i, j)}

(16)

Then (17) and (18) give the final statistic features.

\{\begin{matrix} P_{G} (G = g_{m}) = \sum_{n} K_{m, n} \\ P_{L} (L = l_{n}) = \sum_{m} K_{m, n} \end{matrix}

(17)

\{\begin{matrix} Q_{G} (G = g_{m}) = \frac{1}{N} \sum_{n} P (G = g_{m} L = l_{n}) \\ Q_{L} (L = l_{n}) = \frac{1}{M} \sum_{m} P (L = l_{n} G = g_{m}) \end{matrix}

(18)

where

n = 1, \dots, N

and

m = 1, \dots, M

, and

K_{m, n}

is the empirical probability function of G and L [37,38]; it can be given by (19).

K_{m, n} = P (G = g_{m}, L = l_{n})

(19)

In [29], the authors also found that the best results are obtained by setting

M = N = 10

; thus, 40 statistical features have been produced as shown in Table 2, 10 dimensions for each statistical features vector

P_{G}, P_{L}, Q_{G}

and

Q_{L}

.

2.3. Spatial and Spectral Entropies

Spatial entropy is a function of the probability distribution of the local pixel values, while spectral entropy is a function of the probability distribution of the local discrete cosine transform (

D C T

) coefficient values. The process of extracting the spatial and spectral entropies (SSE) features from images in [30] consists of three steps:

The first step is to decompose the image into 3 scales, using bi-cubic interpolation: low, middle, and high.
The second step is to partition each scale of the image into $8 \times 8$ blocks and compute the spatial and spectral entropies within each block. The spatial entropy is given by (20).

$E_{s} = - \sum_{x} p (x) l o g_{2} p (x)$

(20)

and spectral entropy is given by (21).

$E_{f} = - \sum_{i} \sum_{j} P (i, j) l o g_{2} P (i, j) .$

(21)

where $p (x)$ is the probability of x, and $P (i, j)$ is the spectral probability gives by (22).

$P (i, j) = \frac{C {(i, j)}^{2}}{\sum_{i} \sum_{j} C {(i, j)}^{2}} .$

(22)

where $i = 1, \dots, 8; j = 1, \dots, 8$ .
In the third step, evaluate the means and skew of blocks entropy within each scales.

At the end of the three steps, 12 features are extracted from the images as seen in Table 3. These features represent the mean and skew for spectral and spatial entropies, on 3 scales (

2 \times 2 \times 3 = 12

features).

2.4. Convolutional Neural Network for NR-IQA

In this paper, we explore the possibility of use the deep learning with convolutional neural network to build the model used to evaluate the quality of the image. In this process, the extraction of features are done by the convolution matrix, constructed using the training process with deep learning.

In CNNs, three main characteristics of convolutional layers can distinguish them from fully connected linear layers in the vision field. In the convolutional layer,

each neuron receives an image as inputs and produces an image as its output (instead of a scalar);
each synapse learns a small array of weights, which is the size of the convolutional window; and
each pixel in the output image is created by the sum of the convolutions between all synapse weights and the corresponding images.

The convolutional layer takes as input and image of dimension

R_{l} x C_{l}

with

S_{l}

channels, and the output value of the pixel

p_{l, n, r, c}

(the pixel of the row r and the column c, of the neuron n in the layer l) is computed by (23).

p_{l, n, r, c} = f_{a c t} (\sum_{s = 0}^{S_{l}} \sum_{i = 0}^{I_{l}} \sum_{j = 0}^{J_{l}} w_{l, n, s, i, j} p_{l - 1, n, r + i, c + j})

(23)

where

I_{l} \times J_{l}

are the convolution kernel dimensions of the layer l,

w_{l, n, s, j, k}

is the weight of the row i and the column j in the convolution matrix of the synapse s, connected to the input of the neuron n, in the layer l. In reality, a convolution is simply an element-wise multiplication of two matrices followed by a sum. Therefore, the 2D convolution take two matrices (which both have the same dimensions), multiply them, element-by-element, and sum the elements together. To close the convolution process in a convolutional layer, the results are then passed through an activation function.

In convolution process, a limitation of the output of the feature map is that they record the precise position of features in the input in the convolutional layers. This means that small movements in the position of the feature in the input image will result in a different feature map. This can happen with cropping, rotating, shifting, and other minor changes to the input image [15]. In CNNs, the common approach to solving this problem is the down-sampling using the pooling layers [39]. The down-sampling is a reduction in the resolution of an input signal while preserving important structural elements, without the fine details that are not very useful for the task. The pooling layers are used to reduce the dimensions of the feature maps. Thus, it reduces the number of parameters to learn and the amount of computation performed in the network. The pooling layer summarizes the features present in a region of the feature map generated by a convolution layer. Therefore, further operations are performed on summarized features instead of precisely positioned features generated by the convolution layer. This makes the model more robust to variations in the position of the features in the input image.

In CNNs, the most used pooling function is Max Pooling, which calculates the maximum value for each patch on the feature map. But other pooling function exist, like Average Pooling, which calculates the average value for each patch on the feature map.

Our final CNN model, called

C N N

-

B Q A

has 10 layers: four convolutional layers with 128, 256, 128, and 64 channels, respectively; four max pooling layers; and two fully connected layers with 1024 and 512 neurons, respectively. Finally, a output layer with one neuron is computed to give the final score of the image.

3. Construction of Prediction Models

In the feature extraction phase, 88 features were extracted in the images. These features are used to construct the prediction models used to evaluate the quality of images. In our case, two processes have been proposed to evaluate the image quality:

The Direct Learning Blind visual Quality Assessment method ( $D L B Q A$ ), in which machine learning methods are directly applied overall set of selected features for training, and producing the final models.
Indirect Learning Blind visual Quality Assessment method ( $I L B Q A$ ), in which ML methods are applied to the four produced intermediary metrics to construct the final model. The $I L B Q A$ process requires two training phases: the first is used to merge the independent features of each class in order to generate the adequate intermediary metrics, while the task of the second training is to derive the final $I Q A$ model. The inputs of the produced model are the four produced intermediary metrics, and the output is the estimated quality score of the input image.

3.1. Remove Superfluous Features

Having extracted the 88 features on the images, the next step consists of removing the superfluous features, which not add any useful information for the construction of final

I Q A

models. The main type of superfluous features consists of strongly correlated features.

The strongly correlated features give approximately the same information, and this redundancy may cause overfitting problem in the estimation process of the model parameters, and can also produce uselessly complex final models. The proposed way in this paper in order to avoid these problems is to remove the strongly correlated features. The suppression of features is based on the following idea: “if there are two strongly correlated features in the set of extracted features, with a mutual correlation greater than a predefined threshold, we remove the one that has the least correlation with the subjective score without significant loss of information”. This reduction process is clearly described in Figure 3, which takes as input the set of extracted features, and produces the set of superfluous features.

The features contained in the output vector of this algorithm are removed from the list of those to be used in the rest of the work. In our use case, we took

0.9

(

t h r e s = 90 %

) as the threshold of mutual correlation between two features. This implies that for any mutual correlation between 2 features higher than

0.9

, one of the two features concerned must be removed. With this threshold, the output vector contains 42 features to remove from the initial extracted features list. At the end of this step, 42 features have been found as superfluous features; yielding 46 features are saved for study.

3.2. Direct Process ( $D L B Q A$ )

In the direct process, the whole set of 46 selected features is used by only one training phase to produce the final quality assessment model. The produced models have been called Direct Learning Blind visual Quality Assessment index based on Machine Learning (

D L B Q A

-

M L

). Different machine learning methods have been used to construct assessment models. These include methods based on artificial neural networks (

D L B Q A

-

A N N

), polynomial Nonlinear Regression (

D L B Q A

-

N L R

), Support Vector Machine (

D L B Q A

-

S V M

), Decision Tree (

D L B Q A

-

D T

), and Fuzzy Logic (

D L B Q A

-

F L

). This different machine learning methods have been presented in [11,40]

Table 4 resumes the results using different correlation scores on the test data by the

D L B Q A

-

M L

models produced using different machine learning methods, on the TID2013 [41,42], LIVE [8,43,44], and LIVE challenge [45,46] image databases.

These results show that the

D L B Q A

based on support vector machine (

S V M

) and decision tree (

D T

) methods provide the best models for predicting image quality from the different image quality databases.

3.3. Indirect Process ( $I L B Q A$ )

As the number of selected features is too big (46), the indirect process devises the training phase into three steps:

The first step, as mentioned in the previous section, is to distribute the features into independent classes, depending on the axes of the extracted features.
The next step uses machine learning methods to merge the features of each class in order to generate an appropriate intermediary metrics. Then, the pool process has been carried out using different ML methods to create intermediary metrics $I M_N S S$ , $I M_G M$ , $I M_L O G$ , and $I M_S S E$ .
The third step consists of using the produced intermediary metrics to construct final model and to derive the quality score estimators.

3.3.1. Construction of Intermediate Models

Having removed superfluous features, the next step consists of grouping the selected features into a set of classes. The produced classes are based on four principal axes:

Natural Scene Statistic in the spatial domain ( $N S S$ ) with 14 extracted features;
Gradient Magnitude ( $G M$ ) with 13 extracted features;
Laplacian of Gaussian ( $L O G$ ) with 11 extracted features;
Spatial and Spectral Entropies ( $S S E$ ) with 8 extracted features.

Each produced class is used to construct one intermediary metric. In each produced class, the features have been merged to generate the appropriate intermediary metrics. The pool process has been carried out, using different machine learning methods.

Table 5 compares the result of applying four ML methods (fuzzy logic, support vector machine, decision tree, and artificial neural network) in order to build the intermediary metrics, using the feature classes. The comparison is represented by Spearman’s correlation score between the produced intermediary metrics and the subjective scores give in different image databases.

The selection of the adequate ML method is determined by studying Brownian correlations between the produced intermediary metrics and the subjective scores in multivariate space as shown in Table 6.

The main remark extracts on this Table 6 is that the decision tree (

D T

) ML method produces the best results on all the studied image databases. Thus, the decision tree method has been used as the machine learning method to learn the intermediary metrics.

3.3.2. Construction of Final Indirect Prediction Model

Once the intermediary metrics are built, the next step in the indirect process is the construction of final prediction model using these intermediary metrics. The final evaluation index constructed consists of several models: four models of construction of the intermediary metrics, and a model of evaluation of the final score using the intermediary metrics. As the intermediary metrics have been constructed in the previous section, all that remains is the construction of the final prediction model, taking as inputs these intermediary metrics previously evaluated. These models are built using different machine learning methods.

Table 7 resumes the prediction performance of

I L B Q A

-based image quality models trained on TID213, LIVE, and Live challenge image databases. The design of

I Q A

models consists here of two training steps. The first step uses the decision tree approach in order to merge the quality features into four intermediary metrics. Moreover, the next training step is used different ML approaches (

D T

,

F L

,

S V M

, and

A N N

) for predicting quality score from the corresponding set of intermediary metrics.

These results show that the

I L B Q A

based on decision tree (

D T

) methods provide the best models for predicting image quality from the different image quality databases.

4. Experimental Results

Having extracted and removed the superfluous features, the next step is to construct machine learning models, which automatically predict the perceived quality of an input image without a need to reference image. The construction process of the objective NR-IQA block essentially consists of training (with

α %

of data) and validation phases. During the training phase, machine learning algorithms are applied to create an appropriate mapping model that relates the features extracted from an image to subjective score provided in the image database. While the validation phase is used to estimate how well the trained ML models predict the perceived image quality, and to compare the accuracy and robustness of the different ML methods.

The Monte Carlo Cross-Validation (

M C C V

) method [47] is also applied to compare and confirm the accuracy of the obtained results, as follows:

First, split the dataset with a fixed fraction $(α %)$ into training and validation set, where $α %$ of the samples are randomly assigned to the training process in order to design a model, and the remaining samples are used for assessing the predictive accuracy of the model.
Then, the first step is repeated k times.
Finally, the correlation scores are averaged over k.

In this paper,

α % = 70 %

and

k = 1000

.

4.1. Results Comparison

Having built our evaluation models, the next step is the validation of these models, and the comparison of the results produced with those produced by the state-of-the-art models. The validation process is done using Monte Carlo Cross-Validation (

M C C V

) [47].

In Figure 4, Figure 5 and Figure 6, different state-of-the-art no-reference

I Q A

methods have been studied and compared with the two models design in this paper (

D L B Q A

and

I L B Q A

). These blind

I Q A

methods consist of BRISQUE [28] that uses NSS in the spatial domain, GMLOGQA [29] that uses the gradient magnitude (GM) and the Laplacian of Gaussian (LOG) to make a predictor for image quality, and SSEQ [30] based on the spatial and spectral entropies. These figures depict the box-plot of Brownian’s distance in the

M C C V

phase, of different no-reference

I Q A

methods, trained on

T I D 2013

,

L I V E

, and

L I V E

Challenge image databases, respectively. They also show that the

B I Q A

model that obtains the best prediction accuracy (with the highest mean and the smallest standard deviation) is designed by the proposed method

I L B Q A

with two training steps.

We compare the proposed models (

D L B Q A

,

I L B Q A

, and

C N N

-

B Q A

) with six feature extraction based BIQA methods, and five CNN-based methods, on three popular benchmark

I Q A

databases. The results are listed in Table 8. In the table, the best two SROCC and PLCC are highlighted in bold. The performances are the correlations between the subjective score (MOS/DMOS) take in image databases, and the evaluated objective scores given by the different studied

N R

-

I Q A

methods in cross-validation phase with

k = 1000

. The main remark is that for every image databases (

T I D 2013

,

L I V E

, and

L I V E

Challenge), the proposed

D L B Q A

and

I L B Q A

methods perform the best results.

4.2. Computational Complexity

In many image quality assessment applications, the computational complexity is an important factor when evaluating the produced BIQA models. In this paper, the feature extraction is based on four principal axes. Table 9 presents the percentage of time spent by each feature extraction axes.

The run-time of the produced model is compared to the run-times of five other good NR-IQA indexes: BLIINDS-II [50], DIIVINE [26], BRISQUE [28,36], GMLOGQA [29], and SSEQ [30]. Table 10 presents the estimated run-time for an execution on CPU, in second(s), of the different methods, for a

B M P

image with resolution

512 \times 768

, on MATLAB, using a computer with

16 G B

RAM and Intel Xeon dual-core CPU,

1.70

GHz for each core. We observe that the run-time of the proposed method (

I L B Q A

) is significantly inferior to

D I I V I N E

and

B L I I N D

-

I I

run-time, approximately equal to

S S E Q

run-time, and superior to

B R I S Q U E

and

G M L O G Q A

run-time.

5. Implementation Architectures and Results

After the construction of the no-reference image quality evaluation models, through the MATLAB tool, the next step in our process is the implementation of the produced models on an FPGA platform. The produced IP-Core can be integrated into the output of an image sensor to evaluate the quality of the produced images. The implemented index is the indirect process (

I L B Q A

). This implementation is done with the Xilinx Vivado and Vivado HLS and is implemented on the Xilinx Virtex 7 (

V C 707

) and Zynq 7000 (

X C 7 Z 020

) platforms.

The work in [11] introduces this phase of implementation of our proposed process on the objective no-reference image quality assessment. The general implementation process consists of the following.

HLS implementation of the block IP, allowing the encapsulation of the designed models, through the Vivado HLS tools, in C/C++ codes.
Integration of the produced IP-cores in the overall design at RTL, through the Vivado tool, in VHDL/Verilog codes.
Integration on Xilinx’s Virtex 7 ( $V C 707$ ) and Zynq 7000 ( $X C 7 Z 020$ ) FPGA boards, and evaluation of the obtained results on several test images.

5.1. HLS Implementation of the IP-Cores

The implementation of the

I L B Q A

models in HLS enabled the construction of five IP-cores. This IP-Cores are divided into three groups, leading to the evaluation of the objective score of an image taken at the input of the process. The IP-Cores in each group can run in parallel, but the inputs in each group depend on the outputs of the previous group. The three proposed groups consist of feature extraction, construction of intermediate metrics based on decision tree models, and evaluation of the objective score from the model based on fuzzy logic, taking as input the intermediate metrics. The layered architecture of the produced IP-cores is presented in Figure 7.

Feature extraction: the entries of this group are, the image whose quality is sought to be evaluated, as an AXI-stream, and the dimensions of this input image (row and column sizes as an integer). It contains three IP-cores: $N S S_C o r e$ which extracts 14 features on the image; $G M$ - $L O G_C o r e$ consisting of extraction of 13 and 11 features on $G M$ and $L o G$ axes, respectively; and $S S E_C o r e$ which extracts 08 features.
Production of the intermediary metrics: This group contains only one IP-core, the decision tree ( $D T$ ) IP-core. The $D T_C o r e$ uses the features extracted from the input image and generates the intermediary metrics in floating-point format.
Production of the estimated score of the image using the intermediary metrics: this group also has only one IP-core, fuzzy logic IP-core ( $F L_C o r e$ ). This IP-core takes as input the four intermediary metrics produced by the $D T_C o r e$ and produces the estimated image quality score of the input image, in floating-point format.

Table 11 presents the estimation of performances and utilization at the synthesis level mapped to Xilinx Virtex 7 (

V C 707

) FPGA platform, for the different produced IP-cores, and the global process. These results are produced by an image with a resolution of

512 \times 768

, and for a clock cycle equal to 25 ns (a frequency equal to 40 Mhz).

5.2. SDK Implementation and Results

The global design containing our models has been implemented on an

F P G A

board, then a program is implemented in the micro-controller (MicroBlaze) to evaluate the quality of the image thus received through the model implemented on

F P G A

. This program also return the objective score evaluated by the

F P G A

implemented models. The images are retrieved in “AXI-Stream” format and transmitted on the

F P G A

board through an extension using the

H D M I

protocol. The program is written in C/C++ codes, and passed to micro-controller via the

J T A G

interface of the

F P G A

board, thanks to Xilinx SDK tool.

120 images distorted by 24 types of distortion, from the

T I D 2013

image database [41,42] were evaluated by the produced models on the

F P G A

platform. The

F P G A

implementation results were compared to the experimental results under

M A T L A B

and also to the subjective scores taken the image database.

Figure 8 presents the comparison between

M A T L A B

results in the X-axis and

F P G A

implementation results in the Y-axis, for the

I L B Q A

index. While Figure 9 presents the comparison between the subjective scores in the X-axis, and

M A T L A B

results (in blue) and

F P G A

implementation results (in red) in the Y-axis. These figures also present the correlation scores between the scores in the X-axis and the produced objective scores in the Y-axis on each graph.

The difference between the image quality scores estimated under

M A T L A B

and those estimated on

F P G A

is caused by several factors, including the following.

Spreading 32-bit floating-point truncation errors in the $F P G A$ system;
The difference between the results of the basic functions of the MATLAB and Vivado HLS tools. For example, the conversion functions in gray-scale, or the resizing functions using the bi-cubic interpolation, give slightly different results on MATLAB and Vivado HLS. These small errors are propagated throughout the evaluations, to finally cause the differences between the estimated scores shown in the previous figures.

5.3. Visual Assessment

Our NR-IQA process has been used in a control loop process [51] to assess the quality of an image and decide if that image needs correction. Gaussian noise is inserted in a reference image, resulting in distorted image illustrated in Figure 10a, with a very low

I L B Q A

score like show in the figure. Figure 10b–d shows the corrected version of the distorted image presented in the graph (a), by the control loop process based on the image quality assessment. Each graph also presents the

P S N R

between the reference image and the corrected image, as well as the blind image quality (

I L B Q A

) score of each image. The graphs (b) and (c) of the Figure 10 show the intermediary corrected images, while the graph (d) presents the best-corrected image by the process.

6. Conclusions

This paper has presented the construction of models for objective assessment of perceived visual quality. The proposed models use a non-distortion specific no-reference image quality assessment methods. The objective

I Q A

is computed by combining the most significant image features extracted from natural scene statistics (

N S S

) in the spatial domain, gradient magnitude (

G M

), Laplacian of Gaussian (

L o G

), and spectral and spatial entropies (

S S E

). In our proposal, the training phase has been performed using

D L B Q A

and

I L B Q A

process. The

D L B Q A

index evaluates the image quality using all the selected features in one training phase. While the

I L B Q A

index uses two training phases. The first phase consists of training the intermediary metrics using the feature classes based on four extraction axes, and the second training phase evaluates the image quality using the intermediary metrics. Different machine learning methods were used to assess image quality and their performance was compared. The proposed methods consisting of artificial neural network (

A N N

), nonlinear polynomial regression (

N L R

), decision tree (

D T

), support vector machine (

S V M

), and fuzzy logic (

F L

) are discussed and compared. Both the stability and the robustness of designed models are evaluated using a variant of Monte Carlo cross-validation (

M C C V

) with 1000 randomly chosen validation image subsets. The accuracy of the produced models has been compared to the results produced by other state-of-the-art

N R

-

I Q A

methods. Implementation results on

M A T L A B

and on the

F P G A

platform demonstrated the best performances for

I L B Q A

proposed in this paper, which uses two training phases in the modeling process. One of the future work based on this article is to present how to use our image quality assessment indexes in the control loop process for self-healing of the image sensor. In addition, the implementation on an ASIC and the implementation of the generated IP-core in image sensors.

Author Contributions

Conceptualization, G.T.T. and E.S.; methodology, G.T.T. and E.S.; software, G.T.T.; validation, G.T.T. and E.S.; formal analysis, G.T.T. and E.S.; investigation, G.T.T. and E.S.; resources, G.T.T. and E.S.; data curation, G.T.T. and E.S.; writing—original draft preparation, G.T.T.; writing—review and editing, G.T.T. and E.S.; visualization, G.T.T. and E.S.; supervision, E.S.; project administration, E.S.; funding acquisition, E.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

Lu, L.; Wang, X.; Carneiro, G.; Yang, L. Deep Learning and Convolutional Neural Networks for Medical Imaging and Clinical Informatics; Springer: Berlin/Heidelberg, Germany, 2019. [Google Scholar]
Leonardi, M.; Napoletano, P.; Schettini, R.; Rozza, A. No Reference, Opinion Unaware Image Quality Assessment by Anomaly Detection. Sensors 2021, 21, 994. [Google Scholar] [CrossRef] [PubMed]
Geirhos, R.; Janssen, D.H.; Schütt, H.H.; Rauber, J.; Bethge, M.; Wichmann, F.A. Comparing deep neural networks against humans: Object recognition when the signal gets weaker. arXiv 2017, arXiv:1706.06969. [Google Scholar]
Plosz, S.; Varga, P. Security and safety risk analysis of vision guided autonomous vehicles. In Proceedings of the 2018 IEEE Industrial Cyber-Physical Systems (ICPS), Saint Petersburg, Russia, 15–18 May 2018; pp. 193–198. [Google Scholar]
Aydin, I.; Othman, N.A. A new IoT combined face detection of people by using computer vision for security application. In Proceedings of the 2017 International Artificial Intelligence and Data Processing Symposium (IDAP), Malatya, Turkey, 16–17 September 2017; pp. 1–6. [Google Scholar]
Song, H.; Nguyen, A.D.; Gong, M.; Lee, S. A review of computer vision methods for purpose on computer-aided diagnosis. J. Int. Soc. Simul. Surg 2016, 3, 2383–5389. [Google Scholar] [CrossRef] [Green Version]
Kanellakis, C.; Nikolakopoulos, G. Survey on computer vision for UAVs: Current developments and trends. J. Intell. Robot. Syst. 2017, 87, 141–168. [Google Scholar] [CrossRef] [Green Version]
Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Ponomarenko, N.; Ieremeiev, O.; Lukin, V.; Jin, L.; Egiazarian, K.; Astola, J.; Vozel, B.; Chehdi, K.; Carli, M.; Battisti, F.; et al. A new color image database TID2013: Innovations and results. In International Conference on Advanced Concepts for Intelligent Vision Systems; Springer: Berlin/Heidelberg, Germany, 2013; pp. 402–413. [Google Scholar]
Tsai, S.Y.; Li, C.H.; Jeng, C.C.; Cheng, C.W. Quality Assessment during Incubation Using Image Processing. Sensors 2020, 20, 5951. [Google Scholar] [CrossRef]
Tchendjou, G.T.; Simeu, E.; Lebowsky, F. FPGA implementation of machine learning based image quality assessment. In Proceedings of the 2017 29th International Conference on Microelectronics (ICM), Beirut, Lebanon, 10–13 December 2017; pp. 1–4. [Google Scholar]
Wang, Z.; Bovik, A.C.; Lu, L. Why is image quality assessment so difficult? In Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Orlando, FL, USA, 13–17 May 2002; Volume 4, p. IV-3313. [Google Scholar]
Tchendjou, G.T.; Alhakim, R.; Simeu, E. Fuzzy logic modeling for objective image quality assessment. In Proceedings of the 2016 Conference on Design and Architectures for Signal and Image Processing (DASIP), Rennes, France, 12–14 October 2016; pp. 98–105. [Google Scholar]
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
Kang, L.; Ye, P.; Li, Y.; Doermann, D. Convolutional neural networks for no-reference image quality assessment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1733–1740. [Google Scholar]
Ma, J.; Wu, J.; Li, L.; Dong, W.; Xie, X.; Shi, G.; Lin, W. Blind Image Quality Assessment With Active Inference. IEEE Trans. Image Process. 2021, 30, 3650–3663. [Google Scholar] [CrossRef]
Zhang, W.; Ma, K.; Zhai, G.; Yang, X. Uncertainty-aware blind image quality assessment in the laboratory and wild. IEEE Trans. Image Process. 2021, 30, 3474–3486. [Google Scholar] [CrossRef]
Ullah, H.; Irfan, M.; Han, K.; Lee, J.W. DLNR-SIQA: Deep Learning-Based No-Reference Stitched Image Quality Assessment. Sensors 2020, 20, 6457. [Google Scholar] [CrossRef]
Li, C.; Bovik, A.C. Content-partitioned structural similarity index for image quality assessment. Signal Process. Image Commun. 2010, 25, 517–526. [Google Scholar] [CrossRef]
Sheikh, H.R.; Bovik, A.C. Image information and visual quality. IEEE Trans. Image Process. 2006, 15, 430–444. [Google Scholar] [CrossRef] [PubMed]
Soundararajan, R.; Bovik, A.C. RRED indices: Reduced reference entropic differencing for image quality assessment. IEEE Trans. Image Process. 2012, 21, 517–526. [Google Scholar] [CrossRef] [Green Version]
Brandão, T.; Queluz, M.P. No-reference image quality assessment based on DCT domain statistics. Signal Process. 2008, 88, 822–833. [Google Scholar] [CrossRef]
Meesters, L.; Martens, J.B. A single-ended blockiness measure for JPEG-coded images. Signal Process. 2002, 82, 369–387. [Google Scholar] [CrossRef]
Caviedes, J.; Gurbuz, S. No-reference sharpness metric based on local edge kurtosis. In Proceedings of the International Conference on Image Processing, Rochester, NY, USA, 22–25 September 2002; Volume 3, p. III. [Google Scholar]
Ye, P.; Doermann, D. No-reference image quality assessment using visual codebooks. IEEE Trans. Image Process. 2012, 21, 3129–3138. [Google Scholar] [PubMed]
Moorthy, A.K.; Bovik, A.C. Blind image quality assessment: From natural scene statistics to perceptual quality. IEEE Trans. Image Process. 2011, 20, 3350–3364. [Google Scholar] [CrossRef] [PubMed]
Zhang, L.; Zhang, L.; Bovik, A.C. A feature-enriched completely blind image quality evaluator. IEEE Trans. Image Process. 2015, 24, 2579–2591. [Google Scholar] [CrossRef] [Green Version]
Mittal, A.; Moorthy, A.K.; Bovik, A.C. No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 2012, 21, 4695–4708. [Google Scholar] [CrossRef]
Xue, W.; Mou, X.; Zhang, L.; Bovik, A.C.; Feng, X. Blind image quality assessment using joint statistics of gradient magnitude and Laplacian features. IEEE Trans. Image Process. 2014, 23, 4850–4862. [Google Scholar] [CrossRef]
Liu, L.; Liu, B.; Huang, H.; Bovik, A.C. No-reference image quality assessment based on spatial and spectral entropies. Signal Process. Image Commun. 2014, 29, 856–863. [Google Scholar] [CrossRef]
Ghadiyaram, D.; Bovik, A.C. Perceptual quality prediction on authentically distorted images using a bag of features approach. J. Vis. 2017, 17, 32. [Google Scholar] [CrossRef] [PubMed]
Kim, J.; Nguyen, A.D.; Lee, S. Deep CNN-based blind image quality predictor. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 11–24. [Google Scholar] [CrossRef]
Bosse, S.; Maniry, D.; Müller, K.R.; Wiegand, T.; Samek, W. Deep neural networks for no-reference and full-reference image quality assessment. IEEE Trans. Image Process. 2017, 27, 206–219. [Google Scholar] [CrossRef] [Green Version]
Ruderman, D.L. The statistics of natural images. Netw. Comput. Neural Syst. 1994, 5, 517–548. [Google Scholar] [CrossRef]
Lasmar, N.E.; Stitou, Y.; Berthoumieu, Y. Multiscale skewed heavy tailed model for texture analysis. In Proceedings of the 2009 16th IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, 7–10 November 2009; pp. 2281–2284. [Google Scholar]
Mittal, A.; Moorthy, A.K.; Bovik, A.C. Making image quality assessment robust. In Proceedings of the 2012 Conference Record of the Forty Sixth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), Pacific Grove, CA, USA, 4–7 November 2012; pp. 1718–1722. [Google Scholar]
Wainwright, M.J.; Schwartz, O. 10 Natural Image Statistics and Divisive. In Probabilistic Models of the Brain: Perception and Neural Function; The MIT Press: Cambridge, MA, USA, 2002; p. 203. [Google Scholar]
Lyu, S.; Simoncelli, E.P. Nonlinear image representation using divisive normalization. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8. [Google Scholar]
Jiang, T.; Hu, X.J.; Yao, X.H.; Tu, L.P.; Huang, J.B.; Ma, X.X.; Cui, J.; Wu, Q.F.; Xu, J.T. Tongue image quality assessment based on a deep convolutional neural network. BMC Med. Inform. Decis. Mak. 2021, 21, 147. [Google Scholar] [CrossRef]
Tchendjou, G.T.; Simeu, E.; Alhakim, R. Fuzzy logic based objective image quality assessment with FPGA implementation. J. Syst. Archit. 2018, 82, 24–36. [Google Scholar] [CrossRef]
Ponomarenko, N.; Ieremeiev, O.; Lukin, V.; Egiazarian, K.; Jin, L.; Astola, J.; Vozel, B.; Chehdi, K.; Carli, M.; Battisti, F.; et al. Color image database TID2013: Peculiarities and preliminary results. In Proceedings of the European Workshop on Visual Information Processing (EUVIP), Paris, France, 10–12 June 2013; pp. 106–111. [Google Scholar]
Ponomarenko, N.; Ieremeiev, O.; Lukin, V.; Egiazarian, K.; Jin, L.; Astola, J.; Vozel, B.; Chehdi, K.; Carli, M.; Battisti, F.; et al. TID2013 Database. 2013. Available online: http://www.ponomarenko.info/tid2013.htm (accessed on 20 November 2021).
Sheikh, H.R.; Wang, Z.; Huang, H.; Bovik, A.C. LIVE Image Quality Assessment Database Release 2. 2005. Available online: http://live.ece.utexas.edu/research/quality (accessed on 20 November 2021).
Sheikh, H.R.; Sabir, M.F.; Bovik, A.C. A statistical evaluation of recent full reference image quality assessment algorithms. IEEE Trans. Image Process. 2006, 15, 3440–3451. [Google Scholar] [CrossRef]
Ghadiyaram, D.; Bovik, A.C. Massive online crowdsourced study of subjective and objective picture quality. IEEE Trans. Image Process. 2016, 25, 372–387. [Google Scholar] [CrossRef] [Green Version]
Ghadiyaram, D.; Bovik, A. Live in the Wild Image Quality Challenge Database. 2015. Available online: http://live.ece.utexas.edu/research/ChallengeDB/index.html (accessed on 20 November 2021).
Xu, Q.S.; Liang, Y.Z. Monte Carlo cross validation. Chemom. Intell. Lab. Syst. 2001, 56, 1–11. [Google Scholar] [CrossRef]
Kim, J.; Lee, S. Fully deep blind image quality predictor. IEEE J. Sel. Top. Signal Process. 2016, 11, 206–220. [Google Scholar] [CrossRef]
Xu, J.; Ye, P.; Li, Q.; Du, H.; Liu, Y.; Doermann, D. Blind image quality assessment based on high order statistics aggregation. IEEE Trans. Image Process. 2016, 25, 4444–4457. [Google Scholar] [CrossRef]
Saad, M.A.; Bovik, A.C.; Charrier, C. Blind image quality assessment: A natural scene statistics approach in the DCT domain. IEEE Trans. Image Process. 2012, 21, 3339–3352. [Google Scholar] [CrossRef]
Tchendjou, G.T.; Simeu, E. Self-Healing Image Sensor Using Defective Pixel Correction Loop. In Proceedings of the 2019 International Conference on Control, Automation and Diagnosis (ICCAD), Grenoble, France, 2–4 July 2019; pp. 1–6. [Google Scholar]

Figure 1. Objective image quality assessment process.

Figure 2. Design of prediction models for objective no-reference

I Q A

process.

Figure 2. Design of prediction models for objective no-reference

I Q A

process.

Figure 3. Flow diagram for selection of strongly correlated features to remove.

Figure 4. Performance comparison for different NR-IQA methods trained on TID2013.

Figure 5. Performance comparison for different NR-IQA methods trained on LIVE.

Figure 6. Performance comparison for different NR-IQA methods trained on LIVE Challenge.

Figure 7. Implementation architecture of the

I L B Q A

on an

F P G A

platform.

Figure 7. Implementation architecture of the

I L B Q A

on an

F P G A

platform.

Figure 8. Comparison between

M A T L A B

and

F P G A

implemented scores.

Figure 8. Comparison between

M A T L A B

and

F P G A

implemented scores.

Figure 9. Comparison between MOS,

M A T L A B

, and

F P G A

implemented scores.

Figure 9. Comparison between MOS,

M A T L A B

, and

F P G A

implemented scores.

Figure 10. Example of using

I L B Q A

index in a control loop process.

Figure 10. Example of using

I L B Q A

index in a control loop process.

Table 1. Extracted features based on NSS in spatial domain.

Feats	Feature Description	Computation Procedure
f1–f2	Shape and variance	Fit GGD to MSCN coefficients
f3–f6	Shape, mean, left variance, right variance	Fit AGGD to horizontal (H) pairwise
f7–f10	Shape, mean, left variance, right variance	Fit AGGD to vertical (V) pairwise
f11–f14	Shape, mean, left variance, right variance	Fit AGGD to first diagonal (D1) pairwise
f15–f18	Shape, mean, left variance, right variance	Fit AGGD to second diagonal (D2) pairwise

Table 2. Extracted features based on GM and LoG.

Feats	Feature Description
f1–f10	marginal probability functions of GM
f11–f20	marginal probability functions of LoG
f21–f30	measure of the overall dependency of GM
f31–f40	measure of the overall dependency of LoG

Table 3. Extracted features based on SSE.

Feats	Feature Description
f1–f2	Mean and skew of spatial entropy values for the first scale
f3–f4	Mean and skew of spectral entropy values for the first scale
f5–f6	Mean and skew of spatial entropy values for the second scale
f7–f8	Mean and skew of spectral entropy values for the second scale
f9–f10	Mean and skew of spatial entropy values for the third scale
f11–f12	Mean and skew of spectral entropy values for the third scale

Table 4. Correlation score of MOS vs. estimated image quality of different image databases using different ML methods for DLBQA.

DB		ANN	SVM	DT	FL
	PLCC	0.871	0.899	0.878	0.786
TID	SRCC	0.859	0.883	0.862	0.751
	dCor	0.850	0.882	0.854	0.762
	PLCC	0.947	0.960	0.962	0.927
LIVE	SRCC	0.946	0.958	0.960	0.936
	dCor	0.939	0.950	0.954	0.927
	PLCC	0.627	0.801	0.741	0.673
NLC	SRCC	0.566	0.774	0.739	0.613
	dCor	0.596	0.779	0.732	0.630

Table 5. Spearman’s correlation between intermediary metrics and MOS for different image databases.

DB	IM_	ANN	SVM	DT	FL
	NSS	0.647	0.703	0.842	0.615
	GM	0.697	0.597	0.849	0.629
TID	LOG	0.547	0.463	0.780	0.498
	SSE	0.623	0.865	0.856	0.431
	NSS	0.931	0.939	0.930	0.915
	GM	0.946	0.924	0.954	0.940
LIVE	LOG	0.942	0.928	0.954	0.937
	SSE	0.868	0.954	0.937	0.632
	NSS	0.603	0.618	0.717	0.569
	GM	0.441	0.488	0.685	0.499
NLC	LOG	0.527	0.529	0.741	0.552
	SSE	0.444	0.763	0.683	0.440

Table 6. Brownian’s distance between the group of intermediary metrics and MOS.

dCor	TID2013	LIVE	Live Challenge
ANN	0.745	0.951	0.632
SVM	0.835	0.957	0.772
DT	0.907	0.969	0.823
FL	0.691	0.937	0.634

Table 7. Correlation score of MOS vs. estimated image quality of different image databases using different ML methods for ILBQA.

DB		ANN	SVM	DT	FL
	PLCC	0.925	0.936	0.954	0.924
TID	SRCC	0.912	0.924	0.946	0.913
	dCor	0.908	0.919	0.942	0.908
	PLCC	0.971	0.978	0.982	0.970
LIVE	SRCC	0.970	0.977	0.979	0.968
	dCor	0.961	0.971	0.976	0.961
	PLCC	0.818	0.815	0.894	0.826
NLC	SRCC	0.777	0.848	0.877	0.793
	dCor	0.792	0.851	0.879	0.802

Table 8. Performance comparison on three benchmark

I Q A

databases.

Table 8. Performance comparison on three benchmark

I Q A

databases.

Methods	LIVE		TID2013		LIVE-CH
Methods	SRCC	PLCC	SRCC	PLCC	SRCC	PLCC
DIQaM [33]	0.960	0.972	0.835	0.855	0.606	0.601
BIECON [48]	0.958	0.960	0.717	0.762	0.595	0.613
DIQA [32]	0.975	0.977	0.825	0.850	0.703	0.704
DB-CNN [49]	0.968	0.971	0.816	0.865	0.851	0.869
AIGQA [16]	0.960	0.957	0.871	0.893	0.751	0.761
CNN-BQA (proposed)	0.961	0.966	0.881	0.876	0.772	0.786
DIIVINE [26]	0.925	0.923	0.654	0.549	0.546	0.568
BRISQUE [28]	0.939	0.942	0.573	0.651	0.607	0.585
ILNIQE [27]	0.902	0.865	0.519	0.640	0.430	0.510
FRIQUE [31]	0.948	0.962	0.669	0.704	0.720	0.720
SSEQ [30]	0.941	0.943	0.883	0.886	0.772	0.807
GMLOGQA [29]	0.956	0.970	0.883	0.877	0.782	0.805
DLBQA (proposed)	0.951	0.957	0.892	0.893	0.779	0.862
ILBQA (proposed)	0.977	0.983	0.934	0.942	0.876	0.883

Table 9. Percentage of time consumed by each feature extraction axes.

Axes	Percentage of Time (%)
NSS	7.98
GM	3.10
LOG	3.10
SSE	85.77

Table 10. Comparison of time complexity for different

B I Q A

approaches.

Table 10. Comparison of time complexity for different

B I Q A

approaches.

NR-IQA	Time (second)
BLIIND-II	76.12
DIIVINE	25.40
BRISQUE	0.168
GMLOGQA	0.131
SSEQ	1.807
ILBQA	2.126

Table 11. Device performance utilization estimates in HLS synthesis phase.

	NSS	GM-LoG	SSE	DT	FL	ILBQA
clk est. (ns)	21.79	21.84	21.84	11.31	16.82	21.85
Min clk.	79,302	1648	570	2	42	80,370
Max clk.	9,400,263	2,909,751	25,231,749	17	43	84,720,454
BRAM	1235	1 052	554	0	0	2328
DSP	394	436	141	0	96	1136
FF	59,097	69,705	20,872	10,936	5245	167,801
LUT	181,656	247,775	50,903	128,122	11,779	497,622

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Takam Tchendjou, G.; Simeu, E. Visual Perceptual Quality Assessment Based on Blind Machine Learning Techniques. Sensors 2022, 22, 175. https://doi.org/10.3390/s22010175

AMA Style

Takam Tchendjou G, Simeu E. Visual Perceptual Quality Assessment Based on Blind Machine Learning Techniques. Sensors. 2022; 22(1):175. https://doi.org/10.3390/s22010175

Chicago/Turabian Style

Takam Tchendjou, Ghislain, and Emmanuel Simeu. 2022. "Visual Perceptual Quality Assessment Based on Blind Machine Learning Techniques" Sensors 22, no. 1: 175. https://doi.org/10.3390/s22010175

APA Style

Takam Tchendjou, G., & Simeu, E. (2022). Visual Perceptual Quality Assessment Based on Blind Machine Learning Techniques. Sensors, 22(1), 175. https://doi.org/10.3390/s22010175

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Visual Perceptual Quality Assessment Based on Blind Machine Learning Techniques

Abstract

1. Introduction

2. Feature Extraction

2.1. Natural Scene Statistic in the Spatial Domain

2.2. Gradient Magnitude and Laplacian of Gaussian

2.3. Spatial and Spectral Entropies

2.4. Convolutional Neural Network for NR-IQA

3. Construction of Prediction Models

3.1. Remove Superfluous Features

3.2. Direct Process ( $D L B Q A$ )

3.3. Indirect Process ( $I L B Q A$ )

3.3.1. Construction of Intermediate Models

3.3.2. Construction of Final Indirect Prediction Model

4. Experimental Results

4.1. Results Comparison

4.2. Computational Complexity

5. Implementation Architectures and Results

5.1. HLS Implementation of the IP-Cores

5.2. SDK Implementation and Results

5.3. Visual Assessment

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

Visual Perceptual Quality Assessment Based on Blind Machine Learning Techniques

Abstract

1. Introduction

2. Feature Extraction

2.1. Natural Scene Statistic in the Spatial Domain

2.2. Gradient Magnitude and Laplacian of Gaussian

2.3. Spatial and Spectral Entropies

2.4. Convolutional Neural Network for NR-IQA

3. Construction of Prediction Models

3.1. Remove Superfluous Features

3.2. Direct Process ( D L B Q A )

3.3. Indirect Process ( I L B Q A )

3.3.1. Construction of Intermediate Models

3.3.2. Construction of Final Indirect Prediction Model

4. Experimental Results

4.1. Results Comparison

4.2. Computational Complexity

5. Implementation Architectures and Results

5.1. HLS Implementation of the IP-Cores

5.2. SDK Implementation and Results

5.3. Visual Assessment

6. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

3.2. Direct Process ( $D L B Q A$ )

3.3. Indirect Process ( $I L B Q A$ )