Research on a Multidimensional Digital Printing Image Quality Evaluation Method Based on MLP Neural Network Regression

Zhong, Jiafeng; Zhan, Hongwu; Xu, Fang; Zhang, Yinwei

doi:10.3390/app14145986

College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310014, China

Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(14), 5986; https://doi.org/10.3390/app14145986

Submission received: 23 May 2024 / Revised: 27 June 2024 / Accepted: 7 July 2024 / Published: 9 July 2024

(This article belongs to the Section Additive Manufacturing Technologies)

Abstract

High-quality printing is a longstanding objective in the printing and replication industry. However, the methods used to evaluate print quality suffer from subjectivity and multidimensionality, relying on personal preferences and subjective perceptions to assess the quality of printed images, which poses significant limitations. To address these issues, a set of evaluation metrics aimed at assessing the quality of digital printing products is proposed to achieve evaluation results consistent with human visual perception. Given the differing imaging principles of pre-press digital images and post-scan images, these images are first preprocessed to standardize them for comparison. Next, features are extracted in both spatial and frequency domains, and similarity metrics are used to quantify the differences in features between pre-press digital images and post-scan images. Finally, a multilayer perceptron (MLP) neural network regression model is trained to predict the final objective quality scores. Experimental results on two standard databases demonstrate that this metric exhibits high consistency in both subjective and objective quality evaluation metrics for printed image quality assessment and outperforms other metrics in terms of accuracy.

Keywords:

printing; image quality assessment (IQA); spatial domain; frequency domain; neural network

1. Introduction

Given the subjective and multidimensional nature of evaluating the quality of color prints, this task becomes inherently complex. With the rapid evolution of the digital printing industry, ensuring high-quality prints has become a central concern, garnering increasing attention. Unlike expert measurements using technical means, the assessment of print image quality by the general public often relies on personal preferences and subjective perceptions [1]. Consequently, there is a pressing need to develop an evaluation metric that closely aligns with human visual perception.

Numerous research teams have dedicated substantial efforts to evaluating both print quality and image quality. Jing et al. [2] developed a general-purpose image processing and analysis method that takes digital and scanned images as input and produces grayscale spatial maps indicating the location and severity of defects, effectively assessing various flaws. Eerola [3] applied no-reference image quality assessment (NR-IQA) metrics to print images and evaluated the performance of several state-of-the-art NR-IQA metrics on a large set of printed photographs. Notably, the Blind Image Quality Index (BIQI) and the Natural Image Quality Evaluator (NIQE) outperformed the other IQA methods, and of these the NIQE was less sensitive to image content. In a related field of image processing pertinent to printing, Mittal [4] proposed a distortion-agnostic NR quality assessment metric based on natural scene statistics; because it operates in the spatial domain, it is computationally efficient. Liang [5] introduced a deep blind IQA metric based on multiple instance regression to overcome a fundamental limitation of convolutional neural network-based IQA (CNN-based IQA) models, namely that ground truth is unavailable for local blocks. Additionally, Wang [6] devised a multiscale information content-weighted approach based on a Gaussian scale mixture (GSM) model of natural images. This weighting method enhances the performance of IQA metrics based on the Peak Signal-to-Noise Ratio (PSNR) and the structural similarity index (SSIM).

The researchers mentioned above have made significant advancements in their respective fields. However, methodologies for evaluating the quality of digital print images often assess images from a limited perspective. While addressing specific issues such as color deviation, spatial domain information may adequately characterize the problem. However, metrics representing image quality extend beyond color aspects alone. Different image contents emphasize various directions, and the importance of different aspects of image quality varies across different media. For example, while glossiness [3] is not an intrinsic attribute of digital images, it significantly influences the perceived quality of printed images. Additionally, the unique characteristics of digital print images have been overlooked in image quality assessment: the color gamut range of digital print images is much smaller than that of digital images, and printed images are essentially halftone images, with grayscale variations influenced by the human eye’s low-pass characteristics, unlike continuous-tone digital images. Understanding the visual traits of the human eye reveals its ability to perceive diverse visual sensations through image contrast, brightness, and phenomena such as visual masking and brightness adaptation when viewing images. Therefore, evaluating print image quality requires considering not only objective assessments of print quality but also the influence of visual characteristics on print quality. Relying solely on a singular evaluation framework risks significant disparities between assessment conclusions on final print quality and intuitive visual perceptions, leading to notable discrepancies between objective and subjective quality assessments.

Hence, our study endeavors to develop a comprehensive print quality evaluation metric grounded in deep learning, aiming to reconcile subjective and objective assessment outcomes. To accomplish this, we first gather print images and align them with the pre-press digital images so that the two are directly comparable. We then extract features from the processed images in both the spatial and frequency domains, covering aspects such as the color, structure, and content of the images. Similarity measures quantify the differences between these features, and a neural network model maps the resulting similarity values to a predicted objective quality score for each print.

In the ensuing sections, we will delve into our research in greater detail. Section 2 will focus on evaluating the performance of sampling devices to ensure the fidelity and dependability of data acquisition, while also establishing the theoretical underpinnings for spatial domain feature extraction. Section 3 will delve deeper into methods for frequency domain feature extraction and devise effective tools for feature fusion. Section 4 will present our experimental findings and undertake a thorough analysis to refine the proposed quality assessment metric. Finally, Section 5 will encapsulate the outcomes and conclusions of our entire research endeavor.

2. Extraction of Spatial Domain Features from Digital Print Images

For assessing the quality of printed images, there exist several methods that can capture post-print images, including the use of multispectral cameras, CCD line scan cameras, and scanners. In this study, we opted for the Epson Expression 13000XL scanner (Epson, Suwa, Japan) to acquire image data due to its high precision, which facilitates the faithful reproduction of real data, along with its user-friendly and efficient operation. The specific parameters are detailed in Table 1. The acquired images are in the RGB color space.

We derived the modulation transfer function (MTF) of the scanner by scanning the ISO 12233 [7] test chart, as illustrated in Figure 1. The resulting MTF is shown in Figure 2. At lower spatial frequencies, the MTF values are higher, indicating that the scanner performs well when reproducing larger details and overall image contrast. The curve drops gently, so we can conclude that the device's performance is relatively stable.

Figure 3 presents a novel metric proposed for assessing the quality of digital print images, termed Spatial-Frequency Domain Feature Fusion (FFSF). The preprocessing stage involves the feature curve transformation of pre-print digital images, wherein the images undergo processing using device ICC curves. Additionally, post-scan images undergo registration and correction. Subsequently, the preprocessed images are subjected to spatial transformation, transitioning from the original color space to the CIELAB color space. Features are then extracted from the L, A, and B channels, with chromaticity features being extracted from channels A and B. In the L channel, gradient features are computed using Sobel filters, structural features are evaluated using the structural similarity index (SSIM) metric, texture features are extracted via a Gabor filter bank, and spatial frequency features are derived by analyzing discrete cosine transform (DCT) coefficients of image blocks. Following feature extraction, similarity metrics are applied to quantify the differences between features extracted from pre-print digital images and scanned print images. Finally, an MLP neural network model trained on standard datasets is employed to predict the objective quality scores of the print images.

Due to inevitable issues like positional shifts and rotations during the scanning process, image registration is necessary to align the scanned images with their digital originals. In this study, we utilized the SURF (Speeded-Up Robust Features) algorithm, which extensive research by Pang [8] has shown to be highly reliable. Since the imaging principles of pre-press digital images and post-scan images differ, comparing them directly is not meaningful. Spatial transformations and device characteristic curve transformations were therefore applied to the pre-press images, including both forward and inverse transformations, as depicted in Figure 4. Here, A.ICC represents the spatial transformation curve, while B.ICC denotes the device characteristic curve. The forward and inverse transformations do not preserve the original image; instead, they introduce color alterations due to the application of different ICC curves, thereby simulating the printing imaging process.

For color image prints, accurately reproducing colors serves as a critical benchmark for quality control. In color image printing, challenges often arise in mapping color information between different color spaces. For instance, in digital printing, original information is typically presented in RGB or CMYK mode on a monitor, while printing involves transferring ink onto a substrate in the form of ink dots. Due to the human eye’s low-pass characteristics, it primarily perceives the overall appearance and cannot discern ink dots at a microscopic level. This underscores the essence of printing, where color gradation is manifested by the density of ink dots.

When conducting quality assessment, the selection of an appropriate color space holds significant importance in the evaluation process. We opt for the classic color space for conversion, namely the CIELAB [9,10,11] color space. If the original image is in the CMYK color space and needs to be converted to the CIELAB color space, it must first be converted to the RGB color space. However, direct conversion from the RGB color space to the CIELAB color space is not possible; instead, it necessitates the use of the XYZ color space as an intermediary. The CIELAB color space comprises a luminance channel and two chromaticity channels. The conversion formula is as follows:

\begin{aligned} \mathrm{linear}_{r} &= \mathrm{redTRC}[\mathrm{device}_{r}] \\ \mathrm{linear}_{g} &= \mathrm{greenTRC}[\mathrm{device}_{g}] \\ \mathrm{linear}_{b} &= \mathrm{blueTRC}[\mathrm{device}_{b}] \end{aligned},

(1)

\begin{bmatrix} \mathrm{connection}_{X} \\ \mathrm{connection}_{Y} \\ \mathrm{connection}_{Z} \end{bmatrix} = \begin{bmatrix} \mathrm{redColorant}_{X} & \mathrm{greenColorant}_{X} & \mathrm{blueColorant}_{X} \\ \mathrm{redColorant}_{Y} & \mathrm{greenColorant}_{Y} & \mathrm{blueColorant}_{Y} \\ \mathrm{redColorant}_{Z} & \mathrm{greenColorant}_{Z} & \mathrm{blueColorant}_{Z} \end{bmatrix} \begin{bmatrix} \mathrm{linear}_{r} \\ \mathrm{linear}_{g} \\ \mathrm{linear}_{b} \end{bmatrix},

(2)

\begin{cases} L^{*} = 116 f \left(\frac{\mathrm{connection}_{Y}}{Y_{n}}\right) - 16 \\ a^{*} = 500 \left[f \left(\frac{\mathrm{connection}_{X}}{X_{n}}\right) - f \left(\frac{\mathrm{connection}_{Y}}{Y_{n}}\right)\right] \\ b^{*} = 200 \left[f \left(\frac{\mathrm{connection}_{Y}}{Y_{n}}\right) - f \left(\frac{\mathrm{connection}_{Z}}{Z_{n}}\right)\right] \end{cases},

(3)

f (t) = \begin{cases} t^{\frac{1}{3}}, & \text{if } t > \left(\frac{6}{29}\right)^{3} \\ \frac{1}{3} \left(\frac{29}{6}\right)^{2} t + \frac{4}{29}, & \text{otherwise} \end{cases},

(4)

Based on the device's ICC file, the following information is obtained, including the $\mathrm{redColorant}_{\mathrm{Tag}}$, $\mathrm{greenColorant}_{\mathrm{Tag}}$, $\mathrm{blueColorant}_{\mathrm{Tag}}$, $\mathrm{redTRCTag}$, $\mathrm{greenTRCTag}$, and $\mathrm{blueTRCTag}$, as shown in Table 2 and Figure 5. Furthermore, the values of $X_{n}$, $Y_{n}$, and $Z_{n}$ in Equation (3) can be read from the ICC file: $X_{n} = 0.9642$, $Y_{n} = 1.0000$, $Z_{n} = 0.8249$.
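As a minimal sketch of Equations (3) and (4), the following Python snippet converts connection-space XYZ values to CIELAB using the white point quoted above. The function names `f` and `xyz_to_lab` are illustrative choices, not part of the original method description, and the TRC lookup and colorant matrix of Equations (1) and (2) are assumed to have been applied already.

```python
# White point quoted in the text (read from the ICC profile).
XN, YN, ZN = 0.9642, 1.0000, 0.8249


def f(t):
    """Piecewise function of Equation (4)."""
    delta = 6 / 29
    if t > delta ** 3:
        return t ** (1 / 3)
    # (1/3) * (29/6)^2 * t + 4/29, written with delta = 6/29
    return t / (3 * delta ** 2) + 4 / 29


def xyz_to_lab(x, y, z):
    """Equation (3): connection-space XYZ to (L*, a*, b*)."""
    fx, fy, fz = f(x / XN), f(y / YN), f(z / ZN)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


# The white point itself should map to L* = 100 with zero chroma.
print(xyz_to_lab(XN, YN, ZN))
```

The linear branch of `f` keeps the conversion well behaved for very dark values, where the cube root would otherwise have an unbounded slope.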

For clarity in subsequent discussions, the following notation is used. The experimental objects are a preprocessed, pre-print digital image, denoted as $I_{o}$, and a post-scanned image, denoted as $I_{s}$. After color space transformation, the three-channel image information of $I_{o}$ includes $L_{o}$, $A_{o}$, and $B_{o}$, while the three-channel image information of $I_{s}$ includes $L_{s}$, $A_{s}$, and $B_{s}$.

2.1. Extraction of Chromaticity Features

Chroma refers to the saturation or purity of a color, indicating its intensity. Changes in chroma can reflect the accuracy and consistency of color reproduction during printing. Color features are extracted from the A and B channels. The similarity of the A channel is as follows:

S_{c}^{A} = \frac{1}{N} \sum_{i = 1}^{N} \frac{2 \times A_{s} (i) \times A_{o} (i) + C_{1}}{{A_{s} (i)}^{2} + {A_{o} (i)}^{2} + C_{1}},

(5)

In this equation, $i$ represents the index of the image pixel, and $N$ denotes the total number of pixels in the image. $C_{1}$ serves as a stabilizing constant for the fractional term to prevent division by zero. Similarly, the chromaticity similarity in the $B$ channel is described as follows:

S_{c}^{B} = \frac{1}{N} \sum_{i = 1}^{N} \frac{2 \times B_{s} (i) \times B_{o} (i) + C_{1}}{{B_{s} (i)}^{2} + {B_{o} (i)}^{2} + C_{1}},

(6)

The calculation of color feature similarity between $I_{s}$ and $I_{o}$ is outlined as follows:

S_{C} = \frac{1}{N} \sum_{i = 1}^{N} (\frac{2 \times A_{s} (i) \times A_{o} (i) + C_{1}}{{A_{s} (i)}^{2} + {A_{o} (i)}^{2} + C_{1}} \times \frac{2 \times B_{s} (i) \times B_{o} (i) + C_{1}}{{B_{s} (i)}^{2} + {B_{o} (i)}^{2} + C_{1}}),

(7)
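Equation (7) can be sketched in a few lines of Python. The snippet below is an illustration under simple assumptions: the A and B channels are passed as flattened lists of equal length, the function name `chroma_similarity` is hypothetical, and the default constant is the value of 40 that the paper selects.

```python
def chroma_similarity(a_s, a_o, b_s, b_o, c1=40.0):
    """Equation (7): mean over pixels of the product of the A and B terms."""
    total = 0.0
    for as_i, ao_i, bs_i, bo_i in zip(a_s, a_o, b_s, b_o):
        sim_a = (2 * as_i * ao_i + c1) / (as_i ** 2 + ao_i ** 2 + c1)
        sim_b = (2 * bs_i * bo_i + c1) / (bs_i ** 2 + bo_i ** 2 + c1)
        total += sim_a * sim_b
    return total / len(a_s)


# Identical chroma channels give a similarity of 1; a shift in A lowers it.
print(chroma_similarity([10, -5], [10, -5], [3, 7], [3, 7]))
print(chroma_similarity([10, -5], [0, -5], [3, 7], [3, 7]))
```

Because CIELAB chroma values can be zero or negative, the constant in the numerator and denominator is what keeps each per-pixel ratio defined for neutral pixels.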

Different values of $C_{1}$ lead to different chromaticity similarity values. To tune this constant, the TID2013 dataset was tested and the effect of $C_{1}$ was judged via the Spearman Rank Order Correlation Coefficient (SROCC) between the chromaticity similarity and the true value. The SROCC measures the monotonic correlation between two variables; it ranges over [−1, 1], and the closer it is to 1 or −1, the stronger the monotonic relationship. The results are shown in Figure 6: Figure 6a shows the SROCC as a function of $C_{1}$, and Figure 6b shows the rate of change of that curve. The SROCC increases rapidly as $C_{1}$ increases from 0 to about 40, which indicates that the monotonicity between the color similarity and the true value strengthens as the parameter grows. At about $C_{1} = 40$ the rate of increase slows and the curve enters a smooth growth phase; beyond this point the SROCC still rises, but only slowly and almost linearly. In practical applications, it may be necessary to find a balance between the performance improvement induced by the parameter value increase and the computational complexity. Too high a parameter value may not significantly improve the performance, while the computational cost will increase. All things considered, $C_{1} = 40$ is chosen in this paper.
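For readers unfamiliar with the SROCC, the following sketch computes it as the Pearson correlation of the rank vectors, with tied values sharing their mean rank. It is a generic textbook formulation and not the authors' code; the helper names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srocc(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Any strictly increasing relationship gives 1, even if it is non-linear.
print(srocc([0.2, 0.5, 0.7, 0.9], [1.0, 2.0, 2.5, 10.0]))
```

Since only ranks enter the calculation, the SROCC rewards a similarity measure for ordering images in the same way as the subjective scores, regardless of the shape of the mapping between them.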

2.2. Gradient Features of Digital Print Images Based on the Sobel Filter

In print quality assessment, gradient refers to the rate of change in color or brightness within an image. Gradient features reveal the sharpness of edges and details, as well as variations in texture. This information is essential for evaluating print quality. Image edges play a vital role in conveying visual information, with gradient features widely utilized in image quality assessment due to their ability to capture both edge structure and contrast variations effectively [12]. Various forms of gradient features have been integrated into image quality assessment metrics. For example, the Feature Similarity Index (FSIM) incorporates gradient and phase congruency features to evaluate the local quality of distorted images [13], while the Directional Similarity Measure (DASM) combines gradient magnitude, anisotropy, and local orientation features [14,15,16,17]. Commonly employed edge detection filters include those created by Sobel, Prewitt, and Scharr [18]. The advantage of the Sobel operator lies in its robustness to noise and its relatively accurate computation of edge direction. In this study, the Sobel filter was chosen to compute the gradient of the luminance channel in both $I_{o}$ and $I_{s}$. The gradient magnitude calculation formulas for these images using the Sobel filter on the L channel are as follows:

G_{o} (i) = \sqrt{{(h_{x} * L_{o} (i))}^{2} + {(h_{y} * L_{o} (i))}^{2}},

(8)

G_{s} (i) = \sqrt{{(h_{x} * L_{s} (i))}^{2} + {(h_{y} * L_{s} (i))}^{2}},

(9)

where $G_{o} (i)$ and $G_{s} (i)$ denote the gradient magnitude values at position index $i$ of the pre-press digital image and the printed and scanned image, respectively. The symbol $*$ indicates the convolution operation. $h_{x}$ and $h_{y}$ represent the horizontal and vertical Sobel filter templates, defined as follows:

h_{x} = \frac{1}{4} [\begin{matrix} 1 & 0 & - 1 \\ 2 & 0 & - 2 \\ 1 & 0 & - 1 \end{matrix}], h_{y} = \frac{1}{4} [\begin{matrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ - 1 & - 2 & - 1 \end{matrix}],

(10)

The similarity between the gradient magnitudes of $I_{o}$ and $I_{s}$ is computed pointwise and then averaged over all pixels:

G (i) = \frac{2 G_{o} (i) \times G_{s} (i) + C_{2}}{G_{o}^{2} (i) + G_{s}^{2} (i) + C_{2}},

(11)

S_{G} = \frac{1}{N} \sum_{i = 1}^{N} G (i),

(12)

In this context, $C_{2}$ serves as a stabilizing constant within the fraction to prevent division by zero. The strength of the relationship between gradient similarity and the Mean Opinion Score (MOS) was assessed using the SROCC across various $C_{2}$ values, as illustrated in Figure 7. An SROCC value closer to one signifies a stronger monotonic correlation between the two sets of scores. The maximum SROCC is achieved when $C_{2}$ is set to 66, so $C_{2} = 66$ is used in this study. In Figure 7, the red number and dotted line mark the turning point of the curve, where it changes from an upward trend to a downward trend.
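Equations (8) to (12) can be illustrated with the plain-Python sketch below. It makes two assumptions that the text does not specify: the luminance channel is given as a 2D list, and only interior pixels are evaluated so that no border padding is needed. The function names are hypothetical.

```python
import math

# Sobel templates of Equation (10), before the 1/4 factor.
HX = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
HY = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]


def gradient_magnitude(lum):
    """Equations (8)-(9) on interior pixels; lum is a 2D list of L values."""
    rows, cols = len(lum), len(lum[0])
    mags = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            # Correlation is used instead of a flipped convolution; for these
            # templates the flip only changes the sign of gx and gy, so the
            # magnitude is unaffected.
            for dr in range(3):
                for dc in range(3):
                    v = lum[r + dr - 1][c + dc - 1]
                    gx += HX[dr][dc] * v / 4
                    gy += HY[dr][dc] * v / 4
            mags.append(math.hypot(gx, gy))
    return mags


def gradient_similarity(lum_o, lum_s, c2=66.0):
    """Equations (11)-(12): mean of the pointwise gradient similarity."""
    g_o, g_s = gradient_magnitude(lum_o), gradient_magnitude(lum_s)
    sims = [(2 * a * b + c2) / (a * a + b * b + c2) for a, b in zip(g_o, g_s)]
    return sum(sims) / len(sims)


edge = [[0, 0, 10, 10] for _ in range(4)]   # vertical edge
flat = [[5, 5, 5, 5] for _ in range(4)]     # edge lost in reproduction
print(gradient_similarity(edge, edge), gradient_similarity(edge, flat))
```

In the second call the edge has vanished from the comparison image, so every interior pixel contributes a ratio well below one and the score drops accordingly.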

2.3. Printing Image Structural Features Based on the SSIM

Structure refers to the layout and organization of textures, details, and patterns within an image. The quality of structure significantly influences the visual quality of the image and the overall perception of printed materials. Based on the premise that the human visual system is well suited for extracting structural information from the scene, Wang et al. [14] introduced a novel concept for measuring image quality known as the SSIM. This concept defines separate functions to measure the luminance, contrast, and structural similarity between two images; here they are evaluated for $I_{o}$ and $I_{s}$ in the L channel. Their similarity is calculated as the product of the luminance similarity $l (I_{o}, I_{s})$, the contrast similarity $c (I_{o}, I_{s})$, and the structural similarity $s (I_{o}, I_{s})$, resulting in an overall measure of image similarity:

S_{B} (I_{o}, I_{s}) = l (I_{o}, I_{s}) \times c (I_{o}, I_{s}) \times s (I_{o}, I_{s}),

(13)

where:

l (I_{o}, I_{s}) = \frac{2 μ_{L_{o}} μ_{L_{s}} + C_{3}}{{(μ_{L_{o}})}^{2} + {(μ_{L_{s}})}^{2} + C_{3}},

(14)

c (I_{o}, I_{s}) = \frac{2 δ_{L_{o}} δ_{L_{s}} + C_{4}}{{(δ_{L_{o}})}^{2} + {(δ_{L_{s}})}^{2} + C_{4}},

(15)

s (I_{o}, I_{s}) = \frac{δ_{L_{o} L_{s}} + C_{5}}{δ_{L_{o}} δ_{L_{s}} + C_{5}},

(16)

Let the parameters satisfy $C_{4} = 2 C_{5}$; then $S_{B} (I_{o}, I_{s})$ can be expressed as follows:

S_{B} (I_{o}, I_{s}) = \frac{2 μ_{L_{o}} μ_{L_{s}} + C_{3}}{{(μ_{L_{o}})}^{2} + {(μ_{L_{s}})}^{2} + C_{3}} \cdot \frac{2 δ_{L_{o} L_{s}} + C_{4}}{{(δ_{L_{o}})}^{2} + {(δ_{L_{s}})}^{2} + C_{4}},

(17)

In these equations, $μ_{L_{o}}$ and $μ_{L_{s}}$ represent the luminance features using the mean value. Taking $L_{o}$ as an example, the calculation is as follows:

μ_{L_{o}} = \frac{1}{N} \sum_{i = 1}^{N} L_{o} (i),

(18)

The value of $μ_{L_{s}}$ is calculated in the same way. $δ_{L_{o}}$ and $δ_{L_{s}}$ represent the contrast feature using the standard deviation. For instance, the calculation for $L_{o}$ is as follows:

δ_{L_{o}} = \sqrt{\frac{1}{N - 1} \sum_{i = 1}^{N} {(L_{o} (i) - μ_{L_{o}})}^{2}},

(19)

The value of $δ_{L_{s}}$ is calculated in the same way. $δ_{L_{o} L_{s}}$ represents the covariance used to characterize the structural feature, which is calculated as follows:

δ_{L_{o} L_{s}} = \frac{1}{N - 1} \sum_{i = 1}^{N} (L_{o} (i) - μ_{L_{o}}) (L_{s} (i) - μ_{L_{s}}),

(20)

$C_{3} = {(k_{1} L^{'})}^{2}$ and $C_{4} = {(k_{2} L^{'})}^{2}$ are constants employed to ensure stability. Here, $L^{'}$ denotes the dynamic range of the pixel values. For an 8-bit grayscale image, $L^{'} = 255$, with $k_{1} = 0.01$ and $k_{2} = 0.03$. Substituting these values, we obtain $C_{3} = 6.5025$ and $C_{4} = 58.5225$.
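The combined form in Equation (17), together with the statistics in Equations (18) to (20), can be sketched as follows. The snippet follows the equations as written, computing the statistics over whatever list of pixels it is given; whether that list is a whole image or a local window is left open here, and the function name `ssim_statistic` is illustrative.

```python
def ssim_statistic(lo, ls, c3=6.5025, c4=58.5225):
    """Equation (17) from the mean, variance, and covariance of two L channels."""
    n = len(lo)
    mu_o, mu_s = sum(lo) / n, sum(ls) / n
    # Sample variance and covariance with the 1 / (N - 1) factor.
    var_o = sum((v - mu_o) ** 2 for v in lo) / (n - 1)
    var_s = sum((v - mu_s) ** 2 for v in ls) / (n - 1)
    cov = sum((a - mu_o) * (b - mu_s) for a, b in zip(lo, ls)) / (n - 1)
    luminance = (2 * mu_o * mu_s + c3) / (mu_o ** 2 + mu_s ** 2 + c3)
    contrast_structure = (2 * cov + c4) / (var_o + var_s + c4)
    return luminance * contrast_structure


# The stabilizing constants follow from k1 = 0.01, k2 = 0.03 and L' = 255.
print((0.01 * 255) ** 2, (0.03 * 255) ** 2)

reference = [50.0, 60.0, 70.0, 80.0]
darker = [40.0, 48.0, 56.0, 64.0]
print(ssim_statistic(reference, reference), ssim_statistic(reference, darker))
```

The variance terms in the code are the squares of the standard deviations in Equation (19), which is why no square root appears.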

3. Extracting the Frequency Domain Features of Printed Images

3.1. Image Multiscale and Multiorientation Texture Features Based on Gabor Filters

Texture features can capture the surface structure and details of printed samples, which are crucial for evaluating the fidelity and consistency of print quality. Texture reflects certain patterns of color and grayscale variations on the surface of an object, serving as effective information to distinguish objects with similar spectral properties but different spatial distribution characteristics. It is widely used in the extraction of image information. Multiscale texture analysis methods include wavelet transform, Gabor transform, pyramid decomposition, and contourlet transform, among others. In multiscale image analysis, the support region of the Gabor filter’s main function has a rectangular structure, with its length and width varying with the scale. Therefore, it can optimally and sparsely describe the edges of an image and the approximate singular curves with the minimum number of coefficients [19].

Gabor filters can precisely adjust their response to specific orientations, making them highly effective in capturing directional information in images. Each Gabor filter is highly sensitive to local frequency components along a specific orientation. In contrast, wavelet transforms (such as Haar wavelets, Daubechies wavelets, etc.) typically have limited directional selectivity, mainly capturing horizontal, vertical, and diagonal orientations. Although extensions like 2D directional wavelets exist to provide more directional sensitivity, they are generally more complex than Gabor filters in terms of orientation control. Gabor filters are often used to simulate biological visual perception due to their similarity to the response characteristics of simple cells in the human visual system, especially in processing visual textures and edges [19,20,21,22,23]. While wavelet transforms excel in image compression and denoising, their biological relevance is typically not as strong as that of Gabor filters.

Gabor transformation can effectively analyze grayscale variations in images at various scales and orientations. A two-dimensional Gabor filter consists of a series of bandpass filters with given orientation angles and spatial frequencies. For a given frequency and orientation angle, it is defined as follows:

g (x, y, λ, θ) = \frac{1}{2 π σ_{x} σ_{y}} \exp \left[- \frac{1}{2} \left(\frac{x^{2}}{σ_{x}^{2}} + \frac{y^{2}}{σ_{y}^{2}}\right)\right] \times \exp \left[i \left(\frac{2 π x}{λ}\right)\right],

(21)

In the formula

x = x_{0} \cos θ + y_{0} \sin θ,

(22)

y = {- x}_{0} \sin θ + y_{0} \cos θ,

(23)

$(x_{0}, y_{0})$ is the spatial position coordinate of the image pixel, $λ$ is the wavelength, which is the reciprocal of spatial frequency, and $θ$ is the angle between the filter orientation and the $y_{0}$ axis. $σ_{x}$ and $σ_{y}$ are the standard deviations of the Gaussian envelope in the $x$ and $y$ directions, calculated as follows:

σ_{x} = \frac{λ}{π} \times \sqrt{\frac{\ln 2}{2}} \times \frac{2^{B_{w}} + 1}{2^{B_{w}} - 1},

(24)

σ_{y} = \frac{σ_{x}}{γ},

(25)

where $B_{w}$ and $γ$ are the frequency bandwidth and the spatial aspect ratio of the filter, respectively.

Here, $g (x, y, λ, θ)$ is a complex-valued matrix. Its real and imaginary components are convolved separately with the L channel of $I_{o}$ and of $I_{s}$; the two responses are then combined and the modulus is calculated.
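The construction of a single filter from Equations (21) to (25) can be sketched as below, reading Equation (21) as the usual Gaussian envelope multiplied by a complex sinusoid. The default bandwidth of 1 and aspect ratio of 0.5 are placeholder assumptions, since the text does not state the values used, and the function names are illustrative.

```python
import cmath
import math


def gabor_sigmas(lam, bandwidth=1.0, gamma=0.5):
    """Equations (24)-(25): envelope standard deviations from the wavelength."""
    ratio = (2 ** bandwidth + 1) / (2 ** bandwidth - 1)
    sigma_x = lam / math.pi * math.sqrt(math.log(2) / 2) * ratio
    return sigma_x, sigma_x / gamma


def gabor_kernel(lam, theta, half_size, bandwidth=1.0, gamma=0.5):
    """Complex Gabor kernel sampled on a (2 * half_size + 1)^2 pixel grid."""
    sx, sy = gabor_sigmas(lam, bandwidth, gamma)
    kernel = []
    for y0 in range(-half_size, half_size + 1):
        row = []
        for x0 in range(-half_size, half_size + 1):
            # Rotated coordinates of Equations (22)-(23).
            x = x0 * math.cos(theta) + y0 * math.sin(theta)
            y = -x0 * math.sin(theta) + y0 * math.cos(theta)
            envelope = math.exp(-0.5 * (x * x / sx ** 2 + y * y / sy ** 2))
            carrier = cmath.exp(1j * 2 * math.pi * x / lam)
            row.append(envelope * carrier / (2 * math.pi * sx * sy))
        kernel.append(row)
    return kernel


# One filter of a bank: wavelength 8 pixels, orientation 45 degrees.
k = gabor_kernel(lam=8.0, theta=math.pi / 4, half_size=8)
print(len(k), len(k[0]), abs(k[8][8]))
```

A bank with five scales and four orientations would call `gabor_kernel` once per combination of wavelength and angle; the response magnitude at each pixel is then the modulus of the complex filter output.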

In Figure 8, we present an example of an original image extracted from the TID2013 database alongside the image’s response after being filtered by a Gabor filter bank with five scales and four orientations. From the illustration, it is evident that larger-scale filters offer superior noise reduction capabilities but may also lead to a loss of finer details.

The filters are grouped by scale, indexed by $l$ ($l = 1, 2, 3, 4, 5$), and each scale has four orientations ($θ = 0^{\circ}, 45^{\circ}, 90^{\circ}, 135^{\circ}$). Therefore, when computing the similarity of texture features at scale $l$, the correlation coefficients for the four orientations are averaged as follows:

C C_{l} (E_{o}, E_{s}) = \frac{1}{4} \sum_{θ} \frac{\sum_{i = 1}^{N} (E_{s}^{θ} (i) - \bar{E_{s}^{θ}}) \cdot (E_{o}^{θ} (i) - \bar{E_{o}^{θ}})}{\sqrt{\sum_{i = 1}^{N} {(E_{s}^{θ} (i) - \bar{E_{s}^{θ}})}^{2} \cdot \sum_{i = 1}^{N} {(E_{o}^{θ} (i) - \bar{E_{o}^{θ}})}^{2}}},

(26)

where $E_{o}^{θ}$ and $E_{s}^{θ}$ denote the response energy of the Gabor filter at the $l$-th scale and orientation $θ$ for the pre-print digital image and the scanned image of the printed product, respectively. $\bar{E_{o}^{θ}}$ and $\bar{E_{s}^{θ}}$ represent the corresponding mean energies at orientation $θ$.
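Equation (26) is the Pearson correlation between the two response maps, averaged with equal weight over the orientations of one scale. A minimal sketch follows; it assumes the responses are supplied as dictionaries mapping each orientation (in degrees) to a flattened list of response magnitudes, a layout chosen here purely for illustration.

```python
def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    mu_u, mu_v = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    den = (sum((a - mu_u) ** 2 for a in u)
           * sum((b - mu_v) ** 2 for b in v)) ** 0.5
    return num / den


def texture_similarity(energy_o, energy_s):
    """Equation (26): mean correlation over the orientations of one scale."""
    return sum(pearson(energy_s[t], energy_o[t]) for t in energy_o) / len(energy_o)


original = {0: [1.0, 2.0, 4.0], 45: [3.0, 1.0, 2.0],
            90: [0.5, 0.7, 0.2], 135: [2.0, 2.5, 4.0]}
# A scanned response that is a scaled, offset copy correlates perfectly.
scanned = {t: [2 * v + 3 for v in vals] for t, vals in original.items()}
print(texture_similarity(original, scanned))
```

Note that a response map with no variation would make the denominator zero, so a practical implementation would need to guard that case.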

3.2. Image Spatial Frequency Characteristics Based on Discrete Cosine Transform (DCT)

Spatial frequency features reveal the distribution of various frequency components in printed samples, which helps evaluate image sharpness and detail fidelity. Contrast sensitivity is a critical visual attribute of the human visual system (HVS), where different sensitivities exist for various distortions in images. Hence, we utilize discrete cosine transform (DCT) on the luminance channel L of the image, leveraging the sub-band features derived from the image’s DCT [24,25,26,27] to emulate the contrast sensitivity function (CSF) within the HVS.

Performing 4 × 4 DCT and 8 × 8 DCT transformations on the TID2013 dataset, we calculated the spatial frequency similarity and the SROCC indicator between MOSs. The results, as shown in Table 3, indicate that the 4 × 4 transformation yields better results compared to the 8 × 8 transformation. The computation complexity of 4 × 4 DCT blocks is lower, and due to their smaller size, they can handle local changes and details in the image more accurately, especially in regions with rich, high-frequency details.

According to the research by S.-H. Bae [28], we partition the spatial frequency domain into regions with varying sensitivities to distortion. Three indicators of spatial frequency similarity are represented by comparing the contrast energy values of low-frequency (LF), mid-frequency (MF), and high-frequency (HF) regions within 4 × 4 discrete cosine transform (DCT) blocks. Figure 9 illustrates the spatial contrast sensitivity function (CSF) within the LF, MF, and HF regions of the 4 × 4 DCT blocks.

The discrete cosine transforms of two-dimensional images are as follows:

F (u, v) = c (u) c (v) \sum_{x = 0}^{M - 1} \sum_{y = 0}^{N - 1} f (x, y) \cos \frac{π (2 x + 1) u}{2 M} \cos \frac{π (2 y + 1) v}{2 N}, \quad u = 0, 1, \dots, M - 1, \quad v = 0, 1, \dots, N - 1,

(27)

where

c (u) = \begin{cases} \frac{1}{\sqrt{M}}, & u = 0 \\ \sqrt{\frac{2}{M}}, & u = 1, 2, \dots, M - 1 \end{cases} \qquad c (v) = \begin{cases} \frac{1}{\sqrt{N}}, & v = 0 \\ \sqrt{\frac{2}{N}}, & v = 1, 2, \dots, N - 1 \end{cases}

(28)

The key advantage of two-dimensional discrete cosine transform (DCT) lies in its ability to concentrate a significant portion of the image’s energy within specific frequency ranges [25]. DCT transforms signals or images from the time domain (spatial domain) to the frequency domain [26]. This facilitates a better understanding of the frequency components present in the signal or image. In the frequency domain, one can observe the contributions of various frequency components in the signal or image, aiding in further analysis and processing.
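To make the energy-compaction property concrete, the sketch below evaluates the two-dimensional DCT of Equations (27) and (28) directly on a 4 × 4 block. It normalizes $c(u)$ by the block height $M$ and $c(v)$ by the width $N$, which coincide for the square blocks used here. The direct quadruple loop is chosen for readability rather than speed, and the function name is illustrative.

```python
import math


def dct_2d(block):
    """Orthonormal 2D DCT of an M x N block given as a list of rows."""
    m, n = len(block), len(block[0])

    def c(k, size):
        # 1 / sqrt(size) for k = 0, sqrt(2 / size) otherwise.
        return math.sqrt((1 if k == 0 else 2) / size)

    out = [[0.0] * n for _ in range(m)]
    for u in range(m):
        for v in range(n):
            acc = 0.0
            for x in range(m):
                for y in range(n):
                    acc += (block[x][y]
                            * math.cos(math.pi * (2 * x + 1) * u / (2 * m))
                            * math.cos(math.pi * (2 * y + 1) * v / (2 * n)))
            out[u][v] = c(u, m) * c(v, n) * acc
    return out


# A perfectly flat block puts all of its energy into the DC coefficient.
flat = [[3.0] * 4 for _ in range(4)]
coeffs = dct_2d(flat)
print(round(coeffs[0][0], 6), round(coeffs[1][2], 6))
```

With this normalization the transform is orthonormal, so the sum of squared coefficients equals the sum of squared pixel values in the block.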

The contrast energy map extracted from the LF region of the image is defined as described in [25]:

φ_{L} = \sum_{(u, v) \in R_{L}} p^{2} (u, v),

(29)

where $p (u, v)$ represents the magnitude of the DCT coefficient at $(u, v)$ and $R_{L}$ denotes the low-frequency region. Similarly, the energies of the mid-frequency and high-frequency regions are computed as follows:

φ_{M} = \sum_{(u, v) \in R_{M}} p^{2} (u, v),

(30)

φ_{H} = \sum_{(u, v) \in R_{H}} p^{2} (u, v),

(31)

Similar to the SSIM, the similarity index takes the form of $(2 \cdot a b + c) / (a^{2} + b^{2} + c)$. Therefore, the similarity index between $I_{o}$ and $I_{s}$ in the low-frequency domain is defined as follows:

S_{L} = \frac{1}{n} \sum_{(u, v) \in R_{L}} \frac{2 φ_{o L} \cdot φ_{s L} + C_{6}}{{φ_{o L}}^{2} + {φ_{s L}}^{2} + C_{6}},

(32)

where $φ_{o L}$ and $φ_{s L}$ represent the contrast energy maps of $I_{o}$ and $I_{s}$ in the low-frequency domain, respectively, $n$ denotes the number of 4 × 4 pixel blocks into which the image is divided, and $C_{6}$ is a constant that is used to maintain stability. Similarly, the similarity in the mid-frequency and high-frequency domains is given as follows:

S_{M} = \frac{1}{n} \sum_{(u, v) \in R_{M}} \frac{2 φ_{o M} \cdot φ_{s M} + C_{7}}{{φ_{o M}}^{2} + {φ_{s M}}^{2} + C_{7}},

(33)

S_{H} = \frac{1}{n} \sum_{(u, v) \in R_{H}} \frac{2 φ_{o H} \cdot φ_{s H} + C_{8}}{{φ_{o H}}^{2} + {φ_{s H}}^{2} + C_{8}},

(34)

where $φ_{o M}$ and $φ_{s M}$ represent the contrast energy maps of $I_{o}$ and $I_{s}$ in the MF region, respectively, and $φ_{o H}$ and $φ_{s H}$ represent the contrast energy maps of $I_{o}$ and $I_{s}$ in the HF region, respectively. As before, $n$ denotes the number of 4 × 4 pixel blocks into which the image is divided, and $C_{7}$ and $C_{8}$ are two positive constants that control numerical stability. To determine appropriate values for these constants, a series of tests was conducted. Various values of $C_{6}$, $C_{7}$, and $C_{8}$ were examined, with the SROCC used as the metric, to evaluate the relationship between the spatial frequency similarity and the MOS. As shown in Figure 10, the blue curve represents the SROCC under different constant values, while the red curve indicates the rate of change of the SROCC. Figure 10a,d represent the low-frequency region, Figure 10b,e represent the mid-frequency region, and Figure 10c,f represent the high-frequency region. The results indicate that as the constant increases, the SROCC rises in all three regions, but the growth rate gradually decreases. This suggests that excessively high values of the constant do not significantly improve performance. Considering all factors, the values $C_{6} = 0.03$, $C_{7} = 0.04$, and $C_{8} = 0.03$ were selected.
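As a rough illustration of Equations (32)–(34), the sketch below averages the SSIM-style term $(2ab + c)/(a^{2} + b^{2} + c)$ over per-block band energies. It rests on one interpretation, namely that the sum runs over the $n$ blocks with one energy value per block and band. The function names are hypothetical, and the scaling of the energy values is left to the caller; only the constants are taken from the text.

```python
from statistics import fmean

# Stability constants reported in the text for the LF, MF, and HF bands.
C6, C7, C8 = 0.03, 0.04, 0.03


def similarity_term(a, b, c):
    """SSIM-style similarity of two non-negative values a and b."""
    return (2.0 * a * b + c) / (a * a + b * b + c)


def band_similarity(energy_o, energy_s, c):
    """Mean similarity over blocks for one frequency band.

    energy_o and energy_s hold one band energy per block for the original
    image I_o and the scanned image I_s (ASSUMPTION: same block order).
    """
    if len(energy_o) != len(energy_s) or not energy_o:
        raise ValueError("need two non-empty sequences of equal length")
    return fmean(similarity_term(a, b, c) for a, b in zip(energy_o, energy_s))
```

The term equals 1 when the two energies match and decreases as they diverge, and the constant keeps the ratio well defined when both energies are close to zero.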

3.3. Neural Network Regression

Machine learning serves as an effective solution for addressing regression problems, and various well-established methods have proven successful in image quality assessment (IQA) regression. These methods include the k-nearest neighbor (KNN) model [30], support vector machines (SVMs) [31,32,33], random forest [34,35], and neural networks [36]. While linear regression and KNN are relatively straightforward to compute, they often struggle to achieve optimal results. SVMs are difficult to apply to large-scale training sets: their reliance on quadratic programming to solve for the support vectors involves computations on a matrix of order m (where m is the number of samples). When the number of samples is large, storing and processing this matrix consumes substantial memory and computation time.

The MLP neural network possesses powerful nonlinear fitting capabilities. Through learning, it automatically adjusts its internal weights, gradually adapting to different tasks and environments. This enables it to generalize well in regression problems, and it also demonstrates good robustness and fault tolerance. Based on these advantages, we chose the MLP neural network to train a regression mapping model between quality description metrics and subjective image evaluation scores.

We construct the regression function $f(X)$, which, given the input metrics, yields the final objective quality assessment score:

Q = f (S_{C}, S_{G}, S_{S}, {CC}_{1}, {CC}_{2}, {CC}_{3}, {CC}_{4}, {CC}_{5}, S_{L}, S_{M}, S_{H}).

(35)

The architecture of the MLP neural network is depicted in Figure 11. It consists of two hidden layers. The first hidden layer processes 11-dimensional input data and produces 64 output nodes. The second hidden layer then takes these 64 nodes and reduces them to 32 output nodes. A ReLU activation function is applied between these two hidden layers to introduce nonlinearity.
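The layer sizes described above can be made concrete with a small forward-pass sketch. It is not the trained model: the weights are random, the single linear output unit producing $Q$ is an assumption (the text specifies only the two hidden layers), and applying ReLU after the second hidden layer as well as the first is likewise assumed. All function names are hypothetical.

```python
import random


def make_layer(n_in, n_out, rng):
    """Randomly initialized dense layer: (weights[n_out][n_in], biases[n_out])."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def linear(layer, x):
    """Affine map y = W x + b."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def build_mlp(seed=0):
    """11 inputs -> 64 hidden -> 32 hidden -> 1 output (output size ASSUMED)."""
    rng = random.Random(seed)
    return [make_layer(11, 64, rng), make_layer(64, 32, rng), make_layer(32, 1, rng)]


def predict(mlp, features):
    """Map the 11 similarity features of Equation (35) to a scalar score Q."""
    if len(features) != 11:
        raise ValueError("expected 11 input features")
    h1 = relu(linear(mlp[0], features))   # first hidden layer, 64 nodes
    h2 = relu(linear(mlp[1], h1))         # second hidden layer, 32 nodes
    return linear(mlp[2], h2)[0]          # ASSUMED linear output unit
```

In practice the weights would be fitted to the MOS values by minimizing a regression loss; the sketch only shows how the 11 features flow through the 64- and 32-node layers.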

4. Experimental Results and Analysis

To assess the effectiveness of the proposed metrics, experiments were conducted on two standard databases: TID2013 [37] and TID2008 [38]. Both datasets consist of reference images and distorted versions of them, and they are used to evaluate the perceptual quality of images. The TID2008 dataset comprises 25 reference images along with their respective distorted versions. The TID2013 dataset comprises 25 reference images and 3000 distorted images derived from them. In both datasets, the distortions encompass noise, blur, compression artifacts, and various other types. Four key performance metrics were employed for a quantitative evaluation: the Pearson Linear Correlation Coefficient (PLCC), the SROCC, the Kendall Rank Order Correlation Coefficient (KROCC), and the Root Mean Square Error (RMSE). The PLCC gauges the model’s prediction accuracy, reflecting its ability to predict subjective assessments with minimal error. The RMSE measures the consistency of the model’s predictions. The SROCC and the KROCC indicate the monotonicity of the model’s predictions, showing how well the predicted ranking agrees with the subjective ranking. The proposed method was compared with nine classical Full-Reference image quality assessment (FR-IQA) methods: GMSD [17], GSM [16], IFC [39], MAD [40], MSSIM [41], PSNR [42], SSIM [14], VIF [43], and VSI [44].

The color space transformation is performed on both the original digital image and the scanned print image, as illustrated in Figure 12.

Using the TID2013 dataset as an example, the images in the dataset were printed and subsequently scanned to obtain scanned images. The relationship between the chromaticity similarity index in the CIELAB space, computed according to the method described in Section 2.1, and the MOS provided by the dataset itself was then examined. Figure 13 illustrates the distribution of the chromaticity similarity against the MOS for printed images in the CIELAB color space.

The data points in the scatter plot are consistently distributed and densely clustered, which supports the validity of the method outlined in Section 2.1.

Similarly, this study examines the correlation between the gradient similarity metric proposed in Section 2.2 and the MOS. Figure 14 illustrates the distribution of gradient similarity against the MOS.

The scatter plot again shows a relatively uniform, concentrated distribution of data points, which supports the method proposed in Section 2.2.

Images were randomly selected from the TID2008 and TID2013 standard datasets, and the SSIM values between the pre-print digital images and the scanned images were measured along with their energy histograms, as shown in Figure 15. The energy distributions of the two images are nearly identical in the low-frequency region. In the high-frequency region, however, the energy of the pre-print digital image is higher than that of the scanned image, because printing cannot perfectly reproduce continuous-tone images. It can also be concluded that printing quality is poorer in regions of deep color. This is reasonable because printing machines have a saturation point: once the ink density reaches a certain level, the color becomes saturated, and further increasing the ink amount does not change the color.

This paper selects features from Gabor filters with four orientations and five scales. The orientation parameter $θ$ is selected as follows:

θ_{k} = \frac{k π}{n}, k = 0, 1, 2, \dots, n - 1,

(36)

Here, $n$ represents the total number of filter orientations. In this paper, four orientations are selected, so the angles are $θ_{0} = 0°$, $θ_{1} = 45°$, $θ_{2} = 90°$, and $θ_{3} = 135°$.

The parameter $f$ of the Gabor filter represents the frequency of the filter, and it is calculated as follows:

σ \cdot f = \frac{1}{π} \sqrt{\frac{\ln 2}{2}} \times \frac{2^{B} + 1}{2^{B} - 1},

(37)

where $B$ represents the bandwidth of the filter. It is typically set to $B = 1$, which gives $σ \cdot f ≈ 0.56$. For a filter bank, the frequencies can be expressed as follows:

f_{k} = a^{- k} f_{max}, k = 0, 1, \dots, m - 1,

(38)

where $m$ represents the total number of frequencies (scales) in the filter bank and $a = \sqrt{2}$. In this study, five scales are chosen, and thus $m = 5$.
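The parameter choices in Equations (36)–(38) can be checked numerically with a short sketch. The function names are hypothetical, and the maximum frequency `f_max = 0.25` used in the usage lines is only a placeholder assumption, since its value is not stated here.

```python
import math


def orientations(n):
    """Orientation angles in radians, theta_k = k * pi / n (Equation (36))."""
    return [k * math.pi / n for k in range(n)]


def sigma_times_f(bandwidth=1.0):
    """Product sigma * f for a given bandwidth B (Equation (37))."""
    ratio = (2.0 ** bandwidth + 1.0) / (2.0 ** bandwidth - 1.0)
    return (1.0 / math.pi) * math.sqrt(math.log(2.0) / 2.0) * ratio


def frequencies(f_max, m, a=math.sqrt(2.0)):
    """Filter-bank frequencies f_k = a**(-k) * f_max (Equation (38))."""
    return [a ** (-k) * f_max for k in range(m)]


# Four orientations and five scales, as chosen in this study.
thetas_deg = [math.degrees(t) for t in orientations(4)]
product = sigma_times_f(1.0)
bank = frequencies(0.25, 5)  # f_max = 0.25 is an ASSUMED placeholder
```

With $B = 1$ the product evaluates to roughly 0.562, which rounds to the value of 0.56 quoted above, and each step in scale lowers the frequency by a factor of $\sqrt{2}$.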

Figure 16 illustrates the real and imaginary parts of the Gabor filter.

For model training, 80% of the images from each database were randomly selected for training, and the remaining 20% were used for testing. To ensure fair comparisons, the training–testing process was repeated 1000 times, and the median of the results over the 1000 repetitions was taken as the final outcome to eliminate performance bias. The indicators used to measure the strengths and weaknesses of the quality evaluation metrics are described below:

The SROCC metric evaluates the monotonic relationship between the image quality assessment metrics and the subjective evaluation, indicating whether the objective assessment maintains a consistent trend with changes in the subjective evaluation. The SROCC value ranges from −1 to 1, with values closer to 1 or −1 indicating stronger monotonicity.
The PLCC metric assesses the linear correlation between the image quality assessment metrics and the subjective evaluation, with values ranging from −1 to 1. Positive values denote a positive correlation, negative values signify a negative correlation, and 0 indicates no correlation. The closer the absolute value is to one, the more accurate the metric’s evaluation.
The KROCC metric measures the rank correlation between two variables, with values ranging from −1 to 1. A value close to 1 indicates strong agreement between the two rankings, a value close to −1 indicates strong disagreement, and a value of 0 indicates no rank association.
The RMSE quantifies the deviation between the image quality assessment metrics and the subjective evaluation, with smaller values indicating higher consistency. An RMSE value approaching 0 signifies the high accuracy of the metrics.
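
The four indicators and the repeated 80/20 split protocol described above can be sketched in plain Python as follows. This is an illustrative version under stated assumptions rather than the evaluation code used in this work: the KROCC is computed as Kendall's tau-a (no tie correction), the SROCC uses average ranks for ties, and `median_over_splits` together with its `evaluate` callback is a hypothetical helper.

```python
import math
import random
import statistics


def plcc(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def srocc(x, y):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(x), average_ranks(y))


def krocc(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


def rmse(pred, target):
    """Root mean square error between predictions and subjective scores."""
    return math.sqrt(statistics.fmean((p - t) ** 2 for p, t in zip(pred, target)))


def median_over_splits(n_samples, evaluate, repeats=1000, train_fraction=0.8, seed=0):
    """Median of evaluate(train_idx, test_idx) over repeated random splits."""
    rng = random.Random(seed)
    cut = int(train_fraction * n_samples)
    scores = []
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        scores.append(evaluate(indices[:cut], indices[cut:]))
    return statistics.median(scores)
```

Here `evaluate` would train the regressor on the training indices and return one indicator, for example the SROCC, computed on the test indices.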

This paper compares nine commonly used FR-IQA metrics, including GMSD, GSM, IFC, MAD, MSSIM, PSNR, SSIM, VIF, and VSI. Table 4 presents the results of testing these nine common metrics along with the proposed metric on two standard databases. The bolded numbers denote the evaluation metrics with the best performance. Based on the descriptions of the SROCC, PLCC, KROCC, and RMSE, combined with the data in the table, it is evident that the performance of the proposed FFSF metric is superior.

The choice of regression method leads to different evaluation outcomes. To identify a suitable regression approach for feature fusion, this section compares random forest (RF), support vector regression (SVR), and the MLP neural network. As shown in Table 5, the MLP outperforms the RF on three of the four metrics, the exception being the RMSE. The results of the SVR are comparatively less favorable. Furthermore, since the RMSE ranges from 0 to positive infinity, the RMSE values of the RF and the MLP fall within a similar, small range, indicating comparable performance on this metric. In summary, the evaluation metrics obtained through regression using the MLP neural network surpass those obtained using the RF. The bolded numbers denote the evaluation metrics with the best performance.

The study also compared results from the same MLP neural network model trained with different loss functions. The findings are summarized in Table 6, which shows that the MSE loss outperforms the MAE loss on all four metrics for both databases.

To compare existing metrics with human subjective ratings, scatter plots were generated, as shown in Figure 17. These plots display the evaluation results of nine commonly used metrics relative to the Mean Opinion Score (MOS) in the TID2013 database. The GMSD metric shows poor monotonicity and a weak trend in the fitted curve, indicating varied GMSD scores for the same MOS. GMSD primarily assesses image gradient magnitude similarity and does not fully encompass subjective image quality perception, which is also influenced by factors such as color, contrast, and noise. Hence, different GMSD scores may occur for identical MOS values. Similar issues are observed with the IFC metric. In contrast, the GSM, MAD, MSSIM, SSIM, VIF, and VSI metrics perform relatively better, showing more pronounced growth trends in the scatter plots and fitted curves. Among them, SSIM performs notably well despite its high data dispersion. Conversely, PSNR performs the least favorably: it analyzes images solely at the pixel level and neglects their multidimensional attributes, diverging from real-world conditions.

Overall, the individual evaluation methods yield notably dispersed results in the MOS scatter plots and lack definitive trends. Curve fitting further underscores the disparity between the objective quality assessment results and the subjective ratings.

Compared to other methods, the metrics proposed in this paper demonstrate significant advantages in their evaluation results. As shown in the scatter plot in Figure 18, the data distribution is more concentrated, and the curve fitting shows a good linear relationship. This result indicates a high level of consistency between the objective evaluation scores of the metrics proposed in this paper and the subjective scores.

Taking the TID2013 dataset as an example, Figure 19 clearly illustrates the variations in the loss functions of nine popular metrics and the metric proposed in this paper during the training of the MLP neural network. Table 7 provides detailed convergence values of each loss function. Through comparative analysis, it is evident that although the loss functions of individual evaluation methods can converge, their effectiveness is not satisfactory. This indicates that a single evaluation method is insufficient to fully meet the complex requirements of printing image quality assessment. In contrast, the metrics proposed in this paper demonstrate significant advantages in the performance of the loss function. Not only is the training effect more ideal, but the final convergence values are also superior compared to other metrics.

5. Conclusions

This paper presents an effective and reliable metric for assessing the quality of printed images that closely mimics human visual perception. Compared to mainstream single-dimensional evaluation metrics, the proposed metric offers a comprehensive evaluation of printed images from multiple dimensions, significantly improving the accuracy and reliability of the assessment. It combines spatial and frequency domain features, starting with data analysis in the CIELAB color space. In the spatial domain, it extracts color and gradient features, calculates the similarity of the different features using similarity measures, and then computes structural similarity. Subsequently, it applies the Gabor transform and the discrete cosine transform (DCT) to the images, utilizing texture and spatial frequency features as complementary aspects of printed image quality. Multiscale and multi-directional texture features, along with spatial frequency features, are employed for a comprehensive analysis of the frequency domain information of the images, and similarity measures are used to assess similarity at the frequency domain level. Finally, an MLP neural network regression model is trained to predict the overall quality scores of the printed images. Extensive experiments on publicly available databases demonstrate that, with the selected parameters, this method conforms closely to subjective perception and exhibits high consistency with human visual characteristics.

Samples in publicly available databases may lack diversity in certain aspects, as some databases might focus primarily on specific types of prints or particular printing conditions. This limitation could result in inadequate generalization of the results to different types of prints or conditions. Future research should employ more diverse and comprehensive databases to ensure the broader applicability of the findings. Future research directions include enhancing the accuracy and robustness of evaluation models to more comprehensively reflect print quality. One promising avenue is to explore the application of deep learning and reinforcement learning in print quality assessment. For instance, convolutional neural networks (CNN) can be used to extract image features, while reinforcement learning can be employed to optimize the evaluation process. Improving the self-learning and adaptive capabilities of these models will further enhance the efficiency and effectiveness of the assessments.

Author Contributions

Methodology, J.Z.; software, J.Z.; validation, J.Z., H.Z., F.X. and Y.Z.; data curation, J.Z.; writing—original draft preparation, J.Z.; writing—review and editing, J.Z., H.Z., F.X. and Y.Z.; visualization, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by national key plan, grant number 2018YFB1309401.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author in order to protect the integrity and accuracy of the research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

MLP	Multilayer perceptron.
IQA	Image quality assessment.
NR-IQA	No-Reference image quality assessment.
BIQI	Blind Image Quality Index.
NIQE	Natural Image Quality Evaluator.
CNN-based IQA	Convolutional Neural Network-based Image Quality Assessment.
GSM	Gradient structural similarity metric.
PSNR	Peak Signal-to-Noise Ratio.
MTF	Modulation transfer function.
FFSF	Spatial-Frequency Domain Feature Fusion.
SSIM	Structural similarity index.
DCT	Discrete cosine transform.
SURF	Speeded-Up Robust Features.
SROCC	Spearman Rank Order Correlation Coefficient.
PLCC	Pearson Linear Correlation Coefficient.
KROCC	Kendall Rank Order Correlation Coefficient.
RMSE	Root Mean Square Error.
FR-IQA	Full-Reference Image Quality Assessment.
FSIM	Feature Similarity Index.
DASM	Directional Similarity Measure.
MOS	Mean Opinion Score.
CSF	Contrast sensitivity function.
LF	Low frequency.
MF	Mid frequency.
HF	High frequency.
KNN	K-nearest neighbor.
SVMs	Support vector machines.
RF	Random forest.

References

1. Maqsood, N.; Rimašauskas, M. Influence of printing process parameters and controlled cooling effect on the quality and mechanical properties of additively manufactured CCFRPC. Compos. Commun. 2022, 35, 101338.
2. Jing, X.; Astling, S.; Jessome, R.; Maggard, E.; Nelson, T.; Shaw, M.; Allebach, J.P. A general approach for assessment of print quality. Proc. SPIE 2013, 8653, 175–184.
3. Eerola, T.; Lensu, L.; Kälviäinen, H.; Bovik, A.C. Study of no-reference image quality assessment algorithms on printed images. J. Electron. Imaging 2014, 23, 061106.
4. Mittal, A.; Moorthy, A.K.; Bovik, A.C. No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 2012, 21, 4695–4708.
5. Liang, D.; Gao, X.; Lu, W.; Li, J. Deep blind image quality assessment based on multiple instance regression. Neurocomputing 2021, 431, 78–89.
6. Wang, Z.; Li, Q. Information content weighting for perceptual image quality assessment. IEEE Trans. Image Process. 2010, 20, 1185–1198.
7. ISO 12233:2023; Photography—Electronic Still Picture Imaging—Resolution and Spatial Frequency Responses. International Organization for Standardization: Geneva, Switzerland, 2023.
8. Gupta, S.; Thakur, K.; Kumar, M. 2D-human face recognition using SIFT and SURF descriptors of face’s feature regions. Vis. Comput. 2021, 37, 447–456.
9. Ze, S.; Jin, Y.; Peng, L.; Huan, Z.; Jun, J. Colour space conversion model from CMYK to CIELab based on CS-WNN. Color. Technol. 2021, 137, 272–279.
10. Li, Y.; Bi, Y.; Zhang, W.; Ren, J.; Chen, J. M2GF: Multi-Scale and Multi-Directional Gabor Filters for image edge detection. Appl. Sci. 2023, 13, 9409.
11. Durmus, D. CIELAB color space boundaries under theoretical spectra and 99 test color samples. Color Res. Appl. 2020, 45, 796–802.
12. Manikonda, S.K.; Gaonkar, D.N. Islanding detection method based on image classification technique using histogram of oriented gradient features. IET Gener. Transm. Distrib. 2020, 14, 2790–2799.
13. Liu, Y.; Fan, K.; Wu, D.; Zhou, W. Filter pruning by quantifying feature similarity and entropy of feature maps. Neurocomputing 2023, 554, 126297.
14. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
15. Gu, K.; Qiao, J.; Min, X.; Yue, G.; Lin, W.; Thalmann, D. Evaluating quality of screen content images via structural variation analysis. IEEE Trans. Vis. Comput. Graph. 2018, 24, 2689–2701.
16. Cai, Q.; Cui, G.C.; Wang, H.X. EEG-based emotion recognition using multiple kernel learning. Mach. Intell. Res. 2022, 19, 472–484.
17. Xue, W.; Zhang, L.; Mou, X.; Bovik, A.C. Gradient magnitude similarity deviation: A highly efficient perceptual image quality index. IEEE Trans. Image Process. 2014, 23, 684–695.
18. Pei, X.; Hong, Y.; Chen, L.; Guo, Q.; Duan, Z.; Pan, Y.; Hou, H. Robustness of machine learning to color, size change, normalization, and image enhancement on micrograph datasets with large sample differences. Mater. Des. 2023, 232, 112086.
19. Zhao, X.; Tao, R.; Li, W.; Philips, W.; Liao, W. Fractional Gabor convolutional network for multisource remote sensing data classification. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5503818.
20. Choi, J.Y.; Lee, B. Ensemble of deep convolutional neural networks with Gabor face representations for face recognition. IEEE Trans. Image Process. 2019, 29, 3270–3281.
21. Hill, P.; Achim, A.; Al-Mualla, M.E.; Bull, D. Contrast sensitivity of the wavelet, dual tree complex wavelet, curvelet, and steerable pyramid transforms. IEEE Trans. Image Process. 2016, 25, 2739–2751.
22. Ni, Z.; Zeng, H.; Ma, L.; Hou, J.; Chen, J.; Ma, K.-K. A Gabor feature-based quality assessment model for the screen content images. IEEE Trans. Image Process. 2021, 27, 4516–4528.
23. Samantaray, A.K.; Rahulkar, A. New design of adaptive Gabor wavelet filter bank for medical image retrieval. IET Image Process. 2020, 14, 679–687.
24. Dua, S.; Singh, J.; Parthasarathy, H. Image forgery detection based on statistical features of block DCT coefficients. Procedia Comput. Sci. 2020, 171, 369–378.
25. Yuan, Z.; Liu, D.; Zhang, X.; Wang, H.; Su, Q. DCT-based color digital image blind watermarking method with variable steps. Multimed. Tools Appl. 2020, 79, 30557–30581.
26. Sharma, S.; Sharma, H.; Sharma, J.B. Artificial bee colony based perceptually tuned blind color image watermarking in hybrid LWT-DCT domain. Multimed. Tools Appl. 2021, 80, 18753–18785.
27. Sawant, S.S.; Manoharan, P. Unsupervised band selection based on weighted information entropy and 3D discrete cosine transform for hyperspectral image classification. Int. J. Remote Sens. 2020, 41, 3948–3969.
28. Bae, S.-H.; Kim, M. A Novel DCT-Based JND Model for Luminance Adaptation Effect in DCT Frequency. IEEE Signal Process. Lett. 2013, 20, 893–896.
29. Bae, S.-H.; Kim, M. A novel image quality assessment with globally and locally consilient visual quality perception. IEEE Trans. Image Process. 2016, 25, 2392–2406.
30. Tu, B.; Wang, J.; Kang, X. KNN-Based representation of superpixels for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 4032–4047.
31. Ding, Y.; Zhao, Y.; Zhao, X. Image quality assessment based on multi-feature extraction and synthesis with support vector regression. Signal Process. Image Commun. 2017, 54, 81–92.
32. Du, S.; Yan, Y.; Ma, Y. Blind image quality assessment with the histogram sequences of high-order local derivative patterns. Digit. Signal Process. 2016, 55, 1–12.
33. Narwaria, M.; Lin, W. SVD-based quality metric for image and video using machine learning. IEEE Trans. Syst. Man Cybern. B Cybern. 2012, 42, 347–364.
34. Pei, S.-C.; Chen, L. Image quality assessment using human visual DOG model fused with random forest. IEEE Trans. Image Process. 2015, 24, 3282–3292.
35. Phan, T.N.; Kuch, V.; Lehnert, L.W. Land cover classification using Google Earth Engine and random forest classifier—The role of image composition. Remote Sens. 2020, 12, 2411.
36. Ertuğrul, O.F. A novel type of activation function in artificial neural networks: Trained activation function. Neural Netw. 2018, 99, 148–157.
37. Ponomarenko, N.; Jin, L.; Ieremeiev, O.; Lukin, V.; Egiazarian, K.; Astola, J.; Vozel, B.; Chehdi, K.; Carli, M.; Battisti, F.; et al. Image database TID2013: Peculiarities, results and perspectives. Signal Process. Image Commun. 2015, 1, 55–77.
38. Ponomarenko, N.; Lukin, V.; Zelensky, A.; Egiazarian, K.; Carli, M.; Battisti, F. TID2008: A database for evaluation of full-reference visual quality assessment metrics. Adv. Mod. Radioelectron. 2009, 10, 30–45.
39. Chang, H.; Zhang, Q.; Wu, Q.; Gan, Y. Perceptual image quality assessment by independent feature detector. Neurocomputing 2015, 151, 1142–1152.
40. Eskicioglu, A.M.; Fisher, P.S. Image quality measures and their performance. IEEE Trans. Commun. 1995, 43, 2959–2965.
41. Rajagopal, H.; Khairuddin, A.S.M.; Mokhtar, N.; Ahmad, A.; Yusof, R. Application of image quality assessment module to motion-blurred wood images for wood species identification system. Wood Sci. Technol. 2019, 53, 967–981.
42. Tanchenko, A. Visual-PSNR measure of image quality. J. Vis. Commun. Image Represent. 2014, 25, 874–878.
43. Hou, R.; Zhou, D.; Nie, R.; Liu, D.; Xiong, L.; Guo, Y.; Yu, C. VIF-Net: An unsupervised framework for infrared and visible image fusion. IEEE Trans. Comput. Imaging 2020, 6, 640–651.
44. Zhang, L.; Shen, Y.; Li, H. VSI: A visual saliency-induced index for perceptual image quality assessment. IEEE Trans. Image Process. 2014, 23, 4270–4281.

Figure 1. ISO 12233 test chart (partial).

Figure 2. The modulation transfer function (MTF) of the Epson Expression 13000XL Scanner.

Figure 3. The flow chart of the proposed metrics.

Figure 4. Pretreatment flow chart.

Figure 5. Tone reproduction curve.

Figure 6. The impact of different values of $C_{1}$ on the similarity of data chromaticity: (a) the SROCC varies with the parameter $C_{1}$, (b) the rate of change of SROCC with respect to the parameter $C_{1}$.

Figure 7. The impact of different values of $C_{2}$ on the similarity of data gradients.

Figure 8. The texture information of a reference image after filtering using a Gabor filter bank in five scales.

Figure 9. The spatial CSF in LF, MF and HF regions for a 4 × 4 DCT block (adapted from [29]): (a) the classification of frequency regions depending on the CSF obtained in [29], (b) the three frequency regions in a 4 × 4 DCT block.

Figure 10. The impact of different invariant constants on spatial frequency similarity: (a) the SROCC in the low-frequency region, (b) the SROCC in the mid-frequency region, (c) the SROCC in the high-frequency region, (d) the growth rate of the SROCC in the low-frequency region, (e) the growth rate of the SROCC in the mid-frequency region, and (f) the growth rate of the SROCC in the high-frequency region.

Figure 11. Diagram of the MLP neural network architecture.

Figure 12. The visualization of the image after being transformed into the CIELAB color space: (a) The scanned print image, (b) the scanned image’s L channel, (c) the scanned image’s A channel, (d) the scanned image’s B channel, (e) the original image, (f) the original image’s L channel, (g) the original image’s A channel, and (h) the original image’s B channel.

Figure 13. A scatter plot of the image chromaticity similarity and the MOS in the CIELAB color space.

Figure 14. A scatter plot of the gradient similarity against the MOS for the images.

Figure 15. Histogram of the SSIM values for the selected scanned images compared to the original images: (a) image A, (b) image B.

Figure 16. Visualization of the Gabor filter’s kernel.

Figure 17. A scatter plot of subjective scores versus predicted scores for objective evaluation metrics in the TID2013 database: (a) GMSD, (b) GSM, (c) IFC, (d) MAD, (e) MSSIM, (f) PSNR, (g) SSIM, (h) VIF, and (i) VSI.

Figure 18. A scatter plot of subjective scores versus predicted scores for the FFSF evaluation metrics in the TID2013 database.

Figure 19. The loss convergence curves for 10 metrics during training.

Table 1. The technical specifications of the Epson Expression 13000XL Scanner.

Technical Parameters	Value
Transparency Area	309 mm × 420 mm
Optical Density	3.8D
Color Bit Depth	48 bit
Photosensitive Element	12-line color CCD with micro lenses

Table 2. Device ICC tag sheet.

	X	Y	Z
$redColorant_{Tag}$	0.4361	0.2225	0.0139
$greenColorant_{Tag}$	0.3851	0.7169	0.0971
$blueColorant_{Tag}$	0.1431	0.0606	0.7141

Table 3. A comparison of the results for DCT variations at two scales.

	4 × 4 DCT	8 × 8 DCT
Low frequency	0.559	0.535
Mid frequency	0.595	0.510
High frequency	0.465	0.401

Table 4. A performance comparison of 10 metrics on two databases.

		GMSD	GSM	IFC	MAD	MSSIM	PSNR	SSIM	VIF	VSI	FFSF
TID2013	KROCC	0.6339	0.6255	0.6785	0.7327	0.5977	0.7161	0.5588	0.6665	0.7183	0.7720
	PLCC	0.8590	0.8464	0.8791	0.9071	0.8319	0.9080	0.7895	0.8769	0.9000	0.9397
	RMSE	0.6346	0.6603	0.5905	0.5219	0.6880	0.5193	0.7608	0.5959	0.5404	0.4256
	SROCC	0.8044	0.7946	0.8697	0.9052	0.7779	0.8925	0.7417	0.8510	0.8965	0.9316
TID2008	KROCC	0.7092	0.6596	0.7009	0.7294	0.6636	0.9120	0.5758	0.6991	0.7123	0.8017
	PLCC	0.8788	0.8422	0.8810	0.8899	0.8579	0.7395	0.7732	0.8762	0.8762	0.9485
	RMSE	0.6404	0.7235	0.6349	0.6120	0.6895	0.9078	0.8511	0.6468	0.6466	0.4277
	SROCC	0.8907	0.8504	0.8903	0.9051	0.8559	0.5628	0.7749	0.8840	0.8979	0.9444
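
The four criteria reported in Table 4 can be computed from paired subjective and predicted scores roughly as follows. This is a sketch using only the Python standard library, not the authors' evaluation code. The score lists are made-up illustration data, no nonlinear fitting is applied before PLCC and RMSE (an assumption, since this section does not say whether one was used), and the Kendall coefficient is the simple tau-a form, which assumes no tied scores.

```python
import math


def plcc(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(ranks(x), ranks(y))


def krocc(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            total += (product > 0) - (product < 0)
    return total / (n * (n - 1) / 2)


def rmse(x, y):
    """Root mean squared error between the two score lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical scores for illustration only (not data from the paper).
mos = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.8, 3.2, 3.9, 5.1]

print(plcc(mos, predicted), srocc(mos, predicted),
      krocc(mos, predicted), rmse(mos, predicted))
```

Higher KROCC, PLCC, and SROCC and lower RMSE indicate closer agreement with the subjective scores, which is how the columns of Table 4 are compared.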

Table 5. A performance comparison of three regression methods (RF, SVR, and MLP).

		RF	SVR	MLP
TID2013	KROCC	0.7777	0.7426	0.8452
	PLCC	0.9426	0.9180	0.9680
	RMSE	0.4095	0.6907	0.4907
	SROCC	0.9315	0.9083	0.9483
TID2008	KROCC	0.7921	0.7497	0.8197
	PLCC	0.9429	0.9159	0.9659
	RMSE	0.4422	0.6017	0.5333
	SROCC	0.8382	0.8738	0.9118

Table 6. MLP regression models with different loss functions.

		MAE	MSE
TID2013	KROCC	0.6974	0.8452
	PLCC	0.8637	0.9680
	RMSE	0.6784	0.4907
	SROCC	0.8745	0.9483
TID2008	KROCC	0.7619	0.8197
	PLCC	0.9031	0.9659
	RMSE	0.8769	0.5333
	SROCC	0.7697	0.9118

Table 7. The loss convergence values for 10 metrics during training.

	GMSD	GSM	IFC	MAD	MSSIM	PSNR	SSIM	VIF	VSI	FFSF
LOSS	1.505	0.834	1.214	0.929	0.874	1.172	0.767	0.869	1.040	0.138

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Zhong, J.; Zhan, H.; Xu, F.; Zhang, Y. Research on a Multidimensional Digital Printing Image Quality Evaluation Method Based on MLP Neural Network Regression. Appl. Sci. 2024, 14, 5986. https://doi.org/10.3390/app14145986

