Quality Assessment of 3D Synthesized Images Based on Textural and Structural Distortion Estimation

Emerging 3D-related technologies such as augmented reality, virtual reality, mixed reality, and stereoscopy have seen remarkable growth due to their numerous applications in the entertainment, gaming, and electromedical industries. In particular, 3D television (3DTV) and free-viewpoint television (FTV) enhance viewers’ television experience by providing immersion. They need an infinite number of views to provide full parallax to the viewer, which is not practical due to various financial and technological constraints. Therefore, novel 3D views are generated from a set of available views and their depth maps using depth-image-based rendering (DIBR) techniques. The quality of a DIBR-synthesized image may be compromised for several reasons, e.g., inaccurate depth estimation. Since depth is important in this application, inaccuracies in depth maps lead to different textural and structural distortions that degrade the quality of the generated image and result in a poor quality of experience (QoE). Therefore, quality assessment of DIBR-generated images is essential to guarantee a satisfactory QoE. This paper aims at estimating the quality of DIBR-synthesized images and proposes a novel 3D objective image quality metric. The proposed algorithm measures both textural and structural distortions in the DIBR image by exploiting contrast sensitivity and the Hausdorff distance, respectively. The two measures are combined to estimate an overall quality score. The experimental evaluations performed on the benchmark MCL-3D dataset show that the proposed metric is reliable and accurate, and performs better than existing 2D and 3D quality assessment metrics.


Introduction
Three-dimensional (3D) technologies, e.g., augmented reality, virtual reality, mixed reality, and stereoscopy, have lately enjoyed remarkable growth due to their numerous applications in the entertainment industry, gaming industry, for electro-medical equipment, etc. 3D television (3DTV) and the recent free-viewpoint television (FTV) [1] have enhanced users' television experience by providing immersion. 3DTV projects two views of the same scene from slightly different viewpoints to provide the depth sensation. The FTV, in addition to the immersive experience, enables the viewer to enjoy the scene from different viewpoints by changing his/her position in front of the television. To provide a full parallax, FTV needs dozens of views, ideally an infinite number of views. Capturing, coding, and transmitting such a large number of views is not practical due to various financial and technological constraints, such as limited available bandwidth. Therefore, novel 3D video (3DV) formats and representations have been explored to design compression-friendly and cost-efficient solutions. The multiview video plus depth (MVD) format is considered to be the most suitable for 3D televisions. In addition to color images, MVD also provides the corresponding depth maps, which represent the geometry of the 3D scene.
The additional dimension of depth in MVD provides the ability to generate novel views from a set of available views using the depth-image-based rendering (DIBR) technique [2], thus enabling stereoscopy. The quality of the synthesized views is important for a pleasant user experience. Since the depth maps are usually generated using stereo-matching algorithms [3], they are often inaccurate. The inaccuracies in depth maps, when used in DIBR, might introduce various distortions in the synthesized images, degrading their quality and resulting in a poor quality of experience (QoE). Thus, assessing the quality of the DIBR-synthesized views is necessary to ensure a satisfactory user experience.
Inaccuracies in depth maps cause textural and structural distortions such as ghost artifacts and inconsistent object shifts in the synthesized views [4][5][6][7][8]. Texture and depth compression also introduce artifacts in the virtual images [9,10]. Another factor that causes degradation in virtual image quality is occluded areas in the original view that become visible in the virtual view, which are called holes. These holes are usually estimated using image inpainting techniques that do not always produce a pleasant reconstruction. Figure 1a shows the artifacts introduced in a synthesized view due to visible occluded regions. Note the distorted face of a spectator in Figure 1b because of erroneous depth in DIBR. The various structural and textural distortions introduced in DIBR images may affect the picture quality, the depth sensation, and the visual comfort, which are considered three main factors of user quality-of-experience (QoE) [6]. Besides viewing experience, studies show that the distortion in 3D images can affect the performance of various applications designed for the 3D environment, such as image saliency detection, video target tracking, face detection, and event detection [11][12][13]. This means that the image quality is very important not only for viewer satisfaction in a stereoscopic environment but also for various 3D applications built for this environment. Therefore, 3D image quality assessment (3D-IQA) is an essential part of the 3D video processing chain.
In this paper, we propose a 3D-IQA metric to estimate the quality of DIBR-synthesized images. The proposed metric aims to measure the structural and textural distortions introduced in the synthesized image due to depth-image-based rendering and combines them to predict the overall quality of the image. The structural details in an image are considered important for its quality as the human visual system (HVS) is more sensitive to them [14,15]. It is the difference between luminance or color that makes the representation of an object or the main features of an image distinguishable. The distortion in these features, referred to as textural distortion, is also important for a true image quality estimation. The textural and structural metric scores are combined to obtain an overall quality score.
The rest of the paper is organized as follows. Section 2 reviews the related literature, and Section 3 presents the proposed 3D-IQA technique. The experimental evaluation of the proposed metric is carried out in Section 4, and we conclude the research in Section 5.

Related Work
The quality of an image can be either assessed through subjective tests or by using an automated objective metric [16]. As human eyes are the ultimate receiver of the image, a subjective test is certainly the best and the most reliable way to assess the visual image quality. In such tests, a set of human observers assigns quality scores to the image, which are averaged to get one score. This method, however, is a time-consuming and expensive approach. Therefore, it was felt necessary to introduce an automatic and fast way to assess the quality of an image. This provides the opportunity for researchers to introduce objective metrics for quantitative image quality evaluation, which proves to be a significant improvement in the field of image quality assessment.
Objective image and video quality metrics can be grouped into three classes based on the availability of the original reference images: full-reference (FR), no-reference (NR), and reduced-reference (RR) [17]. An IQA metric that requires the original reference image to evaluate the quality of its distorted version is referred to as a full-reference metric. An IQA approach that assesses the quality of an image in the absence of a corresponding reference image is classified as a no-reference metric. The reduced-reference metrics lie between the two categories: they do not require the reference images, but some of their features must be available for comparison.
In the literature, several 2D and 3D objective quality assessment metrics have been proposed to assess visual image quality. Initially, 2D metrics were used to assess the quality of 3D content. However, conventional 2D metrics were found inappropriate for assessing the true quality of 3D images because 2D-IQA algorithms do not consider several additional factors of 3D videos [18][19][20]. Therefore, novel IQA algorithms were needed to evaluate the quality of 3D videos. Such algorithms, in addition to 2D-related artifacts, must also consider artifacts introduced due to the additional dimension of depth in the videos.
In recent years, several algorithms have been proposed to evaluate the quality of 3D images. Many of them utilize the existing 2D quality metrics for this purpose, e.g., [21][22][23][24]. Since these algorithms rely on metrics especially designed for 2D images, they do not consider the most important factor of 3D images, i.e., depth, and therefore they are not accurate and reliable.
Many 3D-IQA techniques consider depth/disparity information while assessing the quality of 3D images, e.g., [25][26][27][28]. You et al. [19] adopted a belief-propagation-based method to estimate the disparity and combined the quality maps of the distorted image and the distorted disparity computed using conventional 2D metrics. The method proposed in [25] exploits the disparity as well as binocular rivalry to determine the quality. It uses the Multi-scale Structural Similarity Index Measure (MSSIM) [29] metric to evaluate the quality of the disparity of stereo images. Zhan et al. [26] presented a machine-learning-based method that learns features from 2D-IQA metrics together with specially designed 3D features; the Scale Invariant Feature Transform (SIFT) flow algorithm [30] was used to obtain the depth information. Different features of disparity and three types of distortions (blur, noise, and compression) were used in [28] to evaluate the quality of 3D images. These features were used to train a quality prediction model by using the random forest regression algorithm. The method proposed in [18] addressed the issue of structural distortion in a synthesized view due to DIBR, but the method is limited to structural distortions, so it cannot be used to evaluate the overall quality of the image.
The 3D-IQA method presented in [31] identifies the disocclusion edges in the synthesized image and inversely maps them to the original image, and the corresponding regions are then compared to assess the quality. The algorithm in [32] uses feature matching points in the synthesized and reference images to compute the quality degradation. The Just Noticeable Difference (JND) model is exploited in [33] to compute the global sharpness and distortion in holes in the DIBR image to assess its quality. The quality metric proposed in [34] identifies the critical blocks in the DIBR-synthesized image and the reference image. The texture and color contrast similarities between these blocks are compared to estimate the quality of the synthesized image. The method in [35] works by extracting the features of energy-weighted spatial and temporal information and entropy. Then, support vector regression uses these features for depth estimation. Gorley et al. proposed a stereo-bandlimited contrast method in [36] that considers contrast sensitivity and luminance changes as important factors for the assessment of image quality. The method presented in [37] extracts the natural scene features from the discrete cosine transform (DCT) domain and trains a deep belief network (DBN) model to obtain deep features. These deep features and DMOS values are used to train a support vector regression (SVR) model to predict the image quality. The learning framework proposed in [38] also uses a regression model to learn the features and, besides assessing the quality, it also improves the quality of stereo images. The method proposed in [39] captures the global visual characteristics by using structural similarities and evaluates the local quality by computing the local magnitude and local phase. The global and local quality scores are combined to get the final score.
Binocular perception or binocular rivalry is an important factor in 3D image quality assessment [40,41]. Humans perceive images with both eyes and it is obvious that there is a difference between the perceptions of the left and the right eye in relation to an image. Indeed, binocular rivalry is the visual perception phenomenon in which there exists a difference in the perception of an image when it is seen from the left eye and the right eye. This difference is called the binocular parallax or binocular disparity. The binocular disparity can be divided into horizontal and vertical parallax. The horizontal parallax affects depth perception and the vertical parallax affects visual comfort [37]. This binocular perception was taken into account in [42] and a binocular fusion process was proposed for quality assessment of stereoscopic images. The 3D-IQA metric proposed in [41] is also based on binocular visual characteristics. A learning-based metric [43] uses binocular receptive field properties for assessing the quality of stereo images. Shao et al. [44] proposed a metric that simplifies the process of binocular quality prediction by dividing the problem into monocular feature encoding and binocular feature combination.
Lin et al. combine binocular integration behaviors such as binocular combination and binocular frequency integration with conventional 2D metrics in [45] to evaluate the quality of stereo images. Binocular spatial sensitivity influenced by binocular fusion and binocular rivalry properties was taken into consideration in [46]. The method proposed in [47] uses binocular responses, e.g., binocular energy response (BER), binocular rivalry response (BRR), and local structure distribution, for 3D-IQA. Quality assessment of asymmetrically distorted stereoscopic images was targeted in [48]. The method is inspired by binocular rivalry and it uses estimated disparity and Gabor filter responses to create an intermediate synthesized view whose quality is estimated using 2D-IQA algorithms. A multi-scale model using binocular rivalry is presented in [49] for quality assessment of 3D images. Numerous other 3D-IQA algorithms use binocular cues for evaluating the quality of 3D images, e.g., [50][51][52].

The Proposed Technique
In multiview video-plus-depth (MVD) format, depth-image-based rendering (DIBR) is used to generate virtual views at novel viewpoints to support 3D vision in stereoscopic and autostereoscopic displays. The DIBR obtains the virtual view by warping the original left and right views to a virtual viewpoint with the help of the corresponding depth maps. As discussed earlier, when the virtual view is generated its quality may degrade due to several structural or textural distortions introduced during synthesis. The major cause of these distortions is the inaccurate depth. This inaccuracy in the depth estimates and other compression-related artifacts can cause several distortions in the synthesized image, such as ghost artifacts, holes, and blurry regions, as shown in Figure 1. These distortions degrade the image quality and eventually result in poor overall user quality of experience (QoE). Estimating the quality of the synthesized image is therefore important to ensure better QoE. We propose a 3D-IQA metric that attempts to estimate the distortions introduced in synthesized images. Specifically, the proposed metric is a combination of two measures: one estimates the variations in the texture and the other calculates the deterioration in the structures in the image.

Estimating the Textural Distortion
Textures are complex visual patterns, composed of spatially organized entities that have characteristic brightness, color, shape, and size. The texture is an important discriminant characteristic of an image region [53] and can be used for various purposes such as segmentation, classification, and synthesis [54]. Image texture gives us information about the spatial arrangement of color or intensities in an image or a selected region of an image. During the process of DIBR, the texture of the synthesized image can be adversely affected due to object shifting, incorrect rendering of textured areas, and blurry regions [55]. Object shifting may cause translation or changes in the size of the region in the synthesized view. Due to the translation of objects, the occluded areas in the original view may become visible in the synthesized view, and these are known as holes. These holes are usually estimated using image inpainting techniques that do not always produce accurate reconstruction and result in the incorrect rendering of texture areas and blurry regions in the synthesized view. Given a DIBR-synthesized image and its corresponding reference, the proposed metric estimates the texture distortion by computing the local variations in their contrasts.
Image contrast is an important feature of texture, a basic perceptual attribute, and also an important characteristic of the human visual system (HVS) [56,57]. Contrast sensitivity is one of the dominating factors in the research of visual perception [58]. It can be defined as the difference between luminance or color that makes the representation of an object distinguishable. The most famous contrast computation methods are the Michelson and Weber contrast formulas [58]. There are a few methods that use some form of contrast to assess the quality of images [36,59,60,61].
The proposed metric captures the local variation in contrast of the synthesized image and its reference image. The two images are first low-pass filtered to smooth their high spatial frequencies. This is achieved with a small Gaussian filter $w$ of size 3 × 3.
In this filter, $\alpha_g$ is a normalization term that ensures $\sum_{i,j} w_{i,j} = 1$, and $\sigma_g$ is the Gaussian variance, which controls the weight distribution and the filter size. Let $I$ and $R$ be the filtered synthesized image and its reference image of size M × N. Let $x_{ij}$ represent a block of size m × n in image $I$ centered at pixel (i, j), and $y_{ij}$ be its corresponding block in reference image $R$ centered at pixel location (i, j). Let $x_i$ and $y_i$ represent the i-th corresponding blocks of $I$ and $R$. The mean $\mu$, variance $\sigma^2$, and standard deviation $\sigma$ of a block $x_{ij}$ are computed.
These statistics for $y_{ij}$ are computed analogously. The variation in contrast $\psi_{ij}$ between the blocks $x_{ij}$ and $y_{ij}$ is then computed from these statistics, with a small constant $c$ used to stabilize the equation. The $\psi$ scores of all pixels in $I$ are computed and averaged to obtain the texture distortion score $T$ of the synthesized image:
$$T = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \psi_{ij} \qquad (5)$$
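The local contrast comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the exact form of the contrast variation is not recoverable from the text, so the sketch assumes an SSIM-style contrast comparison, $(2\sigma_x\sigma_y + c)/(\sigma_x^2 + \sigma_y^2 + c)$. The Gaussian kernel form, `sigma=1.0`, the 3 × 3 block size, the value of `c`, and the replicated-border handling are likewise hypothetical choices.

```python
import math

def gaussian_kernel(size=3, sigma=1.0):
    """size x size Gaussian kernel, normalized so the weights sum to 1."""
    half = size // 2
    k = [[math.exp(-(i * i + j * j) / (2.0 * sigma * sigma))
          for j in range(-half, half + 1)] for i in range(-half, half + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def smooth(img, kernel):
    """Low-pass filter a 2D list image; borders are replicated (assumption)."""
    h, w = len(img), len(img[0])
    half = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-half, half + 1):
                for dj in range(-half, half + 1):
                    r = min(max(i + di, 0), h - 1)
                    c = min(max(j + dj, 0), w - 1)
                    acc += kernel[di + half][dj + half] * img[r][c]
            out[i][j] = acc
    return out

def block_std(img, i, j, half):
    """Standard deviation of the block centered at (i, j), clipped at borders."""
    h, w = len(img), len(img[0])
    vals = [img[r][c]
            for r in range(max(i - half, 0), min(i + half, h - 1) + 1)
            for c in range(max(j - half, 0), min(j + half, w - 1) + 1)]
    mu = sum(vals) / len(vals)
    return math.sqrt(sum((v - mu) ** 2 for v in vals) / len(vals))

def texture_score(synth, ref, half=1, c=1e-3, sigma=1.0):
    """Average the per-pixel contrast comparison over the whole image."""
    k = gaussian_kernel(3, sigma)
    I, R = smooth(synth, k), smooth(ref, k)
    h, w = len(I), len(I[0])
    total = 0.0
    for i in range(h):
        for j in range(w):
            sx, sy = block_std(I, i, j, half), block_std(R, i, j, half)
            # Assumed SSIM-style contrast term; equals 1 when sx == sy.
            total += (2 * sx * sy + c) / (sx * sx + sy * sy + c)
    return total / (h * w)
```

Under this assumed form the score is 1 when the local contrasts of the two images agree everywhere and drops below 1 as they diverge, for instance when inpainting blurs a textured region.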

Estimating the Structural Distortion
The study presented in [14] shows that the human visual system (HVS) is highly adapted for extracting structural information from the image. The inaccuracies and compression artifacts in the depth map adversely affect the structural details of the image during the process of DIBR, generally distorting the edges and gradients in the images [55,60]. The depth compression may cause the pixels to be lost or wrongly projected in the synthesized view. Similarly, the estimation inaccuracies in the depth cause ghost artifacts, inconsistent object shift, and distortion of edges in the synthesized view. These distortions in the image affect both the texture and the structure of the image. Therefore, it is equally important to compute the structural dissimilarities in the image to assess its quality. Several methods have been proposed to compute the structural similarity in 2D images, e.g., [29,60,62,63,64,65].
We used the Hausdorff distance [66] to compute the structural similarity score. The Hausdorff distance measures the degree of mismatch between two sets [66,67]. Similar to the texture distortion metric, this mismatch is also computed locally. The Hausdorff distance can be computed for grayscale images, e.g., [68], and for binary images [66,67]. In the proposed metric, since we want to estimate the distortion in the structural details in the warped image compared to the reference image, the edges in the two images are detected and these edge images are used to estimate the degree of mismatch. Any edge detector can be used for this purpose; however, similar to [66], in our study we used the Canny edge detector [69] to compute the edge maps. The Hausdorff distance between two image blocks $x_{ij}$ and $y_{ij}$ of size m × n centered at location (i, j) in images $I$ and $R$, respectively, as defined in the preceding section, is computed as follows:
$$HD(x_{ij}, y_{ij}) = \max\big(hd(x_{ij}, y_{ij}),\ hd(y_{ij}, x_{ij})\big) \qquad (6)$$
The function $hd(x_{ij}, y_{ij})$ is called the directed Hausdorff distance from $x_{ij}$ to $y_{ij}$ and it can be defined as
$$hd(x_{ij}, y_{ij}) = \max_{a \in x_{ij}} \min_{b \in y_{ij}} \lVert a - b \rVert \qquad (7)$$
where $\lVert a - b \rVert$ denotes the distance between the points $a$ and $b$. Equation (7) identifies the point $a$ in $x_{ij}$ that is the farthest from any point in $y_{ij}$ and measures its distance from the nearest neighboring point in $y_{ij}$. In other words, the function $hd(x_{ij}, y_{ij})$ ranks each point of $x_{ij}$ according to its distance from the nearest neighbor in $y_{ij}$ and picks the largest of these ranked distances, because the corresponding point is the most mismatched point between the reference and distorted image blocks. Similarly, the directed Hausdorff distance from $y_{ij}$ to $x_{ij}$ is computed. Here, $hd(x_{ij}, y_{ij})$ represents the degree of mismatch from the synthesized image block to the original image block, and $hd(y_{ij}, x_{ij})$ represents the degree of mismatch from the original image block to the synthesized one. The larger of the two is chosen as the mismatch score (Equation (6)). The obtained value is then normalized.
The normalized value $HD_{ij}$ falls in the interval [0, 1]. Recall that $HD_{ij}$ is the degree of mismatch, so the structural similarity score is computed by subtracting this normalized Hausdorff score ($HD_{ij}$) from 1.
The structural scores for all k blocks are computed and averaged to obtain a single score S.
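The block-wise Hausdorff comparison can be illustrated with a short Python sketch. It operates on binary edge blocks (lists of 0/1 rows), such as those a Canny detector would produce. Several details are assumptions made only for illustration: the Euclidean norm as the point distance, normalization by the block diagonal, and the scores returned when one or both blocks contain no edge pixels. None of these is stated in the text.

```python
import math

def directed_hausdorff(a_pts, b_pts):
    """hd(A, B): the largest distance from a point in A to its nearest point in B."""
    return max(min(math.dist(a, b) for b in b_pts) for a in a_pts)

def hausdorff(a_pts, b_pts):
    """HD(A, B): the larger of the two directed distances."""
    return max(directed_hausdorff(a_pts, b_pts),
               directed_hausdorff(b_pts, a_pts))

def edge_points(block):
    """Coordinates of the nonzero (edge) pixels of a binary block."""
    return [(r, c) for r, row in enumerate(block)
            for c, v in enumerate(row) if v]

def structural_score(edge_x, edge_y):
    """1 minus the normalized Hausdorff distance between two edge blocks."""
    m, n = len(edge_x), len(edge_x[0])
    px, py = edge_points(edge_x), edge_points(edge_y)
    if not px and not py:
        return 1.0   # assumption: two edge-free blocks match perfectly
    if not px or not py:
        return 0.0   # assumption: edges missing on one side is a full mismatch
    diag = math.hypot(m - 1, n - 1)  # assumed normalizer: block diagonal
    hd_norm = hausdorff(px, py) / diag if diag > 0 else 0.0
    return 1.0 - hd_norm

def mean_structural_score(block_pairs):
    """Average the per-block scores over all k block pairs."""
    scores = [structural_score(x, y) for x, y in block_pairs]
    return sum(scores) / len(scores)
```

For example, an edge that sits in the first column of a 4 × 4 block in one image and in the last column in the other gives a Hausdorff distance of 3 pixels, and with the assumed diagonal normalizer a structural score of about 0.29.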

Final Quality Score
The textural and structural scores of the synthesized image computed using Equations (5) and (10), respectively, are combined to compute the overall quality score Q.
The parameter α is used to adjust the relative importance of the textural and structural scores. Its value is empirically set to 0.7. In a stereoscopic environment, the quality scores obtained by the proposed metric for both views are averaged to get a single quality score. Figure 2 shows the results of the proposed metric on a few sample images from the testing dataset. Figure 2a shows the ground truth image. Figure 2b–e are images obtained by DIBR synthesis from the two source color and depth images; the source view images were artificially degraded by introducing additive white noise (AWN) at four different levels, with noise control parameters 5, 17, 33, and 53, respectively. Below each image, the scores estimated by the proposed metric and the respective subjective scores are reported. It can be noted that the visual quality of the synthesized images degrades as the noise in the source color and depth images increases, and our metric effectively captures this quality degradation.
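As a minimal sketch of the combination step, the snippet below assumes that the overall score is a convex combination $Q = \alpha T + (1 - \alpha) S$. The text describes a weighted average with α = 0.7 but does not show which score α multiplies, so the assignment of α to the textural score here is an assumption. The function names are hypothetical.

```python
def overall_quality(t_score, s_score, alpha=0.7):
    """Weighted average of textural and structural scores.

    Assumption: alpha weights the textural score T and (1 - alpha)
    weights the structural score S.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * t_score + (1.0 - alpha) * s_score

def stereo_quality(q_left, q_right):
    """Average the per-view scores of a stereo pair into one score."""
    return 0.5 * (q_left + q_right)
```

With T = 0.8 and S = 0.6, this assumed form gives Q = 0.74 for a single view.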

Experiments and Results
The performance of the proposed 3D-IQA metric was evaluated on the benchmark stereoscopic Media Communications Lab 3D (MCL-3D) dataset [70] and compared with existing 2D and 3D-IQA metrics. We conducted multiple experiments of different types to evaluate the performance and statistical significance of our proposed method.

Dataset
The MCL-3D dataset was used to evaluate the performance of the proposed quality metric. The dataset is designed to analyze the impact of different distortions on the quality of depth-image-based rendering (DIBR) synthesized images. The MCL-3D dataset was created by the Media Communications Lab, University of Southern California, and is publicly available [70]. The dataset was created from 9 multiview-video-plus-depth (MVD) sequences. The resolution of 3 MVD sequences is 1024 × 768, whereas the remaining 6 sequences have a resolution of 1920 × 1080. The dataset reports 648 mean opinion scores (MOSs) of stereo image pairs generated using DIBR from distorted texture and/or depth images. Six different types of distortions (Gaussian blur (Gauss), additive white noise (AWN), down-sampling blur (Sample), JPEG compression (JPEG), JPEG 2000 compression (JP2K), and transmission loss (Transloss)) with four different levels are applied to the texture images, the depth images, or both. The distorted texture images and depth maps are used to generate the intermediate middle virtual images by using the view synthesis reference software (VSRS) [71], a benchmark DIBR technique for generating synthesized views. Sample reference and distorted DIBR images are shown in Figure 3.

Performance Evaluation Parameters
To evaluate the performance of the proposed method we used different statistical tools, including the Pearson linear correlation coefficient (PLCC), Spearman rank-order correlation coefficient (SROCC), Kendall rank-order correlation coefficient (KROCC), root-mean-square error (RMSE), and mean absolute error (MAE). Before computing these parameters, the scores obtained by objective quality metrics were mapped to subjective differential mean opinion score (DMOS) values using the nonlinear logistic regression described in [72]:
$$DMOS_p = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + \exp\big(\beta_2 (o - \beta_3)\big)} \right) + \beta_4\, o + \beta_5 \qquad (12)$$
where $o$ is the score obtained by the objective quality metric, $DMOS_p$ is the mapped score, and $\beta_1, \ldots, \beta_5$ are the regression parameters. The Pearson linear correlation coefficient (PLCC) is used to determine the linear correlation between two continuous variables. Since this method is based on covariance computation, it is widely regarded as a reliable measure of linear association. The method was used in the prediction accuracy test. Let $x$ represent the MOS values, $y$ represent the mapped scores, and $\bar{x}$ and $\bar{y}$ represent the mean values of $x$ and $y$, respectively. PLCC is computed as
$$PLCC = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} \qquad (13)$$
The Pearson correlation coefficient describes how strong the relationship between the subjective MOS and the evaluated objective scores is. The value lies between −1 and 1. Values closer to 1 indicate a strong positive relationship.
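The nonlinear mapping step can be sketched as follows. The function assumes the five-parameter logistic form commonly used in the IQA literature for this purpose, which may differ in detail from the exact form applied in the experiments. The parameter values in the usage note are purely illustrative, and the guard against floating-point overflow is an implementation convenience rather than part of the method.

```python
import math

def logistic_map(o, b1, b2, b3, b4, b5):
    """Map an objective score o to a predicted subjective score.

    Assumed form: b1 * (1/2 - 1 / (1 + exp(b2 * (o - b3)))) + b4 * o + b5
    """
    z = b2 * (o - b3)
    # math.exp overflows for very large arguments; the fraction is ~0 there.
    frac = 0.0 if z > 700 else 1.0 / (1.0 + math.exp(z))
    return b1 * (0.5 - frac) + b4 * o + b5
```

In practice the five parameters would be fitted to the subjective scores by nonlinear least squares. At `o == b3` the logistic term vanishes, so `logistic_map(2.0, 10.0, 1.0, 2.0, 0.5, 3.0)` reduces to `0.5 * 2.0 + 3.0`.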
The Spearman rank-order correlation coefficient (SROCC) is a nonparametric measure of rank correlation. It assesses how well the relationship between two variables can be described using a monotonic function. The difference between PLCC and SROCC is that the former only assesses linear relationships whereas the latter assesses monotonic relationships that may or may not be linear. For n observations, the SROCC can be computed as
$$SROCC = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n (n^2 - 1)} \qquad (14)$$
where $d_i$ is the difference between the ranks of the i-th pair of observations. The Kendall rank correlation coefficient (KROCC) is another nonparametric measure to determine the relationship between two continuous variables. Like SROCC, it assesses associations based on the ranks of data. It is used to test the similarities in data when they are ranked by quantities.
$$KROCC = \frac{n_c - n_d}{\frac{1}{2} n (n - 1)} \qquad (15)$$
where $n$ is the sample size, $n_c$ is the number of concordant pairs, and $n_d$ is the number of discordant pairs. Root-mean-square error (RMSE) is one of the most widely used performance evaluation measures and it computes the prediction error [73]. Since the method takes the square of the error before computing the average, it gives a relatively high weight to large errors, which is why it is considered an important method for performance evaluation.
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - y_i)^2} \qquad (16)$$
Mean absolute error (MAE) is another method to compute the difference between two continuous variables. MAE is a linear score, which means that all the individual differences are weighted equally in the average.
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \lvert x_i - y_i \rvert \qquad (17)$$
Since PLCC, SROCC, and KROCC are the correlations and MAE and RMSE are the errors, large values of correlations and small values of errors indicate a better performance of the quality metric.
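The five evaluation parameters can be computed with a few lines of dependency-free Python, shown here as an illustrative sketch using the textbook definitions. The rank-difference form of SROCC is exact only when there are no tied scores, and the KROCC variant below makes no tie correction. Both are simplifications assumed for this sketch.

```python
import math

def _ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def plcc(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den

def srocc(x, y):
    """Spearman coefficient from rank differences (no-ties formula)."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(_ranks(x), _ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

def krocc(x, y):
    """Kendall coefficient from concordant and discordant pair counts."""
    n, nc, nd = len(x), 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            nc += s > 0
            nd += s < 0
    return (nc - nd) / (n * (n - 1) / 2)

def rmse(x, y):
    """Root-mean-square error between two score lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def mae(x, y):
    """Mean absolute error between two score lists."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)
```

Here `x` would hold the subjective scores and `y` the objective scores after the nonlinear mapping. The pairwise loop in `krocc` is quadratic in the number of images, which is acceptable for a dataset of a few hundred stereo pairs.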

Performance Comparison with 2D and 3D-IQA Metrics
To evaluate the effectiveness of the proposed method, we compared its performance with various existing 2D and 3D-IQA metrics. We first compared the proposed metric with widely used 2D quality assessment metrics: PSNR, SSIM [60], VSNR [74], IFC [75], MSSIM [29], VIF [76], and UQI [77]. Before computing the performance parameters, the objective scores computed by these metrics were also mapped to MOS values by using the same logistic function mentioned in Equation (12). In all experiments, the implementation of the method provided by the authors or other parties was used. The comparison in terms of all five performance parameters is presented in Table 1. The best results are highlighted in bold for convenience. The results reveal that the proposed algorithm outperforms all the compared 2D-IQA algorithms in all performance parameters. Specifically, the proposed method achieves PLCC 0.8909, SROCC 0.8979, and KROCC 0.7095 with minimum RMSE 1.1816. We also compared the performance of the proposed metric with thirteen well-known and recent 3D image quality assessment metrics: 3DSwIM [55], StSD [78], Chen [25], Benoit [79], Campisi [21], Ryu [50], PQM [80], Gorley [36], You_l and You_g [19], SIQM [73], NIQSV [81], and ST-SIAQ [82]. The evaluation results presented in Table 2 show that the proposed method outperforms all the compared methods in each performance parameter and achieves the best PLCC (0.8909) with the minimum RMSE (1.1816). The other measures, SROCC, KROCC, and MAE, also reveal that the proposed method performs better than the other 3D-IQA metrics. To further investigate the effectiveness of the proposed method, its performance and that of the compared 2D and 3D-IQA metrics were also evaluated for individual distortion types, i.e., AWN, Gauss, Sample, Transloss, JPEG, and JP2K. Recall that the stereo image pairs in the dataset were generated through DIBR from depth and/or color images degraded with these types of distortion.
The results of the comparison with 2D and 3D quality metrics in terms of PLCC are reported in Tables 3 and 4. These results show that the proposed metric performs better than the compared methods for most individual distortion types. Similar observations were made when evaluating with the other performance parameters, i.e., SROCC, KROCC, RMSE, and MAE, which are not shown here to save space.

Variance of the Residual Analysis
Variance is the mean squared deviation of a variable from its mean. It is generally used to evaluate the efficiency of an image quality assessment metric by measuring how close the scores computed by an objective IQA metric are to the subjective scores. This is achieved by computing the difference between the predicted scores and the actual scores. A small difference indicates that the results computed by the metric are reliable and close to the actual scores. To compute this variance $\sigma^2$, first the residual difference between the DMOS and the predicted scores after non-linear mapping ($DMOS_p$) is computed:
$$R = DMOS - DMOS_p \qquad (18)$$
The variance $\sigma^2$ of each compared 2D and 3D quality metric was computed from its residuals $R$, and the statistics are presented in Table 5. The results show that our method achieves the smallest variance among all compared methods, which means the scores estimated by the proposed method agree closely with the subjective ratings.
Table 5. Variance of the residuals of subjective ratings and the mapped objective scores of the proposed and the compared 2D and 3D quality metrics. The best results are highlighted in bold.
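The residual variance reported for each metric can be computed as in the following sketch. It uses the sample variance (divisor n − 1) from the Python standard library. Whether the reported values use the sample or the population variance is not stated, so that choice is an assumption.

```python
import statistics

def residual_variance(dmos, dmos_p):
    """Variance of the residuals between subjective and mapped scores."""
    residuals = [a - b for a, b in zip(dmos, dmos_p)]
    # Sample variance (n - 1 divisor); an assumption, see the note above.
    return statistics.variance(residuals)
```

For instance, subjective scores `[1, 2, 3, 4]` and mapped scores `[1, 1, 3, 5]` give residuals `[0, 1, 0, -1]` and a variance of 2/3.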

Statistical Significance Test
The statistical significance test [16,72] helps to determine whether one quality metric is statistically better than another. We conducted this test to statistically verify the performance of the proposed metric. In this experiment, we considered only the 3D-IQA metrics, as the previous evaluations have shown that the 2D-IQA algorithms perform rather poorly compared to 3D-IQA approaches in assessing the quality of DIBR-synthesized images. The F-test procedure was used to test the significance of the difference between two quality assessment metrics. In the F-test, we compared the ratio of the variances of residuals (Equation (18)) of two metrics i and j with the F-ratio threshold to find the statistical significance. The F-ratio threshold was obtained from the F-distribution look-up table. If the ratio $\sigma_j^2 / \sigma_i^2$ is greater than the F-ratio threshold, then metric i is said to be significantly superior to metric j. Similarly, metric i is said to be significantly inferior to metric j if this ratio is less than the p-value. The two metrics are said to be statistically indistinguishable if this ratio lies between the p-value and the F-ratio threshold. The F-ratio threshold is the right-tail critical value and the p-value is the left-tail critical value. Both of these values were obtained from the F-distribution look-up table at a 95% confidence level. The results of the test are presented in Table 6. Each entry in the table is a codeword of 6 characters corresponding to the symbols A, G, J, j, S, and T, which represent the distortions AWN, Gauss, JP2K, JPEG, Sample, and Transloss, respectively. In the codeword, the value '1' means that the performance of the metric in the row is significantly superior to that of the metric in the column, the value '0' means that the metric in the row is significantly inferior to the metric in the column, and '-' means that the performance of the metrics in the row and column is equivalent or statistically indistinguishable.
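The decision rule of the F-test and the construction of a six-character codeword can be sketched as below. The right-tail and left-tail critical values are passed in by the caller, because they depend on the degrees of freedom, which are not given here. The numbers used in the usage note are hypothetical placeholders, not values from the experiments.

```python
DISTORTIONS = ("AWN", "Gauss", "JP2K", "JPEG", "Sample", "Transloss")

def compare_metrics(var_i, var_j, f_upper, f_lower):
    """Return '1' if metric i is significantly superior to metric j,
    '0' if it is significantly inferior, and '-' otherwise."""
    ratio = var_j / var_i  # residual variance of j over that of i
    if ratio > f_upper:
        return "1"
    if ratio < f_lower:
        return "0"
    return "-"

def codeword(var_i, var_j, f_upper, f_lower):
    """Six-character codeword, one symbol per distortion type.

    var_i and var_j map each distortion name to a residual variance.
    """
    return "".join(compare_metrics(var_i[d], var_j[d], f_upper, f_lower)
                   for d in DISTORTIONS)
```

With hypothetical critical values of 1.5 (right tail) and 1/1.5 (left tail), a metric whose residual variance is half that of its competitor yields '1' for that distortion. The critical values themselves would come from an F-distribution table or a statistics library.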
The results demonstrate that, except for AWN and Transloss, the performance of the proposed metric is significantly superior or equivalent to all the compared 3D-IQA metrics in all distortion types. The experimental evaluations performed on the benchmark DIBR-synthesized image dataset showed that the performance of the proposed 3D-IQA metric is appreciably better than the compared 2D and 3D-IQA algorithms. Moreover, the variance and the statistical significance tests also showed that our metric is significantly superior or equivalent to most of the compared 3D-IQA metrics. All these performance analyses reveal that the proposed metric is reliable and accurate in estimating the quality of the DIBR-synthesized views.
Table 6. Statistical significance tests of the proposed and other 3D-IQA metrics on the MCL-3D dataset. Value '1' means the metric in the row is significantly superior to that of the column. Value '0' means the metric in the row is significantly inferior to that of the column, and '-' means the two metrics are statistically equivalent. The symbols A, G, J, j, S, and T represent the distortions AWN, Gauss, JP2K, JPEG, Sample, and Transloss, respectively.

Conclusions
In this paper, a novel 3D-IQA metric was proposed to assess the quality of DIBR-synthesized images. The proposed method merges two metrics, one computing the deterioration in the texture of the synthesized image and the other computing the structural distortions introduced in the synthesized image due to DIBR and various other types of noise. A weighted average of the two measures gives the overall quality indicator. Experimental evaluations were performed on the MCL-3D dataset, which contains DIBR-synthesized images generated from color and depth images that were subject to different types of noise. The experimental results and comparisons with existing 2D and 3D-IQA metrics demonstrate that the proposed metric is accurate and reliable in assessing the quality of DIBR-synthesized 3D images.