Learning to Measure Stereoscopic 3D (S3D) Image Perceptual Quality on the Basis of Binocular Rivalry Response

Blind perceptual quality measurement of stereoscopic 3D (S3D) images has become an important and challenging issue in the research field of S3D imaging. In this paper, a blind S3D image quality measurement (IQM) method that does not depend on examples of distorted S3D images and corresponding subjective scores is proposed. As the main contribution of this work, we replace human subjective scores with a quality codebook of binocular rivalry responses (BRRs); this allows blind S3D-IQM methods to be learned without evaluation performance loss. Our results, using the publicly accessible LIVE S3D dataset, confirm that our method is highly robust and efficient.


Introduction
In recent years, stereoscopic 3D (S3D) technologies have attracted considerable research interest. S3D measurement can be classified into S3D content measurement (e.g., visual discomfort, quality of experience, or unnatural experience) and S3D image quality measurement (IQM) (e.g., the measurement of compression distortions or transmission impairments in S3D images). However, research on S3D content measurement is still limited. In recent decades, blind S3D-IQM has been attracting attention in the research field of S3D imaging [1][2][3]. State-of-the-art blind S3D-IQM methods learn to measure S3D quality via regression from the subjective scores of training samples. For instance, in [4], Akhter et al. introduced a blind S3D-IQM method that extracts both 2D and S3D quality-aware features from an S3D image and a depth/disparity map; a regression model is then applied to obtain a final quality value from these features. A similar study was conducted in [5], where Chen et al. proposed a blind S3D-IQM method based on a "cyclopean" view and a disparity map. Zhou et al. introduced blind S3D-IQM based on the self-similarity of binocular features [6]. Other relevant studies can be found in [7][8][9][10][11][12][13][14]. These methods require human-scored S3D images for training; however, obtaining subjective scores through subjective evaluation is often expensive, cumbersome, and time-consuming, which limits their usability in practice. This raises a significant question: can we learn to measure S3D image quality blindly without relying on human-scored S3D images?
Quality measurement of 2D images has been extensively studied, and several excellent methods have been proposed [15][16][17][18]. For example, in [16], Xue et al. presented a quality-aware clustering metric, which learns centroids that act as a quality-aware codebook for measuring distorted image quality. Inspired by [16], we propose a blind S3D-IQM method that learns a quality codebook of binocular rivalry responses (BRRs). The main contributions of this work are three-fold. First, in the offline training phase, we learn a quality codebook from the BRRs of the original and distorted S3D images; this codebook is used in place of human subjective scores to learn blind S3D-IQM methods.
Second, a BRR-weighted local binary pattern (LBP) histogram scheme is used to extract quality-predictive features; this scheme is effective for describing degradation patterns.
Third, in the online testing phase, a distance-based weighting scheme is used for perceptual quality pooling. Experimental results obtained using the publicly accessible LIVE S3D dataset indicate that the proposed method performs competitively and provides strong quality prediction performance and generalization ability.

Proposed Method
The block diagram for our blind S3D-IQM method is depicted in Figure 1. During the offline training phase, similarity scores and a weighted LBP histogram are generated from training S3D images; subsequently, a quality codebook is constructed. During the online testing phase, by implementing feature extraction for test S3D images, blind perceptual quality pooling can be easily achieved by a distance-based weighting scheme. Our method is described in detail as follows.
(1) Training dataset preparation: We select 12 reference S3D images collected from Mobile S3DTV and MPEG (please refer to Figure 2) [19]. From each reference S3D image, we then generate distorted S3D images using four types of distortion: JPEG2000 (JP2K), Gaussian blur (Gblur), white noise (WN), and JPEG. For each distortion type, five quality levels are generated by varying the control parameters.
(2) Quality measurement of the training S3D images: Neuroscience has made rapid progress in understanding the human visual mechanism and how S3D visual signals are transmitted to the human brain.
BRR is a well-researched visual mechanism of competition in the visual cortex [20]. BRR is a human visual response whereby the binocular visual mechanism alternates between the contradictory monocular images viewed by the two eyes. Findings in [20] showed that BRR can be strongly regulated by low-level sensory features, and a classic linear mathematical model was provided. Motivated by the above, a 2D-Gabor filter is applied to both views to simulate the rivalry. Consequently, the BRR, denoted by $R_{bin}(x,y,\sigma)$, is defined as

$$R_{bin}(x,y,\sigma) = W_l(x,y)\, M_l(x,y,\sigma) + W_r(x+d,y)\, M_r(x+d,y,\sigma),$$

where $d$ denotes a disparity value estimated using an S3D matching algorithm [21] (Reference [22] reports that similar performance is attained when using either the ground-truth or the estimated disparity), and subscripts "$l$" and "$r$" denote the left and right views, respectively. $M_l(x,y,\sigma)$ and $M_r(x,y,\sigma)$ are the monocular responses of the two views, obtained using a difference of Gaussian (DoG) filter

$$DoG(x,y,\sigma) = \frac{1}{2\pi\sigma^2}\, e^{-r^2/(2\sigma^2)} - \frac{1}{2\pi k^2 \sigma^2}\, e^{-r^2/(2k^2\sigma^2)},$$

where $\sigma$ denotes the scale parameter, $k$ controls the ratio between the two scales of the DoG, and $r = \|(x,y)\|_2$. In this study, the scales $\sigma$ of the DoG are set to 0, 1, 1.6, 2.56, and 4.096 (the same values as the filter used in [2]), with $k = 1.6$. $W_l$ and $W_r$ denote the weights imitating the BRR. They are represented as $W_l(x,y) = G_l(x,y)/(G_l(x,y) + G_r(x+d,y))$ and $W_r(x+d,y) = G_r(x+d,y)/(G_l(x,y) + G_r(x+d,y))$, respectively, where $G_l(x,y)$ and $G_r(x,y)$ denote the respective 2D-Gabor responses of the two views. The 2D-Gabor filter is defined as

$$G(x,y,\sigma_x,\sigma_y,\zeta_x,\zeta_y,\theta) = \frac{1}{2\pi\sigma_x\sigma_y}\, e^{-\frac{1}{2}\left[\left(\frac{x\cos\theta + y\sin\theta}{\sigma_x}\right)^2 + \left(\frac{y\cos\theta - x\sin\theta}{\sigma_y}\right)^2\right]}\, e^{i(x\zeta_x + y\zeta_y)},$$

where $\sigma_x$ and $\sigma_y$ denote the standard deviations along the $x$ and $y$ axes, $\zeta_x$ and $\zeta_y$ denote the spatial frequencies, and $\theta$ orients the filter. The design of the 2D-Gabor filter follows the work of Su et al. [23].
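The linear rivalry fusion above can be sketched as follows, assuming the monocular DoG responses and the 2D-Gabor energy maps have already been computed. The uniform integer disparity and the `eps` stabilizer are simplifications introduced here, not details from the paper.

```python
import numpy as np

def binocular_rivalry_response(M_l, M_r, G_l, G_r, d):
    """Fuse left/right monocular responses into a BRR map.

    M_l, M_r : monocular DoG responses of the left/right views (2-D arrays).
    G_l, G_r : 2-D Gabor energy maps used as rivalry weights.
    d        : integer horizontal disparity (uniform here for simplicity).
    """
    # Shift the right-view maps by the disparity so (x+d, y) aligns with (x, y).
    M_r_shift = np.roll(M_r, -d, axis=1)
    G_r_shift = np.roll(G_r, -d, axis=1)
    eps = 1e-12                                # guard against division by zero
    W_l = G_l / (G_l + G_r_shift + eps)        # left-eye weight
    W_r = G_r_shift / (G_l + G_r_shift + eps)  # right-eye weight (W_l + W_r ~= 1)
    return W_l * M_l + W_r * M_r_shift
```

With equal Gabor energies in both views, the fusion reduces to a plain average of the two monocular responses, which matches the intuition that rivalry weights redistribute dominance between the eyes.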
The similarity score between the BRRs of the original S3D image and the $n$-th distorted S3D image is defined as

$$S_n = \frac{\langle R_{o}, R_{d,n}\rangle + T}{\|R_{o}\|_2\, \|R_{d,n}\|_2 + T},$$

where $T$ is a small positive constant (we set $T = 0.085$ in our experiments), $\Omega$ indicates the whole spatial domain over which the responses are compared, $\|\cdot\|_2$ denotes the $l_2$ norm, and $\langle\cdot,\cdot\rangle$ represents the inner product. Subscript "$n$" is the index of the $n$-th distorted training S3D image, and subscripts "$o$" and "$d$" denote the original and distorted S3D images, respectively.
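Because the exact normalization of the similarity equation cannot be fully recovered from this excerpt, the sketch below uses a regularized cosine similarity built from the same ingredients (inner product, l2 norms, and the stabilizing constant T = 0.085); treat it as one plausible instantiation rather than the paper's exact definition.

```python
import numpy as np

def brr_similarity(R_o, R_d, T=0.085):
    """Regularized cosine-style similarity between original and distorted
    BRR maps; identical maps score exactly 1.0, and T stabilizes the ratio
    when both responses are weak."""
    inner = np.sum(R_o * R_d)                        # <R_o, R_d>
    norms = np.linalg.norm(R_o) * np.linalg.norm(R_d)
    return (inner + T) / (norms + T)
```

By the Cauchy-Schwarz inequality the score never exceeds 1, so it behaves as a bounded quality score without any extra clamping.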
(3) Quality-predictive feature extraction: According to Marr's theory [24], local structural features processed in the visual cortex (area V1) are related to image quality. LBPs are basic feature extractors that are commonly used to represent an image's structural and textural information [25]. First, we compute the rotation-invariant uniform LBP map, $LBP^{riu2}_{P,R,BRR,n}$, of the BRR by comparing each BRR intensity against its eight neighbors (in consideration of computational complexity, $R$ and $P$ were set to 1 and 8, respectively). To effectively capture textural and structural features, a BRR-weighted LBP histogram is applied to extract the quality-predictive features of the BRR, defined as

$$h_n(k,\sigma) = \frac{1}{|\Omega|} \sum_{x,y} R_{d,n}(x,y,\sigma) \times f\!\left(LBP^{riu2}_{P,R,BRR,n}(x,y,\sigma),\, k\right),$$

where $f(a,k) = 1$ if $a = k$ and 0 otherwise, and $k$ indexes the LBP histogram bins.
(4) BRR quality codebook: Thus far, we have obtained a set of quality-scored S3D images for training without any human scoring. All the quality scores (denoted by $\{S_n\}$) and all the feature vectors (denoted by $h_n(k,\sigma)$) of the distorted training S3D images constitute a BRR quality codebook that reflects different quality levels of visual signals and the associated quality-predictive features.
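The BRR-weighted LBP histogram of step (3) can be sketched with numpy alone. The transition-count test for uniformity and the 10-bin layout (codes 0-8 plus a non-uniform bin) follow the standard riu2 definition for P = 8, R = 1; using the 8-connected grid neighbors instead of interpolated circular sampling is a simplification made here.

```python
import numpy as np

# Neighbor offsets (dy, dx) for P = 8, R = 1, listed in circular order.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_riu2(img):
    """Rotation-invariant uniform LBP (P=8, R=1) computed with array shifts."""
    bits = np.stack([np.roll(np.roll(img, -dy, axis=0), -dx, axis=1) >= img
                     for dy, dx in OFFSETS])           # shape (8, H, W)
    # A pattern is "uniform" if its circular bit string has <= 2 transitions.
    trans = np.sum(bits != np.roll(bits, 1, axis=0), axis=0)
    ones = bits.sum(axis=0)                            # number of 1-bits, 0..8
    return np.where(trans <= 2, ones, 9)               # 9 = non-uniform bin

def weighted_lbp_histogram(lbp_map, weight_map):
    """BRR-weighted histogram over the 10 riu2 codes (0..8 uniform, 9 other)."""
    n = lbp_map.size
    return np.array([np.sum(weight_map * (lbp_map == k)) / n for k in range(10)])
```

Weighting each pattern count by the local BRR magnitude, rather than counting patterns uniformly, is what makes the histogram sensitive to where the rivalry response is strong.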
(5) Blind perceptual quality pooling: After obtaining the BRR quality codebook, we use its codes to predict the quality score of a test S3D image, relying on the hypothesis that S3D images with similar quality-aware features should share similar visual qualities. First, let $h^d_m(k,\sigma)$ represent the quality-aware features of the $m$-th test S3D image. We define the distance $H_{m,n}$ between $h^d_m(k,\sigma)$ and $h_n(k,\sigma)$ as the product, over scales, of the chi-square distances between each pair of quality-predictive feature histograms:

$$H_{m,n} = \prod_{\sigma} \sum_{k} \frac{\left(h^d_m(k,\sigma) - h_n(k,\sigma)\right)^2}{h^d_m(k,\sigma) + h_n(k,\sigma)},$$
where $N$ is the total number of training S3D images. Let $\{\hat{H}_{m,1}, \hat{H}_{m,2}, \cdots, \hat{H}_{m,10}\}$ represent the chi-square distances of the 10 nearest quality-predictive features, calculated between the $m$-th test S3D image and the selected training S3D images. A distance-based weighting scheme is then utilized to measure the perceptual visual quality of the test image. The normalized weight for the 10 similarity scores of the selected training images, denoted by $w_{m,n}$, is represented as

$$w_{m,n} = \frac{1/\hat{H}_{m,n}}{\sum_{j=1}^{10} 1/\hat{H}_{m,j}}.$$

By assigning larger weights to the similarity scores of the nearer selected training S3D images, the final quality value of the $m$-th test S3D image is calculated as

$$Q_m = \sum_{n=1}^{10} w_{m,n}\, S_n.$$
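The pooling step can be sketched as a K-nearest-neighbor lookup into the codebook with inverse-distance weights. For simplicity the chi-square distance below is computed over a single flattened histogram per image rather than per scale, and the small `eps` guard against an exactly-zero distance is an implementation detail added here.

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-12):
    """Chi-square distance between two histograms."""
    return float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def pooled_quality(test_feat, train_feats, train_scores, K=10):
    """Predict quality as a distance-weighted average of the K nearest
    training similarity scores (closer codebook entries weigh more)."""
    dists = np.array([chi_square_distance(test_feat, h) for h in train_feats])
    nearest = np.argsort(dists)[:K]            # indices of the K nearest codes
    inv = 1.0 / (dists[nearest] + 1e-12)       # inverse-distance weights
    w = inv / inv.sum()                        # normalize so weights sum to 1
    return float(np.sum(w * np.asarray(train_scores)[nearest]))
```

If a test feature exactly matches a codebook entry, its weight dominates and the prediction collapses to that entry's similarity score, which is the intended behavior of the hypothesis that similar features share similar quality.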

Results
In the experiment, the effectiveness of our method was verified using the LIVE S3D dataset [5], which includes 365 symmetrically distorted S3D images. (In future work, we plan to study how asymmetric distortion affects perceptual quality and to extend the proposed method to asymmetrically distorted S3D-IQM; for blur artifacts, the perceptual quality of asymmetrically blurred S3D images is dominated mainly by the view that contains more information, whereas for JPEG artifacts, the binocular perceptual quality tends toward the lower-quality view.) To facilitate the experiment (refer to [16]), we utilized only four distortion types: JP2K, WN, JPEG, and Gblur. The images in the dataset are assigned difference mean opinion scores (DMOS), which represent subjective judgments.
To compare the proposed method against existing S3D-IQM methods, two widely used performance criteria were calculated from the DMOS and the predicted quality scores. The first was Pearson's linear correlation coefficient (PLCC), which evaluates prediction accuracy. The second was Spearman's rank-order correlation coefficient (SROCC), which measures prediction monotonicity. For an ideal IQM method, both SROCC and PLCC values are close to 1.
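For reference, both criteria can be computed with numpy alone. This minimal SROCC ranks values by double argsort and does not average tied ranks, which the standard definition requires, so it is a sketch for tie-free score lists rather than a full implementation.

```python
import numpy as np

def plcc(x, y):
    """Pearson's linear correlation coefficient (prediction accuracy)."""
    return float(np.corrcoef(x, y)[0, 1])

def _ranks(v):
    """Ordinal ranks via double argsort (tied values are not averaged)."""
    return np.argsort(np.argsort(np.asarray(v))).astype(float)

def srocc(x, y):
    """Spearman's rank-order correlation (prediction monotonicity):
    Pearson correlation computed on the ranks."""
    return plcc(_ranks(x), _ranks(y))
```

A perfectly monotonic but nonlinear predictor yields SROCC = 1 with PLCC < 1, which is why both criteria are reported together.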
To demonstrate the robustness and effectiveness of the proposed method, we compared it with several representative IQM methods, including three 2D-IQM methods-SSIM (full reference, FR) [26], IL_NIQE (blind, not depending on DMOSs) [15], and Xue's method (blind, not depending on DMOSs) [16]-and seven S3D-IQM methods-Lin's method (FR) [2], Akhter's method (blind, depending on DMOSs) [4], Chen's method (blind, depending on DMOSs) [5], Yue's method (blind, depending on DMOSs) [9], Zhou's method (blind, depending on DMOSs) [12], Shao's method (blind, not depending on DMOSs) [13], and Zhou's method (blind, not depending on DMOSs) [14]. All IQM methods operate on the grayscale versions of the S3D images. The SROCC and PLCC results are listed in Table 1; among the blind methods that do not depend on DMOSs, the best evaluation performance is emphasized in boldface. As shown in Table 1, our method correlates well with the DMOSs, owing to its thorough exploitation of human visual perception. It thereby delivers performance superior to most blind and FR methods, including SSIM and IL_NIQE, and the methods proposed by Xue, Lin, Shao, and Zhou. Thus, our method is an accurate, effective, and consistent S3D-IQM method, even though it does not use human-scored S3D images.

Another experiment was conducted to further investigate the evaluation ability of the IQM methods for specific distortion types. In this experiment, we tested the performance of the state-of-the-art methods on individual distortion types. The SROCC and PLCC results are listed in Table 2; for each individual distortion type, the top two results among the blind methods that do not depend on DMOSs are emphasized in boldface. As shown in Table 2, our method placed among the top methods most often.
Although some methods may provide good performance for certain distortion types, the proposed method performs better than, or on par with, the best method for each individual distortion type. Next, we discuss the influence of disparity-estimation performance on the proposed method. The performance of the proposed method using the ground-truth disparity, SAD [27], and Klaus [21] is listed in Table 3, which shows that similar performance is attained when using either the ground-truth or an estimated disparity.

Conclusions
In this paper, we proposed a new blind S3D-IQM method that does not depend on DMOS during learning. The main contribution of this study is the construction of a quality-aware codebook that enables blind IQM through a perceptual pooling strategy. Experimental results show that the proposed method produces assessments that are statistically highly consistent with DMOS.
Further work will mainly focus on two directions: (1) other binocular visual perception mechanisms (e.g., binocular fusion and binocular suppression) can be considered to improve the performance of the proposed method; (2) the effect of disparity information needs to be further addressed, and, besides disparity, many other factors, such as monocular depth cues, visual comfort, and visual direction, should be considered in modeling S3D-IQM without relying on human-scored S3D images.