No-Reference Quality Assessment of Transmitted Stereoscopic Videos Based on Human Visual System

Provisioning stereoscopic 3D (S3D) video transmission services of admissible quality in a wireless environment is an immense challenge for video service providers. Unlike for 2D videos, a widely accepted no-reference objective model for assessing transmitted 3D videos that appropriately accounts for the Human Visual System (HVS) has not yet been developed. Distortions perceived in 2D and 3D videos are significantly different due to the sophisticated manner in which the HVS handles the dissimilarities between the two views. In real-time video transmission, viewers only have the distorted, receiver-end version of the original video acquired through the communication medium. In this paper, we propose a no-reference quality assessment method that estimates the quality of a stereoscopic 3D video based on the HVS. By evaluating perceptual aspects and correlations of binocular visual effects in a stereoscopic video, the approach allows the objective quality measure to assess impairments much as a human observer viewing the same material would. First, the disparity is measured and quantified by a region-based similarity matching algorithm; then, the magnitude of the edge difference is calculated to delimit the visually perceptible areas of an image. Finally, an objective metric is obtained by combining these significant perceptual image features. Experimental analysis with standard S3D video datasets demonstrates the low computational complexity of the method at the video decoder, and comparison with state-of-the-art algorithms shows the efficiency of the proposed approach for 3D video transmission at different quantization (QP 26 and QP 32) and loss-rate (1% and 3% packet loss) settings along with the perceptual distortion features.


Introduction
Quality assessment (QA) is an essential aspect of video services aimed at human observers in applications such as television, Blu-ray, DVD, mobile TV, web TV, gaming, and video streaming. An objective 3D video QA is a statistical mathematical model that approximates the perceptual ratings that would be obtained from typical human viewers [1]. No-reference (NR) models objectively estimate a video's quality from the received frames alone, which have been subjected to distortions from coding and transmission losses [2]. Since an NR model does not require any information from the source video, it generates less precise scores than full- and reduced-reference approaches, but it can be applied in many real-time applications for which source information is unavailable. NR models are widely used for continuous quality monitoring at the receiver end in video playback and streaming systems [1], with a basic system shown in Figure 1. The figure shows a conventional quality measurement system for video transmission in which stereoscopic videos are differently encoded, transmitted, and then decoded at the receiver. When transmission losses or distortions occur, the end device checks the video, performs real-time error concealment, and then displays it to the viewer.
3D video transmission has received significant research attention in recent years, driven by commercial applications [1,3,4]. Much of this attention has gone to analyzing and mitigating impairments by applying various techniques in the 3D video transmission chain and display to ensure a better Quality of Experience (QoE) [5]. However, the impact of artifacts introduced into a 3D video by its transmission system has not received as much interest as in 2D video [6], although these artifacts influence the overall image quality similarly. Because stereoscopic 3D video has two separate channels, each of which may experience unrelated degradation, it is more exposed than 2D video to transmission errors over unreliable wireless communication channels. For example, a lag in one view can desynchronize the two views in time, reducing 3D viewing comfort [7]. In addition, the methods used to reduce these artifacts (e.g., error concealment) do not work as well for 3D videos as they do for 2D videos [8]. To create good 3D depth perception [9,10] and avoid binocular rivalry [11,12], the two channels of a 3D video should ideally remain intact and synchronized.
Numerous factors, including spatial and temporal frequencies, binocular depth cues, and transmission media, can significantly affect the video quality experienced by users during 3D viewing. Previous studies of binocular vision and its sensitivity showed that viewers behave differently during 2D and 3D viewing and that these observations are tightly linked to the way in which 3D videos are perceived [13,14,15]. Existing 2D image or video quality models are not adequate for measuring 3D visual experiences since they do not incorporate 3D perceptual characteristics, including at least the most positive (binocular depth) and negative (cross-talk and visual discomfort) factors of S3D [7,12].
As shown in Figure 2, the brain uses binocular disparity to extract depth information from two-dimensional (2D) retinal images. The difference in the coordinates of corresponding features in the two stereo views is referred to as "binocular disparity". The horopter is a curved line connecting all points that have zero retinal disparity (the same relative coordinates); points on it are perceived at the same distance as the fixation point. Panum's fusional area is the zone around the horopter within which views with non-zero retinal disparities can still be fused binocularly; objects located outside this area produce double images. The size of Panum's area varies across the retina and depends on the spatial and temporal characteristics of the fixation object [16]. When a person focuses on an object, an image of that object is formed on the retina, and objects that are closer to or farther away than the accommodation distance are perceived as blurred. Objects that lie within a limited zone around the accommodation point, however, are still perceived at high resolution (i.e., not blurred); the extent of this zone is known as the depth of field (DOF) [16]. The geometry of stereopsis depicted in Figure 2 corresponds to the experimental setting of the two-needle test [17] with a fixation point B.
The theoretical depth discrimination $\Delta f$ can be determined from the definition of the convergence angle, $\alpha = 2\arctan\left(b/(2f)\right)$, by evaluating $d\alpha/df$, which gives

$$\Delta f = -\frac{4f^2 + b^2}{4b}\,\Delta\alpha \qquad (1)$$

with $b$ the interpupillary distance and $f$ the mean object distance. With $4f^2 \gg b^2$, which is valid even for a near-point distance $f_{near} = 250$ mm, Equation (1) simplifies to the common form $\Delta f \approx -f^2\,\Delta\alpha / b$. In binocular vision, $\Delta\alpha$ is the stereoscopic acuity (the smallest detectable depth difference, expressed as an angle), which is required for proper binocular (3D) depth perception. Empirically, $\Delta\alpha = 10$ arcsec is a tolerable value under photopic lighting conditions; in Figure 2, this increment separates point A ($\alpha - \Delta\alpha$) from the fixation point B ($\alpha$), and B from point C ($\alpha + \Delta\alpha$). The corresponding minimum detectable depth difference is on the order of 0.3 mm for a fixation distance of 650 mm. Among other factors, stereoscopic acuity is affected mainly by an object's luminance and spatial frequency, its angular distance from fixation, and object motion. Therefore, a multidimensional 3D visual experience model needs to be defined that incorporates these acuity factors according to their perceptual importance.
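To make the orders of magnitude concrete, the following sketch evaluates the depth discrimination for the fixation distance quoted above. It assumes the symmetric convergence geometry $\alpha = 2\arctan(b/(2f))$ and an interpupillary distance of b = 65 mm; the latter value is not given in the text and is chosen only for illustration.

```python
import math

ARCSEC_TO_RAD = math.pi / (180.0 * 3600.0)


def depth_discrimination(f_mm, b_mm, delta_alpha_arcsec, exact=False):
    """Depth difference (mm) matching an angular acuity of delta_alpha.

    exact=False uses the simplified form -f^2 * d_alpha / b;
    exact=True keeps the b^2 term obtained from alpha = 2*atan(b / (2f)).
    The negative sign only reflects that alpha shrinks as f grows.
    """
    d_alpha = delta_alpha_arcsec * ARCSEC_TO_RAD
    if exact:
        return -(4.0 * f_mm ** 2 + b_mm ** 2) / (4.0 * b_mm) * d_alpha
    return -(f_mm ** 2) * d_alpha / b_mm


# b = 65 mm is an assumed, illustrative interpupillary distance.
approx = depth_discrimination(650.0, 65.0, 10.0)
full = depth_discrimination(650.0, 65.0, 10.0, exact=True)
print(f"simplified: {abs(approx):.3f} mm, with b^2 term: {abs(full):.3f} mm")
```

Under these assumptions both forms give a magnitude close to 0.3 mm, in line with the figure stated in the text, and the b² term changes the result only marginally.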
Effective 3D video quality evaluation schemes should be designed on the basis of these observations, allowing us to (i) synchronize the two views of a stereoscopic video, (ii) adjust parameters to maximize overall quality and comfort, (iii) control visual quality during transmission, and (iv) define the levels of quality for specific video services. The main goal of this research is to design an NR quality metric for evaluating 3D videos that considers different aspects of human binocular perception and real-time transmission. Section 2 summarizes existing relevant research on 3D quality assessment, and Section 3 introduces the proposed NR-based approach, with the detailed algorithm presented in its subsections. The experiments and final results are discussed in Section 4. Finally, Section 5 concludes this research with directions for possible future work.

Previous Relevant Research
S3D QA algorithms can be categorized according to how they treat binocular perception. The first category consists of 2D-based 3D QA models, which do not use depth information from stereo pairs. In [18], several well-known 2D image quality metrics (PSNR, SSIM, MSSIM, VSNR, VQM, Weighted SNR, and JND) are introduced and their suitability for stereoscopic images is investigated. Structural Similarity (SSIM) is a Full-Reference (FR) objective quality metric proposed by Wang et al. [19], based on the assumption that human visual perception is highly adapted for extracting structures from a scene. It compares local patterns of pixel intensities normalized for luminance and contrast. In [20], several 2D objective quality metrics (SSIM, Universal Quality Index (UQI), C4, and RR Image QA (RRIQA)) computed for the left and right images are combined using an average image distortion approach and a visual acuity technique based on disparity distortion for the QA of a stereo image. The approaches in [21] apply 2D QA algorithms independently to the left and right views and then aggregate the two scores (by different means) to predict 3D quality. In addition, Ryu et al. [22] presented a 3D quality score as the weighted sum of the quality scores of the left and right views. According to Meegan et al. [23], the binocular sense of the quality of asymmetric MPEG-2 distorted stereo images is approximately the average of the two views, but the perception of asymmetric blur in a distorted stereo image is mostly influenced by the higher-quality view. Since most of these approaches accumulate standard 2D image or video features rather than considering the psychovisual aspects of 3D, these findings motivate the investigation of a binocular rivalry or acuity measure obtained from the disparity between the left and right views.
The second category of models considers binocular (depth) information and estimates a disparity map as part of the overall process. Shen et al. [1] presented an NR QA approach that imitates the HVS perception route, deriving features from the fused and single views through a global feature fusion sub-network. Liu et al. [14] developed an NR stereoscopic image quality perception model that incorporates monocular and binocular features through the relationship between visual features and stereoscopic perception. Benoit et al. [24] developed an FR 3D QA algorithm that computes the differences between the quality scores of the left and right videos calculated from reference and distorted views, as well as a distorted disparity map. C4 [25] and SSIM [26] are used to compute these three quality scores, and different combinations of them are employed to obtain the final results. Their findings suggest that disparity information can improve the SSIM-based 3D QA algorithm (called 3D-SSIM). You et al. [27] extended the idea of predicting the 3D quality of stereo pairs by using SSIM and the mean absolute difference (MAD) of their estimated disparity maps. Bensalma et al. [12] presented a 3D QA technique based on the difference in binocular energy between reference and tested stereo pairs, which takes into account the potential influence of binocular effects on perceived 3D quality. In the RR 3D QA proposed by Hewage et al. [28], only the depth map's edge information is sent, and the PSNR of the reference frame is used to estimate the overall score. Akhter et al. [29] proposed an NR 3D QA algorithm that extracts features and estimates the disparity map, with a logistic regression model then used to predict the 3D quality scores. Wang et al. [30] introduced a 3D QA model based on the suppression theory of binocular fusion, in which 2D Image Quality Metric (IQM) distortion maps are generated for both the left and right views. A binocular spatial sensitivity module is then combined with these maps to generate the final quality metric.
Some recent feature-based approaches have also contributed significantly to NR video quality assessment (NR-VQA) research. Varga et al. [31] provided a powerful feature set for NR-VQA inspired by Benford's law. It is shown that the first-digit distributions extracted from various transform domains of the video volume data are quality-aware features that can be mapped efficiently onto perceptual quality scores. Based on these findings, the authors in [32] suggested a supervised learning strategy based on support vector regression (SVR) to tackle the NR-VQA problem. They claimed that the method provides satisfactory accuracy on authentic distortions and competitive performance on conventional (synthetic) distortions. Ebenezer et al. [33] proposed a new model for NR-VQA that is grounded on the inherent characteristics of a video's space-time chips. "Space-time chips" (ST-chips) are a quality-aware feature set that the authors define as localized spatial slices of video data that follow the motion flow of the local region. They demonstrated that the parameters obtained by fitting generalized distributions to the band-pass histograms of space-time chips are distortion-sensitive and can thus be used to predict video quality. Saad et al. [34] proposed a blind (NR) video quality evaluation model that is not specific to a distortion type. This method is based on a spatio-temporal model of video scenes in the discrete cosine transform domain and a computational model that classifies the motion occurring in the scenes to predict video quality.
Deep learning has received attention in recent years owing to the availability of benchmark video QA databases. Varga et al. [2] described a deep learning-based strategy for NR-VQA that uses several pre-trained convolutional neural networks (CNNs) to classify the probable image and video distortions in parallel. To extract spatial and temporal features from stereoscopic videos, H. Imami et al. [3] proposed a quality assessment method based on a 3D convolutional neural network that also captures disparity information. Zhang et al. [35] proposed a CNN-based denoising algorithm for synthesized video that eliminates temporal flicker distortion and enhances the perceptual quality of 3D synthesized video. Feng et al. [36] presented a multi-scale feature-directed 3D CNN for QA that uses 3D convolution to capture spatiotemporal features as well as a novel multi-scale unit to aggregate multi-scale information. Jin et al. [4] proposed a no-reference image quality assessment method for 3D composite images based on visual entropy-oriented multi-layer feature analysis. However, all of the above approaches have high computational complexity, which increases latency, and thus lack real-time capability for end-user devices after transmission.
In general, it is difficult to judge the quality of perceived depth because the ground-truth depth is generally not available during transmission. Such models can only evaluate depth quality using an estimated disparity map (calculated from a reference stereo pair or a distorted stereo pair), which is significantly affected by the accuracy of the estimation. In addition, most existing 3D video quality models are validated using symmetrically encoded videos. Moreover, conventional quality assessment methods depend on FR or RR criteria, making it difficult to assess degraded transmitted or broadcast videos, especially when the original video is not available to the user or receiver. Because real-world videos may be asymmetrically encoded and degraded by packet loss and network delays, the applicability of these quality measures to many real-world situations is limited. Our proposed approach overcomes these challenges by accumulating perceptual 3D features in an NR manner applicable to video streaming and broadcast platforms, with reduced computational complexity to meet the real-time capacity of the decoder and measure the 3D video quality.

Applied Datasets
For our experiment, we used three stereoscopic sequences (3D_02, 3D_car, and 3D_03) from the RMIT3DV [37] and EPFL [38] video datasets, summarized in Table 1. All had a 3-s playback duration, a full HD resolution of 1920 × 1080 at 25 frames per second, and different pictorial content, such as low or high contrast, object movements, and textures. The two sequences from the EPFL dataset, 3D_02 and 3D_car, show a person riding a bicycle on a road and a car turning while moving, respectively. The other sequence, 3D_03, from the RMIT3DV dataset, is called Flag Waving in the state library and contains very fast motion. Together, these videos cover low to high levels of 3D depth perception and various motions.

Proposed Method
The assessment of the quality of a distorted 3D video is an important part of building and organizing advanced immersive media delivery platforms. As shown in Figure 3, our proposed QA method extracts the binocular and perceptual features of a stereoscopic video, which are afterward combined to generate a QA Objective Metric (QAOM). First, perceptual attributes are taken into account to estimate the binocular features that influence the quality of a stereoscopic video. A new video quality index based on two image/video features has been devised to evaluate the perceived quality of transmitted stereoscopic videos, with the extracted features accumulated according to binocular suppression theory. The disparity index is developed from the similarity between the two views of a stereoscopic video, while edge detection is used to assess binocular vision impairment and distortions due to packet loss in the network. For the disparity index, block-based stereo matching [39] is used to build a matrix of error energy for extracting disparity information from a pair of stereoscopic frames, from which an intermediate disparity map [40] is generated for the two views. An averaging filter is then applied to eliminate unreliable disparity estimates and enhance the disparity map's reliability.
We denote by $L_v(i, j, c)$ the left view and by $R_v(i, j, c)$ the right view, both in RGB format; the error energy is likewise computed over the RGB channels. The error energy $E_{eng}(i, j, d)$ for a block size of p × q is given by Equation (2), where d is the disparity and c takes the values 1, 2, and 3, which correspond to the red, green, and blue components of the RGB color space, respectively. The error energy matrix $E_{eng}(i, j, d)$ is smoothed for each disparity in the predefined search range S (d = 1 to S) by applying an averaging filter multiple times; this removes very abrupt energy changes that may correspond to incorrect matching, and its repeated use uncovers global trends in the error energy. This smoothing makes the algorithm a region-based approach. The averaging filter applied to $E_{eng}(i, j, d)$ over a p × q window is given by Equation (3).
After the averaging filter has been applied iteratively to the error energy matrix for each disparity, the disparity d with the minimum error energy $E_{eng}(i, j, d)$ is selected as the best disparity estimate for pixel (i, j) of the disparity map, as shown in Figure 4. The process can be summarized as follows: first, estimate the error energy matrix for every disparity d in the search range using the block-matching approach of Equation (2); next, apply the averaging filter iteratively to the error matrix of every disparity value; finally, for each pixel (i, j), assign to D(i, j) the disparity that yields the minimum error energy, $\min_d E_{eng}(i, j, d)$. The average error energy obtained from the stereoscopic images is then used to remove unreliable disparity estimates from the disparity map D(i, j).
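The three steps summarized above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it assumes that the error energy is the mean squared RGB difference over a p × q block, that a feature at column y of the right view appears at column y + d of the left view, and that border pixels without a full block keep an infinite energy. The function names and the tiny synthetic test pair are hypothetical, and plain Python lists are used to keep the example dependency-free.

```python
import random

INF = float("inf")


def error_energy(left, right, d, p=3, q=3):
    """Assumed form: mean squared RGB difference of p x q blocks at shift d."""
    rows, cols = len(right), len(right[0])
    energy = [[INF] * cols for _ in range(rows)]
    for i in range(rows - p + 1):
        for j in range(cols - q + 1 - d):
            total = 0.0
            for x in range(i, i + p):
                for y in range(j, j + q):
                    for c in range(3):
                        diff = left[x][y + d][c] - right[x][y][c]
                        total += diff * diff
            energy[i][j] = total / (3 * p * q)
    return energy


def box_filter(energy, p=3, q=3):
    """One pass of a p x q averaging filter over the error energy matrix."""
    rows, cols = len(energy), len(energy[0])
    out = [[INF] * cols for _ in range(rows)]
    for i in range(rows - p + 1):
        for j in range(cols - q + 1):
            window = [energy[x][y] for x in range(i, i + p) for y in range(j, j + q)]
            out[i][j] = sum(window) / (p * q)
    return out


def disparity_map(left, right, search_range, p=3, q=3, passes=1):
    """Per pixel, keep the disparity whose filtered error energy is smallest."""
    rows, cols = len(right), len(right[0])
    best_energy = [[INF] * cols for _ in range(rows)]
    disparity = [[0] * cols for _ in range(rows)]  # 0 marks "no estimate"
    for d in range(1, search_range + 1):
        energy = error_energy(left, right, d, p, q)
        for _ in range(passes):
            energy = box_filter(energy, p, q)
        for i in range(rows):
            for j in range(cols):
                if energy[i][j] < best_energy[i][j]:
                    best_energy[i][j], disparity[i][j] = energy[i][j], d
    return disparity, best_energy


# Synthetic pair: the left view is the right view shifted by two columns.
rng = random.Random(0)
rows, cols, true_shift = 8, 16, 2
right = [[[rng.randrange(256) for _ in range(3)] for _ in range(cols)] for _ in range(rows)]
left = [[right[x][y - true_shift] if y >= true_shift else [0, 0, 0] for y in range(cols)]
        for x in range(rows)]
disp, energy = disparity_map(left, right, search_range=4)
print(disp[0][:6])
```

On the synthetic pair, interior pixels recover the shift of two columns with zero residual energy. The search range of 4 is chosen only to keep the example small; the experiments in this paper use d = 1 to 40 with a 3 × 3 window.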
These unreliable estimates typically occur at points around object boundaries, where objects are occluded in one of the images, and can be recognized by their high error energy in $E_{eng}(i, j)$; such points are marked $D_{ne}(i, j)$. The index d is omitted from the remaining equations since the average is taken over the disparity search range S (d = 1 to S). To increase the confidence in D(i, j), a simple threshold is applied to filter out unreliable disparity estimates and obtain a new disparity map (Equation (4)).
In Equation (4), $D_{ne}(i, j)$ marks a pixel whose disparity is left unestimated because of a high-energy block mismatch, caused by errors in matching the pair of views or by unreliable disparity estimation. The threshold $Th_{eng}$ determines whether a disparity estimate is reliable; it is set from the averaged (filtered) error energy and a tolerance coefficient α, as shown in Figure 4 and given by Equation (5). Here, α is a tolerance factor that adjusts the reliability of the filtering process: smaller values make D(i, j) more reliable but, on the other hand, erode the disparity map by removing more disparity points from it. The disparity index $D_n$ is then calculated as the average of D(i, j) (Equation (6)). A stereoscopic video maintains a nearly constant $D_n$ between its left and right views, provided that there are no scene changes in the video; this statement is examined in the experimental part. If there is any packet loss or distortion in either or both views, $D_n$ deteriorates, revealing a dissimilarity between the views that is measured by Equation (7), where $D_n$ is the disparity index of the current frame and P is the number of disparity indices of previous candidate frames with which it is compared to measure the dissimilarity ($S_m$). The value is divided by 10 to bring the final $S_m$ score into the range from 0 to 1, so that $S_m$ has the same weight as the perceptual edge-based measure; both contribute equally to the final QA score $QAOM_{3D}$, as discussed in a later section.
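As a rough illustration of the reliability filtering, the disparity index, and the dissimilarity score described above, consider the sketch below. Because Equations (4)-(7) are not reproduced in this text, every formula in it is an assumption: the threshold is taken as α times the mean error energy, unreliable pixels are set to 0, $D_n$ is the mean over all pixels, and $S_m$ is the absolute deviation of the current index from the mean of the previous P indices, divided by 10 and clipped to [0, 1]. All function names are hypothetical.

```python
import math


def reliable_disparity(disparity, energy, alpha=1.0):
    """Zero out disparities whose error energy exceeds alpha * mean energy (assumed)."""
    finite = [e for row in energy for e in row if math.isfinite(e)]
    threshold = alpha * sum(finite) / len(finite)
    filtered = [
        [d if e <= threshold else 0 for d, e in zip(d_row, e_row)]
        for d_row, e_row in zip(disparity, energy)
    ]
    return filtered, threshold


def disparity_index(filtered):
    """Assumed D_n: mean of the filtered disparity map over all pixels."""
    values = [d for row in filtered for d in row]
    return sum(values) / len(values)


def dissimilarity(current_index, previous_indices, scale=10.0):
    """Assumed S_m: deviation from the mean of the previous P indices, scaled by 10."""
    reference = sum(previous_indices) / len(previous_indices)
    return min(1.0, abs(reference - current_index) / scale)


# Toy 2 x 2 maps: the high-energy pixel is treated as unreliable.
toy_disparity = [[2, 2], [3, 5]]
toy_energy = [[1.0, 1.0], [1.0, 9.0]]
filtered, threshold = reliable_disparity(toy_disparity, toy_energy, alpha=1.0)
print(filtered, threshold, disparity_index(filtered))

# Deviation of a distorted-frame index (3.497) from an undistorted one (4.155).
print(dissimilarity(3.497, [4.155]))
```

With P = 1 the score compares each frame only with its immediate predecessor; larger P smooths the reference index over several previous frames. Lowering alpha tightens the threshold and removes more points from the map, which matches the erosion effect described above.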

Edge-Based Perceptual Difference Measure
Considering the human visual system, edges are the most essential perceptual elements.They record key events and changes in the features of an image, such as discontinuities in depth and surface orientations and fluctuations in scene illumination.The losses of edge magnitudes in visually relevant parts of an image are calculated in this subsection.
Because of its low computational complexity and high efficiency in detecting edges, we employ a Sobel filter [41] to find the edge regions of a given video frame A. As indicated in Equation (8), the operator computes derivative approximations using two 3 × 3 kernels convolved with the original image, one for horizontal and one for vertical changes. If A is the source image and $G_x$ and $G_y$ are two images that contain the horizontal and vertical derivative approximations at each position, the computations are as follows:

$$G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * A, \qquad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix} * A \qquad (8)$$

Here, * denotes the two-dimensional convolution operation. The resulting gradient approximations are combined to give the gradient magnitude for the current frame ($G_c$):

$$G_c = \sqrt{G_x^2 + G_y^2} \qquad (9)$$

The edge magnitude map for the previous frame ($G_P$) is determined similarly. After that, the edge strength difference ($E_{diff}$) between the current frame and the previous frame is computed over the perceptually relevant region [42] of the image, defined as the pixel positions where the edge strength is greater than the mean of the current frame's edge strength map ($\bar{G}_c$). Furthermore, the threshold above which edge magnitude differences are perceptible is taken as half the standard deviation of the current edge magnitude ($\sigma_{G_c}$) [43]. The mean and standard deviation are calculated for the current frame of both the left and right views, and their mean is taken as the threshold. The perceptible edge difference ($E_c$) at pixel location (i, j) is then calculated according to Equations (10) and (11), where P is the number of previous candidate frames relative to the current frame used to measure the difference. For a current video frame c, the sum of $E_c(i, j)$ is defined as the total difference ($E_{cD}$) for that frame, and the maximum perceived difference over the two views is computed as in Equation (12).
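The edge-based measure described above can be sketched for grayscale frames as follows. The Sobel kernels are the standard ones; everything else is an assumption made for illustration, since Equations (10)-(12) are not reproduced in this text: the relevant region is taken as pixels whose current edge magnitude exceeds the frame mean, a difference counts as perceptible when it exceeds half the standard deviation of the current magnitude, and the frame score is normalized as the fraction of relevant pixels that changed perceptibly. Only one previous frame (P = 1) is used, and the function names are hypothetical.

```python
import math
import statistics

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_magnitude(frame):
    """Sobel gradient magnitude; border pixels are left at 0.

    The kernels are applied without flipping (correlation), which only
    changes the sign of Gx and Gy and therefore not the magnitude.
    """
    rows, cols = len(frame), len(frame[0])
    mag = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            gx = gy = 0.0
            for u in range(3):
                for v in range(3):
                    pixel = frame[i + u - 1][j + v - 1]
                    gx += SOBEL_X[u][v] * pixel
                    gy += SOBEL_Y[u][v] * pixel
            mag[i][j] = math.hypot(gx, gy)
    return mag


def perceptual_edge_difference(current, previous):
    """Assumed score: share of relevant edge pixels that changed perceptibly."""
    g_c = gradient_magnitude(current)
    g_p = gradient_magnitude(previous)
    flat = [g for row in g_c for g in row]
    mean_c = statistics.fmean(flat)
    threshold = 0.5 * statistics.pstdev(flat)
    relevant = changed = 0
    for row_c, row_p in zip(g_c, g_p):
        for c, p in zip(row_c, row_p):
            if c > mean_c:  # perceptually relevant edge pixel
                relevant += 1
                if abs(c - p) > threshold:
                    changed += 1
    return changed / relevant if relevant else 0.0


def frame_edge_score(cur_left, prev_left, cur_right, prev_right):
    """Maximum of the per-view edge differences, as described in the text."""
    return max(
        perceptual_edge_difference(cur_left, prev_left),
        perceptual_edge_difference(cur_right, prev_right),
    )


# Toy frames: a flat frame and a frame with one vertical step edge.
flat_frame = [[10] * 6 for _ in range(6)]
edge_frame = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
print(perceptual_edge_difference(edge_frame, edge_frame))
print(frame_edge_score(edge_frame, flat_frame, flat_frame, flat_frame))
```

Identical consecutive frames give a score of 0, while an edge that appears from one frame to the next drives the score toward 1, consistent with the behavior reported for undistorted and heavily distorted frames.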

Final Objective Quality Measure
The final QA score is obtained by combining the two stereoscopic perceptual measures, dissimilarity and perceptual difference, to assess the quality of the received stereoscopic 3D video in the absence of the original transmitted video.
We assume that the edge and disparity features have equal weights with respect to perceptual characteristics and dominance. Therefore, the distortion measures developed in Equations (7) and (12) are averaged, and this average distortion is subtracted from 1 to obtain the overall quality metric of the stereoscopic video, $QAOM_{3D}$:

$$QAOM_{3D} = 1 - \frac{S_m + E_{cD}}{2} \qquad (13)$$
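The combination rule stated above, together with the per-video averaging used in the experiments, can be written as a short sketch. The function names and the sample values are hypothetical; both inputs are assumed to already lie in the range from 0 to 1.

```python
def qaom_3d(dissimilarity, edge_difference):
    """Frame score: 1 minus the equally weighted mean of the two distortions."""
    for name, value in (("dissimilarity", dissimilarity),
                        ("edge_difference", edge_difference)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return 1.0 - 0.5 * (dissimilarity + edge_difference)


def video_score(frame_scores):
    """Video-level score: mean of the per-frame QAOM_3D scores."""
    if not frame_scores:
        raise ValueError("at least one frame score is required")
    return sum(frame_scores) / len(frame_scores)


# Hypothetical per-frame (S_m, edge difference) pairs, not measured data.
frames = [(0.0, 0.0), (0.2, 0.4), (0.5, 1.0)]
scores = [qaom_3d(s_m, e_d) for s_m, e_d in frames]
print(scores, video_score(scores))
```

A score of 1 corresponds to no measured distortion, and the score falls toward 0 as either measure grows. Unequal weights could be substituted for the factor 0.5 if one feature were found to dominate perceptually, but the text assumes equal weights.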

Results
According to the proposed method, the dissimilarity between the left and right frames is measured from the inter-view disparity information obtained by block-based stereo matching. With a 3 × 3 window for block matching and a predetermined disparity search range of d = 1 to 40, the reliable disparity, depth map, and error energy were calculated. Equations (2)-(5) detail the intermediate steps of the dissimilarity measure, and Figure 5 shows some results for the Bicycle Riding sequence. Using Equation (6), the disparity index ($D_n$) calculated from the reliable disparity between the left and right frames of the Bicycle Riding video was 4.155; this frame-by-frame computation was repeated for the entire video sequence. The disparity indices were observed to remain relatively constant throughout a sequence, provided that there was no scene cut, while any error or distortion in either view reduced the value of $D_n$. To verify this observation, different kinds of errors, i.e., packet losses, distortions, and manual errors, were applied to one of the two views. Then, based on variations in the error energy, the disparity index was calculated from the left and right views. The intermediate calculations and the corresponding maps are shown in Figure 5.
Using Equation (6), the disparity index ($D_n$) was calculated again from the reliable disparity between the distorted left and original right views of the Bicycle Riding video; it was 3.497, lower than the disparity index of 4.155 for the two undistorted frames. Further examples of simulated packet losses and distortions are shown in Figures 6 and 7, where the different error mechanisms and artifacts induced in the frames can be seen, with the resulting disparity indices varying with the type of error. In our experiment, packet losses were simulated using the JM H.264 reference software and, for the various artifacts, the disparity indices were lower than those of the undistorted frames. To verify that the left and right frames in a 3D video maintain a nearly constant disparity index for proper viewing, we simulated different errors in the 75 frames of the Bicycle Riding video sequence; a graph of the disparity indices is presented in Figure 8. Under packet loss, the dissimilarity between the views results in a deteriorated disparity index (frames 50 to 75). Similarly, for each distortion artifact (frames 20 to 30), the index decreased and then returned to its previous value once both views were again undistorted. In such cases, $S_m$ of the current frame is measured by comparing its disparity index with those of the previous frames using Equation (7). Different numbers of previous frames can be used as references by selecting different values of P.
For the second part of our experiment, edges were selected as the most significant perceptual feature, and the Sobel operator as the most prominent and widely used mask for detecting edges in an image. In our proposed method, the edges of the current frame are compared with those of the previous one to determine the distortion between them. The edges of a frame detected using Equations (8) and (9) are shown in Figure 9, which presents those of one distorted and one original frame from Figure 5. Each edge-detected frame was compared with the previous frame in the same view to observe edge distortions. We observed that, for undistorted videos, the perceptual edge difference was small. However, if prominent error-induced distortions were present in a frame, the measure showed a significant difference from the previous frame and could exceed the predefined threshold on edge accumulation. According to Equations (10) and (11), the threshold above which the magnitude of the edge differences is perceptible was half the standard deviation of the current edge magnitude ($\sigma_{G_c}$). Based on Equation (12), the edge difference considered was the maximum over the separate left and right frames.
It was obvious that the motions in a video affected the accumulation of edge differences, as shown by the differences between the Sobel edge-detected original right views in frames 4 and 5 in Figure 10 in which the edges indicate relative motions between the frames.

Discussion
According to our observations, this temporal edge accumulation was low even for high-motion videos. The perceptual edge difference $D_E$ (the total difference $E_{cD}$ of Equation (12)) was calculated, and the results obtained from the original and distorted frames of the Bicycle Riding stereoscopic video are shown in Table 2. For low-motion videos, the $D_E$ values were usually in the range from 0.05 to 0.10 and, for high-motion ones, between 0.10 and 0.20, as can be observed for the undistorted frames 1, 2, and 3, whereas the values for frames 6 and 7 increased because of packet losses occurring in the previous frame. For erroneous or distorted frames, however, the edge difference accumulation could be as high as 1, depending on the percentage of transmission losses and distortions. Table 2 also shows that when distortion or packet loss (P.loss) was added, the disparity index of the left and right views ($D_n$) decreased significantly; this index is also used to measure the dissimilarity ($S_m$) of the video. Finally, in Equation (13), the dissimilarity and perceptual difference measures were merged to construct the QA metric of a 3D video. To demonstrate the effectiveness of our method, we examined two video datasets. To speed up the process, only the most recently received frame was used as the reference when calculating the perceptual edge difference measure. Different scores were obtained for frames 2 to 10, as shown in Table 3, with distorted or degraded frames marked with a D next to the frame number. A video's $QAOM_{3D}$ score is then the overall mean of the $QAOM_{3D}$ scores obtained from each frame. The final quality scores of the 75-frame test videos under different loss settings, obtained with our proposed QAOM and two popular algorithms, are shown in Table 4.
StSDLC [42] is a metric that expresses impairments in the range from 0 (low distortion) to 1 (high distortion). In our experiment, to make the methods comparable, we presented its values in reverse, i.e., from 1 to 0. We used the H.264 JM encoder to create several forms of impairment: quantization parameter settings of 26 and 32 to reduce the overall quality of a video, packet losses of 1% and 3% simulated by the encoder, and, as distortions, different kinds of compression artifacts and manual degradation. We found that our approach performed as well as the full-reference-based ones and, most importantly, did not require an original video or image. It is therefore well suited to quantifying the quality of transmitted or broadcast videos for which the original video is not available at the receiver end. In addition, we compared our approach with BLIIND [34], an NR-based method for estimating video quality. That method works well for videos with nominal noise, where the QP is 26 or 32, and shows moderate results even for 1% packet loss. However, for higher distortions and losses, it fails to track the level of distortion since it is not a distortion-specific method and does not account for structural errors in the temporal video sequence. In these cases, the proposed method performs better than the frequency-domain-based, computationally expensive BLIIND method; it performs particularly well for higher distortion and noise in the video sequences. In addition, the results for QP 32 illustrate the performance of the method for light noise levels in the video. Although it does not outperform all the approaches, the proposed method works adequately well at a lower computation time. It considers HVS-sensitive features without requiring a reference frame, which is essential for transmitted videos.
This research was conducted to implement a low-computational QA algorithm that could be used in real-time stereoscopic video transmission. To determine the influence of the binocular artifact in transmission, a series of experiments was performed, with the stereoscopic video streams individually encoded. To simulate real packet losses, the JM reference software was used. Finally, we designed an objective no-reference quality measurement approach based on the HVS features of stereoscopic videos, which combines a dissimilarity measure based on the disparity index between the left and right views with a perceptual difference measure calculated from the difference in edge magnitude between temporal frames.

Conclusions
This study aims to incorporate a binocular rivalry measure in QA, which will lead us to design robust, error-resilient 3D video communication free from perceptual ambiguity and binocular rivalry. The focus is on developing error concealment strategies that are applicable in real time and to various categories of 3D displays, and that do not deteriorate the user's visual perceptual comfort. To avoid the detrimental effect of binocular ambiguity and visual discomfort, the transmission system could be designed to take these challenging psychovisual issues into account; for example, based on feedback regarding the measure of 3D video quality at the receiver end, parameters of the transmission system could be changed "on the fly" to mitigate distortions and generate 3D views with minimal distortions [44]. To mitigate the loss of quality, the transmission system can allocate extra resources to the affected view or enhance the level of protection to reduce errors for that 3D video channel. This type of distortion measure or impairment meter is useful for assuring error-resilient video communication, and it increases the possibility of efficiently fusing 3D video content and improving overall user QoE.

Figure 1. Typical model of no-reference video quality measure.

Figure 2. Binocular disparity and corresponding depth perception in the brain.

Figure 3. Block diagram of a QA metric for 3D video.

Figure 4. Estimation of the disparity map from the minimum of smoothed error energy.

Figure 5. Steps for calculating disparity index from left and right frames of the Bicycle Riding video.

Figure 6. Different error scenarios induced in left views of 3D Video (part-1).

Figure 7. Different error scenarios induced in left views of 3D Video (part-2).

Figure 8. Disparity indices with respect to frames for the Bicycle Riding 3D video.

Figure 9. Sobel edge detection in a stereoscopic frame.

Figure 10. Difference in Sobel edge accumulation due to motion.

Table 2. Experimental analysis of the Bicycle Riding video.

Table 3. QAOM 3D scores for original and distorted videos from Bicycle Riding, Car Moving, and Flag Waving datasets.

Table 4. Comparison of quality scores from the proposed and different prominent approaches.