A Self-Assessment Stereo Capture Model Applicable to the Internet of Things

The realization of the Internet of Things greatly depends on the information communication among physical terminal devices and information platforms, such as smart sensors, embedded systems and intelligent networks. Playing an important role in information acquisition, sensors for stereo capture have gained extensive attention in various fields. In this paper, we concentrate on promoting such sensors in an intelligent system with self-assessment capability to deal with the distortion and impairment in long-distance shooting applications. The core design is the establishment of objective evaluation criteria that can reliably predict shooting quality with different camera configurations. Two types of stereo capture systems, the toed-in camera configuration and the parallel camera configuration, are considered respectively. The experimental results show that the proposed evaluation criteria can effectively predict the visual perception of stereo capture quality for long-distance shooting.

The remainder of this paper is organized as follows: Section 2 reviews related work; Section 3 presents the subjective quality assessment experiment on stereoscopic contents; Section 4 establishes the objective evaluation criteria for stereo capture. Furthermore, Section 5 describes the evaluation experiments. Finally, Section 6 concludes the paper.

Related Work
Currently, researchers have focused on analyzing shooting principles to obtain an excellent stereo effect. Frederik et al. presented production rules required for the acquisition of adequate stereo content through a study of the geometry of the 3D display, 3D capture and the stereo formula [10]. Kim et al. reported a visual fatigue metric that could replace subjective tests in evaluating image quality; the metric could also be used in filming and warning systems for general viewers [16]. Kim and Sohn proposed a 3D reconstruction algorithm for a stereoscopic image pair by analyzing the inter-camera distance, camera focal length and shooting distance; this method solved the mutual occlusion and interaction problems between real and virtual objects in an MR system [20]. Lee et al. used a multiple color-filter aperture camera to present an adaptive background generation method for the automatic selection of initial object regions, which is suitable for simultaneous depth capture and object detection [21]. Min et al. presented a new method of synthesizing novel views from virtual cameras in multiview camera configurations for 3DTV systems [22]. Yamanoue, Okui and Yuyama presented a setting principle to achieve better stereo quality by analyzing the relationship between the camera focal length and the convergence point [23].
Additionally, many scholars have worked to build systems for stereo capture. Ebrahimnezhad et al. constructed a set of calibrated virtual stereo cameras and proposed a robust curve-based method for 3D model reconstruction of an object from image sequences captured by perpendicular stereo rigs [24]. Heinzle et al. presented a computational stereo camera system that closes the control loop from capture and analysis to the automatic adjustment of physical camera parameters [25]. Ilham et al. established a semi-automatic camera rig system that controls the inter-camera distance and the convergence angle [26]. Jung et al. combined stereo vision cameras with a time-of-flight camera to propose a novel visual discomfort monitoring method; the stereo-plus-depth camera system had left-view and right-view color cameras to capture the stereoscopic images and a time-of-flight camera to capture the depth map [27]. Lim et al. proposed a simple geometrical ray approach to calibrate the extrinsic parameters of the cameras and solved the stereo correspondence problem of a single-lens bi-prism stereovision system [28]. Oskam et al. presented a controller system for camera convergence and inter-axial separation that specifically addresses challenges in interactive stereoscopic applications such as games [14]. Furthermore, Okui, Hanning, Zhu, Park et al. have conducted related studies on the effect of camera parameters on shooting quality [15,29-31].
Although stereo capture technologies have made significant progress, most studies covered only a subset of the stereo camera parameters. The stereo capture systems were built mainly for specialized cameras or hardware platforms, and widely-recognized objective evaluation criteria for stereo camera shooting quality have not been fully investigated yet. Meanwhile, some subjective evaluation theories were proposed to assess the quality of stereo images [15,32,33]; however, subjective evaluation requires numerous duplicate tests with a great number of participants. It is a time-consuming process for stereo capture, and no immediate feedback can be provided for resetting the stereo cameras if the shooting quality is not as ideal as expected. For these reasons, the shooting parameters of the toed-in and parallel stereo camera configurations, as well as the basic shooting principles, are fully investigated in order to establish objective shooting quality evaluation criteria applicable to stereo capture systems for long-distance shooting. The evaluation criteria provide the design of integrated stereo camera systems with a basic indication of shooting quality, which can be regarded as the basis for the self-assessment capability. When the self-assessment indicates low shooting quality, the shooting parameters are adjusted until the evaluation criteria reach a reasonable value, so that a better 3D effect in accordance with people's subjective perception is obtained.

Subjective Quality Assessment Experiment on Stereoscopic Contents
Different from the traditional 2D subjective quality assessment method, an additional indicator, such as depth rendering, naturalness, presence, visual experience, visual comfort, etc., should be taken into consideration in 3D assessment [34]. In this section, subjective quality assessment based on the 3D evaluation concept is used to evaluate the generated stereoscopic contents in order to justify the proposed optimal shooting rules.
(1) Participants: Fifty non-professional adult assessors, aged between 20 and 40 years with a binocular vision of more than 0.8, participated in the subjective assessment. All participants had normal stereoacuity according to the Titmus stereo test.
(2) Stereo images: Double viewpoint images were adopted for the subjective experiments. These images were selected from the stereo image library in the stereo vision laboratory of the School of Electronic Information Engineering, Tianjin University [35]. The main sources of stereo images in this library are captured by Autodesk 3ds Max and stereo cameras in the laboratory (shown in Figure 1a-c). The resolution of the training and test stereo images is 1024 × 768. This database consists of 1839 stereoscopic pairs (990 for the toed-in camera configuration and 849 for the parallel camera configuration) under various shooting camera parameters.
(3) Display setting: The current research is motivated by the need to enhance the understanding of the variables that may influence the shooting quality of stereoscopic images. Since the display is the most common medium through which people watch stereoscopic images and perceive the shooting quality, several display aspects, e.g., display size and viewing conditions, should not be overlooked [16]. It is important to acknowledge that the depth perceived in stereoscopic contents is strongly related to the size of the display screen [32]. Subjective tests in this paper are conducted on two different sizes of stereoscopic displays, namely the Hyundai S465D 46-inch 3D stereoscopic LCD display and the LG 47CM540-CA 47-inch 3D HDTV display, to build the shooting principles. The two are paired with their own 3D active glasses.
(4) Procedure: Before the formal experiments, all participants watched randomly-ordered training stereo images for 8 s at a distance of approximately 3.9 m, as suggested in ITU-R BT.1438 for HDTV [36]. They were then asked to evaluate the stereo images with different camera parameters at the viewing range suggested by the instructions for each display. During the subjective tests, a series of stereo images under the guidance of the long-distance shooting principle, captured with a common value of shooting distance h and changing values of the other camera shooting parameters, was displayed for 5 s, followed by a 5-s interval of a 2D mid-gray image as a grading and relaxation period. For each of the durations, observers were asked to rate the quality of the stereo images using a five-level scale, as shown in Table 1.
The mean opinion score (MOS) [37] was first computed for each image by averaging all of the subjective scores. Then, we calculated the range of each influencing factor, which is further introduced in Section 4, and summarized the relationship between each factor and the MOS values. The experimental process was repeated while changing the value of h. Finally, we presented the total evaluation factor for the toed-in and parallel camera configurations, respectively.
Figure 1. (a) The inter-camera distance can be changed to obtain stereo images with different shooting distances, and both the toed-in and parallel camera configurations can be obtained; (b) bigger inter-camera distance; (c) matrix multi-camera arrangement.
Table 1. Standards for the subjective quality evaluation of stereo cameras.

Score | Quality | Explanation
5 | Excellent | Imperceptible: no damage to 3D or image quality; looks comfortable and natural; consistent with human visual experience.
4 | Good | Perceptible, but not annoying: a slight loss of 3D or depth perception, but the quality of the whole image is still good; consistent with human visual experience.
3 | Fair | Slightly annoying: obvious loss of 3D and depth perception; however, viewing such quality is reluctantly acceptable and generally suitable for human visual experience.
2 | Poor | Annoying: one needs to attentively distinguish 3D and depth perception; not suitable for visual experience.
1 | Bad | Very annoying: nearly no 3D perception, and people feel uncomfortable.
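The MOS computation used in the procedure above is a plain average of the five-level subjective scores; a minimal sketch (the ratings shown are hypothetical):

```python
def mean_opinion_score(scores):
    """Average the five-level subjective scores given to one stereo image pair."""
    if not scores:
        raise ValueError("at least one subjective score is required")
    return sum(scores) / len(scores)

# Hypothetical ratings from ten assessors for a single stereoscopic pair.
ratings = [5, 4, 4, 5, 3, 4, 5, 4, 4, 4]
print(mean_opinion_score(ratings))  # 4.2
```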

Evaluation Criteria for Stereo Capture
Stereo cameras are generally divided into two shooting configurations: toed-in and parallel [15,38]. By analyzing the features of the toed-in and parallel camera configurations, we found that different parameter settings had a significant influence on the 3D effect of stereoscopic image pairs. In addition, we found that the image quality was still good when the evaluation factor exceeded the previous shooting guidelines. The previous shooting guidelines were empirical methods and could only generate a rough estimation and suggestion for the camera parameters. Therefore, this paper aims to establish corresponding five-level evaluation criteria for stereo camera configurations over long-distance shooting, as shown in Figure 2. The meanings of the stereo camera parameters used in establishing the evaluation criteria are summarized in Table 2.

Evaluation Criterion for the Toed-In Camera Configuration
For the toed-in camera configuration, the optical axes are converging on a single point. The objects in the foreground have a significant effect on the stereo quality of the images. Convergence rotation angle α [25] and shape ratio µ [10] are important factors that have to be taken into consideration for the toed-in camera configuration over long-distance shooting.
Convergence rotation angle: Heinzle et al. [25] presented that the value of the convergence rotation angle α (Equation (1)) affects the position of the convergence point, as shown in Figure 3. Previous studies indicated that α had an effect on the screen disparity of a given point, which refers to the distance between the two corresponding points in the stereoscopic images recorded by the left and right cameras. Disparity is often the most important parameter for stereo depth perception and is related to most comfort-zone constraints. However, previous approaches were empirical and could only generate a rough estimation of the camera parameters. Hence, this part aims to establish a corresponding five-level evaluation criterion of the convergence rotation angle for stereo cameras through subjective and objective experiments.
For convenience, this paper considered α as the evaluation index. Firstly, with f = 50 mm and h fixed, d was changed constantly. Then, we changed the value of h and captured the corresponding stereoscopic image pairs once again. Besides, we changed the camera focal length f and carried out similar experiments as described above. The subjective results are shown in Figure 4, which indicated that f, as well as the value of α, had a great effect on the subjective results. In order to enrich the experiments, several camera focal lengths were involved. Through a number of experiments, we proposed an ultimate assessment index g, calculated from Equation (2). The mapping between g and the MOS value, MOS_g, was studied based on the subjective experimental results and is listed in Table 3. It indicated that when the g value was at most 20.56, a good stereoscopic effect could be obtained.
Shape ratio: Frederik et al. [10] proposed that the shape ratio µ is defined as the ratio of the depth magnification to the width magnification. Equation (3) can be used to ensure an undistorted depth reproduction near the screen surface, where h_D is the viewing distance, t_e is the viewer's inter-ocular distance, h is the shooting distance and d is the inter-camera distance. Note that, if an average h_D and the point of convergence are given, the grade of stereoscopic distortion can only be controlled by choosing the right ratio between d and t_e.
The geometrical interpretation of Equation (3) depends on whether points at infinity are actually present in the captured scene. The value of µ has a significant effect on the quality of the stereoscopic image pairs: the smaller the value, the higher the stereo quality. Hence, when shooting over long distances, h far outstrips h_D and d is larger than t_e. This part aims to establish a corresponding five-level evaluation criterion of the shape ratio for the stereo camera through experiments.
Based on the experiments, a series of stereoscopic image pairs was captured with different shape ratios. We selected a larger viewing distance of 3.9 m (refer to [36] for HDTV). The range of µ is from 0.095 to 1.532. Through subjective experiments, we determined the mapping between µ and the MOS value, MOS_µt, of the stereoscopic image pairs, shown in Table 4, which indicated that when the µ value was no more than 0.304, a good stereoscopic effect could be obtained.
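Each factor in this section maps an index value to a five-level score through a table obtained from the subjective experiments; smaller index values indicate better quality. A minimal lookup sketch follows. Only the "good" boundary of 0.304 for µ is stated in the text (Table 4 itself is not reproduced here), so the remaining breakpoints used below are hypothetical placeholders:

```python
def five_level_score(index_value, breakpoints):
    """Map an index to a 5 (Excellent) .. 1 (Bad) score.

    breakpoints: four ascending thresholds; index_value <= breakpoints[0]
    scores 5, <= breakpoints[1] scores 4, and so on; above the last it scores 1.
    """
    for level, bound in zip((5, 4, 3, 2), breakpoints):
        if index_value <= bound:
            return level
    return 1

# Hypothetical breakpoints for the toed-in shape ratio: only the "good"
# boundary 0.304 comes from the paper; the others are placeholders.
MU_T_BREAKPOINTS = (0.15, 0.304, 0.60, 1.00)
print(five_level_score(0.25, MU_T_BREAKPOINTS))  # 4: at most 0.304, good stereo effect
```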

Evaluation Criterion for Parallel Camera Configuration
For the parallel camera configuration, the evaluation of long-distance shooting quality has been studied from the following four aspects: 1/30 rule [39,40], binocular overlap percentage [15,30,41], camera visual acuity [42] and shape ratio [10].
Modified 1/30 rule: In professional stereo shooting activities, the 1/30 rule of 3D, which stipulates that the inter-camera distance d should be 1/30 of the shooting distance h from the camera to the first foreground object, was suggested and widely used in stereo photography. In our experiments, the index d/h was applied to the analysis of the effect on shooting quality.
Previous studies presented the 1/30 rule as a two-level evaluation criterion, which meant only two evaluation outcomes (good or bad) for the stereo effect [39,40]. Therefore, our goals were complementary to these previous works: our work targeted establishing a five-level objective evaluation criterion. For convenience, this paper considered d/h as the evaluation index. Based on experiments, a series of stereoscopic image pairs was captured, with the value of d/h ranging from 1/80 to 1/15.
Combining the subjective experimental results with the range of the d/h value, the mapping between d/h and the MOS value, MOS_dh, was calculated, as shown in Table 5; it indicated that when the d/h value was at most 1/39.685, a good stereoscopic effect can be obtained.
Binocular overlap percentage: The magnification of an image on the retina is a/w, as shown in Figure 5 (here, a is the original image width, p is the viewing angle of the stereo camera, w is the viewing region of the stereo camera and w′ is the resulting composite image width, which denotes the binocular overlap of the stereo camera). a/w can influence the values of positive and negative parallax and further affect the quality of the stereo images. In order to simplify the calculation, w′/w was applied to analyze how binocular overlap affects stereoscopic capture quality, based on the geometric relations in Figure 5. For better understanding, the binocular overlap percentage was taken as the evaluation index. All test stereo images were divided into several groups, each with a fixed value of h and different values of d. Then, changing the value of h and repeating the above experimental process, a series of stereoscopic image pairs was captured. Through subjective experiments, we took the index ξ as the binocular overlap percentage w′/w in Equation (4), with the aid of Equation (5).
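Equations (4) and (5) are not reproduced here; a common geometric form for the binocular overlap of a parallel rig, assumed purely for illustration, is ξ = w′/w = 1 − d / (2h · tan(p/2)), with p the horizontal viewing angle. A sketch under that assumption:

```python
import math

def binocular_overlap(d, h, p_deg):
    """Overlap percentage xi = w'/w for a parallel rig (assumed geometry).

    d: inter-camera distance, h: shooting distance (same units),
    p_deg: horizontal viewing angle of each camera in degrees.
    """
    # Width of the viewing region at distance h for one camera.
    w = 2.0 * h * math.tan(math.radians(p_deg) / 2.0)
    return max(0.0, 1.0 - d / w)

# Hypothetical long-distance setting: d = 1.8 m, h = 50 m, 40-degree viewing angle.
xi = binocular_overlap(1.8, 50.0, 40.0)
print(round(xi, 3))  # 0.951
```

As the sketch suggests, for long-distance shooting the overlap stays close to one unless d grows large relative to h, consistent with the 0.706 to 0.995 range reported below.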
Based on a series of experiments, the binocular overlap percentage ξ of the stereoscopic image pairs can be calculated; the range of ξ is from 0.706 to 0.995. The mapping between ξ and the MOS value MOS_ξ, shown in Table 6, can be used to evaluate the effect of the binocular overlap for long-distance shooting with the parallel camera configuration.
Camera visual acuity: Generally, the camera visual acuity is widely recognized as 0.57° (ϑ; shown in Figure 6). Let h denote the shooting distance; the theoretical inter-camera distance d_w can then be obtained according to the camera visual acuity (shown in Equation (6)). Under the long-distance shooting condition, the camera visual acuity is one of the main considerations; ignoring this limitation might result in viewing discomfort or even the loss of the stereo impression. In order to establish a five-level evaluation criterion, this paper took ϑ (shown in Equation (7)) as the evaluation index of camera visual acuity.
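Equation (6) is not reproduced here; taking the theoretical inter-camera distance as the baseline subtended by the 0.57° acuity angle at the shooting distance, i.e., d_w = h · tan(0.57°), is one plausible reading and is used for this sketch (an assumption, not necessarily the paper's exact formula):

```python
import math

CAMERA_VISUAL_ACUITY_DEG = 0.57  # widely-recognized camera visual acuity

def theoretical_inter_camera_distance(h):
    """d_w subtended by the 0.57-degree acuity angle at distance h (assumed form)."""
    return h * math.tan(math.radians(CAMERA_VISUAL_ACUITY_DEG))

# At a 50 m shooting distance, the theoretical baseline is roughly half a meter.
print(round(theoretical_inter_camera_distance(50.0), 3))  # 0.497
```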
In order to enrich the experiments, several camera focal lengths f were involved; the experimental results are shown in Figure 7, which indicated that f, as well as ϑ, have a great effect on the subjective results. We proposed an ultimate assessment index k, calculated from Equation (8). The mapping between k and the MOS value, MOS_k, is shown in Table 7. The values indicated that when k was at most 55.98, a good stereoscopic effect could be obtained.
Shape ratio of the parallel camera configuration: Similarly, the shooting principle of the shape ratio was applied to the parallel camera configuration. Based on a series of experiments, the value of µ ranged from 0.095 to 0.932. Combined with the subjective experimental results, shown in Table 8, the mapping between µ and the MOS value, MOS_µp, of the stereoscopic image pairs was investigated.

Comprehensive Objective Evaluation Criteria
At present, the most common method used to integrate independent individual factors into a global index is the linear weighting method [43-45]. In order to reasonably evaluate the performance of the objective evaluation criteria, we applied a linear regression to the combination of the six factors, each of which was given a weight. We specified MOS_g as the output value of the convergence rotation angle factor for the toed-in camera configuration, MOS_µt as the output value of the shape ratio factor for the toed-in camera configuration, MOS_dh as the output value of the modified 1/30 rule factor for the parallel camera configuration, MOS_ξ as the output value of the binocular overlap percentage factor for the parallel camera configuration, MOS_k as the output value of the camera visual acuity factor for the parallel camera configuration and MOS_µp as the output value of the shape ratio factor for the parallel camera configuration. Considering that each factor has different properties in stereo capture, the establishment of the evaluation criteria should take all of these perspectives into account; accordingly, the factors were regarded as relatively independent. Although there may be relations among the criteria, for simplicity, the global quality score Q is obtained by a linear combination of the criteria:

Q = (1 − Z)(m · MOS_g + n · MOS_µt) + Z(q · MOS_dh + r · MOS_ξ + s · MOS_k + t · MOS_µp),

where Z = 0 denotes toed-in camera long-distance shooting, Z = 1 denotes parallel camera long-distance shooting, and m, n, q, r, s and t are the weights of the factors in the objective evaluation criteria, restricted by m + n = 1 and q + r + s + t = 1.
With the given weights of the factors, we can compute an objective score for each captured 3D image pair. At the same time, through the series of subjective tests described in Section 2, the subjective score for each pair can also be obtained. In order to choose the proper values of the six weights, the correlation coefficients between the objective and subjective scores were computed on the whole database, and the values that achieved the maximum correlation were chosen as the weight of each factor. Based on the experimental results, the weights of each factor are shown in Table 9. It is worth noting that the factors for each camera configuration received the same weight, indicating that the proposed criteria may have approximately the same importance in evaluating the performance of 3D capture. However, the application of 3D capture is still a complex procedure, and more efforts are needed. Table 9. Weights of each factor in the objective evaluation criteria.
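Given the statement above that the factors within each configuration receive the same weight, the implied values are m = n = 0.5 and q = r = s = t = 0.25 (Table 9 itself is not reproduced, so treat these as an assumption). The global score can then be sketched as:

```python
def global_quality(Z, mos):
    """Linear combination of the factor scores; Z = 0 toed-in, Z = 1 parallel."""
    if Z == 0:
        m = n = 0.5  # m + n = 1
        return m * mos["g"] + n * mos["mu_t"]
    q = r = s = t = 0.25  # q + r + s + t = 1
    return q * mos["dh"] + r * mos["xi"] + s * mos["k"] + t * mos["mu_p"]

# Hypothetical factor scores for a parallel-configuration shot.
Q = global_quality(1, {"dh": 5.0, "xi": 5.0, "k": 4.0, "mu_p": 4.0})
print(Q)  # 4.5
```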

Evaluation Experiments
To verify the proposed objective criteria, another thirty non-professional adults, aged between 20 and 40 years, participated in the subjective evaluation experiments. All of them took the stereo vision test before the subjective experiments. They were asked to watch the training stereo images with different camera parameters. Two hundred and nineteen test stereoscopic image pairs were used for the subjective test and displayed randomly.
One of the selected scenes is shown in Figure 8a. Under the condition of the long-distance shooting setting, stereoscopic image pairs were captured by changing the value of the main camera parameters h and d, when f = 50 mm, using toed-in and parallel camera configurations, respectively.
Take the stereoscopic image pairs captured by the parallel camera configuration as an example, shown in Figure 8b-e. When d = 600 mm and h = 50 m, the quality score Q is 5.0. From the subjective experiments, the MOS is also 5.0, which indicates that the prediction of the proposed criteria is consistent with the subjective evaluation. When d = 1800 mm and h = 50 m, Q is 2.0; based on the subjective experiments, the MOS is 2.3, which indicates that our proposed criteria are in line with human perception. According to our evaluation formula, Q decreases as the value of d increases. When d = 2400 mm and h = 90 m, Q is 3.0 and the MOS is 3.3, which is close to the objective value.
Stereoscopic images captured by the toed-in camera configuration are shown in Figure 8f-i. When d = 1000 mm and h = 70 m, Q is 5.0; from the subjective experiments, the MOS is 5.0, which is in line with the output of our proposed criteria. When d = 3000 mm and h = 70 m, Q is 3.5 and the MOS is 3.8. The comparison between the two results reveals that the value of d has a great effect on stereo image quality, which is consistent with our prediction. Similar results can also be found for other captured images. In order to further validate the effectiveness of the proposed objective evaluation criteria, another four groups of scenes were chosen for the above experiments. Two groups were real 3D scene pictures (shown in Figure 9a,b), and the rest were Autodesk 3ds Max scene pictures (shown in Figure 9c,d). By changing the values of the shooting parameters h, d and f, another two hundred and sixty-one stereoscopic image pairs were chosen to validate the effectiveness of the proposed criteria. The linear correlation between the objective evaluation criteria Q and the subjective evaluation MOS values is shown in Figure 10. As we can see from the figure, the consistency between the proposed criteria and the subjective evaluation is clearly identified.
Three evaluation indices are adopted to verify the consistency between the objective evaluation criteria and the subjective evaluation values for long-distance stereo capture quality [46]: the Pearson correlation coefficient (CC), the Spearman rank-order correlation coefficient (SROCC) and the root-mean-square error (RMSE). The range of CC and SROCC is 0 to 1; the closer the values of CC and SROCC are to one, the better the performance of the criteria, and vice versa. The range of RMSE is 0 to +∞, and the lower its value, the better the performance of the criteria. Thirty image pairs were selected from the two hundred and sixty-one test stereoscopic image pairs to compare the output values of the subjective and objective evaluation criteria, and the results are summarized in Table 10. According to Figure 10 and Table 10, we can conclude that the objective evaluation results of the proposed criteria are in accordance with those of the subjective evaluation. Verified by the subjective experimental results and theoretical analysis, the proposed criteria are applicable for evaluating the long-distance shooting quality of stereo cameras.

Conclusions
In this paper, we proposed the design of a self-assessment stereo capture model applicable to the IoT for long-distance shooting. Serving as a major procedure in information acquisition, stereo capture plays a key role in determining the quality of stereo capture-based content in the IoT. As the core component of the self-assessment model, objective evaluation criteria for long-distance shooting quality adapted to different stereo camera configurations were fully investigated. Two types of stereo camera systems, the toed-in camera configuration and the parallel camera configuration, were considered respectively. Experimental results indicated that the proposed evaluation criteria were consistent with people's subjective perception and can be applied to the self-assessment of a stereo capture system's long-distance shooting quality. Instead of duplicate subjective tests, the self-assessment capability can effectively predict the long-distance shooting quality and provide a better 3D effect in accordance with people's subjective perception in a much easier way. In conclusion, the establishment of the evaluation criteria will not only lead to better shooting quality, but also provide a theoretical basis for integrated stereo capture systems with self-assessment capability. With this model, more reliable stereo image-based information can be used in the interaction between physical terminal devices, and the IoT will also benefit from improved sensor networks.
It is worth noting that a self-assessment stereo capture model that can evaluate its own performance by using the provided criteria has been realized in this paper. However, there is still a lack of research on how to improve that performance. Further studies will focus on the design of a stereo capture system with the ability to adaptively adjust the corresponding parameters to achieve better capture quality when the system performs poorly.
ξ: the index of the binocular overlap percentage
g: the index of the convergence rotation angle
k: the index of the camera visual acuity
Z: the stereo camera type
Q: the overall quality score