Automated Calibration Method for Eye-Tracked Autostereoscopic Display

In this paper, we propose an automated calibration system for an eye-tracked autostereoscopic display (ETAD). Instead of calibrating each device sequentially and individually, our method calibrates all parameters of the devices at the same time in a fixed environment. To achieve this, we first identify and classify all parameters by establishing a physical model of the ETAD and describe a rendering method based on a viewer’s eye position. Then, we propose a calibration method that estimates all parameters at the same time using two images. To automate the proposed method, we use a calibration module of our own design. Consequently, the calibration process is performed by analyzing two images captured by onboard camera of the ETAD and the external camera of the calibration module. For validation, we conducted two types of experiments, one with simulation for quantitative evaluation, and the other with a real prototype ETAD device for qualitative assessment. Experimental results demonstrate the crosstalk of the ETAD was improved to 8.32%. The visual quality was also improved to 30.44% in the peak-signal-to-noise ratio (PSNR) and 40.14% in the structural similarity (SSIM) indexes when the proposed calibration method is applied. The whole calibration process was carried out within 1.5 s without any external manipulation.


Introduction
An autostereoscopic display is a device that allows three-dimensional (3D) perception by providing different images to each one of a person's eyes. The biggest advantage of the autostereoscopic display is that users do not need to wear special equipment such as glasses. During the past decade, various methods have been studied to implement autostereoscopic display [1,2]. Among them, an autostereoscopic display with a conventional two-dimensional (2D) flat panel display and an additional optical layer, which is located in front or at the back of the display panel, has gained the most commercial success due to its low cost and ease of fabrication [3]. A typical method of stereoscopic display, which employs an additional optical layer, endows directionality to each pixel by blocking light paths (for parallax barriers) or by refracting light rays (for lenticular lenses). This method limits the visibility of pixels to only a specific range of view positions or angles. Located at different view positions, the left and the right eye see different sets of pixels.
Despite these advantages, several drawbacks prevent the autostereoscopic display from becoming more prevalent [4,5]. A typical method of stereoscopic display, in which the left and the right views are repeated alternately, is restricted to a specific viewing area. In the case of two views, the margin of the correct viewing area is approximately the distance between the eyes. Even if a viewer moves slightly from the fixed location, the view will transform into a depth-reversed image, referred to as a pseudoscopic image [6]. Increasing the angular resolution, i.e., the number of views, has been a promising method to overcome this problem [7]. The concept of the multiview method is based on showing the user two of several consecutive views, which have disparity over each other, instead of the left and the right views. In this case, however, the spatial resolution of each view is inevitably lowered due to the finite number of pixels. Hence, a tradeoff arises between the angular resolution and the spatial resolution. In addition, complete separation between all views is challenging, so there is a crosstalk where several views overlap with each other.
To solve these problems, an eye-tracked autostereoscopic display (ETAD) has been proposed [8][9][10][11]. Instead of increasing the number of views, the ETAD uses the left and right views only. To prevent the user's eyes from moving out of the correct viewing area, the ETAD employs an additional device, e.g., a camera or a depth sensor, for detecting and tracking the eye position of a viewer. Using the eye tracking device, the ETAD ensures that the user's eyes are always in the correct viewing zones by moving the optical element mechanically [12], by moving slits on an additional LC display [13,14], or by merging viewing zones into two views around the viewer's eyes [6].
To date, there has been various studies to improve the visual quality of the ETAD, which mainly focus on minimizing crosstalk [4], reducing visual fatigue [15], or estimating the accurate position of a viewer [16]. These studies implicitly assumed that all parameters of the ETAD are known accurately or are the same as the designed parameters. In practice, however, parameters of the ETAD are likely to differ from the designed values for many reasons, especially due to inaccuracy in the production and the assembly processes. Moreover, the ETAD system, which employs additional devices and processes, requires parameters that are more accurate because the accumulation of errors from each device significantly degrades the visual quality. Therefore, parameters of the ETAD must be calibrated prior to the operation.
Several studies have investigated the calibration of the ETAD. Sandin et al. [17] described a calibration method of Varrier TM , which consists of multiple autostereoscopic displays with a stereo camera for user's head tracking. To calibrate each optical element of the displays, they used external stereo cameras. However, they did not describe the method for calibrating the pose between the head tracking camera and the coordinates of each device. Bailer et al. [18] proposed a calibration method for a moving barrier type ETAD system. In the calibration process, they mapped each pixel on a captured camera image to the pixels of a display panel. This method is simple to implement; however, the calibration process should be conducted in a dark room. Moreover, the mapping pixels between the image and the panel are only valid when the viewer is at a certain distance. In the case of Nintendo TM 3DS, which is commercially available, the calibration is carried out by the user [19]. A user continuously changes parameters while looking at the screen until the optimum visual quality is achieved.
Thus far, there has been no method to calibrate all parameters of the devices consisting the ETAD. In calibrating the ETAD, therefore, the most suitable method to date is to calibrate each device individually and sequentially. Below, we present a calibration example for an ETAD consisting of an onboard camera for eye tracking, a 2D display panel, and an optical element such as lenticular lenses or parallax barriers. First, we calibrated an onboard camera by analyzing several images, which capture patterns of various postures in front of the camera [20]. Then, we calibrated the optical element by the visual pattern analysis method [21]. This method uses two images of the display, thus requiring an external camera setup. Finally, the extrinsic pose between the onboard camera and the display panel should be calibrated. There are few studies on calibrating the extrinsic pose of these two devices. The second-best method is to apply a similar approach as in [22], in which the extrinsic pose of the camera and the pattern on a robot are calibrated. Even though we employed this method, problems still exist. To calibrate the pose of the camera and the pattern looking in the same direction, a mirror is needed. Furthermore, several movements of the device are required to capture images at various angles. Applying this approach to the ETAD can lead to a serious problem: parameters of the device may be changed by repeated movements of the device. In addition, individual calibration methods have a serious problem. The calibration errors of each device are accumulated and finally affect to the visual quality. The sequential approach described above is also not suitable for automated calibration. For automated calibration, an auxiliary manipulator is needed to move calibration patterns or the ETAD itself. In terms of mass production, this method increases both cost and processing time of the device calibration.
For automated calibration of the ETAD, several problems should be solved. It has not been clearly established which parameters of the ETAD need to be calibrated. A novel methodology would be to calibrate all of these parameters not individually but simultaneously for cost reduction. To alleviate parameter change during the calibration process, the method should minimize manipulation or even calibrate in a fixed environment. In this paper, we introduce an automated calibration method for the ETAD. The main contributions of this study are as follows: • We establish the parameters that require calibration. To achieve this, we firstly address the process of the ETAD from eye tracking to 3D image rendering. To the best of our knowledge, the whole process of the ETAD has never been explored before.

•
We propose a novel methodology, which calibrates all devices of the ETAD. Our calibration approach is mainly focused on both the automated and the simultaneous calibration. This method calibrates all parameters at the same time, not sequentially. The proposed method also does not require the movement of the ETAD or the calibration device, but instead is performed in a fixed environment. To achieve this goal, we designed and implemented a calibration module that consists of a 3D pattern and an external camera. The module enabled us to calibrate an ETAD by simply setting it in front of the calibration module.
The rest of the paper is organized as follows. In Section 2, we establish the parameters of the ETAD by addressing its physical model and rendering process. Section 3 presents a calibration method, which estimates all parameters automatically and simultaneously. Then, we experimentally validate the proposed method in Section 4. Finally, Section 5 concludes the paper.

Parameters Establishment
In this section, we establish the parameters of the ETAD that need to be calibrated in two stages. First, we describe the parameters of the devices that comprise a conventional ETAD. Here, the conventional ETAD refers to an autostereoscopic device having a camera for eye tracking and an additional optical layer, e.g., lenticular lenses or parallax barriers, on a 2D flat panel. Note that parameters are common even though the types of optical layers are different. Then, we describe the rendering process of the ETAD in order to identify the parameters.

Physical Model of the ETAD
The parameters of the conventional ETAD devices are represented in Figure 1. The conventional ETAD consists of three modules: a two-dimensional flat panel, an optical layer, and an eye-tracking sensor. The two-dimensional flat panel is a collection of pixels and each pixel has a width µ w and height µ h . Most of optical layers are lenticular lenses or parallax barriers [23]. In addition, these two optical layers have common parameters. A pitch p represents the distance between the lens in the case of lenticular lenses and between the slits in the case of parallax barriers. The slanted angle θ represents the degree of tilting of the optical layer. The gap τ is the distance of the slit of the barrier or the focal plane of the lens from the flat panel. The offset parameter η represents the horizontal translation of the optical layer from the origin of the display coordinate system. An onboard camera is attached at the top of the display panel for tracking a viewer's eyes. To reconstruct the three-dimensional position from a point on an image, intrinsic parameters and distortion parameters of the camera are used. The intrinsic parameters include focal lengths f x , f y and principal points c x , c y . The parameters k 1 , k 2 , k 3 , and k 4 denote distortion coefficients. There are two coordinate systems: the onboard camera coordinate system and the display panel coordinate system.

Rendering Process
Basically, the rendering process for the ETAD is considered as an iterative process comprising two steps: the eye position estimation step and the image-multiplexing step. In the eye position estimation step, the system estimates the 3D position of the eyes from the image captured by the onboard camera. Note that the eye detection method is beyond the scope of this study. In the image-multiplexing step, the system generates a multiplexing image according to the eye positions. Let us describe the iterative process systematically.

Eye Position Estimation
Step: Given the eye positions on a captured image from the onboard camera, they are derived in the camera coordinate system as where s µ and t µ are eye positions on the captured image and x µ , y µ are the corresponding positions on the onboard camera coordinate system. Note that the subscript µ is either left or right (µ ∈ {l, r}) and the equation containing µ is the same in the left and right cases. The parameters f x , f y are focal lengths in the x and y directions, respectively, and c x , c y are principal points of the onboard camera. Commercial camera lenses conventionally have distortion, known as radial and tangential distortion [24]. We define k 1 , k 2 as the parameters for radial distortion, and k 3 , k 4 as the parameters for tangential distortion of the camera lens. Using these distortion parameters, the undistorted position of each eye x µ , y µ is derived as where r µ = x 2 µ + y 2 µ . Then, we can obtain the ray vectors, which have direction information of both eyes from the origin of the camera coordinate system. The ray vector v µ which passes through x µ , y µ from the origin of the camera coordinate system O c can be represented in homogeneous coordinates as Given these back-projected vectors, we finally obtain the three dimensional eye positions e c l , e c r as where α is defined as Here, we assume that interpupillary distance (IPD) ψ is known and the direction of the face normal vector φ faces the onboard camera. The proof for Equations (4) and (5) are given in Appendix A at the end of the paper.

Image Multiplexing
Step: In our rendering approach, we can simplify the method of generating a multiplex image as a problem of assigning labels (left or right) to each pixel. Note, a pixel generally consists of several sub-pixels. However, here we define the pixels in the smallest unit, i.e., sub-pixels, as "pixels". The multiplexed image is generated from the number of viewpoint images, e.g., a two-view display requires two images: the left and the right image. As described above, all pixels have directionality by the additional optical layer. Pixels are assigned the "left" label to show the same intensity value of the left source image if their direction is toward the left eye of the viewer; the same condition is applicable to the "right" case. On the contrary, we can assign a label to each pixel by determining whether the pixel's position is close to the visible pixels of the left eye or those of the right eye. The slits of the barrier or principle points of a lenticular lens can be expressed as a set of lines; therefore, the pixels passing through the slit and directed to the left or right eye are also represented by lines. Here, we would like to define these sets of lines as the visible pixels of the left eyes and those of the right eyes. Then, we assign a label value that corresponds to the nearest visible line for the all pixels, not visible directly.
The 3D eye position e c µ obtained from the captured image is based on the camera coordinate system; we can convert it into the display coordinate system as where T d c is a homogeneous transform matrix from an onboard camera coordinate system to a display coordinate system. The transform matrix can be represented as which consisting of the transform matrix t d c = t x t y t z T and the rotation matrix R d c . The rotation matrix is defined as cos r z cos r y cos r z sin r y sin r x − sin r z cos r x cos r y sin r y cos r x + sin r z sin r x sin r z cos r y sin r z sin r y sin r x + cos r z cos r x sin r x sin r y cos r z − cos r z cos r x − sin r y cos r y sin r x cos r y cos r x    , (9) where r x , r y , r z are Euler angles in radians. We define the nth line of the slits of an optical element as S n ; the vector equation is represented as where is the offset, θ is the slanted angle, p/ cos θ is the horizontal pitch, and τ is the gap of the optical layer. The left term of this equation represents the origins of slits, and the right term the directional vector with an arbitrary real number δ. We obtain the positions of the visible pixel s n,µ which is represented as the projected vectors on the panel using the eye positions and the slit vectors as where z µ (µ ∈ l, r) is the distance to the left or the right eye. Then, we also express a pixel position of u, v on the display panel as where λ w , λ h are the width and height of a pixel, respectively. We obtain the distances between a pixel P and all visible pixels s n,µ by the function D, which is given by where × represents the cross product and · denotes the two-norm. Finally, we determine the label of the pixel by comparing it with the minimum distances to the left and the right visible pixels as The pixels with labels "left" have the pixel intensity of those positions on the left source image; the same applies for the "right" label case. An overview of the calibration process is presented in Figure 2.

Parameter Classification
As described above, various parameters are required to generate a correct multiplex image for a certain viewing position. This means that accumulation of small parameter errors can cause visual quality degradation; therefore, accurate calibration is needed for the ETAD. The parameters are listed in Table 1. We classify these parameters into two categories: constant and variable parameters. Constant parameters have absolute values, determined when manufacturing the devices and there is little variation between the devices. These constant parameters also hardly change with movement. Therefore, it is sufficient to use designed parameters directly for the constant parameters. The width and height of the pixel µ w , µ h , and the pitch of an optical layer p belong to this category. On the other hand, some parameters, classified as variable parameters, are not only different from the design value, but even have different values for each ETAD. These errors mainly occur during the production process of each device. These errors can even increase if the device is not fixed during the calibration process. For onboard cameras, focal lengths and principal points, which are dependent on the pose of the image sensor and the lens, vary from device to device. The radial distortions of cameras also have different values between each other. Some parameters of the additional optical layers, such as the slanted angle, the gap, and the offset, are determined in the assembly process with the display panel, because those parameters are relative to the display coordinates. The relative pose between the onboard camera coordinate system and the display coordinate system also depends on how the onboard camera is attached to the display. In our study, therefore, we focus on the calibration of those "variable parameters". Based on the classification of parameters, we describe the calibration method in the next section.

Calibration Process
The proposed method uses two images to calibrate all the parameters of the ETAD. One image is for estimating the parameters of the onboard camera and the other image is for parameters of the optical layer. To estimate camera parameters, i.e., intrinsic parameters and distortion coefficients, the onboard camera takes a picture of the calibration pattern. The same applies to calibrating the optical layer. In this case, the optical layer is a kind of pattern to be captured. Because the viewing angle of the onboard camera does not include the optical layer, an additional camera is required to capture the optical layer. In summary, we use two different cameras: the onboard camera and the external camera; hence, we can capture two images at the same time. To estimate the pose between the onboard camera and the display panel, we introduce an indirect pose estimation method. As mentioned above, it is not possible to directly capture the display panel or the optical layer with only the onboard camera. Some methods use equipment, e.g., mirrors, to capture objects to estimate the pose. These methods require several images, so a robot or a manipulator that can move the mirror is needed. Alternatively, the object in front of the mirror should be moved, which would cause parameter changes. Our method estimates the final pose by combining three poses. Two poses are estimated from two captured images and have different values for each calibration. The other pose is estimated once and is fixed.
With the proposed calibration method, we can calibrate the ETAD automatically and simultaneously; however, additional elements are needed: the calibration pattern for the onboard camera and the external camera to capture the display panel. We designed a calibration module that consists of these two devices: the calibration pattern and the additional camera, which has a common structure with the ETAD (see Figure 3). As with the ETAD, the calibration pattern is not included in the viewing angle of the external camera of the calibration module. Therefore, we placed the ETAD in front of the calibration module for the calibration process. The onboard camera of the ETAD captures an image of the calibration pattern of the calibration module. Oppositely, the external camera of the calibration pattern captures an image of the optical layer with the display pattern. Here, we can estimate the parameters of the ETAD devices, i.e., the onboard camera and the optical layer, because we know in advance the parameters of the devices constituting the calibration module.
The block diagram of the calibration process with two images is represented in Figure 4. When the calibration begins, the onboard camera and the external camera simultaneously capture the patterns on the opposite side. Using the onboard camera image, the system calibrates the onboard camera parameters, i.e., f x , f y , c x , c y , k 1 , k 2 , k 3 , k 4 . At this time, it also estimates the transformation matrix T p c from the calibration pattern coordinate system to the onboard camera coordinate system. Simultaneously, the system calibrates the optical element parameters, i.e., θ, τ, , using the external camera image. The onboard camera pose, i.e., r x , r y , r z , t x , t y , t z , which also can be represented as the transformation matrix T d c is finally calibrated using two estimated transformation matrix T p c , T x d and T p x calibrated in advance. The details of the calibration method are presented in the following section.

Calibration Method
Onboard camera calibration: The onboard camera parameters are calculated by analyzing the pattern image of the 3D calibration pattern of the calibration module. We employ Zhang's camera calibration method [20] to estimate both intrinsic parameters and distortion parameters of the onboard camera. The calibration pattern of the calibration module is built such that the three 2D patterns are orthogonal to each other. Hence, once an image is captured, the system detects a 3D pattern on the image and split it into three 2D patterns. However, three images are not enough to estimate accurate parameters using [20]; therefore, we add an orthogonal constraint among the estimated poses of the three planes. Using the known 3D points and the corresponding points in the captured image, we estimate the onboard camera parameters and the poses of the plane patterns by minimizing the following functional: where the left term denotes reprojection error and the right term denotes the orthogonal constraint term. In the left term, j represents the index of the points and i represents the index of the 2D patterns. The functionm projects the 3D point M j of the ith pattern, and m ij corresponds to the points on the image. In the right term, λ is a weight factor; we set this term to 0.01. The matrixR is defined by a composition of vectors, which are the basis vectors of R 1 , R 2 , R 3 and can be represented aŝ Because the three planes of the calibration pattern are orthogonal, the matrixR also has to satisfy orthogonality. This means that the product ofR and the transpose ofR is the identity matrix I. The optimization is solved using Levenberg-Marquardt nonlinear optimization [25]. The transformation matrix T c x can be estimated as because the coordinate system of the second pattern is the same as the 3D calibration pattern coordinate system (see Figure 5).
(a) (b) (c) (d) Optical layer calibration: The system calibrates the parameters of the optical layer such as the pitch, the slanted angle, the gap, and the offset. Our method is mainly based on the visual pattern analysis method [21], which uses two captured images by an external camera. First, a pattern is displayed on the panel. This pattern is composed of two vertically striped patterns to avoid ambiguity between the slanted angle and the pitch. The two vertical stripe patterns have different distances and different colors, e.g., blue and green, to separate them later. However, the ambiguity between the gap and the pitch remains. To solve this problem, Hwang et al. [21] captured the pattern images twice at different distances. They first capture the pattern image at a close distance, then move the camera and capture the image from a long distance. It is not suitable to apply this method directly to our system because of the movement of the camera. We propose a revised method to calibrate the parameters of the optical layer in a fixed environment. The key idea of our method is that we only take into account three parameters of the optical layer, excluding the pitch because it was classified as a constant parameter. With the revised method, ambiguities are solved and only one capture image of the pattern is required to calibrate parameters.
We first displayed the pattern image on the display panel and capture the image using the external camera. We employed a state-of-the-art method to estimate camera pose called homography decomposition [26]. We detected positions of the four corners of the display in the captured image and found homography between the external camera coordinates and the image coordinates. The homography can be decomposed into the intrinsic parameters of the external camera and the camera pose matrix, which is denoted as T d x in this paper. We assumed the intrinsic parameters of the extrinsic camera are calibrated in advanced to this process, thus the camera pose matrix T d x can be computed by multiplying the inverse of the intrinsic matrix to the homography. Even if we found the four corners of the pattern area, it is necessary to restore the pattern to its original shape because the shape of the pattern in the captured image may appear distorted due to the camera rotation. To calibrate the system with the captured image, a rectification process is needed. We used a simpler scheme that warps the four corners to the vertices of a rectangle of the display pattern [24].
In the captured image, the original stripe pattern is shown as a lattice pattern (see Figure 6a). This lattice pattern is generated by the intersection of the two stripes: the stripes of the display pattern and the stripes of the optical layers. The lattice pattern L can be represented as where δ denotes Dirac's delta function. In Equation (18), x, y represent the positions on the lattice pattern, and β denotes the horizontal space of the display pattern. The distance of the external camera from the display pattern is represented as d. Note, the parameter beta is fixed and the distance d was obtained when estimating the external camera pose. Therefore, we can estimate parameters of the optical layer p, τ by analyzing the positions of x, y and then determining optimal values of m, n. Generally, a 2D lattice in a spatial domain is mapped to another 2D lattice in the frequency domain [27].
In the frequency domain, Equation (18) is transformed intô where C denotes a constant scale factor. Estimating a peak point ( f x , f y ) of the lattice in the frequency domain instead of (x, y) in the spatial domain has several advantages. The signal-to-noise ratio (SNR) around the peak is quite high in the frequency domain; therefore, the position of the peak point is estimated more accurately. Moreover, noise is well suppressed in the frequency domain [21]. For estimating the parameters of the optical layers, we first transformed the warped pattern image I(x, y), which is in the form of a 2D lattice in a spatial domain to a frequency domain imageÎ( f x , f y ) using the discrete Fourier transform. Then, we estimated a peak point of the lattices in the frequency domain. To estimate the accurate peak position, the paraboloid fitting was exploited. The parameters concerned with the display pattern are already known, thus the parameters of the optical layer can be estimated by decoding the lattice pattern. After detecting the peak point f * x , f * y in the frequency domain, the number of stripes m and the number of slits n are estimated as (m * , n * ) = arg min m,n where p 0 and τ 0 mean the designed pitch and gap, respectively. The selected m and n bring the estimated pitch closer to the designed one. We determined m * , n * by exhaustive search within the range 1-20. The slanted angle θ and the gap τ are then derived using the peak points f x , f y and the selected numbers of m * , n * as Finally, we generated a lattice pattern using these parameters with zero offset. The offset of the optical element was estimated by measuring the vertical shift between the generated lattice pattern and the captured pattern.
Onboard camera pose calibration: The onboard camera pose plays an important role because it converts the three-dimensional position of the viewer's eyes from the camera coordinate system to the display coordinate system. For calibrating the onboard camera pose, we propose an indirect pose estimation method instead of direct pose estimation. As mentioned earlier, the direct pose estimation has several problems and is not suitable for our method, which aims toward an automated and simultaneous calibration process. With the indirect pose estimation method, we estimated the pose of the onboard camera T d c by multiplying several poses as where T d x and T p c are the inverse matrix of T x d and T c p . T x p denotes the pose between the external camera and the calibration pattern on the calibration module (see Figure 7). Therefore, once the T x p is estimated in advanced to the calibration process, it can be used as a fixed value each time the ETAD is calibrated. The other poses, such as T d x and x p T p c , have different values for each calibration process of the ETAD. Note, however, that the estimation of these poses is not needed because they are already estimated in the process of calibrating other devices. T p c , which denotes the pose between the onboard camera and the calibration pattern, is estimated in the process of the onboard camera calibration. Likewise, the pose between the display coordinate system and the calibration pattern coordinate system T d x is estimated in the optical layer calibration process. Therefore, there are no additional processes for estimating the onboard camera pose without multiplying the already known pose matrix.
The proposed indirect method has two advantages. First, the calibration process performed in this method is carried out under a fixed environment. Second, no additional devices or processes are required for estimating the onboard camera calibration, which significantly reduces the calibration time of the ETAD.

Experiments and Discussion
We conducted two types of experiments to evaluate the performance of the proposed calibration scheme. First, we evaluated our method on a synthetic dataset by using a simulation tool. The simulation provides a controlled environment, i.e., we set up the system and obtained images with various parameters. In this case, the ground-truth parameters are available for the synthetic dataset; therefore, we could quantitatively measure the estimation error. We also conducted the experiment in the same manner by adding noise to the original dataset and evaluated the robustness of the proposed method.
Next, we applied the calibration scheme to real-life displays. We used the calibration module that we designed and built as described in the previous section (see Figure 8). The calibration module contains not only a calibration pattern and an external camera but also a workstation and a Wi-Fi module. Through Wi-Fi, the ETAD transmits the image of the onboard camera to the workstation and receives the calibration results. The proposed calibration algorithms were implemented on a workstation equipped with an Intel Core(TM) i7-4960 processor and 64 GB of memory. In the case of the real-life ETAD, the actual parameters were relatively different from the designed ones. However, we could not measure the errors directly because the ground-truth parameters were unknown. We evaluated the performance in the visual aspect of the observed images. We prepared a measuring device that mimics a person: a fake face was attached on the stereo cameras. The ETAD continuously rendered 3D images according to the position of the viewer's eyes. The stereo camera captured the left image and the right image. With the captured images, we assessed the visual quality with two types of evaluation: the first was the measurement of the crosstalk using red-blue images and the second was the image quality measurement using PSNR and SSIM index [28].

Simulation Environment
For the synthetic dataset, we built a simulation environment based on a ray-tracing tool (POV-RAY [29]). The first system was built using the designed parameters listed in Table 2. We generated 100 datasets by adding or subtracting random values from the designed ones, except for the pixel width, pixel height, and pitch parameters, which are classified as constant. We also used constant values for principal points and distortion coefficients for the onboard camera because they cannot be handled with the simulation tool. In the calibration evaluation, however, few attempts were made to measure errors of parameters directly instead of measuring the standard deviation or reprojection error. Thus, we used fixed values for the principal points of the onboard camera (c x = 960, c y = 480) and set all of the distortion coefficients (k 1 -k 4 ) to zero. We defined the resolution of the display panel as 1920 × 1.080 and both pixel sizes µ x , µ y as 0.0846 mm. For each case of the datasets, we captured two images, one from the position of the onboard camera and the other from positions of the external camera of the calibration module. Given the two captured images, we first simultaneously estimated the onboard camera parameters and pose of the onboard camera, as well as the optical layer parameters and the pose of the display panel. Then, we finally estimated the onboard camera pose parameters from the origin of the display panel. The total processing time was 0.8 s. The estimation results are presented in Table 2. The accuracy of the focal length is comparable to that of the existing camera calibration method [30], by comparing the standard deviation. We also evaluated the same dataset with additional noise to verify the robustness of the proposed algorithm in a noisy environment. We added Gaussian noise from signal-noise-ratio (SNR) 0-10 to all the generated datasets. Given the 2200 images, we also evaluated the errors of the parameters.
The result of parameter estimation errors is described in Figure 9. Note, error magnitudes of the parameters related to the optical element are quite low even with a significant amount of noise. On the other hand, the translation error of the pose tends to be affected by noise levels up to values greater than 5 dB of the SNR.

Real System
We also applied the proposed method to a real-life ETAD. The proposed method was applied to a prototype of the ETAD system that included an 18.5-inch LCD panel with a resolution of 1920 × 1080. We inserted an additional parallax barrier in front of the display panel. The parameters concerned with the calibration module, i.e., external camera parameters and the pose T p x were obtained before beginning the calibration process. Because our calibration process is fully automated, all we had to do was place the ETAD in front of the calibration module (see Figure 10a). The ETAD was connected to the calibration module through Wi-Fi. Then, the images were simultaneously captured from the onboard camera and the external camera. The image of the onboard camera was transmitted to the workstation in the calibration module. Using the PC in the module, all parameters were estimated and the results were finally transmitted to the prototype. The overall processing time from connection to writing parameters onto the tablet was less than 1.46 s. The reason behind the overall processing time being longer than the simulation processing time is that it takes into consideration the waiting time after command and image transmission. After completing the calibration, we performed the visual quality evaluation using the calibrated parameters. To measure the view of the user, we set up stereo cameras, approximately 65 mm apart, and attach a printed image of a human face onto the lens shown in Figure 10b. The onboard camera continuously detected the eyes from the face-mimicking image, and a multiplex image was generated from a set of left and right images. We set the stereo camera system at different positions with z-axis values ranging 300-600 mm and performed the above-mentioned process. For every captured image, we adjusted the color to compensate for the difference of RGB spectra between the display and the camera.
For the measurement of the crosstalk, we encoded the views based on color, assigning red to the left, and blue to the right. Ideally, only red pixels are visible on the left image and blue pixels are visible on the right images. However, if errors of parameters are large, the views are not sufficiently separated, and a mix of both images is displayed. This effect is referred to as extrinsic crosstalk. We measured the extrinsic crosstalk according to the scheme described in [31]: Extrinsic crosstalk(%) = Incorrect view luminance Correct view luminance × 100.
We set four positions of different distances between 300 mm and 600 mm. Then, we captured 100 images at each position and analyzed the crosstalk. Figure 11 shows the left and right captured images at different camera positions. In the case of no calibration, the left view (red) and the right view (blue) appear mixed. On the other hand, the left and right views are sufficiently separated when using the parameters from the proposed method. The results are summarized in Table 3. The average external crosstalk value was 8.32%, which is within an acceptable range for a two-view autostereoscopic display [32]. Note, the crosstalks of the no-calibration case are so large that it may be difficult for users to perceive scenes in 3D.  We also evaluated the visual quality in terms of PSNR and SSIM index. In this case, the two views of red-blue are no longer suitable because both metrics need a reference image, which has various colors and structure. We employed de facto standard datasets to evaluate stereo and multi-view image quality such as Middlebury stereo dataset [33,34], The Stanford Light Field Archive [35], MIT Synthetic Light Field Archive [36], and Disney Light Field Archive [37]. From the datasets, we selected eight images, i.e., Lego Knights, Fish, Flower, Aloe, Motorcycle, Baby3, Couch, and Bicycle1. To obtain the reference images of the selected ones, we followed the same method proposed in [21]: We first identified the view corresponding to the camera position. We then displayed the viewpoint image, which was one of the left or the right image, on the panel. This means that a 2D image was displayed on the ETAD. Then, we captured the image, which was free from artifacts caused by the other view image. The evaluation was performed in the same manner, except that we used a multiplexed image rather than a 2D image.
The results of the visual quality assessment are listed in Table 4. The gains of using the proposed method over no calibration are large (5.62 dB in PSNR, 0.2 in SSIM index on average). Figure 12 also shows more details of the results. Similar to the crosstalk evaluation results, two views are mixed in a view when rendering without calibration. In the images of the no-calibration cases, the edges in the wall bricks, the fish's tail, the flower's petal, the leaf of aloe, the front wheel of the motorcycle, the edge of ball, the hippo's eye, and the front wheel of the bicycle are exposed in duplicate. However, we can verify that those artifacts almost disappear when the proposed method is used.

Conclusions
Despite the ability of the ETAD to fix issues faced by the autostereoscopic displays, the parameters of the ETAD become more complex and need higher levels of accuracy. In terms of mass production, the calibration process is essential for devices whose parameters deviate from designed ones. In this paper, we propose an efficient calibration method to estimate all variable parameters of the ETAD.
First, we determined the required parameters of the ETAD system and classified the parameters based on their ability to be misaligned during the assembly process. To fix the parameter errors, we proposed a simultaneous estimation algorithm using two images. For the automated process, we designed a calibration module comprising a pattern and an external camera. By using the module, the entire calibration process can be carried out in fixed environments, thus avoiding the need of any external manipulation. Experimental results obtained by evaluating the proposed method using a synthesis dataset show that the proposed method minimized the estimated parameter errors to within 1 s. Using a real ETAD prototype, rendering with the calibrated parameters limited the crosstalk to under 8.5%. The visual quality was also improved to 30.44% in PSNR and 40.14% in SSIM index by using the proposed method.
where κ is distance to each eye. The distance of two eyes is equal to IPD ψ and defined as The vector passing through the two eyes is orthogonal to face normal vector φ and is represented as