Charuco Board-Based Omnidirectional Camera Calibration Method

In this paper, we propose a Charuco board-based omnidirectional camera calibration method to solve the problem of conventional methods requiring overly complicated calibration procedures. Specifically, the proposed method can easily and precisely provide two-dimensional and three-dimensional coordinates of patterned feature points by arranging the omnidirectional camera in the Charuco board-based cube structure. Then, using the coordinate information of the feature points, an intrinsic calibration of each camera constituting the omnidirectional camera can be performed by estimating the perspective projection matrix. Furthermore, without an additional calibration structure, an extrinsic calibration of each camera can be performed, even though only part of the calibration structure is included in the captured image. Compared to conventional methods, the proposed method exhibits increased reliability, because it does not require additional adjustments to the mirror angle or the positions of several pattern boards. Moreover, the proposed method calibrates independently, regardless of the number of cameras comprising the omnidirectional camera or the camera rig structure. In the experimental results, for the intrinsic parameters, the proposed method yielded an average reprojection error of 0.37 pixels, which was better than that of conventional methods. For the extrinsic parameters, the proposed method had a mean absolute error of 0.90° for rotation displacement and a mean absolute error of 1.32 mm for translation displacement.


Introduction
Recently, with the development of head-mounted displays, it has become possible to provide users with immersive virtual reality (VR). In addition to computer graphics, capturing a real scene with a camera and transferring it to a VR space has become imperative for providing VR content. Moreover, to further increase the user's degree of freedom (DoF) and immersion in the VR space, various omnidirectional cameras, such as the dioptric camera [1], catadioptric camera [2][3][4][5], and polydioptric camera [6][7][8][9], have been proposed. In this paper, we deal with polydioptric cameras such as Facebook Surround 360 [10], Google Jump [11], Ricoh Theta [12], and Samsung Gear 360 [13]. These omnidirectional cameras have divergent structures and capture the surrounding scene simultaneously by overlapping the images from multiple cameras. To create real-world-quality VR content, we need to extract depth information instead of simply stitching the images, so we have to understand the internal characteristics and the geometric positions of the installed cameras.
For this purpose, two essential calibrations [14] must be performed. First, there is an intrinsic calibration for measuring the sensor characteristic and lens distortion between cameras constituting the omnidirectional camera. Then, there is an extrinsic calibration for measuring the relative three-dimensional (3D) positional difference (rotation and translation) among the cameras. Generally, to find these calibrated parameters for a single camera, conventional methods artificially provide planar feature points such as a chessboard. However, when applying these conventional calibration methods to the omnidirectional calibration method, it is difficult for the cameras to share the same 3D world coordinate system and the same method has to be repeated several times for each camera.
Meanwhile, self-calibration has also been well studied since Duane [15] and Kenefick et al. [16] proposed the bundle adjustment concept for lens distortion [17][18][19]. Self-calibration can estimate the intrinsic parameters and the relative position from relative orientation through bundle adjustment of correspondence pairs extracted from captured multi-view images [20]. Its advantage is that both intrinsic and extrinsic calibrations can be performed at once without a calibration structure, and the absolute distance can also be estimated if the image contains an object (e.g., an encoded marker) that conveys the real-world scale. However, this method depends on the quality of the correspondence pairs and needs a wide overlapping area so that the feature points of the images can be shared. Therefore, it is difficult to apply to our target omnidirectional camera, especially the divergent type.
We herein propose a new Charuco board-based omnidirectional camera calibration structure and method for solving the problems of conventional camera calibration methods, which require overly complicated procedures to accurately calibrate the omnidirectional camera. The proposed method is based on the Aruco marker and a Charuco board [21] to perform the intrinsic calibration and extrinsic calibration of multiple cameras comprising the omnidirectional camera. Specifically, the Charuco board pattern is placed inside a cube structure so that it can easily and precisely provide two-dimensional (2D) and 3D coordinates for the patterned feature points. Through the proposed method, it is possible that multiple cameras can share the same 3D world coordinate system by utilizing feature point information from the Charuco board pattern, even if the image captured by a given camera does not overlap with the other images captured by the opposite camera. Additionally, the proposed structure allows both intrinsic and extrinsic calibrations, eliminating the necessity for additional structures in the overall calibration process. If only a part of the calibration structure is included in the captured image (i.e., at least one Aruco marker is in the captured image) [22,23], it is possible to calibrate the omnidirectional camera. Moreover, via the proposed calibration method, the camera position can be visualized on the virtual 3D space using the obtained rotation and translation information.
This paper is organized as follows. Section 2 introduces the concepts of camera calibration and its existing methods. Section 3 explains the proposed calibration structure and the procedure of the proposed calibration method. In Section 4, we introduce the experimental setting and results obtained by the proposed method, compared to the conventional method. Finally, in Section 5, we conclude the paper.

Single Camera Calibration
Based on the pinhole camera model, a camera captures an image by mapping a one-to-one relationship between a point Q = (X, Y, Z) in a 3D world coordinate system and a pixel q = (x, y) in a 2D image coordinate system, as shown in Figure 1. This mapping is called a perspective projection transformation, which can be easily expressed with homogeneous coordinates [14]. Figure 1 shows homogeneous coordinates that can represent multiple 3D points projected at the same position on the image plane as a single coordinate. Perspective projection transformation on the homogeneous coordinates can be expressed as follows:

s[x, y, 1]^T = A[R|t][X, Y, Z, 1]^T = P[X, Y, Z, 1]^T,  A = [f_x, skew_c, c_x; 0, f_y, c_y; 0, 0, 1]  (1)

where s is a nonzero scale factor, f_x and f_y are the focal lengths along the x and y axes (the distance between the pinhole and the image sensor, in pixels), skew_c is the skew coefficient of the image sensor array, and c_x and c_y are the x and y coordinates of the principal point in the image coordinate system. A, R, and t are the camera matrix, rotation matrix, and translation matrix, respectively. P is the projection matrix.
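The perspective projection above can be sketched numerically in a few lines; the intrinsic and extrinsic values below are hypothetical, chosen only for illustration.

```python
import numpy as np

# Illustrative intrinsic parameters (hypothetical values, not from the paper).
fx, fy = 800.0, 800.0      # focal lengths in pixels
cx, cy = 640.0, 480.0      # principal point
skew_c = 0.0               # skew coefficient (zero for solid-state sensors)

A = np.array([[fx, skew_c, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Identity rotation and a small translation as a toy extrinsic pose.
R = np.eye(3)
t = np.array([[0.1], [0.0], [0.0]])   # metres

P = A @ np.hstack([R, t])             # 3x4 projection matrix

Q = np.array([0.5, 0.25, 2.0, 1.0])   # homogeneous world point (X, Y, Z, 1)
sq = P @ Q                            # homogeneous image point s*(x, y, 1)
x, y = sq[0] / sq[2], sq[1] / sq[2]   # divide out the scale factor s
print(x, y)
```

Dividing the homogeneous result by the scale factor s recovers the pixel coordinates, which is exactly why multiple 3D points along one viewing ray map to the same pixel.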
The parameters representing the characteristics of the camera (e.g., focal length, skew coefficient, and principal point) are intrinsic parameters, represented by the camera matrix, A. The parameters related to geometric relations, such as rotation and translation between the 3D camera coordinate system and the 3D world coordinate system, are extrinsic parameters.
R and t are rotation and translation matrices, respectively, for transforming the world coordinate system into the camera coordinate system. Therefore, intrinsic and extrinsic parameters are essential when calculating an image coordinate of the projected 3D points, or vice versa. The overall process of estimating these parameters is the camera calibration.
Conventional camera calibration methods mainly focus on estimating intrinsic parameters of a single camera. After Tsai [24] first performed a calibration using a 3D object, Zhang [25] simplified it by using a 2D chessboard to find optimal parameters through least-squares approximation. The chessboard helps to detect invariant feature points (i.e., corner points) in the captured image, so that corresponding coordinates (2D chessboard coordinate and 2D image coordinate) of feature points can easily be found. Then, as an optimal solution, it selected the parameters that minimized the average reprojection error over all feature corner points in images captured from various angles.
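The reprojection-error criterion used by these methods can be illustrated with a toy computation; the corner coordinates and residuals below are fabricated for the sketch.

```python
import numpy as np

# Toy example: average reprojection error over detected corner points.
# 'detected' are hypothetical measured corners; 'reprojected' are the same
# corners mapped back through an estimated projection (here offset slightly
# to simulate estimation error).
detected = np.array([[100.0, 200.0], [150.0, 210.0], [180.0, 260.0]])
reprojected = detected + np.array([0.3, -0.4])  # simulated residual per point

errors = np.linalg.norm(detected - reprojected, axis=1)  # per-point Euclidean error
avg_reprojection_error = errors.mean()
print(avg_reprojection_error)
```

Calibration then amounts to choosing the parameters that drive this average toward its minimum over all images.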
Thereafter, several methods for estimating extrinsic parameters of the camera (i.e., camera pose estimation) were proposed using strongly invariant feature points in addition to the existing chessboard [26][27][28]. Tang et al. [26] proposed a method for estimating the extrinsic parameters of the camera based on an array plane, using the robust and flexible characteristics of AprilTag [29]. Dong et al. [27] proposed arbitrarily distributed encoded targets, based on close-range photogrammetry, to provide indices to feature points. Even if only a small part of the target plane is captured by a camera, it can perform the extrinsic calibration. Carrera et al. [28] proposed an extrinsic calibration that combines camera motion through a robot and a visual simultaneous localization and mapping algorithm [30] without a calibration pattern.
Additionally, methods for calibrating both intrinsic and extrinsic parameters were proposed [31][32][33][34][35]. Li et al. [31] proposed a method of creating a random pattern by reverse-engineering the scale-invariant feature transform [36], which detects feature points that are highly invariant to various distortions. Strauß et al. [32] proposed a method combining the advantages of existing intrinsic and extrinsic calibration methods using a coded checkerboard. Yu et al. [33] proposed a camera calibration using a virtual large planar target for a camera with a large field of view (FoV). Fraser [34,35] proposed differently distributed coded targets to automate the calibration process.

Omnidirectional Camera Calibration
Unlike single camera calibration, multi-camera calibration requires that two or more cameras share the same coordinate space. In the case of stereo camera calibration, the relationship between the two camera coordinate systems sharing the same object is estimated through epipolar geometry [20]. As shown in Figure 2a, for a point in the world coordinate system, Q, the points projected on the image planes of the left and right cameras are q_L and q_R, respectively. Additionally, we can use the fundamental relationship, q_R^T F q_L = 0, to estimate the parameters of the fundamental matrix, F [20]. However, in the case of a divergent structure, such as the polydioptric omnidirectional camera shown in Figure 2b, there is a limitation to applying epipolar geometry, because acquiring the same feature point from opposite directions is impossible. To overcome these problems, methods using mirrors and reflected images [37,38] have been proposed. However, these have the disadvantage that the size of the mirror and its distance from the camera must be accurately calculated. Additionally, calculating the angle of the mirror plane is complicated, because all pattern information must be provided to each camera constituting the omnidirectional camera. Furthermore, as the number of individual cameras increases, the number of calibration structures must also increase. Zhu et al. [39] also proposed an omnidirectional camera calibration combined with conventional methods [25,31], but it was difficult to provide accurate world coordinate systems, because the patterns were irregularly arranged in a space rather than placed on a single structure. Tommaselli et al. [40,41] proposed a catadioptric omnidirectional camera calibration with the Aruco [21] 3D terrestrial calibration field. Campos et al. [42] and Khoramshahi and Honkavaara [43] proposed a polydioptric omnidirectional camera calibration with a coded-target surface room.
However, with these methods [40][41][42][43], it is difficult to maintain the uniform illumination of the room required to detect the coded feature points, and a large amount of space is needed to calibrate the target camera structures.
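The epipolar constraint q_R^T F q_L = 0 used in stereo calibration above can be verified numerically for a toy configuration; the pose and 3D point below are hypothetical.

```python
import numpy as np

# Toy stereo pair in normalized coordinates (camera matrices = identity),
# with the right camera translated by t = (1, 0, 0) and no rotation.
# The essential/fundamental matrix is then F = [t]_x R = [t]_x.
t = np.array([1.0, 0.0, 0.0])
F = np.array([[0.0, -t[2], t[1]],
              [t[2], 0.0, -t[0]],
              [-t[1], t[0], 0.0]])   # skew-symmetric cross-product matrix [t]_x

Q = np.array([0.4, -0.2, 2.0])       # a 3D point in the left camera frame
qL = np.append(Q[:2] / Q[2], 1.0)            # projection in the left camera
qR = np.append((Q - t)[:2] / Q[2], 1.0)      # projection in the right camera

constraint = qR @ F @ qL             # epipolar constraint q_R^T F q_L
print(constraint)
```

For a divergent rig no 3D point projects into both cameras, so no such correspondence pair exists and this constraint cannot be exploited, which is exactly the limitation discussed above.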



Proposed Calibration Structure
We herein propose a Charuco board-based cube structure and a method to perform both intrinsic and extrinsic calibrations of the omnidirectional camera, as shown in Figure 3. We designed the proposed structure by placing four different 10 × 10 grid Charuco board patterns [21] on each face (front, back, left, and right) of a 60 × 60 × 60 cm³ acrylic cube. The length and orthogonality were verified using a Vernier caliper with an accuracy of 0.02 mm and a vertical meter with an accuracy of 0.0573°, respectively. The Charuco board is a combination of Aruco markers [21] and a chessboard, where the Aruco markers support the distinction of the designated 3D coordinates of corner points on the chessboard via their marker identification (ID). Therefore, if only a part of the calibration structure (e.g., an Aruco marker) is included in the captured image, it becomes possible to calibrate the intrinsic parameters. Furthermore, without an additional calibration structure, extrinsic calibration can be performed to estimate the absolute 3D position difference (rotation and translation). Compared to the conventional methods, the proposed method increases reliability, because it does not require additional adjustments to the angle of the mirror or the positions of several pattern boards. Additionally, the proposed method can calibrate the omnidirectional camera independently, regardless of the number of cameras constituting the omnidirectional camera or the camera rig structure. Moreover, by applying the cube structure, calculating the rotation and translation between the world coordinate system and the camera coordinate system is less complicated than in conventional methods, because the structure provides a highly reliable Cartesian coordinate system. The overall process of the proposed method is shown in Figure 4 and Algorithm 1. The proposed method consists of three steps: (1) intrinsic calibration, (2) extrinsic calibration, and (3) visualization of the camera position.
Our algorithm searches for Aruco markers and corner points in the given input images to identify the 2D board coordinate and the 3D world coordinate of the corner points. Then, it finds the matching pairs and solves non-linear least-squares problems to estimate the optimal parameters. A detailed description of the algorithm is given in the next section.


Intrinsic Parameters Estimation of the Omnidirectional Camera
The principle of estimating intrinsic parameters is as follows. By considering the actual board-square length and the ratio of the Aruco marker to the Charuco board, we can designate the world coordinates of each corner point in advance. For the individual camera calibration, we place the omnidirectional camera inside the proposed calibration structure, as shown in Figure 5. Regardless of the center of the structure, the omnidirectional camera can be positioned anywhere it can capture at least one Aruco marker in the pattern, considering the focal length and illuminance conditions. Then, the inside of the structure is captured by rotating the rig around the center of the tripod. In this case, at least two images having a common feature point must be captured to estimate the intrinsic parameters. From the captured images, we perform corner-point detection and designate the 2D board coordinate of each detected corner point by using the near-marker ID information. Then, we match the 2D board coordinate and the image coordinate of the same corner point and estimate the intrinsic parameters. Specifically, let CAM_i be an individual camera, and I_j, j = 1, ..., N, be one of N images captured by CAM_i. The detection processes for the corner points and the Aruco marker ID are performed on I_j, and the 2D board coordinate of the corner point, P, is designated as shown in Figure 5. (Refer to Garrido-Jurado et al. [21] for more details on marker detection.) For the Aruco marker ID, k, the top-left corner point's board coordinate is designated as (k, 1), the top-right corner point's board coordinate as (k, 2), and so on. Additionally, considering the lens distortion model, the relationship between the image coordinate, P = (x, y), the normalized image coordinate, P_n = (x_n, y_n), and the distorted normalized image coordinate, P_dn = (x_dn, y_dn), of the corner point can be expressed as in Equations (2) and (3).
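The (k, n) corner-labelling convention can be sketched as follows; the full top-left, top-right, bottom-right, bottom-left ordering is an assumption for illustration, since the text only spells out the first two labels.

```python
# Designating 2D board coordinates for corner points near a detected Aruco
# marker, following the (marker ID, corner index) convention described above.
# The ordering top-left, top-right, bottom-right, bottom-left is assumed
# for illustration.
def designate_board_coordinates(detected_marker_ids):
    """Map each detected marker ID k to its four corner labels (k, 1)..(k, 4)."""
    board_coords = {}
    for k in detected_marker_ids:
        board_coords[k] = [(k, n) for n in (1, 2, 3, 4)]
    return board_coords

# Even if only one marker is visible, its corners are still uniquely labelled,
# which is what lets a partial view of the structure suffice for calibration.
coords = designate_board_coordinates([17])
print(coords[17])
```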
Based on the pinhole camera model, the normalized image coordinate point, P_n = (x_n, y_n), is distorted by the lens before projection onto the image sensor. We only consider the radial and tangential distortion of the lens, approximated through a Taylor series as in Equation (2); we do not consider the fisheye lens distortion model, which covers up to 180°:

x_dn = x_n(1 + k_1 r_n^2 + k_2 r_n^4 + k_3 r_n^6) + 2 p_1 x_n y_n + p_2 (r_n^2 + 2 x_n^2)
y_dn = y_n(1 + k_1 r_n^2 + k_2 r_n^4 + k_3 r_n^6) + p_1 (r_n^2 + 2 y_n^2) + 2 p_2 x_n y_n    (2)

Then, the distorted normalized image coordinate point, P_dn = (x_dn, y_dn), is multiplied by the camera matrix, A, to obtain the image coordinate point, P = (x, y), as in Equation (3):

[x, y, 1]^T = A [x_dn, y_dn, 1]^T    (3)

After detecting M corner points in image I_j, for the paired image coordinate P_m = (x_m, y_m) and the normalized image coordinate P_m_n = (x_m_n, y_m_n) of each corner point P_m, m = 1, ..., M, we set the optimal camera matrix, A_i*, and the optimal lens distortion, D_i*, of CAM_i to minimize the reprojection error in Equation (4):

{A_i*, D_i*} = argmin_{A, D} Σ_{m=1}^{M} E_rep(P_m, A D(P_m_n))    (4)

In the proposed method, we use the Levenberg-Marquardt method [44] to find the optimal parameters, and we set the skew coefficient, skew_c, to zero, since the image sensor array is solid state.
where D is the lens distortion model, k_1, k_2, and k_3 are radial distortion coefficients, p_1 and p_2 are tangential distortion coefficients, and r_n is the Euclidean distance from the principal point in the normalized image coordinate system.
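A sketch of the distortion model D and the subsequent mapping through the camera matrix, assuming the standard radial/tangential form with the coefficients named above; the numeric values are illustrative.

```python
import numpy as np

# Sketch of the radial/tangential lens distortion model D and the mapping to
# image coordinates; coefficient values here are illustrative, not the paper's.
def distort(pn, k1, k2, k3, p1, p2):
    """Apply radial and tangential distortion to a normalized point (xn, yn)."""
    xn, yn = pn
    r2 = xn * xn + yn * yn                       # r_n squared
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_dn = xn * radial + 2.0 * p1 * xn * yn + p2 * (r2 + 2.0 * xn * xn)
    y_dn = yn * radial + p1 * (r2 + 2.0 * yn * yn) + 2.0 * p2 * xn * yn
    return np.array([x_dn, y_dn])

def to_image(p_dn, A):
    """Multiply the distorted normalized point by the camera matrix A."""
    q = A @ np.array([p_dn[0], p_dn[1], 1.0])
    return q[:2] / q[2]

A = np.array([[800.0, 0.0, 640.0],
              [0.0, 800.0, 480.0],
              [0.0, 0.0, 1.0]])

# With all coefficients zero, D reduces to the identity mapping.
p = to_image(distort((0.1, -0.05), 0, 0, 0, 0, 0), A)
print(p)
```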

where E_rep is the reprojection error between the two points, P_m and AD(P_m_n).
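A minimal numeric sketch of this reprojection-error objective, reduced to a single focal-length parameter and minimized by brute-force search instead of Levenberg-Marquardt to keep the example self-contained; all values are illustrative.

```python
import numpy as np

# Toy intrinsic estimation: simulate detections from a known focal length,
# then recover that focal length by minimizing the mean reprojection error.
corners_n = np.array([[0.10, -0.05], [-0.20, 0.15], [0.05, 0.25]])  # normalized
f_true, cx, cy = 800.0, 640.0, 480.0
observed = corners_n * f_true + np.array([cx, cy])   # simulated detections

def reprojection_error(f):
    """Mean Euclidean reprojection error for a candidate focal length f."""
    reprojected = corners_n * f + np.array([cx, cy])
    return np.linalg.norm(observed - reprojected, axis=1).mean()

candidates = np.arange(700.0, 900.0, 1.0)
f_star = candidates[np.argmin([reprojection_error(f) for f in candidates])]
print(f_star)
```

In practice the full parameter vector (focal lengths, principal point, distortion coefficients) is optimized jointly, for which a damped Gauss-Newton scheme such as Levenberg-Marquardt is the usual choice.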

Estimation of Extrinsic Parameters Using Estimated Intrinsic Parameters
For omnidirectional camera extrinsic calibration, we fix the camera rig and arrange the proposed calibration structure to capture the images one-by-one for each camera. After finding the corner points of the Charuco board from the captured images, we find the mapping relation between the 2D image coordinate and the 3D world coordinate of the corner points. For the image coordinate and 3D world coordinate of the specific corner point, the relationship between these coordinates can also be expressed as in Equation (1).
Specifically, for the camera CAM_i, after detecting M corner points in the image and pairing the image coordinate P_m = (x_m, y_m) with the 3D world coordinate P_m_world = (X_m_world, Y_m_world, Z_m_world) of each corner point P_m, m = 1, ..., M, we set the extrinsic parameters of CAM_i to minimize the Euclidean distance in Equation (5). Here, we use the camera matrix, A_i*, and the lens distortion, D_i*, estimated in Section 3.2. Then, we use the Levenberg-Marquardt method [44] to find the optimal solution, R_i* and t_i*, mapping the image coordinate and 3D world coordinate pairs. Finally, we obtain the rotation and translation information of each camera:

{R_i*, t_i*} = argmin_{R, t} Σ_{m=1}^{M} d(P_m, A_i* D_i*([R|t] P_m_world))    (5)

where d is the Euclidean distance between the two points, P_m and A_i* D_i*([R|t] P_m_world).
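The extrinsic objective can be sketched as follows: project known 3D world corner points with a candidate pose (R, t) through the estimated camera matrix and measure the distance to the detected image points. Lens distortion is taken as identity for brevity, and all poses and values are hypothetical.

```python
import numpy as np

# Illustrative estimated intrinsics (stand-in for A_i*).
A_star = np.array([[800.0, 0.0, 640.0],
                   [0.0, 800.0, 480.0],
                   [0.0, 0.0, 1.0]])
R_true = np.eye(3)
t_true = np.array([0.0, 0.0, 2.0])       # camera 2 m in front of the board

world = np.array([[0.0, 0.0, 0.0],
                  [0.1, 0.0, 0.0],
                  [0.0, 0.1, 0.0]])      # corner points on the board plane (m)

def project(R, t, Qw):
    """Project world points through [R|t] and the camera matrix (no distortion)."""
    qc = (R @ Qw.T).T + t                # world -> camera coordinates
    q = (A_star @ qc.T).T
    return q[:, :2] / q[:, 2:3]

detected = project(R_true, t_true, world)   # simulated detections

def extrinsic_cost(R, t):
    """Sum of Euclidean distances between detected and reprojected corners."""
    return np.linalg.norm(detected - project(R, t, world), axis=1).sum()

# The true pose yields zero cost; a perturbed translation does not.
print(extrinsic_cost(R_true, t_true),
      extrinsic_cost(R_true, t_true + np.array([0.0, 0.0, 0.1])))
```

Minimizing this cost over (R, t), e.g. with Levenberg-Marquardt, yields the pose of each camera in the shared cube coordinate system.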

Experiment on Proper Size of Calibration Pattern Unit
Before the proposed calibration experiment, we conducted an experiment on the proper size of the pattern unit to improve marker detection in the proposed calibration structure. A pattern unit means one white square containing an Aruco marker on the Charuco board. Since the camera's resolution, FoV, and distance from the pattern change every time, it is difficult to derive a physical scale (mm) for the proper size of the pattern unit in the captured image.
Therefore, we derived the appropriate size of the pattern unit on a pixel scale. For the experiment, the distance between the camera and the pattern, and the size of the Charuco board, were kept constant, as shown in Figure 6a. The size of the pattern unit was adjusted and captured as shown in Figure 6b. Since the main purpose of the Charuco board is to provide feature points, the accuracy was derived as the number of correctly detected corner points relative to the actual number of corner points in the captured image. Additionally, the camera was rotated, and 10 images were taken for each pattern at various angles. The experimental results are shown in Table 1. The accuracy of corner-point detection was 97% or more when the length of one pattern unit in the captured image exceeded about 100 pixels. As a result, regardless of camera resolution, FoV, and distance from the pattern, a pattern unit should occupy at least about 100 pixels in the captured image for accurate calibration.
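As a rough sanity check of this 100-pixel guideline, the length a pattern unit occupies can be estimated from the camera's focal length in pixels; the configuration values below are illustrative, not the experiment's exact setup.

```python
import math

# Back-of-envelope estimate: a pattern unit of physical length unit_mm, seen
# from distance_mm, spans roughly f_px * unit_mm / distance_mm pixels, where
# f_px is the focal length in pixels derived from the horizontal FoV.
def unit_length_pixels(image_width_px, hfov_deg, unit_mm, distance_mm):
    f_px = (image_width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    return f_px * unit_mm / distance_mm

# e.g. a 60 mm pattern unit seen from 600 mm by a camera with a 2250-pixel-wide
# image and a 72.2-degree horizontal FoV (hypothetical configuration).
px = unit_length_pixels(2250, 72.2, 60.0, 600.0)
print(px)
```

This kind of estimate lets one choose a pattern-unit size that clears the 100-pixel threshold for a given camera and working distance.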

Experiment on Proper Size of Calibration Pattern Unit
Before conducting the proposed calibration experiment, we investigated the proper size of the pattern unit to improve marker detection in the proposed calibration structure. A pattern unit is one white square containing an Aruco marker on the Charuco board. Because the camera resolution, FoV, and distance from the pattern change in every setup, it is difficult to specify the proper pattern-unit size on a physical scale (mm) in the captured image.
Therefore, we derived the appropriate pattern-unit size on a pixel scale. For the experiment, the distance between the camera and the pattern and the size of the Charuco board were kept constant, as shown in Figure 6a, while the size of the pattern unit was varied, as shown in Figure 6b. Because the main purpose of the Charuco board is to provide feature points, accuracy was defined as the number of correctly detected corner points divided by the actual number of corner points in the captured image. Additionally, the camera was rotated, and 10 images were taken of each pattern at various angles. The experimental results are shown in Table 1. The corner-point detection accuracy was 97% or more when the side of one pattern unit exceeded about 100 pixels in the captured image. Consequently, regardless of camera resolution, FoV, and distance from the pattern, we required each pattern unit to occupy more than 100 pixels in the captured image for accurate calibration.
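The 100-pixel guideline above can be checked for a given setup with a simple pinhole-model estimate. The following sketch is not from the paper; the function name and all numeric values are illustrative assumptions chosen to resemble the described setup.

```python
import math

# Sketch (illustrative, not the authors' code): estimate how many pixels
# one square Charuco pattern unit occupies in the image, assuming a
# fronto-parallel board and a simple pinhole camera model.
def pattern_unit_pixels(unit_mm, distance_mm, image_width_px, hfov_deg):
    """Approximate on-image side length (pixels) of a square pattern unit."""
    # Focal length in pixels from the horizontal FoV: f = (W/2) / tan(FoV/2)
    focal_px = (image_width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    # Pinhole projection of a fronto-parallel length: s_px = s_mm * f / Z
    return unit_mm * focal_px / distance_mm

# Hypothetical example: a 60 mm unit at 600 mm, with the image width and
# horizontal FoV of the cameras used in this paper (2250 px, 72.2 deg).
px = pattern_unit_pixels(unit_mm=60, distance_mm=600,
                         image_width_px=2250, hfov_deg=72.2)
print(px >= 100)  # does this setup meet the >100-pixel guideline?
```

Such a check lets a pattern unit be sized before printing, rather than re-measuring after capture.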

Experimental Setting and Process
We experimented with the proposed omnidirectional camera calibration structure. In the experiment, we used a camera rig with 360° horizontal coverage. Specifically, we used a pentagonal rig as the target omnidirectional camera rig, combining five stereo pairs. Compared to a circular rig, the pentagonal rig has large overlapping regions where the cameras are parallel and few overlapping regions where the cameras diverge. On the target rig, we mounted 10 GoPro Hero4 Black cameras (GoPro, Inc., San Mateo, CA, USA) [45]. The resolution of each camera was 2250 × 3000 pixels, the vertical FoV was 94.4°, and the horizontal FoV was 72.2°. The structure of the target rig and the overlapping areas between the cameras are shown in Figure 7. During the experiment, the 10 cameras captured images simultaneously using the remote controller provided by GoPro. For the proposed calibration and the visualization of the calibration results, the OpenCV camera calibration example code [46], the Charuco library [47], and the camera calibration toolbox for Matlab [48] were used, respectively.

For the intrinsic calibration of the omnidirectional camera, the camera rig was placed in the proposed calibration structure. By rotating the rig, 20 images were obtained for each camera, as shown in Figure 8a. Then, we obtained the intrinsic parameters of each camera that minimized the reprojection error over its 20 images. Next, for the extrinsic calibration, the camera rig was fixed in the proposed structure, and we captured one image for each camera, as shown in Figure 8b. For the captured images, we obtained the extrinsic parameters of each camera that minimized the Euclidean distance. Finally, based on the estimated rotation and translation information, the position of each camera constituting the rig was visualized in 3D space.
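The reprojection-error criterion minimized during intrinsic calibration can be written out directly: project the known 3D board corners through the estimated parameters and average the pixel distance to the detected corners. The sketch below is an assumed, minimal pure-Python version, not the paper's OpenCV-based code; all matrix values are illustrative.

```python
import math

# Minimal sketch (assumed, not the authors' implementation) of the mean
# reprojection error used as the intrinsic-calibration objective.
def project(K, R, t, X):
    """Project a 3D point X through intrinsics K and pose (R, t)."""
    # Camera-frame coordinates: Xc = R @ X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    # Perspective divide, then apply K (fx, fy, cx, cy)
    x, y = Xc[0] / Xc[2], Xc[1] / Xc[2]
    return (K[0][0] * x + K[0][2], K[1][1] * y + K[1][2])

def mean_reprojection_error(K, R, t, points_3d, points_2d):
    """Average pixel distance between projected and detected corners."""
    errs = []
    for X, (u, v) in zip(points_3d, points_2d):
        pu, pv = project(K, R, t, X)
        errs.append(math.hypot(pu - u, pv - v))
    return sum(errs) / len(errs)

# Toy check with an identity pose and noise-free "detections".
K = [[1500.0, 0.0, 1125.0], [0.0, 1500.0, 1500.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 500.0]
pts3d = [(0.0, 0.0, 0.0), (60.0, 0.0, 0.0), (0.0, 60.0, 0.0)]
pts2d = [project(K, R, t, X) for X in pts3d]
print(mean_reprojection_error(K, R, t, pts3d, pts2d))  # 0.0 for exact detections
```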

Experimental Results
The performance evaluation of the proposed calibration method was divided into two parts: (i) accuracy of the intrinsic parameters and (ii) accuracy of the extrinsic parameters. First, the accuracy of the intrinsic parameters of the individual cameras constituting the omnidirectional camera was evaluated. After detecting the feature points in the 20 images from each of the 10 cameras, we calculated the average reprojection error for each camera using the intrinsic parameters estimated by (4). Additionally, we compared the proposed method with Zhang's method [25], one of the most commonly used conventional camera calibration methods, and with the method of Li et al. [31], which uses a random pattern as an invariant feature and can calibrate even from partially captured images. Using Zhang's and Li et al.'s methods, we captured 20 images of their respective patterns and compared the average reprojection errors, as shown in Table 2 and Figure 9a. Over the 10 cameras, the average reprojection error of the proposed method was 0.37 pixels: 0.2 pixels lower than Zhang's method and 0.3 pixels lower than Li et al.'s method. Additionally, unlike Zhang's method, calibration was possible even when the entire pattern board was not captured in the image. Therefore, it was confirmed that intrinsic calibration of the omnidirectional camera is possible without the drawback of the conventional methods, where the same procedure must be repeated for each camera.

Second, the accuracy of the estimated extrinsic parameters was evaluated. Because obtaining the ground truth of the 3D world coordinates of the cameras is difficult, we rotated and translated the camera rig, as in Figure 10, and compared the applied displacements with the displacements calculated from the estimated extrinsic parameters. The rig was rotated in 60° increments and translated in 30-mm increments. The extrinsic calibration is more accurate when the mean absolute error is closer to zero.
The experimental results for the rotations and translations are shown in Tables 3 and 4 and Figure 9b,c, respectively. From the results, the rotation error and the translation error were about 0.90° and 1.32 mm, respectively. Additionally, we examined the visualization results. The posture of multiple cameras can be easily confirmed by visualizing the estimated pose of each camera in three dimensions. We used the estimated extrinsic parameters (e.g., translation and rotation vectors) to allow for additional alignments (e.g., pan and tilt) of the individual cameras constituting the omnidirectional camera. For the given position of the omnidirectional camera shown in Figure 11a, we visualized the calibration results of the proposed method as in Figure 11b.
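The mean-absolute-error metric used for the rotation and translation displacements is straightforward to state in code. This is a hedged sketch of the metric only; the sample displacement values below are made up for illustration and are not the paper's measurements.

```python
# Illustrative sketch of the evaluation metric: mean absolute error between
# the applied rig displacements (60-degree rotation steps, 30 mm translation
# steps) and the displacements recovered from the extrinsic parameters.
def mean_abs_error(ground_truth, estimated):
    """MAE between paired ground-truth and estimated displacement values."""
    assert len(ground_truth) == len(estimated)
    return sum(abs(g - e) for g, e in zip(ground_truth, estimated)) / len(ground_truth)

# Hypothetical sample values (NOT the paper's data), for demonstration only.
rot_gt  = [60.0, 120.0, 180.0, 240.0, 300.0]   # applied rotations (deg)
rot_est = [60.8, 119.2, 180.9, 240.7, 299.1]   # recovered rotations (deg)
tr_gt   = [30.0, 60.0, 90.0]                   # applied translations (mm)
tr_est  = [31.2, 58.9, 91.4]                   # recovered translations (mm)

print(round(mean_abs_error(rot_gt, rot_est), 2))  # rotation MAE in degrees
print(round(mean_abs_error(tr_gt, tr_est), 2))    # translation MAE in mm
```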
To compare against the proposed calibration method, we performed a chain-wise stereo camera calibration and visualized the result using the relative rotation and translation information of adjacent cameras, such as CAM1-CAM2, CAM2-CAM3, CAM3-CAM4, etc. The proposed method visualized the shape of the pentagon more clearly, as shown in Figure 11c. When using the chain-wise method, the positional error increased because errors accumulated, as shown in the circles of Figure 11c. However, because the proposed method estimates the rotation and translation independently for each camera, its positional error is relatively smaller.
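The error accumulation in chain-wise calibration can be illustrated with a deliberately simplified toy model (not from the paper): if each relative link carries the same small angular bias, the error at camera k grows with k, whereas per-camera estimation against a shared world frame keeps the error at the single-measurement level. The 72° step and 0.5° bias below are assumed values for a pentagonal, five-view arrangement.

```python
# Toy illustration (not the authors' experiment) of why chain-wise stereo
# calibration accumulates pose error while independent per-camera
# estimation does not: in a chain, camera k's pose is the composition of
# k relative poses, so a constant per-link bias grows with chain length.
def chainwise_angles(true_step_deg, bias_deg, n_cameras):
    """Yaw of each camera when every relative link carries the same bias."""
    return [k * (true_step_deg + bias_deg) for k in range(n_cameras)]

def independent_angles(true_step_deg, bias_deg, n_cameras):
    """Yaw of each camera estimated directly against the world frame."""
    return [k * true_step_deg + bias_deg for k in range(n_cameras)]

true_step, bias, n = 72.0, 0.5, 5   # pentagonal layout: 5 views, 72 deg apart
chain = chainwise_angles(true_step, bias, n)
indep = independent_angles(true_step, bias, n)
# Error of the last camera under each scheme (true yaw is 4 * 72 deg):
print(abs(chain[-1] - 4 * true_step))  # grows with the number of links
print(abs(indep[-1] - 4 * true_step))  # stays at the single-view bias
```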
To show the validity of the proposed method for VR applications, we used Autopano Giga 4.4 [49], which is mainly aimed at omnidirectional image stitching. Image stitching requires not only the extrinsic parameters of each camera but also additional warping to obtain a result without cognitive distortion. However, we applied our method using only the extrinsic parameter results, without any additional processing, as shown in Figure 12. For visualization, we include only the result images of the odd-numbered cameras among the 10 cameras. The blue circles show where the images are well connected, and the red circles show where the images are disconnected, which causes cognitive distortion.

Conclusions
A Charuco board-based omnidirectional camera calibration method and structure are herein proposed to solve the problem whereby conventional camera calibration methods require overly complicated procedures. An omnidirectional camera must comprise several cameras to collect 360° information around the rig, and calibration among these cameras must be performed beforehand. Accurate calibration is essential to the accuracy of omnidirectional image processing, such as extracting depth information from captured images. An omnidirectional camera structure has a form in which the optical axes of the cameras diverge, making it difficult for the cameras to share a common coordinate space. Additionally, because calibration must be repeated for each of the individual cameras comprising the omnidirectional camera, it takes a long time. To solve these problems, we proposed an omnidirectional camera calibration method and structure that can provide distinguishable 3D world coordinates and 2D board coordinates. In the experimental results, for the intrinsic parameters, the proposed method yielded an average reprojection error of 0.37 pixels, better than that of the conventional methods. For the extrinsic parameters, the proposed method had a mean absolute error of 0.90° for the rotation displacement and a mean absolute error of 1.32 mm for the translation displacement. We expect that the proposed method could serve as a basis for omnidirectional image processing to acquire 6-DoF content in the future.