A Virtual Multi-Ocular 3D Reconstruction System Using a Galvanometer Scanner and a Camera

A novel visual 3D reconstruction system, composed of a two-axis galvanometer scanner, a camera with a lens, and a set of control units, is introduced in this paper. By changing the mirror angles of the galvanometer scanner fixed in front of the camera, the boresight of the camera can be quickly adjusted. With the variable boresight, the camera can serve as a virtual multi-ocular system (VMOS), which captures the object from different perspectives. The working mechanism, which has a definite physical meaning, is presented, together with a simple and efficient method for calibrating the intrinsic and extrinsic parameters of the VMOS. The applicability of the proposed system to 3D reconstruction is investigated. Owing to the multiple virtual poses of the camera, the VMOS can provide stronger constraints in object pose estimation than an ordinary perspective camera does. The experimental results demonstrate that the proposed VMOS is able to achieve 3D reconstruction performance competitive with that of a conventional stereovision system with a much more concise hardware configuration.


Introduction
Vision-based three-dimensional (3D) reconstruction has been among the most popular research fields for years. This technique endeavors to recover the 3D information of spatial objects or scenes from one or multiple images. It has been widely applied in many flourishing fields, such as autonomous driving, industrial inspection, and building modeling.
Generally, the implementation of 3D reconstruction of unknown objects requires images taken from different perspectives, which may be synchronously captured by two or more spatially distributed cameras. The 3D reconstruction devices with two or more cameras are usually called binocular or multi-ocular 3D vision systems, respectively. For example, Fu et al. proposed an optimization method based on the correspondence field (CF) to determine a proper camera arrangement in a dual-camera system, which turned out to be effective for improving the reconstruction accuracy [1]. Shih et al. produced high-resolution depth maps of scenes utilizing a multi-camera system, which consists of a wide-angle camera and an array of telescopic cameras [2]. However, dual-camera and multi-camera systems usually occupy a large spatial layout and have limitations in some scenarios. Images from different perspectives can also be acquired successively by one moving camera; then, under the framework of structure from motion (SfM) [3], the spatial information of the scenes can be reconstructed from the images captured over time. For example, Gao et al. designed a multi-view 3D reconstruction pipeline based on SfM and obtained the external parameters of the camera and a sparse point cloud model [4]. Wang et al. integrated fringe light projection with SfM to realize a global dense point cloud reconstruction system [5]. These SfM-based methods can produce accurate 3D point clouds for large-scale scenes; however, capturing images one after another is time consuming and inconvenient for many applications.

Working Mechanism of the VMOS
The configuration of the proposed galvanometer-camera combined VMOS is shown in Figure 1a. The VMOS consisted of a galvanometer scanner, a camera with an appropriate lens, and a set of control units. The galvanometer scanner was fixed in front of the camera, and the control unit was used to drive the camera and the galvanometer scanner simultaneously so that a picture was taken whenever the galvanometer deflected to a specified position. The light from the scene was deflected twice by the two mirrors in the galvanometer scanner and then captured by the camera sensor through the lens. By changing the turning angles of the two mirrors, the camera boresight and FOV could be adjusted, as shown in Figure 1b. According to the principle of mirror transformation, changing the camera's field of view through the mirror deflections is equivalent to changing the camera's pose (including the position and orientation), as shown in Figure 2.
In Figure 2, O-XYZ is the camera coordinate system, which represents the pose of the real camera. The rotation angles of Mirror-1 and Mirror-2 are denoted as α and β, respectively, which are uniquely determined by a pair of control values D(a, b). Suppose Mirror-1 and Mirror-2 are at their initial turning angles; then O'_1 is the virtual camera position specularly transformed from O by Mirror-1, and O''_1 is the virtual camera position specularly transformed from O'_1 by Mirror-2 in this initial status. The boresight of the real camera is identically transformed and marked with the blue dotted lines. When the turning angles α and β are changed to an arbitrary status, the corresponding virtual camera positions induced by the two mirror transformations are denoted as O'_2 and O''_2, respectively, and the virtual camera boresight in this status is marked with the red dotted lines.
To sum up, the virtual camera pose was related to the deflection angles α and β, the distance between the rotation axes of the two mirrors, and the relative installation pose between the real camera and the galvanometer scanner. However, it is not trivial to directly calculate the pose matrices of the virtual cameras in practice for the following reasons: (1) The turning angles α and β are determined by a pair of control parameters a and b, respectively. The non-linear mapping between (α, β) and the control parameter D(a, b) needs to be carefully calibrated, and the calibration errors of D(a, b) → (α, β) may reduce the accuracy of the calculated virtual camera poses. (2) The distance between the rotation axes of two mirrors is determined by the manufacturing process of the galvanometer scanner and is difficult to accurately measure in practice. (3) The relative installation pose between the camera and the galvanometer scanner is hard to know.
Instead of trying to calculate the virtual camera poses through specular reflection transformations, we enabled the galvanometer-camera combination to work as a virtual multi-ocular system, which required knowledge of neither the nonlinear relation D(a, b) → (α, β), nor the distance between the rotation axes of the two mirrors, nor the installation pose of the camera. This scheme took advantage of the high repeatability of the galvanometer scanner. Specifically, the high repeatability of the scanner meant that whenever a specific control parameter D(a, b) was transmitted to the scanner, the corresponding deflection angles α and β remained almost unchanged every time, and hence, the imaging area of the system was the same. In other words, given the control parameter D(a, b), the pose of the virtual camera V was definitely determined. Therefore, we sampled the 2D control parameter domain in advance and calibrated the virtual poses that corresponded to the sampled parameters. A one-to-one mapping D → V from the sampled control parameters D to the corresponding virtual cameras V was established. All the virtual cameras constituted the virtual multi-ocular system.
In order to perform the camera imaging within the deflection range of the galvanometer scanner, the camera and the galvanometer scanner should be properly configured to guarantee that the view pyramid of any virtual camera resulting from the deflection of Mirror-1 intersects with Mirror-2, as shown in Figure 3.
More specifically, the parameters of the galvanometer-camera combination should meet the condition given in Equation (1), where α is the turning angle of Mirror-1, θ is the FOV angle of the camera, W is the width of Mirror-2, O is the optical center of the real camera, O'_1 is the optical center of the virtual camera formed by Mirror-1, R_1 is the center point of Mirror-1, and R_2 is the center point of Mirror-2.
To guarantee that each virtual camera in the VMOS shared common FOVs with some of the others, the sampling numbers of the control parameter D(a, b) should satisfy the condition in Equation (2), where N_a and N_b are the least sampling numbers of the control parameters a and b, respectively; α_max and β_max are the maximum turning angles of Mirror-1 and Mirror-2, respectively; and θ_x and θ_y are the camera FOV angles in the horizontal and vertical directions, respectively. Having determined N_a and N_b, the 2D control parameter domain is evenly sampled. Then we have S = N_a × N_b virtual cameras corresponding to the sampled control parameters D(a, b)_s (s = 1, 2, . . . , S). The virtual cameras are denoted as V_s (s = 1, 2, . . . , S). The above control parameter sampling rule ensures that adjacent virtual cameras share common FOVs, and most viewable regions of the VMOS have largely fourfold overlap, as shown in Figure 4. The bigger the sampling numbers N_a and N_b are, the more the viewing regions overlap, and the more constraints can be supplied for 3D reconstruction.
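As an illustration of this sampling rule, the following Python sketch evenly samples the 2D control parameter domain and binds each sampled pair to a virtual camera index, which can later be associated with a calibrated pose. The control parameter limits and the variable names are illustrative assumptions, not values prescribed above.

```python
import numpy as np

# Illustrative control parameter limits for the two galvanomirrors (assumed values).
A_MAX, B_MAX = 1.0, 1.0
N_A, N_B = 21, 21  # sampling numbers, e.g., the 21 x 21 grid used later in the experiments

# Evenly sample the 2D control parameter domain D(a, b).
a_samples = np.linspace(-A_MAX, A_MAX, N_A)
b_samples = np.linspace(-B_MAX, B_MAX, N_B)

# One-to-one mapping D -> V: owing to the galvanometer repeatability, re-sending the
# same (a, b) reproduces the same virtual camera, so each sampled pair can be bound to
# an index s and, after calibration, to an extrinsic matrix [R_s | t_s].
control_to_camera = {}
for s, (a, b) in enumerate((a, b) for a in a_samples for b in b_samples):
    control_to_camera[(float(a), float(b))] = s

print(len(control_to_camera))  # S = N_a * N_b = 441 virtual cameras for a 21 x 21 grid
```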

Calibration Method of the VMOS
According to Section 2.1, the VMOS was composed of S = N_a × N_b virtual cameras corresponding to the sampled control parameters D(a, b)_s (s = 1, 2, . . . , S). Since all the virtual cameras were induced from the same real camera, the intrinsic parameters, including the pinhole imaging matrix and the distortion parameters, were the same for each virtual camera, while the poses of all the virtual cameras V_s (s = 1, 2, . . . , S) needed to be calculated.
Due to the large FOV of the VMOS, the calibration was difficult to realize with a calibration target at once. We proposed a global optimization method for zonal calibration, combining Zhang's camera calibration method [28], the PnP (perspective-n-point) method [29], and the bundle adjustment (BA) method [30]. The main steps are summarized in Figure 5.
To realize the calibration method, we built a planar calibration target evenly distributed with coded points. The identification of each coded point in the images could be easily recognized by decoding. Denote the calibration target coordinate system as C-CS, and denote the coordinates of the coded points in C-CS as X^id, where the superscript id represents the identification of a coded point. The specific steps of the proposed calibration method are as follows:
For image collection and calibration data preparation, put the calibration target at positions T_p (p = 1, 2, . . . , P) in the working volume of the VMOS. Capture images I_sp (s = 1, 2, . . . , S, p = 1, 2, . . . , P) of the target at position T_p with virtual camera V_s. Then extract the image coordinates x^id_sp of the coded points in image I_sp. The 3D coordinates X^id_p of the coded points on the calibration target at position T_p under a global coordinate system (G-CS) are measured utilizing a photogrammetric device.

1. For the calibration of the camera intrinsic parameters, among images I_sp with different index s and fixed index p, match the 3D points X^id_p with the image points x^id_sp. Take the matched pairs X^id_p ↔ x^id_sp into Zhang's monocular camera calibration process [28,31] for calibrating the intrinsic matrix K in the pinhole camera model shown in Equation (3) and the distortion parameters (k_1, k_2, k_3, k_4, k_5) expressed in Equation (4).
where [x, y, z, 1]^T are the homogeneous coordinates of the spatial point, [u, v, 1]^T are the ideal homogeneous pixel coordinates of the corresponding point, [R|t] is the pose matrix of the camera, and λ is the depth coefficient.
where u_d and v_d are the observed pixel coordinates with distortion corresponding to the ideal coordinates u and v, respectively; r is the distance between the pixel point (u, v) and the principal point of the pixel plane; k_1, k_2, and k_3 are the radial distortion parameters; and k_4 and k_5 are the tangential distortion parameters. (A short code sketch combining Equations (3) and (4) is given after step 3 below.)

2. For the calibration of the virtual camera poses, to calculate the s-th virtual camera pose, gather the coded points x^id_sp in the images I_sp with the same index s (s = 1, 2, . . . , S) and different index p into a group G_s. Match the image points x^id_sp in each G_s with X^id_p (p = 1, 2, . . . , P) according to index p and id. Utilizing the matched pairs X^id_p ↔ x^id_sp in the specific group G_s, the pose of the virtual camera V_s (s = 1, 2, . . . , S), i.e., the transformation matrix [R_s | t_s] from G-CS to the virtual camera coordinate system V_s-CS, is calculated through the PnP method [29].

3. For global optimization, to improve the calibration accuracy, the BA method [32] is applied to optimize the intrinsic parameters and all the virtual camera poses. In consideration of the lens distortion, we add radial distortion and tangential distortion to the BA model. The objective function of the nonlinear optimization is given in Equation (5), where the reprojected pixel coordinates of the spatial point X^id_p in virtual camera V_s are calculated through Equations (3) and (4). Figure 6 shows the schematic diagram of the entire calibration process. Finally, the intrinsic matrix K, the distortion parameters (k_1, k_2, k_3, k_4, k_5), and the extrinsic matrices [R_s | t_s] (s = 1, 2, . . . , S) of the virtual cameras were determined.
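As a concrete illustration of the reprojection model used in the BA step, the Python sketch below projects a spatial point through a virtual camera pose [R_s | t_s] and the intrinsic matrix K (Equation (3)) and then applies the distortion of Equation (4). The grouping of the tangential terms follows the common Brown-Conrady convention, and treating k_4 and k_5 as the two tangential coefficients in that convention is our assumption; the exact form used in the original implementation may differ.

```python
import numpy as np

def project_point(X, K, R, t, dist):
    """Project a 3D point X (shape (3,)) into a virtual camera V_s.

    K    : 3x3 intrinsic matrix (Equation (3)).
    R, t : extrinsic pose [R_s | t_s] of the virtual camera (G-CS -> V_s-CS).
    dist : (k1, k2, k3, k4, k5) radial and tangential coefficients (Equation (4)).
    Returns the distorted pixel coordinates (u_d, v_d).
    """
    k1, k2, k3, k4, k5 = dist

    # Pinhole projection: lambda * [u, v, 1]^T = K [R | t] [X, Y, Z, 1]^T.
    Xc = R @ X + t
    u = K[0, 0] * Xc[0] / Xc[2] + K[0, 2]
    v = K[1, 1] * Xc[1] / Xc[2] + K[1, 2]

    # Distortion relative to the principal point (cx, cy), as described for Equation (4).
    cx, cy = K[0, 2], K[1, 2]
    dx, dy = u - cx, v - cy
    r2 = dx * dx + dy * dy
    radial = k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    u_d = u + dx * radial + 2 * k4 * dx * dy + k5 * (r2 + 2 * dx * dx)
    v_d = v + dy * radial + k4 * (r2 + 2 * dy * dy) + 2 * k5 * dx * dy
    return np.array([u_d, v_d])
```

The BA objective of Equation (5) is then the sum of squared differences between these reprojected coordinates and the extracted image coordinates x^id_sp over all virtual cameras and target positions.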

The 3D Reconstruction Method with the VMOS
Having completed the VMOS calibration, the intrinsic matrix K, the distortion parameters (k_1, k_2, k_3, k_4, k_5), and the extrinsic pose matrices [R_s | t_s] of all the virtual cameras V_s (s = 1, 2, . . . , S) were obtained. The control parameter sampling rule described in Section 2.1 guarantees that the scene in the working volume of the VMOS can be observed by largely four or more virtual cameras. According to the triangulation method, the region observed by multiple virtual cameras can be reconstructed in 3D, as shown in Figure 7.
In Figure 7, the image point m_1 corresponding to the spatial point M can be expressed as in Equation (6), where [R_1 | t_1] is the 3 × 4 extrinsic pose matrix of V_1; [X, Y, Z, 1]^T are the homogeneous coordinates of M in the world coordinate system; [u_1, v_1, 1]^T are the undistorted pixel coordinates of M in the pixel coordinate system of virtual camera V_1, which can be calculated from the observed image coordinates with Equation (4); and λ_1 is the depth coefficient of point M in the coordinate system of virtual camera V_1. By eliminating λ_1, Equation (6) can be reorganized as Equation (7). According to Equation (7), one camera provides a 2 × 4 coefficient matrix. When n (n ≥ 2) cameras have observed the target point, the overdetermined linear system shown in Equation (8) can be obtained.
Perform singular value decomposition (SVD) [30] on the coefficient matrix. The 3D coordinates (X, Y, Z) can be obtained from the singular vector corresponding to the minimum singular value.
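A minimal Python sketch of this linear triangulation is given below, with the projection matrices formed as P_i = K [R_i | t_i]; the function and variable names are illustrative.

```python
import numpy as np

def triangulate(points_uv, projections):
    """Linear multi-view triangulation.

    points_uv   : list of undistorted pixel coordinates (u_i, v_i), one per virtual camera.
    projections : list of 3x4 projection matrices P_i = K [R_i | t_i] of the same cameras.
    Returns the 3D point (X, Y, Z) in the global coordinate system.
    """
    A = []
    for (u, v), P in zip(points_uv, projections):
        # Eliminating the depth coefficient lambda_i yields two linear equations per view
        # (Equation (7)); stacking n >= 2 views gives the overdetermined system of Equation (8).
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    A = np.asarray(A)

    # The solution is the right singular vector associated with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]
    return X_h[:3] / X_h[3]
```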

Pose Estimation Method Using the VMOS
Object pose estimation is one of the most common applications of machine vision. The PnP algorithm is the most common means of monocular pose estimation [33,34]. Through the 2D image coordinates observed by one camera and the corresponding known 3D coordinates of the target, the transformation between the object coordinate system and the camera coordinate system can be calculated. Then the six degree of freedom (DOF) pose parameters of the object with respect to the camera coordinate system can be obtained.
However, due to the limited field of view of each virtual camera, it may be impossible to obtain enough points for the PnP calculation in a single perspective. In addition, the pose calculated by PnP from a single view is in the current virtual camera coordinate system, which needs to be converted to the VMOS coordinate system using the pose parameters of each virtual camera, which is cumbersome and inconvenient. Fortunately, the proposed VMOS could observe the same object point by different virtual cameras, which had potential to provide more constraints for determining the object pose compared with ordinary cameras. Taking advantage of the large FOV of the VMOS, we proposed a global pose estimation algorithm to directly obtain the object pose in the VMOS coordinate system by utilizing the images from multiple virtual cameras.
In our pose estimation scheme with the VMOS, not all the virtual cameras but only those having observed the feature points for the pose estimation participated in the calculation. As shown in Figure 8, suppose the calibrated virtual camera V_i observes a point P_j in the object coordinate system (O-CS) concerning the pose estimation, and the corresponding undistorted pixel coordinates (u_ij, v_ij) are obtained. Then, (u_ij, v_ij) can be transformed to the normalization plane in V_i-CS according to Equation (9),
where [x_ij, y_ij, 1]^T are the coordinates of the point p_ij on the normalization plane of V_i-CS corresponding to P_j.
The correspondence between the spatial point P_j and the line O_i p_ij was thereby established. Utilizing the extrinsic matrix [R_i | t_i], the line O_i p_ij was transformed from V_i-CS to the VMOS coordinate system (i.e., G-CS), as shown in Equations (10) and (11).
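The construction of Equations (9)-(11) can be sketched as follows, assuming that [R_i | t_i] maps G-CS into V_i-CS (as in the calibration above) and ignoring the skew term of K; the names are illustrative.

```python
import numpy as np

def pixel_to_line_in_G(u, v, K, R, t):
    """Lift an undistorted pixel (u, v) observed by virtual camera V_i to a 3D line in G-CS.

    K    : intrinsic matrix of the VMOS.
    R, t : extrinsic matrix [R_i | t_i] mapping G-CS to V_i-CS.
    Returns (origin, direction): the camera center O_i and a unit direction vector in G-CS.
    """
    # Equation (9): the point p_ij on the normalization plane of V_i.
    x = (u - K[0, 2]) / K[0, 0]
    y = (v - K[1, 2]) / K[1, 1]
    p_i = np.array([x, y, 1.0])

    # Equations (10) and (11): express the line O_i p_ij in G-CS.
    origin = -R.T @ t                  # virtual camera center O_i in G-CS
    direction = R.T @ p_i              # line direction in G-CS
    return origin, direction / np.linalg.norm(direction)
```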
Given N pairs of 3D point-line correspondences formed by the virtual cameras V_i (i = 1, 2, . . . , I) observing the points P_j (j = 1, 2, . . . , J) on an object, the pose estimation can be modeled as the non-perspective PnP (NPnP) [35] problem depicted in Equation (12), where ζ_ij is the parameter of the line O_i p_ij. We utilized the Procrustean solution provided in [35] to estimate the transformation in Equation (13). After obtaining this result, a BA optimization was performed to improve the accuracy: taking the NPnP result as the initial value, we minimized the reprojection errors expressed in Equation (14) to finally obtain the object pose parameters.
where (û_ij, v̂_ij) are the reprojected pixel coordinates of the spatial point P_j in virtual camera V_i, which can be expressed via Equation (15). The Gauss-Newton iteration method was used to minimize the objective function in Equation (14).

Hardware Setup
The VMOS used in our experiments is shown in Figure 1a. Specifically, the hardware contained a CCD camera with a lens and a galvanometer with two reflection mirrors. The camera used was a GT-2450 from AVT in Germany, whose resolution is 2448 × 2050 pixels with a pixel size of 3.45 µm × 3.45 µm. The lens installed on the camera was an LM50-JC1MS from Kowa in Japan with a 50 mm fixed focal length. The nominal repeatability of the galvanometer was <8 µrad. The planar calibration target used in the experiments was a piece of glass, 2.4 m × 1.2 m in size, with a set of coded markers [36] pasted on it, as shown in Figure 9. Each coded marker had eight white circular dots distributed in a specific pattern on a black background. From the unique distribution pattern of each marker, the marker identity, and in turn the identity of each white dot, could be recognized. This helped to establish the correspondences between the points on the board and the points in the images. The program for completing the whole experimental process was written in C++ and run on a personal computer with a Core i5-9400F @ 3.7 GHz CPU from Intel in America, 16 GB RAM, and a Windows 10 Enterprise edition operating system from Microsoft in America.

Galvanometer Repeatability Verification
Since the proposed VMOS calibration method was based on the high repeatability characteristic of the galvanometer scanner, we first conducted an experiment to verify the repeatability of the galvanometer.
We placed the VMOS about 2.5 m in front of a wall pasted with nine coded markers, as shown in Figure 10. The galvanomirrors in the VMOS were deflected so that the camera could observe each of the coded markers, and the corresponding control parameters of the galvanometer were recorded as D(a, b)_s (s = 1, 2, . . . , 9). We then repeatedly input the recorded D(a, b)_s (s = 1, 2, . . . , 9) to the VMOS five times and captured five images for each coded marker. A total of 45 images were captured.

The pixel coordinates of the coded point centers in the images were extracted. The deviations among the five pixel coordinates of the same coded point center corresponding to the same D(a, b)_s were calculated. The mean absolute (MA) and root mean square (RMS) deviations in pixels are shown in Table 1. The repeatability of the galvanometer could then be calculated by Equation (16).
where δ is the repeatability of the galvanometer, ε is the coded point center deviation in pixels, p is the pixel size of the camera sensor, and f is the focal length of the camera lens. According to the results in Table 1, the repeatability of the galvanometer calculated by Equation (16) was 4.62 µrad, which conformed with the nominal value.
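For illustration, Equation (16) amounts to the small-angle conversion δ = ε·p/f with the variables listed above. A minimal sketch with the hardware values of this setup and an illustrative (not measured) pixel deviation:

```python
pixel_size_m = 3.45e-6     # pixel size p of the camera sensor
focal_length_m = 50e-3     # focal length f of the camera lens
epsilon_px = 0.07          # illustrative coded-point center deviation in pixels (not the Table 1 value)

delta = epsilon_px * pixel_size_m / focal_length_m     # Equation (16), in radians
print(f"repeatability ~ {delta * 1e6:.1f} microrad")   # about 4.8 microrad for this illustrative deviation
```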

VMOS Calibration
After verifying the galvanometer repeatability, a calibration experiment was conducted. The camera's horizontal and vertical viewing angles θ_x and θ_y were 9.6° and 8.0°, respectively. The maximum turning angles of the galvanomirrors were both about 20°. Referring to Equation (2), the sampling numbers N_a and N_b of the control parameter domain D(a, b) should be larger than 10 and 9, respectively, to satisfy the fourfold FOV overlap constraint. We set N_a = N_b = 21 in the experiments. Therefore, 21 × 21 = 441 control parameter samples were obtained. The angle between adjacent virtual camera boresights was about 2°.
The measurement device used for determining the 3D coordinates of the dot centers on the calibration board was an industrial photogrammetry system TriTop ® from GOM Co. in Brunswick, Germany. It needed only a handheld camera with a fixed focal length to take a set of overlapped images of the calibration board from various perspectives to reconstruct the 3D coordinates of the white dot centers. Some coded points placed in the calibration scene were used to establish the global coordinate system G-CS.
The calibration operation processes in the experiment were as follows. Put the planar calibration target in six positions, i.e., T_p (p = 1, 2, . . . , 6). Start the VMOS and take images of the target in each position with the 441 virtual cameras V_s (s = 1, 2, . . . , 441). Measure the coordinates X^id_p of the coded points on the calibration target with TriTop®. The measured coded points on the calibration board in the six positions in G-CS are shown in Figure 11. We chose the calibration target position T_1 to calibrate the intrinsic matrix K and the distortion parameters (k_1, k_2, k_3, k_4, k_5) following the method described in step 1 in Section 2.2. Then, following step 2 and step 3 in Section 2.2, the poses of all the virtual cameras were calculated and globally optimized.
We used the reprojection errors of all the spatial coded points on all the virtual cameras to evaluate the calibration accuracy. The MA and RMS of the reprojection errors are listed in Table 2, where the "direction" column includes the error components in the horizontal and vertical directions, as well as the full reprojection error. According to the calibration results shown in Table 2, the optimized parameters K, (k_1, k_2, k_3, k_4, k_5), and [R_s | t_s] (s = 1, 2, . . . , 441) resulted in about a 5% reduction in the MA reprojection error and about 10% in the RMSE compared with the initial values. The comparison revealed that the optimization process played an important role in improving the calibration accuracy. In this experiment, the size of the calibration target covered by each image was about 500 mm × 400 mm. Owing to the relatively high image resolution (2448 × 2050), the absolute length represented by 1 pixel was about 0.2 mm in the calibration target area. After the optimization, the mean reprojection error was 0.282 pixel, which largely stood for an accuracy of about 0.06 mm in space.
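The quoted conversion from a reprojection error in pixels to a spatial accuracy on the target can be reproduced with a simple pinhole scaling; taking roughly 2.5 m as the working distance is an approximation based on the experimental setup.

```python
pixel_size_mm = 3.45e-3       # 3.45 um pixel pitch
focal_mm = 50.0               # 50 mm lens
working_distance_mm = 2500.0  # approximate distance to the calibration area (assumption)

mm_per_pixel = pixel_size_mm * working_distance_mm / focal_mm
print(f"{mm_per_pixel:.2f} mm per pixel")             # ~0.17 mm, i.e., about 0.2 mm
print(f"{0.282 * mm_per_pixel:.3f} mm for 0.282 px")  # ~0.05 mm, consistent with the ~0.06 mm stated above
```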

Experiments on 3D Coordinate Reconstruction
We performed two 3D reconstruction experiments using the calibrated VMOS to verify its applicability in 3D vision tasks.

Reconstruction of a Visual Scale Bar
The visual scale bar is commonly used in photogrammetry to serve as a scale reference. It has visual markers with precisely known distances between each other. In this experiment, we reconstructed the 3D coordinates of the 24 white dot centers belonging to three coded markers on a scale bar to verify the 3D reconstruction capability of the VMOS, as shown in Figure 13. The distance between the VMOS and the scale bar was about 2.5 m. Following the method in Section 2.3, the coordinates of the 24 points on the three coded markers on the scale bar were reconstructed with the calibrated VMOS, as shown in Figure 14. To evaluate the reconstruction accuracy, the coordinates of the three sets of coded points were also measured using TriTop® from GOM Co. in Brunswick, Germany and regarded as the ground truth. Then, the distances between each dot of the No. 0 coded marker and each dot of the No. 1 and No. 2 coded markers were calculated. There were 128 distances calculated in total, which were compared with the ground truth values to evaluate the 3D reconstruction. The absolute mean (AM) and root mean square (RMS) distance errors are shown in Table 3.
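The distance-based evaluation can be sketched as follows: the 8 × 16 = 128 pairwise distances between the dots of marker No. 0 and the dots of markers No. 1 and No. 2 are compared against the same distances computed from the TriTop ground truth. The array names are illustrative.

```python
import numpy as np

def distance_errors(recon_0, recon_12, gt_0, gt_12):
    """Compare pairwise dot distances of marker No. 0 against markers No. 1 and No. 2.

    recon_0, gt_0   : (8, 3) reconstructed / ground-truth dot centers of marker No. 0.
    recon_12, gt_12 : (16, 3) dot centers of markers No. 1 and No. 2.
    Returns the mean absolute and RMS distance errors over the 128 distances.
    """
    d_recon = np.linalg.norm(recon_0[:, None, :] - recon_12[None, :, :], axis=-1)
    d_gt = np.linalg.norm(gt_0[:, None, :] - gt_12[None, :, :], axis=-1)
    err = (d_recon - d_gt).ravel()
    return np.mean(np.abs(err)), np.sqrt(np.mean(err ** 2))
```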

Reconstruction of Marker Points on 3D Structure
In this experiment, we utilized the VMOS to reconstruct a 3D structure object. We placed the object in the working space of the VMOS, shown in Figure 15.

Referring to the method in Section 2.3, the coordinates of 19 × 8 coded points on the 3D object surface were reconstructed, as shown in Figure 16. In order to verify the reconstruction accuracy, the reconstructed 3D coordinates of the coded points were compared with the coordinates measured by TriTop®. The MA and RMS reconstruction errors of the 19 × 8 coded points scattered on the 3D structure are listed in Table 4, including the error components in the x, y, and z directions and the full Euclidean distance errors. The results in Table 4 show that the errors in the x and y directions were much smaller than that in the z direction (i.e., the depth direction). The reason was that the baseline between the virtual cameras was much smaller than the target depth, resulting in increased uncertainty in the depth direction. Related works on 3D reconstruction using a variable boresight imaging system [10,37] also reported relatively lower accuracy in the depth direction. In our experiment, we quantitatively evaluated the MA and RMS reconstruction errors of the coded points, which could be accurately measured to serve as the true values.

Repeatability of Reconstruction Verification
In order to test the repeatability of the VMOS reconstruction, we conducted a repeatability verification experiment. We placed the 3D structure object shown in Figure 15 at a location in the FOV of the VMOS and reconstructed the coded points on the object using the VMOS. Then we changed the position of the object and reconstructed the object coded points again. The process was repeated, and five sets of reconstruction results were obtained, as shown in Figure 17. To evaluate the repeat accuracy of the 3D reconstruction, we aligned the reconstructed point set under each position to the other sets, and 10 pairs of aligned point sets were obtained. The distance errors of the corresponding points in each pair of point sets after the alignment were calculated, and the errors are shown in Table 5. The experimental results demonstrated a good repeatability of the VMOS.
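The alignment of the reconstructed point sets can be performed with a standard SVD-based rigid registration (the Kabsch solution); the residual point-to-point distances after alignment then give the errors reported in Table 5. The sketch below is an assumed implementation of this alignment step, not the original code.

```python
import numpy as np

def align_and_residuals(P, Q):
    """Rigidly align point set P (N x 3) to Q (N x 3) and return the residual distances."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    P_aligned = (R @ P.T).T + t
    return np.linalg.norm(P_aligned - Q, axis=1)  # per-point distance errors after alignment
```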


Experiment on Pose Estimation
In this experiment, we demonstrated the applicability of the calibrated VMOS for two objects' relative pose estimations, which is a frequently faced 3D vision task in many scenarios. Two 3D objects, denoted as Obj A and Obj B, were placed in the FOV of the calibrated VMOS, as shown in Figure 18.

We denoted the coordinate systems of the two objects as O_a-CS and O_b-CS. The relative pose between O_a-CS and O_b-CS was unknown and needed to be determined. There were some coded markers on the surface of each object. In order to obtain a ground truth of the relative pose to evaluate the estimation result, we again used TriTop® to measure the coded points on Obj A and Obj B and then calculated the relative pose transformation matrix T_AB as the ground truth.
There were 174 out of the total 441 virtual cameras participating in the pose estimation in this experiment, as shown in Figure 19a, where the dark gray cells indicate the corresponding sampled control parameters. By using the global pose estimation method in Section 2.4, the transformation matrix T_AG from O_a-CS to G-CS and the transformation matrix T_BG from O_b-CS to G-CS were calculated. As shown in Figure 19b, the blue and orange circles represent the points on Obj A and Obj B, respectively, and the blue and orange lines represent the calculated spatial lines in G-CS corresponding to the image points on Obj A and Obj B captured by the virtual cameras. The transformation matrix from Obj A to Obj B obtained using the VMOS is denoted as T^G_AB, which was calculated by Equation (17).
For comparison, we also estimated the relative pose of Obj A to Obj B using an ordinary camera, as shown in Figure 20. The ordinary camera was a GC-2450 from AVT in Germany, which had the same performance parameters as the camera used in the VMOS. The lens installed on the camera was an LM12-JC1MS from Kowa in Japan with a 12 mm focal length, from the same series as the lens installed in the VMOS. This lens enabled the ordinary camera to obtain a FOV similar to that of the VMOS. In the ordinary camera pose estimation experiment, we used the PnP solver SQPnP [38], which is based on a non-linear sequential quadratic programming formulation, to calculate the transformation matrices T_AC and T_BC from O_a-CS and O_b-CS to the camera coordinate system, respectively. Similarly, the transformation matrix from Obj A to Obj B obtained using the ordinary camera is denoted as T^C_AB, which was calculated by Equation (18).
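Equations (17) and (18) compose the two estimated object-to-reference transforms into the relative pose from Obj A to Obj B. A minimal sketch, assuming 4 × 4 homogeneous matrices and the convention that T_AG maps O_a-CS into the reference frame (the composition order flips under the opposite convention):

```python
import numpy as np

def relative_pose(T_AG, T_BG):
    """Relative transform from Obj A to Obj B.

    T_AG, T_BG : 4x4 homogeneous matrices mapping O_a-CS and O_b-CS into the common
                 reference frame (G-CS for the VMOS, the camera frame for the ordinary camera).
    """
    return np.linalg.inv(T_BG) @ T_AG  # Equations (17) / (18), under the stated convention
```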
According to the transformation matrices T_AB, T^G_AB, and T^C_AB, the 6D pose parameters (rotations around the x, y, and z axes and translations along the x, y, and z directions) were calculated. The errors in the parameters obtained by the VMOS and by the ordinary camera with respect to the ground truth provided by TriTop® are compared in Table 6. The results in Table 6 illustrate that the errors in the z direction (depth direction), as well as the overall errors, of the VMOS were much lower than those of the ordinary camera. The VMOS outperformed the ordinary camera in the relative pose estimation. One reason might be that the multiple virtual cameras of the VMOS provided more constraints, which helped reduce the uncertainty in the pose estimation compared with the single ordinary camera. In addition, the VMOS had a much higher angular resolution than the ordinary camera for the same overall FOV, which was helpful in finding the exact image locations of the coded points.

Conclusions
A novel 3D reconstruction system, consisting of only a galvanometer scanner and a single camera with a lens, is introduced. The galvanomirrors deflect the incoming rays and induce multiple virtual cameras in different poses. A set of methods for calibrating the intrinsic and extrinsic parameters of the virtual multi-ocular system is presented. The calibrated VMOS is applicable to 3D reconstruction and 6D pose estimation. The experimental results quantitatively demonstrate that the proposed VMOS can achieve 3D reconstruction with an accuracy of about 1 mm. Moreover, the VMOS shows better performance than the ordinary camera in pose estimation. Compared with a traditional multi-ocular system, the VMOS avoids bulky hardware and a complex layout while achieving comparable 3D reconstruction performance. Compared with SfM-based methods, the ultrafast scanning capability enables the VMOS to capture the target images swiftly, which is crucial in many applications.
One focus of future work may be reconfiguring the hardware setup to enlarge the baselines between the virtual cameras in the VMOS, which is promising for further improvement in the reconstruction accuracy. It can be realized by increasing the distance between the camera and the galvanometer scanner under the constraints of Equation (1). In addition, the VMOS is developed for industrial inspection applications at present, which usually concern indoor settings and cooperative visual targets. However, it has potential to be used in non-laboratory environments. Application scenarios in more challenging surroundings can be explored in the future.