An Improved Method of Pose Estimation for Lighthouse Base Station Extension

In 2015, HTC and Valve launched a virtual reality headset powered by Lighthouse, a cutting-edge spatial positioning technology. Although Lighthouse is superior in terms of accuracy, latency and refresh rate, its algorithms do not support base station expansion, and it handles occlusion of moving targets poorly: it cannot calculate their poses from a small set of sensors, which results in the loss of optical tracking data. In view of these problems, this paper proposes an improved pose estimation algorithm for cases where occlusion is involved. Our algorithm calculates the pose of a given object from a unified dataset comprising inputs from the sensors recognized by all base stations, as long as three or more sensors detect a signal in total, regardless of which base station the signal comes from. To verify our algorithm, HTC official base stations and independently developed receivers are used for prototyping. The experimental results show that our pose calculation algorithm achieves precise positioning even when only a few sensors detect the signal.


Introduction
The main task of a tracking system is to detect the spatial poses of objects in three degrees of freedom and six degrees of freedom. In virtual reality systems, a precise and stable tracking system with low latency plays a significant part in creating a truly immersive experience. In vision-based robot systems, accurate and reliable positioning is one of the most fundamental and crucial functions. The cost of camera-based optical tracking systems increases exponentially as the tracking area expands. Thus, economy and stability play pivotal roles in tracking systems.
Lighthouse offers a commercial indoor tracking solution featuring high precision, low latency and low cost. To track objects under occlusion, Lighthouse installs 22 infrared photodiodes on each handheld controller to receive signals (we regard each diode as a point). Pose estimation is unavailable when fewer than five points detect a signal. In addition, Lighthouse does not support base station expansion. This leads to three problems. First, the tracked object must reach a certain size. Second, when people move within the working area, occlusion is inevitable, so some optical data will be lost, destabilizing the system. Third, the tracking area cannot be expanded.
Most existing pose estimation methods estimate the camera pose from n 3D-to-2D point correspondences. If six or more non-coplanar points exist in the space, the problem can be easily solved using the well-known direct linear transform (DLT) [1]. When the rank of the coefficient matrix is less than 12, the PnP problem becomes a nonlinear problem. There are two classes of solutions: closed-form methods [2][3][4] and iterative optimization methods such as [5][6][7]. The EPnP algorithm is a non-iterative solution to the PnP problem [8], which solves the pose of one camera from n 2D-3D correspondences. The merit of this algorithm is that its computational complexity is O(n). However, it is applicable only when n ≥ 4. Specialized solutions include those designed for the P3P problem [2][9][10][11], which assume a single calibrated camera. They are therefore not applicable to the case where two base stations together recognize n (n ≥ 3) points in total.
Our goal is to find a pose estimation algorithm that unifies the four identification cases (defined in the Pose Estimation Algorithm section) in the same framework, given the known extrinsic parameters of the base stations, namely the rotation matrix R and the translation matrix T, and their intrinsic matrices. We can transform the problem into a traditional PnP problem, so as to solve the six degrees of freedom of the tracked object. This algorithm is also applicable to multi-base-station tracking systems.
This paper proposes a unified pose estimation algorithm based on Lighthouse. Given the intrinsic parameter matrix of each base station, which is obtained through calibration, our algorithm can solve the above four cases correctly. To verify the effectiveness of our algorithm, a prototype was built using HTC official base stations and a specifically constructed receiver, whose number of photodiodes and tracker shape are customizable as needed. The experimental results show that the tracking system using our algorithm achieves millimeter precision in positioning, and the jitter range is less than 0.4%. A simulation experiment shows that our algorithm is equally applicable to multi-base-station tracking systems.

Materials and Methods
Our tracking system consists of three key components: a laser-sweeping base station module, a wearable tracker module and a positioning server, as shown in Figure 1. The base station provides encoding control and synchronization control. The station code is assigned in advance (the pulse width of the synchronization signal indicates which base station the following line laser comes from). Each base station contains an LED (light-emitting diode) array that transmits the sync signal, horizontal and vertical laser sweepers (represented as the laser sweep driver in Figure 1), and a photodiode that allows the stations to synchronize with each other without cables. The wearable tracker module contains multiple customizable photodiodes with fixed relative positions; each photodiode picks up signals from the base stations, from which its azimuth and elevation angles relative to each base station are determined individually. The front-end circuit of each photodiode performs preamplification, filtering and quantization of the received laser scanning signal. An ARM (Advanced RISC Machine) single-chip decoding circuit then captures, analyzes and decodes the quantized signal. The decoded signal is passed through a serial-to-wireless converter chip to the positioning server. The inputs of the pose estimation algorithm on the positioning server are the calibration data of each base station, the three-dimensional local coordinates of each photodiode, and the decoded timing signal obtained from the wearable tracker module; the output of the positioning server is high-precision location data.

System Architecture
The proposed tracking system consists of three main sections: two transmitters (base stations), one or more tracked objects, and a processing unit. The transmitter contains both the sync mechanism and the rotation mechanisms that produce the horizontal and vertical laser sweeps. The receiver contains photodiodes, a signal amplifying module, a filtering module and a wireless output module. The processor contains the signal resolver. The tracked object transmits its time data (in milliseconds) to a host PC.
After the laser sweeping signal is received by the photodiode, the front-end circuit performs preamplification, filtering and quantization of the received signal, and the quantized signal is captured, analyzed and decoded by the ARM single-chip decoding circuit. The decoded signal is then transmitted to a PC via a wireless connection.
Every tracked object has a unique IP address, so that each one can work independently. An ARM microcontroller is currently used as the processing unit, and each chip can support eight receivers at the same time. In the future, we plan to use an FPGA (Field-Programmable Gate Array) instead, so that each module can support more photodiodes. By installing customized receivers on different tracked objects, any physical object can be brought into the virtual world. Thanks to its small size, flexibility and convenient installation, the receiver can be easily attached to various parts of the body, including the head, hand and ankle. The most significant feature is that the number and geometry of the photosensors can be customized, provided that three or more photodiodes are involved.

Pose Estimation Algorithm
In this paper, the identification process is divided into four cases: (a) base station A identifies three or more points, and B identifies none; (b) both stations identify three or more points; (c) A or B, but not both, identifies three or more points; (d) neither station alone identifies three or more points, but together they identify three or more points.
The goal of our study is to develop a method that estimates the 3D pose of tracked objects from their 2D coordinates with respect to different base stations. We treat each base station as a pinhole camera. A representation of the signal (pulse train) received by a photodiode from a base station is shown in Figure 2. At the beginning of each period, a sync pulse is emitted by an infrared LED array built into the transmitter, and all photodiodes pick up this signal simultaneously. After the sync signal, the vertical laser line driven by rotor 1 sweeps the scan volume; the interval between the sync signal and the vertical scanning signal is denoted by $t_1$ in Figure 2. After rotor 1 has rotated 180 degrees, the infrared LED array blinks again as the second sync signal. The horizontal laser line driven by rotor 2 then sweeps the scan volume, and the interval between the second sync signal and the horizontal scanning signal is denoted by $t_2$ in Figure 2. Suppose the rotating speed of both rotors is $\omega$. The azimuth and elevation angles of each photodiode, here denoted $\alpha$ and $\beta$, can be calculated as follows:
$$\alpha = \omega t_1, \qquad \beta = \omega t_2 \tag{1}$$
The coordinates (x, y) of each sensor in the constellation with respect to a given base station are then derived from these two angles (Equation (2)).
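The timing-to-angle step can be illustrated with a short Python sketch. It only applies the relation angle = ω·t stated above; the rotor speed of 60 revolutions per second and the function name `sweep_angles` are assumptions made for illustration and are not taken from the paper.

```python
import math

# Assumed rotor speed: 60 revolutions per second (not stated in the text).
ROTOR_SPEED_RAD_PER_S = 2.0 * math.pi * 60.0


def sweep_angles(t1, t2, omega=ROTOR_SPEED_RAD_PER_S):
    """Convert sync-to-hit delays (seconds) into (azimuth, elevation) in radians."""
    azimuth = omega * t1    # delay of the vertical laser line (rotor 1)
    elevation = omega * t2  # delay of the horizontal laser line (rotor 2)
    return azimuth, elevation


# A hit a quarter of a revolution after each sync pulse corresponds to 90 degrees.
quarter_turn = 1.0 / 240.0
azimuth, elevation = sweep_angles(quarter_turn, quarter_turn)
print(math.degrees(azimuth), math.degrees(elevation))
```

In a real receiver the delays would come from the decoded time signal of each photodiode, one pair per base station.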
The final pixel coordinate (u, v) can be derived from the projection formulae. For a sensor in the constellation at $X = [x, y, z]^T$ in its local coordinate system, the corresponding image coordinates in the $i$-th emitter are $x_i = [u_i, v_i]^T$, where $i = 1, 2, 3, \ldots, m$ and $m$ is the number of transmitters. According to the projection imaging principle, $\tilde{X}$ and $\tilde{x}_i$ are the homogeneous representations of $X$ and $x_i$, denoted as $[X, 1]^T$ and $[x_i, 1]^T$, respectively, and $P_i$ is the projection matrix of the $i$-th emitter, which is obtained in the calibration process. The coordinates of a sensor in the world coordinate system, $[X_w, Y_w, Z_w, 1]^T$, and its pixel coordinates $[u_i, v_i, 1]^T$ satisfy
$$Z_c \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = K_i \begin{bmatrix} R_i & T_i \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} \tag{3}$$
where $Z_c$ is a scale factor, $K_i$ is the intrinsic parameter matrix of the $i$-th emitter, $R_i$ and $T_i$ denote the rotation matrix and translation matrix of the $i$-th emitter, and $K_i$, $R_i$ and $T_i$ are all known from calibration. Solving the pose of the tracked object simply involves solving $R$ and $T$, which map the local coordinates of a sensor to its world coordinates. Thus, we can obtain
$$Z_c \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = K_i \begin{bmatrix} R_i & T_i \end{bmatrix} \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$
Through simplifying, we get the following form:
$$Z_c \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = \begin{bmatrix} P_1^T \\ P_2^T \\ P_3^T \end{bmatrix} \begin{bmatrix} M_1 & M_2 & M_3 & M_4 \end{bmatrix} \tilde{X}$$
where $P_i = K_i [R_i \; T_i]$, $M = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix}$, $P_1^T$ denotes the first row of $P_i$, $M_1$ denotes the first column of $M$, and so on for each parameter. Eliminating $Z_c$ from the first two rows using the third row, we get
$$(P_3^T u_i - P_1^T) M \tilde{X} = 0, \qquad (P_3^T v_i - P_2^T) M \tilde{X} = 0$$
Now we denote $P_3^T u_i - P_1^T$ and $P_3^T v_i - P_2^T$ as $C_i$ and $D_i$, respectively. Both $C_i$ and $D_i$ are 1 × 4 matrices. The equation above can then be expressed as a non-homogeneous linear equation:
$$\begin{bmatrix} C_i(1)x & C_i(2)x & C_i(3)x & C_i(1)y & C_i(2)y & C_i(3)y & C_i(1)z & C_i(2)z & C_i(3)z & C_i(1) & C_i(2) & C_i(3) \\ D_i(1)x & D_i(2)x & D_i(3)x & D_i(1)y & D_i(2)y & D_i(3)y & D_i(1)z & D_i(2)z & D_i(3)z & D_i(1) & D_i(2) & D_i(3) \end{bmatrix} \times \begin{bmatrix} r_{11} & r_{21} & r_{31} & r_{12} & r_{22} & r_{32} & r_{13} & r_{23} & r_{33} & t_1 & t_2 & t_3 \end{bmatrix}^T = \begin{bmatrix} -C_i(4) \\ -D_i(4) \end{bmatrix}$$
where $C_i(j)$ denotes the $j$-th ($j = 1, 2, 3, 4$) element of $C_i$, $D_i(j)$ denotes the $j$-th ($j = 1, 2, 3, 4$) element of $D_i$, $r_{jk}$ denotes the element of $R$ in row $j$ and column $k$, and $t_j$ denotes the $j$-th ($j = 1, 2, 3$) element of $T$ (not to be confused with the time intervals $t_1$ and $t_2$ in Figure 2).
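As a sanity check on the derivation, the following Python sketch builds the two equations contributed by one observation from the vectors C_i = P_3^T u_i − P_1^T and D_i = P_3^T v_i − P_2^T, and confirms that a known pose satisfies them. The projection matrix, pose and sensor position are hypothetical values chosen for illustration, and `observation_rows` is not a name from the paper.

```python
def observation_rows(P, u, v, X):
    """Return two coefficient rows and two right-hand sides for one observation.

    P is the 3x4 projection matrix of one emitter, (u, v) the image
    coordinates, and X = [x, y, z] the sensor position in the local frame.
    Unknown order: [r11, r21, r31, r12, r22, r32, r13, r23, r33, t1, t2, t3].
    """
    C = [P[2][j] * u - P[0][j] for j in range(4)]
    D = [P[2][j] * v - P[1][j] for j in range(4)]
    rows, rhs = [], []
    for E in (C, D):
        # The coefficient of r_jk is E(j) * X_k; the coefficient of t_j is E(j).
        row = [E[j] * X[k] for k in range(3) for j in range(3)]
        row += E[:3]
        rows.append(row)
        rhs.append(-E[3])
    return rows, rhs


# Hypothetical emitter projection matrix (illustrative values only).
P = [[800.0, 0.0, 320.0, 400.0],
     [0.0, 800.0, 240.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
# Hypothetical tracker pose: 90-degree rotation about z plus a translation.
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.1, -0.2, 2.0]
X = [0.05, 0.02, 0.01]  # sensor position in the local frame

# Forward projection gives the image coordinates the sensor would report.
world = [sum(R[j][k] * X[k] for k in range(3)) + T[j] for j in range(3)]
proj = [sum(P[r][c] * w for c, w in enumerate(world + [1.0])) for r in range(3)]
u, v = proj[0] / proj[2], proj[1] / proj[2]

# The true pose, stacked in the unknown order, should satisfy both equations.
p_true = [R[j][k] for k in range(3) for j in range(3)] + T
rows, rhs = observation_rows(P, u, v, X)
residuals = [sum(a * x for a, x in zip(row, p_true)) - b
             for row, b in zip(rows, rhs)]
print(residuals)
```

Stacking the rows from every (emitter, sensor) pair, regardless of which base station saw the sensor, gives the full system used for pose estimation.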
For each pair of $x_i$ and $X$, two independent equations can be obtained. We thus get 2N equations from N points, with 12 unknowns. The points identified by all emitters are substituted into the above equation, and the stacked 2N × 12 coefficient matrix is denoted A. When the rank of A is equal to 12, the system is linear and can be formulated as a least squares problem. However, to make our algorithm work under all circumstances, an iterative optimization method is used in this paper to solve the derived model. The optimization algorithm is selected through an optional parameter of the MATLAB Optimization Toolbox. The constraints we use are the orthonormality conditions on the columns of R:
$$\begin{aligned} r_{11}^2 + r_{21}^2 + r_{31}^2 &= 1 \\ r_{11} r_{12} + r_{21} r_{22} + r_{31} r_{32} &= 0 \\ r_{12}^2 + r_{22}^2 + r_{32}^2 &= 1 \\ r_{11} r_{13} + r_{21} r_{23} + r_{31} r_{33} &= 0 \\ r_{13}^2 + r_{23}^2 + r_{33}^2 &= 1 \\ r_{12} r_{13} + r_{22} r_{23} + r_{32} r_{33} &= 0 \end{aligned}$$
Together with these six constraints, three identified points (six equations) already provide as many conditions as there are unknowns. We obtain the optimal solution by minimizing the mean square error of the objective function.
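The objective and the six constraints can be written as two small functions of the kind one would pass to a constrained nonlinear solver. This is only a sketch in Python: the paper's implementation uses the MATLAB Optimization Toolbox, the function names are hypothetical, and the two-row system below is a toy example rather than real measurement data.

```python
def mean_squared_error(A, b, p):
    """Mean squared residual of the linear system A p = b."""
    residuals = [sum(a * x for a, x in zip(row, p)) - bi
                 for row, bi in zip(A, b)]
    return sum(r * r for r in residuals) / len(residuals)


def orthonormality_residuals(p):
    """Six equality-constraint residuals; all are zero for a valid rotation.

    p = [r11, r21, r31, r12, r22, r32, r13, r23, r33, t1, t2, t3],
    so p[0:3], p[3:6] and p[6:9] are the three columns of R.
    """
    c1, c2, c3 = p[0:3], p[3:6], p[6:9]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    return [dot(c1, c1) - 1.0, dot(c1, c2),
            dot(c2, c2) - 1.0, dot(c1, c3),
            dot(c3, c3) - 1.0, dot(c2, c3)]


# Toy check with a 90-degree rotation about z and translation (0.1, -0.2, 2.0).
p_true = [0.0, 1.0, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.1, -0.2, 2.0]
A_toy = [[1.0] + [0.0] * 11,           # picks out r11
         [0.0] * 9 + [1.0, 0.0, 0.0]]  # picks out t1
b_toy = [0.0, 0.1]

print(mean_squared_error(A_toy, b_toy, p_true))
print(orthonormality_residuals(p_true))

# Scaling the rotation part breaks the unit-norm constraints.
p_scaled = [1.1 * x for x in p_true[:9]] + p_true[9:]
print(orthonormality_residuals(p_scaled))
```

A solver would minimize `mean_squared_error` over the 12 parameters while driving every entry of `orthonormality_residuals` to zero.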

Hardware Design
The architecture described in Section 2 is implemented as a functional prototype that is affordable for general consumers. The implementation is designed to be modular, so that each part can be evaluated independently. In general, the whole implementation consists of three different circuit boards (one for signal pre-processing, one for signal decoding, one for serial port to Wi-Fi conversion) and two applications (tracked object firmware and server software).
For the tracked object, each receiver unit is designed to hold one photodiode along with its analog front-end circuit, as shown in Figure 3a. VISHAY TEMD5110X01 photodiodes (VISHAY, Palm Harbor, FL, USA) and a Triad TS3633 signal conversion chip (Triad, San Francisco, CA, USA) are used, featuring immunity to visible light interference, high spectral sensitivity and low response time. The response wavelength is concentrated between 790 nm and 1050 nm, which largely overlaps with the laser emitted by the base station. Each processing module (Figure 3b) can support eight photodiodes. The core processing unit of each processing module circuit is an ST STM32 microcontroller (ST, Geneva, Switzerland), a 168 MHz processor. Figure 3c shows the prototype of the serial port to Wi-Fi module, whose core component is a TI CC3200 serial-to-Wi-Fi chip (TI, Dallas, TX, USA). The installation of the tracker can be adapted for various scenarios. In this paper, the final installation is shown in Figure 4.

Results and Discussion
In order to evaluate our system, the frame rate was measured first. The hardware frame rate is 272 Hz and the system frame rate is 70 Hz. The pose of the tracked object is displayed in Figure 5. The experimental setup is shown in Figure 6; the precision, jitter and latency tests are all carried out in this environment. The results of these experiments are given in Sections 3.1.1 and 3.1.2.

Positioning Accuracy
The equipment of our system includes a slide guide, two sliders, a power bank (which supplies electricity to the tracked object), a tracked object and two HTC Vive base stations. The experimental environment is shown in Figure 6. Three distances (1 m, 3 m, 5 m from the base stations) are specified for the test, and at each distance we test the precision along each direction (x-axis, y-axis, z-axis) of the world coordinate system. The tracker is installed on the slider, which is moved from the 15 cm mark of the slide guide to the 65 cm mark in steps of 2.5 cm, and the origin of the world coordinate system is fixed when the tracker is at the 15 cm mark. After the tracker is fixed at each position, 1000 valid sets of the tracker's center coordinates in the virtual environment are collected. Thus, we get 21 × 1000 datasets for each direction/distance. The displacement is obtained from the data at each position and the corresponding data at the previous position. We define $|X_u - \hat{X}_u|$ as the absolute error, where $X_u$ is the measured displacement and $\hat{X}_u$ is the true displacement, which is 25 mm here.
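The error definition can be illustrated with a few lines of Python. The sketch assumes that the samples collected at each slider stop have already been reduced to one mean coordinate along the tested axis, which the text does not state explicitly; the input values are made up for illustration.

```python
def step_errors(mean_positions_mm, nominal_step_mm=25.0):
    """Absolute error of each measured step against the nominal slider step."""
    errors = []
    for previous, current in zip(mean_positions_mm, mean_positions_mm[1:]):
        displacement = abs(current - previous)  # measured step length
        errors.append(abs(displacement - nominal_step_mm))
    return errors


# Hypothetical mean coordinates (mm) at three consecutive slider stops.
errors = step_errors([0.0, 24.0, 50.5])
print(errors)
```

With 21 stops per run, this yields 20 step errors for each combination of direction and distance.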
The experimental results are shown in Figure 7. In these figures, the blue, orange and gray bars represent the results of the tracker at 1 m, 3 m and 5 m, respectively. The vertical axis shows the precision in mm, and the horizontal axis denotes the 21 positions in the moving process. More specific results are given in Table 1. Partial occlusion may have occurred during the test, but the results remain quite stable.
It can be seen in Figure 7 that the positional precision of our tracking system is at the millimeter level, and the average positioning precision is 1.4653 mm. Most of the noise is due to mechanical vibration and the inconstant rotational speed of the rotors, which are not perfectly symmetrical. Calibration, measurement and environmental factors also contribute to the errors. The maximum error, 10.6 mm, is observed at a distance of 5 m from the base stations.

Angle Accuracy
In order to test the angle accuracy of our algorithm, we mount the tracker on a three-axis turntable. A built-in bubble level on the tracker ensures that its axis is horizontal, so that the tracker rotates approximately about a single axis. Our angle-measurement system is shown in Figure 8. Three distances (1 m, 3 m, 5 m from the base stations) are specified for the test, and at each distance we test the angular precision for rotation about each axis (x-axis, y-axis, z-axis) of the world coordinate system. We rotate the tracker from −65° to 65° on the rotary stage in steps of 5°, and fix the origin of the world coordinate system when the tracker is at 0°. After the tracker is fixed at each position, 100 valid sets of the tracker's rotation matrix in the virtual environment are collected and averaged. Thus, we get 25 datasets for each axis/distance. We use the method in the code of [12] to calculate the tracker's rotation angle relative to the previous position. The experimental results are shown in Table 2.
It can be seen in Table 2 that the angular precision of our algorithm is at the arcminute level, and the average angular precision is 0.3199°. Most of the error is due to mechanical vibration and the inconstant rotational speed of the rotors, which are not perfectly symmetrical. Calibration, measurement and environmental factors also contribute to the errors. The stability of our system decreases as the distance increases.
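One common way to obtain the rotation angle between two consecutive poses is to take the angle of the relative rotation from its trace. The Python sketch below shows that standard formula; it is not necessarily the method used in the code of [12], and the helper names are hypothetical.

```python
import math


def relative_rotation_deg(R_prev, R_cur):
    """Angle (degrees) of the rotation taking R_prev to R_cur."""
    # trace(R_prev^T R_cur) equals the sum of the element-wise products.
    trace = sum(R_prev[i][j] * R_cur[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against rounding before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def rot_z(deg):
    """Rotation matrix about the z-axis, used here only to build test poses."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Two turntable stops 5 degrees apart should give a relative angle of 5 degrees.
measured_step = relative_rotation_deg(rot_z(5.0), rot_z(10.0))
print(measured_step)
```

The angular error at each stop would then be the difference between this value and the nominal turntable step.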

Jitter Measurement
The dataset used in the jitter test is acquired in the same way as in the precision test, but processed differently. For the 21 × 1000 groups of data, after removing outliers (<0.5%), Figure 9 shows the scatter plot drawn in MATLAB.
In Figure 9, the blue dots represent the three-dimensional positions in the virtual environment. The red crosses mark the center of each group of 1000 points. The pink circles denote the ground truth. The size of each blue diffuse spot represents the degree of jitter at that position. As the figure demonstrates, the movement trend of the points is consistent with the ground truth, but there is a certain angular or translational offset. The greater the distance between the tracker and the base station, the more obvious the deviation is. The error comes mainly from two sources. One is systematic jitter, and the other is error in the process of obtaining the world coordinates from only one picture. The experimental results show that the jitter of our system is less than 0.28773% at 1 m from the base stations. The jitter at 3 m is less than 0.487537%, and less than 0.397538% at 5 m. The jitter for the various combinations of direction and distance is shown in Table 3. The stability of our system is good, since the overall jitter is within 0.4%.

Latency Measurement
The equipment for the latency test includes a 1000-frame high-speed camera, a dual-screen computer, an optical slide, two base stations and a tracked object, as shown in Figure 10. We use the high-speed camera to capture the tracker's movement in both the virtual three-dimensional space and the real world. An indicator light is installed on the tracker so that its movement can be observed clearly. When the tracker is quickly dragged along the slide, screen 1 shows the tracker in the virtual three-dimensional space, and screen 2 serves as a monitor. The high-speed camera records the entire process of the tracked object going from stationary to swift motion. After the shooting is complete, the video is played frame by frame. The latency of our system is obtained by dividing the difference between the frame number at which the tracker begins to move in the virtual space and its counterpart in the real world by the frame rate, which is known. The average latency of five measurements is less than 0.047 s.
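The frame-counting calculation amounts to a single division, as the Python sketch below shows. It assumes that the camera records 1000 frames per second; the frame numbers are invented for illustration.

```python
def latency_seconds(real_start_frame, virtual_start_frame, camera_fps=1000.0):
    """Latency from the frame numbers at which motion first appears.

    real_start_frame: frame where the physical tracker starts to move.
    virtual_start_frame: frame where the on-screen tracker starts to move.
    camera_fps: assumed recording rate of the high-speed camera.
    """
    return (virtual_start_frame - real_start_frame) / camera_fps


# Hypothetical frame numbers: the virtual tracker lags by 47 frames.
latency = latency_seconds(real_start_frame=120, virtual_start_frame=167)
print(latency)
```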

Simulation Experiment of Multi-Base Station System
To show that our algorithm is also applicable to multi-base-station tracking systems, we simulated the same process in MATLAB. We assume that there are n base stations (n = 3, 4, 5, …). The parameters of all base stations, such as the R and T matrices relative to the world coordinate system, are given. The sample consists of 1000 groups of six points each, randomly scattered in 3D space. Each base station is able to identify one or several points in each group, and we compute the pose of the tracked object with our algorithm. When the number of base stations is three, each base station identifies one point, which is the minimum configuration for pose estimation with this algorithm. The calculated pose is compared with the predicted value to obtain the calculation error. The statistical results are shown in Table 4. Because the case where three points are identified by four or five base stations is similar to the case where three points are identified by three base stations, we do not discuss it here. To simulate real conditions, we add zero-mean Gaussian white noise to the data at two different variances (0.8 and 1.2). In a wide-range tracking system, a point is swept by between one and five base stations at the same time, and this experiment covers all of these situations. Table 4 shows that the precision of the pose estimation remains stable as the number of stations and points increases. There is little difference between multi-base-station and dual-base-station systems.
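The data-generation step of such a simulation can be sketched as below. This is only an illustration in Python rather than the authors' MATLAB code, and it makes several assumptions that the text does not state: the noise is applied to the point coordinates, the points are drawn uniformly from a cube whose half-width `extent` is a hypothetical value, and all function names are invented for the example. The one detail worth noting is that a variance of 0.8 or 1.2 corresponds to a standard deviation of its square root.

```python
import math
import random


def make_groups(num_groups=1000, points_per_group=6, extent=1000.0, seed=0):
    """Groups of 3D points drawn uniformly from a cube (extent is hypothetical)."""
    rng = random.Random(seed)
    return [
        [tuple(rng.uniform(-extent, extent) for _ in range(3))
         for _ in range(points_per_group)]
        for _ in range(num_groups)
    ]


def add_gaussian_noise(groups, variance, seed=1):
    """Add zero-mean Gaussian noise of the given variance to every coordinate."""
    rng = random.Random(seed)
    sigma = math.sqrt(variance)  # random.gauss expects a standard deviation
    return [
        [tuple(c + rng.gauss(0.0, sigma) for c in point) for point in group]
        for group in groups
    ]


def empirical_noise_variance(clean, noisy):
    """Variance of the coordinate residuals between noisy and clean data."""
    residuals = [
        n - c
        for g_clean, g_noisy in zip(clean, noisy)
        for p_clean, p_noisy in zip(g_clean, g_noisy)
        for c, n in zip(p_clean, p_noisy)
    ]
    mean = sum(residuals) / len(residuals)
    return sum((r - mean) ** 2 for r in residuals) / len(residuals)


clean_groups = make_groups()
noisy_low = add_gaussian_noise(clean_groups, variance=0.8)
noisy_high = add_gaussian_noise(clean_groups, variance=1.2)
```

The noisy groups would then be passed to the pose estimation step, and the resulting poses compared against those obtained from the clean data to produce error statistics of the kind reported in Table 4.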

Conclusions
This paper proposes a unified pose estimation algorithm based on Lighthouse. The merit of the algorithm is that it consolidates data from different base stations in a common framework, making low-cost, large-scale and precise tracking possible. The only constraint is on the total number of points identified across all base stations, rather than the number identified by each base station. The experimental results show that a tracking system using our algorithm reaches millimeter-level positioning precision and arcminute-level angular precision, with a jitter range of less than 0.4%. The simulation experiment shows that our algorithm is equally applicable to multi-base-station tracking systems. While the precision of this system is very high, the results from a low-cost implementation show that there is room for improvement. Future work will focus on improving hardware stability, increasing the number of processing unit ports, and adding base stations to achieve tracking over a broader range.