Registration of Panoramic/Fish-Eye Image Sequence and LiDAR Points Using Skyline Features

We propose utilizing a rigorous registration model and a skyline-based method for the automatic registration of LiDAR points and a sequence of panoramic/fish-eye images in a mobile mapping system (MMS). This method automatically optimizes the original registration parameters and avoids the manual intervention required by control point-based registration methods. First, the rigorous registration model between the LiDAR points and the panoramic/fish-eye image was built. Second, skyline pixels were extracted from the panoramic/fish-eye images and skyline points from the MMS's LiDAR points, relying on the difference in pixel values and on the registration model, respectively. Third, a brute-force optimization method was used to search for the optimal matching parameters between the skyline pixels and skyline points. In the experiments, the original registration method and the control point registration method were used to compare the accuracy of our method on a sequence of panoramic/fish-eye images. The results showed that: (1) the panoramic/fish-eye image registration model is effective and can achieve high-precision registration of the image and the MMS's LiDAR points; (2) the skyline-based registration method can automatically optimize the initial attitude parameters, realizing high-precision registration of a panoramic/fish-eye image and the MMS's LiDAR points; and (3) the attitude correction values of the sequence of panoramic/fish-eye images are different, and the values must be solved one by one.


Introduction
Motivated by applications in vehicle navigation, urban planning and autonomous driving, the need for 3D information on urban areas has increased dramatically in recent years. Mobile mapping system (MMS) technology has been widely used for efficient data acquisition. The information obtained by an MMS mainly includes a sequence of optical images captured by a camera and point clouds obtained by LiDAR. Optical images have rich texture information, while point clouds mainly reflect spatial characteristics; registering images and point clouds therefore has important theoretical and practical value.
Optical images (2D) and point clouds (3D) are two different types of data. The imaging model of an optical image is the collinear equation, while a point cloud records the position of an object by emitting laser pulses; the geometric reference frames of LiDAR points and optical images therefore differ. Three key problems must be solved before registration: the choice of primitive pairs, the registration model and parameter optimization. To capture a large scene in a single image, a panoramic camera can be mounted on an MMS. The most common type of panoramic camera, such as the Ladybug, consists of a rig of multiple fish-eye cameras: images with a narrow field of view are first obtained by the fish-eye lenses and then projected and resampled to stitch together a panoramic image. Some line feature-based methods match rectangles rather than only straight lines. Stamos et al. [26] and Schenk [27] laid a solid foundation for line feature-based registration, and Liu et al. [28,29] utilized linear features to determine the camera orientation and position relative to a 3D model. 3D-3D-based methods require a sequence of images for 3D reconstruction followed by 3D matching to achieve registration. Structure from motion (SFM) is a widely used method that reconstructs 3D points from a set of images [30,31]. A 3D matching algorithm, such as iterative closest point (ICP) [32] or the normal distributions transform (NDT) [33], is often used for point registration. Zheng et al. [34] applied bundle adjustment to a sequence of images; the points obtained by the adjustment were matched with the laser point cloud by ICP. Zhao et al. [35] used stereo vision technology for 3D reconstruction and the ICP algorithm for 3D point cloud registration. Abayowa et al. [36] presented a coarse-to-fine strategy for estimating the registration parameters without an initial alignment.
We propose utilizing a rigorous imaging model and a skyline-based method for the automatic registration of LiDAR points and a sequence of panoramic/fish-eye images. This method improves the accuracy of registration automatically by optimizing the attitude parameters. Figure 1 shows the flow chart of the method. This paper is organized as follows. Section 2 provides the rigorous panoramic/fish-eye image registration model and the skyline matching method. Section 3 describes the experiments conducted to verify the effectiveness of the skyline-based method. Section 4 discusses the parameters in skyline pixels/points matching and the precision/automation of our method. Finally, conclusions are presented in Section 5.

Materials
The MMS used in this paper was jointly developed by Wuhan University and Leador Spatial Information Technology Corporation and is configured with a panoramic camera, three low-cost SICK laser scanners (one for the ground and two for the facades) and a GPS/IMU [1]; these sensors are connected by a precision mechanical device with accurate calibration. The panoramic camera is a Ladybug5 (FLIR Integrated Imaging Solutions Inc., Richmond, BC, Canada), which includes 6 high-definition Sony ICX655 CCDs, with 5 on the side (horizontal) and 1 at the top (vertical). The camera can perform image acquisition, processing, correction and stitching of a panoramic image in real time. Figures 2 and 3 show the point data and the optical image of the MMS, respectively. The image data include a fish-eye image (6000 × 4000) along the road direction and a panoramic image (4000 × 8000) stitched from 6 fish-eye images. The point data of the MMS correspond to the image at the same place; the road has a length of approximately 220 m, and the data include 3 million points. In addition, the initial values of the imaging position and attitude, as well as the calibration information of the MMS sensors, are known. Based on the current panoramic/fish-eye image (N), 4 panoramic/fish-eye images were selected on both sides of N, constituting 5 consecutive images (N − 2, N − 1, N, N + 1, and N + 2). This sequence of panoramic/fish-eye images is shown in Figure 4. To compare the accuracy of different registration methods, 38 control points were selected manually; these points are mainly distributed on the corners of buildings and billboards, the tops of lamps, etc. As shown in Figures 1 and 3, the panoramic and fish-eye images contain 38 and 19 control points, respectively; an exception is the fish-eye image (N + 2), which contains only 17 control points.
In Appendix A, Table A1 lists the 3D coordinates of the 38 control points obtained from the LiDAR points; Table A2 lists the 2D image coordinates of the 38 control points in the panoramic image; and Table A3 lists the 2D image coordinates of control points in the fish-eye image.

Panoramic/Fish-Eye Image Registration Model
The imaging model of a panoramic/fish-eye image must be known before registration. The panoramic camera used in an MMS is currently composed of multiple fish-eye lenses; the fish-eye images can be stitched into a panoramic image according to a fixed model. As shown in Equation (1), the collinear equation is the classic imaging model. The Y axis is perpendicular to the image plane (XOZ) in this model, and the panoramic/fish-eye imaging model can be transformed from the collinear equation:

r = f·X′/Y′,  c = f·Z′/Y′,  with (X′ Y′ Z′)ᵀ = R(r_x, r_y, r_z)·(X − X_S, Y − Y_S, Z − Z_S)ᵀ  (1)

where (X Y Z) is the coordinate of an object, f is the imaging lens focal length, (X_S Y_S Z_S) is the imaging position, (r_x r_y r_z) is the imaging attitude with rotation matrix R, and (r c) is the perspective projection coordinate.

Panoramic/Fish-Eye Imaging Model
There are 4 types of fish-eye lens projections: equidistant projection, orthographic projection, equisolid-angle projection and stereographic projection [37,38]. The panoramic stitching models include a spherical model, cylinder model and cube model. Considering the experimental data in our paper, an equidistant projection and a spherical model were used as examples.
The equidistant projection of a fish-eye lens can be expressed as:

r = f·θ·X/√(X² + Z²),  c = f·θ·Z/√(X² + Z²)  (2)

where θ = atan(√(Z² + X²)/Y) is computed in the camera frame, and (r c) is the coordinate of the object in the fish-eye image.
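As a concrete illustration, the equidistant projection can be sketched as follows (a minimal sketch; the camera-frame input, the function name and the sign conventions are assumptions, not the paper's exact implementation):

```python
import numpy as np

def fisheye_equidistant(X, Y, Z, f):
    """Equidistant fish-eye projection: a point at angle theta from the
    optical (Y) axis maps to a radial image distance r_d = f * theta."""
    rho = np.hypot(X, Z)             # distance from the optical axis
    theta = np.arctan2(rho, Y)       # angle from the Y axis
    if rho == 0.0:
        return 0.0, 0.0              # point on the optical axis
    r = f * theta * X / rho          # image row coordinate
    c = f * theta * Z / rho          # image column coordinate
    return r, c
```

For any projected point, √(r² + c²) = f·θ, which is exactly the relation used later when inverting the model (r_d = √(r² + c²), θ = r_d/f).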
As shown in Equation (3), a panoramic image using the spherical model takes the vertical angle (v) as the row coordinate and the horizontal angle (h) as the column coordinate:

h = atan2(X, Y),  v = atan(Z/√(X² + Y²)),  r″ = v/π·row,  c″ = h/(2π)·col  (3)

Considering h ∈ [−π, π] and v ∈ [−π/2, π/2], the panoramic image has twice as many columns as rows.
The imaging parameters can be obtained accurately from the control points. Equations (4) and (5) are transformed from Equations (2) and (3) and represent the computational model of the panoramic/fish-eye image. For the fish-eye image,

p_1 = c/r_d · tan(r_d/f),  p_2 = r/r_d · tan(r_d/f)  (4)

where r_d = √(r² + c²), θ = r_d/f, and (r c) is the fish-eye image coordinate. For the panoramic image,

p_1 = tan(v)/cos(h),  p_2 = tan(h)  (5)

where v = r″/row·π, h = c″/col·2π, (row col) expresses the size of the panoramic image, and (r″ c″) is the panoramic coordinate. Let the right-hand sides of Equations (4) and (5) equal p_1 and p_2, which are known from the image coordinates. The adjustment model of a panoramic/fish-eye image is shown in Equation (6); its coefficients can be calculated by the DLT method:

p_1 = (l_1·X + l_2·Y + l_3·Z + l_4)/(l_9·X + l_10·Y + l_11·Z + 1),  p_2 = (l_5·X + l_6·Y + l_7·Z + l_8)/(l_9·X + l_10·Y + l_11·Z + 1)  (6)

Registration Accuracy Evaluation
Substituting the solved coefficients and the control points' coordinates into Equation (6), p_1 and p_2 can be calculated, and the panoramic/fish-eye image coordinates of the control points can then be obtained by Equations (7) and (8):

v = atan(p_1/√(1 + p_2²)),  h = atan(p_2)  (7)

r = f·θ·p_2/√(p_1² + p_2²),  c = f·θ·p_1/√(p_1² + p_2²)  (8)

where θ = atan(√(p_1² + p_2²)), and the other parameters are the same as those previously indicated. The registration error (δ) is given by Equation (9) and is considered the precision index of registration:

δ = √( Σ_i [(r̂_i − r_i)² + (ĉ_i − c_i)²] / m )  (9)

where m is the number of control points, (r̂_i ĉ_i) expresses the image coordinate of a control point computed from Equations (7) and (8), and (r_i c_i) expresses the real coordinate of the control point.
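The error metric can be computed directly; a minimal sketch, assuming δ is the root-mean-square point-to-point pixel distance described above:

```python
import numpy as np

def registration_error(computed, real):
    """RMS pixel distance between computed and real control point
    coordinates; both arguments are (m, 2) arrays of (r, c) pairs."""
    d = np.asarray(computed, dtype=float) - np.asarray(real, dtype=float)
    return float(np.sqrt(np.sum(d ** 2) / len(d)))
```

The same function serves both panoramic and fish-eye evaluation, since only 2D image coordinates are compared.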

Skyline-Based Method
The skyline-based method includes skyline pixels extracted from a panoramic/fish-eye image, skyline points extracted from LiDAR points and skyline pixels/points matching.

Skyline Pixels Extracted from Panoramic/Fish-Eye Image
Because the difference in image pixel value between the sky and other objects is significant, the skyline can be easily extracted using this feature. Searching each column from top to bottom, the first grey-value jump is taken as a skyline pixel; the skyline pixels (blue pixels) extracted from the panoramic/fish-eye image are shown in Figure 5a,b and include the top profiles of buildings, trees, billboards, power lines and street lamps. Considering the linear feature of a power line and the discrete characteristics of LiDAR points, power line points extracted from LiDAR points are incomplete; thus, power line pixel interference must be eliminated by setting a jump buffer. Figure 5a,b also show the skyline pixels (red pixels) after optimization, in which the power line pixels have been eliminated.
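The column-wise search can be sketched as follows (a minimal sketch; the jump threshold and the buffer length are assumed parameters, and a real implementation would run on the actual panoramic/fish-eye imagery):

```python
import numpy as np

def extract_skyline_pixels(gray, jump=60, buffer=3):
    """For each column, scan top to bottom and take the first grey-value
    jump as the skyline pixel. A transition only counts if the darker
    value persists for `buffer` rows, which suppresses thin structures
    such as power lines (the 'jump buffer' of the text)."""
    rows, cols = gray.shape
    skyline = np.full(cols, -1, dtype=int)   # -1: no skyline found
    for c in range(cols):
        col = gray[:, c].astype(int)         # int to avoid uint8 underflow
        for r in range(rows - buffer):
            if col[r] - col[r + 1] >= jump:  # first grey jump
                # require the jump to persist, rejecting 1-2 px lines
                if np.all(col[r] - col[r + 1:r + 1 + buffer] >= jump):
                    skyline[c] = r + 1
                    break
    return skyline
```

A thin dark row (a power line) fails the persistence check and is skipped, so the search continues down to the true building/tree outline.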

Skyline Points Extracted from LiDAR Points
Skyline points extracted from LiDAR points are influenced by the imaging position, imaging attitude, imaging focal length and imaging model. In particular, panoramic image registration is related to the imaging position and attitude, and fish-eye image registration is related to the imaging position, attitude and focal length. The imaging position and attitude can be obtained from the GPS/IMU and the calibration of the MMS, and the imaging focal length is obtained by camera calibration. Skyline points can then be extracted from the LiDAR points as follows. First, a panoramic image is generated from all points using Equation (3); the highest pixel in each column is found, and the point corresponding to this pixel is a skyline point. The extraction algorithm for fish-eye image registration is similar. The extracted skyline points include a large amount of noise, and in some regions no skyline points correspond to the skyline pixels; therefore, constraints must be added in skyline matching to ensure a rigorous correspondence between skyline points and pixels.
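The per-column extraction from the LiDAR points can be sketched as follows (a minimal sketch of the panoramic case; the row origin, image size and rotation handling are assumptions rather than the paper's exact conventions):

```python
import numpy as np

def extract_skyline_points(points, X_S, Y_S, Z_S, R, row=4000, col=8000):
    """Project LiDAR points with the spherical panoramic model and keep,
    for each image column, the point mapping to the highest pixel (the
    smallest row index). R is the 3x3 rotation built from the imaging
    attitude obtained from the GPS/IMU and MMS calibration."""
    # camera-frame coordinates (Y axis perpendicular to the image plane)
    P = (points - np.array([X_S, Y_S, Z_S])) @ R.T
    Xb, Yb, Zb = P[:, 0], P[:, 1], P[:, 2]
    h = np.arctan2(Xb, Yb)                    # horizontal angle in [-pi, pi]
    v = np.arctan2(Zb, np.hypot(Xb, Yb))      # vertical angle in [-pi/2, pi/2]
    r = ((np.pi / 2 - v) / np.pi * row).astype(int)   # assumed: up -> row 0
    c = ((h + np.pi) / (2 * np.pi) * col).astype(int)
    r = np.clip(r, 0, row - 1)
    c = np.clip(c, 0, col - 1)
    skyline = {}
    for i in range(len(points)):
        if c[i] not in skyline or r[i] < skyline[c[i]][0]:
            skyline[c[i]] = (r[i], i)         # (row, point index)
    return skyline
```

Columns with no projected point simply have no entry, which is why the matching step later needs a gross-error threshold to skip unmatched regions.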

Skyline Pixels/Points Matching
The skyline pixels/points are extracted from the panoramic/fish-eye image and the LiDAR points as above. In this section, we define the image skyline pixels as Φ{r_i, c_i}, i = 0, 1, ..., m, and the skyline points as Ω{x_i, y_i, z_i}, i = 0, 1, ..., n, to analyze the skyline matching method. The key of this method is to recover the correspondence between Φ and Ω, which we solve using a brute-force optimization method. The position parameters have little influence on the image compared with the attitude parameters; therefore, only the attitude parameters are optimized in this paper. As shown in Equation (10), the procedure constructs a correcting matrix R′, which consists of 3 correction angles (r′_x r′_y r′_z), and places R′ before R in the imaging model:

(X′ Y′ Z′)ᵀ = R′(r′_x, r′_y, r′_z)·R·(X − X_S, Y − Y_S, Z − Z_S)ᵀ  (10)

(r′_x r′_y r′_z) can be obtained by brute-force optimization, and the registration of the LiDAR points and the image is completed once the skyline pixels/points are matched. First, the imaging position (X_S Y_S Z_S), the attitude (r_x r_y r_z) and the focal length (f) are obtained from the GPS/IMU and the calibration of the MMS [1] (as this calculation is very common, it is not described in detail here). Taking the maximum error of (r′_x r′_y r′_z) as the initial value range (w) and dividing it into t parts, as shown in Equation (11),

r′ ∈ {−w + 2w·i/t | i = 0, 1, ..., t}  (11)

(t + 1)³ panoramic/fish-eye synthetic images Ω{r_i, c_i} are generated from Ω{x_i, y_i, z_i} according to Equations (2), (3) and (10). Second, the similarity between Ω{r_i, c_i} and Φ{r_j, c_j} is calculated, as shown in Equation (12), with the number of matching skyline pixels (n) used as the evaluation index:

n = Σ_c [ |r_Ω(c) − r_Φ(c)| < e ]  (12)

The highest similarity is obtained among the (t + 1)³ synthetic images; taking the corresponding parameters (i, j, k) as the initial value of the next iteration, each iteration reduces the range by half.
where e denotes the gross error elimination threshold: when the row difference between Ω{r_i, c_i} and Φ{r_i, c_i} in the same column is less than e, n is incremented; otherwise it is unchanged. n reflects the skyline similarity between the real image and a synthetic image, and e is used to eliminate mismatched skyline pixels. Owing to the differences between the acquisition sensors, the skylines cannot match completely. For example, the building skyline in the left part of the panoramic image is easy to extract from the image, but LiDAR points are missing in the corresponding region, so there is no matching skyline there; e eliminates this part of the skyline, such that it is not involved in the parameter optimization.
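The similarity count and the coarse-to-fine search can be sketched as follows (a minimal sketch: `render` stands in for imaging the skyline points via Equations (2), (3) and (10) and is supplied by the caller; the grid layout, names and halving schedule follow the text but are assumptions in their details):

```python
import numpy as np
from itertools import product

def skyline_similarity(synth_rows, image_rows, e=5):
    """n: number of columns where the synthetic and real skyline rows
    differ by less than the gross-error threshold e (Equation (12))."""
    valid = (synth_rows >= 0) & (image_rows >= 0)   # -1 marks no skyline
    return int(np.sum(np.abs(synth_rows[valid] - image_rows[valid]) < e))

def brute_force_attitude(render, image_rows, w=5.0, t=6, iters=6, e=5):
    """Coarse-to-fine brute-force search over the correction angles
    (rx, ry, rz). render(rx, ry, rz) must return the per-column skyline
    rows of the synthetic image for those corrections."""
    center = np.zeros(3)
    best = None
    for _ in range(iters):
        # (t + 1)^3 candidate images per iteration
        grid = [center[k] + np.linspace(-w, w, t + 1) for k in range(3)]
        best = max(((skyline_similarity(render(rx, ry, rz), image_rows, e),
                     (rx, ry, rz))
                    for rx, ry, rz in product(*grid)), key=lambda s: s[0])
        center = np.array(best[1])   # recentre on the best candidate
        w /= 2.0                     # halve the search range each iteration
    return best[1], best[0]          # optimal angles, matched pixel count n
```

With w = 5°, t = 6 and 6 iterations, the final grid step is below 0.1°, matching the text's observation that the angles are then no longer the dominant error source.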

Comparison of the Registration Methods
To compare the accuracy of our method, the original registration method, the skyline-based registration method and the control point registration method (referred to below as methods I, II and III, respectively) were used for the panoramic/fish-eye image registration. First, the original registration (I) was conducted according to Equations (1)-(3), with the registration parameters (f, X_S, Y_S, Z_S, r_x, r_y, r_z) obtained from the GPS/IMU and MMS calibration. Second, registration with control points (III) was performed according to Equations (4) and (5); this method does not require initial values. The registration results of the panoramic/fish-eye image (N) with methods I and III are shown in Figure 6. The key parameters involved in the skyline registration method are: the angle error range (w), the parameter segmentation (t), the gross error threshold (e), the number of matching pixels (n) and the registration error (δ). n is the evaluation index of the parameter optimization, and δ expresses the matching error of the 38 (19) control points. Herein, let w = 5°, t = 6, and e = 5 pixels. Next, the skyline is extracted from the panoramic/fish-eye image and the LiDAR points, as described above. The skyline points are imaged 7³ times by Equations (2), (3) and (10), and n between each generated image and the real image skyline is calculated. We take both sides of the maximum n as the starting value of the next iteration, and w is reduced by half in each iteration. After 6 iterations, the matching skyline pixels of the panoramic/fish-eye image are as shown in Figure 7, and the registration results with the 3 methods are shown in Figure 6. Figures 8 and 9 show local images to compare the registration effect. As shown in Figure 7, the matching skyline pixels include the top outlines of buildings, trees, lamps, etc. The influence of the unmatched skyline on the registration effect has been eliminated, such as the missing parts (the building on the left of the panoramic image) and cables.
Simultaneously, considering the discrete nature of point clouds and the continuity of the image, the number of matching skyline pixels (n) is far less than the number of columns of the panoramic/fish-eye image: the proportion for the panoramic and fish-eye images is 17.75% and 11.35%, respectively. Figures 8 and 9 show the local registration effect of the panoramic and fish-eye images, respectively, with the 3 methods. A large dislocation is observed with method I, especially for the billboards, street lights, trees, buildings, etc. Conversely, method III obtains the ideal effect, and all objects have a very high coincidence. The proposed skyline-based method also achieves good registration, with an accuracy between that of method I and that of method III. To quantitatively analyze the effect of the 3 methods, the registration error (δ) is calculated from the control points according to Equation (9). Table 1 lists δ in each iteration using our method, and Table 2 lists δ for the 3 methods. Table 1. Skyline registration error in each iteration (w, r_x, r_y, r_z (degrees), δ (pixels)).

As shown in Table 1, δ becomes stable for both panoramic/fish-eye images after 6 iterations; w has been reduced to 0.15°, and the angle parameters are no longer the main factors affecting registration accuracy. Analysis of each iteration reveals that δ noticeably decreases as n increases and that the rotation angle parameters are optimized gradually. Table 2 lists δ for the 3 methods. Our method has an error ratio (δ/image diagonal) of 0.97‰ and 2.04‰ in the panoramic and fish-eye images, respectively, which is greater than that of method III and less than that of method I. This conclusion is the same as that of the above image analysis. Finally, the position error of each control point with the 3 methods is analyzed in detail. Figure 10 shows the error of the 38/19 control points in the panoramic/fish-eye image. Figure 11 shows the locations of all control points after registration.
According to Figure 10a,b, the error of each control point is clearly reduced after each iteration compared with method I, especially in the first iteration. Afterwards, the error continues to decrease in subsequent iterations and gradually approaches that of method III. Figure 11 displays all control points calculated by the 3 methods on the panoramic/fish-eye images. The direction of the control points' offset from the real position is consistent in method I, and this error is regular, indicating that optimizing the rotation angles has a theoretical basis.

Point Cloud Colouring
LiDAR points can be given an RGB value after registration, so the points not only contain spatial information but also carry texture information. Using these textured points, a texture image can be generated at any position and attitude. Let the current imaging position be N, with N − 1 and N + 1 denoting the imaging positions before and after N in the image sequence, as shown in Figure 12a. We generate an image according to the perspective imaging model: the main axis is along the road direction, the other rotation angles are 0°, the imaging focal length is 250 pixels and the imaging size is 1000 × 1000. Figure 12b-d show the imaging results. Through textured-point imaging, we can obtain the scene from an arbitrary location and attitude to meet different vision requirements.
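Texture-point imaging at an arbitrary position and attitude can be sketched as follows (a minimal sketch: the perspective model with the Y axis as the viewing direction, a centred principal point and a nearest-point depth buffer are assumptions, not the paper's exact implementation):

```python
import numpy as np

def render_textured_points(points, colors, X_S, Y_S, Z_S, R,
                           f=250.0, size=1000):
    """Image coloured LiDAR points with a perspective model (the focal
    length of 250 pixels and 1000 x 1000 size follow the text). A depth
    buffer keeps the nearest point per pixel."""
    img = np.zeros((size, size, 3), dtype=np.uint8)
    depth = np.full((size, size), np.inf)
    # camera-frame coordinates (Y axis along the viewing direction)
    P = (points - np.array([X_S, Y_S, Z_S])) @ R.T
    for (Xb, Yb, Zb), rgb in zip(P, colors):
        if Yb <= 0:                          # behind the camera
            continue
        r = int(size / 2 - f * Zb / Yb)      # up (Z) decreases the row
        c = int(size / 2 + f * Xb / Yb)
        if 0 <= r < size and 0 <= c < size and Yb < depth[r, c]:
            depth[r, c] = Yb                 # nearest point wins
            img[r, c] = rgb
    return img
```

Calling this with the positions N − 1, N and N + 1 and a road-aligned rotation R reproduces the kind of viewpoint sweep described for Figure 12.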

Sequence of Panoramic/Fish-Eye Image Registration
Our registration method was verified by the previous experiment. In this section, experiments are performed using our method on a sequence of panoramic/fish-eye images (N − 2, N − 1, N + 1, N + 2) to analyze the regularity of the optimized parameters. The parameters involved in our method are the same as those previously indicated. The rotation angle correction values (r′_x r′_y r′_z) of each panoramic/fish-eye image can be obtained automatically. Figure 13 shows the registration results using the 3 methods, and Figures 14 and 15 show the local registration effect with the 3 methods to compare the effectiveness of the registration. Our method has an obvious optimization effect compared with method I and approaches the effect of method III; moreover, unlike method III, it is completed without manual intervention. For quantitative analysis, Table 3 lists the imaging positions and angle correction values of the sequence of panoramic/fish-eye images using our method, and Table 4 lists the registration accuracy of the sequence for the 3 methods. Table 3. Registration parameters of panoramic/fish-eye image sequence (X_S, Y_S, Z_S (meters), r_x, r_y, r_z (degrees)).

Table 4. Registration accuracy of panoramic/fish-eye image sequence with 3 methods (pixels).

As shown in Table 3, the interval between sequence images is approximately 7 m, and the road height difference is within 0.3 m. The angle correction values of each panoramic/fish-eye image are different; although the differences are small in most cases, the values must be solved separately rather than reused directly for the next image. Table 4 shows that the registration error of method III is the smallest, the error of method II is approximately 2 times that of method III, and the error of method I is approximately 3 times that of method II. For further analysis, as shown in Figure 16, the position error of each control point using the 3 methods in the sequence of images is analyzed in detail. As shown in Figure 16a,b, the error of each control point is noticeably reduced compared with that of method I. Moreover, the error distribution of the control points has a certain regularity with our method. First, the error of the control points located at both ends of a panoramic image is large, such as nos. 1, 2, 37 and 38. Second, the control points located at the top of the image have a larger error, such as nos. 14 and 26 in the panoramic/fish-eye images (N + 1, N + 2), leading to a greater registration error of the fish-eye images (N + 1, N + 2) and the panoramic image (N + 2) than that of the others.

Parameters in Skyline Pixels/Points Matching
The parameters in skyline matching are the angle error range (w), the parameter segmentation (t) and the gross error threshold (e); they are kept the same for panoramic and fish-eye image registration. w is the iteration range of the initial angle values (r_x r_y r_z), which in this paper are obtained from the IMU. If an IMU is unavailable, approximate values of r_x and r_z can be solved from adjacent imaging positions, and r_y is set to 0. t determines the efficiency of registration: the nodes are set evenly over the correction angles, producing (t + 1)³ images at each iteration. The number of iterations can be reduced when t is large, but the amount of computation increases as a power function. To guarantee the accuracy of registration, w is reduced to half, not to w/t, after each iteration. e is the key parameter of the optimization: matching skyline pixels/points whose difference is less than e are considered successful matches. When e is larger, there are more mismatched points, but when e is small, correct matching pixels/points are filtered out. In addition, e is related to the number of image rows; the rows of the panoramic and fish-eye images are 4000 and 6000 pixels, respectively, in our experiments, and setting e to 5 pixels proved empirically suitable.

Precision and Automation Compared with Other Methods
Our method does not require manual intervention in the registration of panoramic/fish-eye images and LiDAR points. In recent studies, some articles use lines and planes as primitive pairs and a rigid registration to obtain high-precision results; however, extracting primitive pairs automatically from images and LiDAR is still challenging. Cui et al. [1] extracted line features automatically from LiDAR points and, through a semi-automatic process, from the corresponding rectified mono-images, achieving registration accuracies of 4.718 and 4.244 pixels (image size 1616 × 1232 pixels) for the spherical and panoramic cameras, respectively. However, the process requires a manual check of the primitive pairs before registration. Li et al. [2] took parked vehicles as registration primitives and extracted them from panoramic images via Faster R-CNN; particle swarm optimization was further utilized to refine the translations between the panoramic camera and the laser scanner, and registration errors of less than 3 pixels (image size 2048 × 4096 pixels) were obtained. However, this automatic registration method relies on parked vehicles, which limits its application scenarios; moreover, the extraction of vehicles often fails due to poor illumination or occlusion. Our method takes the skyline as the primitive pair because it is easy to extract from both the image and the LiDAR points and exists in most scenes. In addition, our method can be used for both fish-eye and panoramic images.

Weakness and Future Work
The main weakness of our method is its dependence on the skyline. In an open area, the skyline pixels of the image can represent very distant objects, while the point cloud is limited in range, so the skyline pixels/points fail to match. Therefore, our method is more suitable for urban areas or similar places with a nearby skyline. Some problems also require further study, such as the optimization of the position parameters, which could be iterated alternately with the attitude parameters. The stitching error of the panoramic image also needs further study, as the camera centers of the fish-eye lenses hardly overlap and the attitudes of the lenses are seldom in the same horizontal plane.

Conclusions
In this paper, we proposed a skyline-based method for the automatic registration of LiDAR points and a panoramic/fish-eye image sequence in an MMS. The effectiveness of this method was demonstrated by comparison with the original registration and control point registration methods. Compared with other related works, the main contribution of this study is that (1)