Line-Based Registration of Panoramic Images and LiDAR Point Clouds for Mobile Mapping

For multi-sensor integrated systems, such as the mobile mapping system (MMS), data fusion at sensor-level, i.e., the 2D-3D registration between an optical camera and LiDAR, is a prerequisite for higher level fusion and further applications. This paper proposes a line-based registration method for panoramic images and a LiDAR point cloud collected by a MMS. We first introduce the system configuration and specification, including the coordinate systems of the MMS, the 3D LiDAR scanners, and the two panoramic camera models. We then establish the line-based transformation model for the panoramic camera. Finally, the proposed registration method is evaluated for two types of camera models by visual inspection and quantitative comparison. The results demonstrate that the line-based registration method can significantly improve the alignment of the panoramic image and the LiDAR datasets under either the ideal spherical or the rigorous panoramic camera model, with the latter being more reliable.


Introduction
Motivated by applications such as vehicle navigation [1], urban planning [2], and autonomous driving [3], the need for 3D detailed photorealistic models in urban areas has increased dramatically in recent years. A mobile mapping system (MMS), which collects 3D and/or 2D photographic data while vehicles move at a regular speed, have been widely used as the efficient data acquisition technology at street-level [4]. Short range laser scanners (e.g., 100-200 m) and electro-optical cameras are two primary sensors on a MMS, each of which has its own characteristics. Laser scanners can acquire 3D information directly, but offer relatively low resolution and short range of use, whereas a camera captures 2D images with high-resolution textures but the depth information is missing. Thus, these complementary characteristics between ranging and imaging sensors have been broadly studied through the so-called camera/LiDAR fusion [5][6][7][8].
A prerequisite of data fusion is to transform different datasets to a common coordinate system, which is often called 2D-to-3D registration. Although the camera and LiDAR sensors are usually calibrated [9][10][11][12] in advance, considerable misalignment may still exist between the two datasets. Possible reasons for such misalignment are as follows: (1) System calibration errors. The relative orientation and position offsets between all the sensors, GPS, Inertial Measurement Unit (IMU), LiDAR automatic registration of mobile LiDAR and spherical panoramas; however, only part of the panoramic image in the limited viewport was used. Moreover, the registration was based on a conventional frame camera model.
Besides registration, many applications based on panoramic images have been reported in recent years. In [36] a structure-from-motion (SfM) pipeline for 3D modeling using Google Street View imagery with an ideal spherical camera model was presented, while [37] presented a piecewise planar model to reconstruct 3D scenes from panoramic image sequences. A quadrangular prism panoramic camera model was used for improved image matching. Reference [38] explained both the ideal spherical camera model and the rigorous panoramic sensor model for a multi-camera rig and their corresponding epipolar geometry. In [39] the two models were further compared and their effects on the localization quality in object space and the quality of space resection. In this paper, we utilize the rigorous panoramic sensor model for image and LiDAR registration. Since there were many studies based on the ideal spherical camera model, we also attempt to demonstrate its limitations in this paper by comparing it with the rigorous model. This paper proposes a rigorous line-based registration method for precise alignment of LiDAR point clouds and panoramic images. The remainder of this paper is structured as follows: Section 2 introduces a MMS and its sensor frames for the LiDAR and the panoramic camera. Section 3 presents registration models based on straight-line features for panoramic cameras. Section 4 addresses 3D line extraction from LiDAR point clouds. Section 5 introduces the datasets and analyzes the registration results based on our model. Finally, the conclusions and future work recommendations are presented in Section 6.

The Mobile Mapping System and Sensor Configuration
The MMS used in this paper was jointly developed by Wuhan University and Leador Spatial Information Technology Corporation, Ltd. (Wuhan, China), configured with a Ladybug 3 camera [40], three low-cost SICK laser scanners [41] (one for ground and two for facades) and a GPS/IMU. The system and its sensor configuration are shown in Figure 1. of the panoramic image in the limited viewport was used. Moreover, the registration was based on a conventional frame camera model. Besides registration, many applications based on panoramic images have been reported in recent years. In [36] a structure-from-motion (SfM) pipeline for 3D modeling using Google Street View imagery with an ideal spherical camera model was presented, while [37] presented a piecewise planar model to reconstruct 3D scenes from panoramic image sequences. A quadrangular prism panoramic camera model was used for improved image matching. Reference [38] explained both the ideal spherical camera model and the rigorous panoramic sensor model for a multi-camera rig and their corresponding epipolar geometry. In [39] the two models were further compared and their effects on the localization quality in object space and the quality of space resection. In this paper, we utilize the rigorous panoramic sensor model for image and LiDAR registration. Since there were many studies based on the ideal spherical camera model, we also attempt to demonstrate its limitations in this paper by comparing it with the rigorous model. This paper proposes a rigorous line-based registration method for precise alignment of LiDAR point clouds and panoramic images. The remainder of this paper is structured as follows: Section 2 introduces a MMS and its sensor frames for the LiDAR and the panoramic camera. Section 3 presents registration models based on straight-line features for panoramic cameras. Section 4 addresses 3D line extraction from LiDAR point clouds. Section 5 introduces the datasets and analyzes the registration results based on our model. Finally, the conclusions and future work recommendations are presented in Section 6.

The Mobile Mapping System and Sensor Configuration
The MMS used in this paper was jointly developed by Wuhan University and Leador Spatial Information Technology Corporation, Ltd. (Wuhan, China), configured with a Ladybug 3 camera [40], three low-cost SICK laser scanners [41] (one for ground and two for facades) and a GPS/IMU. The system and its sensor configuration are shown in Figure 1. This section first introduces several key coordinate systems of the MMS and their geometric relationships, followed by a description of the geo-referenced LiDAR and the rigorous camera model for a multi-camera rig. This section first introduces several key coordinate systems of the MMS and their geometric relationships, followed by a description of the geo-referenced LiDAR and the rigorous camera model for a multi-camera rig.

Coordinate Systems
As shown in Figure 2, there are three coordinate systems in our MMS: (1) a world coordinate system; (2) a vehicle platform coordinate system; and (3) a camera-centered coordinate system. The world coordinate system is the reference for data management and organization. In the proposed method, the GPS/IMU records the translation and orientation from the world coordinate system to the vehicle platform coordinate system denoted as M 1 (R 1 , T 1 ). The LiDAR points are geo-referenced to the world coordinate system (the left-bottom dotted line) according to M 1 and the calibration parameters M 3 between the LiDAR sensor and the platform. M 2 (R 2 , T 2 ) is the transformation from the camera to the vehicle platform, which is also achieved through prior calibration.
The goal of registration is to determine the transformation M from the LiDAR points to the panoramic image. Other than a static calibration which concerns only M 23 , the time series of localization information M 1 is considered. For simplification, the possible errors of M 3 are ignored and the transformation is constructed directly between the georeferenced LiDAR and camera (the bottom solid line). It is assumed that there exists ∆M(∆R, ∆T) to meet: ∆M compensates for several aspects including M 3 (as is discussed in Section 1). The line features are extracted from both the images and the LiDAR points, which are then utilized to determine the optimal ∆M for an accurate registration. The solution procedure is based on the standard least squares technique imbedded with RANSAC for removal of possible gross errors in M 1 .

Coordinate Systems
As shown in Figure 2, there are three coordinate systems in our MMS: (1) a world coordinate system; (2) a vehicle platform coordinate system; and (3) a camera-centered coordinate system. The world coordinate system is the reference for data management and organization. In the proposed method, the GPS/IMU records the translation and orientation from the world coordinate system to the vehicle platform coordinate system denoted as M1(R1, T1). The LiDAR points are geo-referenced to the world coordinate system (the left-bottom dotted line) according to M1 and the calibration parameters M3 between the LiDAR sensor and the platform. M2(R2, T2) is the transformation from the camera to the vehicle platform, which is also achieved through prior calibration.
The goal of registration is to determine the transformation M from the LiDAR points to the panoramic image. Other than a static calibration which concerns only M23, the time series of localization information M1 is considered. For simplification, the possible errors of M3 are ignored and the transformation is constructed directly between the georeferenced LiDAR and camera (the bottom solid line). It is assumed that there exists ΔM(ΔR, ΔT) to meet: ΔM compensates for several aspects including M3 (as is discussed in Section 1). The line features are extracted from both the images and the LiDAR points, which are then utilized to determine the optimal ΔM for an accurate registration. The solution procedure is based on the standard least squares technique imbedded with RANSAC for removal of possible gross errors in M1.

Geo-Refereneced LiDAR
The LiDAR points are geo-referenced to the world coordinate system by the interpolated rotation values recorded by INS at the corresponding position from the GPS/IMU integrated navigation data M1 and the calibration parameters M3 between the LiDAR and the IMU [42]. In the proposed system, three low-cost SICK laser scanners (all linear-array lasers) are equipped to acquire a 3D point cloud of the object's facade. The angular resolution (0.25°-1.0°) and scan frequency  Hz) are fixed during data acquisition. The density of LiDAR points is uneven, i.e., the closer they are to the measured surface, the higher the density of the points. For instance, the points on the ground are much denser than those on the top facade. In addition, the point density in the horizontal direction is dependent on the velocity of the MMS vehicle.

Geo-Refereneced LiDAR
The LiDAR points are geo-referenced to the world coordinate system by the interpolated rotation values recorded by INS at the corresponding position from the GPS/IMU integrated navigation data M 1 and the calibration parameters M 3 between the LiDAR and the IMU [42]. In the proposed system, three low-cost SICK laser scanners (all linear-array lasers) are equipped to acquire a 3D point cloud of the object's facade. The angular resolution (0.25 • -1.0 • ) and scan frequency  are fixed during data acquisition. The density of LiDAR points is uneven, i.e., the closer they are to the measured surface, the higher the density of the points. For instance, the points on the ground are much denser than those on the top facade. In addition, the point density in the horizontal direction is dependent on the velocity of the MMS vehicle.

Multi-Camera Rig Models
The panoramic image in Figure 3 covers a 360 • view of a surrounding scene, captured by the Ladybug 3 system composed of six fisheye cameras. The straight lines in reality are no longer straight on panoramic images compared to a common frame Figure 3b.

Multi-Camera Rig Models
The panoramic image in Figure 3 covers a 360° view of a surrounding scene, captured by the Ladybug 3 system composed of six fisheye cameras. The straight lines in reality are no longer straight on panoramic images compared to a common frame Figure 3b.  Generally, the panoramic imaging process can be approximated by an ideal spherical camera model. However, since the entire image is technically stitched from six individual images through blending, stitching errors cannot be avoided. This section first introduces the traditional ideal spherical camera model and then extends it to the rigorous multi-camera rig panoramic model. The spherical camera model is referred to as the ideal one and the panoramic camera model as the rigorous one.

Spherical Camera Model
Under this model, the imaging surface is regarded as a sphere, whose center is the projection center. Figure 4a presents a schematic diagram of the spherical projection, where the sphere center S, the 3D points P in plane π, and the panoramic image point u are collinear [39]. The pixels in a panoramic image are typically expressed in polar coordinates. Assuming that the width and height of the panoramic image is W and H respectively, the horizontal 360° view is mapped to [0, W] and the vertical 180° view is mapped to [0, H]. Thus, each pixel (u, v) can be transformed to polar coordinate ( , ) by Equation (2): is the horizontal angle between −π and π, and is the vertical angle between −π/2 and π/2. Let r be the radius of the projection sphere, Equation (3) is used to calculate a set of Cartesian coordinates. In most cases, r = 20.0 m for the best stitching accuracy [43].
The sphere center S, 3D point P, and edge pixel u are collinear. The relationship between X and P is established by perspective transformation in Equation (4): where P is the coordinate of the object point, and X(x, y, z) is the Cartesian coordinate of image point u; R and T are respectively the rotation matrix and translation vector between the object space and the panoramic camera space; and is the scale factor. Generally, the panoramic imaging process can be approximated by an ideal spherical camera model. However, since the entire image is technically stitched from six individual images through blending, stitching errors cannot be avoided. This section first introduces the traditional ideal spherical camera model and then extends it to the rigorous multi-camera rig panoramic model. The spherical camera model is referred to as the ideal one and the panoramic camera model as the rigorous one.

Spherical Camera Model
Under this model, the imaging surface is regarded as a sphere, whose center is the projection center. Figure 4a presents a schematic diagram of the spherical projection, where the sphere center S, the 3D points P in plane π, and the panoramic image point u are collinear [39]. The pixels in a panoramic image are typically expressed in polar coordinates. Assuming that the width and height of the panoramic image is W and H respectively, the horizontal 360 • view is mapped to [0, W] and the vertical 180 • view is mapped to [0, H]. Thus, each pixel (u, v) can be transformed to polar coordinate (θ, ϕ) by Equation (2): θ is the horizontal angle between −π and π, and ϕ is the vertical angle between −π/2 and π/2. Let r be the radius of the projection sphere, Equation (3) is used to calculate a set of Cartesian coordinates. In most cases, r = 20.0 m for the best stitching accuracy [43].
The sphere center S, 3D point P, and edge pixel u are collinear. The relationship between X and P is established by perspective transformation in Equation (4): where P is the coordinate of the object point, and X(x, y, z) is the Cartesian coordinate of image point u; R and T are respectively the rotation matrix and translation vector between the object space and the panoramic camera space; and λ is the scale factor.  Unlike the traditional camera model, the z coordinate of the image point is not equal to −f, where f is the focal length in the traditional camera model. In the widely used spherical camera model, the image point u is under the spherical geometry constraint as Equation (5): As a result, Equation (4) actually has two degrees of freedom, i.e., two independent equations.

Panoramic Camera Model
A multi-camera rig consists of several separate and fixed fish-eye lenses. Independent images are captured by each lens and then stitched to form the entire panoramic image. As shown in Figure  4b, each lens has its own projection center C, but it cannot be precisely located on sphere center S due to the manufacturing constraints. The mono-lens center C, instead of sphere center S, panoramic image point u, and 3D point P′ in object space are collinear.  Unlike the traditional camera model, the z coordinate of the image point is not equal to −f, where f is the focal length in the traditional camera model. In the widely used spherical camera model, the image point u is under the spherical geometry constraint as Equation (5): As a result, Equation (4) actually has two degrees of freedom, i.e., two independent equations.

Panoramic Camera Model
A multi-camera rig consists of several separate and fixed fish-eye lenses. Independent images are captured by each lens and then stitched to form the entire panoramic image. As shown in Figure 4b, each lens has its own projection center C, but it cannot be precisely located on sphere center S due to the manufacturing constraints. The mono-lens center C, instead of sphere center S, panoramic image point u, and 3D point P in object space are collinear.  Unlike the traditional camera model, the z coordinate of the image point is not equal to −f, where f is the focal length in the traditional camera model. In the widely used spherical camera model, the image point u is under the spherical geometry constraint as Equation (5): As a result, Equation (4) actually has two degrees of freedom, i.e., two independent equations.

Panoramic Camera Model
A multi-camera rig consists of several separate and fixed fish-eye lenses. Independent images are captured by each lens and then stitched to form the entire panoramic image. As shown in Figure  4b, each lens has its own projection center C, but it cannot be precisely located on sphere center S due to the manufacturing constraints. The mono-lens center C, instead of sphere center S, panoramic image point u, and 3D point P′ in object space are collinear.  The panoramic camera used in this paper is composed of six separate fish-eye lenses. Figure 5a shows an example of the six raw fisheye images, and Figure 5b shows the corresponding undistorted images rectified from the raw ones. Rectification from a fisheye to an ideal plane image only depends on the known camera calibration parameters Kr, including the projection model and the radial and tangential distortion. The index r means that every fisheye camera has its own calibration parameters. Since the straight lines, such as the boundaries of buildings, are distorted in the raw fisheye image, rectified images are used for line extraction.
As shown in Figure 6, the global camera coordinate system is defined for the whole multi-camera rig, and six local coordinate systems are defined for each lens separately. The global coordinate system (see Figure 6a) of the panoramic camera is defined by three main directions: the X-axis typically is along the driving direction; the Z-axis is the zenith direction; and the Y-axis is orthogonal to both the Y-axis and the Z-axis. Each of the six lenses has its own local coordinate system (see Figure 6b): (1) The origin is the optical centre of the lens; (2) the Z-axis is the optical axis and points towards the scene; and (3) the X-axis and the Y-axis are parallel to the corresponding image coordinate. For each lens, there are three interior orientation elements (focal length f and image centre (x 0 , y 0 )) and six exterior orientation parameters (T x , T y , T z , R x , R y , R z ) relative to the global coordinate system (the offsets between C and S in Figure 4b under spherical view). Both of them were acquired in advance by careful calibration by the manufacturer. The panoramic camera used in this paper is composed of six separate fish-eye lenses. Figure 5a shows an example of the six raw fisheye images, and Figure 5b shows the corresponding undistorted images rectified from the raw ones. Rectification from a fisheye to an ideal plane image only depends on the known camera calibration parameters Kr, including the projection model and the radial and tangential distortion. The index r means that every fisheye camera has its own calibration parameters. Since the straight lines, such as the boundaries of buildings, are distorted in the raw fisheye image, rectified images are used for line extraction.
As shown in Figure 6, the global camera coordinate system is defined for the whole multi-camera rig, and six local coordinate systems are defined for each lens separately. The global coordinate system (see Figure 6a) of the panoramic camera is defined by three main directions: the X-axis typically is along the driving direction; the Z-axis is the zenith direction; and the Y-axis is orthogonal to both the Y-axis and the Z-axis. Each of the six lenses has its own local coordinate system (see Figure  6b): (1) The origin is the optical centre of the lens; (2) the Z-axis is the optical axis and points towards the scene; and (3) the X-axis and the Y-axis are parallel to the corresponding image coordinate. For each lens, there are three interior orientation elements (focal length f and image centre (x0, y0)) and six exterior orientation parameters (Tx, Ty, Tz, Rx, Ry, Rz) relative to the global coordinate system (the offsets between C and S in Figure 4b under spherical view). Both of them were acquired in advance by careful calibration by the manufacturer. Although each lens may have its own camera model, the advantages of panoramic imaging may be overlooked. To address this issue, all six images are projected into the global spherical imaging surface to obtain uniform coordinates. First, the coordinates of the rectified image of each separate lens are transformed to the global coordinate system. Each pixel p (x, y) in the rectified images forms a 3D ray in the global coordinate system by Equation (6): where Xr (x − x0, y − y0, f) is the mono-camera coordinates of pixel p and the translation vector Tr (Tx, Ty, Tz) and the local rotation matrix Rr are known, the latter can be calculated by the following equation: = cos cos cos sin sin − sin cos cos sin cos + sin sin sin cos sin sin sin + cos cos sin sin cos − cos sin − cos sin cos cos (7) X' x', y', z') is the coordinates transformed into the global panoramic coordinate system, and the scale factor m defines the distance from the rectified image plane to the projection surface (typically a sphere or cylinder). By combining Equations (5) and (6), we can resolve m and X' for a sphere projection. Although each lens may have its own camera model, the advantages of panoramic imaging may be overlooked. To address this issue, all six images are projected into the global spherical imaging surface to obtain uniform coordinates. First, the coordinates of the rectified image of each separate lens are transformed to the global coordinate system. Each pixel p (x, y) in the rectified images forms a 3D ray in the global coordinate system by Equation (6): where X r (x − x 0 , y − y 0 , f ) is the mono-camera coordinates of pixel p and the translation vector T r (T x , T y , T z ) and the local rotation matrix R r are known, the latter can be calculated by the following equation: Sensors 2017, 17, 70 8 of 20 X (x , y , z ) is the coordinates transformed into the global panoramic coordinate system, and the scale factor m defines the distance from the rectified image plane to the projection surface (typically a sphere or cylinder). By combining Equations (5) and (6), we can resolve m and X for a sphere projection.
In the next step, the collinearity equation based on the multi-camera rig is established. As shown in Figure 4b, the real 3D ray is through CuP , instead of SuP, which can be vectorized as (X − T r ). Translating the vector to the global camera coordinate system yields: Equation (8) would be the same as the sphere projection (4) when T r is small enough and vanishing. However, for the self-assembly panoramic camera whose T r is too large to ignore, the panoramic camera model is a better choice.

Line-Based Registration Method
To simplify the transformation in Equation (1), an auxiliary coordinate system is introduced, which is close to the camera-centered coordinate system but still has ∆M bias. Using M 1 and M 2 in Figure 2, LiDAR point P w is transformed in the world coordinate into the auxiliary coordinate P, as is defined in Equation (9), which is further discussed below:

Transformation Model
Suppose a known line segment AB is given in the world coordinate (actually the auxiliary coordinate in this paper). Its corresponding line in the panoramic image is detected as edge pixels. The projection ray through the perspective panoramic camera center C, an edge pixel p on panoramic image, intersects AB on point P, as illustrated in Figure 7. By letting the line be represented by the two endpoints X A and X B , an arbitrary point P is defined using Equation (10): with t a scale factor along the line. In the next step, the collinearity equation based on the multi-camera rig is established. As shown in Figure 4b, the real 3D ray is through CuP′, instead of SuP, which can be vectorized as ( − ).
Translating the vector to the global camera coordinate system yields: Equation (8) would be the same as the sphere projection (4) when Tr is small enough and vanishing. However, for the self-assembly panoramic camera whose Tr is too large to ignore, the panoramic camera model is a better choice.

Line-Based Registration Method
To simplify the transformation in Equation (1), an auxiliary coordinate system is introduced, which is close to the camera-centered coordinate system but still has ΔM bias. Using M1 and M2 in Figure 2, LiDAR point Pw is transformed in the world coordinate into the auxiliary coordinate P, as is defined in Equation (9), which is further discussed below:

Transformation Model
Suppose a known line segment AB is given in the world coordinate (actually the auxiliary coordinate in this paper). Its corresponding line in the panoramic image is detected as edge pixels. The projection ray through the perspective panoramic camera center C, an edge pixel p on panoramic image, intersects AB on point P, as illustrated in Figure 7. By letting the line be represented by the two endpoints XA and XB, an arbitrary point P is defined using Equation (10): with t a scale factor along the line. Substituting the object point P in Equation (10) to Equation (4) yields the line-based sphere camera model: Further, Equations (2) and (3) are combined and P is substituted in Equations (10)-(8), yielding the line-based panoramic camera model: where X' can be obtained from Equation (6). The scalar parameter and the line parameter t are Substituting the object point P in Equation (10) to Equation (4) yields the line-based sphere camera model: Further, Equations (2) and (3) are combined and P is substituted in Equations (10)- (8), yielding the line-based panoramic camera model: where X can be obtained from Equation (6). The scalar parameter λ and the line parameter t are unknown. What we try to resolve is the rotation matrix R and translation T.

Solution
To get the best alignment between the two datasets, the non-linear least squares method is used to solve the unknowns iteratively. Euclidean distance in the panoramic image coordinate system is used as the similarity metric. Denoting the right-hand term in Equation (11) as X Y Z T and combining Equations (2) and (4), resulting Equation (13): where (u, v) is the coordinates in the panoramic image coordinate system and (W,H) is the panoramic image size.
In addition to the six orientation and translation values (X, Y, Z, ϕ, ω, κ), unknown line parameter t must be estimated. Here the right terms are multivariate composite functions f u (R, T, t) and f v (R, T, t). Given one pixel on the corresponding lines, the two equations in Equation (11) are formed with one line-parameter t introduced. In order to solve the six unknowns, at least six points are needed. If one point per line is used, six pairs of corresponding lines are needed; and if two points per line are used, three pairs of corresponding lines are needed. More than two points on a line does not reduce the rank deficiency but only increases the redundancy.
The equations of i-th pair of corresponding lines can be termed by . Defining a parameter vector X = (X, Y, Z, ϕ, ω, κ, t 1 , · · · , t n ) T for n pairs of corresponding lines, Equation (13) is then expanded as Equation (14) after linearization by the Taylor series: The above equation is expressed in matrix form as Equation (15): where ∆ = (∆X T , ∆Y T , ∆Z T , ∆ϕ, ∆ω, ∆κ, ∆t 1 , · · · , ∆t n ) T and L = (L 1 ; · · · ; L n ) with The coefficient matrix A(A 1 ; · · · ; A n ). are defined as partial derivative of functions f u and f v : The results can be obtained by solving the normal equation ∆ = A T A −1 A T L. The unknowns X are updated through X ← X + ∆ iteratively until the elements of ∆ are less than a given threshold. In order to assess the accuracy of the results, the standard deviation is calculated by Equation (17): Here r is the number of redundant observations, r = (2 × n) − (n + 6). n is the number of pixels involved in the transformation, and usually n = 2m with m pairs of corresponding lines. To handle the mismatch between the LiDAR cloud lines and image lines as well as the occasional large biases in GPS/IMU records, a RANSAC paradigm [29] is applied in iteration to remove the outliers in the corresponding line segments from the LiDAR and the camera.

Line Feature Extraction from LiDAR
The insufficient density of LiDAR points in a low-cost configuration usually makes 3D line fitting a challenge. The previous works about line extraction tend to use the intersection of two neighboring unparalleled planar patches. However, the methods cannot work well in our case. There are only a few intersections of planar patches in the dataset both due to the point cloud density and flat building facades. Hence we fit linear features directly from cloud points. There are three types of objects containing abundant linear features in the common street-view scenes: buildings, pole-like objects and curbs. In this section, we introduce the methods to fit 3D straight lines from points belonging to the three objects. It is noted that the line-fitting is based on the well classified 3D point cloud achieved by the existing algorithms or software.

Buildings
Buildings provide the most reliable straight line features in street view. Given a mobile LiDAR point cloud dataset P{p i |p i (x i , y i , z i )} of buildings, a group of 3D line segments L l j l j x 0j , y 0j , z 0j ; x 1j , y 1j , z 1j can be obtained. The detection procedure consists of three steps: (1) apply a region growing segmentation [44] on P and get a set of planar segments S{s k |s k (i k1 , i k2 , · · · , i kn )}; (2) project the points onto the 3D plane model of segments s k (Figure 8a

Line Feature Extraction from LiDAR
The insufficient density of LiDAR points in a low-cost configuration usually makes 3D line fitting a challenge. The previous works about line extraction tend to use the intersection of two neighboring unparalleled planar patches. However, the methods cannot work well in our case. There are only a few intersections of planar patches in the dataset both due to the point cloud density and flat building facades. Hence we fit linear features directly from cloud points. There are three types of objects containing abundant linear features in the common street-view scenes: buildings, pole-like objects and curbs. In this section, we introduce the methods to fit 3D straight lines from points belonging to the three objects. It is noted that the line-fitting is based on the well classified 3D point cloud achieved by the existing algorithms or software.

Street Light Poles
The pole-like segments are labeled and divided into separate objects by spatial connectivity and presented by one or two arrays of points. A percentile-based pole recognition algorithm [46] is adopted to extract the pole objects, which could exclude disrupting structures, such as a flowerbed at the bottom of a light pole and non-pole elements such as lamps. The main steps of the algorithm are as follows: (1) the segment is first sliced into subparts, for which 2D enclosing rectangles and centroids are derived; (2) the deviation of the centroids between neighboring subparts is checked; (3) the diagonal length of the rectangle is checked; (4) the neighboring subparts with the maximum length are kept. The final fitting line segment is defined by the 2D centroids and the minimum and maximum Z values. The fitting results are shown in Figure 9a. In order to overcome the weakness of the traditional least-square regression method [45], certain constraints are introduced: (1) lines must be through the outermost point, instead of the centroid; (2) only the lines close to being vertical or horizontal are considered; and (3) only the lines having sufficient points are considered (Figure 8d). Compared to the line segments detected by [33] (Figure 8c), the fitting lines with constraints are more reliable.

Street Light Poles
The pole-like segments are labeled and divided into separate objects by spatial connectivity and presented by one or two arrays of points. A percentile-based pole recognition algorithm [46] is adopted to extract the pole objects, which could exclude disrupting structures, such as a flowerbed at the bottom of a light pole and non-pole elements such as lamps. The main steps of the algorithm are as follows: (1) the segment is first sliced into subparts, for which 2D enclosing rectangles and centroids are derived; (2) the deviation of the centroids between neighboring subparts is checked; (3) the diagonal length of the rectangle is checked; (4) the neighboring subparts with the maximum length are kept. The final fitting line segment is defined by the 2D centroids and the minimum and maximum Z values. The fitting results are shown in Figure 9a.

Curbs
Curbs are usually located at a height of 10-20 cm above the road surface and are designed to separate the roads from the sidewalks. The density of the points on the ground is relatively high, and the points on curbs present as narrow stripes which are vertical to the road surface [28]. A curb-line can be approximated as the intersection of the vertical curb with the ground surface. In this paper, the curb-lines in the following steps: (1) the points assigned as curbs are fitted into a plane parallel to the Z-axis under a RANSAC method with direction constraint and the noises are filtered as outliers; (2) the points are fitted into the 2D line segment in the OXY plane, (3) the height of the ground is used as the Z value of the 2D line segment. The fitting results are shown in Figure 9b.

Datasets
The test data were collected on Hankou North Street in the northern part of Wuhan City and included buildings, trees, poles, streets, and moving cars. Figure 10a shows the test area in Google Earth, and Figure 10b shows the 3D point cloud of the test area containing about 1.2 million points, which were previously classified and rendered by classification. (a)

Curbs
Curbs are usually located at a height of 10-20 cm above the road surface and are designed to separate the roads from the sidewalks. The density of the points on the ground is relatively high, and the points on curbs present as narrow stripes which are vertical to the road surface [28]. A curb-line can be approximated as the intersection of the vertical curb with the ground surface. In this paper, the curb-lines in the following steps: (1) the points assigned as curbs are fitted into a plane parallel to the Z-axis under a RANSAC method with direction constraint and the noises are filtered as outliers; (2) the points are fitted into the 2D line segment in the OXY plane, (3) the height of the ground is used as the Z value of the 2D line segment. The fitting results are shown in Figure 9b.

Datasets
The test data were collected on Hankou North Street in the northern part of Wuhan City and included buildings, trees, poles, streets, and moving cars. Figure 10a shows the test area in Google Earth, and Figure 10b shows the 3D point cloud of the test area containing about 1.2 million points, which were previously classified and rendered by classification.

Datasets
The test data were collected on Hankou North Street in the northern part of Wuhan City and included buildings, trees, poles, streets, and moving cars. Figure 10a shows the test area in Google Earth, and Figure 10b shows the 3D point cloud of the test area containing about 1.2 million points, which were previously classified and rendered by classification. In both subfigures, the red dots are the driving path, and each of the dots is the location where the panoramic camera exposed at an approximate spacing of 7.5 m. The GPS observations were postprocessed with RTK [47] technology and can reach an accuracy of up to 0.1 m.
We first extracted a number of three typical line features from the LiDAR dataset. Second, we projected the lines to the rectified mono-images to obtain the corresponding 2D lines, followed by a manual check for eliminating possible one-to-many uncertainty. Then, the proposed registration approach based on the panoramic camera model was applied. A linear feature from LiDAR was defined by two 3D endpoints; and a linear feature from an image was defined as a sequence of pixels in which only two pixels were used in the transformation. Finally, the registration results were assessed based on 2D and 3D visual comparison before and after registration, quantitative evaluation of check points, and statistical evaluation of edge pixels and 3D boundary points.
As mentioned in Section 2.1, the MMS recorded the POS data of the vehicle when it captured an image, while the exterior orientation parameters (EOP) of the camera relative to the vehicle platform coordinate system were acquired in advance through system calibration. The EOP defined the position and rotation of the camera at the instant of exposure with six parameters: three Euclidean coordinates (X, Y, Z) of the projection center and three angles of rotation ( , , ). Table 1 shows an example of the POS data and the EOP, which correspond to M1(R1, T1) and M2(R2, T2), respectively, in Equation (9).  In both subfigures, the red dots are the driving path, and each of the dots is the location where the panoramic camera exposed at an approximate spacing of 7.5 m. The GPS observations were post-processed with RTK [47] technology and can reach an accuracy of up to 0.1 m.
We first extracted a number of three typical line features from the LiDAR dataset. Second, we projected the lines to the rectified mono-images to obtain the corresponding 2D lines, followed by a manual check for eliminating possible one-to-many uncertainty. Then, the proposed registration approach based on the panoramic camera model was applied. A linear feature from LiDAR was defined by two 3D endpoints; and a linear feature from an image was defined as a sequence of pixels in which only two pixels were used in the transformation. Finally, the registration results were assessed based on 2D and 3D visual comparison before and after registration, quantitative evaluation of check points, and statistical evaluation of edge pixels and 3D boundary points.
As mentioned in Section 2.1, the MMS recorded the POS data of the vehicle when it captured an image, while the exterior orientation parameters (EOP) of the camera relative to the vehicle platform coordinate system were acquired in advance through system calibration. The EOP defined the position and rotation of the camera at the instant of exposure with six parameters: three Euclidean coordinates (X, Y, Z) of the projection center and three angles of rotation (ϕ, ω, κ). Table 1 shows an example of the POS data and the EOP, which correspond to M 1 (R 1 , T 1 ) and M 2 (R 2 , T 2 ), respectively, in Equation (9).  Table 2 shows the known parameters of the six cameras in the panoramic camera model. R x , R y , R z are the rotation angles about the X, Y and Z axes, and T x , T y , T z are the translation along the X, Y and Z axes. x 0 , y 0 indicate the pixel location of the camera center, and f is the focal length. These parameters and definitions of the coordinate systems are discussed in detail in Section 2. These parameters are used in the line-based panoramic camera model (12). Table 2. Parameters of mono-cameras in the panoramic camera model (image size is 1616 × 1232 in pixels).

Registration Results
This section analyzes the registration results based on the panoramic camera model in the following steps: registration, visual inspection, and quantitative and statistical evaluation. For comparison purposes, the results from the spherical camera model are presented as well. Table 3 lists the registration results based on the spherical and panoramic camera models. Here the deltas are the corrections after registration, which are the correction term in Equation (15). The root mean square error (RMSE) is the assessment of registration accuracy, which is defined in Equation (17). Both RMSEs were below five pixels. Given an object point 20 m away from the camera center, the error in the object space was about 6 cm.  To quantitatively evaluate the registration results based on the panoramic camera model, we manually selected check points both in the LiDAR points and the images according to the following rules: (1) the check points must be correspondent and recognizable in both datasets; (2) the check points should be selected from stationary objects with sufficiently dense LiDAR points; and (3) the check points should be evenly distributed horizontally. As a result, 20 check points were selected (see Figure 12) in the 3D LiDAR point cloud and the panoramic image, whereas 28 corresponding points were selected on the rectified mono-camera images. Please note that there are more check points for images than for LiDAR points because the check points for the panoramic model in the overlapping areas appear twice in adjacent cameras. To quantitatively evaluate the registration results based on the panoramic camera model, we manually selected check points both in the LiDAR points and the images according to the following rules: (1) the check points must be correspondent and recognizable in both datasets; (2) the check points should be selected from stationary objects with sufficiently dense LiDAR points; and (3) the check points should be evenly distributed horizontally. As a result, 20 check points were selected (see Figure 12) in the 3D LiDAR point cloud and the panoramic image, whereas 28 corresponding points were selected on the rectified mono-camera images. Please note that there are more check points for images than for LiDAR points because the check points for the panoramic model in the overlapping areas appear twice in adjacent cameras. rules: (1) the check points must be correspondent and recognizable in both datasets; (2) the check points should be selected from stationary objects with sufficiently dense LiDAR points; and (3) the check points should be evenly distributed horizontally. As a result, 20 check points were selected (see Figure 12) in the 3D LiDAR point cloud and the panoramic image, whereas 28 corresponding points were selected on the rectified mono-camera images. Please note that there are more check points for images than for LiDAR points because the check points for the panoramic model in the overlapping areas appear twice in adjacent cameras. We projected all the check points to images to determine their 2D coordinates before and after registration separately. Then, we calculated the Euclidean distances between the projected 2D points and the image check points as residuals. According to Figure 13, both the spherical and panoramic camera models reduced the residuals significantly, with the latter showing a slight advantage. The average residuals decreased from 12.0 to 2.9 pixels with the panoramic camera model (Figure 13b), and from 20.5 to 6.5 pixels with the spherical camera model (Figure 13a) after registration. The residuals on most check points decreased substantially after registration, and a few of them showed minimal change. The latter occurred for the checks points whose initial residuals were small enough before registration, such as 1, 3, and 20. We projected all the check points to images to determine their 2D coordinates before and after registration separately. Then, we calculated the Euclidean distances between the projected 2D points and the image check points as residuals. According to Figure 13, both the spherical and panoramic camera models reduced the residuals significantly, with the latter showing a slight advantage. The average residuals decreased from 12.0 to 2.9 pixels with the panoramic camera model (Figure 13b), and from 20.5 to 6.5 pixels with the spherical camera model (Figure 13a) after registration. The residuals on most check points decreased substantially after registration, and a few of them showed minimal change. The latter occurred for the checks points whose initial residuals were small enough before registration, such as 1, 3, and 20. In order to further evaluate the overall effect of registration, we also calculated the statistical value "overlap rate" of the linear features from the two datasets. For the panoramic image, we adopted the EDISON edge detector [48] to extract the edge pixels, as shown in Figure 14a, in which 8 × windows and 20 pixels were used as the min length of the lines to remove noise. For the LiDAR points, we extracted the boundary points using 30 k-nearest neighborhoods, and the results are shown in Figure 14b. From the figures, it can be seen that most of the geometric linear features in the two datasets corresponded. At the same time, some disturbing elements were present, such as the edges due to the color difference in the image and the points from missing data in the LiDAR points. In order to further evaluate the overall effect of registration, we also calculated the statistical value "overlap rate" of the linear features from the two datasets. For the panoramic image, we adopted the EDISON edge detector [48] to extract the edge pixels, as shown in Figure 14a, in which 8 × windows and 20 pixels were used as the min length of the lines to remove noise. For the LiDAR points, we extracted the boundary points using 30 k-nearest neighborhoods, and the results are shown in Figure 14b. From the figures, it can be seen that most of the geometric linear features in the two datasets corresponded. At the same time, some disturbing elements were present, such as the edges due to the color difference in the image and the points from missing data in the LiDAR points.  Figure 15 illustrates the "overlap rate" of the linear features from the two datasets. First, we projected the 3D boundary points onto an image and determined the binary image (see Figure 15b). Second, we performed the Overlap and Union operation on the two binary images (see Figure 15a,b) to determine the overlap binary image (see Figure 15c) and the union binary image (see Figure 15d). We then counted the number of non-zero pixels in both binary images: the number in the overlap binary image is no, and the number in the union binary image is nu. Finally, we defined the overlap rate as follows: If the alignment of the image and the LiDAR point cloud improves, there will be more overlapping linear features, which means will increase while will decrease and the overlap rate will increase after efficient registration. The results in Table 4 show that all of the overlap rates slightly increased up to 2% after registration. Among the five lenses, Len ID 1 showed the most significant improvement, mainly because part of the image captured by Len ID 1 was a facade containing many linear elements (see Figure 5).  Figure 15 illustrates the "overlap rate" of the linear features from the two datasets. First, we projected the 3D boundary points onto an image and determined the binary image (see Figure 15b). Second, we performed the Overlap and Union operation on the two binary images (see Figure 15a,b) to determine the overlap binary image (see Figure 15c) and the union binary image (see Figure 15d). We then counted the number of non-zero pixels in both binary images: the number in the overlap binary image is n o , and the number in the union binary image is n u . Finally, we defined the overlap rate as follows: If the alignment of the image and the LiDAR point cloud improves, there will be more overlapping linear features, which means n o will increase while n u will decrease and the overlap rate will increase after efficient registration. The results in Table 4 show that all of the overlap rates slightly increased up to 2% after registration. Among the five lenses, Len ID 1 showed the most significant improvement, mainly because part of the image captured by Len ID 1 was a facade containing many linear elements (see Figure 5).

Conclusions
This paper proposed a line-based registration approach for panoramic images and LiDAR point clouds collected by a MMS. We first established the transformation model between the primitives from the two datasets in the camera-centered coordinate system. Then, we extracted the primitives (three typical linear features) in street view from LiDAR automatically and from panoramic images through a semi-automation process. Using the extracted features, we resolved the relative orientations and translations between the camera and LiDAR.
Compared with other related works, the main contribution of this study is that it focused on the registration between LiDAR and the panoramic camera, which is widely used in a MMS instead of a conventional frame camera. Two types of camera models (spherical and panoramic) were utilized in our registration. The experimental results show that both models were able to remove obvious misalignment between the LiDAR point cloud and the panoramic image. However, the panoramic model achieved better registration accuracy. It is suggested that a suitable camera model may need to be chosen for certain data fusion tasks. For example, for rendering a LiDAR point cloud with acceptable misalignment, the spherical camera model would be adequate while the panoramic camera model may be necessary for high level fusion tasks such as facade modeling.
There are ways to further improve the registration accuracy and automation of the proposed method in future work. First, the errors from the LiDAR point cloud itself could not be overlooked. In our case, the LiDAR points were collected by three laser scanners, whose calibration errors also influenced the registration accuracy. Moreover, finding reliable correspondences between different datasets may need to use local statistical similarity such as mutual information. In addition to geometrical features, utilizing the physical attributes of LiDAR, such as intensity, is also a future research topic. (c) union of (a,b); (d) overlap of (a,b); (e) composition of (c) and highlighted (d).

Conclusions
This paper proposed a line-based registration approach for panoramic images and LiDAR point clouds collected by a MMS. We first established the transformation model between the primitives from the two datasets in the camera-centered coordinate system. Then, we extracted the primitives (three typical linear features) in street view from LiDAR automatically and from panoramic images through a semi-automation process. Using the extracted features, we resolved the relative orientations and translations between the camera and LiDAR.
Compared with other related works, the main contribution of this study is that it focused on the registration between LiDAR and the panoramic camera, which is widely used in a MMS instead of a conventional frame camera. Two types of camera models (spherical and panoramic) were utilized in our registration. The experimental results show that both models were able to remove obvious misalignment between the LiDAR point cloud and the panoramic image. However, the panoramic model achieved better registration accuracy. It is suggested that a suitable camera model may need to be chosen for certain data fusion tasks. For example, for rendering a LiDAR point cloud with acceptable misalignment, the spherical camera model would be adequate while the panoramic camera model may be necessary for high level fusion tasks such as facade modeling.
There are ways to further improve the registration accuracy and automation of the proposed method in future work. First, the errors from the LiDAR point cloud itself could not be overlooked. In our case, the LiDAR points were collected by three laser scanners, whose calibration errors also influenced the registration accuracy. Moreover, finding reliable correspondences between different datasets may need to use local statistical similarity such as mutual information. In addition to geometrical features, utilizing the physical attributes of LiDAR, such as intensity, is also a future research topic.