Calibration of an Outdoor Distributed Camera Network with a 3D Point Cloud

Outdoor camera networks are becoming ubiquitous in critical urban areas of the largest cities around the world. Although current applications of camera networks are mostly tailored to video surveillance, recent research projects are exploiting their use to aid robotic systems in people-assisting tasks. Such systems require precise calibration of the internal and external parameters of the distributed camera network. Despite the fact that camera calibration has been an extensively studied topic, the development of practical methods for user-assisted calibration that minimize user intervention time and maximize precision still pose significant challenges. These camera systems have non-overlapping fields of view, are subject to environmental stress, and are likely to suffer frequent recalibration. In this paper, we propose the use of a 3D map covering the area to support the calibration process and develop an automated method that allows quick and precise calibration of a large camera network. We present two cases of study of the proposed calibration method: one is the calibration of the Barcelona Robot Lab camera network, which also includes direct mappings (homographies) between image coordinates and world points in the ground plane (walking areas) to support person and robot detection and localization algorithms. The second case consist of improving the GPS positioning of geo-tagged images taken with a mobile device in the Facultat de Matemàtiques i Estadística (FME) patio at the Universitat Politècnica de Catalunya (UPC).

Abstract: Outdoor camera networks are becoming ubiquitous in critical urban areas of the largest cities around the world. Although current applications of camera networks are mostly tailored to video surveillance, recent research projects are exploiting their use to aid robotic systems in people-assisting tasks. Such systems require precise calibration of the internal and external parameters of the distributed camera network. Despite the fact that camera calibration has been an extensively studied topic, the development of practical methods for user-assisted calibration that minimize user intervention time and maximize precision still pose significant challenges. These camera systems have non-overlapping fields of view, are subject to environmental stress, and are likely to suffer frequent recalibration. In this paper, we propose the use of a 3D map covering the area to support the calibration process and develop an automated method that allows quick and precise calibration of a large camera network. We present two cases of study of the proposed calibration method: one is the calibration of the Barcelona Robot Lab camera network, which also includes direct mappings (homographies) between image coordinates and world points in the ground plane (walking areas) to support person and robot detection and localization algorithms. The second case consist of improving the GPS positioning of geo-tagged images taken with a mobile device in the Facultat de Matemàtiques i Estadística (FME) patio at the Universitat Politècnica de Catalunya (UPC).
Keywords: camera network calibration

Introduction
The development of powerful laser sensors combined with simultaneous location and mapping (SLAM) methodologies [1] allows the possibility to have available high precision 3D maps registered over large areas [2]. These maps come usually in the form of large point clouds and are typically used to support robot navigation systems [3]. These point clouds are usually acquired with laser range finders, cover the complete area of the network and, in particular, contain the areas corresponding to the fields of view of the cameras. This paper exploits these technologies, proposing a methodology to calibrate an outdoor, non-overlapped, distributed camera network using such range data. See Figure 1. Traditional techniques for camera calibration require the use of non-planar [4] or planar [5] patterns, usually made of points, lines or checkerboards [6,7], conics [8] or, even, ARTagmarkers [9]. Unfortunately, for large outdoor camera networks, calibration patterns of reasonable sizes often project on images with very small resolution, mainly because cameras are usually located at a considerable height with respect to the floor, consequently making pattern segmentation difficult. In addition, a pattern-based independent calibration of each camera would require a secondary process to relate all camera coordinate systems to a global reference frame, but establishing this relation with small to null overlapping fields of view is nearly impossible. For planar scenarios, a direct linear transformation (DLT) [10] suffices to estimate image to plane homographies [11]. Unfortunately, the planar scenario assumption is too restrictive, especially in situations with nonparallel, locally planar surfaces, such as ramps and plazas, which often occur in real urban environments, as in our case.
An interesting technique to calibrate the camera network without the need of a pattern is with the aid of a bright moving spot [12]. The technique assumes overlapping fields of view to estimate the epipolar geometry to extract homographies, estimate depth and, finally, compute the overall calibration of the camera network. In our case, the cameras' fields of view seldom overlap, and the visibility of the bright spot does not always hold in sunlight. Another alternative is to place an LED light on a moving robot and to track it with a secondary robot equipped with a laser sensor, relating their position estimates to the camera network [13]. Yet another system that relies on tracking a moving object to estimate the extrinsic parameters is [14], which assumes a constant velocity model for the target. Tracking a moving target each time the system needs recalibration might be prohibitive. The estimation of the camera location purely by analyzing cast shadows is also mathematically possible, but with very low position accuracy in practice [15], and if one is interested only in the topology of the network configuration and not in a metric calibration, multi-target tracking of people could also be an alternative [16]. In contrast to these approaches, we opt for a system that does not rely explicitly on a moving pattern or shadow to calibrate the network and that produces an accurate metric calibration.
For camera network systems that incorporate camera orientation control (pan and tilt) and motorized zoom, it is possible to use such motion capabilities to first estimate the intrinsic parameters rotating and zooming, fitting parametric models to the optical flow, and then to estimate extrinsic parameters aligning landmarks to image features [17]. Unfortunately, in our case, the cameras are not active. Another option is to use stereo pairs instead of monocular cameras at each node in the network. In this way, local 3D reconstruction can be obtained directly within each node and registered globally using graph optimization techniques [18]. Overlapping fields of view are still necessary in that case. A third option to calibrate the camera network, albeit relative translation, is to use a vertical vanishing point and the knowledge of a line in a plane orthogonal to the vertical direction on each camera image [19]. A different, but related, problem is the relative positioning of one or more cameras with respect to a range sensor. To that end, calibration can be achieved using a checkerboard pattern, as in [20]. A related methodology to calibrate extrinsically an omnidirectional camera using point correspondences between a laser scan an the camera image is proposed in [21]. In contrast to our approach, the method assumes known intrinsic camera parameters. For a method to calibrate the laser intrinsic parameters instead, the reader is referred to [22].
We benefit from the availability of a dense point cloud acquired during a 3D laser-based SLAM session with our mapping mobile robot [3]. The set contains over eight million points and maps the environment with accuracies that vary from 5 cm to 20 cm, approximately. These data replace the need for a checkerboard pattern, a tracked beam, a robot or active capabilities of the cameras and are used as external information to calibrate the network.
Our work is largely related in spirit to the method described in [23], in which a set of images are registered into a urban point cloud. One major difference of the approach is on the assumptions made with respect to the characteristics of the scene during 3D feature extraction. In particular, the above-mentioned technique exploits the fact that buildings have strong parallel edges and that these cluster with similar orientation. On the contrary, we exploit the fact that in urban scenes, large planar regions also meet at long straight edges. In contrast with [24], in which edge parallelism is used to calibrate only the attitude and focal length of cameras for traffic monitoring, we use edge information to calibrate also the camera location.
The rest of the paper is organized as follows. We explain first how nominal calibration is executed and then how this calibration is refined, first by extracting 3D features from the point clouds and optimizing over the reprojection error of their matching to those found in images. When needed, a final refinement step is computed by means of the DLT-lines algorithm. Experiments on a pair of scenarios are presented to show the feasibility of the proposed solution. The paper is concluded with some remarks and future work.

Nominal Calibration
The proposed calibration procedure is illustrated in Figure 2. It consists of two main steps. In the first step, the internal camera parameters are initialized to the manufacturer specs (image pixel width and depth and focal length), and a nominal calibration of the camera external parameters is obtained by manually registering the point cloud to an aerial image of the experimental site with the aid of a graphical user interface, prompting the user to coarsely specify the camera location, orientation, height and field of view. These initial parameters allow the cropping of the full point cloud into smaller regions of interest compatible with the field of view of each camera. The user can then adjust the registration by manually modifying each of the parameters (see the video associated with [25], available in the IEEE Xplore digital library). Figure 2. Two-step calibration methodology. In the first step, a graphical user interface is used to assist in an initial manual registration of the point cloud. The second step refines this registration, matching 2D image lines with 3D edges in the point cloud. In the second step, an automatic refinement of the camera calibration parameters is obtained by matching 2D image lines to their corresponding 3D edges in the point cloud. To this end, the point cloud is segmented into a set of best fitting planes with large support using local variation [26], and straight edges are computed from the intersection of perpendicular planes in the set. The extracted 3D lines are associated with 2D image lines, and this information is fed to a non-linear optimization procedure that improves both intrinsic and extrinsic parameters. Finally, the homographies of the walking areas are computed for the planar regions in the range data. The final output of the whole calibration procedure consists in: (1) the extrinsic camera parameters, i.e., the position and orientation of each camera in the world frame; (2) the intrinsic camera parameters (focal distance, image center aspect ratio and distortion terms); and (3) the homographies of each walking area.
The first step of the calibration procedure needs to be performed only once, during the camera network installation or when the network topology changes, i.e., cameras are added/moved, and takes only a couple of minutes. The second step, which does not require user intervention, can be executed as frequently as needed to keep the system calibrated, despite small modifications in camera orientation due to weather conditions and maintenance operations. In the following paragraphs, we detail the nominal calibration.
For the nominal calibration, the user interacts with a graphical user interface to coarsely locate each camera with respect to the reference frame of the point cloud and the viewing direction in the ground plane. For an easier interaction, the point cloud appears registered with an aerial view of the environment. The registration of the point cloud with the aerial view is computed manually using the DLT [10]. The camera parameters are initialized with default intrinsic parameters.
We then compute initial values for the camera pose: the center of projection t is simply the user selected point p 1 , located at a user define height (i.e., 6 m in our application). See Figure 3. The two selected points p 1 and p 2 set the azimuth direction ψ. The elevation φ gives a user-defined inclination to the ground (17 • in the shown example), and the roll ρ is set to π to account for the proper axes changes. These parameters suffice to compute the initial rotation matrix R = R ρ R ψ R φ with: Assuming that the principal point (u 0 , v 0 ) is located at the image center, we can compute an initial estimate for the camera intrinsic parameters using as input the focal length f and the CCD pixel size in millimeters k u and k v , i.e., α v = f k v , α u = f k u , and: The initial estimate of the perspective projection matrix for each camera is: Once this initial estimate is obtained for a particular camera, the user can further adjust each parameter manually, whilst a projection of a cropped section of the point cloud that falls within the viewing frustum is visualized in the image. Note, however, that this initial estimate does not take into account radial distortion parameters. These are refined along with the rest of parameters in the second step of the method.

Calibration Refinement
To improve camera calibration from the nominal parameters, we propose an automatic method, where relevant 3D edges are extracted from the point cloud and matched to corresponding image lines. In practice, the method works well with about a half-dozen lines selected from each camera image.
The procedure uses the nominal calibration as an initial rough approximation and can be executed anytime to recover from small perturbations in camera orientation or any other parameter, due to weather (wind, rain, etc.) or maintenance operations (repair, cleaning).

3D Edge Computation
The computation of straight lines from the point cloud relies on identifying and intersecting planes. The method to segment planar regions is motivated by Felzenszwalb's algorithm to 2D image segmentation [27] and extended to deal with non-uniformly sampled 3D range data [26]. The algorithm sorts point to point distances for each point's k-nearest neighbors and then traverses the list of sorted distances in increasing order, growing planar patches by joining points that meet two matching criteria, i.e., distance constraints and orientation constraints. Thanks to the use of union by rank and path compression [28], the algorithm runs nearly in linear time with respect to the number of points in the point cloud. The ANN library proved an adequate tool for the computation of approximate nearest neighbors [29]. If computation time becomes an issue, libnabo could be used as an alternative [30].  In the segmentation algorithm, local surface normals n are computed for each point in the point cloud, fitting local planar patches. To account for global variation, planar patches are merged, growing a forest of trees based on curvature and mean distance. The curvature between two candidate regions is computed from the angle between their two normals, which must be below a user-selected threshold t c , Two segments passing the curvature criteria are merged if they also pass a distance constraint. That is, if the sum of distances between their centers along weighted orthogonal directions is below a user-selected threshold t d , with k i and k j the number of points each segment holds and c i and c j the segment centers. See Figure 5.
if LABEL(c i ) = LABEL(c j ) then 13: if | cos −1 (n T i n j )| < t c then if ORTHOGONAL(n i , n j ) then 24: E ← E ∪ PLANEINTERSECTION(n i , n j ) 25: end if 26: end for

Optimization
Straight image lines are extracted from the camera images using the method proposed in [31]. The line set is pruned to those lines larger than a predefined threshold.
3D model lines are projected onto the image plane using the projection matrix computed during the nominal calibration step. 2D-3D line association is attained by matching such projections to the closest 2D image line.
Once the 3D-2D association is established, nonlinear optimization is performed to improve the nominal calibration by minimizing the squared sum of the line endpoint projection errors: where u id (ϑ) is the distorted projection of the 3D endpoint p i , ϑ = (K, R, t, a 1 , a 2 ) are the set of parameters being optimized and u i is the image point. The optimization is solved using Levenberg-Marquardt nonlinear optimization. See Figure 6. In this step of the method, image distortion is modeled based on even powers of the radial distance in the image plane: where a j are the distortion parameters, r 2 = u in − u 0 2 is the radial distortion factorization, u 0 is the computed principal point, u in is the normalized (pinhole) image projection of point p i : and ∼ denotes equality up to a scale factor.

Initialization of the Optimization
The nominal calibration introduced in Section 2 is, in general, sufficient to initialize the calibration optimization formalized in Equation (5). However, one finds that while it is usually simple and very effective to observe precisely some parameters as the camera horizontal position in the coordinates of a laser range finder map, other parameters, such as camera height, 3D rotation, focal length or principal point, are more challenging. The possibility of automating the initialization of the optimization procedure is a convenient feature for calibrating cameras in a network.
The initialization of the calibration optimization process may be setup to find from zero till all of the calibration parameters. Zero means using all of the nominal calibration results for initializing the optimization. All means (re-)finding all initialization parameters from image and laser range finder data. In between are several cases of interest, many of which have solutions published. For example, if a camera has its intrinsic parameters calibrated before being mounted in place, then one just has to estimate the extrinsic parameters by solving the well-known Perspective-n-Point (PnP) problem [32]. In the following, we detail two cases. In the first case, we show how to estimate all of the parameters from 3D lines imaged by the camera to calibrate. In the second case, we consider that the camera position is precisely known and detail how to estimate the intrinsic parameters and 3D rotation.
As proposed in [33], the use of image lines instead of isolated points in the camera calibration process brings an advantage. Image processing can be used for fine tuning the location of the lines in the image and therefore automatically improving the calibration data input. In this section, DLT-Lines is presented as a method to initialize the optimization step, allowing one to estimate simultaneously the camera projection matrix and radial distortion, from the 3D point cloud and 2D lines.
Considering the shorthand notation for image points m i = [u T i 1] T and 3D points M i = [p T i 1] T the perspective camera model, Equation (7), becomes m i ∼ PM i .
The projection of a 3D line L i to the camera image plane can be represented by the cross product of two image points in projective coordinates: Any point m ki lying in the line l i implies that l T i m ki = 0. Hence, applying the multiplication of l T i to both sides of the perspective camera model, i.e., l T i m ki = l T i P M ki , leads to: where M ki is a 3D point in projective coordinates lying in L i . The properties of the Kronecker product [34] allow one to obtain a form factorizing the vectorized projection matrix: Considering N ≥ 12 pairs (M ki , l i ), one forms a matrix B, N × 12, by stacking the N matrices M T ki ⊗ l T i . An example of N = 12 arises when one observes six 2D lines imaging six 3D lines, L i (i = 1, ..., 6), each one represented by two end points, L i ↔ (M i1 , M i2 ). Alternatively, given a 3D line L i and its projection represented by the image line l i , any 3D point lying on the 3D line L i can be paired with the 2D line l i . On the other hand, any image line l i can be paired with any 3D point lying on L i , i.e., more than one image line can be paired with a 3D point.
The least squares solution, more precisely the minimizer of B vec(P) 2 subjected to vec(P) = 1, is the right singular vector corresponding to the least singular value of B.
Note that the perspective camera model, as presented in Equation (7), does not contain yet the radial distortion. To include radial distortion, we use Fitzgibbon's division model [35]. An undistorted , where λ represents the radial distortion parameter. The division model allows one to define a line l 12 as the cross product of two points: where s i is the norm of distorted image point i,
To solve Equation (13) instead of Equation (10), we still need to consider N ≥ 12 pairs (M ki ,l i ), where N = k max i max , and form now two N × 12 matrices, B 1 and B 2 , instead of just N, by stacking matrices B ki1 and B ki2 . Left-multiplying the stacked matrices by B T 1 results in a polynomial eigenvalue problem (PEP), which can be solved, for example, in MATLAB using the polyeig function. It gives, simultaneously, the projection matrix, in the form of vec(P), and the radial distortion parameter λ.
Having estimated the projection matrix, P, the camera intrinsic and extrinsic parameters can be obtained by QR-decomposition [10]. More precisely, given the sub-matrix P 3×3 containing the first three columns of P and S an anti-diagonal matrix: the QR-decomposition allows factorizing P 3×3 T S = QU, where Q is an orthogonal matrix and U is an upper triangular matrix. Then, the intrinsic parameters and the rotation matrices are computed as K = −SU T S and R = Q T S. Finally, the camera position is obtained with t = KP 4 , where P 4 is a 3 × 1 vector containing the fourth column of P. If the diagonal of K contains negative values, then it is corrected by post-multiplying by a diagonal matrix. In MATLAB/Octave D= diag(sign(diag(K))); K= K * D; R= D * R; t= D * t;.
In addition, since ±P are both solutions of Equation (10), the factorization of P may imply det(R) = −1. If det(R) = −1, then the factorization of P is repeated using −P.
To convert the obtained distortion parameter λ to the distortion parameters in Equation (6), we sample the region of interest and find a least squares solution for the best parameter fit in this region. Starting from a set of camera points evenly sampled at pixel granularity covering the image dimensions {u id }, we apply the Fitzgibbon distortion model to obtain an uneven set of undistorted pixel coordinates {u i }. We next solve the optimization problem: which is a linear least squares problem in the variables a j , where a closed form solution is available.
For small distortions, we empirically find that a 1 = λ and a 2 = 0 provide a good fit to initialize the main optimization algorithm.

Known Camera Location
In the case one knows accurately the camera location, e.g., the camera has been imaged by the 3D data acquisition system, then the number of degrees of freedom of the calibration problem is decreased. The DLT methodology presented in the previous section can be further simplified.
Subtracting the camera center to all points of the point cloud results in a coordinate system where the camera is at the origin, and thus, the projection matrix, P = K[R| t] is equivalent to a simple homography,P = KR. Considering image lines l i and 3D points, p ki = [x ki y ki z ki ] T , imaged as points of the lines, recalling Equation (9), one has: where W t C denotes the camera projection center in world coordinates. As such, one obtains linear constraints similar to the ones already derived for DLT-Lines: The length of vec(KR) is just nine, i.e., the knowledge of the camera location saves three variables to estimate, and thus, the estimation process is intrinsically simplified. Finally, the projection matrix, P, can be obtained decomposingP = KR and adding the camera location as P = [P |P W t C ].
The calibration problem has been reduced to the estimation of a homography, represented by vec(KR) and, therefore, not including radial distortion. A similar form based on Equation (17) can be obtained for the radial distortion case represented in Equation (12): wherep k12 = (p k12 − W t C ). Equation (18) can be re-written in the form of Equation (13), which can be used to estimate the camera projection matrix P and radial distortion parameter λ, as shown before.

Experiments
To show the cogency of the proposed calibration method, we show calibration results of two different outdoor scenarios. In both cases, the range data was gathered using a custom-built 3D laser with a Hokuyo UTM-30LX scanner mounted in a slip-ring. Each scan has 194,580 points with a resolution of 0.5 • azimuth and 0.25 • elevation. The first dataset was acquired in the Barcelona Robot Lab (BRL), located at the Campus Nord of the Universitat Politècnica de Catalunya (UPC). The BRL is a 10,000 m 2 facility for research in outdoor service mobile robotics. It is equipped with a camera network, from which we only use 12 of the 21 cameras. They are shown in Figure 7. For this dataset, our 3D laser scanner was mounted on Helena, a Pioneer 3AT mobile robot, acquiring a total of 400 scans; however, only 30 of them were necessary to cover the area of the selected cameras. The complete dataset is available online [36]. The second dataset was gathered in the inner courtyard of the Facultat de Matemàtiques i Estadística (FME), located at the Campus Sud of the UPC. For this dataset, the range sensor was mounted atop our robot Teo, a rough outdoor terrain Segway RPM400 mobile robot. In this case, only 39 scans were collected. Figure 8 shows the point cloud registered onto an aerial view of the scene and the segmented planes. A mobile phone camera was used to acquire geo-tagged images from different position in this scenario.
In both cases, the point clouds generated from the aggregation of multiple scans are preprocessed to remove outliers, to smooth out the planar regions and to provide a uniform point distribution through subsampling. The details of the filtering scheme used follow [3]. The parameters used to segment the range data in both scenarios were n = 25 neighbors to fit planar patches, a distance threshold of t d = 0.5 and a curvature threshold t c = 0.8. Furthermore, we only consider lines intersecting orthogonal planes with a deviation of ±3 • from orthogonality. For those cases when there were less than six independent lines detected for the calibration of a camera, each detected line was broken into three segments, and all of these were used for the optimization step. This allowed one to have a sufficiently large number of lines to find a least squares solution to Equations (17) and (18). In most of those cases, however, the DLT-lines algorithm did not contribute to the improvement of the solution, and the first optimization sufficed to find acceptable results.
For the BRL dataset, the initial elevation angle was set to 17 • . We initialized on this value, because most of the cameras are located about 6 m above the ground, and objects in the images for which lines can be detected reliably are closer than 20 m. The horizontal field of view is initialized to 40 • , which corresponds to an 8-mm lens in a 0.25-in CCD. The final calibration of the internal camera parameters for the BRL set are given in Table 1, along with the mean reprojection error and standard deviation. Note that a comparison of these estimates to those obtained with a checkerboard is unrealistic due to the actual positioning of the cameras. An unfeasibly large checkerboard would be needed and moved along the whole camera workspace to achieve significant results. Table 2 gives the obtained camera locations for this experiment, together with the camera ground truth locations manually extracted from the 3D point cloud. These values should only be used as a reference, since small variations of focal length might have incidence in the final positioning of the camera along the principal axis, without detriment to the camera reprojection for a limited range of depth values. The elapsed computation time for the calibration of each camera is also reported in the table. Figure 9 shows the reprojection of each matched line onto the corresponding camera images after the optimization is computed.
The estimated camera poses are plotted in Figure 10. Camera viewpoints are represented with triangular pyramids that suggest the viewing direction. To empirically judge the quality of the calibration results, we can also project all points from the map that fall in each camera viewing frustum onto the image plane. This is shown in Figure 11 for the BRL.
In the case of the FME scenario, all images were taken with a mobile phone. GPS readings on the phone were used as initial position estimates. The local world model was considered planar, so that a homography can be used to translate from GPS coordinates to the metric representation used in the point cloud. Table 3 shows the results of the calibration of internal camera parameters. Since all images were computed using the same camera, the mean values obtained for the internal parameters can be used as a reference. These mean values are shown in the first line in the table.  Table 4 reports the different camera positions estimated with our algorithm and contrasted to the GPS readings on the phone. GPS coordinates are transformed to metric coordinates with the WGS84 standard and affine transformed with the DLT algorithm to align them with the FME building. Height values are not reported, as their readings from the phone GPS unit are unreliable. This comparison is shown schematically in Figure 12. Figure 12a shows the estimated camera viewpoints, whereas Figure 12b shows a comparison between GPS readings (blue squares) and the camera poses computed with our method (red dots). Figure 13 shows results of the optimization results for the FME dataset. In blue projected 3D lines and in red selected image lines. The heavy presence of unstructured data made it difficult to find a large number of support lines for calibration in this scenario, suggesting the need for calibration also with point/appearance features, together with lines. We leave this hybrid scheme as an open alternative for further development of the method.

Computing Homographies
To measure events occurring on the scene, such as path lengths or areas of crowdedness, it would be necessary to obtain direct mappings from images to planar regions in the floor. The idea is to have a practical way to transfer 2D images to the world coordinates of the targets detected. To this end, we compute the homographies of user-selected planes with the end of a graphical user interface. The user selects polygonal regions in the images, and the 3D points that project inside these polygons are used to approximate the 3D planes. The algorithm to compute the homographies is the standard direct linear transform [10]. Figure 14 shows the result of this computation for the two scenarios. Camera images, each one of a different color mask, are projected onto their corresponding planar regions in the map.

Conclusions
In this paper, we have proposed a methodology to calibrate outdoor distributed camera networks having small or inexistent overlapping fields of view between the cameras. The methodology is based on the matching of line image features with 3D lines computed from dense 3D point clouds of the scene.
In the first stage, the user obtains the nominal calibration by using default intrinsic parameters for the cameras and indicating their positions and orientations on an aerial view aligned with the range map. Next, the calibration of each camera is improved by an automatic optimization procedure detecting lines in the 3D map and matching them with image lines. The lines are detected in the point cloud by automatically segmenting out planar regions and finding such plane intersections. The optimization procedure then minimizes the distance between points in the lines found in the map and their corresponding points in the image lines. The method has been used to calibrate the Barcelona Robot Lab, a 10,000 m 2 area for mobile robot experimentation, and for camera localization at the FME patio, both located at the UPC campus in Barcelona.
Future work will include an analysis of uncertainty for the external calibration using the DLT-lines algorithm and the combination of feature points together with lines for scenes with a limited structure.

Author Contributions
Agustín Ortega developed the nominal and refinement calibrations routines and drafted an initial version of the paper. Ernesto Teniente contributed with the 3D point cloud of the BRL site, and also helped with data acquisition for the FME experiment. Manuel Silva and Ricardo Ferreira developed the DLT-lines algorithm. Alexandre Bernardino, José Gaspar and Juan Andrade-Cetto supervised the work and finalized the manuscript. All of the authors read and approved the final manuscript.