Visual EKF-SLAM from Heterogeneous Landmarks †

Many applications require the localization of a moving object, e.g., a robot, using sensory data acquired from embedded devices. Simultaneous localization and mapping (SLAM) from vision performs both the spatial and temporal fusion of these data on a map as a camera moves in an unknown environment. Such a SLAM process executes two interleaved functions: the front-end detects and tracks features from images, while the back-end interprets features as landmark observations and estimates both the landmark and robot positions with respect to a selected reference frame. This paper describes a complete visual SLAM solution combining point and line landmarks on a single map. The proposed method has an impact on both the back-end and the front-end. The contributions comprise heterogeneous landmark-based EKF-SLAM (the management of a map composed of both point and line landmarks); from this perspective, a comparison between landmark parametrizations and an evaluation of how heterogeneity improves the accuracy of camera localization; the development of a front-end active-search process for linear landmarks integrated into SLAM; and the experimental methodology.


Introduction
Simultaneous localization and mapping (SLAM) is an essential functionality required on a moving object for many applications where the localization or the motion estimation of this object must be determined from sensory data acquired by embedded sensors. The object is typically a robot or a vehicle, the position of which is required to deal with robust navigation in a cluttered environment. A SLAM module could also be required on smart tools (phones, glasses) to offer new services, e.g., augmented reality [1][2][3].
The robot or smart tool could be equipped with a global navigation satellite system (GNSS) receiver for outdoor applications, to directly obtain a position with respect to the Earth reference frame [4]; at present, indoor localization with respect to a building reference frame can also be provided using ultra-wide band (UWB) [5], WiFi [6] or RF devices [7], on the condition that a hotspot or antenna network has been previously installed and calibrated. However, such direct localization is not always available (e.g., occlusions, bad propagation, multipath); so generally, these measurements are combined, using loose or tight fusion strategies, with motion estimates provided by an inertial measurement unit (IMU) integrating successive accelerometer and gyro data [8][9][10]. Nevertheless, even GPS-IMU fusion could fail or be too inaccurate. Depending on the context, a priori knowledge could be exploited; a map-matching function can be sufficient, as in the GPS-based navigation systems available on commercial vehicles.
Considering mobile robots or the emerging autonomous vehicles, it is necessary to also make use of data acquired on the environment with embedded exteroceptive sensors, e.g., laser range finders [11], 3D sensors (ToF cameras [12], Kinect [13]) or vision with many possible modalities (mono, stereo, omni). Here, only visual SLAM is considered, due to the fact that it could be integrated both in low-cost unmanned ground and aerial vehicles and on smart tools equipped with cameras. Many visual SLAM methods have been proposed during the last decade [3,14].
A SLAM method combines two interleaved functionalities shown in Figure 1: the front-end detects and tracks features from images acquired from the moving robot, while the back-end, interpreting these feature and landmark observations, estimates both the landmark and robot positions with respect to the selected reference frame.
(Figure 1 shows the two interleaved functions: landmark tracking, landmark initialization and map update.) The back-end can be based either on estimation (Kalman [15], information [16], particle filters [17]) or optimization (bundle adjustment [18]) frameworks. The most classic landmarks are 3D points, detected as interest points (SIFT [19], SURF [20], FAST [21]), matched by using their descriptors (binary robust independent elementary features (BRIEF) [22]), tracked by the Kanade-Lucas-Tomasi (KLT) feature tracker [23], or tracked by an active-search strategy [24]. Generally, the set of 3D points extracted from an image does not convey any semantic information, unlike 3D lines, which generally correspond to sharp 3D edges in the environment. This is the reason why segment-based SLAM, from either an estimation [25] or optimization [26] back-end, has been proposed. The main challenge of these methods concerns the front-end, i.e., the robustness of line detection and tracking in successive images.
The initialization of such landmarks with their minimal Euclidean parameters requires more than one observation. One way to solve this problem is delayed initialization [3,27], in which a landmark is added to the map only once it is known in Euclidean space. This precludes the use of landmarks that are very far from the robot. An alternative is to add landmarks to the map as soon as they are observed (i.e., undelayed initialization); this has been proposed for point [28,29] and line [25] landmarks. The pros and cons of several representations for 3D points and 3D lines have been analyzed in [30].
This article is devoted to the analysis of a visual SLAM solution using a heterogeneous map, a more complete approach in which both points and lines are included from features extracted in images acquired by a camera moving in the environment, with undelayed initialization. The contributions of the proposed method therefore comprise heterogeneous landmark-based SLAM (the management of a map composed of heterogeneous landmarks); from this perspective, a comparison between landmark parametrizations; the development of a front-end active-search process for linear landmarks integrated into SLAM; and the experimental methodology.
EKF-SLAM with heterogeneous landmarks
Michel Devy, Jorge Othón Esparza-Jiménez

Undelayed Landmark Initialization
An undelayed landmark initialization (ULI) has been introduced for different point and line parametrizations; it consists in substituting the unmeasured degree of freedom by a Gaussian prior that covers it but remains manageable by the filter. For different landmark types such as points and lines, uncertainty has distinct implications. For points, the uncertainty is in distance, and it covers the whole visual ray up to infinity. Infinite straight lines carry uncertainty in two degrees of freedom, corresponding to a distance that must be covered up to infinity and to all possible orientations. The next sections cover the point and line parametrizations used for ULI, and then detail the initialization process itself.
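As a rough numeric illustration of how this unbounded uncertainty can be handled, the sketch below (hypothetical helper name, not the authors' code) places a single Gaussian prior on the inverse-distance ρ so that, within a chosen sigma bound, it covers the whole visual ray from a minimum scene distance d_min out to infinity:

```python
import numpy as np

def inverse_distance_prior(d_min=0.5, n_sigma=2.0):
    """Gaussian prior over inverse-distance rho in (0, 1/d_min].

    The mean is placed at the center of the admissible interval, and the
    standard deviation is chosen so that n_sigma sigmas reach both ends:
    rho = 0 corresponds to a point at infinity, rho = 1/d_min to the
    closest admissible point.  One Gaussian thus covers the whole ray.
    """
    rho_max = 1.0 / d_min
    rho_mean = rho_max / 2.0
    rho_sigma = rho_max / (2.0 * n_sigma)
    return rho_mean, rho_sigma

rho, sigma = inverse_distance_prior(d_min=0.5)
# rho - 2*sigma = 0 (infinity) and rho + 2*sigma = 1/d_min (closest point)
```

The exact placement of the mean and sigma is a design choice; the point of the sketch is only that a bounded Gaussian in ρ-space represents an unbounded depth interval.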

3D Point Parametrizations
This section explains several point parametrizations. Each description covers the parametrization itself, camera projection, coordinate transformation and back-projection.
R and T are the rotation matrix and translation vector that define the camera C. Underlined vectors like u represent homogeneous coordinates.
Homogeneous Point
Homogeneous points are 4-vectors composed of the 3D vector m and the scalar ρ, as introduced in [32].
The vector m gives the direction from the origin O to the point p, while ρ serves as a scale factor for providing the magnitude for each coordinate of the point.
The conversion from homogeneous to Euclidean coordinates is given by p = m/ρ. Depending on the characteristics of the parameters m and ρ, there are three different canonical representations for a homogeneous point: the original Euclidean point corresponds to ρ = 1, inverse-depth to m_z = 1 and inverse-distance to ‖m‖ = 1.
In the camera frame, m is the director vector of the optical ray, and ρ depends linearly on the inverse of the distance d from the optical center to the point: ρ = ‖m‖/d. The unbounded distance of a point along the optical ray, from zero to infinity, can then be expressed in the bounded interval ρ ∈ (0, ‖m‖/d_min] in parameter space.
The frame transformation of a homogeneous point is performed as p̄^C = H·p̄, where the super-index C indicates the frame to which the point is referred and the matrix H specifies the frame to which the point is transformed. By expressing a homogeneous point in the camera frame, the projected image point is u = K·m^C; in this case, ρ^C is not measurable. Back-projection is then m^C = K^{-1}·u, with ρ^C supplied as a prior. In the complete homogeneous point parametrization, ρ^C must be given as a prior and represents the inverse-distance from the origin of coordinates, that is, the scalar value that makes ‖m‖ = 1.
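A minimal numeric sketch of these homogeneous point operations (conversion to Euclidean, projection and back-projection); the helper names and the sample intrinsic matrix K are ours, for illustration only:

```python
import numpy as np

def hp_to_euclidean(m, rho):
    """Homogeneous point (m, rho) -> Euclidean point p = m / rho."""
    return m / rho

def hp_transform(H, m, rho):
    """Frame transformation of the 4-vector (m, rho) by a 4x4 matrix H."""
    p_bar = H @ np.append(m, rho)
    return p_bar[:3], p_bar[3]

def hp_project(K, m_C):
    """Pinhole projection u = K m^C; rho^C is not measurable."""
    u = K @ m_C
    return u[:2] / u[2]

def hp_back_project(K, u_pix, rho_prior):
    """Back-projection: ray direction from the pixel, rho from a prior."""
    m_C = np.linalg.inv(K) @ np.array([u_pix[0], u_pix[1], 1.0])
    return m_C, rho_prior

# Illustrative intrinsics (focal 500 px, principal point at 320, 240).
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
p = hp_to_euclidean(np.array([2.0, 0.0, 4.0]), 0.5)   # -> [4, 0, 8]
u = hp_project(K, np.array([2.0, 0.0, 4.0]))          # -> [570, 240]
```

Note that the projection discards ρ^C entirely, which is exactly why a prior must be supplied at initialization.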

Anchored Homogeneous Point
Linearity is expected to improve with the addition of an anchor that serves as a reference to the optical center at the initialization time of the landmark. The landmark is then composed of seven elements: the Cartesian coordinates of the anchor, the point expressed with respect to the anchor and an inverse-distance scalar.
The conversion from the anchored homogeneous point to Euclidean coordinates is p = p_0 + m/ρ. The projection and frame transformation process is u = K·R^T·(m + (p_0 − T)·ρ). The anchor is chosen to be the position of the optical center at the initialization time, given by T. That way, the term multiplying the unmeasured degree of freedom ρ, i.e., (p_0 − T)·ρ, is small after initialization. This helps to decouple the uncertainty of the most uncertain parameter, ρ. In the complete anchored homogeneous point parametrization for back-projection and transformation, ρ^C must be given as the prior.
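The decoupling effect of the anchor can be checked numerically: when the anchor coincides with the optical center, the term multiplying ρ vanishes and the projection no longer depends on ρ at all. A short sketch (illustrative naming, identity camera for simplicity):

```python
import numpy as np

def ahp_project(K, R, T, p0, m, rho):
    """Project an anchored homogeneous point (p0, m, rho).

    The term (p0 - T) * rho multiplies the unmeasured degree of freedom;
    right after initialization p0 == T, so it is exactly zero and rho is
    decoupled from the projection.
    """
    v = m + (p0 - T) * rho
    u = K @ (R.T @ v)
    return u[:2] / u[2]

K = np.eye(3)
R = np.eye(3)
T = np.array([0.0, 0.0, 0.0])
p0 = T.copy()                      # anchor = optical center at init time
u = ahp_project(K, R, T, p0, np.array([0.2, 0.1, 1.0]), rho=0.7)
# At initialization the result is independent of the value of rho.
```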

3D Line Parametrizations
The line parametrizations include the projection to the image frame and back-projection to 3D. The Plücker line and the anchored homogeneous-points line are shown in Figure 3.
Plücker Line
A line in P^3 defined by two homogeneous points ā = (a, a_4) and b̄ = (b, b_4) can be represented as a homogeneous 6-vector, known as Plücker coordinates: L = (n, v), where n = a × b and v = a_4·b − b_4·a, with n, v ∈ R^3, satisfying the Plücker constraint n^T·v = 0.
Geometrically speaking, n is the vector normal to the plane π containing the line and the origin, and v is the director vector from a to b. The Euclidean orthogonal distance from the line to the origin is given by ‖n‖/‖v‖; thus, ‖v‖ is the inverse-depth, analogous to ρ of homogeneous points. The Plücker line geometrical representation is shown in Figure 2. The transformation of Plücker coordinates to the camera frame, followed by the projection, can be written in terms of R, T, n and v as l = 𝒦·R^T·(n − T × v), where 𝒦 is the intrinsic projection Plücker matrix built from the pinhole parameters. When Plücker coordinates are expressed in the camera frame, the projection reduces to l = 𝒦·n^C; the line's range and orientation, expressed in v^C, are not measurable.
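The construction above can be sketched numerically. The helper below (our naming, not the paper's) builds Plücker coordinates from two homogeneous points and checks both the Plücker constraint and the distance-to-origin formula:

```python
import numpy as np

def plucker_from_points(a_h, b_h):
    """Plücker coordinates L = (n, v) from two homogeneous points.

    With a_h = (a, a4) and b_h = (b, b4):  n = a x b,  v = a4*b - b4*a.
    n is normal to the plane through the line and the origin; v is the
    director vector of the line.
    """
    a, a4 = a_h[:3], a_h[3]
    b, b4 = b_h[:3], b_h[3]
    n = np.cross(a, b)
    v = a4 * b - b4 * a
    return n, v

# Line through the Euclidean points (1, 0, 1) and (0, 1, 1), i.e. a4 = b4 = 1.
n, v = plucker_from_points(np.array([1.0, 0, 1, 1]), np.array([0.0, 1, 1, 1]))
assert abs(n @ v) < 1e-12                       # Plücker constraint n.v = 0
dist = np.linalg.norm(n) / np.linalg.norm(v)    # orthogonal distance to origin
```

For this line the closest point to the origin is (0.5, 0.5, 1), so the distance evaluates to sqrt(3/2), matching ‖n‖/‖v‖.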
For the Plücker line back-projection, vectors n^C and v^C are computed as n^C = 𝒦^{-1}·l (up to scale) and v^C = β_1·e_1 + β_2·e_2, where β_1, β_2 ∈ R and e_1, e_2, n^C are mutually orthogonal.
Defining β = (β_1, β_2) ∈ R^2, the vector v^C can also be expressed as v^C = [e_1 e_2]·β, where v^C ∈ π^C for any value of β.
In order for β to be exactly an inverse-distance and e_1 to be parallel to the image plane, e_1 and e_2 are chosen accordingly: e_1 orthogonal to both n^C and the optical axis, and e_2 completing the orthogonal triad with n^C and e_1. The Plücker line back-projection is shown in Figure 3. The complete Plücker line parametrization then consists of n^C and the pair (e_1, e_2), where β must be provided as a prior.
Anchored Homogeneous-Points Line
Another way of representing a line is by the endpoints that define it. Starting from the anchored homogeneous point parametrization, an anchored homogeneous-points line is an 11-vector defined as L = (p_0, m_1, ρ_1, m_2, ρ_2). For each point, the transformation and projection of a pinhole camera is, as previously stated, u_i = K·R^T·(m_i + (p_0 − T)·ρ_i). A homogeneous 2D line is obtained by the cross product of two points lying on it, l = u_1 × u_2. Comparing this result to what was obtained for Plücker coordinates, the product m_1 × m_2 is a vector orthogonal to the plane π, analogous to the Plücker sub-vector n, while the term (ρ_1·m_2 − ρ_2·m_1) is a vector joining the two support points of the line, and is therefore related to the Plücker sub-vector v.
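The relation l = u_1 × u_2 and its link to the Plücker normal can be checked with a short sketch (our naming; identity camera placed at the anchor so that each endpoint projects to its own direction vector):

```python
import numpy as np

def hp_line_project(K, R, T, p0, m1, rho1, m2, rho2):
    """Project an anchored homogeneous-points line: l = u1 x u2."""
    def proj(m, rho):
        # Anchored homogeneous point projection (homogeneous pixel vector).
        return K @ (R.T @ (m + (p0 - T) * rho))
    u1, u2 = proj(m1, rho1), proj(m2, rho2)
    return np.cross(u1, u2)

K, R, T = np.eye(3), np.eye(3), np.zeros(3)
p0 = np.zeros(3)                   # anchor at the optical center
m1, m2 = np.array([1.0, 0, 1]), np.array([0.0, 1, 1])
l = hp_line_project(K, R, T, p0, m1, 0.5, m2, 0.5)
# With an identity camera at the anchor, u_i = m_i, so l = m1 x m2:
# the normal of the plane through the line and the origin, i.e. the
# analogue of the Plücker sub-vector n.
```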

1) Plücker line:
T can be represented as homogeneous 6-vector, known as Plücker coordinates: Plücker coordinates transformation from camera frame is performed as shown next: The whole transformation and projection process for Plücker coordinates in terms of R , T, n, and v is: where K is the instrinsic projection Plücker matrix defined as: When Plücker coordinates are expressed in camera frame, projection is only obtained by Line's range and orientation expressed in v C are not measurable.
For Plücker line back projection, vectors n C and v C are computed according to these expressions: 2 he anchor, and an inverse-depth scalar.
on from anchred homogeneous point to inates can be achieved by the following n and frame transformation process is t expression: e anchores homogeneous point paramfollowing: st be given as prior.
meterizations n, some line parameterizations are covption of projection to image frame, biation and back-projection are included.
ne: A line in P 3 defined by two points T can be represented as vector, known as Plücker coordinates: Plücker coordinates transformation from camera frame is performed as shown next: The whole transformation and projection process for Plücker coordinates in terms of R , T, n, and v is: where K is the instrinsic projection Plücker matrix defined as: When Plücker coordinates are expressed in camera frame, projection is only obtained by Line's range and orientation expressed in v C are not measurable.
For Plücker line back projection, vectors n C and v C are computed according to these expressions: 2 eneous point: In order to imhor is added as a reference to itialization time of the landmark. 6-vector that includes the anchor artesian coordinates of the point hor, and an inverse-depth scalar.
anchred homogeneous point to can be achieved by the following frame transformation process is ssion: ores homogeneous point paraming: given as prior.
zations e line parameterizations are covf projection to image frame, bind back-projection are included.
T can be represented as , known as Plücker coordinates: Plücker coordinates transformation from camera frame is performed as shown next: The whole transformation and projection process for Plücker coordinates in terms of R , T, n, and v is: where K is the instrinsic projection Plücker matrix defined as: When Plücker coordinates are expressed in camera frame, projection is only obtained by Line's range and orientation expressed in v C are not measurable.
For Plücker line back projection, vectors n C and v C are computed according to these expressions:

Michel Devy
Jorge Othón Esparza-Jiménez

For different landmark types, such as points and lines, uncertainty has distinct implications. For points, there is uncertainty in distance, and it covers the whole visual ray up to infinity. Infinite straight lines carry uncertainty in two degrees of freedom, corresponding to a distance that must be covered up to infinity and to all possible orientations. In the next sections, the point and line parameterizations used for undelayed landmark initialization (ULI) are covered, before going deeper into the initialization process itself.

A. 3D point parameterizations
This section explains some point parameterizations. The aspects included in each description refer to the parameterization itself, camera projection, coordinate transformation, and back-projection.
1) Euclidean point: The projection to the camera frame is given by the following equation: where R and T are the rotation matrix and translation vector that define the camera C. Underlined vectors like u represent homogeneous coordinates.
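As a concrete illustration, the projection chain just described can be sketched in a few lines. The intrinsic values and poses below are assumed example values, not taken from the paper.

```python
# Sketch of the pinhole projection u ~ K * R^T * (p - T) described above.
# The intrinsics (fu, fv, cu, cv) and the poses are assumed example values.
fu, fv, cu, cv = 500.0, 500.0, 320.0, 240.0

def project_point(p, R, T):
    # Express the world point in the camera frame: pC = R^T (p - T)
    d = [p[i] - T[i] for i in range(3)]
    pC = [sum(R[j][i] * d[j] for j in range(3)) for i in range(3)]
    # Apply the intrinsics and normalize the homogeneous coordinates
    return (fu * pC[0] / pC[2] + cu, fv * pC[1] / pC[2] + cv)

# A point 2 m in front of a camera at the origin with identity orientation
R_id = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
u = project_point([0.2, -0.1, 2.0], R_id, [0.0, 0.0, 0.0])  # -> (370.0, 215.0)
```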
2) Homogeneous point: Homogeneous points are formed by a four-vector, composed of the 3D vector m and the scalar ρ.
In order to convert from homogeneous to Euclidean coordinates, the following equation is applied: In the camera frame, m is the director vector of the optical ray, and ρ depends linearly on the inverse of the distance d from the optical center to the point.
This allows expressing the unbounded distance of a point along the optical ray, from zero to infinity, within the bounded interval ρ ∈ (0, ‖m‖/d min ] in parameter space.
The frame transformation of a homogeneous point is performed according to the next equation: where the superscript C indicates the frame to which the point is referred, and the matrix H specifies the frame to which the point is transformed.
The projection of a point into the image frame is performed with the following expression: Expressing a homogeneous point in the camera frame, the projected image point is u = Km C , and ρ C is not measurable. Back-projection is then: where ρ C must be given as a prior and represents the inverse distance from the origin of coordinates.
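The back-projection just described can be sketched as follows. The intrinsics are assumed example values, and rho stands for the inverse-distance prior ρ C.

```python
# Sketch of homogeneous-point back-projection: the pixel fixes the optical
# ray m, and the prior rho (inverse distance) fixes the point along it.
# The intrinsics below are assumed example values.
fu, fv, cu, cv = 500.0, 500.0, 320.0, 240.0

def back_project(u, rho):
    # Director vector of the optical ray in the camera frame (m = K^-1 * u)
    m = [(u[0] - cu) / fu, (u[1] - cv) / fv, 1.0]
    # The Euclidean equivalent of the homogeneous point (m, rho) is m / rho
    p = [mi / rho for mi in m]
    return m, p

m, p = back_project((370.0, 215.0), 0.5)  # rho = 0.5 -> point at 2 m depth
```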
3) Anchored homogeneous point: In order to improve linearity, an anchor is added as a reference to the optical center at initialization time of the landmark. Thus, the landmark is a 6-vector that includes the anchor 3D coordinates, the Cartesian coordinates of the point with respect to the anchor, and an inverse-depth scalar.
The conversion from an anchored homogeneous point to Euclidean coordinates can be achieved by the following equation: The projection and frame transformation process is given in the next expression: The complete anchored homogeneous point parametrization is the following: where ρ C must be given as a prior.
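The anchored conversion amounts to adding the ray, scaled by the distance, to the anchor. A minimal sketch, with illustrative names (p0 for the anchor, m for the ray, rho for the inverse-depth scalar):

```python
# Sketch of the anchored-homogeneous-point to Euclidean conversion:
# the landmark is (p0, m, rho) and its Euclidean position is p0 + m / rho.
def ahp_to_euclidean(p0, m, rho):
    return [p0[i] + m[i] / rho for i in range(3)]

p = ahp_to_euclidean([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], 0.5)  # -> [1.0, 0.0, 2.0]
```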

B. 3D line parameterizations
In this section, some line parameterizations are covered. The description of the projection to the image frame, the frame transformation and the back-projection are included.

1) Plücker line: A line in P 3 defined by two points a and b can be represented as a homogeneous six-vector, known as Plücker coordinates:
In terms of geometry, n is the vector normal to the plane π containing the line and the origin, and v is the director vector from a to b. The Euclidean orthogonal distance from the line to the origin is given by ‖n‖/‖v‖; hence, ‖v‖ plays the role of an inverse distance, analogous to ρ of homogeneous points. The Plücker line geometrical representation is shown in Figure 3a.
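The geometric relations above (n normal to the support plane, v the director vector, distance ‖n‖/‖v‖) can be checked with a small sketch; the point values are illustrative:

```python
import math

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]

def pluecker_from_points(a, b):
    # n is normal to the plane through the line and the origin; v points a -> b
    return cross(a, b), [b[i] - a[i] for i in range(3)]

def line_origin_distance(n, v):
    # Euclidean orthogonal distance from the line to the origin: ||n|| / ||v||
    norm = lambda x: math.sqrt(sum(xi * xi for xi in x))
    return norm(n) / norm(v)

# A line parallel to the x axis at y = 0, z = 1 lies at distance 1 from the origin
n, v = pluecker_from_points([1.0, 0.0, 1.0], [-1.0, 0.0, 1.0])
d = line_origin_distance(n, v)  # -> 1.0
```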
Expressions for transformation and inverse-transformation of Plücker coordinates from and to the camera frame are as shown next: The transformation and projection process in terms of R, T, n and v is as follows: where the intrinsic projection Plücker matrix K is defined as: When Plücker coordinates are expressed in the camera frame, projection is obtained by: The range and orientation of the line are included in v C and are not measurable. For Plücker line back projection, vectors n C and v C are obtained with these expressions: where β 1 , β 2 ∈ R and e 1 , e 2 , n C are mutually orthogonal.
Defining β = (β 1 , β 2 ) ∈ R 2 , vector v C can also be expressed as: where v C ∈ π C for any value of β. Plücker line back projection is shown in Figure 3b. The complete Plücker line parametrization for back projection and transformation is given in the following equation: where β must be provided as a prior.

Anchored Homogeneous Points Line
A line can also be represented by the end points defining it. With the application of the anchored homogeneous point parametrization, shown in Figure 3c, an anchored homogeneous-points line is an eleven-vector defined as follows: L AHPL = [p 0 m 1 ρ 1 m 2 ρ 2 ] T ∈ R 11 . For each point, the transformation and projection of a pinhole camera is as previously stated in Equation (4).
A homogeneous 2D line is obtained by the cross product of two points lying on it, l = u 1 × u 2 , giving: In comparison to the result obtained for Plücker coordinates, the product m 1 × m 2 is a vector orthogonal to the plane π, analogous to the Plücker sub-vector n. Furthermore, the term (ρ 1 m 2 − ρ 2 m 1 ) is a vector that gives the direction between the points of the line, therefore related to Plücker sub-vector v.
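The cross-product construction of the 2D line can be sketched directly; the point values are illustrative, and incidence l · u = 0 confirms both points lie on the line:

```python
# Sketch of obtaining the homogeneous 2D line through two projected
# end-points via the cross product l = u1 x u2; both points satisfy l . u = 0.
def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]

u1, u2 = [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]  # homogeneous image points
l = cross(u1, u2)                           # -> [-1.0, 1.0, 0.0]
incidence = [sum(l[i] * u[i] for i in range(3)) for u in (u1, u2)]  # -> [0.0, 0.0]
```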
The complete anchored homogeneous point line parametrization for back projection and transformation is the following: where ρ C 1 and ρ C 2 must be given as priors.

Landmark Initialization
The process of the initialization of a landmark consists of the detection of a feature in the image, retro-projection to 3D and inclusion into the map. There are three important concepts involved in the landmark initialization: the 3D landmark x itself, the 2D measurement z of the landmark in the image and the unmeasured degree of freedom π. All of these are modeled as Gaussian variables, with notation N {µ, σ 2 }. Thus, the cited concepts are expressed as x ∼ N {x, P}, z ∼ N {z, R} and π ∼ N {π, Π}, respectively. The 3D landmarks x considered for this study are points and lines, already described in Sections 2.1 and 2.2. The parametrizations for landmark 2D measurements z and unmeasured degrees of freedom π, as well as the description of the initialization algorithm, are covered in the following sections.

Landmark 2D Measurements in the Image
Points are represented as a two-vector containing Cartesian coordinates in pixel space, leading to the following: where U is the covariance matrix of the position of the point. In homogeneous coordinates, Lines can be expressed by a four-vector that represents the coordinates of their end-points, also with a Gaussian probability density function.
The probability density function for infinite lines like Plücker, N {l, L}, is composed of the homogeneous line representation and the covariance matrix defined as follows: l = ū 1 × ū 2 , and

Unmeasured Degrees of Freedom
The uncertainty in 3D points and lines coming from projection is represented by the inverse-distance variables ρ C and β C , which are modeled as Gaussian variables. The origin of each of these priors must lie inside the 2σ interval of its probability density function.
For points and end-point-based lines, the minimum distance must match the upper 2σ boundary; hence: This initializes lines in front of the camera.
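One way to satisfy both stated constraints (origin inside the 2σ interval, minimum distance at the upper 2σ boundary) is to center the prior halfway between them. This is a sketch of that choice, not necessarily the exact values used in the paper:

```python
# Sketch of an inverse-distance prior satisfying the two stated constraints:
# rho = 0 sits at the lower 2-sigma bound and rho = 1/d_min at the upper one.
def inverse_distance_prior(d_min):
    rho_mean = 1.0 / (2.0 * d_min)   # midpoint between 0 and 1/d_min
    rho_sigma = 1.0 / (4.0 * d_min)  # so that mean +/- 2*sigma hits both bounds
    return rho_mean, rho_sigma

rho_mean, rho_sigma = inverse_distance_prior(0.5)  # d_min = 0.5 m -> (1.0, 0.5)
```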

Undelayed Landmark Initialization Algorithm
The ULI algorithm was presented in [30] for the construction of landmark-based stochastic maps, including a single type of landmark L. The approach presented in this paper includes heterogeneous parametrizations of landmarks L * on the same map, where L * can be a point or line. The resulting algorithm is composed of the following steps:

1. Detect a new feature in the image.
2. Identify measurements z ∼ N {z, R}, where z is either a point or a line (i.e., u or s, respectively).
3. Define a Gaussian prior π ∼ N {π, Π} for the unmeasured degree of freedom; π can either be ρ C , t C or β C .
4. Back-project the Gaussian measurement, and get the landmark mean and Jacobians: where g() is the back-projection and transformation function for the corresponding landmark, and C = (T, Q) is the camera frame expressed in terms of its position T and orientation Q in quaternion notation.
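In one-dimensional (scalar) form, the covariance bookkeeping that accompanies the back-projection step can be sketched as below; the Jacobian names G_C, G_z and G_pi are illustrative, standing for the derivatives of g() with respect to the camera state, the measurement and the prior:

```python
# One-dimensional sketch of the variance of a newly initialized landmark:
# it inherits uncertainty from the camera state (through G_C), from the
# measurement (through G_z) and from the prior (through G_pi).
def new_landmark_variance(P_CC, R, PI, G_C, G_z, G_pi):
    return G_C * P_CC * G_C + G_z * R * G_z + G_pi * PI * G_pi

P_LL = new_landmark_variance(P_CC=0.01, R=1.0, PI=0.25, G_C=1.0, G_z=0.5, G_pi=2.0)
# -> 0.01 + 0.25 + 1.0 = 1.26
```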

Landmark Update
The purpose of the landmark update process is to recalculate the parameters of the elements on the map (i.e., the robot and landmark poses), given the observation of the already mapped landmarks in the current frame. This process starts by projecting all of the observable landmarks to the image plane and selecting those with higher uncertainty for correction. For points, the observation function h() applies a homogeneous-to-Euclidean transformation h2e() once the projection process previously explained has been performed, as follows: The innovation mean y and covariance Y are then obtained as shown next: where R = U is the measurement noise covariance and the Jacobian H = ∂h/∂x is evaluated at x̂.
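The h2e() transformation mentioned above is simply the homogeneous normalization; the input values below are illustrative:

```python
# Sketch of the homogeneous-to-Euclidean transformation h2e() used by the
# point observation function: divide by the homogeneous coordinate.
def h2e(u):
    return (u[0] / u[2], u[1] / u[2])

z_pred = h2e((740.0, 430.0, 2.0))  # -> (370.0, 215.0)
```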
For lines, the innovation function computes the orthogonal distances from the detected end-points u i to a line l, as shown in Figure 4, leading to the following:
Landmark Re-parametrization
Landmark over-parametrization, w for EKF performance so far, is expe used when justified. Landmarks sho their minimal forms after converge servation functions of these minim forms) are judged linear enough.
For points, the natural choice i The reparametrization is triggered scribed in Civera et al. (2008), whi pute and can be easily adapted to H For lines, and because of the n be convenient to choose a non-m sentation L = (p 1 , p 2 ) (EPL, see F In this case we can use the test fo (2008), which must hold for both also use any of the minimal repre size 4 (see also Fig. 8). Tests for t tations might be defined from the li in the next section, although these for speed. A compromise that wou factory operation is to use the test and does indicate that the line has then reparametrize to any other f We have not explored these last po

Linearity and Performance Ev
We present here the analytical and 354 Int J C which is in pixels units. If we name this function h 1 (l, s), the full observation function is its composition with the projection functions h() in Table 1, The EKF innovation y is defined as the difference between the actual measurement and the expectation, For the measurement z, this corresponds to the distances from the detected endpoints to the detected line l = u 1 ×u 2 . Because this line l is precisely defined by the two endpoints, the measured vector is zero by definition, and we just need to consider a covariance R = U ∈ R 2 (see (48)) representing the pixel noise in just two of the four dimensions. 4 The expectation corresponds to the distances (64) to the expected linel = h(C,x) (Fig. 11). This yields an innovation

Landmark Re-parametrizatio
Landmark over-parametrization, for EKF performance so far, is ex used when justified. Landmarks s their minimal forms after converg servation functions of these mini forms) are judged linear enough.
For points, the natural choice The reparametrization is triggere scribed in Civera et al. (2008), w pute and can be easily adapted to For lines, and because of the be convenient to choose a nonsentation L = (p 1 , p 2 ) (EPL, see In this case we can use the test (2008), which must hold for bo also use any of the minimal rep size 4 (see also Fig. 8). Tests for tations might be defined from the in the next section, although thes for speed. A compromise that wo factory operation is to use the tes and does indicate that the line h then reparametrize to any other We have not explored these last p

Linearity and Performance E
We present here the analytical a this article to evaluate the perfor tions. 354 In Fig. 11 Plücker line observation update. Direct measurement of the two signed orthogonal distances from the detected endpoints to the expected (or predicted) line endpoints u i to a line l. This leads to the measurement function which is in pixels units. If we name this function h 1 (l, s), the full observation function is its composition with the projection functions h() in Table 1, The EKF innovation y is defined as the difference between the actual measurement and the expectation, For the measurement z, this corresponds to the distances from the detected endpoints to the detected line l = u 1 ×u 2 . Because this line l is precisely defined by the two endpoints, the measured vector is zero by definition, and we just need to consider a covariance R = U ∈ R 2 (see (48)) representing the pixel noise in just two of the four dimensions. 4 The expectation corresponds to the distances (64) to the expected linel = h(C,x) (Fig. 11). This yields an innovation For lines, and because of t be convenient to choose a no sentation L = (p 1 , p 2 ) (EPL, s In this case we can use the te (2008), which must hold for also use any of the minimal r size 4 (see also Fig. 8). Tests tations might be defined from t in the next section, although th for speed. A compromise that factory operation is to use the and does indicate that the line then reparametrize to any oth We have not explored these las

Linearity and Performanc
We present here the analytica this article to evaluate the perf tions. 354 Int J C which is in pixels units. If we name this function h 1 (l, s), the full observation function is its composition with the projection functions h() in Table 1, The EKF innovation y is defined as the difference between the actual measurement and the expectation, For the measurement z, this corresponds to the distances from the detected endpoints to the detected line l = u 1 ×u 2 . Because this line l is precisely defined by the two endpoints, the measured vector is zero by definition, and we just need to consider a covariance R = U ∈ R 2 (see (48)) representing the pixel noise in just two of the four dimensions. 4

Landmark Re-parametrization
Landmark over-parametrization, beneficial for EKF performance, is expensive and should only be used when justified. Landmarks should be converted to their minimal forms after convergence, once the observation functions of these minimal forms are judged linear enough.
For points, the reparametrization is triggered by the linearity test described in Civera et al. (2008), which is cheap to compute and can be easily adapted to other parametrizations. For lines, because of the endpoint management, it may be convenient to choose the non-minimal representation L = (p1, p2) of the two Euclidean endpoints (EPL). In this case, the same test can be used, on the condition that it holds for both endpoints; any of the minimal representations of size 4 can also be used. Dedicated linearity tests for these minimal representations might also be defined, although they are not considered here for the sake of speed. A compromise that allows satisfactory operation is to use the endpoint test, which does indicate that the line has converged, and then to reparametrize to any other representation.
Since the EKF innovation is the difference between the actual measurement and the expectation, and z is the orthogonal distance previously described, the line innovation function is y = −h1(l̂, s), as the desired orthogonal distance from the predicted line to the matched end-points is zero.
A landmark observation is found consistent if the squared Mahalanobis distance of the innovation, MD2 = y^T · (H · P · H^T + R)^(−1) · y, is smaller than a threshold MD2th.
When this test passes, the landmark is updated: State update: x ← x + K · y Covariance update: P ← P − K · H · P Point and line parametrizations are modeled as Gaussian variables in [25,29,30], validating the use of the Mahalanobis distance compared against a chi-squared distribution. The Kalman gain K is assumed to be optimal. Since this process is intended to be a lightweight approach that could be integrated into a dedicated architecture on small vehicles, the selected covariance update formula is used instead of the Joseph form, whose higher computational complexity might compromise performance. Successful results of this formulation are presented in [30].
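As an illustration, the consistency gate and the update equations above can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's implementation; the function name and the gating threshold (here a chi-squared quantile for two degrees of freedom) are illustrative assumptions.

```python
import numpy as np

def ekf_update(x, P, y, H, R, md2_th=9.21):
    """Gate an innovation with the squared Mahalanobis distance and,
    if consistent, apply the standard EKF update (sketch)."""
    Z = H @ P @ H.T + R                        # innovation covariance
    md2 = float(y.T @ np.linalg.solve(Z, y))   # squared Mahalanobis distance
    if md2 > md2_th:                           # observation rejected
        return x, P, False
    K = P @ H.T @ np.linalg.inv(Z)             # Kalman gain
    x = x + K @ y                              # state update
    P = P - K @ H @ P                          # covariance update (non-Joseph form)
    return x, P, True
```

Note that the non-Joseph covariance update matches the formula selected above for its lower cost, at the price of weaker numerical guarantees than the Joseph form.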

SLAM Front-End
The geometrical representation of landmarks handled by the SLAM back-end is obtained by processing the information coming from the sensors embedded in the moving agent (i.e., cameras mounted on the mobile robot). The front-end deals with the detection of new landmarks and with the matching of already-mapped ones in subsequent images.
This section covers the image processing algorithms used for detecting and matching points and lines. Points have been widely studied and implemented as SLAM landmarks [3,24,33,34]. In the case of lines, a different front-end strategy was integrated for the detection and tracking of line segments.

Point Landmarks
A point landmark is modeled as an appearance descriptor composed of a patch of pixels around the point in the image. Once detected, the patch is used for the matching of the feature on incoming images.

Point Detection
An active-search approach [24,34] can ensure that the point landmarks are equally distributed in the image by dividing it into a number of equal regions, in which it is expected to have a landmark (Figure 5a). At each iteration, an empty region is randomly selected, and a corner point is chosen to be the strongest Harris point [35] (Figure 5b). This point is used for the landmark initialization, and its appearance and the current position and orientation of the camera C 0 = (T 0 , R 0 ) are saved. The appearance of the point is given by the patch of pixels surrounding it, as seen in Figure 5c.
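The cell selection step of this active search can be sketched as follows. This is a pure-NumPy illustration under simplifying assumptions (a box-filtered structure tensor, hypothetical helper names); a real front-end would typically rely on a library corner detector.

```python
import numpy as np

def harris_response(img, k=0.04):
    """Harris corner response R = det(M) - k*trace(M)^2 from the
    gradient structure tensor M, smoothed with a small box filter."""
    gy, gx = np.gradient(img.astype(float))
    Ixx, Iyy, Ixy = gx * gx, gy * gy, gx * gy
    box = np.ones((3, 3)) / 9.0
    def smooth(a):  # cheap 3x3 box smoothing, same output size
        p = np.pad(a, 1, mode='edge')
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]] * box[i, j]
                   for i in range(3) for j in range(3))
    Sxx, Syy, Sxy = smooth(Ixx), smooth(Iyy), smooth(Ixy)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2

def strongest_corner_in_cell(img, cell):
    """Return the (row, col) of the strongest Harris point inside
    a grid cell given as (r0, r1, c0, c1)."""
    r0, r1, c0, c1 = cell
    resp = harris_response(img)[r0:r1, c0:c1]
    r, c = np.unravel_index(np.argmax(resp), resp.shape)
    return r0 + r, c0 + c
```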

Point Matching
When there are point landmarks already mapped, the matching process searches for a point landmark x in the frame captured at camera pose C i . This point has been initialized in the frame captured with camera pose C 0 (Figure 6a,b).
The saved appearance patch of the landmark is warped by applying a homography transformation. This transformation takes into account the rotation and translation of the camera position and orientation with respect to its pose when the landmark was detected. The transformed coordinates of pixel j of the patch at camera pose i (i.e., q j i ) are computed as follows: where: and q j 0 are the coordinates of pixel j of the original patch, and R and T are the rotation matrix and translation vector between camera pose C 0 and camera pose C i . Once the patch is warped, it is cropped to a square in order to maintain the same dimensions as the original. This warping process is shown in Figure 6c.
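The coordinate part of this warping can be sketched as follows. This is a generic sketch: the actual construction of the 3 × 3 homography from R, T and the camera intrinsics follows Equation (17) and is not reproduced here, so H is taken as a given input.

```python
import numpy as np

def warp_patch_coords(H, coords):
    """Apply a 3x3 homography H to an array of (u, v) pixel coordinates.
    H is assumed already built from the inter-pose rotation R and
    translation T (Equation (17) in the text)."""
    pts = np.hstack([coords, np.ones((len(coords), 1))])  # homogeneous coords
    out = (H @ pts.T).T
    return out[:, :2] / out[:, 2:3]                       # dehomogenize
```

The warped patch itself is then obtained by sampling the original patch at these transformed coordinates and cropping to the original square size.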
The matching process performs the projection of point landmark x into the image at camera pose C i to get a point expected position u i , given the current pose i of the camera: The 2D covariance matrix U of this point is obtained from the 3D covariance matrix P LL corresponding to landmark L, as follows: where U RF , U SF and U L are the Jacobians of the projection u with respect to the robot frame, the sensor frame and the landmark, respectively.
Then, the zero mean normalized cross-correlation (ZNCC) test [36] is applied to the warped patch and a region of pixels surrounding the expected point in the image (Figure 6d). The rectangular search region is based on the projection mean u and the covariance ellipse U. The mean is the center of the search box, and the square roots of the diagonals of the covariance are the standard deviations, σ u and σ v . The search region goes ±3σ at each side of the center. If the ZNCC score is over a threshold, the point is said to be matched.
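The ZNCC test over the ±3σ search region can be sketched as follows. This is an illustrative NumPy version under stated assumptions: the helper names are hypothetical, the search box is scanned exhaustively, and score thresholding is left to the caller.

```python
import numpy as np

def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-size patches,
    in [-1, 1]; invariant to affine illumination changes."""
    a = a.astype(float) - a.mean()
    b = b.astype(float) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_in_region(patch, image, u, v, su, sv):
    """Search the +/-3-sigma box centred on the expected point (u, v)
    for the location maximising the ZNCC score (sketch)."""
    ph, pw = patch.shape
    best, best_uv = -1.0, None
    for r in range(int(v - 3 * sv), int(v + 3 * sv) + 1):
        for c in range(int(u - 3 * su), int(u + 3 * su) + 1):
            r0, c0 = r - ph // 2, c - pw // 2
            if r0 < 0 or c0 < 0 or r0 + ph > image.shape[0] or c0 + pw > image.shape[1]:
                continue
            s = zncc(patch, image[r0:r0 + ph, c0:c0 + pw])
            if s > best:
                best, best_uv = s, (c, r)
    return best_uv, best
```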

Line Landmarks
Many methods have been proposed to extract lines in the image processing community, generally starting from the detection of intensity discontinuities (gradient, Laplacian, Canny filter). The first method, introduced in the 1980s, was the chaining method [37], based on the polygonal approximation of extracted contours. This method is efficient, but the result depends too much on its parametrization (gradient threshold, contour thinning). This is why the Hough transform became so popular [38]; a recent variant, the kernel-based Hough transform [39], has been implemented and evaluated for line detection. However, because it works with infinite lines rather than segments, its performance is less than optimal for the intended purposes. In [40], the Dseg algorithm is proposed; it is close to the chaining method, but uses an iterative filtering approach to decide whether to integrate a contour point into the processed segment. The Dseg algorithm was compared to the chaining method, to the Hough transform and to the line segment detector (LSD) [41]; it was found to allow the extraction of a greater number of segments of various lengths.
This section covers the description of a front-end line segment active-search process, developed for segment-based SLAM. The selected techniques for working with segments are the LSD for the detection and moving edges (ME) [42] for matching.
An active-search approach was developed and implemented in order to handle the line segment landmarks, similar to the one previously described for points.

Segment Detection
The process starts by building a grid that divides the image into rectangular cells. A 3 × 3 grid was chosen, as shown in Figure 7a. There are two different ways of detecting lines: the one applied in the first frame of the sequence, and the one used in all other frames. In the first case, the segment detection algorithm is run for the whole image, and the longest segments found are selected. The cells containing a whole or partial segment are marked as "occupied". This is shown in Figure 7b. The detection process, applied in a subsequent frame (Figure 7c), departs from the assumption that there are line landmarks already on the map. Once the EKF back-end computes the 3D position of the robot and the landmarks seen so far, landmarks are projected to the image. The projection of line landmark x into the image at camera pose C i to get a line segment's expected position s i , given the current pose i of the camera, is performed as follows: Each projection is taken into account for updating the grid. The occupied cells are not considered, and one empty cell is chosen randomly. The image patch delimited by this cell is used to run the segment detector and to find the longest segment on it for initializing a new landmark. The line detected is extended to the other cells, and they are marked as "occupied" when this is the case. This process is shown in Figure 7d.
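The occupancy update of the grid from a projected or detected segment can be sketched as follows. This is a simple sampling-based illustration with a hypothetical helper name; a real implementation might rasterize the segment exactly instead of sampling it.

```python
def occupied_cells(seg, img_w, img_h, nx=3, ny=3):
    """Return the set of (cx, cy) grid cells crossed by a segment
    ((u1, v1), (u2, v2)), found by sampling points along it."""
    (u1, v1), (u2, v2) = seg
    cells = set()
    n = 100                                   # dense enough for a 3x3 grid
    for i in range(n + 1):
        t = i / n
        u = u1 + t * (u2 - u1)
        v = v1 + t * (v2 - v1)
        cx = min(int(u * nx / img_w), nx - 1)
        cy = min(int(v * ny / img_h), ny - 1)
        cells.add((cx, cy))
    return cells
```

Cells returned here would be marked "occupied", and new-landmark detection is then restricted to the remaining empty cells.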
The patch of the cells where lines were detected in the present frame is saved for the line matching process, as well as the current position and orientation of the camera C 0 = (T 0 , R 0 ). Each line is defined by its end points, and only the pixels surrounding it are used for matching, as can be seen in Figure 7e.

Segment Matching
When there are line landmarks already mapped, the matching process searches for a line landmark x in the frame captured at camera pose C i . This line has been initialized in the frame captured with camera pose C 0 (Figure 8a,b).
The saved appearance patch of the landmark is warped by applying a homography transformation. This transformation takes into account the rotation and translation in the camera position and orientation, with respect to its pose when the landmark was detected. The transformed coordinates of pixel j of the patch at camera pose i (i.e., q j i ) are computed applying Equation (17). This warping process is shown in Figure 8c.
The matching process performs the projection of the line landmark into the image using Equation (18). To perform the tracking of the points that make up the line segments, the moving edges algorithm is implemented, as discussed in [43].
The algorithm consists of searching the correspondent point p t+1 on line l(r) t+1 in image I t+1 of point p t in line l(r) t . The search for a match is performed in the direction normal to the line l(r) t , given by δ. For each point p t , a search interval Q j , j ∈ [−J, J] is defined. Each sample Q j is evaluated by the criterion ζ j . This evaluation consists in computing the convolution value between an image patch at the neighborhood ν of Q j , and the mask M δ , which is a function of the orientation of line l(r) t . The algorithm is shown in Figure 9.
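The 1D search along the normal can be sketched as follows. This is a schematic NumPy version under simplifying assumptions: the oriented mask M δ and the criterion ζ are reduced to an absolute convolution score, and the sample positions are rounded to pixels.

```python
import numpy as np

def moving_edges_step(img, p, normal, mask, J=4):
    """For a tracked point p = (row, col), search the samples Q_j along
    the unit normal direction and return the one maximising the absolute
    convolution with an oriented mask (sketch of the ME search)."""
    h = mask.shape[0] // 2
    best, best_q = -np.inf, p
    for j in range(-J, J + 1):
        q = (int(round(p[0] + j * normal[0])),
             int(round(p[1] + j * normal[1])))
        r0, c0 = q[0] - h, q[1] - h
        if r0 < 0 or c0 < 0 or r0 + mask.shape[0] > img.shape[0] \
                or c0 + mask.shape[1] > img.shape[1]:
            continue
        zeta = float(np.abs((img[r0:r0 + mask.shape[0],
                                 c0:c0 + mask.shape[1]] * mask).sum()))
        if zeta > best:
            best, best_q = zeta, q
    return best_q
```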
Thus, the position of point p t+1 on line l(r) t+1 in image I t+1 is given by: A list of k points is produced, from which the segment extremities s = (u 1 , u 2 ) are extracted. One way to express the measurement z of the matched segment with respect to its prediction is to compute the orthogonal distances of the matched end points u 1 and u 2 to the predicted line l, as shown in Equation (16) and Figure 4.
By defining line measurement in this way, the matching can be accomplished regardless of which points of the corresponding line were detected by the tracker and of the segment length.
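This measurement can be sketched as follows (a minimal NumPy illustration; the helper names are hypothetical). Note that, by construction, the distances of the detected endpoints to their own line l = u 1 × u 2 are zero, which is exactly the property stated above.

```python
import numpy as np

def line_through(u1, u2):
    """Homogeneous 2D line through two pixel points: l = u1 x u2."""
    return np.cross(np.append(u1, 1.0), np.append(u2, 1.0))

def endpoint_line_distances(u1, u2, l):
    """Signed orthogonal distances (in pixels) of two endpoints to a
    homogeneous line l (sketch of the line measurement of Equation (16))."""
    n = np.linalg.norm(l[:2])                  # normalise the line direction
    d = lambda u: float(np.dot(l, np.append(u, 1.0)) / n)
    return d(u1), d(u2)
```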
As a remark on the detection stage, the LSD detector validates a candidate rectangle r of aligned points with an a contrario test. Ntest is the total number of rectangles that could be aligned at a certain precision. Since rectangles are oriented, in an N × M image there are a total of NM × NM different rectangles with √(NM) different width values; thus, the total number of rectangles considered is (NM)^(5/2). The precision p takes τ/π as its initial value, but a total of γ values are tried, giving a number of tests of (NM)^(5/2) · γ. The number of false alarms (NFA) is then NFA(r, i) = (NM)^(5/2) · γ · Σ_{j=k..n} C(n, j) · p^j · (1 − p)^(n−j). This is the expected number of rectangles that have a number of aligned points at least as rare as r under the background hypothesis H0. If this number is large, the event is common, and thus not a relevant one; if, on the contrary, this number is small, the event is rare, and probably a meaningful one. The threshold ε is chosen so that NFA(r, i) ≤ ε constitutes an ε-meaningful rectangle and produces a detection; this value is chosen to be 1.
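The NFA computation can be sketched as follows. This is a schematic Python version with illustrative parameter values; practical LSD implementations typically evaluate this binomial tail in log form for numerical robustness.

```python
from math import comb

def nfa(n, k, p, N, M, gamma):
    """Number of false alarms of a rectangle with k aligned points out of
    n, at precision p, in an N x M image with gamma precisions tried
    (sketch of the a contrario validation used by LSD)."""
    n_tests = (N * M) ** 2.5 * gamma           # (NM)^(5/2) * gamma
    tail = sum(comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(k, n + 1))       # binomial tail P(X >= k)
    return n_tests * tail

def is_meaningful(n, k, p, N, M, gamma, eps=1.0):
    """A rectangle is eps-meaningful (a detection) when NFA <= eps."""
    return nfa(n, k, p, N, M, gamma) <= eps
```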

Segment Tracking
In order to perform the tracking points that make part of line segments, the Moving Edges algorithm [2] is implemented. No prior edge extracton is required for it.
The algorithm consiston searching the correspondent point p t+1 on line l(r) t+1 in image I * t + 1 of point p t in line l(r) t . A 1D search interval {Qj, j ∈ [−J, J]} is defined in the direction δ normal to the contour. For each point p t and for each Qj, in direction δ, a criterion ζj is computed. This criterion consists on the convolution value computed at Qj, using mask Mδ, which is function of the orientation of the contour. The algorithm is exemplified in figure 1.
Thus, the new position p t+1 is given by: A list of k points is produced, from which the segment extremities are extracted.

Integration into SLAM
The main goal of the LSD and ME is their utilization as a front-end for line segment based SLAM. An active-search approach was implemented in order to handle the line segment landmarks, similar to the one previously implemented for points in [4].
3 oriented, in an N × M image, there are a total of NM × NM different rectangles wuth NM different width values. Thus, the total number of rectangles considered is (NM) 5/2 . The presicion p takes τ /π as initial value, but a total of γ values are tried, having a number of tests of (NM) 5/2 γ. The Number of False Alarms (NFA) is then NFA(r, i) = (NM) 5/2 γ · n j=k n j p j (1 − p) (n−j) This refers to the number of rectangles that have a sufficient number of aligned points to be as rare as r under H0. If this number is large, the event is common, and thus not relevant one. If, on the contrary, this number is small, the event is rare, and probably a meaningful one. The threshold is chosen so that NFA(r, i) ≤ ε constitutes an ε-meaningful rectangle and produces a detection. This value is chosen to be 1.

Segment Tracking
In order to perform the tracking points that make part of line segments, the Moving Edges algorithm [2] is implemented. No prior edge extracton is required for it.
The algorithm consiston searching the correspondent point p t+1 on line l(r) t+1 in image I * t + 1 of point p t in line l(r) t . A 1D search interval {Qj, j ∈ [−J, J]} is defined in the direction δ normal to the contour. For each point p t and for each Qj, in direction δ, a criterion ζj is computed. This criterion consists on the convolution value computed at Qj, using mask Mδ, which is function of the orientation of the contour. The algorithm is exemplified in figure 1.
Thus, the new position p t+1 is given by: A list of k points is produced, from which the segment extremities are extracted.

Integration into SLAM
The main goal of the LSD and ME is their utilization as a front-end for line segment based SLAM. An active-search approach was implemented in order to handle the line segment landmarks, similar to the one previously implemented for points in [4].
3 NFA(r, i) = (NM) 5/2 γ · j=k j p j (1 − p) (n−j) (13) rs to the number of rectangles that have a sufficient number of aligned points to be as rare as If this number is large, the event is common, and thus not relevant one. If, on the contrary, is small, the event is rare, and probably a meaningful one. The threshold is chosen so that ε constitutes an ε-meaningful rectangle and produces a detection. This value is chosen to be 1.
ent Tracking erform the tracking points that make part of line segments, the Moving Edges algorithm [2] is . No prior edge extracton is required for it. rithm consiston searching the correspondent point p t+1 on line l(r) t+1 in image I * t + 1 of point ) t . A 1D search interval {Qj, j ∈ [−J, J ]} is defined in the direction δ normal to the contour. int p t and for each Qj, in direction δ, a criterion ζj is computed. This criterion consists on the value computed at Qj, using mask Mδ, which is function of the orientation of the contour. The exemplified in figure 1. e new position p t+1 is given by: k points is produced, from which the segment extremities are extracted.
ration into SLAM oal of the LSD and ME is their utilization as a front-end for line segment based SLAM. An approach was implemented in order to handle the line segment landmarks, similar to the one plemented for points in [4].
3 f rectangles that could be aligned at certain presicion. Since rectangles are there are a total of NM × NM different rectangles wuth √ NM different umber of rectangles considered is (NM) 5/2 . The presicion p takes τ /π as lues are tried, having a number of tests of (NM) 5/2 γ. s (NFA) is then FA(r, i) = (NM) 5/2 γ · n j=k n j p j (1 − p) (n−j) (13) f rectangles that have a sufficient number of aligned points to be as rare as large, the event is common, and thus not relevant one. If, on the contrary, is rare, and probably a meaningful one. The threshold is chosen so that eaningful rectangle and produces a detection. This value is chosen to be 1.
g points that make part of line segments, the Moving Edges algorithm [2] is racton is required for it. rching the correspondent point p t+1 on line l(r) t+1 in image I * t + 1 of point terval {Qj, j ∈ [−J, J ]} is defined in the direction δ normal to the contour. Qj, in direction δ, a criterion ζj is computed. This criterion consists on the Qj, using mask Mδ, which is function of the orientation of the contour. The re 1. is given by: d, from which the segment extremities are extracted. (a) ME is their utilization as a front-end for line segment based SLAM. An lemented in order to handle the line segment landmarks, similar to the one nts in [4].
This refers to the number of rectangles that have a sufficient number of aligned points to be as rare as r under H0. If this number is large, the event is common, and thus not relevant one. If, on the contrary, this number is small, the event is rare, and probably a meaningful one. The threshold is chosen so that NFA(r, i) ≤ ε constitutes an ε-meaningful rectangle and produces a detection. This value is chosen to be 1.

Segment Tracking
In order to perform the tracking points that make part of line segments, the Moving Edges algorithm [2] is implemented. No prior edge extracton is required for it.
The algorithm consiston searching the correspondent point p t+1 on line l(r) t+1 in image I * t + 1 of point p t in line l(r) t . A 1D search interval {Qj, j ∈ [−J, J]} is defined in the direction δ normal to the contour. For each point p t and for each Qj, in direction δ, a criterion ζj is computed. This criterion consists on the convolution value computed at Qj, using mask Mδ, which is function of the orientation of the contour. The algorithm is exemplified in figure 1.
Thus, the new position p t+1 is given by: A list of k points is produced, from which the segment extremities are extracted.

Integration into SLAM
The main goal of the LSD and ME is their utilization as a front-end for line segment based SLAM. An active-search approach was implemented in order to handle the line segment landmarks, similar to the one previously implemented for points in [4].
3 j=k This refers to the number of rectangles that have a sufficient number of aligned points to be as rare as r under H0. If this number is large, the event is common, and thus not relevant one. If, on the contrary, this number is small, the event is rare, and probably a meaningful one. The threshold is chosen so that NFA(r, i) ≤ ε constitutes an ε-meaningful rectangle and produces a detection. This value is chosen to be 1.

Segment Tracking
In order to perform the tracking points that make part of line segments, the Moving Edges algorithm [2] is implemented. No prior edge extracton is required for it.
The algorithm consiston searching the correspondent point p t+1 on line l(r) t+1 in image I * t + 1 of point p t in line l(r) t . A 1D search interval {Qj, j ∈ [−J, J]} is defined in the direction δ normal to the contour. For each point p t and for each Qj, in direction δ, a criterion ζj is computed. This criterion consists on the convolution value computed at Qj, using mask Mδ, which is function of the orientation of the contour. The algorithm is exemplified in figure 1.
Thus, the new position p t+1 is given by: A list of k points is produced, from which the segment extremities are extracted.

Integration into SLAM
The main goal of the LSD and ME is their utilization as a front-end for line segment based SLAM. An active-search approach was implemented in order to handle the line segment landmarks, similar to the one previously implemented for points in [4].
3 oriented, in an N × M image, there are a total of NM × NM different rectangles wuth NM different width values. Thus, the total number of rectangles considered is (NM) 5/2 . The presicion p takes τ /π as initial value, but a total of γ values are tried, having a number of tests of (NM) 5/2 γ. The Number of False Alarms (NFA) is then NFA(r, i) = (NM) 5/2 γ · n j=k n j p j (1 − p) (n−j) (13) This refers to the number of rectangles that have a sufficient number of aligned points to be as rare as r under H0. If this number is large, the event is common, and thus not relevant one. If, on the contrary, this number is small, the event is rare, and probably a meaningful one. The threshold is chosen so that NFA(r, i) ≤ ε constitutes an ε-meaningful rectangle and produces a detection. This value is chosen to be 1.

Segment Tracking
In order to perform the tracking points that make part of line segments, the Moving Edges algorithm [2] is implemented. No prior edge extracton is required for it.
The algorithm consiston searching the correspondent point p t+1 on line l(r) t+1 in image I * t + 1 of point p t in line l(r) t . A 1D search interval {Qj, j ∈ [−J, J]} is defined in the direction δ normal to the contour. For each point p t and for each Qj, in direction δ, a criterion ζj is computed. This criterion consists on the convolution value computed at Qj, using mask Mδ, which is function of the orientation of the contour. The algorithm is exemplified in figure 1.
Thus, the new position p t+1 is given by: A list of k points is produced, from which the segment extremities are extracted.

Integration into SLAM
The main goal of the LSD and ME is their utilization as a front-end for line segment based SLAM. An active-search approach was implemented in order to handle the line segment landmarks, similar to the one previously implemented for points in [4]. 3 Ntest is the total number of rectangles that could be aligned at certain presicion. Since rectangles are oriented, in an N × M image, there are a total of NM × NM different rectangles wuth √ NM different width values. Thus, the total number of rectangles considered is (NM) 5/2 . The presicion p takes τ /π as initial value, but a total of γ values are tried, having a number of tests of (NM) 5/2 γ.
The Number of False Alarms (NFA) is then NFA(r, i) = (NM) 5/2 γ · n j=k n j p j (1 − p) (n−j) (13) This refers to the number of rectangles that have a sufficient number of aligned points to be as rare as r under H0. If this number is large, the event is common, and thus not relevant one. If, on the contrary, this number is small, the event is rare, and probably a meaningful one. The threshold is chosen so that NFA(r, i) ≤ ε constitutes an ε-meaningful rectangle and produces a detection. This value is chosen to be 1.

Segment Tracking
In order to perform the tracking points that make part of line segments, the Moving Edges algorithm [2] is implemented. No prior edge extracton is required for it.
The algorithm consiston searching the correspondent point p t+1 on line l(r) t+1 in image I * t + 1 of point p t in line l(r) t . A 1D search interval {Qj, j ∈ [−J, J ]} is defined in the direction δ normal to the contour. For each point p t and for each Qj, in direction δ, a criterion ζj is computed. This criterion consists on the convolution value computed at Qj, using mask Mδ, which is function of the orientation of the contour. The algorithm is exemplified in figure 1.
Thus, the new position p t+1 is given by: A list of k points is produced, from which the segment extremities are extracted.

Integration into SLAM
The main goal of the LSD and ME is their utilization as a front-end for line segment based SLAM. An active-search approach was implemented in order to handle the line segment landmarks, similar to the one previously implemented for points in [4].
3 width values. Thus, the total number of rectangles considered is (NM) . The presicion p takes τ /π as initial value, but a total of γ values are tried, having a number of tests of (NM) 5/2 γ. The Number of False Alarms (NFA) is then NFA(r, i) = (NM) 5/2 γ · n j=k n j p j (1 − p) (n−j) (13) This refers to the number of rectangles that have a sufficient number of aligned points to be as rare as r under H 0 . If this number is large, the event is common, and thus not relevant one. If, on the contrary, this number is small, the event is rare, and probably a meaningful one. The threshold is chosen so that NFA(r, i) ≤ ε constitutes an ε-meaningful rectangle and produces a detection. This value is chosen to be 1.

Segment Tracking
In order to perform the tracking points that make part of line segments, the Moving Edges algorithm [2] is implemented. No prior edge extracton is required for it.
The algorithm consiston searching the correspondent point p t+1 on line l(r) t+1 in image I * t + 1 of point p t in line l(r) t . A 1D search interval {Q j , j ∈ [−J, J]} is defined in the direction δ normal to the contour. For each point p t and for each Q j , in direction δ, a criterion ζ j is computed. This criterion consists on the convolution value computed at Q j , using mask M δ , which is function of the orientation of the contour. The algorithm is exemplified in figure 1.
Thus, the new position p t+1 is given by: A list of k points is produced, from which the segment extremities are extracted.

Integration into SLAM
The main goal of the LSD and ME is their utilization as a front-end for line segment based SLAM. An active-search approach was implemented in order to handle the line segment landmarks, similar to the one previously implemented for points in [4].
3 width values. Thus, the total number of rectangles considered is (NM) . The presicion p takes τ /π as initial value, but a total of γ values are tried, having a number of tests of (NM) 5/2 γ. The Number of False Alarms (NFA) is then NFA(r, i) = (NM) 5/2 γ · n j=k n j p j (1 − p) (n−j) (13) This refers to the number of rectangles that have a sufficient number of aligned points to be as rare as r under H0. If this number is large, the event is common, and thus not relevant one. If, on the contrary, this number is small, the event is rare, and probably a meaningful one. The threshold is chosen so that NFA(r, i) ≤ ε constitutes an ε-meaningful rectangle and produces a detection. This value is chosen to be 1.

Segment Tracking
In order to perform the tracking points that make part of line segments, the Moving Edges algorithm [2] is implemented. No prior edge extracton is required for it.
The algorithm consiston searching the correspondent point p t+1 on line l(r) t+1 in image I * t + 1 of point p t in line l(r) t . A 1D search interval {Qj, j ∈ [−J, J ]} is defined in the direction δ normal to the contour. For each point p t and for each Qj+1, in direction δ, a criterion ζj is computed. This criterion consists on the convolution value computed at Qj, using mask Mδ, which is function of the orientation of the contour. The algorithm is exemplified in figure 1.
Thus, the new position p t+1 is given by: A list of k points is produced, from which the segment extremities are extracted.

Integration into SLAM
The main goal of the LSD and ME is their utilization as a front-end for line segment based SLAM. An active-search approach was implemented in order to handle the line segment landmarks, similar to the one previously implemented for points in [4].
3 initial value, but a total of γ values are tried, having a number of tests of (NM) 5/2 γ. The Number of False Alarms (NFA) is then NFA(r, i) = (NM) 5/2 γ · n j=k n j p j (1 − p) (n−j) (13) This refers to the number of rectangles that have a sufficient number of aligned points to be as rare as r under H0. If this number is large, the event is common, and thus not relevant one. If, on the contrary, this number is small, the event is rare, and probably a meaningful one. The threshold is chosen so that NFA(r, i) ≤ ε constitutes an ε-meaningful rectangle and produces a detection. This value is chosen to be 1.

Segment Tracking
In order to perform the tracking points that make part of line segments, the Moving Edges algorithm [2] is implemented. No prior edge extracton is required for it.
The algorithm consiston searching the correspondent point p t+1 on line l(r) t+1 in image I * t + 1 of point p t in line l(r) t . A 1D search interval {Qj, j ∈ [−J, J ]} is defined in the direction δ normal to the contour. For each point p t and for each Qj+n, in direction δ, a criterion ζj is computed. This criterion consists on the convolution value computed at Qj, using mask Mδ, which is function of the orientation of the contour. The algorithm is exemplified in figure 1.
Thus, the new position p t+1 is given by: A list of k points is produced, from which the segment extremities are extracted.

Integration into SLAM
The main goal of the LSD and ME is their utilization as a front-end for line segment based SLAM. An active-search approach was implemented in order to handle the line segment landmarks, similar to the one previously implemented for points in [4].
3 NFA(r, i) = (NM) 5/2 γ · j=k n j p j (1 − p) (n−j) (13) This refers to the number of rectangles that have a sufficient number of aligned points to be as rare as r under H0. If this number is large, the event is common, and thus not relevant one. If, on the contrary, this number is small, the event is rare, and probably a meaningful one. The threshold is chosen so that NFA(r, i) ≤ ε constitutes an ε-meaningful rectangle and produces a detection. This value is chosen to be 1.

Segment Tracking
In order to perform the tracking points that make part of line segments, the Moving Edges algorithm [2] is implemented. No prior edge extracton is required for it.
The algorithm consiston searching the correspondent point p t+1 on line l(r) t+1 in image I * t + 1 of point p t in line l(r) t . A 1D search interval {Qj, j ∈ [−J, J ]} is defined in the direction δ normal to the contour. For each point p t and for each Qj, in direction δ, a criterion ζj is computed. This criterion consists on the convolution value computed at Qj, using mask Mδ, which is function of the orientation of the contour. The algorithm is exemplified in figure 1.
Thus, the new position p t+1 is given by: A list of k points is produced, from which the segment extremities are extracted.

Integration into SLAM
The main goal of the LSD and ME is their utilization as a front-end for line segment based SLAM. An active-search approach was implemented in order to handle the line segment landmarks, similar to the one previously implemented for points in [4]. 3 Ntest is the total number of rectangles that could be aligned at certain presicion. Since rectangles are oriented, in an N × M image, there are a total of NM × NM different rectangles wuth √ NM different width values. Thus, the total number of rectangles considered is (NM) 5/2 . The presicion p takes τ /π as initial value, but a total of γ values are tried, having a number of tests of (NM) 5/2 γ.
The Number of False Alarms (NFA) is then NFA(r, i) = (NM) 5/2 γ · n j=k n j p j (1 − p) (n−j) (13) This refers to the number of rectangles that have a sufficient number of aligned points to be as rare as r under H0. If this number is large, the event is common, and thus not relevant one. If, on the contrary, this number is small, the event is rare, and probably a meaningful one. The threshold is chosen so that NFA(r, i) ≤ ε constitutes an ε-meaningful rectangle and produces a detection. This value is chosen to be 1.

Segment Tracking
In order to perform the tracking points that make part of line segments, the Moving Edges algorithm [2] is implemented. No prior edge extracton is required for it.
The algorithm consiston searching the correspondent point p t+1 on line l(r) t+1 in image I * t + 1 of point p t in line l(r) t . A 1D search interval {Qj, j ∈ [−J, J ]} is defined in the direction δ normal to the contour. For each point p t and for each Qj, in direction δ, a criterion ζj is computed. This criterion consists on the convolution value computed at Qj, using mask Mδ, which is function of the orientation of the contour. The algorithm is exemplified in figure 1.
Thus, the new position p t+1 is given by: A list of k points is produced, from which the segment extremities are extracted. (a) (b) (c) (d)

Experiments
This section includes the experimental part that tests the three main contributions of this article. The first part deals with the back-end and consists of a comparative evaluation between different landmark parametrizations. A set of simulations tests the benefits of the combination of point and line landmarks in the same map. The following part deals with the implementation of the developed segment-based SLAM front-end that includes the line segment active-search process presented in this paper. Finally, a complete heterogeneous landmark-based SLAM experiment that integrates the contributions to back-end and front-end is included.

Simulation of the Back-End for Heterogeneous SLAM
The point and line parametrizations previously presented have been tested independently in previous studies, such as [25,29,30]. This section offers a comparison of different heterogeneous approaches, including combinations of distinct landmarks on the same map. The purpose is to show the benefits of working with a heterogeneous parametrization that combines points and lines in a single map. The combinations performed are enumerated below:
AHP + AHPL

The MATLAB EKF-SLAM toolbox [44] was extended with the heterogeneous functionality to perform the simulations. Figure 10a shows the simulation environment. It consists of a house formed by 23 lines and an array of 16 points distributed uniformly among the walls.
The robot performs two different trajectories. The first is a circular path of 5 m in diameter, with a pose step of 8 cm and 0.09°. The second is a motion towards the scene, of 70 steps of 4 cm each. The linear noise is 0.5 cm and the angular noise 0.05°.
Besides the heterogeneous landmark capability of the toolbox, the transparency of the objects in the scene was also considered. By default, objects in the simulation environment of the toolbox are transparent, so landmarks are visible on almost every image frame. To work in a more realistic manner, an aspect graph was implemented to only observe visible surfaces of the house at each camera pose.
Both transparent and opaque object visualizations are shown in Figure 11. An example of a heterogeneous map constructed after a complete turn of the robot around the house is shown in Figure 10b; the parametrization used is AHP + PL. For the case of the approaching trajectory, the final heterogeneous map constructed is shown in Figure 10c; the parametrization used is AHP + AHPL. In these figures, the estimated line landmarks are displayed in green and the point landmarks in blue, while the real, predicted and estimated robot trajectories are displayed in blue, red and green, respectively. For the circular path, a trajectory of five turns, considering transparent and opaque objects, was performed for each parametrization.

Integration of Line Segment Active-Search to the SLAM Front-End
This section covers the line-based SLAM front-end that was developed, which implements the line segment active-search presented in this article. The LSD and ME algorithms were applied to an image sequence showing a piece of furniture inside a room. Figure 12 shows the operation of this segment-based front-end of an EKF-SLAM process. Infinite thin lines represent the estimated position of the landmark in the current image, while thicker segments show the match found. It can be observed that the match corresponds to the estimation in most cases.
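The gating step of such an active search can be sketched as follows. This is a generic EKF innovation gate on a predicted endpoint, with a 2-DOF chi-square threshold; the exact gating region used by the paper's front-end may differ:

```python
import numpy as np

def gate_endpoint(pred, S, obs, chi2_thresh=5.99):
    """Active-search gating sketch: accept an observed segment endpoint
    only if its Mahalanobis distance to the predicted position, under
    innovation covariance S, falls below the chi-square threshold
    (5.99 corresponds to 95% for 2 degrees of freedom)."""
    v = np.asarray(obs, float) - np.asarray(pred, float)  # innovation
    d2 = float(v @ np.linalg.solve(S, v))                 # squared Mahalanobis distance
    return d2 <= chi2_thresh
```

Only matches passing the gate are fed to the EKF update, which keeps the search region small and rejects outlier segments.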

Heterogeneous SLAM Experiment
The complete heterogeneous SLAM solution was tested with the experiment described in this section.
The mobile robot used was a CRS Robotics F3 system, with a Microsoft LifeCam Studio camera mounted on it, along with a 9DOF Razor IMU that provided the robot state estimation at each frame. A total of 401 images with a 1280 × 720 resolution form the sequence. From this robot, it was possible to obtain ground truth information with a repeatability of ±0.05 mm. The robot described a total trajectory of 0.4653 m. To obtain a prediction of the motion, the information provided by the IMU was used in a constant acceleration motion model. The inclusion of this additional sensor made it possible to cope with the inherent scale ambiguity of monocular systems. Figure 13 presents certain frames of the sequence, showing the landmarks used to update the state of the map. AHP and AHPL were the parametrizations selected for the experiments, as they were the ones that provided the best simulation results.
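The constant acceleration prediction step can be sketched as below; the function name and interface are illustrative, but the propagation equations are the standard constant-acceleration model:

```python
import numpy as np

def predict_const_accel(p, v, a, dt):
    """Constant-acceleration motion model: propagate position p and
    velocity v over one frame interval dt, given the acceleration a
    measured by the IMU."""
    p = np.asarray(p, float)
    v = np.asarray(v, float)
    a = np.asarray(a, float)
    p_new = p + v * dt + 0.5 * a * dt**2  # second-order position update
    v_new = v + a * dt                    # first-order velocity update
    return p_new, v_new
```

In an EKF, this prediction would be accompanied by the corresponding Jacobian-based propagation of the state covariance.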

Results and Discussion
For the analysis, root mean square error (RMSE) evaluation is used to compute errors. At each instant k, the estimated position (x_k, y_k, z_k) is compared to the ground truth position (x̂_k, ŷ_k, ẑ_k), giving the position error

e_k = \sqrt{(x_k − x̂_k)^2 + (y_k − ŷ_k)^2 + (z_k − ẑ_k)^2}

From the previous results, the mean and standard deviation of the error are computed as follows:

\bar{e} = \frac{1}{K} \sum_{k=1}^{K} e_k,   σ = \sqrt{\frac{1}{K} \sum_{k=1}^{K} (e_k − \bar{e})^2}

The simulation of the back-end for heterogeneous SLAM was intended to compare the different parametrizations and to show the benefits of landmark heterogeneity.
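These error measures can be computed as in the following sketch (the population form of the standard deviation is assumed):

```python
import math

def position_errors(est, gt):
    """Per-frame Euclidean position error between the estimated and
    ground-truth trajectories (lists of (x, y, z) tuples)."""
    return [math.dist(e, g) for e, g in zip(est, gt)]

def error_stats(errors):
    """Mean and (population) standard deviation of per-frame errors."""
    n = len(errors)
    mean = sum(errors) / n
    std = math.sqrt(sum((e - mean) ** 2 for e in errors) / n)
    return mean, std
```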
The parametrization with the highest error is the Plücker line. Anchored parametrizations achieved the best performance, for both points and lines. The addition of points improves the line parametrizations. Even in the anchored cases, which already perform relatively well independently, heterogeneity improves the results, such that the combination of AHP and AHPL is the one with the least error along the simulated trajectories.
The position error of the robot in the case of a circular trajectory with transparent and opaque objects is shown in Figure 14; these results are summarized in Table 1. For the case of the approaching trajectory, the results are shown and summarized in Figure 15 and Table 2.

The complete SLAM experiment, integrating the back-end with the developed front-end, is used to compare the heterogeneous approach with a classic point-based SLAM applied to the same sequence. The ground truth and estimated trajectories for each SLAM approach tested are shown in Figure 16. Figure 17 presents results in terms of the robot position estimation error, comparing the IMU estimation to both point-based and heterogeneous SLAM. As can be observed, the heterogeneous approach results in lower errors, as previously seen in the simulation part. Near Frame 200, there was a change in the motion direction of the robot, which can be seen as a peak in the position estimation error graph. Even in this case, heterogeneous SLAM achieved a better performance than the point-based SLAM. Table 3 shows a comparative summary of the errors from the three cases.

Conclusions
The purpose of this paper is to prove the benefits of including heterogeneous landmarks when building a map from an EKF-based visual SLAM method. Several authors have shown interest in the use of heterogeneous landmarks. For the front-end, the interest in the joint tracking of points and lines is found in [45], while for the back-end, a theoretical study is presented in [31], and preliminary results of an EKF-based SLAM method based on heterogeneous landmarks are presented in [46]. The experiments described by these authors have shown that robot localization and SLAM stability can be improved by combining several landmark types, i.e., points and lines. The use of monocular vision alone provides only partial observations of landmarks by features extracted from images; here, undelayed initialization of landmarks is used, as proposed initially by Solà et al. [25,29] for points and lines. The use of simulated data has shown how the choice of the landmark representation has an impact on the accuracy of the map. The best ones, considering the construction of a map with heterogeneous landmarks, are anchored homogeneous points (AHP) and anchored homogeneous-points lines (AHPL). These parametrizations were used in a complete heterogeneous SLAM experiment that produced better results than the classic point-based case, by reducing the camera position estimation error.
Another contribution of this paper is the method proposed for a segment-based SLAM front-end. This method relies on the line segment active-search process presented in this article and on state-of-the-art line detection and matching processes. The methods that compose the resulting front-end were discussed, first recalling their theoretical background and then presenting experimental evaluations on image sequences that show the stability of the process.
Finally, a complete heterogeneous landmark-based SLAM experiment was presented, integrating the contributions to the back-end and the front-end and confirming the results obtained independently.
In future work, constraints will be exploited in the map, typically when points and lines are extracted from known portions of the scene.