Multi-Robot 2.5D Localization and Mapping Using a Monte Carlo Algorithm on a Multi-Level Surface

Most indoor environments have wheelchair adaptations or ramps, providing an opportunity for mobile robots to navigate sloped areas while avoiding steps. These indoor environments with integrated sloped areas are divided into different levels. Multi-level areas represent a challenge for mobile robot navigation due to the sudden change in the readings of reference sensors such as visual, inertial, or laser scanning instruments. Using multiple cooperative robots is advantageous for mapping and localization, since they permit rapid exploration of the environment and provide higher redundancy than a single robot. This study proposes a multi-robot localization method using two robots (a leader and a follower) to perform fast and robust environment exploration in multi-level areas. The leader robot is equipped with a 3D LIDAR for 2.5D mapping and a Kinect camera for RGB image acquisition. Using the 3D LIDAR, the leader robot obtains information for particle localization, with particles sampled from the walls and obstacle tangents. We employ a convolutional neural network on the RGB images for multi-level area detection. Once the leader robot detects a multi-level area, it generates a path and sends a notification to the follower robot to move to the detected location. The follower robot utilizes a 2D LIDAR to explore the boundaries of the even areas and generate a 2D map using an extension of the iterative closest point algorithm. The 2D map is utilized as a re-localization resource in case of failure of the leader robot.


Introduction
For mapping and localization on an uneven or multi-level surface, a 3D LIDAR is needed. However, 3D LIDARs are expensive, and processing their data demands considerable computing power and resources. A single small robot cannot process both 3D localization and mapping for real-time navigation. Our previous work [1] studied mapping and localization on uneven surfaces; here, we extend that experiment with a second robot, which shortens the mapping and exploration time and reduces the computational cost.
The present experiment is intended as a starting point for a larger multi-robot team, in which adding more robots would further speed up the mapping task.
Multi-robot mapping and exploration encounter sparse features in large environments. Alternatives for 2.5D and 3D mapping include 2D-3D feature matching [2] and online point cloud segmentation. Multi-robot systems reduce the mapping time and improve map accuracy and robot localization [3][4][5]. The iterative closest point (ICP) and random sample consensus (RANSAC) methods are solutions for point cloud registration, but both can become trapped in local minima. Merging image keypoints into the point cloud can also reduce the local-minimum error [6]. LIDAR odometry and mapping (LOAM) [7] uses point features on edges and planes for scan registration but lacks loop closure in large-scale maps.
Multi-robot cooperation is also advantageous in narrow slopes and crowded spaces. Input from multiple robots renders the system more fault-tolerant than its single-robot counterparts, and the overlap of multiple measurements compensates for sensor uncertainties. Nevertheless, multi-robot systems have limitations in exploring uneven spaces because of abrupt changes from flat surfaces [46]. There is extensive work on single- and multi-robot systems for 2D flat surfaces. However, robot localization is significantly affected when the robots need to explore areas with multiple levels. Thus, the major problem for a multi-robot system is identifying and solving multi-robot localization. Multi-robot navigation should be capable of localizing each robot, distinguishing slopes from staircases, and ascertaining a safe path.
This work proposes a robot framework for cooperative robot navigation in indoor environments with multi-level surfaces. We used two robots to validate the proposed method. The leader robot explores the uneven areas of the multi-level access using a Faster R-CNN to detect indoor ramps. By contrast, the follower robot examines the boundaries of the even areas using a 2D LIDAR. We propose a novel 2.5D mapping approach to generate a 2.5D map while the robot explores an environment with multi-level terrain. Furthermore, we develop a 2D scan merging method to generate a map and obtain a resource to back up the robot localization. In addition, autonomous navigation in uneven and unstructured environments is helpful for mobile robots and provides meaningful information for the design of smart wheelchairs. The remainder of this paper is organized as follows. Section 2 addresses the ramp detection using Faster R-CNN, feature extraction, map merging, and localization. Section 3 provides the experimental results, and Section 4 presents the conclusions.

System Overview
The proposed experiment uses a base station computer as a manager for the data collected from the two robots. The wheel odometry of both robots constitutes the primary reference for the pose distribution used in the Monte Carlo localization. We weight the particles according to the input from each LIDAR. To ascertain the correspondences among the point clouds, we translate them to the pose given by the particle estimation.
The leader robot generates a global map from the collected LIDAR point clouds. The map can be updated using SLAM according to pose graph optimization and the LIDAR odometry. The follower robot uses the reference generated by the leader robot to identify multi-level access. Once the follower robot enters the new level, it explores all 2D boundaries on the floor. The diagram for the proposed multi-robot system is presented in Figure 1. The follower robot relies on the inertial measurement unit (IMU) to restrict its exploration to the X-Y plane; it only admits readings in the Y-Z plane when the leader robot provides a new path.
This work utilizes pose-graph optimization to update a map M. The contributions of this study are as follows: a multi-robot localization according to a Monte Carlo algorithm for multi-level areas, an extension of 2D ICP map merging, and a multi-robot exploration of multi-level areas using a deep CNN.

Faster R-CNN for Indoor Ramp Detection
Our multi-robot system must determine the location of multi-level areas. The leader robot uses a Faster R-CNN [47] for real-time object detection. The Faster R-CNN detector adds a region proposal network that generates region proposals directly in the network, thereby improving object detection. First, training images were collected using a Kinect camera and then resized to a resolution of 224 × 224. Next, each image was divided into grids and assigned a bounding box. A single CNN runs once on every image. The network consists of 15 convolutional layers followed by two fully connected layers. During training and testing, the Faster R-CNN checks the entire image. Figure 2 shows (a) the diagram for the Faster R-CNN training, (b) the 2D image captured by the Kinect camera, and (c) the detected ramp.
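As a rough illustration of this detection step, the sketch below runs a Faster R-CNN-style detector on a resized Kinect frame and keeps high-confidence ramp boxes. It uses torchvision's reference Faster R-CNN as a stand-in for the authors' 15-layer network; the file name, class count, and score threshold are assumptions.

```python
# Minimal sketch of ramp detection with a Faster R-CNN-style detector.
# torchvision's reference model is used as a stand-in for the authors' network
# (15 convolutional + 2 fully connected layers); "kinect_frame.png" is a
# hypothetical file name and the weights would come from training on the
# Kinect image set (they are random here).
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Two classes: background and "ramp"
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None, num_classes=2)
model.eval()

image = Image.open("kinect_frame.png").convert("RGB").resize((224, 224))
with torch.no_grad():
    detections = model([to_tensor(image)])[0]

# Keep confident ramp detections only (threshold is an illustrative choice)
for box, score in zip(detections["boxes"], detections["scores"]):
    if score > 0.8:
        print("ramp at", box.tolist(), "score", float(score))
```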

3D Point Clouds and 2D Feature Extraction
The leader robot performs three tasks: detecting uneven areas in the scenery, generating a 3D dense point cloud map, and extracting 2D features from the 3D point cloud.
Based on our previous research [1], the leader robot generates a 2D occupancy grid map $OM_A$ using features obtained from a 3D point cloud for localization. The 2D features represent the main edges of the 3D point cloud. Figure 3a shows the projection of the 3D point cloud on a 2D plane. The range of interest (ROI) is set to detect the significant component of the 3D point cloud. To remove the invalid floor points, we employed the random sample consensus (RANSAC) algorithm. The edge points are denoted as $E_c^1$ and arranged into polar coordinates. Then, the angle $\Delta\theta$ between consecutive points $(r_n, r_{n+1})$ of the polar coordinates can be obtained as Figure 3b shows. The distance between those two points is given by Equation (1):

$$D(r_n, r_{n+1}) = \sqrt{r_n^2 + r_{n+1}^2 - 2 r_n r_{n+1} \cos(\Delta\theta)}. \qquad (1)$$
Given the 3D LIDAR resolution, the distance threshold used to split the points is $D_{thd} = 0.02$ m. The ROI was divided into segments $Sg_k = \{p_1, p_2, \ldots, p_N\}$, where $\{p_1, p_2, \ldots, p_N\}$ is the set of points of each segment. The created segments allow us to generate and label features. The orientation of each feature line is $\Delta\theta$. Figure 3b illustrates how the points were split into segments. Using the described segments $Sg_k$ and orientation $\Delta\theta$, we created a set of features $F_{L:A}$.
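For concreteness, a minimal sketch of this splitting step is shown below: consecutive polar points are compared with Equation (1), a gap larger than $D_{thd} = 0.02$ m starts a new segment $Sg_k$, and each segment yields one feature. Representing a feature by its segment and end-point orientation is an illustrative choice, not necessarily the authors' exact data structure.

```python
import numpy as np

def split_edge_points(points_xy, d_thd=0.02):
    """Split 2D-projected edge points into segments Sg_k using Equation (1).
    points_xy: M x 2 edge points already projected onto the 2D plane."""
    # Arrange the points in polar coordinates (r, theta)
    r = np.hypot(points_xy[:, 0], points_xy[:, 1])
    theta = np.arctan2(points_xy[:, 1], points_xy[:, 0])

    segments, current = [], [points_xy[0]]
    for n in range(len(r) - 1):
        d_theta = theta[n + 1] - theta[n]
        # Equation (1): distance between consecutive polar points
        d = np.sqrt(r[n]**2 + r[n + 1]**2 - 2 * r[n] * r[n + 1] * np.cos(d_theta))
        if d > d_thd:                      # gap larger than the threshold: new segment
            segments.append(np.array(current))
            current = []
        current.append(points_xy[n + 1])
    segments.append(np.array(current))

    # One feature per segment: its points and the orientation of its end points
    features = [{"segment": seg,
                 "orientation": float(np.arctan2(*(seg[-1] - seg[0])[::-1]))}
                for seg in segments if len(seg) > 1]
    return features
```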

3D Point Clouds and 2D Feature Extraction
The leader robot performs three tasks: detecting uneven areas in the scenery, gene ating a 3D dense point cloud map, and extracting 2D features from the 3D point cloud.
Based on our previous research [1], the leader robot generates a 2D occupancy gri map using features obtained from a 3D point cloud for localization. The 2D feature represent the main edges of the 3D point cloud. Figure 3a shows the projection of the 3D point cloud on a 2D plane. The range of interest (ROI) is set to detect the significant com ponent of the 3D point cloud. To remove the invalid floor points, we employed the Ran dom sampling consensus algorithm (RANSAC). The edge points are denoted as an arranged into polar coordinates. Then, the angle ∆ between the consecutive poin ( , ) of the polar coordinates can be obtained as Figure 3b shows. The distance be tween those two points is given by Equation (1).

2D Mapping and Map Merging
The follower robot explores the flat area boundaries. The set of 2D LIDAR scans collected by the follower robot was divided according to the input received from the leader robot. When the leader robot detects a ramp, the follower robot crosses to it, thereby creating a new set of 2D LIDAR scans $s_{L:B}$. Then, using $s_{L:B}$, the follower robot filters each scan using the Voronoi diagram and Delaunay triangulation (Algorithm 1, Line 4). After filtering the scans, the follower robot creates an occupancy grid map $OM_B$.
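The excerpt does not give the exact filtering criterion used with the Voronoi diagram and Delaunay triangulation, so the sketch below only illustrates the idea with the Delaunay half: a scan point is kept when it shares at least one short Delaunay edge with a neighbor, which discards isolated outliers. The edge-length threshold is an assumption.

```python
import numpy as np
from scipy.spatial import Delaunay

def filter_scan(points, max_edge=0.3):
    """Drop isolated outlier points from a 2D LIDAR scan.

    Assumption: a point is kept only if it participates in at least one
    Delaunay edge shorter than `max_edge` (illustrative criterion only)."""
    tri = Delaunay(points)
    keep = np.zeros(len(points), dtype=bool)
    for simplex in tri.simplices:            # each triangle: 3 vertex indices
        for i in range(3):
            a, b = simplex[i], simplex[(i + 1) % 3]
            if np.linalg.norm(points[a] - points[b]) < max_edge:
                keep[a] = keep[b] = True
    return points[keep]

# Example: a slightly noisy wall segment plus one far-away outlier
rng = np.random.default_rng(0)
wall = np.column_stack([np.linspace(0.0, 2.0, 50),
                        1.0 + rng.normal(scale=0.01, size=50)])
scan = np.vstack([wall, [[10.0, 10.0]]])
print(filter_scan(scan).shape)   # (50, 2): the outlier is removed
```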

We propose an extended version of the ICP for scan merging and further occupancy grid map creation. For every robot pose, $M^{i-1}$ is the reference scan and $S_c^i$ is the current scan. Given the rotation matrix $R = R_\theta$ and the translation $t$, ICP computes the alignment error $E$ between the two data sets and determines the rotation $R$ and translation $t$ that minimize Equation (2):

$$E(R, t) = \sum_{k=1}^{N_r} \sum_{j=1}^{N_c} w_{k,j} \left\| M_k^{i-1} - \left( R\, S_{c,j}^i + t \right) \right\|^2, \qquad (2)$$
where $N_r$ and $N_c$ are the numbers of points in $M^{i-1}$ and $S_c^i$, respectively, and $w_{k,j}$ is 1 if $M_k^{i-1}$ is the closest point to $S_{c,j}^i$ and 0 otherwise. ICP rotates the scanned data $S_{c,j}^i$ by $\theta$ and translates it by $t$ to obtain the best alignment with the reference data $M^{i-1}$. We started the mapping and localization by matching the pose $[0\ 0\ 0]^T$ with the coordinate frame.
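A minimal sketch of one such ICP iteration is given below: nearest-neighbor correspondences stand in for the weights $w_{k,j}$, and the rotation $R$ and translation $t$ that minimize the alignment error are obtained in closed form through an SVD. This is a generic ICP step, not the authors' extended version.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_step(M_prev, S_cur):
    """One ICP iteration between reference scan M_prev and current scan S_cur
    (both N x 2 arrays): nearest-neighbour correspondences (the weights w_{k,j}
    of Equation (2)), then the closed-form rotation R and translation t."""
    # Correspondences: for each current point, its closest reference point
    idx = cKDTree(M_prev).query(S_cur)[1]
    M_match = M_prev[idx]

    # Closed-form best-fit rigid transform (Kabsch/SVD)
    mu_m, mu_s = M_match.mean(axis=0), S_cur.mean(axis=0)
    U, _, Vt = np.linalg.svd((S_cur - mu_s).T @ (M_match - mu_m))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # keep a proper rotation
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_m - R @ mu_s

    aligned = S_cur @ R.T + t
    E = np.sum(np.linalg.norm(M_match - aligned, axis=1) ** 2)
    return R, t, E
```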
Using ICP scan matching, we obtained a pose correction vector $[x_{sm}^i\ \ y_{sm}^i\ \ \theta_{sm}^i]^T$. The pose correction vector derives the homogeneous coordinate transformation matrix $H$. By employing the 2D geometric transformation, $H$ was expanded to a 3 × 3 matrix, as shown in Equation (3):

$$H = \begin{bmatrix} \cos\theta_{sm}^i & -\sin\theta_{sm}^i & x_{sm}^i \\ \sin\theta_{sm}^i & \cos\theta_{sm}^i & y_{sm}^i \\ 0 & 0 & 1 \end{bmatrix}. \qquad (3)$$
Then, the pose of the robot can be updated by Equation (4):

$$X^i = X^{i-1} + H \left[ \Delta x^i \ \ \Delta y^i \ \ \Delta\theta^i \right]^T, \qquad (4)$$

where $[\Delta x^i\ \Delta y^i\ \Delta\theta^i]^T$ is the difference between the $i$-th and the $(i-1)$-th pose of the robot as estimated through the odometer. Scan merging combines the reference data set $M^{i-1}$ and the current data set $S_c^i$ into a new data set $M^i$. Then, we determined the outlier points and validated the workspace scans. Using a sparse point map, we avoid point duplication. The map of the environment was incrementally built according to Equation (5):

$$M^i = M^{i-1} \cup \left\{ (x_q, y_q) \in S_c^i \;\middle|\; \sqrt{(x_q - x_p)^2 + (y_q - y_p)^2} > d_{th} \ \ \forall\, (x_p, y_p) \in M^{i-1} \right\}, \qquad (5)$$

where $(x_q, y_q)$ is a data point of the current scan $S_c^i$, $(x_p, y_p)$ is a data point in $M^{i-1}$, and $d_{th}$ is the threshold value for the scan merging. The sensor range determines the proper value of $d_{th}$ to achieve optimal scan alignment. The initial state of $S_c^i$ is data line "1", and $d_{th}$ is the threshold circle radius.
The merging was executed from left to right, and the threshold circle moves as the data line increases. The new reference $M^{i-1}$ was obtained from data line "5". The larger the value of $d_{th}$, the smaller the number of points on the map. Figure 4 shows the reference scan $M^{i-1}$ (red triangles) and the current scan $S_c^i$ (blue circles). Figure 4a presents a circle moving from the previous position $p_{i+1}$ (line "1") to the new position $p_i$ (line "2"). Points that do not belong to $p_i$ and $p_{i+1}$ are the new scanned points. Figure 4b shows lines "1" and "4" as a duplicated data set and lines "2" and "3" as new scanned points. Our method merges LIDAR scans using a circle threshold that omits adjacent points. If the Euclidean distances between $p_i$, $p_{i-1}$, and $p_{i+1}$ are shorter than the threshold $d_{th}$, then the points are invalid. Algorithm 1, Line 6 includes the described scan merging. Figure 5a-c shows the different stages of the map merging using the segmented data set.
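A minimal sketch of the threshold-circle merging of Equation (5) follows: a point of the aligned current scan is added to the sparse map only if no existing map point falls inside its threshold circle of radius $d_{th}$. The value of $d_{th}$ in the example is illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def merge_scan(M_prev, S_cur_aligned, d_th=0.1):
    """Merge an already-aligned current scan into the sparse point map
    (Equation (5)): a scanned point is added only if no existing map point
    lies within the threshold circle of radius d_th."""
    dist, _ = cKDTree(M_prev).query(S_cur_aligned)
    new_points = S_cur_aligned[dist > d_th]      # keep non-duplicated points only
    return np.vstack([M_prev, new_points])

# Example: two overlapping scans of the same wall segment
M = np.column_stack([np.linspace(0.0, 1.0, 11), np.zeros(11)])
S = np.column_stack([np.linspace(0.5, 1.5, 11), np.zeros(11)])
print(merge_scan(M, S, d_th=0.05).shape)   # only the non-overlapping half is added
```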

Communication
To establish multi-robot communication, we used an ad hoc on-demand distance vector (AODV) protocol and a WLAN access point. The AODV is a package of the Robot Operating System (ROS) and supports unicast and multicast transmission. ROS allows us to create nodes for multi-robot communication. The AODV uses an automatic repeat-request data link to achieve reliable data transfer. The robots do not communicate with each other directly. Instead, the AODV transmits the data using a raw socket, thereby avoiding kernel space. Once each robot has received the data, the master computer publishes the AODV package. With the mentioned protocol and the ROS node-publication method, we establish flexible communication among the robots. Figure 6a shows the multi-robot communication diagram. The master station receives a ROS message containing the occupancy grid created by each robot. The master computer then merges the multi-robot trajectories into a single occupancy grid map. Figure 6b presents the communication flowchart.
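The sketch below illustrates the ROS side of this scheme with rospy: each robot publishes its occupancy grid on its own topic, and a master node merges the two grids and republishes the result. The topic names, the assumption of a shared origin and resolution, and the cell-wise max merge rule are illustrative; the AODV routing layer underneath is not shown.

```python
#!/usr/bin/env python
# Minimal sketch (not the authors' implementation) of the master-station node.
import rospy
from nav_msgs.msg import OccupancyGrid

merged_pub = None
latest = {"robot_a": None, "robot_b": None}

def grid_callback(msg, robot_id):
    latest[robot_id] = msg
    a, b = latest["robot_a"], latest["robot_b"]
    if a is None or b is None:
        return
    # Naive cell-wise merge; assumes both grids share origin and resolution
    merged = OccupancyGrid()
    merged.header = a.header
    merged.info = a.info
    merged.data = [max(x, y) for x, y in zip(a.data, b.data)]
    merged_pub.publish(merged)

if __name__ == "__main__":
    rospy.init_node("map_merger")
    merged_pub = rospy.Publisher("merged_map", OccupancyGrid, queue_size=1)
    rospy.Subscriber("robot_a/map", OccupancyGrid, grid_callback, "robot_a")
    rospy.Subscriber("robot_b/map", OccupancyGrid, grid_callback, "robot_b")
    rospy.spin()
```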

Multi-Robot Localization
We propose a multi-robot localization using a Monte Carlo algorithm from our previous study [1]. The leader robot provides the starting point as the initial pose. Each robot uses its sensor readings for particle estimation in an iterative process: each robot moves, senses, and re-samples to determine its pose. Single-robot localization can be executed when the robots have mutual poses. This work assumes that the initial robot pose is known, but the robots do not have global positioning. The leader robot extracts 2D features from the 3D point cloud projection and generates an occupancy grid map $OM_A$. The follower robot then uses 2D LIDAR scans to localize itself in the occupancy grid map $OM_B$.
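As a rough illustration of the move-sense-resample cycle, the sketch below implements one step of a generic particle filter against an occupancy grid: particles are propagated with noisy odometry, weighted by how well the projected LIDAR points hit occupied cells, and resampled. The likelihood model and noise values are assumptions, not the authors' exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def mcl_step(particles, weights, odom_delta, scan_xy, grid, resolution,
             motion_noise=(0.02, 0.02, 0.01)):
    """One move-sense-resample cycle of Monte Carlo localization.
    particles: N x 3 array of (x, y, theta); grid: 2D occupancy array (1 = occupied)."""
    # 1. Move: apply the odometry increment in each particle's frame, plus noise
    dx, dy, dth = odom_delta
    c, s = np.cos(particles[:, 2]), np.sin(particles[:, 2])
    particles[:, 0] += c * dx - s * dy + rng.normal(0, motion_noise[0], len(particles))
    particles[:, 1] += s * dx + c * dy + rng.normal(0, motion_noise[1], len(particles))
    particles[:, 2] += dth + rng.normal(0, motion_noise[2], len(particles))

    # 2. Sense: weight each particle by how many projected scan points hit occupied cells
    for i, (px, py, pth) in enumerate(particles):
        c, s = np.cos(pth), np.sin(pth)
        gx = ((px + c * scan_xy[:, 0] - s * scan_xy[:, 1]) / resolution).astype(int)
        gy = ((py + s * scan_xy[:, 0] + c * scan_xy[:, 1]) / resolution).astype(int)
        inside = (gx >= 0) & (gx < grid.shape[1]) & (gy >= 0) & (gy < grid.shape[0])
        weights[i] = grid[gy[inside], gx[inside]].sum() + 1e-6
    weights /= weights.sum()

    # 3. Resample: draw particles proportionally to their weights
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))
```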
The leader robot provides the initial position for localization using the Monte Carlo algorithm. Then, the follower robot proceeds with the localization in the occupancy grid map $OM_B$ using the features $F_{L:A}$ described in Section 2.2, as shown in Algorithm 2, Line 5. To obtain the multi-robot localization, we merge the follower robot trajectory onto the leader robot occupancy grid map $OM_A$, as shown in Algorithm 2, Line 6. Lastly, we ascertain each robot's global position in an occupancy grid map. Figure 7 shows the occupancy map created by each robot: (a) the map for the leader robot, (b) the follower robot map, and (c) the combination of both in one map.
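The trajectory merging step can be pictured as follows: each follower pose is transformed from frame B into the leader frame A with a 2D rigid transform and then rasterized into the leader occupancy grid. The transform value, grid size, and the label used to mark trajectory cells in this sketch are illustrative.

```python
import numpy as np

def merge_trajectory(leader_grid, follower_traj, T_ab, resolution, origin=(0.0, 0.0)):
    """Project the follower robot trajectory onto the leader occupancy grid.
    T_ab = (tx, ty, theta) is the assumed follower-to-leader frame transform."""
    tx, ty, th = T_ab
    c, s = np.cos(th), np.sin(th)
    merged = leader_grid.copy()
    for x_b, y_b in follower_traj:
        # Transform the pose from the follower frame B to the leader frame A
        x_a = c * x_b - s * y_b + tx
        y_a = s * x_b + c * y_b + ty
        col = int((x_a - origin[0]) / resolution)
        row = int((y_a - origin[1]) / resolution)
        if 0 <= row < merged.shape[0] and 0 <= col < merged.shape[1]:
            merged[row, col] = 2          # mark trajectory cells with a distinct label
    return merged

# Example: 10 m x 10 m grid at 0.1 m resolution, follower moving along a corridor
grid = np.zeros((100, 100), dtype=np.int8)
traj = [(0.5 * k, 1.0) for k in range(10)]
print((merge_trajectory(grid, traj, T_ab=(2.0, 3.0, np.pi / 2), resolution=0.1) == 2).sum())
```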

Results
Our multi-robot system has two robots: Robot A (leader) and Robot B (follower). The leader robot is equipped with a 3D LIDAR, a Kinect camera, a Kobuki robot platform, and a laptop running Linux Ubuntu 14.04. The follower robot is equipped with a 2D LIDAR, a Kobuki platform, and a laptop running Linux Ubuntu 14.04. The master station uses MATLAB with the Robotics Toolbox 1.4 on an Intel i7 processor. Figure 8a depicts the two robots used for the experiment, and Figure 8b-d shows the experiment location. For the experiment, we use a university location provided with four ramps with 10-degree slopes. The ramps allow us to test the multi-robot performance in an uneven space. The leader robot collects the features using the process described in Section 2.2. The follower robot uses raw information from a 2D LIDAR scan.

The experiment coordinate systems are as follows: the world coordinates (x, y, z), the leader robot coordinates $(x_A, y_A, z_A)$, and the follower robot coordinates $(x_B, y_B, z_B)$. The robot coordinates are always parallel to each robot's velocity. Axes $z_A$ and $z_B$ are perpendicular to the ground. Both robots move with a linear velocity of 0.05 m/s. The 2D and 3D LIDAR sensor axes are concurrent with the follower and leader robots, respectively. The speeds of both robots, $V_A$ and $V_B$, are considered non-slip speeds. The leader robot takes a scan every 0.5 m, and the follower robot every 0.1 m. The leader robot sampling period is ∆t = 1 s, and that of the follower robot is ∆t = 0.5 s. The follower robot has a shorter sampling period because 2D LIDAR scans are lighter than their 3D LIDAR counterparts. With the described sampling times, both robots have enough time for data acquisition from the LIDAR and odometry sensors. The proposed multi-robot system uses 3D point cloud information to identify features and recreate a 2.5D dense map. By contrast, the 2D LIDAR scans allow for fast localization on the even space. The ground truth for the robot localization was obtained using the odometer and the inertial measurement unit (IMU) integrated into the robot.
The information collected from these sensors was fused using the Extended Kalman Filter (EKF) within a ROS node.
We calculated the root-mean-square error (RMSE) in meters for the scan registration. Table 1 lists the RMSE errors for both robots. Figure 9 shows the multi-robot estimated and ground-truth trajectories: Figure 9a,b are the trajectories of the leader and follower robots, respectively, and Figure 9c shows the combined trajectory for both robots. Because the leader robot uses a 3D LIDAR for exploration, its trajectory is shorter than that of the follower robot. Table 1 indicates that the leader and follower robots' scanning errors are approximately 0.2 m in the X and Y axes, which is acceptable for robot mapping and exploration. The leader robot collects 3D point clouds for every pose and creates a 2.5D dense map. We used a box grid filter (BGF) with a voxel size of 0.1 m to down-sample each 3D point cloud. Once the robots completed the exploration, each robot generates a map. The leader robot generates a 2.5D dense map including all the collected 3D LIDAR point clouds. The 2.5D dense map created by the leader robot includes the floor points; the floor points are removed only for the generation of the occupancy grid map, as described in Section 2.2. The follower robot recreates the 2D map using all the collected 2D LIDAR scans; the 2D LIDAR map was merged and filtered using the process described in Section 2.3.
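The RMSE evaluation can be reproduced with a short routine such as the one below, which pairs each estimated point with its nearest ground-truth point through a kd-tree and then computes the root-mean-square of the pairwise distances. The synthetic offset in the example is illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def trajectory_rmse(estimated, ground_truth):
    """RMSE (in metres) between an estimated trajectory and the ground truth.
    Each estimated point is paired with its nearest ground-truth point via a
    kd-tree before the error is computed."""
    dist, _ = cKDTree(ground_truth).query(estimated)   # nearest-neighbour distances
    return float(np.sqrt(np.mean(dist ** 2)))

# Example with a small synthetic offset of 0.2 m
gt = np.column_stack([np.linspace(0, 5, 50), np.zeros(50)])
est = gt + np.array([0.0, 0.2])
print(trajectory_rmse(est, gt))   # ~0.2
```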
Filtering the 3D point clouds enables quick registration while maintaining accurate results. Figure 10a depicts the 2.5D map generated by the leader robot. Figure 10b shows the 2D map generated by the follower robot following the process described in Section 2.3. To validate our method, we compare our results with those of two state-of-the-art methods: generalized ICP (GICP) and LOAM. As the proposed method was designed for multiple robots, we test the performance of each robot. To validate the obtained trajectory, we measure (in meters) the root-mean-square error (RMSE) between the ground truth and the estimated trajectory. First, we employed two search algorithms, kd-tree and k-nearest-neighbor, which are optimal ways to find the distance between two neighboring points. Second, once the closest neighbor pair was found, the RMSE was calculated from the distance between the two neighboring points. Tables 2 and 3 show the localization error for each robot. LOAM and GICP depend on features extracted from even surfaces. As an uneven surface induces a sudden point cloud rotation, our method provides an accurate response for multi-robot localization on multi-level spaces.
Table 2 shows the error for the leader robot, for which our method's error in axis Y is approximately 0.15 m lower than that of the GICP and LOAM. Table 3 shows the error for the follower robot. Our method's errors in axes X and Y are approximately 0.2 m lower than those of the GICP and LOAM, and the processing time is also lower because we enhance the filtering and map merging for 2D scans. Table 4 shows the overall errors of the multi-robot system, for which our method reduces the error in axes X and Y. The processing time of our technique is likewise lower than that of the GICP and LOAM. As our proposed method includes using a CNN, the 3D-2D exploration is versatile and allows for robot mapping with an optimal result. To provide a quantitative and qualitative comparison of the three methods, we only used MATLAB and ROS for programming.
Table 4. Total errors and times for the multi-robot system using our proposed method versus the generalized iterative closest point (GICP) technique and the LIDAR odometry and mapping (LOAM) approach. The lowest values in axes X and Y and the consumed time are denoted in bold.

Conclusions
This work enables multi-robot mapping and exploration of multi-level surfaces. Our mapping approach simplifies the global representation and localization using a merged occupancy grid map. The experimental results show a robust response in an environment with multi-level surfaces. Our proposed 2D LIDAR scan merging method reduces the localization error along the x and y axes. The trained Faster R-CNN has a robust response for detecting ramps in indoor environments. In future work, we shall extend the method to larger indoor and outdoor mapping scenery.