Map Merging with Suppositional Box for Multi-Robot Indoor Mapping

For mapping unknown indoor environments, multi-robot collaborative mapping is more efficient than single-robot mapping. Map merging is one of the fundamental problems in multi-robot collaborative mapping. However, when grid maps are merged with image processing methods such as feature matching, the feature matching rate is low. Driven by this challenge, a novel map merging method is proposed, based on a suppositional box constructed from right-angled points and vertical lines. First, the right-angled points of the suppositional box are selected from the vertical points, which are the intersections of perpendicular lines. Second, the suppositional box is constructed in the map based on the common-edge characteristics between the right-angled points. Then, the transformation matrix is obtained from the matched pairs of suppositional boxes. Finally, to handle matching errors based on the length of pairs, a Kalman filter is used to optimize the transformation matrix. Experimental results show that this method can effectively merge maps in different scenes and that its successful matching rate is greater than that of other features.


Introduction
Mapping is the first step in mobile robot navigation. Some robot tasks, such as ruin detection or indoor rescue, require high efficiency and quality in the mapping process. However, building a global map with a single robot has many disadvantages in large-scale indoor environments. For example, long mapping sessions by a single robot lead to a large accumulation of errors, which cannot meet high-precision requirements. Therefore, in recent years, many scholars have focused on multi-robot collaborative mapping. In a multi-robot system, the robots move around the indoor environment in a distributed manner, and a global map is constructed by merging the local maps built simultaneously by each robot [1]. However, since the local maps do not share a common coordinate frame, they cannot be merged directly. There are usually two ways to solve this problem: either the transformations between local maps are given in advance, or they are calculated by methods such as feature matching.
Accordingly, current map merging methods are mainly divided into methods with known initial pose and methods with unknown initial pose. Some methods with known initial pose extend a single-robot SLAM (Simultaneous Localization And Mapping) method to a multi-robot system, while others incorporate optimization algorithms to reduce the cumulative error. An important challenge for these methods is that estimating the initial pose is complicated and subject to error. When the initial pose is unknown, map merging is carried out mainly through rendezvous, optimization, or feature matching. Rendezvous methods derive the transformation between local maps, combined with an optimization algorithm, by using sensors to measure the robots' relative positions when they meet. The problem is that face-to-face observation places high requirements on the sensor and on the angle at the measurement site. Optimization-based methods mainly search for overlapping areas between maps by using artificial intelligence algorithms such as the GA (Genetic Algorithm). However, most optimization algorithms require a long matching time, which cannot meet high-efficiency requirements. Feature-matching methods mainly use traditional geometric features, such as points and straight lines. However, in a two-dimensional grid map, each grid cell is assigned one of three states (free, occupied, or unknown) by a probability threshold. During feature extraction, cells with only three possible states produce a large number of similar features, which results in a high mismatching rate. Similarly, the number of linear features in an indoor environment is very large, so the matching process easily falls into a local extremum.
Our motivation is to combine point and line features, because both are robust features in grid maps despite their drawbacks when matched separately. The suppositional box is a virtual feature that we construct on the map from vertical points and virtual straight lines. It is invariant in the grid map and numerous enough to be matched easily. After matching, a pair of suppositional boxes yields additional corresponding point or line features, which helps improve the merging accuracy. Note that our research targets two-dimensional grid maps of indoor environments and does not require the initial relative pose to be known. Our proposal is to (i) find the common area between local maps based on suppositional boxes; (ii) use the side lengths of the suppositional box and the position constraints between its four right-angled points to calculate a rough transformation; and (iii) use a Kalman filter to optimize the transformation matrix, reducing the errors introduced during suppositional box matching and yielding a precise transformation. The method framework is shown in Figure 1.
Electronics 2021, 10, x FOR PEER REVIEW

Initial Pose Known
When the initial pose is known, the relative position can be used directly to obtain the transformation between the local maps, but errors accumulate over time. Reid et al. [2] used the gradient descent method to minimize these errors: the relative position transformation at a certain time was obtained from the odometer, and gradient descent was then used to find the optimal transformation. Other optimization methods, such as particle filtering [3], can also be used to reduce errors. Alternatively, PF-SLAM (Particle Filter SLAM) has been extended to multiple robots [4]. That work treats the cases of known and unknown initial position separately. Each particle contains the trajectory, map, and weight value of each robot. When the initial position is known, the multi-robot formulas can be derived directly. When the initial position is unknown, which is the most common case in practical applications, the mutual position relationship is generally determined when the robots meet. Similarly, Sasaoka et al. [5] studied multi-robot SLAM based on information fusion and the EKF (Extended Kalman Filter).

Initial Pose Unknown-Rendezvous
In the rendezvous case, sensors are used to measure the relative position when robots are close to each other. However, recognition by observation alone produces some errors. Thus, some studies focus on landmark matching [6], on combining the PLICP (Point-to-Line Iterative Closest Point) algorithm with a KD (K-Dimension) tree [7], or on two-dimensional code scanning [8] to improve the accuracy of the merged map. In addition, Cho et al. [9] used one-way observation to obtain a coarse transformation and then combined a curvature registration method with a particle swarm smoothing algorithm to increase merging precision. Konolige [10] proposed a hypothesis verification method. When two robots meet, one robot uses data received from the other to estimate their relative position. On this basis, it hypothesizes that the other robot will be at a certain position. Once the hypothesis is verified, the robots work together.

Initial Pose Unknown-Optimization
Map merging methods based on optimization algorithms directly search for the overlapping area between maps, from which the transformation can be obtained. Carpin [11] set the objective function as the difference function of two maps and used a random search algorithm to minimize it in order to find the optimal overlapping area. In contrast, Birk et al. [12] set the objective function as the map overlap similarity function and used an adaptive random walk algorithm to find the optimal solution. Furthermore, Ma et al. [13] studied an adaptive genetic algorithm to solve this problem.

Initial Pose Unknown-Feature Matching
The matching features range from points to lines to other geometric features. Ferrao [14] used SIFT (Scale-Invariant Feature Transform) to calculate the translation, rotation, and zoom between maps. However, according to their discussion, matching feature points alone may not find the correct corresponding points. Therefore, reliability analysis was introduced on the basis of feature points [15]: windows are built around the feature points, which makes it possible to count the unknown-state grid cells and filter mismatched feature points. Similarly, Jiang et al. [16] increased the precision by using MAFF (Map Augmentation and Feature Fusion). In addition, Tang et al. [17] combined SURF (Speeded Up Robust Features) features and the RANSAC (RANdom Sample Consensus) algorithm to provide the initial value of the ICP (Iterative Closest Point) algorithm for map merging. Sun et al. [18] used Harris points to construct a maximum common subgraph, which is used as the feature for map matching. In a grid map, each grid cell takes only one of three states after thresholding. Thus, the descriptors of the feature points are very similar, so point features often fail to find the correct corresponding point.
For line matching, Carpin et al. [19] used the cross-correlation of the Hough spectrum to determine similar signals. After obtaining the rotation angle, they calculated the translation matrix through the X-spectrum and the Y-spectrum. On the basis of the Hough spectrum, Lee et al. [20] enhanced this algorithm to improve the efficiency of the multi-robot system by identifying visual landmarks. Similarly, Sajad et al. [21] improved the accuracy by studying how to match the relative peak values correctly. Furthermore, Roh et al. [22] used the Radon transform to extract the sinogram of the map. They took the coarse rotation matrix and translation matrix as the initial particle of the particle swarm algorithm to calculate the accurate value. However, the Radon transform alone easily falls into a local maximum or minimum, so Lee [23] recently studied a Gaussian mixture model to optimize the transformation matrix on this basis. In addition to extracting existing lines from the map, some works [24,25] constructed lines based on scanning points. The difference is that the latter matches based on the interconnections between the intersections of two line segments. Although straight-line features are very common in indoor grid maps, matching with straight lines alone takes a long time, which cannot meet high-efficiency requirements.
In research based on geometric features, Adam et al. [26] proposed a method based on wireframe mapping in which the descriptor of each point consists of six variables and each edge is vectorized. The weighted infinity norm between descriptors then serves as the matching criterion. In a similar study, Park et al. [27] proposed constructing maximal empty rectangles in the map and using the triangle position constraint between the empty rectangles to obtain the transformation. Since the center of an empty rectangle changes when the orthogonal coordinate systems of the maps are inconsistent, the direction of the map orthogonal coordinate system needs to be unified. On this basis, Jazi [28] improved the orthogonal algorithm to speed up matching, but the sensitivity of the empty rectangle to map rotation still increases the complexity of the matching process. Our method is not affected by map rotation.
In addition to the above features extracted directly from the map, some methods study features fixed in the environment. Göksel et al. [29] identify landmark information and match similar landmarks. Tsardoulias et al. [30] combined RFID (Radio Frequency Identification) positioning technology and the ICP algorithm for grid map merging. However, such feature points must be installed in an unchanging environment, so the adaptability of these methods is limited.
With the development of neural networks, some scholars use features output by a neural network to merge maps. Sajad et al. [31] used an SOM neural network to cluster features for each segmented area. The rotation angle is obtained by matching the norm vector histogram of the line connecting two cluster points, combined with the Radon transform. Finally, the x-y translation is determined by using the norm vector of the matched cluster points. Although the neural network outputs a large number of features in a grid map, matching with these features alone gives a relatively high mismatching rate. Other work is mainly based on pose-graph maps or topological maps. Bonanni et al. [32] used the Voronoi diagram method to extract a pose-graph map from the grid map. They expanded the nodes of the existing pose graph with a breadth-first algorithm and optimized the expanded nodes by using the g2o libraries. Lázaro et al. [33] directly merged topological maps based on visual features (ORB features) and then optimized the pose map using the g2o libraries. Similar studies focus on different optimization methods, such as a feature filtering method [34] or RANSAC and the Gauss-Newton method [35].

Method
The map merging method based on suppositional box is mainly divided into three steps: suppositional box construction, map matching based on suppositional box, and map merging.

The Construction of Suppositional Box
In our study, a suppositional box is a virtual connection between four right-angled points that satisfy position constraints. Suppositional boxes share some common properties. For instance, each is composed of four right-angled points and two pairs of parallel sides, and the four right-angled points share common edges with each other. Thus, a suppositional box is easy to construct from features that are common in indoor environments, namely straight lines and right-angled points. The Hough transform [36], a classic and mature method, is used to detect straight lines in our research. The intersections of perpendicular straight lines are then calculated as vertical points. Finally, vertical points and perpendicular straight lines are grouped to construct suppositional boxes. Since vertical points can also be intersections of straight-line extensions, a large number of suppositional boxes can be extracted from the map, which improves the success rate of matching.

Vertical Point Extraction
The vertical point is the intersection of two perpendicular straight lines. In practice, a vertical point represents the spatial perpendicular relationship between two objects. In other words, it can be either a real connecting point or a point where the extensions of objects meet.
First, straight lines are detected in the grid map through the Hough transform, and the slope and intercept of each straight line are calculated. Then, the slopes are used to find perpendicular straight lines: two straight lines are perpendicular when the product of their slopes is −1. The intersection of two perpendicular straight lines is taken as a vertical point, such as $p_1$ and $p_2$ in Figure 2.
The vertical point is described by its coordinates and by the slopes and intercepts of the two perpendicular straight lines that form it. The i-th vertical point descriptor $p_i$ is
$$p_i = (vx_i, vy_i, k_{i1}, b_{i1}, k_{i2}, b_{i2}), \quad i = 1, \dots, N \tag{1}$$
where $N$ is the number of vertical points, $(vx_i, vy_i)$ is the coordinate of the vertical point, $k_{i1}$ and $b_{i1}$ are the slope and intercept of one of the vertical sides, and $k_{i2}$ and $b_{i2}$ are the slope and intercept of the other vertical side. There is no requirement on the order of the two vertical sides. In the subsequent matching process, the slope and intercept of the two vertical sides are used for matching.
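The extraction step can be illustrated with a minimal sketch. It assumes that the lines have already been detected and converted to slope-intercept form, so the Hough transform itself is not shown. The function name `vertical_points` and the tolerance `tol` are hypothetical and not taken from the paper. Note that the slope-intercept form cannot represent lines parallel to the y-axis, so this sketch only covers lines with finite slope.

```python
from itertools import combinations


def vertical_points(lines, tol=1e-2):
    """Return vertical point descriptors (vx, vy, k1, b1, k2, b2).

    lines: list of (k, b) pairs describing y = k * x + b.
    tol:   assumed tolerance on the perpendicularity test k1 * k2 = -1.
    """
    points = []
    for (k1, b1), (k2, b2) in combinations(lines, 2):
        # Two lines are perpendicular when the product of their slopes is -1.
        if abs(k1 * k2 + 1.0) > tol:
            continue
        # Intersection of y = k1 * x + b1 and y = k2 * x + b2.
        vx = (b2 - b1) / (k1 - k2)
        vy = k1 * vx + b1
        points.append((vx, vy, k1, b1, k2, b2))
    return points
```

Each returned tuple follows the layout of the descriptor described above: the intersection coordinates first, then the slope and intercept of each of the two sides.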

Uncertainty Optimization of Vertical Points
Ideally, each vertical point has a unique representation in the map. However, because the state of each grid cell is calculated from a probability, a single wall does not always appear as one straight line in the grid map, and part of a long wall may be recognized as several straight lines. Therefore, one real vertical point may be represented in the map by a set of nearby points, as shown in Figure 3. To reduce this uncertainty, an optimal point must be found to represent these vertical points.
This paper uses a clustering algorithm to group the vertical points that actually represent the same vertical point and to find the optimal value representing each group. We take the Euclidean distance as the objective for calculating the optimal coordinate and define it as an unconstrained optimization problem, as in Formula (2).
$$(vx_{best}, vy_{best}) = \arg\min \sum_{m=0}^{t} \left\| (vx_m, vy_m) - (vx_{best}, vy_{best}) \right\|_2 \tag{2}$$
where $(vx_{best}, vy_{best})$ is the optimal vertical point, $(vx_m, vy_m)$ is the m-th vertical point in the same group, and $t$ is the number of vertical points in the group. The optimization problem is solved by Newton's method [37], starting from any coordinate $(vx_0, vy_0)$ selected within the group. The optimization precision is set to $\varepsilon = 1 \times 10^{-8}$.
Some vertical points formed by the intersection of extended perpendicular straight lines lie outside the map outline. To improve the matching speed, these outliers are filtered using the neighborhood state difference method: a vertical point is taken as the starting point for diffusion in four directions, and if no grid state different from the state of this grid cell is found in any of the four directions, the vertical point is deleted.
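As an illustration of how a representative point for one group could be computed, the following sketch minimizes the sum of Euclidean distances to the points of the group. It is not the Newton solver used in this paper: it swaps in the Weiszfeld fixed-point iteration, which targets the same objective and is shorter to write. The function name `representative_point`, the centroid starting point, and the iteration cap `max_iter` are assumptions made for this example; only the precision value 1e-8 comes from the text.

```python
import math


def representative_point(group, eps=1e-8, max_iter=1000):
    """Approximate the point minimizing the summed distance to `group`.

    group: list of (vx, vy) coordinates believed to be one vertical point.
    Uses Weiszfeld's iteration (an assumed substitute for Newton's method).
    """
    # Start from the centroid (assumption; any point of the group also works
    # in principle, but the centroid avoids a zero distance at the start).
    x = sum(p[0] for p in group) / len(group)
    y = sum(p[1] for p in group) / len(group)
    for _ in range(max_iter):
        weight_sum = x_sum = y_sum = 0.0
        for px, py in group:
            d = math.hypot(px - x, py - y)
            if d < 1e-12:
                continue  # skip a coincident point to avoid division by zero
            w = 1.0 / d
            weight_sum += w
            x_sum += w * px
            y_sum += w * py
        if weight_sum == 0.0:
            break  # every point coincides with the current estimate
        new_x, new_y = x_sum / weight_sum, y_sum / weight_sum
        if math.hypot(new_x - x, new_y - y) < eps:
            return new_x, new_y
        x, y = new_x, new_y
    return x, y
```

Skipping coincident points is a common simplification of the Weiszfeld iteration and is adequate for an illustration, although it is not a complete treatment of that corner case.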

Suppositional Box Building
In addition to the properties shared by the four right-angled points, a suppositional box imposes constraints on their positions: they must form a closed loop in the clockwise or counterclockwise direction. Under this constraint, suppositional boxes can be built from the optimized vertical points.
Any vertical point can be chosen as the first right-angled point of a suppositional box. One vertical side of this point is selected as the target, and all other vertical points with the same slope and intercept are collected as the first candidate group. Then, a vertical point is randomly selected from this candidate group as the second right-angled point of the suppositional box, and the second candidate group is obtained in the same way. Finally, the same method is used, combined with the height and width distance constraints, to find the fourth right-angled point of the suppositional box. The height and width distance constraints are imposed when the fourth right-angled point is matched in order to prevent the sides of the suppositional box from intersecting. These steps are repeated until all suppositional boxes that meet the requirements are extracted. The process of building a suppositional box is shown in Figure 4.
After each suppositional box is built, its descriptor is defined by the box's height, width, area, center point, and vertical points as
$$R_q = (z_{q1}, z_{q2}, z_{q3}, z_{q4}, height, width, area, cx, cy), \quad q = 1, \dots, M \tag{3}$$
where $M$ is the number of suppositional boxes, $R_q$ is the q-th suppositional box in the map, $z_{qr} = (x, y)$ is the coordinate of the r-th right-angled point of the suppositional box, $height$ is the long side of the suppositional box, $width$ is the short side, $area$ is the area, and $(cx, cy)$ is the coordinate of the center point. Suppositional box matching is based on this descriptor. See Algorithm 1 for the detailed process of suppositional box construction.
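The descriptor fields can be computed from four corner points as in the sketch below. It assumes that the corners are given in loop order, so that consecutive points share a side. The function name `box_descriptor` and the tuple layout are illustrative choices that mirror the descriptor, not code from the paper.

```python
import math


def box_descriptor(z1, z2, z3, z4):
    """Build (z1, z2, z3, z4, height, width, area, cx, cy) for one box.

    The corners are assumed to be in loop order (clockwise or
    counterclockwise), so z1-z2 and z2-z3 are adjacent sides.
    """
    side_a = math.hypot(z2[0] - z1[0], z2[1] - z1[1])
    side_b = math.hypot(z3[0] - z2[0], z3[1] - z2[1])
    height = max(side_a, side_b)  # the long side is the height
    width = min(side_a, side_b)   # the short side is the width
    area = height * width
    cx = (z1[0] + z2[0] + z3[0] + z4[0]) / 4.0  # mean of the corners
    cy = (z1[1] + z2[1] + z3[1] + z4[1]) / 4.0
    return (z1, z2, z3, z4, height, width, area, cx, cy)
```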

Algorithm 1 Suppositional box building
Input: the list of vertical points
Output: the list of suppositional box descriptors: R_q = (z_q1, z_q2, z_q3, z_q4, height, width, area, cx, cy)
    ...
    ... ) // Find point with the same slope and intercept
    if j_2 = finding(p_i.k_i2, p_i.b_i2) // Find point with the same slope and intercept
    ... ) and intersection(i, j_1, j_2, j_3) // Find point that has the same slope and intercept and satisfies the distance constraint
        height = max(dis(z_q1, z_q2), dis(z_q2, z_q3)) // Define the long side to be the height
        ...
        width = min(dis(z_q1, z_q2), dis(z_q2, z_q3)) // Define the short side to be the width
        (cx, cy) = meancenter(z_q1, z_q2, z_q3, z_q4) // Compute the center point
        R_q.append(z_q1, z_q2, z_q3, z_q4, height, width, cx, cy)
        q ← q + 1
    else
        ss ← False
end for
After the suppositional boxes are built in this loop, the right-angled points of each descriptor are in arbitrary order. To reduce the time needed for subsequent matching, our method sorts them counterclockwise starting from the lower-left corner point, generating a set of ordered right-angled points, as shown in Figure 5.
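A possible way to perform this ordering is sketched below. Two points are assumptions of the example rather than statements from the paper: the lower-left corner is taken to be the corner with the smallest y (ties broken by the smallest x), and the y-axis is taken to point upward, so counterclockwise has its usual mathematical meaning. The function name `order_corners` is hypothetical.

```python
import math


def order_corners(corners):
    """Sort four corners counterclockwise, starting at the lower-left one."""
    cx = sum(x for x, _ in corners) / len(corners)
    cy = sum(y for _, y in corners) / len(corners)
    # Sorting by polar angle around the center gives counterclockwise order.
    ccw = sorted(corners, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    # Assumed definition of "lower left": smallest y, then smallest x.
    start = min(range(len(ccw)), key=lambda i: (ccw[i][1], ccw[i][0]))
    return ccw[start:] + ccw[:start]
```

In image coordinates, where the y-axis points downward, the same code yields a visually clockwise order, so the axis convention should be fixed before the ordered corners are compared between maps.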

Suppositional Box Matching
The matching process is divided into two parts: rough matching and filtration. Note that our work assumes that the maps have the same resolution, so only the translation and rotation of the map are considered.
Suppositional boxes are easy to match by side length. By default, the long side is the height and the short side is the width. Because of the error in the two-dimensional grid map built with the lidar, the threshold is set to a relatively large value so that as many similar suppositional box pairs as possible are matched. This large number of pairs brings the challenge of mismatching. Therefore, after rough matching by side length, the mismatched pairs are filtered out to obtain the optimal suppositional box pair. In this study, the optimal suppositional box pair is the one that satisfies the alignment requirements of a large number of rough matching pairs and minimizes the alignment error.
Through suppositional box matching, at least four pairs of corresponding right-angled points between two local maps can be found directly. The transformation matrix T can be calculated from the corresponding right-angled points, as shown in Equation (4).
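The rough matching by side length can be sketched as follows. Each box is reduced here to a `(height, width)` tuple, and a single tolerance `tol` is applied to both sides. The function name `rough_match`, this reduced representation, and the use of one shared tolerance are assumptions made for the example.

```python
def rough_match(boxes1, boxes2, tol):
    """Return index pairs (i, j) of boxes whose side lengths agree.

    boxes1, boxes2: lists of (height, width) tuples from the two local maps.
    tol:            assumed tolerance on both side lengths, in map units.
    """
    pairs = []
    for i, (h1, w1) in enumerate(boxes1):
        for j, (h2, w2) in enumerate(boxes2):
            if abs(h1 - h2) <= tol and abs(w1 - w2) <= tol:
                pairs.append((i, j))
    return pairs
```

A larger `tol` keeps more true pairs at the cost of more mismatches, which is why a filtration step is needed afterwards.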

First, a transformation matrix is calculated for each rough matching pair. Since the transformation matrix contains only three unknown variables, it can be calculated from three pairs of corresponding right-angled points of a suppositional box pair. However, because the right-angled points of the suppositional box descriptor are arbitrarily ordered, the cyclic shift of the point pairs must be considered in the computation. Below we take the matching of two local maps, map 1 and map 2, as an example. For any rough matching pair, three points $(z_{11}, z_{12}, z_{13})$ are selected from the suppositional box of map 1 as the source points, and $(z_{21}, z_{22}, z_{23})$, $(z_{22}, z_{23}, z_{24})$, $(z_{23}, z_{24}, z_{21})$, and $(z_{24}, z_{21}, z_{22})$ are selected from the corresponding suppositional box of map 2 as candidate points. This yields four transformation matrices $T_0$, $T_1$, $T_2$, and $T_3$, respectively.
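The four candidate transformations can be illustrated with the sketch below. It assumes that the three unknowns are a rotation angle and a two-dimensional translation, and that T is written as a 3×3 homogeneous matrix. The rotation is estimated with a standard closed-form least-squares fit, which is an assumed choice rather than the paper's own formula. The function names `rigid_transform` and `candidate_transforms` are hypothetical.

```python
import math


def rigid_transform(src, dst):
    """Least-squares rotation and translation mapping src points onto dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(p[0] for p in dst) / n
    dy = sum(p[1] for p in dst) / n
    sin_sum = cos_sum = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - dx, by - dy
        cos_sum += ax * bx + ay * by
        sin_sum += ax * by - ay * bx
    theta = math.atan2(sin_sum, cos_sum)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target one.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]


def candidate_transforms(box1, box2):
    """Return [T0, T1, T2, T3], one matrix per cyclic shift of box2."""
    src = box1[:3]
    return [
        rigid_transform(src, [box2[(shift + k) % 4] for k in range(3)])
        for shift in range(4)
    ]
```

Only one of the four shifts corresponds to the true corner correspondence, and the other three produce transformations that the filtration step is expected to reject.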
Secondly, the filtration process tries to remove incorrect matching pairs. For each of the other rough matching pairs, the right-angled points of its suppositional box in $map_1$ are converted to the coordinate system of $map_2$ by the transformation matrix $T_0$. Then, the distances between the four pairs of corresponding right-angled points are calculated. When all four distances are less than a threshold, the pair is considered successfully matched. Next, the number of pairs successfully matched under $T_0$ is counted. When this number reaches the target value, $T_0$ is considered a correct transformation matrix and is put into the candidate group. Otherwise, if the number is less than the target value, $T_1$, $T_2$ and $T_3$ are used in turn to repeat the above process.
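The filtration step can be sketched as a support count. This is an illustrative sketch under two assumptions that the text does not specify: each transformed corner is compared with its nearest corner of the other box (to sidestep the arbitrary corner ordering), and the thresholds `dist_thresh` and `min_support` are placeholder values. All function names are hypothetical.

```python
import numpy as np


def transform_points(T, pts):
    """Apply a 3x3 homogeneous transform to an (N, 2) array of points."""
    pts = np.asarray(pts, dtype=float)
    homo = np.hstack([pts, np.ones((len(pts), 1))])
    return (homo @ T.T)[:, :2]


def pair_supported(T, box1, box2, dist_thresh=2.0):
    """True if all four corners of box1, moved by T, land near corners of box2."""
    moved = transform_points(T, box1)
    target = np.asarray(box2, dtype=float)
    # Distance from each moved corner to its nearest corner of box2 (assumption)
    d = np.linalg.norm(moved[:, None, :] - target[None, :, :], axis=2).min(axis=1)
    return bool(np.all(d < dist_thresh))


def count_support(T, rough_pairs, dist_thresh=2.0):
    """Number of rough matching pairs that are aligned by T."""
    return sum(pair_supported(T, b1, b2, dist_thresh) for b1, b2 in rough_pairs)


def first_accepted(candidates, rough_pairs, min_support, dist_thresh=2.0):
    """Try T_0, T_1, ... in turn; return the first with enough support, else None."""
    for T in candidates:
        if count_support(T, rough_pairs, dist_thresh) >= min_support:
            return T
    return None
```

A transformation accepted here would then join the candidate group that is ranked by the scoring step.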
Finally, score evaluation is used to obtain the optimal matching pair. After the screening described above, the transformation matrices calculated from the correctly matched pairs form a candidate group. To select the optimal transformation matrix from the candidate group, a scoring method is designed. The higher the score, the smaller the alignment error generated by the transformation matrix. Accordingly, the suppositional box pair with the highest score is selected as the optimal matching result. The score is mainly determined by the distances between corresponding right-angled points and by the suppositional box area; see Equation (5). The smaller the distances between the right-angled points of the other suppositional boxes in the rough matching, the higher the accuracy of this pair. In addition, the smaller the area of a suppositional box, the smaller the overlapping area it represents, which increases the error of the other corresponding right-angled point pairs. Therefore, the area is used as a reference factor: the smaller the area of the pair, the lower its weight.
where $R_1$ and $R_2$ are the matched suppositional boxes, $area(R_1)$ and $area(R_2)$ are the corresponding suppositional box areas, and $d^{R_1 R_2}_1$, $d^{R_1 R_2}_2$, $d^{R_1 R_2}_3$ and $d^{R_1 R_2}_4$ are the Euclidean distances between the four pairs of corresponding right-angled points.

Map Merging
Local maps transformed into the same coordinate system by the optimal suppositional box pair can be merged directly. However, the same region may be represented inconsistently after merging; for example, the same suppositional box may not align perfectly on the merged map. The reason is that the optimal pair may still contain some errors. To reduce these errors, a Kalman filter is used.
In the process of grid map merging, we take the homogeneous grid coordinates as the state variable. The predictor variable $\hat{X}$ is the homogeneous grid coordinate in the coordinate system of $map_2$, obtained by transforming the corresponding homogeneous grid coordinate from the coordinate system of $map_1$, and the observation variable $\hat{Z}$ represents the same position in $map_2$.
Given any grid coordinates $(x, y)$ in $map_1$, the predictor variable $\hat{X}_n$ is calculated through the transformation matrix $T$:
$$\hat{X}_n = T \, [x, y, 1]^{\mathsf{T}}$$
where $n = 0, 1, 2, \ldots$ is the grid index. The transformation matrix obtained from only three pairs of corresponding right-angled points of the matching pair may contain some errors. Hence, the prediction covariance matrix is derived from the four transformation matrices $T_a$, $T_b$, $T_c$ and $T_d$, which are calculated from the cyclic combinations of the four sets of corresponding right-angled points of this pair. First, these four transformation matrices are used to find the corresponding four sets of homogeneous coordinate values, and the mean values in the x and y directions are calculated in turn. Next, the prediction covariance matrix is calculated from these four sets of homogeneous coordinate values and their mean values.
The matched optimal suppositional box pair is considered to be the same area in the environment; that is, the center points of its two suppositional boxes should theoretically be the same physical point. Accordingly, the distance from a given grid point to the center point of the matched suppositional box should be the same in the two maps. Thus, the observation value $\hat{Z}$ can be calculated from the distance between this point and the center point of the matched suppositional box, where $(x'_{c1}, y'_{c1})$ is the center point $(x_{c1}, y_{c1})$ of this suppositional box in $map_1$ transformed into the coordinate system of $map_2$, and $(x_{c2}, y_{c2})$ is the corresponding center point of this suppositional box in $map_2$.
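The prediction step described above (four transformed copies of a grid point, their mean, and their covariance) can be sketched as follows. The sketch assumes a population covariance, that is, division by the number of samples; the normalization is an assumption, and `predict_with_covariance` is a hypothetical name.

```python
import numpy as np


def predict_with_covariance(transforms, x, y):
    """Transform grid cell (x, y) with each matrix and summarize the spread.

    transforms: iterable of 3x3 homogeneous matrices (T_a, T_b, T_c, T_d).
    Returns (samples, mean, cov) where samples has shape (len(transforms), 2).
    """
    p = np.array([x, y, 1.0])
    samples = np.array([(T @ p)[:2] for T in transforms])
    mean = samples.mean(axis=0)          # mean in the x and y directions
    diff = samples - mean
    cov = diff.T @ diff / len(samples)   # assumed normalization: divide by N
    return samples, mean, cov
```

When the four matrices agree closely, the covariance is small and the Kalman update will trust the prediction more than the observation.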
Evidently, the error of the center point arises because the two boxes of the matching pair are not exactly equal in length. Therefore, the observation error covariance matrix is calculated from the differences in height and width between the boxes of the matching pair.
Here, $W$ is the length error in the x direction, $H$ is the length error in the y direction, and 0.5 is a preset parameter value.
Finally, the Kalman gain is calculated, and the Kalman estimate of the grid coordinates is obtained from it. Since the grid coordinates are integer values, the Kalman estimate is rounded. During merging, the state of the same grid cell sometimes differs between the two maps. The rules in Table 1 are followed to determine the state of the grid cell.
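The update step can be sketched with the standard Kalman equations. This is a minimal sketch that assumes an identity observation model (the state and the observation are both grid coordinates in the frame of map 2); the function name `kalman_update` and its argument names are hypothetical, and the covariances `P` and `R` are taken as inputs rather than derived here.

```python
import numpy as np


def kalman_update(x_pred, P, z, R):
    """One Kalman update for a 2D grid coordinate.

    x_pred: predicted coordinate (from the transformation matrix)
    P:      2x2 prediction covariance
    z:      observed coordinate (from the box center points)
    R:      2x2 observation error covariance
    Assumes the observation matrix is the identity.
    """
    x_pred = np.asarray(x_pred, dtype=float)
    z = np.asarray(z, dtype=float)
    K = P @ np.linalg.inv(P + R)         # Kalman gain
    x_est = x_pred + K @ (z - x_pred)    # corrected estimate
    # Grid coordinates are integers, so the estimate is rounded
    return np.rint(x_est).astype(int), K
```

For example, with equal prediction and observation covariances the gain is one half, so the estimate falls midway between the predicted and the observed cell before rounding.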

Experimental Results and Analysis
In this section, we verify our algorithm with various maps produced in simulation and in a real environment. In the simulation experiments, the effectiveness of the map merging method based on suppositional box is verified. Our method is also shown to be feasible when the orthogonal frame of the map is inconsistent with the orthogonal frame of the suppositional box. In the real environment, our method is applied to different indoor environments and is shown to achieve a higher matching rate than other matching methods.

Experimental Verification under the Simulation Environment
In an Ubuntu environment, we use the simulation software Gazebo [38] to construct a simulation environment, as shown in Figure 6. In our implementation, the simulated robots equipped with lidar sensors construct two maps through the Gmapping [39] package in ROS. The Gmapping package uses the virtual lidar sensors to build a two-dimensional grid map based on particle filtering. As the mapping area grows, the amount of data carried by the particles increases, resulting in an increase in computation. However, compared with other mapping methods, it has a small computation cost and high accuracy in small scenes.

The local maps, $map_1$ and $map_2$, are constructed by the two robots, respectively, as shown in Figure 7. With the suppositional box matching algorithm, the optimal matching pair can be obtained. It can be seen that the vertical points include not only actual corner points but also the intersection points of the extensions of some straight lines. Although we filter out the vertical points that are not on the outline, some of them whose surrounding grid states differ are retained by the algorithm; these do not affect the matching process.
In the following part, we demonstrate that the Kalman filter can reduce the errors in the matching process of suppositional boxes. We use the violent merging method and the Kalman filter method, respectively, to carry out map merging experiments. The results are shown in Figure 8.
The violent merging method merges the two maps directly. In Figure 8a, the same wall appears twice, which is caused by the error of the transformation matrix. Figure 8b clearly shows that the result is more consistent with the actual situation after optimization, which demonstrates the effectiveness of Kalman filtering. In addition, the overlap contains a total of 22 edges. With the violent merging method, 9 of them are non-coincident, and the total length of the non-coincident edges accounts for 31.6% of the total length of all edges in the common area. In contrast, with the Kalman filter method, only one edge is not completely coincident, accounting for 2.4% of the total length. In the next part, we analyze the detailed errors in the local maps and the merged map, respectively. The error is calculated from the differences between the same objects in the map and in the real environment.
The suppositional box is constructed on the basis of right angles, which are components of the orthogonal frame. In the simulation environment, the orthogonal frames of the suppositional box and the map usually have the same orientation. In practice, there will be certain differences between them. For example, in real experiments, the orthogonal frame of the suppositional box may not coincide with the orthogonal frame of the map and may form a certain angle with it, as shown in Figure 9. This is a challenge for the observation error in Kalman filter merging, because our method ignores the error caused by the angle between the two orthogonal frames. Therefore, in order to comprehensively evaluate the performance of the map merging method based on suppositional box, we test our method in scenes where the orthogonal frames of the suppositional box and the map have different orientations. In the test, the local map is rotated around the robot's initial position, so that the orthogonal frame of the local map forms a certain angle with the orthogonal frame of the suppositional box.
We selected eight walls characterized by a straight line in the overlapping area of the two local maps, as shown in Figure 10. We first record the actual lengths of the eight walls in Gazebo, and then calculate the lengths of the eight walls in the two local maps and in the merged map, respectively. The two local maps are $map_1$, constructed by robot1, and $map_2$, constructed by robot2. The length of a wall in a map is calculated as the Euclidean distance between the pixel coordinates of its two ends multiplied by the resolution 0.05. Finally, the error results for the unrotated map, the 45° rotation map and the 90° rotation map are shown in Table 2.
From the above error data, we can see that the error of the merged map is mostly below the errors of the local maps, and sometimes between them. First, the length of the wall in the merged map is very consistent with the actual length, with an average error of about 4%, indicating that our method is still feasible for merging rotated maps. Second, this supports the advantage of multi-robot mapping, because the merging method can reduce the errors accumulated by single-robot mapping over a long period of time.
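The wall-length measurement used for Table 2 can be sketched as follows. The resolution 0.05 is taken from the text (its unit is not stated there); expressing the error as a relative error is an assumption based on the percentages reported, and the function names are hypothetical.

```python
import math

RESOLUTION = 0.05  # map units per pixel, as given in the text


def wall_length(p, q, resolution=RESOLUTION):
    """Length of a wall from the pixel coordinates of its two ends."""
    return math.hypot(q[0] - p[0], q[1] - p[1]) * resolution


def relative_error(measured, actual):
    """Relative difference between a measured and an actual length (assumed metric)."""
    return abs(measured - actual) / actual
```

For instance, a wall whose ends are 100 pixels apart measures about 5.0 map units, and against an actual length of 5.2 units the relative error is a little under 4%.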
Other rotation angles in (0°, 360°) were randomly tested to further illustrate the merging performance. The error results are shown in Figure 11. On the whole, the error of the merged map is about 4%, showing that the merged map accords with the actual environment. The analysis of different rotation angles shows that when the rotation angle is small, the error of the map does not change much. With a large rotation angle, the error in the local map increases due to the map rotation operation, and thus the error in the merged map also increases. Even so, the error in the merged map is still within 5%. In conclusion, our proposed method based on suppositional box matching is robust to map rotation.

Experimental Verification under the Real Environment
Here, we use two Fuxi cleaning robots to test our method. Each robot is equipped with a SICK-TIM561 single-line lidar and runs Ubuntu 16.04 with the related ROS packages.
The real experiments were carried out in an office building in the Science Park of Central South University, as shown in Figure 12. The environment is mainly composed of corridors and connected rooms. The Gmapping algorithm is used to build local maps of the environment.

The robots move around in the environment and build two-dimensional grid maps. To account for the diversity of the environment, the robots build four local maps in the real environment, which are divided into two groups for map merging. In Experiment 1, the robot explored a normal indoor environment with rooms in different directions. In Experiment 2, the robot explored a long corridor, in which multiple similar suppositional boxes can be constructed. The experimental results are shown in Figure 13. The ubiquity and abundance of suppositional boxes in the environment, although an advantage, may result in a high rate of false matches. However, the above experimental results show that the proposed method can be applied successfully in different environments, which indicates that our method can select the right pair from a number of suppositional boxes.
To demonstrate the higher accuracy of the map merging method based on suppositional box, we compare it with two feature point matching methods based on SURF [40] and ORB [41]. SURF and ORB features have become major research topics because of their better speed and robustness for image matching compared with other feature points.
We selected two pairs of maps from the real indoor map dataset Halmstad-Robot-Maps [42]. This dataset mainly collects environmental data of four different scenarios at Halmstad University. Two methods were used to collect the data, namely CAD drawing and sensor perception. The dataset is mainly based on Google's Tango Constructor application, which builds a three-dimensional grid map; a certain horizontal plane is then selected and converted into a two-dimensional grid map [43]. In the dataset, a two-level map of the office lobby (E5 and F5) and two different residential apartment maps (HIH and KPT4A) are established. Among them, 14 two-dimensional grid maps are established in each of the E5 and F5 scenes, and 4 two-dimensional grid maps in each of the HIH and KPT4A scenes. In the experiment, we select one pair of maps from the E5 scene as $map_3$ and $map_4$, and another pair from the F5 scene as $map_5$ and $map_6$, as shown in Figure 14. The test counts the number of correctly matched features for the different matching methods, and the result is shown in Figure 15. Counted in suppositional boxes, our method yields 17 and 15 pairs of features, respectively. However, the transformation matrix is calculated from multiple pairs of corresponding feature points. Counted in matching feature points, with four right-angled points per suppositional box pair, our method obtains 68 and 60 corresponding feature points, respectively. In comparison, the number of corresponding feature points in our method far exceeds that of the other two methods. Importantly, the more correctly matched feature points there are, the higher the accuracy of the transformation matrix.
In general, the smaller the local area explored by a single robot, the fewer features can be extracted from the local map. Therefore, we divide the indoor map dataset into two categories according to the area of the local environment. Relatively speaking, in the E5 and F5 scenes, the local area explored by a single robot is large, while in the HIH and KPT4A scenes, it is small. We select four pairs of maps in the E5 scene and three pairs of maps in the F5 scene. In each of the HIH and KPT4A scenes, one pair of maps was selected.
In addition to the point features mentioned above, since the Hough transform is used in our method, we choose another method based on the Hough spectrum [19] for comparison. The success rates of the methods are shown in Table 3. Our method has a matching success rate of 75% and 66.66% in the map groups with a large area, whereas the ORB method achieves only 25% and 0%, the SURF method 25% and 25%, and the method based on the Hough spectrum 25% and 25%. Although our method had only a 50% success rate in the map groups with a small area, neither the ORB method nor the SURF method matched successfully there.

Conclusions
In this paper, we propose a map merging method based on suppositional box. The suppositional box is not only simple in structure but also ubiquitous in indoor environments. This virtual structural feature can greatly improve the efficiency of map merging, because a pair of suppositional boxes represents a common area within which many other corresponding features can be extracted. Moreover, it does not depend on the orientation of the map.
The core of our approach is to find common areas based on suppositional boxes and to merge maps by Kalman filtering. The experimental results show that the merged map is more in line with the actual environment after the errors of the optimal matching pair are reduced. Furthermore, compared with other matching methods, a suppositional box pair yields more feature pairs, such as points or lines, which gives it strong matching ability.
Despite the success of our approach, there are still challenging situations that require additional research. When the same grid point has different states in the two maps, the current method performs only a simple violent fusion of the three grid states. In fact, the probability weight of the grid could be considered on the basis of the original probability grid. Therefore, in future research, we expect to improve the fusion of different grid state values on the basis of this method, which can further improve the mapping accuracy.