2D SLAM Algorithms Characterization, Calibration, and Comparison Considering Pose Error, Map Accuracy as Well as CPU and Memory Usage

The present work proposes a method to characterize, calibrate, and compare any 2D SLAM algorithm, providing strong statistical evidence based on descriptive and inferential statistics to bring confidence levels about the overall behavior of the algorithms and their comparisons. This work focuses on characterizing, calibrating, and comparing the Cartographer, Gmapping, HECTOR-SLAM, KARTO-SLAM, and RTAB-Map SLAM algorithms. Four metrics were in place: pose error, map accuracy, CPU usage, and memory usage. To characterize the algorithms against these four metrics, Plackett–Burman and factorial experiments were performed, and the enhancement after characterization and calibration was verified using hypothesis tests, in addition to the central limit theorem.


Introduction
SLAM algorithms are complex methods that allow a robot, without any external system other than its own sensors, to create a map of the environment and locate itself within this map. There are many non-linearities and imperfections in a mobile robot system (e.g., robot drift, sensor noise, irregular environments) that could lead a SLAM algorithm to a bad representation of the environment, to getting lost within this representation, or to spending a considerable amount of computational resources [1,2]. Since these are the main difficulties a robot with a SLAM algorithm must overcome, this work focuses on characterizing, calibrating, and comparing five different 2D SLAM algorithms towards creating a good map and keeping a good track of the robot pose (position and orientation), while spending as little CPU and memory as possible.
For longer than two decades, SLAM has been in the spotlight of many robotics researchers due to its many possible applications, such as autonomous driving [3,4], search and rescue [5], autonomous underwater vehicles [6,7], and collaborative robotics [8]. This is why, nowadays, there are many different approaches trying to solve the same problem [9]. The most frequent SLAM approaches are described below.
A first approach to solve the SLAM problem was based on extended Kalman filters (EKF) [1]. Kalman filters [10] are based on the implementation of observers, which are mathematical models of the linearized system that help estimate the behavior of the real system, and on the utilization of an optimal state estimator that assumes white noise in the measurements of the system [11]. For the SLAM problem, the EKF first predicts the state of the system and then corrects this prediction with the sensor observations [...] an algorithm that has deeply optimized the pose and map calculations. In the subsequent sections, the selected algorithms are described in greater depth.
In this work, four metrics are used for the comparison of the 2D SLAM algorithms. They were computed and processed in MATLAB, and are explained in the following paragraphs.
The map accuracy was measured using the k-nearest neighbor method [29], by measuring the Euclidean distance from each of the ground truth points to the nearest map point generated by the SLAM algorithm under test. A mathematical representation of the metric can be found in Equation (1), where N is the number of points to sample, and x_2i − x_1i and y_2i − y_1i represent the x-coordinate and y-coordinate differences between the ground truth point and the nearest map point generated by the algorithm, respectively. The measurement units used for this metric are centimeters.
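The metric in Equation (1) is simply the mean Euclidean nearest-neighbor distance. As a minimal sketch of the same idea outside MATLAB (SciPy's cKDTree standing in for knnsearch; the function name and the sample points are illustrative assumptions, not the paper's data):

```python
import numpy as np
from scipy.spatial import cKDTree

def map_accuracy_cm(ground_truth_pts, map_pts):
    """Mean nearest-neighbor Euclidean distance (cm) from each
    ground-truth point to the closest SLAM-generated map point."""
    tree = cKDTree(np.asarray(map_pts, dtype=float))
    dists, _ = tree.query(np.asarray(ground_truth_pts, dtype=float))
    return float(np.mean(dists))

# Hypothetical map whose every point is shifted +1 cm in x:
gt  = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
est = [(1.0, 0.0), (11.0, 0.0), (11.0, 10.0), (1.0, 10.0)]
print(map_accuracy_cm(gt, est))  # every NN distance is 1.0 cm -> mean 1.0
```

The same function also covers the pose accuracy metric (meters instead of centimeters), since both reduce to averaged Euclidean distances.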
Pose tracking accuracy was computed by a set of iterative loops calculating the Euclidean distance between the ground-truth pose and the estimated pose [30]. It can also be represented by Equation (1), but with a modified interpretation of the variables: for this metric, N is the number of poses to sample, and x_2i − x_1i and y_2i − y_1i represent the x-coordinate and y-coordinate differences between the ground truth pose and the estimated pose generated by the algorithm, respectively. The measurement units used for this metric are meters.
Finally, CPU and memory usage were recorded using the Python psutil library [31]. Both metrics are mathematically represented by averaging all the measurements taken during the test run. Their units are percentage for CPU usage, where a number beyond 100% means the algorithm is using more than a single core, and MB for memory usage.
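These two metrics reduce to averaging per-sample readings over a run. A hedged sketch (the helper names are assumptions; the psutil calls follow that library's documented `Process` API, and polling is kept separate from the pure averaging step so the latter stays dependency-free):

```python
def average_usage(cpu_samples, mem_samples_mb):
    """Mean CPU % (may exceed 100% for multi-core processes) and mean MB."""
    return (sum(cpu_samples) / len(cpu_samples),
            sum(mem_samples_mb) / len(mem_samples_mb))

def sample_process(pid, n_samples=5, interval=1.0):
    """Poll one process with psutil (third-party: pip install psutil)."""
    import psutil  # deferred so average_usage has no dependencies
    proc = psutil.Process(pid)
    cpu, mem = [], []
    for _ in range(n_samples):
        cpu.append(proc.cpu_percent(interval=interval))  # % of one core
        mem.append(proc.memory_info().rss / 2**20)       # resident set, MB
    return average_usage(cpu, mem)

# e.g., a run that averaged 1.2 cores and 300 MB:
print(average_usage([110.0, 130.0], [200.0, 400.0]))  # (120.0, 300.0)
```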
Lastly, there are many SLAM comparison investigations done previously, such as Ref. [32], which focuses on the algorithms' processing time; Ref. [29], which evaluates map accuracy and CPU usage; Ref. [20], which evaluates map accuracy, CPU, and memory usage; Ref. [33], which only measures pose and map accuracy; and Ref. [34], which analyzes map accuracy and CPU usage.
Based on the reviewed works, there are two differentiating factors of the method proposed in this paper, which put our investigation a step ahead:

1.
The existing works focus only on map accuracy, pose accuracy, memory, or CPU usage, but none of them considers all of these together. Our investigation considers all of them, giving a wider point of view to better characterize, calibrate, and compare the SLAM algorithms.

2.
None of the current methods takes a statistical approach to provide confidence levels on the results obtained. With our investigation we can guarantee, with 90% confidence, each conclusion when the populations are considered, and, with a 95% confidence level, that the characterization and calibration of the parameters is the best fit for the ranges tested.

Generalities
For all these experiments, since the trials and algorithms were simulated, the only equipment needed was a computer running Ubuntu 18 with ROS Melodic; the computer was a server with an Intel Xeon Silver 4114 at 2.2 GHz. GAZEBO 11.0.0 was used to simulate the test environment, while the robot selected to be simulated in this work was the TurtleBot 3 Burger, because of its 2D LiDAR sensor and its differential driving mode; other configurations can also be used, such as a mecanum omnidirectional robot [35].

Simulation Needs
Regarding the ROS nodes, some nodes were tailored for our needs, beyond the simulated robot that can be easily implemented based on the TurtleBot 3 wiki [36]. The first of them is the so-called Robot Pose Publisher, which reads the data published by GAZEBO and stores the actual pose of the robot at a convenient rate (20 times per second in this case) [37]. The second is a node that monitors the CPU and memory usage of the SLAM algorithm [38], and, last but not least, there is a node that makes the robot follow a fixed path, to guarantee that all the samples were collected under the same conditions [39].

Data Processing Needs
Next, MATLAB 2020b was used to convert the data provided through rosbags into a form that can be easily analyzed and synthesized. The scripts used are the Ground Truth Generator, which takes the environment created through GAZEBO and builds a high-resolution 2D version of it [40], and a script that compares this ground truth plot to the SLAM algorithm result by taking advantage of the knnsearch method provided by MATLAB [41]; its output is the descriptive statistics of the whole comparison.
There are two other important scripts: the first compares the real pose against the estimated pose of the robot and returns some meaningful descriptive statistics about the comparison [42], and the second analyzes the CPU and memory usage of the algorithm [43].

Data Analysis Needs
The data analysis software used to provide sufficient statistical evidence for the results was the Minitab statistical tool, 2018 version.

SLAM Algorithms Used
Five algorithms were used, all of them in their 2D form because of the desired robot sensor limitation; these are described in the following subsections.

Cartographer
Cartographer was created by Google and released for free worldwide access in October 2016 [21]. The main idea of this algorithm was to improve efficiency by optimizing the way the data from particle filters are processed. So, instead of creating one big map, it divides the map into shorter sub-maps, which are then inserted along the way, together with a pose optimization, concluding in a reduction of the error carried over from the robot pose [44].
This algorithm is based on the combination of two separate 2D SLAM processes, one of them working locally and the other working globally, both using a LiDAR sensor and optimized independently. Local SLAM is based on the collection and creation of sub-maps; one of its tasks is the collection and alignment of multiple scans with respect to the initial position. Sub-maps are created as a grid of points with a specific resolution, each point with an associated probability of being blocked. This probability depends on whether it was measured previously, and it is kept while more sub-maps are created. Once a sub-map is created, it is passed through an algorithm to find the optimal position to match it with the rest of the sub-maps, and then extrapolate the rest of them [45].
The second part of the algorithm, the global SLAM, is focused on the sub-maps feedback. Once these sub-maps are created, all of them have robot poses associated, which are used to improve the maps, reducing the accumulated SLAM error. This is well known as loop closure [45].
By using the well-known optimization called Sparse Pose Adjustment (SPA), every time a sub-map is generated, a scan matcher is executed to close the loop and insert the just-created sub-map into the graph. The following formulas determine whether a cell is saved as occupied or free in the map [46]:

odds(p) = p / (1 − p)

M_new(cell) = clamp(odds^−1(odds(M_last(cell)) · odds(p_hit)))

where:
• M_last(cell) is the previous occupancy likelihood of the cell.
• p_hit is the probability that a map cell is occupied.

The intention is to minimize the cost functional of updating the cell values that compose the map:

argmin_ξ Σ_{k=1..K} (1 − M_smooth(T_ξ h_k))²

where:
• M_smooth(x) is the value of cell x, smoothed by the neighboring values.
• h_k is the laser reading related to the cell.
• T_ξ is the transformation matrix that displaces the point h_k according to ξ.
• ξ is the pose vector (ξ_x, ξ_y, ξ_θ).

This model is configured through different parameters of the algorithm. Table 1 shows the main parameters that have an incidence on the functionality of the algorithm [47].
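The odds-based cell update can be exercised numerically. A small sketch of that rule (the clamp bounds are placeholder values, not Cartographer's actual defaults):

```python
def odds(p):
    return p / (1.0 - p)

def inv_odds(o):
    return o / (1.0 + o)

def clamp(p, lo=0.12, hi=0.88):
    """Keep the probability inside assumed bounds so no cell saturates."""
    return max(lo, min(hi, p))

def update_cell(m_last, p_obs):
    """M_new = clamp(odds^-1(odds(M_last) * odds(p_obs))), where p_obs is
    p_hit for a hit observation (or a miss probability for free space)."""
    return clamp(inv_odds(odds(m_last) * odds(p_obs)))

# A cell starting at the uninformed value 0.5, observed as a hit with
# p_hit = 0.55, moves exactly to 0.55:
print(round(update_cell(0.5, 0.55), 6))  # 0.55
```

Repeated hits keep multiplying the odds, so confidence accumulates monotonically until the clamp bound is reached.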

Gmapping
This algorithm is based on the principles of the particle filter with Rao-Blackwellization, which computes the current posture of the robot from the probability given by the information collected in the past, with the help of this posture and the maps previously built. It also has the capability of correcting estimations through the odometry and the calculation of the weights and the map [17]. This is one of the most studied types of SLAM algorithms; it came after many years of investigation around particle filters, using the Rao-Blackwellized particle filter approach [48] to solve the SLAM problem more efficiently by reducing the number of particles required for the estimation [48]. In addition, the robot pose uncertainty is greatly decreased in this algorithm. However, it has a higher computational resource requirement, as it usually shows elevated processing time and memory consumption when compared to the EKF approach.
The main parameters responsible for the functionality of the algorithm are listed in Table 2, according to [49].

HECTOR-SLAM
This algorithm is named after its development team, the Heterogeneous Cooperating Team Of Robots, and, as explained in [18], it was developed because of the necessity of an algorithm for Urban Search and Rescue (USAR) scenarios.
HECTOR-SLAM was developed as a 2D SLAM using a LiDAR sensor with an attached IMU; this sensor provides the measurements for the navigation filter and also gives the capability to perform 3D mapping. This is the reason why HECTOR-SLAM can be used in either 2D or 3D strategies.
As shown in [18], the algorithm uses an occupancy grid map. Since the LiDAR has 6 degrees of freedom, the scanned points must be transformed to a local coordinate frame using the estimated attitude of the LiDAR. Hence, using the estimated pose, the scanned points are converted into a point cloud. A pre-processing step is then performed on this point cloud: HECTOR-SLAM applies a z-axis filtering of the end points, so that only the end points on the (x, y) plane are considered.
Regarding the list of parameters of HECTOR-SLAM, these are defined in Table 3; they were taken from [50].

KARTO-SLAM
KARTO-SLAM is a graph-based SLAM algorithm [22].
KARTO-SLAM builds the map by using nodes that save the location points of the robot trajectory and the dataset of sensor measurements. Graph edges are represented by transformations or trajectories between two consecutive poses in space. When a new node is added, the map is reprocessed and updated according to the edge restrictions in the space. These restrictions are linearized as a sparse graph [51,52].
A loop closure condition arises when the robot revisits the same point twice or more times in the same run. In other words, an edge that connects two nodes with the same world perception is made. Aligning these perceptions produces a virtual transformation. Based on this information, it is determined whether the algorithm can adjust its estimations and represent the environment with a good enough confidence level [53].
An optimization is used to calculate the most likely pose from the nodes collected, to get the most probable graph. To use the optimization methods, it is necessary to define an error function between the measurements obtained. Assuming x = (x_1, x_2, ..., x_T)^T is the vector of nodes in the graph, and z_i,j the odometry between nodes x_i and x_j, an expected edge ẑ_i,j is produced, with an error expression that meets Equation (3):

e_i,j(x) = z_i,j − ẑ_i,j    (3)

Together with the information matrix Ω_i,j (the inverse of the covariance matrix), an error function is established, given by Equation (4):

F(x) = Σ_i,j e_i,j^T Ω_i,j e_i,j    (4)

The goal is to compute a posture x* such that Equation (4) reaches its minimum, so that Equation (5) is accomplished:

x* = argmin_x F(x)    (5)
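Equations (3)-(5) can be made concrete with a toy one-dimensional pose graph, where each node is a scalar position and each edge stores a measured relative displacement. The edge values and weights below are hypothetical:

```python
def graph_error(x, edges):
    """F(x) = sum over edges of e^T * Omega * e. With scalar poses,
    e_ij = z_ij - (x[j] - x[i]) and Omega_ij is a scalar weight."""
    total = 0.0
    for i, j, z_ij, omega in edges:
        e = z_ij - (x[j] - x[i])
        total += omega * e * e
    return total

# Two odometry edges plus one loop-closure edge that disagrees slightly.
edges = [(0, 1, 1.0, 1.0), (1, 2, 1.0, 1.0), (0, 2, 1.9, 1.0)]
x_odom = [0.0, 1.0, 2.0]       # raw odometry chain: loop error on one edge
x_opt  = [0.0, 29/30, 58/30]   # least-squares optimum spreads the error
print(graph_error(x_odom, edges) > graph_error(x_opt, edges))  # True
```

The optimizer's job is exactly this: redistribute the loop-closure disagreement across all edges so that F(x) is minimized, instead of leaving it concentrated on the last edge.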
At this point it is necessary to describe the algorithm parameters; these are shown in Table 4 and were taken from [54].

RTAB-Map
RTAB-Map stands for Real-Time Appearance-Based Mapping; it is a graph-based SLAM algorithm composed of a C++ library and a ROS package. This library is open source and has been improved and extended since its beginning, in such a way that the loop closure algorithm implements a memory management strategy [23].
Its processing relies on distributed storage systems: short-term memory, working memory, and long-term memory. Together, these optimize localization and mapping over long periods or in wide spaces, because they limit the size of the space processed, so that the loop closure can be executed in a short time lapse [55,56].
The RTAB-Map implementation is based on simultaneous processing. In graph-based SLAM, as the map grows, the processing, optimization, assembly, and CPU load also grow. For this reason, RTAB-Map establishes a maximum response time at the SLAM output once it has received the sensor data [23,57]. As the latest version of the algorithm admits 2D and 3D LiDAR sensors and is capable of performing visual SLAM, the RTAB-Map 2D LiDAR-based SLAM option [23] was used for the tests performed in this work.
The list of parameters of RTAB-Map is shown in Table 5; they were taken from [58].

Arenas Used
Three different arenas simulated through GAZEBO were created to test the SLAM algorithms. The differences between them are mainly based on the number of irregularities per area that they have, and also on the kind of path that they force the robot to follow.

Common Environments Arena
This arena simulates an apartment with a set of rooms and regular-geometry objects in it; every single place has a fairly good number of irregularities, so that the robot can easily handle the SLAM task. See Figure 1 for reference.

Training Arena
This arena is used for the algorithms' characterization and calibration, but also for the comparison trials. It is shown in Figure 2. It can be considered a middle point between the Common Environments Arena and the Labyrinth Arena, since it has regular figures as the Common Environments Arena does, but also has long corridors around the zero coordinate of the arena, as the Labyrinth Arena does. These are the reasons why it is used for characterization and calibration of the algorithms. The arena tries to challenge the algorithms with some general asymmetry and with angled obstacles, to see how well they deal with this kind of obstacle. The arena itself is nothing but a corridor with a center room containing a single obstacle; however, something that can slip past is that its number of irregularities per area is a bit lower than in the Common Environments Arena, but higher than in the Labyrinth Arena.
This arena is also considered for comparison trials, to reflect how good the characterization and calibration was.

Labyrinth Arena
This arena is the hardest of the three for the SLAM algorithms. At a glance it shows a labyrinth easy to follow; however, it is a very difficult environment to map for any SLAM algorithm, as this arena challenges the algorithms with more complex obstacles and with long corridors, without any irregularity that could help the algorithms to easily locate themselves and recreate the environment map. These two reasons make this arena the hardest for the tests performed in the algorithm comparison. For reference, see Figure 3.

Trajectories Used
Fifteen trajectories were used: six for the Common Environments Arena, six for the Training Arena, and three for the Labyrinth Arena. The main objective of the trajectories is to make the robot traverse the arenas in diverse ways, first starting from coordinate zero (the geometric center of the arenas), then starting from a non-zero coordinate, and finally following the trajectory twice starting from coordinate zero. All three trajectories are followed in the forward direction and then in reverse, except for the Labyrinth Arena, in which the reverse trajectories are the same as the forward ones, so only three trajectories were used in this arena. The match between observations and scenarios is shown in Table 6.

Characterization and Calibration Methods Used
To characterize each of the algorithms, a statistical approach was taken. It is not sensitive to the type of SLAM algorithm or sensors used; it is only sensitive to the data provided by each of the metrics for the trials, so that even SLAM approaches using sensors other than LiDAR can be calibrated following this method, as long as the map representation is compatible with the application of the knn-search metric and the robot pose is obtained in matching measurement units. However, a 2D LiDAR sensor approach was taken to match the analysis with the actual equipment available, and because this is the most common sensor for the 2D SLAM approach, and especially applicable to low-cost robotic platforms.
The methodology focuses on finding statistical evidence of the effects of the algorithms' parameters on the output means of pose accuracy, map accuracy, CPU usage, and memory usage. It is important to highlight that this paper has used mean measurements for characterization and calibration, but other descriptive statistical values can be used if desired.
There are three different stages for calibration, described below. The first stage is only used when the algorithm has a large number of parameters that must be tuned. Here the first statistical tool comes into play, a Plackett-Burman experiment, which is a kind of Design of Experiments with a reduced number of samples, but with the weakness that it only takes into account main effects, since main effects are aliased with 2-way interactions (only the effect of each variable by itself can be obtained). With this tool it can be ensured, at some confidence level defined when analyzing the experiment results, that a variable has an effect on an output.
Next, the second stage is when calibration comes into play. This part of the process considers only the variables that demonstrated that, by themselves, they have an effect on at least one of the four outputs measured. A full-factorial Design of Experiments is used: it combines all the parameters over the ranges defined by the user, and returns a Pareto chart and an equation. With both of these we can determine the best combination that reduces the localization and mapping error, or reduces the resource usage, again at some confidence level defined when analyzing the experiment results.
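The full-factorial stage enumerates every combination of the surviving parameters over their levels; with k two-level factors that is 2^k runs. A minimal sketch (the parameter names and level values are invented for illustration):

```python
from itertools import product

def full_factorial(levels):
    """Return every combination of parameter levels as a list of dicts:
    2^k runs when each of the k factors has two levels."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]

# Hypothetical two-level ranges for three surviving parameters:
runs = full_factorial({
    "resolution":      [0.025, 0.05],
    "map_update_rate": [2.0, 5.0],
    "particle_count":  [30, 80],
})
print(len(runs))  # 2^3 = 8 scenarios to execute
```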
Finally, to compare the algorithms, since the data obtained from each run do not necessarily follow a Gaussian distribution, the central limit theorem is used. The population is considered to be all the different tests that can be performed on these arenas with this robot and with each algorithm, so that, by calculating the mean of the means, we can compare this value against the values obtained from the other algorithms (for a full-data comparison), using the statistical tools valid for Gaussian-behaving samples, in this case hypothesis tests for the mean and the standard deviation of the means (Two-Sample T for the mean, and Two-Sample Standard Deviation for the standard deviation of the means).
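This comparison logic can be illustrated with SciPy: per-run means of a skewed, non-Gaussian error distribution are approximately Gaussian by the central limit theorem, so a Two-Sample T test on those means is valid. The distributions and sample sizes below are synthetic assumptions, not the paper's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def means_of_runs(draw_run, n_runs=30, n_samples=200):
    """One mean per run: the CLT makes these approximately Gaussian even
    though the per-sample errors (exponential here) are skewed."""
    return np.array([draw_run(n_samples).mean() for _ in range(n_runs)])

# Two hypothetical algorithms with different true mean pose errors:
alg_a = means_of_runs(lambda n: rng.exponential(0.10, n))  # ~0.10 m
alg_b = means_of_runs(lambda n: rng.exponential(0.14, n))  # ~0.14 m

t, p = stats.ttest_ind(alg_a, alg_b, equal_var=False)
print(p < 0.10)  # True: equal means rejected at the 90% confidence level
```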

Results
There are three main stages considered along this work: Characterization, Calibration, and Comparison. These are explained in the following sections.

Characterization and Calibration
Since the algorithms were already working and providing results acceptable enough to consider them functional with the default parameters, a soft tuning with small modifications to these parameters was performed, to enhance the performance for the training arena and the simulated robot.
Two statistical experiments and a set of hypothesis tests were performed to tune each algorithm. First, with a Plackett-Burman experiment, we filtered which parameters' main effects over each metric had enough statistical significance for the variation ranges per variable; next, with a full factorial experiment, we tuned these parameters to give the best output for the metrics considered; and finally, we confirmed that the new parameter tuning gives better results than the default parameters with a set of hypothesis tests for the mean and/or the standard deviation. This confirmation was performed with more than two trials, to be able to take advantage of the central limit theorem and get valid hypothesis conclusions.
As a disclaimer, Gmapping was not soft tuned for these trials, since it was already fully tuned in a previous work [59].

Cartographer
For Cartographer, the default parameters did not give a good map accuracy, so the soft tuning was focused on enhancing the map accuracy. Ten different parameters were identified as potentially most significant for the general algorithm outputs; these are shown in Table 7. After the Plackett-Burman and full factorial designs, only three of the listed parameters were modified from their defaults; these can be seen in the final values of Table 7. For the improvement confirmation trials, five runs were executed with the default and improved parameters. With 95% confidence we can tell that the map accuracy and pose accuracy means were improved with the new parameters (see Figure 7), by performing a set of 2-Sample T tests, but at the cost of a memory usage degradation from the default parameters.

HECTOR-SLAM
At the very beginning, with default parameters, this algorithm showed adequate performance for all the metrics, based on Figure 8, so the experiment was focused on really short variations to see if there might be an enhancement in the outputs. For that reason, the parameters identified for the experiment were the ones shown in Table 8. After all the experiments, it was identified that none of the parameters had enough statistical evidence to demonstrate any direct effect on the outputs. Furthermore, it was evidenced that the best scenario for the four metrics was the default scenario, since all the different variations showed a worsened behavior compared to the default values.

KARTO-SLAM
For KARTO-SLAM, eighteen parameters were considered in the soft tuning stage; these are shown in Table 9. After completing the Plackett-Burman experiment, only three parameters surpassed the statistical limit to be considered relevant for CPU usage. A full factorial experiment was executed over these parameters with the same variation ranges used in the Plackett-Burman experiment. After completing the factorial experiment, a scenario was obtained that improved the output for each of the metrics, demonstrated through hypothesis tests over the mean and standard deviation, using five runs with the default versus the new parameters. The improved map is shown in Figure 9.

RTAB-Map
Since RTAB-Map has broader capabilities than the other algorithms, its model was configured to deal only with the 2D SLAM problem. With this, the relevant parameters were selected to tune; these are shown in Table 10. From the filtering stage, three parameters were identified with enough statistical relevance for pose and map accuracy: detectionRate for pose error, and timeThreshold and LoopThreshold for map accuracy. A full factorial experiment was performed with these parameters, giving a total of nine experiments to perform. With this factorial experiment the parameters were tuned for the best scenario; their final values are shown in Table 10.
After soft tuning, with six extra trials with the new parameters versus the default parameters, it was demonstrated with 90% confidence that all the metrics perform better with the new parameter configuration. Figure 10 is presented as proof of the improvement.

Cartographer
Results for Cartographer in terms of pose accuracy were quite stable throughout all the different scenarios executed, except when executing the labyrinth arena starting at a non-zero coordinate (observation 14 in Figure 11). This is a special case where the robot starts in a corridor without irregularities or landmarks to reference itself, making it accumulate the error quickly and leaving it unable to bring the error back to near zero.
In terms of CPU and memory usage, it is noticeable that the longer the test, the higher the usage, since observations related to two laps show higher CPU and memory usage, as can be seen in Figure 12 for the CPU usage behavior and Figure 13 for the memory usage behavior. In regards to map accuracy, there are no trends upon visual inspection of the results, as there is no noticeable correlation to either arena type, trajectory type, or robot direction. See Figure 14 for reference.

Gmapping
Regarding pose accuracy, the behavior is the same as that obtained with Cartographer; Figure 15 shows the time evolution of the pose error. The quick error increase at the beginning of the test in observation 14 (labyrinth arena starting at a non-zero coordinate) is quite visible, which is an expected behavior given the nature of the SLAM algorithms, as was explained before in the Cartographer results. The overall results considering all the tests for Gmapping can be found in Figure 16. In relation to CPU and memory usage, the only noticeable trend was the correlation between them: when CPU usage increased, memory usage decreased, and vice versa. A Pearson test run to confirm this correlation resulted in a strong negative correlation of −0.928. Figure 17 shows their behavior visually, which can be explained by the way Gmapping manages its resources. Gmapping processes the particles on the fly [48], and this can result in periods where the CPU is full of other tasks and memory must store these particles while the CPU gets some time to process them. The same occurs when the CPU has high availability for processing the particles, releasing the allocated memory. For map accuracy there is no real trend noticeable in the dataset, as shown in Figure 18.
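The reported CPU-memory relationship is a plain Pearson correlation over the paired usage series. A synthetic sketch of the same test (the signal shapes are invented; the paper's actual Gmapping data gave r = −0.928):

```python
import numpy as np
from scipy import stats

# Invented series: memory moves opposite to CPU, plus measurement noise.
rng = np.random.default_rng(1)
cpu = 40 + 30 * np.sin(np.linspace(0, 6 * np.pi, 120))   # CPU %
mem = 900 - 4 * cpu + rng.normal(0, 10, cpu.size)        # memory, MB

r, p_value = stats.pearsonr(cpu, mem)
print(r < -0.9, p_value < 0.05)  # strong, significant negative correlation
```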

HECTOR-SLAM
A general commentary on HECTOR-SLAM is its highly noticeable susceptibility to environments without irregularities, where HECTOR-SLAM gets completely lost in terms of map and pose accuracy. The empirical rule observed is that it gets lost when it interprets that places are longer than they really are (the long corridor issue), or when it interprets that the robot is stopped at the last place where it detected an irregularity.
This behavior can be observed mainly in the pose accuracy in Figure 19, with the value obtained in observation 14, which is the labyrinth arena starting at a non-zero coordinate, as the robot begins its movement inside a corridor without irregularities or landmarks to reference itself. This is similar to the results obtained for the Gmapping and Cartographer SLAM algorithms.
Also, the worst result was obtained for observation 12, with a peak error value of 2255.05 m, which is the common environments arena (two laps in reverse). In this case, running two laps instead of one has a negative effect on the metric. The cause is associated with the algorithm's difficulty in closing the loop for this arena, which should happen at about 1500 s in Figure 20, which represents the timeseries plot for the pose error in observation 12. At first, the trial was considered an outlier; however, repeating the test under the same conditions used for the other trials gave a similar result.
As for the map accuracy, Figure 21 shows that the output for the common environments arena is better than for the training arena, and that the training arena is better than the labyrinth arena (compare the visual mean of observations 7-12, 1-6, and 13-15, respectively). This is confirmed by a hypothesis test between them at a 90% confidence level, and it is associated with the number of irregularities per arena, which lets the algorithm create a better representation of the environment when more of them are present. Lastly, in regards to memory and CPU usage, it was verified that there is no wide difference between the different scenarios, as Figure 22 shows. It looks like the memory usage is better when repeating the trajectories. In addition, the algorithm uses about 15% of one single core.

KARTO-SLAM
Examining the results for pose accuracy, KARTO-SLAM had a quite stable behavior (see Figure 23), except for observation 14, which is the labyrinth arena starting at a non-zero coordinate. The root cause is the lack of irregularities at the beginning of the test, which makes the algorithm wrongly estimate the pose of the robot and quickly accumulate a high pose error. This behavior can be seen in Figure 24, where it is clear that, at time zero, the pose accuracy was quite good; however, after a brief time driving the arena, the error goes up and stays that way almost throughout the whole test, in a similar way to the previously analyzed results of the other SLAM algorithms. Next, for memory usage, it was identified that the longer the test, the higher the memory usage, so that two-lap trials spent more memory than one-lap trials. This can be seen in Figure 25: observations 5 and 6 are the two-lap trials for the training arena, observations 11 and 12 are the two-lap trials for the common environments arena, and observation 15 is the two-lap trial for the labyrinth arena. In addition, it was identified that CPU and memory usage had a highly evident correlation, confirmed with a Pearson test, giving a correlation of 0.911 with a p-value of 2.4265 × 10^−6. Lastly, for map accuracy, it was evidenced and statistically supported that the higher the number of irregularities per area, the better the map accuracy. With a 90% confidence level it was confirmed that the maps generated with the common environments arena were more accurate (lower population mean) than those of the training arena. The same occurs for the training arena against the labyrinth arena. This can be visually confirmed by looking at Figure 26, where observations 1 to 6 pertain to the training arena, 7 to 12 to the common environments arena, and 13 to 15 to the labyrinth arena.

RTAB-Map
Regarding pose accuracy, the algorithm behaved as KARTO-SLAM did, with satisfactory performance for all the trials except trial 14, as Figure 27 shows. The cause is similar to that of the other SLAM algorithms: the error grew quickly at the beginning of the test and remained at that level throughout. For CPU and memory usage, a strong direct correlation was identified between them, visually evident in Figure 28 and confirmed with a Pearson correlation test, which gave a correlation of 0.942 with a p-value of 1.6380 × 10⁻⁷ at a 95% confidence level. Visually, it is also noticeable that CPU and memory usage grow when the tests last longer, since observations 5, 6, 11, 12, and 15, which are the two-lap trials, have higher means than the one-lap trials on the same arenas. Lastly, regarding the RTAB-Map results, map accuracy was noticeably better on arenas with a higher density of irregularities per area: the maps obtained were more accurate for the common environments arena than for the training arena, and likewise the training arena gave better maps than the labyrinth arena. Figure 29 compares all the observations.

Algorithms Comparison
For the algorithm comparison, two statistical tools were used: a 2-Sample T test and a 2-Sample Standard Deviation test, both relying on the central limit theorem. They compare the means and standard deviations of two samples in order to draw conclusions about the means and standard deviations of their populations at a given confidence level, in this case 90%.
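The two comparisons rest on two sample statistics: the difference of sample means (for the 2-Sample T) and the ratio of sample variances (for the 2-Sample Standard Deviation test). The sketch below computes both from hypothetical pose-error samples; in the actual procedure each statistic is then checked against the corresponding critical value (t and F tables) at the 90% confidence level.

```python
from statistics import mean, stdev

def two_sample_summary(a, b):
    """Compute the 2-sample statistics used in the comparison:
    difference of sample means and the ratio of sample variances."""
    mean_diff = mean(a) - mean(b)
    var_ratio = stdev(a) ** 2 / stdev(b) ** 2  # F statistic for the SD test
    return mean_diff, var_ratio

# Hypothetical pose-error samples for two algorithms (meters).
alg_a = [0.05, 0.06, 0.04, 0.07, 0.05, 0.06]
alg_b = [0.09, 0.12, 0.08, 0.15, 0.10, 0.13]

diff, f_stat = two_sample_summary(alg_a, alg_b)
# diff < 0 suggests alg_a has the lower mean; f_stat < 1 suggests it is
# also the less variable of the two.
print(f"mean difference = {diff:.3f} m, variance ratio F = {f_stat:.3f}")
```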

Pose Accuracy
Regarding pose accuracy, it was quite hard to plot all the samples from all the algorithms together because of their range differences. To solve this, a time-series plot showing each of them separately was used, as can be seen in Figure 30. Comparing the samples visually by their ranges, the main result was that RTAB-Map performed better than KARTO-SLAM, which in turn performed better than Gmapping, followed closely by Cartographer, with HECTOR-SLAM performing far worse than all of them. However, with the dataset obtained there was only evidence to demonstrate, at a 90% confidence level, that the RTAB-Map population mean was lower than KARTO-SLAM's, which in turn had a lower population mean than Gmapping, Cartographer, and HECTOR-SLAM. In addition, there was no evidence of any difference in population mean or standard deviation between Gmapping and Cartographer, only that both were superior to HECTOR-SLAM in standard deviation, meaning that, in terms of pose accuracy, HECTOR-SLAM would give more variable results across scenarios than these two.
The data used for this section can be referenced in Table 11.

Map Accuracy
For map accuracy, Figure 31 shows a box plot for all the algorithms together, with a trend line centered on their means. With these results, it was possible to confirm at a 90% confidence level that RTAB-Map outperformed all the other algorithms, followed closely by KARTO-SLAM, then by Cartographer, next by Gmapping, and finally by HECTOR-SLAM, whose difference from Gmapping could not be demonstrated by the mean, but could by the standard deviation. The data used for this section can be referenced in Table 12.

CPU Usage
With respect to CPU usage, Figure 32 shows a box plot of all the algorithms with all their sample means. It shows that HECTOR-SLAM outmatched the other algorithms, followed closely by RTAB-Map, then by KARTO-SLAM, next, by a wide margin, by Gmapping, and finally by Cartographer. These findings were verified for the means by four hypothesis tests, all demonstrated at a 90% confidence level. The data used for this section can be referenced in Table 13.

Memory Usage
For the last metric, memory usage, the data representation used was a set of box plots with a trend line pointing to their means, as seen in Figure 33. From these results it was demonstrated with 90% confidence that HECTOR-SLAM is the algorithm that best manages memory resources, followed closely by KARTO-SLAM, then, by a wide margin, by Cartographer, next by RTAB-Map, and finally by Gmapping. It was not possible to demonstrate any population difference between Cartographer and Gmapping, nor between RTAB-Map and Gmapping; however, there was enough evidence to demonstrate that Cartographer performed better than RTAB-Map by their means, and that the population standard deviation of Gmapping is greater than that of RTAB-Map, which is why Gmapping is considered the worst of the algorithms for this metric. The data used for this section can be referenced in Table 14.

Algorithms Comparison Summary
To summarize the previous sections, Table 15 was created. It ranks the algorithms on a numeric scale, where one means the best of them. In addition, the label M indicates that the superiority or inferiority was demonstrated by a 2-Sample T test, and S that it was demonstrated by a 2-Sample Standard Deviation test.
With Table 15, it can be stated that if map and pose accuracy are the priorities, regardless of CPU and memory usage, then RTAB-Map is the preferred algorithm. However, if the mobile robot platform has limited resources, a better approach could be using HECTOR-SLAM, with the caveat that it is the worst of them regarding map and pose accuracy.

Nevertheless, a different approach can be taken: classifying all the algorithms by their means on a scale from zero to one hundred for each metric, where zero represents the algorithm with the lowest mean and 100 the algorithm with the highest mean. With this classification, KARTO-SLAM comes up as the best choice among all of them, since it is the algorithm that shows the lowest average under this methodology. Equation (6) details this approach, and the results obtained are shown in Table 16:

Ave_Alg = (1/N_Met) · Σ_Met [ 100 · (X̄_Alg − X̄_Min) / (X̄_Max − X̄_Min) ]   (6)

where:
• Ave_Alg is the average to calculate, considering all the metrics.
• Met is the metric to be averaged, either pose accuracy, map accuracy, CPU usage, or memory usage (N_Met is the number of metrics).
• X̄_Alg is the sample mean obtained from the algorithm being analyzed.
• X̄_Min is the smallest sample mean obtained from any of the algorithms for that metric.
• X̄_Max is the largest sample mean obtained from any of the algorithms for that metric.

Based on the evidence of Table 16 and Figure 34, evaluating the algorithms by this procedure led to the conclusion that KARTO-SLAM brings the highest overall performance considering CPU and memory usage along with map and pose accuracy. Furthermore, if memory usage is not a limitation, RTAB-Map has better results in all the other metrics, followed by Cartographer, then HECTOR-SLAM, and last Gmapping.
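The scoring of Equation (6) can be sketched as follows. The function name and the sample means below are hypothetical placeholders; the actual values come from the paper's Table 16.

```python
def normalized_scores(sample_means):
    """Scale each metric's sample means to 0-100, as in Equation (6),
    then average across metrics; a lower overall score is better."""
    algs = next(iter(sample_means.values())).keys()
    totals = {alg: 0.0 for alg in algs}
    for metric, means in sample_means.items():
        lo, hi = min(means.values()), max(means.values())
        for alg, m in means.items():
            totals[alg] += 100 * (m - lo) / (hi - lo)
    # Divide by the number of metrics to obtain Ave_Alg per algorithm.
    return {alg: t / len(sample_means) for alg, t in totals.items()}

# Hypothetical sample means for two metrics (real values: Table 16).
data = {
    "pose":   {"KARTO": 0.05, "RTAB": 0.04, "HECTOR": 0.20},
    "memory": {"KARTO": 120,  "RTAB": 300,  "HECTOR": 100},
}
print(normalized_scores(data))
```

Because the scale is relative per metric, an algorithm that is consistently near the best on every metric (as KARTO-SLAM was) ends up with the lowest average score, even if it is not the single best on any of them.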

Conclusions
The following are the main conclusions derived from the results of this work:
• The proposed methodology is useful to characterize, calibrate, and compare any SLAM algorithm, regardless of the robot sensors or SLAM type, as long as the map representation is compatible with the application of the knn-search metric and the robot pose is obtained in matching measurement units, since the proposed characterization and calibration is based on the final results of the SLAM algorithms rather than on their internal structure or on the sensors these algorithms use. The method proposed in this paper provides strong statistical evidence, based on the pose error, map accuracy, CPU usage, and memory usage, with descriptive and inferential statistics to bring confidence levels to the overall behavior of the algorithms and their comparisons.
• KARTO-SLAM noticeably outperformed all the other algorithms because it balances the use of resources while holding good SLAM performance, as Figure 34 and Table 16 show.
• Without considering resource usage, the best algorithm is RTAB-Map, which does an excellent job at mapping and at calculating its own pose within the map.
• HECTOR-SLAM outperformed the others when saving resources is the feature that matters, with statistical evidence that it uses less CPU and memory than the other algorithms; however, it gave the worst results in terms of localization and mapping.
• The localization metric (pose accuracy) gets worse as obstacle density decreases, for all algorithms. This makes sense, since SLAM algorithms require irregularities to reference the robot against; without them, the algorithm must trust the odometry system, which is less accurate because it does not consider wheel slippage, dimensional irregularities in the robot model, etc.
• There was a hypothesis that repeating the trajectories twice would enhance the localization and mapping output. However, no statistically supported enhancement was observed for either of these metrics.
• Statistical evidence was provided that starting at a coordinate without any irregularity for the robot to reference itself against can become a highly important issue for pose accuracy, one the algorithm may not be able to correct. This was confirmed through the experiments performed in the labyrinth arena: when starting at a non-zero coordinate, the pose error grows quickly and all the algorithms had trouble correcting this failure as the simulation continued, a situation that does not happen when starting at the zero coordinate, where there are enough irregularities for the robot to locate itself.
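The knn-search map-accuracy metric the methodology relies on can be sketched as a nearest-neighbor distance between the generated map and a ground-truth point set. The brute-force 1-NN version below is an illustrative assumption about the metric's shape, not the paper's exact implementation; real maps would use a KD-tree for the neighbor search.

```python
from math import hypot

def knn_map_error(map_pts, truth_pts):
    """Mean nearest-neighbor distance from each generated-map point to
    the ground-truth point set (lower is better). Brute-force 1-NN
    sketch of a knn-search style map-accuracy metric."""
    total = 0.0
    for mx, my in map_pts:
        total += min(hypot(mx - tx, my - ty) for tx, ty in truth_pts)
    return total / len(map_pts)

# Toy occupied-cell coordinates (meters); real maps have thousands.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
slam  = [(0.1, 0.0), (1.0, 0.1), (2.0, -0.1)]
print(f"mean 1-NN error = {knn_map_error(slam, truth):.3f} m")
```

Any map representation that can be reduced to such a point set (e.g., occupied cells of an occupancy grid) is compatible with this kind of metric, which is the compatibility condition the first conclusion refers to.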
As future work, the method can be extended to consider longer test times and larger arenas, to determine the best algorithms for these cases of indoor SLAM applications. In addition, new metrics can be defined for 3D SLAM and for cooperative, distributed SLAM algorithms whose map representation is not compatible with the application of the knn-search metric.