Autonomous Mobile Scanning Systems for the Digitization of Buildings: A Review

Abstract: Mobile scanning systems are being used increasingly frequently in industry, construction, and artificial intelligence applications. More particularly, autonomous scanning plays an essential role in the field of the automatic creation of 3D models of buildings. This paper presents a critical review of current autonomous scanning systems, discussing essential aspects that determine the efficiency and applicability of a scanning system in real environments. Some important issues, such as data redundancy, occlusion, initial assumptions, the complexity of the scanned scene, and autonomy, are analysed in the first part of the document, while the second part discusses other important aspects, such as pre-processing, time requirements, evaluation, and opening detection. A set of representative autonomous systems is then chosen for comparison, and the aforementioned characteristics are shown together in several illustrative tables. Principal gaps, limitations, and future developments are presented in the last section. The paper provides the reader with a general view of the world of autonomous scanning and emphasizes the difficulties and challenges that new autonomous platforms should tackle in the future.


Introduction
The creation of 3D models of buildings from 3D data is still semi-manual work. Of particular relevance in this respect is the fact that, during the extraction of as-is models, an operator must manually capture and process millions of data items (mainly 3D points), which is time-consuming and introduces errors into the model obtained.
However, the era of the automatic creation of what are denominated as Building Information Models (BIM) has brought about new systems, procedures, and algorithms that are able to collect and process a huge amount of data efficiently without the help of humans. In the last few years, the fields of artificial intelligence and robotics have, therefore, burst into the automatic BIM world.
Very few reviews concerning autonomous 3D scanning in construction can be found in the literature to date. Representative surveys related to this research field can be found in References [1–3]. Lehtola et al. [1] present a review of the latest commanded mobile scanning techniques, focusing on aspects related to the quality of the point cloud and the metrics used. The survey also presents essential aspects that determine the goodness and applicability of the existing mobile autonomous 3D scanning systems and discusses the current limitations and gaps in this research field. Kostavelis et al. [2] present a survey regarding semantic mapping obtained from mobile robots. The paper categorizes the existing methods and shows the current applications implemented in mobile robots. A discussion concerning the sensors and strategies utilised in the construction of metric maps of the inside of buildings is carried out, and the authors conclude their paper with a discussion of open issues.

Autonomous Scanning Platforms
Completely autonomous systems are those that are able to perform navigation, 3D data acquisition, and 3D data processing, without any initial knowledge of the scene and without human interaction. This degree of autonomy is attained thanks to efficient next best view (NBV) algorithms, which are adapted to each particular mobile platform. Representative examples of mobile scanning robots are illustrated in Figure 1.
The first autonomous platforms appeared in the period 1995–2010. Sequeira et al. [4] present a simple autonomous robot that partially digitizes a single room with a time-of-flight laser range finder. A pan-tilt unit is used to collect a range image with 140 by 140 samples, covering a field of view of 60° by 60°. Surmann et al. develop the Ariadne robot [5], a mobile platform with a 3D laser range finder that is capable of digitizing large indoor environments. The Rosete platform is presented by Strand et al. in Reference [6]. In order to overcome the small viewing cone of the earlier commercial 3D scanners, they introduce a rotating laser scanner mounted on a mobile platform and successfully scan simple indoor scenarios. ATRV-2 AVENUE [7] is designed to acquire data from large-scale outdoor sites. It consists of a laser-scanner-equipped robot that assumes a previous rough localization (2D map), which is necessary to calculate the route with a minimal set of views. Therefore, although the system has a high degree of autonomy, it requires essential information about the scene.
The platform of Blodow et al. [8] autonomously explores indoor environments and provides a semantic map obtained from colored point clouds. The robot, denominated the "PR2" robot, has a tilting laser scanner and a color camera, which is panned and tilted to overcome the problem of its short field of view. This technique is applied to drawers and doors. The same robot is used in Reference [9], but in this case to analyze the performance of a next best view algorithm in small and cluttered environments. Here, the scene consists of a table top with different objects, which are sensed and recognized for robot interaction tasks. Charrow et al. [10] carry out 3D mapping of indoor environments using a ground robot equipped with a 2D laser range finder and an RGB-D camera. A single experiment is developed on a set of connected corridors without occlusion. Iocchi et al. [11] obtain 3D maps of buildings by integrating a 2D laser, stereo vision, and an IMU on a mobile robot. Bormann et al. [12] present Irma3D, a robotic platform that automatically creates 3D thermal models of indoor environments. The mobile platform is equipped with a 3D laser scanner, a thermal camera, and an RGB camera; a 2D laser scanner is used for obstacle avoidance.
In the last few years, micro aerial vehicles (MAV) have also been used as autonomous platforms that extract 3D information from indoor and outdoor scenes. Bircher et al. [15] propose a new path-planning algorithm for the autonomous exploration of an unknown volume using a firefly hexacopter and a stereo camera. The experiment took place in a single room. A similar UAV platform with two configurations is presented in Reference [16]. Heng et al. [18] present an algorithm for simultaneous exploration and coverage with an assumed data acquisition system composed of an MAV equipped with a forward-looking depth-sensing camera. The system is simulated in an office-like environment.
In another context, the platform of Rusu et al. [19] acquires 3D maps of kitchens with the aim of interacting with recognized objects. The robot enters the room and sweeps the scene with a laser mounted on its end effector. The output is a coarse 3D model composed of cuboids and planes that represent relevant objects, such as containers or tables.
The most recent proposals are those of References [13,14,17]. Kurazume et al. [17] propose an innovative cooperative multiple-robot system that scans indoors and outdoors. The system is composed of a mobile robot equipped with an on-board 3D laser scanner (the parent robot) and several child robots, including terrestrial robots and quadcopters. The parent robot obtains 3D data and generates a large-scale 3D model, whereas the child robots implement a precise localization technique. The system does not have any knowledge of the environment and works autonomously in complex scenarios. Kim et al. [14] introduce a robotic platform with a hybrid laser scanning system composed of five 2D laser scanners and a digital single-lens reflex (DSLR) camera. The robot classifies the next positions obtained after analyzing the visible area of a previously built 2D map into three diffuse categories and moves to the best next position. By following this method, the system can move autonomously in corridors, but the scanning completeness is not guaranteed. Finally, the autonomous robotic platform MoPAD [13], composed of a 3D laser scanner and an RGB camera, is able to generate detailed 3D models of the indoors of buildings. This platform has been tested in more complex scenes with clutter and occlusion. Table 1 presents a summary of a set of representative autonomous mobile scanning systems, including the environment tested, the technology, and the degree of autonomy.

Context of the Review in the Process of the Creation of As-Is Models
Some of the scanning systems referenced collect partial data of the environment [4] or do not generate a formal geometric model of the scene [6,10,12,14,17]. However, our interest lies in automatically generating geometric semantic models of buildings. The degree of automation varies from the simple automatic acquisition of data (i.e., coordinates of 3D points, color, temperature, etc.) carried out by mobile robots/platforms, to the automatic detection and positioning of small components of the building (e.g., signs or sockets on walls). In general, the modeling tasks are carried out at five levels, each of which provides a particular semantic model. Figure 2 shows these levels and the outputs at each of them.
This paper discusses the methodologies and processes followed to accumulate dense 3D information of the scene with the objective of creating a realistic 3D model of a building. Of all these levels, this survey covers only the first and the second level, which are directly related to the acquisition of 3D data.
The first level (see Figure 2) concerns the automatic data acquisition of the building. The semantic model at this level is, therefore, a mere collection of unconnected 3D data of the visible scene (3D data are coordinates and, sometimes, color). Thanks to the scan planning and next best scan algorithms, the autonomous moving platforms collect sufficient information to roughly represent the inside [5,6,10–12] or outside [7] of the buildings.
At the second level, a simplified geometric model of the building is obtained. At this level, the model is composed of primary features, such as vertices, edges, and faces. This representation is commonly implemented by using a graph structure, which relates these geometric primitives, all of which form a B-rep representation [7,20–22]. This simple model does not yet contain valuable information from a semantic point of view.
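As an illustration, a B-rep skeleton of this kind can be sketched as a small graph of vertices, edges, and faces. The following minimal Python sketch is purely illustrative (the class and field names are our own and are not taken from any of the cited systems):

```python
# Illustrative minimal B-rep: faces reference boundary edges, which reference
# vertices; the shared references form the relational graph structure.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Vertex:
    x: float
    y: float
    z: float

@dataclass(frozen=True)
class Edge:
    start: Vertex
    end: Vertex

@dataclass
class Face:
    edges: list  # ordered boundary loop of Edge objects

@dataclass
class BRepModel:
    faces: list = field(default_factory=list)

    def vertices(self):
        """All distinct vertices referenced anywhere in the model."""
        return {v for f in self.faces for e in f.edges for v in (e.start, e.end)}

# A unit floor quad represented as a single face
v = [Vertex(0, 0, 0), Vertex(1, 0, 0), Vertex(1, 1, 0), Vertex(0, 1, 0)]
floor = Face([Edge(v[i], v[(i + 1) % 4]) for i in range(4)])
model = BRepModel([floor])
assert len(model.vertices()) == 4
```

In a real system, walls, floor, and ceiling extracted from the point cloud would each become a face, and shared edges would encode room adjacency.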
The first and second levels are sometimes merged into a single level at which the scanner simultaneously collects the data from a mobile platform and generates a primitive polyhedral model of the indoor scene scanned [8,11,13].
Higher levels concern the recognition of essential structural elements (SE), the recognition and labeling of essential parts of these structural elements and the recognition of small building service components. None of these levels, which concern 3D data processing and modeling, are within the scope of this review.
In the following sections, we analyze current key issues and problems in the field of the autonomous 3D scanning of buildings and make a comparison among the most important systems in the construction context.


Open Issues
Autonomous scanning appeared in the late 1990s as a new and challenging topic and became more and more important as the functionality and accuracy of sensors improved.
While some scanning platforms are currently able to autonomously scan specific environments and simple scenarios, there are still a number of underlying questions in this research field, which are rarely debated in papers. These open issues are related to the achievements and limitations of the autonomous scanning methods. As will be shown throughout this paper, the current systems still have gaps and serious weaknesses that need to be addressed in order to create autonomous systems that are able to work in realistic environments. A critical discussion of this is presented in the following subsections.

Utility and Redundancy of the Data
3D scanning entails collecting data from the scene, but the question is: which of the collected data are actually necessary for the subsequent application, and which are not? Or, put otherwise, is there a particular strategy that considers the utility (i.e., useful or useless) of the data before the scan is carried out?
While the final goal of some approaches is to create a 3D model of a building, the objective of their scanning stage is to accumulate as much data of the visible area as possible [14]. Since the goal is simply to scan everything inside (or outside) the building, these approaches do not deal with the utility of the collected data [5,9,12,23]. Redundancy and cluttering are thus ignored by these brute-force scanning techniques. As a consequence of this, a huge amount of data has to be processed after several scans, with the sole aim of recognizing furniture [8] or extracting frontiers. 3D mapping [7,9,10,12,19,24], robot localization/navigation [10,25,26], and digitization [5,27] are research lines in which the data redundancy problem is not considered in the data acquisition stage. However, some redundancy in the collected data can also be useful, for instance, to increase robustness or the probability of completeness of the model.
In the case of the extraction of a model of the building, most techniques have to manage a huge amount of irrelevant 3D data, which do not correspond to the structural components of the building, but rather to other objects inside the scene (i.e., furniture and clutter) [6,11,28]. These methods are inefficient because a part of the point cloud is unimportant as regards creating the 3D model. For example, if the goal of the process is to detect openings within a room, the data pertaining to the furniture is unimportant information.
In contrast with these methods, there are few that focus the scanning towards collecting 'useful' 3D data of the scene. This means that if the objective is to create a 3D model of the inside of the building, the scan planning algorithm is focused on collecting data from the structural elements (SE), which are essentially the floor, walls, columns, and ceiling [13]. In this case, the next best scanner position is based on the already sensed and recognized parts of the building structure. Such algorithms consequently reduce the amount of data and the time, and also alleviate the algorithmic complexity in further processes.
In summary, optimizing the choice of the regions of the scene to be sensed is an unexplored issue that could be addressed in the future. The alternative of interactive scan planning strategies, therefore, makes sense in this context. A hybrid human-computer approach entails a semi-automated optimization of the scanning process, in which the knowledge of the geometry and heuristics can help the user to decide the best scan planning and achieve a high-quality model of the scene. This strategy has been successfully carried out during the large-scale recording of heritage [29].

The Complexity of the Scene
Among other aspects, a complex scene has a high component of occlusion and clutter, such as a lounge in an inhabited building. Complexity also implies irregular geometry, such as non-regular rooms (i.e., concave/convex storey rooms), which are connected through openings (mainly doors).

Geometry
With regard to the interiors of buildings, some systems are constrained to the scanning of corridors [10], which are very simple shapes. Most works deal with scenes composed of a corridor to which several rooms are connected [5,6,11,12,24–27,30]. In some cases, the mobile scanning system moves along the corridor, enters the room in order to take 3D data, leaves the room, and goes back to the corridor [5,12]. These systems are prepared to work only in such topologies.
Many of the systems work in rectangular rooms [5,6,12,15,18,24,25,27,30], and a few in free-shape interiors [11]. Iocchi et al. [11] generated a multilevel 2D-map with which to generate not-necessarily orthogonal structural elements. Moreover, the system is able to manage the changes in the plane of the room with the help of an inertial measurement unit. Scenarios with more flexible shapes are used in Reference [26], but the goal here is not to build a 3D model of the scenario, but rather to navigate with the help of depth cameras. Approaches that work in free-shape scenarios, such as abandoned mines [31] with winding corridors, are usually hand-guided and are not, therefore, within the scope of this paper.
The reconstruction of concave rooms is not frequently dealt with. Jun et al. [32] propose a method that cuts the point cloud data with arbitrary planes and extracts the convex parts. Nevertheless, some concave structures may not be detected. This method has been tested by humans carrying a 3D LIDAR. The system presented by Prieto et al. [13] is able to deal with convex and concave rooms connected by doors.
Few exterior-scanning techniques with robots can be found in the literature. Good examples are those of Wolf et al. [33] and Blaer et al. [7]. Wolf et al. digitize buildings and obtain very simple 3D shape models, such as parallelepipeds, but the robot is commanded externally. Blaer et al. [7] present automated data acquisition on large-scale outdoor sites, but they assume that a two-dimensional map of the region is known. Kim et al. [14] validate their method in both outdoor and indoor environments. Recent MAVs with on-board 3D sensors have been applied to extract coarse models of the facades of buildings [15,16]. Figure 3a presents prototypes of simple and complex indoor geometry, whereas Figure 3b shows scenes tested with the platforms referenced in Section 2.
Much work is required to automatically digitize more complex scenes. The autonomy of the current systems is limited to indoor scenes composed of a large single room or a wide corridor and a few rooms on a single storey. The autonomous scanning of a complete building composed of multiple storeys is one of the most important future challenges.

Occlusion and Clutter
In the building scanning field, occlusion is considered to be one of the principal sources of uncertainty. In order to avoid occlusions, the operators scan the environment from many different viewpoints and thus generate a fused point cloud. However, human scanning can be inefficient and time-consuming.
There are authors who do not explicitly refer to the occlusion problem [11]. Some approaches work only in empty scenarios or deal with little occlusion and clutter [5]. In these cases, the point cloud is easily processed and the navigation of the mobile system in interiors is also quite simple [10]. Other methods are tested by considering small obstacles in the scene [6,16,26]. In these scenes, the segmentation of points that belong to structural parts of the building is calculated by applying RANSAC [34] or other similar matching algorithms [35]. The next-best-position scan algorithms work perfectly in such a friendly framework and a single 3D model is easily obtained.
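As an illustration of the segmentation step mentioned above, a minimal RANSAC plane fit can be sketched in a few lines of Python. This is a generic textbook version, not the implementation of any cited system; the parameter values (iteration count, inlier threshold) are illustrative assumptions:

```python
import numpy as np

def ransac_plane(points, n_iters=200, dist_thresh=0.02, rng=None):
    """Fit a dominant plane (e.g. a wall or the floor) to an (N, 3) point
    cloud with RANSAC. Returns (unit normal n, offset d with n.p = d,
    boolean inlier mask)."""
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_model = None
    for _ in range(n_iters):
        # Hypothesize a plane from three random points
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:          # degenerate (collinear) sample
            continue
        n = n / norm
        d = n @ p0
        # Points within dist_thresh of the plane vote for it
        inliers = np.abs(points @ n - d) < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (n, d)
    return best_model[0], best_model[1], best_inliers

# Synthetic example: a noisy floor plane (z ~ 0) plus random clutter above it
data_rng = np.random.default_rng(0)
floor = np.c_[data_rng.uniform(0, 5, (500, 2)), data_rng.normal(0, 0.005, 500)]
clutter = data_rng.uniform(0, 5, (100, 3))
n, d, mask = ransac_plane(np.vstack([floor, clutter]), rng=1)
assert abs(abs(n[2]) - 1.0) < 0.1    # recovered normal is close to vertical
assert mask[:500].mean() > 0.8       # most floor points classified as inliers
```

In a scanning pipeline, this fit would be run repeatedly, removing the inliers of each detected plane, until the main structural surfaces have been extracted.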
On the contrary, in inhabited scenes, a lot of objects, such as pieces of furniture, shelves, or even human beings, might occlude the essential structural parts of the building. In these cases, the lack of information entails the use of more robust and efficient NBV algorithms [12,13,30]. Some examples of works that deal with high levels of occlusion are those of References [7–9,12,13]. In Reference [7], a church with a particularly cluttered environment, which includes several chairs and tables, is scanned by using a particular NBV algorithm. Owing to the difficulty of the scene as regards autonomous navigation, the robot is manually guided along the narrow paths towards the next scanning position. Besides indoor scenes, this method also digitises outdoor scenes, e.g., large forts, which contain high levels of clutter and occlusion. Blodow et al. [8] propose an NBV algorithm for the digitization of kitchen environments. These scenes usually contain a high level of clutter and occlusion owing to the kitchen furniture and household appliances in them. Bormann et al. [12] propose an NBV algorithm that combines 2D and 3D planning. The 3D information is used to reduce the percentage of occlusion obtained from the 2D planning. The NBV algorithm proposed by Potthast et al. [9] is able to work in two different kinds of scenes: small and cluttered environments (a table top with a large number of objects) and large-scale office environments. The small environments contain a high level of clutter and occlusion, whereas the large-scale office environments are simple simulated scenes with little clutter. The method proposed by Prieto et al. [13] addresses the occlusion problem in the planning algorithm itself. The experimentation presented is carried out in several scenarios with a high level of clutter and occlusion.
Figure 4a presents prototypes of low and high occlusion indoors, whereas Figure 4b shows different examples of scenes with occlusion that have been tested in autonomous systems.

The Next Best Scan Position
One of the keys in automatic scanning with occlusion is the good selection of the next scanner position. The decision regarding the best next position should lead to a complete, high-quality, and non-redundant digitization process. This is known in the literature as the Next Best View (NBV) problem [36], but in our context, it could be renamed the Next Best Scan (NBS) problem.
Most autonomous methods use the current 2D map of the scene and estimate the next scan position on the basis of the future visibility of the scene, sometimes with low levels of occlusion [14]. Several of these techniques employ the frontier-based approach as a starting point [17,37]. However, 2D information is highly incomplete in terms of occlusion in a 3D world and frequently leads the system to erroneous or non-optimal positions. In addition, the next best position algorithm should take into account further important parameters regarding the accessibility and safety of the mobile platform, along with its cost in terms of power.
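The frontier-based starting point mentioned above can be sketched concisely: frontier cells are observed-free cells of the 2D occupancy grid that border unobserved cells, and the planner drives the platform towards them. The grid encoding and function names below are our own illustrative choices, not those of any cited system:

```python
import numpy as np

# Illustrative occupancy-grid labels
FREE, OCCUPIED, UNKNOWN = 0, 1, 2

def frontier_cells(grid):
    """Return (row, col) indices of FREE cells with at least one UNKNOWN
    cell in their 8-neighbourhood -- the 'frontiers' a frontier-based
    planner would send the scanning platform towards."""
    frontiers = []
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] != FREE:
                continue
            neigh = grid[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            if (neigh == UNKNOWN).any():
                frontiers.append((r, c))
    return frontiers

# A 4x4 map: left half already observed free, right half still unknown
grid = np.full((4, 4), UNKNOWN)
grid[:, :2] = FREE
print(frontier_cells(grid))   # -> [(0, 1), (1, 1), (2, 1), (3, 1)]
```

Only the free cells of column 1, which border the unknown half, are returned; a planner would pick one of them (e.g. the nearest reachable one) as the next goal.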
3D information-based NBV algorithms are more efficient when exploring volumes and inspecting surfaces. Blaer et al. [7] propose a two-stage planning algorithm. In the first stage, a coarse model of the scene is obtained by making use of a 2D map, setting several random scanning locations on the map, and selecting an optimal set of positions that covers the boundaries of the free space. In the second stage, the coarse model obtained is refined. A 3D NBV algorithm is executed in a voxel space with the labels unseen, seen-empty, and seen-occupied. The position from which most boundary unseen voxels (unseen voxels adjacent to seen-empty voxels) are seen is then selected as the next best position. Surmann et al. [5] develop a mixed 2D-3D algorithm that calculates several NBVs from various slices of the point cloud and selects the best option. Bormann et al. [12] also propose a mixed 2D-3D algorithm, in which the robot moves to the positions obtained from the 2D NBV until it recognizes an enclosed space. The system then uses a 3D NBV algorithm. The strategy is similar to that of Blaer et al. [7].

Potthast et al. [9] present a probabilistic NBV approach using Markov Random Fields. The method assigns to each voxel the probability of being seen from the next scan position. The voxel space contains occupied, free, and unobserved voxels, and the NBV is defined as the position with the highest expected knowledge gain. In Reference [6], another form of representation is used: the next best view is determined using a 2D grid that stores different attributes of the 3D world. An octree representation is used in Reference [8]. The octree space is labeled with four different labels: occupied, free, unknown, and fringe (i.e., voxels labelled as free that are adjacent to unknown voxels). The goal here is to find the pose of the robot from which most fringe voxels are seen with an overlap of at least 50%. Meng et al. [16] also create an octree structure and define the NBV position as a function of a volumetric information gain model. They propose a two-stage planner, consisting of a frontier-based boundary coverage planner and a fixed-start open travelling salesman problem solver. The information gain is similar to the entropy concept [9], which is the increase in knowledge from a visibility-based propagation with ray-casting. Charrow et al. [10] propose a two-stage planning approach. In the first stage, a set of candidate trajectories is generated by using a combination of global planning and local motion primitives. The trajectory that maximizes the objective is then chosen. This trajectory is refined by maximizing the CSQMI (Cauchy-Schwarz Quadratic Mutual Information) objective, while satisfying the motion constraints of the robot.
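A strongly simplified sketch of the voxel-labeling strategies described above (the "fringe" voxels of Reference [8], or the "boundary unseen" voxels of Reference [7]) is given below. It greedily scores candidate scan positions by the number of fringe voxels within sensor range; real systems additionally perform ray-casting visibility and overlap checks, which are omitted here for brevity, and all names and parameter values are illustrative assumptions:

```python
import numpy as np

# Illustrative voxel labels
FREE, OCCUPIED, UNKNOWN = 0, 1, 2

def fringe_voxels(vox):
    """FREE voxels with at least one UNKNOWN 6-neighbour -- the boundary
    between explored and unexplored space."""
    fringe = []
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for idx in zip(*np.where(vox == FREE)):
        for off in offsets:
            n = tuple(i + o for i, o in zip(idx, off))
            if all(0 <= i < s for i, s in zip(n, vox.shape)) and vox[n] == UNKNOWN:
                fringe.append(idx)
                break
    return fringe

def next_best_scan(vox, candidates, sensor_range=3.0):
    """Greedy NBS: the candidate voxel from which most fringe voxels lie
    within sensor range (ray-casting visibility omitted for brevity)."""
    fringe = np.array(fringe_voxels(vox), dtype=float)
    def score(c):
        return int((np.linalg.norm(fringe - np.array(c, dtype=float), axis=1)
                    <= sensor_range).sum())
    return max(candidates, key=score)

# Toy 6x6x3 volume: one half explored (free), the other half still unknown
vox = np.full((6, 6, 3), UNKNOWN)
vox[:3, :, :] = FREE
best = next_best_scan(vox, candidates=[(0, 0, 1), (2, 3, 1)])
print(best)   # -> (2, 3, 1): the candidate nearer the unknown half wins
```

The information-gain formulations cited above replace this simple fringe count with the expected reduction in entropy over the voxels that a ray-casting simulation predicts will be observed.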
An online inspection path planning algorithm for micro-aerial vehicles is proposed in Reference [38]. The NBV is here based on a volumetric approach that constructs models composed of voxels. The procedure is tested in simulated indoor and outdoor environments. The approach presented in Reference [15] is also implemented on aerial vehicles, but uses a sampling-based receding horizon path planning paradigm. The quality of the view selected is determined by the amount of visible uninspected volume. As in the earlier case, this method provides a voxel model of the explored space. Another work in the MAV context is that of Heng et al. [18]. The system performs simultaneous exploration and coverage in unknown environments. The goal is chosen from among different candidates located on the edges of the currently known free space, maximizing the information gain weighted exponentially by the cost of reaching it. Quintana et al. [39] generate a growing 3D voxel model of the environment by selecting the next scanner positions on the basis of the visible uninspected surfaces of the structural elements of the building. This method is robust under severe occlusion and provides a raw 3D model of the structure of buildings.
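The cost-discounted objective used by Heng et al. [18] reduces to a utility of the kind sketched below. This is a toy version, not their implementation; the decay parameter `lam` and the candidate tuples are illustrative assumptions.

```python
import math

def utility(gain, path_cost, lam=0.5):
    """Information gain weighted exponentially by the cost of reaching the
    candidate; lam trades exploration gain against travel effort."""
    return gain * math.exp(-lam * path_cost)

def choose_goal(candidates, lam=0.5):
    """candidates: iterable of (label, information_gain, path_cost) tuples.
    Returns the label of the candidate with the highest discounted utility."""
    return max(candidates, key=lambda c: utility(c[1], c[2], lam))[0]
```

With this weighting, a nearby frontier with modest gain can beat a distant frontier whose raw gain is slightly higher, which keeps the exploration paths short.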

Assumptions and Initial Hypotheses
Imposing hypotheses always reduces the applicability of a method and makes a scanning system less reliable as regards its application to real environments. In order to solve the 3D mapping or the reconstruction of 3D models in an effective manner, most of the existing methods impose the shape and dimensions of the scene a priori, whereas others assume a set of restrictive hypotheses. For example, Potthast et al. [9] impose the bounds of the scenario, and Rusu et al. [19] work in scenes in which pieces of furniture are modeled as cubic volumes.
2D maps of the scenario are assumed beforehand in some approaches [7,11,26]. In Reference [7], an initial point cloud of the target region is calculated in a first stage by using a two-dimensional map of the region. The initial point cloud model is refined in the second stage. Iocchi et al. [11] obtain a 3D map of the environment as a set of connected 2D maps, while Biswas et al. [26] solve the localization problem using a 2D map extracted from the blueprint of the building.
A rather unusual hypothesis is that of Strand et al. [6], in which a room is detected only if the corridor is bigger than the existing rooms. Other systems solve the localization and planning problems using targets [17] or assume that the pose of the system is known beforehand [18]. In Reference [7], the robot makes use of GPS and, therefore, navigates only in outdoor environments.
Flexible and adaptable approaches can also be found in the literature [5,9,10,12,13,24,30]. These methods do not require strong assumptions or hypotheses related to either the scene or the initial localization of the sensor. The mobile scanning platform does not have any knowledge of the shape, dimensions, or other characteristics of the scenario, and is able to move autonomously and obtain the necessary data. Despite this, there are implicit assumptions, not mentioned in the papers, that concern advanced situations and special scenes. For example, it is always assumed that the mobile platform moves freely on horizontal ground without different ground levels, that the walls are vertical planes, that moving objects or human beings are not present, or that the doors are all open. All these and other assumptions mean that the current systems are not yet prepared to accomplish the automatic digitization of more realistic environments.

Comparison
This section makes comparisons between a representative set of autonomous systems. As is known, when authors present their approaches and experimental results, they do not follow a particular pattern: some articles provide complete information regarding the method/technique, while others provide only visual evidence of the results and do not evaluate the proposed method in a quantitative manner. Moreover, comparing methods implies certain risks. The selection of the methods to be compared, the comparison method itself, and even some of the features taken into account for comparison may be debatable. We nevertheless trust that this comparison will be truly useful to other researchers.
Fourteen properties of fourteen autonomous scanning systems have been compared. The acronym NR is included when the characteristic is 'not reported' by the authors or when it is not possible to infer it from the paper. A discussion of these features is provided in the next paragraphs. Table 2 summarizes the properties mentioned in Section 4, that is: the environment in which the systems have been tested, the final goal of the method presented, a brief description of the next best view algorithm used, details of the geometry scanned, occlusion and clutter circumstances, a brief description of the principal hypotheses and assumptions, and the output provided by the scanning system. In order not to repeat the aforementioned comments, only the properties 'Geometry' and 'Output' are referred to below.
Most of the methods shown in Table 2 do not create a geometric model of the scene, but rather provide a large unstructured point cloud that represents the whole scene [10,14,17]. Moreover, the point cloud is not segmented into semantic groups of points, such as walls, ceiling, floor or clutter. Some point cloud models contain information concerning color [6] and a few authors generate a coarse 3D CAD model [5,8,11,13], or a meshed model [12], which would cover the second semantic level explained in Section 3.
It is noteworthy that those approaches that yield simple 3D models work on rectangular floors and flat walls. In contrast, the systems that provide unstructured point clouds do not have geometric restrictions [7,15-17]. In any case, none of the aforementioned systems scans a complete multi-storey building.
Other important properties that are frequently considered in autonomous scanning systems have been gathered together in Table 3.
First, some necessary preprocessing tasks are usually found in the data acquisition stage. The most common are the detection of outliers and the alignment of point clouds. With regard to outliers, long-range scanners may capture data originating from outside the scene and, additionally, incorrect data produced by the scanner itself when beams reflect off shiny surfaces. Outliers are, therefore, relatively common in scanning and can have a disturbing effect on further data processing algorithms. Data registration is an old topic that needs to be tackled and efficiently solved. These processes are not referred to by some authors but are explicitly explained by others [6,8,13,14]. The next column corresponds to openings. Door detection is a key issue when the mobile scanning platform has to navigate on a storey with interconnected rooms. In this environment, the robot has to recognize the door of the room in order to pass from one scene to another. There are, of course, a lot of 3D imaging-based methods that detect doors in buildings by means of laser scanners or photogrammetric systems, but only a few of them are implemented on mobile scanning platforms. Of all the systems in Table 3, only References [6,12,13] can detect doors.
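As an illustration of the outlier-detection step mentioned above, the classic statistical filter (a generic sketch, not any specific system's implementation; the parameter values are illustrative) removes points whose mean distance to their k nearest neighbours is anomalously large:

```python
import numpy as np

def remove_statistical_outliers(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k nearest neighbours exceeds
    the global mean of that statistic by std_ratio standard deviations.
    Brute-force O(n^2) distances for clarity; real pipelines use a k-d tree."""
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    knn = np.sort(d, axis=1)[:, 1:k + 1]          # skip the zero self-distance
    mean_knn = knn.mean(axis=1)
    thresh = mean_knn.mean() + std_ratio * mean_knn.std()
    return pts[mean_knn <= thresh]
```

Points far outside the scene (e.g. returns through a window or off a shiny surface) have large neighbour distances and are discarded before registration.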
The time requirement is sometimes a confusing characteristic. In this framework, not only is the time required for scanning important [6,11,12], but also the time needed to calculate the next best position of the robot. Although in the case of non-autonomous methods this information might be irrelevant, the computation of the NBS is time-consuming in autonomous scanning systems. The methods proposed in References [5,7,9,13], therefore, have high time requirements because their respective NBS algorithms consume a large proportion of the processing time.
The next two columns concern the experimentation. In all the cases discussed, the experimental work has been developed in real environments, but some techniques have evaluated the precision and completeness of the output yielded in simulated scenes. Simulation tools, such as MAV Gazebo simulator Rotors [40], v-rep simulator from Coppelia Robotics [41] or Blensor [42] have been used in the systems [13,15,16,18].
In order to demonstrate the importance of a method, an experimental comparison with similar works is necessary. However, comparisons with other works are unusual in this research field. Some sort of comparison is made in References [9,10,13,16-18]. Charrow et al. [10] performed simulations and real-world experiments and compared the performance of their method with that of three other approaches, one of which is a manual method. Entropy maps over time and additional statistics regarding the distance travelled and the time required to reduce entropy are shown in detail. A comparison between frontier-based exploration algorithms is developed in Reference [9]. In this case, a table-top scene with high clutter and occlusion is explored by a robot. The authors also simulate two scenarios and compare the number of scans with four other methods. Prieto et al. [13] present a comparison in terms of 3D NBV algorithms by evaluating the ray-tracing procedure and the evolution of the scanned scene with three other approaches. Meng et al. [16] compared their exploration results with the method of Bircher et al. [15]. Heng et al. [18] compared the path length and the percentage of observed voxels in the final model with two similar approaches [43]. Kurazume et al. [17] made a comparison between two of their multi-robot scanning prototypes.
The columns concerning the 'Quantitative evaluation' and 'Time report' of the scanning system are very important. The first is specifically focused on the quantitative evaluation of the 3D model generated. A good assessment of the method is essential to justify and provide arguments regarding the soundness of the proposed technique; visual arguments alone are not sufficiently convincing and make the method less compelling. It is noteworthy that 50% of the approaches referenced here do not provide any quantitative evaluation of the 3D model obtained (an unstructured point cloud or a coarse semantic model). Around 23% of them present a poor quantitative evaluation, reporting only the number of scans of the process [6,11] or the number of unobserved cells at a scanner position [9]. It is even more surprising that a high percentage (78%) of the methods do not provide a report concerning the accuracy of the 3D map obtained against a ground truth. Note that the final point cloud is the basis for extracting a BIM model of the scene; if the precision of the 3D map is not evaluated, a realistic model cannot be guaranteed. Complete time reports are also unusual in papers: frequently, only the total scanning time or NBV times are provided.
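A quantitative report of the kind advocated here can be as simple as nearest-neighbour deviations against a ground-truth cloud. The sketch below is illustrative (the function name is hypothetical, and brute-force distances are used for clarity):

```python
import numpy as np

def accuracy_report(cloud, ground_truth):
    """For every scanned point, the distance to its nearest ground-truth
    point; returns (mean deviation, RMS deviation), the minimal figures a
    paper could report for the accuracy of the final 3D map."""
    c = np.asarray(cloud, dtype=float)
    g = np.asarray(ground_truth, dtype=float)
    d = np.linalg.norm(c[:, None, :] - g[None, :, :], axis=2).min(axis=1)
    return float(d.mean()), float(np.sqrt((d ** 2).mean()))
```

Reporting these two figures (ideally with a deviation histogram) would already document the precision of the generated map against a surveyed reference.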

Weaknesses and Strengths
Significant limitations and advantages of each technique included in Tables 2 and 3 are summarized in Table 4. Note that the majority of the weaknesses can be inferred from the features presented in earlier sections. Some of the common disadvantages are restrictive initial assumptions, the low resolution of the 3D model generated, exploration planning based on 2D maps, and the fact that only low occlusion is permitted. The strengths are the reduction in the number of scans, a reduction in the excessive time involved in the preprocessing stages, the flexibility of the system as regards working in different environments, the absence of geometric restrictions, and real applicability. Table 4. Some important limitations and strengths of mobile scanning systems.

NR. Weaknesses: Reduced movement ability of the robot (simple trajectories); the NBV is based on 2D data. Strengths: The planning algorithm works in a continuous state space rather than a grid-based space.

[7] Blaer 2007. Weaknesses: Localisation is based on GPS, so the system cannot be used indoors; multiple iterations are needed to attain the final model; a two-dimensional map of the region is assumed. Strengths: The system works in indoor and outdoor scenes; scanning in a large-scale environment.

[11] Iocchi 2007. Weaknesses: Owing to the way in which the 3D model is obtained, there could be wrong structures and a loss of information in the final model. Strengths: Generation of a single 3D model of the building structure; reduction of the planning model representation.

[8] Blodow 2011. Weaknesses: The objective is focused on mapping objects in the scene; inefficient 2D/3D NBV for 3D models of buildings; high overlapping between scans is required; exhaustive labelling. Strengths: Detailed semantic 3D model of the scene; the correct registration of point clouds is guaranteed by the overlapping restriction applied.

[12] Borrmann 2014. Weaknesses: The robot needs plenty of space to move; the occlusion and the obstacles are concentrated on the walls; great loss of information because of the height at which the data is taken. Strengths: The system obtains a 3D thermal point cloud; low computational and memory requirements.

[9] Potthast 2014. Weaknesses: The NBV might not be reachable for the robot, and the final position could be bad for the exploration; the exploration algorithm is evaluated in a simulated and simple scenario; there is an error owing to accumulative registration issues. Strengths: The system is able to mimic different exploration strategies; the system works in small cluttered scenes (table top) and less cluttered large indoor scenes (simulated office environments).

[10] Charrow 2015. Weaknesses: The exploration is based on 2D data. Strengths: Particularly efficient for robots with limited battery life, equipped with noisy sensors with short sensing ranges and limited FOV.

[15] Bircher 2016. Weaknesses: Low resolution of the 3D model; a high level of occlusion is not permitted; scene with given bounds. Strengths: Online planning; real applicability; open source.

[13] Prieto 2017. Weaknesses: The robot's footprint is too big for inhabited indoor scenes; excessive time involved in the preprocessing stages. Strengths: The system works in complex scenarios composed of furnished concave rooms; the number of scans is reduced; generation of a single 3D model of the building structure.

[17] Kurazume 2017. Weaknesses: Planning algorithm in 2D space; high complexity of the overall system. Strengths: Scanning in a large-scale environment with no geometric restrictions.

[16] Meng 2017. Weaknesses: Low resolution of the 3D model; a high level of occlusion is not permitted. Strengths: Online planning; real applicability.

[14] Kim 2018. Weaknesses: Noisy individual dynamic point cloud; the registered point cloud is not sufficiently accurate. Strengths: The system works without using targets; the number of scans is reduced.

Conclusions: Improvements and Future Projects
Having analyzed the autonomous scanning systems referenced, a discussion regarding "what has been achieved?", "what is achievable?", and "what are the future challenging projects?" is presented in the following subsections.

What Has Been Achieved?
To date, the current autonomous scanning systems have achieved several important milestones, such as:
• Advances in the automatic digitization of buildings: For many years, the 3D digitization of buildings and large-scale environments was carried out exclusively by expert operators. However, in the last few years, intelligent mobile scanning platforms have successfully performed the digitization and 3D mapping of real environments. The current systems are able to navigate and scan the interiors of buildings composed of one or several rooms. The most advanced methods work in irregular (i.e., non-rectangular) rooms and rooms with reduced occlusion.
• Autonomy of the mobile platforms: Some of the current systems do not impose strong assumptions about the scene, such as the a priori knowledge of the scene, signifying that the autonomy of the scanning process can be guaranteed. These autonomous systems are able to collect data and provide a coarse 3D model of the interior of an inhabited building.
However, some restrictive hypotheses have been imposed on all the systems referenced, which concern the shape of the floor and walls (flat surfaces), the state of the doors (open doors), and the degree of occlusion (low occlusion). All this still limits the autonomy of the current platforms in realistic environments.
• Modelling: The field of automatic BIM models has become one of the most exciting 3D computer vision research lines to have emerged in the last few years. To date, autonomous platforms provide point cloud models or elementary B-rep models of building interiors, which include the basic architectural elements and openings.

What Is Achievable?
Important improvements that should be made to the current systems and future issues are shown as follows.
• NBS algorithms: The majority of the current NBS algorithms are designed to scan the whole scene, regardless of the nature of the data collected. Future NBS algorithms should address the problem of scanning the structural elements of the building and thus avoid collecting any other kind of data, such as furniture, clutter, and outliers. More efficient NBS algorithms would reduce the volume of data and the processing time, and greatly alleviate the algorithmic complexity of further processes.
• Quantitative evaluation: While some of the aforementioned approaches provide quantitative evaluations, little information is provided as regards the accuracy of the 3D model obtained. A comparison with regard to a ground truth in real experiments is particularly necessary. One of the future improvements in this field would, therefore, be to provide complete information about the deviations and errors in the 3D model generated.
• Complexity of the scenes: To date, the autonomous scanning systems provide single 3D models composed of the planar structures of the building (walls, ceiling, floor, and columns). Nevertheless, solutions for more complex scenarios are needed, including curved structures, irregular ceilings, floors at several levels, and stairs inserted into the environment. Much research on this issue is still necessary.

Future Challenging Projects
Nowadays, the degree of autonomy of the current mobile platforms is limited. To achieve a truly autonomous scanning system, future platforms should tackle the following projects.
• Scanning of single storeys with closed doors: The digitization of a single storey composed of a corridor and several simply-shaped rooms has not been completely resolved. The major problem concerning how the mobile platform passes from one room to another has not, as yet, been dealt with. All the methods, with the exception of the work presented in Reference [13], assume that the doors are open and that the mobile platform will, therefore, be able to enter the adjoining room. However, this issue has not yet been completely demonstrated in papers. Beyond open doors, none of the current approaches is able to deal with closed or semi-closed doors. In these situations, the mobile platform should interact in some way with the door in order to clear the way. Scanning storeys with closed or semi-closed doors is, therefore, a challenging topic that will also lead to an increase in the autonomy of the scanning systems.
• Scanning multi-storey buildings: The autonomous scanning of a multi-storey building has not yet been carried out. The key problem is how to move the mobile platform from one floor to another. As in the earlier case, the system should autonomously recognize the lift door, enter the lift and, eventually, leave the lift when the next floor is reached. Executing these actions in a precise manner will entail the development of efficient recognition and robot-interaction algorithms that will allow a truly autonomous system to be attained.