Collaborative 3D Scene Reconstruction in Large Outdoor Environments Using a Fleet of Mobile Ground Robots

Teams of mobile robots can be employed in many outdoor applications, such as precision agriculture, search and rescue, and industrial inspection, allowing an efficient and robust exploration of large areas and enhancing the operators’ situational awareness. In this context, this paper describes an active and decentralized framework for the collaborative 3D mapping of large outdoor areas using a team of mobile ground robots under limited communication range and bandwidth. A real-time method is proposed that allows the sharing and registration of individual local maps, obtained from 3D LiDAR measurements, to build a global representation of the environment. A conditional peer-to-peer communication strategy is used to share information over long-range and short-range distances while considering the bandwidth constraints. Results from both real-world and simulated experiments, executed in an actual solar power plant and in its digital twin representation, demonstrate the reliability and efficiency of the proposed decentralized framework for such large outdoor operations.


Introduction
The use of outdoor mobile robots for real-world applications, such as search and rescue [1,2], logistics [3], agriculture [4], industrial inspection [5], surveillance and maintenance [6,7], have increased rapidly over the past several years. This is due to the capabilities of mobile robots to assist humans in dangerous, repetitive or time-consuming tasks. A successful robot navigation for such applications relies primarily on three aspects: mapping, localization, and trajectory planning. Robotic mapping generates a map by deciphering the spatial information of the environment acquired through the robot's sensors. Commonly, mapping is carried out first to understand the environment and enhance the subsequent localization and motion planning tasks. However, for many applications, mapping must be executed frequently to continuously acquire a complete situational awareness and to support reasoning and decision making in dynamic environments.
Many outdoor robotic automation applications, such as solar farm inspection and maintenance [8][9][10], disaster response [11][12][13], agriculture [14] and city re-planning [15,16] need to cover very large areas of 1-40 acres. Traversing such expansive environments with a single mobile robot is very time-consuming or even impractical. In addition, conventional localization methods based only on GPS, odometry and IMU are not always reliable for such long-range operations. The uneven, rough, and unstructured nature of rural environments, such as in solar farms and disaster-struck regions, introduce additional localization errors. In such scenarios, a multi-robot system can be a suitable alternative to obtain full coverage of the area and execute tasks in a collaborative manner, resulting in a more complete and time-efficient solution. In regards to mapping, a multi-robot system can rapidly explore the environment in parallel and from different angles, to generate more accurate maps in less time [17] and to enhance the localization accuracy in challenging environments.
In general, a multi-robot mapping framework will require three main elements:

1.
A mission-planning unit to coordinate robots to explore the environment.

2.
A communication policy to share the map generated by each robot.

3.
A matching and merging method to integrate individual maps into a global map.
In practice, mapping large outdoor 3D environments with a team of mobile robots is challenging due to the communication limitations and the high volume of sensor data that need to be shared and processed. High-capacity wireless communication routers commonly available to robots, such as Wi-Fi modules, typically have a limited range of about 90 m or less in open outdoor environments. On the other hand, long-range wireless devices, such as the Xbee-Pro RF module, can provide a large coverage of up to several kilometers but have a very limited bandwidth of about 200 kbs, which is not suitable for sharing high volumes of sensor data. Furthermore, the employment of 4G mobile technologies is not always possible due to the lack of coverage in many rural areas or in disaster-response scenarios. Hence, it is essential to consider such communication constraints while developing a multi-agent mapping algorithm.
In this work, we present an online, fully distributed and active framework for a team of mobile ground robots, equipped with 3D LiDAR sensors, for mapping and situational awareness in large outdoor environments. We develop our solution specifically for the application of inspection and surveillance of large multi megawatt solar plants, while considering the strict communication constraints that exist in terms of range and bandwidth in the commonly available wireless technologies. However, the proposed framework can be applied to many other outdoor exploration problems, such as search and rescue or precision agriculture. The outline of the paper is as follows: Section 2 provides the literature review on 3D mapping and localization methods and discusses the various techniques and limitations of multi-agent cooperative mapping and point cloud registration. The proposed distributed multi-agent framework is discussed in Section 3. The experiments and results are presented in Section 4. Finally, Section 5 concludes the findings and pitches possible improvements to the proposed method.

Literature Review
Over the past decade, 3D sensors have emerged as revolutionary data acquisition devices. In robotics, 3D sensory information has been used for mapping, localization, obstacle avoidance, and scene recognition. Omnidirectional LiDARs [18][19][20][21], RGBD cameras [22][23][24][25], and uni-directional LiDARs [26][27][28] have found applications in the field. Three-dimensional sensorbased algorithms, such as LOAM [18,19,21], have become the norm for an out-of-the-box simultaneous localization and mapping algorithm. However, the computational complexity of 3D algorithms and the size of 3D sensor data make it challenging to achieve scalability. Due to these reasons, SLAM (simultaneous localization and mapping) algorithms [29][30][31] are not commonly preferred with 3D sensors, especially in large areas [32,33]. Methods of point cloud compression [34] and low-cost registration [35,36] are promising endeavors but require prior training. The considerable size of 3D data further imposes constraints on a decentralized multi-agent mapping system, making it challenging to share observations continuously. Hence, it is essential to transfer only the required features.
Point clouds represent rigid body data structures, typically generated from LiDAR sensors. The process of aligning two point clouds is called point cloud registration. The process results in a rigid body transformation matrix that aligns one point cloud in the frame of another. The registration techniques [37] are categorized into local and global registration. Global registration [38][39][40][41] is ideal when the initial transformation estimate has yet to be discovered, and is perfect when the point clouds are acquired from spatially distant frames of references. When the initial transformation is known, a quick refinement can be acquired from a local registration technique. Local registration techniques, such as iterative closest point (ICP) [42,43], normal distribution transform [44], point-to-plane [45][46][47], color-based [48,49] or class-based methods [50,51], can be computationally expensive if the initial transformation is inaccurate. For multi-agent systems, A set of N a agents, R, is tasked to explore and map the environment. Each agent, R i (∀R i ∈ R), is equipped with a 3D LiDAR sensor, a GPS receiver, an IMU, and an odometer sensor. The LiDAR sensor has a maximum range of L max . Considering possible GPS drifts, odometer slippage, and electromagnetic interference, each agent has an instance of an extended Kalman filter (EKF) to fuse the sensory information from GPS, IMU and odometer and obtain a more reliable estimate of the global localization G i in the geographic coordinate system. An instance of the LiDAR odometry and mapping (LOAM) [18] is used on each agent to locally map the surrounding environment from the R i perspective and to locate the agent within the map. This egocentric LOAM localization is represented in the form of an odometry message O i , in the sub-map M i , of agent R i . Each agent is equipped with a short-range and a long-range wireless transceiver. The short-range transceiver is a Wi-Fi module that allows the peer-to-peer transfer of large map data, which is only activated when two agents are within distance C max of each other. The long-range transceiver is an RF module that can ensure long-range transfer of very small quantities of data, which is used only to share odometry, GPS or heavily down-sampled 3D data. Figure 1 portrays an instance of the proposed fast decentralized multi-agent active mapping framework, executed on agent R i . A separate instance of the framework is executed on each agent of R. This ensures an active decentralized framework for multiagent 3D mapping. The framework has two modules: a continuous update module and a conditional update module. The continuous update module is executed with every new sensory update. In this module, an instance of LOAM generates the egocentric odometry O i and the map M i . Added to this, an instance of extended Kalman filter fuses the sensory data from GPS, IMU and odometry to output the global localization estimate, G i . A ball tree generator, as explained in Section 3.1, generates a global ball tree B i that keeps track of O i and G i throughout time and at specified distance intervals. Whenever a new tree node is added, it is also shared with all other agents using the long-range transmitter. Minimal proximity search, detailed in Section 3.2, is used to compute the proximity of an incoming tree node from an agent R j with all nodes of the ball tree B i . The conditional update module is executed when the result of the minimal proximity search is true.  Conditional update module consists of several computationally intensive processes. Spherized point cloud registration, in Section 3.3, describes down-sampling, segmentation and nearest-neighbor sampling of the M i , to generate a spherized map M s i . M s i , which is considerably reduced in size but abundant in features, is then transmitted using the long-range transmitter to the respective agent R j , for point cloud registration. The resultant transformation is then used for merging the complete maps once the agents are close enough to transfer the complete maps via the short-range transceiver.

Global Ball Tree Generator
A ball tree is a binary tree data structure, that is used for data partitioning to ensure fast data query [67]. When an agent, R i , is initialized, a ball-tree, B i , is instantiated with G i (t = 0) as the root node. The node n of B i is represented as B i (n) and the latest node added is B i (end). The pair-wise distance used for constructing B i is the Haversine distance. The Haversine distance represents the angular distance between two points on the surface of a sphere. Ball trees with Haversine distance are shown to result in fast nearest-neighbor look-up for GPS datasets [68,69]. A node B i (n), inserted at time instant T, consists of the G i (t = T) and is tagged with the corresponding LOAM-odometry message O i (t = T). The framework continuously monitors the LOAM-odometry O i and iteratively calculates the distance between the current odometry O i (t = T) with that of the last node B i (end). If this distance is greater than a predefined value, D thresh , a new node is added to B i , with The process, called global ball-tree generator, is described in Algorithm 1, which is continuously run by each agent in R. Each node of the ball-tree has a global localization estimate G i , which is mapped with the corresponding LOAM-odometry message stored at that time instant. These pair-wise data are essential to link the egocentric localization of R i with the global frame. Alternatively, we could georeference the point cloud, for each iteration of LOAM, which requires an accurate initial global localization estimate [70] and would be computationally costly [71,72]. The intermittent method proposed in this work eases the computational complexity. It also alleviates the dependence on a single initial estimate.
Algorithm 1 Global ball-tree generator for agent R i .

Minimal Proximity Search
In a communication-constrained environment, it is essential to ensure that the bandwidth is used for the most vital transmissions. The process explained in this section queries for possible spatial overlaps in the global frame. Minimal proximity search transmits every new node added to the ball tree B i (end) and compares it with the ball trees of other agents in R for proximity within the global frame.
Every new node added to B i of agent R i is shared over the long-range transmitter, with the remaining agents. The minimal bandwidth required to transfer the node B i (end) makes it ideal for a communication-constrained environment. With no loss of generality, an agent R j (∈ R, ∀i = j) has its own instance of LOAM, EKF and global ball-tree generator, resulting in its own sub-map M j , egocentric odometry O j , global localization estimate G j , and global ball-tree B j . Agent R j processes the incoming node information from R i by carrying out a nearest-neighbor search in B j . If the global localization estimate G j entry of the resultant nearest-neighbor node, B j (Neighbour), is within a certain threshold(r i ), of B i (end), the node B j (Neighbour) is shared with R i . This is depicted in Algorithm 2.
Algorithm 2 Minimal proximity search by R j .
In an effort to minimize the effect of GPS drifts, the distance threshold, r i , is a bounded dynamic distance threshold. Equation (1) ensures that r i is bounded within predefined values (r min ,r max ) and proportional to the uncertainty C i ek f of the agent R i EKF estimate.

Spherized Point Cloud Registration
There exists a transformation, T ij , that aligns M j with M i of agents R j and R i . This transformation can be achieved by registering the map M j with the map M i . However, the sizes of M i and M j are rapidly increasing as the R i and R j individually explore and map the environment, from their perspective. Sharing such large data over a long-range bandwidth-limited communication channel will lead to a high network latency and data loss. Hence, this section describes a strategy to only share small sampled subsets of the maps, and only for the regions that are expected to have sufficient overlapping features for registration.
With no loss in generality, let us assume that for two agents, R i and R j , the minimal proximity search was successful. A successful minimal proximity search (Section 3.2) gives an assurance that, at two different time instances, the R i and R j are spatially close, in the global frame. The minimal proxy search results in two nodes, node i and node j , of B i and B j , that are globally close: B i (node i ).O and B j (node j ).O gives the egocentric odometry measurement of R i and R j . For lucidity, we will refer to B i (node i ).O and B j (node j ).O as L i and L j , respectively.
A Euclidean ball, of radius r o , is generated in both M i and M j , centered at L i and L j , respectively. This method of filtering is hereby referred to as spherization. The points within the sphere are sampled and used for point cloud registration. Since they represent a fraction of the overall map, the size is considerably reduced. Added to this, the sampled map, M s i and M s j , have features that are bound to overlap. This is because the sampled map was generated when the agents were spatially close in the global frame. Since point clouds can be considered a rigid body of particles [73], we can conclude that the T ij that successfully aligns M s i with M s j also aligns M i with M j . Spherized maps are transmitted over the long-range transmitter to the respective agents. For a seamless transmission on the constrained bandwidth channel, the spherized maps have to be less than 25 kilobytes. Thus, spherization is preceded by downsampling, ground-plane removal and outlier removal to bring down the overall size of the point cloud to the prerequisite limit. Each agent generates the spherized maps in its own frame of reference. These frames of reference will be separated by several meters, which is not ideal for a local point cloud registration algorithm. We use a global registration algorithm to align these two spherized point clouds roughly. The transformation matrix acquired from the global registration technique is then used to initialize the local point cloud registration. Local point cloud registration helps in refining the initial rough alignment.
The local point cloud registration results in the transformation, T ij , and the RMSE, E ij , of all inlier correspondences.
The RMSE [45], in the context of point cloud registration, refers to the root mean square value between the corresponding points of the two point clouds. For N c correspondences, between M s i and M s j , the RMSE for transformation, T ij , can be calculated by Equation (2). c i and c j refer to all the correspondences in M s i and M s j , respectively. The transformation, T ij , that minimizes E ij , across all executions of Algorithm 3 is chosen for the full map alignment.
In the proposed implementation, the global registration is carried out using RANSAC (random sample consensus) [74]. The FPFH (fast point feature histograms) feature [75], a 33-dimensional vector that encapsulates the local geometric property, for each point, is calculated. RANSAC searches for these features to make a fast and approximate alignment. For local registration, we are aware that the process can be further enhanced by sharing only the features [76,77] rather than the entire point clouds and subsequently using featurebased registration methods [45,50,51]. We could also implement a semantic mapping technique [20,21] for acquiring a segmented map before spherization. However, we use point-to-plane ICP [46] to keep the overall complexity and tunable parameters to a minimum.

Real World Experiments
We carried out our experiments with two Jackal robots (shown in Figure 2a) (named J1 and J2), from Clearpath Robotics, on an actual solar farm (total area approximately 1 km 2 , depicted in Figure 3a). The two robots were equipped with a Velodyne Puck (VLP-16) sensor that has 16 layers of infra-red (IR) lasers, a horizontal field of view of 360 • , a vertical field of view of 30 • and a speed of up to 300,000 data points per second. A Pixhawk 2.1 cube IMU and a Here+ GPS receiver were also added to each robot. The global path of the robots are planned beforehand to explore the regions of interest, through visiting a set of predefined GPS waypoints. The plans also include some time instances where the robots are within communication range for the sharing of map information between agents. The global paths taken by the robots are presented in Figure 2b, along with the area in which the agents were within short-range communication distance and the region that had a successful minimal proximity search outcome. The values of (r max , r min ) were set to (20 m, 30 m). Figure 4 represents the various stages of the framework during the experiment. Each agent's LOAM initialization (shown in Figure 4a,d) creates an ego-centric frame of reference. Once the successful minimal proximity search is achieved, the down-sampled point cloud spheres are shared between both agents. These are then registered in the respective frames of reference, as portrayed in Figure 4b,e. Finally, when the agents are close enough for short-range communication, the latest maps generated by J1 and J2 are shared and aligned, as depicted in Figure 4c,f.

Simulated Experiments
The simulations were carried out on a digital twin world (a 3D Gazebo model) of the actual solar farm used in the real-world experiments, as shown in Figure 3a. The simulated environment had a total area of about 1 km 2 . The digital twin was purely used for simulation purposes, to further test the collaborative 3D scene reconstruction framework with multiple agents, and was not linked to real-time sensory data from the actual solar farm. The complete ground truth point cloud was acquired by converting the 3D Gazebo mesh model (depicted in Figure 5b) to a 3D point cloud model (depicted in Figure 5c). Figure 4. Real-world experiment-the various phases of the proposed method for agent J1 (above) and J2 (below). The blue and red points correspond to the point clouds generated by J1 and J2, respectively. (a,d) LOAM initialization; (b,e) minimal proximity search was successful, an agent receives a spherized down-sampled map from the other agent and registers this in its own map; (c,f) each agent receives the full map from the other agent, as they are within short communication range. The agent aligns the incoming map with its own map using the transformation acquired from the spherized registration. We initially performed a brief parameter analysis to select the values for maximum LiDAR ranges, L max . Figure 6 details three maps, generated with three different L max values, by an agent following the same path in between the solar panels of the digital twin world (Figure 3c). We can note that for L max = 20 m (in Figure 6a), LOAM is unable to properly find the correspondences that are further away. Due to this, the reconstructed panels are incorrectly curved. This issue is not seen for L max = 40 m (in Figure 6b) and L max = 80 m (in Figure 6c). In an effort to keep the computational load to a minimum, L max was selected as 40 m for the experiments. To validate the robustness of the proposed algorithm to 3D-laser errors, we induced a Gaussian error in the simulated VLP-16 sensor. The red circles in Figure 7 map the correspondences between the ground truth and the 3D map generated by a single agent. It can be noted that, owing to LiDAR errors, there is a clear mismatch in the generated map. The resultant incorrect 3D reconstruction is evident in the encircled areas. Such reconstruction errors, across each mapping agent, is bound to make the eventual point cloud registration prone to errors. However, the cooperative framework was shown to be robust against such individual reconstruction errors and could still merge multiple maps with a good accuracy. We averaged the results over 15 simulations of varying number of UGVs (N a = 2 to 5 agents). For comparing the robustness of the proposed method to the noise in the LiDAR data and resultant LOAM mapping, we executed the experiments with different LiDAR rates, f of 10 hz and 5 hz. Mapping at a lower laser frequency, for the same agent speed, is relatively more error prone. Similar to the real-world experiment, navigation is carried out in between predefined waypoints, shown in Figure 5a. These waypoints are grouped as rows and divided uniformly amongst N a . The selected path covered all possible communication conditions and allowed validation of the end-to-end functionality of the proposed framework. For N a = 3, the distribution of agents and the path taken by each agent can be seen in Figure 8b. Each color in the point cloud, shown in Figure 8a, represents the sub-map obtained by a single agent. Note that, the ground plane was removed for ease of visualization. The errors in individual LOAM-maps can be evidently seen as artifacts in Figure 8a. The framework is robust to these errors and is able to merge the maps irrespectively.  Figure 9a plots the minimum-maximum-mean RMSE values for various number of agents. We can note that there is a decline in the overall RMSE values with the increase in the number of agents, N a . The mapping error, infused by the LiDAR noise, accumulates over time. This is spread across the number of agents involved and thus the overall decline in RMSE is expected, provided the map merging is accurate. This decline in RMSE implies a successful fusion of each agent's map.

Fitness Analysis
Fitness( F ) property of a point cloud registration gives us the overlapping area of the two point clouds. For our scenario, where the target point cloud, M g has N g points, F is given by Equation (3).
Since M g is constant throughout the analysis, a high fitness score implies an increase in the number of correspondences. In Figure 9b, we can note that with the increase in N a , there is a steady increase in F , (and thereby N c ), implying that the merged point clouds have more point-to-point correspondences with the M g . This can be attributed to the successful blending of the map of each agent.

Covariance Analysis
The Fischer information matrix, I, that is acquired as a result of the point cloud registration of M g on M m , characterizes the confidence in the registration process. The inverse [78] of I gives the covariance matrix, C, of the point cloud registration process [74,79,80]. C gives us the uncertainties involved in the 6 degrees of freedom. We utilize the determinant of C for our analysis of the overall uncertainty. Figure 10a showcases the healthy decline in the value of the determinant of C. This implies that the point cloud registration is more confident in its result, with the increase in N a . The reduced covariance or the increased confidence is the result of successful map merges.
The results depicted in Figures 9 and 10 further showcase the robustness of the proposed algorithm. The behavior exhibited by the agents at f = 10 hz is the similar to that of f = 5 hz. In other words, the agents showcase a decline in RMSE and det(C) and an increase in F with the increase in N a . Though there is a performance degradation in f = 5 hz with respect to f = 10 hz, this is expected, owing to the increased mapping error from LOAM.

Time Analysis
Trivially, the time taken to map the whole map should decline linearly with N a . For all runs of the simulation, the homogeneous agents have the same set of waypoints to visit. Figure 10b showcases the time taken to complete the whole map. Mapping is deemed to be complete when all agents have completed Algorithm 3, with E u ≤ 0.4. This threshold gives us a reliable transformation in between maps of different agents. As is evident in Figure 10b, there is an expected decline in time taken; however, this is not linear. This is because of the time taken to achieve a successful spherized registration with low E u .

Density
The density, at a point(p) in a point cloud, is the number of points around p, within a sphere of radius, r d . The density at point p, D(p), is given by Equation (4). Density can be roughly considered analogous to the resolution of an image. Thus, a denser point cloud is a more detailed point cloud.

D(p) =
Number of points within sphere(centre=p,radius=r d ) 4 3 π · (r d ) 3 For analysis, we average the density of every point in the merged point cloud [81]. The spherical radius r d is chosen as 1 m. The results are shown in Figure 11a. We can see a healthy increase in the average density with N a . This is attributed to the increased overlapping areas. The overlapping areas have unique points from different agents during the merging of individual point clouds. This in-turn increases the number of points per unit area. Added to this, though trivial, it is evident that the density is higher at f = 10 hz than f = 5 hz. This is due to the increase rate of the LiDAR data acquisition. Figure 11b depicts the surface density distribution across each point in the point cloud, generated with 3 agents (represented in Figure 8a). Figure 11c depicts the histogram of surface density of Figure 11b. We can note that higher surface density is rare and achievable primarily in areas of overlap.

Conclusions and Future Work
This article proposes an active, distributed, homogeneous multi-agent mapping and localization framework. The distributed framework enables conditional long-range and short-range peer-to-peer communication for small and large data. The proposed method is tested on a real-world solar farm with two UGVs and its digital twin with multiple agents. The results showcase the robustness of the proposed algorithm to independent mapping errors. However, we acknowledge that using direct point cloud registration in the framework can be error-prone, with increased LiDAR errors. Additionally, a spherized registration is robust to global localization errors up to a few meters. Thus, noisy EKF estimates, due to large GPS or magnetic interference, might lead to incorrect map merges. For future work, we aim to extend the framework to a heterogeneous team of agents with a heterogeneous set of sensors. We also plan to incorporate an optimal waypoint planning module, considering the constraints in communication and each agent's battery life. Currently, we are conducting an in-depth parameter study to understand and optimize the framework.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

EKF
Extended Kalman Filter EMI Electromagnetic Interference LOAM Lidar Odometry and Mapping in Real Time IMU Inertial Measurement Unit