Bridge Inspection Using Unmanned Aerial Vehicle Based on HG-SLAM: Hierarchical Graph-Based SLAM

: With the increasing demand for autonomous systems in the ﬁeld of inspection, the use of unmanned aerial vehicles (UAVs) to replace human labor is becoming more frequent. However, the Global Positioning System (GPS) signal is usually denied in environments near or under bridges, which makes the manual operation of a UAV difﬁcult and unreliable in these areas. This paper addresses a novel hierarchical graph-based simultaneous localization and mapping (SLAM) method for fully autonomous bridge inspection using an aerial vehicle, as well as a technical method for UAV control for actually conducting bridge inspections. Due to the harsh environment involved and the corresponding limitations on GPS usage, a graph-based SLAM approach using a tilted 3D LiDAR (Light Detection and Ranging) and a monocular camera to localize the UAV and map the target bridge is proposed. Each visual-inertial state estimate and the corresponding LiDAR sweep are combined into a single subnode. These subnodes make up a “supernode” that consists of state estimations and accumulated scan data for robust and stable node generation in graph SLAM. The constraints are generated from LiDAR data using the normal distribution transform (NDT) and generalized iterative closest point (G-ICP) matching. The feasibility of the proposed method was veriﬁed on two different types of bridges: on the ground and offshore.


Introduction
Over the last decade, the public's awareness of the importance of safety management has been increasing due to a series of large-scale safety accidents. Preventing the collapse of large structures is crucial, as it causes not only human harm, but also huge economic loss. The importance of structural diagnosis and maintenance to prevent structure collapse and safety accidents is increasing. The number of structures, such as bridges and buildings, is steadily growing both in Korea and abroad, and the number of aging structures more than 30 years old has also increased. Existing safety diagnosis and maintenance have limitations. The reliability of the diagnosis results is low; efficient internal and external inspections are difficult; and time and monetary costs are high because professional personnel are performing visual inspections or using nondestructive testing equipment. Therefore, developing autonomous systems for efficient inspection and maintenance of structures, such as bridges and high-rise buildings, is necessary.
With the increasing demands for autonomous systems in the field of inspection, the use of unmanned aerial vehicles (UAVs) to replace human labor is becoming more frequent. Perhaps the greatest obstacle to fully automated bridge inspection using a UAV is the GPS (Global Positioning System)-denied environment found under a bridge. For automated bridge inspection, being able to estimate the UAV location is necessary. Aerial vehicles, which are typically used outdoors, estimate their locations on the basis of integrated information obtained from the GPS and an onboard inertial navigation system (INS). However, due to the nature of large bridges, the area below such a bridge is a sheltered area where GPS signals are not available, and due to the influence of the many steel structures that compose the bridge, estimation by a UAV of its precise location is difficult. Nevertheless, to perform inspections under large bridges, autonomous UAVs need to be able to accurately estimate their positions and follow a planned path.
Since the beginning of the 21st Century, many studies have addressed the inspection of structures using robotic technologies [1][2][3][4][5][6][7][8][9][10][11]. However, most of these approaches require preinstalled infrastructures, such as ropes or other platforms, or are mostly controlled by human operators. In addition, the accessibility of some parts of bridges is limited for these systems. Although a few robots have been developed that do not require supporting infrastructures, they do not have sufficient payloads and do not guarantee safety during operation [12][13][14].
Previously, our team introduced several wall-climbing-type UAVs for micro-inspection of the surfaces of bridges [15][16][17]. While these UAVs may overcome the shortcomings of the above systems, the technological readiness level is not high enough to operate them for a long time, so they are not suitable for use in bridge inspection immediately. Nevertheless, with the development of UAV technology, UAVs have recently begun to be used in many fields. UAVs equipped with sensors such as GPS receivers, inertial measurement units (IMUs), and cameras have made it possible to inspect bridges on a large scale, overcoming the disadvantages of traditional methods. For example, some studies proposed a method that involved acquiring an image of the underside of a bridge using a camera mounted on top of a UAV, which was later analyzed by a specialist [18]. Similarly, a method for detecting cracks in steel bridge members based on image data obtained using small UAVs has been presented [19]. A visual inspection strategy was introduced based on the use of a UAV to generate a 3D point cloud and an image map using images captured by a camera [20]. Another work suggested a visual servoing method to stabilize image capture for a target position by applying soft-force feedback based on comparisons of the captured images [21]. In general, however, most of the contributions listed above are heavily reliant on low-level stabilization control and GPS signals for position maintenance, or at least they assume certain situations. The major remaining problem is that when UAVs are flying near or below a bridge for inspection, the GPS signal might be lost, resulting in unstable flight and consequently causing the acquired image quality to degrade. This has been the greatest challenge hindering the use of UAVs in the field of bridge inspection. One solution for localization and mapping in a GPS-denied environment is simultaneous localization and mapping (SLAM) based on various sensors, such as IMUs, cameras, and LiDARs (Light Detection and Ranging).
In 2015, the Robotics Institute of Carnegie Mellon University developed a project called the Aerial Robotic Infrastructure Analyst (ARIA), which introduced a new concept to bridge inspection. In ARIA, the researchers used a UAV with a rotating 2D LiDAR to estimate its odometry and map the target bridge. The developed algorithm, called LiDAR odometry and mapping (LOAM) [22], enabled safe, efficient, and high-precision assessment of critical infrastructure. With LOAM, the UAV could fly safely near or under bridges; however, the feasibility tests performed for verification were conducted in bridge environments with short paths and little movement. Later, a lightweight and ground-optimized LOAM (LeGO-LOAM) [23] was proposed; however, it was utilized for ground vehicles. Surfel-based mapping (SuMA) [24] was proposed to provide 3D LiDAR odometry and mapping using surfels, which are efficient in representing large environments. In addition, considerable recent studies on deep-learning-based LiDAR odometry or SLAM have been carried out and have shown its potential [25][26][27].
In recent years, researchers have become interested in implementing either visual SLAM or odometry in the absence of GPS signals; examples of the corresponding methods include parallel tracking and mapping (PTAM) [28], large-scale direct SLAM (LSD-SLAM) [29], oriented FAST and rotated BRIEF SLAM (ORB-SLAM) [30], semidirect visual odometry (SVO) [31], direct sparse odometry (DSO) [32], and monocular visual-inertial state estimator (VINS-Mono) [33]. Most of them have shown excellent results in terms of localization and mapping. Nevertheless, the environments near bridges are often subject to harsh conditions involving strong winds or obstacles, and consequently, the need for a more robust algorithm to prevent crashing of UAVs carrying expensive inspection equipment has emerged.
To overcome the disadvantages of vision-only and LiDAR-only methods, LiDAR has been widely incorporated with other techniques in recent years. Some researchers proposed a SLAM method called vision-enhanced LiDAR odometry and mapping (VELO), which relies on multiple cameras tightly coupled with a LiDAR [34]. The authors of LOAM [22] improved on their previous work by taking visual odometry output as a motion prior followed by LiDAR odometry; the corresponding method is known as V-LOAM [35]. A tightly coupled LiDAR inertial odometry and mapping method called LIO-mapping [36] has also been developed that adopts the scheme of LOAM in combination with VINS-Mono. In addition, industry or governmental research institutes have also provided solutions that can be used for the surveillance of various types of infrastructure using UAVs by applying LiDAR-based 3D SLAM methods [37,38]. However, these methods are not optimized for practical inspection, and modifying either the hardware or software to better suit the user's purpose is not easy.
An overview of our group's entire framework of research on autonomous structural inspection is shown in Figure 1. In our previous papers, we presented significant results related to coverage path planning [39] and a brief introduction to the overall framework that consists of preflight 3D mapping, viewpoint selection, and path planning followed by in-flight localization and online inspection [40]. In this paper, we focus on 3D mapping and a localization method in the context of hierarchical graph-based SLAM using multiple sensors and the implementations of high-level control of the autonomous flight for the purpose of bridge inspection using a UAV. For robustness, visual-inertial (VI) state estimates and the corresponding LiDAR sweep are combined into subnodes. Then, they are combined into a supernode that is comprised of states and a submap (accumulated scan data). The constraints are generated using the LiDAR-based normal distribution transform (NDT) [41,42] and generalized iterative closest point (G-ICP) matching [43]. Before full global optimization, NDT locally optimizes the relationship between subnodes in the hierarchical scheme. This enables accurate localization and mapping of the UAV surroundings for autonomous navigation without operator intervention. Both 3D mapping and localization steps use the hierarchical graph-based (HG)-SLAM structure proposed in this paper. With our system, the drone can fly reliably over and under a bridge and acquire high-quality images sufficient for image-based inspection. In addition, the efficacy of the proposed system is verified through various simulations and experiments. Additionally, using our developed system, another group has performed highly successful bridge inspection and diagnosis based on the acquired image dataset [44].
The main contributions of this paper are listed as follows: • To the best of our knowledge, this paper presents the first experimental validation that encompasses autonomous bridge inspection with a UAV that utilizes a graph-based SLAM method fusing data from multiple sensors, such as a camera, an inertial measurement unit (IMU), and a 3D LiDAR. It proposes a methodology for each component of the proposed framework for applying a UAV to actual bridge inspection. • The proposed graph structure for SLAM offers two types of optimization with two different sensors on the aerial vehicle: local (VI odometry (VI-O) and NDT) and global (G-ICP).
Although node generation based on VI odometry alone tends to easily diverge depending on the environment, VI odometry combined with NDT-based odometry can robustly generate nodes with hierarchical optimization. Therefore, the robustness of node generation can be guaranteed.
• The proposed method was tested on two different types of large-scale bridges presenting different flight scenarios ( Figure 2  Overall framework for autonomous bridge inspection using a UAV (unmanned aerial vehicles). The first two stages, 3D mapping and coverage path planning, are preflight processes. After a precise 3D map is acquired, path planning is performed by extracting the locations at which the target bridge needs to be inspected. Only the 3D mapping and localization steps and the technical control scheme for the autonomous flight system are considered in this paper. Both 3D mapping and localization steps use the hierarchical graph-based (HG)-SLAM (simultaneous localization and mapping) structure proposed in this paper. The rest of this paper is organized as follows. Sections 2.1.1 and 2.1.2 introduce the proposed hierarchical graph-based SLAM approach in detail. Section 2.2 describes the technical implementation of the control scheme, and Section 3 reports the results of several outdoor experiments. Finally, Section 4 concludes by summarizing the overall implementation, addressing the difficulties and challenges faced and discussing future work.

Graph Structure
Our approach to the fusion of data from multiple sensors (camera and LiDAR) is based on the pose graph SLAM scheme [45], which is applied to construct and solve nonlinear optimization problems with a limited sensor measurement history and limited UAV states. The overall flow diagram of the proposed SLAM method is presented in Figure 3. The graph essentially consists of a set of nodes and a set of edges. The nodes represent the UAV poses, and the edges constrain the nodes based on the relative measurements of pairs of nodes. In particular, unlike ground mobile robots, the aerial vehicles considered in this paper may be placed in a dangerous situation in the case of incorrect pose estimation; therefore, we define the basic nodes as supernodes and further establish multiple subnodes that make up a supernode ( Figure 4). Each supernode represents the UAV pose and submap, and each subnode consists of a state estimation obtained using VI odometry and the corresponding point cloud data. The subnodes are connected by constraints derived via VI odometry, and once a certain number of subnodes are generated, NDT matching creates an additional constraint. The G-ICP algorithm then creates global constraints relating the supernodes, and batch optimization is performed for all poses represented by the supernodes and subnodes.  In the first step, VI odometry is applied to estimate the camera pose in the local coordinates based on information provided by the camera-IMU sensor system mounted on the UAV. Here, we adopt VINS-Mono [33], which is a sliding window smoothing method, as the base node generator. A node is generated every time a pose is estimated through VI odometry. However, the results obtained through VI odometry are subject to some drift error, and moments occur at which no odometry information is provided when the UAV purely rotates or due to sudden movements. Therefore, another sensor-based odometry estimation method is additionally needed to ensure robust node generation.
LiDAR sensors can obtain high-precision information from the environment, allowing a UAV to perform odometry measurements. The basic principle of LiDAR odometry is to calculate the transformation between two consecutive point cloud frames, which are called the reference and input point clouds, through matching. The purpose of the matching process, also called point cloud registration, is to obtain the optimal transformation matrix for matching or aligning the reference frame with the input frame. Figure 4. Illustration of the proposed graph structure. The green dashed lines represent the constraints generated through VI-odometry (O)-based pose estimation, and the red dotted line represents the constraint obtained through NDT-based odometry. Thus, each subnode is composed of a state estimate obtained using VI-O and a 3D LiDAR sweep. Note that these two measurements induce local optimization within a set of subnodes to ensure stable node generation. The blue solid lines represent the constraints generated via G-ICP matching, which are used for global optimization.
NDT [41,42], which is represented in the form of a combination of normal distributions, describes the probability of finding a surface point at a certain position. For NDT-based odometry, regular cubic cells are defined to subdivide the point cloud set, and the points in each cell are continuously modeled by a normal distribution. In this paper, the accuracy of NDT-based odometry is improved by accumulating previous point cloud data. In addition, the result of VI odometry is used to initialize the parameters of the NDT [41]. To estimate the UAV pose at the j-th subnode from the (j − 1)-th subnode, the point cloud at the j-th subnode is set as the target point cloud in the NDT function, and the point cloud at the (j − 1)-th subnode is set as the query point cloud. The target point cloud ( χ tar j ) can be defined as the accumulated point cloud data from the (j − 1)-th and the j-th subnodes as follows: where c j denotes the local LiDAR sweep or the query point cloud for registration at the j-th subnode and X j−1 j is the result of the SE(3) transformation of pose x at the j-th subnode from the (j − 1)-th subnode. In this way, the 3D LiDAR scan data are accumulated to form the reference point cloud at every certain step to manage the size of the reference point cloud.
To map the bottom part of the target bridge, we use a tilted 3D LiDAR attached to the underside of the UAV. For mounting on the UAV, the LiDAR is usually small and lightweight, so the scan is sparse. Thus, each UAV pose can have only a very small overlapping area or no overlapping area at all, which may result in low scan matching quality. Since VI odometry provides highly reliable odometry measurements over short intervals, one "supernode" can be created by stacking multiple subnodes generated based on VI odometry and NDT odometry over a certain distance, as shown in where T R L denotes the coordinate transformation matrix from the LiDAR to the UAV, X s i j denotes the transformed subnode x s i j in the i-th supernode, and j is a subnode index. For global scan matching, the supernodes can be registered using their local point cloud maps via the iterative closest point (ICP) algorithm [46]. The ICP algorithm is a simple way of calculating the transformation between two consecutive scans by recursively searching for pairs of neighboring points in the two scans and minimizing the sum of the point-to-point or point-to-plane distances. However, because the local maps generated via VI and NDT odometry can be sparse in density, the normal ICP algorithm that relies on point-to-point or point-to-plane relationships cannot achieve clear matching. Therefore, instead of normal ICP matching, G-ICP matching [43] is applied, which can utilize sparse data to capture the data tendency during the optimization process. Since the actual surface is assumed to be at least partially distinguishable, each small part of the local map can be considered locally planar. Under this assumption, the G-ICP algorithm calculates the covariance value for each point on the local map (the covariance is larger for a direction that is more parallel to the plane and smaller for a direction that is closer to the normal direction). Finally, it checks the similarity of the covariances between each pair of points and sets corresponding matching scores.

Graph Optimization
Once the graph is constructed, the optimization process consists of finding the best node configuration for all available measurements. The measurements are affected by noise, which is assumed to be additive, white, and normally distributed with a mean of zero. Figure 4 shows a graphical description of our pose graph-based SLAM method. Let x = (x 1 , . . . , x i ) T be the state vector, where the node T be the set of measurement values; and let Λ i,j be the information matrix of the measurements, which is the inverse of the covariance matrix. Then, r i,j denotes the residual error between the actual and predicted observations as follows: where h i,j denotes the predicted measurement derived from the nodes. Maximum-likelihood estimation (MLE) seeks the optimal state configuration x * , as follows: where C denotes a set of constrained pose pairs i, j . Since optimizing the UAV trajectory is equivalent to minimizing the Mahalanobis distance of the residual errors, i.e., r i,j in (3), which is generally a nonlinear function, the optimization problem can be solved through nonlinear least-squares optimization with proper linearization, as follows: where H and g are the Hessian and gradient, respectively, of the right-hand side of (4), such that: where J i,j (x) denotes the Jacobian of the error function (3) with respect to x. In the present study, the incremental smoothing and mapping (iSAM) algorithm [47] is utilized to optimize the full trajectory of the UAV and generate a precise 3D point cloud map.

High-Level Control Scheme
The goal of autonomous flight for inspection is to follow a specified path designed to enable the inspection of all surfaces. To achieve this purpose, several points must be considered for control of the UAV. The requirements necessary for autonomous flight are described briefly as follows.
First, when the UAV flies along the specified path, the cumulative image data acquired by the mounted camera must cover all surfaces of the bridge. To meet the first requirement for successful autonomous flight, a path consisting of waypoints is designed based on a prebuilt point cloud map of the bridge as part of preflight processing. Second, two time-consecutive camera images must overlap at a ratio higher than a certain value to ensure that the surface can be continuously inspected. Therefore, the UAV should move at a nearly constant speed when moving along the path. By means of a control scheme with constant speed and heading angle, the UAV can fulfill the second requirement. Finally, it is essential to design an accurate path-following method to allow the UAV to resist wind gusts that would force it off its path and to ensure that the UAV does not miss the bridge surface to be inspected. In this paper, a simplified line-of-sight (LoS) algorithm is used as the path-following scheme. This simplified LoS algorithm is a compact version of the ordinary LoS algorithm [48]. The LoS algorithm is a path-following scheme for natural motion and robust path return that relies on setting local goals along the path based on the perpendicular distance between the UAV position and the path. As in the ordinary LoS algorithm, the simplified LoS algorithm first obtains the next local goal by drawing a sphere. This local goal is defined as the intersection between the path and this sphere, whose radius is the LoS radius and the center point is the location of the UAV. In addition, the LoS vector (P LoS ) between the current location and the local goal of the UAV is obtained. In this way, by applying the simplified LoS algorithm, a velocity vector with the same direction as the LoS vector and a speed (v s ) that is suitable for bridge inspection is derived as follows:ṗ * = v s · P LoS P LoS .

Platform and Experimental Setup
A field test was performed to verify the feasibility of our method for inspecting bridges. Our field experiment platform is shown in Figure 5. The body was designed and developed by a collaborating company; the dimensions are approximately 1090 mm × 1090 mm × 450 mm (length × width × height); the maximum flight time is 30∼35 min; and the weight is 12 kg without batteries ( Table 1). The sensor system has frame rates of 40 Hz for the camera, 240 Hz for the IMU, and 10 Hz for the 3D LiDAR. For node generation, VI odometry is run at 20 Hz, and NDT-based odometry is performed at approximately 3 Hz, while G-ICP matching is run at 1 Hz to generate global constraints. The embedded NUCis connected to Pixhawk via Telem2 port, and Pixhawk obtains the current HG-SLAM pose and orientation estimation (x, y, z, θ, φ, ψ) through the topic /mavros/vision_pose and sends the IMU data, local position for the feedback, and velocity in the body frame to the PC.  Figure 5. UAV used in this research. This custom-designed quadrotor-type UAV carries various sensors and devices. In the forward direction, a Sony α9 DSLR camera is mounted on a homemade gimbal for bridge inspection. At the top is a PX4-based Pixhawk 2 flight controller with a Here + GPS (Global Positioning System) receiver. This GPS receiver is not used in the SLAM system being tested here, but is used later to convert the estimated local position of the drone into the global frame to serve as the ground truth when a GPS signal is available. An embedded Intel i7 NUC with 16 GB of RAM is also included for data processing. A 1280 × 1024 resolution monocular camera (PointGrey FL3-U3-13Y3M-C) is mounted in a downward-looking configuration. A 3D LiDAR system (Velodyne Puck LITE) is mounted with a tilt angle of 45 degrees at the rear of the UAV. It is also equipped with an LTE modem for remote communication, allowing the drone to be remotely operated wherever LTE communication is available.
In this study, our method was tested for two types of bridges, a riverside bridge and a sea-crossing bridge. For a bridge located on the ground, the camera is installed to look downward to extract feature points of the ground, whereas for a bridge located over the sea (or a river), an upward-looking camera is installed on the UAV instead of a downward-looking camera so that it can see the bottom of the bridge. Deungseon Bridge in Chuncheon, Korea (Figure 6a), and Geobukseon Bridge in Yeosu, Korea (Figure 6b), were chosen as the test sites. For Deungseon Bridge, considering the flight time limitations, we planned a path to inspect only two spans, although the entire bridge is 2 km long. The distance between adjacent spans of the bridge is 50 m, the height of each column is 14.5 m, and the width of the bridge is 11 m. Geobukseon Bridge, on the other hand, has a total length of 744 m and a 230 m span, and in this study, only one span without main towers was tested. The height of each bridge column is 23 m, and the bridge is 20 m wide. Before fully applying our SLAM algorithm for navigation near an actual bridge, we first tested the stability of the local optimization method (relating subnodes by means of VI and NDT-based odometry constraints once the UAV has moved a certain distance based on the VI odometry results) to ensure the robustness of basic node generation, as explained in Section 2.1.1. Figure 7 shows a noticeable difference. The point cloud map shown in Figure 7a is the result obtained when VI odometry alone is used to generate the nodes. Close inspection of the region in the white circle reveals that the bottom of the bridge and the column appear curved. This is due to the drift error of VI odometry when there are few loop closures, which results in errors in the map and in position estimation as a whole. The cumulative error creates considerable problems when flying over long distances and increases the risk of the UAV falling. In contrast, local optimization with NDT-based odometry results in straight lines for the columns and the bottom of the bridge, as shown in Figure 7b. The resulting position estimates can also be expected to be accurate. This occurs because the estimation is not divergent even if loop closing does not occur frequently since the basic node generation is optimized locally in a hierarchical structure.  Figure 8 compares the results of trajectory estimation using our method with the results obtained using VI odometry and GPS positioning in two different scenarios: straight and zigzag. In the straight scenario, the VI odometry result shows a slightly larger drift in translation and rotation, whereas our proposed method yields a relatively accurate result. In the zigzag scenario, the VI odometry result shows a considerably larger drift than in the previous scenario because the zigzag path includes many pure rotations, which are known to cause significant degeneracy [49]. In both cases, the height estimates obtained via VI odometry also show fluctuations, even though the UAV maintained a height of 5 m.

Experimental Results
At Deungseon Bridge, the UAV was operated to navigate underneath the bridge except for viewpoint selection and mission triggering, which are not addressed in this paper. The results for the given path are shown in Figures 9 and 10. We conducted two experiments at Deungseon Bridge, one with a simple straight path and one with a complicated zigzag flight path. Our method and LIO-mapping [36] both require an initialization step due to the use of VINS-Mono [33] for initializing extrinsic parameters; therefore, before take-off, we conducted a handheld initialization in both scenarios. Figure 9 compares the results obtained in the simple straight flight path experiment at Deungseon Bridge using our proposed method, LOAM [22], and LIO-mapping. All methods achieved good localization, without losing the position of the UAV, but showed slightly different error rates in mapping. Figure 10 compares the results obtained in the zigzag path flight experiment at Deungseon Bridge using our method, LOAM, and LIO-mapping. In this scenario, all three methods also showed good localization performance; however, the mapping results again showed slightly different error rates. The numerical analysis results for both cases are presented in Table 2. The detailed view of the final map of Deungseon Bridge obtained using each method is illustrated in Figure 11. LOAM achieved low-noise mapping performance with clean, straight lines; however, the streetlights were missing in several places, and most of the area of the last bridge column was not mapped, as shown in Figure 11b. In the case of LIO-mapping, the ability to capture straight lines was insufficient compared to LOAM or our method, and as seen from both the front and side views, significant noise was introduced into the map. LIO-mapping also did not achieve good mapping performance for the last bridge column, as shown in Figure 11c. Finally, as seen from a visual inspection of Figure 11a, the performance of our proposed method was superior overall in terms of street lamp representation, straight line reproduction, bridge column mapping, and so on. From the experimental results obtained in these two different scenarios, our method is confirmed to show higher accuracy in performing SLAM for a riverside bridge with many visual and geometric features.   In the case of a sea-crossing bridge, however, the situation is considerably different, with fewer distinctive visual or geometric features than in the case of a riverside bridge; as a result, estimation based on either VI odometry only or LiDAR only may easily fail to converge because the corresponding environment tends to cause almost half of a single image or LiDAR sweep to consist of sky or sea surface. Even if sufficient geometric features exist, VI odometry based on an upward-looking camera becomes vulnerable to disturbances such as changes in illumination caused by sunlight. A greater problem is presented by the strong winds (over 12 m/s on average at the time of the experiment) often encountered under sea-crossing bridges, which cause the risk of loss from falls to be much greater than in the case of land bridges. For these reasons, we conducted remote-controlled navigation at Geobukseon Bridge with a simple path only. Figure 12 compares the 3D maps of Geobukseon Bridge acquired using our method, LOAM, and LIO-mapping. Unlike in the case of Deungseon Bridge, the three algorithms were not all able to produce neat mapping results for Geobukseon Bridge. In the case of LOAM, as shown in Figure 12b, although the results did not significantly diverge, the base deck of the bridge appeared slightly thicker than it did in the results of the other methods due to a slight divergence in the z-axis. LIO-mapping showed better performance in mapping the target bridge than LOAM, but the final map was slightly noisier than that obtained with LOAM due to the frequent occurrence of low sweep matching results, as shown in Figure 12c. By contrast, our method showed good mapping quality without any divergence or mismatch, as visualized in Figure 12a. Since no precise GPS signal is available under bridges, our algorithm was evaluated by comparing the mapping results with the actual bridge measurements according to the blueprints rather than precise real-time kinematic (RTK)-GPS results. The measured intervals are indicated in Figure 6. To ensure accurate comparisons, the average value for each measured quantity was calculated by randomly sampling 10 points on the corresponding part of the 3D maps obtained via SLAM in each environment using the point-to-point distance measurement tool of the CloudCompare software package [50]. The comparison outcomes are summarized in Table 2. In the case of the proposed algorithm, the errors from the sea-crossing bridge experiment are slightly higher than those for the riverside bridge, but none of them exceeds 2%. In contrast, in the case of LOAM [22], the Deungseon Bridge results show a slightly higher error rate than that of the proposed method for each part, while for the Geobukseon Bridge data, the errors are much higher; for w 2 in particular, the error is close to 10%. This finding can be attributed to the divergence of LOAM when the UAV was flying near the middle of the bridge due to the lack of LiDAR feature points provided by the water surface below the bridge. Furthermore, intermittent failure cases due to divergence also occurred. LIO-mapping [36] showed a performance similar to that of LOAM, with an error rate of approximately 1 to 3%; although its mapping error rate for Geobukseon Bridge was lower than that of LOAM, it showed a larger error rate than the proposed method in all cases. Because the LiDAR is mounted with a tilt angle of 45 degrees at the rear of the UAV for the mapping of the bridge, there are few features in a sweep to be extracted. As mentioned before, the LiDAR data is sparse, and there are few overlapping areas, so the quality of sweep matching is not good. However, our method has a hierarchical structure that accumulates the point cloud map and improves the matching performance. Besides, by additionally using VI-O to overcome the limitations of a single sensor in the feature-poor environment, the robustness of the entire SLAM can be secured. It is analyzed that LOAM (w/o IMU) showed relatively high orientation errors in most rotations, which makes the odometry and mapping inaccurate. This can be seen in Figure 12b, since LOAM processes with a few selected features with high/low curvatures, which makes it less accurate where there are sparse features. It is also difficult to accurately model the relative pose uncertainty since the LOAM only gives constraints between two consecutive scans. LIO-mapping sometimes fails on some mapping sequences due to error-prone data association at Deungseon Bridge (Figure 11c), a relatively cluttered environment. However, at Geobukseon Bridge, in comparison with LOAM, the results show that combining IMU and LiDAR can effectively improve the accuracy of the pose estimation and mapping with limited overlapping of LiDAR sweeps (Figure 12c). Furthermore, LIO mapping showed improved accuracy, benefiting from rotation-constrained refinements in the mapping step.
Finally, Figure 13 shows the results of experiments conducted in accordance with our overall framework for autonomous bridge inspection, as mentioned previously. In this scenario, preflight 3D mapping and path planning followed by in-flight localization and UAV control were applied in sequence. Figure 13a illustrates the estimated trajectory from waypoints #1 to #21 along the bottom part of the bridge. The color bar denotes the speed of the UAV, which can be calculated as the sum of the square root of the velocity on each axis. Because we set the maximum speed to 1 m/s, the values do not exceed this threshold. Figure 13b shows the trajectory plotted on the satellite map for comparison with the GPS-based trajectory, which illustrates the need for precise localization in a GPS-denied environment.  Figure 13. Field test of the overall framework at Deungseon Bridge. The system was used to obtain a 3D map of the target bridge, and the operator extracted waypoints by specifying the parts requiring inspection on the 3D map. Subsequently, using our proposed SLAM method and high-level control scheme, the UAV flew along the planned trajectory to obtain image data for the bridge. (a) Flight trajectory along the specified waypoints (from #1 to #21). The colors along the line represent the speed of the UAV. (b) Trajectory comparison: GPS (red line) vs. estimated (blue line).

Discussion and Conclusions
To support the ongoing shift from human labor toward the use of autonomous systems in structural inspection, this paper demonstrated a hierarchical SLAM algorithm for performing UAV-based inspections of infrastructure to address the unreliability of GPS-based localization underneath structures. To ensure robustness and stability, the proposed SLAM algorithm uses VI odometry and NDT-based odometry to generate multiple subnodes composing each supernode, thus applying a sort of local optimization. Based on the outcomes of this robust odometry method, the G-ICP algorithm is then applied to generate global constraints to globally optimize the complete trajectory. In addition, the details of the high-level control scheme for actually operating a UAV during a bridge inspection mission was introduced to allow the UAV to fly along the given path and cover the entire area of the bridge while maintaining a constant speed to ensure sufficient image quality. The proposed method was evaluated in several outdoor experiments conducted at a riverside bridge and a sea-crossing bridge. Overall, the results demonstrate the feasibility and stable performance of the proposed approach. A few of the difficulties and challenges encountered during the field experiments are summarized below: • Transient strong winds below the sea-crossing bridge: Even if accurate position estimation is possible under a bridge, if a wind gust of over 10 m/s suddenly occurs, the risk that the UAV will fall markedly increases. Addressing this problem will require improvement of the hardware of the aircraft itself or an improved control algorithm that can better cope with strong winds. • Difficulty in precise synchronization of the obtained images and pose estimates: In the current system, the image and pose data are stored separately, and the operator manually synchronizes them later. In this case, the possibility that the data may not be properly synchronized arises. Therefore, in the future, the entire image acquisition system will be improved and integrated into the Robot Operating System (ROS) to enable accurate synchronization.
Future research will focus on developing a more reliable SLAM system for field tests at sea-crossing bridges. In addition, methods for online structural inspection permitting on-the-fly detection of bridge defects will be developed as the final step of our framework (Figure 1). With further research, we hope to define a more generalized overall framework for autonomous structural inspection and release an implementation of this framework as open-source software.