Flight Planning for Survey-Grade 3D Reconstruction of Truss Bridges

Abstract: Autonomous UAV 3D reconstruction has been widely used for infrastructure inspection and asset management. However, its application to truss structures remains challenging because of their geometric complexity and severe self-occlusion under the constraints of camera FOV, safety clearance, and flight duration. This paper proposes a new flight planning method that addresses the self-occlusion problem and enables autonomous, efficient data acquisition for survey-grade 3D truss reconstruction. The proposed method contains two steps: first, identifying a minimal set of viewpoints that achieves the maximal reconstruction quality at every observed surface of the truss geometry through an iterative optimization schema; second, converting the optimal viewpoints into the shortest collision-free flight trajectories while considering the UAV constraints. The computed flight path can also be implemented in a multi-UAV fashion. The method is evaluated on a synthetic truss bridge and a real-world truss bridge. The results suggest that the proposed approach outperforms existing works in terms of 3D reconstruction quality while taking less time in both the inflight image acquisition and the post-flight 3D reconstruction.


Introduction
Truss structures have been widely used in bridges and other civil infrastructure. These trusses consist of many interconnected components such as beams, girders, bracing, gusset plates, and other elements. Unmanned aerial vehicles (UAVs) have been recognized as an effective tool for inspecting bridge structures [1][2][3]. Three-dimensional photogrammetric reconstruction using autonomous UAVs has also gained traction in recent years. The reconstructed high-quality 3D models allow a better understanding of bridge conditions than using fragmented 2D images [4][5][6].
However, UAV inspections flown either by remote control or along simple automated waypoint paths (e.g., orbit or lawn-mowing patterns) struggle to achieve the desired quality and completeness of the 3D reconstruction of truss structures [7]. The primary challenge is handling the complexity and self-occlusion of the truss geometry under the constraints of camera field-of-view (FOV), safety clearance, and flight duration.
Over the past years, advanced flight planning solutions have been proposed for the automated inspection and 3D reconstruction of bridges. Most of these works configured the camera viewpoints to sweep the structural surfaces back and forth efficiently [8][9][10][11][12]. For example, Morgenthal et al. [2] densely reconstructed bridge piers in three steps: first, the method sliced each pier structure vertically at given intervals; then, a dense set of horizontal camera views was sampled at each slice; finally, the camera views across the sliced structures were connected vertically by a spiral path. Phung et al. [10] configured orthogonal viewpoints along bridge surfaces based on the required ground sampling distance (GSD) and image overlap. Discrete particle swarm optimization (DPSO) was employed to find the shortest path connecting these viewpoints. Bolourian and Hammad [12] scanned the bridge deck with varied densities of camera views based on the criticality of defects at the deck surface. The method used a ray-tracing algorithm to avoid occlusion caused by on-site obstacles, which helps ensure the quality of the collected images. A limitation of these sweep-based techniques is the assumption that the structures mainly consist of planar surfaces (for viewpoints to scan along). This assumption does not hold for truss structures because of the geometric variance and complexity of the truss components. In addition, sweeping the image views along the structural surface often collects overly redundant images, which reduces the efficiency of both the inflight image acquisition and the post-flight image processing without increasing the reconstruction quality.
For aerial multi-view stereo (MVS) reconstruction of buildings and civil structures, many works handled view and path planning through optimization, which is more robust to structures with varied geometries while ensuring path optimality [13][14][15][16][17][18][19]. Bircher et al. [15] and Shang et al. [19] configured the viewpoint search space based on the camera parameters and the model geometry. The methods iteratively optimized the trajectories in the continuous space to find the shortest inspection path from an initial randomly sampled camera trajectory. However, these sampling-based methods were designed for efficient coverage inspections, and they cannot ensure high-quality aerial photogrammetry because they lack stereo-matching constraints. For aerial photogrammetric reconstruction, a core problem is to maximize the reconstruction quality while reducing the image redundancy. Many studies in this category followed next-best-view (NBV) planning, where the vantage viewpoints are incrementally selected from an ensemble of candidate camera views [20][21][22]. For example, Schmid et al. [13] constructed a spherical view hull to define a discrete list of candidate viewpoints around a building. The NBV list is then recursively selected from the candidates based on the coverage, overlap, and redundancy constraints. Hoppe et al. [14] proposed a similar method. Besides the image coverage and overlap requirements, the study also incorporated the triangulation angles into the candidate view selection, reducing the poorly triangulated points in the final reconstruction. To maximize the reconstruction performance at each flight, Roberts et al. [16] treated the selection of good views and the routing between them as a single integrated optimization problem. The study framed the problem as submodular and sequentially selected the best orientations and positions of the NBV list. Hepp et al. [17] improved this method [16] by using information gain (IG) to measure the marginal reward of each viewpoint. The method combined the selection of the camera positions and orientations, which achieved a better reconstruction performance. A notable limitation of these NBV techniques is that they rely on user-defined discrete candidate viewpoints. Unlike the 3D reconstruction of buildings, where the candidate set can be defined as overhead views naively surrounding the building geometries, trusses are composed of many slim, self-occluded, and non-planar components (e.g., beams, girders, and connectors). Thus, it is difficult to determine a suitably sized candidate set while ensuring complete coverage at every truss side.
A new UAV flight planning method is proposed to overcome these challenges by finding the optimal trajectories that maximize the reconstruction quality at truss surfaces. To demonstrate our contributions, Table 1 compares the proposed method with the state-of-the-art in terms of efficiency, accuracy, optimization strategy, applications, and the type of structures surveyed. Compared to the sweep-based techniques [23][24][25], the proposed method takes fewer images and achieves a higher reconstruction quality by incorporating the MVS quality assurance principles (i.e., accuracy, completeness, and level of detail) at the planning phase. Unlike the NBV methods [13,16,17], where the vantage viewpoints were selected from a pre-determined discrete candidate set, the proposed method iteratively resamples the whole candidate set in the continuous space, increasing the chance of finding the optimal viewpoint subset.
Additionally, the method computes the shortest flight paths subject to the UAV capacity constraints (i.e., battery capacity and autopilot limitations), enabling more automated truss bridge reconstruction by single or multiple UAVs. Evaluation of the proposed method includes both a synthetic and a real-world truss bridge. The results showed that the proposed method outperforms both the recent sweep-based method [23] and the state-of-the-art NBV method [17], producing a higher model quality with increased automation and fewer images and less distance traveled in the air.

Figure 1 shows the overview of the truss bridge reconstruction using the proposed flight planning method. The method assumes an existing rough geometrical model of the bridge, which can be extracted from the web (Google Maps in our case) using third-party tools (e.g., OpenStreetMap). The extracted model is an unstructured triangular mesh (in KMZ format) containing both the bridge and the surroundings. Notably, compared to the existing literature that relies on an initial flight to obtain the model geometry [16,17], this strategy keeps the flight planning process offsite, reducing the overall surveying duration. The obtained model is further processed to explore the camera/UAV search space around the bridge and the observable truss surface points for the subsequent view and path planning (Section 3). The proposed method first computes the optimal viewpoints that maximize the reconstruction quality at each observable surface point by selecting the best subset from an iteratively resampled candidate set (Section 4). The candidate set is a set of densely sampled oblique viewpoints (i.e., multiple orientations at each position) initialized within the UAV free space while considering the camera/inspection parameters.
After the optimization, the method converts the discrete viewpoints into single or multiple smooth flight trajectories subject to UAV constraints (i.e., aerodynamics, battery capacity, memory usage, and safety distance to the on-site objects) (Section 5). These trajectories are then transformed into the world coordinates (e.g., WGS84) and uploaded to the onboard autopilot system for the automated inflight image acquisition using single or multiple UAVs (Section 6.2). A photogrammetric reconstruction software will use the acquired high-quality images to generate a truss bridge's geo-referenced, high-fidelity 3D model (Section 6.3).

Input Parameters
Several important parameters must be defined as the inputs of the proposed method. In this study, we classify these parameters into four categories: (1) the UAV parameters, (2) the camera parameters, (3) the inspection requirements, and (4) the safety concerns. The UAV parameters include the physical properties of the selected UAV, such as the overall flight duration, the designed inflight speed, and the maximal number of executable waypoints in each flight. The camera parameters describe the properties of the onboard camera system, including the horizontal angle-of-view (AOV), the resolution of the onboard camera, and the gimbal pitch rotation limits. Because in-plane rotations do not change the image contents, we locked the gimbal roll angle at 0° and aligned the gimbal yaw with the UAV orientation. The inspection requirements are factors that control the quality of the collected images: the maximal/saturated GSD and the incidence angle. The safety concerns are parameters that define the UAV flyable space. They include the safe clearance to site objects, the minimum height above the ground level (AGL), and whether the UAV is allowed to fly through the truss. Fly-through-truss is a binary coefficient that defines whether the spaces within the truss structure are available for UAVs to pass through. These spaces enable the UAVs to better inspect the truss bridge's interior surfaces. However, most consumer-grade UAVs cannot fly closely around metal structures (e.g., a steel truss) because electromagnetic disturbances can affect the onboard sensing system (e.g., the compass) and corrupt the GPS positioning capabilities. Therefore, for safety, we enable the fly-through-truss option only when the UAV onboard navigation system can handle the signal interference. The symbols, descriptions, and default values of the parameters are listed in Table 2 below.

Preprocessing
Based on the input parameters, the initial model extracted from Google Maps is pre-processed to define the UAV configuration space, the search space of the admissible viewpoints, and the truss surface points for visibility/quality evaluation.

UAV Configuration Space
The UAV configuration space is the free space accessible by a UAV. Due to the external noise (e.g., GPS errors, wind, signal interference, etc.), a safety tolerance (Table 2) between the model and the selected UAV must be maintained. Since the input model format is a triangular mesh, the space inaccessible by a UAV can be defined by extruding the safety tolerance along the normal at every surface of the mesh. New mesh surfaces (highlighted as orange in Figure 2a) that cover every side of the bridge with the defined tolerance are then constructed by connecting the adjacent extruded points. Positions located within this mesh or intersected with the mesh surfaces are considered collisions. To define the free space within the truss, another surface mesh that covers the interiors of the truss structure (highlighted as yellow in Figure 2a) is developed. This mesh can be manually created or downscaled from the convex hull (detailed in Section 3.2.2). It is worth noting that this mesh is created only when the fly-through-truss option is disabled.

Viewpoints Search Space
The viewpoints search space is a subset of the UAV configuration space where the baseline observation quality of the collected images is guaranteed. Thus, only the free spaces surrounding the truss surfaces within certain distances should be considered. To achieve that, we applied the Quickhull algorithm [27] to generate a watertight convex hull that tightly covers the input truss. The convex hull was then resampled into a uniformly distributed triangular mesh using the approximated centroidal Voronoi diagram (ACVD) [28]. Figure 2b shows the triangle mesh for generating the candidate viewpoints set. For each triangular surface, a candidate viewpoint can be generated using the sampling-based coverage algorithm [15]. This strategy encourages uniform sampling of the candidate set, which provides a good initialization for the subsequent optimization.
Please note that the candidate set only covers the exterior of the truss. To also sample the candidate viewpoints at the interior of the truss (when the fly-through-truss mode is activated), we reversed the normal of the convex hull and resampled the surface mesh. The result is a double-sided triangular mesh where viewpoints can be sampled within the free space at both sides of the truss. In this study, we set the interior/exterior convex hull to contain 100/500 triangle surfaces, respectively, for the candidate viewpoints sampling (Section 4.1).

Truss Surface Points
The surface points are visible points located at the surface of the bridge truss structure. These points are utilized to measure each viewpoint's visibility and quality. Given the input model of the truss structure, Poisson disk sampling [29] was employed to sample the surface points evenly at the model surfaces. The normal of each point is computed as the average of the surface normals within each local Poisson disk. Figure 2c shows the sampled surface points at the truss surface.

Figure 3 shows the workflow of the proposed view planning method. The method was developed in an iterative optimization schema: starting with a randomly initialized set of oblique viewpoints (i.e., multiple candidate view orientations at each sampled position) that covers the truss geometry (Section 4.1), the method selects (Section 4.2) a vantage viewpoint subset from the candidates based on the MVS geometric criterion. The selected subset is further refined to explore better solutions (Section 4.3) and then utilized to resample the candidate set for the subset viewpoints selection in the next iteration (Section 4.4). The above steps are wrapped into an adaptive particle swarm optimization (APSO) framework such that both the candidate and the refined subsets are iteratively optimized. The details of each step of the proposed method are discussed in the following paragraphs.

Candidate Viewpoints Generation
The candidate viewpoints are generated in two steps: (1) one admissible viewpoint is sampled within the search space of each triangle surface (i.e., convex hull); (2) multiple oblique orientations are added at each view position to enhance the searchability.

Admissible Viewpoints
Mathematically, given a triangle surface of the convex mesh, we define a viewpoint v (v ∈ V) as admissible if the constraints in Equation (1) are satisfied, where λ computes the GSD of a viewpoint to the surface given the camera FOV and image resolution, θ measures the incidence angle between the viewpoint and the normal of the surface plane, and ϕ is the gimbal rotation angle. We set the initial orientation of each viewpoint as a ray cast from the viewpoint to the center of the triangle. π is a binary function that indicates whether the designed viewpoint is located within the UAV configuration space (i.e., no collision). The last constraint ensures that the altitude of the viewpoint respects the minimum height above ground level (AGL). These constraints formulate the viewpoint search space at each triangle surface.
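The admissibility test can be illustrated with a minimal Python sketch. It is not the authors' implementation: Equation (1) is not reproduced here, so the threshold names (gsd_max, theta_max_deg, pitch_limits_deg, min_agl) and the in_free_space callback are hypothetical stand-ins for the Table 2 parameters and the configuration-space check, and the GSD uses the standard pinhole footprint formula for a fronto-parallel surface.

```python
import math

def gsd(distance_m, aov_deg, image_width_px):
    """Ground sampling distance (m/pixel) for a fronto-parallel surface."""
    footprint = 2.0 * distance_m * math.tan(math.radians(aov_deg) / 2.0)
    return footprint / image_width_px

def incidence_angle_deg(view_pos, target, normal):
    """Angle between the surface normal and the ray from target to viewpoint."""
    ray = [v - t for v, t in zip(view_pos, target)]
    ray_len = math.sqrt(sum(c * c for c in ray))
    n_len = math.sqrt(sum(c * c for c in normal))
    cos_a = sum(r * n for r, n in zip(ray, normal)) / (ray_len * n_len)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

def gimbal_pitch_deg(view_pos, target):
    """Pitch of the viewing ray; negative values look downward."""
    dx, dy, dz = (t - v for t, v in zip(target, view_pos))
    return math.degrees(math.atan2(dz, math.hypot(dx, dy)))

def is_admissible(view_pos, target, normal, *, aov_deg, image_width_px,
                  gsd_max, theta_max_deg, pitch_limits_deg, min_agl,
                  in_free_space=lambda p: True):
    """All names and thresholds here are assumptions, not Table 2 values."""
    dist = math.dist(view_pos, target)
    pitch = gimbal_pitch_deg(view_pos, target)
    return (gsd(dist, aov_deg, image_width_px) <= gsd_max
            and incidence_angle_deg(view_pos, target, normal) <= theta_max_deg
            and pitch_limits_deg[0] <= pitch <= pitch_limits_deg[1]
            and in_free_space(view_pos)      # hypothetical collision check
            and view_pos[2] >= min_agl)      # z is assumed to be height AGL
```

Here target is the triangle center, so the ray from view_pos to target also serves as the initial view orientation.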

Oblique View Orientations
Due to the limited camera FOV and the complex truss geometry, a single view orientation might be insufficient to cover every truss surface. Thus, we add extra orientations at each sampled position to increase the searchability. Given the initial orientation of the admissible viewpoint, the oblique orientations are symmetrically generated based on two parameters: α and β. α measures the angle between adjacent extra orientations; the smaller α is, the more oblique orientations are generated. β denotes the angle between the original and the oblique orientation; a larger β increases the exploration ability of the oblique orientations. The oblique orientations must also follow the gimbal constraints, so orientations with a pitch angle outside the gimbal limits are rejected. Figure 4 illustrates the oblique orientations (arrows in yellow) generated under different selections of α and β. In this study, we set α = 90° and β = 30° based on the experiments (detailed in Section 7.4.3).
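One possible reading of this construction is sketched below in Python: the oblique directions lie on a cone of half-angle β around the initial direction and are spaced by α in azimuth, which gives 360/α directions. This geometric interpretation, the function name, and the default pitch limits are assumptions made for illustration only.

```python
import math

def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def oblique_orientations(initial_dir, alpha_deg=90.0, beta_deg=30.0,
                         pitch_limits_deg=(-90.0, 30.0)):
    """Unit directions tilted by beta from initial_dir, every alpha in azimuth.

    pitch_limits_deg is a hypothetical gimbal range, not a Table 2 value.
    """
    d = _normalize(initial_dir)
    # Pick a helper axis that is not parallel to d to build a local frame.
    helper = (0.0, 0.0, 1.0) if abs(d[2]) < 0.9 else (1.0, 0.0, 0.0)
    e1 = _normalize(_cross(d, helper))
    e2 = _cross(d, e1)
    beta = math.radians(beta_deg)
    kept = []
    for i in range(int(round(360.0 / alpha_deg))):
        phi = math.radians(i * alpha_deg)
        direction = tuple(
            math.cos(beta) * d[k]
            + math.sin(beta) * (math.cos(phi) * e1[k] + math.sin(phi) * e2[k])
            for k in range(3))
        pitch = math.degrees(math.asin(max(-1.0, min(1.0, direction[2]))))
        # Reject orientations whose pitch falls outside the gimbal limits.
        if pitch_limits_deg[0] <= pitch <= pitch_limits_deg[1]:
            kept.append(direction)
    return kept
```

With α = 90° and β = 30°, each position receives up to four extra orientations before the gimbal check.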

Viewpoints Subset Selection
The initially sampled candidate set contains many redundant viewpoints. In this section, we describe how to select the best subset from the candidate viewpoints. The method is developed based on the multi-view stereo quality assurance principle that only geometrically consistent images contribute to the final reconstruction [30]. In the following paragraphs, we first present the quality-efficiency metric that measures the reconstruction quality given a set of ordinary viewpoints (i.e., one view orientation at each position). The metric also identifies and ranks the contribution of each viewpoint. Next, we propose a greedy view selection algorithm to efficiently select the best subset from the candidate oblique viewpoints based on the metric.

Quality-Efficiency Metric
The quality-efficiency metric (F_QE) is formulated as the weighted sum of the reconstruction quality (F_Q) and the reconstruction efficiency (F_E), as in Equation (2), where σ ∈ [0, 1] is a constant coefficient that balances these two terms. In this study, we set σ = 0.8 based on a thorough experiment (detailed in Section 7.4.3). The presented metric encourages a high-quality reconstruction to be obtained from a small set of viewpoints. In the following, we discuss the computation of each term in detail.
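Equation (2) is not reproduced in this text. A form consistent with the description (a weighted sum balanced by a single coefficient σ ∈ [0, 1]) would be the convex combination below; assigning σ to the quality term rather than the efficiency term is an assumption.

```latex
F_{QE} = \sigma \, F_{Q} + (1 - \sigma) \, F_{E}, \qquad \sigma \in [0, 1]
```

Under this reading, σ = 0.8 places most of the weight on reconstruction quality and the remainder on reducing the number of images.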
Reconstruction quality predicts the MVS quality at each surface point given a set of viewpoints. Due to the absence of pixel-level contents at the planning phase, the metric is computed based on the geometric priors at the image level [30,31], where the following principles are considered:
Principle 1. Each surface point must be covered by at least two high-quality images in terms of sufficient GSD and incidence angles for feature extraction and matching.
Principle 2. Small baselines between the matched images can cause large triangulation errors for depth interpretation.
Principle 3. Redundant images are uninformative views that do not reduce the depth uncertainty while increasing the computation workload.
Based on the above-mentioned principles, we formulate the quality metric as Equation (3), where Q measures the quality of a truss surface point p (p ∈ P) as the sum of the k best observations (Principle 3). We set k equal to 3 due to the increased robustness of three-view reconstructions at texture-less surfaces [32]. τ is a binary function that detects whether the point p is visible from v. q measures the observation quality of each viewpoint, which is computed as the average (ω = 0.5) of two factors: (1) the view-to-point distance; and (2) the view-to-point incidence angle (Principle 1). These two factors are normalized and saturated based on the input parameters. V_p denotes the subset of viewpoints (V_p ⊆ V) where the baselines at p follow the stereo-matching constraints (Principle 2). Based on [33], we empirically set µ = 15° in this study. Figure 5 illustrates the geometries of the viewpoints to a surface point.

Reconstruction efficiency measures the ratio of the non-selected viewpoints over the complete viewpoints set V (as in Equation (4)). This metric encourages reducing the redundant images for efficient aerial reconstruction.
where F_Q(P, v) is the quality of the viewpoint v to every surface point in P, and V* is the subset of V that contributes to the reconstruction quality F_Q (i.e., with F_Q(P, v) > 0).
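The two building blocks described above (the sum of the k best observations at a point, and the share of non-contributing viewpoints) can be sketched in Python as follows. The function names are hypothetical, the normalization of Equation (3) is not reproduced, and the input to point_quality is assumed to already be restricted to the visible viewpoints in V_p.

```python
def point_quality(observation_qualities, k=3):
    """Q(p): sum of the k best observation qualities q at a surface point.

    The list is assumed to hold q only for visible viewpoints that already
    satisfy the baseline constraint (the set V_p); that filtering is not
    shown here.
    """
    return sum(sorted(observation_qualities, reverse=True)[:k])

def reconstruction_efficiency(per_view_quality):
    """F_E as the ratio of non-contributing viewpoints over the full set V.

    per_view_quality maps a viewpoint id to F_Q(P, v). Returns F_E together
    with the contributing subset V* (viewpoints with F_Q(P, v) > 0).
    """
    total = len(per_view_quality)
    if total == 0:
        raise ValueError("the viewpoint set V must not be empty")
    contributing = [v for v, q in per_view_quality.items() if q > 0.0]
    return (total - len(contributing)) / total, contributing
```

For example, a candidate set of four viewpoints in which two contribute nothing yields F_E = 0.5.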

Greedy Views Selection
Selecting the viewpoints subset from an oblique set involves selecting the viewpoint positions as well as the best orientation at each position. Clearly, enumerating every possible combination is expensive. To make the problem tractable, we propose a greedy algorithm that includes three steps as follows:
Step 1. Measure F_QE of the oblique viewpoints with the initial orientation at each view position (Section 4.2.1). The output subset V* is considered the baseline for the view selection in the next step.
Step 2. Select one viewpoint in the baseline and substitute the current orientation with one oblique orientation. The current orientation of the viewpoint is updated if F_QE increases. Iterate this process over all oblique orientations at the position.
Step 3. Repeat Step 2 at every viewpoint in V*. Stop when every viewpoint has been visited or the overall quality has not increased for a certain number of times (i.e., 5).
Clearly, the sequence in which the viewpoints are selected (as in Step 2) significantly affects the outcome. To avoid biasing the results toward a bad sequence, we perform multiple runs of Step 2 in parallel, with the viewpoints selected in random order at each run. Among the different runs, the viewpoints subset V* with the maximal F_Q is chosen as the output of the algorithm.
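The three steps can be sketched as a small Python routine. It is an illustrative simplification rather than the authors' code: the score callable stands in for the metric evaluation, the same score is used both for accepting substitutions and for comparing runs, and the runs execute sequentially instead of in parallel.

```python
import random

def greedy_orientation_selection(baseline, oblique_options, score,
                                 runs=4, patience=5, seed=0):
    """Greedy orientation substitution with randomized visiting order.

    baseline:        dict position_id -> initial orientation (Step 1 output)
    oblique_options: dict position_id -> list of oblique orientations
    score:           hypothetical callable(dict) -> float, higher is better
    """
    rng = random.Random(seed)
    best_assignment, best_score = dict(baseline), score(baseline)
    for _ in range(runs):
        current = dict(baseline)
        current_score = score(current)
        order = list(current)
        rng.shuffle(order)                      # random visiting sequence
        stalled = 0
        for pos in order:                       # Step 3: visit each viewpoint
            improved = False
            for candidate in oblique_options.get(pos, []):   # Step 2
                trial = dict(current)
                trial[pos] = candidate
                trial_score = score(trial)
                if trial_score > current_score:
                    current, current_score = trial, trial_score
                    improved = True
            stalled = 0 if improved else stalled + 1
            if stalled >= patience:             # stop after repeated stalls
                break
        if current_score > best_score:
            best_assignment, best_score = current, current_score
    return best_assignment, best_score
```

Because each substitution is accepted only when the score increases, the returned assignment never scores below the baseline.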

Viewpoints Subset Refinement
The viewpoints subset refinement performs a local search to better exploit the problem space. The main idea is to adjust each viewpoint v* in V* where F_Q(P, v*) is low. The proposed refinement method is performed based on two operations (as in Figure 3(c)). In the first operation, we resample the viewpoints with quality F_Q(P, v*) less than a pre-defined threshold (i.e., 0.2). The viewpoints are updated if F_Q(P, V*) increases. In the second operation, a ratio (i.e., 25%) of the viewpoints with the lowest quality in V* are selected and incrementally mutated at positions within a defined radius (i.e., 5 m). We update the mutated viewpoint if F_Q(P, V*) increases. Preliminary results showed that this refinement step can improve F_Q(P, V*) by an average of 6-8% in each iteration without increasing the size of V*.

Candidate Viewpoints Resampling
The refined subset V* is near-optimal only if the candidate set covers or partially covers the true optimal viewpoints. However, enumerating every possible candidate in the 3D continuous space is impractical due to the scale and the geometric complexity of the truss structures. Thus, we iteratively resample the candidate viewpoints such that the randomly initialized candidate set eventually converges to the optimal or near-optimal solutions (i.e., covers the optimal viewpoints). In this study, we wrap the resampling procedure into the APSO framework [34]. Compared to the conventional PSO, APSO is selected due to its increased convergence speed and exploitability in solving multimodal optimization problems. Specifically, we define each candidate set (Section 4.1) as a particle and the quality-efficiency metric of the refined subset (Section 4.2) as the fitness. Equation (5) shows the resampling mechanism based on APSO.
where v_pos is the position of a viewpoint and u is the particle velocity at v; t denotes the iteration index, and g is the global best particle (i.e., the viewpoint subset V* with the maximal F_Q(P, V*)). δ_1 and δ_2 (δ_1 = 0.8, δ_2 = 0.5) are coefficients that control the update behavior at each viewpoint. χ is drawn from a standard normal distribution N(0, 1) along each axis of the Euclidean space. ∇(·) is the function that measures the difference between the position of a viewpoint in a particle and the corresponding position in the global best (i.e., viewpoints at the same triangle surface). We set the function to return 0 if the viewpoint does not belong to V*. Notably, Equation (5) only updates/resamples the positions of the viewpoints. The initial and oblique orientations at each updated position need to be recomputed afterward using the same strategy as in Section 4.1.
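Since Equation (5) is not reproduced in this text, the Python sketch below only illustrates one plausible update consistent with the symbol definitions: the velocity receives a Gaussian perturbation scaled by δ_1 and a pull toward the corresponding global-best position scaled by δ_2, and the position then moves by the new velocity. The pairing of δ_1 with the random term and δ_2 with the pull, as well as all function and argument names, are assumptions.

```python
import random

def resample_positions(positions, velocities, global_best, in_best_subset,
                       delta1=0.8, delta2=0.5, rng=None):
    """One assumed APSO-style resampling step over the viewpoints of a particle.

    positions, velocities, global_best: lists of (x, y, z) tuples, aligned by
    triangle surface. in_best_subset[i] is False when viewpoint i has no
    counterpart in V*, in which case the pull term is zero.
    """
    rng = rng or random.Random()
    new_positions, new_velocities = [], []
    for pos, vel, best, selected in zip(positions, velocities,
                                        global_best, in_best_subset):
        u = tuple(
            vel[a]
            + delta1 * rng.gauss(0.0, 1.0)                       # random exploration
            + delta2 * ((best[a] - pos[a]) if selected else 0.0)  # pull to global best
            for a in range(3))
        new_velocities.append(u)
        new_positions.append(tuple(pos[a] + u[a] for a in range(3)))
    return new_positions, new_velocities
```

After such a step, the resampled positions would still need to pass the admissibility constraints and have their orientations regenerated, as noted above.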

Trajectory Planning
This section converts the optimized viewpoints into UAV-executable trajectories (as shown in Figure 6a). The method starts by constructing a complete, undirected graph, with each node indicating the position of a viewpoint and each edge weighted by the length of the collision-free path between the pair of view positions. As shown in Figure 6b-d, the edge distance between each pair of viewpoints is computed in three steps: (1) connect the viewpoints with a straight line and check whether this line collides with the on-site obstacles; (2) if a collision is found, the informed rapidly exploring random tree star (RRT*) [35] is employed to efficiently reroute the path; if the path does not converge within a given number of iterations, we treat the path segment as not accessible and assign a significant penalty to the edge; (3) for each rerouted path, B-spline curve interpolation [36] is applied to smooth the path segment so that the UAV can follow it at the desired speed. The length of the smoothed path is then used as the cost of the edge between the viewpoints. Based on the constructed graph, the trajectory planning problem is then formulated as a capacitated vehicle routing problem (CVRP) [37]. To simplify the problem, we set the vehicle type and capacity as homogeneous, and let the routes start and end at the same spot (i.e., drone departure/landing). Two factors are considered the major capacity constraints of the problem. The first is the UAV battery capacity, which determines how long the UAV can stay in the air. The second is the autopilot limitation, which restricts the maximal number of waypoints that can be uploaded per flight. Many autopilot systems (e.g., DJI) have such constraints for safety reasons. Unlike the battery constraint, the waypoint limit does not require the UAV to land, but it requires the drone to be in proximity to receive the user's input signal for continuing the mission.
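The graph construction can be sketched in Python as a symmetric cost matrix. This is a simplified illustration: straight_line_is_free and reroute_length are hypothetical callables standing in for the collision check and for the informed RRT* rerouting with B-spline smoothing, and the penalty value is arbitrary.

```python
import math

def build_cost_matrix(positions, straight_line_is_free,
                      reroute_length=None, penalty=1e6):
    """Symmetric edge costs between viewpoint positions.

    straight_line_is_free(a, b) -> bool  : hypothetical collision check
    reroute_length(a, b) -> float | None : hypothetical planner returning the
        smoothed detour length, or None when no path is found in time
    """
    n = len(positions)
    cost = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = positions[i], positions[j]
            if straight_line_is_free(a, b):
                length = math.dist(a, b)          # step (1): direct segment
            else:
                # steps (2)-(3): detour length from the planner, if any
                length = reroute_length(a, b) if reroute_length else None
                if length is None:
                    length = penalty              # segment treated as inaccessible
            cost[i][j] = cost[j][i] = length
    return cost
```

A matrix of this shape is the kind of input a CVRP solver consumes, with the depot added as an extra node.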
In this study, we employ the Lin-Kernighan-Helsgaun solver (LKH-3) [38] to solve the problem. The solver utilizes the improved symmetric transformation and a five-opt move generator to efficiently compute the paths while handling the battery/memory constraints. The output is a single route or multiple routes; each route starts and ends at the same spot and travels through a subset of viewpoints under the imposed constraints. The output paths may still contain sharp corners that cannot be tightly followed by the UAV at the desired speed. Under such conditions, the path can be either re-smoothed using the B-spline algorithm or manually checked and adjusted by the operator at the pre-flight stage. While the presented method was initially developed for a single UAV to fly the paths sequentially (with battery replacement), it can be easily extended for multiple UAVs to fly in parallel by adjusting the flight speed of the UAVs at regions where the paths intersect [39].

Implementation Details
In this section, we discuss the implementation details of the proposed method, including the visibility detection for quality evaluation, the procedures of automated flight execution, and the 3D reconstruction pipeline.

Visibility Detection
For each viewpoint, we compute the visible surface points to evaluate the corresponding contribution to the quality metric. The presented visibility detection method considers not only the occlusions, as is mostly done, but also the inherent image triangulation properties. This strategy reduces the computation load and avoids considering poorly matched camera views. Given the camera parameters (as in Table 2), we construct a viewing frustum to simulate the camera FOV at each viewpoint. The visibility detection is performed in three steps:
Step 1. We examine every surface point by checking whether the point is located within the frustum.
Step 2. We cast a ray from the viewpoint to each surface point within the frustum and check whether the ray intersects any truss components. The surface points without intersections are visible from the viewpoint.
Step 3. For each visible point, we measure the incidence angle between the point normal and the camera ray. Only the points with incidence angles smaller than a predefined angular threshold (θ_max) are triangulable by the viewpoint.
Figure 7 illustrates the proposed visibility detection using a single viewpoint and a synthetic truss bridge. In this study, the visibility detection is implemented based on Octree-based collision detection [40] using VTK [41].
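The three steps can be sketched in plain Python as below. This is a simplified stand-in for the VTK-based implementation: the frustum is approximated by a cone of half-angle AOV/2, and the occlusion test is delegated to a hypothetical is_occluded callback instead of an octree ray query.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def angle_deg(a, b):
    """Angle between two non-zero vectors in degrees."""
    c = _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def visible_points(view_pos, view_dir, points, normals, *, aov_deg,
                   theta_max_deg, is_occluded=lambda origin, target: False):
    """Indices of surface points that are triangulable from one viewpoint."""
    visible = []
    for idx, (p, n) in enumerate(zip(points, normals)):
        ray = _sub(p, view_pos)
        if _dot(ray, ray) == 0.0:
            continue
        # Step 1: inside the field of view (cone approximation of the frustum).
        if angle_deg(ray, view_dir) > aov_deg / 2.0:
            continue
        # Step 2: no truss component between the viewpoint and the point.
        if is_occluded(view_pos, p):
            continue
        # Step 3: incidence angle between the point normal and the camera ray.
        if angle_deg(n, _sub(view_pos, p)) > theta_max_deg:
            continue
        visible.append(idx)
    return visible
```

Ordering the cheap field-of-view test before the ray cast mirrors the stated goal of reducing the computation load.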

Flight Execution
Because the flight trajectories were originally computed in the local coordinates, they need to be transformed to the World Geodetic System (WGS84) to be executable by a UAV. To achieve this, several ground control points close to the truss bridge to be inspected are manually surveyed using a GPS receiver. The transformation can then be found by correlating these GPS positions with the corresponding points in the local coordinates. Due to the relatively small scale of most bridges, a rigid body transformation (i.e., assuming the surface is flat) is used to map the local coordinates to WGS84.
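A minimal Python sketch of fitting such a transformation from ground control points is given below. It is a planar (2D) least-squares rigid fit with a closed-form rotation angle, and it assumes the surveyed GPS positions have already been converted to metric east/north coordinates; that conversion, the handling of height, and the function names are assumptions outside the text.

```python
import math

def fit_rigid_2d(local_pts, world_pts):
    """Least-squares rotation angle and translation mapping local -> world."""
    n = len(local_pts)
    if n < 2 or n != len(world_pts):
        raise ValueError("need at least two matched control points")
    lcx = sum(p[0] for p in local_pts) / n
    lcy = sum(p[1] for p in local_pts) / n
    wcx = sum(p[0] for p in world_pts) / n
    wcy = sum(p[1] for p in world_pts) / n
    s_cos = s_sin = 0.0
    for (lx, ly), (wx, wy) in zip(local_pts, world_pts):
        ax, ay = lx - lcx, ly - lcy      # centered local point
        bx, by = wx - wcx, wy - wcy      # centered world point
        s_cos += ax * bx + ay * by
        s_sin += ax * by - ay * bx
    theta = math.atan2(s_sin, s_cos)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated local centroid onto the world centroid.
    t = (wcx - (c * lcx - s * lcy), wcy - (s * lcx + c * lcy))
    return theta, t

def apply_rigid_2d(theta, t, p):
    """Rotate point p by theta and translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])
```

With more than two control points, the residuals after applying the fitted transform give a quick check on survey errors.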
After the transformation, the viewpoints are uploaded into UgCS [42], a ground station software, for automated in-flight waypoint following and image acquisition. The software contains a hardware-in-the-loop simulator that can perform the pre-flight check before the field deployment. In this study, we use the DJI Inspire 1 as the flight platform to execute the missions, and the DJI Zenmuse X3 for the aerial image collection. The Inspire 1 can fly for around 15 min when the wind speed is moderate. For safety, we restrict each flight to at most 80% of the battery capacity (i.e., 12 min). In addition, DJI drones accept at most 99 uploaded waypoints per flight, which is another capacity constraint to be considered in the trajectory planning (Section 5).
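To make the two capacity constraints concrete, the following sketch greedily cuts an already ordered waypoint sequence into flights that respect both the 12 min budget and the 99-waypoint limit. It is only an illustration, not the trajectory planner of Section 5: the constant cruise speed, the fixed per-waypoint capture time, and the omission of transit to and from the take-off point are all assumptions made here.

```python
import math

MAX_WAYPOINTS = 99          # per-flight upload limit noted in the text
MAX_FLIGHT_S = 12 * 60      # 80% of the roughly 15 min endurance

def split_into_flights(waypoints, speed_mps, capture_s,
                       max_wp=MAX_WAYPOINTS, max_time_s=MAX_FLIGHT_S):
    """Greedily partition ordered 3D waypoints into feasible flights."""
    flights, current, elapsed = [], [], 0.0
    for wp in waypoints:
        # Time to reach this waypoint from the previous one, plus capture time.
        leg_s = math.dist(current[-1], wp) / speed_mps if current else 0.0
        cost = leg_s + capture_s
        if current and (len(current) >= max_wp or elapsed + cost > max_time_s):
            flights.append(current)          # close the flight, start a new one
            current, elapsed = [], 0.0
            cost = capture_s                 # new flight begins at this waypoint
        current.append(wp)
        elapsed += cost
    if current:
        flights.append(current)
    return flights
```

Whichever constraint is reached first closes the current flight, so short legs tend to hit the waypoint limit while long legs or slow captures hit the time budget.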

3D Reconstruction
After the flights, the collected aerial images are imported into 3D reconstruction software. In this study, Agisoft Metashape [43] is selected since it has previously been used for the 3D reconstruction of bridge structures with fewer artifacts [1]. When the GPS position of each image is available, reference matching is enabled to accelerate the image alignment process. To obtain a detailed reconstruction, we set the quality of both the image alignment and the dense point cloud to high, and the depth map filtering to aggressive, to actively filter out noise in the final reconstruction.

Experimental Setup
The performance of the proposed method is evaluated on both a synthetic and a real-world truss bridge. Evaluation using the synthetic bridge has the advantage of controlling the environmental factors (e.g., reflection, illumination change, shadows, moving objects, etc.), which are often considered noise in image-based reconstruction. In this study, Unreal Engine 4 (UE4) is selected to render the synthetic environment due to its ability to provide photo-realistic scenes at high levels of detail (LoDs). UnrealCV [44], an open-source computer vision SDK, is employed to render the image at each camera footprint. The selected synthetic truss bridge (as shown in Figure 8a) is a highway bridge across a valley. The bridge truss structure, which was downloaded from the Unreal Marketplace [45], has dimensions of 66.7 m × 16.1 m × 12.0 m. Because the original package contains only the bridge superstructure, we import the deck surface and the surrounding environment to simulate real-world conditions. Because the synthetic environment is noise-free, the fly-through-truss option is enabled (ρ = 1) in the evaluation.
The truss bridge selected for the real-world experiment is an abandoned railway bridge (as shown in Figure 8b). The bridge has dimensions of 40 m × 4.7 m × 8.2 m and was not accessible to people at the time of the inspection due to safety concerns. The inner space of the bridge is insufficient for the UAV to pass through with the required safety tolerance, even without considering the magnetic interference. Thus, the fly-through-truss option is disabled (ρ = 0) in the real-world environment.

Comparison
To the authors' knowledge, there is no reported flight planning method specifically designed for the 3D reconstruction of truss bridges. Thus, a baseline and a state-of-the-art method for building reconstruction are selected to evaluate the performance of the proposed method in the simulated environment. In this study, the Overhead flight, composed of an orbit path with the camera circling the center of the scene followed by a lawnmower path providing bird's-eye views, was selected as the baseline approach. This path can be easily reproduced using commercial flight planning software [42,46,47]. The overlap between adjacent viewpoints is 80% to ensure dense image registration. The NBV method presented in [17] is employed as the state-of-the-art approach. The method incrementally adds the viewpoint with the largest marginal reward from a graph of candidate cameras. The orbit path obtained from the Overhead flight is used to initialize the optimization. Compared to our method, the original implementation of [17] used different strategies and libraries for the space representation (i.e., occupancy map), the collision detection/avoidance, and the visibility detection, which might affect the result. To remove these confounding factors, we implement the NBV with the same implementation strategies as our work, so that the final results are affected only by the optimization algorithms. The method in [17] also limits the viewpoint planning to a single flight (i.e., battery constraints), which might result in an incomplete reconstruction. Thus, in the simulation, we set the UAV flight time as unlimited and leave the evaluation of the trajectory planning to the field experiment.
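The incremental selection used by the NBV baseline can be sketched as a greedy loop. The version below is a simplified illustration rather than the implementation of [17]: it assumes the marginal reward of a candidate is simply the number of surface points it newly covers, and the `coverage` mapping, the function name, and the `budget` parameter are hypothetical.

```python
def greedy_next_best_views(coverage, initial, budget):
    """Greedy viewpoint selection by largest marginal reward.

    coverage: dict mapping a candidate viewpoint id to the set of surface
              point ids it observes (assumed precomputed by visibility tests)
    initial:  viewpoint ids used to initialize the selection (e.g., an orbit)
    budget:   upper bound on the number of selected viewpoints
    """
    selected = list(initial)
    covered = set().union(*(coverage[v] for v in selected))
    while len(selected) < budget:
        best, best_gain = None, 0
        for view, points in coverage.items():
            if view in selected:
                continue
            gain = len(points - covered)     # marginal reward of this candidate
            if gain > best_gain:
                best, best_gain = view, gain
        if best is None:                     # no candidate adds new coverage
            break
        selected.append(best)
        covered |= coverage[best]
    return selected, covered
```

Because the candidate set is fixed in advance, the loop can only pick from the given viewpoints, whereas a continuous optimization is free to move them.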
For the field experiment, we discuss both the reconstruction quality and the efficiency of the in-flight image acquisition and the post-flight image processing. Thus, we select a sweep-based, multi-UAV-supported route planning method [23] as the previous state-of-the-art. The method designs three routes to tightly cover the structure from different perspectives while considering the photogrammetric constraints (i.e., GSD, camera angles, overlapping, etc.). Because only one UAV (i.e., the DJI Inspire 1) was available in the field experiment, the planning adjustment step (for multi-UAV cooperation) was skipped. Since the method does not include a collision avoidance algorithm, a manual check is needed to guarantee the safety of the mission. All the experiments were executed on a desktop PC with an Intel E5-2630 CPU and 64 GB of memory running Ubuntu 18.04.

Quality Evaluation
Evaluation of the reconstruction quality includes a visual and a quantitative comparison. The visual comparison focuses on the texture smoothness and the artifacts in each reconstruction, especially in the complex geometric regions (e.g., truss interiors, connections, and slim beams). The quantitative evaluation measures the geometric fidelity between the reconstruction and the ground truth. The evaluation includes three major steps. First, the reconstructed model is cropped and filtered to contain only the regions covered in the ground truth (i.e., the truss bridge). Second, a coarse-to-fine alignment is used to transform the coordinates of the reconstruction into those of the ground truth. The coarse alignment is performed by a rigid transformation estimated from a set of correspondence points. Starting from the coarse alignment, the fine transformation is computed using iterative closest point (ICP) registration [48]. Because the reconstructed models might contain outliers, random sample consensus (RANSAC) [49] is employed so that the refined transformations are robust to such outliers. Third, the F-Score, as presented in [50], is used to measure the fidelity of the finely aligned model. The F-Score is the harmonic mean of two indicators, Precision and Recall, given a distance threshold. Precision measures the accuracy of the reconstruction by evaluating the error of each reconstructed point with respect to the ground truth. In contrast, Recall evaluates the reconstruction completeness by measuring whether each point in the ground truth is covered by the reconstruction. A high F-Score indicates a reconstruction that has both high model accuracy and completeness. The formulas for the F-Score (F), Precision (P), and Recall (R) are presented in Equation (6). We refer the readers to [50] for the details of the indicators.
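As an illustration of how the three indicators relate, the sketch below computes them for two small point sets with a brute-force nearest-neighbor search. It assumes the threshold-based definitions of [50], where Precision and Recall are the fractions of points whose closest-point distance falls below the threshold; the function names are hypothetical, and a spatial index would be needed for real point clouds.

```python
import math

def nearest_distance(p, cloud):
    """Distance from point p to its closest point in cloud (brute force)."""
    return min(math.dist(p, q) for q in cloud)

def f_score(reconstruction, ground_truth, chi):
    """Return (precision, recall, f) as fractions in [0, 1] for threshold chi."""
    # Precision: share of reconstructed points lying within chi of the ground truth.
    precision = sum(nearest_distance(p, ground_truth) < chi
                    for p in reconstruction) / len(reconstruction)
    # Recall: share of ground-truth points lying within chi of the reconstruction.
    recall = sum(nearest_distance(g, reconstruction) < chi
                 for g in ground_truth) / len(ground_truth)
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

A reconstruction that is accurate but sparse scores high on Precision and low on Recall, so the harmonic mean penalizes it, which is the behavior the evaluation relies on.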
Because each F-Score is computed for a given distance threshold, we report the quantitative evaluation based on the F-Scores across a range of distance thresholds (χ).
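Written out with the symbols used in this section, and following the definitions in [50], Equation (6) can be expressed as shown next. This is a reconstruction under that assumption: the percentage scaling and the indicator-bracket notation are taken from [50] rather than confirmed by the present text.

```latex
\begin{aligned}
e_{k \to G} &= \min_{g \in G} \lVert k - g \rVert = \lVert k - g^{*} \rVert, \\
P(\chi) &= \frac{100}{|K|} \sum_{k \in K} \bigl[\, e_{k \to G} < \chi \,\bigr], \\
R(\chi) &= \frac{100}{|G|} \sum_{g \in G} \bigl[\, e_{g \to K} < \chi \,\bigr], \\
F(\chi) &= \frac{2\, P(\chi)\, R(\chi)}{P(\chi) + R(\chi)},
\end{aligned}
```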
where K and G denote the point sets of the source and the target models, respectively, and e is the error metric that measures the distance from a point in the source model to its closest point (denoted by *) in the target model. For the synthetic bridge, the ground truth model is known, so the F-Score can be computed directly by comparing the reconstruction with the ground truth. For the field experiment, terrestrial laser scanning (TLS) is used to obtain the ground truth model of the bridge. In this study, the Leica BLK360 laser scanner is selected. The scanner achieves millimeter accuracy at distances of less than 10 m, which is sufficient to obtain a high-fidelity 3D model of the truss bridge. Figure 9 shows the 2D view of the TLS-scanned truss bridge and the on-site scanning spots. The 3D model is registered from 27 scans, most of which were conducted on the bridge. The entire survey took around four hours for on-site data collection and another five hours for the offsite data transmission and point cloud registration.
Figure 10 shows the trajectories computed using the Overhead, NBV, and our method (ρ = 0, 1) on the synthetic bridge. To make a fair comparison, we cap the number of NBV viewpoints at the number of images generated with our method, so that both methods generate the same number of images. It is observed that Overhead and NBV only compute viewpoints surrounding the truss geometry. In contrast, our method (ρ = 1) enables the trajectories to pass through the truss (detailed view in Figure 10) and provides better observations of the truss interiors. Table 3 summarizes the flight planning runtime, the number of images, and the flight distance computed with each method. Overhead requires significantly less runtime, fewer images, and a shorter flight distance than the other methods. Compared to the NBV, where the candidate viewpoints are fixed, our method iteratively optimizes the viewpoints in the continuous space at the cost of a longer runtime. In addition, setting ρ to one increases both the runtime and the number of images. However, because the proposed method can be computed offsite at the pre-flight stage, the increased runtime is expected to have minimal effect on the field deployment.
Figure 11 shows the visual comparison of the reconstructions using the different methods, including detailed views of the three challenging areas highlighted in Figure 8a. The results show that our method (ρ = 0) generates more visually appealing results at the vertical/diagonal web members and the surface connections (second and fourth columns in Figure 11) than both the Overhead and the NBV. In addition, the textures at the interiors of the top chords/struts (third column in Figure 11) are only recovered by our method, especially when ρ equals one. Table 4 presents the measured F-Score at varied distance thresholds (χ = 0.05, 0.1, 0.2). The results confirm the visual observations: our method outperforms the other two at every distance threshold. Enabling the fly-through-truss option gives the best result, which indicates that collecting images from inside the truss can indeed improve the overall reconstruction quality.
Figure 12 compares the flight trajectories computed with Zheng et al. [23] and our method. Because the method of Zheng et al. [23] was originally developed for the 3D reconstruction of building structures, it takes more images. Table 5 shows the statistics of both the in-flight inspection and the post-flight reconstruction. Our method takes less time both on-site and offsite.

Synthetic Bridge
Figure 13 presents a detailed comparison between the reconstructed models and the laser-scanned model. Since the TLS model is obtained by scanning the bridge from its interior, it shows different color intensities from the photogrammetric models, whose images are taken from the exterior. To make a fair comparison, the models need to be reconstructed from a similar number of images. Thus, we also generate a reconstruction model with images taken only at routes 1 and 3 in Zheng et al.'s method [23]. The selected routes form flight patterns similar to the Overhead and can therefore represent the typical flight in the real-world experiment. The routes generate a total of 133 images, which is close to ours. Although all the methods recover the major truss structures, the model obtained from routes 1 and 3 presents the worst result in terms of the point cloud density and the recovery of model details (e.g., slim beams, joints, and truss interiors). The low density of the point cloud indicates insufficient coverage of the truss surface. Compared to Zheng et al. [23], our reconstruction has a higher density and preserves more structural details with far fewer artifacts. For example, the holes in the top chord and the boundaries of the diagonal beams are much better recovered by our method (detailed views in the third column in Figure 13). Table 6 presents the F-Score of the truss reconstructions with respect to the TLS model. The results demonstrate that our method slightly outperforms Zheng et al. [23] in terms of both Precision and Recall while using fewer than half of the images, which validates both the efficiency and the effectiveness of the proposed method.

Further Results
In this subsection, we evaluate the effect of several parameters based on the results from both the synthetic and the real-world bridges. First, the effect of the weight coefficient σ on the optimization performance is evaluated. Figure 14 (Left) shows the quality metric F_Q and the number of collected images |V*| under different σ. The figure shows that, in contrast to |V*|, which increases monotonically with σ, F_Q gradually decreases as σ approaches one. A possible explanation is that overly redundant images yield diminishing returns. Because a smaller number of images is preferred for efficient reconstruction, we set the weight coefficient to 0.8 as a good trade-off between the reconstruction quality and the efficiency. Next, we evaluate the effect of the oblique orientations (α, β) on the quality-efficiency metric F_QE. As shown in Figure 14 (Right), compared to the conventional viewpoints (α = 0 || β = 0), using oblique viewpoints significantly improves F_QE for all test cases (for both ρ = 0 and ρ = 1). The result indicates that the oblique orientations indeed increase the reconstruction quality. Among the different combinations of α and β, we found that α = 90° and β = 30° gives the best result. Thus, we select this combination as the default in the experiments.

Conclusions
This paper presents a new flight planning method for autonomous, efficient, and high-quality 3D reconstruction of truss bridges. The synthetic experiment showed that the proposed method outperforms the state-of-the-art NBV method with increased reconstruction accuracy and completeness. Enabling the fly-through-truss option significantly improves the coverage and model quality at the truss interiors, such as the top chord and other web members. The real-world experiment demonstrated that the image capture views computed by the presented method achieve a higher reconstruction quality with fewer than half of the images used by the existing sweep-based method. The planned trajectories ensure the safety of the flights while considering the UAV constraints, enabling automated and efficient bridge inspection practice.
In this study, the authors extracted the input model from Google Maps to guide the viewpoint and trajectory planning. This strategy enables the flight plans to be designed offsite, reducing the survey time when compared to the existing literature. The presented method also accepts other types of 3D models, such as those from aerial photogrammetry, with minimal adjustment. This flexibility potentially enables the method to be performed as an incremental procedure, in which the output reconstruction is used as the input model for flight planning in the next iteration until the users are satisfied with the results.
Future work includes evaluating the proposed method with the fly-through-truss option enabled in real-world experiments, which requires implementing the proposed method on a UAV with an advanced flight control and navigation system. In addition, we currently compute the flight plans offline and assume that the positioning at every viewpoint is accurate. It would be more beneficial to adjust the flight plans in real time to counter the effects of external factors (e.g., GPS error, wind, magnetic interference, dynamic obstacles, etc.) and produce higher-fidelity models.

Data Availability Statement:
The data reported in this article is available from the authors upon request.

Conflicts of Interest:
The authors declare no conflict of interest.