Bilevel Optimization-Based Time-Optimal Path Planning for AUVs

Using the bilevel optimization (BIO) scheme, this paper presents a time-optimal path planner for autonomous underwater vehicles (AUVs) operating in grid-based environments with ocean currents. In this scheme, the upper optimization problem is defined as finding a free-collision channel from a starting point to a destination, which consists of connected grids, and the lower optimization problem is defined as finding an energy-optimal path in the channel generated by the upper level algorithm. The proposed scheme is integrated with ant colony algorithm as the upper level and quantum-behaved particle swarm optimization as the lower level and tested to find an energy-optimal path for AUV navigating through an ocean environment in the presence of obstacles. This arrangement prevents discrete state transitions that constrain a vehicle’s motion to a small set of headings and improves efficiency by the usage of evolutionary algorithms. Simulation results show that the proposed BIO scheme has higher computation efficiency with a slightly lower fitness value than sliding wavefront expansion scheme, which is a grid-based path planner with continuous motion directions.


Introduction
Autonomous underwater vehicles (AUVs) are frequently employed to perform environmental monitoring and exploration tasks, such as surveillance of the dynamics of plankton assemblages, temperature, and salinity profiles, and the onset of harmful algal blooms [1][2][3]. Path planning is of primary importance to the safety and efficient operation of a vehicle in such tasks. AUVs are frequently deployed for long periods and must operate with limited energy. Thus, a path planner should be capable of determining a trajectory that safely guides an AUV from its initial or current position to its destination with minimal energy or time. By selecting an appropriate trajectory, the path planner may enable the AUV to bypass adverse currents, exploit favorable currents and subsequently achieve high speeds while substantially saving energy. Many researchers investigated the AUV path planning problem in recent years. Here, we discuss those contributions that relate directly to our work, and these are grid-based path planning in anisotropic environments. Some of these studies use heuristic search algorithms, such as A* algorithm and the extensions of A* algorithm to the AUV path planning problem. A* method is usually used for obtaining an energy-optimal path for the AUV [4][5][6] and has been proven to be efficient. The use of the EEA* algorithm [7], which is an extension of the A* algorithm, is proposed for the planning of energy efficient paths for a marine surface vehicle when heading angle is considered. However, the above-mentioned approaches are limited by small and discrete sets of possible transitions, which result in the generation of a suboptimal path; in some cases, The remaining parts of this paper are organized as follows: Section 2 outlines the system structure involved in path planning, including path formation, path evaluation, and environment model. Section 3 introduces the bilevel optimization mechanism for path planning. The simulation tests and results generated are presented in Section 4. The concluding remarks are then presented in Section 5.

Problem Statement
The objective of a path planning system is to find an optimal path that leads an AUV traveling through the ocean environment to its destination. The ocean environment is modeled as a strong currents field with fixed and moving obstacles. The optimization criterion for the path planner is set as minimal energy consumption with collision avoidance.

Ocean Field Environment
The information of ocean currents can be obtained from remote observations, particularly through high-frequency radar surface current measurements and satellite observations, or from numerical forecast models. Some ocean current measurement and prediction systems have been found, such as the Regional Ocean Model System (ROMS) [29] with 1 km resolutions. The use of measured or predictive ocean models is primarily intended for large regions.
In practice, the time-optimal path planning problem for AUV operation is generally solved for long-term missions (large regions) with durations of several days and trajectory lengths of hundreds of kilometers. In addition, the AUV's deployment usually is assumed on a horizontal plane, because vertical motions in ocean structure are generally negligible due to large horizontal scales comparing vertical [30]. Hence, in this research, the ocean model based on measured or predictive is used in 2D space. According to the resolution of the measures or the precision of the forecast model, the environment is divided into grids. In each mesh, the ocean current value is characterized by a constant, and the ocean current data are simulated by the superposition of randomly distribution eddy fields. The eddy fields in the global coordinate system are provided by the following expression: eddy{p, size} : where p x and p y are the coordinates of eddy center p on xand y-axes, respectively, and size x and size y determine the current strength in the directions of the xand y-axes, respectively. The sign of size x must be similar to that of size y . Their signs determine the rotation direction (clockwise or anticlockwise) of the eddy field. The model of the current field F is as follows: where rand( ) represents a random function. The current field is the superposition of n eddies.

Obstacle Models
In this study, the uncertainty of the position is modeled with independent normal distribution X o ∼ N(µ o , σ o ). An obstacle with low uncertainty indicates a high probability that the obstacle exists at the location. Conversely, high uncertainty indicates the absence of such an obstacle. Three kinds of obstacle modeling derived from [13,31] are considered for the assimilation of different possibilities of real situations:

1.
Static obstacles with fixed uncertainty: a static obstacle's position, although fixed, have a constant measurement uncertainty incorporated in its position.

2.
Quasi-static obstacles: this group of obstacles is introduced with a fixed-center and an uncertain radius that varies over time.

3.
Moving obstacles: similar to the quasi-static obstacles, the uncertainty of moving obstacles increases with time, whereas their centers will change accordingly with constant speed and direction.
In modeling obstacles with continuous observations, uncertainty is fixed and depends on the accuracy and repeatability of the measuring modalities. However, in the lack of continuous observations, the uncertainty of obstacles increases such that σ o grows linearly with time.

Optimization Criterion
An optimization planning criterion is used to define the evaluation function as described in Section 2.3.2. Based on this evaluation function, a fitness value is measured for each candidate path.

Path Formation
In this study, a path denoted as Γ s,g is a sequence of waypoints between the initial position and a destination, in which each waypoint is located at the edge of the meshes. The waypoints are denoted as x i (i = 1, 2, · · · n), then the path from s to g shown in Figure 1 is 2. Quasi-static obstacles: this group of obstacles is introduced with a fixed-center and an uncertain radius that varies over time. 3. Moving obstacles: similar to the quasi-static obstacles, the uncertainty of moving obstacles increases with time, whereas their centers will change accordingly with constant speed and direction.
In modeling obstacles with continuous observations, uncertainty is fixed and depends on the accuracy and repeatability of the measuring modalities. However, in the lack of continuous observations, the uncertainty of obstacles increases such that o σ grows linearly with time.

Optimization Criterion
An optimization planning criterion is used to define the evaluation function as described in Section 2.3.2. Based on this evaluation function, a fitness value is measured for each candidate path.

Path Formation
In this study, a path denoted as s,g Γ is a sequence of waypoints between the initial position and a destination, in which each waypoint is located at the edge of the meshes. The waypoints are denoted as ( )  , then the path from s to g shown in Figure 1 is Any two adjacent waypoints 1 , i i x x + are assumed to be connected by a straight line segment.
The path planner solves the optimal sequence of waypoints to minimize energy consumption. This formation allows waypoints to be moved on the edges rather than to be fixed at the center or vertex of the mesh. It solves the problem of discrete state transitions and guarantees path optimization. Path formation is shown in Figure 1, where the solid point represents the waypoints, and the dotted line represents the path s,g Γ . Γ S G Figure 1. Example of a path, in which the dotted line represents path s,g Γ .

Path Evaluation
The fitness value of each path is evaluated by measuring energy consumption. This work is primarily concerned with finding the optimum trajectory adjustments to exploit the currents fields and ensure that the vehicle does not collide with the obstacles.

Energy consumption
The energy consumption E along a given path is the sum of i e (energy consumption of ith path segment) required to cover each of the constituent path segments. Any two adjacent waypoints x i , x i+1 are assumed to be connected by a straight line segment. The path planner solves the optimal sequence of waypoints to minimize energy consumption. This formation allows waypoints to be moved on the edges rather than to be fixed at the center or vertex of the mesh. It solves the problem of discrete state transitions and guarantees path optimization. Path formation is shown in Figure 1, where the solid point represents the waypoints, and the dotted line represents the path Γ s,g .

Path Evaluation
The fitness value of each path is evaluated by measuring energy consumption. This work is primarily concerned with finding the optimum trajectory adjustments to exploit the currents fields and ensure that the vehicle does not collide with the obstacles.

Energy consumption
The energy consumption E along a given path is the sum of e i (energy consumption of ith path segment) required to cover each of the constituent path segments.
where P vehicle is the vehicle's thrust power which is proportional to the cube of the vehicle's thrust speed. c d is the drag coefficient based on the vehicle's design.
→ v c and → v g represent the vehicle's thrust velocity and the velocity relative to the seabed, respectively. The vehicle's thrust power and thrust speed are assumed to be constant (when P vehicle is constant, the time-optimal path is equal to the energy-optimal path). The velocity relative to the seabed is obtained by vector synthesis as the following formula.
where → c represents the ocean current vector, which is known either by measurement or forecasting, as described in Section 2.1.
It is worth noting that the operation of AUV will be significantly interfered by ocean currents. This interference limits the motion direction of AUV to a reachable region. The stronger the ocean current is, the smaller the range of the reachable region is. As shown in Figure 2a, the radius of circular O is → v c . According to the velocity composition law, the range of direction of where vehicle P is the vehicle's thrust power which is proportional to the cube of the vehicle's thrust speed. d c is the drag coefficient based on the vehicle's design. c v  and g v  represent the vehicle's thrust velocity and the velocity relative to the seabed, respectively. The vehicle's thrust power and thrust speed are assumed to be constant (when vehicle P is constant, the time-optimal path is equal to the energy-optimal path). The velocity relative to the seabed is obtained by vector synthesis as the following formula.
where c  represents the ocean current vector, which is known either by measurement or forecasting, as described in Section 2.1.
It is worth noting that the operation of AUV will be significantly interfered by ocean currents. This interference limits the motion direction of AUV to a reachable region. The stronger the ocean current is, the smaller the range of the reachable region is. As shown in Figure 2a, the radius of circular O is c v  . According to the velocity composition law, the range of direction of g v  Further, if the intersection of the reachable region and allowed motion directions based on traditional path planning algorithm is empty, the path planner will return no path. Nevertheless, the physically feasible path exists, as shown in Figure 3. This is explained in detail in Reference [8]. The proposed BIO scheme solves this problem with higher computational efficiency than SWE scheme. the physically feasible path exists, as shown in Figure 3. This is explained in detail in Reference [8]. The proposed BIO scheme solves this problem with higher computational efficiency than SWE scheme.

Collision constraint
Any proposed path that passes through the region of uncertainty surrounding any obstacle is discarded as unsuitable. The following expression imposes the collision value: where ( ) The collision constraint is a high priority objective that should be checked first, and its value must be zero. The evaluation function F is defined as

Bilevel Programming Mechanism
Bilevel optimization is a branch of optimization that involves a nested optimization problem within the constraints of an upper level optimization problem [32]. The key difference between bilevel programming problems and other optimization problems are their nested structures.
In solving the problem of path planning for AUV based on grid environment, two steps are performed. The first step is to find a set of connected meshes to constitute the channel, channel x from the initial position to the destination with discrete state transitions for the upper level,while avoiding the mesh that contains the obstacle in the upper level. It is worth noting that the upper level problem is a combinatorial optimization. The second step is to find the exact energy-optimal path, path x that is constrained inside the given channel generated by the upper level in the lower level. The ( channel x , is an energy-optimal path response to channel x , represents a feasible solution to the path planning problem. Note that most grid-based path planners only execute the first step and, thus, obtain incomplete and suboptimal paths. An example of a path generated by the bilevel optimization is shown in Figure 1. The shaded meshes represent a channel, and the dotted line is the optimal path inside this channel. In our study, the 4-connectivity extending model is used in the upper level, which guarantees search space integrity; it excludes the redundant channel to improve the search efficiency. As shown in Figure 4, both ABC and AC represent the channel from the initial position s to the destination g. In the 4-connectivity extending model, only channel ABC can be generated. By contrast, ABC and AC can be both generated in an 8-connectivity extending model, although channel AC is redundant.

Collision constraint
Any proposed path that passes through the region of uncertainty surrounding any obstacle is discarded as unsuitable. The following expression imposes the collision value: where O µ k o , k o is the circular region of the position uncertainty at time k around the obstacle with radius k o and center at µ k o . The collision constraint is a high priority objective that should be checked first, and its value must be zero. The evaluation function F is defined as

Bilevel Programming Mechanism
Bilevel optimization is a branch of optimization that involves a nested optimization problem within the constraints of an upper level optimization problem [32]. The key difference between bilevel programming problems and other optimization problems are their nested structures.
In solving the problem of path planning for AUV based on grid environment, two steps are performed. The first step is to find a set of connected meshes to constitute the channel, x channel from the initial position to the destination with discrete state transitions for the upper level, while avoiding the mesh that contains the obstacle in the upper level. It is worth noting that the upper level problem is a combinatorial optimization. The second step is to find the exact energy-optimal path, x path that is constrained inside the given channel generated by the upper level in the lower level. The (x channel , x * path ) pair, where x * path is an energy-optimal path response to x channel , represents a feasible solution to the path planning problem. Note that most grid-based path planners only execute the first step and, thus, obtain incomplete and suboptimal paths. An example of a path generated by the bilevel optimization is shown in Figure 1. The shaded meshes represent a channel, and the dotted line is the optimal path inside this channel.
In our study, the 4-connectivity extending model is used in the upper level, which guarantees search space integrity; it excludes the redundant channel to improve the search efficiency. As shown in Figure 4, both ABC and AC represent the channel from the initial position s to the destination g. In the 4-connectivity extending model, only channel ABC can be generated. By contrast, ABC and AC can be both generated in an 8-connectivity extending model, although channel AC is redundant. Channel AC contains the unique path {s, O, g} because mesh A and mesh C are connected only through vertex O, which does not satisfy the definition of the channel. The path {s, O, g} can be found in the channel ABC when the two waypoints at the borders of mesh B coincide with point O. Therefore, the 4-connectivity extending model is suitable for generating the channel.  We solve the above problem by using a nested bilevel optimization algorithm, where the upper level problem is solved by the ACA and the lower level problem is solved by the QPSO. ACA is suitable for solving the combinatorial optimization problems while QPSO is suitable for solving the multivariable optimization problems. The process for the algorithm is described as follows: Step 1: Initialization. Choose the appropriate weight coefficient for α (the coefficient that characterizing the importance of pheromone), β (the coefficient that characterizing the importance of the heuristic information), and ρ (pheromone evaporation coefficient). The data, including the number of ants m (equivalent to the number of candidate channels), the current number of iterations c N , the maximum number of iterations max N , the heuristic information matrix H, which is equivalent to the reciprocal of distance between the current position and the destination, and initial pheromone matrix T, are initialized. Then, the ocean field information is inputted.
In the problem of path planning, few meshes receive pheromones in some cases because of the large scale of the problem (a 50 × 50 environment model has 2500 meshes). This condition leads to a serious problem wherein optimization falls into a local solution. So, the values of α and β are adjusted as 20 10 This adjustment is effective for path planning and preventing falling into a local solution.
Step 2: Constructing channels at the upper level. When an ant constructs a feasible channel, it must crawl through a set of connected meshes from an initial position to the destination. A given probability selection formula is then applied to determine the selection probability of each available meshes, where meshes with transfer values are selected based on certain rules. The ant k in mesh i can calculate the probability of visiting mesh j according to Equation (11). The criterion of exploration We solve the above problem by using a nested bilevel optimization algorithm, where the upper level problem is solved by the ACA and the lower level problem is solved by the QPSO. ACA is suitable for solving the combinatorial optimization problems while QPSO is suitable for solving the multivariable optimization problems. The process for the algorithm is described as follows: Step 1: Initialization. Choose the appropriate weight coefficient for α (the coefficient that characterizing the importance of pheromone), β (the coefficient that characterizing the importance of the heuristic information), and ρ (pheromone evaporation coefficient). The data, including the number of ants m (equivalent to the number of candidate channels), the current number of iterations N c , the maximum number of iterations N max , the heuristic information matrix H, which is equivalent to the reciprocal of distance between the current position and the destination, and initial pheromone matrix T, are initialized. Then, the ocean field information is inputted.
In the problem of path planning, few meshes receive pheromones in some cases because of the large scale of the problem (a 50 × 50 environment model has 2500 meshes). This condition leads to a serious problem wherein optimization falls into a local solution. So, the values of α and β are adjusted as where N m is the critical number of iterations. If 0 ≤ N c < N m , the dominant factor in ant routing is regarded as heuristic information because of the relatively small number of pheromones on each mesh. At this stage, the ant finds the destination more easily, and more meshes can get pheromones. Then, the dominant factor in ant routing is changed into pheromone factor when N m ≤ N c ≤ N max . This adjustment is effective for path planning and preventing falling into a local solution.
Step 2: Constructing channels at the upper level. When an ant constructs a feasible channel, it must crawl through a set of connected meshes from an initial position to the destination. A given probability selection formula is then applied to determine the selection probability of each available meshes, where meshes with transfer values are selected based on certain rules. The ant k in mesh i can calculate the probability of visiting mesh j according to Equation (11). The criterion of exploration depends on two terms, one relating to heuristic information and the other relating to the quantity of pheromones deposited by all the ants.
where τ ij is the concentration of pheromones on channel (i, j) which consists of meshes i, j, and h j that represents heuristic information of mesh j. The allowed(i) represents a set of meshes that can be explored and do not contain the obstacle. The tabu list represents the set of meshes that the ant has already passed through. The records in tabu list change as the ants select different channels.
Step 3: Optimization at the lower level. For each of the generated channels at the upper level, preform the lower level optimization to determine the exact energy-optimal path by QPSO, as described in Algorithm 2. The resulting channels of the upper level programming as the constraint of the lower level optimization. The individuals are evaluated based on the energy consumption function Equation (4). Finally, the procedure returns the best value, E * k of the lower level optimization.
Step 4: Releasing pheromones. According to fitness, pheromone is released according to certain proportions. The higher the fitness, the more pheromones are released. The pheromone updating mechanism is represented by the following equation: (12) where ∆τ k ij represents the released pheromone of the ant k on the channel (i, j). The expression is as follows: where Q represents the pheromone increasing coefficient, which is a constant. Equation (13) is the pheromone update calculation method based on the ant-cycle model. This method updates pheromones for the global channel, making it highly efficient and effective.
Step 5: Termination check. A termination check is performed before the next generation (Step 2) if the termination check is false.
The Algorithm 1 shows the simplified upper level algorithm. Algorithm 2 provides an overview of the iterative QPSO algorithm for the lower level optimization. Every particle in the swarm represents a potential path, and the parameters of each particle correspond to the coordinates of the waypoints generating the path. As the QPSO algorithm iterates, each particle is attracted towards its respective local attractor according to the outcome of the particle's individual search, as well as the particle swarm's search results.

Algorithm 2. Lower level algorithm.
Initialization: Choose appropriate parameters for the population size, n, (equivalent to the number of candidate paths), the current number of iterations X c and the maximum number of iterations X max . Set X c = 1.
Generate an initial group of particles with random states representing the candidate paths. Initialize the current state P i (0) and the pbest state P i (0) = P i (0). Main loop: while the terminate condition is not met do Compute the mean best state for each particle i do Evaluate the cost function F(P i (X c )) as defined in Equation (8); Update the pbest state P and the gbest state G; G(X c ) = argmin 1≤i≤n F(P i (X c )) (16) for each dimension of particle j do end for end for Set X c = X c + 1; end while Return G as the optimal fitness value and its correlated path as the optimal solution to the upper level programming.
In algorithm 2, the state of the ith particle is represented as follows P i = p i1 , · · · , p ij , · · · , p ik (21) where p ij represents the position of the waypoint at the boundary of jth mesh and (j + 1)th mesh in ith channel. The dimension k of every particle is determined by the number of meshes M contained in the channel. The relationship between them is The Contraction-Expansion coefficient is represented as τ, which is the only parameter in QPSO algorithm. It can be tuned to control the convergence speed of the algorithms. ϕ i,j is random number distributed uniformly on [0, 1]. ψ i,j is called as the potential well length, which represents the scope of searching of particles.

Simulation Results
The simulation results obtained for the energy-optimal path planning problem through different scenarios are shown. To evaluate the performance of the proposed bilevel optimization scheme, we have selected the SWE algorithm as the benchmark, which is a deterministic algorithm based on continuous optimization technique. The algorithms are implemented by using MATLAB 2014a on an Intel Core i5 processor with a speed of 3.2 GHz × 4 and 8 GB of RAM. (2) QPSO (the lower level): the population size is 50, and the maximum number of iterations is 500.

Simulation Setup
The additional stop criterion of both levels is satisfied when the weighted average change in the fitness function value over a set number of iterations is less than the function tolerance (1 × 10 −5 ), as follows: where l is the number of the current iteration and E is the relevant fitness value.
To better compare, the modified SWE algorithm is used in following simulation experiments, which the projected gradient method (similar to the lower level optimization) in SWE algorithm is replaced by QPSO with the same settings as the BIO.

Simulation Experiments with Different Scenarios
The results of the path optimization with the same currents field are displayed in Figures 5a, 6a and 7a-c. The optimal paths are respectively generated by the BIO scheme and SWE scheme. The solid line located in the channel represents the result of path generated by the BIO, and the dotted line represents the optimal path obtained by the SWE.

Simulation Experiments with Different Scenarios
The results of the path optimization with the same currents field are displayed in Figures 5a, 6a and 7a-c. The optimal paths are respectively generated by the BIO scheme and SWE scheme. The solid line located in the channel represents the result of path generated by the BIO, and the dotted line represents the optimal path obtained by the SWE.

Simulation Experiments with Different Scenarios
The results of the path optimization with the same currents field are displayed in Figures 5a, 6a and 7a-c. The optimal paths are respectively generated by the BIO scheme and SWE scheme. The solid line located in the channel represents the result of path generated by the BIO, and the dotted line represents the optimal path obtained by the SWE.  Figure 5a shows the optimal path in a scenario with no obstacles. Figure 6b displays the result of the optimal path in a scenario containing randomly distributed static obstacles with fixed uncertainty. The position uncertainty of each obstacle is represented as a black circle around the obstacle with radius 2σ o and indicates that the obstacle is located within this area at a confidence of 95.4%. The safe trajectory is achieved when the vehicle maneuver does not have any intersection with the proposed obstacle boundary. Figure 7a-c shows the simulation results of the scenario with quasi-static and moving obstacles. The uncertainty over both the quasi-static and moving obstacles are linearly propagated relative to the updating time. This leads to the radius growth of the static obstacles and simultaneous position and radius changes in the moving obstacles, which is expressed as a proportional increment in the collision boundary encircling the obstacles. The varying uncertainty of the obstacles can be clearly seen in the subsequent Time Step 1-3 of Figure 7.
Evidently, the utilized BIO and SWE path planning methods are capable of generating a collision-free path against the distribution of obstacles and taking advantage of ocean current to minimize the vehicle's energy consumption.  Figure 5a shows the optimal path in a scenario with no obstacles. Figure 6b displays the result of the optimal path in a scenario containing randomly distributed static obstacles with fixed uncertainty. The position uncertainty of each obstacle is represented as a black circle around the obstacle with radius 2 o σ and indicates that the obstacle is located within this area at a confidence of 95.4%. The safe trajectory is achieved when the vehicle maneuver does not have any intersection with the proposed obstacle boundary. Figure 7a-c shows the simulation results of the scenario with quasistatic and moving obstacles. The uncertainty over both the quasi-static and moving obstacles are linearly propagated relative to the updating time. This leads to the radius growth of the static obstacles and simultaneous position and radius changes in the moving obstacles, which is expressed as a proportional increment in the collision boundary encircling the obstacles. The varying uncertainty of the obstacles can be clearly seen in the subsequent Time Step 1-3 of Figure 7.
Evidently, the utilized BIO and SWE path planning methods are capable of generating a collision-free path against the distribution of obstacles and taking advantage of ocean current to minimize the vehicle's energy consumption. Table 1 lists the best fitness value and computation time of the two methods in finding the optimal solution considering the augmented objective function Equation (4). By comparing the simulation results, we have found that the computation time of BIO scheme is significantly less than  Table 1 lists the best fitness value and computation time of the two methods in finding the optimal solution considering the augmented objective function Equation (4). By comparing the simulation results, we have found that the computation time of BIO scheme is significantly less than that of the SWE scheme, and there is a bit of difference (less than 2%) in best fitness value between two schemes. The convergence curves of the upper level of the BIO scheme is shown in Figure 5b, Figure 6b, and Figure 7d, in which the broken lines represent the result of SWE scheme. The algorithm does not preserve the elite member in the upper level, so continuous improvement is not observed. Instead, some humps are contained. These convergence curves show that the fitness values are close to the optimal value given by the SWE algorithm during 30-45 iterations and the stopping criterion is satisfied during 50-75 iterations. It demonstrates that the proposed BIO algorithm provides good performance in terms of convergence. In addition, it can be seen that the iteration count decreases as the complexity of the scenario increases. This is because that the increase in complexity reduces the search space.
In addition, the A*-QPSO scheme (A* as the upper level optimization and QPSO as the lower level optimization) was tried for the outstanding performance of A* in the field of path planning. The five sets of results about SWE, A*-QPSO and BIO are listed in Table 2 in the none obstacle environment. Since the best fitness values of SWE and A*-QPSO are approximately equal (slight differences are caused by the uncertainty of QPSO), only computing time is shown. As can be seen from Table 1, the performance of A*-QPSO is better than SWE scheme but worse than BIO scheme. The reasons are as follows: According to A* algorithm theory, the performance of A* algorithm is largely determined by its heuristic function. The closer the heuristic function is to the actual value, the better the computational efficiency of A* algorithm is. In the field of path planning for a land robot or computer games, the heuristic usually is defined as the distance between current position and goal position. However, in the problem of energy-optimal path planning for AUV, the heuristic function will be adjusted according to the ocean current fields. The reasonable heuristic function is → v c and → c max respectively represent the thrust velocity of vehicle and the largest ocean current. The introduction of the largest ocean current (which is to guarantee h(x i ) < g(x i ), where h(x i ) and g(x i ) represent the heuristic value and actual cost value from x i to goal point, respectively.) reduces the weight of heuristic factors. The guiding effect of heuristic factors is weakened. Therefore, compared with A* algorithm without considering the current field, the advantages of A* are not well represented.
Based on the above reasons, this paper does not use A* approach as the path planning algorithm, and the further simulation for A*-QPSO is not implemented.

Performance Assessment
To evaluate the performance of the proposed algorithm, we have performed the simulations on the basis of randomly generated ocean currents (N = 100) with various obstacles. The settings for each simulation are the same as those described in Section 4. The performance of SWE and BIO are compared according to the following factors: the best fitness value and computation time. The best fitness value and the computation time respectively reflect the searching ability and searching efficiency. Table 3 shows that the BIO scheme generates paths at significantly shorter computation time, which the computation time of BIO scheme is about 1/3 of SWE scheme, and a slightly worse fitness value (less than 5%, which is acceptable because the errors also exist in vehicle navigation and ocean current measurement) than the SWE scheme. The differences in the mean of best fitness value are respectively 3.8%, 2.9%, and 4.3% under the three scenarios. It is worth noting that although the SWE scheme is used as a benchmark algorithm, the best fitness value of BIO scheme is possibly smaller than the value of SWE. This is because that the QPSO as a random searching algorithm has uncertainty. The last column in Table 3 provides information about QPSO calls to count. It shows that SWE took more QPSO evaluations than the BIO scheme and thus the SWE scheme has a long computation time. Note that the QPSO calls count is not equal to the product of the number of iterations and the quantity of ants because some ants may be a failure to find the channel to the destination especially in the initial stage of searching.

Conclusions
The BIO scheme is presented to solve the problem of path planning for AUV. The scheme works by splitting the path planning task into a selection channel and optimization path in the selected channel. The ACA and QPSO are used as the upper level and lower level algorithms, respectively. As indicated by the results obtained at different scenarios, the BIO scheme is capable of finding a collision-free path, while taking advantage of the ocean current to reduce energy consumption. We have compared the results obtained using the BIO scheme with that obtained by the SWE scheme and have found that the BIO scheme considerably improves computation efficiency with an acceptable accuracy.

Conflicts of Interest:
The authors declare no conflict of interest.

BIO
bilevel optimization GA Genetic algorithm SWE The sliding wavefront expansion algorithm ROMS the Regional Ocean Model System QPSO The quantum-behaved particle swarm optimization AUV Autonomous underwater vehicle ACA The ant colony algorithm FM The fast marching algorithm FM* Heuristically guided FM