Autonomous Path Planning of AUV in Large-Scale Complex Marine Environment Based on Swarm Hyper-Heuristic Algorithm

Featured Application: This paper presents a new path planning model that combines global path planning and local path planning for the large-scale complex marine environment. The online learning swarm hyper-heuristic algorithm (SHH) is proposed to solve this model with real-time performance and stability.
Abstract: Autonomous underwater vehicles (AUVs), as an efficient means of underwater exploration, have been used to perform various marine missions. However, limited by the technologies of underwater acoustic communication and intelligent autonomy, even the most advanced current AUVs perform only a limited number of tasks in small-scale areas and known underwater environments. Therefore, this paper proposes a path planning model that combines global path planning and local path planning for the large-scale complex marine environment. More specifically, the B-spline curve is used to represent a smooth path that meets the kinematic constraints of AUVs. After considering various constraints, such as energy/time consumption, the turning radius limitation, the marine environment, and the ocean current, path planning is modeled as a multi-objective optimization problem with the time cost, the curvature cost, the map cost, and the ocean current cost. The swarm hyper-heuristic algorithm (SHH) with online learning ability is proposed to solve this model with real-time performance and stability. The results show that the proposed online learning SHH algorithm has clear advantages in time efficiency, stability, and optimality compared with two traditional heuristic algorithms, particle swarm optimization (PSO) and the firefly algorithm (FFA). The time efficiency of the online learning SHH algorithm improved by at least 20% compared with PSO and FFA.
Figure: the results of the PSO, FFA, and SHH algorithms in the case of only ocean current information. The solid line is the result of PSO, the dash-dot line is the result of FFA, and the dashed line is the result of SHH. All three algorithms planned a bending route because of the influence of ocean currents. The results show that all algorithms can generate local paths that basically conform to the ocean current information in scenario 1.


Introduction
In recent years, many scholars and engineers have been committed to developing various autonomous underwater vehicles (AUVs) because of increasing application demands [1], such as marine environment exploration [2], marine resource monitoring [3], pelagic surveys [4], seabed resource exploitation [5,6], military missions [7], and so forth. With the rapid development of advanced technologies, such as processing capabilities and high-density power supplies, more and more AUVs are now being used to execute high-risk and repetitive tasks instead of people [8]. As roles and missions constantly evolve, having AUVs perform missions in the unknown, large-scale, complex underwater environment in place of people is an inevitable trend.
The navigation layer plans the voyage paths according to the AUV's missions and the environmental information. At present, many available AUVs still need the workflow for a certain task to be downloaded to the control computer in a pre-programmed form before the task starts. During task execution, human intervention may be required at any time to ensure the security of the AUV and the reliability of the task execution. This traditional operating procedure cannot handle large-scale, highly dynamic, and complex marine environments and adapts poorly to new environments, which can easily lead to the loss of the AUV or the failure of the mission. Advanced AUVs are often equipped with expensive sensors, so their price and maintenance costs are relatively high, and the loss of an AUV or a failure to complete its tasks is unacceptable.
Therefore, AUVs must be able to adapt to large-scale, highly dynamic marine environments to ensure their own safety and the accurate completion of their missions. Planning is an effective way to deal with complex environments. Path planning and path re-planning generate the trajectory of the AUV so that navigation is collision-free, safe, and energy-saving, and so that it adapts to dynamic and complex marine environments. Path planning thus supports the autonomy of the AUV, ensuring the security of the AUV and the reliability of task execution. The following is an analysis of the constraints that need to be considered for AUV navigation path planning.
Trajectory dynamics constraints. When an actual AUV travels, it is subject to dynamic constraints, such as the maximum radius of gyration, the maximum speed, the maximum acceleration, etc. The planned path should satisfy these constraints, which requires adding corresponding constraints to the algorithm to limit the space in which the track is generated. In particular, the path generated by the local path planning module should contain no turn whose radius is too small and no other condition that violates the dynamic constraints.
Obstacle and current constraints. The path planned by the path planning algorithm needs to meet the collision-free constraint. In addition, the motion of the AUV should take into account the influence of the water flow. A tangential water flow will cause the AUV to deviate from the original path and result in motion errors, and a reverse flow will increase energy loss. Therefore, the path planning algorithm should, as far as possible, plan a path that avoids collisions and satisfies the water flow constraints.

Modeling of Path Planning
The conventional path planning method mainly relies on discretizing the environment and using A* and other search algorithms to obtain the path between the start point and the end point. The path obtained in this way is the absolute shortest path at the resolution of the environment map. However, with a low-resolution map the path is not necessarily smooth, and with a high-resolution map the search consumes much time. It is difficult for AUVs to track a non-smooth path because of their body structure design. In recent years, many scholars have begun to study path planning methods that satisfy the requirements of smoothness and shortest length simultaneously. In this section, the A* algorithm is first used to find the global path, and the local path planning is then modeled on the basis of the global path, considering the time, turning radius, obstacle, and ocean current constraints. A detailed description of the mathematical definition of the problem and of the optimization criteria for the local path planning is provided.

Global Path Planning
The global path planning in the navigation layer generates a global network node path for AUV navigation based on the task execution sequence generated by the decision layer. This task execution sequence has three disadvantages for the subsequent path planning. Firstly, the generation procedure of the task sequence does not consider the influence of the environment in detail. As a result, the planned path can be longer than the straight-line distance between two task points. Secondly, the distances between two task points are usually too long, especially for scattered tasks in the large-scale marine environment. This leads to a long computing time for the optimal path and works against real-time planning and re-planning. Thirdly, the distances between pairs of task points differ greatly, which makes real-time path planning and path re-planning difficult.
Therefore, in the global path planning stage, this study expresses the task execution sequence as a task network graph $G_T = (V_T, E_T)$. $V_T$ is the task node set of the task network graph (including one start point, one goal point, and many task points), and $E_T$ is the edge set of the task network graph. Some discrete intermediate points with roughly uniform spacing are randomly added to the task node set $V_T$. Combined with the task node set $V_T$, they form a path node set $V_P$ and a path network graph $G_P = (V_P, E_P)$. The new edge set $E_P$ is obtained by trying to connect the added nodes with straight lines. If the straight line connecting two nodes passes through land or a danger area, the A* algorithm is used to search for the shortest path in the path network graph. The A* algorithm does not need much time to explore the shortest path in this path network graph, since it considers no constraints other than the environment. Nevertheless, it obtains a discrete path with the shortest distance. The distance between two adjacent path points is approximately the same, which benefits real-time path planning and path re-planning. In this way, the global path planning overcomes the above three disadvantages of the task execution sequence generated by the decision layer. As shown in Figure 2, the left picture is the task execution sequence generated by the task planning algorithm, and the right picture is the refined global path formed by discrete path points with uniform spacing.
Figure 2. The global path planning results: (a) the task execution sequence after task planning, and (b) the refined global path after global path planning.
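The search over the path network graph described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `a_star`, the dictionary-based graph layout, and the use of the Euclidean distance as both edge length and heuristic are assumptions. Only links that do not cross land or danger areas are assumed to be present in `edges`.

```python
import heapq
import math


def a_star(nodes, edges, start, goal):
    """A* search on a path network graph (hypothetical data layout).

    nodes: dict mapping node id -> (x, y) position
    edges: dict mapping node id -> list of neighbour ids; only
           collision-free straight links are assumed to be listed
    Returns (list of node ids, path length), or (None, inf) if unreachable.
    """
    def dist(a, b):
        return math.dist(nodes[a], nodes[b])

    open_heap = [(dist(start, goal), start)]   # entries are (f = g + h, node)
    g = {start: 0.0}                           # best known cost from start
    parent = {start: None}
    closed = set()

    while open_heap:
        _, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = []
            while cur is not None:             # walk back to the start
                path.append(cur)
                cur = parent[cur]
            return path[::-1], g[goal]
        if cur in closed:
            continue
        closed.add(cur)
        for nxt in edges.get(cur, []):
            cand = g[cur] + dist(cur, nxt)
            if cand < g.get(nxt, math.inf):
                g[nxt] = cand
                parent[nxt] = cur
                heapq.heappush(open_heap, (cand + dist(nxt, goal), nxt))
    return None, math.inf


# Toy graph: the direct link S-D is assumed to cross land, so it is absent.
nodes = {"S": (0.0, 0.0), "A": (2.0, 2.0), "B": (2.0, -1.0), "D": (4.0, 0.0)}
edges = {"S": ["A", "B"], "A": ["S", "D"], "B": ["S", "D"], "D": ["A", "B"]}
path, length = a_star(nodes, edges, "S", "D")
```

Because the Euclidean heuristic never overestimates the remaining distance, the returned path is the shortest one in the graph; here the detour through `B` is shorter than the one through `A`.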

Mathematical Definition of the Problem
The sailing of the AUV is easily affected by ocean currents because of its small inertia. On the one hand, the ocean current interferes with the navigation of the AUV and causes it to drift. On the other hand, the rational use of ocean currents can help the AUV reduce energy consumption. Consider two extreme cases. If the AUV's navigation direction is consistent with the ocean current direction, the ocean current does positive work on the AUV and assists its sailing. If the AUV's navigation direction is opposite to the ocean current, the ocean current does negative work on the AUV and hinders its sailing. Therefore, if the ocean current information of the known environment is considered in the local path planning stage, the influence of ocean currents can be exploited to some extent to assist the AUV navigation.
The ocean current information is modeled as a two-dimensional ocean map, which represents the seawater flow information. This simplification is reasonable because the variation in the vertical direction is small relative to the horizontal scale, and the movement of the water in the vertical direction is much smaller than the movement in the horizontal direction (because of the rotation of the earth). Under these two assumptions, the ocean dynamics model in the horizontal plane is described by the two-dimensional Navier-Stokes equation [35], Equation (1), where $\vec{V} = (V_x, V_y)$ is the velocity field, $\nu$ is the fluid viscosity, $\omega = \partial V_y/\partial x - \partial V_x/\partial y$ is the vorticity, and $\nabla$ and $\Delta$ are the gradient operator and the Laplace operator, respectively. However, this model is computationally intensive and cannot meet real-time requirements. In practice, the solution of the Navier-Stokes equation can be approximated as a superposition of single-point vortices, namely viscous Lamb vortices. In the mathematical expression of the Lamb vortex, Equation (2), $\vec{r}_0$ is the center position vector of the vortex, and $\Gamma$ and $\delta$ are the intensity and radius parameters of the vortex. The physical model represented by Equation (2) is used to analyze the velocity field of the surrounding water flow environment [35].
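To illustrate how a current field can be assembled from single-point vortices, the sketch below evaluates a commonly quoted form of the viscous Lamb vortex, $V_x = -\Gamma \frac{y - y_0}{2\pi |\vec{r} - \vec{r}_0|^2}\left[1 - e^{-|\vec{r} - \vec{r}_0|^2/\delta^2}\right]$ and $V_y = \Gamma \frac{x - x_0}{2\pi |\vec{r} - \vec{r}_0|^2}\left[1 - e^{-|\vec{r} - \vec{r}_0|^2/\delta^2}\right]$. This form is an assumption taken from the general literature on Lamb vortices, since the expression itself is not reproduced in the text; the function names are hypothetical.

```python
import math


def lamb_vortex_velocity(x, y, x0, y0, gamma, delta):
    """Velocity (vx, vy) induced at (x, y) by one Lamb vortex.

    Assumed form: tangential flow around the centre (x0, y0) with
    intensity gamma and radius parameter delta.
    """
    dx, dy = x - x0, y - y0
    r2 = dx * dx + dy * dy
    if r2 == 0.0:
        return 0.0, 0.0                      # the velocity vanishes at the centre
    factor = gamma / (2.0 * math.pi * r2) * (1.0 - math.exp(-r2 / delta ** 2))
    return -factor * dy, factor * dx


def current_field(x, y, vortices):
    """Superpose several vortices given as (x0, y0, gamma, delta) tuples."""
    vx = vy = 0.0
    for x0, y0, gamma, delta in vortices:
        ux, uy = lamb_vortex_velocity(x, y, x0, y0, gamma, delta)
        vx += ux
        vy += uy
    return vx, vy


# One counter-clockwise vortex at the origin, sampled at (1, 0).
vx, vy = lamb_vortex_velocity(1.0, 0.0, 0.0, 0.0, gamma=2.0, delta=1.0)
```

The induced velocity is always perpendicular to the radius from the vortex center, which is the property used when vortex centers are estimated from measured velocities.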
The position, radius, and velocity of the Lamb vortices in practice are estimated from data obtained by the horizontal acoustic Doppler current profiler (H-ADCP). The number of Lamb vortices and the centers of the vortices are estimated as follows. Assuming that the velocity measured by the H-ADCP is the tangential velocity produced by the nearest vortex, the direction perpendicular to this velocity is the radial direction of the vortex, so the center of the vortex can be obtained from adjacent velocity measurements. Figure 3 shows a schematic diagram of a water flow Lamb vortex.
Since the AUV rarely changes its depth when navigating in the large-scale marine environment, only the two-dimensional path of the AUV is considered in this paper. The B-spline curve is used to represent the two-dimensional local path of the AUV. Compared with the Bezier curve, the number of control points of the B-spline curve is independent of the order of the curve, and the local characteristics of the curve can be adjusted by moving the nearby control points. This explains why it is increasingly applied in robot path representation. Let the control points be $P_i$, with the index $i$ running up to $n$, let the curve order be $K$, and let $B_{i,K}(t)$ be the basis function of the B-spline curve. Then an AUV path can be expressed in the form of Equation (3):
$$\wp(t) = \sum_{i} P_i \, B_{i,K}(t).$$
Therefore, the local path planning problem of the AUV can be described as finding the control points of the B-spline curve such that the resulting path is collision-free, the whole path satisfies the minimum turning radius constraint, and the path direction follows the direction of the ocean current as much as possible. This problem can be modeled as an optimization problem under multiple constraints and solved with intelligent algorithms.
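The B-spline representation above can be made concrete with a short evaluation routine. The sketch below uses the Cox-de Boor recursion with a clamped uniform knot vector; the knot choice, the use of the degree (order minus one) as the parameter, and the function names are assumptions made for illustration, not details given in the text.

```python
def clamped_knots(n_ctrl, degree):
    """Clamped uniform knot vector on [0, 1] (an assumed knot choice)."""
    inner = n_ctrl - degree - 1
    return ([0.0] * (degree + 1)
            + [(j + 1) / (inner + 1) for j in range(inner)]
            + [1.0] * (degree + 1))


def basis(i, k, t, knots):
    """Cox-de Boor recursion: value of the i-th basis function of degree k."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = right = 0.0
    d1 = knots[i + k] - knots[i]
    if d1 > 0:
        left = (t - knots[i]) / d1 * basis(i, k - 1, t, knots)
    d2 = knots[i + k + 1] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + k + 1] - t) / d2 * basis(i + 1, k - 1, t, knots)
    return left + right


def bspline_point(ctrl, degree, t):
    """Point on the curve: sum of control points weighted by the basis."""
    if t >= 1.0:
        return ctrl[-1]                      # a clamped curve ends at the last point
    knots = clamped_knots(len(ctrl), degree)
    weights = [basis(i, degree, t, knots) for i in range(len(ctrl))]
    x = sum(w * p[0] for w, p in zip(weights, ctrl))
    y = sum(w * p[1] for w, p in zip(weights, ctrl))
    return (x, y)


def sample_path(ctrl, degree=3, h=50):
    """Discretize the curve into h points at uniform parameter values."""
    return [bspline_point(ctrl, degree, j / (h - 1)) for j in range(h)]


ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
path = sample_path(ctrl, degree=3, h=11)
```

With a clamped knot vector the curve starts at the first control point and ends at the last one, so the start and goal of a local path are matched exactly; moving an interior control point only changes the curve near that point.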

Optimization Criteria
The optimization cost functions are functions of the B-spline path. Given a curve path, the cost function calculates a comprehensive cost value that combines the time cost, the path curvature, the environment map, and the ocean current. A discrete B-spline path $\wp$ can be expressed as the set of discrete points shown in Equation (4).
In Equation (4), $h$ is the number of discrete points on the path; $X_i$ and $Y_i$ are the position of the AUV at the $i$-th discrete point; and $\psi_i$ is the direction of the AUV at the $i$-th discrete point, also called the yaw angle, which is calculated using Equation (5).
The time cost is an optimization criterion that requires the shortest sailing time, and it is usually proportional to energy consumption. The time cost is obtained by dividing the accumulated length of the path by the average velocity, as shown in Equation (6).
In Equation (6), $h$ is the total number of points in the discrete path, and $v$ is the average velocity. The time cost is used to constrain the traveling time. Minimizing the time cost yields the path with the shortest distance among those that satisfy the other conditions.
The curvature cost is obtained by accumulating the approximate curvature at each discrete point on the path, as shown in Equation (7).
The curvature cost is used to constrain the spatial curvature of the path, that is, the turning radius (curvature radius) along the path. The curvature and the turning radius are inversely proportional. A feasible path should not have a turning radius smaller than the minimum turning radius, and a path that satisfies the AUV turning radius constraint can be obtained by minimizing the curvature cost.
The map cost is obtained from the map information at each discrete point on the path, as shown in Equation (8). If the point lies in a danger area of the map (land, island, etc.), the map cost for that point is 1, otherwise 0. Minimizing the map cost keeps the path from entering the danger areas of the map and ensures that the path is collision-free.
The ocean current cost is an optimization criterion obtained by accumulating the dot product of the ocean current vector and the path direction vector at each discrete point on the path, as shown in Equation (9). In Equation (9), the path direction enters as a unit vector at the point $\wp_i^{x,y}$, and $\theta(\cdot)$ is the angle between the two vectors.
Path planning is a multi-objective optimization problem. The optimal path evaluation criteria include the shortest path time $T_{cost}$, the smallest cumulative curvature $\rho_{cost}$, the map cost $M_{cost}$, and the ocean current cost $O_{cost}$. The multi-objective optimization problem can be converted into a single-objective optimization problem by summing the four weighted cost values:
$$Cost = \omega_1 T_{cost} + \omega_2 \rho_{cost} + \omega_3 M_{cost} + \omega_4 O_{cost},$$
where $\omega_1, \omega_2, \omega_3, \omega_4 \in \mathbb{R}$ are the weights of the respective cost values. The values of the weights are determined by the expert system and the application demands.
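The four criteria and their weighted sum can be sketched for a discrete path as follows. The exact expressions of Equations (5) to (9) are not reproduced in the text, so the specific forms used here are assumptions: the yaw angle is taken from consecutive points with `atan2`, the curvature is approximated by the heading change divided by the local segment length, and the ocean current cost is the negated dot product so that sailing with the current lowers the cost. The callbacks `is_danger` and `current_at` are hypothetical.

```python
import math


def path_costs(points, v, is_danger, current_at):
    """Return (T_cost, rho_cost, M_cost, O_cost) for a discrete path.

    points:     list of (x, y) positions along the path
    v:          average velocity of the AUV
    is_danger:  hypothetical callback (x, y) -> bool for land or danger areas
    current_at: hypothetical callback (x, y) -> (cx, cy) current vector
    """
    seg_len, yaw = [], []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg_len.append(math.hypot(x1 - x0, y1 - y0))
        yaw.append(math.atan2(y1 - y0, x1 - x0))   # assumed yaw definition

    t_cost = sum(seg_len) / v                      # accumulated length / speed

    rho_cost = 0.0
    for j in range(1, len(yaw)):
        dpsi = yaw[j] - yaw[j - 1]
        dpsi = math.atan2(math.sin(dpsi), math.cos(dpsi))   # wrap to [-pi, pi]
        ds = 0.5 * (seg_len[j] + seg_len[j - 1])
        if ds > 0:
            rho_cost += abs(dpsi) / ds             # assumed curvature approximation

    m_cost = sum(1 for x, y in points if is_danger(x, y))

    o_cost = 0.0
    for (x, y), psi in zip(points, yaw):
        cx, cy = current_at(x, y)
        # Assumed sign convention: a favourable current reduces the cost.
        o_cost -= cx * math.cos(psi) + cy * math.sin(psi)

    return t_cost, rho_cost, m_cost, o_cost


def total_cost(costs, weights):
    """Weighted sum of the four cost values."""
    return sum(w * c for w, c in zip(weights, costs))


straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
costs = path_costs(straight, v=2.0,
                   is_danger=lambda x, y: False,
                   current_at=lambda x, y: (1.0, 0.0))
cost = total_cost(costs, (1.0, 1.0, 1.0, 1.0))
```

In practice the weights would be tuned so that the map cost dominates, since a single point inside a danger area should outweigh any saving in time or energy.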

Swarm Hyper-heuristic Algorithm
As shown in Figure 4, the hyper-heuristic algorithm is an advanced type of intelligent computing method that appeared in the early 2000s. It provides a high-level strategy (HLS) that manipulates or manages a set of low-level heuristics (LLH) to improve the searching efficiency and performance. The hyper-heuristic algorithm works at a higher level of abstraction than the meta-heuristic algorithm: it selects which low-level heuristic to use at a particular moment according to the state of the problem solving [36]. The swarm hyper-heuristic algorithm searches for a good way of solving the problem rather than directly for the best solution of the problem, which makes it independent of the specific problem domain or background.
Swarm intelligence, derived from the observation of insect swarms in nature, is the behavior exhibited by swarm organisms through cooperation. Swarm intelligence is an increasing focus in the field of metaheuristic algorithms, and its efficiency and effectiveness in solving complex problems have drawn much attention in recent years. However, some scholars believe that many of these algorithms lack novelty: some have similar operations and differ only in their names and in the natural organism they simulate. The performance of a metaheuristic algorithm depends on how it balances two basic search mechanisms, intensification and diversification. Intensification makes the algorithm perform a detailed local search, and diversification prevents the solution from entering a local optimum prematurely. These two basic mechanisms include a variety of group operations. For example, in the firefly algorithm, the movement of fireflies according to light intensity and brightness is intensification, and the random movement of fireflies is diversification.
Therefore, Tilahun et al. [37] proposed a swarm hyper-heuristic algorithm framework that integrates the swarm meta-heuristic algorithm with a general hyper-heuristic framework, in which the updating operators are treated as low-level heuristics and guided by a high-level hyper-heuristic.
Different learning methods are used to determine the intensified and diversified behavior of the algorithm. Depending on whether and how learning is used, the framework can be divided into three categories: no learning (SHH1), offline learning (SHH2), and online learning (SHH3). According to the conclusion of [37], the online learning method SHH3 performed better than the other methods.
In a swarm-based meta-heuristic algorithm, the individuals in the swarm interact with each other through different update operators and thereby update themselves. The efficiency and effectiveness of swarm hyper-heuristic algorithms are determined by two steps, heuristic selection and heuristic generation. Heuristic selection is the method for choosing the appropriate heuristic at each iteration, while heuristic generation is the procedure that generates the various heuristics [36].

Heuristic Generation
Heuristic generation is the procedure that generates various heuristics. A heuristic, usually called an operator, is an operation that takes a solution as input and outputs a new solution after one iteration. If an operator is written as $O(\cdot)$, the new solution $x_{t+1}$ is the image of the input $x_t$ under the operator:
$$x_{t+1} = O(x_t).$$
To a certain extent, the performance of the algorithm depends on these heuristic operators. Many scholars have carried out related research and proposed new heuristic operators for heuristic algorithms. Swarm intelligent operators mimic the different swarming behaviors of different swarm organisms. Some outstanding examples are the foraging behavior of ants (ant colony optimization) or flies (fruit fly optimization algorithm), and the gathering behavior of fireflies (firefly algorithm) or masses (binary gravitational search algorithm). However, some heuristic operators have similar operations and differ only in their names and in the natural organism they simulate. The widely used operators are summarized in Table 1. There may be more types of updating operators in the future, but the swarm hyper-heuristic algorithm framework does not depend on the specific type of operator, and it is convenient to add new types of operators. The operators form an operator set, denoted as $O = \{O_1, O_2, \ldots, O_k\}$. Different operators in this set have different searching characteristics, increasing either the degree of diversification or the degree of intensification. A diversified operator moves the search away from the explored neighborhood, whereas an intensified operator accelerates the convergence to the promising area.
Table 1. Some widely and recently used operators.

Operator Name | Operator Formula | Note to Explain
Random move in the neighborhood | $x_i := x_i + \lambda_{min} \cdot rand \cdot u$ | $\lambda_{min}$ is an intensification step length
Following better solutions | |
Following own best | | the best performance of the solution $x_i$ in history
Random long jump | $x_i := x_i + \lambda_{max} \cdot u$ | $\lambda_{max}$ is a diversification step length
Mutation | $x_i := m(x_i)$ | $m$ is a variation function
Run away from the worst | | $x_w$ is the worst solution in population
Run away from worse solution | |
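A few of the operators in Table 1 can be sketched as plain functions that map a solution to a new solution. The two random moves follow the formulas in the table, with $u$ taken to be a random unit vector (an assumption). The formulas for the "following" and "run away" operators are not given in the table, so the forms used for `follow` and `run_away` below are assumptions chosen only to illustrate the idea.

```python
import random


def unit_vector(dim, rng):
    """Random direction u with unit length (assumed meaning of u)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = sum(c * c for c in v) ** 0.5 or 1.0
    return [c / norm for c in v]


def random_move(x, lam_min, rng):
    """Intensification: x := x + lam_min * rand * u."""
    u = unit_vector(len(x), rng)
    step = lam_min * rng.random()
    return [xi + step * ui for xi, ui in zip(x, u)]


def random_long_jump(x, lam_max, rng):
    """Diversification: x := x + lam_max * u."""
    u = unit_vector(len(x), rng)
    return [xi + lam_max * ui for xi, ui in zip(x, u)]


def follow(x, target, rng):
    """Move towards a better solution or the own best (assumed form)."""
    r = rng.random()
    return [xi + r * (ti - xi) for xi, ti in zip(x, target)]


def run_away(x, worse, rng):
    """Move away from a worse or the worst solution (assumed form)."""
    r = rng.random()
    return [xi + r * (xi - wi) for xi, wi in zip(x, worse)]


rng = random.Random(0)
jumped = random_long_jump([0.0, 0.0], 2.0, rng)
nudged = random_move([0.0, 0.0], 0.5, rng)
```

Because every operator has the same signature in spirit (a solution in, a solution out), new operators can be appended to the operator set without changing the selection logic.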

Heuristic Selection
Heuristic selection is the procedure of using a high-level strategy (HLS) to manipulate or manage a set of low-level heuristics (LLH). The performance of the swarm hyper-heuristic algorithm depends mainly on how it balances the two basic search mechanisms of intensification and diversification. Two issues arise: how to evaluate the intensification and diversification performance after one iteration, and what strategy to use to select the operator according to that performance. The SHH without learning gives each operator an equal selection probability in every iteration, whereas the online learning SHH is able to re-evaluate the selection probabilities of the operators before or after an iteration. In this paper, the online learning SHH was chosen to improve adaptability. The intensification and diversification of the algorithm are measured by a certain metric, and the selection probabilities of the intensified and diversified operators are then adjusted so that intensification and diversification are better balanced.
The degree of intensification is determined by the value of the best cost function before and after one iteration. For a minimization problem, the algorithm is more intensified in this iteration if the best cost value after the iteration is lower than the best cost value before it. The degree of diversification can be measured by the sum of the distances between the solutions and the central solution, as shown in the following equation:
$$d_t = \sum_{i} \lVert x_i^t - \bar{x}^t \rVert,$$
where $\bar{x}^t$ is the central solution of the swarm at iteration $t$. The algorithm is less diversified in this iteration if $d_{t-1} > d_t$.
Therefore, for a minimization problem, the relationship between intensification and diversification after one iteration is shown in Figure 5. In the iterative process, both the degree of diversification and the degree of intensification should be maintained at an adequate level. If a decreasing trend of diversification is detected, the selection probability of the diversified operators should be appropriately increased.
The degree of intensification or diversification can be changed by modifying the selection probability of each operator. To increase the degree of diversification of the search behavior, the probabilities of all diversified operators are modified by $P_t(o_i) = \gamma P_{t-1}(o_i)$, where the probability modification factor is $\gamma > 1$. Then the selection probability vector is normalized by $P := P / \sum_i P(o_i)$.
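The online adjustment of the selection probabilities can be sketched as below. This is a simplified illustration under stated assumptions: the central solution is taken to be the mean of the swarm, the distance is Euclidean, only the rule described above (boost the diversified operators when diversification drops) is implemented, and the operator is drawn by roulette-wheel selection. The function names and the default value of `gamma` are hypothetical.

```python
import random


def diversity(swarm):
    """Sum of distances between each solution and the central solution.

    The central solution is assumed to be the mean of the swarm.
    """
    dim = len(swarm[0])
    centre = [sum(x[d] for x in swarm) / len(swarm) for d in range(dim)]
    return sum(
        sum((x[d] - centre[d]) ** 2 for d in range(dim)) ** 0.5
        for x in swarm
    )


def more_intensified(best_prev, best_now):
    """For a minimization problem: the best cost decreased in this iteration."""
    return best_now < best_prev


def update_probabilities(probs, is_diversified, d_prev, d_now, gamma=1.2):
    """Scale the diversified operators by gamma > 1 when diversity dropped."""
    new = list(probs)
    if d_now < d_prev:
        new = [p * gamma if div else p for p, div in zip(new, is_diversified)]
    total = sum(new)
    return [p / total for p in new]          # normalize so the sum is 1


def select_operator(probs, rng):
    """Roulette-wheel choice of an operator index."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = [0.25, 0.25, 0.25, 0.25]
flags = [False, False, True, True]           # last two operators diversify
d_prev = diversity([[0.0, 0.0], [4.0, 0.0]])
d_now = diversity([[1.0, 0.0], [3.0, 0.0]])
probs = update_probabilities(probs, flags, d_prev, d_now, gamma=2.0)
chosen = select_operator(probs, random.Random(0))
```

In the example the swarm has contracted, so the two diversified operators end up with twice the selection probability of the intensified ones after normalization.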

Initialization of B-spline Curve
In this paper, the path planning method uses a B-spline curve to represent the path of the AUV. An individual is the set of variable control points of one B-spline curve. Let the swarm size be $i_{max}$ and the dimension of each individual be $M$; then an individual in the swarm can be represented as a set of control points.
Because the path generally should not twist or intersect itself, the control points are generally ordered from the start point to the goal point. Therefore, in the initialization phase, equidistant points are generated on the line connecting the start point and the goal point, a perpendicular (vertical line) to this line segment is drawn through each equidistant point, and a random point is generated on each vertical line. In this way, the entire initial swarm is obtained. The random point is calculated by Equation (14) from the end point of the vertical line to the left of the bisector, the distance dist between the start point S and the goal point D, the slope of the vertical line (slope), and a random value rand ∈ [−0.5, 0.5]. The swarm obtained by initialization is shown in Figure 6a.
The fixed control points are used to make the start and end directions of the local path curve the same as the start and end directions of the AUV. They are added as follows: starting from the start point S, a fixed control point $p_1$ is added a short distance along the starting direction of the AUV; starting from the end point D, a fixed control point $p_2$ is added a short distance along the direction opposite to the end direction of the AUV.
Here, $S, D \in \mathbb{R}^2$ are the start point and the goal point, respectively, $\theta_1$ is the starting direction, and $\theta_2$ is the end direction. Figure 6b shows the schematic diagram of the fixed control points.
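The initialization described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Equation (14): the random offset along each perpendicular is assumed to be rand times the S-D distance, the parameter `offset` stands for the unspecified "short distance" of the fixed control points, and all function names are hypothetical.

```python
import math
import random


def fixed_control_points(S, theta1, D, theta2, offset):
    """p1 lies a short distance ahead of S along theta1; p2 lies behind D along theta2."""
    p1 = (S[0] + offset * math.cos(theta1), S[1] + offset * math.sin(theta1))
    p2 = (D[0] - offset * math.cos(theta2), D[1] - offset * math.sin(theta2))
    return p1, p2


def random_individual(S, D, n_variable, rng):
    """One individual: a random point on each perpendicular through equidistant points of S-D."""
    dx, dy = D[0] - S[0], D[1] - S[1]
    dist = math.hypot(dx, dy)
    nx, ny = -dy / dist, dx / dist  # unit normal of the S-D line
    points = []
    for k in range(1, n_variable + 1):
        t = k / (n_variable + 1)
        base = (S[0] + t * dx, S[1] + t * dy)  # equidistant point on S-D
        r = rng.uniform(-0.5, 0.5)             # rand in [-0.5, 0.5]
        # Assumption: the offset along the perpendicular is r * dist.
        points.append((base[0] + r * dist * nx, base[1] + r * dist * ny))
    return points


def init_swarm(S, D, n_variable, swarm_size, seed=0):
    """Build the initial swarm of variable control points."""
    rng = random.Random(seed)
    return [random_individual(S, D, n_variable, rng) for _ in range(swarm_size)]


p1, p2 = fixed_control_points((0.0, 0.0), 0.0, (10.0, 0.0), 0.0, offset=1.0)
swarm = init_swarm((0.0, 0.0), (10.0, 0.0), n_variable=3, swarm_size=100)
print(p1, p2, swarm[0])
```

Ordering the variable control points by their position along the S-D line keeps the initial curves from folding back on themselves, which matches the motivation given for the initialization.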

Procedure of Optimization
The online learning swarm hyper-heuristic algorithm was used to solve the optimization problem in formula (10). The pseudocode of path planning with the online learning SHH algorithm is shown in Algorithm 1. The method first sets the algorithm parameters, such as the number of iterations $t_{max}$, the probability modification factor $\gamma$, the swarm size $i_{max}$, and the parameters of the B-spline curve. The set of basic operators $O$ is determined according to the equations in Table 1. Each operator $O_i$ has an equal probability of being selected in the first iteration, which determines the initial value of the selection probability vector $P$. The swarm is then initialized: each individual $x_i$ is represented by a set of variable control points of the B-spline curve, and its initial cost value is calculated using the cost function in formula (10). The swarm is then iterated until the number of iterations reaches $t_{max}$, and the optimal individual (the optimal set of control points) is used to generate the optimal local path. In each iteration, one operator is selected from the set of basic operators for each individual in turn. The operator is applied to the individual and the individual's cost value is updated, until all individuals in the swarm have completed one operation. After each iteration, the degree of diversification is evaluated by Equation (12), and the selection probabilities of the basic operators are updated according to the four cases shown in Figure 5, so that intensification and diversification are better balanced in the next iteration.

Algorithm 1: Pseudocode of swarm hyper-heuristic algorithm for path planning
Input: S, θ 1 , D, θ 2 , i max , t max
Output: Path
1: Initialize: set the parameters of the algorithm and the B-spline curve;
2: Initialize the population basic operation set O and its selection probability vector P;
3: Initialize the population according to formulas (13) and (14);
4: for t = 1, 2, ..., t max do
5:     for i = 1, 2, ..., i max do
6:         Select the basic operation O i corresponding to the maximum value in the population basic operation selection probability vector;
7:         Operate on individual i using the operation O i ;
8:         Update the value of individual i according to Equation (10);
9:     end for
10:    Calculate the degree of diversification d t according to Equation (12);
11:    Evaluate the degree of intensification and diversification of this iteration according to Figure 5, and update the operation selection probability vector P t ;
12: end for
13: Output the optimal Path;
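The overall loop of Algorithm 1 can be sketched in Python as below. Several parts are assumptions made only for illustration: a sphere function stands in for the cost of formula (10), two hypothetical operators stand in for the operator set of Table 1, the operator is drawn by roulette-wheel sampling from P, and the update rule is a simplified stand-in for the four cases of Figure 5.

```python
import math
import random


def sphere(x):
    """Stand-in cost function (assumption); the paper's cost is formula (10)."""
    return sum(v * v for v in x)


def random_move(x, best, rng):
    """Hypothetical diversifying operator: small random perturbation."""
    return [v + rng.uniform(-1.0, 1.0) for v in x]


def follow_best(x, best, rng):
    """Hypothetical intensifying operator: move part of the way toward the best individual."""
    return [v + 0.4 * (b - v) for v, b in zip(x, best)]


OPERATORS = [random_move, follow_best]
DIVERSIFYING = {0}  # index of the diversifying operator
INTENSIFYING = {1}  # index of the intensifying operator


def diversification(swarm):
    """Sum of distances to the swarm mean (assumed central solution)."""
    center = [sum(col) / len(swarm) for col in zip(*swarm)]
    return sum(math.dist(x, center) for x in swarm)


def boost(probs, indices, gamma):
    """Scale the selected probabilities by gamma and renormalize."""
    scaled = [p * gamma if i in indices else p for i, p in enumerate(probs)]
    total = sum(scaled)
    return [p / total for p in scaled]


def shh(cost, dim, swarm_size=30, t_max=80, gamma=1.1, seed=0):
    rng = random.Random(seed)
    swarm = [[rng.uniform(-10.0, 10.0) for _ in range(dim)] for _ in range(swarm_size)]
    costs = [cost(x) for x in swarm]
    probs = [1.0 / len(OPERATORS)] * len(OPERATORS)  # equal probability at the start
    best_idx = min(range(swarm_size), key=costs.__getitem__)
    best, best_cost = list(swarm[best_idx]), costs[best_idx]
    d_prev = diversification(swarm)
    for _ in range(t_max):
        prev_best_cost = best_cost
        for i in range(swarm_size):
            op = rng.choices(OPERATORS, weights=probs)[0]  # roulette-wheel selection
            candidate = op(swarm[i], best, rng)
            swarm[i], costs[i] = candidate, cost(candidate)
            if costs[i] < best_cost:
                best, best_cost = list(candidate), costs[i]
        d_curr = diversification(swarm)
        if d_curr < d_prev:               # diversification is shrinking
            probs = boost(probs, DIVERSIFYING, gamma)
        if best_cost >= prev_best_cost:   # no improvement of the best cost
            probs = boost(probs, INTENSIFYING, gamma)
        d_prev = d_curr
    return best, best_cost, probs


best, best_cost, probs = shh(sphere, dim=3)
print(best_cost, probs)
```

The sketch keeps the best individual found so far separately from the swarm, so the returned cost never gets worse even though each individual always accepts the result of its operator.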

Results and Discussion
The simulations were run on an Intel i7-8700K processor at 3.7 GHz under Ubuntu 16.04, and the algorithm was implemented in C++. The global path planning phase simply generates the discrete path points and obtains the shortest sequence of path points with the A* algorithm, all of which can be completed in a relatively short time. The local path planning uses the B-spline curve to represent the path. Combined with the time, map, curvature, and ocean current constraints, the local path planning problem is modeled as the nonlinear optimization problem with multiple constraints shown in formula (10), and the online learning SHH is adopted to solve it. In this section, three scenarios with different marine environments were simulated to test and analyze the effectiveness of the model and algorithm: one with only ocean current information, one with only danger area information, and one with both. General metaheuristic algorithms, such as particle swarm optimization (PSO) [27] and the firefly algorithm (FFA) [17], have been used for the path planning problem, and the FFA was shown in [17] to perform better than other algorithms. In this paper, the results of the online learning SHH were therefore compared with those of the PSO and FFA.
The number of B-spline control points was set to 7, of which 3 were variable control points, and the curve order was set to 3. According to the parameter design method of [37], the number of iterations of the SHH algorithm was set to 80 and the swarm size was set to 100, with $\lambda_{max} = 40$ for the random long-distance jump and mutation operations, $\lambda_{min} = 20$ for the random movement, and $\lambda = 0.4$ for the follow-up operations. Following [27], the inertia weight $\omega$, the personal influence parameter $c_1$, and the social influence parameter $c_2$ of the PSO were set to 0.7298, 1.496, and 1.496, respectively. Following [17], the number of FFA iterations was set to 80, the number of fireflies was set to 100, the attraction factor was $\gamma_{FFA} = 0.05$, and the random moving factor was $\alpha = 3$. The number of iterations and the swarm size were set to the same values for the PSO, FFA, and SHH so that the performance of the three algorithms could be compared under the same conditions.
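To show how 7 control points define one smooth path, the following Python sketch evaluates a B-spline with de Boor's algorithm. It rests on assumptions that the text does not confirm: "order 3" is read as polynomial degree 3, the knot vector is taken to be clamped and uniform, and the control point coordinates are made up for illustration.

```python
def clamped_knots(n_ctrl, degree):
    """Clamped uniform knot vector on [0, 1] (assumed knot layout)."""
    n_inner = n_ctrl - degree - 1
    inner = [(j + 1) / (n_inner + 1) for j in range(n_inner)]
    return [0.0] * (degree + 1) + inner + [1.0] * (degree + 1)


def de_boor(u, ctrl, degree, knots):
    """Evaluate the B-spline at parameter u in [0, 1] with de Boor's algorithm."""
    n = len(ctrl)
    # Find the knot span k with knots[k] <= u < knots[k + 1]; u == 1 uses the last span.
    k = degree
    while k < n - 1 and u >= knots[k + 1]:
        k += 1
    d = [list(ctrl[j + k - degree]) for j in range(degree + 1)]
    for r in range(1, degree + 1):
        for j in range(degree, r - 1, -1):
            i = j + k - degree
            alpha = (u - knots[i]) / (knots[i + degree - r + 1] - knots[i])
            d[j] = [(1.0 - alpha) * a + alpha * b for a, b in zip(d[j - 1], d[j])]
    return d[degree]


def sample_path(ctrl, degree=3, n_samples=50):
    """Sample n_samples + 1 points along the curve, including both end points."""
    knots = clamped_knots(len(ctrl), degree)
    return [de_boor(s / n_samples, ctrl, degree, knots) for s in range(n_samples + 1)]


# S, p1, three variable control points, p2, D (illustrative coordinates).
ctrl = [(0.0, 0.0), (1.0, 0.0), (2.5, 2.0), (5.0, -1.0), (7.5, 1.5), (9.0, 0.0), (10.0, 0.0)]
path = sample_path(ctrl)
print(path[0], path[-1])
```

With a clamped knot vector, the curve starts at the first control point tangent to the segment toward the second one and ends at the last control point tangent to the segment from the second-to-last one, which is consistent with the role of the two fixed control points.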
It is worth noting that in the actual program implementation, the operation set contained only the first 6 operators and not the last 3. When the runaway operators were included, the convergence of the algorithm dropped sharply, and the algorithm failed to converge in many test runs. These operators were therefore not suitable for solving the optimization problem of the path planning model.

Scenario 1: Only Ocean Current Information
As mentioned above, the ocean current has a significant influence on the navigation of AUVs, affecting navigation accuracy and energy consumption. Therefore, if the planned path can go in the direction of the ocean current, the ocean current will be used to improve the navigation performance of the AUV. In this part, this study analyzed the planning effect of the SHH algorithm for the scenario that only contained ocean current information. The results of the SHH were compared with the path planning results using two heuristic algorithms, PSO and FFA.
The solid line in Figure 7 is the planning result of the SHH algorithm without considering the influence of the ocean current. Since there was no danger area, the planning result was a straight line, which is the shortest path. The dashed line is the result after considering the ocean current information. The planned path bends visibly under the influence of the ocean current. While remaining smooth, the planned path follows the direction of the ocean current as closely as possible, which shows that the ocean current constraint in the optimization cost function influenced the planning result.

Figure 8 compares the results of the PSO, FFA and SHH algorithms in the case of only ocean current information. The solid line is the result of the PSO, the dash-dot line is the result of the FFA, and the dashed line is the result of the SHH. All three algorithms planned a bending route because of the influence of the ocean current. The results show that all three algorithms can generate local paths that basically conform to the ocean current information in scenario 1.

Table 2 compares the results of the PSO, FFA, and SHH in the scenario containing only ocean current information. The data in the table are the average values obtained from 200 runs under the same conditions. The computing time is the interval from entering the algorithm iteration to its completion. The average computing times of the PSO, FFA and SHH were 2.2815 s, 1.1834 s, and 0.6214 s, respectively. The SHH algorithm had the shortest average computing time because the swarm operations of the three algorithms are different. The only FFA operation moves a darker individual toward a brighter one, which requires traversing all other individuals for each individual operated on and is therefore time-consuming. In the SHH, only the operations that follow a better individual need to traverse all other individuals; the other operations can be done in constant time, which explains the high computing efficiency of the SHH algorithm. The final total cost values of the PSO, FFA, and SHH were 998.8257, 701.4505, and 624.668, respectively. Because the three algorithms evaluated paths with the same cost function, these values are directly comparable, and the SHH algorithm obtained the best cost value. This suggests that the SHH algorithm, which uses a set of multiple operations, has an advantage in solving the proposed path planning problem. In addition, although the path obtained by the SHH is longer, it is smoother and better suited to the ocean current constraints.

Scenario 2: Only Danger Area Information
This section tests the operational effects of the three algorithms in the scenario containing only danger area information (scenario 2). Figure 9 compares the results of the PSO, FFA and SHH algorithms in scenario 2. The cost functions in scenario 2 are time, danger area, and curvature. The path should be as short as possible while remaining smooth and collision-free. Table 3 shows the statistics of the PSO, FFA and SHH algorithms in scenario 2 over 200 runs under the same conditions. In terms of computing time, the SHH again had the greatest advantage, with an average computing time of 0.6217 s, while the average computing times of the PSO and FFA were 2.4215 s and 1.2700 s, respectively. The final average total cost of the SHH was 306.4266, while those of the PSO and FFA were 588.3817 and 322.0653, respectively. This indicates that the SHH algorithm achieved the best optimization performance for the proposed model in scenario 2. The time cost and curvature cost of the SHH algorithm were also better than those obtained by the PSO and FFA. In addition, the map cost value in the cost function was 0, which indicates that the planned paths did not collide with the obstacles in the environment.

Scenario 3: Both Ocean Current and Danger Area Information
This part compares and analyzes the operational effects of the three algorithms in scenario 3, which contains both ocean current information and danger area information. Figure 10 and Table 4 compare the results of the PSO, FFA and SHH algorithms in scenario 3. In terms of computing time, the SHH algorithm was the fastest, with an average time of 0.6241 s, which was 0.1992 s faster than the FFA. The total cost of the final path was optimized to 935.7844 by the FFA and to 893.2590 by the SHH. The SHH had significant advantages in terms of time cost and ocean current cost, but the FFA had a smaller curvature cost. The PSO had the worst performance in terms of computing time, final total cost, and curvature cost. This shows that the path obtained by the SHH algorithm is shorter and more in line with the ocean current constraint.

Stability Analysis
In addition to the analysis of the results and average data of the three scenarios, the stability and distribution of the results were also statistically analyzed. This subsection compares the running time stability and the result stability of the three algorithms. From Figure 11, it can be seen that the SHH algorithm has the most stable and most concentrated distribution of running time among the three algorithms. The results indicate that the SHH algorithm has good computing time stability in solving the path planning model proposed in this paper. The time efficiency of the online learning SHH algorithm improved by at least 20% compared with those of the PSO and FFA.

In the cost stability analysis graphs, the left graph is the running result of the PSO. The optimal cost was mainly concentrated in the range of 1027 to 1101, and the median was approximately 1030. There were some abnormal points above the concentration range, with a maximum of approximately 1195. The middle graph is the running result of the FFA. The optimal cost was mainly concentrated between 910 and 965, and the median was around 935. A small number of abnormal points appeared above the concentration range, with a maximum of approximately 995. The right graph is the running result of the SHH. The optimal cost was mainly concentrated between 860 and 930, and the median was around 890.
A small number of abnormal points appeared both above and below the concentration range; the smallest was near 840 and the largest was near 980. Comparing the three cost stability analysis graphs shows that the cost value distributions of the SHH and FFA span ranges of similar width, that is, the final cost stability of the two algorithms is equivalent. However, the overall cost value of the FFA exceeded that of the SHH by approximately 50, indicating that the SHH algorithm had the best search capability in solving the path planning model proposed in this paper.
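The statement that time efficiency improved by at least 20% can be checked against the average computing times quoted for the three scenarios. The sketch below assumes that the improvement is measured as the relative reduction in average computing time, which is one plausible reading; the FFA time in scenario 3 is derived from the reported 0.1992 s gap, and no PSO time is quoted in the text for that scenario.

```python
# Average computing times in seconds, as quoted in the text.
times = {
    "scenario 1": {"PSO": 2.2815, "FFA": 1.1834, "SHH": 0.6214},
    "scenario 2": {"PSO": 2.4215, "FFA": 1.2700, "SHH": 0.6217},
    # Scenario 3: only the SHH time and its gap to the FFA are quoted.
    "scenario 3": {"FFA": 0.6241 + 0.1992, "SHH": 0.6241},
}


def reduction(baseline, shh):
    """Relative reduction in computing time of the SHH against a baseline."""
    return (baseline - shh) / baseline


reductions = {
    (scenario, name): reduction(t, values["SHH"])
    for scenario, values in times.items()
    for name, t in values.items()
    if name != "SHH"
}

for key, value in sorted(reductions.items()):
    print(key, f"{100.0 * value:.1f}%")
print("smallest reduction:", f"{100.0 * min(reductions.values()):.1f}%")
```

Under this reading, the smallest reduction is the one against the FFA in scenario 3, roughly 24%, which is consistent with the "at least 20%" statement.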

Conclusions
In this paper, a unique model was developed that combines global path planning with local path planning to meet the requirements of the large-scale complex environment. The global path planning only performs the homogenization operation on the task sequence route, so that the scale of the subsequent local path planning problem stays within a relatively stable range. The local path planning was modeled as a nonlinear optimization problem with multiple constraints, including the time, danger area, curvature, and ocean current constraints. The online learning SHH was proposed to handle the complexity of the path planning model. This paper analyzed the results in three scenarios. Moreover, the comparison of the PSO, FFA, and SHH algorithms showed that the online learning SHH algorithm had significant advantages in solving the proposed path planning model in terms of computing time, optimization ability and stability.

Patents
The Chinese invention patent CN201910209731.X, titled "AUV path planning in an ocean current environment based on swarm hyper-heuristic algorithms", results from the work reported in this manuscript.
