Article

COLREGs-Compliant Multi-Ship Collision Avoidance Based on Multi-Agent Reinforcement Learning Technique

Navigation College, Dalian Maritime University, Dalian 116026, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2022, 10(10), 1431; https://doi.org/10.3390/jmse10101431
Submission received: 8 August 2022 / Revised: 30 September 2022 / Accepted: 30 September 2022 / Published: 4 October 2022
(This article belongs to the Section Ocean Engineering)

Abstract

Congested waterways can easily give rise to traffic hazards. Moreover, the data show that the majority of collisions at sea are caused by human error and by failure to comply with the Convention on the International Regulations for Preventing Collisions at Sea (COLREGs). To address this, automatic ship collision avoidance has become one of the most important research issues in the field of marine engineering. In this study, an efficient method is proposed to solve multi-ship collision avoidance problems based on a multi-agent reinforcement learning (MARL) algorithm. Firstly, the COLREGs and ship maneuverability are considered for achieving multi-ship collision avoidance. Subsequently, the Optimal Reciprocal Collision Avoidance (ORCA) algorithm is utilized to detect and reduce the risk of collision: ships operate at the safe velocity computed by ORCA to avoid collisions. Finally, the Nomoto three-degrees-of-freedom (3-DOF) model is used to simulate the maneuvers of ships. On this basis, the state space, action space and reward function are designed and improved. To validate the effectiveness of the method, various simulation scenarios with thorough performance evaluations are designed. The simulation results indicate that the proposed method is flexible and scalable in solving multi-ship collision avoidance in compliance with COLREGs across various scenarios.

1. Introduction

The continuous increase in the number of maritime transport vessels is making waterways more congested, a situation that can cause serious traffic hazards. When many ships are in close proximity, relying solely on human operators to control a ship easily leads to wrong decisions. According to the data, about 89–96% of collisions at sea are caused by human error [1]. To avoid this situation, automatic ship collision avoidance has become one of the most important research issues in the field of marine engineering. However, due to the complexity of ship motion models and the low accuracy of control, most algorithms cannot meet the requirements. Artificial intelligence (AI) is currently the most applicable technology for solving this problem [2]. Deep reinforcement learning (DRL) is a new research hotspot in AI and has made great progress in both theory and applications; a notable example is the Go-playing agent ‘AlphaGo’, created by the Google DeepMind team, which beat top professional Go players [3]. DRL has also made substantial breakthroughs in decision-making and control [4]. It consists of two parts: deep learning and reinforcement learning. Deep learning has a strong perceptual ability and is widely used in image analysis [5], speech recognition [6] and other fields. Reinforcement learning, first proposed by Sutton in 1984 [7], is known for its decision-making ability: it uses a reward and punishment mechanism, gains experience from the environment, adjusts its strategy through repeated training to adapt to the environment, and ultimately achieves the desired result. The task of autonomous ship collision avoidance involves interactions with the environment and other objects, and many scenarios involve interactions between multiple ships; these mutual interactions make the problem considerably more complex.
This paper proposes a COLREGs-compliant multi-ship collision avoidance method based on a multi-agent reinforcement learning algorithm, CA-QMIX. A 3-DOF ship model is utilized, and the Optimal Reciprocal Collision Avoidance algorithm is used to detect the risk of collision and provide a safe velocity. Simulation studies validate the effectiveness of the algorithm on multi-ship collision avoidance problems.

2. Literature Review and Motivation

The automatic collision avoidance technology of ships is key to guaranteeing the safety of navigation, and the related theories and technologies have gradually improved in recent years. Miele et al. [8] proposed a method based on the multi-subarc sequential gradient-restoration algorithm to solve two cases of the collision avoidance problem: two ships moving along the same rectilinear course, and two ships on orthogonal courses. Phanthong et al. [9] described path replanning techniques and proposed an algorithm based on the A* algorithm to avoid stationary and dynamic obstacles with an optimal trajectory. Cheng et al. [10] proposed an optimization method based on a genetic algorithm, which was applied to avoid collision and to seek the trajectory. However, these methods do not comply with the COLREGs, which cannot be ignored for ocean-going ships.
Methods for COLREGs-compliant collision avoidance have been proposed for multiple ships in the open sea. Wilson et al. [11] proposed a new navigation method called the line-of-sight counteraction navigation algorithm (LOSCAN). The algorithm aids maneuver decision-making for two-ship collision avoidance in compliance with COLREGs; however, it is not capable of dealing with multi-ship collision avoidance. Liang et al. [12] proposed the minimum course alteration algorithm (MCA) to avoid moving ships or obstacles under COLREGs constraints; the simulation results showed that the algorithm was credible in collision avoidance. Chen et al. [13] designed an intelligent collision avoidance control system that integrated collision avoidance navigation with nonlinear optimal control methods; two fuzzy indicators, collision risk and collision avoidance acting timing, were developed to avoid collision. Johansen et al. [14] described a concept for a ship collision avoidance system based on model predictive control, in which COLREGs and the collision hazards associated with each alternative control behavior are evaluated on a finite prediction horizon. Hu et al. [15] designed a multi-objective optimization algorithm incorporating a hierarchical sorting rule that prioritizes the objective of course or speed change preference over other objectives such as path length and path smoothness. All of these methods can complete the two-ship collision avoidance task in compliance with COLREGs. However, in more complex scenarios, such as four ships converging at the same time, these methods cannot achieve collision avoidance navigation.
With the development of artificial intelligence, a number of collision avoidance methods based on deep reinforcement learning (DRL) have been developed. Shen et al. [16] presented a DRL-based training method for ship collision avoidance that incorporated ship maneuverability, human experience and COLREGs. Experimental validation with three self-propelled ships demonstrated that the DRL-based method has great potential to realize automatic collision avoidance. Sawada et al. [17] proposed a multi-ship automatic collision avoidance method based on DRL in a continuous action space, using the obstacle zone by target to compute the risk of collision; the trained agent passed a large number of simulation scenarios. Li et al. [18] utilized the artificial potential field (APF) algorithm to improve the action space and reward function of DRL and trained agents to avoid collision in compliance with COLREGs; the simulation results showed that the improved DRL could realize automatic collision avoidance. Zhao et al. [19] proposed a method that used a Deep Neural Network (DNN) to map the states of encountered ships to the own ship’s steering commands in terms of rudder angle; a policy-gradient-based DRL algorithm was used to train the DNN for COLREGs-compliant collision avoidance, and the simulation results indicated that the multi-ship model was able to avoid collision. Xu et al. [20] formulated the collision avoidance strategy and designed the state, action, reward function and network structure to improve the DDPG algorithm; the results showed that the method can give reasonable collision avoidance actions and realize effective collision avoidance. The advantages and disadvantages of the reviewed methods are summarized in Table 1.
In this study, a novel intelligent method based on multiple agent reinforcement learning, named the CA-QMIX algorithm is proposed. The COLREGs and ship maneuverability are considered for achieving multi-ship automatic collision avoidance. The Optimal Reciprocal Collision Avoidance (ORCA) algorithm is used to detect and reduce the risk of collision. The safe velocity computed by the ORCA is adopted to avoid collision. This study also utilizes the three-degrees-of-freedom (3-DOF) Nomoto ship motion mathematical model to simulate the maneuvers of a ship. Finally, the state space, action space and reward functions are designed for improving the convergence rate of training. The simulation results indicated that the proposed method has excellent flexibility and scalability for solving multi-ship collision avoidance complying with COLREGs in various scenarios.
The organization of this paper is as follows. Section 3 first presents the method for detecting the risk of collision, then illustrates the ship motion model and the COLREGs, which form the basis of ship collision avoidance. Section 4 describes the principles and applications of the multi-agent reinforcement learning algorithm. Section 5 presents the simulation results for multi-ship collision avoidance. Section 6 concludes the paper.

3. Ship Collision Avoidance Problem

3.1. Problem Definition

The solution to the multi-ship collision avoidance problem can be roughly divided into the following two categories:
  • Single-agent collision avoidance: The own-ship (OS) is considered as an agent, the target-ship (TS) is seen as a dynamic obstacle;
  • Multi-agent collision avoidance: Each ship is an agent, and there are partnerships between them.
In this study, we aim to successfully complete multi-ship collision avoidance and reach the target points. The quality of collision avoidance behavior depends not only on the own-ship (OS) actions but also on the target-ship (TS) actions. Moreover, as communication networks (5G or 6G) gradually cover the world, interaction between ships will become more convenient and the advantages of the multi-agent approach more obvious. Hence, this paper formulates multi-ship collision avoidance as a multi-agent problem and uses a multi-agent reinforcement learning algorithm to solve it.

3.2. The Ship Motion Model and Collision Detection

Establishing a suitable ship motion mathematical model is necessary before using the algorithm. This paper uses the Nomoto three-degrees-of-freedom (3-DOF) model [21] and the principal dimensions of the ship form [22]. The coordinate systems are shown in Figure 1, and the principal dimensions of the ship are given in Table 2.
Here the ship’s velocity vector consists of the surge velocity $u_v$, the sway velocity $v_v$ and the yaw rate $r_v$. $\psi$ denotes the heading angle and $\psi_d$ the desired heading angle; hence the heading-angle error is $\psi_e = \psi - \psi_d$. The rudder characteristics are expressed as:
$$\begin{bmatrix} \dot{\psi} \\ \dot{r} \\ \dot{\delta} \end{bmatrix} = \begin{bmatrix} r \\ (K\delta - r)/T \\ (\delta_E - \delta)/T_E \end{bmatrix}$$
where $\delta$ and $\delta_E$ are the actual rudder angle and the commanded rudder angle, respectively; $K$ and $T$ are the Nomoto gain and time constant; and $T_E$ is the time constant of the steering gear.
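To make the dynamics concrete, the following minimal sketch integrates Equation (1) with forward Euler, using the K and T indices of Table 2. The steering-gear constant T_E and the step size are illustrative assumptions, since the paper does not report them.

```python
import numpy as np

# First-order Nomoto yaw dynamics with a rudder servo, integrated with
# forward Euler. K and T are the indices from Table 2; T_E (steering-gear
# time constant) and DT (step size) are assumed illustrative values.
K, T, T_E = -0.085, 4.2, 2.5   # T_E = 2.5 s is an assumption
DT = 0.1                       # integration step (s), assumed

def nomoto_step(psi, r, delta, delta_cmd, dt=DT):
    """One Euler step of Equation (1): [psi_dot, r_dot, delta_dot]."""
    psi_dot = r
    r_dot = (K * delta - r) / T
    delta_dot = (delta_cmd - delta) / T_E
    return psi + psi_dot * dt, r + r_dot * dt, delta + delta_dot * dt

# Example: respond to a constant 10-degree rudder command for 60 s.
psi, r, delta = 0.0, 0.0, 0.0
for _ in range(600):
    psi, r, delta = nomoto_step(psi, r, delta, np.deg2rad(10.0))
print(f"heading change after 60 s: {np.rad2deg(psi):.1f} deg")
```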
To reduce collision risk, this paper performs collision detection using the Optimal Reciprocal Collision Avoidance (ORCA) method [23]. A schematic diagram of ORCA is given in Figure 2. In addition, the ship domain concept is used to calculate the collision risk area and to define a safe area. To increase safety, the ship domain is further expanded into a circle (taking $d_5$ as its diameter [24]):
$$d_5 = L_v^{1.26} + 30\,v_v + u_v$$
As shown in Figure 2a, a hexagonal collision risk area is created first; its smallest circumscribed circle then forms the final collision risk area. The multi-ship collision avoidance problem can thus be simplified to the collision avoidance of circular areas with radii $R_{OS}$ and $R_{TS}$. As Figure 2 shows, $P_{OS}$ and $P_{TS}$ denote the locations of the OS and TS, respectively. The movement of these areas is, however, still constrained by the ship motion model.
In the velocity coordinate system of Figure 3a, assume first that the TS is stationary. If the OS is not to collide with the TS during its movement, its velocity must not be selected from the velocity obstacle $VO^{t_s}_{OS|TS}$ (the gray region in Figure 3a). The definition of the velocity obstacle implies that if $V_{OS} - V_{TS} \in VO^{t_s}_{OS|TS}$, or equivalently $V_{TS} - V_{OS} \in VO^{t_s}_{TS|OS}$, then the OS and TS will collide at some moment before time $t_s$ (one time step).
$$VO^{t_s}_{OS|TS} = \left\{ v \;\middle|\; \exists\, t \in [0, t_s] : t v \in D\!\left(P_{TS} - P_{OS},\; R_{TS} + R_{OS}\right) \right\}$$
where $D(P_{TS} - P_{OS}, R_{TS} + R_{OS})$ is the disc with center $P_{TS} - P_{OS}$ and radius $R_{TS} + R_{OS}$. Geometrically, $VO^{t_s}_{OS|TS}$ is a truncated cone with its apex at the origin and its two sides tangent to this disc; the cone is truncated by the arc with center $(P_{TS} - P_{OS})/t_s$ and radius $(R_{TS} + R_{OS})/t_s$. In other words, the OS cannot keep its current velocity inside this region for the duration $t_s$, otherwise it will collide with the TS.
When $V_{TS}$ is taken into account, the set of velocities that make the OS collide with the TS is $VO^{t_s}_{OS|TS} \oplus V_{TS}$, i.e., the velocity obstacle translated by $V_{TS}$ (the gray region in Figure 3b). The complement of this set is the safe velocity set, denoted $V_{safe}$. If the ship sails with $V \in V_{safe}$, collision is avoided; selecting the optimal velocity from $V_{safe}$, however, is a difficult problem.
To solve this problem, the Optimal Reciprocal Collision Avoidance (ORCA) method is introduced. First, a vector $u$ gives the minimal change required to move the OS’s velocity $V_{OS}$ out of $VO^{t_s}_{OS|TS}$:
$$u = \left( \underset{V^*_{OS} \in \partial VO^{t_s}_{OS|TS}}{\arg\min} \left\| V^*_{OS} - V_{OS} \right\| \right) - V_{OS}$$
where $\partial VO^{t_s}_{OS|TS}$ denotes the boundary of $VO^{t_s}_{OS|TS}$. The vector $n$ is the normal of this boundary at the point where $V_{OS} + u$ meets it, pointing away from the collision region. Combining these variables, $ORCA^{t_s}_{OS|TS}$ is defined as the optimal reciprocal collision avoidance velocity set of the OS:
$$ORCA^{t_s}_{OS|TS} = \left\{ V^*_{OS} \;\middle|\; \left( V^*_{OS} - \left( V_{OS} + \tfrac{1}{2} u \right) \right) \cdot n \geq 0 \right\}$$
where $ORCA^{t_s}_{OS|TS}$ is the set of velocities with which the OS avoids colliding with the TS within time $t_s$. When multiple ships avoid collision, each ship computes its optimal velocity set by ORCA, and the intersection of these sets forms a convex region. To achieve multi-ship collision avoidance, we define the velocity set $ORCA^{t_s}_{OS}$; by adopting any $V \in ORCA^{t_s}_{OS}$, the OS avoids colliding with all TSs:
$$ORCA^{t_s}_{OS} = D\!\left(0, V^{\max}_{OS}\right) \cap \bigcap_{TS \neq OS} ORCA^{t_s}_{OS|TS}$$
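As an illustration of the pairwise construction in Equations (3)–(5), the sketch below computes the ORCA half-plane for one OS/TS pair, adapted from the standard formulation of the ORCA method [23]. It assumes the two safety circles do not already overlap and is not the authors’ exact implementation; permitted velocities $V$ then satisfy $(V - \text{point}) \cdot n \ge 0$ as in Equation (5).

```python
import numpy as np

def _det(a, b):
    """2D cross product (determinant)."""
    return a[0] * b[1] - a[1] * b[0]

def orca_halfplane(p_os, v_os, p_ts, v_ts, R, tau):
    """ORCA half-plane for the OS against one TS (Equation (5)).
    R = R_OS + R_TS, tau = t_s. Assumes no current overlap of the discs."""
    p_rel = np.asarray(p_ts) - np.asarray(p_os)   # center of the VO disc D
    v_rel = np.asarray(v_os) - np.asarray(v_ts)
    dist2 = float(p_rel @ p_rel)
    w = v_rel - p_rel / tau                       # offset from the cutoff-disc center
    dot1 = float(w @ p_rel)
    if dot1 < 0 and dot1 * dot1 > R * R * float(w @ w):
        # Closest point of the VO boundary lies on the truncation arc.
        w_len = np.linalg.norm(w)
        n = w / w_len                             # outward normal of the VO boundary
        u = (R / tau - w_len) * n                 # minimal change onto the boundary
    else:
        # Closest point lies on one of the tangent legs of the cone.
        leg = np.sqrt(dist2 - R * R)
        if _det(p_rel, w) > 0:                    # left leg
            d = np.array([p_rel[0] * leg - p_rel[1] * R,
                          p_rel[0] * R + p_rel[1] * leg]) / dist2
        else:                                     # right leg
            d = -np.array([p_rel[0] * leg + p_rel[1] * R,
                           -p_rel[0] * R + p_rel[1] * leg]) / dist2
        u = float(v_rel @ d) * d - v_rel          # project v_rel onto the leg
        n = np.array([-d[1], d[0]])               # normal pointing out of the VO
    point = np.asarray(v_os) + 0.5 * u            # OS takes half the avoidance effort
    return point, n
```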

3.3. COLREGs

Before applying the method to the multi-ship collision avoidance problem, the COLREGs need to be considered. The OS must act to avoid the TSs while complying with the COLREGs and return to its predefined path once safety is confirmed. As illustrated in Figure 4, the area centered on the OS is divided into four sectors:
  • Head-on: When two vessels (OS and TS) meet on opposite or nearly opposite courses, with the TS at a relative bearing within (0°, 5°) or (355°, 360°), the situation is judged as head-on. Both vessels should alter course to starboard so that each passes on the port side of the other;
  • Port crossing: When a vessel (TS) is crossing from the OS’s port side, at a relative bearing of 247.5–355°, the situation is judged as port crossing. The OS is not the give-way vessel, so it shall keep its course and speed;
  • Starboard crossing: When a vessel (TS) is crossing from the OS’s starboard side, at a relative bearing of 5–112.5°, the situation is judged as starboard crossing. The OS shall alter course to starboard to avoid collision;
  • Overtaking: When a vessel (OS) is overtaking another vessel (TS) from the sector directly astern of it, at a relative bearing of 112.5–247.5°, the situation is judged as overtaking. The OS shall alter course to starboard or port to avoid collision.
The collision avoidance behaviors conforming to COLREGs are shown in Figure 5.
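The bearing sectors of Figure 4 translate directly into a small classifier. The sketch below is an illustrative mapping from the TS’s relative bearing to the encounter situation; the handling of the exact sector boundaries is a simplifying assumption.

```python
def classify_encounter(relative_bearing_deg):
    """Map the TS's relative bearing (degrees, clockwise from the OS bow)
    to the COLREGs encounter situation of Figure 4. Boundary handling at
    exactly 5 / 112.5 / 247.5 / 355 degrees is an assumption."""
    b = relative_bearing_deg % 360.0
    if b < 5.0 or b > 355.0:
        return "head-on"              # both vessels alter course to starboard
    if b <= 112.5:
        return "starboard-crossing"   # OS gives way, alters course to starboard
    if b < 247.5:
        return "overtaking"           # OS may pass on either side
    return "port-crossing"            # OS stands on: keep course and speed
```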

3.4. COLREGs-Based Multi-Ship Collision Avoidance

The COLREGs can be extended to scenarios where OS encounters multiple TSs. The multi-ship collision avoidance under the COLREGs can be summarized as Figure 6.
In Figure 6, the OS encounters two TSs from different directions; all ships should comply with the COLREGs and alter course to starboard to avoid collision. In the same way, when three ships meet in a similar situation, they should all alter course to starboard. In summary, when a multi-ship encounter (three or more ships) occurs, each ship should follow the COLREGs and alter course to starboard to avoid collision.
The process of collision avoidance combined with the COLREGs is shown in Figure 7. First, the instantaneous ship domain is calculated and expanded into a safe circular area. The OS then detects whether any TS enters its safe area and judges the encounter situation from the TSs’ instantaneous positions. To avoid collision, the OS must select a velocity $V \in ORCA^{t_s}_{OS}$ computed by ORCA. After one time step, the OS checks whether the TSs have left its safe area; if any TS remains inside, the above steps are repeated. A schematic version of this loop is sketched below.
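In the sketch below, all helper functions (ship_domain_radius, encounter_situation, orca_velocity) and the ship object interface are hypothetical placeholders for the computations described above, not code from this study.

```python
import numpy as np

def avoid_collisions(os_ship, targets, t_s):
    """Schematic control loop following Figure 7 (placeholders assumed)."""
    while not os_ship.at_goal():
        r_safe = ship_domain_radius(os_ship)            # expanded circular domain
        intruders = [ts for ts in targets
                     if np.linalg.norm(ts.pos - os_ship.pos)
                     <= r_safe + ship_domain_radius(ts)]
        if intruders:
            for ts in intruders:
                situation = encounter_situation(os_ship, ts)  # Figure 4 sectors
            v = orca_velocity(os_ship, intruders, t_s)  # V in ORCA set, Eq. (6)
        else:
            v = os_ship.course_keeping_velocity()       # resume the planned track
        os_ship.step(v, t_s)                            # advance one time step
```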

4. Algorithm Background

4.1. Algorithm Model and CTDE

4.1.1. CTDE

Multi-agent reinforcement learning is a highly active research topic, and Centralized Training with Decentralized Execution (CTDE) [25] is arguably the simplest paradigm for training and execution. However, multi-agent reinforcement learning faces two difficulties:
  • Observational limitations: when an agent interacts with the environment, it cannot obtain the global state $s$ of the environment and only sees the local observation $o$ within its own observation range;
  • Instability: when multiple agents learn together, the other agents’ changing policies, and the changing actions these induce, mean that the value function of agent $i$ cannot be updated stably.
Therefore, to address these problems, this study adopts the Centralized Training Decentralized Execution (CTDE) framework, which relaxes these restrictions by allowing agents to access global information during training.

4.1.2. DEC-POMDP Model

The QMIX algorithm takes the DEC-POMDP model [26] as the standard model for cooperative multi-agent tasks. All variables are grouped into a tuple $G = \langle S, U, P, r, Z, O, N, \gamma \rangle$, where $s \in S$ denotes the true state of the environment. Each agent $i \in \{1, \ldots, N\}$ chooses an action $u^i \in U$ at each time step, forming a joint action $\mathbf{u} := [u^i]_{i=1}^{N} \in U^N$. The function $P(s' \mid s, \mathbf{u}) : S \times U^N \times S \to [0, 1]$ determines the state-transition dynamics. All agents share the same joint reward function $r(s, \mathbf{u}) : S \times U^N \to \mathbb{R}$, and $\gamma \in [0, 1)$ is the discount factor. Each agent receives its own observation $z \in Z$ according to the observation function $O(s, i) : S \times N \to Z$, and maintains an action-observation history $\tau^i \in \Gamma := (Z \times U)^*$, on which it conditions its stochastic policy $\pi^i(u^i \mid \tau^i) : \Gamma \times U \to [0, 1]$.
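For readers who prefer code, the tuple can be restated as a typed container. The field types below are illustrative only; the paper does not prescribe any data structure.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A typed restatement of the DEC-POMDP tuple G = (S, U, P, r, Z, O, N, gamma).
# All field types are illustrative assumptions, not part of the paper.
@dataclass
class DecPOMDP:
    states: Sequence        # S: environment states
    actions: Sequence       # U: per-agent action set
    transition: Callable    # P(s' | s, joint_u): transition dynamics
    reward: Callable        # r(s, joint_u): joint reward shared by all agents
    observations: Sequence  # Z: per-agent observations
    observe: Callable       # O(s, i) -> z: observation function
    n_agents: int           # N: number of agents (ships)
    gamma: float            # discount factor in [0, 1)
```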

4.2. IQL and VDN

The Independent Q-Learning (IQL) [27] algorithm treats all other agents as part of the environment; each agent thus solves a single-agent task with value function $Q_i(\tau^i, u^i)$. Relying only on $Q_i(\tau^i, u^i)$ for decision-making is unstable: because the other agents keep learning, the environment is non-stationary, convergence cannot be guaranteed, and an agent can easily get caught in endless exploration.
It is therefore necessary to learn a joint value function $Q_{total}(\tau, \mathbf{u})$ with a global view. Sunehag et al. [28] proposed the VDN algorithm, which factorizes $Q_{total}(\tau, \mathbf{u})$ into the individual $Q_i(\tau^i, u^i)$ as follows:
$$Q_{total}(\tau, \mathbf{u}) = \sum_{i=1}^{N} Q_i(\tau^i, u^i)$$
VDN simply sums the local action-value functions of the agents to obtain the joint action-value function, imposing additivity between $Q_{total}(\tau, \mathbf{u})$ and the $Q_i(\tau^i, u^i)$. However, it does not incorporate any global information when learning the individual local value functions.

4.3. QMIX Algorithm

4.3.1. IGM Condition and Constraint

To retain the advantages of VDN while using centralized learning to obtain decentralized policies, QMIX [29] first defines the Individual-Global-Max (IGM) condition:
$$\underset{\mathbf{u}}{\arg\max}\, Q_{total}(\tau, \mathbf{u}) = \begin{pmatrix} \arg\max_{u^1} Q_1(\tau^1, u^1) \\ \vdots \\ \arg\max_{u^N} Q_N(\tau^N, u^N) \end{pmatrix}$$
Generally speaking, if Equation (8) is satisfied (taking the $\arg\max$ of $Q_{total}(\tau, \mathbf{u})$ and of the individual $Q_i(\tau^i, u^i)$ are equivalent), then obtaining the optimal actions from the local $Q_i(\tau^i, u^i)$ is trivially tractable.
To achieve this, the QMIX algorithm imposes a sufficient condition:
$$\frac{\partial Q_{total}}{\partial Q_i} \geq 0, \quad \forall i \in \{1, \ldots, N\}$$
If $Q_{total}(\tau, \mathbf{u})$ is monotonic in each $Q_i(\tau^i, u^i)$, then Equation (8) holds. QMIX enforces this constraint through its network architecture.

4.3.2. Overall Framework

The overall framework of the QMIX algorithm is shown in Figure 8. The network structure consists of three main parts:
  • Agent network (Figure 8b): each agent is represented by a DRQN network. In the partially observable setting, the RNN allows an agent to use its entire action-observation history to infer the current state. Its input at each time step is the agent’s current individual observation $o^i_t$ and its previous action $u^i_{t-1}$;
  • Hypernetwork: the hypernetwork [30] computes the weights and biases of the mixing network. Its input is the global state; its outputs are the weights and the biases. The weights must be non-negative ($W \geq 0$), so an absolute-value activation function is applied to them; the biases use the common ReLU activation, since their value range is unconstrained;
  • Mixing network (Figure 8c): its weights and biases are generated by the hypernetwork. Its role is to mix the agents’ $Q_i(\tau^i, u^i)$ into a system-level $Q_{total}(\tau, \mathbf{u})$ that is monotonic in each of them, and the global state information it injects also makes training more stable; a minimal sketch is given below.
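The following PyTorch sketch shows one way to realize the mixing network and its hypernetworks. The layer sizes and the ELU nonlinearity follow the original QMIX design [29] rather than this study, and should be read as assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixingNetwork(nn.Module):
    """Monotonic mixing network (Figure 8c). Hypernetworks conditioned on the
    global state generate the mixing weights; torch.abs enforces W >= 0 so
    that dQ_total/dQ_i >= 0 (Equation (9)). Sizes are illustrative."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = F.elu(agent_qs.view(b, 1, self.n_agents) @ w1 + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        q_total = hidden @ w2 + b2                # (batch, 1, 1)
        return q_total.view(b)                    # Q_total(tau, u, s)
```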
QMIX is trained to minimize the following loss:
$$L(\theta) = \sum_{i=1}^{b} \left[ \left( y^{total}_i - Q_{total}(\tau, \mathbf{u}, s; \theta) \right)^2 \right]$$
where $b$ is the batch size of transitions sampled from the replay buffer, $y^{total}$ [31] is the target used to update the networks, and $\theta^-$ are the parameters of a target network, as in DQN:
$$y^{total} = r + \gamma \max_{\mathbf{u}'} Q_{total}(\tau', \mathbf{u}', s'; \theta^-)$$

4.3.3. Algorithm Implementation

During multi-ship collision avoidance, each ship participates in training as an agent, and the multi-agent reinforcement learning algorithm QMIX is employed to solve the problem. The iterative updating process of the algorithm is shown in Figure 9, and the training parameters are listed in Table 3. A schematic single update step is sketched below.
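One update of the Figure 9 loop can be sketched as follows, using $\gamma = 0.99$ from Table 3 and the loss of Equation (10). The batch layout, network objects and optimizer are assumptions, not the study’s code.

```python
import torch

GAMMA = 0.99  # discount factor from Table 3

def qmix_update(batch, agent_nets, mixer, target_agent_nets, target_mixer, optim):
    """One TD update: Equations (10)-(11). `batch` is assumed to hold per-agent
    observations/actions plus the global state, sampled from a replay buffer."""
    # Q_i(tau_i, u_i) for the actions actually taken, mixed into Q_total.
    qs = torch.stack([net(batch.obs[i]).gather(-1, batch.u[i]).squeeze(-1)
                      for i, net in enumerate(agent_nets)], dim=-1)
    q_total = mixer(qs, batch.state)                     # Q_total(tau, u, s; theta)
    with torch.no_grad():
        # Per-agent greedy target values, mixed by the target network (theta^-).
        next_qs = torch.stack([net(batch.next_obs[i]).max(-1).values
                               for i, net in enumerate(target_agent_nets)], dim=-1)
        y_total = batch.r + GAMMA * (1 - batch.done) * \
                  target_mixer(next_qs, batch.next_state)  # Equation (11)
    loss = ((q_total - y_total) ** 2).mean()             # Equation (10)
    optim.zero_grad(); loss.backward(); optim.step()
    return loss.item()
```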

4.4. CA-QMIX Algorithm

4.4.1. Action Space

During ship collision avoidance, the crew changes heading and speed to ensure navigational safety. Likewise, during automatic collision avoidance, the turning performance is considered when designing the action $u$, where $u \in [-\psi, \psi]$ and $\psi$ is the maximum change in course angle. The rudder-angle command is then obtained from the ship motion mathematical model. At each time step, each ship chooses an action $u^i$, giving rise to a joint action vector $[u^i]_{i=1}^{N}$, where $N$ is the number of ships.

4.4.2. State Space

The state space is defined as the set of information about the environment that a ship receives at a given time step. The observed state includes each ship’s location $P_{location}$, the goal location $P_{goal}$, the heading angle $\psi$, the desired heading $\psi_d$, the velocity $V$ and the ship length $L$.
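An illustrative encoding of the action and state spaces of Sections 4.4.1 and 4.4.2 is given below. The 5° discretization of course changes and the vector layout are assumptions; the paper only specifies the quantities involved.

```python
import numpy as np

# Candidate course alterations: the discretization is an assumption; the
# paper only states that u lies in [-psi, psi].
COURSE_CHANGES = np.deg2rad(np.arange(-30, 31, 5))

def observation(ship, goal, t):
    """Per-ship observation vector at time step t (Section 4.4.2).
    `ship` and `goal` are hypothetical container objects."""
    return np.concatenate([
        ship.position,           # P_location (x, y)
        goal,                    # P_goal (x, y)
        [ship.heading,           # psi
         ship.desired_heading,   # psi_d
         ship.speed,             # V
         ship.length],           # L
    ])
```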

4.4.3. Reward Function

The reward function is an evaluation of the ships’ movements and is calculated as the sum of the rewards accumulated by each ship:
$$R_{total} = \sum_{i=1}^{N} R_i$$
The reward of each ship, $R_i$, is the sum of the rewards accumulated in each episode. The objective of this study is to prevent collisions between the OS and TSs and to maneuver the OS in compliance with the COLREGs. Consequently, the reward function rewards the agent for reaching the destination and for avoiding collision while complying with the COLREGs:
$$R_i = R_{goal} + R_{collision} + R_{COLREGs}$$
The goal reward function $R_{goal}$ guides the ship to its destination and is expressed as:
$$R_{goal} = \begin{cases} 0, & \text{if } \left\| P_t - P_{goal} \right\| \leq \dfrac{d_5}{4} \\ \lambda_{goal} \left( \left\| P_t - P_{goal} \right\| - \left\| P_{t-1} - P_{goal} \right\| \right), & \text{otherwise} \end{cases}$$
where $P_t$ is the ship’s current location at time step $t$ and $\lambda_{goal}$ is a hyperparameter. As the distance between the ship and the destination shrinks, the agent obtains a larger reward; the reward reaches its maximum once the distance falls below $d_5/4$.
For collision avoidance and fulfilling the COLREGs, this paper designs the reward functions $R_{collision}$ and $R_{COLREGs}$:
$$R_{collision} = \begin{cases} 0, & \text{if } V \in ORCA^{t_s}_{OS} \\ -r_{collision}, & \text{otherwise} \end{cases}$$
$$R_{COLREGs} = \begin{cases} r_{COLREGs}, & \text{if turning right} \\ -r_{COLREGs}, & \text{otherwise} \end{cases}$$
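Equations (12)–(16) can be sketched as a single function. The hyperparameter values below are placeholders, and the sign convention for the goal term is chosen so that progress toward the goal is rewarded, matching the description of Equation (14).

```python
import numpy as np

# Placeholder hyperparameters: the paper does not report lambda_goal,
# r_collision or r_COLREGs.
LAMBDA_GOAL, R_COLLISION, R_COLREGS = 1.0, 10.0, 1.0

def reward(ship_pos, goal, d5, v_in_orca_set, turned_right, prev_dist):
    """Per-ship reward R_i = R_goal + R_collision + R_COLREGs (Eq. (13))."""
    dist = np.linalg.norm(np.asarray(ship_pos) - np.asarray(goal))
    if dist <= d5 / 4.0:
        r_goal = 0.0                                   # destination reached, Eq. (14)
    else:
        r_goal = LAMBDA_GOAL * (prev_dist - dist)      # reward progress toward goal
    r_coll = 0.0 if v_in_orca_set else -R_COLLISION    # Eq. (15): penalize unsafe V
    r_colregs = R_COLREGS if turned_right else -R_COLREGS  # Eq. (16)
    return r_goal + r_coll + r_colregs
```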
However, the goal reward can at times conflict with the collision avoidance reward. The whole process is therefore divided into two stages, normal sailing and collision avoidance, as shown in Figure 10.

5. Method for Path Planning and Collision Avoidance Based on CA-QMIX

In the previous section, the design of the ship collision avoidance system was presented, with its important parts explained in detail. In this section, the agents are trained to avoid collision using the CA-QMIX algorithm, and the proposed algorithm is evaluated with simulation tests in diverse environments. The ship collision avoidance simulation scenarios comprise two-ship encounter situations and multi-ship encounter situations. The two-ship scenarios are used to evaluate whether the algorithm conforms to the COLREGs; the scalability of the algorithm is then verified with multi-ship encounter scenarios.

5.1. Two Ships Collision Avoidance in Four Scenarios

To guarantee the performance of the algorithm, each ship’s decisions must comply with the COLREGs, and each ship must reach its destination after successfully avoiding collision. In the training phase, the state input of each ship consists of its own observed state, and the output of the algorithm is the rudder angle. At each training iteration, each ship selects an optimal action based on its state and observation to generate trajectories. An episode ends when ships collide or when all ships reach their destinations.
The average reward is computed as the sum of the rewards accumulated by each ship, and the reward functions follow the rules designed in Section 4. When the average reward stabilizes, as shown in Figure 11, the training process is complete and the trained agent is obtained. All ships can then automatically avoid collision while strictly following the COLREGs.
To inspect the trained agent, cases (a)–(d) were set up for simulation. The origin and destination parameters of the simulations are given in Table 4, and the simulation scenarios are shown in Figure 12.
In case (a), ship I and ship II encountered a head-on situation (reciprocal courses within a 10° sector). On detecting collision risk, both ships promptly altered course to starboard; after completing the avoidance maneuver, they continued to their destinations. Throughout this process, the ships’ movements not only complied with the COLREGs but also achieved collision-free navigation.
In case (b), a port crossing situation occurred as ship IV appeared within the (247.5°, 355°) bearing sector of ship III. When the two ships entered a dangerous ship domain, they altered course to starboard and completed the collision avoidance task by selecting a safe speed.
In case (c), ship V detected ship VI approaching from its starboard side: a starboard crossing situation. Ship V altered course to starboard to avoid collision.
In case (d), ship VII was overtaking ship VIII within the 135° sector astern. To avoid collision, ship VII altered course to starboard and completed the overtaking.
From Figure 12, the ships’ motion trajectories and collision avoidance behaviors can be observed. The results show that each ship followed the COLREGs and reached its destination, indicating that the trained agent can complete the collision avoidance task.
Figure 13 provides detailed information on each ship’s navigation. Since the rudder angle and speed of ship Ⅰ and ship Ⅱ are identical, only ship Ⅰ’s information is shown. The rudder angle of each ship is limited to [−30°, 30°] and the velocity to [0, 7.5 m/s]. After detecting collision risk, each ship adopts a different rudder angle and velocity to resolve the danger.
In conclusion, the CA-QMIX algorithm ensures compliance with the COLREGs while achieving successful collision avoidance. The proposed method also demonstrated excellent collision avoidance performance, flexibility across application scenarios and potential for scalability.

5.2. Simulation for Multi-Ship Collision Avoidance

To verify the scalability of the algorithm, scenarios with three and four encountering ships were set up. The origin and destination of each ship are shown in Table 5.

5.2.1. Three Ships Collision Avoidance Scenarios

Following Figure 6, the three-ship encounter scenario was created. The performance of the algorithm must be evaluated from two aspects: whether the ships can avoid collision while complying with the COLREGs, and whether they can reach their destinations. In Figure 14, the three ships were at risk of collision in the central area; to avoid it, all three altered course to starboard in compliance with the COLREGs. The resulting trajectories were safe and smooth. Figure 15 illustrates the changes in their speed and rudder angle.

5.2.2. Four Ships Collision Avoidance Scenarios

To prove the scalability of the algorithm, a more complex simulation scenario was set up, as shown in Figure 16; the origin and destination coordinates of each ship are given in Table 5. Four ships navigate toward a central point, and if they do not adopt appropriate collision avoidance behaviors, they will collide with each other in the central area.
Figure 16 also illustrates the simulation process of collision avoidance in the four-ship encounter situation. Initially, the four ships sailed toward their destinations along straight lines. On reaching the positions in the first panel, they detected the collision risk, and a starboard course-change command was issued. After arriving at the positions in the second panel, the four ships moved circularly to pass through the hazardous area. By the positions in the third panel, the risk of collision had been resolved, and the ships changed course to reach their destinations more quickly. The rudder angle and velocity of the four ships during this process are shown in Figure 17.

5.3. Complementary Simulation for Multi-Ship Collision Avoidance

In this section, further multi-ship encounter scenarios were set up to strengthen the validation of the proposed CA-QMIX method. To confirm that two-ship collision avoidance complies with the COLREGs in all scenarios, additional simulations are shown in Figure 18, which, building on the previous simulations, illustrates the collision avoidance process under various conditions.
Figure 19 illustrates the setup of the additional multi-ship simulations. These confirm the successful performance of the CA-QMIX algorithm across various simulation scenarios with thorough performance evaluations. It can therefore be concluded that the proposed CA-QMIX algorithm enables ships to avoid collision and reach their destinations in different scenarios, providing a versatile decision-making model for intelligent ship behavior.

6. Conclusions

In this study, an intelligent ship behavior decision-making method is proposed for multi-ship collision avoidance based on the multi-agent reinforcement learning algorithm, which could ensure the safety of a ship’s voyage in different multi-ship encounter scenarios.
To reduce collision risk, this paper performs collision detection using the ORCA (Optimal Reciprocal Collision Avoidance) method. The proposed algorithm adopts the CTDE framework and the DEC-POMDP model to train the agents to avoid collision; the multi-ship model was then trained on rich encounter situations with the CA-QMIX algorithm. Since multi-ship collision avoidance must also comply with the COLREGs, a novel reward function was designed so that course changes comply with the COLREGs, and a staged reward procedure was defined to improve training efficiency. Multiple ships were then trained on the different scenarios defined by the COLREGs.
In the simulations, the proposed algorithm was validated in various scenarios, and its performance was evaluated through the navigational trajectories, rudder angles and speeds. The results indicate that the proposed algorithm can implement multi-ship collision avoidance while complying with the COLREGs and incorporating rudder characteristics. The algorithm demonstrated its flexibility and scalability and can therefore be applied to a wide range of tasks.
For future work, we will focus on improving the ship motion mathematical model: model-based multi-agent reinforcement learning can achieve good sample efficiency and stable performance, and for real-world use, a more accurate model will be integrated to better capture the maneuverability of a ship. In addition, the collision-free path should be optimized to improve navigational efficiency. Finally, a hardware simulation will be implemented to verify the feasibility of multi-ship collision avoidance, and the simulation results will be compared with other relevant methods.

Author Contributions

Conceptualization, G.W. and W.K.; methodology, W.K.; software, W.K.; validation, G.W.; formal analysis, G.W. and W.K.; investigation, G.W. and W.K.; writing—original draft preparation, W.K.; writing—review and editing, G.W. and W.K.; supervision, G.W.; project administration, G.W.; funding acquisition, G.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the National Natural Science Foundation of China (Nos. 51409033 and 52171342) and the Fundamental Research Funds for the Central Universities (No. 3132019343). The authors would like to thank the anonymous reviewers for their valuable comments.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Guan, W.; Peng, H.; Zhang, X.; Sun, H. Ship Steering Adaptive CGS Control Based on EKF Identification Method. J. Mar. Sci. Eng. 2022, 10, 294. [Google Scholar] [CrossRef]
  2. Statheros, T.; Howells, G.; Maier, K.M.D. Autonomous ship collision avoidance navigation concepts, technologies and techniques. J. Navig. 2008, 61, 129–142. [Google Scholar] [CrossRef] [Green Version]
  3. Zhao, D.; Shao, K. Deep reinforcement learning overview: The development of computer go. Control. Theory Appl. 2016, 6, 17. [Google Scholar] [CrossRef]
  4. Liu, Q.; Zhai, J. A brief overview of deep reinforcement learning. Chin. J. Comput. 2018, 1, 27. [Google Scholar] [CrossRef]
  5. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vision 2015, 115, 211–252. [Google Scholar] [CrossRef] [Green Version]
  6. Graves, A.; Mohamed, A.R.; Hinton, G. Speech Recognition with Deep Recurrent Neural Networks. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; Volume 38, pp. 6645–6649. [Google Scholar] [CrossRef] [Green Version]
  7. Sutton, R.S.; Barto, A.G. Reinforcement learning: An introduction. IEEE Trans. Neural Netw. 1998, 9, 1054. [Google Scholar] [CrossRef]
  8. Miele, A.; Wang, T. Maximin approach to the ship collision avoidance problem via multiple-subarc sequential gradient-restoration algorithm. J. Optim. Theory Appl. 2005, 124, 29–53. [Google Scholar] [CrossRef]
  9. Phanthong, T.; Maki, T.; Ura, T.; Sakamaki, T.; Aiyarak, P. Application of A* algorithm for real-time path re-planning of an unmanned surface vehicle avoiding underwater obstacles. J. Mar. Sci. Appl. 2014, 13, 105–116. [Google Scholar] [CrossRef]
  10. Cheng, X.; Liu, Z.; Zhang, X. Trajectory Optimization for Ship Collision Avoidance System Using Genetic Algorithm. In Proceedings of the OCEANS 2006-Asia Pacific, Singapore, 16–19 May 2006; pp. 1–5. [Google Scholar] [CrossRef]
  11. Wilson, P.A.; Harris, C.J.; Hong, X. A line of sight counteraction navigation algorithm for ship encounter collision avoidance. J. Navig. 2003, 56, 111–121. [Google Scholar] [CrossRef]
  12. Liang, C.; Zhang, X.; Watanabe, Y.; Deng, Y. Autonomous collision avoidance of unmanned surface vehicles based on improved A star and minimum course alteration algorithms. Appl. Ocean. Res. 2021, 113, 102755. [Google Scholar] [CrossRef]
  13. Chen, Y.-Y.; Ellis-Tiew, M.-Z.; Chen, W.-C.; Wang, C.-Z. Fuzzy risk evaluation and collision avoidance control of unmanned surface vessels. Appl. Sci. 2021, 11, 6338. [Google Scholar] [CrossRef]
  14. Johansen, T.A.; Perez, T.; Cristofaro, A. Ship collision avoidance and COLREGS compliance using simulation-based control behavior selection with predictive hazard assessment. IEEE Trans. Intell. Transp. Syst. 2016, 17, 3407–3422. [Google Scholar] [CrossRef] [Green Version]
  15. Hu, L.; Naeem, W.; Rajabally, E.; Watson, G.; Mills, T.; Bhuiyan, Z.; Raeburn, C.; Salter, I.; Pekcan, C. A multiobjective optimization approach for COLREGs-compliant path planning of autonomous surface vehicles verified on networked bridge simulators. IEEE Trans. Intell. Transp. Syst. 2019, 21, 1167–1179. [Google Scholar] [CrossRef] [Green Version]
  16. Shen, H.; Hashimoto, H.; Matsuda, A.; Taniguchi, Y.; Terada, D.; Guo, C. Automatic collision avoidance of multiple ships based on deep Q-learning. Appl. Ocean. Res. 2019, 86, 268–288. [Google Scholar] [CrossRef]
  17. Sawada, R.; Sato, K.; Majima, T. Automatic ship collision avoidance using deep reinforcement learning with LSTM in continuous action spaces. J. Mar. Sci. Technol. 2021, 26, 509–524. [Google Scholar] [CrossRef]
  18. Li, L.; Wu, D.; Huang, Y.; Yuan, Z.-M. A path planning strategy unified with a COLREGS collision avoidance function based on deep reinforcement learning and artificial potential field. Appl. Ocean. Res. 2021, 113, 102759. [Google Scholar] [CrossRef]
  19. Zhao, L.; Roh, M.-I. COLREGs-compliant multiship collision avoidance based on deep reinforcement learning. Ocean. Eng. 2019, 191, 106436. [Google Scholar] [CrossRef]
  20. Xu, X.; Lu, Y.; Liu, X.; Zhang, W. Intelligent collision avoidance algorithms for USVs via deep reinforcement learning under COLREGs. Ocean. Eng. 2020, 217, 107704. [Google Scholar] [CrossRef]
  21. Fossen, T.I. Guidance and Control of Ocean Vehicles; John Wiley and Sons: Chichester, UK, 1999; ISBN 0 471 94113 1. [Google Scholar]
  22. Perez, T.; Ross, A.; Fossen, T. A 4-dof Simulink Model of a Coastal Patrol Vessel for Manoeuvring in Waves. In Proceedings of the 7th IFAC Conference on Manoeuvring and Control of Marine Craft, International Federation for Automatic Control, Lisbon, Portugal, 20–22 September 2006; pp. 1–6. [Google Scholar]
  23. Alonso-Mora, J.; Breitenmoser, A.; Rufli, M.; Beardsley, P.; Siegwart, R. Optimal Reciprocal Collision Avoidance for Multiple Non-Holonomic Robots; Springer: Berlin/Heidelberg, Germany, 2013; pp. 203–216. [Google Scholar] [CrossRef] [Green Version]
  24. Śmierzchalski, R. Ships’ Domains as Collision Risk at Sea in the Evolutionary Method of Trajectory Planning. In Information Processing and Security Systems; Springer: Berlin/Heidelberg, Germany, 2005; pp. 411–422. [Google Scholar] [CrossRef]
  25. Oliehoek, F.A.; Span, M.T.; Vlassis, N. Optimal and approximate q-value functions for decentralized POMDPs. J. Artif. Intell. Res. 2008, 32, 289–353. [Google Scholar] [CrossRef] [Green Version]
  26. Oliehoek, F.A.; Amato, C. A Concise Introduction to Decentralized POMDPs; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
  27. Busoniu, L.; Babuška, R.; De Schutter, B. Multi-Agent Reinforcement Learning: An Overview. In Innovations in Multi-Agent Systems and Applications-1; Springer: Berlin/Heidelberg, Germany, 2008; Volume 38, pp. 156–172. [Google Scholar] [CrossRef]
  28. Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W.M.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J.Z.; Tuyls, K.; et al. Value-decomposition networks for cooperative multi-agent learning. arXiv 2017, arXiv:1706.05296. [Google Scholar] [CrossRef]
  29. Rashid, T.; Samvelyan, M.; Schroeder, C.; Farquhar, G.; Foerster, J.; Whiteson, S. Qmix: Monotonic value function factorisation for deep multi-agent reinforcement learning. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 4295–4304. [Google Scholar]
  30. Ha, D.; Dai, A.; Le, Q. Hypernetworks. arXiv 2016, arXiv:1609.09106. [Google Scholar]
  31. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The ship motion mathematical model.
Figure 2. Establishing the safe area by designing the ship domain. (a) The design of the ship safe domain; (b) The information of multi-ship motion.
Figure 3. Calculating the optimal velocity by ORCA: (a) the collision velocities when the TS is stationary; (b) the collision velocities when the TS is moving; (c) the geometric meaning of the vectors u and n; (d) the geometric meaning of the optimal velocity.
Figure 4. The categorization of TSs based on COLREGs.
Figure 5. Encounter situations defined by COLREGs.
Figure 6. Multi-ship collision avoidance strategies.
Figure 7. Flow diagram for collision avoidance.
Figure 8. (a) The overall QMIX architecture; (b) Agent network structure; (c) Mixing network structure.
Figure 9. QMIX algorithm for multi-ship collision avoidance.
Figure 10. The two stages of the collision avoidance process: normal sailing and collision avoidance.
Figure 11. Average reward for multi-ship collision avoidance.
Figure 12. Collision avoidance in four situations based on COLREGs. (Unit: 10 m).
Figure 13. Rudder angle and velocity of the ships in the four two-ship scenarios.
Figure 14. Three-ship encounter scenario. (Unit: 10 m).
Figure 15. The rudder angle and velocity of the three ships.
Figure 16. Four-ship encounter scenario. (Unit: 10 m).
Figure 17. The rudder angle and velocity of the four ships.
Figure 18. Additional two-ship encounter scenarios.
Figure 19. Additional multi-ship encounter scenarios.
Table 1. A summary of the literature review.

| Type | Reference | Technique | Advantages | Disadvantages |
|---|---|---|---|---|
| Ship collision avoidance | [8] | Multi-subarc sequential gradient-restoration algorithm | Solves two cases of the collision avoidance problem | Ship's actions do not conform to COLREGs |
| | [9] | A* algorithm | Avoids stationary and dynamic obstacles with an optimal trajectory | |
| | [10] | Genetic algorithm | Avoids collision and seeks the trajectory | |
| COLREGs-compliant ship collision avoidance | [11] | Line-of-sight counteraction navigation algorithm | Two-ship collision avoidance complying with COLREGs | Cannot achieve collision avoidance in more complex scenarios, e.g., four ships converging at the same time |
| | [12] | Minimum course alteration algorithm | Avoids moving ships or obstacles constrained by COLREGs | |
| | [13] | Nonlinear optimal control method | Develops collision risk and collision avoidance timing indicators | |
| | [14] | Model predictive control | Evaluates COLREGs and collision hazards of alternative control behaviors on a finite prediction horizon | |
| | [15] | Multi-objective optimization algorithm | Incorporates a hierarchical sorting rule | |
| COLREGs-compliant multi-ship collision avoidance based on DRL | [16] | DRL incorporating ship maneuverability, human experience and COLREGs | Experimental validation with three self-propelled ships | Discrete action space |
| | [17] | DRL in continuous action space | Reduced risk of collision | Poor convergence |
| | [18] | DRL improved with the artificial potential field (APF) algorithm | Improved action space and reward function | - |
| | [19] | Policy-gradient-based DRL | Multi-ship collision avoidance | Heading angle retention |
| | [20] | DDPG | Gives reasonable and effective collision avoidance actions | - |
Table 2. Principal dimensions of the ship.

| Parameter | Value |
|---|---|
| Length (m) | 52.5 |
| Beam (m) | 8.6 |
| Draft (m) | 2.29 |
| Rudder area (m²) | 1.5 |
| Max rudder angle (deg) | 40 |
| Max rudder angle rate (deg/s) | 20 |
| Nominal speed (kt) | 15 |
| K index | −0.085 |
| T index | 4.2 |
Table 3. The parameters of the training algorithm.

| Parameter | Symbol | Value |
|---|---|---|
| Discount rate | γ | 0.99 |
| Lambda | λ | 0.95 |
| Time steps | T_max | 10,000 |
| Target network update epoch | E | 100 |
| RMSprop learning rate | r_RMS | 5 × 10⁻⁴ |
| Clipping hyperparameter | ε | 1 |
Table 4. Two-ship test parameters.

| Scenario | Ship Number | Origin (m) | Destination (m) |
|---|---|---|---|
| Head-on | I | (0, 900) | (0, −900) |
| | II | (0, −900) | (0, 900) |
| Port crossing | III | (0, −900) | (0, 900) |
| | IV | (−900, 0) | (900, 0) |
| Starboard crossing | V | (0, −900) | (0, 900) |
| | VI | (900, 900) | (−900, 0) |
| Overtaking | VII | (0, −900) | (0, 900) |
| | VIII | (0, 100) | (0, 300) |
Table 5. Multi-ship scenario parameter setup.

| Scenario | Ship Number | Origin (m) | Destination (m) |
|---|---|---|---|
| Three ships | Ship I | (0, 900) | (0, −900) |
| | Ship II | (779.4, 450) | (−779.4, −450) |
| | Ship III | (−779.4, 450) | (779.4, −450) |
| Four ships | Ship I | (0, −900) | (0, 900) |
| | Ship II | (−900, 0) | (900, 0) |
| | Ship III | (0, 900) | (0, −900) |
| | Ship IV | (900, 0) | (−900, 0) |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
