Optimization of Inbound and Outbound Vessel Scheduling in One-Way Channel Based on Reinforcement Learning

Zhen, Rong; Sun, Meng; Fang, Qionglin

doi:10.3390/jmse13020237

Open Access Article


by Rong Zhen, Meng Sun and Qionglin Fang *

Navigation College, Jimei University, Xiamen 361021, China

* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2025, 13(2), 237; https://doi.org/10.3390/jmse13020237

Submission received: 14 December 2024 / Revised: 31 December 2024 / Accepted: 24 January 2025 / Published: 26 January 2025

(This article belongs to the Special Issue Smart and Low Carbon Emission-Oriented Maritime Traffic Management and Controlling)


Abstract

As the size and number of ships continue to grow, effective vessel scheduling has become increasingly important for the efficient operation of ports served by one-way channels, whose characteristics significantly affect port safety and efficiency. This paper presents a reinforcement-learning-based approach to optimize the scheduling of vessels in a one-way channel, aiming to quickly identify a scheduling solution that enhances port operational efficiency. The method models the vessel scheduling problem in a one-way channel by incorporating navigational constraints, safety requirements, and vessel-specific characteristics, and it uses the Q-learning algorithm to minimize vessel waiting times and identify an optimal or near-optimal scheduling solution. Experiments were conducted using real data from the Dayao Bay Pier of Dalian Port to validate the rationality and effectiveness of the proposed model and algorithm. The results show that the reinforcement learning approach achieved approximately a 16% improvement in solution quality compared with the genetic algorithm (GA) while requiring only half the computation time. It also reduced delay times by over 40% relative to the traditional first-come, first-served (FCFS) strategy, indicating superior overall performance. This research offers an efficient, intelligent approach to vessel scheduling, providing a theoretical foundation for further advances in this field and practical decision support for vessel scheduling in one-way channels.

Keywords:

inbound and outbound scheduling; one-way channel; reinforcement learning

1. Introduction

Ports are critical nodes in global trade, and their operational efficiency directly affects the smooth flow of goods and economic benefits [1,2]. However, with the rising number of vessels and limited channel capacity, port management faces numerous challenges [3]. One key challenge is the scheduling of ship arrivals and departures, with channel arrangements typically supervised by vessel traffic services (VTS) [4,5]. Channel management directly influences when ships arrive at their berths and depart from the port. In many Chinese ports, owing to natural conditions, the difficulty of channel expansion, and financial limitations, one-way channels remain the primary mode of operation [6,7]. Under this mode, ships must strictly follow the arrival and departure time windows designated by the port authority, which can disrupt the organization of port operations. Inefficient or delayed scheduling may cause port congestion, posing challenges for both ships and the port. An efficient scheduling system can increase vessel throughput, enhance operational efficiency, conserve energy, and lower carbon emissions, all of which are crucial steps toward meeting China's emission reduction goals [8,9]. Therefore, to alleviate congestion and enhance the safety and efficiency of ship scheduling [10], designing an intelligent and effective scheduling plan is essential for improving port operational capacity and ensuring maritime safety.

The port VTS coverage area typically includes the harbor basin, channels, and anchorages [11,12]. Currently, the most commonly used ship scheduling strategy is the first-come, first-served (FCFS) approach [13], where the VTS management center schedules ships’ entry and exit times based on the order of their pre-reported arrival and departure times. However, this approach neglects other valuable information about ships, potentially leading to extended delays during peak arrival periods. This not only increases ship waiting times but also reduces the efficiency of port resource utilization and may even result in traffic congestion and safety risks. In this context, finding an intelligent and flexible scheduling method has become increasingly urgent. To address the above issues, this paper first develops a model that simulates actual port operations and explores the key challenges in one-way channel vessel scheduling. The model takes into account factors such as channel capacity, ship size, and sailing speed, aiming to closely resemble real port environments. Based on this, reinforcement learning techniques are introduced into the field of vessel scheduling, and an intelligent scheduling algorithm suitable for one-way channel environments is designed. This algorithm autonomously learns from real-time information and continuously optimizes scheduling strategies, making the ship entry and exit process more organized and efficient. Finally, through testing with real port data, this paper demonstrates that the reinforcement learning-based scheduling method has significant advantages over traditional approaches. Specifically, the innovations of this paper include:

(1): Method Improvement: By combining reinforcement learning with the one-way channel ship scheduling problem, we comprehensively consider channel capacity, ship size, sailing speed, and other factors, aiming to closely reproduce the real port environment and build a more efficient scheduling model.
(2): Algorithm Optimization: A strategy of gradually approaching the Q value is adopted in the reinforcement learning algorithm to reduce resource consumption and shorten the running time of the intelligent scheduling algorithm for ships entering and leaving ports through a one-way channel.

The remainder of this paper is organized as follows: Section 2 provides a detailed review of the current state of vessel scheduling research, discussing the main achievements and limitations in existing literature and highlighting the focus and trends in current studies. Section 3 presents the model and solution methods for the one-way channel vessel scheduling problem, explaining the reinforcement learning framework and technical details used. Section 4 discusses numerical experiments and results analysis, verifying the effectiveness and superiority of the proposed method through specific case studies and providing a comprehensive evaluation of its performance. Section 5 concludes the paper and suggests future research directions, offering ideas and recommendations for further exploration. The overall research process is illustrated in Figure 1.

2. Literature Review

2.1. Current Research on Ship Scheduling

Inbound and outbound ship scheduling is a critical process for establishing the queue of ships arriving at and departing from the port, with the primary goal of determining their order. This scheduling process optimizes port operations by ensuring safe and efficient ship movements and maximizing resource utilization. Research on this issue primarily focuses on three areas: scheduling optimization for one-way, two-way, and compound channels, each with distinct requirements and challenges.

Inefficient ship scheduling can lead to severe congestion, disrupting the planned berthing and departure arrangements for inbound and outbound ships, particularly in one-way channels, where it can cause significant delays. To ensure navigational safety, one-way channels prohibit ships from traveling in opposite directions simultaneously. As a result, total waiting time becomes a key metric for measuring the efficiency of port waterways [14]. Existing research mainly focuses on reducing ship waiting times through various optimization methods. Xu et al. [15] proposed an optimal scheduling model for inbound and outbound ships in one-way channels based on onboard AIS data, considering factors such as berth proximity, ship size, type, and draft. Building on this, Zheng et al. [16] considered factors such as the safe navigation distance between ships, alternating entry and exit time windows, and clustered entry and exit rules. They developed a mixed-integer linear programming model aimed at minimizing the total waiting time for inbound ships. Xu et al. [17] considered tidal and water depth requirements and optimized the completion time for all arriving ships. They developed a parallel machine scheduling model for berth allocation with time windows and designed a corresponding heuristic algorithm. Ting et al. [18] proposed a mixed-integer programming model aimed at minimizing the waiting and operating times for arriving ships, addressing the dynamic discrete berth allocation problem. They applied a particle swarm optimization algorithm to solve the model. Mauri et al. [19] developed a mathematical model for berth allocation that accounts for the spatial and temporal constraints required for ship berthing. This model applies to both continuous and discrete berths, and they designed an adaptive large neighborhood search algorithm to solve it. 
Bai [20] enhanced the artificial fish swarm algorithm to improve search precision and local optimization capability, significantly reducing total delay time due to ship arrival sequencing. Li [21] integrated reinforcement learning with a genetic algorithm to substantially shorten total waiting times for ships in one-way channels while ensuring safe navigation. Zhang et al. [22] developed a scheduling optimization model aimed at minimizing total waiting time and designed a simulated annealing multi-population genetic algorithm suitable for port ship scheduling, validating it through simulation experiments.

In recent years, two-way channels have been widely implemented in major ports to alleviate navigational pressure and enhance efficiency. For two-way channels, Zhang et al. [23] proposed a tidal-influenced ship scheduling model and algorithm to optimize the sequencing of inbound and outbound ships under tidal conditions. This approach reduces waiting times and enhances channel operational efficiency. Building on this, Zhang et al. [24] developed a multi-objective optimization model for two-way port vessel scheduling that incorporates constraints for safety, continuity, and efficiency, taking into account ship attributes and traffic conditions. This model improves scheduling safety and navigational efficiency. Wang [25] proposed a hybrid self-organizing scheduling (HSOS) method for restricted two-way channels that employs a priority mechanism to reduce traffic conflicts, significantly lowering the average delay for large ships under high arrival rates. Meisel et al. [26] addressed the traffic management challenges of the Kiel Canal using multiple optimization models and heuristic algorithms, significantly improving ship transit efficiency and service quality.

Compound channels are relatively advanced, and research on them remains limited both domestically and internationally. Gong [27] proposed an optimized scheduling and tug assignment plan for ports, balancing traffic scheduling rules with tug availability constraints. Zhang et al. [28] developed a ship scheduling model for a typical tidal compound channel, using real tidal data and a genetic algorithm to optimize vessel scheduling in the Tianjin Port channel. Results showed that this approach outperformed other scheduling methods and could be applied to other tide-affected compound channels.

Although extensive research has been conducted on one-way channel ship scheduling, most studies rely on traditional methods such as mixed-integer linear programming and on metaheuristics such as particle swarm optimization and genetic algorithms. Heuristic methods can find feasible solutions within relatively short timeframes, but they often depend on predefined rules or parameters, making it difficult to guarantee a globally optimal solution [29]. As problem size increases, the computational complexity of these traditional algorithms rises sharply, resulting in longer solution times and limiting their suitability for real-time scheduling.

2.2. Reinforcement Learning in Combinatorial Optimization

Reinforcement learning is a trial-and-error learning method based on Markov decision processes (MDP) [30,31]. As a key paradigm in machine learning, it enables an agent to optimize its behavior policy through interactions with the environment to achieve specific objectives [32]. Unlike traditional combinatorial optimization methods, reinforcement learning does not rely on precise models, giving it a distinct advantage in addressing complex, hard-to-model problems, particularly in applications with high uncertainty and dynamic environments [33].

In recent years, an increasing number of scholars have sought to apply reinforcement learning to combinatorial optimization problems [34], the most representative of which is the traveling salesman problem (TSP). Ottoni et al. [35] proposed a version of the TSP with fuel constraints and solved it with reinforcement learning; through multiple case studies, they demonstrated the effectiveness of this approach, obtaining optimal results and highlighting the advantages of reinforcement learning for combinatorial optimization. Gambardella et al. [36] showed the strong performance of the Ant-Q algorithm on problems such as the TSP, demonstrating its capabilities in combinatorial optimization. The application of reinforcement learning in traffic management has also been expanding. Duan et al. [37] proposed a Q-learning-based variable speed control strategy that significantly improved traffic flow efficiency in bottleneck areas and enhanced overall traffic conditions. Fotuhi et al. [38] introduced a Q-learning-based model for stacker crane operations that optimizes the service sequence of hauling trucks to minimize their waiting times. Huang et al. [39] noted that reinforcement learning has also been applied to taxi dispatch in recent years, and it is widely used in traffic trajectory prediction [40]. Xiang et al. [41] developed a reinforcement learning model for outbound flights in multi-airport terminal areas and used it to intelligently sequence incoming and outgoing flights, effectively reducing flight delays, the number of delayed aircraft, and average delay times. Zhu et al. [42] applied Q-learning to optimize air traffic flow, significantly alleviating congestion at flight route intersections.
Wang [43] combined the proximal policy optimization (PPO) algorithm with simulation modeling to simulate ship traffic flow in crossing water areas, showing the potential of reinforcement learning for managing traffic in complex water environments. Du et al. [44] developed a machine-learning-based model for liner transportation scheduling that addresses uncertainties in speed adjustments and validated its effectiveness in shipping optimization.

In summary, reinforcement learning, with its ability to learn and adapt in dynamic environments, performs well on complex combinatorial optimization problems and overcomes limitations of traditional optimization algorithms such as poor adaptability and high resource consumption. Therefore, this paper applies reinforcement learning to optimize the scheduling of ships entering and exiting a one-way channel, with the goals of reducing ship delay times, improving port scheduling efficiency, and providing innovative technical solutions for port management.

3. Model of Ship Scheduling Problem in One-Way Channels

Port ship operations are a continuous process, and a safe and feasible scheduling plan should be proactive, capable of anticipating potential urgent situations that ships may encounter in the channel, while taking preventive measures in advance [45]. In ports with one-way navigation, the core of ship scheduling lies in the scientific and rational arrangement of the sequence and timing of ships entering and exiting the channel. This not only concerns channel safety but also directly impacts the overall navigation efficiency of the port.

3.1. Construction of One-Way Channel Ship Scheduling Model

The diagram of the one-way channel is shown in Figure 2. The ship arrival and departure process can be divided into three stages: ship arrival, ship departure, and the transition between arrival and departure. For the ship arrival process, when the arriving ship reaches the reporting line, it receives instructions from the VTS [46]. The VTS will develop an arrival plan in advance based on factors such as the ship’s scheduled arrival, the distribution of anchorages outside the port, the availability of berths inside the port, and operational information within and outside the port. At the appropriate time, the VTS will issue the arrival instructions to the ship. Upon receiving the instructions, the ship carries out the arrival operation, enters the corresponding harbor via the channel, and reaches the designated berth. For the ship departure process, similar to arrival, after completing operations, the ship needs to submit a departure request to the VTS. After receiving the departure instruction from the VTS, the ship proceeds with steps such as document processing, unmooring, and departure from the berth, then exits the port via the channel [47]. For the arrival-departure transition process, the arrival ship can only begin scheduling after the previous departure ship has left, while the departure ship can only be scheduled after the previous arrival ship reaches the berth, ensuring navigational safety.

Several factors influence port ship scheduling, and to simplify the modeling and solution process, the following assumptions are made:

(1): For inbound ships, the application time is the moment they apply for entry; for outbound ships, it is the moment they apply to leave the berth.
(2): Berths for inbound ships have already been assigned.
(3): Sufficient pilots are available.
(4): Weather, accidents, and other disturbances are not considered during the scheduling process.
(5): All inbound and outbound ships are at the same distance from the channel; that is, every ship sails the same distance in the channel.
(6): When ships apply for entry or exit, pilots and tugboats have already been assigned and are ready.

The parameters are defined as follows: $I$ is the set of all $n$ ships, both inbound and outbound; $t_{i}^{s}$ is the scheduled start time of ship $i$; $t_{i}^{a}$ is the time at which ship $i$ applies for entry (or, for an outbound ship, for departure); $i$ and $j$ denote two consecutively scheduled ships, with $i$ the preceding ship and $j$ the following ship; $S_{i}$ is the distance of ship $i$ from the anchorage; $L_{i}$ is the length of ship $i$; $D_{i}$ is the sailing distance of ship $i$; and $v_{i}$ is the average speed of ship $i$ in the channel. The decision variables used below are defined as follows: $B_{i}$ is the berth status of ship $i$, where 1 indicates that the berth is occupied and 0 indicates that it is available, and $IO_{i}$ is a binary variable giving the direction of movement of ship $i$, where 1 means the ship enters the port and 0 means it leaves the port.

Objective function for ship operation delay:

The design of the objective function should be based on the port entry and exit traffic situation of the ships throughout the entire planning period. In the ship scheduling process, aside from the time delays caused by overtaking or crossing in the channel and harbor, the sailing time of the ship in the channel is fixed. Therefore, the optimization focus is on the sum of all ships’ waiting times at the berth and the delay times in the channel and harbor, as shown in Equation (1).

\min T = \sum_{i \in I} \left(t_{i}^{s} - t_{i}^{a}\right)

(1)

Ship navigation constraints:

(1): Scheduling time constraint

To ensure that ships are effectively scheduled, the start time of the ship’s scheduling must be no earlier than its requested scheduling time. This constraint ensures that each ship is scheduled at a reasonable time, avoiding scheduling before the ship’s requested time, thus ensuring the feasibility and safety of the scheduling, as shown in Equation (2).

t_{i}^{s} \geq t_{i}^{a}, \forall i \in I

(2)
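As a minimal illustration of Equations (1) and (2), the Python sketch below sums the delays $t_{i}^{s} - t_{i}^{a}$ and rejects any schedule in which a ship starts before its application time. The function name `total_delay` and the sample times (in minutes) are hypothetical, chosen only for the example.

```python
def total_delay(start_times, apply_times):
    """Objective (1): sum of t_i^s - t_i^a, after checking constraint (2)."""
    if len(start_times) != len(apply_times):
        raise ValueError("start_times and apply_times must have equal length")
    for start, apply in zip(start_times, apply_times):
        # Constraint (2): no ship may start before its application time.
        if start < apply:
            raise ValueError(f"start time {start} precedes application time {apply}")
    return sum(start - apply for start, apply in zip(start_times, apply_times))


# Hypothetical start and application times (minutes) for three ships.
print(total_delay([0, 25, 70], [0, 10, 60]))  # 25
```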

(2): Same-direction ship constraint

In traffic flow, the distance and time between vehicles are crucial to traffic safety and deserve close attention in research [48]. As shown in Figure 3, ships traveling in the same direction in a one-way channel must maintain a safe distance between them. The safe distance $s$ is defined as six times the length of the following ship $j$ [22]. If ships $i$ and $j$ share the same starting position, the scheduled start time of ship $j$ is the start time of ship $i$ plus the safety time interval $t_{s}$. Overtaking is prohibited during navigation to avoid collisions caused by differences in speed or navigational position. The specific formulas are as follows:

s = 6 \times L_{j}

(3)

t_{s} = s / v_{i}

(4)

To prevent two consecutively scheduled ships from overtaking each other in the channel, specific constraints must be applied when the ships travel in the same direction. For instance, if the leading ship is inbound and the following ship is also inbound, or if both ships are outbound, the scheduling start time of the following ship must be restricted to ensure navigational safety. The specific constraints are detailed in Equation (5).

t_{j}^{s} = \max\left(t_{j}^{a},\ t_{i}^{s} + t_{s}\right), \quad \forall i, j \in I

(5)
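The sketch below evaluates Equations (3) to (5) for a single pair of consecutive same-direction ships. It assumes lengths in metres, speeds in metres per minute, and times in minutes; the function name and the numbers are illustrative rather than taken from the paper's data.

```python
def same_direction_start(t_i_start, t_j_apply, length_j, speed_i):
    """Start time t_j^s of follower j behind same-direction leader i, Eqs. (3)-(5).

    Assumed units: metres, metres per minute, minutes.
    """
    s = 6 * length_j                          # Eq. (3): six follower lengths
    t_s = s / speed_i                         # Eq. (4): safety time interval
    return max(t_j_apply, t_i_start + t_s)    # Eq. (5)


print(same_direction_start(t_i_start=30, t_j_apply=20, length_j=200, speed_i=240))  # 35.0
```

With $L_{j} = 200$ m and $v_{i} = 240$ m/min, the safe distance is 1200 m and $t_{s} = 5$ min, so a follower that applied at minute 20 behind a leader starting at minute 30 is held until minute 35; a follower applying later than minute 35 simply starts at its application time.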

(3): Switching constraints for the inbound and outbound process

In a one-way channel, to ensure navigational safety, only ships moving in the same direction may be in the channel at the same time; ships moving in opposite directions cannot navigate the channel simultaneously. The inbound and outbound processes must therefore alternate to ensure both the safety and efficiency of the channel. Specifically, if the following ship moves in the opposite direction to the preceding ship, it must wait until the preceding ship completes its operation before beginning its own. This constraint ensures smooth traffic flow in the channel and prevents potential collision risks, although each switch adds to the overall time and delays subsequent ships. The formula is as follows:

t_{j}^{s} = t_{i}^{s} + S_{i} / v_{i}

(6)

(4): Constraints for vessels in opposite directions

In a one-way channel, when the inbound and outbound processes alternate, a safe separation distance must be maintained between ships. This safe distance, denoted as $s^{'}$, is defined as six times the length of the larger of the two ships. Inbound ships must wait outside the channel entrance at a safe distance until outbound ships have fully passed through the channel; similarly, outbound ships must remain beyond the safe distance from inbound ships. The scheduling start time of the following ship must therefore be delayed by a safety time interval $t_{s}^{'}$ after the start time of the preceding ship. The formulas are as follows:

s^{'} = 6 \times \max (L_{i}, L_{j})

(7)

t_{s}^{'} = \frac{s^{'}}{v_{j}}

(8)

When the sailing directions of the preceding and following ships are different (e.g., the preceding ship is an inbound ship and the following ship is an outbound ship, or the preceding ship is an outbound ship and the following ship is an inbound ship), the scheduling start time of the following ship must be restricted to ensure safety. The following ship must wait for the preceding ship to complete its entire journey to avoid crossing conflicts. This is shown in Equation (9).

t_{j}^{s} = \max\left(t_{j}^{a},\ t_{i}^{s} + t_{s}^{'}\right), \quad \forall i, j \in I

(9)
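A matching sketch for Equations (7) to (9) follows, under the same assumed units. Note that Eq. (8) divides by the speed of the following ship $v_{j}$, whereas Eq. (4) divides by the leading ship's speed $v_{i}$; the code follows the equations as written. The sample lengths and speeds are hypothetical.

```python
def opposite_direction_start(t_i_start, t_j_apply, length_i, length_j, speed_j):
    """Start time t_j^s of follower j after opposite-direction leader i, Eqs. (7)-(9).

    Assumed units: metres, metres per minute, minutes.
    """
    s_prime = 6 * max(length_i, length_j)          # Eq. (7): six lengths of the larger ship
    t_s_prime = s_prime / speed_j                  # Eq. (8): divided by the follower's speed
    return max(t_j_apply, t_i_start + t_s_prime)   # Eq. (9)


print(opposite_direction_start(30, 20, length_i=300, length_j=200, speed_j=200))  # 39.0
```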

(5): Constraints for the ship following process

To ensure ships maintain a constant speed in the channel and avoid collisions, the speeds of both the leading and following ships must be carefully controlled. Specifically, the speed of the following ship must remain within a safe range to prevent it from catching up to or overtaking the leading ship before it reaches its destination. This speed constraint effectively reduces risks during the navigation process, ensuring a safe distance between ships, preventing rear-end collisions, and improving both the safety and efficiency of the channel. This is shown in Equation (10).

\frac{D_{i}}{v_{i}} - \frac{D_{i}}{v_{j} - v_{i}} < 0, v_{j} > v_{i}

(10)
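For $D_{i} > 0$ and $v_{j} > v_{i} > 0$, multiplying Eq. (10) through by the positive quantity $v_{i}(v_{j} - v_{i})/D_{i}$ gives $(v_{j} - v_{i}) - v_{i} < 0$, that is, $v_{j} < 2v_{i}$: the distance cancels, and the condition reduces to the follower being less than twice as fast as the leader. The hypothetical check below evaluates Eq. (10) literally and prints the simplified test alongside it for comparison (units assumed to be metres and metres per minute).

```python
def follower_speed_ok(distance_i, speed_i, speed_j):
    """Constraint (10); it only binds when the follower is faster (v_j > v_i)."""
    if speed_j <= speed_i:
        return True  # a slower or equally fast follower cannot close the gap
    return distance_i / speed_i - distance_i / (speed_j - speed_i) < 0


# With D_i > 0, Eq. (10) agrees with the simplified test v_j < 2 * v_i.
for v_j in (150, 250, 350, 399, 401, 450):
    print(v_j, follower_speed_ok(10_000, 200, v_j), v_j < 2 * 200)
```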

(6): Constraints to avoid berth conflicts

To ensure that a berth is available when a ship enters the channel, the scheduling of inbound ships must align with berth availability. This ensures that inbound ships can smoothly reach their assigned berths and carry out safe operations, thus avoiding delays and safety risks associated with berth occupancy. The constraint is expressed in Equation (11).

(1 - B_{i}) + (1 - IO_{i}) > 0

(11)
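Because $B_{i}$ and $IO_{i}$ are binary, Eq. (11) fails in exactly one case: an inbound ship ($IO_{i} = 1$) whose berth is occupied ($B_{i} = 1$); outbound ships are never blocked by it. The small truth-table sketch below makes this explicit; the function name is illustrative.

```python
def berth_ok(berth_occupied, inbound):
    """Constraint (11): (1 - B_i) + (1 - IO_i) > 0, with B_i, IO_i in {0, 1}."""
    return (1 - berth_occupied) + (1 - inbound) > 0


for b_i in (0, 1):
    for io_i in (0, 1):
        print(f"B_i={b_i} IO_i={io_i} -> {berth_ok(b_i, io_i)}")
```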

The proposed model seeks to optimize vessel scheduling in ports by minimizing the waiting times of ships within the harbor. To accomplish this objective, a series of constraints have been meticulously designed: ensuring that vessels commence their schedules at or after their requested time; maintaining safe distances between consecutively scheduled vessels traveling in the same direction; regulating the alternating passage of inbound and outbound vessels to ensure orderly traffic flow; preserving adequate separation distances between vessels moving in opposite directions; controlling the speed of following vessels to prevent rear-end collisions; and guaranteeing the availability of berths for incoming vessels to avoid conflicts. Collectively, these measures enhance the safety and efficiency of port operations, mitigate unnecessary delays, and elevate the overall level of shipping management.
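To see how the pairwise rules combine along a whole sequence, the sketch below applies Eq. (5) to same-direction pairs and Eq. (9) to opposite-direction pairs, then sums the delays as in Eq. (1). It is a simplified, hypothetical evaluator: it assumes the first ship in the sequence starts at its own application time, and it does not model Eq. (6) or the feasibility checks (10) and (11). The three ships and their parameters are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ship:
    name: str
    apply_time: float   # t_i^a, minutes
    length: float       # L_i, metres
    speed: float        # v_i, metres per minute
    inbound: bool       # True corresponds to IO_i = 1


def evaluate_sequence(order):
    """Return (start times, total delay T) for ships scheduled in the given order."""
    starts = []
    for k, ship in enumerate(order):
        if k == 0:
            start = ship.apply_time  # assumption: the channel is free for the first ship
        else:
            prev, prev_start = order[k - 1], starts[-1]
            if prev.inbound == ship.inbound:
                gap = 6 * ship.length / prev.speed                     # Eqs. (3)-(4)
            else:
                gap = 6 * max(prev.length, ship.length) / ship.speed   # Eqs. (7)-(8)
            start = max(ship.apply_time, prev_start + gap)             # Eqs. (5)/(9)
        starts.append(start)
    total = sum(s - ship.apply_time for s, ship in zip(starts, order))  # Eq. (1)
    return starts, total


a = Ship("A", 0, 100, 200, True)
b = Ship("B", 1, 400, 200, False)
c = Ship("C", 2, 100, 200, True)

fcfs = sorted([a, b, c], key=lambda s: s.apply_time)
print(evaluate_sequence(fcfs))        # ([0, 12.0, 24.0], 33.0)
print(evaluate_sequence([a, c, b]))   # ([0, 3.0, 15.0], 15.0)
```

In this toy instance, scheduling the two inbound ships back to back (A, C, B) gives a total delay of 15 minutes versus 33 minutes for the application-time (FCFS) order, because the 400 m outbound ship imposes a long opposite-direction gap under Eq. (7) each time the direction switches.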

3.2. Solving Ship Scheduling Problems Using Reinforcement Learning Algorithms

3.2.1. Algorithm Framework

The port ship scheduling problem, characterized by numerous variables and constraints and high time complexity, has been proven to be NP-hard [49]. As the number of ships increases, exact mathematical methods often cannot find the optimal solution within an acceptable time. Consequently, heuristic and intelligent algorithms are typically used to explore the solution space and identify a good, if not provably optimal, solution [50]. Heuristic algorithms are widely applied and studied in related fields, offering feasible, though not always optimal, solutions within a short timeframe. In contrast, reinforcement learning algorithms offer distinct advantages in complex environments: although they cannot always guarantee an optimal strategy, particularly for large-scale and intricate problems, they improve continuously through autonomous learning and interaction with the environment, progressively approaching an optimal solution [51].

The optimization of inbound and outbound ship scheduling in a one-way channel is a complex problem involving multivariable planning and multi-stage decision making. The objective is to minimize ship waiting times by determining the order and timing of ship operations. This multi-stage decision problem can be modeled and analyzed as a Markov decision process (MDP), represented by the five-element tuple $\langle S, A, R, P, \gamma \rangle$. Here, the state space $S$ represents the set of possible ship operation sequences; the action space $A$ represents the set of actions available to the agent; the reward space $R$ captures the interaction between the agent and the environment, providing an evaluation mechanism that guides the adjustment of the action strategy; the state transition probability $P$ describes how the environment changes in response to the agent's actions, determining the likelihood of selecting a specific ship; and the discount factor $\gamma$ reflects how strongly the agent weighs future rewards, which in turn shapes action selection. In an MDP, the construction of the reward is crucial. The reward is usually accumulated over time into a long-term return, which can be expressed as

G_{t} = R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \cdots = \sum_{k=0}^{K} \gamma^{k} R_{t+k+1}

(12)

where $G_{t}$ is the accumulated reward at time $t$, $R_{t+1}$ is the reward at time $t+1$, and $K$ is the accumulation duration.
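Equation (12) can be evaluated for a finite episode by accumulating rewards backwards, which avoids computing powers of $\gamma$ explicitly. The rewards and discount factor in this Python sketch are arbitrary illustrative values.

```python
def discounted_return(rewards, gamma):
    """Eq. (12): G_t = sum_k gamma**k * R_{t+k+1} for rewards [R_{t+1}, R_{t+2}, ...]."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g   # Horner-style accumulation from the last reward backwards
    return g


print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75 = 1 + 0.5 + 0.25
```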

The goal of reinforcement learning is to learn an optimal action strategy from the interaction between the agent and the environment, iteratively improving the agent's behavior so as to maximize the long-term accumulated reward. In an MDP, the agent's strategy is evaluated through state value functions and action value functions; the optimal value functions satisfy the following recursive relations [52]:

V(s) = \max_{a \in A} \left( R_{s}^{a} + \gamma \sum_{s_{t+1} \in S} P_{s s_{t+1}}^{a} V(s_{t+1}) \right)

(13)

Q(s, a) = R_{s}^{a} + \gamma \sum_{s_{t+1} \in S} P_{s s_{t+1}}^{a} \max_{a_{t+1} \in A} Q(s_{t+1}, a_{t+1})

(14)

where $s_{t+1}$ is the state at time $t+1$, $R_{s}^{a}$ is the immediate reward for taking action $a$ in state $s$, $P_{s s_{t+1}}^{a}$ is the probability of transitioning from state $s$ to state $s_{t+1}$ when action $a$ is taken, and $Q(s, a)$ is the expected cumulative reward for selecting action $a$ in state $s$. The current optimal state value function and optimal action value function can thus be expressed in terms of the optimal value functions of the next state. Iterating these relations yields convergent value functions, which in turn determine the optimal strategy. Q-learning is one of the most commonly used reinforcement learning algorithms [53]. It determines which action should be taken in each state to maximize the cumulative reward by constructing and updating a Q-table, ultimately finding an optimal or near-optimal solution. It is a simple and effective reinforcement learning method. The basic model of reinforcement learning is shown in Figure 4.
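Equation (14) is what Q-learning approximates from sampled transitions. The sketch below shows the standard one-step tabular Q-learning update, $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$; this is the textbook form, and the paper's own variant (the strategy of gradually approaching the Q value mentioned in Section 1) is not detailed in this section. The learning rate `alpha`, the dictionary-based Q-table, and the tuple state encoding are assumptions made for the example.

```python
from collections import defaultdict


def q_update(Q, state, action, reward, next_state, next_actions, alpha=0.1, gamma=0.9):
    """Standard one-step tabular Q-learning update; returns the new Q(state, action)."""
    # Bootstrapped estimate of the best next action value; 0 if next_state is terminal.
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


Q = defaultdict(float)        # unseen (state, action) pairs start at 0.0
Q[((1,), 2)] = 4.0            # hypothetical value learned earlier
new_value = q_update(Q, (), 1, reward=-1.0, next_state=(1,), next_actions=[2, 3],
                     alpha=0.5, gamma=0.9)
print(new_value)  # 1.3, i.e. 0.5 * (-1.0 + 0.9 * 4.0)
```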

3.2.2. Q-Learning-Based Ship Scheduling Optimization Model Design

The reinforcement learning model in this paper consists of a set of states, a set of actions, and a reward function [53,54,55,56]. The proposed structure is as follows:

(1) State Space Design: The scheduling of ship arrivals and departures can be viewed as a traversal decision of nodes in a graph, which can be considered a variation of the traveling salesman problem (TSP). The ship traversal process is shown in Figure 5. The state in the ship scheduling model is represented by the sequence of all visited ships. The number of states varies depending on the number of nodes in the instance. All ship sequences and their actual start times together form the state set, denoted as S.

S = [\text{ship}_{1}, \text{ship}_{2}, \ldots, \text{ship}_{n}]

(15)

where $n$ is the number of ships.

(2) Action Space Design: In the ship scheduling model for port entry and exit, adjusting each ship’s position and actual start time is considered an action. This adjustment ensures that the safety distance between ships is maintained and that no ship operates before its designated application time. The set of all selectable ship IDs constitutes the action set in the algorithm, denoted by A.

A = [1, 2, \ldots, n]

(16)
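The state and action definitions in Equations (15) and (16) can be illustrated with a short Python sketch. It is not the authors' code: the class `SchedulingState`, the functions `action_set` and `step`, and the separation table `sep` are hypothetical; times are assumed to be minutes after 7:00 a.m.; and a ship's actual start time is assumed to be the later of its requested time and the previous ship's start time plus the required separation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchedulingState:
    """Hypothetical state: ship IDs visited so far and their actual start times."""
    sequence: tuple = ()
    start_times: tuple = ()


def action_set(n):
    """Action set A = [1, 2, ..., n]: every ship ID is a candidate action."""
    return list(range(1, n + 1))


def step(state, ship, request, sep):
    """Append `ship` to the sequence and assign its actual start time.

    request[i]: requested start time of ship i (minutes after 7:00, assumed)
    sep[i][j]: assumed minimum gap (minutes) between ship i and a following ship j
    """
    if state.sequence:
        prev = state.sequence[-1]
        earliest = state.start_times[-1] + sep[prev][ship]
    else:
        earliest = 0
    start = max(request[ship], earliest)  # never earlier than the requested time
    return SchedulingState(state.sequence + (ship,), state.start_times + (start,))
```

Each call to `step` corresponds to one transition of the traversal in Figure 5; a terminal state is reached once all n ship IDs have been appended.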

(3) Action Selection Strategy: Within the rollout framework adopted in this paper, the ε-greedy strategy, the most basic and widely used action selection strategy in reinforcement learning, is used to choose actions. It balances exploration and exploitation: with probability 1 − ε, the action with the highest Q-value is selected (exploitation), and with probability ε, an action is drawn at random from the action set (exploration), so that suboptimal actions still have a chance of being selected.

a = \begin{cases} \arg\max_{a \in A} Q(s, a), & \text{with probability } 1 - \varepsilon \\ \text{random}(A), & \text{with probability } \varepsilon \end{cases}

(17)
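A minimal Python sketch of the ε-greedy rule in Equation (17) follows. The function name `epsilon_greedy` and the dictionary representation of one Q-table row are illustrative assumptions; the experiments in Section 4.1 use ε = 0.01.

```python
import random


def epsilon_greedy(q_row, valid_actions, epsilon, rng=random):
    """Pick an action from `valid_actions` following Equation (17).

    q_row: mapping from action (ship ID) to its current Q-value in this state.
    With probability epsilon a valid action is drawn uniformly at random
    (exploration); otherwise the valid action with the largest Q-value is
    returned (exploitation). Ties go to the first action in the list.
    """
    if not valid_actions:
        raise ValueError("no valid action available")
    if rng.random() < epsilon:
        return rng.choice(valid_actions)
    return max(valid_actions, key=lambda a: q_row[a])
```

Passing a seeded `random.Random` instance as `rng` makes runs reproducible, which is convenient when comparing repeated runs as in Figures 12 and 13.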

(4) Mask Design: Each action that fails to satisfy the constraints is assigned a mask value of 0, which removes it from the original action set; in this way, the mask blocks infeasible actions and retains feasible ones. The mask construction incorporates constraints such as ship speed limitations and berth conflict restrictions [38]. The mask is denoted by M, and its length equals the number of actions.

M = [1, 1, \ldots, 1, 1]

(18)

When an action a ∈ A does not satisfy the constraints, its mask value is M(a) = 0; otherwise, M(a) = 1. The final valid action set is obtained by multiplying A and M element by element, so that masked actions become 0:

A^{\prime} = A \cdot M

(19)
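The masking step of Equations (18) and (19) might look like the sketch below. Only two constraints are modelled, both as assumptions for illustration: a ship already in the sequence cannot be chosen again, and an inbound ship is blocked while an outbound ship at the same berth has not yet been scheduled. The speed constraints mentioned above are omitted, and the function names are hypothetical.

```python
def build_mask(n, visited, direction, berth):
    """Return mask M (Equation (18)) as a list of 0/1 values of length n.

    Assumed constraints (illustrative only):
      * a ship that has already been scheduled cannot be chosen again;
      * berth conflict: an inbound ship is blocked while an outbound ship
        at the same berth has not yet been scheduled to depart.
    direction[i] is "In" or "Out"; berth[i] is the berth number of ship i.
    """
    mask = [1] * n
    for i in range(1, n + 1):
        if i in visited:
            mask[i - 1] = 0
        elif direction[i] == "In" and any(
            direction[j] == "Out" and berth[j] == berth[i] and j not in visited
            for j in range(1, n + 1)
        ):
            mask[i - 1] = 0
    return mask


def valid_actions(mask):
    """Equation (19): element-wise product A * M, keeping the nonzero ship IDs."""
    actions = range(1, len(mask) + 1)
    return [a * m for a, m in zip(actions, mask) if a * m != 0]
```

For example, with three ships where ship 1 (inbound) and ship 2 (outbound) share Berth 3 and ship 3 (inbound) uses another berth, `build_mask(3, set(), direction, berth)` returns `[0, 1, 1]`; once ship 2 is in the sequence, ship 1 becomes selectable again. This mirrors the Ship 1 and Ship 19 case at Berth 3 in Table 1.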

(5) Design of the Wait Time Minimization Reward Function: Based on the established ship scheduling model for port entry and exit, the Q-learning algorithm is employed to minimize the delay between each ship's requested time and the actual start of its operation at the designated position. To fit the Q-learning process, the reward function is adapted to account for both the delay of each individual ship and the total delay of all ships. Moreover, a logarithm is applied to the result, which is intended to keep the calculated value from being negative and to enhance the stability of the algorithm. The corresponding objective function is formulated in, and can be computed using, Formula (8).

R = \log\left( \left( \frac{\bar{t}^{int}}{\sum_{i \in u} (t_{i}^{s} - t_{i}^{a})} \right)^{\beta} \cdot \left( \frac{\bar{t}^{int}}{t_{i}^{delay}} \right)^{\alpha} \right)

(20)

where α and β are parameters used to adjust the reward function, and \bar{t}^{int} represents the average time interval for all ships. Increasing α amplifies the impact of each ship’s delay on the reward, while increasing β amplifies the impact of the total delay.
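Equation (20) can be evaluated as in the following sketch. Two details are assumptions not stated in the text: delays are floored at a hypothetical `floor` value so that a ship with zero delay does not cause a division by zero, and the logarithm is expanded as β·log(·) + α·log(·), which is mathematically equivalent for positive arguments but avoids overflow. The defaults α = 1.5 and β = 2 are the values used in Section 4.1.

```python
import math


def reward(t_int_mean, delays, ship_delay, alpha=1.5, beta=2.0, floor=1.0):
    """Equation (20): R = log((t_int / sum(delays))^beta * (t_int / ship_delay)^alpha).

    t_int_mean: average time interval between ships
    delays: delays (t_s - t_a) of the ships scheduled so far
    ship_delay: delay of the ship being evaluated
    floor: hypothetical lower bound on delays (not specified in the source)
    """
    total = max(sum(delays), floor)
    own = max(ship_delay, floor)
    # log(x^b * y^a) = b*log(x) + a*log(y) for x, y > 0
    return beta * math.log(t_int_mean / total) + alpha * math.log(t_int_mean / own)
```

Smaller total and individual delays give a larger reward, which is what drives the Q-learning search towards low-delay sequences.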

(6) Penalty Function Design: When the resulting sequence fails to meet all ship scheduling requirements, a penalty mechanism is introduced to adjust the outcome. This penalty mechanism can be expressed mathematically as follows:

F(S^{\prime}) = f(S^{\prime}) + P(S^{\prime})

(21)

Here, S' denotes the scheduling sequence, f(S') the objective function, and P(S') the penalty term. In this study, the penalty P is set to a large value of 5000 s.
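The penalized objective in Equation (21) can be sketched as below. The text specifies only that P = 5000 s when the scheduling requirements are not met; treating P(S') as 0 for a feasible sequence and as the full 5000 s otherwise, and passing the objective and the feasibility check in as functions, are assumptions made for illustration.

```python
PENALTY_S = 5000  # penalty P used in this study, in seconds


def penalized_objective(sequence, objective, is_feasible, penalty=PENALTY_S):
    """Equation (21): F(S') = f(S') + P(S').

    objective(sequence) returns f(S'); is_feasible(sequence) checks all
    scheduling requirements. P(S') is assumed to be 0 for a feasible
    sequence and `penalty` otherwise.
    """
    p = 0 if is_feasible(sequence) else penalty
    return objective(sequence) + p
```

Because the penalty is much larger than typical delays, any infeasible sequence scores worse than every feasible one, so it is never kept as the best solution.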

(7) Q-Table Design: In reinforcement learning, the Q-table stores the value of each state-action pair, guiding the agent in selecting the optimal action for a given state. In this paper, the Q-table represents the likelihood of each ship being selected at each decision-making step.

Q = \begin{pmatrix} Q(s_{1}, a_{1}) & Q(s_{1}, a_{2}) & \cdots & Q(s_{1}, a_{n}) \\ Q(s_{2}, a_{1}) & Q(s_{2}, a_{2}) & \cdots & Q(s_{2}, a_{n}) \\ \vdots & \vdots & \ddots & \vdots \\ Q(s_{n}, a_{1}) & Q(s_{n}, a_{2}) & \cdots & Q(s_{n}, a_{n}) \end{pmatrix}

(22)

Following Equation (14), the Q-table is updated as follows:

Q(s_{i}, a_{j}) = Q(s_{i}, a_{j}) - l \, Q(s_{i}, a_{j}) + l \, Q(s_{i}^{\prime}, a_{j}^{\prime})

(23)

where Q(s_i', a_j') is the Q-value of the next state-action pair, and l denotes the learning rate.
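The Q-table of Equation (22) and the update rule of Equation (23) translate directly into Python, as in this sketch. Rows are indexed by decision step and columns by ship, with 0-based indices (an assumption), and the table starts at 0 as in Section 4.1. The update is implemented exactly as Equation (23) is written; the function names are hypothetical.

```python
def init_q_table(n):
    """Equation (22): an n x n Q-table initialised to 0 (rows: steps, columns: ships)."""
    return [[0.0] * n for _ in range(n)]


def update_q(q, i, j, q_next, lr=0.85):
    """Equation (23): Q(s_i,a_j) <- Q(s_i,a_j) - l*Q(s_i,a_j) + l*Q(s'_i,a'_j).

    q_next is the Q-value of the next state-action pair; lr is the learning
    rate l (0.85 in the experiments).
    """
    q[i][j] = q[i][j] - lr * q[i][j] + lr * q_next
    return q[i][j]
```

Since Equation (23) equals (1 − l)·Q(s_i, a_j) + l·Q(s_i', a_j'), a learning rate of l = 0.85 moves an entry 85% of the way towards the next-state value in a single update.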

3.2.3. Algorithm Process

First, the separation distances between ships are calculated for different scenarios from factors such as inbound or outbound status, speed, ship length, and channel width, which yields a matrix of delay times between ships. The rollout method is then used as the core mechanism for implementing the scheduling strategy. In each scheduling iteration, the algorithm selects an initial ship according to the current policy and then chooses each subsequent ship according to the policy rules, without violating the pre-calculated safety delay times. To enhance exploration, the ε-greedy strategy is applied: with a small probability, a ship is selected at random; otherwise, the current optimal policy is followed. The algorithm also assigns a probability to each decision and records both the decisions and their probabilities for subsequent learning updates. After a complete round of ship scheduling, the quality of the result is evaluated with the predefined reward function (based on total delay time), and if the new solution is better than the best known solution, it replaces it. Through repeated rollout iterations, the algorithm gradually learns more effective ship scheduling strategies for the given scenario. The process is illustrated in Figure 6, where Y indicates that a condition is met and N indicates that it is not.
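The rollout procedure described above can be condensed into the self-contained sketch below. It is a simplified illustration, not the authors' implementation: the separation matrix `sep` is supplied directly instead of being derived from speeds, lengths, and channel width; the mask keeps only unscheduled ships; the episode reward is assumed to be the negative total delay in hours; and the Q-table is updated backwards along each rollout in the form of Equation (23), with the last step using the reward. All names are hypothetical.

```python
import random


def schedule(order, request, sep):
    """Assign actual start times along `order`; return (start_times, total_delay).

    request[s]: requested start time of ship s (minutes)
    sep[i][j]: minimum gap (minutes) between ship i and a following ship j
    """
    starts, prev = {}, None
    for ship in order:
        earliest = request[ship] if prev is None else starts[prev] + sep[prev][ship]
        starts[ship] = max(request[ship], earliest)
        prev = ship
    total_delay = sum(starts[s] - request[s] for s in order)
    return starts, total_delay


def rollout_q_learning(request, sep, iterations=500, lr=0.85, epsilon=0.01, seed=0):
    """Hypothetical rollout + tabular Q-learning search over ship sequences."""
    rng = random.Random(seed)
    ships = sorted(request)
    n = len(ships)
    # q[k][s]: value of choosing ship s at decision step k, initialised to 0
    q = [{s: 0.0 for s in ships} for _ in range(n)]
    best_order, best_delay = None, float("inf")
    for _ in range(iterations):
        order, remaining = [], set(ships)
        for k in range(n):
            valid = sorted(remaining)  # simplified mask: unscheduled ships only
            if rng.random() < epsilon:
                ship = rng.choice(valid)  # explore
            else:
                ship = max(valid, key=lambda s: q[k][s])  # exploit
            order.append(ship)
            remaining.remove(ship)
        _, total = schedule(order, request, sep)
        if total < best_delay:  # keep the best schedule found so far
            best_order, best_delay = order, total
        target = -total / 60.0  # assumed episode reward: negative delay in hours
        # Backward update in the form of Equation (23); the last step uses the reward
        for k in reversed(range(n)):
            s = order[k]
            q[k][s] = q[k][s] - lr * q[k][s] + lr * target
            target = q[k][s]
    return best_order, best_delay


if __name__ == "__main__":
    request = {1: 0, 2: 2, 3: 4}  # toy data: requested times in minutes
    dirs = {1: "In", 2: "Out", 3: "In"}
    sep = {i: {j: 3 if dirs[i] == dirs[j] else 10 for j in dirs} for i in dirs}
    print(rollout_q_learning(request, sep, epsilon=1.0))  # ([1, 3, 2], 12)
```

In the toy instance, same-direction ships need a 3 min gap and opposite-direction ships a 10 min gap. The search groups the two inbound ships together and returns the order [1, 3, 2] with a total delay of 12 min, whereas the arrival order [1, 2, 3] changes direction twice and accumulates 24 min, which illustrates why fewer direction changes reduce delay.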

4. Experimental Case Analysis

4.1. Parameter Input

For model validation and analysis, the Dayao Bay Pier of Dalian Port in Liaoning Province was taken as an example (see Figure 7) [57]. The Dayao Bay Pier is located in the northern part of the Port of Dalian and is served by a single-lane channel 1481.6 m long, as shown in Figure 8a. Under normal circumstances, the anchorage areas include the Dayao Bay Anchorage and the Outer Sea Cargo Anchorage, as illustrated in Figure 8b. The data for the ships in the anchorages on a particular day are listed in Table 1 [57], including the vessel ID, direction (entry or exit), berth number, vessel length, average speed, requested entry/exit time, and sailing distance. It is assumed that scheduling of the first vessel, whether entering or leaving the port, starts at 7:00 a.m., and that berth allocation has already been completed by the terminal operator and meets the mooring requirements of the vessels.

The experimental simulation was conducted on a Windows platform with an Intel i5-8250U CPU and 8 GB of memory. The reinforcement learning development environment included Python 3.10, PyCharm 2022, and Anaconda 3.4.1. The parameters of the reinforcement learning algorithm were set as follows: learning rate l = 0.85 [55], delay impact indices α = 1.5 and β = 2, and exploration rate ε = 0.01. The Q-table was initialized to 0, and the maximum number of iterations was set to 500.

4.2. Results Analysis

The optimal scheduling scheme for the 20 ships obtained by the proposed reinforcement learning algorithm is presented in Table 2, which lists the entry and exit order together with each ship's exact scheduled start time and delay time. The iteration process is illustrated in Figure 9: the algorithm stabilized within 500 iterations, indicating that it can identify near-optimal solutions in a relatively short time and thus improve scheduling efficiency.
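Because a ship's delay is its actual start time minus its requested (setup) time, the delays and their total can be recomputed from the Setup Time and Start Time columns of Table 2. The short script below does this; the helper `to_minutes` and the list `RL_SCHEDULE` are introduced only for illustration.

```python
def to_minutes(hhmm):
    """Convert an 'h:mm' clock time to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


# (ship, setup time, start time) in the scheduling order of Table 2
RL_SCHEDULE = [
    (5, "7:04", "7:04"), (3, "7:07", "7:28"), (16, "7:09", "7:32"),
    (19, "7:11", "7:44"), (4, "7:18", "7:46"), (9, "7:36", "7:49"),
    (7, "7:56", "7:59"), (14, "7:47", "8:00"), (6, "7:40", "8:05"),
    (11, "7:24", "8:09"), (20, "7:00", "8:10"), (17, "7:03", "8:19"),
    (8, "7:49", "8:31"), (2, "7:01", "8:49"), (10, "7:23", "8:50"),
    (13, "7:12", "8:53"), (12, "8:29", "9:02"), (18, "7:01", "9:16"),
    (1, "7:08", "9:19"), (15, "7:38", "9:23"),
]

# Delay of each ship = actual start time - requested (setup) time, in minutes
delays = {ship: to_minutes(start) - to_minutes(setup) for ship, setup, start in RL_SCHEDULE}
total_min = sum(delays.values())
print(total_min, round(total_min / 60, 2))  # 1092 18.2
```

The total, 1092 min or 18.2 h, agrees with the RL total delay of 18.20 h reported in Table 3.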

(1): Reasonableness verification

According to the anchored ship data, Ship 1 and Ship 19 are both associated with Berth 3: Ship 19 is berthed and preparing to depart, while Ship 1 is applying to enter the port, which creates a berth conflict. Under the berth conflict constraint, Ship 19 must complete its departure before Ship 1 enters the port. As shown in Table 2, the optimal scheduling plan schedules the departure of Ship 19 first and the entry of Ship 1 afterwards, confirming that the plan is reasonable and feasible. To further illustrate the scheduling results in Table 2, a Gantt chart of the 20 ships is shown in Figure 10. It presents the optimal entry and exit schedule, including the entry and exit sequence and each ship's waiting and sailing times. The timeline begins at 7:00 a.m. on the day of scheduling and is measured in minutes, covering the entire scheduling process from each ship's entry application to its departure completion.
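The berth-conflict argument can be checked mechanically for a whole sequence. In the sketch below, the hypothetical function `berth_conflicts` lists every pair in which an inbound ship is scheduled before an outbound ship that occupies the same berth; `TABLE1` holds the Direction and Berth Number columns of Table 1, and `RL_ORDER` is the sequence in Table 2.

```python
# Direction and berth number of each ship, from Table 1
TABLE1 = {
    1: ("In", 3), 2: ("In", 6), 3: ("In", 1), 4: ("Out", 5), 5: ("Out", 14),
    6: ("In", 12), 7: ("In", 9), 8: ("Out", 13), 9: ("Out", 6), 10: ("In", 13),
    11: ("In", 17), 12: ("Out", 10), 13: ("In", 5), 14: ("In", 2), 15: ("In", 7),
    16: ("In", 11), 17: ("In", 14), 18: ("In", 10), 19: ("Out", 3), 20: ("In", 4),
}

# RL scheduling order from Table 2
RL_ORDER = [5, 3, 16, 19, 4, 9, 7, 14, 6, 11, 20, 17, 8, 2, 10, 13, 12, 18, 1, 15]


def berth_conflicts(order, table):
    """Return (inbound, outbound) pairs that share a berth but where the
    inbound ship is scheduled before the outbound ship has departed."""
    position = {ship: k for k, ship in enumerate(order)}
    conflicts = []
    for i in order:
        for j in order:
            dir_i, berth_i = table[i]
            dir_j, berth_j = table[j]
            if dir_i == "In" and dir_j == "Out" and berth_i == berth_j and position[i] < position[j]:
                conflicts.append((i, j))
    return conflicts


print(berth_conflicts(RL_ORDER, TABLE1))  # []
```

The empty result shows that, besides the Ship 19 and Ship 1 pair at Berth 3, the other berths shared by an inbound and an outbound ship in Table 1 (Berths 5, 6, 10, 13, and 14) are also handled in the required order.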

(2): Algorithm Superiority

From the perspective of scheduling efficiency, ships in a one-way channel cannot sail in opposite directions simultaneously, so the channel must be clear of ships before the traffic direction changes. Frequent direction changes therefore waste time, and for longer one-way channels, reducing the number of direction changes should in principle shorten the overall scheduling time. Table 3 compares the scheduling schemes for the 20 ships obtained with the first-come-first-served (FCFS) rule, the reinforcement learning (RL) algorithm, and a basic genetic algorithm (GA). The RL schedule requires 7 direction changes, compared with 11 under the FCFS rule, which substantially reduces ship delay time. Table 3 also compares the RL and GA schedules: the RL algorithm reduces the total delay time by about 16% relative to the GA. Figure 11 presents the convergence curves of both algorithms, showing that the RL approach converges faster and reaches shorter waiting times. Figure 12 shows the average runtime of both algorithms over 10 runs: the GA-based scheduling optimization takes nearly 700 s, whereas the RL-based optimization requires only about half that time. Figure 13 shows the range of the best values obtained by both algorithms over 10 runs of the same scheduling model; the gap between the best and worst solutions is smaller for Q-learning than for the GA, indicating greater stability. Together, these results indicate that the RL algorithm finds better scheduling solutions more quickly and more consistently.
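The direction-change counts and relative improvements quoted above can be reproduced from Table 3 with a few lines of Python. The function names are hypothetical; the direction strings encode the Direction columns of the FCFS and RL schedules in scheduling order (I = In, O = Out), and the totals are the Table 3 values in hours.

```python
def direction_changes(directions):
    """Count switches between inbound (I) and outbound (O) in consecutive ships."""
    return sum(a != b for a, b in zip(directions, directions[1:]))


def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


# Direction columns of Table 3, in scheduling order (I = In, O = Out)
FCFS_DIRS = "I I O I I I O I O I I O I I I I O I I O".split()
RL_DIRS = "O I I O O O I I I I I I O I I I O I I I".split()

# Total delay (h) from Table 3
TOTAL_DELAY_H = {"FCFS": 30.37, "RL": 18.20, "GA": 21.90}

print(direction_changes(FCFS_DIRS), direction_changes(RL_DIRS))  # 11 7
print(f"{relative_reduction(TOTAL_DELAY_H['FCFS'], TOTAL_DELAY_H['RL']):.1f}")  # 40.1
print(f"{relative_reduction(TOTAL_DELAY_H['GA'], TOTAL_DELAY_H['RL']):.1f}")  # 16.9
```

The script gives 11 and 7 direction changes, a 40.1% reduction in total delay relative to FCFS, and a 16.9% reduction relative to GA, in line with the "over 40%" and roughly 16% figures quoted in this paper.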

5. Conclusions

To address the ship scheduling optimization problem in a one-way channel, this paper proposes a reinforcement-learning-based ship scheduling optimization model with the objective of minimizing ship waiting times in port. Based on the traffic flow characteristics of different port areas, the model incorporates multiple constraints, including initialization constraints, traffic conversion constraints, time-slot allocation constraints, and berth conflict resolution constraints. A reinforcement learning algorithm was designed and implemented in Python to solve the problem, and targeted simulation scenarios were created to validate the model. The computational results show that, compared with the FCFS scheduling method, the proposed model reduces the number of changes in traffic direction and decreases the total delay time by over 40%. Compared with the genetic algorithm (GA), the proposed model produces schedules with lower total delay in about half the computation time. The proposed approach therefore enables more efficient ship entry and exit at the port. However, the model is simplified, and future research should aim to develop more accurate and adaptable models. Ship scheduling is a complex problem, and this study does not consider dynamic factors such as fluctuating arrival and departure times. As ships continue to grow in size, tides will become a major influence on ship scheduling, and adverse weather will also increase ship delays. The allocation of port resources, such as pilots, tugs, and berths, will likewise be an important factor.
From an algorithmic perspective, Q-learning has inherent limitations: as the number of ships increases, the computational requirements grow and the computation slows down. The algorithm can therefore be further improved to adapt to more scenarios and achieve better performance.

Author Contributions

Conceptualization, R.Z. and M.S.; literature search, R.Z. and M.S.; data curation, M.S. and Q.F.; investigation, R.Z., M.S., and Q.F.; writing–original draft, R.Z. and M.S.; methodology, R.Z. and M.S.; writing–review and editing, R.Z.; visualization, R.Z., M.S., and Q.F.; funding acquisition, Q.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant number 52001134) and the Fujian Provincial Natural Science Foundation (grant numbers 2024J01102 and 2023J01326).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Lalla-Ruiz, E.; Shi, X.; Voss, S. The Waterway Ship Scheduling Problem. Transport. Res. Part D-Transport. Environ. 2018, 60, 191–209. [Google Scholar] [CrossRef]
Zhen, R.; Shi, Z.; Gu, Q.; Yang, S. A Novel Deterministic Search-Based Algorithm for Multi-Ship Collaborative Collision Avoidance Decision-Making. Ocean Eng. 2024, 292, 116524. [Google Scholar] [CrossRef]
Tong, Y.; Zhen, R.; Dong, H.; Liu, J. Identifying Influential Ships in Multi-Ship Encounter Situation Complex Network Based on Improved WVoteRank Approach. Ocean Eng. 2023, 284, 115192. [Google Scholar] [CrossRef]
Jiang, X.; Zhong, M.; Shi, G.; Li, W.; Sui, Y. Vessel Scheduling Model with Resource Restriction Considerations for Restricted Channel in Ports. Comput. Ind. Eng. 2023, 177, 109034. [Google Scholar] [CrossRef]
Liu, B.; Li, Z.-C.; Wang, Y. A Two-Stage Stochastic Programming Model for Seaport Berth and Channel Planning with Uncertainties in Ship Arrival and Handling Times. Transp. Res. Part E Logist. Transp. Rev. 2022, 167, 102919. [Google Scholar] [CrossRef]
Zhang, P. Optimization Study of Ship Scheduling in Unidirectional Channel Considering Ship Extension. Master’s Thesis, Dalian Maritime University, Dalian, China, 2023. [Google Scholar]
Lei, K. Integrated Scheduling Optimization of One-Way Channel Vessel Entry and Exit and Tugboat Distribution. Master’s Thesis, Dalian Maritime University, Dalian, China, 2022. [Google Scholar]
Shang, W.-L.; Zhang, J.; Wang, K.; Yang, H.; Ochieng, W. Can Financial Subsidy Increase Electric Vehicle (EV) Penetration: Evidence from a Quasi-Natural Experiment. Renew. Sustain. Energy Rev. 2024, 190, 114021. [Google Scholar] [CrossRef]
Shang, W.-L.; Chen, Y.; Yu, Q.; Song, X.; Chen, Y.; Ma, X.; Chen, X.; Tan, Z.; Huang, J.; Ochieng, W. Spatio-Temporal Analysis of Carbon Footprints for Urban Public Transport Systems Based on Smart Card Data. Appl. Energy 2023, 352, 121859. [Google Scholar] [CrossRef]
Zhen, R.; Lv, P.; Shi, Z.; Chen, G. A Novel Fuzzy Multi-Factor Navigational Risk Assessment Method for Ship Route Optimization in Costal Offshore Wind Farm Waters. Ocean Coast. Manag. 2023, 232, 106428. [Google Scholar] [CrossRef]
Dong, H.; Zhen, R.; Gu, Q.; Lin, Z.; Chen, J.; Yan, K.; Chen, B. A Novel Collaborative Collision Avoidance Decision Method for Multi-Ship Encounters in Complex Waterways. Ocean Eng. 2024, 313, 119512. [Google Scholar] [CrossRef]
Huang, L.; Wan, C.; Wen, Y.; Song, R.; van Gelder, P. Generation and Application of Maritime Route Networks: Overview and Future Research Directions. IEEE Trans. Intell. Transp. Syst. 2025, 26, 620–637. [Google Scholar] [CrossRef]
Wang, D.; Liao, F. Analysis of First-Come-First-Served Mechanisms in One-Way Car-Sharing Services. Transp. Res. Part B Methodol. 2021, 147, 22–41. [Google Scholar] [CrossRef]
Xia, Z.; Guo, Z.; Wang, W.; Jiang, Y. Joint Optimization of Ship Scheduling and Speed Reduction: A New Strategy Considering High Transport Efficiency and Low Carbon of Ships in Port. Ocean Eng. 2021, 233, 109224. [Google Scholar] [CrossRef]
Xu, G.; Guo, T.; Wu, Z. Optimum Scheduling Model for Ship in/Outbound Harbor in One-Way Traffic Fairway. J. Shanghai Marit. Univ. 2008, 34, 50–153, 157. [Google Scholar]
Zheng, H.; Liu, B.; Deng, C.; Feng, P. Ship Scheduling Optimization in One-Way Channel Bulk Harbor. Oper. Res. Manag. Sci. 2018, 27, 28–37. [Google Scholar]
Xu, D.; Li, C.-L.; Leung, J.Y.-T. Berth Allocation with Time-Dependent Physical Limitations on Vessels. Eur. J. Oper. Res. 2012, 216, 47–56. [Google Scholar] [CrossRef]
Ting, C.-J.; Wu, K.-C.; Chou, H. Particle Swarm Optimization Algorithm for the Berth Allocation Problem. Expert Syst. Appl. 2014, 41, 1543–1550. [Google Scholar] [CrossRef]
Mauri, G.R.; Ribeiro, G.M.; Lorena, L.A.N.; Laporte, G. An Adaptive Large Neighborhood Search for the Discrete and Continuous Berth Allocation Problem. Comput. Oper. Res. 2016, 70, 140–154. [Google Scholar] [CrossRef]
Bai, X.; Li, B.; Xu, X. Sequencing of Ships Entering a Port Based on an Improved Artificial Fish Swarm Algorithm. J. Shanghai Marit. Univ. 2021, 42, 85–90. [Google Scholar]
Li, R.; Zhang, X.; Li, J.; Jiang, L. Application of Self-Learning Genetic Algorithm Based on Reinforcement Learning in Ship Scheduling. J. Dalian Marit. Univ. 2022, 48, 20–30. [Google Scholar] [CrossRef]
Zhang, X.; Lin, J.; Guo, Z.; Chen, X. Vessel Scheduling Optimization Based on Simulated Annealing and Multiple Population Genetic Algorithm. Navig. China 2016, 39, 26–30. [Google Scholar]
Zhang, B.; Zheng, Z.; Wang, D. A Model and Algorithm for Vessel Scheduling through a Two-Way Tidal Channel. Marit. Policy Manag. 2020, 47, 188–202. [Google Scholar] [CrossRef]
Zhang, X.; Li, R.; Lin, J. Vessel Scheduling Optimization in Two-Way Traffic Ports. Navig. China 2018, 41, 36–40. [Google Scholar]
Wang, H.; Tian, W.; Zhang, J.; Li, Y. A Hybrid Self-Organizing Scheduling Method for Ships in Restricted Two-Way Waterways. Brodogradnja 2020, 71, 15–30. [Google Scholar] [CrossRef]
Meisel, F.; Fagerholt, K. Scheduling Two-Way Ship Traffic for the Kiel Canal: Model, Extensions and a Matheuristic. Comput. Oper. Res. 2019, 106, 119–132. [Google Scholar] [CrossRef]
Gong, H. Research on Integrated Dispatching Optimization of Compound Channel and Tugboat. Master’s Thesis, Dalian Maritime University, Dalian, China, 2022. [Google Scholar]
Zhang, B.; Zheng, Z. Model and Algorithm for Vessel Scheduling Optimisation through the Compound Channel with the Consideration of Tide Height. Int. J. Shipp. Transp. Logist. 2021, 13, 445–461. [Google Scholar] [CrossRef]
Wang, W.; Ding, A.; Cao, Z.; Peng, Y.; Liu, H.; Xu, X. Deep Reinforcement Learning for Channel Traffic Scheduling in Dry Bulk Export Terminals. IEEE Trans. Intell. Transp. Syst. 2024, 25, 17547–17561. [Google Scholar] [CrossRef]
Chen, X.; Wu, H.; Han, B.; Liu, W.; Montewka, J.; Liu, R.W. Orientation-Aware Ship Detection via a Rotation Feature Decoupling Supported Deep Learning Approach. Eng. Appl. Artif. Intell. 2023, 125, 106686. [Google Scholar] [CrossRef]
Lin, C.; Zhen, R.; Tong, Y.; Yang, S.; Chen, S. Regional ship collision risk prediction: An approach based on encoder-decoder LSTM neural network model. Ocean Eng. 2024, 296, 117019. [Google Scholar] [CrossRef]
Jonathan, H. Connell and Sridhar Mahadevan. In Robotica; Cambridge University Press: Cambridge, UK, 1999; Volume 17, pp. 229–235. [Google Scholar] [CrossRef]
Lopes Silva, M.A.; de Souza, S.R.; Freitas Souza, M.J.; Bazzan, A.L.C. A Reinforcement Learning-Based Multi-Agent Framework Applied for Solving Routing and Scheduling Problems. Expert Syst. Appl. 2019, 131, 148–171. [Google Scholar] [CrossRef]
Hottung, A.; Kwon, Y.-D.; Tierney, K. Efficient Active Search for Combinatorial Optimization Problems. Available online: https://arxiv.org/abs/2106.05126v3 (accessed on 16 September 2024).
Ottoni, A.L.C.; Nepomuceno, E.G.; de Oliveira, M.S.; de Oliveira, D.C.R. Reinforcement Learning for the Traveling Salesman Problem with Refueling. Complex Intell. Syst. 2022, 8, 2001–2015. [Google Scholar] [CrossRef]
Gambardella, L.M.; Dorigo, M. Ant-Q: A Reinforcement Learning Approach to the Traveling Salesman Problem. In Machine Learning Proceedings 1995; Prieditis, A., Russell, S., Eds.; Morgan Kaufmann: San Francisco, CA, USA, 1995; pp. 252–260. ISBN 978-1-55860-377-6. [Google Scholar]
Duan, H.; Liu, P.; Li, Z.; Tang, D. Variable Speed Limit Control at Freeway Merge Bottlenecks Based on Reinforcement Learning. J. Transp. Syst. Eng. Inf. Technol. 2015, 15, 55–61. [Google Scholar]
Fotuhi, F.; Huynh, N.; Vidal, J.M.; Xie, Y. Modeling Yard Crane Operators as Reinforcement Learning Agents. Res. Transp. Econ. 2013, 42, 3–12. [Google Scholar] [CrossRef]
Huang, H.; Fang, Z.; Wang, Y.; Tang, J.; Fu, X. Analysing Taxi Customer-Search Behaviour Using Copula-Based Joint Model. Transp. Saf. Environ. 2022, 4, tdab033. [Google Scholar] [CrossRef]
Meng, X.; Tang, J.; Yang, F.; Wang, Z. Lane-Changing Trajectory Prediction Based on Multi-Task Learning. Transp. Saf. Environ. 2023, 5, tdac073. [Google Scholar] [CrossRef]
Xiang, Z.; Zhou, D.; Sun, H.; Chu, T. Research on Departure Sequencing in Multi-Airport Terminal Area Based on Reinforcement Learning. Aeronaut. Comput. Tech. 2023, 53, 30–34. [Google Scholar]
Zhu, C.; Zhang, Z. Application of Q-Learning Agent in Aircraft Sequencing at En-Route Intersection. Aeronaut. Comput. Tech. 2015, 45, 68–70. [Google Scholar]
Wang, Y. Ship Traffic Organization in Intersection Waters Based on Reinforcement Learning. Master’s Thesis, Dalian University of Technology, Dalian, China, 2022. [Google Scholar]
Du, J.; Zhao, X.; Guo, L.; Wang, J. Machine Learning-Based Approach to Liner Shipping Schedule Design. J. Shanghai Jiaotong Univ. (Sci.) 2022, 27, 411–423. [Google Scholar] [CrossRef]
Xin, X.; Liu, K.; Zhang, J.; Chen, S.; Wang, H.; Cheng, Z. A Self-Organizing Grouping Approach for Ship Traffic Scheduling in Restricted One-Way Waterway. Mar. Technol. Soc. J. 2019, 53, 83–96. [Google Scholar] [CrossRef]
Jiang, J. Analysis of Domestic Intelligent Ship Traffic Management System. Mar. Equip./Mater. Mark. 2023, 31, 26–28. [Google Scholar] [CrossRef]
Liu, C.; Liu, J.; Zhou, X.; Zhao, Z.; Wan, C.; Liu, Z. AIS Data-Driven Approach to Estimate Navigable Capacity of Busy Waterways Focusing on Ships Entering and Leaving Port. Ocean Eng. 2020, 218, 108215. [Google Scholar] [CrossRef]
Liu, X.; Tang, J.; Yuan, C.; Gao, F.; Ding, X. Examining the Characteristics between Time and Distance Gaps of Secondary Crashes. Transp. Saf. Environ. 2024, 6, tdad014. [Google Scholar] [CrossRef]
Zheng, H.; Xu, H.; Liu, B.; Cao, H. One-Way Channel Ship Inbound Order and Berth Allocation Collaborative Optimization. Oper. Res. Manag. Sci. 2017, 26, 37–45. [Google Scholar]
Naderi, E.; Pourakbari-Kasmaei, M.; Cerna, F.V.; Lehtonen, M. A Novel Hybrid Self-Adaptive Heuristic Algorithm to Handle Single- and Multi-Objective Optimal Power Flow Problems. Int. J. Electr. Power Energy Syst. 2021, 125, 106492. [Google Scholar] [CrossRef]
Naeem, M.; Rizvi, S.T.H.; Coronato, A. A Gentle Introduction to Reinforcement Learning and Its Application in Different Fields. IEEE Access 2020, 8, 209320–209344. [Google Scholar] [CrossRef]
Clifton, J.; Laber, E. Q-Learning: Theory and Applications. Annu. Rev. Stat. Its Appl. 2020, 7, 279–301. [Google Scholar] [CrossRef]
Wang, T.; Zhang, Y.; Hu, X. A Q-Learning Based Hyper-Heuristic Scheduling Algorithm with Multi-Rule Selection for Sub-Assembly in Shipbuilding. Comput. Ind. Eng. 2024, 197, 110567. [Google Scholar] [CrossRef]
Bianchi, R.A.; Ribeiro, C.H.; Costa, A.H. On the Relation between Ant Colony Optimization and Heuristically Accelerated Reinforcement Learning. In Proceedings of the 1st International Workshop on Hybrid Control of Autonomous System, Pasadena, CA, USA, 13 July 2009; AAAI: Palo Alto, CA, USA, 2009; pp. 49–55. [Google Scholar]
Ottoni, A.L.C.; Nepomuceno, E.G.; de Oliveira, M.S. A Response Surface Model Approach to Parameter Estimation of Reinforcement Learning for the Travelling Salesman Problem. J. Control Autom. Electr. Syst. 2018, 29, 350–359. [Google Scholar] [CrossRef]
Júnior, F.C.D.L.; Neto, A.D.D.; De Melo, J.D. Hybrid Metaheuristics Using Reinforcement Learning Applied to Salesman Traveling Problem. In Traveling Salesman Problem, Theory and Applications; IntechOpen: Rijeka, Croatia, 2010; ISBN 953-307-426-4. [Google Scholar]
Zhang, X.; Chen, X.; Ji, M.; Yao, S. Vessel Scheduling Model of a One-Way Port Channel. J. Waterw. Port Coast. Ocean Eng. 2017, 143, 04017009. [Google Scholar] [CrossRef]

Figure 1. Overall flowchart.

Figure 2. Diagram of a one-way channel.

Figure 3. Same-direction sailing of ships in a one-way channel.

Figure 4. Reinforcement learning model.

Figure 5. Illustration of ship traversal.

Figure 6. Algorithm process.

Figure 7. Dayao Bay Pier of Dalian Port.

Figure 8. Dayao Bay illustration: (a) channel; (b) anchorage.

Figure 9. Reinforcement learning optimization iteration process.

Figure 10. Optimal scheduling diagram.

Figure 11. Convergence curves of RL and GA algorithms.

Figure 12. Average computation time of both algorithms over 10 runs of the same scheduling problem.

Figure 13. Range of the best results of both algorithms over 10 runs of the same scheduling problem.

Table 1. Basic data of ships.

Ship Number	Direction	Length (m)	Berth Number	Speed (kn)	Sailing Distance (n mile)	Setup Time (hh:mm)
1	In	148	3	8.0	1.5	7:08
2	In	142.7	6	14.0	1.8	7:01
3	In	84.4	1	8.8	1.3	7:07
4	Out	144.8	5	9.0	1.7	7:18
5	Out	278.7	14	8.3	2.6	7:04
6	In	241.3	12	7.1	2.4	7:40
7	In	115.8	9	11.2	2.1	7:56
8	Out	97.2	13	9.5	2.5	7:49
9	Out	150.4	6	15.5	1.8	7:36
10	In	88.8	13	7.7	2.5	7:23
11	In	130	17	10.7	2.9	7:24
12	Out	108.4	10	11.7	2.2	8:29
13	In	142.7	5	11.5	1.7	7:12
14	In	72	2	9.6	1.4	7:47
15	In	166.2	7	12.3	1.9	7:38
16	In	147.5	11	10.9	2.3	7:09
17	In	334.1	14	12.9	2.6	7:03
18	In	85.5	10	8.6	2.2	7:01
19	Out	145.1	3	16.5	1.5	7:11
20	In	95.2	4	7.8	1.6	7:00

Table 2. Ship delay table.

Ship Number	Direction	Setup Time (hh:mm)	Start Time (hh:mm)	Delay Time (min)
5	Out	7:04	7:04	0
3	In	7:07	7:28	21
16	In	7:09	7:32	23
19	Out	7:11	7:44	33
4	Out	7:18	7:46	28
9	Out	7:36	7:49	13
7	In	7:56	7:59	3
14	In	7:47	8:00	13
6	In	7:40	8:05	25
11	In	7:24	8:09	45
20	In	7:00	8:10	70
17	In	7:03	8:19	76
8	Out	7:49	8:31	42
2	In	7:01	8:49	108
10	In	7:23	8:50	87
13	In	7:12	8:53	101
12	Out	8:29	9:02	33
18	In	7:01	9:16	135
1	In	7:08	9:19	131
15	In	7:38	9:23	105

Table 3. Comparison of FCFS, RL, and GA scheduling schemes for 20 ships.

FCFS				RL				GA
Ship Number	Direction	Setup Time (hh:mm)	Start Time (hh:mm)	Ship Number	Direction	Setup Time (hh:mm)	Start Time (hh:mm)	Ship Number	Direction	Setup Time (hh:mm)	Start Time (hh:mm)
20	In	7:00	7:00	5	Out	7:04	7:04	19	Out	7:11	7:11
18	In	7:01	7:12	3	In	7:07	7:28	16	In	7:09	7:19
5	Out	7:04	7:27	16	In	7:09	7:32	20	In	7:00	7:21
17	In	7:03	7:46	19	Out	7:11	7:44	1	In	7:08	7:24
3	In	7:07	7:58	4	Out	7:18	7:46	9	Out	7:36	7:36
16	In	7:09	8:07	9	Out	7:36	7:49	4	Out	7:18	7:37
19	Out	7:11	8:20	7	In	7:56	7:59	2	In	7:01	7:51
1	In	7:08	8:25	14	In	7:47	8:00	3	In	7:07	7:52
4	Out	7:18	8:36	6	In	7:40	8:05	14	In	7:47	7:53
13	In	7:12	8:48	11	In	7:24	8:09	12	Out	8:29	8:29
11	In	7:24	8:56	20	In	7:00	8:10	15	In	7:38	8:42
9	Out	7:36	9:13	17	In	7:03	8:19	7	In	7:56	8:44
2	In	7:01	9:20	8	Out	7:49	8:31	6	In	7:40	8:48
15	In	7:38	9:27	2	In	7:01	8:49	8	Out	7:49	9:09
6	In	7:40	9:37	10	In	7:23	8:50	5	Out	7:04	9:14
14	In	7:47	9:57	13	In	7:12	8:53	17	In	7:03	9:38
8	Out	7:49	10:06	12	Out	8:29	9:02	13	In	7:12	9:40
10	In	7:23	10:22	18	In	7:01	9:16	11	In	7:24	9:43
7	In	7:56	10:41	1	In	7:08	9:19	18	In	7:01	9:44
12	Out	8:29	10:52	15	In	7:38	9:23	10	In	7:23	9:46
Total delay (h)		30.37		18.20				21.90

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
