SSRL: A Clustering-Based Reinforcement Learning Approach for Efficient Ship Scheduling in Inland Waterways

Gan, Shaojun; Wang, Xin; Li, Hongdun

doi:10.3390/sym17050679


by Shaojun Gan ¹, Xin Wang ¹ and Hongdun Li ²,*

¹ School of Metropolitan Transportation, Beijing University of Technology, Beijing 100124, China

² China Academy of Transportation Sciences, Beijing 100029, China

* Author to whom correspondence should be addressed.

Symmetry 2025, 17(5), 679; https://doi.org/10.3390/sym17050679

Submission received: 11 March 2025 / Revised: 18 April 2025 / Accepted: 22 April 2025 / Published: 29 April 2025

(This article belongs to the Section Engineering and Materials)


Abstract

Efficient ship scheduling in inland waterways is critical for maritime transportation safety and economic viability. However, traditional scheduling methods, primarily based on First Come First Served (FCFS) principles, often produce suboptimal results due to their inability to account for complex spatial–temporal dependencies, directional asymmetries, and varying ship characteristics. This paper introduces SSRL (Ship Scheduling through Reinforcement Learning), a novel framework that addresses these limitations by integrating three complementary components: (1) a Q-learning framework that discovers optimal scheduling policies through environmental interaction rather than predefined rules; (2) a clustering mechanism that reduces the high-dimensional state space by grouping similar ship states; and (3) a sliding window approach that decomposes the scheduling problem into manageable subproblems, enabling real-time decision-making. We evaluated SSRL through extensive experiments using both simulated scenarios and real-world data from the Xiaziliang Restricted Waterway in China. Results demonstrate that SSRL reduces total ship waiting time by 90.6% compared with TSRS, 48.4% compared with FAHP-ES, and 32.6% compared with OSS-SW, with an average reduction of 57.2% across these baseline methods. SSRL maintains superior performance across varying traffic densities and uncertainty conditions, with the optimal information window length of 13–14 ships providing the best balance between solution quality and computational efficiency. Beyond performance improvements, SSRL offers significant practical advantages: it requires minimal computation for online implementation, adapts to dynamic maritime environments without manual reconfiguration, and can potentially be extended to other complex transportation scheduling domains.

Keywords: restricted waterways; ship scheduling; reinforcement learning; clustering; sliding window

1. Introduction

1.1. Background

Maritime transportation stands as a cornerstone of global trade, recognized for its superior efficiency, environmental friendliness, and cost-effectiveness compared with other transportation modes [1,2]. Within the maritime sector, inland waterway transportation has emerged as a strategic priority, particularly in China, where it serves as a vital channel for domestic economic circulation and long-distance bulk cargo transport. According to the latest data from China’s Ministry of Transport, the nation’s waterway cargo volume surpassed 9.37 billion tons in 2023, with inland waterway shipping achieving a cargo volume of 4.79 billion tons and a cargo turnover of 2077.254 billion ton-kilometers [3]. Despite representing 8.6% of China’s total freight traffic with an estimated economic impact of 3.0 trillion RMB annually, this proportion remains below optimal levels for a comprehensive transportation system [3,4].

The growing shipping volume particularly impacts the middle and upper reaches of the Yangtze River, where geographical features create natural navigational challenges. In these regions, restricted waterways are defined as specialized one-way traffic systems with narrow, winding channels and torrential water flows, as illustrated in Figure 1. These waterways typically extend for 1 to 5 km and create bottlenecks that limit vessel visibility and maneuverability [5]. To address these challenges, China has established signal stations at each restricted waterway to optimize ship scheduling and minimize waiting times [6], which delivers significant economic benefits by reducing operational costs for shipping companies, environmental advantages through decreased emissions from idling vessels, and operational improvements via enhanced waterway throughput capacity and increased supply chain reliability [7,8].

Two primary approaches exist for improving the efficiency of restricted waterways. The first, channelization, relies on physical infrastructure construction to create straighter, deeper, and wider channels [9]. While this method can substantially improve waterway capacity, it requires significant capital investment, necessitates temporary waterway closures, and may introduce environmental disruptions [10,11]. The second, intelligent ship scheduling, focuses on optimizing ship passing sequences through advanced algorithms [12,13]. Although this method does not increase physical capacity [14], it offers a more cost-effective, environmentally friendly, and immediately deployable solution for improving navigational efficiency without disrupting existing operations [15].

1.2. Traditional Scheduling Methods in Restricted Waterways

Traditional scheduling methods in restricted waterways primarily rely on First Come First Served (FCFS) principles [16,17,18]. While FCFS approaches offer straightforward implementation and perceived fairness, they neglect critical operational factors, including varying vessel types, navigation speeds, and directional asymmetries [19,20]. These rule-based methods are non-machine-learning approaches that rely on predetermined rules. FCFS inherently prioritizes temporal order over system-wide efficiency, leading to suboptimal utilization of waterway capacity, particularly during periods of high traffic density [4,21]. Moreover, FCFS-based methods disregard the substantial time disparities between upstream and downstream navigation, which typically differ by a factor of 2–3 in restricted waterways throughout the Yangtze River system [5]. Such operational inefficiencies create a significant performance gap between current waterway management practices and China’s national strategic goal of developing a ‘strong transportation country’ with optimized inland waterway networks [22].

Several studies have investigated the effectiveness of FCFS-based scheduling. Jian et al. [23] developed a fluency model for restricted waterways using FCFS rules and evaluated its performance through Monte Carlo simulations, considering factors such as ship speed, arrival patterns, and waterway characteristics. Liu et al. [24] constructed a simulation model for the Three Gorges ship lock using the SIVAK platform, demonstrating that FCFS-based scheduling could reduce waiting times compared with sequential lock operations. To enhance FCFS performance, Lalla-Ruiz et al. [20] proposed a heuristic optimization algorithm that allows dynamic adjustment of ship sequences while maintaining the basic FCFS principle. Similarly, Gan et al. [17] extended the FCFS framework by incorporating safety constraints for restricted waterway operations.

While FCFS-based methods offer simplicity and fairness, they often lead to suboptimal solutions as they prioritize temporal equality over system efficiency. These methods fail to account for critical factors such as bi-directional traffic patterns, varying navigation times, and the substantial differences between upstream and downstream passage durations. Yang et al. [21] demonstrated that system-optimal scheduling for vessels passing through a waterway bottleneck could reduce costs significantly compared with non-coordinated FCFS approaches by accounting for both bunker costs and schedule delay penalties.

1.3. Intelligent Optimization Methods in Restricted Waterways

Recent advances in computational intelligence have enabled more sophisticated scheduling approaches to optimize ship sequences based on multiple objectives and constraints. Xin et al. [25] introduced a self-organizing model that first clusters ships based on arrival patterns and then optimizes intra-cluster scheduling to minimize ship-ship interference. Addressing congestion at the Three Gorges Dam, Zhao et al. [8] developed a hybrid meta-heuristic algorithm that simultaneously optimizes ship lift and lock scheduling. Xia et al. [7] proposed an integrated approach combining genetic algorithms with speed optimization to reduce both waiting time and carbon emissions. Liu et al. [4] introduced a fuzzy scheduling optimization method for one-way waterways that employs triangular fuzzy numbers to address uncertainty in vessel speeds and provides an algorithm for determining feasible tidal time windows. Their approach outperformed traditional rule-based methods while maintaining computational efficiency. In another study, Liu et al. [19] developed a ship scheduling model that addresses channel-lock coordination during flood season. Using a multi-population genetic algorithm, they demonstrated significant waiting time reduction in both normal and flood impact scenarios compared with independent scheduling approaches. Zhang et al. [11] proposed a model for ship scheduling in an anchorage-to-quay channel with water discharge restrictions. Their approach integrates ship sequencing at the anchorage, channel allocation, and berth planning while considering water discharge impacts to improve transportation efficiency and navigation safety.

Beyond the meta-heuristic and evolutionary approaches, researchers have further explored reinforcement learning frameworks for ship scheduling problems. Li et al. [26] proposed an adaptive heuristic algorithm based on reinforcement learning (GSAA-RL) for ship scheduling optimization, in which Q-learning dynamically adjusts the crossover and mutation parameters to improve the search ability of the algorithm. Their approach can significantly shorten the total time spent by ships in port compared with existing methods. Wang et al. [27] developed a hierarchical deep reinforcement learning framework for channel traffic scheduling in dry bulk export terminals, considering ship deballasting delays and dynamically switching traffic modes. Their approach is capable of producing integrated scheduling plans while minimizing ship mooring, unberthing, and deballasting delays, demonstrating superior performance compared with both traditional optimization methods and practical scheduling rules.

1.4. Research Gaps and Proposed Approach

Despite these advances, several critical challenges remain in the current literature. Unlike previous studies that allow the simultaneous passage of small ships in opposite directions, restricted waterways on the Yangtze River strictly prohibit bi-directional traffic due to safety concerns. Most existing methods assume variable ship speeds; however, speed adjustments are highly dangerous and practically infeasible in restricted waterways with torrential flows and sharp bends. Current approaches typically focus on arrival times while overlooking the significant variations in passage duration. Additionally, the dynamic nature of waterway traffic demands immediate scheduling decisions, yet most existing algorithms require substantial computational time.

Recent advances in artificial intelligence and machine learning present new opportunities for developing more sophisticated scheduling solutions that can adapt to dynamic maritime environments while maintaining computational efficiency. This paper introduces SSRL (Ship Scheduling through Reinforcement Learning), a novel algorithm that integrates three key components to overcome the technical challenges of ship scheduling in restricted waterways. First, unlike rule-based systems, SSRL employs Q-learning to discover efficient scheduling policies through environmental interaction, allowing the system to adapt to the complex dynamics of waterway traffic without relying on predefined rules. Second, to address the high-dimensional nature of the scheduling problem, SSRL utilizes a fuzzy clustering algorithm to group similar ship states, significantly reducing computational complexity while preserving essential traffic pattern information. Third, SSRL implements a novel sliding window approach that decomposes the scheduling problem into smaller, manageable subproblems by focusing on immediate-vicinity ships, enabling real-time decision-making in dynamic environments.

Our research makes several significant contributions to both the theoretical understanding and practical implementation of intelligent waterway management. We develop a comprehensive mathematical formulation of the restricted waterway scheduling problem that accounts for bi-directional traffic patterns, safety constraints, and varying navigation times. We demonstrate SSRL’s effectiveness through extensive simulation experiments, achieving an average 57.2% reduction in total waiting time compared with existing approaches, including the Traffic Signal Revealing System (TSRS), Online Ship Scheduling algorithm (OSS-SW), and Expert System-based algorithm (FAHP-ES). We also provide practical implementation guidelines, including optimal parameter settings for various traffic conditions, enabling straightforward adoption in real-world waterway management systems.

The remainder of this paper is structured as follows: Section 2 presents the mathematical formulation of the ship scheduling problem in restricted waterways. Section 3 details the proposed SSRL algorithm and its components. Section 4 describes the experimental validation and parameter optimization process. Finally, Section 5 concludes this paper with a summary of findings and future research directions.

2. The Ship Scheduling Problem in Restricted Waterways

The waterway traffic management department establishes a signaling system in each restricted waterway to regulate the ships’ passing sequence (as shown in Figure 2). Unlike on-road traffic light systems, the restricted waterway traffic management system delivers each ship either a ‘go’ signal or a ‘stop’ signal. Upon receiving a ‘go’ signal, a ship is authorized to enter the restricted waterway immediately. Conversely, upon receiving a ‘stop’ signal, the ship must wait outside the restricted waterway until it receives a ‘go’ signal.

Suppose N ships are scheduled to pass through a restricted waterway. The ith ship ($i = 1, 2, 3, \dots, N$) arrives at the restricted waterway border at time $t_{PAT}(i)$ and requires time $t_{PCT}(i)$ to cross the waterway. The signal station must assign each ship an allowed entry time $t_{AET}(i)$ and an allowed crossing time $t_{ACT}(i)$. Here, $t_{AET}(i)$ denotes the permitted time for the ith ship to enter the restricted waterway, while $t_{ACT}(i)$ represents the allocated duration for the ith ship to traverse the restricted waterway. Two metrics are widely adopted to quantify navigational efficiency: (1) total waiting time $t_{TWT}$ and (2) scheduled sequence length $t_{LEN}$. The total waiting time $t_{TWT}$ represents the cumulative delay of all N ships and is defined by Equation (1):

t_{TWT} = \sum_{i = 1}^{N} [t_{AET}(i) - t_{PAT}(i) + t_{ACT}(i) - t_{PCT}(i)]

(1)

The waiting time for the ith ship comprises two components: departure delay $t_{AET}(i) - t_{PAT}(i)$ and traveling delay $t_{ACT}(i) - t_{PCT}(i)$. The departure delay represents the time a ship waits for a ‘go’ signal outside the restricted waterway, while the traveling delay indicates additional time spent within the waterway to prevent overtaking. The scheduled sequence length $t_{LEN}$ quantifies the total time duration required for all N ships to pass through the restricted waterway and is defined by Equation (2):

t_{LEN} = \max [t_{AET}(1) + t_{ACT}(1), t_{AET}(2) + t_{ACT}(2), \dots, t_{AET}(N) + t_{ACT}(N)] - \min [t_{AET}(1), t_{AET}(2), \dots, t_{AET}(N)]

(2)

In this equation, the first term $\max [\cdot]$ identifies the time at which the last ship completes its passage through the restricted waterway, while the second term $\min [\cdot]$ identifies the entry time of the first ship.

Both $t_{TWT}$ and $t_{LEN}$ are metrics for evaluating waterway scheduling efficiency. Although these metrics are not strictly equivalent, minimizing $t_{TWT}$ typically coincides with minimizing $t_{LEN}$ for the same set of ships. The $t_{TWT}$ metric focuses more on operational costs, while $t_{LEN}$ emphasizes waterway capacity. In this research, we adopt $t_{TWT}$ as the performance evaluation metric for the ship scheduling algorithm.
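As a small illustration, the two metrics in Equations (1) and (2) can be computed directly from the four per-ship times. The sketch below is not part of the SSRL implementation; the `Passage` class, the function names, and the three-ship values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class Passage:
    t_pat: float  # arrival time at the waterway border
    t_pct: float  # time required to cross
    t_aet: float  # allowed entry time assigned by the signal station
    t_act: float  # allowed crossing time assigned by the signal station


def total_waiting_time(passages):
    """Equation (1): departure delay plus traveling delay, summed over ships."""
    return sum((p.t_aet - p.t_pat) + (p.t_act - p.t_pct) for p in passages)


def sequence_length(passages):
    """Equation (2): latest exit time minus earliest entry time."""
    last_exit = max(p.t_aet + p.t_act for p in passages)
    first_entry = min(p.t_aet for p in passages)
    return last_exit - first_entry


# Hypothetical three-ship schedule (times in minutes).
example = [
    Passage(t_pat=0, t_pct=10, t_aet=0, t_act=10),  # no delay
    Passage(t_pat=2, t_pct=8, t_aet=3, t_act=8),    # waits 1 min outside
    Passage(t_pat=4, t_pct=5, t_aet=6, t_act=6),    # waits 2 min outside, 1 min inside
]

print(total_waiting_time(example))  # 4
print(sequence_length(example))     # 12
```

In this made-up schedule the last ship leaves at minute 12 and the first enters at minute 0, so $t_{LEN}$ is 12 while $t_{TWT}$ is 4.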

Table 1 demonstrates how optimized scheduling can reduce waiting time while maintaining safety constraints. The FCFS policy (scheduling sequence 1→2→3→4) results in a total waiting time of 125 min. In contrast, the optimal sequence (2→3→1→4) prioritizes faster downstream ships, reducing the waiting time by 81.6% to 23 min. This example confirms the potential efficiency benefit of accounting for directional asymmetries in crossing times, particularly in bi-directional waterways where upstream and downstream navigation times differ substantially. Determining the optimal sequence becomes computationally intensive as the number of ships grows, which calls for more scalable optimization algorithms.

3. The Reinforcement Learning-Based Ship Scheduling Framework

3.1. Preliminaries

3.1.1. Reinforcement Learning

Reinforcement learning (RL) is one of the main domains of machine learning technology [28,29]. The key principle of RL is to develop a decision-making policy based on interactions and feedback from the environment, i.e., acting in an environment and updating the strategy according to the rewards received. This approach is particularly valuable for complex sequential decision problems like ship scheduling, where predetermined rules often fail to adapt to dynamic conditions.

Unlike supervised and unsupervised learning methods, RL learns to maximize the long-term reward of a Markov Decision Process (MDP) by experimenting with different actions and updating action values in response to environmental feedback, thus eliminating the need for predefined data [30]. For waterway management, this means shifting from fixed FCFS principles to adaptive scheduling policies that minimize collective waiting times. Generally, at time t, the agent observes the environmental state $s_t \in S$ and takes an action $a_t \in A$ according to policy $π$, such that $a_t = π(s_t)$. The environment then transitions to a new state $s_{t+1}$ with probability $s_{t+1} \sim p(s_t, a_t)$ and returns a reward $r_t = r(s_t, a_t, s_{t+1})$ to the agent. The agent aims to maximize total cumulative reward by selecting an appropriate sequence of actions. For an infinite-horizon MDP problem, the total discounted reward after time t is defined as:

R_{t} = r_{t} + γ r_{t + 1} + γ^{2} r_{t + 2} + γ^{3} r_{t + 3} + \dots = \sum_{i = 0}^{\infty} γ^{i} r_{t + i}

(3)

where $γ \in [0, 1)$ is the discount factor that balances the weight between immediate and future rewards. When $γ$ approaches 1, the agent treats immediate and future rewards equally. Conversely, when $γ$ approaches 0, the agent prioritizes immediate rewards and discounts future rewards. In the ship scheduling process, this allows balancing between optimizing for immediate traffic conditions and considering longer-term arrival patterns. The expected total reward given environmental state s under policy $π$ at time t is denoted by the state value function:

V_{π} (s) = E_{π} [R_{t} | s_{t} = s] = E_{π} [\sum_{i = 0}^{\infty} γ^{i} r_{t + i} ∣ s_{t} = s]

(4)
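To make the role of the discount factor concrete, the following sketch evaluates a finite truncation of the return in Equation (3). The function name and the reward values are illustrative assumptions; the rewards are written as negative waiting times only to match the scheduling context.

```python
def discounted_return(rewards, gamma):
    """Finite-horizon truncation of Equation (3): sum of gamma**i * r_{t+i}."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    total = 0.0
    # Accumulate from the last reward backwards: R = r + gamma * R_next.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


rewards = [-4.0, -2.0, -1.0]                   # hypothetical rewards
myopic = discounted_return(rewards, 0.0)       # only the first reward counts
farsighted = discounted_return(rewards, 0.5)   # later rewards are halved each step

print(myopic)      # -4.0
print(farsighted)  # -5.25
```

With $γ = 0$ the return equals the immediate reward, while larger values of $γ$ let later rewards contribute more.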

In addition to the state value function, the action value function (also known as Q-function) represents the expected reward starting from state s, taking action a, and following policy $π$:

Q_{π}(s, a) = E_{π} [R_{t} \mid s_{t} = s, a_{t} = a] = E_{π} [\sum_{i = 0}^{\infty} γ^{i} r_{t + i} \mid s_{t} = s, a_{t} = a]

(5)

In practice, Q-learning is considered one of the most effective and efficient RL methods due to its integration of Monte Carlo and Dynamic Programming approaches [30,31]. Q-learning constructs a Q-table to store values for all possible state–action combinations [32], which for waterway management translates to evaluating different ship sequences under various traffic conditions. The value of $Q(s, a)$ represents the expected reward of taking action a in state s. Given accurate Q-table values, the agent can identify the optimal action for each state by selecting the action with the highest Q-value. To determine these values, the Q-learning algorithm iteratively applies the Bellman Equation (6) to update Q-values for all state–action pairs until convergence:

Q^{new}(s, a) \leftarrow (1 - α) Q(s, a) + α (r_{t + 1} + γ \max_{a'} Q(s', a'))

(6)

where $Q^{new}(s, a)$ is the updated Q-value, combining the previous value $Q(s, a)$ with a new estimate $r_{t + 1} + γ \max_{a'} Q(s', a')$. Here, $s'$ represents the next state after taking action a in state s. The learning rate $α \in [0, 1]$ controls how much new information overrides existing information. A higher learning rate prioritizes new information; when $α = 1$, previous information is completely discarded, while $α = 0$ ignores all new information. This iterative process enables the agent to accumulate knowledge about optimal actions across different states, making it ideal for discovering efficient ship scheduling policies that adapt to changing waterway conditions.
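The update in Equation (6) amounts to one line of arithmetic on a table. The sketch below assumes the Q-table is a list of rows indexed by state and then by action; the function name and the numbers are hypothetical.

```python
def q_update(q_table, s, a, reward, s_next, alpha, gamma):
    """Apply Equation (6) in place to q_table[s][a] and return the new value."""
    target = reward + gamma * max(q_table[s_next])  # new estimate
    q_table[s][a] = (1 - alpha) * q_table[s][a] + alpha * target
    return q_table[s][a]


# Two states and two actions, all values hypothetical.
q = [[0.0, 0.0], [1.0, 3.0]]
new_value = q_update(q, s=0, a=1, reward=-2.0, s_next=1, alpha=0.5, gamma=0.5)
print(new_value)  # -0.25
```

Here the new estimate is $-2 + 0.5 \cdot 3 = -0.5$, and with $α = 0.5$ the stored value moves halfway from 0 to that estimate.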

3.1.2. Clustering

Fuzzy C-Means (FCM) is a soft clustering algorithm that assigns membership degrees to each data point across all clusters based on distances between data points and cluster centers [33]. In ship scheduling applications, FCM is critical in discretizing continuous-time parameters, i.e., $t_{PAT}$ and $t_{PCT}$. Given a dataset $X = (x_{1}, x_{2}, \dots, x_{N})$ with N samples to be clustered into C groups, FCM aims to minimize the objective function in Equation (7) iteratively:

J = \sum_{i = 1}^{N} \sum_{j = 1}^{C} μ_{i j}^{m} {∥x_{i} - g_{j}∥}^{2}

(7)

where $m \in [1, \infty)$ is a fuzziness index controlling the partition’s softness. Without prior domain knowledge, m is typically set to 2. The membership value $μ_{ij} \in [0, 1]$ indicates the probability of sample $x_{i}$ belonging to cluster j, such that

\sum_{j = 1}^{C} μ_{ij} = 1, \quad \forall i = 1, 2, \dots, N

(8)

The term $g_{j}$ represents the center of the jth cluster, and $∥\cdot∥$ denotes the similarity measure between data point $x_{i}$ and cluster center $g_{j}$.

The objective function J is minimized when higher membership values are assigned to data points closer to cluster centers. Memberships and cluster centers are updated through the following equations:

μ_{ij} = \frac{1}{\sum_{k = 1}^{C} \left( \frac{∥x_{i} - g_{j}∥}{∥x_{i} - g_{k}∥} \right)^{\frac{2}{m - 1}}}

(9)

g_{j} = \frac{\sum_{i = 1}^{N} μ_{i j}^{m} x_{i}}{\sum_{i = 1}^{N} μ_{i j}^{m}}

(10)

The FCM algorithm is summarized in Algorithm 1. An appropriate number of clusters C is determined, followed by a random selection of initial cluster centers from the dataset. Centers and memberships are then updated iteratively until a stopping criterion is met, such as when membership changes become sufficiently small (Equation (11)) or when centers remain unchanged between consecutive iterations.

\max_{i, j} \{|μ_{ij}^{(k + 1)} - μ_{ij}^{(k)}|\} < ε

(11)

Algorithm 1 FCM Clustering

Input: Number of clusters C, Data set X, Fuzziness index m;

Output: Cluster centers g, Membership matrix $μ$;

1: Randomly select C cluster centers;
2: Calculate the initial memberships;
3: repeat
4:   for $i = 1$ to N do
5:     for $j = 1$ to C do
6:       Update cluster centers according to Equation (10);
7:     end for
8:   end for
9:   for $i = 1$ to N do
10:     for $j = 1$ to C do
11:       Update membership values according to Equation (9);
12:     end for
13:   end for
14: until stopping criterion is satisfied (Equation (11))
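Algorithm 1 can be sketched in a few dozen lines of plain Python. The version below is a minimal illustration under stated assumptions: it uses Euclidean distance between tuples, which is not necessarily the distance used in SSRL, it treats a sample that coincides with a center as a full member of that center, and the helper names, default tolerance, and sample data are all hypothetical.

```python
import random


def _dist(x, g):
    """Euclidean distance between two equal-length tuples (an assumption)."""
    return sum((a - b) ** 2 for a, b in zip(x, g)) ** 0.5


def _memberships(data, centers, m):
    """Equation (9) for every sample and every cluster."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in data:
        d = [_dist(x, g) for g in centers]
        if min(d) == 0.0:
            # The sample sits exactly on a center: give it full membership there.
            hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
            rows.append([h / sum(hits) for h in hits])
        else:
            rows.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return rows


def _centers(data, mu, m, c):
    """Equation (10): membership-weighted mean of the samples."""
    n, dim = len(data), len(data[0])
    out = []
    for j in range(c):
        w = [mu[i][j] ** m for i in range(n)]
        total = sum(w)
        out.append(tuple(sum(w[i] * data[i][k] for i in range(n)) / total
                         for k in range(dim)))
    return out


def fcm(data, c, m=2.0, eps=1e-6, max_iter=200, seed=0):
    """Return (centers, membership matrix) following the loop of Algorithm 1."""
    if m <= 1.0:
        raise ValueError("the exponent in Equation (9) needs m > 1")
    centers = random.Random(seed).sample(data, c)  # random initial centers
    mu = _memberships(data, centers, m)
    for _ in range(max_iter):
        centers = _centers(data, mu, m, c)
        new_mu = _memberships(data, centers, m)
        change = max(abs(a - b)
                     for row_new, row_old in zip(new_mu, mu)
                     for a, b in zip(row_new, row_old))
        mu = new_mu
        if change < eps:  # stopping rule of Equation (11)
            break
    return centers, mu


# Hypothetical one-dimensional data with two well-separated groups.
points = [(0.0,), (0.2,), (0.4,), (9.6,), (9.8,), (10.0,)]
centers, mu = fcm(points, c=2)
```

On this toy data the two centers settle close to the group means, and each row of the membership matrix sums to one, as required by Equation (8).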

3.2. Proposed Scheduling Algorithm

This section describes our approach to establishing a real-time scheduling policy that guides ships through restricted waterways safely and efficiently. Specifically, we develop the scheduling policy using Q-learning while incorporating two key mechanisms to address the complex state and action spaces: (1) a DTW-based FCM clustering algorithm to identify similarities in ships’ states and group them into clusters, and (2) a sliding window mechanism to make the problem computationally tractable. Q-learning is selected as the core framework due to its model-free nature, which eliminates the need for explicit environment dynamics modeling, particularly advantageous for waterway scheduling where vessel interactions and environmental factors create complex dynamics. Additionally, Q-learning’s ability to balance exploration and exploitation is crucial for discovering optimal scheduling policies in the diverse traffic conditions of restricted waterways.

In standard Q-learning, a table stores Q-values for every possible state–action pair in the environment. However, in restricted waterway traffic management, ships’ predicted arrival times ($t_{PAT}$) and crossing times ($t_{PCT}$) are continuous variables, and the number of ships can be large. This results in a state space that is prohibitively large for conventional RL algorithms. The clustering mechanism addresses this dimensionality challenge through state abstraction, transforming high-dimensional continuous ship states into a manageable discrete set of representative clusters. This makes the Q-learning problem computationally tractable and improves generalization, because the algorithm can apply learned policies to novel but similar traffic patterns.

In our SSRL framework, the state space is composed of three key attributes for each ship: predicted arrival time ($t_{PAT}$), crossing time ($t_{PCT}$), and direction (d). For time t, the state of the ith ship is represented as $s_{i}^{t} = [t_{PAT}^{t}(i), t_{PCT}^{t}(i), d(i)]$, where $t_{PAT}^{t}(i)$ represents the predicted arrival time of ship i at time t, $t_{PCT}^{t}(i)$ represents its predicted crossing time, and $d(i)$ indicates its direction (0 for downstream, 1 for upstream). The overall state $S_{t}$ is the collection of N ships’ states, $S_{t} = \{s_{1}^{t}, s_{2}^{t}, \dots, s_{N}^{t}\}$, which forms $3 \times N$-dimensional data.

In our implementation, N is set to 30 based on practical considerations, resulting in $3 \times 30$-dimensional time series data. To reduce this high-dimensional state space, we employ FCM clustering to group similar environmental states into a manageable number of distinct clusters. Dynamic Time Warping (DTW) is adopted to calculate similarities between clusters because, unlike traditional distance metrics, DTW allows for ‘elastic’ transformations in time series data, making it particularly suitable for ship scheduling where temporal alignment is important. DTW can effectively capture the similarities in ship arrival and crossing time patterns regardless of local temporal variations.
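The ‘elastic’ alignment that DTW provides can be illustrated with the classic dynamic-programming recurrence. This is a generic textbook sketch on scalar sequences, not the SSRL code; how the $3 \times N$ ship states are arranged into sequences and which pointwise distance is used are not specified here, so the absolute-difference default and the sample sequences are assumptions.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic O(len(a) * len(b)) dynamic-programming DTW between two sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # b[j-1] matched again
                                    cost[i][j - 1],      # a[i-1] matched again
                                    cost[i - 1][j - 1])  # advance both sequences
    return cost[n][m]


arrivals_a = [0, 5, 10, 20]     # hypothetical arrival pattern
arrivals_b = [0, 5, 5, 10, 20]  # same pattern with one repeated sample
shifted = [1, 6, 11, 21]        # same pattern delayed by one time unit

print(dtw_distance(arrivals_a, arrivals_b))  # 0.0
print(dtw_distance(arrivals_a, shifted))     # 4.0
```

The first pair has zero distance even though the sequences differ in length, which is the local temporal tolerance described above; a uniform shift, by contrast, still accumulates cost.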

After clustering, we identify C distinct clusters. States within the same cluster exhibit greater similarity than states in different clusters. This allows us to replace the high-dimensional ship state representation with a much smaller set of cluster identifiers, significantly reducing the Q-table size. The Q-learning process then proceeds as follows:

Step 1: Initialize a Q-table with C rows (representing state clusters) and A columns (representing possible actions), where $Q(s, a)$ represents the expected reward for taking action a in state s.
Step 2: Generate N ships with appropriate $t_{PAT}$ and $t_{PCT}$ values, calculate similarities with the C clusters, and assign the current state s to the cluster with the highest similarity.
Step 3: Select a ship scheduling sequence a in state s using an $ϵ$-greedy policy: choose the action with the highest $Q(s, a)$ value with probability $1 - ϵ$, or select a random action with probability $ϵ$.
Step 4: Implement the selected sequence a by allocating appropriate $t_{AET}$ and $t_{ACT}$ values to each ship to ensure safety. The environmental reward is computed based on the total waiting time ($t_{TWT}$), incentivizing the agent to improve scheduling efficiency.
Step 5: Update $Q(s, a)$ according to the Bellman Equation (6).
Step 6: Repeat steps 3–5 until convergence criteria are met.

The detailed Q-learning process with FCM state reduction is formalized in Algorithm 2. The reward function is designed to minimize the total waiting time of ships. For each action (ship sequence) a taken in state s, the reward $R(s, a)$ is defined as the negative sum of waiting times for all ships in that sequence:

R (s, a) = - \sum_{i = 1}^{N} [t_{A E T} (i) - t_{P A T} (i) + t_{A C T} (i) - t_{P C T} (i)]

(12)

where $t_{AET}(i) - t_{PAT}(i)$ represents the departure delay (time spent waiting outside the waterway) and $t_{ACT}(i) - t_{PCT}(i)$ represents the traveling delay (additional time spent within the waterway). This reward structure incentivizes the Q-learning agent to prefer ship sequences that result in shorter overall waiting times while maintaining safety constraints. The negative formulation ensures compatibility with standard Q-learning algorithms that aim to maximize cumulative rewards. In our offline training process, this reward signal effectively guides the algorithm to discover scheduling policies that minimize total waiting times.

Algorithm 2 Q-learning with FCM State Reduction

Input: Number of clusters C, Number of actions A, Learning rate $α$, Discount factor $γ$, Exploration rate $ε$, Maximum episodes maxEpisodes;

Output: Optimized Q-table;

1: Initialize Q-table with dimensions $C \times A$ with zeros;
2: for episode = 1 to maxEpisodes do
3:   Generate N ships with appropriate $t_{PAT}$ and $t_{PCT}$ values;
4:   Calculate similarity with the C clusters;
5:   Assign current state s to the cluster with highest similarity;
6:   // Select action using $ε$-greedy policy
7:   if random(0, 1) < $ε$ then
8:     Select random action a;
9:   else
10:     Select action a with highest $Q(s, a)$;
11:   end if
12:   // Implement selected sequence
13:   Allocate appropriate $t_{AET}$ and $t_{ACT}$ values to each ship;
14:   Ensure safety constraints are satisfied;
15:   // Calculate reward based on total waiting time
16:   reward = -totalWaitingTime;
17:   // Observe new state
18:   Calculate similarity with the C clusters;
19:   Assign new state $s'$ to the cluster with highest similarity;
20:   // Update Q-table using Bellman equation
21:   $Q(s, a) = Q(s, a) + α \cdot [reward + γ \cdot \max_{a'} Q(s', a') - Q(s, a)]$;
22:   // Move to next state
23:   $s = s'$;
24:   // Reduce exploration over time
25:   $ε = ε \cdot decay\_rate$;
26: end for
27: return Q-table;
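The control flow of Algorithm 2 can be sketched as a short training loop. The version below is an illustration only: the environment is a hypothetical two-cluster toy problem rather than a waterway simulator, the callables `sample_state`, `step`, and `distance` and all default hyperparameters are assumptions, and the toy run sets $γ = 0$ because the next state in this toy problem does not depend on the chosen action.

```python
import random


def nearest_cluster(state, centers, distance):
    """Map a raw state to the index of its most similar cluster center."""
    return min(range(len(centers)), key=lambda j: distance(state, centers[j]))


def train_q_table(centers, n_actions, sample_state, step, distance,
                  alpha=0.1, gamma=0.9, eps=1.0, decay=0.99,
                  episodes=2000, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in centers]  # C x A table of zeros
    for _ in range(episodes):
        raw = sample_state(rng)                        # generate a traffic state
        s = nearest_cluster(raw, centers, distance)    # reduce it to a cluster id
        if rng.random() < eps:                         # explore
            a = rng.randrange(n_actions)
        else:                                          # exploit
            a = max(range(n_actions), key=lambda k: q[s][k])
        reward, raw_next = step(raw, a, rng)           # reward = -waiting time
        s_next = nearest_cluster(raw_next, centers, distance)
        q[s][a] += alpha * (reward + gamma * max(q[s_next]) - q[s][a])
        eps *= decay                                   # reduce exploration
    return q


# Hypothetical toy environment: states scatter around 0 or 10, and each
# group has one action with a short wait (-1) and one with a long wait (-5).
def sample_state(rng):
    return rng.choice([0.0, 10.0]) + rng.uniform(-1.0, 1.0)


def step(raw, action, rng):
    good = 0 if raw < 5.0 else 1
    reward = -1.0 if action == good else -5.0
    return reward, sample_state(rng)


q = train_q_table([0.0, 10.0], 2, sample_state, step,
                  distance=lambda x, g: abs(x - g), gamma=0.0)
policy = [max(range(2), key=lambda k: row[k]) for row in q]
print(policy)  # [0, 1]
```

In SSRL the `distance` argument would presumably be the DTW similarity and each action a candidate ship sequence; the toy problem only shows how the cluster index replaces the raw state as the Q-table row.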

For a scheduling problem with $N$ ships, the action space consists of $N!$ possible ship sequences, which creates a combinatorial explosion as $N$ increases. To address this challenge, SSRL employs a sliding window mechanism that divides the overall scheduling problem into manageable subproblems [34]. This mechanism utilizes two key parameters: (1) the information window length ($N_{iw}$), which controls the number of ships considered in each subproblem, and (2) the schedule window length ($N_{sw}$), which defines the number of ships actually scheduled in each iteration.

At each decision point, the states of the first $N_{iw}$ ships are considered for sequencing, but only the decisions for the first $N_{sw}$ ships are implemented; the remaining ships are returned to the schedule list for subsequent iterations. This approach reduces the action space from $N!$ to $(N_{iw})!$ possibilities, making the problem computationally tractable while maintaining solution quality. Additionally, SSRL utilizes an $\epsilon$-greedy exploration strategy during learning, selecting a random action with probability $\epsilon$ and the currently known best action with probability $(1 - \epsilon)$, with $\epsilon$ decreasing as learning progresses to shift from exploration to exploitation.

The sliding window approach is particularly suitable for dynamic waterway environments where future arrival information may be incomplete, allowing decisions based on the most current and reliable data while enhancing the agent’s ability to handle uncertainties by focusing each subproblem on ships with more reliable information. Since ships located far from the restricted waterway rarely navigate ahead of closer vessels, these mechanisms effectively transform the global optimization problem into tractable subproblems with sufficient action space exploration without exhaustive search.
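The window mechanics described above can be sketched as follows. This is only an illustration: `choose_sequence` is a hypothetical placeholder for whatever ordering the trained policy returns, and returning uncommitted ships to the list in their original arrival order is an assumption, since the text does not specify that ordering.

```python
def sliding_window_schedule(ships, n_iw, n_sw, choose_sequence):
    """Commit ships in rounds: look at n_iw ships, commit the first n_sw chosen.

    ships           -- list ordered by predicted arrival time
    choose_sequence -- hypothetical callable mapping a window to an ordering
    """
    if not 1 <= n_sw <= n_iw:
        raise ValueError("need 1 <= n_sw <= n_iw")
    pending = list(ships)
    committed = []
    while pending:
        window, rest = pending[:n_iw], pending[n_iw:]
        ordered = list(choose_sequence(window))
        now = ordered[:n_sw]
        committed.extend(now)
        # Assumption: uncommitted ships go back in their original arrival order.
        leftover = [s for s in window if s not in now]
        pending = leftover + rest
    return committed

# Toy policy that simply reverses each window.
order = sliding_window_schedule([1, 2, 3, 4, 5], n_iw=3, n_sw=2,
                                choose_sequence=lambda w: list(reversed(w)))
print(order)  # [3, 2, 5, 4, 1]
```

Each round only ever ranks at most `n_iw` ships, so the number of candidate orderings per decision is bounded by $(N_{iw})!$ however long the full arrival list is.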

$$\begin{aligned} t_{AET}(i+1) &= \max\{t_{AET}(i) + \sigma,\ t_{PAT}(i+1)\} \\ t_{ACT}(i+1) &= \max\{t_{AET}(i) + t_{ACT}(i) - t_{AET}(i+1),\ t_{PCT}(i+1)\} \end{aligned} \tag{13}$$

$$\begin{aligned} t_{AET}(i+1) &= \max\{t_{AET}(i) + t_{ACT}(i) + \sigma,\ t_{PAT}(i+1)\} \\ t_{ACT}(i+1) &= t_{PCT}(i+1) \end{aligned} \tag{14}$$

Once the ship sequence is determined, each ship must be assigned an appropriate entry time ($t_{AET}$) and crossing time ($t_{ACT}$) to ensure safe passage. The scheduling adheres to two fundamental principles: (1) overtaking is prohibited within restricted waterways, and (2) a minimum safety gap must be maintained between consecutive ships. Equations (13) and (14) formalize these constraints for ships traveling in the same and different directions, respectively, where $\sigma$ represents the minimum required safety gap (typically set to 1 min in practice). These equations enforce safety through hard constraints rather than penalty terms in the reward function, ensuring a minimum safety gap between ships while prohibiting both overtaking and simultaneous entry of ships from opposite directions. Because the constraints are applied during action implementation, unsafe actions are excluded from the feasible space. Safety is thus treated as non-negotiable, and the reward function can focus solely on efficiency, incentivizing the agent to discover scheduling policies that minimize operational delays.
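Equations (13) and (14) translate directly into a short routine. The sketch below is illustrative only: the function names and dictionary keys are hypothetical, the first ship is assumed to enter at its predicted arrival time with its predicted crossing time, and the waiting time of a ship is assumed to be its entry delay plus its traveling delay. With $\sigma = 0$, these assumptions reproduce the totals listed in Table 1.

```python
def assign_times(sequence, sigma=1):
    """Assign entry time (aet) and crossing time (act) along a fixed sequence.

    sequence -- list of dicts with keys 'dir', 'pat', 'pct', in scheduled order
    sigma    -- minimum safety gap in minutes
    """
    schedule, prev = [], None
    for ship in sequence:
        if prev is None:
            # Assumption: the first ship is unconstrained.
            aet, act = ship["pat"], ship["pct"]
        elif ship["dir"] == prev["dir"]:
            # Equation (13): same direction, no overtaking.
            aet = max(prev["aet"] + sigma, ship["pat"])
            act = max(prev["aet"] + prev["act"] - aet, ship["pct"])
        else:
            # Equation (14): opposite direction, wait until the waterway clears.
            aet = max(prev["aet"] + prev["act"] + sigma, ship["pat"])
            act = ship["pct"]
        prev = {**ship, "aet": aet, "act": act}
        schedule.append(prev)
    return schedule

def total_wait(schedule):
    """Entry delay plus traveling delay, summed over all ships (assumed definition)."""
    return sum((s["aet"] - s["pat"]) + (s["act"] - s["pct"]) for s in schedule)

# The four ships of Table 1.
ships = {
    1: {"dir": "up", "pat": 14, "pct": 47},
    2: {"dir": "down", "pat": 15, "pct": 18},
    3: {"dir": "down", "pat": 22, "pct": 15},
    4: {"dir": "up", "pat": 42, "pct": 50},
}
fcfs = total_wait(assign_times([ships[i] for i in (1, 2, 3, 4)], sigma=0))
best = total_wait(assign_times([ships[i] for i in (2, 3, 1, 4)], sigma=0))
print(fcfs, best)  # 125 23
```

Because the times are derived from the sequence rather than chosen freely, any sequence the agent selects yields a schedule that already satisfies both safety rules.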

3.3. Computational Complexity and Performance Analysis

Benefiting from the ‘offline training, online application’ paradigm, SSRL resolves the traditional trade-off between accuracy and computational speed. While comprehensive Q-learning is computationally intensive, this process occurs offline, allowing sufficient time to build accurate decision policies. During practical operation, scheduling decisions involve only table lookup operations with $O(1)$ complexity, guaranteeing instantaneous response regardless of traffic complexity.

SSRL demonstrates excellent scalability as the number of ships increases through three key mechanisms: (1) State abstraction: FCM clustering simplifies the high-dimensional state space into a fixed number of representative clusters. This ensures that even when the number of ships increases from $N$ to $N + k$, the number of clusters $C$ remains constant, keeping the Q-table size independent of the total ship count. (2) Sliding window: The algorithm considers only $N_{iw}$ ships in the information window, maintaining computational complexity at the $(N_{iw})!$ level regardless of the total number of ships. This localized decision process allows SSRL to handle theoretically unlimited numbers of ships. This approach is particularly effective in restricted waterways where ships located far from the waterway rarely navigate ahead of closer vessels due to operational constraints. Our experimental design with 30 ships arriving within 1 h already represents an extremely congested scenario, as restricted waterways typically accommodate only 5–10 ships per hour due to their single-direction passage requirement. (3) Offline learning: The Q-learning process is completed offline using simulated data covering various traffic patterns. Once trained, the online deployment computational requirements remain at $O(1)$ for table lookup operations, independent of the total number of ships.
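To give a sense of scale for the sliding window argument, the number of candidate sequences can be computed directly. The snippet below is a small worked calculation using only the window sizes mentioned in this paper; the function name is hypothetical.

```python
import math

def action_space_size(n_ships):
    """Number of distinct orderings of n_ships ships."""
    return math.factorial(n_ships)

full = action_space_size(30)      # all 30 ships sequenced at once
windowed = action_space_size(13)  # information window of 13 ships
small = action_space_size(10)     # information window of 10 ships

print(small)     # 3628800
print(windowed)  # 6227020800
print(f"full / windowed = {full / windowed:.2e}")
```

Even the windowed count is large, which is consistent with the paper's observation that performance degrades once the information window grows beyond roughly 13 to 14 ships.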

In addition to computational efficiency, SSRL achieves an effective balance between scheduling accuracy and ship speed considerations through several mechanisms: (1) Reward function design: By using the negative sum of waiting times as the reward signal, SSRL naturally prioritizes faster ships (typically those traveling downstream with shorter crossing times) while maintaining overall scheduling accuracy. (2) State representation: The state space captures essential information about each ship’s arrival time ($t_{PAT}$), crossing time ($t_{PCT}$), and direction ($d$), providing the algorithm with comprehensive knowledge to balance the advantages of faster ships against overall traffic flow optimization. (3) Sliding window optimization: Our parameter analysis determined that $N_{iw} = 13$ and $N_{sw} = 8$ provide the optimal balance point where the algorithm has adequate foresight without excessive computational overhead or delayed response to faster vessels. (4) Efficient action space search: Despite the large theoretical action space of $(N_{iw})!$, our experimental results demonstrate that SSRL can effectively discover near-optimal ship sequences. This is achieved through the combination of $\epsilon$-greedy exploration, strategic dimensionality reduction, and the inherent guidance provided by the reward signal toward more efficient scheduling arrangements.

These design features collectively enable SSRL to deliver highly optimized scheduling solutions that strategically sequence ships based on their directional speed characteristics while maintaining scheduling accuracy through comprehensive state representation and reward mechanisms. Experimental results confirm this balance achievement in congested scenarios where the tension between prioritizing faster ships and maintaining overall efficiency is highest.

4. Experiments and Results Analysis

Simulations and experiments were conducted to evaluate the effectiveness, efficiency, and robustness of the proposed algorithm. All tests were performed on a platform with an Intel i7 processor and 16 GB of RAM, with no additional software installed to prevent interference.

4.1. Data Description

To simulate real-world scenarios of ships passing through restricted waterways, 30 ships arriving randomly at the restricted waterway border within periods of 1 to 3 h were considered. These scenarios cover a comprehensive range of traffic volumes, from highly congested (1 h) to moderate (2 h) and sparse (3 h) scenarios, ensuring our results are representative across all realistic operational conditions. The crossing time $t_{PCT}$ was sampled from Gaussian distributions $\mathcal{N}(\mu, \sigma^{2})$, where the mean $\mu$ and standard deviation $\sigma$ vary depending on ship direction and waterway characteristics (here $\sigma$ denotes the standard deviation, not the safety gap of Equations (13) and (14)). Based on analysis of AIS data collected from the Xiaziliang Restricted Waterway in China (shown in Figure 1), $(\mu, \sigma)$ were set to (18, 3) min for downstream travel and (49, 5) min for upstream travel.
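A scenario of this kind can be generated with a few lines of Python. The sketch below uses the Gaussian crossing-time parameters stated above; everything else is an assumption made for illustration, namely uniformly distributed arrival times, an equal probability of upstream and downstream travel (`p_down=0.5`), and a 1 min floor on sampled crossing times. The function and key names are hypothetical.

```python
import random

# (mean, standard deviation) of the crossing time in minutes, from the text.
CROSSING_PARAMS = {"down": (18.0, 3.0), "up": (49.0, 5.0)}

def generate_ships(n=30, horizon_min=60.0, p_down=0.5, seed=None):
    """Sample n ships arriving within horizon_min minutes.

    Assumptions: uniform arrivals, direction drawn with probability p_down,
    crossing times clamped to at least 1 minute.
    """
    rng = random.Random(seed)
    ships = []
    for _ in range(n):
        direction = "down" if rng.random() < p_down else "up"
        mu, sd = CROSSING_PARAMS[direction]
        ships.append({
            "dir": direction,
            "pat": rng.uniform(0.0, horizon_min),  # predicted arrival time
            "pct": max(1.0, rng.gauss(mu, sd)),    # predicted crossing time
        })
    ships.sort(key=lambda s: s["pat"])
    for i, ship in enumerate(ships, start=1):
        ship["id"] = i  # IDs follow arrival order, as in Table 2
    return ships

congested = generate_ships(n=30, horizon_min=60.0, seed=42)
print(len(congested), congested[0]["dir"])
```

Changing `horizon_min` to 120 or 180 gives the moderate and sparse scenarios described above.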

4.2. Experimental Results and Parameter Sensitivity

Table 2 details scheduling results for 30 ships arriving at the restricted waterway within one hour under ideal conditions (no uncertainties). The table presents the original ship information and the resulting ship sequences determined by three benchmark methods (TSRS [5], FAHP-ES [5], and OSS-SW [17]) alongside our proposed SSRL algorithm. These established methods were chosen as benchmarks because they specifically address the unique operational constraints of restricted waterways in the Yangtze River system. For the SSRL algorithm, we set $N_{iw} = 10$ and $N_{sw} = 5$, meaning the algorithm considered the first 10 ships’ information while applying scheduling decisions only to the first 5 ships in each step. The results indicate that SSRL generated the ship sequence with the lowest total waiting time (673 min, compared with 998 min for OSS-SW, 1304 min for FAHP-ES, and 7143 min for TSRS). To ensure statistical validity, we conducted 100 independent experiments under identical conditions (30 ships arriving in one hour with no uncertainties), with results shown in Figure 3. The proposed SSRL algorithm consistently achieved the lowest total waiting time with the smallest variance, demonstrating both superior performance and stability compared with conventional approaches.

To comprehensively evaluate the SSRL algorithm’s performance, we designed 12 test cases with varying degrees of uncertainty and congestion (shown in Table 3). $p_{on}$ represents the percentage of ships appearing unexpectedly near the waterway border, while $p_{off}$ indicates the proportion of ships that dock before entering the waterway. To test the algorithm’s ability to handle these uncertainties, we implemented a 10 min notification time constraint. Ships appearing unexpectedly were added to the scheduling list only 10 min before their arrival at the waterway border, while docked ships were removed from the scheduling list 10 min before they stopped. For each case, we conducted 100 independent Monte Carlo simulations with $N_{iw}$ varying from 10 to 30 and $N_{sw}$ from 5 to 15.

Figure 4 illustrates the total waiting time of 30 ships scheduled by the SSRL algorithm as a function of schedule window length ($N_{sw}$) and information window length ($N_{iw}$), which varied from 5 to 10 and 10 to 30, respectively. Each data point represents the mean value from 100 independent simulations. Figure 4a shows the least congested scenario with 30 ships arriving over 180 min, Figure 4b depicts the moderately congested scenario with ships arriving within 120 min, and Figure 4c presents the most congested scenario with all ships arriving within 60 min. The color gradient from blue to red represents increasing waiting times.

As expected, waiting times increased with waterway congestion. In the most congested scenario (Figure 4c), the minimum waiting time was 670 min, compared with 313 and 503 min in Figure 4a and Figure 4b, respectively. The information window length ($N_{iw}$) demonstrated a greater influence on algorithm performance than the schedule window length ($N_{sw}$). As $N_{iw}$ increased, the total waiting time initially decreased before rapidly increasing in all scenarios. Minimum waiting times were consistently achieved when $N_{iw}$ was approximately 13–14 and $N_{sw}$ was 8.

We conducted additional experiments with varying degrees of uncertainty (sudden ship appearances and dockings) to comprehensively evaluate the SSRL algorithm’s robustness. Figure 5 presents these results, where the horizontal axis indicates the ship arrival time window ($t_{PAT}$: 60, 120, or 180 min) and the vertical axis shows the mean waiting time across 100 simulations. The best performance was observed under conditions with no sudden ship appearances but with 10% of ships docking ($p_{on} = 0$ and $p_{off} = 10\%$). Conversely, the highest waiting times occurred when 10% of ships appeared unexpectedly with no dockings ($p_{on} = 10\%$ and $p_{off} = 0$). The two intermediate uncertainty scenarios yielded similar performance levels.

Beyond sliding window parameters, we conducted additional sensitivity analyses on core reinforcement learning parameters. For the learning rate ($\alpha$), our experiments with values ranging from 0.1 to 0.9 revealed that $\alpha = 0.3$ provides the best balance between learning speed and stability. Lower values resulted in slow convergence, while higher values occasionally produced significant oscillations in Q-values. We further enhanced stability by implementing an adaptive learning rate mechanism following $\alpha_{t} = \alpha_{0} / (1 + k \cdot t)$, where $\alpha_{0} = 0.3$ is the initial learning rate, $k = 0.001$ is a decay factor, and $t$ is the episode number.
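The adaptive learning rate is simple enough to evaluate by hand, and a short sketch makes its time scale concrete. The function name below is hypothetical; the formula and the default values are those stated above.

```python
def learning_rate(t, alpha0=0.3, k=0.001):
    """Adaptive learning rate alpha_t = alpha0 / (1 + k * t) for episode t."""
    return alpha0 / (1.0 + k * t)

for episode in (0, 1000, 9000):
    print(episode, round(learning_rate(episode), 4))
```

With $k = 0.001$, the rate halves after 1000 episodes (to about 0.15) and falls to one tenth of its initial value after 9000 episodes (about 0.03), so early updates move the Q-values quickly while later updates mostly refine them.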

The discount factor ($\gamma$) determines how the algorithm balances immediate versus future rewards. Values between 0.8 and 0.95 yielded similar performance, with $\gamma = 0.9$ being optimal in most scenarios. This indicates that both short-term efficiency and long-term strategic planning are important for effective waterway management.

For exploration-exploitation balancing, we compared several approaches, including Boltzmann exploration and fixed $\epsilon$ values. A decaying $\epsilon$-greedy strategy starting with $\epsilon = 0.9$ and diminishing at a rate of 0.995 per episode consistently delivered superior performance. This approach ensures thorough exploration of the complex action space during early training while gradually transitioning to exploitation of learned knowledge.
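To see how quickly this schedule shifts from exploration to exploitation, the decay can be iterated directly. The sketch below assumes the multiplicative update from Algorithm 2 is applied once per episode; the function name and the 0.1 threshold are illustrative choices, not values from the paper.

```python
def episodes_until(threshold, eps0=0.9, decay=0.995):
    """Count episodes of multiplicative decay until epsilon drops below threshold."""
    if not (0.0 < threshold and 0.0 < decay < 1.0):
        raise ValueError("need threshold > 0 and 0 < decay < 1")
    eps, episodes = eps0, 0
    while eps >= threshold:
        eps *= decay
        episodes += 1
    return episodes

print(episodes_until(0.1))  # 439
```

Under these assumptions, the agent still explores in at least 10% of its decisions for roughly the first 440 episodes, after which it acts almost entirely on learned Q-values.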

For the clustering algorithm, we found that 70 clusters provide the optimal balance between state space reduction and information preservation. These parameter optimization findings demonstrate that SSRL performance remains robust across reasonable parameter ranges, making the algorithm suitable for real-world deployment. Our analysis establishes a systematic methodology for parameter optimization that can be applied to other waterway scenarios beyond those tested in this study.

Computational efficiency is another critical factor for real-world applications, as ships must receive signals immediately upon arrival at the waterway border to prevent traffic chaos. The SSRL’s ‘offline training, online application’ paradigm determines optimal ship sequences by simply retrieving the maximum value from a precomputed knowledge table, requiring minimal computation for online scheduling.

4.3. Discussion

Based on the Monte Carlo simulations, the following conclusions regarding the SSRL algorithm’s performance for ship scheduling in restricted waterways are obtained:

(1) The proposed SSRL method consistently outperforms traditional approaches, including FCFS, OSS-SW, and TSRS. This superior performance stems from the Q-learning approach, which constructs a comprehensive lookup table storing values for all possible scheduling sequences under each state. These Q-values, representing the utility of each ship sequence in a given state, theoretically converge to optimal values with probability 1 as all possible sequences are repeatedly sampled across all states.

(2) The optimal information window length ($N_{iw}$) for Xiaziliang restricted waterway traffic management is 13 ships. As shown in Figure 4, $N_{iw}$ significantly impacts the SSRL algorithm’s performance. When $N_{iw}$ varies between 10 and 30, the total waiting time follows a ‘V’ shape, with minimum waiting times occurring when $N_{iw}$ is approximately 13. This pattern emerges because as $N_{iw}$ increases, the number of possible ship sequences grows factorially, making it difficult for the Q-learning algorithm to converge to optimal values. Conversely, if $N_{iw}$ is too small, the search space becomes overly constrained, potentially excluding optimal sequences.

(3) The SSRL algorithm effectively manages restricted waterway uncertainties. Although Figure 5 shows significant variations in waiting times under different uncertainty conditions, this does not indicate a deficiency in handling uncertainties. These variations reflect differences in the number of ships ultimately scheduled: in cases 4–6, three ships dock ($p_{off} = 10\%$) and are removed from the scheduling list, leaving only 27 ships; in cases 7–9, three additional ships appear unexpectedly ($p_{on} = 10\%$), resulting in 33 ships; in the remaining cases, exactly 30 ships are scheduled. The average waiting time per ship remains approximately 16 min across all scenarios, demonstrating the algorithm’s consistent performance regardless of uncertainty type.
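The ship counts quoted in point (3) follow from the case definitions in Table 3. The short calculation below reproduces them; the dictionary layout and function name are hypothetical, and the percentages are applied to the nominal 30 ships as the text describes.

```python
N_NOMINAL = 30

# case number -> (p_on in percent, p_off in percent), following Table 3
CASES = {}
for case in (1, 2, 3):
    CASES[case] = (0, 0)
for case in (4, 5, 6):
    CASES[case] = (0, 10)
for case in (7, 8, 9):
    CASES[case] = (10, 0)
for case in (10, 11, 12):
    CASES[case] = (10, 10)

def ships_scheduled(p_on_pct, p_off_pct, n=N_NOMINAL):
    """Nominal ships plus unexpected arrivals minus ships that dock."""
    return n + n * p_on_pct // 100 - n * p_off_pct // 100

counts = {case: ships_scheduled(*params) for case, params in CASES.items()}
print(counts[1], counts[4], counts[7], counts[10])  # 30 27 33 30
```

Comparing total waiting times across cases therefore mixes two effects, the uncertainty itself and the change in the number of ships, which is why the per-ship average is the more meaningful comparison.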

(4) Total waiting time increases with traffic density. Figure 5c represents the most congested scenario, with 30 ships arriving within 60 min, resulting in the highest waiting time (approximately 35,000 min) across all cases. In contrast, Figure 5a shows the least congested scenario, with 30 ships distributed over 180 min, yielding the lowest waiting time (approximately 25,000 min). This pattern reflects the inherent capacity limitations of restricted waterways, where increased traffic density inevitably leads to greater congestion and correspondingly longer waiting times.

(5) The SSRL algorithm meets the real-time requirements essential for practical ship scheduling applications. Three key characteristics contribute to its computational efficiency: First, the algorithm performs all intensive learning processes offline, with online implementation requiring only simple table lookups. Second, state space dimensionality is substantially reduced through clustering ship information into a manageable number of groups. Finally, the sliding window mechanism further reduces both state and action space dimensions, enhancing computational efficiency.

5. Conclusions

Inefficient ship scheduling represents a significant bottleneck in restricted waterway traffic management. This paper introduces SSRL, a novel ship scheduling algorithm that integrates reinforcement learning with clustering and sliding window mechanisms. Unlike traditional scheduling approaches, SSRL learns optimal scheduling policies offline by evaluating all possible ship sequences and updating knowledge based on performance feedback. This allows online scheduling decisions to be made through efficient table lookups, minimizing computational requirements during operation.

We validated the SSRL algorithm through extensive experimentation comprising 12 cases, with 100 independent experiments conducted for each case. SSRL consistently demonstrated its superiority over conventional approaches in all scenarios, reducing waiting times by 90.6% compared with TSRS, 48.4% compared with FAHP-ES, and 32.6% compared with OSS-SW. Our comprehensive Monte Carlo simulations across various scenarios and parameter configurations further confirmed the algorithm’s effectiveness in managing ship scheduling under different uncertainty conditions. Parameter sensitivity analysis established that an information window length ($N_{iw}$) of 13 ships combined with a schedule window length ($N_{sw}$) of 8 ships provides the optimal balance between solution quality and computational efficiency in the Xiaziliang Restricted Waterway. For reinforcement learning parameters, our analysis determined that a learning rate of $\alpha = 0.3$ (enhanced with adaptive decay where $k = 0.001$), a discount factor of $\gamma = 0.9$, and a decaying $\epsilon$-greedy strategy starting with $\epsilon = 0.9$ and diminishing at a rate of 0.995 per episode consistently deliver superior performance across various traffic conditions. As expected, more congested waterways resulted in longer waiting times, reflecting physical capacity constraints.
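The reported reductions are consistent with the total waiting times listed in Table 2, as the following small check shows. The variable and function names are hypothetical; the four totals are taken from the last row of Table 2.

```python
# Total waiting times in minutes, from the last row of Table 2.
BASELINE_TOTALS = {"TSRS": 7143, "FAHP-ES": 1304, "OSS-SW": 998}
SSRL_TOTAL = 673

def reduction_pct(baseline, proposed):
    """Percentage reduction of `proposed` relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline

reductions = {name: round(reduction_pct(total, SSRL_TOTAL), 1)
              for name, total in BASELINE_TOTALS.items()}
average = round(sum(reduction_pct(t, SSRL_TOTAL)
                    for t in BASELINE_TOTALS.values()) / len(BASELINE_TOTALS), 1)

print(reductions)  # {'TSRS': 90.6, 'FAHP-ES': 48.4, 'OSS-SW': 32.6}
print(average)     # 57.2
```

Note that these figures derive from the single 30-ship, one-hour scenario in Table 2, so they describe that instance rather than an average over all 12 test cases.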

The clustering algorithm and sliding window mechanism introduced in this work effectively address the dimensional challenges of the scheduling problem by reducing state space complexity. Future research could extend this approach by incorporating deep reinforcement learning techniques to handle even larger state spaces, potentially yielding more refined scheduling policies. Additional areas for investigation include adapting the algorithm for multi-waterway coordination and integrating real-time environmental factors such as weather conditions and visibility constraints.

While SSRL was validated using data from the Xiaziliang Restricted Waterway, its application to other restricted waterways with different characteristics requires parameter recalibration. Specifically, the Gaussian distribution parameters for upstream and downstream ships’ crossing times ($\mu$ and $\sigma$) vary based on waterway length, current velocity, and geometric features. However, the fundamental SSRL framework remains applicable across all restricted waterways with similar operational constraints of one-way traffic systems. This research contributes to the field of inland waterway transportation by providing a practical, efficient, and adaptable solution for real-time ship scheduling in restricted waterways, with potential applications extending to other complex transportation scheduling problems.

Author Contributions

Conceptualization, S.G. and H.L.; methodology, S.G. and X.W.; software, X.W.; validation, S.G., X.W. and H.L.; formal analysis, X.W.; investigation, S.G.; resources, H.L.; data curation, X.W.; writing—original draft preparation, S.G.; writing—review and editing, X.W. and H.L.; visualization, X.W.; supervision, H.L.; project administration, H.L.; funding acquisition, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science Foundation of China (Grant No. 62003011) and Open Research Project of Big Data Application Technologies Laboratory, China Academy of Transportation Sciences (Grant No. 2021B1203).

Data Availability Statement

Restrictions apply to the availability of these data. The data were obtained from the Changjiang Waterway Bureau and are available with the permission of the Changjiang Waterway Bureau.

Conflicts of Interest

Author Hongdun Li is employed by the company China Academy of Transportation Sciences. The authors declare that this study received funding from the National Science Foundation of China and Open Research Project of China Academy of Transportation Sciences. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article, or the decision to submit it for publication.

References

1. Buchem, M.; Golak, J.A.P.; Grigoriev, A. Vessel velocity decisions in inland waterway transportation under uncertainty. Eur. J. Oper. Res. 2022, 296, 669–678.
2. Zhang, J.; Wan, C.; He, A.; Zhang, D.; Soares, C.G. A two-stage black-spot identification model for inland waterway transportation. Reliab. Eng. Syst. Saf. 2021, 213, 107677.
3. Yang, W.; Liao, P.; Jiang, S.; Wang, H. Analysis of vessel traffic flow characteristics in inland restricted waterways using multi-source data. arXiv 2024, arXiv:2410.07130.
4. Liu, D.; Shi, G.; Kang, Z. Fuzzy scheduling problem of vessels in one-way waterway. J. Mar. Sci. Eng. 2021, 9, 1064.
5. Liang, S.; Yang, X.; Bi, F.; Ye, C. Vessel traffic scheduling method for the controlled waterways in the upper Yangtze River. Ocean Eng. 2019, 172, 96–104.
6. Gan, S.; Liang, S.; Li, K.; Deng, J.; Cheng, T. Ship trajectory prediction for intelligent traffic management using clustering and ANN. In Proceedings of the 2016 UKACC 11th International Conference on Control (CONTROL), Belfast, UK, 31 August–2 September 2016; pp. 1–6.
7. Xia, Z.; Guo, Z.; Wang, W.; Jiang, Y. Joint optimization of ship scheduling and speed reduction: A new strategy considering high transport efficiency and low carbon of ships in port. Ocean Eng. 2021, 233, 109224.
8. Zhao, X.; Lin, Q.; Yu, H. A Co-Scheduling Problem of Ship Lift and Ship Lock at the Three Gorges Dam. IEEE Access 2020, 8, 132893–132910.
9. Fryirs, K.A.; Brierley, G.J.; Hancock, F.; Cohen, T.J.; Brooks, A.P.; Reinfelds, I.; Cook, N.; Raine, A. Tracking geomorphic recovery in process-based river management. Land Degrad. Dev. 2018, 29, 3221–3244.
10. Lagos, M.S.; Muñoz, J.F.; Suárez, F.I.; Fuenzalida, M.J.; Yáñez-Morroni, G.; Sanzana, P. Investigating the effects of channelization in the Silala River: A review of the implementation of a coupled MIKE-11 and MIKE-SHE modeling system. Wiley Interdiscip. Rev. Water 2024, 11, e1673.
11. Zhang, Y.; Liu, S.; Zheng, Q.; Tian, H.; Guo, W. Ship scheduling problem in an anchorage-to-quay channel with water discharge restrictions. Ocean Eng. 2024, 309, 118432.
12. Zhai, D.; Fu, X.; Xu, H.Y.; Yin, X.F.; Vasundhara, J.; Zhang, W. Multi-Layer Scheduling Optimization for Intelligent Mobility of Maritime Operation. In Proceedings of the 2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA), Kristiansand, Norway, 9–13 November 2020; pp. 1511–1514.
13. Chen, C.; Chen, X. Scheduling optimization in restricted channels based on the agent technology and bayesian network. In Proceedings of the 2017 4th International Conference on Transportation Information and Safety (ICTIS), Banff, AB, Canada, 8–10 August 2017; pp. 291–295.
14. Le Carrer, N.; Ferson, S.; Green, P.L. Optimising cargo loading and ship scheduling in tidal areas. Eur. J. Oper. Res. 2020, 280, 1082–1094.
15. Zhang, X.; Li, R.; Wang, C.; Xue, B.; Guo, W. Robust optimization for a class of ship traffic scheduling problem with uncertain arrival and departure times. Eng. Appl. Artif. Intell. 2024, 133, 108257.
16. Eisen, H.E.; Van der Lei, J.E.; Zuidema, J.; Koch, T.; Dugundji, E.R. An Evaluation of First-Come, First-Served Scheduling in a Geometrically-Constrained Wet Bulk Terminal. Front. Future Transp. 2021, 2, 709822.
17. Gan, S.; Wang, Y.; Li, K.; Liang, S. Efficient online one-way traffic scheduling for restricted waterways. Ocean Eng. 2021, 237, 109515.
18. Gan, S.; Liang, S.; Li, K.; Deng, J.; Cheng, T. Long-term ship speed prediction for intelligent traffic signaling. IEEE Trans. Intell. Transp. Syst. 2016, 18, 82–91.
19. Liu, S.; Zhang, Y.; Guo, W.; Tian, H.; Tang, K. Ship scheduling problem based on channel-lock coordination in flood season. Expert Syst. Appl. 2024, 254, 124393.
20. Lalla-Ruiz, E.; Shi, X.; Voß, S. The waterway ship scheduling problem. Transp. Res. Part D Transp. Environ. 2018, 60, 191–209.
21. Yang, X.; Gu, W.; Wang, S. Optimal scheduling of vessels passing a waterway bottleneck. Ocean Coast. Manag. 2023, 244, 106809.
22. Aritua, B.; Cheng, L.; van Liere, R.; de Leijer, H. Blue Routes for a New Era: Developing Inland Waterways Transportation in China; World Bank Publications: Herndon, VA, USA, 2021.
23. Jian, L.; Xing, Y.; Ke-zhong, L.; Zhi-tao, Y. Study on the fluency of one-way waterway transportation based on First Come First Served (FCFS) model. In Proceedings of the 2015 International Conference on Transportation Information and Safety (ICTIS), Wuhan, China, 25–28 June 2015; pp. 669–674.
24. Liu, Y.; Mou, J.M. Simulation on the traffic capacity of the Three Gorges ship lock based on SIVAK. J. Dalian Marit. Univ. 2015, 41, 37–41.
25. Xin, X.; Liu, K.; Zhang, J.; Chen, S.; Wang, H.; Cheng, Z. A Self-Organizing Grouping Approach for Ship Traffic Scheduling in Restricted One-Way Waterway. Mar. Technol. Soc. J. 2019, 53, 83–96.
26. Li, R.; Zhang, X.; Jiang, L.; Yang, Z.; Guo, W. An adaptive heuristic algorithm based on reinforcement learning for ship scheduling optimization problem. Ocean Coast. Manag. 2022, 230, 106375.
27. Wang, W.; Ding, A.; Cao, Z.; Peng, Y.; Liu, H.; Xu, X. Deep Reinforcement Learning for Channel Traffic Scheduling in Dry Bulk Export Terminals. IEEE Trans. Intell. Transp. Syst. 2024, 25, 17547–17561.
28. Kiran, B.R.; Sobh, I.; Talpaert, V.; Mannion, P.; Al Sallab, A.A.; Yogamani, S.; Pérez, P. Deep reinforcement learning for autonomous driving: A survey. IEEE Trans. Intell. Transp. Syst. 2021, 23, 4909–4926.
29. Haydari, A.; Yilmaz, Y. Deep reinforcement learning for intelligent transportation systems: A survey. IEEE Trans. Intell. Transp. Syst. 2020, 23, 11–32.
30. Fan, J.; Wang, Z.; Xie, Y.; Yang, Z. A theoretical analysis of deep Q-learning. In Proceedings of the Learning for Dynamics and Control, PMLR, Virtual, 13–18 July 2020; pp. 486–489.
31. Tong, Z.; Chen, H.; Deng, X.; Li, K.; Li, K. A scheduling scheme in the cloud computing environment using deep Q-learning. Inf. Sci. 2020, 512, 1170–1191.
32. Zhang, Q.; Lin, M.; Yang, L.T.; Chen, Z.; Khan, S.U.; Li, P. A double deep Q-learning model for energy-efficient edge scheduling. IEEE Trans. Serv. Comput. 2018, 12, 739–749.
33. Lei, T.; Jia, X.; Zhang, Y.; He, L.; Meng, H.; Nandi, A.K. Significantly fast and robust fuzzy c-means clustering algorithm based on morphological reconstruction and membership filtering. IEEE Trans. Fuzzy Syst. 2018, 26, 3027–3041.
34. Mattingley, J.; Wang, Y.; Boyd, S. Receding horizon control. IEEE Control Syst. 2011, 31, 52–65.

Figure 1. Google map of Xiaziliang restricted waterway.

Figure 2. Restricted waterway traffic management system.

Figure 3. Total waiting time of ships across 100 independent tests.

Figure 4. Total waiting time for 30 ships under zero uncertainty scenarios, with ships arriving within: (a) 180 min; (b) 120 min; (c) 60 min.

Figure 5. Total waiting time for ships under 12 different test cases, with ships arriving within: (a) 180 min; (b) 120 min; (c) 60 min.

Table 1. Comparison of FCFS-based ship scheduling policy and the optimal policy.

Original				FCFS				Optimal
ID	Dir	PAT	PCT	ID	AET	ACT	Wait	ID	AET	ACT	Wait
1	Up	14	47	1	14	47	0	2	15	18	0
2	Down	15	18	2	61	18	46	3	22	15	0
3	Down	22	15	3	61	18	42	1	37	47	23
4	Up	42	50	4	79	50	37	4	42	50	0
				Total: 125 min				Total: 23 min

Table 2. Scheduling results of 30 ships by different algorithms.

Original Sequence				TSRS				FAHP-ES				OSS-SW				SSRL
ID	Direction	$t_{PAT}$	$t_{PCT}$	ID	$t_{AET}$	$t_{ACT}$	Delay	ID	$t_{AET}$	$t_{ACT}$	Delay	ID	$t_{AET}$	$t_{ACT}$	Delay	ID	$t_{AET}$	$t_{ACT}$	Delay
1	Downstream	1	18	1	1	18	0	1	1	18	0	4	3	14	0	4	3	14	0
2	Downstream	1	19	2	1	19	0	2	1	19	0	1	3	18	2	6	8	18	0
3	Upstream	2	45	3	20	45	18	14	15	19	0	2	3	19	2	8	11	17	0
4	Downstream	3	14	4	65	14	62	13	15	19	1	6	8	18	0	11	12	16	0
5	Upstream	4	41	5	79	41	75	6	15	19	8	8	11	17	0	10	12	16	1
6	Downstream	8	18	6	120	18	112	12	15	19	3	15	28	40	3	14	15	19	0
7	Upstream	8	51	7	138	51	130	27	57	17	0	16	28	47	2	13	15	19	1
8	Downstream	11	17	8	189	17	178	19	57	18	25	17	29	46	1	12	15	19	3
9	Upstream	11	46	9	206	46	195	28	57	19	0	18	30	48	0	1	15	19	15
10	Downstream	12	15	10	252	15	240	4	57	19	59	9	30	48	21	2	15	19	14
11	Downstream	12	16	11	252	16	240	8	57	19	48	3	30	48	31	25	42	44	0
12	Downstream	15	16	12	252	16	237	10	57	19	49	5	30	48	33	22	42	49	5
13	Downstream	15	18	13	252	18	237	9	76	46	65	7	30	51	22	21	42	49	9
14	Downstream	15	19	14	252	19	237	25	76	46	36	11	81	16	69	23	42	49	6
15	Upstream	25	40	15	271	40	246	15	76	46	57	12	81	16	66	9	42	49	34
16	Upstream	26	47	16	271	47	245	23	76	47	38	10	81	16	70	15	42	49	26
17	Upstream	29	45	17	271	47	244	17	76	47	49	13	81	18	66	16	42	49	18
18	Upstream	30	48	18	271	48	241	16	76	47	50	14	81	19	66	17	42	49	17
19	Downstream	32	18	19	319	18	287	21	76	48	42	24	81	19	46	18	42	49	13
20	Downstream	32	19	20	319	19	287	18	76	48	46	30	81	20	21	3	42	49	44
21	Upstream	34	48	21	338	48	304	29	76	48	18	26	81	20	27	5	42	49	46
22	Upstream	37	49	22	338	49	301	5	76	48	79	27	81	20	27	7	42	51	34
23	Upstream	38	47	23	338	49	302	22	76	49	39	28	81	20	25	30	93	20	33
24	Downstream	41	13	24	387	13	346	7	76	51	68	19	81	20	51	19	93	20	63
25	Upstream	42	44	25	400	44	358	3	76	51	80	20	81	20	50	20	93	20	62
26	Downstream	55	19	26	444	19	389	20	127	19	95	25	101	44	59	24	93	20	59
27	Downstream	57	17	27	444	19	389	26	127	19	72	23	101	47	63	26	93	20	39
28	Downstream	57	19	28	444	19	387	11	127	19	118	21	101	48	67	27	93	20	39
29	Upstream	58	48	29	463	48	405	24	127	19	92	22	101	49	64	28	93	20	37
30	Downstream	60	20	30	511	20	451	30	127	20	67	29	101	49	44	29	113	48	55
Total waiting time				7143 min				1304 min				998 min				673 min
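
The relative reductions quoted in the abstract follow directly from the totals in the last row of Table 2. The sketch below recomputes them; the helper name `reduction_pct` is illustrative and not part of the paper.

```python
# Total waiting times (min) from the last row of Table 2.
totals = {"TSRS": 7143, "FAHP-ES": 1304, "OSS-SW": 998, "SSRL": 673}


def reduction_pct(baseline, proposed):
    """Percentage reduction of `proposed` relative to `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Reduction achieved by SSRL against each baseline method.
raw = {
    name: reduction_pct(total, totals["SSRL"])
    for name, total in totals.items()
    if name != "SSRL"
}
reductions = {name: round(value, 1) for name, value in raw.items()}
average = round(sum(raw.values()) / len(raw), 1)

print(reductions)  # {'TSRS': 90.6, 'FAHP-ES': 48.4, 'OSS-SW': 32.6}
print(average)     # 57.2
```

The average is an unweighted mean of the three percentages, so the large gap to TSRS dominates it.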

Table 3. Test cases with different parameter settings to evaluate the proposed algorithm.

Case	$t_{PAT}$	$p_{off}$	Case	$t_{PAT}$	$p_{on}$	$p_{off}$
1	60 min	0	7	60 min	10%	0
2	120 min	0	8	120 min	10%	0
3	180 min	0	9	180 min	10%	0
4	60 min	10%	10	60 min	10%	10%
5	120 min	10%	11	120 min	10%	10%
6	180 min	10%	12	180 min	10%	10%


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

