1. Introduction
Ports are critical interfaces between maritime and hinterland transport, underpinning global trade, safeguarding energy supply chains, and catalyzing regional economic development [
1]. In recent years, the continued growth in global shipping volumes and the trend toward larger vessels have placed increasingly high demands on port operational safety and efficiency. Large vessels have limited maneuverability in restricted port areas, and their berthing, unberthing, and shifting operations rely heavily on tug assistance to ensure safety and prevent damage to port infrastructure [
2]. Therefore, tugboats are essential equipment for maintaining efficient and secure port operations.
In dynamic and complex port environments, allocating a limited and heterogeneous tugboat fleet in real time to meet berthing, unberthing, and shifting demands is a central challenge for tugboat-scheduling systems. Under the dual-carbon strategy and smart-port initiatives, scheduling policies must improve operational timeliness and reduce economic costs while aligning with environmental protection and sustainability objectives [
3]. Current research on tugboat scheduling focuses on single-objective optimization [
4], typically minimizing either total operating cost or total operation time. However, these asymmetric formulations do not satisfy modern ports’ requirement for the symmetrical consideration and joint optimization of economic and environmental performance [
5]. In addition, prevailing metaheuristics (e.g., simulated annealing [
6] and hybrid evolutionary strategies [
7]) are prone to premature convergence on large-scale instances and struggle to achieve a symmetrical balance between exploration and exploitation, handle dynamic, complex constraints, and coordinate multiple objectives. Meanwhile, tugboat operations entail substantial fuel consumption and significant emissions of pollutants and greenhouse gases [
8]. Consequently, reducing energy use and emissions in port operations has become a major priority for ports worldwide [
9].
“Green and sustainable ports” have become a shared development goal for ports worldwide. The concept encompasses operational practices and infrastructure designs that minimize environmental impact while maintaining economic feasibility, especially in reducing air emissions and carbon footprints. Previous studies have demonstrated various pathways toward this goal. For instance, optimizing vessel speeds to reduce fuel consumption and emissions has been a direct operational focus [
10], while others have explored the adoption of alternative propulsion systems, such as hybrid or electric tugboats, as a long-term technological solution [
5]. The urgency of addressing emissions in port operations is underscored by the substantial contribution of maritime activities to air pollution and greenhouse gases. Tugboat operations, though a supporting service, constitute a notable emission source within the port area due to their high-power, low-speed operational profiles, leading to significant fuel consumption per unit of work [
8]. Therefore, implementing well-informed tugboat-scheduling decisions that strike a symmetry between efficiency and sustainability is crucial for green and sustainable port development.
While various metaheuristics have been applied to tugboat scheduling, they often suffer from premature convergence and the burden of complex parameter tuning [
11]. To address the needs of green port transformation, this paper integrates the energy consumption characteristics of both conventional and electric tugs along with practical operational constraints in ports. A multi-objective mixed-integer linear programming (MILP) model is constructed, aiming to minimize both total operational cost and operational time. At the algorithmic level, a hybrid optimization framework named Jaya-QL is proposed, that is based on a Jaya algorithm integrated with Q-learning. It leverages the parameter-free, population-based global search capability of the Jaya algorithm to efficiently explore the solution space, while the embedded Q-learning module introduces an adaptive intelligence that dynamically refines the search strategy based on continuous feedback (reward) from the dual objectives of cost and time. A problem-specific encoding scheme and fitness function calculation method are designed to accommodate the characteristics of tugboat scheduling. Validation proceeds on small- and medium-scale benchmarks, followed by complex scenarios constructed from vessel-operation data for the main port area of Lianyungang. The experimental results demonstrate that integrating reinforcement learning with conventional optimization yields practical feasibility and measurable performance gains for tugboat scheduling.
The rest of this paper is organized as follows.
Section 2 introduces the related work.
Section 3 presents the problem background and describes the composition of the model. The algorithm is described in
Section 4.
Section 5 shows numerical experiments, and
Section 6 summarizes the findings.
2. Related Work
The tugboat scheduling problem is a complex combinatorial optimization problem driven by stringent constraints. The mathematical modeling of tugboat operations using Mixed-Integer Linear Programming (MILP) has evolved through the increasing refinement of operational constraints and system boundaries.
Historically, early MILP models treated tugboat assignment as a variant of the Parallel Machine Scheduling Problem (PMSP) [
12] or basic Assignment Problems, primarily focusing on matching vessels with available tugboats. As port operations became more dynamic, the development of MILP models shifted toward capturing sophisticated spatial and physical constraints. Key advancements include the transition from single-base models [
13] to multi-berthing base formulations [
14], which explicitly account for the relocation costs and time delays of tugboats moving across different port regions. Furthermore, models have evolved to incorporate tugboat heterogeneity [
15], where matching logic is no longer binary but based on specific ship length, tonnage, and horsepower requirements. Recent MILP developments have also seen the integration of tugboat scheduling with other port resources, such as the integrated dispatching of berths and quay cranes [
16], creating integrated optimization frameworks that reflect the highly coupled nature of modern maritime logistics. By tracing these developments, it is evident that MILP models have moved from static assignment toward high-fidelity, constraint-rich scheduling tools that provide the necessary benchmark for advanced metaheuristics. The increasing sophistication of these formulations underscores the core challenge: allocating limited tugboat resources to time-sequenced ship service requests within dynamic port environments, while satisfying multiple hard constraints and simultaneously optimizing conflicting objectives. Current research addressing this problem primarily focuses on two main directions: the design of model architectures and the development of efficient solution algorithms.
Current research on tugboat scheduling primarily focuses on single-objective optimization. One category of studies considers only operational costs. For example, Abou et al. [
17] established a mixed-integer programming (MIP) model that incorporates constraints related to both pilotage and tugboat operations. The objective of this model is to minimize the maximum waiting time among all vessels. Wei et al. [
18] proposed a MILP model incorporating realistic operational constraints to minimize total operating cost. Jia et al. [
19] employed a network representation and integer programming to allocate tugboats with total operating cost as the objective. A second category of studies concentrates solely on operational efficiency. Wang et al. [
20] studied tugboat allocation at container terminals and proposed a mixed-integer programming model that minimizes the maximum operating time across all tugboats. Wang et al. [
21] minimized vessel turnaround time by jointly optimizing tugboat assignment and vessel sequencing. Kang et al. [
2] modeled container-port tugboat scheduling to minimize total weighted service time on the anchorage–berth leg while accounting for uncertainty in vessel arrivals and towage durations. Ma et al. [
22] constructed a mixed-integer linear programming model with the goal of minimizing the total completion time of battery swapping operations, and proposed a logic-based Benders decomposition (LBBD) algorithm to collaboratively optimize the task allocation of tugboats and the dispatching of battery swapping stations. However, current research on multi-objective optimization for tugboat scheduling remains relatively limited. Wang et al. [
23] developed a multi-objective model for multi-berth tugboat bases to minimize completion time and total fuel consumption under different operating modes. Zhong et al. [
4] examined cross-regional scheduling and proposed an MILP that minimizes both maximum completion time and total fuel consumption. Ren et al. [
24] designed an improved seagull optimization algorithm (SOAPG) to optimize the scheduling of port tugboats, in order to achieve a comprehensive balance among operating costs, operational efficiency and scheduling fairness. In this paper, we formulate a multi-objective tugboat-scheduling model that minimizes total operating cost and total operation time, thereby overcoming the limitations of single-objective approaches. Furthermore, based on the actual operational conditions of the main port area of Lianyungang, we introduce a differentiated cost calculation method that accounts for the distinct characteristics of conventional and electric tugboats.
From the perspective of solution methodologies for tugboat scheduling, metaheuristics have emerged as a dominant research paradigm due to their rapid solving capabilities guided by heuristic functions. Wang et al. [
21] proposed an improved discrete particle swarm optimization (IDPSO) method for the tugboat-assignment problem to minimize vessel turnaround time. Zhu et al. [
25] formulated a mixed-integer programming model to minimize total carbon emissions and solved it efficiently using a variable-neighborhood search algorithm. Wang et al. [
26] employed adaptive large-neighborhood search (ALNS) to optimize tugboat schedules under multi-call, multi-service modes with the goal of minimizing total service cost. Sun et al. [
27] developed an improved genetic algorithm with inversion operations for tugboat scheduling in Zhoushan Port, which demonstrated advantages in both enhancing scheduling quality and reducing computational time. Yao et al. [
28] proposed an improved grey wolf optimizer for efficient tugboat scheduling in multi-berth-base settings. Although metaheuristics demonstrate advantages in computational efficiency, their ability to explore the solution space is constrained by predefined heuristic strategies. This limitation makes them prone to becoming trapped in local optima when tackling large-scale problems. In contrast, reinforcement learning (RL) methods based on dynamic policy iteration exhibit stronger environmental adaptability and greater robustness in managing large state spaces and sparse rewards. Drungilas et al. [
29] modeled the real-time scheduling of Automated Guided Vehicles (AGVs) as a Markov Decision Process and implemented Q-learning for dynamic scheduling. Li et al. [
11] incorporated a Deep Deterministic Policy Gradient (DDPG) algorithm enhanced with prioritized experience replay and a noise suppression mechanism to address tug scheduling in dynamically changing port environments.
Existing studies have extensively explored various methodologies to solve the tugboat scheduling problem. However, traditional metaheuristics often struggle with parameter sensitivity and are prone to premature convergence. Additionally, while reinforcement learning offers adaptability, pure RL methods frequently face challenges such as sparse rewards and slow convergence in high-dimensional discrete search spaces. To address complex and high-dimensional scheduling problems, some researchers have integrated reinforcement learning with metaheuristic algorithms to develop more efficient and intelligent optimization methods. Lu et al. [
30] proposed a hybrid strategy combining four types of metaheuristics with Q-learning, enabling adaptive selection among five local search operators throughout the iterative process. Yu et al. [
31] embedded Q-learning into a meta-heuristic framework to solve energy-efficient multi-objective distributed assembly permutation flow shop scheduling problems. Yu et al. [
32] proposed an optimization framework that embeds Q-learning into meta-heuristic algorithms, which adaptively selects the neighborhood structure through reinforcement learning to achieve the optimal trade-off between scheduling efficiency and energy conservation and emission reduction. The adaptive decision-making mechanism of RL dynamically adjusts the search behavior of metaheuristics, enhancing their exploration capability in complex solution spaces and reducing the risk of premature convergence. Meanwhile, the structured global search framework provided by metaheuristics accelerates the focus on potentially high-quality solution regions, thereby synergistically improving convergence speed and strengthening the guarantee of global optimality. To effectively solve the complex multi-objective and multi-constraint tugboat scheduling problem, this paper develops a cooperative optimization framework that integrates metaheuristics with RL.
Addressing the limitations of existing tugboat scheduling models and optimization algorithms, this paper proposes a multi-objective MILP model that aims to minimize both the total operating cost and total operating time of tugboats. The model explicitly accounts for cost variations among different types of tugboats under diverse operational conditions. Building on this formulation, we propose an integrated multi-objective optimization framework that synergistically integrates the elite solution-guided mechanism of the Jaya algorithm with the adaptive decision-making capability of Q-learning. This hybrid approach establishes a global-local cooperative mechanism tailored for multi-objective tugboat scheduling. Furthermore, a case study based on real data from the main port area of Lianyungang is conducted to validate the effectiveness and practicality of the proposed model and method, offering a new solution for tugboat scheduling problems.
3. Tugboat Scheduling Model
3.1. Tugboat Scheduling Process
The tugboat scheduling process involves multiple entities including the port, tugboats, and vessels. The operational workflow is illustrated in
Figure 1. When a vessel arrives at the anchorage area and idle tugboats are available, the assigned tugboat initiates its journey toward the anchorage. The tugboat then provides berthing assistance, supporting the vessel until it is safely moored at the target berth. Upon completion of the task, the tugboat returns to its base and awaits the next assignment.
Tugboat scheduling is a core decision problem in port operations and a complex optimization challenge. Fundamentally, it entails dynamically matching tugboat resources to vessel-service demands across time and space under multiple objectives and constraints. The objective is to deploy tugboats efficiently to assist berthing, unberthing, and shifting. This ensures that all tasks are completed within designated time windows, thereby maximizing operational efficiency, reducing overall operational costs, and minimizing time expenditure. The problem involves multiple tugboat bases, a heterogeneous tugboat fleet, and varying task volumes with diverse constraints. These include the number of tugboats, horsepower, speed, sailing distances, and differential cost consumption during assisted versus unassisted states. Consequently, effective solutions must integrate these factors to produce feasible schedules that enhance port operational efficiency and economic performance.
3.2. Model Assumption
The scheduling optimization model in this paper contains the following assumptions:
(1) Vessel arrival times are known in advance as ports receive detailed information before tugboat operations commence.
(2) All tugboats are initially stationed at known locations within their bases prior to task initiation.
(3) Tugboats function without failure during service operations.
(4) Tugboats maintain fixed optimal economical speeds according to their horsepower ratings.
(5) Service time depends solely on travel distance and speed; other factors are not considered.
(6) Each tugboat serves only one vessel simultaneously, with available fleet size and grades meeting operational requirements.
(7) The water depth of the port basin is assumed to be consistently sufficient for all vessel types, and the impact of tides is neglected.
3.3. Parameters and Variables
To ensure a rigorous and unambiguous formulation of the tugboat scheduling problem,
Table 1 provides a comprehensive definition of the mathematical nomenclature, categorizing it into sets, parameters, and decision variables. This notation provides the framework for the MILP model presented below, which quantifies total operating cost (Ctotal) and total operation time (Ttotal), and it facilitates the systematic derivation of the constraints and objective functions required by the Jaya-QL optimization algorithm.
3.4. MILP Model
To enhance port operational efficiency, a MILP model has been formulated that minimizes total tugboat operating cost and total operation time while ensuring completion of all tasks. The objective functions are specified in Equations (
1)–(
4).
Equation (
1) minimizes total operating cost. Equation (
2) minimizes total operation time. Equation (
3) defines operating cost as the sum of costs incurred in non-towage and towage states. The product of speed and time effectively translates the temporal duration into an operational distance, which, when multiplied by the distance-based cost coefficient, yields a consistent monetary unit. Equation (
4) specifies operation time as the sum of travel times from the tugboat base to the task origin, from origin to destination, and from the destination back to the base.
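To make the two objective components concrete, the following sketch computes a single tugboat's operation time and operating cost as described for Equations (3) and (4): time is the sum of the three travel legs (base to task origin, origin to destination, destination back to base), and cost applies distance-based coefficients to the non-towage and towage legs. All names and numerical values are hypothetical placeholders, not the paper's notation.

```python
# Illustrative sketch of the per-tugboat objective components (hypothetical names).

def operation_time(d_base_origin, d_origin_dest, d_dest_base, speed):
    """Operation time: the three travel legs at the tugboat's fixed economical speed."""
    return (d_base_origin + d_origin_dest + d_dest_base) / speed

def operation_cost(d_base_origin, d_origin_dest, d_dest_base,
                   c_non_towage, c_towage):
    """Operating cost: non-towage legs (to origin, back to base) plus the towage leg."""
    return c_non_towage * (d_base_origin + d_dest_base) + c_towage * d_origin_dest

t = operation_time(2.0, 5.0, 4.0, speed=10.0)  # hours for an 11 nm round trip
c = operation_cost(2.0, 5.0, 4.0, c_non_towage=30.0, c_towage=80.0)
```

Summing these per-tugboat quantities over all assigned tugboats yields the two objective values described above.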
The constraints are specified in Equations (
5)–(
16):
Equation (
5) ensures that the number of tugboats assigned to each task meets the specific requirement of that task. Equation (
6) specifies that the total horsepower of the tugboats assigned to a task must satisfy the horsepower requirement of that task. Equation (
7) restricts each tugboat to at most one concurrent assignment. Equation (
8) requires tugboats assigned to the first task to arrive at the task origin before the start time. Equations (
9) and (
10) impose sequencing constraints so that each tugboat executes tasks in the planned order. Equation (
11) requires a tugboat to return to its home base before starting the next task. Equation (
12) ensures that the available tugboats at each base are sufficient to meet task demand. Equation (
13) sets each tugboat’s initial location at its home base prior to the first task. Equation (
14) specifies that a tugboat must return to its base if it is not currently assigned to any task. Equations (
15) and (
16) define the domain constraints for the decision variables.
4. Algorithm
The Jaya algorithm is a population-based optimization technique that offers advantages such as parameter-free operation and strong global exploration capabilities. However, when applied in isolation, the Jaya algorithm exhibits limited local exploitation capability and is prone to becoming trapped in local optima, particularly in multimodal optimization problems. This asymmetry between its potent exploration and feeble exploitation restricts its overall performance. Furthermore, its inability to effectively utilize historical search information hinders a comprehensive exploration of the solution space. Q-learning, in contrast, is a reinforcement learning method that adapts to dynamic environments and optimizes policies from experience through sequential decision-making. Through trial-and-error interaction within a Markov decision process (MDP), it iteratively improves the policy. Its strength in learning from historical states and rewards presents a symmetrical complement to the population-wide, moment-driven update of Jaya. To address their individual limitations and harness their symmetrically complementary strengths, we propose a hybrid method based on a Jaya algorithm integrated with Q-learning to solve the tugboat scheduling problem. This integration is designed to establish a symmetrical and cooperative search mechanism. The approach employs multidimensional real-valued encoding to represent tugboat-assignment decisions and designs a fitness function within an event-based modeling framework. The detailed procedure of the algorithm is described as follows.
4.1. Encoding
In traditional binary encoding, vessel and tugboat identifiers are first converted to bit strings and then mapped back during decoding, which increases model complexity and computational burden. In contrast, a multidimensional real-valued encoding directly represents the scheduling logic, thereby simplifying the encoding–decoding process and improving resource-allocation efficiency. Therefore, this paper adopts a multidimensional real-valued representation for the tugboat-assignment problem to avoid the complexity of binary coding. During the encoding process, vessels are first numbered sequentially according to their arrival times, following a first-come-first-served principle where earlier arrivals receive priority in tugboat assignment. The encoding must also satisfy Equations (
5) and (
6) of the mathematical model. The dimension of the decision vector depends on the current number of vessel tasks and the maximum number of tugboats required by any single task. An example of the specific encoding is provided in
Table 2. In the example, six vessels await tugboat service, and a tugboat index of 0 denotes an empty slot (i.e., no tugboat is assigned). For illustration, tugboat IDs 7 and 9 are assigned to vessel 1 (two tugboats), whereas tugboat IDs 4, 11, 12, and 13 are assigned to vessel 2 (four tugboats).
4.2. Initial Individuals and Population
Each solution vector represents a candidate solution, and a set of n individuals forms the population. During Jaya initialization, vessel attributes (e.g., length) are first used to determine, for each vessel, the required bollard pull and the number of tugboats. The algorithm then performs randomized matching from the pool of available tugboats. Each individual encodes the tugboat configuration for all vessels in the current stage. Its dimension is determined by the total number of vessel tasks and the maximum number of tugboats required by any single task. To construct the initial population, we proceed iteratively: for each individual to be generated, the algorithm computes tugboat demand for each vessel from its physical parameters; subject to specification constraints, it then randomly samples the required number of tugboats from the available set, thereby ensuring feasibility of every initial individual. An example of population initialization is illustrated in
Figure 2. Each entry denotes a tugboat identifier; 0 indicates an empty slot (i.e., no tugboat assigned). The horizontal dimension of an individual equals the maximum per-task tugboat requirement, and the vertical dimension equals the number of vessel tasks in the current stage. For example, Task 1 requires two tugboats (IDs 7 and 9), whereas Task 2 requires four tugboats (IDs 4, 11, 12, and 13).
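The initialization procedure described above can be sketched as follows, under simplifying assumptions: each vessel's tugboat demand is given directly (rather than derived from vessel length and tonnage), and tugboats are sampled without replacement within a task but may recur across tasks, reflecting sequential service. Fleet size and demand values are hypothetical.

```python
import random

def init_individual(demands, fleet, max_req, rng):
    """One individual: a row per vessel task, padded with 0 up to max_req slots."""
    individual = []
    for need in demands:
        chosen = rng.sample(fleet, need)  # randomized feasible matching per task
        individual.append(chosen + [0] * (max_req - need))
    return individual

def init_population(n, demands, fleet, rng=None):
    """Build n feasible individuals; width = max per-task requirement."""
    rng = rng or random.Random(42)  # seeded for reproducibility of the sketch
    max_req = max(demands)
    return [init_individual(demands, fleet, max_req, rng) for _ in range(n)]

pop = init_population(n=20, demands=[2, 4, 2, 3], fleet=list(range(1, 16)))
```

Every generated individual thus satisfies the required tugboat count per task by construction, matching the feasibility guarantee stated above.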
4.3. Fitness Function
Based on the preceding formulation and encoding scheme, each generated solution vector (individual) represents a tugboat-scheduling plan. First, solution vectors are generated to satisfy task requirements according to Equations (
5)–(
7), while the initial positions of the tugboats must conform to Equation (
13). Next, the latest arrival time for the first task is derived using Equation (
8), and the start and end times of each task for every tugboat are computed without considering other influencing factors. The composition of these times must conform to Equations (
9) and (
10). Subsequently, the return status of the tugboats is validated based on Equations (
11) and (
12). Any tugboat that fails to meet these constraints at the current node is designated as unassignable. In addition, tugboats without reassigned tasks must satisfy Equation (
14). If the solutions generated by the Jaya algorithm fail to satisfy the required tugboat count, the repair mechanism supplements the assignment by selecting idle tugboats based on the 'shortest relocation distance' principle. By fine-tuning the operational timeline, the system ensures both physical feasibility and temporal non-overlap for all tugboat sequences. This repair logic is implemented via a linear traversal, so its computational overhead grows stably with problem scale. In actual port operations, some vessels may have restricted maneuverability; in such cases, additional tugboats beyond the standard configuration must be allocated. Given the modeling assumption that tugboat operation time depends solely on travel distance and speed, we compute the total operation time Ttotal for all tugboats from the task start and finish times via Equation (
4). Fuel consumption during waiting periods is considered negligible. The total operating cost Ctotal incorporates only towing and non-towing cost consumption, as defined in Equation (
3). The pseudocode for fitness calculation is as follows:
| Fitness calculation |
| 1. For each tugboat m: |
| 2. Initialize the tugboat's position and availability. |
| 3. Compute the start and end times according to Equations (9) and (10). |
| 4. End for. |
| 5. For each ship n: |
| 6. assigned_tugboats ← candidate_solution[n]. |
| 7. required_quantity ← ship_tugboat_match[ship_type[n]].Quantity. |
| 8. if len(assigned_tugboats) ≠ required_quantity then |
| 9. assigned_tugboats ← repair_assignment. |
| 10. end if |
| 11. End for. |
| 12. For each tugboat m: |
| 13. Cost = cost_to_start + cost_to_return + cost_of_towing. |
| 14. Time = time_to_start + time_to_return + time_of_towing. |
| 15. End for. |
| 16. Calculate the fitness values of each individual. |
| 17. Fitness1 = Ctotal = ∑Cost; Fitness2 = Ttotal = ∑Time. |
| 18. Return Fitness1, Fitness2. |
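As a hedged illustration, the fitness pseudocode can be transcribed into Python under simplifying assumptions: the timing logic of Equations (9) to (14) is abstracted into precomputed per-leg costs and times for each tugboat, and the repair step simply truncates or pads an assignment from an idle pool. All data structures and names are hypothetical, not the paper's implementation.

```python
def repair_assignment(assigned, required, idle_pool):
    """Pad or truncate an assignment to the required tugboat count."""
    assigned = assigned[:required]            # drop surplus tugboats
    for tug in idle_pool:                     # pad from idle tugboats
        if len(assigned) == required:
            break
        if tug not in assigned:
            assigned.append(tug)
    return assigned

def fitness(solution, required_counts, leg_costs, leg_times, idle_pool):
    """Return (Fitness1, Fitness2) = (total cost, total time) over used tugboats.

    leg_costs / leg_times map tugboat ID -> (to_start, towing, to_return)."""
    used = set()
    for n, assigned in enumerate(solution):
        assigned = [t for t in assigned if t != 0]      # drop empty slots
        if len(assigned) != required_counts[n]:
            assigned = repair_assignment(assigned, required_counts[n], idle_pool)
        used.update(assigned)
    total_cost = sum(sum(leg_costs[m]) for m in used)
    total_time = sum(sum(leg_times[m]) for m in used)
    return total_cost, total_time

cost, time = fitness([[7, 9, 0]], [2],
                     leg_costs={7: (10.0, 50.0, 10.0), 9: (5.0, 50.0, 15.0)},
                     leg_times={7: (0.2, 1.0, 0.3), 9: (0.1, 1.0, 0.4)},
                     idle_pool=[])
```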
4.4. Jaya Algorithm
The Jaya algorithm is a metaheuristic optimization technique based on swarm intelligence. Its parameter-free nature significantly reduces the cost of implementation and tuning. During iterations, candidate solutions are updated by directly utilizing information from the symmetrically opposing forces of the best and worst solutions in the current population, creating a symmetrical push-pull dynamic. This mechanism drives solutions progressively toward optimal regions while steering them away from inferior directions, thereby achieving the goal of global optimization. The update rule is specified in Equation (
17):
where X_best and X_worst respectively represent the current best and worst solutions, and r1 and r2 are random numbers within [0, 1].
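Equation (17) follows the standard Jaya update rule; a minimal sketch for one real-valued decision vector is shown below, assuming the conventional formulation with absolute-value terms (Rao's original Jaya). The random-number generator and test vectors are illustrative.

```python
import random

def jaya_update(x, best, worst, rng=random.Random(0)):
    """Standard Jaya move: pull toward the best solution, push away from the worst."""
    new_x = []
    for j in range(len(x)):
        r1, r2 = rng.random(), rng.random()   # fresh r1, r2 in [0, 1] per dimension
        new_x.append(x[j]
                     + r1 * (best[j] - abs(x[j]))     # attraction to the best
                     - r2 * (worst[j] - abs(x[j])))   # repulsion from the worst
    return new_x
```

For the scheduling problem, the updated real values would then be decoded back into feasible tugboat assignments via the repair mechanism described in Section 4.3.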
Solving multi-objective problems is often constrained by the complexity of the solution space. The Jaya algorithm, with its iterative mechanism that follows favorable solutions and avoids unfavorable ones, exhibits a balance between global exploration and local exploitation. This enables the algorithm to more readily approach the global optimum in multi-constrained, multi-objective tugboat scheduling problems. The specific flowchart of the Jaya algorithm adopted in this paper for solving the tugboat scheduling model is illustrated in
Figure 3.
4.5. Q Learning
Q-learning is a value-based reinforcement learning algorithm. It aims to select the optimal action by leveraging values within a Q-table to achieve the best possible learning outcome. In this paper, the Q-learning framework is constructed as a quadruple that includes the state space, action space, Q-table, and reward function, detailed as follows:
(1) State space
In this paper, the relative quality gap between the current solution obtained by the Jaya algorithm and the global optimal solution is categorized into three states. These three states are formally defined in Equation (
18) as follows:
where the two quantities compared are the current normalized objective value and the optimal normalized objective value, respectively.
The three-level state design achieves policy differentiation while ensuring learning efficiency. Fewer divisions would weaken the diversity of strategies. A threshold of 0.1 in State S(1) ensures that high-quality solutions tend to remain stable, preserving the current solution. Conversely, a threshold of 0.5 delineates a significant improvement, prompting more aggressive adjustments in State S(3). The selection of these thresholds is guided by the typical improvement rates observed during the initial stages of solving the tugboat scheduling problem. Specifically, preliminary sensitivity analysis was conducted by testing different threshold orders of magnitude. The selected values ensure that the state space can effectively distinguish between global exploration phases and local convergence phases without being overly sensitive to numerical noise, thereby stabilizing the Q-value updates.
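The three-level bucketing with the 0.1 and 0.5 thresholds can be sketched as follows; the exact gap formula of Equation (18) is not reproduced in the text, so the relative-gap expression here is an assumption consistent with the description above.

```python
def classify_state(f_current, f_best):
    """Bucket the relative quality gap into the three states S(1)-S(3)."""
    gap = abs(f_current - f_best) / max(abs(f_best), 1e-12)  # assumed gap measure
    if gap <= 0.1:
        return 1   # S(1): near-optimal; keep the current solution stable
    elif gap <= 0.5:
        return 2   # S(2): moderate gap; mild adjustment
    else:
        return 3   # S(3): large gap; aggressive adjustment
```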
(2) Action space
State evaluation computes the relative quality gap between the current and global optimal solutions. Actions are then selected via an ε-greedy policy. We design a discrete action space with three adjustment strategies: A(1), A(2), and A(3).
A(1): Maintain current tugboat assignments.
A(2): Replace random task assignments with global optimum equivalents.
A(3): Regenerate compliant tugboat combinations for selected tasks.
Table 3 defines the state–action synergy mechanism, which maps evolutionary states (
S) to their corresponding adaptive search strategies (
A) within the Q-learning module.
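The ε-greedy selection over the 3 × 3 state–action table can be sketched as below: with probability ε a random action is explored; otherwise the action with the highest Q-value for the current state is exploited. The Q-values shown are hypothetical.

```python
import random

def select_action(q_table, state, epsilon, rng=random.Random(0)):
    """Epsilon-greedy choice over actions A(1)-A(3) (indices 0-2)."""
    if rng.random() < epsilon:
        return rng.randrange(3)                 # explore: uniform random action
    row = q_table[state]
    return max(range(3), key=lambda a: row[a])  # exploit: argmax_a Q(state, a)

q_table = [[0.0, 0.2, 0.1],   # state S(1)
           [0.3, 0.0, 0.5],   # state S(2)
           [0.1, 0.4, 0.9]]   # state S(3)
action = select_action(q_table, state=2, epsilon=0.0)  # greedy: selects A(3)
```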
(3) Q table
Q-learning evaluates the action value (Q) for each state–action pair and selects the action that maximizes the Q-value, corresponding to the highest expected return. As learning progresses and additional feedback is incorporated, the Q-table more accurately approximates the action-value function, thereby improving action selection.
Table 4 presents the structural representation of the Q-matrix
, which is employed to guide the local search during the iterative process by quantifying the accumulated rewards for each State–Action pair.
Under the action-selection policy, actions with higher Q-values in the Q-table are more likely to be chosen. Integrating Q-learning enables the Jaya algorithm to approach globally optimal regions more efficiently. After each iteration, the Q-values are updated according to Equation (
19):
where Q(S, A) is the Q value of action A taken in state S, α is the learning rate, γ is the discount factor, r is the actual reward received after taking action A, and max Q(S′, A′) is the highest expected reward over all possible actions in the next state S′.
(4) Reward function
The reward function is designed to guide the agent during reinforcement-learning training in achieving a symmetrical trade-off between total tugboat operating cost and total operation time. Through iterative training, the agent maximizes the reward, thereby directing the search toward optimal solutions. An effective metric is essential in the design to evaluate whether the search is effectively guided toward improved solutions. Therefore, a reward function is formulated based on the dual objectives of minimizing both tugboat operating costs and operating time. This function computes a feedback reward value to assess actions and optimize strategies. The reward function R is defined in Equation (
20):
\[ R = \sum_{i=1}^{2} w_i \, e^{-\eta_i f_i}, \qquad f_1 = C, \; f_2 = T \tag{20} \]
where C represents the total operating cost of the tugboats, T represents the total operation time of the tugboats, w_i represents the weight assigned to the i-th objective, and η_i represents the benchmark value used to eliminate the dimensional difference of the i-th objective.
The proposed reward function provides a symmetric evaluation of the independent contributions of cost and time, preventing any single objective from dominating the signal. In Equation (
20), the benchmark value is defined as the reciprocal of the initial optimal value of the corresponding single objective. It eliminates the scale and unit differences between cost and time by transforming them into dimensionless ratios. Meanwhile, the negative exponential mapping ensures that the reward decreases monotonically as the objective values increase. This design maintains symmetric optimization pressure and provides a stable, normalized reward signal for the Q-learning module. Moreover, the nonlinearity of exponential decay makes the reward more sensitive to improvements across the relevant range. Compared with hard-threshold schemes, the exponential function provides a continuous reward, enhancing dynamic adaptability and enabling the agent to discern subtle improvements. This fine-grained feedback facilitates a balanced trade-off between exploration and exploitation.
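Under this reading of Equation (20), with the benchmark taken as the reciprocal of each single-objective optimum, the reward can be sketched as below. The cost and time figures are invented for illustration:

```python
import math

def reward(cost, time, cost_ref, time_ref, w=(0.5, 0.5)):
    """R = w1*exp(-cost/cost_ref) + w2*exp(-time/time_ref).
    Dividing by the single-objective optima (eta_i = 1/f_i*) yields
    dimensionless ratios; the negative exponential shrinks the reward
    as either objective grows."""
    return w[0] * math.exp(-cost / cost_ref) + w[1] * math.exp(-time / time_ref)

# Illustrative figures only: a schedule matching both optima is rewarded most.
r_best = reward(1000.0, 50.0, cost_ref=1000.0, time_ref=50.0)   # equals exp(-1)
r_worse = reward(2000.0, 80.0, cost_ref=1000.0, time_ref=50.0)  # strictly smaller
```

Because both terms are bounded in (0, 1], neither objective can dominate the signal, which matches the symmetric-evaluation property described above.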
4.6. The Jaya Algorithm Integrating Q-Learning
The hybrid optimization framework proposed in this paper deeply integrates the global search capability of the Jaya algorithm with the local optimization ability of Q-learning. It establishes a collaborative global-local optimization mechanism founded on a principle of symmetrical cooperation, thereby enhancing the algorithm’s efficiency and stability in solving complex multi-objective problems. The specific integration strategy, which leverages symmetric design at multiple levels, is implemented as follows:
(1) Global search layer. The Jaya algorithm performs population-level iterative updates to conduct coarse-grained exploration of the solution space, leveraging its rapid convergence. By comparing the difference vector between the symmetrically opposing forces of the current best and worst solutions, the population is steered toward high-potential regions, thereby avoiding the parameter-tuning complexity typical of many metaheuristics.
(2) Local optimization layer. A Q-learning module is embedded to adaptively refine solutions via a state–action–reward mapping. The state space is partitioned by the gap between a candidate solution and the global best; the action set comprises three operators—retain, copy segments from the global best, and random perturbation—and the reward function aggregates improvements in the dual objectives of cost and time to enable fine-grained exploitation.
(3) Knowledge-sharing mechanism. A dynamic and symmetrically bidirectional linkage is maintained between the global best solution and the Q-table. The global best informs local Q-learning actions, while the outcomes of these actions subsequently contribute to updating the global best, creating a symmetric cycle of knowledge exchange and enabling efficient cross-level transfer of experiential knowledge.
To integrate the components into a functional optimizer, the Jaya-QL framework adopts a hierarchical, dual-layer search logic. In the first layer, the population undergoes a global search based on the standard Jaya update rule to broadly explore the solution space. In the second layer, each individual solution is treated as an independent reinforcement-learning agent for micro-level refinement.
Unlike traditional hybrid metaheuristics that apply uniform operators to the entire population, Jaya-QL allows each agent to observe its own relative performance state (
S) and adaptively select an optimal strategy (
A) from a shared Q-table. This shared structure facilitates collective intelligence, where agents receive immediate rewards (
R) based on solution-quality improvements, subsequently updating the Q-values via Equation (
19). This synergistic interaction ensures a symmetric balance between global exploration and localized, adaptive exploitation, effectively enhancing both convergence speed and the diversity of the Pareto front. The complete procedural flow of this hybrid approach is formalized in Algorithm 1.
| Algorithm 1 Jaya-QL Algorithm for Tugboat Scheduling |
1. Initialize: Population P, Q-table Q(S, A) ← 0, parameters α, γ, ε.
2. Evaluate: Calculate bi-objective fitness for each X_i ∈ P; identify X_best and X_worst.
3. While g ≤ MG do:
4.   Step 1: Population-level Jaya Search
5.   For each X_i ∈ P:
6.     X_i′ = X_i + r_1 · (X_best − |X_i|) − r_2 · (X_worst − |X_i|).
7.     If X_i′ is superior to X_i then X_i ← X_i′.
8.   Step 2: Individual-level Q-learning Refinement
9.   For each X_i ∈ P:
10.     Calculate the gap between X_i and X_best to determine current state S.
11.     Select Action A using the ε-greedy policy.
12.     Execute A to generate X_i″.
13.     Calculate Reward R using Equation (20).
14.     Update Q(S, A) using the Temporal Difference (TD) error.
15.     If X_i″ is superior to X_i then X_i ← X_i″.
16.   Update X_best and X_worst.
17. End While
18. Return P (Pareto optimal solution set).
|
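The population move in Step 1 follows the standard Jaya rule. A continuous-valued sketch is shown below; an actual scheduling implementation would additionally decode and repair the vector into a feasible tug assignment:

```python
import random

def jaya_update(x, x_best, x_worst, rng=random):
    """One Jaya move: drift each component toward the best solution and away
    from the worst, scaled by fresh random factors r1, r2 in [0, 1)."""
    return [
        xi + rng.random() * (bi - abs(xi)) - rng.random() * (wi - abs(xi))
        for xi, bi, wi in zip(x, x_best, x_worst)
    ]

random.seed(42)  # deterministic for the example; values here are illustrative
x_new = jaya_update([0.4, 0.6], x_best=[0.9, 0.8], x_worst=[0.1, 0.2])
```

Because the rule uses only the current best and worst solutions plus uniform random factors, it needs no algorithm-specific control parameters, which is the tuning-free property the text attributes to Jaya.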
This paper employs a Jaya algorithm integrated with Q-learning to solve the tugboat scheduling model. The detailed computational procedure is illustrated in
Figure 4.
I. Input specification. Three data sources are required: port task data, available tugboat resources, and tugboat-base information. Task data specify, for each job, the required number of tugboats, required bollard pull, start time, origin, and destination. Resource data describe the available fleet, including fleet size, bollard pull, free-sailing speed, free-sailing cost, and towage cost. Tugboat-base data include the number of bases, their capacities, and locations.
II. Feasible initialization. Each task must be matched with tugboats that satisfy its quantity and power requirements, which constrains tugboat usage. Given the task order and model constraints, we identify the set of currently available tugboats and randomly generate an initial pool of candidate solutions from this set, ensuring that no infeasible assignment arises.
III. Joint global–local optimization. The Jaya algorithm iteratively updates the population to rapidly identify high-quality regions of the search space. In parallel, a Q-learning module performs local policy refinement: it dynamically selects adjustment actions based on the current state and updates the Q-table via the Bellman recursion for policy evaluation. To balance exploitation near incumbent good solutions with exploration of new regions, we adopt an ε-greedy action-selection strategy, which facilitates escape from local optima. The procedure returns the resulting tugboat schedule for the set of vessel tasks.
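The ε-greedy rule can be sketched as follows. The Q-row values are hypothetical, and ε is treated here as the probability of exploring, matching the paper's "exploration rate" terminology:

```python
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """With probability epsilon take a random action (explore);
    otherwise take the action with the highest Q-value (exploit)."""
    if rng.random() < epsilon:
        return rng.choice(list(q_row))   # explore: uniform random action
    return max(q_row, key=q_row.get)     # exploit: best-known action

q_row = {"A1": 0.2, "A2": 0.9, "A3": 0.1}    # hypothetical Q-table row
picked = epsilon_greedy(q_row, epsilon=0.0)  # epsilon = 0 -> pure exploitation
```

Setting ε = 0 always returns the best-known action (here "A2"); raising ε injects random actions, which is what lets the search escape local optima.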
5. Simulation Experiment
To validate the effectiveness of the proposed Jaya-QL, this paper developed a multi-tier experimental validation framework. First, small- and medium-scale simulation cases were randomly generated and compared against the Jaya algorithm to assess the improvements introduced by the framework. Subsequently, actual vessel scheduling data from the main port area of Lianyungang in October 2024 were employed as a case study. In this setting, Jaya-QL was comprehensively compared with ABC, QPSO, and the standard Jaya algorithm to evaluate its overall performance advantages under complex multi-objective conditions. The Jaya-QL algorithm is configured with the following parameters: population size N = 50, number of iterations MG = 500, exploration rate ε = 0.8, learning rate α = 0.25, and discount factor γ = 0.95. All algorithms were run on a personal computer equipped with an Intel i5-9300H processor (2.40 GHz) and 8 GB of memory.
5.1. Parameter Setting
This paper designs a tugboat scheduling experiment based on the actual tugboat scheduling process of the main port area of Lianyungang Port. The port operates 14 tugboats distributed across three bases, with the horsepower (HP) categories and quantities detailed in
Table 5. The experiment was designed to evaluate the practical performance and advantages of the Jaya-QL algorithm. Lianyungang Port has fuel-powered tugboats and electric tugboats. The operating cost per nautical mile for tugboats with different horsepower levels performing scheduled tasks under different conditions is shown in
Table 6. The cost coefficients presented in
Table 6 are derived from the actual operational data provided by the Scheduling Center of Lianyungang Port. These values represent the operating costs, mainly the economic expenses required for the average fuel consumption per nautical mile of tugboats of different horsepower grades under standard working conditions. Tugboats in Lianyungang are restricted to a maximum speed of 11 knots, and voyage time is calculated according to actual distance and speed. When calculating the objective function values, the unit cost coefficients (CNY/H) of tugboats with different horsepower are allocated as shown in
Table 6. To guarantee the safety and stability of port operations, the allocation of tugboat resources is strictly governed by mandatory port regulations. For various vessel types, the specific requirements regarding the quantity and horsepower of tugboats assigned for berthing, unberthing, and shifting operations must adhere to the standardized configuration protocols specified in
Table 7.
5.2. Small and Medium-Scale Case Studies
This section evaluates the effectiveness of the Jaya and Jaya-QL algorithms using small-scale and medium-scale cases. The objective function is formulated with a linear weighting approach to simultaneously optimize total tugboats operating cost and operation time. A comparative analysis is conducted to assess the performance of the two algorithms across different problem scales. The linear weighted objective function is expressed as follows:
\[ \min F = w_1 C + w_2 T \]
where C represents the total tug operating cost and T represents the total operation time. The coefficients w_1 and w_2 are weight factors assigned to the two objectives, respectively, each taking a value within the range [0, 1]. Because the two objectives differ markedly in units and scale, each is normalized prior to aggregation. The normalized linear weighted-sum formulation is given by
\[ \min F = w_1 \frac{C}{C^*} + w_2 \frac{T}{T^*} \]
where C* and T* are the optimal values obtained by solving the model as single-objective problems for the minimum operating cost (min C) and minimum operation time (min T) of the tugboats, respectively.
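The normalized weighted sum described above reduces to a one-line scoring function. The cost and time values below are illustrative only:

```python
def weighted_objective(cost, time, cost_star, time_star, w1=0.5):
    """F = w1*(C/C*) + (1 - w1)*(T/T*): dividing by the single-objective
    optima C* and T* removes unit and scale differences before weighting."""
    return w1 * cost / cost_star + (1.0 - w1) * time / time_star

# At the single-objective optima the normalized score is exactly 1;
# any schedule that is worse on both objectives scores strictly higher.
f_opt = weighted_objective(1000.0, 50.0, cost_star=1000.0, time_star=50.0)
```

This normalization is what makes a single weight pair (w_1, w_2) meaningful across instances whose costs and times differ by orders of magnitude.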
To evaluate the performance of the Jaya and Jaya-QL algorithms in tugboat scheduling, twelve small-scale and medium-scale test cases were randomly generated. The generation of test instances is strictly grounded in empirical operational experience and technical standards to ensure the authenticity of the simulation experiments. Vessel arrival times are modeled after the statistical characteristics of historical task instances to accurately reflect realistic operational rhythms and traffic density. Furthermore, vessels are categorized into five distinct types based on their length, with the mandatory tugboat count and horsepower requirements for each category determined in strict accordance with
Table 7. The comparison results are summarized in
Table 8. From
Table 8, it can be seen that the Jaya and Jaya-QL algorithms exhibit significant performance differentiation as task scale expands. For low-dimensional tasks (Instances 1 and 2), both algorithms converge to the same global optimum, confirming the validity of Jaya in small solution spaces. However, as task complexity increases, the Jaya-QL algorithm demonstrates superior global search capability. Its optimization results show an improvement ranging from 2.60% to 32.37% over the Jaya algorithm. For instances with 16 or more tasks, the average improvement reaches 25.27%, indicating a pronounced advantage in higher-dimensional settings. This performance gain comes at the cost of longer computation times, as Jaya-QL integrates Q-learning to guide local search.
Similarly, the performance of Jaya and Jaya-QL was assessed across the same twelve cases from the perspective of objective function optimization.
Figure 5 and
Figure 6 illustrate the total tug operating cost and overall operation time obtained by both algorithms. The results show that each method converges to the same optimal solution in low-dimensional spaces, confirming the validity of the baseline Jaya algorithm. As task scale increases, Jaya-QL demonstrates clear superiority. In particular, for the medium-dimensional Instance 11 (m = 14, n = 20, k = 3), Jaya-QL achieves a 32.56% reduction in operating cost compared to Jaya, highlighting the effectiveness of Q-learning in guiding local search and helping the algorithm escape local optima. For medium-scale tasks, Jaya-QL consistently identifies scheduling solutions with lower multi-objective values than Jaya, owing to its dynamic exploitation strategy.
Figure 7 shows the computational times of the two algorithms. Due to the state–action value function update introduced by the Q-learning module, the average solution time of Jaya-QL increases by 30.11–42.55% compared with Jaya, and this time cost is positively correlated with problem dimension.
5.3. Large-Scale Case Study
This paper uses the arrival and departure timetable for the main port area of Lianyungang from 12:00 on 14 October 2024 to 12:00 on 16 October 2024 as the case study. During this period, 34 vessels required tug assistance for berthing, unberthing, and shifting. Detailed vessel information is provided in
Table 9, and tug assignments for each vessel are set according to
Table 7. To ensure a fair and rigorous comparison, the benchmark algorithms (ABC, QPSO, ACO, GA, and standard Jaya) were specifically adapted to the tugboat scheduling problem. A unified encoding and decoding framework was implemented across all heuristics to standardize the search space. Furthermore, identical constraint-handling mechanisms were integrated to address the intricate time-window and tugboat-capacity requirements of the port. Each algorithm was executed using its optimal parameter configuration to ensure peak performance on the Lianyungang Port dataset. Finally, to evaluate the statistical stability of the proposed Jaya-QL and its counterparts, each algorithm was subjected to 10 independent trials.
The convergence trajectories in
Figure 8 illustrate that while the GA algorithm achieves rapid initial optimization, it suffers from premature convergence, with search performance stagnating after the middle stage of the search. In stark contrast, although Jaya-QL exhibits a more moderate initial pace, it displays a more robust convergence profile and superior sustained optimization capability. Notably, in the later stage of the search, Jaya-QL achieves stepwise performance breakthroughs, successfully escaping local optima. This sustained search activity underscores the efficacy of Jaya-QL in balancing exploration and exploitation, ultimately yielding solution quality that significantly surpasses that of GA and the other benchmark metaheuristics. As shown in
Table 10, the optimal solutions along with their corresponding total operation costs, total operation times, and algorithm computation times are presented for the benchmark algorithms. The proposed Jaya-QL algorithm demonstrates a clear advantage in global optimization compared to the benchmark algorithms. By integrating Q-learning's dynamic balance between exploration and exploitation, Jaya-QL effectively escapes local optima throughout the iterations. The fitness value of the final solution exhibits a stepwise decline, confirming the effectiveness of reinforcement learning in enhancing solution quality in multi-objective optimization problems. In contrast, the other algorithms tend to exhibit premature convergence during the mid-phase of the search process. In addition to its superior convergence accuracy, Jaya-QL demonstrates remarkable computational reliability. Statistical analysis reveals that the algorithm yielded a standard deviation of 10,750.41 for operational costs, a figure substantially lower than those recorded for ABC (48,232.52) and the standard Jaya algorithm (42,394.28). Furthermore, it maintained a minimal standard deviation of 7.21 in temporal costs. This enhanced stability underscores the efficacy of the integrated Q-learning framework in mitigating sensitivity to stochastic variations. By dynamically regulating the exploration–exploitation trade-off, the proposed mechanism facilitates consistent convergence toward high-quality solutions.
As shown in the multi-objective performance comparison in
Figure 9, the proposed Jaya-QL algorithm demonstrates dual-dimensional advantages in tugboat scheduling optimization. To provide a fair and robust evaluation, the data visualized in
Figure 9 are derived from the average performance across 10 independent trials. Specifically, for each generation, the objective values shown are the arithmetic means calculated from the sample scenarios, thereby mitigating the impact of stochastic fluctuations inherent in meta-heuristic algorithms. In terms of operating cost, Jaya-QL achieves reductions of 24.36%, 17.92%, 15.13%, 7.35% and 22.83% compared to ABC, QPSO, ACO, GA and the standard Jaya algorithm, respectively. In addition, the proposed algorithm achieved improvements in total tugboat operation time, which is a key scheduling metric. The observed optimization effects were 1.58%, 0.29%, 0.23%, 0.21% and 0.93%, respectively. This dual-objective synchronous optimization capability is attributed to the hybrid architecture of Jaya-QL. The reward-punishment mechanism in Q-learning guides the search direction using historical experience, thereby reducing the probability of generating invalid scheduling solutions. Meanwhile, the elite retention strategy in the Jaya algorithm ensures solution diversity. While Jaya-QL significantly outperforms other heuristics in solution quality, its CPU time is higher than that of computationally efficient algorithms such as GA and QPSO. This increased overhead is primarily attributed to the recursive updates of the Q-table and the state–action selection process inherent in reinforcement learning. However, it is noteworthy that Jaya-QL remains more efficient than ACO and ABC and even achieves a slight reduction in time compared to the standard Jaya by streamlining the search trajectory. Despite this, minimizing computational latency remains a critical avenue for future work. Potential enhancements include optimizing the Q-learning update frequency or implementing parallel processing frameworks to further adapt the algorithm for real-time tugboat dispatching in large-scale port operations. The final optimal tugboat scheduling solution is illustrated in
Figure 10.
The experimental results indicate that the proposed Jaya-QL algorithm not only reduces port operating costs but also substantially shortens vessel operation cycles. Beyond economic benefits, these improvements have profound implications for the construction of ’Green Ports’. From a sustainability perspective, the significant reduction in operation time directly translates to lower fuel consumption for both tugboats and berthing vessels, thereby effectively minimizing the carbon footprint and atmospheric pollutant emissions within the port area. Furthermore, the multi-objective collaborative optimization capability of Jaya-QL offers a novel technical pathway for dynamic and real-time tugboat scheduling in complex environments. This allows port decision-makers to transition from traditional cost-centric management to a more sustainable model that harmonizes economic efficiency with environmental stewardship, providing a scalable solution for the global maritime industry’s green transition.
5.4. Parameter Sensitivity Analysis
To balance the convergence performance and computational efficiency of the Jaya-QL algorithm, this study configured the population size at N = 50 and the maximum iterations at MG = 500. Preliminary experiments demonstrate that this configuration provides a sufficient search space to identify high-quality solutions while ensuring stable convergence within a reasonable timeframe. Consequently, a one-factor sensitivity analysis was conducted on the three core parameters of the Q-learning module: the exploration rate (ε), the learning rate (α), and the discount factor (γ).
As illustrated in
Figure 11, the exploration rate ε plays a critical role in balancing global exploration and local exploitation. A lower exploration rate restricts operator diversity, which may cause the algorithm to become trapped in local optima and subsequently increase operational costs. The optimal search performance is achieved when ε = 0.8. Subsequently, the sensitivity analysis of the learning rate α in
Figure 12 indicates that the update frequency of the Q-table directly influences the stability of the guidance strategy. Setting α = 0.25 effectively harmonizes learning efficiency with search stability, preventing both strategy lag and numerical oscillation. Finally, the analysis of the discount factor γ in
Figure 13 reveals that strategic foresight contributes significantly to scheduling quality. A higher discount factor (γ = 0.95) enables the algorithm to anticipate future task demands, thereby optimizing tugboat operation sequences and minimizing non-productive travel.
In summary, the optimal parameter combination is identified as ε = 0.8, α = 0.25, and γ = 0.95. Under this configuration, the algorithm achieves the most favorable balance between operational and temporal costs. These results validate the efficacy and robustness of the selected parameters for addressing large-scale tugboat scheduling problems.
6. Conclusions
This paper addressed the challenge of multi-task and multi-base collaborative tugboat scheduling in port operations. A multi-objective MILP model was developed to minimize both total operating cost and operation time while accounting for realistic constraints, including tugboat performance heterogeneity, multiple base locations, and continuity in task sequences. This model establishes a practical decision-making framework that achieves a symmetry between economic efficiency and operational timeliness.
To overcome the limitations of the Jaya algorithm in avoiding local optima and adapting search strategies, we proposed a hybrid optimization framework (Jaya-QL) that employs a Jaya algorithm integrated with Q-learning. By introducing reinforcement learning-based dynamic strategy adjustment, the framework adaptively achieves a symmetric balance between global exploration and local exploitation, thereby improving convergence accuracy and solution quality. Its effectiveness was verified through benchmark tests of varying scales and real operational data from 34 vessels in Lianyungang Port. The results consistently demonstrated that Jaya-QL outperformed comparative algorithms such as ABC and QPSO in optimizing both cost and time, confirming the value of integrating reinforcement learning with metaheuristics in complex scheduling tasks.
The proposed Jaya-QL framework provides a highly efficient and practical optimization paradigm for port operations, demonstrating significant improvements in cost reduction and vessel turnaround efficiency. Beyond its immediate application to tugboat scheduling, the scalability and generalizability of Jaya-QL suggest that integrating reinforcement learning with metaheuristic algorithms offers a potent solution for various complex, resource-constrained scheduling problems.
However, it is acknowledged that the current study operates under a deterministic MILP framework with several idealized assumptions, such as fixed vessel arrival times, constant sailing speeds, and 100% resource reliability. In practice, port environments are inherently stochastic and susceptible to uncertainties including vessel delays, equipment failures, and fluctuating weather conditions. To bridge the gap between theoretical methodology and real-world maritime dynamics, our future research will focus on transitioning from deterministic models to stochastic or robust optimization frameworks. Specifically, incorporating uncertainty parameters and exploring multi-agent reinforcement learning (MARL) will be prioritized to enhance collaborative scheduling efficiency and robustness. This evolution will further broaden the applicability of the Jaya-QL methodology to a wider range of multi-objective optimization contexts in increasingly dynamic environments.