1. Introduction
Ports are critical interfaces between maritime and hinterland transport, underpinning global trade, safeguarding energy supply chains, and catalyzing regional economic development [
1]. In recent years, the continued growth in global shipping volumes and the trend toward larger vessels have placed increasingly high demands on port operational safety and efficiency. Large vessels have limited maneuverability in restricted port areas, and their berthing, unberthing, and shifting operations rely heavily on tug assistance to ensure safety and prevent damage to port infrastructure [
2]. Therefore, tugboats are essential equipment for maintaining efficient and secure port operations.
In dynamic and complex port environments, allocating a limited and heterogeneous tugboat fleet in real time to meet berthing, unberthing, and shifting demands is a central challenge for tugboat-scheduling systems. Under the dual-carbon strategy and smart-port initiatives, scheduling policies must improve operational timeliness and reduce economic costs while aligning with environmental protection and sustainability objectives [
3]. Current research on tugboat scheduling focuses on single-objective optimization [
4], typically minimizing either total operating cost or total operation time. However, these asymmetric formulations do not satisfy modern ports’ requirement for the symmetrical consideration and joint optimization of economic and environmental performance [
5]. In addition, prevailing metaheuristics (e.g., simulated annealing [
6] and hybrid evolutionary strategies [
7]) are prone to premature convergence on large-scale instances and struggle to achieve a symmetrical balance between exploration and exploitation, handle dynamic, complex constraints, and coordinate multiple objectives. Meanwhile, tugboat operations entail substantial fuel consumption and significant emissions of pollutants and greenhouse gases [
8]. Consequently, reducing energy use and emissions in port operations has become a major priority for ports worldwide [
9].
“Green and sustainable ports” have become a shared development goal for ports worldwide. The concept encompasses operational practices and infrastructure designs that minimize environmental impact while maintaining economic feasibility, especially in reducing air emissions and carbon footprints. Previous studies have demonstrated various pathways toward this goal. For instance, optimizing vessel speeds to reduce fuel consumption and emissions has been a direct operational focus [
10], while others have explored the adoption of alternative propulsion systems, such as hybrid or electric tugboats, as a long-term technological solution [
5]. The urgency of addressing emissions in port operations is underscored by the substantial contribution of maritime activities to air pollution and greenhouse gases. Tugboat operations, though a supporting service, constitute a notable emission source within the port area due to their high-power, low-speed operational profiles, leading to significant fuel consumption per unit of work [
8]. Therefore, implementing well-informed tugboat-scheduling decisions that strike a symmetry between efficiency and sustainability is crucial for green and sustainable port development.
While various metaheuristics have been applied to tugboat scheduling, they often suffer from premature convergence and the burden of complex parameter tuning [
11]. To address the needs of green port transformation, this paper integrates the energy consumption characteristics of both conventional and electric tugs along with practical operational constraints in ports. A multi-objective mixed-integer linear programming (MILP) model is constructed, aiming to minimize both total operational cost and operational time. At the algorithmic level, a hybrid optimization framework named Jaya-QL is proposed, that is based on a Jaya algorithm integrated with Q-learning. It leverages the parameter-free, population-based global search capability of the Jaya algorithm to efficiently explore the solution space, while the embedded Q-learning module introduces an adaptive intelligence that dynamically refines the search strategy based on continuous feedback (reward) from the dual objectives of cost and time. A problem-specific encoding scheme and fitness function calculation method are designed to accommodate the characteristics of tugboat scheduling. Validation proceeds on small- and medium-scale benchmarks, followed by complex scenarios constructed from vessel-operation data for the main port area of Lianyungang. The experimental results demonstrate that integrating reinforcement learning with conventional optimization yields practical feasibility and measurable performance gains for tugboat scheduling.
The rest of this paper is organized as follows.
Section 2 introduces the related work.
Section 3 presents the problem background and describes the composition of the model. The algorithm is described in
Section 4.
Section 5 shows numerical experiments, and
Section 6 summarizes the findings.
2. Related Work
The tugboat scheduling problem is a complex combinatorial optimization problem driven by stringent constraints. The mathematical modeling of tugboat operations using Mixed-Integer Linear Programming (MILP) has evolved through the increasing refinement of operational constraints and system boundaries.
Historically, early MILP models treated tugboat assignment as a variant of the Parallel Machine Scheduling Problem (PMSP) [
12] or basic Assignment Problems, primarily focusing on matching vessels with available tugboats. As port operations became more dynamic, the development of MILP models shifted toward capturing sophisticated spatial and physical constraints. Key advancements include the transition from single-base models [
13] to multi-berthing base formulations [
14], which explicitly account for the relocation costs and time delays of tugboats moving across different port regions. Furthermore, models have evolved to incorporate tugboat heterogeneity [
15], where matching logic is no longer binary but based on specific ship length, tonnage, and horsepower requirements. Recent MILP developments have also seen the integration of tugboat scheduling with other port resources, such as the integrated dispatching of berths and quay cranes [
16], creating integrated optimization frameworks that reflect the highly coupled nature of modern maritime logistics. By tracing these developments, it is evident that MILP models have moved from static assignment toward high-fidelity, constraint-rich scheduling tools that provide the necessary benchmark for advanced metaheuristics. The increasing sophistication of these formulations underscores the core challenge: allocating limited tugboat resources to time-sequenced ship service requests within dynamic port environments, while satisfying multiple hard constraints and simultaneously optimizing conflicting objectives. Current research addressing this problem primarily focuses on two main directions: the design of model architectures and the development of efficient solution algorithms.
Current research on tugboat scheduling primarily focuses on single-objective optimization. One category of studies considers only operational costs. For example, Abou et al. [
17] established a mixed-integer programming (MIP) model that incorporates constraints related to both pilotage and tugboat operations. The objective of this model is to minimize the maximum waiting time among all vessels. Wei et al. [
18] proposed a MILP model incorporating realistic operational constraints to minimize total operating cost. Jia et al. [
19] employed a network representation and integer programming to allocate tugboats with total operating cost as the objective. A second category of studies concentrates solely on operational efficiency. Wang et al. [
20] studied tugboat allocation at container terminals and proposed a mixed-integer programming model that minimizes the maximum operating time across all tugboats. Wang et al. [
21] minimized vessel turnaround time by jointly optimizing tugboat assignment and vessel sequencing. Kang et al. [
2] modeled container-port tugboat scheduling to minimize total weighted service time on the anchorage–berth leg while accounting for uncertainty in vessel arrivals and towage durations. Ma et al. [
22] constructed a mixed-integer linear programming model with the goal of minimizing the total completion time of battery swapping operations, and proposed a logic-based Benders decomposition (LBBD) algorithm to collaboratively optimize the task allocation of tugboats and the dispatching of battery swapping stations. However, current research on multi-objective optimization for tugboat scheduling remains relatively limited. Wang et al. [
23] developed a multi-objective model for multi-berth tugboat bases to minimize completion time and total fuel consumption under different operating modes. Zhong et al. [
4] examined cross-regional scheduling and proposed an MILP that minimizes both maximum completion time and total fuel consumption. Ren et al. [
24] designed an improved seagull optimization algorithm (SOAPG) to optimize the scheduling of port tugboats, in order to achieve a comprehensive balance among operating costs, operational efficiency and scheduling fairness. In this paper, we formulate a multi-objective tugboat-scheduling model that minimizes total operating cost and total operation time, thereby overcoming the limitations of single-objective approaches. Furthermore, based on the actual operational conditions of the main port area of Lianyungang, we introduce a differentiated cost calculation method that accounts for the distinct characteristics of conventional and electric tugboats.
From the perspective of solution methodologies for tugboat scheduling, metaheuristics have emerged as a dominant research paradigm due to their rapid solving capabilities guided by heuristic functions. Wang et al. [
21] proposed an improved discrete particle swarm optimization (IDPSO) method for the tugboat-assignment problem to minimize vessel turnaround time. Zhu et al. [
25] formulated a mixed-integer programming model to minimize total carbon emissions and solved it efficiently using a variable-neighborhood search algorithm. Wang et al. [
26] employed adaptive large-neighborhood search (ALNS) to optimize tugboat schedules under multi-call, multi-service modes with the goal of minimizing total service cost. Sun et al. [
27] developed an improved genetic algorithm with inversion operations for tugboat scheduling in Zhoushan Port, which demonstrated advantages in both enhancing scheduling quality and reducing computational time. Yao et al. [
28] proposed an improved grey wolf optimizer for efficient tugboat scheduling in multi-berth-base settings. Although metaheuristics demonstrate advantages in computational efficiency, their ability to explore the solution space is constrained by predefined heuristic strategies. This limitation makes them prone to becoming trapped in local optima when tackling large-scale problems. In contrast, reinforcement learning (RL) methods based on dynamic policy iteration exhibit stronger environmental adaptability and greater robustness in managing large state spaces and sparse rewards. Drungilas et al. [
29] modeled the real-time scheduling of Automated Guided Vehicles (AGVs) as a Markov Decision Process and implemented Q-learning for dynamic scheduling. Li et al. [
11] incorporated a Deep Deterministic Policy Gradient (DDPG) algorithm enhanced with prioritized experience replay and a noise suppression mechanism to address tug scheduling in dynamically changing port environments.
Existing studies have extensively explored various methodologies to solve the tugboat scheduling problem. However, traditional metaheuristics often struggle with parameter sensitivity and are prone to premature convergence. Additionally, while reinforcement learning offers adaptability, pure RL methods frequently face challenges such as sparse rewards and slow convergence in high-dimensional discrete search spaces. To address complex and high-dimensional scheduling problems, some researchers have integrated reinforcement learning with metaheuristic algorithms to develop more efficient and intelligent optimization methods. Lu et al. [
30] proposed a hybrid strategy combining four types of metaheuristics with Q-learning, enabling adaptive selection among five local search operators throughout the iterative process. Yu et al. [
31] embedded Q-learning into a meta-heuristic framework to solve energy-efficient multi-objective distributed assembly permutation flow shop scheduling problems. Yu et al. [
32] proposed an optimization framework that embeds Q-learning into meta-heuristic algorithms, which adaptively selects the neighborhood structure through reinforcement learning to achieve the optimal trade-off between scheduling efficiency and energy conservation and emission reduction. The adaptive decision-making mechanism of RL dynamically adjusts the search behavior of metaheuristics, enhancing their exploration capability in complex solution spaces and reducing the risk of premature convergence. Meanwhile, the structured global search framework provided by metaheuristics accelerates the focus on potentially high-quality solution regions, thereby synergistically improving convergence speed and strengthening the guarantee of global optimality. To effectively solve the complex multi-objective and multi-constraint tugboat scheduling problem, this paper develops a cooperative optimization framework that integrates metaheuristics with RL.
Addressing the limitations of existing tugboat scheduling models and optimization algorithms, this paper proposes a multi-objective MILP model that aims to minimize both the total operating cost and total operating time of tugboats. The model explicitly accounts for cost variations among different types of tugboats under diverse operational conditions. Building on this formulation, we propose an integrated multi-objective optimization framework that synergistically integrates the elite solution-guided mechanism of the Jaya algorithm with the adaptive decision-making capability of Q-learning. This hybrid approach establishes a global-local cooperative mechanism tailored for multi-objective tugboat scheduling. Furthermore, a case study based on real data from the main port area of Lianyungang is conducted to validate the effectiveness and practicality of the proposed model and method, offering a new solution for tugboat scheduling problems.
3. Tugboat Scheduling Model
3.1. Tugboat Scheduling Process
The tugboat scheduling process involves multiple entities including the port, tugboats, and vessels. The operational workflow is illustrated in
Figure 1. When a vessel arrives at the anchorage area and idle tugboats are available, the assigned tugboat initiates its journey toward the anchorage. The tugboat then provides berthing assistance, supporting the vessel until it is safely moored at the target berth. Upon completion of the task, the tugboat returns to its base and awaits the next assignment.
Tugboat scheduling is a core decision problem in port operations and a complex optimization challenge. Fundamentally, it entails dynamically matching tugboat resources to vessel-service demands across time and space under multiple objectives and constraints. The objective is to deploy tugboats efficiently to assist berthing, unberthing, and shifting. This ensures that all tasks are completed within designated time windows, thereby maximizing operational efficiency, reducing overall operational costs, and minimizing time expenditure. The problem involves multiple tugboat bases, a heterogeneous tugboat fleet, and varying task volumes with diverse constraints. These include the number of tugboats, horsepower, speed, sailing distances, and differential cost consumption during assisted versus unassisted states. Consequently, effective solutions must integrate these factors to produce feasible schedules that enhance port operational efficiency and economic performance.
3.2. Model Assumption
The scheduling optimization model in this paper contains the following assumptions:
(1) Vessel arrival times are known in advance as ports receive detailed information before tugboat operations commence.
(2) All tugboats are initially stationed at known locations within their bases prior to task initiation.
(3) Tugboats function without failure during service operations.
(4) Tugboats maintain fixed optimal economical speeds according to their horsepower ratings.
(5) Service time depends solely on travel distance and speed; other factors are not considered.
(6) Each tugboat serves only one vessel simultaneously, with available fleet size and grades meeting operational requirements.
(7) The water depth of the port basin is assumed to be consistently sufficient for all vessel types, and the impact of tides is neglected.
3.3. Parameters and Variables
To ensure a rigorous and unambiguous formulation of the tugboat scheduling problem,
Table 1 provides a comprehensive definition of the mathematical nomenclature, categorizing it into sets, parameters, and decision variables. This notation provides the framework for the MILP model presented below, which quantifies total operating cost (Ctotal) and total operation time (Ttotal), and it facilitates the systematic derivation of the constraints and objective functions required by the Jaya-QL optimization algorithm.
3.4. MILP Model
To enhance port operational efficiency, a MILP model has been formulated that minimizes total tugboat operating cost and total operation time while ensuring completion of all tasks. The objective functions are specified in Equations (
1)–(
4).
Equation (
1) minimizes total operating cost. Equation (
2) minimizes total operation time. Equation (
3) defines operating cost as the sum of costs incurred in non-towage and towage states. The product of speed and time effectively translates the temporal duration into an operational distance, which, when multiplied by the distance-based cost coefficient, yields a consistent monetary unit. Equation (
4) specifies operation time as the sum of travel times from the tugboat base to the task origin, from origin to destination, and from the destination back to the base.
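To make the two objective components concrete, the following sketch computes a single tugboat's operation time and operating cost as described for Equations (3) and (4): time is the sum of the three travel legs (base to task origin, origin to destination, destination back to base), and cost applies distance-based coefficients to the non-towage and towage legs. All names and numerical values are hypothetical placeholders, not the paper's notation.

```python
# Illustrative sketch of the per-tugboat objective components (hypothetical names).

def operation_time(d_base_origin, d_origin_dest, d_dest_base, speed):
    """Operation time: the three travel legs at the tugboat's fixed economical speed."""
    return (d_base_origin + d_origin_dest + d_dest_base) / speed

def operation_cost(d_base_origin, d_origin_dest, d_dest_base,
                   c_non_towage, c_towage):
    """Operating cost: non-towage legs (to origin, back to base) plus the towage leg."""
    return c_non_towage * (d_base_origin + d_dest_base) + c_towage * d_origin_dest

t = operation_time(2.0, 5.0, 4.0, speed=10.0)  # hours for an 11 nm round trip
c = operation_cost(2.0, 5.0, 4.0, c_non_towage=30.0, c_towage=80.0)
```

Summing these per-tugboat quantities over all assigned tugboats yields the two objective values described above.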
The constraints are specified in Equations (
5)–(
16):
Equation (
5) ensures that the number of tugboats assigned to each task meets the specific requirement of that task. Equation (
6) specifies that the total horsepower of the tugboats assigned to a task must satisfy the horsepower requirement of that task. Equation (
7) restricts each tugboat to at most one concurrent assignment. Equation (
8) requires tugboats assigned to the first task to arrive at the task origin before the start time. Equations (
9) and (
10) impose sequencing constraints so that each tugboat executes tasks in the planned order. Equation (
11) requires a tugboat to return to its home base before starting the next task. Equation (
12) ensures that the available tugboats at each base are sufficient to meet task demand. Equation (
13) sets each tugboat’s initial location at its home base prior to the first task. Equation (
14) specifies that a tugboat must return to its base if it is not currently assigned to any task. Equations (
15) and (
16) define the domain constraints for the decision variables.
4. Algorithm
The Jaya algorithm is a population-based optimization technique that offers advantages such as parameter-free operation and strong global exploration capabilities. However, when applied in isolation, the Jaya algorithm exhibits limited local exploitation capability and is prone to becoming trapped in local optima, particularly in multimodal optimization problems. This asymmetry between its potent exploration and feeble exploitation restricts its overall performance. Furthermore, its inability to effectively utilize historical search information hinders a comprehensive exploration of the solution space. Q-learning, in contrast, is a reinforcement learning method that adapts to dynamic environments and optimizes policies from experience through sequential decision-making. Through trial-and-error interaction within a Markov decision process (MDP), it iteratively improves the policy. Its strength in learning from historical states and rewards presents a symmetrical complement to the population-wide, moment-driven update of Jaya. To address their individual limitations and harness their symmetrically complementary strengths, we propose a hybrid method based on a Jaya algorithm integrated with Q-learning to solve the tugboat scheduling problem. This integration is designed to establish a symmetrical and cooperative search mechanism. The approach employs multidimensional real-valued encoding to represent tugboat-assignment decisions and designs a fitness function within an event-based modeling framework. The detailed procedure of the algorithm is described as follows.
4.1. Encoding
In traditional binary encoding, vessel and tugboat identifiers are first converted to bit strings and then mapped back during decoding, which increases model complexity and computational burden. In contrast, a multidimensional real-valued encoding directly represents the scheduling logic, thereby simplifying the encoding–decoding process and improving resource-allocation efficiency. Therefore, this paper adopts a multidimensional real-valued representation for the tugboat-assignment problem to avoid the complexity of binary coding. During the encoding process, vessels are first numbered sequentially according to their arrival times, following a first-come-first-served principle where earlier arrivals receive priority in tugboat assignment. The encoding must also satisfy Equations (
5) and (
6) of the mathematical model. The dimension of the decision vector depends on the current number of vessel tasks and the maximum number of tugboats required by any single task. An example of the specific encoding is provided in
Table 2. In the example, six vessels await tugboat service, and a tugboat index of 0 denotes an empty slot (i.e., no tugboat is assigned). For illustration, tugboat IDs 7 and 9 are assigned to vessel 1 (two tugboats), whereas tugboat IDs 4, 11, 12, and 13 are assigned to vessel 2 (four tugboats).
4.2. Initial Individuals and Population
Each solution vector represents a candidate solution, and a set of n individuals forms the population. During Jaya initialization, vessel attributes (e.g., length) are first used to determine, for each vessel, the required bollard pull and the number of tugboats. The algorithm then performs randomized matching from the pool of available tugboats. Each individual encodes the tugboat configuration for all vessels in the current stage. Its dimension is determined by the total number of vessel tasks and the maximum number of tugboats required by any single task. To construct the initial population, we proceed iteratively: for each individual to be generated, the algorithm computes tugboat demand for each vessel from its physical parameters; subject to specification constraints, it then randomly samples the required number of tugboats from the available set, thereby ensuring feasibility of every initial individual. An example of population initialization is illustrated in
Figure 2. Each entry denotes a tugboat identifier; 0 indicates an empty slot (i.e., no tugboat assigned). The horizontal dimension of an individual equals the maximum per-task tugboat requirement, and the vertical dimension equals the number of vessel tasks in the current stage. For example, Task 1 requires two tugboats (IDs 7 and 9), whereas Task 2 requires four tugboats (IDs 4, 11, 12, and 13).
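The initialization procedure described above can be sketched as follows, under simplifying assumptions: each vessel's tugboat demand is given directly (rather than derived from vessel length and tonnage), and tugboats are sampled without replacement within a task but may recur across tasks, reflecting sequential service. Fleet size and demand values are hypothetical.

```python
import random

def init_individual(demands, fleet, max_req, rng):
    """One individual: a row per vessel task, padded with 0 up to max_req slots."""
    individual = []
    for need in demands:
        chosen = rng.sample(fleet, need)  # randomized feasible matching per task
        individual.append(chosen + [0] * (max_req - need))
    return individual

def init_population(n, demands, fleet, rng=None):
    """Build n feasible individuals; width = max per-task requirement."""
    rng = rng or random.Random(42)  # seeded for reproducibility of the sketch
    max_req = max(demands)
    return [init_individual(demands, fleet, max_req, rng) for _ in range(n)]

pop = init_population(n=20, demands=[2, 4, 2, 3], fleet=list(range(1, 16)))
```

Every generated individual thus satisfies the required tugboat count per task by construction, matching the feasibility guarantee stated above.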
4.3. Fitness Function
Based on the preceding formulation and encoding scheme, each generated solution vector (individual) represents a tugboat-scheduling plan. First, solution vectors are generated to satisfy task requirements according to Equations (
5)–(
7), while the initial positions of the tugboats must conform to Equation (
13). Next, the latest arrival time for the first task is derived using Equation (
8), and the start and end times of each task for every tugboat are computed without considering other influencing factors. The composition of these times must conform to Equations (
9) and (
10). Subsequently, the return status of the tugboats is validated based on Equations (
11) and (
12). Any tugboat that fails to meet these constraints at the current node is designated as unassignable. In addition, tugboats without reassigned tasks must satisfy Equation (
14). If the solutions generated by the Jaya algorithm fail to satisfy the required tugboat count, the repair mechanism supplements the assignment by selecting idle tugboats based on the 'shortest relocation distance' principle. By fine-tuning the operational timeline, the system ensures both physical feasibility and temporal non-overlap for all tugboat sequences. This repair logic is implemented via a linear traversal, so its computational overhead grows stably with problem scale. In actual port operations, some vessels may have restricted maneuverability; in such cases, additional tugboats beyond the standard configuration must be allocated. Given the modeling assumption that tugboat operation time depends solely on travel distance and speed, we compute the total operation time Ttotal for all tugboats from the task start and finish times via Equation (
4). Fuel consumption during waiting periods is considered negligible. The total operating cost Ctotal incorporates only towing and non-towing cost consumption, as defined in Equation (
3). The pseudocode for fitness calculation is as follows:
| Fitness calculation |
| 1. For each tugboat m: |
| 2. Initialize the tugboat's position and availability. |
| 3. Compute the start and end times according to Equations (9) and (10). |
| 4. End for. |
| 5. For each ship n: |
| 6. assigned_tugboats ← candidate_solution[n]. |
| 7. required_quantity ← ship_tugboat_match[ship_type[n]].Quantity. |
| 8. if len(assigned_tugboats) ≠ required_quantity then |
| 9. assigned_tugboats ← repair_assignment. |
| 10. end if |
| 11. End for. |
| 12. For each tugboat m: |
| 13. Cost = cost_to_start + cost_to_return + cost_of_towing. |
| 14. Time = time_to_start + time_to_return + time_of_towing. |
| 15. End for. |
| 16. Calculate the fitness values of each individual. |
| 17. Fitness1 = Ctotal = ∑Cost; Fitness2 = Ttotal = ∑Time. |
| 18. Return Fitness1, Fitness2. |
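As a hedged illustration, the fitness pseudocode can be transcribed into Python under simplifying assumptions: the timing logic of Equations (9) to (14) is abstracted into precomputed per-leg costs and times for each tugboat, and the repair step simply truncates or pads an assignment from an idle pool. All data structures and names are hypothetical, not the paper's implementation.

```python
def repair_assignment(assigned, required, idle_pool):
    """Pad or truncate an assignment to the required tugboat count."""
    assigned = assigned[:required]            # drop surplus tugboats
    for tug in idle_pool:                     # pad from idle tugboats
        if len(assigned) == required:
            break
        if tug not in assigned:
            assigned.append(tug)
    return assigned

def fitness(solution, required_counts, leg_costs, leg_times, idle_pool):
    """Return (Fitness1, Fitness2) = (total cost, total time) over used tugboats.

    leg_costs / leg_times map tugboat ID -> (to_start, towing, to_return)."""
    used = set()
    for n, assigned in enumerate(solution):
        assigned = [t for t in assigned if t != 0]      # drop empty slots
        if len(assigned) != required_counts[n]:
            assigned = repair_assignment(assigned, required_counts[n], idle_pool)
        used.update(assigned)
    total_cost = sum(sum(leg_costs[m]) for m in used)
    total_time = sum(sum(leg_times[m]) for m in used)
    return total_cost, total_time

cost, time = fitness([[7, 9, 0]], [2],
                     leg_costs={7: (10.0, 50.0, 10.0), 9: (5.0, 50.0, 15.0)},
                     leg_times={7: (0.2, 1.0, 0.3), 9: (0.1, 1.0, 0.4)},
                     idle_pool=[])
```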
4.4. Jaya Algorithm
The Jaya algorithm is a metaheuristic optimization technique based on swarm intelligence. Its parameter-free nature significantly reduces the cost of implementation and tuning. During iterations, candidate solutions are updated by directly utilizing information from the symmetrically opposing forces of the best and worst solutions in the current population, creating a symmetrical push-pull dynamic. This mechanism drives solutions progressively toward optimal regions while steering them away from inferior directions, thereby achieving the goal of global optimization. The update rule is specified in Equation (
17):
where X_best and X_worst respectively represent the current best and worst solutions, and r1 and r2 are random numbers within [0, 1].
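Equation (17) follows the standard Jaya update rule; a minimal sketch for one real-valued decision vector is shown below, assuming the conventional formulation with absolute-value terms (Rao's original Jaya). The random-number generator and test vectors are illustrative.

```python
import random

def jaya_update(x, best, worst, rng=random.Random(0)):
    """Standard Jaya move: pull toward the best solution, push away from the worst."""
    new_x = []
    for j in range(len(x)):
        r1, r2 = rng.random(), rng.random()   # fresh r1, r2 in [0, 1] per dimension
        new_x.append(x[j]
                     + r1 * (best[j] - abs(x[j]))     # attraction to the best
                     - r2 * (worst[j] - abs(x[j])))   # repulsion from the worst
    return new_x
```

For the scheduling problem, the updated real values would then be decoded back into feasible tugboat assignments via the repair mechanism described in Section 4.3.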
Solving multi-objective problems is often constrained by the complexity of the solution space. The Jaya algorithm, with its iterative mechanism that follows favorable solutions and avoids unfavorable ones, exhibits a balance between global exploration and local exploitation. This enables the algorithm to more readily approach the global optimum in multi-constrained, multi-objective tugboat scheduling problems. The specific flowchart of the Jaya algorithm adopted in this paper for solving the tugboat scheduling model is illustrated in
Figure 3.
4.5. Q Learning
Q-learning is a value-based reinforcement learning algorithm. It aims to select the optimal action by leveraging values within a Q-table to achieve the best possible learning outcome. In this paper, the Q-learning framework is constructed as a quadruple that includes the state space, action space, Q-table, and reward function, detailed as follows:
(1) State space
In this paper, the relative quality gap between the current solution obtained by the Jaya algorithm and the global optimal solution is categorized into three states. These three states are formally defined in Equation (
18) as follows:
where the two quantities compared are the current normalized objective value and the optimal normalized objective value, respectively.
The three-level state design achieves policy differentiation while ensuring learning efficiency. Fewer divisions would weaken the diversity of strategies. A threshold of 0.1 in State S(1) ensures that high-quality solutions tend to remain stable, preserving the current solution. Conversely, a threshold of 0.5 delineates a significant improvement, prompting more aggressive adjustments in State S(3). The selection of these thresholds is guided by the typical improvement rates observed during the initial stages of solving the tugboat scheduling problem. Specifically, preliminary sensitivity analysis was conducted by testing different threshold orders of magnitude. The selected values ensure that the state space can effectively distinguish between global exploration phases and local convergence phases without being overly sensitive to numerical noise, thereby stabilizing the Q-value updates.
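The three-level bucketing with the 0.1 and 0.5 thresholds can be sketched as follows; the exact gap formula of Equation (18) is not reproduced in the text, so the relative-gap expression here is an assumption consistent with the description above.

```python
def classify_state(f_current, f_best):
    """Bucket the relative quality gap into the three states S(1)-S(3)."""
    gap = abs(f_current - f_best) / max(abs(f_best), 1e-12)  # assumed gap measure
    if gap <= 0.1:
        return 1   # S(1): near-optimal; keep the current solution stable
    elif gap <= 0.5:
        return 2   # S(2): moderate gap; mild adjustment
    else:
        return 3   # S(3): large gap; aggressive adjustment
```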
(2) Action space
State evaluation computes the relative quality gap between the current and global optimal solutions. Actions are then selected via an ε-greedy policy. We design a discrete action space with three adjustment strategies: A(1), A(2), and A(3).
A(1): Maintain current tugboat assignments.
A(2): Replace random task assignments with global optimum equivalents.
A(3): Regenerate compliant tugboat combinations for selected tasks.
Table 3 defines the state–action synergy mechanism, which maps evolutionary states (
S) to their corresponding adaptive search strategies (
A) within the Q-learning module.
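The ε-greedy selection over the 3 × 3 state–action table can be sketched as below: with probability ε a random action is explored; otherwise the action with the highest Q-value for the current state is exploited. The Q-values shown are hypothetical.

```python
import random

def select_action(q_table, state, epsilon, rng=random.Random(0)):
    """Epsilon-greedy choice over actions A(1)-A(3) (indices 0-2)."""
    if rng.random() < epsilon:
        return rng.randrange(3)                 # explore: uniform random action
    row = q_table[state]
    return max(range(3), key=lambda a: row[a])  # exploit: argmax_a Q(state, a)

q_table = [[0.0, 0.2, 0.1],   # state S(1)
           [0.3, 0.0, 0.5],   # state S(2)
           [0.1, 0.4, 0.9]]   # state S(3)
action = select_action(q_table, state=2, epsilon=0.0)  # greedy: selects A(3)
```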
(3) Q table
Q-learning evaluates the action value (Q) for each state–action pair and selects the action that maximizes the Q-value, corresponding to the highest expected return. As learning progresses and additional feedback is incorporated, the Q-table more accurately approximates the action-value function, thereby improving action selection.
Table 4 presents the structural representation of the Q-matrix
, which is employed to guide the local search during the iterative process by quantifying the accumulated rewards for each State–Action pair.
Under the action-selection policy, actions with higher Q-values in the Q-table are more likely to be chosen. Integrating Q-learning enables the Jaya algorithm to approach globally optimal regions more efficiently. After each iteration, the Q-values are updated according to Equation (
19):
where Q(S, A) is the Q value of action A taken in state S, α is the learning rate, γ is the discount factor, r is the actual reward received after taking action A, and max Q(S′, A′) is the highest expected reward over all possible actions in the next state S′.
(4) Reward function
The reward function is designed to guide the agent during reinforcement-learning training in achieving a symmetrical trade-off between total tugboat operating cost and total operation time. Through iterative training, the agent maximizes the reward, thereby directing the search toward optimal solutions. An effective metric is essential in the design to evaluate whether the search is effectively guided toward improved solutions. Therefore, a reward function is formulated based on the dual objectives of minimizing both tugboat operating costs and operating time. This function computes a feedback reward value to assess actions and optimize strategies. The reward function R is defined in Equation (
20):
\[ R = \sum_{i=1}^{2} w_i \, e^{-\eta_i f_i}, \qquad f_1 = C, \; f_2 = T \tag{20} \]
where C represents the total operating cost of the tugboats, T represents the total operation time of the tugboats, w_i represents the weight assigned to the i-th objective, and η_i represents the benchmark value used to eliminate the dimensional difference of the i-th objective.
The proposed reward function provides a symmetric evaluation of the independent contributions of cost and time, preventing any single objective from dominating the signal. In Equation (
20), the benchmark value is defined as the reciprocal of the initial optimal value of the corresponding single objective. It eliminates the scale and unit differences between cost and time by transforming them into dimensionless ratios. Meanwhile, the negative exponential mapping ensures that the reward decreases monotonically as the objective values increase. This design maintains symmetric optimization pressure and provides a stable, normalized reward signal for the Q-learning module. Moreover, the nonlinearity of exponential decay makes the reward more sensitive to improvements across the relevant range. Compared with hard-threshold schemes, the exponential function provides a continuous reward, enhancing dynamic adaptability and enabling the agent to discern subtle improvements. This fine-grained feedback facilitates a balanced trade-off between exploration and exploitation.
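Under this reading of Equation (20), with the benchmark taken as the reciprocal of each single-objective optimum, the reward can be sketched as below. The cost and time figures are invented for illustration:

```python
import math

def reward(cost, time, cost_ref, time_ref, w=(0.5, 0.5)):
    """R = w1*exp(-cost/cost_ref) + w2*exp(-time/time_ref).
    Dividing by the single-objective optima (eta_i = 1/f_i*) yields
    dimensionless ratios; the negative exponential shrinks the reward
    as either objective grows."""
    return w[0] * math.exp(-cost / cost_ref) + w[1] * math.exp(-time / time_ref)

# Illustrative figures only: a schedule matching both optima is rewarded most.
r_best = reward(1000.0, 50.0, cost_ref=1000.0, time_ref=50.0)   # equals exp(-1)
r_worse = reward(2000.0, 80.0, cost_ref=1000.0, time_ref=50.0)  # strictly smaller
```

Because both terms are bounded in (0, 1], neither objective can dominate the signal, which matches the symmetric-evaluation property described above.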
4.6. The Jaya Algorithm Integrating Q-Learning
The hybrid optimization framework proposed in this paper deeply integrates the global search capability of the Jaya algorithm with the local optimization ability of Q-learning. It establishes a collaborative global-local optimization mechanism founded on a principle of symmetrical cooperation, thereby enhancing the algorithm’s efficiency and stability in solving complex multi-objective problems. The specific integration strategy, which leverages symmetric design at multiple levels, is implemented as follows:
(1) Global search layer. The Jaya algorithm performs population-level iterative updates to conduct coarse-grained exploration of the solution space, leveraging its rapid convergence. By comparing the difference vector between the symmetrically opposing forces of the current best and worst solutions, the population is steered toward high-potential regions, thereby avoiding the parameter-tuning complexity typical of many metaheuristics.
(2) Local optimization layer. A Q-learning module is embedded to adaptively refine solutions via a state–action–reward mapping. The state space is partitioned by the gap between a candidate solution and the global best; the action set comprises three operators—retain, copy segments from the global best, and random perturbation—and the reward function aggregates improvements in the dual objectives of cost and time to enable fine-grained exploitation.
(3) Knowledge-sharing mechanism. A dynamic and symmetrically bidirectional linkage is maintained between the global best solution and the Q-table. The global best informs local Q-learning actions, while the outcomes of these actions subsequently contribute to updating the global best, creating a symmetric cycle of knowledge exchange and enabling efficient cross-level transfer of experiential knowledge.
To integrate the components into a functional optimizer, the Jaya-QL framework adopts a hierarchical, dual-layer search logic. In the first layer, the population undergoes a global search based on the standard Jaya update rule to broadly explore the solution space. In the second layer, each individual solution is treated as an independent reinforcement-learning agent for micro-level refinement.
Unlike traditional hybrid metaheuristics that apply uniform operators to the entire population, Jaya-QL allows each agent to observe its own relative performance state (
S) and adaptively select an optimal strategy (
A) from a shared Q-table. This shared structure facilitates collective intelligence, where agents receive immediate rewards (
R) based on solution-quality improvements, subsequently updating the Q-values via Equation (
19). This synergistic interaction ensures a symmetric balance between global exploration and localized, adaptive exploitation, effectively enhancing both convergence speed and the diversity of the Pareto front. The complete procedural flow of this hybrid approach is formalized in Algorithm 1.
| Algorithm 1 Jaya-QL Algorithm for Tugboat Scheduling |
1. Initialize: Population P, Q-table Q(S, A) ← 0, parameters α, γ, ε.
2. Evaluate: Calculate bi-objective fitness for each X_i ∈ P; identify X_best and X_worst.
3. While g ≤ MG do:
4.   Step 1: Population-level Jaya Search
5.   For each X_i ∈ P:
6.     X_i′ = X_i + r_1 · (X_best − |X_i|) − r_2 · (X_worst − |X_i|).
7.     If X_i′ is superior to X_i then X_i ← X_i′.
8.   Step 2: Individual-level Q-learning Refinement
9.   For each X_i ∈ P:
10.     Calculate the gap between X_i and X_best to determine current state S.
11.     Select Action A using the ε-greedy policy.
12.     Execute A to generate X_i″.
13.     Calculate Reward R using Equation (20).
14.     Update Q(S, A) using the Temporal Difference (TD) error.
15.     If X_i″ is superior to X_i then X_i ← X_i″.
16.   Update X_best and X_worst.
17. End While
18. Return P (Pareto optimal solution set).
|
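The population move in Step 1 follows the standard Jaya rule. A continuous-valued sketch is shown below; an actual scheduling implementation would additionally decode and repair the vector into a feasible tug assignment:

```python
import random

def jaya_update(x, x_best, x_worst, rng=random):
    """One Jaya move: drift each component toward the best solution and away
    from the worst, scaled by fresh random factors r1, r2 in [0, 1)."""
    return [
        xi + rng.random() * (bi - abs(xi)) - rng.random() * (wi - abs(xi))
        for xi, bi, wi in zip(x, x_best, x_worst)
    ]

random.seed(42)  # deterministic for the example; values here are illustrative
x_new = jaya_update([0.4, 0.6], x_best=[0.9, 0.8], x_worst=[0.1, 0.2])
```

Because the rule uses only the current best and worst solutions plus uniform random factors, it needs no algorithm-specific control parameters, which is the tuning-free property the text attributes to Jaya.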
This paper employs a Jaya algorithm integrated with Q-learning to solve the tugboat scheduling model. The detailed computational procedure is illustrated in
Figure 4.
I. Input specification. Three data sources are required: port task data, available tugboat resources, and tugboat-base information. Task data specify, for each job, the required number of tugboats, required bollard pull, start time, origin, and destination. Resource data describe the available fleet, including fleet size, bollard pull, free-sailing speed, free-sailing cost, and towage cost. Tugboat-base data include the number of bases, their capacities, and locations.
II. Feasible initialization. Each task must be matched with tugboats that satisfy its quantity and power requirements, which constrains tugboat usage. Given the task order and model constraints, we identify the set of currently available tugboats and randomly generate an initial pool of candidate solutions from this set, ensuring that no infeasible assignment arises.
III. Joint global–local optimization. The Jaya algorithm iteratively updates the population to rapidly identify high-quality regions of the search space. In parallel, a Q-learning module performs local policy refinement: it dynamically selects adjustment actions based on the current state and updates the Q-table via the Bellman recursion for policy evaluation. To balance exploitation near incumbent good solutions with exploration of new regions, we adopt an ε-greedy action-selection strategy, which facilitates escape from local optima. The procedure returns the resulting tugboat schedule for the set of vessel tasks.
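The ε-greedy rule can be sketched as follows. The Q-row values are hypothetical, and ε is treated here as the probability of exploring, matching the paper's "exploration rate" terminology:

```python
import random

def epsilon_greedy(q_row, epsilon, rng=random):
    """With probability epsilon take a random action (explore);
    otherwise take the action with the highest Q-value (exploit)."""
    if rng.random() < epsilon:
        return rng.choice(list(q_row))   # explore: uniform random action
    return max(q_row, key=q_row.get)     # exploit: best-known action

q_row = {"A1": 0.2, "A2": 0.9, "A3": 0.1}    # hypothetical Q-table row
picked = epsilon_greedy(q_row, epsilon=0.0)  # epsilon = 0 -> pure exploitation
```

Setting ε = 0 always returns the best-known action (here "A2"); raising ε injects random actions, which is what lets the search escape local optima.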
5. Simulation Experiment
To validate the effectiveness of the proposed Jaya-QL, this paper developed a multi-tier experimental validation framework. First, small- and medium-scale simulation cases were randomly generated and compared against the Jaya algorithm to assess the improvements introduced by the framework. Subsequently, actual vessel scheduling data from the main port area of Lianyungang in October 2024 were employed as a case study. In this setting, Jaya-QL was comprehensively compared with ABC, QPSO, and the standard Jaya algorithm to evaluate its overall performance advantages under complex multi-objective conditions. The Jaya-QL algorithm is configured with the following parameters: population size N = 50, number of iterations MG = 500, exploration rate ε = 0.8, learning rate α = 0.25, and discount factor γ = 0.95. All algorithms were run on a personal computer equipped with an Intel i5-9300H processor (2.40 GHz) and 8 GB of memory.
5.1. Parameter Setting
This paper designs a tugboat scheduling experiment based on the actual tugboat scheduling process of the main port area of Lianyungang Port. The port operates 14 tugboats distributed across three bases, with the horsepower (HP) categories and quantities detailed in
Table 5. The experiment was designed to evaluate the practical performance and advantages of the Jaya-QL algorithm. Lianyungang Port has fuel-powered tugboats and electric tugboats. The operating cost per nautical mile for tugboats with different horsepower levels performing scheduled tasks under different conditions is shown in
Table 6. The cost coefficients presented in
Table 6 are derived from the actual operational data provided by the Scheduling Center of Lianyungang Port. These values represent the operating costs, mainly the economic expenses required for the average fuel consumption per nautical mile of tugboats of different horsepower grades under standard working conditions. Tugboats in Lianyungang are restricted to a maximum speed of 11 knots, and voyage time is calculated according to actual distance and speed. When calculating the objective function values, the unit cost coefficients (CNY/H) of tugboats with different horsepower are allocated as shown in
Table 6. To guarantee the safety and stability of port operations, the allocation of tugboat resources is strictly governed by mandatory port regulations. For various vessel types, the specific requirements regarding the quantity and horsepower of tugboats assigned for berthing, unberthing, and shifting operations must adhere to the standardized configuration protocols specified in
Table 7.
5.2. Small and Medium-Scale Case Studies
This section evaluates the effectiveness of the Jaya and Jaya-QL algorithms using small-scale and medium-scale cases. The objective function is formulated with a linear weighting approach to simultaneously optimize total tugboats operating cost and operation time. A comparative analysis is conducted to assess the performance of the two algorithms across different problem scales. The linear weighted objective function is expressed as follows:
\[ \min F = w_1 C + w_2 T \]
where C represents the total tug operating cost and T represents the total operation time. The coefficients w_1 and w_2 are weight factors assigned to the two objectives, respectively, each taking a value within the range [0, 1]. Because the two objectives differ markedly in units and scale, each is normalized prior to aggregation. The normalized linear weighted-sum formulation is given by
\[ \min F = w_1 \frac{C}{C^*} + w_2 \frac{T}{T^*} \]
where C* and T* are the optimal values obtained by solving the model as single-objective problems for the minimum operating cost (min C) and minimum operation time (min T) of the tugboats, respectively.
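The normalized weighted sum described above reduces to a one-line scoring function. The cost and time values below are illustrative only:

```python
def weighted_objective(cost, time, cost_star, time_star, w1=0.5):
    """F = w1*(C/C*) + (1 - w1)*(T/T*): dividing by the single-objective
    optima C* and T* removes unit and scale differences before weighting."""
    return w1 * cost / cost_star + (1.0 - w1) * time / time_star

# At the single-objective optima the normalized score is exactly 1;
# any schedule that is worse on both objectives scores strictly higher.
f_opt = weighted_objective(1000.0, 50.0, cost_star=1000.0, time_star=50.0)
```

This normalization is what makes a single weight pair (w_1, w_2) meaningful across instances whose costs and times differ by orders of magnitude.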
To evaluate the performance of the Jaya and Jaya-QL algorithms in tugboat scheduling, twelve small-scale and medium-scale test cases were randomly generated. The generation of test instances is strictly grounded in empirical operational experience and technical standards to ensure the authenticity of the simulation experiments. Vessel arrival times are modeled after the statistical characteristics of historical task instances to accurately reflect realistic operational rhythms and traffic density. Furthermore, vessels are categorized into five distinct types based on their length, with the mandatory tugboat count and horsepower requirements for each category determined in strict accordance with
Table 7. The comparison results are summarized in
Table 8. From
Table 8, it can be seen that the Jaya and Jaya-QL algorithms exhibit significant performance differentiation as task scale expands. For low-dimensional tasks (Instances 1 and 2), both algorithms converge to the same global optimum, confirming the validity of Jaya in small solution spaces. However, as task complexity increases, the Jaya-QL algorithm demonstrates superior global search capability. Its optimization results show an improvement ranging from 2.60% to 32.37% over the Jaya algorithm. For instances with 16 or more tasks, the average improvement reaches 25.27%, indicating a pronounced advantage in higher-dimensional settings. This performance gain comes at the cost of longer computation times, as Jaya-QL integrates Q-learning to guide local search.
Similarly, the performance of Jaya and Jaya-QL was assessed across the same twelve cases from the perspective of objective function optimization.
Figure 5 and
Figure 6 illustrate the total tug operating cost and overall operation time obtained by both algorithms. The results show that each method converges to the same optimal solution in low-dimensional spaces, confirming the validity of the baseline Jaya algorithm. As task scale increases, Jaya-QL demonstrates clear superiority. In particular, for the medium-dimensional Instance 11 (m = 14, n = 20, k = 3), Jaya-QL achieves a 32.56% reduction in operating cost compared to Jaya, highlighting the effectiveness of Q-learning in guiding local search and helping the algorithm escape local optima. For medium-scale tasks, Jaya-QL consistently identifies scheduling solutions with lower multi-objective values than Jaya, owing to its dynamic exploitation strategy.
Figure 7 shows the computational times of the two algorithms. Due to the state–action value function update introduced by the Q-learning module, the average solution time of Jaya-QL increases by 30.11–42.55% compared with Jaya, and this time cost is positively correlated with problem dimension.
5.3. Large-Scale Case Study
This paper uses the arrival and departure timetable for the main port area of Lianyungang from 12:00 on 14 October 2024 to 12:00 on 16 October 2024 as the case study. During this period, 34 vessels required tug assistance for berthing, unberthing, and shifting. Detailed vessel information is provided in
Table 9, and tug assignments for each vessel are set according to
Table 7. To ensure a fair and rigorous comparison, the benchmark algorithms (ABC, QPSO, ACO, GA, and standard Jaya) were specifically adapted to the tugboat scheduling problem. A unified encoding and decoding framework was implemented across all heuristics to standardize the search space. Furthermore, identical constraint-handling mechanisms were integrated to address the intricate time-window and tugboat-capacity requirements of the port. Each algorithm was executed using its optimal parameter configuration to ensure peak performance on the Lianyungang Port dataset. Finally, to evaluate the statistical stability of the proposed Jaya-QL and its counterparts, each algorithm was subjected to 10 independent trials.
The convergence trajectories in
Figure 8 illustrate that while the GA algorithm achieves rapid initial optimization, it suffers from premature convergence, with search performance stagnating after the middle stage of the search. In stark contrast, although Jaya-QL exhibits a more moderate initial pace, it displays a more robust convergence profile and superior sustained optimization capability. Notably, in the later stage of the search, Jaya-QL achieves stepwise performance breakthroughs, successfully escaping local optima. This sustained search activity underscores the efficacy of Jaya-QL in balancing exploration and exploitation, ultimately yielding solution quality that significantly surpasses that of GA and the other benchmark metaheuristics. As shown in
Table 10, the optimal solutions along with their corresponding total operation costs, total operation times, and algorithm computation times are presented for the benchmark algorithms. The proposed Jaya-QL algorithm demonstrates a clear advantage in global optimization compared to the benchmark algorithms. By integrating Q-learning's dynamic balance between exploration and exploitation, Jaya-QL effectively escapes local optima throughout the iterations. The fitness value of the final solution exhibits a stepwise decline, confirming the effectiveness of reinforcement learning in enhancing solution quality in multi-objective optimization problems. In contrast, the other algorithms tend to exhibit premature convergence during the mid-phase of the search process. In addition to its superior convergence accuracy, Jaya-QL demonstrates remarkable computational reliability. Statistical analysis reveals that the algorithm yielded a standard deviation of 10,750.41 for operational costs, a figure substantially lower than those recorded for ABC (48,232.52) and the standard Jaya algorithm (42,394.28). Furthermore, it maintained a minimal standard deviation of 7.21 in temporal costs. This enhanced stability underscores the efficacy of the integrated Q-learning framework in mitigating sensitivity to stochastic variations. By dynamically regulating the exploration–exploitation trade-off, the proposed mechanism facilitates consistent convergence toward high-quality solutions.
As shown in the multi-objective performance comparison in
Figure 9, the proposed Jaya-QL algorithm demonstrates dual-dimensional advantages in tugboat scheduling optimization. To provide a fair and robust evaluation, the data visualized in
Figure 9 are derived from the average performance across 10 independent trials. Specifically, for each generation, the objective values shown are the arithmetic means calculated from the sample scenarios, thereby mitigating the impact of stochastic fluctuations inherent in meta-heuristic algorithms. In terms of operating cost, Jaya-QL achieves reductions of 24.36%, 17.92%, 15.13%, 7.35% and 22.83% compared to ABC, QPSO, ACO, GA and the standard Jaya algorithm, respectively. In addition, the proposed algorithm achieved improvements in total tugboat operation time, which is a key scheduling metric. The observed optimization effects were 1.58%, 0.29%, 0.23%, 0.21% and 0.93%, respectively. This dual-objective synchronous optimization capability is attributed to the hybrid architecture of Jaya-QL. The reward-punishment mechanism in Q-learning guides the search direction using historical experience, thereby reducing the probability of generating invalid scheduling solutions. Meanwhile, the elite retention strategy in the Jaya algorithm ensures solution diversity. While Jaya-QL significantly outperforms other heuristics in solution quality, its CPU time is higher than that of computationally efficient algorithms such as GA and QPSO. This increased overhead is primarily attributed to the recursive updates of the Q-table and the state–action selection process inherent in reinforcement learning. However, it is noteworthy that Jaya-QL remains more efficient than ACO and ABC and even achieves a slight reduction in time compared to the standard Jaya by streamlining the search trajectory. Despite this, minimizing computational latency remains a critical avenue for future work. Potential enhancements include optimizing the Q-learning update frequency or implementing parallel processing frameworks to further adapt the algorithm for real-time tugboat dispatching in large-scale port operations. The final optimal tugboat scheduling solution is illustrated in
Figure 10.
The experimental results indicate that the proposed Jaya-QL algorithm not only reduces port operating costs but also substantially shortens vessel operation cycles. Beyond economic benefits, these improvements have profound implications for the construction of ’Green Ports’. From a sustainability perspective, the significant reduction in operation time directly translates to lower fuel consumption for both tugboats and berthing vessels, thereby effectively minimizing the carbon footprint and atmospheric pollutant emissions within the port area. Furthermore, the multi-objective collaborative optimization capability of Jaya-QL offers a novel technical pathway for dynamic and real-time tugboat scheduling in complex environments. This allows port decision-makers to transition from traditional cost-centric management to a more sustainable model that harmonizes economic efficiency with environmental stewardship, providing a scalable solution for the global maritime industry’s green transition.
5.4. Parameter Sensitivity Analysis
To balance the convergence performance and computational efficiency of the Jaya-QL algorithm, this study configured the population size at N = 50 and the maximum iterations at MG = 500. Preliminary experiments demonstrate that this configuration provides a sufficient search space to identify high-quality solutions while ensuring stable convergence within a reasonable timeframe. Consequently, a one-factor sensitivity analysis was conducted on the three core parameters of the Q-learning module: the exploration rate (ε), the learning rate (α), and the discount factor (γ).
As illustrated in
Figure 11, the exploration rate ε plays a critical role in balancing global exploration and local exploitation. A lower exploration rate restricts operator diversity, which may cause the algorithm to become trapped in local optima and subsequently increase operational costs. The optimal search performance is achieved when ε = 0.8. Subsequently, the sensitivity analysis of the learning rate α in
Figure 12 indicates that the update frequency of the Q-table directly influences the stability of the guidance strategy. Setting α = 0.25 effectively harmonizes learning efficiency with search stability, preventing both strategy lag and numerical oscillation. Finally, the analysis of the discount factor γ in
Figure 13 reveals that strategic foresight contributes significantly to scheduling quality. A higher discount factor (γ = 0.95) enables the algorithm to anticipate future task demands, thereby optimizing tugboat operation sequences and minimizing non-productive travel.
In summary, the optimal parameter combination is identified as ε = 0.8, α = 0.25, and γ = 0.95. Under this configuration, the algorithm achieves the most favorable balance between operational and temporal costs. These results validate the efficacy and robustness of the selected parameters for addressing large-scale tugboat scheduling problems.
6. Conclusions
This paper addressed the challenge of multi-task and multi-base collaborative tugboat scheduling in port operations. A multi-objective MILP model was developed to minimize both total operating cost and operation time while accounting for realistic constraints, including tugboat performance heterogeneity, multiple base locations, and continuity in task sequences. This model establishes a practical decision-making framework that achieves a symmetry between economic efficiency and operational timeliness.
To overcome the limitations of the Jaya algorithm in avoiding local optima and adapting search strategies, we proposed a hybrid optimization framework (Jaya-QL) that employs a Jaya algorithm integrated with Q-learning. By introducing reinforcement learning-based dynamic strategy adjustment, the framework adaptively achieves a symmetric balance between global exploration and local exploitation, thereby improving convergence accuracy and solution quality. Its effectiveness was verified through benchmark tests of varying scales and real operational data from 34 vessels in Lianyungang Port. The results consistently demonstrated that Jaya-QL outperformed comparative algorithms such as ABC and QPSO in optimizing both cost and time, confirming the value of integrating reinforcement learning with metaheuristics in complex scheduling tasks.
The proposed Jaya-QL framework provides a highly efficient and practical optimization paradigm for port operations, demonstrating significant improvements in cost reduction and vessel turnaround efficiency. Beyond its immediate application to tugboat scheduling, the scalability and generalizability of Jaya-QL suggest that integrating reinforcement learning with metaheuristic algorithms offers a potent solution for various complex, resource-constrained scheduling problems.
However, it is acknowledged that the current study operates under a deterministic MILP framework with several idealized assumptions, such as fixed vessel arrival times, constant sailing speeds, and 100% resource reliability. In practice, port environments are inherently stochastic and susceptible to uncertainties including vessel delays, equipment failures, and fluctuating weather conditions. To bridge the gap between theoretical methodology and real-world maritime dynamics, our future research will focus on transitioning from deterministic models to stochastic or robust optimization frameworks. Specifically, incorporating uncertainty parameters and exploring multi-agent reinforcement learning (MARL) will be prioritized to enhance collaborative scheduling efficiency and robustness. This evolution will further broaden the applicability of the Jaya-QL methodology to a wider range of multi-objective optimization contexts in increasingly dynamic environments.