A Multi-Objective Artificial Bee Colony Algorithm Incorporating Q-Learning Search for the Flexible Job Shop Scheduling Problems with Multi-Type Automated Guided Vehicles

Ge, Shihong; Zhang, Hao; Xu, Zhigang; Yang, Zhiqi

doi:10.3390/app152010948


by Shihong Ge ¹,², Hao Zhang ¹,²,*, Zhigang Xu ¹,² and Zhiqi Yang ¹,²

¹ Shenyang Institute of Automation, Chinese Academy of Sciences, 114 Nantajie, Shenyang 110016, China

² University of Chinese Academy of Sciences, 19 Yuquanlu, Beijing 100049, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(20), 10948; https://doi.org/10.3390/app152010948

Submission received: 11 September 2025 / Revised: 4 October 2025 / Accepted: 6 October 2025 / Published: 12 October 2025


Abstract

The flexible job shop scheduling problem (FJSP) with transportation resources such as automated guided vehicles (AGVs) is prevalent in manufacturing enterprises. Multi-type AGVs are widely adopted to transfer jobs and realize the collaboration of different machines, but are often ignored in current research. Therefore, this paper addresses the FJSP with multi-type AGVs (FJSP-MTA). Considering the difficulties caused by the introduction of transportation and the NP-hard nature, the artificial bee colony (ABC) algorithm is adopted as a fundamental solution approach. Accordingly, a Q-learning hybrid multi-objective ABC (Q-HMOABC) algorithm is proposed to deal with the FJSP-MTA. First, to minimize both the makespan and total energy consumption (TEC), this paper proposes a novel mixed-integer linear programming (MILP) model. In Q-HMOABC, a three-layer encoding strategy based on operation sequence, machine assignment, and AGV dispatching with type selection is used. Moreover, during the employed bee phase, Q-learning is employed to update all individuals; during the onlooker bee phase, variable neighborhood search (VNS) is used to update nondominated solutions; and during the scout bee phase, a restart strategy is adopted. Experimental results demonstrate the effectiveness and superiority of Q-HMOABC.

Keywords:

flexible job shop scheduling; multi-type automated guided vehicles; artificial bee colony algorithm; reinforcement learning; mixed integer linear programming

1. Introduction

Over the past several decades, as industrial manufacturing has shifted towards small-batch, multi-variant production, flexible manufacturing has become a focal point of research in both academia and industry [1]. The flexible job shop scheduling problem (FJSP) is a representative NP-hard combinatorial optimization problem with extensive application in real-world production scenarios [2]. In an FJSP system, each job comprises a sequence of operations that must be processed on distinct machines, which requires machine-to-machine transfer [3,4]. To complete these transportation tasks efficiently, manufacturing enterprises typically employ automated guided vehicles (AGVs) to transfer jobs between machines automatically [5]. However, some studies ignore transportation time or simply fold it into processing time. This simplification compromises practical relevance, because processing and transportation operations are tightly linked: the completion time of the preceding operation determines the start time of the next transportation operation, while the transportation time directly determines the start time of the subsequent operation [6,7]. Consequently, integrated scheduling of production and transportation resources is critical for generating actionable and feasible solutions. The FJSP with AGV transportation resources (FJSP-T) traditionally addresses three subproblems: operation sequence, machine assignment, and AGV dispatching. While extensive research has addressed the FJSP-T, most studies assume a single AGV type and overlook the impact of AGV heterogeneity [8]. For example, different load/empty speeds and energy consumption levels affect the total energy consumption (TEC) and the production cycle. In practical manufacturing environments, however, workshops rarely rely on a single AGV type. Instead, due to budget limitations, space constraints, and the diversity of transportation requirements, it is common to deploy a heterogeneous fleet of AGVs with distinct physical and operational characteristics. This heterogeneity may include variations in acceleration profiles, load/empty travel speeds, and energy consumption patterns under different operating states [9]. Such heterogeneity inherently influences both transportation efficiency and the overall energy consumption of the production system. Neglecting these differences not only risks underestimating the actual TEC but also overlooks potential trade-offs between makespan and energy usage [10].

There is currently a lack of research on multi-type AGVs that integrate energy consumption and makespan. To bridge this gap, this paper extends the conventional FJSP-T framework by introducing AGV-type selection as a fourth subproblem alongside operation sequence, machine selection, and AGV assignment, thereby formulating a mathematical model for the FJSP with multi-type AGVs (FJSP-MTA). The proposed mixed-integer linear programming (MILP) model explicitly incorporates AGV heterogeneity into the integrated scheduling process, enabling a coupled optimization of production and transportation resources that more accurately reflects workshop realities. In doing so, our study aims at the simultaneous minimization of makespan and TEC, aligning with both productivity and sustainability imperatives. This dual objective formulation not only enhances the operational feasibility of generated schedules but also provides decision-makers with actionable insights for achieving balanced, energy-efficient, and cost-effective manufacturing operations.

In recent years, learning-based methods that integrate heuristic algorithms with reinforcement learning (RL) have been developed to solve the FJSP [11,12,13]. This paradigm combines the global search capabilities of heuristic algorithms with the self-learning abilities of RL, which can improve optimization performance and generalization. Drawing on this idea, this paper presents a novel multi-objective approach, the Q-learning hybrid multi-objective artificial bee colony algorithm (Q-HMOABC), to solve the FJSP-MTA optimization model. The Pareto-optimal solutions obtained by Q-HMOABC show better global convergence and local diversity with shorter running time, which gives the algorithm distinct advantages for solving the FJSP-MTA.

The main innovations of this paper are summarized as follows.

(1): Novel MILP model: A MILP model with the objective of minimizing makespan and TEC is developed to formulate the FJSP-MTA.
(2): Novel algorithm framework: A Q-HMOABC algorithm is designed to address the FJSP-MTA efficiently, with improved strategies adopted in the three ABC phases. Q-HMOABC is formed by embedding the Q-learning RL method into the employed bee phase of the ABC algorithm.
(3): Extensive experiments: Experiments on unconstrained benchmark functions and benchmark instances are conducted to test the performance of Q-HMOABC, and the statistical results illustrate the superiority of the proposed algorithm in solving the FJSP-MTA.

The remainder of this paper is organized as follows: Section 2 reviews existing studies on the FJSP-T and relevant solving methods. Section 3 describes the problem and proposes a detailed MILP model. Section 4 explains Q-HMOABC from multiple perspectives. Section 5 presents validation experiments in detail. Finally, Section 6 concludes the study and suggests directions for future research.

2. Literature Review

To the best of our knowledge, there is no literature aimed at solving the FJSP-MTA by using Q-HMOABC. Therefore, this paper conducts a review of the relevant literature, including studies on the FJSP-T and existing state-of-the-art work relating to the solving method.

2.1. FJSP with AGV Transportation Resources

The initial work on the FJSP-T was carried out by Raman et al. [14]. A mixed-integer programming model was formulated for the problem, with the objective of minimizing the makespan. In the study, the transportation process was conceptualized as a general operation, a decision that constrained the adaptability of the vehicles. In order to overcome this limitation, a nonlinear mixed-integer programming model was developed and a heuristic algorithm that incorporated a sliding time window was designed to generate a conflict-free schedule [15]. Subsequent to this, the research on the FJSP-T has gradually become more widespread, driven by both theoretical research and industrial technology. Ham et al. introduced two novel constraint programming models by integrating transfer robots in the FJSP [16]. Pan et al. [17] proposed a learning-based multi-population evolutionary optimization (LMEO) and used multiple heuristics for population initialization. Yan et al. [3] developed an FJSP model under finite AGV transportation constraints within a digital twin workshop. An improved genetic algorithm with three-layer redundant encoding and corrective decoding was proposed, achieving efficient integration of feasible schedules into the digital twin and demonstrating strong practical applicability. Meng et al. [18] proposed an improved genetic algorithm (IGA) for solving the multi-AGV flexible job shop scheduling problem, explicitly considering a limited number of AGVs. The IGA includes a population diversity check mechanism that maintains search robustness and, when tested on five benchmark sets, not only outperformed state-of-the-art algorithms but also found new best-known solutions for instances. Han et al. [19] proposed a novel dual-population collaborative genetic algorithm (DCGA) and introduced a new MILP model for small-sized instances, and designed DCGA with two distinct decoding methods and a dual-population framework. 
Experimental results demonstrated that the approach improved 18 best-known benchmark solutions and achieved optimal results on small-scale cases. Homayouni et al. [20] proposed a late acceptance hill-climbing (LAHC) heuristic, which found high-quality solutions with less computational time. Afterward, Homayouni et al. [21] proposed a multi-start biased random key genetic algorithm (BRKGA) that employs greedy heuristic rules to select machines and AGVs. Liu et al. [22] developed an MILP model to simultaneously minimize both makespan and comprehensive energy consumption by modeling the time-node dynamics and energy use of machines and AGVs. Zhang et al. [23] proposed energy-efficient flexible workshop scheduling with multiple AGVs (EFJS-AGV) to address the fact that existing research had not yet incorporated green indicators into manufacturing and logistics scheduling frameworks.

2.2. Solving Methods

Many methods have been applied to address the FJSP-T and its extensions, including exact algorithms (e.g., branch and bound, CP, and MILP), heuristics, RL, and commercial solvers (e.g., CPLEX and Gurobi). Fontes et al. [24] constructed a new MILP model for joint production and transportation scheduling problems. The proposed modeling method utilizes two sets of interrelated chain decisions and is highly efficient in finding optimal solutions of benchmark problem instances with the Gurobi solver. Heuristic algorithms have become the main tools because they can solve large-scale problems that exceed the capabilities of exact algorithms; examples include GA (Zhang et al.) [25], PSO (Ren et al.) [26], and ABC (Gao et al.) [27]. RL has made breakthroughs in many fields in recent years, and the combination of RL and heuristic methods has become an important research focus for the FJSP-T. Known as learning-based methods, these approaches give heuristic algorithms the ability to learn from experience. With this ability, the algorithms can adjust their search strategies and choose parameters more effectively, adapting to different scheduling conditions and improving solution quality. Dong et al. [28] formulated the FJSP-T as an MDP and proposed a heuristic-assisted Deep Q-Network (HA-DQN), integrating heuristic rules to reduce action-space complexity. This method demonstrates notable gains in both solution quality and computational speed versus traditional heuristics and pure DRL methods. Cheng et al. [29] proposed both exact and approximate methods for the energy-efficient FJSP-T (EFJSP-T). The exact method employed a novel MILP formulation using the epsilon method, and an approximate approach based on multi-population evolution with imitation learning (IL) was introduced to enhance the efficiency of solution space exploration. However, it remains limited to a single type of AGV. Li et al. [30] proposed a hybrid algorithm combining RL and the ABC algorithm for the FJSP with lot streaming, in which RL was developed by building mappings between the environment and the sublot schemes. Xu et al. [31] transformed the FJSP into a Markov Decision Process (MDP), solved it with RL techniques, and designed 16 composite dispatching rules. That study likewise assumed that only one type of AGV exists. Zhang [32] proposed an optimization scheme that minimizes the weighted sum of the makespan and TEC for the distributed FJSP (DFJSP) involving crane transportation. Yang et al. [33] proposed a digital twin-assisted predictive rescheduling method that generates proactive schedules using order-arrival hypotheses.

3. Problem Description and Formulation

In this section, a comprehensive description and analysis of the FJSP-MTA are provided first. Subsequently, the notations and definitions for the problem are listed. Next, a corresponding MILP model is formulated. Finally, an example of the FJSP-MTA is provided to illustrate the MILP model. Analyzing the transportation scenarios of the FJSP-MTA enables a deeper exploration of its domain knowledge, thereby establishing the foundation for the subsequent algorithm design.

3.1. Problem Description

The FJSP-MTA is described as follows: a given set of jobs I = {I₁, I₂, …, I_n} is processed on a set of machines K = {K₁, K₂, …, K_m}. Each job consists of a sequence of operations J_i = {J₁, J₂, …, J_n_i}, each of which can be processed on any of its compatible machines. Jobs are transferred between consecutive operations by a fleet of AGVs V = {V₁, V₂, …, V_n_V} belonging to T = {1, 2, 3, …, t} types. All jobs and AGVs are initially located at the loading and unloading (LU) position, which is a certain distance from the machines. For each job, once the current operation is completed on a machine, the job must be transported to the next machine by an AGV for the next operation. The transportation time between two machines and the processing time of each operation on each compatible machine are known in advance. A machine is in one of two states: processing or waiting to process. The two states differ in energy consumption: a machine consumes processing energy while it processes a job and idle energy while it waits. AGV travel is likewise classified into two types: empty travel and load travel. Empty travel means that the AGV travels from its current machine or the LU position to the machine where the job is to be picked up, while load travel means that the AGV carries the job to the target machine. Accordingly, an AGV consumes load energy while carrying a job and empty energy while traveling to pick one up.

Assumptions

The FJSP-MTA satisfies the following assumptions:

Initial state constraint: All machines and AGVs are available at time 0, and all jobs can be processed at time 0.
Each machine has a buffer zone that can be used to park AGVs and store jobs, and has sufficient buffering capacity.
Each machine can process at most one operation at a time.
Each AGV can transport at most one job or be empty at a time.
The processing of an operation cannot be interrupted once it starts.
The transportation of a job cannot be interrupted once it starts.
Loading and unloading times are excluded from the transportation time.
The job does not return to the LU area once it has been completed.
Machine and AGV breakdowns during operation are not considered, and AGV battery power is assumed to be sufficient.
The last operation of each job does not need to be transported after it is finished.

3.2. Formulation of MILP Model

In the FJSP-MTA, the FJSP and the AGV dispatching problem must be handled simultaneously. The problem can be divided into four subproblems: (1) the machine assignment problem, deciding the processing machine for each operation; (2) the machine scheduling problem, defining the processing sequence of the operations on each machine; (3) the AGV assignment problem, determining which AGV type to assign; and (4) the AGV scheduling problem, resolving the allocation of the transportation resources for each job.

The objective of the FJSP-MTA is to simultaneously minimize the makespan C_max and the TEC. The FJSP-MTA is formulated in this paper as a MILP model, using the notations in Notations.

\text{Minimize: } F(x) = TEC + C_{\max}

(1)

where C_max is the completion time of the last operation to finish among all jobs. TEC can be divided into three components: machine processing energy consumption (MPEC), machine idle energy consumption (MIEC), and AGV transportation energy consumption (ATEC), as shown in Equation (2):

TEC = MPEC + MIEC + ATEC

(2)

The formulas for the three terms are as follows:

MPEC = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \sum_{k \in k(i,j)} \sum_{t=1}^{T} \sum_{v=1}^{n_V} (PP_{k} \cdot PT_{i,j,k} \cdot X_{i,j,k,t,v})

MIEC = \sum_{k \in k(i,j)} IP_{k} \cdot (EM_{k} - \sum_{i=1}^{n} \sum_{j=1}^{n_i} \sum_{t=1}^{T} \sum_{v=1}^{n_V} (PT_{i,j,k} \cdot X_{i,j,k,t,v}))

ATEC = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \sum_{v=1}^{n_V} \sum_{v'=1}^{n_V} \sum_{k \in k(i,j)} \sum_{k' \in k(i,j)} \sum_{t=1}^{T} (X_{i,(j-1),k,t,v} \cdot X_{i,j,k',t',v'} \cdot TT_{k,k',t}) \cdot (AP_{load}^{t} - AP_{empty}^{t}) + \sum_{t=1}^{T} \sum_{v=1}^{n_V} (EA_{t,v} \cdot AP_{empty}^{t})

The first term of ATEC accounts for the load energy consumption, and the second term accounts for the empty energy consumption.
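To make the decomposition in Equation (2) concrete, the following minimal Python sketch evaluates MPEC, MIEC, and ATEC for an already decoded schedule. It is an illustration under stated assumptions rather than the authors' implementation: the names `Op` and `total_energy`, the dictionary-based inputs, and the choice to sum idle energy over every machine listed in `em` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Op:
    machine: int      # machine k chosen for the operation (X = 1)
    proc_time: float  # processing time PT_{i,j,k} on that machine


def total_energy(ops, pp, ip, em, loaded_time, ea, ap_load, ap_empty):
    """Return (MPEC, MIEC, ATEC, TEC) for a decoded schedule.

    pp, ip       machine -> processing / idle power (PP_k, IP_k)
    em           machine -> shutdown time EM_k
    loaded_time  (type, vehicle) -> total loaded travel time (assumed input)
    ea           (type, vehicle) -> end-of-transport time EA_{t,v}
    ap_load, ap_empty  AGV type -> loaded / empty power
    """
    # Accumulate the busy time of every machine.
    busy = {k: 0.0 for k in em}
    for op in ops:
        busy[op.machine] += op.proc_time

    mpec = sum(pp[k] * busy[k] for k in em)
    # Idle time is the shutdown time minus the busy time.
    miec = sum(ip[k] * (em[k] - busy[k]) for k in em)
    # Empty power over the whole working time, plus the extra power when loaded.
    atec = sum(
        loaded_time[key] * (ap_load[key[0]] - ap_empty[key[0]])
        + ea[key] * ap_empty[key[0]]
        for key in ea
    )
    return mpec, miec, atec, mpec + miec + atec
```

With two machines that are each busy for 6 time units out of a shutdown time of 10, and one AGV that is loaded for 5 of its 8 working time units, the toy values in the test below give MPEC = 30, MIEC = 8, ATEC = 13, and TEC = 51.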

The MILP model is subjected to constraint sets (3)–(17).

S_{i,j} + PT_{i,j,k} \leq EM_{k} + M (1 - \sum_{t=1}^{T} \sum_{v=1}^{n_V} X_{i,j,k,t,v}) \quad \forall i \in I, \forall j \in J_{i}, k \in k_{(i,j)}

(3)

\begin{array}{l} S_{i,j} + \sum_{k \in k(i,j)} (TT_{k,k',t} + PT_{i,j,k}) \leq \sum_{t=1}^{T} \sum_{v=1}^{n_V} SA_{i,(j+1),t,v} + M (1 - \sum_{t=1}^{T} \sum_{v=1}^{n_V} SA_{i,(j+1),t,v}) \\ \forall i \in I, j \in \{1, 2, \dots, n_{i} - 1\}, k' \in k_{(i,j+1)} \end{array}

(4)

\begin{array}{l} S_{i,j} + \sum_{t=1}^{T} \sum_{v=1}^{n_V} (X_{i,j,k,t,v} \cdot PT_{i,j,k}) \leq S_{i',j'} + M (3 - YM_{i,j,i',j'} - \sum_{t=1}^{T} \sum_{v=1}^{n_V} X_{i,j,k,t,v} - \sum_{t=1}^{T} \sum_{v=1}^{n_V} X_{i',j',k,t,v}) \\ \forall i \in I, i' \in I, i < i', j \in J_{i}, j' \in J_{i'}, k \in k_{(i,j)} \cap k_{(i',j')} \end{array}

(5)

\begin{array}{l} S_{i',j'} + \sum_{t=1}^{T} \sum_{v=1}^{n_V} (X_{i',j',k,t,v} \cdot PT_{i',j',k}) \leq S_{i,j} + M (2 + YM_{i,j,i',j'} - \sum_{t=1}^{T} \sum_{v=1}^{n_V} X_{i,j,k,t,v} - \sum_{t=1}^{T} \sum_{v=1}^{n_V} X_{i',j',k,t,v}) \\ \forall i \in I, i' \in I, i < i', j \in J_{i}, j' \in J_{i'}, k \in k_{(i,j)} \cap k_{(i',j')} \end{array}

(6)

\begin{array}{l} 1 - M (2 - \sum_{t=1}^{T} \sum_{v=1}^{n_V} X_{i,j,k,t,v} - \sum_{t=1}^{T} \sum_{v=1}^{n_V} X_{i,j+1,k',t,v}) \leq \sum_{t=1}^{T} \sum_{v=1}^{n_V} X_{i,j+1,k',t,v} \\ \forall i \in I, j \in \{1, 2, \dots, n_{i} - 1\}, j' \in J_{i'}, k \in k_{(i,j)}, k' \in k_{(i,j+1)} \end{array}

(7)

YT_{t,v,i,j,i',j'} + YT_{t,v,i',j',i,j} = 1 \quad \forall i, i' \in I, j \in J_{i}, j' \in J_{i'}, O_{i,j} \neq O_{i',j'}, t \in T, v \in \{1, \dots, n_V\}

(8)

S_{i,n_i} + \sum_{k \in k_{(i,n_i)}} \sum_{t=1}^{T} \sum_{v=1}^{n_V} (PT_{i,n_i,k} \cdot X_{i,n_i,k,t,v}) \leq C_{\max} \quad \forall i \in I

(9)

\sum_{k \in k_{(i,1)}} X_{i,1,k,t,v} \cdot TT_{0,k,t} \leq SA_{i,1,t,v} \quad \forall i \in I

(10)

SA_{i,j,t,v} \leq EA_{t,v} \quad \forall i \in I, j \in J_{i}, t \in T, v \in \{1, \dots, n_V\}

(11)

S_{i,j} \geq 0 \quad \forall i \in I, j \in J_{i}

(12)

S_{i,j} \geq \sum_{t=1}^{T} \sum_{v=1}^{n_V} SA_{i,j,t,v} \quad \forall i \in I, j \in J_{i}

(13)

EM_{k} \geq 0 \quad k \in \{1, 2, \dots, m\}

(14)

EA_{t,v} \geq 0 \quad t \in T, v \in \{1, \dots, n_V\}

(15)

SA_{i,j,t,v} \geq 0 \quad \forall i \in I, j \in J_{i}, t \in T, v \in \{1, \dots, n_V\}

(16)

\sum_{t=1}^{T} \sum_{v=1}^{n_V} \sum_{k \in k_{(i,1)}} X_{i,1,k,t,v} = 1 \quad \forall i \in I

(17)

where constraint (3) is the machine shutdown time constraint: every operation must be completed before its machine is shut down. Constraint (4) links the processing start time and the arrival time: the arrival time of each operation must precede or equal its processing start time. The paired constraints (5) and (6) ensure that operations arranged on the same machine do not overlap. Constraint (7) is the adjacent sequential operations constraint: O_i,j and O_i,j₊₁ are not processed on the same machine. Constraint (8) ensures that the transportation tasks assigned to the same AGV do not overlap. Constraint (9) is the makespan constraint: the makespan must not be less than the start time of the last operation of each job plus its processing time. Constraint (10) limits the transportation of the first operation of each job. Constraint (11) is the AGV working time constraint: no arrival time of a transport performed by an AGV may exceed the end time of that AGV's transport. Constraint (12) limits the range of the decision variable S_i,j. Constraint (13) relates SA_i,j,t,v and S_i,j: the start time of operation O_i,j must not precede its arrival time. Constraints (14)–(16) limit the ranges of the decision variables EM_k, EA_t,v, and SA_i,j,t,v. Constraint (17) states that an operation is assigned to exactly one machine and one AGV.
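The disjunctive logic behind constraints (5) and (6) can be checked directly on a candidate schedule: for any two operations on the same machine, one must finish before the other starts. The short Python sketch below performs that check. The function name `machine_overlaps` and the `(machine, start, proc_time)` tuple layout are hypothetical, and the sketch verifies feasibility only; it does not build the big-M model.

```python
def machine_overlaps(schedule):
    """Return index pairs of operations that overlap on the same machine.

    schedule: list of (machine, start, proc_time) tuples.
    """
    conflicts = []
    for a in range(len(schedule)):
        for b in range(a + 1, len(schedule)):
            ka, sa, pa = schedule[a]
            kb, sb, pb = schedule[b]
            if ka != kb:
                continue  # the disjunction only binds on a shared machine
            # Either a finishes before b starts, or b finishes before a starts.
            if not (sa + pa <= sb or sb + pb <= sa):
                conflicts.append((a, b))
    return conflicts
```

An empty result means the machine non-overlap requirement holds for the given start times.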

3.3. Example of FJSP-MTA

To illustrate the MILP model for the FJSP-MTA described above, Figure 1 shows a Gantt chart of a feasible schedule with three jobs, three machines, and two types of AGV, one of each type. Table 1 lists the compatible machines and processing time of each operation O_i,j. Table 2 shows the transfer time between machines for the two types of AGV. M0 represents the loading/unloading point.

The parameters in the MILP model describe the above Gantt chart as follows. The energy consumption parameters of the three machines are PP₁ = 2.5, PP₂ = 2.1, PP₃ = 2.7, IP₁ = 1.5, IP₂ = 1.2, and IP₃ = 1.6. The energy consumption parameters of the two types of AGVs are AP_load^1 = 2.4, AP_load^2 = 1.2, AP_empty^1 = 1.6, and AP_empty^2 = 0.8. The decision variables are X_1,1,1,1,1 = 1, X_1,2,2,1,1 = 1, X_2,1,3,2,1 = 1, X_2,2,2,1,1 = 1, X_3,1,2,1,1 = 1, X_3,2,1,2,1 = 1; YT_1,1,3,1,1,2 = 1, YT_1,1,1,2,2,2 = 1, YT_1,1,3,1,1,2 = 1, YM_1,1,3,2 = 1, YM_3,1,1,2 = 1, and YM_1,2,2,2 = 1. The start times of the operations are S_1,1 = 20, S_1,2 = 120, S_2,1 = 80, S_2,2 = 160, S_3,1 = 70, and S_3,2 = 160. The shutdown times of the three machines are EM₁ = 203, EM₂ = 215, and EM₃ = 130. The arrival times of the transported jobs are SA_1,1,1,1 = 20, SA_1,2,1,1 = 120, SA_2,1,2,1 = 80, SA_2,2,1,1 = 20, SA_3,1,1,1 = 70, and SA_3,2,2,1 = 160. The end-of-transport times of the two AGVs are EA_1,1 = 150 and EA_2,1 = 160. The machine processing energy consumption is MPEC = 672, and the machine idle energy consumption is MIEC = 135; only machine M1 generated idle energy consumption after startup. The AGV transportation energy consumption is ATEC = 488. The total energy consumption of the entire workshop is therefore TEC = 672 + 135 + 488 = 1295. The finish time of all operations, i.e., the makespan, is 215.

4. Q-Hybrid Multi Objective Artificial Bee Colony (Q-HMOABC)

Because the FJSP-MTA is NP-hard, the ABC algorithm is considered more efficient than exact algorithms for solving large-scale instances. However, since the problem involves strong coupling among machine scheduling, transportation assignment, and energy consumption optimization, it is difficult for the standard ABC to obtain high-quality solutions without further enhancement strategies. Recently, the integration of RL into swarm intelligence algorithms has emerged as an effective approach for scheduling optimization. By embedding learning mechanisms such as Q-learning into the search process, the algorithm can adaptively exploit historical experience to guide the bees' decisions and adjust the search direction dynamically. To better balance exploration and exploitation, an improved ABC-based RL algorithm called Q-HMOABC is presented in this work. It introduces a new population initialization, phase-specific search methods, and strategies for updating the population.

4.1. Framework of the Q-HMOABC

The framework of the proposed Q-HMOABC consists of four key components. First, a chaotic population initialization method is employed to enhance population diversity at the early stage. Next, a fast nondominated sorting procedure is applied to rank the entire population and provide a structured basis for the subsequent evolutionary process. Then, in the following three phases of the ABC, different strategies are incorporated: during the employed bee phase, a Q-learning mechanism is used to guide the update of all individuals; in the onlooker bee phase, a variable neighborhood search (VNS) is applied to refine nondominated solutions; and in the scout bee phase, a restart strategy is introduced, where individuals that do not improve for a certain number of generations are replaced by randomly generated ones. These components form the integrated framework of Q-HMOABC, aiming to balance exploration and exploitation while maintaining population diversity throughout the search process. The Q-HMOABC framework pseudocode is shown in Algorithm 1.
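The fast nondominated sorting step mentioned above ranks solutions by Pareto dominance over the two objectives (makespan, TEC). The following minimal Python sketch shows the standard procedure for a minimization problem. The function names are hypothetical, and the paper's own implementation may differ in detail.

```python
def dominates(f, g):
    """True if objective vector f Pareto-dominates g (all objectives minimized)."""
    return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))


def fast_nondominated_sort(objs):
    """Return fronts as lists of indices; the first front is non-dominated."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # indices that solution i dominates
    counts = [0] * n                    # number of solutions dominating i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objs[i], objs[j]):
                dominated[i].append(j)
            elif dominates(objs[j], objs[i]):
                counts[i] += 1

    fronts = [[i for i in range(n) if counts[i] == 0]]
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dominated[i]:
                counts[j] -= 1
                if counts[j] == 0:  # all dominators already ranked
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front
```

For the four hypothetical (makespan, TEC) pairs (215, 1295), (220, 1200), (230, 1300), and (215, 1300), the first two are mutually non-dominated and form the first front, followed by the fourth and then the third.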

Algorithm 1: Q-HMOABC

Input: Job and AGV time, energy parameters, population size, learning rate, discount factor, exploring rate, restarting number, number of VNS;

Output: OS, MA, AD, Makespan, TEC;

1 Initialize population P_0 = {x_1, x_2, …, x_N} by Tent chaotic map;
2 Define fitness function F(x) = (C_max(x), TEC(x));
3 Define Q-learning tuple (S, A, R, Q) with state s ∈ S, operator a ∈ A, reward R(s, a), Q-value Q(s, a);
4 t ← 0;
5 while t < MaxIter do
6     Perform fast non-dominated sorting on P_t, obtain fronts F₁, …, F_k;
7     for each employed bee i ∈ {1, …, N_e} do
8         Select operator a = arg max_a Q(s, a) with probability 1 − ε, else random a;
9         Generate candidate x′ ← N_a(x_i);
10        Evaluate F(x′) = (C_max(x′), TEC(x′));
11        if F(x′) ≺ F(x_i) then
12            x_i ← x′;
13        Update Q(s, a) ← Q(s, a) + α (R(s, a) + γ max_{a′} Q(s′, a′) − Q(s, a));
14    for each onlooker bee j ∈ {1, …, N_o} do
15        Select x_k with probability p_k = F(x_k) / Σ_{l}^{N} F(x_l);
16        Apply x′ ← VNS(x_k);
17        if F(x′) ≺ F(x_k) then
18            Update x_k ← x′;
19    for each scout bee l ∈ {1, …, N_s} do
20        if no improvement for LN iterations then
21            Restart strategy x_l ~ U(T, E);
22    Update Pareto archive A* ← A* ∪ P_t
23    if stopping criterion satisfied then
24        return non-dominated set A*;
25    t ← t + 1;
26 end while
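Steps 8 and 13 of Algorithm 1 correspond to epsilon-greedy operator selection and the standard tabular Q-learning update. The following Python sketch illustrates these two steps in isolation. The class name `OperatorQTable`, the integer state and action indices, and the default parameter values are assumptions made for illustration; how states and rewards are defined for the FJSP-MTA is not specified in this part of the paper.

```python
import random


class OperatorQTable:
    """Tabular Q-learning over search operators (hypothetical helper)."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploring rate
        self.rng = random.Random(seed)

    def select(self, s):
        """Step 8: greedy operator with probability 1 - epsilon, else random."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[s]))
        row = self.q[s]
        return max(range(len(row)), key=row.__getitem__)

    def update(self, s, a, reward, s_next):
        """Step 13: Q(s,a) += alpha * (R + gamma * max Q(s',.) - Q(s,a))."""
        target = reward + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])
```

In an employed bee loop, `select` would pick the neighborhood operator applied to x_i, and `update` would be called once the candidate has been evaluated and a reward assigned.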

4.2. Solution Representation

The encoding method directly affects the solution space and efficiency of Q-HMOABC. A three-layer coding method based on operation is used to encode a solution in Q-HMOABC. First, the operation sequence (OS) vector is generated, and machine assignment (MA) and AGV dispatching (AD) vector are then determined according to the OS and corresponding constraints. Finally, OS, MA, and AD are combined to represent a solution to the problem. These parameters are described as follows.

OS: Each gene of the OS chromosome is a job number, and the order in which the same number appears indicates the operation number of that job. For example, as shown in Figure 2a, the first number 3 in the OS vector represents the operation O_3,1, the number 3 at the sixth position denotes O_3,2, and the 3 at the last position represents O_3,3. This code ensures the precedence of operations within the same job, as O_3,1 → O_3,2 → O_3,3. The length of the vector is equal to the total number of operations.
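The OS decoding rule can be sketched in a few lines of Python: the k-th occurrence of a job number is read as that job's k-th operation. The function name `decode_os` and the nine-gene vector in the test are hypothetical. The test vector is chosen only so that job 3 appears at the first, sixth, and last positions, as in the description above, and it is not the vector of Figure 2a.

```python
def decode_os(os_vector):
    """Map an OS vector of job numbers to a list of (job, operation) pairs."""
    seen = {}
    ops = []
    for job in os_vector:
        seen[job] = seen.get(job, 0) + 1  # occurrence count = operation number
        ops.append((job, seen[job]))
    return ops
```

Because operation numbers are assigned by order of appearance, any permutation of the same multiset of job numbers decodes to a precedence-feasible sequence.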

MA: Each gene of the MA vector is the position of the selected machine within the candidate machine set of the corresponding operation. The number k in the MA vector denotes that the kth machine in the candidate set is assigned to the operation O_i,j. As shown in Figure 2b, the first number 2 in the MA vector indicates that the second machine M₃ in the set {M₁, M₃} is selected, and the third number 3 in the MA vector indicates that the third machine M₄ in the set {M₂, M₃, M₄} is selected. The machine assignment and the processing sequence on every machine are jointly determined by decoding the OS and MA vectors based on the given sets of candidate machines.
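A minimal Python sketch of the MA decoding follows. It assumes that the candidate machine sets are supplied in the same positional order as the MA genes; the function name `decode_ma` and the second candidate set in the test are hypothetical, while the first and third sets follow the example in the text.

```python
def decode_ma(ma_vector, candidates):
    """Translate 1-based positions in each candidate set into machine numbers."""
    machines = []
    for pos, (idx, cand) in enumerate(zip(ma_vector, candidates)):
        if not 1 <= idx <= len(cand):
            raise ValueError(f"gene {idx} at position {pos} is out of range")
        machines.append(cand[idx - 1])  # genes are 1-based
    return machines
```

With candidate sets {M₁, M₃}, {M₂, M₄}, and {M₂, M₃, M₄} and the MA genes 2, 1, 3, the decoded machines are M₃, M₂, and M₄.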

AD: Each gene of the AD chromosome represents the specific vehicle assigned to a transportation task. In this encoding, different integers distinguish both the vehicle type and the vehicle's index within that type. For instance, numbers 1 and 2 denote the first and second AGVs of type 1, respectively, while numbers 3 and 4 correspond to the first and second AGVs of type 2. As shown in Figure 3, the first number 2 in the AD vector indicates that O_2,1 is transported by AGV no. 2 of type 1, the number 3 at the fifth position denotes that O_1,2 is transported by AGV no. 1 of type 2, and the number 4 at the ninth position represents O_4,2 being transported by AGV no. 2 of type 2. Each gene in the AD vector thus determines which AGV is responsible for the corresponding transportation task. The length of the AD vector is equal to the total number of transportation tasks in the scheduling problem.
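The integer-to-vehicle mapping of the AD layer can be sketched as follows, assuming the fleet is described by a list giving the number of AGVs of each type (two of each type in the example above). The function name `decode_agv` and the `fleet_sizes` parameter are hypothetical.

```python
def decode_agv(gene, fleet_sizes):
    """Return (agv_type, index_within_type), both 1-based, for an AD gene."""
    if gene < 1:
        raise ValueError("AD genes are 1-based")
    remaining = gene
    for agv_type, size in enumerate(fleet_sizes, start=1):
        if remaining <= size:
            return agv_type, remaining
        remaining -= size  # skip past all vehicles of this type
    raise ValueError(f"gene {gene} exceeds the fleet size")
```

With `fleet_sizes = [2, 2]`, gene 2 decodes to AGV no. 2 of type 1, gene 3 to AGV no. 1 of type 2, and gene 4 to AGV no. 2 of type 2, matching the example.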

Crossover and Mutation

Crossover operations generate new individuals by recombining genes from two parent chromosomes, thereby increasing population diversity. Based on the characteristics of the FJSP-MTA, the encoding methods, and the constraints on the operation, machine, and AGV gene segments, this work designs a different crossover method for each layer. For the OS layer, the number of operations of each job must remain unchanged in the offspring after crossover. This paper therefore applies the Partial-Mapped Crossover (PMX) principle to the OS genes, as shown in the example in Figure 4. Two-point crossover (TPX) is performed first: the genes from the fourth to the seventh position are selected, and the middle segments of Parent A and Parent B are swapped to generate Offspring A′ and B′. Subsequently, gene repair is performed through process precedence mapping to prevent infeasible solutions that violate the precedence relationship among operations within one job. Specifically, a mapping relationship is established for the unswapped genes in Offspring A′ and B′ (1 → 4, 2 → 3, 4 → 1, 3 → 2), thereby generating Offspring A and B. The machine layer also uses TPX. To guarantee that each new machine number is drawn from the set of available machines of the original operation, gene repair is also performed on the MA genes.
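The swap-then-repair idea for the OS layer can be illustrated with the simplified Python sketch below. It is not the mapping-based PMX repair of Figure 4: as a stand-in, it repairs the genes outside the swapped segment by counting, replacing surplus job numbers with missing ones so that every job keeps its number of operations. The function name and the half-open, 0-based segment `[lo, hi)` are assumptions, and both parents are assumed to contain the same multiset of job numbers.

```python
from collections import Counter


def os_two_point_crossover(parent_a, parent_b, lo, hi):
    """Swap the segment [lo, hi) between parents, then repair job counts."""

    def make_child(base, donor):
        child = base[:lo] + donor[lo:hi] + base[hi:]
        target = Counter(base)
        surplus = Counter(child) - target           # jobs that appear too often
        missing = list((target - Counter(child)).elements())  # jobs too rare
        # Repair only positions outside the swapped segment.
        for pos in list(range(lo)) + list(range(hi, len(child))):
            job = child[pos]
            if surplus[job] > 0 and missing:
                surplus[job] -= 1
                child[pos] = missing.pop(0)
        return child

    return make_child(parent_a, parent_b), make_child(parent_b, parent_a)
```

Since the OS encoding assigns operation numbers by order of appearance, restoring the job counts is enough to keep each offspring precedence-feasible.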

The AGV layer uses position-based crossover (POX), specifically tailored to AGV types and indices. The POX operator ensures that each transportation task in the offspring is assigned to a valid AGV while preserving the intended AGV type and its corresponding index. An illustrative example in Figure 5 shows how partial positions are copied from one parent and the rest are sequentially completed from the other, thereby maintaining type-consistent, index-explicit AGV assignments for all transportation tasks. First, randomly generate two parent genes, Parent A and Parent B. Copy the genes at positions 2, 3, 4, 6, 7, and 9 from Parent A to Offspring A while retaining their original positions. Then copy the genes from the remaining positions of Parent B sequentially to the remaining positions in Offspring A. Offspring B undergoes the same process with the roles of the parents exchanged. If a copied gene violates a type feasibility restriction, a minimal repair is applied by remapping the gene to another vehicle within the same type, leaving all other positions unchanged.

After crossover, three complementary mutation strategies are applied to sustain diversity and promote local refinement. (i) OS layer: Two methods are used: a two-point swap that exchanges the genes at two uniformly selected positions, and an inversion that reverses the contiguous subsequence between two random cut points. (ii) MA layer: Two methods are used: a single-point change, in which the machine k of one randomly selected operation is reselected from its feasible set k(i,j); and a three-point change, in which the machines of three randomly selected operations are reselected from their corresponding feasible sets k(i,j). (iii) AD layer: The same two mutation strategies as in the MA layer are employed.
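The mutation strategies above can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function names, the list-of-lists `feasible_sets` (feasible machines per MA position), and the toy vectors are assumptions.

```python
import random

def swap_mutation(os_vec, rng):
    """OS layer: exchange the genes at two distinct random positions."""
    child = list(os_vec)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def inversion_mutation(os_vec, rng):
    """OS layer: reverse the contiguous subsequence between two cut points."""
    child = list(os_vec)
    i, j = sorted(rng.sample(range(len(child)), 2))
    child[i:j + 1] = reversed(child[i:j + 1])
    return child

def reassign_mutation(ma_vec, feasible_sets, n_points, rng):
    """MA (or AD) layer: redraw n_points genes from their own feasible sets."""
    child = list(ma_vec)
    for pos in rng.sample(range(len(child)), n_points):
        child[pos] = rng.choice(feasible_sets[pos])
    return child

rng = random.Random(1)
os_parent = [2, 1, 3, 1, 2, 3]                 # job ids, one per operation
ma_parent = [1, 3, 2, 2, 1, 3]                 # machine chosen per operation
feasible = [[1, 2], [3], [1, 2, 3], [2, 3], [1], [2, 3]]  # hypothetical sets
os_child = inversion_mutation(swap_mutation(os_parent, rng), rng)
ma_child = reassign_mutation(ma_parent, feasible, 3, rng)
```

Both OS operators only permute genes, so the number of operations per job is preserved without repair, and the MA operator cannot leave the feasible sets because it samples from them directly.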

4.3. Population Initialization

Population initialization critically shapes swarm search: a well-distributed seed set broadens exploration and delays premature convergence, whereas uneven or biased seeding narrows the search and degrades solution quality. Common schemes are concise but limited—random seeding is easy yet often clustered; heuristic seeding injects problem knowledge at the risk of bias and reduced generality. In the FJSP-MTA, where machine assignments and AGV dispatching are tightly coupled and the search space is large and multimodal, these limitations can quickly propagate into an inferior Pareto front.

Therefore, this work adopts chaotic initialization based on the tent map within Q-HMOABC. The tent map produces deterministic, pseudo-random sequences with near-uniform density, ergodicity, and sensitivity to initial conditions. This seeding increases early diversity, improves space coverage, and lowers the risk of local entrapment, while preserving sufficient randomness to sustain exploration as learning and local search proceed. Empirically, it accelerates convergence and yields stronger Pareto approximations, such as higher hypervolume (HV) and lower inverted generational distance (IGD). The tent map used for initialization is formally defined in Equation (18), and it serves as the default population generator in our framework:

$$x_{t+1}^{p} = \begin{cases} \dfrac{x_{t}^{p}}{a}, & 0 < x_{t}^{p} < a, \\ \dfrac{1 - x_{t}^{p}}{1 - a}, & a \leq x_{t}^{p} < 1, \end{cases} \qquad a \in (0, 1)$$

(18)

where $x_{t}^{p}$ and $x_{t+1}^{p}$ denote the current state of the sequence at iteration t and the next state of the sequence at iteration t + 1, p denotes the population size, and a denotes the control parameter. The symmetric form is recovered with $a = \frac{1}{2}$, shown in Equation (19):

$$x_{t+1}^{p} = \begin{cases} 2 x_{t}^{p}, & 0 < x_{t}^{p} < \frac{1}{2}, \\ 2 (1 - x_{t}^{p}), & \frac{1}{2} \leq x_{t}^{p} < 1. \end{cases}$$

(19)

We choose an initial value $x_{0} \in (0, 1)$ that is not a dyadic rational, iterate the map for a burn-in period B to eliminate transient effects, and then collect the subsequent chaotic values {u₁, u₂, u₃, …, u_N}. Subsequently, by mapping this sequence to each decision variable in the population, such as operation sequence OS, machine assignment MA, and AGV dispatching AD, an initial population of size P can be generated.
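A minimal sketch of tent-map seeding under stated assumptions follows. The paper does not report the values of a, B, or x_0, so `a=0.7`, `burn_in=100`, and `x0=0.37` are placeholders. The skewed map is used here because, in double-precision arithmetic, the symmetric map with a = 1/2 typically collapses to 0 after a few dozen iterations. The two mapping helpers (`chaotic_permutation`, a random-key ordering for OS, and `chaotic_choice`, an index lookup for MA or AD) are hypothetical ways to turn chaotic values into genes.

```python
def tent_map_sequence(x0, n, a=0.7, burn_in=100):
    """Iterate the tent map of Equation (18); return n values after burn-in."""
    if not 0.0 < x0 < 1.0 or not 0.0 < a < 1.0:
        raise ValueError("x0 and a must lie strictly inside (0, 1)")
    x, values = x0, []
    for step in range(burn_in + n):
        x = x / a if x < a else (1.0 - x) / (1.0 - a)
        if step >= burn_in:           # discard the transient part
            values.append(x)
    return values

def chaotic_permutation(items, keys):
    """Order items by their chaotic keys (random-key decoding for OS)."""
    order = sorted(range(len(items)), key=lambda i: keys[i])
    return [items[i] for i in order]

def chaotic_choice(options, u):
    """Pick one feasible option (machine or AGV) from a value u in [0, 1]."""
    return options[min(int(u * len(options)), len(options) - 1)]

jobs = [1, 1, 2, 2, 3]                      # one entry per operation
u = tent_map_sequence(0.37, 2 * len(jobs))  # keys for OS, then values for MA
os_vec = chaotic_permutation(jobs, u[:len(jobs)])
ma_vec = [chaotic_choice(["M1", "M2", "M3"], v) for v in u[len(jobs):]]
```

Repeating this with a different slice of the chaotic sequence for each individual yields a population of the required size.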

4.4. Fast Nondominated Sorting Procedure

The purpose of this procedure is to sort the population based on the dominance relationship between solutions. Each solution is compared with every other solution in the population to determine which solutions are nondominated. In our bi-objective FJSP-MTA formulation, we minimize C_max and TEC. Fast nondominated sorting classifies a population P = {x_1, x_2, x_3, …, x_N} into Pareto fronts by the dominance relation for minimization. For any solutions $p, q \in P$, p dominates q as shown in Equation (20).

$$p \prec q \Leftrightarrow [C_{\max}(p) \leq C_{\max}(q) \land TEC(p) \leq TEC(q)] \land [C_{\max}(p) < C_{\max}(q) \lor TEC(p) < TEC(q)]$$

(20)

For each p ∈ P, define the dominated set (the solutions that p dominates) and the domination count (the number of solutions that dominate p) in Equation (21).

$$S(p) = \{q \in P \mid p \prec q\}, \quad n(p) = |\{q \in P \mid q \prec p\}|$$

(21)

The first front is F₁ = {p ∈ P | n(p) = 0}; subsequent fronts are obtained by iteratively “peeling” F_k: for all p ∈ F_k and q ∈ S(p), set n(q) ← n(q) − 1, and the next front is then given by Equation (22).

$$F_{k+1} = \{q \in P \setminus \cup_{i=1}^{k} F_{i} \mid n(q) = 0\}$$

(22)

This yields a partition $P = \cup_{k=1}^{K} F_{k}$ with a nondomination rank $r(x) = k \Leftrightarrow x \in F_{k}$. In practice, the front ranks $F_{k}$ guide survivor selection.
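The sorting procedure of Equations (20) to (22) can be sketched in a few lines. The function names and the toy objective vectors (C_max, TEC) are illustrative assumptions; the logic is the standard fast nondominated sort.

```python
def dominates(p, q):
    """Equation (20): p is no worse in every objective and better in one."""
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))

def fast_nondominated_sort(objs):
    """Return fronts F_1, F_2, ... as lists of indices into objs."""
    n = len(objs)
    dominated = [[] for _ in range(n)]   # S(p): solutions that p dominates
    count = [0] * n                      # n(p): solutions that dominate p
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objs[i], objs[j]):
                dominated[i].append(j)
            elif dominates(objs[j], objs[i]):
                count[i] += 1
    fronts = []
    current = [i for i in range(n) if count[i] == 0]
    while current:
        fronts.append(current)
        nxt = []
        for i in current:                # "peel" the current front
            for j in dominated[i]:
                count[j] -= 1
                if count[j] == 0:
                    nxt.append(j)
        current = nxt
    return fronts

# Toy (C_max, TEC) pairs, both minimized.
population = [(100, 30), (90, 40), (110, 35), (120, 50)]
fronts = fast_nondominated_sort(population)
```

In this toy population the first two solutions are mutually nondominated and form F₁, the third is dominated only by the first and forms F₂, and the last one forms F₃.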

4.5. Employed Bee Phase

RL is an adaptive optimization control approach in which an agent obtains rewards through interaction with its environment, thereby forming a learning strategy. Six basic elements are involved during learning: the agent, environment, state, action, strategy, and reward, with the learning mechanism shown in Figure 6. The agent takes an action based on the current state. As the most classic RL algorithm, Q-learning records the agent’s learning experience in a Q-table. The number of rows in the Q-table corresponds to the number of states, and the number of columns corresponds to the number of actions. The Q-value reflects the rationality of selecting a certain action in each state, with initial Q-values all set to 0. Currently, the Q-learning algorithm is widely applied in hybrid meta-heuristic algorithms such as ABC to dynamically adjust parameters, select search operators, and determine algorithm structures. The Q-table records the Q-value for each state $S_{t}$ and corresponding action $a_{t}$, which represents the learning experience of the agent, as shown in Equation (23) for m states and n actions. This paper employs the Q-learning algorithm to dynamically select the search strategy for the employed bee phase. The agent selects an action based on the ε-greedy strategy, and the update formula for the Q-table is shown in Equation (24).

$$\begin{matrix} s_{1} \\ s_{2} \\ \vdots \\ s_{m} \end{matrix} \begin{bmatrix} Q(s_{1}, a_{1}) & Q(s_{1}, a_{2}) & \dots & Q(s_{1}, a_{n}) \\ Q(s_{2}, a_{1}) & Q(s_{2}, a_{2}) & \dots & Q(s_{2}, a_{n}) \\ \vdots & \vdots & \ddots & \vdots \\ Q(s_{m}, a_{1}) & Q(s_{m}, a_{2}) & \dots & Q(s_{m}, a_{n}) \end{bmatrix}$$

(23)

$$Q(S_{t}, a_{t}) = Q(S_{t}, a_{t}) + α [r_{t+1} + γ \max Q(S_{t+1}, :) - Q(S_{t}, a_{t})]$$

(24)

where $α$ is the learning rate, $γ$ is the discount factor, $r_{t+1}$ is the reward value obtained by taking action $a_{t}$ and interacting with the environment in state $S_{t}$, and $\max Q(S_{t+1}, :)$ represents the maximum Q-value over all actions in state $S_{t+1}$. Actions themselves are selected with the $ε$-greedy strategy.

In this paper, Q-learning is used to determine the search strategy for employed bees. The pseudocode for employed bee phase is shown in Algorithm 2. The prerequisite for applying the Q-learning algorithm is a reasonable design and division of states and the definition of actions that meet the requirements. In addition, it is necessary to reasonably design the reward function. Employed bee plays a leading role in the ABC algorithm. In the original ABC algorithm, the employed bee searches within a specified area to generate new solutions.

Algorithm 2: Employed Bee Phase with Q-learning

Input: Non-dominated archive Ω, Q-table Q, number of neighbourhood searches LS, learning rate α, discount factor γ, exploration rate ε;

Output: Updated archive Ω;

1 Let the current solution be S (the employed bee’s incumbent);
2 Set iteration counter $t \leftarrow 1$;
3 while $t \leq LS$ do
4   Observe state $s_{t} \leftarrow ϕ(S)$;
5   Draw $u \sim U(0, 1)$;
6   if $u \geq ε$ then
7     $a_{t} \leftarrow \arg\max_{b} Q(s_{t}, b)$;
8   else
9     Sample $a_{t}$ uniformly from the action set;
10   Generate neighbor $S^{*} \leftarrow N_{a_{t}}(S)$;
11   Evaluate $F(S) = (C_{\max}(S), TEC(S))$ and $F(S^{*}) = (C_{\max}(S^{*}), TEC(S^{*}))$;
12   if $F(S^{*}) \prec F(S)$ then
13     $S \leftarrow S^{*}$; $R_{t} \leftarrow 10$; $Ω \leftarrow \mathrm{UpdateArchive}(Ω, S)$;
14   else if $F(S) \prec F(S^{*})$ then
15     $R_{t} \leftarrow 0$;
16   else
17     $R_{t} \leftarrow 5$; $Ω \leftarrow \mathrm{UpdateArchive}(Ω, S^{*})$;
18   end if
19   Observe $s_{t+1} \leftarrow ϕ(S)$;
20   $Q(s_{t}, a_{t}) \leftarrow (1 - α) \cdot Q(s_{t}, a_{t}) + α (R_{t} + γ \max_{b} Q(s_{t+1}, b))$;
21   $t \leftarrow t + 1$;
22 end while
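Algorithm 2 can be turned into a short runnable sketch. Everything problem-specific is an assumption here: the callables `operators`, `evaluate`, and `observe_state` stand in for the search operators, the objective vector F, and the state function ϕ, and the toy problem at the end (integer pairs minimized in both coordinates, a single state) exists only to exercise the loop. The rewards follow the dominance-based values of Algorithm 2.

```python
import random

def dominates(p, q):
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))

def update_archive(archive, cand, evaluate):
    """Keep the archive mutually nondominated after offering cand."""
    f = evaluate(cand)
    if any(dominates(evaluate(m), f) or evaluate(m) == f for m in archive):
        return archive
    return [m for m in archive if not dominates(f, evaluate(m))] + [cand]

def employed_bee_phase(solution, archive, q_table, operators, evaluate,
                       observe_state, ls, alpha, gamma, epsilon, rng):
    for _ in range(ls):
        s_t = observe_state(solution)
        if rng.random() >= epsilon:      # exploit: greedy action
            a_t = max(range(len(operators)), key=lambda b: q_table[s_t][b])
        else:                            # explore: uniform random action
            a_t = rng.randrange(len(operators))
        neighbor = operators[a_t](solution, rng)
        f_old, f_new = evaluate(solution), evaluate(neighbor)
        if dominates(f_new, f_old):
            solution, reward = neighbor, 10
            archive = update_archive(archive, solution, evaluate)
        elif dominates(f_old, f_new):
            reward = 0
        else:
            reward = 5
            archive = update_archive(archive, neighbor, evaluate)
        s_next = observe_state(solution)
        # Line 20 of Algorithm 2, equivalent to Equation (24).
        q_table[s_t][a_t] = ((1 - alpha) * q_table[s_t][a_t]
                             + alpha * (reward + gamma * max(q_table[s_next])))
    return solution, archive

# Toy usage: three hypothetical operators on integer pairs, one state.
toy_ops = [lambda s, r: (max(s[0] - 1, 0), s[1]),
           lambda s, r: (s[0], max(s[1] - 1, 0)),
           lambda s, r: (s[0] + 1, max(s[1] - 2, 0))]
q_table = [[0.0] * len(toy_ops)]
final, archive = employed_bee_phase(
    (5, 5), [(5, 5)], q_table, toy_ops, evaluate=lambda s: s,
    observe_state=lambda s: 0, ls=15, alpha=0.1, gamma=0.7, epsilon=0.7,
    rng=random.Random(0))
```

Because the incumbent is replaced only by a dominating neighbor, the returned solution is never worse than the starting one, while mutually nondominated neighbors still enrich the archive.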

4.5.1. The State Set

The state can be regarded as the agent’s understanding and encoding of the environment. It typically contains some information about the effects on the environment resulting from decisions taken by the agent. The set of states is composed of all possible states for the problem. Under certain conditions, the agent selects the parameter, and we can obtain the Pareto front after iteration. The change in the Pareto front’s convergence and diversity reflects whether this parameter selection is meaningful. To represent the agent’s state, two metrics, C_t and D_t, are applied to measure the convergence and diversity of the Pareto front, as defined in Equations (25) and (26):

$$C_{t} = GD(P_{t}, P^{*}) = \sqrt{\frac{\sum_{y \in P_{t}} \min_{x \in P^{*}} \|x - y\|_{2}^{2}}{|P_{t}|}}$$

(25)

$$D_{t} = \frac{\sum_{i=1}^{N-1} |d_{i} - d_{m}|}{(N - 1) d_{m}}$$

(26)

where P_t is the Pareto front calculated by the algorithm in each generation, and P^∗ is the reference set. The smaller C_t is, the better the convergence, in which case ΔC_t < 0. In D_t, d_i is the Euclidean distance between adjacent solutions after ordering P_t along one objective, and d_m is the mean value of the d_i. The bigger D_t is, the better the diversity, in which case ΔD_t > 0. The four states of the agent are defined as follows. State1: ΔC_t > 0 and ΔD_t > 0; State2: ΔC_t > 0 and ΔD_t ≤ 0; State3: ΔC_t ≤ 0 and ΔD_t > 0; and State4: ΔC_t ≤ 0 and ΔD_t ≤ 0. State1 and State2 occur during the early evolution period. State3 and State4 occur when the population has converged.
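A minimal sketch of Equations (25) and (26) and of the four-way state split follows. The function names are illustrative, and the way ΔC_t and ΔD_t are obtained from consecutive generations is left to the caller, since it is not spelled out in the text; `state_index` only applies the sign conditions of State1 to State4 (returned as 0 to 3).

```python
import math

def generational_distance(front, reference):
    """Equation (25): root of the mean squared distance to the reference set."""
    total = sum(min(math.dist(x, y) ** 2 for x in reference) for y in front)
    return math.sqrt(total / len(front))

def diversity(front):
    """Equation (26): relative deviation of gaps between adjacent solutions."""
    pts = sorted(front)                      # order along the first objective
    gaps = [math.dist(p, q) for p, q in zip(pts, pts[1:])]
    mean_gap = sum(gaps) / len(gaps)         # d_m
    return sum(abs(d - mean_gap) for d in gaps) / (len(gaps) * mean_gap)

def state_index(delta_c, delta_d):
    """Map the signs of (ΔC_t, ΔD_t) to State1..State4, returned as 0..3."""
    if delta_c > 0:
        return 0 if delta_d > 0 else 1
    return 2 if delta_d > 0 else 3

front = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
c_t = generational_distance(front, [(0.0, 0.0), (3.0, 0.0)])
d_t = diversity(front)
```

The returned index can be used directly as the row of the Q-table.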

4.5.2. The Select Strategy

The strategy is the basis on which the agent acts in a given state, and it determines the exploration and exploitation capabilities of the agent. In this work, the ε-greedy selection strategy is applied in Q-HMOABC and expressed as in Equation (27):

$$ε\text{-greedy}(s) = \begin{cases} \arg\max_{a} Q(S_{t}, a), & r \geq ε \\ \text{random } a \in A, & r < ε \end{cases}$$

(27)

where $ε$ is the exploration rate, $r$ is a random number within (0, 1), and $\max_{a} Q(S_{t}, a)$ represents the highest expected Q-value at the state $S_{t}$ over the available actions a. Equation (27) denotes that the agent adopts the greedy action with probability $(1 - ε)$ or a random action with probability $ε$; that is, the agent exploits the learning experience with probability $(1 - ε)$ and explores new knowledge with probability $ε$ at each selection.

4.5.3. The Action Set

The agent influences its environment by performing actions in RL, and the action set consists of all the actions that the agent can take. Here, the action set consists of eight search operators. The four operators for OS are as follows: (1) POX partitions jobs into two subsets and swaps the corresponding subsequences between parents through order-preserving repair; (2) inversion-based POX (IPOX) crossover interleaves job-subset segments from the two parents in their original relative order; (3) inversion mutation selects a contiguous subsequence and reverses it; and (4) insertion mutation removes one operation and reinserts it at another position. The two operators for MA are as follows: (1) single-point mutation reassigns the machine of one selected operation from its feasible set k(i,j); and (2) TPX applies the same reassignment to k randomly chosen operations. The two operators for AD are as follows: (1) single-point mutation reselects the vehicle/type for one operation from its feasible AGV set; and (2) multi-point mutation reselects the vehicle/type for several operations from their feasible AGV sets. Some operators are described as follows:

IPOX: This search operator is applied to the OS vector, and its example is shown in Figure 7a for better clarity. The steps are as follows: First, divide the jobs into two subsets, Js1 = {3} and Js2 = {1, 2}. Next, copy from P1 the elements whose jobs belong to Js1 into the same positions of O1, while copying from P2 the elements whose jobs belong to Js2 into the same positions of O2. Finally, for O1, collect from P2 the remaining elements whose jobs are not in Js1, invert their relative order, and sequentially fill the empty positions; for O2, collect from P1 the remaining elements whose jobs are not in Js2, invert their order, and fill the empty positions.

TPX: This search operator is applied to the MA vector. To illustrate TPX more intuitively, an example is provided in Figure 7b. The steps are as follows: First, select two different points a and b in the MA vector. Next, copy the segment P1 [2: 4] to O1 [2: 4] and P2 [2: 4] to O2 [2: 4]. Finally, fill the remaining empty positions of O1 sequentially with the elements of P2 outside [2: 4], and fill the remaining positions of O2 with the elements of P1 outside [2: 4].

MPX: This search operator is applied to the AD vector. An example of MPX is provided in Figure 7c. The processing steps are as follows: First, randomly select three positions in the AD vector. Next, place the elements of P1 at those three positions into the corresponding positions of O1, and place the elements of P2 at the same positions into O2. Finally, fill the empty positions of O1 in order with the remaining elements from P2, and fill the empty positions of O2 with the remaining elements from P1.

4.5.4. The Reward Set

After executing an action, the agent will get a reward. If State1 occurs, Rt = 10; if State2 and State3 occur, Rt = 5; and if State4 occurs, Rt = 0.

4.6. Onlooker Bee Phase

In the original ABC algorithm, onlooker bees follow employed bees to perform searches. The fitness of each individual is calculated as its search probability, and then a roulette wheel selection method is used to select individuals for neighborhood searches. The higher the fitness of an individual, the greater the probability of being selected. Calculating the search probability of an individual based on fitness values is only applicable to single-objective optimization. In this paper, the optimization has two objectives, making it unsuitable to use fitness values to calculate the search probability of individuals. To enhance search efficacy and diversification for the integrated FJSP-MTA, we embed a Variable Neighborhood Search (VNS) into the onlooker bee phase. Rather than a single random tweak, each onlooker bee explores a portfolio of neighborhoods around the critical path in a fixed order; upon improvement, the search restarts from the first neighborhood, otherwise it proceeds to the next, until a termination criterion is reached. This paper proposes new problem-specific neighborhood structures N_k (k = 1, 2, 3, 4, 5), as illustrated in Figure 8. The neighborhood structures include the following: Find the critical path and swap the operation between the i-th operation and the 2nd operation following it (swap2(i)). Find the critical path and insert operation from the 3rd operation to the i-th operation before it (insert3B(i)). Find the critical path and insert the operation from the i-th operation to the 3rd operation following it (insert3F(i)). Find the critical path and change the i-th selected operation to another eligible machine (change M(i)). Find the critical path and change the i-th AGV assigned to the associated transport task by switching to another AGV (change A(i)).

The changes in neighborhoods are as follows:

N₁: swap2(i): randomly select an integer i with $1 \leq i \leq \mathrm{length}(OS) - 2$ and swap the positions of the i-th operation and the (i + 2)-th operation in the OS vector.
N₂: insert3B(i): randomly select an integer i with $4 \leq i \leq \mathrm{length}(OS)$ and insert the i-th operation into the position of the (i − 3)-th operation.
N₃: insert3F(i): randomly select an integer i with $1 \leq i \leq \mathrm{length}(OS) - 3$ and insert the i-th operation into the position of the (i + 3)-th operation.
N₄: changeM(i): randomly select an integer i with $1 \leq i \leq \mathrm{length}(MA)$ and change the i-th machine to another one from its optional machine set.
N₅: changeA(i): randomly select an integer i with $1 \leq i \leq \mathrm{length}(AD)$ and change the i-th AGV to another one from its optional AGV set.
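The neighborhood-change rule of the onlooker phase can be sketched as follows. This is a simplified illustration: the restriction of moves to the critical path is omitted, dominance is assumed as the improvement test, `max_steps` is a hypothetical termination criterion, and only two of the five moves (`swap2` and `insert3_forward`, with 1-based i as in the list above) are shown.

```python
def dominates(p, q):
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))

def swap2(os_vec, i):
    """N1: swap the i-th and (i + 2)-th operations (1-based i)."""
    child = list(os_vec)
    child[i - 1], child[i + 1] = child[i + 1], child[i - 1]
    return child

def insert3_forward(os_vec, i):
    """N3: move the i-th operation to position i + 3 (1-based i)."""
    child = list(os_vec)
    gene = child.pop(i - 1)
    child.insert(i + 2, gene)
    return child

def vns(solution, neighborhoods, evaluate, rng=None, max_steps=50):
    """Try neighborhoods in a fixed order; restart from the first on improvement."""
    k, steps = 0, 0
    while k < len(neighborhoods) and steps < max_steps:
        candidate = neighborhoods[k](solution, rng)
        if dominates(evaluate(candidate), evaluate(solution)):
            solution, k = candidate, 0   # improvement: back to N_1
        else:
            k += 1                       # no improvement: next neighborhood
        steps += 1
    return solution

# Toy usage: a scalar "solution" with one objective to minimize.
toy_neighborhoods = [lambda s, r: s + 1, lambda s, r: max(s - 1, 0)]
best = vns(3, toy_neighborhoods, evaluate=lambda s: (s,))
```

In the full algorithm each entry of `neighborhoods` would wrap one of N₁ to N₅ applied to a critical-path position, and `evaluate` would return (C_max, TEC).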

4.7. Scout Bee Phase

The scout bee phase primarily serves to enhance the algorithm’s global search capability, preventing it from becoming trapped in local optima. Following both employed bee and onlooker bee searches, if an individual fails to improve solution quality after limit iterations, a new individual must be generated to maintain diversity within the population. The original ABC algorithm uses random initialization to generate an initial solution as the updated value for the individual. However, solutions produced by random initialization tend to be poor. To address this issue, a restart strategy is used in the scout bee phase. For each individual, a stagnation counter records consecutive non-improving iterations and is reset upon improvement. Once the counter reaches limit, the individual is replaced by a newly constructed one. The new solution is generated by randomized constructive initialization: the OS vector is drawn as a random permutation of the operations vector, the MA vector assigns each operation to a machine sampled uniformly from its feasible set defined in the machine feasibility table, and the AD vector is sampled uniformly from {1, …, nV}. After replacement, the objective values are re-evaluated and the stagnation counter is cleared, thereby preserving population size while restoring diversity and enabling the search to escape stagnation.
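The restart rule can be sketched as below. The data layout is an assumption made for illustration: `ops_per_job` maps each job to its number of operations, `feasible_machines` lists the feasible machines per operation, and one transportation task per operation is assumed so that the AD vector has the same length as the OS vector.

```python
import random

def random_individual(ops_per_job, feasible_machines, n_agvs, rng):
    """Randomized constructive initialization of (OS, MA, AD)."""
    os_vec = [job for job, n_ops in ops_per_job.items() for _ in range(n_ops)]
    rng.shuffle(os_vec)                          # random operation order
    ma_vec = [rng.choice(options) for options in feasible_machines]
    ad_vec = [rng.randint(1, n_agvs) for _ in os_vec]   # uniform on {1..nV}
    return os_vec, ma_vec, ad_vec

def scout_phase(population, stagnation, limit, make_individual):
    """Replace every individual whose stagnation counter reached limit."""
    for idx, count in enumerate(stagnation):
        if count >= limit:
            population[idx] = make_individual()
            stagnation[idx] = 0                  # clear the counter
    return population, stagnation

rng = random.Random(3)
ops_per_job = {1: 2, 2: 3}
feasible = [[1, 2], [2], [1, 3], [2, 3], [1]]    # hypothetical feasibility table
make = lambda: random_individual(ops_per_job, feasible, 4, rng)
population = [make() for _ in range(3)]
old_second = population[1]
population, counters = scout_phase(population, [20, 5, 31], 20, make)
```

Only the first and third individuals are regenerated in this toy run, so the population size is unchanged and the second individual keeps its counter.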

5. Experimentation

To test the effectiveness and superiority of the proposed Q-HMOABC algorithm, three sets of experiments were conducted, the results of which are reported in this section. First, the best set of parameters for the proposed Q-HMOABC was determined using the Taguchi orthogonal experiment. Subsequently, in order to validate the effectiveness of the Q-HMOABC algorithm, unconstrained multi-objective benchmark functions and benchmark instances used in the literature were selected. Ablation experiments were conducted to compare its performance against other variant algorithms. Last, comparative experiments were conducted with other state-of-the-art algorithms on the benchmark instances.

All experiments in this section were coded in MATLAB R2022a on a Windows 10 Professional system with an Intel(R) Core(TM) i7-12700 CPU @ 2.10 GHz and 16 GB RAM. Each algorithm was run 20 times independently on each instance with the same stopping criterion. Three metrics are used to measure the performance of the different algorithms: generational distance (GD), IGD, and HV. The calculation equations for these metrics can be found in [34]. The lower the GD or IGD value, the better the performance of the algorithm. In contrast, the greater the HV value, the better the performance of the algorithm.

5.1. Experimental Instances

To the best of our knowledge, there are currently no benchmark instances available for the FJSP-MTA. Therefore, based on the “job–machine” benchmark instances of [35], this paper made appropriate modifications to the layout matrix within them. These instances were renamed BU01-09 and DN01-10, and the originals can be downloaded from https://fastmanufacturingproject.wordpress.com/2019/04/11/fjspt-instances/ (accessed on 5 October 2025). The benchmark instances were categorized into two groups: small-scale and large-scale. For ATEC, the AGV’s empty power was set to 0.9 and 1.1, while loaded power was set to 2.2 and 1.8.

5.2. Experimental Parameters

In this section, a Taguchi approach to the design of experiment (DOE) [36] is adopted to obtain the best combination of parameters. The Q-HMOABC has six parameters, namely population size ps, learning rate α, discount factor γ, exploration rate ε, restarting number N, and number of VNS LS. Each parameter has three levels; each parameter appeared 36 times. The corresponding values are given as follows.

(1): $ps = 40, 60, 80$.
(2): $α = 0.05, 0.1, 0.15$.
(3): $γ = 0.7, 0.8, 0.9$.
(4): $ε = 0.3, 0.5, 0.7$.
(5): $N = 20, 30, 40$.
(6): $LS = 5, 10, 15$.

An orthogonal array $L_{18}(3^{6})$ was adopted in this calibration experiment, and for fairness, the max generation G = 150. The mean values of 20 repeated times are shown in Table 3. Figure 9a shows the main effects plot of six parameters for the IGD metric, and Figure 9b shows the signal-to-noise ratio (S/N) corresponding to the IGD response. The lower the IGD metric values, the better the performance. Based on comprehensive observation, the best parametric value configuration was $ps = 60$, $α = 0.10$, $γ = 0.7$, $ε = 0.7$, $N = 20$, and $LS = 15$.
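The S/N formula is not stated in the text. For a response such as IGD, where lower is better, Taguchi analyses commonly use the smaller-the-better ratio S/N = −10 log₁₀((1/n) Σ y²). The sketch below assumes that form; the function name and the sample values are hypothetical.

```python
import math

def sn_smaller_is_better(responses):
    """Smaller-the-better S/N ratio in dB (assumed form, see lead-in)."""
    mean_square = sum(y * y for y in responses) / len(responses)
    return -10.0 * math.log10(mean_square)

# Hypothetical IGD values from repeated runs at two parameter levels.
level_a = sn_smaller_is_better([0.10, 0.12, 0.11])
level_b = sn_smaller_is_better([0.20, 0.25, 0.22])
```

Under this convention the level with the larger S/N ratio (here `level_a`) is preferred, because a larger ratio corresponds to smaller and more stable IGD values.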

5.3. Effectiveness of Each Improvement of Q-HMOABC

In this section, ablation experiments were conducted to validate the effectiveness of the proposed Q-HMOABC algorithm against three variant algorithms: Q-HMOABC-E, which excludes the RL mechanism in the employed bee phase; Q-HMOABC-O, which excludes the VNS in the onlooker bee phase; and Q-HMOABC-S, which excludes the restart strategy in the scout bee phase.

5.3.1. Effectiveness of Q-HMOABC on Unconstrained Benchmark Functions

To test the performance of the Q-HMOABC algorithm, this section selected unconstrained multi-objective benchmark functions widely used in the literature [37,38]: ZDT3, ZDT4, and ZDT6 as two-objective optimization problems, and DTLZ2 and DTLZ6 as three-objective optimization problems.

Table 4 presents the results of the Q-HMOABC-E, Q-HMOABC-O, Q-HMOABC-S, and Q-HMOABC optimization algorithms on five unconstrained benchmark functions for the IGD metric, including mean, maximum (Max), minimum (Min), and standard deviation (Std) values, with the optimal values indicated in bold. As shown in Table 4, the Q-HMOABC algorithm demonstrates superior distribution performance compared to the other three algorithms for the vast majority of functions. However, for the DTLZ2 function, the Q-HMOABC-S algorithm exhibits slightly superior distribution performance compared to the other algorithms. For the ZDT4 and DTLZ5 functions, Q-HMOABC demonstrates obviously superior performance over the other algorithms, proving that the proposed Q-HMOABC not only performs better at the convergence metric level but is also superior at the distribution metric level.

Table 5 presents the results for the five standard functions optimized by Q-HMOABC-E, Q-HMOABC-O, Q-HMOABC-S, and Q-HMOABC on the GD metric, reported with the same statistics as in Table 4. The best values are indicated in bold. Table 5 demonstrates that the optimal values are obtained by Q-HMOABC. While its performance on functions ZDT3 and DTLZ2 is only slightly superior to the Q-HMOABC-S algorithm, the Q-HMOABC algorithm significantly outperforms the other three algorithms on functions ZDT4, ZDT6, and DTLZ5.

5.3.2. Effectiveness of Q-HMOABC on Benchmark Instances

To further evaluate the effectiveness of the proposed Q-HMOABC, this section tests the performance of the Q-HMOABC algorithm against three other variant algorithms on benchmark instances. Table 6 records the mean and best values of the HV among all variant algorithms in all benchmark instances. The best values are indicated in bold. It can be seen that in terms of both the values of best and mean, Q-HMOABC was better than other three variant algorithms for 16 instances.

Figure 10 and Figure 11 illustrate the behavior of different algorithms on DN10. Figure 10 shows the Pareto fronts of all algorithms. Regarding the convergence and diversity of the Pareto front, Q-HMOABC can obtain better solutions than its variant competitors, which means Q-HMOABC can find closer approximations toward the real Pareto front. Meanwhile, we can see from Figure 10 that there is an obvious conflicting relationship between TEC and makespan. Figure 11a illustrates the performance trends of the algorithms on the HV metric over 150 iterations. It is evident that Q-HMOABC’s initial HV value surpassed that of the other algorithms and that it maintained this lead through the final iteration. Similarly, it demonstrated the best performance on the IGD metric, as shown in Figure 11b. As shown in the figures, Q-HMOABC demonstrated faster convergence and achieved superior final metric values compared to the other three variant algorithms.

In conclusion, the design of Q-HMOABC demonstrated strong global search capabilities. By utilizing improved strategies across three phases of the ABC, Q-HMOABC effectively explored the solution space of the FJSP-MTA, ensuring comprehensive optimization.

5.4. Comparison of Q-HMOABC with Other Algorithms

To further demonstrate the effectiveness and efficiency of Q-HMOABC, it is compared with state-of-the-art algorithms, namely the MOSSA [39], MOGWO [40], MOWOA [41], and NSGA-II [42]. The parameter settings for each of the aforementioned algorithms compared with Q-HMOABC were set to ps = 100, 120, 140. Additionally, NSGA-II is a well-known multi-objective algorithm. The parameter for it is set to ps = 120 in this section. The maximum iteration count (MIC) is set to 150 for all algorithms. The parameters of Q-HMOABC are configured as described in DOE testing. The comparative mean results of the experiment for HV metric are shown in Table 7, and the comparative mean results of the experiment for IGD metric are shown in Table 8. The mean runtime (in seconds) across all benchmark instances after each of the five algorithms ran independently 20 times is listed in the last row of the two tables. The best values are indicated in bold. Among the 17 benchmark instances, Q-HMOABC achieved 16 optimal mean values in HV, and 17 in IGD.

Additionally, the relative percentage increase (RPI) was employed for the ANOVA analysis of HV and IGD. The RPI is calculated using the following Equation (28):

$$RPI = \frac{V_{i} - V_{\max}}{V_{\max}} \times 100\%$$

(28)

where V_max represents the maximum value of the performance metrics collected across the compared algorithms, and V_i denotes the performance metric value obtained by algorithm i. A paired t-test was also conducted at the 95% confidence level for the RPI values, with the results shown in Table 9. The p-values are less than 0.05, suggesting that Q-HMOABC is statistically superior to MOSSA, MOGWO, MOWOA, and NSGA-II.
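Equation (28) can be checked with a short worked sketch. The function name and the sample metric values are hypothetical and are not results from the paper.

```python
def relative_percentage_increase(values):
    """Equation (28): RPI of each value relative to the maximum, in percent."""
    v_max = max(values)
    return [(v - v_max) / v_max * 100.0 for v in values]

# Hypothetical HV values of three algorithms on one instance.
rpi = relative_percentage_increase([0.5, 0.25, 0.125])
```

With these inputs the algorithm attaining V_max has an RPI of 0, and the others have RPIs of −50 and −75, so by construction every RPI is non-positive and values closer to 0 indicate metric values closer to the maximum.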

To illustrate the differences between the algorithms more intuitively, Figure 12 presents the Pareto front of DN05 obtained using different algorithms. The Pareto approximations show Q-HMOABC forming the leading frontier: its solutions occupy the lower-left region, with a smaller makespan and TEC. Other algorithms lie largely above this envelope, thus providing weaker trade-offs. Overall, Figure 12 indicates superior convergence and coverage of the true front by Q-HMOABC. From this Pareto front diagram, it can also be observed that at a makespan of around 110, the Q-HMOABC algorithm achieves approximately 2650 TEC in the DN05 case study, while the MOGWO algorithm achieves 2900 TEC. The TEC obtained by the other three algorithms is even higher. In addition, the performance metrics from 20 computations for each benchmark instance are illustrated as box plots in Figure 13, and the discrete points denote outliers. In Figure 13a, Q-HMOABC delivers higher HV than the other four algorithms. As shown in Figure 13b, Q-HMOABC obtains lower IGD than the competitors, indicating a closer and more uniform approximation to the Pareto front; NSGA-II occasionally reaches comparably low values. Overall, the results suggest that Q-HMOABC achieves a superior balance of convergence and diversity. These box plots clearly demonstrate that Q-HMOABC outperforms MOSSA, MOGWO, MOWOA, and NSGA-II in terms of optimal objective value and computation stability.

In conclusion, Q-HMOABC outperforms the compared state-of-the-art algorithms. It has a greater search ability for the solution space of the FJSP-MTA, and its three strategies across the ABC phases provide better guidance to improve the search accuracy of the nondominated solutions.

6. Discussion

The experimental findings of this study extend beyond algorithmic superiority, offering broader implications for smart manufacturing. The explicit modeling of multi-type AGVs demonstrates that accounting for transporter heterogeneity is essential for accurately balancing production efficiency and energy consumption. Furthermore, the success of the Q-learning hybrid framework underscores the potential of creating more adaptive and autonomous optimization systems for complex scheduling.

However, several limitations affect the immediate industrial applicability of our work. While computationally superior to exact methods, our approach may still face challenges in very large-scale dynamic environments requiring real-time scheduling. More critically, the model incorporates several simplifying assumptions that limit its practical implementation. The exclusion of AGV battery constraints represents a particularly significant simplification, as it fails to capture the operational realities of AGV systems. Incorporating such power management considerations would inherently transform the problem into a more complex but practically relevant dynamic scheduling framework, where charging decisions must be continuously optimized alongside production tasks. This omission currently overlooks battery degradation, charging scheduling, and limited charging station resources. Without considering these constraints, our model may generate theoretically optimal but practically infeasible schedules due to power depletion during critical operations. Additionally, the deterministic nature of our model and its simplified energy consumption calculation—neglecting effects of acceleration, deceleration, and payload variations—further constrain its direct industrial application, highlighting the gap between our current framework and the stochastic conditions of actual manufacturing environments.

7. Conclusions and Future Research

Although the FJSP with AGV transportation resources has widely appeared in modern manufacturing, the FJSP with multi-type AGVs (FJSP-MTA) has been seldom researched. This work proposes a MILP model and Q-HMOABC for the FJSP-MTA with minimizing makespan and TEC. In Q-HMOABC, a three-layer encoding based on operation sequence, machine assignment, and AGV dispatching is used. Moreover, improvement strategies are employed across three phases of the Q-HMOABC algorithm: the employed bee phase integrated RL, the onlooker bee phase incorporated VNS, and the scout bee phase adopted a restart strategy, thereby enhancing the efficiency of solution space exploration. Ablation experiments are conducted on the unconstrained benchmark functions and benchmark instances. The comparison results prove that the three phases’ improvement strategy can more accurately approximate the true Pareto front. Similarly, comparative experiments are conducted with other state-of-the-art algorithms on benchmark instances, with performance metrics demonstrating the effectiveness of Q-HMOABC.

Future research will focus on the following aspects. First, additional constraints and dynamic events, such as sudden job insertion and AGV charging, will be incorporated during the modeling phase to further enhance the applicability of the problem. Second, our proposed improvement strategies may not be suitable for larger-scale benchmark instances; we will design an effective evolutionary strategy for the FJSP-MTA to further improve the algorithm’s efficiency and validate it on more complex benchmark instances. Third, we will conduct a sensitivity analysis to systematically evaluate the effects of key parameters and guide their tuning in diverse industrial settings. Finally, swarm intelligence optimization is a prominent area of contemporary research, and whether multi-population coevolution can improve the effectiveness and robustness of the algorithm for the FJSP-MTA warrants further investigation.

Author Contributions

Conceptualization, S.G. and H.Z.; methodology, S.G. and H.Z.; formal analysis, S.G.; writing—original draft preparation, S.G.; writing—review and editing, S.G.; visualization, S.G. and Z.Y.; project administration, Z.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Youth Program of the Basic Research Plan, Shenyang Institute of Automation, Chinese Academy of Sciences, under Grant 2023JC1K11; and in part by the Liaoning Provincial Natural Science Foundation, under Grant 2025-MS-084.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available on request.

Acknowledgments

We would like to acknowledge the anonymous reviewers and the editor for their suggestions, which improved the quality of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Notations

Notations	Definitions
Parameter:
I	Set of jobs, I = {1, 2, 3, …, n}.
J_i	Set of operations of job i, J_i = {1, 2, 3, …, n_i}.
K	Set of machines, K = {1, 2, 3, …, m}.
V	Set of all AGVs, V = {1, 2, 3, …, nV}.
T	Set of AGV types, T = {1, 2, 3, …, t}.
V_t	Number of AGVs of type t (t ∈ T).
i, i′	Job indices.
j, j′	Operation indices.
v, v′	AGV indices, including all types of AGVs.
k, k′	Machine indices.
O_i,j	The j-th operation of job i.
N	Total number of operations of all jobs.
n	Job number.
m	Machine number.
n_i	Operation number of job i.
t	AGV type indices.
nV	Total number of AGVs of all types.
K_i,j	Optional machine set for O_i,j.
PT_i,j,k	Processing time of O_i,j on machine k.
TT_k,k′,t	Transportation time between machines k and k′ for an AGV of type t.
PP_k	Processing power of machine k (k ∈ K).
IP_k	Idle power of machine k (k ∈ K).
AP_empty^t	Empty (unloaded) power of an AGV of type t (t ∈ T).
AP_load^t	Loaded power of an AGV of type t (t ∈ T).
TEC	Total energy consumption in workshop.
MPEC	Machine processing energy consumption.
MIEC	Machine idle energy consumption.
ATEC	AGV total energy consumption.
M	A large positive number.
Decision variable:
C_max	Makespan
X_i,j,k,t,v	Binary variable; equals 1 if O_i,j is processed on machine k and transported by AGV v of type t, and 0 otherwise.
YM_i,j,i′,j′	Binary variable; equals 1 if O_i,j is processed before O_i′,j′ on the same machine, and 0 otherwise.
YT_t,v,i,j,i′,j′	Binary variable; equals 1 if O_i,j is transported before O_i′,j′ by AGV v of type t, and 0 otherwise.
S_i,j	Start time of processing of O_i,j.
SA_i,j,t,v	Arrival time of AGV v of type t for transporting O_i,j.
EM_k	End time of processing on machine k.
EA_t,v	End time of transportation of AGV v of type t.
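
The energy-related symbols above can be tied together with a small sketch. It assumes that TEC is simply the sum of MPEC, MIEC, and ATEC, which the names suggest but which is not restated in this section; the MILP model in Section 3.2 remains the authoritative definition. The function name, the argument layout, and all sample numbers are hypothetical.

```python
def total_energy(proc_time, idle_time, PP, IP,
                 empty_time, load_time, AP_empty, AP_load):
    """Return (MPEC, MIEC, ATEC, TEC) under an ASSUMED additive decomposition.

    proc_time / idle_time: total processing / idle time per machine k.
    empty_time / load_time: total empty / loaded travel time per AGV type t.
    """
    # Machine processing energy: processing power times processing time.
    mpec = sum(PP[k] * proc_time[k] for k in proc_time)
    # Machine idle energy: idle power times idle time.
    miec = sum(IP[k] * idle_time[k] for k in idle_time)
    # AGV energy: empty and loaded trips use different power ratings per type.
    atec = sum(AP_empty[t] * empty_time[t] + AP_load[t] * load_time[t]
               for t in empty_time)
    return mpec, miec, atec, mpec + miec + atec


# Hypothetical data: two machines and one AGV type.
mpec, miec, atec, tec = total_energy(
    proc_time={1: 50, 2: 40}, idle_time={1: 10, 2: 20},
    PP={1: 2, 2: 3}, IP={1: 1, 2: 1},
    empty_time={"T1": 20}, load_time={"T1": 20},
    AP_empty={"T1": 1}, AP_load={"T1": 2},
)
print(mpec, miec, atec, tec)
```

Keeping the three components separate makes it easy to see which one dominates a given schedule before they are summed into the second objective.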

References

Mareddy, P.L.; Narapureddy, S.R.; Dwivedula, V.R.; Karanam, P.R. Development of scheduling methodology in a multi-machine flexible manufacturing system without tool delay employing flower pollination algorithm. Eng. Appl. Artif. Intell. 2022, 115, 105275.
Kacem, I.; Hammadi, S.; Borne, P. Approach by localization and multiobjective evolutionary optimization for flexible job-shop scheduling problems. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 2002, 32, 1–13.
Yan, J.; Liu, Z.; Zhang, C.; Zhang, T.; Zhang, Y.; Yang, C. Research on flexible job shop scheduling under finite transportation conditions for digital twin workshop. Robot. Comput.-Integr. Manuf. 2021, 72, 102198.
Pan, Z.; Lei, D.; Wang, L. A Bi-Population Evolutionary Algorithm With Feedback for Energy-Efficient Fuzzy Flexible Job Shop Scheduling. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 5295–5307.
Saidi-Mehrabad, M.; Dehnavi-Arani, S.; Evazabadian, F.; Mahmoodian, V. An Ant Colony Algorithm (ACA) for solving the new integrated model of job shop scheduling and conflict-free routing of AGVs. Comput. Ind. Eng. 2015, 86, 2–13.
Xu, G.; Bao, Q.; Zhang, H. Multi-objective green scheduling of integrated flexible job shop and automated guided vehicles. Eng. Appl. Artif. Intell. 2023, 126, 106864.
Ahmadi-Javid, A.; Hooshangi-Tabrizi, P. Integrating employee timetabling with scheduling of machines and transporters in a job-shop environment: A mathematical formulation and an Anarchic Society Optimization algorithm. Comput. Oper. Res. 2017, 84, 73–91.
Dang, Q.-V.; Singh, N.; Adan, I.; Martagan, T.; van de Sande, D. Scheduling heterogeneous multi-load AGVs with battery constraints. Comput. Oper. Res. 2021, 136, 105517.
Huo, X.; He, X.; Xiong, Z.; Wu, X. Multi-objective optimization for scheduling multi-load automated guided vehicles with consideration of energy consumption. Transp. Res. Part C Emerg. Technol. 2024, 161, 104548.
Fontes, D.B.M.M.; Homayouni, S.M.; Fernandes, J.C. Energy-efficient job shop scheduling problem with transport resources considering speed adjustable resources. Int. J. Prod. Res. 2024, 62, 867–890.
Zhang, R.; Yu, H.; Gao, K.; Fu, Y.; Kim, J.H. A Q-learning based artificial bee colony algorithm for solving surgery scheduling problems with setup time. Swarm Evol. Comput. 2024, 90, 101686.
Ajani, O.S.; Ivan, D.F.; Darlan, D.; Suganthan, P.N.; Gao, K.; Mallipeddi, R. Deep reinforcement learning as multiobjective optimization benchmarks: Problem formulation and performance assessment. Swarm Evol. Comput. 2024, 90, 101692.
Wang, J.J.; Wang, L. A Cooperative Memetic Algorithm With Learning-Based Agent for Energy-Aware Distributed Hybrid Flow-Shop Scheduling. IEEE Trans. Evol. Comput. 2022, 26, 461–475.
Raman, N. Simultaneous scheduling of machines and material handling devices in automated manufacturing. In Proceedings of the Second ORSA/TIMS Conference on Flexible Manufacturing Systems: Operations Research Models and Applications, Ann Arbor, MI, USA, 12–15 August 1986.
Bilge, U.; Ulusoy, G. A Time Window Approach to Simultaneous Scheduling of Machines and Material Handling System in an FMS. Oper. Res. 1995, 43, 1058–1070.
Ham, A. Transfer-robot task scheduling in flexible job shop. J. Intell. Manuf. 2020, 31, 1783–1793.
Pan, Z.; Wang, L.; Zheng, J.; Chen, J.F.; Wang, X. A Learning-Based Multipopulation Evolutionary Optimization for Flexible Job Shop Scheduling Problem With Finite Transportation Resources. IEEE Trans. Evol. Comput. 2023, 27, 1590–1603.
Meng, L.; Cheng, W.; Zhang, B.; Zou, W.; Fang, W.; Duan, P. An Improved Genetic Algorithm for Solving the Multi-AGV Flexible Job Shop Scheduling Problem. Sensors 2023, 23, 3815.
Han, X.; Cheng, W.; Meng, L.; Zhang, B.; Gao, K.; Zhang, C.; Duan, P. A dual population collaborative genetic algorithm for solving flexible job shop scheduling problem with AGV. Swarm Evol. Comput. 2024, 86, 101538.
Homayouni, S.M.; Fontes, D.B.M.M. Production and transport scheduling in flexible job shop manufacturing systems. J. Glob. Optim. 2021, 79, 463–502.
Homayouni, S.M.; Fontes, D.B.M.M.; Gonçalves, J.F. A multistart biased random key genetic algorithm for the flexible job shop scheduling problem with transportation. Int. Trans. Oper. Res. 2023, 30, 688–716.
Liu, Z.; Luo, Q.; Wang, L.; Tang, H.; Li, Y. The Low-Carbon Scheduling Optimization of Integrated Multispeed Flexible Manufacturing and Multi-AGV Transportation. Processes 2022, 10, 1944.
Zhang, F.; Li, R.; Gong, W. Deep reinforcement learning-based memetic algorithm for energy-aware flexible job shop scheduling with multi-AGV. Comput. Ind. Eng. 2024, 189, 109917.
Fontes, D.B.M.M.; Homayouni, S.M. Joint production and transportation scheduling in flexible manufacturing systems. J. Glob. Optim. 2019, 74, 879–908.
Zhang, G.; Hu, Y.; Sun, J.; Zhang, W. An improved genetic algorithm for the flexible job shop scheduling problem with multiple time constraints. Swarm Evol. Comput. 2020, 54, 100664.
Ren, Y.; Wu, S.; Chen, S.; Burdette, J.E.; Cheng, X.; Kinghorn, A.D. Interaction of (+)-Strebloside and Its Derivatives with Na+/K+-ATPase and Other Targets. Molecules 2021, 26, 5675.
Gao, K.Z.; Suganthan, P.N.; Chua, T.J.; Chong, C.S.; Cai, T.X.; Pan, Q.K. A two-stage artificial bee colony algorithm scheduling flexible job-shop scheduling problem with new job insertion. Expert Syst. Appl. 2015, 42, 7652–7663.
Dong, X.; Wan, G.; Zeng, P. A heuristic-assisted deep reinforcement learning algorithm for flexible job shop scheduling with transport constraints. Complex Intell. Syst. 2025, 11, 210.
Cheng, W.; Meng, L.; Zhang, B.; Gao, K.; Sang, H. Imitation Learning-Assisted Evolutionary Algorithm for Energy-Efficient Flexible Job Shop Scheduling Problem with Automated Guided Vehicles. IEEE Trans. Evol. Comput. 2025, 1–15.
Li, Y.; Liao, C.; Wang, L.; Xiao, Y.; Cao, Y.; Guo, S. A Reinforcement Learning-Artificial Bee Colony algorithm for Flexible Job-shop Scheduling Problem with Lot Streaming. Appl. Soft Comput. 2023, 146, 110658.
Xu, S.; Li, Y.; Li, Q. A Deep Reinforcement Learning Method Based on a Transformer Model for the Flexible Job Shop Scheduling Problem. Electronics 2024, 13, 3696.
Zhang, Z.-Q.; Wu, F.-C.; Qian, B.; Hu, R.; Wang, L.; Jin, H.-P. A Q-learning-based hyper-heuristic evolutionary algorithm for the distributed flexible job-shop scheduling problem with crane transportation. Expert Syst. Appl. 2023, 234, 121050.
Yang, Y.; Yang, M.; Anwer, N.; Eynard, B.; Shu, L.H.; Xiao, J. A novel digital twin-assisted prediction approach for optimum rescheduling in high-efficient flexible production workshops. Comput. Ind. Eng. 2023, 182, 109398.
Meng, L.; Zhang, C.; Zhang, B.; Gao, K.; Ren, Y.; Sang, H. MILP modeling and optimization of multi-objective flexible job shop scheduling problem with controllable processing times. Swarm Evol. Comput. 2023, 82, 101374.
Deroussi, L.; Norre, S. Simultaneous scheduling of machines and vehicles for the flexible job shop problem. In Proceedings of the International Conference on Metaheuristics and Nature Inspired Computing, Djerba, Tunisia, 28–30 October 2010.
Van Nostrand, R.C. Design of Experiments Using the Taguchi Approach: 16 Steps to Product and Process Improvement. Technometrics 2002, 44, 289.
Zitzler, E.; Deb, K.; Thiele, L. Comparison of Multiobjective Evolutionary Algorithms: Empirical Results. Evol. Comput. 2000, 8, 173–195.
Deb, K.; Thiele, L.; Laumanns, M.; Zitzler, E. Scalable multi-objective optimization test problems. In Proceedings of the 2002 Congress on Evolutionary Computation. CEC’02 (Cat. No.02TH8600), Hilton, HI, USA, 12–17 May 2002; Volume 821, pp. 825–830.
Li, Y.; Xu, R.; Yan, L.; Gu, S. Research on joint scheduling of AGVs and machines in multi-objective flexible job shop based on ISSA. Robot. Intell. Autom. 2025, 45, 314–325.
Wei, Z.; Yu, Z.; Niu, R.; Zhao, Q.; Li, Z. Research on Flexible Job Shop Scheduling Method for Agricultural Equipment Considering Multi-Resource Constraints. Agriculture 2025, 15, 442.
Zhang, T.; Wei, M.; Gao, X. Modeling an Optimal Environmentally Friendly Energy-Saving Flexible Workshop. Appl. Sci. 2023, 13, 11896.
Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197.

Figure 1. Gantt chart of a feasible schedule.

Figure 2. Examples of the solution representation: (a) OS and (b) MA.

Figure 3. An example of the AD solution representation.

Figure 4. Example of PMX for OS genes.

Figure 5. Example of POX for AD genes.

Figure 6. The mechanism of RL.

Figure 7. Examples of operators for action set. (a) IPOX, (b) TPX, (c) MPX.

Figure 8. Five neighborhood structures of onlooker bee phase.

Figure 9. (a) Main effects plot of IGD metric, (b) S/N corresponding to the IGD metric.

Figure 10. Pareto front approximations by different variant algorithms on DN10.

Figure 11. Iteration curve for metrics by different algorithms on DN10: (a) HV metric and (b) IGD metric.

Figure 12. Pareto front approximations by different algorithms on DN5.

Figure 13. Box plots of metrics for five algorithms: (a) HV metric and (b) IGD metric.

Table 1. Compatible machines and processing times of each operation.

Job	Operation	M1	M2	M3
Job 1	O_1,1	50	62	—
Job 1	O_1,2	—	40	55
Job 2	O_2,1	55	47	50
Job 2	O_2,2	65	55	—
Job 3	O_3,1	75	50	43
Job 3	O_3,2	43	—	52

Table 2. Transportation time between machines for two types of AGVs.

AGV_T1	M0	M1	M2	M3	AGV_T2	M0	M1	M2	M3
M0	0	20	30	40	M0	0	40	60	80
M1	20	0	20	30	M1	40	0	40	60
M2	30	20	0	20	M2	60	40	0	40
M3	40	30	20	0	M3	80	60	40	0
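
As a minimal sketch of how the data in Tables 1 and 2 might be held in code, the snippet below stores both tables and sums the transportation time along a route. The variable and function names are illustrative, and treating M0 as the location where a job starts is an assumption not stated in this section.

```python
# Table 2: transportation times between locations M0..M3 for each AGV type.
TT = {
    "T1": [[0, 20, 30, 40],
           [20, 0, 20, 30],
           [30, 20, 0, 20],
           [40, 30, 20, 0]],
    "T2": [[0, 40, 60, 80],
           [40, 0, 40, 60],
           [60, 40, 0, 40],
           [80, 60, 40, 0]],
}

# Table 1: processing time of operation (job, op) on each compatible machine.
PT = {
    (1, 1): {1: 50, 2: 62},
    (1, 2): {2: 40, 3: 55},
    (2, 1): {1: 55, 2: 47, 3: 50},
    (2, 2): {1: 65, 2: 55},
    (3, 1): {1: 75, 2: 50, 3: 43},
    (3, 2): {1: 43, 3: 52},
}


def route_time(route, agv_type):
    """Sum the transportation times over consecutive locations of a route."""
    return sum(TT[agv_type][a][b] for a, b in zip(route, route[1:]))


# Job 1 with O_1,1 on M1 and O_1,2 on M2, assumed to start at M0.
route = [0, 1, 2]
processing = PT[(1, 1)][1] + PT[(1, 2)][2]   # 50 + 40
transport_t1 = route_time(route, "T1")        # 20 + 20
transport_t2 = route_time(route, "T2")        # 40 + 40
print(processing, transport_t1, transport_t2)
```

In this example every type-T2 entry is exactly twice the corresponding type-T1 entry, so the choice of AGV type directly trades transportation time against whatever power ratings the two types have.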

Table 3. Mean IGD values on BU09 for different parameter combinations.

Test	ps	α	γ	ε	N	LS	Mean of IGD
1	40	0.05	0.7	0.3	20	5	0.38786
2	40	0.1	0.8	0.5	30	10	0.37171
3	40	0.15	0.9	0.7	40	15	0.36829
4	60	0.05	0.7	0.5	30	15	0.36577
5	60	0.1	0.8	0.7	40	5	0.37891
6	60	0.15	0.9	0.3	20	10	0.37141
7	80	0.05	0.8	0.3	40	10	0.37171
8	80	0.1	0.9	0.5	20	15	0.36714
9	80	0.15	0.7	0.7	30	5	0.38339
10	40	0.05	0.9	0.7	30	10	0.37181
11	40	0.1	0.7	0.3	40	15	0.36324
12	40	0.15	0.8	0.5	20	5	0.38538
13	60	0.05	0.8	0.7	20	15	0.36154
14	60	0.1	0.9	0.3	30	5	0.38859
15	60	0.15	0.7	0.5	40	10	0.36797
16	80	0.05	0.9	0.5	40	5	0.38515
17	80	0.1	0.7	0.7	20	10	0.36261
18	80	0.15	0.8	0.3	30	15	0.36585
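
The main effects that Figure 9a plots can be recomputed from Table 3 by averaging the mean IGD over all tests that share a factor level. The sketch below does that for the 18 tests; it is a plain level-mean calculation under a smaller-is-better reading of IGD, not necessarily the exact procedure the authors used, and the identifiers are illustrative.

```python
from collections import defaultdict

FACTORS = ["ps", "alpha", "gamma", "epsilon", "N", "LS"]

# Rows of Table 3: (ps, alpha, gamma, epsilon, N, LS, mean IGD).
RUNS = [
    (40, 0.05, 0.7, 0.3, 20, 5, 0.38786), (40, 0.1, 0.8, 0.5, 30, 10, 0.37171),
    (40, 0.15, 0.9, 0.7, 40, 15, 0.36829), (60, 0.05, 0.7, 0.5, 30, 15, 0.36577),
    (60, 0.1, 0.8, 0.7, 40, 5, 0.37891), (60, 0.15, 0.9, 0.3, 20, 10, 0.37141),
    (80, 0.05, 0.8, 0.3, 40, 10, 0.37171), (80, 0.1, 0.9, 0.5, 20, 15, 0.36714),
    (80, 0.15, 0.7, 0.7, 30, 5, 0.38339), (40, 0.05, 0.9, 0.7, 30, 10, 0.37181),
    (40, 0.1, 0.7, 0.3, 40, 15, 0.36324), (40, 0.15, 0.8, 0.5, 20, 5, 0.38538),
    (60, 0.05, 0.8, 0.7, 20, 15, 0.36154), (60, 0.1, 0.9, 0.3, 30, 5, 0.38859),
    (60, 0.15, 0.7, 0.5, 40, 10, 0.36797), (80, 0.05, 0.9, 0.5, 40, 5, 0.38515),
    (80, 0.1, 0.7, 0.7, 20, 10, 0.36261), (80, 0.15, 0.8, 0.3, 30, 15, 0.36585),
]


def main_effects(runs):
    """Return {factor: {level: mean IGD over the runs using that level}}."""
    effects = {}
    for idx, name in enumerate(FACTORS):
        groups = defaultdict(list)
        for run in runs:
            groups[run[idx]].append(run[-1])
        effects[name] = {lvl: sum(v) / len(v) for lvl, v in sorted(groups.items())}
    return effects


effects = main_effects(RUNS)
# Level with the lowest mean IGD per factor, and the spread between levels.
best = {name: min(levels, key=levels.get) for name, levels in effects.items()}
spread = {name: max(l.values()) - min(l.values()) for name, l in effects.items()}

for name in FACTORS:
    print(name, effects[name], "best:", best[name], "spread:", round(spread[name], 5))
```

Each level of each factor appears in six tests, so the level means are directly comparable. On these data LS shows by far the largest spread between levels, which suggests it is the most influential of the six parameters on BU09.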

Table 4. The comparison of the IGD metric on unconstrained benchmark functions.

Functions	Statistics	Q-HMOABC-E	Q-HMOABC-O	Q-HMOABC-S	Q-HMOABC
ZDT3	Mean	0.02087	0.03223	0.01979	0.01547
	Max	0.02621	0.03745	0.02295	0.01612
	Min	0.01594	0.02643	0.01625	0.01487
	Std	1.742 × 10⁻³	1.834 × 10⁻³	1.177 × 10⁻³	2.135 × 10⁻⁴
ZDT4	Mean	0.5258	1.7716	2.4999	7.327 × 10⁻³
	Max	0.5822	1.8409	2.8305	7.605 × 10⁻³
	Min	0.4710	1.6812	2.1164	7.018 × 10⁻³
	Std	0.0305	0.0437	0.1956	1.751 × 10⁻⁴
ZDT6	Mean	2.935 × 10⁻³	2.972 × 10⁻³	0.01029	2.025 × 10⁻³
	Max	3.054 × 10⁻³	3.103 × 10⁻³	0.01530	2.827 × 10⁻³
	Min	2.861 × 10⁻³	2.823 × 10⁻³	8.154 × 10⁻³	1.495 × 10⁻³
	Std	5.217 × 10⁻⁵	8.499 × 10⁻⁵	1.983 × 10⁻³	2.317 × 10⁻⁵
DTLZ2	Mean	0.09558	0.05251	0.05147	0.04958
	Max	0.12074	0.06307	0.05866	0.05940
	Min	0.06182	0.04730	0.04327	0.04129
	Std	0.01614	4.965 × 10⁻³	4.086 × 10⁻³	5.203 × 10⁻³
DTLZ5	Mean	0.02981	0.01431	0.01112	8.064 × 10⁻³
	Max	0.04205	0.02774	0.01896	8.237 × 10⁻³
	Min	0.02578	8.161 × 10⁻³	7.817 × 10⁻³	7.147 × 10⁻³
	Std	4.415 × 10⁻³	5.341 × 10⁻³	3.055 × 10⁻³	2.769 × 10⁻⁴

Table 5. The comparison of the GD metric on unconstrained benchmark functions.

Functions	Statistics	Q-HMOABC-E	Q-HMOABC-O	Q-HMOABC-S	Q-HMOABC
ZDT3	Mean	1.735 × 10⁻³	2.616 × 10⁻³	1.297 × 10⁻³	1.033 × 10⁻³
	Max	2.031 × 10⁻³	2.913 × 10⁻³	1.443 × 10⁻³	1.276 × 10⁻³
	Min	1.596 × 10⁻³	2.205 × 10⁻³	1.139 × 10⁻³	9.409 × 10⁻⁴
	Std	1.182 × 10⁻⁴	1.934 × 10⁻⁴	8.306 × 10⁻⁵	9.178 × 10⁻⁵
ZDT4	Mean	0.06360	0.2121	0.3029	9.217 × 10⁻⁵
	Max	0.09144	0.3513	0.4501	1.153 × 10⁻⁴
	Min	0.04801	0.1247	0.1773	8.592 × 10⁻⁵
	Std	9.866 × 10⁻³	0.05507	0.06845	8.024 × 10⁻⁶
ZDT6	Mean	0.07706	0.09419	0.07757	5.227 × 10⁻⁴
	Max	0.1043	0.1358	0.1132	7.905 × 10⁻⁴
	Min	0.05942	0.06703	0.06419	4.143 × 10⁻⁴
	Std	9.816 × 10⁻³	0.01625	0.01352	1.030 × 10⁻⁴
DTLZ2	Mean	9.264 × 10⁻³	2.215 × 10⁻³	1.810 × 10⁻³	1.614 × 10⁻³
	Max	0.01294	2.971 × 10⁻³	1.945 × 10⁻³	1.803 × 10⁻³
	Min	6.671 × 10⁻³	1.523 × 10⁻³	1.662 × 10⁻³	1.425 × 10⁻³
	Std	1.529 × 10⁻³	3.581 × 10⁻⁴	6.794 × 10⁻⁵	9.725 × 10⁻⁵
DTLZ5	Mean	2.832 × 10⁻³	1.340 × 10⁻³	1.207 × 10⁻³	3.538 × 10⁻⁴
	Max	3.281 × 10⁻³	1.736 × 10⁻³	1.553 × 10⁻³	3.910 × 10⁻⁴
	Min	2.394 × 10⁻³	1.039 × 10⁻³	9.835 × 10⁻⁴	2.968 × 10⁻⁴
	Std	2.018 × 10⁻⁴	1.909 × 10⁻⁴	1.431 × 10⁻⁴	2.164 × 10⁻⁵
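
For readers who want to reproduce figures of the kind reported in Tables 4 and 5, the following sketch computes GD and IGD for small point sets. It uses the plain mean of nearest-neighbour Euclidean distances; some references define GD with a root of summed squared distances instead, so the paper's own definitions take precedence. The function names and the toy fronts are illustrative.

```python
import math


def _mean_min_dist(sources, targets):
    """Average, over sources, of the distance to the closest target point."""
    return sum(min(math.dist(s, t) for t in targets) for s in sources) / len(sources)


def igd(reference_front, approx_front):
    """Inverted generational distance: reference points -> obtained set."""
    return _mean_min_dist(reference_front, approx_front)


def gd(reference_front, approx_front):
    """Generational distance: obtained points -> reference front."""
    return _mean_min_dist(approx_front, reference_front)


# Toy bi-objective example: the obtained set misses the middle of the front.
reference = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
obtained = [(0.0, 1.0), (1.0, 0.0)]

gd_value = gd(reference, obtained)    # every obtained point lies on the front
igd_value = igd(reference, obtained)  # penalizes the uncovered middle point
print(gd_value, igd_value)
```

The toy case shows why both metrics are reported: GD only measures convergence and is zero here, whereas IGD also reflects coverage and is positive because one reference point has no nearby solution.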

Table 6. The comparison of the HV metric on benchmark instances.

Instances	Q-HMOABC-E		Q-HMOABC-O		Q-HMOABC-S		Q-HMOABC
Instances	Best	Mean	Best	Mean	Best	Mean	Best	Mean
BU01	0.5237	0.5092	0.5004	0.4922	0.5214	0.5059	0.5593	0.5471
BU02	0.4674	0.4555	0.4528	0.4423	0.4702	0.4605	0.5012	0.4947
BU04	0.5587	0.5332	0.5574	0.5134	0.5647	0.5273	0.6283	0.5921
BU05	0.5129	0.5075	0.5310	0.5103	0.5382	0.5178	0.5599	0.5422
BU07	0.5096	0.4885	0.5127	0.4787	0.5209	0.5067	0.5556	0.5384
BU08	0.4828	0.4504	0.4614	0.4264	0.4837	0.4427	0.4967	0.4788
BU09	0.5114	0.5070	0.5216	0.5051	0.5492	0.5229	0.5655	0.5486
DN01	0.4831	0.4620	0.4437	0.4275	0.4756	0.4633	0.4969	0.4828
DN02	0.4523	0.4406	0.4114	0.3958	0.4419	0.4293	0.4889	0.4617
DN03	0.4590	0.4484	0.4540	0.4107	0.4527	0.4309	0.4930	0.4766
DN04	0.5084	0.4872	0.4506	0.4258	0.4830	0.4602	0.5050	0.4796
DN05	0.5056	0.4796	0.4836	0.4493	0.5020	0.4706	0.5207	0.5036
DN06	0.4602	0.4375	0.4410	0.4112	0.4763	0.4468	0.4864	0.4654
DN07	0.5008	0.4845	0.4568	0.4399	0.5074	0.4836	0.5190	0.4992
DN08	0.4724	0.4594	0.4674	0.4321	0.4727	0.4584	0.4921	0.4838
DN09	0.5078	0.4906	0.4871	0.4544	0.4901	0.4807	0.5186	0.5101
DN10	0.4273	0.4178	0.3895	0.3804	0.4269	0.4003	0.4785	0.4537

Table 7. Comparison of HV metric for Q-HMOABC, MOSSA, MOGWO, MOWOA, and NSGA-II on benchmark instances.

Instances	HV
Instances	Q-HMOABC	MOSSA	MOGWO	MOWOA	NSGA-II
BU01	0.5051	0.3921	0.4245	0.4027	0.4525
BU02	0.5021	0.4013	0.4148	0.4046	0.4526
BU04	0.6063	0.4519	0.5024	0.4494	0.5423
BU05	0.5526	0.4332	0.4673	0.4390	0.5031
BU07	0.5217	0.4214	0.4374	0.4051	0.4643
BU08	0.5507	0.4535	0.4716	0.4429	0.5053
BU09	0.5425	0.4281	0.4449	0.4298	0.4979
DN01	0.5182	0.4626	0.4769	0.4533	0.4739
DN02	0.4741	0.4021	0.4223	0.4090	0.4284
DN03	0.4704	0.3934	0.4321	0.4017	0.4710
DN04	0.5242	0.4116	0.4703	0.4126	0.4916
DN05	0.5192	0.4354	0.4667	0.4306	0.4632
DN06	0.4665	0.3935	0.4294	0.3927	0.4225
DN07	0.4841	0.4098	0.4384	0.4124	0.4324
DN08	0.4360	0.3751	0.4026	0.3891	0.4104
DN09	0.4906	0.4123	0.4378	0.3996	0.4449
DN10	0.4635	0.3886	0.4100	0.3875	0.4064
Mean runtime (s)	714.6	811.7	761.2	774.3	730.5

Table 8. Comparison of IGD metric for Q-HMOABC, MOSSA, MOGWO, MOWOA, and NSGA-II on benchmark instances.

Instances	IGD
Instances	Q-HMOABC	MOSSA	MOGWO	MOWOA	NSGA-II
BU01	0.3857	0.4912	0.4537	0.4861	0.4255
BU02	0.3863	0.4809	0.4716	0.4781	0.4176
BU04	0.2801	0.4189	0.3655	0.4228	0.3281
BU05	0.3476	0.4477	0.4064	0.4410	0.3744
BU07	0.3600	0.4568	0.4302	0.4656	0.4031
BU08	0.3318	0.4183	0.4019	0.4271	0.3701
BU09	0.3400	0.4441	0.4213	0.4422	0.3743
DN01	0.3609	0.4120	0.3973	0.4250	0.3953
DN02	0.4340	0.4935	0.4675	0.4951	0.4573
DN03	0.4195	0.4879	0.4601	0.4774	0.4394
DN04	0.3485	0.4587	0.3959	0.4574	0.3782
DN05	0.3566	0.4434	0.4010	0.4448	0.4072
DN06	0.4494	0.4967	0.4637	0.4987	0.4679
DN07	0.4109	0.4705	0.4494	0.4651	0.4442
DN08	0.4423	0.5008	0.4818	0.5020	0.4703
DN09	0.4059	0.4729	0.4556	0.4729	0.4358
DN10	0.4327	0.4957	0.4663	0.4921	0.4737
Mean runtime (s)	638.4	733.9	682.0	706.2	661.5

Table 9. Paired t-test of Q-HMOABC against MOSSA, MOGWO, MOWOA, and NSGA-II.

Algorithms	HV	IGD
Q-HMOABC vs. MOSSA	0.010	0.007
Q-HMOABC vs. MOGWO	0.006	0.010
Q-HMOABC vs. MOWOA	0.008	0.006
Q-HMOABC vs. NSGA-II	0.000	0.000
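
To illustrate the mechanics of a paired t-test, the sketch below computes the t statistic for the per-instance HV values of Q-HMOABC and NSGA-II from Table 7. How the authors paired their samples is not specified in this section, so the result is not expected to reproduce the entries of Table 9. The function name is illustrative.

```python
import math
import statistics

# HV values from Table 7, in instance order BU01..BU09, DN01..DN10.
HV_QHMOABC = [0.5051, 0.5021, 0.6063, 0.5526, 0.5217, 0.5507, 0.5425, 0.5182,
              0.4741, 0.4704, 0.5242, 0.5192, 0.4665, 0.4841, 0.4360, 0.4906,
              0.4635]
HV_NSGA2 = [0.4525, 0.4526, 0.5423, 0.5031, 0.4643, 0.5053, 0.4979, 0.4739,
            0.4284, 0.4710, 0.4916, 0.4632, 0.4225, 0.4324, 0.4104, 0.4449,
            0.4064]


def paired_t(a, b):
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample std / sqrt(n)).
    t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


t_stat, dof = paired_t(HV_QHMOABC, HV_NSGA2)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

A positive statistic means that Q-HMOABC has the higher HV on average across the paired instances. Converting the statistic into a p-value requires the Student t distribution; if SciPy is available, `scipy.stats.ttest_rel` returns both values in one call.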

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

