1. Introduction
With the advancement of intelligent manufacturing and workshop digitization, Job-Shop Scheduling (JSP) has become increasingly uncertain and spatiotemporal coupled due to factors such as multi-resource coupling [
1], complex processes, and constrained logistics. The goal of JSP is to determine processing sequences and resource allocation for multiple machines handling a series of operations. This problem has been proven NP-hard [
2,
3], and in real-world scenarios, it often faces challenges like random order arrivals, equipment status fluctuations, and delivery constraints, evolving into a dynamic problem requiring higher algorithmic adaptability and real-time performance. With the development of artificial intelligence, data-driven approaches such as deep learning (Yuan and Li) [
4] and machine learning (Das et al.) [
5] have garnered significant attention from researchers. A crucial step in applying reinforcement learning to JSP is its formulation as a Markov Decision Process (MDP) [
6], involving key definitions of actions, states, and rewards. However, previous studies often relied on custom formulations based on domain knowledge of JSP, employing manual engineering and specific workloads, leading to inconsistent experimental results. As Liu, Chang, and Tseng [
7] have noted regarding state space, some studies define states as matrices composed of customer order characteristics and system features (such as machine quantity and average completion time), though they fail to specify relationships between different entities. Other research employs disjunction graphs to represent original states and utilizes graph neural networks (GNN) for feature extraction (Zhang [
8]). The primary advantage lies in its ability to handle instances of varying scales without requiring additional training. However, GNN performance may degrade significantly for complex problems, with computational costs potentially increasing substantially (Wu et al. [
9]), as increased neighborhood node numbers may propagate noise information (Zhou et al. [
10]). Since conventional JSPs disregard transportation routes, we need to introduce a multi-robot path problem to address workpiece transportation in shop floor scheduling.
Beyond processing stages, material handling constitutes a critical component in modern workshops. Traditional scheduling research focusing on “material handling” often oversimplifies operations by treating transportation as a fixed time or capacity constraint, without explicitly modeling paths, congestion, or collision avoidance. This leads to solutions that lack inherent conflict-free implementability. It is worth noting that the integration of manufacturing scheduling and AGV transportation is not limited to Job Shop scenarios but is also extensively studied in Flexible Flow Shop (FSP) domains. Both fields face similar challenges in resource coordination and collision avoidance. For instance, recent research by (Wang [
11]) proposed a mixed-integer linear programming model for efficient flexible flow shop scheduling with automatic guided vehicle consideration. However, distinct from FSP which typically follows a unidirectional or stage-sequential flow, JSP involves more complex, multi-directional routing constraints, imposing higher demands on path planning flexibility. To bridge this gap, recent studies have integrated transportation resources into scheduling models, with the landmark work being the Flexible Job Shop Scheduling Problem with Limited Automated Guided Vehicle(AGV) Transportation (FJSP-LAT) [
12]. This approach coordinates machinery and limited AGV fleets in flexible workshops, establishing an optimization model that minimizes maximum completion time, total energy consumption, and delivery penalty through multi-objective evolutionary algorithms. Experimental data reveals diminishing marginal returns from increasing AGV numbers, providing quantitative evidence for resource allocation. However, most existing FJSP-LAT literature neglects explicit path conflict resolution for AGVs, typically adopting a modeling assumption that “ignores path conflicts and collisions between AGVs” (key assumptions listed in the model section) [
13]. This disconnect persists between the planning (scheduling) and execution (path planning) layers, requiring additional multi-robot path planning and collision mitigation during implementation phases. In summary, a critical synthesis of the literature reveals a significant dichotomy: purely scheduling-focused research tends to oversimplify logistics into static time lags, failing to account for dynamic congestion; meanwhile, routing-focused studies often lack the global foresight required for production efficiency. Consequently, most existing hybrid approaches struggle to balance computational tractability with the topological realism needed for actual industrial deployment. This highlights the urgent need for a framework that inherently unifies task dispatching with deadlock-free path verification. Recent trends in hybrid optimization-learning frameworks emphasize the need to balance computational speed with execution feasibility. For industrial-level implementation, a key requirement is ensuring that algorithmic outputs strictly adhere to physical constraints (e.g., collision avoidance and kinematic limits), which purely data-driven methods often struggle to guarantee probabilistically. By integrating the rigorous constraint-handling capability of PBS with the adaptive decision-making of RL, our proposed framework aligns with these industrial standards, offering a solution that is both efficient in calculation and reliable in physical execution.
The proposed model effectively bridges the gap where Job Shop Scheduling with transportation typically neglects collision-free paths among multiple robots, while specific multi-robot path planning algorithms overlook task-level dispatching. First, in problem formulation: We propose a variant of the Job-Scheduling Problem (JSP) that explicitly incorporates transportation time and path accessibility, enabling dispatch plans with direct implementability without adding additional optimization objectives. Second, in modeling: An integrated mixed-integer/Discrete-Time Network (MIDN) model is developed to unify decision-making processes for “transportation decisions”, AGV assignment, and collision-free path selection within a single timeline, while ensuring no space occupation conflicts or head-on collisions through vertex-edge conflict constraints. Third, in methodology: A collaborative solution process integrating “RL scheduling and PBS path planning” is established. The scheduling layer generates high-quality initial Gantt charts under optional machine sets and exclusion constraints, while the path layer creates collision-free transport plans for workstation nodes. Real-time transportation duration then enables closed-loop adjustments to scheduling details. Fourth, in experimentation: Our approach demonstrates effectiveness through classical cases including FT, LA, SWV, and YN. Most instances achieve or exceed strong baselines in scheduling quality. Taking FT06 as an example, the integrated Gantt chart after path fusion perfectly aligns with both time dimensions and AGV routes, achieving zero collisions. This validates the practicality and scalability of the “workshop scheduling + path planning” framework.
2. Problem Description and Modeling
2.1. Problem Description
This paper aims to consider the job shop scheduling (JSP) and automated guided vehicle (AGV) path planning problems simultaneously, with the optimization objective of minimizing the maximum completion time (makespan) in the overall manufacturing system. Specifically, the problem is described as follows:
- (a)
A manufacturing workshop system comprises multiple processing machines, each equipped with independent input and output nodes.
- (b)
Workpieces within the facility must undergo sequential processing through multiple stages, with each step requiring completion exclusively on designated machines without interruption.
- (c)
AGVs (Automated Guided Vehicles) transport workpieces between processes, with each vehicle limited to handling a single piece at any given time.
The workshop’s processing tasks are defined as a set of processes, each with a fixed processing time. Additionally, the internal logistics routes within the workshop are modeled as an undirected connected graph. The node set represents critical positions such as machine entry points, exit points, initial and final workpiece locations, while the edge set indicates the paths that AGVs can traverse.
Each AGV is assigned a fixed initial position and operates in time units defined as time steps, during which it can either remain stationary or move to adjacent nodes. The system must specify the AGV’s transportation routes and durations for each process, along with determining the processing start times for each operation. The joint optimization problem proposed in this paper requires simultaneous decision-making across multiple processes:
- (a)
The processing sequence and time scheduling of the workpiece on each machine;
- (b)
Task allocation and path planning of AGV to avoid path conflict and node occupancy conflict among multiple AGVs.
The constraints include but are not limited to:
- (a)
Each workpiece must be processed in the predetermined order;
- (b)
Each machine can process at most one process at any time;
- (c)
Each AGV can transport at most one workpiece at any time;
- (d)
At any time, each path node or edge can only be occupied by one AGV at most.
The optimization objective is defined as minimizing the overall maximum completion time while considering AGV transportation time under the aforementioned constraints. This problem holds significant theoretical and practical implications, effectively enhancing production-logistics coordination efficiency. It prevents scheduling conflicts and congestion during AGV operations, ensuring efficient and orderly execution of production tasks.
Figure 1 illustrates an application combining JSP and Multi-Agent Path Finding (MAPF) problems. Each machining machine M serves as both the entry point (waiting area) and exit point for workpieces, with multiple robots freely performing transportation tasks across the facility. Upon completion of a machining process, the AGV transports the workpiece to the next machine in the workflow until all tasks are completed, after which the workpiece is returned to the central warehouse in the map.
To visually demonstrate the JSP-MAPF problem involving process sequencing and AGV allocation,
Figure 2a illustrates the problem structure with two processing machines, three workpieces, and each workpiece requiring two processes along with two AGVs. The solid lines represent the actual process sequence, while the dashed lines indicate the potential sequencing order and transport tasks.Here, Oij denotes the j-th process of the i-th workpiece, where blocks represent AGVs, solid lines indicate processing sequences between workpieces, black dashed closed lines show potential sequence relationships before processing on the same machine, and colored dashed lines indicate potential AGV assignments for each process (where AGVs are dispatched to subsequent machines after completing current processes).
Figure 2b presents a solution case: Machine 1 follows processing sequences O31, O21, O12, etc., with Robot A1 handling the transportation of processes O21, O32, O12, and so on.
2.2. Mathematical Model
In order to accurately describe the problem, this section gives a mathematical description of our problem.
Table 1 shows the definition of symbols,
Table 2 shows the decision variable.
As the
Table 3 shows, Objective function (1) aims to minimize the maximum completion time for machining and AGV transportation. Constraints (2)–(4) represent machine processing constraints. Specifically, Constraints (2)–(3) enforce sequence constraints: each process must be assigned to exactly one designated machine, with the completion time equal to the start time plus the processing duration. Constraint (4) implements the machine overlap constraint: any two processes assigned to the same machine must be processed sequentially through the mutual exclusion of M constraints. Constraints (5)–(6) govern task assignment constraints: when adjacent processes occur on different machines, the subsequent process must not start earlier than the predecessor’s completion time plus the workpiece transportation time, and each task must be executed by exactly one AGV. Constraints (7)–(9) enforce path planning constraints. Constraint (7) ensures path feasibility by maintaining the spatiotemporal continuity of the AGV. Constraints (8) and (9) address multi-robot conflict constraints: they restrict the maximum occupancy of a node to one vehicle (vertex collision constraint) and prohibit opposing travel on the same route (edge collision constraint).
3. Solution of JSP Considering Transportation Time and Path
Building upon the model definition, this chapter unfolds in the sequence of “scheduling layer → task allocation layer → multi-AGV path planning layer”: First, it presents the MDP design and training strategies for the reinforcement learning scheduler; then, automatically generates and assigns transportation tasks through Gantt chart preprocessing; subsequently implements PBS (Priority-Based Search) for multi-AGV non-conflict path planning on workshop topology; finally, feeds the arrival times of paths back into the timeline to complete the sequential integration of processing and transportation.
In the path consideration FJSP problem, the optimization goal is to minimize the transportation time and processing time, and the combinatorial optimization of this combinatorial optimization problem is carried out at three levels: machine processing scheduling layer, robot task allocation layer and path planning layer, respectively.As illustrated in
Figure 3 below.
3.1. Introduction to Machining Scheduling Framework
The JSP uses the scheduling framework from the paper [
6]. The scheduling layer framework of this study aims to achieve efficient decision-making and execution of workshop scheduling tasks to minimize completion time. Specifically, we adopt the D3QPN algorithm (incorporating Dueling Double DQN with Prioritized Experience Replay) to effectively mitigate Q-value overestimation and enhance sample efficiency within the discrete scheduling action space.The scheduling layer framework includes the following key components:
State representation: The scheduling problem is described in a structured manner through the partitioning diagram. In this diagram, nodes represent the processing procedures to be processed, and edges represent the constraints before and after processing as well as machine usage constraints. This structured representation can effectively capture the timing relationship of the procedures and resource constraints.
Feature extraction module: The feature extraction module uses the Attention mechanism to extract features from the disjunction graph, so as to effectively capture the key information in the graph structure and the correlation features between processes.
Action selection module: The general dispatching rules (such as FIFO, LIFO, SPT, LPT, etc.) are adopted as the action space. According to the results of graph feature extraction, the most suitable dispatching rules for the current state are dynamically selected to balance the efficiency and interpretability of decision-making.
Scheduling execution and status update: According to the selected scheduling rules, the task scheduling of corresponding number of steps is performed, and the workshop environment and task status information are updated to provide real-time status feedback for the next decision.
Feedback and iterative updates: Based on the effect of scheduling execution, reward feedback is calculated and the strategy is continuously updated through reinforcement learning to improve the overall scheduling performance.
The scheduling layer framework is shown in
Figure 4 below. It has the characteristics of clear structure, easy to implement and interpret, and can effectively adapt to complex workshop scheduling and AGV transportation collaborative optimization tasks.
Action Selection Improvements
The scheduling actions of JSP in reinforcement learning are crucial for addressing scheduling challenges. In the action selection module of the aforementioned framework, at time t, decisions must be made regarding processing actions for multiple workpieces. The original method included: (1) First-In-First-Out (FIFO); (2) Last-In-First-Out (LIFO); (3) Most Remaining Operations (MOR); (4) Least Remaining Operations (LOR); (5) Longest Processing Time (LPT); (6) Shortest Processing Time (SPT); (7) Longest Total Processing Time (LTPT); (8) Shortest Total Processing Time (STPT). While these actions are designed based on process characteristics, scheduling fundamentally involves sequencing and machine allocation. Since none of the existing actions specifically address machine optimization, we propose two new strategies: Most Idle First (MIF) and Least Idle First (LIF), incorporating them into the original eight action set.
As shown in
Figure 5 below, there are five processing machines numbered A–E. At time t, MIF prioritizes the process that can be processed on machine A for arrangement, while LIF prioritizes the process that can be processed on machine C for arrangement.
3.2. Robot Task Assignment
Upon the completion of machine processing scheduling, transportation tasks are generated and assigned to the executable AGVs. Each task corresponds to the movement of a workpiece for job
after completing operation
. Formally, this is represented as a Transport Request (
) triplet:
where
denotes the docking node
of the current machine,
denotes the docking node
of the target machine for the subsequent operation, and
corresponds to the operation completion time
.
Task Allocation Strategy: The scheduler distributes the set of requests to the robot set
. The allocation process determines the decision variable
to ensure that each task is assigned to exactly one robot, satisfying the assignment constraint:
This mechanism ensures that all generated transportation requests are covered by the available fleet.
Idle Priority Principle: To optimize transport efficiency, an “Idle Priority Principle” is employed. When a task
becomes available at
, the system identifies the subset of idle robots,
. The assignment algorithm selects the optimal robot
by minimizing the travel cost to the starting node:
The task is assigned to the selected robot (setting ), thereby minimizing the waiting time before the transportation begins.
3.3. Preprocessing of Scheduling Results Combined with Path Planning
Traditional Gantt charts developed for scheduling problems often neglect transportation time. When incorporating this factor, two critical issues emerge: (1) The continuous influx of different processes onto a single machine creates transportation challenges; (2) Within the same workpiece, the immediate transfer to the next machine after completing a previous process fails to account for transportation duration.
According to the above problems, we use the buffer to solve the first problem and the buffer to preprocess the Gantt chart to solve the second problem.
This section details the preprocessing workflow for integrating path planning-generated transit windows with actual transportation durations into the scheduling timeline. For each material handling task, the start time of the next process is determined by taking the maximum value between the “machine availability” and “transport arrival time”. This approach ensures that the final maximum completion time Cmax incorporates material handling duration while maintaining strict temporal alignment between Gantt charts and the original routing path.
3.3.1. Setting of Handling Buffer Zone
To enhance the implementability of the scheduling algorithm, this study incorporates a distinct workshop spatial architecture during modeling, as illustrated in
Figure 6. The functions of the symbols are shown in
Table 4.The system comprises machining units, input/output ports, and buffer zones, where X.1 denotes the unified entry point for workpieces and X.2 represents the centralized exit point. Each machining unit is equipped with both input and output buffer zones to temporarily store workpieces awaiting processing or completed items requiring transportation. This design addresses the unrealistic assumption in traditional scheduling models that “different operations on the same machine can seamlessly transition without interruption.” In real-world production environments, workpiece transportation and storage inevitably introduce additional delays. The buffer zones serve as intermediaries, allowing workpieces to temporarily reside in these zones after processing until transportation resources or target machines become available for transfer.
Specifically, during the material handling process, AGVs first enter the global buffer through Entrance (Position 1), where they are transported to corresponding processing units via conveyor belts or robotic arms. Upon completion of processing, workpieces are transferred to the output buffer of their respective units, awaiting subsequent transportation. All workpieces ultimately exit the system through Exit (Position 2). This mechanism not only considers processing and transportation in the temporal dimension but also ensures rational workpiece flow in the spatial dimension, effectively preventing scheduling infeasibility caused by resource conflicts or transportation delays.
3.3.2. Workpiece Buffer Treatment
Figure 7 demonstrates the comparison between traditional scheduling results and the preprocessing method proposed in this study. The upper chart shows a conventional Gantt chart that ignores transportation processes, where workpieces are immediately processed on the next machine after completing the previous step. However, in real-world production environments, workpieces require transportation to reach target equipment, making such scheduling unrealistic in practice. The lower chart presents the preprocessed Gantt chart generated by our method, which explicitly incorporates transportation time and resource constraints between processes. For instance, after MachineA completes processing of workpiece J1, its path planned by the PBS algorithm requires 2 time units to reach Machine B, causing subsequent processing tasks to be delayed accordingly. Similarly, the transportation process between MachineA and MachineC for workpiece J2 takes 3 time units, pushing subsequent processing tasks to time point 12. This demonstrates that our method effectively addresses the issue of unreasonable process sequencing caused by traditional scheduling ignoring transportation, generating scheduling solutions that better align with practical execution conditions. This preprocessing mechanism lays the foundation for integrated optimization of path planning and scheduling in subsequent stages.
3.3.3. Task Mapping from Gantt Chart to PBS Algorithm
Traditional approaches only assign processing sequences and machine assignments at the scheduling layer without explicitly outputting transportation tasks. This results in the path layer struggling to inherit scheduling outcomes and align with the timeline, often leading to “disconnection between transportation and machining” during execution phases. To address this, we need to automatically and systematically extract transportation tasks from the obtained Gantt chart as input for the PBS algorithm, ensuring precise spatiotemporal correspondence between path planning and scheduling.
Through preprocessing, we ensure all processes can be successfully transferred. Using “adjacent processes across different machines” as the trigger condition, we verify each process’s completion status before determining if it is the final step. The system extracts critical data from the Gantt chart—specifically task response time T and task transition X to X—to generate task lists for PBS system processing. As illustrated below, this demonstrates a simplified workflow for task extraction.As illustrated in
Figure 8.
3.4. Path Planning Layer PBS Algorithm
Based on the extracted task list, we utilize the PBS (Priority-Based Search) algorithm to determine the underlying path for material handling within a given workshop map. This algorithm integrates the principles of prioritized planning and conflict-driven search, ensuring solvability while significantly improving computational efficiency. The PBS framework divides the search process into two hierarchical levels:
High-Level Search (PT Priority Tree): Maintains a priority graph. Initially, no priority constraints exist, and each robot independently plans the optimal path. The system checks for vertex conflicts or edge conflicts in the path set. If a conflict is detected (overlapping paths between robots), the Priority-Based Search (PBS) expands two child nodes in the priority tree: Node 1: robot (i.e., priority). Node 2: robot. This gradually constructs a partial priority relationship, and the system performs depth-first traversal of the priority tree until finding a conflict-free path set.
Low-Level Search (LLS): Monocular Path Planning: Given a priority relationship, the LLS algorithm plans paths for robots one by one. During path planning, high-priority robots maintain fixed routes while low-priority robots must avoid these paths (treated as dynamic obstacles). The system employs a Space-Time A* algorithm to identify collision-free paths.
Note: Description of Symbols and Functions To ensure consistency with the mathematical model in
Section 2.2, the notations in Algorithm 1 are defined as follows:
: The set of AGVs
, corresponding to
Table 1.
: The collection of transport requests generated in
Section 3.2.
: The low-level pathfinding function that calculates the optimal path for AGV k while treating higher-priority agents as dynamic obstacles.
: A set of ordering constraints () used to resolve conflicts; AGV must yield to .
: A function that validates whether the path set satisfies the Vertex Collision Constraint (Equation (8)) and Edge Collision Constraint (Equation (9)).
: A data structure executing a Depth-First Search (DFS) on the priority tree.
| Algorithm 1: PBS for Factory Transport |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
5. Conclusions
This paper presents an integrated solution framework structured into three layers—scheduling, task allocation, and multi-robot path planning—to address the JSP-T problem, where transport robots are traditionally not considered in conflict-free path generation. The framework aims to minimize the makespan, explicitly incorporating transportation time. In the scheduling layer, a reinforcement learning algorithm is employed, and its action design is refined to produce high-quality processing-sequence plans. Experiments on FT, LA, SWV, YN, and other benchmark instances show that the proposed method improves the makespan by 9.7% compared with other reinforcement-learning approaches such as PPO. In the task allocation layer, the scheduling plan is mapped to task times and locations, and transportation tasks are assigned to robots, enabling an effective integration of the JSP and MAPF formulations. In the multi-robot path planning layer, the PBS algorithm receives tasks, workshop layouts, and transportation-resource information to generate conflict-free paths efficiently and reliably for multiple robots. Overall, the framework provides a coherent optimization pipeline from scheduling to execution, enhancing adaptability to varying shop-floor operations and supporting flexible manufacturing environments. Finally, by minimizing high-energy machinery idle time and preventing redundant AGV movements, the proposed framework contributes to energy conservation, directly aligning with the social imperative of sustainable ‘Green Manufacturing.
It is important to clarify that the proposed framework operates primarily as a two-stage sequential optimization strategy rather than a fully dynamic, real-time closed-loop system. Specifically, the mechanism relies on generating an initial high-quality schedule which is then refined via a one-time feedback adjustment based on the collision-free paths generated by the PBS algorithm. While this “Generate-then-Refine” approach effectively resolves spatiotemporal conflicts and ensures implementability, it has limitations. The global optimality of the final solution is partially dependent on the quality of the initial schedule, and the current offline nature of the framework limits its ability to handle continuous, real-time disturbances during execution. Future work will focus on extending this mechanism into a fully iterative loop to further enhance robustness in dynamic environments.