Article

Hybrid Attention-Augmented Deep Reinforcement Learning for Intelligent Machining Process Route Planning

School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
*
Author to whom correspondence should be addressed.
Machines 2026, 14(3), 343; https://doi.org/10.3390/machines14030343
Submission received: 2 February 2026 / Revised: 13 March 2026 / Accepted: 17 March 2026 / Published: 18 March 2026
(This article belongs to the Section Advanced Manufacturing)

Abstract

Machining process route planning (MPRP) is vital for autonomous manufacturing yet remains challenging under complex, multi-dimensional engineering constraints. This paper proposes an attention-augmented deep reinforcement learning (DRL) framework to achieve intelligent process orchestration. First, an Optional Process Attribute Adjacency Graph (OPAAG) is established to formally model the “feature–process–resource–constraint” coupling, enhancing the agent’s perception of manufacturing semantics. The architecture synergistically integrates Graph Attention Networks (GAT) to perceive spatial benchmark dependencies and a Transformer-based encoder to capture sequential resource correlations within variable-length machining chains. Furthermore, a dynamic action masking mechanism is integrated to guarantee a 100% constraint satisfaction rate during both training and inference stages. Experimental evaluations across diverse part geometries demonstrate that the proposed method offers significant advantages in cost optimization, inference efficiency, and topological stability compared to traditional heuristic algorithms and standard DRL models. By effectively distilling the search space and maintaining action feasibility, the framework provides an efficient and robust solution for autonomous process planning in complex industrial scenarios.

1. Introduction

In the paradigm of Industry 4.0, machining process route planning (MPRP) serves as a critical component of intelligent manufacturing systems. It acts as the bridge between design and production within Computer-Aided Process Planning (CAPP) by identifying machining features, selecting resources, and determining the optimal sequence of operations. The efficiency of MPRP directly impacts product quality, delivery cycles, and manufacturing costs in high-mix, low-volume production environments [1,2,3].

1.1. Traditional and Heuristic Approaches in MPRP

Early research focused on knowledge-driven systems. Li et al. [4] utilized ontology-based models to represent manufacturing knowledge, while Waiyagan and Bohez [5] developed rule-based systems for feature-based planning. Recent extensions include the use of knowledge graphs for dynamic reasoning by Long et al. [6] and semantic frameworks for cost estimation by Hernandes et al. [7]. Concurrently, MPRP was widely treated as a complex combinatorial optimization problem. Hua et al. [8] employed Genetic Algorithms (GA) to optimize tool changes and setup orientations, while Liu et al. [9] applied Ant Colony Optimization (ACO) to navigate large-scale constraint spaces. Multi-objective meta-heuristics have since been applied to diverse scenarios, including cellular manufacturing reliability [10], non-cutting path optimization [11], and intelligent 3D process generation [12]. Liu et al. [13] further integrated process planning with AGV scheduling tasks, and Peng et al. [14] optimized routes specifically for remanufacturing. Despite their mathematical rigor, these heuristic methods often suffer from high computational costs and limited adaptability when facing the “curse of dimensionality” in complex prismatic parts.

1.2. Deep Learning and Reinforcement Learning in Manufacturing

Advancements in Deep Learning (DL) have enabled automated feature extraction from CAD models. Ding et al. [15] used attribute adjacency graphs (AAG) for feature modeling, and other studies [16] explored multi-view representations. Lei et al. [17] developed MFPointNet for direct feature recognition from point clouds. More recently, graph-based encodings have been used to capture topological interactions, such as the GCN models proposed by Wang et al. [18] and the attention-based semantic frameworks by Du et al. [19]. Deep Reinforcement Learning (DRL) has emerged as a powerful paradigm for real-time decision-making in MPRP, and Zhang et al. [20] pointed out its unique advantages in solving the constrained combinatorial optimization problems that arise in MPRP. Wu et al. [21] proposed an early DRL framework for rapid process generation. Zhao et al. [22] introduced Graph Attention Networks (GAT) into green manufacturing scheduling, and Xiao et al. [23] utilized graph convolutional RL for energy-aware planning. To enhance stability, Zhang et al. [24,25] explored proximal policy optimization (PPO) with specialized reward shaping and exploration mechanisms. Additionally, Multi-Agent RL has been applied to systemic process control [26] and dual-resource scheduling [27].

1.3. Research Gaps and Research Objectives

Despite these advancements, two critical research gaps persist in the current DRL-based MPRP frameworks:
  • Over-simplification of practical industrial scenarios: Existing DRL models, despite their success, fail to reflect the multi-layered complexity of real machining tasks. Specifically, they do not simultaneously account for two levels of dependency: the spatial benchmark (datum) dependencies between features (Zhu et al. [28]) and the internal sequential correlations within individual process chains, especially when handling variable-length sequences (Kwon et al. [29]; Li et al. [30]).
  • Inefficient distillation of the search space: Current research often relies on coarse data representations that lack deep refinement of manufacturing semantics (Zhang et al. [31]). This failure results in a redundant and unnecessarily large solution space. Consequently, optimizing the selection among multiple Optional Processes (Su et al. [32]) while maintaining 100% action feasibility remains a challenge.
In light of these gaps, the objective of this research is to develop a comprehensive intelligent MPRP framework that can simultaneously perceive multi-dimensional manufacturing semantics and strict engineering constraints. We aim to construct an environment model that preserves the intricate “feature–process–resource” coupling and design a hybrid neural architecture capable of capturing both global topological constraints and local sequence correlations.

1.4. Proposed Framework, Contributions, and Organization

To achieve the aforementioned objectives, this paper introduces a Hybrid Attention-augmented DRL framework that distinguishes itself from prior studies through several core innovations. Our approach establishes an Optional Process Attribute Adjacency Graph (OPAAG) that formally maps manufacturing constraints into a structured tensor space, providing a high-fidelity numerical representation of machining semantics. We further develop a Hybrid Attention mechanism that combines GAT and Transformer layers to extract multi-scale features, where GAT layers perceive spatial benchmark dependencies and the Transformer encoder captures sequential resource correlations within variable-length machining chains. To ensure strict engineering feasibility, a dynamic action masking mechanism is integrated to guarantee a 100% constraint satisfaction rate (CSR) during both training and real-time inference. Extensive experimental results demonstrate that the proposed method outperforms state-of-the-art heuristic and DRL methods in cost optimization, convergence speed, and stability.
The remainder of this paper is organized as follows: Section 2 formalizes the MPRP problem and its symbolic representation; Section 3 details the OPAAG construction and defines the DRL agent and the state space; Section 4 presents the proposed hybrid network architecture; Section 5 discusses the experimental results and comparative analysis; and Section 6 concludes the paper.

2. Problem Formulation and Overall Framework

This section establishes a rigorous mathematical foundation for intelligent MPRP. First, the MPRP problem is formalized as a Markov Decision Process (MDP). Second, the graph representation model of the part and the symbolic system for optional operation sets are defined in detail. Finally, the integrated “train-application” dual-stage architecture, which incorporates spatial graph encoding and sequential optimization, is introduced.

2.1. Problem Formalization as MDP

The generation and optimization of machining process routes are essentially complex sequential decision-making problems under multi-dimensional constraints [33,34,35]. To achieve autonomous decision-making, the MPRP problem is formalized as a Markov Decision Process (MDP) characterized by the quintuple $(S, A, P, R, \gamma)$, i.e., the state space, action space, transition function, reward function, and discount factor. At each time step $t$, the agent perceives the system state $s_t$ and executes an action $a_t$ based on the policy $\pi_\theta$. The state $s_t$ is defined as a fusion of static structural attributes $s$ and the dynamic process status $d_t$.
The transition $s_t \to s_{t+1}$ reflects the iterative update of the machining schedule, as well as the changes in machine and tool status after a specific operation is selected.

2.2. Symbolic Representation of Part and Operations

To enable deep neural reasoning, the part is structured as an Optional Process Attribute Adjacency Graph (OPAAG), denoted as $G = (V, E)$. The vertex set $V$ represents individual machining features, and the edge set $E$ captures directed spatial benchmark (datum) dependencies based on the Datum Principle. Each machining feature $f_i \in V$ is associated with an Optional Process Set $O_i = \{o_{i,1}, o_{i,2}, \ldots, o_{i,k}\}$. A specific operation $o_{i,j} \in O_i$ is represented as a multi-dimensional vector:
$$o_{i,j} = [\mathit{Process}, \mathit{TAD}, \mathit{Tool}, \mathit{Machine}, \mathit{Cost}]$$
where $i$ is the feature index, $j$ is the process index within the current feature, $\mathit{Process}$ denotes the machining step type, $\mathit{TAD}$ is the tool approach direction, $\mathit{Tool}$ and $\mathit{Machine}$ are the required manufacturing resources, and $\mathit{Cost}$ represents the estimated execution cost. This vectorization ensures that the agent can perceive the multi-dimensional coupling of “feature-process-resource” while maintaining full compatibility with the structural representation $G$.
The OPAAG structure, including its topological edges and node attributes, is directly mapped into the MDP state space through a tensorization process detailed in Section 3.2.1.
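As an illustration of the symbolic representation above, the following is a minimal Python sketch of an OPAAG container. It is not the authors' implementation: the class and method names (`Operation`, `OPAAG`, `add_datum_edge`, `datums_of`), the string identifiers for tools and machines, and the convention that an edge points from the datum feature to the dependent feature are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Operation:
    """One optional operation o_{i,j} = [Process, TAD, Tool, Machine, Cost]."""
    process: str   # machining step type, e.g. "drilling"
    tad: str       # tool approach direction, e.g. "+Z"
    tool: str      # tool identifier (hypothetical naming)
    machine: str   # machine identifier (hypothetical naming)
    cost: float    # estimated execution cost


@dataclass
class OPAAG:
    """Features as vertices, datum dependencies as directed edges."""
    options: dict = field(default_factory=dict)  # feature id -> list of Operation
    edges: set = field(default_factory=set)      # (datum, dependent) pairs (assumed direction)

    def add_feature(self, fid, operations):
        self.options[fid] = list(operations)

    def add_datum_edge(self, datum, dependent):
        if datum not in self.options or dependent not in self.options:
            raise KeyError("both features must be added before linking them")
        self.edges.add((datum, dependent))

    def datums_of(self, fid):
        """Features that must be machined before `fid`."""
        return sorted(d for d, dep in self.edges if dep == fid)


# Toy part: a hole whose datum is a plane (values are illustrative only).
g = OPAAG()
g.add_feature("plane_1", [Operation("rough_milling", "+Z", "T01", "M1", 4.0)])
g.add_feature("hole_1", [
    Operation("drilling", "+Z", "T05", "M1", 2.5),
    Operation("drilling", "+Z", "T06", "M2", 3.0),
])
g.add_datum_edge("plane_1", "hole_1")
```

Here `g.options["hole_1"]` plays the role of the Optional Process Set $O_i$, and `g.datums_of("hole_1")` lists the features that the Datum Principle requires to be machined first.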

2.3. Architecture of the Hybrid Attention-Augmented DRL Framework

The proposed framework, as shown in Figure 1, adopts a “train-application” dual-stage architecture to balance offline learning efficiency with online responsiveness. This design enables the agent to internalize complex machining logic during learning and to deploy it for autonomous planning in real time.

2.3.1. Training Stage

The training stage facilitates iterative parameter optimization through intensive agent–environment interaction. During the initial information modeling process, the part’s model-based definition (MBD) model is analyzed via the AAGNet algorithm [36] to extract geometric parameters and machining constraints, which are then used to construct the OPAAG. The agent perceives the complex system state $s_t$ via a Transformer-GAT augmented PPO framework. Specifically, a two-layer Graph Attention Network (GAT) encodes spatial benchmark (datum) dependencies from the OPAAG structure. Simultaneously, the Transformer encoder captures long-range sequential correlations within variable-length operation sequences, ensuring resource stability and decision coherence. By executing actions $a_t$ (selections of specific $o_{i,j}$) filtered by a dynamic masking mechanism and receiving multi-objective rewards $r_t$, the agent iteratively refines its policy $\pi_\theta$ and value estimate $V_\omega$. This feedback loop continues until the policy converges toward an optimal mapping between part topologies and cost-efficient routes.

2.3.2. Application Stage

Upon transitioning to the application phase, the matured model autonomously generates process routes for novel parts without requiring further parameter modifications. The process begins with a consistent pre-treatment of the new part to ensure structural compatibility, after which its OPAAG representation is fed into the inference engine. The agent then identifies the optimal action sequence through a series of sequential decisions, leveraging its internalized knowledge of engineering constraints and resource optimization. The decision process terminates when the machining schedule for all features reaches unity (all features processed), ultimately outputting a complete, constraint-compliant, and cost-optimal route that is ready for manufacturing execution.

3. DRL Agent Construction

3.1. Construction of Optional Process Attribute Adjacency Graph

The OPAAG serves as the pivotal bridge connecting raw CAD data with deep learning inference [37]. The construction process transforms unstructured part models into structured graph models enriched with comprehensive machining semantics through the following four stages.

3.1.1. Feature Recognition and Attribute Extraction

The initialization of the OPAAG begins with the automated parsing of the part’s MBD model via the AAGNet intelligent feature recognition algorithm [36]. Utilizing a Graph Convolutional Network (GCN) architecture, AAGNet learns the intrinsic topological connectivity of B-rep faces to identify typical machining features (e.g., holes, slots, planes). For each identified feature $f_i$, a multi-dimensional attribute set, including geometry (dimensions, coordinates), precision ($IT$, $Ra$), and engineering constraints ($\mathit{TAD}$, allowance), is extracted and encapsulated into a node $v_i \in V$, as summarized in Table 1. Simultaneously, based on the Datum Principle, the algorithm identifies the locating reference for each feature and constructs directed edges $e_{ij} \in E$. These edges are explicitly labeled as “Datum Dependent,” forming the foundational topology of the graph and representing the hard precedence constraints required for sequential decision-making, as shown in Figure 2.

3.1.2. Process Chain Matching

As shown in Figure 3, following feature identification, the system performs Process Chain Matching to correlate each node $v_i$ with a feasible sequence of machining steps. The matching logic is driven by the feature type and precision requirements extracted in the previous stage. As detailed in Table 2, the system retrieves typical process routes (e.g., “Drilling → Expanding → Reaming” for high-precision holes) from the database to form the process chain $L_i$. To ensure manufacturing integrity, the selection strictly adheres to the hierarchical constraint: “Rough Machining → Semi-finish Machining → Finish Machining.” This sequential logic ensures progressive material removal and minimizes thermal deformation, providing the categorical basis for the $\mathit{Process}$ attribute within the operation vector.

3.1.3. Processing Resource Matching

Processing resource matching transforms the abstract process steps into executable operations by assigning physical tools ($\mathit{Tool}$) and machine tools ($\mathit{Machine}$) from the digitized repositories summarized in Table 3. This procedure populates the specific values for the optional operation vector $o_{i,j} = [\mathit{Process}, \mathit{TAD}, \mathit{Tool}, \mathit{Machine}, \mathit{Cost}]$. Valid matching is determined by aligning feature requirements with resource capabilities, ensuring that the tool geometry fits the feature accessibility and the machine repeatability satisfies the target $IT$ grade. For each feature $f_i$, the collection of all valid resource–process combinations forms the Optional Operation Set $O_i$, which defines the action space boundaries for the DRL agent.

3.1.4. Processing Cost Calculation

The quantitative evaluation of the $\mathit{Cost}$ attribute provides the objective basis for the DRL reward function. For a specific operation $o_{i,j}$, the effective cutting time $T_c$ is calculated based on the material removal volume $V_{mr}$ and the Material Removal Rate (MRR) [38]:
$$\mathit{MRR} = v_c \times f_z \times Z \times a_p$$
$$T_c = \frac{V_{mr}}{\mathit{MRR}}$$
where $v_c$ is the cutting speed, $f_z$ is the feed per tooth, $Z$ is the number of teeth, and $a_p$ is the depth of cut. The machining cost is derived as $C_{i,j} = k_p \cdot T_c$. To ensure a uniform gradient signal for policy optimization, the raw costs are normalized to the interval $[0, 1]$ relative to the extrema within each set $O_i$:
$$C'_{i,j} = \frac{C_{i,j} - C_{min,i}}{C_{max,i} - C_{min,i}}$$
where $C'_{i,j}$ is the normalized cost, and $C_{max,i}$ and $C_{min,i}$ are the maximum and minimum costs within the optional operation set $O_i$ for feature $f_i$.
This normalized cost ensures that the agent can effectively distinguish between various resource combinations during the training process. Processing resource matching, processing cost calculation, and the final synthesized OPAAG are shown in Figure 4.
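The cost pipeline of this subsection (MRR, cutting time, raw cost, min–max normalization) can be traced with a short Python sketch. The function names, the numeric inputs, the value of the cost coefficient `k_p`, and the handling of a set with identical costs (all mapped to 0) are assumptions for illustration; the formulas follow the expressions stated in the text.

```python
def material_removal_rate(v_c, f_z, z, a_p):
    """MRR = v_c * f_z * Z * a_p, as written in the text."""
    return v_c * f_z * z * a_p


def cutting_time(v_mr, mrr):
    """T_c = V_mr / MRR."""
    return v_mr / mrr


def normalize_costs(costs):
    """Min-max normalize the raw costs of one optional operation set O_i."""
    lo, hi = min(costs), max(costs)
    if hi == lo:
        # Assumption: a set with a single cost level maps to 0.0 everywhere.
        return [0.0 for _ in costs]
    return [(c - lo) / (hi - lo) for c in costs]


# Illustrative numbers only; units are not taken from the paper.
k_p = 2.0      # hypothetical cost coefficient in C = k_p * T_c
v_mr = 6000.0  # material removal volume for the feature
candidates = [
    (100.0, 0.5, 2, 1.0),  # (v_c, f_z, Z, a_p) -> MRR 100, T_c 60, cost 120
    (100.0, 0.5, 4, 1.0),  # MRR 200, T_c 30, cost 60
    (100.0, 0.5, 4, 1.5),  # MRR 300, T_c 20, cost 40
]
raw_costs = [k_p * cutting_time(v_mr, material_removal_rate(*c)) for c in candidates]
norm_costs = normalize_costs(raw_costs)
```

With these inputs the three candidate operations receive normalized costs of 1.0, 0.25, and 0.0, so the cheapest option in the set always sits at 0 and the most expensive at 1.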

3.2. Intelligent DRL Model

3.2.1. State Definition

The state $s_t$ constitutes the perceptual foundation of the reinforcement learning environment and is formalized as a high-dimensional fusion of static attributes $s$ and dynamic attributes $d_t$, denoted as $s_t = [s \,\|\, d_t]$, as shown in Figure 5. To bridge the gap between symbolic process graphs and neural network architectures, the environment state is formally mapped into a structured three-dimensional tensor with dimensions $|V| \times |O| \times 55$. In this representation, $|V|$ denotes the total number of machining features, $|O|$ represents the maximum number of candidate process units allowed per feature, and 55 is the feature dimension of each process operation.
This study establishes an explicit mapping from the OPAAG $G = (V, E)$ to the state tensor by directly transforming the OPAAG edge information into the edge information of the state space. This topological constraint is expressed as a static binary adjacency matrix $A \in \{0,1\}^{|V| \times |V|}$, defined as:
$$A_{ij} = \begin{cases} 1, & \text{if } e_{ij} \in E \\ 0, & \text{otherwise} \end{cases}$$
where $A_{ij} = 1$ expresses a manufacturing precedence constraint between features $i$ and $j$, such as the datum-first principle. This matrix provides a stable structural foundation for the agent to perceive topological dependencies during the planning process.
Concurrently, the node information of the state space is constructed by concatenating the static attribute matrix derived from each feature node in the OPAAG with its corresponding dynamic information matrix. A core contribution of this representation is that a 3-dimensional dynamic feature vector is attached separately to each candidate operation $o$ under each feature $v$ within the state tensor. In the 55-dimensional feature vector of each process unit, the first 52 dimensions consist of static manufacturing semantics: a 16-dimensional one-hot encoding of the process, which distinguishes machining methods such as milling, drilling, and reaming; an 8-dimensional vector for the tool approach direction, which captures spatial orientation constraints; a 24-dimensional one-hot vector for the cutting tool required to perform the operation; a 3-dimensional vector identifying the available machine; and a 1-dimensional normalized scalar measuring the basic processing cost of the operation (16 + 8 + 24 + 3 + 1 = 52).
The remaining 3 dynamic dimensions characterize the information that evolves with each discrete time step. The mask vector $M_t$ provides a 1-dimensional binary identifier for each operation to filter the search space and enforce feasibility. For instance, if a datum feature remains unmachined, all operations of its dependent features are marked as 0; once an operation meets the precedence and process constraints, its position is marked as 1. The process schedule $P_t$ uses a 1-dimensional fractional scalar to quantify the completion degree of the machining chain belonging to that feature. For example, a feature with a three-step process chain has its progress updated to 2/3 ≈ 0.67 upon completing two steps, and the decision process terminates only when all progress values reach unity. The action identifier $L_t$ provides a 1-dimensional binary identifier that records the action performed in the preceding decision step. This identifier is essential for calculating non-cutting costs such as tool changes and clamping adjustments, where additional cost penalties are triggered if the current operation requires resources different from those recorded. This tensorized characterization preserves the global topological associations of the OPAAG while converting symbolic manufacturing information into a numeric format for subsequent feature extraction.
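To make the 55-dimensional layout concrete, the sketch below assembles one operation vector, pads a feature's operations to $|O|$ rows, and builds the binary adjacency matrix. It is a sketch under stated assumptions rather than the authors' code: the helper names are hypothetical, the TAD and machine blocks are treated as one-hot vectors (the text only gives their widths), and unused operation slots are zero-padded.

```python
N_PROCESS, N_TAD, N_TOOL, N_MACHINE = 16, 8, 24, 3
STATE_DIM = N_PROCESS + N_TAD + N_TOOL + N_MACHINE + 1 + 3  # 52 static + 3 dynamic = 55


def one_hot(index, size):
    if not 0 <= index < size:
        raise ValueError(f"index {index} outside one-hot range {size}")
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def encode_operation(process, tad, tool, machine, cost, mask, progress, last):
    """Return the 55-dim vector: static semantics followed by [mask, progress, last]."""
    static = (one_hot(process, N_PROCESS) + one_hot(tad, N_TAD)
              + one_hot(tool, N_TOOL) + one_hot(machine, N_MACHINE) + [float(cost)])
    dynamic = [float(mask), float(progress), float(last)]
    return static + dynamic


def pad_operations(encoded_ops, max_ops):
    """Pad one feature's operations to max_ops rows (zero padding is an assumption)."""
    if len(encoded_ops) > max_ops:
        raise ValueError("feature has more operations than max_ops")
    return encoded_ops + [[0.0] * STATE_DIM] * (max_ops - len(encoded_ops))


def adjacency(n_features, edges):
    """A[i][j] = 1 if the directed datum edge e_ij exists, else 0."""
    a = [[0] * n_features for _ in range(n_features)]
    for i, j in edges:
        a[i][j] = 1
    return a


vec = encode_operation(process=2, tad=0, tool=5, machine=1,
                       cost=0.25, mask=1, progress=0.5, last=0)
feature_rows = pad_operations([vec], max_ops=3)   # one slice of the |V| x |O| x 55 tensor
A = adjacency(3, [(0, 1), (0, 2)])
```

Stacking `pad_operations(...)` over all features yields the $|V| \times |O| \times 55$ tensor described above, with `A` carried alongside it as the static edge information.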

3.2.2. Action Definition and Masking Mechanism

The action $a^{(t)}$ is defined as the discrete selection of a single machining operation from the global operation set, serving as the core output of the reinforcement learning agent at each time step $t$. This action is represented in a one-hot vector format, $a^{(t)} \in \{0,1\}^{V \times K \times M}$, where the vector dimensions correspond directly to the structural hierarchy of the machining process. Specifically, the first dimension $V$ represents the total number of machining features, indicating which feature is selected for processing; the second dimension $K$ corresponds to the length of the process chain for a single feature, specifying the exact process step to be executed; and the third dimension $M$ denotes the number of available tool–machine tool combinations for that particular process step. Within this vector, only the element corresponding to the selected machining operation $o_{i,j}$ is marked as 1, while all remaining elements are 0. For instance, if a specific index in the vector is activated, it maps to a unique combination of a feature, its current process stage, and the assigned physical resources.
This one-hot characterization is designed to work together with the mask vector $M^{(t)}$ to ensure 100% feasibility of the generated process routes. Because of the strict precedence and resource constraints in manufacturing (such as the requirement that a datum feature must be machined before its dependent features, or the prohibition against repeating a completed process), invalid actions are filtered out before the agent performs sampling. By performing an element-wise multiplication between the neural network’s output probability distribution and the mask vector $M^{(t)}$, the probability of selecting an operation that violates engineering constraints is reduced to zero. This masking mechanism ensures that the agent only makes decisions within the feasible search space, which not only prevents the generation of invalid process sequences but also helps the network output stable probability distributions and sample efficiently.
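The action indexing and multiplicative masking described above can be sketched as follows. The row-major flattening of the $V \times K \times M$ index and the renormalization step after masking are assumptions of this example (the text states only the element-wise product); the function names are hypothetical.

```python
def flat_index(i, j, m, k_steps, m_resources):
    """Map (feature i, step j, resource option m) to a position in the one-hot action."""
    return (i * k_steps + j) * m_resources + m


def unflatten(index, k_steps, m_resources):
    """Inverse of flat_index."""
    rest, m = divmod(index, m_resources)
    i, j = divmod(rest, k_steps)
    return i, j, m


def mask_probabilities(probs, mask):
    """Element-wise product with the binary mask, then renormalize (assumption)."""
    masked = [p * b for p, b in zip(probs, mask)]
    total = sum(masked)
    if total == 0:
        raise ValueError("no feasible action under the current mask")
    return [p / total for p in masked]


idx = flat_index(1, 2, 0, k_steps=3, m_resources=2)
probs = mask_probabilities([0.5, 0.3, 0.2], [1, 0, 1])
```

After masking, the infeasible second action has probability exactly zero, and the remaining mass is shared between the two feasible actions in their original ratio.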

3.2.3. State Transition and Termination Conditions

The state transition process in this framework follows the formal rule $s_{t+1} = T(s_t, a^{(t)})$, where the transition function $T$ acts exclusively upon the dynamic attribute $d^{(t)}$, while the static attribute $s$ remains invariant because it represents inherent part characteristics. When the agent executes a specific action $a^{(t)}$, the system first identifies the corresponding machining operation $o_{i,j}$, which represents the $j$-th process step of feature $f_i$. Based on this execution, the three components of the dynamic attribute are updated as follows. Within the mask vector $M^{(t+1)}$, all operation masks associated with the completed process step $j$ of feature $f_i$ are set to 0 to prevent redundancy, while the masks for the subsequent process step $j+1$ are set to 1 to enable progression. Simultaneously, any dependent feature operations that fail to meet their updated datum requirements (for example, features relying on $f_i$ when $f_i$ has not yet reached the necessary stage) remain masked as 0.
The update of the process schedule $P^{(t+1)}$ ensures that the agent can accurately perceive the completion degree of the manufacturing task. For the selected feature $f_i$, its progress value is incremented as $P_i^{(t+1)} = P_i^{(t)} + 1/K_i$, where $K_i$ denotes the total number of process steps in the feature’s specific process chain $L_i$. This linear update allows the state to reflect the real-time machining status of the entire part. Concurrently, the previous operation identifier $L^{(t+1)}$ is updated by setting the position corresponding to the current action $a^{(t)}$ to 1 and all other positions to 0. This identifier serves as the reference for calculating tool change and clamping costs in the subsequent time step $t+1$. If an invalid action is selected, the state transition function performs no update, resulting in $s_{t+1} = s_t$. The decision-making episode terminates when the process schedule for all features reaches unity ($P_i = 1$), signifying that the complete process route has been generated and all manufacturing requirements have been satisfied. The logic of state updates and transitions is illustrated in Figure 6.
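The update rules of this subsection can be sketched as a small state class. This is a simplified illustration, not the authors' environment: an action selects only the feature (its next process step is implied and resource choice is omitted), a dependent feature is assumed to unlock only once its datum feature is fully machined, and the class and method names are hypothetical.

```python
class RouteState:
    """Dynamic part of the state: per-feature step counters and the last action."""

    def __init__(self, chain_lengths, datum_edges):
        self.k = list(chain_lengths)       # K_i: steps in each process chain L_i
        self.done = [0] * len(self.k)      # completed steps per feature
        self.edges = set(datum_edges)      # (datum, dependent) feature index pairs
        self.last = None                   # (feature, step) of the previous action

    def progress(self):
        """P_i = completed steps / K_i."""
        return [d / k for d, k in zip(self.done, self.k)]

    def feasible(self, i):
        if self.done[i] >= self.k[i]:
            return False                   # chain already finished
        # Assumption: every datum of feature i must be fully machined first.
        return all(self.done[d] == self.k[d] for d, dep in self.edges if dep == i)

    def mask(self):
        return [1 if self.feasible(i) else 0 for i in range(len(self.k))]

    def step(self, i):
        """Apply an action; an invalid action leaves the state unchanged."""
        if not self.feasible(i):
            return False
        self.last = (i, self.done[i])
        self.done[i] += 1                  # equivalent to P_i += 1 / K_i
        return True

    def terminal(self):
        return self.done == self.k


state = RouteState(chain_lengths=[1, 2], datum_edges=[(0, 1)])
initial_mask = state.mask()                # feature 1 is blocked by its datum
rejected = state.step(1)                   # invalid: state stays as it was
state.step(0)
mask_after_datum = state.mask()
state.step(1)
mid_progress = state.progress()
state.step(1)
```

Counting completed steps and dividing by $K_i$ on demand is numerically equivalent to the incremental update $P_i^{(t+1)} = P_i^{(t)} + 1/K_i$, while avoiding accumulated floating-point error in the termination check.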

3.2.4. Incentive Mechanism

The incentive mechanism within this framework is constructed as a two-layer reward structure, comprising an instant reward $r_t$ and a cumulative reward $r_c$, designed to guide the agent toward acquiring an optimal machining strategy. The total reward balances the immediate quality of individual actions with the overall goal of generating a complete and compliant process route.
The instant reward $r_t$ evaluates the action at each discrete time step, following a design logic that penalizes high-cost operations while rewarding actions that comply with machining constraints. It is formulated as a weighted combination of a cost penalty component and a compliance reward component:
$$r_t = \lambda_1 \cdot \left(1 - \frac{C_m + C_t + C_c}{C_{max}}\right) + \lambda_2 \cdot I_{valid}$$
In this equation, the cost penalty term seeks to minimize resource consumption by evaluating the machining cost $C_m$, the tool change cost $C_t$, and the clamping cost $C_c$. Since tool changes and clamping adjustments represent non-cutting time that significantly impacts productivity, they are explicitly integrated into the penalty. The term $C_{max}$ serves as a preset threshold to normalize the total cost within the $[0, 1]$ interval, ensuring that low-cost actions yield higher reward values. The compliance reward uses an indicator flag $I_{valid}$, where a value of 1 is assigned to valid actions that satisfy all constraints, and 0 is assigned to invalid actions. Tool change and clamping costs are treated as fixed penalties: $C_t = 0.3$ is triggered when the tool differs from the previous step, and $C_c = 0.5$ is applied when the machine tool or tool approach direction (TAD) changes. If resources remain unchanged, both $C_t$ and $C_c$ are set to 0.
The weights are set as $\lambda_1 = 0.7$ for cost and $\lambda_2 = 0.3$ for compliance to balance the relative scales of primary machining costs (which possess larger numerical magnitudes) and the compliance rewards. These values were determined through preliminary grid search experiments to ensure that the agent prioritizes cost optimization while maintaining strict adherence to engineering constraints. The fixed penalties $C_t = 0.3$ and $C_c = 0.5$ are assigned based on empirical industrial time–cost ratios, representing the typical non-cutting overhead for tool and setup changes. A brief sensitivity analysis indicated that the model performance remains stable within a $\pm 10\%$ fluctuation in these parameters, which supports the robustness of the reward design.
The cumulative reward $r_c$ provides a holistic evaluation of the decision-making sequence at the conclusion of each episode. It is designed to provide positive feedback for complete strategy generation; thus, if an episode terminates because all feature machining is successfully finished, the agent receives a higher cumulative reward. Conversely, if the episode ends prematurely due to the continuous selection of invalid actions, a negative reward is administered to discourage unproductive exploration. The combination of $r_t$ and $r_c$ ensures that the agent optimizes both local action-level efficiency and the global integrity of the process route.
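The instant reward can be written out as a short function. The weights and fixed penalties are the values stated in the text; the threshold `C_MAX = 1.8` (the largest possible sum of a normalized machining cost and both penalties), the dictionary layout of an operation, and the rule that the first step of an episode incurs no change penalty are assumptions for this sketch. The cumulative reward $r_c$ is not sketched because its magnitudes are not specified.

```python
LAMBDA_COST, LAMBDA_VALID = 0.7, 0.3   # lambda_1, lambda_2 from the text
TOOL_CHANGE, CLAMP_CHANGE = 0.3, 0.5   # C_t, C_c from the text
C_MAX = 1.8                            # assumption: 1.0 + 0.3 + 0.5


def instant_reward(c_m, op, prev_op, valid=True):
    """r_t = lambda_1 * (1 - (C_m + C_t + C_c) / C_max) + lambda_2 * I_valid."""
    c_t = c_c = 0.0
    if prev_op is not None:            # assumption: no change penalty on the first step
        if op["tool"] != prev_op["tool"]:
            c_t = TOOL_CHANGE
        if op["machine"] != prev_op["machine"] or op["tad"] != prev_op["tad"]:
            c_c = CLAMP_CHANGE
    cost_term = 1.0 - (c_m + c_t + c_c) / C_MAX
    return LAMBDA_COST * cost_term + LAMBDA_VALID * (1.0 if valid else 0.0)


prev = {"tool": "T01", "machine": "M1", "tad": "+Z"}
same = {"tool": "T01", "machine": "M1", "tad": "+Z"}
new_tool = {"tool": "T02", "machine": "M1", "tad": "+Z"}
all_new = {"tool": "T02", "machine": "M2", "tad": "+X"}

r_best = instant_reward(0.0, same, prev)        # cheapest valid action
r_tool = instant_reward(0.6, new_tool, prev)    # tool change only
r_worst = instant_reward(1.0, all_new, prev)    # most expensive valid action
```

Under these assumptions a valid action always earns between 0.3 and 1.0, so the compliance term sets a floor and the cost term ranks the feasible alternatives above it.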

4. Network Structure

The network architecture serves as the primary computational vehicle for achieving intelligent decision-making, specifically designed to adapt to the “graph-structured input” and “sequential process transitions” inherent in process planning. This study adopts the Proximal Policy Optimization (PPO) algorithm as the base framework. To effectively capture both topological dependencies (datum relationships between features) and temporal sequential information (process chain order), a Graph Attention Network (GAT) and a Transformer encoder are integrated as core feature extraction modules. The policy network $\pi_\theta$ and the value network $V_\omega$ share this joint feature extraction module, diverging only at their respective output layers to ensure a consistent interpretation of the environmental state. The detailed configuration of the network is shown in Figure 7.

4.1. Sequential Feature Encoding via Transformer

The core function of the Transformer encoder is to process the temporal data within the process chain $L_i$ of each feature. Traditional recurrent architectures often struggle to retain dependencies across variable-length sequences, whereas the Transformer captures internal dependencies directly through its self-attention mechanism [39,40].
Initially, the process chain of each feature is converted into a raw attribute vector $v_{i,j}^{raw}$ containing the type, cost, and resource requirements of each step. To eliminate scale differences, a Layer Normalization step is performed. The encoder then utilizes Query ($Q$), Key ($K$), and Value ($V$) matrices to compute attention weights, identifying the significance of each process step relative to others in the sequence. The scaled dot-product attention is formulated as:
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right) V$$
where $d_k$ is the dimension of the key vectors. The final output is a fixed-dimensional encoding vector that represents the variable-length process chain information:
$$\mathrm{Embedding}_{v_i} = \mathrm{Transformer}(\mathrm{LayerNorm}(v_i^{raw}))$$
After each node’s information is transformed into a fixed-dimensional encoding vector, it is combined with the edge information of the Optional Process Attribute Adjacency Graph, forming an intermediate graph-structured state that is passed to the subsequent GAT layers.
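A dependency-free sketch of the scaled dot-product attention above is given below, followed by one way of reducing a variable-length chain to a fixed-size vector. Two simplifications are assumptions of this example and not details from the paper: the step vectors are used directly as $Q$, $K$, and $V$ (no learned projections or multiple heads), and mean pooling over steps produces the fixed-dimensional embedding.

```python
import math


def softmax(xs):
    m = max(xs)                                   # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for row-major lists of vectors."""
    d_k = len(k[0])
    out = []
    for q_row in q:
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k)
                  for k_row in k]
        weights = softmax(scores)
        out.append([sum(w * v_row[c] for w, v_row in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def chain_embedding(steps):
    """Self-attention over a process chain, then mean pooling (pooling is an assumption)."""
    attended = attention(steps, steps, steps)
    n = len(attended)
    return [sum(row[c] for row in attended) / n for c in range(len(attended[0]))]


uniform = attention([[1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]])
short_chain = chain_embedding([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
long_chain = chain_embedding([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]])
```

Chains with two and three steps both map to a vector of the same width, which is the property the subsequent GAT layers rely on.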

4.2. Spatial Topology Aggregation via GAT

The GAT [41] layers are responsible for aggregating node association data within the adjacency graph G to strengthen the characterization of datum dependencies between features. Because the machining state of a reference feature directly dictates the feasibility of subsequent feature operations, the attention mechanism is utilized to adaptively assign weights to different adjacent nodes.
A two-layer GAT structure is implemented for feature aggregation, calculating the attention weight $\alpha_{ij}$ between a node $u_i$ and its neighbor $u_j$ via multi-head attention. This weight reflects the manufacturing impact of adjacent nodes on the current node. The feature aggregation formula is defined as:
$$h_i^{(l+1)} = \sigma\left(\sum_{j \in N_i} \alpha_{ij} W^{(l)} h_j^{(l)}\right)$$
where $h_i^{(l+1)}$ is the node feature output from the $(l+1)$-th GAT layer, $\sigma$ is the ReLU activation function, $N_i$ is the set of adjacent nodes for $u_i$, and $W^{(l)}$ is the weight matrix of the $l$-th layer.
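The aggregation rule can be sketched for a single attention head as follows. This is an illustrative sketch only: the attention scores follow the standard GAT formulation of [41] (a LeakyReLU over a learned vector applied to the concatenated projected features), the multi-head part is omitted, and the weights `W`, the vector `a`, the self-loops, and the toy graph are all hypothetical values rather than the paper's configuration.

```python
import math

def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_layer(H, neighbors, W, a):
    """Single-head GAT layer: h_i' = ReLU(sum_j alpha_ij * W h_j)."""
    Wh = [matvec(W, h) for h in H]
    H_next = []
    for i, nbrs in enumerate(neighbors):
        # Unnormalised scores e_ij = LeakyReLU(a^T [W h_i || W h_j]).
        e = [leaky_relu(sum(p * q for p, q in zip(a, Wh[i] + Wh[j])))
             for j in nbrs]
        top = max(e)
        exps = [math.exp(x - top) for x in e]
        total = sum(exps)
        alpha = [x / total for x in exps]  # softmax over the neighbourhood
        agg = [sum(alpha[k] * Wh[j][c] for k, j in enumerate(nbrs))
               for c in range(len(W))]
        H_next.append([max(0.0, x) for x in agg])  # ReLU
    return H_next

# Toy graph with 3 feature nodes; self-loops are included (an assumption).
H0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
W = [[1.0, 0.0], [0.0, 1.0]]   # identity weights, for readability only
a = [0.1, 0.2, 0.3, 0.4]       # hypothetical attention vector
H1 = gat_layer(H0, neighbors, W, a)
H2 = gat_layer(H1, neighbors, W, a)  # second layer, as in the two-layer design
```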

4.3. Decision Head and Masking Mechanism

The policy network outputs the probability distribution of actions based on current state features. Structural features from the GAT layer are converted into a one-dimensional vector through global average pooling and flattened via a fully connected layer. To ensure 100% feasibility of the output process routes, a masking mechanism is integrated to suppress invalid actions.
For operations violating engineering constraints (marked with a mask of 0), the corresponding dimension in the feature vector is set to a minimum value of $-10^{9}$, ensuring its probability remains near zero during Softmax calculation. The action probability distribution is expressed as:
$$P(a_t \mid s_t) = \mathrm{Softmax}\left(\mathrm{Flatten}(\mathrm{GAT}(s_t)) + (1 - M_t) \cdot (-\infty)\right)$$
where $P(a_t \mid s_t)$ is the probability of selecting action $a_t$ under state $s_t$, $\mathrm{Flatten}(\mathrm{GAT}(s_t))$ is the GAT output after pooling and flattening, and $M_t$ is the action mask at step $t$ (1 for feasible actions, 0 for infeasible ones); in practice, $-\infty$ is approximated by $-10^{9}$.
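A minimal sketch of the masking step is given below, assuming that precedence constraints are available as (predecessor, successor) pairs and that each action corresponds to one operation index. The function names `build_mask` and `masked_softmax` and the toy constraint set are hypothetical; only the large negative constant and the mask convention come from the description above.

```python
import math

NEG_LARGE = -1e9  # finite stand-in for -infinity, as described in the text

def build_mask(n_ops, done, precedence):
    """Return 1 for operations that are not yet done and whose predecessors
    are all done, and 0 otherwise. precedence holds (pred, succ) pairs."""
    return [int(op not in done and
                all(p in done for p, s in precedence if s == op))
            for op in range(n_ops)]

def masked_softmax(logits, mask):
    """Softmax after shifting masked-out logits (mask == 0) by NEG_LARGE.
    At least one action must be feasible."""
    shifted = [z + (1 - m) * NEG_LARGE for z, m in zip(logits, mask)]
    top = max(shifted)
    exps = [math.exp(z - top) for z in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# Toy state: operation 0 is finished; 1 and 3 depend on 0; 2 depends on 1.
precedence = [(0, 1), (1, 2), (0, 3)]
mask = build_mask(4, done={0}, precedence=precedence)
probs = masked_softmax([2.0, 1.0, 0.5, 3.0], mask)
```

Because the exponential of a logit near $-10^{9}$ underflows to zero, infeasible actions receive zero probability and can never be sampled, which is how the feasibility guarantee holds during both training and inference.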

4.4. PPO Learning Strategy

The value network evaluates the state-value $V_\omega(s_t)$, representing the expected cumulative reward of subsequent decisions, to provide a benchmark for policy updates. It shares the GAT extraction module with the policy network, utilizing two fully connected layers with ReLU activation for mapping to a one-dimensional output.
Training is conducted using the PPO-clip mechanism to achieve stable parameter updates [42]. To reduce variance, Generalized Advantage Estimation (GAE) is employed to calculate the advantage function $\hat{A}_t$. The policy network is optimized using a clip objective function to prevent excessively large policy shifts. Training terminates when the value network evaluation stabilizes (fluctuation $< 10^{-4}$) and the Constraint Satisfaction Rate (CSR) reaches 100%, indicating that the agent has learned to generate fully compliant and optimized process routes.
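The two ingredients named above, GAE and the clipped surrogate objective, can be sketched as follows. The discount `gamma`, the GAE parameter `lam`, and the clip range `eps` are placeholder values rather than the paper's tuned hyperparameters, and the function names are hypothetical.

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one finished episode.
    values has len(rewards) + 1 entries; the last one is the bootstrap
    value of the final state (0.0 if the episode terminated)."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A); this is maximised."""
    terms = []
    for r, adv in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        terms.append(min(r * adv, clipped * adv))
    return sum(terms) / len(terms)

# With gamma = lam = 1 and a zero value estimate, GAE reduces to reward-to-go.
adv = gae([1.0, 2.0, 3.0], [0.0, 0.0, 0.0, 0.0], gamma=1.0, lam=1.0)
gain = ppo_clip_objective([1.5], [1.0])    # ratio is clipped at 1 + eps
loss = ppo_clip_objective([0.5], [-1.0])   # ratio is clipped at 1 - eps
```

The clipping removes the incentive to move the probability ratio outside the interval $[1 - \epsilon, 1 + \epsilon]$, which is what limits the size of each policy update.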

5. Experimental Verification

In this section, the proposed attention-augmented DRL framework is evaluated through a series of experiments. The objective is to verify the model’s ability to generate valid and optimized process routes for complex prismatic parts and to demonstrate its performance superiority and robustness.

5.1. Experiment Setup

The implementation and training of the proposed model are conducted on a high-performance workstation. The hardware and software environments used to support the neural network training and the geometric feature processing are detailed in Table 4 below.
To ensure the high fidelity of the manufacturing environment, a comprehensive resource library is constructed, containing 3 sets of CNC machine tools with varying precision grades and 24 types of standardized cutting tools [43].
Following the methodology of previous work [37], the training and testing dataset was constructed by collecting 329 historical machining files of various parts from industrial manufacturing plants and laboratories. These cases encompass a wide spectrum of topological complexities, with the number of machining features ( | V | ) ranging from 7 to 25. The dataset is primarily composed of complex parts, which are foundational to the automotive, aerospace, and general machinery manufacturing sectors. As illustrated in Figure 8, the part geometries include housings, brackets, cylindrical bases, and intricate structural components that require multi-axis processing.
These parts are characterized by a wide variety of machining features and stringent engineering requirements, ensuring that the agent learns to process diverse manufacturing semantics effectively. Specifically, the dataset incorporates diverse feature elements such as planar surfaces, stepped holes, through slots, precision reamed bores, and complex pockets. Moreover, the cases exhibit varying densities of datum dependencies, which necessitate that the agent strictly adheres to fundamental manufacturing principles, such as datum-first and rough-to-finish, when orchestrating the machining sequences. The represented manufacturing scenarios cover three-axis and multi-face machining centers where non-cutting overhead, particularly tool changes and setup adjustments, significantly impacts overall productivity.
The dataset was randomly split into a training set (80%) and a held-out test set (20%). The results reported in Section 5.3 are evaluated on the unseen test set. The training process utilizes the PPO-clip algorithm. To achieve stable convergence and avoid local optima in the high-dimensional action space, the hyperparameters are carefully tuned based on preliminary sensitivity tests. These configurations are summarized in Table 5.
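For illustration, a reproducible 80/20 split over the 329 cases could look like the sketch below. The seed, the use of integer case identifiers, and the rounding of the 80% boundary are assumptions, since the text does not state how the split was drawn.

```python
import random

def split_dataset(case_ids, train_fraction=0.8, seed=0):
    """Shuffle case identifiers with a fixed seed and split them in two."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_fraction)  # rounds down (an assumption)
    return ids[:cut], ids[cut:]

train_ids, test_ids = split_dataset(range(329))
```

Under this rounding, the split gives 263 training cases and 66 held-out test cases.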

5.2. Training Convergence and Stability Analysis

This section evaluates the learning behavior and convergence properties of the proposed attention-augmented DRL agent across 3000 training episodes. To perform a comprehensive assessment, the training dataset is partitioned into three complexity levels based on feature quantity: the Simple group (centered around |V| ≈ 7), the Medium group (centered around |V| ≈ 15), and the Complex group (centered around |V| ≈ 25). Performance metrics were sampled every five episodes to capture fine-grained training dynamics among these groups.
Figure 9 illustrates the average cumulative reward curves for the three complexity categories. All levels exhibit a robust and steady upward trend during the initial training stage. Due to the integrated masking mechanism, the agent avoids the “sparse reward” challenge commonly encountered in standard RL, as it filters invalid actions from the outset. Consequently, it maintains a 100% Constraint Satisfaction Rate (CSR) throughout the entire training process for all groups. For the Simple group, the agent rapidly identifies optimal process sequences, reaching a stable plateau around episode 1500. As complexity increases, the Medium and Complex groups exhibit slower convergence with pronounced fluctuations. This suggests that the GAT and Transformer modules encounter greater challenges in capturing dense topological dependencies and orchestrating longer process–resource chains. Despite the increased difficulty, all groups reach stable convergence by episode 2200, proving the framework’s strong adaptability to parts with varying feature densities.
To assess learning stability and robust optimization capabilities, three representative parts—Part 1, Part 2, and Part 3—were selected for independent training sessions. These parts possess similar complexity, featuring approximately 12–15 machining features, and involve dense datum dependencies along with multi-resource couplings. Figure 10 shows the training convergence curves for these scenarios.
The consistent convergence behavior across different parts, characterized by synchronized stability and comparable final reward levels, verifies that the proposed Hybrid Attention-DRL model can dependably identify optimal or near-optimal process routes for tasks of similar complexity.
The internal training stability is further examined in Figure 11, which displays the evolution of network losses and policy entropy. The Value Loss ($L^{VF}$) exhibits an initial transient phase as the critic network learns to assess the state-value function for various part geometries, followed by a steady exponential decrease. The Policy Loss ($L^{CLIP}$) remains near the zero baseline, demonstrating that the PPO-clip mechanism effectively constrains policy updates within a stable trust region to prevent extreme divergence. Simultaneously, the entropy curve follows a characteristic non-linear decline, representing the agent’s transition from broad exploration across multiple part types to a concentrated, high-confidence decision-making policy.

5.3. Comparative Analysis with Baseline Algorithms

To evaluate the optimization efficiency and generalization capability of the proposed attention-augmented DRL framework, it is compared against five representative baseline algorithms: Genetic Algorithm (GA), Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO), Simulated Annealing (SA), and a Standard PPO model. All algorithms are evaluated using the same resource library and the multi-objective cost model.
To ensure a fair and persuasive comparison, the core parameters of the four heuristic algorithms (GA, ACO, PSO, and SA) were determined through a series of preliminary tuning experiments. Following the benchmark settings commonly used in similar Machining Process Route Planning (MPRP) optimization studies, the parameters were selected to ensure that each heuristic could reach a stable convergence state within the given problem scale (7–25 features). The specific configurations are summarized in Table 6.
The comparative study is conducted using the entire constructed dataset (329 machined parts) covering the full range of feature counts. This comprehensive evaluation ensures that the performance metrics reflect the algorithms’ capabilities across diverse topological complexities and datum dependency densities. Each algorithm was executed for 50 independent runs across the dataset to ensure statistical significance. The quantitative results, representing the average performance across all parts, are summarized in Table 7.
The results in Table 7 indicate that the proposed method achieves a comparative performance advantage over the entire dataset. While traditional heuristic algorithms (GA, ACO, PSO, SA) and the standard PPO model are capable of finding viable process routes, the proposed framework provides a modest reduction in total machining costs. Specifically, the proposed method achieves an average total cost of 422.3, representing an improvement of approximately 1.4% over the Standard PPO model and 3.8% over the best-performing heuristic baseline (ACO). This suggests that the integrated GAT and Transformer modules contribute to a more refined perception of resource coupling and topological constraints, leading to marginally better optimization of non-cutting costs ($C_t$ and $C_c$). In terms of feasibility, the proposed method maintains a robust 100% CSR over the full range of parts, demonstrating the reliability of the masking mechanism in handling diverse constraint densities. The most notable benefit, however, lies in the computational efficiency and stability. The inference speed of 0.09 s on the RTX 4090 GPU allows for nearly instantaneous process orchestration, and the standard deviation (9.2) is comparable to the Standard PPO model but lower than the heuristic methods, indicating a reliable performance across diverse part geometries.
To assess the significance of the performance gains, a one-tailed t-test was performed between the proposed framework and the strongest heuristic baseline (ACO). Across the 50 independent runs for all 329 parts, the proposed method achieved a significantly lower total cost ($p < 0.05$). This indicates that the performance improvements are statistically robust and not attributable to the stochastic nature of DRL training.
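A one-tailed comparison of this kind can be sketched as below. The text does not say whether the test was paired or independent, so the sketch assumes an independent (Welch) test; it also replaces the t distribution with a normal approximation, which is only reasonable for large samples. The cost values are synthetic placeholders, not results from Table 7, and `welch_one_tailed` is a hypothetical helper.

```python
import math
from statistics import NormalDist, mean, variance

def welch_one_tailed(a, b):
    """Test H1: mean(a) < mean(b).
    Returns the Welch t statistic and an approximate one-tailed p-value
    (normal approximation to the t distribution)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    return t, NormalDist().cdf(t)

# Synthetic costs for 50 runs of two methods (illustrative only).
costs_proposed = [100 + (i % 5) for i in range(50)]
costs_baseline = [105 + (i % 5) for i in range(50)]
t_stat, p_value = welch_one_tailed(costs_proposed, costs_baseline)
```

A strongly negative statistic with a small p-value supports the claim that the first method has the lower mean cost; comparing a sample with itself gives a statistic of zero and a p-value of 0.5.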
The overall cost distribution is visualized in Figure 12. The box plot illustrates that the performance ranges of the different algorithms overlap significantly, particularly between the DRL-based models and the best heuristic methods. However, the proposed method maintains a slightly lower median and a more compact distribution, confirming that the attention-augmented DRL approach offers a reliable and efficient option for process orchestration.
The scalability of all investigated algorithms with respect to increasing part complexity is illustrated in Figure 13. To maintain logical consistency with the categorical analysis in Section 5.2, the computation time is evaluated across three landmark complexity levels: Simple (|V| ≈ 7), Medium (|V| ≈ 15), and Complex (|V| ≈ 25).
As shown, the execution time for all four meta-heuristic baselines (GA, ACO, PSO, and SA) exhibits a near-exponential growth pattern as the search space expands with the number of machining features. For instance, ACO’s computation time escalates rapidly from approximately 4.2 s to over 130 s. In stark contrast, the proposed attention-augmented DRL framework displays a remarkably stable and flat inference trajectory, maintaining an average response time of 0.09 s across the entire complexity spectrum. This demonstrates that the integrated GAT and Transformer modules effectively internalize complex manufacturing constraints and dependencies within a fixed-time forward propagation, providing a highly scalable and real-time solution for intelligent process planning in large-scale industrial scenarios.
Furthermore, it is important to emphasize that while this study does not explicitly categorize constraint intensity into discrete gradients, the diverse dataset of 329 cases naturally spans a wide spectrum of manufacturing complexities—ranging from sparse topological dependencies to highly dense engineering constraints. The consistent achievement of a 100% Constraint Satisfaction Rate (CSR) across all test scenarios implicitly demonstrates the robustness of the dynamic masking mechanism and the hybrid attention architecture in handling varying constraint densities. Additionally, the stable performance maintained on the unseen test set (20% of the total cases), which comprises varied part geometries and feature combinations not present during training, serves as a testament to the model’s strong generalization capabilities and its potential for cross-scenario applicability in real-world industrial settings.

5.4. Ablation Study on Neural Components

To quantify the specific contribution of each architectural component to the overall performance, an ablation study was conducted. Four model variants were evaluated across the entire dataset:
  • Full Model (Proposed): The complete architecture incorporating GAT, Transformer, and the Masking mechanism.
  • No GAT: The GAT layer is replaced with a standard Multi-Layer Perceptron (MLP) for feature encoding, ignoring spatial topological dependencies among machining features.
  • No Transformer: The Transformer encoder is removed, relying solely on GAT-aggregated features for decision-making without explicit sequential resource inheritance modeling.
  • No Masking: The masking mechanism is disabled, allowing the agent to explore the entire action space, including invalid process steps that violate precedence constraints.
The performance metrics are summarized in Table 8.
Based on the results in Table 8, the contribution of each architectural component is analyzed in detail. The integration of the Graph Attention Network (GAT) is demonstrated to be pivotal for spatial process optimization; its removal (No GAT variant) results in a 19.1% increase in clamping costs ($C_c$), from 56.1 to 66.8. This underscores the GAT’s critical role in aggregating spatial benchmark dependencies, which allows the agent to effectively group features and minimize redundant setups.
Furthermore, the Transformer encoder is essential for temporal resource management. The ‘No Transformer’ variant leads to a 15.0% rise in tool change costs ($C_t$), increasing from 53.2 to 61.2. This validates the necessity of the self-attention mechanism in capturing long-range sequential correlations within variable-length machining chains for optimal tool inheritance. Notably, the Masking mechanism serves as the foundational guarantee for feasibility; without it (No Masking variant), the agent fails to identify a meaningful optimization gradient, and the Constraint Satisfaction Rate (CSR) plunges to 52.6%. These quantitative findings confirm the synergistic effect of the proposed hybrid attention-augmented architecture in achieving cost-efficient and 100% compliant process orchestration.
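The two percentages quoted for the ablation variants follow directly from the reported cost values, taking the Full Model as the reference:

$$\frac{66.8 - 56.1}{56.1} = \frac{10.7}{56.1} \approx 19.1\% \quad \text{(clamping cost, No GAT)}, \qquad \frac{61.2 - 53.2}{53.2} = \frac{8.0}{53.2} \approx 15.0\% \quad \text{(tool change cost, No Transformer)}$$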
The ablation results highlight the functional necessity of each component. As illustrated in Figure 14, the No Masking variant (red line) fails to identify any optimization gradient. Throughout the 3000 training episodes, its reward trajectory remains purely stochastic, wandering irregularly around a low reward level without any significant upward trend. This confirms that without the action-space pruning provided by the masking mechanism, the agent cannot effectively learn the complex precedence rules and resource dependencies, resulting in a drastically low CSR of 52.6%.
In contrast, the Full Model achieves the most efficient convergence and the highest cumulative reward. The No GAT variant exhibits increased setup costs ($C_c$), as it lacks spatial relational encoding to group features by common machining datums. The No Transformer variant shows a degradation in tool change performance ($C_t$), proving that sequential attention is essential for resource inheritance optimization. These results validate the synergistic effect of the proposed hybrid attention architecture in complex process orchestration.

5.5. Case Study

Figure 15 illustrates the comprehensive decision-making workflow generated for Part 1. The orchestration originates from the identification of initial datum features, which undergo a series of preprocessing treatments to be formally modeled as the initial state of the environment. This state is subsequently fed into the trained DRL framework to derive the final orchestration results. During the sequential decision-making process, a dynamic masking mechanism is utilized at each step to filter invalid actions that would violate the 31 precedence constraints, enabling the framework to navigate the high-dimensional search space and identify an optimal sequence. The resulting plan comprises 20 machining operations, successfully streamlining the execution to 12 tool changes and 5 clamping setups. This reduction in non-cutting overheads and auxiliary time directly validates the efficiency of the proposed method in large-scale industrial scenarios.
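How tool changes and clamping setups can be counted from an ordered route is sketched below. The counting convention is an assumption (the first tool load is not counted as a change, while the first fixture counts as one setup), and the five-operation route is a toy example rather than the plan reported for Part 1.

```python
def count_changes(route):
    """Count tool changes and clamping setups along an ordered route.
    Each operation is a (tool, setup) pair; consecutive operations that
    share a tool or a setup incur no additional change."""
    pairs = list(zip(route, route[1:]))
    tool_changes = sum(1 for prev, cur in pairs if prev[0] != cur[0])
    setup_switches = sum(1 for prev, cur in pairs if prev[1] != cur[1])
    setups = setup_switches + 1 if route else 0
    return tool_changes, setups

# Toy route: tool identifier and tool access direction per operation.
toy_route = [("T1", "+Z"), ("T1", "+Z"), ("T2", "+Z"),
             ("T2", "+X"), ("T3", "+X")]
tool_changes, setups = count_changes(toy_route)
```

Grouping operations that share a tool and an access direction, as the agent does in the case study, lowers both counts.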
The detailed process parameters, including feature sequences and resource allocations, are summarized in Table 9. An in-depth analysis of this plan reveals that the DRL agent has effectively internalized professional machining expertise. The agent adhered to the Datum-First strategy by prioritizing the machining of datum planes F1, F2, and F3 from Step 1 to Step 8, thereby establishing stable locating surfaces for subsequent high-precision features. Furthermore, for the IT8 precision hole F4, the model correctly sequenced the multi-step process chain involving drilling, expanding, and reaming in Step 15, Step 19, and Step 20, respectively, which confirms the adherence to the Rough-to-Finish principle. Additionally, the Transformer-based sequential encoding effectively optimized resource correlations by grouping features F6 through F14 for consecutive processing under a unified +X tool access direction and consistent tooling. This case study confirms that the proposed hybrid architecture not only achieves high computational efficiency but also guarantees the generation of process routes that are highly consistent with practical industrial requirements.

6. Conclusions

Machining process route planning (MPRP) plays a critical role in the transition toward autonomous manufacturing. This paper proposed an attention-augmented deep reinforcement learning (DRL) framework for intelligent process orchestration. The proposed approach synergistically leverages Graph Attention Networks (GAT) to perceive spatial benchmark dependencies between machining features and a Transformer-based encoder to model sequential resource inheritance within machining chains.
The main contributions of this study are as follows:
  • An Optional Process Attribute Adjacency Graph (OPAAG) was developed to formally model the complex “feature–process–resource–constraint” coupling relationships, enabling the agent to perceive engineering constraints effectively.
  • A dynamic action masking mechanism was integrated to ensure a 100% constraint satisfaction rate (CSR) during both training and inference stages, providing a robust solution for hard engineering constraints.
  • The hybrid attention-augmented network was comprehensively evaluated against traditional heuristics (e.g., GA, ACO) and standard DRL models, validating its performance advantages in complex process orchestration.
Experimental results demonstrate that the proposed method achieves an average total cost reduction of approximately 3.8% compared to the best-performing heuristic baseline (ACO). Furthermore, with an inference speed of approximately 0.09 s, the framework significantly outperforms iterative algorithms in efficiency and exhibits superior stability across diverse part geometries.
While this research primarily focuses on optimizing resource-related and time-based costs, the minimization of non-cutting overheads directly benefits manufacturing quality. Specifically, the agent’s capability to minimize clamping setups and tool access direction changes indirectly ensures the consistency of machining precision by reducing cumulative setup errors. In future work, the expansion of this architecture to incorporate more sophisticated machining mechanism constraints and direct quality indicators, such as surface roughness, will be explored. Furthermore, multi-machine collaborative scheduling and dynamic resource environments will be investigated to further enhance its versatility in complex shop-floor scenarios [44,45].

Author Contributions

Conceptualization, R.W.; methodology, R.W.; software, R.W.; validation, R.W.; formal analysis, R.W.; investigation, Z.D.; resources, R.W.; data curation, X.D.; writing—original draft preparation, R.W.; writing—review and editing, Y.P.; visualization, R.W.; supervision, M.W.; project administration, Y.P.; funding acquisition, Y.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was sponsored by the National Key Research and Development Program of China (No. 2022YFB3304100).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Adapa, S.K.; Jagadish. An exhaustive review on intelligent computer-aided process planning in context with various optimisation techniques. Int. J. Mater. Prod. Technol. 2023, 66, 209–231.
  2. Nasir, V.; Sassani, F. A review on deep learning in machining and tool monitoring: Methods, opportunities, and challenges. Int. J. Adv. Manuf. Technol. 2021, 115, 2683–2709.
  3. Li, H.; Zhang, H.; He, Z.; Jia, Y.; Jiang, B.; Huang, X.; Ge, D. Solving integrated process planning and scheduling problem via graph neural network based deep reinforcement learning. arXiv 2024, arXiv:2409.00968.
  4. Li, Y.; Zhou, T. Research on Intelligent Planning Method for Turning Machining Process Based on Knowledge Base. Machines 2025, 13, 417.
  5. Waiyagan, K.; Bohez, E.L.J. Intelligent feature-based process planning for five-axis mill-turn parts. Comput. Ind. 2009, 60, 296–316.
  6. Long, A.; Huang, S.; Tian, Y.; Zhang, Y.; Yu, L.; Chen, Z. An automatic construction and intelligent retrieval method for knowledge graph for machining process specifications. Int. J. Comput. Integr. Manuf. 2025, 1–22.
  7. Hernandes, L.C.; Szejka, A.L.; Mas, F. Intelligent product manufacturing cost estimation framework driven by semantic technologies and knowledge-based systems. Int. J. Comput. Integr. Manuf. 2025, 1–22.
  8. Hua, G.; Zhou, X.; Ruan, X. GA-based synthesis approach for machining scheme selection and operation sequencing optimization for prismatic parts. Int. J. Adv. Manuf. Technol. 2007, 33, 594–603.
  9. Liu, X.; Yi, H.; Ni, Z. Application of ant colony optimization algorithm in process planning optimization. J. Intell. Manuf. 2013, 24, 1–13.
  10. Shirzadi, S.; Tavakkoli-Moghaddam, R.; Kia, R.; Mohammadi, M. A multi-objective imperialist competitive algorithm for integrating intra-cell layout and processing route reliability in a cellular manufacturing system. Int. J. Comput. Integr. Manuf. 2017, 30, 839–855.
  11. Wang, W.; Li, Y.; Huang, L. Rule and branch-and-bound algorithm based sequencing of machining features for process planning of complex parts. J. Intell. Manuf. 2018, 29, 1329–1336.
  12. Jing, X.; Zhu, Y.; Liu, J.; Zhou, H.; Zhao, P.; Liu, X.; Li, Q. Intelligent generation method of 3D machining process based on process knowledge. Int. J. Comput. Integr. Manuf. 2020, 33, 38–61.
  13. Liu, Q.; Wang, C.; Li, X.; Gao, L. An improved genetic algorithm with modified critical path-based searching for integrated process planning and scheduling problem considering automated guided vehicle transportation task. J. Manuf. Syst. 2023, 70, 127–146.
  14. Peng, H.; Wang, H.; Chen, D. Optimization of remanufacturing process routes oriented toward eco-efficiency. Front. Mech. Eng. 2019, 14, 422–433.
  15. Ding, S.; Guo, Z.; Wang, B.; Wang, H.; Ma, F. MBD-Based Machining Feature Recognition and Process Route Optimization. Machines 2022, 10, 906.
  16. Leng, J.; Chen, Q.; Mao, N.; Jiang, P. Combining granular computing technique with deep learning for service planning under social manufacturing contexts. Knowl.-Based Syst. 2018, 143, 295–306.
  17. Lei, R.; Wu, H.; Peng, Y. MFPointNet: A Point Cloud-Based Neural Network Using Selective Downsampling Layer for Machining Feature Recognition. Machines 2022, 10, 1165.
  18. Wang, Z.; Zhang, S.; Zhang, H.; Zhang, Y.; Liang, J.; Huang, R.; Huang, B. Machining feature process route planning based on a graph convolutional neural network. Adv. Eng. Inform. 2024, 59, 102249.
  19. Du, K.; Yang, B.; Wang, S.; Chang, Y.; Li, S.; Yi, G. Relation extraction for manufacturing knowledge graphs based on feature fusion of attention mechanism and graph convolution network. Knowl.-Based Syst. 2022, 255, 109703.
  20. Panzer, M.; Bender, B. Deep reinforcement learning in production systems: A systematic literature review. Int. J. Prod. Res. 2022, 60, 4316–4341.
  21. Wu, W.; Huang, Z.; Zeng, J.; Fan, K. A fast decision-making method for process planning with dynamic machining resources via deep reinforcement learning. J. Manuf. Syst. 2021, 58, 392–411.
  22. Zhao, M.; Mo, L.; Liu, J.; Han, J.; Niu, D. GAT-based deep reinforcement learning algorithm for real-time task scheduling on multicore platform. In Proceedings of the 36th Chinese Control and Decision Conference (CCDC), Xi’an, China, 25–27 May 2024; pp. 5674–5679.
  23. Xiao, Q.; Niu, B.; Xue, B.; Hu, L. Graph convolutional reinforcement learning for advanced energy-aware process planning. IEEE Trans. Syst. Man Cybern. Syst. 2023, 53, 2802–2814.
  24. Zhang, H.; Wang, W.; Zhang, S.; Zhang, Y.; Zhou, J.; Wang, Z.; Huang, B.; Huang, R. A novel method based on deep reinforcement learning for machining process route planning. Robot. Comput. Integr. Manuf. 2024, 86, 102688.
  25. Zhang, H.; Wang, W.; Wang, Y.; Zhang, Y.; Zhou, J.; Huang, B.; Zhang, S. Employing deep reinforcement learning for machining process planning: An improved framework. J. Manuf. Syst. 2025, 78, 370–393.
  26. Li, C.; Chang, Q.; Fan, H. Multi-agent reinforcement learning for integrated manufacturing system-process control. J. Manuf. Syst. 2024, 76, 585–598.
  27. Zhang, N.; Liu, B.; Zhang, J. Dual Resource Scheduling Method of Production Equipment and Rail-Guided Vehicles Based on Proximal Policy Optimization Algorithm. Technologies 2025, 13, 573.
  28. Zhu, G.; Wang, S.; Wang, L. Heterogeneous graph neural network for modeling intelligent manufacturing systems. Meas. Sci. Technol. 2024, 36, 015114.
  29. Kwon, O.R.; Lee, G.T. A predictive model based on Transformer with statistical feature embedding in manufacturing sensor dataset. Int. J. Comput. Integr. Manuf. 2025, 1–16.
  30. Li, W.; Nie, Y.; Yang, F. Multi-Variable Transformer-Based Meta-Learning for Few-Shot Fault Diagnosis of Large-Scale Systems. Sensors 2025, 25, 2941.
  31. Zhang, C.; Wu, Y.; Ma, Y.; Song, W.; Le, Z.; Cao, Z.; Zhang, J. A review on learning to solve combinatorial optimisation problems in manufacturing. IET Collab. Intell. Manuf. 2023, 5, e12072.
  32. Su, C.; Jiang, Q.; Han, Y.; Wang, T.; He, Q.C. Knowledge graph-driven decision support for manufacturing process: A graph neural network-based knowledge reasoning approach. Adv. Eng. Inform. 2025, 64, 103098.
  33. Barto, A.G. Reinforcement learning: Connections, surprises, and challenge. AI Mag. 2019, 40, 3–15.
  34. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Petersen, S.; Beattie, C.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
  35. Kwon, S.; Oh, Y. Optimal process planning for hybrid additive–subtractive manufacturing using recursive volume decomposition with decision criteria. J. Manuf. Syst. 2023, 71, 360–376.
  36. Wu, H.; Lei, R.; Peng, Y.; Gao, L. AAGNet: A graph neural network towards multi-task machining feature recognition. Robot. Comput. Integr. Manuf. 2024, 86, 102661.
  37. Zhang, L.; Wang, X.; Wu, H.; Peng, Y. A novel approach to part process route generation based on graph neural network encoding. Int. J. Comput. Integr. Manuf. 2025, 1–19.
  38. Huang, B.; Zhang, S.; Huang, R.; Li, X.; Zhang, Y.; Liang, J.C. An effective numerical control machining process optimization approach of part with complex pockets for numerical control process reuse. IEEE Access 2019, 7, 45146–45165.
  39. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems 30 (NIPS 2017), Proceedings of the 31st Conference on Neural Information Processing System, Long Beach, CA, USA, 4–9 December 2017; NIPS: Red Hook, NY, USA, 2017; pp. 5998–6008.
  40. Zhang, J.; Zhang, H.; Xia, C.; Sun, L. Graph-BERT: Only attention is needed for learning graph representations. arXiv 2020, arXiv:2001.05140.
  41. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
  42. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
  43. Vespoli, S.; Mattera, G.; Marchesano, M.; Nele, L.; Guizzi, G. Adaptive manufacturing control with deep reinforcement learning for dynamic WIP management in Industry 4.0. Comput. Ind. Eng. 2025, 202, 110966.
  44. Huang, Z.; Shen, Y.; Li, J.; Fey, M.; Brecher, C. A Survey on AI-Driven Digital Twins in Industry 4.0: Smart Manufacturing and Advanced Robotics. Sensors 2021, 21, 6340.
  45. Ojstersek, R.; Brezocnik, M.; Buchmeister, B. Multi-objective optimization of production scheduling with evolutionary computation: A review. Int. J. Ind. Eng. Comput. 2020, 11, 359–376.
Figure 1. Proposed attention-augmented DRL framework for machining process route planning.
Figure 2. Description of feature recognition.
Figure 3. Description of process chain matching.
Figure 4. Description of process resource matching, process cost calculation and the final synthesized OPAAG.
Figure 5. Description of state definition.
Figure 6. Description of state transition.
Figure 7. Description of network structure.
Figure 8. Some parts of the dataset.
Figure 9. Learning curves of the agent on representative test part groups with different complexities.
Figure 10. Independent training sessions.
Figure 11. Evolution of training losses and policy entropy.
Figure 12. Statistical comparison of cost optimization performance across the entire dataset.
Figure 13. Influence of part complexity on inference efficiency.
Figure 14. Comparison of convergence curves for different ablation model architectures.
Figure 15. Decision-making evolution and optimized machining workflow generated by the DRL framework for the 14-feature component.
Table 1. Detailed attribute representation of machining feature nodes extracted.

| Symbol | Physical Significance and Functional Role |
| --- | --- |
| $ID$ | A unique identifier assigned to each node $v_i$ in the graph. |
| $Size$ | Defines the basic dimensions (length, width, height) of the feature. |
| $(x, y, z)$ | Specifies the spatial coordinates of the feature center or reference point. |
| $TAD$ | Direction of cutting. |
| $IT$ | Represents the dimensional tolerance grade required for the feature. |
| $f$ | Specifies the form and position tolerance requirements. |
| $R_a$ | Denotes the required surface roughness of the machined surface. |
| $d_{rem}$ | Defines the material removal orientation or tool access direction. |
| $\delta$ | Represents the machining allowance for the initial feature state. |
| $Type$ | Classifies the feature into specific machining categories. |
Table 2. Typical machining process routes for plane and hole features.

| Category | Typical Machining Process Routes |
| --- | --- |
| Plane | Rough Milling |
| | Rough Milling → Finish Milling |
| | Rough Milling → Semi-finish Milling → Finish Milling |
| | Rough Milling → Grinding |
| | Rough Milling → Finish Milling → Grinding |
| | Rough Milling → Semi-finish Milling → Finish Milling → Grinding |
| Hole | Drilling |
| | Drilling → Reaming |
| | Drilling → Expanding → Reaming |
| | Drilling → Expanding → Boring → Fine Boring |
| | Drilling → Expanding → Internal Grinding |
| | Drilling → Expanding → Reaming → Internal Grinding |
| | Drilling → Boring → Internal Grinding → Honing |
| | Drilling → Expanding → Boring → Internal Grinding → Honing |
Table 3. Parameterized structure of the processing resource libraries.

| Category | Managed Parameter |
| --- | --- |
| Tool | Tool Model |
| | Cutting Speed ($v_c$) |
| | Feed per Tooth ($f_z$) |
| | Max Cutting Depth ($a_p$) |
| | Process Applicability |
| Machine | Stroke Limits ($X$, $Y$, $Z$) |
| | Max Spindle Speed ($n_{max}$) |
| | Repeatability |
| | Power Capacity |
Table 4. Experimental environment configuration.

| Component | Specification/Version |
| --- | --- |
| Operating System | Windows 11 |
| CPU | Intel Core i9-13900K @ 3.00 GHz |
| RAM | 32 GB |
| GPU | NVIDIA GeForce RTX 4090 (24 GB) |
| Programming Language | Python 3.8 |
| Deep Learning Framework | PyTorch 1.12.1 |
| Acceleration Library | CUDA 11.7 |
Table 5. Hyperparameter configuration for the PPO-clip training.

| Category | Hyperparameter | Value |
| --- | --- | --- |
| Optimization | Optimizer | Adam |
| | Learning Rate | $2 \times 10^{-4}$ |
| | Batch Size | 32 |
| PPO Core | Clip Coefficient ($\epsilon$) | 0.2 |
| | Discount Factor ($\gamma$) | 0.98 |
| | GAE Parameter ($\lambda$) | 0.95 |
| Network | Attention Heads | 4 |
| | Hidden Dimensions | 128 |
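To illustrate how the values in Table 5 enter a PPO-clip update, the following minimal sketch evaluates the clipped surrogate term and generalized advantage estimates (GAE) on a toy trajectory. It is pure Python rather than the PyTorch implementation used in the experiments, the names `PPOConfig`, `clipped_surrogate`, and `gae` are hypothetical, and the rewards and value estimates are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PPOConfig:
    # Values copied from Table 5; the field names are illustrative only.
    learning_rate: float = 2e-4
    batch_size: int = 32
    clip_eps: float = 0.2
    gamma: float = 0.98
    gae_lambda: float = 0.95
    attention_heads: int = 4
    hidden_dim: int = 128


def clipped_surrogate(ratio: float, advantage: float, eps: float) -> float:
    """PPO-clip term min(r * A, clip(r, 1 - eps, 1 + eps) * A) for one sample."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


def gae(rewards: List[float], values: List[float], gamma: float, lam: float) -> List[float]:
    """Generalized advantage estimates; `values` holds one extra bootstrap entry."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


cfg = PPOConfig()
# A probability ratio of 1.5 with a positive advantage is capped at 1 + eps.
capped = clipped_surrogate(1.5, 2.0, cfg.clip_eps)
# Toy trajectory (made-up numbers, not data from the paper).
adv = gae([-1.0, -1.0, 5.0], [0.0, 0.0, 0.0, 0.0], cfg.gamma, cfg.gae_lambda)
```

With $\epsilon = 0.2$, the clip keeps a single update from moving the policy ratio beyond the interval [0.8, 1.2] in the direction favored by the advantage.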
Table 6. Detailed parameter configurations for baseline heuristic algorithms.

| Algorithm | Key Parameter Settings |
| --- | --- |
| GA | Population: 100; Generations: 500; Crossover: 0.8; Mutation: 0.1 |
| ACO | Ants: 50; Iterations: 300; $\alpha = 1.0$; $\beta = 2.0$; Evaporation: 0.5 |
| PSO | Particles: 80; $c_1 = c_2 = 2.0$; Inertia weight: 0.9 → 0.4 |
| SA | Initial Temp: 1000; Cooling Rate: 0.95; Termination: 0.01 |
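As a rough guide to what the SA and PSO settings in Table 6 imply, the sketch below counts the cooling steps of a geometric schedule and evaluates a linearly decaying inertia weight. Both interpretations are assumptions: the table does not state that SA uses $T_{k+1} = 0.95\,T_k$ with 0.01 as the final temperature, nor that the PSO inertia weight decays linearly between 0.9 and 0.4. The function names and the `total_iters` argument are hypothetical.

```python
import math


def sa_cooling_steps(t_initial: float, cooling_rate: float, t_final: float) -> int:
    """Count geometric cooling steps until the temperature is at or below t_final."""
    steps = 0
    temperature = t_initial
    while temperature > t_final:
        temperature *= cooling_rate
        steps += 1
    return steps


def pso_inertia(iteration: int, total_iters: int,
                w_start: float = 0.9, w_end: float = 0.4) -> float:
    """Linearly decaying inertia weight (assumed reading of the Table 6 entry)."""
    frac = iteration / max(1, total_iters - 1)
    return w_start + (w_end - w_start) * frac


# SA settings from Table 6: initial temperature 1000, rate 0.95, termination 0.01.
steps = sa_cooling_steps(1000.0, 0.95, 0.01)
# Closed form of the same count: smallest k with 1000 * 0.95**k <= 0.01.
closed_form = math.ceil(math.log(0.01 / 1000.0) / math.log(0.95))
# Inertia weight at the first and last iteration of a hypothetical 100-iteration run.
w_first, w_last = pso_inertia(0, 100), pso_inertia(99, 100)
```

Under these assumptions the SA schedule spans 225 temperature levels, which gives a sense of the search budget behind the baseline inference times.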
Table 7. Overall performance comparison across the entire dataset.

| Algorithm | Avg. $C_{total}$ | Std. Dev | $C_m$ | $C_t$ | $C_c$ | CSR (%) | Inference Time (s) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GA | 445.2 | ±13.5 | 315.4 | 64.2 | 65.6 | 94.5 | 15.42 |
| ACO | 438.8 | ±15.2 | 314.8 | 62.5 | 61.5 | 95.2 | 22.85 |
| PSO | 448.6 | ±16.8 | 316.2 | 65.8 | 66.6 | 93.8 | 12.15 |
| SA | 452.4 | ±12.2 | 318.5 | 68.4 | 65.5 | 96.0 | 18.60 |
| Std. PPO | 428.5 | ±10.5 | 313.5 | 57.0 | 58.0 | 98.2 | 0.08 |
| Proposed | 422.3 | ±9.2 | 312.8 | 53.2 | 56.1 | 100.0 | 0.09 |
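The relative gaps behind Table 7 can be recomputed directly from its averages. The short sketch below does this arithmetic for the total cost and the inference time; the dictionary simply transcribes two columns of the table, and the helper name `relative_reduction` is hypothetical.

```python
# Average total cost and inference time (s), transcribed from Table 7.
TABLE7 = {
    "GA": (445.2, 15.42),
    "ACO": (438.8, 22.85),
    "PSO": (448.6, 12.15),
    "SA": (452.4, 18.60),
    "Std. PPO": (428.5, 0.08),
    "Proposed": (422.3, 0.09),
}


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


proposed_cost, proposed_time = TABLE7["Proposed"]
# For each baseline: (cost reduction in %, baseline time / proposed time).
summary = {
    name: (
        round(relative_reduction(cost, proposed_cost), 2),
        round(time_s / proposed_time, 1),
    )
    for name, (cost, time_s) in TABLE7.items()
    if name != "Proposed"
}
for name, (cost_cut, time_ratio) in summary.items():
    print(f"{name:9s} cost reduction {cost_cut:5.2f}%  time ratio {time_ratio:6.1f}x")
```

On these averages the proposed method's total cost is about 5.1% below GA and about 1.4% below standard PPO, while its inference time is two orders of magnitude shorter than any of the heuristic baselines and comparable to standard PPO.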
Table 8. Performance comparison of ablation model variants.

| Configuration | Avg. $C_{total}$ | $C_t$ | $C_c$ | CSR (%) | Convergence Episode |
| --- | --- | --- | --- | --- | --- |
| Full Model | 422.3 | 53.2 | 56.1 | 100.0 | ~2150 |
| No GAT | 439.5 | 58.4 | 66.8 | 100.0 | ~2550 |
| No Transformer | 431.8 | 61.2 | 58.6 | 100.0 | ~2300 |
| No Masking | 624.5 | 92.4 | 88.6 | 52.6 | / |
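The "No Masking" row in Table 8 removes the dynamic action masking mechanism. As a generic illustration of what such a mask does at the policy output, the sketch below applies a feasibility mask before the softmax so that infeasible actions receive exactly zero probability. This is a common way to implement action masking and is not claimed to be the paper's exact formulation; the function name, the logits, and the mask are hypothetical.

```python
import math
from typing import List, Sequence


def masked_softmax(logits: Sequence[float], feasible: Sequence[bool]) -> List[float]:
    """Softmax over feasible actions only; infeasible actions get probability 0."""
    if not any(feasible):
        raise ValueError("at least one action must be feasible")
    # Subtract the largest feasible logit for numerical stability.
    peak = max(logit for logit, ok in zip(logits, feasible) if ok)
    weights = [math.exp(logit - peak) if ok else 0.0
               for logit, ok in zip(logits, feasible)]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example: four candidate operations, two of them currently infeasible
# (for instance because a preceding operation has not been scheduled yet).
logits = [2.0, 1.0, 0.5, 3.0]
feasible = [True, False, True, False]
probs = masked_softmax(logits, feasible)
```

Because masked actions can never be sampled, every generated route satisfies the encoded constraints by construction, which is consistent with the 100% CSR of the masked variants in the table.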
Table 9. Process Planning Result of Part 1.

| No. | Feature ID | Feature Type | Process Name | TAD | Cutting Tool | Machine |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | F3 | Plane | Rough Milling | −X | End Mill (Φ10) | 3-axis Machining Center |
| 2 | F1 | Plane | Rough Milling | +Y | End Mill (Φ8) | 3-axis Machining Center |
| 3 | F1 | Plane | Finish Milling | +Y | End Mill (Φ6) | 3-axis Machining Center |
| 4 | F2 | Plane | Rough Milling | −Y | End Mill (Φ8) | 3-axis Machining Center |
| 5 | F2 | Plane | Finish Milling | −Y | End Mill (Φ6) | 3-axis Machining Center |
| 6 | F5 | Through Slot | Rough Milling | −X | End Mill (Φ8) | 3-axis Machining Center |
| 7 | F3 | Plane | Finish Milling | X | End Mill (Φ8) | 3-axis Machining Center |
| 8 | F5 | Through Slot | Finish Milling | −X | End Mill (Φ6) | 3-axis Machining Center |
| 9 | F9 | Through Hole | Drilling | +X | Twist Drill (Φ3) | 3-axis Machining Center |
| 10 | F10 | Through Hole | Drilling | +X | Twist Drill (Φ3) | 3-axis Machining Center |
| 11 | F13 | Through Hole | Drilling | +X | Twist Drill (Φ4) | 3-axis Machining Center |
| 12 | F14 | Through Hole | Drilling | +X | Twist Drill (Φ4) | 3-axis Machining Center |
| 13 | F11 | Through Hole | Drilling | +X | Twist Drill (Φ4) | 3-axis Machining Center |
| 14 | F12 | Through Hole | Drilling | +X | Twist Drill (Φ4) | 3-axis Machining Center |
| 15 | F4 | Through Hole | Drilling | +X | Twist Drill (Φ20) | 3-axis Machining Center |
| 16 | F6 | Through Hole | Drilling | +X | Twist Drill (Φ2) | 3-axis Machining Center |
| 17 | F8 | Through Hole | Drilling | +X | Twist Drill (Φ2) | 3-axis Machining Center |
| 18 | F7 | Through Hole | Drilling | +X | Twist Drill (Φ2) | 3-axis Machining Center |
| 19 | F4 | Through Hole | Expanding | +X | Expansion Cutter (Φ20) | 3-axis Machining Center |
| 20 | F4 | Through Hole | Reaming | +X | Reamer (Φ20) | 3-axis Machining Center |
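One way to sanity-check a planned sequence such as the one in Table 9 is to collect the operations of each feature in execution order and compare the resulting chain with the candidate routes of Table 2. The sketch below does this for the plane and through-hole features. The mapping from "Through Hole" to the "Hole" category is an assumption, the through slot is skipped because Table 2 lists no slot routes, and all names (`ROUTES`, `PLAN`, `check_plan`) are hypothetical.

```python
from collections import defaultdict

# Candidate routes transcribed from Table 2.
ROUTES = {
    "Plane": [
        ("Rough Milling",),
        ("Rough Milling", "Finish Milling"),
        ("Rough Milling", "Semi-finish Milling", "Finish Milling"),
        ("Rough Milling", "Grinding"),
        ("Rough Milling", "Finish Milling", "Grinding"),
        ("Rough Milling", "Semi-finish Milling", "Finish Milling", "Grinding"),
    ],
    "Hole": [
        ("Drilling",),
        ("Drilling", "Reaming"),
        ("Drilling", "Expanding", "Reaming"),
        ("Drilling", "Expanding", "Boring", "Fine Boring"),
        ("Drilling", "Expanding", "Internal Grinding"),
        ("Drilling", "Expanding", "Reaming", "Internal Grinding"),
        ("Drilling", "Boring", "Internal Grinding", "Honing"),
        ("Drilling", "Expanding", "Boring", "Internal Grinding", "Honing"),
    ],
}
# Assumed mapping from Table 9 feature types to Table 2 categories.
TYPE_TO_CATEGORY = {"Plane": "Plane", "Through Hole": "Hole"}

# (feature ID, feature type, process) in execution order, from Table 9.
PLAN = [
    ("F3", "Plane", "Rough Milling"),
    ("F1", "Plane", "Rough Milling"),
    ("F1", "Plane", "Finish Milling"),
    ("F2", "Plane", "Rough Milling"),
    ("F2", "Plane", "Finish Milling"),
    ("F5", "Through Slot", "Rough Milling"),
    ("F3", "Plane", "Finish Milling"),
    ("F5", "Through Slot", "Finish Milling"),
    ("F9", "Through Hole", "Drilling"),
    ("F10", "Through Hole", "Drilling"),
    ("F13", "Through Hole", "Drilling"),
    ("F14", "Through Hole", "Drilling"),
    ("F11", "Through Hole", "Drilling"),
    ("F12", "Through Hole", "Drilling"),
    ("F4", "Through Hole", "Drilling"),
    ("F6", "Through Hole", "Drilling"),
    ("F8", "Through Hole", "Drilling"),
    ("F7", "Through Hole", "Drilling"),
    ("F4", "Through Hole", "Expanding"),
    ("F4", "Through Hole", "Reaming"),
]


def check_plan(plan):
    """Map each feature to True/False (chain matches a route) or None (no route list)."""
    chains = defaultdict(list)
    types = {}
    for feature_id, feature_type, process in plan:
        chains[feature_id].append(process)
        types[feature_id] = feature_type
    report = {}
    for feature_id, chain in chains.items():
        category = TYPE_TO_CATEGORY.get(types[feature_id])
        report[feature_id] = None if category is None else tuple(chain) in ROUTES[category]
    return report


report = check_plan(PLAN)
```

For this plan every plane and through-hole feature follows one of the listed routes; for example, F4 is machined as Drilling → Expanding → Reaming even though other features are interleaved between its operations.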
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, R.; Wang, M.; Du, Z.; Dong, X.; Peng, Y. Hybrid Attention-Augmented Deep Reinforcement Learning for Intelligent Machining Process Route Planning. Machines 2026, 14, 343. https://doi.org/10.3390/machines14030343

