In this section, the proposed attention-augmented DRL framework is evaluated through a series of experiments. The objective is to verify the model’s ability to generate valid and optimized process routes for complex prismatic parts and to demonstrate its performance superiority and robustness.
5.1. Experiment Setup
The implementation and training of the proposed model are conducted on a high-performance workstation. The hardware and software environments used to support the neural network training and the geometric feature processing are detailed in Table 4 below.
To ensure the high fidelity of the manufacturing environment, a comprehensive resource library is constructed, containing 3 sets of CNC machine tools with varying precision grades and 24 types of standardized cutting tools [43].
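As a sketch of how such a resource library might be encoded, the snippet below uses hypothetical field names and values (machine IDs, precision grades, tool types); none of these are taken from the paper's actual schema.

```python
from dataclasses import dataclass

# Hypothetical schema for the machine/tool resource library; all field
# names, IDs, grades, and diameters below are illustrative assumptions.
@dataclass(frozen=True)
class Machine:
    machine_id: str
    precision_grade: str   # e.g. an ISO IT grade (illustrative)
    axes: int              # three-axis vs. multi-face machining centre

@dataclass(frozen=True)
class CuttingTool:
    tool_id: str
    tool_type: str         # e.g. "end mill", "twist drill", "reamer"
    diameter_mm: float

# 3 machines with differing precision grades, 24 standardized tool types
machines = [Machine(f"M{i}", grade, 3)
            for i, grade in enumerate(["IT6", "IT7", "IT9"], start=1)]
tools = [CuttingTool(f"T{i:02d}", "end mill", 4.0 + 2.0 * i)
         for i in range(24)]
```

The library sizes then match the counts reported above (3 machines, 24 tool types).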
Following the methodology of previous work [37], the training and testing dataset was constructed by collecting 329 historical machining files of various parts from industrial manufacturing plants and laboratories. These cases encompass a wide spectrum of topological complexities, with the number of machining features (|V|) ranging from 7 to 25. The dataset is primarily composed of complex parts, which are foundational to the automotive, aerospace, and general machinery manufacturing sectors. As illustrated in Figure 8, the part geometries include housings, brackets, cylindrical bases, and intricate structural components that require multi-axis processing.
These parts are characterized by a wide variety of machining features and stringent engineering requirements, ensuring that the agent learns to process diverse manufacturing semantics effectively. Specifically, the dataset incorporates diverse feature elements such as planar surfaces, stepped holes, through slots, precision reamed bores, and complex pockets. Moreover, the cases exhibit varying densities of datum dependencies, which necessitate that the agent strictly adheres to fundamental manufacturing principles, such as datum-first and rough-to-finish, when orchestrating the machining sequences. The represented manufacturing scenarios cover three-axis and multi-face machining centers where non-cutting overhead, particularly tool changes and setup adjustments, significantly impacts overall productivity.
The dataset was randomly split into a training set (80%) and a held-out test set (20%). The results reported in Section 5.3 are evaluated on the unseen test set. The training process utilizes the PPO-clip algorithm. To achieve stable convergence and avoid local optima in the high-dimensional action space, the hyperparameters are carefully tuned based on preliminary sensitivity tests. These configurations are summarized in Table 5.
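The PPO-clip objective used for training can be sketched as the standard clipped surrogate loss. This NumPy version is illustrative only; the clip threshold of 0.2 is a common default, not necessarily the value the authors used (their hyperparameters are listed in Table 5).

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, adv, clip_eps=0.2):
    """Clipped surrogate objective of PPO (returned as a loss to minimize).

    clip_eps = 0.2 is a conventional default; the actual value is a
    tuned hyperparameter, not taken from the paper.
    """
    ratio = np.exp(logp_new - logp_old)            # pi_new / pi_old
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv
    # Pessimistic bound: elementwise minimum, negated for gradient descent
    return -np.minimum(unclipped, clipped).mean()
```

When the new and old policies coincide (ratio = 1), the loss reduces to the negated mean advantage; when the ratio drifts outside [1 − ε, 1 + ε], the clip caps the update, which is what keeps the policy inside a stable trust region.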
5.2. Training Convergence and Stability Analysis
This section evaluates the learning behavior and convergence properties of the proposed attention-augmented DRL agent across 3000 training episodes. To perform a comprehensive assessment, the training dataset is partitioned into three complexity levels based on feature quantity: the Simple group (centered around |V| ≈ 7), the Medium group (centered around |V| ≈ 15), and the Complex group (centered around |V| ≈ 25). Performance metrics were sampled every five episodes to capture fine-grained training dynamics among these groups.
Figure 9 illustrates the average cumulative reward curves for the three complexity categories. All levels exhibit a robust and steady upward trend during the initial training stage. Due to the integrated masking mechanism, the agent avoids the “sparse reward” challenge commonly encountered in standard RL, as it filters invalid actions from the outset. Consequently, it maintains a 100% Constraint Satisfaction Rate (CSR) throughout the entire training process for all groups. For the Simple group, the agent rapidly identifies optimal process sequences, reaching a stable plateau around episode 1500. As complexity increases, the Medium and Complex groups exhibit slower convergence with pronounced fluctuations. This suggests that the GAT and Transformer modules encounter greater challenges in capturing dense topological dependencies and orchestrating longer process–resource chains. Despite the increased difficulty, all groups reach stable convergence by episode 2200, proving the framework’s strong adaptability to parts with varying feature densities.
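The masking mechanism credited above with sustaining a 100% CSR can be sketched as setting the logits of invalid actions to negative infinity before the softmax, so infeasible process steps receive zero probability. This minimal NumPy version is an illustration, not the paper's implementation.

```python
import numpy as np

def masked_policy(logits, valid_mask):
    """Zero out invalid actions by masking their logits before the softmax,
    so the agent can only ever sample feasible process steps."""
    masked = np.where(valid_mask, logits, -np.inf)
    z = masked - masked.max()          # subtract max for numerical stability
    probs = np.exp(z)
    return probs / probs.sum()

logits = np.array([2.0, 1.0, 0.5, 3.0])
mask = np.array([True, False, True, False])  # actions 1 and 3 violate precedence
p = masked_policy(logits, mask)
```

Here the two masked actions get exactly zero probability, while the remaining mass is renormalized over the feasible ones; this is why the agent never encounters constraint-violating transitions during exploration.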
To assess learning stability and robust optimization capabilities, three representative parts—Part 1, Part 2, and Part 3—were selected for independent training sessions. These parts possess similar complexity, featuring approximately 12–15 machining features, and involve dense datum dependencies along with multi-resource couplings.
Figure 10 shows the training convergence curves for these scenarios.
The consistent convergence behavior across different parts, characterized by synchronized stability and comparable final reward levels, verifies that the proposed Hybrid Attention-DRL model can dependably identify optimal or near-optimal process routes for tasks of similar complexity.
The internal training stability is further examined in Figure 11, which displays the evolution of network losses and policy entropy. The Value Loss exhibits an initial transient phase as the critic network learns to assess the state-value function for various part geometries, followed by a steady exponential decrease. The Policy Loss remains near the zero baseline, demonstrating that the PPO-clip mechanism effectively constrains policy updates within a stable trust region and prevents extreme divergence. Simultaneously, the entropy curve follows a characteristic non-linear decline, reflecting the agent's transition from broad exploration across multiple part types to a concentrated, high-confidence decision-making policy.
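The entropy decline described above can be made concrete with the Shannon entropy of the action distribution; the two example distributions below are illustrative, not logged training data.

```python
import numpy as np

def policy_entropy(probs, eps=1e-12):
    """Shannon entropy of a categorical action distribution; a falling
    entropy curve marks the shift from exploration to exploitation."""
    p = np.clip(probs, eps, 1.0)
    return float(-(p * np.log(p)).sum())

early = np.full(4, 0.25)                    # broad, near-uniform exploration
late = np.array([0.94, 0.02, 0.02, 0.02])   # concentrated, confident policy
```

A uniform policy over n actions attains the maximum entropy log n, so the curve in Figure 11 starting high and decaying is exactly the expected signature of a converging stochastic policy.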
5.3. Comparative Analysis with Baseline Algorithms
To evaluate the optimization efficiency and generalization capability of the proposed attention-augmented DRL framework, it is compared against five representative baseline algorithms: Genetic Algorithm (GA), Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO), Simulated Annealing (SA), and a Standard PPO model. All algorithms are evaluated using the same resource library and the multi-objective cost model.
To ensure a fair and persuasive comparison, the core parameters of the four heuristic algorithms (GA, ACO, PSO, and SA) were determined through a series of preliminary tuning experiments. Following the benchmark settings commonly used in similar Machining Process Route Planning (MPRP) optimization studies, the parameters were selected to ensure that each heuristic could reach a stable convergence state within the given problem scale (7–25 features). The specific configurations are summarized in
Table 6.
The comparative study is conducted using the entire constructed dataset (329 machined parts) covering the full range of feature counts. This comprehensive evaluation ensures that the performance metrics reflect the algorithms' capabilities across diverse topological complexities and datum dependency densities. Each algorithm was executed for 50 independent runs across the dataset to ensure statistical significance. The quantitative results, representing the average performance across all parts, are summarized in Table 7.
The results in Table 7 indicate that the proposed method achieves a consistent performance advantage over the entire dataset. While the traditional heuristic algorithms (GA, ACO, PSO, SA) and the standard PPO model are capable of finding viable process routes, the proposed framework provides a modest reduction in total machining cost. Specifically, the proposed method achieves an average total cost of 422.3, an improvement of approximately 1.4% over the Standard PPO model and 3.8% over the best-performing heuristic baseline (ACO). This suggests that the integrated GAT and Transformer modules contribute to a more refined perception of resource coupling and topological constraints, leading to marginally better optimization of non-cutting costs (tool changes and clamping setups). In terms of feasibility, the proposed method maintains a robust 100% CSR over the full range of parts, demonstrating the reliability of the masking mechanism in handling diverse constraint densities. The most notable benefit, however, lies in computational efficiency and stability. The inference speed of 0.09 s on the RTX 4090 GPU allows for nearly instantaneous process orchestration, and the standard deviation (9.2) is comparable to that of the Standard PPO model but lower than those of the heuristic methods, indicating reliable performance across diverse part geometries.
To assess the significance of the performance gains, a one-tailed t-test was performed between the proposed framework and the strongest heuristic baseline (ACO). Across the 50 independent runs for all 329 parts, the proposed method achieved a significantly lower total cost. This indicates that the performance improvements are statistically robust and not attributable to the stochastic nature of DRL training.
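The significance test can be reproduced in outline with a paired one-tailed t-statistic. The per-part costs below are synthetic placeholders generated around the reported means; they are not the paper's measured data, and the helper name is hypothetical.

```python
import numpy as np

def paired_t_statistic(costs_a, costs_b):
    """Paired t-statistic for the one-tailed hypothesis that method A is
    cheaper than method B; a large negative value (compared against the
    t-distribution's critical value) indicates significance."""
    d = np.asarray(costs_a, dtype=float) - np.asarray(costs_b, dtype=float)
    return d.mean() / (d.std(ddof=1) / np.sqrt(d.size))

# Synthetic per-run costs loosely matching the reported averages (422.3 vs. ACO)
rng = np.random.default_rng(0)
ours = 422.3 + rng.normal(0.0, 9.2, size=50)
aco = 439.0 + rng.normal(0.0, 12.0, size=50)
t = paired_t_statistic(ours, aco)
```

A strongly negative t on such samples is what underwrites the "significantly lower total cost" claim; in practice one would read the p-value off the t-distribution with n − 1 degrees of freedom.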
The overall cost distribution is visualized in Figure 12. The box plot illustrates that the performance ranges of the different algorithms overlap significantly, particularly between the DRL-based models and the best heuristic methods. However, the proposed method maintains a slightly lower median and a more compact distribution, confirming that the attention-augmented DRL approach offers a reliable and efficient option for process orchestration.
The scalability of all investigated algorithms with respect to increasing part complexity is illustrated in Figure 13. To maintain logical consistency with the categorical analysis in Section 5.2, the computation time is evaluated across the same three complexity levels: Simple (|V| ≈ 7), Medium (|V| ≈ 15), and Complex (|V| ≈ 25).
As shown, the execution time for all four meta-heuristic baselines (GA, ACO, PSO, and SA) exhibits a near-exponential growth pattern as the search space expands with the number of machining features. For instance, ACO’s computation time escalates rapidly from approximately 4.2 s to over 130 s. In stark contrast, the proposed attention-augmented DRL framework displays a remarkably stable and flat inference trajectory, maintaining an average response time of 0.09 s across the entire complexity spectrum. This demonstrates that the integrated GAT and Transformer modules effectively internalize complex manufacturing constraints and dependencies within a fixed-time forward propagation, providing a highly scalable and real-time solution for intelligent process planning in large-scale industrial scenarios.
Furthermore, it is important to emphasize that while this study does not explicitly categorize constraint intensity into discrete gradients, the diverse dataset of 329 cases naturally spans a wide spectrum of manufacturing complexities—ranging from sparse topological dependencies to highly dense engineering constraints. The consistent achievement of a 100% Constraint Satisfaction Rate (CSR) across all test scenarios implicitly demonstrates the robustness of the dynamic masking mechanism and the hybrid attention architecture in handling varying constraint densities. Additionally, the stable performance maintained on the unseen test set (20% of the total cases), which comprises varied part geometries and feature combinations not present during training, serves as a testament to the model’s strong generalization capabilities and its potential for cross-scenario applicability in real-world industrial settings.
5.4. Ablation Study on Neural Components
To quantify the specific contribution of each architectural component to the overall performance, an ablation study was conducted. Four model variants were evaluated across the entire dataset:
Full Model (Proposed): The complete architecture incorporating GAT, Transformer, and the Masking mechanism.
No GAT: The GAT layer is replaced with a standard Multi-Layer Perceptron (MLP) for feature encoding, ignoring spatial topological dependencies among machining features.
No Transformer: The Transformer encoder is removed, relying solely on GAT-aggregated features for decision-making without explicit sequential resource inheritance modeling.
No Masking: The masking mechanism is disabled, allowing the agent to explore the entire action space, including invalid process steps that violate precedence constraints.
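The four variants above can be summarized as simple configuration toggles; the dictionary keys and builder name below are hypothetical scaffolding for illustration, not the authors' code.

```python
# Hypothetical configuration toggles for the four ablation variants.
def build_variant(use_gat=True, use_transformer=True, use_masking=True):
    return {
        "feature_encoder": "GAT" if use_gat else "MLP",   # No GAT -> plain MLP
        "sequence_encoder": "Transformer" if use_transformer else "none",
        "action_masking": use_masking,                    # No Masking -> raw action space
    }

variants = {
    "Full Model": build_variant(),
    "No GAT": build_variant(use_gat=False),
    "No Transformer": build_variant(use_transformer=False),
    "No Masking": build_variant(use_masking=False),
}
```

Framing the ablations this way makes explicit that each variant changes exactly one component while holding the other two fixed, which is what lets the cost differences in Table 8 be attributed to individual modules.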
The performance metrics are summarized in Table 8.
To further quantify the contribution of each architectural component to the overall performance, a detailed analysis of the ablation variants was conducted based on the results in Table 8. The integration of the Graph Attention Network (GAT) is demonstrated to be pivotal for spatial process optimization; its removal (No GAT variant) results in a 19.1% increase in clamping costs, from 56.1 to 66.8. This underscores the GAT's critical role in aggregating spatial datum dependencies, which allows the agent to group features effectively and minimize redundant setups.
Furthermore, the Transformer encoder is essential for temporal resource management. The No Transformer variant leads to a 15.0% rise in tool change costs, increasing from 53.2 to 61.2. This validates the necessity of the self-attention mechanism in capturing long-range sequential correlations within variable-length machining chains for optimal tool inheritance. Notably, the Masking mechanism serves as the foundational guarantee of feasibility; without it (No Masking variant), the agent fails to identify a meaningful optimization gradient, and the Constraint Satisfaction Rate (CSR) plunges to 52.6%. These quantitative findings confirm the synergistic effect of the proposed hybrid attention-augmented architecture in achieving cost-efficient and 100% compliant process orchestration.
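The quoted relative increases follow directly from the tabulated costs; a quick arithmetic check:

```python
# Relative cost increase of an ablation variant over the full model
def pct_increase(base, new):
    return 100.0 * (new - base) / base

no_gat_clamping = pct_increase(56.1, 66.8)        # clamping cost, No GAT
no_tf_tool_change = pct_increase(53.2, 61.2)      # tool change cost, No Transformer
```

Both values round to the 19.1% and 15.0% figures reported above.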
The ablation results highlight the functional necessity of each component. As illustrated in
Figure 14, the No Masking variant (red line) fails to identify any optimization gradient. Throughout the 3000 training episodes, its reward trajectory remains purely stochastic, wandering irregularly around a low reward level without any significant upward trend. This confirms that without the action-space pruning provided by the masking mechanism, the agent cannot effectively learn the complex precedence rules and resource dependencies, resulting in a drastically low CSR of 52.6%.
In contrast, the Full Model achieves the most efficient convergence and the highest cumulative reward. The No GAT variant exhibits increased setup costs, as it lacks the spatial relational encoding needed to group features by common machining datums. The No Transformer variant shows degraded tool change performance, proving that sequential attention is essential for resource inheritance optimization. These results validate the synergistic effect of the proposed hybrid attention architecture in complex process orchestration.
5.5. Case Study
Figure 15 illustrates the comprehensive decision-making workflow generated for part 1. The orchestration originates from the identification of initial datum features, which undergo a series of preprocessing treatments to be formally modeled as the initial state of the environment. This state is subsequently fed into the trained DRL framework to derive the final orchestration results. During the sequential decision-making process, a dynamic masking mechanism is utilized at each step to filter invalid actions that would violate the 31 precedence constraints, enabling the framework to navigate the high-dimensional search space and identify an optimal sequence. The resulting plan comprises 20 machining operations, successfully streamlining the execution to 12 tool changes and 5 clamping setups. This reduction in non-cutting overheads and auxiliary time directly validates the efficiency of the proposed method in large-scale industrial scenarios.
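The feasibility requirement underlying the masked decision process can be sketched as a simple check that an operation sequence respects a set of (before, after) precedence pairs; the example features below are illustrative, and the 31 actual constraints of part 1 are not reproduced here.

```python
# Check that a generated operation sequence respects precedence constraints,
# each given as a (before, after) pair of feature/operation identifiers.
def satisfies_precedence(sequence, constraints):
    position = {op: i for i, op in enumerate(sequence)}
    return all(position[a] < position[b] for a, b in constraints)

# Datum-first example: datum planes must precede the precision hole
seq = ["F1", "F2", "F3", "F4"]
ok = satisfies_precedence(seq, [("F1", "F4"), ("F2", "F4")])
bad = satisfies_precedence(["F4", "F1", "F2", "F3"], [("F1", "F4")])
```

The dynamic mask enforces exactly this property step by step: at each decision point, any action whose prerequisites are not yet in the partial sequence is filtered out before sampling.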
The detailed process parameters, including feature sequences and resource allocations, are summarized in Table 9. An in-depth analysis of this plan reveals that the DRL agent has effectively internalized professional machining expertise. The agent adhered to the Datum-First strategy by prioritizing the machining of datum planes F1, F2, and F3 from Step 1 to Step 8, thereby establishing stable locating surfaces for subsequent high-precision features. Furthermore, for the IT8 precision hole F4, the model correctly sequenced the multi-step process chain involving drilling, expanding, and reaming in Step 15, Step 19, and Step 20, respectively, which confirms adherence to the Rough-to-Finish principle. Additionally, the Transformer-based sequential encoding effectively optimized resource correlations by grouping features F6 through F14 for consecutive processing under a unified +X tool access direction and consistent tooling. This case study confirms that the proposed hybrid architecture not only achieves high computational efficiency but also guarantees the generation of process routes that are highly consistent with practical industrial requirements.