RAPSO: An Integrated PSO with Reinforcement Learning and an Adaptive Weight Strategy for the High-Precision Milling of Elastic Materials

Li, Qingxin; Zeng, Peng; Wu, Qiankun; Zhang, Zijing

doi:10.3390/s25185913

¹ State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110169, China

² University of Chinese Academy of Sciences, Beijing 101408, China

* Author to whom correspondence should be addressed.

Sensors 2025, 25(18), 5913; https://doi.org/10.3390/s25185913

Submission received: 22 August 2025 / Revised: 19 September 2025 / Accepted: 19 September 2025 / Published: 22 September 2025

(This article belongs to the Section Sensor Materials)


Abstract

This study tackles the challenge of achieving high-precision robotic machining of elastic materials, where elastic recovery and overcutting often impair accuracy. To address this, a novel milling strategy, RAPSO, is introduced by combining an adaptive particle swarm optimization (APSO) algorithm with a reinforcement learning (RL)-based compensation mechanism. The method builds a material-specific milling model through residual error characterization, incorporates a dynamic inertia weight adjustment strategy into APSO for optimized toolpath generation, and integrates a Proximal Policy Optimization (PPO)-based RL module to refine trajectories iteratively. Experiments show that RAPSO reduces residual material by 33.51% compared with standard PSO and APSO methods, while offering faster convergence and greater stability. The proposed framework provides a practical solution for precision machining of elastic materials, offering improved accuracy, reduced post-processing requirements, and higher efficiency, while also contributing to the theoretical modeling of elastic recovery and advanced toolpath planning.

Keywords:

robot path planning; particle swarm optimization; milling of elastic materials

1. Introduction

High-precision robotic machining of elastic materials (e.g., viscoelastic polymers, composite propellants, and flexible rubbers) has become a critical requirement in advanced manufacturing fields such as aerospace, biomedical engineering, and soft robotics. However, the inherent deformability and post-machining elastic recovery of these materials, coupled with the flexible dynamic characteristics of robotic systems, lead to persistent challenges including dimensional deviations, overcutting/undercutting, and trajectory instability, all of which severely impair machining accuracy and surface integrity [1,2]. Cetin et al. (2024) verified the effectiveness of machine learning algorithms in predicting material wear, providing algorithmic references for predicting machining errors in elastic materials [3]. To address these issues, the integration of swarm intelligence optimization, machine learning, and adaptive control has emerged as a promising direction, though existing research still leaves key gaps in practical applicability.

1.1. State of the Art in Elastic and Flexible Material Machining

The machining of deformable materials has long relied on modeling and optimization to mitigate accuracy losses. Early foundational work by Bravo et al. (2005) established a landmark model for milling stability that simultaneously considers the flexibility of both the workpiece and the machine tool, revealing how dynamic interactions between the two contribute to vibration-induced errors [4]. Budak (2006) built an analytical model for cutting forces and structural deformations, laying a basic framework for subsequent elastic material machining modeling [5]. Building on this, Campa et al. (2011) focused on thin-wall floor milling with bull-nose end mills, proposing a chatter avoidance framework that combines geometric modeling and stability diagrams to reduce deformation-related defects [6]. Their findings highlighted the need for material-specific process adaptation, a theme expanded by Marin (2024) in five-axis milling of free-form surfaces; his research demonstrated that tool geometry (ball-end vs. circle-segment end mills) and material flexibility jointly determine surface quality, though it lacked real-time adjustment mechanisms for elastic recovery [7].

In terms of process parameter optimization, Varga et al. (2024) quantified the influence of tool inclination angle and effective cutting speed on the roughness of machined flexible surfaces, providing empirical guidance for parameter selection but not addressing trajectory-level optimization [8]. For a more systematic process simulation, Urbikain-Pelayo et al. (2025) developed Mill+, a tool integrating vibration, cutting force, and surface quality prediction—yet its static modeling approach fails to capture the dynamic springback of elastic materials during robotic machining [9]. These studies collectively confirm that while flexibility and deformation are well-recognized, existing models often decouple material behavior from robotic system dynamics, limiting their precision in practical scenarios.

1.2. Path Planning and Optimization for Machining

Swarm intelligence algorithms, particularly particle swarm optimization (PSO), have been widely adopted for machining path optimization due to their efficiency in global searches. Zhang et al. (2020) proposed a PSO-RBFNN hybrid algorithm for industrial robot inverse kinematics, improving motion smoothness but relying on fixed parameters that struggle with elastic material variability [10]. Bai et al. (2025) optimized the structural stability of CNC gantry machine tools through multi-objective optimization, indirectly providing a hardware foundation for path planning accuracy improvement [11]. To enhance PSO’s performance, fusion with other meta-heuristics has been explored: Zhao et al. (2024) applied an SA-PSO (simulated annealing-PSO) algorithm to optimize impeller side milling paths, boosting efficiency by 15% but failing to balance early-stage exploration and late-stage exploitation [12]. Li et al. (2021) combined PSO with LSSVM for surface roughness prediction, expanding PSO’s application in machining quality control [13]. Abualigah (2025) summarized the latest advances in PSO, noting that adaptive parameter adjustment is key to improving its optimization performance [14]. Pajaziti et al. (2025) optimized CNC machine paths and performance via PSO, further verifying PSO’s role in improving machining efficiency [15].

Adaptive strategies have emerged as a solution to fixed-parameter limitations. Abbas et al. (2020) developed an adaptive design for stainless steel face milling that balances cost, quality, and productivity, while Kaood et al. (2021) optimized nanofluid performance via adaptive PSO—yet neither addressed the unique springback characteristics of elastic materials [16,17]. Even advanced path planning methods, such as the jerk-limited heuristic feedrate scheduling proposed by Xiao et al. (2022), prioritize trajectory smoothness over dynamic error compensation for deformable workpieces [18]. Liu et al. (2024) proposed a linear programming-based time-optimal feedrate planning method, and Liu et al. (2025) developed an S-shaped feedrate-based NURBS interpolator, further enriching the technical path of feedrate optimization [19,20]. El-Kenawy et al. (2022) and Bashir et al. (2023) validated the effectiveness of ensemble learning in engineering applications of intelligent optimization algorithms, providing algorithmic references for machining parameter optimization [21,22].

1.3. Reinforcement Learning and Dynamic Compensation

Reinforcement learning (RL) has shown promise for real-time control in complex machining scenarios, thanks to its ability to learn from environmental feedback. Xiao et al. (2019) were among the first to apply meta-reinforcement learning to machining parameter optimization, laying a foundation for subsequent research [23]. Lu et al. (2025) used meta-RL for energy-efficient five-axis flank milling, reducing energy consumption by 20% while maintaining precision; however, their approach was not tailored to elastic material springback [2]. Zhang et al. (2024) applied deep RL to machining process route planning, improving robustness but lacking a mechanism for trajectory-level error correction [24].

Kaliyannan et al. (2024) combined RL with deep learning for tool condition monitoring, providing technical support for real-time diagnosis of machining processes [25]. Danket et al. (2025) used deep RL to achieve adaptive production capacity planning under variable electricity costs, demonstrating RL’s adaptability in dynamic production environments [26]. Dynamic error compensation remains a critical bottleneck. While Mantas et al. (2025) integrated AI with digital twins for carbon fiber-reinforced plastics (CFRP) machining, their system relied on pre-calibrated models rather than real-time adaptive compensation [27]. For robotic systems specifically, Makulavičius et al. (2023) noted that robotic flexibility—while enabling complex operations—introduces spasmodic trajectory movements during continuous interpolation, a problem rarely addressed in existing compensation frameworks [28].

1.4. Critical Gaps in Existing Research

Despite significant progress, three core limitations persist in the high-precision milling of elastic materials:

Inadequate Springback Modeling. Most existing models (e.g., [6,9]) rely on simplified assumptions or empirical coefficients for elastic recovery, failing to capture its dynamic dependence on cutting thickness and material properties. For example, Liu et al. (2004) established a micro-end-milling force model but did not extend it to predict post-machining springback in viscoelastic materials [29].
Unbalanced Optimization Performance. Traditional PSO and even adaptive variants (e.g., [12,17]) use static or semi-static inertia weights, leading to either premature convergence (local optima) or slow convergence, both of which are critical issues for time-sensitive robotic machining.
Lack of Robot-Material Adaptive Compensation. While RL-based methods (e.g., [2,24]) improve path robustness, they do not integrate real-time feedback of elastic recovery and robotic trajectory fluctuations, resulting in persistent overcutting/undercutting.

1.5. Objectives and Contributions of This Study

The main contributions of this article can be summarized as follows:

Residual Definition Milling Model. An elastic-material milling model based on residual definition is constructed, which can predict the deformation and rebound behavior of elastic materials. This model provides a theoretical foundation for addressing machining accuracy issues caused by material characteristics during the milling of elastic materials, contributing to improved dimensional accuracy and surface quality of the final product.
Adaptive Weight PSO Algorithm. An improved particle swarm optimization algorithm is proposed, which incorporates an adaptive inertia weight strategy for path planning. By dynamically adjusting the inertia weight, this algorithm can broadly explore the solution space in the early stages and refine the search later on, accelerating convergence and improving cutting precision.
Compensation Module Based on Reinforcement Learning. A reinforcement learning compensation module using Proximal Policy Optimization (PPO) is integrated. This module is capable of dynamically adjusting strategies based on real-time feedback, reducing processing errors caused by springback and overcutting, thereby further enhancing machining accuracy and surface smoothness.

This paper is structured as follows: Section 2 presents the theoretical foundation, detailing the development of an elastic material machining model grounded in residual error analysis, together with the formulation of process constraints and optimization criteria. Section 3 then introduces a hybrid particle swarm optimization (PSO) framework integrating reinforcement learning and adaptive inertia weight mechanisms, named RAPSO, to generate high-precision cutting trajectories. Section 4 validates the proposed methodology through comparative simulations across diverse machining scenarios, quantitatively assessing its trajectory generation efficiency and dimensional precision. Finally, Section 5 summarizes the key findings and implications of this research, highlighting the technical contributions and potential applications in precision manufacturing.

2. Identifying Dynamic Parameters

A material-specific deformation–prediction framework is established to characterize the elastic recovery and structural deformation dynamics of workpieces during precision machining. This model integrates a quantitative metric for uncut material volume, which serves as the primary optimization criterion for the subsequent reinforcement learning-based trajectory refinement process.

2.1. Material-Specific Machining Behavior Analysis

As illustrated in the schematic diagram (Figure 1), workpieces composed of viscoelastic materials exhibit distinctive machining behavior due to their inherent stress relaxation properties. During the contouring operation towards the target geometry [29], the material undergoes post-machining elastic recovery, resulting in dimensional deviations from the desired profile. By incorporating this springback phenomenon into the toolpath planning algorithm (represented by the optimized trajectory in the diagram), the final surface contour achieves improved conformity with the design specifications through compensatory adjustments.

To account for elastic recovery during machining, a thickness-dependent elastic recovery coefficient k is introduced:

$$\frac{h_r}{h} = k, \qquad k = \begin{cases} 0.1, & h > h_l \\ 0.02, & h < h_{min} \\ 0.05, & h_{min} < h < h_l \end{cases} \tag{1}$$

where $h_r$ denotes the thickness recovered elastically after the cutting process, $h$ represents the nominal cutting thickness, and $k$ is the coefficient characterizing the elastic recovery behavior. Based on the principle of minimum cutting thickness, the value of $k$ is not constant but varies with the actual cutting thickness. In the present study, $k$ is defined as a piecewise function of $h$, as given in Equation (1), with the reference thickness $h_l = 4$ mm and the minimum effective cutting thickness $h_{min} = 2$ mm. This functional dependence allows the model to account for size effects and material behavior at small undeformed chip thicknesses. Although empirically defined, this approach is grounded in machining principles such as the minimum cutting thickness and viscoelastic material behavior [29]. By incorporating $k$ as a function of $h$, the model more accurately predicts post-machining deviations. The relationship between elastic recovery, $k$, and cutting thickness is illustrated in Figure 2, supporting both theoretical analysis and experimental validation.
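Equation (1) can be encoded directly. The minimal Python sketch below returns the piecewise coefficient $k(h)$ with $h_l = 4$ mm and $h_{min} = 2$ mm, together with the recovered thickness $h_r = k h$. Equation (1) leaves the boundary values $h = h_l$ and $h = h_{min}$ unassigned; placing them in the middle band ($k = 0.05$) is an assumption of this sketch, and the function names are illustrative.

```python
H_L = 4.0    # reference thickness h_l in mm (from the text)
H_MIN = 2.0  # minimum effective cutting thickness h_min in mm (from the text)


def recovery_coefficient(h: float) -> float:
    """Piecewise elastic recovery coefficient k(h) of Equation (1).

    h == H_L and h == H_MIN are not covered by Equation (1); this sketch
    assigns them to the middle band (an assumption).
    """
    if h < 0:
        raise ValueError("cutting thickness must be non-negative")
    if h > H_L:
        return 0.1
    if h < H_MIN:
        return 0.02
    return 0.05


def recovered_thickness(h: float) -> float:
    """Elastically recovered thickness h_r = k(h) * h, in mm."""
    return recovery_coefficient(h) * h


for h in (1.0, 3.0, 5.0):
    print(f"h = {h} mm -> k = {recovery_coefficient(h)}, h_r = {recovered_thickness(h):.3f} mm")
```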

2.2. Milling Allowance Definition

In the machining of elastic materials, the term “milling allowance” refers to the predetermined layer of material intentionally left for removal to compensate for elastic deformations induced during the milling operation. This allowance is critical for achieving the desired dimensional precision and surface integrity in the final component. Materials exhibiting significant elasticity, such as specific types of rubber or compliant polymers, are prone to reversible deformation when subjected to cutting forces. This phenomenon, illustrated in Figure 3, necessitates careful planning of the machining allowance to ensure that the elastic recovery is adequately accommodated within the process parameters.

In precision machining processes, the desired geometry of the workpiece is typically defined using CAD models. However, such representations, whether in the form of boundary meshes or discrete point clouds, are often unsuitable for direct integration into mathematical and physical simulations of the cutting process. To facilitate analytical treatment, a continuous implicit function representation is adopted. The target geometry is thereby described by the zero level set of a scalar function:

$$G(x, y) = 0 \tag{2}$$

The actual initial geometry of the workpiece is represented by another implicit function:

$$S(x, y) = 0 \tag{3}$$

In the context of cutting path optimization, the material boundary plays a pivotal role in influencing both the process efficiency and geometric fidelity. This boundary is defined as the planar curve separating the region to be retained from the excess material designated for removal. It can be accurately reconstructed through piecewise functional fitting techniques based on sampled edge data.

A key objective in process planning is to generate a toolpath that minimizes material waste while maintaining high cutting precision. Such a toolpath is initially specified as a sequence of discrete waypoints that the cutting tool must traverse. To ensure smooth and continuous motion, these discrete points are interpolated into a continuous curve that closely conforms to the actual material edge.
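The interpolation scheme used at this step is not specified in the text. As one possible choice, the Python sketch below uses a uniform Catmull–Rom spline, which passes through every waypoint and has a continuous tangent. The choice of spline, the endpoint handling (duplicating the first and last waypoints), and all names are assumptions of this sketch.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def catmull_rom_point(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate the uniform Catmull-Rom segment between p1 (t = 0) and p2 (t = 1)."""
    t2, t3 = t * t, t * t * t
    x, y = (
        0.5 * (2 * b + (-a + c) * t + (2 * a - 5 * b + 4 * c - d) * t2
               + (-a + 3 * b - 3 * c + d) * t3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )
    return (x, y)


def interpolate_waypoints(points: Sequence[Point], samples_per_segment: int = 10) -> List[Point]:
    """Densify discrete waypoints into a smooth curve that passes through each of them."""
    if len(points) < 2:
        return [tuple(p) for p in points]
    # Duplicate the end points so the first and last segments have neighbours.
    padded = [points[0], *points, points[-1]]
    curve: List[Point] = []
    for k in range(1, len(padded) - 2):
        p0, p1, p2, p3 = padded[k - 1], padded[k], padded[k + 1], padded[k + 2]
        for s in range(samples_per_segment):
            curve.append(catmull_rom_point(p0, p1, p2, p3, s / samples_per_segment))
    curve.append(tuple(points[-1]))
    return curve


waypoints = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0)]
print(len(interpolate_waypoints(waypoints, samples_per_segment=4)))
```

With $n$ waypoints and $s$ samples per segment, the curve contains $(n - 1)s + 1$ points, the first sample of each segment coinciding with a waypoint.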

As illustrated in Figure 3, the gradient vector of the target shape function $G(x, y)$ is denoted as follows:

$$\nabla G = (G_x, G_y) = \left(\frac{\partial G}{\partial x}, \frac{\partial G}{\partial y}\right) \tag{4}$$

Equation (4) defines the local normal direction at any point $v$ on the target curve. The corresponding point $u$ on the initial workpiece surface $S(x, y) = 0$ along this normal direction satisfies the following geometric relation:

$$\vec{u} = \vec{v} + \alpha \nabla G(v) \tag{5}$$

Given that $u$ lies on the surface $S$, it follows that $S(u) = 0$. Substituting Equation (5) into the implicit form of $S$ yields the following:

$$S(u) = S\big(\vec{v} + \alpha \nabla G(v)\big) = 0 \tag{6}$$

For a given point $v$, Equation (6) can be solved for the scalar parameter $\alpha$. Defining $\alpha = f(v)$, the local thickness of the material to be removed at position $v$ is then determined as follows:

$$\delta^{(s)}(v) = \|\vec{u} - \vec{v}\| = \alpha\,\|\nabla G(v)\| = f(v)\,\|\nabla G(v)\|, \qquad v \in X^{(G)} \tag{7}$$

where $X^{(G)}$ denotes the set of points $v$ on the target curve $G(x, y) = 0$.
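Equations (5)–(7) can be evaluated numerically once $S$ and $\nabla G$ are available as functions. The Python sketch below finds $\alpha$ by bisection along the normal and returns $\delta^{(s)}(v) = \alpha\|\nabla G(v)\|$. It assumes that the stock is reached by increasing $\alpha \ge 0$ and that $S$ changes sign within a user-supplied bracket `alpha_max`; the circular test geometry (target radius 1, stock radius 1.1) and the function names are illustrative only.

```python
import math
from typing import Callable, Tuple

Vec = Tuple[float, float]


def removal_thickness(
    S: Callable[[Vec], float],
    grad_G: Callable[[Vec], Vec],
    v: Vec,
    alpha_max: float = 10.0,
    tol: float = 1e-12,
) -> float:
    """Solve S(v + alpha * grad_G(v)) = 0 for alpha >= 0 (Eq. 6) and return Eq. (7)."""
    gx, gy = grad_G(v)

    def phi(alpha: float) -> float:
        # Value of the stock function at the point u(alpha) of Eq. (5).
        return S((v[0] + alpha * gx, v[1] + alpha * gy))

    lo, hi = 0.0, alpha_max
    f_lo = phi(lo)
    if f_lo == 0.0:
        return 0.0  # v already lies on the stock boundary
    if f_lo * phi(hi) > 0.0:
        raise ValueError("no sign change of S in [0, alpha_max]")
    while hi - lo > tol:  # plain bisection on alpha
        mid = 0.5 * (lo + hi)
        f_mid = phi(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    alpha = 0.5 * (lo + hi)
    return alpha * math.hypot(gx, gy)


def grad_G_circle(p: Vec) -> Vec:
    """Gradient of the illustrative target G(x, y) = x^2 + y^2 - 1."""
    return (2.0 * p[0], 2.0 * p[1])


def S_stock(p: Vec) -> float:
    """Illustrative stock boundary S(x, y) = x^2 + y^2 - 1.1^2."""
    return p[0] ** 2 + p[1] ** 2 - 1.21


print(removal_thickness(S_stock, grad_G_circle, (1.0, 0.0)))  # approximately 0.1
```

For these concentric circles the solver recovers $\alpha = 0.05$ and a removal thickness equal to the radial gap of 0.1.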

2.3. Cutting Constraints and Objectives

Machining operations are inherently irreversible material-removal processes, where any excess material removed beyond the designated boundary cannot be restored. To preserve the integrity of the final component geometry, it is essential to prevent overcutting, which occurs when the tool path intersects or penetrates the target shape boundary. Consequently, a fundamental process constraint requires that the cutting tool trajectory must remain entirely outside or coincident with the desired final surface but never intrude into the region bounded by the target curve. Therefore, the cutting constraint can be defined as follows:

$$\delta^{(s)}(v) \geq 0, \qquad \forall v \in X^{(G)} \tag{8}$$

Furthermore, the primary objective of the machining operation is to efficiently eliminate the surplus material defined by the initial workpiece geometry, thereby approaching the final target shape as closely as possible. This entails minimizing the volume or area of residual material remaining after cutting, which directly correlates with machining accuracy and process efficiency. Accordingly, the optimization goal is formulated in Equation (9) to reduce the total material deviation between the initial stock and the desired contour, ensuring high precision and minimal post-processing requirements.

$$\min\; t \quad \text{s.t.} \quad \begin{cases} \max_{v \in X^{(G)}} \delta^{(S_t)}(v) \leq \epsilon \\ \min_{v \in X^{(G)}} \delta^{(S_t)}(v) \geq 0 \end{cases} \tag{9}$$

Under the prescribed cutting accuracy tolerance $\epsilon$ and subject to the constraint defined by Equation (8), the optimization objective is to minimize the total number of cutting passes $t$ required to complete the machining process, thereby improving efficiency while keeping precision within bounds.

3. Methodology

To address the machining characteristics of elastic materials, this study proposes a cutting trajectory planning method based on an improved particle swarm optimization algorithm. Owing to the elastic recovery of materials after the removal of cutting forces, direct machining according to the theoretical contour would result in deviations between the actual formed dimensions and the design specifications. To resolve this issue, the present work implements active compensation of tool paths through the optimization algorithm, introducing a dynamic adjustment mechanism into the conventional particle swarm optimization framework, thereby effectively mitigating precision loss caused by springback effects.

3.1. Particle Swarm Optimization Algorithm

For trajectory planning in robotic milling operations, this paper proposes a path optimization method based on feature point extraction. Specifically, the discrete tool center positions are first interpolated using spline curves to construct continuously differentiable machining trajectories. An adaptive sampling algorithm is then employed to identify critical feature points along the path, which significantly reduces trajectory data volume while maintaining machining accuracy. This study adopts a segmented optimization strategy for milling trajectory planning, implemented as follows. First, based on machining accuracy requirements, the target curve $S(x, y) = 0$ is discretized into $N$ characteristic segments, and $N + 1$ representative control nodes (including start/end points and key intermediate points) are selected to construct the initial trajectory. Within the optimization framework, these control nodes are mapped to particle swarm individuals in a high-dimensional search space, where each particle's position is denoted as $P_{ix}$ ($i = 1, 2, \dots, N + 1$), its velocity as $v_i$, and its historical optimal position as $P_{best_i}$. Notably, as shown in Equation (7), the machining allowance $\delta$ serves as a key optimization index, whose magnitude is negatively correlated with final surface quality. A cutting simulation surrogate model provides the remaining material allowance after each cutting process. The fitness function for the particle swarm is defined as follows:

$$f = \sum_{i=1}^{N+1} \delta^{(s_t)}(P_i) \tag{10}$$

Using Equation (10), the minimum fitness value for each particle is determined, and the corresponding optimal trajectory position sequence, denoted as $P_{iy}$, can be derived. The particle swarm update for the milling path is then formulated as follows:

$$\begin{cases} v_i^{m+1} = w\,v_i^{m} + c_1 r_1 \left(P_{best_i} - P_{iy}^{m}\right) + c_2 r_2 \left(G_{best_i} - P_{iy}^{m}\right) \\ P_{iy}^{m+1} = P_{iy}^{m} + v_i^{m+1} \end{cases} \tag{11}$$

where $w$ denotes the inertia weight, reflecting the extent of reliance on the current velocity direction. The constants $c_1$ and $c_2$ are acceleration coefficients that weight the attraction toward the particle's personal best and the global best, respectively, and thereby control the step size during the learning process. Additionally, $r_1$ and $r_2$ are two independently generated random numbers uniformly distributed within $[0, 1]$, introduced to enhance the stochastic nature of the search.
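The update in Equation (11) is the standard PSO velocity and position rule. The dependency-free Python sketch below applies it to a vector of control-node offsets. Because the cutting simulation surrogate is not available here, the fitness is a hypothetical stand-in (squared deviation from assumed ideal offsets) in place of the residual sum of Equation (10); clamping positions to the search bounds, updating the global best asynchronously within an iteration, and the parameter values ($w = 0.7$, $c_1 = c_2 = 1.5$) are also assumptions of this sketch.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    fitness: Callable[[Sequence[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    n_particles: int = 20,
    iterations: int = 100,
    w: float = 0.7,
    c1: float = 1.5,
    c2: float = 1.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Standard PSO (Equation (11)) with a constant inertia weight w."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_f[k])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])   # cognitive term
                             + c2 * r2 * (gbest[d] - pos[i][d]))     # social term
                # Position update P^{m+1} = P^m + v^{m+1}, clamped to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = fitness(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return gbest, gbest_f


# Hypothetical stand-in for the surrogate-based fitness of Equation (10).
IDEAL_OFFSETS = [0.5, 1.0, 1.5]


def toy_residual_sum(nodes: Sequence[float]) -> float:
    return sum((p - y) ** 2 for p, y in zip(nodes, IDEAL_OFFSETS))


best, best_f = pso_minimize(toy_residual_sum, dim=3, bounds=(-5.0, 5.0))
print([round(x, 3) for x in best], best_f)
```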

3.2. Adaptive Weight Particle Swarm Optimization Algorithm

In the conventional particle swarm optimization (PSO) algorithm, the inertia weight remains constant and does not vary with the number of iterations. A fixed weight forces a compromise: a small value accelerates convergence but increases the likelihood of the algorithm becoming trapped in a local optimum, whereas a large value preserves global exploration at the cost of slow convergence. To address this issue, this paper introduces an adaptive inertia weight strategy that is adjusted based on the optimization progress, as shown in Equation (12).

$$w = w_{min} + \left(1 - \frac{i}{m}\cdot\frac{P_{best_i}}{G_{best_i}}\right)\left(w_{max} - w_{min}\right) \tag{12}$$

where $w_{min} = 0.6$ and $w_{max} = 0.8$, $m$ denotes the total number of iterations, and $i$ represents the current iteration count. The ratio $i/m$ increases progressively as the algorithm executes, tracking the progress of the search. In this dynamic regulation mechanism, the ratio $P_{best_i}/G_{best_i}$ characterizes the proximity between the current particle fitness and the global optimal solution, and its variation reflects the adaptive adjustment of the algorithm's search strategy. Specifically, when the particle fitness approaches the global optimum, the inertia weight is reduced to contract the search range, enabling a precise local search within the neighborhood of the optimal solution. Conversely, during the initial phase of the algorithm, a larger inertia weight preserves the particle's global exploration capability, ensuring comprehensive traversal of the solution space. By coupling the iteration progress $i/m$ with the fitness ratio $P_{best_i}/G_{best_i}$ through the factor $1 - \frac{i}{m}\cdot\frac{P_{best_i}}{G_{best_i}}$, this mechanism dynamically balances global exploration and local exploitation, achieving an adaptive transition from coarse-grained search to refined optimization.
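Equation (12) can be transcribed directly. The Python sketch below interprets $P_{best_i}$ and $G_{best_i}$ as the positive fitness values of the particle's personal best and of the global best, which is an assumption. Under a minimized fitness such as Equation (10) this ratio is at least 1, so the factor $1 - \frac{i}{m}\frac{P_{best_i}}{G_{best_i}}$ can become negative late in the run; clamping $w$ to $[w_{min}, w_{max}]$ is therefore offered as an optional safeguard that is not part of Equation (12).

```python
def adaptive_inertia_weight(
    iteration: int,
    max_iterations: int,
    pbest_fitness: float,
    gbest_fitness: float,
    w_min: float = 0.6,
    w_max: float = 0.8,
    clamp: bool = True,
) -> float:
    """Inertia weight of Equation (12) for one particle at one iteration."""
    if max_iterations <= 0:
        raise ValueError("max_iterations must be positive")
    if gbest_fitness == 0.0:
        raise ZeroDivisionError("global best fitness is zero; ratio undefined")
    progress = iteration / max_iterations        # i / m
    ratio = pbest_fitness / gbest_fitness        # P_best_i / G_best_i
    w = w_min + (1.0 - progress * ratio) * (w_max - w_min)
    if clamp:  # optional safeguard (assumption), not part of Equation (12)
        w = min(max(w, w_min), w_max)
    return w


print(adaptive_inertia_weight(0, 100, 2.0, 1.0))   # start of the run
print(adaptive_inertia_weight(50, 100, 1.0, 1.0))  # midway, particle at the global best
```

In an APSO loop, this value would replace the constant inertia weight of Equation (11), computed separately for each particle at each iteration.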

3.3. Reinforcement Learning Optimization Module

To address the springback and overcutting issues that occur during the milling of elastic materials, this paper further combines the APSO algorithm described above with reinforcement learning. The initial cutting path is first obtained by APSO; each APSO particle (i.e., a candidate path) is then further compensated by the reinforcement learning module to form the final compensated path. The overall objective is:

$$\min_x L_{total} = L_{PSO}(x) + \lambda L_{PPO}(x) \tag{13}$$

In this study, the Proximal Policy Optimization (PPO) algorithm, illustrated in Figure 4, was adopted to train our reinforcement learning model. The PPO algorithm is widely applied in various tasks due to its stability and efficiency. We simulate the milling process as an environment, input the position of each particle as the state to the model, and use the compensation amount output by the model to adjust the original path to reduce processing errors.

The reward function is the core of the reinforcement learning algorithm, determining the optimization direction of the compensation module. We design a multi-objective reward function that simultaneously optimizes processing precision, surface quality, and processing efficiency:

$$R_t = -\left(\delta_t^{+} + \delta_t^{-}\right) - \lambda \cdot \mathbb{1}\{flag_t = 0\} \tag{14}$$

where $\delta_t^{+}$ and $\delta_t^{-}$ are computed by evaluating the deviation of the recovered boundary from the ideal cutting edge; $\lambda$ is a fixed penalty (set to 5.0) for invalid cutting (i.e., incomplete or failed operations); and $\mathbb{1}\{flag_t = 0\}$ is an indicator function that takes a value of 1 when $flag_t$ indicates a machining failure state, and 0 otherwise.
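Equation (14) translates into a short reward function. In the Python sketch below, `delta_plus` and `delta_minus` are assumed to be non-negative deviation magnitudes already measured on the recovered boundary (how they are measured is outside this sketch), `flag == 0` marks a failed cut as in the text, and the default penalty is the value $\lambda = 5.0$ given above.

```python
def step_reward(delta_plus: float, delta_minus: float, flag: int, penalty: float = 5.0) -> float:
    """Per-step reward R_t of Equation (14).

    delta_plus, delta_minus: non-negative boundary deviations (assumed magnitudes).
    flag: machining status; flag == 0 indicates a failed or invalid cut.
    penalty: fixed penalty lambda for invalid cutting (5.0 in the text).
    """
    if delta_plus < 0 or delta_minus < 0:
        raise ValueError("deviations are expected as non-negative magnitudes")
    failed = 1.0 if flag == 0 else 0.0  # indicator 1{flag_t = 0}
    return -(delta_plus + delta_minus) - penalty * failed


print(step_reward(0.3, 0.2, flag=1))  # successful cut
print(step_reward(0.3, 0.2, flag=0))  # failed cut
```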

This formulation implicitly enforces a policy that prefers actions resulting in minimal recovery error and successful cuts, guiding the agent toward fine-tuned compensation distributions that preserve geometric fidelity after the cut. Although the current implementation uses a fixed scalar weight $\lambda$, the reward function structure is inherently multi-objective:

$$R = w_1 \cdot \Delta_{reduction} - w_2 \cdot \delta_{overcut} - w_3 \cdot \delta_{undercut} - w_4 \cdot FailurePenalty \tag{15}$$

The physical meanings and functions of each indicator are as follows:

(1) Residual Reduction ($\Delta_{reduction}$): Represents the reduction in cutting residuals before and after compensation, serving as the core performance metric. A larger value indicates a more significant decrease in residual errors after compensation, and the optimization objective is to maximize it. In the reward function, it acts as a positive term, weighted by $w_1$ to incentivize improved accuracy.

(2) Overcut Amount ($\delta_{overcut}$): Quantifies the extent to which the actual trajectory exceeds the target boundary, measured in pixel units. Overcutting leads to unintended material removal and is considered a negative indicator. In the reward function, it is penalized through the weight $w_2$, which enters with a negative sign, driving the optimization toward minimizing this value.

(3) Undercut Amount ($\delta_{undercut}$): Quantifies the shortfall distance between the actual toolpath and the target boundary, measured in millimeter/pixel units. Undercutting leaves residual material on the workpiece surface after machining. It acts as a negative term penalized by the weight $w_3$ and is minimized to ensure first-pass machining success.

(4) Machining Failure Penalty (FailurePenalty): A fixed-magnitude penalty is triggered when critical failures occur. It delivers a substantial negative reward upon failure detection and takes precedence over the other objectives (safety-critical).
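A sketch of the weighted reward in Equation (15) is given below in Python. The text does not report values for $w_1, \dots, w_4$, so the unit defaults are placeholders, and using 5.0 as the magnitude of FailurePenalty is borrowed from the penalty $\lambda$ of Equation (14) as an assumption. Choosing $w_4$ large relative to the other weights is one way to let the failure term take precedence, as item (4) requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Weights w1..w4 of Equation (15); the unit defaults are placeholders."""
    w1: float = 1.0  # residual reduction (rewarded)
    w2: float = 1.0  # overcut (penalized)
    w3: float = 1.0  # undercut (penalized)
    w4: float = 1.0  # machining failure (penalized)


def multi_objective_reward(
    residual_before: float,
    residual_after: float,
    overcut: float,
    undercut: float,
    failed: bool,
    weights: RewardWeights = RewardWeights(),
    failure_penalty: float = 5.0,
) -> float:
    """Reward R of Equation (15) for one compensation step."""
    reduction = residual_before - residual_after   # Delta_reduction
    penalty = failure_penalty if failed else 0.0   # FailurePenalty
    return (weights.w1 * reduction
            - weights.w2 * overcut
            - weights.w3 * undercut
            - weights.w4 * penalty)


print(multi_objective_reward(10.0, 6.0, overcut=0.5, undercut=1.0, failed=False))
```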

The PPO algorithm is adopted to train the policy network through the following steps; the complete execution flow is detailed in Algorithm 1:

(1) Environment Interaction: The agent interacts with the milling environment, executing action $a_t$ and observing the immediate reward $r_t$ and the new state $s_{t+1}$.

(2) Experience Replay: Store state–action–reward–next-state transitions $(s_t, a_t, r_t, s_{t+1})$ in an experience buffer for subsequent training.

(3) Objective Function Optimization: Use the PPO clipped objective function for policy updates, limiting the difference between the new and old policies to ensure training stability:

$$L_{PPO}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta), 1 - \epsilon, 1 + \epsilon\right)\hat{A}_t\right)\right] \tag{16}$$

where $\theta$ represents the current policy parameters; $r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_{old}}(a_t \mid s_t)$ is the probability ratio between the new and old policies; $\hat{A}_t$ is the estimated advantage function, typically computed using Generalized Advantage Estimation (GAE); and $\epsilon$ is a small threshold (e.g., 0.2) controlling the clipping range.

This formulation encourages the agent to improve the policy based on the advantage estimation while constraining large deviations from the previous policy, ensuring training stability.

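The PPO objective is typically composed of three parts:

$$L = L_{clip} - c_1 \cdot L_{value} + c_2 \cdot L_{entropy} \tag{17}$$

where $L_{value}$ is the value-function (critic) loss, $L_{entropy}$ is an entropy bonus that encourages exploration, and $c_1$ and $c_2$ are weighting coefficients (distinct from the PSO acceleration coefficients in Equation (11)); the combined objective is maximized.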

(4) Value Network Update: Use Generalized Advantage Estimation (GAE) to estimate the advantage function and update the Critic network.

(5) Dynamic Weight Adjustment: Based on historical reward values and processing goals, dynamically adjust the weights of the multi-objective reward function to achieve adaptive optimization.
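The clipped surrogate of Equation (16) and the combined objective of Equation (17) are sketched below in plain Python over a batch of samples. The coefficients $c_1 = 0.5$ and $c_2 = 0.01$ are common defaults rather than values reported in this work, the value loss is taken as the mean squared error against empirical returns, and per-sample entropies are assumed to be supplied by the policy. A practical implementation would compute these quantities in an automatic-differentiation framework so that the objective can be maximized by gradient ascent.

```python
import math
from typing import Sequence


def probability_ratio(logp_new: float, logp_old: float) -> float:
    """r_t(theta) = pi_theta(a|s) / pi_theta_old(a|s), computed from log-probabilities."""
    return math.exp(logp_new - logp_old)


def clipped_surrogate(ratios: Sequence[float], advantages: Sequence[float],
                      eps: float = 0.2) -> float:
    """Batch mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A), Equation (16)."""
    terms = []
    for r, a in zip(ratios, advantages):
        r_clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        terms.append(min(r * a, r_clipped * a))
    return sum(terms) / len(terms)


def ppo_objective(ratios: Sequence[float], advantages: Sequence[float],
                  value_preds: Sequence[float], returns: Sequence[float],
                  entropies: Sequence[float], eps: float = 0.2,
                  c1: float = 0.5, c2: float = 0.01) -> float:
    """L = L_clip - c1 * L_value + c2 * L_entropy, Equation (17); to be maximized."""
    l_clip = clipped_surrogate(ratios, advantages, eps)
    l_value = sum((v - g) ** 2 for v, g in zip(value_preds, returns)) / len(returns)
    l_entropy = sum(entropies) / len(entropies)
    return l_clip - c1 * l_value + c2 * l_entropy


ratios = [probability_ratio(math.log(0.6), math.log(0.4)),  # about 1.5, clipped to 1.2
          probability_ratio(math.log(0.2), math.log(0.4))]  # about 0.5
print(clipped_surrogate(ratios, [1.0, -1.0]))
```

For a positive advantage the clip caps the gain at $1 + \epsilon$, and for a negative advantage the minimum keeps the more pessimistic term, which is what limits the size of each policy update.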

Algorithm 1 PPO-based Milling Path Compensation Module

1: Input: prepared tool-path image set $S = \{s_{0}, s_{1}, \dots, s_{n}\}$; initial policy parameters $θ_{0}$ and value network parameters $ϕ_{0}$; action set $A = \{a_{1}, a_{2}, \dots, a_{k}\}$; maximum number of iterations $T$; clipping coefficient $ϵ$; learning rate $α$.
2: Output: trained policy $π_{θ}$ for generating the compensation sequence.
3: Initialize the policy network $π_{θ}$ and the value network $V_{ϕ}$ with $θ_{0}$ and $ϕ_{0}$.
4: for $k = 1$ to $T$ do
5: Set $θ_{old} \leftarrow θ$; sample an image $s \in S$, initialize the machining environment $E$ with it as the initial state, and clear buffer $B$
6: for each time step $t$ in the episode do
7: Select action $a_{t} \sim π_{θ}(a_{t} | s_{t})$
8: Execute $a_{t}$ in $E$; observe reward $r_{t}$ and next state $s_{t + 1}$
9: Store transition $(s_{t}, a_{t}, r_{t}, s_{t + 1})$ in buffer $B$
10: end for
11: Compute advantage estimates $\hat{A}_{t}$ from $B$ using GAE
12: Compute the policy ratio $r_{t}(θ) = π_{θ}(a_{t} | s_{t}) / π_{θ_{old}}(a_{t} | s_{t})$
13: Optimize the clipped PPO objective (Equation (16))
14: Update the policy network parameters $θ$ by gradient ascent with learning rate $α$
15: Update the value network parameters $ϕ$ by minimizing the MSE loss
16: end for
17: return the optimized policy $π_{θ}$
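To show how the steps of Algorithm 1 fit together, here is a deliberately tiny, self-contained PPO-clip loop in NumPy. It is a sketch under strong simplifying assumptions, not the authors' implementation: a single state replaces the tool-path images, each episode lasts one step (so GAE reduces to A = r - V), the policy is a softmax over k discrete actions standing in for the compensation action set A, and the critic is a single scalar. All names and hyperparameters are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(x - x.max())
    return e / e.sum()


def train_toy_ppo(mean_rewards, iterations=200, batch=64, epochs=4,
                  eps=0.2, alpha=0.1, lr_value=0.5, noise=0.1, seed=0):
    """Toy PPO-clip loop following the structure of Algorithm 1.

    Action a is a hypothetical compensation action whose reward is
    mean_rewards[a] plus Gaussian noise; episodes last a single step.
    """
    rng = np.random.default_rng(seed)
    mean_rewards = np.asarray(mean_rewards, dtype=float)
    k = len(mean_rewards)
    theta = np.zeros(k)  # policy logits (policy parameters θ)
    value = 0.0          # scalar value estimate (value parameters ϕ)
    for _ in range(iterations):
        pi_old = softmax(theta)  # θ_old ← θ
        # Roll out a batch of one-step episodes with the old policy.
        actions = rng.choice(k, size=batch, p=pi_old)
        rewards = mean_rewards[actions] + noise * rng.standard_normal(batch)
        adv = rewards - value    # one-step advantage estimate
        for _ in range(epochs):
            pi = softmax(theta)
            ratio = pi[actions] / pi_old[actions]
            # Where the clipped term attains the min in Eq. (16), the gradient is zero.
            clipped = ((adv > 0) & (ratio > 1 + eps)) | ((adv < 0) & (ratio < 1 - eps))
            weights = np.where(clipped, 0.0, adv * ratio)
            # d(r_t * A_t)/d(theta) = A_t * r_t * (onehot(a_t) - pi) for softmax logits.
            grad = (weights[:, None] * (np.eye(k)[actions] - pi)).sum(axis=0) / batch
            theta = theta + alpha * grad  # gradient ascent on the surrogate
        # Critic: one gradient step on the MSE loss (V - r)^2 (factor 2 folded into lr).
        value += lr_value * float(np.mean(rewards - value))
    return softmax(theta), value


policy, v = train_toy_ppo([0.0, 1.0, 0.2])
print(policy.round(3), round(v, 3))
```

Calling `train_toy_ppo([0.0, 1.0, 0.2])` concentrates the policy on the action with the highest mean reward (index 1). Because the gradient of the clipped objective vanishes once a sample's ratio leaves $[1 - ϵ, 1 + ϵ]$ in the advantageous direction, repeated epochs over the same batch cannot push the policy arbitrarily far from $π_{θ_{old}}$, which is the stabilizing effect described for Equation (16).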

4. Experiment and Analysis

4.1. Typical Scenarios

Taking the workpiece in Figure 5 as an example, the black region represents the target shape to be machined and the gray region represents the material to be removed. After cutting, the optimal path obtained with the traditional particle swarm optimization algorithm leaves a larger margin than the path obtained with RAPSO.

Figure 5 compares the machining results of PSO, APSO (adaptive PSO), and RAPSO and highlights the differences in path optimization. The trajectory generated by RAPSO lies closest to the target boundary and leaves the least residual material after cutting.

Figure 6 plots the residual allowance of the three algorithms over 100 iterations. RAPSO converges markedly faster than PSO and APSO, with the clearest advantage in the early iterations, which reflects the effectiveness of the reinforcement learning compensation strategy.

4.2. Multiple Scene Statistics Results

To further verify the generality of the algorithm across machining tasks with different complex workpieces, four typical machining cases were randomly selected (Figure 7) for comparative experiments, and the residual allowance (in pixels) of PSO, APSO, and RAPSO at their respective optimal solutions was recorded. The experimental results are shown in Table 1.

According to Table 1, the adaptive weight adjustment strategy enables APSO to reduce residual pixels by an average of 6.50% (over the four cases) compared with the traditional PSO algorithm. Building on this, RAPSO adds the reinforcement learning path compensation mechanism and reduces residual pixels by a further 28.47% on average relative to APSO, which markedly improves the accuracy of the planned paths. These results indicate that the improved algorithms enhance cutting accuracy.
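The improvement rates in Table 1 follow directly from the pixel counts. A short Python check recomputes them, assuming improvement = (before - after) / before × 100%, as implied by the table note; the helper name is hypothetical.

```python
from statistics import mean

# Residual allowance (pixels) per case, copied from Table 1.
pso = [644, 701, 798, 644]
apso = [569, 684, 776, 585]
rapso = [364, 573, 465, 459]


def improvement(before, after):
    """Relative reduction in residual pixels, in percent, for each case."""
    return [100.0 * (b - a) / b for b, a in zip(before, after)]


imp1 = improvement(pso, apso)    # Improvement_1: APSO vs. PSO
imp2 = improvement(apso, rapso)  # Improvement_2: RAPSO vs. APSO
imp3 = improvement(pso, rapso)   # RAPSO vs. PSO (not a column in Table 1)

for name, rates in [("APSO vs PSO", imp1), ("RAPSO vs APSO", imp2), ("RAPSO vs PSO", imp3)]:
    print(name, [round(x, 2) for x in rates], round(mean(rates), 2))
```

This reproduces the per-case percentages and the 6.50% and 28.47% averages given in the text. Using the same four cases, the mean per-case reduction of RAPSO relative to PSO is about 33.05%.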

4.3. Experimental Validation of Robotic Machining System

To validate the proposed methods experimentally, a complete robotic machining system was built (Figure 8). It comprises a high-precision six-axis manipulator (Rokae NB25, Rokae Robotics Co., Ltd., Beijing, China; repeatability ±0.03 mm; payload 25 kg), a 3D depth-sensing camera array (Mech-Mind, Mech-Mind Robotics Co., Ltd., Beijing, China; spatial resolution 0.05 mm; field of view 500 × 500 mm), a trajectory generation workstation, and a CNC end-milling unit (maximum spindle speed 20,000 rpm; 2.2 kW). A double-edged (two-flute) ball-end mill was used at a spindle speed of 100 r/s (6000 rpm) and a feed rate of 0.5 mm/s.

The 3D imaging subsystem captures the surface topography as point clouds, which are compared with the target STL model to identify the discrete layers to be machined. The trajectory generation workstation runs the two proposed algorithms to synthesize optimized toolpaths and coordinates the robot motion and spindle parameters in real time. The experimental results (Figure 9) show that all methods achieve dimensional accuracy within the specified tolerance (≤2 mm). The algorithms mitigate two critical problems in machining elastic materials: localized overcutting during tool engagement and residual deformation (springback) during unloading. The improved surface finish further supports the effectiveness of the algorithms in maintaining dimensional stability and preserving geometric integrity over repeated machining cycles.
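The stated cutting parameters can be converted into spindle speed in rpm and feed per tooth. The worked calculation below assumes that the double-edged ball-end cutter has $z = 2$ flutes; the flute count is an assumption, not an explicitly reported value.

```latex
n = 100\ \mathrm{r/s} = 100 \times 60\ \mathrm{r/min} = 6000\ \mathrm{rpm}

f_{z} = \frac{v_{f}}{z\, n} = \frac{0.5\ \mathrm{mm/s}}{2 \times 100\ \mathrm{r/s}} = 0.0025\ \mathrm{mm} = 2.5\ \mu\mathrm{m}\ \text{per tooth}
```

Under this assumption the spindle runs at 30% of the unit's 20,000 rpm maximum, and the chip load per tooth is very small.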

5. Conclusions

This study introduces RAPSO, a hybrid optimization framework for high-precision robotic milling of elastic materials, which integrates adaptive weight particle swarm optimization (PSO) with a PPO-based reinforcement learning module. By incorporating a residual-based milling model, dynamic inertia weight adjustment, and learning-based trajectory compensation, RAPSO effectively addresses challenges such as springback and overcutting. Experimental results demonstrate that RAPSO reduces residual material by 33.51% compared with standard PSO, achieves faster convergence, and maintains higher stability, thereby improving machining accuracy and efficiency and reducing post-processing requirements.

The proposed framework makes both practical and theoretical contributions. It offers a robust solution for the precision machining of elastic materials while advancing the modeling of elastic recovery and enabling more effective toolpath planning. Future work will refine the modeling of elastic recovery and milling allowance by combining empirical formulas with multi-fidelity Kriging surrogate models, and will evaluate RAPSO in more diverse industrial environments, including different materials, tools, and multi-axis milling scenarios, to assess its robustness, scalability, and generalizability.

Overall, RAPSO demonstrates significant improvements in the robotic milling of elastic materials and lays a solid foundation for future research in high-precision manufacturing applications.

Author Contributions

Q.L. designed the study, drafted the manuscript, and acquired funding. P.Z. contributed to the methodology development and critical revision of the paper. Q.W. carried out software implementation, data analysis, and visualization. Z.Z. supervised the project and provided resources. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (92267301), the National Key Research and Development Program of China (2024YFB4711105) and the State Key Laboratory of Robotics (2025-Z02-02).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Wang, C.C.; Kuo, P.H.; Chen, G.Y. Machine learning prediction of turning precision using optimized xgboost model. Appl. Sci. 2022, 12, 7739.
Lu, F.; Zhou, G.; Zhang, C.; Liu, Y.; Chang, F.; Lu, Q.; Xiao, Z. Energy-efficient tool path generation and expansion optimisation for five-axis flank milling with meta-reinforcement learning. J. Intell. Manuf. 2025, 36, 3817–3841.
Cetin, A.; Atali, G.; Erden, C.; Ozkan, S.S. Assessing the performance of state-of-the-art machine learning algorithms for predicting electro-erosion wear in cryogenic treated electrodes of mold steels. Adv. Eng. Inform. 2024, 61, 102468.
Bravo, U.; Altuzarra, O.; De Lacalle, L.L.; Sánchez, J.; Campa, F. Stability limits of milling considering the flexibility of the workpiece and the machine. Int. J. Mach. Tools Manuf. 2005, 45, 1669–1680.
Budak, E. Analytical models for high performance milling. Part I: Cutting forces, structural deformations and tolerance integrity. Int. J. Mach. Tools Manuf. 2006, 46, 1478–1488.
Campa, F.; De Lacalle, L.L.; Celaya, A. Chatter avoidance in the milling of thin floors with bull-nose end mills: Model and stability diagrams. Int. J. Mach. Tools Manuf. 2011, 51, 43–53.
Marin, F. Five-Axis Milling of Rough and PBF-LB Parts with Free-Form Surfaces Using Ball-End and Circle-Segment end Mills. Ph.D. Thesis, European Humanities University (EHU), Vilnius, Lithuania, 2024.
Varga, J.; Demko, M.; Kaščák, L.; Ižol, P.; Vrabel’, M.; Brindza, J. Influence of Tool Inclination and Effective Cutting Speed on Roughness Parameters of Machined Shaped Surfaces. Machines 2024, 12, 318.
Urbikain-Pelayo, G.; Olvera-Trejo, D.; de Lacalle, L.N.L.; Elías-Zuñiga, A.; Cabanes, I. Mill+, an intuitive tool for simulating the milling process: Vibrations, cutting forces and surface quality control. SoftwareX 2025, 30, 102114.
Zhang, Y.; Wang, C.; Hu, L.; Qiu, G. Inverse kinematics problem of industrial robot based on PSO-RBFNN. In Proceedings of the 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chongqing, China, 12–14 June 2020; IEEE: Piscataway, NJ, USA, 2020; Volume 1, pp. 346–350.
Bai, Y.; Yuan, Z.; Yan, Y.; Liu, S. Multiobjective optimization for the bed structure of a CNC gantry machine tool based on neural networks and intelligent optimization algorithms. Sci. Prog. 2025, 108, 00368504251359073.
Zhao, Y. Optimization of machining path for integral impeller side milling based on SA-PSO fusion algorithm in CNC machine tools. Front. Mech. Eng. 2024, 10, 1361929.
Li, B.; Tian, X. An effective PSO-LSSVM-based approach for surface roughness prediction in high-speed precision milling. IEEE Access 2021, 9, 80006–80014.
Abualigah, L. Particle Swarm Optimization: Advances, Applications, and Experimental Insights. Comput. Mater. Contin. 2025, 82, 1539–1592.
Pajaziti, A.; Tafilaj, O.; Gjelaj, A.; Berisha, B. Optimization of Toolpath Planning and CNC Machine Performance in Time-Efficient Machining. Machines 2025, 13, 65.
Abbas, A.T.; Abubakr, M.; Hassan, M.A.; Luqman, M.; Soliman, M.S.; Hegab, H. An adaptive design for cost, quality and productivity-oriented sustainable machining of stainless steel 316. J. Mater. Res. Technol. 2020, 9, 14568–14581.
Kaood, A.; Abubakr, M.; Al-Oran, O.; Hassan, M.A. Performance analysis and particle swarm optimization of molten salt-based nanofluids in parabolic trough concentrators. Renew. Energy 2021, 177, 1045–1062.
Xiao, J.; Liu, S.; Liu, H.; Wang, M.; Li, G.; Wang, Y. A jerk-limited heuristic feedrate scheduling method based on particle swarm optimization for a 5-DOF hybrid robot. Robot. Comput.-Integr. Manuf. 2022, 78, 102396.
Liu, G.; Li, Q.; Yang, B.; Zhang, H.; Fang, L. An efficient linear programming-based time-optimal feedrate planning considering kinematic and dynamics constraints of robots. IEEE Robot. Autom. Lett. 2024, 9, 2742–2749.
Liu, G.; Tan, F.; Zhang, M.; Li, Q.; Fang, L. AS-shaped feedrate based NURBS interpolator for synchronization of robot tool tip trajectory and attitude. Asian J. Control 2025.
El-Kenawy, E.S.M.; Zerouali, B.; Bailek, N.; Bouchouich, K.; Hassan, M.A.; Almorox, J.; Kuriqi, A.; Eid, M.; Ibrahim, A. Improved weighted ensemble learning for predicting the daily reference evapotranspiration under the semi-arid climate conditions. Environ. Sci. Pollut. Res. 2022, 29, 81279–81299.
Bashir, R.N.; Khan, F.A.; Khan, A.A.; Tausif, M.; Abbas, M.Z.; Shahid, M.M.A.; Khan, N. Intelligent optimization of Reference Evapotranspiration (ETo) for precision irrigation. J. Comput. Sci. 2023, 69, 102025.
Xiao, Q.; Li, C.; Tang, Y.; Li, L. Meta-reinforcement learning of machining parameters for energy-efficient process control of flexible turning operations. IEEE Trans. Autom. Sci. Eng. 2019, 18, 5–18.
Zhang, H.; Wang, W.; Zhang, S.; Zhang, Y.; Zhou, J.; Wang, Z.; Huang, B.; Huang, R. A novel method based on deep reinforcement learning for machining process route planning. Robot. Comput.-Integr. Manuf. 2024, 86, 102688.
Kaliyannan, D.; Thangamuthu, M.; Pradeep, P.; Gnansekaran, S.; Rakkiyannan, J.; Pramanik, A. Tool condition monitoring in the milling process using deep learning and reinforcement learning. J. Sens. Actuator Netw. 2024, 13, 42.
Danket, T.; Tanachutiwat, S.; Rungreunganun, V. Adaptive Production Capacity Planning Under Variable Electricity Cost Using Deep Reinforcement Learning. Int. J. Integr. Eng. 2025, 17, 17–30.
Choi, G.R.; Yang, H.; Lee, J.H.; Runfa, T.; Cho, I.S.; Park, S.J.; Lee, C.G.; Kang, J.K. Explainable AI-based evaluation of factors affecting heavy metal removal by microalgae-based adsorbents. J. Appl. Phycol. 2025, 1–14.
Makulavičius, M.; Petkevičius, S.; Rožėnė, J.; Dzedzickis, A.; Bučinskas, V. Industrial robots in mechanical machining: Perspectives and limitations. Robotics 2023, 12, 160.
Liu, X.; Jun, M.B.; DeVor, R.E.; Kapoor, S.G. Cutting mechanisms and their influence on dynamic forces, vibrations and stability in micro-endmilling. In Proceedings of the ASME International Mechanical Engineering Congress and Exposition, Anaheim, CA, USA, 13–19 November 2004; Volume 47136, pp. 583–592.

Figure 1. Rebound phenomenon in milling elastic materials.

Figure 2. Relationship diagram between elastic recovery amount, elastic recovery coefficient, and cutting thickness.

Figure 3. Description of cutting process. (a) Definition of milling allowance; (b) The process of material edge shape change during cutting.

Figure 4. Overall flowchart of the optimization.

Figure 5. Comparison of milling processing results.

Figure 6. Change in residual allowance over the iterations.

Figure 7. Four randomly generated machining cases.

Figure 8. Robot milling experimental platform.

Figure 9. Comparison of milling experiments.

Table 1. Residual allowance (pixels) of PSO, APSO, and RAPSO in four machining cases.

Case	PSO (pixels)	APSO (pixels)	RAPSO (pixels)	Improvement_1	Improvement_2
1	644	569	364	11.65%	36.03%
2	701	684	573	2.43%	16.23%
3	798	776	465	2.76%	40.08%
4	644	585	459	9.16%	21.54%

Improvement_1 is the improvement rate of APSO relative to PSO, (PSO - APSO)/PSO × 100%. Improvement_2 is the improvement rate of RAPSO relative to APSO, (APSO - RAPSO)/APSO × 100%.


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Li, Q.; Zeng, P.; Wu, Q.; Zhang, Z. RAPSO: An Integrated PSO with Reinforcement Learning and an Adaptive Weight Strategy for the High-Precision Milling of Elastic Materials. Sensors 2025, 25, 5913. https://doi.org/10.3390/s25185913


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
