Article

Controlling Product Properties in Forming Processes Using Reinforcement Learning—An Application to V-Die Bending

by Ciarán-Victor Veitenheimer *, Dirk Alexander Molitor, Viktor Arne and Peter Groche
Institute for Production Engineering and Forming Machines, Technical University of Darmstadt, Otto-Berndt-Straße 2, 64287 Darmstadt, Germany
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(10), 5483; https://doi.org/10.3390/app15105483
Submission received: 20 March 2025 / Revised: 22 April 2025 / Accepted: 9 May 2025 / Published: 14 May 2025
(This article belongs to the Special Issue Advanced Digital Design and Intelligent Manufacturing)

Abstract

Uncertainty is unavoidable in forming processes due to fluctuating properties of the semi-finished product, the tool system and the environment. For this reason, numerous scientists have addressed this issue by developing control approaches such as self-optimizing machine tools or the closed-loop control of product properties. Machine learning algorithms, in particular reinforcement learning (RL) methods, show promising results for controlling production processes in this way. In this paper, the application of RL is demonstrated on V-die bending, a process in widespread industrial use. For this purpose, a flexible tool system is first developed that allows the bending angle to be adjusted continuously between 80° and 110°. The developed tool is initially modelled in an FEM simulation in order to create a sufficient database for training an RL agent for springback compensation. The pre-trained agent is then used to control the springback in the real process. To close the resulting sim-to-real gap, it is retrained on the experimentally generated data. It is shown that the springback can be significantly reduced compared to the uncontrolled case in both the simulative and the experimental process.

1. Introduction

Uncertainty is omnipresent in forming processes and is reflected in fluctuating product properties. Fluctuating influences from the semi-finished product, the tool, the machine and the environment prevent reliable manufacturing of components. For this reason, scientists have for several years been researching approaches that contribute to the control of uncertainty. Self-optimizing machine tools [1] and closed-loop control of product properties [2] are seen as promising higher-level approaches. Both have in common that information about the product properties of previous production steps is used to manipulate the product properties of subsequent workpieces. For this purpose, models are often used that draw on relationships between the control variables of the process and the product properties. In most cases, these relationships are highly complex, interdependent and non-linear; they cannot be captured by analytical models or are unsuitable for control due to long computing times [3]. Therefore, machine learning algorithms are increasingly used, which allow a data-driven description of the relationships and whose model parameters are fitted such that manually parameterized cost functions are minimized.
Reinforcement Learning (RL) algorithms are particularly suitable for deriving optimal control variables in forming processes. Using the example of a Finite-Element Method (FEM) deep drawing simulation, Dornheim et al. [4] show that the blankholder force can be optimized at discrete time steps with respect to sheet thinning, cracking and stress peaks. Idzik et al. [5] show that coupling an RL algorithm with an analytical model in a rolling process leads to pass schedules that optimize both the mechanical properties of the components and the energy efficiency of the process. A similar approach is presented by Reinisch et al. [6] for a multi-stage open-die forging process, where the RL algorithm is rewarded for low geometry deviations of the components as well as for high utilization of the available press force. Deng et al. [7] demonstrate the suitability of RL algorithms for controlling the sheet thickness in sheet metal strip rolling by adjusting the roll forces and tilts, resulting in reduced sheet thickness variations.
First applications of RL algorithms for the design and control of forming processes can thus be found in the literature. However, their application is largely limited to simulative investigations. This is due to implementation barriers that make it difficult to transfer policies generated from simulation data to real processes, including sim-to-real gaps [8] that often cannot be closed due to a lack of experimental data, and latencies between reference and manipulated variables that are not simulated [9]. The full potential of RL for forming processes has therefore not yet been exploited. In this paper, we present an approach by which RL agents trained on FEM simulation data can be transferred to experimental operation and contribute to the control of product properties. As an example, a flexible die bending process is used that allows the production of different bending angles from different materials. On the one hand, the process is modelled using an FEM simulation, which is used to generate synthetic data; on the other hand, a flexible die bending tool is designed and manufactured to carry out the experimental tests.
The paper is organized as follows. Section 2 introduces the basic principles of the die bending process and of RL algorithms, which are fundamental to the understanding of the paper. Section 3 presents the methodology for RL-based control of the die bending process: the setup of the FEM simulation, the experimental setup and the procedure for applying and training the RL algorithm. Section 4 first presents simulative results on the performance of the RL algorithm under different levels of uncertainty (Section 4.1) and then its performance in the real process (Section 4.2). The paper concludes with a summary of the results in Section 5.

2. State of the Art

2.1. Die Bending

Die bending is a bending process with a straight tool movement in which the tool system consists of a punch and a die. Depending on the tool shape, a distinction is made between V- and U-die bending. It is widely used industrially due to its robustness compared to other bending processes. As long as the bending part has not established full contact with punch and die, the operation is referred to as free bending. As soon as the workpiece establishes form closure in the tool, i.e., the bending angle equals the die angle, free bending is complete and the coining phase begins. Coining is characterized by a sharp increase in the process force $F_P$ required to press the workpiece into the die. It significantly increases the dimensional accuracy of the finished component and makes precision die bending the most accurate bending process. No exact calculation of the punch force $F_{P,\mathrm{req}}$ required for coining is known from the literature, and this force can become almost arbitrarily large. It can, however, be estimated approximately, e.g., according to [10] using Equation (1). Here, $\alpha$ is the die angle, $r_P$ the punch radius, $b$ the sheet width and $s_0$ the initial sheet thickness of the semi-finished product. Frictional influences are taken into account by the friction coefficient $\mu$; the tensile strength $R_m$, the yield strength $R_{p0.2}$ and the uniform elongation $A_g$ represent material parameters.
$$ F_{P,\mathrm{req}} = b\,s_0 \left[ \frac{2\left(R_m - R_{p0.2}\right)}{3\,A_g} \left( \frac{s_0}{r_P + \frac{s_0}{2}} \right)^{2} + R_{p0.2}\,\frac{s_0}{r_P + \frac{s_0}{2}} \right] \left( \cos\frac{\alpha}{2} + \mu \sin\frac{\alpha}{2} \right) \frac{\alpha\,\pi}{2 \cdot 180} \tag{1} $$
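To make Equation (1) concrete, the sketch below evaluates the estimate numerically. It assumes the reconstruction of the equation given above; the function name, the unit conventions and all example values are our own illustrative choices, not taken from the paper.

```python
import math

def coining_force(b, s0, r_p, alpha_deg, mu, R_m, R_p02, A_g):
    """Approximate punch force for coining in V-die bending, after Eq. (1) and [10].

    Assumed units: lengths in mm, stresses in MPa, alpha in degrees; returns N.
    """
    ratio = s0 / (r_p + s0 / 2)                       # bending-geometry ratio
    hardening = 2 * (R_m - R_p02) / (3 * A_g) * ratio**2
    yield_term = R_p02 * ratio
    friction = math.cos(math.radians(alpha_deg / 2)) + mu * math.sin(math.radians(alpha_deg / 2))
    angle_factor = alpha_deg * math.pi / (2 * 180)    # degree-based scaling factor
    return b * s0 * (hardening + yield_term) * friction * angle_factor

# Example: a 1 mm thick, 50 mm wide mild-steel-like sheet (illustrative values only)
F = coining_force(b=50, s0=1.0, r_p=2.0, alpha_deg=90, mu=0.1,
                  R_m=410, R_p02=280, A_g=0.18)
print(f"Estimated coining force: {F / 1000:.1f} kN")
```

Evaluating such a function over all material and sheet thickness combinations and taking the maximum corresponds to the spring dimensioning described later in Section 3.1 (Equation (9)).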
Although die bending results in higher dimensional accuracy compared to free bending processes, it usually requires a corresponding tool for each bending angle. In order to reach high accuracies, the process parameters usually have to be determined empirically. If process conditions fluctuate, such as varying materials or sheet thicknesses, this process often has to be repeated individually. A large number of tools, lengthy process setup and high forces can quickly result in high process costs. Moreover, it is often not possible to react appropriately to the uncertainty in forming processes, induced by the semi-finished product, tool, machine and environment, due to the rigidity of the process.

2.1.1. Springback

Dimensional inaccuracies occur during bending in the form of springback, which is caused by the state of stress in the bent component.
Figure 1 schematically shows two characteristic stress curves in the sheet metal during a die bending process. The dotted line represents the neutral fiber, which is characterized by a stress-free state. In the area above (below) the neutral fiber, compressive (tensile) stresses prevail, highlighted in red (blue) in Figure 1. These stresses increase in magnitude with the distance from the neutral fiber, which is evident in the effective stresses during the bending process (blue line). The elastic portion of the deformation recovers after the component is unloaded, and the residual stresses (red line) thus cause springback in the component. In order to distinguish clearly between the different bending angles, they are defined as follows:
  • $\alpha_{\mathrm{des}}^{(\mathrm{unl})}$: desired bending angle after unloading the component;
  • $\alpha_{\mathrm{act}}^{(\mathrm{unl})}$: actual bending angle after unloading the component;
  • $\alpha_{\mathrm{des}}^{(\mathrm{loa})}$: desired bending angle with loaded component (in process);
  • $\alpha_{\mathrm{act}}^{(\mathrm{loa})}$: actual bending angle with loaded component (in process).
Figure 2 shows this graphically. The schematically depicted probability density functions imply that the variables describing the actual state are subject to uncertainty. The springback of the workpiece can thus be defined as $\Delta\alpha = \alpha_{\mathrm{act}}^{(\mathrm{unl})} - \alpha_{\mathrm{act}}^{(\mathrm{loa})}$.

2.1.2. Control Approaches

There are numerous examples in the literature of how to compensate for springback during sheet metal bending. Numerical or experimental tests are usually carried out to investigate the springback for specific combinations of material, sheet thickness and bending radius. However, compensation usually refers to (iterative) design adjustments of the tool system to match the desired product geometry. References [11,12] both propose approaches based on the displacement adjustment method, in which the shape deviation is iteratively reduced based on the displacement of simulation nodes; this can therefore be time consuming. Due to the higher flexibility of air bending, multiple approaches for controlling the springback inline can be found in the literature. One possible approach is incremental bending. Reference [13] presents press brake bending in which the punch stroke is divided into incremental steps, after each of which the workpiece is unloaded. From the collected pairs of loaded and unloaded bending angles, the actual thickness and material properties of the workpiece are estimated with an analytical model, and this information is then used to predict the springback using plane strain theory. Similarly, reference [14] uses a semi-analytical process simulation in combination with online measurements to compensate for springback during air bending. To compensate for the effects of fluctuating material properties on the bending angle, reference [15] uses force signals from an adjacent forming stage of a progressive die for material characterization and demonstrates that a LASSO regression can reduce the deviation by 24%. More recent studies focus on artificial neural networks for springback prediction. Reference [16] shows an example in which neural networks are trained on FEM simulations to predict springback; the predictions align well with the simulations and thus represent an inline-capable alternative to FEM. Reference [17] developed a tool path planning strategy based on RL and supervised learning, evaluated in a case study involving an FEM simulation of a rubber tool forming process, and demonstrated a substantial reduction in geometry deviations. Reference [18] presents a method for predicting springback in air bending of high-strength sheet metal using a neural network; the model predicts springback from process parameters such as sheet thickness, punch radius and material coefficients and was validated by numerical and experimental tests. Reference [19] develops a neural network control system along with a stepped binder force trajectory to control the springback in a steel channel forming process. To demonstrate the generalizability of the model, configurations of material and lubrication conditions not used for training were employed. Although significant improvements in accuracy were achieved, the study also highlights the need for more advanced sensing technologies and real-time closed-loop control to further enhance process stability and performance. Reference [20] addresses this issue with an online springback compensation model for an air bending process, demonstrating that inline control of the bending process can be achieved by integrating camera-based bending angle detection with neural networks and the press controller. The results show that the control strategy significantly improves the accuracy of the achieved bending angle.

2.2. Reinforcement Learning

Like supervised and unsupervised learning, RL is a class of machine learning and a subfield of artificial intelligence. It describes sequential decision-making problems that an agent learns through feedback. Since the agent is initialized with no information about the environment and the influence of its actions, it has to collect information via trial and error. At each time step, the agent chooses an action $a_t \in A$ and subsequently receives the new state of the environment $s_{t+1} \in S$ along with a scalar reward signal $r_{t+1} \in \mathbb{R}$. Based on this observation, the agent adapts its actions to maximize the reward. RL is therefore a promising approach for controlling (partially) unknown systems. A visual representation of the described structure is given in Figure 3. The theoretical foundations of RL and decision making are described in the following sections.
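This interaction loop translates directly into code. The following minimal sketch uses the Gymnasium API with a stand-in environment, since the bending environment of this paper is not public; it only illustrates the state-action-reward cycle of Figure 3.

```python
import gymnasium as gym

# Stand-in environment; the bending environment used in this paper is not public.
env = gym.make("Pendulum-v1")

state, _ = env.reset()
total_reward = 0.0
for t in range(200):
    action = env.action_space.sample()   # a_t: random here; an agent's policy in general
    state, reward, terminated, truncated, _ = env.step(action)  # s_{t+1}, r_{t+1}
    total_reward += reward               # the agent adapts its policy to maximize this
    if terminated or truncated:
        break
env.close()
```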

2.2.1. Markov Decision Processes

A Markov Decision Process (MDP) is a framework used to model optimization problems and forms the basis of RL. It satisfies the Markov property: the probability of occurrence of a future state is independent of past states as long as the current state is known. This is defined mathematically as follows:
$$ \mathbb{P}\left[ S_{t+1} \mid S_t \right] = \mathbb{P}\left[ S_{t+1} \mid S_1, \ldots, S_t \right] \tag{2} $$
P describes the state transition probability matrix, whereas S represents the states of the environment. If P is known, the environment is fully described by a mathematical model. If this model is used to learn an optimal policy for a task, the RL approach is considered model-based. Conversely, the term model-free RL is used if P is not known. The approaches presented in this paper are model-free.

2.2.2. Optimal Control

The behavior of an agent is determined by its policy π , which is either deterministic (Equation (3)) or stochastic (Equation (4)). The policy maps each state to a single action (deterministic) or multiple actions with different probabilities (stochastic).
$$ \pi(s) = a \tag{3} $$
$$ \pi(a \mid s) = \mathbb{P}\left[ a_t = a \mid s_t = s \right] \tag{4} $$
The return J describes the cumulative, discounted reward of an MDP. Given a policy $\pi$, the state-value function $V^{\pi}(s)$ represents the expected return J starting from state s and following policy $\pi$. It therefore evaluates how good it is for the agent to be in a certain state. It is formally described as:
$$ V^{\pi}(s) = \sum_{a \in A} \pi(a \mid s) \left[ R(s,a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^{\pi}(s') \right] \tag{5} $$
Similar to the state-value function, an action-value function can be formulated:
$$ Q^{\pi}(s,a) = R(s,a) + \gamma \sum_{s' \in S} P(s' \mid s, a) \sum_{a' \in A} \pi(a' \mid s')\, Q^{\pi}(s', a') \tag{6} $$
Like the state-value function, it describes the expected return J starting from state s and following policy $\pi$, but additionally conditioned on first taking action a. In order to solve the RL task, the agent has to maximize the return and therefore find the optimal policy $\pi^*$.
$$ \pi^*(s) = \underset{a \in A}{\arg\max}\; Q^*(s,a) \tag{7} $$
This can be achieved either by directly optimizing the policy or by learning the action-value function Q. One possible approach to policy optimization is the policy gradient method. The policy is parameterized with $\theta$ so that it can be adjusted iteratively; this is equivalent to optimizing the parameterized expected return $J_\theta$. Policy Gradient (PG) methods use gradient ascent to maximize the objective $J_\theta$ in the direction of the gradient $\nabla_\theta J_\theta$. The update rule for the parameters $\theta$ is given by Equation (8), where $\alpha$ denotes the learning rate.
$$ \theta_{t+1} = \theta_t + \alpha\, \nabla_{\theta} J_{\theta} \tag{8} $$
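As a minimal illustration of Equation (8), the following PyTorch sketch performs one REINFORCE-style gradient ascent step on a parameterized Gaussian policy; the network architecture and the dummy batch are our own assumptions, not the paper's setup.

```python
import torch

# Small Gaussian policy network; 4-dimensional state, 1-dimensional action (illustrative).
policy = torch.nn.Sequential(torch.nn.Linear(4, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-2)  # lr plays the role of alpha in Eq. (8)

states = torch.randn(16, 4)    # dummy batch of visited states
actions = torch.randn(16, 1)   # actions that were taken in those states
returns = torch.randn(16)      # observed returns J for those actions

mean = policy(states)
dist = torch.distributions.Normal(mean, 1.0)
log_prob = dist.log_prob(actions).squeeze(-1)

# Gradient ascent on J_theta is implemented as gradient descent on -J_theta.
loss = -(log_prob * returns).mean()
optimizer.zero_grad()
loss.backward()    # computes grad_theta J_theta
optimizer.step()   # theta_{t+1} = theta_t + alpha * grad
```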
As mentioned, another approach to solving the RL problem is to learn the action-value function. In environments with discrete action and state spaces, this can be performed, e.g., with tables. The complexity increases drastically in continuous or higher-dimensional state and action spaces, where Q-learning is then usually performed with neural networks. Value estimation approaches are in general computationally more efficient and therefore easier to implement, but they are often less suitable for large and continuous spaces. Policy estimation approaches, on the other hand, usually deliver good results in high-dimensional spaces with continuous action selection, but are more difficult to implement because of the underlying, often intractable mathematics.

2.2.3. Actor-Critic Algorithms

Actor-Critic (AC) algorithms represent a class of RL algorithms that combine elements of policy optimization and value estimation. As presented by [21], their main advantage lies in efficiently computing policies and value functions, leading to more stable learning processes. AC architectures typically consist of two main components: an actor network, which is responsible for learning the policy, and a critic network, which estimates the value function. By iteratively improving the policy based on feedback from the value estimator, AC methods enable effective exploration and exploitation in the environment. The flexibility and adaptability of neural networks allow AC methods to be applied in a variety of RL scenarios, in both continuous and discrete action spaces. They also support asynchronous learning, which can increase learning speed. AC methods therefore provide a robust approach to complex RL problems, as they combine the benefits of value estimation and policy optimization. The Soft Actor-Critic (SAC) algorithm used in this work is a variant of AC proposed by [22], which additionally incorporates entropy maximization as a regularization term during policy optimization. This encourages exploration by discouraging premature convergence to suboptimal policies and can therefore lead to a more efficient and stable learning process.
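The paper does not state which SAC implementation was used. Purely as a hypothetical illustration, an SAC agent with the hyperparameters selected in Table 2 (Section 3) could be set up with the Stable-Baselines3 library as follows; the stand-in environment and the number of timesteps are our own choices.

```python
import gymnasium as gym
from stable_baselines3 import SAC

# Stand-in environment; the paper's bending environment is not public.
env = gym.make("Pendulum-v1")

model = SAC(
    "MlpPolicy",
    env,
    learning_rate=1e-2,  # selected value from Table 2
    batch_size=128,      # selected value from Table 2
    tau=0.1,             # selected value from Table 2 (epsilon has no direct SB3 equivalent)
    verbose=0,
)
model.learn(total_timesteps=3_000)
model.save("sac_pretrained")
```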

3. Methodology

In order to compensate for the springback, the component must be overbent to the angle $\alpha_{\mathrm{des}}^{(\mathrm{loa})}$. As previously mentioned, the angle necessary for springback compensation depends on many process parameters and must be determined by the process controller. In this paper, the aim is to control the springback of three different materials in different thicknesses (see Table 1). Thus, a tool design that allows continuous adjustment of the angle is necessary. Subsequently, a product controller needs to be developed that can predict the springback of different combinations of material, sheet thickness and target angle $\alpha_{\mathrm{des}}^{(\mathrm{unl})}$. The use of RL models as product controllers poses challenges in terms of data generation due to their data inefficiency. For this reason, an FEM model of the process is also developed, which can be used to pretrain RL agents. Differences to the real process (the sim-to-real gap) are then closed by retraining on experimental data.

3.1. Design of Variable Die Bending Tool

In order to make a V-die bending process variable with regard to its bending angle, and thus allow the angle to be controlled, additional Degrees Of Freedom (DOF) need to be introduced into the tooling system. The concept developed for a flexible die is shown in Figure 4 in the form of a principle sketch. The die is divided into two halves that maintain a common pivot point (MD). It has a vertical DOF and is guided horizontally in the die center. It is furthermore supported against the floor by a spring of stiffness $c_d$. Moreover, each die half is guided at its other end through a floating bearing. The angle $\beta_g$ at which the floating bearing is arranged ensures that the die angle ($\alpha_d$) is reduced when the center point is lowered ($x_d > 0$). The die angle can therefore be set via the vertical displacement of MD.
Apart from the die, the punch is designed to be flexible, too. Figure 5 shows the principle sketch of a punch with variable punch angle. As with the die, the punch is divided into two halves with a common pivot point (MP). The ends of the punch halves (C) are connected to a floating bearing (D) via springs. It must also be ensured that D is always vertically aligned above C. The floating bearing allows for the compensation of the horizontal offset that occurs when the punch angle is reduced. Similar to the die, springs ( c P ) ensure that the appropriate coining forces can be applied for varying punch angles. Viewed individually, the punch has two DOF ( x P 1 , x P 2 ), as the two halves of the punch can move independently. In combination with the die, these DOF are reduced to the displacement of the die center point x d and thus the stroke height.
The tooling system is dimensioned in a way that sheet materials with a length of 160 mm and a width of 50 mm can be processed. The corresponding materials and sheet thicknesses are shown in Table 1. It should also be possible to realize angles in a range from 80 to 110°. The required force for bending is estimated according to Equation (1) for each material and sheet thickness combination. The maximum estimated stamping force is then used to select the springs.
$$ F_{P,\max} = \max_{i \in m} F_{P,\mathrm{req}}^{\,i} \tag{9} $$
The vector m denotes all possible combinations of material and sheet thickness. By integrating the additional DOF into the tool, the product property, i.e., the bending angle, can be controlled. Derived from [2], a cascaded controller for springback compensation is shown in Figure 6. The RL model, referred to as the RL-based product controller in the offline control loop, is an SAC agent. The observable state contains the target angle $\alpha_{\mathrm{des}}^{(\mathrm{unl})}$ and the vector m. The material vector m includes the elastic modulus, the Poisson ratio and the tensile strength of the selected material and sheet thickness. Based on these inputs, the agent selects a compensated angle $\alpha_{\mathrm{des}}^{(\mathrm{loa})}$, which is forwarded to the online controller. The online control ensures that machine-side and environmental influences such as gear backlash and actuator inaccuracies can be eliminated. This is achieved by a camera-based inline measurement of the bending angle (50 Hz sampling frequency) and direct feedback of the measured bending angle into the machine control (PLC). Reference [20] provides a more detailed description of a similar measurement system and the information flows in an air bending process. In this work, the camera (acA1600-60hm, Basler AG, Ahrensburg, Germany) of the measuring system is mounted on the moving part of the die so that no distortions occur due to perspective shifts. After the forming operation, the actual unloaded bending angle is transmitted to the product controller. A scalar reward is then calculated in accordance with Equation (10). As the agent tries to maximize the reward, the deviation $\Delta\alpha_{\mathrm{ctrl}}$ is minimized.
$$ r = 10 - \left| \Delta\alpha_{\mathrm{ctrl}} \right| \quad \text{with} \quad \Delta\alpha_{\mathrm{ctrl}} = \alpha_{\mathrm{des}}^{(\mathrm{unl})} - \alpha_{\mathrm{act}}^{(\mathrm{unl})} \tag{10} $$
Due to the data inefficiency of RL methods, the variable die bending process is also represented numerically in the form of an FEM simulation. In Figure 6, the interaction of the simulation with the product property controller is represented by dashed lines. Due to the deterministic nature of a simulation, the online control of α des ( loa ) is not necessary. The numerical mapping of the process otherwise has the same input and output variables as the real process. Inputs are the material vector and the target angle α des ( unl ) . The product property controller can therefore be used to control the springback in the simulation and in the real process. To increase efficiency, the RL-based controller should first interact with the simulation and then be transferred to the real process. Differences between the real process and the simulation are to be overcome by transfer learning.
A more detailed overview of the development process of the RL controller is shown in Figure 7. The basic RL structure remains the same between the simulation and the real process, and knowledge from simulations is reused by retraining the agent on real process data. As previously mentioned, the state $s_{\mathrm{Sim/Exp}}$ consists of the vector m and the observed unloaded angle $\alpha_{\mathrm{act}}^{(\mathrm{unl})}$. Based on this, the compensated target angle $\alpha_{\mathrm{des}}^{(\mathrm{loa})}$ is predicted and used as input for the simulation or the process controller. The selected hyperparameters were determined through a grid search and are presented in Table 2, together with the explored parameter space.
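The pretrain-then-retrain flow of Figure 7 can be sketched as follows, again assuming the Stable-Baselines3 API as in the previous sketch; the environment names `fem_env` and `experimental_env` are placeholders for the simulative and experimental process interfaces, which are not public.

```python
from stable_baselines3 import SAC

# Stage 1: pretraining on the FEM-based environment (placeholder `fem_env`).
model = SAC("MlpPolicy", fem_env, learning_rate=1e-2, batch_size=128, tau=0.1)
model.learn(total_timesteps=3_000)
model.save("sac_pretrained_sim")

# Stage 2: retraining on the experimental process to close the sim-to-real gap.
# Loading with a new environment keeps the learned network weights.
model = SAC.load("sac_pretrained_sim", env=experimental_env)
model.learn(total_timesteps=330, reset_num_timesteps=False)  # cf. 330 experimental bends (Table 5)
model.save("sac_retrained_real")
```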

3.2. Finite Element Simulation

The simulation model is set up in SIMULIA Abaqus CAE/2023 as a two-dimensional implicit model. All tool parts are defined as rigid bodies based on the described principle (Figures 4 and 5). A plane strain state is assumed for the sheet metal, since the width of the sheet is significantly larger than its thickness [13,23]. To reduce calculation time, x-symmetry is also applied at the center of the die. The simulation is divided into three steps. In the first step, all individual parts are fixed and the sheet metal is placed in the center of the die. In the second step, all springs are preloaded. In the third step, forming takes place by moving the punch downwards. The punch displacement required to set the compensated target angle is calculated using the trigonometric relationship:
$$ s_P\!\left(\alpha_{\mathrm{des}}^{(\mathrm{loa})}\right) = \left( \cos\frac{\alpha_{\mathrm{des}}^{(\mathrm{loa})}}{2} - \cos\frac{\alpha_{\max}}{2} \right) l_d + s_0 \tag{11} $$
It consists of the cosines of half the target angle in the process and of half the die angle at the uppermost point ($x_d = 0$), the length of a die half $l_d$ and the sheet thickness $s_0$. A constant friction coefficient of $\mu = 0.1$ is applied to model the contact between tool and sheet metal, a widely accepted approximation for metallic contacts in metal forming simulations without detailed surface characterization (see [24]). After the third step, the deformed component, in its final state from the previous simulation, is imported into a new model and the applied loads are removed. The springback thus develops, and the unloaded bending angle ($\alpha_{\mathrm{act}}^{(\mathrm{unl})}$) can be measured. Figure 8 shows the described process and the simulation model schematically. In order to map the influence of uncertainty occurring in the semi-finished product, multiple data sets are created in which uncertainty is applied to two material parameters ($v$, $E$). An overview of the magnitude of the maximum applied uncertainty and the sample size of the different data sets is given in Table 3. The applied uncertainty is drawn stochastically (uniformly distributed) at the start of every simulation. The data sets are labeled as follows: no uncertainty (nu), low uncertainty level (lu), medium uncertainty level (mu) and high uncertainty level (hu).
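The stochastic application of uncertainty can be illustrated with a short sketch. The uniform ranges follow Table 3, while the nominal material values and the assumed unit of δE (MPa) are our own placeholders.

```python
import random

# Maximum perturbations per data set, as in Table 3 (delta_E in MPa assumed).
UNCERTAINTY = {"nu": (0, 0.0), "lu": (2500, 0.01), "mu": (5000, 0.02), "hu": (10000, 0.04)}

def sample_material(E_nominal, v_nominal, level="mu"):
    """Draw a perturbed elastic modulus and Poisson ratio for one simulation run."""
    dE_max, dv_max = UNCERTAINTY[level]
    E = E_nominal + random.uniform(-dE_max, dE_max)
    v = v_nominal + random.uniform(-dv_max, dv_max)
    return E, v

# Example: steel-like nominal values (illustrative only, not the paper's data)
E, v = sample_material(E_nominal=210_000, v_nominal=0.30, level="mu")
```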

3.3. Experimental Setup

To validate the concept experimentally, a tool is designed based on the principle sketches shown above. The real tool system, consisting of die and punch, is shown in Figure 9. It is used in combination with the prototype of the 3D Servo Press [25]: the die is mounted on the press table and the punch is centered above it on the ram. The press is controlled in the task space using the model-based control concepts presented in [26]. The additional DOF of the 3D Servo Press are not used in this case; the tilting DOFs are locked and only vertical displacements of the tool are performed. The guiding elements shown in the principle sketches were implemented in the tool using linear guides, plain bearings and slots. In order to apply the spring force evenly and without jamming to the die, the single spring shown in the principle sketch is replaced by four springs connected in parallel. The arrangement of the springs in the punch remains as shown in the principle sketch (Figure 5). A camera can be attached to the moving parts of the die so that it moves along with them. This prevents the measured bending angle from being influenced by perspective distortions induced by relative displacement between camera and part. After setting up the angle detection, which is based on an edge detection algorithm and the live image of the camera, it is therefore sufficient to calibrate the detected angle once. Further information about the camera setup can be found in [20].
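The paper defers details of the camera-based angle measurement to [20]. Purely as an illustration of what an edge-detection-based angle estimate can look like, the following OpenCV sketch detects line segments and returns their included angle; all thresholds and the crude two-leg heuristic are our own assumptions, not the authors' implementation.

```python
import cv2
import numpy as np

def detect_bending_angle(gray_image):
    """Estimate the angle between the two sheet legs from a grayscale camera frame."""
    edges = cv2.Canny(gray_image, 50, 150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                            minLineLength=100, maxLineGap=10)
    if lines is None or len(lines) < 2:
        return None
    # Orientation of each detected segment in degrees
    angles = [np.degrees(np.arctan2(y2 - y1, x2 - x1)) for x1, y1, x2, y2 in lines[:, 0]]
    # Crude two-leg split: take the two most distinct orientations
    left, right = min(angles), max(angles)
    return 180.0 - abs(left - right)   # included angle between the legs
```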

4. Results

In the following, the springback behavior of the different materials and sheet thicknesses in the simulative and the real process is shown, followed by a discussion of springback compensation using the presented control system.

4.1. Simulative Results

The springback behavior of the simulated process is shown in Figure 10. All plots show the occurring springback $\Delta\alpha$ for the different configurations (m) plotted against the loaded angle ($\alpha_{\mathrm{act}}^{(\mathrm{loa})}$). The top row refers to data set nu (no uncertainty) and the bottom row to data set mu (medium uncertainty).
It can be clearly seen that the highest springback occurs in the aluminum sheets with a thickness of 0.5 mm. This is expected since, in general, springback increases with decreasing sheet thickness. It also increases with decreasing modulus of elasticity at similar tensile strength. For some material and sheet thickness combinations, springback increases with increasing forming angle ($\alpha_{\mathrm{act}}^{(\mathrm{loa})}$), while it decreases for others. Since this behavior only occurs for materials of higher tensile strength, the following assumption can be made: the applied stamping force is directly determined by the springs and their deflection, which remain equal for all materials and sheet thicknesses. The applied force may therefore not be sufficiently high for thicker sheets of stiffer materials, such as the aluminum alloy and steel used here, at large angles. This assumption was confirmed by simulations with doubled spring stiffnesses, shown in Figure 11.
When comparing data set nu with mu, it is obvious that the variance of the springback increases with rising uncertainty.
With the four data sets presented in Section 3, RL models are trained to learn the material-, sheet thickness- and angle-dependent springback behavior. The trained agents then predict the springback and compensate for it using the control loop. The results are summarized for each data set in Figure 12 in the form of a Probability Density Function (PDF). An overview of the average and maximum deviations of the individual data sets is given in Table 4. The PDF of the agent belonging to data set nu shows that the mean deviation can be reduced significantly ($|\overline{\Delta\alpha_{\mathrm{ctrl}}}| = 0.16°$). It can also be seen that both the mean and the maximum deviation increase with increasing uncertainty. To test whether the performance of the trained agents can be improved with knowledge of the applied uncertainty, a further series of tests is carried out. In these, the actual material properties ($E + \delta E$ and $v + \delta v$ instead of $E$ and $v$) are fed to the agents to see whether the springback behavior of untrained configurations is learned. In a real environment, the feedback of the uncertainty, i.e., of the actual material parameters, could be achieved, for example, by measuring the forming forces in the process [27]. However, given the considerable difference between synthetically generated force signals and real measured signals, especially in the case of a flexible tool system, the approach described above was chosen. The results of these tests are also shown in Table 4 (gray values) and in the form of PDFs (Figure 13), separated by data set. The results are derived from 30 training runs each, with both the mean and maximum deviations displayed. They clearly show that with knowledge of the applied uncertainty, $\Delta\alpha_{\mathrm{ctrl}}$ can be reduced significantly: both the mean and the maximum deviation improve for all data sets, and the difference between agents with and without knowledge of the applied uncertainty becomes clearer as the level of uncertainty increases.

4.2. Experimental Results

The following section examines the use of RL models for bending angle control in an experimental environment. As shown in Section 3, simulatively trained agents are initially used to predict the springback. Subsequently, the models are retrained based on experimental data and validated on an additional experimental dataset.
For the experimental validation of the agent trained on simulatively generated data, the angular range is traversed with a step size of 2.5° (see Table 5). Figure 14 shows the springback $\Delta\alpha$ plotted against the nominal angle $\alpha_{\mathrm{des}}^{(\mathrm{unl})}$. The data points represent the experimentally measured values and the dashed lines the springback predicted by the agent. The agent used was trained on the nu data set, as information on material fluctuations cannot be fed back in the test setup shown. The agent can therefore only learn the mean springback of data sets subject to uncertain material parameters, which corresponds to training on the data set without applied uncertainty. Figure 14 shows the springback behavior broken down by material. For copper sheets of both thicknesses, the material springs back significantly more in the experiment than predicted by the agent. Since the agent reduces the deviation in the simulated process to an average of 0.16° (max. 0.88°), this means that the material springs back significantly more in the real process. This may be due to the fact that the material parameters of copper used for the simulation were not determined experimentally with the material used but represent literature values. As shown by [28], the prediction quality in V-die bending also depends strongly on the rolling direction of the sheet and on the criteria and laws underlying the different materials. For 1 mm thick aluminum sheets, the graph shows a good match between predicted and actual springback. Small deviations in the springback behavior can be seen, but these are largely attributable to fluctuations in the material or process. For 0.5 mm thick aluminum sheets, there is also good agreement between prediction and experiment for desired angles $\alpha_{\mathrm{des}}^{(\mathrm{unl})}$ greater than 95°. Below this, the predicted (i.e., simulative) springback increases relative to the experimental investigation as the desired angle decreases. As discussed in Section 4.1, this could be due to the coining component during die bending, which behaves slightly differently in the real process than in the simulation. For sheets of the two thinner thicknesses of DC01, higher springback occurs in the simulation than in the real test. For 1 mm thick sheets, on the other hand, the predicted and the experimentally determined springback agree well.
In summary, the deviation $\Delta\alpha_{\mathrm{ctrl}}$ is already improved by the simulative training of the agents. However, as simulation and reality do not align exactly, there is a sim-to-real gap that limits the prediction quality of the agent. In order to obtain more accurate results, the gap must be closed through transfer learning. To accomplish this, agents originally trained on synthetic data are retrained on experimental data sets. As shown in Table 5, an additional 330 tests were carried out to generate an experimental database, and the agent was then retrained off-policy on the resulting training data set. The plots in Figure 15 show this data set (scatter) together with the predictions of the final trained agents (dashed lines). The predictions of the agent now align very well with the average springback of the respective material and thickness configuration. However, the experimental process is subject to uncertainty that, in contrast to simulation, cannot be eliminated; as a result, the springback fluctuates visibly even under similar conditions. Thus, the average and maximum deviations are reduced compared to the sim-to-real data set, but remain high compared to the simulation data sets (see Table 6). The above results are based on the training data set. In order to rule out overfitting, the best retrained agent is revalidated according to the test matrix shown in Table 5. Again, the springback ($\Delta\alpha$) of the different materials and sheet thicknesses is plotted over the target angle ($\alpha_{\mathrm{des}}^{(\mathrm{unl})}$) in Figure 16. Since the agent is not retrained during validation, the predictions (dashed lines) remain the same. Finally, the PDFs of the data sets are compared in Figure 17, and the corresponding characteristic values are summarized in Table 6. The retrained agent achieves similar performance on the training and validation data sets: the mean deviation $|\overline{\Delta\alpha_{\mathrm{ctrl}}}|$ agrees within 0.1°, while the maximum deviation is higher on the training data set. Given the size of the data sets, this is likely caused by stochastic uncertainty in the process.

5. Conclusions

In accordance with the prevailing trend towards mass customization [29], product life cycles are becoming increasingly shorter and batch sizes are decreasing. Therefore, as illustrated by [30], forming processes, and production processes in general, need to become more flexible, which is accompanied by the requirement for process control. RL algorithms are a promising approach for this purpose. In this paper, a novel concept for flexible V-die bending is presented that allows the bending angle to be adjusted and hence enables springback compensation. As springback depends on numerous interdependent parameters, this results in a complex control task. First, a parameterized FEM model is developed to accommodate variable angles, sheet thicknesses and material properties. The general effectiveness of RL as a product property controller was demonstrated, as the mean bending angle deviation is reduced to 0.16° across all data sets. The maximum occurring deviations were also reduced significantly. However, when applied to the real-world process, a noticeable sim-to-real gap was observed due to dissimilarities in springback behavior. To address this gap, the agent was retrained on the experimental data set. This improves the accuracy noticeably, but the process remains subject to uncertainty that the agent cannot detect in the real environment. In the simulative environment, it was shown that the prediction quality improves when the agent possesses knowledge of the actual material properties. Earlier studies [20,31] show that force signals contain valuable information about the material properties and therefore the springback behavior. Deriving from this, the integration of force sensors as an additional input for the agent seems a promising approach to take process fluctuations into account and further reduce the deviation. The use of RL as a product property controller improves the dimensional accuracy of the bent workpieces in the process shown, which is why its application to other forming processes should be investigated further. Of particular interest are more complex forming processes with multiple DOF, which allow for more advanced manipulation.

Author Contributions

Conceptualization, C.-V.V., D.A.M. and V.A.; methodology, D.A.M. and V.A.; investigation, C.-V.V.; writing—original draft preparation, C.-V.V.; writing—review and editing, D.A.M., V.A. and P.G.; visualization, C.-V.V. and D.A.M.; supervision, P.G.; project administration, P.G.; funding acquisition, P.G. All authors have read and agreed to the published version of the manuscript.

Funding

The results of this paper were achieved within the project “ProKI—Demonstrations- und Transfernetzwerk KI für die Umformtechnik ProKI-Darmstadt” funded by the German Federal Ministry of Education and Research (BMBF). The authors wish to thank the BMBF for funding and supporting this project.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AC	Actor-Critic
DOF	Degrees Of Freedom
FEM	Finite-Element Method
MDP	Markov Decision Process
PDF	Probability Density Function
PG	Policy Gradient
RL	Reinforcement Learning
SAC	Soft Actor-Critic

References

  1. Möhring, H.; Wiederkehr, P.; Erkorkmaz, K.; Kakinuma, Y. Self-optimizing machining systems. CIRP Ann. 2020, 69, 740–763.
  2. Allwood, J.; Duncan, S.; Cao, J.; Groche, P.; Hirt, G.; Kinsey, B.; Kuboki, T.; Liewald, M.; Sterzing, A.; Tekkaya, A. Closed-loop control of product properties in metal forming. CIRP Ann. 2016, 65, 573–596.
  3. Volk, W.; Groche, P.; Brosius, A.; Ghiotti, A.; Kinsey, B.; Liewald, M.; Madej, L.; Min, J.; Yanagimoto, J. Models and modelling for process limits in metal forming. CIRP Ann. 2019, 68, 775–798.
  4. Dornheim, J.; Link, N.; Gumbsch, P. Model-free adaptive optimal control of episodic fixed-horizon manufacturing processes using reinforcement learning. Int. J. Control Autom. Syst. 2020, 18, 1593–1604.
  5. Idzik, C.; Krämer, A.; Hirt, G.; Lohmar, J. Coupling of an analytical rolling model and reinforcement learning to design pass schedules: Towards properties controlled hot rolling. J. Intell. Manuf. 2023, 35, 1469–1490.
  6. Reinisch, N.; Rudolph, F.; Günther, S.; Bailly, D.; Hirt, G. Successful pass schedule design in open-die forging using double deep Q-learning. Processes 2021, 9, 1084.
  7. Deng, J.; Sierla, S.; Sun, J.; Vyatkin, V. Reinforcement learning for industrial process control: A case study in flatness control in steel industry. Comput. Ind. 2022, 143, 103748.
  8. Zhao, W.; Queralta, J.; Qingqing, L.; Westerlund, T. Towards closing the sim-to-real gap in collaborative multi-robot deep reinforcement learning. In Proceedings of the 2020 5th International Conference on Robotics and Automation Engineering (ICRAE), Singapore, 20–22 November 2020; pp. 7–12.
  9. Dulac-Arnold, G.; Levine, N.; Mankowitz, D.; Li, J.; Paduraru, C.; Gowal, S.; Hester, T. An empirical investigation of the challenges of real-world reinforcement learning. arXiv 2020, arXiv:2003.11881.
  10. Zünkler, B. Untersuchung des überelastischen Blechbiegens, von einem einfachen Ansatz ausgehend; Hanser Verlag: München, Germany, 1965; Volume 6.
  11. Gan, W.; Wagoner, R. Die design method for sheet springback. Int. J. Mech. Sci. 2004, 46, 1097–1113.
  12. Zhang, Z.; Wu, J.; Zhang, S.; Wang, M.; Guo, R.; Guo, S. A new iterative method for springback control based on theory analysis and displacement adjustment. Int. J. Mech. Sci. 2016, 105, 330–339.
  13. Wang, J.; Verma, S.; Alexander, R.; Gau, J. Springback control of sheet metal air bending process. J. Manuf. Process. 2008, 10, 21–27.
  14. Heller, B.; Chatti, S.; Ridane, N.; Kleiner, M. Online-process control of air bending for thin and thick sheet metal. J. Mech. Behav. Mater. 2004, 15, 455–462.
  15. Havinga, J.; van den Boogaard, T.; Dallinger, F.; Hora, P. Feedforward control of sheet bending based on force measurements. J. Manuf. Process. 2018, 31, 260–272.
  16. Sharad, G.; Nandedkar, V. Springback in sheet metal U bending—FEA and neural network approach. Procedia Mater. Sci. 2014, 6, 835–839.
  17. Liu, S.; Shi, Z.; Lin, J.; Yu, H. A generalisable tool path planning strategy for free-form sheet metal stamping through deep reinforcement and supervised learning. J. Intell. Manuf. 2025, 36, 2601–2627.
  18. Fu, Z.; Mo, J. Springback prediction of high-strength sheet metal under air bending forming and tool design based on GA–BPNN. Int. J. Adv. Manuf. Technol. 2011, 53, 473–483.
  19. Viswanathan, V.; Kinsey, B.; Cao, J. Experimental implementation of neural network springback control for sheet metal forming. J. Eng. Mater. Technol. 2003, 125, 141–147.
  20. Molitor, D.; Arne, V.; Kubik, C.; Noemark, G.; Groche, P. Inline closed-loop control of bending angles with machine learning supported springback compensation. Int. J. Mater. Form. 2024, 17, 8.
  21. Konda, V.; Tsitsiklis, J. Actor-critic algorithms. Adv. Neural Inf. Process. Syst. 1999, 12, 1–7.
  22. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv 2018, arXiv:1801.01290.
  23. Shahzamanian, M.; Lloyd, D.; Partovi, A.; Wu, P. Study of influence of width to thickness ratio in sheet metals on bendability under ambient and superimposed hydrostatic pressure. Appl. Mech. 2021, 2, 542–558.
  24. Doege, E.; Behrens, B.-A. Handbuch Umformtechnik: Grundlagen, Technologien, Maschinen; Springer: Berlin/Heidelberg, Germany, 2016.
  25. Groche, P.; Scheitza, M.; Kraft, M.; Schmitt, S. Increased total flexibility by 3D Servo Presses. CIRP Ann. 2010, 59, 267–270.
  26. Molitor, D.; Arne, V.; Spies, D.; Hoppe, F.; Groche, P. Task space control of ram poses of multipoint Servo Presses. J. Process Control 2023, 129, 103057.
  27. Unterberg, M.; Niemietz, P.; Trauth, D.; Wehrle, K.; Bergs, T. In-situ material classification in sheet-metal blanking using deep convolutional neural networks. Prod. Eng. Res. Dev. 2019, 13, 743–749.
  28. Mulidrán, P.; Spišák, E.; Tomáš, M.; Rohal, V.; Stachowicz, F. The springback prediction of deep-drawing quality steel used in V-bending process. Acta Mech. Slovaca 2020, 23, 14–18.
  29. Tseng, M.M.; Wang, Y.; Jiao, R.J. Mass customization. In CIRP Encyclopedia of Production Engineering; Laperrière, L., Reinhart, G., Eds.; Springer: Berlin/Heidelberg, Germany, 2017.
  30. Yang, D.; Bambach, M.; Cao, J.; Duflou, J.; Groche, P.; Kuboki, T.; Sterzing, A.; Tekkaya, A.; Lee, C. Flexibility in metal forming. CIRP Ann. 2018, 67, 743–765.
  31. Schenek, A.; Görz, M.; Liewald, M.; Riedmüller, K.R. Data-driven derivation of sheet metal properties gained from punching forces using an artificial neural network. Key Eng. Mater. 2022, 926, 2174–2182.
Figure 1. (Residual) stresses.
Figure 2. Stresses and bending angles in flexible die bending.
Figure 3. Structure of reinforcement learning.
Figure 4. Principle sketch of the die.
Figure 5. Principle sketch of the punch.
Figure 6. Block diagram of the property closed-loop controlled die bending process.
Figure 7. Structure of the AI-supported product controller when applied to the die bending process.
Figure 8. Schematic sequence of the simulative forming process with springback for a target angle of 80° and DC01 with a sheet thickness of 1 mm: (a) initial state; (b) preloading of springs; (c) begin of forming operation; (d) completed forming operation; (e) simulation of springback; (f) final geometry.
Figure 9. Die bending tool consisting of flexible punch (a) and die (b).
Figure 10. Simulated springback behavior with constant material parameters (dataset nu, top) and material parameters subject to uncertainty (dataset mu, bottom).
Figure 11. Simulated springback for different spring stiffnesses of punch and die for DC01.
Figure 12. Bending angle deviations using the trained models on simulatively generated datasets with different levels of uncertainty.
Figure 13. Comparison of PDFs of the bending angle deviations with and without knowledge of artificially applied uncertainty.
Figure 14. Comparison of experimental springback on the validation dataset and the springback predicted by the SAC algorithm trained on simulation data (dashed line).
Figure 15. Comparison of the experimental springback on the training dataset and the predicted springback of the SAC algorithm retrained on experimental data (dashed line).
Figure 16. Comparison of the experimental springback of the validation dataset and the predicted springback of the SAC algorithm retrained on experimental data (dashed line).
Figure 17. PDF of bending angle deviations.
Table 1. Material and sheet thickness combinations.

Sheet thickness [mm]   0.5   0.75   1
DC01                   x     x      x
EN AW 6082 T6          x     -      x
Copper                 x     -      x
Table 2. Overview of selected hyperparameters and search space for grid search.

                           Learning rate         Batch size   ε          τ
Selected hyperparameters   1 × 10⁻²              128          0.05       0.1
Search space               1 × 10⁻⁴ – 1 × 10⁻¹   8–256        0.01–0.2   0.01–0.2
Table 3. Applied uncertainties of the four data sets.

Uncertainty   nu     lu      mu      hu
δE            ±0     ±2500   ±5000   ±10,000
δv            ±0     ±0.01   ±0.02   ±0.04
N             3094   1725    3310    2069
Table 4. Maximum and average bending angle deviations using the SAC algorithm as an AI-supported product controller on the different simulation data sets, without and with knowledge about the uncertainty of the material parameters (mean values based on 30 trained models each).

Metric                                     nu     lu     mu     hu
max Δα_ctrl in ° (without knowledge)       0.88   2.28   2.58   4.52
max Δα_ctrl in ° (with knowledge)          -      1.59   1.96   1.72
mean Δα_ctrl in ° (without knowledge)      0.16   0.21   0.33   0.51
mean Δα_ctrl in ° (with knowledge)         -      0.17   0.19   0.19
Table 5. Overview of experimentally conducted test series and corresponding parameters.

Dataset       N     Step size   α_des(unl) in °
Sim-to-Real   95    2.5°        [80, 110]
Training      330   -           [80, 110]
Validation    175   2.5°        [80, 110]
Table 6. Error metrics for bending angle deviations.

Data set      mean |Δα_ctrl| in °   max |Δα_ctrl| in °   R²
Sim-to-Real   1.28                  4.5                  −0.03
Training      0.48                  3.13                 0.88
Validation    0.57                  1.86                 0.86