TD3-Enhanced MPC for Safe Braking of Overhead Cranes with Safety-Critical Region Prediction

Zhang, Wenshuai; Wang, Yifan; Liu, Manlan; Lan, Peng

doi:10.3390/act15060334

¹ School of Mechanical and Electrical Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China

² State Key Laboratory of Green Building, Xi’an 710055, China

* Author to whom correspondence should be addressed.

Actuators 2026, 15(6), 334; https://doi.org/10.3390/act15060334 (registering DOI)

Submission received: 11 May 2026 / Revised: 8 June 2026 / Accepted: 9 June 2026 / Published: 12 June 2026

(This article belongs to the Section Control Systems)

Abstract

To address the strong coupling between trolley motion and payload swing, as well as the difficulty of determining optimal braking timing during emergency operations of overhead cranes in complex environments, a model-predictive braking control method integrated with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is proposed. Within the Model Predictive Control (MPC) framework, payload swing angle constraints are explicitly incorporated, and an adaptive braking reference trajectory is constructed to achieve rapid and stable stopping while effectively suppressing load oscillations. Furthermore, the TD3 algorithm is employed for online adaptive optimization of key MPC parameters, enabling a dynamic trade-off between braking performance and swing suppression under varying operating conditions. In addition, a minimum braking distance prediction model based on Support Vector Regression (SVR) is developed, and a state-dependent safety-critical region prediction model is established to quantitatively determine optimal braking timing. Simulation results across multiple operating conditions demonstrate that the proposed TD3–MPC method outperforms conventional MPC in terms of braking efficiency, swing suppression capability, and system stability while satisfying swing angle constraints. Moreover, real-crane experimental results demonstrate the effectiveness of the proposed safety-critical region prediction method in determining appropriate braking trigger timing and achieving safe and smooth stopping of the overhead crane under obstacle-avoidance conditions.

Keywords: TD3-enhanced MPC; overhead crane; safe braking; safety-critical region modeling; anti-sway control

1. Introduction

As a critical material handling system in modern industrial production and infrastructure construction, overhead cranes are widely used in sectors such as construction, transportation, energy, and logistics [1,2]. With the advancement of manufacturing systems and intelligent technologies, the limitations of conventional cranes in terms of efficiency, intelligence, and safety have become increasingly evident. Driven by the increasing demand for safe and efficient operation under complex conditions, intelligent overhead cranes integrating artificial intelligence and data-driven approaches have emerged as a key research focus, promoting the transition toward system-oriented intelligent operation platforms [3].

In practice, overhead cranes often operate in dynamic and uncertain environments while interacting with on-site personnel, which leads to elevated safety risks. In particular, the failure to promptly perceive obstacles and implement appropriate braking strategies may result in collisions, posing serious threats to both equipment and personnel safety [4,5,6]. Therefore, ensuring safe and reliable operation requires not only environmental perception capabilities but also adaptive braking control strategies that can be adjusted in real time according to system states.

The braking process of an overhead crane is characterized by strong coupling between trolley motion and payload swing, resulting in highly nonlinear dynamics. Under non-zero initial conditions (i.e., at least one of the initial velocity, swing angle, or angular velocity is non-zero), especially in emergency braking scenarios, the system exhibits pronounced transient responses. Achieving rapid deceleration while effectively suppressing payload oscillations remains a fundamental challenge for improving system safety performance [7].

In addition, conventional braking strategies typically rely on fixed safety distances or heuristic rules to determine braking initiation, lacking the capability to adapt to varying system states. Such approaches fail to establish a quantitative relationship between system states and braking distance, making it difficult to determine the optimal braking timing. Particularly in emergency scenarios, directly applying braking without accounting for system coupling may induce significant payload oscillations, thereby compromising operational safety. Therefore, it is essential to develop a predictive model that captures the mapping between system states and safe braking distance, enabling state-dependent and adaptive braking decision-making.

In recent years, payload swing suppression and safety control of overhead cranes have attracted significant attention [8,9,10]. Existing studies primarily focus on nonlinear control design, trajectory planning, and robust anti-sway strategies.

He et al. [8] developed a hybrid dynamic model combining partial differential equations and ordinary differential equations for a crane system with a bottom-suspended flexible cable and proposed an adaptive control scheme based on an Integral Barrier Lyapunov Function. This method achieves vibration suppression and compensation for parameter uncertainties while satisfying boundary output constraints and ensuring closed-loop stability; however, it mainly targets vibration control during continuous operation. Mei et al. [11] investigated a flexible crane arm system with parameter uncertainties and asymmetric input-output constraints, and proposed an adaptive deformation control approach based on an asymmetric Barrier Lyapunov Function (BLF), enabling deformation suppression and attitude tracking via auxiliary system design and adaptive laws. Yu et al. [12] developed a tracking control strategy for nonlinear systems with input saturation using command filtering and fuzzy approximation. Sun et al. [13] combined type-2 fuzzy logic with sliding mode control to enhance robustness in anti-sway control. Ouyang et al. [14] addressed double-pendulum effects through system decoupling and S-curve trajectory planning, while Sun et al. [15] proposed a nonlinear feedback control method that achieves anti-sway and positioning without linearization.

Although these approaches demonstrate effective vibration suppression, they are primarily designed for normal operating conditions. With increasing safety requirements in industrial environments, emergency braking has emerged as a critical issue. Unlike conventional operation, emergency braking typically involves strongly coupled, non-equilibrium dynamics, making braking-strategy design more challenging. Consequently, despite extensive research on anti-sway control, systematic investigations focusing on emergency braking remain limited.

To address emergency braking problems, Yamamoto et al. [16] achieved sway-free braking using inverse dynamics but relied on predefined safety distance parameters. Hino et al. [17] proposed a trajectory planning approach that accounts for system dynamic constraints, where candidate trajectories are screened and adjusted to ensure collision avoidance. Ma et al. [18] developed a two-stage emergency braking strategy based on a switching mechanism, enabling rapid stopping and subsequent stabilization. Veciana et al. [19] employed input shaping to suppress residual vibrations during emergency braking under non-zero initial conditions.

In terms of control design, Ref. [20] decomposed the braking process into trolley deceleration and swing suppression, achieving coordinated control. Ref. [21] proposed a constraint-satisfying braking strategy based on passivity theory and barrier Lyapunov functions; however, it involves multiple fixed parameters, limiting practical applicability. Ref. [22] introduced a braking approach that incorporated coupling enhancement terms and barrier-like functions to improve safety but lacked explicit integration of environmental information. Deka et al. [23,24] applied fuzzy control to regulate velocity and suppress swing during braking, yet did not address braking decision-making mechanisms.

Regarding safety distance modeling, Ref. [7] employed MPC to optimize braking trajectories while suppressing swing. Building on this, Chen et al. [25] proposed an MPC-based safety distance prediction method that adaptively adjusts the safety distance based on system states, incorporating relative velocity and payload swing, thereby ensuring safe braking under anti-sway constraints.

Although MPC-based emergency braking methods have demonstrated promising performance in handling system constraints and suppressing payload swing, their effectiveness is highly dependent on the selection of controller parameters, such as weighting coefficients and prediction horizons. In most existing studies, these parameters are determined through offline tuning and remain fixed during operation. As operating conditions and system states vary, fixed-parameter MPC schemes may fail to maintain an appropriate balance between braking efficiency and swing suppression, resulting in degraded control performance. Consequently, achieving rapid braking while simultaneously suppressing payload oscillations under varying operating conditions remains challenging. Therefore, developing adaptive parameter-optimization mechanisms that enable MPC to adjust its control behavior according to changing system states has become an important research direction. Furthermore, integrating adaptive MPC optimization with state-dependent braking-distance prediction to establish a unified safe braking decision-making framework remains largely unexplored.

Recently, reinforcement learning has shown significant potential in solving online optimization and adaptive decision-making problems [26,27]. In particular, the TD3 algorithm exhibits excellent optimization capability in continuous action spaces and can effectively improve parameter adaptation during control processes. Therefore, integrating TD3 with MPC provides a promising approach to online parameter optimization, enabling MPC to dynamically adjust its control behavior according to changing operating conditions and system states.

Overall, existing studies on overhead crane control, emergency braking, and MPC-based optimization have achieved notable progress in swing suppression and safety control. However, several limitations remain. Most existing approaches rely on predefined control structures or fixed controller parameters, limiting their adaptability to varying operating conditions and system states. Although MPC-based methods have demonstrated promising performance, adaptive parameter optimization mechanisms remain insufficiently explored. Moreover, while some studies have investigated safety-distance prediction or collision avoidance, a unified safety-critical region modeling framework has yet to be established, and braking timing determination still lacks a clear quantitative mechanism. These limitations constrain the enhancement of safe braking performance in complex and dynamic operating environments.

To address the aforementioned challenges, this paper proposes an integrated safe braking framework for overhead cranes by combining TD3-enhanced MPC with safety-critical region modeling. Through the coordinated design of adaptive control optimization, adaptive reference trajectory generation, and safety-oriented braking decision-making, the proposed framework aims to simultaneously improve braking efficiency, payload swing suppression, and operational safety under complex operating conditions.

Specifically, the TD3 algorithm is embedded within the MPC framework to perform online optimization of key controller parameters, including the prediction horizon, control horizon, and weighting coefficients. This enables the controller to adapt its braking behavior in real time according to varying operating states. Furthermore, an adaptive reference trajectory generation strategy is developed to dynamically regulate the braking process, allowing a more effective trade-off between rapid braking and payload swing suppression. By jointly optimizing controller parameters and reference trajectories, the proposed TD3-MPC framework enhances both braking efficiency and load stabilization performance.

To quantitatively characterize the braking capability of the crane system, a data-driven minimum braking distance prediction model based on SVR is established. The proposed model captures the relationship between trolley velocity and minimum braking distance under different operating conditions, providing a reliable basis for braking safety assessment.

Building upon the predicted braking distance, a safety-critical region modeling method is further developed to determine optimal braking timing in a dynamic manner. This mechanism enables state-dependent braking decision-making and overcomes the limitations associated with conventional fixed safety-distance strategies.

Comprehensive simulation and experimental studies are conducted under various operating conditions to evaluate the effectiveness of the proposed framework. The results demonstrate that the proposed method achieves superior performance in terms of braking distance reduction, payload swing suppression, and operational safety compared with conventional MPC-based approaches, thereby providing an effective solution for safe and intelligent braking control of overhead cranes.

The remaining sections of this paper are organized as follows. Section 2 develops the dynamic model of the overhead crane and performs model linearization and discretization. Based on this, an adaptive braking reference trajectory is designed, and the objective function and constraints of the MPC framework are formulated. Section 3 presents the proposed TD3-MPC cooperative control method, including the principle of the TD3 algorithm, the design of state and action spaces, the construction of the reward function, and the architecture of the agent, followed by the overall cooperative control framework. Section 4 establishes a minimum braking distance prediction model based on SVR, and further constructs a safety-critical region model to quantitatively determine the braking trigger timing. Section 5 validates the effectiveness of the proposed method through extensive simulations under multiple operating conditions and provides comparative analyses against conventional methods. Finally, concluding remarks are given in Section 6.

2. Model Predictive Braking Control Design

2.1. Dynamic System Model

A simplified model of the overhead crane system is illustrated in Figure 1, comprising a trolley, a payload, and a hoisting cable. For controller design and dynamic analysis, the payload is modeled as a point mass, while the cable is assumed to be massless, rigid, and inextensible with negligible bending stiffness. These assumptions are commonly adopted in overhead crane control studies to reduce model complexity and ensure the real-time implementation of the proposed control strategy [7].

Based on the Lagrange formulation, the nonlinear dynamic model of the overhead crane can be derived as expressed in Equation (1) [28]:

\begin{cases} (M + m) \ddot{x} + m l \ddot{θ} \cos θ - m l \dot{θ}^{2} \sin θ = F \\ m l^{2} \ddot{θ} + m l \ddot{x} \cos θ + m g l \sin θ = 0 \end{cases}

(1)

where M and m denote the masses of the trolley and payload, respectively; l is the cable length; g is the gravitational acceleration; x represents the trolley displacement; and θ denotes the payload swing angle. The external force F applied to the trolley is considered the control input.

To address the safe braking problem under non-zero initial conditions, the coupled trolley motion and payload oscillation must both be represented explicitly. Therefore, the state vector is defined as follows:

x_{b}(t) = \begin{bmatrix} θ & v & \dot{θ} \end{bmatrix}^{T}

(2)

where v denotes the trolley velocity, and \dot{θ} denotes the payload angular velocity.

Solving Equation (1) for the trolley and swing accelerations gives:

\begin{cases} \dot{v} = \ddot{x} = \frac{m g \sin θ \cos θ + m l \dot{θ}^{2} \sin θ + F}{M + m \sin^{2} θ} \\ \ddot{θ} = \frac{- \cos θ \, \dot{v} - g \sin θ}{l} \end{cases}

(3)
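As a quick numerical check of Equation (3), the following minimal sketch evaluates the two accelerations for a given state and force. The function name and the parameter values are illustrative assumptions and are not taken from the paper.

```python
import math

def crane_accelerations(theta, theta_dot, F, M, m, l, g=9.81):
    """Return (v_dot, theta_ddot) as given by Equation (3)."""
    s, c = math.sin(theta), math.cos(theta)
    # Trolley acceleration (first line of Equation (3)).
    v_dot = (m * g * s * c + m * l * theta_dot**2 * s + F) / (M + m * s**2)
    # Swing angular acceleration (second line of Equation (3)).
    theta_ddot = (-c * v_dot - g * s) / l
    return v_dot, theta_ddot

# Hypothetical parameters: 10 kg trolley, 2 kg payload, 1 m cable.
M, m, l = 10.0, 2.0, 1.0
# Braking force applied while the payload hangs at rest below the trolley.
v_dot, theta_ddot = crane_accelerations(0.0, 0.0, -20.0, M, m, l)
```

With the payload at rest (θ = 0, \dot{θ} = 0), the expressions reduce to \dot{v} = F / M and \ddot{θ} = - \dot{v} / l, which illustrates how a braking force immediately excites the swing.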

2.2. Linearization and Discretization

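We linearize Equation (3) around the equilibrium point x_{b}(t) = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}^{T}, using the small-angle approximations \sin θ \approx θ and \cos θ \approx 1 and neglecting higher-order terms, and obtain the following linearized equations:

\begin{cases} \dot{v} = \frac{m g}{M} θ + \frac{1}{M} F \\ \ddot{θ} = - \frac{(m + M) g}{l M} θ - \frac{1}{l M} F \end{cases}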

(4)

According to Equation (4), the state-space model of the linearized system can be obtained, as shown in Equation (5):

\begin{cases} \dot{x}_{b} = \begin{bmatrix} 0 & 0 & 1 \\ \frac{m g}{M} & 0 & 0 \\ - \frac{(m + M) g}{l M} & 0 & 0 \end{bmatrix} x_{b} + \begin{bmatrix} 0 \\ \frac{1}{M} \\ - \frac{1}{l M} \end{bmatrix} u \\ y = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} x_{b} \end{cases}

(5)

where u denotes the system input.

Discretizing Equation (5) yields the following discrete-time model:

\begin{cases} x_{b}(k + 1) = A_{b} x_{b}(k) + B_{b} u(k) \\ y(k) = C_{b} x_{b}(k) \end{cases}

(6)

where k denotes the discrete time step.
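The section does not state which discretization method is used to obtain A_{b} and B_{b}. The sketch below assumes a forward-Euler approximation (A_{b} ≈ I + T_{s} A, B_{b} ≈ T_{s} B) purely for illustration; a zero-order-hold discretization would be an equally plausible choice. All function names and numerical values are hypothetical.

```python
def linear_model(M, m, l, g=9.81):
    """Continuous-time A and B of Equation (5), state ordered as [theta, v, theta_dot]."""
    A = [[0.0, 0.0, 1.0],
         [m * g / M, 0.0, 0.0],
         [-(m + M) * g / (l * M), 0.0, 0.0]]
    B = [0.0, 1.0 / M, -1.0 / (l * M)]
    return A, B

def euler_discretize(A, B, Ts):
    """Forward-Euler discretization (an assumption, not the paper's stated method)."""
    n = len(A)
    Ab = [[(1.0 if i == j else 0.0) + Ts * A[i][j] for j in range(n)]
          for i in range(n)]
    Bb = [Ts * b for b in B]
    return Ab, Bb

def step(Ab, Bb, x, u):
    """One step of Equation (6): x(k+1) = Ab x(k) + Bb u(k)."""
    n = len(x)
    return [sum(Ab[i][j] * x[j] for j in range(n)) + Bb[i] * u for i in range(n)]

# Hypothetical parameters and sampling period.
A, B = linear_model(M=10.0, m=2.0, l=1.0)
Ab, Bb = euler_discretize(A, B, Ts=0.01)
# Trolley moving at 1 m/s with the payload at rest, braking force of -20 N.
x_next = step(Ab, Bb, [0.0, 1.0, 0.0], -20.0)
```

In this example a single step lowers the trolley velocity and at the same time gives the payload a positive angular velocity, which is the coupling that the MPC formulation has to manage.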

Based on Equation (6), the state N_{p} steps ahead of the sampling instant can be predicted as follows:

\begin{cases} x_{b}(k_{i} + N_{p} | k_{i}) = A_{b}^{N_{p}} x_{b}(k_{i}) + A_{b}^{N_{p} - 1} B_{b} u(k_{i}) + \dots + A_{b}^{N_{p} - N_{c}} B_{b} u(k_{i} + N_{c} - 1) \\ y(k_{i} + N_{p} | k_{i}) = C_{b} x_{b}(k_{i} + N_{p} | k_{i}) \end{cases}

(7)

where N_{p} represents the prediction horizon, and N_{c} denotes the control horizon, with N_{c} ≤ N_{p}. The sampling instant is denoted by k_{i}. Given the state information x_{b}(k_{i}), the future states can be predicted from time k_{i} + 1 to k_{i} + N_{p}. Equation (7) can then be rewritten in a compact matrix form as follows:

Y = P x_{b}(k_{i}) + Ψ U

(8)

where Y denotes the predicted output sequence over the prediction horizon; U represents the future control input sequence over the control horizon; P is the free-response matrix; and Ψ denotes the dynamic control matrix describing the influence of future control actions on the predicted outputs:

Ψ = \begin{bmatrix} C_{b} B_{b} & 0 & \cdots & 0 \\ C_{b} A_{b} B_{b} & C_{b} B_{b} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ C_{b} A_{b}^{N_{p} - 1} B_{b} & C_{b} A_{b}^{N_{p} - 2} B_{b} & \cdots & C_{b} A_{b}^{N_{p} - N_{c}} B_{b} \end{bmatrix}, \quad P = \begin{bmatrix} C_{b} A_{b} \\ C_{b} A_{b}^{2} \\ \vdots \\ C_{b} A_{b}^{N_{p}} \end{bmatrix}

(9)
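The structure of P and Ψ can be made concrete with a small sketch. It assumes C_{b} = I, as in Equation (5), and that inputs beyond the control horizon are zero, as implied by Equation (7). The matrices A_{b} and B_{b} below are illustrative numbers, and the helper names are hypothetical.

```python
def matmul(X, Y):
    """Product of two matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(X, v):
    """Product of a nested-list matrix and a vector."""
    return [sum(X[i][k] * v[k] for k in range(len(v))) for i in range(len(X))]

def prediction_matrices(Ab, Bb, Np, Nc):
    """Build P and Psi of Equation (9), assuming C_b = I."""
    n = len(Ab)
    # powers[i] holds Ab**i for i = 0..Np.
    powers = [[[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]]
    for _ in range(Np):
        powers.append(matmul(powers[-1], Ab))
    P, Psi = [], []
    for i in range(1, Np + 1):
        P.extend(powers[i])
        # Block (i, j) is Ab**(i-1-j) Bb when j <= i-1, and zero otherwise.
        cols = [matvec(powers[i - 1 - j], Bb) if j <= i - 1 else [0.0] * n
                for j in range(Nc)]
        for r in range(n):
            Psi.append([cols[j][r] for j in range(Nc)])
    return P, Psi

def predict(P, Psi, x, U):
    """Stacked prediction Y = P x + Psi U of Equation (8)."""
    return [sum(p * xi for p, xi in zip(p_row, x)) +
            sum(s * u for s, u in zip(s_row, U))
            for p_row, s_row in zip(P, Psi)]

# Illustrative discrete-time model and horizons (not the paper's values).
Ab = [[1.0, 0.0, 0.01], [0.01962, 1.0, 0.0], [-0.11772, 0.0, 1.0]]
Bb = [0.0, 0.001, -0.001]
Np, Nc = 4, 2
x0, U = [0.05, 1.0, 0.0], [-20.0, -10.0]
P, Psi = prediction_matrices(Ab, Bb, Np, Nc)
Y = predict(P, Psi, x0, U)
```

The stacked vector Y contains the predicted states for steps k_{i} + 1 to k_{i} + N_{p}, and it should match a step-by-step simulation of Equation (6) with the same inputs.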

2.3. Adaptive Braking Reference Trajectory

To address the coordination between rapid stopping requirements and payload swing suppression during emergency braking of the overhead crane, a composite braking reference trajectory is proposed in this study:

\begin{cases} r_{k} = x_{c} \, \frac{2 λ \exp(- α t_{k}) + (1 - λ)(1 - \tanh(β t_{k}))}{1 + λ} \\ t_{k} = k \cdot T_{s} \\ k = 0, 1, \dots, N_{p} - 1 \end{cases}

(10)

where x_{c} = \begin{bmatrix} θ_{c} & v_{c} & \dot{θ}_{c} \end{bmatrix}^{T} denotes the system state at the braking triggering instant, with θ_{c}, v_{c}, and \dot{θ}_{c} being the payload swing angle, trolley velocity, and swing angular velocity at that instant, respectively. T_{s} is the sampling period. λ, α, and β are adaptive trajectory parameters updated online by the reinforcement learning algorithm according to operating conditions such as payload mass, velocity, and swing state.

The proposed braking trajectory integrates an exponential decay term and a hyperbolic tangent function to simultaneously achieve rapid initial deceleration and smooth terminal convergence. The exponential term governs the braking intensity through α, while the tanh term ensures smooth transition and swing attenuation regulated by β. The parameter λ balances the contribution of the two components, thereby controlling the trade-off between braking speed and swing suppression. With reinforcement learning-based adaptation, these parameters are updated online to improve performance under varying operating conditions. Compared with conventional approaches, the proposed trajectory provides a unified framework that combines fast response, smooth transition, and adaptive regulation.

To construct the MPC performance index function, the reference trajectory over the prediction horizon is defined as shown in Equation (11):

r(k_{i}) = \begin{bmatrix} r(0) \\ r(1) \\ \vdots \\ r(N_{p} - 1) \end{bmatrix} = \begin{bmatrix} x_{c} \cdot \frac{2 λ \exp(- α \cdot 0 \cdot T_{s}) + (1 - λ)(1 - \tanh(β \cdot 0 \cdot T_{s}))}{1 + λ} \\ x_{c} \cdot \frac{2 λ \exp(- α \cdot 1 \cdot T_{s}) + (1 - λ)(1 - \tanh(β \cdot 1 \cdot T_{s}))}{1 + λ} \\ \vdots \\ x_{c} \cdot \frac{2 λ \exp(- α \cdot (N_{p} - 1) \cdot T_{s}) + (1 - λ)(1 - \tanh(β \cdot (N_{p} - 1) \cdot T_{s}))}{1 + λ} \end{bmatrix}

(11)

where r(k_{i}) denotes the stacked reference trajectory over the prediction horizon.
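To illustrate how Equations (10) and (11) shape the braking reference, the following sketch generates the reference sequence for one set of trajectory parameters. The numerical values of x_{c}, λ, α, β, T_{s}, and N_{p} are illustrative assumptions chosen inside the ranges discussed in this paper, and the function name is hypothetical.

```python
import math

def reference_trajectory(x_c, lam, alpha, beta, Ts, Np):
    """Return [r_0, ..., r_{Np-1}] following Equation (10)."""
    ref = []
    for k in range(Np):
        t_k = k * Ts
        # Blend of exponential decay and (1 - tanh), normalized by (1 + lambda).
        scale = (2.0 * lam * math.exp(-alpha * t_k)
                 + (1.0 - lam) * (1.0 - math.tanh(beta * t_k))) / (1.0 + lam)
        ref.append([scale * xi for xi in x_c])
    return ref

# Hypothetical state at the braking trigger: [theta_c, v_c, theta_dot_c].
x_c = [0.05, 1.0, 0.0]
ref = reference_trajectory(x_c, lam=0.5, alpha=2.0, beta=1.0, Ts=0.05, Np=40)
```

At t_{k} = 0 the scaling factor equals (2 λ + 1 - λ) / (1 + λ) = 1, so the reference starts exactly at x_{c} and then decays monotonically toward zero. Setting λ = 1 leaves only the exponential term, while λ = 0 leaves only the tanh term.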

2.4. Objective Function and Constraint

To ensure accurate tracking of the adaptive braking reference trajectory designed in the previous section, the following MPC objective function is formulated [29,30]:

J (u) = \sum_{k = 1}^{N_{p}} {(y_{k} - r_{k})}^{T} Q (y_{k} - r_{k}) + \sum_{k_{i} = 1}^{N_{c}} u_{k_{i}}^{T} R u_{k_{i}}

(12)

where y_{k} denotes the predicted state at step k; r_{k} is the reference value at step k; R is the control input weighting matrix; and Q is the state tracking weighting matrix. In particular, Q is adaptively tuned online by the subsequent deep reinforcement learning algorithm based on real-time system-state information, thereby enabling an optimal weighting configuration under varying operating conditions.

To suppress payload swing during braking and ensure that the control input satisfies actuator constraints, the following input constraints are imposed, as shown in Equation (13):

- u_{\max} \leq u_{k_{i}} \leq u_{\max}

(13)

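where u_{\max} denotes the maximum allowable magnitude of the system control input. Since the first line of Equation (4) gives u = M \dot{v} - m g θ, bounding the trolley acceleration by |\dot{v}| \leq a_{\max} gives u_{\max} = M a_{\max} - m g θ, and the following holds: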

- M a_{\max} - m g θ_{k_{i - 1}} \leq u_{k_{i}} \leq M a_{\max} - m g θ_{k_{i - 1}}

(14)

where a_{\max} satisfies

a_{\max} \leq \frac{l g}{T_{s}} \left( θ_{\max} - \sqrt{θ^{2}(0) + \frac{l}{g} \dot{θ}^{2}(0)} \right)

(15)

Here a_{\max} denotes the maximum allowable acceleration [31], and θ_{\max} represents the maximum permissible swing angle of the system. θ(0) and \dot{θ}(0) are the initial values of θ(t) and \dot{θ}(t) at the beginning of the current control interval, respectively. For each control interval, these initial values are updated to the terminal state of the previous interval to ensure continuity of state evolution between successive control cycles. For the detailed derivation of the control input and the swing angle θ, please refer to Appendix A.
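The state-dependent input bounds of Equation (14) can be sketched as follows. Here a_{\max} is treated as a given number (for instance, one chosen to satisfy Equation (15)); the function names and all numerical values are illustrative assumptions.

```python
def input_bounds(a_max, theta_prev, M, m, g=9.81):
    """Lower and upper limits on the control input from Equation (14)."""
    offset = m * g * theta_prev  # swing-dependent shift of both limits
    return -M * a_max - offset, M * a_max - offset

def clip_input(u, lower, upper):
    """Saturate a candidate control input to the admissible interval."""
    return max(lower, min(upper, u))

# Hypothetical values: a_max = 2 m/s^2, previous swing angle 0.05 rad.
lower, upper = input_bounds(a_max=2.0, theta_prev=0.05, M=10.0, m=2.0)
u_applied = clip_input(25.0, lower, upper)
```

Because both limits are shifted by the same swing-dependent offset, the admissible interval keeps a constant width of 2 M a_{\max} and moves with the measured swing angle.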

The above optimization problem can be reformulated into a standard quadratic programming (QP) form, as shown in Equation (16):

J (u) = \min_{U} \frac{1}{2} U^{T} H U + f^{T} U

(16)

where

\begin{cases} H = 2 (Ψ^{T} Q Ψ + R I) \\ f = 2 Ψ^{T} Q (P x_{b} - R_{v}) \end{cases}

(17)

The optimization is solved using IPOPT within a receding-horizon scheme. Only the first control input u_{0} is implemented, and the problem is updated and re-solved at each step. This scheme enables tracking of the adaptive reference trajectory, achieving anti-sway in the payload and smooth braking.
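For readers who wish to check Equation (17), a short derivation sketch is given below. It assumes that R_{v} denotes the stacked reference vector of Equation (11), and it introduces \bar{Q} and \bar{R} (notation not used in the paper) for the symmetric block-diagonal stackings of Q and R over the prediction and control horizons, which the paper writes simply as Q and R I.

```latex
\begin{aligned}
J &= (P x_{b} + Ψ U - R_{v})^{T} \bar{Q} (P x_{b} + Ψ U - R_{v}) + U^{T} \bar{R} U \\
  &= U^{T} (Ψ^{T} \bar{Q} Ψ + \bar{R}) U + 2 (P x_{b} - R_{v})^{T} \bar{Q} Ψ U + \text{const} \\
  &= \tfrac{1}{2} U^{T} H U + f^{T} U + \text{const},
\end{aligned}
\qquad
H = 2 (Ψ^{T} \bar{Q} Ψ + \bar{R}), \quad
f = 2 Ψ^{T} \bar{Q} (P x_{b} - R_{v})
```

The constant term does not depend on U and can therefore be dropped from the minimization, which leaves the QP form of Equation (16).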

3. TD3-Based Adaptive Control Framework

The integration of reinforcement learning (RL) with MPC has shown strong potential for addressing complex control problems [32,33]. Among RL methods, the TD3 algorithm is a deterministic policy gradient approach designed for continuous action spaces, which improves training stability by employing twin critic networks, delayed policy updates, and target policy smoothing to mitigate Q-value overestimation. In this study, TD3 is utilized to perform online adaptive tuning of key MPC parameters, including the prediction horizon N_p, control horizon N_c, weighting matrix Q, and reference trajectory parameters λ, α, and β, thereby improving trajectory tracking accuracy, anti-sway performance, and overall control robustness during crane braking operations.

3.1. State and Action Space Design

The state observed by the TD3 agent is the system state defined in Equation (2), as given in Equation (18):

x_{t} = x_{b} = \begin{bmatrix} θ & v & \dot{θ} \end{bmatrix}^{T}

(18)

where θ denotes the payload swing angle and serves as a key indicator of system stability. When |θ| > θ_{\max}, the system is considered to be in a critical state. The trolley velocity v represents the current motion state, while the swing angular velocity \dot{θ} describes the evolution trend of the swing and is used to predict its future behavior.

The action space Ω generated by the TD3 agent is defined in Equation (19) and consists of eight action dimensions for online adaptive tuning of key MPC parameters:

Ω = \begin{bmatrix} N_{p} & N_{c} & λ & α & β & Q_{θ} & Q_{v} & Q_{\dot{θ}} \end{bmatrix}

(19)

The ranges of the action variables are defined as follows: N_{p} \in [40, 120], N_{c} \in [32, 100], λ \in [0, 1], α \in [1.5, 3.6], and β \in [0.3, 2].

3.2. Reward Function Construction

To improve the learning efficiency and control performance of the TD3 agent, a multi-objective composite reward function is constructed [34,35], consisting of four components: state-tracking, control-constraint, parameter-optimization, and safety-constraint terms, as shown in Equation (20).

r = \underbrace{r_{θ} + r_{v} + r_{\dot{θ}}}_{\text{State tracking}} + \underbrace{r_{u} + r_{Δ u}}_{\text{Control constraint}} + \underbrace{r_{N_{p}} + r_{p}}_{\text{Parameter optimization}} + \underbrace{r_{done}}_{\text{Safety constraint}}

(20)

1. State tracking reward

The state-tracking reward guides the agent to achieve rapid braking and payload anti-sway control. It consists of three terms: swing angle, trolley velocity, and angular velocity. All state variables are normalized into a dimensionless form, and their deviations are evaluated using absolute values. An exponential structure is adopted to improve sensitivity to large deviations while ensuring smooth convergence near equilibrium.

The corresponding reward terms are defined as follows:

r_{θ} = - ω_{θ} (e^{a \bar{θ}} - 1)

(21)

r_{v} = - ω_{v} (e^{b \bar{v}} - 1)

(22)

r_{\dot{θ}} = - ω_{\dot{θ}} (e^{c \bar{\dot{θ}}} - 1)

(23)

where ω_{θ}, ω_{v}, and ω_{\dot{θ}} are the weighting coefficients for the swing angle, velocity, and angular velocity, respectively; a, b, and c are the exponential parameters; \bar{θ} = | θ^{'} | / θ_{\max}, \bar{v} = | v^{'} | / v_{0}, and \bar{\dot{θ}} = | \dot{θ}^{'} | / \dot{θ}_{\max} denote the normalized state variables; θ^{'}, v^{'}, and \dot{θ}^{'} represent the predicted values at the next time step; and θ_{\max}, v_{0}, and \dot{θ}_{\max} are the corresponding reference or limit values.

The three terms penalize the deviations of swing angle, velocity, and angular velocity, respectively, thereby achieving rapid braking and effective suppression of payload oscillations.

2. Control constraint reward

The control-constraint reward is designed to regulate the control-input magnitude and its variation, ensuring smooth actuation during emergency braking. It consists of two terms associated with the control input and its rate of change.

The corresponding reward terms are defined as follows:

r_{u} = - ω_{u} u^{2}

(24)

r_{Δ u} = - ω_{Δ u} (u - u_{prev})^{2}

(25)

where u and u_{prev} denote the control inputs at the current and previous time steps, respectively; ω_{u} and ω_{Δ u} are the weighting coefficients for the control input and its variation.

The two terms penalize excessive control magnitude and abrupt variations, thereby ensuring smooth actuation and improving system stability during the braking process.

3. Parameter-optimization reward

The parameter-optimization reward is designed to adaptively regulate key MPC parameters, including the prediction horizon and reference trajectory parameters, to improve predictive performance and control effectiveness during emergency braking. It consists of two terms associated with the prediction horizon and parameter coordination.

The corresponding reward terms are defined as follows:

r_{N_{p}} = 0.3 \frac{| θ^{'} |}{θ_{\max}} \frac{N_{p}}{N_{p}^{\max}} - 0.15 {(\frac{N_{p}}{N_{p}^{\max}})}^{2}

(26)

r_{p} = - ω_{α β} {(α β - c)}^{2}

(27)

where N_{p} is the prediction horizon; N_{p}^{\max} is its upper bound; ω_{α β} is the weighting coefficient associated with the parameter coordination term; α and β are the reference trajectory parameters; and c is the desired coordination value.

The first term regulates the prediction horizon according to the system state, enlarging it under large oscillations to enhance predictive capability while penalizing excessive values to reduce computational burden. The second term penalizes improper coordination between α and β, thereby maintaining a balance between braking responsiveness and trajectory smoothness.

4. Safety constraint penalty

The safety constraint penalty is designed to enforce state constraints and ensure safe operation during emergency braking. It consists of a terminal penalty term associated with the swing angle limit.

The corresponding penalty term is defined as follows:

r_{done} = \begin{cases} - 50, & | θ^{'} | > θ_{\max} \\ 0, & | θ^{'} | \leq θ_{\max} \end{cases}

(28)

The term penalizes constraint violations by imposing a large penalty when the swing angle exceeds its allowable bound, thereby promoting safe operation and preventing unstable behaviors.

In summary, the proposed composite reward integrates state tracking, control constraint, parameter optimization, and safety constraint terms to achieve a balanced trade-off among braking efficiency, anti-sway performance, control smoothness, and safety, forming a unified framework for TD3-based adaptive MPC.
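The composite reward of Equations (20) to (28) can be sketched as a single function. The weighting coefficients, exponential parameters, and limit values used below are hypothetical placeholders, since this section does not list their numerical values. The coordination target of Equation (27) is named c_coord here to keep it distinct from the exponential parameter c of Equation (23).

```python
import math

def composite_reward(theta_n, v_n, theta_dot_n, u, u_prev, Np, alpha, beta,
                     theta_max, v0, theta_dot_max, Np_max, c_coord,
                     w=None, exps=(1.0, 1.0, 1.0)):
    """Return (total reward, per-term dict) following Equation (20)."""
    # Hypothetical weights; the actual values are not given in this section.
    w = w or {"theta": 1.0, "v": 1.0, "theta_dot": 1.0,
              "u": 1e-4, "du": 1e-4, "ab": 0.1}
    a, b, c_exp = exps
    theta_bar = abs(theta_n) / theta_max
    v_bar = abs(v_n) / v0
    theta_dot_bar = abs(theta_dot_n) / theta_dot_max
    ratio = Np / Np_max
    terms = {
        # State tracking, Equations (21)-(23); expm1(x) = exp(x) - 1.
        "theta": -w["theta"] * math.expm1(a * theta_bar),
        "v": -w["v"] * math.expm1(b * v_bar),
        "theta_dot": -w["theta_dot"] * math.expm1(c_exp * theta_dot_bar),
        # Control constraint, Equations (24)-(25).
        "u": -w["u"] * u**2,
        "du": -w["du"] * (u - u_prev)**2,
        # Parameter optimization, Equations (26)-(27).
        "Np": 0.3 * theta_bar * ratio - 0.15 * ratio**2,
        "p": -w["ab"] * (alpha * beta - c_coord)**2,
        # Safety constraint, Equation (28).
        "done": -50.0 if abs(theta_n) > theta_max else 0.0,
    }
    return sum(terms.values()), terms

# Hypothetical limits: 0.1 rad swing bound, 1 m/s initial speed, 0.5 rad/s.
limits = dict(theta_max=0.1, v0=1.0, theta_dot_max=0.5, Np_max=120, c_coord=2.0)
total_rest, terms_rest = composite_reward(0.0, 0.0, 0.0, 0.0, 0.0,
                                          120, 2.0, 1.0, **limits)
total_viol, terms_viol = composite_reward(0.2, 0.0, 0.0, 0.0, 0.0,
                                          120, 2.0, 1.0, **limits)
```

Returning the individual terms alongside the total makes it easier to see which component dominates during training. At rest with N_{p} at its upper bound, only the horizon penalty of Equation (26) remains.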

3.3. TD3 Learning and Update Strategy

To enable online adaptive optimization of MPC parameters during emergency braking of the overhead crane, a TD3 agent is developed. The agent takes the current system state x_{t} as input and outputs the action vector Ω_{t} in real time. This vector is used to adjust the prediction horizon, control horizon, weighting matrix, and reference trajectory parameters, thereby improving control performance under varying operating conditions.

During policy exploration, the deterministic action generated by the Actor network is perturbed using Ornstein–Uhlenbeck (OU) noise to enhance exploration in continuous action spaces, as expressed in Equation (29):

\begin{cases} Ω_{t} = π_{ϕ}(x_{t}) + χ_{t} \\ χ_{t} = ϕ_{OU} (μ - χ_{t - 1}) + σ_{OU} Ξ_{t} \end{cases}

(29)

where π_{ϕ}(\cdot) denotes the Actor policy network; χ_{t} represents the exploration noise; ϕ_{OU} and σ_{OU} are the mean-reversion coefficient and noise intensity of the OU process, respectively; and Ξ_{t} is a Gaussian random perturbation. This mechanism enhances exploration efficiency while maintaining action continuity.

The samples (x_{t}, Ω_{t}, r_{t}, x_{t + 1}, d_{t}) generated through interaction between the TD3 agent and the environment are stored in a prioritized experience replay buffer. To improve training efficiency, transitions are sampled according to their priorities, and importance sampling weights are introduced to correct estimation bias, as shown in Equation (30).

ω_{i} = \left(\frac{N}{M} P(i)\right)^{-κ}

(30)

where N is the capacity of the replay buffer, M is the mini-batch size, P(i) denotes the priority of sample i, and κ ∈ [0, 1] is the bias correction coefficient.
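Equation (30) can be evaluated directly for a sampled mini-batch, as in the sketch below. The numerical values and the division by the largest weight are assumptions: that normalization is a common practice in prioritized replay to keep the loss scale bounded and is not stated in the text.

```python
def importance_weights(priorities, buffer_capacity, batch_size, kappa):
    """omega_i = ((N / M) * P(i)) ** (-kappa), as in Equation (30)."""
    scale = buffer_capacity / batch_size
    return [(scale * p) ** (-kappa) for p in priorities]


def normalize_by_max(weights):
    """Optional rescaling so that the largest weight equals 1 (assumed practice)."""
    w_max = max(weights)
    return [w / w_max for w in weights]


# Hypothetical priorities P(i) of four sampled transitions.
batch_priorities = [0.01, 0.02, 0.05, 0.10]
weights = importance_weights(batch_priorities, buffer_capacity=1000,
                             batch_size=100, kappa=0.6)
print(normalize_by_max(weights))
```

Transitions with a higher priority receive a smaller weight, which offsets the fact that they are sampled more often.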

When the number of samples reaches the training threshold, a dual Critic network is employed to update the state-action value function, and the Actor network is updated using a delayed policy update mechanism. This alleviates Q value overestimation and improves training stability. The Huber loss function and target policy smoothing are further introduced to enhance training robustness and prevent overly aggressive updates that may destabilize the online MPC optimization.
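The ingredients named above follow the standard TD3 recipe and can be sketched in a few lines. The discount factor, smoothing noise scale, clipping range, policy delay, and Huber threshold used below are hypothetical placeholders, not the values reported in Table 1.

```python
import random


def huber(error, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    a = abs(error)
    return 0.5 * error * error if a <= delta else delta * (a - 0.5 * delta)


def smoothed_target_action(target_action, sigma=0.2, noise_clip=0.5,
                           low=-1.0, high=1.0, rng=random):
    """Target policy smoothing: add clipped Gaussian noise to the target action."""
    out = []
    for a in target_action:
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, sigma)))
        out.append(max(low, min(high, a + eps)))
    return out


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the two target critics."""
    return reward + gamma * (1.0 - float(done)) * min(q1_next, q2_next)


def should_update_actor(step, policy_delay=2):
    """Delayed policy update: the Actor is refreshed every `policy_delay` Critic updates."""
    return step % policy_delay == 0


y = td3_target(reward=-1.2, done=False, q1_next=-10.0, q2_next=-12.0)
print(y, huber(y - (-11.0)), should_update_actor(4))
```

Taking the minimum of the two target Critics is what counteracts Q-value overestimation, while the Huber loss limits the influence of outlier temporal-difference errors.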

To accommodate the heterogeneous multi-parameter action space, the outputs of the Actor network are normalized and then mapped back to their physical ranges according to parameter bounds, enabling effective coupling between TD3 outputs and MPC parameters.
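A minimal sketch of this mapping is given below. It assumes that the Actor output lies in [-1, 1] (for example, after a tanh layer) and uses hypothetical parameter bounds, since the actual ranges are not listed in this section. The horizons are rounded to integers, and the control horizon is limited by the prediction horizon in line with the constraint N_c ≤ N_p used in the framework.

```python
# Hypothetical physical bounds for the 8-dimensional action vector.
BOUNDS = {
    "N_p": (20, 120),
    "N_c": (10, 100),
    "Q_theta": (1e3, 5e4),
    "Q_v": (1e2, 2e4),
    "Q_theta_dot": (1e2, 5e3),
    "alpha": (0.5, 5.0),
    "beta": (0.5, 5.0),
    "lam": (0.0, 1.0),
}


def map_action(action):
    """Map a normalized action in [-1, 1]^8 to physical MPC parameters."""
    if len(action) != len(BOUNDS):
        raise ValueError("action dimension does not match the number of parameters")
    params = {}
    for a, (name, (lo, hi)) in zip(action, BOUNDS.items()):
        a = max(-1.0, min(1.0, a))                    # guard against out-of-range outputs
        params[name] = lo + 0.5 * (a + 1.0) * (hi - lo)
    params["N_p"] = int(round(params["N_p"]))
    # Enforce N_c <= N_p after rounding.
    params["N_c"] = min(int(round(params["N_c"])), params["N_p"])
    return params


print(map_action([0.0] * 8))
```

Keeping the Actor output normalized means that all eight dimensions share the same scale, regardless of how different their physical magnitudes are.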

To ensure reproducibility and address implementation details, the TD3 training hyperparameters used in this study are summarized in Table 1.

3.4. TD3-MPC Integrated Control Framework

The TD3-MPC integrated control framework is illustrated in Figure 2.

The system environment (the overhead crane dynamic model) provides real-time state feedback

x_{t}

to the TD3 agent. Based on the current state, the agent outputs the action vector

Ω_{t}

(normalized control parameters), which is mapped to update the prediction horizon, control horizon, weighting matrix, and reference trajectory parameters of the MPC controller.

The MPC then computes the optimal control input at each time step using the updated parameters and applies it to the overhead crane to achieve emergency braking and anti-sway control. The system subsequently transitions to the next state

x_{t + 1}

, and the immediate reward

r_{t}

is evaluated according to the designed reward function.

During this interaction process, the transition samples are stored in a prioritized experience replay buffer for network training. By prioritizing informative samples and employing dual Critic networks, the learning efficiency, training stability, and parameter optimization performance are improved.

The TD3 agent, acting as a high-level meta-controller, performs online adaptive tuning of key MPC parameters through an 8-dimensional action space, including time-domain parameters, weighting matrices, and reference trajectory parameters.

1. Time-domain parameter optimization

Based on the current system state, the prediction horizon N_p is adaptively adjusted according to the swing dynamics to enhance predictive capability and braking stability. A larger swing angle increases N_p, while a smaller swing angle reduces it to improve computational efficiency and responsiveness.

The control horizon N_c is not an independent optimization variable with a dedicated reward design. Instead, it is implicitly determined by the TD3 policy network as part of the action output, subject to the constraint N_c ≤ N_p, thereby reflecting a trade-off between control performance and computational efficiency.

2. Weighting matrix optimization

A diagonal weighting matrix composed of swing angle weight

Q_{θ}

, velocity weight

Q_{v}

, and angular velocity weight

Q_{\dot{θ}}

is constructed and updated online according to the system state. When the swing angle increases,

Q_{θ}

is increased to prioritize anti-sway control; otherwise, it is reduced to balance response speed and braking efficiency. The remaining weights are adjusted in a coordinated manner to achieve multi-objective performance trade-offs.

3. Reference trajectory optimization

The reference trajectory parameters α and β are optimized online to regulate the decay rate and smoothness of the braking trajectory. The blending factor λ ∈ [0, 1] determines the combination between exponential decay and hyperbolic tangent trajectories and is generated by the TD3 policy network as part of the action output.

Although λ is not directly associated with an independent reward term, it is indirectly optimized through its impact on system performance and the overall reward signal, enabling a trade-off between braking speed and swing suppression without increasing reward design complexity.
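To illustrate how λ, α, and β shape the braking reference, the sketch below blends an exponential decay with a hyperbolic tangent profile. The specific functional form is an assumption chosen only so that the reference starts at the initial velocity and decays to zero; the reference trajectory actually used by the controller is the one defined earlier in the paper and may differ.

```python
import math


def reference_velocity(t, v0, alpha, beta, lam):
    """Assumed blended reference: lam weights the exponential part,
    (1 - lam) weights the hyperbolic tangent part."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    exp_part = math.exp(-alpha * t)            # fast initial decay
    tanh_part = 1.0 - math.tanh(beta * t)      # smoother roll-off
    return v0 * (lam * exp_part + (1.0 - lam) * tanh_part)


# Hypothetical initial velocity and beta; alpha and lam are taken near the
# peak values reported for Scenario 1.
for t in (0.0, 0.5, 1.0, 2.0):
    print(t, round(reference_velocity(t, v0=0.3, alpha=3.11, beta=2.0, lam=0.765), 4))
```

Under this assumed form, a larger λ makes the exponential component dominant and front-loads the deceleration, whereas a smaller λ shifts weight to the smoother hyperbolic tangent component.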

The MPC module receives parameters generated by the TD3 actor and solves a receding-horizon optimization problem based on the system model, current state

x_{t}

, and operational constraints. Only the first control input u₀ is applied to the overhead crane.

Within this hierarchical framework, TD3 is responsible for adaptive parameter tuning under varying operating conditions, while MPC performs model-based predictive optimization and constraint handling. Their integration combines data-driven decision-making with model-based control, improving braking response, anti-sway performance, and operational stability during emergency braking.

4. Safety-Critical Braking Distance Prediction Model

To characterize the braking capability of the overhead crane under different operating velocities, a mapping between trolley velocity and the minimum safe braking distance is established. Considering its strong generalization ability under small-sample conditions and effectiveness in modeling nonlinear input-output relationships [36], SVR is adopted to construct the braking distance model.

In the present study, the SVR model is developed under fixed operating conditions, where trolley velocity is treated as the primary influencing factor, while other variables, including payload mass, cable length, and initial swing conditions, are assumed to vary within a limited range during experimental testing. Therefore, the minimum safe braking distance is modeled as a nonlinear function of velocity, denoted as S(v), establishing a velocity-dependent braking distance prediction model.

The proposed braking distance prediction model is developed from a control-oriented perspective, emphasizing the dominant influence of trolley velocity on braking performance while maintaining model simplicity and computational efficiency. Under the considered operating conditions, the braking distance is primarily governed by the pre-braking kinetic energy of the crane system. Therefore, the velocity-based formulation provides a compact and implementation-friendly representation of braking behavior for real-time braking decision-making.

To ensure the accuracy of the training data and its engineering feasibility, extensive braking simulations are conducted under different initial operating velocities based on the TD3-MPC safe braking control model established in the previous sections. By varying the initial trolley velocity and employing the TD3 algorithm to perform online adaptive tuning of MPC parameters, optimal braking processes satisfying all constraints are obtained under each operating condition. The minimum safe braking distance corresponding to the transition from the initial state to a complete stop is recorded, thereby forming the training dataset as follows:

Data = \{(v_{i}, S_{i})\}_{i=1}^{N}

(31)

where v_i denotes the initial velocity sample of the i-th case, S_i represents the corresponding minimum safe braking distance under that operating condition, and N is the total number of samples.

Based on the above sample data, an SVR model is constructed to establish the nonlinear mapping between trolley velocity and the minimum safe braking distance. Its predictive function can be expressed as

S(v) = \sum_{i=1}^{N_{sv}} α_{i} K(v_{i}, v) + b

(32)

where N_sv denotes the number of support vectors, α_i are the model parameters, b is the bias term, and K(⋅) is the kernel function. Considering the strong nonlinear relationship between velocity and braking distance, the radial basis function (RBF) kernel is adopted in this study as the kernel function of SVR to improve prediction accuracy and robustness.
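Once the support vectors and coefficients are available, Equation (32) reduces to a short weighted sum. The sketch below uses the usual RBF form K(v_i, v) = exp(-γ(v_i - v)²); the support vectors, coefficients, bias, and γ are hypothetical numbers chosen for illustration, not the fitted model.

```python
import math


def rbf_kernel(v_i, v, gamma):
    """Radial basis function kernel for scalar velocities."""
    return math.exp(-gamma * (v_i - v) ** 2)


def svr_predict(v, support_vectors, coeffs, bias, gamma):
    """S(v) = sum_i alpha_i * K(v_i, v) + b, as in Equation (32)."""
    return sum(a * rbf_kernel(sv, v, gamma)
               for sv, a in zip(support_vectors, coeffs)) + bias


# Hypothetical model parameters (not taken from the paper).
support_vectors = [0.1, 0.3, 0.5]
coeffs = [-0.20, 0.05, 0.35]
bias = 0.60
print(svr_predict(0.4, support_vectors, coeffs, bias, gamma=8.0))
```

Evaluating the sum costs one kernel evaluation per support vector, which is negligible relative to the 50 ms sampling period reported for the real-crane experiments.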

After the model is trained, the corresponding minimum safe braking distance can be rapidly predicted based on the real-time measured trolley velocity. Figure 3 presents a comparison between the sample data and the SVR prediction results obtained under the operating condition with crane system parameters M = 280 kg, m = 30 kg, l = 2 m, and θ_max = 0.03 rad.

To evaluate the predictive capability of the proposed model, a dataset consisting of 21 samples was randomly divided into a training set (75%) and an independent test set (25%). Hyperparameter optimization was performed using a grid-search strategy combined with leave-one-out cross-validation (LOOCV) on the training set. To avoid potential data leakage, the independent test set was excluded from both model training and hyperparameter optimization procedures and was used solely for final performance evaluation.

The prediction results show that the SVR model achieves an RMSE of 0.0325, an MAE of 0.0267, and an R² value of 0.9987 on the independent test set. These results indicate that the proposed model can accurately approximate the nonlinear relationship between trolley velocity and the minimum safe braking distance while maintaining excellent prediction capability for unseen velocity samples within the considered operating conditions.
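For reference, the three reported metrics can be computed from test-set targets and predictions as sketched below. The sample values are hypothetical and are not the data used in this study.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical braking distances in metres (measured vs. predicted).
measured = [0.21, 0.36, 0.58, 0.90, 1.12]
predicted = [0.23, 0.35, 0.60, 0.88, 1.15]
print(rmse(measured, predicted), mae(measured, predicted), r2_score(measured, predicted))
```

RMSE and MAE carry the unit of the braking distance, whereas R² is dimensionless.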

The SVR model is trained using braking-distance data generated by the proposed TD3–MPC braking framework and serves as a surrogate model for rapid online braking-distance prediction. The resulting prediction model approximates the achievable braking capability of the controller and is subsequently utilized for safety-critical region construction and braking-trigger determination.

Therefore, the developed SVR model provides a reliable braking-distance prediction module for the subsequent construction of the velocity-dependent safety-critical region boundary.

The function S(v) represents the minimum safe braking distance under different velocities and serves as the basis for constructing the safety-critical boundary. By incorporating payload geometry and swing effects, a velocity-dependent safety-critical region D(v) is established. When an obstacle enters this region, braking control should be activated to avoid a potential collision, as illustrated in Figure 4.

By considering payload swing effects and safety margins, the safety-critical region is expressed as Equation (33):

D (v) = ρ [w (1 + K) + S (v) + l \sin θ_{\max}]

(33)

where ρ is the safety factor; v, w, l, and K denote the trolley velocity, payload width, hoisting rope length, and payload expansion coefficient, respectively; and S(v) is the corresponding braking distance.

Based on the proposed model, the boundary of the safety-critical region can be evaluated in real time under different operating velocities, providing a quantitative basis for braking decisions. During crane operation, the trolley velocity v is continuously measured, and the safety-critical region D(v) is updated online. When an obstacle is detected to enter this region, the TD3-MPC braking controller is triggered, ensuring safe and smooth stopping of the overhead crane while preventing potential collisions.
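The trigger logic described above can be sketched as follows. The rope length and swing-angle limit are taken from the operating condition quoted for Figure 3; the safety factor, payload width, expansion coefficient, and the stand-in braking-distance function are hypothetical and only serve to make the example runnable.

```python
import math


def safety_region(v, braking_distance, rho, w, K, l, theta_max):
    """D(v) = rho * [w (1 + K) + S(v) + l sin(theta_max)], as in Equation (33)."""
    return rho * (w * (1.0 + K) + braking_distance(v) + l * math.sin(theta_max))


def should_brake(obstacle_distance, v, **region_params):
    """Trigger braking once the obstacle lies inside the safety-critical region."""
    return obstacle_distance <= safety_region(v, **region_params)


def toy_braking_distance(v):
    """Hypothetical stand-in for the trained SVR model S(v)."""
    return 0.3 + 1.5 * v


params = dict(braking_distance=toy_braking_distance,
              rho=1.1, w=0.4, K=0.1,      # hypothetical values
              l=2.0, theta_max=0.03)      # values quoted for Figure 3
for distance in (1.5, 1.2, 0.9):
    print(distance, should_brake(distance, v=0.3, **params))
```

Because S(v) grows with velocity, the region boundary moves outward as the trolley accelerates, so braking is triggered at a greater distance from the obstacle at higher speeds.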

5. Examples and Results

This section presents a simulation case study and a real-crane experimental study. The dynamic model of the overhead crane is used as the simulation environment, while the MPC-based braking strategy and the TD3 training and inference processes are implemented on a Python 3.11 platform. First, a simulation study of TD3-MPC braking control is conducted to verify the effectiveness of the proposed control method. Second, the safety-critical region model is incorporated into the TD3-MPC framework, and experiments on a real crane are performed to evaluate both the safety-critical region model and the braking control performance.

5.1. Simulation Analysis of TD3-MPC Braking Control

To evaluate the performance of the proposed TD3-MPC braking control method under different operating velocities and swing angle constraints, three groups of simulation experiments are conducted. The simulation parameters are based on practical engineering data of the overhead crane, as listed in Table 2.

For comparison purposes, a conventional MPC controller is used as the baseline method. The prediction horizon is set to N_p = 50, the control horizon is N_c = 40, and the weighting matrices are defined as Q = diag(30,000, 10,000, 1000) and R = 0.01. These parameters are kept constant across all simulation scenarios to ensure consistent evaluation conditions for the assessment of the proposed TD3–MPC method.

To further evaluate the rationality of the proposed composite reward function and quantify the contribution of each reward component to the TD3 learning process, an ablation study is conducted under Scenario 1. Since this scenario represents a typical emergency braking condition, it provides an appropriate benchmark for assessing the effectiveness of different reward formulations.

The proposed reward function consists of four components, namely the state-tracking reward, control-constraint reward, parameter-optimization reward, and safety-constraint penalty. These reward terms are designed to jointly guide the agent toward achieving accurate state regulation, constraint-compliant control actions, effective online parameter adaptation, and safe braking behavior. To investigate the necessity of each component, four modified reward functions are constructed by removing one reward term at a time while retaining all remaining terms unchanged.

Considering that the primary objective of the proposed braking strategy is to achieve rapid and safe braking while suppressing payload oscillations, the braking distance and the maximum payload swing angle during the braking process are selected as the main evaluation metrics. The corresponding results are summarized in Table 3.

As shown in Table 3, the complete reward function achieves the best overall braking performance, yielding the shortest braking distance of 0.359 m and a maximum payload swing angle of 0.027 rad. The ablation results demonstrate that each reward component contributes to the learning objective from a different perspective.

When the state tracking reward is removed, the braking distance increases from 0.359 m to 0.397 m, corresponding to an increase of 10.28%, while the maximum payload swing angle decreases slightly from 0.027 rad to 0.026 rad. This result indicates that the state tracking reward primarily contributes to braking efficiency by guiding the controller toward the desired braking trajectory. Without this reward term, the controller is less effective in achieving rapid braking performance.

Removing the control constraint reward increases the braking distance to 0.379 m, which is 5.28% higher than that obtained with the complete reward function, while the maximum payload swing angle remains unchanged at 0.027 rad. This finding suggests that the control constraint reward contributes to braking efficiency by encouraging control actions that satisfy system constraints and actuator limitations during the braking process.

When the parameter optimization reward is excluded, the braking distance increases from 0.359 m to 0.386 m, representing an increase of 7.22%, while the maximum payload swing angle increases from 0.027 rad to 0.028 rad, corresponding to an increase of 3.70%. These results indicate that the parameter optimization reward plays an important role in the online optimization process. In particular, this reward term primarily facilitates the adaptive adjustment of the braking reference trajectory parameters, enabling the controller to generate state-dependent braking profiles that effectively balance braking distance reduction and payload swing suppression. In addition, it also contributes to the online tuning of MPC parameters, further enhancing the adaptability of the control framework under varying operating conditions.

The most significant deterioration in swing suppression performance is observed when the safety constraint penalty is removed. Although the braking distance increases only slightly from 0.360 m to 0.363 m, corresponding to an increase of 0.83%, the maximum payload swing angle increases from 0.027 rad to 0.031 rad, representing an increase of 14.81%. This result indicates that the safety constraint penalty plays a crucial role in incorporating safety-related objectives into the learning process and effectively limiting excessive payload oscillations during emergency braking.

Overall, the ablation study demonstrates that the four reward components are complementary and collectively contribute to the overall performance of the proposed TD3-MPC framework. The complete reward function achieves the most favorable trade-off between braking distance reduction and payload swing suppression, thereby validating the rationality and effectiveness of the proposed reward function design.

Based on the adopted reward structure, the proposed TD3-MPC framework is further evaluated under three representative operating scenarios and compared with the conventional MPC controller. The corresponding state responses presented in Figure 5, Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10 are used to evaluate the effectiveness of the TD3-MPC framework and its capability to improve braking performance under different operating conditions.

Figure 5 compares the state response of the proposed TD3-MPC method and conventional MPC under Scenario 1. In terms of braking performance, TD3-MPC achieves a braking distance of 0.359 m, representing a 22.6% reduction compared with 0.464 m for conventional MPC, indicating higher braking efficiency. For payload swing suppression, TD3-MPC limits the maximum swing angle within 0.027 rad, slightly outperforming conventional MPC (0.029 rad). Moreover, the swing response under TD3-MPC converges faster to zero and exhibits improved stability, whereas conventional MPC shows slower convergence and reduced control accuracy. Regarding velocity response, TD3-MPC achieves faster deceleration, reaching a zero-velocity steady state at approximately 5.1 s, while conventional MPC exhibits a slower and delayed braking process.

Figure 6 illustrates the adaptive evolution of key TD3-MPC parameters under Scenario 1, demonstrating that the controller can adjust control parameters online according to system states.

For time-domain parameters, both the prediction horizon N_p and control horizon N_c increase rapidly during the initial braking stage, reaching peak values of 109 and 77, respectively, to enhance the predictive capability of the control sequence under rapidly varying system states. They then gradually converge to stable values of approximately 82 and 65.

For weighting coefficients, the swing angle weight

Q_{θ}

increases sharply in the early stage to approximately 43,000 and stabilizes at about 22,000. The velocity weight

Q_{v}

and angular velocity weight

Q_{\dot{θ}}

exhibit similar trends but with smaller magnitudes and variations, indicating that swing suppression is prioritized during initial braking. As the system stabilizes, all weighting coefficients converge to balanced values, ensuring coordinated control performance.

For reference trajectory parameters, the adaptive variations reflect a trade-off between rapid response and smoothness. The blending factor λ increases initially to approximately 0.765, making the exponential decay component dominant and enhancing braking intensity. It then decreases to about 0.416, shifting the trajectory towards the hyperbolic tangent component to improve smooth deceleration.

The exponential decay coefficient α increases to 3.11 in the early stage to accelerate velocity attenuation and energy dissipation, and then gradually decreases to improve smoothness and reduce aggressive braking behavior. The hyperbolic tangent parameter β increases initially to strengthen control responsiveness and anti-sway capability, and subsequently decreases to suppress oscillations and improve overall stability.

Figure 7 compares the control performance of TD3–MPC and conventional MPC under more stringent swing angle constraints. When the swing angle constraint is set to 0.01 rad, TD3-MPC limits the maximum payload swing angle to 0.009 rad, satisfying the constraint requirement, whereas conventional MPC reaches 0.011 rad, resulting in constraint violation. This demonstrates that TD3-MPC provides superior constraint-handling capability and faster swing suppression. In terms of braking performance, TD3-MPC achieves a braking distance of 0.856 m, representing a 13.4% reduction compared with 0.989 m for conventional MPC, indicating higher braking efficiency under strict constraints. Overall, TD3-MPC maintains both effective swing suppression and satisfactory braking performance under tight constraints, outperforming conventional MPC in terms of constraint satisfaction and robustness.

Figure 8 illustrates the evolution of key parameters under Scenario 2. Under stricter swing constraints, all parameters increase rapidly at the initial braking stage to enhance control effort and predictive capability, prioritizing swing suppression and constraint satisfaction. They then gradually converge to steady values, reducing control fluctuations and achieving a balance between braking performance and system smoothness.

Figure 9 compares the control performance of TD3–MPC and conventional MPC under a higher operating speed condition. When the trolley velocity is 0.5 m/s, TD3-MPC limits the maximum payload swing angle to 0.027 rad, lower than 0.030 rad achieved by conventional MPC. Moreover, the swing response under TD3-MPC converges faster to zero and exhibits improved stability, demonstrating superior anti-sway performance.

In terms of braking performance, TD3-MPC achieves a braking distance of 0.901 m, representing a 19.4% reduction compared with 1.118 m for conventional MPC, indicating that high braking efficiency is maintained even under high-speed conditions. Overall, TD3-MPC achieves fast braking and effective vibration suppression while maintaining system stability under high-speed operation and outperforms conventional MPC in terms of robustness and overall control performance.

Figure 10 illustrates the evolution of key parameters under Scenario 3. The results show a clear two-stage adjustment behavior during the braking process at higher operating speeds.

The first peak occurs at the initial braking stage, where large system inertia requires a rapid parameter increase to enhance braking intensity and suppress swing motion. The second peak appears after significant velocity reduction, when residual oscillations dominate, prompting further parameter adjustment to accelerate convergence and improve damping performance.

After this stage-wise adjustment, all parameters gradually decrease and stabilize at lower levels, reducing control fluctuations and avoiding excessive control effort. This two-peak characteristic indicates that TD3-MPC can adaptively regulate control behavior according to system state evolution, achieving a balance between rapid braking and effective anti-sway performance under high-speed conditions.

The quantitative performance of the conventional MPC and the proposed TD3–MPC under different operating scenarios is summarized in Table 4. The braking distance, maximum payload swing angle, and settling time are selected as evaluation metrics.

As shown in Table 4, the proposed TD3–MPC consistently outperforms conventional MPC under all operating conditions. Specifically, the braking distance is reduced by 13.4–22.6%, while the settling time is shortened by 20.7–26.6%. In addition, the maximum payload swing angle is reduced by 6.9–18.2%, demonstrating improved swing suppression capability. These results indicate that online adaptation of the prediction horizon, control horizon, and weighting parameters enables the controller to achieve a better balance between braking efficiency and oscillation suppression. Overall, the proposed TD3–MPC method exhibits superior robustness and control performance across different operating velocities and swing-angle constraints.
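The reported reduction ranges for braking distance and maximum swing angle can be reproduced from the per-scenario values quoted in the text, as the short calculation below shows. Settling times are not restated in the text and are therefore omitted; the function and variable names are illustrative.

```python
def reduction_percent(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return round((baseline - proposed) / baseline * 100.0, 1)


# (conventional MPC, TD3-MPC) pairs quoted for Scenarios 1-3.
braking_distance_m = {
    "Scenario 1": (0.464, 0.359),
    "Scenario 2": (0.989, 0.856),
    "Scenario 3": (1.118, 0.901),
}
max_swing_rad = {
    "Scenario 1": (0.029, 0.027),
    "Scenario 2": (0.011, 0.009),
    "Scenario 3": (0.030, 0.027),
}

for name in braking_distance_m:
    distance_gain = reduction_percent(*braking_distance_m[name])
    swing_gain = reduction_percent(*max_swing_rad[name])
    print(f"{name}: braking distance -{distance_gain}%, max swing angle -{swing_gain}%")
```

The smallest and largest values across the three scenarios give the 13.4–22.6% range for braking distance and the 6.9–18.2% range for maximum swing angle.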

5.2. Real Crane Experiment

To validate the effectiveness of the proposed TD3–MPC braking control strategy and the safety-critical region prediction model, experimental studies are conducted using the developed overhead crane experimental platform.

A photograph of the experimental platform is shown in Figure 11, and the main system parameters are summarized as follows: the crane span is 4.2 m, the rated lifting height is 2.5 m, and the rated lifting capacity is 1 t. The hoisting speed ranges from 1 to 10 m/min, the trolley traveling speed ranges from 5 to 36 m/min, and the gantry traveling speed ranges from 5 to 10 m/min. In addition, the total equipment height is 2.8 m, the overall length of the gantry mechanism is 4.5 m, the gantry width is 0.4 m, and the working duty classification is A5.

The experimental platform employs a SICK DS35-B15521 laser distance sensor and a Hikrobot MV-CA023-10GC industrial camera to measure the trolley position and payload swing angle, respectively. The laser sensor provides a measurement range of 50 mm–12 m with an accuracy of approximately ±10 mm. The payload swing angle is obtained through a vision-based measurement system using the industrial camera, which operates at a resolution of 1920 × 1200 pixels and a maximum frame rate of 41 fps. To reduce measurement noise and improve signal quality, a low-pass filter is applied to the acquired swing-angle signals before they are used for control implementation.

Sensor data are transmitted to the host computer via a Gigabit Ethernet interface using the GigE Vision protocol, enabling real-time state acquisition. The proposed control algorithm is implemented on a host computer equipped with an Intel Core i9-10900 processor (2.8 GHz, 10 cores) and 64 GB RAM. The controller operates with a sampling period of 50 ms. The computational performance of the proposed TD3-MPC framework is summarized in Table 5. During the experiments, the average control cycle time was 25.12 ms, and the maximum control cycle time was 35.67 ms, both of which remained below the sampling period. These results indicate that the proposed controller can be executed online on the experimental platform.
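The real-time margin implied by these figures can be made explicit with a short calculation; the function names are illustrative.

```python
SAMPLING_PERIOD_MS = 50.0
AVG_CYCLE_MS = 25.12
MAX_CYCLE_MS = 35.67


def utilization_percent(cycle_ms, period_ms=SAMPLING_PERIOD_MS):
    """Share of the sampling period consumed by one control cycle."""
    return round(cycle_ms / period_ms * 100.0, 2)


def margin_ms(cycle_ms, period_ms=SAMPLING_PERIOD_MS):
    """Time left in the sampling period after the control cycle finishes."""
    return round(period_ms - cycle_ms, 2)


print(utilization_percent(AVG_CYCLE_MS), margin_ms(AVG_CYCLE_MS))
print(utilization_percent(MAX_CYCLE_MS), margin_ms(MAX_CYCLE_MS))
```

Even in the worst observed cycle, about 71% of the sampling period is used, leaving roughly 14 ms of headroom.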

To further validate the effectiveness of the proposed safe braking control strategy and safety-critical region prediction method under different operating conditions, a series of experiments is conducted on the developed overhead crane platform. The detailed parameter settings for the experimental groups are summarized in Table 6.

5.2.1. Experimental Group 1

Experimental Group 1 is conducted to establish the braking distance prediction model. The trolley is accelerated to different operating velocities ranging from 0 to 0.6 m/s, and 12 groups of braking distance samples are collected. Based on the experimental data, an SVR-based braking distance prediction model is developed to characterize the relationship between trolley velocity and minimum braking distance. The corresponding prediction results are shown in Figure 12.

Based on the prediction results, the safety-critical region is constructed according to the method presented in Section 4. To evaluate the effectiveness of the proposed approach, experiments are conducted with obstacle distances of 0.8 m and 1.5 m. The corresponding results are shown in Figure 13 and Figure 14.

As shown in Figure 13, when the obstacle is positioned 0.8 m ahead of the crane, the safety-critical region is reached at t = 1.80 s, corresponding to a trolley velocity of 0.227 m/s. The predicted minimum braking distance at this instant is 0.592 m, and the braking controller is immediately activated. The maximum payload swing angle during braking is limited to 0.026 rad. The crane finally comes to a complete stop after approximately 7.50 s with a total displacement of 0.623 m, maintaining a safe clearance from the obstacle.

Figure 14 presents the results for an obstacle distance of 1.5 m. Owing to the longer acceleration phase, braking control is activated at t = 3.10 s with a trolley velocity of 0.404 m/s and a predicted minimum braking distance of 0.895 m. Despite the increased braking demand, the maximum payload swing angle remains limited to 0.026 rad. The crane eventually stops after 7.05 s with a total displacement of 1.333 m.

These results demonstrate that the proposed safety-critical region prediction method can accurately determine the braking trigger timing and effectively coordinate with the TD3–MPC controller, ensuring safe braking while maintaining satisfactory swing suppression performance.

5.2.2. Experimental Group 2

Experimental Group 2 is conducted following the same procedure as Experimental Group 1. The corresponding braking-distance prediction results are presented in Figure 15, while the safety-critical-region prediction and braking-control results are shown in Figure 16 and Figure 17.

For the obstacle distance of 0.8 m, braking control is activated at t = 1.75 s with a trolley velocity of 0.205 m/s and a predicted minimum braking distance of 0.586 m. The maximum payload swing angle during braking is limited to 0.029 rad, and the crane safely stops after approximately 7.50 s with a total displacement of 0.562 m.

For the obstacle distance of 1.5 m, the braking controller is triggered at t = 3.05 s with a trolley velocity of 0.382 m/s and a predicted minimum braking distance of 0.886 m. The maximum payload swing angle remains limited to 0.028 rad, and the crane comes to a complete stop after 6.90 s with a total displacement of 1.252 m.

Compared with Experimental Group 1, the increased payload mass slightly modifies the braking characteristics of the crane system and leads to a moderate increase in payload oscillation. Nevertheless, the proposed method continues to provide accurate braking decisions and satisfactory swing suppression, demonstrating its effectiveness under different payload conditions.

5.2.3. Experimental Group 3

Experimental Group 3 is conducted to further investigate the influence of rope length on the proposed braking framework. The braking distance prediction results are shown in Figure 18, and the corresponding braking-control results are presented in Figure 19 and Figure 20.

For the obstacle distance of 0.8 m, the safety-critical region is reached at t = 1.70 s, corresponding to a trolley velocity of 0.215 m/s and a predicted minimum braking distance of 0.610 m. The maximum payload swing angle is limited to 0.028 rad, and the crane safely stops after approximately 7.75 s with a total displacement of 0.656 m.

For the obstacle distance of 1.5 m, braking control is activated at t = 3.10 s with a trolley velocity of 0.405 m/s and a predicted minimum braking distance of 0.885 m. The maximum payload swing angle remains limited to 0.028 rad, and the crane eventually stops after 7.69 s with a total displacement of 1.377 m.

Compared with Experimental Group 1, a longer rope length results in a larger braking-distance requirement owing to altered pendulum dynamics in the crane system. Consequently, the safety-critical region expands and braking control is activated earlier. Despite the increased control challenge, the proposed TD3–MPC controller maintains stable braking performance and effective swing suppression.

The experimental results obtained from the three experimental groups demonstrate that the proposed braking framework maintains satisfactory performance under different payload masses and rope lengths. Variations in payload mass and rope length alter the dynamic characteristics of the crane system and consequently affect the minimum braking-distance requirement. Since the safety-critical region is constructed based on the predicted braking distance, these variations directly influence its size.

As the payload mass or rope length increases, a larger braking distance is generally required, resulting in an expanded safety-critical region and earlier braking activation. Nevertheless, the SVR-based braking-distance prediction model successfully captures these changes and provides accurate braking-distance estimates under different operating conditions. Combined with the TD3–MPC controller, the proposed framework consistently achieves safe stopping and effective payload swing suppression, demonstrating good adaptability and practical applicability under varying operating conditions.
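The trigger logic discussed above can be illustrated with a minimal sketch. It assumes that braking starts once the remaining distance to the obstacle no longer exceeds the predicted minimum braking distance scaled by the safety factor ρ. The function names, the default ρ = 1.0, and the straight-line stand-in predictor (fitted through the two Group 3 operating points reported above) are hypothetical and do not reproduce the trained SVR model or the exact region definition used in the paper.

```python
from typing import Callable


def in_safety_critical_region(
    distance_to_obstacle: float,
    velocity: float,
    predict_braking_distance: Callable[[float], float],
    rho: float = 1.0,
) -> bool:
    """Return True once braking should be triggered (assumed rule)."""
    required = rho * predict_braking_distance(velocity)
    return distance_to_obstacle <= required


def stand_in_predictor(velocity: float) -> float:
    """Hypothetical stand-in for the SVR model: a straight line through
    the two Group 3 points (0.215 m/s, 0.610 m) and (0.405 m/s, 0.885 m)."""
    v1, d1 = 0.215, 0.610
    v2, d2 = 0.405, 0.885
    return d1 + (d2 - d1) * (velocity - v1) / (v2 - v1)


# At 0.215 m/s the stand-in predicts a braking distance of 0.610 m.
far = in_safety_critical_region(1.20, 0.215, stand_in_predictor)   # obstacle still far
near = in_safety_critical_region(0.60, 0.215, stand_in_predictor)  # inside the region
print(far, near)
```

Passing the predictor as a callable keeps the trigger rule separate from the regression model, so a trained SVR could be substituted without changing the decision logic.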

6. Conclusions

This study proposes a safe braking method for overhead cranes by integrating TD3-MPC with safety-critical region prediction. Through the coordinated design of the control strategy and braking decision-making mechanism, the proposed method achieves simultaneous improvement of braking performance and operational safety under complex working conditions. By combining intelligent control with data-driven braking decision-making, the proposed framework provides a systematic solution for safe emergency braking control of overhead cranes.

First, the TD3 algorithm is incorporated into the MPC framework to realize online adaptive optimization of key MPC parameters. This enables the controller to dynamically adjust braking behavior according to real-time system states, thereby achieving an effective balance between rapid braking and payload swing suppression. Subsequently, a data-driven minimum braking distance prediction model is established to quantify the braking capability of the system under different operating conditions. Based on this model, a safety-critical region prediction model is further constructed to dynamically determine the optimal braking trigger timing, overcoming the limitation of conventional fixed safety-distance methods that lack state dependency.

To validate the effectiveness of the proposed approach, simulations are conducted under different swing-angle constraints and operating velocities, and real-crane experiments are conducted under different payload masses and rope lengths. Simulation results demonstrate that the proposed TD3–MPC method outperforms conventional MPC in terms of braking efficiency, swing suppression capability, and system stability while satisfying swing angle constraints. In particular, at higher operating velocities, the controller can adaptively adjust key parameters to account for variations in system dynamics, thereby achieving coordinated optimization of rapid braking and vibration suppression. Furthermore, real-crane experimental results verify that the proposed safety-critical region prediction method can accurately determine the braking trigger timing and coordinate effectively with the TD3–MPC braking controller, enabling safe and smooth stopping of the overhead crane under obstacle-avoidance conditions.

This study demonstrates the potential of integrating TD3-MPC with safety-critical region prediction for emergency braking of overhead cranes and provides both theoretical support and practical guidance for intelligent safe braking control under complex operating conditions. At present, the proposed safety-critical region prediction model mainly evaluates system states based on velocity information. Future work will further investigate the influence of payload mass, rope length, swing angle, and other operating factors on the proposed braking prediction framework to enhance its generality and adaptability. Meanwhile, the proposed control framework will be extended to more realistic crane models incorporating cable flexibility, distributed cable mass, and additional nonlinear dynamic characteristics, thereby improving model fidelity and promoting the practical deployment of intelligent safe braking technologies in industrial crane systems.

Author Contributions

W.Z.: writing—original draft, methodology, and validation; Y.W.: writing—review and editing and validation; M.L.: writing—review and editing and supervision; P.L.: supervision, methodology, and validation. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (Grant No. 2025YFE0216900) and the National Natural Science Foundation of China (Grant No. U25A20342).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available on request from the authors.

Conflicts of Interest

The authors declare no competing interests.

Nomenclature

Symbol	Description
M	Trolley mass
m	Payload mass
l	Rope length
x	Trolley position
θ	Payload swing angle
u	Control input force
$u_{\mathrm{prev}}$	Previous control input
N_p	Prediction horizon
N_c	Control horizon
Q	State weighting matrix
R	Control weighting matrix
λ	Reference trajectory parameter
α	Reference trajectory parameter
β	Reference trajectory parameter
$(\cdot)'$	Predicted value at the next time step
$θ_{\max}$	Maximum allowable payload swing angle
$v_{0}$	Trolley velocity at the braking trigger instant
$(\bar{\cdot})$	Normalized form of a variable
ρ	Safety factor
w	Payload width
K	Payload expansion coefficient

Appendix A

Appendix A.1. Overview of Objective Function and Constrained Optimization Problem

The core of overhead crane braking control lies in solving a quadratic programming (QP) problem within a finite horizon at each sampling instant to obtain the optimal control sequence, and applying the first element to the system. This optimization problem aims to track the adaptive braking reference trajectory while satisfying actuator saturation constraints and load swing angle safety constraints.

The objective function is defined as follows:

J (u) = \sum_{k = 1}^{N_{p}} {(y_{k} - r_{k})}^{T} Q (y_{k} - r_{k}) + \sum_{k_{i} = 1}^{N_{c}} u_{k_{i}}^{T} R u_{k_{i}}

(A1)

Thus, the optimal control input is obtained by solving the following:

J^{*} (u) = \min_{u} J (u)

(A2)
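As a small illustration of Equation (A1), the sketch below evaluates the cost for given output, reference, and input sequences. It only evaluates J(u) and does not solve the minimization in (A2). The helper names and all numerical values (horizons, weights, trajectories) are illustrative assumptions, not the settings used in the paper.

```python
from typing import Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def quad_form(v: Vector, w: Matrix) -> float:
    """Return the quadratic form v^T W v."""
    n = len(v)
    return sum(v[i] * w[i][j] * v[j] for i in range(n) for j in range(n))


def mpc_cost(y: Sequence[Vector], r: Sequence[Vector],
             u: Sequence[Vector], q: Matrix, r_weight: Matrix) -> float:
    """Tracking cost over the prediction horizon plus input cost
    over the control horizon, as in Equation (A1)."""
    tracking = sum(
        quad_form([yk_i - rk_i for yk_i, rk_i in zip(yk, rk)], q)
        for yk, rk in zip(y, r)
    )
    effort = sum(quad_form(uk, r_weight) for uk in u)
    return tracking + effort


# Toy data with N_p = 3 and N_c = 2 (all values hypothetical).
y = [[0.30, 0.01], [0.25, 0.02], [0.20, 0.01]]   # predicted outputs
r = [[0.28, 0.00], [0.22, 0.00], [0.18, 0.00]]   # reference trajectory
u = [[-10.0], [-5.0]]                            # control sequence
Q = [[10.0, 0.0], [0.0, 100.0]]
R = [[0.01]]

cost = mpc_cost(y, r, u, Q, R)
print(cost)
```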

Appendix A.2. Control Input Constraints

The driving motor of the overhead crane has physical output limits, so the control input must satisfy amplitude constraints:

- u_{\max} \leq u_{k_{i}} \leq u_{\max}

(A3)

where $u_{\max}$ denotes the maximum allowable control input amplitude.

Appendix A.3. Swing Angle Constraints

To prevent collision risks caused by excessive load swing during braking, the swing angle must be strictly limited within a safe range:

| θ (k) | \leq θ_{\max}

(A4)

where $θ(k)$ is the load swing angle in the k-th control period, and $θ_{\max}$ is the maximum allowable swing angle of the system.

Appendix A.4. Limitation of Direct State-Space Mapping

First, an attempt is made to establish a direct mapping between angle constraints and control inputs through the discrete state-space matrix. The constraint is written in matrix form:

G x_{b} (k) \leq φ

(A5)

where $G = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix}$ and $φ = \begin{bmatrix} θ_{\max} & θ_{\max} \end{bmatrix}^{T}$.

According to the discrete state-space equation $x_{b}(k + 1) = A_{b} x_{b}(k) + B_{b} u(k)$, we have

G x_{b} (k + 1) = G A_{b} x_{b} (k) + G B_{b} u (k)

(A6)

However, after computing the coefficient matrices, it is found that

G B_{b} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}

(A7)

This indicates that in the discrete state-space model, the direct influence coefficient of the control input $u$ on the next-step swing angle $θ(k + 1)$ is zero, making it impossible to directly establish a mapping between angle constraints and control inputs through the linear state-space equation. Therefore, the explicit analytical relationship between swing angle constraints and trolley acceleration (and thus control input) must be derived from the original kinematic equation.
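One way a zero product such as (A7) can arise is sketched below. The sketch assumes a state ordering $x_{b} = [θ, \dot{θ}, v]^{T}$ and a forward-Euler discretization with a hypothetical sampling period; neither is confirmed by this appendix. Under these assumptions the input enters only the $\dot{θ}$ and $v$ rows, so the row of $B_{b}$ selected by $G$ is zero. The input coefficients are obtained by combining Equations (A8) and (A18) given later in this appendix.

```python
from typing import List


def euler_input_matrix(M: float, l: float, Ts: float) -> List[float]:
    """Forward-Euler input vector B_b = Ts * B_c for the assumed state
    ordering [theta, theta_dot, v].

    Combining (A8) and (A18) gives theta_ddot = ... - F / (M * l) and
    v_dot = ... + F / M, while theta_dot has no direct input term.
    """
    b_c = [0.0, -1.0 / (M * l), 1.0 / M]
    return [Ts * b for b in b_c]


# Constraint selector from Equation (A5).
G = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]

# M and l follow Table 2; Ts = 0.05 s is a hypothetical sampling period.
B_b = euler_input_matrix(M=280.0, l=2.0, Ts=0.05)
GB = [sum(g * b for g, b in zip(row, B_b)) for row in G]
print(GB)
```

With an exact zero-order-hold discretization the first entry of $B_{b}$ would generally be small but nonzero, so the sketch only illustrates the structural reason behind a result like (A7), not the authors' exact matrices.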

Appendix A.5. Bounded Swing Angle Analysis Based on the Kinematic Equation

From the linearized 2D dynamic model, the load swing motion satisfies

\ddot{θ} + ω_{n}^{2} θ = - \frac{ω_{n}^{2}}{g} \ddot{x}, \quad ω_{n} = \sqrt{\frac{g}{l}}

(A8)

where $ω_{n}$ is the natural frequency of the system; $g$ is the gravitational acceleration; $l$ is the rope length; and $\ddot{x} = a$ is the trolley acceleration.

Assuming the acceleration $a$ remains constant within a single control period $T_{s}$ (zero-order hold), integrating Equation (A8) over $[0, T_{s}]$ yields the analytical expression of the swing angle:

θ (T_{s}) = θ (0) \cos (ω_{n} T_{s}) + \frac{\dot{θ} (0)}{ω_{n}} \sin (ω_{n} T_{s}) + \int_{0}^{T_{s}} - \frac{a}{g} ω_{n} \sin (ω_{n} t) \, dt

(A9)

Evaluating the third term on the right-hand side of Equation (A9):

\int_{0}^{T_{s}} - \frac{a}{g} ω_{n} \sin (ω_{n} t) \, dt = - \frac{a}{g} (1 - \cos (ω_{n} T_{s}))

(A10)
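The closed-form response obtained from (A9) and (A10) can be checked numerically. The sketch below compares it with a fourth-order Runge-Kutta integration of Equation (A8) under a constant acceleration. The initial conditions, acceleration, and sampling period are illustrative assumptions.

```python
import math


def theta_closed_form(theta0: float, theta_dot0: float, a: float,
                      l: float, t: float, g: float = 9.81) -> float:
    """Swing angle after time t under constant acceleration a, (A9)-(A10)."""
    w = math.sqrt(g / l)
    return (theta0 * math.cos(w * t)
            + theta_dot0 / w * math.sin(w * t)
            - a / g * (1.0 - math.cos(w * t)))


def theta_rk4(theta0: float, theta_dot0: float, a: float, l: float,
              t_end: float, steps: int = 1000, g: float = 9.81) -> float:
    """Integrate theta'' + w^2 theta = -(w^2 / g) a with classical RK4."""
    w2 = g / l

    def f(th: float, thd: float):
        return thd, -w2 * th - w2 / g * a

    h = t_end / steps
    th, thd = theta0, theta_dot0
    for _ in range(steps):
        k1 = f(th, thd)
        k2 = f(th + 0.5 * h * k1[0], thd + 0.5 * h * k1[1])
        k3 = f(th + 0.5 * h * k2[0], thd + 0.5 * h * k2[1])
        k4 = f(th + h * k3[0], thd + h * k3[1])
        th += h / 6.0 * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0])
        thd += h / 6.0 * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1])
    return th


# Hypothetical values: 0.01 rad, 0.02 rad/s, a = -0.5 m/s^2, l = 2 m, Ts = 0.05 s.
closed = theta_closed_form(theta0=0.01, theta_dot0=0.02, a=-0.5, l=2.0, t=0.05)
numeric = theta_rk4(theta0=0.01, theta_dot0=0.02, a=-0.5, l=2.0, t_end=0.05)
print(abs(closed - numeric))
```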

Combining the initial condition terms into amplitude-phase form, let $A = θ(0)$ and $B = \frac{\dot{θ}(0)}{ω_{n}}$; then,

A \cos ω_{n} T_{s} + B \sin ω_{n} T_{s} = \sqrt{A^{2} + B^{2}} \sin (ω_{n} T_{s} + ϕ)

(A11)

where $ϕ = \arctan (\frac{θ(0) ω_{n}}{\dot{θ}(0)})$. Due to the boundedness of the sine function, $|\sin(\cdot)| \leq 1$, we have:

| θ (T_{s}) | \leq \sqrt{θ^{2} (0) + \frac{{\dot{θ}}^{2} (0)}{ω_{n}^{2}}} + |\frac{a}{g} (1 - \cos ω_{n} T_{s})|

(A12)

Further utilizing $|1 - \cos ω_{n} T_{s}| \leq ω_{n} T_{s}$, which follows from $1 - \cos x = 2 \sin^{2}(x/2) \leq |x|$, and $|a| \leq a_{\max}$ (maximum allowable acceleration), the upper bound of the single-period swing angle is obtained as follows:

| θ (T_{s}) | \leq \sqrt{θ^{2} (0) + \frac{{\dot{θ}}^{2} (0)}{ω_{n}^{2}}} + \frac{ω_{n} T_{s}}{g} a_{\max}

(A13)

Extending the above single-period conclusion to the entire braking process, the global swing angle constraint is obtained as long as each control period satisfies this boundedness condition:

| θ (t) | \leq \sqrt{θ^{2} (0) + \frac{{\dot{θ}}^{2} (0)}{ω_{n}^{2}}} + \frac{ω_{n} T_{s}}{g} a_{\max}

(A14)

Appendix A.6. Explicit Solution for Maximum Allowable Acceleration

To ensure the swing angle always satisfies the safety constraint $|θ(t)| \leq θ_{\max}$, substituting Equation (A14) into this safety condition yields

\sqrt{θ^{2} (0) + \frac{{\dot{θ}}^{2} (0)}{ω_{n}^{2}}} + \frac{ω_{n} T_{s}}{g} a_{\max} \leq θ_{\max}

(A15)

Solving explicitly for $a_{\max}$, the acceleration constraint is obtained:

a_{\max} \leq \frac{g}{ω_{n} T_{s}} (θ_{\max} - \sqrt{θ^{2} (0) + \frac{{\dot{θ}}^{2} (0)}{ω_{n}^{2}}})

(A16)

To achieve efficient braking, the acceleration utilization is maximized at the constraint boundary, so the equality is taken as the maximum allowable acceleration:

a_{\max} = \frac{g}{ω_{n} T_{s}} (θ_{\max} - \sqrt{θ^{2} (0) + \frac{{\dot{θ}}^{2} (0)}{ω_{n}^{2}}})

(A17)
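Equation (A17) can be evaluated directly, as in the minimal sketch below. The function name and the sampling period are hypothetical. Note that the expression becomes negative when the current swing amplitude already exceeds $θ_{\max}$; how that case is handled is not stated here, so the sketch returns the raw value.

```python
import math


def max_allowable_acceleration(theta: float, theta_dot: float,
                               theta_max: float, l: float, Ts: float,
                               g: float = 9.81) -> float:
    """Evaluate Equation (A17) for the current swing state."""
    w = math.sqrt(g / l)                          # natural frequency
    amplitude = math.hypot(theta, theta_dot / w)  # sqrt(theta^2 + theta_dot^2 / w^2)
    return g / (w * Ts) * (theta_max - amplitude)


# theta_max = 0.03 rad and l = 2 m follow Scenario 1 in Table 2;
# Ts = 0.05 s is a hypothetical sampling period.
a_rest = max_allowable_acceleration(0.0, 0.0, theta_max=0.03, l=2.0, Ts=0.05)
a_swinging = max_allowable_acceleration(0.02, 0.0, theta_max=0.03, l=2.0, Ts=0.05)
print(a_rest, a_swinging)
```

The allowable acceleration shrinks as the existing swing amplitude approaches the limit, which is the state dependence that (A17) introduces into the input constraint.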

Appendix A.7. Mapping from Acceleration Constraints to Control Input Constraints

Next, the mapping between acceleration constraints and control input is established through the horizontal dynamic equation of the trolley. From Equation (4):

\dot{v} = \frac{m g}{M} θ + \frac{1}{M} F

(A18)

where $M$ is the trolley mass; $m$ is the load mass; and $F$ is the horizontal driving force applied by the motor to the trolley.

Since $\dot{v} = a = \ddot{x}$, Equation (A18) can be rewritten as

a = \frac{m g}{M} θ + \frac{1}{M} F

(A19)

Rearranging yields the explicit relationship between driving force and acceleration:

F = M a - m g θ

(A20)

Substituting the acceleration boundary $|a| \leq a_{\max}$ into Equation (A20), the allowable range of control input $F$ is obtained as follows:

- M a_{\max} - m g θ \leq F \leq M a_{\max} - m g θ

(A21)

In the discrete control framework, the control input $u_{k_{i}}$ at step $k_{i}$ is constrained by the swing angle $θ_{k_{i} - 1}$ at step $k_{i} - 1$, so

- M a_{\max} - m g θ_{k_{i} - 1} \leq u_{k_{i}} \leq M a_{\max} - m g θ_{k_{i} - 1}

(A22)

Appendix A.8. Final Form of the Constrained Optimization Problem

Combining the control input saturation constraints and the load swing angle safety constraints, the overhead crane braking control problem is finally transformed into the following inequality-constrained optimization problem:

\left\{ \begin{array}{l} \min_{u} J (u) = \sum_{k = 1}^{N_{p}} {(y_{k} - r_{k})}^{T} Q (y_{k} - r_{k}) + \sum_{k_{i} = 1}^{N_{c}} u_{k_{i}}^{T} R u_{k_{i}} \\ \text{s.t.} \; - u_{\max} \leq u_{k_{i}} \leq u_{\max} \\ - M a_{\max} - m g θ_{k_{i} - 1} \leq u_{k_{i}} \leq M a_{\max} - m g θ_{k_{i} - 1} \\ a_{\max} = \frac{g}{ω_{n} T_{s}} (θ_{\max} - \sqrt{θ_{k_{i} - 1}^{2} + \frac{{\dot{θ}}_{k_{i} - 1}^{2}}{ω_{n}^{2}}}) \\ ω_{n} = \sqrt{\frac{g}{l}} \end{array} \right.

(A23)
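The two input constraints in (A23) are both simple bounds on $u_{k_{i}}$, so at each step they can be merged into one interval. The sketch below computes that interval from the previous swing state. It is an illustration under assumed values: the function name, $u_{\max}$ = 500 N, and $T_{s}$ = 0.05 s are hypothetical, while M, m, l, and $θ_{\max}$ follow Scenario 1 in Table 2.

```python
import math
from typing import Tuple


def input_bounds(theta_prev: float, theta_dot_prev: float, theta_max: float,
                 u_max: float, M: float, m: float, l: float, Ts: float,
                 g: float = 9.81) -> Tuple[float, float]:
    """Intersect actuator saturation with the swing-induced bounds of (A23)."""
    w = math.sqrt(g / l)
    amplitude = math.hypot(theta_prev, theta_dot_prev / w)
    a_max = g / (w * Ts) * (theta_max - amplitude)
    lower = max(-u_max, -M * a_max - m * g * theta_prev)
    upper = min(u_max, M * a_max - m * g * theta_prev)
    return lower, upper


params = dict(theta_max=0.03, u_max=500.0, M=280.0, m=30.0, l=2.0, Ts=0.05)

# Payload at rest: the swing-induced bound is looser than saturation.
rest_bounds = input_bounds(0.0, 0.0, **params)
# Payload already swinging: the swing-induced bound becomes the active one.
swing_bounds = input_bounds(0.02, 0.0, **params)
print(rest_bounds, swing_bounds)
```

If the lower bound exceeds the upper bound, the interval is empty and the optimization problem is infeasible at that step; the sketch does not handle that case.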


Figure 1. Simplified schematic of an overhead crane system.

Figure 2. Architecture of the proposed TD3-MPC braking control system.

Figure 3. SVR prediction curve.

Figure 4. Schematic of the safety-critical region for the overhead crane.

Figure 5. Responses of trolley displacement, payload swing angle, and trolley velocity under Scenario 1 for different control strategies.

Figure 6. TD3-MPC parameter evolution under Scenario 1.

Figure 7. Responses of trolley displacement, payload swing angle, and trolley velocity under Scenario 2 for different control strategies.

Figure 8. TD3-MPC parameter evolution under Scenario 2.

Figure 9. Responses of trolley displacement, payload swing angle, and trolley velocity under Scenario 3 for different control strategies.

Figure 10. TD3-MPC parameter evolution under Scenario 3.

Figure 11. Overhead crane experimental platform.

Figure 12. Experimental braking distance data and corresponding SVR prediction curve for Experimental Group 1.

Figure 13. Safety-critical region prediction and braking control results for Experimental Group 1 with an obstacle distance of 0.8 m.

Figure 14. Safety-critical region prediction and braking control results for Experimental Group 1 with an obstacle distance of 1.5 m.

Figure 15. Experimental braking distance data and corresponding SVR prediction curve for Experimental Group 2.

Figure 16. Safety-critical region prediction and braking control results for Experimental Group 2 with an obstacle distance of 0.8 m.

Figure 17. Safety-critical region prediction and braking control results for Experimental Group 2 with an obstacle distance of 1.5 m.

Figure 18. Experimental braking distance data and corresponding SVR prediction curve for Experimental Group 3.

Figure 19. Safety-critical region prediction and braking control results for Experimental Group 3 with an obstacle distance of 0.8 m.

Figure 20. Safety-critical region prediction and braking control results for Experimental Group 3 with an obstacle distance of 1.5 m.

Table 1. TD3 training configuration.

Item	Configuration
Actor network structure	3 → 256 → 256 → 128 → 8
Critic network structure	11 → 256 → 256 → 1
Learning rate	Actor: 3 × 10⁻⁴, Critic: 3 × 10⁻⁴
Batch size	256
Replay buffer size	50,000
Discount factor	0.99
Target network update coefficient	0.005
Training episodes	3 (pretraining stage)
Exploration noise	OU noise (σ = 0.15, θ = 0.2, clip = [−0.5, 0.5])
Activation function and optimizer	ReLU (hidden), Sigmoid (output); AdamW

Table 2. Simulation scenarios and parameter settings for TD3–MPC braking control.

Scenario	θmax/rad	M/kg	m/kg	l/m	v/(m/s)
Scenario 1	0.03	280	30	2	0.3
Scenario 2	0.01	280	30	2	0.3
Scenario 3	0.03	280	30	2	0.5

Table 3. Ablation study of the composite reward function under Scenario 1.

Reward Configuration	Braking Distance/m	Maximum Swing Angle/rad
Full Reward	0.359	0.027
w/o State tracking reward	0.397	0.026
w/o Control constraint reward	0.379	0.027
w/o Parameter optimization reward	0.386	0.028
w/o Safety constraint penalty	0.363	0.031

Table 4. Quantitative performance comparison between MPC and TD3–MPC.

Scenario	Method	Braking Distance/m	Maximum Swing Angle/rad	Settling Time/s
Scenario 1	MPC	0.464	0.029	6.95
Scenario 1	TD3–MPC	0.359	0.027	5.10
Scenario 2	MPC	0.989	0.011	8.20
Scenario 2	TD3–MPC	0.856	0.009	6.50
Scenario 3	MPC	1.118	0.030	6.85
Scenario 3	TD3–MPC	0.901	0.027	5.20
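The relative improvements implied by Table 4 can be computed with the short sketch below. The values are copied from the table; the dictionary layout and function name are illustrative. For Scenario 1, for instance, the braking distance is about 22.6% shorter with TD3–MPC than with conventional MPC.

```python
# Values copied from Table 4:
# (braking distance/m, maximum swing angle/rad, settling time/s)
TABLE_4 = {
    "Scenario 1": {"MPC": (0.464, 0.029, 6.95), "TD3-MPC": (0.359, 0.027, 5.10)},
    "Scenario 2": {"MPC": (0.989, 0.011, 8.20), "TD3-MPC": (0.856, 0.009, 6.50)},
    "Scenario 3": {"MPC": (1.118, 0.030, 6.85), "TD3-MPC": (0.901, 0.027, 5.20)},
}
METRICS = ("braking distance", "maximum swing angle", "settling time")


def reduction_percent(baseline: float, improved: float) -> float:
    """Relative reduction of a metric with respect to the baseline, in percent."""
    return 100.0 * (baseline - improved) / baseline


reductions = {
    scenario: {
        metric: reduction_percent(rows["MPC"][i], rows["TD3-MPC"][i])
        for i, metric in enumerate(METRICS)
    }
    for scenario, rows in TABLE_4.items()
}

for scenario, values in reductions.items():
    summary = ", ".join(f"{metric}: {value:.1f}%" for metric, value in values.items())
    print(f"{scenario}: {summary}")
```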

Table 5. Computational performance of the proposed TD3-MPC framework.

Component	Average Time/ms	Maximum Time/ms
TD3 inference	0.72	1.02
MPC optimization	21.35	30.48
Total control cycle	25.12	35.67

Table 6. Parameter settings for different experimental groups.

Experimental Group	θmax/rad	M/kg	m/kg	l/m
Group 1	0.03	280	30	1.5
Group 2	0.03	280	50	1.5
Group 3	0.03	280	30	2
