Approximated Adaptive Dynamic Programming Control of Axial-Piston Pump

Kralev, Jordan; Mitov, Alexander; Slavov, Tsonyo

doi:10.3390/math14071127



Jordan Kralev ¹, Alexander Mitov ² and Tsonyo Slavov ¹,*

¹ Department of Systems and Control, Technical University of Sofia, Kliment Ohridski 8 Boulevard, 1000 Sofia, Bulgaria

² Department of Hydroaerodynamics and Hydraulic Machines, Technical University of Sofia, Kliment Ohridski 8 Boulevard, 1000 Sofia, Bulgaria

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(7), 1127; https://doi.org/10.3390/math14071127

Submission received: 24 February 2026 / Revised: 21 March 2026 / Accepted: 26 March 2026 / Published: 27 March 2026

(This article belongs to the Special Issue Advances in Robust Control Theory and Its Applications)


Abstract

This article presents the synthesis, real-time implementation, and experimental validation of an approximated adaptive dynamic programming (AADP) actor–critic controller for precise flow rate regulation of a variable-displacement axial-piston pump designed for open-circuit hydraulic systems. The conventional hydro-mechanical regulator is replaced with an electrohydraulic proportional spool valve, and the model-free controller employs two compact two-layer neural networks: the actor generates valve PWM signals from the flow tracking error, its integral, and the measured discharge pressure, while the critic approximates the infinite-horizon quadratic cost-to-go by solving the Bellman equation online through gradient descent on Bellman residuals. Lyapunov analysis establishes closed-loop stability under bounded learning rates, with initial weights tuned via nominal plant simulation to ensure convergence from feasible starting policies. In extensive laboratory tests under four fixed loading conditions and dynamic load variations, the adaptive controller outperformed a proportional-integral (PI) controller, a Lyapunov model-reference adaptive controller (LMRAC), and an H_∞ controller (Hinf). Real-time metrics confirm bounded critic signals and near-zero Bellman errors, validating optimal policy convergence amid unmodeled hydraulic nonlinearities.

Keywords:

approximated adaptive; dynamic programming; actor–critic control; reinforcement; axial-piston pump

MSC:

93C83

1. Introduction

High energy efficiency of hydraulic drives is usually achieved by volumetric velocity control of hydraulic cylinders or motors. This control principle is based on the use of a variable displacement pump. If hydraulic losses in the system are neglected, the input power is approximately equal to the output power, which means that only as much energy is consumed as the drive requires. Hydraulic losses are always present, but the overall efficiency nevertheless remains high (especially in closed-loop circulation systems), since it depends mainly on the efficiency of the two displacement machines, the pump and the motor. Hydraulic displacement rotary machines are known to have a relatively high overall efficiency [1,2,3].

The displacement volume of a variable displacement pump is adjusted by a regulator that is part of the pump. Conventional regulators of open-circulation pumps are hydro-mechanical, and the control laws they most often implement are pressure (DR), flow rate (DFR), or power (DFLR) control. Axial-piston pumps are most often used because of their well-known advantages [4]. The development of electrohydraulic proportional valves has expanded the possibilities for regulating the pump displacement volume: the conventional regulator can be replaced by a proportional spool valve [5]. In addition, load-sensing (meter-in or meter-out) control of the hydraulic cylinder further increases the overall efficiency of the hydraulic system.

In parallel with the development of hydraulic proportional valves, control methods have also advanced [6]. Advanced, robust, and adaptive control [7,8] has created new opportunities for hydraulic drive technology, including displacement volume control of pumps. Several studies address the control of axial-piston pumps with advanced techniques based on LQR, H_∞, and μ-controllers [9,10,11,12,13,14,15,16,17,18,19,20], and others are dedicated to adaptive control [21,22,23,24,25,26]. The main reason is that these methods achieve higher control quality than classical PID laws. Moreover, modern microcontrollers allow the control law of the proportional valve to be embedded and executed in real time. This motivates researchers and practitioners to seek synergy between hydraulic drive technology and control theory by applying modern control laws to increase the overall efficiency of the entire system.

The authors have previously studied the control performance of conventional (PID) and advanced (LQR, H_∞, μ, and adaptive) control laws for displacement volume control of an axial-piston pump designed for hydraulic systems with open circulation [27,28,29]. In that setup, the classical hydro-mechanical pump regulator is replaced by a proportional valve controlled by a PLC [30], and the control law is implemented in a suitable programming environment and executed in real time [31]. Regardless of its class (conventional or advanced), synthesizing a controller requires an adequate mathematical model of the plant. Such a model is most often obtained through system identification [32,33], which makes the synthesis complex. Modern control theory also offers ways to synthesize controllers without a plant model, known as model-free control. One such technique is adaptive actor–critic control, a hybrid reinforcement learning technique that combines an actor, which generates the control inputs, and a critic, which evaluates them, using neural networks to adaptively control, stabilize, and optimize uncertain nonlinear systems in real time. A number of publications report its application to various hydraulic drives [34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51], but almost none address displacement volume control of an axial-piston pump. This motivated the authors to synthesize an approximated adaptive dynamic programming controller (actor–critic type) for an axial-piston pump. The advantages of this type of controller are:

A priori information about the plant is not required;
Great freedom in choosing the structure and parameters;
Use of neural networks that are not tied to the plant;
Optimal control with respect to the chosen criterion;
Guaranteed stability as long as the adaptation step (learning rate) stays below an upper bound;
An analytical proof of stability is possible.

There are also some disadvantages of this type of controller:

More complex implementation;
A trade-off between adaptation speed and control performance is required;
Sensitivity to initial conditions.

The main goal of this article is to present the synthesis, implementation, and experimental study of an approximated adaptive dynamic programming controller for a variable displacement axial-piston pump intended for hydraulic systems with open circulation. The adaptive controller is based on two two-layer neural networks, an actor and a critic, with hyperbolic tangent (tanh) activation functions. The weights of both networks are tuned in real time by gradient descent. The actor network is tuned to minimize the approximated total cost-to-go computed by the critic network, while the critic network is tuned to minimize the Bellman error of the recursive Bellman equation.

The main contributions of this article are:

First actor–critic (AC) approximated adaptive dynamic programming (AADP) controller for axial-piston pump displacement control, with a Lyapunov-based stability proof and two-layer tanh neural networks tuned via backpropagation (the critic on the Bellman error, the actor on the critic's cost-to-go estimate).
Full-scale laboratory validation with real-time recording of flow rate, pressure, and control signals against baseline controllers, quantifying the gains under fixed and varying loads.
A Simulink^® rapid-prototyping framework for hydraulics, with initial weights obtained from simulation.

These contributions advance the field of hydraulic displacement pump control by demonstrating the effectiveness of adaptive control techniques in maintaining stability and performance across varying load conditions.

The article is organized as follows: Section 2 presents the experimental test setup, plant model, and adaptive controller design; Section 3 presents the stability analysis; Section 4 presents the experimental results obtained with the designed adaptive controller; and Section 5 presents a short conclusion.

2. Adaptive Controller Design

2.1. Axial-Piston Pump Experimental Test Setup

The plant is an A10VSO swash-plate variable displacement axial-piston pump designed for open-circulation hydraulic drive systems. The pump has a displacement volume of 18 cm³ and is equipped with a VT-DFP electrohydraulic proportional spool valve, which replaces the classical DR regulator [52,53]. The pump is not suitable for hydraulic systems with secondary control [54]. To study various proportional valve control algorithms for the pump displacement volume experimentally, the authors used an existing laboratory test rig [30]. The hydraulic circuit diagram and control subsystem are depicted in Figure 1. Based on this test rig, a subsystem for rapid prototyping of real-time displacement volume control algorithms has been developed [31]. A detailed description of the test rig is given in a previous work, so only a brief description based on the hydraulic diagram in Figure 1 is given here. The hydraulic part of the system consists of a 130 L tank, which serves as the mounting base of the experimental system. The axial-piston pump is driven by a 7.5 kW electric motor with a nominal speed of 1500 min⁻¹. A specially designed hydraulic block is mounted on the pump controller plate, enabling parallel operation of the proportional valve and the classic hydro-mechanical pressure regulator (DR). The VT-DFP proportional valve has a built-in LVDT sensor for its spool position and can be mounted on the pump to control it independently. The hydro-mechanical regulator serves only a safety function, limiting the maximum pump pressure. A pressure-relief valve and a throttle valve are connected in parallel in the pump discharge line and are used to load the pump. The pressure-relief valve sets the maximum pressure in loading mode, and the throttle valve allows the output pressure to be adjusted within a limited range. A gear flow meter and a pressure transducer are connected to the pump discharge line to measure the hydraulic parameters: flow rate and pressure.

The rapid prototyping subsystem is based on a VT-5041 external amplifier for the VT-DFP proportional spool valve, a dedicated plug-in card for this type of electrohydraulic proportional valve. An MC012-022 microcontroller [55] generates the control signal for the external amplifier. The system allows various control algorithms to be developed in the MATLAB/Simulink^® software environment (version 2023a) and executed in real time. Communication between the workstation and the microcontroller is realized through a CG150 USB/CAN interface. The control algorithm computes the control signal, which is applied to the power stage of the external amplifier. The amplifier drives the solenoid of the proportional valve and processes the LVDT feedback signal of the valve position. The control algorithm, implemented as a Simulink^® model, enables real-time displacement volume control of the pump. The chosen sampling rate of 100 Hz is sufficient for precise regulation of the pump flow rate. The laboratory experimental setup is shown in Figure 2.

2.2. Plant Model

Since the proposed controller is based on a model-free design method, a plant model is not needed for controller synthesis. Here, the model is used only in offline computer simulation to obtain initial values of the controller parameters that make the initial closed-loop system stable. The model was obtained by an identification procedure and is presented in the authors’ previous article [33]. The main advantage of such a “black box” model is that it does not require a priori information about the plant.

The estimated discrete-time state-space model is of sixth order, with two outputs (pump flow rate and pump output pressure) and one input (proportional valve control signal). All input–output signals are measured with a sample time $T_0 = 0.01$ s and are presented in Figure 3 [29]. The plant model equations are

$$\begin{aligned} x(k+1) &= A\,x(k) + B\,u(k) + K_m\,\eta(k), \\ y(k) &= C\,x(k) + D\,u(k) + \eta(k). \end{aligned} \tag{1}$$

The values of the matrices are

$$A = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ -0.3 & 0.43 & 0.84 & -0.01 & 0.04 & -0.03 \\ 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0.57 & -11.17 & 10.83 & 0.72 & -1.62 & 1.9 \end{pmatrix}, \quad B = \begin{pmatrix} 0.005 \\ 0.161 \\ 0 \\ 0.209 \\ 0.046 \\ 0.058 \end{pmatrix},$$

$$K_m = \begin{pmatrix} 0 & 0.075 \\ 0.082 & -0.015 \\ 0.031 & 0.031 \\ 0.423 & 0.099 \\ -0.033 & 0.259 \\ 0.141 & -0.029 \end{pmatrix}, \quad C = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix}, \quad D = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \tag{2}$$

where $x(k) = (x_1(k), x_2(k), \dots, x_6(k))^T$ is the state vector, $y(k) = (y_Q(k), y_P(k))^T$ is the output vector, $y_Q$ (L/min) is the measured pump flow rate, $y_P$ (bar) is the measured pump pressure, $u(k)$ (mV) is the control signal (PWM), and $\eta(k) = (\eta_Q(k), \eta_P(k))^T$ is the residual vector, with $\eta_Q(k)$ the pump flow rate residual error and $\eta_P(k)$ the pump pressure residual error. It is well known that if the elements of the residual vector are white Gaussian noise, then the estimated model parameters $A, B, C, D, K_m$ are unbiased.
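For the offline closed-loop simulations used to initialize the controller, Equation (1) can be iterated directly. The following is a minimal NumPy sketch; the function name `simulate_innovations_model` is hypothetical, and $B$, $D$, and $K_m$ are assumed to be passed as 2-D arrays (for the pump model, $B$ has shape (6, 1) and $K_m$ has shape (6, 2)).

```python
import numpy as np

def simulate_innovations_model(A, B, C, D, Km, u, eta, x0=None):
    """Iterate x(k+1) = A x(k) + B u(k) + Km eta(k), y(k) = C x(k) + D u(k) + eta(k).

    u:   (N,) scalar input sequence (one input, as in the pump model)
    eta: (N, ny) residual sequence; pass zeros for a deterministic run
    Returns the output sequence y with shape (N, ny).
    """
    A, B, C, D, Km = map(np.atleast_2d, (A, B, C, D, Km))
    n = A.shape[0]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    y = np.zeros((len(u), C.shape[0]))
    for k in range(len(u)):
        y[k] = C @ x + D[:, 0] * u[k] + eta[k]    # output equation
        x = A @ x + B[:, 0] * u[k] + Km @ eta[k]  # state update
    return y

# Toy first-order check: x(k+1) = 0.5 x(k) + u(k), y = x, unit step input
y_toy = simulate_innovations_model([[0.5]], [[1.0]], [[1.0]], [[0.0]], [[0.0]],
                                   np.ones(5), np.zeros((5, 1)))
```

Passing zeros for $\eta$ gives the deterministic response. Note that the matrices in Equation (2) are printed with two or three decimals, so a simulation with the rounded values may not exactly reproduce the identified model.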

The measured and modeled outputs are compared in Figure 4; the obtained FIT values are sufficiently high. The residual correlation tests for model validation (Figure 5) show that the residual errors $\eta_Q(k)$ and $\eta_P(k)$ are white Gaussian noise and that no correlation exists between the residuals and the input signal.
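The validation in Figures 4 and 5 can be reproduced in outline. The sketch below assumes that FIT is the normalized root-mean-square-error fit used by MATLAB's System Identification Toolbox (`compare`), $\mathrm{FIT} = 100\,(1 - \|y - \hat y\| / \|y - \bar y\|)$, and uses the common approximate 99% confidence band $\pm 2.58/\sqrt{N}$ for the residual autocorrelation; the function names and the default number of lags are our own choices.

```python
import numpy as np

def fit_percent(y, y_hat):
    """NRMSE-based fit in percent: 100 * (1 - ||y - y_hat|| / ||y - mean(y)||)."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 100.0 * (1.0 - np.linalg.norm(y - y_hat) / np.linalg.norm(y - y.mean()))

def normalized_autocorr(e, max_lag):
    """Sample autocorrelation r(tau) / r(0) of a residual sequence, tau = 0..max_lag."""
    e = np.asarray(e, float) - np.mean(e)
    r0 = e @ e
    return np.array([e[: len(e) - tau] @ e[tau:] / r0 for tau in range(max_lag + 1)])

def is_white(e, max_lag=25, z=2.58):
    """Approximate whiteness test: every lag 1..max_lag lies inside +/- z / sqrt(N)."""
    rho = normalized_autocorr(e, max_lag)
    return bool(np.all(np.abs(rho[1:]) < z / np.sqrt(len(e))))
```

A cross-correlation test between residuals and input follows the same pattern, with `e[:N - tau] @ u[tau:]` normalized by the product of the two standard deviations and $N$.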

In this paper, the estimated plant model in Equations (1) and (2) is used only in computer simulation, since the proposed controller follows a model-free design method. The results of the closed-loop simulation are used to select proper initial values of the neural network parameters. The same plant model was used in [31] to design the H_∞ controller that serves in this article as a comparison baseline.

2.3. Actor–Critic Controller Design

The block scheme of the designed control system is depicted in Figure 6. It consists of the axial-piston pump plant and an adaptive actor–critic controller. The critic part of the controller ($K_c$) evaluates system performance, while the actor part ($K_a$) generates the control action, i.e., the signal applied to the proportional valve coil amplifier. The controller compares the desired pump flow rate $r_f$ with the measured one $y_f$ and forms the flow rate tracking error $e_f$. The tracking error, its integral $e_{int,f}$, and the measured pump pressure $y_p$ are the controller inputs, which the actor part processes to produce the control action. The critic part continuously evaluates the performance of the actor part and provides the feedback signal for its real-time adaptation, so that the controller parameters are optimized over time and the tracking performance improves.

2.3.1. Actor Part

The actor part of the controller is parametrized as a two-layer neural network with one hidden layer ($h_{K_a}$) and one linear output layer with weights $w_{K_a}$, such that the control signal is formed by

$$u = w_{K_a} F_\sigma(V_{K_a} x_{K_a}), \tag{3}$$

where

$$x_{K_a} = [e_f \;\; e_{int,f} \;\; y_p]^T \tag{4}$$

is the input vector of the controller. The tracking error and its discrete-time integral are evaluated from

$$\begin{aligned} e_f(k) &= r_f(k) - y_f(k), \\ e_{int,f}(k+1) &= e_{int,f}(k) + T_0\, e_f(k+1), \quad e_{int,f}(0) = 0, \end{aligned} \tag{5}$$

where $T_0 = 0.01$ s is the sample time. $V_{K_a}$ is the weight matrix of the hidden layer, with dimensions $2 \times 3$ (two hidden neurons, three inputs). The activation function $F_\sigma$ is a scaled hyperbolic tangent:

$$F_\sigma(b) = \tanh(0.5\, b) = \frac{1 - e^{-b}}{1 + e^{-b}}. \tag{6}$$

Function (6) is applied element-wise to the two components of the vector $V_{K_a} x_{K_a}$. The weights of the output layer form the row vector $w_{K_a}$ with dimensions $1 \times 2$, which produces the control signal in Equation (3).
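The actor forward pass of Equations (3) to (6) takes only a few lines. A minimal NumPy sketch, assuming $V_{K_a}$ is stored as a $2 \times 3$ array and $w_{K_a}$ as a 1-D array of length 2; the function names are hypothetical.

```python
import numpy as np

T0 = 0.01  # sample time [s], Eq. (5)

def f_sigma(b):
    """Activation of Eq. (6): tanh(0.5 b) = (1 - e^-b) / (1 + e^-b), element-wise."""
    return np.tanh(0.5 * b)

def actor_input(r_f, y_f, y_p, e_int_prev):
    """Eqs. (4)-(5): tracking error, backward-Euler integral, measured pressure."""
    e_f = r_f - y_f
    e_int = e_int_prev + T0 * e_f
    return np.array([e_f, e_int, y_p]), e_int

def actor_output(V_a, w_a, x_a):
    """Eq. (3): u = w_a F_sigma(V_a x_a); returns u and the two hidden outputs h_a."""
    h_a = f_sigma(V_a @ x_a)
    return float(w_a @ h_a), h_a
```

Using `np.tanh(0.5 * b)` instead of the exponential ratio avoids overflow of $e^{-b}$ for large negative pre-activations, while computing the same function.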

The parameters of the action neural network, $V_{K_a}$ and $w_{K_a}$, are tuned in real time to minimize the quadratic cost function $J(k)$ at each time instant $t = kT_0$:

$$u^k = \arg\min_u J(u, k), \tag{7}$$

where $u^k$ denotes the control signal restricted to the interval $t \ge kT_0$. The cost function is

$$J(k) = \sum_{j=0}^{\infty} \frac{1}{2}\,\alpha^j e_f^2(k+j), \tag{8}$$

where $\alpha \le 1$ is the discount factor. A parametric approximation $\tilde{J}(k,p)$ of $J(k)$ is sought, where $p \in \mathbb{R}^{n_p}$ is a parameter vector. To justify this substitution, it must be guaranteed that

$$|\tilde{J}(k,p) - J(k)| < \varepsilon \quad \text{for } k > k_0. \tag{9}$$

The actor parameters (the weights $V_{K_a}$ and $w_{K_a}$ of the actor neural network) are tuned in real time by backpropagation to minimize the approximate cost-to-go $\tilde{J}(k,p)$ with respect to $V_{K_a}$ and $w_{K_a}$. Let $p$ denote any element of $V_{K_a}$ or $w_{K_a}$; then $p$ is updated with the step

$$\Delta p = -l_p \frac{\partial \tilde{J}(k,p)}{\partial p} - k_p\, p, \tag{10}$$

where $l_p$ is the constant learning rate and $k_p$ is the forgetting factor, which prevents the parameters from diverging.
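To see what the forgetting term does, the following minimal sketch applies Equation (10) to a hypothetical scalar cost $\tilde J = \frac{1}{2}(p - 3)^2$, so that $\partial \tilde J / \partial p = p - 3$. The fixed point satisfies $0 = -l_p (p - 3) - k_p p$, i.e., $p^* = 3 l_p / (l_p + k_p)$.

```python
def leaky_gradient_step(p, grad, l_p, k_p):
    """One step of Eq. (10): p <- p + dp with dp = -l_p * dJ/dp - k_p * p."""
    return p - l_p * grad - k_p * p

# Hypothetical toy cost J(p) = 0.5 * (p - 3)^2, gradient p - 3
l_p, k_p = 0.1, 0.01
p = 0.0
for _ in range(500):
    p = leaky_gradient_step(p, p - 3.0, l_p, k_p)
# p approaches 3 * l_p / (l_p + k_p) = 2.7272..., slightly below the unregularized optimum 3
```

The forgetting factor keeps the parameters bounded at the price of a small bias toward zero, which grows with the ratio $k_p / l_p$.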

2.3.2. Critic Part

The approximation $\tilde{J}(k,p)$ of $J(k)$ is evaluated by the critic neural network, which has one hidden layer with a hyperbolic tangent activation function and one linear output layer:

$$\tilde{J}(k,p) = w_{K_c} F_\sigma(V_{K_c} x_{K_c}), \tag{11}$$

where

$$x_{K_c} = [e_f \;\; e_{int,f} \;\; y_p \;\; u]^T \tag{12}$$

is the input vector of the critic part of the controller. $V_{K_c}$ is the weight matrix of the hidden layer, with dimensions $2 \times 4$, and the weights of the output layer form the row vector $w_{K_c}$ with dimensions $1 \times 2$. Equation (8) can be written recursively as the Bellman equation

$$J(k) = \frac{1}{2} e_f^2(k) + \alpha J(k+1). \tag{13}$$
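As a quick consistency check of the recursive form, the sketch below evaluates a truncated version of Equation (8) on a short hypothetical error record, both directly and by running Equation (13) backwards from $J(N) = 0$. The two agree because truncating the infinite sum is equivalent to setting the cost beyond the record to zero.

```python
import numpy as np

def cost_direct(e, alpha):
    """Truncated Eq. (8): J(k) = sum_j 0.5 * alpha^j * e(k+j)^2 over the recorded samples."""
    N = len(e)
    return np.array([sum(0.5 * alpha**j * e[k + j] ** 2 for j in range(N - k))
                     for k in range(N)])

def cost_bellman(e, alpha):
    """Eq. (13) run backwards: J(k) = 0.5 * e(k)^2 + alpha * J(k+1), with J(N) = 0."""
    J = np.zeros(len(e) + 1)
    for k in range(len(e) - 1, -1, -1):
        J[k] = 0.5 * e[k] ** 2 + alpha * J[k + 1]
    return J[:-1]

e = np.array([3.0, 2.0, 1.0, 0.5, 0.0])  # hypothetical tracking errors
alpha = 0.9
```

In the controller, the future cost is of course unknown, which is why the critic learns $\tilde J$ from the one-step relation (13) instead of summing Equation (8).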

For $\tilde{J}(k,p)$ to approximate $J(k)$ in the sense of Equation (9), $\tilde{J}(k,p)$ must minimize the Bellman error

$$\varepsilon(k) = \left(\frac{1}{2} e_f^2(k) + \alpha\, \tilde{J}(k+1,p)\right) - \tilde{J}(k,p), \tag{14}$$

which is used to form the criterion

$$J_c = \frac{1}{2}\varepsilon^2(k). \tag{15}$$

Criterion (15) is used to train the weights of the critic neural network in Equation (11). The parameters $p_c$ of the critic neural network are tuned in real time by

$$\Delta p_c = -l_{p_c} \frac{\partial J_c(k,p)}{\partial p_c} - k_{p_c}\, p_c, \tag{16}$$

where $l_{p_c}$ is a constant learning rate and $k_{p_c}$ is a regularization factor.

2.3.3. Parameter Gradient for Action Network

It follows from Equation (10) that tuning the action network weights requires the derivative of $\tilde{J}(k,p)$ with respect to a general parameter $p$:

$$\frac{\partial \tilde{J}(k,p)}{\partial p} = \frac{\partial \tilde{J}(k,p)}{\partial u}\,\frac{\partial u}{\partial p}, \tag{17}$$

where

$$\frac{\partial \tilde{J}(k,p)}{\partial u} = w_{K_c}\, G_\sigma(V_{K_c} x_{K_c})\, V_{K_c}\, \frac{\partial x_{K_c}}{\partial u}, \tag{18}$$

and $G_\sigma$ is the gradient of Function (6), a diagonal matrix evaluated element-wise by

$$G_\sigma(V_{K_c} x_{K_c}) = \frac{\partial F_\sigma(V_{K_c} x_{K_c})}{\partial (V_{K_c} x_{K_c})} = 0.5\,\operatorname{diag}\left(1 - F_\sigma^2(V_{K_c} x_{K_c})\right) \tag{19}$$

and

$$\frac{\partial x_{K_c}}{\partial u} = [0 \;\; 0 \;\; 0 \;\; 1]^T. \tag{20}$$

The derivatives of the control signal with respect to the weights of the output and hidden layers are

$$\frac{\partial u}{\partial w_{K_a}} = F_\sigma(V_{K_a} x_{K_a})^T, \qquad \frac{\partial u}{\partial V_{K_a}} = G_\sigma(V_{K_a} x_{K_a})\, w_{K_a}^T\, x_{K_a}^T. \tag{21}$$
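The chain rule of Equations (17) to (21) is easy to get wrong in an implementation, so a finite-difference check is a useful habit. A minimal NumPy sketch, assuming $V_{K_a}$ is $2 \times 3$, $V_{K_c}$ is $2 \times 4$, and both output weight vectors are stored as 1-D arrays of length 2 (these storage choices and the function name are ours, not from the paper):

```python
import numpy as np

def f_sigma(b):
    """Activation of Eq. (6), tanh(0.5 b), element-wise."""
    return np.tanh(0.5 * b)

def g_sigma(b):
    """Diagonal of Eq. (19): 0.5 * (1 - F_sigma(b)^2), element-wise."""
    return 0.5 * (1.0 - f_sigma(b) ** 2)

def actor_gradients(V_a, w_a, x_a, V_c, w_c, x_c):
    """Return (dJ~/dw_a, dJ~/dV_a) following Eqs. (17)-(21).

    x_c = [e_f, e_int_f, y_p, u] must contain the current actor output u.
    """
    # Eqs. (18) and (20): dJ~/du = w_c diag(G(V_c x_c)) V_c [0 0 0 1]^T
    dJ_du = float(w_c @ (g_sigma(V_c @ x_c) * V_c[:, 3]))
    z_a = V_a @ x_a
    du_dw = f_sigma(z_a)                        # Eq. (21), output layer
    du_dV = np.outer(w_a * g_sigma(z_a), x_a)   # Eq. (21), hidden layer, 2x3
    return dJ_du * du_dw, dJ_du * du_dV         # Eq. (17)
```

Only the last critic input depends on the actor weights, which is why the first three columns of $V_{K_c}$ do not appear in $\partial \tilde J / \partial u$.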

2.3.4. Parameter Gradient for Critic Network

From Expression (16), tuning the critic network weights requires the derivative of $J_c(k,p)$ with respect to a general parameter $p_c$. Treating $\tilde{J}(k+1,p)$ in Equation (14) as a fixed target,

$$\frac{\partial J_c(k,p)}{\partial p_c} = \varepsilon\, \frac{\partial \varepsilon}{\partial p_c} = -\varepsilon\, \frac{\partial \tilde{J}(k,p)}{\partial p_c}. \tag{22}$$

When $p_c = w_{K_c}$,

$$\frac{\partial \tilde{J}(k,p)}{\partial w_{K_c}} = F_\sigma(V_{K_c} x_{K_c})^T, \tag{23}$$

while for $p_c = V_{K_c}$,

$$\frac{\partial \tilde{J}(k,p)}{\partial V_{K_c}} = G_\sigma(V_{K_c} x_{K_c})\, w_{K_c}^T\, x_{K_c}^T. \tag{24}$$
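The critic update of Equations (14) to (16), with the gradients (22) to (24), can be sketched as follows. As in Equation (22), $\tilde J(k+1,p)$ is passed in as a fixed target `J_next`; the function name and argument layout are hypothetical.

```python
import numpy as np

def f_sigma(b):
    return np.tanh(0.5 * b)                  # Eq. (6)

def g_sigma(b):
    return 0.5 * (1.0 - f_sigma(b) ** 2)     # diagonal of Eq. (19), element-wise

def critic_update(V_c, w_c, x_c, e_f, J_next, alpha, l_c, k_c):
    """One step of Eq. (16) driven by the Bellman error of Eq. (14).

    J_next is the critic estimate J~(k+1), treated as a fixed target as in Eq. (22).
    Returns the updated (V_c, w_c) and the Bellman error before the update.
    """
    z = V_c @ x_c
    eps = 0.5 * e_f**2 + alpha * J_next - float(w_c @ f_sigma(z))  # Eq. (14)
    dJ_dw = f_sigma(z)                                             # Eq. (23)
    dJ_dV = np.outer(w_c * g_sigma(z), x_c)                        # Eq. (24), 2x4
    # Eq. (22): dJ_c/dp_c = -eps * dJ~/dp_c, then the leaky step of Eq. (16)
    w_new = w_c - l_c * (-eps * dJ_dw) - k_c * w_c
    V_new = V_c - l_c * (-eps * dJ_dV) - k_c * V_c
    return V_new, w_new, eps
```

Repeating the update on a fixed input and target should drive $\varepsilon$ toward zero when $l_{p_c}$ is small enough, which is a convenient smoke test before running on recorded data.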

2.3.5. Initial Condition

Both networks are adapted by gradient descent, so only a local extremum of the cost can be reached. The initial parameter values are therefore of significant importance: the closed-loop system must be started from a solution that is stable and sufficiently close to the desired reference.

In the proposed solution, the initial controller parameters are tuned in a numerical simulation of the closed-loop system with the identified model in Equation (1) and the matrices in Equation (2). The aim is to obtain an initially stable closed-loop system with a sufficient stability margin with respect to model and/or control parameter variations. The following considerations should be observed when tuning the initial values of an actor–critic controller.

The ranges of the hidden layer input signals should be bounded:

$$\|x_{K_a}\| < B_a, \qquad \|x_{K_c}\| < B_c, \tag{25}$$

and the hidden layer weights $V_{K_a}$ and $V_{K_c}$ of the action and critic networks should not drive the tanh activation function into its saturation zone:

$$\|h_a\| < 1 - \epsilon, \qquad \|h_c\| < 1 - \epsilon, \tag{26}$$

where $h_a = F_\sigma(V_{K_a} x_{K_a})$ and $h_c = F_\sigma(V_{K_c} x_{K_c})$. Therefore, we require

$$\|V_{K_a}\| < F_\sigma^{-1}(1 - \epsilon)/B_a, \qquad \|V_{K_c}\| < F_\sigma^{-1}(1 - \epsilon)/B_c. \tag{27}$$

This condition prevents the gradients in Equations (19) and (24) from vanishing, which happens, together with increased rounding errors, when the hidden layer outputs approach unity. Ideally, the hidden layer outputs should stay around 0, which is achieved by a proper choice of the initial weight matrices.
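Since $F_\sigma(b) = \tanh(0.5\,b)$, its inverse is $F_\sigma^{-1}(y) = 2\,\operatorname{artanh}(y)$, so the bound (27) has a closed form. Equation (27) does not state which matrix norm is meant; the sketch assumes the spectral norm, for which $|(Vx)_i| \le \|V\|_2 \|x\|_2$, so every hidden pre-activation stays below $F_\sigma^{-1}(1 - \epsilon)$. The helper name is hypothetical.

```python
import numpy as np

def hidden_weight_bound(B, eps):
    """Eq. (27): largest spectral norm of V such that |F_sigma((V x)_i)| < 1 - eps for ||x|| < B.

    F_sigma(b) = tanh(0.5 b), hence F_sigma^{-1}(y) = 2 * artanh(y).
    """
    return 2.0 * np.arctanh(1.0 - eps) / B
```

For example, with $\epsilon = 0.1$ the derivative (19) at the edge of the allowed range is still $0.5\,(1 - 0.9^2) = 0.095$, roughly a fifth of its maximum value 0.5 at the origin.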

The output layer weights $w_{K_a}$ of the action network should provide enough gain in the feedback loop either to stabilize the system output or to bring it near the reference. If $k_{ss} = C(I - A)^{-1}B$ is the steady-state open-loop gain, then $\|w_{K_a}\| k_{ss} > 1/\|r - y\| - 1$; at the same time, closed-loop oscillations and stability should be monitored as $w_{K_a}$ is increased. Alternatively, such control signal scaling can be achieved by including an external gain in the forward path, outside the action network. If the initial weights $w_{K_a}$ alone are made too large, the gradient descent algorithm may have limited ability to tune them because of Equation (21). The output layer weights of the critic network are not magnitude-bound, but they can be selected to respect certain monotonicity or sign properties.
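The steady-state gain $k_{ss} = C(I - A)^{-1}B$ is best computed with a linear solve rather than an explicit matrix inverse. A minimal sketch with a hypothetical helper name, checked on a scalar toy model:

```python
import numpy as np

def dc_gain(A, B, C, D=None):
    """Steady-state gain of x(k+1) = A x + B u, y = C x + D u: C (I - A)^{-1} B (+ D)."""
    n = A.shape[0]
    k = C @ np.linalg.solve(np.eye(n) - A, B)
    return k if D is None else k + D

# Toy model x(k+1) = 0.5 x(k) + u(k), y = x: gain 1 / (1 - 0.5) = 2
k_toy = dc_gain(np.array([[0.5]]), np.array([[1.0]]), np.array([[1.0]]))
```

For the pump model, $C$ has two rows, so $k_{ss}$ has one entry per output (flow rate and pressure), and the flow rate entry is the relevant one for the gain condition above. If $A$ has an eigenvalue close to 1, $I - A$ is nearly singular and the computed gain becomes very sensitive to the rounded matrix entries.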

Initial weights are usually randomized within a prescribed range, because if all weights in a network layer are identical, their gradients are also identical and the individual neurons do not specialize during learning. A priori knowledge about the importance of the inputs is usually better reflected by input weighting than by the initial weights of the adaptive network, since weight adaptation can easily erase the effect of the initial state completely.
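Combining the randomization described above with the bound (27), one possible initialization (an assumption for illustration, not the authors' procedure; their values are in Table 1) draws uniform random entries and rescales the matrix so that its spectral norm is a chosen fraction of the bound. The function name and the `fraction` parameter are hypothetical.

```python
import numpy as np

def init_hidden_weights(n_hidden, n_in, B, eps, rng, fraction=0.5):
    """Random hidden-layer weights with spectral norm = fraction * (Eq. (27) bound)."""
    V = rng.uniform(-1.0, 1.0, size=(n_hidden, n_in))  # random entries break neuron symmetry
    bound = 2.0 * np.arctanh(1.0 - eps) / B            # F_sigma^{-1}(1 - eps) / B
    return V * (fraction * bound / np.linalg.norm(V, 2))

rng = np.random.default_rng(1)
V_a0 = init_hidden_weights(2, 3, B=5.0, eps=0.05, rng=rng)  # actor hidden layer, 2x3
```

A fraction well below 1 keeps the hidden outputs near zero at start-up, leaving room for the weights to grow during adaptation before saturation sets in.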

The values of the actor–critic parameters used in the experiment are presented in Table 1. They were chosen to obtain a stable initial closed-loop system, based on simulation with the plant model in Equation (1).

3. Stability Analysis

Several results exist on the stability analysis of deep network dynamic programming systems; they are generally based on Lyapunov criteria, but the choice of the Lyapunov function is case specific. A brief stability analysis of the closed-loop system with the designed actor–critic controller is presented here.

Theorem 1.

Let a continuous nonlinear system be given as

$$\dot{x} = f(x, u), \tag{28}$$

with $f$ continuous in both $x$ and $u$, and let the control signal be

$$u = g(x, p), \tag{29}$$

with $g$ continuous in both $x$ and $p$. Let the controller parameters $p$ be tuned by

$$\dot{p} = -l_p \frac{\partial J}{\partial p} - k_p\, p, \qquad k_p > 0, \tag{30}$$

where $J(t, p)$ is a smooth function of the parameters for every $t$. If $x = 0$ is an asymptotically stable stationary point of the closed-loop system for the initial parameter values $p = p_0$, then there exists an upper bound $\bar{l}_p$ of the learning rate such that, for $0 < l_p < \bar{l}_p$, $x = 0$ remains an asymptotically stable stationary point for every $p(t)$, $t > 0$.

Proof.

A candidate Lyapunov function for the closed-loop system is

$$V(x, p) = V_x(x) + \frac{1}{2} p^T p, \tag{31}$$

where $V_x(x)$ is positive definite. The time derivative of Equation (31) is

$$\dot{V}(x, p) = \frac{\partial V_x}{\partial x}\,\dot{x} + p^T \dot{p}. \tag{32}$$

For $p = p_0 = \mathrm{const}$, the system is asymptotically stable at $x = 0$; therefore,

$$\dot{V}(x, p_0) = \frac{\partial V_x}{\partial x}\,\dot{x} \le -\gamma \|x\|^2 \tag{33}$$

for some $\gamma > 0$ and $\|x\| < \delta$. Taking into account Equations (30) and (33), and owing to the continuity of Equation (32), for $p(t)$ with $\|p(t) - p_0\| < \delta_p$ we have

$$\dot{V}(x, p) \le -\gamma \|x\|^2 + p^T\left(-l_p \frac{\partial J}{\partial p} - k_p\, p\right). \tag{34}$$

Let $\beta = \max_{p,t} \left\|\frac{\partial J}{\partial p}\right\| > 0$. Since $-p^T \frac{\partial J}{\partial p} \le \beta \|p\|$ by the Cauchy–Schwarz inequality, it follows that

$$\dot{V}(x, p) \le -\gamma \|x\|^2 + l_p \beta \|p\| - k_p \|p\|^2. \tag{35}$$

To ensure $\dot{V}(x, p) \le 0$, it is sufficient that

$$l_p \beta \|p\| \le \gamma \|x\|^2 + k_p \|p\|^2. \tag{36}$$

Inequality (36) holds if

$$l_p < \bar{l}_p \le \frac{\gamma \|x\|^2 + k_p \|p\|^2}{\beta \|p\|}, \tag{37}$$

where

$$\bar{l}_p = \min_{x, p} \frac{\gamma \|x\|^2 + k_p \|p\|^2}{\beta \|p\|}. \tag{38}$$

Therefore, for $l_p < \bar{l}_p$, $\dot{V}(x, p) < 0$, which ensures the stability of the closed-loop system. □

This result shows that, as expected, there is a trade-off between adaptation speed and stability: fast adaptation can lead to instability of the closed-loop system, whereas slow adaptation provides stability at the expense of control performance. Expression (38) provides an upper bound on the learning rate that ensures stability during adaptation.
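The trade-off can be seen on a toy problem. The sketch below is a discrete-time analogue on a hypothetical scalar quadratic cost, not the continuous-time setting of Theorem 1: iterating Equation (10) with $\tilde J = \frac{1}{2} h\, p^2$ gives $p \leftarrow (1 - l_p h - k_p)\, p$, which converges only when $0 < l_p h + k_p < 2$.

```python
def run_leaky_descent(l_p, k_p, curvature=1.0, p0=1.0, steps=200):
    """Iterate Eq. (10) on J = 0.5 * curvature * p^2: p <- p - l_p * curvature * p - k_p * p."""
    p = p0
    for _ in range(steps):
        p = p - l_p * curvature * p - k_p * p
    return p

p_slow = run_leaky_descent(0.1, 0.01)  # factor 0.89 per step: decays toward zero
p_fast = run_leaky_descent(2.5, 0.01)  # factor -1.51 per step: oscillates and diverges
```

In the closed loop the curvature is not constant and is itself shaped by the plant, which is why the bound in Theorem 1 is stated only as an existence result.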

4. Experimental Results

The designed controller is implemented as a fixed-step discrete-time Simulink^® model consisting of an actor subsystem and a critic subsystem, together with the gradient calculations and update rules. The weights of the tunable layers are implemented using an Integer Delay block with a sum element in its feedback path, so that the weights are updated at every simulation step. The layer activation is computed by multiplying the weight matrix by the input vector, followed by element-wise application of the tanh function for the hidden layer. The precomputed gradient expressions are implemented with math blocks that apply the feedback correction signals to the weight matrices.
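The per-sample logic described above can be sketched outside Simulink. This is a hypothetical NumPy rendering, not the authors' model: the class name, default gains, and update order are assumptions. Equation (14) needs $\tilde J(k+1,p)$, which is not available at sample $k$, so the sketch updates the critic one sample late, using the current estimate as the target for the previous sample. The adjustable weights ($6 + 2 + 8 + 2 = 18$) plus the integrator state give 19 states, which appears consistent with the controller order stated in this section.

```python
import numpy as np

def f_sigma(b):
    return np.tanh(0.5 * b)                   # Eq. (6)

def g_sigma(b):
    return 0.5 * (1.0 - f_sigma(b) ** 2)      # diagonal of Eq. (19)

class ActorCriticSketch:
    """Hypothetical per-sample AC update; gains and update order are assumptions."""

    def __init__(self, V_a, w_a, V_c, w_c, alpha=0.95, T0=0.01,
                 l_a=1e-4, k_a=1e-6, l_c=1e-3, k_c=1e-6):
        self.V_a, self.w_a = np.array(V_a, float), np.array(w_a, float)  # 2x3, (2,)
        self.V_c, self.w_c = np.array(V_c, float), np.array(w_c, float)  # 2x4, (2,)
        self.alpha, self.T0 = alpha, T0
        self.l_a, self.k_a, self.l_c, self.k_c = l_a, k_a, l_c, k_c
        self.e_int = 0.0      # integrator state, Eq. (5)
        self.prev = None      # (x_c, e_f) from the previous sample
        self.last_eps = 0.0   # most recent Bellman error, for monitoring

    def step(self, r_f, y_f, y_p):
        e_f = r_f - y_f
        self.e_int += self.T0 * e_f                    # Eq. (5)
        x_a = np.array([e_f, self.e_int, y_p])         # Eq. (4)
        z_a = self.V_a @ x_a
        u = float(self.w_a @ f_sigma(z_a))             # Eq. (3)
        x_c = np.append(x_a, u)                        # Eq. (12)
        z_c = self.V_c @ x_c
        J_now = float(self.w_c @ f_sigma(z_c))         # Eq. (11)

        # Actor gradient through u with the current critic, Eqs. (17)-(21)
        dJ_du = float(self.w_c @ (g_sigma(z_c) * self.V_c[:, 3]))
        grad_w_a = dJ_du * f_sigma(z_a)
        grad_V_a = dJ_du * np.outer(self.w_a * g_sigma(z_a), x_a)

        # Delayed critic update for the previous sample, Eqs. (14), (16), (22)-(24)
        if self.prev is not None:
            x_p, e_p = self.prev
            z_p = self.V_c @ x_p
            eps = 0.5 * e_p**2 + self.alpha * J_now - float(self.w_c @ f_sigma(z_p))
            grad_w_c = -eps * f_sigma(z_p)
            grad_V_c = -eps * np.outer(self.w_c * g_sigma(z_p), x_p)
            self.w_c += -self.l_c * grad_w_c - self.k_c * self.w_c
            self.V_c += -self.l_c * grad_V_c - self.k_c * self.V_c
            self.last_eps = eps

        # Actor update, Eq. (10)
        self.w_a += -self.l_a * grad_w_a - self.k_a * self.w_a
        self.V_a += -self.l_a * grad_V_a - self.k_a * self.V_a
        self.prev = (x_c, e_f)
        return u
```

In a Simulink model the same structure maps naturally onto delay-plus-sum blocks for each weight array, with the stored previous critic input playing the role of an additional unit delay.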

The Simulink model of the AC controller can be used both for numerical simulation and for generating real-time C or ST (Structured Text) code from Simulink^®, which can then be embedded in a microcontroller or PLC. For rapid prototyping, the controller can also be run in Real-Time Simulink^® mode. In this mode, a real-time CAN communication channel is established between the host PC and the analog front end controlled by the industrial MC012-022 microcontroller (Danfoss, Nordborg, Denmark). The microcontroller program acts as a proxy: it delivers the measured pressure, flow rate, and LVDT feedback to the host PC and executes the calculated control action by generating a PWM signal for the pump actuator.

Note that the proposed AC controller is represented by a nonlinear state-space model with 19 states (19th order), which is a moderate order. For comparison, the previously designed H_∞ controller [31] and µ-controller [28] are of 11th and 21st order, respectively, and their design requires an adequate nominal plant model and an adequate plant uncertainty model to obtain a robust closed-loop system. Nowadays, implementing controllers of such order in conventional industrial PLCs or embedded microcontrollers poses no difficulty.

To assess and compare the performance of the AC controller, we define a standard quadratic cost:

$$err = \frac{1}{t_f - t_0}\sum_{k=t_0}^{t_f} e_f(k)^2, \qquad e_f = r_f - y_f, \tag{39}$$

where the sum of the squared flow rate tracking error over the observation period is divided by the length of that period. To put the results in context, we compare the AC controller with a conventional proportional-integral (PI) controller, a Lyapunov-based model reference adaptive controller (LMRAC), and an H_∞ controller, all tested under the same experimental conditions. The LMRAC controller is chosen because it is designed on the basis of a simple first-order plant model. This model is not accurate over the whole working range of the plant, which is close in spirit to model-free controller design. In addition, like the proposed AC controller, the LMRAC controller guarantees closed-loop stability. The H_∞ controller is chosen because it is the natural choice when the nominal model describes the plant dynamics sufficiently well and the closed-loop system must be robust over a wide working range. Moreover, in contrast with the AC and LMRAC controllers, which use no plant model and only a simple first-order model, respectively, the H_∞ controller is designed on the basis of the sixth-order state space plant model in Equations (1) and (2) [31]. This offers a different perspective on controller synthesis with respect to the a priori information required about the plant dynamics. Table 2 and Table 3 present the values of the performance index in Equation (39) and the settling times t_{st} of the control systems, evaluated for both increasing and decreasing reference steps. The settling time results show the advantage of both adaptive control systems, AC and LMRAC.
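The section does not state how t_st is measured. The sketch below assumes a common definition: the time from the step instant until the response enters and stays within a ±5 % band around the final value, with the band taken relative to the step size. The function and parameter names are hypothetical, and the band width is an assumption, not the authors' stated criterion.

```python
import numpy as np


def settling_time(t, y, y_final, y_initial, band=0.05):
    """Settling time after a step applied at t[0].

    Assumption: tolerance = band * |y_final - y_initial| (default +/-5 % of the step).
    Works for increasing and decreasing steps. Returns None if the response
    is still outside the band at the end of the record.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    tol = band * abs(y_final - y_initial)
    outside = np.abs(y - y_final) > tol
    if not outside.any():
        return 0.0                           # already inside the band at the step instant
    last_out = np.flatnonzero(outside)[-1]   # last sample outside the band
    if last_out == len(y) - 1:
        return None                          # never settled within the record
    return float(t[last_out + 1] - t[0])
```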

Figure 7 illustrates experimental results for flow rate regulation of the axial-piston pump under loading condition 2, which corresponds to the smallest throttle valve opening and therefore the highest load. The figure compares four control strategies, PI (proportional-integral), LMRAC (Lyapunov-based model reference adaptive control), Hinf (H∞ controller), and AC (actor–critic adaptive control), against a reference trajectory.

Over a period of 14 s, the flow rate starts from a baseline of around 17 L/min, and the reference (black line) undergoes step changes representing system commands, which the controllers attempt to track.

In the first transition, the PI controller (red) shows a sluggish, heavily damped response and does not reach the commanded flow rate within a reasonable time, while in the subsequent transitions it performs similarly to LMRAC (blue) and Hinf (pink). This is consistent with the fact that the PI controller is tuned on the basis of the plant model corresponding to loading 2 (highest loading) [28]. The PI gains are obtained from a condition for the desired closed-loop pole placement.

In the first transition, LMRAC (blue) demonstrates improved damping but still requires considerable time for adaptation; the AC controller (green) achieves the tightest tracking, with minimal error and the fastest stabilization across transients; and the Hinf controller (pink) first reaches the reference very fast, but in the subsequent transitions its settling times t_{st} for both increasing and decreasing steps are larger than those of the adaptive controllers LMRAC and AC. This behavior agrees with the performance index (Table 2) and settling times (Table 3): for this loading, AC yields an index of 914.7, compared with 1668.1 for PI, 1202.5 for Hinf, and 1354.3 for LMRAC, highlighting the advantage of the adaptive neural network approach in handling hydraulic nonlinearities and uncertainties in real-time control of the pump displacement volume.

Figure 8 presents experimental results for flow rate regulation in an axial-piston pump under loading condition 3. The flow rate command during this loading reaches up to 24 L/min. All controllers respond to the reference, but PI (red) exhibits a pronounced delay in reaction, while LMRAC (blue), Hinf (pink), and AC (green) show drastically reduced transients and a small steady-state error.

This loading can be interpreted as a greater challenge for AC than condition 2 in terms of progressively minimizing the adaptation error, as evidenced by the performance index: AC scores 1229.7, versus 2620.3 for PI and 1442.3 for LMRAC. However, at steady state AC exhibits oscillations of smaller amplitude than LMRAC, which corresponds to quieter pump operation with reduced vibration. Only for this loading does the Hinf controller achieve a slightly smaller performance index than AC (1164.2 versus 1229.7), which reflects its robustness.

Figure 9 depicts experimental flow rate regulation for loading condition 4, with flow rates up to 24 L/min. PI displays a long settling time, whereas LMRAC and AC track closely with minor deviations and small transients. The performance index of AC reaches 831.7, better than 1119.1 for LMRAC, 943.8 for Hinf, and 2522.4 for PI, again confirming the advantage of adaptive dynamic programming under hydraulic uncertainties. As with loading 3, the AC system performs better at high flow rates, with reduced oscillation amplitude and frequency. In comparison, LMRAC even develops what can be classified as auto-oscillatory behavior. Over time, these oscillations may either settle or decay.

It should be noted that the transient responses presented in Figure 8, Figure 9 and Figure 10 reveal the disadvantages of the PI controller, which is expected because it is not a robust controller. Moreover, they strongly indicate changes in the plant dynamics across loadings, which once again underlines the complexity of the plant.

Figure 10 shows flow rate regulation performance for loading condition 5, the lowest loading in the pump operating envelope. The reference demands reach around 26 L/min with step changes. Again, the PI controller suffers a significant loss of performance due to its slow aperiodic response. In contrast, LMRAC follows the reference with a small lag, but AC achieves greater accuracy owing to its rapid settling after disturbances. The performance index for AC reaches 605.3 here, compared with 1397.8 for PI, 701.3 for Hinf, and 791.7 for LMRAC.

Figure 11 overlays the control signals (in mV, subsequently converted to PWM and sent to the proportional valve amplifier) of the actor–critic (AC) controller for loadings 2 (red), 3 (blue), 4 (green), and 5 (black) over 14 s, revealing adaptive behavior in response to flow reference steps and load-induced disturbances. In control theory terms, the actor network acts as a nonlinear function of the tracking error, its integral, and the pressure feedback, processed through a tanh-activated hidden layer followed by a linear output layer. The signals exhibit aggressive initial peaks (up to ~650 mV for loading 4) during transients, reflecting high-gain compensation of plant nonlinearities such as valve dynamics and pressure variations, followed by a steady-state level. Loading 5 demands the largest control effort, as evidenced by the upward shift of the lower bound of the control signal; this correlates with its lowest performance index (605.3), since lower pressures allow more aggressive valve actuation for flow recovery. The other loadings show consistent lower-amplitude profiles, in line with the higher resistance in the system.
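The actor structure described above can be sketched as a two-layer network. The sketch uses the initial actor weights V_{K_a}(0) and w_{K_a}(0) from Table 1, whose shapes (3 × 2 and 1 × 2) imply three inputs and two hidden neurons. It assumes there are no bias terms and no additional input or output scaling, which this section does not specify, so it illustrates the structure rather than the authors' implementation; the names are hypothetical.

```python
import numpy as np

# Initial actor weights from Table 1.
# V_KA0: 3 inputs x 2 hidden neurons; W_KA0: 2 hidden neurons -> 1 output.
V_KA0 = np.array([[0.001, 0.002],
                  [0.003, 0.004],
                  [0.001, 0.002]])
W_KA0 = np.array([400.0, 600.0])


def actor_output(e_f, e_int, p, V=V_KA0, w=W_KA0):
    """Two-layer actor u = w . tanh(V^T x), with x = [e_f, integral of e_f, p].

    Assumptions: no bias terms, no input normalization, no output scaling.
    Returns the control signal in mV (before PWM conversion).
    """
    x = np.array([e_f, e_int, p], dtype=float)
    hidden = np.tanh(V.T @ x)  # two hidden activations, each in [-1, 1]
    return float(w @ hidden)   # linear output layer
```

Under these assumptions |u| cannot exceed 400 + 600 = 1000, which happens to equal the 1000 mV hardware limit quoted for the valve signal.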

Figure 12 shows the PI controller output (PWM control signal in mV) for loadings 2–5, exhibiting the characteristic integral ramp followed by leveling off near each flow reference level. For every step increase in the reference, the PI signal rises monotonically toward about 230 mV; the lightest load (loading 5) requires higher steady-state levels, while for loading 2 the control signal remains lower as the flow increases, because the higher output pressure helps move the pump swash plate. The PI signal dynamics are relatively slow and have an almost identical shape across cycles, reflecting the fixed proportional and integral gains, which cannot adapt to changing operating conditions; this contributes to larger tracking errors than with the actor–critic controller. For all loadings, the control signals of both the AC and PI systems remain below the hardware limit of 1000 mV.
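For contrast with the adaptive actor, a fixed-gain discrete PI law can be sketched as follows. The gains kp and ki, the sample time ts, and the clamping-based anti-windup are hypothetical placeholders (the PI gains in this study come from closed-loop pole placement [28] and are not listed in this section); the output bounds of 0 and 1000 mV are assumptions based on the hardware limit stated above.

```python
class DiscretePI:
    """Fixed-gain discrete PI controller with output clamping (hypothetical gains)."""

    def __init__(self, kp, ki, ts, u_min=0.0, u_max=1000.0):
        self.kp, self.ki, self.ts = kp, ki, ts
        self.u_min, self.u_max = u_min, u_max  # assumed valve signal range in mV
        self.integral = 0.0

    def step(self, e):
        # Forward-Euler accumulation of the tracking error.
        new_integral = self.integral + e * self.ts
        u = self.kp * e + self.ki * new_integral
        # Conditional integration (simple anti-windup): keep the new integral
        # only while the unclamped output stays inside the allowed range.
        if self.u_min <= u <= self.u_max:
            self.integral = new_integral
        return min(max(u, self.u_min), self.u_max)


# Usage: a constant unit error gives a proportional jump followed by a linear
# integral ramp; the gains never change, whatever the operating condition.
controller = DiscretePI(kp=1.0, ki=10.0, ts=0.01)
trace = [controller.step(1.0) for _ in range(5)]
```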

Figure 13 presents the pump discharge pressure (in bar) under actor–critic control for loadings 2 (red), 3 (blue), 4 (green), and 5 (black) over 14 s, showing load-dependent profiles that follow the flow reference steps. Loading 5 maintains the lowest pressures (20–25 bar), reflecting lighter throttling; higher loadings escalate to peaks of 60–120 bar, with loading 2 exhibiting the most abrupt pressure variations. At the highest loading, partial activation of the system relief valve is expected at around 100 bar. In all cases, the pressure drops after a reduction of the flow rate reference, as the reduced swash plate displacement lowers the pump delivery.

Figure 14 shows the pump output pressure under PI control for loadings 2–5, mirroring the flow reference steps. Pressures climb sharply to peaks of about 130 bar during the upward steps, apparently exceeding the AC peaks by ~5–10 bar, probably because the sluggish valve response of PI interacts with relief valve activation. At the low end, the PI pressure also stays 2–3 bar above the AC minima.

Figure 15 plots the measured flow rate during sequential load variations over three levels, comparing PI (red) and AC (blue) with the reference (black) step command. AC closely follows the reference with minimal overshoot or lag, whereas the PI system exhibits a pronounced undershoot after the second reference step and larger steady-state errors. The three-step reference is used to demonstrate that the controller can handle reference profiles beyond a basic step waveform.

Figure 16 shows the pump output pressure under random load variations for a fixed flow rate reference. The load variations are created by manually changing the opening of the throttle valve. The pressure range is similar for both controllers, as dictated by the limits of the load variation. The waveform follows the pressure–flow response of the pump, and AC clearly reacts to the changing load in less than a second, while the PI controller responds more slowly and exhibits large pressure ramps and long transients, especially during rising loads. This indicates limited disturbance rejection and poorer coordination with the relief valves, although the PI pressure trace is generally smoother, with fewer high-frequency fluctuations.

The control signals of the AC and PI controllers under random load variations and a fixed flow rate command are presented in Figure 17. Since the pump response to valve opening is highly nonlinear, this figure shows how both controllers cope with this nonlinearity. The AC controller operates around a fixed mean level with slight variations to compensate for disturbances, which makes it more energy efficient than PI. The PI controller exhibits larger swings in control action, owing to integrator accumulation and the time it needs to find the proper control range. Both controllers stay within the valve range limits and have a limited noise spectrum.

Figure 18 plots the real-time critic network output \tilde{J}(k, p), i.e., the parametric neural approximation of the infinite-horizon quadratic cost-to-go J(k, p), for the four loading conditions over the first 14 s of the experiment. The critic outputs lie in the range 0.05–0.15 during steady-state reference levels and briefly spike above 0.15 during some of the transients, but return rapidly from these spikes as the action network weights are updated by gradient descent. The bounded and stable behavior of the critic signal indicates successful optimization of the critic network weights. Additionally, changing the loading does not shift the range of the critic signal, except for a slight shift of the lower bound for loading 5. Such behavior reflects the boundedness and stability of the closed-loop tracking errors.
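The critic evaluation can be sketched in the same way as the actor, using V_{K_c}(0) and w_{K_c}(0) from Table 1. The 4 × 2 shape of V_{K_c}(0) implies four critic inputs, but their composition (for example, the three actor inputs plus the control signal) is an assumption not stated in this section, as is the absence of bias terms; the function name is hypothetical.

```python
import numpy as np

# Initial critic weights from Table 1: 4 inputs x 2 hidden neurons, 2 -> 1 output.
V_KC0 = np.array([[0.001, 0.002],
                  [0.002, 0.003],
                  [0.001, 0.002],
                  [0.001, 0.002]])
W_KC0 = np.array([1.0, 2.0])


def critic_output(z, V=V_KC0, w=W_KC0):
    """Critic approximation J~ = w . tanh(V^T z) of the cost-to-go.

    z: 4-element critic input; its exact composition is an assumption here.
    """
    z = np.asarray(z, dtype=float)
    if z.shape != (4,):
        raise ValueError("critic input must have 4 elements")
    return float(w @ np.tanh(V.T @ z))  # tanh hidden layer, linear output
```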

The Bellman error ε(k) (Figure 19), calculated from the approximated total cost-to-go \tilde{J}(k, p), also stays small for all loading conditions, with amplitudes below 6, except at a few time instants where it reaches about 100. We attribute the non-zero deviations in the Bellman error to random system noise, which constantly drives the system away from its optimal trajectory. However, after such deviations the Bellman error returns to 0 within a few sample periods, consistent with exponential boundedness. This indicates that the critic values practically satisfy the Bellman equation and that the closed-loop system follows the optimal trajectory with respect to the selected criterion.
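A common temporal-difference form of the Bellman residual, together with one semi-gradient descent step on 0.5ε(k)² with respect to the critic output weights, is sketched below. This is a standard heuristic dynamic programming form assumed for illustration; the exact residual definition, discount factor, and one-step cost used by the authors may differ. The weights q and r, the discount gamma, and all function names are hypothetical, while the learning rate 0.0001 matches the critic output-layer value in Table 1.

```python
import numpy as np


def utility(e_f, u, q=1.0, r=1e-6):
    """Hypothetical one-step quadratic cost U(k) = q*e_f(k)^2 + r*u(k)^2."""
    return q * e_f ** 2 + r * u ** 2


def bellman_residual(U_k, J_k, J_next, gamma=0.95):
    """Temporal-difference residual eps(k) = U(k) + gamma*J~(k+1) - J~(k).

    It is zero when the critic values satisfy the discounted Bellman equation.
    """
    return U_k + gamma * J_next - J_k


def critic_output_layer_step(w, phi_k, eps, lr=1e-4):
    """One semi-gradient descent step on 0.5*eps^2 w.r.t. the critic output weights.

    With J~(k) = w . phi_k and J~(k+1) treated as a fixed target,
    d(eps)/dw = -phi_k, so w <- w - lr*eps*(-phi_k) = w + lr*eps*phi_k.
    """
    return np.asarray(w, dtype=float) + lr * eps * np.asarray(phi_k, dtype=float)


# Illustrative numbers only (not measured data).
phi_k = np.array([0.5, 0.5])   # hidden-layer activations at step k
w = np.array([1.0, 2.0])       # critic output weights, initialized with w_Kc(0) from Table 1
U_k, J_next = 0.2, 1.5         # one-step cost and critic value at step k+1
eps_before = bellman_residual(U_k, float(w @ phi_k), J_next)
w = critic_output_layer_step(w, phi_k, eps_before)
eps_after = bellman_residual(U_k, float(w @ phi_k), J_next)
```

Each step moves \tilde{J}(k) toward the target U(k) + γ\tilde{J}(k+1), so repeated updates drive the residual toward zero, which is the behavior the recorded Bellman error is meant to confirm.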

5. Conclusions

This manuscript synthesizes, implements, and experimentally validates an approximated adaptive dynamic programming (AADP) actor–critic (AC) controller for precise flow rate regulation in a swash plate axial-piston pump (A10VSO, 18 cm³ displacement volume) used in open-circuit hydraulic drive systems, replacing the conventional hydro-mechanical regulator with a VT-DFP proportional valve driven by an MC012-022 microcontroller. The controller employs two two-layer neural networks: the actor generates the valve PWM signal from the tracking error, its integral, and the pressure, and the critic approximates the infinite-horizon quadratic cost via Bellman recursion.

The AC system is experimentally tested under four fixed loadings set by the throttle valve and under randomized dynamic load variations. In these experiments, AC outperforms PI, LMRAC, and Hinf in the observed transient responses and in the performance index (39), with the single exception of loading 3, where Hinf attains a slightly lower index, and it achieves the shortest or equal-shortest settling times. Closed-loop stability of the adaptive system is proved via Lyapunov analysis under bounded learning rates, a requirement also confirmed in simulation and experiments. Real-time recordings of the critic network output and of the Bellman error confirm convergence during transients and under varying reference and load.

This work bridges model-free reinforcement learning (RL) with hydraulic engineering, advancing energy-efficient volumetric control amid rising electrification demands. The AC approach requires minimal a priori modeling and is therefore suited to uncertain plants such as pumps with losses and valve hysteresis, which is pivotal for hydraulics in industrial and mobile/construction machinery.

Author Contributions

Conceptualization, A.M., T.S. and J.K.; methodology, J.K., A.M. and T.S.; software, J.K.; validation, A.M.; formal analysis, T.S., J.K. and A.M.; investigation, A.M., T.S. and J.K.; resources, A.M.; data curation, A.M.; writing—original draft preparation, T.S., A.M. and J.K.; writing—review and editing, J.K. and A.M.; visualization, A.M.; supervision, T.S.; project administration, A.M.; funding acquisition, T.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been accomplished with financial support by the European Regional Development Fund within the Operational Programme “Bulgarian national recovery and resilience plan”, Procedure for direct provision of grants “Establishing of a network of research higher education institutions in Bulgaria”, under the Project BG-RRP-2.004-0005 “Improving the research capacity anD quality to achieve intErnAtional recognition and reSilience of TU—Sofia (IDEAS)”.

Data Availability Statement

All data and information needed are included in the paper as tables, numerical values of parameters, and graphical information (figures). Upon request, the authors will provide more specific results from the paper as additional information or numerical data.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Findeisen, D.; Helduser, S. Ölhydraulik; Springer: Berlin, Germany, 2015.
2. Ivantysyn, J.; Ivantysynova, M. Hydrostatic Pumps and Motors: Principles, Design, Performance, Modelling, Analysis, Control and Testing; Academia Books International: New Delhi, India, 2001.
3. Manring, N. Fluid Power Pumps and Motors: Analysis, Design, and Control; McGraw-Hill Education: Columbus, OH, USA, 2013.
4. Frankenfield, T. Using Industrial Hydraulics; Rexroth Worldwide Hydraulics; Penton Publishing Inc.: New York, NY, USA, 1984.
5. Tonyan, M. Electronically Controlled Proportional Valves; Marcel Dekker, Inc.: New York, NY, USA, 1985.
6. Skarpetis, M.G. Automatic Control of Hydraulic Systems; Nova Science Publishers, Inc.: New York, NY, USA, 2023.
7. Zhou, K.; Doyle, J. Robust and Optimal Control; Prentice Hall International: Upper Saddle River, NJ, USA, 1996.
8. Åström, K.J.; Wittenmark, B. Adaptive Control; Courier Corporation: North Chelmsford, MA, USA, 2008.
9. Park, S.; Lee, J.; Kim, J. Robust control of the pressure in a control-cylinder with direct drive valve for the variable displacement axial piston pump. Proc. Inst. Mech. Eng. Part I J. Syst. Control Eng. 2009, 223, 455–465.
10. Zeiger, G.; Akers, A. The application of linear optimal control techniques to axial piston pump controller design. SAE Tech. Pap. 1989, 890953.
11. Lin, S.; Akers, A. Optimal control theory applied to pressure-controlled axial piston pump design. J. Dyn. Syst. Meas. Control Trans. ASME 1990, 112, 475–481.
12. Berg, H.; Ivantysynova, M. Design and testing of a robust linear controller for secondary controlled hydraulic drive. Proc. Inst. Mech. Eng. Part I J. Syst. Control Eng. 1999, 213, 375–385.
13. Zeiger, G.; Akers, A. Dynamic Analysis of an Axial Piston Pump Swashplate Control. Proc. Inst. Mech. Eng. Part C J. Mech. Eng. Sci. 1986, 1, 49–58.
14. Berg, H.; Ivantysynova, M. Robust closed loop speed and angular position control for variable displacement hydraulic motors supplied from a constant pressure mains system. Olhydraulik Pneum. 1999, 43, 405–410.
15. Zhang, R.; Alleyne, A.; Prasetiawan, E. Modeling and H2/H∞ MIMO control of an earthmoving vehicle powertrain. J. Dyn. Syst. Meas. Control Trans. ASME 2002, 124, 625–636.
16. Lennevi, J.; Palmberg, J.-O. Application and implementation of LQ design method for the velocity control of hydrostatic transmissions. Proc. Inst. Mech. Eng. Part I J. Syst. Control Eng. 1995, 209, 255–268.
17. Heybroek, K.; Larsson, J.; Palmberg, J.-O. Open Circuit Solution for Pump Controlled Actuators. In Proceedings of the 4th FPNI-PhD Symposium, Sarasota, FL, USA, 13–17 June 2006; pp. 27–40.
18. Zhang, P.; Li, Y. Research on Control Methods for the Pressure Continuous Regulation Electrohydraulic Proportional Axial Piston Pump of an Aircraft Hydraulic System. Appl. Sci. 2019, 9, 1376.
19. Kemmetmüller, W.; Fuchshumer, F.; Kugi, A. Nonlinear pressure control of self-supplied variable displacement axial piston pumps. Control Eng. Pract. 2010, 18, 84–93.
20. Wei, J.; Guo, K.; Fang, J.; Tian, Q. Nonlinear supply pressure control for a variable displacement axial piston pump. Proc. Inst. Mech. Eng. Part I J. Syst. Control Eng. 2015, 229, 614–624.
21. Helian, B.; Mustalahti, P.; Mattila, J.; Chen, Z.; Yao, B. Adaptive robust pressure control of variable displacement axial piston pumps with a modified reduced-order dynamic model. Mechatronics 2022, 87, 102879.
22. Feng, Y.; Jian, Z.; Li, J.; Tao, Z.; Wang, Y.; Xue, J. Advanced Control Systems for Axial Piston Pumps Enhancing Variable Mechanisms and Robust Piston Positioning. Appl. Sci. 2023, 13, 9658.
23. Busquets, E.; Ivantysynova, M.; Handroos, H. Discontinuous projection-based adaptive robust control for displacement-controlled actuators. J. Dyn. Syst. Meas. Control Trans. ASME 2015, 137, 8.
24. Haggag, S.A. Robust control and modelling of a heavy equipment variable displacement pump hydraulic system. Int. J. Heavy Veh. Syst. 2011, 18, 288–302.
25. Guo, K.; Wei, J. Adaptive robust control of variable displacement pumps. In Proceedings of the American Control Conference, Washington, DC, USA, 17–19 June 2013.
26. Koivumäki, J.; Mattila, J. Adaptive and nonlinear control of discharge pressure for variable displacement axial piston pumps. J. Dyn. Syst. Meas. Control Trans. ASME 2017, 139, 101008.
27. Mitov, A.; Slavov, T.; Kralev, J. Comparison of Advanced Multivariable Control Techniques for Axial-Piston Pump. Processes 2024, 12, 1797.
28. Slavov, T.; Mitov, A.; Kralev, J. Novel Approach for Robust Control of Axial Piston Pump. Mathematics 2025, 13, 643.
29. Slavov, T.; Mitov, A.; Kralev, J. Lyapunov-Based Two-Degree-of-Freedom Model Reference Adaptive Control of Axial-Piston Pump. Mathematics 2025, 13, 3513.
30. Mitov, A.; Kralev, J.; Slavov, T.; Angelov, I. Design of Embedded Control System for Open Circuit Axial Piston Pump. In Proceedings of the 22nd International Symposium on Electrical Apparatus and Technologies, SIELA 2022, Bourgas, Bulgaria, 1–4 June 2022.
31. Mitov, A.; Slavov, T.; Kralev, J. Rapid Prototyping of H∞ Algorithm for Real-Time Displacement Volume Control of Axial Piston Pumps. Algorithms 2023, 16, 120.
32. Ljung, L. System Identification: Theory for the User, 2nd ed.; Prentice Hall: Wilmington, DE, USA, 1999.
33. Mitov, A.; Kralev, J.; Slavov, T. Identification of Variable Displacement Axial-Piston Pump with Proportional Valve Control. In Proceedings of the 14th International Scientific Conference on Aeronautics, Automotive, and Railway Engineering and Technologies, Sozopol, Bulgaria, 10–13 September 2022.
34. Zhou, J.; Zhang, T.; Zhang, H.; Zhang, Z.; Hong, J.; Yang, J. Energy management strategy for electro-hydraulic hybrid electric vehicles considering optimal mode switching: A soft actor-critic approach trained on a multi-modal driving cycle. Energy 2024, 305, 132172.
35. Krishnakumar, K.; Limes, G.; Gundy-Burlet, K.; Bryant, D. An adaptive critic approach to reference model adaptation. In Proceedings of the AIAA Guidance, Navigation, and Control Conference and Exhibit, San Francisco, CA, USA, 11–14 August 2003.
36. Akraminia, M.; Tatari, M.; Fard, M.; Jazar, R.N. Designing active vehicle suspension system using critic-based control strategy. Nonlinear Eng. 2015, 4, 141–154.
37. Agvik, R.; Vännman, S. Adaptive Control of Hydraulic Drive System; Chalmers University of Technology: Gothenburg, Sweden, 2023.
38. Han, T.; Nie, X.; Que, N.; Lu, J.; Yao, J.; Yu, X. Predefined-Time Tracking Control of Servo Hydraulic Cylinder Based on Reinforcement Learning. Actuators 2026, 15, 9.
39. Li, Y.; Qi, X. Research on Motion Control of Hydraulic Manipulator Based on Prescribed Performance and Reinforcement Learning. Actuators 2026, 15, 39.
40. Wei, X.; Ye, J.; Xu, J.; Tang, Z. Adaptive Dynamic Programming-Based Cross-Scale Control of a Hydraulic-Driven Flexible Robotic Manipulator. Appl. Sci. 2023, 13, 2890.
41. Su, Q.; Pei, Z.; Tang, Z. Tracking Control for a Lower Extremity Exoskeleton Based on Adaptive Dynamic Programing. Biomimetics 2023, 8, 353.
42. He, J.; Zhou, L.; Li, C.; Li, T.; Huang, J.; Su, S. Control Strategy of Hydraulic Servo Control Systems Based on the Integration of Soft Actor-Critic and Adaptive Robust Control. IEEE Access 2024, 12, 63629–63643.
43. Teng, W.; Wang, G. Adaptive Optimal Control Based on Critic-Actor Architecture for Hydraulic Support Cylinder System with Asymmetric Output Error Constraints. Eng. Lett. 2025, 33, 3535–3542.
44. Kong, Y.; Wang, Y.; Wang, Y.; Zhu, S.; Zhang, R.; Wang, L. Deep Reinforcement Learning Trajectory Tracking Control for a Six-Degree-of-Freedom Electro-Hydraulic Stewart Parallel Mechanism. Eng 2025, 6, 212.
45. Yuan, X.; Wang, Y.; Zhang, R.; Gao, Q.; Zhou, Z.; Zhou, R.; Yin, F. Reinforcement Learning Control of Hydraulic Servo System Based on TD3 Algorithm. Machines 2022, 10, 1244.
46. Singh Sidhu, H.; Siddhamshetty, P.; Kwon, J.S. Approximate Dynamic Programming Based Control of Proppant Concentration in Hydraulic Fracturing. Mathematics 2018, 6, 132.
47. Rout, R.; Kumawat, A.K. Reinforcement Learning-Based Position Tracking Control for Proportional Directional Control Valve-Based Electro-Hydraulic System. IEEE Access 2025, 13, 159597–159609.
48. Jia, C.; Yu, T.; Song, Z. Robust reinforcement learning with augmented state for leveling control of multi-cylinder hydraulic system. J. Supercomput. 2025, 81, 1.
49. Khater, A.A.; Fekry, M.; El-Bardini, M.; El-Nagar, A.M. Deep reinforcement learning-based adaptive fuzzy control for electro-hydraulic servo system. Neural Comput. Appl. 2025, 37, 24607–24624.
50. Hao, X.; Xin, Z.; Huang, W.; Wan, S.; Qiu, G.; Wang, T.; Wang, Z. Deep reinforcement learning enhanced PID control for hydraulic servo systems in injection molding machines. Sci. Rep. 2025, 15, 1.
51. Hu, P.; Wen, T.; Zhang, D. Bayesian reinforcement learning for adaptive control of energy recuperation in hydraulic excavator arms. Sci. Rep. 2026, 16, 6195.
52. Rexroth Bosch Group. Pressure and Flow Control System; Technical Data Sheet, RE 30630; Rexroth Bosch Group: Lohr am Main, Germany, 2015.
53. Rexroth Bosch Group. Proportional Directional Valves, Direct Operated, with Electrical Position Feedback as Pilot Control Valve for Control Systems SY(H)DFE; Technical Data Sheet, RE 29016; Rexroth Bosch Group: Lohr am Main, Germany, 2019.
54. Kordak, R. Hydrostatic Drives with Control of the Secondary Unit; The Hydraulic Trainer Vol.6; Mannesmann Rexroth GmbH: Lohr am Main, Germany, 1996.
55. Danfoss. Plus+1 Controllers MC012-020 and 022; Data Sheet, 11077167, Rev DA; Danfoss: Nordborg, Denmark, 2013.

Figure 1. Hydraulic circuit diagram of the laboratory test bench.

Figure 2. Laboratory test setup.

Figure 3. Centered input–output data used for discrete-time model estimation.

Figure 4. Comparison between measured outputs and simulated model outputs.

Figure 5. Auto-correlation and cross-correlation tests of residual errors.

Figure 6. Block scheme of an adaptive actor–critic control system.

Figure 7. Experimental results for flow rate control at loading condition 2.

Figure 8. Experimental results for flow rate control at loading condition 3.

Figure 9. Experimental results for flow rate control at loading condition 4.

Figure 10. Experimental results for flow rate control at loading condition 5.

Figure 11. Experimental results for the control signal of the actor–critic adaptive system.

Figure 12. Experimental results for the control signal of the PI system.

Figure 13. Experimental results for the pump pressure of the actor–critic adaptive system.

Figure 14. Experimental results for the pump pressure of the PI system.

Figure 15. Experimental results for flow rate control at load variations.

Figure 16. Experimental results for pump output pressure at load variations.

Figure 17. Experimental results for the control signal at load variations.

Figure 18. Real-time approximation \tilde{J} of cost function J.


Figure 19. Real-time computation of the Bellman error.

Table 1. Actor–critic system parameters.

Parameter	Value
$V_{K_{a}}(0)$	$\begin{bmatrix} 0.001 & 0.002 \\ 0.003 & 0.004 \\ 0.001 & 0.002 \end{bmatrix}$
Learning rate of the actor hidden layer	$10^{-7}$
$w_{K_{a}}(0)$	$\begin{bmatrix} 400 & 600 \end{bmatrix}$
Learning rate of the actor output layer	$0.001$
$V_{K_{c}}(0)$	$\begin{bmatrix} 0.001 & 0.002 \\ 0.002 & 0.003 \\ 0.001 & 0.002 \\ 0.001 & 0.002 \end{bmatrix}$
Learning rate of the critic hidden layer	$0.0001$
$w_{K_{c}}(0)$	$\begin{bmatrix} 1 & 2 \end{bmatrix}$
Learning rate of the critic output layer	$0.0001$
$k_{p}$	0.999
$k_{p_{c}}$	1

Table 2. Performance index $err$ (Equation (39)) of control systems with various controllers.

Controller	Loading 2	Loading 3	Loading 4	Loading 5
LMRAC	1354.3	1442.3	1119.1	791.7
PI	1668.1	2620.3	2522.4	1397.8
H∞	1202.5	1164.2	943.8	701.3
AC	914.7	1229.7	831.7	605.3
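The relative improvements implied by Table 2 can be computed directly. The short script below does so for AC against each of the other controllers, using only the values in the table; the dictionary and function names are illustrative.

```python
# Performance index err from Table 2, ordered as loadings 2, 3, 4, 5.
TABLE2 = {
    "LMRAC": [1354.3, 1442.3, 1119.1, 791.7],
    "PI": [1668.1, 2620.3, 2522.4, 1397.8],
    "Hinf": [1202.5, 1164.2, 943.8, 701.3],
    "AC": [914.7, 1229.7, 831.7, 605.3],
}
LOADINGS = [2, 3, 4, 5]


def relative_reduction(baseline, candidate="AC", table=TABLE2):
    """Percentage reduction of err by `candidate` relative to `baseline` (negative = worse)."""
    return [100.0 * (b - c) / b for b, c in zip(table[baseline], table[candidate])]


for name in ("PI", "LMRAC", "Hinf"):
    cells = ", ".join(
        f"loading {ld}: {v:+.1f} %" for ld, v in zip(LOADINGS, relative_reduction(name))
    )
    print(f"AC vs {name}: {cells}")
```

The single negative entry, for Hinf under loading 3, corresponds to the case discussed in the text where the H∞ controller attains a slightly lower index than AC.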

Table 3. Settling time $t_{st}$ (s) of control systems with various controllers for increasing and decreasing reference steps.

Controller	Loading 2, increasing	Loading 3, increasing	Loading 4, increasing	Loading 5, increasing	Loading 2, decreasing	Loading 3, decreasing	Loading 4, decreasing	Loading 5, decreasing
LMRAC	0.9	0.4	0.3	0.4	0.9	0.5	0.4	0.4
PI	2.5	1.5	1.3	1.5	1.5	1.7	1.3	1.7
H∞	1	0.8	1	1	1	0.7	1	1.4
AC	0.5	0.3	0.3	0.4	0.5	0.5	0.3	0.3


Share and Cite

MDPI and ACS Style

Kralev, J.; Mitov, A.; Slavov, T. Approximated Adaptive Dynamic Programming Control of Axial-Piston Pump. Mathematics 2026, 14, 1127. https://doi.org/10.3390/math14071127

