Article

Real-Time Adaptive Linear Quadratic Regulator Control for the QUBE–2 Rotary Inverted Pendulum

by Cynthia Lopez-Jordan and Mohammad Jafari *
Robotics Engineering, Columbus State University, Columbus, GA 31907, USA
* Author to whom correspondence should be addressed.
Math. Comput. Appl. 2026, 31(2), 33; https://doi.org/10.3390/mca31020033
Submission received: 9 January 2026 / Revised: 14 February 2026 / Accepted: 24 February 2026 / Published: 27 February 2026

Abstract

This paper presents a real-time adaptive Linear Quadratic Regulator (LQR) control strategy for the rotary inverted pendulum. The state weighting matrix of the LQR cost function is continuously adapted online based on real-time tracking error, state dynamics, and sliding-mode-inspired robustness measures. Unlike conventional LQR controllers with fixed weighting matrices or hybrid schemes that apply sliding mode control directly to the control input, the proposed approach modulates the LQR cost function itself, enabling dynamic reshaping of controller behavior while preserving smooth control action. The real-time adaptive controller is implemented using a continuous-time Riccati differential equation solved online, making the method suitable for real-time deployment. Experimental validation is conducted on two Quanser QUBE-Servo 2 rotary inverted pendulum platforms under square, sinusoidal, and sawtooth reference trajectories. Performance is compared against a fixed-gain LQR controller using multiple quantitative metrics, including tracking error and control effort. Experimental results demonstrate substantial improvements in tracking accuracy, with reductions exceeding 70–90% in error metrics, while simultaneously achieving over 94% reduction in control effort. These findings verify that adaptive cost shaping provides an effective and practical mechanism for enhancing LQR performance in underactuated experimental systems.

1. Introduction

The rotary inverted pendulum is a canonical benchmark for nonlinear, underactuated, and unstable control systems. It is widely used to evaluate advanced control strategies in both simulation and experimental environments. Among the various control approaches proposed in the literature, the Linear Quadratic Regulator (LQR) remains one of the most popular due to its systematic design procedure, guaranteed optimality for linear systems, and ease of implementation in real-time applications [1,2,3]. Commercial platforms such as the Quanser QUBE-Servo 2 further promote the use of LQR as a baseline controller for experimental validation [4].
Despite its advantages, conventional LQR suffers from an inherent limitation where the state and control weighting matrices are fixed and must be carefully tuned offline. In practical implementations, particularly for underactuated systems such as the rotary inverted pendulum, fixed weighting matrices often lead to performance degradation under disturbances, modeling uncertainties, reference changes, or actuator saturation. As a result, significant research effort has been devoted to improving LQR robustness through adaptive, robust, and hybrid control strategies.
More recently, adaptive and data-driven optimal control methods have extended the LQR framework by updating control policies directly from online data. Policy gradient-based adaptive LQR approaches have been proposed in which feedback gains are refined online using gradient descent on the quadratic cost function, with theoretical guarantees on stability and convergence [5,6,7]. Related data-driven and learning-based formulations, including adaptive dynamic programming and reinforcement learning-assisted LQR controllers, have demonstrated the ability to stabilize inverted pendulum systems without requiring precise system models [8]. These approaches reflect a broader shift toward online performance adaptation in optimal control, particularly for underactuated and uncertain systems.
Adaptive LQR techniques aim to overcome these limitations by adjusting controller parameters online to cope with system uncertainties and time-varying dynamics. A substantial body of work has explored adaptive and hybrid optimal control formulations for underactuated mechanical systems, including inverted pendulums. One stream of research focuses on adaptive optimal control and online adjustment of LQR gains based on observed data. For example, adaptive LQR formulations have been proposed for continuous-time systems with uncertain dynamics, where feedback gains are updated online to maintain near-optimal performance despite model uncertainty [9]. Related reinforcement learning-based LQR methods have also been developed to adapt control gains without explicit system models, demonstrating fast stabilization and online parameter adaptation for pendulum balancing tasks [10]. In parallel, sliding mode control (SMC) has been extensively studied as a robust control method capable of rejecting disturbances and uncertainties through discontinuous or nonlinear feedback mechanisms.
However, existing approaches typically treat LQR and SMC as separate paradigms or combine them in a fixed or offline-optimized manner. Hybrid controllers that integrate robust or nonlinear preprocessing into LQR structures, such as fuzzy-LQR schemes that modulate error terms via nonlinear functions [11] or hybrid adaptive–optimal controllers designed to improve robustness under parameter variations [12], demonstrate enhanced transient response and disturbance rejection. Similarly, SMC-based methods often incorporate adaptive laws or fuzzy approximators to reduce chattering and enhance robustness [13]. Integral and robustified versions of LQR and sliding mode concepts have also been applied to inverted pendulum systems to enhance tracking and steady-state performance [14].
More recently, policy gradient and direct data-driven adaptive control methods have been proposed for the LQR problem itself, where the feedback policy is updated adaptively using online data while preserving stability and convergence guarantees [6]. Despite these advances, dynamically adapting the LQR cost function—particularly the state weighting matrix—based on real-time error dynamics and robustness measures remains largely unexplored, especially in experimental settings. This work addresses this gap by proposing a real-time adaptive LQR framework in which the state weighting matrix is continuously modified online using error-driven and sliding-mode-inspired modulation laws. The proposed method is experimentally validated on two Quanser QUBE-Servo 2 rotary inverted pendulum platforms and compared against a conventional fixed-gain LQR controller, demonstrating improvements in tracking performance, robustness, and control effort.
Unlike conventional adaptive or gain-scheduled LQR approaches that update feedback gains or system parameters directly, the proposed method adapts the optimal control objective itself by dynamically shaping the LQR state weighting matrix in real-time. The controller gains are not heuristically modified or switched; instead, they emerge continuously from the online integration of a Riccati differential equation under a time-varying cost function. Sliding-mode concepts are incorporated at the level of cost modulation rather than through discontinuous control injection, preserving smooth control action while enhancing robustness. The primary goal of this work is to experimentally validate adaptive cost shaping within the LQR framework, rather than to benchmark against fundamentally different nonlinear control paradigms.

1.1. Related Work

1.1.1. Classical and Adaptive LQR Control

The Linear Quadratic Regulator has been extensively studied and applied to inverted pendulum systems due to its optimality and structured design methodology [1,2,15]. Several works have investigated the application of fixed-gain LQR controllers to (rotary) inverted pendulum platforms, often serving as a baseline for comparison [12,16]. To address uncertainties and time-varying dynamics, adaptive LQR methods have been proposed. Early approaches include self-tuning regulators and adaptive Riccati-based controllers, while more recent studies explore reinforcement learning and adaptive dynamic programming frameworks [6,9,17]. Complementing these developments, adaptive optimal control formulations and hybrid adaptive-optimal strategies have been widely studied for underactuated systems, where online gain adaptation or data-driven policy updates enable improved robustness under model uncertainty [10,12,18]. These methods primarily focus on adapting the feedback gain or the system model, rather than modifying the LQR cost function directly.
Recent theoretical developments further advance adaptive LQR beyond classical self-tuning regulators. New adaptive LQR frameworks explicitly address practical challenges such as the requirement of an initial stabilizing controller, safe online learning, and computational efficiency, while providing convergence and performance guarantees [6,7,19,20]. These methods primarily adapt the feedback policy or gain structure online and are formulated at the level of control law optimization, rather than through modification of the underlying quadratic cost function.

1.1.2. Online Weight Tuning and Fuzzy LQR Approaches

Several researchers have explored online tuning of LQR weighting matrices using heuristic or intelligent methods. Fuzzy logic-based LQR controllers dynamically adjust weighting parameters based on error magnitude and system states [11,21]. Such approaches can be viewed as part of a broader class of nonlinear or hybrid preprocessing techniques that enhance LQR robustness by modulating error terms or incorporating adaptive fuzzy components [11,12]. While these methods demonstrate improved performance over fixed-gain LQR, they typically rely on rule-based or fuzzy inference systems and do not incorporate robustness measures inspired by sliding mode control. Beyond rule-based fuzzy tuning, several studies in optimal and learning-based control have investigated adaptive or time-varying cost representations, often motivated by inverse optimal control or performance-driven learning objectives. These approaches infer or adjust cost parameters indirectly through learning mechanisms or optimization procedures, but they are typically not implemented in real-time, nor are they integrated with continuous Riccati-based LQR synthesis in experimental settings. Consequently, the adaptation of LQR cost weights remains largely heuristic or offline in most reported applications. Offline optimization of LQR weighting matrices using metaheuristic algorithms such as genetic algorithms and particle swarm optimization has also been reported [22,23]. However, these approaches do not provide proper online adaptation and are unsuitable for handling rapidly changing operating conditions.

1.1.3. Sliding Mode and Hybrid Optimal Control

Sliding mode control has been widely adopted for inverted pendulum systems due to its robustness against disturbances and modeling uncertainties [24,25]. Numerous hybrid control strategies combining LQR and SMC have been proposed, often employing fixed-gain LQR for nominal performance and SMC for robustness enhancement. Variants such as optimized, fuzzy-assisted, or integral sliding mode controllers have been developed to reduce chattering and improve transient behavior [13,14]. Nevertheless, in these studies, the LQR weighting matrices remain fixed, and sliding mode action is applied directly to the control law rather than to the cost function itself. In existing hybrid LQR–SMC formulations, robustness is typically introduced through additional discontinuous control terms or switching mechanisms applied directly to the control input. While effective for disturbance rejection, such designs often increase control effort and may introduce chattering. Importantly, these methods do not reinterpret robustness within the optimal control objective itself, as the LQR cost function and its weighting matrices remain fixed.
Recent research on underactuated system control has explored a wide range of adaptive and nonlinear strategies, including adaptive optimal control, gain-scheduled LQR formulations, and hybrid robust control architectures. While nonlinear adaptive controllers often modify the control law directly or rely on model uncertainty compensation, the present work focuses on a complementary but distinct direction: online adaptation of the LQR cost function within a linear-optimal control framework. By shaping the performance objective rather than the plant model or feedback structure, the proposed method preserves the interpretability, smoothness, and implementation advantages of classical LQR while enabling real-time performance adaptation.

1.1.4. Research Gap

Based on the existing literature, adaptive and hybrid control strategies for inverted pendulum systems predominantly focus on online adaptation of feedback gains, system parameters, or direct augmentation of the control law. Although recent adaptive optimal control and data-driven policy gradient methods update LQR policies online, they do not modify the quadratic cost structure itself. To the best of the authors’ knowledge, no experimental study has reported real-time adaptation of the LQR state weighting matrix using robustness-inspired error modulation while simultaneously solving the Riccati equation online.

1.2. Contributions

The main contributions of this paper are summarized as follows:
  • A real-time adaptive LQR framework is proposed in which the state weighting matrix of the LQR cost function is continuously modified online based on real-time tracking error, state dynamics, integral error accumulation, and reference variation.
  • A sliding-mode-inspired modulation mechanism is introduced to enhance robustness, where a nonlinear sliding variable influences the adaptation of the LQR weighting matrix rather than directly injecting discontinuous control action.
  • A continuous-time Riccati differential equation is implemented and solved online, enabling real-time computation of adaptive LQR gains without relying on built-in LQR solvers, making the approach suitable for real-time embedded implementation.
  • The proposed controller is experimentally validated on two Quanser QUBE-Servo 2 rotary inverted pendulum platforms and systematically compared with a conventional fixed-gain LQR controller.
  • Experimental results demonstrate significant improvements in tracking performance, disturbance rejection, robustness, and reduction in control effort, confirming the effectiveness of the proposed adaptive weighting strategy.

1.3. Paper Organization

The remainder of this paper is organized as follows. Section 2 presents the mathematical formulation, including the system model of the rotary inverted pendulum and the proposed real-time adaptive LQR control framework. Section 3 discusses stability and boundedness of the resulting closed-loop system. Section 4 details the experimental setup and results. Section 5 provides a discussion of the findings, and conclusions and future research directions are given in Section 6.

2. Mathematical Formulation

2.1. System Model

The QUBE-Servo 2 platform is a rotary servo system designed for research and education in control and robotics systems. It features a DC motor equipped with an encoder for precise measurement of angular position and velocity. Additionally, the QUBE-Servo 2 platform includes interchangeable attachments, such as the rotary pendulum illustrated in Figure 1A,B, enabling users to explore a variety of control methodologies. To enhance the understanding of the dynamics associated with the rotary pendulum, the free-body diagram in Figure 1C outlines the relevant parameters.
The nonlinear equations of motion (EOM) for the rotary pendulum, specifically pertaining to the QUBE-Servo 2 platform, have been derived utilizing the Euler–Lagrange method. This approach facilitates a comprehensive analysis of the system’s dynamics, as detailed in the referenced literature [26].
$$\left( J_r + J_p \sin^2\alpha \right)\ddot{\theta} + m_p r l \cos\alpha\, \ddot{\alpha} + 2 J_p \sin\alpha \cos\alpha\, \dot{\theta}\dot{\alpha} - m_p r l \sin\alpha\, \dot{\alpha}^2 = \tau - b_r \dot{\theta}$$
$$J_p \ddot{\alpha} + m_p r l \cos\alpha\, \ddot{\theta} - J_p \sin\alpha \cos\alpha\, \dot{\theta}^2 + m_p g l \sin\alpha = -b_p \dot{\alpha}$$
To obtain the linearized equations of motion (EOM), the nonlinear equations are first linearized around the operating point and then solved for the angular accelerations, which yields a tractable description of the system dynamics under small perturbations.
$$\ddot{\theta} = \frac{1}{J_t}\left( m_p^2 l^2 r g\, \alpha - J_p b_r \dot{\theta} + m_p l r b_p \dot{\alpha} + J_p \tau \right)$$
$$\ddot{\alpha} = \frac{1}{J_t}\left( m_p g l J_r\, \alpha + m_p l r b_r \dot{\theta} - J_r b_p \dot{\alpha} - m_p r l \tau \right)$$
$$J_t = J_p J_r - m_p^2 l^2 r^2$$
$$\tau = \frac{k_t}{R_m}\left( v_m - k_m \dot{\theta} \right)$$
The rotary inverted pendulum can be represented in continuous-time state-space form as
$$\dot{x}(t) = A x(t) + B u(t),$$
where
$$x(t) = \begin{bmatrix} \theta(t) & \alpha(t) & \dot{\theta}(t) & \dot{\alpha}(t) \end{bmatrix}^{\top}$$
denotes the rotary arm angle, pendulum angle, and their respective angular velocities, and $u(t)$ is the motor voltage input. The matrices $A \in \mathbb{R}^{4 \times 4}$ and $B \in \mathbb{R}^{4 \times 1}$ are obtained from the linearized dynamics of the QUBE-Servo 2 platform about the upright equilibrium configuration, corresponding to $\alpha = 0$ and zero angular velocities, which is the standard operating point for rotary inverted pendulum stabilization and tracking. All the parameters with their nominal values are listed in Table 1.
$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3 \\ \dot{x}_4 \end{bmatrix}
= \frac{1}{J_t}
\begin{bmatrix}
0 & 0 & J_t & 0 \\
0 & 0 & 0 & J_t \\
0 & m_p^2 l^2 r g & -J_p\!\left( b_r + \dfrac{k_t k_m}{R_m} \right) & m_p l r b_p \\
0 & m_p g l J_r & m_p l r \!\left( b_r + \dfrac{k_t k_m}{R_m} \right) & -J_r b_p
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
+ \frac{k_t}{R_m J_t}
\begin{bmatrix} 0 \\ 0 \\ J_p \\ -m_p r l \end{bmatrix} v_m$$
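For illustration, the linearized model above can be assembled directly from the physical parameters of Table 1. The following Python sketch is not the authors' MATLAB/Simulink implementation (available in the cited repository); the function name and symbolic parameter arguments are assumptions that mirror the symbols used above, and the Table 1 values are not reproduced here.

```python
import numpy as np

def qube_linear_model(mp, l, r, Jr, Jp, br, bp, kt, km, Rm, g=9.81):
    """Linearized state-space matrices (A, B) of the rotary inverted pendulum
    about the upright equilibrium, following the structure of the equations
    above. States: [theta, alpha, theta_dot, alpha_dot]; input: motor voltage v_m."""
    Jt = Jp * Jr - (mp * l * r) ** 2          # J_t = J_p*J_r - m_p^2*l^2*r^2
    beff = br + kt * km / Rm                  # arm friction plus back-EMF damping
    A = np.array([
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [0.0, mp**2 * l**2 * r * g / Jt, -Jp * beff / Jt,       mp * l * r * bp / Jt],
        [0.0, mp * g * l * Jr / Jt,       mp * l * r * beff / Jt, -Jr * bp / Jt],
    ])
    B = (kt / (Rm * Jt)) * np.array([[0.0], [0.0], [Jp], [-mp * r * l]])
    return A, B
```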

2.2. Conventional Continuous-Time LQR

For a linear system of the form (7), the standard continuous-time LQR problem seeks to minimize the quadratic cost
$$J = \int_0^{\infty} \left[ x(t)^{\top} Q\, x(t) + u(t)^{\top} R\, u(t) \right] dt,$$
where $Q \succeq 0$ and $R \succ 0$ are constant weighting matrices.
The optimal state-feedback control law is given by
$$u(t) = -K x(t), \qquad K = R^{-1} B^{\top} P,$$
where $P \succeq 0$ satisfies the algebraic Riccati equation (ARE)
$$A^{\top} P + P A - P B R^{-1} B^{\top} P + Q = 0.$$
In practical implementations, the fixed choice of Q and R significantly influences performance and robustness.
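As a reference point for the fixed-gain baseline, the ARE can be solved offline with standard tools. The sketch below uses SciPy and illustrates the conventional procedure rather than the authors' implementation; the example weights in the comment are placeholders, not the values used in the experiments.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def lqr_gain(A, B, Q, R):
    """Solve the algebraic Riccati equation and return the fixed LQR gain
    K = R^{-1} B^T P for the state-feedback law u = -K x."""
    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)
    return K, P

# Example usage with illustrative (placeholder) weights:
# A, B = qube_linear_model(...)
# Q = np.diag([5.0, 1.0, 0.1, 0.1]); R = np.array([[1.0]])
# K, P = lqr_gain(A, B, Q, R)
```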

2.3. Adaptive LQR Cost Formulation

To enhance robustness and tracking performance under varying operating conditions, the LQR cost function is modified by allowing the state weighting matrix Q to vary online as a function of real-time system errors and state dynamics:
$$J = \int_0^{\infty} \left[ x(t)^{\top} Q(t)\, x(t) + u(t)^{\top} R\, u(t) \right] dt.$$
In this work, adaptation is applied to the dominant state associated with the rotary arm angle $\theta$, while the remaining diagonal entries of $Q$ remain constant:
$$Q(t) = \begin{bmatrix} q_{11}(t) & 0 & 0 & 0 \\ 0 & q_{22} & 0 & 0 \\ 0 & 0 & q_{33} & 0 \\ 0 & 0 & 0 & q_{44} \end{bmatrix}.$$
In this formulation, adaptation is intentionally restricted to the dominant state associated with the rotary arm angle to limit complexity and preserve interpretability. The remaining weighting terms are held constant to maintain a consistent penalty structure for the remaining states while allowing targeted adaptation where tracking performance is most sensitive.

2.4. Error-Based Adaptive Weighting Law

Let the tracking error be defined as
$$e(t) = \theta_{\mathrm{ref}}(t) - \theta(t),$$
with a filtered derivative (velocity) term, updated recursively at each sampling instant as
$$\dot{\theta}_{\mathrm{filt}}(t) \leftarrow \beta\, \dot{\theta}_{\mathrm{filt}}(t) + (1 - \beta)\, \dot{\theta}(t), \qquad 0 < \beta < 1,$$
and with an exponentially weighted integral term
$$\dot{e}_{\mathrm{int}}(t) = -\lambda\, e_{\mathrm{int}}(t) + e(t), \qquad 0 < \lambda < 1.$$
The adaptive weighting term $q_{11}(t)$ is constructed as
$$q_{11}(t) = q_{11}(0) + \Delta q_{\mathrm{pos}}(t) + \Delta q_{\mathrm{vel}}(t) + \Delta q_{\mathrm{int}}(t) + \Delta q_{\mathrm{ref}}(t),$$
where
$$\Delta q_{\mathrm{pos}}(t) = k_p\, |e(t)|^{\eta_p},$$
$$\Delta q_{\mathrm{vel}}(t) = k_v\, |\dot{\theta}_{\mathrm{filt}}(t)|^{\eta_v},$$
$$\Delta q_{\mathrm{int}}(t) = k_i\, |e_{\mathrm{int}}(t)|^{\eta_i},$$
$$\Delta q_{\mathrm{ref}}(t) = k_r\, |\dot{\theta}_{\mathrm{ref}}(t)|^{\eta_r}.$$
Here, $k_p, k_v, k_i, k_r > 0$ are gains and the exponents $\eta_p, \eta_v, \eta_i, \eta_r \in (0, 1]$ provide smooth nonlinear scaling while preventing excessive amplification under large transients. Each adaptive term serves a distinct purpose: the position-related term increases the state penalty during large tracking errors, the velocity-related term improves damping during fast transients, the integral-related term addresses persistent bias while avoiding windup through exponential forgetting, and the reference-rate term anticipates rapid command changes. These terms are combined to provide complementary adaptation mechanisms rather than precise trajectory-specific tuning. Each individual contribution to $q_{11}(t)$, including the position, velocity, integral, and reference-rate terms, is explicitly bounded, which keeps each component within a feasible range and preserves smooth, numerically stable adaptation of the LQR cost function.
Finally, the total adaptive weight $q_{11}(t)$ is constrained, guaranteeing positive definiteness and boundedness of the LQR cost function.
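To make the update concrete, the following Python sketch evaluates the error-based law once per sampling period. The function and parameter names, the dictionary layout, and the discrete-time realization of the filter and the forgetting integral are illustrative assumptions; the authors' tuned values and Simulink realization are in the cited repository.

```python
def adaptive_q11(q11_base, e, theta_dot, theta_dot_ref, state, p, Ts):
    """One sampling-period evaluation of q11(t) = q11(0) + dq_pos + dq_vel
    + dq_int + dq_ref. `state` carries the filtered velocity and the
    exponentially forgotten error integral between calls."""
    # first-order low-pass filter of the measured arm velocity
    state["th_dot_filt"] = p["beta"] * state["th_dot_filt"] + (1 - p["beta"]) * theta_dot
    # exponentially forgotten integral: de_int/dt = -lambda*e_int + e (Euler step)
    state["e_int"] += Ts * (-p["lam"] * state["e_int"] + e)

    dq_pos = p["kp"] * abs(e) ** p["eta_p"]                     # position term
    dq_vel = p["kv"] * abs(state["th_dot_filt"]) ** p["eta_v"]  # velocity term
    dq_int = p["ki"] * abs(state["e_int"]) ** p["eta_i"]        # integral term
    dq_ref = p["kr"] * abs(theta_dot_ref) ** p["eta_r"]         # reference-rate term
    return q11_base + dq_pos + dq_vel + dq_int + dq_ref
```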

2.5. Sliding-Mode-Inspired Modulation

To further enhance robustness against disturbances and modeling uncertainties, a sliding variable is defined as
$$s(t) = e(t) + \gamma\, \dot{\theta}_{\mathrm{filt}}(t),$$
where $\gamma > 0$ is a design constant.
A smooth nonlinear modulation function is introduced:
$$\sigma(t) = \tanh\!\left( \frac{s(t)}{\delta(t)} \right),$$
with an adaptive boundary layer thickness
$$\delta(t) = \delta_0 + \delta_1\, |e(t)|,$$
where $\delta_0 > 0$ is a small positive constant that prevents excessive sensitivity near the equilibrium, and $\delta_1 > 0$ scales the boundary layer proportionally to the tracking error magnitude to provide smoother adaptation during large deviations.
The final adaptive weight is obtained by sequentially updating the previously computed adaptive weight:
$$q_{11}(t) \leftarrow q_{11}(t)\left[\, 1 + \kappa\, |\sigma(t)| \,\right],$$
where $\kappa > 0$ controls the influence of the sliding-mode-inspired modulation. Equation (26) represents a sequential update: the previously computed adaptive weight $q_{11}(t)$ is rescaled by the sliding-mode-inspired modulation factor. This explicit notation clarifies that the update is applied after the base adaptive computation rather than being an implicit equation.
To ensure numerical stability and prevent excessive feedback gains, the adaptive weight is explicitly clamped:
$$q_{11}(t) = \min\!\left\{ \max\!\left\{ q_{11}^{\min},\; q_{11}(t) \right\},\; q_{11}^{\max} \right\},$$
where $q_{11}^{\min}$ and $q_{11}^{\max}$ are predefined bounds. This ensures that all contributions—including position, velocity, integral, reference-rate, and sliding-mode-inspired modulation—remain bounded.
This mechanism increases the LQR penalty on the dominant state during large deviations while preserving smooth control action.
It is emphasized that this mechanism does not constitute classical sliding mode control. No invariance or discontinuous switching action is enforced. The sliding variable is used solely as a robustness-motivated metric to modulate the LQR cost function smoothly, and the term “sliding-mode-inspired” is adopted in this limited sense.
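A minimal sketch of the modulation and clamping steps is given below; the function and parameter names are illustrative, and any numerical bounds are placeholders rather than the authors' tuned values.

```python
import math

def modulate_and_clamp_q11(q11, e, th_dot_filt, gamma, kappa,
                           delta0, delta1, q11_min, q11_max):
    """Sliding-mode-inspired rescaling of the previously computed adaptive
    weight, followed by clamping to [q11_min, q11_max]."""
    s = e + gamma * th_dot_filt               # sliding variable
    delta = delta0 + delta1 * abs(e)          # adaptive boundary layer
    sigma = math.tanh(s / delta)              # smooth modulation in (-1, 1)
    q11 = q11 * (1.0 + kappa * abs(sigma))    # sequential rescaling update
    return min(max(q11, q11_min), q11_max)    # explicit clamping
```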

2.6. Continuous-Time Riccati Differential Equation

Instead of solving the algebraic Riccati equation offline, the Riccati matrix P ( t ) is computed online by integrating the continuous-time Riccati differential equation:
$$\dot{P}(t) = A^{\top} P(t) + P(t) A - P(t) B R^{-1} B^{\top} P(t) + Q(t).$$
The instantaneous feedback gain is obtained as
$$K(t) = R^{-1} B^{\top} P(t).$$
For numerical robustness, the elements of $P(t)$ and $K(t)$ are constrained within predefined bounds. In particular, the instantaneous feedback gain is clamped element-wise:
$$K(t) = \mathrm{proj}_{[-K_{\max},\, K_{\max}]}\!\left( R^{-1} B^{\top} P(t) \right),$$
where $K_{\max} > 0$ is a predefined limit. This projection ensures that the online LQR gain remains bounded, preventing excessively large control inputs during transient deviations.
The Riccati differential equation is numerically integrated using a fixed-step explicit integration scheme synchronized with the controller sampling period. Projection bounds are applied element-wise to P ( t ) to preserve symmetry, positive semi-definiteness, and numerical robustness. The resulting gain computation is completed well within the sampling interval, ensuring real-time feasibility for the experimental platform used in this study.
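The following sketch illustrates one possible realization of the online integration step described above, assuming a forward-Euler update with symmetrization and illustrative projection bounds; the authors' Simulink realization may differ in its discretization and projection details.

```python
import numpy as np

def riccati_step(P, A, B, Q_t, R, Ts, P_lim=1e4, K_max=50.0):
    """One fixed-step (forward-Euler) integration of the Riccati differential
    equation with the current time-varying Q(t), followed by symmetrization,
    element-wise bounding of P, and element-wise clamping of the gain K.
    The numerical bounds here are illustrative placeholders."""
    Rinv = np.linalg.inv(R)
    Pdot = A.T @ P + P @ A - P @ B @ Rinv @ B.T @ P + Q_t
    P = P + Ts * Pdot
    P = 0.5 * (P + P.T)                  # restore exact symmetry numerically
    P = np.clip(P, -P_lim, P_lim)        # element-wise projection bound
    K = Rinv @ B.T @ P
    K = np.clip(K, -K_max, K_max)        # bounded feedback gain
    return P, K
```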

2.7. Control Law and Saturation

The adaptive LQR control law is defined as
$$u(t) = K(t)\left[ x_{\mathrm{ref}}(t) - x(t) \right],$$
where
$$x_{\mathrm{ref}}(t) = \begin{bmatrix} \theta_{\mathrm{ref}}(t) & 0 & 0 & 0 \end{bmatrix}^{\top}.$$
Finally, actuator constraints are enforced via saturation:
$$u(t) = \mathrm{sat}_{[u_{\min},\, u_{\max}]}\!\left( u(t) \right).$$
The computed control input is explicitly limited according to actuator constraints, where $u_{\min}$ and $u_{\max}$ correspond to the hardware voltage limits. This guarantees bounded actuation and prevents excitation of unmodeled dynamics.
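A compact sketch of the resulting control computation is shown below; the $\pm 10$ V limits are placeholders standing in for the hardware voltage bounds, not values taken from the paper.

```python
import numpy as np

def adaptive_lqr_control(K, x, theta_ref, u_min=-10.0, u_max=10.0):
    """Tracking control law u = K (x_ref - x) followed by actuator saturation."""
    x_ref = np.array([theta_ref, 0.0, 0.0, 0.0])
    u = (K @ (x_ref - x)).item()         # K has shape (1, 4)
    return min(max(u, u_min), u_max)     # saturate to the voltage limits
```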

3. Stability and Boundedness Discussion

The proposed adaptive LQR strategy results in a nonlinear, time-varying closed-loop system due to online adaptation of the LQR cost function and Riccati solution. While a strict global Lyapunov proof is beyond the scope of this work, stability is addressed through explicit boundedness assumptions, smooth adaptation, and practical stability arguments commonly adopted in experimental adaptive control studies. To further strengthen this argument, the resulting closed-loop dynamics are formally interpreted as a bounded linear time-varying (LTV) system, for which classical Lyapunov theory provides sufficient conditions for uniform exponential stability under bounded parameter variation.

3.1. Boundedness of Adaptive Parameters

All adaptive contributions affect only the state weighting matrix $Q(t)$, while the system matrices $A$ and $B$ remain fixed. The dominant adaptive term $q_{11}(t)$ is explicitly constrained within predefined bounds:
$$q_{11}^{\min} \le q_{11}(t) \le q_{11}^{\max},$$
which ensures that the cost function remains well-defined and positive definite at all times.
Similarly, the Riccati matrix $P(t)$ and feedback gain $K(t)$ are integrated online using the continuous-time Riccati differential equation with element-wise projection bounds. These clamping operations, directly implemented via persistent variables and filtered states in the publicly available Simulink implementation (https://github.com/mojafari/LQR_Adaptive), guarantee that $P(t)$ and $K(t)$ remain bounded for all time, even under large transients or abrupt reference changes.

3.2. Properties of the Sliding-Mode-Inspired Modulation

The sliding-mode-inspired modulation term employs a smooth hyperbolic tangent function with an adaptive boundary layer:
$$\sigma(t) = \tanh\!\left( \frac{s(t)}{\delta(t)} \right), \qquad \delta(t) = \delta_0 + \delta_1\, |e(t)|,$$
where $\delta_0, \delta_1 > 0$ are design parameters that prevent excessive sensitivity near equilibrium. This formulation ensures smooth, sequential rescaling of the previously computed adaptive weight $q_{11}(t)$:
$$q_{11}(t) \leftarrow q_{11}(t)\left[\, 1 + \kappa\, |\sigma(t)| \,\right],$$
where $\kappa > 0$ controls the influence of the modulation. The use of $\tanh$ and the adaptive boundary layer guarantees bounded modulation, avoids chattering, and maintains smooth control action.

3.3. Practical Stability of the Closed-Loop System

Under bounded adaptive parameters, clamped Riccati integration, and bounded control input due to actuator saturation, the closed-loop dynamics can be interpreted as a linear time-varying system with bounded coefficients. Standard results for such systems [27,28] ensure that all system states remain bounded and converge to a neighborhood of the equilibrium.
The adaptive weighting mechanism increases the penalty on dominant error states during large deviations, effectively strengthening feedback gains when needed, while smoothly relaxing them near equilibrium. The filtered exponential integral term prevents windup and guarantees bounded contributions from persistent errors, further enhancing robustness under disturbances.

3.4. Role of Actuator Saturation

Control inputs are explicitly constrained by actuator saturation limits consistent with the QUBE-Servo 2 hardware. This guarantees finite actuation, prevents excitation of unmodeled dynamics, and limits the effect of transient gain increases induced by the adaptive Riccati solution.

3.5. Experimental Validation of Stability

Extensive experimental validation confirms that all closed-loop states, feedback gains, and control inputs remain bounded under a wide range of initial conditions and reference signals. Compared to fixed-gain LQR, the proposed adaptive strategy improves disturbance rejection, reduces settling time, and avoids oscillatory or unstable behavior. These results support the claim of practical stability and robustness for real-time experimental operation.

3.6. Enhanced Stability Arguments

In summary, the combination of bounded adaptive contributions, smooth sliding-mode-inspired modulation, clamped Riccati integration, filtered integral terms, and actuator saturation ensures that the closed-loop system is practically stable. All adaptive terms are continuous and clamped, the sliding-mode-inspired modulation prevents high-frequency gain variation near equilibrium, and the online Riccati integration guarantees bounded feedback gains. Consequently, the closed-loop dynamics are a bounded linear time-varying system, for which all states remain bounded and converge to a neighborhood of the equilibrium [27,28]. These properties are corroborated by the experimental results presented in this work.

3.7. Lyapunov Analysis for the Resulting Linear Time-Varying System

The proposed adaptive LQR scheme results in a closed-loop system of the form
$$\dot{x}(t) = A_{\mathrm{cl}}(t)\, x(t),$$
where
$$A_{\mathrm{cl}}(t) = A - B K(t),$$
and $K(t) = R^{-1} B^{\top} P(t)$ is obtained from the Riccati differential equation with bounded time-varying $Q(t)$.
Since the adaptation affects only the cost matrix Q ( t ) while A and B remain constant, the closed-loop system is a linear time-varying (LTV) system with bounded coefficients.

3.7.1. Boundedness and Positive Definiteness

By construction,
$$q_{11}^{\min} \le q_{11}(t) \le q_{11}^{\max},$$
and all remaining entries of $Q(t)$ are constant positive values. Therefore, there exist constants $q_{\min}, q_{\max} > 0$ such that
$$q_{\min} I \preceq Q(t) \preceq q_{\max} I, \qquad \forall t \ge 0.$$
Under standard controllability assumptions on $(A, B)$ and with $R \succ 0$, the Riccati differential equation
$$\dot{P} = A^{\top} P + P A - P B R^{-1} B^{\top} P + Q(t)$$
with bounded positive definite $Q(t)$ admits a unique symmetric positive definite solution $P(t)$ that remains bounded for all $t \ge 0$ (see [28]).
Furthermore, due to projection-based clamping in the numerical implementation,
$$0 \prec P_{\min} \preceq P(t) \preceq P_{\max}, \qquad \forall t,$$
which implies bounded feedback gains
$$\| K(t) \| \le K_{\max}.$$

3.7.2. Lyapunov Function Candidate

Consider the quadratic function
$$V(x, t) = x^{\top} P(t)\, x.$$
Since $P(t)$ is symmetric positive definite and bounded, there exist constants $\alpha_1, \alpha_2 > 0$ such that
$$\alpha_1 \|x\|^2 \le V(x, t) \le \alpha_2 \|x\|^2.$$
Taking the time derivative along system trajectories:
$$\dot{V} = \dot{x}^{\top} P x + x^{\top} \dot{P} x + x^{\top} P \dot{x}$$
$$= x^{\top}\!\left( A_{\mathrm{cl}}^{\top} P + P A_{\mathrm{cl}} + \dot{P} \right) x.$$
Substituting $\dot{P}$ from the Riccati equation and using $A_{\mathrm{cl}} = A - B K$ with $K = R^{-1} B^{\top} P$, we obtain
$$\dot{V} = -x^{\top} Q(t)\, x.$$
Since $Q(t) \succeq q_{\min} I \succ 0$, it follows that
$$\dot{V} \le -q_{\min} \|x\|^2.$$

3.7.3. Uniform Exponential Stability

Using the quadratic bounds on $V(x, t)$ and the inequality above, we obtain
$$\dot{V} \le -\frac{q_{\min}}{\alpha_2}\, V.$$
By standard comparison arguments for time-varying Lyapunov functions (e.g., [28]), this implies
$$V(t) \le V(0)\, e^{-\lambda t}, \qquad \lambda = \frac{q_{\min}}{\alpha_2} > 0,$$
and therefore
$$\|x(t)\| \le c\, e^{-\lambda t / 2}\, \|x(0)\|,$$
for some $c > 0$.
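For completeness, the constant $c$ can be made explicit from the quadratic bounds on $V$:
$$\alpha_1 \|x(t)\|^2 \;\le\; V(x(t), t) \;\le\; V(0)\, e^{-\lambda t} \;\le\; \alpha_2 \|x(0)\|^2 e^{-\lambda t}
\quad \Longrightarrow \quad
\|x(t)\| \;\le\; \sqrt{\alpha_2 / \alpha_1}\; e^{-\lambda t / 2}\, \|x(0)\|,$$
so that $c = \sqrt{\alpha_2 / \alpha_1}$.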
Hence, the closed-loop system is uniformly exponentially stable in the absence of actuator saturation.

3.7.4. Effect of Saturation and Reference Tracking

With bounded actuator saturation and bounded reference signals, the closed-loop system becomes a perturbed LTV system of the form
$$\dot{x} = A_{\mathrm{cl}}(t)\, x + d(t),$$
where $d(t)$ is bounded. Standard input-to-state stability (ISS) results for exponentially stable LTV systems imply that all states remain bounded and converge to a neighborhood proportional to $\sup_{t} \|d(t)\|$.
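To make this explicit, assume the state-transition matrix of $A_{\mathrm{cl}}(t)$ satisfies $\|\Phi(t, s)\| \le c\, e^{-\lambda (t - s)/2}$, as implied by the uniform exponential stability established above; the variation-of-constants formula then gives the standard ISS-type bound
$$\|x(t)\| \;\le\; c\, e^{-\lambda t / 2}\, \|x(0)\| \;+\; \frac{2c}{\lambda}\, \sup_{s \ge 0} \|d(s)\|,$$
so the ultimate bound scales linearly with the disturbance magnitude.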
Therefore, the proposed adaptive LQR controller guarantees:
  • Uniform exponential stability of the origin in the unsaturated case;
  • Practical exponential stability under actuator saturation;
  • Bounded-state behavior under bounded disturbances.

3.8. Discussion on Disturbance Rejection Capability

In aerospace applications, control systems are frequently subjected to exogenous disturbances such as atmospheric turbulence, aerodynamic uncertainties, and sensor noise. The present work primarily focuses on adaptive reference tracking; however, the proposed controller inherently provides disturbance attenuation properties due to the uniform exponential stability of the underlying linear time-varying closed-loop system.
Specifically, under bounded additive disturbances $d(t)$, the closed-loop dynamics can be expressed as
$$\dot{x}(t) = A_{\mathrm{cl}}(t)\, x(t) + d(t),$$
where $A_{\mathrm{cl}}(t)$ is uniformly exponentially stable. Standard input-to-state stability (ISS) results for exponentially stable LTV systems imply that the state remains bounded and ultimately converges to a neighborhood whose size is proportional to $\sup_{t} \|d(t)\|$ [28].
Moreover, the adaptive weighting mechanism increases the dominant state penalty during large deviations, effectively strengthening feedback gains when disturbances induce significant tracking errors. The smooth sliding-mode-inspired modulation further enhances transient robustness without introducing discontinuous control action or chattering.
It is emphasized that the proposed framework does not explicitly incorporate stochastic disturbance models (e.g., Dryden or von Kármán turbulence models commonly used in aerospace applications), nor does it constitute a stochastic optimal control formulation such as Linear–Quadratic–Gaussian (LQG). Rather, disturbance rejection is achieved implicitly through adaptive gain modulation and the exponential stability of the time-varying closed-loop system. Extension toward explicit stochastic disturbance modeling represents a valuable direction for future research.

4. Experimental Results

This section details the experimental validation of the proposed adaptive LQR controller, which was implemented on two Quanser QUBE-Servo 2 platforms [29]. Table 1 lists the pendulum parameters used [29]. The performance of this method is compared to that of a conventional fixed-gain LQR controller [4] across multiple reference trajectories, including square, sinusoidal, and sawtooth signals. The fixed-gain LQR controller was tuned following standard QUBE-Servo 2 design guidelines to achieve stable operation and reasonable tracking performance across all reference types using a single set of weighting matrices. No aggressive high-gain tuning was employed to favor the proposed method. Observed saturation under abrupt references reflects intrinsic limitations of fixed weighting under rapidly changing commands rather than instability.
For the experimental analyses and simulations conducted in this study, MATLAB and Simulink were utilized. The choice of MATLAB was motivated by its access to the Quanser QUARC Library, which is essential for interfacing with the QUBE-Servo 2 platform. The Adaptive LQR code was developed and executed within MATLAB-Simulink R2020b, ensuring compatibility with the necessary drivers and libraries. All simulations and controller implementations were carried out on an HP EliteBook 840 G6 equipped with 16 GB of RAM and an Intel Core i7-1165G7 CPU, providing the computational resources necessary for real-time performance.
All experiments were conducted using identical system models, actuator limits, and sampling rates as detailed in Table 2. Performance evaluation focused on tracking accuracy, control effort, and robustness, employing both quantitative error metrics and qualitative time-domain responses.

4.1. Reference Tracking Performance

Figure 2 and Figure 3 illustrate the experimental time-domain responses for Square-wave reference trajectories. Each figure contains four subplots showing the reference tracking response, control input, normalized tracking error, and the evolution of the adaptive state weighting coefficient $q_{11}(t)$. Figure 4 and Figure 5 show the time evolution of the individual contributions (i.e., position-related term, velocity-related term, integral term, reference-rate term, and sliding-mode-inspired term) to the adaptive LQR weight $q_{11}(t)$ during a representative experiment for Square-wave reference tracking on QUBE-Servo 2 (QUBE1&2). These confirm that all adaptive contributions to $q_{11}(t)$ remain bounded and smoothly varying, providing empirical support for the practical stability arguments presented in Section 3.
For Square-wave references, the adaptive controller exhibits significantly faster transient response and reduced steady-state error compared to the fixed LQR controller on both platforms. The adaptive weighting mechanism increases the state penalty during large reference changes, resulting in more aggressive yet well-controlled corrective action.
Figure 6 and Figure 7 illustrate the experimental time-domain responses for Sine-wave reference trajectories. Each figure contains four subplots showing the reference tracking response, control input, normalized tracking error, and the evolution of the adaptive state weighting coefficient $q_{11}(t)$. Figure 8 and Figure 9 show the time evolution of the individual contributions (i.e., position-related term, velocity-related term, integral term, reference-rate term, and sliding-mode-inspired term) to the adaptive LQR weight $q_{11}(t)$ during a representative experiment for Sine-wave reference tracking on QUBE-Servo 2 (QUBE1&2). These confirm that all adaptive contributions to $q_{11}(t)$ remain bounded and smoothly varying, providing empirical support for the practical stability arguments presented in Section 3.
For sinusoidal references, the adaptive controller consistently achieves lower phase lag and reduced tracking error, demonstrating its ability to adapt smoothly to continuously varying trajectories. Similar improvements are observed for sawtooth references, where the adaptive controller effectively handles sharp transitions without inducing excessive control effort.
Figure 10 and Figure 11 illustrate the experimental time-domain responses for Sawtooth-wave reference trajectories. Each figure contains four subplots showing the reference tracking response, control input, normalized tracking error, and the evolution of the adaptive state weighting coefficient $q_{11}(t)$. Figure 12 and Figure 13 show the time evolution of the individual contributions (i.e., position-related term, velocity-related term, integral term, reference-rate term, and sliding-mode-inspired term) to the adaptive LQR weight $q_{11}(t)$ during a representative experiment for Sawtooth-wave reference tracking on QUBE-Servo 2 (QUBE1&2). These confirm that all adaptive contributions to $q_{11}(t)$ remain bounded and smoothly varying, providing empirical support for the practical stability arguments presented in Section 3.
It is worth noting that brief inverse transients (dips in the opposite direction of motion) are observed during abrupt transitions, particularly around reference reversals. These behaviors are characteristic of the rotary inverted pendulum, which exhibits inherent non-minimum-phase dynamics due to the underactuated coupling between the rotary arm and pendulum angle. Similar transient effects are observed under both fixed and adaptive controllers and are not induced by the proposed adaptive weighting mechanism.

4.2. Quantitative Tracking Metrics

To quantitatively assess performance, the following metrics are computed for each experiment: Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Symmetric Mean Absolute Percentage Error (sMAPE). All metrics are computed over the full 400 s experimental window at 0.002 s sampling time, ensuring a consistent comparison between fixed and adaptive controllers.
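The sketch below shows how these metrics, together with the RMS control effort used in Section 4.3, can be computed from the logged reference, measured arm angle, and applied voltage. The sMAPE variant shown is one common definition, assumed here because the exact formula is not stated in the text.

```python
import numpy as np

def tracking_metrics(theta_ref, theta, u):
    """MSE, RMSE, MAE, sMAPE of the arm-angle tracking error, and the RMS
    of the applied control voltage, computed over the full experiment."""
    theta_ref, theta, u = map(np.asarray, (theta_ref, theta, u))
    err = theta_ref - theta
    mse = np.mean(err**2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
        # symmetric MAPE in percent; the small epsilon guards against division by zero
        "sMAPE": 100.0 * np.mean(2.0 * np.abs(err)
                                 / (np.abs(theta_ref) + np.abs(theta) + 1e-12)),
        "RMS_u": np.sqrt(np.mean(u**2)),
    }
```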
Table 3 summarizes the absolute tracking error metrics for fixed-gain and adaptive LQR controllers.
In all cases, the proposed adaptive method significantly outperforms the fixed LQR controller across all metrics and reference trajectories as listed in Table 4.

4.3. Control Effort and Energy Consumption

To evaluate control efficiency, the root mean square (RMS) value of the control input voltage was computed for each experiment. Table 5 summarizes the results. All RMS control effort metrics are computed after actuator saturation, reflecting the actual voltage applied to the hardware and ensuring a fair energy-based comparison.
The adaptive controller achieves a dramatic reduction in control effort, with RMS reductions exceeding 94% across all test cases (see Table 6). This result highlights the effectiveness of adaptive cost shaping in achieving superior tracking performance while significantly reducing actuator energy consumption.

4.4. Tracking Performance vs. Control Effort Trade-Off

Figure 14 illustrates the trade-off between tracking performance and control effort. The adaptive controller consistently achieves lower tracking error while simultaneously reducing control energy. This behavior contrasts with traditional control strategies, where improved tracking is often achieved at the expense of increased control effort.
The results confirm that adaptive modulation of the LQR cost function provides a principled mechanism for balancing performance and efficiency in real-time experimental systems.

5. Discussion

The experimental results demonstrate the effectiveness of the proposed adaptive LQR framework in improving both tracking performance and control efficiency compared to a conventional fixed-gain LQR controller. Across all tested reference trajectories and hardware platforms, the adaptive controller achieves lower tracking error while significantly reducing actuator effort.
A key observation from the experiments is that performance improvements are not achieved through increased control aggressiveness. On the contrary, the adaptive controller requires substantially less control energy while providing superior tracking. This behavior can be attributed to the dynamic modulation of the LQR cost function, which increases state penalties only when large deviations occur and relaxes them near equilibrium. As quantified in Table 4, MSE reductions exceed 70% across all reference types, confirming the discussion above regarding adaptive cost shaping and efficiency.
The sliding-mode-inspired modulation plays an important role in robustness enhancement. Rather than injecting discontinuous control action, the sliding variable influences the adaptive weighting of the cost function through a smooth nonlinear mapping. This design preserves the advantages of sliding mode concepts—such as sensitivity to error dynamics and disturbance rejection—while avoiding chattering and high-frequency control activity. The smooth evolution of the adaptive weighting matrix is evident in the experimental results.
Although the proposed adaptive law introduces multiple design parameters, these can be grouped into three functional categories: (i) scaling gains that determine the relative influence of error components, (ii) smoothing and forgetting factors that limit the rate of adaptation, and (iii) saturation bounds that enforce safety and robustness. Importantly, none of these parameters affect nominal closed-loop stability, and acceptable performance is obtained over broad parameter ranges without trajectory-specific tuning. In contrast, a fixed-gain LQR typically requires retuning of $Q$ and $R$ when reference characteristics change, which the proposed adaptive framework avoids. Similarly, the other key adaptive parameters, including $\delta_0$, $\delta_1$, $\gamma$, $\lambda$, and the clamping bounds $q_{11}^{\min}$ and $q_{11}^{\max}$, can be selected from broad admissible ranges without trajectory-specific tuning, as their primary role is to scale the adaptive cost function smoothly. Formal parameter sensitivity analysis is identified as an important direction for future work.
Robustness in this work is interpreted in the sense of bounded-input bounded-output behavior under parametric uncertainty, actuator saturation, and unmodeled dynamics. Experimental robustness was evaluated through repeated trials on two nominally identical hardware platforms, operation near actuator limits, and exposure to aggressive reference trajectories. Across all cases, the adaptive controller maintained bounded states, bounded control inputs, and consistent tracking performance without retuning. Formal robustness margins are not claimed; instead, the results demonstrate practical robustness suitable for experimental and real-time control applications.
Another important aspect is the use of an online Riccati differential equation. By continuously updating the Riccati matrix, the controller adapts its feedback gains in a principled optimal-control framework, even as the cost function varies in time.
While the proposed method demonstrates strong empirical performance, it is important to note that the adaptive closed-loop system is nonlinear and time-varying. As such, stability is discussed in terms of boundedness and practical stability rather than strict global Lyapunov guarantees. Nevertheless, extensive experimental testing confirms stable operation, bounded signals, and repeatable performance improvements under a wide range of operating conditions. The transient inverse responses observed during aggressive reference changes are consistent with the non-minimum-phase characteristics of the rotary inverted pendulum system. In particular, rapid arm acceleration induces an initial pendulum motion opposite to the desired direction due to energy coupling and underactuation. The proposed adaptive controller does not eliminate this fundamental limitation but manages its effects without inducing instability or excessive control effort.
Overall, the results suggest that adaptive cost shaping is a powerful yet underexplored mechanism for improving LQR-based control of underactuated systems, particularly in experimental environments where fixed weighting matrices are insufficient.

6. Conclusions

This paper introduced a novel adaptive LQR control strategy for rotary inverted pendulum systems, in which the state weighting matrix of the LQR cost function is continuously modified online using error-based and sliding-mode-inspired adaptation laws. The proposed approach differs fundamentally from conventional adaptive and hybrid controllers by shaping the optimal control objective itself rather than directly modifying the control input or relying on offline tuning.
The controller was implemented using an online continuous-time Riccati differential equation and experimentally validated on two Quanser QUBE-Servo 2 platforms. Comparative experiments against a fixed-gain LQR controller under square, sinusoidal, and sawtooth reference trajectories demonstrate significant improvements in tracking accuracy and robustness, accompanied by dramatic reductions in control effort.
The results confirm that adaptive cost shaping provides an effective and practical means of enhancing LQR performance in underactuated systems without introducing discontinuities or excessive control activity. The proposed framework is compatible with real-time implementation and code generation, making it suitable for embedded and industrial applications. Extensive experimental evaluation confirms that all adaptive contributions remain bounded and smoothly varying, supporting the practical stability of the proposed approach.
Future work will focus on extending the proposed approach to multi-input systems, investigating formal stability conditions for adaptive cost-based LQR formulations, and exploring data-driven methods for automated tuning of adaptation parameters.

Author Contributions

Conceptualization, C.L.-J. and M.J.; methodology, C.L.-J. and M.J.; software, C.L.-J. and M.J.; validation, C.L.-J. and M.J.; formal analysis, C.L.-J. and M.J.; investigation, C.L.-J. and M.J.; resources, C.L.-J. and M.J.; data curation, C.L.-J. and M.J.; writing—original draft preparation, C.L.-J. and M.J.; writing—review and editing, C.L.-J. and M.J.; visualization, C.L.-J. and M.J.; supervision, M.J.; project administration, M.J.; funding acquisition, M.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data/code presented in this study are available in the GitHub repository at https://github.com/mojafari/LQR_Adaptive.

Acknowledgments

The authors wish to acknowledge the support provided by Columbus State University.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bryson, A.E.; Ho, Y.C. Applied Optimal Control: Optimization, Estimation and Control; Routledge: Oxfordshire, UK, 2018. [Google Scholar]
  2. Lewis, F.L.; Vrabie, D.; Syrmos, V.L. Optimal Control; John Wiley & Sons: Hoboken, NJ, USA, 2012. [Google Scholar]
  3. Lavretsky, E.; Wise, K.A. Robust adaptive control. In Robust and Adaptive Control: With Aerospace Applications; Springer: Berlin/Heidelberg, Germany, 2012; pp. 317–353. [Google Scholar]
  4. Quanser Inc. Optimal LQR Balance Control of Pendulum; Quanser Inc.: Markham, ON, USA, 2020. [Google Scholar]
  5. Stamouli, C.; Toso, L.F.; Tsiamis, A.; Pappas, G.J.; Anderson, J. Policy gradient bounds in multitask LQR. IEEE Control. Syst. Lett. 2025, 9, 2495–2500. [Google Scholar] [CrossRef]
  6. Zhao, F.; Chiuso, A.; Dörfler, F. Policy Gradient Adaptive Control for the LQR: Indirect and Direct Approaches. arXiv 2025, arXiv:2505.03706. [Google Scholar] [CrossRef]
  7. Fisher, P.A.; Annaswamy, A.M. Adapt and Stabilize, Then Learn and Optimize: A New Approach to Adaptive LQR. arXiv 2025, arXiv:2512.04565. [Google Scholar] [CrossRef]
  8. Yıldıran, U. Adaptive Control of an Inverted Pendulum by a Reinforcement Learning-based LQR Method. Sak. Univ. J. Sci. 2023, 27, 1311–1321. [Google Scholar] [CrossRef]
  9. Jha, S.K.; Bhasin, S. Adaptive linear quadratic regulator for continuous-time systems with uncertain dynamics. IEEE/CAA J. Autom. Sin. 2019, 7, 833–841. [Google Scholar] [CrossRef]
  10. Hernandez, R.; Garcia-Hernandez, R.; Jurado, F. Modeling, simulation, and control of a rotary inverted pendulum: A reinforcement learning-based control approach. Modelling 2024, 5, 1824–1852. [Google Scholar] [CrossRef]
  11. Saleem, O.; Iqbal, J.; Alharbi, S. Self-Regulating Fuzzy-LQR Control of an Inverted Pendulum System via Adaptive Hyperbolic Error Modulation. Machines 2025, 13, 939. [Google Scholar] [CrossRef]
  12. Gao, G.; Xu, L.; Huang, T.; Zhao, X.; Huang, L. Reduced-Order Observer-Based LQR Controller Design for Rotary Inverted Pendulum. Comput. Model. Eng. Sci. (CMES) 2024, 140, 305. [Google Scholar] [CrossRef]
  13. Nguyen, T.V.A.; Dao, Q.T.; Bui, N.T. Optimized fuzzy logic and sliding mode control for stability and disturbance rejection in rotary inverted pendulum. Sci. Rep. 2024, 14, 31116. [Google Scholar] [CrossRef] [PubMed]
  14. Le, H.D.; Nestorović, T. Integral Linear Quadratic Regulator Sliding Mode Control for Inverted Pendulum Actuated by Stepper Motor. Machines 2025, 13, 405. [Google Scholar] [CrossRef]
  15. Ge, Y.; Purevdorj, C.; Kasai, S.; Wagatsuma, H. An Adaptive Control Method for a Knee-Joint Prosthetic Leg Toward Dynamic Stability and Gait Optimization. In Proceedings of the International Conference on Artificial Life & Robotics (ICAROB2025), Oita, Japan, 13–16 February 2025; ALife Robotics: Oita, Japan; Volume 30, pp. 867–872.
  16. Villacres, J.; Viscaino, M.; Herrera, M.; Camacho, O. Controllers comparison to stabilize a two-wheeled inverted pendulum: PID, LQR and sliding mode control. Int. J. Control. Syst. Robot. 2016, 1, 29–36. [Google Scholar]
  17. Zhao, F.; Dörfler, F.; Chiuso, A.; You, K. Data-enabled policy optimization for direct adaptive learning of the LQR. IEEE Trans. Autom. Control. 2025, 70, 7217–7232. [Google Scholar] [CrossRef]
  18. Carnevale, G.; Mimmo, N.; Notarstefano, G. Data-driven LQR with finite-time experiments via extremum-seeking policy iteration. arXiv 2024, arXiv:2412.02758. [Google Scholar]
  19. Sontag, E.D. Some remarks on gradient dominance and LQR policy optimization. arXiv 2025, arXiv:2507.10452. [Google Scholar] [CrossRef]
  20. Yan, X.; Zhang, W.; Jiao, Y.; Wu, T.; Li, G.; Ma, H.; Ma, H.; Cui, Y.; Lin, Z.; Lin, Z. A Learnable LQR Controller for Uncertain Systems: Hybrid-Driven Recurrent Learning. IEEE Trans. Autom. Sci. Eng. 2025, 23, 1546–1560. [Google Scholar] [CrossRef]
  21. Bekkar, B.; Ferkous, K. Design of online fuzzy tuning LQR controller applied to rotary single inverted pendulum: Experimental validation. Arab. J. Sci. Eng. 2023, 48, 6957–6972. [Google Scholar] [CrossRef]
  22. Zhai, Q.; Xia, X.; Feng, S.; Huang, M. Optimization design of LQR controller based on improved whale optimization algorithm. In Proceedings of the 2020 3rd International Conference on Information and Computer Technologies (ICICT), San Jose, CA, USA, 9–12 March 2020; pp. 380–384. [Google Scholar]
  23. Chacko, S.J.; Neeraj, P.; Abraham, R.J. Optimizing LQR controllers: A comparative study. Results Control. Optim. 2024, 14, 100387. [Google Scholar] [CrossRef]
  24. Furuta, K. Sliding mode control of a discrete system. Syst. Control. Lett. 1990, 14, 145–152. [Google Scholar] [CrossRef]
  25. Vaidyanathan, S.; Lien, C.H. Applications of Sliding Mode Control in Science and Engineering; Springer: Berlin/Heidelberg, Germany, 2017; Volume 709. [Google Scholar]
  26. Quanser, Inc. Pendulum State Space Modeling; Quanser Inc.: Markham, ON, USA, 2020. [Google Scholar]
  27. Bartos, M.; Köhler, J.; Dörfler, F.; Zeilinger, M.N. Stability of Certainty-Equivalent Adaptive LQR for Linear Systems with Unknown Time-Varying Parameters. arXiv 2025, arXiv:2511.08236. [Google Scholar]
  28. Khalil, H.K. Nonlinear Systems; Prentice Hall: Upper Saddle River, NJ, USA, 2002; Volume 3. [Google Scholar]
  29. Quanser Inc. QUBE-Servo 2 User Manual; Quanser Inc.: Markham, ON, USA, 2020. [Google Scholar]
Figure 1. (A) Quanser QUBE-Servo 2; (B) Rotary inverted pendulum attachment; (C) free-body diagram for a rotary inverted pendulum.
Figure 1. (A) Quanser QUBE-Servo 2; (B) Rotary inverted pendulum attachment; (C) free-body diagram for a rotary inverted pendulum.
Mca 31 00033 g001
Figure 2. Square-wave reference tracking on QUBE-Servo 2 (QUBE1). Subplots show tracking response, control input, normalized tracking error, and adaptive weighting q 11 ( t ) .
Figure 2. Square-wave reference tracking on QUBE-Servo 2 (QUBE1). Subplots show tracking response, control input, normalized tracking error, and adaptive weighting q 11 ( t ) .
Mca 31 00033 g002
Figure 3. Square-wave reference tracking on QUBE-Servo 2 (QUBE2). Subplots show tracking response, control input, normalized tracking error, and adaptive weighting q 11 ( t ) .
Figure 3. Square-wave reference tracking on QUBE-Servo 2 (QUBE2). Subplots show tracking response, control input, normalized tracking error, and adaptive weighting q 11 ( t ) .
Mca 31 00033 g003
Figure 4. Time evolution of the individual contributions (i.e., position-related term, velocity-related term, integral term, reference-rate term, and sliding-mode-inspired term) to the adaptive LQR weight q 11 ( t ) during a representative experiment for Square-wave reference tracking on QUBE-Servo 2 (QUBE1). These plots demonstrate that all contributions remain bounded and vary smoothly throughout the experiment, supporting the practical stability of the controller.
Figure 4. Time evolution of the individual contributions (i.e., position-related term, velocity-related term, integral term, reference-rate term, and sliding-mode-inspired term) to the adaptive LQR weight q 11 ( t ) during a representative experiment for Square-wave reference tracking on QUBE-Servo 2 (QUBE1). These plots demonstrate that all contributions remain bounded and vary smoothly throughout the experiment, supporting the practical stability of the controller.
Mca 31 00033 g004
Figure 5. Time evolution of the individual contributions (position-related, velocity-related, integral, reference-rate, and sliding-mode-inspired terms) to the adaptive LQR weight q 11 ( t ) during a representative square-wave reference-tracking experiment on QUBE-Servo 2 (QUBE2). All contributions remain bounded and vary smoothly throughout the experiment, supporting the practical stability of the controller.
Figure 6. Sine-wave reference tracking on QUBE-Servo 2 (QUBE1). Subplots show tracking response, control input, normalized tracking error, and adaptive weighting q 11 ( t ) .
Figure 7. Sine-wave reference tracking on QUBE-Servo 2 (QUBE2). Subplots show tracking response, control input, normalized tracking error, and adaptive weighting q 11 ( t ) .
Figure 8. Time evolution of the individual contributions (position-related, velocity-related, integral, reference-rate, and sliding-mode-inspired terms) to the adaptive LQR weight q 11 ( t ) during a representative sine-wave reference-tracking experiment on QUBE-Servo 2 (QUBE1). All contributions remain bounded and vary smoothly throughout the experiment, supporting the practical stability of the controller.
Figure 9. Time evolution of the individual contributions (position-related, velocity-related, integral, reference-rate, and sliding-mode-inspired terms) to the adaptive LQR weight q 11 ( t ) during a representative sine-wave reference-tracking experiment on QUBE-Servo 2 (QUBE2). All contributions remain bounded and vary smoothly throughout the experiment, supporting the practical stability of the controller.
Figure 10. Sawtooth-wave reference tracking on QUBE-Servo 2 (QUBE1). Subplots show tracking response, control input, normalized tracking error, and adaptive weighting q 11 ( t ) .
Figure 10. Sawtooth-wave reference tracking on QUBE-Servo 2 (QUBE1). Subplots show tracking response, control input, normalized tracking error, and adaptive weighting q 11 ( t ) .
Mca 31 00033 g010
Figure 11. Sawtooth-wave reference tracking on QUBE-Servo 2 (QUBE2). Subplots show tracking response, control input, normalized tracking error, and adaptive weighting q 11 ( t ) .
Figure 12. Time evolution of the individual contributions (position-related, velocity-related, integral, reference-rate, and sliding-mode-inspired terms) to the adaptive LQR weight q 11 ( t ) during a representative sawtooth-wave reference-tracking experiment on QUBE-Servo 2 (QUBE1). All contributions remain bounded and vary smoothly throughout the experiment, supporting the practical stability of the controller.
Figure 13. Time evolution of the individual contributions (position-related, velocity-related, integral, reference-rate, and sliding-mode-inspired terms) to the adaptive LQR weight q 11 ( t ) during a representative sawtooth-wave reference-tracking experiment on QUBE-Servo 2 (QUBE2). All contributions remain bounded and vary smoothly throughout the experiment, supporting the practical stability of the controller.
Figure 14. Control effort versus tracking performance trade-off for fixed and adaptive LQR controllers on two nominally identical QUBE-Servo 2 systems. Each marker represents the RMS tracking error and RMS control input over a 400 s experiment. The adaptive LQR achieves lower tracking error without increasing control effort, demonstrating improved performance efficiency and robustness across hardware units.
Table 1. Parameters of pendulum.

Symbol | Description | Value | Unit
r | Length of rotary arm | 0.085 | m
J_r | Moment of inertia of rotary arm about pivot | m_r r^2 / 3 | kg·m^2
L_p | Total length of pendulum | 0.129 | m
l | Center of mass (COM) location of pendulum | L_p / 2 | m
J_p | Moment of inertia of pendulum about its pivot | m_p L_p^2 / 3 | kg·m^2
m_p | Mass of pendulum | 0.024 | kg
m_r | Mass of rotary arm | 0.095 | kg
g | Gravitational acceleration | 9.81 | m/s^2
b_r | Damping coefficient of rotary arm | 1 × 10^−3 | N·m·s/rad
b_p | Damping coefficient of pendulum | 5 × 10^−5 | N·m·s/rad
k_t | Torque constant | 0.042 | N·m/A
k_m | Motor back-emf constant | 0.042 | V/(rad/s)
R_m | Terminal resistance | 8.4 | Ω
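For convenience, the derived quantities in Table 1 (the pendulum center-of-mass location and the two moments of inertia) follow directly from the listed masses and lengths. The short Python sketch below reproduces them from the printed values; the variable names are illustrative and do not come from the authors' implementation.

```python
# Derived pendulum parameters from Table 1 (values as printed above).
# Variable names are illustrative; they are not taken from the authors' code.

m_r = 0.095   # mass of rotary arm [kg]
m_p = 0.024   # mass of pendulum [kg]
r   = 0.085   # length of rotary arm [m]
L_p = 0.129   # total length of pendulum [m]

l   = L_p / 2.0             # pendulum center-of-mass location [m]
J_r = m_r * r**2 / 3.0      # rotary-arm inertia about its pivot [kg*m^2]
J_p = m_p * L_p**2 / 3.0    # pendulum inertia about its pivot [kg*m^2]

print(f"l   = {l:.4f} m")
print(f"J_r = {J_r:.3e} kg*m^2")
print(f"J_p = {J_p:.3e} kg*m^2")
```

With the values in Table 1, this gives l = 0.0645 m, J_r ≈ 2.29 × 10^−4 kg·m^2, and J_p ≈ 1.33 × 10^−4 kg·m^2.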
Table 2. System parameters and controller design for Quanser QUBE adaptive LQR.

System parameters: state-space matrix A = [0 0 1 0; 0 0 0 1; 0 152.0057 10.1381 0.5005; 0 264.3080 10.0202 0.8702]; state-space matrix B = [0; 0; 50.6372; 50.0484]; initial LQR gain K = [2.2361, 43.9749, 1.7938, 4.3719]; LQR weighting matrices Q = diag([5, 1, 1, 5]) and R = 1.
Controller gains: k_p = 35, k_v = 12, k_i = 8, k_r = 10, β = 0.8.
System limits: η_p = 0.8, η_v = 0.5, η_i = 0.5, η_r = 1.
Waveform specifications: waveform type = sawtooth, sine, square; amplitude = −45 to 45 degrees; frequency = 0.02 Hz; duration = 400 s; sample time = 0.002 s.
Additional system parameters: λ = 0.99, γ = 0.5, κ = 0.5.
Control limits: q11,min = 5, q11,max = 100, u_min = −5, u_max = 5.
Other parameters: δ_0 = 0.01, δ_1 = 0.05.
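The initial gain K in Table 2 corresponds to the standard infinite-horizon LQR solution for the listed A, B, Q, and R. A minimal SciPy sketch of that off-line computation is shown below; it is not the authors' online Riccati implementation, and the matrix entries are copied as printed above (some signs may have been lost in extraction, so they should be checked against the paper's state-space model before use).

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# State-space matrices as printed in Table 2 (verify the signs against the
# paper's linearized model; extraction may have dropped minus signs).
A = np.array([[0.0, 0.0,      1.0,     0.0],
              [0.0, 0.0,      0.0,     1.0],
              [0.0, 152.0057, 10.1381, 0.5005],
              [0.0, 264.3080, 10.0202, 0.8702]])
B = np.array([[0.0], [0.0], [50.6372], [50.0484]])

Q = np.diag([5.0, 1.0, 1.0, 5.0])   # initial state weighting
R = np.array([[1.0]])               # control weighting

# Solve the continuous-time algebraic Riccati equation and form K = R^-1 B^T P.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)
print("LQR gain K =", K)
```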
Table 3. Absolute tracking error metrics for fixed and adaptive LQR controllers.

Case | MSE (Fixed / Adaptive) | RMSE (Fixed / Adaptive) | MAE (Fixed / Adaptive) | sMAPE (Fixed / Adaptive)
Square-wave—QUBE1 | 507.4 / 130.57 | 22.526 / 11.427 | 16.111 / 4.8877 | 31.852 / 11.527
Square-wave—QUBE2 | 228.38 / 121.18 | 15.112 / 11.008 | 9.3059 / 3.4143 | 21.914 / 8.4142
Sine-wave—QUBE1 | 105.84 / 10.726 | 10.288 / 3.2751 | 9.2137 / 2.9901 | 40.423 / 20.236
Sine-wave—QUBE2 | 63.067 / 4.8123 | 7.9415 / 2.1937 | 7.0892 / 1.8794 | 40.982 / 14.362
Sawtooth-wave—QUBE1 | 238.26 / 62.638 | 15.436 / 7.9144 | 10.219 / 2.678 | 54.337 / 15.221
Sawtooth-wave—QUBE2 | 239.93 / 62.444 | 15.49 / 7.9021 | 12.49 / 2.3508 | 77.604 / 20.08
Table 4. Percentage improvement of adaptive LQR over fixed LQR.

Case | MSE (%) | RMSE (%) | MAE (%) | sMAPE (%)
Square-wave—QUBE1 | 74.27 | 49.27 | 69.66 | 63.81
Square-wave—QUBE2 | 46.94 | 27.16 | 63.31 | 61.60
Sine-wave—QUBE1 | 89.87 | 68.17 | 67.55 | 49.94
Sine-wave—QUBE2 | 92.37 | 72.38 | 73.50 | 64.96
Sawtooth-wave—QUBE1 | 73.71 | 48.73 | 73.79 | 71.99
Sawtooth-wave—QUBE2 | 73.97 | 48.98 | 81.18 | 74.13
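The entries in Tables 3 and 4 can be reproduced from the logged tracking error. The sketch below assumes the usual definitions of MSE, RMSE, and MAE and a symmetric percentage form for sMAPE (the paper's exact sMAPE convention is not restated here, so that formula is an assumption); the function names are ours.

```python
import numpy as np

def tracking_metrics(y_ref, y_meas):
    """MSE, RMSE, MAE, and sMAPE (in %) between reference and measured signals."""
    y_ref, y_meas = np.asarray(y_ref, float), np.asarray(y_meas, float)
    err = y_ref - y_meas
    mse = np.mean(err**2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(err))
    # Symmetric MAPE; the small epsilon avoids division by zero near the origin.
    smape = 100.0 * np.mean(2.0 * np.abs(err) /
                            (np.abs(y_ref) + np.abs(y_meas) + 1e-12))
    return mse, rmse, mae, smape

def percent_improvement(fixed, adaptive):
    """Percentage improvement of the adaptive controller over the fixed one."""
    return 100.0 * (fixed - adaptive) / fixed

# Example with the square-wave/QUBE1 MSE values from Table 3:
print(percent_improvement(507.4, 130.57))   # ~74.27, matching Table 4
```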
Table 5. RMS control effort comparison.

Case | Fixed RMS (u) | Adaptive RMS (u)
Square-wave—QUBE1 | 11.261 | 0.6664
Square-wave—QUBE2 | 11.875 | 0.6167
Sine-wave—QUBE1 | 11.249 | 0.5805
Sine-wave—QUBE2 | 11.273 | 0.6016
Sawtooth-wave—QUBE1 | 11.305 | 0.5959
Sawtooth-wave—QUBE2 | 11.344 | 0.6559
Table 6. Percentage reduction in RMS control effort.

Case | RMS Reduction (%)
Square-wave—QUBE1 | 94.08
Square-wave—QUBE2 | 94.81
Sine-wave—QUBE1 | 94.84
Sine-wave—QUBE2 | 94.66
Sawtooth-wave—QUBE1 | 94.73
Sawtooth-wave—QUBE2 | 94.22
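Likewise, the RMS control effort in Table 5 and the reductions in Table 6 follow directly from the logged control input. A brief sketch under the same assumptions (illustrative names, not the authors' code):

```python
import numpy as np

def rms(u):
    """Root-mean-square value of a logged control input sequence."""
    u = np.asarray(u, float)
    return np.sqrt(np.mean(u**2))

def rms_reduction(fixed_rms, adaptive_rms):
    """Percentage reduction in RMS control effort (as in Table 6)."""
    return 100.0 * (1.0 - adaptive_rms / fixed_rms)

# Example with the square-wave/QUBE1 values from Table 5:
print(rms_reduction(11.261, 0.6664))   # ~94.08, matching Table 6
```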
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
