Optimal Control Based on Reinforcement Learning for Flexible High-Rise Buildings with Time-Varying Actuator Failures and Asymmetric State Constraints

Li, Min; Xie, Rui

doi:10.3390/buildings15060841


by Min Li ^1,2 and Rui Xie ^1,2,*

¹ School of Electrical and Control Engineering, North University of China, Taiyuan 030051, China

² Science and Technology on Electronic Test and Measurement Laboratory, North University of China, Taiyuan 030051, China

* Author to whom correspondence should be addressed.

Buildings 2025, 15(6), 841; https://doi.org/10.3390/buildings15060841

Submission received: 23 January 2025 / Revised: 7 February 2025 / Accepted: 10 February 2025 / Published: 7 March 2025

(This article belongs to the Section Building Structures)


Abstract

This study centers on the vibration suppression of high-rise building systems under extreme conditions, exploring a reinforcement learning (RL)-based vibration control strategy for flexible building systems with time-varying faults and asymmetric state constraints. A mathematical model precisely depicting the dynamic characteristics of flexible high-rise buildings, considering the time-varying nature of actuator faults, is initially established. Subsequently, a reinforcement learning-based controller is devised to counteract the negative impacts of faults on system performance. By introducing a time-varying asymmetric Lyapunov function, system state constraints are ensured, safeguarding system stability and security. The stability of the closed-loop system is rigorously proven using the Lyapunov stability theory, guaranteeing stable vibration suppression performance even in the presence of faults. The simulation results indicate that the proposed reinforcement learning vibration control method can effectively reduce the vibration response of flexible high-rise buildings when facing time-varying actuator faults. This demonstrates its remarkable robustness and adaptability, presenting a novel and effective solution for vibration control in real-world flexible high-rise buildings.

Keywords:

flexible high-rise buildings; optimal control; time-varying actuator faults; reinforcement learning; asymmetric constraints

1. Introduction

In modern civil engineering, high-rise buildings are becoming taller and more flexible, in line with the smart-city goals of efficient, intelligent and sustainable development [1]; however, this trend also continuously reduces the overall structural stiffness and damping of the building system. This poses severe challenges for controlling structural responses under dynamic loads such as wind and earthquakes, including the nonlinear and time-varying nature of the load, the coupling of high-frequency and low-frequency vibrations, the uncertainty in the load patterns, the limitations of actuators under extreme conditions, and the need for real-time adaptive control to ensure stability and safety. Because of their inherent structural characteristics, flexible high-rise buildings have complex vibration modes and low damping, making them prone to significant vibrations. This not only affects the comfort of occupants but also potentially endangers structural safety and durability. To enhance comfort and safety, various countries have established standards for building systems [2]. Scholars have proposed numerous methods to suppress the vibrations of building systems. Traditional measures, such as passive control [3], can reduce structural vibrations to a certain extent, but their parameters are usually fixed, so the control effect cannot be adjusted flexibly as the external excitation and the structural state change. In contrast, active control systems can adjust in real time [4,5], although they rely heavily on precise structural models and sensor information and place strict requirements on the performance stability of actuators. Nevertheless, active control adapts better to the changes in natural frequencies caused by the nonlinear characteristics of the building system and by changes in the external environment, and is therefore better suited to current high-rise buildings. As a result, active control methods are receiving increasing attention.

In practical applications, sudden faults may occur in high-rise building systems, which can affect the overall structure and safety of the system, especially sensor faults [6] and actuator faults [7,8]. The actuator output characteristics can change significantly, which can seriously damage the expected performance of the control system and lead to an unstable structural response. A more practical but challenging situation is when the fault parameters change over time, resulting in time-varying actuator faults [9]. Ji et al. [10] designed an adaptive control law with a Nussbaum function and an auxiliary signal for a three-dimensional variable-length flexible string system modeled by PDEs, considering time-varying actuator faults and unknown control directions. They verified the control effect through simulations, providing a basis for practical engineering and planning future research directions. Li et al. [11] studied a vibration single-link flexible manipulator system modeled by PDEs, considering nonlinear and time-varying actuator faults as well as unknown control directions. Their research provides important references and inspiration for this study in dealing with time-varying actuator faults, highlighting the necessity of innovative control strategies in complex fault scenarios.

Meanwhile, considering the safety of the system operation, it is necessary to restrict the input, output and state of the actual high-rise building system to ensure its safe and reliable operation. Zhang et al. [12] proposed boundary control for high-rise building systems with active mass dampers under input and output constraints for large flexible building systems. In actual high-rise building systems, the stiffness, mass and damping characteristics of the structure in different directions are complicated due to the asymmetry of the architectural design, which greatly increases the complexity of the control strategy design. Traditional control methods based on the assumption of symmetrical structures make it difficult to effectively address these issues. Wang [13] introduced a time-varying asymmetric constraint Lyapunov function for the tracking control of nonlinear periodic systems and used the adaptive backstepping method and single-parameter adjustment to achieve output tracking error convergence and state constraints. Numerical examples have verified its effectiveness for asymmetric constraint systems. Song et al. [14] combined error transfer transformation with BLF to develop a tracking control method that can handle delays and asymmetric time-varying constraints. Wan et al. [15] addressed the control problem of non-strict feedback nonlinear systems with uncertainties and asymmetric time-varying output constraints by using an adaptive fuzzy state observer and an asymmetric time-varying barrier Lyapunov function, which is worth learning from. Therefore, the research on asymmetric output constraints for flexible high-rise building systems is crucial and requires further in-depth study.

In recent years, reinforcement learning, as a cutting-edge technology in the field of artificial intelligence, has attracted extensive attention from scholars [16,17,18]. Lan et al. [19] proposed a scheme based on improved distributed cooperative reinforcement learning for the control problem of multi-agent systems in unknown dynamic obstacle environments, enhancing performance through a three-layer cooperative mechanism and neural network optimization, which was verified to be effective through simulation. Zhao et al. [20] designed fault-tolerant control for a two-degree-of-freedom nonlinear helicopter system by combining generalized learning neural networks with reinforcement learning, considering actuator faults to construct parameters and using the Lyapunov method to analyze stability. Simulation and experiments confirmed its effectiveness and feasibility, providing new methods and ideas. Meanwhile, in the field of intelligent structural control, adaptive variable stiffness intelligent structural vibration control based on a fuzzy control strategy combined with LSTM also provides a new research direction for structural vibration resistance [21]. In the method proposed in [22], the reinforcement learning framework adopted an actor–critic structure. The critic neural network was used to approximate the loss function to evaluate control performance and serve as input information, while the actor neural network adjusted the control output to minimize the cost function. In the face of high uncertainty caused by time-varying actuator faults and asymmetric constraints, reinforcement learning demonstrates unique advantages, as it can learn and adapt to system dynamics online, continuously optimize control decisions, and minimize structural vibration responses, thereby providing an innovative and highly promising solution for the vibration control of flexible high-rise buildings.

This study focuses on developing an optimal reinforcement learning control strategy for flexible high-rise building systems with time-varying actuator faults and asymmetric constraints. The main contributions of this study are as follows: (1) Unlike previous studies, this work addresses the common issue of time-varying actuator faults in high-rise buildings. The proposed RL strategy interacts with the environment during operation, capturing actuator fault information and dynamically adjusting control actions, overcoming the limitations of traditional methods and enhancing system stability and safety under fault conditions, thus advancing building safety control towards intelligent self-adaptation. (2) Due to asymmetric constraints, flexible high-rise buildings exhibit complex nonlinear behaviors, which traditional linear control methods fail to address. This study integrates asymmetric constraints with RL algorithms to optimize vibration control, reduce safety risks, and ensure the long-term stability of buildings, offering new insights for similar structures and contributing to innovation in civil engineering. (3) RL algorithms autonomously learn optimal strategies through environment interaction and reward feedback, with improvements in learning speed and convergence. Simulation analysis verifies the effectiveness of the proposed method and provides a new method for the flexible control of high-rise buildings, supporting the application of the algorithm in building control technology. This work fills a gap in the control of complex engineering systems, advancing the development of intelligent building control technologies.

2. Dynamic Model

The schematic in Figure 1 depicts a single-story flexible building system. The system consists of two flexible walls interconnected by a rigid floor, on which an active mass damper in the form of a trolley moves. The model follows Quanser’s flexible building platform. For simplicity, the left and right walls are modeled as two Euler–Bernoulli beams. Here,

L_{f}

represents the story height.

J_{c}

is the rotor moment of inertia of the damper.

μ_{f}

denotes the mass of the rigid floor, and

μ_{c}

is the mass of the damper. For a given time t,

ι_{l} (y, t)

and

ι_{r} (y, t)

are the elastic deflections of the left and right walls, respectively, at position y and time t.

ι_{l} (L_{f}, t)

and

ι_{r} (L_{f}, t)

are the elastic deflections of the left and right walls, respectively, at the top of the story, i.e., at y = L_{f}. As the tops of the walls are linked by a rigid plate, the vibrations at the top of the left-hand wall and the right-hand wall are the same. In other words, we have

ι_{l} (L_{f}, t) = ι_{r} (L_{f}, t) = ι (L_{f}, t)

. The flexural rigidity of the flexible wall is denoted by

E I

, and

ρ

represents the uniform mass per unit length of the flexible wall.

s (t)

serves as the input for active boundary vibration control, while

x_{c} (t)

indicates the position of the active mass damper within the xoy coordinate system.

Remark 1.

For the sake of convenience, we present the following representation:

{(\cdot)}^{'} = \frac{\partial (\cdot)}{\partial y}, {(\cdot)}^{″} = \frac{\partial^{2} (\cdot)}{\partial y^{2}}, {(\cdot)}^{‴} = \frac{\partial^{3} (\cdot)}{\partial y^{3}}, {(\cdot)}^{(4)} = \frac{\partial^{4} (\cdot)}{\partial y^{4}}, \overset{\cdot}{(\cdot)} = \frac{\partial (\cdot)}{\partial t}, \overset{\cdot \cdot}{(\cdot)} = \frac{\partial^{2} (\cdot)}{\partial t^{2}}

The kinetic energy

T

of the system can be expressed as

\begin{array}{l} T = \frac{1}{2} \int_{0}^{L_{f}} ρ {[{\dot{ι}}_{l} (y, t)]}^{2} d y + \frac{1}{2} \int_{0}^{L_{f}} ρ {[{\dot{ι}}_{r} (y, t)]}^{2} d y \\ + \frac{1}{2} μ_{f} {[\dot{ι} (L_{f}, t)]}^{2} + \frac{1}{2} J_{c} {[{\dot{x}}_{c} (t)]}^{2} \\ + \frac{1}{2} μ_{c} {[{\dot{x}}_{c} (t) + \dot{ι} (L_{f}, t)]}^{2} \end{array}

(1)

The potential energy

V

is as follows:

V = \frac{1}{2} \int_{0}^{L_{f}} E I {[ι_{l}^{″} (y, t)]}^{2} d y + \frac{1}{2} \int_{0}^{L_{f}} E I {[ι_{r}^{″} (y, t)]}^{2} d y

(2)

The virtual work, denoted as

δ W (t)

, attributable to the non-conservative force is given as follows:

δ W = s (t) δ [x_{c} (t) + ι (L_{f}, t)]

(3)

Hamilton’s principle is

\int_{0}^{t_{f}} δ (T - V + W) d t = 0

(4)

We can obtain the dynamic model of a one-story flexible building system with an active mass damper as follows [12]:

ρ {\ddot{ι}}_{l} (y, t) + E I ι_{l}^{(4)} (y, t) = 0

(5)

ρ {\ddot{ι}}_{r} (y, t) + E I ι_{r}^{(4)} (y, t) = 0

(6)

μ_{f} \ddot{ι} (L_{f}, t) - J_{c} {\ddot{x}}_{c} (t) - E I [ι_{l}^{‴} (L_{f}, t) + ι_{r}^{‴} (L_{f}, t)] = 0

(7)

ι_{l} (0, t) = ι_{r} (0, t) = ι_{l}^{'} (0, t) = ι_{r}^{'} (0, t) = ι_{l}^{″} (L_{f}, t) = ι_{r}^{″} (L_{f}, t) = 0

(8)

ι_{l} (L_{f}, t) = ι_{r} (L_{f}, t) = ι (L_{f}, t)

(9)

μ_{c} \ddot{ι} (L_{f}, t) + (μ_{c} + J_{c}) {\ddot{x}}_{c} (t) = s (t)

(10)

Remark 2.

Vibration Equations (5) and (6) describe the dynamics of the left and right walls of the building, respectively. Boundary conditions (8) and (9) describe the dynamical properties of the floor boundaries. Boundary conditions (7) and (10) describe the connection between the top floor and the controller.

To address the vibration control challenge in uncertain systems, this section adopts the assumed modes method (AMM), which approximates the infinite-dimensional system dynamics by a finite number of modes. Using the AMM, the elastic deflection of the flexible wall can be expressed as

ι (y, t) = \sum_{i = 1}^{\infty} φ_{i} (y) T_{i} (t)

(11)

where

φ_{i} (y)

and

T_{i} (t)

correspond to the natural mode function and modal coordinate of the flexible wall, respectively. Since the low-order modes usually dominate the vibration of a high-rise building system, this paper retains the first two modes. We can obtain the following dynamic model [23]:

B \ddot{Q} + K Q = U

(12)

where

Q = {[x_{c}, T_{1}, T_{2}]}^{T} \in R^{3}

is a generalized coordinate,

U = {[s (t), 0, 0]}^{T} \in R^{3}

is the generalized input vector.

B \in R^{3 \times 3}

is the inertial matrix.

K \in R^{3 \times 3}

is the stiffness matrix. This is expressed as follows:

B [1, 1] = μ_{c} + J_{c}

B [1, 2] = B [2, 1] = \frac{1}{4} μ_{c} φ_{1} (L_{f})

B [1, 3] = B [3, 1] = \frac{1}{4} μ_{c} φ_{2} (L_{f})

B [2, 2] = ρ - \frac{μ_{f} + μ_{c}}{2} φ_{1}^{2} (L_{f})

B [3, 3] = ρ - \frac{μ_{f} + μ_{c}}{2} φ_{2}^{2} (L_{f})

B [2, 3] = B [3, 2] = \frac{μ_{f} + μ_{c}}{4} φ_{1} (L_{f}) φ_{2} (L_{f})

K = d i a g [0, w_{1}^{2} ρ, w_{2}^{2} ρ]

where

w_{n}

is the separation constant.
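As a minimal sketch, the entries listed above can be assembled numerically as follows. The function name and argument names are illustrative and not from the paper, and the (1,3) entry is assumed to use the second mode shape value φ_2(L_f), since only two modes are retained.

```python
import numpy as np

def inertia_and_stiffness(mu_c, mu_f, J_c, rho, phi1_L, phi2_L, w1, w2):
    """Assemble B and K for Q = [x_c, T_1, T_2] from the listed entries.

    phi1_L and phi2_L stand for phi_1(L_f) and phi_2(L_f); all numerical
    values passed in are the caller's assumptions.
    """
    B = np.zeros((3, 3))
    B[0, 0] = mu_c + J_c
    B[0, 1] = B[1, 0] = 0.25 * mu_c * phi1_L
    B[0, 2] = B[2, 0] = 0.25 * mu_c * phi2_L   # assumed phi_2, see lead-in
    B[1, 1] = rho - 0.5 * (mu_f + mu_c) * phi1_L**2
    B[2, 2] = rho - 0.5 * (mu_f + mu_c) * phi2_L**2
    B[1, 2] = B[2, 1] = 0.25 * (mu_f + mu_c) * phi1_L * phi2_L
    K = np.diag([0.0, w1**2 * rho, w2**2 * rho])
    return B, K
```

Before inverting B in the state-space model it is worth checking that the chosen parameters make it positive definite, since the listed diagonal entries contain a subtraction.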

The system state space model is as follows:

Defining

x = \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} = \begin{bmatrix} Q \\ \dot{Q} \end{bmatrix}

, we have

\begin{cases} {\dot{x}}_{1} = x_{2} \\ {\dot{x}}_{2} = - B^{- 1} K x_{1} + B^{- 1} U \end{cases}

(13)

This paper takes into account that faults are highly likely to occur during the long-term operation of high-rise building systems, and these faults often possess complex time-varying characteristics. Therefore, a kind of time-varying actuator fault is considered as follows:

U = ν τ + \bar{τ}

(14)

where

τ = {[τ_{1}, 0, 0]}^{T} \in R^{3}

denotes the designed actual control input,

ν = d i a g [ℏ, 1, 1] \in R^{3 \times 3}

is the unknown time-varying actuation effectiveness, and

\bar{τ} = {[{\bar{τ}}_{1}, 0, 0]}^{T} \in R^{3}

denotes the additive fault.

Assumption 1.

There exist positive constants

{\underline{ν}}_{i}

and

{\bar{ν}}_{i}

such that

ν_{i} \in [{\underline{ν}}_{i}, {\bar{ν}}_{i}]

. In addition,

{\bar{τ}}_{i}

is bounded.

When the above time-varying faults are considered, Formula (13) becomes

\begin{cases} {\dot{x}}_{1} = x_{2} \\ {\dot{x}}_{2} = B^{- 1} (- K x_{1} + ν τ + \bar{τ}) \end{cases}

(15)
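The faulty state-space model (15) can be sketched as a right-hand-side function suitable for any ODE integrator. The fault profiles `eff` and `add_fault` below are hypothetical examples chosen only for illustration; the paper does not specify them here.

```python
import numpy as np

def faulty_dynamics(x, tau1, t, B, K, eff, add_fault):
    """Right-hand side of Eq. (15) with x = [Q, Q_dot] and U = nu*tau + tau_bar."""
    Q, Q_dot = x[:3], x[3:]
    nu = np.diag([eff(t), 1.0, 1.0])               # time-varying effectiveness
    tau = np.array([tau1, 0.0, 0.0])               # designed control input
    tau_bar = np.array([add_fault(t), 0.0, 0.0])   # additive fault
    Q_ddot = np.linalg.solve(B, -K @ Q + nu @ tau + tau_bar)
    return np.concatenate([Q_dot, Q_ddot])

# Hypothetical fault profiles (assumptions, not from the paper).
eff = lambda t: 0.7 + 0.2 * np.sin(t)
add_fault = lambda t: 0.1 * np.cos(2.0 * t)
```

Solving the linear system with `np.linalg.solve` avoids forming B^{-1} explicitly, which is the usual numerical choice.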

3. Controller Design and Stability Analysis

In this part of the discussion, an optimal control strategy for a flexible high-rise building system is introduced to realize vibration suppression. The state error is defined as

e = Q - Q_{d}

, where

Q = {[x_{c}, T_{1}, T_{2}]}^{T}

and

Q_{d} = {[x_{c d}, T_{1 d}, T_{2 d}]}^{T}

denotes the desired value of Q, which is taken to be constant, so

\dot{e} = \dot{Q}

. Therefore, the control goal aims to design a control scheme that satisfies

e \to 0, \dot{e} \to 0

when

t \to \infty

.

3.1. Reinforcement Learning Control Algorithm

The actor–critic algorithm is used to design a reinforcement learning strategy to achieve high-rise building vibration control. For the control structure block diagram of the flexible high-rise building system in Figure 2, the whole control process can be clearly presented as follows: Firstly, the flexible high-rise building system is used as the control object, and the control law is generated through the network, which is the starting point of the whole control process. Subsequently, the actor NN starts to work, generates the update law, and determines the control strategy based on this update law. At the same time, the critic NN calculates the cost function, which leads to the update value and provides an evaluation basis for the system control. The RBF neural network is also involved, interacting with the outputs of the actor NN and the critic NN to assist in the system control. In terms of state processing, the desired state is set, which is compared to the actual state variables of the system and the error is calculated. At this point, the system needs to analyze and respond to possible failure scenarios, taking into account the potential factor of actuator failure. At the same time, asymmetric constraints come into play to limit the system state and ensure that the system operates within a reasonable range of states. After that, the stability of the system is analyzed in depth using the Lyapunov function to determine whether the system can operate stably. Finally, the system uncertainty is approximated, and the control strategy is further optimized based on the calculations and analyses in the previous sections, so that the flexible high-rise building system can reach the desired state as much as possible and achieve stable and reliable operation.

The long-term cost function is presented in the following way:

C_{i} (t) = \int_{t}^{\infty} e^{- \frac{m - t}{ψ_{i}}} Y_{i} (m) d m

(16)

where

ψ_{i}

is a constant that discounts future cost, and

Y_{i} (m)

represents the instantaneous cost, which is expressed as follows:

Y_{i} (m) = {(Q_{i} - Q_{d i})}^{T} D_{i} (Q_{i} - Q_{d i}) + {τ_{i}}^{T} R_{i} τ_{i}

(17)

where matrices

D_{i}

and

R_{i}

are positive definite.
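A rough numerical sketch of (16) and (17) is given below. It reads the integral as running forward from the current time (an assumption about the lower limit), truncates it to a finite horizon, and uses a simple rectangle rule; the function names are illustrative.

```python
import numpy as np

def instantaneous_cost(q, q_d, tau, D, R):
    """Y = (q - q_d)^T D (q - q_d) + tau^T R tau, as in Eq. (17)."""
    err = q - q_d
    return float(err @ D @ err + tau @ R @ tau)

def discounted_cost(Y_samples, dt, psi):
    """Finite-horizon rectangle-rule estimate of the discounted cost.

    Y_samples[k] is the instantaneous cost at time t + k*dt, so the
    weight exp(-(m - t)/psi) becomes exp(-k*dt/psi).
    """
    Y_samples = np.asarray(Y_samples, dtype=float)
    m = np.arange(len(Y_samples)) * dt
    return float(np.sum(np.exp(-m / psi) * Y_samples) * dt)
```

A large ψ makes the weight nearly constant, so the cost approaches the plain integral of the instantaneous cost over the horizon.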

Owing to its strong performance in nonlinear estimation, the RBF neural network is widely used in practice. For a continuous function

f (Z_{i}) : ℝ^{q} \to ℝ

, the RBF neural network approximation is defined as follows:

f_{i n} (Z_{i}) = {W_{i}}^{T} S_{i} (Z_{i})

(18)

where

Z_{i}

represents the input of the neural network, with

Z_{i} = {[Z_{i 1}, Z_{i 2}, \dots, Z_{i q}]}^{T} \in ℝ^{q}

.

W_{i} = {[W_{i 1}, W_{i 2}, \dots, W_{i l}]}^{T} \in ℝ^{l}

denotes the weight vector of the neural network, where the number of nodes satisfies

l > 1

.

S_{i} (Z_{i}) = {[S_{i 1} (Z_{i}), S_{i 2} (Z_{i}), \dots, S_{i l} (Z_{i})]}^{T}

is the radial basis function, where

S_{i k} (Z_{i})

adopts the Gaussian function form as follows:

S_{i k} (Z_{i}) = \exp [- \frac{{(Z_{i} - υ_{k})}^{T} (Z_{i} - υ_{k})}{χ^{2}}]

(19)

where

υ_{k} = {[υ_{k 1}, υ_{k 2}, \dots, υ_{k q}]}^{T} (k = 1, 2, \dots, l)

is the center of the receptive field, and

χ

is the width of the Gaussian function. In this paper, an RBF neural network is applied to estimate unknown continuous functions on compact sets

Ω_{Z_{i}} \subset ℝ^{q}

.

f (Z_{i}) = W_{i}^{* T} S_{i} (Z_{i}) + ∍_{i}, \forall Z_{i} \in Ω_{Z_{i}}

(20)

where

{W_{i}}^{*}

represents the ideal weight, while

∍_{i}

represents the bounded approximation error.
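The Gaussian basis (19) and the network output (18) can be sketched as follows. The array layout (one center per row) is a choice made for this illustration.

```python
import numpy as np

def rbf_features(z, centers, width):
    """Gaussian basis S_k(z) = exp(-||z - v_k||^2 / width^2), Eq. (19).

    centers has shape (l, q): one receptive-field center per row.
    """
    diff = np.asarray(centers, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-np.sum(diff**2, axis=1) / width**2)

def rbf_output(z, weights, centers, width):
    """Network output W^T S(z), Eq. (18)."""
    return float(np.asarray(weights, dtype=float) @ rbf_features(z, centers, width))
```

Each basis function equals 1 at its own center and decays with distance, so the weights act as local amplitudes.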

Based on the analysis above, consider

C_{i} = W_{i C}^{* T} S_{i C} (Z_{i C}) + ∍_{i C}

; then,

{\hat{C}}_{i} = {\hat{W}}_{i C}^{T} S_{i C} (Z_{i C})

, where

Z_{i C} = e_{1 i}

is the error variable and

∍_{i C}

is the approximation residual of the critic neural network. From (16), the approximation error of the cost function is expressed as

ϒ_{i} (t) = Y_{i} (t) - \frac{1}{ψ_{i}} {\hat{C}}_{i} (t) + {\dot{\hat{C}}}_{i} (t)

(21)

When

ψ_{i} \to \infty

, the approximate error of the cost function can be set forth as

ϒ_{i} (t) = Y_{i} (t) + {\dot{\hat{C}}}_{i} (t)

(22)

By utilizing the gradient descent method, the critic neural network is updated as follows:

{\dot{\hat{W}}}_{i C} = - σ_{i C} \frac{\partial E_{i C}}{\partial {\hat{W}}_{i C}}

(23)

where

E_{i C} = \frac{1}{2} {ϒ_{i}}^{T} ϒ_{i}

is the objective function minimized by adjusting the neural network weight

{\hat{W}}_{i C}

, and substituting (21) into (23) yields

\begin{array}{l} {\dot{\hat{W}}}_{i C} = - σ_{i C} ϒ_{i} (t) \frac{\partial ϒ_{i}}{\partial {\hat{W}}_{i C}} \\ = - σ_{i C} ϒ_{i} (t) \frac{\partial [Y_{i} (t) - (1 / ψ_{i}) {\hat{C}}_{i} (t) + {\dot{\hat{C}}}_{i} (t)]}{\partial {\hat{W}}_{i C}} \\ = - σ_{i C} (Y_{i} (t) + {\hat{W}}_{i C}^{T} η_{i}) η_{i} \end{array}

(24)

where

σ_{i C} > 0

represents the learning rate of the neural network, and

η_{i} = - \frac{S_{i C}}{ψ_{i}} + \nabla S_{i C} {\dot{Z}}_{i C}

,

\nabla

denotes the gradient with respect to

Z_{i C}

.
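One way to implement the critic update (24) in discrete time is a forward-Euler step, sketched below for a single channel i. The step size `dt` and the Euler discretization are assumptions of this sketch; the paper states the law in continuous time.

```python
import numpy as np

def critic_step(W_c, Y, S_c, grad_S_c, z_dot, psi, sigma_c, dt):
    """One Euler step of Eq. (24) for a single critic network.

    grad_S_c has shape (l, q) (gradient of each basis w.r.t. the input),
    z_dot has shape (q,). Returns the new weights and the residual.
    """
    eta = -S_c / psi + grad_S_c @ z_dot      # eta_i as defined in the text
    residual = Y + W_c @ eta                 # Y_i + W_hat^T eta_i
    W_dot = -sigma_c * residual * eta        # gradient-descent direction
    return W_c + dt * W_dot, residual
```

The returned residual can be monitored during simulation: it should shrink as the critic learns.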

3.2. Stability Analysis

Define the error variables as follows:

e_{1} = Q - Q_{d}

(25)

e_{2} = \dot{Q} - α

(26)

where

e_{1}

is the state error and

e_{2}

denotes the virtual velocity error;

Q

and

Q_{d}

denote the state variables and desired state variables of the high-rise building system, respectively.

α = {[α_{1}, α_{2}, α_{3}]}^{T}

is a virtual control whose explicit expression is given later. Differentiating (25) and (26) with respect to time gives

{\dot{e}}_{1} = e_{2} + α = e_{2} - (K_{1} + K_{2}) e_{1}

(27)

{\dot{e}}_{2} = \ddot{Q} - \dot{α} = B^{- 1} [ν τ + \bar{τ} - K Q] - \dot{α}

(28)

Choose the Lyapunov candidate function as follows:

V = V_{1} + V_{2}

(29)

where

V_{1} = \frac{1}{2} e_{2}^{T} B e_{2}

(30)

V_{2} = \frac{1}{2} \sum_{i = 1}^{3} q (e_{1 i}) \ln \frac{N_{1 i}^{2}}{N_{1 i}^{2} - e_{1 i}^{2}} + \frac{1}{2} \sum_{i = 1}^{3} (1 - q (e_{1 i})) \ln \frac{N_{2 i}^{2}}{N_{2 i}^{2} - e_{1 i}^{2}}

(31)

where

q (e_{1 i}) = \begin{cases} 1, & e_{1 i} > 0 \\ 0, & e_{1 i} \leq 0 \end{cases}

, and

N_{1 i}

and

N_{2 i}

are the time-varying constraint bounds on the error

e_{1 i}

for positive and non-positive values, respectively.

When

e_{1 i} > 0

, we have

q (e_{1 i}) = 1

, and Formula (31) reduces to

V_{2} = \frac{1}{2} \sum_{i = 1}^{3} q (e_{1 i}) \ln \frac{N_{1 i}^{2}}{N_{1 i}^{2} - e_{1 i}^{2}}

(32)

When

e_{1 i} \leq 0

, we have

q (e_{1 i}) = 0

, and Formula (31) reduces to

V_{2} = \frac{1}{2} \sum_{i = 1}^{3} (1 - q (e_{1 i})) \ln \frac{N_{2 i}^{2}}{N_{2 i}^{2} - e_{1 i}^{2}}

(33)

In summary, Formula (31) can be written as

V_{2} = \frac{1}{2} \sum_{i = 1}^{3} \ln \frac{1}{1 - ϖ_{i}^{2}}

(34)

where

ϖ_{i} = (1 - q (e_{1 i})) ϖ_{1 i} + q (e_{1 i}) ϖ_{2 i}, i = 1, 2, 3

,

ϖ_{1 i} = \frac{e_{1 i}}{N_{2 i}}

,

ϖ_{2 i} = \frac{e_{1 i}}{N_{1 i}}

.
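The asymmetric barrier term (34) can be evaluated numerically as sketched below, taking the normalized error to be e_1i/N_1i for positive errors and e_1i/N_2i otherwise. The function name and the choice to raise an exception on constraint violation are illustrative.

```python
import numpy as np

def asymmetric_barrier(e1, N1, N2):
    """V_2 of Eq. (34) for error vector e1 and bound vectors N1, N2.

    N1 applies where e1 > 0 (q = 1); N2 applies where e1 <= 0 (q = 0).
    """
    e1, N1, N2 = (np.asarray(v, dtype=float) for v in (e1, N1, N2))
    bound = np.where(e1 > 0, N1, N2)
    w = e1 / bound                      # normalized error, must satisfy |w| < 1
    if np.any(np.abs(w) >= 1.0):
        raise ValueError("error is outside the constraint bounds")
    return 0.5 * float(np.sum(np.log(1.0 / (1.0 - w**2))))
```

The value is zero at zero error and grows without bound as any component approaches its active bound, which is what keeps the error inside the constraint set in the Lyapunov argument.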

The derivative of (29) is

\dot{V} = {\dot{V}}_{1} + {\dot{V}}_{2}

(35)

where

{\dot{V}}_{1} = e_{2}^{T} B {\dot{e}}_{2}

(36)

\begin{array}{l} {\dot{V}}_{2} = \sum_{i = 1}^{3} \frac{ϖ_{i} {\dot{ϖ}}_{i}}{1 - ϖ_{i}^{2}} \\ = \sum_{i = 1}^{3} \frac{q_{i} ϖ_{2 i}}{N_{1 i} (1 - ϖ_{2 i}^{2})} ({\dot{e}}_{1 i} - e_{1 i} \frac{{\dot{N}}_{1 i}}{N_{1 i}}) + \sum_{i = 1}^{3} \frac{(1 - q_{i}) ϖ_{1 i}}{N_{2 i} (1 - ϖ_{1 i}^{2})} ({\dot{e}}_{1 i} - e_{1 i} \frac{{\dot{N}}_{2 i}}{N_{2 i}}) \end{array}

(37)

Substituting the derivative of the state error into Equation (37) gives

\begin{array}{l} {\dot{V}}_{2} = \sum_{i = 1}^{3} \frac{q_{i} ϖ_{2 i}}{N_{1 i} (1 - ϖ_{2 i}^{2})} (e_{2 i} + α_{i} - e_{1 i} \frac{{\dot{N}}_{1 i}}{N_{1 i}}) \\ + \sum_{i = 1}^{3} \frac{(1 - q_{i}) ϖ_{1 i}}{N_{2 i} (1 - ϖ_{1 i}^{2})} (e_{2 i} + α_{i} - e_{1 i} \frac{{\dot{N}}_{2 i}}{N_{2 i}}) \end{array}

(38)

where

α = - K_{1} e_{1} - K_{2} e_{1}

and

K_{1} = K_{1}^{T} > 0

, where

K_{1}

is a positive definite 3-dimensional matrix,

K_{2} = d i a g [k_{21} (t), k_{22} (t), k_{23} (t)]

, with

k_{2 i} (t) = \sqrt{{({\dot{N}}_{1 i} / N_{1 i})}^{2} + {({\dot{N}}_{2 i} / N_{2 i})}^{2} + o_{i}}

. The positive constants

o_{i}

are design parameters chosen so that

k_{2 i} (t)

is always bounded.

Lemma 1.

For any

ϖ \in ℝ

satisfying

|ϖ| < 1

, the following inequality holds [24]:

\ln \frac{1}{1 - ϖ^{2}} \leq \frac{ϖ^{2}}{1 - ϖ^{2}}

(39)

Using the definitions of

ϖ_{1 i}

,

ϖ_{2 i}

, and

α

, together with Lemma 1, Formula (38) can be reduced to

{\dot{V}}_{2} \leq \sum_{i = 1}^{3} ϕ_{i} e_{1 i} e_{2 i} - \sum_{i = 1}^{3} \frac{ϖ_{i}^{2}}{(1 - ϖ_{i}^{2})} (k_{1 i} + k_{2 i} + q_{i} \frac{{\dot{N}}_{1 i}}{N_{1 i}} + (1 - q_{i}) \frac{{\dot{N}}_{2 i}}{N_{2 i}})

(40)

where

ϕ_{i} = \frac{q_{i}}{N_{1 i}^{2} - e_{1 i}^{2}} + \frac{1 - q_{i}}{N_{2 i}^{2} - e_{1 i}^{2}}

. Considering that

k_{2 i} + q_{i} \frac{{\dot{N}}_{1 i}}{N_{1 i}} + (1 - q_{i}) \frac{{\dot{N}}_{2 i}}{N_{2 i}} > 0

in Formula (40), the following can be obtained:

{\dot{V}}_{2} \leq \sum_{i = 1}^{3} ϕ_{i} e_{1 i} e_{2 i} - \sum_{i = 1}^{3} \frac{ϖ_{i}^{2}}{(1 - ϖ_{i}^{2})} k_{1 i}

(41)

Considering (36) and (41), we obtain

\dot{V} \leq e_{2}^{T} B {\dot{e}}_{2} + \sum_{i = 1}^{3} ϕ_{i} e_{1 i} e_{2 i} - \sum_{i = 1}^{3} \frac{ϖ_{i}^{2}}{(1 - ϖ_{i}^{2})} k_{1 i}

(42)

Substitute Formula (28) into Formula (42) to obtain

\dot{V} \leq e_{2}^{T} (ν τ + \bar{τ} - K Q - B \dot{α}) + \sum_{i = 1}^{3} ϕ_{i} e_{1 i} e_{2 i} - \sum_{i = 1}^{3} \frac{ϖ_{i}^{2}}{(1 - ϖ_{i}^{2})} k_{1 i}

(43)

Accordingly, the control input is designed as follows:

τ_{1} = ν^{- 1} (- ϕ e_{1} - K_{3} e_{2} + K Q + B \dot{α} - \bar{τ})

(44)

where

K_{3} = K_{3}^{T} > 0

is a control gain matrix. Substituting control input (44) into (43) yields

\dot{V} \leq - \sum_{i = 1}^{3} \frac{ϖ_{i}^{2}}{(1 - ϖ_{i}^{2})} k_{1 i} - e_{2}^{T} K_{3} e_{2}

(45)

Because the dynamic information is uncertain in practice, the model-based control law (44) cannot be implemented directly in a real system. To overcome this challenge, based on Section 3.1, we design the following control law:

τ_{2} = ν^{- 1} (- ϕ e_{1} - K_{3} e_{2} + {\hat{W}}_{a}^{T} S_{a} (Z_{a}))

(46)

where

W_{a}

is the weight of the actor neural network, and

S_{a} (Z_{a})

represents the basis function of the actor network. The term

{\hat{W}}_{a}^{T} S_{a} (Z_{a})

approximates the ideal value

W_{a}^{* T} S_{a} (Z_{a})

, which is defined as follows:

W_{a}^{* T} S_{a} (Z_{a}) = K Q + B \dot{α} - \bar{τ} + ε (Z_{a})

(47)

where

Z_{a}

represents the input data, and

ε (Z_{a})

represents the approximation error.

Define the approximation error as

ζ_{a} = {\tilde{W}}_{a}^{T} S_{a} (Z_{a})

and the error of the actor neural network is designed as follows:

Ψ_{a} = ζ_{a} + K_{4} (\hat{C} (t) - C_{d} (t))

(48)

where

K_{4} \in ℝ^{3} > 0

. The desired cost function C_{d} (t) is taken to be 0.

Define

Ξ_{a} = \frac{1}{2} Ψ_{a}^{T} Ψ_{a}

. The update law of the actor neural network is designed as follows:

{\dot{\hat{W}}}_{a} = - σ_{a} \frac{\partial Ξ_{a}}{\partial {\hat{W}}_{a}}

(49)

Hence, substituting (48) into (49) yields

{\dot{\hat{W}}}_{a} = - σ_{a} \frac{\partial Ξ_{a}}{\partial Ψ_{a}} \frac{\partial Ψ_{a}}{\partial ζ_{a}} \frac{\partial ζ_{a}}{\partial {\hat{W}}_{a}} = - σ_{a} (ζ_{a} + K_{4} \hat{C}) S_{a}

(50)

where

σ_{a} > 0

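is the learning rate of the actor neural network. Because the weight error in ζ_{a} depends on the unknown ideal weight, ζ_{a} is replaced by the network output, and the implementable update law is defined as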

{\dot{\hat{W}}}_{a} = - σ_{a} ({\hat{W}}_{a}^{T} S_{a} (Z_{a}) + K_{4} \hat{C}) S_{a}

(51)
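A forward-Euler sketch of the actor update (51) for a single channel is shown below. The discretization, the scalar treatment of K_4 and the cost estimate, and the function name are assumptions made for illustration.

```python
import numpy as np

def actor_step(W_a, S_a, C_hat, K4, sigma_a, dt):
    """One Euler step of Eq. (51): W_dot = -sigma_a (W^T S + K4*C_hat) S."""
    drive = W_a @ S_a + K4 * C_hat       # actor output plus weighted critic estimate
    return W_a + dt * (-sigma_a * drive * S_a)
```

In a simulation loop this step would typically be called once per sample, after the critic has produced its current cost estimate.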

A Lyapunov function is defined as

V_{3} = \frac{1}{2} \sum_{i = 1}^{3} {\tilde{W}}_{i C}^{T} {\tilde{W}}_{i C}

(52)

where

{\tilde{W}}_{i C} = {\hat{W}}_{i C} - W_{i C}^{*}

is the parameter estimation error.

Differentiating (52) with respect to time gives

{\dot{V}}_{3} = \sum_{i = 1}^{3} {\tilde{W}}_{i C}^{T} {\dot{\hat{W}}}_{i C}

(53)

Substitute Formula (24) into Formula (53) to obtain

{\dot{V}}_{3} = \sum_{i = 1}^{3} {\tilde{W}}_{i C}^{T} (- σ_{i C} (Y_{i} (t) + {\hat{W}}_{i C}^{T} η_{i}) η_{i})

(54)

When

t \to \infty

, we have

ϒ_{i} (t) \leq τ

, where

τ

is a small positive constant. Hence, the following can be obtained:

\begin{array}{l} Y_{i} (t) \leq \frac{1}{ψ_{i}} {\hat{C}}_{i} (t) - {\dot{\hat{C}}}_{i} (t) + τ \\ \leq W_{i C}^{* T} \frac{S_{i C} (Z_{i C})}{ψ_{i}} + \frac{∍_{i C}}{ψ_{i}} - \nabla (W_{i C}^{* T} S_{i C} (Z_{i C}) + ∍_{i C}) {\dot{Z}}_{i C} + τ \\ \leq - W_{i C}^{* T} η_{i} + β_{i C} \end{array}

(55)

where

β_{i C} = \frac{∍_{i C}}{ψ_{i}} - \nabla ∍_{i C} {\dot{Z}}_{i C} + τ

,

‖β_{i C}‖ \leq β_{i C, \max}

is bounded.

Substituting (55) into Formula (54) gives

{\dot{V}}_{3} \leq \sum_{i = 1}^{3} - σ_{i C} {\tilde{W}}_{i C}^{T} ({\tilde{W}}_{i C}^{T} η_{i} + β_{i C}) η_{i}

(56)

Further simplification of the above inequality gives

{\dot{V}}_{3} \leq - \sum_{i = 1}^{3} \frac{1}{2} σ_{i C} {\tilde{W}}_{i C}^{T} {\tilde{W}}_{i C} η_{i}^{T} η_{i} + \sum_{i = 1}^{3} \frac{1}{2} σ_{i C} β_{i C}^{T} β_{i C}

(57)

For the purpose of the stability analysis, the Lyapunov function is formulated as below:

V = V_{1} + V_{2} + \frac{1}{2} {\tilde{W}}_{a}^{T} {\tilde{W}}_{a} + \frac{1}{2} {\tilde{W}}_{C}^{T} {\tilde{W}}_{C}

(58)

Differentiating Equation (58) with respect to time yields

\begin{array}{l} \dot{V} \leq - e_{2}^{T} K_{3} e_{2} - \sum_{i = 1}^{3} \frac{ϖ_{i}^{2}}{(1 - ϖ_{i}^{2})} k_{1 i} + \sum_{i = 1}^{3} e_{2 i}^{T} ({\tilde{W}}_{i a}^{T} S_{i a} - ξ_{i a}) \\ - \sum_{i = 1}^{3} σ_{i a} {\tilde{W}}_{i a}^{T} S_{i a} ({\tilde{W}}_{i a}^{T} S_{i a} + K_{4 i} {\hat{C}}_{i}) - \sum_{i = 1}^{3} σ_{i C} {\tilde{W}}_{i C}^{T} (Y_{i} (t) + {\hat{W}}_{i C}^{T} η_{i}) η_{i} \end{array}

(59)

Since

{\hat{C}}_{i} = W_{i C}^{* T} S_{i C} + {\tilde{W}}_{i C}^{T} S_{i C}

, we have

{\hat{C}}_{i}^{T} {\hat{C}}_{i} \leq 2 {(W_{i C}^{* T} S_{i C})}^{T} W_{i C}^{* T} S_{i C} + 2 {({\tilde{W}}_{i C}^{T} S_{i C})}^{T} {\tilde{W}}_{i C}^{T} S_{i C}

, and the above formula can be reduced to

\begin{array}{l} \dot{V} \leq - e_{2}^{T} K_{3} e_{2} - \sum_{i = 1}^{3} \frac{ϖ_{i}^{2}}{(1 - ϖ_{i}^{2})} k_{1 i} - \sum_{i = 1}^{3} \frac{1}{2} (σ_{i a} - 1) {‖{\tilde{W}}_{i a}‖}^{2} {‖S_{i a}‖}^{2} \\ + \sum_{i = 1}^{3} 2 σ_{i a} K_{4 i}^{2} {‖W_{i C}^{*}‖}^{2} {‖S_{i C}‖}^{2} - \sum_{i = 1}^{3} \frac{1}{2} (σ_{i C} η_{i}^{T} η_{i} - 4 σ_{i a} K_{4 i}^{2} {‖S_{i C}‖}^{2}) {‖{\tilde{W}}_{i C}‖}^{2} \\ + \sum_{i = 1}^{3} \frac{1}{2} σ_{i C} β_{i C, \max}^{T} β_{i C, \max} + \sum_{i = 1}^{3} \frac{1}{2} σ_{i a} {‖W_{i a}^{*}‖}^{2} {‖S_{i a}‖}^{2} \end{array}

(60)

Lemma 2.

Let the initial condition

V (0)

be bounded and let

V (t) \geq 0

be continuous. If [25]

\dot{V} (t) \leq - Γ V + Λ

(61)

where

Γ, Λ > 0

, then

V (t)

is bounded.
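Why inequality (61) implies boundedness can be seen from a standard integrating-factor argument. The short derivation below is a sketch of that textbook step, not a reproduction of the proof in [25].

```latex
\begin{aligned}
\frac{d}{dt} \left( e^{Γ t} V (t) \right) &= e^{Γ t} \left( \dot{V} (t) + Γ V (t) \right) \leq Λ e^{Γ t}, \\
V (t) &\leq V (0) e^{- Γ t} + \frac{Λ}{Γ} \left( 1 - e^{- Γ t} \right) \leq V (0) + \frac{Λ}{Γ}.
\end{aligned}
```

The first term decays exponentially, so V(t) ultimately approaches the level Λ/Γ, while the cruder bound V(0) + Λ/Γ holds for all t ≥ 0.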

Considering Lemmas 1 and 2, (60) can be rewritten as

\begin{array}{l} \dot{V} \leq - e_{2}^{T} K_{3} {\dot{e}}_{2} - \sum_{i = 1}^{3} k_{1 i} \ln \frac{1}{1 - ϖ_{i}^{2}} - \sum_{i = 1}^{3} \frac{1}{2} (σ_{i a} - 1) {‖{\tilde{W}}_{i a}‖}^{2} {‖S_{i a}‖}^{2} \\ + \sum_{i = 1}^{3} 2 σ_{i a} K_{4 i}^{2} {‖W_{i C}^{*}‖}^{2} {‖S_{i C}‖}^{2} - \sum_{i = 1}^{3} \frac{1}{2} (σ_{i C} η_{i}^{2} - 4 σ_{i a} K_{4 i}^{2} {‖S_{i C}‖}^{2}) {‖{\tilde{W}}_{i C}^{T}‖}^{2} \\ + \sum_{i = 1}^{3} \frac{1}{2} σ_{i C} β_{i C, \max}^{T} β_{i C, \max} + \sum_{i = 1}^{3} \frac{1}{2} σ_{i a} {‖W_{i a}^{*}‖}^{2} {‖S_{i a}‖}^{2} \\ \leq - Γ V + Λ \end{array}

(62)

where

Γ = \min \{\begin{array}{l} λ_{\min} (2 K_{3}), λ_{\min} (2 K_{1}), λ_{\min} (σ_{i a} - 1) b_{i s}, \\ λ_{\min} (σ_{i C} g_{i}^{2} - 4 σ_{i a} K_{4 i}^{2} {‖S_{i C}‖}^{2}) \end{array}\}

Λ = \sum_{i = 1}^{3} \frac{1}{2} σ_{i C} β_{i C, \max}^{T} β_{i C, \max} + \sum_{i = 1}^{3} \frac{1}{2} σ_{i a} {‖W_{i a}^{*}‖}^{2} {‖S_{i a}‖}^{2} + \sum_{i = 1}^{3} 2 σ_{i a} K_{4 i}^{2} {‖W_{i C}^{*}‖}^{2} {‖S_{i C}‖}^{2}

where

g_{i} \leq ‖η_{i}‖

,

b_{i s} \leq ‖S_{i a}‖

, and in order to ensure

Γ > 0

, the following conditions must hold:

λ_{\min} (K_{1}) > 0

,

λ_{\min} (σ_{i a} - 1) b_{i s} > 0

,

λ_{\min} (σ_{i C} g_{i}^{2} - 4 σ_{i a} K_{4 i}^{2} {‖S_{i C}‖}^{2}) > 0

,

λ_{\min} (K_{3}) > 0

.
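The conditions above can be checked numerically before running a simulation. The following Python sketch evaluates Γ from its definition under the simplifying assumption that K_1 and K_3 are diagonal, so that their minimum eigenvalues are their smallest diagonal entries. The function name `decay_rate`, its argument names, and the sample values are hypothetical and do not come from the paper.

```python
from typing import Sequence


def decay_rate(
    k1_diag: Sequence[float],
    k3_diag: Sequence[float],
    sigma_a: Sequence[float],
    sigma_c: Sequence[float],
    k4: Sequence[float],
    g: Sequence[float],
    b_s: Sequence[float],
    s_c_norm: Sequence[float],
) -> float:
    """Evaluate Gamma as the minimum over the terms in its definition.

    Assumption: K_1 and K_3 are diagonal, so lambda_min(2K) = 2 * min(diagonal).
    g[i] and b_s[i] are lower bounds on ||eta_i|| and ||S_ia||, and
    s_c_norm[i] stands in for ||S_iC||.
    """
    candidates = [2.0 * min(k3_diag), 2.0 * min(k1_diag)]
    for i in range(len(sigma_a)):
        # Term (sigma_ia - 1) * b_is from the definition of Gamma.
        candidates.append((sigma_a[i] - 1.0) * b_s[i])
        # Term sigma_iC * g_i^2 - 4 * sigma_ia * K_4i^2 * ||S_iC||^2.
        candidates.append(
            sigma_c[i] * g[i] ** 2
            - 4.0 * sigma_a[i] * k4[i] ** 2 * s_c_norm[i] ** 2
        )
    return min(candidates)


# Hypothetical values for illustration only (not the gains of Section 4).
example = dict(
    k1_diag=[1.0, 1.0, 1.0],
    k3_diag=[2.0, 2.0, 2.0],
    sigma_a=[2.0, 2.0, 2.0],
    sigma_c=[5.0, 5.0, 5.0],
    k4=[0.5, 0.5, 0.5],
    g=[1.0, 1.0, 1.0],
    b_s=[0.5, 0.5, 0.5],
    s_c_norm=[1.0, 1.0, 1.0],
)

gamma = decay_rate(**example)
print("Gamma =", gamma, "| conditions hold:", gamma > 0.0)
```

Because Γ is a minimum over all listed terms, Γ > 0 holds exactly when every individual condition holds, so a single sign check is enough.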

Theorem 1.

Suppose that all state information of the flexible high-rise building system is available and that the initial conditions are bounded. Then the presented reinforcement learning control algorithm ensures that all signals of the closed-loop system are semi-globally uniformly bounded.

These system signals

e_{1}

,

e_{2}

,

{\tilde{W}}_{a}

, and

{\tilde{W}}_{C}

will eventually converge to

Ω_{e_{1}}

,

Ω_{e_{2}}

,

Ω_{{\tilde{W}}_{a}}

, and

Ω_{{\tilde{W}}_{C}}

, which are defined as follows:

Ω_{e_{1}} = \{e_{1} \in ℝ^{3} |‖e_{1}‖ \leq \sqrt{2 (V (0) + \frac{Λ}{Γ})}\}

Ω_{e_{2}} = \{e_{2} \in ℝ^{3} |‖e_{2}‖ \leq \sqrt{\frac{2 (V (0) + \frac{Λ}{Γ})}{λ_{\min} (H)}}\}

Ω_{{\tilde{W}}_{a}} = \{{\tilde{W}}_{a} \in ℝ^{l \times 3} |‖{\tilde{W}}_{a}‖ \leq \sqrt{2 (V (0) + \frac{Λ}{Γ})}\}

Ω_{{\tilde{W}}_{C}} = \{{\tilde{W}}_{C} \in ℝ^{l} |‖{\tilde{W}}_{C}‖ \leq \sqrt{2 (V (0) + \frac{Λ}{Γ})}\}
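To make the size of these sets concrete, the following Python sketch evaluates the four radii from the expressions above. The function name `ultimate_bounds` and the numerical inputs are hypothetical; in practice V(0), Λ, Γ and λ_min(H) would come from the design and the initial conditions.

```python
import math


def ultimate_bounds(v0: float, lam: float, gamma: float, lambda_min_h: float) -> dict:
    """Radii of the sets for e_1, e_2, W_a-tilde and W_C-tilde in Theorem 1.

    v0 is V(0), lam is Lambda, gamma is Gamma, and lambda_min_h is the
    minimum eigenvalue of H. All inputs are illustrative assumptions.
    """
    if gamma <= 0.0 or lam <= 0.0 or lambda_min_h <= 0.0 or v0 < 0.0:
        raise ValueError("expected Gamma, Lambda, lambda_min(H) > 0 and V(0) >= 0")
    level = v0 + lam / gamma  # V(0) + Lambda / Gamma
    radius = math.sqrt(2.0 * level)
    return {
        "e1": radius,
        "e2": math.sqrt(2.0 * level / lambda_min_h),
        "W_a": radius,
        "W_C": radius,
    }


# Hypothetical numbers: V(0) = 1, Lambda = 2, Gamma = 2, lambda_min(H) = 4.
print(ultimate_bounds(1.0, 2.0, 2.0, 4.0))
```

The radii shrink as Γ grows or Λ shrinks, which is the usual trade-off when tuning the gains and learning rates.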

4. Numerical Simulations

Through MATLAB R2019b numerical simulations, this section demonstrates the effectiveness of the proposed reinforcement learning control for flexible high-rise buildings subject to time-varying actuator failures and asymmetric state constraints. Table 1 lists the physical parameters.

The control objective is to design a control scheme that achieves

e \to 0, \dot{e} \to 0

when

t \to \infty

. The system initial values are

x_{c} (t) = 0

,

ι (L_{f}, t) = 0.08

, and the system expected values are

x_{c d} (t) = 0.08

,

ι_{d} (L_{f}, t) = 0

.

At

t = 3 s

, a sudden time-varying actuator failure occurs, and the fault is chosen as follows:

s (t) = 20 \cos (t) τ_{1} + 0.1 \sin 0.5

(63)
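As a minimal sketch of how the fault in (63) could enter a simulation, the Python function below maps the commanded input τ_1 to the actuator output s(t). It assumes that the actuator is healthy (s = τ_1) before the fault onset at 3 s, which the text implies but does not state as a formula, and it applies (63) exactly as printed. The names `FAULT_TIME` and `actuator_output` are hypothetical.

```python
import math

FAULT_TIME = 3.0  # s, fault onset stated in the text


def actuator_output(t: float, tau_1: float) -> float:
    """Return the actuator output s(t) for the commanded input tau_1.

    Assumption: before the fault the actuator passes tau_1 through unchanged.
    From the fault onset, Equation (63) is applied as printed, including the
    constant term 0.1 * sin(0.5).
    """
    if t < FAULT_TIME:
        return tau_1
    return 20.0 * math.cos(t) * tau_1 + 0.1 * math.sin(0.5)


# Example: the same command before and after the fault onset.
for t in (1.0, 3.0, 4.0):
    print(t, actuator_output(t, 1.0))
```

After the onset, the multiplicative factor 20 cos(t) changes sign over time, so the effective control direction is time-varying, which is what makes the fault difficult for a fixed-gain controller.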

Case 1. No control input.

τ_{1} = 0

.

Case 2. Under PD control. The controller is presented as follows:

τ_{1} = - k_{p} e - k_{d} \dot{e}

(64)

where the parameters are given as

k_{p} = 463

and

k_{d} = 96

.
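For reference, the PD baseline of (64) with the two gains listed above (463 and 96) can be written as a short Python function. This is only a sketch of the control law itself; the function name `pd_torque` and the sample error value are hypothetical, and the sign convention of the error e follows whatever definition is used in the controller design.

```python
K_P = 463.0  # proportional gain listed for Case 2
K_D = 96.0   # derivative gain listed for Case 2


def pd_torque(e: float, e_dot: float, k_p: float = K_P, k_d: float = K_D) -> float:
    """PD control law of Equation (64): tau_1 = -k_p * e - k_d * e_dot."""
    return -k_p * e - k_d * e_dot


# Hypothetical error of 0.08 m with zero error rate.
print(pd_torque(0.08, 0.0))
```

Unlike the proposed controller, this law has fixed gains and no mechanism for estimating or compensating the fault.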

Case 3. With proposed control (46). Control parameters are selected as follows:

K_{11} = 1.2

,

K_{31} = 23

,

K_{41} = 57

. The asymmetric time-varying state constraints are

{\underline{k}}_{c_{1}} < x_{c} (t) < {\bar{k}}_{c_{1}}

,

{\underline{k}}_{c_{2}} < T_{1} (t) < {\bar{k}}_{c_{2}}

,

{\underline{k}}_{c_{1}} = 0.07 e^{- 0.5 t} + 0.01

,

{\bar{k}}_{c_{1}} = - 0.15 e^{- 2 t} - 0.03

,

{\underline{k}}_{c_{2}} = 0.06 e^{- 0.5 t} + 0.01

,

{\bar{k}}_{c_{2}} = - 0.12 e^{- 2 t} - 0.03

. The learning rates of the actor and critic networks are

σ_{1 a} = 0.01

and

σ_{1 C} = 4.8

.

Based on the above three schemes, the control performance is compared by simulation analysis. The simulation results are as follows:

Figure 3 shows the three-dimensional vibration deformation of the flexible wall without control. The x axis represents the height of the flexible wall, the y axis the simulation time, and the z axis the vibration deflection of the wall. Without control, the vibration deformation of the single-story building system does not converge to 0. Figure 4 shows the corresponding three-dimensional vibration deformation under the proposed controller (46), with the same axes as in Figure 3. Although the sudden failure occurs at 3 s, the vibration of the flexible wall tends to be stable at 5 s and is effectively suppressed.

The comparison curves in Figures 5 and 6 show that, under the same initial conditions, the proposed controller (Case 3) suppresses the floor slab vibration and achieves tracking of the active mass damper within 5 s. Specifically, it rapidly reduces the vibration amplitude of the floor slab and maintains a small tracking error, which indicates strong dynamic control performance. The PD control scheme (Case 2) also suppresses the vibration, but with noticeably lower accuracy and slower response. When the sudden time-varying fault occurs at t = 3 s, the proposed control scheme copes with it more effectively and keeps the vibration and error low, whereas the PD control scheme handles the sudden change poorly and its performance degrades. This indicates that the proposed control scheme is more robust and adaptive in coping with time-varying faults.

Figure 7 compares the vibration acceleration of the floor slab and further supports the advantage of the proposed control scheme. Compared with the PD control scheme, the proposed controller not only suppresses the vibration amplitude of the floor slab but also significantly reduces the vibration acceleration, which helps avoid structural damage caused by excessive acceleration. Finally, Figure 8 compares the control torque in Cases 2 and 3. After the sudden time-varying fault occurs at t = 3 s, the proposed controller compensates for the impact of the fault through fault-tolerant control and keeps the system operating stably, whereas the PD control scheme provides no comparable fault tolerance, resulting in unstable torque changes. The proposed control scheme therefore performs better in vibration suppression, acceleration control and fault tolerance, and is better suited to maintaining system stability and safety under complex and sudden fault conditions.

5. Conclusions

Focusing on the vibration suppression of high-rise building systems under extreme conditions, this paper investigates a reinforcement learning vibration control strategy for a class of flexible building systems with time-varying faults and asymmetric state constraints. Firstly, a mathematical model that accurately describes the dynamic characteristics of flexible high-rise buildings is established, taking into account the time-varying nature of actuator faults. Secondly, a reinforcement learning-based controller is designed to compensate for the impact of faults on system performance. By introducing the time-varying asymmetric Lyapunov function, the system state constraint is guaranteed to ensure the stability and security of the system. The stability of the closed-loop system is strictly proven through Lyapunov stability theory, ensuring that the system can maintain stable vibration suppression performance even in the presence of faults. Finally, the simulation results show that the proposed reinforcement learning vibration control method can effectively suppress the vibration response of flexible high-rise buildings in the face of time-varying actuator faults, demonstrating strong robustness and adaptability, and providing a new and effective approach for solving vibration control problems in actual flexible high-rise buildings.

However, this study has its limitations. The model developed may simplify certain complex practical applications. For example, it may not be able to fully consider the intricate interactions between building structures and foundation soils under extreme conditions, which may affect the practical application of control strategies. In addition, the validation is based on a simulation only. Factors such as unforeseen environmental perturbations, sensor errors, and the complex behavior of building materials under long-term stresses are not considered in practical applications, which poses a challenge to the direct implementation of the proposed method.

For future research, the existing research will be expanded. In terms of theoretical research, we will combine big data analysis and artificial intelligence algorithms to further optimize the reinforcement learning model, mine more potential system characteristics and control laws, and achieve more accurate prediction and response to complex and changing faults and environmental factors. In terms of experimental validation, we will build a simulation platform more similar to the actual situation, simulate different vibration environments and different building types of high-rise building conditions, and accumulate a large amount of experimental data to provide solid support for the theoretical research.

6. Patents

This paper mainly focuses on the vibration suppression problem of flexible high-rise building systems, considers the time-varying fault and asymmetric constraint problem, and also adopts the control strategy based on reinforcement learning to control the system vibration.

Author Contributions

M.L.: conceptualization; methodology; formula derivation, software programming; writing—original draft. R.X.: writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All of the data, models, and code generated or used during the study appear in the submitted article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

RL—reinforcement learning;

L_{f}

—the story height, m;

J_{c}

—the rotor moment of inertia of the damper, kg·m²;

μ_{f}

—the mass of the rigid floor, kg;

μ_{c}

—the mass of the damper for a given time t, kg;

E I

—the flexural rigidity of the flexible wall, N·m²;

ρ

—the uniform mass per unit length of the flexible wall, kg/m;

s (t)

—the input for active boundary vibration control, N;

x_{c} (t)

—the position of the active mass damper within the xoy coordinate system, m;

ι_{l} (y, t)

—the elastic deflection of the left flexible wall at time t and position y;

ι_{r} (y, t)

—the elastic deflection of the right flexible wall at time t and position y;

ι_{l} (L_{f}, t)

—the elastic deflection of the left wall at the

L_{f}

position at time t;

ι_{r} (L_{f}, t)

—the elastic deflection of the right wall at the

L_{f}

position at time t;

C_{i} (t)

—the long-term cost function;

{\dot{\hat{W}}}_{i C}

—the critic neural network update rate;

{\dot{\hat{W}}}_{a}

—the actor neural network update rate;

e_{1}

—the state errors;

e_{2}

—the state variables;

α

—the virtual variable.

References

1. Tippu, J.; Saravanasankar, S.; Sankaranarayanan, B.; Ali, S.M.; Qarnain, S.S.; Karuppiah, K. Towards sustainability: Analysis of energy efficiency factors in buildings of smart cities using an integrated framework. J. Inst. Eng. (India) Ser. A 2023, 104, 223–235.
2. Ren, J.; Pan, P.; Wang, T.; Zhou, Y.; Wang, H.; Shan, M. Interpretation of evaluation standards for seismic toughness of buildings. J. Build. Struct. 2021, 42, 48–56.
3. Meng, B.; Zhao, Y. The dynamics characteristics of flexible spacecraft and its closed loop stability with passive control. J. Syst. Sci. Complex. 2021, 34, 860–872.
4. Feng, J.; Liu, Z.; He, X.; Fu, Q.; Li, G. Adaptive vibration control for an active mass damper of a high-rise building. IEEE Trans. Syst. Man Cybern. Syst. 2020, 52, 1970–1983.
5. Feng, J.; Liu, Z.; He, X.; Li, Q.; He, W. Vibration suppression of a high-rise building with adaptive iterative learning control. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 4261–4272.
6. Tong, S.; Yang, H.; Liu, S. PDE-based adaptive control of a flexible manipulator with actuator and sensor faults. In Proceedings of the 2021 China Automation Congress (CAC), Beijing, China, 22–24 October 2021; pp. 7832–7837.
7. Liu, S.; Tong, S.; Li, Y. Adaptive Boundary Control for Multi-Constrained Rigid-Flexible Robot with Actuator and Sensor Concurrent Failures. IEEE Sens. J. 2023, 23, 26564–26574.
8. Zhang, D.; Kong, L.; He, W.; Yu, X. Fixed-Time Control for a Flexible Smart Structure With Actuator Failure: A Broad Learning System Approach. IEEE Trans. Cybern. 2023, 54, 4322–4334.
9. Yao, X.; Sun, H.; Zhao, Z.; Liu, Y. Event-triggered bipartite consensus tracking and vibration control of flexible Timoshenko manipulators under time-varying actuator faults. IEEE/CAA J. Autom. Sin. 2024, 11, 1190–1201.
10. Ji, N.; Liu, J. Vibration control for a three-dimensional variable length flexible string with time-varying actuator faults and unknown control directions. IEEE Trans. Autom. Sci. Eng. 2022, 20, 2761–2771.
11. Li, L.; Liu, J. Nussbaum function-based adaptive boundary control for flexible manipulator with unknown control directions and nonlinear time-varying actuator faults. Int. J. Robust Nonlinear Control 2023, 33, 6778–6798.
12. Zhang, S.; Li, Q.; Zhao, X.; Liu, Z.; Li, G. Vibration control for an active mass damper of a high-rise building with input and output constraints. IEEE/ASME Trans. Mechatron. 2022, 28, 186–196.
13. Wang, C.; Wu, Y.; Wang, F.; Zhao, Y. TABLF-based adaptive control for uncertain nonlinear systems with time-varying asymmetric full-state constraints. Int. J. Control 2021, 94, 1238–1246.
14. Song, Y.D.; Zhou, S. Tracking control of uncertain nonlinear systems with deferred asymmetric time-varying full state constraints. Automatica 2018, 98, 314–322.
15. Wan, M.; Huang, S. Adaptive output feedback control for a class of uncertain non-strict feedback systems with asymmetric time-varying output constraints. Adv. Mech. Eng. 2020, 12, 1687814020958829.
16. Meng, Q.; Zhu, M.; Lai, X.; Wang, Y.; Wu, M. Iterative-Learning-Based Motion Planning and Position Control of a Single-Link Flexible Manipulator With Vibration Sensor Hysteresis. IEEE/ASME Trans. Mechatron. 2024, 29, 4560–4571.
17. Fu, Q.; Chen, X.Y.; He, W. A survey on 3D visual tracking of multicopters. Int. J. Autom. Comput. 2019, 16, 707–719.
18. Ji, N.; Liu, J. Sliding mode control based on RBF neural network for a class of underactuated systems with input quantization and event-triggering. Trans. Inst. Meas. Control 2024, 46, 280–290.
19. Lan, X.; Yan, J.; He, S.; Zhao, Z.; Zou, T. Distributed cooperative reinforcement learning for multi-agent system with collision avoidance. Int. J. Robust Nonlinear Control 2024, 34, 567–585.
20. Zhao, Z.; He, W.; Zou, T.; Zhang, T.; Chen, C.P. Adaptive Broad Learning Neural Network for Fault-Tolerant Control of 2-DOF Helicopter Systems. IEEE Trans. Syst. Man Cybern. Syst. 2023, 53, 7560–7570.
21. Zhang, H.; Wang, L.; Shi, W. Seismic control of adaptive variable stiffness intelligent structures using fuzzy control strategy combined with LSTM. J. Build. Eng. 2023, 78, 107549.
22. Zhao, B.; Liu, D.; Luo, C. Reinforcement learning-based optimal stabilization for unknown nonlinear systems subject to inputs with uncertain constraints. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 4330–4340.
23. Gao, H.; He, W.; Zhang, L.; Sun, C. Neural-network control of a stand-alone tall building-like structure with an eccentric load: An experimental investigation. IEEE Trans. Cybern. 2020, 52, 4083–4094.
24. Kong, L.; He, W.; Yang, C.; Li, G.; Zhang, Z. Adaptive fuzzy control for a marine vessel with time-varying constraints. IET Control Theory Appl. 2018, 12, 1448–1455.
25. Zhao, X.; Zhang, S.; Liu, Z.; Li, Q. Vibration control for flexible manipulators with event-triggering mechanism and actuator failures. IEEE Trans. Cybern. 2021, 52, 7591–7601.

Figure 1. The model of the one-story flexible building system.

Figure 2. Control structure block diagram of high-rise building system.

Figure 3. Three-dimensional vibration deformation diagram of flexible wall under uncontrolled action.

Figure 4. Three-dimensional vibration deformation diagram of flexible wall with the proposed control.

Figure 5. Comparison of AMD tracking errors.

Figure 6. The position elastic deflection of the wall in cases 2 and 3.

Figure 7. The floor vibration acceleration in cases 2 and 3.

Figure 8. The control torque in cases 2 and 3.

Table 1. Physical parameter table of flexible high-rise building system.

Parameter	Value	Unit
$L_{f}$	0.54	m
$ρ$	0.453	kg/m
$EI$	20	N·m²
$J_{c}$	37	kg·m²
$μ_{c}$	0.65	kg
$μ_{f}$	0.68	kg

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

