Application of Deep Reinforcement Learning in Reconfiguration Control of Aircraft Anti-Skid Braking System

Abstract: The aircraft anti-skid braking system (AABS) plays an important role in aircraft takeoff, taxiing, and safe landing. In addition to disturbances from the complex runway environment, potential component faults, such as actuator faults, can also reduce the safety and reliability of AABS. To meet the increasing performance requirements of AABS under fault and disturbance conditions, a novel reconfiguration controller based on linear active disturbance rejection control combined with deep reinforcement learning was proposed in this paper. The proposed controller treated component faults, external perturbations, and measurement noise as the total disturbance. The twin delayed deep deterministic policy gradient (TD3) algorithm was introduced to realize the parameter self-adjustment of both the extended state observer and the state error feedback law. The action space, state space, reward function, and network structure for the algorithm training were properly designed, so that the total disturbance could be estimated and compensated for more accurately. The simulation results validated the environmental adaptability and robustness of the proposed reconfiguration controller.


Introduction
The aircraft anti-skid braking system (AABS) is an essential airborne utility system for ensuring the safe and smooth landing of aircraft [1]. With the development of aircraft towards high speed and large tonnage, the performance requirements of AABS are increasing. Moreover, AABS is a complex system with strong nonlinearity, strong coupling, and time-varying parameters, and it is sensitive to the runway environment [2]. These characteristics make AABS controller design an interesting and challenging topic.
The control method most widely used in practice is PID + PBM (PID control with pressure bias modulation), which is a speed-differential control law. However, it suffers from low-speed slipping and underutilization of the ground bonding forces, making it difficult to meet high performance requirements. To this end, researchers have proposed many advanced control methods to improve AABS performance, such as mixed slip deceleration PID control [3], model predictive control [4], extremum-seeking control [5], sliding mode control [6], reinforcement Q-learning control [7], and so on. Zhang et al. [8] proposed a feedback linearization controller with a prescribed performance function to ensure the transient and steady-state braking performance. Qiu et al. [9] combined backstepping dynamic surface control with an asymmetric barrier Lyapunov function to obtain a robust tracking response in the presence of disturbances and runway surface transitions. Mirzaei et al. [10] developed a fuzzy braking controller optimized by a genetic algorithm and introduced an error-based global optimization approach for fast convergence near the optimum point. The above-mentioned works provide in-depth studies on AABS control; however, the adverse effects caused by typical component faults, such as actuator faults, are neglected. Since most AABS are designed around hydraulic control systems, the long hydraulic pipes create a substantial risk of air mixing with the oil and of internal leakage. Without regular maintenance, functional degradation or even failure can easily occur, which raises many safety concerns [11,12]. How to ensure the stability and acceptable braking performance of AABS after actuator faults thus becomes a key issue.
In order to improve the safety and reliability of AABS, the fault probability can, on the one hand, be reduced by reliability design and redundancy technology [13]. However, due to production factors (cost, weight, and technological level), the redundancy of aircraft components is so limited that the system reliability is hard to increase further. On the other hand, fault-tolerant control (FTC) technology can be introduced into the AABS controller design; this is the future development direction of AABS and a key technology that needs urgent attention [14]. Reconfiguration control is a popular branch of FTC that has been widely used in many safety-critical systems, especially in aerospace engineering [15,16]. The essence of reconfiguration control is to consider the possible faults of the plant in the controller design process. When component faults occur, the fault system information is used to reconfigure the controller structure or parameters automatically [17]. In this way, the adverse effects caused by faults can be restrained or eliminated, thus realizing an asymptotically stable and acceptable performance of the closed-loop system. A number of common reconfiguration control methods can be classified as follows: adaptive control [18,19], multi-model switching control [20], sliding mode control [21], fuzzy control [22], other robust control [23], etc. Meanwhile, the characteristics of AABS increase the difficulty of accurate modeling, and many nonlinear reconfiguration control methods are complex and relatively hard to apply in engineering. Therefore, it is crucial to design a reconfiguration controller that has a clear structure, is model-independent, is strongly resistant to fault perturbations, and is easy to implement.
Han retained the essence of PID control and proposed the active disturbance rejection control (ADRC) technique, which requires low model accuracy and shows good control performance [24]. ADRC can estimate the internal and external disturbances of a system and compensate for them [25]. Furthermore, ADRC has been widely used in FTC system design because of its obvious advantages in solving control problems for nonlinear models with uncertainty and strong disturbances [26-28]. Although the structure is not difficult to implement with modern digital computer technology, ADRC requires tuning many parameters, which makes it hard to use in practice [29]. To overcome this difficulty, Gao proposed linear active disturbance rejection control (LADRC), which is based on a linear extended state observer (LESO) and linear state error feedback (LSEF) [30,31]. The bandwidth tuning method greatly reduces the number of LADRC parameters, and LADRC has been applied to solve various control problems [32-34].
However, it is well known that a controller with fixed parameters may not be able to maintain the acceptable (rated or degraded) performance of a faulty system. For this reason, advanced algorithms with parameter-adaptive capabilities, such as neural networks [35,36], fuzzy logic [37,38], and sliding mode [39,40], have been introduced by researchers to further improve the robustness and environmental adaptability of ADRC. With the development of artificial intelligence techniques, reinforcement learning has been applied to control science and engineering [41,42], and good results have been achieved. Yuan et al. proposed a novel online control algorithm for a thickener based on reinforcement learning [43]. Pang et al. studied the infinite-horizon adaptive optimal control of continuous-time linear periodic systems using reinforcement learning techniques [44]. A Q-learning-based adaptive method for ADRC parameters was proposed by Chen et al. and applied to ship course control [45]. Motivated by the above observations, in this paper a reconfiguration control scheme combining LADRC with deep reinforcement learning was developed for AABS subject to various fault perturbations. The proposed reconfiguration control method is a remarkable control strategy compared to previous methods for three reasons: (1) AABS is extended with a new state variable, which is the sum of all unknown dynamics and disturbances not captured in the fault-free system description. This state variable can be estimated using the LESO, which indirectly simplifies the AABS modeling; (2) Artificial intelligence technology is introduced and combined with the traditional control method to solve special control problems. By combining LADRC with the deep reinforcement learning TD3 algorithm, the selection of controller parameters is equivalent to the choice of agent actions.
The parameter-adaptive capabilities of the LESO and LSEF are endowed through the continuous interaction between the agent and the environment, which not only eliminates the tedious manual tuning of the parameters, but also results in more accurate estimation and compensation of the adverse effects of fault perturbations; (3) It is a data-driven robust control strategy whose controller parameters are adaptive and which does not require any additional fault detection and identification (FDI) module. Therefore, the proposed method corresponds to a novel combination of active reconfiguration control and FDI-free reconfiguration control, which makes it an interesting solution under unknown fault conditions. The paper is organized as follows. Section 2 describes the AABS dynamics with an actuator fault factor. The reconfiguration controller is presented in Section 3. The simulation results demonstrating the merits of the proposed method are presented in Section 4, and conclusions are drawn in Section 5.

AABS Modeling
The AABS mainly consists of the following components: aircraft fuselage, landing gear, wheels, a hydraulic servo system, a braking device, and an anti-skid braking controller. The subsystems are strongly coupled and exhibit strong nonlinearity and complexity.
Based on the actual process and objective facts of anti-skid braking, the following reasonable assumptions can be made [46]: (1) The aircraft fuselage is regarded as a rigid body with concentrated mass; (2) The gyroscopic moment generated by the engine rotor is not considered during the aircraft braking process; (3) The crosswind effect is ignored; (4) Only the longitudinal deformation of the tire is taken into account and the deformation of the ground is ignored; (5) All wheels are the same and controlled synchronously.

Aircraft Fuselage Dynamics
The force diagram of the aircraft fuselage is shown in Figure 1, and the specific parameters described in the diagram are shown in Table 1. The aircraft force and torque equilibrium equations, together with the relations obtained from the aerodynamic characteristics, follow Reference [46].

Landing Gear Dynamics
The main function of the landing gear is to support and buffer the aircraft, thus bearing the longitudinal and vertical loads. In addition to the wheel and braking device, the struts, buffers, and torque arm are the main components of the landing gear. In this paper, it is assumed that the stiffness of the torque arm is large enough, and the torsional freedom of the wheel with respect to the strut and the buffer is ignored, so the torque arm is not considered.
The buffer can be reasonably simplified as a mass-spring-damping system [46], and the force acting on the aircraft fuselage through the buffer can be described by the equation whose parameters are shown in Table 2. Due to the non-rigid connection between the landing gear and the aircraft fuselage, horizontal and angular displacements are generated under the action of braking forces. Since the struts are cantilever beams, their angular displacements are very small and negligible. Therefore, the lateral stiffness model can be expressed by an equivalent second-order equation whose parameters are shown in Table 3.

Wheel Dynamics
The force diagram of the main wheel brake is shown in Figure 2. It can be seen that during taxiing, the main wheel is subjected to the combined effect of the braking torque M_s and the ground friction torque M_j. Due to the effect of the lateral stiffness, there is a longitudinal axle velocity V_zx along the fuselage, which is the superposition of the aircraft velocity V and the vibration velocity V_d. The dynamics equation of the main wheel is given in [46], and its parameters are shown in Table 4. During braking, the tires are subjected to the braking torque, which keeps the aircraft speed always greater than the wheel speed, that is, V > V_w. Thus, the slip ratio λ is defined to represent the slip of the wheels relative to the runway. For the main wheel, using V_zx instead of V to calculate λ avoids false brake release due to landing gear deformation, thus effectively reducing the landing gear walk. The following equation is used to calculate the slip ratio in this paper: The tire-runway combination coefficient is related to many factors, including the real-time runway conditions, the aircraft speed, and the slip ratio. A simple empirical formula, the 'magic formula' developed by Pacejka [47], is widely used to calculate it and can be expressed as follows: where τ_j (j = 1, 2, 3) are the peak factor, stiffness factor, and curve shape factor, respectively. Table 5 lists the specific parameters for several different runway statuses [48].
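To make the slip-ratio and tire-runway relations concrete, the sketch below (in Python rather than the paper's MATLAB environment) computes the slip ratio from V_zx and evaluates a simplified Pacejka magic formula. The functional form mu = tau1*sin(tau3*arctan(tau2*lam)) and the numeric values of tau1-tau3 are illustrative assumptions; the paper's actual parameters are those of Table 5.

```python
import math

def slip_ratio(v_zx, v_w):
    """Slip ratio from the longitudinal axle velocity V_zx (used instead
    of the aircraft speed V, as the paper does for the main wheel) and
    the wheel circumferential speed V_w."""
    return (v_zx - v_w) / v_zx

def magic_formula(lam, tau1=0.8, tau2=30.0, tau3=1.5):
    """Tire-runway combination coefficient via a simplified Pacejka
    'magic formula'. Both the functional form and the dry-runway-like
    values of tau1 (peak), tau2 (stiffness), tau3 (shape) are
    illustrative assumptions, not the paper's Table 5 entries."""
    return tau1 * math.sin(tau3 * math.atan(tau2 * lam))
```

Note that with this form the coefficient is zero at zero slip, rises to a peak bounded by tau1, and then falls off, which is the qualitative behavior the paper relies on.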

Hydraulic Servo System and Braking Device Modeling
Due to the complex structure of the hydraulic servo system, in this paper, some simplifications have been made so that only electro-hydraulic servo valves and pipes are considered. Their transfer functions are given as follows: whose parameters are shown in Table 6.
It should be noted that the anti-skid braking controller should realize both braking control and anti-skid control. To this end, there is an approximately linear relationship between the brake pressure P and the control current I_c, which can be described as follows: where P_0 = 1 × 10^7 Pa. The braking device serves to convert the brake pressure into brake torque, which is calculated as follows:

M_s = μ_mc N_mc P R_mc (11)

whose parameters are shown in Table 6. The hydraulic servo system, as the actuator of AABS, is inevitably subject to some potential faults. Problems such as hydraulic oil mixing with air, internal leakage, and vibration seriously affect the efficiency of the hydraulic servo system [49]. Therefore, in this paper, the loss of efficiency (LOE) is introduced to represent a typical AABS actuator fault, characterized by a decrease in the actuator gain from its nominal value [26]. In the case of an actuator LOE fault, the brake pressure generated by the hydraulic servo system deviates from the commanded output expected by the controller. In other words, one instead has:

P_fault = k_LOE P (12)

where P_fault represents the actual actuator output, and k_LOE ∈ (0, 1] refers to the LOE fault factor.
Remark 1. An n% LOE is equivalent to the LOE fault gain k_LOE = 1 − n/100; k_LOE = 1 indicates that the actuator is fault-free.

Remark 2.
Note that if the components do not retain the same characteristics as in the fault-free case, it is necessary to establish a fault model. This not only provides an accurate model for the subsequent reconfiguration controller design, but also ensures that the adverse effects caused by the fault perturbation can be effectively observed and compensated for.
Thus, Equation (11) can be rewritten as follows:

M_s = k_LOE μ_mc N_mc P R_mc (13)

where M_s is the actual brake torque.
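The LOE fault model above can be sketched as follows; the friction coefficient, disc count, and radius values are placeholders standing in for the paper's Table 6 parameters.

```python
def loe_gain(n_percent):
    # n% LOE -> k_LOE = 1 - n/100 (Remark 1); k_LOE = 1 means fault-free
    return 1.0 - n_percent / 100.0

def brake_torque(p_cmd, k_loe, mu_mc=0.25, n_mc=4, r_mc=0.2):
    """Actual brake torque under an actuator LOE fault:
    P_fault = k_LOE * P, hence M_s = mu_mc * N_mc * k_LOE * P * R_mc.
    mu_mc, n_mc, and r_mc are illustrative placeholder values."""
    p_fault = k_loe * p_cmd          # degraded actuator output
    return mu_mc * n_mc * p_fault * r_mc
```

For example, the 20% LOE fault considered later in Case 2 corresponds to k_LOE = 0.8, i.e., only 80% of the commanded pressure reaches the braking device.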

Remark 3.
As can be seen from the entire modeling process described above, AABS is nonlinear and highly coupled. The actuator fault leads to a sudden jump in the model parameters with greater internal perturbation compared to the fault-free case. Meanwhile, external disturbances such as the runway environment cannot be ignored.

Problem Description
Although the aircraft has three degrees of freedom, only longitudinal taxiing is considered in AABS. In this paper, AABS adopted the slip-velocity control type [48]; that is, the braked wheel speed V_w was used as the reference input, and the aircraft speed V was dynamically adjusted by the AABS controller to achieve anti-skid braking. According to Section 2, the AABS longitudinal dynamics model can be rewritten as follows:

V̈ = f(V, V̇, d_out, Δf) + b_v u (14)

where f(·) is the controlled plant dynamics, d_out represents the external disturbance, Δf is an uncertain term including component faults, b_v is the control gain, and u is the system input.

Take h(V, V̇, d_out, Δf) = f(V, V̇, d_out, Δf) as the system generalized total perturbation and extend it as a new system state variable, i.e., x_3 = h(V, V̇, d_out, Δf). Then the state equation of System (14) can be obtained:

ẋ_1 = x_2, ẋ_2 = x_3 + b_v u, ẋ_3 = ḣ(V, V̇, d_out, Δf), y = x_1 (15)

where x_1, x_2, x_3 are the system state variables, and h(V, V̇, d_out, Δf) and its differential are assumed to be bounded. For System (14), affected by the total perturbation, a LADRC reconfiguration controller was designed next to restrain or eliminate the adverse effects, thus realizing the asymptotic stability and acceptable performance of the closed-loop system.

LADRC Controller Design
The control schematic of the LADRC is shown in Figure 3. Firstly, the following tracking differentiator (TD) was designed:

v_1(k+1) = v_1(k) + h·v_2(k), v_2(k+1) = v_2(k) + h·fhan(v_1(k) − v_r, v_2(k), r, h) (16)

where v_r is the desired input, v_1 is the transition process of v_r, v_2 is the derivative of v_1, and r and h are adjusted accordingly as the speed and filter coefficients. The function fhan(x_1, x_2, r, h) is defined as follows:

d = rh, d_0 = hd, y = x_1 + hx_2, a_0 = (d² + 8r|y|)^(1/2),
a = x_2 + sign(y)(a_0 − d)/2 if |y| > d_0, otherwise a = x_2 + y/h,
fhan = −r·sign(a) if |a| > d, otherwise fhan = −r·a/d (17)

We established the LESO in the following form:

ż_1 = z_2 + β_1(y − z_1), ż_2 = z_3 + β_2(y − z_1) + b_v u, ż_3 = β_3(y − z_1) (18)

Selecting suitable observer gains (β_1, β_2, β_3), the LESO then enables real-time observation of the variables in System (14): z_1 and z_2 track x_1 and x_2, and z_3 tracks the total perturbation h(V, V̇, d_out, Δf). When z_3 estimates h(V, V̇, d_out, Δf) without error, let the LSEF be:

u_0 = k_1(v_1 − z_1) + k_2(v_2 − z_2) (19)
u = (u_0 − z_3)/b_v (20)

Then System (15) can be simplified to a double-integral series structure:

ÿ = u_0 (21)
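A discrete-time sketch of the TD may clarify how fhan(·) shapes the transition process. The fhan definition below is the standard one from the ADRC literature, and the r and h values are illustrative choices, not the paper's tuned settings.

```python
import math

def fhan(x1, x2, r, h):
    """Han's time-optimal synthesis function used by the tracking
    differentiator (standard ADRC-literature definition)."""
    d = r * h
    d0 = h * d
    y = x1 + h * x2
    a0 = math.sqrt(d * d + 8.0 * r * abs(y))
    if abs(y) > d0:
        a = x2 + math.copysign((a0 - d) / 2.0, y)
    else:
        a = x2 + y / h
    if abs(a) > d:
        return -r * math.copysign(1.0, a)
    return -r * a / d

def td_step(v1, v2, vr, r=100.0, h=0.01):
    # One discrete TD step: v1 tracks vr, v2 approximates its derivative
    v1_next = v1 + h * v2
    v2_next = v2 + h * fhan(v1 - vr, v2, r, h)
    return v1_next, v2_next
```

Iterating td_step with a constant reference drives v_1 to the reference along a smooth transition while v_2 settles near zero, which is exactly the arranged transient the TD provides to the LSEF.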
Further, the bandwidth method [50] was used, and we could obtain:

β_1 = 3ω_o, β_2 = 3ω_o², β_3 = ω_o³ (22)

where ω_o is the observer bandwidth. The larger ω_o is, the smaller the LESO observation errors are; however, the sensitivity of the system to noise may increase, so the selection of ω_o requires comprehensive consideration. Similarly, according to the parameterization method and engineering experience [32], the LSEF parameters can be chosen as:

k_1 = ω_c², k_2 = 2ξω_c (23)

where ω_c is the controller bandwidth, ξ is the damping ratio, and in this paper ξ = 1. Therefore, the parameter tuning problem of the LADRC controller was simplified to the configuration of the observer bandwidth ω_o and the controller bandwidth ω_c.
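The bandwidth parameterization and one LESO update can be sketched as follows. This is a Python stand-in for the paper's MATLAB implementation, and the forward-Euler discretization of the observer is an assumption made for illustration.

```python
def ladrc_gains(wo, wc, xi=1.0):
    """Bandwidth parameterization: observer gains (beta_1, beta_2, beta_3)
    from the observer bandwidth w_o, and LSEF gains (k_1, k_2) from the
    controller bandwidth w_c with damping ratio xi (xi = 1 in the paper)."""
    beta = (3.0 * wo, 3.0 * wo**2, wo**3)
    gains = (wc**2, 2.0 * xi * wc)
    return beta, gains

def leso_step(z, y, u, beta, bv, h):
    """One forward-Euler LESO update; z = (z1, z2, z3) track x1, x2,
    and the total perturbation h(.), respectively."""
    z1, z2, z3 = z
    e = y - z1                            # output estimation error
    z1 += h * (z2 + beta[0] * e)
    z2 += h * (z3 + beta[1] * e + bv * u)
    z3 += h * (beta[2] * e)
    return (z1, z2, z3)
```

With ω_o = 10 rad/s this yields β = (30, 300, 1000), illustrating how a single bandwidth knob fixes all three observer gains.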

TD3 Algorithm
The TD3 algorithm is an off-policy RL algorithm based on DDPG, which was proposed in 2015 [51]. TD3 adopts a method similar to that of Double-DQN [52] to reduce the overestimation in function approximation, delays the update frequency of the actor network, and adds noise to the target actor network to alleviate the sensitivity and instability of DDPG. The structure of TD3 is shown in Figure 4.
The parameters of the critic networks are updated by minimizing the loss:

L(θ_i) = N⁻¹ Σ [y − Q_θi(s, a)]² (24)

where s is the current state, a is the current action, and Q_θi(s, a) stands for the parameterized state-action value function Q with parameter θ_i, and
y = r + γ min_{i=1,2} Q_θ'i(s', ã) (25)

is the target value of the function Q_θ(s, a), γ ∈ [0, 1] is the discount factor, and the target action ã is defined as:

ã = π_φ'(s') + ε, ε ~ clip(N(0, σ̃), −c, c), c > 0 (26)

where the noise ε follows a clipped normal distribution. This implies that ε is a random variable drawn from N(0, σ̃) and restricted to the interval [−c, c].
The inputs of the actor network are both Q_θ(s, a) from the critic network and the minibatch from the memory, and the output is the action given by:

a = π_φ(s) + ε, ε ~ N(0, σ) (27)

where φ is the parameter of the actor network, and π_φ(s) is the output of the actor network, which is a deterministic and continuous value. The noise ε follows the normal distribution N(0, σ) and is added for exploration. The parameters of the actor network are updated based on the deterministic policy gradient:

∇_φ J(φ) = N⁻¹ Σ ∇_a Q_θ1(s, a)|_{a=π_φ(s)} ∇_φ π_φ(s) (28)

TD3 updates the actor network and all three target networks every d steps in order to avoid too fast a convergence. The parameters of the critic target networks and the actor target network are updated according to:

θ'_i ← τθ_i + (1 − τ)θ'_i, φ' ← τφ + (1 − τ)φ' (29)

The pseudocode of the proposed approach is given in Algorithm 1.
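The target-value computation described above (target-policy smoothing plus the clipped double-Q minimum) can be sketched as follows; actor_t, q1_t, and q2_t are hypothetical callables standing in for the target networks.

```python
import numpy as np

def td3_target(r, s_next, actor_t, q1_t, q2_t,
               gamma=0.99, sigma=0.2, c=0.5, rng=None):
    """TD3 target value: clipped Gaussian noise is added to the target
    action (Eq. (26)-style smoothing), then the minimum of the two
    target critics bounds the value estimate."""
    rng = rng or np.random.default_rng(0)
    a = actor_t(s_next)
    eps = np.clip(rng.normal(0.0, sigma, a.shape), -c, c)
    a_tilde = np.clip(a + eps, -1.0, 1.0)    # smoothed target action
    return r + gamma * min(q1_t(s_next, a_tilde), q2_t(s_next, a_tilde))
```

Taking the minimum over the two critics is what counters the value overestimation that plain DDPG suffers from.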

TD3-LADRC Reconfiguration Controller Design
Lack of environmental adaptability, poor control performance, and weak robustness are the main shortcomings of parameter-fixed controllers [36]. When a fault occurs, such a controller may not be able to maintain the acceptable (rated or degraded) performance of the damaged system. Motivated by the above analysis, a reconfiguration controller called TD3-LADRC is proposed in this paper, and its control schematic is shown in Figure 5.

The deep reinforcement learning algorithm TD3 is introduced to realize the LADRC parameter adaptation. The details of each part have been described above. The selection of the control parameters is treated as the agent's action a_t, and the response result of the control system s_t is considered as the state, i.e., as follows: where e = V − V_w, and s_obs is the agent observation vector.
The range of each controller parameter is selected as follows: The reward function plays a crucial role in reinforcement learning. The appropriateness of the reward function design directly affects the training effect of the reinforcement learning, which in turn affects the effectiveness of the whole reconfiguration controller. According to the working characteristics of AABS, the following reward function was selected after several attempts to ensure stable and smooth braking: The stop conditions for each training episode are as follows, any one of which terminates the episode: (1) the aircraft speed V < 2 m/s; (2) the error between the main wheel speed and the aircraft speed e > 20; (3) the simulation time t > 20 s.

Remark 4. TD3, TD, LESO, and LSEF together constitute the TD3-LADRC controller. Compared to normal LADRC, TD3-LADRC realizes parameter adaptation, which makes the controller reconfigurable. The robustness and disturbance immunity are greatly improved, and the adverse effects caused by the total perturbations, including faults, can be effectively compensated for.
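One common way to realize a bounded action space and the episode stop conditions is sketched below. The affine mapping from a tanh-bounded agent output to a [low, high] parameter range is an assumption about the implementation; the three termination tests follow the conditions listed above.

```python
def scale_action(a_norm, low, high):
    """Map a tanh-bounded agent output in [-1, 1] to a controller
    parameter range [low, high]; the actual ranges for w_o and w_c
    are the ones selected in the paper."""
    return low + (a_norm + 1.0) / 2.0 * (high - low)

def episode_done(v, e, t):
    # Stop conditions: aircraft nearly stopped, excessive wheel-speed
    # error (deep slip), or simulation time limit reached
    return v < 2.0 or e > 20.0 or t > 20.0
```

The scaling guarantees that whatever the actor outputs, the resulting ω_o and ω_c stay inside their admissible ranges, so the LESO and LSEF gains derived from them remain well defined throughout training.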

TD3-LESO Estimation Capability Analysis
In order to prove the stability of the whole closed-loop system, the convergence of the TD3-LESO is first analyzed in conjunction with Assumption 1 [53]. Let the estimation errors of the TD3-LESO be x̃_i = x_i − z_i, i = 1, 2, 3; then the estimation error equation of the observer can be obtained as:

x̃̇_1 = x̃_2 − β_1 x̃_1, x̃̇_2 = x̃_3 − β_2 x̃_1, x̃̇_3 = ḣ − β_3 x̃_1 (33)

Equation (33) can be rewritten in the compact form x̃̇ = A x̃ + E ḣ (34), with A = [−β_1, 1, 0; −β_2, 0, 1; −β_3, 0, 0] and E = [0, 0, 1]^T. Based on Assumption 1 and Theorem 2 in Reference [54], the following theorem can be obtained (Theorem 1); the proof is given in Appendix A. Thus, it is clear that there exist three positive numbers υ_i, i = 1, 2, 3, such that the state estimation error |x̃_i| ≤ υ_i holds, i.e., the TD3-LESO estimation errors are bounded, and the observer can effectively estimate the states of the controlled plant and the total perturbation.

Stability Analysis of Closed-loop System
The closed-loop system, consisting of the control laws (19) and (20) and the controlled object (21), is:

V̈ = h(V, V̇, d_out, Δf) − z_3 + k_1(v_1 − z_1) + k_2(v_2 − z_2) (35)
If we define the tracking errors as ε_i = v_i − x_i, i = 1, 2, then we can obtain the tracking-error equation (37). By solving Equation (37) and combining Assumption 1 with Theorems 1, 3, and 4 in the literature [54], the following theorem was proposed to analyze the stability of the closed-loop system: Theorem 2. Under the condition that the TD3-LESO estimation errors are bounded, there exists a controller bandwidth ω_c such that the tracking error of the closed-loop system is bounded. Thus, for a bounded input, the output of the closed-loop system is bounded, i.e., the closed-loop system is BIBO-stable.
See the Appendix A for proof.

Simulation Results
In order to verify the reconfiguration and disturbance rejection capabilities of the proposed method, the corresponding simulations were carried out in this section and compared with the conventional PID + PBM and LADRC.
The initial states of the aircraft are set as follows: (1) The initial speed of aircraft landing V(0) = 72 m/s; (2) The initial height of the center of gravity Hh = 2.178 m.
To prevent deep wheel slippage as well as tire blowout, the wheel speed was kept following the aircraft speed quickly at first, and the brake pressure was applied only after 1.5 s. The anti-skid brake control was considered to be over when V was less than 2 m/s.
In the experiment, both the critic networks and the actor network were realized by fully connected neural networks with three hidden layers of (50, 25, 25) neurons. The activation function of the hidden layers was the ReLU function, and the activation function of the output layer of the actor network was the tanh function. In addition, the parameters of the actor network and the critic networks were tuned by the Adam optimizer. The remaining parameters of TD3-LADRC are shown in Table 7.
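The actor architecture described above can be sketched as a NumPy forward pass. This is only an illustrative reimplementation with arbitrary weight initialization; the paper trained the networks with MATLAB's Reinforcement Learning Toolbox.

```python
import numpy as np

def init_mlp(sizes, rng):
    # sizes e.g. (obs_dim, 50, 25, 25, act_dim), matching the paper's
    # three hidden layers of (50, 25, 25) neurons
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def actor_forward(params, x):
    """Forward pass of the actor network: ReLU hidden layers and a
    tanh output layer, as described in the experiment setup."""
    for w, b in params[:-1]:
        x = np.maximum(x @ w + b, 0.0)   # ReLU hidden layers
    w, b = params[-1]
    return np.tanh(x @ w + b)            # bounded action output
```

Because of the tanh output layer, every component of the action lies in (−1, 1), which pairs naturally with the action-scaling step used to recover ω_o and ω_c.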

Remark 5.
It is noted that the braking time t and braking distance x are selected as the criteria for braking efficiency, and the system stability is observed by slip rate λ.
The model simulation was carried out in MATLAB 2022a, and the TD3 algorithm was realized through the Reinforcement Learning Toolbox. The simulation time was 20 s, and the sampling time was 0.001 s. The training stopped when the average reward reached 12,000 and took about 6 h to complete. The learning curves of the reward obtained by the agent for each interaction with the environment during the training process are shown in Figure 6. It can be seen that at the beginning of the training, the agent was in the exploration phase and the reward obtained was relatively low. Later, the reward gradually increased; after 40 episodes, the reward was steadily maintained at a high level and the algorithm gradually converged.

Case 1: Fault-Free and External Disturbance-Free in Dry Runway Condition
The simulation results of the dynamic braking process for different control schemes are shown in Figures 7 and 8 and Table 8.
As can be seen from Figure 7, PID + PBM leads to numerous skids during braking, which may cause serious wear to the tires. In contrast, LADRC and TD3-LADRC not only skid less frequently, but also achieve shorter braking time and braking distance. Moreover, the control effect of TD3-LADRC is better than that of LADRC. Figure 8 shows that TD3-LADRC can dynamically tune the controller parameters to accurately observe and compensate for the total disturbances, and thus improve the AABS performance.


Case 2: Actuator LOE Fault in Dry Runway Condition
The fault considered here assumed a 20% actuator LOE at 5 s and escalated to 40% LOE at 10 s. The simulation results are shown in Figures 9 and 10 and Table 9.
As can be seen in Figure 9, PID + PBM continuously performed large braking and releasing operations under the combined effect of fault and disturbance. This makes braking much less efficient and risks dragging and flat tires. In addition, LADRC cannot brake the aircraft to a stop, which is not acceptable in practice. Figure 9c shows that there is a high frequency of wheel slip in the low-speed phase. In contrast, TD3-LADRC retains the experience gained from the agent's prior training and continuously adjusts the controller parameters online based on the plant states, which ultimately allows the aircraft to brake smoothly. From Figure 10a, it can be seen that the total fault perturbations are estimated quickly and accurately by the adaptive LESO. Overall, TD3-LADRC not only improves the robustness and disturbance immunity of the controller under fault-perturbed conditions, but also significantly improves the safety and reliability of AABS.

Remark 6. During the braking process, it is observed that at some instants ω_c = 0. This does not affect the stability of the whole system. On the one hand, the value of ω_c does not change the fact that A_ε is Hurwitz (see the proof of Theorem 2 for details). On the other hand, ω_c is constantly changed by the agent through continuous interaction with the environment, and at these instants the agent considers ω_c = 0 optimal, i.e., no anti-skid braking control leads to better braking results.


Case 3: Actuator LOE Fault in Mixed Runway Condition
The mixed runway structure is as follows: dry runway in the interval of 0-10 s, wet runway in the interval of 10-20 s, and snow runway after 20 s. The fault considered here assumed a 10% actuator LOE at 10 s. The simulation results are shown in Figures 11 and 12 and Table 10.
The deterioration of the runway conditions results in a very poor tire-ground bond. It can be seen from Figure 11 that both the braking time and the braking distance increased compared to the dry runway. Figure 12 shows that TD3-LADRC is still able to achieve controller parameter adaptation, accurately observe the total fault perturbations, and effectively compensate for the adverse effects. The whole reconfiguration control system adapts well to runway changes, and the environmental adaptability of AABS is improved.


Conclusions
A linear active disturbance rejection reconfiguration control scheme based on deep reinforcement learning was proposed to meet the higher performance requirements of AABS under fault-perturbed conditions. According to the composition structure and working principle, an AABS mathematical model with an actuator fault factor was established. A TD3-LADRC reconfiguration controller was developed, and the parameters of the LSEF and LESO were adjusted online using the TD3 algorithm. The simulation results under different conditions verified that the designed controller can effectively improve the anti-skid braking performance under faults and perturbations, as well as in different runway environments. It strengthened the robustness, immunity, and environmental adaptability of AABS, thereby improving the safety and reliability of the aircraft. However, TD3-LADRC is complex, and its control effectiveness was verified only by simulations in this paper; the combined effect of the various uncertainties of practical applications on the robustness of the controller could not be fully considered. Therefore, in future work, it is necessary to build an aircraft braking hardware-in-the-loop experimental platform consisting of the host PC, the target CPU, the anti-skid braking controller, the actuators, and the aircraft wheel. The host PC and the target CPU constitute the software simulation part, while the remaining components constitute the hardware part.

Proof of Theorem 1. Given that V(τ) and f are bounded, we have: ... we can attain: Considering that A_3 is Hurwitz, there is thus a finite time T_1 such that for any t ≥ T_1, i, j = 1, 2, 3, the following formula holds [54]: Therefore, the following formula is satisfied: Finally, we can attain: From Equations (A3), (A4), and (A7) we can attain: Let ε_sum(0) = |ε_1(0)| + |ε_2(0)| + |ε_3(0)|; then for all t ≥ T_1, the following formula holds: From Equation (A1) we can attain: For all t ≥ T_1, i = 1, 2, 3, the above formula holds.
Proof of Theorem 2. According to Equation (37) and Theorem 1, we can attain: where k_sum = 1 + k_1 + k_2. Substituting the controller bandwidth gives k_sum = 1 + ω_c^2 + 2ω_c, and taking the parameters in this way ensures that A_ε is Hurwitz [54].
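The substitution used above is the standard bandwidth parameterization of the LSEF gains; restated for completeness (the symbols match the proof, but the derivation line is ours):

```latex
% Standard LADRC bandwidth parameterization of the LSEF gains
k_1 = \omega_c^2, \qquad k_2 = 2\omega_c
% hence
k_{\mathrm{sum}} = 1 + k_1 + k_2
             = 1 + \omega_c^2 + 2\omega_c
             = (1 + \omega_c)^2 > 0
```

Since k_sum is a perfect square in ω_c, it is strictly positive for any controller bandwidth ω_c > 0, which is what the Hurwitz argument requires.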