Automatic Control Optimization for Large-Load Plant-Protection Quadrotor
Abstract
1. Introduction
2. Materials and Methods
2.1. System Model and Identification
2.2. Attitude Controller Design for Large-Load Plant-Protection Quadrotors
2.2.1. Quadrotor Attitude Controller with Reinforcement Learning
2.2.2. Network Structure and Training Algorithm
- (1) Different agents have different physical parameters and different initial states, and they take different actions, which avoids the experience-correlation problem (a minimal sketch of such randomized agent creation follows this list).
- (2) Compared with traditional RL methods such as deep Q-learning (DQN) [25], parallel training does not rely on experience replay and therefore requires much less memory.
- (3) Several agents provide experience at the same time, greatly accelerating the training process.
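As a concrete illustration of point (1), the following Python sketch spawns parallel agents whose moments of inertia are sampled independently per axis, so their trajectories stay decorrelated. The sampling bounds are illustrative placeholders, not the paper's values (Table 2's numeric bounds did not survive extraction).

```python
import numpy as np

# Minimal sketch of step 1 of Algorithm 1: spawn parallel agents with
# independently sampled physical parameters so their experience stays
# decorrelated. The MOI bounds below are illustrative placeholders only.
rng = np.random.default_rng(seed=0)

def make_agent(moi_lo=(0.10, 0.10, 0.18), moi_hi=(0.35, 0.35, 0.55)):
    """Sample one agent's moments of inertia (kg*m^2), one per body axis."""
    jxx, jyy, jzz = rng.uniform(moi_lo, moi_hi)
    return {"J": np.diag([jxx, jyy, jzz])}

agents = [make_agent() for _ in range(128)]  # 128 agents, as in Table 3
```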
Algorithm 1: Extended-state RL parallel training algorithm

Input:
- $M$: max number of agents
- $E$: max number of episodes
- $T$: max number of steps during an episode
- $r(\cdot)$: the reward function
- $\tau$: soft update rate

1: create $M$ agents with different system parameters
2: randomly initialize the weights $\theta^{Q}$ of the online critic network $Q$ and the weights $\theta^{\mu}$ of the online actor network $\mu$
3: initialize the weights of the target critic network $Q'$ and the target actor network $\mu'$: $\theta^{Q'} \leftarrow \theta^{Q}$, $\theta^{\mu'} \leftarrow \theta^{\mu}$
4: for episode $= 1, \dots, E$ do
5: initialize all agents with random states
6: for step $= 1, \dots, T$ do
7: clear the memory box $B$
8: for agent $i = 1, \dots, M$ do
9: receive the current state $s_i$
10: take action $a_i = \mu(s_i \mid \theta^{\mu})$ according to the online actor
11: receive the next state $s_i'$
12: put $(s_i, a_i, s_i')$ into the memory box $B$
13: end for
14: for $i = 1, \dots, M$ do
15: obtain experience $(s_i, a_i, s_i')$ from $B$
16: compute the reward $r_i = r(s_i, a_i)$ of experience $i$
17: $y_i = r_i + \gamma\, Q'\big(s_i', \mu'(s_i' \mid \theta^{\mu'}) \mid \theta^{Q'}\big)$
18: update $\theta^{Q}$ by minimizing the loss: $L = \frac{1}{N} \sum_j \big( y_j - Q(s_j, a_j \mid \theta^{Q}) \big)^2$
19: update $\theta^{\mu}$ using the sampled policy gradients: $\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_j \nabla_a Q(s, a \mid \theta^{Q})\big|_{s = s_j,\, a = \mu(s_j)}\, \nabla_{\theta^{\mu}} \mu(s \mid \theta^{\mu})\big|_{s_j}$
20: soft update $\theta^{Q'}$ with $\theta^{Q'} \leftarrow \tau\, \theta^{Q} + (1 - \tau)\, \theta^{Q'}$
21: soft update $\theta^{\mu'}$ with $\theta^{\mu'} \leftarrow \tau\, \theta^{\mu} + (1 - \tau)\, \theta^{\mu'}$
22: end for
23: end for
24: end for
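To make the update steps concrete, here is a minimal PyTorch sketch of lines 17-21, following the DDPG-style updates of [21]. The network sizes, learning rates, state/action dimensions, and the random batch are illustrative assumptions, not the paper's architecture or hyperparameters.

```python
import torch
import torch.nn as nn

state_dim, action_dim, batch, gamma, tau = 6, 3, 128, 0.99, 0.005  # placeholders

def mlp(n_in, n_out):
    return nn.Sequential(nn.Linear(n_in, 256), nn.ReLU(), nn.Linear(256, n_out))

actor, actor_targ = mlp(state_dim, action_dim), mlp(state_dim, action_dim)
critic, critic_targ = mlp(state_dim + action_dim, 1), mlp(state_dim + action_dim, 1)
actor_targ.load_state_dict(actor.state_dict())    # step 3: copy online weights
critic_targ.load_state_dict(critic.state_dict())
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def update(s, a, r, s_next):
    # Step 17: targets y_i from the target networks (no gradient flows here).
    with torch.no_grad():
        y = r + gamma * critic_targ(torch.cat([s_next, actor_targ(s_next)], dim=1))
    # Step 18: update the critic by minimizing the mean squared TD error.
    q = critic(torch.cat([s, a], dim=1))
    opt_c.zero_grad()
    nn.functional.mse_loss(q, y).backward()
    opt_c.step()
    # Step 19: ascend the sampled deterministic policy gradient.
    opt_a.zero_grad()
    (-critic(torch.cat([s, actor(s)], dim=1)).mean()).backward()
    opt_a.step()
    # Steps 20-21: soft updates theta' <- tau*theta + (1 - tau)*theta'.
    for net, targ in ((critic, critic_targ), (actor, actor_targ)):
        for p, pt in zip(net.parameters(), targ.parameters()):
            pt.data.mul_(1 - tau).add_(tau * p.data)

# one illustrative update on a random batch of transitions
s, a = torch.randn(batch, state_dim), torch.randn(batch, action_dim)
r, s_next = torch.randn(batch, 1), torch.randn(batch, state_dim)
update(s, a, r, s_next)
```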
3. Results
3.1. Training Environment
3.2. Quadrotor Details
3.3. Training Details
3.4. Performance in Simulation
3.5. Flight Controller Platform Details
3.6. Flight Performance in Real Flight
4. Conclusions
4.1. Discussion
4.2. Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Chovancova, A.; Fico, T.; Duchon, F.; Dekan, M.; Chovanec, L.; Dekanova, M. Control Methods Comparison for the Real Quadrotor on an Innovative Test Stand. Appl. Sci. 2020, 10, 2064. [Google Scholar] [CrossRef] [Green Version]
- Anderson, C. 10 Breakthrough Technologies: Agricultural Drones. MIT Technol. Rev. 2014, 3, 58–60. [Google Scholar]
- PX4 Group. PX4: A Professional Open Source Autopilot Stack. Available online: https://github.com/PX4/Firmware (accessed on 27 March 2021).
- ArduPilot Group. The ArduPilot Project Provides an Advanced, Full-Featured and Reliable Open Source Autopilot Software System. Available online: http://ardupilot.org/ (accessed on 27 March 2021).
- Yu, Y.; Yang, S.; Wang, M.; Li, C.; Li, Z. High performance full attitude control of a quadrotor on SO(3). In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015. [Google Scholar]
- Jafari, H.; Zareh, M.; Roshanian, J.; Nikkhah, A. An Optimal Guidance Law Applied to Quadrotor Using LQR Method. Trans. Jpn. Soc. Aeronaut. Space Sci. 2010, 53, 32–39. [Google Scholar] [CrossRef] [Green Version]
- Falanga, D.; Kleber, K.; Mintchev, S.; Floreano, D.; Scaramuzza, D. The Foldable Drone: A Morphing Quadrotor That Can Squeeze and Fly. IEEE Robot. Autom. Lett. 2019, 4, 209–216. [Google Scholar] [CrossRef] [Green Version]
- Bouabdallah, S.; Siegwart, R. Full control of a quadrotor. In Proceedings of the 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, San Diego, CA, USA, 29 October–2 November 2007. [Google Scholar]
- Bouabdallah, S.; Siegwart, R. Backstepping and Sliding-mode Techniques Applied to an Indoor Micro Quadrotor. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation, Barcelona, Spain, 18–22 April 2005. [Google Scholar]
- Doukhi, O.; Fayjie, A.; Lee, D.J. Global fast terminal sliding mode control for quadrotor UAV. In Proceedings of the 2017 17th International Conference on Control, Automation and Systems (ICCAS), Jeju, Korea, 18–21 October 2017; pp. 1180–1182. [Google Scholar]
- Noormohammadi-Asl, A.; Esrafilian, O.; Arzati, M.A.; Taghirad, H.D. System identification and H-infinity-based control of quadrotor attitude. Mech. Syst. Signal Process. 2020, 135, 106358. [Google Scholar] [CrossRef]
- Liang, X.H.; Wang, Q.; Hu, C.H.; Dong, C.Y. Observer-based H-infinity fault-tolerant attitude control for satellite with actuator and sensor faults. Aerosp. Sci. Technol. 2019, 95, 105424. [Google Scholar] [CrossRef]
- He, Z.; Gao, W.; He, X.; Wang, M.; Liu, Y.; Song, Y.; An, Z. Fuzzy intelligent control method for improving flight attitude stability of plant protection quadrotor UAV. Int. J. Agric. Biol. Eng. 2019, 12, 110–115. [Google Scholar] [CrossRef]
- Sands, T. Development of Deterministic Artificial Intelligence for Unmanned Underwater Vehicles (UUV). J. Mar. Sci. Eng. 2020, 8, 578. [Google Scholar] [CrossRef]
- Sands, T. Optimization Provenance of Whiplash Compensation for Flexible Space Robotics. Aerospace 2019, 6, 93. [Google Scholar] [CrossRef] [Green Version]
- Hwangbo, J.; Sa, I.; Siegwart, R.; Hutter, M. Control of a Quadrotor with Reinforcement Learning. IEEE Robot. Autom. Lett. 2017, 2, 2096–2103. [Google Scholar]
- Lin, X.N.; Yu, Y.; Sun, C.Y. Supplementary Reinforcement Learning Controller Designed for Quadrotor UAVs. IEEE Access 2019, 7, 26422–26431. [Google Scholar] [CrossRef]
- Ma, H.-J.; Xu, L.-X.; Yang, G.-H. Multiple Environment Integral Reinforcement Learning-Based Fault-Tolerant Control for Affine Nonlinear Systems. IEEE Trans. Cybern. 2021, 51, 1913–1928. [Google Scholar] [CrossRef] [PubMed]
- Lin, X.; Liu, J.; Yu, Y.; Sun, C. Event-triggered reinforcement learning control for the quadrotor UAV with actuator saturation. Neurocomputing 2020, 415, 135–145. [Google Scholar] [CrossRef]
- Hu, D.; Pei, Z.; Tang, Z. Single-Parameter-Tuned Attitude Control for Quadrotor with Unknown Disturbance. Appl. Sci. 2020, 10, 5564. [Google Scholar] [CrossRef]
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar] [CrossRef] [Green Version]
- Shi, D.J.; Dai, X.H.; Zhang, X.W.; Quan, Q. A practical performance evaluation method for electric multicopters. IEEE/ASME Trans. Mechatron. 2017, 22, 1337–1348. [Google Scholar] [CrossRef]
- Kingma, D.P.; Ba, J.L. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Wang, Y.; Sun, J.; He, H.B.; Sun, C.Y. Deterministic Policy Gradient with Integral Compensator for Robust Quadrotor Control. IEEE Trans. Syst. Man Cybern. Syst. 2019, 50, 3713–3725. [Google Scholar] [CrossRef]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
- Eigen Group. Eigen Is a C++ Template Library for Linear Algebra: Matrices, Vectors, Numerical Solvers, and Related Algorithms. Available online: https://eigen.tuxfamily.org/ (accessed on 27 March 2021).
Parameter | Value |
---|---|
Mass m | 8 kg |
Moment of inertia (x-axis) | kg·m² |
Moment of inertia (y-axis) | kg·m² |
Moment of inertia (z-axis) | kg·m² |
Arm length d | m |
Lift coefficient | N/(rad/s)² |
Torque coefficient | N·m/(rad/s)² |
Rotor control constant | 316 rad/s |
Rotor dynamic response constant | s |
Volume of the tank | 10 L |
Maximum mass of water | 10 kg |
Density of water | 1000 kg/m³ |
Distance between the quadrotor and water | m |
Length and width of the water cube | m |
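The rotor constants above enter the simulation through the usual quadratic rotor model (cf. the multicopter evaluation method of [22]): thrust and drag torque scale with the square of rotor speed, and the speed itself follows first-order dynamics toward a steady state set by the throttle command. The sketch below assumes that model with a zero idle speed; the rotor control constant (316 rad/s) is from Table 1, while the lift/torque coefficients and time constant are placeholders because their values did not survive extraction.

```python
# Minimal sketch of the quadratic rotor model, under the assumptions above.
C_R = 316.0                            # rotor control constant (rad/s per unit throttle)
c_T, c_M, T_m = 1.2e-5, 2.0e-7, 0.05   # placeholder lift/torque coeffs and time constant

def rotor_step(w, throttle, dt):
    """Advance rotor speed w (rad/s) one step of first-order dynamics."""
    w_ss = C_R * throttle              # commanded steady-state speed
    w += (w_ss - w) / T_m * dt
    return w, c_T * w**2, c_M * w**2   # new speed, thrust (N), drag torque (N*m)

w = 0.0
for _ in range(200):                   # 0.2 s of simulation at dt = 1 ms
    w, f, q = rotor_step(w, 0.6, 1e-3)
print(f"w = {w:.1f} rad/s, thrust = {f:.2f} N")
```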
Parameter | Value |
---|---|
Maximum MOI in the x-axis | kg·m² |
Minimum MOI in the x-axis | kg·m² |
Maximum MOI in the y-axis | kg·m² |
Minimum MOI in the y-axis | kg·m² |
Maximum MOI in the z-axis | kg·m² |
Minimum MOI in the z-axis | kg·m² |
Parameter | Value |
---|---|
Discount factor | |
Learning rate of the critic net | |
Learning rate of the actor net | |
Batch size N | 128 |
Soft update rate | |
Maximum number of agents | 128 |
Maximum number of training episodes | 10,000 |
Control time step | s |
State update time step | s |
Maximum time step in an episode | 5 s |
Reward function constant | |
Reward function constant | |
Reward function constant | |
Reward function constant | |
Reward function constant | |
Reward function constant |
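For reference, the sketch below gathers Table 3's training hyperparameters into a single config object. Only the values that survived extraction are filled in; entries left as None (the discount factor, learning rates, soft update rate, time steps, and reward constants) are unknown here and must be taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional

# Minimal sketch of a training configuration mirroring Table 3.
@dataclass
class TrainConfig:
    batch_size: int = 128
    max_agents: int = 128
    max_episodes: int = 10_000
    episode_length_s: float = 5.0
    discount: Optional[float] = None          # gamma, value missing above
    critic_lr: Optional[float] = None
    actor_lr: Optional[float] = None
    soft_update_rate: Optional[float] = None  # tau
    control_dt_s: Optional[float] = None
    state_dt_s: Optional[float] = None

cfg = TrainConfig()
print(cfg.batch_size, cfg.max_episodes)
```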