Learning to Have a Civil Aircraft Take Off under Crosswind Conditions by Reinforcement Learning with Multimodal Data and Preprocessing Data
Abstract
1. Introduction
2. Related Works
3. Technical Background
3.1. Reinforcement Learning and DDPG
3.2. Simulation Environment
4. Methodology
4.1. State Information for Reinforcement Learning
4.1.1. Flight Status Data
- Positional and rotational information: The positional data include the longitude, latitude and altitude of the aircraft. Generally, this information can be obtained from GPS, a ground-based augmentation system or air pressure sensors. The rotational data include the pitch, roll and heading of the aircraft.
- Velocity information: The velocity information of an aircraft includes the linear velocities along the longitude, latitude and altitude directions and the angular rates of change of pitch, roll and heading, which correspond successively to the three positional and three rotational variables above.
- True airspeed: The true airspeed, i.e., the speed of the aircraft relative to the air along the heading axis, is also needed; it is a critical factor in the autopilot system's operational decisions.
- Wind speed: The wind is provided to the RL model as a two-component vector consisting of the wind speed and the angle between the wind and the aircraft heading. In this research, we consider only these two components of the wind on the horizontal plane (the wind speed in the vertical direction is excluded).
- Control information: The control information used in this study is the last control command sent to the aircraft; it consists of the rudder, elevator, aileron and throttle commands.
- Deviation from the centerline of the airstrip: The aircraft must stay on the centerline of the airstrip during the take-off run, so the deviation from the centerline is also input to the autopilot algorithm. To compute the deviation, we establish a coordinate system with the starting point of the airstrip as the origin and transform the position data of the aircraft into this coordinate system. Let p denote the position of the aircraft in this coordinate system and d a unit vector that indicates the direction of the airstrip; the deviation is then the perpendicular distance from the aircraft to the centerline, D = ‖p − (p · d)d‖ (a computation sketch is given after this list).
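A minimal sketch of this deviation computation is shown below; the NumPy implementation and the example runway values are illustrative assumptions for this example, not values taken from the paper.

```python
import numpy as np

def centerline_deviation(aircraft_xy, runway_direction):
    """Perpendicular distance D from the aircraft to the runway centerline.

    aircraft_xy      -- aircraft position (metres) in the runway coordinate system,
                        whose origin is the starting point of the airstrip
    runway_direction -- 2-D vector pointing along the airstrip
    """
    p = np.asarray(aircraft_xy, dtype=float)
    d = np.asarray(runway_direction, dtype=float)
    d = d / np.linalg.norm(d)                           # unit vector along the centerline
    return float(np.linalg.norm(p - np.dot(p, d) * d))  # remove the along-track component

# Example: 40 m down a runway aligned with the x-axis, 1.5 m off the centerline
print(centerline_deviation([40.0, 1.5], [1.0, 0.0]))    # -> 1.5
```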
4.1.2. Preprocessing Data
4.1.3. Visual Information
4.2. Reward Function
4.2.1. Out-of-Bounds Punishments
4.2.2. Rewards for Tentative Movements
4.3. Experience Replay
4.4. Architecture of RL Model
4.5. Implementation Details
Algorithm 1 Core steps of the proposed RL algorithm
Randomly initialize the eval critic network Q(s, a | θ^Q) and the eval actor μ(s | θ^μ) with weights θ^Q and θ^μ.
Initialize the target networks Q′ and μ′ with weights θ^{Q′} ← θ^Q, θ^{μ′} ← θ^μ.
Initialize the experience memory M.
Initialize the actor replacement counter c_a ← 0.
Initialize the critic replacement counter c_c ← 0.
Initialize the actor replacement interval I_a.
Initialize the critic replacement interval I_c.
for episode = 1 to Z do
  Initialize a random process N for action exploration.
  Initialize the aircraft and observe the initial state s_1.
  for t = 1 to L do
    Select action a_t = μ(s_t | θ^μ) + N_t according to the current policy and the exploration noise.
    Run action a_t and compute the reward r_t according to the method in Section 3.2.
    Observe the new state s_{t+1}.
    Store the transition (s_t, a_t, r_t, s_{t+1}) in M.
    Sample a minibatch of K transitions (s_i, a_i, r_i, s_{i+1}) from M.
    Set y_i = r_i + γ Q′(s_{i+1}, μ′(s_{i+1} | θ^{μ′}) | θ^{Q′}).
    Update the critic by minimizing the loss L = (1/K) Σ_i (y_i − Q(s_i, a_i | θ^Q))².
    Update the actor policy using the sampled policy gradient (Equation (3)).
    if c_c mod I_c = 0 then
      Update the target critic: θ^{Q′} ← τ θ^Q + (1 − τ) θ^{Q′}.
    end if
    if c_a mod I_a = 0 then
      Update the target actor: θ^{μ′} ← τ θ^μ + (1 − τ) θ^{μ′}.
    end if
    Update c_a and c_c by c_a ← c_a + 1, c_c ← c_c + 1.
  end for
end for
Get the target actor μ′(s | θ^{μ′}).
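To make the update step concrete, the following is a minimal PyTorch sketch of one DDPG training iteration with interval-based target-network updates, mirroring Algorithm 1. The network architectures, layer sizes and the hyperparameter values shown here are illustrative assumptions and do not reproduce the exact configuration used in this paper.

```python
# Minimal DDPG update sketch (PyTorch). Network sizes, state/action dimensions and
# hyperparameters are illustrative; they do not reproduce the paper's exact setup.
import torch
import torch.nn as nn

GAMMA, TAU = 0.9, 0.01                        # discount and soft-update factors (assumed values)
ACTOR_INTERVAL, CRITIC_INTERVAL = 800, 600    # target replacement intervals

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                                 nn.Linear(256, action_dim), nn.Tanh())
    def forward(self, s):
        return self.net(s)                    # actions in [-1, 1] (rudder, elevator, aileron, throttle)

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def soft_update(target, source, tau=TAU):
    """Blend eval-network weights into the target network."""
    for t_param, s_param in zip(target.parameters(), source.parameters()):
        t_param.data.copy_(tau * s_param.data + (1.0 - tau) * t_param.data)

def ddpg_update(step, batch, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt):
    # batch: tensors sampled from the replay memory M; r is a column vector of shape (K, 1)
    s, a, r, s_next = batch
    with torch.no_grad():                     # TD target y_i from the target networks
        y = r + GAMMA * target_critic(s_next, target_actor(s_next))
    critic_loss = nn.functional.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(s, actor(s)).mean()  # sampled policy gradient
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    if step % CRITIC_INTERVAL == 0:           # interval-based target replacement
        soft_update(target_critic, critic)
    if step % ACTOR_INTERVAL == 0:
        soft_update(target_actor, actor)
```

Before training starts, the target networks would typically be created as deep copies of the eval networks, matching the initialization step of Algorithm 1.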
5. Experiments
5.1. Experiment 1: Learning to Take Off under Crosswind Conditions at Different Speeds
5.2. Experiment 2: Comparison of Learning with and without Visual Data
5.3. Experiment 3: Comparison of Learning with and without Preprocessing Data
6. Conclusions and Future Studies
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Symbol | Description |
---|---|
 | Longitude |
 | Latitude |
 | Altitude |
 | Pitch |
 | Roll |
 | Heading |
 | Velocity in the longitude direction |
 | Velocity in the latitude direction |
 | Velocity in the altitude direction |
 | Rotational velocity of pitch |
 | Rotational velocity of roll |
 | Rotational velocity of heading |
 | True airspeed |
 | Wind speed |
 | The angle between wind speed and the aircraft heading |
 | The last control command on the rudder |
 | The last control command on the elevator |
 | The last control command on the aileron |
 | The last control command on the throttle |
D | Deviation from the centerline of the airstrip |
 | Preprocessing function for true airspeed |
 | Preprocessing function for rudder control |
 | Preprocessing function for elevator control |
 | Preprocessing function for aileron control |
 | Preprocessing function for wind speed |
 | Preprocessing function for the angle between the wind and the aircraft heading |
Data Received From X-Plane | Data Sent to X-Plane |
---|---|
Longitude | Operations on elevator |
Latitude | Operations on aileron |
Altitude | Operations on rudder |
Angle of pitch | Operations on throttle |
Angle of roll | |
Heading | |
Velocity along the longitude | |
Velocity along the latitude | |
Velocity along the altitude | |
Angular rate of change of pitch | |
Angular rate of change of roll | |
Angular rate of change of heading | |
True airspeed | |
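As an illustration of this exchange, the sketch below reads state data from X-Plane and sends control commands back through the X-Plane Connect toolbox. The dataref names and the sendCTRL ordering follow the publicly documented toolbox interface, but they are assumptions made for this example and should be verified against the toolbox documentation; the paper does not list the exact calls it uses.

```python
import xpc  # NASA X-Plane Connect toolbox: https://github.com/nasa/XPlaneConnect

# Dataref names are illustrative and should be checked against the X-Plane dataref listing.
EXTRA_DREFS = [
    "sim/flightmodel/position/true_airspeed",  # true airspeed (m/s)
    "sim/flightmodel/position/P",              # roll rate (deg/s)
    "sim/flightmodel/position/Q",              # pitch rate (deg/s)
    "sim/flightmodel/position/R",              # heading (yaw) rate (deg/s)
]

def read_state(client):
    """Receive the positional/rotational data plus the extra datarefs listed above."""
    lat, lon, alt, pitch, roll, heading, _gear = client.getPOSI()
    extras = [values[0] for values in client.getDREFs(EXTRA_DREFS)]
    return [lon, lat, alt, pitch, roll, heading] + extras

def send_controls(client, elevator, aileron, rudder, throttle):
    """Send the four control commands; sendCTRL expects [elevator, aileron, rudder, throttle]."""
    client.sendCTRL([elevator, aileron, rudder, throttle])

client = xpc.XPlaneConnect()
try:
    state = read_state(client)
    send_controls(client, elevator=0.0, aileron=0.0, rudder=0.05, throttle=1.0)
finally:
    client.close()
```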
Hyperparameter | Value |
---|---|
Memory capacity (one of the 24 h) | 20,000 |
Learning rate (actor) | |
Learning rate (critic) | |
Optimization method | Adam |
Discount factor | |
Actor replacement interval | 800 |
Critic replacement interval | 600 |
Soft update factor | |
Batch size | 32 |