Drone Control in AR: An Intuitive System for Single-Handed Gesture Control, Drone Tracking, and Contextualized Camera Feed Visualization in Augmented Reality
- Gesture control for drones, supporting all drone and camera motions, including complex motions;
- Drone tracking in AR, based on cumulative Inertial Measurement Unit (IMU) readings and an initial relative pose calibration, while exploring visual-based drift correction;
- Visualization of the drone’s camera feed in AR in context with the real world, considering the relative pose between drone and user, with the additional option to explore and integrate depth estimation for more accurate placement.
- A unified solution encompassing all of the above, consisting of a HoloLens 2 user interface app, an Android UAV interface app, and a Kafka message broker.
2. Related Work
2.1. Gesture Control
2.3. Computer Vision for AR
2.4. Situation Awareness and Virtual Information Visualization in AR
2.5. Visualization in Context
- Hand gestures are captured, interpreted, translated to drone commands, and forwarded to the drone.
- Telemetry data are used to estimate the drone’s current position, which is then sent to the AR device.
- Live video is captured by the drone’s camera, transmitted to the remote controller, and then streamed directly to the AR device. (A sketch of the command and telemetry message flow is given below; video deliberately bypasses the broker.)
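To make this message flow concrete, the following sketch illustrates how control commands and position telemetry might be exchanged through the Kafka broker. The broker address, topic names (`drone.commands`, `drone.telemetry`), and JSON message fields are illustrative assumptions, not the system’s actual wire format.

```python
# Illustrative sketch of the Kafka message flow; topic names and the
# JSON schema are assumptions for illustration only.
import json
from kafka import KafkaProducer, KafkaConsumer

BROKER = "broker.local:9092"  # hypothetical broker address

# AR app side: publish a control command derived from a gesture.
producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
command = {"pitch": 0.4, "roll": 0.0, "yaw": -0.1, "throttle": 0.2}
producer.send("drone.commands", command)
producer.flush()

# AR app side: consume position estimates posted by the Android app.
consumer = KafkaConsumer(
    "drone.telemetry",
    bootstrap_servers=BROKER,
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    position = message.value  # e.g., {"x": ..., "y": ..., "z": ...}
    print("drone at", position)
```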
3.4. Augmented Reality Application
- Drone control commands: This module is responsible for capturing the gesture/voice interactions of the user, translating them to specific control commands (i.e., pitch, roll, yaw, and throttle values), and sending them to the Android app via the Kafka broker (a sketch of this mapping is given after this list).
- Drone tracking: The drone tracking module is responsible for monitoring the spatial relation between the real-world elements (the drone and the app user) and their virtual representatives. Drone tracking is the backbone of many feedback cues (visual, textual, contextualized videos) that provide the user with enhanced spatial perception. Tracking is achieved through an initial calibration process followed by an IMU-based approach.
- Video feedback: The live video feed from the drone’s camera is projected on a canvas inside the augmented environment. The user can remain aware of his/her surroundings in the real environment while also viewing the drone’s video feed in his/her field of view (FoV). With minimal head movement and loss of focus, the user can perceive the surrounding environment from two different viewpoints. This video projection is modifiable and can either be placed in the center of the user’s FoV, as shown in Figure 2 (FPV mode), or on a smaller panel projected from the drone’s virtual counterpart (Figure 3). The FPV mode is useful in tasks where higher detail of the drone’s camera feed is required. In contrast, the contextualized mode, called the “egocentric” view as it is centered on the drone, allows for easier perception of what the drone “sees” during the flight and of the spatial correspondence between the drone’s and the user’s points of view. In addition, when using a drone equipped with an infrared camera, the user can choose between three different modalities: visible (RGB), infrared (IR), or fused (MSX).
- AR feedback and visualization: GUI elements and visualization features were added to the AR application to provide feedback and controls. Emphasis was given to creating an easy-to-use application and increasing the user’s SA. In more detail:
- Visualization of the virtual hand joints overlaid on top of the user’s hands, making it possible for the user to directly check whether his/her hands are correctly perceived by the HoloLens (Figure 4).
- Drone visualization based on the drone tracking module. A virtual drone is overlaid on top of the real drone so that the user is aware of the drone’s relative position even if it is not in a direct line of sight (e.g., behind a wall or building) or too far to be easily visible with the naked eye.
- An extensive virtual menu that includes all high-level commands and toggles for enabling the application’s features is present in the lower part of the application view. The menu can be hidden so as not to obstruct the user’s field of view.
- A navigation feedback panel is placed at the top of the application’s window. In the feedback panel, the currently perceived navigation commands are listed along with other drone- and user-related information, such as the drone’s flying altitude and horizontal distance from the user, the drone’s heading, and the user’s heading (a sketch of these computations is given after this list).
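As an illustration of the drone control commands module above, the following sketch maps the tracked hand’s displacement from a calibrated neutral pose to normalized pitch, roll, yaw, and throttle values. The axis assignment, dead zone, and gain are assumptions for illustration; the actual gesture vocabulary is defined in Section 4.1. Because each axis is mapped independently and proportionally, combined motions and variable velocities fall out naturally.

```python
# Hypothetical mapping from hand displacement to normalized drone
# commands; axis assignment, dead zone, and gain are assumptions.
from dataclasses import dataclass

@dataclass
class ControlCommand:
    pitch: float     # forward/backward, in [-1, 1]
    roll: float      # left/right, in [-1, 1]
    yaw: float       # rotation rate, in [-1, 1]
    throttle: float  # up/down, in [-1, 1]

DEAD_ZONE = 0.02  # metres of hand travel ignored as jitter
GAIN = 5.0        # displacement-to-command scaling

def axis(value: float) -> float:
    """Apply dead zone, scale, and clamp one displacement axis."""
    if abs(value) < DEAD_ZONE:
        return 0.0
    return max(-1.0, min(1.0, value * GAIN))

def hand_to_command(neutral, current, wrist_yaw: float) -> ControlCommand:
    """Translate palm displacement (metres) and wrist rotation (radians)
    relative to the calibrated neutral pose into a control command."""
    dx, dy, dz = (c - n for c, n in zip(current, neutral))
    return ControlCommand(
        pitch=axis(dz),       # push forward -> fly forward
        roll=axis(dx),        # move sideways -> strafe
        throttle=axis(dy),    # raise/lower hand -> climb/descend
        yaw=axis(wrist_yaw),  # rotate wrist -> turn
    )

print(hand_to_command((0.0, 1.2, 0.4), (0.03, 1.25, 0.5), 0.1))
```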
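Similarly, the metrics listed in the navigation feedback panel reduce to simple geometry once the user and drone poses are known in a common AR frame. A minimal sketch, assuming a Y-up coordinate frame with headings measured clockwise from the +Z axis (both conventions are assumptions):

```python
# Minimal sketch of the feedback-panel metrics, assuming a Y-up AR
# frame and headings measured clockwise from +Z.
import math

def feedback_metrics(user_pos, drone_pos, drone_forward):
    """Return flying altitude, horizontal distance from the user,
    and drone heading, from positions/vectors in the shared AR frame."""
    dx = drone_pos[0] - user_pos[0]
    dz = drone_pos[2] - user_pos[2]
    return {
        "altitude_m": drone_pos[1],                         # height above the AR origin
        "horizontal_distance_m": math.hypot(dx, dz),        # ground-plane range
        "drone_heading_deg": math.degrees(
            math.atan2(drone_forward[0], drone_forward[2])  # clockwise from +Z
        ) % 360.0,
    }

print(feedback_metrics((0, 1.7, 0), (3.0, 12.0, 4.0), (0.0, 0.0, 1.0)))
```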
3.5. UAV Interface Application
- Receiving control commands from Kafka and forwarding them to the drone;
- Collecting the drone’s IMU readings, aggregating them to estimate its position, and posting the estimate to Kafka at regular intervals (a sketch of this aggregation is given after this list);
- Sending the drone’s live video feed directly to the HoloLens via network sockets.
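A minimal sketch of the position-estimation step, assuming the drone’s telemetry reports body-frame velocities and yaw (rather than raw accelerometer samples, which would require double integration); field names and frame conventions are illustrative:

```python
# Dead-reckoning sketch: integrate velocity telemetry into a position
# estimate. Here x/y span the ground plane and z is up (a drone-side
# convention; the AR app's frame differs), all assumptions.
import math

class PositionEstimator:
    def __init__(self):
        self.x = self.y = self.z = 0.0  # metres, relative to takeoff point

    def update(self, vx: float, vy: float, vz: float, yaw: float, dt: float):
        """Rotate body-frame velocities (m/s) by the drone's yaw (radians)
        into the world frame and integrate over the timestep dt (s)."""
        cos_y, sin_y = math.cos(yaw), math.sin(yaw)
        self.x += (vx * cos_y - vy * sin_y) * dt
        self.y += (vx * sin_y + vy * cos_y) * dt
        self.z += vz * dt

    def as_message(self) -> dict:
        return {"x": self.x, "y": self.y, "z": self.z}

est = PositionEstimator()
est.update(vx=1.0, vy=0.0, vz=0.2, yaw=math.pi / 2, dt=0.1)
print(est.as_message())  # published to Kafka at regular intervals
```

Pure integration of this kind accumulates drift over time, which is what motivates the visual drift-correction approaches explored in Sections 4.4.2 and 4.4.3.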
- Gesture definition, in which an appropriate set of hand gestures was selected to control the drones, meeting requirements relating both to usability and control. This was the focus of earlier work, refined and updated for the current work;
- Gesture acquisition, in which users’ hand gestures must be captured, interpreted, translated to flight or camera control commands, and forwarded to the drone via the UAV interface Android app;
- Drone position tracking in AR, in which the drone’s motions must be tracked in reference to the virtual world, based on aggregated IMU readings and optionally improved by visual cues;
- Calibration, in which the virtual environment must be aligned with the real world, in terms of both translation and rotation, so that virtual objects can be displayed in the context of the environment. Calibration is necessary both for tracking and displaying the drone in AR and for displaying a contextualized video feed, which itself depends on tracking (an alignment sketch is given after this list);
- Video transmission and visualization, in which the drone’s video feed must be transmitted to the HoloLens and displayed in AR in (near) real time, via the UAV interface Android app.
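To make the calibration step concrete: once rotation is restricted to the vertical (yaw) axis, two ground-plane points known in both frames, e.g., the drone’s takeoff position and one additional marked point, determine the full alignment. The two-point formulation below is an illustrative assumption; the system’s actual manual two-step procedure is described in Section 4.4.1.

```python
# Illustrative yaw+translation alignment between the drone's local frame
# and the AR world frame (Y-up) from two corresponding ground-plane points.
import numpy as np

def align_frames(p_drone, q_drone, p_ar, q_ar):
    """Given two points known in both frames, return (R, t) such that
    x_ar = R @ x_drone + t, with rotation restricted to yaw (about Y)."""
    # Direction of the p->q segment on the ground plane (X/Z) in each frame.
    d_drone = np.array([q_drone[0] - p_drone[0], q_drone[2] - p_drone[2]])
    d_ar = np.array([q_ar[0] - p_ar[0], q_ar[2] - p_ar[2]])
    # Standard rotation about Y decreases the X/Z-plane angle by yaw.
    yaw = np.arctan2(d_drone[1], d_drone[0]) - np.arctan2(d_ar[1], d_ar[0])
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    t = np.asarray(p_ar) - R @ np.asarray(p_drone)
    return R, t

R, t = align_frames(p_drone=(0, 0, 0), q_drone=(1, 0, 0),
                    p_ar=(2, 0, 3), q_ar=(2, 0, 4))
print(R @ np.array([1, 0, 0]) + t)  # -> approximately [2, 0, 4]
```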
4.1. Gesture Definition
4.1.1. Requirements and Considerations
- To provide an easy-to-learn and intuitive interface;
- To be physically comfortable and ergonomic;
- To allow the full extent of drone motions, including combination motions and variable velocities.
4.1.2. Gesture Vocabulary
4.1.3. Drone Control Features
4.2. Gesture Acquisition
4.2.2. Gesture Acquisition and Interpretation in HoloLens 2
4.3. Drone Position Tracking in AR
4.4. Calibration of the AR Environment
4.4.1. Manual Two-Step Calibration
4.4.2. Visual Drone Pose Estimation
4.4.3. QR Code Reading
4.5. Video Transmission and Visualization
4.5.1. Video Streaming
4.5.2. Video Decoding
5. Testing and Evaluation
5.1. Subjective User Evaluation
5.1.1. Gesture Usability
5.1.2. Field Trials
5.2. Lab Testing and Objective Measurements
5.2.1. Drone Responsiveness
5.2.2. Positioning Accuracy
5.2.3. Video Transmission
6. Conclusions and Future Steps
6.2. AR Tracking and Visualization Present and Future Vision
Informed Consent Statement
Conflicts of Interest