Article

VR-Based Teleoperation of UAV–Manipulator Systems: From Single-UAV Control to Dual-UAV Cooperative Manipulation

Zhaotong Yang, Kohji Tomita and Akiya Kamimura
1 Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba 305-8577, Japan
2 National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba 305-8568, Japan
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(20), 11086; https://doi.org/10.3390/app152011086
Submission received: 25 September 2025 / Revised: 14 October 2025 / Accepted: 14 October 2025 / Published: 16 October 2025

Abstract

In this paper, we present a VR-based control framework for multi-UAV (rotorcraft-type) aerial manipulation that enables simultaneous control of each UAV and its onboard five-degree-of-freedom (5-DoF) manipulator using virtual-reality controllers. Instead of relying on dense button mappings or predefined gestures, the framework maps natural VR-controller motions in real time to vehicle pose and arm joint commands. The UAVs respond smoothly to translational and rotational inputs, while the manipulators accurately replicate dexterous hand motions for precise grasping. Beyond single-platform operation, we extend the framework to cooperative dual-UAV manipulation, leveraging two-hand poses captured via VR controllers to coordinate two UAV-arm systems for payload transportation and obstacle traversal. Simulation experiments demonstrate accurate trajectory tracking and the potential for successful cooperative transport in cluttered environments, indicating the framework’s suitability for telemanipulation, search-and-rescue, and industrial tasks.

1. Introduction

Unmanned aerial vehicles (UAVs) have been widely applied in areas such as environmental monitoring, infrastructure inspection, and logistics delivery [1]. Their adaptability arises from diverse configurations in size, structure, and control strategies, which allow them to meet different mission requirements. However, conventional UAVs mainly serve as passive observers, limited to data collection rather than direct interaction with their surroundings. To overcome this limitation, robotic arms have been integrated into UAV platforms, giving rise to aerial manipulators [2]. These systems extend UAV functionality to include physical interactions such as grasping, tool handling, and surface inspection, enabling operations in confined or hazardous environments where human presence is impractical—for example, the maintenance of bridges, wind turbines, and power lines [3].
To enhance human–robot interaction, researchers have explored more intuitive input modalities beyond traditional remote controllers. Representative approaches include gaze-based control [4,5,6], hand gestures recognized via vision or wearable devices [7,8,9,10,11,12,13], and voice commands [14]. These modalities aim to reduce operator burden, improve adaptability, and support manipulation tasks. At the same time, surveys have comprehensively reviewed teleoperation and human–robot interaction, emphasizing their importance in extending human capabilities into remote or dangerous environments [15,16,17]. Nevertheless, conventional teleoperation often depends on joysticks, screens, and button-based interfaces, which constrain spatial perception and limit the intuitiveness of control.
Recent studies have explored diverse approaches to enhance teleoperation. For instance, vision-based techniques have been employed to control manipulators using computer vision frameworks such as OpenCV [18]. In parallel, haptic feedback has been introduced to provide operators with tactile cues, improving the perception of contact and manipulation quality [19]. More recently, immersive technologies have emerged as transformative tools for intuitive teleoperation [20,21,22,23,24,25]. By embedding the operator within a real-time 3D replica of the robot’s workspace, VR offers an embodied perspective that enhances situational awareness, motor coordination, and control fluency. Such multi-sensory feedback enables operators to perform complex manipulation tasks with higher precision and reduced cognitive load. Beyond teleoperation, VR has also been integrated with deep learning approaches, where imitation learning leverages natural VR-based demonstrations to train visuomotor policies directly from pixels [26]. Collectively, these advances highlight VR as a promising paradigm for natural, efficient, and scalable teleoperation frameworks.
Despite these advances, most VR-based teleoperation frameworks remain restricted to single platforms, where UAV flight and manipulation are often handled sequentially rather than in a fully integrated manner. Interfaces typically employ a single VR controller, thereby limiting operators to one-handed control and underutilizing bimanual capability. This restriction not only reduces operational efficiency but also prevents operators from exploiting natural two-handed coordination, which is often essential for complex manipulation tasks.
At the same time, cooperative aerial transportation using multiple UAVs has been extensively investigated, including methods for collision-free trajectory planning [27], the transportation of flexible payloads [28], and the vision-based coordination of small quadrotors [29]. These works have demonstrated effective cooperative manipulation but largely depend on pre-programmed planning, strict synchronization, or communication strategies, which limit adaptability in dynamic and unstructured environments. As a result, they primarily address coordination and stability at the control level while overlooking how immersive interfaces could enable intuitive, real-time, and bimanual human control. In parallel, recent studies on shared autonomy in teleoperation [30,31] have shown that blending human input with autonomous behaviors can effectively reduce cognitive load and enhance the robustness of remote robotic manipulation. In contrast, the proposed VR framework remains fully operator-driven, focusing on reducing cognitive effort through enhanced spatial perception rather than control delegation. This distinction positions the present work as a perception-augmented alternative to shared-autonomy strategies, emphasizing intuitive human engagement over automated control sharing.
To address these limitations, this paper proposes an immersive VR-based teleoperation framework that enables a single operator to intuitively control one or two UAV–manipulator systems using dual VR controllers.
The main contributions of this work are as follows:
1. Proposed an intuitive VR-based teleoperation framework for aerial manipulation, enabling simultaneous control of a UAV and its onboard manipulator through natural hand motions captured by VR controllers, without relying on dense button mappings or predefined gestures.
2. Demonstrated the framework in single-UAV scenarios, where the operator controlled both UAV flight and the manipulator joints using VR controllers to approach, align with, and grasp a target object in simulation.
3. Extended the framework to dual-UAV cooperative manipulation, leveraging the operator’s two-hand poses to directly map onto two UAV–manipulator units, thereby achieving coordinated payload transportation and obstacle traversal in simulation.
4. Integrated the above components into a unified teleoperation architecture that connects human motion input, UAV dynamics, and manipulator control within the same framework. This system-level integration demonstrates how existing technologies can be cohesively combined to enable intuitive, synchronized, and scalable human–robot collaboration for aerial manipulation.
Collectively, these contributions advance UAV teleoperation from single-platform control toward scalable, immersive, and cooperative multi-UAV manipulation, providing a foundation for future applications in inspection, disaster response, and industrial operations.

2. Control Methods Design

2.1. Single-UAV and Manipulator Control

Traditional quadcopter UAVs are typically operated using a handheld radio transmitter, which adjusts motor speeds to control throttle, yaw, pitch, and roll, as illustrated in Figure 1. The onboard flight controller interprets these commands to maintain flight stability and execute precise maneuvers.
Moreover, conventional setups limit one operator to controlling only a single UAV, making coordinated multi-UAV operations difficult and even more demanding when robotic arms are attached. To address these issues and offer a more intuitive user experience, we propose a UAV control system based on a VR controller. As shown in Figure 2, the controller features a circular touchpad capable of detecting finger position, while the Mode, Trigger, and Gripper buttons each correspond to different functions. The user can simply slide a finger to move the UAV in any horizontal direction (forward, backward, left, or right) within the global coordinate frame. The touch location also determines the movement speed, allowing the operator to modulate it naturally. The system continuously monitors the touch coordinates and, through ROS integration, translates them into real-time velocity commands. This setup supports fluid, responsive control and minimizes cognitive demand. By enabling instinctive, proportional speed adjustments, it improves both maneuverability and user engagement, simplifying remote UAV operation overall.
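To make this mapping concrete, the following minimal Python (rospy) sketch converts a touchpad reading into a horizontal velocity command. It is a simplified illustration only: the topic names, the use of sensor_msgs/Joy for the touchpad axes, and the speed scaling are assumptions, since the actual system relays controller data through Unity and rosbridge.

import rospy
from sensor_msgs.msg import Joy
from geometry_msgs.msg import Twist

MAX_SPEED = 1.0  # assumed maximum horizontal speed (m/s) at the touchpad edge

class TouchpadVelocityMapper(object):
    def __init__(self):
        # Topic names are placeholders, not those used in the paper.
        self.pub = rospy.Publisher('/uav1/cmd_vel', Twist, queue_size=1)
        rospy.Subscriber('/vr/right_controller/joy', Joy, self.on_touchpad)

    def on_touchpad(self, msg):
        # Touchpad x/y are assumed to arrive in [-1, 1]; distance from the
        # center scales the commanded speed proportionally.
        tx, ty = msg.axes[0], msg.axes[1]
        cmd = Twist()
        cmd.linear.x = MAX_SPEED * ty   # forward/backward
        cmd.linear.y = MAX_SPEED * tx   # left/right (sign convention assumed)
        self.pub.publish(cmd)

if __name__ == '__main__':
    rospy.init_node('touchpad_velocity_mapper')
    TouchpadVelocityMapper()
    rospy.spin()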
The UAV’s movement involves not only horizontal translation but also adjustments in altitude and rotation around the yaw axis. As illustrated in Figure 3, altitude control is governed by the pitch angle of the VR controller. The controller’s orientation is continuously tracked in real time by a pair of positioning base stations. The resulting orientation data are transmitted from Unity (a cross-platform game engine widely used for simulation and visualization) to the main control computer via rosbridge. By interpreting the quaternion data representing the controller’s attitude, the system calculates the real-time pitch angle. When the controller is held level, no change in altitude occurs. In this setup, a tilt threshold of 60 degrees is empirically defined: pitching the controller upward beyond this threshold increases altitude, while pitching it downward beyond it causes descent. The pitch angle of the VR controller is continuously tracked, and the corresponding upward or downward velocity commands are transmitted via ROS to the UAV’s flight controller. The corresponding control logic for altitude adjustment is outlined in Algorithm 1.
Algorithm 1 Tilt-Triggered UAV Altitude Control (Ascent/Descent)
Input: Controller quaternion q = (w, x, y, z), threshold θ_th = 60°
Output: UAV motion command trigger
1: while system is running do
2:        Read controller orientation via ROS topic
3:        Compute pitch angle θ_pitch = arcsin(2(wy − zx))
4:        if θ_pitch > θ_th then
5:            Send ascending command to UAV
6:        else if θ_pitch < −θ_th then
7:            Send descending command to UAV
8:        else
9:            Maintain UAV in hover state
10:      end if
11: end while
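For reference, a compact Python rendering of Algorithm 1 is given below. The climb-rate value is an assumed parameter; in the actual system the resulting vertical velocity would be sent to the flight controller over ROS.

import math

PITCH_THRESHOLD = math.radians(60.0)  # 60-degree tilt threshold from Algorithm 1

def pitch_from_quaternion(w, x, y, z):
    # Pitch (rotation about the y-axis) of a unit quaternion, clamped for safety.
    s = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    return math.asin(s)

def altitude_command(w, x, y, z, climb_rate=0.5):
    # Return a vertical velocity command (m/s); climb_rate is an assumed value.
    pitch = pitch_from_quaternion(w, x, y, z)
    if pitch > PITCH_THRESHOLD:
        return climb_rate        # ascend
    elif pitch < -PITCH_THRESHOLD:
        return -climb_rate       # descend
    return 0.0                   # hover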
Yaw control of the UAV follows a similar principle to altitude control. As shown in Figure 4, changes in yaw are governed by the roll angle of the VR controller. When the controller is held level along the roll axis, the UAV maintains its current yaw orientation. A tilt threshold of 60 degrees is defined for this axis as well: tilting the controller to the left results in a counterclockwise yaw rotation of the UAV, whereas tilting it to the right produces a clockwise rotation. The controller’s roll angle is continuously monitored and extracted from its quaternion-based attitude data. These real-time roll values are transmitted via ROS and converted into appropriate yaw velocity commands for the UAV’s flight controller.
The use of a VR controller for manipulator control represents a significant advancement in robotic teleoperation, providing an intuitive interface that directly maps the operator’s intended motions to the system. This section describes the mechanisms through which the VR controller regulates the spatial position and orientation of the manipulator’s end-effector. As illustrated in Figure 5, the manipulator mounted on the UAV platform is equipped with five degrees of freedom (DoF), denoted as joints J1 through J5. While not every joint is directly and independently controlled by the VR device, the control scheme emphasizes intuitive mapping of the controller’s motion to the task space, with certain joints coordinated automatically to realize the commanded end-effector pose. Specifically, J1–J3 correspond to the main arm joints that primarily determine the gross positioning of the manipulator, J4 and J5 function as wrist joints enabling fine orientation adjustments, and the final gripper joint provides grasping capability for object interaction. The following subsections explain the functional roles of J1–J5 and how the VR controller inputs are assigned to these DoFs.
The first three degrees of freedom of the robotic manipulator—J1 through J3—are responsible for determining the approximate spatial position of the end-effector. Utilizing the spatial tracking capabilities of the VR controller, the system captures the operator’s hand position in three-dimensional space and transmits this data to the MoveIt motion planning framework on the master computer via rosbridge. MoveIt is a widely adopted motion planning tool, particularly effective for managing complex motion sequences and path planning in robotic systems. Based on the received spatial coordinates, MoveIt performs inverse kinematics (IK) computations to generate the corresponding joint commands, which drive the actuators associated with J1, J2, and J3.
By holding the Mode button and simply moving the VR controller in free space, the operator can intuitively control the position of the manipulator’s end-effector. This direct mapping between human movement and robot response enhances positioning accuracy, reduces the operator’s cognitive workload, and facilitates the execution of sophisticated manipulation tasks.
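As a rough sketch of this pipeline, the snippet below forwards a tracked hand position to MoveIt through the Python moveit_commander interface. The planning-group and node names are assumptions for illustration; the actual group, reference frames, and workspace scaling depend on the manipulator description used in the system.

import sys
import rospy
import moveit_commander

moveit_commander.roscpp_initialize(sys.argv)
rospy.init_node('vr_arm_position_control')

# 'arm' is an assumed planning-group name covering joints J1-J3 (and the wrist).
arm = moveit_commander.MoveGroupCommander('arm')

def move_to_hand_position(x, y, z):
    # Ask MoveIt to solve the IK for the end-effector position and execute it.
    arm.set_position_target([x, y, z])
    arm.go(wait=True)
    arm.stop()
    arm.clear_pose_targets()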
In addition to position control, the system also supports rotational manipulation of the end-effector, specifically the wrist pitch (J4) and wrist roll (J5). These rotational degrees of freedom are adjusted by modifying the orientation of the VR controller, namely its pitch and roll angles. To distinguish manipulator control from UAV control, a mode-switching mechanism is implemented using the Mode button on the VR controller. While the Mode button is held, the system stays in manipulator control mode, so changes in controller orientation affect only the manipulator. Additionally, by holding the Trigger button, the user can control the two wrist joints (J4, J5) simultaneously. Once the Mode button is released, the system returns to UAV control mode. This switching mechanism effectively prevents control conflicts and enhances the overall operability of the system.
As illustrated in Figure 3, the pitch axis of the VR controller is mapped to joint J4, which governs the pitch of the manipulator’s wrist. Similarly to how altitude is controlled in the UAV system, the operator can intuitively increase or decrease the wrist pitch angle by tilting the VR controller upward or downward. Figure 4 shows the control of J5, corresponding to the roll axis of the manipulator’s wrist. In the same manner as yaw control for the UAV, the operator can rotate the VR controller along its roll axis to adjust the wrist’s roll angle in either direction. This direct mapping between the controller’s orientation and the manipulator’s rotational movement facilitates precise and intuitive wrist control during task execution.
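A simplified sketch of this orientation mapping is shown below: the controller’s roll and pitch are extracted from its quaternion and applied to the wrist joints only while the Mode and Trigger buttons are held. The incremental gain and reference angles are illustrative assumptions rather than the exact mapping used in the system.

import math

def roll_pitch_from_quaternion(w, x, y, z):
    # Roll (about x) and pitch (about y) of a unit quaternion.
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    pitch = math.asin(max(-1.0, min(1.0, 2.0 * (w * y - z * x))))
    return roll, pitch

def wrist_targets(q, mode_held, trigger_held, j4_ref, j5_ref, gain=1.0):
    # Map controller orientation to wrist joint targets (J4 pitch, J5 roll);
    # outside manipulator mode the wrist references are left unchanged.
    if not (mode_held and trigger_held):
        return j4_ref, j5_ref
    roll, pitch = roll_pitch_from_quaternion(*q)
    return j4_ref + gain * pitch, j5_ref + gain * roll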
In summary, the proposed control method leverages natural human motion, captured through the VR controller, to provide an intuitive and highly responsive interface for operating the robotic manipulator. This approach not only simplifies control in both industrial and research contexts but also broadens the scope for remote and precise manipulation in a range of application domains. The seamless integration of spatial tracking with real-time motion mapping enhances operability, making complex tasks more accessible.

2.2. Dual-UAV Cooperative Control

In cooperative aerial manipulation, two UAVs are often required to transport elongated payloads through constrained environments while maintaining stability and controllability. This introduces challenges that are not encountered in single-UAV tasks, such as coordinating relative motion, ensuring continuous visual feedback, and adapting to obstacles or gaps of varying geometry. To address these requirements, we designed a set of fundamental operation modes that can be intuitively commanded through VR-based hand gestures. In dual-UAV cooperative control, these gestures do not map directly onto the real-time motion of the UAVs or manipulators; instead, they serve as triggers that initiate or terminate each operation.
Figure 6 illustrates these operation modes. In (a)-(1), translational gestures are mapped to UAV movements in six directions within the global coordinate frame, enabling the operator to reposition the vehicles in three-dimensional space. This capability is crucial for navigation in cluttered environments, where frequent and precise positional adjustments are required. In (a)-(2) and (3), the yaw motion of the UAV is coupled with the first manipulator joint (J1). Since the onboard camera is mounted at the front of the UAV, yawing alone would cause the camera to turn away from the direction of travel during lateral motion. By coupling yaw with the first joint, the manipulator automatically rotates in the opposite direction, keeping the payload aligned with the moving direction and maintaining continuous visual feedback. Subfigures (a)-(4) and (5) show circular gestures, where the two hands are positioned sequentially with a phase offset, resulting in coordinated circular trajectories. This design supports cooperative reorientation and encirclement of elongated payloads, as circular motion provides a predictable and stable relative geometry between the UAVs. Finally, (a)-(6) and (7) demonstrate inclined-surface traversal, in which one UAV maintains a higher altitude and the other a lower altitude, causing the payload to tilt. This mode was introduced to allow passage through narrow or slanted gaps, which are representative of realistic operational constraints.
The symmetric circular motion of the dual-UAV system is modeled with respect to a common center c = (x_c, y_c). The leader UAV1 follows a circular trajectory parameterized by its phase angle φ(t), while the follower UAV2 remains at the diametrically opposite position with respect to the center. This configuration keeps the UAVs evenly distributed on the circle and ensures a constant inter-UAV distance, thereby guaranteeing non-intersecting trajectories during cooperative maneuvers.
The phase angle φ(t) evolves as:
$$\phi(t) = \phi_0 + \omega t.$$
The positions of the UAVs are expressed as:
$$\mathbf{p}_1(t) = \begin{bmatrix} x_c + r\cos\phi(t) \\ y_c + r\sin\phi(t) \\ h_1 \end{bmatrix}, \qquad \mathbf{p}_2(t) = \begin{bmatrix} x_c - r\cos\phi(t) \\ y_c - r\sin\phi(t) \\ h_2 \end{bmatrix}.$$
Yaw orientations are calculated as:
$$\psi_1(t) = \operatorname{atan2}\big(\dot{y}_1, \dot{x}_1\big) = \phi(t) + \frac{\pi}{2}, \qquad \psi_2(t) = \psi_1(t) + \pi.$$
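These expressions can be evaluated directly at each control step. The short Python sketch below computes the two position setpoints and yaw references at time t; the function and argument names are ours, introduced only for illustration.

import math

def circular_setpoints(t, xc, yc, r, phi0, omega, h1, h2):
    # Symmetric circular setpoints for the leader (UAV1) and follower (UAV2).
    phi = phi0 + omega * t
    p1 = (xc + r * math.cos(phi), yc + r * math.sin(phi), h1)
    p2 = (xc - r * math.cos(phi), yc - r * math.sin(phi), h2)
    psi1 = phi + math.pi / 2.0   # tangential heading of UAV1
    psi2 = psi1 + math.pi        # diametrically opposed heading of UAV2
    return p1, p2, psi1, psi2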
To translate the VR controller input into this circular motion behavior, the real-time positions of the two controllers are interpreted to determine the rotational direction and trigger corresponding UAV maneuvers. The detailed procedure of this dual-UAV rotation control process is summarized in Algorithm 2 (Dual-UAV Rotation Control via VR Input).
Algorithm 2 Dual-UAV Rotation Control via VR Input
Input: Positions of left and right VR controllers, p_L = (x_L, y_L, z_L), p_R = (x_R, y_R, z_R)
Output: Direction of circular maneuver (CW/CCW)
1: while system is running do
2:        Read controller positions via ROS topic
3:        Compute position difference Δy = y_R − y_L
4:        if Δy > 0.30 m then
5:            Trigger counter-clockwise (CCW) maneuver, set ω > 0
6:        else if Δy < −0.30 m then
7:            Trigger clockwise (CW) maneuver, set ω < 0
8:        else
9:            Maintain UAVs in hover state
10:      end if
11: end while
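In code, the trigger of Algorithm 2 reduces to a sign test on the lateral offset between the two controllers, as sketched below. The 0.30 m threshold is taken from Algorithm 2, while the angular-rate magnitude is an assumed value.

DELTA_Y_THRESHOLD = 0.30  # m, threshold from Algorithm 2
OMEGA_MAG = 0.3           # rad/s, assumed angular-rate magnitude

def rotation_trigger(p_left, p_right):
    # Return the commanded angular rate omega from the two controller
    # positions (x, y, z); zero keeps the UAVs hovering on the circle.
    dy = p_right[1] - p_left[1]
    if dy > DELTA_Y_THRESHOLD:
        return OMEGA_MAG        # counter-clockwise
    elif dy < -DELTA_Y_THRESHOLD:
        return -OMEGA_MAG       # clockwise
    return 0.0                  # hover / hold current phase

The returned angular rate would then drive a setpoint generator such as the circular_setpoints sketch given earlier.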
As illustrated in Figure 7, this formulation ensures that the UAVs remain symmetrically opposed on the circle with consistent tangential orientation, enabling cooperative encirclement maneuvers.
For yaw compensation in the UAV–manipulator configuration, the manipulator base is mechanically fixed to the UAV body frame {B}, while the world reference frame is denoted as {W}. The yaw angle of the UAV with respect to the world frame is expressed as ψ_d(t). The first revolute joint of the manipulator, denoted as J1, rotates around the vertical axis with angle θ_1(t). Although the manipulator is structurally fixed to the UAV, its effective yaw orientation in the world frame can be decoupled by appropriately controlling J1.
Without such compensation, any change in the UAV yaw angle directly alters the orientation of the manipulator in { W } , leading to undesired motion of the end-effector. To address this issue, we introduce a yaw compensation mechanism at joint J1.
The UAV yaw rotation can be described by the planar rotation matrix:
$$\mathbf{R}_z(\psi_d) = \begin{bmatrix} \cos\psi_d & -\sin\psi_d \\ \sin\psi_d & \cos\psi_d \end{bmatrix},$$
where ψ_d(t) is the UAV yaw angle in the world frame.
The absolute yaw orientation of the manipulator base in the world frame, denoted as ψ_m(t), is given by:
$$\psi_m(t) = \psi_d(t) + \theta_1(t),$$
where θ_1(t) is the rotation of the first manipulator joint about the vertical (yaw) axis.
To ensure that the manipulator maintains a constant yaw in the world frame, i.e., ψ_m(t) = ψ_m(0), the corresponding compensation law is derived as:
$$\theta_1(t) = \psi_m(0) - \psi_d(t).$$
Accordingly, the detailed control logic of the VR-Triggered Yaw Compensation for UAV–Manipulator System is summarized in Algorithm 3.
Algorithm 3 Yaw Compensation for UAV–Manipulator System via VR Input
Input: Rotations of left and right VR controllers, q_R = (w_R, x_R, y_R, z_R)
Output: Yaw rotation of the UAV and inverse compensation of joint J1
1: while system is running do
2:        Read controller rotations via ROS topic
3:        Extract roll angle φ_R of the right controller from quaternion q_R
4:        if φ_R ≥ 90° and the left controller remains level then
5:            Set UAV yaw ψ_d = ψ_c + Δ_offset
6:            Set arm joint J1 angle θ_1 = −ψ_c
7:        else
8:            Maintain UAVs in hover state
9:        end if
10: end while
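The compensation itself amounts to a one-line update of J1. The Python sketch below assumes yaw angles in radians and adds simple angle wrapping; joint limits and rate limits are ignored for brevity.

import math

def wrap_to_pi(angle):
    # Wrap an angle to (-pi, pi] to avoid accumulating multiples of 2*pi.
    return math.atan2(math.sin(angle), math.cos(angle))

def j1_compensation(psi_d_t, psi_m_initial):
    # theta_1(t) = psi_m(0) - psi_d(t): cancels the UAV yaw so that the
    # manipulator's world-frame yaw stays at its initial value.
    return wrap_to_pi(psi_m_initial - psi_d_t)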
This compensation mechanism allows the UAV to freely perform yaw maneuvers for navigation or perception purposes, while the manipulator maintains a fixed orientation in the global frame, as illustrated in Figure 8. Such a principle is essential for tasks requiring stationary alignment or precise environmental interaction, such as grasping, docking, or tool operation.
During cooperative manipulation, the two UAVs may intentionally adopt different altitudes to traverse inclined planes or perform asymmetric maneuvers. When the altitude difference between the UAVs is Δ h and their horizontal separation is d , the manipulated rod will form an inclination angle θ with respect to the horizontal plane, as illustrated in Figure 9. Since each gripper is rigidly attached to the rod, misalignment between the rod and the end-effector orientation would introduce undesirable torques and mechanical stress. To mitigate this, one degree of freedom of the manipulator wrist (the roll joint) is dedicated to compensating for the rod inclination. Accordingly, the control logic of the tilt-triggered motion and wrist compensation is described in Algorithm 4.
Algorithm 4 Cooperative UAV Motion and Wrist Compensation via VR Input
Input: Positions of left and right VR controllers, p_L = (x_L, y_L, z_L), p_R = (x_R, y_R, z_R)
Output: Vertical motion of UAVs and J5 compensation angles
1: while system is running do
2:        Read controller positions via ROS topic
3:        Compute position difference Δz = z_L − z_R
4:        if |Δz| > 0.20 m then
5:            if Δz > 0 then
6:                Left UAV ascends; right UAV descends
7:            else
8:                Right UAV ascends; left UAV descends
9:            end if
10:          Compute inclination angle θ = arctan(Δh/d)
11:          Set commanded wrist roll angles q_roll,cmd,1 = q_roll,cmd,2 = q_roll,ref + θ
12:      else
13:          Maintain UAVs and manipulators in hover state
14:      end if
15: end while
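A condensed Python version of Algorithm 4 is sketched below. The 0.20 m trigger threshold is taken from Algorithm 4, while the climb-rate magnitude is an assumed value, and the altitude difference Δh and horizontal separation d are taken here as inputs from the UAV state estimates.

import math

DELTA_Z_THRESHOLD = 0.20  # m, trigger threshold from Algorithm 4
CLIMB_RATE = 0.5          # m/s, assumed vertical rate magnitude

def cooperative_tilt_command(z_left, z_right, delta_h, d, q_roll_ref):
    # Return (left UAV climb rate, right UAV climb rate, commanded wrist roll).
    dz = z_left - z_right
    if abs(dz) <= DELTA_Z_THRESHOLD:
        return 0.0, 0.0, q_roll_ref              # hold the current state
    climb = CLIMB_RATE if dz > 0 else -CLIMB_RATE
    theta = math.atan2(delta_h, d)               # rod inclination angle
    return climb, -climb, q_roll_ref + theta     # both wrists compensate by theta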
This compensation law ensures that the gripper orientation remains aligned with the rod axis, regardless of UAV altitude differences. Consequently, the UAVs can stably manipulate elongated objects or traverse inclined planes without inducing excessive mechanical load. Moreover, this strategy effectively decouples manipulator alignment from UAV flight control, improving stability and robustness of cooperative aerial manipulation.
In summary, the proposed Dual-UAV Cooperative Control establishes a unified scheme that enables two UAV-arm systems to function as a coordinated unit. By integrating VR-based teleoperation with cooperative control strategies, the framework achieves synchronized grasping, maintains consistent inter-vehicle geometry, and demonstrates stable cooperative transport in simulation. These results provide a methodological basis for extending aerial manipulation from single-UAV control toward dual-UAV cooperation, thereby preparing the ground for the experimental validation and broader multi-robot applications discussed in the following sections.

3. Experiments Using VR System

3.1. Experimental Setup

The control system (Figure 10) consists of a VR interface on a Windows PC and a simulation environment with ROS on an Ubuntu PC. The VR interface includes an HMD and controllers, while Unity and rosbridge handle communication with the simulation. The environment contains multiple UAVs equipped with manipulators and cameras for aerial grasping tasks. The software integrates ROS, Gazebo, Unity, and rosbridge. ROS manages communication, sensor fusion, and motion planning; Gazebo provides physics-based simulation; Unity offers real-time visualization; and rosbridge enables data exchange. A dual-PC configuration is employed, with ROS, Gazebo, and rosbridge running on Linux and Unity on Windows, to reduce latency by separating VR rendering from physics simulation and ensuring synchronized teleoperation. To clarify the simulation settings, the flight characteristics of the quadcopters used in the experiments are listed in Table 1.
As illustrated in Figure 11, the VR kit (HTC Corporation, Taoyuan City, Taiwan) includes an HMD, controllers, and positioning base stations. By tracking the controllers with two base stations, the system captures spatial pose and maps hand motions to UAV flight and manipulator commands, enabling immersive and natural interaction with the simulated environment.

3.2. Experimental Tasks

To evaluate the effectiveness of the proposed VR-based control framework, a series of experimental tasks were designed in both single-UAV and dual-UAV scenarios. The tasks were selected to reflect representative challenges in aerial manipulation, including precise positioning, cooperative control, and operation in constrained environments.
In the task shown in Figure 12, a single UAV equipped with a manipulator was required to approach a designated target object and perform a grasping action. This task validates the feasibility of using the VR interface for integrated control of both UAV flight and manipulator operation, while ensuring accurate positioning of the end-effector in cluttered environments.
In the next experiment, a cooperative transportation task was simulated, in which two UAVs carried an elongated object together. The operation began with the UAVs tilting to traverse an inclined surface while carrying the object. They then executed a coordinated circular maneuver to demonstrate rotational motion. Next, the leader UAV adjusted its heading to release the object at the designated position, and finally both UAVs exited the environment through a narrow opening. Due to the inherent physical properties of the simulator, the grasped object tended to gradually slip after a certain period of time. Therefore, at the current stage, each phase was conducted and validated separately and subsequently linked to emulate the achievable behavior in real-world conditions, as shown in Figure 13. In addition, the operator performed all actions under both first-person and third-person perspectives to ensure accurate situational awareness and control during the experiment. This experiment demonstrates the capability of the VR-based interface to support complex cooperative aerial transport and highlights the challenges of maintaining coordination, stability, and precise control in constrained environments.

3.3. Experimental Results

The proposed framework was first validated in a single-UAV scenario to examine its capability for precise teleoperation. In this setting, the UAV was required to approach a target object, maintain a stable hovering position, and operate its onboard manipulator to accomplish grasping tasks. To preliminarily test the accuracy of the end-effector control, the manipulator was commanded to trace predefined geometric shapes, including a square and a triangle. The resulting trajectories (Figure 14) clearly demonstrate smooth and repeatable paths, indicating that the VR controller provided sufficient precision for guiding the end-effector along complex spatial patterns.
In addition to manipulator control, the UAV’s overall flight stability was also examined; Figure 15 reconstructs the spatial flight trajectory. Complementary snapshots (Figure 16, arranged from top to bottom and left to right) illustrate the sequence of the flight and grasping process: the UAV first approaches the target, then switches to the grasping mode to secure the object, and finally increases altitude to depart from the grasping area. These results demonstrate that the proposed framework enables intuitive and coordinated control of UAV flight and manipulator motion.
The proposed VR-based teleoperation framework was further validated in dual-UAV cooperative experiments involving obstacle traversal. In this task, the UAVs carried an elongated object through narrow passages, placed it at a designated location, and then departed while maintaining formation flight. This sequence demonstrates the ability of the framework to coordinate transport, placement, and withdrawal under constrained conditions. Figure 17 (2D top view) and Figure 18 (3D perspective) illustrate the reconstructed path that could ideally be achieved under optimal conditions, while the sequential snapshots in Figure 19 (arranged from top to bottom and left to right) show the detailed motion process under VR teleoperation.
Quantitative analyses further confirm the effectiveness of the proposed framework for coordinating UAV–manipulator systems. Figure 20 compares the UAV inclination angle with the wrist roll joint, showing that the compensation strategy aligned the manipulator with the rod during inclined transport and reduced unwanted torsional effects. In Figure 21, the yaw trajectory is plotted against the commanded reference, indicating accurate heading adjustment throughout the maneuver. The results in Figure 22 demonstrate that both UAVs exhibited synchronized yaw changes (a) while keeping a nearly constant inter-UAV distance of about 4 m (b). This stable relative positioning suggests that the two platforms remained coordinated without noticeable drift. Overall, the findings highlight the feasibility of the compensation approach and its robustness in maintaining geometric constraints, enabling reliable dual-UAV cooperation even under inclination and yaw disturbances.

4. Discussion

The experimental results confirmed that the proposed method works for both single- and dual-UAV manipulation tasks. However, several limitations were observed. First, the current study was conducted in a simulation environment without external disturbances such as wind or actuator faults; hence, explicit compensation or fault-tolerant mechanisms were not implemented. To improve stability in practical scenarios, several countermeasures can be considered. These include incorporating disturbance observers or adaptive control strategies to mitigate the effects of crosswind and adding a passive vibration absorption mechanism to the manipulator to reduce small oscillations transmitted from the UAV body. Furthermore, basic fault detection and recovery strategies could be introduced to protect the system in case of rotor malfunction or unexpected disturbances.
In addition, we acknowledge that certain failure cases may occur under conditions of poor tracking or high visual clutter, which could lead to delayed or inaccurate perception of spatial references. To address this, future work will explore robust sensor fusion and adaptive visual filtering techniques to maintain stable perception in such visually challenging environments.
Finally, the present work focuses on a dual-UAV–manipulator configuration based on rotorcraft platforms; the proposed framework is inherently scalable to multi-UAV cooperative scenarios for handling heavier payloads. The VR-based teleoperation interface and symmetric control mapping are platform-agnostic, emphasizing intuitive human–robot interaction rather than vehicle-specific dynamics. Consequently, parts of the proposed method can also be adapted to fixed-wing UAVs by modifying the motion mapping layer to account for flight constraints, while the operator interface can remain unchanged. Moreover, the same modular structure allows extension to heterogeneous aerial or ground robots through unified communication layers. Future work will explore these generalizations to larger robot teams and dynamic environments, aiming to enhance robustness, coordination, and shared autonomy.

5. Conclusions

In this paper, we proposed a novel control scheme that enables a single operator to control single and dual UAV–manipulator systems through a VR-based interface. The proposed framework integrates UAV flight and manipulator control into a unified interface, allowing the operator to directly perform complex spatial tasks using natural hand motions. To verify the feasibility of this approach, we conducted simulation experiments in which the operator, equipped with VR controllers, successfully manipulated one or two UAVs to perform tasks. The results demonstrated that coordinated control of UAV flight and manipulator motion can be achieved in real time, and that a single operator can intuitively manage dual-UAV cooperative manipulation. These findings confirm the effectiveness of the VR-based control system in enabling seamless interaction between the operator and UAV–manipulator platforms and highlight its potential for tasks requiring dexterous and cooperative aerial manipulation.
In future work, we will further improve our system to experimentally validate a coherent transportation task in a first-person teleoperation setting and extend it from simulation to real-world experiments using UAV platforms equipped with manipulators. Moreover, a comparative evaluation with conventional control approaches will be carried out to quantitatively assess task completion time, success rate, and overall system performance. In addition, an ergonomic evaluation will be performed to analyze the operator’s comfort, workload, and motion efficiency. This evaluation will include both subjective measures, such as perceived fatigue and usability questionnaires, and objective indicators, such as motion trajectories and task execution smoothness, to comprehensively assess human–system interaction during extended teleoperation. Furthermore, the collected results will help identify design improvements in the teleoperation interface and control mapping, contributing to a more intuitive and fatigue-resistant user experience. Finally, applying the system to practical scenarios—such as cooperative inspection, aerial assembly, or disaster response—will be an important step toward demonstrating its usefulness outside the laboratory.

Author Contributions

Conceptualization, Z.Y. and A.K.; methodology, Z.Y.; software, Z.Y.; writing—original draft preparation, Z.Y.; writing—review and editing, K.T. and A.K.; supervision, K.T. and A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

This achievement was supported by JST SPRING, Grant Number JPMJSP2124.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

DoF: Degrees of Freedom
ESC: Electronic Speed Controller
FCU: Flight Control Unit
UAV: Unmanned Aerial Vehicle
VR: Virtual Reality
HMD: Head-Mounted Display
ROS: Robot Operating System

References

  1. Hassanalian, M.; Abdelkefi, A. Classifications, applications, and design challenges of drones: A review. Prog. Aerosp. Sci. 2017, 91, 99–131. [Google Scholar] [CrossRef]
  2. Ollero, A.; Tognon, M.; Suarez, A.; Lee, D.; Franchi, A. Past, Present, and Future of Aerial Robotic Manipulators. IEEE Trans. Robot. 2022, 38, 626–645. [Google Scholar] [CrossRef]
  3. Gassner, M.; Cieslewski, T.; Scaramuzza, D. Dynamic Collaboration Without Communication: Vision-Based Cable-Suspended Load Transport with Two Quadrotors. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 5196–5202. Available online: https://ieeexplore.ieee.org/document/7989609 (accessed on 15 August 2025).
  4. Tanriverdi, V.; Jacob, R.J.K. Interacting with eye movements in virtual environments. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, New York, NY, USA, 1–6 April 2000; pp. 265–272. [Google Scholar] [CrossRef]
  5. Sibert, L.E.; Jacob, R.J.K. Evaluation of eye gaze interaction. In Proceedings of the CHI00: Human Factors in Computing Systems, The Hague, The Netherlands, 1–6 April 2000; pp. 281–288. [Google Scholar] [CrossRef]
  6. Yu, D.; Lu, X.; Shi, R.; Liang, H.N.; Dingler, T.; Velloso, E.; Goncalves, J. Gaze-Supported 3D Object Manipulation in Virtual Reality. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 8–13 May 2021. [Google Scholar] [CrossRef]
  7. Naseer, F.; Ullah, G.; Siddiqui, M.A.; Khan, M.J.; Hong, K.-S.; Naseer, N. Deep learning-based unmanned aerial vehicle control with hand gesture and computer vision. In Proceedings of the 2022 13th Asian Control Conference (ASCC), Jeju Island, Republic of Korea, 4–7 May 2022. [Google Scholar]
  8. De Fazio, R.; Mastronardi, V.M.; Petruzzi, M.; De Vittorio, M.; Visconti, P. Human–Machine Interaction through Advanced Haptic Sensors: A Piezoelectric Sensory Glove with Edge Machine Learning for Gesture and Object Recognition. Future Internet 2023, 15, 14. [Google Scholar] [CrossRef]
  9. Yun, G.; Kwak, H.; Kim, D.H. Single-Handed Gesture Recognition with RGB Camera for Drone Motion Control. Appl. Sci. 2024, 14, 10230. [Google Scholar] [CrossRef]
  10. Lee, J.W.; Kim, K.-J.; Yu, K.-H. Implementation of a User-Friendly Drone Control Interface Using Hand Gestures and Vibrotactile Feedback. J. Inst. Control Robot. Syst. 2022, 28, 349–352. [Google Scholar] [CrossRef]
  11. Medeiros, D.; Sousa, M.; Raposo, A.; Jorge, J. Magic Carpet: Interaction Fidelity for Flying in VR. IEEE Trans. Vis. Comput. Graph. 2020, 26, 2793–2804. [Google Scholar] [CrossRef] [PubMed]
  12. Shin, S.-Y.; Kang, Y.-W.; Kim, Y.-G. Hand Gesture-based Wearable Human-Drone Interface for Intuitive Movement Control. In Proceedings of the 2019 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 11–13 January 2019; pp. 1–6. Available online: https://ieeexplore.ieee.org/document/8662106 (accessed on 15 August 2025).
  13. Lee, J.-W.; Yu, K.-H. Wearable Drone Controller: Machine Learning-Based Hand Gesture Recognition and Vibrotactile Feedback. Sensors 2023, 23, 2666. [Google Scholar] [CrossRef] [PubMed]
  14. Lawrence, I.D.; Pavitra, A.R.R. Voice-controlled drones for smart city applications. In Sustainable Innovation for Industry 6.0; IGI Global: Hershey, PA, USA, 2024; pp. 162–177. [Google Scholar]
  15. Darvish, K.; Penco, L.; Ramos, J.; Cisneros, R.; Pratt, J.; Yoshida, E.; Ivaldi, S.; Pucci, D. Teleoperation of Humanoid Robots: A Survey. IEEE Trans. Robot. 2023, 39, 1706–1727. [Google Scholar] [CrossRef]
  16. Tezza, D.; Andujar, M. The State-of-the-Art of Human–Drone Interaction: A Survey. IEEE Access 2019, 7, 167438–167454. [Google Scholar] [CrossRef]
  17. Lee, Y.; Connor, A.M.; Marks, S. Mixed Interaction: Evaluating User Interactions for Object Manipulations in Virtual Space. J. Multimodal User Interfaces 2024, 18, 297–311. [Google Scholar] [CrossRef]
  18. Paterson, J.; Aldabbagh, A. Gesture-controlled robotic arm utilizing opencv. In Proceedings of the 2021 3rd International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), Ankara, Turkey, 11–13 June 2021. [Google Scholar]
  19. Xiao, C.; Woeppel, A.B.; Clepper, G.M.; Gao, S.; Xu, S.; Rueschen, J.F.; Kruse, D.; Wu, W.; Tan, H.Z.; Low, T.; et al. Tactile and chemical sensing with haptic feedback for a telepresence explosive ordnance disposal robot. IEEE Trans. Robot. 2023, 39, 3368–3381. [Google Scholar] [CrossRef]
  20. Dafarra, S.; Pattacini, U.; Romualdi, G.; Rapetti, L.; Grieco, R.; Darvish, K.; Milani, G.; Valli, E.; Sorrentino, I.; Viceconte, P.M.; et al. icub3 Avatar System: Enabling Remote Fully Immersive Embodiment of Humanoid Robots. Sci. Robot. 2024, 9, eadh3834. [Google Scholar] [CrossRef] [PubMed]
  21. Park, S.; Kim, J.; Lee, H.; Jo, M.; Gong, D.; Ju, D.; Won, D.; Kim, S.; Oh, J.; Jang, H.; et al. A Whole-Body Integrated AVATAR System: Implementation of Telepresence with Intuitive Control and Immersive Feedback. IEEE Robot. Autom. Mag. 2023, 32, 60–68. [Google Scholar] [CrossRef]
  22. Di Tecco, A.; Camardella, C.; Leonardis, D.; Loconsole, C.; Frisoli, A. Virtual Dashboard Design for Grasping Operations in Teleoperation Systems. In Proceedings of the 2024 IEEE International Conference on Metrology for eXtended Reality, Artificial Intelligence and Neural Engineering (MetroXRAINE), London, UK, 21–23 October 2024; pp. 994–999. [Google Scholar] [CrossRef]
  23. Schwarz, M.; Lenz, C.; Rochow, A.; Schreiber, M.; Behnke, S. Nimbro Avatar: Interactive Immersive Telepresence with ForceFeedback Telemanipulation. In Proceedings of the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Prague, Czech Republic, 27 September–1 October 2021; pp. 5312–5319. [Google Scholar] [CrossRef]
  24. Galarza, B.R.; Ayala, P.; Manzano, S.; Garcia, M.V. Virtual reality teleoperation system for mobile robot manipulation. Robotics 2023, 12, 163. [Google Scholar] [CrossRef]
  25. Gorjup, G.; Dwivedi, A.; Elangovan, N.; Liarokapis, M. An Intuitive, Affordances Oriented Telemanipulation Framework for a Dual Robot Arm Hand System: On the Execution of Bimanual Tasks. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 3611–3616. [Google Scholar] [CrossRef]
  26. Zhang, T.; McCarthy, Z.; Jowl, O.; Lee, D.; Chen, X.; Goldberg, K.; Abbeel, P. Deep imitation learning for complex manipulation tasks from virtual reality teleoperation. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 5628–5635. [Google Scholar] [CrossRef]
  27. Lee, H.; Kim, H.; Kim, H.J. Planning and Control for Collision-Free Cooperative Aerial Transportation. IEEE Trans. Autom. Sci. Eng. 2016, 15, 189–201. [Google Scholar] [CrossRef]
  28. Chen, T.; Shan, J.; Liu, H.H.T. Cooperative Transportation of a Flexible Payload Using Two Quadrotors. J. Guid. Control. Dyn. 2021, 44, 2099–2107. [Google Scholar] [CrossRef]
  29. Loianno, G.; Kumar, V. Cooperative Transportation using Small Quadrotors using Monocular Vision and Inertial Sensing. IEEE Robot. Autom. 2017, 3, 680–687. [Google Scholar] [CrossRef]
  30. Turco, E.; Castellani, C.; Bo, V.; Pacchierotti, C.; Prattichizzo, D.; Lisini Baldi, T. Reducing Cognitive Load in Teleoperating Swarms of Robots through a Data-Driven Shared Control Approach. In Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Abu Dhabi, United Arab Emirates, 14–18 October 2024; pp. 4731–4738. [Google Scholar]
  31. Phung, A.; Billings, G.; Daniele, A.F.; Walter, M.R.; Camilli, R. Enhancing Scientific Exploration of the Deep Sea through Shared Autonomy in Remote Manipulation. Sci. Robot. 2023, 8, eadi5227. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Traditional UAV control using a radio transmitter: (a) motion of the UAV during flight; (b) corresponding manual operation of the radio transmitter to achieve the desired motion.
Figure 2. VR controller for UAV operation: (a) rotational axes; (b) joystick input layout.
Figure 3. Mapping of the VR controller’s pitch input to UAV and manipulator motion: (a) pitch input gesture using the VR controller; (b) resulting vertical translation of the UAV; (c) resulting rotation of the J4 joint on the onboard manipulator.
Figure 4. Mapping of the VR controller’s roll input to UAV and manipulator motion: (a) roll input gesture using the VR controller; (b) resulting yaw rotation of the UAV; (c) resulting rotation of the J5 joint on the onboard manipulator.
Figure 5. Kinematic structure of the UAV–manipulator system.
Figure 6. (a) VR-based hand gesture teleoperation; (b) corresponding dual-UAV manipulation scheme.
Figure 7. VR gesture input for dual-UAV circular maneuver: (a) input gesture using the VR controller; (b) resulting coordinated circular motion of two UAVs.
Figure 8. Hand gesture control and yaw compensation of the UAV–manipulator system: (a) hand gesture input for commanding motion; (b) resulting yaw compensation of the UAV–manipulator system.
Figure 9. Hand gesture–based altitude control and dual-UAV compensation: (a) hand gesture input for altitude control; (b) compensatory behavior of the dual-UAV system in response.
Figure 10. System architecture of the VR-based teleoperation and simulation framework.
Figure 11. VR setup with HMD, controllers, and tracking stations.
Figure 12. Experimental plan of a single-UAV operation.
Figure 13. Experimental plan of dual-UAV cooperative operation: (A) Start; (B) Adjust altitude; (C) Adjust heading; (D) Move; (E) Adjust heading; (F) Move; (G) Symmetric rotation; (H) Adjust heading; (I) Window traversal; (J) Descend and release; (K) Ascend; (L) Retreat.
Figure 14. Predefined geometric trajectories executed by the UAV–manipulator end-effector: (a) Square-shaped trajectory; (b) Triangle-shaped trajectory.
Figure 15. UAV trajectory from start (green) to end (red) position.
Figure 16. Sequential UAV teleoperation snapshots in indoor simulation: (1–3) approach phase; (4–6) grasping and departure.
Figure 17. UAV cooperative trajectories shown in 2D.
Figure 18. UAV cooperative trajectories shown in 3D.
Figure 19. Sequential snapshots of UAV teleoperation in the simulated indoor environment.
Figure 20. Comparison of UAV inclination and wrist roll compensation.
Figure 21. UAV yaw angle vs. compensated arm joint, showing yaw decoupling.
Figure 22. Dual-UAV coordination behavior: (a) yaw orientation of the two UAVs; (b) inter-UAV distance during coordinated flight.
Table 1. UAV Parameters.
Mass (kg): 4.0
Moment of Inertia (kg·m²): diag(0.072, 0.135, 0.153)
Wheelbase (m): 1.3
Thrust Coefficient (N/rpm²): 3.0 × 10⁻⁷
Moment Coefficient (N·m/rpm²): 4.0 × 10⁻⁸