Article

Beyond Visual and Force Feedback: Role of Vibrotactile and Auditory Cues in Robot Teleoperated Assembly

1 Department of Mechanical Engineering, Kobe University, Nada-ku, Kobe 657-8501, Japan
2 Haptics Laboratory, Faculty of Fiber Science and Engineering, Kyoto Institute of Technology, Sakyo-ku, Kyoto 606-8585, Japan
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Robotics 2026, 15(2), 39; https://doi.org/10.3390/robotics15020039
Submission received: 31 December 2025 / Revised: 31 January 2026 / Accepted: 5 February 2026 / Published: 9 February 2026
(This article belongs to the Special Issue Embodied Intelligence: Physical Human–Robot Interaction)

Abstract

Reliable detection of contact states, such as the “mating” of connectors, is crucial for high-quality teleoperated assembly. Conventional systems relying solely on visual and continuous force feedback often fail to convey the discrete high-frequency transients that accompany such events, owing to their limited high-frequency rendering capability. This study investigates the effectiveness of augmenting visual and force feedback with vibrotactile and auditory cues for detecting connector mating. We conducted three experiments: (1) a mating detection task using recorded multimodal data ( N = 10 ), (2) a modality contribution analysis ( N = 10 ), and (3) a real-time robot connector insertion task ( N = 10 ). Results from the real-time task demonstrated that the proposed multimodal feedback significantly reduced the maximum contact force exerted after mating compared to the baseline visual-force condition ( p < 0.001 ), thereby enhancing physical safety. Furthermore, vibrotactile and auditory cues were found to be redundant yet complementary, providing robust cues even when one modality is compromised. Although subjective mental workload increased due to sensory integration, the significant improvement in detection clarity and safety justifies the multimodal approach. We conclude that providing transient vibrotactile and auditory cues is a highly effective strategy for compensating for the limitations of conventional force feedback in teleoperated assembly.


1. Introduction

1.1. Background

Industrial robots are widely deployed in manufacturing to automate repetitive and hazardous operations such as welding, painting, and pick-and-place handling, yielding substantial gains in throughput, consistency, and safety [1,2]. Nevertheless, many fine assembly processes in electronics and automotive production remain manual or entail intensive human supervision, because they require high dexterity and robust perception in unstructured environments.
Teleoperation has therefore emerged as a practical means to extend the applicability of industrial robots to such tasks. By coupling a human operator to a remote robot through a leader–follower interface, teleoperation allows human decision making and dexterous skills to be exploited while keeping the operator physically separated from constrained or hazardous workspaces [3,4]. In addition to direct execution, teleoperation is increasingly used to collect demonstrations for learning-based automation of contact-rich assembly skills [5,6]. In particular, bilateral control systems facilitate the simultaneous recording of expert force profiles and motion trajectories, a capability essential for learning compliant behaviors in assembly tasks [7,8,9].
Among various assembly operations, connector insertion is especially demanding for teleoperation. Mechanical and electrical connectors often exhibit small clearances, preload, and complex insertion trajectories so that successful mating is associated with subtle, high-frequency mechanical transients that are perceived as a characteristic “click” in both touch and sound [10]. During manual assembly, operators integrate multiple sensory cues, such as visual confirmation, kinesthetic reaction forces, and the distinct vibration and sound at the moment of mating, to decide whether the connector is fully inserted.
In conventional teleoperation, these cues are degraded or partially lost. Camera-based visual feedback is frequently compromised by self-occlusions: the robot hand, cables, or surrounding parts may block the view of the mating interface just before or during insertion, making it difficult to verify completion from images alone. Even when bilateral force feedback is available, the ability to render high-frequency signals is fundamentally limited. To guarantee control stability (i.e., passivity) in the presence of communication delays, the force feedback loop is typically designed to be low-pass-filtered [11,12]. Furthermore, the inherent mechanical dynamics of the haptic interface, such as inertia and friction, inevitably act as a physical filter that suppresses rapid vibrations. As a result, high-frequency contact transients associated with the “click” event are strongly attenuated and rarely transmitted through the kinesthetic channel. Operators may therefore miss the mating event, apply excessive push forces that risk damaging delicate components, or conversely terminate the motion prematurely and leave connectors incompletely engaged.

1.2. Related Works

1.2.1. Haptic (Kinesthetic and Cutaneous) and Auditory Feedback

To mitigate the limitations of visual feedback, extensive research has explored haptic feedback in teleoperation. Numerous studies have demonstrated that force feedback improves performance in contact-rich manipulation tasks by enabling the operator to feel interaction forces and adjust motion accordingly [13,14,15]. However, simply increasing the bandwidth or gain of bilateral force feedback to transmit high-frequency transients often compromises system stability, particularly in the presence of communication delays and model uncertainties [11,12].
To address the trade-off between stability and transparency, two main approaches have been proposed to separate high-frequency tactile information from the low-frequency kinesthetic loop.
The first approach is “event-based” or model-based haptics. Rather than transmitting raw signals, these methods synthesize high-frequency vibrations based on physical models or trigger pre-recorded transients when a specific contact event is detected [16]. Kuchenbecker et al. showed that superimposing these high-frequency acceleration transients on conventional force feedback significantly enhances the perceived realism of contact without compromising closed-loop stability. While effective for predictable interactions, these model-based methods often require prior parameter tuning for different materials.
The second approach, which this study adopts, is record-based (measurement-based) feedback: high-frequency vibration signals generated at the tool–environment interface are captured and replayed to the operator as waveforms (often with only level calibration and saturation prevention). Kontarinis et al. first proposed presenting measured high-frequency vibrations via a separate vibrotactile actuator, decoupled from the kinesthetic force feedback channel [17]. In teleoperated medical and construction settings, systems such as VerroTouch and naturalistic vibration feedback devices capture tool-contact accelerations and replay them to the operator [18,19,20]. Such “naturalistic” playback preserves the rich, unmodeled temporal and spectral characteristics of contact-induced vibrations, which can be informative for manipulation tasks involving diverse materials. Beyond teleoperation, record-based replay has also been explored in haptic media prototyping, where recorded interaction signals are reproduced to convey realistic tactile sensations [21].
Recent studies have further examined how such force and vibrotactile feedback should be combined. Wildenbeest et al. observed that while the low-frequency channel primarily enhances task performance, high-frequency cues are more effective in reducing subjective workload and perceived difficulty [22]. Furthermore, distinct roles for these modalities have been identified depending on the target property; for instance, vibration cues are particularly effective for roughness discrimination, whereas force cues are dominant for hardness perception [23].
Auditory feedback has also been explored as a powerful modality for conveying temporal information in teleoperation. The human auditory system is highly sensitive to transient events and can detect changes in contact state with very short reaction times [24,25]. Early work by Massimino and Sheridan used auditory displays as sensory substitution for force feedback, mapping contact force to sound intensity and pitch, and showed that auditory feedback can support accurate teleoperation even under visual occlusion and communication delay [24]. In robot-assisted surgery, Kitagawa et al. demonstrated that visual and auditory sensory substitution of suture tension helps surgeons apply more consistent forces during knot tying [26].

1.2.2. Limitations of Previous Studies

Despite these advances, relatively few studies have focused specifically on connector mating detection in bilateral teleoperation. Most prior work on haptic feedback for teleoperated assembly has considered generic peg-in-hole or snap-fit tasks and evaluated aggregate performance metrics such as completion time, contact forces, and positioning errors [15,22]. Wildenbeest et al. decomposed a typical (dis-)assembly task into free-space movement, contact transitions, and constrained translational and rotational phases and showed that low-frequency force feedback is particularly effective for reducing impact forces in these phases, whereas adding high-frequency vibration yields only marginal improvements in overall task performance [22]. While these works clearly show that force and vibrotactile feedback improve safety and efficiency at the task level, they do not isolate the perceptual processes underlying the detection of discrete mating events under realistic visual occlusions.
Similarly, although vibrotactile and auditory feedback have each been shown to enhance teleoperation, their specific roles and potential redundancy or complementarity in connector mating remain unclear. Studies comparing haptic and auditory cues often focus on warning signals such as slip detection or threshold crossing [25,27], rather than on the precise discrimination of mating events during fine assembly. Moreover, recent work on naturalistic vibrotactile feedback deliberately calls for systematic comparisons with other modalities, including auditory displays, in realistic teleoperation scenarios [20].
Finally, the impact of additional high-frequency cues on operator fatigue and cognitive load is not yet fully understood. While some teleoperation studies report subjective workload measures, the trade-off between improved perceptual information and potential sensory overload has not been systematically explored for connector insertion tasks. Understanding whether vibrotactile and auditory cues genuinely support the operator, or merely increase sensory load, is crucial for designing effective telepresence interfaces that will be acceptable in industrial practice.

1.3. Research Objective

The objective of this study is to develop a multimodal teleoperation system that assists connector insertion by enhancing the perception of mating events. Our system captures high-frequency mechanical transients associated with connector mating and feeds them back to the operator via vibrotactile and auditory channels, complementing standard visual and bilateral force feedback. Unlike many existing studies that focus on the refinement of single-modality feedback, this paper experimentally validates the effectiveness of an integrated multimodal approach. We demonstrate that combining high-frequency vibrotactile and auditory cues with conventional feedback creates a more robust teleoperation environment than single-modality systems.
To validate the system and clarify the role of each sensory modality, we conducted three experiments:
Experiment 1:
Mating Detection: We compared the proposed multimodal condition (Visual + Force + Vibrotactile + Audio) against Visual-only and Visual + Force baselines. This experiment demonstrates the necessity of non-visual cues for reliable detection under visual occlusion.
Experiment 2:
Modality Contribution: We investigated the individual contributions of vibrotactile and auditory cues by comparing them separately and together. We evaluated detection rates, subjective ease of use, and mental workload to determine whether these modalities play complementary or redundant roles.
Experiment 3:
Connector Insertion: We examined whether improved mating detection translates into enhanced performance during active robot control. We assessed whether multimodal feedback reduces task completion time and peak insertion forces and evaluated subjective performance metrics in a realistic teleoperation scenario.
By analyzing how high-frequency cues support decision-making, this paper clarifies the functional division between these modalities and demonstrates their ability to compensate for the band-limited nature of conventional force feedback.

1.4. Contributions

This paper makes the following contributions:
  • We formulate connector mating as a high-frequency transient event in teleoperated assembly and show why conventional visual and band-limited force feedback can be insufficient under realistic occlusions.
  • We propose a record-based multimodal feedback design that replays measured high-frequency transients through vibrotactile and auditory channels while preserving the stability constraints of the bilateral force loop.
  • Through three human-subject experiments, we quantify (i) improvements in mating detection accuracy and clarity and (ii) a significant reduction in post-mating maximum contact force, demonstrating enhanced physical safety.
  • We provide a modality contribution analysis showing that vibrotactile and auditory cues are largely redundant yet complementary, supporting robust operation when one modality is compromised.

2. Teleoperation System with Multimodal Feedback

2.1. System Overview

To investigate the efficacy of multimodal feedback, we developed a leader–follower teleoperation system that provides visual, force, vibrotactile, and auditory information. The system consists of a leader interface operated by the user and a follower robot. The pose of the leader device is transmitted to the follower robot, while sensory data acquired on the follower side are streamed back to the operator.

2.1.1. Follower-Side System

The follower-side system is shown in Figure 1. We used a 6-DOF industrial robot arm (RS007L, Kawasaki Heavy Industries, Ltd., Tokyo, Japan). A two-finger gripper (2F-140, Robotiq, Quebec City, QC, Canada) was attached to the end-effector with custom 3D-printed fingers designed to hold the connector and to damp impacts during insertion.
Sensors for feedback generation were installed as follows:
  • Visual: A close-up camera (RealSense D405, Intel Corporation, Santa Clara, CA, USA) captured the gripper view ( 320 × 180 , 30 fps), and an overview camera (STC-MCS322U3V, OMRON SENTECH CO., LTD., Kanagawa, Japan) captured the workspace ( 800 × 600 , 30 fps).
  • Force: A 6-axis force/torque sensor (Axia80-M20, ATI Industrial Automation, Apex, NC, USA) mounted between the wrist and the gripper measured interaction forces.
  • Vibrotactile: A high-bandwidth accelerometer (VS-BV20, TOKIN Corporation, Miyagi, Japan) was attached to a gripper finger to capture high-frequency transients during connector mating.
  • Audio: An omnidirectional microphone (AT9904, Audio-Technica Corporation, Tokyo, Japan) was placed near the gripper fingers to capture contact sounds.

2.1.2. Leader-Side System

The leader-side system is shown in Figure 2. The operator controlled the robot and received feedback through the following interfaces:
  • Visual: A monitor displayed the two camera feeds side-by-side, as shown in Figure 3.
  • Force: A haptic device (omega.6, Force Dimension, Nyon, Switzerland) provided 6-DOF pose input and 3-DOF translational force feedback.
  • Vibrotactile: A vibrotactile actuator (639897, Foster Electric Co., Ltd., Tokyo, Japan) was mounted on the stylus of the haptic device using a 3D-printed jig, enabling fingertip vibration presentation.
  • Audio: Earphones (E500, final Inc., Kanagawa, Japan) presented the audio acquired on the follower side. Ambient noise was attenuated using noise-canceling headphones (WH-CH720N, Sony Corporation, Tokyo, Japan).
Figure 2. Configuration of the leader-side system: (a) Overview of the leader-side interface; (b) Close-up of the force display handle equipped with the vibrotactile display.
Figure 3. Visual feedback information.

2.2. Control Method

We adopted a position–position (pose–pose) teleoperation scheme, in which the end-effector pose of the follower robot tracks the pose of the leader device. This mode is suitable for manipulation tasks requiring human-like motions (e.g., tracing and insertion), because it preserves the geometric correspondence between the operator’s motion and the robot motion.
Since the motion ranges of the leader device and the follower robot differ, we applied scaling to the transmitted pose. Let Δp_L = [Δx_L, Δy_L, Δz_L] and Δϕ_L ∈ ℝ³ denote the incremental translation and rotation (axis–angle representation) of the leader device, respectively. The commanded pose increment for the follower side is computed as
Δx_F = s_x Δx_L,
Δy_F = s_y Δy_L,
Δz_F = s_z Δz_L,
Δϕ_F = s_q Δϕ_L,
where (s_x, s_y, s_z, s_q) = (0.3, 0.3, 0.05, 0.1). The scaling gains were empirically tuned by the authors to balance manipulation stability and operator workload.
The pose command was converted to joint commands by inverse kinematics, and a first-order Butterworth low-pass filter with a cutoff frequency of 10 Hz was applied to the joint angle commands. The control command rate was 500 Hz. Communication between the leader and follower was established using TCPROS (TCP/IP). Given the negligible network latency (verified via ICMP RTT: 0.4 ms), we employed an asynchronous control scheme without explicit time synchronization.
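For concreteness, the following Python sketch illustrates the pose scaling and joint-command filtering described above. It is only an illustration under the stated parameters, not the system's actual implementation; the names scale_pose_increment and JointCommandFilter are assumptions introduced here.

```python
import numpy as np
from scipy.signal import butter, lfilter, lfilter_zi

# Scaling gains from Section 2.2: (s_x, s_y, s_z, s_q) = (0.3, 0.3, 0.05, 0.1).
S_TRANS = np.array([0.3, 0.3, 0.05])   # translation scaling for (x, y, z)
S_ROT = 0.1                            # rotation scaling (axis-angle vector)

def scale_pose_increment(d_trans_leader, d_rot_leader):
    """Scale a leader pose increment before sending it to the follower."""
    d_trans_follower = S_TRANS * np.asarray(d_trans_leader)  # element-wise scaling
    d_rot_follower = S_ROT * np.asarray(d_rot_leader)        # axis-angle scaling
    return d_trans_follower, d_rot_follower

# First-order Butterworth low-pass filter (10 Hz cutoff) applied to the
# joint-angle commands obtained from inverse kinematics, at the 500 Hz rate.
FS = 500.0   # control command rate [Hz]
FC = 10.0    # cutoff frequency [Hz]
b, a = butter(N=1, Wn=FC / (FS / 2.0), btype="low")

class JointCommandFilter:
    """Streaming low-pass filter; one instance per joint."""
    def __init__(self, q0=0.0):
        # Initialize the filter state so the output starts at the initial angle q0.
        self.zi = lfilter_zi(b, a) * q0

    def step(self, q_cmd):
        y, self.zi = lfilter(b, a, [q_cmd], zi=self.zi)
        return y[0]
```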

2.3. Force Feedback

We constructed a force feedback loop that transmits interaction forces measured on the follower side to the leader haptic device. Although both the force/torque sensor and the haptic device support 6-axis (force and torque) signals, in this study we transmitted and rendered forces only (3-DOF translational forces); torques were not displayed. The force/torque sensor was sampled at 2.4 kHz, and force rendering on the haptic device was updated at 1 kHz. The force feedback gain was set to 0.1.
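As a minimal illustration (not the authors' code), the force rendering loop can be viewed as a sample-and-hold of the latest follower-side measurement scaled by the feedback gain; the helper name render_force below is hypothetical.

```python
import numpy as np

FORCE_GAIN = 0.1  # force feedback gain from Section 2.3

def render_force(latest_wrench):
    """Map the most recent follower-side wrench to a 3-DOF leader force command.

    latest_wrench: [Fx, Fy, Fz, Tx, Ty, Tz] sampled at 2.4 kHz; only the
    translational part is rendered on the haptic device at 1 kHz, and the
    torque components are discarded.
    """
    return FORCE_GAIN * np.asarray(latest_wrench[:3], dtype=float)
```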

2.4. Vibrotactile and Auditory Feedback

We implemented record-based vibrotactile and auditory feedback by directly replaying the measured accelerometer and microphone signals acquired on the follower side. Because the absolute perceived intensity depends strongly on the actuator transfer characteristics, mounting conditions, and individual sensitivity, we do not report a single global gain value. Instead, the experimenter set the playback level using a short practice trial so that the transient cues were clearly perceivable without discomfort, and the level was kept fixed throughout the experiment. The accelerometer was sampled at 24 kHz, and the vibrotactile rendering signal was updated at 24 kHz. Audio acquisition and streaming between the PC and the audio interface were executed in blocks at 200 Hz.
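The block-based replay can be sketched as follows; this is an assumed structure rather than the actual implementation, and the class name TransientReplay and the per-participant gain variable are illustrative. At 24 kHz with a 200 Hz block rate, each block carries 120 samples; the calibrated gain is fixed after the practice trial, and the output is clipped to prevent saturation.

```python
import numpy as np

FS_SIGNAL = 24_000                      # accelerometer / audio sampling rate [Hz]
BLOCK_RATE = 200                        # streaming block rate [Hz]
BLOCK_SIZE = FS_SIGNAL // BLOCK_RATE    # 120 samples per block

class TransientReplay:
    """Replay measured vibration/audio blocks with level calibration and clipping."""

    def __init__(self, playback_gain, limit=1.0):
        # playback_gain is set during a short practice trial so that the
        # mating transients are clearly perceivable without discomfort.
        self.gain = playback_gain
        self.limit = limit  # full-scale output limit (saturation prevention)

    def process_block(self, raw_block):
        scaled = self.gain * np.asarray(raw_block, dtype=float)
        return np.clip(scaled, -self.limit, self.limit)
```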

3. Experiment 1: Mating Detection

3.1. Objective

In this experiment, we investigated the participants’ ability to detect the completion of connector mating using pre-recorded multimodal feedback. The goal was to verify whether the proposed multimodal feedback (visual, force, vibrotactile, and auditory) enables more accurate and confident detection compared to conventional feedback methods.

3.2. Method

3.2.1. Participants

Ten subjects ( 22.9 ± 0.7 years old, 9 males and 1 female) participated in this experiment. All participants were university students who had no previous experience with the robotic teleoperation system.

3.2.2. Conditions

We compared three feedback conditions to evaluate the effectiveness of adding vibrotactile and auditory cues:
  • V (Visual only): Participants judged the mating status based solely on the camera view.
  • VF (Visual + Force): Visual feedback combined with force feedback rendered through the haptic device.
  • VFVA (Visual + Force + Vibrotactile + Audio): The proposed method, which presents vibrotactile and auditory feedback by direct playback of the measured acceleration and sound signals, in addition to VF.
To prevent excessive participant fatigue in this initial stage, we focused on comparing the full multimodal configuration (VFVA) against the baselines, leaving the individual modality analysis for Experiment 2.

3.2.3. Stimuli and Procedure

The experimental stimuli consisted of data recorded from 20 trials (10 successful mating cases and 10 failed cases), examples of which are shown in Figure 4. The average duration of the stimuli was 14.27 ± 2.37 s, covering the approach, contact, and mating phases. Participants were presented with these stimuli using the playback system and asked to determine whether the connector mating was successfully completed. The order of feedback conditions was randomized for each participant.

3.2.4. Metrics

  • Correct Answer Rate: The ratio of correctly identified trials (success/failure).
  • Subjective Clarity: After each trial, participants rated the subjective easiness of perception on a 7-point Likert scale (1: Did not understand at all, 7: Understood clearly).
Figure 4. Examples of measured multi-modal information in (a) successful and (b) failed mating cases.

3.3. Results

3.3.1. Correct Answer Rate

Figure 5 shows the correct answer rates for each feedback condition. A Shapiro-Wilk test indicated that the data were not normally distributed. Thus, we conducted a non-parametric Friedman test.
The Friedman test revealed a significant main effect of the feedback condition on the correct answer rate ( χ 2 ( 2 ) = 15.730 , p < 0.001 ). The effect size was large (Kendall’s W = 0.786 ). Post-hoc analyses using Wilcoxon signed-rank tests with Bonferroni correction confirmed that the correct answer rate for the proposed VFVA condition was significantly higher than that of the V condition ( p adj = 0.023 , r = 0.60 ) and the VF condition ( p adj = 0.006 , r = 0.69 ). No significant difference was found between the V and VF conditions ( p adj = 0.698 , r = 0.27 ).
Notably, in the proposed VFVA condition, all participants achieved a correct answer rate of 1.0 (100%), demonstrating that the addition of vibrotactile and auditory cues effectively complements visual and force information for detecting discrete mating events.
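The statistical procedure used throughout Experiments 1 and 2 (Friedman test, Kendall's W, and Bonferroni-corrected Wilcoxon signed-rank post-hoc tests) can be reproduced with standard tools. The sketch below uses SciPy and is an illustration of the analysis pipeline under the stated assumptions, not the authors' analysis script.

```python
import numpy as np
from itertools import combinations
from scipy.stats import friedmanchisquare, wilcoxon

def friedman_with_posthoc(scores):
    """scores: dict mapping condition name -> per-participant values (same subject order)."""
    conds = list(scores)
    data = [np.asarray(scores[c], dtype=float) for c in conds]
    n, k = len(data[0]), len(conds)

    chi2, p = friedmanchisquare(*data)     # main effect of feedback condition
    kendalls_w = chi2 / (n * (k - 1))      # effect size (Kendall's W)

    # Post-hoc Wilcoxon signed-rank tests with Bonferroni correction.
    n_pairs = k * (k - 1) // 2
    posthoc = {}
    for a, b in combinations(conds, 2):
        _, p_raw = wilcoxon(scores[a], scores[b])
        posthoc[(a, b)] = min(p_raw * n_pairs, 1.0)   # adjusted p-value
    return chi2, p, kendalls_w, posthoc
```

Note that scipy's wilcoxon raises an error when all paired differences are zero (e.g., two conditions at an identical ceiling), so such pairs must be handled separately.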

3.3.2. Subjective Clarity

Figure 6 shows the subjective evaluation of the clarity of perception. The Friedman test showed significant differences among the conditions ( χ 2 ( 2 ) = 16.703 , p < 0.001 ) with a large effect size ( W = 0.835 ).
Subsequent multiple comparisons indicated that the proposed VFVA condition was significantly easier to perceive than both the V condition ( p adj = 0.006 , r = 0.69 ) and the VF condition ( p adj = 0.006 , r = 0.69 ). The median score for the VFVA condition was 7 (the maximum score), indicating that participants could perceive the mating moment clearly and with high confidence. No significant difference was observed between the V and VF conditions ( p adj = 0.771 , r = 0.25 ).

4. Experiment 2: Modality Contribution

4.1. Objective

While Experiment 1 demonstrated the effectiveness of the combined multimodal feedback, the individual contributions of vibrotactile and auditory cues remained unclear. The objective of Experiment 2 was to isolate and compare the efficacy of vibrotactile and auditory feedback in detecting the connector mating event. Specifically, we investigated whether one modality is superior to the other or whether they provide redundant information.

4.2. Method

4.2.1. Participants

Ten subjects ( 22.7 ± 0.8 years old, 9 males and 1 female) participated in this experiment. Note that this group was different from the participants in Experiment 1. All participants were university students who had no previous experience with the robotic teleoperation system.

4.2.2. Conditions

Using the same experimental setup and playback method as in Experiment 1, we compared three feedback conditions to isolate the additional cues:
  • VF (Visual + Force)
  • VFV (Visual + Force + Vibrotactile)
  • VFA (Visual + Force + Audio)

4.2.3. Procedure

Participants observed 20 randomized trials (10 success, 10 failure) for each condition. They were asked to judge whether the connector had mated.

4.2.4. Metrics

In addition to the correct answer rate and subjective clarity, we evaluated subjective mental workload to assess the cognitive cost of sensory integration. We employed a 7-point Likert scale based on the “Mental Demand” subscale of the NASA-TLX [28], rather than the full questionnaire. This single-item approach was chosen to minimize interruption time between the repeated trials and reduce overall participant fatigue while allowing for rapid subjective workload probing.

4.3. Results

4.3.1. Correct Answer Rate

Figure 7 illustrates the comparison of task accuracy across the three experimental conditions: VF, VFV, and VFA. A Shapiro-Wilk test indicated that the assumption of normality was violated for the accuracy data ( p < 0.05 ). Consequently, we conducted a non-parametric Friedman test to analyze the differences among the conditions.
The Friedman test revealed a significant main effect of the feedback condition on task accuracy ( χ 2 ( 2 ) = 17.034 , p < 0.001 ). The effect size was large (Kendall’s W = 0.852 ), indicating a substantial difference in performance driven by the feedback modalities.
To investigate pairwise differences, post-hoc analyses were performed using Wilcoxon signed-rank tests with Bonferroni correction for multiple comparisons. The results indicated that the accuracy in the VFV condition was significantly higher than in the VF condition ( p adj = 0.021 , r = 0.60 ). Similarly, the VFA condition showed significantly higher accuracy compared to the VF condition ( p adj = 0.021 , r = 0.60 ). However, no significant difference was observed between the VFV and VFA conditions ( p adj = 0.539 , r = 0.30 ).
These results suggest that the addition of vibrotactile or auditory cues to visual and force feedback significantly improves task performance compared to visual and force feedback alone, while the effectiveness of vibrotactile and auditory cues was comparable in this task.

4.3.2. Subjective Clarity

Figure 8 presents the subjective clarity scores for the three experimental conditions. Since the Shapiro-Wilk test indicated that the data were not normally distributed ( p < 0.05 for all conditions), we employed the non-parametric Friedman test.
The results of the Friedman test showed a significant main effect of the feedback condition on the subjective scores ( χ 2 ( 2 ) = 17.882 , p < 0.001 ). The effect size was large (Kendall’s W = 0.894 ), suggesting that the feedback modality strongly influenced the participants’ subjective experience.
Post-hoc analyses using Wilcoxon signed-rank tests with Bonferroni correction revealed that clarity scores in the VFV condition were significantly higher than in the VF condition ( p adj = 0.006 , r = 0.69 ). Similarly, the VFA condition was rated significantly higher than the VF condition ( p adj = 0.006 , r = 0.69 ). No significant difference was found between the VFV and VFA conditions ( p adj = 0.952 , r = 0.22 ).
These findings suggest that participants perceived the task to be significantly easier when vibrotactile or auditory feedback was provided.

4.3.3. Subjective Mental Workload

Figure 9 shows the subjective mental workload scores for the three conditions. A Shapiro-Wilk test confirmed that the data for all conditions followed a normal distribution ( p > 0.05 ). Therefore, we conducted a one-way repeated measures ANOVA.
The ANOVA revealed a significant main effect of the feedback condition on the subjective workload ( F ( 2 , 18 ) = 12.702 , p < 0.001 ). Post-hoc analyses using paired t-tests with Bonferroni correction indicated that the workload scores in the VFV condition were significantly higher (indicating higher load/fatigue) than in the VF condition ( p adj = 0.006 , d = 0.84 ). Similarly, the VFA condition showed significantly higher workload scores compared to the VF condition ( p adj = 0.02 , d = 0.97 ). No significant difference was observed between the VFV and VFA conditions ( p adj = 1.000 ).
These results suggest that while the additional feedback modalities improved task performance, they also resulted in increased subjective workload compared to the visual and force feedback alone.
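The workload analysis (repeated-measures ANOVA followed by Bonferroni-corrected paired t-tests with Cohen's d) can be sketched as follows, assuming a long-format table of ratings. It uses statsmodels' AnovaRM and SciPy and is illustrative rather than the authors' script; the column names are assumptions.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_rel
from statsmodels.stats.anova import AnovaRM

def analyze_workload(df):
    """df: long-format DataFrame with columns 'subject', 'condition', 'score'."""
    # One-way repeated-measures ANOVA on the workload ratings.
    res = AnovaRM(df, depvar="score", subject="subject", within=["condition"]).fit()
    print(res.anova_table)

    # Post-hoc paired t-tests with Bonferroni correction and paired Cohen's d.
    conds = sorted(df["condition"].unique())
    n_pairs = len(conds) * (len(conds) - 1) // 2
    for i in range(len(conds)):
        for j in range(i + 1, len(conds)):
            # Align scores by subject so the pairs match.
            a = df[df["condition"] == conds[i]].sort_values("subject")["score"].to_numpy()
            b = df[df["condition"] == conds[j]].sort_values("subject")["score"].to_numpy()
            t, p = ttest_rel(a, b)
            d = (a - b).mean() / (a - b).std(ddof=1)  # Cohen's d for paired samples
            print(conds[i], "vs", conds[j], "p_adj =", min(p * n_pairs, 1.0), "d =", d)
```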

5. Experiment 3: Connector Insertion

5.1. Objective

Following the mating detection experiments (Experiments 1 and 2), we conducted a connector insertion task to evaluate the effectiveness of the proposed feedback system in a continuous operation scenario involving active robot control.

5.2. Method

5.2.1. Participants

Ten subjects ( 22.9 ± 0.5 years old, 10 males) participated in this experiment. Note that this group was different from the participants in Experiments 1 and 2. All participants were university students who had no previous experience with the robotic teleoperation system.

5.2.2. Conditions

We compared two feedback conditions to assess the impact of multimodal feedback on teleoperation performance:
  • VF (Visual + Force)
  • VFVA (Visual + Force + Vibrotactile + Audio)
Since this experiment involved active robot control which induces higher participant fatigue compared to passive observation, we limited the comparison to the baseline (VF) and the fully integrated method (VFVA).

5.2.3. Task and Procedure

The task involved controlling the robot to insert a connector into a socket. Each participant performed 10 trials for each condition. The order of conditions was counterbalanced (5 participants per sequence). Before the experimental trials for each condition, participants were given 3 practice trials to familiarize themselves with the specific feedback and operation. Participants were instructed to perform the insertion as quickly and accurately as possible. They were also asked to verbally report the completion of the task immediately upon perceiving that the connector was successfully mated.

5.2.4. Metrics

We evaluated both objective performance and subjective experience using the following metrics:
  • Objective Metrics:
    Contact Force: The maximum contact force exerted on the connector after the mating event, serving as an indicator of excessive force application (a computation sketch follows this list).
    Task Time: The duration from the start of the insertion phase to the participant’s verbal report of completion.
  • Subjective Metrics: After completing the trials for each condition, participants rated the following aspects on a 7-point Likert scale:
    Subjective Clarity: How clear was the mating sensation? (1: Not clear at all, 7: Very clear)
    Subjective Operability: How easy was it to operate the robot? (1: Very difficult, 7: Very easy)
    Subjective Mental Workload: How much mental effort was required? (1: Very low, 7: Very high)
    Subjective Physical Workload: How much physical effort was required? (1: Very low, 7: Very high)
Since this experiment involved active manipulation, we included “Physical Workload” as a metric.
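The contact-force metric can be computed directly from the logged force signal and the detected mating time. The short sketch below is one plausible formulation; the function name and the way the mating instant is supplied are assumptions, not the authors' implementation.

```python
import numpy as np

def max_post_mating_force(t, force_norm, t_mating):
    """Maximum contact force exerted on the connector after the mating event.

    t:          time stamps of the force samples [s]
    force_norm: force magnitudes ||F|| measured by the wrist F/T sensor [N]
    t_mating:   time of the mating event [s], e.g. identified from the
                high-frequency transient in the accelerometer signal
    """
    force_norm = np.asarray(force_norm, dtype=float)
    mask = np.asarray(t, dtype=float) >= t_mating
    return float(force_norm[mask].max()) if mask.any() else float("nan")
```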

5.3. Results

Prior to the main comparison, a supplementary analysis confirmed no significant sequence effects (VF-first vs. VFVA-first) on the performance metrics ( p > 0.05 ).

5.3.1. Maximum Contact Force

Figure 10 compares the maximum contact force between the VF and VFVA conditions. Shapiro-Wilk tests confirmed the normality of the data distributions ( p > 0.05 ). Thus, a paired t-test was conducted. The analysis revealed that the contact force was significantly different between the two conditions ( p < 0.001 , Cohen’s d = 2.18) (VFVA < VF).

5.3.2. Task Completion Time

Figure 11 compares the task completion time between the VF and VFVA conditions. Shapiro-Wilk tests confirmed the normality of the data distributions ( p > 0.05 ). Thus, a paired t-test was conducted. The analysis revealed that the task time was not significantly different between the two conditions ( p = 0.917 , Cohen’s d = 0.06).

5.3.3. Subjective Clarity

Figure 12 compares the clarity between the VF and VFVA conditions. Shapiro-Wilk tests indicated that the data were not normally distributed ( p < 0.05 ). Thus, a Wilcoxon signed-rank test was conducted. The analysis revealed that the clarity was significantly different between the two conditions ( p = 0.002 , r = 0.69) (VFVA > VF).

5.3.4. Subjective Operability

Figure 13 compares the operability between the VF and VFVA conditions. Shapiro-Wilk tests indicated that the data were not normally distributed ( p < 0.05 ). Thus, a Wilcoxon signed-rank test was conducted. The analysis revealed that the operability was significantly different between the two conditions ( p = 0.009 , r = 0.58) (VFVA > VF).

5.3.5. Subjective Mental Workload

Figure 14 compares the mental workload between the VF and VFVA conditions. Shapiro-Wilk tests indicated that the data were not normally distributed ( p < 0.05 ). Thus, a Wilcoxon signed-rank test was conducted. The analysis revealed that the mental workload was significantly different between the two conditions ( p = 0.009 , r = 0.58) (VFVA > VF).

5.3.6. Subjective Physical Workload

Figure 15 compares the physical workload between the VF and VFVA conditions. Shapiro-Wilk tests confirmed the normality of the data distributions ( p > 0.05 ). Thus, a paired t-test was conducted. The analysis revealed that the physical workload was not significantly different between the two conditions ( p = 0.560 , Cohen’s d = 0.23).

6. Discussion

6.1. Effectiveness of Multimodal Feedback

The primary objective of this study was to verify whether adding vibrotactile and auditory cues improves the recognition of connector mating. The results from Experiment 1 and Experiment 3 consistently showed that the proposed multimodal feedback (VFVA) significantly improved detection accuracy and subjective clarity. In teleoperation, continuous force feedback is often band-limited to ensure stability, making it difficult to transmit high-frequency transients (clicks) associated with snap-fits. Visual feedback is also prone to occlusion and delays. Our findings suggest that vibrotactile and auditory cues successfully compensated for the limitations of visual and force feedback by preserving the transient information essential for recognizing the mating moment.

6.2. Redundancy of Vibrotactile and Auditory Cues

Experiment 2 was designed to clarify the individual contributions of vibrotactile and auditory modalities. Interestingly, we found no significant difference in performance or subjective evaluation between the VFV (Vibration) and VFA (Audio) conditions; both improved performance equally compared to the baseline. This suggests that these two modalities provide redundant information for this specific task. However, this redundancy is advantageous in practical scenarios. For instance, in a noisy factory environment where audio is masked, vibration can serve as the primary cue. Conversely, if the haptic interface lacks high-frequency capability, audio can compensate.

6.3. Trade-Off Between Performance and Mental Workload

A notable finding in Experiments 2 and 3 was the increase in subjective mental workload (or fatigue) under the multimodal conditions, despite the improvement in task performance and clarity. While participants reported that the task was “easier to understand” (Clarity), they also felt “more tired” (Mental Workload). This apparent contradiction can be attributed to the increased cognitive load required to integrate multiple sensory inputs. Although the recognition task itself is passive, simultaneously monitoring additional sensory channels (vibrotactile and auditory) alongside visual and force feedback demands higher attentional resources. Thus, we interpret the observed workload increase as a perceptual cost of sensory integration rather than a result of motor adjustments. However, we anticipate that long-term usage will induce a learning effect, thereby reducing this cognitive load as operators develop automaticity in processing these cues. Future work should investigate optimal information presentation methods to minimize this cognitive cost, such as adjusting the intensity or limiting the bandwidth to essential frequencies.

6.4. Impact on Physical Safety

In Experiment 3, the additional feedback significantly reduced the maximum contact force exerted after mating. This indicates that the operators were able to stop the robot’s motion more quickly upon perceiving the mating signal. Although the total task time did not significantly change, the reduction in excessive force is critical for preventing damage to the workpiece and the robot. This confirms that multimodal feedback contributes not only to perception but also to the safety and quality of the assembly operation.

6.5. Limitations and Future Work

While our system replays measured vibration and audio signals with minimal processing, tactile transmission under substantial communication delays can benefit from model-based approaches that estimate and update physical parameters and reconstruct tactile cues on the leader side [29]. Incorporating such delay-compensation strategies is an important direction for future work.
In addition to high-frequency vibration and audio cues, further performance gains may be achieved by transmitting other tactile modalities, in particular the spatial distribution of cutaneous contact on the fingertip (e.g., pressure and shear distributions). Our current interface replays vibration and audio signals as temporal waveforms, but it does not convey where and how the contact is distributed on the finger pad, which may be informative for contact state estimation and fine manipulation. Recent wearable high-resolution cutaneous displays can reproduce distributed fingertip contact patterns using different actuation principles, including multi-channel suction stimuli [30] and finger-mounted high-density pin-array displays [31]. Integrating such distributed cutaneous feedback with the proposed record-based transient playback is a promising direction for improving robustness and reducing excessive contact forces in assembly tasks.
Regarding practical deployment, interference from structural vibrations is a potential challenge. However, such periodic equipment vibrations can be treated as “ego-noise” and effectively suppressed using noise removal techniques, as demonstrated in teleoperated construction robots [19,32]. By combining these methods with frequency-based filtering, the mating cues can be robustly isolated from environmental noise.
Finally, while this study relied on subjective reports (a single-item scale based on the NASA-TLX) for workload assessment, future experiments will incorporate objective measurements using physiological signals to quantify cognitive load more precisely. Methodologies for extracting stress indicators in HRI contexts, as discussed in [33], will be adopted to objectively analyze the physiological cost of sensory integration.

7. Conclusions

This study proposed a multimodal teleoperation system that enhances visual and force feedback with vibrotactile and auditory cues to support connector mating tasks. We demonstrated that these additional modalities effectively compensate for the limited high-frequency rendering capabilities of conventional force feedback systems by acting as proxies for transient signals.
Through a series of experiments, we concluded the following:
  • Performance and Clarity: The addition of vibrotactile and auditory cues significantly improves the accuracy and subjective clarity of mating detection compared to visual and force feedback alone.
  • Robustness via Redundancy: Vibrotactile and auditory cues contribute comparably to this performance improvement. This functional redundancy ensures robust operation even if one modality is masked or unavailable.
  • Safety vs. Mental Workload Trade-off: While the proposed feedback enhances physical safety by significantly reducing post-mating contact forces, it imposes a higher mental workload on the operator due to the cognitive demand of sensory integration.
These findings highlight that utilizing high-frequency modalities is essential for reliable and safe teleoperated assembly. Future work will focus on optimizing the feedback parameters (e.g., intensity or timing) to mitigate mental workload and applying this method to more complex assembly tasks involving flexible objects.

Author Contributions

Conceptualization, H.N.; methodology, K.O. and H.N.; software, K.O. and H.N.; validation, K.O. and H.N.; formal analysis, K.O. and H.N.; investigation, K.O. and H.N.; resources, H.N. and Y.Y.; data curation, K.O. and H.N.; writing—original draft preparation, K.O. and H.N.; writing—review and editing, H.N. and Y.Y.; visualization, K.O. and H.N.; supervision, H.N. and Y.Y.; project administration, H.N.; funding acquisition, Y.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was subsidized by the New Energy and Industrial Technology Development Organization (NEDO) under the project JPNP20016. This paper is one of the achievements of joint research with, and is the jointly owned copyrighted material of, the ROBOT Industrial Basic Technology Collaborative Innovation Partnership (ROBOCIP).

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the ethical review board of the faculty of engineering of Kobe University (protocol code 04-51, 13 January 2023).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original contributions presented in this study are included in the article material. Further inquiries can be directed to the corresponding author.

Acknowledgments

During the preparation of this manuscript/study, the authors used ChatGPT-5.2 (OpenAI) for the purposes of assisting with language polishing, in order to enhance the readability of the manuscript. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hägele, M.; Nilsson, K.; Pires, J.N.; Bischoff, R. Industrial robotics. In Springer Handbook of Robotics; Springer: Cham, Switzerland, 2016; pp. 1385–1422.
  2. Arents, J.; Greitans, M. Smart industrial robot control trends, challenges and opportunities within manufacturing. Appl. Sci. 2022, 12, 937.
  3. Hokayem, P.F.; Spong, M.W. Bilateral teleoperation: An historical survey. Automatica 2006, 42, 2035–2057.
  4. Niemeyer, G.; Preusche, C.; Stramigioli, S.; Lee, D. Telerobotics. In Springer Handbook of Robotics; Springer: Cham, Switzerland, 2016; pp. 1085–1108.
  5. Zhao, T.Z.; Kumar, V.; Levine, S.; Finn, C. Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware. In Proceedings of the Robotics: Science and Systems, Daegu, Republic of Korea, 10–14 July 2023.
  6. Chi, C.; Xu, Z.; Feng, S.; Cousineau, E.; Du, Y.; Burchfiel, B.; Tedrake, R.; Song, S. Diffusion policy: Visuomotor policy learning via action diffusion. Int. J. Robot. Res. 2025, 44, 1684–1704.
  7. Sasagawa, A.; Fujimoto, K.; Sakaino, S.; Tsuji, T. Imitation learning based on bilateral control for human–robot cooperation. IEEE Robot. Autom. Lett. 2020, 5, 6169–6176.
  8. Buamanee, T.; Kobayashi, M.; Uranishi, Y.; Takemura, H. Bi-act: Bilateral control-based imitation learning via action chunking with transformer. In Proceedings of the 2024 IEEE International Conference on Advanced Intelligent Mechatronics (AIM), Boston, MA, USA, 15–19 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 410–415.
  9. Liu, W.; Wang, J.; Wang, Y.; Wang, W.; Lu, C. Forcemimic: Force-centric imitation learning with force-motion capture system for contact-rich manipulation. In Proceedings of the 2025 IEEE International Conference on Robotics and Automation (ICRA), Atlanta, GA, USA, 19–23 May 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 1105–1112.
  10. Rusli, L.; Luscher, A.; Sommerich, C. Force and tactile feedback in preloaded cantilever snap-fits under manual assembly. Int. J. Ind. Ergon. 2010, 40, 618–628.
  11. Lawrence, D.A. Stability and transparency in bilateral teleoperation. IEEE Trans. Robot. Autom. 1993, 9, 624–637.
  12. Hannaford, B.; Ryu, J.H. Time-domain passivity control of haptic interfaces. IEEE Trans. Robot. Autom. 2002, 18, 1–10.
  13. Hannaford, B.; Kim, W.S. Force reflection, shared control, and time delay in telemanipulation. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Cambridge, MA, USA, 14–17 November 1989; IEEE: Piscataway, NJ, USA, 1989; pp. 133–137.
  14. Wagner, C.R.; Howe, R.D. Force feedback benefit depends on experience in multiple degree of freedom robotic surgery task. IEEE Trans. Robot. 2007, 23, 1235–1240.
  15. Tanioka, T.; Nagano, H.; Tazaki, Y.; Yokokohji, Y. Effects of Haptic Feedback on Precision Peg Insertion Tasks Under Different Visual and Communication Latency Conditions. Robotics 2025, 14, 34.
  16. Kuchenbecker, K.J.; Fiene, J.; Niemeyer, G. Improving contact realism through event-based haptic feedback. IEEE Trans. Vis. Comput. Graph. 2006, 12, 219–230.
  17. Kontarinis, D.A.; Howe, R.D. Tactile display of vibratory information in teleoperation and virtual environments. Presence Teleoperators Virtual Environ. 1995, 4, 387–402.
  18. Kuchenbecker, K.J.; Gewirtz, J.; McMahan, W.; Standish, D.; Martin, P.; Bohren, J.; Mendoza, P.J.; Lee, D.I. VerroTouch: High-frequency acceleration feedback for telerobotic surgery. In Proceedings of the Haptics: Generating and Perceiving Tangible Sensations: International Conference, EuroHaptics 2010, Amsterdam, The Netherlands, 8–10 July 2010; Proceedings, Part I; Springer: Berlin/Heidelberg, Germany, 2010; pp. 189–196.
  19. Nagano, H.; Takenouchi, H.; Cao, N.; Konyo, M.; Tadokoro, S. Tactile feedback system of high-frequency vibration signals for supporting delicate teleoperation of construction robots. Adv. Robot. 2020, 34, 730–743.
  20. Gong, Y.; Mat Husin, H.; Erol, E.; Ortenzi, V.; Kuchenbecker, K.J. AiroTouch: Enhancing telerobotic assembly through naturalistic haptic feedback of tool vibrations. Front. Robot. AI 2024, 11, 1355205.
  21. Minamizawa, K.; Kakehi, Y.; Nakatani, M.; Mihara, S.; Tachi, S. TECHTILE toolkit: A prototyping tool for design and education of haptic media. In Proceedings of the 2012 Virtual Reality International Conference, Laval, France, 28–30 March 2012; pp. 1–2.
  22. Wildenbeest, J.G.; Abbink, D.A.; Heemskerk, C.J.; Van Der Helm, F.C.; Boessenkool, H. The impact of haptic feedback quality on the performance of teleoperated assembly tasks. IEEE Trans. Haptics 2012, 6, 242–252.
  23. Takahashi, M.; Nagano, H.; Tazaki, Y.; Yokokohji, Y. Effective haptic feedback type for robot-mediated material discrimination depending on target properties. Front. Virtual Real. 2023, 4, 1070739.
  24. Massimino, M.J.; Sheridan, T.B. Sensory substitution for force feedback in teleoperation. In Analysis, Design and Evaluation of Man–Machine Systems 1992; Elsevier: Amsterdam, The Netherlands, 1993; pp. 109–114.
  25. Aviles-Rivero, A.I.; Alsaleh, S.M.; Philbeck, J.; Raventos, S.P.; Younes, N.; Hahn, J.K.; Casals, A. Sensory substitution for force feedback recovery: A perception experimental study. ACM Trans. Appl. Percept. (TAP) 2018, 15, 1–19.
  26. Kitagawa, M.; Dokko, D.; Okamura, A.M.; Yuh, D.D. Effect of sensory substitution on suture-manipulation forces for robotic surgical systems. J. Thorac. Cardiovasc. Surg. 2005, 129, 151–158.
  27. van Beek, F.E.; Bisschop, Q.; Gijsbertse, K.; de Vries, P.S.; Kuling, I.A. A Comparison of Haptic and Auditory Feedback as a Warning Signal for Slip in Tele-Operation Scenarios. In Proceedings of the International Conference on Human Haptic Sensing and Touch Enabled Computer Applications, Hamburg, Germany, 22–25 May 2022; Springer: Cham, Switzerland, 2022; pp. 101–109.
  28. Hart, S.G.; Staveland, L.E. Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research. Adv. Psychol. 1988, 52, 139–183.
  29. Yamauchi, T.; Okamoto, S.; Konyo, M.; Hidaka, Y.; Maeno, T.; Tadokoro, S. Real-time remote transmission of multiple tactile properties through master-slave robot system. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation, Anchorage, AK, USA, 3–7 May 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1753–1760.
  30. Morita, N.; Ichijo, A.; Konyo, M.; Kato, H.; Sase, K.; Nagano, H.; Tadokoro, S. Wearable high-resolution haptic display using suction stimuli to represent cutaneous contact information on finger pad. IEEE Trans. Haptics 2023, 16, 687–694.
  31. Ujitoko, Y.; Taniguchi, T.; Sakurai, S.; Hirota, K. Development of finger-mounted high-density pin-array haptic display. IEEE Access 2020, 8, 145107–145114.
  32. McMahan, W.; Kuchenbecker, K.J. Spectral subtraction of robot motion noise for improved event detection in tactile acceleration signals. In Proceedings of the International Conference on Human Haptic Sensing and Touch Enabled Computer Applications, Tampere, Finland, 13–15 June 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 326–337.
  33. Bussolan, A.; Baraldo, S.; Gambardella, L.M.; Valente, A. Assessing the impact of human-robot collaboration on stress levels and cognitive load in industrial assembly tasks. In Proceedings of the ISR Europe 2023; 56th International Symposium on Robotics, Stuttgart, Germany, 26–27 September 2023; VDE: Frankfurt am Main, Germany, 2023; pp. 78–85.
Figure 1. Configuration of the follower-side system. (a) Overview of the follower-side robot system; (b) End-effector equipped with a force sensor and a hand camera; (c) End-effector with an attached accelerometer and microphone; (d) End-effector holding the connector.
Figure 5. Comparison of correct answer rates for mating detection across three conditions (V: Visual, VF: Visual + Force, VFVA: Visual + Force + Vibrotactile + Audio) in Experiment 1. *: p adj < 0.05, **: p adj < 0.01.
Figure 6. Comparison of subjective clarity for mating detection across three conditions (V, VF, VFVA) in Experiment 1 (1: Difficult, 7: Easy). **: p adj < 0.01.
Figure 7. Comparison of correct answer rates for mating detection across three conditions (VF: Visual + Force, VFV: Visual + Force + Vibrotactile, VFA: Visual + Force + Audio) in Experiment 2. *: p adj < 0.05.
Figure 8. Comparison of subjective clarity for mating detection across three conditions (VF, VFV, VFA) in Experiment 2. **: p adj < 0.01.
Figure 9. Comparison of subjective mental workload for mating detection across three conditions (VF, VFV, VFA) in Experiment 2. *: p adj < 0.05, **: p adj < 0.01.
Figure 10. Comparison of maximum contact force between two conditions (VF: Visual + Force, VFVA: Visual + Force + Vibrotactile + Audio) in Experiment 3. ***: p < 0.001.
Figure 11. Comparison of task completion time between two conditions (VF and VFVA) in Experiment 3.
Figure 12. Comparison of subjective clarity between two conditions (VF and VFVA) in Experiment 3. **: p < 0.01.
Figure 13. Comparison of subjective operability between two conditions (VF and VFVA) in Experiment 3. **: p < 0.01.
Figure 14. Comparison of subjective mental workload between two conditions (VF and VFVA) in Experiment 3. **: p < 0.01.
Figure 15. Comparison of subjective physical workload between two conditions (VF and VFVA) in Experiment 3.