1. Introduction
With the integration of information technology into automotive systems, special vehicles are increasingly adopting automation and intelligent design. Consequently, the main duties of crew members are shifting from manual operations to cognitively demanding tasks [1,2]. Scientific and technological advancements have improved the capabilities of machinery and equipment, thereby amplifying the impact of human limitations on overall system performance. This trend is particularly evident as machines become more intelligent while human operators take on supervisory roles involving monitoring, planning, and diagnosis [3]. Studies reveal that more than 70% of accidents and incidents in typical safety-critical industrial domains are attributable to human error [4,5]. The design of the interaction modality within the crew cabin of special vehicles directly affects crew performance and is closely related to system safety. In recent years, human–machine interaction in special vehicle cabins has transitioned from traditional mechanical interfaces to more intelligent modalities, such as touch-based, speech-based, and multi-modal hybrid interactions [6]. Consequently, research on novel interaction modalities has received considerable attention in the human factors design of crew cabins.
Naturalistic interaction modalities, such as touch, speech, and multi-modal interactions, are being actively explored. Touch interaction enables rapid feedback [7] and fully functional, intuitive interfaces within confined operational environments [8,9]. Speech interaction, on the other hand, offers a remote and contactless interface [9,10]. It allows users to select functions through natural verbal commands, eliminating the need for hierarchical menu structures and explicit navigation [11,12], while also enhancing emotional engagement [9,13]. Previous studies have shown that the adaptability of touch and speech modalities varies across task contexts, indicating that each modality has its own strengths [14,15,16,17]. Multi-modal interaction technologies allow operators to employ multiple input channels simultaneously, which overcomes the limitations of unimodal interaction and holds favorable potential for improving the efficacy of human–machine interaction [18,19,20]. For example, in applications such as smartphone control [21], personal health data exploration [12], and web-based visualization tools [6], researchers have observed that multi-modal interaction yields superior operational efficiency and user satisfaction compared with unimodal interaction.
Numerous studies have undertaken empirical comparisons between touch and multi-modal interactions. For instance, Dudek and Schulte [10] found that when participants were free to choose the interaction modality, the average task assignment time was shortened. Another user study revealed that introducing speech input renders multi-modal interaction more accurate and faster than typical touch interaction [22]. In addition, multi-modal interactions incorporating speech and location input have the potential to reduce cognitive workload. In a selection and system control test, Zimmerer et al. [23] found that combining touch and speech interactions produced a lower cognitive workload than touch-only unimodal interaction. However, other studies have suggested that the effectiveness of multi-modal interaction is not necessarily superior to that of unimodal interaction. For example, in a simulated manned–unmanned teaming monitoring system in the Black Hawk helicopter environment, Levulis et al. [20] observed no significant difference in task performance between the multi-modal and touch interactions. Integrating speech interaction has been found to reduce the physical demands associated with specific interface inputs and to provide unique flexibility [20]. Nevertheless, the effectiveness of multi-modal interaction may depend on specific task demands, user expectations, and the context of use. Although advances in intelligent and information-driven system design are driving the adoption of touch and multi-modal interaction as the future primary human–machine interaction modalities in the crew cabins of special vehicles [9,24], whether multi-modal interaction yields better task performance and lower workload than touch interaction for a typical special vehicle crew remains unknown.
Given the increasing complexity of tasks encountered by special vehicle crews, it is essential to further clarify how various interaction modalities influence crew performance under varying task demands, particularly under high-complexity conditions. Knowledge of the role of task complexity in determining task performance remains insufficient [25]. While several studies have demonstrated that increasing task complexity typically raises demands on cognitive resources, which in turn degrade task performance and increase workload [26,27], others argue the opposite [28]. The adoption of suitable interaction modalities has been shown to enhance performance under conditions of high task complexity, while the effectiveness of a given modality may depend on specific task characteristics, as different modalities align differently with various operational demands [10,18]. Specifically, task complexity can be modulated through different information presentation channels, such as visual, auditory, and tactile modalities, each exerting distinct cognitive demands. When the interaction modality shares a sensory channel with the task, for example, speech interaction combined with auditory tasks, conflicts may arise due to competition for processing resources. Therefore, testing the applicability of specific interaction modalities under different task complexity conditions is essential. In special vehicle cabins, crew members frequently use headphones for speech communication during operations. This communication style alters the task complexity imposed through the auditory channel; nevertheless, it is typical and expected in practice. Because the requirement to maximize efficiency becomes more critical in complex tasks, it is of practical significance to modulate task complexity by adding an auditory information processing task and to verify how introducing speech interaction affects the performance of a special vehicle crew in such task-specific settings.
Moreover, the methodologies used to evaluate the effectiveness of different interaction modalities in complex operational contexts warrant further refinement. Previous studies typically assessed the effectiveness of interaction modalities through performance measurement alone, rarely including physiological assessment for a more comprehensive analysis. However, eye responses have a distinct advantage in revealing differences in the efficiency of information processing [29,30]. As task complexity varies, certain eye movements change [26,27]. Additionally, introducing speech interaction, compared with employing touch interaction alone, may reduce the need for visual saccades during multi-modal interaction [20]. Under high-complexity tasks, multi-modal interaction may offer advantages over touch interaction that are reflected in specific eye movement features. Typical eye response metrics include peak saccade velocity [31,32], fixation entropy [27], the nearest neighbor index (NNI) [33], and mean pupil diameter. Therefore, studies are required to determine the sensitivity of eye response metrics to various interaction modalities and task complexities in special vehicle crew tasks.
Although studies on the effects of typical interaction modalities and task complexity on performance have been conducted in several civil domains (e.g., car driving and smartphone control), pertinent research in the field of safety-critical equipment development remains limited [34]. Against this background, the current study addresses the following gaps. First, pertinent studies have found that multi-modal interaction may offer benefits over touch interaction in augmenting task performance and reducing workload; however, whether these benefits apply to the typical human–machine interaction tasks of a special vehicle crew is unclear. Specifically, further research is required to determine whether the benefits of multi-modal interaction over touch interaction continue to hold given the task complexity presented through various information processing channels. Moreover, with regard to the human–machine interaction tasks of a special vehicle crew, the statistical disparities that eye response metrics may indicate, and the differentiated information processing characteristics they reveal, are not yet well understood. In this study, using a high-fidelity special vehicle simulation platform, 20 experienced special vehicle drivers or test drivers completed an ergonomic experiment involving typical planning tasks for special vehicle crews. The purpose is to investigate the effects of the traditional touch modality and a novel interaction modality introducing speech interaction on crew performance (including task performance, subjective workload, and eye response) in low-complexity and high-complexity tasks. Considering the foregoing background, the following hypotheses are proposed.
Hypothesis 1 (H1).
The introduction of the speech interaction modality will result in better task performance and lower subjective workload compared with the traditional touch interaction modality, regardless of task complexity (low, involving information processing through a single visual channel, or high, involving additional auditory channel information processing).
Hypothesis 2 (H2).
A high-complexity task with the addition of an auditory channel task will result in worse task performance and higher subjective workload, as compared with a low-complexity task with a single-channel visual task, regardless of the interaction modality.
Hypothesis 3 (H3).
Different interaction modalities and task complexities will result in variations in operators' attentional strategies and workload states, as reflected by particular eye response metrics.
2. Methods
2.1. Participants
Twenty special vehicle drivers or test drivers with extensive operational experience were recruited, of whom eighteen provided valid data. Two participants were excluded: one participant's eye movement data were not collected because of equipment failure, and another participant's performance data did not meet the training criterion. All participants were male, aged 23 to 46 years (mean = 32.17, SD = 7.30). All were in good health, right-handed, free of color blindness, with normal or corrected-to-normal vision and normal hearing, and all had slept at least 8 h the night before the experiment. Before the experiment, participants were informed of the rules and procedures and signed a written informed consent form. The study was conducted in accordance with the Declaration of Helsinki and approved by the Biological and Medical Ethics Committee of Beihang University (approval number BM20230003).
2.2. Experimental Platform and Equipment
The experiment was conducted on a new high-fidelity platform for simulating special vehicle tasks. The setup comprises four parts: hardware, main control, task simulation, and real-time data recording systems. The hardware of the simulation platform consists of multiple touchscreen displays, control components, a simulator host, a server, and other devices. The main control system selects among crew tasks and terminal display control interfaces and sets the corresponding initial task parameters. The task simulation system consists of a control simulation component, an integrated display and control terminal, and an embedded control and information terminal software system; it simulates high-fidelity tasks and mediates the exchange of human–machine information and control. The real-time data system includes a background information terminal recording module whose primary function is to gather and record human–machine interaction behaviors; it also records the precise timing of particular operations, such as touch and speech inputs. The Tobii Pro Glasses 2 wearable eye-tracking system (Tobii Technology, Stockholm, Sweden) was used to collect participants' eye responses. The system samples at 50 Hz with an accuracy better than 0.5°; its eye-tracking range is 82° horizontally and 48° vertically, the scene camera resolution is 1920 × 1080, and the frame rate is 25 Hz. Calibration was completed using the one-point calibration method.
2.3. Experimental Design
Two-factor repeated measures with two task complexities × two interaction modalities are used in the experimental design, with each participant completing four distinct experimental conditions. The first independent variable describes the interaction modality of a special vehicle information display interface, including two levels: the traditional touch interaction and the novel touch–speech multi-modal interaction. Participants can use either the touch modality or hybrid touch–speech multi-modal interaction for inputting the same target task. Note that the number of steps required to complete the same task in the multi-modal interaction in which speech is introduced is less than that in the touch interaction. This indicates that a certain manual touch operation has been replaced by an automatic system backstage process according to system recognition and speech input. Additionally, the second independent variable, task complexity, can encompass multiple dimensions such as information presentation channels and time pressure. Given the prevalence and effectiveness of information transmission through various channels in specialized vehicle operations, this study manipulates task complexity by varying information processing demands across multiple presentation channels [
35,
36]. Specifically, the low-complexity task solely involves processing visual channel information, whereas the high-complexity task involves processing an additional auditory channel information based on the low-complexity task. As shown in
Figure 1, participants respond to the typical task presented by the visual channel in the low-complexity task condition, and recognize and provide feedback to the typical visual input and the audible warning input in the high-complexity condition. To balance the different levels in the repeated measures design and eliminate the effects of exercise and fatigue, the experimental sequence is arranged using a Latin Square design.
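To make the counterbalancing concrete, the following minimal Python sketch generates a 4 × 4 balanced Latin square over the four conditions. The condition labels match those used in Section 3, while the assignment of rows to participants is an illustrative assumption rather than the study's documented protocol.

```python
# Illustrative counterbalancing sketch: a 4x4 balanced Latin square in
# which every condition occupies each ordinal position once and each
# ordered pair of adjacent conditions occurs exactly once across rows.
CONDITIONS = ["T_L", "T_H", "N_L", "N_H"]

def balanced_latin_square(conditions):
    """Rows of a balanced Latin square (standard construction, even n)."""
    n = len(conditions)
    rows = []
    for r in range(n):
        order, j, k = [], 0, 1
        for i in range(n):
            if i % 2 == 0:            # even slots walk forward: r, r+1, ...
                idx = (r + j) % n
                j += 1
            else:                     # odd slots walk backward: r-1, r-2, ...
                idx = (r + n - k) % n
                k += 1
            order.append(conditions[idx])
        rows.append(order)
    return rows

# Hypothetical assignment: participant p runs row p mod 4 of the square.
for p, order in enumerate(balanced_latin_square(CONDITIONS), start=1):
    print(f"sequence {p}: {order}")
```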
The dependent variables comprise three categories of performance metrics: (1) task completion time and number of misoperations for the crew task; (2) the National Aeronautics and Space Administration Task Load Index (NASA-TLX) score for subjective workload [37]; and (3) mean peak saccade velocity [31,38], fixation entropy [27], NNI [33], and mean pupil diameter [39] for eye response. NASA-TLX is considered one of the most widely used subjective measures of workload [40], while eye-tracking measurements are recognized as a physiological method sensitive to workload [41,42]; therefore, both methods are employed in this study. Task completion time refers to the total duration taken by a crew member to complete a given task; the number of misoperations refers to the total count of incorrect actions made by the operator during the task. The NASA-TLX score is derived by combining the rating for each subscale with its corresponding weight to produce an overall workload score [37]. The formula is as follows:

$$\mathrm{Score} = \frac{\sum_{i=1}^{6} W_i R_i}{\sum_{i=1}^{6} W_i}$$

where $W_i$ represents the weight assigned to each of the six subscales, based on the participant's assessment of their relative importance, and $R_i$ is the rating for each subscale.
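As a concrete illustration, the short Python sketch below computes the weighted score just defined; the subscale weights (derived from the standard 15 pairwise importance comparisons, so they sum to 15) and ratings shown are hypothetical values, not data from this study.

```python
# Sketch of the NASA-TLX weighted scoring defined above.
SUBSCALES = ["MD", "PD", "TD", "PE", "EF", "FR"]

def nasa_tlx_score(weights, ratings):
    """Overall workload: weighted mean of the six subscale ratings."""
    total_weight = sum(weights[s] for s in SUBSCALES)  # 15 by construction
    return sum(weights[s] * ratings[s] for s in SUBSCALES) / total_weight

# Hypothetical participant data (weights from pairwise comparisons,
# ratings on the 0-100 scale):
weights = {"MD": 4, "PD": 1, "TD": 3, "PE": 2, "EF": 3, "FR": 2}
ratings = {"MD": 70, "PD": 30, "TD": 65, "PE": 40, "EF": 60, "FR": 35}
print(f"NASA-TLX score: {nasa_tlx_score(weights, ratings):.2f}")
```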
Saccades are rapid eye movements that occur between fixations; the mean peak saccade velocity measures the highest speed at which the eyes move during these swift shifts in gaze. Fixation entropy quantifies the diversity of an individual's gaze behavior, representing the extent to which fixations are distributed or concentrated across a given region or stimulus [33]. The formula for fixation entropy ($H$) is derived from Shannon's entropy in information theory and is given as follows:

$$H = -\sum_{j=1}^{n} p_j \log_2 p_j$$

where $p_j$ represents the probability of a fixation occurring at the $j$-th position within a specific area of interest (AOI), and $n$ denotes the total number of AOIs.
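A minimal sketch of this computation, assuming fixations have already been mapped to AOI labels (the labels below are hypothetical), is:

```python
import math
from collections import Counter

def fixation_entropy(aoi_sequence):
    """Shannon entropy (bits) of the fixation distribution over AOIs."""
    counts = Counter(aoi_sequence)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical fixation-to-AOI assignments for one trial:
fixations = ["map", "map", "menu", "map", "warning", "menu", "map"]
print(f"H = {fixation_entropy(fixations):.3f} bits")
```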
The NNI measures the degree to which fixations are spatially clustered or dispersed, offering insight into the concentration or distribution of gaze within a defined AOI [33]. The NNI is calculated as follows:

$$\mathrm{NNI} = \frac{\bar{d}_{\mathrm{obs}}}{\bar{d}_{\mathrm{exp}}}, \quad \bar{d}_{\mathrm{obs}} = \frac{1}{N}\sum_{k=1}^{N} \min(d_{kl}), \quad \bar{d}_{\mathrm{exp}} = \frac{1}{2}\sqrt{\frac{A}{N}}$$

where $\min(d_{kl})$ represents the minimum distance between point $k$ and its nearest neighbor $l$ (with $l$ ranging from 1 to $N$, and $l \neq k$), $N$ corresponds to the number of fixation points, and $A$ denotes the polygonal area defined by the outermost fixations. Mean pupil diameter refers to the average size of the left and right pupils during the task.
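The following sketch computes the NNI from fixation coordinates under these definitions; the coordinates and the area value are hypothetical stand-ins (in practice, A would be the measured area of the polygon enclosing the outermost fixations).

```python
import math

def nni(points, area):
    """Nearest neighbor index for (x, y) fixation points within area A."""
    n = len(points)
    # Observed mean distance from each fixation to its nearest neighbor.
    mean_observed = sum(
        min(math.dist(p, q) for q in points if q is not p)
        for p in points
    ) / n
    # Expected mean nearest-neighbor distance for a random distribution.
    mean_expected = 0.5 * math.sqrt(area / n)
    return mean_observed / mean_expected

fixations = [(120, 80), (135, 95), (300, 220), (310, 240), (500, 100)]
print(f"NNI = {nni(fixations, area=56000):.2f}")  # <1 clustered, >1 dispersed
```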
2.4. Experimental Tasks and Procedures
Task procedures for typical crew positions were developed in this study through field research and interviews with domain experts (designers, developers, and frontline operators). The four stages and their corresponding operational designs were determined based on this field research and validated by multiple frontline operators, ensuring strong representativeness. The experimental tasks are designed as follows:
Under the low-complexity condition, participants are expected to complete the entire task, consisting of four phases (or subtasks): task decomposition, road planning, perception planning, and strike planning. Participants process the visual channel input presented by the information display interface of the special vehicle simulation platform. In the task decomposition phase, participants check the tasks issued by the main control system, report the task information verbally, and carry out the task decomposition following the prompts of the task guidance module. In the road planning phase, participants search for and enter the start and end points of the target (or enter them by voice) on the display control terminal, following the prompts of the task guidance module, to complete road planning and delivery. In the perception planning phase, participants select specific operation equipment and set its search mode and parameters following the prompts of the task guidance module. In the strike planning phase, participants add targets or assign executive equipment, deploy the equipment, and establish specific parameters for existing targets following the prompts of the task guidance module.
Under the high-complexity condition, participants are required to respond to randomly presented auditory channel warnings in addition to completing the visual channel task. The auditory warnings comprise high voltage, low voltage, high oil quantity, and low oil quantity. Warning messages are delivered through headphones at random intervals averaging three per minute, and, depending on the warning type, participants must implement certain responses as rapidly and accurately as possible; a scheduling sketch is given after this paragraph. The experiment has two phases: training and the formal experiment. In the training phase, participants sign the informed consent form after becoming fully acquainted with the experimental platform and task. Each participant completes four experimental tasks under the different conditions and fills in the NASA-TLX scale after each condition. A rest period of approximately 10 min is scheduled between consecutive experimental conditions, each of which lasts approximately 15 min.
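For concreteness, the sketch below shows one way such a warning stream could be generated. It is an illustrative reconstruction, not the platform's actual software, and the use of exponentially distributed inter-arrival times is an assumption.

```python
import random

WARNING_TYPES = ["high voltage", "low voltage",
                 "high oil quantity", "low oil quantity"]
MEAN_INTERVAL_S = 20.0  # three warnings per minute on average

def schedule_warnings(duration_s, seed=42):
    """Return (onset_seconds, warning_type) pairs for one trial."""
    rng = random.Random(seed)
    schedule, t = [], 0.0
    while True:
        t += rng.expovariate(1.0 / MEAN_INTERVAL_S)  # random gap, mean 20 s
        if t >= duration_s:
            return schedule
        schedule.append((round(t, 1), rng.choice(WARNING_TYPES)))

for onset, kind in schedule_warnings(duration_s=120):
    print(f"{onset:6.1f} s  ->  {kind}")
```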
3. Results
Statistical analysis was conducted using IBM SPSS Statistics 25.0 with a significance level of α = 0.05. The interactions and main effects of the two factors, interaction modality and task complexity, on the assessed indicators were examined using repeated-measures analysis of variance (ANOVA) [43]. When a main effect or interaction was significant, the Bonferroni method [44] was applied for post hoc comparisons and simple effect analysis. The Greenhouse–Geisser correction [45] was used when the sphericity assumption was not satisfied. A minimal analysis sketch is given below.
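For readers reproducing the analysis outside SPSS, the sketch below runs the equivalent 2 × 2 repeated-measures ANOVA in Python with statsmodels; the long-format file and column names are assumptions, and "dv" stands for any of the dependent measures (e.g., task completion time).

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Long format expected: one row per participant x modality x complexity
# cell, with columns: participant, modality, complexity, dv.
data = pd.read_csv("crew_performance.csv")  # hypothetical file name

aov = AnovaRM(
    data,
    depvar="dv",
    subject="participant",
    within=["modality", "complexity"],
).fit()
print(aov)  # F, degrees of freedom, and p for main effects and interaction
```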
3.1. Task Performance Results
Task performance comprises task completion time and number of misoperations. The descriptive results indicate that task completion time decreases gradually across conditions from "T_H" (traditional touch interaction modality with high task complexity presented by an audio–visual dual task; mean ± standard deviation: 1030.78 ± 623.09 s) through "T_L" (traditional touch interaction modality with low task complexity presented by a visual single task; 843.61 ± 250.96 s) and "N_L" (novel interaction modality in which speech interaction is introduced, with low task complexity presented by a visual single task; 480.67 ± 271.45 s) to "N_H" (novel interaction modality in which speech is introduced, with high task complexity presented by an audio–visual dual task; 461.11 ± 202.49 s). Likewise, the number of misoperations decreases gradually from "T_H" (6.78 ± 7.00) through "T_L" (6.22 ± 5.81) and "N_L" (3.72 ± 4.88) to "N_H" (3.39 ± 5.81).
A two-way repeated-measures ANOVA reveals that the interaction between interaction modality and task complexity is not significant for task completion time or number of misoperations (ps > 0.05). The interaction modality has significant main effects on task completion time (F(1,17) = 47.698, p < 0.001, $\eta_p^2$ = 0.737) and number of misoperations (F(1,17) = 13.894, p = 0.002, $\eta_p^2$ = 0.450). Compared with the traditional touch interaction modality, the task completion time of the novel interaction modality in which speech is introduced is significantly shorter (p < 0.001), and the number of misoperations is lower (p = 0.002), as shown in Figure 2. Specifically, the novel interaction modality reduced task completion time by 49.76% (from 937.194 ± 96.004 to 470.889 ± 46.263) and the number of misoperations by 45.29% (from 6.500 ± 1.732 to 3.556 ± 1.052) relative to the traditional touch modality. The main effect of task complexity on task completion time and number of misoperations is not significant (ps > 0.05).
3.2. Subjective Workload Results
The NASA-TLX scale was used to calculate the subjective workload. The descriptive results indicate an upward trend of the NASA-TLX total score in the sequence N_L, T_L, N_H, and T_H, as listed in Table 1.
As listed in Table 2, the two-factor repeated-measures ANOVA indicates that the interaction between modality and task complexity is not significant (p > 0.05). The main effect of interaction modality on the NASA-TLX score is significant (p = 0.001): the NASA-TLX score under the novel interaction modality in which speech is introduced is significantly lower than under the traditional touch interaction modality. The main effect of task complexity on the NASA-TLX score is also significant (p = 0.002): the NASA-TLX score of the low-complexity task is significantly lower than that of the high-complexity task.
The NASA-TLX total and sub-scale scores are shown in Figure 3. Table 2 indicates that the interaction modality has significant effects on all six NASA-TLX sub-scales. The novel interaction modality in which speech is introduced has significantly lower mental demand (MD), physical demand (PD), temporal demand (TD), effort (EF), and frustration (FR) sub-scale scores (ps < 0.05) and a significantly higher performance (PE) sub-scale score compared with the traditional touch interaction modality (p < 0.05). The main effect of task complexity is significant on four NASA-TLX sub-scales (ps < 0.05), the exceptions being the PE and EF sub-scales: the high-complexity task has considerably higher MD, PD, TD, and FR scores than the low-complexity task (ps < 0.05).
3.3. Results of Eye Response
Mean peak saccade velocity, fixation entropy, NNI, and mean pupil diameter are the eye movement metrics chosen for this investigation. As shown in Figure 4, the mean peak saccade velocity exhibits an upward trend in the sequence T_L, T_H, N_H, and N_L (194.89 ± 20.99, 195.86 ± 17.96, 206.53 ± 28.22, and 206.55 ± 39.72 °/s, respectively). Fixation entropy exhibits an upward trend in the sequence N_L, N_H, T_L, and T_H. Similarly, the NNI exhibits an upward trend in the sequence T_L, N_L, T_H, and N_H (0.44 ± 0.07, 0.45 ± 0.09, 0.47 ± 0.07, and 0.49 ± 0.06, respectively). The mean pupil diameter also exhibits an upward trend in the sequence T_L, N_L, N_H, and T_H (3.68 ± 0.44, 3.72 ± 0.43, 3.77 ± 0.41, and 3.77 ± 0.42 mm, respectively).
The two-way repeated-measures ANOVA for mean peak saccade velocity reveals that the interaction effect between interaction modality and task complexity is not statistically significant (p > 0.05). The main effect of interaction modality is significant (F(1,17) = 6.783, p = 0.019, $\eta_p^2$ = 0.285): the mean peak saccade velocity of the traditional touch interaction modality is significantly lower than that of the novel interaction modality in which speech is introduced. Task complexity has no significant effect (p > 0.05). For fixation entropy, no significant interaction occurs between task complexity and interaction modality (p > 0.05). Interaction modality has a significant main effect (F(1,17) = 6.833, p = 0.018, $\eta_p^2$ = 0.287): fixation entropy under the traditional touch interaction modality is considerably higher than under the novel interaction modality in which speech is introduced. Task complexity has no significant effect (p > 0.05). For mean pupil diameter, the interaction between interaction modality and task complexity is not statistically significant (p > 0.05). The main effect of task complexity is significant (F(1,17) = 14.954, p = 0.001, $\eta_p^2$ = 0.468): the mean pupil diameter in the low-complexity task is significantly smaller than in the high-complexity task. Interaction modality has no significant effect (p > 0.05). For the NNI, no significant interaction occurs between task complexity and interaction modality (p > 0.05). Task complexity has a significant main effect (F(1,17) = 10.017, p = 0.006, $\eta_p^2$ = 0.371): the NNI of the low-complexity task is considerably lower than that of the high-complexity task. Interaction modality has no significant effect (p > 0.05).
4. Discussion
To examine the changes in performance (including task performance, subjective workload, and eye response) in typical planning tasks for a special vehicle crew, an experiment was designed with two interaction modalities (traditional: touch interaction; novel: touch–speech multi-modal interaction) under two levels of task complexity (low: visual single task; high: audio–visual dual task). Results revealed that with the introduction of the speech interaction modality, participants demonstrated improved task performance, reduced subjective workload, greater mean peak saccade velocity, and lower fixation entropy under both task complexities compared with the traditional touch interaction modality. Comparing the high-complexity with the low-complexity task, no significant degradation in task performance (task completion time or number of misoperations) was observed, whereas subjective workload was higher, as revealed by higher NASA-TLX total and sub-scale scores (except for the PE and EF sub-scales), and mean pupil diameter and NNI increased. For task performance and subjective workload, no significant interaction effect between interaction modality and task complexity was observed.
After the introduction of the speech interaction modality, under both task complexities, participants demonstrated a shorter task completion time (49.76% reduction), fewer misoperations (45.29% reduction), and lower subjective workload compared with the traditional touch interaction modality. This is manifested by the lower NASA-TLX total and sub-scale scores (MD, PD, TD, EF, and FR) and the higher PE sub-scale score of the novel interaction modality. Therefore, H1 is confirmed. This result agrees with those of Levulis et al. [20] and Dudek and Schulte [10]. Introducing speech interaction into traditional touch interaction for special vehicle crew operations is an exploratory endeavor in engineering practice. Typically, the novel interaction modality in which speech is introduced can combine the benefits of fast feedback and facile correction offered by touch interaction with the object selection and non-contact capabilities of speech interaction [20,46]. Furthermore, considering that the introduction of speech in the novel interaction modality simplifies certain task processes (e.g., triggering automation through speech commands), this inherent characteristic of speech interaction may contribute to the performance improvements. Future research could adopt more refined experimental designs, such as isolating the task simplification effect of speech, to explore this in greater depth. As demonstrated by the lower NASA-TLX scores, the subjective workload of the novel interaction modality in which speech is introduced is lower than that of traditional touch interaction. This is possibly due to the contactless interface of speech interaction [9], which reduces the physical demand associated with frequent upper-limb movements toward the display and lowers the information search cost [20]. It is important to note that, while statistically significant, performance metrics with medium effect sizes still require further validation in real-world and complex scenarios to assess their practical potential.
Compared with the low-complexity single visual channel task, the high-complexity task with the added auditory channel yields higher NASA-TLX total and sub-scale scores (except for the PE and EF sub-scales). However, neither completion time nor number of misoperations increased; that is, H2 is rejected. This demonstrates that no discernible reduction in task performance occurs, although subjective workload increases with the shift from single visual channel processing to audio–visual dual-channel processing. A likely explanation is that the added auditory warning task in the high-complexity condition increases task demand, thereby increasing participants' subjective workload. This result is similar to that of Gulati et al. [28]. However, no distinct degradation in operator performance is found, which is inconsistent with previous investigations in which performance declined [26,27]. The audio–visual dual-channel presentation of the high-complexity task in this study enables operators to coordinate time sharing more easily. Participants may have successfully managed the increased workload by adjusting their strategies (e.g., prioritization). Critically, from the perspective of multiple resource theory, these adaptive behaviors indicate that operators can effectively distribute and share different types of cognitive resources, processing more information without depleting any single resource pool and thereby maintaining performance even under increased subjective demand. Consequently, the decline in task performance observed for the single visual channel complex tasks of previous studies is mitigated here.
Tasks with high complexity are frequently encountered in the engineering applications of special vehicles, particularly when tasks are delivered through multiple channels, such as the auditory and visual channels. Therefore, task complexity is introduced to observe its influence on the main effect of interaction modality. The findings indicate that interaction modality and task complexity have no appreciable interaction effects on task performance or subjective workload measures. Accordingly, whether task complexity is low (single visual channel) or high (with the addition of an auditory channel), introducing speech into the novel interaction modality results in better task performance and lower subjective workload compared with the traditional touch modality. Previous studies report that operators' perceived temporal demands and selection preferences regarding various interaction modalities vary when task complexity is altered [10,47]. However, the findings of this study indicate that even under high task complexity, the advantages of the novel interaction modality with speech interaction over the conventional touch interaction modality on task performance remain evident, and the workload advantage is likewise preserved. These findings are not consistent with those of previous studies [26,27], although the results on subjective workload are similar to those of J. Lee et al. [48]. According to multiple resource theory [49], the audio–visual dual-channel presentation used in this study enables better time sharing, preventing the degradation of the performance and subjective workload advantages of the novel interaction modality over the traditional modality under high task complexity. This implies that even under high task complexity, the novel interaction modality in which speech interaction is introduced can continue to serve as a primary design choice to maintain satisfactory crew task performance and subjective workload.
The examination of eye response metrics indicates that the novel interaction modality in which speech is introduced yields a greater mean peak saccade velocity and lower fixation entropy than the traditional touch interaction modality. Reduced fixation entropy typically denotes a more regular and systematic visual exploration strategy [50,51] and is related to reduced subjective workload [52]. Meanwhile, higher peak saccade velocity is typically associated with lower attentional demands and subjective effort [53]. This result suggests that under the novel interaction modality in which speech interaction is added, participants adopt superior attentional strategies and face lower mental resource demands than under the touch interaction modality. The results also show that participants' mean pupil diameter and NNI significantly increased in the high-complexity task compared with the low-complexity task. Increases in NNI and pupil diameter are typically closely correlated with increased cognitive workload [50,52]. This implies that, in typical special vehicle information display interface activities, the high-complexity task may lead to higher cognitive workload than the low-complexity task. The subjective workload results presented above are generally consistent with the eye response metric results; thus, H3 is confirmed. That is, specific eye response metrics are sensitive to changes in task complexity and interaction modality, and to some extent this finding reveals the changes in participants' attentional strategies and workload states.
The findings of this study mainly contribute to the body of knowledge on task complexity and interaction modalities with respect to performance advantages in typical planning tasks for special vehicles. Nevertheless, this study has limitations. First, although a new high-fidelity platform for simulating special vehicle tasks and eye-tracking measurements were employed, only a limited set of typical tasks (task decomposition, road planning, perception planning, and strike planning) and eye-tracking metrics reflecting cognitive workload were tested. Future studies will require more realistic and complex task scenarios of special vehicles (e.g., noise, network latency, team-level dynamics, additional task complexity dimensions, and multitasking interruptions), along with comprehensive physiological measures (e.g., heart rate variability, electroencephalogram), to validate the results. Moreover, owing to workplace constraints and institutional requirements, the final sample comprised 18 males aged 23 to 46; future studies are planned to include female participants and a larger, more diverse sample to explore gender differences and age-related factors. Additionally, the brief task durations prevented the assessment of operator fatigue or adaptation effects, which are critical factors in extended real-world operations and will be addressed in future studies on sustained performance. Finally, because only a few hybrid interaction modalities, such as touch and speech, have been considered thus far, further research is required to study additional hybrid interaction modalities (e.g., gesture, eye-gaze, and other multi-modal combinations [54,55]). The research results can offer valuable support for the subsequent design of the information display interface of special vehicle crew cabins and serve as a point of reference for related military equipment domains.