Search Results (42)

Search Parameters:
Keywords = virtual speakers

16 pages, 1569 KiB  
Article
Virtual Reality-Assisted, Single-Session Exposure for Public Speaking Anxiety: Improved Self-Reports and Heart Rate but No Significant Change in Heart Rate Variability
by Tonia-Flery Artemi, Thekla Konstantinou, Stephany Naziri and Georgia Panayiotou
Virtual Worlds 2025, 4(2), 27; https://doi.org/10.3390/virtualworlds4020027 - 19 Jun 2025
Viewed by 623
Abstract
Introduction: This study examines the combined use of objective physiological measures (heart rate [HR], heart rate variability [HRV]) and subjective self-reports to gain a comprehensive understanding of anxiety reduction mechanisms—specifically, habituation—in the context of Virtual Reality Exposure (VRE) for public speaking anxiety (PSA). The present study evaluated whether a single-session, personalized VRE intervention could effectively reduce PSA. Methods: A total of 39 university students (mean age = 20.97, SD = 3.05) with clinically significant PSA were randomly assigned to a VRE group or a control group. Participants completed a 2 min speech task before and after the intervention and reported subjective distress (Subjective Units of Distress, SUDs), public speaking confidence (Personal Report of Confidence as a Speaker, PRCS), and willingness to speak in public. Heart rate (HR) and heart rate variability (HRV; RMSSD) were recorded at baseline and during speech tasks. The VRE protocol used personalized, hierarchical exposure to virtual audiences, with repeated trials until a criterion reduction in SUDs was achieved. Non-parametric analyses assessed group and time effects. Results: VRE participants showed significant reductions in subjective distress (p < 0.001) and HR (p < 0.001), with HR returning to baseline post-intervention. No such reductions were observed in the control group. Willingness to speak improved significantly only in the VRE group (p = 0.001). HRV did not differ significantly across time or groups. Conclusions: A single, personalized VRE session can produce measurable reductions in PSA, particularly in subjective distress and autonomic arousal, supporting habituation as a primary mechanism of change, even after one session. The lack of HRV change suggests that emotion regulation may require more prolonged interventions. These findings support VRE’s potential as an efficient and scalable treatment option for PSA. Full article
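
As a point of reference for the HRV index named above, the sketch below computes RMSSD from a list of RR intervals in Python; it is a generic textbook computation, not the authors' recording or analysis pipeline, and the example values are made up.

```python
import numpy as np

def rmssd(rr_intervals_ms):
    """Root mean square of successive differences between RR intervals (ms),
    a standard time-domain HRV index."""
    rr = np.asarray(rr_intervals_ms, dtype=float)
    diffs = np.diff(rr)                      # successive beat-to-beat differences
    return float(np.sqrt(np.mean(diffs ** 2)))

# Illustrative RR intervals (ms) from a short recording segment
print(rmssd([812, 798, 805, 820, 790, 801]))
```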

25 pages, 1627 KiB  
Article
Reconciling Inter- and Intra-Individual Variation in L2 Socio-Pragmatic Development: Intensifier Variation in Spoken German
by Mason A. Wirtz
Languages 2025, 10(6), 139; https://doi.org/10.3390/languages10060139 - 12 Jun 2025
Viewed by 411
Abstract
This study is the first to scrutinize the rates of, and the lexical diversity in, adjective intensification in second language (L2) German. We additionally attend to the issue concerning whether sociodemographic variables (i.e., length of residence, age, and gender) and individual learner differences (i.e., L2 proficiency, intensity of exposure to the L2, and L2 socioaffect) can predict (a) the inter-individual variation in syntactic adjective intensification, and (b) the observed intra-individual variation based on a weighted measure of intensifier lexical diversity. We analyzed spoken data collected via virtual reality (VR) elicitation tasks from 40 learners of L2 German (first language [L1] English). We found that learners engaged in adjective intensification at similar rates as those reported in the literature, despite some cases of overshooting the target; learners also preferred markers of intensification consistent with the lexical choices of L1 German speakers. Sociodemographic variables did not predict different rates of adjective intensification; rather, individual learner differences such as those relating to L2 proficiency and L2 exposure correlated with more target-like use of intensifiers, though the correlations were weak. The diversity in adjective intensification was also only marginally related to demographic factors and individual learner differences. Our findings suggest that L2 learners indeed engage in similar intensification practices as do L1 speakers; however, systematically predicting more ‘successful’ adoption of target-like sociopragmatic norms among L2 learners remains challenging. Full article

18 pages, 1294 KiB  
Article
The Impact of Virtual Exchanges on the Development of Sociolinguistic Competence in Second Language Spanish Learners: The Case of Voseo
by Francisco Salgado-Robles and Angela George
Languages 2025, 10(5), 109; https://doi.org/10.3390/languages10050109 - 8 May 2025
Viewed by 677
Abstract
This study investigates how sociolinguistically informed instruction and virtual exchanges affect the use of the second-person singular pronouns (usted, tú, and vos) by adult second language learners of Spanish enrolled in a third-semester course at a four-year college. The results from written contextualized tasks and oral discourse completion tasks show that participants who engaged in virtual exchanges with native speakers from Guatemala, Honduras, and El Salvador (experimental group) significantly improved their use of vos compared to those who did not participate in these exchanges (control group). Both groups increased their use of tú and vos over time, with notable differences between written and oral tasks. These findings provide empirical support for incorporating virtual exchanges into language learning curricula, demonstrating their effectiveness in teaching regional dialectal features such as voseo. Additionally, by focusing on the often-overlooked regionally variable pronoun vos, this study enriches the existing literature on Spanish language instruction and opens new avenues for research on dialectal variation and sociolinguistically informed pedagogy. Full article
(This article belongs to the Special Issue The Acquisition of L2 Sociolinguistic Competence)

17 pages, 3872 KiB  
Article
Technology to Enable People with Intellectual Disabilities and Blindness to Collect Boxes with Objects and Transport Them to Different Rooms of Their Daily Context: A Single-Case Research Series
by Giulio E. Lancioni, Gloria Alberti, Francesco Pezzuoli, Fabiana Abbinante, Nirbhay N. Singh, Mark F. O’Reilly and Jeff Sigafoos
Technologies 2025, 13(4), 131; https://doi.org/10.3390/technologies13040131 - 31 Mar 2025
Viewed by 405
Abstract
(1) Background: People with intellectual disabilities and blindness tend to be withdrawn and sedentary. This study was carried out to assess a new technology system to enable seven of these people to collect boxes containing different sets of objects from a storage room and transport them to the appropriate destination rooms. (2) Methods: The technology system used for the study involved tags with radio frequency identification codes, a tag reader, a smartphone, and mini speakers. At the start of a session, the participants were called by the system to take a box from the storage room. Once they collected a box, the system identified the tags attached to the box, called the participants to the room where the box was to be transported and delivered, and provided them with preferred music stimulation. The same process was followed for each of the other boxes available in the session. (3) Results: During baseline sessions without the system, the mean frequency of boxes handled correctly (collected, transported, and put away without research assistants’ guidance) was zero or virtually zero. During the intervention sessions with the system, the participants’ mean frequency of boxes handled correctly increased to between about 10 and 15 per session. (4) Conclusions: These findings suggest that the new technology system might be helpful for people like the participants of this study. Full article
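
A minimal sketch of the tag-to-destination logic described above, assuming hypothetical helpers (announce, play_music) and made-up tag codes; it is not the authors' system, only an illustration of mapping an RFID code to a destination room and the associated prompts.

```python
# Illustrative mapping from RFID tag codes to destination rooms (made up).
DESTINATIONS = {
    "TAG-001": "kitchen",
    "TAG-002": "office",
    "TAG-003": "laundry room",
}

def route_box(tag_code, announce=print, play_music=lambda room: None):
    """Announce where a collected box should go and start the preferred
    music at the destination; announce/play_music are placeholder callbacks."""
    room = DESTINATIONS.get(tag_code)
    if room is None:
        announce("Unknown box; please ask for help.")
        return
    announce(f"Please bring this box to the {room}.")
    play_music(room)

route_box("TAG-002")
```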

18 pages, 12629 KiB  
Article
Leveraging AI-Generated Virtual Speakers to Enhance Multilingual E-Learning Experiences
by Sergio Miranda and Rosa Vegliante
Information 2025, 16(2), 132; https://doi.org/10.3390/info16020132 - 11 Feb 2025
Cited by 2 | Viewed by 1262
Abstract
The growing demand for accessible and effective e-learning platforms has led to an increased focus on innovative solutions to address the challenges posed by the diverse linguistic backgrounds of learners. This paper explores the use of AI-generated virtual speakers to enhance multilingual e-learning experiences. This study employs a system developed using Google Sheets and Google Script to create and manage multilingual courses, integrating AI-powered virtual speakers to deliver content in learners’ native languages. The e-learning platform used is a customized Moodle, and three courses were developed: “Mental Wellbeing in Mining”, “Rescue in the Mine”, and “Risk Assessment” for a European ERASMUS+ project. This study involved 147 participants from various educational and professional backgrounds. The main findings indicate that AI-generated virtual speakers significantly improve the accessibility of e-learning content. Participants preferred content in their native language and found AI-generated videos effective and engaging. This study concludes that AI-generated virtual speakers offer a promising approach to overcoming linguistic barriers in e-learning, providing personalized and adaptive learning experiences. Future research should focus on addressing ethical considerations, such as data privacy and algorithmic bias, and expanding the user base to include more languages and proficiency levels. Full article
(This article belongs to the Special Issue Advancing Educational Innovation with Artificial Intelligence)

17 pages, 2555 KiB  
Article
Spatial Sound Rendering Using Intensity Impulse Response and Cardioid Masking Function
by Witold Mickiewicz and Mirosław Łazoryszczak
Appl. Sci. 2025, 15(3), 1112; https://doi.org/10.3390/app15031112 - 23 Jan 2025
Viewed by 909
Abstract
This study presents a new technique for creating spatial sounds based on a convolution processor. The main objective of this research was to propose a new method for generating a set of impulse responses that guarantee a realistic spatial experience based on the fusion of amplitude data acquired from an omnidirectional microphone and directional data acquired from an intensity probe. The advantages of the proposed approach are its versatility and easy adaptation to playback in a variety of multi-speaker systems, as well as a reduction in the amount of data, thereby simplifying the measurement procedure required to create any set of channel responses at the post-production stage. This paper describes the concept behind the method, the data acquisition method, and the signal processing algorithm required to generate any number of high-quality channel impulse responses. Experimental results are presented to confirm the suitability of the proposed solution by comparing the results obtained for a traditional surround 5.1 recording system and the proposed approach. This study aims to highlight the potential of intensity impulse responses in the audio recording and virtual reality industries. Full article
(This article belongs to the Special Issue Spatial Audio and Sound Design)
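
The exact masking function used by the authors is not reproduced here; the following sketch only illustrates the general idea of deriving per-channel impulse responses by weighting an omnidirectional response with a first-order cardioid aimed at each loudspeaker, driven by a direction-of-arrival track such as one obtained from an intensity probe. All data are synthetic.

```python
import numpy as np

def cardioid_gain(doa_rad, speaker_az_rad):
    """First-order cardioid pointed at a loudspeaker: 0.5 * (1 + cos(angle))."""
    return 0.5 * (1.0 + np.cos(doa_rad - speaker_az_rad))

def channel_impulse_response(omni_ir, doa_rad, speaker_az_deg):
    """Mask an omnidirectional impulse response, sample by sample, with a
    cardioid aimed at one playback channel. doa_rad is a per-sample
    direction-of-arrival track (e.g., derived from intensity-probe data)."""
    gains = cardioid_gain(np.asarray(doa_rad), np.deg2rad(speaker_az_deg))
    return np.asarray(omni_ir) * gains

# Toy example for a 5.1-style layout (channel azimuths in degrees).
fs = 48000
omni_ir = np.random.randn(fs) * np.exp(-np.arange(fs) / 8000)   # synthetic decay
doa = np.random.uniform(-np.pi, np.pi, size=fs)                  # synthetic DOA track
layout = {"L": 30, "R": -30, "C": 0, "Ls": 110, "Rs": -110}
channel_irs = {ch: channel_impulse_response(omni_ir, doa, az) for ch, az in layout.items()}
```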

75 pages, 608 KiB  
Conference Report
Abstracts of the 4th International Electronic Conference on Nutrients (IECN 2024), 16–18 October 2024
by Mauro Lombardo and Carol Johnston
Biol. Life Sci. Forum 2024, 38(1), 2; https://doi.org/10.3390/blsf2024038002 - 17 Dec 2024
Cited by 1 | Viewed by 3203
Abstract
The 4th International Electronic Conference on Nutrients—Plant-Based Nutrition Focusing on Innovation, Health, and Sustainable Food Systems (IECN 2024) took place online from 16 to 18 October 2024 and aimed to serve as a multidisciplinary platform for the exploration of innovative research and advancements in nutrient science, with a focus on innovations for health and sustainability. Over 150 scholars and experts attended this virtual conference. Five keynote speakers and seven invited speakers shared their knowledge and discoveries. The conference received 220 abstract submissions, of which 147 were accepted. This conference report is a collection of abstracts from six different sessions of IECN 2024. Full article
(This article belongs to the Proceedings of The 4th International Electronic Conference on Nutrients)
33 pages, 46059 KiB  
Article
Real and Virtual Lecture Rooms: Validation of a Virtual Reality System for the Perceptual Assessment of Room Acoustical Quality
by Angela Guastamacchia, Riccardo Giovanni Rosso, Giuseppina Emma Puglisi, Fabrizio Riente, Louena Shtrepi and Arianna Astolfi
Acoustics 2024, 6(4), 933-965; https://doi.org/10.3390/acoustics6040052 - 30 Oct 2024
Viewed by 2325
Abstract
Enhancing the acoustical quality in learning environments is necessary, especially for hearing aid (HA) users. When in-field evaluations cannot be performed, virtual reality (VR) can be adopted for acoustical quality assessments of existing and new buildings, contributing to the acquisition of subjective impressions in lab settings. To ensure an accurate spatial reproduction of the sound field in VR for HA users, multi-speaker-based systems can be employed to auralize a given environment. However, most systems require considerable effort in terms of cost, size, and construction. This work deals with the validation of a VR system based on a 16-speaker array synced with a VR headset, arranged to be easily replicated in small non-anechoic spaces and suitable for HA users. Both objective and subjective validations are performed against a real university lecture room of 800 m³ with 2.3 s of reverberation time at mid-frequencies. Comparisons of binaural and monaural room acoustic parameters are performed between measurements in the real lecture room and its lab reproduction. To validate the audiovisual experience, 32 normal-hearing subjects were administered the Igroup Presence Questionnaire (IPQ) on the overall sense of perceived presence. The outcomes confirm that the system is a promising and feasible tool to predict the perceived acoustical quality of a room. Full article
(This article belongs to the Special Issue Acoustical Comfort in Educational Buildings)
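
As context for the 2.3 s mid-frequency reverberation time quoted above, the sketch below shows a conventional way of estimating reverberation time from a measured impulse response via Schroeder backward integration (typically applied per octave band, omitted here); it is a generic estimate, not the authors' measurement chain.

```python
import numpy as np

def reverberation_time(ir, fs, decay_db=30.0):
    """Estimate T30 from an impulse response: Schroeder backward integration,
    linear fit of the decay between -5 dB and -(5 + decay_db) dB, then
    extrapolation to -60 dB."""
    energy = np.asarray(ir, dtype=float) ** 2
    edc = np.cumsum(energy[::-1])[::-1]                 # energy decay curve
    edc_db = 10.0 * np.log10(edc / edc[0])
    t = np.arange(len(edc)) / fs
    fit = (edc_db <= -5.0) & (edc_db >= -(5.0 + decay_db))
    slope, _ = np.polyfit(t[fit], edc_db[fit], 1)       # dB per second
    return -60.0 / slope
```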

20 pages, 1670 KiB  
Article
An Approach of Query Audience’s Attention in Virtual Speech
by Hongbo Kang, Rui Yang, Ruoyang Song, Chunjie Yang and Wenqing Wang
Sensors 2024, 24(16), 5363; https://doi.org/10.3390/s24165363 - 20 Aug 2024
Viewed by 1236
Abstract
Virtual speeches are a popular format for remote multi-user communication, but they lack the eye contact of in-person delivery. This paper proposes a method for evaluating online audience attention based on gaze tracking. Our approach uses only webcams to capture the audience's head posture, gaze time, and other features, providing a low-cost method for attention monitoring with reference value across multiple domains. We also propose a set of indices for evaluating the audience's degree of attention, compensating for the speaker's inability to gauge the audience's concentration through eye contact during online speeches. We selected 96 students for a 20 min group simulation session and used Spearman's correlation coefficient to analyze the relationship between our evaluation indicators and concentration. The results showed that each evaluation index correlates significantly with the degree of attention (p = 0.01); all students in the focused group met the thresholds set by our evaluation indicators, while students in the non-focused group failed to reach them. During the simulation, eye movement data and EEG signals were measured synchronously for the second group of students. The students' EEG results were consistent with the system's evaluation, confirming its accuracy. Full article
(This article belongs to the Section Biomedical Sensors)
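
The abstract names Spearman's correlation coefficient; the snippet below only shows how such a correlation between one gaze-based index and a concentration score could be computed, with made-up numbers and illustrative variable names rather than the authors' data.

```python
from scipy.stats import spearmanr

# Made-up per-student values: one gaze-based evaluation index vs. concentration.
gaze_index = [0.82, 0.34, 0.67, 0.91, 0.45, 0.28, 0.73, 0.58]
concentration = [0.78, 0.40, 0.70, 0.88, 0.52, 0.25, 0.69, 0.61]

rho, p_value = spearmanr(gaze_index, concentration)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.4f}")
```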

35 pages, 24997 KiB  
Article
EchoSee: An Assistive Mobile Application for Real-Time 3D Environment Reconstruction and Sonification Supporting Enhanced Navigation for People with Vision Impairments
by Broderick S. Schwartz, Seth King and Tyler Bell
Bioengineering 2024, 11(8), 831; https://doi.org/10.3390/bioengineering11080831 - 14 Aug 2024
Cited by 1 | Viewed by 3669
Abstract
Improving the quality of life for people with vision impairments has been an important goal in the research and design of assistive devices for several decades. This paper seeks to further that goal by introducing a novel assistive technology platform that leverages real-time 3D spatial audio to promote safe and efficient navigation for people who are blind or visually impaired (PVI). The presented platform, EchoSee, uses modern 3D scanning technology on a mobile device to construct a live, digital 3D map of a user’s environment as they move about their surroundings. Spatialized, virtual audio sources (i.e., virtual speakers) are dynamically placed within the digital 3D scan of the world, providing the navigator with a real-time 3D stereo audio “soundscape.” The digital 3D map, and its resultant soundscape, are continuously updated as the user moves about their environment. The generated soundscape is played back through headphones connected to the navigator’s device. This paper details (1) the underlying technical components and how they were integrated to produce the mobile application that generates a dynamic soundscape on a consumer mobile device, (2) a methodology for analyzing navigation performance with the application, (3) the design and execution of a user study investigating the effectiveness of the presented system, and (4) a discussion of the results of that study along with a proposed future study and possible improvements. Altogether, this paper presents a novel software platform aimed at helping individuals with vision impairments navigate and understand spaces safely, efficiently, and independently, together with the results of a feasibility study analyzing the viability of the approach. Full article
(This article belongs to the Section Nanobiotechnology and Biofabrication)
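
EchoSee itself presumably relies on platform-level 3D audio; the sketch below only illustrates the underlying geometry of rendering a virtual speaker relative to a moving listener, reduced to a signed azimuth and an equal-power stereo pan. Function names and coordinates are assumptions for illustration.

```python
import math

def azimuth_to_source(listener_xy, heading_rad, source_xy):
    """Signed angle from the listener's heading to a virtual source on the
    ground plane; positive means the source lies to the listener's left."""
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    angle = math.atan2(dy, dx) - heading_rad
    return math.atan2(math.sin(angle), math.cos(angle))   # wrap to (-pi, pi]

def equal_power_pan(azimuth_rad):
    """Constant-power stereo gains: hard left at +pi/2, hard right at -pi/2."""
    a = max(-math.pi / 2, min(math.pi / 2, azimuth_rad))
    theta = (math.pi / 2 - a) / 2
    return math.cos(theta), math.sin(theta)                # (left_gain, right_gain)

# A virtual speaker 2 m ahead and slightly left of a listener facing +x.
left, right = equal_power_pan(azimuth_to_source((0.0, 0.0), 0.0, (2.0, 0.5)))
print(f"left gain {left:.2f}, right gain {right:.2f}")
```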

18 pages, 2938 KiB  
Article
Facial Animation Strategies for Improved Emotional Expression in Virtual Reality
by Hyewon Song and Beom Kwon
Electronics 2024, 13(13), 2601; https://doi.org/10.3390/electronics13132601 - 2 Jul 2024
Cited by 4 | Viewed by 3369
Abstract
The portrayal of emotions by virtual characters is crucial in virtual reality (VR) communication. Effective communication in VR relies on a shared understanding, which is significantly enhanced when virtual characters authentically express emotions that align with their spoken words. While human emotions are often conveyed through facial expressions, existing facial animation techniques have mainly focused on lip-syncing and head movements to improve naturalness. This study investigates the influence of various factors in facial animation on the emotional representation of virtual characters. We conduct a comparative and analytical study using an audio-visual database, examining the impact of different animation factors. To this end, we utilize a total of 24 voice samples, representing 12 different speakers, with each emotional voice segment lasting approximately 4–5 s. Using these samples, we design six perceptual experiments to investigate the impact of facial cues—including facial expression, lip movement, head motion, and overall appearance—on the expression of emotions by virtual characters. Additionally, we engaged 20 participants to evaluate and select appropriate combinations of facial expressions, lip movements, head motions, and appearances that align with the given emotion and its intensity. Our findings indicate that emotional representation in virtual characters is closely linked to facial expressions, head movements, and overall appearance. Conversely, lip-syncing, which has been a primary focus in prior studies, seems less critical for conveying emotions, as its accuracy is difficult to perceive with the naked eye. The results of our study can significantly benefit the VR community by aiding in the development of virtual characters capable of expressing a diverse range of emotions. Full article

14 pages, 3306 KiB  
Article
Flexible Self-Powered Low-Decibel Voice Recognition Mask
by Jianing Li, Yating Shi, Jianfeng Chen, Qiaoling Huang, Meidan Ye and Wenxi Guo
Sensors 2024, 24(10), 3007; https://doi.org/10.3390/s24103007 - 9 May 2024
Cited by 4 | Viewed by 1815
Abstract
In environments where silent communication is essential, such as libraries and conference rooms, the need for a discreet means of interaction is paramount. Here, we present a single-electrode, contact-separated triboelectric nanogenerator (CS-TENG) characterized by robust high-frequency sensing capabilities and long-term stability. Integrating this TENG onto the inner surface of a mask allows for the capture of conversational speech signals through airflow vibrations, generating a comprehensive dataset. Employing advanced signal processing techniques, including short-time Fourier transform (STFT), Mel-frequency cepstral coefficients (MFCC), and deep learning neural networks, facilitates the accurate identification of speaker content and verification of their identity. The accuracy rates for each category of vocabulary and identity recognition exceed 92% and 90%, respectively. This system represents a pivotal advancement in facilitating secure and efficient unobtrusive communication in quiet settings, with promising implications for smart home applications, virtual assistant technology, and potential deployment in security and confidentiality-sensitive contexts. Full article
(This article belongs to the Special Issue Advances in Flexible Self-Powered Electronics Sensors)
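
The abstract lists STFT and MFCC features feeding a neural network; a minimal front-end sketch with librosa follows, assuming a hypothetical recording file and parameter choices, with the classifier itself left out.

```python
import numpy as np
import librosa

# Hypothetical clip captured by the mask sensor; path and sample rate assumed.
signal, sr = librosa.load("mask_recording.wav", sr=16000)

# STFT magnitude and MFCCs, the two front-end representations named above.
stft_mag = np.abs(librosa.stft(signal, n_fft=512, hop_length=128))
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)

# A small neural network would then map the (13 x n_frames) MFCC matrix
# to vocabulary and speaker-identity labels.
print(stft_mag.shape, mfcc.shape)
```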

35 pages, 13690 KiB  
Article
An Audio-Based SLAM for Indoor Environments: A Robotic Mixed Reality Presentation
by Elfituri S. F. Lahemer and Ahmad Rad
Sensors 2024, 24(9), 2796; https://doi.org/10.3390/s24092796 - 27 Apr 2024
Cited by 2 | Viewed by 2735
Abstract
In this paper, we present a novel approach referred to as the audio-based virtual landmark-based HoloSLAM. This innovative method leverages a single sound source and microphone arrays to estimate the voice-printed speaker’s direction. The system allows an autonomous robot equipped with a single microphone array to navigate within indoor environments, interact with specific sound sources, and simultaneously determine its own location while mapping the environment. The proposed method does not require multiple audio sources in the environment nor sensor fusion to extract pertinent information and make accurate sound source estimations. Furthermore, the approach incorporates Robotic Mixed Reality using Microsoft HoloLens to superimpose landmarks, effectively mitigating the audio landmark-related issues of conventional audio-based landmark SLAM, particularly in situations where audio landmarks cannot be discerned, are limited in number, or are completely missing. The paper also evaluates an active speaker detection method, demonstrating its ability to achieve high accuracy in scenarios where audio data are the sole input. Real-time experiments validate the effectiveness of this method, emphasizing its precision and comprehensive mapping capabilities. The results of these experiments showcase the accuracy and efficiency of the proposed system, surpassing the constraints associated with traditional audio-based SLAM techniques, ultimately leading to a more detailed and precise mapping of the robot’s surroundings. Full article
(This article belongs to the Section Navigation and Positioning)
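
The paper's own direction-estimation method is not detailed in the abstract; as background, the sketch below implements GCC-PHAT, a standard time-delay estimate between two microphones that is commonly used as a building block for sound-source direction finding (explicitly not claimed to be the authors' algorithm).

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Time delay (seconds) between two microphone signals via GCC-PHAT."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12                     # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    delay_samples = np.argmax(np.abs(cc)) - max_shift
    return delay_samples / float(fs)

# With microphone spacing d, the azimuth follows as arcsin(c * tau / d), c ~ 343 m/s.
```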

17 pages, 1130 KiB  
Article
The Intelligibility Benefits of Modern Computer-Synthesized Speech for Normal-Hearing and Hearing-Impaired Listeners in Non-Ideal Listening Conditions
by Yizhen Ma and Yan Tang
J. Otorhinolaryngol. Hear. Balance Med. 2024, 5(1), 5; https://doi.org/10.3390/ohbm5010005 - 18 Apr 2024
Viewed by 1789
Abstract
Speech intelligibility is a concern for public health, especially in non-ideal listening conditions where listeners often attend to the target speech in the presence of background noise. With advances in technology, synthetic speech has been increasingly used in lieu of actual human voices in human–machine interfaces, such as public announcement systems, answering machines, virtual personal assistants, and GPS, to interact with users. However, previous studies showed that speech generated by computer speech synthesizers was often intrinsically less natural and intelligible than natural speech produced by human speakers. In noise, listening to synthetic speech is challenging for listeners with normal hearing (NH), let alone for hearing-impaired (HI) listeners. Recent developments in speech synthesis have significantly improved the naturalness of synthetic speech. In this study, the intelligibility of speech generated by commercial synthesizers from Google, Amazon, and Microsoft was evaluated by both NH and HI listeners in different noise conditions. Compared to a natural female voice as the baseline, listeners’ performance suggested that some of the synthetic speech was significantly more intelligible, even under rather adverse listening conditions, for the NH cohort. Further acoustical analyses revealed that elongated vowel sounds and reduced spectral tilt were primarily responsible for the improved intelligibility for NH, but not for HI listeners, due to their impairment at high frequencies and possible cognitive decline associated with aging. Full article
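
Spectral tilt has several operational definitions; as one plausible reading of the measure mentioned above, the sketch below fits a line to the log-magnitude spectrum over log2(frequency) and reports the slope in dB per octave. This is an assumption for illustration, not the authors' exact analysis.

```python
import numpy as np

def spectral_tilt_db_per_octave(signal, fs, fmin=100.0, fmax=5000.0):
    """Slope of the log-magnitude spectrum versus log2(frequency), in dB/octave.
    A flatter (less negative) tilt is the property linked above to intelligibility."""
    windowed = np.asarray(signal, dtype=float) * np.hanning(len(signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(windowed), d=1.0 / fs)
    band = (freqs >= fmin) & (freqs <= fmax)
    level_db = 20.0 * np.log10(spectrum[band] + 1e-12)
    slope, _ = np.polyfit(np.log2(freqs[band]), level_db, 1)
    return slope
```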

17 pages, 9159 KiB  
Article
The Effect of Eye Contact in Multi-Party Conversations with Virtual Humans and Mitigating the Mona Lisa Effect
by Junyeong Kum, Sunghun Jung and Myungho Lee
Electronics 2024, 13(2), 430; https://doi.org/10.3390/electronics13020430 - 19 Jan 2024
Cited by 2 | Viewed by 2052
Abstract
The demand for kiosk systems with embodied conversational agents has increased with the development of artificial intelligence. There have been attempts to utilize non-verbal cues, particularly virtual human (VH) eye contact, to enable human-like interaction. Eye contact with VHs can affect satisfaction with the system and the perception of VHs. However, when rendered in 2D kiosks, the gaze direction of a VH can be incorrectly perceived, due to a lack of stereo cues. A user study was conducted to examine the effects of the gaze behavior of VHs in multi-party conversations in a 2D display setting. The results showed that looking at actual speakers affects the perceived interpersonal skills, social presence, attention, co-presence, and competence in conversations with VHs. In a second study, the gaze perception was further examined with consideration of the Mona Lisa effect, which can lead users to believe that a VH rendered on a 2D display is gazing at them, regardless of the actual direction, within a narrow range. We also proposed the camera rotation angle fine-tuning (CRAFT) method to enhance users’ perceptual accuracy regarding the direction of the VH’s gaze. The results showed that the perceptual accuracy for the VH gaze decreased in a narrow range and that CRAFT could increase the perceptual accuracy. Full article
(This article belongs to the Section Computer Science & Engineering)
