Search Results (229)

Search Parameters:
Keywords = audio-visual systems

24 pages, 8344 KiB  
Article
Research and Implementation of Travel Aids for Blind and Visually Impaired People
by Jun Xu, Shilong Xu, Mingyu Ma, Jing Ma and Chuanlong Li
Sensors 2025, 25(14), 4518; https://doi.org/10.3390/s25144518 - 21 Jul 2025
Viewed by 145
Abstract
Blind and visually impaired (BVI) people face significant challenges in perception, navigation, and safety during travel. Existing infrastructure (e.g., blind lanes) and traditional aids (e.g., walking sticks, basic audio feedback) provide limited flexibility and interactivity for complex environments. To solve this problem, we propose a real-time travel assistance system based on deep learning. The hardware comprises an NVIDIA Jetson Nano controller, an Intel D435i depth camera for environmental sensing, and SG90 servo motors for feedback. To address embedded device computational constraints, we developed a lightweight object detection and segmentation algorithm. Key innovations include a multi-scale attention feature extraction backbone, a dual-stream fusion module incorporating the Mamba architecture, and adaptive context-aware detection/segmentation heads. This design ensures high computational efficiency and real-time performance. The system workflow is as follows: (1) the D435i captures real-time environmental data; (2) the processor analyzes this data, converting obstacle distances and path deviations into electrical signals; (3) servo motors deliver vibratory feedback for guidance and alerts. Preliminary tests confirm that the system can effectively detect obstacles and correct path deviations in real time, suggesting its potential to assist BVI users. However, as this is a work in progress, comprehensive field trials with BVI participants are required to fully validate its efficacy. Full article
(This article belongs to the Section Intelligent Sensors)
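
The feedback step of the workflow above (converting obstacle distances and path deviations into servo commands) can be illustrated with a short sketch. This is not the authors' implementation: the distance thresholds, steering gain, and the SG90 angle mapping are assumptions chosen for demonstration.

```python
# Hypothetical mapping from depth-camera readings to servo feedback.
def obstacle_alert_level(distance_m: float) -> int:
    """Map an obstacle distance (metres) to a discrete alert level."""
    if distance_m < 0.5:
        return 3   # imminent: strong, continuous vibration
    if distance_m < 1.5:
        return 2   # near: fast pulsed vibration
    if distance_m < 3.0:
        return 1   # ahead: slow pulsed vibration
    return 0       # clear: no feedback


def deviation_to_servo_angle(deviation_deg: float, gain: float = 1.5) -> float:
    """Convert path deviation (degrees, + right / - left) into a corrective
    servo angle, clamped to the SG90's 0-180 degree range (90 = neutral)."""
    angle = 90.0 + gain * deviation_deg
    return max(0.0, min(180.0, angle))


if __name__ == "__main__":
    for d in (0.4, 1.0, 2.5, 5.0):
        print(f"distance {d} m -> alert level {obstacle_alert_level(d)}")
    print("deviation +20 deg -> servo angle", deviation_to_servo_angle(20.0))
```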

18 pages, 1150 KiB  
Article
Navigating by Design: Effects of Individual Differences and Navigation Modality on Spatial Memory Acquisition
by Xianyun Liu, Yanan Zhang and Baihu Sun
Behav. Sci. 2025, 15(7), 959; https://doi.org/10.3390/bs15070959 - 15 Jul 2025
Viewed by 214
Abstract
Spatial memory is a critical component of spatial cognition, particularly in unfamiliar environments. As navigation systems become integral to daily life, understanding how individuals with varying spatial abilities respond to different navigation modes is increasingly important. This study employed a virtual driving environment to examine how participants with good or poor spatial abilities performed under three navigation modes: visual, audio, and combined audio–visual. A total of 78 participants were divided into a good sense of direction (G-SOD) group and a poor sense of direction (P-SOD) group according to their Santa Barbara Sense of Direction (SBSOD) scores and were randomly assigned to one of the three navigation modes (visual, audio, audio–visual). Participants followed navigation cues and simulated driving to the end point twice during the learning phase, then completed a route-retracing task, a scene-recognition task, and an order-recognition task. Significant main effects were found for both SOD group and navigation mode, with no interaction. G-SOD participants outperformed P-SOD participants in the route-retracing task. The audio navigation mode led to better performance in tasks involving complex spatial decisions, such as those at turn intersections and in the order-recognition task. The accuracy of scene recognition did not differ significantly across SOD groups or navigation modes. These findings suggest that audio navigation may reduce visual distraction and support more effective spatial encoding, and that individual spatial abilities influence navigation performance independently of guidance type. They highlight the importance of aligning navigation modalities with users’ cognitive profiles and support the development of adaptive navigation systems that accommodate individual differences in spatial ability. Full article
(This article belongs to the Section Cognition)
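
A two-way ANOVA on synthetic scores illustrates the 2 (SOD group) × 3 (navigation mode) between-subjects analysis implied above. The column names, effect sizes, and generated data are assumptions; only the design mirrors the study description.

```python
# Sketch of a 2 x 3 between-subjects analysis on synthetic data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)
groups = np.repeat(["G-SOD", "P-SOD"], 39)                 # 78 participants
modes = np.tile(["visual", "audio", "audio_visual"], 26)   # balanced assignment
score = (rng.normal(0.6, 0.1, 78)
         + 0.08 * (groups == "G-SOD")                      # assumed SOD effect
         + 0.05 * (modes == "audio"))                      # assumed mode effect

df = pd.DataFrame({"sod": groups, "mode": modes, "score": score})
model = ols("score ~ C(sod) * C(mode)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # main effects and interaction terms
```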

20 pages, 5700 KiB  
Article
Multimodal Personality Recognition Using Self-Attention-Based Fusion of Audio, Visual, and Text Features
by Hyeonuk Bhin and Jongsuk Choi
Electronics 2025, 14(14), 2837; https://doi.org/10.3390/electronics14142837 - 15 Jul 2025
Viewed by 313
Abstract
Personality is a fundamental psychological trait that exerts a long-term influence on human behavior patterns and social interactions. Automatic personality recognition (APR) has exhibited increasing importance across various domains, including Human–Robot Interaction (HRI), personalized services, and psychological assessments. In this study, we propose a multimodal personality recognition model that classifies the Big Five personality traits by extracting features from three heterogeneous sources: audio processed using Wav2Vec2, video represented as Skeleton Landmark time series, and text encoded through Bidirectional Encoder Representations from Transformers (BERT) and Doc2Vec embeddings. Each modality is handled through an independent Self-Attention block that highlights salient temporal information, and these representations are then summarized and integrated using a late fusion approach to effectively reflect both the inter-modal complementarity and cross-modal interactions. Compared to traditional recurrent neural network (RNN)-based multimodal models and unimodal classifiers, the proposed model achieves an improvement of up to 12 percent in the F1-score. It also maintains a high prediction accuracy and robustness under limited input conditions. Furthermore, a visualization based on t-distributed Stochastic Neighbor Embedding (t-SNE) demonstrates clear distributional separation across the personality classes, enhancing the interpretability of the model and providing insights into the structural characteristics of its latent representations. To support real-time deployment, a lightweight thread-based processing architecture is implemented, ensuring computational efficiency. By leveraging deep learning-based feature extraction and the Self-Attention mechanism, we present a novel personality recognition framework that balances performance with interpretability. The proposed approach establishes a strong foundation for practical applications in HRI, counseling, education, and other interactive systems that require personalized adaptation. Full article
(This article belongs to the Special Issue Explainable Machine Learning and Data Mining)
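
A minimal PyTorch sketch of the fusion idea described above: one self-attention block per modality, temporal pooling, then concatenation into a shared classification head. The feature dimensions, the single attention layer per modality, and the head size are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn


class ModalityEncoder(nn.Module):
    """Project a modality's frame sequence, apply self-attention, and pool."""
    def __init__(self, in_dim: int, embed_dim: int = 128, heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(in_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, heads, batch_first=True)

    def forward(self, x):                      # x: (batch, time, in_dim)
        h = self.proj(x)
        h, _ = self.attn(h, h, h)              # self-attention over time
        return h.mean(dim=1)                   # summarise the sequence


class LateFusionBigFive(nn.Module):
    def __init__(self, audio_dim=768, video_dim=99, text_dim=768):
        super().__init__()
        self.audio = ModalityEncoder(audio_dim)   # e.g. Wav2Vec2 frame features
        self.video = ModalityEncoder(video_dim)   # e.g. skeleton landmark series
        self.text = ModalityEncoder(text_dim)     # e.g. BERT token states
        self.head = nn.Linear(3 * 128, 5)         # one logit per Big Five trait

    def forward(self, a, v, t):
        fused = torch.cat([self.audio(a), self.video(v), self.text(t)], dim=-1)
        return self.head(fused)


model = LateFusionBigFive()
logits = model(torch.randn(2, 200, 768), torch.randn(2, 200, 99),
               torch.randn(2, 50, 768))
print(logits.shape)   # torch.Size([2, 5])
```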

19 pages, 1779 KiB  
Article
Through the Eyes of the Viewer: The Cognitive Load of LLM-Generated vs. Professional Arabic Subtitles
by Hussein Abu-Rayyash and Isabel Lacruz
J. Eye Mov. Res. 2025, 18(4), 29; https://doi.org/10.3390/jemr18040029 - 14 Jul 2025
Viewed by 255
Abstract
As streaming platforms adopt artificial intelligence (AI)-powered subtitle systems to satisfy global demand for instant localization, the cognitive impact of these automated translations on viewers remains largely unexplored. This study used a web-based eye-tracking protocol to compare the cognitive load that GPT-4o-generated Arabic subtitles impose with that of professional human translations among 82 native Arabic speakers who viewed a 10 min episode (“Syria”) from the BBC comedy drama series State of the Union. Participants were randomly assigned to view the same episode with either professionally produced Arabic subtitles (Amazon Prime’s human translations) or machine-generated GPT-4o Arabic subtitles. In a between-subjects design, with English proficiency entered as a moderator, we collected fixation count, mean fixation duration, gaze distribution, and attention concentration (K-coefficient) as indices of cognitive processing. GPT-4o subtitles raised cognitive load on every metric; viewers produced 48% more fixations in the subtitle area, recorded 56% longer fixation durations, and spent 81.5% more time reading the automated subtitles than the professional subtitles. The subtitle area K-coefficient tripled (0.10 to 0.30), a shift from ambient scanning to focal processing. Viewers with advanced English proficiency showed the largest disruptions, which indicates that higher linguistic competence increases sensitivity to subtle translation shortcomings. These results challenge claims that large language models (LLMs) lighten viewer burden; despite fluent surface quality, GPT-4o subtitles demand far more cognitive resources than expert human subtitles and therefore reinforce the need for human oversight in audiovisual translation (AVT) and media accessibility. Full article
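
The attention-concentration index reported above (the K-coefficient rising from 0.10 to 0.30) is the widely used ambient/focal coefficient: the mean difference between z-scored fixation durations and z-scored amplitudes of the following saccades, with positive values indicating focal processing. The sketch below computes it on made-up sample data.

```python
import numpy as np


def k_coefficient(fix_durations, saccade_amplitudes):
    """Mean of z-scored fixation duration minus z-scored amplitude of the
    following saccade (one fewer saccade than fixations)."""
    d = np.asarray(fix_durations, dtype=float)
    a = np.asarray(saccade_amplitudes, dtype=float)
    n = min(len(d) - 1, len(a))
    zd = (d - d.mean()) / d.std(ddof=1)
    za = (a - a.mean()) / a.std(ddof=1)
    return float(np.mean(zd[:n] - za[:n]))


durations = [180, 220, 260, 300, 240, 210]   # fixation durations, ms (made up)
amplitudes = [4.0, 2.5, 1.2, 1.0, 3.5]       # saccade amplitudes, degrees (made up)
print(round(k_coefficient(durations, amplitudes), 3))
```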

21 pages, 2816 KiB  
Article
AutoStageMix: Fully Automated Stage Cross-Editing System Utilizing Facial Features
by Minjun Oh, Howon Jang and Daeho Lee
Appl. Sci. 2025, 15(13), 7613; https://doi.org/10.3390/app15137613 - 7 Jul 2025
Viewed by 278
Abstract
StageMix is a video compilation of multiple stage performances of the same song, edited together seamlessly at appropriate editing points. However, generating a StageMix requires specialized editing techniques and is a considerably time-consuming process. To address this challenge, we introduce AutoStageMix, an automated StageMix generation system designed to perform all processes automatically. The system is structured into five principal stages: preprocessing, feature extraction, transition point identification, editing path determination, and StageMix generation. The process begins with audio analysis to synchronize the sequences across all input videos, followed by frame extraction. Facial features are then extracted from each video frame. Next, transition points are identified, which form the basis for face-based transitions, inter-stage cuts, and intra-stage cuts. Subsequently, a cost function is defined to facilitate the creation of cross-edited sequences, and the optimal editing path is computed using Dijkstra’s algorithm to minimize the total editing cost. Finally, the StageMix is generated by applying editing effects tailored to each transition type, aiming to maximize visual appeal. Experimental results suggest that our method generally achieves lower NME scores than existing StageMix generation approaches across multiple test songs. In a user study with 21 participants, AutoStageMix achieved viewer satisfaction comparable to that of professionally edited StageMixes, with no statistically significant difference between the two. AutoStageMix enables users to produce StageMixes effortlessly and efficiently by eliminating the need for manual editing. Full article
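
The editing-path step above reduces to a shortest-path search: transition candidates form a graph whose edge weights come from the cost function, and Dijkstra's algorithm returns the cheapest sequence of cuts. The toy graph and costs below are assumptions for illustration; the paper's actual cost terms are not reproduced.

```python
import heapq


def cheapest_edit_path(graph, start, goal):
    """graph: {node: [(neighbour, cost), ...]}; returns (total_cost, path)."""
    queue = [(0.0, start, [start])]
    best = {start: 0.0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        for nxt, c in graph.get(node, []):
            new_cost = cost + c
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [nxt]))
    return float("inf"), []


# Toy transition graph: nodes are (stage, segment) pairs, edges are candidate cuts.
transitions = {
    ("A", 0): [(("A", 1), 0.1), (("B", 1), 0.4)],
    ("A", 1): [(("B", 2), 0.2), (("A", 2), 0.3)],
    ("B", 1): [(("B", 2), 0.1)],
    ("A", 2): [(("END", 3), 0.0)],
    ("B", 2): [(("END", 3), 0.0)],
}
print(cheapest_edit_path(transitions, ("A", 0), ("END", 3)))
```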

16 pages, 1166 KiB  
Article
Research on Acoustic Scene Classification Based on Time–Frequency–Wavelet Fusion Network
by Fengzheng Bi and Lidong Yang
Sensors 2025, 25(13), 3930; https://doi.org/10.3390/s25133930 - 24 Jun 2025
Viewed by 358
Abstract
Acoustic scene classification aims to recognize the scenes corresponding to sound signals in the environment, but audio differences from different cities and devices can affect the model’s accuracy. In this paper, a time–frequency–wavelet fusion network is proposed to improve model performance by focusing on three dimensions: the time dimension of the spectrogram, the frequency dimension, and the high- and low-frequency information extracted by a wavelet transform through a time–frequency–wavelet module. Multidimensional information was fused through the gated temporal–spatial attention unit, and the visual state space module was introduced to enhance the contextual modeling capability of audio sequences. In addition, Kolmogorov–Arnold network layers were used in place of multilayer perceptrons in the classifier part. The experimental results show that the proposed method achieves a 56.16% average accuracy on the TAU Urban Acoustic Scenes 2022 mobile development dataset, which is an improvement of 6.53% compared to the official baseline system. This performance improvement demonstrates the effectiveness of the model in complex scenarios. In addition, the accuracy of the proposed method on the UrbanSound8K dataset reached 97.60%, which is significantly better than the existing methods, further verifying the generalization ability of the proposed model in the acoustic scene classification task. Full article
(This article belongs to the Section Intelligent Sensors)
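
A rough sketch of the three input views named above: a time profile, a frequency profile, and low/high-frequency sub-bands obtained with a wavelet transform of the spectrogram. The Haar wavelet, the single decomposition level, and the random stand-in spectrogram are assumptions for demonstration, not the paper's configuration.

```python
import numpy as np
import pywt   # PyWavelets

spec = np.random.rand(128, 431)          # (mel bins, time frames), stand-in data

# Time and frequency "views": energy summaries along each axis.
time_profile = spec.mean(axis=0)         # -> (431,)
freq_profile = spec.mean(axis=1)         # -> (128,)

# Wavelet view: one-level 2-D DWT splits the spectrogram into an approximation
# (low-frequency) band and three detail (high-frequency) bands.
low, (detail_h, detail_v, detail_d) = pywt.dwt2(spec, "haar")
print(time_profile.shape, freq_profile.shape, low.shape, detail_d.shape)
```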

34 pages, 9431 KiB  
Article
Gait Recognition via Enhanced Visual–Audio Ensemble Learning with Decision Support Methods
by Ruixiang Kan, Mei Wang, Tian Luo and Hongbing Qiu
Sensors 2025, 25(12), 3794; https://doi.org/10.3390/s25123794 - 18 Jun 2025
Viewed by 403
Abstract
Gait is considered a valuable biometric feature, and it is essential for uncovering the latent information embedded within gait patterns. Gait recognition methods are expected to serve as significant components in numerous applications. However, existing gait recognition methods exhibit limitations in complex scenarios. To address these, we construct a dual-Kinect V2 system that focuses more on gait skeleton joint data and related acoustic signals. This setup lays a solid foundation for subsequent methods and updating strategies. The core framework consists of enhanced ensemble learning methods and Dempster–Shafer Evidence Theory (D-SET). Our recognition methods serve as the foundation, and the decision support mechanism is used to evaluate the compatibility of various modules within our system. On this basis, our main contributions are as follows: (1) an improved gait skeleton joint AdaBoost recognition method based on Circle Chaotic Mapping and Gramian Angular Field (GAF) representations; (2) a data-adaptive gait-related acoustic signal AdaBoost recognition method based on GAF and a Parallel Convolutional Neural Network (PCNN); and (3) an amalgamation of the Triangulation Topology Aggregation Optimizer (TTAO) and D-SET, providing a robust and innovative decision support mechanism. These collaborations improve the overall recognition accuracy and demonstrate their considerable application values. Full article
(This article belongs to the Section Intelligent Sensors)
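
Both recognition branches above rely on Gramian Angular Field (GAF) images. The sketch below shows the standard GAF (summation) encoding of a 1-D signal; the toy sinusoid standing in for a gait-related signal is an assumption for illustration.

```python
import numpy as np


def gramian_angular_summation_field(x):
    """Encode a 1-D series as a 2-D image of pairwise angular sums."""
    x = np.asarray(x, dtype=float)
    x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0   # rescale to [-1, 1]
    x = np.clip(x, -1.0, 1.0)
    # GASF[i, j] = cos(phi_i + phi_j) = x_i*x_j - sqrt(1-x_i^2)*sqrt(1-x_j^2)
    comp = np.sqrt(1.0 - x ** 2)
    return np.outer(x, x) - np.outer(comp, comp)


signal = np.sin(np.linspace(0, 4 * np.pi, 64))   # stand-in gait-related signal
gaf_image = gramian_angular_summation_field(signal)
print(gaf_image.shape)   # (64, 64), usable as input to an image classifier
```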

24 pages, 9841 KiB  
Article
The Audiovisual Assessment of Monocultural Vegetation Based on Facial Expressions
by Mary Nwankwo, Qi Meng, Da Yang and Mengmeng Li
Forests 2025, 16(6), 937; https://doi.org/10.3390/f16060937 - 3 Jun 2025
Viewed by 473
Abstract
Plant vegetation is nature’s symphony, offering sensory experiences that influence ecological systems, human well-being, and emotional states and significantly impact human societal progress. This study investigated the emotional and perceptual impacts of specific monocultural vegetation (palm and rubber) in Nigeria, through audiovisual interactions using facial expression analysis, soundscape, and visual perception assessments. The findings reveal three key outcomes: (1) Facial expressions varied significantly by vegetation type and time of day, with higher “happy” valence values recorded for palm vegetation in the morning (mean = 0.39), and for rubber vegetation in the afternoon (mean = 0.37). (2) Gender differences in emotional response were observed, as male participants exhibited higher positive expressions (mean = 0.40) compared to females (mean = 0.33). (3) Perceptual ratings indicated that palm vegetation was perceived as more visually beautiful (mean = 4.05), whereas rubber vegetation was rated as having a more pleasant soundscape (mean = 4.10). However, facial expressions showed weak correlations with soundscape and visual perceptions, suggesting that other cognitive or sensory factors may be more influential. This study addresses a critical gap in soundscape research for monocultural vegetation and offers valuable insights for urban planners, environmental psychologists, and restorative landscape designs. Full article
(This article belongs to the Special Issue Soundscape in Urban Forests—2nd Edition)

25 pages, 5837 KiB  
Article
Analysis of Facial Cues for Cognitive Decline Detection Using In-the-Wild Data
by Fatimah Alzahrani, Steve Maddock and Heidi Christensen
Appl. Sci. 2025, 15(11), 6267; https://doi.org/10.3390/app15116267 - 3 Jun 2025
Viewed by 470
Abstract
The development of automatic methods for early cognitive impairment (CI) detection has a crucial role to play in helping people obtain suitable treatment and care. Video-based analysis offers a promising, low-cost alternative to resource-intensive clinical assessments. This paper investigates visual features (eye blink rate (EBR), head turn rate (HTR), and head movement statistical features (HMSFs)) for distinguishing between neurodegenerative disorders (NDs), mild cognitive impairment (MCI), functional memory disorders (FMDs), and healthy controls (HCs). Following prior work, we improve the multiple thresholds (MTs) approach specifically for EBR calculation to enhance performance and robustness, while the HTR and HMSFs are extracted using methods from previous work. The EBR, HTR, and HMSFs are evaluated using an in-the-wild video dataset captured in challenging environments. This method leverages clinically validated cues and automatically extracts features to enable classification. Experiments show that the proposed approach achieves competitive performance in distinguishing between ND, MCI, FMD, and HCs on in-the-wild datasets, with results comparable to audiovisual-based methods conducted in a lab-controlled environment. The findings highlight the potential of visual-based approaches to complement existing diagnostic tools and provide an efficient home-based monitoring system. This work advances the field by addressing traditional limitations and offering a scalable, cost-effective solution for early detection. Full article
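
As a baseline for the eye blink rate (EBR) feature above, the sketch below uses the common eye-aspect-ratio heuristic with a single threshold; the paper's improved multiple-thresholds (MTs) approach is not reproduced, and the threshold value, landmark layout, and synthetic signal are assumptions.

```python
import numpy as np


def eye_aspect_ratio(eye):
    """eye: (6, 2) array of landmarks p1..p6 around one eye; low values = closed."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = 2.0 * np.linalg.norm(p1 - p4)
    return vertical / horizontal


def blinks_per_minute(ear_series, fps, threshold=0.21):
    """Count falling-edge crossings of the EAR series below the threshold."""
    below = np.asarray(ear_series) < threshold
    blinks = int(np.sum(below[1:] & ~below[:-1]))
    return 60.0 * blinks / (len(ear_series) / fps)


ears = 0.3 + 0.02 * np.random.randn(300)   # 10 s of synthetic EAR values at 30 fps
ears[100:104] = 0.1                        # one simulated blink
print(round(blinks_per_minute(ears, fps=30), 2))
```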

21 pages, 813 KiB  
Review
Light, Sound, and Melatonin: Investigating Multisensory Pathways for Visual Restoration
by Dario Rusciano
Medicina 2025, 61(6), 1009; https://doi.org/10.3390/medicina61061009 - 28 May 2025
Cited by 1 | Viewed by 808
Abstract
Multisensory integration is fundamental for coherent perception and interaction with the environment. While cortical mechanisms of multisensory convergence are well studied, emerging evidence implicates specialized retinal ganglion cells—particularly melanopsin-expressing intrinsically photosensitive retinal ganglion cells (ipRGCs)—in crossmodal processing. This review explores how hierarchical brain networks (e.g., superior colliculus, parietal cortex) and ipRGCs jointly shape perception and behavior, focusing on their convergence in multisensory plasticity. We highlight ipRGCs as gatekeepers of environmental light cues. Their anatomical projections to multisensory areas like the superior colliculus are well established, although direct evidence for their role in human audiovisual integration remains limited. Through melanopsin signaling and subcortical projections, they may modulate downstream multisensory processing, potentially enhancing the salience of crossmodal inputs. A key theme is the spatiotemporal synergy between melanopsin and melatonin: melanopsin encodes light, while melatonin fine-tunes ipRGC activity and synaptic plasticity, potentially creating time-sensitive rehabilitation windows. However, direct evidence linking ipRGCs to audiovisual rehabilitation remains limited, with their role primarily inferred from anatomical and functional studies. Future implementations should prioritize quantitative optical metrics (e.g., melanopic irradiance, spectral composition) to standardize light-based interventions and enhance reproducibility. Nonetheless, we propose a translational framework combining multisensory stimuli (e.g., audiovisual cues) with circadian-timed melatonin to enhance recovery in visual disorders like hemianopia and spatial neglect. By bridging retinal biology with systems neuroscience, this review redefines the retina’s role in multisensory processing and offers novel, mechanistically grounded strategies for neurorehabilitation. Full article
(This article belongs to the Section Ophthalmology)

39 pages, 13529 KiB  
Article
Intelligent Monitoring of BECS Conveyors via Vision and the IoT for Safety and Separation Efficiency
by Shohreh Kia and Benjamin Leiding
Appl. Sci. 2025, 15(11), 5891; https://doi.org/10.3390/app15115891 - 23 May 2025
Viewed by 641
Abstract
Conveyor belts are critical in various industries, particularly in the barrier eddy current separator systems used in recycling processes. However, hidden issues, such as belt misalignment, excessive heat that can lead to fire hazards, and the presence of sharp or irregularly shaped materials, reduce operational efficiency and pose serious threats to the health and safety of personnel on the production floor. This study presents an intelligent monitoring and protection system for barrier eddy current separator conveyor belts designed to safeguard machinery and human workers simultaneously. In this system, a thermal camera continuously monitors the surface temperature of the conveyor belt, especially in the area above the magnetic drum—where unwanted ferromagnetic materials can lead to abnormal heating and potential fire risks. The system detects temperature anomalies in this critical zone. The early detection of these risks triggers audio–visual alerts and IoT-based warning messages that are sent to technicians, which is vital in preventing fire-related injuries and minimizing emergency response time. Simultaneously, a machine vision module autonomously detects and corrects belt misalignment, eliminating the need for manual intervention and reducing the risk of worker exposure to moving mechanical parts. Additionally, a line-scan camera integrated with the YOLOv11 AI model analyses the shape of materials on the conveyor belt, distinguishing between rounded and sharp-edged objects. This system enhances the accuracy of material separation and reduces the likelihood of injuries caused by the impact or ejection of sharp fragments during maintenance or handling. The YOLOv11n-seg model implemented in this system achieved a segmentation mask precision of 84.8 percent and a recall of 84.5 percent in industry evaluations. Based on this high segmentation accuracy and consistent detection of sharp particles, the system is expected to substantially reduce the frequency of sharp object collisions with the BECS conveyor belt, thereby minimizing mechanical wear and potential safety hazards. By integrating these intelligent capabilities into a compact, cost-effective solution suitable for real-world recycling environments, the proposed system contributes significantly to improving workplace safety and equipment longevity. This project demonstrates how digital transformation and artificial intelligence can play a pivotal role in advancing occupational health and safety in modern industrial production. Full article
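
The thermal-monitoring step described above reduces to flagging pixels in the zone above the magnetic drum that exceed a critical temperature. In the sketch below, the threshold, zone coordinates, and the printed status (standing in for the audio-visual and IoT alerts) are assumptions for illustration.

```python
import numpy as np

ALERT_THRESHOLD_C = 75.0                      # assumed critical belt temperature
DRUM_ZONE = (slice(40, 80), slice(0, 160))    # assumed (rows, cols) above the drum


def check_thermal_frame(frame_c: np.ndarray) -> dict:
    """Return the hottest temperature and hotspot count inside the drum zone."""
    zone = frame_c[DRUM_ZONE]
    hotspots = np.argwhere(zone > ALERT_THRESHOLD_C)
    return {
        "max_temp_c": float(zone.max()),
        "hotspot_pixels": int(len(hotspots)),
        "alert": bool(len(hotspots) > 0),
    }


frame = 35.0 + 2.0 * np.random.randn(120, 160)   # synthetic thermal frame (deg C)
frame[50:55, 30:35] = 90.0                       # simulated overheating patch
status = check_thermal_frame(frame)
if status["alert"]:
    # In the described system this would trigger audio-visual alarms and an
    # IoT message to technicians; here we only print the status.
    print("ALERT:", status)
```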

30 pages, 1008 KiB  
Article
Early and Late Fusion for Multimodal Aggression Prediction in Dementia Patients: A Comparative Analysis
by Ioannis Galanakis, Rigas Filippos Soldatos, Nikitas Karanikolas, Athanasios Voulodimos, Ioannis Voyiatzis and Maria Samarakou
Appl. Sci. 2025, 15(11), 5823; https://doi.org/10.3390/app15115823 - 22 May 2025
Viewed by 640
Abstract
Aggression in patients with dementia poses significant caregiving and clinical challenges. In this work, two fusion approaches, Early Fusion and Late Fusion, were compared for classifying aggression from audio and visual signals. Early Fusion integrates the extracted features of the two modalities into one dataset before classification, while Late Fusion combines the prediction probabilities of standalone audio and visual classifiers with a meta-classifier. Both models were tested using a Random Forest classifier with five-fold cross-validation, and performance was compared on accuracy, precision, recall, F1-score, ROC-AUC, and inference time. The results show that Late Fusion is superior to Early Fusion in accuracy (0.876 vs. 0.828), recall (0.914 vs. 0.818), F1-score (0.867 vs. 0.835), and ROC-AUC (0.970 vs. 0.922), making it more suitable for high-sensitivity use cases such as healthcare and security. However, Early Fusion exhibited higher precision (0.852 vs. 0.824), indicating that it is preferable when minimizing false positives is the priority. Paired t-tests indicate that only the difference in precision is statistically significant, in favor of Early Fusion. Late Fusion is also slightly slower at inference but remains suitable for use in real-time systems. These findings provide useful guidance on multimodal fusion strategies and their applicability to detecting aggressive behavior, and can contribute to the development of efficient monitoring systems for dementia care. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
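
A minimal sketch of the Late Fusion scheme compared above: standalone audio and visual Random Forests produce out-of-fold class probabilities, which are stacked as inputs to a meta-classifier. The synthetic features and forest sizes are assumptions; the paper's feature extraction is not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict, cross_val_score

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, n)                          # 1 = aggressive episode (synthetic)
X_audio = rng.normal(size=(n, 20)) + 0.5 * y[:, None]
X_video = rng.normal(size=(n, 30)) + 0.3 * y[:, None]

audio_clf = RandomForestClassifier(n_estimators=100, random_state=0)
video_clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Out-of-fold probabilities avoid leaking training labels into the meta-classifier.
p_audio = cross_val_predict(audio_clf, X_audio, y, cv=5, method="predict_proba")
p_video = cross_val_predict(video_clf, X_video, y, cv=5, method="predict_proba")

meta_X = np.hstack([p_audio, p_video])             # Late Fusion input
meta_clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(meta_clf, meta_X, y, cv=5).mean())   # mean fold accuracy
```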

33 pages, 10073 KiB  
Article
A Versatile Tool for Haptic Feedback Design Towards Enhancing User Experience in Virtual Reality Applications
by Vasilije Bursać and Dragan Ivetić
Appl. Sci. 2025, 15(10), 5419; https://doi.org/10.3390/app15105419 - 13 May 2025
Viewed by 859
Abstract
Fifteen years of experience teaching VR system development have taught us that haptic feedback must be integrated into VR systems with greater sophistication, alongside the already realistic high-fidelity visual and audio feedback. The third generation of students is enhancing VR interactive experiences by incorporating haptic feedback through traditional, proven, commercially available gamepad controllers. Insights gained through this process contributed to the development of a versatile Unity custom editor tool, which is the focus of this article. The tool supports a wide range of use cases, enabling the visual, parametric, and descriptive creation of reusable haptic effects. To enhance productivity in commercial development, it supports the creation of haptic and haptic/audio stimulus libraries, which can be further expanded and combined based on object-oriented principles. Additionally, the tool allows for the definition of specific areas within the virtual space where these stimuli can be experienced, depending on the virtual object the avatar holds and the activities it performs. This intuitive platform allows reusable haptic effects to be designed through a graphical editor, audio conversion, programmatic scripting, and AI-powered guidance. The sophistication and usability of the tool have been demonstrated through several student VR projects across various application areas. Full article

15 pages, 4273 KiB  
Article
Speech Emotion Recognition: Comparative Analysis of CNN-LSTM and Attention-Enhanced CNN-LSTM Models
by Jamsher Bhanbhro, Asif Aziz Memon, Bharat Lal, Shahnawaz Talpur and Madeha Memon
Signals 2025, 6(2), 22; https://doi.org/10.3390/signals6020022 - 9 May 2025
Cited by 1 | Viewed by 1583
Abstract
Speech Emotion Recognition (SER) technology helps computers understand human emotions in speech, which fills a critical niche in advancing human–computer interaction and mental health diagnostics. The primary objective of this study is to enhance SER accuracy and generalization through innovative deep learning models. Despite its importance in various fields like human–computer interaction and mental health diagnosis, accurately identifying emotions from speech can be challenging due to differences in speakers, accents, and background noise. The work proposes two innovative deep learning models to improve SER accuracy: a CNN-LSTM model and an Attention-Enhanced CNN-LSTM model. These models were tested on the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), collected between 2015 and 2018, which comprises 1440 audio files of male and female actors expressing eight emotions. Both models achieved impressive accuracy rates of over 96% in classifying emotions into eight categories. By comparing the CNN-LSTM and Attention-Enhanced CNN-LSTM models, this study offers comparative insights into modeling techniques, contributes to the development of more effective emotion recognition systems, and offers practical implications for real-time applications in healthcare and customer service. Full article
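
A condensed PyTorch sketch of the Attention-Enhanced CNN-LSTM idea: a 1-D convolutional block over acoustic features, a bidirectional LSTM, and learned attention pooling before the 8-way emotion classifier. The feature dimensionality and layer sizes are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class AttnCNNLSTM(nn.Module):
    def __init__(self, n_feats=40, n_classes=8):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_feats, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(64, 128, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(256, 1)          # scores each time step
        self.out = nn.Linear(256, n_classes)

    def forward(self, x):                      # x: (batch, time, n_feats)
        h = self.cnn(x.transpose(1, 2)).transpose(1, 2)
        h, _ = self.lstm(h)                    # (batch, time/2, 256)
        w = torch.softmax(self.attn(h), dim=1)
        context = (w * h).sum(dim=1)           # attention-weighted pooling
        return self.out(context)


model = AttnCNNLSTM()
print(model(torch.randn(4, 300, 40)).shape)    # torch.Size([4, 8])
```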

23 pages, 1213 KiB  
Article
Mobile-AI-Based Docent System: Navigation and Localization for Visually Impaired Gallery Visitors
by Hyeyoung An, Woojin Park, Philip Liu and Soochang Park
Appl. Sci. 2025, 15(9), 5161; https://doi.org/10.3390/app15095161 - 6 May 2025
Viewed by 487
Abstract
Smart guidance systems in museums and galleries are now essential for delivering quality user experiences. Visually impaired visitors face significant barriers when navigating galleries due to existing smart guidance systems’ dependence on visual cues like QR codes, manual numbering, or static beacon positioning. These traditional methods often fail to provide adaptive navigation and meaningful content delivery tailored to their needs. In this paper, we propose a novel Mobile-AI-based Smart Docent System that seamlessly integrates real-time navigation and depth of guide services to enrich gallery experiences for visually impaired users. Our system leverages camera-based on-device processing and adaptive BLE-based localization to ensure accurate path guidance and real-time obstacle avoidance. An on-device object detection model reduces delays from large visual data processing, while BLE beacons, fixed across the gallery, dynamically update location IDs for better accuracy. The system further refines positioning by analyzing movement history and direction to minimize navigation errors. By intelligently modulating audio content based on user movement—whether passing by, approaching for more details, or leaving mid-description—the system offers personalized, context-sensitive interpretations while eliminating unnecessary audio clutter. Experimental validation conducted in an authentic gallery environment yielded empirical evidence of user satisfaction, affirming the efficacy of our methodological approach in facilitating enhanced navigational experiences for visually impaired individuals. These findings substantiate the system’s capacity to enable more autonomous, secure, and enriched cultural engagement for visually impaired individuals within complex indoor environments. Full article
(This article belongs to the Special Issue IoT in Smart Cities and Homes, 2nd Edition)
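
A simplified sketch of the BLE-based localization idea above: pick the strongest fixed beacon in each scan and smooth the decision over a short movement history to estimate the visitor's current zone while suppressing jitter. The beacon IDs, window size, and majority-vote rule are illustrative assumptions.

```python
from collections import Counter, deque


class ZoneEstimator:
    def __init__(self, window: int = 5):
        self.recent_zones = deque(maxlen=window)

    def update(self, rssi_by_beacon: dict) -> str:
        """rssi_by_beacon: {'artwork-07': -62, ...} in dBm (higher = closer)."""
        nearest = max(rssi_by_beacon, key=rssi_by_beacon.get)
        self.recent_zones.append(nearest)
        # Majority vote over recent scans reduces spurious zone flips.
        return Counter(self.recent_zones).most_common(1)[0][0]


est = ZoneEstimator(window=3)
readings = [
    {"artwork-07": -60, "artwork-08": -75},
    {"artwork-07": -64, "artwork-08": -70},
    {"artwork-07": -80, "artwork-08": -58},   # visitor moves toward artwork 08
]
for scan in readings:
    print(est.update(scan))   # stays in artwork-07 until 08 wins the vote
```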
