MDPI - Publisher of Open Access Journals

18 pages, 1101 KB

Open AccessArticle

SR-VLN: Implicit Spatial Reasoning Vision-and-Language Navigation

by Ruolin Zhu, Shaobin Li and Min Yang

Sensors 2026, 26(12), 3809; https://doi.org/10.3390/s26123809 - 15 Jun 2026

Viewed by 235

Vision-and-language navigation (VLN) traditionally relies on explicit reasoning chains, which, despite being interpretable, impose severe constraints on inference efficiency and scalability in long-range environments. Existing multimodal large language models (MLLMs) frequently encounter latency bottlenecks due to the generation of verbose textual narratives during decision-making. To address these limitations, we propose spatial reasoning vision-and-language navigation (SR-VLN), a novel framework that shifts the paradigm from explicit chain-of-thought (CoT) to an implicit spatial representation space. SR-VLN introduces a pyramidal hierarchical history framework integrated with perceptual compression to condense historical trajectories into multi-scale representations, effectively minimizing token overhead while preserving critical spatial semantics. Rather than generating verbose textual reasoning steps, SR-VLN employs compact, learnable spatial tokens (S-Tokens) to perform agile inference directly within the latent feature space. To establish robust causal mappings between these implicit states and navigational actions, we employ a hybrid training strategy that combines sparse reward supervision with reinforcement learning via GRPO. Extensive evaluations on the R2R, REVERIE, and SOON datasets demonstrate that SR-VLN achieves state-of-the-art overall navigation performance, while maintaining a comparable balance between accuracy and efficiency. Compared to explicit reasoning baselines, our method reduces token consumption by 68% and achieves a 4.1× speedup in inference while reaching a 76.02% success rate and a 73.80% SPL on the R2R unseen split, thereby facilitating near-real-time action prediction in long-range navigation environments.

(This article belongs to the Section Navigation and Positioning)

23 pages, 2198 KB

Open AccessArticle

An AI-Driven Multi-Feature Approach for Synchronisation and QoE Assessment in Network Music Performance

by Ioannis Doumanis, Kostantinos Tsioutas and George Xylomenos

Appl. Sci. 2026, 16(12), 5919; https://doi.org/10.3390/app16125919 - 11 Jun 2026

Viewed by 128

Abstract

Network Music Performance (NMP) refers to remote musical collaboration over a network in applications such as music education, music production, and live performance. In NMP, synchronisation is a critical factor in musicians’ Quality of Experience (QoE). This interpersonal coordination of musical actions is highly sensitive to variable network conditions, particularly to end-to-end delay and signal degradation. Existing evaluations rely mainly on subjective questionnaires or isolated objective descriptors, creating a gap for a unified metric that quantifies synchrony directly from performance signals. To address this gap, we propose the Objective Synchrony Index (OSI), an AI-driven metric that quantifies ensemble synchrony from paired NMP recordings. We computed OSI using a two-tower multi-task convolutional recurrent neural network (CRNN) that estimates synchrony-relevant descriptors from paired Musician A and Musician B audio streams. We introduce two OSI variants: timing-OSI, which captures temporal coordination through offsets, onsets, beats, and tempo coherence; and ensemble-OSI, which extends this formulation by integrating chord agreement and signal fidelity to reflect structural and perceptual aspects of ensemble interaction. We evaluated OSI using recordings from two NMP studies in which eleven pairs of musicians performed under systematically varied delay and sampling-rate conditions. After each performance, musicians completed QoE questionnaires, allowing us to relate OSI and its components to subjective ratings using repeated-measures correlation. Results showed that, under delay, timing-OSI decreases as latency increases and demonstrates construct validity against subjective QoE measures. Higher timing-OSI was associated with greater perceived synchronisation and satisfaction, and with lower perceived delay, irritation, and effort to follow a partner.
These relationships were most consistent for offset synchrony and most selective for onset synchrony, while beat and tempo remained relatively stable. Under audio-quality degradation, ensemble-OSI remained relatively stable across sampling rates and did not significantly track subjective QoE as a single predictor. Instead, modest component-level associations suggested that satisfaction was higher when temporal stability and fidelity were preserved, whereas irritation was more closely related to reduced chord agreement. Together, these findings support timing-OSI as a promising objective synchrony metric for delay-impaired NMP, while showing that the extended ensemble-OSI requires further perceptual calibration for audio-quality degradations.

(This article belongs to the Special Issue Empowering Interactions: Advancing Human-Centred AI for Transparent, Collaborative and Accessible Applications)

29 pages, 2769 KB

Open AccessArticle

A Predictive Dual-Stage Neural Framework for Phase-Coherent Auditory Synthesis on Edge Devices

by Sathit Pairoch, Pattarapong Phasukkit and Teeraporn Suteewong

Sensors 2026, 26(11), 3344; https://doi.org/10.3390/s26113344 - 25 May 2026

Viewed by 451

Abstract

Real-time binaural beat synthesis in dynamic acoustic environments is challenged by carrier non-stationarity, interaural phase discontinuities, and processing delay in conventional digital signal processing pipelines. This study proposes a predictive dual-stage neural framework for phase-coherent auditory synthesis under non-stationary acoustic conditions. The framework decouples real-time carrier estimation from phase-coherent signal generation through two specialized modules. An intelligent acoustic sensing module (AI-1) estimates time-varying carrier information across harmonic, fluctuating, and broadband acoustic profiles using a causal neural front-end with an adaptive confidence-driven strategy. A predictive phase-coherent generator (AI-2) then forecasts short-horizon carrier trajectories and drives a discrete-time phase accumulator to maintain continuous phase evolution during binaural beat embedding. Objective evaluation under multiple acoustic profiles and noise conditions shows that the proposed framework maintains strong phase continuity, with a Phase Coherence Factor greater than 0.91, and low artifact levels, with a Signal-to-Artifact Ratio greater than 39.8 dB, under the evaluated conditions. Additional comparisons with conventional DSP baselines, stronger classical F0 estimators, a lightweight neural F0 tracker, and component-wise ablation variants further demonstrate that the performance improvement arises from the combination of adaptive carrier estimation and predictive phase-coherent actuation, rather than from carrier estimation alone. Hardware profiling shows a combined INT8 inference time of 2.4 ms per frame on a resource-constrained Raspberry Pi Zero 2W-class edge device. Importantly, this inference time and the sub-millisecond phase-accumulator resolution should not be interpreted as sub-millisecond end-to-end physical audio latency. 
The complete system still includes buffering, framing, neural inference, and output processing delay; the proposed method instead reduces effective phase-boundary misalignment through short-horizon predictive compensation. These results support the proposed framework as a lightweight engineering solution for real-time phase-continuous auditory synthesis in dynamic listening environments. The reported PCF and SAR values should be interpreted as signal-level indicators of phase continuity and artifact suppression, rather than as evidence of listener comfort, perceptual preference, or neurophysiological efficacy.
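
The role of the discrete-time phase accumulator mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' AI-2 module: the per-frame carrier values, frame length, sample rate, and the function names `synthesize` and `binaural_pair` are all hypothetical, and no forecasting is modelled. The sketch only shows why carrying the phase across frame boundaries avoids discontinuities when the carrier estimate changes from one frame to the next.

```python
import math

TWO_PI = 2.0 * math.pi


def synthesize(freqs_hz, frame_len, sample_rate, phase=0.0):
    """Generate a sine frame by frame, carrying the phase across frames.

    freqs_hz holds one (hypothetical) carrier estimate per frame. Because the
    accumulator is never reset, a change in frequency changes only the phase
    step, so the waveform stays continuous at frame boundaries.
    """
    samples = []
    for freq in freqs_hz:
        step = TWO_PI * freq / sample_rate  # phase increment per sample
        for _ in range(frame_len):
            samples.append(math.sin(phase))
            phase = (phase + step) % TWO_PI  # wrap to keep the value bounded
    return samples, phase


def binaural_pair(carriers_hz, beat_hz, frame_len, sample_rate):
    """Left ear gets the carrier, right ear the carrier plus the beat offset."""
    left, _ = synthesize(carriers_hz, frame_len, sample_rate)
    right, _ = synthesize([f + beat_hz for f in carriers_hz], frame_len, sample_rate)
    return left, right


# Assumed example: three 10 ms frames at 48 kHz with a drifting carrier.
left, right = binaural_pair([200.0, 220.0, 210.0], beat_hz=10.0,
                            frame_len=480, sample_rate=48000)
```

Resetting `phase` to zero at the start of each frame would instead produce a jump in the waveform whenever a frame does not end on a whole number of cycles, which is the kind of phase-boundary misalignment the abstract refers to.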

(This article belongs to the Special Issue Human Emotion Recognition and Reactions Through Sensor Technologies: Findings, Challenges, Opportunities and Future Directions)

14 pages, 1680 KB

Open AccessArticle

Perceptual Haptic Spectrum Modeling for Fine Texture Rendering on Virtual Object Surfaces in Virtual Reality

by Jinpeng Xu and Bohan Cui

Electronics 2026, 15(10), 2153; https://doi.org/10.3390/electronics15102153 - 17 May 2026

Viewed by 317

Abstract

To enhance immersion in virtual reality (VR) environments and improve the fidelity of virtual tactile interaction, this study proposes a perceptually grounded haptic-rendering framework for fine surface-texture simulation. The framework is centred on a Perceptual Haptic Spectrum Model (PHSM), which maps virtual surface attributes, including hardness, elasticity, roughness, friction, and microtexture periodicity, to multi-band tactile targets in perceptual frequency space. A Just Noticeable Difference (JND)-inspired parameterisation strategy is used as a design guideline to avoid imperceptible or redundant actuation, while region-specific response functions adapt the output to the fingertip centre, finger pad, and lateral edge. To improve reproducibility, the revised manuscript now specifies the flexible thin-film force/strain-sensor cell, array quantity, 320 Hz per-cell acquisition setting, signal-conditioning pipeline, contact-state classification rules, delay budget, and dual-actuation scheduling logic. The sensing design is based on a commercial flexible piezoresistive force-sensor cell with microsecond-level response time and a 12-bit ADC acquisition chain that provides a sufficient aggregate sampling margin for a 7–21 cell array. Manufacturer-supported sensor performance and prototype-level acceptance criteria are reported for response time, linearity, repeatability, hysteresis, drift, SNR, contact-state detection, latency, and durability. The system remains a proof-of-concept platform rather than a completed large-scale psychophysical validation. Within these boundaries, the results show coherent integration of perceptual modelling, multi-rate sensing, state monitoring, predictive feedforward control, and coordinated haptic actuation for fine VR texture rendering.

(This article belongs to the Topic Extended Reality: Models and Applications)

33 pages, 1423 KB

Open AccessReview

Non-Prosthetic Assistive Technologies for Persons with Hearing Losses: A Survey

by Reemas Alsubaiei, Farah AlHayek, Mariam Alsahhaf, Ghadah Alajmi, Aliah Almutairi, Karim Youssef, Ghina El Mir, Sherif Said, Taha Beyrouthy and Samer Al Kork

Technologies 2026, 14(5), 302; https://doi.org/10.3390/technologies14050302 - 13 May 2026

Viewed by 619

Abstract

Millions of persons worldwide experience varying degrees of hearing loss, traditionally addressed through prosthetic solutions such as hearing aids and cochlear implants. However, a significant proportion of individuals cannot benefit from these technologies, cannot access them, or choose not to use them. In this context, non-prosthetic assistive technologies have emerged as a complementary paradigm, leveraging advances in sensing, artificial intelligence, and wearable computing to transform acoustic information into alternative perceptual representations rather than restoring auditory function. This survey provides a review of such systems, focusing on technologies that enhance environmental awareness, communication, and social interaction. Existing approaches are categorized along two main dimensions: the tasks they perform and the platforms on which they operate. Task-oriented analysis includes sound recognition (speech and non-speech), sound source localization, emotion recognition, sign language recognition, and related emerging functionalities. Platform-based analysis emphasizes wearable devices and mobile solutions enabling real-time and context-aware assistance. The survey further highlights key research trends, including real-time auditory scene analysis, portable processing, and artificial intelligence. It shows that recent studies increasingly demonstrate that combining auditory, visual, and haptic modalities improves robustness and usability in real-world conditions, particularly in noisy and dynamic environments. Finally, open challenges such as energy efficiency, latency, evaluation methodologies, and user acceptance are discussed. By synthesizing existing work and identifying open research directions, this survey aims to provide a structured foundation for future developments in intelligent, non-prosthetic assistive systems that redefine how auditory information is accessed and interpreted.

(This article belongs to the Section Assistive Technologies)

20 pages, 1481 KB

Open AccessArticle

Adaptive Attentional Regulation to Emotional Faces in Subclinical Depression

by Chaoyang Li and Jinhong Ding

Behav. Sci. 2026, 16(5), 657; https://doi.org/10.3390/bs16050657 - 26 Apr 2026

Viewed by 340

Abstract

Cognitive models of depression posit a core role for attentional biases, though empirical evidence remains inconsistent, likely due to variations in task demands. This study utilized eye-tracking to assess attentional patterns in individuals with depressive symptoms during a goal-directed visual search task, specifically dissociating early orienting and late disengagement. Seventy-seven participants, classified into high (HD) and low (LD) depressive-symptom groups based on PHQ-9 scores, completed a “face-in-the-crowd” (FITC) task. The set size (4, 8, or 12 faces) was varied to examine the role of perceptual load. The task involved searching for a single emotional target among neutral distractors (assessing early orienting) and searching for a single neutral target among emotional distractors (assessing late disengagement). Contrary to the negativity-bias hypothesis, the HD group demonstrated what might be interpreted as adaptive attentional regulation. During early orienting (8-face condition), the HD group showed reduced total dwell time on happy targets, suggesting accelerated identification. An attentional bias index (sad minus happy dwell time) correlated positively with depression severity. During late disengagement (8-face condition), the HD group exhibited shorter target fixation latency specifically with sad distractors, indicating facilitated disengagement from negative information. The corresponding bias index correlated negatively with depression levels. Under explicit goal-directed demands, individuals with high depressive symptoms displayed facilitated processing of happy faces and accelerated disengagement from sad faces, rather than an enhanced negativity bias. This pattern tentatively suggests a possible adaptive attentional regulatory mechanism in early depression, although the findings were limited to the 8-face condition and no significant group differences emerged at set sizes 4 or 12. 
Replication is required before firm conclusions can be drawn. The result underscores the critical influence of task demands and highlights the value of early identification and targeted intervention.

10 pages, 1269 KB

Open AccessCase Report

Oculometric Measurement of Concussion Magnitude in Professional Baseball Catchers

by Richard Baird, Ryan Harrison, Quinn Kennedy, Mollie McGuire and Dorion Liston

Brain Sci. 2026, 16(4), 369; https://doi.org/10.3390/brainsci16040369 - 29 Mar 2026

Viewed by 618

Abstract

Background/Objectives: Due to their positions, professional baseball catchers are at elevated risk of concussion, which can impair visual processing. There is a need for sensitive sensorimotor monitoring tools to track concussion-related neurophysiological changes more accurately. We investigated whether oculometrics can address this need. Methods: Four Major League Baseball catchers completed an oculometric assessment shortly after suffering a concussion (Time 1) and again after completing vision rehabilitation (Time 2). The assessment produces 10 z-scored measures, including a summary score. Results: Players’ Time 1 summary score tended to be typical of a normal healthy adult (Mean = 0.07 z-scored units). On average, players improved by 1.3 z-score units from their Time 1 summary score (SD = 1.07). Exploratory analyses revealed that sensorimotor recovery was driven by smooth pursuit latency, proportion of tracking comprising smooth pursuit, and the amplitude of catch-up saccades. Conclusions: Our analysis was based on a very small sample of concussion cases, each of which was unique. Despite this limitation, our data show how oculometrics can measure improvements in visual processing following a concussion among baseball players with exceptional perceptual-motor skills. Our data highlight the risk that brain injuries in high-performing individuals go undetected due to standard-of-care tools normed to behavior from healthy control populations; for these athletes, “normal” scores cannot be interpreted as neurologically “healthy”.

(This article belongs to the Special Issue Advances in Assessment and Training of Perceptual-Motor Performance (2nd Edition))

17 pages, 2538 KB

Open AccessArticle

Beyond Synchrony: Non-Phase Gamma as a Candidate Mechanism for Perceptual Anti-Binding

by Rocio Caballero-Díaz, Esteban Sarrias-Arrabal, Ruben Martin-Clemente and Manuel Vazquez-Marrufo

Sci 2026, 8(2), 49; https://doi.org/10.3390/sci8020049 - 20 Feb 2026

Viewed by 756

Abstract

The gamma band observed in human electroencephalography (EEG) has been extensively studied. However, recent research has begun distinguishing the potential roles assigned to phase and non-phase modulation within this band. The primary aim of this study is to analyze the potential role of non-phase gamma modulation in a widely used visual task in human subjects. For this purpose, using a 58-channel EEG recording, gamma activity was evaluated during an oddball task. Responses from 21 healthy subjects were recorded at two separate time points, with an average interval of 49.5 ± 48.9 days. Latency, amplitude, and topographic correlation values were calculated to assess replicability. Furthermore, the potential influence of alpha band harmonics on gamma activity was analyzed. Topographic analyses revealed a strong negative correlation between gamma phase-locked (synchronous) and non-phase-locked (asynchronous) activity, with correlation coefficients of r < −0.9 for both measures. The results observed between the two time points were robust. The harmonic analysis did not show any potential contribution of the alpha band. The separate analysis of phase and non-phase activity has enabled us to identify distinct roles for each. Establishing non-phase activity as a perceptual “anti-binding” mechanism opens new avenues for exploring a previously unaddressed aspect of gamma activity.

16 pages, 927 KB

Open AccessFeature PaperArticle

Trained Scent Dog Detection and GC-MS Analysis of Volatile Organic Compounds from Murine Coronavirus-Infected Cell Cultures

by Agata Kokocińska-Alexandre, Martyna Woszczyło, Michał Dzięcioł, Agata Kublicka, Adam Szumowski, Jacek Łyczko, Katarzyna Barłowska, Antoni Szumny, Marcin J. Skwark and Anna Karolina Matczuk

Animals 2026, 16(4), 647; https://doi.org/10.3390/ani16040647 - 18 Feb 2026

Viewed by 977

Abstract

Volatile organic compounds (VOCs) are increasingly recognized as metabolic byproducts of viral infection and may serve as olfactory cues detectable by trained scent dogs. This study examined whether dogs could distinguish cell culture samples infected with murine hepatitis virus strain 1 (MHV-1), a biosafety level 2 coronavirus model, from uninfected controls. Parallel chemical analysis using gas chromatography–mass spectrometry (GC-MS) identified 14 VOCs in infected and 12 in control samples. Notably, 3-heptanone and 1-nonanol were unique to infected samples, while others such as acetophenone, nonanal, decanal, and benzaldehyde were significantly elevated, often by 1.5 to 3 times, in infected cultures. Two trained dogs demonstrated high detection sensitivity (0.95) for infected samples compared to a previously trained cinnamon odor group (0.88) and responded with shorter latency (p = 0.04), suggesting perceptual salience of infection-related VOCs. Reliable detection required pooled volumes (~600 µL), suggesting a threshold effect related to VOC concentration. Additionally, a Random Forest-based machine learning classifier trained on the GC-MS-obtained VOC profiles achieved a cross-validated accuracy of 0.82 (SD = 0.25). These findings suggest that dogs use quantitative VOC differences, rather than unique compounds, for detection. The study provides a validated experimental framework for olfactory diagnostics of viral infections and highlights the potential of scent dogs as non-invasive biosensors in both veterinary and public health contexts.

(This article belongs to the Special Issue Canine Olfaction)

18 pages, 853 KB

Open AccessArticle

Intraindividual Variability in Perceptual-Motor Performance Measured with Virtual Reality Among Military Veterans

by Scott L. Bruce, Michael Cooper, Carly Farmer, Audrey Folsom, Melanie Fulton, Jana Haskins, Cheryl Knight, Carlitta M. Moore, Johnathon A. Mullins, Amy Shollenbarger, Rashele Wade, Stacy Walz, Rebbecca Wellborn, Rachel Wilkins and Kendall Youngman

Brain Sci. 2026, 16(2), 185; https://doi.org/10.3390/brainsci16020185 - 3 Feb 2026

Viewed by 575

Abstract

Background/Objectives: Concussions produce a wide array of symptoms that are often subtle and difficult to quantify. One such symptom involves reaction or response time (RT), consisting of perceptual latency time (LT) and movement time (MT). This pilot study examined the relationship between concussion history, mental health, and perceptual-motor performance among military veterans using a virtual reality (VR)-based assessment. The primary outcome was intraindividual variability (IIV), defined as the standard deviation of an individual’s responses across repeated trials. Methods: Of 78 veterans who volunteered, 29 (22 males, 7 females) provided complete VR data. Participants completed surveys assessing concussion and combat history, mental health issues, and suicide ideation. During VR testing, participants responded to 40 trials requiring neck rotation, arm reach, and a step toward left or right virtual targets. Associations between predictors (e.g., concussion, mental health) and VR outcomes (RT, LT, IIV) were evaluated using Receiver Operating Characteristic (ROC) Area Under the Curve (AUC) values. Results: Concussion history was the strongest predictor of performance deficits. IIV metrics were sensitive indicators of both concussion and mental health issues. Univariable analyses yielded AUC values of 0.944–0.806, all of which were statistically significant (p ≤ 0.001), and multivariable analyses produced AUCs of 0.950–0.870, all of which were also statistically significant (p ≤ 0.001). Incongruent movements and longer LT values were especially discriminative. Conclusions: Veterans with concussion and mental health histories demonstrated quantifiable perceptual-motor impairments in VR environments. Findings support VR assessment as a feasible, sensitive tool for detecting subtle residual effects of concussion.
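
The IIV definition given in this abstract (the standard deviation of one person's responses across repeated trials) can be sketched in a few lines. The trial values and function names below are hypothetical. Treating RT as the sum of LT and MT, and using the sample rather than the population standard deviation, are assumptions made for illustration; the abstract does not state either.

```python
from statistics import stdev


def response_time(latency_ms, movement_ms):
    """Assumed decomposition: RT = perceptual latency (LT) + movement time (MT)."""
    return latency_ms + movement_ms


def intraindividual_variability(values):
    """IIV as the standard deviation of one participant's values across trials.

    Uses the sample standard deviation (n - 1 denominator); this choice is an
    assumption, not something the abstract specifies.
    """
    if len(values) < 2:
        raise ValueError("IIV needs at least two trials")
    return stdev(values)


# Hypothetical (LT, MT) pairs in milliseconds for one participant.
trials = [(200, 200), (220, 200), (180, 200), (200, 200)]
rts = [response_time(lt, mt) for lt, mt in trials]
iiv_rt = intraindividual_variability(rts)
iiv_lt = intraindividual_variability([lt for lt, _ in trials])
```

In this made-up example movement time is constant, so all of the RT variability comes from latency and the two IIV values coincide.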

(This article belongs to the Section Sensory and Motor Neuroscience)

39 pages, 14025 KB

Open AccessArticle

Degradation-Aware Multi-Stage Fusion for Underwater Image Enhancement

by Lian Xie, Hao Chen and Jin Shu

J. Imaging 2026, 12(1), 37; https://doi.org/10.3390/jimaging12010037 - 8 Jan 2026

Cited by 1 | Viewed by 963

Abstract

Underwater images frequently suffer from color casts, low illumination, and blur due to wavelength-dependent absorption and scattering. We present a practical two-stage, modular, and degradation-aware framework designed for real-time enhancement, prioritizing deployability on edge devices. Stage I employs a lightweight CNN to classify inputs into three dominant degradation classes (color cast, low light, blur) with 91.85% accuracy on an EUVP subset. Stage II applies three scene-specific lightweight enhancement pipelines and fuses their outputs using two alternative learnable modules: a global Linear Fusion and a LiteUNetFusion (spatially adaptive weighting with optional residual correction). Compared to the three single-scene optimizers (average PSNR = 19.0 dB; mean UCIQE ≈ 0.597; mean UIQM ≈ 2.07), the Linear Fusion improves PSNR by +2.6 dB on average and yields roughly +20.7% in UCIQE and +21.0% in UIQM, while maintaining low latency (~90 ms per 640 × 480 frame on an Intel i5-13400F; Intel Corporation, Santa Clara, CA, USA). The LiteUNetFusion further refines results: it raises PSNR by +1.5 dB over the Linear model (23.1 vs. 21.6 dB), brings modest perceptual gains (UCIQE from 0.72 to 0.74, UIQM 2.5 to 2.8) at a runtime of ≈125 ms per 640 × 480 frame, and better preserves local texture and color consistency in mixed-degradation scenes. We release implementation details for reproducibility and discuss limitations (e.g., occasional blur/noise amplification and domain generalization) together with future directions.
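
The figures quoted in this abstract can be cross-checked with a short worked calculation. It uses only the values stated above; the subscript labels are ours, and rounding to the reported precision is assumed.

```latex
\begin{align*}
\mathrm{PSNR}_{\text{Linear}}   &= 19.0 + 2.6 = 21.6\ \text{dB} \\
\mathrm{PSNR}_{\text{LiteUNet}} &= 21.6 + 1.5 = 23.1\ \text{dB} \\
\mathrm{UCIQE}_{\text{Linear}}  &\approx 0.597 \times 1.207 \approx 0.72 \\
\mathrm{UIQM}_{\text{Linear}}   &\approx 2.07 \times 1.210 \approx 2.50
\end{align*}
```

These results match the Linear Fusion values of 21.6 dB, 0.72, and 2.5 that the abstract uses as the starting point for the LiteUNetFusion comparison.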

(This article belongs to the Section Image and Video Processing)

18 pages, 3240 KB

Open AccessArticle

A Waist-Mounted Interface for Mobile Viewpoint-Height Transformation Affecting Spatial Perception

by Jun Aoki, Hideki Kadone and Kenji Suzuki

Sensors 2026, 26(2), 372; https://doi.org/10.3390/s26020372 - 6 Jan 2026

Viewed by 712

Abstract

Visual information shapes spatial perception and body representation in human augmentation. However, the perceptual consequences of viewpoint-height changes produced by sensor–display geometry are not well understood. To address this gap, we developed an interface that maps a waist-mounted stereo fisheye camera to an eye-level viewpoint on a head-mounted display in real time. Geometric and timing calibration kept latency low enough to preserve a sense of agency and enable stable untethered walking. In a within-subject study comparing head- and waist-level viewpoints, participants approached adjustable gaps, rated passability confidence (1–7), and attempted passage when confident. We also recorded walking speed and assessed post-task body representation using a questionnaire. High gaps were judged passable and low gaps were not, irrespective of viewpoint. At the middle gap, confidence decreased with a head-level viewpoint and increased with a waist-level viewpoint, and walking speed decreased when a waist-level viewpoint was combined with a chest-height gap, consistent with added caution near the decision boundary. Body image reports most often indicated a lowered head position relative to the torso, consistent with visually driven rescaling rather than morphological change. These findings show that a waist-mounted interface for mobile viewpoint-height transformation can reliably shift spatial perception.

(This article belongs to the Special Issue Sensors and Wearables for AR/VR Applications)

34 pages, 4042 KB

Open AccessArticle

Perceptual Elements and Sensitivity Analysis of Urban Tunnel Portals for Autonomous Driving

by Mengdie Xu, Bo Liang, Haonan Long, Chun Chen, Hongyi Zhou and Shuangkai Zhu

Appl. Sci. 2026, 16(1), 453; https://doi.org/10.3390/app16010453 - 31 Dec 2025

Cited by 1 | Viewed by 553

Abstract

Urban tunnel portals constitute critical safety zones for autonomous vehicles, where abrupt luminance transitions, shortened sight distances, and densely distributed structural and traffic elements pose considerable challenges to perception reliability. Existing driving scenario datasets are rarely tailored to tunnel environments and have not quantitatively evaluated how specific infrastructure components influence perception latency in autonomous systems. This study develops a requirement-driven framework for the identification and sensitivity ranking of information perception elements within urban tunnel portals. Based on expert evaluations and a combined function–safety scoring system, nine key elements, including road surfaces, tunnel portals, lane markings, and vehicles, were identified as perception-critical. A “mandatory–optional” combination rule was then applied to generate 48 logical scene types, and 376 images were retained after screening for brightness (30–220 px), blur (Laplacian variance ≥ 100), and occlusion (≤0.5% pixel error). A ResNet50–PSPNet convolutional neural network was trained to perform pixel-level segmentation, with inference rate adopted as a quantitative proxy for perceptual sensitivity. Field experiments across ten urban tunnels in China indicate that the model consistently recognized road surfaces, lane markings, cars, and motorcycles with the shortest inference times (<6.5 ms), whereas portal structures and vegetation required longer recognition times (>7.5 ms). This sensitivity ranking is statistically stable under clear, daytime conditions (p < 0.01). The findings provide engineering insights for optimizing tunnel lighting design, signage placement, and V2X configuration, and offer a pilot dataset to support perception-oriented design and evaluation of urban tunnel portals in semi-enclosed environments. Unlike generic segmentation datasets, this study quantifies element-specific CNN latency at tunnel portals for the first time.
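
The brightness and blur thresholds quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes that "brightness" means the mean gray level of the image and that blur is scored by the variance of a 4-neighbour Laplacian, and the function names (`mean_brightness`, `laplacian_variance`, `passes_screening`) are hypothetical. The occlusion check is omitted because the abstract does not say how it is computed.

```python
from statistics import fmean, pvariance


def mean_brightness(gray):
    """Mean gray level of a 2-D image given as rows of 0-255 values."""
    return fmean(v for row in gray for v in row)


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian response over interior pixels.

    Sharp images have strong edges and therefore a high variance;
    blurred images have a low one.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def passes_screening(gray, lo=30, hi=220, min_blur_var=100):
    """Keep an image only if it is neither too dark, too bright, nor too blurred.

    The defaults mirror the thresholds in the abstract; how the paper
    measures brightness is an assumption here.
    """
    return (lo <= mean_brightness(gray) <= hi
            and laplacian_variance(gray) >= min_blur_var)
```

A uniformly gray image has zero Laplacian variance and is rejected as blurred, while a high-contrast pattern with mid-range mean brightness is kept.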

(This article belongs to the Section Civil Engineering)

23 pages, 5039 KB

Open Access Article

A3DSimVP: Enhancing SimVP-v2 with Audio and 3D Convolution

by Junfeng Yang, Mingrui Long, Hongjia Zhu, Limei Liu, Wenzhi Cao, Qin Li and Han Peng

Electronics 2026, 15(1), 112; https://doi.org/10.3390/electronics15010112 - 25 Dec 2025

Viewed by 949

Abstract

In modern high-demand applications, such as real-time video communication, cloud gaming, and high-definition live streaming, achieving both superior transmission speed and high visual fidelity is paramount. However, unstable networks and packet loss remain major bottlenecks, making accurate and low-latency video error concealment a critical challenge. Traditional error control strategies, such as Forward Error Correction (FEC) and Automatic Repeat Request (ARQ), often introduce excessive latency or bandwidth overhead. Meanwhile, receiver-side concealment methods struggle under high motion or significant packet loss, motivating the exploration of predictive models. SimVP-v2, with its efficient convolutional architecture and Gated Spatiotemporal Attention (GSTA) mechanism, provides a strong baseline by reducing complexity and achieving competitive prediction performance. Despite its merits, SimVP-v2’s reliance on 2D convolutions for implicit temporal aggregation limits its capacity to capture complex motion trajectories and long-term dependencies. This often results in artifacts such as motion blur, detail loss, and accumulated errors. Furthermore, its single-modality design ignores the complementary contextual cues embedded in the audio stream. To overcome these issues, we propose A3DSimVP (Audio- and 3D-Enhanced SimVP-v2), which integrates explicit spatio-temporal modeling with multimodal feature fusion. Architecturally, we replace the 2D depthwise separable convolutions within the GSTA module with their 3D counterparts, introducing a redesigned GSTA-3D module that significantly improves motion coherence across frames. Additionally, an efficient audio–visual fusion strategy supplements visual features with contextual audio guidance, thereby enhancing the model’s robustness and perceptual realism. We validate the effectiveness of A3DSimVP’s improvements through extensive experiments on the KTH dataset. Our model achieves a PSNR of 27.35 dB, surpassing the 27.04 dB of the SimVP-v2 baseline. Concurrently, our improved A3DSimVP model reduces the loss metrics on the KTH dataset, achieving an MSE of 43.82 and an MAE of 385.73, both lower than the baseline. Crucially, our LPIPS metric is substantially lowered to 0.22. These figures indicate that A3DSimVP improves both structural fidelity and perceptual quality while maintaining high predictive accuracy. Notably, A3DSimVP attains faster inference speeds than the baseline with only a marginal increase in computational overhead. These results establish A3DSimVP as an efficient and robust solution for latency-critical video applications.
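
For readers unfamiliar with the pixel-level metrics quoted above, the following is a minimal sketch of their textbook definitions. It is not the paper's evaluation code. The abstract does not state how MSE and MAE are normalised (for example, averaged per pixel or summed per frame), so this sketch assumes per-pixel averages over flat lists of 0–255 values and is not expected to reproduce the reported numbers. LPIPS is omitted because it requires a pretrained network.

```python
import math


def mse(pred, target):
    """Mean squared error, averaged over all pixels (assumed normalisation)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred, target):
    """Mean absolute error, averaged over all pixels (assumed normalisation)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def psnr(pred, target, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    err = mse(pred, target)
    if err == 0:
        return math.inf  # identical frames: no noise
    return 10 * math.log10(max_value ** 2 / err)
```

Higher PSNR and lower MSE and MAE both indicate a predicted frame that is closer to the ground truth, which is the direction of improvement the abstract reports.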

(This article belongs to the Special Issue Digital Intelligence Technology and Applications, 2nd Edition)

28 pages, 14015 KB

Open Access Article

Evaluating Passenger Behavioral Experience in Metro Travel: An Integrated Model of One-Way and Interactive Behaviors

by Ning Song, Xuemei He, Fan Liu and Anjie Tian

Sustainability 2025, 17(24), 11257; https://doi.org/10.3390/su172411257 - 16 Dec 2025

Viewed by 1142

Abstract

With the continuous expansion of urban metro systems, balancing passenger experience and operational efficiency has become a central concern in contemporary public transportation design. However, most existing metro service studies continue to focus on perceptual comfort or isolated usability tasks and lack an integrated, behavior-centered perspective that accounts for the full travel chain and diverse user groups. This study develops the Bi-directional Service Behavioral Experience Model (BSBEM), which systematically integrates one-way navigation behaviors and interactive operational behaviors within a unified dual-path framework to identify behavioral patterns and experiential disparities across user groups. Based on the People–Touchpoints–Environments–Messages–Services–Time–Emotion (POEMSTI) behavioral observation framework, this study employs a mixed-method approach combining video-based behavioral coding, usability testing, and subjective evaluation. An empirical study conducted at Beidajie Station on Xi’an Metro Line 2 involved three representative passenger groups: high-frequency commuters, urban leisure travelers, and special-care passengers. Multi-source data were collected to capture temporal, spatial, and interactional dynamics throughout the travel process. Results show that high-frequency commuters demonstrate the highest operational fluency, urban leisure travelers exhibit greater visual dependency and exploratory pauses, and special-care passengers are most affected by accessibility and feedback latency. Further analysis reveals a positive correlation between route complexity and interaction delay, highlighting discontinuous information feedback as a key experiential bottleneck. By jointly modeling one-way and interactive behaviors and linking group-specific patterns to concrete metro touchpoints, this research extends behavioral evaluation in metro systems and offers a novel behavior-based perspective along with empirical evidence for inclusive, adaptive, and human-centered service design.

Search Results (36)