Search Results (10)

Search Parameters:
Keywords = acoustic cue-weighting

25 pages, 1363 KB  
Article
HydroSNN: Event-Driven Computer Vision with Spiking Transformers for Energy-Efficient Edge Perception in Sustainable Water Conservancy and Urban Water Utilities
by Jing Liu, Hong Liu and Yangdong Li
Sustainability 2026, 18(3), 1562; https://doi.org/10.3390/su18031562 - 3 Feb 2026
Abstract
Digital transformation in water conservancy and urban water utilities demands perception systems that are accurate, fast, energy-efficient, and maintainable over long service lifecycles at the edge. We present HydroSNN, a neuromorphic computer-vision framework that couples an event-driven sensing pipeline with a spiking-transformer backbone to support monitoring of canals, reservoirs, treatment plants, and buried pipeline networks. By reducing always-on compute and unnecessary data movement, HydroSNN targets sustainability goals in smart water infrastructure: lower operational energy use, fewer site visits, and improved resilience under harsh illumination and weather. HydroSNN introduces three novel components: (i) spiking temporal tokenization (STT), which converts asynchronous events and optional frames into latency-aware spike tokens while preserving motion cues relevant to hydraulics; (ii) physics-guided spiking attention (PGSA), which injects lightweight mass-conservation/continuity constraints into attention weights via a differentiable regularizer to suppress physically implausible interactions; and (iii) cross-modal self-supervision (CM-SSL), which aligns RGB frames, event streams, and low-cost acoustic/vibration traces using masked prediction to reduce annotation requirements. We evaluate HydroSNN on public water-surface and event-vision benchmarks (MaSTr1325, SeaDronesSee, DSEC, MVSEC, DAVIS, and DDD20) and report accuracy, latency, and an operation-based energy proxy. HydroSNN improves mIoU/F1 over strong CNN/ViT baselines while reducing end-to-end latency and the estimated energy proxy in event-driven settings. These efficiency gains are practically relevant for off-grid or power-constrained deployments and support sustainable development by enabling continuous, low-power monitoring and timely anomaly response. These results demonstrate that event-driven spiking vision, augmented with simple physics guidance, offers a practical and efficient solution for resilient perception in smart water infrastructure. Full article
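
The "operation-based energy proxy" mentioned in this abstract is a standard way of comparing spiking and conventional networks by counting operations. A minimal Python sketch, assuming commonly cited 45 nm per-operation energies and a placeholder spike rate rather than anything specified by the authors:

```python
# Illustrative operation-count energy proxy for comparing a conventional layer
# with an event-driven spiking layer. The per-operation energies are commonly
# cited 45 nm CMOS estimates (MAC ~4.6 pJ, AC ~0.9 pJ); HydroSNN's actual
# accounting is not given here, so treat this as a sketch only.

E_MAC_PJ = 4.6  # energy of one multiply-accumulate (dense ANN op), picojoules
E_AC_PJ = 0.9   # energy of one accumulate (spike-driven op), picojoules

def ann_layer_energy_pj(num_macs: int) -> float:
    """Energy proxy for a conventional layer: every connection fires a MAC."""
    return num_macs * E_MAC_PJ

def snn_layer_energy_pj(num_synaptic_ops: int, spike_rate: float) -> float:
    """Energy proxy for a spiking layer: only active (spiking) synapses
    trigger accumulates, so cost scales with the average spike rate."""
    return num_synaptic_ops * spike_rate * E_AC_PJ

if __name__ == "__main__":
    ops = 1_000_000   # synaptic operations in the layer (assumed)
    rate = 0.1        # 10% of neurons spike per timestep (assumed)
    print(f"ANN proxy: {ann_layer_energy_pj(ops) / 1e6:.2f} uJ")
    print(f"SNN proxy: {snn_layer_energy_pj(ops, rate) / 1e6:.2f} uJ")
```
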
33 pages, 11440 KB  
Article
A Vision-Assisted Acoustic Channel Modeling Framework for Smartphone Indoor Localization
by Can Xue, Huixin Zhuge and Zhi Wang
Sensors 2026, 26(2), 717; https://doi.org/10.3390/s26020717 - 21 Jan 2026
Viewed by 152
Abstract
Conventional acoustic time-of-arrival (TOA) estimation in complex indoor environments is highly susceptible to multipath reflections and occlusions, resulting in unstable measurements and limited physical interpretability. This paper presents a smartphone-based indoor localization method built on vision-assisted acoustic channel modeling, and develops a fusion anchor integrating a pan–tilt–zoom (PTZ) camera and a near-ultrasonic signal transmitter to explicitly perceive indoor geometry, surface materials, and occlusion patterns. First, vision-derived priors are constructed on the anchor side based on line-of-sight reachability, orientation consistency, and directional risk, and are converted into soft anchor weights to suppress the impact of occlusion and pointing mismatch. Second, planar geometry and material cues reconstructed from camera images are used to generate probabilistic room impulse response (RIR) priors that cover the direct path and first-order reflections, where environmental uncertainty is mapped into path-dependent arrival-time variances and prior probabilities. Finally, under the RIR prior constraints, a path-wise posterior distribution is built from matched-filter outputs, and an adaptive fusion strategy is applied to switch between maximum a posteriori (MAP) and minimum mean square error (MMSE) estimators, yielding debiased TOA measurements with calibratable variances for downstream localization filters. Experiments in representative complex indoor scenarios demonstrate mean localization errors of 0.096 m and 0.115 m in static and dynamic tests, respectively, indicating improved accuracy and robustness over conventional TOA estimation. Full article
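
To illustrate the kind of MAP/MMSE switching over a path-wise posterior described above, here is a minimal Python sketch; the candidate grid, Gaussian prior shape, and entropy-based switching rule are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

# Sketch of fusing an RIR-style arrival-time prior with matched-filter
# evidence and switching between MAP and MMSE readouts of the TOA.

def toa_estimate(tau_grid, mf_power, prior_mean, prior_var, switch_entropy=0.5):
    """tau_grid: candidate arrival times (s); mf_power: matched-filter power
    at each candidate; prior_mean/prior_var: arrival-time prior, e.g. derived
    from a vision-based room model (all values illustrative)."""
    prior = np.exp(-0.5 * (tau_grid - prior_mean) ** 2 / prior_var)
    post = prior * np.maximum(mf_power, 1e-12)
    post /= post.sum()

    # Normalised entropy of the posterior: low = one sharp peak, high = diffuse.
    entropy = -(post * np.log(post + 1e-12)).sum() / np.log(len(post))

    tau_map = tau_grid[np.argmax(post)]        # posterior mode
    tau_mmse = float((post * tau_grid).sum())  # posterior mean
    return tau_map if entropy < switch_entropy else tau_mmse
```
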

18 pages, 2375 KB  
Article
A Co-Expressed Cluster of Genes in the Anterior Brain of Female Crickets Activated by a Species-Specific Calling Song
by Shijiao Xiong, Chunxia Gan, Fengmin Wang, Zhengyang Li, Songwang Yi, Yaobin Lu and Xinyang Zhang
Int. J. Mol. Sci. 2026, 27(2), 706; https://doi.org/10.3390/ijms27020706 - 10 Jan 2026
Viewed by 209
Abstract
Crickets use the pulse pattern of the species-specific calling song as a primary cue for mate recognition. Here we combined transcriptome profiling of brain regions with network-based analyses in Gryllus bimaculatus exposed to silence or pulse trains known to elicit strong or weak phonotactic attraction. Acoustic stimulation triggered specific transcriptional changes in the brain, with the anterior protocerebrum showing the most pronounced and selective responses to the calling song pattern, characterized by enrichment in neuromodulatory and neurotransmitter-related pathways. Weighted gene co-expression analysis identified a specific cluster of highly co-expressed genes in the anterior brain (termed the calling song-responsive module) that responded selectively only to the calling song stimulus. Genetic network topology analysis revealed six highly connected key hub genes within the calling song-responsive module—GbOrb2, Gbgl, Gbpum, GbDnm, GbCadN, and GbNCadN. These genes showed extensive interactions with many other genes in the network, suggesting their central regulatory role in response to calling song in female crickets. These findings support the anterior brain as a central integrator of cricket auditory mate recognition cues and point to a core molecular network that likely underpins this behavior. Full article
(This article belongs to the Collection 30th Anniversary of IJMS: Updates and Advances in Biochemistry)
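
The hub-gene result above rests on weighted co-expression network analysis. A minimal sketch of how intramodular connectivity can rank hub genes, with the soft-threshold power and the expression matrix as placeholder assumptions rather than the study's pipeline:

```python
import numpy as np

# WGCNA-style ranking: soft-thresholded correlation adjacency, then
# intramodular connectivity (sum of adjacency to all other genes).

def hub_gene_ranking(expr, gene_names, beta=6, top_n=6):
    """expr: samples x genes expression matrix; returns the top_n genes
    by intramodular connectivity."""
    corr = np.corrcoef(expr, rowvar=False)    # gene x gene correlations
    adjacency = np.abs(corr) ** beta          # soft-thresholded adjacency
    np.fill_diagonal(adjacency, 0.0)          # ignore self-connections
    connectivity = adjacency.sum(axis=1)      # connectivity k_i per gene
    order = np.argsort(connectivity)[::-1][:top_n]
    return [(gene_names[i], float(connectivity[i])) for i in order]
```
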

16 pages, 1381 KB  
Article
Dual Routing Mixture-of-Experts for Multi-Scale Representation Learning in Multimodal Emotion Recognition
by Da-Eun Chae and Seok-Pil Lee
Electronics 2025, 14(24), 4972; https://doi.org/10.3390/electronics14244972 - 18 Dec 2025
Viewed by 330
Abstract
Multimodal emotion recognition (MER) often relies on single-scale representations that fail to capture the hierarchical structure of emotional signals. This paper proposes a Dual Routing Mixture-of-Experts (MoE) model that dynamically selects between local (fine-grained) and global (contextual) representations extracted from speech and text encoders. The framework first obtains local–global embeddings using WavLM and RoBERTa, then employs a scale-aware routing mechanism to activate the most informative expert before bidirectional cross-attention fusion. Experiments on the IEMOCAP dataset show that the proposed model achieves stable performance across all folds, reaching an average unweighted accuracy (UA) of 75.27% and weighted accuracy (WA) of 74.09%. The model consistently outperforms single-scale baselines and simple concatenation methods, confirming the importance of dynamic multi-scale cue selection. Ablation studies highlight that neither local-only nor global-only representations are sufficient, while routing behavior analysis reveals emotion-dependent scale preferences—such as strong reliance on local acoustic cues for anger and global contextual cues for low-arousal emotions. These findings demonstrate that emotional expressions are inherently multi-scale and that scale-aware expert activation provides a principled approach beyond conventional single-scale fusion. Full article
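
A minimal sketch of scale-aware routing between a local and a global expert, as a rough illustration of the idea above; the gating function, dimensions, and hard top-1 selection are assumptions, and the sketch omits the paper's bidirectional cross-attention fusion:

```python
import numpy as np

# Toy router: a linear gate looks at both representations and either
# activates the more informative expert (hard routing) or mixes them.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def route(local_emb, global_emb, w_gate, hard=True):
    """local_emb/global_emb: 1-D embeddings of equal size;
    w_gate: (2*dim, 2) gating weights mapping both scales to two logits."""
    gate_in = np.concatenate([local_emb, global_emb])
    gate = softmax(gate_in @ w_gate)              # [p_local, p_global]
    if hard:                                      # activate one expert only
        return local_emb if gate[0] >= gate[1] else global_emb
    return gate[0] * local_emb + gate[1] * global_emb  # soft mixture
```
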

16 pages, 1176 KB  
Article
Hearing Tones, Missing Boundaries: Cross-Level Selective Transfer of Prosodic Boundaries Among Chinese–English Learners
by Lan Fang, Zilong Li, Keke Yu, John W. Schwieter and Ruiming Wang
Behav. Sci. 2025, 15(12), 1605; https://doi.org/10.3390/bs15121605 - 21 Nov 2025
Viewed by 399
Abstract
Second language (L2) learners often struggle to process prosodic boundaries, which are essential for speech comprehension. This study investigated the nature of these difficulties and how first language (L1) cue-weighting strategies transfer to L2 processing among Chinese (Mandarin)–English learners. The rising pitch that cues English phrase boundaries acoustically overlaps with functionally distinct Chinese lexical tones. Through two experiments comparing Chinese–English learners and native English speakers, we assessed sensitivity across lexical constituent, phrase, and sentence boundaries and manipulated acoustic cues (pause, lengthening, pitch) to estimate their perceptual weights during phrase-boundary identification. L2 learners showed reduced discrimination sensitivity only at the phrase level, performing comparably to native speakers at lexical constituent and sentence boundaries. For phrase boundaries, learners over-relied on pitch and under-relied on pre-boundary lengthening compared to native speakers, though both groups weighted pauses strongly. This selective deficit implicates the transfer of L1 cue-weighting strategies more than a global knowledge deficit. Our findings support a dynamic transfer model in which L1 sensitivity to lexical tone transfers to L2 phrase perception, elevating the weight of pitch. While learners show partial adaptation, these results refine the Cue-Weighting Transfer Hypothesis by demonstrating that L2 prosodic acquisition involves both integrated L1 transfer and L2-driven reweighting strategies. Full article
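
One common way to estimate perceptual cue weights of the kind reported above is to regress binary boundary judgments on the manipulated cue levels and compare the coefficients. A minimal sketch with simulated data; the study's actual statistical modelling may differ:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulate a listener whose "boundary" judgments weight pause > pitch >
# lengthening, then recover the cue weights from a logistic regression.
rng = np.random.default_rng(0)
n = 500
pause, lengthening, pitch = rng.normal(size=(3, n))   # standardized cue levels
logit = 2.0 * pause + 0.5 * lengthening + 1.2 * pitch
responses = rng.random(n) < 1 / (1 + np.exp(-logit))  # binary judgments

X = np.column_stack([pause, lengthening, pitch])
model = LogisticRegression().fit(X, responses)
for cue, coef in zip(["pause", "lengthening", "pitch"], model.coef_[0]):
    print(f"{cue:12s} weight ~ {coef:+.2f}")
```
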

22 pages, 2431 KB  
Article
Perceptual Plasticity in Bilinguals: Language Dominance Reshapes Acoustic Cue Weightings
by Annie Tremblay and Hyoju Kim
Brain Sci. 2025, 15(10), 1053; https://doi.org/10.3390/brainsci15101053 - 27 Sep 2025
Viewed by 901
Abstract
Background/Objectives: Speech perception is shaped by language experience, with listeners learning to selectively attend to acoustic cues that are informative in their language. This study investigates how language dominance, a proxy for long-term language experience, modulates cue weighting in highly proficient Spanish–English bilinguals’ perception of English lexical stress. Methods: We tested 39 bilinguals with varying dominance profiles and 40 monolingual English speakers in a stress identification task using auditory stimuli that independently manipulated vowel quality, pitch, and duration. Results: Bayesian logistic regression models revealed that, compared to monolinguals, bilinguals relied less on vowel quality and more on pitch and duration, mirroring cue distributions in Spanish versus English. Critically, cue weighting within the bilingual group varied systematically with language dominance: English-dominant bilinguals patterned more like monolingual English listeners, showing increased reliance on vowel quality and decreased reliance on pitch and duration, whereas Spanish-dominant bilinguals retained a cue weighting that was more Spanish-like. Conclusions: These results support experience-based models of speech perception and provide behavioral evidence that bilinguals’ perceptual attention to acoustic cues remains flexible and dynamically responsive to long-term input. These results are in line with a neurobiological account of speech perception in which attentional and representational mechanisms adapt to changes in the input. Full article
(This article belongs to the Special Issue Language Perception and Processing)

23 pages, 1105 KB  
Article
Examining Speech Perception–Production Relationships Through Tone Perception and Production Learning Among Indonesian Learners of Mandarin
by Keith K. W. Leung, Yu-An Lu and Yue Wang
Brain Sci. 2025, 15(7), 671; https://doi.org/10.3390/brainsci15070671 - 22 Jun 2025
Cited by 2 | Viewed by 1683
Abstract
Background: A transfer of learning effects across speech perception and production is evident in second-language (L2)-learning research, suggesting that perception and production are closely linked in L2 speech learning. However, the underlying factors of the relationship between perception and production improvements, such as the phonetic cue weightings given to acoustic features, are less explored. To address this research gap, the current study explored the effects of Mandarin tone learning on the production and perception of critical (pitch direction) and non-critical (pitch height) perceptual cues. Methods: This study tracked the Mandarin learning effects of Indonesian adult learners over a four-to-six-week learning period. Results: We found that perception and production gains in Mandarin L2 learning concurrently occurred with the critical pitch direction cue, F0 slope. The non-critical pitch height cue, F0 mean, displayed only a production gain. Conclusions: The results indicate the role of critical perceptual cues in relating tone perception and production in general, and in the transfer of learning effects across the two domains for L2 learning. These results demonstrate the transfer of the ability to perceive phonological contrasts using critical phonetic information to the production domain based on the same cue weighting, suggesting interconnected encoding and decoding processes in L2 speech learning. Full article
(This article belongs to the Special Issue Language Perception and Processing)
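
The two cues contrasted above can be summarized from an F0 contour as a slope (pitch direction) and a mean (pitch height). A minimal sketch, with the contour and timing purely illustrative:

```python
import numpy as np

# Extract F0 slope (via a linear fit over time) and F0 mean from one
# syllable's pitch contour.

def tone_cues(f0_hz, times_s):
    """f0_hz: voiced F0 samples of one syllable; times_s: their timestamps."""
    slope, _intercept = np.polyfit(times_s, f0_hz, deg=1)  # Hz per second
    return {"f0_slope_hz_per_s": float(slope), "f0_mean_hz": float(np.mean(f0_hz))}

# Example: a rising tone from 180 Hz to 260 Hz over 300 ms (assumed values).
t = np.linspace(0.0, 0.3, 30)
f0 = np.linspace(180.0, 260.0, 30)
print(tone_cues(f0, t))
```
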

11 pages, 1362 KB  
Article
Cue Weighting in Perception of the Retroflex and Non-Retroflex Laterals in the Zibo Dialect of Chinese
by Bing Dong, Jie Liang and Chang Liu
Behav. Sci. 2023, 13(6), 469; https://doi.org/10.3390/bs13060469 - 4 Jun 2023
Viewed by 1957
Abstract
This study investigated cue weighting in the perception of the retroflex and non-retroflex lateral contrast in the monosyllabic words /ɭə/ and /lə/ in the Zibo dialect of Chinese. A binary forced-choice identification task was carried out with 32 native speakers, using computer-modified natural speech situated in a two-dimensional acoustic space. The results showed that both acoustic cues had a significant main effect on lateral identification, with F1 of the following schwa being the primary cue and the consonant-to-vowel (C/V) duration ratio as a secondary cue. No interaction effect was found between these two acoustic cues. Moreover, the results indicated that acoustic cues were not equally weighted in production and perception of the syllables /ɭə/ and /lə/ in the Zibo dialect. Future studies are suggested involving other acoustic cues (e.g., the F1 of laterals) or adding noise in the identification task to better understand listeners’ strategies in perceiving the two laterals in the Zibo dialect. Full article
(This article belongs to the Section Cognition)

16 pages, 1569 KB  
Article
Learning to Perceive Non-Native Tones via Distributional Training: Effects of Task and Acoustic Cue Weighting
by Liquan Liu, Chi Yuan, Jia Hoong Ong, Alba Tuninetti, Mark Antoniou, Anne Cutler and Paola Escudero
Brain Sci. 2022, 12(5), 559; https://doi.org/10.3390/brainsci12050559 - 27 Apr 2022
Cited by 6 | Viewed by 3587
Abstract
As many distributional learning (DL) studies have shown, adult listeners can achieve discrimination of a difficult non-native contrast after a short repetitive exposure to tokens falling at the extremes of that contrast. Such studies have shown, using behavioural methods, that short distributional training can induce perceptual learning of vowel and consonant contrasts. However, much less is known about the neurological correlates of DL, and few studies have examined non-native lexical tone contrasts. Here, Australian-English speakers underwent DL training on a Mandarin tone contrast using behavioural (discrimination, identification) and neural (oddball-EEG) tasks, with listeners hearing either a bimodal or a unimodal distribution. Behavioural results show that listeners learned to discriminate tones after both unimodal and bimodal training, while EEG responses revealed more learning for listeners exposed to the bimodal distribution. Thus, perceptual learning through exposure to brief sound distributions (a) extends to non-native tonal contrasts, and (b) is sensitive to task, phonetic distance, and acoustic cue-weighting. Our findings have implications for models of how auditory and phonetic constraints influence speech learning. Full article
(This article belongs to the Special Issue Auditory and Phonetic Processes in Speech Perception)
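
Distributional-training stimuli are typically drawn from a continuum with either a bimodal or a unimodal frequency profile. A minimal sketch of how such a presentation sequence might be generated; the step counts are illustrative assumptions, not the study's materials:

```python
import numpy as np

# Tokens from an 8-step tone continuum, presented with either two frequency
# peaks near the endpoints (bimodal) or a single central peak (unimodal).
steps = np.arange(1, 9)                          # 8-step continuum
bimodal  = np.array([4, 8, 6, 2, 2, 6, 8, 4])    # peaks at steps 2 and 7
unimodal = np.array([2, 4, 6, 8, 8, 6, 4, 2])    # single central peak

def training_sequence(counts, rng):
    """Repeat each continuum step according to its count and shuffle."""
    seq = np.repeat(steps, counts)
    rng.shuffle(seq)
    return seq

rng = np.random.default_rng(1)
print("bimodal :", training_sequence(bimodal, rng)[:12], "...")
print("unimodal:", training_sequence(unimodal, rng)[:12], "...")
```
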

19 pages, 1186 KB  
Article
Effect of Environment-Related Cues on Auditory Distance Perception in the Context of Audio-Only Augmented Reality
by Vincent Martin, Isabelle Viaud-Delmon and Olivier Warusfel
Appl. Sci. 2022, 12(1), 348; https://doi.org/10.3390/app12010348 - 30 Dec 2021
Cited by 5 | Viewed by 3567
Abstract
Audio-only augmented reality consists of enhancing a real environment with virtual sound events. Seamless integration of the virtual events within the environment requires processing them with artificial spatialization and reverberation effects that simulate the acoustic properties of the room. However, in augmented reality, the visual and acoustic environment of the listener may not be fully controlled. This study aims to gain insight into the acoustic cues (intensity and reverberation) that listeners use to form auditory distance judgments, and to observe whether these strategies can be influenced by the listener’s environment. To do so, we present a perceptual evaluation of two distance-rendering models informed by a measured Spatial Room Impulse Response. The rendering methods were chosen to create stimulus categories that differ in the availability and reproduction quality of acoustic cues. The proposed models were evaluated in an online experiment with 108 participants who were asked to provide judgments of auditory distance for a stationary source. To evaluate the importance of environmental cues, participants had to describe the environment in which they were running the experiment, and more specifically the volume of the room and the distance to the wall they were facing. These context cues had a limited, but significant, influence on the perceived auditory distance. Full article
(This article belongs to the Special Issue Psychoacoustics for Extended Reality (XR))
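
The intensity and reverberation cues discussed above follow well-known regularities: direct-path level falls by roughly 6 dB per doubling of distance (inverse-square law), while a diffuse reverberant field stays roughly constant, so the direct-to-reverberant ratio also decreases with distance. A minimal sketch with illustrative levels, not the study's stimuli:

```python
import numpy as np

# Two classic auditory distance cues: overall direct-sound level and the
# direct-to-reverberant ratio (DRR). Reference levels here are assumed.

def direct_level_db(distance_m, level_at_1m_db=70.0):
    """Free-field direct-path level: -20*log10(d) relative to 1 m."""
    return level_at_1m_db - 20.0 * np.log10(distance_m)

def drr_db(distance_m, reverberant_level_db=55.0, level_at_1m_db=70.0):
    """DRR, assuming a distance-independent diffuse reverberant field."""
    return direct_level_db(distance_m, level_at_1m_db) - reverberant_level_db

for d in (1.0, 2.0, 4.0, 8.0):
    print(f"{d:>4.1f} m: direct {direct_level_db(d):5.1f} dB, DRR {drr_db(d):+5.1f} dB")
```
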
