Search Results (20)

Search Parameters:
Keywords = multichannel audio signal

18 pages, 916 KB  
Article
Real-Time Electroencephalography-Guided Binaural Beat Audio Enhances Relaxation and Cognitive Performance: A Randomized, Double-Blind, Sham-Controlled Repeated-Measures Crossover Trial
by Chanaka N. Kahathuduwa, Jessica Blume, Chinnadurai Mani and Chathurika S. Dhanasekara
Physiologia 2025, 5(4), 44; https://doi.org/10.3390/physiologia5040044 - 24 Oct 2025
Viewed by 3842
Abstract
Background/Objectives: Binaural beat audio has gained popularity as a non-invasive tool to promote relaxation and enhance cognitive performance, though empirical support has been inconsistent. We developed a novel algorithm integrating real-time electroencephalography (EEG) feedback to dynamically tailor binaural beats to induce relaxed brain states. This study aimed to examine the efficacy and feasibility of this algorithm in a clinical trial. Methods: In a randomized, double-blinded, sham-controlled crossover trial, 25 healthy adults completed two 30 min sessions (EEG-guided intervention versus sham). EEG (Fp1) was recorded using a consumer-grade single-electrode headset, with auditory stimulation adjusted in real time based on EEG data. Outcomes included EEG frequency profiles, stop signal reaction time (SSRT), and novelty encoding task performance. Results: The intervention rapidly reduced dominant EEG frequency in all participants, with 100% achieving <8 Hz and 96% achieving <4 Hz within a median of 7.4 and 9.0 min, respectively. Compared to the sham, the intervention was associated with a faster novelty encoding reaction time (p = 0.039, dz = −0.225) and trends towards improved SSRT (p = 0.098, dz = −0.209), increased boundary separation in stop trials (p = 0.065, dz = 0.350), and improved inhibitory drift rate (p = 0.067, dz = 0.452), within the limits of the exploratory nature of these findings. Twenty-four (96%) participants reached a target level of <4 Hz with the intervention, while none reached this level with the sham. Conclusions: Real-time EEG-guided binaural beats may rapidly induce low-frequency brain states while potentially preserving or enhancing aspects of executive function. These findings support the feasibility of personalized, closed-loop auditory entrainment for promoting “relaxed alertness.” The results are preliminary and hypothesis-generating, warranting larger, multi-channel EEG studies in ecologically valid contexts.
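The closed-loop idea, generating a binaural beat whose beat frequency is steered relative to the currently dominant EEG frequency, can be illustrated with a short sketch. The carrier frequency, target band, and update rule below are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def binaural_beat(carrier_hz, beat_hz, duration_s, fs=44100):
    """Generate a stereo binaural beat: left/right tones differ by beat_hz."""
    t = np.arange(int(duration_s * fs)) / fs
    left = np.sin(2 * np.pi * carrier_hz * t)
    right = np.sin(2 * np.pi * (carrier_hz + beat_hz) * t)
    return np.stack([left, right], axis=0)  # shape (2, n_samples)

def next_beat_hz(dominant_eeg_hz, target_hz=4.0, step=0.5):
    """Hypothetical update rule: nudge the beat frequency toward the target
    band whenever the dominant EEG frequency is still above it."""
    return max(target_hz, dominant_eeg_hz - step)

# Example: dominant EEG frequency estimated at 9 Hz -> play a beat slightly below it.
audio_block = binaural_beat(carrier_hz=250.0, beat_hz=next_beat_hz(9.0), duration_s=1.0)
```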

16 pages, 1530 KB  
Article
Enhanced Respiratory Sound Classification Using Deep Learning and Multi-Channel Auscultation
by Yeonkyeong Kim, Kyu Bom Kim, Ah Young Leem, Kyuseok Kim and Su Hwan Lee
J. Clin. Med. 2025, 14(15), 5437; https://doi.org/10.3390/jcm14155437 - 1 Aug 2025
Cited by 1 | Viewed by 2547
Abstract
Background/Objectives: Identifying and classifying abnormal lung sounds is essential for diagnosing patients with respiratory disorders. In particular, the simultaneous recording of auscultation signals from multiple clinically relevant positions offers greater diagnostic potential compared to traditional single-channel measurements. This study aims to improve the accuracy of respiratory sound classification by leveraging multichannel signals and capturing positional characteristics from multiple sites in the same patient. Methods: We evaluated the performance of respiratory sound classification using multichannel lung sound data with a deep learning model that combines a convolutional neural network (CNN) and long short-term memory (LSTM), based on mel-frequency cepstral coefficients (MFCCs). We analyzed the impact of the number and placement of channels on classification performance. Results: The results demonstrated that using four-channel recordings improved accuracy, sensitivity, specificity, precision, and F1-score by approximately 1.11, 1.15, 1.05, 1.08, and 1.13 times, respectively, compared to using three, two, or single-channel recordings. Conclusions: This study confirms that multichannel data capture a richer set of features corresponding to various respiratory sound characteristics, leading to significantly improved classification performance. The proposed method holds promise for enhancing sound classification accuracy not only in clinical applications but also in broader domains such as speech and audio processing.
(This article belongs to the Section Respiratory Medicine)
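A minimal sketch of an MFCC-plus-CNN-LSTM pipeline of the kind described, assuming the four auscultation channels are stacked as input channels; the sample rate, layer sizes, n_mfcc, and class count are illustrative assumptions, not the paper's configuration.

```python
import librosa
import numpy as np
import torch
import torch.nn as nn

def mfcc_features(waveforms, sr=4000, n_mfcc=20):
    """waveforms: list of 4 single-channel arrays (one per auscultation site)."""
    feats = [librosa.feature.mfcc(y=w, sr=sr, n_mfcc=n_mfcc) for w in waveforms]
    return np.stack(feats)  # (4, n_mfcc, n_frames)

class CnnLstmClassifier(nn.Module):
    def __init__(self, in_channels=4, n_classes=4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        self.lstm = nn.LSTM(input_size=64 * 5, hidden_size=64, batch_first=True)
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):                      # x: (batch, 4, 20 MFCCs, n_frames)
        z = self.cnn(x)                        # (batch, 64, 5, n_frames)
        z = z.permute(0, 3, 1, 2).flatten(2)   # (batch, n_frames, 320)
        out, _ = self.lstm(z)
        return self.fc(out[:, -1])             # logits per respiratory-sound class
```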

23 pages, 4973 KB  
Article
Detection of Electric Network Frequency in Audio Using Multi-HCNet
by Yujin Li, Tianliang Lu, Shufan Peng, Chunhao He, Kai Zhao, Gang Yang and Yan Chen
Sensors 2025, 25(12), 3697; https://doi.org/10.3390/s25123697 - 13 Jun 2025
Viewed by 1242
Abstract
With the increasing application of electrical network frequency (ENF) in forensic audio and video analysis, ENF signal detection has emerged as a critical technology. However, high-pass filtering operations commonly employed in modern communication scenarios, while effectively removing infrasound to enhance communication quality at reduced costs, result in a substantial loss of fundamental frequency information, thereby degrading the performance of existing detection methods. To tackle this issue, this paper introduces Multi-HCNet, an innovative deep learning model specifically tailored for ENF signal detection in high-pass filtered environments. Specifically, the model incorporates an array of high-order harmonic filters (AFB), which compensates for the loss of fundamental frequency by capturing high-order harmonic components. Additionally, a grouped multi-channel adaptive attention mechanism (GMCAA) is proposed to precisely distinguish between multiple frequency signals, demonstrating particular effectiveness in differentiating between 50 Hz and 60 Hz fundamental frequency signals. Furthermore, a sine activation function (SAF) is utilized to better align with the periodic nature of ENF signals, enhancing the model’s capacity to capture periodic oscillations. Experimental results indicate that after hyperparameter optimization, Multi-HCNet exhibits superior performance across various experimental conditions. Compared to existing approaches, this study not only significantly improves the detection accuracy of ENF signals in complex environments, achieving a peak accuracy of 98.84%, but also maintains an average detection accuracy exceeding 80% under high-pass filtering conditions. These findings demonstrate that even in scenarios where fundamental frequency information is lost, the model remains capable of effectively detecting ENF signals, offering a novel solution for ENF signal detection under extreme conditions of fundamental frequency absence. Moreover, this study successfully distinguishes between 50 Hz and 60 Hz fundamental frequency signals, providing robust support for the practical deployment and extension of ENF signal applications.
(This article belongs to the Section Sensor Networks)
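The sine activation function (SAF) mentioned in the abstract can be sketched in a few lines; the frequency-scaling factor omega and the surrounding convolutional front end are assumptions for illustration, since the paper's exact formulation is not given here.

```python
import torch
import torch.nn as nn

class SineActivation(nn.Module):
    """Periodic activation, better matched to oscillatory signals such as ENF
    harmonics than ReLU-style activations."""
    def __init__(self, omega: float = 1.0):
        super().__init__()
        self.omega = omega

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sin(self.omega * x)

# Drop-in replacement for ReLU in a 1D convolutional front end (illustrative).
front_end = nn.Sequential(nn.Conv1d(1, 16, kernel_size=64, stride=8),
                          SineActivation(omega=30.0))
```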

24 pages, 7390 KB  
Article
Algorithm for Extraction of Reflection Waves in Single-Well Imaging Based on MC-ConvTasNet
by Wanting Lin, Jiaqi Xu and Hengshan Hu
Appl. Sci. 2025, 15(8), 4189; https://doi.org/10.3390/app15084189 - 10 Apr 2025
Viewed by 1553
Abstract
Single-well imaging makes use of reflected waves to image geological structures outside a borehole, with a detection distance expected to reach tens of meters. However, in the received full-wave signal, reflected waves have much smaller amplitudes than borehole-guided waves, which travel directly through the borehole. To obtain clear reflected waves, we use a deep neural network, the multi-channel convolutional time-domain audio separation network (MC-ConvTasNet), to extract reflected waves. In the signal channels of a common-source gather, there is a notable arrival-time difference between direct waves and reflected waves. Leveraging this characteristic, we train MC-ConvTasNet on the common-source gathers, ultimately achieving satisfactory results in wave separation. For the hard-to-hard single-interface, soft-to-hard single-interface, and double-interface models, the reflected waves extracted by MC-ConvTasNet are closer to the theoretical reflected waves in both phase and shape (the average scale-invariant signal-to-distortion ratio exceeds 32 dB) than those extracted by parameter estimation, a median filter, and an F-K filter. MC-ConvTasNet also handles scenarios with variously inclined interfaces and interfaces parallel to the borehole axis. As an application, our method is applied to field logging data, where its ability to separate waves is verified.
(This article belongs to the Special Issue Seismic Analysis and Design of Ocean and Underground Structures)
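The scale-invariant signal-to-distortion ratio used to score the separated reflections can be computed as follows; this is the standard SI-SDR definition, assumed here to correspond to the paper's metric.

```python
import numpy as np

def si_sdr(estimate: np.ndarray, reference: np.ndarray) -> float:
    """Scale-invariant SDR in dB between an estimated and a reference waveform."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to remove any scale difference.
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10(np.dot(target, target) / np.dot(noise, noise))
```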

20 pages, 3003 KB  
Article
Equipment Sounds’ Event Localization and Detection Using Synthetic Multi-Channel Audio Signal to Support Collision Hazard Prevention
by Kehinde Elelu, Tuyen Le and Chau Le
Buildings 2024, 14(11), 3347; https://doi.org/10.3390/buildings14113347 - 23 Oct 2024
Viewed by 1835
Abstract
Construction workplaces often face unforeseen collision hazards due to a decline in auditory situational awareness among on-foot workers, leading to severe injuries and fatalities. Previous studies that used auditory signals to prevent collision hazards focused on employing a classical beamforming approach to determine equipment sounds’ Direction of Arrival (DOA). No existing frameworks implement a neural network-based approach for both equipment sound classification and localization. This paper presents an innovative framework for sound classification and localization using multichannel sound datasets artificially synthesized in a virtual three-dimensional space. The simulation synthesized 10,000 multi-channel datasets from just fourteen single-sound-source audiotapes. Training uses a two-stage convolutional recurrent neural network (CRNN), in which the first stage learns multi-label sound event classes and the second stage estimates their DOA. The proposed framework achieves a low average DOA error of 30 degrees and a high F-score of 0.98, demonstrating accurate localization and classification of equipment near workers’ positions on the site.
(This article belongs to the Special Issue Big Data Technologies in Construction Management)
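Synthesizing multi-channel training data from single-source recordings amounts to imposing direction-dependent inter-channel delays (and, optionally, gains). The sketch below uses simple integer-sample delays for a uniform linear array; the array geometry and sample rate are assumptions for illustration, not the paper's virtual 3D setup.

```python
import numpy as np

def simulate_array(mono: np.ndarray, azimuth_deg: float, fs: int = 16000,
                   mic_spacing_m: float = 0.05, n_mics: int = 4,
                   c: float = 343.0) -> np.ndarray:
    """Delay a mono source onto a uniform linear array for a given azimuth."""
    delays_s = np.arange(n_mics) * mic_spacing_m * np.sin(np.deg2rad(azimuth_deg)) / c
    delays_smp = np.round(delays_s * fs).astype(int)
    delays_smp -= delays_smp.min()            # keep all delays non-negative
    max_d = delays_smp.max()
    out = np.zeros((n_mics, len(mono) + max_d))
    for m, d in enumerate(delays_smp):
        out[m, d:d + len(mono)] = mono
    return out  # (n_mics, n_samples) multichannel clip for SELD training

# Example: place an equipment recording at 60 degrees azimuth.
clip = simulate_array(np.random.randn(16000), azimuth_deg=60.0)
```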

18 pages, 3703 KB  
Article
Joint Spatio-Temporal-Frequency Representation Learning for Improved Sound Event Localization and Detection
by Baoqing Chen, Mei Wang and Yu Gu
Sensors 2024, 24(18), 6090; https://doi.org/10.3390/s24186090 - 20 Sep 2024
Cited by 3 | Viewed by 2616
Abstract
Sound event localization and detection (SELD) is a crucial component of machine listening that aims to simultaneously identify and localize sound events in multichannel audio recordings. This task demands an integrated analysis of spatial, temporal, and frequency domains to accurately characterize sound events. The spatial domain pertains to the varying acoustic signals captured by multichannel microphones, which are essential for determining the location of sound sources. However, the majority of recent studies have focused on time-frequency correlations and spatio-temporal correlations separately, leading to inadequate performance in real-life scenarios. In this paper, we propose a novel SELD method that utilizes the newly developed Spatio-Temporal-Frequency Fusion Network (STFF-Net) to jointly learn comprehensive features across spatial, temporal, and frequency domains of sound events. The backbone of our STFF-Net is the Enhanced-3D (E3D) residual block, which combines 3D convolutions with a parameter-free attention mechanism to capture and refine the intricate correlations among these domains. Furthermore, our method incorporates the multi-ACCDOA format to effectively handle homogeneous overlaps between sound events. During the evaluation, we conduct extensive experiments on three de facto benchmark datasets, and our results demonstrate that the proposed SELD method significantly outperforms current state-of-the-art approaches.
(This article belongs to the Section Physical Sensors)

14 pages, 1709 KB  
Article
Multi-Channel Audio Completion Algorithm Based on Tensor Nuclear Norm
by Lin Zhu, Lidong Yang, Yong Guo, Dawei Niu and Dandan Zhang
Electronics 2024, 13(9), 1745; https://doi.org/10.3390/electronics13091745 - 1 May 2024
Viewed by 1679
Abstract
Multi-channel audio signals provide a better auditory sensation to the audience. However, data may be lost during the collection, transmission, compression, or other processing of audio signals, degrading audio quality and the listening experience. As a result, audio signal completion has become a popular research topic in the field of signal processing. In this paper, the tensor nuclear norm is introduced into audio signal completion, and multi-channel audio signals with missing data are restored using a completion algorithm based on the tensor nuclear norm. First, the multi-channel audio signals are preprocessed and transformed from the time domain to the frequency domain. The multi-channel audio with missing data is then modeled as a third-order tensor. Next, the tensor completion algorithm is used to complete this third-order tensor: the optimal solution of the convex optimization model is obtained using a convex relaxation technique, and the data of the multi-channel audio with missing entries are ultimately recovered. The experimental results of the tensor completion algorithm and the traditional matrix completion algorithm are compared using both objective and subjective indicators. The final results show that the higher-order tensor completion algorithm has better completion ability and restores the audio signal more faithfully.
(This article belongs to the Section Circuit and Signal Processing)
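A simplified view of nuclear-norm-based completion: iteratively apply singular value thresholding to the partially observed tensor (here, slice by slice along the frequency axis) and re-impose the known entries. This is a generic sketch of the convex-relaxation idea, not the paper's exact tensor nuclear norm formulation; the threshold tau and iteration count are assumptions.

```python
import numpy as np

def svt(mat: np.ndarray, tau: float) -> np.ndarray:
    """Singular value thresholding: the proximal operator of the nuclear norm."""
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt

def complete_tensor(observed: np.ndarray, mask: np.ndarray,
                    tau: float = 1.0, n_iter: int = 100) -> np.ndarray:
    """observed: (channels, frames, freq_bins) with zeros at missing entries;
    mask: boolean array, True where entries are known."""
    x = observed.copy()
    for _ in range(n_iter):
        # Threshold each frequency slice (a channels-by-frames matrix).
        x = np.stack([svt(x[:, :, k], tau) for k in range(x.shape[2])], axis=2)
        x[mask] = observed[mask]  # keep the known samples fixed
    return x
```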

30 pages, 6523 KB  
Article
On the Challenges of Acoustic Energy Mapping Using a WASN: Synchronization and Audio Capture
by Emiliano Ehecatl García-Unzueta, Paul Erick Mendez-Monroy and Caleb Rascon
Sensors 2023, 23(10), 4645; https://doi.org/10.3390/s23104645 - 10 May 2023
Cited by 1 | Viewed by 2640
Abstract
Acoustic energy mapping provides the ability to obtain characteristics of acoustic sources, such as the presence, localization, type, and trajectory of sound sources. Several beamforming-based techniques can be used for this purpose. However, they rely on the difference in arrival times of the signal at each capture node (or microphone), so it is of major importance to have synchronized multi-channel recordings. A Wireless Acoustic Sensor Network (WASN) can be very practical to install when used for mapping the acoustic energy of a given acoustic environment. However, WASNs are known for having low synchronization between the recordings from each node. The objective of this paper is to characterize the impact of current popular synchronization methodologies on a WASN's ability to capture reliable data for acoustic energy mapping. The two evaluated synchronization protocols are the Network Time Protocol (NTP) and the Precision Time Protocol (PTP). Additionally, three different audio capture methodologies were proposed for the WASN: two of them recording the data locally and one sending the data through a local wireless network. As a real-life evaluation scenario, a WASN was built using nodes consisting of a Raspberry Pi 4B+ with a single MEMS microphone. Experimental results demonstrate that the most reliable methodology is to use the PTP synchronization protocol and record audio locally.
(This article belongs to the Special Issue Reliability Analysis of Wireless Sensor Network)
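Once the nodes are synchronized (via NTP or PTP), the residual inter-node offset can be measured by cross-correlating recordings of a common reference sound. A minimal sketch under the assumption that both recordings share the same sample rate; this is a generic offset check, not the paper's evaluation procedure.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

def inter_node_offset(ref_node: np.ndarray, other_node: np.ndarray, fs: int) -> float:
    """Estimated time offset (seconds) of other_node relative to ref_node."""
    xcorr = correlate(other_node, ref_node, mode="full")
    lags = correlation_lags(len(other_node), len(ref_node), mode="full")
    return lags[np.argmax(np.abs(xcorr))] / fs

# A positive value means the second node's recording lags the reference node.
```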

10 pages, 3010 KB  
Article
Multi-Channel Long-Distance Audio Transmission System Using Power-over-Fiber Technology
by Can Guo, Chenggang Guan, Hui Lv, Shiyi Chai and Hao Chen
Photonics 2023, 10(5), 521; https://doi.org/10.3390/photonics10050521 - 1 May 2023
Cited by 9 | Viewed by 3447
Abstract
To establish stable communication networks in harsh environments where power supply is difficult, such as coal mines and underwater, we propose an effective scheme for co-transmission of analog audio signals and energy. By leveraging the advantages of optical fibers, such as corrosion resistance and strong resistance to electromagnetic interference, the scheme uses a 1550 nm laser beam as the carrier for analog audio signal propagation, which is then converted to electrical energy through a custom InGaAs/InP photovoltaic power converter (PPC) for energy supply and information transfer without an external power supply after a 25 km fiber transmission. In the experiment, with 160 mW of optical power injection, the scheme not only provides 4 mW of electrical power, but also transmits an analog signal with an acoustic overload point (AOP) of 105 dBSPL and a signal-to-noise ratio (SNR) of 50 dB. In addition, the system employs wavelength division multiplexing (WDM) technology to transform from single-channel to multi-channel communication on a single independent fiber, enabling the arraying of receiving terminals. The passive arrayed terminals make the multi-channel long-distance audio transmission system using power-over-fiber (PoF) technology a superior choice for establishing a stable communication network in harsh environments.

19 pages, 7857 KB  
Article
Multi-Scale Audio Spectrogram Transformer for Classroom Teaching Interaction Recognition
by Fan Liu and Jiandong Fang
Future Internet 2023, 15(2), 65; https://doi.org/10.3390/fi15020065 - 2 Feb 2023
Cited by 7 | Viewed by 5062
Abstract
Classroom interactivity is one of the important metrics for assessing classrooms, and identifying classroom interactivity through classroom image data is limited by the interference of complex teaching scenarios. However, audio data within the classroom are characterized by significant student–teacher interaction. This study proposes a multi-scale audio spectrogram transformer (MAST) speech scene classification algorithm and constructs a classroom interactive audio dataset to achieve interactive teacher–student recognition in the classroom teaching process. First, the original speech signal is sampled and pre-processed to generate a multi-channel spectrogram, which enhances the representation of features compared with single-channel features. Second, to efficiently capture the long-range global context of the audio spectrogram, the audio features are globally modeled by the multi-head self-attention mechanism of MAST, and the feature resolution is reduced during feature extraction to continuously enrich the layer-level features while reducing model complexity. Finally, a time-frequency enrichment module maps the final output to a class feature map, enabling accurate audio category recognition. The experimental comparison of MAST is carried out on public environmental audio datasets and the self-built classroom audio interaction dataset. Compared with previous state-of-the-art methods on the public datasets AudioSet and ESC-50, accuracy improves by 3% and 5%, respectively, and accuracy on the self-built classroom audio interaction dataset reaches 92.1%. These results demonstrate the effectiveness of MAST in the field of general audio classification and the smart classroom domain.

14 pages, 3232 KB  
Article
Deep Learning-Based Acoustic Echo Cancellation for Surround Sound Systems
by Guoteng Li, Chengshi Zheng, Yuxuan Ke and Xiaodong Li
Appl. Sci. 2023, 13(3), 1266; https://doi.org/10.3390/app13031266 - 17 Jan 2023
Cited by 3 | Viewed by 6599
Abstract
Surround sound systems that play back multi-channel audio signals through multiple loudspeakers can improve augmented reality and have been widely used in many multimedia communication systems. Hands-free speech communication systems commonly suffer from the acoustic echo problem, and the echo needs to be canceled or suppressed completely. This paper proposes a deep learning-based acoustic echo cancellation (AEC) method to recover the desired near-end speech from the microphone signals in surround sound systems. The ambisonics technique was adopted to record the surround sound for reproduction. To achieve a better generalization capability against different loudspeaker layouts, the compressed complex spectra of the first-order ambisonic signals (B-format) were sent to the neural network as the input features directly, instead of using the ambisonic decoded signals (D-format). Experimental results on both simulated and real acoustic environments showed that the proposed algorithm is effective in surround AEC and outperforms other competing methods in terms of speech quality and the amount of echo reduction.
(This article belongs to the Special Issue Advances in Speech and Language Processing)
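Compressed complex spectra of the kind described as input features are commonly formed by applying a magnitude power law while keeping the phase. A sketch under that assumption; the compression exponent and STFT settings are illustrative, not the paper's configuration.

```python
import numpy as np
from scipy.signal import stft

def compressed_complex_spectra(b_format: np.ndarray, fs: int = 16000,
                               power: float = 0.3) -> np.ndarray:
    """b_format: (4, n_samples) first-order ambisonic W/X/Y/Z channels.
    Returns (4, freq_bins, frames) power-law-compressed complex spectra."""
    _, _, spec = stft(b_format, fs=fs, nperseg=512, noverlap=256)
    return (np.abs(spec) ** power) * np.exp(1j * np.angle(spec))
```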

26 pages, 7180 KB  
Review
A Survey of Optimization Methods for Independent Vector Analysis in Audio Source Separation
by Ruiming Guo, Zhongqiang Luo and Mingchun Li
Sensors 2023, 23(1), 493; https://doi.org/10.3390/s23010493 - 2 Jan 2023
Cited by 18 | Viewed by 4958
Abstract
With the advent of the era of big data, artificial intelligence (AI) methods have become extremely promising and attractive. It has become extremely important to extract useful signals by decomposing various mixed signals through blind source separation (BSS). BSS has been proven to have prominent applications in multichannel audio processing. For multichannel speech signals, independent component analysis (ICA) requires a certain statistical independence of the source signals, among other conditions, to allow blind separation. Independent vector analysis (IVA) is an extension of ICA for the simultaneous separation of multiple parallel mixed signals. IVA solves the permutation ambiguity problem of independent component analysis by exploiting the dependencies between source signal components, and it plays a crucial role in convolutive blind signal separation. So far, many researchers have made great contributions to the improvement of this algorithm by adopting different methods to optimize the update rules, accelerate convergence, enhance separation performance, and adapt to different application scenarios. This meaningful and attractive research work prompted us to conduct a comprehensive survey of this field. This paper briefly reviews the basic principles of the BSS problem, ICA, and IVA, and focuses on the existing IVA-based optimization update rule techniques. Additionally, the experimental results show that the AuxIVA-IPA method performs best in the determined environment, followed by AuxIVA-IP2, while OverIVA-IP2 performs best in the overdetermined environment. The IVA-NG method performs less favorably in all environments.
(This article belongs to the Section Intelligent Sensors)

17 pages, 5371 KB  
Article
A Parallel Classification Model for Marine Mammal Sounds Based on Multi-Dimensional Feature Extraction and Data Augmentation
by Wenyu Cai, Jifeng Zhu, Meiyan Zhang and Yong Yang
Sensors 2022, 22(19), 7443; https://doi.org/10.3390/s22197443 - 30 Sep 2022
Cited by 12 | Viewed by 3749
Abstract
Due to the poor visibility of the deep-sea environment, acoustic signals are often collected and analyzed to explore the behavior of marine species. With the progress of underwater signal-acquisition technology, the amount of acoustic data obtained from the ocean has exceeded what humans can process manually, so designing efficient marine-mammal classification algorithms has become a research hotspot. In this paper, we design a classification model based on a multi-channel parallel structure, which can process multi-dimensional acoustic features extracted from audio samples and fuse the prediction results of different channels through a trainable fully connected layer. It uses transfer learning to obtain faster convergence and introduces data augmentation to improve classification accuracy. The k-fold cross-validation method was used to split the dataset to comprehensively evaluate the prediction accuracy and robustness of the model. The evaluation results showed that the model achieves a mean accuracy of 95.21% while maintaining a standard deviation of 0.65%, with excellent consistency in performance over multiple tests.
(This article belongs to the Section Remote Sensors)
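The k-fold evaluation protocol, reporting mean accuracy and its standard deviation over folds, can be expressed compactly; the feature matrix, labels, and classifier below are placeholders, not the paper's multi-channel parallel model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

def kfold_accuracy(features: np.ndarray, labels: np.ndarray, k: int = 5):
    """Returns (mean accuracy, standard deviation) across k stratified folds."""
    scores = []
    splitter = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    for train_idx, test_idx in splitter.split(features, labels):
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        clf.fit(features[train_idx], labels[train_idx])
        scores.append(clf.score(features[test_idx], labels[test_idx]))
    return float(np.mean(scores)), float(np.std(scores))
```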

13 pages, 1733 KB  
Article
Familiarity of Background Music Modulates the Cortical Tracking of Target Speech at the “Cocktail Party”
by Jane A. Brown and Gavin M. Bidelman
Brain Sci. 2022, 12(10), 1320; https://doi.org/10.3390/brainsci12101320 - 29 Sep 2022
Cited by 17 | Viewed by 3824
Abstract
The “cocktail party” problem—how a listener perceives speech in noisy environments—is typically studied using speech (multi-talker babble) or noise maskers. However, realistic cocktail party scenarios often include background music (e.g., coffee shops, concerts). Studies investigating music’s effects on concurrent speech perception have predominantly used highly controlled synthetic music or shaped noise, which do not reflect naturalistic listening environments. Behaviorally, familiar background music and songs with vocals/lyrics inhibit concurrent speech recognition. Here, we investigated the neural bases of these effects. While recording multichannel EEG, participants listened to an audiobook while popular songs (or silence) played in the background at a 0 dB signal-to-noise ratio. Songs were either familiar or unfamiliar to listeners and featured either vocals or isolated instrumentals from the original audio recordings. Comprehension questions probed task engagement. We used temporal response functions (TRFs) to isolate cortical tracking of the target speech envelope and analyzed neural responses around 100 ms (i.e., auditory N1 wave). We found that speech comprehension was, expectedly, impaired during background music compared to silence. Target speech tracking was further hindered by the presence of vocals. When masked by familiar music, response latencies to speech were less susceptible to informational masking, suggesting concurrent neural tracking of speech was easier during music known to the listener. These differential effects of music familiarity were further exacerbated in listeners with less musical ability. Our neuroimaging results and their dependence on listening skills are consistent with early attentional-gain mechanisms whereby familiar music is easier to tune out (listeners already know the song’s expectancies), allowing listeners to allocate fewer attentional resources to the background music and better monitor concurrent speech material.
(This article belongs to the Section Behavioral Neuroscience)
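Temporal response functions of the kind used here are typically estimated by ridge-regressing the EEG onto time-lagged copies of the speech envelope. A minimal single-channel sketch; the lag range and regularization value are assumptions, not the study's analysis settings.

```python
import numpy as np

def estimate_trf(envelope: np.ndarray, eeg: np.ndarray, fs: int,
                 tmin_s: float = 0.0, tmax_s: float = 0.4, ridge: float = 1.0):
    """Ridge-regression TRF mapping the speech envelope to one EEG channel."""
    lags = np.arange(int(tmin_s * fs), int(tmax_s * fs) + 1)
    # Design matrix: one column per lagged copy of the envelope.
    X = np.stack([np.roll(envelope, lag) for lag in lags], axis=1)
    X[:lags.max()] = 0.0                      # discard wrap-around samples
    w = np.linalg.solve(X.T @ X + ridge * np.eye(len(lags)), X.T @ eeg)
    return lags / fs, w                       # lag times (s) and TRF weights
```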

13 pages, 3171 KB  
Article
Digital Taste in Mulsemedia Augmented Reality: Perspective on Developments and Challenges
by Angel Swastik Duggal, Rajesh Singh, Anita Gehlot, Mamoon Rashid, Sultan S. Alshamrani and Ahmed Saeed AlGhamdi
Electronics 2022, 11(9), 1315; https://doi.org/10.3390/electronics11091315 - 21 Apr 2022
Cited by 22 | Viewed by 3987
Abstract
Digitalization of human taste has been on the back burner of multi-sensory media until the beginning of the decade, with audio, video, and haptic input/output (I/O) taking over as the major sensory mechanisms. This article reviews the consolidated literature on augmented reality (AR) in the modulation and stimulation of the sensation of taste in humans using low-amplitude electrical signals. After describing the multiple factors that combine to produce a single taste, various techniques to stimulate or modulate taste artificially are presented. The article explores techniques from prominent research pools with an inclination towards taste modulation. The goal is to seamlessly integrate gustatory augmentation into the commercial market. It highlights core benefits and limitations and proposes feasible extensions to the established technological architecture for taste stimulation and modulation, drawing on the Internet of Things, artificial intelligence, and machine learning. Past research on taste has had a more software-oriented approach, with a few exceptions presented as taste-modulation hardware. Using modern technological extensions, the medium of taste has the potential to merge with audio and video data streams as a viable multichannel medium for the transfer of sensory information.
