
146 Results Found

  • Article
  • Open Access
3 Citations
3,004 Views
19 Pages

Audio Pre-Processing and Beamforming Implementation on Embedded Systems

  • Jian-Hong Wang,
  • Phuong Thi Le,
  • Shih-Jung Kuo,
  • Tzu-Chiang Tai,
  • Kuo-Chen Li,
  • Shih-Lun Chen,
  • Ze-Yu Wang,
  • Tuan Pham,
  • Yung-Hui Li and
  • Jia-Ching Wang

Since the invention of the microphone by Barina in 1876, there have been numerous applications of audio processing, such as phonographs, broadcasting stations, and public address systems, which merely capture and amplify sound and play it back. Nowad...

  • Article
  • Open Access
1 Citation
3,046 Views
16 Pages

In recent years, the advances in deep neural networks (DNNs) and large language models (LLMs) have led to major breakthroughs and new levels of performance in Natural Language Processing (NLP), including tasks related to speech processing. Based on t...

  • Article
  • Open Access
1,309 Views
20 Pages

3 April 2025

This paper addresses the issue of distinguishing commercially played songs from non-music audio in radio broadcasts, where automatic song identification systems are commonly employed for reporting purposes. Service call costs increase because these s...

  • Article
  • Open Access
10 Citations
4,267 Views
13 Pages

An Automatic Classification System for Environmental Sound in Smart Cities

  • Dongping Zhang,
  • Ziyin Zhong,
  • Yuejian Xia,
  • Zhutao Wang and
  • Wenbo Xiong

31 July 2023

With the continuous promotion of “smart cities” worldwide, the approach to be used in combining smart cities with modern advanced technologies (Internet of Things, cloud computing, artificial intelligence) has become a hot topic. However,...

  • Article
  • Open Access
3 Citations
3,229 Views
12 Pages

Two-Dimensional Audio Compression Method Using Video Coding Schemes

  • Seonjae Kim,
  • Dongsan Jun,
  • Byung-Gyu Kim,
  • Seungkwon Beack,
  • Misuk Lee and
  • Taejin Lee

As video compression is one of the core technologies that enables seamless media streaming within the available network bandwidth, it is crucial to employ media codecs to support powerful coding performance and higher visual quality. Versatile Video...

  • Article
  • Open Access
6 Citations
2,091 Views
18 Pages

1 December 2023

Partial discharge (PD) is a common issue in power transformers that can lead to catastrophic failures if left undetected. Time reversal (TR) is a well-known technique in signal processing that can reconstruct signals by reversing the direction of tim...

  • Article
  • Open Access
8 Citations
3,836 Views
18 Pages

5 October 2021

Acoustic scene analysis (ASA) relies on the dynamic sensing and understanding of stationary and non-stationary sounds from various events, background noises and human actions with objects. However, the spatio-temporal nature of the sound signals may...

  • Article
  • Open Access
7 Citations
5,065 Views
19 Pages

2 February 2023

Classroom interactivity is one of the important metrics for assessing classrooms, and identifying classroom interactivity through classroom image data is limited by the interference of complex teaching scenarios. However, audio data within the classr...

  • Article
  • Open Access
9 Citations
4,367 Views
19 Pages

Cicada Species Recognition Based on Acoustic Signals

  • Wan Teng Tey,
  • Tee Connie,
  • Kan Yeep Choo and
  • Michael Kah Ong Goh

28 September 2022

Traditional methods used to identify and monitor insect species are time-consuming, costly, and fully dependent on the observer’s ability. This paper presents a deep learning-based cicada species recognition system using acoustic signals to cla...

  • Review
  • Open Access
25 Citations
11,005 Views
30 Pages

12 June 2023

This article provides a detailed review of recent advances in audio-visual speech recognition (AVSR) methods that have been developed over the last decade (2013–2023). Despite the recent success of audio speech recognition systems, the problem...

  • Article
  • Open Access
8 Citations
3,085 Views
19 Pages

23 July 2021

Beamforming is a type of audio array processing technique used for interference reduction and sound source localization, and as a pre-processing stage for audio event classification and speaker identification. The auditory scene analysis community can be...
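The delay-and-sum beamformer is the simplest instance of the array processing this abstract describes. The sketch below is a generic illustration, not the method of the listed paper. It assumes integer-sample steering delays that are already known (real systems estimate them from array geometry or the signals), and the function name `delay_and_sum` is hypothetical.

```python
from typing import List, Sequence


def delay_and_sum(channels: Sequence[Sequence[float]],
                  delays: Sequence[int]) -> List[float]:
    """Integer-delay delay-and-sum beamformer (illustrative sketch).

    channels: one equal-length list of samples per microphone.
    delays:   non-negative steering delay per channel, in samples. Channel m
              is read delays[m] samples ahead so that a wavefront reaching
              that microphone later lines up with the reference channel.
    """
    if not channels or len(channels) != len(delays):
        raise ValueError("need at least one channel and one delay per channel")
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        if len(ch) != n or d < 0:
            raise ValueError("channels must share a length; delays must be >= 0")
        for i in range(n):
            j = i + d
            if j < n:  # samples beyond the end of the buffer count as zero
                out[i] += ch[j]
    # Average the aligned channels: coherent sound adds, misaligned sound cancels.
    return [v / len(channels) for v in out]


if __name__ == "__main__":
    s = [0.0, 1.0, 0.0, -1.0] * 4           # test tone with a 4-sample period
    mic0 = s
    mic1 = [0.0, 0.0] + s[:-2]              # same tone arriving 2 samples later
    steered = delay_and_sum([mic0, mic1], [0, 2])    # steered toward the source
    unsteered = delay_and_sum([mic0, mic1], [0, 0])  # no delay compensation
    print(steered[:8])
    print(unsteered[:8])
```

With the correct steering delays the tone is reconstructed unchanged; without them the 2-sample lag is half the tone's period, so the two channels cancel after the first two samples. This cancellation of off-axis arrivals is what makes beamforming useful for interference reduction.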

  • Article
  • Open Access
2,753 Views
16 Pages

9 April 2021

Counting the number of speakers in an audio sample can lead to innovative applications, such as a real-time ranking system. Researchers have studied advanced machine learning approaches for solving the speaker count problem. However, these solutions...

  • Article
  • Open Access
48 Citations
8,490 Views
34 Pages

3 November 2020

Over the past few years, the study of environmental sound classification (ESC) has become very popular due to the intricate nature of environmental sounds. This paper reports our study on employing various acoustic features aggregation and data enhan...

  • Data Descriptor
  • Open Access
2,877 Views
13 Pages

15 March 2024

This article describes a dataset that combines wind turbine supervisory control and data acquisition (SCADA), meteorological, and acoustical data, and thus gives a detailed description of a wind farm and its atmospheric and acoustic environment...

  • Review
  • Open Access
141 Citations
17,381 Views
16 Pages

16 March 2020

The number of publications on acoustic scene classification (ASC) in environmental audio recordings has constantly increased over the last few years. This was mainly stimulated by the annual Detection and Classification of Acoustic Scenes and Events...

  • Article
  • Open Access
7 Citations
3,467 Views
21 Pages

Energy-Efficient Audio Processing at the Edge for Biologging Applications

  • Jonathan Miquel,
  • Laurent Latorre and
  • Simon Chamaillé-Jammes

Biologging refers to the use of animal-borne recording devices to study wildlife behavior. In the case of audio recording, such devices generate large amounts of data over several months, and thus require some level of processing automation for the r...

  • Article
  • Open Access
7 Citations
4,575 Views
20 Pages

Enhancement of Conventional Beat Tracking System Using Teager–Kaiser Energy Operator

  • Matej Istvanek,
  • Zdenek Smekal,
  • Lubomir Spurny and
  • Jiri Mekyska

4 January 2020

Beat detection systems are widely used in the music information retrieval (MIR) research field for the computation of tempo and beat time positions in audio signals. One of the most important parts of these systems is usually onset detection. There i...
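The title names the Teager–Kaiser energy operator (TKEO), and the abstract identifies onset detection as a key stage of beat tracking. The sketch below shows the standard discrete TKEO together with a deliberately naive threshold-crossing onset picker. The picker, its threshold, and both function names are hypothetical illustrations, not the enhancement proposed in the paper.

```python
import math
from typing import List, Sequence


def teager_kaiser(x: Sequence[float]) -> List[float]:
    """Discrete Teager-Kaiser energy operator.

    psi[n] = x[n]**2 - x[n-1] * x[n+1], evaluated for interior samples
    n = 1 .. len(x) - 2, so the result has len(x) - 2 entries.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def onset_candidates(x: Sequence[float], threshold: float) -> List[int]:
    """Hypothetical onset picker: sample indices where the TKEO output
    rises above `threshold` (upward crossings only)."""
    hits: List[int] = []
    prev_above = False
    for k, value in enumerate(teager_kaiser(x)):
        above = value > threshold
        if above and not prev_above:
            hits.append(k + 1)  # psi[k] belongs to sample n = k + 1
        prev_above = above
    return hits


if __name__ == "__main__":
    # 50 samples of silence, then a unit-amplitude tone at 0.5 rad/sample.
    x = [0.0] * 50 + [math.cos(0.5 * k) for k in range(200)]
    print(onset_candidates(x, threshold=0.1))
    # For A*cos(w*n + phi), psi[n] equals A**2 * sin(w)**2 at every sample.
    print(teager_kaiser(x)[100], math.sin(0.5) ** 2)
```

Because the TKEO of a pure tone is constant (it depends on both amplitude and frequency), a sudden rise in its output is a simple cue for a new event; practical beat trackers combine such a cue with smoothing and adaptive thresholds.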

  • Communication
  • Open Access
2 Citations
2,120 Views
10 Pages

20 August 2023

Voice spoofing attempts to break into a specific automatic speaker verification (ASV) system by forging the user’s voice and can be carried out through methods such as text-to-speech (TTS), voice conversion (VC), and replay attacks. Recently, deep lea...

  • Article
  • Open Access
29 Citations
6,522 Views
22 Pages

Towards Automatic Collaboration Analytics for Group Speech Data Using Learning Analytics

  • Sambit Praharaj,
  • Maren Scheffel,
  • Marcel Schmitz,
  • Marcus Specht and
  • Hendrik Drachsler

2 May 2021

Collaboration is an important 21st Century skill. Co-located (or face-to-face) collaboration (CC) analytics gained momentum with the advent of sensor technology. Most of these works have used the audio modality to detect the quality of CC. The CC qua...

  • Article
  • Open Access
10 Citations
3,105 Views
20 Pages

Defending against FakeBob Adversarial Attacks in Speaker Verification Systems with Noise-Adding

  • Zesheng Chen,
  • Li-Chi Chang,
  • Chao Chen,
  • Guoping Wang and
  • Zhuming Bi

17 August 2022

Speaker verification systems use human voices as an important biometric to identify legitimate users, thus adding a security layer to voice-controlled Internet-of-things smart homes against illegal access. Recent studies have demonstrated that speake...

  • Article
  • Open Access
37 Citations
9,111 Views
23 Pages

Monitoring Illegal Tree Cutting through Ultra-Low-Power Smart IoT Devices

  • Alessandro Andreadis,
  • Giovanni Giambene and
  • Riccardo Zambon

16 November 2021

Forests play a fundamental role in preserving the environment and fighting global warming. Unfortunately, they are continuously reduced by human interventions such as deforestation, fires, etc. This paper proposes and evaluates a framework for automa...

  • Article
  • Open Access
1,152 Views
21 Pages

Deep Edge IoT for Acoustic Detection of Queenless Beehives

  • Christos Sad,
  • Dimitrios Kampelopoulos,
  • Ioannis Sofianidis,
  • Dimitrios Kanelis,
  • Spyridon Nikolaidis,
  • Chrysoula Tananaki and
  • Kostas Siozios

Honey bees play a vital role in ecosystem stability, and the need to monitor colony health has driven the development of IoT-based systems in beekeeping, with recent studies exploring both empirical and machine learning approaches to detect and analy...

  • Article
  • Open Access
1 Citation
3,050 Views
14 Pages

Micro-Electro-Mechanical Systems (MEMS) loudspeakers are attracting growing interest as alternatives to conventional miniature transducers for in-ear audio applications. However, their practical deployment is often hindered by pronounced resonances i...

  • Article
  • Open Access
10 Citations
5,487 Views
21 Pages

Interaural Level Difference Optimization of Binaural Ambisonic Rendering

  • Thomas McKenzie,
  • Damian T. Murphy and
  • Gavin Kearney

23 March 2019

Ambisonics is a spatial audio technique appropriate for dynamic binaural rendering due to its sound field rotation and transformation capabilities, which has made it popular for virtual reality applications. An issue with low-order Ambisonics is that...
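Interaural level difference (ILD) is commonly expressed as the level ratio between the left-ear and right-ear signals in decibels. As a generic illustration, not the optimization method of the listed paper, the sketch below computes a broadband ILD from two ear signals. The function names and the test tone are hypothetical, and a frequency-dependent ILD analysis would first split both signals into bands.

```python
import math
from typing import Sequence


def rms(x: Sequence[float]) -> float:
    """Root-mean-square level of a signal."""
    if not x:
        raise ValueError("empty signal")
    return math.sqrt(sum(v * v for v in x) / len(x))


def ild_db(left: Sequence[float], right: Sequence[float]) -> float:
    """Broadband interaural level difference in dB (positive: left is louder).

    ILD = 20 * log10(rms(left) / rms(right)). Perceptual studies usually
    evaluate this per frequency band; this version is broadband only.
    """
    l, r = rms(left), rms(right)
    if l == 0.0 or r == 0.0:
        raise ValueError("ILD is undefined for a silent channel")
    return 20.0 * math.log10(l / r)


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(480)]
    # Hypothetical binaural pair: the right ear receives half the amplitude.
    print(ild_db(tone, [0.5 * v for v in tone]))
```

Halving the amplitude at one ear corresponds to an ILD of about 6 dB; comparing such values between a rendered binaural signal and a reference is one way to quantify ILD errors.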

  • Article
  • Open Access
3 Citations
3,831 Views
16 Pages

Domain Adaptation Speech-to-Text for Low-Resource European Portuguese Using Deep Learning

  • Eduardo Medeiros,
  • Leonel Corado,
  • Luís Rato,
  • Paulo Quaresma and
  • Pedro Salgueiro

24 April 2023

Automatic speech recognition (ASR), commonly known as speech-to-text, is the process of transcribing audio recordings into text, i.e., transforming speech into the respective sequence of words. This paper presents a deep learning ASR system optimizat...

  • Article
  • Open Access
13 Citations
5,572 Views
11 Pages

There is a strong correlation between the like/dislike responses to audio–visual stimuli and the emotional arousal and valence reactions of a person. In the present work, our attention is focused on the automated detection of dislike responses...

  • Review
  • Open Access
5 Citations
2,659 Views
23 Pages

The Use of Audio Signals for Detecting COVID-19: A Systematic Review

  • José Gómez Aleixandre,
  • Mohamed Elgendi and
  • Carlo Menon

23 October 2022

A systematic review on the topic of automatic detection of COVID-19 using audio signals was performed. A total of 48 papers were obtained after screening 659 records identified in the PubMed, IEEE Xplore, Embase, and Google Scholar databases. The rev...

  • Article
  • Open Access
103 Citations
13,045 Views
17 Pages

20 June 2020

This paper proposes a speech-based method for automatic depression classification. The system is based on ensemble learning for Convolutional Neural Networks (CNNs) and is evaluated using the data and the experimental protocol provided in the Depress...

  • Article
  • Open Access
5 Citations
3,571 Views
15 Pages

Multimodal Lip-Reading for Tracheostomy Patients in the Greek Language

  • Yorghos Voutos,
  • Georgios Drakopoulos,
  • Georgios Chrysovitsiotis,
  • Zoi Zachou,
  • Dimitris Kikidis,
  • Efthymios Kyrodimos and
  • Themis Exarchos

28 February 2022

Voice loss constitutes a crucial disorder which is highly associated with social isolation. The use of multimodal information sources, such as audiovisual information, is crucial since it can lead to the development of straightforward personalized w...

  • Article
  • Open Access
2,697 Views
12 Pages

ATOSE: Audio Tagging with One-Sided Joint Embedding

  • Jaehwan Lee,
  • Daekyeong Moon,
  • Jik-Soo Kim and
  • Minkyoung Cho

6 August 2023

Audio auto-tagging is the process of assigning labels to audio clips for better categorization and management of audio file databases. With the advent of advanced artificial intelligence technologies, there has been increasing interest in directly us...

  • Article
  • Open Access
14 Citations
1,923 Views
22 Pages

19 July 2023

Source recording device identification poses a significant challenge in the field of Audio Sustainable Security (ASS). Most existing studies on end-to-end identification of digital audio sources follow a two-step process: extracting device-specific f...

  • Article
  • Open Access
1,680 Views
14 Pages

Multi-Channel Audio Completion Algorithm Based on Tensor Nuclear Norm

  • Lin Zhu,
  • Lidong Yang,
  • Yong Guo,
  • Dawei Niu and
  • Dandan Zhang

Multi-channel audio signals provide a better auditory sensation to the audience. However, missing data may occur in the collection, transmission, compression, or other processes of audio signals, resulting in audio quality degradation and affecting t...

  • Article
  • Open Access
8 Citations
4,124 Views
18 Pages

Transformer-Based Approach to Pathology Diagnosis Using Audio Spectrogram

  • Mohammad Tami,
  • Sari Masri,
  • Ahmad Hasasneh and
  • Chakib Tadj

30 April 2024

Early detection of infant pathologies by non-invasive means is a critical aspect of pediatric healthcare. Audio analysis of infant crying has emerged as a promising method to identify various health conditions without direct medical intervention. In...

  • Article
  • Open Access
6 Citations
7,793 Views
13 Pages

Audio Deep Fake Detection with Sonic Sleuth Model

  • Anfal Alshehri,
  • Danah Almalki,
  • Eaman Alharbi and
  • Somayah Albaradei

8 October 2024

Information dissemination and preservation are crucial for societal progress, especially in the technological age. While technology fosters knowledge sharing, it also risks spreading misinformation. Audio deepfakes—convincingly fabricated audio...

  • Article
  • Open Access
6 Citations
5,498 Views
21 Pages

Automated Speech Analysis in Bipolar Disorder: The CALIBER Study Protocol and Preliminary Results

  • Gerard Anmella,
  • Michele De Prisco,
  • Jeremiah B. Joyce,
  • Claudia Valenzuela-Pascual,
  • Ariadna Mas-Musons,
  • Vincenzo Oliva,
  • Giovanna Fico,
  • George Chatzisofroniou,
  • Sanjeev Mishra and
  • Majd Al-Soleiti
  • + 18 authors

23 August 2024

Background: Bipolar disorder (BD) involves significant mood and energy shifts reflected in speech patterns. Detecting these patterns is crucial for diagnosis and monitoring, currently assessed subjectively. Advances in natural language processing off...

  • Article
  • Open Access
10 Citations
3,917 Views
15 Pages

25 November 2023

Audio music genre classification is performed to categorize audio music into various genres. Traditional approaches based on convolutional recurrent neural networks do not consider long temporal information, and their sequential structures result in...

  • Article
  • Open Access
1 Citation
3,248 Views
19 Pages

AI-Enhanced Detection of Heart Murmurs: Advancing Non-Invasive Cardiovascular Diagnostics

  • Maria-Alexandra Zolya,
  • Elena-Laura Popa,
  • Cosmin Baltag,
  • Dragoș-Vasile Bratu,
  • Simona Coman and
  • Sorin-Aurel Moraru

8 March 2025

Cardiovascular diseases (CVDs) are the leading cause of death worldwide, claiming over 17 million lives annually. Early detection of conditions like heart murmurs, often indicative of heart valve abnormalities, is critical for improving patient outco...

  • Article
  • Open Access
32 Citations
5,321 Views
23 Pages

7 September 2020

Singing voice detection or vocal detection is a classification task that determines whether a given audio segment contains singing voices. This task plays a very important role in vocal-related music information retrieval tasks, such as singer identi...

  • Article
  • Open Access
4,291 Views
19 Pages

3 July 2024

Recent advancements in text-to-speech (TTS) models have aimed to streamline the two-stage process into a single-stage training approach. However, many single-stage models still lag behind in audio quality, particularly when handling Kurdish text and...

  • Article
  • Open Access
2,019 Views
22 Pages

End-to-End Multi-Modal Speaker Change Detection with Pre-Trained Models

  • Alymzhan Toleu,
  • Gulmira Tolegen,
  • Alexandr Pak,
  • Jaxylykova Assel and
  • Bagashar Zhumazhanov

14 April 2025

In this work, we propose a multi-modal speaker change detection (SCD) approach with focal loss, which integrates both audio and text features to enhance detection performance. The proposed approach utilizes pre-trained large-scale models for feature...

  • Article
  • Open Access
492 Views
32 Pages

12 December 2025

Population aging is increasing dementia care demand. We present an audio-driven monitoring pipeline that operates on mobile phones, microcontroller nodes, or smart television sets. The system combines audio signal processing with AI tools for...

  • Article
  • Open Access
721 Views
20 Pages

Audio’s Impact on Deep Learning Models: A Comparative Study of EEG-Based Concentration Detection in VR Games

  • Jesus GomezRomero-Borquez,
  • Carolina Del-Valle-Soto,
  • José A. Del-Puerto-Flores,
  • Juan-Carlos López-Pimentel,
  • Francisco R. Castillo-Soria,
  • Roilhi F. Ibarra-Hernández and
  • Leonardo Betancur Agudelo

This study investigates the impact of audio feedback on cognitive performance during VR puzzle games using EEG analysis. Thirty participants played three different VR puzzle games under two conditions (with and without audio) while their brain activi...

  • Article
  • Open Access
2 Citations
2,741 Views
12 Pages

An Investigation into Audio–Visual Speech Recognition under a Realistic Home–TV Scenario

  • Bing Yin,
  • Shutong Niu,
  • Haitao Tang,
  • Lei Sun,
  • Jun Du,
  • Zhenhua Ling and
  • Cong Liu

23 March 2023

Robust speech recognition in real-world situations is still an important problem, especially when it is affected by environmental interference factors and conversational multi-speaker interactions. Supplementing audio information with other modalitie...

  • Article
  • Open Access
3 Citations
3,419 Views
15 Pages

Singing Voice Detection in Electronic Music with a Long-Term Recurrent Convolutional Network

  • Raymundo Romero-Arenas,
  • Alfonso Gómez-Espinosa and
  • Benjamín Valdés-Aguirre

23 July 2022

Singing Voice Detection (SVD) is a classification task that determines whether there is a singing voice in a given audio segment. While current systems produce high-quality results on this task, the reported experiments are usually limited to popular...

  • Article
  • Open Access
7 Citations
2,817 Views
32 Pages

A Generic Framework for Enhancing Autonomous Driving Accuracy through Multimodal Data Fusion

  • Henry Alexander Ignatious,
  • Hesham El-Sayed,
  • Manzoor Ahmed Khan and
  • Parag Kulkarni

27 September 2023

Higher-level autonomous driving necessitates the best possible execution of important moves under all conditions. Most of the accidents in recent years caused by the AVs launched by leading automobile manufacturers are due to inadequate decision-maki...

  • Article
  • Open Access
26 Citations
5,722 Views
18 Pages

In this paper, multilayer cryptosystems for encrypting audio communications are proposed. These cryptosystems combine audio signals with other active concealing signals, such as speech signals, by continuously fusing the audio signal with a speech si...

  • Article
  • Open Access
42 Citations
6,115 Views
20 Pages

2 March 2022

Chinese Cantonese opera, a UNESCO Intangible Cultural Heritage (ICH) of Humanity, has faced a series of development problems due to diversified entertainment and emerging cultures. While the management of Cantonese opera data in a scientific manner...

  • Article
  • Open Access
7 Citations
2,265 Views
15 Pages

Acoustic Detection of Vaccine Reactions in Hens for Assessing Anti-Inflammatory Product Efficacy

  • Gerardo José Ginovart-Panisello,
  • Ignasi Iriondo,
  • Tesa Panisello Monjo,
  • Silvia Riva,
  • Jordi Casadó Cancer and
  • Rosa Ma Alsina-Pagès

5 March 2024

Acoustic studies on poultry show that chicken vocalizations can be a real-time indicator of the health conditions of the birds and can improve animal welfare and farm management. In this study, hens vaccinated against infectious laryngotracheitis (IL...

  • Article
  • Open Access
2 Citations
3,718 Views
13 Pages

Audio Watermarking System in Real-Time Applications

  • Carlos Jair Santin-Cruz and
  • Gordana Jovanovic Dolecek

Watermarking is widely employed to protect audio files. Previous research has focused on developing systems that balance performance criteria, including robustness, imperceptibility, and capacity. Most existing systems are designed to work with pre-r...
