Artificial Intelligence Techniques for Audio, Image, and Multisensory Signal Processing

A special issue of Big Data and Cognitive Computing (ISSN 2504-2289). This special issue belongs to the section "Artificial Intelligence and Multi-Agent Systems".

Deadline for manuscript submissions: 27 January 2027 | Viewed by 1698

Special Issue Editor


Dr. Giuseppe Ciaburro
Guest Editor
School of Engineering and Informatics, Department of Engineering, Pegaso University, 80143 Naples, Italy
Interests: artificial intelligence; acoustics and noise control

Special Issue Information

Dear Colleagues,

The rapid evolution of machine learning, deep learning, and data-driven methodologies has profoundly transformed signal acquisition, analysis, and interpretation across many domains. Audio and image processing, traditionally performed with model-based and handcrafted approaches, increasingly benefit from AI methods that learn directly from nonlinear representations of complex data. Moreover, the integration of heterogeneous sensory modalities has opened new perspectives on robust perception and intelligent decision-making in real-world environments.

This Special Issue offers an open forum for the presentation of novel algorithms, methodologies, and applications in which AI is effectively employed to improve the performance, efficiency, or interpretability of signal processing. It aims to close the gap between theoretical developments and applied research by highlighting state-of-the-art solutions and outlining current challenges and future research directions. Complementing existing reviews and application-driven studies, it emphasizes cross-domain methodologies and multisensory integration and encourages interdisciplinary dialogue between the signal processing and AI research communities.

We welcome original research articles and reviews on topics including, but not limited to, sound and speech analysis, music information retrieval, computer vision, medical and industrial imaging, multimodal and sensor fusion, and real-time or embedded AI systems. Emerging topics of particular interest include explainable and trustworthy AI, data scarcity, transfer learning, and edge computing. Research areas may include (but are not limited to) the following:

  • Deep Learning Architectures for Audio and Speech Signal Processing
  • Artificial Intelligence Techniques in Image and Video Processing
  • Techniques in Multimodal and Multisensory Signal Fusion
  • Sound Event Detection, Acoustic Scene Analysis, and Audio Classification
  • Computer Vision and Image Understanding via AI
  • Machine Learning for Music Information Retrieval and Audio Content Analysis
  • Artificial Intelligence Methods for Signal Enhancement, Denoising, and Source Separation
  • Explainable and Interpretable AI in Signal Processing Applications
  • Edge and Embedded AI for Real-time Audio and Image Processing
  • Transfer Learning and Domain Adaptation in Signal Processing Tasks
  • Artificial Intelligence for Processing Biomedical and Medical Signals and Images
  • Approaches to Data-Driven Environmental and Urban Signal Observation
  • Self-Supervised and Unsupervised Learning for Audio and Image Signals
  • AI Methods for Sensor Networks and Multisensor Data Integration
  • Reliable, Trustworthy, and Ethical AI in Signal Processing Systems

We look forward to receiving your contributions.

Dr. Giuseppe Ciaburro
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Big Data and Cognitive Computing is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • signal processing
  • audio processing
  • image processing
  • multisensory data
  • deep learning
  • multimodal fusion
  • machine learning
  • computer vision
  • acoustic analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (2 papers)


Research


18 pages, 7435 KB  
Article
A Comparative Analysis of Deep-Learning-Based Speech Enhancement Models: Assessing Biometric Speaker Verification in Real-World Noisy Environments
by Md Jahangir Alam Khondkar, Ajan Ahmed, Stephanie Schuckers and Masudul H. Imtiaz
Big Data Cogn. Comput. 2026, 10(3), 98; https://doi.org/10.3390/bdcc10030098 - 23 Mar 2026
Viewed by 560
Abstract
Speech enhancement through denoising is essential for maintaining signal intelligibility and quality in biometric speaker verification pipelines that operate in acoustically adverse conditions. Despite the proliferation of deep learning (DL) architectures for speech denoising, simultaneously optimizing noise attenuation, perceptual fidelity, and speaker-identity preservation remains an open problem. We address this gap by benchmarking three architecturally distinct DL-based enhancement models—Wave-U-Net, CMGAN, and U-Net—on three independent, domain-diverse corpora (SpEAR, VPQAD, and Clarkson) that the models never encountered during training and by introducing commercial-grade VeriSpeak speaker-verification scores as a biometric evaluation dimension absent from prior comparative studies. Our experiments reveal a clear three-way trade-off: U-Net achieves the highest signal-to-noise ratio (SNR) gains (+61.44% on SpEAR, +67.05% on VPQAD, +235.3% on Clarkson) but sacrifices naturalness; CMGAN yields the best perceptual evaluation of speech quality (PESQ) values (3.33, 1.35, and 2.50, respectively), favoring listening-comfort applications; and Wave-U-Net delivers the strongest biometric fidelity (VeriSpeak improvements of +11.63%, +30.22%, and +29.24%) while offering competitive perceptual quality. These results highlight that model selection must be driven by the target deployment scenario and provide actionable guidance for improving biometric verification robustness under real-world noise.
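The percentage SNR gains reported in this abstract can be illustrated with a minimal sketch of how such a figure is typically computed: SNR in dB against a clean reference before and after enhancement, then the relative change. The `snr_db` helper and the synthetic signals below are hypothetical illustrations, not the paper's actual data or code.

```python
import numpy as np

def snr_db(clean, estimate):
    # Signal-to-noise ratio in dB, treating (clean - estimate) as residual noise.
    noise = clean - estimate
    return 10 * np.log10(np.sum(clean**2) / np.sum(noise**2))

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
clean = np.sin(2 * np.pi * 440 * t)                      # 1 s clean 440 Hz tone
noisy = clean + 0.3 * rng.standard_normal(clean.size)    # simulated noisy input
enhanced = clean + 0.1 * rng.standard_normal(clean.size) # simulated denoised output

snr_before = snr_db(clean, noisy)
snr_after = snr_db(clean, enhanced)
gain_pct = 100 * (snr_after - snr_before) / abs(snr_before)
```

A "+61.44% SNR gain" in this convention means the enhanced signal's SNR exceeds the noisy input's SNR by 61.44% of the baseline value; other conventions (e.g. absolute dB improvement) exist, so the paper itself should be consulted for its exact definition.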

Other


44 pages, 1099 KB  
Systematic Review
Sound Event Detection in Smart Cities: A Systematic Review of Methods, Datasets, and Applications
by Giuseppe Ciaburro and Virginia Puyana-Romero
Big Data Cogn. Comput. 2026, 10(3), 83; https://doi.org/10.3390/bdcc10030083 - 8 Mar 2026
Viewed by 797
Abstract
Sound Event Detection (SED) is a growing area with vast prospects for understanding and designing the sonic fabric of smart cities. This paper summarizes the latest advances in SED, focusing on models, datasets, and applications drawn from scientific papers indexed in Scopus and Web of Science. It gives a clear view of how SED is being used in smart cities, public safety, environmental monitoring, and home security, and addresses the challenges facing the field, including dataset representativeness, model robustness under noisy or complex acoustic scenes, rare-event detection, and the ethics of automatic listening. The paper then outlines directions for future work, with a focus on self-supervised learning, multi-modal fusion, neuro-inspired approaches, and privacy-preserving analytics. Overall, SED emerges as a key technology for enabling artificial perception and making smart cities safe, secure, and sustainable.
