This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Spectral Multi-Representation Fusion for Audio Deepfake Detection

Facultad de Ingenieria, Universidad Militar Nueva Granada, Bogota 110111, Colombia
*
Author to whom correspondence should be addressed.
Algorithms 2026, 19(7), 549; https://doi.org/10.3390/a19070549 (registering DOI)
Submission received: 20 May 2026 / Revised: 20 June 2026 / Accepted: 1 July 2026 / Published: 5 July 2026
(This article belongs to the Special Issue Machine Learning Algorithms for Signal Processing)

Abstract

Audio deepfake detection systems often achieve excellent internal validation performance but fail to generalize under real-world inference conditions involving synthetic speech generated with previously unseen AI tools. To address this limitation, this work proposes the Spectral Multi-Representation Fusion (SMRF) framework, which integrates multiple spectral representations and decision-level fusion strategies to improve robustness under cross-domain conditions. Additionally, a Stability-Aware Multi-Metric Selection (SAMMS) strategy is introduced to select architectures by jointly considering predictive performance and cross-representation stability. The proposed framework was evaluated using four spectral representations (log-magnitude spectrogram (LOG), Mel spectrogram (MEL), Discrete Wavelet Transform (DWT), and Constant-Q Transform (CQT)) combined with multiple convolutional architectures and complementary voting strategies. The experiments revealed that isolated models with validation metrics above 95% may still produce very poor synthetic-audio detection rates during external inference (in some cases below 10%). In contrast, fusion-based strategies substantially improved robustness by exploiting complementary evidence of synthesis across spectral domains. The results also demonstrated that both the voting strategy and the SAMMS stability parameter λ strongly affect the final behavior of the system. In particular, hybrid fusion using One-Hard Voting with two architectures selected using λ = 0.25 achieved the best balance between synthetic-audio detection and real-audio preservation, outperforming individual models under cross-domain inference conditions, with detection rates close to 75% for both synthetic and real audio. These findings suggest that stability-aware fusion strategies constitute a promising direction for improving robustness in realistic audio deepfake detection scenarios.
Keywords: audio deepfake detection; spectral multi-representation fusion; SAMMS; decision-level fusion; multi-representation learning; architecture selection; one-hard voting; soft voting; synthetic speech detection
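
To make the decision-level fusion idea in the abstract concrete, the following is a minimal sketch of two voting rules applied to per-representation model outputs. It is illustrative only and is not the authors' implementation. The abstract does not define One-Hard Voting, so the sketch assumes it means "label a clip synthetic if at least one model's hard decision is synthetic"; the function names, the 0.5 threshold, and the example scores are all hypothetical.

```python
from statistics import fmean

SYNTHETIC, REAL = 1, 0


def soft_vote(probs, threshold=0.5):
    """Average the per-model synthetic probabilities, then threshold the mean."""
    return SYNTHETIC if fmean(probs) >= threshold else REAL


def one_hard_vote(probs, threshold=0.5):
    """Assumed reading of One-Hard Voting: one synthetic hard vote is enough.

    Each model first makes its own hard decision (probability >= threshold);
    the clip is labeled synthetic if any single model votes synthetic.
    """
    return SYNTHETIC if any(p >= threshold for p in probs) else REAL


# Hypothetical synthetic-class probabilities for one clip, one model per
# spectral representation named in the abstract.
scores = {"LOG": 0.10, "MEL": 0.20, "DWT": 0.90, "CQT": 0.15}
probs = list(scores.values())

soft_label = soft_vote(probs)      # mean is 0.3375, below the threshold
hard_label = one_hard_vote(probs)  # the DWT model alone exceeds the threshold
```

Under this assumed definition, a single representation that captures a synthesis artifact is enough to flag the clip, whereas averaging can dilute that evidence. The trade-off is a higher risk of flagging real audio, which is consistent with the abstract's emphasis on balancing synthetic-audio detection against real-audio preservation.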

Share and Cite

MDPI and ACS Style

Ballesteros, D.; Suarez, D.; Pachon, C. Spectral Multi-Representation Fusion for Audio Deepfake Detection. Algorithms 2026, 19, 549. https://doi.org/10.3390/a19070549
