Spectral Multi-Representation Fusion for Audio Deepfake Detection

Ballesteros, Dora; Suarez, Daniel; Pachon, Cesar

doi:10.3390/a19070549

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.

Open Access Article


Dora Ballesteros *, Daniel Suarez and Cesar Pachon

Facultad de Ingenieria, Universidad Militar Nueva Granada, Bogota 110111, Colombia

* Author to whom correspondence should be addressed.

Algorithms 2026, 19(7), 549; https://doi.org/10.3390/a19070549 (registering DOI)

Submission received: 20 May 2026 / Revised: 20 June 2026 / Accepted: 1 July 2026 / Published: 5 July 2026

(This article belongs to the Special Issue Machine Learning Algorithms for Signal Processing)


Abstract

Audio deepfake detection systems often achieve excellent internal validation performance but fail to generalize under real-world inference conditions involving synthetic speech generated with previously unseen AI tools. To address this limitation, this work proposes the Spectral Multi-Representation Fusion (SMRF) framework, which integrates multiple spectral representations and decision-level fusion strategies to improve robustness under cross-domain conditions. Additionally, a Stability-Aware Multi-Metric Selection (SAMMS) strategy is introduced to select architectures by jointly considering predictive performance and cross-representation stability. The proposed framework was evaluated using four spectral representations, namely the log-magnitude spectrogram (LOG), Mel spectrogram (MEL), Discrete Wavelet Transform (DWT), and Constant-Q Transform (CQT), combined with multiple convolutional architectures and complementary voting strategies. The experiments revealed that isolated models with validation metrics above 95% may still yield very poor synthetic-audio detection rates during external inference, in some cases below 10%. In contrast, fusion-based strategies substantially improved robustness by exploiting complementary synthetic evidence across spectral domains. The results also demonstrated that both the voting strategy and the SAMMS stability parameter $\lambda$ strongly affect the final behavior of the system. In particular, hybrid fusion using One-Hard Voting with two architectures selected using $\lambda \geq 0.25$ achieved the best balance between synthetic-audio detection and real-audio preservation, outperforming individual models under cross-domain inference conditions, with detection rates close to 75% for both synthetic and real audio. These findings suggest that stability-aware fusion strategies are a promising direction for improving robustness in realistic audio deepfake detection scenarios.
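The abstract does not state the SAMMS formula. As a minimal sketch, assuming SAMMS ranks each architecture by its mean validation score across the four representations minus λ times the spread (population standard deviation) of those scores, the selection step could look like the code below. The μ − λσ form, the single metric standing in for the paper's multiple metrics, the architecture names, and all numbers are hypothetical illustrations, not the authors' method.

```python
from statistics import mean, pstdev

REPRESENTATIONS = ("LOG", "MEL", "DWT", "CQT")


def stability_aware_score(scores: dict[str, float], lam: float) -> float:
    """Hypothetical SAMMS-style score: mean across representations minus lam * spread.

    The mu - lam * sigma form is an assumption for illustration; the real SAMMS
    combines several metrics, which this sketch collapses into one.
    """
    values = [scores[rep] for rep in REPRESENTATIONS]
    return mean(values) - lam * pstdev(values)


def select_architectures(results: dict[str, dict[str, float]], lam: float, k: int = 2) -> list[str]:
    """Rank architectures by the stability-aware score and keep the top k."""
    ranked = sorted(results, key=lambda arch: stability_aware_score(results[arch], lam), reverse=True)
    return ranked[:k]


# Illustrative validation scores (e.g., F1) per hypothetical architecture and representation.
VALIDATION = {
    "arch_A": {"LOG": 0.99, "MEL": 0.99, "DWT": 0.80, "CQT": 0.98},  # strong on average, unstable
    "arch_B": {"LOG": 0.94, "MEL": 0.93, "DWT": 0.94, "CQT": 0.93},  # stable
    "arch_C": {"LOG": 0.93, "MEL": 0.92, "DWT": 0.93, "CQT": 0.92},  # stable, slightly weaker
}

for lam in (0.0, 0.25):
    print(lam, select_architectures(VALIDATION, lam))
# 0.0 ['arch_A', 'arch_B']
# 0.25 ['arch_B', 'arch_C']
```

With λ = 0 the ranking is driven by mean performance alone, so the unstable arch_A is kept; with λ = 0.25 its large spread across representations pushes it out of the top two. A faithful implementation would follow the paper's own definition of the multi-metric score.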

Keywords: audio deepfake detection; spectral multi-representation fusion; SAMMS; decision-level fusion; multi-representation learning; architecture selection; one-hard voting; soft voting; synthetic speech detection.
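The decision-level voting strategies named in the abstract and keywords can be illustrated with a small sketch. It assumes each model (one per spectral representation) outputs a probability that a clip is synthetic. Majority and soft voting follow their usual definitions, while One-Hard Voting is interpreted here, as an assumption based only on its name, as flagging a clip as synthetic when at least one model votes synthetic. The paper's hybrid fusion scheme is not reproduced, and the toy probabilities are invented for illustration.

```python
from statistics import mean

SYNTHETIC, REAL = 1, 0  # class labels used throughout this sketch


def hard_votes(probs: list[float], threshold: float = 0.5) -> list[int]:
    """Convert each model's P(synthetic) into a binary vote."""
    return [SYNTHETIC if p >= threshold else REAL for p in probs]


def majority_vote(probs: list[float], threshold: float = 0.5) -> int:
    """Hard voting: synthetic only if more than half of the models say so."""
    votes = hard_votes(probs, threshold)
    return SYNTHETIC if sum(votes) > len(votes) / 2 else REAL


def one_hard_vote(probs: list[float], threshold: float = 0.5) -> int:
    """Assumed One-Hard rule: a single synthetic vote is enough to flag the clip."""
    return SYNTHETIC if any(v == SYNTHETIC for v in hard_votes(probs, threshold)) else REAL


def soft_vote(probs: list[float], threshold: float = 0.5) -> int:
    """Soft voting: threshold the average probability across models."""
    return SYNTHETIC if mean(probs) >= threshold else REAL


def detection_rates(preds: list[int], labels: list[int]) -> tuple[float, float]:
    """Return (synthetic-audio detection rate, real-audio preservation rate)."""
    synth = [p for p, y in zip(preds, labels) if y == SYNTHETIC]
    real = [p for p, y in zip(preds, labels) if y == REAL]
    return (
        sum(p == SYNTHETIC for p in synth) / len(synth),
        sum(p == REAL for p in real) / len(real),
    )


# Toy per-clip probabilities from hypothetical LOG, MEL, DWT and CQT models.
TOY_CLIPS = [
    ([0.10, 0.20, 0.85, 0.30], SYNTHETIC),
    ([0.05, 0.10, 0.60, 0.20], REAL),
    ([0.70, 0.80, 0.90, 0.60], SYNTHETIC),
    ([0.10, 0.10, 0.20, 0.10], REAL),
]


def evaluate(rule) -> tuple[float, float]:
    """Apply one fusion rule to every toy clip and score the fused decisions."""
    preds = [rule(probs) for probs, _ in TOY_CLIPS]
    labels = [label for _, label in TOY_CLIPS]
    return detection_rates(preds, labels)


for name, rule in (("majority", majority_vote), ("one-hard", one_hard_vote), ("soft", soft_vote)):
    print(name, evaluate(rule))
# majority (0.5, 1.0)
# one-hard (1.0, 0.5)
# soft (0.5, 1.0)
```

On these toy clips, the assumed One-Hard rule catches both synthetic clips but also flags one real clip, whereas majority and soft voting preserve both real clips and miss one synthetic clip. This mirrors the trade-off between synthetic-audio detection and real-audio preservation discussed in the abstract, although the numbers here are purely illustrative.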

Share and Cite

MDPI and ACS Style

Ballesteros, D.; Suarez, D.; Pachon, C. Spectral Multi-Representation Fusion for Audio Deepfake Detection. Algorithms 2026, 19, 549. https://doi.org/10.3390/a19070549

AMA Style

Ballesteros D, Suarez D, Pachon C. Spectral Multi-Representation Fusion for Audio Deepfake Detection. Algorithms. 2026; 19(7):549. https://doi.org/10.3390/a19070549

Chicago/Turabian Style

Ballesteros, Dora, Daniel Suarez, and Cesar Pachon. 2026. "Spectral Multi-Representation Fusion for Audio Deepfake Detection" Algorithms 19, no. 7: 549. https://doi.org/10.3390/a19070549

APA Style

Ballesteros, D., Suarez, D., & Pachon, C. (2026). Spectral Multi-Representation Fusion for Audio Deepfake Detection. Algorithms, 19(7), 549. https://doi.org/10.3390/a19070549

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
