Open Access Article
A Comparative Investigation of Cepstral Feature Extraction Methods for Deepfake Speech Detection
by
Nida Akıncı
Nida Akıncı was born in Malatya, Türkiye. She received her B.S. in Computer Engineering from İnönü University, Malatya/Türkiye in 2020. She has been pursuing her M.Sc. in Computer Engineering at Fırat University, Elazığ/Türkiye since 2023. She works as a Software Development Specialist at Enqura Information Technologies, where she focuses on Fintech applications. Her research interests include image verification, voice verification, voice signing, and corporate identity authentication.
and
Erdal Özbay
Erdal Özbay was born in Elazig, Türkiye. He received his B.S. in Computer Engineering from Girne American University, TRNC, in 2010; his M.Sc. from Firat University, Elazig/Türkiye, in 2013; and his Ph.D. in Computer Sciences from Firat University, Elazig/Türkiye, in 2018. He has been an Associate Professor in the Computer Engineering department at Firat University, Elazig, Türkiye, since 2025. He is the author or co-author of more than 50 technical journal articles and conference publications. His research interests include machine learning, artificial intelligence, image processing, optimization algorithms, point clouds, 3D visualization, curve skeletons, and 3D reconstruction.
*
Department of Computer Engineering, Firat University, 23119 Elazig, Türkiye
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(13), 6707; https://doi.org/10.3390/app16136707 (registering DOI)
Submission received: 1 June 2026 / Revised: 27 June 2026 / Accepted: 2 July 2026 / Published: 4 July 2026
Abstract
The widespread adoption of voice-based authentication systems has been accompanied by an escalating threat from deep learning-based synthetic speech generation techniques. This study presents a comparative and experimental investigation of cepstral feature extraction methods for deepfake speech detection. Specifically, Mel-Frequency Cepstral Coefficients (MFCC), Linear-Frequency Cepstral Coefficients (LFCC), and Constant-Q Cepstral Coefficients (CQCC) are systematically evaluated with respect to their frequency scaling characteristics, spectral resolution properties, and capacity to capture artifacts specific to synthetic speech production. Experiments were conducted on 5571 audio samples drawn from the ASVspoof 2021 Logical Access evaluation partition, with all methods assessed under identical classification conditions using a linear Support Vector Machine. Results indicate that CQCC attains the highest numerical performance, achieving 83.59% accuracy, 89.15% ROC-AUC, and 15.83% Equal Error Rate (EER); however, the performance difference between MFCC and CQCC does not reach statistical significance (p = 0.202). Five-fold cross-validation corroborates this finding (CQCC: 87.89% ± 0.81%). McNemar’s test confirms that the performance difference between LFCC and CQCC is statistically significant (p = 0.036). A fine-grained attack-wise analysis across 13 spoofing systems reveals that no single feature representation consistently outperforms the others across all attack types; CQCC achieves the highest accuracy on 6 out of 13 systems, while MFCC remains competitive on several attack categories. The overall findings indicate that deepfake detection performance is highly sensitive not only to the classifier architecture but also to the choice of frequency scale, cepstral transformation design, and data conditions. 
These results provide empirical motivation for multi-feature strategies that integrate complementary frequency representations, which may offer more robust and generalizable detection.
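The abstract reports an Equal Error Rate and McNemar's test p-values without showing how such quantities are computed. The following is a minimal sketch of both, not the authors' implementation. It assumes that a higher score means "more likely bona fide", that the EER is approximated by sweeping thresholds over the observed scores, and that the exact (binomial) form of McNemar's test is used; the abstract does not state which variant the study applied. The function names and the discordant counts `b` and `c` are hypothetical.

```python
from math import comb


def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Assumption: a higher score means the sample looks more bona fide.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: bona fide samples scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed samples scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the discordant counts.

    b: samples classifier A gets right and classifier B gets wrong.
    c: samples classifier B gets right and classifier A gets wrong.
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) probability of a split at least as lopsided as k.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Toy scores for illustration only; these are not data from the study.
    bonafide = [0.9, 0.8, 0.7, 0.4]
    spoof = [0.6, 0.3, 0.2, 0.1]
    print(f"EER: {equal_error_rate(bonafide, spoof):.2%}")
    print(f"McNemar p: {mcnemar_exact_p(1, 5):.5f}")
```

McNemar's test uses only the samples on which two classifiers disagree, which is why two feature sets with visibly different accuracies (as with MFCC and CQCC here) can still fail to differ significantly when evaluated on the same test set.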