Article

Detecting Audio Copy-Move Forgeries on Mel Spectrograms via Hybrid Keypoint Features

Department of Software Engineering, Atatürk University, Erzurum 25240, Turkey
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(21), 11845; https://doi.org/10.3390/app152111845
Submission received: 24 September 2025 / Revised: 2 November 2025 / Accepted: 3 November 2025 / Published: 6 November 2025
(This article belongs to the Special Issue Multimedia Smart Security)

Abstract

With the widespread use of audio editing software and artificial intelligence, it has become very easy to forge audio files. One type of these forgeries is copy-move forgery, which is achieved by copying a segment from an audio file and placing it in a different place in the same file, where the aim is to take the speech content out of its context and alter its meaning. In practice, forged recordings are often disguised through post-processing steps such as lossy compression, additive noise, or median filtering. This distorts acoustic features and makes forgery detection more difficult. This study introduces a robust keypoint-based approach that analyzes Mel-spectrograms, which are visual time-frequency representations of audio. Instead of processing the raw waveform for forgery detection, the proposed method focuses on identifying duplicate regions by extracting distinctive visual patterns from the spectrogram image. We tested this approach on two speech datasets (Arabic and Turkish) under various real-world attack conditions. Experimental results show that the method outperforms existing techniques and achieves high accuracy, precision, recall, and F1-scores. These findings highlight the potential of visual-domain analysis to increase the reliability of audio forgery detection in forensic and communication contexts.

1. Introduction

Advances in digital technology have facilitated the development and widespread accessibility of audio editing software. Although these tools enable valuable applications in media production, communication, and entertainment, they also introduce serious risks when misused. Manipulating audio recordings has become surprisingly easy, even for non-expert users. As a result, verifying the authenticity and preserving the integrity of digital audio have turned into pressing concerns in forensic analysis, investigative journalism, and cybersecurity practice [1].
In the age of fast-developing digital media, even a brief manipulated audio clip can trigger serious consequences in the real world. Forged recordings have already been exploited to spread false information, discredit individuals, and sway public opinion or legal outcomes. As voice-based authentication and digital evidence gain prominence in areas such as law enforcement, politics, and online communication, the credibility of audio data becomes more than a technical concern—it turns into a question of social trust and ethical accountability. Reliable audio forgery detection systems are therefore essential for protecting privacy, curbing misinformation, and strengthening the integrity of digital forensics.
To address these challenges, researchers have developed various audio verification techniques in the literature. These techniques are generally classified into active and passive methods [2] as shown in Figure 1. In active approaches, the sending side embeds watermarks in the original record, and later the receiving side verifies authenticity through watermark extraction. Despite being effective for controlled scenarios, such methods require pre-embedded information and cannot detect tampering in arbitrary recordings. In contrast, passive authentication examines intrinsic characteristics of the audio without prior modification. Passive methods include container-based analysis, which focuses on file structures and metadata, and content-based analysis, which inspects acoustic or spectral features for signs of tampering. In modern forensic applications, content-based strategies dominate, since they are capable of revealing manipulation traces directly in the temporal–spectral domain.
In content-based approaches, forgery detection strategies focus on two types of manipulation: audio splicing and audio copy–move forgeries. Splicing combines segments from different sources. This process often leaves detectable inconsistencies such as abrupt noise transitions, spectral discontinuities, or mismatches in recording conditions. Copy–move forgery, by contrast, reuses material from the same recording. A segment is copied and repositioned to another location within the signal to insert, repeat, or mask certain content. In practice, this manipulation can replicate a speaker’s phrase to alter context or overwrite sensitive portions with neutral audio.
Figure 2 shows a typical audio copy–move forgery created using Audacity software (https://www.audacityteam.org/). In this instance, a short spoken part is duplicated and reinserted into another location of the same recording. Such tampering may alter the perceived meaning of a conversation while maintaining acoustic consistency. This makes manual detection highly unreliable.
To handle this issue, some researchers have focused on spectral features. The spectral features of an audio signal are obtained by transforming the signal from the time domain to the frequency domain. Mel-Frequency Cepstral Coefficients (MFCCs) are one of the most well-known spectral features. While such features provide compact frequency-domain representations, they often lose local structural details that may reveal duplicated or tampered regions. To address this limitation, some recent studies have presented keypoint-based approaches, including Scale-Invariant Feature Transform (SIFT) [2,3], Speeded-Up Robust Features (SURF) [4], and Binary Robust Independent Elementary Features (BRIEF) [5], to detect audio copy-move forgery. The keypoints mentioned here are distinctive points in an image that are invariant to various transformations, such as scaling and rotation. A keypoint detector identifies these points in an image, whereas keypoint descriptors are feature vectors that describe the local appearance around each keypoint. These keypoints are extracted not from the raw audio file but from spectrograms, images that visually represent how the frequencies of an audio signal vary over time. A Mel spectrogram is similar to a standard spectrogram, but the frequency scale is converted to the Mel scale, which better represents human auditory perception. In addition to spectral features and keypoint descriptors, statistical modeling in both the time and frequency domains has also been applied to detect anomalies [6]. Nonetheless, when forged audio is further subjected to post-processing—such as additive noise, lossy compression, or filtering—the traces of manipulation become more deeply embedded and difficult to uncover.
Anti-forensic techniques, which are methods used to obstruct, complicate, or mislead digital forensic studies, can conceal alterations in an audio signal. This makes forgery detection more challenging. As illustrated in Figure 3, even simple manipulations may leave subtle spectral artifacts that are not easily perceived by the human auditory system. This observation underscores the importance of computational methods capable of revealing tampering patterns beyond auditory perception. Because of this, robust audio forensic analysis requires automated approaches such as keypoint-based detection rather than relying on manual evaluation.
In this paper, we present a novel keypoint-based approach for detecting copy-move forgeries in audio recordings. Our method uses both SIFT and Features from Accelerated Segment Test (FAST) [7] as keypoint detectors, with descriptors derived from SIFT and Fast Retina Keypoint (FREAK) algorithms. The major contributions can be summarized as follows:
1.
In this study, we introduce an innovative approach by converting digital audio data into Mel spectrograms and processing them in the visual (image) domain. Unlike traditional methods that operate directly on raw audio signals, this technique integrates visual processing techniques for audio forgery detection. Such approaches remain relatively underexplored in the literature.
2.
Instead of relying on a single keypoint detector and descriptor, the proposed method combines SIFT and FAST for keypoint detection, followed by SIFT and FREAK for descriptor generation. This hybrid feature extraction approach improves the robustness against forgery artifacts.
3.
The method maintained stable detection accuracy across different distortion types, including additive noise, lossy compression, and filtering.
4.
The experimental results show that our method performs better than existing approaches across all evaluation metrics. This finding highlights its potential as a dependable tool for forgery detection.
The remainder of this paper is organized as follows: Section 2 reviews relevant literature and summarizes existing approaches. Section 3 describes the audio datasets and their characteristics. Section 4 presents the proposed methodology in detail, while Section 5 provides experimental results and discusses their implications. Finally, Section 6 concludes the paper with findings and future research directions.

2. Related Work

Existing research has extensively investigated audio copy-move forgery detection, with current approaches generally falling into three main categories: (1) Voice Activity Detection (VAD) based methods, (2) window-based analysis techniques, and (3) spectrogram-based approaches.
VAD-based detection approaches aim to isolate voiced regions from silence or noise, and then assess similarities among the extracted speech segments. Voiced speech regions are isolated using either VAD or the Yet Another Algorithm for Pitch Tracking (YAAPT). Next, feature similarity analysis is applied to these extracted voiced components. Finally, segments with high similarity scores are classified as duplicated audio regions.
For example, Yan et al. [8] combined pitch and formant sequences with Dynamic Time Warping (DTW) for similarity analysis, while Imran et al. [6] utilized Local Binary Pattern (LBP) histograms on Arabic speech data to detect duplicated segments. Wang et al. [9] extracted Discrete Cosine Transform (DCT) and Singular Value Decomposition (SVD) features from syllables segmented by VAD, and Xie et al. [10] combined multiple features—Mel-Frequency Cepstral Coefficients (MFCC), gammatone, pitch, and Discrete Fourier Transform (DFT)—with decision tree classification. More recently, Akdeniz et al. [11,12] employed YAAPT-based segmentation with deep learning classifiers such as Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), and Artificial Neural Networks (ANNs), while Wang et al. [13] introduced Cochlear Filter Cepstral Coefficients (CFCC) features for forgery detection. Although VAD-based methods are effective in isolating speech, their performance is highly sensitive to background noise, silence misclassification, and post-processing operations.
Window-based methods partition audio signals into fixed-length overlapping segments and extract discriminative features for similarity analysis. Xiao et al. [14] pioneered T-period segmentation with fast convolution, while Su et al. [15] optimized Constant Q Spectral Sketches (CQSS) using a Genetic Algorithm (GA) and classified them using a Support Vector Machine (SVM). In a later study, Su et al. [16] applied Constant-Q Cepstral Coefficients (CQCC) with Pearson Correlation Coefficients (PCC) on Librispeech and Chinspeech datasets, also testing robustness under multiple attacks. Window-based approaches are computationally efficient but rely heavily on fixed thresholds, limiting robustness when attacks such as compression or noise are present.
With the increasing success of computer vision techniques, recent studies have turned toward spectrogram-based representations, which transform audio into visual form and thereby enable the application of image analysis methods. Several works have applied deep learning: Ustubioglu [17] used Convolutional Neural Networks (CNNs), while Dincer [18] combined EfficientNet and Capsule Networks. Other studies adopted handcrafted feature extraction. Yazici [19] divided Mel spectrograms into sub-blocks and applied Binary Gradient Patterns (BGP), whereas Ustubioglu et al. [2] and Güç et al. [3] employed SIFT keypoints. Ulutas et al. [4] explored SURF, and Ustubioglu [5] utilized BRIEF with Ordering Points To Identify the Clustering Structure (OPTICS) clustering. High-resolution spectrograms with Accelerated-KAZE (AKAZE) features have also been tested [20]. Spectrogram-based approaches have shown strong robustness against post-processing attacks and allow for keypoint-based analysis. Nevertheless, many rely on a single descriptor or require large training datasets, leaving room for hybrid methods that integrate multiple feature extraction techniques.
Table 1 presents a summary of existing studies on audio copy-move forgery detection, including their methodological approaches, datasets, and reported performance metrics. This review provides a comparative perspective of current techniques in the literature.

3. Dataset

3.1. Dataset 1 (Arabic Speech-Based Copy–Move Set)

The first audio copy–move forgery dataset, given in [17], was derived from the Arabic Speech Corpus [25]. It is publicly available at https://ceng2.ktu.edu.tr/~csrg (accessed on 1 November 2025). As described by the dataset creators, the Arabic Speech Corpus consists of WAV-format utterances that were first segmented into voiced parts using their proposed VAD method [17]. A randomly selected voiced segment, typically between 0.2 s and 0.6 s, was then copied and pasted at a randomly chosen position within the same utterance to produce the forged samples. This process resulted in 715 forged audio files derived from 1001 authentic utterances. To enhance diversity, the forged samples were further subjected to post-processing operations identical to those used in Dataset 2—namely, 64 kbps MP3 compression, median filtering, and additive Gaussian noise at 30 dB and 20 dB SNR levels—yielding a total of 3575 samples, including both the original forged and their post-processed variants.

3.2. Dataset 2 (Turkish Speech in Three Environments)

The second dataset is described in [26] and available at https://ceng2.ktu.edu.tr/~csrg. It consists of Turkish speech recordings collected under three acoustic conditions: office, cafeteria, and quiet room, each containing 200 original utterances.
To create forged samples, the start and end boundaries of the words in each utterance were first determined using Matlab’s speech2text tool, as described in the original dataset publication [26]. After identifying word boundaries, word pairs were deliberately selected. Then, they were copied and pasted to form semantically meaningful forged sentences. This design choice distinguishes the corpus from most existing resources, which typically produce meaningless word repetitions that can easily expose manipulation. By ensuring linguistic coherence in the forged speech, the dataset provides a more realistic and challenging evaluation scenario for copy–move detection methods.
As summarized in [26], 1046 forged utterances were generated from the 600 originals (200 per environment). Each forged file was then represented under five conditions—the original forged version without any attack and four post-processing attacks identical to those used in Dataset 1—resulting in a total of 5230 samples (1046 × 5). This design allows the results to be compared across datasets and keeps the evaluation consistent under different acoustic environments.

4. Proposed Method

Audio copy–move manipulations create localized repetitions in the time–frequency plane. Representing audio with Mel spectrograms preserves this spatial locality. This enables mature image-based correspondence techniques to operate on duplicated patterns. For this reason, we adopted a hybrid keypoint strategy combining two complementary detector–descriptor pairs: SIFT–SIFT and FAST–FREAK. SIFT provides scale- and contrast-invariant features that are robust to local intensity and minor spectral variations. FAST efficiently detects high-density corner-like structures. The FREAK descriptor offers a compact and noise-tolerant binary representation. Hence, it improves matching efficiency. This hybrid configuration enhances detection reliability under typical post-processing conditions (compression, filtering, and additive noise). At the same time, it maintains computational efficiency for file-level decisions.
The workflow involves four major stages. First, the input audio is transformed into Mel-spectrograms, followed by image preprocessing and keypoint extraction. The final stage performs feature matching and filtering. The following subsections detail each computational stage.
Before detailing the computational stages, it is essential to clarify the key terms used throughout this section. In this study, the term manipulation refers to a copy–move forgery in which a continuous segment of an audio recording is duplicated and pasted into another position within the same file. The modified recording is treated as a forged sample, whereas the original unaltered files are regarded as authentic. In contrast, attacks such as lossy compression, additive Gaussian noise, or median filtering are not manipulations themselves. They are post-processing operations applied to evaluate the robustness of the proposed method against real-world degradations.

4.1. Generation of Mel Spectrogram Images

In this work, the Mel spectrogram serves as a perceptually motivated, time-frequency representation for facilitating the detection of copy-move forgeries in audio signals. These manipulations often involve duplicated temporal segments that manifest as recurring spectral patterns in the Mel domain. By converting audio into a two-dimensional image-like format, the Mel spectrogram enables the adaptation of image-based techniques—such as keypoint matching and block-based correlation—for audio forgery detection [2,3,17,19].
To generate the Mel spectrogram, the audio waveform is first divided into overlapping 30-ms frames, each multiplied by a Hamming window to mitigate spectral leakage. The short-time Fourier transform (STFT) is then applied to extract the time-frequency representation as defined in Equation (1) [17]:
S(f, t) = \sum_{n=0}^{N-1} w_n \, x_t(n) \, \exp\!\left( -j 2\pi \frac{f}{f_s} n \right), \qquad f = \frac{k f_s}{N} \quad (1)
where w_n is the Hamming window function, x_t(n) is the input speech signal for frame t, f_s is the sampling frequency, and N is the number of samples per frame. The STFT output is then passed through a Mel-frequency filter bank that compresses high-frequency content and models the nonlinear frequency sensitivity of human hearing. The resulting Mel spectrogram coefficients are computed using Equation (2) [17]:
S_{\mathrm{mel}}(k, t) = \sum_{l=0}^{L-1} m_k(l) \, |S(l, t)|^2 \quad (2)
where L is the number of frequency bins in the STFT, and m_k(l) is the response of the k-th filter in the Mel filter bank. This compact representation captures both spectral and temporal features critical for detecting duplicated or inconsistent regions—especially in complex acoustic environments—while simultaneously providing dimensionality reduction and perceptual alignment with human auditory sensitivity.
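The following is a minimal sketch of this stage in Python, assuming the librosa and OpenCV libraries are available. The 30-ms Hamming-windowed frames follow the description above, while the hop length, the number of Mel bands (n_mels), and the 8-bit image scaling are illustrative assumptions not specified in the text.

import numpy as np
import librosa
import cv2

def audio_to_mel_image(wav_path, n_mels=128):
    """Convert an audio file into an 8-bit Mel-spectrogram image (sketch)."""
    y, sr = librosa.load(wav_path, sr=None)      # keep the native sampling rate
    n_fft = int(0.03 * sr)                       # 30-ms analysis frames
    hop = n_fft // 2                             # 50% overlap (assumed)
    # Hamming-windowed STFT followed by Mel filter-bank projection (Equations (1)-(2))
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft, hop_length=hop,
                                       window="hamming", n_mels=n_mels, power=2.0)
    S_db = librosa.power_to_db(S, ref=np.max)    # log compression for display
    # Scale to 0-255 and flip so that low frequencies appear at the bottom of the image
    img = cv2.normalize(S_db, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return np.flipud(img)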

4.2. Preprocessing of Spectrogram Images

In the presented approach, the input Mel spectrogram is first converted to the hue-saturation-value (HSV) color space, and only the V (value) channel is retained for further analysis. Since Mel spectrograms encode information primarily through intensity rather than color, isolating the V channel—which represents brightness—enhances the detection of structurally duplicated regions while mitigating chromatic distortions. This approach also reduces computational complexity, benefiting both block-based and keypoint-based forgery detection techniques. Building on the insights of Prakash et al. [27] and Hosny et al. [28], who highlight the significance of perceptually meaningful color models, this work adopts the HSV space to improve the identification of recurring patterns. SIFT and FAST are intensity-based detectors that rely on pixel brightness gradients rather than color information. Working directly in RGB space introduces chromatic noise, while converting an image to grayscale merges color and luminance components, thereby reducing edge saliency. The HSV color space isolates luminance from chromaticity, and the V channel directly represents brightness contrast. Prior studies have shown that using the V-channel improves the stability of keypoint detection and robustness against color distortions in forgery detection tasks [27,29]. Empirically, V-channel normalization yielded more stable keypoints and fewer false correspondences than grayscale preprocessing. This justifies the choice of the HSV color model.
After converting to the HSV color space, the V channel is normalized to enhance the dynamic range, particularly for low-contrast images. Normalization, which may introduce abrupt brightness variations, is followed by Gaussian blurring (to smooth high-frequency noise) and morphological opening (to eliminate minor irregularities). Together, these preprocessing operations enhance contrast, suppress noise, and remove artifacts, yielding a more uniform spectrogram image. This improves the reliability of subsequent keypoint detection and matching.
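A compact sketch of this preprocessing chain (HSV conversion, V-channel normalization, Gaussian blurring, and morphological opening) is shown below. The kernel sizes are assumptions, since the text does not report them, and the input is assumed to be a color-rendered spectrogram image.

import cv2

def preprocess_spectrogram(spec_bgr):
    """Extract and clean the V channel of a color-rendered Mel-spectrogram image (sketch)."""
    hsv = cv2.cvtColor(spec_bgr, cv2.COLOR_BGR2HSV)
    v = hsv[:, :, 2]                                     # keep only the brightness (value) channel
    v = cv2.normalize(v, None, 0, 255, cv2.NORM_MINMAX)  # stretch the dynamic range
    v = cv2.GaussianBlur(v, (5, 5), 0)                   # smooth high-frequency noise (kernel size assumed)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    v = cv2.morphologyEx(v, cv2.MORPH_OPEN, kernel)      # remove minor irregularities
    return v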

4.3. Keypoints Detection and Description

The subsequent stage of the proposed framework involves extracting keypoints from the preprocessed Mel spectrogram image and measuring similarity between the described points. For this purpose, a hybrid method based on SIFT, FAST, and FREAK is presented in this paper.
Most feature extraction algorithms for analyzing images involve two key steps: (1) detecting centralized interest points, followed by (2) constructing robust local descriptors that maintain invariance to orientation and scaling. SIFT [30] is a feature extraction algorithm that both detects and describes local keypoints invariant to scale, rotation, and illumination changes. It operates in four stages:
  • Scale-space extrema detection: A Gaussian pyramid is constructed by convolving the input image I(x, y) with Gaussian filters at varying scales σ, as defined in Equation (3) [31].
    L(x, y, \sigma) = G(x, y, \sigma) * I(x, y) \quad (3)
    where L(x, y, σ) denotes the scale-space representation of the input image, ∗ is the convolution operator, and G(x, y, σ) is a Gaussian kernel defined in Equation (4) [31].
    G(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}} \quad (4)
    where (x, y) represents the spatial coordinates and σ is the scale parameter. Once the scale space is constructed, the Difference-of-Gaussian (DoG) images are obtained by subtracting adjacent Gaussian-blurred images within the same octave, as expressed in Equation (5) [31]:
    D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma) \quad (5)
    In this formulation, L(x, y, σ) and L(x, y, kσ) denote the Gaussian images at scales σ and kσ, respectively. Stable features are detected in the DoG images by comparing each pixel with its eight neighboring pixels at the same scale, as well as with nine neighbors in the scale above and nine in the scale below. A pixel is identified as a candidate keypoint if it constitutes a local extremum, either a maximum or a minimum. Unstable extrema are subsequently eliminated through a refinement procedure involving contrast thresholding and edge-response elimination.
  • Keypoint localization: Candidate keypoints are refined by eliminating low-contrast points and edge responses using a Hessian matrix-based criterion [32] (analogous to the Harris corner detector [33]).
  • Orientation assignment: Each keypoint is assigned a dominant orientation based on local gradient directions, ensuring rotation invariance.
  • Descriptor generation: A 128-dimensional feature vector is computed for each keypoint by partitioning its neighborhood into 4 × 4 subregions and calculating gradient histograms.
This comprehensive pipeline enables SIFT to generate highly distinctive and repeatable features, making it suitable for diverse computer vision applications. The extrema obtained through scale-space extrema detection are invariant to both scale and orientation, which makes them particularly effective for identifying duplicated regions even under transformations such as translation, scaling, or slight distortions commonly introduced in copy–move forgeries.
The SIFT descriptor further encodes gradient orientations within a local neighborhood, yielding a highly discriminative representation that is resilient to noise, compression, and other post-processing operations. When applied to Mel spectrograms, this property enables reliable matching of duplicated time–frequency patterns, thereby facilitating the detection of copy–move manipulations.
In our implementation, we employed OpenCV’s SIFT algorithm [34] with optimized parameters: a contrast threshold (contrastThreshold) of 0.05 to preserve low-contrast features and an edge threshold (edgeThreshold) of 15 to enhance tolerance for edge regions.
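For reference, a minimal OpenCV instantiation with these two parameters is shown below; all other settings remain at their defaults.

import cv2

# SIFT with the contrast and edge thresholds reported above
sift = cv2.SIFT_create(contrastThreshold=0.05, edgeThreshold=15)

def sift_features(v_channel):
    """Detect SIFT keypoints and compute 128-D descriptors on the preprocessed V channel."""
    return sift.detectAndCompute(v_channel, None)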
FAST [7] is a high-speed corner detection method designed for real-time applications but is also widely used in standard image processing. Unlike some feature detectors, FAST only performs keypoint detection and does not include an inherent descriptor.
The FAST algorithm examines circular neighborhoods (typically 16 pixels at radius 3–4) around candidate pixels as illustrated in Figure 4. A pixel p with intensity I_p is classified as a corner if at least N contiguous pixels on its Bresenham circle [35] are either all brighter than I_p + t or all darker than I_p − t, where t is a threshold defining significant intensity differences. Here, I_n denotes the intensity of the n-th pixel located on the 16-point Bresenham circle surrounding the candidate pixel p. Following the original FAST detector [7], the parameter N is commonly set to 12, as this choice enables a high-speed test that efficiently excludes a large number of non-corner points [36]. This condition is formalized in Equation (6) [7].
C_p = \begin{cases} 1, & I_n < I_p - t \ \text{or} \ I_n > I_p + t \\ 0, & \text{otherwise} \end{cases} \quad (6)
Equation (6) defines the individual intensity test for each neighboring pixel, while the final corner decision is made if at least N contiguous pixels on the circle satisfy this condition. For computational efficiency, FAST first checks the 1st, 5th, 9th, and 13th pixels on the Bresenham circle. If fewer than three of these satisfy the intensity condition, the candidate pixel p is immediately rejected [7]. In the initial stage of the algorithm, all pixels are tested and corner candidates are identified. On the other hand, in textured regions, multiple corners may cluster in close proximity. To retain only the most stable and distinctive corners, a non-maximal suppression strategy is applied as a post-detection step. In this procedure, a score value is assigned to each detected corner. Corners within the same neighborhood are then compared, and only those with the highest scores are preserved, while those with lower scores are discarded. This reduces redundancy and enhances the robustness of the final set of keypoints.
Although originally designed for natural image analysis, the FAST detector can also be applied to time-frequency representations of audio signals, such as Mel spectrograms. Copy–move manipulations introduce duplicated patterns in the spectrogram domain, and FAST captures these by extracting distinctive local keypoints. When paired with suitable descriptors (e.g., FREAK) and matching techniques, duplicated segments can be reliably detected even under noise or compression.
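A minimal OpenCV sketch of this detection step is given below; the intensity threshold value is an assumption, since the text does not report it.

import cv2

# FAST corner detector with non-maximal suppression, as described above
# (the numeric threshold is chosen for illustration only)
fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)

def fast_keypoints(v_channel):
    """Detect FAST keypoints on the preprocessed Mel-spectrogram image."""
    return fast.detect(v_channel, None)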
FREAK descriptor [37] is a binary feature descriptor inspired by the human visual system, particularly the retinal sampling pattern. It constructs a circular sampling grid around each detected keypoint, with concentric rings symmetrically distributed such that the sampling density is highest at the center and decreases exponentially toward the periphery, closely mimicking the structure of the human retina. Each sampling location is smoothed using a Gaussian kernel, where the radius of the circle determines the standard deviation of the kernel.
The descriptor is constructed by comparing the intensity values of predefined receptive field pairs. This process is formalized as shown in Equation (7) (see [37]), where P_a denotes a pair of points, N is the descriptor size, and T is a thresholding function:
F = \sum_{0 \le a < N} 2^a \, T(P_a) \quad (7)
T(P_a) = \begin{cases} 1, & \text{if } I(P_a^{r_1}) - I(P_a^{r_2}) > 0 \\ 0, & \text{otherwise} \end{cases} \quad (8)
In Equation (8), following [37], I(P_a^{r_1}) and I(P_a^{r_2}) are the smoothed intensities of the two points in the pair P_a. If the first point has higher intensity than the second, the corresponding bit is set to 1; otherwise, it is set to 0. By repeating this process across 512 preselected sampling pairs, FREAK produces a compact and highly discriminative binary string representation of the local image structure.
In practice, descriptor distinctiveness can be enhanced by varying the Gaussian kernel sizes or by overlapping receptive fields, the latter of which allows richer information capture and improved distinctiveness. Compared with classical descriptors such as SIFT, SURF, or other binary descriptors like Binary Robust Invariant Scalable Keypoints (BRISK), FREAK is faster to compute, requires less memory, and offers improved robustness [37]. In this work, the OpenCV implementation of the FREAK descriptor with default parameters is utilized.
FREAK can be effectively applied to time–frequency visual representations such as Mel spectrograms. Copy–move forgeries in audio introduce duplicated time–frequency patterns, which can be represented as repeated local structures in the spectrogram domain. By combining FAST for efficient keypoint detection with FREAK for robust binary description, these duplicated regions can be characterized with high precision. FREAK’s compact binary representation enables efficient matching, while its high sensitivity to local intensity variations ensures robustness against distortions such as noise, compression, or minor frequency shifts. As a consequence, FAST–FREAK provides a reliable framework for detecting copy–move forgeries in Mel spectrograms.
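In line with the default-parameter setup mentioned above, the FREAK descriptor can be attached to the FAST keypoints roughly as follows. Note that FREAK resides in OpenCV's contrib module (opencv-contrib-python); the helper name is illustrative.

import cv2

# FREAK binary descriptor with default parameters (requires opencv-contrib-python)
freak = cv2.xfeatures2d.FREAK_create()

def fast_freak_features(v_channel, fast_detector):
    """Describe FAST keypoints with binary FREAK descriptors (sketch)."""
    kps = fast_detector.detect(v_channel, None)
    kps, desc = freak.compute(v_channel, kps)   # keypoints too close to the image border may be dropped
    return kps, desc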
By integrating SIFT and FAST as keypoint detectors with SIFT and FREAK as feature descriptors, the proposed hybrid scheme leverages complementary strengths—scale and rotation invariance from SIFT, efficiency from FAST, and compact binary robustness from FREAK—thereby enhancing the reliability of audio copy–move forgery detection.

4.4. Matching and Filtering of Keypoints

To identify potential tampering in an audio file, SIFT and FAST algorithms are applied to extract distinctive keypoints from the input spectrogram. These keypoints capture unique frequency-time patterns that are then cross-matched to detect duplicated regions—a sign of copy-move forgery.
If a significant number of keypoint matches are found, it indicates that portions of the spectrogram have been copied and pasted elsewhere, suggesting manipulation. The results are visualized by overlaying the matched regions on the spectrogram, highlighting areas with suspiciously similar energy distributions.
After extracting keypoints using SIFT (with 128-dimensional descriptors) and FAST (with 64-dimensional FREAK descriptors), we resolved the dimensionality mismatch by augmenting the FREAK descriptors to 128 dimensions via replication. This alignment allowed a unified feature space that preserves the complementary strengths of both descriptors.
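A one-line sketch of this alignment step, assuming the FREAK descriptors are held in a NumPy array of shape (N, 64), could be:

import numpy as np

def pad_freak_to_128(freak_desc):
    """Replicate 64-D FREAK descriptors to 128-D so they share SIFT's descriptor length (sketch)."""
    return np.tile(freak_desc.astype(np.float32), 2)   # (N, 64) -> (N, 128)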
In our methodology, keypoint matching is performed by locating the nearest neighbor for each descriptor using Euclidean distance. To improve robustness, Lowe’s ratio test is applied, accepting a match only if the distance ratio between the closest and second-closest neighbors falls below the predefined threshold ω .
This threshold parameter directly influences the matching performance: Lower ω values increase the precision of the match by rejecting ambiguous matches, albeit at the cost of a reduced match quantity. Higher ω values preserve more matches while potentially compromising accuracy.
The optimal threshold selection thus represents a critical trade-off between match density and matching accuracy. Additionally, it requires careful calibration to minimize false correspondences while preserving a sufficient number of feature pairs for robust analysis. Since nearest-neighbor search in high-dimensional spaces is computationally expensive, we adopt the recursive partitioning strategy of Huang et al. [38], which splits the descriptor set and performs Brute-Force (BF) matching [39] within smaller subsets. This divide-and-conquer approach reduces the O ( n 2 ) complexity of classical BF matching.
The keypoint matching pipeline consists of the following steps:
Partitioning the Descriptor Set: The descriptor set (K) is evenly split by index into two subsets, K 1 and K 2 , to balance the computational load in subsequent matching.
BF Matcher and Lowe’s Ratio Test: For each descriptor in K 2 , the two best matches (m and n) within K 1 are identified using OpenCV’s BFMatcher with the L2 norm (Euclidean distance). A match is considered valid only if it satisfies Lowe’s ratio test [31], which filters ambiguous matches by comparing the distances of the best and second-best candidates, as shown in Equation (9) (see also [40]):
\frac{\| f_{K_2}(i) - f_{K_1}(m) \|_2}{\| f_{K_2}(i) - f_{K_1}(n) \|_2} < \omega, \qquad \omega = 0.55 \quad (9)
Here, f_{K_2}(i) denotes the feature descriptor of the i-th keypoint in set K_2, f_{K_1}(m) and f_{K_1}(n) represent the first and second nearest descriptors in set K_1, respectively, and \| \cdot \|_2 indicates the Euclidean (L2) norm. We determined the ratio threshold in Lowe's test (ω) empirically through several preliminary runs on a small validation subset. During these trials, different values between 0.45 and 0.60 were tested to see which produced the highest file-level F1-score under compression, noise, and filtering attacks. A value of ω = 0.55 gave the most reliable trade-off—lower thresholds removed too many valid matches, while higher ones allowed more false correspondences. Once identified, this setting was kept the same for all datasets and attack conditions to ensure consistency in evaluation.
Matches failing this test (i.e., when distances are too similar) are discarded as unreliable. The iteration proceeds until all descriptor pairs in K 2 have been processed, at which point the matching loop terminates. Following the ratio test, pairs whose keypoint coordinates are closer than a preset threshold are discarded.
Recursive Partitioning: The recursive matching routine proceeds on the K 1 branch until the current subset contains fewer than two descriptors. This divide-and-conquer strategy reduces computational complexity for large descriptor sets.
Outlier Rejection with RANSAC: Finally, Random Sample Consensus (RANSAC) [41] is employed to eliminate residual false matches that may arise from repetitive patterns in the spectrogram. RANSAC iteratively estimates the dominant homography transformation between matched keypoints and evaluates their geometric consistency. Keypoint pairs whose reprojection error exceeds a predefined threshold (e.g., 3 pixels) are labeled as outliers and discarded, while consistent correspondences are retained as inliers. This filtering step ensures that only reliable matches are preserved, thereby improving the robustness of the tampering localization process.
Finally, a file is labeled as forged if the number of RANSAC inliers exceeds the threshold value ( τ 0 ). This threshold remains constant across all datasets and attack conditions.
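To make these steps concrete, a condensed sketch of the matching-and-decision stage is shown below. It performs a single partitioning pass rather than the full recursion, operates on one float descriptor array, and assumes a minimum spatial separation of 10 pixels between matched coordinates; ω and τ_0 = 18 follow the values reported in this work, and the function name is illustrative.

import cv2
import numpy as np

OMEGA = 0.55          # Lowe ratio threshold (Equation (9))
TAU_0 = 18            # file-level decision threshold on RANSAC inliers (Section 5.2)
MIN_COORD_DIST = 10   # assumed minimum pixel distance between matched keypoints

def is_forged(keypoints, descriptors):
    """Single partitioning pass of the matching pipeline, followed by RANSAC filtering (sketch)."""
    half = len(descriptors) // 2
    d1, d2 = descriptors[:half], descriptors[half:]           # subsets K1 and K2
    bf = cv2.BFMatcher(cv2.NORM_L2)
    pairs = bf.knnMatch(np.float32(d2), np.float32(d1), k=2)  # two best K1 matches per K2 descriptor
    src, dst = [], []
    for pair in pairs:
        if len(pair) < 2:
            continue
        m, n = pair
        if m.distance < OMEGA * n.distance:                    # Lowe's ratio test
            p2 = keypoints[half + m.queryIdx].pt               # keypoint from K2
            p1 = keypoints[m.trainIdx].pt                      # matched keypoint from K1
            if np.hypot(p2[0] - p1[0], p2[1] - p1[1]) > MIN_COORD_DIST:
                src.append(p2)
                dst.append(p1)
    if len(src) < 4:                                           # a homography needs at least 4 pairs
        return False
    src = np.float32(src).reshape(-1, 1, 2)
    dst = np.float32(dst).reshape(-1, 1, 2)
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)    # 3-pixel reprojection threshold
    inliers = int(mask.sum()) if mask is not None else 0
    return inliers > TAU_0                                     # label the file as forged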
An overview of the proposed workflow is provided in Figure 5. The diagram highlights the sequential steps from spectrogram generation and keypoint detection to descriptor matching and RANSAC-based outlier rejection, which ultimately support the final decision between authentic and tampered recordings.

5. Experimental Results

This section evaluates the performance of the proposed approach on two publicly available audio copy–move forgery datasets, detailed in Section 3. The first dataset, referred to as Dataset 1, was introduced in [17] and is derived from the Arabic Speech Corpus [25]. The second dataset, referred to as Dataset 2 and described in [26], consists of Turkish speech recordings. Both datasets include authentic and forged samples subjected to common post-processing attacks, enabling consistent cross-dataset evaluations. We begin by describing the post-processing scenarios applied to the datasets before presenting the evaluation metrics and experimental results.

5.1. Post-Processing Attacks

Both datasets include original recordings and their corresponding forged versions. Each forged audio file was further subjected to common post-processing operations, including the addition of white Gaussian noise at 30 dB and 20 dB SNR, median filtering, and 64 kbps compression.
Table 2 presents examples of forged spectrogram images from Dataset 1, illustrating the effects of different post-processing attacks. The abbreviations represent the following attack conditions: NA for baseline (no attack), COM for 64 kbps compression, MDF for median filtering, GN20 for additive white Gaussian noise at 20 dB SNR, and GN30 for noise at 30 dB SNR.
In the images, the lines connect matched keypoints between the copied and pasted regions, highlighting detectable traces of copy-move forgery. The table further quantifies the detection robustness by listing: (1) the total detected keypoints, (2) the number of matched keypoints after BF-Matcher processing, and (3) the remaining true matches post-RANSAC-based outlier removal.

5.2. Evaluation Metrics

We performed all experiments on a PC with an Intel Core i5 CPU and 8 GB of RAM. For the implementation, we utilized Python 3.12 with OpenCV 4.11. The decision threshold was fixed at τ 0 = 18 and remained unchanged across all datasets and attack conditions. File-level predictions were evaluated in terms of accuracy, precision, recall, and F1-score. They were computed using scikit-learn functions [42]. These metrics are defined in Equations (10)–(13) (see [43,44]), respectively. True Positives (TP) represent correctly identified forged audio samples, False Positives (FP) denote genuine samples erroneously classified as forged, False Negatives (FN) occur when forged samples remain undetected, and True Negatives (TN) correspond to genuine samples correctly recognized as authentic.
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (10)
\mathrm{Precision} = \frac{TP}{TP + FP} \quad (11)
\mathrm{Recall} = \frac{TP}{TP + FN} \quad (12)
F_1\text{-}\mathrm{score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (13)
Accuracy measures the overall proportion of correct predictions (TP and TN) among all predictions, but may be misleading in imbalanced datasets. Precision quantifies the proportion of correctly detected forged samples among all predicted forgeries, whereas Recall (TPR) reflects the proportion of actual forgeries that are correctly identified. The F1-score is the harmonic mean of Precision and Recall, providing a balanced measure that accounts for both FP and FN errors. All metrics range between 0 and 1, with higher values indicating stronger detection performance.
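As a reference, these file-level scores can be computed with the scikit-learn functions mentioned above; labels are assumed to be 1 for forged and 0 for authentic files.

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def file_level_scores(y_true, y_pred):
    """Compute Equations (10)-(13) at the file level (1 = forged, 0 = authentic)."""
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall":    recall_score(y_true, y_pred),
        "f1":        f1_score(y_true, y_pred),
    }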

5.3. Results and Discussion

Table 3 reports the baseline (no-attack) performance of our approach on both datasets, according to the metrics defined earlier. On Dataset 1 (Arabic), the method achieved an accuracy of 0.9505, precision of 0.9968, recall of 0.8839, and an F1-score of 0.9370. On Dataset 2 (Turkish), the method yielded an accuracy of 0.9192, precision of 0.9247, recall of 0.9503, and an F1-score of 0.9373.
These results highlight the robustness of the proposed approach across languages and acoustic conditions. While precision was higher on Dataset 1, recall improved on Dataset 2, suggesting complementary strengths. Importantly, the consistently high F1-scores confirm the balanced ability of the method to minimize false positives and false negatives on both datasets.
Table 4 and Table 5 present the performance of the proposed method under various post-processing operations for both datasets. Although the system maintains consistently high precision across all conditions, the recall values reveal the specific weaknesses of the method under heavy distortions.
On Dataset 1 (Arabic), our approach performed best under median filtering, achieving an accuracy of 0.9499 and an F1-score of 0.9362. Compression at 64 kbps yielded relatively weaker performance, with recall dropping to 0.6476 and the F1-score to 0.7847. Gaussian noise addition was the most challenging attack, particularly at 20 dB SNR, where recall fell to 0.6252 and the F1-score to 0.7680. At 30 dB SNR, performance improved slightly (F1 = 0.8646), though noise still posed significant difficulties. These results suggest that the method can effectively handle filtering and moderate compression, but its recall sensitivity to strong noise conditions remains a limiting factor.
For Dataset 2 (Turkish), the method again demonstrated robustness under median filtering (accuracy = 0.9162, F1 = 0.9348) and compression (accuracy = 0.8834, F1 = 0.9069). As with Dataset 1, Gaussian noise degraded performance most severely: recall decreased to 0.7314 at 20 dB and 0.8671 at 30 dB, with corresponding F1-scores of 0.8087 and 0.8918. Despite these challenges, the method consistently achieved F1-scores above 0.80 across all attack scenarios. These findings indicate a relatively stable detection capability even under adverse conditions.
Overall, the comparative analysis reveals that the proposed framework maintains high robustness to compression and filtering attacks, while exhibiting greater vulnerability to additive Gaussian noise. Precision values consistently exceed 0.91 across all cases, highlighting the ability of the method to avoid false alarms. In contrast, the recall values demonstrate that noisy environments remain the most critical challenge for copy–move forgery detection, emphasizing the need for further refinement in noise-resilient feature extraction.
We compare our hybrid method with related studies in the literature to demonstrate its effectiveness. For Dataset 1, Table 6 compares the accuracy values under different attack conditions against prior approaches based on formant sequences [8], LBP [6], DCT-SVD [9], and DFT [22].
Under 64 kbps compression, conventional methods show a substantial drop in accuracy, with values ranging between 0.20 and 0.57. In contrast, our approach achieves a markedly higher accuracy of 0.8520. This result highlights its robustness against lossy audio compression. Similarly, under median filtering, which often disrupts subtle spectral patterns, this technique attains an accuracy of 0.9499, significantly surpassing the best competing baseline (0.57).
The results under Gaussian noise attacks further reinforce the superiority of our approach. At 20 dB SNR, existing methods achieve accuracies in the range of 0.20–0.63. Our method maintains an accuracy of 0.8427. When the noise level is 30 dB SNR, the performance of traditional methods remains limited (0.16–0.60). In contrast, our method sustains a high accuracy of 0.9003. This consistent improvement demonstrates the resilience of the proposed V-channel–driven keypoint extraction and hybrid feature description strategy, even in noisy environments.
For Dataset 2, Table 7 summarizes the accuracy comparison under four attack conditions. Traditional approaches demonstrate limited robustness, with accuracy values mostly below 0.55. In contrast, the proposed framework consistently outperforms these baselines across all scenarios.
Under 64 kbps compression, our work achieves an accuracy of 0.8834, whereas reference studies remain between 0.18 and 0.49. Median filtering produces a similar trend: whereas the best competing accuracy is 0.5328, we achieve an accuracy score of 0.9162, which nearly doubles the detection performance.
Noise attacks further highlight the method’s resilience. At 20 dB SNR, existing methods achieve accuracies up to 0.5243, while the proposed approach maintains 0.7801. At 30 dB SNR, accuracy remains high at 0.8663, significantly exceeding the baselines (0.23–0.53). These findings confirm that the proposed hybrid framework offers robustness against post-processing operations in more challenging, real-world acoustic conditions.
Overall, the experimental results empirically demonstrate the effectiveness and robustness of the proposed hybrid approach. Across both Arabic and Turkish datasets, the method consistently achieved high accuracy, precision, recall, and F1-scores, outperforming conventional techniques under diverse post-processing operations. While Gaussian noise remained the most challenging scenario—particularly at lower SNR levels—the proposed framework still maintained reliable detection performance, with F1-scores above 0.80 in all cases. Even under strong Gaussian noise, the system maintains high precision but loses some recall, since noise distorts local gradient structures and reduces the number of stable keypoint matches after RANSAC filtering. In our current setup, the descriptors are fused in parallel without adaptive weighting, so the binary features become less effective at low SNR levels. Future work may address this issue by introducing confidence-based weighting or by adding a denoising step to improve robustness while keeping precision stable.
The proposed method generalizes well because it relies on spectral–structural similarity rather than on language-specific cues. Mel-spectrograms describe how spectral energy changes over time, allowing duplicated regions to be detected regardless of phonetic or linguistic differences. We used the same fixed threshold ( τ 0 = 18 ) and parameter set for both the Arabic and Turkish datasets, without any dataset-specific tuning. The consistent performance across these linguistically and acoustically different corpora supports the robustness of the method across languages and recording conditions.
From a computational perspective, two main bottlenecks arise: recursive descriptor partitioning and RANSAC-based outlier elimination. As the speech duration increases, the Mel-spectrogram expands along the time axis, producing more keypoints and descriptors. Consequently, processing time grows because brute-force matching scales quadratically with the number of descriptors, while RANSAC introduces additional latency depending on its iteration count. Both modules include early-stopping rules—descriptor partitioning halts when subsets contain fewer than two descriptors, and RANSAC terminates once a consistent model is found—keeping the computation practical for offline forensic analysis. Future work may explore approximate nearest-neighbor search, GPU-accelerated RANSAC, or adaptive time–frequency windowing to further enhance efficiency.
Beyond its technical contributions, this work also sheds light on the ethical and communicative questions it raises. Automatic detection of forgery in audio recordings can increase the reliability of digital evidence. However, once attackers learn how forgeries are detected, they can develop more sophisticated forgery tools to evade detection. This dual use of technology undermines public trust in audio recordings. Therefore, transparent reporting and the responsible application of detection technologies are crucial to maintaining media integrity and credibility. These concerns necessitate further interdisciplinary research, particularly in the areas of media ethics, AI accountability, and digital communication [45,46,47].
Although our proposed method demonstrates competitive performance under various attack conditions on two different datasets, it has some limitations. The method exhibits some sensitivity to very low-SNR conditions, and its performance also depends on spectrogram resolution. These factors may reduce its robustness in extremely noisy or bandwidth-limited recordings.

6. Conclusions

This work presents a hybrid keypoint-based method for detecting audio copy-move forgeries using Mel-spectrogram representations. Unlike most approaches in the literature, this approach does not focus on the raw waveform. Instead, it analyzes how the energy of an audio signal changes over time and frequency. By combining complementary visual features, the method achieves balanced performance in terms of accuracy, efficiency, and robustness in forgery detection.
Experiments were conducted on two publicly available datasets containing Arabic and Turkish speech recorded in different acoustic environments. The proposed approach was evaluated under both forgery and post-processing attack conditions, such as compression, filtering, and noise injection, achieving high accuracy, precision, recall, and F1-scores. These findings demonstrate that the system generalizes well across languages and acoustic environments and can be applied to a variety of forensic and security scenarios where verifying the authenticity of recorded evidence is crucial.
Future research will focus on improving robustness to strong noise conditions, integrating deep learning with keypoint-based analysis, and extending the framework to detect other types of audio manipulation. Overall, this work contributes not only to technical advances in audio forensics but also to broader efforts to ensure authenticity and trust in digital communications.

Author Contributions

Conceptualization, E.O. and S.Y.A.; methodology, E.O. and S.Y.A.; software, E.O.; validation, E.O. and S.Y.A.; formal analysis, E.O. and S.Y.A.; investigation, E.O. and S.Y.A.; resources, S.Y.A.; data curation, E.O. and S.Y.A.; writing—original draft preparation, E.O. and S.Y.A.; writing—review and editing, E.O. and S.Y.A.; visualization, E.O. and S.Y.A.; supervision, S.Y.A.; project administration, S.Y.A.; funding acquisition, S.Y.A. All authors have read and agreed to the published version of the manuscript.

Funding

This study is supported by the Scientific Research Projects Coordination Unit of Atatürk University with a project number of FYL-2025-15112.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available at https://ceng2.ktu.edu.tr/~csrg (accessed on 1 November 2025).

Acknowledgments

During the preparation of this work, the authors used ChatGPT 5 to improve language and readability. After using this tool, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zakariah, M.; Khan, M.K.; Malik, H. Digital multimedia audio forensics: Past, present and future. Multimed. Tools Appl. 2018, 77, 1009–1040. [Google Scholar] [CrossRef]
  2. Ustubioglu, B.; Tahaoglu, G.; Ulutas, G. Detection of audio copy-move-forgery with novel feature matching on Mel spectrogram. Expert Syst. Appl. 2023, 213, 118963. [Google Scholar] [CrossRef]
  3. Güç, H.K.; Üstübioğlu, B.; Üstübioğlu, A.; Ulutaş, G. Audio Forgery Detection Method with Mel Spectrogram. In Proceedings of the 2023 16th International Conference on Information Security and Cryptology (ISCTürkiye), Ankara, Turkiye, 18–19 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6. [Google Scholar]
  4. Ulutas, G.; Tahaoglu, G.; Ustubioglu, B. Forge audio detection using keypoint features on mel spectrograms. In Proceedings of the 2022 45th International Conference on Telecommunications and Signal Processing (TSP), Virtual, Online, Czech Republic, 13–15 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 413–416. [Google Scholar]
  5. Ustubioglu, B.; Tahaoglu, G.; Ulutas, G.; Ustubioglu, A.; Kilic, M. Audio forgery detection and localization with super-resolution spectrogram and keypoint-based clustering approach. J. Supercomput. 2024, 80, 486–518. [Google Scholar] [CrossRef]
  6. Imran, M.; Ali, Z.; Bakhsh, S.T.; Akram, S. Blind detection of copy-move forgery in digital audio forensics. IEEE Access 2017, 5, 12843–12855. [Google Scholar] [CrossRef]
  7. Rosten, E.; Drummond, T. Machine Learning for High-Speed Corner Detection. In Computer Vision—ECCV 2006, Proceedings of the 9th European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; Leonardis, A., Bischof, H., Pinz, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 430–443. [Google Scholar]
  8. Yan, Q.; Yang, R.; Huang, J. Robust copy–move detection of speech recording using similarities of pitch and formant. IEEE Trans. Inf. Forensics Secur. 2019, 14, 2331–2341. [Google Scholar] [CrossRef]
  9. Wang, F.; Li, C.; Tian, L. An algorithm of detecting audio copy-move forgery based on DCT and SVD. In Proceedings of the 2017 IEEE 17th International Conference on Communication Technology (ICCT), Chengdu, China, 27–30 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1652–1657. [Google Scholar]
  10. Xie, Z.; Lu, W.; Liu, X.; Xue, Y.; Yeung, Y. Copy-move detection of digital audio based on multi-feature decision. J. Inf. Secur. Appl. 2018, 43, 37–46. [Google Scholar] [CrossRef]
  11. Akdeniz, F.; Becerikli, Y. Recurrent neural network and long short-term memory models for audio copy-move forgery detection: A comprehensive study. J. Supercomput. 2024, 80, 17575–17605. [Google Scholar] [CrossRef]
  12. Akdeniz, F.; Becerikli, Y. Detecting audio copy-move forgery with an artificial neural network. Signal Image Video Process. 2024, 18, 2117–2133. [Google Scholar] [CrossRef]
  13. Wang, D.; Li, X.; Shi, C.; Niu, X.; Xiong, L.; Wu, H.; Qian, Q.; Qi, C. Robust copy-move detection and localization of digital audio based CFCC feature. Multimed. Tools Appl. 2024, 84, 9573–9589. [Google Scholar] [CrossRef]
  14. Xiao, J.-n.; Jia, Y.-z.; Fu, E.-d.; Huang, Z.; Li, Y.; Shi, S.-p. Audio authenticity: Duplicated audio segment detection in waveform audio file. J. Shanghai Jiaotong Univ. (Sci.) 2014, 19, 392–397. [Google Scholar] [CrossRef]
  15. Su, Z.; Li, M.; Zhang, G.; Wu, Q.; Li, M.; Zhang, W.; Yao, X. Robust audio copy-move forgery detection using constant Q spectral Sketches and GA-SVM. IEEE Trans. Dependable Secur. Comput. 2022, 20, 4016–4031. [Google Scholar] [CrossRef]
  16. Su, Z.; Li, M.; Zhang, G.; Wu, Q.; Wang, Y. Robust audio copy-move forgery detection on short forged slices using sliding window. J. Inf. Secur. Appl. 2023, 75, 103507. [Google Scholar] [CrossRef]
  17. Ustubioglu, A.; Ustubioglu, B.; Ulutas, G. Mel spectrogram-based audio forgery detection using CNN. Signal Image Video Process. 2023, 17, 2211–2219. [Google Scholar] [CrossRef]
  18. Dincer, S.; Ustubioglu, B.; Ulutas, G.; Tahaoglu, G.; Ustubioglu, A. Robust Audio Forgery Detection Method Based on Capsule Network. In Proceedings of the 2023 International Conference on Electrical and Information Technology (IEIT), Malang, Indonesia, 14–15 September 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 243–247. [Google Scholar]
  19. Yazici, S.; Üstübioğlu, B.; Kiliç, M.; Ulutaş, G. Block-Based Forgery Detection with Binary Gradient Model. In Proceedings of the 2022 15th International Conference on Information Security and Cryptography (ISCTURKEY), Ankara, Turkey, 19–20 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 38–43. [Google Scholar]
  20. Üstübioğlu, B.; Tahaoglu, G. Yüksek çözünlüklü spektrogram görüntülerinden akaze yöntemi ile ses sahteciliği tespiti. Kahramanmaraş Sütçü İmam Üniv. Mühendis. Bilim. Derg. 2023, 26, 961–972. [Google Scholar] [CrossRef]
  21. Ustubioglu, B.; Küçükuğurlu, B.; Ulutas, G. Robust copy-move detection in digital audio forensics based on pitch and modified discrete cosine transform. Multimed. Tools Appl. 2022, 81, 27149–27185. [Google Scholar] [CrossRef]
  22. Huang, X.; Liu, Z.; Lu, W.; Liu, H.; Xiang, S. Fast and effective copy-move detection of digital audio based on auto segment. In Digital Forensics and Forensic Investigations: Breakthroughs in Research and Practice; IGI Global: Hershey, PA, USA, 2020; pp. 127–142. [Google Scholar]
  23. Üstübioğlu, B.; Üstübioğlu, A. Görsel kelime tabanlı ses sahteciliği tespit yöntemi [A visual word-based audio forgery detection method]. Niğde Ömer Halisdemir Üniv. Mühendis. Bilim. Derg. 2024, 13, 350–358. [Google Scholar] [CrossRef]
  24. Ustubioglu, B. An Attack-Independent Audio Forgery Detection Technique Based on Cochleagram Images of Segments With Dynamic Threshold. IEEE Access 2024, 12, 82660–82675. [Google Scholar] [CrossRef]
  25. Arabic Speech Corpus. Available online: https://en.arabicspeechcorpus.com/ (accessed on 1 November 2025).
  26. Ustubioglu, B.; Tahaoglu, G.; Ayaz, G.O.; Ustubioglu, A.; Ulutas, G.; Cosar, M.; Kılıc, E.; Kılıc, M. KTUCengAudioForgerySet: A new audio copy-move forgery dataset. In Proceedings of the 2024 47th International Conference on Telecommunications and Signal Processing (TSP), Prague, Czech Republic, 10–12 July 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 123–129. [Google Scholar]
  27. Prakash, C.S.; Panzade, P.P.; Om, H.; Maheshkar, S. Detection of copy-move forgery using AKAZE and SIFT keypoint extraction. Multimed. Tools Appl. 2019, 78, 23535–23558. [Google Scholar] [CrossRef]
  28. Hosny, K.M.; Hamza, H.M.; Lashin, N.A. Copy-move forgery detection of duplicated objects using accurate PCET moments and morphological operators. Imaging Sci. J. 2018, 66, 330–345. [Google Scholar] [CrossRef]
  29. Panzade, P.P.; Prakash, C.S.; Maheshkar, S. Copy-move forgery detection by using HSV preprocessing and keypoint extraction. In Proceedings of the 2016 Fourth International Conference on Parallel, Distributed and Grid Computing (PDGC), Waknaghat, India, 22–24 December 2016; pp. 264–269. [Google Scholar]
  30. Lowe, D.G. Object recognition from local scale-invariant features. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; Volume 2, pp. 1150–1157. [Google Scholar]
  31. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
  32. Bay, H.; Tuytelaars, T.; Van Gool, L. Surf: Speeded up robust features. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 404–417. [Google Scholar]
  33. Harris, C.G.; Stephens, M.J. A Combined Corner and Edge Detector. In Proceedings of the Alvey Vision Conference, Manchester, UK, 31 August–2 September 1988. [Google Scholar]
  34. OpenCV Developers. OpenCV SIFT Feature Detector and Descriptor. 2025. Available online: https://docs.opencv.org/4.x/da/df5/tutorial_py_sift_intro.html (accessed on 22 October 2025). [Google Scholar]
  35. Bresenham, J. A linear algorithm for incremental digital display of circular arcs. Commun. ACM 1977, 20, 100–106. [Google Scholar] [CrossRef]
  36. Huang, J.; Zhou, G.; Zhou, X.; Zhang, R. A New FPGA Architecture of FAST and BRIEF Algorithm for On-Board Corner Detection and Matching. Sensors 2018, 18, 1014. [Google Scholar] [CrossRef]
  37. Alahi, A.; Ortiz, R.; Vandergheynst, P. Freak: Fast retina keypoint. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 510–517. [Google Scholar]
  38. Huang, H.; Guo, W.; Zhang, Y. Detection of copy-move forgery in digital images using SIFT algorithm. In Proceedings of the 2008 IEEE Pacific-Asia Workshop on Computational Intelligence and Industrial Application, Wuhan, China, 19–20 December 2008; Volume 2, pp. 272–276. [Google Scholar]
  39. OpenCV Developers. OpenCV Brute-Force Feature Matching. 2025. Available online: https://docs.opencv.org/4.x/dc/dc3/tutorial_py_matcher.html (accessed on 22 October 2025). [Google Scholar]
  40. Hammoud, M.; Getahun, M.; Lupin, S. Comparison of Outlier Filtering Methods in Terms of Their Influence on Pose Estimation Quality. Int. J. Open Inf. Technol. 2023, 11, 1–5. [Google Scholar]
  41. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
  42. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  43. Çoban, Ö. An assessment of nature-inspired algorithms for text feature selection. Comput. Sci. 2022, 23, 179–204. [Google Scholar] [CrossRef]
  44. Çoban, Ö.; Yücel Altay, Ş. Arming text-based gender inference with partition membership filtering and feature selection for online social network users. Comput. J. 2025, 68, 1208–1224. [Google Scholar] [CrossRef]
  45. Broussard, M.; Diakopoulos, N.; Guzman, A.L.; Abebe, R.; Dupagne, M.; Chuan, C.H. Artificial intelligence and journalism. J. Mass Commun. Q. 2019, 96, 673–695. [Google Scholar] [CrossRef]
  46. Chesney, B.; Citron, D. Deep fakes: A looming challenge for privacy, democracy, and national security. Calif. Law Rev. 2019, 107, 1753. [Google Scholar] [CrossRef]
  47. Vaccari, C.; Chadwick, A. Deepfakes and disinformation: Exploring the impact of synthetic political video on deception, uncertainty, and trust in news. Soc. Media Soc. 2020, 6, 2056305120903408. [Google Scholar] [CrossRef]
Figure 1. Classification of digital audio verification techniques (adapted and redrawn based on [2]).
Figure 2. Example of an audio copy–move forgery created using Audacity. The region enclosed in the blue box was copied and pasted into the region enclosed in the red box.
Figure 3. Example Mel-spectrograms: (a) original recording; (b) recording with additive noise attack.
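To make the visual-domain representation concrete, the following Python sketch renders an audio file as a Mel-spectrogram image using librosa and matplotlib. The file name and the n_fft, hop_length, and n_mels values are illustrative assumptions, not the exact settings used in this study.

```python
# Minimal sketch: render a recording as a Mel-spectrogram image.
# Assumes librosa and matplotlib; parameter values are illustrative only.
import librosa
import librosa.display
import matplotlib.pyplot as plt

y, sr = librosa.load("recording.wav", sr=None)            # keep the native sample rate
S = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                   hop_length=512, n_mels=128)
S_db = librosa.power_to_db(S, ref=S.max())                # log-compress power to dB

fig, ax = plt.subplots(figsize=(8, 3))
librosa.display.specshow(S_db, sr=sr, hop_length=512,
                         x_axis="time", y_axis="mel", ax=ax)
ax.set_axis_off()                                          # image-only output for keypoint analysis
fig.savefig("mel_spectrogram.png", bbox_inches="tight", pad_inches=0)
```

Saving the spectrogram without axes or margins yields a clean image that keypoint detectors can process directly.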
Figure 4. FAST detector (reproduced from [36], © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/, accessed on 1 November 2025)).
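Figure 4 illustrates the segment test that FAST applies around each candidate pixel [7,36]. As a rough illustration, the OpenCV sketch below extracts FAST keypoints from a grayscale spectrogram image; the threshold and non-maximum suppression settings are assumptions chosen for demonstration, not parameters taken from this study.

```python
# Sketch: FAST corner detection on a grayscale Mel-spectrogram image with OpenCV.
# The threshold and nonmaxSuppression values are illustrative assumptions.
import cv2

img = cv2.imread("mel_spectrogram.png", cv2.IMREAD_GRAYSCALE)

fast = cv2.FastFeatureDetector_create(threshold=20, nonmaxSuppression=True)
keypoints = fast.detect(img, None)
print(f"FAST keypoints detected: {len(keypoints)}")

# Draw the keypoints for visual inspection of where they concentrate.
vis = cv2.drawKeypoints(img, keypoints, None, color=(0, 255, 0))
cv2.imwrite("fast_keypoints.png", vis)
```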
Figure 5. Workflow of the proposed hybrid keypoint-based method.
Table 1. A summary of existing studies addressing the detection of audio copy-move forgery.

| Study | Analysis Approach | Detection Approach | Similarity Method | Dataset |
|---|---|---|---|---|
| [14] | Window-based | - | Fast Convolution | Authors' dataset |
| [16] | Window-based | CQCC | PCC | Librispeech, Chinspeech |
| [8] | VAD-based | Pitch and formant sequences | DTW | TIMIT, Wall Street Journal |
| [6] | VAD-based | LBP | MSE, energy ratio | KSU Arabic Speech |
| [9] | VAD-based | DCT-SVD | ED | Authors' dataset |
| [10] | VAD-based | C4.5 DT integrating four models (MFCC, gammatone, pitch, DFT) | PCC and AD | Authors' dataset |
| [11] | VAD-based | MFCC, RNN, and LSTM | Model-based classification | TIMIT |
| [12] | VAD-based | MFCC, ΔMFCC, ΔΔMFCC, LPC, and ANN | Model-based classification | TIMIT |
| [21] | VAD-based | Pitch and Modified DCT | ED | TIMIT |
| [13] | VAD-based | Pitch and CFCC | PCC and DTW | TIMIT |
| [22] | VAD-based | DFT | PCC | Authors' dataset |
| [23] | VAD and Spectrogram-based | DSSIM | DSSIM | TIMIT |
| [24] | VAD and Spectrogram-based | Pitch and SSIM | SSIM | TIMIT, Arabic Speech Corpus |
| [2] | Spectrogram-based | SIFT | ED | TIMIT, Arabic Speech Corpus |
| [3] | Spectrogram-based | SIFT | Moment invariants | TIMIT |
| [4] | Spectrogram-based | SURF | g2NN | TIMIT |
| [5] | Spectrogram-based | BRIEF | OPTICS | TIMIT, Arabic Speech Corpus |
| [17] | Spectrogram-based | CNN | Model-based classification | TIMIT, Arabic Speech Corpus |
| [18] | Spectrogram-based | Capsule Network | Model-based classification | Arabic Speech Corpus |
| [19] | Spectrogram-based | BGP | ED | TIMIT |
| [20] | Spectrogram-based | AKAZE | g2NN | TIMIT |

Abbreviations: AD—Average Difference, DSSIM—Structural Dissimilarity, DT—Decision Tree, ED—Euclidean Distance, LPC—Linear Predictive Coding, MSE—Mean Squared Error, SSIM—Structural Similarity.
Table 2. Matched keypoints and the counts of matched keypoints on Mel spectrogram images.

| Attack | ARANORM0954_2-3.wav | ARANORM0033_1-3.wav | ARANORM0701_2-3.wav |
|---|---|---|---|
| NA | Keypoints: 4080; after BF-Matcher: 209; after RANSAC: 172 | Keypoints: 3217; after BF-Matcher: 761; after RANSAC: 740 | Keypoints: 3467; after BF-Matcher: 557; after RANSAC: 530 |
| COM | Keypoints: 4216; after BF-Matcher: 91; after RANSAC: 71 | Keypoints: 3483; after BF-Matcher: 175; after RANSAC: 149 | Keypoints: 3603; after BF-Matcher: 139; after RANSAC: 115 |
| MDF | Keypoints: 3865; after BF-Matcher: 199; after RANSAC: 150 | Keypoints: 3670; after BF-Matcher: 737; after RANSAC: 711 | Keypoints: 3598; after BF-Matcher: 589; after RANSAC: 548 |
| GN20 | Keypoints: 2632; after BF-Matcher: 59; after RANSAC: 44 | Keypoints: 2488; after BF-Matcher: 82; after RANSAC: 74 | Keypoints: 2711; after BF-Matcher: 132; after RANSAC: 118 |
| GN30 | Keypoints: 3525; after BF-Matcher: 114; after RANSAC: 85 | Keypoints: 2547; after BF-Matcher: 222; after RANSAC: 210 | Keypoints: 2988; after BF-Matcher: 299; after RANSAC: 284 |
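Table 2 shows how the candidate set shrinks at each stage: from all detected keypoints, to the pairs retained by the brute-force matcher [39], and finally to the pairs that survive RANSAC [41]. The sketch below is a simplified illustration of such a two-stage filter on a single spectrogram image; the use of SIFT descriptors, the 0.75 ratio-test threshold, and the homography-based RANSAC check with a 5-pixel tolerance are assumptions for demonstration and should not be read as the exact configuration used in this study.

```python
# Simplified sketch of the two matching stages reported in Table 2:
# descriptors -> brute-force matching with a ratio test -> RANSAC filtering.
# SIFT descriptors, the 0.75 ratio, and the 5.0 px threshold are assumptions.
import cv2
import numpy as np

img = cv2.imread("mel_spectrogram.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)
print(f"Number of keypoints: {len(keypoints)}")

# Match the image against itself; k=3 so the trivial self-match can be skipped.
bf = cv2.BFMatcher(cv2.NORM_L2)
knn = bf.knnMatch(descriptors, descriptors, k=3)

good = []
for m in knn:
    # m[0] is the keypoint matched to itself (distance 0);
    # accept the nearest non-trivial neighbour if it clearly beats the next one.
    if len(m) >= 3 and m[1].distance < 0.75 * m[2].distance:
        good.append((m[0].queryIdx, m[1].trainIdx))
print(f"After BF-Matcher: {len(good)}")

# RANSAC keeps only matches consistent with a single geometric transform,
# which removes accidental matches between unrelated spectrogram regions.
if len(good) >= 4:
    src = np.float32([keypoints[q].pt for q, _ in good]).reshape(-1, 1, 2)
    dst = np.float32([keypoints[t].pt for _, t in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    inliers = int(mask.sum()) if mask is not None else 0
    print(f"After RANSAC: {inliers}")
```

Because a pasted segment reproduces the copied segment at a different position along the time axis, matches that agree on one consistent geometric shift are strong evidence of duplication.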
Table 3. Baseline (no-attack) performance of the proposed method on both datasets.

| Dataset | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Dataset 1 (Arabic) | 0.9505 | 0.9968 | 0.8839 | 0.9370 |
| Dataset 2 (Turkish) | 0.9192 | 0.9247 | 0.9503 | 0.9373 |
Table 4. Performance under various post-processing attacks—Dataset 1 (Arabic).

| Attack | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| 64 kbps compression | 0.8520 | 0.9957 | 0.6476 | 0.7847 |
| Median filtering | 0.9499 | 0.9968 | 0.8825 | 0.9362 |
| Gaussian noise (SNR = 20 dB) | 0.8427 | 0.9955 | 0.6252 | 0.7680 |
| Gaussian noise (SNR = 30 dB) | 0.9003 | 0.9964 | 0.7636 | 0.8646 |
Table 5. Performance under various post-processing attacks—Dataset 2 (Turkish).

| Attack | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| 64 kbps compression | 0.8834 | 0.9203 | 0.8939 | 0.9069 |
| Median filtering | 0.9162 | 0.9243 | 0.9455 | 0.9348 |
| Gaussian noise (SNR = 20 dB) | 0.7801 | 0.9043 | 0.7314 | 0.8087 |
| Gaussian noise (SNR = 30 dB) | 0.8663 | 0.9180 | 0.8671 | 0.8918 |
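The accuracy, precision, recall, and F1-scores reported in Tables 3–5 follow their standard definitions over binary forged/original decisions and can be reproduced with scikit-learn [42]. In the sketch below, the label and prediction vectors are placeholder values used purely for illustration.

```python
# Illustrative computation of the metrics used in Tables 3-5 with scikit-learn.
# y_true and y_pred are hypothetical labels, not data from the experiments.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 1, 0, 1, 0, 0, 1, 0]   # 1 = forged, 0 = original (hypothetical)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical detector decisions

print(f"Accuracy : {accuracy_score(y_true, y_pred):.4f}")
print(f"Precision: {precision_score(y_true, y_pred):.4f}")
print(f"Recall   : {recall_score(y_true, y_pred):.4f}")
print(f"F1-score : {f1_score(y_true, y_pred):.4f}")
```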
Table 6. Accuracy comparison of the proposed approach and existing methods on Dataset 1.

| Attacks | Formant [8] | LBP [6] | DCT-SVD [9] | DFT [22] | This Study |
|---|---|---|---|---|---|
| 64 kbps compression | 0.57 | 0.30 | 0.30 | 0.20 | 0.8520 |
| Median filtering | 0.57 | 0.10 | 0.40 | 0.20 | 0.9499 |
| Gaussian noise (SNR = 20 dB) | 0.63 | 0.50 | 0.60 | 0.20 | 0.8427 |
| Gaussian noise (SNR = 30 dB) | 0.60 | 0.40 | 0.40 | 0.16 | 0.9003 |
Table 7. Accuracy comparison of the proposed approach and existing methods on Dataset 2.

| Attacks | Formant [8] | LBP [6] | DCT-SVD [9] | DFT [22] | This Study |
|---|---|---|---|---|---|
| 64 kbps compression | 0.1859 | 0.3682 | 0.4617 | 0.4957 | 0.8834 |
| Median filtering | 0.2369 | 0.5067 | 0.5237 | 0.5328 | 0.9162 |
| Gaussian noise (SNR = 20 dB) | 0.2096 | 0.3779 | 0.4970 | 0.5243 | 0.7801 |
| Gaussian noise (SNR = 30 dB) | 0.2309 | 0.3779 | 0.4927 | 0.5322 | 0.8663 |