You are currently viewing a new version of our website. To view the old version click .
Applied Sciences
  • This is an early access version, the complete PDF, HTML, and XML versions will be available soon.
  • Article
  • Open Access

6 November 2025

Detecting Audio Copy-Move Forgeries on Mel Spectrograms via Hybrid Keypoint Features

and
Department of Software Engineering, Atatürk University, Erzurum 25240, Turkey
*
Author to whom correspondence should be addressed.
This article belongs to the Special Issue Multimedia Smart Security

Abstract

With the widespread use of audio editing software and artificial intelligence, it has become very easy to forge audio files. One type of these forgeries is copy-move forgery, which is achieved by copying a segment from an audio file and placing it in a different place in the same file, where the aim is to take the speech content out of its context and alter its meaning. In practice, forged recordings are often disguised through post-processing steps such as lossy compression, additive noise, or median filtering. This distorts acoustic features and makes forgery detection more difficult. This study introduces a robust keypoint-based approach that analyzes Mel-spectrograms, which are visual time-frequency representations of audio. Instead of processing the raw waveform for forgery detection, the proposed method focuses on identifying duplicate regions by extracting distinctive visual patterns from the spectrogram image. We tested this approach on two speech datasets (Arabic and Turkish) under various real-world attack conditions. Experimental results show that the method outperforms existing techniques and achieves high accuracy, precision, recall, and F1-scores. These findings highlight the potential of visual-domain analysis to increase the reliability of audio forgery detection in forensic and communication contexts.

Article Metrics

Citations

Article Access Statistics

Multiple requests from the same IP address are counted as one view.