Proceeding Paper

Audio-Based Drone Detection System Using FFT and Machine Learning Models †

by Leonardo Vicente Jimenez *,‡, Gabriel Sánchez Pérez, José Portillo-Portillo, Linda Karina Toscano Medina, Aldo Hernández Suárez, Jesús Olivares Mercado and Héctor Manuel Pérez Meana
Instituto Politécnico Nacional (National Polytechnic Institute), ESIME Culhuacan, Mexico City 04440, Mexico
* Author to whom correspondence should be addressed.
† This paper is an extended version of our paper presented at the First Summer School on Artificial Intelligence in Cybersecurity, Cancun, Mexico, 3–7 November 2025.
‡ These authors contributed equally to this work.
Eng. Proc. 2026, 123(1), 30; https://doi.org/10.3390/engproc2026123030
Published: 10 February 2026
(This article belongs to the Proceedings of First Summer School on Artificial Intelligence in Cybersecurity)

Abstract

In recent years, the use of drones, also known as unmanned aerial vehicles (UAVs), has increased rapidly due to their wide availability, compact size, low cost, and ease of operation. These devices have found applications in many areas, facilitating human work by covering large distances and operating in inaccessible or dangerous zones. However, they have also been associated with malicious activities, such as property damage or threats to public security, which highlights the need for efficient and precise UAV detection systems. Although approaches based on neural networks have been proposed, they require large amounts of training data and considerable computational resources, which limits their applicability. In this study, we propose an alternative approach based on audio features obtained through the fast Fourier transform (FFT) algorithm and classification using machine learning (ML) models. Our approach aims to detect the presence of drones using a minimal number of samples, meeting the efficiency, accuracy, robustness, low-cost, and scalability requirements of a functional detection system.

1. Introduction

The widespread adoption of small, low-cost unmanned aerial vehicles (UAVs) [1], or drones, has introduced significant security concerns. While beneficial in fields like photography and agriculture, their potential for malicious use [2,3]—such as espionage, unauthorized access, and transporting hazardous materials [4]—demands effective detection systems.
This study proposes an audio-based drone detection system to address these security challenges. The approach leverages the cost-effectiveness and scalability of audio sensors [5], particularly compared to expensive alternatives like radar or high-resolution cameras, and the ability of acoustic sensing to detect autonomous drones that RF-based methods [6] might miss.
The system utilizes features extracted via the fast Fourier transform (FFT) method. Due to the high dimensionality of FFT data, principal component analysis (PCA) is employed for dimensionality reduction. The processed features are then used to evaluate a set of machine learning algorithms [7] for classification.
The focus of this study is exclusively on the initial detection phase of a broader security system, which would subsequently include identification, localization, and neutralization steps. The proposed methodology follows a structured flowchart, detailed in the subsequent sections.
The main contribution of this study lies in the comprehensive comparative evaluation of classic machine learning models [8] on an integrated dataset processed via FFT and PCA. Unlike previous studies focused on a single model or dataset, our systematic approach identifies the most robust and efficient algorithms for audio-based drone detection, optimizing the trade-off between accuracy and computational requirements.

2. Materials and Methods

The objective is to evaluate ML models using metrics like accuracy, F1 score, precision, and recall to select optimal hyperparameters.

2.1. Databases

The primary dataset is the Drone Audio Dataset [9], containing drone audio (Parrot Bebop/Mambo) and ambient sounds. This is supplemented with data from Akbal et al. [10] (airplanes, storms, helicopters) and original recordings of a 4DRC™ drone, captured with the specifications in Table 1. The database architecture is shown in Figure 1.

2.2. Preprocessing and Environmental Variability

To ensure robustness against environmental variability, no specific noise filtering or data augmentation techniques were applied, thereby evaluating the models’ raw performance with ambient sounds included in the datasets. The recordings encompass inherent variations in distance and orientation from the original data sources [9,10].

2.3. Feature Extraction

The fast Fourier transform (FFT) [11] was applied to transform the audio signals from the time domain to the frequency domain [12], revealing the spectral signature of the drone rotors. This reduces dimensionality and provides robustness against noise. Audio clips were 1 s long, resulting in 16,000 samples at a 16 kHz sampling rate. The spectral resolution was 1 Hz, with a frequency range from 0 Hz to 8 kHz. The relevant range for drone detection is 0 Hz to 5500 Hz [13].
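As a sketch (not the authors' exact code), the extraction above can be reproduced with NumPy, assuming 1 s mono clips at 16 kHz; the 8000-bin magnitude spectrum matches the feature count reported in Section 3, and the tone frequency below is an illustrative stand-in for real rotor harmonics:

```python
import numpy as np

SAMPLE_RATE = 16_000    # Hz, per Table 1
CLIP_LEN = SAMPLE_RATE  # 1 s clips -> 16,000 samples

def fft_features(clip: np.ndarray) -> np.ndarray:
    """Magnitude spectrum of a 1 s mono clip: 8000 bins at 1 Hz resolution."""
    assert clip.shape == (CLIP_LEN,)
    spectrum = np.abs(np.fft.rfft(clip))  # 8001 bins covering 0-8000 Hz
    return spectrum[:8000]                # drop the Nyquist bin -> 8000 features

# A synthetic 200 Hz tone peaks at bin 200 (1 Hz per bin); real drone clips
# would show rotor harmonics within the relevant band below ~5500 Hz.
t = np.arange(CLIP_LEN) / SAMPLE_RATE
features = fft_features(np.sin(2 * np.pi * 200 * t))
print(features.shape)            # (8000,)
print(int(np.argmax(features)))  # 200
```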

2.4. Feature Selection

Principal component analysis (PCA) [14] was used for dimensionality reduction on the FFT features. Data was first standardized using Z-score normalization. The PCA process involves standardization, covariance analysis, eigen-decomposition, and projection onto principal components.
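The standardization and reduction steps above can be sketched with scikit-learn; the matrix below is random stand-in data (not the study's features), and the 90% variance threshold mirrors the retention level reported in Section 3:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in feature matrix: 300 clips x 8000 FFT bins (random, illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8000))

# Z-score standardization, then PCA keeping >= 90% of the explained variance,
# mirroring the steps described above.
reducer = make_pipeline(StandardScaler(), PCA(n_components=0.90, svd_solver="full"))
X_reduced = reducer.fit_transform(X)
print(X_reduced.shape)  # (300, k) with k chosen to reach 90% variance
```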

2.5. Model Comparison with Optimized Hyperparameters

Several ML models were evaluated, including K-nearest neighbors (KNN) [15,16], support vector machines (SVMs), decision tree, random forest [17], Gaussian process classification (GPC) [18], and ensembles (stacking and voting). Hyperparameter optimization was performed via grid search and cross-validation.
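A minimal sketch of hyperparameter optimization via grid search with cross-validation, using scikit-learn on synthetic stand-in data; the search space below is hypothetical, as the paper does not list its exact grids:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the FFT feature matrix
X, y = make_classification(n_samples=400, n_features=50, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("pca", PCA()), ("clf", SVC())])

# Hypothetical grid: PCA components and SVM hyperparameters
param_grid = {
    "pca__n_components": [10, 20, 30],
    "clf__C": [0.1, 1, 10],
    "clf__kernel": ["rbf", "linear"],
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X_tr, y_tr)
print(search.best_params_)
print(round(search.score(X_te, y_te), 3))  # held-out F1 of the best pipeline
```

The same pipeline object would be reused per model, swapping the `clf` step and its grid.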

3. Results

3.1. Training Performance Evaluation

Firstly, the optimal number of components for each model (excluding decision tree and random forest) is analyzed. Starting from the original 8000 features per sample, PCA reduced them to between 80 and 200 components while preserving more than 90% of the explained variance. This was reflected in the predictive results of each model: dimensionality reduction did not compromise performance.
Since the number of PCA components is the only hyperparameter shared by most models (others vary by algorithm), the analysis will focus on evaluation metrics. Although accuracy, precision, F1 score, and recall were calculated, Table 2 shows only accuracy and F1 score.
This selection is justified by the class balance, where accuracy is representative, as it is not biased by dominant classes. The F1 score, by combining precision and recall, evaluates the balance between false positives and negatives, allowing for a concise evaluation without loss of relevant information.
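For concreteness, the reported metrics can be computed with scikit-learn; the labels below are toy values (1 = drone, 0 = ambient) chosen so the classes are balanced, as in the datasets:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy balanced labels: 1 = drone, 0 = ambient (illustrative only)
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]  # one false negative, one false positive

print(accuracy_score(y_true, y_pred))   # 0.75
print(f1_score(y_true, y_pred))         # 0.75
print(precision_score(y_true, y_pred))  # 0.75
print(recall_score(y_true, y_pred))     # 0.75
```

With balanced classes and symmetric errors the metrics coincide, illustrating why accuracy and F1 score suffice as a summary here.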
Both KNN and decision tree achieve comparable performance; however, it is important to note that the decision tree is an unstable algorithm (sensitive to small variations in the data), so point metrics alone are not sufficient. An evaluation verifying the instability of the decision tree and other models is presented later.
Random forest was proposed to mitigate decision tree problems and, as results show, it surpasses its performance.
On the other hand, GPC shows slightly lower performance compared to random forest (2% less accuracy). This model requires special care, since although increasing data might improve its performance, it could also increase its computational cost, which contradicts the efficiency requirement in drone detection systems.
As a final point, the two models with the best performance are presented: SVM and ensemble. Although both reach approximately 97%, the SVM slightly outperforms the ensemble in both metrics.
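The voting and stacking ensembles evaluated here can be sketched as follows, with hypothetical base learners drawn from the compared models and synthetic stand-in data; the study's exact ensemble composition may differ:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=1)

# Hypothetical base learners taken from the models compared above
base = [
    ("svm", SVC(probability=True)),  # probabilities needed for soft voting
    ("rf", RandomForestClassifier(random_state=1)),
    ("knn", KNeighborsClassifier()),
]

results = {}
for name, model in [
    ("voting", VotingClassifier(estimators=base, voting="soft")),
    ("stacking", StackingClassifier(estimators=base)),
]:
    results[name] = cross_val_score(model, X, y, cv=5).mean()
    print(name, round(results[name], 3))
```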

3.2. External Validation with Test Data

The previous subsection analyzed the results from the training set. This section aims to complement those results by evaluating the models with test data.
Table 3 shows that the GPC model maintains consistent results between training and testing. In contrast, KNN increases its accuracy on Database I but decreases on Database II, suggesting that adding samples from the Drone class does not improve its generalization. The decision tree, whose instability was noted above, confirms this behavior: its performance decreased on new data, although it improved on Database II when more Drone samples were included (similar to KNN). The SVM, random forest, and ensemble (voting/stacking) models stand out for their excellent performance, especially on Database II, where the additional samples from a different drone improved their results.

4. Discussion

Current drone detection systems require compliance with certain characteristics to be considered efficient. Therefore, this study evaluated a set of machine learning models on two customized databases to find models capable of efficiently detecting these unmanned aerial vehicles (UAVs).
The results showed that SVM, random forest, and voting ensemble models are the most suitable for this task, maintaining good performance even when expanding the database with new types of drones. It is worth noting that although SVM alone achieved high accuracy percentages, the implementation of other models should not be dismissed due to their complementary performance.
For real-time or edge-computing deployments, computational efficiency is crucial. Models like SVM and random forest, while accurate, may have higher inference demands compared to a simpler KNN.

Author Contributions

L.V.J., G.S.P., J.P.-P., L.K.T.M., A.H.S., J.O.M. and H.M.P.M. contributed equally to the conception, writing, and review of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable for studies not involving humans or animals.

Informed Consent Statement

Not applicable for studies not involving humans.

Data Availability Statement

The data supporting the reported results are available upon request from the corresponding author. The data are not publicly available due to privacy restrictions.

Acknowledgments

The authors would like to thank the organizers of the First Summer School on Artificial Intelligence in Cybersecurity for providing the academic framework that inspired this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. MAXIMIXE. ¿Qué se Sabe del Mercado de Drones a Nivel Mundial? 2024. Available online: https://alertaeconomica.com/que-se-sabe-del-mercado-de-drones-a-nivel-mundial/ (accessed on 10 July 2025).
  2. Dron Pasa Muy Cerca de un Avión de Pasajeros en Los Ángeles. 2016. Available online: https://latimes.com/espanol/eeuu/articulo/2016-03-19/hoyla-loc-dron-pasa-muy-cerca-de-avin-de-pasajeros-en-los-ngeles-20160319 (accessed on 18 July 2025).
  3. Méndez, M.Á. Un Dron Hackeado se Desploma Hiriendo a una Atleta en Australia. 2020. Available online: https://es.gizmodo.com/un-dron-hackeado-se-desploma-hiriendo-a-una-atleta-en-a-1559691586 (accessed on 16 July 2025).
  4. Cuál es el Nuevo uso que los Cárteles dan a los Drones en la Frontera con EEUU. 2023. Available online: https://www.infobae.com/mexico/2023/02/10/cual-es-el-nuevo-uso-que-los-carteles-dan-a-los-drones-en-la-frontera-con-eeuu/ (accessed on 18 July 2025).
  5. Wang, Y.; Chu, Z.; Ku, I.; Smith, E.C.; Matson, E.T. A Large-Scale UAV Audio Dataset and Audio-Based UAV Classification Using CNN. In Proceedings of the 2022 Sixth IEEE International Conference on Robotic Computing (IRC), Naples, Italy, 5–7 December 2022; pp. 186–189. [Google Scholar] [CrossRef]
  6. Al-Emadi, S.; Al-Senaid, F. Drone Detection Approach Based on Radio-Frequency Using Convolutional Neural Network. In Proceedings of the 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT), Doha, Qatar, 2–5 February 2020; pp. 29–34. [Google Scholar] [CrossRef]
  7. Al-Emadi, S.; Al-Ali, A.; Mohammad, A.; Al-Ali, A. Audio Based Drone Detection and Identification using Deep Learning. In Proceedings of the 2019 15th International Wireless Communications & Mobile Computing Conference (IWCMC), Tangier, Morocco, 24–28 June 2019; pp. 459–464. [Google Scholar] [CrossRef]
  8. IBM. ¿Qué es el Aprendizaje Automático? 2024. Available online: https://www.ibm.com/mx-es/topics/machine-learning (accessed on 1 August 2025).
  9. Al-Emadi, S.A.; Al-Ali, A.K.; Al-Ali, A.; Mohamed, A. Audio Based Drone Detection Using Deep Learning. IWCMC 2019. 2019. Available online: https://github.com/saraalemadi/DroneAudioDataset (accessed on 1 July 2025).
  10. Akbal, E.; Akbal, A.; Dogan, S.; Tuncer, T. An automated accurate sound-based amateur drone detection method based on skinny pattern. Digit. Signal Process. 2023, 136, 104012. [Google Scholar] [CrossRef]
  11. Brigham, E.O. The Fast Fourier Transform; Prentice-Hall: Englewood Cliffs, NJ, USA, 1974. [Google Scholar]
  12. Cochran, W.T.; Cooley, J.W.; Favin, D.L.; Helms, H.D.; Kaenel, R.A.; Lang, W.W.; Maling, G.C.; Nelson, D.E.; Rader, C.M.; Welch, P.D. What is the fast Fourier transform? Proc. IEEE 1967, 55, 1664–1674. [Google Scholar] [CrossRef]
  13. Wang, Y.; Ma, H.; Wei, S.; Zhang, S.; Feng, Z.; Wei, Z. Sound Detection and Alarm System of Unmanned Aerial Vehicle. In Recent Developments in Intelligent Computing, Communication and Devices, Proceedings of the ICCD 2017, Shenzhen, China, 9–10 December 2017; Patnaik, S., Jain, V., Eds.; Springer: Singapore, 2019; pp. 885–898. [Google Scholar]
  14. Abdi, H.; Williams, L.J. Principal component analysis. WIREs Comp. Stat. 2010, 2, 433–459. [Google Scholar] [CrossRef]
  15. Zhang, S.; Li, X.; Zong, M.; Zhu, X.; Wang, R. Efficient kNN Classification With Different Numbers of Nearest Neighbors. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 1774–1785. [Google Scholar] [CrossRef] [PubMed]
  16. Kataria, A.; Singh, M.D. A review of data classification using k-nearest neighbour algorithm. Int. J. Emerg. Technol. Adv. Eng. 2013, 3, 354–360. [Google Scholar]
  17. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  18. Jakkala, K. Deep Gaussian Processes: A Survey. arXiv 2021, arXiv:2106.12135. [Google Scholar] [CrossRef]
Figure 1. Database architecture diagram [9,10].
Table 1. Technical specifications of the audio recordings.
Sampling Rate    Bit Rate    Channel Type    Audio Format
16 kHz           16 kbps     Mono            wav
Table 2. Performance metrics (% mean ± standard deviation).
Model              Database I        Database II
KNN
   Accuracy        94.91 ± 1.18      94.38 ± 1.27
   F1 score        94.90 ± 1.19      94.37 ± 1.27
SVM
   Accuracy        97.45 ± 1.39      97.71 ± 1.51
   F1 score        97.45 ± 1.39      97.70 ± 1.51
Decision Tree
   Accuracy        94.30 ± 1.52      93.43 ± 0.86
   F1 score        94.30 ± 1.52      93.43 ± 0.86
Random Forest
   Accuracy        96.48 ± 1.35      95.22 ± 0.46
   F1 score        96.48 ± 1.35      95.21 ± 0.46
GPC
   Accuracy        95.50 ± 0.93      96.05 ± 1.83
   F1 score        95.50 ± 0.93      96.03 ± 1.84
Voting
   Accuracy        97.32 ± 1.27      97.39 ± 1.32
   F1 score        97.32 ± 1.27      97.38 ± 1.32
Stacking
   Accuracy        97.45 ± 1.37      97.58 ± 1.42
   F1 score        97.45 ± 1.37      97.58 ± 1.42
Table 3. Accuracy (%) of models on test data.
Model            Database I    Database II
KNN              98.4          93.5
SVM              98.2          99.1
Decision Tree    92.3          96.2
Random Forest    98.0          98.6
GPC              92.9          93.8
Voting           98.2          99.7
Stacking         94.7          99.6
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
