Region-Wise Recognition and Classification of Arabic Dialects and Vocabulary: A Deep Learning Approach
Abstract
1. Introduction
Research Gap
2. Literature Review
2.1. Audio MFCCs Generation
2.2. Feature Selection
2.3. Augmentation Method
2.4. Deep Learning Models
3. Methodology
3.1. Data Collection
3.2. Automatic Dialect Identification
3.2.1. Data Preprocessing
3.2.2. Feature Extraction
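Since the pipeline's features are MFCCs (Section 2.1 and the tables below), a minimal extraction sketch with librosa follows; the 40-coefficient setting, the 16 kHz sampling rate, and the time-averaging step are illustrative assumptions, not the paper's confirmed configuration.

```python
# Hedged sketch: extract a fixed-length MFCC feature vector per audio clip.
# n_mfcc=40 and sr=16000 are assumptions for illustration only.
import numpy as np
import librosa

def mfcc_features(path: str, sr: int = 16000, n_mfcc: int = 40) -> np.ndarray:
    signal, sr = librosa.load(path, sr=sr)             # load and resample
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                           # average over frames

# Example: build a feature matrix from a list of WAV paths (hypothetical).
# X = np.stack([mfcc_features(p) for p in wav_paths])
```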
3.2.3. Feature Selection
3.2.4. Data Augmentation
- ADASYN: This technique oversamples the minority class (GLF) by generating synthetic samples close to the decision boundary. It raised the number of GLF samples from 298 to 3762, matching the 3762 MSA samples, which were left unchanged.
- Random oversampling: This technique randomly duplicates minority-class (GLF) samples until they match the majority class (MSA), likewise yielding 3762 GLF samples.
- Synthetic minority oversampling technique (SMOTE): Like ADASYN, SMOTE generates synthetic minority-class (GLF) samples, but by interpolating between existing ones; it also brought the GLF count to 3762. A minimal sketch of all three samplers follows this list.
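The sketch below applies the three samplers with the imbalanced-learn library; the class counts mirror the numbers above, but the variable names and the 40-dimensional placeholder features are illustrative assumptions.

```python
# Hedged sketch of the three oversampling strategies with imbalanced-learn.
# X stands in for the MFCC feature matrix, y for binary dialect labels
# (0 = MSA, 1 = GLF); the random 40-dimensional features are placeholders.
import numpy as np
from imblearn.over_sampling import ADASYN, RandomOverSampler, SMOTE

rng = np.random.default_rng(42)
X = rng.normal(size=(4060, 40))          # 3762 MSA + 298 GLF clips
y = np.array([0] * 3762 + [1] * 298)

for sampler in (ADASYN(random_state=42),
                RandomOverSampler(random_state=42),
                SMOTE(random_state=42)):
    X_res, y_res = sampler.fit_resample(X, y)
    print(type(sampler).__name__, np.bincount(y_res))
```

Note that ADASYN balances the classes only approximately, since it adapts the number of synthetic samples to the local data density, whereas RandomOverSampler and SMOTE match the majority count exactly.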
4. Model Architectures
4.1. ANN
4.2. CNNs
5. Model Training
5.1. Hyperparameter Tuning
5.2. Model Accuracy and Loss
5.3. Effects of Augmentation Methods
6. Results and Discussion
6.1. Binary Classification
6.2. Multiclass Classification
6.3. Discussion
7. Conclusions
Future Scope
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Chittaragi, N.B.; Koolagudi, S.G. Dialect identification using chroma-spectral shape features with ensemble technique. Comput. Speech Lang. 2021, 70, 101230. [Google Scholar] [CrossRef]
- Lulu, L.; Elnagar, A. Automatic Arabic dialect classification using deep learning models. Procedia Comput. Sci. 2018, 142, 262–269. [Google Scholar] [CrossRef]
- Bent, T.; Atagi, E.; Akbik, A.; Bonifield, E. Classification of regional dialects, international dialects, and nonnative accents. J. Phon. 2016, 58, 104–117. [Google Scholar] [CrossRef]
- Michon, E.; Pham, M.Q.; Crego, J.M.; Senellart, J. Neural network architectures for Arabic dialect identification. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), Santa Fe, NM, USA, 20 August 2018; pp. 128–136. [Google Scholar]
- Tibi, N. Automatic Arabic dialect identification using deep learning. In Proceedings of the 2022 IEEE Information Technologies & Smart Industrial Systems (ITSIS), Paris, France, 15–17 July 2022; pp. 1–5. [Google Scholar]
- Adel, B.; Meftah, M.C.E.; Laouid, A.; Chait, K.; Kara, M. Using transformers to classify Arabic dialects on social networks. In Proceedings of the 2024 6th International Conference on Pattern Analysis and Intelligent Systems (PAIS), El Oued, Algeria, 24–25 April 2024; pp. 1–7. [Google Scholar]
- Althobaiti, M.J. Automatic Arabic dialect identification systems for written texts: A survey. arXiv 2020, arXiv:2009.12622. [Google Scholar]
- Elnagar, A.; Yagi, S.M.; Nassif, A.B.; Shahin, I.; Salloum, S.A. Systematic literature review of dialectal Arabic: Identification and detection. IEEE Access 2021, 9, 31010–31042. [Google Scholar] [CrossRef]
- Zaidan, O.F.; Callison-Burch, C. Arabic dialect identification. Comput. Linguist. 2014, 40, 171–202. [Google Scholar] [CrossRef]
- Nahar, K.M.; Al-Hazaimeh, O.M.; Abu-Ein, A.; Al-Betar, M.A. Arabic dialect identification using different machine learning methods. arXiv 2022. [Google Scholar] [CrossRef]
- Kanjirangat, V.; Samardzic, T.; Dolamic, L.; Rinaldi, F. NLP_DI at NADI 2024 shared task: Multi-label Arabic Dialect Classifications with an Unsupervised Cross-Encoder. In Proceedings of the Second Arabic Natural Language Processing Conference, Bangkok, Thailand, 16 August 2024; pp. 742–747. [Google Scholar]
- Abdelazim, M.; Hussein, W.; Badr, N. Automatic Dialect identification of Spoken Arabic Speech using Deep Neural Networks. Int. J. Intell. Comput. Inf. Sci. 2022, 22, 25–34. [Google Scholar] [CrossRef]
- Bird, J.J.; Faria, D.R.; Premebida, C.; Ekárt, A.; Ayrosa, P.P. Overcoming data scarcity in speaker identification: Dataset augmentation with synthetic MFCCs via character-level RNN. In Proceedings of the 2020 IEEE International Conference on Autonomous Robot Systems and Competitions (ICARSC), Ponta Delgada, Portugal, 15–16 April 2020; pp. 146–151. [Google Scholar]
- Tibi, N.; Messaoud, M.A.B. Arabic dialect classification using an adaptive deep learning model. Bull. Electr. Eng. Inform. 2025, 14, 1108–1116. [Google Scholar] [CrossRef]
- Rastogi, U.; Mahapatra, R.P.; Kumar, S. Advancements in Machine Learning Techniques for Hand Gesture-Based Sign Language Recognition: A Comprehensive Review. Arch. Comput. Methods Eng. 2025, 1–38. [Google Scholar] [CrossRef]
- Juvela, L.; Bollepalli, B.; Wang, X.; Kameoka, H.; Airaksinen, M.; Yamagishi, J.; Alku, P. Speech waveform synthesis from MFCC sequences with generative adversarial networks. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Canada, 15–20 April 2018; pp. 5679–5683. [Google Scholar]
- Hamza, A.; Javed, A.R.R.; Iqbal, F.; Kryvinska, N.; Almadhor, A.S.; Jalil, Z.; Borghol, R. Deepfake audio detection via MFCC features using machine learning. IEEE Access 2022, 10, 134018–134028. [Google Scholar] [CrossRef]
- Rezaul, K.M.; Jewel, M.; Islam, M.S.; Siddiquee, K.; Barua, N.; Rahman, M.; Sulaiman, R.; Shaikh, M.; Hamim, M.; Tanmoy, F. Enhancing Audio Classification Through MFCC Feature Extraction and Data Augmentation with CNN and RNN Models. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 37–53. [Google Scholar] [CrossRef]
- Borah, P.; Ahmed, H.A.; Bhattacharyya, D.K. A statistical feature selection technique. Netw. Model. Anal. Health Inform. Bioinform. 2014, 3, 55. [Google Scholar] [CrossRef]
- Chandra, B.; Gupta, M. An efficient statistical feature selection approach for classification of gene expression data. J. Biomed. Inform. 2011, 44, 529–535. [Google Scholar] [CrossRef]
- Van Hulse, J.; Khoshgoftaar, T.M.; Napolitano, A.; Wald, R. Threshold-based feature selection techniques for high-dimensional bioinformatics data. Netw. Model. Anal. Health Inform. Bioinform. 2012, 1, 47–61. [Google Scholar] [CrossRef]
- Zhu, L. Selection of multi-level deep features via spearman rank correlation for synthetic aperture radar target recognition using decision fusion. IEEE Access 2020, 8, 133914–133927. [Google Scholar] [CrossRef]
- Samb, M.L.; Camara, F.; Ndiaye, S.; Slimani, Y.; Esseghir, M.A. A novel RFE-SVM-based feature selection approach for classification. Int. J. Adv. Sci. Technol. 2012, 43, 27–36. [Google Scholar]
- Chen, X.; Gong, Z.; Huang, D.; Jiang, N.; Zhang, Y. Overcoming Class Imbalance in Network Intrusion Detection: A Gaussian Mixture Model and ADASYN Augmented Deep Learning Framework. In Proceedings of the 2024 4th International Conference on Internet of Things and Machine Learning, Nanchang, China, 9–11 August 2024; pp. 48–53. [Google Scholar]
- Beinecke, J.; Heider, D. Gaussian noise up-sampling is better suited than SMOTE and ADASYN for clinical decision making. BioData Min. 2021, 14, 49. [Google Scholar] [CrossRef]
- Salehpour, A.; Norouzi, M.; Balafar, M.A.; SamadZamini, K. A cloud-based hybrid intrusion detection framework using XGBoost and ADASYN-Augmented random forest for IoMT. IET Commun. 2024, 18, 1371–1390. [Google Scholar] [CrossRef]
- Nhita, F.; Kurniawan, I. Performance and Statistical Evaluation of Three Sampling Approaches in Handling Binary Imbalanced Data Sets. In Proceedings of the 2023 International Conference on Data Science and Its Applications (ICoDSA), Bandung, Indonesia, 9–10 August 2023; pp. 420–425. [Google Scholar]
- Lee, T.; Kim, M.; Kim, S.-P. Data augmentation effects using borderline-SMOTE on classification of a P300-based BCI. In Proceedings of the 2020 8th International Winter Conference on Brain-Computer Interface (BCI), Gangwon, Republic of Korea, 26–28 February 2020; pp. 1–4. [Google Scholar]
- Chen, Y.; Chang, R.; Guo, J. Effects of data augmentation method borderline-SMOTE on emotion recognition of EEG signals based on convolutional neural network. IEEE Access 2021, 9, 47491–47502. [Google Scholar] [CrossRef]
- Dolka, H.; VM, A.X.; Juliet, S. Speech emotion recognition using ANN on MFCC features. In Proceedings of the 2021 3rd International Conference on Signal Processing and Communication (ICPSC), Coimbatore, India, 13–14 May 2021; pp. 431–435. [Google Scholar]
- Hazra, S.K.; Ema, R.R.; Galib, S.M.; Kabir, S.; Adnan, N. Emotion recognition of human speech using deep learning method and MFCC features. Radioelectron. Comput. Syst. 2022, 18, 161–172. [Google Scholar] [CrossRef]
- Barua, P.; Ahmad, K.; Khan, A.A.S.; Sanaullah, M. Neural network based recognition of speech using MFCC features. In Proceedings of the 2014 International Conference on Informatics, Electronics & Vision (ICIEV), Dhaka, Bangladesh, 23–24 May 2014; pp. 1–6. [Google Scholar]
- Singh, Y.B.; Goel, S. 1D CNN based approach for speech emotion recognition using MFCC features. In Artificial Intelligence and Speech Technology; CRC Press: Boca Raton, FL, USA, 2021; pp. 347–354. [Google Scholar]
- Lai, H.-Y.; Hu, C.-C.; Wen, C.-H.; Wu, J.-X.; Pai, N.-S.; Yeh, C.-Y.; Lin, C.-H. Mel-Scale Frequency Extraction and Classification of Dialect-Speech Signals with 1D CNN based Classifier for Gender and Region Recognition. IEEE Access 2024, 12, 102962–102976. [Google Scholar] [CrossRef]
- Reggiswarashari, F.; Sihwi, S.W. Speech emotion recognition using 2D-convolutional neural network. Int. J. Electr. Comput. Eng. 2022, 12, 6594. [Google Scholar] [CrossRef]
- Annabel, L.S.P.; Thulasi, V. Environmental Sound Classification Using 1-D and 2-D Convolutional Neural Networks. In Proceedings of the 2023 7th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 22–24 November 2023; pp. 1242–1247. [Google Scholar]
- Zvarevashe, K.; Olugbara, O.O. Recognition of speech emotion using custom 2D-convolution neural network deep learning algorithm. Intell. Data Anal. 2020, 24, 1065–1086. [Google Scholar] [CrossRef]
- Im, S.-K.; Chan, K.-H. Neural Machine Translation with CARU-Embedding Layer and CARU-Gated Attention Layer. Mathematics 2024, 12, 997. [Google Scholar] [CrossRef]
- Li, Y.; Wang, Y.; Yang, X.; Im, S.-K. Speech emotion recognition based on Graph-LSTM neural network. EURASIP J. Audio Speech Music Process. 2023, 2023, 40. [Google Scholar] [CrossRef]
Literature Reference | Model Used | Scope | Data Used | Results |
---|---|---|---|---|
2018 [2] | 1. Long short-term memory (LSTM) 2. Convolutional neural networks (CNNs) 3. Bi-directional LSTM (BLSTM) 4. Convolutional LSTM | Dialect classification using deep learning models. | Arabic Online Commentary (AOC) | LSTM = 71.4%, CNN = 68.0%, BLSTM = 70.9%, CLSTM = 71.1% |
2022 [5] | CNN | Arabic dialect identification | Linguistic Data Consortium (LDC) | CNN = 83% |
2018 [4] | CNN | Neural network architectures for Arabic dialect identification | ADI dataset | CNN F1-score = 0.5289 |
2022 [10] | K-Nearest Neighbors (KNN), Random Forest (RF), Multi-Layer Perceptron (MLP), Artificial Neural Network (ANN) | Dialect identification using machine learning models | ADI17 corpus | MFCCs: KNN = 76%, RF = 64%, ANN = 41%, MLP = 34%; TFCCs: KNN = 62%, RF = 60%, ANN = 42%, MLP = 33% |
2022 [12] | Gaussian Naïve Bayes, Support Vector Machine (SVM), Recurrent Neural Network (RNN), Deep Neural Network (DNN) | Spoken Arabic dialect identification | Multi-dialect Arabic speech parallel corpus | RNN = 67%, SVM = 73%, Naïve Bayes = 70%, DNN = 72% |
2020 [13] | RNN | Speaker identification | Flickr8k dataset | RNN = 99% |
2018 [16] | Generative Adversarial Networks (GANs) | MFCC speech synthesis | - | MFCC generation with a correlation of 0.9969 |
2022 [17] | SVM, VGG-16 | Deepfake audio detection | For-Rerec, For-2sec, For-Norm, and Fake-or-Real datasets | SVM = 97.57% on For-2sec; gradient boosting classifier = 92.63% on For-Norm; SVM = 98.83% (highest) on For-Rerec |
Augmentation Method | Model | N * | S * | Test Size | Random State | Activation | Total Parameters | Trainable | Non-Trainable | Optimizer |
---|---|---|---|---|---|---|---|---|---|---|
No augmentation | ANN | 3256 | 815 | 0.2 | 42 | ReLU | 1,295,905 | 1,295,905 | 0 | Adam |
No augmentation | 1D CNN | - | - | 0.2 | 42 | - | - | - | - | Adam |
No augmentation | 2D CNN | - | - | 0.2 | 42 | - | - | - | - | Adam |
ADASYN | ANN | 5990 | 1498 | 0.2 | 42 | ReLU | 1,287,752 | 1,287,752 | 0 | Adam |
ADASYN | 1D CNN | - | - | 0.1 | 387 | - | 612,298 | 611,786 | 512 | Adam |
ADASYN | 2D CNN | - | - | 0.2 | 128 | tanh | 1,134,346 | 1,134,346 | 0 | Adam |
RandomOverSampler | ANN | 3247 | 812 | 0.2 | 42 | ReLU | 1,287,752 | 1,287,752 | 0 | Adam |
RandomOverSampler | 1D CNN | - | - | 0.2 | 387 | - | 612,298 | 611,786 | 512 | Adam |
RandomOverSampler | 2D CNN | - | - | 0.2 | 128 | tanh | 1,134,346 | 1,134,346 | 0 | Adam |
SMOTE | ANN | 5990 | 1498 | 0.2 | 42 | ReLU | 1,287,752 | 1,287,752 | 0 | Adam |
SMOTE | 1D CNN | - | - | 0.2 | 387 | - | 612,298 | 611,786 | 512 | Adam |
SMOTE | 2D CNN | - | - | 0.2 | 128 | tanh | 1,134,346 | 1,134,346 | 0 | Adam |

* N: number of training samples; S: number of test samples.
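To make the ANN rows above concrete, here is a minimal Keras sketch with ReLU hidden layers, a softmax output, and the Adam optimizer as reported; the hidden-layer widths and the dropout rate are illustrative assumptions and will not reproduce the exact parameter counts in the table.

```python
# Hedged sketch of an ANN like the one configured above (ReLU + Adam).
# Layer widths are assumptions; model.summary() reports the total,
# trainable, and non-trainable parameter counts as tabulated.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_ann(n_features: int = 40, n_classes: int = 2) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(512, activation="relu"),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.3),                  # regularization (assumed)
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_ann()
model.summary()
```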
Augmentation Method | Model | Verbose | Patience | Batch Size | Minimum Learning Rate (min_lr) | Epochs | Validation Split |
---|---|---|---|---|---|---|---|
No augmentation | ANN | 1 | 10 | 128 | 0.000001 | 30 | 0.2 |
No augmentation | 1D CNN | 0 | - | 32 | - | 50 | 0.2 |
No augmentation | 2D CNN | 0 | - | 32 | - | 50 | 0.2 |
ADASYN | ANN | 1 | 10 | 128 | 0.000001 | 30 | 0.2 |
ADASYN | 1D CNN | 1 | 10 | 128 | 0.000001 | 30 | 0.2 |
ADASYN | 2D CNN | 1 | 10 | 128 | 0.000001 | 30 | 0.2 |
RandomOverSampler | ANN | 1 | 10 | 128 | 0.000001 | 30 | 0.2 |
RandomOverSampler | 1D CNN | 1 | 10 | 128 | 0.000001 | 30 | 0.2 |
RandomOverSampler | 2D CNN | 1 | 10 | 128 | 0.000001 | 30 | 0.2 |
SMOTE | ANN | 1 | 10 | 128 | 0.000001 | 30 | 0.2 |
SMOTE | 1D CNN | 1 | 10 | 128 | 0.000001 | 30 | 0.2 |
SMOTE | 2D CNN | 1 | 10 | 128 | 0.000001 | 30 | 0.2 |
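Read alongside the table, the Patience and Minimum Learning Rate columns suggest EarlyStopping and ReduceLROnPlateau callbacks; that mapping, and the reduction factor below, are assumptions rather than settings stated by the paper. The sketch continues from the oversampling and model-building sketches above (X_res, y_res, model).

```python
# Hedged sketch of the training configuration tabulated above.
from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# X_res, y_res: (augmented) feature matrix and labels from earlier steps.
X_train, X_test, y_train, y_test = train_test_split(
    X_res, y_res, test_size=0.2, random_state=42)

callbacks = [
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
    ReduceLROnPlateau(monitor="val_loss", factor=0.5,   # factor is assumed
                      patience=5, min_lr=1e-6),
]

history = model.fit(X_train, y_train,
                    epochs=30, batch_size=128, verbose=1,
                    validation_split=0.2, callbacks=callbacks)
```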
Augmentation Method | Model | Training Accuracy | Training Loss | Test Accuracy | Test Loss | Misclassification Rate | Matthews Correlation Coefficient (MCC) | Intersection over Union (IoU) | Dice Similarity Coefficient (DSC) |
---|---|---|---|---|---|---|---|---|---|
No augmentation | ANN | 0.9322 | 0.16448 | 0.9415 | 0.1446 | 0.066502 | 0.3570 | 0.9322 | 0.9649 |
No augmentation | 1D CNN | 0.9322 | 0.16448 | 0.9415 | 0.1446 | 0.070197 | 0.331 | 0.9285 | 0.9629 |
No augmentation | 2D CNN | 0.9175 | 0.24438 | 0.94307 | 0.1719 | 0.073892 | 0.0877 | 0.9259 | 0.9615 |
ADASYN | ANN | 0.8500 | 0.3000 | 0.8200 | 0.3300 | 0.157748 | 0.6964 | 0.7494 | 0.8568 |
ADASYN | 1D CNN | 0.8800 | 0.2500 | 0.8600 | 0.2800 | 0.192924 | 0.6289 | 0.7018 | 0.8247 |
ADASYN | 2D CNN | 0.9000 | 0.2000 | 0.8800 | 0.2300 | 0.152118 | 0.7070 | 0.7446 | 0.8536 |
RandomOverSampler | ANN | 0.9447 | 0.1457 | 0.9554 | 0.1503 | 0.05665 | 0.4544 | 0.9421 | 0.9702 |
RandomOverSampler | 1D CNN | 0.9366 | 0.1475 | 0.9477 | 0.1550 | 0.060345 | 0.3999 | 0.9386 | 0.9683 |
RandomOverSampler | 2D CNN | 0.943 | 0.1419 | 0.9554 | 0.1441 | 0.061576 | 0.4034 | 0.9372 | 0.9676 |
SMOTE | ANN | 0.8200 | 0.3677 | 0.8322 | 0.3703 | 0.175567 | 0.6575 | 0.7190 | 0.8365 |
SMOTE | 1D CNN | 0.8187 | 0.4003 | 0.8264 | 0.3841 | 0.060345 | 0.3999 | 0.222 | 0.3636 |
SMOTE | 2D CNN | 0.8909 | 0.2570 | 0.8781 | 0.2867 | 0.124833 | 0.7551 | 0.7875 | 0.8811 |
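For reference, the table's evaluation metrics can be computed from binary predictions with scikit-learn, as in the hedged sketch below; for binary labels, the Dice similarity coefficient coincides with the F1-score and IoU with the Jaccard score. Scoring class 0 (MSA) is an assumption here.

```python
# Sketch of the tabulated metrics, assuming model / X_test / y_test from
# the training sketch above.
from sklearn.metrics import (accuracy_score, matthews_corrcoef,
                             jaccard_score, f1_score)

y_pred = model.predict(X_test).argmax(axis=1)   # softmax -> class labels

misclassification_rate = 1.0 - accuracy_score(y_test, y_pred)
mcc = matthews_corrcoef(y_test, y_pred)
iou = jaccard_score(y_test, y_pred, pos_label=0)   # scored class assumed
dice = f1_score(y_test, y_pred, pos_label=0)       # Dice == F1 for binary

print(f"misclassification={misclassification_rate:.4f}, MCC={mcc:.4f}, "
      f"IoU={iou:.4f}, Dice={dice:.4f}")
```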
Class | Precision | Recall | F1-Score |
---|---|---|---|
Class 0 | 0.9468 | 0.9920 | 0.9689 |
Class 1 | 0.7272 | 0.2758 | 0.4000 |
Accuracy | 0.9408 | 0.9408 | 0.9408 |
Macro avg | 0.8370 | 0.6339 | 0.6844 |
Weighted avg | 0.9311 | 0.9408 | 0.9282 |
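The layout of this per-class report matches scikit-learn's classification_report; a one-line sketch, assuming the y_test and y_pred arrays from the evaluation sketch above:

```python
# Reproduce a per-class precision/recall/F1 report like the table above.
from sklearn.metrics import classification_report

print(classification_report(y_test, y_pred, digits=4))
```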
Reference | Year | Dataset | Model | Accuracy |
---|---|---|---|---|
Automatic Arabic Dialect Classification Using Deep Learning [2] | 2018 | Arabic Online Commentary (AOC) | LSTM | 84.50% |
Automatic Arabic Dialect Identification Using Deep Learning [5] | 2022 | Linguistic Data Consortium (LDC) | Multi-scale product (MP) | 76.00% |
Arabic Dialect Identification Using Different Machine Learning Methods [10] | 2022 | ADI17 | KNN | 76.00% |
Neural Network Architectures for Arabic Dialect Identification [4] | 2018 | ADI dataset | CNN | 52.00% |
Automatic Dialect identification of Spoken Arabic Speech using Deep Neural Networks [12] | 2022 | Multi-dialect Arabic speech parallel corpus | RNN-CNN | 62.75% |
Using Transformers to Classify Arabic Dialects [6] | 2024 | Social network applications | BERT and DistilBERT | BERT = 85%, DistilBERT = 76% |
Method proposed in this study | 2025 | Own dataset | ANN with RandomOverSampler augmentation | 95.50% |