A Novel Fuzzy-Logic-Based Multi-Criteria Metric for Performance Evaluation of Spam Email Detection Algorithms
Abstract
1. Introduction
- 1. As a major contribution, a fuzzy-logic-based performance measure is proposed. This measure combines accuracy, precision, and recall into a single fuzzy function using the Unified And-Or (UAO) fuzzy operator. Several fuzzy decision rules are developed for use in the performance evaluation process. To the best of our knowledge, no such attempt has been reported in the literature. Although the proposed measure is evaluated in the context of spam email detection, it is generic and can be used in any problem domain concerned with performance evaluation.
- 2. As a proof of concept, two deep learning models are used for comparative performance evaluation: Bidirectional Encoder Representations from Transformers (BERT) and Long Short-Term Memory (LSTM). The use of BERT and LSTM is motivated by the fact that these models have seen limited use for spam detection (as highlighted in the literature review in Section 2). This is a minor contribution that supports the major contribution listed above.
- 3.
2. Literature Review
3. Brief Background of BERT and LSTM
3.1. Bidirectional Encoder Representations from Transformers (BERT)
3.2. Long Short-Term Memory (LSTM)
4. Fuzzy-Logic-Based Multi-Criteria Evaluation Metric for Spam Detection Approaches
4.1. Decision Criteria for the Spam Detection Metric
- True positive (TP): a legitimate email correctly classified as legitimate.
- True negative (TN): a spam email correctly classified as spam.
- False positive (FP): a legitimate email incorrectly classified as spam.
- False negative (FN): a spam email incorrectly classified as legitimate.
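The three decision criteria are derived from these four counts. The following is a minimal sketch using the standard textbook definitions of accuracy, precision, and recall; the class name `ConfusionCounts` and the example counts are hypothetical and are not taken from the experiments reported here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for the four outcome counts of a classifier."""
    tp: int  # true positives
    tn: int  # true negatives
    fp: int  # false positives
    fn: int  # false negatives

    def accuracy(self) -> float:
        # Share of all emails that were classified correctly.
        total = self.tp + self.tn + self.fp + self.fn
        return (self.tp + self.tn) / total if total else 0.0

    def precision(self) -> float:
        # Share of positive predictions that were correct.
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        # Share of actual positives that were found.
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0


# Illustrative counts only (assumed, not experimental data).
counts = ConfusionCounts(tp=80, tn=90, fp=20, fn=10)
print(counts.accuracy(), counts.precision(), counts.recall())
```

Returning 0.0 when a denominator is zero is a convenience choice of this sketch; other conventions (for example, raising an error) are equally defensible.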
4.2. Membership Functions for the Decision Criteria
4.3. Fuzzy Decision Rules for Spam Detection
- Rule R1: IF Accuracy is high AND Precision is high AND Recall is high THEN the Performance is Excellent.
- Rule R2a: IF Accuracy is high AND (Precision is high OR Recall is high) THEN the Performance is Excellent.
- Rule R2b: IF (Accuracy is high OR Precision is high) AND Recall is high THEN the Performance is Excellent.
- Rule R2c: IF (Accuracy is high OR Recall is high) AND Precision is high THEN the Performance is Excellent.
- Rule R3: IF Accuracy is high OR Precision is high OR Recall is high THEN the Performance is Excellent.
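To make the structure of the five rules concrete, the sketch below evaluates them with the classical min (AND) and max (OR) fuzzy operators. This is a deliberate substitution: the proposed metric uses the UAO operator, which is not reproduced here. The function names are hypothetical, and feeding the raw metric values in as membership degrees is an assumption made only for illustration.

```python
from typing import Callable, Dict


def fuzzy_and(*memberships: float) -> float:
    # Classical (Zadeh) AND: the minimum membership. Stand-in for UAO.
    return min(memberships)


def fuzzy_or(*memberships: float) -> float:
    # Classical (Zadeh) OR: the maximum membership. Stand-in for UAO.
    return max(memberships)


# Each rule maps (accuracy, precision, recall) memberships in "high"
# to a membership in "Excellent performance".
RULES: Dict[str, Callable[[float, float, float], float]] = {
    "R1": lambda a, p, r: fuzzy_and(a, p, r),
    "R2a": lambda a, p, r: fuzzy_and(a, fuzzy_or(p, r)),
    "R2b": lambda a, p, r: fuzzy_and(fuzzy_or(a, p), r),
    "R2c": lambda a, p, r: fuzzy_and(fuzzy_or(a, r), p),
    "R3": lambda a, p, r: fuzzy_or(a, p, r),
}


def evaluate_rules(a: float, p: float, r: float) -> Dict[str, float]:
    """Return the degree of 'Excellent' under every rule."""
    return {name: rule(a, p, r) for name, rule in RULES.items()}


# Illustrative input: accuracy 0.85, precision 0.80, recall 0.88,
# used directly as membership degrees (an assumption).
scores = evaluate_rules(0.85, 0.80, 0.88)
print(scores)
```

With min and max, R1 is the strictest rule and R3 the most lenient, so for any input the R2 variants fall between the two.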
4.4. Mathematical Representation of Decision Rules
- Rule 1:
- Rule 2a:
- Rule 2b:
- Rule 2c:
- Rule 3:
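As an operator-agnostic reading of the rules in Section 4.3, they can be written with a generic fuzzy AND and a generic fuzzy OR. The symbols below are assumptions introduced for this sketch: $\mu_A$, $\mu_P$, and $\mu_R$ denote the memberships of accuracy, precision, and recall in the fuzzy set "high", $\otimes$ denotes fuzzy AND, and $\oplus$ denotes fuzzy OR. The proposed metric instantiates both with the UAO operator, whereas the classical choice is $\otimes = \min$ and $\oplus = \max$.

```latex
\begin{align}
\mu_{R1}  &= \mu_A \otimes \mu_P \otimes \mu_R \\
\mu_{R2a} &= \mu_A \otimes (\mu_P \oplus \mu_R) \\
\mu_{R2b} &= (\mu_A \oplus \mu_P) \otimes \mu_R \\
\mu_{R2c} &= (\mu_A \oplus \mu_R) \otimes \mu_P \\
\mu_{R3}  &= \mu_A \oplus \mu_P \oplus \mu_R
\end{align}
```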
5. Results and Discussion
5.1. Characteristics of Datasets
- Many recent studies have used the Enron, LingSpam, and PU datasets. As shown in Table 2, the Enron dataset was used in several studies published between 2020 and 2022 [11,13,15,26,27,41]. Similarly, the LingSpam and PU datasets were used in two studies published in 2021 [11,15]. This indicates that the datasets are still in active use in the research community.
- Firstly, the emails in these datasets were generated between 2000 and 2010, a span that captures the change in wording and writing patterns in emails over a period of ten years [15]. Secondly, the Enron dataset is employed due to its bias towards the spam class [15]. Thirdly, the LingSpam dataset is used because of its domain-specific ham emails, which are extracted from scholarly linguistic discussions [15]. Lastly, the PU dataset is used because it is less frequently studied by researchers, which encouraged us to study its behavior with respect to BERT and LSTM and the proposed fuzzy-logic-based performance metric.
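The class balance mentioned above can be checked directly from the dataset sizes. The sketch below recomputes the spam share for Enron, LingSpam, and PU1 from the spam and total counts listed in the dataset summary table of this article; the function name `spam_percent` is hypothetical.

```python
# (spam emails, total emails), as listed in the dataset summary table.
DATASETS = {
    "Enron": (20170, 36715),
    "LingSpam": (481, 2893),
    "PU1": (481, 1099),
}


def spam_percent(spam: int, total: int) -> int:
    """Spam share of a dataset, rounded to a whole percentage."""
    return round(100 * spam / total)


for name, (spam, total) in DATASETS.items():
    print(f"{name}: {spam_percent(spam, total)}% spam")
```

Enron is slightly spam-heavy, while LingSpam is dominated by ham, which is one reason the same classifier can behave differently across the three corpora.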
5.2. Empirical Results
5.2.1. Evaluation of BERT and LSTM with Respect to the Three Datasets Using Rule R1
5.2.2. Mutual Comparison of BERT and LSTM Using Rule R1
5.3. Extrinsic Evaluation
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
Abbreviations
| FPR | False positive rates | 
| FNR | False negative rates | 
| ROC | Receiver operating characteristics | 
| FL | Fuzzy logic | 
| UAO | Unified And-Or | 
| BERT | Bidirectional encoder representations from transformers | 
| LSTM | Long short-term memory | 
| NB | Naïve Bayes | 
| KNN | K-nearest neighbors | 
| DNN | Deep neural network | 
| SVM | Support vector machine | 
| LSVM | Linear SVM | 
| PSVM | Poly SVM | 
| SSVM | Sigmoid SVM | 
| RF | Random forest | 
| WE | Word Embedding | 
| LR | Logistic regression | 
| GNB | Gaussian Naïve Bayes | 
| DT | Decision Tree | 
| SVC | Support vector classifier | 
| CNN | Convolutional neural network | 
| NN | Neural Network | 
| MNB | Multinomial Naïve Bayes | 
| BNB | Bernoulli Naïve Bayes | 
| DL | Deep learning | 
| LTC | Logistic tree classifier | 
| RNN | Recurrent neural networks | 
| MCDM | Multi-criteria decision-making | 
| TP | True positive | 
| TN | True negative | 
| FP | False positive | 
| FN | False negative | 
| OWA | Ordered weighted average | 
References
- Feng, W.; Sun, J.; Zhang, L.; Cao, C.; Yang, Q. A support vector machine based naive Bayes algorithm for spam filtering. In Proceedings of the 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC), Las Vegas, NV, USA, 9–11 December 2016; pp. 1–8.
- Dada, E.G.; Bassi, J.S.; Chiroma, H.; Adetunmbi, A.O.; Ajibuwa, O.E.; Ajibuwa, O.E. Machine learning for email spam filtering: Review, approaches and open research problems. Heliyon 2019, 5, e01802.
- Available online: https://www.statista.com/statistics/456500/daily-number-of-e-mails-worldwide/ (accessed on 11 April 2022).
- Fonseca, O.; Fazzion, E.; Cunha, I.; Las-Casas, P.H.B.; Guedes, D.; Meira, W.; Hoepers, C.; Steding-Jessen, K.; Chaves, M.H. Measuring, characterizing, and avoiding spam traffic costs. IEEE Internet Comput. 2016, 20, 16–24.
- Park, I.; Sharman, R.; Rao, H.R.; Upadhyaya, S. The effect of spam and privacy concerns on e-mail users’ behavior. J. Inf. Syst. Secur. 2007, 3, 39–62.
- Ogwu, S.; Sice, P.; Keogh, S.; Goodlet, C. An exploratory study of the application of mindsight in email communication. Heliyon 2020, 6, e04305.
- Cook, D.; Hartnett, J.; Manderson, K.; Scanlan, J. Catching spam before it arrives: Domain specific dynamic blacklists. In Proceedings of the 2006 Australasian workshops on Grid Computing and E-Research-Volume 54, Hobart, Australia, 16–19 January 2006; Australian Computer Society: Darlinghurst, Australia, 2006; pp. 193–202.
- Kshirsagar, D.; Patil, A. Blackhole attack detection and prevention by real time monitoring. In Proceedings of the 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), Tiruchengode, India, 4–6 July 2013; pp. 1–5.
- Wang, B.; Pan, W. A survey of content-based anti-spam email filtering. J. Chin. Inf. Process. 2005, 5.
- Yaseen, Q. Spam Email Detection Using Deep Learning Techniques. Procedia Comput. Sci. 2021, 184, 853–858.
- Islam, M.K.; Al Amin, M.; Islam, M.R.; Mahbub, M.N.I.; Showrov, M.I.H.; Kaushal, C. Spam-Detection with Comparative Analysis and Spamming Words Extractions. In Proceedings of the 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 3–4 September 2021; pp. 1–9.
- Siddique, Z.B.; Khan, M.A.; Din, I.U.; Almogren, A.; Mohiuddin, I.; Nazir, S. Machine Learning-Based Detection of Spam Emails. Sci. Program. 2021, 2021, 6508784.
- Sheneamer, A. Comparison of Deep and Traditional Learning Methods for Email Spam Filtering. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 560–565.
- Mallampati, D.; Hegde, N.P. A Machine Learning Based Email Spam Classification Framework Model: Related Challenges and Issues. Int. J. Innov. Technol. Explor. Eng. 2020, 9, 3137–3144.
- Srinivasan, S.; Ravi, V.; Alazab, M.; Ketha, S.; Al-Zoubi, A.M.; Kotti Padannayil, S. Spam emails detection based on distributed word embedding with deep learning. In Machine Intelligence and Big Data Analytics for Cybersecurity Applications; Springer: Berlin/Heidelberg, Germany, 2021; pp. 161–189.
- Kumar, N.; Sonowal, S. Email spam detection using machine learning algorithms. In Proceedings of the 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), Coimbatore, India, 15–17 July 2020; pp. 108–113.
- Anitha, P.; Rao, C.G.; Babu, D.S. Email Spam Filtering Using Machine Learning Based Xgboost Classifier Method. Turk. J. Comput. Math. Educ. 2021, 12, 2182–2190.
- Sethi, M.; Chandra, S.; Chaudhary, V. Email Spam Detection using Machine Learning and Neural Networks. Int. Res. J. Eng. Technol. 2021, 8, 349–355.
- Bagui, S.; Nandi, D.; Bagui, S.; White, R.J. Machine Learning and Deep Learning for Phishing Email Classification using One-Hot Encoding. J. Comput. Sci. 2021, 17, 610–623.
- Nayak, R.; Jiwani, S.A.; Rajitha, B. Spam email detection using machine learning algorithm. Mater. Today Proc. 2021.
- Euna, N.J.; Hossain, S.M.M.; Anwar, M.M.; Sarker, I.H. Content-based Spam Email Detection Using N-gram Machine Learning Approach. Preprints 2021, 2021090236.
- Chakraborty, S.; Mondal, B. Spam mail filtering technique using different decision tree classifiers through data mining approach-a comparative performance analysis. Int. J. Comput. Appl. 2012, 47, 26–31.
- Rusland, N.F.; Wahid, N.; Kasim, S.; Hafit, H. Analysis of Naïve Bayes algorithm for email spam filtering across multiple datasets. In IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2017; Volume 226, p. 012091.
- Bibi, A.; Latif, R.; Khalid, S.; Ahmed, W.; Shabir, R.A.; Shahryar, T. Spam mail scanning using machine learning algorithm. J. Comput. 2020, 15, 73–84.
- Guo, Z.; Yu, K.; Jolfaei, A.; Ding, F.; Zhang, N. Fuz-spam: Label smoothing-based fuzzy detection of spammers in internet of things. IEEE Trans. Fuzzy Syst. 2021.
- Iqbal, K.; Khan, S.A.; Anisa, S.; Tasneem, A.; Mohammad, N. A Preliminary Study on Personalized Spam E-mail Filtering Using Bidirectional Encoder Representations from Transformers (BERT) and TensorFlow 2.0. Int. J. Comput. Digit. Syst. 2022, 11, 893–903.
- Kaddoura, S.; Alfandi, O.; Dahmani, N. A spam email detection mechanism for english language text emails using deep learning approach. In Proceedings of the 2020 IEEE 29th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE), Online, 10–13 September 2020; pp. 193–198.
- Zamir, A.; Khan, H.U.; Mehmood, W.; Iqbal, T.; Akram, A.U. A feature-centric spam email detection model using diverse supervised machine learning algorithms. Electron. Libr. 2020, 38, 633–657.
- Sakkis, G.; Androutsopoulos, I.; Paliouras, G.; Karkaletsis, V.; Spyropoulos, C.D.; Stamatopoulos, P. Stacking classifiers for anti-spam filtering of e-mail. In Empirical Methods in Natural Language Processing; Carnegie Mellon University: Pittsburgh, PA, USA, 2001; pp. 44–50.
- Attar, A.; Rad, R.M.; Atani, R.E. A survey of image spamming and filtering techniques. Artif. Intell. Rev. 2013, 40, 71–105.
- Zhang, L.; Zhu, J.; Yao, T. An evaluation of statistical spam filtering techniques. ACM Trans. Asian Lang. Inf. Process. (TALIP) 2004, 3, 243–269.
- Available online: https://www.cs.cmu.edu/~enron/ (accessed on 22 December 2021).
- Koprinska, I.; Poon, J.; Clark, J.; Chan, J. Learning to classify e-mail. Inf. Sci. 2007, 177, 2167–2187.
- Cormack, G.V.; Lynam, T.R. Online supervised spam filter evaluation. ACM Trans. Inf. Syst. 2007, 25, 11.
- Androutsopoulos, I.; Koutsias, J.; Chandrinos, K.V.; Paliouras, G.; Spyropoulos, C.D. An evaluation of naive bayesian anti-spam filtering. In Proceedings of the 11th European Conference on Machine Learning (ECML 2000), Barcelona, Spain, 31 May–2 June 2000; Springer: Berlin/Heidelberg, Germany, 2000; pp. 9–17.
- Androutsopoulos, I.; Paliouras, G.; Karkaletsis, V.; Sakkis, G.; Spyropoulos, C.D.; Stamatopoulos, P. Learning to filter spam e-mail: A comparison of a naive bayesian and a memory-based approach. In Proceedings of the 4th European Conference on Principles and Practice of Knowledge Discovery in Databases, Pisa, Italy, 20–24 September 2000; Springer: Berlin/Heidelberg, Germany, 2000; pp. 1–12.
- DeBarr, D.; Wechsler, H. Spam detection using clustering, random forests, and active learning. In Proceedings of the Sixth Conference on Email and Anti-Spam, Mountain View, CA, USA, 16–17 July 2009; pp. 1–6.
- Available online: http://www.aueb.gr/users/ion/data/lingspam_public.tar.gz (accessed on 17 December 2021).
- Available online: http://www.aueb.gr/users/ion/data/PU123ACorpora.tar.gz (accessed on 19 December 2021).
- Novo-Lourés, M.; Ruano-Ordás, D.; Pavón, R.; Laza, R.; Gómez-Meire, S.; Méndez, J.R. Enhancing representation in the context of multiple-channel spam filtering. Inf. Process. Manag. 2022, 59, 102812.
- Occhipinti, A.; Rogers, L.; Angione, C. A pipeline and comparative study of 12 machine learning models for text classification. Expert Syst. Appl. 2022, 201, 117193.
- Guo, Z.; Tang, L.; Guo, T.; Yu, K.; Alazab, M.; Shalaginov, A. Deep graph neural network-based spammer detection under the perspective of heterogeneous cyberspace. Future Gener. Comput. Syst. 2021, 117, 205–218.
- Venkateswarlu, B.; Shenoi, V. Optimized generative adversarial network with fractional calculus based feature fusion using Twitter stream for spam detection. Inf. Secur. J. Glob. Perspect. 2021.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Zadeh, L.A. Fuzzy sets. Inf. Control 1965, 8, 338–353.
- Rehman, S.; Khan, S.A.; Alhems, L.M. A rule-based fuzzy logic methodology for multi-criteria selection of wind turbines. Sustainability 2020, 12, 8467.
- Rehman, S.; Khan, S.A.; Alhems, L.M. Application of TOPSIS approach to multi-criteria selection of wind turbines for on-shore sites. Appl. Sci. 2020, 10, 7595.
- Khan, S.A.; Engelbrecht, A.P. A new fuzzy operator and its application to topology design of distributed local area networks. Inf. Sci. 2007, 177, 2692–2711.
- Khan, S.A. Design and Analysis of Evolutionary and Swarm Intelligence Techniques for Topology Design of Distributed Local Area Networks. Ph.D. Thesis, University of Pretoria, Pretoria, South Africa, 2009.
- Khan, S.A. A STRIDE Model based Threat Modelling using Unified and-Or Fuzzy Operator for Computer Network Security. Int. J. Comput. Netw. Technol. 2017, 5, 13–20.
- Khan, S.A. Fuzzy preferences based STRIDE threat model for network intrusion detection. Int. J. Comput. Netw. Technol. 2017, 5, 107–111.
- Mohiuddin, M.A.; Khan, S.A.; Engelbrecht, A.P. Simulated evolution and simulated annealing algorithms for solving multi-objective open shortest path first weight setting problem. Appl. Intell. 2014, 41, 348–365.
- Mohiuddin, M.A.; Khan, S.A.; Engelbrecht, A.P. Fuzzy particle swarm optimization algorithms for the open shortest path first weight setting problem. Appl. Intell. 2016, 45, 598–621.
- Zarghami, M.; Szidarovszky, F. Fuzzy quantifiers in sensitivity analysis of OWA operator. Comput. Ind. Eng. 2008, 54, 1006–1018.
- Hu, H.; Li, G. Granular risk-based design optimization. IEEE Trans. Fuzzy Syst. 2014, 23, 340–353.
- Gao, Q.; Cai, X.; Zhu, J.; Guo, X. Multi-objective optimization and fuzzy evaluation of a horizontal axis wind turbine composite blade. J. Renew. Sustain. Energy 2015, 7, 063109.
- Bhowmick, A.; Hazarika, S.M. E-mail spam filtering: A review of techniques and trends. Adv. Electron. Commun. Comput. 2018, 443, 583–590.
- Baeldung. Available online: https://www.baeldung.com/cs/spam-filter-training-sets (accessed on 28 May 2022).

| Algorithm | Accuracy | Precision | Recall | 
|---|---|---|---|
| A | 85% | 80% | 88% | 
| B | 76% | 87% | 72% | 
| C | 89% | 78% | 81% | 

| Reference | Year | Algorithms | Datasets | Performance Metrics | ||||||
|---|---|---|---|---|---|---|---|---|---|---|
| A | R | P | L | F | G | Others | ||||
| [1] | 2016 | CVM-NB, SVM, NB | Datamall | X | X | |||||
| [10] | 2021 | BERT, Bidirectional LSTM, KNN, NB | Spambase, Spam filter from Kaggle | X | X | |||||
| [11] | 2021 | Regression, Xgboost, SVM, RF, WE, LSTM | Trec spam Dataset-2007, Enron, PU, Lingspam, Basket (combo of four) | X | X | X | X | |||
| [12] | 2021 | NB, CNN, SVM, LSTM | Urduemaildataset | X | X | X | X | |||
| [13] | 2021 | CNN, LSTM, RF, SVM, NB, DT | Enron | X | X | X | ||||
| [14] | 2020 | NB, DT (J48), DNN | SpamAssassin | X | X | X | ||||
| [15] | 2021 | LR, GNB, KNN, DT, Adaboost, RF, SVM, DeepSpamNet | Lingspam, PU, SpamAssassin, Enron | X | X | X | X | X | ||
| [16] | 2020 | SVC, KNN, NB, DT, RF, Adaboost, Bagging | Spamcsv from Kaggle | X | X | |||||
| [17] | 2021 | SVM, CNS-FFO, Rotation Forest, MLP, DT (J48), NB, Xgboost | Spam_ham_dataset from Kaggle | X | X | X | X | X | ||
| [18] | 2021 | NN, LR, SVM, NB | SpamAssassin | X | X | X | X | |||
| [19] | 2021 | NV, SVM, DT, CNN, LSTM | Self-collected | X | ||||||
| [20] | 2021 | NB, DT (J48) | Dataset from Kaggle | X | X | X | X | |||
| [21] | 2021 | SVM, DT, LR, MNB | Dataset from Kaggle | X | X | X | X | |||
| [22] | 2012 | NB, DT (J48), LTC | Self-collected | X | X | |||||
| [23] | 2017 | NB | Spambase, Spamdata | X | X | X | X | |||
| [24] | 2020 | NB, SVM | Dataset from github | X | X | X | X | |||
| [26] | 2022 | BERT | Enron | X | X | |||||
| [27] | 2020 | BERT | Enron | X | ||||||
| [28] | 2020 | Adaboost, RF, DT (J48), SVM, Bagging, DL | CSDMC2010_SPAM | X | X | X | X | |||
| [40] | 2022 | Adaboost, Flexible Bayes, NB, RF, SVM | SpamAssassin | X | X | X | X | X | ||
| [41] | 2022 | KNN, MPNN, LR, RF, XGBoost, MNB, GNB, BNB, RBF SVM, LSVM, PSVM, SSVM | Enron | X | X | X | X | |||
| [42] | 2021 | LDA+k-means, LSTM+lr, SVM, MLP, CNN | Twitter, SinaWeibo | X | X | X | X | |||
| [43] | 2021 | GAN, NB, LSTM, SVM | Twitter Spam | X | X | X | ||||

| Dataset | Total Emails | Spam | Ham | % of Spam | Year | 
|---|---|---|---|---|---|
| Spam archive | 15,090 | 15,090 | 0 | 100% | 1998 | 
| Spambase | 4601 | 1813 | 2788 | 39% | 1999 | 
| Lingspam | 2893 | 481 | 2412 | 17% | 2000 | 
| PU1 | 1099 | 481 | 618 | 44% | 2000 | 
| Spamassassin | 6047 | 1897 | 4150 | 31% | 2002 | 
| PU2 | 721 | 142 | 579 | 20% | 2003 | 
| PU3 | 4139 | 1826 | 2313 | 44% | 2003 | 
| PUA | 1142 | 571 | 571 | 50% | 2003 | 
| Zh1 | 1633 | 1205 | 428 | 74% | 2004 | 
| Gen spam | 40,408 | 31,196 | 9212 | 78% | 2005 | 
| Trec 2005 | 92,189 | 52,790 | 39,399 | 57% | 2005 | 
| Biggio | 8549 | 8549 | 0 | 100% | 2005 | 
| Phishing corpus | 415 | 415 | 0 | 100% | 2005 | 
| Enron | 36,715 | 20,170 | 16,545 | 55% | 2006 | 
| Trec 2006 | 37,822 | 24,912 | 12,910 | 66% | 2006 | 
| Trec 2007 | 75,419 | 50,199 | 25,220 | 67% | 2007 | 

| Dataset | Epochs | Accuracy | Precision | Recall | |
|---|---|---|---|---|---|
| Enron | 10 | 0.92 | 0.95 | 0.90 | 0.87 |
| Enron | 20 | 0.94 | 0.95 | 0.90 | 0.88 |
| Enron | 30 | 0.94 | 0.95 | 0.92 | 0.89 |
| LingSpam | 10 | 0.98 | 0.97 | 0.98 | 0.96 |
| LingSpam | 20 | 0.98 | 0.97 | 0.98 | 0.96 |
| LingSpam | 30 | 0.98 | 0.95 | 0.97 | 0.94 |
| PU | 10 | 0.92 | 0.89 | 0.86 | 0.82 |
| PU | 20 | 0.92 | 0.84 | 0.87 | 0.80 |
| PU | 30 | 0.92 | 0.88 | 0.85 | 0.81 |

| Dataset | Epochs | Accuracy | Precision | Recall | |
|---|---|---|---|---|---|
| Enron | 10 | 0.96 | 0.98 | 0.98 | 0.95 |
| Enron | 20 | 0.97 | 0.98 | 0.98 | 0.96 |
| Enron | 30 | 0.97 | 0.98 | 0.98 | 0.96 |
| LingSpam | 10 | 0.96 | 0.96 | 0.94 | 0.92 |
| LingSpam | 20 | 0.98 | 0.96 | 0.95 | 0.93 |
| LingSpam | 30 | 0.98 | 0.95 | 0.97 | 0.94 |
| PU | 10 | 0.93 | 0.95 | 0.90 | 0.88 |
| PU | 20 | 0.95 | 0.95 | 0.95 | 0.92 |
| PU | 30 | 0.96 | 0.96 | 0.94 | 0.92 |

| Gender | Participants | Population | Occupation | 
|---|---|---|---|
| Male | 34 | 83% | Students | 
| Female | 7 | 17% | Students | 
| Total | 41 | 100% | - | 

| Dataset | Epochs | Accuracy | Precision | Recall | Decision (Male) | Percentage (Male) | Decision (Female) | Percentage (Female) |
|---|---|---|---|---|---|---|---|---|
| Enron | 10 | 0.92 | 0.95 | 0.90 | 0 | 0% | 0 | 0% |
| Enron | 20 | 0.94 | 0.95 | 0.90 | 0 | 0% | 0 | 0% |
| Enron | 30 | 0.94 | 0.95 | 0.92 | 34 | 100% | 7 | 100% |
| LingSpam | 10 | 0.98 | 0.97 | 0.98 | 20 | 59% | 3 | 43% |
| LingSpam | 20 | 0.98 | 0.97 | 0.98 | 14 | 41% | 4 | 57% |
| LingSpam | 30 | 0.98 | 0.95 | 0.97 | 0 | 0% | 0 | 0% |
| PU | 10 | 0.92 | 0.89 | 0.86 | 10 | 29% | 2 | 29% |
| PU | 20 | 0.92 | 0.84 | 0.87 | 17 | 50% | 1 | 14% |
| PU | 30 | 0.92 | 0.88 | 0.85 | 7 | 21% | 4 | 57% |

| Dataset | Epochs | Accuracy | Precision | Recall | Decision (Male) | Percentage (Male) | Decision (Female) | Percentage (Female) |
|---|---|---|---|---|---|---|---|---|
| Enron | 10 | 0.96 | 0.98 | 0.98 | 0 | 0% | 0 | 0% |
| Enron | 20 | 0.97 | 0.98 | 0.98 | 21 | 62% | 6 | 57% |
| Enron | 30 | 0.97 | 0.98 | 0.98 | 13 | 38% | 1 | 43% |
| LingSpam | 10 | 0.96 | 0.96 | 0.94 | 0 | 0% | 0 | 0% |
| LingSpam | 20 | 0.98 | 0.96 | 0.95 | 15 | 44% | 3 | 43% |
| LingSpam | 30 | 0.98 | 0.95 | 0.97 | 19 | 56% | 4 | 57% |
| PU | 10 | 0.93 | 0.95 | 0.90 | 0 | 0% | 0 | 0% |
| PU | 20 | 0.95 | 0.95 | 0.95 | 10 | 29% | 2 | 29% |
| PU | 30 | 0.96 | 0.96 | 0.94 | 24 | 71% | 5 | 71% |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Khan, S.A.; Iqbal, K.; Mohammad, N.; Akbar, R.; Ali, S.S.A.; Siddiqui, A.A. A Novel Fuzzy-Logic-Based Multi-Criteria Metric for Performance Evaluation of Spam Email Detection Algorithms. Appl. Sci. 2022, 12, 7043. https://doi.org/10.3390/app12147043