SMS Scam Detection Application Based on Optical Character Recognition for Image Data Using Unsupervised and Deep Semi-Supervised Learning
Abstract
1. Introduction
2. Related Works
3. Feature Generation
3.1. Dataset
3.2. Dataset Pre-Processing
3.3. Feature Extraction
3.3.1. Unsupervised: TF_IDF and PCA
3.3.2. Deep Semi-Supervised: Tokenization and Sequence Padding
4. Proposed Work: Technical Implementation of Unsupervised and Deep Semi-Supervised Models
4.1. Unsupervised Learning Models
- Feature Generation: Features were extracted from the text messages using techniques like TF-IDF (Term Frequency-Inverse Document Frequency) and PCA (Principal Component Analysis) to capture the most relevant textual information and reduce dimensionality.
- Machine Learning Algorithm: In this phase, the K-Means algorithm was used to partition the data into clusters based on message similarity, while NMF provided a linear combination of non-negative features, and GMM modeled the data’s distribution using multiple Gaussian distributions.
- Message Classification: Each model classified messages as either spam or ham, based on the patterns identified during the clustering process. The performance of these models was evaluated using precision, accuracy, recall, and F1-score, providing a comprehensive assessment of their effectiveness in smishing detection.
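The clustering step above can be sketched in miniature. The following is a hedged, self-contained toy (not the paper's implementation, which uses scikit-learn-style K-Means over TF-IDF features): a tiny two-cluster K-Means over hypothetical 2-D feature vectors, just to make the assignment/update loop concrete.

```python
# Minimal sketch of the clustering stage (hypothetical toy data, not the
# paper's dataset): numeric feature vectors are partitioned into two
# clusters, mirroring the TF-IDF -> K-Means flow described above.

def kmeans_2(points, iters=10):
    """Tiny 2-cluster K-Means with deterministic initialisation."""
    c0, c1 = points[0], points[-1]          # init from the first and last point
    assign = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        for i, p in enumerate(points):
            d0 = sum((a - b) ** 2 for a, b in zip(p, c0))
            d1 = sum((a - b) ** 2 for a, b in zip(p, c1))
            assign[i] = 0 if d0 <= d1 else 1
        # update step: each centroid moves to the mean of its members
        for k in (0, 1):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                if k == 0:
                    c0 = mean
                else:
                    c1 = mean
    return assign

# Two well-separated "ham"-like and "spam"-like feature vectors
vectors = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9)]
print(kmeans_2(vectors))   # -> [0, 0, 1, 1]
```

In the paper's setting the points would be TF-IDF (or PCA-reduced) message vectors and the two clusters are interpreted as spam and ham.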
4.2. Model Settings
4.3. Experimental Results and Discussion
4.4. Deep Semi-Supervised Learning
- Feature Generation: Similar to the unsupervised models, this phase involved extracting features using advanced natural language processing techniques, including word embeddings like Word2Vec, GloVe, and possibly transformer-based embeddings, to capture more nuanced text representations.
- Deep Learning Algorithm: We implemented three distinct deep learning architectures—RNN-Flatten, LSTM (Long Short-Term Memory), and Bi-LSTM (Bidirectional LSTM). The RNN-Flatten model utilized a recurrent neural network followed by a flattening layer to process the sequential data, while LSTM and Bi-LSTM models captured long-term dependencies and bidirectional context within the text.
- Message Classification: Each of these models classified the messages as spam or ham. The classification report, which includes metrics like accuracy, precision, recall, and F1-score, was generated to evaluate the performance of these deep semi-supervised models, allowing us to compare their effectiveness in identifying smishing attacks.
4.5. Model Configuration Details
4.6. Experimental Results for the Semi-Supervised Approach
5. Discussion
5.1. Analysis of Model Performance
5.2. Performance with New Data
6. Real-Time Detection Capabilities
- First, each SMS image is individually classified, allowing the system to handle messages one at a time. The user can then select from a variety of models to analyze the nature of the image, offering the ability to choose the most suitable model for their needs, as shown in Figure 9 and Figure 10, respectively.
- Once the SMS is submitted, the system initiates preprocessing to prepare the data for analysis. Following preprocessing, the selected model’s feature extraction techniques and classifier are applied to the SMS, enabling the system to accurately assess its content.
- Finally, the application displays the result, indicating whether the message is classified as spam or ham, along with the accuracy reported by the selected model, as shown in the figure.
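The three steps above can be sketched as a pipeline. Everything below is hypothetical scaffolding (function names, the keyword heuristic, and the stubbed OCR stage are illustrative only); a real deployment would call an OCR engine such as Tesseract and one of the trained models in place of the stand-ins.

```python
# Sketch of the real-time flow described above: OCR -> preprocessing ->
# selected model -> spam/ham verdict. All names are hypothetical; the OCR
# stage would use an engine such as Tesseract in practice.

def extract_text(image_path):
    """Placeholder for the OCR stage (e.g., Tesseract's image_to_string)."""
    return "Congratulations you have won a prize call now"

def preprocess(text):
    """Lowercase and tokenize, standing in for the preprocessing stage."""
    return text.lower().split()

def classify(tokens, model="kmeans_vectorizer"):
    """Stand-in classifier: flags messages containing spam-like keywords.
    A real run would apply the selected trained model instead."""
    spam_cues = {"congratulations", "won", "prize", "call"}
    hits = sum(t in spam_cues for t in tokens)
    return "spam" if hits >= 2 else "ham"

tokens = preprocess(extract_text("sms_screenshot.png"))
print(classify(tokens))  # -> spam
```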
7. Conclusions and Future Work
- Developing a more versatile application that enables users to effortlessly select and switch between different models, tailored to their specific needs.
- Enhancing spam detection accuracy by integrating advanced feature extraction methods and natural language processing (NLP) techniques, with a focus on capturing subtle details such as digits, fonts, and emojis within messages.
- Expanding language support beyond English to ensure the system’s reliability and effectiveness across a broader range of linguistic contexts.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Samad, S.R.A.; Ganesan, P.; Rajasekaran, J.; Radhakrishnan, M.; Ammaippan, H.; Ramamurthy, V. SmishGuard: Leveraging Machine Learning and Natural Language Processing for Smishing Detection. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 11. [Google Scholar] [CrossRef]
- Njuguna, D.N.; Kamau, J.; Kaburu, D. A Review of Smishing Attacks Mitigation Strategies. Int. J. Comput. Inf. Technol. 2022, 11, 9–13. [Google Scholar] [CrossRef]
- Haber, M.J.; Chappell, B.; Hills, C. Attack Vectors. In Cloud Attack Vectors: Building Effective Cyber-Defense Strategies to Protect Cloud Resources; Apress: Berkeley, CA, USA, 2022; pp. 117–219. [Google Scholar]
- Vosen, D.J. An Exploration of Cyberpsychology Strategies Addressing Unintentional Insider Threats through Undergraduate Education: A Qualitative Study. Ph.D. Thesis, Colorado Technical University, Colorado Springs, CO, USA, 2021. [Google Scholar]
- McLennan, M. The Global Risks Report 2022, 17th ed.; World Economic Forum: Cologny, Switzerland, 2022. [Google Scholar]
- Julis, M.R.; Alagesan, S. Spam Detection in SMS Using Machine Learning through Textmining. Int. J. Sci. Technol. Res. 2020, 9, 2. [Google Scholar]
- Barrera, D.; Naranjo, V.; Fuertes, W.; Macas, M. Literature Review of SMS Phishing Attacks: Lessons, Addresses, and Future Challenges. In Proceedings of the International Conference on Advanced Research in Technologies, Information, Innovation and Sustainability, Madrid, Spain, 18–20 October 2023; Springer: Cham, Switzerland, 2024; pp. 191–204. [Google Scholar]
- Tiwari, A. Supervised Learning: From Theory to Applications. In Artificial Intelligence and Machine Learning for EDGE Computing; Academic Press: Cambridge, MA, USA, 2022; pp. 23–32. [Google Scholar]
- Al-Qahtani, A.F.; Cresci, S. The COVID-19 Scamdemic: A Survey of Phishing Attacks and Their Countermeasures during COVID-19. IET Inf. Secur. 2022, 16, 324–345. [Google Scholar] [CrossRef] [PubMed]
- Akinyelu, A.A. Advances in Spam Detection for Email Spam, Web Spam, Social Network Spam, and Review Spam: ML-Based and Nature-Inspired-Based Techniques. J. Comput. Secur. 2021, 29, 473–529. [Google Scholar] [CrossRef]
- Wickramasinghe, I.; Kalutarage, H. Naive Bayes: Applications, Variations and Vulnerabilities: A Review of Literature with Code Snippets for Implementation. Soft Comput. 2021, 25, 2277–2293. [Google Scholar] [CrossRef]
- Genuer, R.; Poggi, J.-M. Random Forests; Springer: Cham, Switzerland, 2020. [Google Scholar]
- Abayomi-Alli, O.; Misra, S.; Abayomi-Alli, A.; Odusami, M. A review of soft techniques for SMS spam classification: Methods, approaches and applications. Eng. Appl. Artif. Intell. 2019, 86, 197–212. [Google Scholar]
- Taha, K. Semi-Supervised and Un-Supervised Clustering: A Review and Experimental Evaluation. Inf. Syst. 2023, 114, 102178. [Google Scholar] [CrossRef]
- Kumarasiri, W.L.T.T.N.; Siriwardhana, M.K.J.C.; Suraweera, S.A.D.S.L.; Senarathne, A.N.; Harshanath, S.M.B. Cybersmish: A Proactive Approach for Smishing Detection and Prevention Using Machine Learning. In Proceedings of the 2023 7th International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Kirtipur, Nepal, 11–13 October 2023; pp. 210–217. [Google Scholar]
- Shahra, E.Q.; Basurra, S.; Wu, W. Real-Time Multi-Class Classification of Water Quality Using MLP and Ensemble Learning. In Proceedings of the International Congress on Information and Communication Technology; Springer: Singapore, 2024; pp. 481–491. [Google Scholar]
- Usmani, U.A.; Happonen, A.; Watada, J. A Review of Unsupervised Machine Learning Frameworks for Anomaly Detection in Industrial Applications. In Intelligent Computing; Springer: Cham, Switzerland, 2022; pp. 158–189. [Google Scholar]
- Patel, E.; Kushwaha, D.S. Clustering Cloud Workloads: K-Means vs Gaussian Mixture Model. Procedia Comput. Sci. 2020, 171, 158–167. [Google Scholar] [CrossRef]
- Rokach, L.; Maimon, O. Clustering Methods. In Data Mining and Knowledge Discovery Handbook; Springer: Boston, MA, USA, 2005; pp. 321–352. [Google Scholar]
- Slijepcevic, I.V.; Scaife, A.M.M.; Walmsley, M.; Bowles, M.; Wong, O.I.; Shabala, S.S.; Tang, H. Radio Galaxy Zoo: Using Semi-Supervised Learning to Leverage Large Unlabelled Data Sets for Radio Galaxy Classification Under Data Set Shift. Mon. Not. R. Astron. Soc. 2022, 514, 2599–2613. [Google Scholar] [CrossRef]
- Mansoor, R.A.Z.A.; Jayasinghe, N.D.; Muslam, M.M.A. A Comprehensive Review on Email Spam Classification Using Machine Learning Algorithms. In Proceedings of the 2021 International Conference on Information Networking (ICOIN), Jeju Island, Republic of Korea, 13–16 January 2021; pp. 327–332. [Google Scholar]
- Sharaff, A.; Pathak, V.; Paul, S.S. Deep Learning-Based Smishing Message Identification Using Regular Expression Feature Generation. Expert Syst. 2022, 40, e13153. [Google Scholar] [CrossRef]
- Shahra, E.Q.; Wu, W.; Basurra, S.; Rizou, S. Deep Learning for Water Quality Classification in Water Distribution Networks. In Proceedings of the International Conference on Engineering Applications of Neural Networks, Crete, Greece, 25–27 June 2021; pp. 153–164. [Google Scholar]
- Gupta, M.; Bakliwal, A.; Agarwal, S.; Mehndiratta, P. A Comparative Study of Spam SMS Detection Using Machine Learning Classifiers. In Proceedings of the 2018 Eleventh International Conference on Contemporary Computing (IC3), Noida, India, 2–4 August 2018; pp. 1–7. [Google Scholar]
- Yerima, S.Y.; Bashar, A. Semi-Supervised Novelty Detection with One Class SVM for SMS Spam Detection. In Proceedings of the 2022 29th International Conference on Systems, Signals and Image Processing (IWSSIP), Sofia, Bulgaria, 1–3 June 2022; pp. 1–4. [Google Scholar]
- Sheikhi, S.; Kheirabadi, M.T.; Bazzazi, A. An Effective Model for SMS Spam Detection Using Content-Based Features and Averaged Neural Network. Int. J. Eng. 2020, 33, 221–228. [Google Scholar]
- Zainal, K.; Sulaiman, N.F.; Jali, M.Z. An Analysis of Various Algorithms for Text Spam Classification and Clustering Using RapidMiner and Weka. Int. J. Comput. Sci. Inf. Secur. 2015, 13, 66. [Google Scholar]
- Oswald, C.; Simon, S.E.; Bhattacharya, A. SpotSpam: Intention Analysis Driven SMS Spam Detection Using BERT Embeddings. ACM Trans. Web (TWEB) 2022, 16, 1–27. [Google Scholar] [CrossRef]
- Jouban, M.Q.; Farou, Z. TAMS: Text Augmentation Using Most Similar Synonyms for SMS Spam Filtering. 2022. Available online: https://ceur-ws.org/Vol-3226/paper4.pdf (accessed on 8 August 2024).
- Mishra, S.; Soni, D. Implementation of ‘Smishing Detector’: An Efficient Model for Smishing Detection Using Neural Network. SN Comput. Sci. 2022, 3, 1–13. [Google Scholar] [CrossRef] [PubMed]
- Zhang, B.; Zhao, G.; Feng, Y.; Zhang, X.; Jiang, W.; Dai, J.; Gao, J. Behavior Analysis Based SMS Spammer Detection in Mobile Communication Networks. In Proceedings of the 2016 IEEE First International Conference on Data Science in Cyberspace (DSC), Changsha, China, 13–16 June 2016; pp. 538–543. [Google Scholar]
- Waheeb, W.; Ghazali, R.; Deris, M.M. Content-Based SMS Spam Filtering Based on the Scaled Conjugate Gradient Backpropagation Algorithm. In Proceedings of the 2015 12th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), Zhangjiajie, China, 15–17 August 2015; pp. 675–680. [Google Scholar]
- Roy, P.K.; Singh, J.P.; Banerjee, S. Deep Learning to Filter SMS Spam. Future Gener. Comput. Syst. 2020, 102, 524–533. Available online: https://www.sciencedirect.com/science/article/pii/S0167739X19306879 (accessed on 8 August 2024). [CrossRef]
- Shahra, E.Q.; Wu, W.; Basurra, S.; Aneiba, A. Intelligent Edge-Cloud Framework for Water Quality Monitoring in Water Distribution System. Water 2024, 16, 196. [Google Scholar] [CrossRef]
- Nair, A.R.; Tripathy, V.D.; Lalitha Priya, R.; Kashimani, M.; Janthalur, G.A.N.; Ansari, N.J.; Jurcic, I. A Smarter Way to Collect and Store Data: AI and OCR Solutions for Industry 4.0 Systems. In Topics in Artificial Intelligence Applied to Industry 4.0; Wiley Telecom: Hoboken, NJ, USA, 2024; pp. 271–288. [Google Scholar]
- Manovich, L. Computer vision, human senses, and language of art. AI & Society 2021, 36, 1145–1152. [Google Scholar] [CrossRef]
- Tabassum, A.; Patil, R.R. A survey on text pre-processing & feature extraction techniques in natural language processing. Int. Res. J. Eng. Technol. (IRJET) 2020, 7, 4864–4867. [Google Scholar]
- Dong, G.; Liu, H. Feature Engineering for Machine Learning and Data Analytics; CRC Press: Boca Raton, FL, USA, 2018. [Google Scholar]
- Patel, C.; Patel, A.; Patel, D. Optical character recognition by open source OCR tool tesseract: A case study. Int. J. Comput. Appl. 2012, 55, 50–56. [Google Scholar] [CrossRef]
- Guyon, I.; Elisseeff, A. An introduction to feature extraction. In Feature Extraction: Foundations and Applications; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1–25. [Google Scholar]
- Karamizadeh, S.; Abdullah, S.M.; Manaf, A.A.; Zamani, M.; Hooman, A. An overview of principal component analysis. J. Signal Inf. Process. 2020, 4. [Google Scholar] [CrossRef]
- Imani, M.; Montazer, G.A. Email Spam Detection Using Linear Discriminant Analysis Based on Clustering. CSI J. Comput. Sci. Eng. 2017, 15, 22–30. [Google Scholar]
- Jolliffe, I.T.; Cadima, J. Principal component analysis: A review and recent developments. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2016, 374, 20150202. [Google Scholar] [CrossRef] [PubMed]
- Wang, X.S.; Ryoo, J.H.J.; Bendle, N.; Kopalle, P.K. The role of machine learning analytics and metrics in retailing research. J. Retail. 2021, 97, 658–675. [Google Scholar] [CrossRef]
- Ouali, Y.; Hudelot, C.; Tami, M. An overview of deep semi-supervised learning. arXiv 2020, arXiv:2006.05278. [Google Scholar]
- Van Engelen, J.E.; Hoos, H.H. A survey on semi-supervised learning. Mach. Learn. 2020, 109, 373–440. [Google Scholar] [CrossRef]
Ref. | Methods | Best Model | Accuracy | Dataset | Preprocessing Techniques |
---|---|---|---|---|---|
[22] | Multinomial NB, SVM, RF, LSTM, Stacked LSTM, Bi-LSTM, and Stacked Bi-LSTM | Stacked Bi-LSTM | 98.8% and 99.09% (with and without regex) | UCI dataset | stemming, stop word removal, regex
[24] | DT, SVM, NB, LR, AdaBoost, ANN, CNN, RF | CNN | 99.19%, 98.25% | Two different Spam SMS Dataset | Tf-IDF, Tokenizer |
[25] | One class SVM | One class SVM | 98% | UCI dataset | TF-IDF, bag-of-words |
[26] | Content-based features using averaged neural networks | Neural Network Algorithm | 98% | UCI dataset | C# framework for feature extraction such as URLs, punctuation, emojis, etc.
[27] | Classification and Clustering | For classification, SVM is best and for clustering, the K-Means algorithm is best | Weka SVM 99.3% in 1.54 s, K-Means 2.7 s, RapidMiner SVM 96.64% in 21 s, K-Means in 37.0 s. | UCI dataset | Tokenization, Stop word removal |
[28] | Decision tree, SVM, Random forest | DistilBERT + SVM | 98.07% | Grumble Text Website, NUS SMS Corpus (NSC), Caroline Tag’s Ph.D. Thesis, Spam SMS Corpus v.0.1 Big | BERT, DistillBERT, RoBERT, SpanBERT, NLP, Cosine Similarity Measures |
[29] | RF, Bi-LSTM | Bi-LSTM | 93.34% | UCI spam dataset | TAMS (Text Augmentation using Most Similar Synonyms), Word2Vec, stop word removal, duplicate removal
[30] | Backpropagation NN, NB, DT | Backpropagation NN | 97.40% | UCI spam dataset | stemming, tokenization, and feature extraction of the seven best features through NN
[31] | Hidden Markov model (HMM) | HMM | 95.90% | UCI spam dataset | Stop words removal, punctuation to original words |
[32] | ANN, Scaled Conjugate Gradient Algorithm | ANN | 99.10% | Datasets contain SMS spam, DIT spam, British language | Feature abstraction, replacement of similar words, tokenization, stemming, Lowercase conversion |
[33] | NB, LR, CNN, LSTM, RF, The boosted Gradient, | CNN | 99.44% | UCI spam dataset | Feature extraction |
Text | Label |
---|---|
Hi, How are you. When are you planning to meet me | 1 |
Congratulations on winning the prize of 2000. To stop receiving messages, type stop www.morereplayport.co.uk, accessed on 8 August 2024 Customer Support 0987617205546 | 0 |
Good Morning. Can we discuss this issue after sometime instaed of now | 1 |
Service announcement from BRP. You have received a BRP card. Please call 07046744435 right away to schedule delivery between 8 a.m. and 8 p.m. | 0 |
Text1 | Natural Language Processing is a part of AI |
Text2 | Machine learning is a part of AI |
Vocab | AI | Is | Language | Learning | Machine | Natural | Of | Part | Process |
---|---|---|---|---|---|---|---|---|---|
Text1 | 1 | 1 | 0.317 | 0 | 0 | 0.317 | 1 | 1 | 0.317 |
Text2 | 1 | 1 | 0 | 0.354 | 0.354 | 0 | 1 | 1 | 0 |
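As a hedged sketch of how such weights arise (assuming the smoothed-idf and L2-normalisation conventions used by scikit-learn's TfidfVectorizer, which may differ in detail from the settings behind the table), the two example texts can be vectorised in plain Python; shared terms such as "ai" come out near the table's 0.317 and 0.354 values:

```python
import math
from collections import Counter

docs = ["Natural Language Processing is a part of AI",
        "Machine learning is a part of AI"]

def tokenize(text):
    # lowercase words, dropping single-character tokens (so "a" is ignored)
    return [w for w in text.lower().split() if len(w) > 1]

token_docs = [tokenize(d) for d in docs]
vocab = sorted({w for d in token_docs for w in d})
n = len(docs)
df = {w: sum(w in d for d in token_docs) for w in vocab}
# smoothed idf: log((1 + n) / (1 + df)) + 1
idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}

def tfidf(tokens):
    counts = Counter(tokens)
    raw = [counts[w] * idf[w] for w in vocab]
    norm = math.sqrt(sum(x * x for x in raw))      # L2 normalisation
    return [x / norm for x in raw]

vec1, vec2 = tfidf(token_docs[0]), tfidf(token_docs[1])
# shared terms ("ai", "is", "of", "part") appear in both documents,
# so they carry lower idf than terms unique to one document
print(round(vec1[vocab.index("ai")], 3))   # -> 0.317
print(round(vec2[vocab.index("ai")], 3))   # -> 0.355
```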
Before tokenizer | me also da i fel yesterday night wait til day night dear |
After tokenizer | [29, 253, 319, 3, 384, 354, 200, 215, 355, 78, 200, 102] |
Train_data | yesterday night wait til day night dear | [354, 200, 215, 355, 78, 200, 102]
Test_data | since when which side any fever any vomitin | [835, 85, 349, 3200, 120, 120]
Encoded_train | [216, 1085, 1086, 123, 1, 1633, 320, 1634, 3, 79, 385, 2, 90, 85, 3, 40, 47] |
Padded_train_pre | [0 0 0 0 0 0 0 0 216 1085 1086 123 1 1633 320 1634 3 79 385 2 90 85 3 40 47] |
Padded_train_post | [216 1085 1086 123 1 1633 320 1634 3 79 385 2 90 85 3 40 47 0 0 0 0 0 0 0 0] |
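The pre/post padding shown above can be reproduced with a short plain-Python stand-in for Keras's pad_sequences (assuming, as in the example, that sequences are no longer than the target length, so no truncation is needed):

```python
# Sketch of sequence padding as in Keras's pad_sequences, reimplemented in
# plain Python: shorter sequences are filled with zeros either before
# ("pre") or after ("post") the tokens, so all inputs share one length.

def pad_sequence(seq, maxlen, padding="pre"):
    pad = [0] * max(0, maxlen - len(seq))
    return pad + seq if padding == "pre" else seq + pad

encoded = [216, 1085, 1086, 123, 1, 1633]
print(pad_sequence(encoded, 10, "pre"))   # zeros in front, as in Padded_train_pre
print(pad_sequence(encoded, 10, "post"))  # zeros at the end, as in Padded_train_post
```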
Model | Feature Extraction | Min_df | Clusters | Initial Iterations | Max Iterations | Random State |
---|---|---|---|---|---|---|
K-means | Vectorizer | 10 | 2 | 10 | 600 | 99
K-means | Transformer | None | 2 | 10 | 600 | 99
NMF | Vectorizer | 10 | 2 | 0 | 600 | 99 |
PCA | Transformer | 30 | 2 | 10 | 600 | 99 |
GMM | Vectorizer | 10 | 2 | 0 | 600 | 99 |
Runs | K-means Vectorizer | NMF | PCA | K-means Transformer | Gaussian Mixture
---|---|---|---|---|---|
1 | 90.51% | 88.24% | 69.81% | 71.87% | 88.31% |
2 | 91.88% | 88.24% | 71.94% | 68.18% | 91.47% |
3 | 92.30% | 88.24% | 71.66% | 68.18% | 88.03% |
4 | 91.06% | 88.24% | 66.78% | 71.87% | 82.39% |
5 | 90.51% | 88.24% | 71.94% | 71.87% | 87.14% |
6 | 91.27% | 88.24% | 71.73% | 71.87% | 86.59% |
7 | 91.06% | 88.24% | 71.32% | 71.87% | 86.31% |
8 | 92.43% | 88.24% | 71.87% | 71.87% | 92.37% |
9 | 89.96% | 88.24% | 66.71% | 71.87% | 91.33% |
10 | 92.57% | 88.24% | 72.01% | 71.87% | 87.28% |
11 | 90.92% | 88.24% | 71.80% | 71.87% | 86.67% |
12 | 89.13% | 88.24% | 72.01% | 71.87% | 85.35% |
13 | 91.06% | 88.24% | 72.01% | 71.87% | 90.30% |
14 | 90.44% | 88.24% | 69.81% | 71.87% | 91.47% |
15 | 90.30% | 88.24% | 72.01% | 71.87% | 92.37% |
16 | 90.92% | 88.24% | 72.01% | 71.87% | 92.50% |
17 | 89.41% | 88.24% | 71.80% | 71.87% | 88.10% |
18 | 91.54% | 88.24% | 71.94% | 71.87% | 89.61% |
19 | 90.78% | 88.24% | 72.01% | 71.87% | 92.37% |
20 | 92.23% | 88.24% | 71.80% | 71.87% | 90.92% |
Accuracy | 91.01% | 88.24% | 71.15% | 71.50% | 89.04% |
Model | Hyperparameters | Feature Extraction Parameters | Accuracy |
---|---|---|---|
K-means Vectorizer | min df = 10 | sublinear tf = true, norm = l2, ngram range = (1, 2), stop words = ‘english’ | 51.03%
K-means Vectorizer | min df = 10 | - | 90.92%
K-means Vectorizer | min df = 5 | - | 88.23%
K-means Vectorizer | min df = 14 | - | 82.53%
K-means Vectorizer | min df = none | - | 69.87%
Gaussian Mixture | min df = 10 | sublinear tf = true, norm = l2, ngram range = (1, 2), stop words = ‘english’ | 50.75%
Gaussian Mixture | min df = 10 | - | 89.00%
Gaussian Mixture | min df = 5 | - | 81.00%
Gaussian Mixture | min df = 14 | - | 87.00%
Gaussian Mixture | min df = none | - | -
NMF | min df = 10 | n_components = 2, solver = mu | 88.00%
K-means Transformer | - | - | 71%
PCA | n_components = 25 | - | 72%
PCA | n_components = 2 | - | 38.65%
PCA | n_components = 10 | - | 66.71%
PCA | n_components = 30 | - | -
PCA | n_components = 40 | - | -
Model | Type | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|---|
K-means Vectorizer | ML | 90.92 | 91.00 | 91.00 | 91.00
K-means Transformer | ML | 71.87 | 80.00 | 72.00 | 70.00
NMF | ML | 88.24 | 88.00 | 88.00 | 88.00
PCA | ML | 72.48 | 74.00 | 72.00 | 71.00
GMM | ML | 89.40 | 89.00 | 89.50 | 89.00
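The precision, recall, and F1 figures in the tables follow the standard confusion-matrix definitions; a minimal sketch with hypothetical labels (1 = ham, 0 = spam) shows how they are derived:

```python
# Precision/recall/F1 from confusion-matrix counts (toy predictions; the
# labels and values here are illustrative, not the paper's results).

def prf(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 1]
p, r, f = prf(y_true, y_pred)
print(round(p, 2), round(r, 2), round(f, 2))  # -> 0.67 0.67 0.67
```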
Parameter | RNN-Flatten | LSTM | Bi-LSTM |
---|---|---|---|
Training and Testing Ratio | 80:20 | 80:20 | 80:20
Random State | 42 | 42 | 42
Vocabulary | 3462 | 3462 | 3462
Max Sequence Length | 8 | 8 | 8
Embedding Size | 24 | 24 | 24
Unit Layers | 1 layer (8, 24) | 1 layer (8, 24) | 1 layer (8, 24)
Dense | 500 | 500 | 500
Dropout | 0.5 | 0.5 | 0.5
Feature Layer | ReLU | ReLU | ReLU
Classifying Layer | Sigmoid | Sigmoid | Sigmoid
Optimizer | rmsprop | adam | adam
Loss Function | Cross-entropy | Cross-entropy | Cross-entropy
Training Epochs | 50 | 50 | 50
Feature Extraction | Tokenizer and pad sequences | Tokenizer and pad sequences | Tokenizer and pad sequences
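As a configuration sketch only (the paper does not list its layer code, so the exact layer choices here, such as LSTM(24) and binary cross-entropy, are assumptions inferred from the table), the Bi-LSTM column might be assembled with the TensorFlow/Keras API like this:

```python
# Hedged sketch of the Bi-LSTM configuration above, assuming the Keras
# Sequential API; layer sizes follow the table, other details are inferred.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Bidirectional, LSTM,
                                     Dense, Dropout)

model = Sequential([
    # vocabulary 3462, embedding size 24, max sequence length 8 -> (8, 24)
    Embedding(input_dim=3462, output_dim=24, input_length=8),
    Bidirectional(LSTM(24)),            # unit count assumed to match embedding
    Dense(500, activation="relu"),      # feature layer
    Dropout(0.5),
    Dense(1, activation="sigmoid"),     # spam/ham classifying layer
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```

The RNN-Flatten variant would swap the Bidirectional(LSTM(...)) layer for a SimpleRNN returning sequences followed by a Flatten layer, and use the rmsprop optimizer per the table.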
Runs | RNN | LSTM | Bi-LSTM |
---|---|---|---|
1 | 94.50% | 92.44% | 92.44% |
2 | 95.53% | 92.10% | 92.78% |
3 | 95.19% | 92.44% | 92.10% |
4 | 94.85% | 92.44% | 93.81% |
5 | 92.78% | 91.75% | 94.50% |
6 | 94.50% | 93.13% | 94.50% |
7 | 93.47% | 91.41% | 94.50% |
8 | 92.44% | 93.13% | 92.44% |
9 | 95.16% | 91.75% | 94.85% |
10 | 94.81% | 93.13% | 94.16% |
11 | 93.81% | 93.13% | 93.13% |
12 | 93.13% | 91.41% | 94.16% |
13 | 93.13% | 91.75% | 93.81% |
14 | 93.81% | 90.72% | 95.53% |
15 | 92.44% | 92.10% | 94.50% |
16 | 94.16% | 93.47% | 94.85% |
17 | 95.13% | 93.47% | 94.16% |
18 | 93.47% | 91.07% | 94.16% |
19 | 95.50% | 91.75% | 94.50% |
20 | 94.85% | 92.44% | 94.16% |
Accuracy | 94.13% | 92.25% | 93.95% |
Model | Type | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|---|
RNN-Flatten | DL | 94.13 | 94.00 | 94.00 | 94.00
LSTM | DL | 92.09 | 93.00 | 92.00 | 92.00
Bi-LSTM | DL | 92.78 | 93.00 | 92.00 | 92.00
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Shinde, A.; Shahra, E.Q.; Basurra, S.; Saeed, F.; AlSewari, A.A.; Jabbar, W.A. SMS Scam Detection Application Based on Optical Character Recognition for Image Data Using Unsupervised and Deep Semi-Supervised Learning. Sensors 2024, 24, 6084. https://doi.org/10.3390/s24186084