MDPI - Publisher of Open Access Journals

19 pages, 1776 KiB

Open Access Review

Decoding the Genes Orchestrating Egg and Sperm Fusion Reactions and Their Roles in Fertility

by Ranjha Khan, Muhammad Azhar and Muhammad Umair

Biomedicines 2024, 12(12), 2850; https://doi.org/10.3390/biomedicines12122850 - 15 Dec 2024

Viewed by 2643

Mammalian fertilization is a complex and highly regulated process that has garnered significant attention, particularly with advancements in assisted reproductive technologies such as in vitro fertilization (IVF). The fusion of egg and sperm involves a sequence of molecular and cellular events, including capacitation, the acrosome reaction, adhesion, and membrane fusion. Critical genetic factors, such as IZUMO1, JUNO (also known as FOLR4), CD9, and several others, have been identified as essential mediators in sperm–egg recognition and membrane fusion. Additionally, glycoproteins such as ZP3 within the zona pellucida are crucial for sperm binding and triggering the acrosome reaction. Recent gene-editing technologies, such as CRISPR/Cas9 and conditional knockout models, have facilitated the functional annotation of genes such as SPAM1 and ADAM family members, further elucidating their roles in capacitation and adhesion. Furthermore, the integration of CRISPR-Cas9 with omics technologies, including transcriptomics, proteomics, and lipidomics, has unlocked new avenues for identifying previously unknown genetic players and pathways involved in fertilization. For instance, transcriptomics can uncover gene expression profiles during gamete maturation, while proteomics identifies key protein interactions critical for processes such as capacitation and the acrosome reaction. Lipidomics adds another dimension by revealing how membrane composition influences gamete fusion. Together, these tools enable the discovery of novel genes, pathways, and molecular mechanisms involved in fertility, providing insights that were previously unattainable. These approaches not only deepen our molecular understanding of fertility mechanisms but also hold promise for refining diagnostic tools and therapeutic interventions for infertility. 
This review summarizes the current molecular insights into genes orchestrating fertilization and highlights cutting-edge methodologies that propel the field toward novel discoveries. By integrating these findings, this review aims to provide valuable knowledge for clinicians, researchers, and technologists in the field of reproductive biology and assisted reproductive technologies. Full article

(This article belongs to the Special Issue Molecular and Genetic Bases of Infertility)

27 pages, 920 KiB

Open Access Article

AI-Generated Spam Review Detection Framework with Deep Learning Algorithms and Natural Language Processing

by Mudasir Ahmad Wani, Mohammed ElAffendi and Kashish Ara Shakil

Computers 2024, 13(10), 264; https://doi.org/10.3390/computers13100264 - 12 Oct 2024

Cited by 4 | Viewed by 4549

Abstract

Spam reviews pose a significant challenge to the integrity of online platforms, misleading consumers and undermining the credibility of genuine feedback. This paper introduces an innovative AI-generated spam review detection framework that leverages Deep Learning algorithms and Natural Language Processing (NLP) techniques to identify and mitigate spam reviews effectively. Our framework utilizes multiple Deep Learning models, including Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) networks, Gated Recurrent Unit (GRU), and Bidirectional LSTM (BiLSTM), to capture intricate patterns in textual data. The system processes and analyzes large volumes of review content to detect deceptive patterns by utilizing advanced NLP and text embedding techniques such as One-Hot Encoding, Word2Vec, and Term Frequency-Inverse Document Frequency (TF-IDF). By combining three embedding techniques with four Deep Learning algorithms, a total of twelve exhaustive experiments were conducted to detect AI-generated spam reviews. The experimental results demonstrate that our approach outperforms the traditional machine learning models, offering a robust solution for ensuring the authenticity of online reviews. Among the models evaluated, those employing Word2Vec embeddings, particularly the BiLSTM_Word2Vec model, exhibited the strongest performance. The BiLSTM model with Word2Vec achieved the highest performance, with an exceptional accuracy of 98.46%, a precision of 0.98, a recall of 0.97, and an F1-score of 0.98, reflecting a near-perfect balance between precision and recall. Its high F2-score (0.9810) and F0.5-score (0.9857) further highlight its effectiveness in accurately detecting AI-generated spam while minimizing false positives, making it the most reliable option for this task. Similarly, the Word2Vec-based LSTM model also performed exceptionally well, with an accuracy of 97.58%, a precision of 0.97, a recall of 0.96, and an F1-score of 0.97. 
The CNN model with Word2Vec similarly delivered strong results, achieving an accuracy of 97.61%, a precision of 0.97, a recall of 0.96, and an F1-score of 0.97. This study is unique in its focus on detecting spam reviews specifically generated by AI-based tools rather than solely detecting spam reviews or AI-generated text. This research contributes to the field of spam detection by offering a scalable, efficient, and accurate framework that can be integrated into various online platforms, enhancing user trust and the decision-making processes. Full article

(This article belongs to the Special Issue When Natural Language Processing Meets Machine Learning—Opportunities, Challenges and Solutions)

20 pages, 369 KiB

Open Access Systematic Review

A Systematic Review of Deep Learning Techniques for Phishing Email Detection

by Phyo Htet Kyaw, Jairo Gutierrez and Akbar Ghobakhlou

Electronics 2024, 13(19), 3823; https://doi.org/10.3390/electronics13193823 - 27 Sep 2024

Cited by 6 | Viewed by 11698

Abstract

The landscape of phishing email threats is continually evolving nowadays, making it challenging to combat effectively with traditional methods even with carrier-grade spam filters. Traditional detection mechanisms such as blacklisting, whitelisting, signature-based, and rule-based techniques could not effectively prevent phishing, spear-phishing, and zero-day attacks, as cybercriminals are using sophisticated techniques and trusted email service providers. Consequently, many researchers have recently concentrated on leveraging machine learning (ML) and deep learning (DL) approaches to enhance phishing email detection capabilities with better accuracy. To gain insights into the development of deep learning algorithms in the current research on phishing prevention, this study conducts a systematic literature review (SLR) following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. By synthesizing the 33 selected papers using the SLR approach, this study presents a taxonomy of DL-based phishing detection methods, analyzing their effectiveness, limitations, and future research directions to address current challenges. The study reveals that the adaptability of detection models to new behaviors of phishing emails is the major improvement area. This study aims to add details about deep learning used for security to the body of knowledge, and it discusses future research in phishing detection systems. Full article

(This article belongs to the Special Issue Machine Learning and Cybersecurity—Trends and Future Challenges)

20 pages, 3083 KiB

Open Access Article

Efficient Detection of Irrelevant User Reviews Using Machine Learning

by Cheolgi Kim and Hyeon Gyu Kim

Appl. Sci. 2024, 14(16), 6900; https://doi.org/10.3390/app14166900 - 7 Aug 2024

Viewed by 1438

Abstract

User reviews such as SNS feeds and blog writings have been widely used to extract opinions, complaints, and requirements about a given place or product from the users’ perspective. However, during the process of collecting them, a lot of reviews that are irrelevant to a given search keyword can be included in the results. Such irrelevant reviews may lead to distorted results in data analysis. In this paper, we discuss a method to detect irrelevant user reviews efficiently by combining various oversampling and machine learning algorithms. About 35,000 user reviews collected from 25 restaurants and 33 tourist attractions in Ulsan Metropolitan City, South Korea, were used for learning, where the ratio of irrelevant reviews in the two kinds of data sets was 53.7% and 71.6%, respectively. To deal with skewness in the collected reviews, oversampling algorithms such as SMOTE, Borderline-SMOTE, and ADASYN were used. To build a model for the detection of irrelevant reviews, RNN, LSTM, GRU, and BERT were adopted and compared, as they are known to provide high accuracy in text processing. The performance of the detection models was examined through experiments, and the results showed that the BERT model presented the best performance, with an F1 score of 0.965. Full article

(This article belongs to the Section Computing and Artificial Intelligence)

17 pages, 2027 KiB

Open Access Review

Genetic Deficiencies of Hyaluronan Degradation

by Stephen P. Fink and Barbara Triggs-Raine

Cells 2024, 13(14), 1203; https://doi.org/10.3390/cells13141203 - 16 Jul 2024

Cited by 6 | Viewed by 2360

Abstract

Hyaluronan (HA) is a large polysaccharide that is broadly distributed and highly abundant in the soft connective tissues and embryos of vertebrates. The constitutive turnover of HA is very high, estimated at 5 g per day in an average (70 kg) adult human, but HA turnover must also be tightly regulated in some processes. Six genes encoding homologues to bee venom hyaluronidase (HYAL1, HYAL2, HYAL3, HYAL4, HYAL6P/HYALP1, SPAM1/PH20), as well as genes encoding two unrelated G8-domain-containing proteins demonstrated to be involved in HA degradation (CEMIP/KIAA1199, CEMIP2/TMEM2), have been identified in humans. Of these, only deficiencies in HYAL1, HYAL2, HYAL3 and CEMIP have been identified as the cause or putative cause of human genetic disorders. The phenotypes of these disorders have been vital in determining the biological roles of these enzymes but there is much that is still not understood. Deficiencies in these HA-degrading proteins have been created in mice and/or other model organisms where phenotypes could be analyzed and probed to expand our understanding of HA degradation and function. This review will describe what has been found in human and animal models of hyaluronidase deficiency and discuss how this has advanced our understanding of HA’s role in health and disease. Full article

(This article belongs to the Special Issue Role of Hyaluronan in Human Health and Disease)

42 pages, 8098 KiB

Open Access Article

Leveraging Stacking Framework for Fake Review Detection in the Hospitality Sector

by Syed Abdullah Ashraf, Aariz Faizan Javed, Sreevatsa Bellary, Pradip Kumar Bala and Prabin Kumar Panigrahi

J. Theor. Appl. Electron. Commer. Res. 2024, 19(2), 1517-1558; https://doi.org/10.3390/jtaer19020075 - 15 Jun 2024

Cited by 2 | Viewed by 2527

Abstract

Driven by motives of profit and competition, fake reviews are increasingly used to manipulate product ratings. This trend has caught the attention of academic researchers and international regulatory bodies. Current methods for spotting fake reviews suffer from scalability and interpretability issues. This study focuses on identifying suspected fake reviews in the hospitality sector using a review aggregator platform. By combining features and leveraging various classifiers through a stacking architecture, we improve training outcomes. User-centric traits emerge as crucial in spotting fake reviews. Incorporating SHAP (Shapley Additive Explanations) enhances model interpretability. Our model consistently outperforms existing methods across diverse dataset sizes, proving its adaptable, explainable, and scalable nature. These findings hold implications for review platforms, decision-makers, and users, promoting transparency and reliability in reviews and decisions. Full article

17 pages, 5144 KiB

Open Access Article

Deep Learning-Based Truthful and Deceptive Hotel Reviews

by Devbrat Gupta, Anuja Bhargava, Diwakar Agarwal, Mohammed H. Alsharif, Peerapong Uthansakul, Monthippa Uthansakul and Ayman A. Aly

Sustainability 2024, 16(11), 4514; https://doi.org/10.3390/su16114514 - 26 May 2024

Cited by 3 | Viewed by 2378

Abstract

For sustainable hospitality and tourism, the validity of online evaluations is crucial at a time when they influence travelers’ choices. Understanding the facts and conducting a thorough investigation to distinguish between truthful and deceptive hotel reviews are crucial. The urgent need to discern between truthful and deceptive hotel reviews is addressed by the current study. This misleading “opinion spam” is common in the hospitality sector, misleading potential customers and harming the standing of hotel review websites. This data science project aims to create a reliable detection system that correctly recognizes and classifies hotel reviews as either true or misleading. When it comes to natural language processing, sentiment analysis is essential for determining the text’s emotional tone. With an 800-instance dataset comprising true and false reviews, this study investigates the sentiment analysis performance of three deep learning models: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and Recurrent Neural Network (RNN). Among the training, testing, and validation sets, the CNN model yielded the highest accuracy rates, measuring 98%, 77%, and 80%, respectively. Despite showing balanced precision and recall, the LSTM model was not as accurate as the CNN model, with an accuracy of 60%. There were difficulties in capturing sequential relationships, for which the RNN model further trailed, with accuracy rates of 57%, 57%, and 58%. A thorough assessment of every model’s performance was conducted using ROC curves and classification reports. Full article

(This article belongs to the Special Issue Smart Technologies and Sustainable Development in Hospitality and Tourism)

24 pages, 1490 KiB

Open Access Article

Next-Generation Spam Filtering: Comparative Fine-Tuning of LLMs, NLPs, and CNN Models for Email Spam Classification

by Konstantinos I. Roumeliotis, Nikolaos D. Tselikas and Dimitrios K. Nasiopoulos

Electronics 2024, 13(11), 2034; https://doi.org/10.3390/electronics13112034 - 23 May 2024

Cited by 10 | Viewed by 10073

Abstract

Spam emails and phishing attacks continue to pose significant challenges to email users worldwide, necessitating advanced techniques for their efficient detection and classification. In this paper, we address the persistent challenges of spam emails and phishing attacks by introducing a cutting-edge approach to email filtering. Our methodology revolves around harnessing the capabilities of advanced language models, particularly the state-of-the-art GPT-4 Large Language Model (LLM), along with BERT and RoBERTa Natural Language Processing (NLP) models. Through meticulous fine-tuning tailored for spam classification tasks, we aim to surpass the limitations of traditional spam detection systems, such as Convolutional Neural Networks (CNNs). Through an extensive literature review, experimentation, and evaluation, we demonstrate the effectiveness of our approach in accurately identifying spam and phishing emails while minimizing false positives. Our methodology showcases the potential of fine-tuning LLMs for specialized tasks like spam classification, offering enhanced protection against evolving spam and phishing attacks. This research contributes to the advancement of spam filtering techniques and lays the groundwork for robust email security systems in the face of increasingly sophisticated threats. Full article

(This article belongs to the Special Issue Automated Methods for Speech Processing and Recognition)

23 pages, 637 KiB

Open Access Article

Unsupervised Domain Adaptation via Weighted Sequential Discriminative Feature Learning for Sentiment Analysis

by Haidi Badr, Nayer Wanas and Magda Fayek

Appl. Sci. 2024, 14(1), 406; https://doi.org/10.3390/app14010406 - 1 Jan 2024

Cited by 3 | Viewed by 2409

Abstract

Unsupervised domain adaptation (UDA) presents a significant challenge in sentiment analysis, especially when faced with differences between source and target domains. This study introduces Weighted Sequential Unsupervised Domain Adaptation (WS-UDA), a novel sequential framework aimed at discovering more profound features and improving target representations, even in resource-limited scenarios. WS-UDA utilizes a domain-adversarial learning model for sequential discriminative feature learning. While recent UDA techniques excel in scenarios where source and target domains are closely related, they struggle with substantial dissimilarities. This potentially leads to instability during shared-feature learning. To tackle this issue, WS-UDA employs a two-stage transfer process concurrently, significantly enhancing model stability and adaptability. The sequential approach of WS-UDA facilitates superior adaptability to varying levels of dissimilarity between source and target domains. Experimental results on benchmark datasets, including Amazon reviews, FDU-MTL datasets, and Spam datasets, demonstrate the promising performance of WS-UDA. It outperforms state-of-the-art cross-domain unsupervised baselines, showcasing its efficacy in scenarios with dissimilar domains. WS-UDA’s adaptability extends beyond sentiment analysis, making it a versatile solution for diverse text classification tasks. Full article

(This article belongs to the Special Issue Application of Machine Learning in Text Mining)

16 pages, 487 KiB

Open Access Article

The Language of Deception: Applying Findings on Opinion Spam to Legal and Forensic Discourses

by Alibek Jakupov, Julien Longhi and Besma Zeddini

Languages 2024, 9(1), 10; https://doi.org/10.3390/languages9010010 - 22 Dec 2023

Cited by 2 | Viewed by 4093

Abstract

Digital forensic investigations are becoming increasingly crucial in criminal investigations and civil litigations, especially in cases of corporate espionage and intellectual property theft as more communication occurs online via e-mail and social media. Deceptive opinion spam analysis is an emerging field of research that aims to detect and identify fraudulent reviews, comments, and other forms of deceptive online content. In this paper, we explore how the findings from this field may be relevant to forensic investigation, particularly the features that capture stylistic patterns and sentiments, which are psychologically relevant aspects of truthful and deceptive language. To assess these features’ utility, we demonstrate the potential of our proposed approach using the real-world dataset from the Enron Email Corpus. Our findings suggest that deceptive opinion spam analysis may be a valuable tool for forensic investigators and legal professionals looking to identify and analyze deceptive behavior in online communication. By incorporating these techniques into their investigative and legal strategies, professionals can improve the accuracy and reliability of their findings, leading to more effective and just outcomes. Full article

(This article belongs to the Special Issue New Challenges in Forensic and Legal Linguistics)

18 pages, 1688 KiB

Open Access Article

Review Evaluation for Hotel Recommendation

by Ying-Chia Hsieh, Long-Chuan Lu and Yi-Fan Ku

Electronics 2023, 12(22), 4673; https://doi.org/10.3390/electronics12224673 - 16 Nov 2023

Cited by 2 | Viewed by 1513

Abstract

With the prevalence of backpacking and the convenience of using the Internet, many travelers like sharing their experiences in online communities. The development of online communities has changed the decision-making process of consumer purchasing, especially for travel, i.e., some travelers reconsider their decisions because they believe that the reviews of online communities are more valuable than advertisements. However, these reviews are not completely reliable since most reviews are provided without specific author information and the review data are too large to be observed. In this paper, we propose a novel approach (named ET) to evaluate the trustworthiness of reviews in online travel communities. Our method considers three concepts, including the sentiment similarity of reviewers in the social network, features of the reviews, and behaviors of the reviewers. The experimental results demonstrate that our method is effective in evaluating the trustworthiness of reviews. Full article

(This article belongs to the Special Issue Data Push and Data Mining in the Age of Artificial Intelligence)

14 pages, 3079 KiB

Open Access Article

A Hybrid Model with New Word Weighting for Fast Filtering Spam Short Texts

by Tian Xia, Xuemin Chen, Jiacun Wang and Feng Qiu

Sensors 2023, 23(21), 8975; https://doi.org/10.3390/s23218975 - 4 Nov 2023

Cited by 2 | Viewed by 2948

Abstract

Short message services (SMS), microblogging tools, instant message apps, and commercial websites produce numerous short text messages every day. These short text messages are usually guaranteed to reach a mass audience at low cost. Spammers take advantage of short texts by sending bulk malicious or unwanted messages. Short texts are difficult to classify because of their shortness, sparsity, rapidness, and informal writing. The effectiveness of the hidden Markov model (HMM) for short text classification has been illustrated in our previous study. However, the HMM has limited capability to handle new words, which are mostly generated by informal writing. In this paper, a hybrid model is proposed to address the informal writing issue by weighting new words for fast short text filtering with high accuracy. The hybrid model consists of an artificial neural network (ANN) and an HMM, which are used for new word weighting and spam filtering, respectively. The weight of a new word is calculated based on the weights of its neighbor, along with the spam and ham (i.e., not spam) probabilities of the short text message predicted by the ANN. Performance evaluations are conducted on benchmark datasets, including the SMS message data maintained by the University of California, Irvine, movie reviews, and customer reviews. The hybrid model operates at a significantly higher speed than deep learning models. The experimental results show that the proposed hybrid model outperforms other prominent machine learning algorithms, achieving a good balance between filtering throughput and accuracy. Full article

(This article belongs to the Section Sensor Networks)

15 pages, 2289 KiB

Open Access Article

Policy-Based Spam Detection of Tweets Dataset

by Momna Dar, Faiza Iqbal, Rabia Latif, Ayesha Altaf and Nor Shahida Mohd Jamail

Electronics 2023, 12(12), 2662; https://doi.org/10.3390/electronics12122662 - 14 Jun 2023

Cited by 10 | Viewed by 2734

Abstract

Spam communications from spam ads and social media platforms such as Facebook, Twitter, and Instagram are increasing, making spam detection more popular. Many languages are used for spam review identification, including Chinese, Urdu, Roman Urdu, English, Turkish, etc.; however, there are fewer high-quality datasets available for Urdu. This is mainly because Urdu is less extensively used on social media networks such as Twitter, making it harder to collect huge volumes of relevant data. This paper investigates policy-based Urdu tweet spam detection. This study aims to collect over 1,100,000 real-time tweets from multiple users. The dataset is carefully filtered to comply with Twitter’s 100-tweet-per-hour limit. For data collection, the snscrape library is utilized, which is equipped with an API for accessing various attributes such as username, URL, and tweet content. Then, a machine learning pipeline is developed, consisting of TF-IDF, Count Vectorizer, and the following classifiers: multinomial naïve Bayes, support vector classifier (RBF), logistic regression, and BERT. Based on Twitter policy standards, feature extraction is performed, and the dataset is separated into training and testing sets for spam analysis. Experimental results show that the logistic regression classifier has achieved the highest accuracy, with an F1-score of 0.70 and an accuracy of 99.55%. The findings of the study show the effectiveness of policy-based spam detection in Urdu tweets using machine learning and BERT layer models and contribute to the development of a robust Urdu language social media spam detection method. Full article

(This article belongs to the Section Computer Science & Engineering)

28 pages, 5319 KiB

Open Access Article

Twenty Years of Machine-Learning-Based Text Classification: A Systematic Review

by Ashokkumar Palanivinayagam, Claude Ziad El-Bayeh and Robertas Damaševičius

Algorithms 2023, 16(5), 236; https://doi.org/10.3390/a16050236 - 29 Apr 2023

Cited by 45 | Viewed by 13172

Abstract

Machine-learning-based text classification is one of the leading research areas and has a wide range of applications, which include spam detection, hate speech identification, reviews, rating summarization, sentiment analysis, and topic modelling. Widely used machine-learning-based research differs in terms of the datasets, training methods, performance evaluation, and comparison methods used. In this paper, we surveyed 224 papers published between 2003 and 2022 that employed machine learning for text classification. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement is used as the guideline for the systematic review process. The comprehensive differences in the literature are analyzed in terms of six aspects: datasets, machine learning models, best accuracy, performance evaluation metrics, training and testing splitting methods, and comparisons among machine learning models. Furthermore, we highlight the limitations and research gaps in the literature. Although the research works included in the survey perform well in terms of text classification, improvement is required in many areas. We believe that this survey paper will be useful for researchers in the field of text classification. Full article

(This article belongs to the Special Issue Machine Learning in Statistical Data Processing)

42 pages, 3130 KiB

Open Access Review

A Comprehensive Review of Cyber Security Vulnerabilities, Threats, Attacks, and Solutions

by Ömer Aslan, Semih Serkant Aktuğ, Merve Ozkan-Okay, Abdullah Asim Yilmaz and Erdal Akin

Electronics 2023, 12(6), 1333; https://doi.org/10.3390/electronics12061333 - 11 Mar 2023

Cited by 318 | Viewed by 103208

Abstract

Internet usage has grown exponentially, with individuals and companies performing multiple daily transactions in cyberspace rather than in the real world. The coronavirus (COVID-19) pandemic has accelerated this process. As a result of the widespread usage of the digital environment, traditional crimes have also shifted to the digital space. Emerging technologies such as cloud computing, the Internet of Things (IoT), social media, wireless communication, and cryptocurrencies are raising security concerns in cyberspace. Recently, cyber criminals have started to use cyber attacks as a service to automate attacks and leverage their impact. Attackers exploit vulnerabilities that exist in hardware, software, and communication layers. Various types of cyber attacks include distributed denial of service (DDoS), phishing, man-in-the-middle, password, remote, privilege escalation, and malware. Due to new-generation attacks and evasion techniques, traditional protection systems such as firewalls, intrusion detection systems, antivirus software, access control lists, etc., are no longer effective in detecting these sophisticated attacks. Therefore, there is an urgent need to find innovative and more feasible solutions to prevent cyber attacks. The paper first extensively explains the main reasons for cyber attacks. Then, it reviews the most recent attacks, attack patterns, and detection techniques. Thirdly, the article discusses contemporary technical and nontechnical solutions for recognizing attacks in advance. Using trending technologies such as machine learning, deep learning, cloud platforms, big data, and blockchain can be a promising solution for current and future cyber attacks. These technological solutions may assist in detecting malware, intrusion detection, spam identification, DNS attack classification, fraud detection, recognizing hidden channels, and distinguishing advanced persistent threats. 
However, some promising solutions, especially machine learning and deep learning, are not resistant to evasion techniques, which must be considered when proposing solutions against intelligent cyber attacks. Full article
