Machine Learning and AI in Intelligent Data Mining and Analysis

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 June 2023) | Viewed by 9260

Special Issue Editors


Prof. Dr. Chuan Shi
Guest Editor
School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
Interests: machine learning; AI; data mining; graph neural networks; data analysis

Dr. Cheng Yang
Guest Editor
School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
Interests: data mining; natural language processing

Prof. Dr. Yong Fang
Guest Editor
School of Cyber Science and Engineering, Sichuan University, Chengdu 610000, China
Interests: cyber security; information confrontation; deep learning; system security; cyber-attack detection

Dr. Lichao Sun
Guest Editor
Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA 18045, USA
Interests: trustworthy AI; medical AI; federated learning

Special Issue Information

Dear Colleagues,

With huge volumes of data now available for mining and analysis, AI methods have been applied successfully in a wide variety of applications. Over the last decade, machine learning and AI have largely displaced the traditional paradigm of data mining and analysis. Compared with conventional feature engineering approaches, which rely mainly on expert knowledge, representative AI methods based on deep learning or machine learning are highly data-driven and can effectively capture the intrinsic correlations in data. We are interested in articles that explore machine learning and AI methods in intelligent data mining and analysis. Potential topics include, but are not limited to, the following:

  • Machine learning and AI algorithms widely used in data mining and analysis, e.g., graph neural networks or transformers. 
  • Interesting data mining applications based on deep learning or machine learning techniques, e.g., in the security, drug, or software domains.
  • Resources related to the above topics, e.g., new datasets, tools, or surveys.

Prof. Dr. Chuan Shi
Dr. Cheng Yang
Prof. Dr. Yong Fang
Dr. Lichao Sun
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • AI
  • data mining
  • graph neural networks
  • data analysis

Published Papers (6 papers)

Research

21 pages, 4171 KiB  
Article
Deep Learning-Based Detection Technology for SQL Injection Research and Implementation
by Hao Sun, Yuejin Du and Qi Li
Appl. Sci. 2023, 13(16), 9466; https://doi.org/10.3390/app13169466 - 21 Aug 2023
Cited by 1 | Viewed by 2327
Abstract
Amid the incessant evolution of the Internet, an array of cybersecurity threats has surged at an unprecedented rate. A notable antagonist within this plethora of attacks is the SQL injection assault, a prevalent form of Internet attack that poses a significant threat to web applications. These attacks are characterized by their extensive variety, rapid mutation, covert nature, and the substantial damage they can inflict. Existing SQL injection detection methods, such as static and dynamic detection and command randomization, are principally rule-based and suffer from low accuracy and high false positive (FP) and false negative (FN) rates. Contemporary machine learning research on SQL injection attack (SQLIA) detection primarily focuses on feature extraction. The effectiveness of detection is heavily reliant on the precision of feature extraction, leading to a deficiency in tackling more intricate SQLIAs. To address these challenges, we propose a novel SQLIA detection approach harnessing the power of an enhanced TextCNN and LSTM. This method begins by vectorizing the samples in the corpus and then leverages an improved TextCNN to extract local features. It then employs a Bidirectional LSTM (Bi-LSTM) network to decipher the sequence information inherent in the samples. Given LSTM’s modest effectiveness for relatively long sequences, we further integrate an attention mechanism, reducing the distance between any two words in the sequence to one, thereby enhancing the model’s effectiveness. Moreover, pre-trained word vector features acquired via BERT for transfer learning are incorporated into the feature section. Comparative experimental results affirm the superiority of our deep learning-based SQLIA detection approach, as it effectively elevates the SQLIA recognition rate while reducing both FP and FN rates.
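The remark that attention reduces the distance between any two words to one can be illustrated with a small sketch. The abstract does not say which attention variant the authors use, so the code below assumes plain scaled dot-product self-attention with no learned projections; the function names `softmax` and `self_attention` are hypothetical and the input vectors merely stand in for Bi-LSTM hidden states.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(states):
    """Scaled dot-product self-attention over token vectors.

    `states` is a list of equal-length vectors, one per token
    (an assumed stand-in for Bi-LSTM hidden states).
    Returns the attended vectors and the attention weight rows.
    """
    d = len(states[0])
    outputs, all_weights = [], []
    for query in states:
        # Each token scores every token directly, regardless of position.
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(d)
            for key in states
        ]
        weights = softmax(scores)
        # The output is the weighted average of all token vectors.
        attended = [
            sum(w * vec[c] for w, vec in zip(weights, states))
            for c in range(d)
        ]
        outputs.append(attended)
        all_weights.append(weights)
    return outputs, all_weights


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    attended, weights = self_attention(tokens)
    for row in weights:
        print([round(w, 3) for w in row])
```

Every weight row is strictly positive and sums to one, so each token draws on every other token in a single step instead of relaying information through intermediate recurrent states.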
(This article belongs to the Special Issue Machine Learning and AI in Intelligent Data Mining and Analysis)

14 pages, 545 KiB  
Article
Self-Supervised Spatio-Temporal Graph Learning for Point-of-Interest Recommendation
by Jiawei Liu, Haihan Gao, Chuan Shi, Hongtao Cheng and Qianlong Xie
Appl. Sci. 2023, 13(15), 8885; https://doi.org/10.3390/app13158885 - 01 Aug 2023
Viewed by 1135
Abstract
As one of the most crucial topics in the recommendation system field, point-of-interest (POI) recommendation aims to recommend potentially interesting POIs to users. Recently, graph neural networks have been successfully used to model interaction and spatio-temporal information in POI recommendations, but the data sparsity of POI recommendations affects the training of GNNs. Although some existing GNN-based POI recommendation approaches try to use social relationships or user attributes to alleviate the data sparsity problem, such auxiliary information is not always available for privacy reasons. Self-supervised learning provides a new idea to alleviate the data sparsity problem, but most existing self-supervised recommendation methods are designed for bipartite graphs or social graphs, and cannot be directly used in the spatio-temporal graph of POI recommendations. In this paper, we propose a new method named SSTGL to combine self-supervised learning and GNN-based POI recommendation for the first time. SSTGL is empowered with spatio-temporal-aware strategies in the data augmentation and pretext task stages, respectively, so that it can provide high-quality supervision information by incorporating spatio-temporal prior knowledge. By combining a self-supervised learning objective with recommendation objectives, SSTGL can improve the performance of GNN-based POI recommendations. Extensive experiments on three POI recommendation datasets demonstrate the effectiveness of SSTGL, which performed better than existing mainstream methods.

19 pages, 1788 KiB  
Article
Enhancing Phishing Email Detection through Ensemble Learning and Undersampling
by Qinglin Qi, Zhan Wang, Yijia Xu, Yong Fang and Changhui Wang
Appl. Sci. 2023, 13(15), 8756; https://doi.org/10.3390/app13158756 - 28 Jul 2023
Cited by 4 | Viewed by 1808
Abstract
In real-world scenarios, the number of phishing and benign emails is usually imbalanced, leading to traditional machine learning or deep learning algorithms being biased towards benign emails and misclassifying phishing emails. Few studies take measures to address the imbalance between them, which significantly threatens people’s financial and information security. To mitigate the impact of imbalance on the model and enhance the detection performance of phishing emails, this paper proposes two new algorithms with undersampling: the Fisher–Markov-based phishing ensemble detection (FMPED) method and the Fisher–Markov–Markov-based phishing ensemble detection (FMMPED) method. The algorithms first remove benign emails in overlapping areas, then undersample the remaining benign emails, and finally, combine the retained benign emails with phishing emails into a new training set, using ensemble learning algorithms for training and classification. Experimental results have demonstrated that the proposed algorithms outperform other machine learning and deep learning algorithms, achieving an F1-score of 0.9945, an accuracy of 0.9945, an AUC of 0.9828, and a G-mean of 0.9827.
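As a rough illustration of the undersampling step and the G-mean metric mentioned above, the sketch below uses plain random undersampling of the majority (benign) class. This is a deliberately simpler stand-in, not the Fisher–Markov-based selection the abstract describes, and it omits the overlap-removal and ensemble stages entirely. The function names, the `ratio` parameter, and the label encoding (0 = benign, 1 = phishing) are all assumptions made for the example.

```python
import math
import random


def random_undersample(samples, labels, majority_label, ratio=1.0, seed=0):
    """Keep every minority sample and a random subset of the majority class.

    `ratio` is the number of majority samples kept per minority sample
    (an assumed knob, not taken from the paper).
    """
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    keep_n = min(len(majority), int(round(ratio * len(minority))))
    kept = sorted(rng.sample(majority, keep_n) + minority)
    return [samples[i] for i in kept], [labels[i] for i in kept]


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    emails = list(range(10))      # placeholder feature vectors
    labels = [0] * 8 + [1] * 2    # 0 = benign (majority), 1 = phishing
    x_bal, y_bal = random_undersample(emails, labels, majority_label=0)
    print(y_bal.count(0), y_bal.count(1))
    print(g_mean([1, 1, 0, 0], [1, 0, 0, 0]))
```

Unlike accuracy, G-mean drops sharply when either class is poorly recognized, which is why it is commonly reported alongside F1 and AUC on imbalanced data.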

16 pages, 728 KiB  
Article
A New Model for Emotion-Driven Behavior Extraction from Text
by Yawei Sun, Saike He, Xu Han and Ruihua Zhang
Appl. Sci. 2023, 13(15), 8700; https://doi.org/10.3390/app13158700 - 27 Jul 2023
Viewed by 1110
Abstract
Emotion analysis is currently a popular research direction in the field of natural language processing. However, existing research focuses primarily on tasks such as emotion classification, emotion extraction, and emotion cause analysis, while there are few investigations into the relationship between emotions and their impacts. To address this gap, this paper introduces the emotion-driven behavior extraction (EDBE) task, which separately extracts emotions and behaviors in order to filter emotion-driven behaviors described in text. EDBE comprises three sub-tasks: emotion extraction, behavior extraction, and emotion–behavior pair filtering. To facilitate research in this domain, we have created a new dataset, which is accessible to the research community. To address the EDBE task, we propose a pipeline approach that incorporates the causal relationship between emotions and driven behaviors. Additionally, we adopt the prompt paradigm to improve the model’s representation of cause-and-effect relationships. In comparison to state-of-the-art methods, our approach demonstrates notable improvements, achieving a 1.32% improvement at the clause level and a 1.55% improvement at the span level on our newly curated dataset in terms of the F1 score, which is a commonly used metric to measure the performance of models. These results underscore the effectiveness and superiority of our approach in relation to existing methods.

16 pages, 438 KiB  
Article
Incorporating Multi-Hypotheses as Soft-Templates in Neural Headline Generation
by Yana A, Zhenghao Liu, Suyalatu Dong and Fanyu Bu
Appl. Sci. 2023, 13(14), 8478; https://doi.org/10.3390/app13148478 - 22 Jul 2023
Viewed by 548
Abstract
Neural models are widely applied to headline generation. Template-based methods are a promising direction to overcome the shortcomings of the neural headline generation (NHG) model in generating duplicate or extra words. Previous works often retrieve relevant headlines from the training data and adopt them as the soft template to guide the NHG model. However, these works had two drawbacks: reliance on additional retrieval tools, and uncertainty regarding semantic consistency between the retrieved headline and the source article. The NHG model uncertainty can be utilized to generate hypotheses. The hypotheses generated based on a well-trained NHG model not only contain salient information but also exhibit diversity, making them suitable as soft templates. In this study, we use a basic NHG model to generate multiple diverse hypotheses as candidate templates. Then, we propose a novel Multiple-Hypotheses-based NHG (MH-NHG) model. Experiments on English headline generation tasks demonstrate that it outperforms several baseline systems and achieves performance comparable to that of the state-of-the-art system. This indicates that MH-NHG can generate more accurate headlines guided by multiple hypotheses.

19 pages, 3173 KiB  
Article
DTGCF: Diversified Tag-Aware Recommendation with Graph Collaborative Filtering
by Yi Zuo, Shengzong Liu and Yun Zhou
Appl. Sci. 2023, 13(5), 2945; https://doi.org/10.3390/app13052945 - 24 Feb 2023
Cited by 1 | Viewed by 1450
Abstract
In tag-aware recommender systems, users are strongly encouraged to utilize arbitrary tags to mark items of interest. These user-defined tags can be viewed as a bridge linking users and items. Most tag-aware recommendation models focus on improving accuracy by introducing ingenious designs or complicated structures to handle the tagging information appropriately. Beyond accuracy, diversity is considered to be another important indicator affecting user satisfaction. Recommending more diverse items can surface more interesting items and increase commercial sales. Therefore, we propose a diversified tag-aware recommendation model based on graph collaborative filtering. The proposed model establishes a generic graph collaborative filtering framework tailored for tag-aware recommendations. To promote diversity, we adopt two modules: personalized category-boosted negative sampling to select a certain proportion of similar but negative items as negative samples for training, and adversarial learning to make the learned item representation category-free. To improve accuracy, we conduct a two-way TransTag regularization to model the relationship among users, items, and tags. Blending these modules into the proposed framework, we can optimize both the accuracy and diversity concurrently in an end-to-end manner. Experiments on MovieLens datasets show that the proposed model can provide diverse recommendations while maintaining a high level of accuracy.