Machine Learning and Data Mining for User Classification

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Processes".

Deadline for manuscript submissions: 30 September 2025 | Viewed by 7765

Special Issue Editors


Guest Editor
Department of Computer Science, International Hellenic University, 65404 Kavala, Greece
Interests: keystroke dynamics; user classification; machine learning; data mining

Guest Editor
Department of Computer Science, International Hellenic University, 65404 Kavala, Greece
Interests: signal processing; intelligent systems; pattern recognition; machine learning

Guest Editor

Special Issue Information

Dear Colleagues,

The modern Internet is characterized by its many users, the multitude of web services offered, and the increased complexity of accessing digital resources. In this context, threats have grown in both number and sophistication, requiring new methods for protecting and assisting users. In addition, the large volume of stored raw data, which grows at a dizzying pace every day, hides information that is not immediately available and takes time and effort to extract.

This Special Issue seeks new approaches for creating user profiles to protect unsuspecting users, enhancing user authentication, and extracting information from textual data. It also welcomes work on document classification and, more generally, on methods for protecting Internet users, making better use of the services offered, and extracting the available information using data derived mainly from text and typing.

Dr. Ioannis Tsimperidis
Dr. Eleni Vrochidou
Prof. Dr. George A. Papakostas
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data mining
  • information retrieval
  • text analysis
  • data clustering
  • user authentication
  • user profiling
  • user classification by inherent and acquired characteristics
  • natural language processing
  • author classification
  • keystroke dynamics
  • typing pattern recognition
  • content classification
  • digital text forensics
  • typing behavior

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.


Published Papers (4 papers)


Research


15 pages, 1708 KiB  
Article
ET-Mamba: A Mamba Model for Encrypted Traffic Classification
by Jian Xu, Liangbing Chen, Wenqian Xu, Longxuan Dai, Chenxi Wang and Lei Hu
Information 2025, 16(4), 314; https://doi.org/10.3390/info16040314 - 16 Apr 2025
Viewed by 207
Abstract
With the widespread use of encryption protocols on network data, fast and effective encrypted traffic classification can improve the efficiency of traffic analysis. A resampling method combining Wasserstein GAN and random selection is proposed to address the dataset imbalance problem: it uses Wasserstein GAN for oversampling and random selection for undersampling to achieve class equalization. Based on Mamba, a model with an ultra-low parameter count, we propose an encrypted traffic classification model, ET-Mamba, which has a pre-training phase and a fine-tuning phase. During the pre-training phase, positional embedding is used to characterize the blocks of the traffic grayscale image, and random masking is used to strengthen the learning of the intrinsic correlation among the blocks of the traffic grayscale image. During the fine-tuning phase, the agent attention mechanism is adopted in the feature extraction stage to achieve global information modeling at a low computational cost, and the SmoothLoss function is designed to address the insufficient generalization ability of the cross-entropy loss function during training. The experimental results show that the proposed model significantly reduces the number of parameters and outperforms other models in terms of classification accuracy on non-VPN datasets.
(This article belongs to the Special Issue Machine Learning and Data Mining for User Classification)

33 pages, 6468 KiB  
Article
Exploring Sentiment Analysis for the Indonesian Presidential Election Through Online Reviews Using Multi-Label Classification with a Deep Learning Algorithm
by Ahmad Nahid Ma’aly, Dita Pramesti, Ariadani Dwi Fathurahman and Hanif Fakhrurroja
Information 2024, 15(11), 705; https://doi.org/10.3390/info15110705 - 5 Nov 2024
Viewed by 2032
Abstract
Presidential elections are an important political event that often triggers intense debate. With more than 139 million users, YouTube serves as a significant platform for understanding public opinion through sentiment analysis. This study aimed to implement deep learning techniques for a multi-label sentiment analysis of comments on YouTube videos related to the 2024 Indonesian presidential election. Offering a fresh perspective compared to previous research that primarily employed traditional classification methods, this study classifies comments into eight emotional labels: anger, anticipation, disgust, joy, fear, sadness, surprise, and trust. By focusing on the emotional spectrum, this study provides a more nuanced understanding of public sentiment towards presidential candidates. The CRISP-DM method is applied, encompassing stages of business understanding, data understanding, data preparation, modeling, evaluation, and deployment, ensuring a systematic and comprehensive approach. This study employs a dataset comprising 32,000 comments, obtained via YouTube Data API, from the KPU and Najwa Shihab channels. The analysis is specifically centered on comments related to presidential candidate debates. Three deep learning models, namely Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (Bi-LSTM), and a hybrid model combining CNN and Bi-LSTM, are assessed using confusion matrix, Area Under the Curve (AUC), and Hamming loss metrics. The evaluation results demonstrate that the Bi-LSTM model achieved the highest accuracy with an AUC value of 0.91 and a Hamming loss of 0.08, indicating an excellent ability to classify sentiment with high precision and a low error rate. This innovative approach to multi-label sentiment analysis in the context of the 2024 Indonesian presidential election expands the insights into public sentiment towards candidates, offering valuable implications for political campaign strategies. Additionally, this research contributes to the fields of natural language processing and data mining by addressing the challenges associated with multi-label sentiment analysis.
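For readers unfamiliar with the Hamming loss reported in this abstract, the following is a minimal sketch of how the metric is commonly defined for multi-label classification: the fraction of individual label decisions that are wrong, averaged over all comments and all labels. The two example comments and their predictions are hypothetical and are not taken from the study's dataset; only the eight label names come from the abstract.

```python
from typing import Sequence

# The eight emotion labels named in the abstract; the order here is an assumption.
LABELS = ["anger", "anticipation", "disgust", "joy",
          "fear", "sadness", "surprise", "trust"]


def hamming_loss(y_true: Sequence[Sequence[int]],
                 y_pred: Sequence[Sequence[int]]) -> float:
    """Return the fraction of label positions where prediction and truth differ."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must contain the same number of samples")
    wrong = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each sample must have the same number of labels")
        wrong += sum(t != p for t, p in zip(true_row, pred_row))
        total += len(true_row)
    if total == 0:
        raise ValueError("no labels to evaluate")
    return wrong / total


# Hypothetical ground truth and predictions for two comments (1 = label applies).
y_true = [[1, 0, 1, 0, 0, 0, 0, 0],   # anger + disgust
          [0, 0, 0, 1, 0, 0, 0, 1]]   # joy + trust
y_pred = [[1, 0, 0, 0, 0, 0, 0, 0],   # misses disgust
          [0, 0, 0, 1, 0, 0, 1, 1]]   # adds a spurious surprise

loss = hamming_loss(y_true, y_pred)   # 2 wrong decisions out of 16
print(f"Hamming loss: {loss}")
```

Under this definition, lower values are better, and a value of 0.08 would mean that roughly 8% of the individual label decisions were incorrect.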
(This article belongs to the Special Issue Machine Learning and Data Mining for User Classification)

12 pages, 583 KiB  
Article
IKDD: A Keystroke Dynamics Dataset for User Classification
by Ioannis Tsimperidis, Olga-Dimitra Asvesta, Eleni Vrochidou and George A. Papakostas
Information 2024, 15(9), 511; https://doi.org/10.3390/info15090511 - 23 Aug 2024
Viewed by 2263
Abstract
Keystroke dynamics is the field of computer science that exploits data derived from the way users type. It has been used in authentication systems, in the identification of user characteristics for forensic or commercial purposes, and to identify the physical and mental state of users for purposes that serve human–computer interaction. Studies of keystroke dynamics have used datasets created from volunteers recording fixed-text typing or free-text typing. Unfortunately, there are not enough keystroke dynamics datasets available on the Internet, especially from the free-text category, because they contain sensitive and personal information from the volunteers. In this work, a free-text dataset is presented, which consists of 533 logfiles, each of which contains data from 3500 keystrokes, coming from 164 volunteers. Specifically, the software developed to record user typing is described, the demographics of the volunteers who participated are given, the structure of the dataset is analyzed, and the experiments performed on the dataset justify its utility.
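As an illustration of the kind of data derived from the way users type, the sketch below computes two timing features that are widely used in keystroke dynamics: hold time (how long a key stays pressed) and down-down latency (the time between consecutive key presses). The `KeyEvent` record, its field names, and the sample timings are hypothetical; the abstract does not describe the dataset's actual logfile format, so this is not a reader for IKDD.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass(frozen=True)
class KeyEvent:
    """One keystroke with press and release timestamps in milliseconds (assumed layout)."""
    key: str
    press_ms: int
    release_ms: int


def hold_times(events: List[KeyEvent]) -> List[int]:
    """Duration each key was held down."""
    return [e.release_ms - e.press_ms for e in events]


def down_down_latencies(events: List[KeyEvent]) -> List[int]:
    """Time between the press of one key and the press of the next."""
    return [b.press_ms - a.press_ms for a, b in zip(events, events[1:])]


# Hypothetical recording of someone typing "the".
sample = [
    KeyEvent("t", 0, 95),
    KeyEvent("h", 140, 220),
    KeyEvent("e", 310, 400),
]

holds = hold_times(sample)
latencies = down_down_latencies(sample)

# A tiny per-user feature vector that a classifier could consume.
features = {
    "mean_hold_ms": mean(holds),
    "mean_dd_latency_ms": mean(latencies),
}
print(features)
```

In practice, features like these are usually aggregated per key or per key pair over many keystrokes before being passed to a classifier.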
(This article belongs to the Special Issue Machine Learning and Data Mining for User Classification)

Review


41 pages, 1802 KiB  
Review
A Systematic Review of CNN Architectures, Databases, Performance Metrics, and Applications in Face Recognition
by Andisani Nemavhola, Colin Chibaya and Serestina Viriri
Information 2025, 16(2), 107; https://doi.org/10.3390/info16020107 - 5 Feb 2025
Viewed by 2362
Abstract
This study provides a comparative evaluation of face recognition databases and Convolutional Neural Network (CNN) architectures used in training and testing face recognition systems. The databases span from early datasets like Olivetti Research Laboratory (ORL) and Facial Recognition Technology (FERET) to more recent collections such as MegaFace and Ms-Celeb-1M, offering a range of sizes, subject diversity, and image quality. Older databases, such as ORL and FERET, are smaller and cleaner, while newer datasets enable large-scale training with millions of images but pose challenges like inconsistent data quality and high computational costs. The study also examines CNN architectures, including FaceNet and Visual Geometry Group 16 (VGG16), which show strong performance on large datasets like Labeled Faces in the Wild (LFW) and VGGFace, achieving accuracy rates above 98%. In contrast, earlier models like Support Vector Machine (SVM) and Gabor Wavelets perform well on smaller datasets but lack scalability for larger, more complex datasets. The analysis highlights the growing importance of multi-task learning and ensemble methods, as seen in Multi-Task Cascaded Convolutional Networks (MTCNNs). Overall, the findings emphasize the need for advanced algorithms capable of handling large-scale, real-world challenges while optimizing accuracy and computational efficiency in face recognition systems.
(This article belongs to the Special Issue Machine Learning and Data Mining for User Classification)
