Machine Learning for Data Mining

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 19 February 2026 | Viewed by 1265

Special Issue Editors


Guest Editor
Division of Science, Mathematics, and Technology (DSMT), Governors State University, University Park, IL 60484, USA
Interests: big data analytics and stochastic optimization for renewable energy integration; data mining and data engineering; smart grids; embedded systems and machine learning

Guest Editor
Department of Computer Science and Engineering, University of Nevada, Reno, Reno, NV 89557, USA
Interests: big data analytics; data mining; data engineering

Guest Editor
Department of Information Technology, Kennesaw State University, Marietta, GA 30060, USA
Interests: machine learning; data imputation; forecasting; model optimisation; data visualisation; big data visualisation and interaction; AR/VR data visualisation

Special Issue Information

Dear Colleagues,

The field of machine learning (ML) has become the cornerstone of modern data mining, enabling the extraction of meaningful insights from complex, high-dimensional datasets. By leveraging advanced algorithms, neural networks, and statistical models, machine learning has enhanced data mining capabilities in fields such as healthcare, cybersecurity, IoT, and intelligent systems. The rapid growth of data generated by digital platforms, edge computing, and distributed sensors has further expanded the need for scalable, efficient, and adaptive machine learning-driven data mining techniques. Recent breakthroughs in automated feature engineering, ensemble learning, and explainable AI have broadened the potential of data mining, enabling researchers and practitioners to develop robust predictive models, optimise decision-making processes, and discover hidden patterns in massive datasets. In addition, emerging challenges such as federated learning and real-time stream mining have opened up exciting new research directions.

This Special Issue "Machine Learning for Data Mining" aims to showcase cutting-edge research and innovative methods at the intersection of machine learning and data mining. We encourage submissions exploring novel algorithms, scalable frameworks, and practical applications to address real-world challenges.

Original research articles and reviews are welcome. Potential topics include, but are not limited to, the following:

  • Advanced machine learning algorithms;
  • Deep learning in data mining;
  • Scalable and distributed learning;
  • Explainable AI in data mining;
  • Privacy-preserving data mining;
  • Real-time and streaming mining.

We look forward to your valuable contributions and groundbreaking research in this rapidly evolving field.

Dr. Yunchuan Liu
Dr. Lei Yang
Dr. Rui Wu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • data mining
  • deep learning
  • predictive modelling
  • big data analytics

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (4 papers)


Research

15 pages, 1610 KB  
Article
Machine Learning Approaches for Classifying Chess Game Outcomes: A Comparative Analysis of Player Ratings and Game Dynamics
by Kamil Samara, Aaron Antreassian, Matthew Klug and Mohammad Sakib Hasan
Electronics 2026, 15(1), 1; https://doi.org/10.3390/electronics15010001 - 19 Dec 2025
Abstract
Online chess platforms generate vast amounts of game data, presenting opportunities to analyze match outcomes using machine learning approaches. This study develops and compares four machine learning models to classify chess game results (White win, Black win, or Draw) by integrating player rating information with game dynamic metadata. We analyzed 11,510 complete games from the Lichess platform after preprocessing a dataset of 20,058 initial records. Seven key features were engineered to capture both pre-game skill parameters (player ratings, rating difference) and game complexity metrics (game duration, turn count). Four machine learning algorithms were implemented and optimized through grid search cross-validation: Multinomial Logistic Regression, Random Forest, K-Nearest Neighbors, and Histogram Gradient Boosting. The Gradient Boosting classifier achieved the highest performance with 83.19% accuracy on hold-out data and consistent 5-fold cross-validation scores (83.08% ± 0.009%). Feature importance analysis revealed that game complexity (number of turns) was the strongest correlate of the outcome across all models, followed by the rating difference between opponents. Draws represented only 5.11% of outcomes, creating class imbalance challenges that affected classification performance for this outcome category. The results demonstrate that ensemble methods, particularly gradient boosting, can effectively capture non-linear interactions between player skill and game length to classify chess outcomes. These findings have practical applications for chess platforms in automated content curation, post-game quality assessment, and engagement enhancement strategies. The study establishes a foundation for robust outcome analysis systems in online chess environments. Full article
(This article belongs to the Special Issue Machine Learning for Data Mining)
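
As a rough illustration of the pipeline this abstract describes, the sketch below sets up a grid-searched histogram gradient boosting classifier with 5-fold cross-validation in scikit-learn. The file name, feature names, and parameter grid are illustrative assumptions, not the authors' actual code or data layout.

```python
# Hedged sketch: multi-class chess-outcome classification with grid-searched
# histogram gradient boosting, loosely following the abstract's description.
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

games = pd.read_csv("lichess_games.csv")  # hypothetical preprocessed export

# Pre-game skill features plus game-complexity features, as in the abstract.
features = ["white_rating", "black_rating", "rating_diff", "turns", "game_duration"]
X, y = games[features], games["outcome"]  # outcome in {white, black, draw}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Illustrative grid; the paper's exact search space is not specified here.
param_grid = {
    "learning_rate": [0.05, 0.1],
    "max_depth": [None, 6],
    "max_iter": [200, 400],
}
search = GridSearchCV(
    HistGradientBoostingClassifier(random_state=42),
    param_grid,
    cv=5,                 # 5-fold cross-validation, as reported
    scoring="accuracy",
    n_jobs=-1,
)
search.fit(X_train, y_train)
print("best CV accuracy:", search.best_score_)
print("hold-out accuracy:", search.best_estimator_.score(X_test, y_test))
```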

17 pages, 12946 KB  
Article
A Comparative Analysis of LLM-Based Customer Representation Learning Techniques
by Sangyeop Lee, Jong Seo Kim, Kisoo Kim, Bojung Ko, Junho Moon and Minsik Park
Electronics 2025, 14(24), 4783; https://doi.org/10.3390/electronics14244783 - 5 Dec 2025
Viewed by 261
Abstract
Recent advances in large language models (LLMs) have enabled the effective representation of customer behaviors, including purchases, repairs, and consultations. These LLM-based customer representation models can be applied to predicting a customer's future behavior or to clustering customers whose latent-vector representations are similar. Since these representation technologies depend on data, this paper examines whether training a recommendation model (BERT4Rec) from scratch or fine-tuning a pre-trained LLM (ELECTRA) is more effective for our customer data. To address this, a three-step approach is conducted: (1) defining a sequence of customer behaviors as textual inputs for LLM-based representation learning, (2) extracting customer representations as latent vectors by training or fine-tuning representation models on a dataset of 14 million customers, and (3) training classifiers to predict purchase outcomes for eight products. Our focus is on comparing two primary approaches in step (2): training BERT4Rec from scratch versus fine-tuning pre-trained ELECTRA. The average AUC and F1-score of the classifiers across the eight products reveal that the two methods differ by only 0.012 in AUC and 0.007 in F1-score. On the other hand, the fine-tuned ELECTRA achieves a 0.27 improvement in the top 10% lift for targeted marketing strategies. This result is particularly meaningful given that buyers of the products constitute only about 0.5% of the entire dataset. Beyond the three-step approach, we make an effort to interpret the latent space in two dimensions and the attention shifts in the fine-tuned ELECTRA. Furthermore, we compare its efficiency advantages against fine-tuned LLaMA2. These findings provide practical insights for optimizing LLM-based representation models in industrial applications. Full article
(This article belongs to the Special Issue Machine Learning for Data Mining)
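
To make steps (1)–(3) of the abstract concrete, the sketch below encodes a textual behavior sequence with a pre-trained ELECTRA encoder and feeds the resulting latent vector to a downstream classifier. The checkpoint, text template, pooling choice, and classifier are illustrative assumptions; the paper fine-tunes ELECTRA on 14 million customers' sequences rather than using a frozen off-the-shelf model.

```python
# Hedged sketch: customer behavior sequence -> ELECTRA latent vector -> classifier.
import torch
from transformers import ElectraTokenizer, ElectraModel
from sklearn.linear_model import LogisticRegression

tokenizer = ElectraTokenizer.from_pretrained("google/electra-small-discriminator")
encoder = ElectraModel.from_pretrained("google/electra-small-discriminator")
encoder.eval()

def embed(behavior_sequence: str) -> torch.Tensor:
    """Encode a textual behavior sequence; use the first-token hidden state
    as the customer representation (one common pooling choice among several)."""
    inputs = tokenizer(behavior_sequence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state
    return hidden[:, 0, :].squeeze(0)

# Step (1): behaviors serialized as text (hypothetical template).
sequences = [
    "purchase:refrigerator repair:washer consultation:tv",
    "consultation:laptop purchase:tv purchase:soundbar",
]
labels = [1, 0]  # did the customer later buy the target product?

# Step (2): latent vectors; step (3): a simple purchase classifier on top.
X = torch.stack([embed(s) for s in sequences]).numpy()
clf = LogisticRegression().fit(X, labels)
```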

20 pages, 752 KB  
Article
Automatic Labeling of Real-World PMU Data: A Weakly Supervised Learning Approach
by Yunchuan Liu, Lei Yang and Junshan Zhang
Electronics 2025, 14(23), 4703; https://doi.org/10.3390/electronics14234703 - 28 Nov 2025
Viewed by 212
Abstract
This paper presents a weakly supervised learning framework for real-world event identification in transmission networks using phasor measurement unit (PMU) data. The growing integration of renewable energy sources has introduced greater variability in grid conditions, intensifying the need for accurate event detection. Although high-resolution PMU measurements enable event identification to be formulated as a classification problem, traditional supervised learning approaches are hindered by the scarcity of labeled data, and acquiring large-scale, high-quality labeled PMU datasets remains prohibitively expensive. To overcome this challenge, we propose an automated PMU data-labeling method that combines domain knowledge with machine learning techniques through the use of labeling functions. A novel t-cherry junction tree-based estimation algorithm is introduced to enhance label accuracy, and a greedy strategy is employed to reduce computational complexity. These components are integrated into a weakly supervised framework capable of training robust event classifiers using limited labeled data and abundant unlabeled data. Extensive experiments on real-world PMU datasets demonstrate that our approach achieves competitive accuracy with significantly fewer labeled samples compared to conventional data-driven methods, highlighting its adaptability and resilience under real-world conditions. Full article
(This article belongs to the Special Issue Machine Learning for Data Mining)
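
The sketch below illustrates the labeling-function idea the abstract describes: simple domain-knowledge rules vote on PMU windows and abstain when unsure. The thresholds, feature names, and majority-vote aggregation are assumptions for illustration; the paper instead combines labeling functions with a t-cherry junction tree-based label estimator.

```python
# Hedged sketch: domain-knowledge labeling functions for PMU event windows,
# aggregated here by a simple majority vote (a stand-in for the paper's
# probabilistic t-cherry junction tree estimator).
import numpy as np

ABSTAIN, NORMAL, EVENT = -1, 0, 1

def lf_frequency_dip(window):
    # Flag a window whose frequency deviates noticeably from 60 Hz.
    return EVENT if np.abs(window["freq"] - 60.0).max() > 0.05 else ABSTAIN

def lf_voltage_sag(window):
    # Flag a sag below 0.9 per-unit voltage.
    return EVENT if window["voltage_pu"].min() < 0.9 else ABSTAIN

def lf_quiet_window(window):
    # Vote "normal" when both signals stay within tight bands.
    steady = (np.abs(window["freq"] - 60.0).max() < 0.01
              and window["voltage_pu"].min() > 0.98)
    return NORMAL if steady else ABSTAIN

LFS = [lf_frequency_dip, lf_voltage_sag, lf_quiet_window]

def weak_label(window):
    """Aggregate labeling-function votes; ties and no-votes yield ABSTAIN."""
    votes = [lf(window) for lf in LFS if lf(window) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

# Example: one short window of synthetic PMU samples.
window = {"freq": np.array([60.0, 59.93, 59.92, 60.0]),
          "voltage_pu": np.array([1.0, 0.97, 0.96, 1.0])}
print(weak_label(window))  # -> 1 (EVENT), triggered by the frequency dip
```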

28 pages, 3016 KB  
Article
Ensemble Learning Model for Industrial Policy Classification Using Automated Hyperparameter Optimization
by Hee-Seon Jang
Electronics 2025, 14(20), 3974; https://doi.org/10.3390/electronics14203974 - 10 Oct 2025
Viewed by 511
Abstract
The Global Trade Alert (GTA) website, managed by the United Nations, releases a large number of industrial policy (IP) announcements daily. Recently, leading nations including the United States and China have increasingly turned to IPs to protect and promote their domestic corporate interests. They use both offensive and defensive tools such as tariffs, trade barriers, investment restrictions, and financial support measures. To evaluate how these policy announcements may affect national interests, many countries have implemented logistic regression models to automatically classify them as either IP or non-IP. This study proposes ensemble models—widely recognized for their superior performance in binary classification—as a more effective alternative. The random forest model (a bagging technique) and boosting methods (gradient boosting, XGBoost, and LightGBM) are proposed, and their performance is compared with that of logistic regression. For evaluation, a dataset of 2000 randomly selected policy documents was compiled and labeled by domain experts. Following data preprocessing, hyperparameter optimization was performed using the Optuna library in Python 3.10. To enhance model robustness, cross-validation was applied, and performance was evaluated using key metrics such as accuracy, precision, and recall. The analytical results demonstrate that ensemble models consistently outperform logistic regression in both baseline (default hyperparameters) and optimized configurations. Compared to logistic regression, LightGBM and random forest showed baseline accuracy improvements of 3.5% and 3.8%, respectively, with hyperparameter optimization yielding additional performance gains of 2.4–3.3% across ensemble methods. In particular, the analysis based on alternative performance indicators confirmed that the LightGBM and random forest models yielded the most reliable predictions. Full article
(This article belongs to the Special Issue Machine Learning for Data Mining)
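
As a minimal sketch of the workflow the abstract outlines, the code below runs an Optuna search over LightGBM hyperparameters with cross-validated accuracy as the objective for a binary IP / non-IP classifier. The TF-IDF features, toy placeholder texts, and search space are assumptions for illustration, not the paper's exact preprocessing or configuration.

```python
# Hedged sketch: Optuna hyperparameter optimization of LightGBM for
# industrial-policy classification, evaluated with 5-fold cross-validation.
import optuna
from lightgbm import LGBMClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score

# Toy placeholder standing in for the 2,000 expert-labeled GTA documents.
texts = ["tariff increase and subsidy for domestic steel producers",
         "routine weather report and local news bulletin"] * 10
labels = [1, 0] * 10  # 1 = industrial policy, 0 = non-IP

X = TfidfVectorizer(max_features=5000).fit_transform(texts)

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 600),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 15, 127),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 50),
    }
    model = LGBMClassifier(**params, random_state=42)
    return cross_val_score(model, X, labels, cv=5, scoring="accuracy").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```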
