AI-Driven Data Analytics and Mining

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 15 August 2026 | Viewed by 14286

Special Issue Editors


Guest Editor
School of Social Sciences, Department of Finance and Accounting, Faculty of Economic Sciences, Lucian Blaga University of Sibiu, 550324 Sibiu, Romania
Interests: data analysis; databases; big data; data mining; data science; cybernetics; big data analytics; machine learning; deep learning; sentiment analysis; text mining

Guest Editor
Department of Finance, Information Systems and Business Modeling, Faculty of Economics and Business Administration, West University of Timisoara, 300223 Timisoara, Romania
Interests: big data analytics; data science; data mining; artificial intelligence; machine learning; deep learning; reinforcement learning; NLP; sentiment analysis; cybersecurity

Guest Editor
Department of Finance and Accounting, Faculty of Economic Sciences, Lucian Blaga University of Sibiu, 550324 Sibiu, Romania
Interests: cybernetics; big data analytics; machine learning; deep learning; sentiment analysis; text mining

Special Issue Information

Dear Colleagues,

The rapid growth of data volumes and complexity has made AI-driven analytics and mining indispensable for extracting actionable knowledge. By integrating cybernetic feedback principles with advanced machine learning and sentiment analysis techniques, researchers can develop adaptive, self-regulating systems that learn from dynamic environments and human inputs. This convergence addresses critical challenges in processing high-velocity, heterogeneous data streams while ensuring robust decision support and system autonomy.

This Special Issue of Electronics welcomes contributions that advance AI-driven data analytics and mining, with a focus on cybernetics, big data analytics, and sentiment analysis. We seek original research, reviews, and case studies highlighting novel algorithms, system architectures, and end-to-end pipelines, from data acquisition and integration through to explainable modeling and deployment. Submissions should align with the journal’s mission to foster innovative, open access dissemination of impactful AI solutions.

Application scenarios of interest include, but are not limited to, the following:

  • Cybernetic control architectures for adaptive mining.
  • Scalable big data preprocessing and feature-learning frameworks.
  • Sentiment-aware text and social media analytics.
  • IoT-enabled data mining applications.
  • Cybersecurity and privacy maintenance.
  • Industrial automation and cybernetic control in Industry 4.0.
  • Big data analytics in finance and accounting.
  • Sentiment analysis and social media intelligence.
  • Life sciences and healthcare monitoring.
  • Internet of Things (IoT) deployments.
  • Management and marketing optimization.
  • Environmental monitoring and sustainability.

We look forward to hearing from you.

Prof. Dr. Marian-Pompiliu Cristescu
Dr. Claudiu Brandas
Dr. Dumitru Alexandru Mara
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • AI-driven data analytics
  • data mining
  • cybernetics
  • big data analytics
  • stream mining
  • concept-drift adaptation
  • explainable AI

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information is available on MDPI's Special Issue policies page.

Published Papers (7 papers)


Research

17 pages, 9739 KB  
Article
Denoising Auto-Encoder-Enhanced Deep Non-Negative Matrix Factorization Clustering Model
by Shaodong Wenren, Liang Dou and Jian Jin
Electronics 2026, 15(9), 1811; https://doi.org/10.3390/electronics15091811 - 24 Apr 2026
Viewed by 196
Abstract
Non-negative matrix factorization directly decomposes data features into a base matrix and community matrix, which are easily affected by noise. Multi-view datasets have multiple feature matrices, each with a different angle. The data features need to be re-synthesized rather than simply concatenated or added. Based on the advantages and disadvantages of multi-view clustering and non-negative matrix factorization, we attempt to transplant the method of analyzing abstract connected graphs, analogize the similarity between edges and samples in the graph, and propose a deep non-negative matrix factorization model for clustering by constructing a similarity matrix and decomposing it. At the same time, to reduce the interference of noise, we introduce a denoising auto-encoder and non-negative matrix factorization in series, and research the reconstruction features, ultimately forming a model structure framework of “denoising auto-encoder, non-negative matrix factorization, clustering”. Through experiments, the denoising auto-encoder-enhanced non-negative matrix factorization achieved good results on five datasets. It achieved an accuracy of 87 percent on the BBC Sport dataset and 61 percent on Wiki-fea, which increased by two percentage points. The clustering results demonstrate that the model can effectively alleviate the impact of noise and provide new ideas for how to integrate multi-view features.
(This article belongs to the Special Issue AI-Driven Data Analytics and Mining)

20 pages, 27100 KB  
Article
EHCFE: Enhanced Hierarchical Clustering with Feature Engineering for Automating Labeling of Student Performance and Dropout Prediction
by Nusaybah Alghanmi
Electronics 2026, 15(6), 1265; https://doi.org/10.3390/electronics15061265 - 18 Mar 2026
Viewed by 355
Abstract
Educational success is a critical component of societal development, yet increasing student dropout rates present ongoing challenges. While supervised learning models are commonly used for dropout prediction, they rely on manually labeled data, a process that is time-consuming and dependent on expert annotation. Unsupervised learning models, such as clustering approaches, have been explored as an alternative; however, existing methods typically group students based on activity patterns without generating binary outcome labels such as dropout or success. Furthermore, their effectiveness often depends heavily on the quality of the selected features, and most current solutions utilize only limited or pre-structured subsets of institutional data. This paper addresses these challenges and proposes EHCFE (Enhanced Hierarchical Clustering with Feature Engineering) to automatically generate binary labels from unlabeled educational datasets. EHCFE applies feature engineering by generating new features from the top-ranked features identified during feature selection while retaining the original feature set, thereby improving the quality of the labeling outcomes. The approach is evaluated on three datasets and compared with current and state-of-the-art models using several evaluation metrics, including F1 score, area under the receiver operating characteristic curve (AUC), and silhouette coefficient. Experimental results show that EHCFE achieves the highest F1 score (0.709 and 0.28) and AUC values (0.766 and 0.81) on two datasets. A ranking analysis across six evaluation metrics demonstrates that EHCFE outperforms existing models, achieving the highest average ranks of 1.50 and 1.83 on two datasets and a competitive rank of 1.92 on the third.
(This article belongs to the Special Issue AI-Driven Data Analytics and Mining)

19 pages, 1553 KB  
Article
Enhancing Student Retention in Higher Education Institutions (HEIs): Machine Learning Approach
by Emeka Cajetan Umendu, Mustansar Ghanzanfar, Aaron Kans and Md Atiqur Rahman Ahad
Electronics 2026, 15(4), 734; https://doi.org/10.3390/electronics15040734 - 9 Feb 2026
Viewed by 1058
Abstract
Student dropout remains a critical challenge for higher education institutions, with significant implications for resource allocation, academic planning, and institutional sustainability. This study applies machine learning techniques to predict student non-continuation and attrition to support data-driven retention strategies in higher education. By framing the problem as a multi-class classification task (Dropout, Enrolled, Graduate), the proposed framework enables early and differentiated intervention planning. Using a publicly available higher education student dataset (4424 records, 34 features, multi-class outcome), a structured analytical pipeline was implemented, incorporating Winsorisation for outlier mitigation, SMOTE for class imbalance handling, and targeted feature engineering. Model performance was assessed using a 5-fold nested cross-validation framework. Four classifiers, Extra Trees, Random Forest, Gradient Boosting, and Logistic Regression, were trained on an optimised subset of 28 features. Among these, the Extra Trees model achieved the strongest performance, attaining a mean AUC of 0.96 (±0.0053) and an accuracy of 87.4% (±0.012). Model interpretability was enhanced through SHAP analysis, which identified cumulative approved academic units and tuition fee payment status as the most influential predictors of student outcomes. The findings underscore the value of early predictive analytics for informing proactive institutional interventions, particularly in academic monitoring and financial support to strengthen student retention frameworks.
(This article belongs to the Special Issue AI-Driven Data Analytics and Mining)
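
For readers unfamiliar with the Winsorisation step named in the abstract above, the following is a minimal sketch of the general technique and not the authors' implementation. The function name `winsorise`, the 5th/95th percentile bounds, and the simple index-based percentile rule are all assumptions chosen for illustration; the abstract does not state which bounds were used.

```python
def winsorise(values, lower_pct=0.05, upper_pct=0.95):
    """Clip each value to the chosen lower/upper percentile of the sample.

    Hypothetical helper: percentiles are taken by truncating the
    fractional index into the sorted sample (no interpolation).
    """
    ordered = sorted(values)
    n = len(ordered)
    low = ordered[int(lower_pct * (n - 1))]    # lower clipping bound
    high = ordered[int(upper_pct * (n - 1))]   # upper clipping bound
    # Extreme values are pulled in to the bounds rather than dropped,
    # so the number of records is unchanged.
    return [min(max(v, low), high) for v in values]


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
clipped = winsorise(sample)
print(clipped)
```

Unlike trimming, this keeps every record, which matters when a later step such as class rebalancing depends on the sample size.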

21 pages, 5194 KB  
Article
A Typhoon Clustering Model for the Western Pacific Coast Based on Interpretable Machine Learning
by Yanhe Wang, Yinzhen Lv, Lei Zhang, Tianrun Gao, Ruiqi Feng, Yihan Zhou and Wei Zhang
Electronics 2026, 15(2), 379; https://doi.org/10.3390/electronics15020379 - 15 Jan 2026
Viewed by 530
Abstract
As a complex and destructive natural disaster, the characteristics of typhoons are closely related to human activities, and their accurate categorization is of vital significance for improving disaster warning and management capabilities. This study highlights the key role of typhoon clustering in analyzing typhoon behaviors, aiming to provide reliable support for disaster prevention and control. Based on the NOAA meteorological dataset from 2003 to 2024, this study first adopts the K-means clustering algorithm to classify typhoons into seven categories, then utilizes eight machine learning models to train and validate the classification results, and introduces the SHapley Additive exPlanations (SHAP) algorithm to enhance the interpretability of the models. The study data cover a variety of features such as air temperature, wind speed, atmospheric pressure, and weather station observations. After a systematic preprocessing process, a feature matrix containing key variables such as typhoon intensity and moving speed is constructed. The results show that the XGBoost model outperforms others across multiple evaluation metrics (Accuracy: 0.992, Precision: 0.989, Recall: 0.992, F1.5 Score: 0.990), highlighting its exceptional capability in managing complex weather classification tasks. The seven typhoon categories classified by K-means exhibit different feature patterns, while the SHAP analysis further reveals the effects of each feature on the classification and its potential interactions. This study not only verifies the effectiveness of K-means combined with machine learning in typhoon classification but also lays a solid scientific foundation for accurate prediction, risk assessment and optimization of management strategies for typhoon disasters through the in-depth analysis of feature impacts.
(This article belongs to the Special Issue AI-Driven Data Analytics and Mining)

32 pages, 2195 KB  
Article
MUSIGAIN: Adaptive Graph Attention Network for Multi-Relationship Mining in Music Knowledge Graphs
by Mian Chen, Tinghao Wang, Chunhao Li and Yuheng Li
Electronics 2025, 14(24), 4892; https://doi.org/10.3390/electronics14244892 - 12 Dec 2025
Viewed by 1188
Abstract
With the exponential growth of digital music, efficiently identifying key music relationship nodes in large-scale music knowledge graphs is crucial for enhancing music recommendation, emotion analysis, and genre classification. To address this challenge, we propose MUSIGAIN, a GATv2-based adaptive framework that combines graph robustness metrics with advanced graph neural network mechanisms for multi-relationship mining in heterogeneous music knowledge graphs. MUSIGAIN tackles three fundamental challenges: the prohibitive computational complexity of exact graph-robustness calculations, the limitations of traditional centrality measures in capturing semantic heterogeneity, and the over-smoothing problem in deep graph neural networks. The framework introduces three key innovations: (1) a layer-wise dynamic skipping mechanism that adaptively controls propagation depth based on third-order embedding stability, reducing computation by 30–40% while preventing over-smoothing; (2) the DiGRAF adaptive activation function that enables node-specific nonlinear transformations to capture semantic heterogeneity across different entity types; and (3) ranking-based optimization supervised by graph robustness metrics, focusing on relative importance ordering rather than absolute value prediction. Experimental results on four real-world music knowledge graphs (POP-MKG, ROCK-MKG, JAZZ-MKG, CLASSICAL-MKG) demonstrate that MUSIGAIN consistently outperforms existing methods in Top-5% node identification accuracy, achieving up to 96.78% while maintaining linear scalability to graphs with hundreds of thousands of nodes. MUSIGAIN provides an efficient, accurate, and interpretable solution for key node identification in complex heterogeneous graphs.
(This article belongs to the Special Issue AI-Driven Data Analytics and Mining)

32 pages, 6691 KB  
Article
Fine-Tuning and Explaining FinBERT for Sector-Specific Financial News: A Reproducible Workflow
by Marian Pompiliu Cristescu, Claudiu Brândaș, Dumitru Alexandru Mara and Petrea Ioana
Electronics 2025, 14(23), 4680; https://doi.org/10.3390/electronics14234680 - 27 Nov 2025
Viewed by 3131
Abstract
The increasing use of complex “black-box” models for financial news sentiment analysis presents a challenge in high-stakes settings where transparency and trust are paramount. This study introduces and validates a finance-focused, fully reproducible, open-source workflow for building, explaining, and evaluating sector-specific sentiment models mapped to standard market taxonomies and investable proxies. We benchmark interpretable and transformer-based models on public datasets and a newly constructed, manually annotated gold-standard corpus of 1500 U.S. sector-tagged financial headlines. While a zero-shot FinBERT establishes a reasonable baseline (macro F1 = 0.555), fine-tuning on our gold data yields a robust macro F1 = 0.707, a substantial uplift. We extend explainability to the fine-tuned FinBERT with Integrated Gradients (IG) and LIME and perform a quantitative faithfulness audit via deletion curves and AOPC; LIME is most faithful (AOPC = 0.365). We also quantify the risks of weak supervision: accuracy drops (−21.0%) and explanations diverge (SHAP rank ρ = 0.11) relative to gold-label training. Crucially, econometric tests show the sentiment signal is reactive, not predictive, of next-day returns; yet it still supports profitable sector strategies (e.g., Technology long-short Sharpe 1.88). Novelty lies in a finance-aligned, sector-aware, trustworthiness blueprint that pairs fine-tuned FinBERT with audited explanations and uncertainty checks, all end-to-end reproducible and tied to investable sector ETFs.
(This article belongs to the Special Issue AI-Driven Data Analytics and Mining)

22 pages, 728 KB  
Article
Design and Performance Evaluation of LLM-Based RAG Pipelines for Chatbot Services in International Student Admissions
by Maksuda Khasanova Zafar kizi and Youngjung Suh
Electronics 2025, 14(15), 3095; https://doi.org/10.3390/electronics14153095 - 2 Aug 2025
Cited by 4 | Viewed by 7099
Abstract
Recent advancements in large language models (LLMs) have significantly enhanced the effectiveness of Retrieval-Augmented Generation (RAG) systems. This study focuses on the development and evaluation of a domain-specific AI chatbot designed to support international student admissions by leveraging LLM-based RAG pipelines. We implement and compare multiple pipeline configurations, combining retrieval methods (e.g., Dense, MMR, Hybrid), chunking strategies (e.g., Semantic, Recursive), and both open-source and commercial LLMs. Dual evaluation datasets of LLM-generated and human-tagged QA sets are used to measure answer relevancy, faithfulness, context precision, and recall, alongside heuristic NLP metrics. Furthermore, latency analysis across different RAG stages is conducted to assess deployment feasibility in real-world educational environments. Results show that well-optimized open-source RAG pipelines can offer comparable performance to GPT-4o while maintaining scalability and cost-efficiency. These findings suggest that the proposed chatbot system can provide a practical and technically sound solution for international student services in resource-constrained academic institutions.
(This article belongs to the Special Issue AI-Driven Data Analytics and Mining)
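
The MMR retrieval method mentioned in the abstract above is commonly read as maximal marginal relevance. The following is a minimal sketch of that general technique under stated assumptions, and not the pipeline evaluated in the paper. The function names `cosine` and `mmr`, the toy two-dimensional embeddings, and the trade-off weight `lam` are hypothetical and chosen only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr(query, docs, k=2, lam=0.5):
    """Greedy maximal marginal relevance; returns indices of chosen docs.

    Each step picks the candidate maximising
    lam * sim(query, doc) - (1 - lam) * max sim(doc, already selected).
    """
    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query, docs[i])
            redundancy = max(
                (cosine(docs[i], docs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy embeddings (assumed): doc 1 is a near-duplicate of doc 0.
query = [1.0, 0.0]
docs = [[1.0, 0.0], [0.99, 0.1], [0.6, 0.8]]
print(mmr(query, docs, k=2, lam=0.9))  # favours relevance
print(mmr(query, docs, k=2, lam=0.3))  # favours diversity
```

A high `lam` behaves like plain dense retrieval and returns the near-duplicate, while a low `lam` skips it in favour of the less similar document.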
