Recent Advances in Data Mining: Methods, Trends, and Emerging Applications

A special issue of Computers (ISSN 2073-431X).

Deadline for manuscript submissions: 31 December 2025 | Viewed by 8433

Special Issue Editor


Dr. Tehmina Amjad
Guest Editor
Khoury College of Computer Science, Northeastern University, Silicon Valley Campus, San Jose, CA 95113, USA
Interests: information retrieval; social network analysis; data mining; machine learning; sentiment analysis; scientometrics; bibliometrics

Special Issue Information

Dear Colleagues,

We are excited to invite you to contribute to this Special Issue, entitled ‘Recent Advances in Data Mining: Methods, Trends, and Emerging Applications’.

In recent years, data mining has emerged as a keystone of intelligent data analysis, enabling the discovery of hidden patterns, actionable insights, and predictive knowledge from massive and complex datasets. Powered by rapid progress in computing technologies, machine learning, and big data infrastructure, modern data mining techniques have found applications in a growing range of fields, including healthcare, finance, cybersecurity, social media, and environmental sciences.

This Special Issue aims to bring together the latest innovations, trends, challenges, and practical applications in the field of data mining. We invite high-quality contributions that explore new methodologies, theoretical frameworks, and real-world use cases that highlight the transformative power of data mining in solving contemporary problems. Submissions may focus on algorithmic advances, novel applications, performance improvements, or interdisciplinary research integrating data mining with other emerging technologies such as cloud computing, edge AI, and blockchain.

We welcome original research articles, comprehensive review papers, and insightful case studies from academia and industry.

Topics of interest include, but are not limited to, the following:

  • Novel data mining algorithms and frameworks;
  • Data preprocessing and feature engineering techniques;
  • Scalable and distributed data mining solutions;
  • Mining structured, semi-structured, and unstructured data;
  • Graph and network mining;
  • Privacy-preserving data mining;
  • Data mining in healthcare, finance, marketing, and cybersecurity;
  • Temporal, spatial, and spatio-temporal data mining;
  • Automated and interpretable data mining;
  • Ethical and responsible data mining practices;
  • Integration of data mining with AI, IoT, and cloud technologies;
  • Emerging applications and interdisciplinary approaches;
  • Pattern discovery in large-scale and unstructured data;
  • Text mining and semantic analysis;
  • Sentiment analysis and opinion mining.

Dr. Tehmina Amjad
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Computers is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data mining
  • pattern discovery
  • scalable algorithms
  • privacy-preserving analytics
  • feature selection
  • graph mining
  • real-world applications
  • interpretability
  • semantic analysis
  • emerging trends

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information is available in MDPI's Special Issue policies.

Published Papers (7 papers)

Research

28 pages, 2467 KB  
Article
MSDSI-FND: Multi-Stage Detection Model of Influential Users’ Fake News in Online Social Networks
by Hala Al-Mutair and Jawad Berri
Computers 2025, 14(12), 517; https://doi.org/10.3390/computers14120517 - 26 Nov 2025
Viewed by 446
Abstract
The rapid spread of fake news across social media poses significant threats to politics, economics, and public health. During the COVID-19 pandemic, social media influencers played a decisive role in amplifying misinformation due to their large follower bases and perceived authority. This study proposes a Multi-Stage Detection System for Influencer Fake News (MSDSI-FND) to detect misinformation propagated by influential users on the X platform (formerly Twitter). A manually labeled dataset was constructed, comprising 68 root tweets (42 fake and 26 real) and over 40,000 engagements (26,700 replies and 14,000 retweets) collected between December 2019 and December 2022. The MSDSI-FND model employs a two-stage analytical framework integrating: (1) content-based linguistic and psycholinguistic analysis, (2) user profiles analysis, structural and propagation-based modeling of information cascades analysis. Several machine-learning classifiers were tested under single-stage, two-stage, and full multi-stage configurations. An ablation study demonstrated that performance improved progressively with each added analytical stage. The full MSDSI-FND model achieved the highest accuracy, F1-score, and AUC, confirming the effectiveness of hierarchical, stage-wise integration. The results highlight the superiority of the proposed multi-stage, influential user-aware framework over conventional hybrid or text-only models. By sequentially combining linguistic, behavioral, and structural cues, MSDSI-FND provides an interpretable and robust approach to identifying large-scale misinformation dissemination within influential user-driven social networks.

25 pages, 7447 KB  
Article
Machine Learning Models for Subsurface Pressure Prediction: A Data Mining Approach
by Muhammad Raiees Amjad, Rohan Benjamin Varghese and Tehmina Amjad
Computers 2025, 14(11), 499; https://doi.org/10.3390/computers14110499 - 17 Nov 2025
Viewed by 450
Abstract
Precise pore pressure prediction is highly essential for safe and effective drilling; however, the nonlinear and heterogeneous nature of the subsurface strata makes it extremely challenging. Conventional physics-based methods are not capable of handling this nonlinearity and variation. Recently, machine learning (ML) methods have been deployed by researchers to enhance prediction performance. These methods are often highly domain-specific and produce good results for the data they are trained for but struggle to generalize to unseen data. This study introduces a Hybrid Meta-Ensemble (HME), a meta model framework, as a novel data mining approach that applies ML methods and ensemble learning on well log data for pore pressure prediction. This proposed study first trains five baseline models including Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Deep Feedforward Neural Network (DFNN), Random Forest (RF), and Extreme Gradient Boost (XGBoost) to capture sequential and nonlinear relationships for pore pressure prediction. The stacked predictions are further improved through a meta learner that adaptively reweighs them according to subsurface heterogeneity, effectively strengthening the ability of ensembles to generalize across diverse geological settings. The experimentation is performed on well log data from four wells located in the Potwar Basin which is one of Pakistan’s principal oil- and gas-producing regions. The proposed Hybrid Meta-Ensemble (HME) has achieved an R² value of 0.93, outperforming the individual base models. Using the HME approach, the model effectively captures rock heterogeneity by learning optimal nonlinear interactions among the base models, leading to more accurate pressure predictions. Results show that integrating deep learning with robust meta learning substantially improves the accuracy of pore pressure prediction.

18 pages, 770 KB  
Article
Emotion in Words: The Role of Ed Sheeran and Sia’s Lyrics on the Musical Experience
by Catarina Travanca, Mónica Cruz and Abílio Oliveira
Computers 2025, 14(11), 460; https://doi.org/10.3390/computers14110460 - 24 Oct 2025
Viewed by 790
Abstract
Music plays an increasingly vital role in modern society, becoming a fundamental part of everyday life. Beyond entertainment, it contributes to emotional well-being by helping individuals express their feelings, process emotions, and find comfort during different life moments. This study explores the emotional impact of Ed Sheeran’s lyrics and Sia’s lyrics on listeners. Using an exploratory approach, it applies a text mining tool to extract data, identify key dimensions, and compare thematic elements across both artists’ work. The analysis reveals distinct emotional patterns and thematic contrasts, offering insight into how their lyrics resonate with audiences on a deeper level. These findings enhance our understanding of the emotional power of contemporary music and highlight how lyrical content can shape listeners’ emotional experiences. Moreover, the study demonstrates the value of text mining as a method for examining popular music, providing a new lens through which to explore the connection between music and emotion.

19 pages, 1396 KB  
Article
Sparse Keyword Data Analysis Using Bayesian Pattern Mining
by Sunghae Jun
Computers 2025, 14(10), 436; https://doi.org/10.3390/computers14100436 - 14 Oct 2025
Viewed by 422
Abstract
Keyword data analysis aims to extract and interpret meaningful relationships from large collections of text documents. A major challenge in this process arises from the extreme sparsity of document–keyword matrices, where the majority of elements are zeros due to zero inflation. To address this issue, this study proposes a probabilistic framework called Bayesian Pattern Mining (BPM), which integrates Bayesian inference into association rule mining (ARM). The proposed method estimates both the expected values and credible intervals of interestingness measures such as confidence and lift, providing a probabilistic evaluation of keyword associations. Experiments conducted on 9436 quantum computing patent documents, from which 175 representative keywords were extracted, demonstrate that BPM yields more stable and interpretable associations than conventional ARM. By incorporating credible intervals, BPM reduces the risk of biased decisions under sparsity and enhances the reliability of keyword-based technology analysis, offering a rigorous approach for knowledge discovery in zero-inflated text data.

33 pages, 9908 KB  
Article
Mapping the Chemical Space of Antiviral Peptides with Half-Space Proximal and Metadata Networks Through Interactive Data Mining
by Daniela de Llano García, Yovani Marrero-Ponce, Guillermin Agüero-Chapin, Hortensia Rodríguez, Francesc J. Ferri, Edgar A. Márquez, José R. Mora, Felix Martinez-Rios and Yunierkis Pérez-Castillo
Computers 2025, 14(10), 423; https://doi.org/10.3390/computers14100423 - 3 Oct 2025
Viewed by 2032
Abstract
Antiviral peptides (AVPs) are promising therapeutic candidates, yet the rapid growth of sequence data and the field’s emphasis on predictors have left a gap: the lack of an integrated view linking peptide chemistry with biological context. Here, we map the AVP landscape through interactive data mining using Half-Space Proximal Networks (HSPNs) and Metadata Networks (MNs) in the StarPep toolbox. HSPNs minimize edges and avoid fixed thresholds, reducing computational cost while enabling high-resolution analysis. A threshold-free HSPN resolved eight chemically and biologically distinct communities, while MNs contextualized AVPs by source, function, and target, revealing structural–functional relationships. To capture diversity compactly, we applied centrality-guided scaffold extraction with redundancy removal (90–50% identity), producing four representative subsets suitable for modeling and similarity searches. Alignment-free motif discovery yielded 33 validated motifs, including 10 overlapping with reported AVP signatures and 23 apparently novel. Motifs displayed category-specific enrichment across antimicrobial classes, and sequences carrying multiple motifs (≥4–5) consistently showed higher predicted antiviral probabilities. Beyond computational insights, scaffolds provide representative “entry points” into AVP chemical space, while motifs serve as modular building blocks for rational design. Together, these resources provide an integrated framework that may inform AVP discovery and support scaffold- and motif-guided therapeutic design.

20 pages, 1604 KB  
Article
Rule-Based eXplainable Autoencoder for DNS Tunneling Detection
by Giacomo De Bernardi, Giovanni Battista Gaggero, Fabio Patrone, Sandro Zappatore, Mario Marchese and Maurizio Mongelli
Computers 2025, 14(9), 375; https://doi.org/10.3390/computers14090375 - 8 Sep 2025
Viewed by 896
Abstract
Artificial Intelligence (AI) and Machine Learning (ML) are employed in numerous fields and applications. Even if most of these approaches offer a very good performance, they are affected by the “black-box” problem. The way they operate and make decisions is complex and difficult for human users to interpret, making the systems impossible to manually adjust in case they make trivial (from a human viewpoint) errors. In this paper, we show how a “white-box” approach based on eXplainable AI (XAI) can be applied to the Domain Name System (DNS) tunneling detection problem, a cybersecurity problem already successfully addressed by “black-box” approaches, in order to make the detection explainable. The obtained results show that the proposed solution can achieve a performance comparable to the one offered by an autoencoder-based solution while offering a clear view of how the system makes its choices and the possibility of manual analysis and adjustments.

15 pages, 1461 KB  
Article
Quantum Computing in Data Science and STEM Education: Mapping Academic Trends and Analyzing Practical Tools
by Eloy López-Meneses, Jesús Cáceres-Tello, José Javier Galán-Hernández and Luis López-Catalán
Computers 2025, 14(6), 235; https://doi.org/10.3390/computers14060235 - 16 Jun 2025
Cited by 1 | Viewed by 2463
Abstract
Quantum computing is emerging as a key enabler of digital transformation in data science and STEM education. This study investigates how quantum computing can be meaningfully integrated into higher education by combining a dual approach: a structured assessment of the specialized literature and a practical evaluation of educational tools. First, a science mapping study based on 281 peer-reviewed publications indexed in Scopus (2015–2024) identifies growth trends, thematic clusters, and international collaboration networks at the intersection of quantum computing, data science, and education. Second, a comparative analysis of widely used educational platforms—such as Qiskit, Quantum Inspire, QuTiP, and Amazon Braket—is conducted using pedagogical criteria including accessibility, usability, and curriculum integration. The results highlight a growing convergence between quantum technologies, artificial intelligence, and data-driven learning. A strategic framework and roadmap are proposed to support the gradual and scalable adoption of quantum literacy in university-level STEM programs.