Topic Editors

Prof. Dr. Xujuan Zhou
School of Business, University of Southern Queensland, Springfield, QLD 4300, Australia
Prof. Dr. Yuefeng Li
School of Computer Science, Queensland University of Technology, Brisbane, QLD 4000, Australia
Prof. Dr. Raj Gururajan
School of Business, University of Southern Queensland, Springfield, QLD 4300, Australia
Prof. Dr. Ji Zhang
School of Mathematics, Physics and Computing, University of Southern Queensland, Toowoomba, QLD 4350, Australia
Prof. Dr. Revathi Venkataraman
School of Computing, SRM Institute of Science and Technology, Chennai 603203, India

New Applications of Big Data Technology: Integration of Data Mining and Artificial Intelligence

Abstract submission deadline: 31 December 2025
Manuscript submission deadline: 31 March 2026

Topic Information

Dear Colleagues,

The landscape of data mining and machine learning is rapidly evolving, fuelled by advancements in algorithms, computational power, and the availability of vast datasets. This Topic will explore the latest trends and innovations shaping the future of these fields. Key areas of interest include, but are not limited to, deep learning architectures, reinforcement learning, unsupervised and semi-supervised learning techniques, federated learning, and the integration of machine learning with big data technologies.

We invite contributions that address novel approaches and methodologies, including improvements in model interpretability, the development of more efficient algorithms, and the application of machine learning in diverse domains such as healthcare, finance, engineering, materials science, and social networks. Special emphasis will be placed on emerging topics such as generative AI, explainable AI (XAI), edge AI, and the ethical implications of AI deployment. In the realm of data mining, we are particularly interested in new techniques for anomaly detection, pattern recognition, and predictive analytics. Papers exploring the convergence of data mining with AI technologies, such as using deep learning for feature extraction or leveraging generative models for data augmentation, are highly encouraged.

By bringing together cutting-edge research and practical applications, this Topic will provide a comprehensive overview of the current state and future directions of data mining and machine learning. We encourage submissions that offer theoretical insights, empirical studies, and case studies demonstrating the transformative impacts of these technologies. Join us in contributing to this exciting discourse and advancing our field through collaborative knowledge-sharing.

Prof. Dr. Xujuan Zhou
Prof. Dr. Yuefeng Li
Prof. Dr. Raj Gururajan
Prof. Dr. Ji Zhang
Prof. Dr. Revathi Venkataraman
Topic Editors

Keywords

  • data and text mining
  • graph data mining
  • machine and deep learning
  • reinforcement learning
  • supervised and unsupervised learning
  • semi-supervised learning
  • federated learning
  • generative AI and explainable AI (XAI)
  • edge AI
  • pattern recognition and anomaly detection
  • predictive analytics
  • natural language processing (NLP)
  • computer vision
  • big data technologies
  • AI applications in diverse domains

Participating Journals

Journal (abbreviation)        Impact Factor   CiteScore   Launched   First Decision (median)   APC
Applied Sciences (applsci)    2.5             5.3         2011       18.4 days                 CHF 2400
Data (data)                   2.2             4.3         2016       26.8 days                 CHF 1600
Electronics (electronics)     2.6             5.3         2012       16.4 days                 CHF 2400
Information (information)     2.4             6.9         2010       16.4 days                 CHF 1600
Mathematics (mathematics)     2.3             4.0         2013       18.3 days                 CHF 2600

Preprints.org is a multidisciplinary platform offering a preprint service designed to facilitate the early sharing of your research. It supports and empowers your research journey from the very beginning.

MDPI Topics is collaborating with Preprints.org and has established a direct connection between MDPI journals and the platform. Authors are encouraged to take advantage of this opportunity by posting their preprints at Preprints.org prior to publication:

  1. Share your research immediately: disseminate your ideas prior to publication and establish priority for your work.
  2. Safeguard your intellectual contribution: protect your ideas with a time-stamped preprint that serves as proof of your research timeline.
  3. Boost visibility and impact: increase the reach and influence of your research by making it accessible to a global audience.
  4. Gain early feedback: receive valuable input and insights from peers before submitting to a journal.
  5. Ensure broad indexing: your preprint is indexed by Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit, and Europe PMC.

Published Papers (10 papers)

19 pages, 7259 KiB  
Article
A Novel Fuzzy Kernel Extreme Learning Machine Algorithm in Classification Problems
by Asli Kaya Karakutuk and Ozer Ozdemir
Appl. Sci. 2025, 15(8), 4506; https://doi.org/10.3390/app15084506 - 19 Apr 2025
Abstract
Today, numerous methods have been developed to address various problems, each with its own advantages and limitations. To overcome these limitations, hybrid structures that integrate multiple techniques have emerged as effective computational methods, offering superior performance and efficiency compared to single-method solutions. In this paper, we introduce a basic method that combines the strengths of fuzzy logic, wavelet theory, and kernel-based extreme learning machines to efficiently classify facial expressions. We call this method the Fuzzy Wavelet Mexican Hat Kernel Extreme Learning Machine. To evaluate the classification performance of this mathematically defined hybrid method, we apply it to both an original dataset and the JAFFE dataset. The method is enhanced with various feature extraction methods. On the JAFFE dataset, the algorithm achieved an average classification accuracy of 94.55% when supported with local binary patterns and 94.27% with a histogram of oriented gradients. Moreover, these results outperform those of previous studies conducted on the same dataset. On the original dataset, the proposed method was compared with an extreme learning machine and a wavelet neural network, and it was found to be markedly more efficient than the other two methods.
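To make the kernel extreme learning machine component concrete, here is a minimal NumPy sketch of KELM classification with a Mexican hat (Ricker) wavelet kernel; the kernel form, the regularization constant C, and all names are illustrative assumptions and do not reproduce the authors' Fuzzy Wavelet Mexican Hat KELM or its fuzzy-logic component.

```python
import numpy as np

def mexican_hat_kernel(X, Y, a=1.0):
    """Mexican hat (Ricker) wavelet kernel: per-dimension (1 - d^2/a^2) * exp(-d^2 / (2 a^2))
    with d = x_i - y_i, combined multiplicatively (illustrative form)."""
    D = X[:, None, :] - Y[None, :, :]          # pairwise differences, shape (n, m, dim)
    term = (1.0 - (D / a) ** 2) * np.exp(-(D ** 2) / (2.0 * a ** 2))
    return term.prod(axis=2)

class KernelELM:
    """Kernel ELM: output weights beta = (K + I/C)^-1 T with one-hot targets T."""
    def __init__(self, C=100.0, a=1.0):
        self.C, self.a = C, a

    def fit(self, X, y):
        self.X_train = X
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)   # one-hot targets
        K = mexican_hat_kernel(X, X, self.a)
        self.beta = np.linalg.solve(K + np.eye(len(X)) / self.C, T)
        return self

    def predict(self, X):
        K = mexican_hat_kernel(X, self.X_train, self.a)
        return self.classes_[np.argmax(K @ self.beta, axis=1)]

# Toy usage on random features standing in for LBP/HOG descriptors
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8)); y = rng.integers(0, 3, size=60)
print(KernelELM().fit(X, y).predict(X[:5]))
```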

18 pages, 7299 KiB  
Article
Unsupervised Contrastive Learning for Time Series Data Clustering
by Bo Cao, Qinghua Xing, Ke Yang, Xuan Wu and Longyue Li
Electronics 2025, 14(8), 1660; https://doi.org/10.3390/electronics14081660 - 19 Apr 2025
Abstract
Aiming at the problems of existing time series data clustering methods, such as the lack of similarity metric universality, the curse of dimensionality, and limited feature expression ability, a time series data clustering method based on unsupervised contrastive learning (UCL-TSC) is proposed. The method first utilizes Residual, TCN, and CNN-TCN to construct multi-view representations of spatial, temporal, and spatial–temporal features of time series data, and adaptively fuses complementary information to enhance feature extraction capabilities. Subsequently, positive and negative sample pairs are constructed based on nearest neighbor and pseudo-clustering label information. Finally, a contrastive loss function consisting of feature loss, clustering loss, and a regularization term is designed to guide the model toward compact intra-cluster and well-separated inter-cluster structure. The experimental results on the UCR dataset show that UCL-TSC performs well with respect to several evaluation indexes, such as clustering accuracy, normalized information degree, and purity, and is more effective in learning time series data features and achieving accurate clustering compared to traditional clustering and deep clustering methods.
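As a rough illustration of the contrastive component, the PyTorch sketch below computes an InfoNCE-style loss over two noise-augmented views of time-series embeddings; the encoder, augmentations, and temperature are generic stand-ins, not UCL-TSC's multi-view fusion or pseudo-label pairing.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    """InfoNCE loss: each embedding in z1 treats its counterpart in z2 as the
    positive and all other samples in the batch as negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(z1.size(0))          # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: a 1-D CNN encoder producing embeddings for two augmented views
encoder = torch.nn.Sequential(
    torch.nn.Conv1d(1, 16, kernel_size=5, padding=2),
    torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool1d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(16, 32),
)
series = torch.randn(8, 1, 128)                     # batch of univariate series
view1 = series + 0.05 * torch.randn_like(series)    # simple noise augmentations
view2 = series + 0.05 * torch.randn_like(series)
loss = info_nce(encoder(view1), encoder(view2))
loss.backward()
print(float(loss))
```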

22 pages, 6364 KiB  
Article
Multi-Frame Joint Detection Approach for Foreign Object Detection in Large-Volume Parenterals
by Ziqi Li, Dongyao Jia, Zihao He and Nengkai Wu
Mathematics 2025, 13(8), 1333; https://doi.org/10.3390/math13081333 - 18 Apr 2025
Abstract
Large-volume parenterals (LVPs), as essential medical products, are widely used in healthcare settings, making their safety inspection crucial. Current methods for detecting foreign particles in LVP solutions through image analysis primarily rely on single-frame detection or simple temporal smoothing strategies, which fail to effectively utilize spatiotemporal correlations across multiple frames. Factors such as occlusion, motion blur, and refractive distortion can significantly impact detection accuracy. To address these challenges, this paper proposes a multi-frame object detection framework based on spatiotemporal collaborative learning, incorporating three key innovations: a YOLO network optimized with deformable convolution, a differentiable cross-frame association module, and an uncertainty-aware feature fusion and re-identification module. Experimental results demonstrate that our method achieves a 97% detection rate for contaminated LVP solutions on the LVPD dataset. Furthermore, the proposed method enables end-to-end training and processes five bottles per second, meeting the requirements for real-time pipeline applications.
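The cross-frame association step can be approximated, very loosely, by classical IoU-based matching with the Hungarian algorithm, as in the sketch below; this is not the paper's differentiable association module, and the box coordinates and threshold are made up for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def associate(prev_boxes, curr_boxes, min_iou=0.3):
    """Match previous-frame detections to current-frame detections by maximum IoU."""
    cost = np.array([[1.0 - iou(p, c) for c in curr_boxes] for p in prev_boxes])
    rows, cols = linear_sum_assignment(cost)
    return [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= min_iou]

prev_boxes = [(10, 10, 30, 30), (50, 50, 70, 70)]
curr_boxes = [(52, 49, 72, 71), (12, 11, 32, 31)]
print(associate(prev_boxes, curr_boxes))   # [(0, 1), (1, 0)]
```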

19 pages, 2604 KiB  
Article
Quantifying Relational Exploration in Cultural Heritage Knowledge Graphs with LLMs: A Neuro-Symbolic Approach for Enhanced Knowledge Discovery
by Mohammed Maree
Data 2025, 10(4), 52; https://doi.org/10.3390/data10040052 - 10 Apr 2025
Abstract
This paper introduces a neuro-symbolic approach for relational exploration in cultural heritage knowledge graphs, exploiting Large Language Models (LLMs) for explanation generation and a mathematically grounded model to quantify the interestingness of relationships. We demonstrate the importance of the proposed interestingness measure through a quantitative analysis, highlighting its significant impact on system performance, particularly in terms of precision, recall, and F1-score. Utilizing the Wikidata Cultural Heritage Linked Open Data (WCH-LOD) dataset, our approach achieves a precision of 0.70, recall of 0.68, and an F1-score of 0.69, outperforming both graph-based (precision: 0.28, recall: 0.25, F1-score: 0.26) and knowledge-based (precision: 0.45, recall: 0.42, F1-score: 0.43) baselines. Furthermore, the proposed LLM-powered explanations exhibit better quality, as evidenced by higher BLEU (0.52), ROUGE-L (0.58), and METEOR (0.63) scores compared to baseline approaches. We further demonstrate a strong correlation (0.65) between the interestingness measure and the quality of generated explanations, validating its ability to guide the system towards more relevant discoveries. This system offers more effective exploration by achieving more diverse and human-interpretable relationship explanations compared to purely knowledge-based and graph-based methods, contributing to the knowledge-based systems field by providing a personalized and adaptable relational exploration framework.
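For context on the reported scores, the sketch below shows a set-based precision/recall/F1 evaluation of retrieved relationships against a gold standard, with a toy interestingness threshold standing in (purely hypothetically) for the paper's measure; the scored pairs and threshold are invented.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for retrieved vs. gold relationships."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical scored relationships (entity pair, interestingness score)
scored = {("Mona Lisa", "Louvre"): 0.91, ("Mona Lisa", "poplar panel"): 0.34,
          ("Leonardo", "Florence"): 0.72}
retrieved = [pair for pair, score in scored.items() if score >= 0.5]   # toy threshold
gold = [("Mona Lisa", "Louvre"), ("Leonardo", "Florence"), ("Leonardo", "Milan")]
print(precision_recall_f1(retrieved, gold))   # (1.0, 0.666..., 0.8)
```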

24 pages, 7335 KiB  
Article
An Interpretable Hybrid Deep Learning Model for Molten Iron Temperature Prediction at the Iron-Steel Interface Based on Bi-LSTM and Transformer
by Zhenzhong Shen, Weigang Han, Yanzhuo Hu, Ye Zhu and Jingjing Han
Mathematics 2025, 13(6), 975; https://doi.org/10.3390/math13060975 - 15 Mar 2025
Abstract
Hot metal temperature is a key factor affecting the quality and energy consumption of iron and steel smelting. Accurate prediction of the temperature drop in a hot metal ladle is very important for optimizing transport, improving efficiency, and reducing energy consumption. Most existing studies focus on predicting molten iron temperature in torpedo tanks, leaving a significant research gap in predicting the temperature drop in hot metal ladles; as ladles increasingly replace torpedo tanks in transportation, this gap has not been fully addressed in the existing literature. This paper proposes an interpretable hybrid deep learning model combining Bi-LSTM and Transformer to address the complexity of temperature drop prediction. By leveraging CatBoost-RFECV, the most influential variables are selected, and the model captures both local features with Bi-LSTM and global dependencies with Transformer. Hyperparameters are optimized automatically using Optuna, enhancing model performance. Furthermore, SHAP analysis provides valuable insights into the key factors influencing temperature drops, enabling more accurate prediction of molten iron temperature. The experimental results demonstrate that the proposed model outperforms each individual model in the ensemble in terms of R2, RMSE, MAE, and other evaluation metrics. Additionally, SHAP analysis identifies the key factors contributing to the temperature drop.
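A minimal PyTorch sketch of a Bi-LSTM-plus-Transformer-encoder regressor is shown below to illustrate the hybrid architecture; the layer sizes, feature count, and pooling choice are assumptions and do not reproduce the authors' model or its Optuna-tuned hyperparameters.

```python
import torch
import torch.nn as nn

class BiLSTMTransformer(nn.Module):
    """Bi-LSTM captures local sequential features; a Transformer encoder models
    global dependencies; a linear head predicts the temperature drop."""
    def __init__(self, n_features=8, hidden=32, heads=4, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=2 * hidden, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):                # x: (batch, time, n_features)
        out, _ = self.lstm(x)            # (batch, time, 2*hidden)
        out = self.encoder(out)          # global attention over time steps
        return self.head(out.mean(dim=1)).squeeze(-1)   # mean-pool, then regress

model = BiLSTMTransformer()
batch = torch.randn(16, 24, 8)           # 16 ladles, 24 time steps, 8 process features
print(model(batch).shape)                # torch.Size([16])
```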

20 pages, 8383 KiB  
Article
Self-Supervised Time-Series Preprocessing Framework for Maritime Applications
by Shengli Dong, Jilong Liu, Bing Han, Shengzheng Wang, Hong Zeng and Meng Zhang
Electronics 2025, 14(4), 765; https://doi.org/10.3390/electronics14040765 - 16 Feb 2025
Abstract
This study proposes a novel self-supervised data-preprocessing framework for time-series forecasting in complex ship systems. The framework integrates an improved Learnable Wavelet Packet Transform (L-WPT) for adaptive denoising and a correlation-based Uniform Manifold Approximation and Projection (UMAP) approach for dimensionality reduction. The enhanced L-WPT incorporates Reversible Instance Normalization to improve training efficiency while preserving denoising performance, especially for low-frequency sporadic noise. The UMAP dimensionality reduction, combined with a modified K-means clustering using correlation coefficients, enhances the computational efficiency and interpretability of the reduced data. Experimental results validate that state-of-the-art time-series models can effectively forecast the data processed by this framework, achieving promising MSE and MAE metrics.
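As a rough, classical analogue of the preprocessing pipeline, the sketch below applies (non-learnable) wavelet thresholding with PyWavelets followed by UMAP dimensionality reduction; the learnable wavelet packet transform, Reversible Instance Normalization, and correlation-based K-means of the paper are omitted, and the data shapes are invented.

```python
import numpy as np
import pywt
import umap   # umap-learn

def wavelet_denoise(signal, wavelet="db4", level=3):
    """Soft-threshold the detail coefficients of a plain DWT (a classical stand-in
    for the paper's learnable wavelet packet transform)."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745           # noise-level estimate
    thresh = sigma * np.sqrt(2 * np.log(len(signal)))         # universal threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 128))                # 200 time windows x 128 samples
denoised = np.apply_along_axis(wavelet_denoise, 1, raw)
embedded = umap.UMAP(n_components=5, random_state=0).fit_transform(denoised)
print(embedded.shape)                            # (200, 5)
```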

22 pages, 4481 KiB  
Article
A Clustering Algorithm Based on Local Relative Density
by Yujuan Zou, Zhijian Wang, Xiangchen Wang and Taizhi Lv
Electronics 2025, 14(3), 481; https://doi.org/10.3390/electronics14030481 - 24 Jan 2025
Cited by 1
Abstract
DBSCAN and DPC are typical density-based clustering algorithms. These two algorithms have their drawbacks, such as difficulty in clustering when there are significant differences in density between clusters. This study proposes a clustering algorithm, RDBSCAN, which is based on local relative density, drawing on the extension strategy of DBSCAN and the allocation mechanism of DPC. The algorithm first uses k-nearest neighbors to calculate the original local density, then sorts the points in descending order of this density. It then selects the point with the highest original local density from the unprocessed points as the local center of the next cluster. Based on this local center, RDBSCAN calculates the local relative density, determines the core objects, and performs cluster expansion. Drawing on the allocation mechanism of DPC, the algorithm performs a secondary allocation for points in clusters that are too small to complete the final clustering. Comparative experiments using RDBSCAN and eight other clustering algorithms were conducted, and the test results show that RDBSCAN ranks first in clustering performance metrics among all algorithms on synthetic datasets and second on real-world datasets.
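To illustrate the kind of density estimate RDBSCAN builds on, the scikit-learn sketch below computes a k-nearest-neighbor local density and a simple relative density; the reference-point definition, core-object threshold, and expansion/allocation steps are illustrative guesses rather than the paper's exact rules.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

def knn_local_density(X, k=10):
    """Local density as the inverse of the mean distance to the k nearest neighbors."""
    dists, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    density = 1.0 / (dists[:, 1:].mean(axis=1) + 1e-12)   # column 0 is the self-distance
    return density, idx[:, 1:]

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=[0.3, 0.8, 1.5], random_state=0)
density, neighbors = knn_local_density(X)

# Relative density: each point's density divided by that of a reference point,
# here (illustratively) the densest point among its k neighbors.
relative = density / density[neighbors].max(axis=1)
core_mask = relative >= 0.5          # toy core-object rule, threshold chosen arbitrarily
print(core_mask.sum(), "core candidates out of", len(X))
```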

24 pages, 2674 KiB  
Article
Achieving Excellence in Cyber Fraud Detection: A Hybrid ML+DL Ensemble Approach for Credit Cards
by Eyad Btoush, Xujuan Zhou, Raj Gururajan, Ka Ching Chan and Omar Alsodi
Appl. Sci. 2025, 15(3), 1081; https://doi.org/10.3390/app15031081 - 22 Jan 2025
Cited by 2
Abstract
The rapid advancement of technology has increased the complexity of cyber fraud, presenting a growing challenge for the banking sector to efficiently detect fraudulent credit card transactions. Conventional detection approaches face challenges in adapting to the continuously evolving tactics of fraudsters. This study addresses these limitations by proposing an innovative hybrid model that integrates Machine Learning (ML) and Deep Learning (DL) techniques through a stacking ensemble and resampling strategies. The hybrid model leverages ML techniques including Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Logistic Regression (LR) alongside DL techniques such as Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory Network (BiLSTM) with attention mechanisms. By utilising the stacking ensemble method, the model consolidates predictions from multiple base models, resulting in improved predictive accuracy compared to individual models. The methodology incorporates robust data pre-processing techniques. Experimental evaluations demonstrate the superior performance of the hybrid ML+DL model, particularly in handling class imbalances, achieving an F1 score of 94.63%. This result underscores the effectiveness of the proposed model in delivering reliable cyber fraud detection, highlighting its potential to enhance financial transaction security.
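A compact scikit-learn sketch of the stacking idea follows, using a few of the listed base learners and a logistic-regression meta-learner on synthetic imbalanced data; the CNN/BiLSTM-with-attention components, resampling strategy, and tuned hyperparameters from the paper are not included.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for credit-card transactions
X, y = make_classification(n_samples=4000, n_features=20, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(class_weight="balanced", random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, class_weight="balanced",
                                      random_state=0)),
        ("svm", SVC(class_weight="balanced", probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),   # meta-learner over base predictions
    cv=5,
)
stack.fit(X_tr, y_tr)
print("F1 (minority class):", round(f1_score(y_te, stack.predict(X_te)), 3))
```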

21 pages, 10348 KiB  
Article
A Learning Resource Recommendation Method Based on Graph Contrastive Learning
by Jiu Yong, Jianguo Wei, Xiaomei Lei, Jianwu Dang, Wenhuan Lu and Meijuan Cheng
Electronics 2025, 14(1), 142; https://doi.org/10.3390/electronics14010142 - 1 Jan 2025
Abstract
The existing learning resource recommendation systems suffer from data sparsity and missing data labels, leading to insufficient mining of the correlation between users and courses. To address these issues, we propose a learning resource recommendation method based on graph contrastive learning, which uses graph contrastive learning to construct an auxiliary recommendation task combined with a main recommendation task, achieving the joint recommendation of learning resources. Firstly, the user–course interaction bipartite graph is input into a lightweight graph convolutional network, and the embedded representation of each node in the graph is obtained after encoding. Then, noise vectors are randomly added to each node of the input bipartite graph in the embedding space to perturb the graph encoder's node embeddings, forming perturbed embedding representations that augment the data. Subsequently, the graph contrastive learning method is used to construct auxiliary recommendation tasks. Finally, the main recommendation supervision task and the constructed auxiliary graph contrastive learning task are jointly learned to alleviate data sparsity. The experimental results show that the proposed method improves Recall@5 by 5.7% and 11.2% and NDCG@5 by 0.1% and 6.4% on the MOOCCube and Amazon-Book datasets, respectively, compared with node enhancement methods. Therefore, the proposed method can significantly improve the mining of user–course correlations through the graph contrastive auxiliary task and has better noise immunity and robustness.
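The noise-perturbation idea can be sketched as below: node embeddings are perturbed with small random directional noise, and the two perturbed views are pulled together by a contrastive auxiliary loss added to a (placeholder) main recommendation loss; the perturbation form, temperature, and loss weighting are assumptions, and the lightweight graph convolution is omitted.

```python
import torch
import torch.nn.functional as F

def perturb(emb, eps=0.1):
    """Add random directional noise of magnitude eps to node embeddings, with the
    noise sign aligned to each embedding component (illustrative choice)."""
    noise = F.normalize(torch.rand_like(emb), dim=1) * torch.sign(emb)
    return emb + eps * noise

def contrastive_aux_loss(emb, temperature=0.2):
    """Auxiliary loss: two noise-perturbed views of the same nodes are positives."""
    z1, z2 = F.normalize(perturb(emb), dim=1), F.normalize(perturb(emb), dim=1)
    logits = z1 @ z2.t() / temperature
    return F.cross_entropy(logits, torch.arange(emb.size(0)))

# Joint objective: main recommendation loss (placeholder) plus weighted auxiliary task
node_emb = torch.randn(64, 32, requires_grad=True)       # e.g. user/course embeddings
main_loss = node_emb.norm() * 0.01                       # stand-in for a ranking/BPR loss
loss = main_loss + 0.2 * contrastive_aux_loss(node_emb)  # lambda = 0.2 (illustrative)
loss.backward()
print(float(loss))
```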

16 pages, 3708 KiB  
Article
Suppression of Strong Cultural Noise in Magnetotelluric Signals Using Particle Swarm Optimization-Optimized Variational Mode Decomposition
by Zhongda Shang, Xinjun Zhang, Shen Yan and Kaiwen Zhang
Appl. Sci. 2024, 14(24), 11719; https://doi.org/10.3390/app142411719 - 16 Dec 2024
Abstract
To effectively separate strong cultural noise in Magnetotelluric (MT) signals under strong interference conditions and restore the true forms of apparent resistivity and phase curves, this paper proposes an improved method for suppressing strong cultural noise based on Particle Swarm Optimization (PSO) and Variational Mode Decomposition (VMD). First, the effects of two initial parameters, the decomposition scale K and penalty factor α, on the performance of variational mode decomposition are studied. Subsequently, using the PSO algorithm, the optimal combination of influential parameters in the VMD is determined. This optimal parameter set is applied to decompose electromagnetic signals, and Intrinsic Mode Functions (IMFs) are selected for signal reconstruction based on correlation coefficients, resulting in denoised electromagnetic signals. The simulation results show that, compared to traditional algorithms such as Empirical Mode Decomposition (EMD), Intrinsic Time Decomposition (ITD), and VMD, the Normalized Cross-Correlation (NCC) and signal-to-noise ratio (SNR) of the PSO-optimized VMD method for suppressing strong cultural noise increased by 0.024, 0.035, 0.019, and 2.225, 2.446, 1.964, respectively. The processing of field data confirms that this method effectively suppresses strong cultural noise in strongly interfering environments, leading to significant improvements in the apparent resistivity and phase curve data, thereby enhancing the authenticity and reliability of underground electrical structure interpretations.
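A bare-bones particle swarm optimization loop over the two VMD parameters (K, α) is sketched below; the fitness function is a placeholder that would, in practice, run VMD with the candidate parameters on the MT signal and score the resulting modes (e.g., by envelope entropy), and all constants and bounds are illustrative.

```python
import numpy as np

def objective(K, alpha):
    """Placeholder fitness; in practice this would run VMD with (K, alpha) on the
    MT signal and return e.g. the minimum envelope entropy of the modes."""
    return (K - 6) ** 2 + ((alpha - 2000) / 500) ** 2

def pso(bounds, n_particles=20, n_iter=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Standard PSO: velocities are pulled toward personal and global bests."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array([b[0] for b in bounds]), np.array([b[1] for b in bounds])
    pos = rng.uniform(lo, hi, size=(n_particles, len(bounds)))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([objective(*p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([objective(*p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest

K_opt, alpha_opt = pso(bounds=[(2, 10), (200, 4000)])
print(f"best K ~ {K_opt:.1f}, best alpha ~ {alpha_opt:.0f}")   # K would be rounded in practice
```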
