Editorial

Where Are Data Mining and Machine Learning Headed in the Era of Big Knowledge and Large Models?

Jing Zhang and Wan Zhang
1 School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China
2 Engineering Research Center of Blockchain Application, Supervision and Management, Southeast University, Ministry of Education, Nanjing 211189, China
* Author to whom correspondence should be addressed.
Current address: Jiulonghu Campus, Southeast University, No. 2 SEU Road, Nanjing 211189, China.
Mathematics 2026, 14(2), 233; https://doi.org/10.3390/math14020233
Submission received: 20 December 2025 / Revised: 6 January 2026 / Accepted: 6 January 2026 / Published: 8 January 2026

1. Introduction

The advent of big knowledge, built on massive volumes of heterogeneous data, and the success of large pre-trained models have revolutionized artificial intelligence. Data mining and machine learning, once focused on detecting patterns in curated datasets, must now grapple with complex, noisy, heterogeneous, and dynamic data sources. Looking back at the progress of data mining and machine learning in recent years, we have identified several noteworthy trends related to this Special Issue:
First, the Transformer [1], a deep learning architecture that relies on attention mechanisms to model the relationships between elements of an input sequence, has had a revolutionary impact on machine learning. Its core components are multi-head self-attention and a position-wise feed-forward network, and it incorporates sequence order through positional encoding. This design allows the model to attend to all positions in the input sequence simultaneously, so it can efficiently process data from multiple modalities such as natural language, speech, and images. The Transformer has become the cornerstone of modern NLP: pre-trained language models such as BERT [2] and GPT [3], built on the Transformer architecture, achieve breakthrough performance in tasks such as text classification, machine translation, and question answering. It has also driven cross-modal learning: the Vision Transformer splits images into patches, processes them analogously to word tokens, and achieves performance comparable to or better than CNNs on the ImageNet classification task [4]. Moreover, it has enabled large-scale models and driven the evolution from specialized models to general-purpose foundation models.
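To make the mechanism concrete, the following minimal NumPy sketch (illustrative only; the toy dimensions and random weight matrices are our own, not taken from any cited work) shows the scaled dot-product self-attention at the heart of the Transformer:

```python
# Minimal sketch of scaled dot-product self-attention; toy values only.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv project to (d_model, d_k)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise relevance of positions
    weights = softmax(scores, axis=-1)        # each position attends to all others
    return weights @ V                        # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 16, 8
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 8)
```

Multi-head attention simply runs several such projections in parallel and concatenates the results.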
Second, self-supervised learning (SSL) has become one of the mainstream paradigms of modern machine learning. The core idea of SSL is to train models with supervisory signals constructed automatically from unlabeled data, thereby avoiding reliance on large amounts of manual labeling. SSL has made significant progress in recent years. Contrastive learning [5] improves representation quality by pulling together the representations of different augmented views of the same sample while pushing apart the representations of different samples. Masked autoencoders, such as MAE [6], achieve efficient visual representation learning by reconstructing occluded image patches. Multimodal self-supervised learning, exemplified by CLIP [7], performs exceptionally well on zero-shot classification through image–text contrastive learning. SSL has driven the learning paradigm shift from “small data plus strong supervision” to “big data plus weak supervision”.
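As an illustration of the contrastive objective, the following sketch computes an InfoNCE-style loss over two augmented views of a batch; the random embeddings and the temperature value are placeholders, not settings from the cited works:

```python
# InfoNCE-style contrastive loss over two views of the same batch (toy data).
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.1):
    """z1, z2: (batch, dim) embeddings of two augmented views of the same samples."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature      # cosine similarities between all pairs
    targets = torch.arange(z1.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)   # stand-ins for encoder outputs
print(info_nce_loss(z1, z2).item())
```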
Third, as data mining and machine learning algorithms are widely deployed in the real world, their security and trustworthiness face growing challenges, including adversarial attacks, model extraction, data poisoning, privacy leakage, and insufficient interpretability of model decisions. This direction has attracted increasing attention from the research community and has progressed rapidly in recent years. For example, adversarial training research focuses on improving training efficiency, expanding the scope of defense, and balancing the trade-off between natural accuracy and robustness [8]. As model sizes increase, hallucination has become a typical problem of large models, referring to outputs that seem plausible but are actually incorrect, unfounded, or irrelevant to the input context [9]. The academic community has studied the detection, mitigation, and evaluation of hallucinations in depth, producing representative results such as the fine-grained fact evaluation framework FactScore [10], retrieval-augmented generation [11], and Chain-of-Thought prompting [12].
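The following is a minimal, hypothetical sketch of one FGSM-based adversarial training step of the kind surveyed in [8]; the model, data, and perturbation budget are placeholders chosen only for illustration:

```python
# One FGSM-based adversarial training step (hypothetical toy model and data).
import torch
import torch.nn as nn

def adversarial_step(model, x, y, optimizer, eps=0.03):
    x_adv = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    # Perturb inputs in the direction that maximizes the loss (FGSM).
    x_adv = (x_adv + eps * x_adv.grad.sign()).detach()
    optimizer.zero_grad()
    adv_loss = nn.functional.cross_entropy(model(x_adv), y)
    adv_loss.backward()                      # train on the adversarial examples
    optimizer.step()
    return adv_loss.item()

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.rand(16, 1, 28, 28), torch.randint(0, 10, (16,))
print(adversarial_step(model, x, y, opt))
```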
Finally, the combination of big knowledge and large models is profoundly transforming the application of artificial intelligence in several key fields. By integrating the language understanding capabilities of models with massive numbers of parameters with factual support from external knowledge, these systems significantly improve accuracy, interpretability, and credibility in specialized domains. This paradigm has achieved breakthroughs in fields such as healthcare, scientific research, legal services, education, and intelligent decision-making. The synergy between big knowledge and large models is driving AI from “general dialogue” to “professional intelligence,” with the core trend being the enhancement of system reliability and usability through retrieval augmentation, knowledge alignment, and multimodal fusion.
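A schematic retrieval-augmented generation loop looks roughly as follows; the `generate` function stands in for a real large-model call, and the toy knowledge base and overlap-based retriever are illustrative assumptions only:

```python
# Schematic retrieval-augmented generation (RAG) loop with placeholder components.
def retrieve(query, documents, k=2):
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(documents, key=overlap, reverse=True)[:k]

def generate(prompt):
    return f"[LLM answer grounded in]\n{prompt}"   # stand-in for a model call

def rag_answer(query, documents):
    context = "\n".join(retrieve(query, documents))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)

kb = ["Nanjing is the capital of Jiangsu Province.",
      "Southeast University is located in Nanjing.",
      "The Transformer architecture was introduced in 2017."]
print(rag_answer("Where is Southeast University located?", kb))
```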
The research articles published in this Special Issue illustrate the technological trends of the era of big knowledge and large models from the four aspects outlined above. While they do not cover every characteristic of this era, they are sufficient to offer useful insights to the research community.

2. An Overview of Published Articles

The published contributions are summarized as follows:
The study “Abnormal Traffic Detection System Based on Feature Fusion and Sparse Transformer” by Xinjian Zhao, Weiwei Miao, Guoquan Yuan, Yu Jiang, Song Zhang, and Qianmu Li, published in May 2024, proposed an anomalous traffic detection system based on feature fusion and a sparse Transformer. The system uses a feature fusion network to encode traffic data sequences and extract features, fusing them into coding vectors through shallow and deep convolutional networks. A sparse Transformer then captures the complex relationships between network flows, and a multilayer perceptron classifies the traffic to detect anomalies. The study shows that the feature fusion network improves feature extraction from small-sample data, the deep encoder enhances the understanding of complex traffic patterns, and the sparse Transformer reduces computational and storage overhead while improving the scalability of the model.
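As a generic illustration of sparse attention (not the authors’ implementation), the following sketch restricts each query position to its k highest-scoring keys, which reduces the cost of the dense attention map:

```python
# Generic top-k sparse attention sketch; tensors are toy flow embeddings.
import torch

def topk_sparse_attention(Q, K, V, k=4):
    scores = Q @ K.transpose(-2, -1) / K.shape[-1] ** 0.5
    topk = scores.topk(k, dim=-1).indices
    mask = torch.full_like(scores, float("-inf"))
    mask.scatter_(-1, topk, 0.0)                   # keep only the top-k scores per row
    weights = torch.softmax(scores + mask, dim=-1)
    return weights @ V

Q = K = V = torch.randn(16, 32)                    # toy flow embeddings
print(topk_sparse_attention(Q, K, V).shape)        # torch.Size([16, 32])
```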
The study “A Novel Online Hydrological Data Quality Control Approach Based on Adaptive Differential Evolution” by Qun Zhao, Shicheng Cui, Yuelong Zhu, Rui Li, and Xudong Zhou, published in June 2024, proposed an online hydrological data quality control method based on an adaptive differential evolution algorithm tailored to the characteristics of hydrological data. The method accounts for continuity, periodicity, and seasonality, forming a Periodic Temporal Long Short-Term Memory predictive control model. To handle the real-time nature of the data, an adaptive differential evolution algorithm optimizes this model, producing an Online Composite Predictive Control Model that provides confidence intervals and recommended values for control and replacement. The proposed method effectively manages data quality, providing a solid data foundation for hydrological data analysis.
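For reference, a generic (non-adaptive) differential evolution procedure can be sketched as follows; the objective function and hyperparameters are illustrative and unrelated to the cited hydrological model:

```python
# Generic DE/rand/1/bin differential evolution sketch on a toy objective.
import numpy as np

def differential_evolution(objective, bounds, pop_size=20, F=0.8, CR=0.9, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    dim = len(bounds)
    lo, hi = np.array(bounds).T
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fitness = np.array([objective(p) for p in pop])
    for _ in range(iters):
        for i in range(pop_size):
            a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)          # mutation
            cross = rng.random(dim) < CR                       # binomial crossover
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            f_trial = objective(trial)
            if f_trial < fitness[i]:                           # greedy selection
                pop[i], fitness[i] = trial, f_trial
    return pop[fitness.argmin()], fitness.min()

best, val = differential_evolution(lambda x: np.sum(x ** 2), bounds=[(-5, 5)] * 3)
print(best, val)
```

An adaptive variant additionally tunes F and CR online, which is the direction taken by the cited work.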
The study “DABC: A Named Entity Recognition Method Incorporating Attention Mechanisms” by Fangling Leng, Fan Li, Yubin Bao, Tiancheng Zhang, and Ge Yu, published in June 2024, proposed an entity recognition method based on the DeBERTa-Attention-BiLSTM-CRF model. The method first extracts features with a DeBERTa model and then introduces an attention mechanism to further enhance them. A BiLSTM subsequently captures long-distance dependencies in the text, and a CRF layer produces the predicted sequences from which entities are identified. This work advances complex Chinese named entity recognition.
The study “Enhancing Knowledge-Concept Recommendations with Heterogeneous Graph-Contrastive Learning” by Liting Wei, Yun Li, Weiwei Wang, and Yi Zhu, published in July 2024, proposed a novel multi-task strategy for knowledge-concept recommendation, namely Knowledge-Concept Recommendations with Heterogeneous Graph-Contrastive Learning. It treats learners and their structural neighbors as positive contrastive pairs, constructing self-supervision signals on predefined meta-paths from heterogeneous information networks as auxiliary tasks; this captures learners’ higher-order neighbors from different perspectives. The information noise-contrastive estimation loss serves as the main training objective to better differentiate learners from different professional backgrounds. This study demonstrates the crucial role of contrastive learning on knowledge graphs in recommendation systems.
The study “KCB-FLAT: Enhancing Chinese Named Entity Recognition with Syntactic Information and Boundary Smoothing Techniques” by Zhenrong Deng, Zheng Huang, Shiwei Wei, and Jinglin Zhang, published in August 2024, proposed an innovative model, KCB-FLAT, that enhances Chinese named entity recognition by integrating enriched semantic information with word-boundary smoothing. The model extracts various types of syntactic data and uses a key-value memory network to generate syntactic feature embeddings for Chinese characters. An encoder named Cross-Transformer then thoroughly combines syntactic and lexical information to address the entity boundary segmentation errors caused by lexical information. Finally, boundary smoothing, combined with a regularity-conscious function, captures the internal regularity of entities and reduces model overconfidence.
The study “Few-Shot Learning Sensitive Recognition Method Based on Prototypical Network” by Guoquan Yuan, Xinjian Zhao, Liu Li, Song Zhang, and Shanming Wei, published in September 2024, proposed a prototype-network-based named entity recognition method, the FSPN-NER model, to address the difficulty of recognizing sensitive data in data-sparse text. The model pre-trains the data with a positional coding model and performs feature extraction, computes prototype vectors to match entities, and introduces a boundary detection module to enhance named entity recognition. This work shows how prototype networks can use a few labeled samples and category prototypes to improve model generalization.
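A generic prototypical-network classification step (not the FSPN-NER implementation) can be sketched as follows, with toy embeddings standing in for learned features:

```python
# Generic prototypical-network classification: class prototypes are mean support
# embeddings, and each query is assigned to its nearest prototype.
import torch

def prototypes(support_emb, support_labels, num_classes):
    return torch.stack([support_emb[support_labels == c].mean(dim=0)
                        for c in range(num_classes)])

def classify(query_emb, protos):
    dists = torch.cdist(query_emb, protos)       # Euclidean distance to each prototype
    return dists.argmin(dim=1)

emb = torch.randn(10, 32)                        # toy support embeddings
labels = torch.tensor([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
protos = prototypes(emb, labels, num_classes=3)
print(classify(torch.randn(4, 32), protos))
```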
The study “Pre-Trained Language Model Ensemble for Arabic Fake News Detection” by Lama Al-Zahrani and Maha Al-Yahya, published in September 2024, investigated an ensemble of Transformer-based models, including AraBERT, MARBERT, AraELECTRA, AraGPT2, and ARBERT, for Arabic fake news detection. Arabic has complex linguistic patterns and contexts, which degrade the performance of standalone Transformer-based fake news detection systems. The study evaluated various ensemble methods, including weighted average ensemble, hard voting, and soft voting, to determine the most effective techniques for improving the learned model and increasing prediction accuracy.
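As a generic illustration of the ensembling strategies evaluated, the following sketch performs weighted soft voting over per-model class probabilities; the probability arrays are random placeholders rather than outputs of the cited Arabic models:

```python
# Weighted soft-voting ensemble over per-model class probabilities (toy data).
import numpy as np

def soft_vote(prob_list, weights=None):
    """prob_list: list of (n_samples, n_classes) probability arrays."""
    probs = np.stack(prob_list)                       # (n_models, n_samples, n_classes)
    avg = np.average(probs, axis=0, weights=weights)  # weighted average ensemble
    return avg.argmax(axis=1)                         # predicted class per sample

rng = np.random.default_rng(0)
model_probs = [rng.dirichlet(np.ones(2), size=5) for _ in range(3)]
print(soft_vote(model_probs, weights=[0.5, 0.3, 0.2]))
```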
The study “Fortify the Guardian, Not the Treasure: Resilient Adversarial Detectors” by Raz Lapid, Almog Dubin, and Moshe Sipper, published in November 2024, introduced a robust adversarial detection method based on adversarial retraining (RADAR). RADAR fortifies adversarial detectors against adaptive attacks while preserving the classifier’s accuracy. It incorporates adversarial examples crafted to deceive both the classifier and the detector into the training process; this dual optimization enables the detector to learn and adapt to sophisticated attack scenarios. Improving the ability of machine learning models to withstand adversarial attacks is an important direction for trustworthy artificial intelligence, and this article provides a distinctive example.
The study “Advanced Trans-BiGRU-QA Fusion Model for Atmospheric Mercury Prediction” by Dong-Her Shih, Feng-I. Chung, Ting-Wei Wu, Bo-Hao Wang, and Ming-Hung Shih, published in November 2024, proposed a novel advanced Trans-BiGRU-QA hybrid model to accurately predict atmospheric mercury concentrations. The study demonstrates the value of feature engineering and sliding-window time series processing in environmental pollution prediction.
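As an illustration of sliding-window time series processing in general (not the authors’ pipeline), the following sketch turns a one-dimensional series into supervised windows:

```python
# Generic sliding-window construction for time-series supervised learning.
import numpy as np

def sliding_windows(series, window, horizon=1):
    """Turn a 1-D series into (X, y) pairs: each window predicts a future value."""
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])
        y.append(series[start + window + horizon - 1])
    return np.array(X), np.array(y)

series = np.sin(np.linspace(0, 6 * np.pi, 100))   # toy periodic signal
X, y = sliding_windows(series, window=12)
print(X.shape, y.shape)                           # (88, 12) (88,)
```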
The study “Advanced Trans-EEGNet Deep Learning Model for Hypoxic-Ischemic Encephalopathy Severity Grading” by Dong-Her Shih, Feng-I Chung, Ting-Wei Wu, Shuo-Yu Huang, and Ming-Hung Shih, published in December 2024, proposed a deep learning method for the rapid classification and assessment of hypoxic–ischemic encephalopathy severity in newborns. The method addresses data imbalance and noise interference through data preprocessing, and the combination of an EEGNet and a Transformer with an attention mechanism improves computation time and feature extraction. The article demonstrates important applications of the Transformer architecture and data preprocessing in the classification and assessment of critical diseases.
The study “Hallucination Mitigation for Retrieval-Augmented Large Language Models: A Review” by Wan Zhang and Jing Zhang, published in March 2025, is a review article in this issue that focuses on hallucination in retrieval-augmented large language models. It first examines the causes of hallucination in the sub-tasks of the retrieval and generation phases. After reviewing current mitigation techniques, it proposes a comprehensive framework for addressing hallucinations in retrieval-augmented LLMs, reviews detection- and correction-based hallucination-reduction methods, and discusses promising future research directions. The article is a valuable reference for the study of hallucination mitigation.
The study “SFSIN: A Lightweight Model for Remote Sensing Image Super-Resolution with Strip-like Feature Superpixel Interaction Network” by Yanxia Lyu, Yuhang Liu, Qianqian Zhao, Ziwen Hao, and Xin Song, published in May 2025, proposed a novel lightweight super-resolution model for remote sensing images, the strip-like feature superpixel interaction network (SFSIN). The model uses a Transformer to capture global context through long-range dependencies and CNNs to perform shape-adaptive convolutions. By stacking strip-like feature superpixel interaction modules, it aggregates strip-like features for deep feature extraction from both local and global perspectives, and a convolutional block attention module with upsampling convolution integrates deep features across spatial and channel dimensions to improve reconstruction. The proposed method effectively reduces the parameter count and computational complexity of super-resolution, making it easier to deploy in resource-constrained edge environments.
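As a generic illustration of the sub-pixel upsampling commonly used in lightweight super-resolution heads (not the SFSIN implementation), consider the following sketch:

```python
# Generic sub-pixel (PixelShuffle) upsampling head for super-resolution (toy sizes).
import torch
import torch.nn as nn

class UpsampleHead(nn.Module):
    def __init__(self, channels=64, scale=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels * scale ** 2, kernel_size=3, padding=1),
            nn.PixelShuffle(scale),               # rearrange channels into spatial detail
            nn.Conv2d(channels, 3, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

feat = torch.randn(1, 64, 32, 32)                 # toy deep-feature map
print(UpsampleHead()(feat).shape)                 # torch.Size([1, 3, 64, 64])
```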
The published articles are listed below:
  • Zhao X.; Miao W.; Yuan G.; Jiang Y.; Zhang S.; Li Q. Abnormal Traffic Detection System Based on Feature Fusion and Sparse Transformer. Mathematics 2024, 12(11), 1643. https://doi.org/10.3390/math12111643.
  • Zhao Q.; Cui S.; Zhu Y.; Li R.; Zhou X. A Novel Online Hydrological Data Quality Control Approach Based on Adaptive Differential Evolution. Mathematics 2024, 12(12), 1821. https://doi.org/10.3390/math12121821.
  • Leng F.; Li F.; Bao Y.; Zhang T.; Yu G. DABC: A Named Entity Recognition Method Incorporating Attention Mechanisms. Mathematics 2024, 12(13), 1992. https://doi.org/10.3390/math12131992.
  • Wei L.; Li Y.; Wang W.; Zhu Y. Enhancing Knowledge-Concept Recommendations with Heterogeneous Graph-Contrastive Learning. Mathematics 2024, 12(15), 2324. https://doi.org/10.3390/math12152324.
  • Deng Z.; Huang Z.; Wei S.; Zhang J. KCB-FLAT: Enhancing Chinese Named Entity Recognition with Syntactic Information and Boundary Smoothing Techniques. Mathematics 2024, 12(17), 2714. https://doi.org/10.3390/math12172714.
  • Yuan G.; Zhao X.; Li L.; Zhang S.; Wei S. Few-Shot Learning Sensitive Recognition Method Based on Prototypical Network. Mathematics 2024, 12(17), 2791. https://doi.org/10.3390/math12172791.
  • Al-Zahrani L.; Al-Yahya M. Pre-Trained Language Model Ensemble for Arabic Fake News Detection. Mathematics 2024, 12(18), 2941. https://doi.org/10.3390/math12182941.
  • Lapid R.; Dubin A.; Sipper M. Fortify the Guardian, Not the Treasure: Resilient Adversarial Detectors. Mathematics 2024, 12(22), 3451. https://doi.org/10.3390/math12223451.
  • Shih D.-H.; Chung F.-I.; Wu T.-W.; Wang B.-H.; Shih M.-H. Advanced Trans-BiGRU-QA Fusion Model for Atmospheric Mercury Prediction. Mathematics 2024, 12(22), 3547. https://doi.org/10.3390/math12223547.
  • Shih D.-H.; Chung F.-I.; Wu T.-W.; Huang S.-Y.; Shih M.-H. Advanced Trans-EEGNet Deep Learning Model for Hypoxic-Ischemic Encephalopathy Severity Grading. Mathematics 2024, 12(24), 3915. https://doi.org/10.3390/math12243915.
  • Zhang W.; Zhang J. Hallucination Mitigation for Retrieval-Augmented Large Language Models: A Review. Mathematics 2025, 13(5), 856. https://doi.org/10.3390/math13050856.
  • Lyu Y.; Liu Y.; Zhao Q.; Hao Z.; Song X. SFSIN: A Lightweight Model for Remote Sensing Image Super-Resolution with Strip-like Feature Superpixel Interaction Network. Mathematics 2025, 13(11), 1720. https://doi.org/10.3390/math13111720.

3. Conclusions

The objectives of this Special Issue on “Data Mining and Machine Learning in the Era of Big Knowledge and Large Models” were successfully achieved through the incorporation of groundbreaking research in these fields. Each contribution significantly advanced the understanding and capabilities of big knowledge and large model technologies, covering Transformer-based named entity recognition, knowledge recommendation, time series analysis, model ensembling, data quality improvement, hallucination mitigation, and innovative application development. The collective impact of these studies aligns with the Special Issue’s core purpose: to enhance the breadth and quality of knowledge acquisition and to help intelligent systems better serve humanity. This Special Issue stands as a testament to the potential of these technologies in shaping a more informed, efficient, and smart world.

Acknowledgments

We would like to extend our appreciation to the authors for their diligent research, the reviewers for their insightful comments and constructive suggestions, and the editors and proofreading team for their meticulous attention to detail, ensuring high-quality publication in terms of both research content and production standards.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010.
  2. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186.
  3. Yenduri, G.; Ramalingam, M.; Selvi, G.C.; Supriya, Y.; Srivastava, G.; Maddikunta, P.K.R.; Raj, G.D.; Jhaveri, R.H.; Prabadevi, B.; Wang, W.; et al. GPT (generative pre-trained transformer)—A comprehensive review on enabling technologies, potential applications, emerging challenges, and future directions. IEEE Access 2024, 12, 54608–54649.
  4. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. In Proceedings of the 2021 International Conference on Learning Representations, Vienna, Austria, 3–7 May 2021.
  5. Hu, H.; Wang, X.; Zhang, Y.; Chen, Q.; Guan, Q. A comprehensive survey on contrastive learning. Neurocomputing 2024, 610, 128645.
  6. He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16000–16009.
  7. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 8748–8763.
  8. Qian, Z.; Huang, K.; Wang, Q.F.; Zhang, X.Y. A survey of robust adversarial training in pattern recognition: Fundamental, theory, and methodologies. Pattern Recognit. 2022, 131, 108889.
  9. Huang, L.; Yu, W.; Ma, W.; Zhong, W.; Feng, Z.; Wang, H.; Chen, Q.; Peng, W.; Feng, X.; Qin, B.; et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Trans. Inf. Syst. 2025, 43, 1–55.
  10. Min, S.; Krishna, K.; Lyu, X.; Lewis, M.; Yih, W.T.; Koh, P.W.; Iyyer, M.; Zettlemoyer, L.; Hajishirzi, H. FactScore: Fine-grained atomic evaluation of factual precision in long form text generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; pp. 12076–12100.
  11. Izacard, G.; Lewis, P.; Lomeli, M.; Hosseini, L.; Petroni, F.; Schick, T.; Dwivedi-Yu, J.; Joulin, A.; Riedel, S.; Grave, E. Atlas: Few-shot learning with retrieval augmented language models. J. Mach. Learn. Res. 2023, 24, 11912–11954.
  12. Kojima, T.; Gu, S.S.; Reid, M.; Matsuo, Y.; Iwasawa, Y. Large language models are zero-shot reasoners. Adv. Neural Inf. Process. Syst. 2022, 35, 22199–22213.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
