Data Stream Mining and Processing
Topic Information
Dear Colleagues,
Data are generated as streams of increasingly high volume and velocity, driven by the digitalization of diverse industries and by advances in smart sensors, mobile devices, social media, and industrial systems. Unlike static datasets, data streams arrive continuously with unpredictable bursts, making it infeasible to process the entire stream at once. This imposes challenges on system design in terms of memory access, computation time, and storage when partitioning the stream into stages. Algorithms must operate in bounded-memory, low-latency, and often single-pass modes, adapting in real time to concept drift, handling noisy or incomplete data, and maintaining reliable performance under non-stationary and bursty workloads.
A compelling frontier is the hybrid integration of data stream mining with reasoning based on large language models (LLMs). In this new system design paradigm, streaming data are continuously embedded, indexed, and injected into LLM pipelines to enable context-aware decision support. This integration presents a number of open technical challenges, including real-time vector embedding and semantic indexing; semantic drift mitigation and alignment of embedding spaces with LLM reasoning; temporal retrieval to serve LLMs with the most relevant information; provenance-aware embedding pipelines; and context-aware redundancy compression and summarization.
In addition to the classical CAP theorem trade-offs among consistency, availability, and partition tolerance, data stream systems must meet several crosscutting quality attributes: observability, to monitor model behavior, system health, and evolving data characteristics in real time; explainability, to ensure transparent decision-making where model updates are continuous and context evolves; sustainability, to optimize energy efficiency, computational resource usage, and long-term model maintainability; and learning performance, including accuracy under concept drift, adaptability to unseen patterns, and robustness against adversarial or anomalous inputs. Balancing these competing objectives is a key challenge in the design of data stream mining systems and motivates the development of new algorithms and architectures for scalable, interpretable, and resilient stream analytics.
This Topic, “Data Stream Mining and Processing”, aims to bring together novel algorithmic developments, advanced system designs, practical implementations, theoretical insights, practices, and processes that address the challenges of real-time data stream processing. We invite high-quality, original research contributions that tackle core problems in data stream mining, including pattern detection, online learning, memory-efficient processing, and dynamic model adaptation, as well as metrics, processes, frameworks, and toolchains that improve crosscutting quality attributes. Furthermore, we welcome interdisciplinary papers that apply these algorithms to domains such as cybersecurity, financial analytics, Internet of Things (IoT), smart cities, bioinformatics, healthcare, and social applications.
The objective of this Topic is to provide a comprehensive view of the state of the art in data stream processing and to foster collaboration between researchers and practitioners in algorithms, artificial intelligence, systems, and application domains. Contributions may include, but are not limited to, the following areas:
- Core Data Stream Mining Topics
  - Algorithms for classification, clustering, and regression over data streams
  - Online learning and continual learning under concept drift
  - Sliding window models, synopsis structures, and approximation techniques
  - Stream-based ensemble methods and drift detection mechanisms
  - Real-time anomaly detection and change point analysis
  - Stream mining for graphs, time series, and multi-modal data
- LLM-Based Reasoning and Hybrid System Design
  - Architectures for integrating data stream mining with LLMs
  - Real-time retrieval-augmented generation (RAG) using stream embeddings
  - Online embedding generation and semantic drift handling in LLM pipelines
  - Hybrid temporal-semantic retrieval models for LLM prompting
  - LLM-centric prompt filtering, summarization, and token-aware selection from streams
  - Evaluation and benchmarking of hybrid stream–LLM systems
- Vector Database and Embedding Infrastructure
  - Incremental indexing and efficient ingestion for vector databases in stream settings
  - Embedding provenance and auditability for retrieval-based AI systems
  - Streaming-compatible algorithms and hybrid search methods
- Crosscutting System Attributes
  - Observability and monitoring frameworks for streaming AI systems
  - Explainability and interpretability in evolving online models
  - Energy-aware and sustainable deployment of continuous learning systems
  - Robustness to adversarial inputs, label noise, and incomplete supervision
- Applications and Case Studies
  - Real-time stream mining in domains such as cybersecurity, finance, healthcare, and smart cities
  - Streaming interfaces for conversational agents and adaptive user modeling
  - Federated or edge-based stream analytics for IoT and mobile environments
  - Low-latency AI decision-making in autonomous and mission-critical systems
Prof. Dr. Yan Liu
Dr. Zheng Li
Topic Editors
Keywords
- data stream mining
- online learning
- large language models
- hybrid AI systems
- retrieval-augmented generation
- vector databases
- semantic drift
- concept drift
- observability
- explainability
- real-time machine learning
- sustainable AI
- embedding pipelines
Participating Journals
Journal Name | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC
---|---|---|---|---|---
Algorithms | 2.1 | 4.5 | 2008 | 17.8 Days | CHF 1800
Applied Sciences | 2.5 | 5.5 | 2011 | 19.8 Days | CHF 2400
Data | 2.0 | 5.0 | 2016 | 25.2 Days | CHF 1600
Information | 2.9 | 6.5 | 2010 | 18.6 Days | CHF 1800
Mathematics | 2.2 | 4.6 | 2013 | 18.4 Days | CHF 2600
Preprints.org is a multidisciplinary platform offering a preprint service designed to facilitate the early sharing of your research. It supports and empowers your research journey from the very beginning.
MDPI Topics is collaborating with Preprints.org and has established a direct connection between MDPI journals and the platform. Authors are encouraged to take advantage of this opportunity by posting their preprints at Preprints.org prior to publication:
- Share your research immediately: Disseminate your ideas prior to publication and establish priority for your work.
- Safeguard your intellectual contribution: Protect your ideas with a time-stamped preprint that serves as proof of your research timeline.
- Boost visibility and impact: Increase the reach and influence of your research by making it accessible to a global audience.
- Gain early feedback: Receive valuable input and insights from peers before submitting to a journal.
- Ensure broad indexing: Preprints are indexed by Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit, and Europe PMC.