Topic Editors

Prof. Dr. Yan Liu

Gina Cody School of Engineering and Computer Science, Concordia University, Montreal, QC H3G 1M8, Canada

Dr. Zheng Li

School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, Belfast, Northern Ireland, UK

Data Stream Mining and Processing

Abstract submission deadline

28 February 2027

Manuscript submission deadline

30 April 2027

Viewed by

5207

Topic Information

Dear Colleagues,

Data streams are being generated at ever-increasing volume and velocity, driven by the digitalization of diverse industries and by advances in smart sensors, mobile devices, social media, and industrial systems. Unlike static datasets, data streams arrive continuously, at high velocity and volume, and with unpredictable bursts, making it infeasible to store and process the entire stream at once. This constrains system design in terms of memory access, computation time, and storage, particularly when the stream is partitioned into processing stages. Algorithms must operate in bounded-memory, low-latency, and often single-pass modes; adapt in real time to concept drift; handle noisy or incomplete data; and maintain reliable performance under non-stationary and bursty workloads.

A compelling frontier is the hybrid integration of data stream mining with large language model (LLM)-based reasoning. In this new system design paradigm, streaming data are continuously embedded, indexed, and injected into LLM pipelines to enable context-aware decision support. This integration raises a number of open technical challenges, including real-time vector embedding and semantic indexing; mitigation of semantic drift and alignment of embedding spaces with LLM reasoning; temporal retrieval that serves LLMs the most relevant information; provenance-aware embedding pipelines; and context-aware redundancy compression and summarization.

Beyond the classical CAP theorem trade-offs among consistency, availability, and partition tolerance, advanced data stream systems must satisfy several crosscutting quality attributes:
- observability, to monitor model behavior, system health, and evolving data characteristics in real time;
- explainability, to keep decision-making transparent while models are updated continuously and context evolves;
- sustainability, to optimize energy efficiency and computational resource usage and to keep models maintainable over the long term; and
- learning performance, including accuracy under concept drift, adaptability to unseen patterns, and robustness against adversarial or anomalous inputs.

Balancing these competing objectives is a key challenge in the design of data stream mining systems. It motivates the development of new algorithms and architectures for scalable, interpretable, and resilient stream analytics.

This Topic, “Data Stream Mining and Processing”, aims to bring together novel algorithmic developments, advanced system designs, practical implementations, theoretical insights, practices, and processes that address the challenges of real-time data stream processing. We invite high-quality, original research contributions that tackle core problems in data stream mining, including pattern detection, online learning, memory-efficient processing, and dynamic model adaptation. We also invite metrics, processes, frameworks, and toolchains that improve crosscutting quality attributes. Furthermore, we welcome interdisciplinary papers that apply these algorithms to domains such as cybersecurity, financial analytics and systems, the Internet of Things (IoT), smart cities, bioinformatics, healthcare, and social applications.

The objective of this Topic is to provide a comprehensive view of the state of the art in data stream processing and to foster collaboration among researchers and practitioners in algorithms, artificial intelligence, systems, and application domains. Contributions may include, but are not limited to, the following areas:

Core Data Stream Mining Topics
- Algorithms for classification, clustering, and regression over data streams
- Online learning and continual learning under concept drift
- Sliding window models, synopsis structures, and approximation techniques
- Stream-based ensemble methods and drift detection mechanisms
- Real-time anomaly detection and change point analysis
- Stream mining for graphs, time series, and multi-modal data
LLM-Based Reasoning and Hybrid System Design
- Architectures for integrating data stream mining with LLMs
- Real-time retrieval-augmented generation (RAG) using stream embeddings
- Online embedding generation and semantic drift handling in LLM pipelines
- Hybrid temporal-semantic retrieval models for LLM prompting
- LLM-centric prompt filtering, summarization, and token-aware selection from streams
- Evaluation and benchmarking of hybrid stream–LLM systems
Vector Database and Embedding Infrastructure
- Incremental indexing and efficient ingestion for vector databases in stream settings
- Embedding provenance and auditability for retrieval-based AI systems
- Streaming-compatible algorithms and hybrid search methods
Crosscutting System Attributes
- Observability and monitoring frameworks for streaming AI systems
- Explainability and interpretability in evolving online models
- Energy-aware and sustainable deployment of continuous learning systems
- Robustness to adversarial inputs, label noise, and incomplete supervision
Applications and Case Studies
- Real-time stream mining in domains such as cybersecurity, finance, healthcare, and smart cities
- Streaming interfaces for conversational agents and adaptive user modeling
- Federated or edge-based stream analytics for IoT and mobile environments
- Low-latency AI decision-making in autonomous and mission-critical systems

Prof. Dr. Yan Liu
Dr. Zheng Li
Topic Editors

Keywords

data stream mining
online learning
large language models
hybrid AI systems
retrieval-augmented generation
vector databases
semantic drift
concept drift
observability
explainability
real-time machine learning
sustainable AI
embedding pipelines

Participating Journals

Journal Name	Impact Factor	CiteScore	Launched Year	First Decision (median)	APC
Algorithms	2.6	5.4	2008	17.6 Days	CHF 1800
Applied Sciences	2.9	6.1	2011	15 Days	CHF 2400
Data	2.4	5.4	2016	19.2 Days	CHF 1600
Information	4.3	8.2	2010	18.7 Days	CHF 1800
Mathematics	2.3	5.4	2013	17.4 Days	CHF 2600

Preprints.org is a multidisciplinary platform offering a preprint service designed to facilitate the early sharing of your research. It supports and empowers your research journey from the very beginning.

MDPI Topics is collaborating with Preprints.org and has established a direct connection between MDPI journals and the platform. Authors are encouraged to take advantage of this opportunity by posting their preprints at Preprints.org prior to publication:

- Share your research immediately: disseminate your ideas prior to publication and establish priority for your work.
- Safeguard your intellectual contribution: protect your ideas with a time-stamped preprint that serves as proof of your research timeline.
- Boost visibility and impact: increase the reach and influence of your research by making it accessible to a global audience.
- Gain early feedback: receive valuable input and insights from peers before submitting to a journal.
- Ensure broad indexing: preprints are indexed by Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit, and Europe PMC.

Published Papers (3 papers)

14 pages, 3062 KB

Open Access Article

A New Measurement-Based Benchmark Data Set for Radio Spectrum Analysis Applications

by Szilárd László Takács, Lajos Muzsai, Zoltán Németh, Bence Bakos, András Lukács, Csaba Huszty, Péter Vári and András Lapsánszky

Data 2026, 11(5), 115; https://doi.org/10.3390/data11050115 - 11 May 2026

Viewed by 508

Abstract

Radio spectrum is a limited national resource whose efficient utilization is of strategic importance. With the rapid advancement of wireless technologies, maintaining spectrum cleanliness and enabling fast and reliable anomaly detection have become critical challenges. Artificial intelligence (AI)-based approaches have recently shown great promise in addressing these issues; however, their effectiveness strongly depends on the availability of high-quality, representative, and annotated datasets. Generating such datasets is a complex task, further complicated by environmental conditions such as weather. Hungary’s nationwide spectrum monitoring network enables continuous observation of frequency bands, thereby providing the opportunity to construct large-scale and sustainable datasets. This study introduces a measurement methodology designed for FM sound broadcasting in the VHF band (87.5–108 MHz), presents the resulting dataset, and details the annotation process. The published, openly accessible dataset is expected to serve not only as a valuable reference point but also as a benchmark for the international research community, facilitating the development, validation, and objective comparison of AI-driven spectrum monitoring solutions.

(This article belongs to the Topic Data Stream Mining and Processing)

24 pages, 2841 KB

Open Access Article

Enhancing Data Quality with a Novel Neural Parameter Diffusion Approach

by Jun Yang, Kehan Hu, Zijing Yu and Zhiyang Zhang

Data 2026, 11(4), 72; https://doi.org/10.3390/data11040072 - 2 Apr 2026

Viewed by 597

Abstract

This study presents a novel neural parameter diffusion approach (FWA-PDiff) designed to enhance data quality. To address the limitations of conventional diffusion models (such as inefficient sampling and insufficient feature sensitivity, which may compromise output fidelity), this study introduces four key innovations. First, the proposed model introduces an adaptive recalibration of the sampling frequency in the Fourier domain to optimize feature extraction for image data. Second, a dual-channel autoencoder architecture is employed, featuring a multi-scale, fine-grained encoder (MFE) that enables the simultaneous capture of features at multiple resolutions. Third, a wavelet-attention mechanism (WA) is incorporated into the decoder to highlight subtle high-frequency details. Fourth, the proposed model introduces a hybrid loss function that combines Mean Squared Error (MSE) and Kullback–Leibler (KL) divergence to improve data reconstruction. Collectively, these improvements enable the generation of high-fidelity parameters, thereby contributing to enhanced data quality. Extensive experiments conducted on benchmark datasets, including MNIST, CIFAR-10, CIFAR-100, and STL-10, demonstrate the effectiveness of the proposed approach, which consistently achieves superior performance in improving data quality.

38 pages, 12262 KB

Open Access Article

A Reproducible FPGA–ADC Synchronization Architecture for High-Speed Data Acquisition

by Van Muoi Ngo and Thanh Dong Nguyen

Data 2026, 11(1), 23; https://doi.org/10.3390/data11010023 - 21 Jan 2026

Cited by 2 | Viewed by 2539

Abstract

High-speed data acquisition systems based on field-programmable gate arrays (FPGAs) often face synchronization challenges when interfacing with commercial analog-to-digital converters (ADCs), particularly under constrained hardware routing conditions and vendor-specific clocking assumptions. This work presents a vendor-independent FPGA–ADC synchronization architecture that enables reliable and repeatable high-speed data acquisition without relying on clock-capable input resources. Clock and frame signals are internally reconstructed and phase-aligned within the FPGA using mixed-mode clock management (MMCM) and input serializer/deserializer (ISERDES) resources, enabling time-sequential phase observation without the need for parallel snapshot or delay-line structures. Rather than targeting absolute metrological limits, the proposed approach emphasizes a reproducible and transparent data acquisition methodology applicable across heterogeneous FPGA–ADC platforms, in which clock synchronization is treated as a system-level design parameter affecting digital interface timing integrity and data reproducibility. Experimental validation using a custom Kintex-7 (XC7K325T) FPGA and an AFE7225 ADC demonstrates stable synchronization at sampling rates of up to 125 MS/s, with frequency-offset tolerance determined by the phase-tracking capability of the internal MMCM-based alignment loop. Consistent signal acquisition is achieved over the 100 kHz–20 MHz frequency range. The measured interface-level timing uncertainty remains below 10 ps RMS, confirming robust clock and frame alignment. Meanwhile, the observed signal-to-noise ratio (SNR) performance, exceeding 80 dB, reflects the phase-noise-limited measurement quality of the system. The proposed architecture provides a cost-effective, scalable, and reproducible solution for experimental and research-oriented FPGA-based data acquisition systems operating under practical hardware constraints.
