Topic Editors

Prof. Dr. Yan Liu
Gina Cody School of Engineering and Computer Science, Concordia University, Montreal, QC H3G 1M8, Canada
Dr. Zheng Li
School of Electronics, Electrical Engineering and Computer Science, Queen’s University Belfast, Belfast, Northern Ireland, UK

Data Stream Mining and Processing

Abstract submission deadline
28 February 2027
Manuscript submission deadline
30 April 2027

Topic Information

Dear Colleagues,

Data are generated in ever higher volumes and at ever higher velocities, driven by the digitalization of diverse industries and by advances in smart sensors, mobile devices, social media, and industrial systems. Unlike static datasets, data streams arrive continuously and with unpredictable bursts, making it infeasible to store and process an entire stream at once. Once the stream is partitioned into stages, this imposes challenges on system design in terms of memory access, computation time, and storage. Algorithms must operate in bounded-memory, low-latency, and often single-pass modes, adapting in real time to concept drift, handling noisy or incomplete data, and maintaining reliable performance under non-stationary and bursty workloads.
A compelling frontier is the hybrid integration of data stream mining with LLM-based reasoning. In this new system design paradigm, streaming data are continuously embedded, indexed, and injected into LLM pipelines to enable context-aware decision support. This integration presents a number of open technical challenges, including real-time vector embedding and semantic indexing; semantic drift mitigation and alignment of embedding spaces with LLM reasoning; temporal retrieval to serve LLMs with the most relevant information; provenance-aware embedding pipelines; and context-aware redundancy compression and summarization.
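
To make the idea of temporal retrieval for LLM pipelines more concrete, the following is a minimal sketch and not a reference design. It assumes that each stream item already carries an embedding vector produced by some embedding model (not shown), and it ranks items by cosine similarity multiplied by an exponential recency decay. The names `Item`, `retrieve`, and `half_life` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    text: str
    vector: List[float]  # embedding from some model (assumed, not shown)
    timestamp: float     # arrival time in seconds


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(items: List[Item], query_vec: List[float], now: float,
             half_life: float = 60.0, k: int = 2) -> List[Item]:
    """Return the k items with the highest similarity-times-recency score.

    The recency weight halves every `half_life` seconds (an illustrative
    choice, not a recommendation).
    """
    def score(item: Item) -> float:
        age = max(0.0, now - item.timestamp)
        return cosine(item.vector, query_vec) * 0.5 ** (age / half_life)

    return sorted(items, key=score, reverse=True)[:k]


items = [
    Item("old but exact", [1.0, 0.0], 0.0),
    Item("recent, close", [0.9, 0.1], 290.0),
    Item("recent, unrelated", [0.0, 1.0], 295.0),
]
top = retrieve(items, [1.0, 0.0], now=300.0, k=2)
print([item.text for item in top])  # ['recent, close', 'old but exact']
```

In this toy example, the recent near-match outranks the older exact match because of the decay term. A real system would replace the linear scan with an incrementally updated vector index.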
In addition to the classical CAP theorem trade-offs among consistency, availability, and partition tolerance, data stream systems must meet several crosscutting quality attributes: observability, to monitor model behavior, system health, and evolving data characteristics in real time; explainability, to ensure transparent decision-making where model updates are continuous and context evolves; sustainability, to optimize energy efficiency, computational resource usage, and long-term model maintainability; and learning performance, including accuracy under concept drift, adaptability to unseen patterns, and robustness against adversarial or anomalous inputs. Balancing these competing objectives is a key challenge in the design of data stream mining systems and motivates the development of new algorithms and architectures for scalable, interpretable, and resilient stream analytics.

This Topic, “Data Stream Mining and Processing”, aims to bring together novel algorithmic developments, advanced system designs, practical implementations, theoretical insights, practices, and processes that address the challenges of real-time data stream processing. We invite high-quality, original research contributions that tackle core problems in data stream mining, including pattern detection, online learning, memory-efficient processing, and dynamic model adaptation, as well as metrics, processes, frameworks, and toolchains that improve crosscutting quality attributes. Furthermore, we welcome interdisciplinary papers that apply these algorithms to domains such as cybersecurity, financial analytics, Internet of Things (IoT), smart cities, bioinformatics, healthcare, and social applications.

The objective of this Topic is to provide a comprehensive view of the state of the art in data stream processing and to foster collaboration between researchers and practitioners in algorithms, artificial intelligence, systems, and application domains. Contributions may include, but are not limited to, the following areas:

  • Core Data Stream Mining Topics
    • Algorithms for classification, clustering, and regression over data streams
    • Online learning and continual learning under concept drift
    • Sliding window models, synopsis structures, and approximation techniques
    • Stream-based ensemble methods and drift detection mechanisms
    • Real-time anomaly detection and change point analysis
    • Stream mining for graphs, time series, and multi-modal data
  • LLM-Based Reasoning and Hybrid System Design
    • Architectures for integrating data stream mining with LLMs
    • Real-time retrieval-augmented generation (RAG) using stream embeddings
    • Online embedding generation and semantic drift handling in LLM pipelines
    • Hybrid temporal-semantic retrieval models for LLM prompting
    • LLM-centric prompt filtering, summarization, and token-aware selection from streams
    • Evaluation and benchmarking of hybrid stream–LLM systems
  • Vector Database and Embedding Infrastructure
    • Incremental indexing and efficient ingestion for vector databases in stream settings
    • Embedding provenance and auditability for retrieval-based AI systems
    • Streaming-compatible algorithms and hybrid search methods
  • Crosscutting System Attributes
    • Observability and monitoring frameworks for streaming AI systems
    • Explainability and interpretability in evolving online models
    • Energy-aware and sustainable deployment of continuous learning systems
    • Robustness to adversarial inputs, label noise, and incomplete supervision
  • Applications and Case Studies
    • Real-time stream mining in domains such as cybersecurity, finance, healthcare, and smart cities
    • Streaming interfaces for conversational agents and adaptive user modeling
    • Federated or edge-based stream analytics for IoT and mobile environments
    • Low-latency AI decision-making in autonomous and mission-critical systems
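
As an illustration of the kind of single-pass, bounded-memory method covered by the core topics above, the following is a minimal sketch of a one-sided Page-Hinkley test, which flags an upward shift in the mean of a stream using constant memory. The class name and the parameter values (`delta`, `threshold`, `min_samples`) are illustrative assumptions and are not prescribed by this Topic.

```python
class PageHinkley:
    """One-sided Page-Hinkley test for an increase in a stream's mean.

    Keeps only four numbers of state, so memory use is constant
    regardless of stream length.
    """

    def __init__(self, delta=0.005, threshold=50.0, min_samples=30):
        self.delta = delta              # tolerated magnitude of change
        self.threshold = threshold      # detection threshold (lambda)
        self.min_samples = min_samples  # warm-up before any alarm
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0     # running mean of the observations
        self.cum = 0.0      # cumulative deviation from the running mean
        self.cum_min = 0.0  # minimum of the cumulative deviation so far

    def update(self, x):
        """Consume one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        drift = (self.n >= self.min_samples
                 and self.cum - self.cum_min > self.threshold)
        if drift:
            self.reset()  # start afresh on the new concept
        return drift


# Synthetic stream: the mean jumps from 0 to 5 at index 200.
stream = [0.0] * 200 + [5.0] * 200
detector = PageHinkley()
detections = [i for i, x in enumerate(stream) if detector.update(x)]
print(detections)
```

On this synthetic stream the detector raises a single alarm shortly after the change point. In practice, `delta` and `threshold` trade detection delay against false alarms and need tuning per application.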

Prof. Dr. Yan Liu
Dr. Zheng Li
Topic Editors

Keywords

  • data stream mining
  • online learning
  • large language models
  • hybrid AI systems
  • retrieval-augmented generation
  • vector databases
  • semantic drift
  • concept drift
  • observability
  • explainability
  • real-time machine learning
  • sustainable AI
  • embedding pipelines

Participating Journals

Journal Name | Impact Factor | CiteScore | Launched Year | First Decision (median) | APC
Algorithms | 2.1 | 4.5 | 2008 | 19.2 Days | CHF 1800
Applied Sciences | 2.5 | 5.5 | 2011 | 16 Days | CHF 2400
Data | 2.0 | 5.0 | 2016 | 25 Days | CHF 1600
Information | 2.9 | 6.5 | 2010 | 20.9 Days | CHF 1800
Mathematics | 2.2 | 4.6 | 2013 | 17.3 Days | CHF 2600

Preprints.org is a multidisciplinary platform offering a preprint service designed to facilitate the early sharing of your research. It supports and empowers your research journey from the very beginning.

MDPI Topics is collaborating with Preprints.org and has established a direct connection between MDPI journals and the platform. Authors are encouraged to take advantage of this opportunity by posting their preprints at Preprints.org prior to publication:

  1. Share your research immediately: Disseminate your ideas prior to publication and establish priority for your work.
  2. Safeguard your intellectual contribution: Protect your ideas with a time-stamped preprint that serves as proof of your research timeline.
  3. Boost visibility and impact: Increase the reach and influence of your research by making it accessible to a global audience.
  4. Gain early feedback: Receive valuable input and insights from peers before submitting to a journal.
  5. Ensure broad indexing: Web of Science (Preprint Citation Index), Google Scholar, Crossref, SHARE, PrePubMed, Scilit and Europe PMC.

Published Papers (1 paper)

38 pages, 12262 KB  
Article
A Reproducible FPGA–ADC Synchronization Architecture for High-Speed Data Acquisition
by Van Muoi Ngo and Thanh Dong Nguyen
Data 2026, 11(1), 23; https://doi.org/10.3390/data11010023 - 21 Jan 2026
Abstract
High-speed data acquisition systems based on field-programmable gate arrays (FPGAs) often face synchronization challenges when interfacing with commercial analog-to-digital converters (ADCs), particularly under constrained hardware routing conditions and vendor-specific clocking assumptions. This work presents a vendor-independent FPGA–ADC synchronization architecture that enables reliable and repeatable high-speed data acquisition without relying on clock-capable input resources. Clock and frame signals are internally reconstructed and phase-aligned within the FPGA using mixed-mode clock management (MMCM) and input serializer/deserializer (ISERDES) resources, enabling time-sequential phase observation without the need for parallel snapshot or delay-line structures. Rather than targeting absolute metrological limits, the proposed approach emphasizes a reproducible and transparent data acquisition methodology applicable across heterogeneous FPGA–ADC platforms, in which clock synchronization is treated as a system-level design parameter affecting digital interface timing integrity and data reproducibility. Experimental validation using a custom Kintex-7 (XC7K325T) FPGA and an AFE7225 ADC demonstrates stable synchronization at sampling rates of up to 125 MS/s, with frequency-offset tolerance determined by the phase-tracking capability of the internal MMCM-based alignment loop. Consistent signal acquisition is achieved over the 100 kHz–20 MHz frequency range. The measured interface level timing uncertainty remains below 10 ps RMS, confirming robust clock and frame alignment. Meanwhile, the observed signal-to-noise ratio (SNR) performance, exceeding 80 dB, reflects the phase–noise-limited measurement quality of the system. The proposed architecture provides a cost-effective, scalable, and reproducible solution for experimental and research-oriented FPGA-based data acquisition systems operating under practical hardware constraints.
(This article belongs to the Topic Data Stream Mining and Processing)