Adaptive Data Prefetching for File Storage Systems Using Online Machine Learning

George Savva; Herodotos Herodotou

doi:10.3390/bdcc10010028

Department of Electrical Engineering, Computer Science and Engineering, Cyprus University of Technology, Limassol 3036, Cyprus

* Author to whom correspondence should be addressed.

Big Data Cogn. Comput. 2026, 10(1), 28; https://doi.org/10.3390/bdcc10010028
(registering DOI)

Abstract

Data prefetching is essential for modern file storage systems operating in large-scale cloud and data-intensive environments, where high performance increasingly depends on intelligent, adaptive mechanisms. Traditional rule-based methods and recently proposed machine learning-based techniques often struggle to cope with the complex and rapidly evolving data access patterns characteristic of big-data workloads. In this paper, we introduce an online, streaming machine learning (SML) approach for predictive data prefetching that retrieves useful data into the cache ahead of time. We present a novel online training framework that extracts features in real time and continuously updates streaming ML models to learn and adapt from large and dynamic access streams. Building on this framework, we design new SML-driven prefetching algorithms that decide when, how, and what data to prefetch into the cache with minimal overhead. Extensive experiments using production traces from Huawei Technologies Inc. and Google workloads from the SNIA IOTTA repository demonstrate that our intelligent policies consistently deliver the highest byte hits among competing approaches, achieving 97% prefetch byte precision and reducing data access latency by up to 2.8 times. These results show that streaming ML can deliver immediate performance gains and offers a scalable foundation for future adaptive storage systems.

Keywords:

online machine learning; smart storage systems; data prefetching; Hoeffding tree; SOKNL; streaming ML
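
The online loop summarized in the abstract (predict what to prefetch, observe the next access, then update the model immediately) can be illustrated with a minimal sketch. This is not the authors' framework or their algorithms: a simple successor-count table stands in for the streaming ML models (such as Hoeffding trees or SOKNL), and the names `OnlineSuccessorModel` and `run_prefetch_stream` as well as the `(file_id, size_bytes)` trace format are hypothetical choices made for illustration only.

```python
from collections import Counter, defaultdict


class OnlineSuccessorModel:
    """Hypothetical stand-in for a streaming ML model.

    It counts which file follows which and is updated one access at a time,
    so it never needs an offline training phase.
    """

    def __init__(self):
        self.successors = defaultdict(Counter)

    def predict(self, current_file):
        """Return the most frequent successor seen so far, or None."""
        counts = self.successors.get(current_file)
        if not counts:
            return None
        return counts.most_common(1)[0][0]

    def learn(self, previous_file, next_file):
        """Incrementally record that next_file followed previous_file."""
        self.successors[previous_file][next_file] += 1


def run_prefetch_stream(accesses):
    """Replay (file_id, size_bytes) accesses in a test-then-train loop.

    Returns (prefetched_bytes, useful_bytes); their ratio is a simple
    form of prefetch byte precision.
    """
    model = OnlineSuccessorModel()
    sizes = {}            # last known size of each file (assumed known once seen)
    previous = None       # file accessed in the previous step
    predicted = None      # file prefetched after the previous step
    prefetched_bytes = 0
    useful_bytes = 0

    for file_id, size in accesses:
        # Test: did the earlier prefetch match the access that actually arrived?
        if predicted is not None and predicted == file_id:
            useful_bytes += sizes[predicted]
        # Train: update the model with the transition just observed.
        if previous is not None:
            model.learn(previous, file_id)
        sizes[file_id] = size
        # Decide what to prefetch next; skip when the model has no evidence.
        predicted = model.predict(file_id)
        if predicted is not None:
            prefetched_bytes += sizes[predicted]
        previous = file_id

    return prefetched_bytes, useful_bytes


if __name__ == "__main__":
    trace = [("a", 10), ("b", 20)] * 3
    prefetched, useful = run_prefetch_stream(trace)
    print(f"byte precision: {useful / prefetched:.2f}")
```

The sketch prefetches only when the model has seen the current file before, which is one simple way to keep wasted prefetches low; a real system would also need to account for cache capacity, eviction, and prediction overhead.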
