  • Article
  • Open Access

10 January 2026

Adaptive Data Prefetching for File Storage Systems Using Online Machine Learning

Department of Electrical Engineering, Computer Science and Engineering, Cyprus University of Technology, Limassol 3036, Cyprus
* Author to whom correspondence should be addressed.
Big Data Cogn. Comput. 2026, 10(1), 28; https://doi.org/10.3390/bdcc10010028 (registering DOI)

Abstract

Data prefetching is essential for modern file storage systems operating in large-scale cloud and data-intensive environments, where high performance increasingly depends on intelligent, adaptive mechanisms. Traditional rule-based methods and recently proposed machine learning-based techniques often struggle to cope with the complex and rapidly evolving data access patterns characteristic of big-data workloads. In this paper, we introduce an online, streaming machine learning (SML) approach for predictive data prefetching that retrieves useful data into the cache ahead of time. We present a novel online training framework that extracts features in real time and continuously updates streaming ML models to learn and adapt from large and dynamic access streams. Building on this framework, we design new SML-driven prefetching algorithms that decide when, how, and what data to prefetch into the cache with minimal overhead. Extensive experiments using production traces from Huawei Technologies Inc. and Google workloads from the SNIA IOTTA repository demonstrate that our intelligent policies consistently deliver the highest byte hits among competing approaches, achieving 97% prefetch byte precision and reducing data access latency by up to 2.8 times. These results show that streaming ML can deliver immediate performance gains and offers a scalable foundation for future adaptive storage systems.
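
To make the idea of learning online from an access stream concrete, the following is a minimal sketch and not the method proposed in the article. The abstract does not specify the features, models, or prefetching policies, so this example substitutes a simple successor-frequency predictor that is updated after every access. The names `OnlineSuccessorModel` and `replay`, the toy trace, and the file sizes are all hypothetical. Prefetch byte precision is computed here as the bytes of prefetched files that were actually requested next, divided by all prefetched bytes, which is one plausible reading of the metric.

```python
from collections import Counter, defaultdict


class OnlineSuccessorModel:
    """Hypothetical stand-in for a streaming model.

    It counts which file tends to follow which, and is updated one
    access at a time, so no offline training pass is needed.
    """

    def __init__(self):
        self.successors = defaultdict(Counter)
        self.previous = None

    def predict(self, current):
        """Return the most frequent successor of `current`, or None."""
        counts = self.successors.get(current)
        if not counts:
            return None
        return counts.most_common(1)[0][0]

    def learn(self, current):
        """Update successor counts with the newly observed access."""
        if self.previous is not None:
            self.successors[self.previous][current] += 1
        self.previous = current


def replay(trace, sizes):
    """Replay a trace, prefetching one predicted file per access.

    Returns (useful_bytes, prefetched_bytes, precision). A prefetch is
    scored only once the next access is known, so the prediction made
    after the final access is not counted.
    """
    model = OnlineSuccessorModel()
    prefetched = None
    prefetched_bytes = 0
    useful_bytes = 0
    for file_id in trace:
        # Score the prefetch issued after the previous access.
        if prefetched is not None:
            prefetched_bytes += sizes[prefetched]
            if prefetched == file_id:
                useful_bytes += sizes[prefetched]
        # Learn first, then predict what to prefetch next.
        model.learn(file_id)
        prefetched = model.predict(file_id)
    precision = useful_bytes / prefetched_bytes if prefetched_bytes else 0.0
    return useful_bytes, prefetched_bytes, precision


if __name__ == "__main__":
    sizes = {"a": 100, "b": 200, "c": 300}  # illustrative sizes in bytes
    trace = ["a", "b", "c"] * 3             # toy cyclic access pattern
    print(replay(trace, sizes))
```

On this toy cyclic trace the model issues no prefetches during the first pass, since it has not yet seen any successors, and then predicts every subsequent access correctly. Real workloads are far less regular, which is the gap the article's streaming models and feature extraction are meant to address.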
