Article

Memory-Efficient Batching for Time Series Transformer Training: A Systematic Evaluation

1 Informatics Innovation Center of Excellence, School of Informatics, Walailak University, Nakhon Si Thammarat 80160, Thailand
2 Capital One, New York, NY 10171, USA
3 IBM Research India, Bangalore 560045, India
4 IBM TJ Watson Research Center, Yorktown Heights, NY 10598, USA
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(6), 350; https://doi.org/10.3390/a18060350
Submission received: 3 May 2025 / Revised: 27 May 2025 / Accepted: 28 May 2025 / Published: 5 June 2025
(This article belongs to the Section Parallel and Distributed Algorithms)

Abstract

Transformer-based models are increasingly employed for time series analysis. However, their training remains memory intensive, especially with high-dimensional data and extended look-back windows. While model-level memory optimizations are well studied, the batch formation process remains an underexplored source of inefficiency. This paper introduces a memory-efficient batching framework based on view-based sliding windows that operate directly on GPU-resident tensors. The approach eliminates the redundant data materialization caused by tensor stacking and reduces data transfer volumes without modifying model architectures. We present two variants of our solution: (1) per-batch optimization for datasets exceeding GPU memory, and (2) dataset-wise optimization for in-memory workloads. We systematically evaluate the proposed batching framework using peak GPU memory consumption and epoch runtime as efficiency metrics across varying batch sizes, sequence lengths, feature dimensions, and model architectures. Results show consistent memory savings, averaging 90%, and runtime improvements of up to 33% across multiple transformer-based models (Informer, Autoformer, Transformer, and PatchTST) and a linear baseline (DLinear), without compromising model accuracy. We validate the method extensively on synthetic and standard real-world benchmarks, demonstrating accuracy preservation and practical scalability in distributed GPU environments. These results highlight the batch formation process as a critical component of training efficiency.
Keywords: view-based batching; time series forecasting; transformer models; distributed training
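The core idea described in the abstract is that look-back windows can be exposed as zero-copy views of a GPU-resident series tensor instead of being materialized by stacking per-window copies. The following is a minimal illustrative sketch in PyTorch, not the authors' released implementation; the function names (view_based_windows, dataset_wise_batches, per_batch_batches) and the exact handling of the two variants are assumptions based on the abstract.

import torch

def view_based_windows(series: torch.Tensor, seq_len: int, stride: int = 1) -> torch.Tensor:
    # series: (time, num_features). unfold over dim 0 yields
    # (num_windows, num_features, seq_len) as a view -- no data is copied.
    windows = series.unfold(0, seq_len, stride)
    # Rearrange to (num_windows, seq_len, num_features); transpose is also a view.
    return windows.transpose(1, 2)

def dataset_wise_batches(series_cpu: torch.Tensor, seq_len: int, batch_size: int, device: str = "cuda"):
    # Dataset-wise variant (assumed behavior): the whole series fits in GPU memory,
    # so it is transferred once and every batch is a slice of window views.
    series = series_cpu.to(device)
    windows = view_based_windows(series, seq_len)
    for start in range(0, windows.size(0), batch_size):
        yield windows[start:start + batch_size]  # batch of views, no stacking

def per_batch_batches(series_cpu: torch.Tensor, seq_len: int, batch_size: int, device: str = "cuda"):
    # Per-batch variant (assumed behavior): for series larger than GPU memory,
    # transfer only the contiguous slice covering one batch of windows.
    num_windows = series_cpu.size(0) - seq_len + 1
    for start in range(0, num_windows, batch_size):
        end = min(start + batch_size, num_windows)
        chunk = series_cpu[start:end + seq_len - 1].to(device)
        yield view_based_windows(chunk, seq_len)

Note that unfold and transpose return non-contiguous views, so a downstream operation that requires contiguous memory may still produce its own copy; the windows themselves, however, are never stacked into a separate batch tensor.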

Share and Cite

MDPI and ACS Style

Sinthong, P.; Nguyen, N.; Ekambaram, V.; Jati, A.; Kalagnanam, J.; Koad, P. Memory-Efficient Batching for Time Series Transformer Training: A Systematic Evaluation. Algorithms 2025, 18, 350. https://doi.org/10.3390/a18060350

AMA Style

Sinthong P, Nguyen N, Ekambaram V, Jati A, Kalagnanam J, Koad P. Memory-Efficient Batching for Time Series Transformer Training: A Systematic Evaluation. Algorithms. 2025; 18(6):350. https://doi.org/10.3390/a18060350

Chicago/Turabian Style

Sinthong, Phanwadee, Nam Nguyen, Vijay Ekambaram, Arindam Jati, Jayant Kalagnanam, and Peeravit Koad. 2025. "Memory-Efficient Batching for Time Series Transformer Training: A Systematic Evaluation" Algorithms 18, no. 6: 350. https://doi.org/10.3390/a18060350

APA Style

Sinthong, P., Nguyen, N., Ekambaram, V., Jati, A., Kalagnanam, J., & Koad, P. (2025). Memory-Efficient Batching for Time Series Transformer Training: A Systematic Evaluation. Algorithms, 18(6), 350. https://doi.org/10.3390/a18060350

