Effects of Window and Batch Size on Autoencoder-LSTM Models for Remaining Useful Life Prediction
Abstract
1. Introduction
2. Theoretical Background and Related Work
2.1. Deep Learning-Based RUL Prediction
2.2. Unsupervised Representation Learning and AE-LSTM Hybrids
2.3. Hyperparameter Configuration and Accuracy–Efficiency Trade-Offs
3. Materials and Methods
3.1. Dataset
- FD004: The most complex configuration, featuring six operating conditions and two fault modes (high-pressure compressor and fan degradation). It contains 249 engine units for training and 248 for testing. This subset is used to assess model robustness under more realistic operational variability [19]. The same grid of window sizes and batch sizes is applied to both FD001 and FD004 to enable a consistent comparison, without additional dataset-specific hyperparameter tuning. The detailed specifications of the FD001 and FD004 subsets are summarized in Table 1.
3.2. Data Preprocessing
- Feature selection: Among the 21 sensors, static sensors whose readings remained constant throughout all operational cycles (sensors 1, 5, 6, 10, 16, 18, and 19) were excluded, as they do not contribute to degradation estimation. In total, 17 features—14 dynamic sensors plus 3 operational settings—were retained as model inputs [4,8].
- RUL label generation and capping: For each engine instance in the training set, RUL was computed by subtracting the current cycle from its maximum operational cycle [4,10]. Following common practice in C-MAPSS studies, the maximum RUL was capped at 125 cycles, yielding a piecewise-linear target. The capped RUL labels were used for training; the official RUL labels provided with the test subsets were used for evaluation [4,10].
- Normalization: All features were normalized to the [0, 1] range using the MinMaxScaler implementation from scikit-learn (version 1.7.2, NumFOCUS, Austin, TX, USA) [24]. To avoid data leakage, the scaler was fitted exclusively on the training set and then applied to both the training and test sets [4,5,6].
- Sliding-window construction for training: Overlapping sequences of length w (stride = 1) were extracted from each time series in the training set [8,25]. Each sequence forms an input tensor X ∈ ℝ^{w×F}, with the target set to the RUL at the last time step; here, w is the window size and F is the number of input features [4].
- Test window construction: For evaluation, a single window of length w was constructed per test engine by taking the last w time steps of the corresponding time series [19,20]. If the remaining sequence length was shorter than w, the beginning of the window was padded by repeating the earliest available measurements (“edge” padding). The model thus produces one RUL prediction per engine unit, which is compared against the corresponding ground-truth RUL label [19,22].
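Under the conventions above (RUL capped at 125 cycles, stride 1, edge padding for short test series), the label and window construction can be sketched with NumPy. Function names are illustrative; the Min-Max scaling step (performed with scikit-learn's MinMaxScaler in the paper) is omitted here.

```python
import numpy as np

def cap_rul(max_cycle, cap=125):
    """Piecewise-linear RUL target: max_cycle - t for t = 1..max_cycle, clipped at `cap`."""
    t = np.arange(1, max_cycle + 1)
    return np.minimum(max_cycle - t, cap)

def make_train_windows(series, rul, w):
    """Overlapping windows (stride = 1); the target is the RUL at the last step."""
    X = np.stack([series[i:i + w] for i in range(len(series) - w + 1)])
    y = rul[w - 1:]
    return X, y

def make_test_window(series, w):
    """Last w time steps; 'edge'-pad the front if the series is shorter than w."""
    if len(series) >= w:
        return series[-w:]
    return np.pad(series, ((w - len(series), 0), (0, 0)), mode="edge")
```

For a 100-cycle engine with 17 features and w = 30, `make_train_windows` yields 71 training windows, and a 20-cycle test series is front-padded with 10 copies of its first measurement.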
3.3. Model Architecture: Autoencoder-LSTM (AE-LSTM)
- Unsupervised feature extraction (AE): The AE takes 17-dimensional sensor vectors as input and compresses them into an 8-dimensional latent representation. The encoder consists of fully connected layers with the structure 17 → 12 → 8, with ReLU activations after each linear layer. The decoder symmetrically reconstructs the input via layers 8 → 12 → 17, again with ReLU between the hidden layers. The AE is trained in an unsupervised manner using mean squared reconstruction error. For each subset (FD001 and FD004), each window size, and each random seed, a separate AE is pre-trained on all sliding windows extracted from the training units [5]. The resulting encoder weights are then used to initialize the encoder inside the AE-LSTM head for all batch sizes and encoder training modes corresponding to that subset-window-seed configuration [5].
- RUL prediction (LSTM head): For the supervised stage, the pre-trained encoder is embedded into an AE-LSTM head. For each input window, the encoder is applied to every time step, producing a sequence of 8-dimensional latent vectors. These encoded sequences are fed into a two-layer LSTM with 50 hidden units per layer and a dropout rate of 0.2 between layers to capture temporal dependencies. The hidden state at the final time step is passed through a regression head comprising two fully connected layers 50 → 25 → 1 with a ReLU activation between them to produce the RUL estimate.
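As a rough PyTorch sketch of the two-stage architecture described above (layer sizes taken from the text; class and variable names are illustrative, not the authors' code):

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """17 -> 12 -> 8, with ReLU after each linear layer."""
    def __init__(self, n_feat=17, latent=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_feat, 12), nn.ReLU(),
                                 nn.Linear(12, latent), nn.ReLU())
    def forward(self, x):
        return self.net(x)

class Autoencoder(nn.Module):
    """Unsupervised pre-training stage: reconstruct the 17-D input (MSE loss)."""
    def __init__(self, n_feat=17, latent=8):
        super().__init__()
        self.encoder = Encoder(n_feat, latent)
        self.decoder = nn.Sequential(nn.Linear(latent, 12), nn.ReLU(),
                                     nn.Linear(12, n_feat))
    def forward(self, x):
        return self.decoder(self.encoder(x))

class AELSTM(nn.Module):
    """Pre-trained encoder applied per time step, then 2-layer LSTM + regression head."""
    def __init__(self, encoder, latent=8, hidden=50):
        super().__init__()
        self.encoder = encoder
        self.lstm = nn.LSTM(latent, hidden, num_layers=2,
                            batch_first=True, dropout=0.2)
        self.head = nn.Sequential(nn.Linear(hidden, 25), nn.ReLU(),
                                  nn.Linear(25, 1))
    def forward(self, x):                    # x: (batch, window, 17)
        z = self.encoder(x)                  # Linear acts on the last dim -> (batch, window, 8)
        out, _ = self.lstm(z)                # (batch, window, hidden)
        return self.head(out[:, -1]).squeeze(-1)  # RUL from the final time step
```

In the paper's workflow, the `Autoencoder` would first be pre-trained on reconstruction, after which its `encoder` is handed to `AELSTM` for the supervised stage, either frozen or fine-tuned.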
3.4. Experimental Setup
- AE pre-training: The AE was trained using the Adam optimizer with mean squared error loss and an initial learning rate of 0.001. For each subset (FD001 and FD004), each window size, and each random seed, a separate AE was pre-trained for 50 epochs on all sliding windows extracted from the training units used for supervised learning (excluding validation units). The resulting encoder weights were reused to initialize the encoder of the AE-LSTM head for all batch sizes and encoder training modes under the same subset-window-seed configuration [5,8].
- LSTM head training with early stopping: The AE-LSTM head was trained using mean squared error loss and the Adam optimizer. The maximum number of training epochs for the LSTM head was set to 150, but early stopping based on validation loss was applied with a patience of 15 epochs [22,23,25]. The learning-rate schedule included a warm-up phase during the first five epochs, linearly increasing the effective learning rate, and the base learning rate (0.001) was scaled linearly with the batch size relative to a reference batch size of 128 (following the Linear Scaling Rule), with an upper bound of 0.001 to prevent overly large steps [22,26]. This constraint ensures training stability but implies that for very large batch sizes (e.g., 512), the learning rate is effectively capped, which may limit the optimizer’s ability to escape sharp minima compared to smaller batches. Gradient clipping with a threshold of 1.0 was applied to stabilize training. When fine-tuning the encoder, its learning rate was set to one quarter of the learning rate used for the LSTM and regression head parameters [27].
- Validation strategy and multi-seed experiments: For each subset, 20% of the training units were reserved as a validation set, ensuring that it contained at least 1024 sliding windows for stable early stopping. Every configuration of window size, batch size, and encoder mode was trained and evaluated across five random seeds [28], using the same validation split for each seed to enable fair comparisons [9]. This multi-seed protocol quantifies and mitigates the variance arising from random weight initialization and optimization stochasticity [9], so that the reported trends reflect the impact of the hyperparameters rather than artifacts of a specific random state [12].
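The learning-rate rule described above (linear scaling against a reference batch of 128, a 0.001 cap, five warm-up epochs, and a quarter-rate encoder when fine-tuning) might be expressed as follows. The exact warm-up interpolation endpoints are not fully specified in the text, so the ramp below is an assumption, and the function names are illustrative.

```python
def effective_lr(epoch, batch_size, base_lr=1e-3, ref_batch=128,
                 warmup_epochs=5, cap=1e-3):
    """Linear Scaling Rule with an upper bound, plus a linear warm-up ramp."""
    lr = min(base_lr * batch_size / ref_batch, cap)   # cap bites for batch >= 128
    if epoch < warmup_epochs:                         # epochs counted from 0
        lr *= (epoch + 1) / warmup_epochs             # assumed ramp: lr/5, 2lr/5, ..., lr
    return lr

def encoder_lr(epoch, batch_size):
    """Fine-tuned encoder uses one quarter of the head learning rate."""
    return effective_lr(epoch, batch_size) / 4
```

Note that because the cap equals the base rate, batches of 128 and above all train at 0.001 after warm-up, while smaller batches train at proportionally reduced rates.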
3.5. Evaluation Metrics
- RMSE (root mean squared error): Measures the average magnitude of the error between predicted and true RUL; lower values indicate better performance [30].
- C-MAPSS Score (asymmetric score): Penalizes late predictions more heavily than early ones, reflecting the higher risk of overestimating RUL. The per-sample penalties are summed over the test set; lower scores indicate better performance [31].
- MAE (mean absolute error): Measures the average absolute difference between predicted and true RUL; lower values indicate better performance [3].
- R² (coefficient of determination): Represents the proportion of variance in the true RUL that is explained by the predictions; values closer to 1 indicate a better fit [5].
- Average training time per epoch: Computed over the supervised AE-LSTM training stage and does not include the one-time 50-epoch AE pre-training overhead. Throughout the remainder of the paper, this quantity is used as the main indicator of training efficiency [9].
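For reference, the RMSE and the asymmetric score can be written compactly. The constants 13 and 10 below are the standard C-MAPSS scoring parameters from the PHM literature, with the usual sign convention d = predicted minus true, so late predictions have d > 0.

```python
import numpy as np

def cmapss_score(y_true, y_pred, a1=13.0, a2=10.0):
    """Asymmetric C-MAPSS score; late predictions (d > 0) are penalized more."""
    d = np.asarray(y_pred, float) - np.asarray(y_true, float)
    return float(np.sum(np.where(d < 0, np.exp(-d / a1), np.exp(d / a2)) - 1))

def rmse(y_true, y_pred):
    """Root mean squared error between predicted and true RUL."""
    d = np.asarray(y_pred, float) - np.asarray(y_true, float)
    return float(np.sqrt(np.mean(d ** 2)))
```

For an error of the same magnitude, a 10-cycle overestimate costs exp(1) − 1 ≈ 1.72 while a 10-cycle underestimate costs exp(10/13) − 1 ≈ 1.16, illustrating the asymmetry.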
4. Results
4.1. Results on FD001
4.2. Effect of Window Size on FD001
4.3. Effect of Batch Size and Training Efficiency on FD001
4.4. Frozen Versus Fine-Tuned Encoder on FD001
4.5. Results on FD004
4.6. Validation via Intelligent Optimization
4.7. Experimental Analysis
5. Discussion
5.1. Impact of Window Size and Physical Interpretation
5.2. Batch Size and Optimization Dynamics
5.3. Comparison with State-of-the-Art Models
5.4. Insights from Validation via Intelligent Optimization
5.5. Limitations and Future Directions
6. Conclusions
- Window size: approximately 40–70 cycles.
- Batch size: between 64 and 256.
- Encoder training: fine-tune the encoder together with the LSTM head.
- Window size: approximately 60–80 cycles.
- Batch size: again between 64 and 256, with batch size 128–256 forming an empirical accuracy–efficiency Pareto front.
- Encoder training: Encoder fine-tuning is especially important to adapt the latent space to heterogeneous operating conditions.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| AE | Autoencoder |
| LSTM | Long short-term memory |
| AE-LSTM | Autoencoder–long short-term memory |
| CNN | Convolutional neural network |
| RUL | Remaining useful life |
| PdM | Predictive maintenance |
| PHM | Prognostics and health management |
| C-MAPSS | Commercial Modular Aero-Propulsion System Simulation |
| RMSE | Root mean squared error |
| MAE | Mean absolute error |
| HPC | High-pressure compressor |
| IoT | Internet of Things |
| PSO | Particle Swarm Optimization |
| SA | Simulated Annealing |
References
- Fischer, D.; Moder, P.; Ehm, H. Investigation of Predictive Maintenance for Semiconductor Manufacturing and Its Impacts on the Supply Chain. In Proceedings of the 2021 22nd IEEE International Conference on Industrial Technology (ICIT), Valencia, Spain, 10–12 March 2021; Volume 1, pp. 1409–1416.
- Nunes, P.; Santos, J.; Rocha, E. Challenges in Predictive Maintenance—A Review. CIRP J. Manuf. Sci. Technol. 2023, 40, 53–67.
- Shen, L.; Wang, Y.; Du, B.; Yang, H.; Fan, H. Remaining Useful Life Prediction of Aero-Engine Based on Improved GWO and 1DCNN. Machines 2025, 13, 583.
- Jiang, L.; Zhang, X.; Cao, H.; Zhang, Y. A Transformer-Based Framework with Historical Data Fusion for RUL Prediction. Meas. Sci. Technol. 2025, 36, 106103.
- Lodygowski, T.; Szrama, S. Unsupervised Classification and Remaining Useful Life Prediction for Turbofan Engines Using Autoencoders and Gaussian Mixture Models: A Comprehensive Framework for Predictive Maintenance. Appl. Sci. 2025, 15, 7884.
- Belay, M.A.; Blakseth, S.S.; Rasheed, A.; Salvo Rossi, P. Unsupervised Anomaly Detection for IoT-Based Multivariate Time Series: Existing Solutions, Performance Analysis and Future Directions. Sensors 2023, 23, 2844.
- Li, Z.; He, Q.; Li, J. A Survey of Deep Learning-Driven Architecture for Predictive Maintenance. Eng. Appl. Artif. Intell. 2024, 133, 108285.
- Elsherif, S.M.; Hafiz, B.; Makhlouf, M.A.; Farouk, O. A Deep Learning-Based Prognostic Approach for Predicting Turbofan Engine Degradation and Remaining Useful Life. Sci. Rep. 2025, 15, 26251.
- Bouthillier, X.; Delaunay, P.; Bronzi, M.; Trofimov, A.; Nichyporuk, B.; Szeto, J.; Vincent, P. Accounting for Variance in Machine Learning Benchmarks. Proc. Mach. Learn. Syst. 2021, 3, 747–769.
- Wang, C.H.; Liu, J.Y. Integrating Feature Engineering with Deep Learning to Conduct Diagnostic and Predictive Analytics for Turbofan Engines. Math. Probl. Eng. 2022, 2022, 9930176.
- Wang, Z.; Dahouda, M.K.; Hwang, H.; Joe, I. Explanatory LSTM-AE-Based Anomaly Detection for Time Series Data in Marine Transportation. IEEE Access 2025, 13, 117308–117320.
- Keskar, N.S.; Mudigere, D.; Nocedal, J.; Smelyanskiy, M.; Tang, P.T.P. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. arXiv 2016, arXiv:1609.04836.
- Fristiana, A.H.; Alfarozi, S.A.I.; Permanasari, A.E.; Pratama, M.; Wibirama, S. A Survey on Hyperparameters Optimization of Deep Learning for Time Series Classification. IEEE Access 2024, 12, 191162–191198.
- Almeida, J.; Soares, J.; Lezama, F.; Limmer, S.; Rodemann, T.; Vale, Z. A Systematic Review of Explainability in Computational Intelligence for Optimization. Comput. Sci. Rev. 2025, 57, 100764.
- Rajwar, K.; Deep, K.; Das, S. An Exhaustive Review of the Metaheuristic Algorithms for Search and Optimization: Taxonomy, Applications, and Open Challenges. Artif. Intell. Rev. 2023, 56, 13187–13257.
- Li, G.; Jung, J.J. Deep Learning for Anomaly Detection in Multivariate Time Series: Approaches, Applications, and Challenges. Inf. Fusion 2023, 91, 93–102.
- Frederick, D.K. User’s Guide for the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) Software; NASA Technical Memorandum NASA/TM—2007-215026, 2007. Available online: https://ntrs.nasa.gov/api/citations/20070034949/downloads/20070034949.pdf (accessed on 13 October 2025).
- DeCastro, J.A.; Litt, J.S.; Frederick, D.K. A Modular Aero-Propulsion System Simulation of a Large Commercial Aircraft Engine; NASA Technical Memorandum NASA/TM—2008-215303, 2008. Available online: https://ntrs.nasa.gov/api/citations/20080043619/downloads/20080043619.pdf (accessed on 13 October 2025).
- Vollert, S.; Theissler, A. Challenges of Machine Learning-Based RUL Prognosis: A Review on NASA’s C-MAPSS Data Set. In Proceedings of the 2021 26th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Vasteras, Sweden, 7–10 September 2021; pp. 1–8.
- Mitici, M.; de Pater, I.; Barros, A.; Zeng, Z. Dynamic Predictive Maintenance for Multiple Components Using Data-Driven Probabilistic RUL Prognostics: The Case of Turbofan Engines. Reliab. Eng. Syst. Saf. 2023, 234, 109199.
- Chazhoor, A.; Mounika, Y.; Sarobin, M.V.R.; Sanjana, M.V.; Yasashvini, R. Predictive Maintenance Using Machine Learning-Based Classification Models. IOP Conf. Ser. Mater. Sci. Eng. 2020, 954, 012001.
- Hong, C.W.; Lee, C.; Lee, K.; Ko, M.S.; Kim, D.E.; Hur, K. Remaining Useful Life Prognosis for Turbofan Engine Using Explainable Deep Neural Networks with Dimensionality Reduction. Sensors 2020, 20, 6626.
- Kulanuwat, L.; Chantrapornchai, C.; Maleewong, M.; Wongchaisuwat, P.; Wimala, S.; Sarinnapakorn, K.; Boonya-Aroonnet, S. Anomaly Detection Using a Sliding Window Technique and Data Imputation with Machine Learning for Hydrological Time Series. Water 2021, 13, 1862.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Zamanzadeh Darban, Z.; Webb, G.I.; Pan, S.; Aggarwal, C.; Salehi, M. Deep Learning for Time Series Anomaly Detection: A Survey. ACM Comput. Surv. 2024, 57, 15.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. arXiv 2017.
- Ong, K.S.H.; Wang, W.; Niyato, D.; Friedrichs, T. Deep-Reinforcement-Learning-Based Predictive Maintenance Model for Effective Resource Management in Industrial IoT. IEEE Internet Things J. 2021, 9, 5173–5188.
- Zhang, C.; Song, D.; Chen, Y.; Feng, X.; Lumezanu, C.; Cheng, W.; Chawla, N.V. A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 1409–1416.
- Yıldırım, U.; Afşer, H. Linear Methods for Predictive Maintenance: The Case of NASA C-MAPSS Datasets. Appl. Sci. 2025, 15, 9945.
- Hodson, T.O. Root-Mean-Square Error (RMSE) or Mean Absolute Error (MAE): When to Use Them or Not. Geosci. Model Dev. 2022, 15, 5481–5487.
- Ramasso, E.; Saxena, A. Review and Analysis of Algorithmic Approaches Developed for Prognostics on CMAPSS Dataset. In Proceedings of the Annual Conference of the Prognostics and Health Management Society 2014, Fort Worth, TX, USA, 29 September–2 October 2014; Volume 6.
- Fan, Z.; Li, W.; Chang, K.-C. A Bidirectional Long Short-Term Memory Autoencoder Transformer for Remaining Useful Life Estimation. Mathematics 2023, 11, 4972.
- Tan, W.M.; Teo, T.H. Remaining Useful Life Prediction Using Temporal Convolution with Attention. AI 2021, 2, 48–70.
- Thakuri, S.K.; Li, H.; Ruan, D.; Wu, X. The RUL Prediction of Li-Ion Batteries Based on Adaptive LSTM. J. Dyn. Monit. Diagn. 2025, 4, 53–64.
- Leukel, J.; González, J.; Riekert, M. Machine Learning-Based Failure Prediction in Industrial Maintenance: Improving Performance by Sliding Window Selection. Int. J. Qual. Reliab. Manag. 2023, 40, 1449–1462.
- Zito, F.; Talbi, E.-G.; Cavallaro, C.; Cutello, V.; Pavone, M. Metaheuristics in Automated Machine Learning: Strategies for Optimization. Intell. Syst. Appl. 2025, 26, 200532.
- Bischl, B.; Binder, M.; Lang, M.; Pielok, T.; Richter, J.; Coors, S.; Lindauer, M. Hyperparameter Optimization: Foundations, Algorithms, Best Practices, and Open Challenges. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2023, 13, e1484.
- Rakesh, V.; Mazumdar, S.; Samanta, T.; Pal, S.; Das, A. Impact of Hyperparameter Optimization on the Accuracy of Lightweight Deep Learning Models for Real-Time Image Classification. arXiv 2025, arXiv:2507.23315.
- Gupta, M.M. Fuzzy Logic and Neural Networks. In Proceedings of the 1992 IEEE International Conference on Systems Engineering, Kobe, Japan, 17–19 September 1992; pp. 636–639.







| Dataset | Training Units | Test Units | Conditions | Fault Modes |
|---|---|---|---|---|
| FD001 | 100 | 100 | 1 | HPC * Degradation |
| FD004 | 249 | 248 | 6 | HPC * & Fan Degradation |
| Parameter | Values |
|---|---|
| Window Size | 10–100 (step = 2) |
| Batch Size | 32, 64, 128, 256, 512 |
| Optimizer | Adam |
| Learning Rate | 0.001 (base value) |
| AE Epochs | 50 |
| Max LSTM Epochs | 150 |
| Early stopping patience | 15 epochs |
| Component | Specification |
|---|---|
| OS | Linux 5.15.153.1-microsoft-standard-WSL2 |
| Docker version | 4.45.0 (pytorch:2.2.1) (Docker Inc., Palo Alto, CA, USA) |
| Python Version | 3.10.13 (Python Software Foundation, Beaverton, OR, USA) |
| PyTorch Version | 2.2.1 (Linux Foundation, San Francisco, CA, USA) |
| CUDA Version | 12.1 (NVIDIA Corp., Santa Clara, CA, USA) |
| CPU | AMD Ryzen 9 7950X (Advanced Micro Devices, Inc., Santa Clara, CA, USA) |
| GPU | NVIDIA GeForce RTX 4090 (ASUSTeK Computer Inc., Taipei, Taiwan) |
| RAM | 32 GB (Samsung Electronics Co., Ltd., Suwon, Republic of Korea) |
| Batch Size | Encoder Mode | Best Window | RMSE | C-MAPSS Score | MAE | R² | Time (s) * |
|---|---|---|---|---|---|---|---|
| 32 | finetune | 48 | 14.13 | 309.64 | 10.49 | 0.884 | 2.67 |
| 32 | frozen | 70 | 14.38 | 353.96 | 11.04 | 0.880 | 2.09 |
| 64 | finetune | 26 | 14.29 | 323.44 | 10.46 | 0.882 | 1.55 |
| 64 | frozen | 68 | 14.43 | 347.05 | 10.96 | 0.879 | 1.17 |
| 128 | finetune | 64 | 13.99 | 326.78 | 10.46 | 0.887 | 0.75 |
| 128 | frozen | 68 | 14.34 | 329.42 | 10.88 | 0.881 | 0.68 |
| 256 | finetune | 90 | 14.45 | 350.26 | 10.74 | 0.879 | 0.51 |
| 256 | frozen | 36 | 14.68 | 386.03 | 10.76 | 0.875 | 0.43 |
| 512 | finetune | 14 | 24.26 | 2280.5 | 18.63 | 0.609 | 0.28 |
| 512 | frozen | 40 | 26.18 | 3614.55 | 21.8 | 0.511 | 0.31 |
| Batch Size | Encoder Mode | Best Window | RMSE | C-MAPSS Score | MAE | R² | Time (s) * |
|---|---|---|---|---|---|---|---|
| 32 | finetune | 69 | 30.12 | 49,372.8 | 22.21 | 0.712 | 7.56 |
| 32 | frozen | 42 | 40.27 | 79,524.1 | 31.24 | 0.501 | 7.41 |
| 64 | finetune | 74 | 29.48 | 45,446.0 | 21.92 | 0.707 | 3.85 |
| 64 | frozen | 12 | 38.76 | 82,798.1 | 30.23 | 0.493 | 3.74 |
| 128 | finetune | 76 | 28.67 | 27,303.7 | 21.08 | 0.723 | 2.05 |
| 128 | frozen | 10 | 38.75 | 65,700.6 | 30.17 | 0.494 | 1.95 |
| 256 | finetune | 68 | 29.74 | 32,741.4 | 22.13 | 0.702 | 1.23 |
| 256 | frozen | 20 | 40.45 | 80,094.1 | 31.71 | 0.449 | 1.08 |
| 512 | finetune | 26 | 41.54 | 107,266.3 | 33.65 | 0.405 | 0.69 |
| 512 | frozen | 36 | 46.25 | 388,160.5 | 37.55 | 0.280 | 0.64 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Jeon, E.; Jin, D.; Kim, Y. Effects of Window and Batch Size on Autoencoder-LSTM Models for Remaining Useful Life Prediction. Machines 2026, 14, 135. https://doi.org/10.3390/machines14020135

