Attention-Enhanced CNN-LSTM Model for Exercise Oxygen Consumption Prediction with Multi-Source Temporal Features
Abstract
1. Introduction
2. Materials and Methods
2.1. Participants
2.2. Experimental Design and Data Collection
2.3. Data Preprocessing
Feature Category | Feature |
---|---|
Static Features | Weight (kg) |
Static Features | Height (m) |
Static Features | BMI (kg/m2) |
Static Features | Body fat percentage (%) |
Static Features | Resting oxygen consumption (L/min) |
Static Features | Resting heart rate (Beats/min) |
Dynamic Features | Exercise heart rate (Beats/min) |
Dynamic Features | X-axis acceleration (G) |
Dynamic Features | Y-axis acceleration (G) |
Dynamic Features | Z-axis acceleration (G) |
Dynamic Features | VM (G) |
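
Two of the listed features are derived from others in the same table. The sketch below shows how such values are commonly computed. It assumes that VM denotes the vector magnitude (Euclidean norm) of the three acceleration axes, which the feature table does not state explicitly, and that BMI is weight divided by squared height, as the unit kg/m2 implies. The function names are illustrative and do not come from the paper.

```python
import math


def vector_magnitude(x_g: float, y_g: float, z_g: float) -> float:
    """Euclidean norm of one tri-axial acceleration sample, in G.

    Assumption: VM in the feature table means vector magnitude.
    """
    return math.sqrt(x_g ** 2 + y_g ** 2 + z_g ** 2)


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m2 from weight (kg) and height (m)."""
    return weight_kg / height_m ** 2


# Hypothetical sample values, not taken from the study data.
print(vector_magnitude(3.0, 4.0, 0.0))
print(round(bmi(72.0, 1.80), 2))
```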
2.4. Model Construction
2.4.1. Attention Mechanism
- (1) Time Attention Mechanism (TAM)
  - Squeeze: In image data, spatial features (channel number C and two-dimensional spatial dimensions H × W) are compressed along the spatial dimensions into a channel description vector through global average pooling. Unlike the C × H × W structure of image data, the time series in this paper has no spatial dimension H, so only the time channel C and the spatial dimension W, composed of multiple indicators, are retained. The calculation is given in Equation (3), where FC denotes the feature matrix F on the Cth channel and FC(i) denotes its value at the ith time step.
  - Excitation: We introduce a fully connected layer with a bottleneck structure to generate channel attention weights S:
- (2) Spatial Attention Mechanism (SAM)
- (3) Spatio-temporal Attention Mechanism (STAM)
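
The squeeze and excitation steps described for the TAM can be sketched in NumPy as follows. This is a minimal illustration in the style of squeeze-and-excitation blocks, not the authors' implementation. It assumes a feature matrix of shape (C, W), a squeeze that averages over the W axis, a two-layer bottleneck with ReLU and sigmoid activations, and a reduction ratio `r`. The weights are random and untrained, and all names and sizes are hypothetical.

```python
import numpy as np


def squeeze(features: np.ndarray) -> np.ndarray:
    """Global average pooling over the indicator axis: (C, W) -> (C,)."""
    return features.mean(axis=1)


def excitation(z: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Bottleneck of two fully connected layers: ReLU, then sigmoid."""
    hidden = np.maximum(w1 @ z, 0.0)             # reduce C -> C // r, ReLU
    return 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # expand back to C, sigmoid


def time_attention(features: np.ndarray, w1: np.ndarray, w2: np.ndarray):
    """Rescale each time channel by its attention weight."""
    weights = excitation(squeeze(features), w1, w2)
    return features * weights[:, None], weights


# Illustrative sizes only: 10 time channels, 5 indicators, reduction ratio 2.
rng = np.random.default_rng(0)
C, W, r = 10, 5, 2
features = rng.normal(size=(C, W))
w1 = rng.normal(size=(C // r, C))  # untrained, random weights
w2 = rng.normal(size=(C, C // r))

weighted, weights = time_attention(features, w1, w2)
print(weighted.shape, weights.shape)
```

Because the sigmoid bounds each weight between 0 and 1, the block can only attenuate or pass a time channel, never amplify it. In a trained model the two weight matrices would be learned jointly with the rest of the network.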
2.4.2. VO2 Prediction Model
2.5. Model Evaluation Indicators
- (1) Root Mean Square Error (RMSE)
- (2) Mean Absolute Error (MAE)
- (3) Coefficient of Determination (R2)
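
The three evaluation indicators listed above follow their standard definitions, which can be sketched in NumPy as shown here. The function names and the sample values are illustrative and are not taken from the study data.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root mean square error: sqrt(mean((y - y_hat)^2))."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred) -> float:
    """Mean absolute error: mean(|y - y_hat|)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r2(y_true, y_pred) -> float:
    """R2: 1 - SS_res / SS_tot, with SS_tot taken about the mean of y."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical measured and predicted values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

RMSE penalizes large errors more heavily than MAE, so RMSE is never smaller than MAE on the same predictions. A wide gap between the two suggests that a few samples dominate the error.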
3. Results
3.1. Sequential Dynamic Characteristics
3.2. Construction of VO2 Prediction Model Based on Dynamic-Static Feature Fusion
3.3. VO2 Prediction Model with Integrated Attention Mechanism
3.4. VO2 Prediction Performance Across Exercise Intensity Zones
4. Discussion
4.1. Enhanced VO2 Prediction Using Accelerometer–Heart Rate Fusion
4.2. The Key Role of CNN in Predicting Oxygen Uptake
4.3. The Impact of the Attention Mechanism on Predicting Oxygen Uptake
4.4. Increased Error in VO2 Prediction Model During High-Intensity Exercise Phases
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
VO2 | Oxygen Uptake |
HR | Heart Rate |
ACC | Acceleration |
CPET | Cardiopulmonary Exercise Testing |
BMI | Body Mass Index |
LSTM | Long Short-Term Memory |
CNN | Convolutional Neural Network |
SAM | Spatial Attention Mechanism |
TAM | Time Attention Mechanism |
STAM | Spatio-temporal Attention Mechanism |
CLSA | CNN-SAM-LSTM model |
CLTA | CNN-TAM-LSTM model |
CLSTA | CNN-STAM-LSTM model |
RMSE | Root Mean Square Error |
MAE | Mean Absolute Error |
Characteristic | Male (n = 14) | Female (n = 7) |
---|---|---|
Age (years) | 24 ± 3 | 25 ± 3 |
Height (cm) | 176 ± 8 | 163 ± 5 |
Weight (kg) | 70.6 ± 13.3 | 52.3 ± 6 |
BMI (kg/m2) | 22.8 ± 3 | 19.8 ± 1.5 |
Body fat percentage (%) | 16.2 ± 5.8 | 23.7 ± 3.4 |
Parameter | Dynamic Feature Input Layer | Static Feature Input Layer | CNN | SAM | TAM | LSTM | Output Layer |
---|---|---|---|---|---|---|---|
Number of neurons | 10 × 5 | 6 | 64 | 64 | 64 | 128 | 1 |
Activation functions | N/A | N/A | ReLU | Sigmoid | Sigmoid | Tanh | Linear |
Feature | Model | Train RMSE | Train MAE | Train R2 | Validation RMSE | Validation MAE | Validation R2 | Test RMSE | Test MAE | Test R2 |
---|---|---|---|---|---|---|---|---|---|---|
HR + Static Features | LSTM | 0.0851 | 0.0626 | 0.9918 | 0.2006 | 0.1342 | 0.9536 | 0.3335 | 0.2305 | 0.8882 |
HR + Static Features | CNN-LSTM | 0.0306 | 0.0224 | 0.9981 | 0.2095 | 0.1477 | 0.9499 | 0.3232 | 0.2950 | 0.8950 |
HR + Acc Data + Static Features | LSTM | 0.0892 | 0.0649 | 0.9908 | 0.1031 | 0.0764 | 0.9871 | 0.2720 | 0.2078 | 0.9256 |
HR + Acc Data + Static Features | CNN-LSTM | 0.0044 | 0.0035 | 1.0000 | 0.0504 | 0.0302 | 0.9971 | 0.2317 | 0.1566 | 0.9460 |
Model | Train RMSE | Train MAE | Train R2 | Validation RMSE | Validation MAE | Validation R2 | Test RMSE | Test MAE | Test R2 |
---|---|---|---|---|---|---|---|---|---|
CLSA | 0.005 | 0.0038 | 1.0000 | 0.0517 | 0.0290 | 0.9968 | 0.1942 | 0.1241 | 0.9621 |
CLTA | 0.0051 | 0.0040 | 1.0000 | 0.0519 | 0.0285 | 0.9968 | 0.2648 | 0.1881 | 0.9295 |
CLSTA | 0.0041 | 0.0031 | 1.0000 | 0.0609 | 0.0304 | 0.9955 | 0.2030 | 0.1279 | 0.9586 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, Z.; Song, Y.; Pang, L.; Li, S.; Sun, G. Attention-Enhanced CNN-LSTM Model for Exercise Oxygen Consumption Prediction with Multi-Source Temporal Features. Sensors 2025, 25, 4062. https://doi.org/10.3390/s25134062