Open Access Article
Attention-Enhanced CNN-LSTM Model for Exercise Oxygen Consumption Prediction with Multi-Source Temporal Features
by Zhen Wang, Yingzhe Song, Lei Pang *, Shanjun Li * and Gang Sun
Institute of Artificial Intelligence in Sports, Capital University of Physical Education and Sports, Beijing 100191, China
* Authors to whom correspondence should be addressed.
Sensors 2025, 25(13), 4062; https://doi.org/10.3390/s25134062 (registering DOI)
Submission received: 10 June 2025 / Revised: 26 June 2025 / Accepted: 26 June 2025 / Published: 29 June 2025
Abstract
Dynamic oxygen uptake (VO₂) reflects moment-to-moment changes in oxygen consumption during exercise and underpins training design, performance enhancement, and clinical decision-making. We tackled two key obstacles, the limited fusion of heterogeneous sensor data and the inadequate modeling of long-range temporal patterns, by integrating wearable accelerometer and heart-rate streams with a convolutional neural network–LSTM (CNN-LSTM) architecture and optional attention modules. Physiological signals and VO₂ were recorded from 21 adults during a resting assessment and cardiopulmonary exercise testing. The results showed that pairing accelerometer with heart-rate inputs improved prediction compared with using heart rate alone. The baseline CNN-LSTM reached R² = 0.946, outperforming a plain LSTM (R² = 0.926) thanks to stronger local spatio-temporal feature extraction. Introducing a spatial attention mechanism raised accuracy further (R² = 0.962), whereas temporal attention reduced it (R² = 0.930), indicating that the benefit of attention depends on how well the attended features align with exercise dynamics. Stacking both attention mechanisms (spatio-temporal) yielded R² = 0.960, slightly below the value for spatial attention alone, implying that added complexity does not guarantee better performance. Across all models, prediction errors grew during high-intensity bouts, highlighting a bottleneck in capturing non-linear physiological responses under heavy load. These findings inform architecture selection for wearable metabolic monitoring and clarify when attention mechanisms add value.
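For readers comparing the reported scores, the following is a minimal sketch of how a coefficient of determination (R²) can be computed, assuming the standard definition 1 − SS_res / SS_tot. The abstract does not state the exact evaluation procedure, so the function name `r_squared` and the sample values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (standard definition)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variance of the measurements around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted VO2 values, for illustration only.
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 31.0, 39.0]
score = r_squared(measured, predicted)
print(f"R^2 = {score:.3f}")
```

Under this definition, a score of 1 means the predictions match the measurements exactly, and a higher score means a smaller share of the measured variance is left unexplained, which is the sense in which the models above are ranked.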