Activity-Independent Estimation of VO₂max from Short-Duration Multimodal Wearable Signals

Saldaña-Aristizábal, Laura; Rivas-Caicedo, Jhonathan L.; Niño-Tejada, Kevin; Patarroyo-Montenegro, Juan F.

doi:10.3390/electronics15132843

Open Access Article


by Laura Saldaña-Aristizábal ¹, Jhonathan L. Rivas-Caicedo ¹, Kevin Niño-Tejada ¹ and Juan F. Patarroyo-Montenegro ²,*

¹ Department of Electrical and Computer Engineering, University of Puerto Rico, Mayagüez, PR 00680, USA
² Department of Computer Science and Engineering, University of Puerto Rico, Mayagüez, PR 00680, USA
* Author to whom correspondence should be addressed.

Electronics 2026, 15(13), 2843; https://doi.org/10.3390/electronics15132843

Submission received: 11 May 2026 / Revised: 17 June 2026 / Accepted: 26 June 2026 / Published: 30 June 2026

(This article belongs to the Special Issue Ubiquitous Computing and Mobile Computing)


Abstract

Cardiorespiratory fitness is a key indicator of overall health, yet its assessment still largely depends on structured protocols such as cardiopulmonary exercise testing (CPET), which require specialized equipment, trained personnel, and controlled laboratory conditions that limit accessibility. Wearable sensing technologies offer a practical alternative by continuously capturing physiological and biomechanical signals during daily life. However, most wearable-based approaches remain constrained by activity-specific modeling, structured exercise protocols, or prolonged monitoring periods, limiting generalization across real-world behaviors. This work presents an activity-independent machine learning framework for estimating VO₂max from short-duration multimodal wearable signals acquired during semi-structured real-world daily activities. The proposed two-stage framework first estimates the metabolic equivalent of task (MET) as a continuous representation of activity intensity, then integrates this estimate with physiological, biomechanical, and demographic features to predict subject-level VO₂max. By decoupling physiological demand from explicit activity labels, the framework improves robustness to unseen activities while preserving physiological interpretability. Evaluation under the Leave-One-Subject-Out validation protocol demonstrates that short-duration wearable-derived signals encode meaningful information related to inter-subject differences in cardiorespiratory fitness. These findings support the feasibility of activity-independent, wearable-based fitness estimation and provide a practical foundation for scalable preventive health monitoring in everyday life.

Keywords: cardiorespiratory fitness; VO₂max estimation; wearable sensors; multimodal sensing; machine learning; digital health

1. Introduction

Preventive healthcare increasingly relies on the ability to monitor physiological status continuously and non-invasively outside clinical environments [1,2]. Among health indicators associated with long-term cardiovascular and metabolic well-being, cardiorespiratory fitness (CRF) is one of the most informative, as reduced CRF has been consistently linked to elevated cardiovascular risk, diminished functional capacity, and increased all-cause mortality [3,4]. Consequently, reliable assessment of CRF is valuable not only for athletic performance evaluation, but also for early detection of health deterioration and long-term wellness monitoring in the general population [5].

The most widely accepted indicator of CRF is maximal oxygen uptake, VO₂max, defined as the maximum rate at which oxygen can be utilized during intense exercise [3,6]. Clinically, VO₂max is commonly measured through CPET, a supervised laboratory procedure requiring specialized equipment, trained personnel, controlled testing conditions, and substantial physical effort from participants. Although highly accurate, these requirements limit large-scale deployment and make repeated assessment impractical for many populations, particularly older adults or individuals with reduced exercise tolerance [7].

Recent advances in wearable sensing technologies have created new opportunities for accessible cardiorespiratory monitoring. Modern wearable devices can continuously acquire physiological and biomechanical signals such as heart rate, blood oxygen saturation (SpO₂), acceleration, angular velocity, and body orientation during normal daily life [8]. Recent developments in wearable sensor design have also focused on creating more flexible, lightweight, and unobtrusive sensing platforms that improve user comfort and facilitate continuous physiological monitoring, further expanding the capabilities of wearable health technologies [9,10]. By enabling the integration of physiological and movement-derived information outside controlled laboratory settings, wearable sensing opens the door to more practical and scalable approaches for estimating cardiorespiratory fitness, as conceptually illustrated in Figure 1, which contrasts the resource-intensive nature of conventional CPET with a wearable-based framework capable of estimating VO₂max from short-duration daily life measurements.

Despite this promise, wearable-based CRF estimation remains limited by three major challenges: many methods require prolonged monitoring over days or weeks [11,12]; others depend on structured exercise protocols or predefined movement sequences [13,14,15]; and some remain strongly activity-dependent, requiring either specific exercises or explicit activity recognition labels [16,17]. In addition, validation is often performed on relatively narrow cohorts, limiting confidence in generalization across heterogeneous populations [18,19,20].

This work asks a central question: can cardiorespiratory fitness be estimated from less than one hour of wearable signals collected during semi-structured daily activities, without exercise testing, structured protocols, or explicit activity recognition?

To address this question, this work proposes an activity-independent framework for estimating VO₂max from short-duration wearable-derived physiological and biomechanical signals collected during everyday behavior. The ground-truth VO₂max values were obtained using the Queens College Step Test, an established and validated submaximal exercise protocol commonly used for indirect, population-level assessment of cardiorespiratory fitness [21,22,23].

Rather than modeling discrete activity categories, the proposed approach learns generalized relationships between movement intensity, physiological response, and subject-level cardiorespiratory capacity.

The framework adopts a two-stage strategy. In the first stage, a regression model estimates MET, a standardized physiological measure that quantifies energy expenditure relative to resting metabolic rate and has been extensively adopted in exercise physiology, epidemiological studies, and wearable sensing research as a practical and physiologically meaningful representation of physical activity intensity [24,25,26,27].

Although MET values derived from standardized compendia represent population-level estimates rather than individualized measurements, they provide a validated framework for characterizing the energetic demands of daily activities and have been widely used to standardize activity intensity across diverse research settings [28,29,30]. By expressing physical effort on a continuous physiological scale rather than through discrete activity categories, MET provides an interpretable intermediate representation that enables the proposed framework to model generalized relationships between movement intensity, physiological response, and cardiorespiratory fitness independently of the specific activity being performed.

The estimated MET values are subsequently used as a continuous representation of activity intensity. In the second stage, this intensity representation is combined with physiological biomarkers, movement descriptors, and demographic information to predict subject-level VO₂max. By decoupling physiological demand from activity semantics, the proposed framework aims to improve robustness to unseen activities while preserving physiological interpretability.

The main contributions of this work are summarized as follows:

An activity-independent framework for wearable-based VO₂max estimation that reduces reliance on explicit activity recognition and structured exercise protocols.
A two-stage modeling strategy that first estimates movement intensity through MET regression and subsequently integrates intensity, physiological signals, and demographic descriptors for subject-level cardiorespiratory fitness prediction.
Validation on a heterogeneous participant cohort spanning variations in sex, age, and fitness level, supporting the generalization capacity of the proposed approach under realistic daily life conditions.
A scalable alternative to conventional laboratory-based fitness assessment that may enable more accessible and frequent cardiorespiratory monitoring in non-clinical settings.

The remainder of this paper is organized as follows. Section 2 describes the study protocol, wearable sensing setup, preprocessing procedures, feature engineering pipeline, and machine learning models used for MET and VO₂max estimation. Section 3 presents the experimental results. Section 4 discusses the implications, limitations, and future research directions of the proposed framework. Finally, Section 5 concludes the paper.

2. Materials and Methods

This section describes the experimental protocol, data collection procedures, wearable setup, preprocessing steps, and artificial intelligence architectures used in the study. It outlines how participants were recruited and monitored using wearable devices during a structured protocol involving real-world physical activities. Two separate pipelines were established: one for MET regression and another for estimating the VO₂max health indicator.

An overview of the proposed framework is presented in Figure 2. The methodology follows a two-stage modeling pipeline for wearable-based cardiorespiratory fitness estimation. In the first stage, synchronized movement signals from IMU sensors are segmented into short-duration windows to estimate continuous MET values, providing a representation of physical intensity over time. The resulting MET estimates are then evaluated using a stability criterion based on the accumulated stability duration, denoted by $s_i$, which represents the cumulative duration (in seconds) over which consecutive MET estimates satisfy the stability condition. Only periods satisfying $s_i \geq 60$ s are selected for further analysis. For each valid stable segment, the corresponding movement, physiological, and demographic signals are organized into 1-min windows from which movement-derived, physiological, intensity-related, and demographic features are extracted. The resulting feature representations are subsequently used in the second stage to estimate subject-level VO₂max through a regression model.

2.1. Study Design and Data Collection Protocol

This study was approved by the Institutional Review Board (IRB) of the University of Puerto Rico and conducted in accordance with the ethical standards of the Collaborative Institutional Training Initiative (CITI) Program. All participants provided written informed consent prior to participation and completed a Physical Activity Readiness Questionnaire (PAR-Q) [31] to ensure safe involvement in the protocol.

A total of 67 individuals were recruited, of whom 60 were included in the final analysis after excluding incomplete or low-quality recordings. Demographic information, including age, sex, height, weight, body fat percentage, and body mass index (BMI), was collected for each participant and later incorporated as input features in the proposed models. Further details on the dataset collection methodology and sensing framework are available in our previously published work [32].

Figure 3 illustrates the distribution of demographic and fitness characteristics across the participant pool, highlighting the heterogeneous nature of the recruited cohort. Recruitment was designed to encourage broad participation, with eligibility criteria kept wide and limited primarily to adulthood and safe participation requirements. As expected in a general population sample, most participants were concentrated within the average fitness range, while fewer individuals were observed at the lower and higher ends of the spectrum. This distribution reflects realistic population variability, although it introduces additional challenges for modeling extreme physiological responses.

Data collection was conducted in a controlled indoor environment to reduce external variability. As a reference measure of cardiorespiratory fitness, each participant completed the Queens College Step Test, a standardized submaximal exercise protocol from which VO₂max was estimated using sex-specific predictive equations [22]. The test consisted of stepping at a fixed cadence (24 steps/min for men, 22 steps/min for women) for three minutes, followed by manual heart rate measurement.
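For orientation, the reference computation can be sketched with the sex-specific equations most commonly cited for the Queens College Step Test (McArdle et al.). The text does not reproduce the equations from [22], so the coefficients below, the function name `queens_college_vo2max`, and the interpretation of the input as the post-exercise recovery heart rate in beats per minute are assumptions made for illustration, not a confirmed description of the study's computation.

```python
def queens_college_vo2max(recovery_hr_bpm: float, sex: str) -> float:
    """Estimate VO2max (mL/kg/min) from post-test recovery heart rate.

    Coefficients are the commonly cited Queens College Step Test
    equations; assumed here, since the paper does not list them.
    """
    key = sex.strip().lower()
    if key in ("m", "male"):
        return 111.33 - 0.42 * recovery_hr_bpm
    if key in ("f", "female"):
        return 65.81 - 0.1847 * recovery_hr_bpm
    raise ValueError("sex must be 'male' or 'female'")


# Hypothetical participants, for illustration only.
print(round(queens_college_vo2max(150, "male"), 2))
print(round(queens_college_vo2max(160, "female"), 2))
```

Because both equations are linear in heart rate, any error in the manual pulse count propagates directly into the reference VO₂max label.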

Following the step test, participants completed a structured activity protocol designed to capture a broad range of physical intensities representative of daily living. The protocol was developed in accordance with American National Standards Institute/Consumer Technology Association (ANSI/CTA) guidelines for real-world wearable evaluation [33,34], ensuring standardized activity execution and controlled intensity levels while preserving ecological relevance to natural daily conditions. This balance between protocol consistency and real-world representativeness is essential for developing models that generalize reliably in wearable-based health monitoring applications. The protocol lasted approximately 37 min and included alternating periods of rest, light, moderate, and vigorous activities defined according to MET-based categorizations [35]. The activity transitions were guided by an automated timing system to ensure consistent execution among the participants. The sequence of activities, their duration, and the corresponding MET values, illustrated in Figure 4, correspond to the standardized protocol performed by all participants during data acquisition and are part of the multimodal dataset previously presented in [32].

This protocol enabled the collection of synchronized motion and physiological signals across a wide range of controlled yet representative activity intensities.

2.2. Wearable Sensor Configuration

To capture both biomechanical and physiological signals, participants were instrumented with a combination of inertial and biomarker sensing devices, as illustrated in Figure 5.

Motion data were collected using five MetaMotionRL Inertial Measurement Units (IMUs) (MbientLab Inc., San Jose, CA, USA), placed on the chest, left hand, right hand, left knee, and right knee. The sensors operated at a sampling frequency of 50 Hz and recorded a combination of tri-axial acceleration, gyroscope, and quaternion signals depending on their placement. The chest sensor provided the most comprehensive set of measurements, serving as a central reference for whole-body motion [36], while limb-mounted sensors captured complementary upper and lower extremity dynamics.

This configuration was designed to provide balanced coverage of both linear and rotational movement while minimizing sensor burden and preserving user comfort, supporting potential real-world deployment scenarios.

Physiological signals were acquired using two wrist-worn devices: a Garmin Venu 3 smartwatch (Garmin Ltd., Olathe, KS, USA) and a CheckMe™ oximeter (Viatom Technology Co., Ltd., Shenzhen, China), worn on opposite wrists. Both devices recorded heart rate (HR) and peripheral SpO₂ at a sampling frequency of 0.5 Hz, providing redundant measurements to improve robustness under motion conditions.

2.3. Signal Preprocessing

The raw signals collected from IMUs and biomarker devices were processed through a two-stage pipeline consisting of temporal synchronization and signal-specific filtering, corresponding to the preprocessing stage illustrated in Figure 2.

To ensure consistency across heterogeneous data sources, all signals were first aligned in time. Timestamps from both IMU and biomarker streams were parsed and converted into a unified high-resolution datetime representation using an ISO-like format (YYYY-MM-DD HH:MM:SS.ssssss), enabling consistent temporal alignment across modalities.

For each subject, a common temporal interval was defined as:

$$t_{\mathrm{start}} = \max\left(t_{\mathrm{IMU}_{\mathrm{start}}},\, t_{\mathrm{BIO}_{\mathrm{start}}}\right), \qquad t_{\mathrm{end}} = \min\left(t_{\mathrm{IMU}_{\mathrm{end}}},\, t_{\mathrm{BIO}_{\mathrm{end}}}\right) \tag{1}$$

where $t_{\mathrm{IMU}_{\mathrm{start}}}$ and $t_{\mathrm{IMU}_{\mathrm{end}}}$ denote the initial and final timestamps of the IMU recordings, respectively; $t_{\mathrm{BIO}_{\mathrm{start}}}$ and $t_{\mathrm{BIO}_{\mathrm{end}}}$ denote the initial and final timestamps of the biomarker recordings, respectively; and $t_{\mathrm{start}}$ and $t_{\mathrm{end}}$ represent the beginning and end of the common temporal interval shared by both sensing modalities.

All signals were then trimmed to this shared interval defined by Equation (1), ensuring strict temporal alignment between motion and physiological measurements.
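The synchronization step of Equation (1) can be illustrated with a minimal sketch using only the Python standard library. The helper names (`parse_ts`, `common_interval`, `trim`) and the example timestamps are hypothetical, and the sketch assumes each stream's timestamps are already sorted in ascending order.

```python
from datetime import datetime

# Timestamp layout described in the text: YYYY-MM-DD HH:MM:SS.ssssss
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_ts(text):
    """Parse one timestamp string into a datetime object."""
    return datetime.strptime(text, TS_FORMAT)


def common_interval(imu_times, bio_times):
    """Equation (1): latest start and earliest end of the two streams."""
    t_start = max(imu_times[0], bio_times[0])
    t_end = min(imu_times[-1], bio_times[-1])
    if t_start > t_end:
        raise ValueError("streams do not overlap in time")
    return t_start, t_end


def trim(times, values, t_start, t_end):
    """Keep only samples whose timestamp lies inside [t_start, t_end]."""
    return [(t, v) for t, v in zip(times, values) if t_start <= t <= t_end]


# Hypothetical recordings, for illustration only.
imu_t = [parse_ts(s) for s in ("2025-01-01 10:00:00.000000",
                               "2025-01-01 10:00:01.000000",
                               "2025-01-01 10:00:05.000000")]
bio_t = [parse_ts(s) for s in ("2025-01-01 10:00:00.500000",
                               "2025-01-01 10:00:02.500000",
                               "2025-01-01 10:00:04.500000")]
t0, t1 = common_interval(imu_t, bio_t)
imu_trimmed = trim(imu_t, [0.1, 0.2, 0.3], t0, t1)
```

In this toy case the IMU stream starts earlier and ends later than the biomarker stream, so only the IMU sample that falls inside the biomarker recording span is retained.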

Following synchronization, signal-specific preprocessing was applied to account for the distinct characteristics of motion and biomarker data.

Biomarker signals, including HR and SpO₂, were sampled at a low frequency of 0.5 Hz. At this sampling rate, conventional filtering provides limited benefit and may distort the physiological signal [37]. Therefore, biomarker signals were left unfiltered and unnormalized, preserving their original scale and temporal dynamics.

In contrast, IMU signals were filtered to remove noise and irrelevant frequency components. Each channel was processed using a sequential Butterworth filtering approach [38]:

A high-pass filter with a cutoff frequency of 0.1 Hz to remove DC components and low-frequency drift.
A low-pass filter with a cutoff frequency of 10 Hz to suppress high-frequency noise.

Both filters were applied using zero-phase filtering to prevent phase distortion. After filtering, each signal was normalized using Z-score normalization [39]:

$$x_{\mathrm{norm}} = \frac{x - \mu}{\sigma} \tag{2}$$

where $x_{\mathrm{norm}}$ denotes the normalized sample value, $x$ is the original IMU sample, and $\mu$ and $\sigma$ correspond to the mean and standard deviation, respectively, computed for the corresponding IMU channel of each subject.

Normalization defined by Equation (2) was performed independently for each IMU channel on a per-subject basis, preventing information leakage across subjects while preserving the temporal structure of each recording.
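A minimal sketch of this IMU preprocessing chain is given below, assuming NumPy and SciPy are available. The cutoff frequencies and sampling rate come from the text; the filter order (4), the second-order-section implementation, and the function names `bandlimit` and `zscore` are assumptions, since the paper does not state them.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 50.0  # IMU sampling frequency in Hz (from the text)


def bandlimit(x, fs=FS, hp_cut=0.1, lp_cut=10.0, order=4):
    """Sequential zero-phase Butterworth high-pass then low-pass.

    The filter order is an assumption; the paper does not report it.
    """
    hp = butter(order, hp_cut, btype="highpass", fs=fs, output="sos")
    lp = butter(order, lp_cut, btype="lowpass", fs=fs, output="sos")
    y = sosfiltfilt(hp, x)      # removes DC offset and slow drift
    return sosfiltfilt(lp, y)   # suppresses high-frequency noise


def zscore(x):
    """Equation (2), applied to one channel of one subject."""
    sigma = x.std()
    if sigma == 0:
        return np.zeros_like(x)
    return (x - x.mean()) / sigma


# Synthetic channel: offset + 1 Hz motion + 20 Hz noise (illustrative).
t = np.arange(3000) / FS
raw = 2.0 + np.sin(2 * np.pi * 1.0 * t) + 0.3 * np.sin(2 * np.pi * 20.0 * t)
clean = zscore(bandlimit(raw))
```

Computing the mean and standard deviation from a single subject's own channel, as here, means no statistic from one participant is used to scale another participant's data.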

This preprocessing strategy reflects the complementary roles of the signals: IMU data require noise reduction and normalization, whereas biomarker signals are preserved in their original form to maintain physiological interpretability.

2.4. Stage 1: Intensity Representation for Activity-Independent Modeling

To enable activity-independent modeling, an intermediate representation of movement intensity was introduced based on MET. Unlike discrete activity labels, which may not generalize to unseen activities, MET provides a continuous measure of physiological effort. In this framework, MET serves as an intermediate variable that captures the intensity of movement and provides contextual information to the downstream model. This stage corresponds to the MET estimation block shown in Figure 2.

This representation allows the model to interpret signals in terms of energetic demand rather than activity type. By decoupling intensity from activity semantics, the proposed approach supports the learning of generalized relationships between movement, physiological response, and cardiorespiratory outcomes, enabling the model to operate even when the user performs activities that were not included in the training dataset.

As illustrated in greater detail in Figure 6, Stage 1 transforms wearable motion signals into a continuous MET representation through a sequence of preprocessing, segmentation, feature extraction, and regression steps. Movement signals acquired from wearable IMUs are first synchronized and filtered, after which non-overlapping 5-s windows are generated and used to extract representative motion descriptors characterizing movement intensity and temporal dynamics. These features are then processed by an MLP regression model to estimate continuous MET values over time. The predicted MET signal is subsequently smoothed using median and moving average filters to improve temporal consistency and is then evaluated using a stability criterion based on the accumulated stable duration, denoted by $s_i$, which represents the amount of consecutive time during which the MET signal remains stable. If the stability condition $s_i \geq 60$ s is not satisfied, the framework continues processing subsequent 5-s windows and reevaluates the stability criterion. Once the condition is met, the corresponding stable segment is transferred as the output of Stage 1 and forwarded to Stage 2 of the framework. This stability filtering step ensures that the downstream VO₂max model primarily receives windows corresponding to sustained activity conditions, reducing the influence of transitional periods that may introduce variability in physiological response patterns.

2.4.1. Feature Engineering for MET Regression

Prior to feature extraction, the synchronized signals were segmented into fixed-length windows to enable localized analysis of movement patterns. To capture the short-term dynamics of human movement, IMU signals were segmented into non-overlapping windows of 5 s. At a sampling rate of 50 Hz, each window contains 250 samples per channel, which provides sufficient temporal resolution to characterize variations in motion intensity while remaining responsive to rapid changes in activity.

Within each window, movement intensity was represented through features derived from both accelerometer and gyroscope signals. Rather than relying on raw axes, tri-axial measurements were transformed into magnitude signals, enabling an orientation-invariant description of motion and focusing the representation on the overall level of physical effort.

Feature extraction was designed to capture complementary aspects of movement intensity [40]. Measures such as root mean square, mean absolute value, and signal power quantify the overall energy of the motion, while peak-related descriptors (e.g., maximum value and peak-to-peak range) capture bursts of activity. Statistical features, including variability and dispersion, reflect the consistency or irregularity of movement patterns.

To further capture how intensity evolves within each window, the signal was partitioned into four temporal segments, and the energy of each segment was computed. This provides a compact representation of intra-window dynamics, allowing the model to distinguish between steady motion and transient changes in effort.

Together, these features provide a structured representation of movement intensity that links raw motion signals to their underlying energetic demand, forming the basis for MET estimation.
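The window-level descriptors described in this subsection can be sketched as follows. This is an illustrative subset only: the exact feature list, the feature names, and the way the four temporal segments are delimited are assumptions, and the function names `magnitude` and `intensity_features` are hypothetical.

```python
import math
from statistics import pstdev


def magnitude(ax, ay, az):
    """Orientation-invariant magnitude of a tri-axial signal."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]


def intensity_features(mag, n_segments=4):
    """Energy, peak, dispersion and per-segment energy for one window."""
    n = len(mag)
    power = sum(v * v for v in mag) / n
    features = {
        "rms": math.sqrt(power),
        "mean_abs": sum(abs(v) for v in mag) / n,
        "power": power,
        "max": max(mag),
        "peak_to_peak": max(mag) - min(mag),
        "std": pstdev(mag),
    }
    # Split the window into n_segments contiguous parts; integer
    # boundaries spread any remainder across segments (assumed scheme).
    for k in range(n_segments):
        lo, hi = k * n // n_segments, (k + 1) * n // n_segments
        features[f"seg{k}_energy"] = sum(v * v for v in mag[lo:hi])
    return features


# One 5-s window at 50 Hz = 250 samples (constant toy signal).
window = magnitude([3.0] * 250, [4.0] * 250, [0.0] * 250)
feats = intensity_features(window)
```

For steady motion the four segment energies are nearly equal, whereas a burst of effort concentrates energy in one segment, which is the intra-window contrast the text describes.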

2.4.2. MET-Based Intensity Estimation

To obtain a continuous representation of movement intensity, a regression model was trained to estimate the MET from the structured intensity features extracted from 5-s IMU windows.

The proposed model is a fully connected feedforward neural network designed to learn the nonlinear relationship between wearable motion descriptors and their associated energetic cost. The architecture consists of two hidden layers with GELU activation functions and dropout regularization, followed by a linear output layer that produces a scalar MET estimate for each input window. Ground-truth MET labels were assigned according to the experimental protocol described in Figure 4, allowing the model to learn a direct mapping between movement-derived features and physiological intensity.

This formulation enables movement intensity to be represented as a continuous variable rather than as a discrete activity label. Consequently, transitions between effort levels can be naturally captured over time, providing a smooth and physiologically meaningful description of motion intensity that can be integrated into the downstream VO₂max prediction pipeline.

2.5. Integration into the Prediction Pipeline

The estimated MET values were incorporated into the prediction pipeline as a continuous representation of movement intensity. Unlike discrete activity labels, this signal provides a direct measure of physiological effort over time.

To improve robustness, the predicted MET signal was refined using a median filter followed by a moving average filter, reducing outliers and capturing the underlying intensity trend [41].
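The two-step smoothing can be sketched as below. The kernel lengths (5 and 3 predictions) and the shrinking-window treatment of the signal edges are assumptions, as the text does not specify them; `median_filter` and `moving_average` are hypothetical helper names.

```python
from statistics import median


def median_filter(x, k=5):
    """Centered running median; the window shrinks at the edges."""
    half = k // 2
    return [median(x[max(0, i - half): i + half + 1]) for i in range(len(x))]


def moving_average(x, k=3):
    """Centered running mean; the window shrinks at the edges."""
    half = k // 2
    out = []
    for i in range(len(x)):
        w = x[max(0, i - half): i + half + 1]
        out.append(sum(w) / len(w))
    return out


def smooth_met(met, median_k=5, mean_k=3):
    """Median filter (outlier removal) followed by a moving average."""
    return moving_average(median_filter(met, median_k), mean_k)


# A single spurious prediction (9.0) inside a resting period.
predicted = [1.0, 1.0, 9.0, 1.0, 1.0, 1.0]
smoothed = smooth_met(predicted)
```

Applying the median first matters: an isolated outlier is removed outright, whereas a moving average alone would spread it over neighboring predictions.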

From the smoothed MET signal, features describing both intensity magnitude and temporal behavior were extracted. In addition, a stability metric was defined to quantify the consistency of the intensity signal using the coefficient of variation (CV) [42]:

$$\mathrm{CV}_{\mathrm{MET}} = \frac{\sigma_{\mathrm{MET}}}{\mu_{\mathrm{MET}}} \tag{3}$$

where $\sigma_{\mathrm{MET}}$ and $\mu_{\mathrm{MET}}$ denote the standard deviation and mean of the MET signal within a sliding window, respectively.

To account for temporal persistence, stability was accumulated over time. Let $\Delta t = 5$ s denote the resolution of the MET predictions, and let $i$ indicate the index of the current MET prediction window. A counter $c_i$ was used to track the number of consecutive prediction windows satisfying the MET stability condition:

$$c_i = \begin{cases} c_{i-1} + 1, & \text{if } \mathrm{CV}_{\mathrm{MET}}(i) < \tau \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

where $c_i$ denotes the stability counter at window $i$, $c_{i-1}$ is the counter value from the previous window, $\mathrm{CV}_{\mathrm{MET}}(i)$ is the coefficient of variation of the predicted MET signal evaluated at window $i$, and $\tau$ is the predefined stability threshold. When $\mathrm{CV}_{\mathrm{MET}}(i)$ is below $\tau$, the current window is considered stable and the counter is increased by one. Otherwise, the counter is reset to zero.

The corresponding stability duration, $s_i$, was then computed as:

$$s_i = c_i \cdot \Delta t \tag{5}$$

where $s_i$ represents the cumulative duration, in seconds, over which consecutive MET prediction windows have satisfied the stability condition, and $\Delta t$ is the duration of each MET prediction window.

This metric enables the identification of sustained activity periods, which are later used as a gating mechanism for VO₂max modeling.
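Equations (3) to (5) can be combined into a short sketch. The threshold `tau`, the length of the sliding window used for the coefficient of variation (`cv_window`), the use of the population standard deviation, and a counter starting from zero are all assumptions, since the text does not report these values; `stability_durations` is a hypothetical name.

```python
from statistics import mean, pstdev

DT = 5.0  # seconds per MET prediction window (from the text)


def stability_durations(met, tau=0.1, cv_window=6, dt=DT):
    """Return s_i for every MET prediction, following Eqs. (3)-(5).

    tau and cv_window are illustrative values, not the paper's.
    """
    durations = []
    count = 0  # c_i, assumed to start at zero
    for i in range(len(met)):
        recent = met[max(0, i - cv_window + 1): i + 1]
        mu = mean(recent)
        cv = pstdev(recent) / mu if mu > 0 else float("inf")  # Eq. (3)
        count = count + 1 if cv < tau else 0                   # Eq. (4)
        durations.append(count * dt)                           # Eq. (5)
    return durations


# Twelve steady predictions at 3 METs, then an abrupt jump to 8 METs.
met_signal = [3.0] * 12 + [8.0]
s = stability_durations(met_signal)
stable_idx = [i for i, value in enumerate(s) if value >= 60.0]
```

With 5-s predictions, twelve consecutive stable windows are needed before the accumulated duration reaches 60 s, and a single unstable window resets the count.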

2.6. Stage 2: Modeling Framework for VO₂max Estimation

Building upon the stable MET representation obtained in Stage 1, the second stage maps multimodal wearable features to a subject-level VO₂max estimate. Since short-term fluctuations in the predicted MET signal are frequently associated with activity transitions and delayed physiological adaptation, only temporally stable segments are considered for downstream modeling.

A segment is considered valid when the accumulated stability duration satisfies

$$s_i \geq 60\ \text{s} \tag{6}$$

where $s_i$ is the accumulated stability duration defined in Equation (5). Thus, only segments for which the predicted MET signal remains stable for at least 60 consecutive seconds are forwarded to Stage 2.

The 60-s stability requirement was introduced to prioritize sustained physiological responses over transient activity transitions. Heart rate kinetics studies have shown that cardiovascular adaptation to changes in exercise intensity occurs progressively, with delayed responses before a new steady state is reached [43,44]. Consequently, stable periods were selected to improve feature reliability and reduce the influence of transitional segments. While this criterion was appropriate for the activity durations considered in the present protocol, future studies involving more heterogeneous free-living behavior may benefit from evaluating alternative stability thresholds.

Once the stability criterion is satisfied, the corresponding multimodal signals, including IMU data, biomarker signals (HR and SpO₂), and MET estimates, are segmented into fixed-length windows of 60 s, which constitute the fundamental units for feature extraction in Stage 2.

As illustrated in Figure 7, each stable window is transformed into a multimodal feature vector combining movement-derived, physiological, intensity-related, and demographic information. These feature vectors are subsequently processed by the XGBoost regression model to generate window-level VO₂max estimates. Since cardiorespiratory fitness is defined at the subject level rather than the window level, predictions obtained across all stable windows corresponding to a participant are aggregated to produce a final subject-level estimate. This aggregation step reduces the influence of local variability between individual windows and enables the model to capture more consistent subject-specific physiological patterns across different activity conditions.

2.6.1. Feature Engineering for VO₂max Regression

Feature engineering was designed to capture the relationship between movement intensity and physiological response in an activity-independent manner. Features were extracted from synchronized accelerometry, HR, SpO₂, and MET signals using 60-s windows.

Movement features were derived from acceleration magnitude signals obtained from sensors placed on the chest, knee, and hand. Time-domain and frequency-domain descriptors were used to characterize movement intensity, variability, and temporal dynamics, while inter-segment correlations were included to capture coordination patterns across body regions [45].

Physiological features were extracted from HR and SpO₂ signals, summarizing their level, variability, and temporal evolution within each window. To explicitly link movement and physiological response, cross-modal features were also computed between HR and motion signals [46].

The MET signal, estimated from the Stage 1 regression model, was incorporated as a continuous representation of activity intensity. Descriptors capturing intensity level and temporal variation were extracted. In addition, combined HR and MET features were included to reflect cardiovascular efficiency under different workload conditions.

Finally, demographic and baseline physiological variables, including age, sex, height, weight, BMI, resting HR, and resting SpO₂, were incorporated to provide subject-specific context [47]. This allowed the model to account not only for the instantaneous physiological response to activity, but also for individual characteristics that influence cardiorespiratory fitness.

Figure 8 provides an illustrative example of representative engineered features for Subject 5 across the experimental protocol. The selected signals include physiological features (Heart Rate and SpO₂), movement-derived features (Chest Motion RMS), and intensity-related features (MET and HR-MET coupling), each represented by distinct color groupings for visual clarity. The figure is intended to provide a qualitative view of how these feature categories evolve throughout the protocol and respond to changes in activity intensity.

Periods of rest are characterized by relatively stable and low-amplitude responses across most modalities, whereas transitions to more demanding activities, particularly walking and box carrying, are accompanied by increases in MET estimates, heart rate, and chest motion RMS. The figure also illustrates that the different modalities do not evolve identically. For example, activities associated with similar MET levels may exhibit distinct movement and physiological responses, as evidenced by differences in chest motion RMS and HR-MET coupling. These observations highlight the complementary information captured by multimodal wearable signals and motivate the integration of movement-derived, physiological, and intensity-related features for activity-independent VO₂max estimation.

2.6.2. Subject-Level Prediction Strategy

The regression model produces predictions at the window level, while the target variable, VO₂max, is defined at the subject level. Each subject contributes multiple windows derived from 60-s segments of synchronized data.

To obtain a subject-level estimate, window-level predictions are grouped into non-overlapping chunks of fixed size. For each chunk, the median prediction is computed to reduce the influence of noisy or unrepresentative windows. The final VO₂max estimate for the subject is then obtained as the median of all chunk-level predictions.

Formally, given a set of window-level predictions $\{\hat{y}_{i}\}_{i=1}^{N}$, these are partitioned into $K$ chunks, and the subject-level prediction is defined as:

$$\hat{y}_{\mathrm{subject}} = \operatorname{median}\left(\operatorname{median}(C_{1}), \operatorname{median}(C_{2}), \dots, \operatorname{median}(C_{K})\right) \tag{7}$$

where $C_{k}$ denotes the set of predictions within the $k$-th chunk.
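The median-of-medians aggregation in Equation (7) can be sketched in a few lines of Python. The chunk size is not stated in this section, so `chunk_size` is a free parameter here, and keeping a shorter trailing chunk as its own chunk is an assumption of this sketch.

```python
from statistics import median


def subject_level_estimate(window_preds, chunk_size):
    """Aggregate window-level predictions into one subject-level value.

    Predictions are split into consecutive non-overlapping chunks, the
    median of each chunk is taken, and the median of those chunk medians
    is returned. A shorter final chunk is kept as-is (assumption).
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if not window_preds:
        raise ValueError("at least one window-level prediction is required")

    chunks = [
        window_preds[start:start + chunk_size]
        for start in range(0, len(window_preds), chunk_size)
    ]
    chunk_medians = [median(chunk) for chunk in chunks]
    return median(chunk_medians)


# Illustrative values only; 90.0 plays the role of an outlying window.
preds = [40.0, 42.0, 90.0, 41.0, 43.0, 39.0, 44.0]
estimate = subject_level_estimate(preds, chunk_size=3)
print(estimate)  # 42.0
```

In this toy example the outlying window (90.0) is absorbed by the first chunk median, so it does not shift the subject-level estimate.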

2.6.3. Model Architecture

VO₂max estimation was formulated as a supervised regression task using extreme gradient boosting (XGBoost) [48]. This model was selected due to its strong performance on structured tabular data and its ability to capture nonlinear relationships among heterogeneous feature types, including physiological, movement-derived, intensity-related, and demographic variables.

Training was performed at the window level, where each feature vector represented a 60-s segment and was assigned the VO₂max label of the corresponding subject. This strategy allowed the model to learn consistent subject-level physiological patterns from multiple observations collected across different activities and conditions.

To improve sensitivity across the full fitness spectrum, sample weighting was incorporated during training, assigning greater importance to subjects at the lower and upper ends of the VO₂max distribution. In addition, the target variable was standardized using statistics computed from the training set and transformed back to the original scale for evaluation.
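The two training-time adjustments described above can be illustrated with a short Python sketch. The exact weighting function is not specified in this section, so the scheme below (weight grows linearly with the absolute standardized distance from the training mean, scaled by a hypothetical `alpha`) is only one plausible choice.

```python
from statistics import mean, pstdev


def fit_target_scaler(y_train):
    """Return (mean, std) computed from training labels only."""
    mu, sigma = mean(y_train), pstdev(y_train)
    if sigma == 0:
        raise ValueError("training labels must not be constant")
    return mu, sigma


def standardize(y, mu, sigma):
    return [(value - mu) / sigma for value in y]


def inverse_standardize(z, mu, sigma):
    """Map standardized predictions back to the original VO2max scale."""
    return [value * sigma + mu for value in z]


def tail_weights(y_train, mu, sigma, alpha=1.0):
    """Assumed scheme: labels far from the training mean get larger weights."""
    return [1.0 + alpha * abs((value - mu) / sigma) for value in y_train]


# Illustrative labels in mL/kg/min, not data from the study.
y_train = [30.0, 40.0, 50.0]
mu, sigma = fit_target_scaler(y_train)
z_train = standardize(y_train, mu, sigma)
weights = tail_weights(y_train, mu, sigma)
```

With the scikit-learn style XGBoost interface, such weights would typically be supplied through the `sample_weight` argument of `fit`. Fitting the scaler on training labels only keeps the held-out subject's label out of the preprocessing step.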

Model hyperparameters were selected based on validation performance to balance predictive accuracy and generalization.

2.7. Evaluation Protocol

Model performance was evaluated using the Leave-One-Subject-Out (LOSO) [49] cross-validation strategy. In this setup, data from one subject was held out for testing, while the model was trained using data from all remaining subjects. This process was repeated for each subject, ensuring that all evaluations were performed on unseen individuals.
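A minimal sketch of the LOSO partitioning is given below. It assumes that each window carries a subject identifier; the identifiers and the function name are hypothetical.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx) once per subject.

    subject_ids holds one identifier per window, so all windows of the
    held-out subject end up in the test set and none in the training set.
    """
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


# Five windows from three hypothetical subjects.
window_subjects = ["s1", "s1", "s2", "s3", "s3"]
splits = list(loso_splits(window_subjects))
```

Splitting by subject rather than by window matters here because every window of a subject shares the same VO₂max label; a window-level split would place the held-out label in the training set.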

Performance was assessed at the subject level. Although the model produces predictions at the window level, these were aggregated into a single estimate per subject, as described in the previous subsection. This ensures consistency between the prediction and the ground truth, which is defined at the subject level.

The primary evaluation metric was the root mean squared error (RMSE), computed between the predicted and true VO₂max values across all subjects. In addition, the coefficient of determination ($R^{2}$) and the Pearson correlation coefficient (r) were reported to assess the strength and linearity of the relationship between predicted and true values.

All reported metrics correspond to the aggregation of predictions from the held-out subjects across all folds.
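For reference, the three pooled metrics can be computed as in the following sketch, which uses standard textbook definitions. The example values are illustrative and are not results from this study.

```python
from math import sqrt
from statistics import mean


def rmse(y_true, y_pred):
    return sqrt(mean([(p - t) ** 2 for t, p in zip(y_true, y_pred)]))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    centre = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - centre) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("true values must not be constant")
    return 1.0 - ss_res / ss_tot


def pearson_r(y_true, y_pred):
    mt, mp = mean(y_true), mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("inputs must not be constant")
    return cov / sqrt(var_t * var_p)


# One pooled (true, predicted) pair per held-out subject; toy values.
y_true = [30.0, 40.0, 50.0]
y_pred = [35.0, 40.0, 45.0]
scores = (rmse(y_true, y_pred), r_squared(y_true, y_pred), pearson_r(y_true, y_pred))
```

In this toy case the predictions are perfectly correlated with the true values (r = 1) but span only half their range, so $R^{2}$ drops to 0.75. This is why correlation and $R^{2}$ are reported together: r is insensitive to a compressed prediction range, whereas RMSE and $R^{2}$ are not.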

3. Results

This section presents the results of the proposed framework, including both the MET regression and the VO₂max estimation models. The evaluation focuses on generalization across subjects under a LOSO scheme.

Performance is assessed using RMSE, $R^{2}$, and r, providing a concise view of prediction accuracy and agreement with reference values.

3.1. Stage 1 Performance: MET Regression

The first stage of the proposed framework estimates a continuous MET signal from wearable motion features, providing an intermediate representation of movement intensity for downstream physiological modeling. Prior to evaluation, the model hyperparameters were optimized using Optuna (v3.6.2) [50] under a LOSO validation scheme. The selected configuration consisted of 87 training epochs, a learning rate of $7.11 \times 10^{-4}$, weight decay of $7.60 \times 10^{-5}$, a batch size of 64, a hidden dimension of 128 neurons, and a dropout rate of 0.44.

In addition to LOSO, model performance was evaluated using a Leave-One-Activity-Out (LOAO) strategy, which provides a stricter test of generalization by assessing the model’s ability to estimate intensity levels for activities that were entirely excluded during training. To enforce this separation, LOAO partitioning was defined using the ground-truth MET labels assigned by the experimental protocol, ensuring that all samples corresponding to the held-out intensity level were omitted from training. This design prevents exposure to that activity intensity during model fitting and provides a rigorous evaluation of generalized intensity estimation across unseen movement conditions.

As shown in Figure 9, the model achieved low prediction error under LOSO (RMSE = 0.885 MET) with near-zero bias, indicating reliable intensity estimation across unseen subjects. Under the more challenging LOAO setting, performance remained strong for intermediate-intensity activities, where both RMSE and prediction bias remained relatively small. Larger deviations were observed at the extremes of the MET range, particularly for held-out activities corresponding to rest (MET = 1.0), which was consistently overestimated, and cycling (MET = 6.8), which was systematically underestimated. This pattern suggests that the regression model primarily learns an interpolative mapping within the range of intensities observed during training, while extrapolation beyond that range remains more challenging. Nevertheless, the model preserved a coherent progression of estimated intensity across unseen activity conditions, supporting its use as a meaningful intermediate representation of physical effort within the proposed framework.
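The link between LOAO partitioning and the interpolation behavior noted above can be made explicit with a small sketch. It groups windows by their protocol-assigned MET label and flags folds in which the held-out level lies outside the range of the training levels. Only the values 1.0 and 6.8 are taken from the text; the intermediate level 3.5 and all names are illustrative assumptions.

```python
def loao_folds(met_labels):
    """Yield one fold per distinct MET label, holding that level out.

    Each fold records whether the held-out level falls outside the range
    of the levels left for training, i.e., whether the model would have
    to extrapolate rather than interpolate.
    """
    levels = sorted(set(met_labels))
    if len(levels) < 2:
        raise ValueError("LOAO needs at least two distinct intensity levels")

    for held_out in levels:
        train_levels = [m for m in levels if m != held_out]
        yield {
            "held_out_met": held_out,
            "train_idx": [i for i, m in enumerate(met_labels) if m != held_out],
            "test_idx": [i for i, m in enumerate(met_labels) if m == held_out],
            "extrapolation": not (min(train_levels) < held_out < max(train_levels)),
        }


# One label per window: rest (1.0), an assumed mid level (3.5), cycling (6.8).
labels = [1.0, 1.0, 3.5, 3.5, 6.8]
folds = list(loao_folds(labels))
```

In this sketch, only the folds that hold out the lowest and highest levels are flagged as extrapolation, which matches the two conditions where the larger deviations were reported.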

3.2. Stage 2 Performance: VO₂max Estimation

The second stage of the proposed framework evaluates the ability of the aggregated wearable-derived features to estimate subject-level VO₂max under a LOSO validation scheme. The proposed model achieved a mean fold RMSE of 5.48 mL·kg⁻¹·min⁻¹, a global RMSE of 6.82 mL·kg⁻¹·min⁻¹, an $R^{2}$ value of 0.40, and a Pearson correlation coefficient of $r = 0.64$, indicating moderate predictive agreement with the reference VO₂max values and meaningful capture of inter-subject physiological variability.

Figure 10 presents the relationship between predicted and reference VO₂max values across all LOSO folds. A clear positive association can be observed, indicating that the model largely preserves the relative ranking of subjects according to cardiorespiratory fitness. The fitted calibration line exhibited a slope of 0.466 and an intercept of 24.204 mL·kg⁻¹·min⁻¹, indicating compression of the predicted VO₂max range relative to the reference values. As a result, lower VO₂max values tended to be overestimated, whereas higher VO₂max values were generally underestimated. This behavior is likely influenced by the concentration of participants within the mid-range of the fitness distribution, which encourages regression toward the population mean and reduces predictive sensitivity at the extremes of the physiological spectrum.
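The compression implied by the fitted calibration line can be quantified with a short calculation. The sketch below assumes the line expresses predicted VO₂max as a function of the reference value, and it uses 30 and 55 mL·kg⁻¹·min⁻¹ purely as illustrative reference values.

```python
SLOPE = 0.466       # reported slope of the calibration line
INTERCEPT = 24.204  # reported intercept, in mL/kg/min


def calibrated_prediction(reference):
    """Predicted VO2max on the fitted line for a given reference value."""
    return SLOPE * reference + INTERCEPT


def crossover_point(slope, intercept):
    """Reference value at which the fitted line meets the identity line."""
    if slope == 1:
        raise ValueError("a slope of 1 never crosses the identity line")
    return intercept / (1.0 - slope)


crossover = crossover_point(SLOPE, INTERCEPT)
low = calibrated_prediction(30.0)   # about 38.2, above the reference of 30
high = calibrated_prediction(55.0)  # about 49.8, below the reference of 55
```

Under this reading, the fitted line crosses the identity line near 45.3 mL·kg⁻¹·min⁻¹: reference values below that point tend to be overestimated and values above it underestimated, in line with the pattern described above.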

This prediction pattern is further illustrated in Figure 11, which shows the residual error as a function of the true VO₂max value. Positive residuals are more frequent at lower fitness levels, while increasingly negative residuals appear at higher VO₂max values, confirming a compression effect in the predicted range. This behavior suggests that subjects at the physiological extremes remain the most challenging cases for the model, whereas prediction errors remain comparatively balanced within the mid-range values, where most participants are concentrated. Overall, the observed trend indicates that the proposed framework captures meaningful cardiorespiratory patterns from wearable-derived signals while highlighting the need for greater representation of extreme fitness profiles in future datasets.

To further evaluate the robustness of the proposed activity-independent representation, an additional LOAO validation was performed, in which all samples corresponding to one activity category were excluded during training and used exclusively for testing. This protocol provides a stricter assessment of generalization, as the model must infer cardiorespiratory fitness from movement intensities that were not observed during optimization. As shown in Figure 12, the average RMSE remained close to the LOSO baseline across all held-out activities, with all LOAO conditions remaining within 0.13 mL·kg⁻¹·min⁻¹ of the LOSO average RMSE. This indicates only modest performance variations between LOAO conditions. Slightly larger errors were observed when box carrying was excluded, suggesting that this activity may provide a richer combination of biomechanical and physiological responses that contributes useful information for characterizing cardiorespiratory fitness. Nevertheless, the overall differences remained limited, supporting the robustness of the proposed activity-independent representation. These results indicate that the proposed framework does not rely on activity-specific patterns, but instead learns generalized relationships between physiological response, biomechanical behavior, and underlying cardiorespiratory fitness.

Collectively, these results demonstrate that short-duration wearable-derived physiological and biomechanical signals encode sufficient information to recover meaningful inter-subject variability in cardiorespiratory fitness. Moreover, the limited variation observed across LOAO conditions indicates that the proposed framework does not depend heavily on any single activity and generalizes well to unseen activities within the experimental protocol, supporting the central premise of activity-independent cardiorespiratory health estimation from wearable data. Validation under fully unconstrained free-living behavior remains an important direction for future work.

To assess the propagation of Stage 1 errors into the final prediction task, we computed the correlation between subject-level MET prediction error (MAE) and the absolute VO₂max prediction residual. The resulting correlation was weak (Pearson $r = 0.124$, $p = 0.344$), indicating that larger MET estimation errors do not systematically correspond to larger VO₂max prediction errors.

4. Discussion and Future Work

This study demonstrates the feasibility of estimating cardiorespiratory fitness from wearable-derived physiological and motion signals collected during less than one hour of daily activities. Unlike traditional assessment methods that require controlled laboratory protocols, specialized equipment, and clinical supervision, the proposed framework suggests that meaningful fitness-related information can be inferred rapidly under semi-structured real-world activity conditions. This represents an important step toward accessible cardiorespiratory health monitoring using wearable devices. However, further validation under fully unconstrained free-living conditions is needed before conclusions regarding long-term everyday monitoring can be drawn.

A central contribution of this work is its activity-independent modeling strategy. By learning generalized relationships between movement intensity, physiological response, and cardiorespiratory capacity, the proposed framework eliminates the need for specific exercise protocols or prolonged data collection, thereby improving practicality for deployment in semi-structured and potentially real-world monitoring scenarios. Although the proposed framework is not intended to replace clinical CPET evaluation, it may serve as an accessible screening or longitudinal monitoring tool capable of identifying changes in cardiorespiratory fitness outside laboratory environments.

The diversity of the collected cohort, including participants of different ages, sexes, and fitness levels, strengthens the validity of the proposed framework and supports generalization across users. However, because most individuals naturally fall within average fitness ranges, the resulting target distribution is centered around the population mean. This imbalance is reflected in the calibration behavior observed in Figure 10, where the fitted regression line exhibits a compressed prediction range relative to the reference values. This effect is likely driven primarily by the limited representation of subjects at the extremes of the fitness spectrum, which encourages regression toward the population mean.

Another limitation arises from the first-stage MET estimation model and the reference intensity labels used for its development. The protocol-assigned MET values were derived from standardized resources, including the Compendium of Physical Activities and ANSI/CTA guidelines, and therefore provide validated population-level estimates of activity intensity. However, these labels do not fully capture inter-individual variability in energy expenditure, and the model was trained within a relatively limited intensity range (approximately 1 to 6.8 METs), restricting its ability to extrapolate beyond the activities represented in the dataset. Since estimated intensity serves as an important contextual feature for downstream VO₂max prediction, both subject-specific variations in metabolic cost and limited representation of higher or lower effort levels may influence model robustness. Nevertheless, the weak association observed between subject-level MET errors and VO₂max residuals suggests that the second-stage model is able to leverage complementary physiological, biomechanical, and demographic information beyond the intensity representation alone. Future work should incorporate individualized MET measurements obtained through indirect calorimetry or portable metabolic systems and extend the range of activity intensities to further improve generalization.

Future work should focus on expanding the dataset, particularly by increasing representation at the extremes of the fitness spectrum, where prediction remains most challenging. Incorporating additional biomarkers, such as respiration rate, electrocardiographic signals, skin temperature, and heart rate variability, may further improve physiological characterization. Finally, validation against direct CPET, the gold-standard measure of cardiorespiratory fitness, and the use of directly measured VO₂max values for model development would provide stronger physiological and clinical validation of the proposed framework and allow quantification of the impact of reference-label uncertainty introduced by indirect submaximal fitness assessments. To this end, we plan to collect an independent cohort with simultaneous CPET and multimodal wearable measurements, in order to evaluate the proposed activity-independent pipeline against directly measured VO₂max values and further assess its generalization capacity under gold-standard testing conditions.

5. Conclusions

This work presented an activity-independent machine learning framework for estimating cardiorespiratory fitness from short-duration multimodal wearable signals acquired during daily life activities. Unlike conventional approaches that rely on structured exercise protocols, prolonged monitoring periods, or explicit activity recognition, the proposed framework was designed to learn generalized relationships between movement intensity, physiological response, and subject-level fitness capacity. To achieve this, a two-stage modeling strategy was introduced in which MET was first estimated as a continuous representation of activity intensity and subsequently integrated with physiological biomarkers, biomechanical descriptors, and demographic information to predict VO₂max.

Experimental evaluation under the LOSO validation protocol demonstrated that short-duration wearable-derived signals contain meaningful information related to inter-subject variability in cardiorespiratory fitness. The proposed framework achieved moderate predictive agreement with reference VO₂max values, with a mean fold RMSE of 5.48 mL·kg⁻¹·min⁻¹, a global RMSE of 6.82 mL·kg⁻¹·min⁻¹, an $R^{2}$ of 0.40, and a Pearson correlation coefficient of $r = 0.64$. These results indicate that clinically relevant fitness-related physiological patterns can be captured from less than one hour of wearable monitoring, even without explicit knowledge of the performed activity.

From a broader perspective, these findings support the feasibility of scalable, unobtrusive cardiorespiratory fitness assessment using wearable devices in semi-structured activity settings outside traditional laboratory environments. By reducing dependence on specialized testing equipment, supervised exercise protocols, and activity-specific models, the proposed framework represents a practical step toward accessible preventive health monitoring in everyday life. Future work will focus on validation against direct cardiopulmonary exercise testing measurements, evaluation under fully unconstrained free-living conditions, expanding population diversity, and improving predictive performance at the extremes of the fitness spectrum.

Author Contributions

Conceptualization, L.S.-A. and J.F.P.-M.; methodology, L.S.-A. and J.F.P.-M.; software, L.S.-A.; validation, L.S.-A. and J.F.P.-M.; investigation, L.S.-A., K.N.-T. and J.F.P.-M.; resources, J.F.P.-M.; data curation, L.S.-A.; writing—original draft preparation, L.S.-A., J.L.R.-C., and J.F.P.-M.; writing—review and editing, L.S.-A. and J.F.P.-M.; supervision, J.F.P.-M.; funding acquisition, J.F.P.-M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received financial support from the NSF CAREER award "Intelligent Biomarker Analysis based on Wearable Distributed Computing" under Grant No. OAC-2439345.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the University of Puerto Rico’s Institutional Review Board (IRB), protocol CPHSI/IRB-URPM No. 2024070003 (approved on 27 January 2025).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data supporting the findings of this study are publicly available in Zenodo at https://doi.org/10.5281/zenodo.15857137.

Acknowledgments

During the preparation of this manuscript, the authors used OpenAI ChatGPT (GPT-5 series) for language refinement, grammar correction, and improvement of manuscript readability. The authors reviewed and edited all generated content and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Liu, Z.; Cascioli, V.; McCarthy, P.W. Healthcare monitoring using low-cost sensors to supplement and replace human sensation: Does it have potential to increase independent living and prevent disease? Sensors 2023, 23, 2139.
2. Patel, B. The role of AI-integrated wearables in predictive healthcare: A scoping review. J. Med. Artif. Intell. 2026, 9, 26.
3. Ross, R.; Blair, S.N.; Arena, R.; Church, T.S.; Després, J.P.; Franklin, B.A.; Haskell, W.L.; Kaminsky, L.A.; Levine, B.D.; Lavie, C.J.; et al. Importance of assessing cardiorespiratory fitness in clinical practice: A case for fitness as a clinical vital sign: A scientific statement from the American Heart Association. Circulation 2016, 134, e653–e699.
4. Lang, J.J.; Prince, S.A.; Merucci, K.; Cadenas-Sanchez, C.; Chaput, J.P.; Fraser, B.J.; Manyanga, T.; McGrath, R.; Ortega, F.B.; Singh, B.; et al. Cardiorespiratory fitness is a strong and consistent predictor of morbidity and mortality among adults: An overview of meta-analyses representing over 20.9 million observations from 199 unique cohort studies. Br. J. Sports Med. 2024, 58, 556–566.
5. LaMonte, M.J. Cardiorespiratory fitness in the prevention and management of cardiovascular disease. Rev. Cardiovasc. Med. 2022, 23, 382.
6. Laukkanen, J.; Rauramaa, R.; Salonen, J.; Kurl, S. The predictive value of cardiorespiratory fitness combined with coronary risk evaluation and the risk of cardiovascular and all-cause death. J. Intern. Med. 2007, 262, 263–272.
7. Beltz, N.M.; Gibson, A.L.; Janot, J.M.; Kravitz, L.; Mermier, C.M.; Dalleck, L.C. Graded exercise testing protocols for the determination of VO2max: Historical perspectives, progress, and future considerations. J. Sports Med. 2016, 2016, 3968393.
8. Hsiao, C.T.; Tong, C.; Coté, G.L. Machine learning-based VO2 estimation using a wearable multiwavelength photoplethysmography device. Biosensors 2025, 15, 208.
9. Yu, W.; Cai, P.J.; Liu, R.; Shen, F.P.; Zhang, T. A flexible ultrasensitive IgG-modified rGO-based FET biosensor fabricated by aerosol jet printing. Appl. Mech. Mater. 2015, 748, 157–161.
10. Perilli, S.; Di Pietro, M.; Mantini, E.; Regazzetti, M.; Kiper, P.; Galliani, F.; Panella, M.; Mantini, D. Development of a wearable electromyographic sensor with aerosol jet printing technology. Bioengineering 2024, 11, 1283.
11. Spathis, D.; Perez-Pozuelo, I.; Gonzales, T.I.; Wu, Y.; Brage, S.; Wareham, N.; Mascolo, C. Longitudinal cardio-respiratory fitness prediction through wearables in free-living environments. npj Digit. Med. 2022, 5, 176.
12. Frade, M.C.M.; Beltrame, T.; Gois, M.d.O.; Pinto, A.; Tonello, S.C.G.d.M.; Torres, R.d.S.; Catai, A.M. Toward characterizing cardiovascular fitness using machine learning based on unobtrusive data. PLoS ONE 2023, 18, e0282398.
13. Wang, Z.; Song, Y.; Pang, L.; Li, S.; Sun, G. Attention-Enhanced CNN-LSTM Model for Exercise Oxygen Consumption Prediction with Multi-Source Temporal Features. Sensors 2025, 25, 4062.
14. Beltrame, T.; Amelard, R.; Wong, A.; Hughson, R.L. Prediction of oxygen uptake dynamics by machine learning analysis of wearable sensors during activities of daily living. Sci. Rep. 2017, 7, 45738.
15. Sheridan, D.; Jaspers, A.; Viet Cuong, D.; Op De Beéck, T.; Moyna, N.M.; de Beukelaar, T.T.; Roantree, M. Estimating oxygen uptake in simulated team sports using machine learning models and wearable sensor data: A pilot study. PLoS ONE 2025, 20, e0319760.
16. Neshitov, A.; Tyapochkin, K.; Kovaleva, M.; Dreneva, A.; Surkova, E.; Smorodnikova, E.; Pravdin, P. Estimation of cardiorespiratory fitness using heart rate and step count data. Sci. Rep. 2023, 13, 15808.
17. Saldaña-Aristizábal, L.; Rivas-Caicedo, J.L.; Niño-Tejada, K.; Patarroyo-Montenegro, J.F. Indirect AI-Based Estimation of Cardiorespiratory Fitness from Daily Activities Using Wearables. Electronics 2025, 14, 3081.
18. Buttar, K.K.; Saboo, N.; Kacker, S. Maximum Oxygen Consumption (VO2 max) Estimation using Direct and Indirect Method in Indian Population: A Pilot Study. J. Clin. Diagn. Res. 2020, 14, 6.
19. Akay, M.F.; Çetin, E.; Yarım, İ.; Bozkurt, Ö.; Özçiloğlu, M.M. Development of novel maximal oxygen uptake prediction models for Turkish college students using machine learning and exercise data. In Proceedings of the 2017 9th International Conference on Computational Intelligence and Communication Networks (CICN); IEEE: Piscataway, NJ, USA, 2017; pp. 186–189.
20. Chikov, A.; Egorov, N.; Medvedev, D.; Chikova, S.; Pavlov, E.; Drobintsev, P.; Krasichkov, A.; Kaplun, D. Determination of the athletes’ anaerobic threshold using machine learning methods. Biomed. Signal Process. Control 2022, 73, 103414.
21. Nabi, T.; Rafiq, N.; Qayoom, O. Assessment of cardiovascular fitness [VO2 max] among medical students by Queens College step test. Int. J. BioMed Adv. Res. 2015, 6, 418–421.
22. Chatterjee, S.; Chatterjee, P.; Bandyopadhyay, A. Validity of Queen’s College Step Test for estimation of maximum oxygen uptake in female students. Indian J. Med. Res. 2005, 121, 32–35.
23. Heydari, P.; Varmazyar, S.; Variani, A.S.; Hashemi, F.; Ataei, S.S. Correlation of Gerkin, Queen’s College, George, and Jackson methods in estimating maximal oxygen consumption. Electron. Physician 2017, 9, 5525.
24. Jetté, M.; Sidney, K.; Blümchen, G. Metabolic equivalents (METS) in exercise testing, exercise prescription, and evaluation of functional capacity. Clin. Cardiol. 1990, 13, 555–565.
25. Ainsworth, B.E.; Haskell, W.L.; Whitt, M.C.; Irwin, M.L.; Swartz, A.M.; Strath, S.J.; O’Brien, W.L.; Bassett, D.R.; Schmitz, K.H.; Emplaincourt, P.O.; et al. Compendium of physical activities: An update of activity codes and MET intensities. Med. Sci. Sports Exerc. 2000, 32, S498–S516.
26. Byrne, N.M.; Hills, A.P.; Hunter, G.R.; Weinsier, R.L.; Schutz, Y. Metabolic equivalent: One size does not fit all. J. Appl. Physiol. 2005, 99, 1112–1119.
27. Herrmann, S.D.; Willis, E.A.; Ainsworth, B.E.; Barreira, T.V.; Hastert, M.; Kracht, C.L.; Schuna, J.M., Jr.; Cai, Z.; Quan, M.; Tudor-Locke, C.; et al. 2024 Adult Compendium of Physical Activities: A third update of the energy costs of human activities. J. Sport Health Sci. 2024, 13, 6–12.
28. Mendes, M.d.A.; Da Silva, I.; Ramires, V.; Reichert, F.; Martins, R.; Ferreira, R.; Tomasi, E. Metabolic equivalent of task (METs) thresholds as an indicator of physical activity intensity. PLoS ONE 2018, 13, e0200701.
29. O’Driscoll, R.; Turicchi, J.; Hopkins, M.; Duarte, C.; Horgan, G.W.; Finlayson, G.; Stubbs, R.J. Comparison of the validity and generalizability of machine learning algorithms for the prediction of energy expenditure: Validation study. JMIR mHealth uHealth 2021, 9, e23938.
30. Wei, B.; Romano, C.; Pedram, M.; Nolan, B.; Morelli, W.A.; Alshurafa, N. Developing and comparing a new BMI inclusive energy expenditure algorithm on wrist-worn wearables. Sci. Rep. 2025, 15, 20060.
31. Shephard, R.J. PAR-Q, Canadian Home Fitness Test and exercise screening alternatives. Sports Med. 1988, 5, 185–195.
32. Rivas-Caicedo, J.L.; Saldaña-Aristizabal, L.; Niño-Tejada, K.; Patarroyo-Montenegro, J.F. A Multi-Sensor Dataset for Human Activity Recognition Using Inertial and Orientation Data. Data 2025, 10, 129.
33. Consumer Technology Association. ANSI/CTA-2065-A, Physical Activity Monitoring for Wearables. 2023. Available online: https://www.cta.tech/standards/ansicta-2065-a/ (accessed on 1 May 2026).
34. Consumer Technology Association. ANSI/CTA-2074-R-2025: Core Fitness Metrics and Definitions. 2025. Available online: https://shop.cta.tech/products/ansi-cta-2074-r-2025 (accessed on 1 May 2026).
35. Herrmann, S.D.; Willis, E.A.; Ainsworth, B.E. The 2024 compendium of physical activities and its expansion. J. Sport Health Sci. 2024, 13, 1.
36. Cui, B.; Song, X.; Monique, T.; van Beijnum, B.J.; Wang, Y. Evaluating Multi-Sensor Placement and Neural Network Architectures for Physical Activity Level Classification. In Proceedings of the 2025 47th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC); IEEE: Piscataway, NJ, USA, 2025; pp. 1–5.
37. Liu, H.; Allen, J.; Khalid, S.G.; Chen, F.; Zheng, D. Filtering-induced time shifts in photoplethysmography pulse features measured at different body sites: The importance of filter definition and standardization. Physiol. Meas. 2021, 42, 074001.
38. Türkler, L.; Akkan, L.Ö. Noise Reduction Techniques for Sensor Data: Comparative Analysis of Kalman, Butterworth, Savitzky-Golay, Median, and Moving Average Filters for UWB-Based Position Estimation. Celal Bayar Univ. J. Sci. 2025, 21, 146–159.
39. Jung, S.; de l’Escalopier, N.; Oudre, L.; Truong, C.; Dorveaux, E.; Gorintin, L.; Ricard, D. A machine learning pipeline for gait analysis in a semi free-living environment. Sensors 2023, 23, 4000.
40. Yuan, Y.; Yu, Y.; Cai, S.; Cheng, W. Optimizing wearable IMU configurations for running gait analysis: A machine learning-based sensor fusion approach. Front. Bioeng. Biotechnol. 2026, 14, 1762919.
41. Dosinas, A.; Lukocius, R.; Vaitkunas, M.; Nedzinskaite, G.; Vaskys, P.; Gudzius, S.; Jonaitis, A. Sensors and signal processing methods for a wearable physiological parameters monitoring system. Elektron. Ir. Elektrotechnika 2017, 23, 74–81.
42. Montoye, A.H.; Conger, S.A.; Connolly, C.P.; Imboden, M.T.; Nelson, M.B.; Bock, J.M.; Kaminsky, L.A. Validation of accelerometer-based energy expenditure prediction models in structured and simulated free-living settings. Meas. Phys. Educ. Exerc. Sci. 2017, 21, 223–234.
43. Bunc, V.; Heller, J.; Leso, J. Kinetics of heart rate responses to exercise. J. Sports Sci. 1988, 6, 39–48.
44. Borresen, J.; Lambert, M.I. Autonomic control of heart rate during and after exercise: Measurements and implications for monitoring training status. Sports Med. 2008, 38, 633–646.
45. Kwon, S.B.; Ahn, J.W.; Lee, S.M.; Lee, J.; Lee, D.; Hong, J.; Kim, H.C.; Yoon, H.J. Estimating maximal oxygen uptake from daily activity data measured by a watch-type fitness tracker: Cross-sectional study. JMIR mHealth uHealth 2019, 7, e13327.
46. Granero-Gallegos, A.; González-Quílez, A.; Plews, D.; Carrasco-Poyatos, M. HRV-based training for improving VO2max in endurance athletes. A systematic review with meta-analysis. Int. J. Environ. Res. Public Health 2020, 17, 7999.
47. Xiang, L.; Deng, K.; Mei, Q.; Gao, Z.; Yang, T.; Wang, A.; Fernandez, J.; Gu, Y. Population and age-based cardiorespiratory fitness level investigation and automatic prediction. Front. Cardiovasc. Med. 2022, 8, 758589.
48. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
49. Wang, Z.; Zhang, Q.; Lan, K.; Yang, Z.; Gao, X.; Wu, A.; Xin, Y.; Zhang, Z. Enhancing instantaneous oxygen uptake estimation by non-linear model using cardio-pulmonary physiological and motion signals. Front. Physiol. 2022, 13, 897412.
50. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631.

Figure 1. Conceptual comparison between conventional laboratory-based CPET and wearable-based approaches for cardiorespiratory fitness assessment.

Figure 2. Proposed two-stage framework for wearable-based VO₂max estimation. Stage 1 converts short-duration movement signals into continuous MET estimates and evaluates their temporal stability. Only segments that satisfy the stability criterion ($s_{i} \geq 60$ s) are transferred through the green pathway to Stage 2, where movement-derived, physiological, intensity-related, and demographic features are extracted and used to predict subject-level VO₂max.

Figure 2. Proposed two-stage framework for wearable-based VO₂max estimation. Stage 1 converts short-duration movement signals into continuous MET estimates and evaluates their temporal stability. Only segments that satisfy the stability criterion (

s_{i} \geq 60

s) are transferred through the green pathway to Stage 2, where movement-derived, physiological, intensity-related, and demographic features are extracted and used to predict subject-level VO₂max.

Figure 3. Distribution of demographic characteristics and VO₂max values across the study cohort, illustrating the heterogeneity of participants in age, body composition, sex, and cardiorespiratory fitness.

Figure 4. Structured activity protocol implemented during data collection, showing the sequence and duration of the activities performed by all participants. The protocol spans a broad range of MET-defined physical intensities over approximately 37 min and corresponds to the data acquisition procedure used to construct the multimodal dataset analyzed in this study.

Figure 5. (a) Placement of the wearable sensors and measured signals. (b) Participant wearing the sensing system during data collection. Photographs are original images acquired by the authors during the study and are reproduced with participant consent. Identifiable facial information has been removed to protect participant privacy.

Figure 6. Detailed overview of Stage 1 of the proposed framework. Wearable motion signals are preprocessed and segmented into short-duration windows for feature extraction and MET regression. The resulting continuous MET signal is subsequently smoothed and analyzed using a stability criterion based on a rolling coefficient of variation. Stable segments with an accumulated duration exceeding 60 s are transferred to Stage 2, whereas unstable periods continue accumulating additional windows until the stability condition is satisfied.
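The stability check summarized in the Figure 6 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the spacing between MET estimates (`step_s`), the rolling-window width (`width`), and the coefficient-of-variation threshold (`cv_max`) are hypothetical values chosen for illustration, and the smoothing step is omitted. Only the 60 s minimum duration comes from the caption. For simplicity, the sketch resets a run when the rolling CV exceeds the threshold rather than modeling the accumulation logic in detail.

```python
from statistics import mean, pstdev


def rolling_cv(values, width):
    """Coefficient of variation (std / mean) over a trailing window of `width` samples."""
    cvs = []
    for end in range(1, len(values) + 1):
        window = values[max(0, end - width):end]
        mu = mean(window)
        # Guard against division by zero for a non-positive mean.
        cvs.append(pstdev(window) / mu if mu > 0 else float("inf"))
    return cvs


def stable_segments(met, step_s=5.0, width=6, cv_max=0.10, min_duration_s=60.0):
    """Return (start, end) index pairs (end exclusive) of runs whose rolling CV
    stays at or below `cv_max` for at least `min_duration_s` seconds.

    step_s, width, and cv_max are assumed values, not taken from the paper.
    """
    cvs = rolling_cv(met, width)
    segments, start = [], None
    for i, cv in enumerate(cvs):
        if cv <= cv_max:
            if start is None:
                start = i  # a new candidate stable run begins here
        else:
            if start is not None and (i - start) * step_s >= min_duration_s:
                segments.append((start, i))
            start = None  # the run is broken; wait for the next stable sample
    # Close a run that is still open at the end of the signal.
    if start is not None and (len(cvs) - start) * step_s >= min_duration_s:
        segments.append((start, len(cvs)))
    return segments
```

Under these assumptions, a steady MET trace interrupted by a short burst of fluctuation yields two stable segments, one on each side of the burst, provided each lasts at least 60 s.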

Figure 7. Detailed overview of Stage 2 of the proposed framework. Stable activity windows obtained from Stage 1 are segmented into 60-s intervals and transformed into multimodal feature vectors combining movement-derived, physiological, intensity-related, and demographic information. Window-level VO₂max estimates generated by the XGBoost regression model are subsequently aggregated to produce a final subject-level prediction.
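The final aggregation step named in the Figure 7 caption can be sketched as follows. The caption does not state which aggregation operator is used, so the median here is an assumption, and the subject identifiers and values in the usage note are made up for illustration.

```python
from collections import defaultdict
from statistics import median


def aggregate_by_subject(window_predictions):
    """Collapse (subject_id, window_level_estimate) pairs into one value per subject."""
    grouped = defaultdict(list)
    for subject_id, estimate in window_predictions:
        grouped[subject_id].append(estimate)
    # The median is an assumed operator; the caption does not specify one.
    return {subject_id: median(values) for subject_id, values in grouped.items()}
```

For example, three window-level estimates of 40.0, 42.0, and 50.0 for one hypothetical subject would collapse to 42.0, which limits the influence of a single outlying window compared with a mean.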

Figure 8. Temporal evolution of representative physiological, movement-derived, and intensity-related features for Subject 5 across the structured activity protocol.

Figure 9. MET regression performance under LOSO and LOAO evaluation settings, showing both RMSE and prediction bias across held-out activity conditions.

Figure 10. Predicted versus reference VO₂max values across all LOSO folds. The fitted regression line illustrates the overall agreement trend between model estimates and measured VO₂max values.

Figure 11. Relationship between absolute prediction error and reference VO₂max values.

Figure 12. Comparison of average RMSE obtained under standard LOSO evaluation and LOAO validation for each held-out activity. The dashed line represents the LOSO baseline (5.48 mL·kg⁻¹·min⁻¹). Despite modest activity-specific variations, all LOAO conditions remained close to the baseline performance, indicating robust generalization to unseen activities within the experimental protocol.
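The two validation settings compared in Figures 9 and 12 differ only in which label defines a fold: the subject for LOSO, and (presumably) the activity for LOAO. The sketch below shows that shared structure with a generic leave-one-group-out splitter. The subject and activity labels are hypothetical placeholders, not the study's data.

```python
def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx) for each distinct group label."""
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical window metadata: one subject label and one activity label per window.
subjects = ["S01", "S01", "S02", "S02", "S03", "S03"]
activities = ["walk", "run", "walk", "run", "walk", "run"]

loso_folds = list(leave_one_group_out(subjects))    # hold out one subject per fold
loao_folds = list(leave_one_group_out(activities))  # hold out one activity per fold
```

In each fold, a model would be fitted on the windows indexed by `train_idx` and scored on those indexed by `test_idx`, so no window from the held-out subject or activity is seen during training.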


