1. Introduction
Recently, research on the accuracy of bio-signal measurement based on wearable devices has been actively conducted [1,2]. Wearable devices, such as smart rings and watches, measure bio-signals in real time, including heart rate, oxygen saturation, and skin temperature. Based on the measured bio-signals, wearable devices can determine information such as personal activity status and health status [3]. Among these, energy expenditure (EE) quantifies the user's level of activity by distinguishing body movements, making it possible to determine whether the user is fatigued or overworked. Moreover, by integrating biomechanical motion patterns with physiological data, it is possible to infer the specific activities being performed by the user in real time [4]. Personalized physiological and activity data play a crucial role in real-time health monitoring, risk assessment, and emergency response, significantly contributing to individual healthcare [5].
Among wearable devices, smartwatches are the most widely adopted due to their high usability and convenience. When using wearable devices to assess the energy expenditure of workers, it is essential to ensure that the device does not pose a safety risk or interfere with work tasks. Smart rings may interfere with the precise hand movements required in certain occupations and pose a potential safety risk because their metal exterior conducts electricity. Therefore, for monitoring workers' activity levels, smartwatches are more suitable, as they do not hinder finger movements and are insulated by rubber wristbands. Smartwatches are equipped with photoplethysmography (PPG) sensors, enabling real-time measurement of heart rate (HR), heart rate variability (HRV), and oxygen saturation (SpO2). Since heart rate is strongly correlated with exercise intensity, it can be used to estimate exercise intensity and infer EE.
Various methods exist for estimating EE, including the oxygen consumption (VO2) method, the metabolic equivalent of task (MET) method, and the heart rate-based method. The VO2 method, derived from indirect calorimetry, calculates EE by measuring the amount of oxygen consumed by the body. This method directly reflects metabolic activity and is considered the most accurate approach for estimating EE [6,7,8]. However, the VO2-based method requires a sensor to be attached to the mouth, limiting its practical usability.
The MET method quantifies EE by comparing it to the resting metabolic rate (RMR) [9,10,11]. This approach has the benefit of enabling simple EE estimation using only MET values for each activity type, body weight, and exercise duration. Thus, it is widely used in wearable devices such as smartwatches for daily EE measurement [12]. However, the MET method has limitations, as it does not account for changes in EE based on the exercise environment (e.g., outdoor vs. indoor), even within the same type of activity. Additionally, accurately applying MET indices requires information on both the user's current activity type and exercise intensity, yet real-time identification of activity types and intensities is challenging. To address this, researchers have proposed methods that use cameras to recognize user activities [13,14]. Nevertheless, these methods face practical limitations, including difficulty in accurately recognizing movements and the need for the camera to track the user during activities like running. Furthermore, cameras alone cannot effectively assess exercise intensity variations caused by differences in movement speed.
EE can also be estimated using the Keytel equation, which is based on heart rate [15,16,17]. This approach has the advantage of reflecting exercise intensity in real time through variations in heart rate, allowing for dynamic and individualized EE estimation [18,19]. However, the Keytel equation assumes a proportional relationship between heart rate and VO2, which may lead to inaccuracies, as the proportionality factor varies across different exercise intensities, reducing its precision [20,21].
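For reference, a minimal sketch of a heart rate-based estimate in the spirit of the Keytel equation is shown below. The gender-specific regression coefficients are quoted from the published Keytel regression (the variant without VO2max) and should be verified against the original source before use; the function name is ours.

```python
def keytel_ee_kcal_per_min(hr: float, weight_kg: float, age: float, male: bool) -> float:
    """Heart rate-based EE estimate (kcal/min), Keytel-style regression.

    Coefficients are quoted from the literature (variant without VO2max);
    the regression yields kJ/min, converted to kcal/min by dividing by 4.184.
    """
    if male:
        kj_per_min = -55.0969 + 0.6309 * hr + 0.1988 * weight_kg + 0.2017 * age
    else:
        kj_per_min = -20.4022 + 0.4472 * hr - 0.1263 * weight_kg + 0.0740 * age
    return kj_per_min / 4.184  # kJ/min -> kcal/min
```

Because the linear fit is fixed across intensities, this sketch also illustrates why accuracy degrades when the heart rate-VO2 proportionality changes, as noted above.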
To overcome these limitations, we propose a novel real-time and personalized EE estimation method, referred to as RTEE. The proposed method integrates conventional EE estimation formulas with a deep Q-network (DQN)-based exercise intensity inference model to predict EE based on real-time heart rate data. To improve real-time EE accuracy, a DQN-based, heart rate-driven exercise intensity prediction network is employed. Consequently, the proposed RTEE provides fine-grained, real-time EE estimation per second by leveraging heart rate variations that reflect an individual’s health and activity status.
3. Methods
3.1. Overview of the Proposed Method
In this study, Deep Q-Network (DQN) was selected as the reinforcement learning model, considering the characteristics of the task, which involves real-time energy expenditure prediction based on heart rate. DQN offers advantages in terms of stable convergence during training and ease of implementation. It is particularly well-suited for problems with discrete and relatively limited state and action spaces.
The action space in this study consists of 201 discrete activity intensity coefficients, divided into increments of 0.1. The reinforcement learning model is thus required to select the most appropriate value from these 201 candidates. This discrete action selection aligns naturally with the Q-value-based decision mechanism of DQN, making it a suitable choice for this problem setting.
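This index-to-coefficient mapping over the 201 candidates can be sketched as follows (names are hypothetical):

```python
# Map a discrete action index (0..200) to an activity intensity coefficient in [0, 20].
ACTION_STEP = 0.1
NUM_ACTIONS = 201

def index_to_coefficient(index: int) -> float:
    """Convert a DQN action index into an activity intensity coefficient."""
    if not 0 <= index < NUM_ACTIONS:
        raise ValueError("action index out of range")
    return round(index * ACTION_STEP, 1)

print(index_to_coefficient(0), index_to_coefficient(200))  # 0.0 20.0
```

The DQN simply picks the index with the largest Q-value, which is why this discrete, bounded action set suits a value-based method.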
Although alternative approaches such as actor-critic algorithms are widely used in reinforcement learning, they generally involve simultaneous optimization of policy and value functions, which increases computational complexity and introduces additional stability challenges [38,39]. In contrast, DQN focuses solely on value function approximation, resulting in lower computational overhead and more stable training dynamics. Given the real-time constraints of the application, computational efficiency and stability were critical considerations, leading to the selection of DQN as the most appropriate reinforcement learning framework for this study.
The proposed RTEE algorithm estimates EE by combining real-time heart rate values, which are corrected using DQN, with a modified MET.
Figure 1 shows the overall structure of RTEE, the proposed energy expenditure prediction method. RTEE consists of the AF-RL part, which predicts the activity intensity coefficient via RL, and the EEE part, which calculates the predicted EE. Data including personal body information and heart rate information for 300 s are input into AF-RL, and these data are passed to the Environment.
In the Environment, 1 s of data is given as the State, and the Agent predicts the activity intensity coefficient a based on the user's body information and heart rate information presented in the State. The EEE that receives this prediction calculates the real-time EE through the modified MET using the personal information provided in the State. The calculated real-time EE is then sent back to the Environment to compute the error with respect to the ground truth (GT) EE. This process is repeated to predict the optimal activity intensity coefficient a. As a result, real-time EE prediction is possible, and the structure can be personalized according to the individual's health status.
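The State → Agent → EEE → reward cycle described above might be sketched as follows, anticipating the reward and EE formulas defined later as Equations (10) and (11). The `agent` interface and the data layout are hypothetical placeholders, not the authors' implementation.

```python
# Sketch of one RTEE training episode over a 300 s Environment.
# `select_action` and `update` stand in for the DQN agent (hypothetical names).
def run_episode(env_data, agent, gt_ee, rmr_sec):
    """env_data: per-second states [weight, height, age, gender, heart_rate];
    gt_ee: per-second ground-truth EE; rmr_sec: RMR per second (kcal/s)."""
    total_reward = 0.0
    for t, state in enumerate(env_data):              # one State per second
        a = agent.select_action(state)                # activity intensity coefficient
        pred_ee = rmr_sec * (1.0 + a)                 # EEE step (Equation (11))
        reward = -abs(pred_ee - gt_ee[t]) / gt_ee[t]  # Equation (10)
        agent.update(state, a, reward)                # refine the policy
        total_reward += reward
    return total_reward
```

Each pass through the loop corresponds to one State handed from the Environment to the Agent, with the reward closing the feedback cycle.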
Figure 2 is a schematic diagram of the policy network. The policy network of the DQN model consists of a fully connected feedforward neural network. The input to the network includes five features: weight, height, age, gender, and heart rate. The network architecture comprises two hidden layers with 64 neurons each, followed by ReLU activation functions. The output layer consists of 201 units corresponding to discrete activity intensity coefficients, with a linear activation function.
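A minimal NumPy sketch of this policy network's forward pass (5 inputs → 64 → 64 → 201, ReLU hidden activations, linear output) is given below; the random weight initialization is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer shapes: 5 input features -> 64 -> 64 -> 201 Q-values
W1, b1 = rng.normal(0, 0.1, (5, 64)), np.zeros(64)
W2, b2 = rng.normal(0, 0.1, (64, 64)), np.zeros(64)
W3, b3 = rng.normal(0, 0.1, (64, 201)), np.zeros(201)

def q_values(state: np.ndarray) -> np.ndarray:
    """Forward pass: two ReLU hidden layers, linear output over 201 actions."""
    h1 = np.maximum(0, state @ W1 + b1)
    h2 = np.maximum(0, h1 @ W2 + b2)
    return h2 @ W3 + b3

# [weight, height, age, gender, heart rate]
state = np.array([70.0, 175.0, 30.0, 1.0, 95.0])
q = q_values(state)
best_action_index = int(np.argmax(q))  # index of the chosen intensity coefficient
print(q.shape)  # (201,)
```

In the actual model the chosen index is then mapped to an activity intensity coefficient in [0, 20].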
3.2. Proposed AF-RL
The proposed AF-RL, a DQN-based activity intensity factor (a) inference network, predicts real-time activity intensity based on heart rate.
Figure 3 shows the input data provided as the model's Environment. One Environment is defined as 5 min (300 s) of data consisting of [Weight, Height, Age, Gender, current heart rate] at 1 s intervals. The Environment holds the entire 300 s of data and provides the State to the Agent at 1 s intervals as follows:
State_t = [Weight, Height, Age, Gender, HR_t]  (9)
The Agent receives a State, defined as in Equation (9), and establishes a policy to estimate the optimal activity intensity coefficient. The action space available to the Agent, based on the policy, is defined as follows:
A = {a | a = 0.1k, k = 0, 1, …, 200}
Here, a represents the activity intensity coefficient. It ranges from 0 to 20 in 0.1-unit increments, resulting in a total of 201 possible values, from which one is selected as the action a. The selected activity intensity coefficient a is transferred to EEE and applied to calculate the predicted EE. The predicted EE, computed in EEE, is then transferred back to the Environment, where it is compared with the GT EE to calculate the error. The Agent receives a reward based on this comparison. The reward function is as follows:
Reward = −|Pred_EE − GT_EE| / GT_EE  (10)
In this study, GT_EE refers to the energy expenditure (EE) per second calculated based on the indirect calorimetry formula, as defined in Equation (1). The proposed reward function is based on the absolute error between the predicted energy expenditure (Pred_EE) and the ground truth energy expenditure (GT_EE), and is computed according to Equation (10). The objective of the agent is to make predictions that are as close as possible to the actual values. Accordingly, the reward function is designed to guide the learning process toward minimizing the magnitude of the prediction error.
Although mean squared error (MSE) is commonly used in reinforcement learning reward functions, this study adopts an absolute error-based approach. This is because the prediction values in this context are relatively small, and using MSE may result in insufficient reward sensitivity for effective learning.
In addition, normalization was implemented in the reward function by introducing a denominator. This normalization was applied to compensate for variations in the relative error according to the magnitude of GT_EE, thereby facilitating stable learning. By normalizing the error with respect to the magnitude of GT_EE, the reward function provides consistent learning feedback across a wide range of energy expenditure values. This contributes to a more stable learning process.
The reward is always negative and approaches zero as the prediction error decreases. This structure encourages the agent to revise its policy in the direction of reducing the prediction error, providing a clear learning objective. This reward function is particularly well-suited for the task of second-by-second energy expenditure prediction, and the agent progressively improves its estimation accuracy by adjusting its action through iterative learning.
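The normalized absolute-error reward described above can be sketched as follows (assuming GT_EE is strictly positive):

```python
def reward(pred_ee: float, gt_ee: float) -> float:
    """Normalized absolute-error reward (Equation (10)).

    Always non-positive and approaches zero as the prediction error shrinks;
    dividing by gt_ee keeps the feedback scale consistent across EE magnitudes.
    """
    return -abs(pred_ee - gt_ee) / gt_ee

# reward(2.0, 2.0) == 0.0    (perfect prediction)
# reward(1.5, 2.0) == -0.25  (25% relative error)
```

The normalization by `gt_ee` is what makes a fixed absolute error count for less at high energy expenditure, matching the design rationale above.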
3.3. Proposed EEE
The DQN model determines the activity intensity coefficient based on the policy established by considering the state information. The determined activity intensity coefficient is then transferred to EEE for predicted EE calculation. EEE computes the predicted EE by utilizing the activity intensity coefficient (a) received from AF-RL. The equation for calculating the predicted EE based on the activity intensity coefficient is derived from Equations (3) and (5). According to Equation (3), EE is calculated by multiplying the RMR by the MET index. Similarly, Equation (5) states that EE can be calculated by adding the activity energy expenditure (AEE) to RMR. Therefore, Equation (11) is derived as follows.
Pred_EE = RMR_sec × (1 + a)  (11)
Here, Pred_EE represents the predicted EE per second and RMR_sec denotes the RMR per second, calculated using the Mifflin formula (Equations (6) and (7)) in units of seconds. Additionally, a is the activity intensity coefficient (0 ≤ a ≤ 20) predicted by AF-RL. This equation can be explained as follows: the AEE is added to the RMR per second, which corresponds to the 1 MET state. The value a represents the MET index excluding the default state of 1, and multiplying this value by RMR yields the AEE. Real-time EE prediction is achievable using this equation.
The predicted EE, calculated according to Equation (11), is transferred to the AF-RL Environment and used to compute the reward for the action performed by the Agent, as defined in Equation (10). By iteratively executing this process, the Agent gradually enhances the accuracy of its EE predictions. The RTEE model proposed in this study predicts personalized EE in real time based on reinforcement learning (RL). In summary, AF-RL learns the optimal activity intensity coefficient by considering personal physiological data and heart rate measurements using the DQN-based RL model. The predicted a is applied to the newly defined Equation (11) in EEE to compute the predicted EE. Through this iterative process, the RL agent receives rewards and gradually enhances EE prediction accuracy. The RTEE model enables real-time EE prediction through this structured framework. Furthermore, it is designed for smartwatch implementation, allowing EE prediction using only basic personal information and heart rate data.
5. Results
5.1. Performance Comparison by Algorithm
To evaluate the performance of the proposed model, we compared its energy expenditure (EE) prediction results with those obtained from conventional estimation methods, specifically the MET-based method (Equation (2)) and the Keytel equation (Equation (4)), both of which have been widely used in previous studies.
The MET method estimates EE based on a pre-defined MET value and body weight, requiring prior knowledge of the MET index corresponding to the activity type. The Keytel equation estimates EE using heart rate along with personal information such as gender, weight, and age.
Table 7 summarizes the EE estimation formulas presented earlier, along with the required input parameters for each method and the proposed model.
In this study, we analyzed the changes in loss values to verify the convergence behavior of the proposed DQN-based model. As shown in
Figure 5, the loss exhibited relatively large fluctuations during the early stages of training but gradually decreased over time. Eventually, the loss stabilized within the range of approximately 99,000 to 120,000, indicating that the policy network effectively learned to approximate the Q-values. Training was terminated at 35 epochs, at which point the loss was judged to have sufficiently converged.
The experiment was conducted by training the prediction model using the proposed RTEE and then validating it with data from participants who were not included in the training process. The experimental results are visualized in graphs comparing the predictions of GT EE, Keytel EE, MET EE, and our model. Additionally, the overall and activity-specific errors for each participant are presented as MAE values in a table. Furthermore, the EE prediction results of our model are compared in a graph with those provided by the Apple Watch in the WEEE dataset, and the corresponding MAE errors are likewise summarized in a table.
Figure 6a shows the prediction results for participant P03, while
Figure 6b presents the results for participant P09. For performance verification, the results are shown alongside GT EE, Keytel EE, and MET EE. GT EE represents the EE per second calculated based on Equation (1), while our model (Ours) predicts EE per second using RTEE, as proposed in this paper. Keytel EE is computed using Equation (4), and MET EE is derived from Equation (2). Examining both (a) and (b), we observe that MET EE exhibits the smallest error relative to GT EE during the initial rest period. However, the gap between GT EE and MET EE widens significantly once the activity begins. This indicates that the MET-based EE calculation method is relatively accurate during rest but tends to overestimate EE during activities, leading to a significant reduction in accuracy. Furthermore, Keytel EE consistently overestimates EE throughout the entire duration.
In contrast, our model (Ours) demonstrates the smallest error range across all activities. The detailed activity-specific and overall errors are presented below.
Figure 7 visualizes the errors of each measurement method relative to GT EE based on the experimental results in
Figure 6. Similar to the prediction results, the MET method exhibits the lowest error during the resting state, while the proposed model achieves the lowest error during active states, particularly at higher exercise intensities.
Table 8 shows the overall and activity-specific MAE errors of EE prediction results for RTEE in participants P03 and P09, along with the MAE errors of Keytel EE and MET EE. The prediction results indicate that the proposed model achieves the lowest prediction error compared to GT EE across the entire measurement period. An analysis of prediction performance by activity segment shows that during the resting state, the MET method produced the lowest error for both participants. However, as they transitioned into the active state, RTEE recorded the lowest prediction error across all activity sections except for the ‘Run2’ section of P03.
In general, while MET EE provides accurate predictions during rest, its error increases significantly during activities. Similarly, Keytel EE consistently overestimates EE across all segments. Therefore, the results demonstrate that the proposed RTEE model most accurately predicts EE across the entire measurement period.
5.2. Comparison with Apple Watch
In the previous experiments, we compared the performance of our model (Ours) with the Keytel EE and MET EE calculation methods, which rely on predefined formulas. Additionally, to evaluate how well our model performs against the latest techniques, we compared it with the EE estimates provided by the Apple Watch Series 6, one of the most widely used wearable devices. The EE data from the Apple Watch was obtained from the WEEE dataset.
Figure 8a shows a comparison between GT EE and the predicted EE from our model (Ours) during the activities of participant P03, while
Figure 8b compares the EE measured by the Apple Watch with the predicted results. The analysis excludes missing time intervals from the 30 min measurement period.
Examining
Figure 8a, we observe that when EE increases, our model's predicted EE rises accordingly and tracks the ground truth closely. In contrast,
Figure 8b shows that when EE increases sharply, the Apple Watch 6 does not reflect this increase accurately and instead predicts lower EE values, leading to substantial prediction errors. Overall, these results indicate that our model (Ours) provides more precise EE predictions than the Apple Watch. Below is a table presenting the MAE errors for both the Apple Watch 6 and our model.
Figure 9 visualizes the errors recorded by the Apple Watch and the proposed model compared to the ground truth EE. Overall, the proposed model demonstrated comparable performance to the Apple Watch during lower-intensity activities while showing smaller errors as the exercise intensity increased.
Table 9 shows the performance error for each participant by calculating the MAE error through a comparison of Apple Watch Series 6 EE and our model’s (Ours) predicted EE with GT EE. As shown in the table, both participants (P03 and P09) exhibit lower EE prediction errors with our model than with the Apple Watch 6. Notably, participant P03 achieved a prediction error of just 0.019583, indicating a very low error.
5.3. Performance Comparison According to DQN Scenario Length
In RL, performance can vary significantly depending on the scenario length setting. In this study, we conducted two performance tests for prediction accuracy, using scenario lengths of 5 min (300 s) and 1 min (60 s).
Figure 10 shows the EE prediction results from an experiment comparing performance with scenario lengths of 5 min and 1 min. 5Min EE represents the MAE error across the entire test dataset (participants P03 and P09) when trained with a 5 min scenario length, while 1Min EE represents the MAE error when trained with a 1 min scenario length. The visualization results indicate that while 1Min EE exhibits lower errors in certain areas, 5Min EE demonstrates lower overall errors across the entire dataset. The MAE errors for both scenario lengths are summarized below.
Table 10 shows the MAE error of EE prediction based on scenario length. As shown in the table, the prediction error is lower when the scenario length is set to 5 min compared to 1 min.
6. Conclusions
In this study, we proposed RTEE, a real-time energy expenditure (EE) prediction model that infers activity intensity from heart rate and applies reinforcement learning. RTEE successfully reflects the Metabolic Equivalent of Task (MET) in real time and mitigates EE overestimation at high intensities. It also leverages the Keytel method to address the variable relationship between heart rate and oxygen consumption, improving prediction accuracy during rest.
RTEE outperformed conventional methods by achieving the lowest average error across various activities, making it particularly suitable for monitoring EE in workers performing diverse tasks rather than single-type exercises. Unlike traditional MET-based approaches, which require activity-specific indices, RTEE estimates intensity coefficients directly from heart rate changes, enhancing real-time usability.
Although the model does not explicitly model the delay in reaching a steady physiological state, it partially captures transition dynamics via heart rate. Future work will focus on modeling these dynamics more precisely and improving accuracy during exercise onset.
We also acknowledge limitations related to the use of RMR (estimated via the Mifflin equation) instead of BMR and dataset dependency. To address variability and approximation errors, our model dynamically adjusts predictions through reinforcement learning. Further work will involve refining BMR–RMR modeling, expanding the dataset, and reducing measurement noise to enhance robustness and generalizability. The model’s performance is subject to the quality of the training data, which may include noise from movement or sensor errors. Future research will aim to enhance reliability by addressing these issues.