Assessing Heart Rate Using Consumer Technology Association Standards

It is difficult for developers, researchers, and consumers to compare results among emerging wearable technology without using a uniform set of standards. This study evaluated the accuracy of commercially available wearable technology heart rate (HR) monitors using the Consumer Technology Association (CTA) standards. Participants (N = 23) simultaneously wore a Polar chest strap (criterion measure), Jabra Elite earbuds, Scosche Rhythm 24 armband, Apple Watch 4, and Garmin Forerunner 735 XT during sitting, activities of daily living, walking, jogging, running, and cycling, totaling 57 min of monitored activity. The Apple Watch mean bias was within ±1 bpm, and mean absolute percent error (MAPE) was <3% in all six conditions. Garmin underestimated HR in all conditions except cycling, and MAPE was >10% during sedentary, lifestyle, walk-jog, and running. The Jabra mean bias was within ±5 bpm for each condition, and MAPE exceeded 10% for walk-jog. The Scosche mean bias was within ±1 bpm and MAPE was <5% for all conditions. In conclusion, only the Apple Watch Series 4 and the Scosche Rhythm 24 displayed acceptable agreement across all conditions. By employing CTA standards, future developers, researchers, and consumers will be able to make true comparisons of accuracy among wearable devices.


Introduction
The International Data Corporation estimates a 22.4% compound annual growth rate of wearable technology, equating to approximately 489.1 million units shipped globally in 2023, including smartwatches (22.3%), wristbands (14.3%), earwear (56.0%), and other wearable technology (7.5%) [1]. These devices are popular for day-to-day life as well as potential use in the health care industry [2], highlighting the importance of a standard of accuracy testing for these devices.
Standards that outline protocols and validation criteria for wearables have recently been presented by the Consumer Technology Association (CTA) for step counting [3], sleep [4], and heart rate [5]. These standards provide guidance for researchers to evaluate research-grade devices as well as commercial devices used by the lay public. Since the CTA released its standards for protocol and validation, researchers have begun to acknowledge them in their designs for step count [6][7][8][9] and HR [10][11][12]. However, few studies have actually implemented these standards [6].
Recent reviews of wearable technology have referred to or recommended the use of CTA's standards in validation research [13][14][15][16][17]. These CTA standards attempt to reconcile potential inaccuracies in current wearable heart rate monitor studies (e.g., a lack of investigation of diverse skin types [18]) and give common ground to compare the accuracy of devices.
Technicians informed participants of the duration and characteristics of the protocol, as well as when each section of the test began and ended. Per the CTA protocol [5], the participant began seated quietly, refraining from engaging with external stimuli. Participants completed each section of the CTA protocol during one session (Figure 2). Technicians adhered to the timing and intensities indicated in the CTA protocol for the sedentary, lifestyle activities, walking, jogging, running, and cycling sections. For the lifestyle activities section, participants completed the two minutes of full-body activities of daily living (9:30-11:30): folding laundry and simulated grocery shopping. Jogging/running was completed on a treadmill (Life Fitness, 95 Ti, Franklin Park, IL, USA). When cycling, participants kept their hands on the bicycle ergometer (Monark, Ergomedic 874E, Varberg, Sweden) handlebars in a natural position. Exercise intensities were defined by metabolic equivalent task (MET) level, ratings of perceived exertion (RPE), and/or HR max. Moderate intensity was defined as 3.0-6.0 METs, RPE 12-13 (Borg scale 6-20), or HR max of 64-76%. Vigorous intensity was defined as >6.0 METs, RPE of 14-17, or HR max of 77-95%.
The Jabra, Scosche, and Polar data were obtained from the PerformTek app in one-second increments. The Apple and Garmin data were obtained from their respective apps, but these data were not provided in one-second increments. Rather, data were provided at variable time points, sometimes every second, sometimes every three seconds, and sometimes with long breaks of more than a minute between data points. Data from each device were aligned according to timestamp, with the start time matched to that of the Polar chest strap, following the guidelines provided by the CTA protocol [5].
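To illustrate the kind of timestamp alignment described above, the following is a minimal sketch, not the procedure used in this study. It assumes timestamps are expressed as seconds from a shared start time and pairs each one-second reference timestamp with the most recent device reading, treating readings older than a hypothetical `max_gap_s` threshold as missing. The function name, the threshold, and the carry-forward rule are all illustrative assumptions.

```python
from bisect import bisect_right

def align_to_reference(ref_times, dev_times, dev_hr, max_gap_s=5):
    """Pair each reference timestamp with the latest device reading.

    Assumptions (not from the study): times are seconds from a common
    start, dev_times is sorted ascending, and a reading older than
    max_gap_s seconds is treated as missing (None).
    """
    aligned = []
    for t in ref_times:
        # Index of the last device sample recorded at or before time t.
        i = bisect_right(dev_times, t) - 1
        if i >= 0 and t - dev_times[i] <= max_gap_s:
            aligned.append(dev_hr[i])
        else:
            aligned.append(None)
    return aligned

# Hypothetical data: reference sampled every second, device irregularly.
ref_times = list(range(10))
dev_times = [0, 3, 4, 9]
dev_hr = [70, 72, 73, 80]
aligned = align_to_reference(ref_times, dev_times, dev_hr, max_gap_s=2)
print(aligned)  # [70, 70, 70, 72, 73, 73, 73, None, None, 80]
```

Leaving long gaps as missing rather than interpolating across them keeps a device from being credited with readings it never produced. Other choices, such as nearest-sample matching, are equally plausible.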
The data were then graphed to show any inherent delays with a device and aligned appropriately and analyzed. Additionally, almost all commercially available wearables have data dropout that may be due to damaged devices, failure of the tester to apply the testing devices properly, failure of the tester and/or the participant to follow the protocol, as well as other reasons that are inexplicable but are apparent when visualizing the data. It is common practice in research that data analysts make an honest effort to include all valid data acquired regardless of agreement between the reference device and test device, and report the percentage of data removed and retained per device [10]. Data were removed for all devices if the Polar chest strap registered a zero for HR. In addition, each device was also examined, and data were removed from analysis if there was a registered zero as the HR for a device or a locked HR. The total number and percentage of data points removed for each device were calculated. The following analyses were conducted after all zeroes were removed from the data sets.
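The zero-removal step can be sketched as follows. This is an illustrative example under stated assumptions rather than the authors' actual processing: it assumes reference and device readings have already been paired sample by sample, and the function name `drop_invalid` is hypothetical. Detection of a "locked" HR is not shown because the text does not define a rule for it.

```python
def drop_invalid(reference, device):
    """Remove paired samples where either reading is zero or missing.

    Returns the retained (reference, device) pairs, the number of
    samples removed, and the percentage of samples removed.
    """
    pairs = list(zip(reference, device))
    # A reading of 0 or None is treated as invalid for either device.
    kept = [(r, d) for r, d in pairs if r and d]
    n_removed = len(pairs) - len(kept)
    pct_removed = 100.0 * n_removed / len(pairs) if pairs else 0.0
    return kept, n_removed, pct_removed

# Hypothetical readings in bpm; zeros mark dropouts.
reference = [0, 62, 63, 64, 65]
device = [61, 62, 0, 64, 66]
kept, n_removed, pct_removed = drop_invalid(reference, device)
print(kept)         # [(62, 62), (64, 64), (65, 66)]
print(n_removed)    # 2
print(pct_removed)  # 40.0
```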
Each device under test was compared only to the Polar chest strap. Accuracy was assessed for each device within each activity via mean bias, mean absolute percent error (MAPE), Bland-Altman analysis, an equivalence test, and intraclass correlation coefficient (ICC) using the two-way mixed model and absolute agreement in IBM SPSS (IBM Statistics version 26.0, Armonk, NY, USA). This analysis is consistent with previous literature employing multiple testing methods when aiming to prove that data are the same [6,14,20,21]. Equivalence tests were conducted using the two-one-sided-tests method with Jamovi statistical software [22]. Equivalence was determined with upper and lower boundaries set at 5% of the mean heart rate from the Polar chest strap for each activity. MAPE and Bland-Altman analyses were conducted using Microsoft Excel (Redmond, WA, USA). A device was considered accurate if its MAPE was <10%, its ICC was >0.90, and it was equivalent to the Polar chest strap.
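As a rough illustration of the descriptive agreement metrics named above, the sketch below computes mean bias, MAPE, Bland-Altman 95% limits of agreement, and the ±5% equivalence bound from paired readings. It rests on several assumptions not stated in the text: bias is taken as device minus reference, the limits of agreement use the simple bias ± 1.96 SD form without any adjustment for repeated measures within participants, and the function and key names are hypothetical. The ICC and the two-one-sided-tests procedure are not reproduced here.

```python
from statistics import mean, stdev

def agreement_stats(reference, device):
    """Descriptive agreement between paired device and reference HR (bpm)."""
    diffs = [d - r for r, d in zip(reference, device)]
    bias = mean(diffs)  # assumed sign convention: device minus reference
    # Mean absolute percent error relative to the reference reading.
    mape = 100.0 * mean([abs(d - r) / r for r, d in zip(reference, device)])
    sd = stdev(diffs)   # sample standard deviation of the differences
    loa = (bias - 1.96 * sd, bias + 1.96 * sd)
    # Equivalence bound: 5% of the mean reference heart rate.
    bound = 0.05 * mean(reference)
    return {"bias": bias, "mape": mape, "loa": loa, "equiv_bound": bound}

# Hypothetical paired readings, zeros already removed.
reference = [100, 100, 100, 100]
device = [98, 102, 100, 104]
stats = agreement_stats(reference, device)
print({k: v for k, v in stats.items() if k != "loa"})
print(tuple(round(x, 2) for x in stats["loa"]))
```

A device would then be screened against the thresholds stated above (MAPE <10%, ICC >0.90, and equivalence), with the ICC and equivalence test computed in dedicated statistical software.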

Figures 3-8 show Bland-Altman plots of heart rate for each device during the six conditions of activity compared to the Polar H7. Tables 1-4 present the results for the Apple Watch Series 4, Garmin 735 XT watch, Jabra Elite earbuds, and Scosche Rhythm 24 armband, respectively.
There were several instances in which the Polar chest strap registered a heart rate of zero bpm. Data points were removed during the sedentary condition (875, 10.6%), lifestyle condition (499, 2.8%), walking (8, 0.1%), walk/jog (3, 0.02%), running (24, 0.2%), and cycling (132, 0.8%). The Apple and Scosche devices did not have any zeroes registered as HR, so no data points were removed from analysis for these devices, as indicated in Table 1. Data were removed from analysis for the Garmin watch for the lifestyle condition (88, 4.5%), and the recording for one participant malfunctioned during the lifestyle condition. Additionally, the recordings for two participants malfunctioned during the walking, walk/jog, running, and cycling conditions. Thus, analysis of the Garmin watch was conducted with 22 participants for the lifestyle condition and 21 participants for walking, walk/jog, running, and cycling. The number and percentage of data points removed from the Garmin data are shown in Table 2. The Jabra earbuds registered zero as a HR, as shown in Table 3, during the sedentary condition (48, 0.6%), lifestyle condition (225, 1.3%), walking (178, 1.8%), walk/jog (739, 4.5%), running (333, 3.4%), and cycling (295, 1.8%).
Table 1 shows the results of the Apple Watch Series 4. This device had the second-fewest data points of the four devices tested. The Apple device had a mean bias within 1 bpm of the Polar chest strap, MAPE of less than 3%, and an ICC of >0.96 in all six conditions. The Apple Watch performed the worst in the lifestyle activity (2.95% MAPE and ICC = 0.963) and best during cycling (0.62% MAPE and ICC = 0.998). Apple was equivalent to the Polar chest strap for all six testing conditions (p < 0.001).
Table 2 provides the results of the Garmin 735 XT. The Garmin device had the fewest data points of all four devices. The device tended to underestimate heart rate in all conditions except cycling. MAPE was greater than 10% during sedentary, lifestyle, dynamic walk-jog, and running. Additionally, the ICCs were below 0.90 for all conditions and the limits of agreement were wide. The Garmin tested as not equivalent to the Polar chest strap for the sedentary condition (p = 1.00), but equivalent for the remaining five conditions (p = 0.006 for lifestyle, p < 0.001 for walking, dynamic walk-jog, running, and cycling).

The results of the Jabra Elite earbud are shown in Table 3. The Jabra had the second highest number of data points available for analysis. The mean bias was within ±5 bpm for each condition, except cycling, which was higher. MAPE exceeded 10% for dynamic walk-jog, was 7-8% for lifestyle, walking, running, and cycling, and was below 5% for the sedentary condition. ICCs were below 0.90 for lifestyle and walking, equal to 0.90 for dynamic walk-jog and cycling, and greater than 0.92 for sedentary and running. The Jabra was equivalent to the Polar chest strap for all six conditions (p < 0.001).
Table 4 shows the results of the Scosche Rhythm 24 armband. The Scosche device provided the highest number of data points of the four devices tested. Mean bias was within ±1 bpm for all six conditions, and MAPE was less than 2% for dynamic walk-jog, running, and cycling. MAPE was highest for the lifestyle condition (4.12%). The ICCs were 0.94 or higher for all conditions. The Scosche was equivalent to the Polar chest strap for all six conditions (p < 0.001).

Discussion
This study aimed to evaluate the agreement of commercially available wearable technology heart rate monitors to a criterion measure using the established CTA standards [5]. It was hypothesized that each device would return valid heart rate measures during rest, lifestyle activity, walking, jogging, and cycling. Our results indicate that the Apple Watch Series 4 and the Scosche Rhythm 24 display acceptable agreement with a criterion chest strap heart rate monitor across all conditions tested. The Jabra Elite earbuds were found to have acceptable agreement in the sedentary, running, and cycling conditions, whereas the Garmin Forerunner 735 XT did not satisfy our definition of validity for any condition tested.
Our research group has identified that there is often considerable lag between the release of wearable technology devices and independent investigations that report validity [14]. Toward this end, we are aware of only one other investigation that has determined validity of heart rate measures in the Apple Watch Series 4 [23]. The investigation was conducted in post-operative cardiac surgery patients with no exercise component and reported that agreement was higher in patients in atrial fibrillation (rc = 0.86) compared to those who were not (rc = 0.64) [23]. To our knowledge, the current investigation represents the first report of the Apple Watch Series 4 with respect to heart rate validity during any form of exercise. Our results indicate acceptable heart rate agreement of the Apple Watch Series 4 across conditions including sedentary, lifestyle activities, walking, jogging, running, and cycling. Several other reports have found acceptable heart rate agreement for activities such as steady state treadmill exercise of increasing speed up to 6 mph [24], exercise blocks progressing from walking (3.0 and 4.0 mph) to running (6.0 and 7.5 mph) to cycling (100 W and 175 W) [25], and initial stages of the Bruce Treadmill test and stages of the 25-watt cycle protocol [26]. It should be noted that investigations into original versions of the Apple Watch have determined decreasing levels of validity as the intensity of exercise increases [27,28]. Further investigations are necessary to reveal whether this phenomenon applies to subsequent versions of the Apple Watch.
The Scosche Rhythm 24 is designed to be worn on the forearm, and investigations reporting its validity during exercise are beginning to emerge. One investigation using various walking and running speeds on the treadmill determined acceptable agreement between a previous version of the Scosche (Rhythm) arm strap and a chest strap monitor (overall r = 0.93, MAPE range = 2.22% to 6.67%) [29]. Additionally, no difference was reported in heart rate compared to a chest strap when participants performed walking and running on a treadmill and in an unrestricted setting (p > 0.05) [21]. The current investigation is in line with these previous reports in that the Scosche Rhythm 24 displayed acceptable agreement across all conditions tested. However, other investigations have reported lower validity measures during specific exercise applications. Although acceptable heart rate agreement with an ECG criterion device was reported at rest (rc = 0.93), cycle (rc = 0.84), and treadmill exercise (rc = 0.92), the Scosche Rhythm+ was not deemed valid while completing elliptical training (rc = 0.41 with arms, 0.27 without arm movement) [19]. During variable intensity trail running, better agreement from the Scosche Rhythm+ was observed during downhill running (MAPE = 3.8%, bias = 1.9 bpm, rc = 0.885) than when running was at a generally positive incline (MAPE = 6.2%, bias = 3.9 bpm, rc = 0.699) [21]. Thus, it is possible that, similar to what has been noted in the Apple Watch above, activities performed at greater intensities may result in decreased agreement in heart rate measures obtained from the Scosche armband compared to a criterion measure.
Earbud-based heart rate investigations are beginning to emerge in the exercise validity literature. Acceptable agreement has been reported during treadmill-based graded exercise testing with a prototype earbud device (R² = 0.98) [30], certain resistance training exercises with Bose SoundSport Headphones (leg curl MAPE range = 4.46% to 6.48%) [31], as well as during treadmill and high intensity training exercises with the Jabra Pulse earbuds (MAPE = 2.48%, rc = 0.943; MAPE = 3.53%, rc = 0.861, respectively) [6]. On the other hand, graded exercise testing on a cycle ergometer resulted in heart rate validity that decreased as exercise intensity increased (MAPE at 50 W = 6.4%, MAPE at 200 W = 15.42%) [31]. Additionally, the Jabra Pulse earbud device tended to have less agreement compared to ECG when heart rate was above 100 bpm in patients with cardiac diseases and in participants with atrial fibrillation (r² = 0.434) [32]. Finally, the Jabra Elite Sport earbud device was determined to have poor agreement compared to a criterion chest strap during trail running at variable intensities of exercise (MAPE = 21.3%, rc = 0.384, ICC = 0.395) [21]. In the current investigation, acceptable agreement for the Jabra Elite earbuds was observed in the sedentary, running, and cycling conditions but not during lifestyle tasks, walking, or the dynamic walk-jog. As earbud devices have shown varied results in returning accurate heart rate values, further investigations should be directed toward determining which factors negatively impact recorded measures. Unlike the previous devices discussed, in which greater intensity appears to impact validity (Apple Watch, Scosche Rhythm), it is possible that greater engagement in active movement may decrease agreement with a criterion measure.
For a device to be considered valid in the current investigation, it had to meet the predetermined thresholds in every statistical test employed. Toward this end, the Garmin Forerunner 735 XT was not considered valid for any condition tested. Several Garmin Forerunner models have been tested in the exercise validity literature specific to heart rate. The Garmin 225 displayed large limits of agreement (−32.53 to 29.40 bpm) when compared to ECG during a walking protocol that included grades up to 8% [33], and large MAPE values in a protocol that consisted of walking and running (MAPE range = 9.88% to 27.38%) [34]. The Garmin 235 was found to have acceptable heart rate agreement during cycling at 150 W (Rho = 0.889), but low agreement during lower intensity cycling (50 W Rho = 0.269, 100 W Rho = 0.462) [35]. Additionally, heart rate measures in the Garmin 235 were considered acceptable when participants performed treadmill exercise up to 6 mph (Absolute percent difference = 6.1%) and stationary cycle exercise up to 125 W (Absolute percent difference = 4.6%), but not while completing elliptical exercise (Absolute percent difference = 13.7%) [19]. To our knowledge, we are the first to report heart rate validity in the Garmin Forerunner 735 XT. It appears that improvements must be made to increase the heart rate sensing capability in these devices.
It is unknown why the Polar chest strap returned readings of zero. It was most likely due to an issue with contact between the sensor and the skin, but greater investigation into the specifics was outside the scope of this study. All protocols were followed, including the use of conducting gel with the Polar chest strap. All devices were monitoring HR prior to recording data.

Conclusions
The purpose of this investigation was to determine heart rate validity of commercially available wearable technology devices using the established CTA standards [5]. The main conclusion of the current study is that only the Apple Watch Series 4 and the Scosche Rhythm 24 display acceptable agreement across all conditions. Although we hypothesized that every device would return valid heart rate measures during rest, lifestyle activity, walking, jogging, and cycling, that was not the case. Although activity type could not be synchronized, it did not influence heart rate of the Polar, Scosche, or Jabra data and did not appear to influence Apple data. Future research may investigate whether heart rate data from the Garmin device are influenced by activity type. As wearable technology continues to advance, future investigations should employ CTA standards so that developers, researchers, and consumers can make true comparisons of accuracy among wearables.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author upon Institutional Review Board approval. The data are not publicly available due to restrictions of privacy and ethics.

Conflicts of Interest:
The authors declare no conflict of interest.