1. Introduction
Physical inactivity and unhealthy lifestyle are major contributors to the rise in cardiovascular and chronic diseases [1]. The use of self-monitoring has proven motivational for some groups of patients with chronic disease [2]. Commercial self-monitoring devices are becoming increasingly popular, not only for consumer markets but also for research and medical purposes [3,4]. The majority of consumers use self-monitoring to optimize their fitness activities, as a means of lifestyle monitoring, and for motivational purposes. Healthcare professionals use the trackers to help patients monitor their daily activity and to motivate patients for improved physical activity. The need for physical activity in vulnerable groups such as the elderly, those with chronic diseases and obese patients has been well established. Previous studies of physical activity used self-monitoring devices to correlate body mass index and physical activity in American adults, and Izawa et al. have demonstrated correlations between mental health, mortality and physical activity in Japanese heart failure (HF) patients [5,6,7].
Modern self-monitoring technologies provide readings of step detection, sleep-depth measurements and even heart rate. Even though most self-monitoring devices are not medical devices, and are thus not approved for medical purposes, the commercially available self-monitoring devices provide low-cost, user-friendly solutions. They allow users to monitor their physical activity and daily routines, often with minimal inconvenience. Several studies have focused on the accuracy of self-monitoring devices in a broad variety of setups [3,8,9,10,11,12,13,14,15]. Not surprisingly, the literature remains contradictory on this point, and accuracy is affected by many factors. Studies show that the step count accuracy of self-monitoring devices varies depending on height, use of walking aids, types of walking surfaces and walking speed [16,17].
Nighttime heart rate can be used as a diagnostic parameter and is associated with HF-related hospitalizations [18]. Known factors that influence the accuracy of heart rate measures in self-monitoring devices include motion artifacts caused by physical activity, skin color, and placement of trackers [19,20].
As the population of chronic disease patients increases, it is important to look for new ways of designing HF care and rehabilitation [21]. The availability and cost of activity trackers make them ideal for use in health care systems [22]. This project is part of a three-study setup. The overall aim of the study was to identify those devices that would be most effective for HF patients in a tele-rehabilitation program as part of the Future Patient research program. The aim of this particular study was to evaluate four devices based on step count, and two devices based on heart rate for further testing in clinical applications.
2. Materials and Methods
This study evaluates four activity trackers and one sleep tracker. The trackers were selected based on availability, battery life, price, previous experience in the research group and their capacity to provide access to an application interface [2]. The self-monitoring devices were evaluated in a two-step setup: first, step count validity was evaluated through a standardized walking test; second, pulse accuracy was evaluated during daily living. Two armbands were tested in this setup: the Fitbit Charge HR and the Garmin Vivofit 2. Both devices are equipped with an accelerometer and are able to estimate step count through the repetitive movements of the arms during walking. In addition, the Charge HR is able to estimate pulse from an optical heart rate sensor placed under the armband. The Fitbit Zip and the Fitbit One are two clip-based pedometers usually worn at the hip, in a pocket or attached to the bra. Both devices are capable of measuring step count based on an accelerometer. The Beddit sleep sensor was tested for pulse accuracy. The Beddit is placed in the bed, under the sheets at chest level, and measures mechanical impulse through ballistocardiography. An overview of all devices involved in this study is shown in Table 1.
2.1. Standardized Walking Tests
Participants in the step validity study were 22 healthy volunteers (11 female) aged 22–52 (Mean ± SD, 31.1 ± 8.03). Participants did not suffer from any walking disabilities that could lead to unnatural walking patterns. Prior to the start of the experiment, each participant signed an informed consent form. The study was reported to the local ethics committee, although it did not require ethical approval.
During the walking tests, participants wore five devices: four activity trackers and a Shimmer 3 used as reference. The activity trackers were configured with each participant's date of birth, height, and weight, and were used according to the guidelines of the manufacturer. Participants wore the Garmin Vivofit 2 and the Fitbit Charge HR, one on each wrist. To avoid bias between dominant and non-dominant use, the assignment of each armband to the dominant or non-dominant wrist was randomized. An overview of the firmware of each device is shown in Table 1. The walking tests were performed outdoors on an asphalt parking lot. A 100-m rectangular track (6 m × 44 m) was drawn on the asphalt (Figure 1a). Each participant walked the track four times, two sessions at 2 km/h and two sessions at 3.5 km/h. The two walking speeds were selected in order to reflect the average walking speed of HF patients of 2.7 km/h [23]. The number of steps measured by each activity tracker was noted from the tracker display before and after the four walking sessions. Participants were asked to stand still while the readings were noted and to maintain a normal walking pace around the track. Before starting to walk, subjects were asked to stand at one corner of the track, where the four activity trackers were attached as described in Table 1.
A researcher controlled the walking speeds by walking in front of the participants throughout the walking test. Walking speeds were monitored using a Garmin Fenix 3 GPS watch. To ensure the appropriate walking speeds, the researcher practiced his walking speed for 15 m before starting the walking test, as shown in Figure 1a. The practice walking was also used to initialize the speedometer on the Garmin Fenix 3. The 15 m distance was chosen based on the time it took the Garmin Fenix 3 to provide stable measures of walking speed.
Walking at the proper pace of 2 km/h or 3.5 km/h, the researcher crossed the starting line, and the participants were asked to follow him, staying 1 m behind. The setup of the walking test is shown in Figure 1b.
The Shimmer 3 was set up to use its internal gyroscope and was attached with an elastic band to the participant's dominant ankle. It was placed so that the z-axis of the gyroscope aligned with the transversal axis, measuring the swing phase as positive rotation. The Shimmer 3 was set to sample at 64 Hz.
In addition to the GPS, a stopwatch was started when the participant crossed the starting line and stopped at the finish line. The exact average walking speeds were then calculated.
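The average speed calculation can be sketched as follows. This is a minimal illustration, assuming that one session covers a single lap of the 100-m track; the function name and the example time are hypothetical and are not taken from the study.

```python
def average_speed_kmh(distance_m: float, elapsed_s: float) -> float:
    """Average walking speed in km/h from a distance (m) and a stopwatch time (s)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return distance_m / elapsed_s * 3.6  # 1 m/s equals 3.6 km/h


# Hypothetical example: one 100-m lap completed in 180 s.
print(round(average_speed_kmh(100.0, 180.0), 2))  # 2.0
```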
2.2. Calculating the Number of Steps Measured by the Shimmer 3
Data from the Shimmer 3 is processed through a simple threshold algorithm. Swing phases in the gait patterns were detected by setting a 100 degree/s threshold. The number of steps was then calculated as the number of swing phases on a single leg multiplied by two. A sample of the gyroscope data and the peaks detected by the algorithm is shown in Figure 2.
To ensure that the algorithm calculated the correct number of steps, all gyroscope data was also manually inspected. Any deviation between the algorithm and the manual count was evaluated by two researchers. Eighteen deviations were found and corrected; they occurred in the last step of a walking test, where the swing phase did not reach the 100 degree/s threshold.
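A threshold-based step count of this kind can be sketched as below. This is an illustration under stated assumptions rather than the implementation used in the study: a swing phase is counted each time the angular velocity rises above the threshold, and the function names and the synthetic sine-wave signal are hypothetical.

```python
import math


def count_swing_phases(gyro_z_dps, threshold_dps=100.0):
    """Count upward crossings of the angular-velocity threshold (degree/s)."""
    count = 0
    above = False
    for w in gyro_z_dps:
        if w > threshold_dps and not above:
            count += 1      # a new swing phase starts here
            above = True
        elif w <= threshold_dps:
            above = False   # swing phase has ended
    return count


def steps_from_gyro(gyro_z_dps, threshold_dps=100.0):
    """Steps of both legs: swing phases of the instrumented leg times two."""
    return 2 * count_swing_phases(gyro_z_dps, threshold_dps)


# Synthetic test signal (assumption): 10 s at 64 Hz, 0.8 strides per second,
# swing phase modelled as the positive half of a 300 degree/s sine wave.
fs = 64
signal = [300.0 * math.sin(2 * math.pi * 0.8 * n / fs) for n in range(10 * fs)]
print(steps_from_gyro(signal))  # 16
```

A swing phase that stays below the threshold is missed by this scheme, which is consistent with the need for the manual inspection described above.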
2.3. Heart Rate Validity
Nine healthy participants (five females) enrolled voluntarily in the heart rate study. Participants were aged 21–42 (26 ± 5.1). No subjects reported any current or previous heart conditions, and all subjects had normal sinus rhythms. Before the start of the experiment, each participant signed an informed consent form.
At the beginning of the study, participants received a Fitbit Charge HR armband, a Beddit sleep sensor and a tablet computer. The Fitbit Charge HR was attached to the wrist by the researcher, as described by the manufacturer. Participants were instructed to place the Beddit sensor in their bed and to use the tablet to switch it on when they went to bed and off when they woke up in the morning. Simultaneously with the two trackers, subjects also wore a Shimmer 3 three-point Electrocardiogram (ECG) monitor. Electrodes were placed on the right and left shoulders and upper right and left thighs, and the Lead II ECG was acquired at a sampling frequency of 512 Hz. Subjects were instructed to use the equipment for one night.
To calculate heart rate, the R-peaks were detected in the ECG signals by implementing a Pan-Tompkins-like algorithm. The ECG segments were filtered with a fourth-order zero-phase Chebyshev type II bandpass filter (1–40 Hz). The filtered signals were then differentiated and squared. Finally, a moving average filter with a width of 0.2 s was applied. The pulse was calculated from the distances between successive peaks in the resulting signal.
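The differentiate, square and moving-average stages can be sketched in plain Python as below. This is a simplified illustration, not the study's implementation: the bandpass stage is omitted (in practice a zero-phase filter, for example SciPy's `cheby2` combined with `filtfilt`, could precede it), a fixed relative threshold replaces the adaptive thresholds of the original Pan-Tompkins method, and all function names, the threshold value and the synthetic test signal are assumptions.

```python
import math


def detect_r_peaks(ecg, fs, window_s=0.2, rel_threshold=0.5):
    """Return sample indices of R-peaks (bandpass stage omitted in this sketch)."""
    if len(ecg) < 2:
        return []
    width = max(1, round(window_s * fs))
    # Differentiate and square to emphasise the steep QRS slopes.
    squared = [0.0] + [((ecg[i] - ecg[i - 1]) * fs) ** 2 for i in range(1, len(ecg))]
    # Trailing moving average over `width` samples (0.2 s by default).
    integrated, acc = [], 0.0
    for i, v in enumerate(squared):
        acc += v
        if i >= width:
            acc -= squared[i - width]
        integrated.append(acc / width)
    threshold = rel_threshold * max(integrated)  # fixed threshold: an assumption
    peaks, start = [], None
    for i, v in enumerate(integrated + [0.0]):   # sentinel closes a final burst
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            lo = max(0, start - width)
            # R-peak: largest raw sample in the span covered by this burst.
            peaks.append(max(range(lo, i), key=lambda k: ecg[k]))
            start = None
    return peaks


def heart_rate_bpm(peaks, fs):
    """Mean heart rate from the distances between successive R-peaks."""
    rr = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    return 60.0 * len(rr) / sum(rr)


def synthetic_ecg(fs, n_beats, bpm):
    """Hypothetical test signal: one narrow Gaussian spike per beat."""
    period = 60.0 / bpm
    out = []
    for n in range(int((n_beats * period + 1.0) * fs)):
        t = n / fs
        k = min(max(round((t - 0.5) / period), 0), n_beats - 1)
        out.append(math.exp(-0.5 * ((t - (0.5 + k * period)) / 0.01) ** 2))
    return out


ecg = synthetic_ecg(fs=512, n_beats=12, bpm=75)
peaks = detect_r_peaks(ecg, fs=512)
print(len(peaks), round(heart_rate_bpm(peaks, fs=512)))  # 12 75
```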
To eliminate the errors caused by poor signal quality, the ECG signal was assessed manually. By comparing the pulse calculations with the raw ECG signal, only those segments without artifacts and with clear QRS complexes and R-peaks were selected for the analysis. All segments had a minimum duration of 1 h. The total duration of all ECG recordings was 97.46 h.
4. Discussion
The aim of this study was to validate devices on step count and heart rate using healthy subjects. The overall aim was to identify the most sustainable devices for HF patients in a tele-rehabilitation program.
All activity trackers showed larger percentage errors during the 2 km/h walking test than during the 3.5 km/h walking test. Both the average systematic error and the standard deviations were affected by the reduction in walking speed: the Fitbit Charge HR, the Fitbit Zip, and the Garmin Vivofit 2 all showed standard deviations above 30.4% during the 2 km/h walking test, making their usability extremely limited in this scenario. Although gait patterns of healthy volunteers may differ from the gait of the elderly or chronically ill groups, the previously reported pattern of reduced step accuracy at lower gait speeds was also found in this study [10]. The gait speeds used in this study were selected on the basis of the average gait speeds of HF patients. However, the two gait speeds of 2 km/h and 3.5 km/h did not represent all HF patients. Based on results from Pulignano et al., the slowest group of HF patients walked at an average of 1.8 km/h. Moreover, results from previous studies suggest that the step detection accuracies of self-monitoring devices are reduced substantially at extremely slow gait speeds (<1.8 km/h). Hence, self-monitoring devices may not be the most appropriate option for monitoring physical activity in extremely slow-walking groups [20,21].
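The error figures discussed here are of the form mean ± standard deviation of a percentage error. A minimal sketch of such a summary is given below, assuming the error is defined as the device count minus the reference count, relative to the reference; this definition, the function names and the example counts are assumptions and are not taken from the study.

```python
from statistics import mean, stdev


def percentage_errors(device_steps, reference_steps):
    """Signed step-count error of the device relative to the reference, in percent."""
    return [100.0 * (d - r) / r for d, r in zip(device_steps, reference_steps)]


def summarize(errors):
    """Average systematic error and sample standard deviation."""
    return mean(errors), stdev(errors)


# Hypothetical counts for three walking sessions (not study data).
errors = percentage_errors([150, 180, 120], [200, 200, 200])
avg, sd = summarize(errors)
print(f"{avg:.1f}% ± {sd:.1f}%")  # -25.0% ± 15.0%
```

A negative mean indicates that the device undercounts on average, while a large standard deviation indicates that individual sessions deviate strongly from that average.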
The reliability of the Fitbit One found in this study corresponds with the results reported in Simpson et al. [24]. Simpson et al. tested the Fitbit One with elderly subjects at gait speeds ranging from 1.08 km/h to 3.24 km/h. Simpson et al. also reported that a waist-worn Fitbit One may be unable to detect steps at speeds slower than 1.8 km/h. The Fitbit Zip showed an error of −1.1% ± 5.8% steps during the 3.5 km/h walking test. However, the plot in Figure 6a shows that the results were affected by outliers; nonetheless, the result is similar to the findings of Ferguson et al. and Kooiman et al., who found that the Fitbit Zip was highly valid for measuring step count in a laboratory setting at a 4.8 km/h gait speed and in free-living conditions [25,26]. During the 2 km/h walking test, the error of the Fitbit Zip increased to −22.9% ± 33.3%, a result similar to the treadmill study by Femina et al., who found an error of −17.27% compared to the expected steps [27]. No previous study was found evaluating the step count precision of the Fitbit Charge HR and the Garmin Vivofit 2 during slow walking speeds. The plots in Figure 3a and Figure 4a reveal that the results from the Fitbit Charge HR and the Garmin Vivofit 2 were affected by outliers during the 3.5 km/h walking test. These outliers contribute to an increase in the standard deviations and may explain the relatively high standard deviations as compared to the Fitbit One. To our knowledge, no previous study has addressed the issue of outliers in step detection for any of the self-monitoring devices evaluated in this study.
The modified Bland-Altman plots of the pulse measurements revealed that the Fitbit Charge HR underestimated pulse compared to the ECG measurements. These findings are supported by the work of Jo et al., who further conclude that high pulse measurements in particular are underestimated by the Fitbit Charge HR [28]. No previous study was found evaluating the heart rate precision of the Beddit sleep sensor.
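For reference, the quantities behind a standard Bland-Altman analysis can be sketched as follows. The modification applied to the plots in this study is not described in this section, so the sketch covers only the conventional form (mean difference with 95% limits of agreement); the function name and the example readings are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(device, reference):
    """Bias and 95% limits of agreement between paired measurements."""
    diffs = [d - r for d, r in zip(device, reference)]
    bias = mean(diffs)                # negative bias: device underestimates
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


# Hypothetical paired pulse readings in beats per minute (not study data).
bias, lower, upper = bland_altman([60, 70, 80], [62, 71, 83])
print(f"bias {bias:.2f}, limits {lower:.2f} to {upper:.2f}")
```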
The number of participants in the pulse accuracy calculations was not large enough to fully generalize the devices’ capabilities for measuring pulse. Further studies are needed to fully validate the pulse accuracies of these devices, taking into account both the physical activity and skin color of participants. Results from the pulse measurements of the Beddit sleep tracker and the Fitbit Charge HR are not directly comparable, as the Fitbit Charge HR measures all-day pulse, while the Beddit measures pulse only during nighttime. As a result, the accuracy measures cannot be used to compare the two devices. However, the results can be used to help health care professionals become aware of the accuracies of the various self-monitoring devices and of the pulse accuracies before considering self-tracking devices for monitoring in a clinical context.
To ensure accurate heart rate calculations, the ECG was segmented. This meant that the noisier parts of the ECGs were discarded. It cannot be ruled out that this method may have favored calm periods of the ECG readings, when the participants were at rest. It follows that the comparisons of the Fitbit Charge HR and the ECG may be slightly biased towards the resting periods of the participants.
One issue that arises from the use of these devices in a clinical context is that the actual algorithms behind the heart rate estimate are unavailable. The window size picked for averaging the ECG pulse recordings was selected based on the update frequency of the pulse readings in the application interface. This method may not be the same method as that used by Fitbit and Beddit.
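Averaging the ECG-derived pulse within fixed windows can be sketched as below. This is an illustration only, assuming non-overlapping windows aligned to the start of the recording; the function name and the example values are hypothetical, and the devices' own averaging may differ.

```python
def window_average(times_s, values, window_s):
    """Average values in consecutive, non-overlapping windows of window_s seconds.

    Returns a dict that maps each window start time to the mean of its values.
    """
    sums = {}
    for t, v in zip(times_s, values):
        idx = int(t // window_s)            # index of the window containing t
        total, n = sums.get(idx, (0.0, 0))
        sums[idx] = (total + v, n + 1)
    return {idx * window_s: total / n for idx, (total, n) in sorted(sums.items())}


# Hypothetical pulse values (beats per minute) at four time points, 1 min windows.
print(window_average([0, 30, 60, 90], [60, 70, 80, 90], 60))  # {0: 65.0, 60: 85.0}
```

The same function covers the 5 min case by passing a window of 300 s.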
This study presents evaluations based only on step count and heart rate validity. However, self-monitoring devices are often capable of detecting additional variables, such as energy expenditure, distance walked and sleep depth. Evaluating the measurements of self-monitoring devices is important for their future use in clinical applications. However, a full evaluation of patient perceptions and ease of use is needed in order to assess the usability of self-tracking devices in a clinical context [29]. Further evaluations of the self-monitoring devices will be undertaken as part of the Future Patient trial (www.labwelfaretech.com) in order to fully validate them for use in chronically ill HF patients.
5. Conclusions
This study shows that some self-monitoring devices are better suited than others for measuring step count at slow walking speeds. No device showed an absolute systematic error above 1.5% during the 3.5 km/h gait speed. However, the standard deviations were highest in the two wrist-worn devices. During the 2 km/h gait speed, the Fitbit One and the Garmin Vivofit 2 showed the lowest average systematic error percentage; however, the standard deviations of the Fitbit One were significantly lower, making this the most reliable candidate for use in slow-walking populations.
When averaging the ECG pulse within 1 min and 5 min windows, the error percentages of pulse estimates found by the Fitbit Charge HR were −3.42% ± 7.99% and those of the Beddit sleep tracker were −3.27% ± 4.60%. The findings reveal the current functionality and limitations of commercially available self-tracking devices, and point towards a need for future research in this area.