1. Introduction
Swimming is a sport, like many others, where time over a given distance (i.e., speed) is the main performance indicator. During training sessions, various speed-related parameters, such as race pace, maximal aerobic velocity, critical speed, and others, are used for monitoring training intensity [
1,
2]. However, when it comes to monitoring or evaluating training intensity in real time, land-based sports can rely on a larger variety of physiological or biomechanical parameters. This is due to the larger spectrum of wearable technologies available for such purposes (for example, training load by GPS, oxygen consumption by spiroergometers, etc.), resulting in numerous ways of in situ or a posteriori monitoring of physical effort during sports training or even during competition.
Portable heart rate (HR) or oxygen consumption (VO₂) measuring devices can be worn in ecological scenarios for a live assessment of the physiological demands of many activities, such as running [
3], cycling [
4], tennis [
5,
6], or soccer [
7]. For HR measurement, on land one can rely on valid and reliable HR chest straps that have been used for decades and are deemed the portable HR measurement gold standard [
8,
9,
10]. While these chest straps use electrocardiography measures (i.e., they capture the electrical activity of the heart to detect the QRS complex), other portable sensors, such as built-in wristwatch sensors and other multi-site sensors, use photoplethysmography (PPG). This technology detects the subtle changes in light intensity caused by pulsatile changes in blood volume in the vessels under the skin, the same changes that are felt as a pulse. Although the measurement principles are completely different, these devices have shown relative robustness in measuring HR [
11,
12]. However, these assessments of measurement agreement take place almost exclusively in terrestrial environments, leaving it unclear whether the same technology would be equally accurate in aquatic environments.
In water-based environments, monitoring training or competition performance live based on popular physiological parameters such as HR or VO₂ becomes impractical. Not only is it essential to overcome the obvious incompatibility between electronic components and water, but there is also the problem of data transmission. Almost all wearable devices transmit to a receiver using technologies such as Bluetooth or ANT+, which, despite their advantages, do not transmit reliably through water. For this reason, some sensors now offer built-in memory, so swimmers can record their training effort and analyze it afterwards. Nevertheless, the saved values still need to be accurate. The body displacement through the water during actual swimming can lead to water passing between the PPG sensors and the skin, potentially impairing the readings [
13]. Studies have shown that the pressure between the PPG sensor and the skin influences the reading accuracy, thus the sensor cannot be worn too loosely or tightened to the point of discomfort but requires a moderate pressure, highlighting the importance of full and constant skin contact [
14]. Even chest straps might have trouble getting accurate readings during swimming, because the increased water drag after the start and after the push-off from the wall at each turn might displace the strap. Swimmers could bypass this issue by wearing a swimsuit that covers the strap, but this might alter their swimming mechanics and/or buoyancy.
Recently, an alternative approach was developed for one of the PPG sensors available on the market. This sensor (OH1+ and Verity Sense, Polar Electro, Finland) is supplied with a clip that allows it to be attached to the swimming goggles strap and to read HR from the temple. Given the placement site, the sensor can be covered with a swimming cap, which, in theory, might overcome the difficulties mentioned above. Nevertheless, it is still an a posteriori approach since the sensor saves the data to be analyzed later.
A couple of studies have conducted experiments using this free PPG sensor in swimming under different conditions: one has evaluated the agreement between a temple- and an arm-worn sensor with a chest strap in an intermittent and progressive protocol [
15]. However, it remains unclear if different intensities would jeopardize the readings or the analyses. The other study reported the validity of the devices using the mean values of minimum HR, maximum HR, and mean HR within different intensity zones [
13]. The measure of agreement was presented for the whole test, and the large limits of agreement reported (around 25 and −27 for the upper and lower limits, respectively) leave it unclear exactly when and why the lack of accuracy occurred.
The literature reports mixed findings about exercise intensity and the accuracy of optical sensors in on-land measurements. It has been described that the accuracy rises alongside the exercise intensity [
8,
13,
16], probably due to the increased blood flow. However, at rest it tends to be more accurate than at lower intensities [
17,
18,
19]. On the other hand, there are reports of high exercise intensity being negatively related to accuracy [
8,
17,
20,
21], probably due to increased arm swing. Thus, the aim of the present study was to assess the agreement of two different optical devices for HR monitoring with the chest strap while swimming and to check for bias in different intensities directly from rest to maximum intensity and back to rest again. This analysis will help build the body of knowledge and provide a comprehensive understanding of how to interpret possible biases and, ultimately, help monitor, evaluate, and prescribe exercise intensity with rigor. It was hypothesized that the temple sensor would be an accurate method for HR assessment in swimming and would outperform the wristwatch. By confirming the hypothesis, the present study can also provide the user with guidance on which device to use.
2. Materials and Methods
The sample consisted of 30 participants (20 men and 10 women) attending a sports course at a local university (
Table 1). The students were randomly chosen from a pool of 52. All tests took place in the same lane of the same pool at the same time of day (morning period). The tests were held on two consecutive days, with the water temperature and relative air humidity reported at 28.25 ± 0.35 °C and 79.0 ± 1.41%, respectively. All participants gave their informed consent to participate in the present study after being informed of the study’s design and potential risks. The exclusion criteria included any recent cardiovascular or respiratory medical event that might impair the results or increase the study’s potential risks. The study was approved by the ethics committee of the institution. All procedures were in accordance with the Declaration of Helsinki regarding human research.
2.1. Protocol
Tethered swimming was used for this study. The participants were tied by the waist to the starting block with a static rope. Although a few concerns have been raised throughout the years about the differences between tethered swimming and free swimming, the purpose of this study was not to analyze swimming technique but to compare the HR readings of different devices at different swimming intensities. Furthermore, tethered swimming was implemented to achieve continuous arm movements while swimming and to mitigate the water flow between the HR sensors and the skin. The subjects performed an incremental step test that consisted of three 30 s bouts at different self-selected paces followed by a 1 min rest period. The aim of this approach was to provide a quick rise in HR at the beginning of the test, followed by a period of regular increase, a steep decrease right after the end of the effort, and finally, a progressively slower decrease during the resting period. All of the tests took place in the lane adjacent to the pool deck to provide a comfortable passive rest after the test. The sign for the beginning of the test and the end of each step (and, therefore, the beginning of the next) was given through whistling. Prior to the first whistle, the subjects were in a horizontal position, sculling, keeping the rope stretched. After the first whistle, the subjects were instructed to keep a slow, comfortable pace for 30 s. Next, another whistle marked the beginning of the second 30 s period, where the subjects now swam at a moderate self-selected pace. After the third whistle, the subjects performed an all-out 30 s front crawl. A final fourth whistle was given for the subjects to stop and immediately grab the side wall, put their feet on the wall’s small step, and rest as relaxed as possible with their arms under the water. The test ended after the 1 min rest period.
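The timing of the step test described above can be expressed as a small lookup, which is convenient when per-second HR samples later have to be assigned to a step. The following is only an illustrative sketch: the step names and the `step_for` helper are hypothetical labels introduced here, and the boundaries (three 30 s bouts followed by 1 min of rest) are taken from the protocol text.

```python
# Step boundaries in seconds from the first whistle (from the protocol text).
# The step names are hypothetical labels chosen for this sketch.
STEPS = [
    ("slow", 0, 30),       # first whistle: slow, comfortable pace
    ("moderate", 30, 60),  # second whistle: moderate self-selected pace
    ("all_out", 60, 90),   # third whistle: all-out front crawl
    ("rest", 90, 150),     # fourth whistle: 1 min passive rest at the wall
]


def step_for(elapsed_s):
    """Return the protocol step for a time offset, or None outside the test."""
    for name, start, end in STEPS:
        if start <= elapsed_s < end:
            return name
    return None


# Label a few sample offsets (seconds since the first whistle).
labels = [step_for(t) for t in (0, 29, 30, 75, 90, 149, 150)]
print(labels)
```

Half-open intervals are used so that a sample falling exactly on a whistle belongs to the step that is starting rather than the one that is ending.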
2.2. Data Collection
Before entering the pool, the subjects were equipped with a chest strap HR monitor (H10, Polar, Kempele, Finland), a wristwatch (Pacer Pro, Polar, Kempele, Finland), and a free optical sensor (OH1+, Polar, Kempele, Finland). For the present study, both the wristwatch and the free optical sensor were tested against the chest strap. It was assumed that the chest strap measured real HR values. Chest straps manufactured by this brand have been used for decades as a valid and reliable method for assessing HR [
22]. Given their superior performance in detecting the QRS complex [
9,
10], these chest straps are often used as the “gold standard” to test the accuracy of other sensors [
9,
10]. Moreover, the model used in the present study has specifically been developed and validated for assessing R–R intervals [
10], and is thus more accurate than the typical QRS complex detection provided by previous devices. The chest strap was placed around the subjects’ chest, at the level of the xiphoid process, with a close but comfortable fit that kept it from moving, as described and shown by the manufacturer [
23]. The close fit of the strap alongside the decision to use tethered swimming aimed to ensure the quality of the HR readings. The wristwatch was placed on the top of the wrist, approximately 2 cm (a finger’s width) up from the pisiform bone (commonly known as the “wrist bone”) and a little bit in the direction of the elbow, as recommended by the manufacturer [
24]. Different strap sizes were used for the male and female swimmers as the male-sized straps tend to be too large for female wrists. This ensured compliance with another manufacturer guideline that states that the device should be tightened firmly against the skin [
24].
The free PPG sensor can be worn in several places, the temple being one of them. It was placed against the temple of the preferred breathing side and attached to the goggle’s straps as described in the owner’s manual [
25], with a small add-on: the swim cap was worn over the goggles strap and, therefore, the sensor stayed underneath the swim cap. In theory, this tweak could reduce the probability of a water film forming between the skin and the sensor and affecting the readings. The same swimming cap and goggles were used for all participants, ensuring a tight fit of the sensor against the temple and complete coverage throughout the protocol. All three devices recorded HR in real time to their own internal storage; i.e., although available, no real-time transmission was used with any of the devices, because Bluetooth and ANT+ technologies cannot properly transmit data through water. After the test, all of the data were uploaded to the Polar Flow website and then downloaded to a laptop in CSV format for further analyses. All devices were previously synchronized with the brand’s computer software to ensure they all had the exact same time stamp. Despite the attempt to start all devices at the same time, it is virtually impossible to achieve per-second synchronization. Thus, corrections were made afterwards based on the devices’ time stamps to ensure that all data analyses started and ended according to the test timings. The HR from the chest strap, temple sensor, and wristwatch were treated as HRchest, HRtemple, and HRwatch, respectively. All recordings, by any device, that presented reading failures were discarded.
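The time-stamp correction described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each device's export has already been parsed into a mapping from a time stamp in whole seconds to an HR value; the `align` helper, the variable names, and the numbers are hypothetical and do not reproduce the actual export format or the exact procedure used in the study.

```python
def align(reference, candidate, start, end):
    """Pair per-second samples from two devices inside the window [start, end).

    reference, candidate: dicts mapping a time stamp (whole seconds) to HR.
    Returns (seconds since test start, reference HR, candidate HR) tuples.
    Assumption of this sketch: seconds missing from either device are skipped.
    """
    pairs = []
    for t in range(start, end):
        if t in reference and t in candidate:
            pairs.append((t - start, reference[t], candidate[t]))
    return pairs


# Hypothetical recordings: the devices started one second apart and each
# has a gap, so only the overlapping seconds of the test window are kept.
hr_chest = {100: 80, 101: 82, 102: 85, 103: 90}
hr_temple = {99: 79, 101: 81, 102: 86, 104: 92}

paired = align(hr_chest, hr_temple, start=100, end=104)
print(paired)
```

Trimming both series to the same test window before pairing them means that a device started a few seconds early or late does not shift the comparison.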
2.3. Statistical Analysis
Data analyses were performed for each test step separately and for the whole test. Prior to every analysis, the chest strap HR readings were closely inspected to check for artifacts or possible interruptions in the readings. To test for a possible influence of the participant’s sex on the difference between methods, an independent samples t-test was performed. To test for the association between devices, the coefficient of determination (R²) was calculated through a simple linear regression using a statistical software package (SPSS 29, IBM, Armonk, NY, USA). As a rule of thumb, the correlation was interpreted as very weak if R² < 0.04, weak if 0.04 ≤ R² < 0.16, moderate if 0.16 ≤ R² < 0.49, high if 0.49 ≤ R² < 0.81, and very high if 0.81 ≤ R² ≤ 1.0 [26]. Using the same software package, the intraclass correlation coefficient (ICC) was also calculated to assess the consistency between the device readings. Values less than 0.5, between 0.5 and 0.75, between 0.75 and 0.9, and greater than 0.90 were deemed indicative of poor, moderate, good, and excellent consistency, respectively [27]. To check for the measuring agreement between the devices, a Bland–Altman plot was computed in SigmaPlot (SigmaPlot 14.0, Grafiti LLC, Palo Alto, CA, USA). The lower and upper limits of agreement (LoA) were computed, as usually seen in the literature, as the mean difference ± 1.96 SD [28]. Additionally, the clinically preferable LoA were manually set to differences of −10 and +10 b·min⁻¹. The agreement was considered valid if at least 80% of the values fell within both limits, as described elsewhere [29]. An association of r > 0.90 (R² > 0.81), a bias of <5 b·min⁻¹, and an ICC > 0.90 were used to deem the readings as accurate.
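The agreement statistics described above can be sketched in a few lines of Python. This is an illustrative sketch only and not the SPSS or SigmaPlot procedure used in the study. The function names are hypothetical, the difference is assumed to be taken as candidate minus reference, the sample standard deviation (n − 1) is assumed for the LoA, and the bias criterion is assumed to apply to the magnitude of the mean difference.

```python
from statistics import mean, stdev


def r_squared(x, y):
    """Coefficient of determination of a simple linear regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def bland_altman(reference, candidate, clinical_limit=10.0):
    """Bias, 95% limits of agreement, and share of differences within the clinical limit."""
    diffs = [c - r for r, c in zip(reference, candidate)]  # assumed direction
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD (n - 1), an assumption of this sketch
    within = sum(1 for d in diffs if abs(d) <= clinical_limit) / len(diffs)
    return {
        "bias": bias,
        "lower": bias - 1.96 * sd,
        "upper": bias + 1.96 * sd,
        "within": within,
    }


def is_accurate(r2, bias, icc):
    """Cut-offs stated in the text: R² > 0.81, |bias| < 5 b/min, ICC > 0.90."""
    return r2 > 0.81 and abs(bias) < 5.0 and icc > 0.90


# Hypothetical paired readings (b/min), for illustration only.
chest = [100, 110, 120, 130]
temple = [101, 109, 122, 128]
result = bland_altman(chest, temple)
print(result, r_squared(chest, temple))
```

Keeping the clinical limit as a parameter separates the statistical LoA (mean difference ± 1.96 SD) from the a priori acceptable range, so the share of differences inside ±10 b·min⁻¹ can be compared directly against the 80% threshold.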
3. Results
Figure 1 depicts an example of the HR data collected from all three devices in the whole test. There was not one single case where the chest strap underperformed. No visible artifacts nor sudden reading interruptions were identified. Thus, the accuracy of the HR data from the chest strap was assumed. The test for differences between the male and female participants revealed nonsignificant outcomes for both comparisons (for HRchest vs. HRtemple, t(3319) = 1.92, p = 0.055, and for HRchest vs. HRwatch, t(2796) = 1.25, p = 0.21).
Looking at the HRchest vs. HRtemple comparison, a very high correlation was found (R² = 0.85) when analyzing the whole protocol (Table 2). Stepwise, it varied (0.37 ≤ R² ≤ 0.93). For the ICC, the first step showed moderate reliability (ICC = 0.71, 95% CI from 0.53 to 0.81), whilst all others showed excellent reliability (ICC ≥ 0.94, 95% CI from 0.91 to 0.99). The Bland–Altman plots (Figure 2) revealed that more than 80% of the values fell within the 95% CI boundaries for all situations (separate steps and whole test), which is an indicator of a valid agreement.
Regarding the comparison between HRchest and HRwatch, as described in Table 2, a moderate correlation was found when looking at the whole test (R² = 0.23). However, stepwise, all results revealed a very weak to weak correlation between the measuring devices (0.01 ≤ R² ≤ 0.06). A poor measurement consistency was found in all analyses (0.03 ≤ ICC ≤ 0.42, 95% CI from −0.04 to 0.68). Again, all Bland–Altman plots (Figure 2) indicated a valid agreement between the methods, as more than 80% of the values fell within the 95% CI boundaries for all analyses.
4. Discussion
The present study aimed to evaluate the accuracy of two different HR measuring devices while swimming. Information about the accuracy of wearable PPG sensors in water allows for a considered choice when monitoring HR. It was hypothesized that the free PPG sensor would outperform the wristwatch in tethered front crawl swimming and be sufficiently accurate to be used confidently. Both hypotheses were accepted, as the wristwatch performance was poor (R² = 0.23, ICC = 0.42, and a mean bias of −26.5 b·min⁻¹) and the temple sensor’s was excellent (R² = 0.85, ICC = 0.96, and a mean bias of −1 b·min⁻¹).
4.1. Agreement Between HRchest and HRwatch
On land, wristwatches from this brand have demonstrated a good agreement with ECG signals in various sports and activities [
28,
29]. However, in the present study, the wristwatch underperformed in every established criterion (stepwise and in the whole test), as seen in
Table 2. The literature reports various criteria for accuracy: the presence of an association above 0.90 (r > 0.90 or R² > 0.81) and a bias (mean difference) of <3 b·min⁻¹ [17], <5 b·min⁻¹ [30], or <6 b·min⁻¹ [31] are two of the most used criteria. In the present study, an association of r > 0.90 (R² > 0.81), a bias of <5 b·min⁻¹, and an ICC > 0.90 were used as cut-off points for accuracy and, as such, the measurement of HR by the wristwatch was deemed inaccurate. In the aquatic environment, one of the studies conducted to assess the agreement between wrist-based and chest-based HR readings reported ICC values ranging from 0.2 (at 61–70% HRmax) to 0.44 (at 81–90% HRmax). This is corroborated by their Bland–Altman plot, where a bias of −9.51 and LoA of approximately −53 and 34 denote a low agreement between the methods. In the present study, a mean bias of −26.5 with LoA of −73 and 20 for the whole test is in line with the low accuracy reported for the wristwatch PPG during swimming.
Also on land, there are reports of reduced accuracy at higher running velocities, meaning that with quicker and wider arm movements, the accuracy falls [
8,
14,
17,
20,
32]. In fact, this impairment is acknowledged by the brand, as on their website one can find a disclaimer warning that wrist-based optical heart rate measurement is “not necessarily accurate in sports where you move your hands vigorously” [
33]. Thus, a low accuracy during front crawl swimming was somewhat expected not only because of the arm movements but also because of the impact with the water during the entry phase of the arms. In a study with a similar setup to ours, in which Olstad and Zinner [
13] do not report any specific watch placement and fit, ICC values as low as 0.2 (at 61–70% HRmax) to 0.44 (at 81–90% HRmax) are reported. The sensor pressure on the skin and the watch placement are stated to affect its accuracy [
24,
34]. That is the reason why the brand has specific guidelines for watch placement and fit. In the present study, all of the brand’s guidelines were followed. Nevertheless, the wristwatch still performed less well than reported on land. The sensor placement limitation can be indirectly addressed by placing a free PPG sensor on the upper arm that transmits in real time to the watch (as the latter is restricted to wrist use), since, in the upper limbs, proximal readings are more accurate than distal ones [
34]. Other studies have also reported a lack of accuracy of wrist-based PPG while swimming whether watch placement is reported or not [
35,
36]. It should be made clear that none of the studies state that the watches or the technology are to blame; nor does the present one. The explanations put forward by all studies are similar: the presence of water between the sensors and the skin and the vigorous arm movements seem to be the major factors impairing the HR readings. To overcome the water constraint, the only reliable solution available for PPG devices seems to be to cover the sensor to avoid any contact with the water.
Skin pigmentation has been suggested to potentially affect optical HR readings [
21,
37,
38]. The participants in the present study were all classified as Type 3 on the Fitzpatrick skin type scale (no individuals with other skin types were enrolled in the class); thus, no skin-tone-related bias should be expected.
A limitation of the present study may come from the swimming skill of the participants. Despite making all efforts to mitigate the water flow between the sensors and the skin (for example, the adoption of tethered swimming and the placement and fit of the wristwatch), a poor technique might have negatively impacted the wristwatch readings specifically. The participants in this study were attending the second academic year of the sports course. Few had previous contact with swimming classes and the majority were beginners. Beginner swimmers have a rougher technique, which may negatively affect the readings at wrist level. For example, during the catch phase, swimming books suggest there should be approximately a 90° angle at the elbow [
39], which, in turn, represents a 90° angle with the water flow. Since the wristwatch sensor is located perpendicularly to the arm, any deviation from this 90° angle at the elbow will increase the flow of water parallel to the arm and, therefore, increase the likelihood of water interfering with the readings. However, Olstad and Zinner [
13] evaluated “well-trained competitive swimmers” and the accuracy was still far from desirable.
4.2. Agreement Between HRchest and HRtemple
Previous studies have reported, during land-based activities, a good agreement between this free PPG sensor and ECG signals with the sensor placed on the upper arm, the forearm, or the temple [
11]. A good agreement is also reported with the “gold standard” chest straps [
21,
40,
41]. The results of the present study are in agreement with these reports. The HRchest vs. HRtemple analysis showed an R² of 0.85 (very high), an ICC of 0.96 (CI = 0.95–0.96; excellent), and a mean bias of −1 b·min⁻¹, falling within the criteria for accuracy in every measure. Furthermore, lower intensities scored lower in terms of both R² and ICC, in line with the literature for land-based activities [8, 16]. However, an individual analysis of the HR measured at the temple (as seen in Figure 2(A1)) revealed that a misreading often occurred during the first part of the test (usually around 30 s), i.e., at a lower intensity. This was not the case for all participants, but there was a trend. One could argue that the low exercise intensity was the root cause of the misreading, as the literature suggests. However, it was during the rest step (the last one), where the HR fell to pre-test values (or below), that the highest accuracy was observed. This gives the impression that the problem was at the beginning of the test, which coincidentally is when lower intensities were attained.
In all studies where Bland–Altman plots are presented, their LoA should be interpreted carefully. As pointed out by Giavarina [
42], this kind of plot only defines the interval of agreement; it does not say whether the limits are acceptable or not. According to the same author, the acceptable LoA must be defined a priori, based on clinical necessity, biological considerations, or other goals. This means that statistical agreement is one thing, and clinical acceptability is another. In the present study, −10/+10 b·min⁻¹ were defined as the clinically acceptable lower and upper LoA. As such, despite the agreement shown by the Bland–Altman plots (bias < 5 b·min⁻¹), one should not consider the calculated LoA appropriate for any of the cases presented in Figure 2. Instead, consider the proposed −10/+10 b·min⁻¹ (red lines in Figure 2). Thus, upon closer inspection of the Bland–Altman plots from the HRchest vs. HRtemple comparison, one can notice that in all plots, the vast majority of the data points fall within the −10/+10 b·min⁻¹ range. As a specific example, looking at the overall plot of HRchest vs. HRtemple (Figure 2(A5)), the mean bias is −1 b·min⁻¹, which is very close to zero, but the computed LoA are still wider than ±10 b·min⁻¹ (−17.5 and 15.4 for the lower and upper LoA, respectively). This was due to the previously reported higher inaccuracy of the first 30 s readings. Notwithstanding, most of the values represent differences between −10 and 10 b·min⁻¹, which agree with the proposed LoA (red lines in Figure 2). In this study, all devices were started with the subject seated (but not at rest) on the pool’s side wall, fully equipped, just a few seconds before the test start. Thus, it is possible that if the temple sensor had been turned on a bit earlier, or if a dry warmup had been performed right before the test start, the values would have been more accurate.
This multi-site PPG sensor (in the present study placed at the temple) provides some solutions to the limitations of wrist-based HR measurement, which might justify its results. From a location perspective, it is placed in a site with less movement than anywhere in the upper or lower arm; thus, there are fewer artifacts to be corrected and fewer reading interruptions. Regarding the water contact, the fact that it can be covered by a swimming cap diminishes the probability of water flowing between the sensor and the skin, ensuring uninterrupted, clean readings.
A possible confounding factor may be the presence of hair in the temple zone. In one study, hair was reported to affect the accuracy of PPG [
14]. Especially in female participants, who tend to have longer hair, it may affect skin contact with the sensor. Although the sex difference in the HRchest vs. HRtemple comparison was not significant, it was very close to significance. Thus, the sensor placement must be kept in mind and the results interpreted carefully, as it is not always possible to completely avoid the hairline despite best efforts.
4.3. Practical Implications
The present study tested only the built-in memory of the devices. However, this free PPG sensor can also be paired with the wristwatch. This means that a swimmer wearing both devices can combine the swimming metrics provided by the wristwatch (such as meters swum, speed per lap, or stroke rate), which might be of interest to the swimmer and/or their coach, with the accuracy of the temple-worn PPG sensor.
5. Conclusions
During tethered front crawl swimming, the HR values measured with the chest strap, regularly deemed the portable gold standard, agreed better with those measured with an optical sensor placed on the temple than with those measured with a wristwatch. Better monitoring of the training intensity should be expected from the temple HR readings, but, nevertheless, swimmers would benefit from a combination of both devices.
The readings on the temple sensor tend to be more accurate if coaches and/or athletes avoid placing it above any hair and start the sensor’s reading before commencing the swim (while doing the dryland warm up, for example). A newer model of the temple sensor has the possibility of transmitting HR data as well as swimming statistics in real time to dedicated software. As swimmers roll their head to breathe, it could be possible for the software to acquire HR data in this small but cyclical moment and display the swimmer’s internal load instantaneously. A possible line of investigation would be to test the accuracy of the signal transmission to the software. Future studies could also aim to evaluate the accuracy of the HR readings during different strokes, at different intensities, and with different durations. Also, the assessment of the accuracy of other swimming wearables (like intelligent swim goggles) would be an interesting line of research.