Concurrent Validity and Reliability of Two Portable Powermeters (Power2Max vs. PowerTap) to Measure Different Types of Efforts in Cycling

The purpose was to assess the concurrent validity and reliability of two portable powermeters (PowerTap vs. Power2Max) in different types of cycling efforts. Ten cyclists performed two submaximal, one incremental maximal and two supramaximal sprint tests on an ergometer, while pedaling power and cadence were registered by both powermeters and a cadence sensor (Garmin GSC10). During the submaximal and incremental maximal tests, significant correlations were found for power and cadence data (r = 0.992–0.997 and 0.996–0.998, respectively, p < 0.001), with a slight power underestimation by PowerTap (0.7–1.8%, p < 0.01) and a high reliability of both powermeters (p < 0.001) for measurement of power (ICC = 0.926 and 0.936, respectively) and cadence (ICC = 0.969 and 0.970, respectively). However, during the supramaximal sprint test, their agreement to measure power and cadence was weak (r = 0.850 and −0.253, p < 0.05) due to the low reliability of the cadence measurements (ICC between 0.496 and 0.736, and 0.574 and 0.664, respectively; p < 0.05) in contrast to the high reliability of the cadence sensor (ICC = 0.987–0.994). In conclusion, both powermeters are valid and reliable for measuring power and cadence during continuous cycling efforts (~100–450 W), but questionable during sprint efforts (>500 W), where they are affected by the gear ratio used (PowerTap) and by their low accuracy in cadence recording (PowerTap and Power2Max).


Introduction
Portable powermeters are devices designed to measure the power output (i.e., the exercise intensity) during pedaling. Since the end of the 1980s, these instruments have been used to monitor training, to perform field-based performance tests, to analyze cycling competitions, and to evaluate changes in equipment. They can be classified according to their location on the bike (e.g., rear hub, crank, chainring, pedal, shoe, or handlebar) or to the technology used (e.g., strain gauges, accelerometers, or multi-sensors to measure wind-speed, slope, etc.) [1].
Numerous authors have assessed the validity and reliability of the power output measurements of several powermeters (e.g., Stages, Garmin Vector, Quarq, Keo Power, etc.) [2]. However, this is not the case for two devices widely used in the scientific literature [1,3–5]: Power2Max (strain gauges in the chainring) and PowerTap G3 (strain gauges in the rear hub). While their validity and reliability during submaximal pedaling in a seated position have been proven [3,5], the validity and reliability of the PowerTap during supramaximal pedaling (i.e., 5 s sprints) and under different pedaling conditions (i.e., seated vs. standing cyclists' positions, low vs. high gear ratios used) have been questioned [3,4]. To the best of our knowledge, no previous study has analyzed Power2Max validity and reliability under these types of efforts and conditions.
Furthermore, a recent case study that registered data during competition [1] showed that these two portable powermeters were not interchangeable for short and high-intensity efforts (i.e., < 10 s and > 7.5 W·kg⁻¹, respectively). This is crucial, because the ability to repeat these efforts is a differential factor between elite and non-elite male and female cyclists, and also because most of these studies obtained data from different types of portable powermeters [6–8].
Therefore, the main purpose of the present study was to compare the concurrent validity of two portable powermeters (PowerTap vs. Power2Max) to measure different types of efforts (submaximal, maximal, and supramaximal) in cycling. The reliability of these devices during submaximal and supramaximal efforts was also analyzed.

Materials and Methods
Ten club cyclists of performance level 3 (1-5 scale [9]) participated in this study (age: 22.1 ± 8.1 years, height: 1.79 ± 0.06 m, body mass: 67.5 ± 3.3 kg; maximal aerobic power: 352.6 ± 44.6 W; cycling experience: 6.6 ± 4.0 years). All of them participated voluntarily and signed written consent. Inclusion criteria were to have competed in cycling for at least 2 years and to have a training volume of more than 3000 km before the start of the study. The study was approved by the University Ethics Committee and met the requirements of the Declaration of Helsinki for research on human beings.

Procedures
The testing was performed at the beginning of the cyclists' competitive season (February-April). They arrived at the laboratory (800 m of altitude) with their bikes, after a 48 h period without hard training. First, the cyclists' anthropometrical characteristics and the bikes' dimensions were registered [10]. The bikes' dimensions and the clipless pedals were then replicated in a carbon fiber bike (Scott Addict 30, Scott Sports, Givisiez, Switzerland) where one cadence sensor and two powermeters were installed. Comparing the recordings of these devices allowed the concurrent validity to be studied. This bike was set on a cyclosimulator (Cateye CS-1000, Cateye Co., Ltd., Osaka, Japan) [11], and the cyclists performed a 10 min warm-up at 100 W followed by a 5 min rest before starting the test.
The assessment protocol (Figure 1) was performed in two sessions (day 1 and day 2) separated by a minimum of 48 h, and under similar environmental conditions (20-25 °C, 60-65% relative humidity). On the first day, a submaximal test (submaximal test 1) and an incremental maximal test were performed, with a 15 min rest between them. On the second day, another submaximal test (submaximal test 2) and a supramaximal sprint test were performed, with a 15 min rest between them. The submaximal tests were used to analyze the inter-day reliability, while the supramaximal sprint tests were used to analyze intra-day reliability. The riding position was standardized, with the cyclists having to rest their hands on the brakes while holding a seated position during both the incremental and supramaximal tests. During the submaximal test, the cyclists pedaled while both seated and standing. In all tests, they were able to drink water ad libitum during the recovery periods to avoid dehydration.

Cadence Sensor and Powermeters' Adjustment
The cadence sensor (Garmin GSC 10, Lenexa, KS, USA) was placed at the right chainstay, and the two powermeters were placed at the chainring (Power2Max Type S, Waldhufen, Germany) and at the rear hub (PowerTap G3, Madison, WI, USA). These three devices were paired with three measurement units (Garmin, Lenexa, KS, USA): Edge 810, Edge 500, and Edge 705, respectively. They were configured at 1 Hz sample frequency and installed on the handlebar stem. To avoid the effect of temperature on the calibration procedure, the bike remained at the same ambient conditions in which the data were obtained for at least 30 min before the start. Afterwards, the powermeters were zeroed according to the manufacturer's guidelines before performing each test [1]. The rear wheel pressure was also standardized at 7 atmospheres.

Submaximal Tests (1 and 2)
On days 1 and 2, three sets of pedaling at 30, 35, and 40 km·h⁻¹ were performed, with 2 min rest between them. Each set consisted of 6 repetitions of 1 min pedaling at different cadences (60, 80, and 100 rpm) and riding positions (seated and standing, alternately). The cyclists adapted their position and cadence during the first 30 s of each repetition, and the data from the last 30 s were used for the analysis. The data from day 1 were used to analyze the concurrent validity, and the data from day 2 were used to analyze the inter-day reliability. The cadence sensor display (i.e., Edge 810) was used by the cyclists as visual reference to adapt their cadence, and the cyclosimulator display (Cateye CS-1000) to adapt their speed.
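As an illustration of the data reduction described above, the following minimal sketch averages the last 30 s of each 1 min repetition from a 1 Hz recording. The function name, its parameters, and the assumption that the repetitions are stored back to back with no rest samples between them are hypothetical and are not taken from the study's processing pipeline.

```python
from statistics import mean

def last_30s_mean(samples_1hz, rep_seconds=60, keep_seconds=30):
    """Return one mean per repetition, using only the final keep_seconds.

    Assumption (hypothetical): samples_1hz holds whole repetitions recorded
    at 1 Hz and stored back to back, so each repetition is rep_seconds long.
    """
    if len(samples_1hz) % rep_seconds != 0:
        raise ValueError("recording must contain whole repetitions")
    means = []
    for start in range(0, len(samples_1hz), rep_seconds):
        repetition = samples_1hz[start:start + rep_seconds]
        # Discard the adaptation phase and keep the stable final segment.
        means.append(mean(repetition[-keep_seconds:]))
    return means

# Two synthetic repetitions: 30 s of adaptation, then 30 s of stable power.
power = [150.0] * 30 + [200.0] * 30 + [180.0] * 30 + [220.0] * 30
print(last_30s_mean(power))
```

Discarding the first 30 s keeps the adaptation of position and cadence out of the averaged value, which is the rationale given in the protocol.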

Incremental Maximal Test
On day 1, this test was performed following the submaximal test, after a recovery period of 15 min. The initial testing speed was set at 27 km·h⁻¹, increasing by 1 km·h⁻¹ every minute until the cyclist was not able to maintain the speed [11]. The gear ratio was freely chosen by the cyclists, and they were asked to maintain a cadence between 85 and 100 rpm and to remain seated at all times during the test. Data from the entire minute were used for the analysis, and the display of the cyclosimulator (Cateye CS-1000) was used as visual reference for the cyclists to adapt to the testing speed.

Supramaximal Sprint Test
On day 2, the cyclists performed this test after the submaximal test, following a 15 min recovery period. It consisted of 4 sets of seated pedaling at supramaximal intensity with a 4 min rest in between. Each set consisted of 3 repetitions of 5 s supramaximal pedaling using different gear ratios (36-19, 36-13, 52-15 and 52-12) with 3 min of active pedaling at 100 W between each repetition. The test started at a cadence of 80 rpm after a 10 s countdown. At the end of the recovery period, another 10 s countdown was displayed, and the cyclists were instructed to pedal as fast as possible for another 5 s. The peak values of power and cadence measurements during the entire 5 s period were analyzed, and the Power2Max display (i.e., Edge 500) was used as a visual reference by the cyclists to adapt their power and cadence during the recovery period.
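To make the gear ratios of the sprint protocol concrete, the sketch below converts each chainring-sprocket pair into a numeric ratio and into the corresponding rear hub rotation speed at the 80 rpm starting cadence. The function names are hypothetical, and the calculation assumes a rigid drivetrain with no freewheeling; it is offered only to show how differently the rear hub (where the PowerTap measures) rotates for the same crank cadence.

```python
# Gear ratios from the sprint protocol: (chainring teeth, sprocket teeth).
GEARS = [(36, 19), (36, 13), (52, 15), (52, 12)]

def gear_ratio(chainring, sprocket):
    """Rear wheel revolutions per crank revolution."""
    return chainring / sprocket

def hub_rpm(crank_rpm, chainring, sprocket):
    """Rear hub speed for a given crank cadence (assumes no freewheeling)."""
    return crank_rpm * gear_ratio(chainring, sprocket)

for chainring, sprocket in GEARS:
    ratio = gear_ratio(chainring, sprocket)
    hub = hub_rpm(80, chainring, sprocket)
    print(f"{chainring}-{sprocket}: ratio {ratio:.2f}, hub {hub:.0f} rpm at 80 rpm")
```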

Statistical Analysis
Data of the three measurement units (Edge 810, Edge 500, and Edge 705) were analyzed using the same cycling software (Golden Cheetah 3.1 [12]). The results are expressed as mean ± SD. The SPSS+ software (v.26.0, IBM Corp, Armonk, NY, USA) was used to analyze both the power and cadence measurements. One-way analysis of variance (ANOVA) for repeated measures was applied to compare the two powermeters (PowerTap vs. Power2Max) when additional variables were not included. Two- and three-way ANOVA for repeated measures were used to analyze the effects of testing speed (30, 35, and 40 km·h⁻¹), cadence (60, 80, and 100 rpm) and pedaling position (standing vs. seated) on the percentual differences between the two powermeters. These percentual differences were calculated as follows: Differences (%) = (PowerTap − Power2Max) × 100/mean value of the two powermeters. When a significant F value was found, the Newman-Keuls post hoc analysis was used to establish statistical differences between means, and the 95% confidence interval (CI95%) of these differences was calculated. Pearson correlation coefficient (r) was used to assess the relationships between variables. Inter- and intra-day reliability of both submaximal and supramaximal sprint tests was assessed using the coefficient of variation (CV) and the intraclass correlation coefficient (ICC) [13,14]. Assumptions of the ANOVA and Pearson correlation tests were examined prior to the analysis. Values of p < 0.05 were considered statistically significant.
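The percentual difference defined above, together with the bias and 95% limits of agreement (±1.96 SD) used in the Bland-Altman plots, can be sketched as follows. This is an illustrative example on synthetic values, not the analysis performed in SPSS+; the function names and the sample data are hypothetical.

```python
from statistics import mean, stdev

def percent_difference(powertap, power2max):
    """Differences (%) = (PowerTap - Power2Max) x 100 / mean of both devices."""
    return (powertap - power2max) * 100 / ((powertap + power2max) / 2)

def bland_altman(differences):
    """Return (bias, lower limit, upper limit) with limits at bias +/- 1.96 SD."""
    bias = mean(differences)
    half_width = 1.96 * stdev(differences)
    return bias, bias - half_width, bias + half_width

# Synthetic paired readings in watts (hypothetical values, not study data).
powertap_w = [198.0, 247.0, 301.0, 348.0]
power2max_w = [202.0, 250.0, 303.0, 352.0]

diffs = [percent_difference(a, b) for a, b in zip(powertap_w, power2max_w)]
bias, lower, upper = bland_altman(diffs)
print(f"bias {bias:.2f}%, limits of agreement {lower:.2f}% to {upper:.2f}%")
```

Expressing the difference relative to the mean of the two devices avoids treating either powermeter as the reference, which suits a concurrent validity design in which no gold standard is present.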
Figure 2 shows the Bland-Altman plots of the percentual differences in both power and cadence measurements between the two powermeters during the submaximal test (Day 1). The power differences were constant (r = −0.01 and p > 0.05), while the cadence differences decreased as cadence increased (r = 0.31 and p < 0.001).

Figure 3 shows the Bland-Altman plots of the percentual differences in both power and cadence measurements between the two powermeters during the incremental maximal test. The power differences were constant (r = −0.01 and p > 0.05), while the cadence differences decreased as cadence increased (r = 0.32 and p < 0.001).

Supramaximal Sprint Test
A significant relation (r = 0.850 and p < 0.001) was found between the peak power registered by the two powermeters. However, the correlation was lower and negative for the peak cadence (r = −0.253 and p < 0.05). The peak cadence measurements of the Power2Max correlated positively (r = 0.635 and p < 0.001) with the cadence sensor (Garmin GSC10).
Figure 4 shows the Bland-Altman plots of the percentual differences in both peak power and peak cadence measurements between the two powermeters during the supramaximal sprint test. Power and cadence differences increased as cadence increased (r = −0.34 and −0.37, respectively; p < 0.001).
* Significant differences (p < 0.05) in the power measurements between the PowerTap and Power2Max. # Significant differences (p < 0.001) in the cadence measurements with respect to the cadence sensor (Garmin GSC10).
Intra-day reliability analysis for the supramaximal sprint test showed overall significant test-retest correlations for the PowerTap and Power2Max powermeters when analyzing both power (ICC between 0.918 and 0.943, and 0.896 and 0.939, respectively; p < 0.001) and cadence data (ICC between 0.496 and 0.736, and 0.574 and 0.664, respectively; p < 0.05). The reliability of this last variable was highest for the cadence sensor (ICC between 0.987 and 0.994). The coefficient of variation for power differed significantly between the PowerTap and Power2Max (CV = 4.9 ± 4.3 and 10.0 ± 15.0%, respectively; F = 4.8 and p < 0.05). The coefficient of variation for cadence was higher in the two powermeters than in the cadence sensor (CV = 6.8 ± 5.6, 6.5 ± 10.6, and 1.1 ± 0.8%, respectively; F = 6.2 and p < 0.01).
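For readers unfamiliar with the two reliability indices reported here, the sketch below computes a coefficient of variation from repeated trials and an intraclass correlation coefficient from a riders-by-trials table. The text does not state which ICC model or CV formula was used, so the one-way random-effects form ICC(1,1) and the SD/mean definition of CV are assumptions chosen for illustration; the function names and the data are hypothetical.

```python
from statistics import mean, stdev

def cv_percent(trials):
    """Coefficient of variation (%) of repeated trials: SD / mean x 100."""
    return stdev(trials) / mean(trials) * 100

def icc_oneway(table):
    """One-way random-effects ICC(1,1); rows are riders, columns are trials.

    Assumption: this ICC form is illustrative, the study's model is not stated.
    """
    n = len(table)        # number of riders
    k = len(table[0])     # trials per rider
    grand = mean([value for row in table for value in row])
    row_means = [mean(row) for row in table]
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (value - m) ** 2 for row, m in zip(table, row_means) for value in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Synthetic peak power (W) for three riders over two sprints (not study data).
peak_power = [[900, 910], [1000, 990], [1100, 1120]]
print(f"ICC(1,1) = {icc_oneway(peak_power):.3f}")
print([round(cv_percent(row), 2) for row in peak_power])
```

The ICC rewards consistent ranking of riders across trials, whereas the CV expresses trial-to-trial scatter within a rider, which is why a device can show a high ICC and still a CV above 5%.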

Discussion
The main finding of this study is that the PowerTap and Power2Max powermeters showed concurrent validity and reliability when monitoring both submaximal and incremental maximal efforts (i.e., continuous pedaling), but not when monitoring supramaximal efforts (i.e., sprint pedaling). The small differences observed between the two powermeters during continuous pedaling are explained by their location (i.e., rear hub vs. chainring, respectively) and by the effect of both the cadence and the cyclists' position on the bike (i.e., higher at lower cadences and in the standing position). On the other hand, the large differences and low reliability of the power and cadence measurements during the sprints are related to the lack of accuracy in the cadence measurements (PowerTap and Power2Max) and to the effect of the gear ratio on the power measurements (PowerTap).

Continuous Pedaling (Submaximal and Incremental Maximal Tests)
The agreement found between PowerTap and Power2Max power measurements during both submaximal and incremental maximal tests (r = 0.992 and 0.997, respectively) is similar to that obtained in previous studies (r = 0.997, p < 0.001) in which PowerTap [...]
Additionally, Power2Max showed low intra-day reliability to measure peak power, with a coefficient of variation higher than 5% and than that observed in PowerTap (CV = 10.0 ± 15.0 and 4.9 ± 4.3%, respectively). Both devices use four strain gauges and internally calculate the pedaling cadence, so the sources of error in power measurements can be diverse [4]. Cadence measurement could be one of them, because cadence is used to calculate power (i.e., Power = torque × crank angular velocity). Surprisingly, our results showed that the agreement between the two devices for measuring cadence is very weak (r = −0.253 and p < 0.05). Moreover, PowerTap and Power2Max underestimated peak cadence measurements compared to the cadence sensor (Table 3), showing a weak agreement with this device (r = −0.427 and 0.635, respectively) and higher coefficients of variation (CV = 6.8 ± 5.6, 6.5 ± 10.6, and 1.1 ± 0.8%, respectively). Along the same lines, a recent study that compared SRM and Favero showed that the CVs of these two devices during sprint pedaling were higher than those observed during continuous pedaling for both power and cadence measurements [18]. An important difference of the present study with respect to previous studies [3,4,18] is that the cyclists were already pedaling before the start of the sprint tests, thus resulting in higher cadence values (i.e., >160 rpm and 130-150 rpm, respectively). Taking into account that the accuracy of cadence measurement with portable powermeters depends on the cadence value (Figure 4b), this should be considered during pedaling efforts with cadences higher than 150 rpm (e.g., track cycling or sprints in road cycling).
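The relation Power = torque × crank angular velocity cited above explains why a cadence error carries over directly into the power value. The sketch below, with hypothetical function names and illustrative numbers that are not study data, converts cadence in rpm to angular velocity in rad·s⁻¹ and shows that, at a fixed torque, a 5% underestimation of cadence produces a 5% underestimation of power.

```python
import math

def power_watts(torque_nm, cadence_rpm):
    """Power (W) = torque (N*m) x crank angular velocity (rad/s)."""
    angular_velocity = 2 * math.pi * cadence_rpm / 60  # rpm to rad/s
    return torque_nm * angular_velocity

# Illustrative sprint values (hypothetical, not taken from the study).
torque = 60.0          # N*m at the crank
true_cadence = 150.0   # rpm
read_cadence = 142.5   # rpm, i.e., underestimated by 5%

true_power = power_watts(torque, true_cadence)
read_power = power_watts(torque, read_cadence)
error_pct = (read_power - true_power) / true_power * 100
print(f"true {true_power:.0f} W, computed {read_power:.0f} W, error {error_pct:.1f}%")
```

Because power is linear in cadence, any relative cadence error appears unchanged in the computed power, on top of whatever error the strain gauges add to the torque term.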

Limitations
The main limitations of the present work were the following: (A) The tests were performed under laboratory conditions, which do not reproduce the changes in environmental temperature and the vibrations that affect the measured power output [4]. However, the laboratory tests allowed us to better standardize the protocols and identify the variables that affected the powermeter measurement (i.e., type of effort, position on the bike or cadence). (B) Only one unit each of the PowerTap and Power2Max was used, although differences between units of the same powermeter are possible [5,19]. (C) Inter-day reliability was analyzed in the submaximal test, while intra-day reliability was analyzed in the supramaximal sprint test. This was due to the need to design a cost-effective protocol for competitive cyclists, including no more than two days of testing. Future studies should design a protocol to analyze inter- and intra-day reliability in both the submaximal and supramaximal sprint tests. (D) The reliability (CVs) of the power and cadence measurements during each condition depended both on the reliability of the devices (PowerTap and Power2Max) and on the biological variability of the cyclists while pedaling (i.e., having to adjust the velocity on the ergometer), so alternative ergometers to the Cateye CS-1000 should be used in future studies. (E) The minimum sample size for adequate statistical power to analyze test-retest reliability was not calculated [20], although the sample size of the present study was similar to those used in previous recent studies on the same topic [21–23]. (F) The participants of the study had a performance level of 3 (i.e., club competitors). Therefore, future studies should check whether these results are similar at other levels of performance (e.g., 1 and 5).

Conclusions
The PowerTap and Power2Max powermeters are valid and reliable devices for measuring power and cadence during continuous cycling efforts at pedaling intensities between ~100 and 450 W. In this type of effort, PowerTap underestimates power by 0.7-1.8%, depending on pedaling cadence, the cyclist's position on the bike (seated vs. standing), and the gear ratio used.
However, the validity and reliability of these powermeters for monitoring power and cadence during sprint efforts (>500 W) are highly questionable, as they are affected by the gear ratio used (PowerTap) and by their lack of accuracy in cadence recording (PowerTap and Power2Max), among other possible factors that should be investigated in further studies.

Figure 1. Schematic representation of the experimental design of the present study. Max = maximum possible speed; rpm = revolutions per minute; Gear ratio = Number of chainring teeth-number of sprocket teeth.

Figure 2. Bland-Altman plots of the differences between the PowerTap and Power2Max measurements during the submaximal test: (a) Percentual differences in the power measurements; (b) Percentual differences in the cadence measurements. The short-dashed lines represent the upper and lower 95% limits of agreement (±1.96 SD), the solid line represents the bias (Mean) and the dotted line represents the tendency (correlation) between the two variables.

Figure 3. Bland-Altman plots of the differences between the PowerTap and Power2Max measurements during the incremental maximal test: (a) Percentual differences in the power measurements; (b) Percentual differences in the cadence measurements. The short-dashed lines represent the upper and lower 95% limits of agreement (±1.96 SD), the solid line represents the bias (Mean) and the dotted line represents the tendency (correlation) between the two variables.

Figure 4. Bland-Altman plots of the differences between the PowerTap and Power2Max measurements during the supramaximal sprint test: (a) Percentual differences in the peak power measurements; (b) Percentual differences in the peak cadence measurements. The short-dashed lines represent the upper and lower 95% limits of agreement (±1.96 SD), the solid line represents the bias (Mean) and the dotted line represents the tendency (correlation) between the two variables.

Overall: Mean of all speeds, cadences, and positions for the same powermeter. * Significant differences (p < 0.05) between the two powermeters.

Table 2. Mean ± SD of the power and cadence registered by the two powermeters (PowerTap vs. Power2Max) during the incremental maximal test at different speeds (from 27 to 49 km·h⁻¹).