Power Assessment in Road Cycling: A Narrative Review

Nowadays, the evaluation of physiological characteristics and training load quantification in road cycling is frequently performed through power meter data analyses, but the scientific evidence behind this tool is scarce and often contradictory. The aim of this paper is to review the literature related to power profiling, functional threshold testing, and performance assessment based on power meter data. A literature search was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement on the topic of {“cyclist” OR “cycling” AND “functional threshold” OR “power meter”}. The reviewed evidence provided important insights regarding power meter-based training: (a) functional threshold testing is closely related to laboratory markers of steady state; (b) the 20-min protocol represents the most researched option for functional threshold testing, although shorter durations may be used if verified on an individual basis; (c) power profiling obtained through the recovery of recorded power outputs allows the categorization and assessment of the cyclist’s fitness level; and (d) power meters represent an alternative to laboratory tests for the assessment of the relationship between power output and cadence. This review highlights the growing number of studies related to power profiling, functional threshold testing, and performance assessment based on power meter data, as well as the opportunity for expanding knowledge that power meters have brought to the road cycling field.


Introduction
Road cycling is an extremely demanding endurance sport characterized by its cyclic nature, large training volumes, and high intensities [1]. The activity comprises several different disciplines with clear physiological differences according to the typology of the cyclist and the particularities of the event (length, elevation gain, mass or individual start, etc.) [2]. As a consequence, different types of riders specialized in specific events and efforts have appeared: time trialists [3,4], sprinters [5], and grand tour riders [6] are some examples.
These differences have implications for the evaluation of training characteristics and load quantification, which are currently performed through several laboratory and field methods [7,8]. Among the field methods, subjective assessments, such as ratings of perceived exertion, stand out due to their easy implementation [9,10]. Previous research has shown that such methods present moderate to substantial differences compared to heart rate monitoring [7,11–13]. Heart rate-based assessments are also linked to several setbacks, such as the underestimation of neuromuscular and anaerobic efforts, a delayed response to the stimulus, and difficulties in the precise assessment of intermittent efforts [14–16]. As for laboratory methods, measurements of oxygen uptake and blood lactate concentration are some of the most widely used. Although these measurements are precise and reliable [17], they are also linked to several limitations, such as the reliance on expensive equipment and the fact that the cyclist needs to travel to a laboratory in order to be tested [18]. Therefore, laboratory-based methods are inadequate for measuring performance and training load on a day-to-day basis. Mobile power meters (Mpm), unlike heart rate monitors or subjective scales, measure workload directly rather than only the physiological response to the effort [8,19]. Furthermore, the anaerobic threshold and VO2max, two of the most important laboratory markers, can be calculated from power output (PO) during field training sessions [4,20]. Although modern Mpm vary considerably in their trueness (−0.9 ± 3.2%, mean ± SD), their precision is generally high (1.2 ± 0.9%, mean ± SD) [21–23] and, therefore, these tools may represent an interesting alternative for training load quantification given their ability to provide an objective assessment of anaerobic, neuromuscular, and intermittent efforts.
Among the main practical applications of Mpm is the functional threshold power (FTP) testing proposed by Allen and Coggan [19]. The result obtained from subtracting 5% from the mean PO sustained during a 20-min time trial is, according to the authors, the maximum PO that can be maintained by the cyclist in a quasi-steady state. Allen and Coggan's work [19] is widely considered the reference for power meter-based training in the practical field and has, therefore, been chosen as the unifying thread for the current review. However, it should be highlighted that the authors did not provide a sufficient scientific basis to support their assumptions, and the research that has attempted to validate their theories has had varying degrees of success [19,24–27]. FTP is also used as a reference for establishing seven different training zones and, additionally, the testing protocol provides information about the riders' power profile, which can help in their classification according to their strengths and weaknesses [28]. The establishment of training zones allows for the accurate tracking of fitness, form, and fatigue, while it also enables setting the intensities of the training sessions precisely and minimizing burnout risk [19]. Power profiling, in turn, is used to assess the riders' level and potential and, at the same time, serves to redirect training towards the riders' weaknesses while also improving their strengths [29]. Figure 1 summarizes the main practical applications of Mpm-based training. Mpm-based assessments integrate both an objective measure of the work performed and the individual physiological characteristics, two elements that have been suggested as indispensable for the correct quantification of training load in road cycling [30].
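As a minimal illustration of the arithmetic behind the 20-min protocol described above, the following sketch derives FTP as 95% of the mean PO of a 20-min time trial. The function name and the example value of 300 W are assumptions made for illustration only and are not taken from the reviewed studies.

```python
def ftp_from_20min(mean_po_20min_w: float) -> float:
    """Estimate FTP (W) by subtracting 5% from the 20-min mean power output."""
    if mean_po_20min_w <= 0:
        raise ValueError("mean power output must be positive")
    return 0.95 * mean_po_20min_w


# Hypothetical rider averaging 300 W over the 20-min time trial.
print(round(ftp_from_20min(300.0), 1))  # 285.0
```

The sketch only reproduces the 5% subtraction; it says nothing about whether the resulting value matches a laboratory-set threshold for a given rider.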
Although the FTP test, the training zones derived from its determination, and the power profile charts are commonly used by athletes and coaches, the scientific evidence behind these tools is scarce and often contradictory.
Previous evidence suggests that laboratory-based tests may represent a better alternative than field-based tests, as they are more reliable and enable the measurement of physiological variables that provide additional information [31]. This is further supported by Reiser [32], who showed that power output values obtained in the lab may be transferred to the field, especially if the cyclist is riding his/her own Mpm-mounted bike placed on an ergometer. To date, the evidence obtained from the studies that have attempted to assess the relationship between laboratory markers and the results obtained in a field test has not been reviewed. Whether laboratory-set thresholds can be replicated through a field test (and its ideal duration) remains unknown.
To date, the most commonly applied approach for profiling and evaluating cycling performance level has been developed by Coggan [19]. However, this method has several drawbacks: firstly, there is no scientific consensus regarding what defines trained, well-trained, and elite cyclists. Secondly, this method does not allow a distinction to be made based on the effort put into training, as genetic endowment leads to highly variable performance levels for a given training effort [33]. Lastly, concern arises about using world-class performance values as a reference point due to the fact that doping may cause the distortion of these values [34]. A comparison between the thresholds suggested by Coggan [19] and real-world values is needed before approving this power profiling method.
Mpm located in the crank arm can assess both angular velocity and torque [18,19]. This has important implications, as it allows the determination of not only cadence, but also different pedaling patterns and even power distribution between the legs [23]. Accordingly, Mpm have incorporated the possibility of assessing individual pedaling techniques in real-world conditions. Whether cadence is modified with power and/or cycling discipline is a question that merits further investigation.
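The link between torque, angular velocity, and power mentioned above follows from basic mechanics: power equals torque multiplied by angular velocity. The sketch below, with hypothetical function names and example values, shows how a crank-based reading could be turned into PO and into a left/right power split. It is an illustration of the principle, not a description of how any specific commercial device processes its signals.

```python
import math


def crank_power_w(torque_nm: float, cadence_rpm: float) -> float:
    """Mechanical power (W) = crank torque (N*m) x angular velocity (rad/s)."""
    angular_velocity = cadence_rpm * 2.0 * math.pi / 60.0  # rpm -> rad/s
    return torque_nm * angular_velocity


def left_right_balance(left_w: float, right_w: float) -> tuple[float, float]:
    """Share of total power (percent) contributed by each leg."""
    total = left_w + right_w
    if total <= 0:
        raise ValueError("total power must be positive")
    return 100.0 * left_w / total, 100.0 * right_w / total


# Hypothetical values: 30 N*m of average crank torque at 90 rpm.
print(round(crank_power_w(30.0, 90.0), 1))  # 282.7
print(left_right_balance(140.0, 145.0))
```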
Consequently, this narrative review aims to shed light on the following questions: (a) can the FTP test be used interchangeably with laboratory-set thresholds in trained and/or untrained cyclists?; (b) what is the ideal testing duration for assessing FTP in trained and/or untrained cyclists?; (c) could power profiling be used for talent detection and for assessing cyclists' strengths and weaknesses?; and (d) does optimal cadence depend on power output and place of assessment (laboratory versus field)?

Information Sources
A computer-based scientific literature search was completed from 1 March to 31 March 2020, using the following information sources: Medline (PubMed), Web of Science (WOS), the Cochrane Collaboration Database, Cochrane Library, Evidence Database (PEDro), Evidence Based Medicine (EBM) Search review, National Guidelines, EMBASE, and Scopus and Google Scholar system. To obtain an overview of the methodologies used to study FTP, power profiling, and power-based training zones, a broad search was performed for topics relating to cycling and Mpm using the keywords "cyclist," "cycling," "functional threshold," and "power meter" with Boolean operators, such as "AND" or "OR".

Study Inclusion Criteria
Two reviewers independently examined the titles and abstracts of all publications and determined the relevance of the publications for inclusion. The full texts were obtained to ascertain whether the publications satisfied the inclusion criteria. In addition, the reference sections of the selected articles were searched to identify other relevant articles. When considering final inclusion in this review, each paper's relevance to the following question was considered: does this document add to the field of Mpm-based cycling training and performance assessment? Stemming from this question, the following inclusion criteria were used: (a) studies related to power meter-based performance assessment; (b) samples of healthy trained and untrained participants; and (c) publication date between 1 January 1980 and 31 December 2019.
Following an initial full-text review, 42 out of the original 256 articles were deemed directly relevant to the topic and included for detailed reading. Using these criteria, 32 scientific papers with clear methodologies were selected for this review together with one relevant book, which was also included in the database and used to connect this paper's focus on empirical methods with the practical discourse on FTP and Mpm data.

Study Exclusion Criteria
Duplicated articles were deleted and abstracts and non-peer reviewed articles were excluded. The exclusion criteria were as follows: (a) studies related to power meter-based health interventions and assessments; (b) samples of unhealthy trained and untrained participants; and (c) power meter-based assessments in other sport disciplines. Ten records were excluded from the review process due to the following reasons: unrelated to the field of cycling (n = 6); cycling-based assessments performed with no performance purposes (n = 2); and unhealthy participants (n = 2). The study selection process has been summarized in Figure 2.

Relationship between Functional Threshold Power and Laboratory Thresholds
Ventilatory and lactate thresholds can be currently obtained through different methods during a graded exercise test. Ventilatory thresholds (VT) [35] and respiratory compensation points (RCP) [36] are normally calculated from oxygen uptake data. There is also a broad range of lactate thresholds (LT), which respond to different concepts and can be obtained through several different testing protocols: individual anaerobic threshold (IAT) [37], maximal lactate steady state (MLSS) [38], fixed blood lactate concentrations of 2 and 4 mmol/L, initial rises of 1 mmol/L, Dmax [39], and modified Dmax [40] methods.
The evidence regarding the true relationship between FTP and this broad range of laboratory-set thresholds is scarce and contradictory. It has been verified that the FTP obtained from a 20-min test can be sustained for long time periods (50–60 min) [41,42], a duration that approaches the quasi-steady state proposed by Allen and Coggan [19]. Therefore, out of all the methods for establishing the laboratory thresholds, the MLSS and the RCP should theoretically be linked to the FTP, as both refer to stable states that can be sustained over time [43]. These relationships have been previously tested and the correlations were nearly perfect for both RCP (r = 0.97) and MLSS (r = 0.91), although the intensity corresponding to MLSS differed by as much as 7% from FTP [44]. Furthermore, the relationship changed depending on the cyclists' level, with the well-trained group showing a higher association (r = 0.94) than the trained group (r = 0.91) [28,45,46]. Similar findings have been obtained in another study, in which FTP and LT were closely linked in trained cyclists but not in recreational cyclists [24]. The PO obtained from 20-min FTP testing does not seem to correlate with the other LT methods [24,26,27], except for the fixed blood lactate concentration of 4.0 mmol/L (r = 0.88, p < 0.001) [46]. A different FTP testing duration has also been examined in several studies. Carmichael and Rutberg [47] proposed an 8-min FTP estimation test, where 90% of the mean PO was used to calculate the functional threshold. As with the 20-min FTP test, a meaningful relationship was only established when LT was determined as the onset of blood lactate accumulation at 4.0 mmol/L, although moderate correlations were obtained for the lactate thresholds determined as an initial rise of 1.0 mmol/L, Dmax, and modified Dmax (r = 0.61–0.82) [25,48,49]. Table 1 summarizes the most important aspects of the studies included in this section of the review.
Sustainability 2020, 12, 5216
From the reviewed studies, the protocol proposed by Allen and Coggan [19] has been used the most for establishing FTP, and high correlations between FTP obtained through this method and several laboratory markers, such as RCP and MLSS, have been observed [43–47]. However, the existence of high levels of inter-individual variability could influence the obtained values. Although various studies have shown a relationship between FTP and LT determined as the onset of blood lactate accumulation at 4.0 mmol/L, it is well known that establishing LT at fixed blood lactate levels does not take into account the considerable inter-individual differences in lactate metabolism and may overestimate or underestimate the MLSS, which shows great variability among individuals (from 2 to 8 mmol/L) [50]. Therefore, this finding remains anecdotal, as relying on fixed values for determining the anaerobic threshold is no longer accepted in the practical field [51]. Finally, the reviewed studies have used samples characterized by wide ranges of fitness levels (VO2max from 46 to 75 mL/kg/min). This raises the question of whether fitness level has an influence on the relationship between FTP and laboratory thresholds. To date, the sample has been divided according to fitness level in only a few studies, and the conclusions suggest that the relationship between FTP and laboratory markers may be stronger in well-trained individuals compared to untrained cyclists [24,28,45–47]. Although this finding should be further explored in future studies, one possible explanation is that higher-level cyclists are normally more experienced, and previous familiarization and pacing experience play an important role in the accuracy of steady-state tests, such as the FTP [52–54].
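The distinction between correlation and agreement that runs through this section can be made concrete with a short sketch. The data are hypothetical and were constructed so that FTP sits exactly 7% above MLSS for every rider: the correlation is then perfect even though the two markers are not interchangeable. The function names are illustrative assumptions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_percent_difference(reference, estimate):
    """Mean of (estimate - reference) / reference, expressed in percent."""
    diffs = [(e - r) / r for r, e in zip(reference, estimate)]
    return 100.0 * sum(diffs) / len(diffs)


# Hypothetical data (W), not taken from any reviewed study.
mlss_w = [220.0, 250.0, 270.0, 300.0, 330.0]
ftp_w = [1.07 * p for p in mlss_w]  # FTP set 7% above MLSS for every rider

print(round(pearson_r(mlss_w, ftp_w), 3))                # 1.0
print(round(mean_percent_difference(mlss_w, ftp_w), 1))  # 7.0
```

A high r value therefore supports using FTP to rank riders, but it does not by itself justify substituting FTP for MLSS or RCP when prescribing intensity for an individual.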

FTP Testing Durations
As stated in the previous section, FTP obtained through Allen and Coggan's method [19] is very highly correlated with steady-state physiological concepts, such as MLSS [28] and RCP [45,46]. Despite this, the 20-min testing duration may have some setbacks: experience and pacing strategy play an important role in long time trial-like efforts [52–63] and the results of the test seem to be strongly influenced by previous familiarization [54], especially in inexperienced athletes. As the FTP is, by definition, a quasi-steady state that relies mainly on aerobic metabolism, it could be suggested that almost any steady-state time trial effort of sufficient duration would be related to this threshold [55]. Consequently, several authors have suggested shorter alternatives for testing FTP.
Carmichael and Rutberg [47] proposed an 8-min test for estimating FTP, which does not seem to be related to any laboratory-set threshold except for the fixed blood lactate concentration of 4 mmol/L [26,48,49]. Furthermore, it has not yet been confirmed whether 90% of the 8-min PO equals 95% of the 20-min PO. Moreover, it is well known that maximal PO obtained during a graded exercise test accurately predicts FTP [55], and as much as 91% of PO variation in a 20-min test can be explained by peak oxygen uptake [46]. Accordingly, several even shorter durations have been proposed for FTP testing: 4-min PO seems to be very strongly correlated with 20- and 60-min PO (r = 0.92–0.95, p < 0.001), and 75% of the 4-min PO could represent the maximal PO that can be sustained for one hour [56]. Contrary to what is suggested in the standard protocol, the aforementioned study showed that 60-min PO represented 90% and not 95% of 20-min PO, a difference that could be explained by the discrepancies between the warm-up protocol performed in that study and the one suggested by Allen and Coggan [19]. Burnley [57] suggested that subtracting 15 W from the mean PO obtained in the last 30 s of a 3-min all-out test results in a steady state that can be maintained with stable VO2 and blood lactate levels. However, no assessment of 20-min PO was performed in this case and these results, although promising, could only be verified in 60% of all tested subjects. Considering the latter, caution is required when attempting these protocols on a group basis without previous verification. Figure 3 represents a summary of the different durations and intensities proposed for establishing FTP.
As described in this section, different FTP testing durations have been evaluated in previous scientific research. The protocol proposed by Allen and Coggan [19] allows the determination of a steady-state PO that is commonly linked to several laboratory markers, such as MLSS and RCP. Shorter tests may represent a promising alternative, especially for inexperienced athletes, due to their easy implementation and limited duration. However, current evidence regarding these alternatives is still scarce and, therefore, further studies need to clarify whether shorter protocols can be used interchangeably with Allen and Coggan's test [19].
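The testing durations discussed in this section can be placed side by side in a short sketch. The function name, dictionary keys, and example values are illustrative assumptions. The 4-min factor assumes that the sustainable one-hour PO is about 75% of the 4-min PO, which is how that finding is read here, and the 3-min rule was verified in only part of the tested subjects, so the outputs should be treated as rough estimates to be checked on an individual basis.

```python
def ftp_estimates(po_20min=None, po_8min=None, po_4min=None, po_3min_last30s=None):
    """Return FTP-type estimates (W) for whichever test results are supplied.

    po_20min        : mean PO of a 20-min time trial
    po_8min         : mean PO of an 8-min test
    po_4min         : mean PO of a 4-min test
    po_3min_last30s : mean PO over the last 30 s of a 3-min all-out test
    """
    estimates = {}
    if po_20min is not None:
        estimates["20-min x 0.95"] = 0.95 * po_20min
    if po_8min is not None:
        estimates["8-min x 0.90"] = 0.90 * po_8min
    if po_4min is not None:
        # Assumption: sustainable 60-min PO is roughly 75% of 4-min PO.
        estimates["4-min x 0.75"] = 0.75 * po_4min
    if po_3min_last30s is not None:
        estimates["3-min all-out - 15 W"] = po_3min_last30s - 15.0
    return estimates


# Hypothetical test results (W) for a single rider.
for protocol, watts in ftp_estimates(300.0, 320.0, 380.0, 300.0).items():
    print(f"{protocol}: {watts:.0f} W")
```

In this invented example the four protocols agree within a few watts, which need not be the case in practice; large disagreements for a given rider would be a reason to prefer the longer, better-studied test.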

Power Profiling in Road Cycling
The power profiling first proposed by Allen and Coggan [19] has been used over the last decade to objectively quantify the performances of different cyclists and to categorize riders according to their strengths and weaknesses. This ecologically valid assessment of power-producing capacity over cycling-specific durations is a useful tool for quantifying elements of cycling-specific performance in competitive and recreational cyclists [58]. It has been established in competitive, elite, and professional cyclists that the power profile obtained in the laboratory can successfully match the values obtained by recovering training and competition data during full cycling seasons [29,30,58]. This signature of the cyclists' physical ability is based on a hyperbolic relationship between record PO and effort duration (1 s to 4 h) and is normally used to compare data between different classes of riders: among professional riders, sprinters have the highest PO for 1 to 5 s (up to 20 W/kg), climbers present the highest PO for 5 to 60 min (5.5 to 6.5 W/kg), and flat specialists present high PO for durations up to three hours (over 4 W/kg) [29,30]. Finally, the power profile of grand tour riders shows high PO throughout the entire curve: values of 18.1–20.4 W/kg for 1 to 5 s; 7.2–5.7 W/kg for 5 to 60 min; and almost 5 W/kg for 3 h have been previously reported in the literature [59].
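The record PO for a given duration can be recovered from recorded data with a sliding-window average. The sketch below assumes a power recording sampled once per second and a known body mass; the function names, the chosen durations, and the synthetic ride are illustrative assumptions rather than a method prescribed by the reviewed studies.

```python
def mean_maximal_power(power_w, duration_s):
    """Highest mean power (W) over any window of `duration_s` samples (1 Hz data)."""
    if duration_s <= 0 or duration_s > len(power_w):
        raise ValueError("duration must be between 1 and the length of the recording")
    window = sum(power_w[:duration_s])
    best = window
    for i in range(duration_s, len(power_w)):
        # Slide the window forward by one sample.
        window += power_w[i] - power_w[i - duration_s]
        best = max(best, window)
    return best / duration_s


def power_profile(power_w, body_mass_kg, durations_s=(5, 60, 300, 1200)):
    """Record power relative to body mass (W/kg) for each duration that fits."""
    return {
        d: mean_maximal_power(power_w, d) / body_mass_kg
        for d in durations_s
        if d <= len(power_w)
    }


# Synthetic ride: 600 s at 200 W, a 5-s sprint at 1000 W, then 600 s at 250 W.
ride = [200] * 600 + [1000] * 5 + [250] * 600
for duration, w_per_kg in power_profile(ride, body_mass_kg=70.0).items():
    print(f"{duration:>5} s: {w_per_kg:.2f} W/kg")
```

Applied to a full season of files, the same calculation would be repeated per ride and the maximum kept for each duration, which yields the record power curve used to compare rider types.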
Interestingly, the power profile of riders who specialize in a particular type of event can be matched with data obtained through the analysis of different types of races, such as time trials [60], different grand tour stages [61], or even cycling sportive events [62]. The opportunity to analyze the power requirements that characterize a specific event and then compare them to the strengths and weaknesses of the rider should not be overlooked due to its potential applications in the practical field.
Finally, power profiling has allowed researchers and coaches to objectively track riders' levels and changes in performance without the need to use expensive laboratory equipment to assess fitness through cardiorespiratory values, such as VO2max or VT/LT. Outside of the practical field, scientists have also started to acknowledge this potential way of assessing performance, and, nowadays, it is increasingly common to find studies in which the participants' fitness levels or changes in performance are measured in W/kg [25,62–64].

Cadence and Power Output
The opportunity to objectively assess intensity in cycling has brought attention to new details, such as the optimal cadence associated with a specific PO. This new possibility for implementing field tests seems relevant, especially when considering the important setbacks associated with studying the relationship between power and cadence in a laboratory setting: it has been observed that crank torque profiles on the ergometer are significantly different and generate a higher perceived exertion compared with road cycling conditions [65]. Furthermore, the crank torque profile varies substantially according to the terrain, a conditioning factor that cannot be recreated in a laboratory setting [66]. It should also be noted that self-selected cadence is normally higher in the laboratory setting compared with road conditions [58] and that imposing a cadence can modify the amount of work that a cyclist can complete above their FTP [67].
Taking all of the above into account, the importance of obtaining field power and cadence data should be emphasized. Previous evidence suggests that higher PO is linked to higher self-selected cadence [68], although this relationship could be modified through specific training [69], with low-cadence intervals improving performance in time trial-like efforts and high-cadence intervals increasing the self-selected cadence. It currently remains unknown whether there is a relationship between FTP and self-selected or more efficient cadence.

Limitations
The studies included in the current review presented several limitations. Firstly, the sample sizes were relatively small (fewer than 30 participants in all cases). This restricts the conclusions that can be drawn from the results. However, limited sample sizes can be mostly explained by the difficulties that arise when attempting to recruit participants for complex laboratory tests. Secondly, different measuring devices (treadmills, power meters, gas exchange measuring tools, lactate analyzers, etc.) were used across the studies. This further limited the possibility of comparing the results obtained in different studies. Finally, participants with a wide range of fitness levels were included in most of the studies. The studies that controlled for this variable found differences in the relationship between some laboratory and field markers according to physical condition. Therefore, it seems likely that this factor could influence the results obtained in other areas of the current review.
One of the confounding factors that must be taken into account when performing indoor laboratory cycling tests is the fact that outdoor conditions are difficult to replicate (temperature, humidity, cooling, different positioning on the bike, different muscle activation on a fixed position, etc.). The reviewed studies aimed to compare outdoor and indoor tests in the same conditions and, therefore, the assessments were performed in an artificial environment. This should be considered when interpreting the results, as none of the studies analyzed the differences that could be expected when riding outdoors.
Finally, the main aim of the current review was to describe the evidence that exists around the specific research questions addressed in this manuscript. The broadness of the covered topic allowed the inclusion of studies that were very different in their methodology, participants, or even objectives. This limited the possibilities to perform an objective assessment of the results across all studies and resulted in a review that is not systematic in nature. Another important setback was related to the search strategy, which was oriented towards answering the research questions. This posed an important methodological limitation to the current review, as several relevant concepts, such as critical power and novel metrics derived from power assessment (Stamina, FRC, W' balance, or Pmax) have not been covered. These important limitations should be pondered when interpreting the conclusions proposed by the authors.

Conclusions
From the evidence presented in this review, it could be suggested that: (a) FTP is closely related to, but does not necessarily represent, several laboratory markers of steady state, such as RCP and MLSS, and this relationship should always be evaluated on an individual basis; (b) the 20-min protocol represents the most researched option for testing FTP, while other testing durations have less scientific support; (c) power profiling obtained through the recovery of record PO for different durations from a series of training and competition data allows for the categorization and assessment of the cyclist's fitness level without the need to rely on laboratory tests; and (d) Mpm can be used to assess the relationship between cadence and power in the field.

Conflicts of Interest:
The authors declare no conflict of interest.