1. Introduction
In various sports—certain track and field events and cycling disciplines, gymnastics, combat sports, and most game and snow sports—explosive actions, such as jumping, accelerating, changing direction, or launching an object or opponent, contribute crucially to performance [
1,
2,
3,
4,
5,
6]. Actions such as these depend heavily on the ability to generate muscle and external force at high velocities and within timespans ranging from several milliseconds to a few seconds [
3,
7,
8,
9,
10,
11]. This ability can be referred to as explosive strength [
12]. Since the coupling of force and velocity implied here is well represented by mechanical power, and because the energy for muscle work during these short-duration actions is not generated aerobically, the ability can also be referred to as muscular power [
13] or anaerobic power [
14].
Within the context of performance testing and the monitoring of athletic training, methods must be established for quantifying certain physical abilities considered to be relevant to a particular sports performance [
15], to recognize changes in these over time induced by maturation, training, detraining, injury, and the like. In this context, the assessment of anaerobic power has received continuous attention in sports science literature for several decades [
16,
17,
18]. Two of the most prominent methods for assessing anaerobic power that have emerged from past research are various forms of vertical jumping and cycling sprint tests [
16,
18,
19]. The validity and reliability of both test forms are generally accepted [
15,
20,
21,
22], and their application extends to a wide range of sports beyond jumping and cycling disciplines alone [
23,
24,
25,
26,
27,
28].
Whereas earlier versions of the vertical jump test focused on jump height [
16], mechanical power has emerged in recent years as the superior parameter for assessing explosive strength of the lower extremities in athletes [
23]. Although jump height is directly coupled to mechanical work performed during the push-off phase and to take-off velocity [
29], this parameter does not reflect the time component of muscular force development [
30], which is especially important in many sport settings [
3,
7,
8,
9,
31]. Since mechanical power does consider the time component, it seems fitting that this parameter has shifted to the forefront of consciousness regarding explosive strength assessment.
This change has come hand in hand with the rapid evolution of testing technology, especially force plates, whose implementation is becoming commonplace in performance testing facilities. High-resolution force plates (sample rates typically ≥1000 Hz) allow the determination of instantaneous mechanical power as the product of instantaneous force and velocity, the latter obtained by mathematical integration of the mass-normalized net force (i.e., acceleration) signal. While force plates capable of measuring in either one or three dimensions now represent the gold standard for determining mechanical power in movements, such as jumping [
29], hopping [
32], and sprinting [
33,
34], other technologies have emerged as well. For example, portable position transducers and accelerometers capable of estimating power from acceleration and mass are becoming common for training and field-testing purposes [
35,
36]. Even a simple, yet remarkably valid, method for determining lower-extremity muscle power based on vertical jump flight time and push-off distance, measured with either a smart phone or other device, has been established [
29,
37].
Although measuring mechanical power in cycling sprint tests has a longer history [
17,
38], the advent of electronic direct-force cycling power meters capable of recording instantaneous power one or more times per second has refined diagnostics of maximal cycling power output drastically. Further, with the increasingly common use of power meter-equipped bicycles and ergometers in training, athletes and coaches are perhaps beginning to think of cycling sprint performance more in terms of power than ever before [
39,
40,
41].
In addition to developments in measurement technology and criterion parameters, testing procedures themselves have also evolved over the years. Whereas early studies on anaerobic power employed rather long (up to 30 s) cycling sprint tests [
17] and quite generic “jump height” tests [
16], these have gradually lost favor among researchers and performance testing facilities, and been replaced by shorter sprints [
42,
43,
44] and more standardized jump tests [
22]. In particular, the six-second cycling sprint has established itself as a means for determining maximal cycling power in research settings [
20,
42] and is also commonly employed in standardized performance testing [
44]. During all-out cycling sprint tests, peak power is achieved within the first few seconds [
17]; thus, longer durations, although appropriate for assessing anaerobic capacity or fatigue resistance, are unnecessary for determining peak power and perhaps even detrimental if they cause subjects to unconsciously pace themselves for the longer effort [
21,
44]. For assessing jumping power, standardized countermovement or squat jumps without arm movement have become most common. Eliminating arm movement, typically by placing hands on the hips, provides a more reliable measurement of lower extremity power [
22]. The squat jump, in particular, is arguably the most elementary and standardizable jump form for assessing anaerobic power of the lower limbs, since it isolates concentric activation over a pre-defined range-of-motion and eliminates the influence of a stretch-shortening cycle [
45]. In the case of both cycling sprint and jump tests, improvements have been made over time in the reliability and specificity, and arguably the time-efficiency, of methods, criteria which are essential for assessing and tracking athletes’ performance.
Although both cycling sprint and vertical jump tests are valid and well-established methods for assessing anaerobic power, they obviously differ considerably, particularly in their respective movement patterns (unilateral versus bilateral, and cyclic versus acyclic) and test durations (several seconds versus less than one second). Whereas several previous studies have addressed correlations between anaerobic cycling power and vertical jump height [
19,
24,
25,
38,
45,
46], the aforementioned evolution of methodology and awareness makes the power-to-power comparison more obvious, and thus more relevant, today than was perhaps the case years ago. However, due to a surprising sparsity of published research on this topic [
45,
47], it remains unclear how well the power measurements from the two tests (and, more importantly with regard to performance testing, changes in these) correlate with one another. Information about this relationship is of particular relevance for performance testing centers seeking the most economical use of their spatial and financial resources, as well as athletes’ time and energy. If the two tests are interchangeable, it could be unnecessary for centers to provide facilities for both.
For the two tests to be considered interchangeable, more than a high correlation in cross-sectional data is necessary. It must also be shown that changes in test results over time (longitudinal data), especially those induced by training or de-training, correlate well with one another, and, ultimately, that similar conclusions about training effectiveness would be drawn from both test forms. Thus, the aim of this study was three-fold: The preliminary aim was to re-assess the relationship between anaerobic cycling and jumping performance from a cross-sectional perspective, this time using peak power from both tests, in a group of strength-trained athletes. Building on this, we then investigated, from a longitudinal perspective, the relationship between changes in peak cycling power and in peak jumping power over time. Finally, we assessed how well magnitude-based inferences about individual changes in ability over time agree between the two tests. The corresponding hypotheses to be tested were that cycling and jumping power correlate very strongly, that changes in these correlate very strongly, and that inferences about individual changes in these agree in the large majority (at least 75%) of cases. To fulfil the study aims, we used data from a six-second cycling sprint test and an unloaded vertical squat jump, obtained from professional ski cross racers during their routine performance testing, at six different time points over the course of two annual training cycles. Additionally, because other jump forms were also included in the performance tests at each time point, we explored analogous relationships between cycling peak power and both countermovement jump power and squat jump peak power with an additional load equal to body weight.
2. Materials and Methods
2.1. Subjects
Data were collected from professional ski cross racers from the Swiss national team, over the course of two annual training cycles, during regular performance testing at the Swiss Federal Institute of Sport (Magglingen, Switzerland). To be precise, tests were conducted in the months of May, August, and November of two consecutive years. Estimates of subjects’ one-repetition maximum for back half squats (knee angle ~100°) based on isometric squatting against a force plate and a validated conversion factor of 0.7 [
48] were 2.0–2.7 × body mass for females and 2.2–3.4 × body mass for males. Thus, subjects were considered to be strength-trained. Although the primary reason for the performance testing was to guide the athletes individually in their training, all athletes gave written consent for their anonymized data to be used for research purposes. Further, the methods and procedures of the research were approved by the ethical review board of the canton of Bern, Switzerland (project ID 2018-00742). In total, data from 20 athletes were available for the study. Descriptive characteristics of the cohort at the onset of the study are displayed in
Table 1.
2.2. Data Collection
Because data were obtained during routine performance testing, additional measurements not included in the present investigation were also performed. After athletes arrived at the testing center, their body height was measured using a stadiometer. Athletes then proceeded to warm up individually for 20–30 min. Testing commenced with an isometric squat strength testing procedure (data not included in the study), which lasted around five minutes. Thereafter, athletes completed a battery of single vertical jumps, including countermovement and squat jumps with additional loads ranging from 0% to 100% of body weight. For the unloaded condition, athletes placed their hands on their hips to eliminate arm swing while jumping. For loaded conditions, athletes jumped with a 10-kg barbell loaded with weight plates placed across their shoulders (as when performing back squats). A custom-made, hand-activated retention system, which was engaged during the flight phase of loaded jumps, ensured that subjects could land safely with no load on their shoulders. Squat jumps were commenced from a static starting position with a knee angle of approximately 90°, which was controlled visually by an investigator. For countermovement jumps, the depth of the countermovement was determined instinctively by the athletes themselves (typically 0.20–0.35 m), but data were filtered before analysis (see below) to ensure continuity in individuals’ jump execution across time points. Athletes were instructed to jump as high and explosively as possible. For unloaded jumps and jumps with additional loads up to 20% of body weight, three valid trials were performed. For heavier loads, to minimize fatigue, only one valid trial was required, although 1–2 additional trials were performed in the case of obviously poor execution or peak power values clearly lower than expected based on the trend across the preceding loading conditions.
Jumps were performed on a one-dimensional force plate (MLD Test EVO 2, SP Sport, Trins, Austria), which recorded ground reaction force at 1000 Hz. Using the total mass, determined by the force plate prior to each jump, the accompanying software (Muskelleistungsdiagnose 2010, version 5.2.0.6101, InfPro IT Solutions GmbH, Innsbruck, Austria) calculated acceleration–time, velocity–time, power–time, and position–time curves of the center of mass from the recorded force–time signal for each jump. Squat jumps for which mechanical power descended below −1 W/kg body mass within 1 s prior to the onset of positive (upward) velocity—indicating an unacceptable countermovement—were deemed invalid and deleted.
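The processing chain described above can be illustrated with a minimal sketch. It is not the code of the commercial software; it assumes trapezoidal integration, an athlete who is motionless at the first sample, and a constant g of 9.81 m/s². The function names, and the way the onset of positive velocity is located (walking back from peak velocity), are hypothetical choices made for illustration only.

```python
from typing import List, Tuple

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)


def jump_curves(
    force: List[float], mass: float, rate_hz: float = 1000.0
) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Derive acceleration, velocity, position and power from vertical force.

    force: vertical ground reaction force samples in N.
    mass:  total system mass in kg (athlete plus any additional load).
    Assumes zero velocity and zero displacement at the first sample.
    """
    dt = 1.0 / rate_hz
    # Mass-normalized net force, i.e., acceleration of the center of mass.
    acc = [(f - mass * G) / mass for f in force]
    vel, pos = [0.0], [0.0]
    for i in range(1, len(force)):
        # Trapezoidal integration: acceleration -> velocity -> position.
        vel.append(vel[-1] + 0.5 * (acc[i - 1] + acc[i]) * dt)
        pos.append(pos[-1] + 0.5 * (vel[-2] + vel[-1]) * dt)
    # Instantaneous power as the product of force and velocity (W).
    power = [f * v for f, v in zip(force, vel)]
    return acc, vel, pos, power


def is_valid_squat_jump(
    power: List[float],
    vel: List[float],
    body_mass: float,
    rate_hz: float = 1000.0,
    limit_w_per_kg: float = -1.0,
) -> bool:
    """Check the countermovement criterion for a squat jump.

    Assumption: the onset of positive velocity is the start of the
    uninterrupted run of positive velocity that contains peak velocity.
    """
    peak = max(range(len(vel)), key=vel.__getitem__)
    onset = peak
    while onset > 0 and vel[onset - 1] > 0.0:
        onset -= 1
    # Inspect up to 1 s of data before the onset of upward movement.
    window_start = max(0, onset - int(rate_hz))
    return all(p / body_mass >= limit_w_per_kg for p in power[window_start:onset])
```

With real recordings, the force signal would typically be trimmed to end at take-off, and the result is sensitive to the mass estimate, since any error in mass accumulates during integration.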
For the present investigation, we decided a priori to include only squat jumps (because these are inherently most standardized) performed with no additional load (because this is most common in practice), to keep the results concise. However, during data analysis, the other extreme loading condition (100% body weight) as well as countermovement jumps (unloaded only) were deemed helpful for explaining findings, and were thus included in the presentation of results. Thus, the jump parameters retained for data analysis were the peak concentric power for squat jump (SJ), squat jump with an additional load equal to body weight (LSJ), and unloaded countermovement jump (CMJ). For each jump type, the value from the trial with the highest peak concentric power was retained. The test-retest percent typical error (CV) for peak power using the described protocol has been previously determined to lie between 2.7% (unloaded) and 4.7% (100% body weight additional load) for squat jumps and between 2.5% (unloaded) and 3.9% (100% load) for countermovement jumps [
49].
Following the last vertical jump, athletes proceeded to the six-second cycling sprint test. This was performed with a flying start on a Wattbike ergometer (Wattbike Trainer, Nottingham, UK) according to the 6” Peak Power Test protocol in the Wattbike test guide [
50] (pp. 19–21). Initially, saddle height and handlebar position were set up according to athletes’ personal preferences (settings were determined at the first time point and replicated thereafter). Then, athletes pedaled with minimal resistance (<100 W) for 1–2 min while the test procedure was explained to them by an investigator. Resistance settings for the test were determined at each time point anew based on sex and current body mass, according to the recommendations in the Wattbike test guide [
50] (p. 24). Precisely, according to body mass ranges for each gender (see
Table 1), the air/magnet settings varied between 4/1 and 6/1 for females and between 8/1 and 8/6 for males. Immediately prior to the measurement, athletes pedaled with no load for 20–30 s while the protocol was selected on the Wattbike monitor. Simultaneously with the onset of the six-second timer, the resistance was increased to the selected level by an investigator, and athletes pedaled as hard and fast as possible for six seconds while seated on the saddle. During the test, the Wattbike was stabilized by an investigator or coach to prevent unwanted movement of the entire ergometer. The Wattbike measures chain tension with a load cell at 100 Hz and crank angular velocity twice per revolution, yielding two power values per revolution [
51,
52]. For the present study, only the highest power value recorded during the six-second cycling sprint test (6sCS) was analyzed.
2.3. Statistical Analysis
Statistical analyses were performed using customized Python scripts. Initially, to ensure continuity of individuals’ jump execution across time points, jumps for which the concentric push-off distance differed by more than 0.05 m from the subject’s mean value were excluded from data analysis. For descriptive purposes, peak power values averaged across all time points for each of the four tests (6sCS, SJ, LSJ, CMJ) were assessed for normal distribution using the Shapiro–Wilk test and a significance level (α) of 0.05, then compared for females, males, and the pooled cohort using either a one-way ANOVA (normally distributed data) or the Kruskal–Wallis test (if one or more tests did not pass the test for normal distribution), also with α = 0.05. Post-hoc comparisons between tests were made using Bonferroni correction.
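The push-off distance filter could be sketched as follows. This is an illustration rather than the study's script: it assumes the mean is taken once over all of a subject's jumps of a given type and is not recomputed after exclusions, and the function name is hypothetical.

```python
from statistics import mean
from typing import List


def filter_by_push_off(distances_m: List[float], tolerance_m: float = 0.05) -> List[int]:
    """Return indices of jumps whose concentric push-off distance lies
    within tolerance_m of the subject's mean push-off distance."""
    subject_mean = mean(distances_m)  # single pass; mean is not recomputed
    return [
        i for i, d in enumerate(distances_m)
        if abs(d - subject_mean) <= tolerance_m
    ]


# Example: the fourth jump deviates by 0.08 m from the mean (0.42 m).
kept = filter_by_push_off([0.40, 0.41, 0.39, 0.50, 0.40])
```

In this example, `kept` contains the indices of all jumps except the fourth.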
For hypothesis testing, change scores were calculated and expressed as a percent of the pre-test value, rather than in Watts. With six time points, a maximum of 15 ($\binom{6}{2} = 15$) possible change scores per subject could be calculated. For additional analyses, two particular sub-types of changes were analyzed separately: changes between consecutive time points ($t_2 - t_1$, $t_3 - t_2$, $t_4 - t_3$, $t_5 - t_4$, $t_6 - t_5$, five total) and changes between time points separated by one year ($t_4 - t_1$, $t_5 - t_2$, $t_6 - t_3$, three total), because these represent two common time spans between performance tests for elite and amateur athletes. Peak power and change score data sets were tested for normal distribution using the Shapiro–Wilk test (α = 0.05). Thereafter, relationships between raw power values (cross-sectional correlations) and between change scores (longitudinal correlations) were assessed using either Pearson’s r (if both data sets were normally distributed) or Spearman’s rho (if one or both data sets did not pass the test for normal distribution). For cross-sectional correlations, both absolute and body-mass-normalized power were analyzed, since absolute power was expected to be rather heterogeneous, which can exaggerate the strength of correlations. For the same reason, cross-sectional correlations were performed separately for males and females. Longitudinal correlations, however, were performed on change scores for absolute power only and for both genders combined; this choice was made for the sake of being concise while maximizing n, and since the effects on percent changes of gender and normalizing power were expected to be negligible. Correlation coefficients were categorized based on their magnitude as trivial (0–0.1), small (0.1–0.3), moderate (0.3–0.5), large (0.5–0.7), very large (0.7–0.9), or extremely large (>0.9) according to Hopkins et al. [
53].
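As an illustration of the change-score bookkeeping, the following sketch enumerates all pairs of time points, extracts the consecutive and one-year subsets, and labels a correlation coefficient by magnitude. The function names are hypothetical, missing tests are assumed to be marked with `None`, and values falling exactly on a category boundary are assigned to the lower category, which is an assumption since the stated ranges share their endpoints.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, int]


def percent_changes(values: List[Optional[float]]) -> Dict[Pair, float]:
    """Map (earlier, later) time-point pairs (1-based) to percent change
    relative to the earlier value. None marks a missing test."""
    changes: Dict[Pair, float] = {}
    for i, j in combinations(range(len(values)), 2):
        pre, post = values[i], values[j]
        if pre is not None and post is not None:
            changes[(i + 1, j + 1)] = 100.0 * (post - pre) / pre
    return changes


def subset(changes: Dict[Pair, float], lag: int) -> Dict[Pair, float]:
    """Keep changes whose time points are `lag` tests apart
    (lag=1: consecutive tests; lag=3: one year with three tests per year)."""
    return {pair: c for pair, c in changes.items() if pair[1] - pair[0] == lag}


def correlation_magnitude(r: float) -> str:
    """Label |r| using the thresholds 0.1, 0.3, 0.5, 0.7 and 0.9."""
    bounds = [(0.1, "trivial"), (0.3, "small"), (0.5, "moderate"),
              (0.7, "large"), (0.9, "very large")]
    for upper, label in bounds:
        if abs(r) <= upper:
            return label
    return "extremely large"


# Example with six hypothetical peak power values (W) for one athlete.
changes = percent_changes([1000.0, 1100.0, 1050.0, 980.0, 1080.0, 1060.0])
consecutive = subset(changes, lag=1)  # five change scores
one_year = subset(changes, lag=3)     # three change scores
```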
Finally, a magnitude-based inference about each individual change was made using the methods described by Hopkins [
54] and integrated into his open-source spreadsheet [
55]. For SJ, LSJ, and CMJ peak power, percent typical errors (CV) of 2.7%, 4.7%, and 2.5%, respectively, were used [
49], whereas a 4.9% typical error was assumed for 6sCS peak power [
52]. For all inferences, the smallest meaningful change was set to 1%, and the confidence range for the true change was 80%. In short, based on measured changes, typical errors, and smallest meaningful changes, inferences were formulated, each of which included a degree of certainty (‘very likely’, ‘possible’, or ‘unclear’) and a change type (‘increase’, ‘trivial change’, or ‘decrease’). To assess the degree of agreement between magnitude-based inferences, a three-tier approach was used: The percentage of cases was reported for which (1) the most likely change (independent of its certainty) agreed, (2) the phrased inferences agreed, and (3) the phrased inferences contradicted. Inferences were considered to agree if the most likely changes agreed and the certainty of neither was ‘unclear’. Inferences were considered to contradict if one included ‘increase’ and the other included ‘decrease’ and the certainty of neither was ‘unclear’.
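The three-tier agreement assessment can be sketched as below. The `agreement` function follows the rules stated above. The `infer_change` function is only a simplified stand-in for the spreadsheet calculation, under loudly stated assumptions: the error of an observed change is taken as normal with a standard deviation of the typical error times √2, percent units are used directly without log transformation, and an inference is treated as ‘unclear’ when the chances of a substantial increase and a substantial decrease both exceed half of the complement of the confidence level. All names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Dict, List, Tuple

Inference = Tuple[str, bool]  # (most likely change type, is the inference clear?)


def infer_change(change_pct: float, typical_error_pct: float,
                 smallest_change_pct: float = 1.0,
                 confidence: float = 0.80) -> Inference:
    """Simplified individual-change inference (assumptions in the lead-in)."""
    dist = NormalDist(mu=change_pct, sigma=typical_error_pct * sqrt(2))
    p_decrease = dist.cdf(-smallest_change_pct)
    p_increase = 1.0 - dist.cdf(smallest_change_pct)
    probs = {
        "increase": p_increase,
        "trivial change": 1.0 - p_increase - p_decrease,
        "decrease": p_decrease,
    }
    most_likely = max(probs, key=probs.get)
    # Assumed rule: unclear if both substantial outcomes remain plausible.
    tail = (1.0 - confidence) / 2.0
    is_clear = not (p_increase > tail and p_decrease > tail)
    return most_likely, is_clear


def agreement(a: List[Inference], b: List[Inference]) -> Dict[str, float]:
    """Percent of cases in which (1) the most likely change agrees,
    (2) the phrased inferences agree, (3) the phrased inferences contradict."""
    same_type = agree = contradict = 0
    for (change_a, clear_a), (change_b, clear_b) in zip(a, b):
        both_clear = clear_a and clear_b
        if change_a == change_b:
            same_type += 1
            if both_clear:
                agree += 1
        elif both_clear and {change_a, change_b} == {"increase", "decrease"}:
            contradict += 1
    n = len(a)
    return {
        "most_likely_agree": 100.0 * same_type / n,
        "inferences_agree": 100.0 * agree / n,
        "inferences_contradict": 100.0 * contradict / n,
    }
```

Under these assumptions, a 0% observed change with a 4.9% typical error is classed as unclear, which illustrates how a large typical error relative to the smallest meaningful change limits the number of clear inferences.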
4. Discussion
This study aimed to investigate the degree to which a six-second cycling sprint test performed on a Wattbike ergometer and a vertical squat jump test on a force plate are interchangeable as means for assessing anaerobic power of the lower extremities in strength-trained athletes. Answering this question could provide helpful information for smaller performance testing facilities, as well as coaches or sports clubs, who intend to invest in testing devices and wish to do so in the most resourceful manner possible. The main findings were that (1) extremely large linear correlations exist between maximal cycling power and maximal jumping power (
Table 3,
Figure 2 and
Figure 4), (2) large linear correlations on individual percent changes in maximal power across time spans of less than one year exist between cycling and unloaded jumping (
Table 4,
Figure 3 and
Figure 4), and (3) magnitude-based inferences made about individual test-to-test changes from sprint cycling and squat jumping power agree in around half of all cases, whereas the most likely types of change (substantial increase, decrease, or trivial change) agree in a slight majority of cases (
Figure 5).
First, this study confirmed a strong relationship between jumping power and sprint cycling power in the current cohort of strength-trained ski cross racers. Whereas previous studies have shown similar relationships, usually using jump height rather than power, in untrained subjects [
46], strength-trained non-cyclists [
19,
24,
25,
38,
56], and strength-trained cyclists [
27], this is one of few studies [
45,
47] to have directly compared mechanical power from both test forms. Thus, the close relationship between the two types of power measures seems to be fairly universal. Errors in the estimate of squat jumping maximal power from sprint cycling power using the regression equation from
Figure 2a were 434 (90% c.l. 383–502) W or about 10%. Similarly, strong relationships with 6sCS were found for LSJ and CMJ. These relationships are not surprising and, as has been described elsewhere, can be attributed to the fact that power in both tests is mainly produced by the same muscle groups (hip and knee extensors) in a similarly dynamic fashion. The clear systematic differences in the actual power values (
Table 2,
Figure 1) are most obviously due to the fact that cycling is a unilateral movement, whereas jump tests were performed in a bilateral manner. However, the magnitude of the differences between cycling and all forms of jumping suggests that other factors are in play as well. One such factor could have been diminished coordination for the cyclic movement [
57]. Another explanation for the lower power values in the 6sCS (<500 W in a few cases) was the combination of low ergometer resistance and a flying start, an effect that was exacerbated in the case of athletes with particularly steep force–velocity profiles (i.e., high relative maximal strength but comparatively low maximal contraction velocity). In essence, the movement velocity was seemingly too high for some athletes, and peak power was limited accordingly. Indeed, in pilot measurements performed after the study with some of the same athletes, cycle sprint tests with higher resistance and/or a standing start yielded 5%–10% higher peak power values. This raises the question of whether the test protocol and resistance settings recommended by Wattbike should be applied to strength-trained athletes, which might merit further exploration in practical settings.
For sprint cycling and vertical jump tests to be considered interchangeable, a predictable relationship must exist not only for cross-sectional data but also between changes in performance over time. To our knowledge, this relationship has not been well established in previously published research; doing so was, therefore, an important aim of this study. Correlations between longitudinal percent changes in power were somewhat smaller than the cross-sectional correlations on actual power, but nonetheless moderate to large in magnitude. As expected when designing the study, correlations between jumping and cycling power change scores were highest for SJ; nonetheless, 6sCS change scores also yielded large correlation coefficients when paired with LSJ or CMJ change scores. In general, the weakest correlations between jumping and cycling change scores were found for time spans of one year. One reason for the smaller correlations in this subset of change scores was the smaller range of values, which is not surprising since athletes were in the same phase of physical preparation at both time points. This finding suggests that larger changes in one test may be reflected in the other test, whereas smaller changes are not.
As a point of comparison, changes in power from the various jumping forms (SJ, LSJ, and CMJ) between consecutive time points also correlated slightly better (0.73–0.76) than those occurring over one year (0.59–0.71). Further, although change scores from the various jump forms correlated more strongly with each other than did change scores from any jump form with those from cycling, the differences were not all too drastic. Thus, from a longitudinal perspective, it appears that sprint cycling and squat jumping power tests reflect one another only about as well as do squat jump and countermovement jump tests. Typically, however, squat and countermovement jump tests are not considered interchangeable.
Although observed changes were mostly within around ±9%, it might seem remarkable to some readers that professional athletes occasionally experienced changes of almost ±20% within a few months. However, we do not find such changes particularly surprising considering that the present subjects’ (skiers’) explosive strength is generally only moderate (compared to sprint specialists, for example). At the same time, the present subjects (and skiers in general) possess very good maximal strength, which affords their explosive strength a rather large degree of trainability during the off-season conditioning phase. Additionally, a competitive season and a post-season break lay between time points 3 and 4, which could explain most of the larger decreases in power we observed. The occasional extreme changes, especially between consecutive time points, certainly expanded the data range, which tends to increase correlation coefficients. Accordingly, we attribute the lower correlation coefficients between change scores over one year to the narrower range of change scores (due to the fact that athletes were at the same phase of their physical preparation period at both time points).
One of the main uses of performance testing is for tracking changes in an athlete’s physical abilities over time. In the most rudimentary sense, this means breaking down test results from two time points into one of three possible concrete conclusions: the tested ability improved, it declined, or no meaningful change occurred. Magnitude-based inferences on individual changes are a statistical method that uses known or assumed values for measurement precision and the smallest meaningful change in a given ability to infer (with an accompanying degree of certainty) one of these three conclusions. If such inferences are to be made about an individual’s anaerobic power based on either SJ or 6sCS, and if the two tests are to be considered interchangeable, the probability that the inference over the same time span from the other test would agree must be high. Moreover, there must be a very small probability that inferences from the two tests would contradict. In the present study, these criteria were not fulfilled to a satisfactory degree. In the case of changes in 6sCS compared to changes in unloaded SJ, the most likely change (i.e., substantial increase, substantial decrease, or trivial change) displayed the most frequent agreement, in up to 63% of cases. However, since the most likely change does not always differentiate itself from the other two possible changes with enough certainty to justify its ultimate adoption, worded inferences agreed in only 43%–54% of cases, somewhat dependent on the time span. Moreover, inferences contradicted in 17%–26% of cases. For other jump forms, agreement with 6sCS was more or less the same. To put this result into context, around 20% agreement could be expected with random guessing alone. For comparison, inferences for SJ and LSJ agreed more frequently (57%–67% of cases) and also contradicted less often (11%–13% of cases).
We must concede, however, that the agreement in the large majority of cases we were looking for (75%, see hypotheses) was not achieved among the various jump tests either. Interestingly, inferences for SJ and CMJ agreed the least frequently (~33% of cases) among the tests we analyzed. Therefore, based on magnitude-based inferences, the degree of interchangeability between squat jump and cycle sprint tests is not particularly high, but comparable to the degree of interchangeability between various jump execution forms.
We recognize that the percentages of agreeing and contradicting inferences attained with our methods are dependent on choices we made regarding confidence intervals and smallest important changes; however, based on simulations with our data, percentages would not have been uniformly improved with another confidence range between 70% and 90%. Further, increasing the smallest important percent change from 1% up to 5% would have decreased percentages of both agreeing and contradicting cases simultaneously. Therefore, we believe our conclusions would not have been different had other reasonable choices for confidence intervals and smallest important changes been made.