An Assessment of the Hopping Strategy and Inter-Limb Asymmetry during the Triple Hop Test: A Test–Retest Pilot Study

Abstract: The aims of the present study are to: (1) determine within- and between-session reliability of multiple metrics obtained during the triple hop test; and (2) determine any systematic bias in both the test and inter-limb asymmetry scores for these metrics. Thirteen young male American football athletes performed three trials of a triple hop test on each leg on two separate occasions. In addition to the total distance hopped, touch-down and toe-off were manually detected via video analysis, enabling flight time (for each hop), ground contact time (GCT), reactive strength index (RSI), and leg stiffness (between hops) to be calculated. Results showed all coefficient of variation (CV) values were ≤10.67% and intraclass correlation coefficients (ICC) ranged from moderate to excellent (0.53–0.95) in both test sessions. Intrarater reliability was excellent for all metrics (CV ≤ 3.60%, ICC ≥ 0.97). No systematic bias was evident between test sessions for raw test scores (g = −0.34 to 0.32) or the magnitude of asymmetry (g = −0.19 to 0.43). However, 'real' changes in asymmetry (i.e., greater than the CV in session 1) were evident on an individual level for all metrics. For the direction of asymmetry, kappa coefficients revealed poor-to-fair levels of agreement between test sessions for all metrics (K = −0.10 to 0.39), with the exception of flight time for the first hop (K = 0.69). These data show that, given the inherent limitations of distance jumped in the triple hop test, practitioners can confidently gather a range of reliable data when computed manually, provided sufficient test familiarization is conducted. In addition, although the magnitude of asymmetry appears to show only small changes between test sessions, limb dominance does appear to fluctuate between test sessions, highlighting the value of also monitoring the direction of the imbalance.


Introduction
Jump testing is a common method of quantifying ballistic force production capabilities and is often implemented to assess lower body jump performance [1,2], neuromuscular fatigue [3,4], inter-limb asymmetry [5,6], and rehabilitation status post injury [7,8]. Jump tests are commonly used due to the associated (and previously reported) strong reliability and time-efficient methods [2,5], thus making them viable for a wide range of practitioners. Such reasons are key, as they help to ensure confidence in subsequent data collection procedures, whilst enabling a means of gathering objective data for those working in sports with large squads of athletes (e.g., American football, soccer, baseball, and rugby).
Horizontal hop testing is regularly implemented in rehabilitation settings and has been frequently cited in the rehabilitation literature as a method of analyzing rehabilitation status and return-to-sport readiness [7,9-11]. The tests often utilized include the single-leg hop and the triple and crossover hop for distance, with previous literature indicating strong reliability for all of these tests (ICC = 0.89-0.99) [12]. However, for team-sport athletes, repeated hopping tasks, such as the triple hop (i.e., three consecutive hops for maximal distance) test, may display greater levels of ecological validity than the single-leg hop (i.e., one hop for maximal distance), due to the repeated requirement for deceleration and subsequent concentric force application in as short a time as possible [13], which are common movement requirements in team sports [1,13,14]. Despite the usefulness of hop testing for those with limited budgets and the prevalence of their use in a clinical setting, recent evidence has identified that 'distance jumped' is a somewhat limited measure of jump performance that does not provide any information relating to jump strategy (i.e., how the jump was performed) [9]. Additionally, the validity of horizontal testing in clinical sports medicine settings has been questioned [15,16]. Similarly, in previous research relating to vertical jump testing, the outcome measure of jump height has been shown to be somewhat insensitive to changes in an athlete's readiness to train following competition or intense exercise [4,17]. Furthermore, metrics that elicit an understanding of jump strategy (e.g., reactive strength and propulsive impulse) have been consequently championed, owing to their better ability to detect meaningful change greater than the error in the test [4,17] and also provide a more in-depth understanding of jump performance [18][19][20][21].
Thus, with an increase in research investigating vertical jump strategy, it can be argued that the same is warranted for horizontal jumping. This suggestion is supported in recent research by Kotsifaki et al. [9], who highlighted that distance jumped was an insufficient metric, on its own, to detect deficits in performance at the knee joint during an athlete's rehabilitation from injury.
Another commonly reported measure from horizontal hop testing is inter-limb asymmetry data [7,11,12,22]. Such information is often used to try to determine whether such limb differences are associated with reductions in athletic performance [22][23][24] or with an increased risk of injury [23,25]. However, similarly to Kotsifaki et al. [9], recent empirical investigations have suggested that the asymmetry value from hop testing may overestimate an injured athlete's rehabilitation status [7,26]. This is relevant because, if such information is used to inform a practitioner's decision making, it has the potential to contribute to an athlete being cleared to train or compete earlier than when fully ready, which, in turn, may heighten the risk of re-injury [27]. Additionally, recent investigations have highlighted the importance of quantifying the 'direction of asymmetry' (i.e., which limb performs better out of the two), resulting in an understanding of limb dominance [5,12,28]. Such information has been quantified using the kappa coefficient and has shown that levels of agreement between test sessions are far from perfect. Simply put, the dominant limb in one test session may not always be the dominant one in the subsequent test session, resulting in the direction of asymmetry 'switching sides'. This is relevant because, if only the magnitude of asymmetry is monitored, shifts in the pattern of asymmetry (i.e., limb dominance) can be easily missed, especially in healthy athletes [5,12,28]. Whilst this type of analysis has shown substantial variation in lower limb strength and vertical jump testing, comparable evidence for horizontal jumping is lacking.
Therefore, the aims of the present study are twofold: (1) to determine within- and between-session reliability of multiple metrics obtained during the triple hop test; and (2) to determine any systematic bias (i.e., significant differences) in both the test and asymmetry scores for these test metrics. Given that comparable research relating to jump strategy for horizontal jump testing seems scarce, a true hypothesis was challenging to generate. However, with sufficient test familiarization, it was hypothesized that all data would exhibit acceptable reliability and that no systematic bias would be evident between test sessions.

Experimental Design
This study used a test-retest design, using adolescent American football players, noting that the relationship between horizontal jumping and linear speed (r = −0.58 to −0.69) has been shown to be stronger than the relationship between vertical jumping and linear speed (r = −0.56 to −0.61) in American football athletes [29]. Participants performed the triple hop test on six separate occasions prior to an organized team practice. Sessions 1-4 were used for test familiarization and were conducted over a period of 2 weeks (i.e., on Tuesday and Thursday at team practices), in which athletes were allowed to practice the triple hop test until they felt comfortable with the required technique under the supervision of the primary researcher. Given the inherent instability of the triple hop protocol (i.e., being performed on one leg), this number of familiarization sessions was used to reduce the chances of any learning effects in the test. Sessions 5 and 6 served as the two data collection sessions, which were separated by 48 h of rest and took place halfway through the high school football season.

Participants
Thirteen male high school American football players (age: 16.9 ± 0.3 years; height: 1.81 ± 0.05 m; body mass: 86.0 ± 13.7 kg) volunteered to participate in the study. Sample size estimation was based on the work of Walter et al. [30], which estimates the n required for reliability studies. In the present study, which used a test-retest design, a sample of 9 was required for the minimal acceptable ICC value to be 0.5 and the estimated ICC to be 0.8. All subjects had a minimum of 4 years of competitive American football experience and were free from injury throughout the duration of this study and the preceding 6 weeks. Written informed consent was provided by the parent or guardian of each athlete, as well as participant assent. Ethical approval was granted by the London Sport Institute research and ethics committee at Middlesex University, London, UK.

Procedures
Warm Up. Prior to testing, participants completed a warm-up exercise following the RAMP protocol as outlined by Jeffreys [31]. This consisted of self-paced jogging for 5 min; 1 × 10 repetitions of dynamic stretches, including multi-directional lunges, hamstring 'scoop-walks', 2 × 20 m lateral shuffles, and 2 × 20 m sprint accelerations; and three practice trials of the triple hop test on each leg, as described in previous research [32]. A 5 min rest period was provided between the end of the warm up and the start of data collection.
Triple Hop Test. Participants completed three triple hops (arm-swing allowed) per leg during each of the two testing sessions, with all test scores averaged on each limb and used for further analysis. Hops were completed on artificial turf, near the sideline of an American football pitch, where each yard is clearly delineated by paint on the field. Participants were instructed to begin with their toe behind the designated start line; to hop forward as far as possible repeatedly for three hops; to minimize ground contact time (GCT) in-between hops; and to "stick" the landing on the final hop for 3 s. An inability to "stick" the landing resulted in a void attempt and the player was required to redo the trial after a 90 s rest period. All trials were separated by 90 s of rest and conducted in an alternating order (i.e., trial 1 = left leg, trial 2 = right leg, trial 3 = left leg, etc.). Trials were filmed in slow motion at 240 frames per second using a smartphone (iPhone SE 2nd generation), which has been previously validated for use in research [29], and uploaded into motion analysis software (Noraxon Inc., Scottsdale, AZ, USA). The smartphone was mounted on and fixed to a tripod, which was set at a height of 1 m off the ground and at a distance of 9 m perpendicular to the direction of the hopping task, in line with similar recording methods during sprinting research [33]. The timestamps at initial touchdown of hops two and three and at toe-off of each hop were manually recorded using the software by the primary researcher (KD). Flight time and GCT for each hop were manually derived from the timestamps. Reactive strength index (RSI) and leg stiffness were computed between hops 1-2 and hops 2-3 on both limbs. RSI was calculated by dividing the flight time of one hop by the previous GCT, as per the methods employed by Lloyd et al. [34]. Leg stiffness was estimated using the equation previously validated by Dalleau et al. [35] and is shown in Equation (1).
Finally, the total distance hopped was also computed to the nearest cm, from toe to heel.
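The derivation of flight time, GCT, RSI, and leg stiffness from the recorded timestamps can be illustrated with a minimal sketch. Equation (1) is not reproduced in this text, so the stiffness function below uses the commonly cited form of the Dalleau et al. equation, K = M·π(Tf + Tc) / (Tc²((Tf + Tc)/π − Tc/4)), as an assumption. The function names, the example timestamps, the use of a third landing timestamp to obtain the flight time of the final hop, and the pairing of each GCT with the following flight time for stiffness are likewise illustrative assumptions rather than the authors' exact procedure.

```python
from math import pi


def dalleau_stiffness(mass_kg: float, flight_s: float, contact_s: float) -> float:
    """Leg stiffness (N/m) from body mass, flight time and contact time.

    Assumed form of the Dalleau et al. equation; not taken from the text.
    """
    total = flight_s + contact_s
    return (mass_kg * pi * total) / (contact_s ** 2 * (total / pi - contact_s / 4))


def hop_metrics(toe_off_s, touch_down_s, mass_kg):
    """Derive strategy metrics for one triple hop trial.

    toe_off_s    -- toe-off time of hops 1, 2 and 3 (seconds)
    touch_down_s -- landing time of hops 1, 2 and 3 (seconds); the third
                    landing is an assumed extra timestamp
    """
    # Flight time of each hop: landing time minus the preceding toe-off.
    flight = [td - to for to, td in zip(toe_off_s, touch_down_s)]
    # Contact between hops i and i+1: landing of hop i to the next toe-off.
    gct = [toe_off_s[i + 1] - touch_down_s[i] for i in range(2)]
    # RSI: flight time of a hop divided by the ground contact preceding it.
    rsi = [flight[i + 1] / gct[i] for i in range(2)]
    # Stiffness pairing (contact with the following flight) is an assumption.
    stiffness = [dalleau_stiffness(mass_kg, flight[i + 1], gct[i]) for i in range(2)]
    return {"flight_s": flight, "gct_s": gct, "rsi": rsi, "stiffness_n_per_m": stiffness}


# Hypothetical timestamps (s) for one trial of an 86 kg athlete.
trial = hop_metrics([0.00, 0.50, 1.00], [0.30, 0.80, 1.35], mass_kg=86.0)
print(trial)
```

At 240 frames per second, each timestamp is resolved to roughly 4 ms, so frame counts from the video can be converted to seconds by dividing by 240 before being passed to such a function.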

Statistical Analysis
All data were initially recorded as means and standard deviations (SD) in Microsoft Excel. Normality of the data was confirmed using the Shapiro-Wilk test (p > 0.05). Within- and between-session reliability were computed using the coefficient of variation (CV), which was calculated as (SD/average)*100, and a two-way random intraclass correlation coefficient (ICC) with 95% confidence intervals (CI). Intrarater reliability was also calculated for 6 participants from session 1 to ensure consistency in the manual detection of raw data, noting that this totaled 36 individual hops being analyzed for flight time and 24 individual hops for all other metrics. CV values < 10% were deemed acceptable [3] and ICC values were interpreted in line with suggestions by Koo and Li [8], where >0.90 = excellent; 0.75-0.90 = good; 0.50-0.74 = moderate; and <0.50 = poor. The magnitude of asymmetry was calculated based on suggestions by Bishop et al. [36,37] using Equation (2).
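As a minimal sketch of the CV calculation and the ICC thresholds described above, the following may help. The function names and example trial values are hypothetical, and the use of the sample SD (as in Excel's STDEV) is an assumption, since the text does not state which SD variant was used. The ICC itself is not computed here; only the interpretation bands are encoded.

```python
from statistics import mean, stdev


def cv_percent(trials):
    """Coefficient of variation: (SD / average) * 100, using the sample SD (assumed)."""
    return stdev(trials) / mean(trials) * 100


def interpret_icc(icc):
    """Descriptor bands as listed in the text (Koo and Li)."""
    if icc > 0.90:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.50:
        return "moderate"
    return "poor"


# Hypothetical triple hop distances (m) for three trials on one limb.
distances = [5.62, 5.80, 5.71]
cv = cv_percent(distances)
print(f"CV = {cv:.2f}% ({'acceptable' if cv < 10 else 'elevated'})")
print(interpret_icc(0.53), interpret_icc(0.95))
```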
From an interpretation perspective, the magnitude of asymmetry was only considered to be 'real' if greater than the CV in test session 1, as described in previous research [38]. In order to compute the direction of asymmetry, an IF function was added to the end of the equation: *IF(L<R,1,−1), which provided a positive percentage value when the right limb scored higher than the left and a negative percentage value when the left limb scored higher than the right. It is important to note that, when used, this function ensured the magnitude of asymmetry was not altered, which can occur with some asymmetry equations [7]. Kappa coefficients were used to quantify levels of agreement for the direction of asymmetry between test sessions and interpreted in line with suggestions by Viera and Garrett [39], where <0 = poor; 0.0-0.2 = slight; 0.21-0.4 = fair; 0.41-0.6 = moderate; 0.61-0.8 = substantial; 0.81-0.99 = nearly perfect; and 1 = perfect.
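The asymmetry workflow described above (magnitude, direction, the 'real' threshold, and kappa agreement) can be sketched as follows. Equation (2) is not reproduced in this text, so the magnitude formula below, 100/(max value) × (min value) × −1 + 100, is assumed from the form commonly reported in work by Bishop et al. The function names and example data are hypothetical, and the kappa shown is a plain two-rater Cohen's kappa written out by hand rather than the authors' software output.

```python
def asymmetry_percent(left, right):
    """Signed inter-limb asymmetry (%): positive when the right limb scores higher."""
    # Assumed form of Equation (2): 100 / max * min * -1 + 100.
    magnitude = 100 / max(left, right) * min(left, right) * -1 + 100
    # Mirrors the spreadsheet term *IF(L<R,1,-1) described in the text.
    sign = 1 if left < right else -1
    return magnitude * sign


def is_real(asymmetry, cv_session1):
    """An asymmetry is treated as 'real' only if its magnitude exceeds the session 1 CV."""
    return abs(asymmetry) > cv_session1


def cohens_kappa(session1, session2):
    """Agreement between two lists of direction labels, corrected for chance."""
    n = len(session1)
    labels = set(session1) | set(session2)
    observed = sum(a == b for a, b in zip(session1, session2)) / n
    expected = sum((session1.count(lab) / n) * (session2.count(lab) / n) for lab in labels)
    if expected == 1:
        raise ValueError("kappa is undefined when both sessions use a single identical label")
    return (observed - expected) / (1 - expected)


# Hypothetical (left, right) scores for four athletes in two sessions.
s1 = [(5.6, 5.9), (6.1, 5.8), (5.2, 5.3), (5.9, 5.5)]
s2 = [(5.8, 5.7), (6.0, 5.9), (5.1, 5.4), (5.6, 5.8)]
dir1 = ["R" if asymmetry_percent(l, r) > 0 else "L" for l, r in s1]
dir2 = ["R" if asymmetry_percent(l, r) > 0 else "L" for l, r in s2]
print(dir1, dir2, cohens_kappa(dir1, dir2))
```

Because the sign is applied after the magnitude is computed, the absolute percentage is the same whichever limb is superior, which is the property the text attributes to the IF function.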
Paired-sample t-tests were used to determine systematic bias between test sessions for raw test scores, whilst Wilcoxon signed-rank tests were used for the asymmetry data, with statistical significance set at p < 0.05. Finally, Hedges's g effect sizes were computed to provide an understanding of practical significance between test sessions and interpreted in line with suggestions by Rhea [40] for recreationally trained athletes, where <0.35 = trivial; 0.35-0.80 = small; 0.81-1.50 = moderate; and >1.5 = large.

Results
Table 1 shows mean ± SD test data with Hedges's g effect sizes. No systematic bias was evident between test sessions (p > 0.05; g range = −0.34 to 0.32). Within- and between-session reliability data are shown in Table 2. For absolute reliability, all CV values were <10%, with the exception of leg stiffness, which showed slightly elevated variability within both sessions (≤10.67%).
When interpreting inter-limb asymmetry scores (Table 3), no significant differences were evident between sessions for the magnitude of imbalance (p > 0.05; g range = −0.19 to 0.43). When considering the direction of asymmetry, kappa coefficients revealed poor-to-fair levels of agreement for all metrics and hops between test sessions (K = −0.10 to 0.39), with the exception of flight time for the first hop, which showed substantial agreement (K = 0.69). Mean and individual data are presented in Figures 1 and 2 (raw test scores) and in Figures 3 and 4 (inter-limb asymmetry), which show the spread of individual scores across the sample. When assessing changes in asymmetry on an individual basis, 3 subjects showed real asymmetries for the total distance (Figure 3); 3 for flight time in the first two hops and only 2 subjects for the final hop; 5 for GCT between hops 1−2 and 3 between hops 2−3; 2 for RSI between hops 1−2 and 3 between hops 2−3; and 2 for leg stiffness between hops 1−2 and only 1 between hops 2−3 (Figure 4). These changes in asymmetry have been represented by dashed lines on Figures 3 and 4.
Table 3. Mean asymmetry scores ± standard deviations, between-session Hedges's g effect size data with 95% confidence intervals (CI), and between-session kappa coefficients (with descriptor) for the direction of asymmetry.
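For readers wishing to reproduce effect sizes of the kind reported in Tables 1 and 3, a minimal sketch of Hedges's g with the descriptor bands used in this study is given below. The exact computation used by the authors is not stated, so the pooled-SD form with the usual small-sample correction, and the sign convention (session 2 minus session 1), are assumptions. The function names and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def hedges_g(session1, session2):
    """Bias-corrected standardized mean difference (session 2 minus session 1)."""
    n1, n2 = len(session1), len(session2)
    s1, s2 = stdev(session1), stdev(session2)
    # Pooled SD across the two sessions (assumed approach).
    pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    d = (mean(session2) - mean(session1)) / pooled
    # Small-sample correction factor applied to Cohen's d.
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return d * correction


def interpret_g(g):
    """Descriptor bands listed in the text for recreationally trained athletes."""
    size = abs(g)
    if size < 0.35:
        return "trivial"
    if size <= 0.80:
        return "small"
    if size <= 1.50:
        return "moderate"
    return "large"


# Hypothetical RSI values for five athletes in two sessions.
rsi_s1 = [1.42, 1.55, 1.31, 1.60, 1.48]
rsi_s2 = [1.45, 1.50, 1.38, 1.63, 1.44]
g = hedges_g(rsi_s1, rsi_s2)
print(f"g = {g:.2f} ({interpret_g(g)})")
```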

Discussion
The aims of the present study were to: (1) determine within- and between-session reliability of multiple metrics obtained during the triple hop test; and (2) determine any systematic bias in both the test and asymmetry scores for these test metrics. Results showed that manual analysis of flight time, GCT, RSI, and leg stiffness metrics can be done reliably both within and between test sessions, despite slightly elevated variability for leg stiffness. This is supported by a distinct lack of systematic bias between sessions for raw test data. No systematic bias was evident for asymmetry scores; however, levels of agreement for the direction of asymmetry were typically not high, indicating fluctuating limb dominance characteristics during the triple hop test, when quantifying between test sessions.
This study is one of the first investigations to assess more than only distance jumped (i.e., jump strategy) during the triple hop test. Lloyd et al. [34] reported flight time, GCT, and RSI values during the triple hop test for 20 male professional soccer players who had previously required surgery for an anterior cruciate ligament (ACL) rupture. In addition, a key difference from the present study is that Lloyd et al. [34] collected flight time and GCT data using the OptoJump measuring system. However, given the infrequent use of such metrics during the triple hop test (i.e., reporting of distance jumped only) [8,35,39], it is somewhat surprising that no reliability data were reported in that study [34]. Thus, one of the key priorities of the present study was to determine within- and between-session reliability for such metrics. Table 2 shows acceptable variability and relative reliability (ICC) for all metrics, with slightly elevated CV values for leg stiffness. Although the opinion of the authors is that leg stiffness can still be used with confidence, the slightly elevated CV values could be attributed to a couple of possible reasons. Firstly, this metric is a predicted value, using the validated equation from Dalleau et al. [35], which represents a more viable method of quantifying leg stiffness for practitioners in the field. Secondly, stiffness involves displacement, which may be a more variable strategy metric, especially when undertaking repeated hopping on one leg. In addition, whether the reliability of horizontal leg stiffness is sufficient for a clinical application may depend on the context and the practitioner. For example, in a rehabilitation setting where inter-limb asymmetries in lower limb capacity may exceed 20%, horizontal leg stiffness may provide additional information regarding return-to-sport readiness after common sport injuries, such as ACL rupture.
It is important to note that between-session reliability was noticeably better than within-session reliability, which may be due to the data being averaged on each limb in each test session prior to computation of between-session reliability. Therefore, practitioners can confidently use 2D motion analysis to reliably gather in-depth metrics beyond jump distance during the triple hop test, both within and between sessions, which represents a novel finding among healthy participants. This suggestion is also supported by the excellent intrarater reliability, which shows that, when the same person analyzes the data, this manual detection method can be done with high levels of consistency.
Table 3 shows data for the magnitude of asymmetry in both test sessions. All differences in the magnitude of asymmetry were trivial (g ≤ 0.35), with the exception of RSI between the first two hops (g = 0.43). This provides the impression that the magnitude of asymmetry is consistent between test sessions. However, it is important to note the large SD values, relative to the mean asymmetry score, which highlight large within-group variation. As such, this likely precludes any meaningful differences from being determined between test sessions, which has been acknowledged in previous asymmetry studies [6,12]. Consequently, previous studies have suggested analyzing asymmetry data on an individual basis [40], hence the inclusion of Figures 3 and 4, which show the large spread of individual asymmetry values for each metric. This is magnified further when compared to Figures 1 and 2, which show that the spread of individual values is considerably smaller for raw jump metrics. Thus, although only trivial-to-small differences were evident in the magnitude of asymmetry, individual changes were sometimes quite extreme, supporting the notion of analyzing asymmetry data on an individual basis.
When assessing whether real asymmetries were present, no athlete consistently exhibited real asymmetries between hops or metrics, highlighting the highly individualized and variable nature of asymmetry. This is also supported in previous research by Bishop et al. [28], who reported similar individual inconsistencies for changes in asymmetry during a competitive season in academy soccer players. As such, the data from the present study and Bishop et al. [28] indicate that an athlete being fatigued or rested may have little impact on the consistency of asymmetry being greater than the test variability score.
Table 3 also reports kappa coefficients, which were used to depict levels of agreement for the direction of asymmetry. Simply put, this is a statistical method that aims to quantify whether the superior performing limb was consistent between test sessions, once any agreement by chance was removed, and has become a common method of analysis in recent asymmetry research [5,6,13]. With the exception of flight time for the first hop, kappa values ranged from poor to fair (−0.10 to 0.39), indicating the fluctuating nature of limb dominance between test sessions. These data are largely in agreement with previous studies that have used unilateral jump tests and have assessed the direction of asymmetry in a test-retest design using healthy populations [5,6,13]. Specifically, this clearly demonstrates the concept of movement variability across the triple hop test, noting that it is rare for one limb to consistently outperform the other for any metric between test sessions. It is interesting that, in this study, there were substantial levels of agreement for the metric of flight time during the first hop (K = 0.69), which is challenging to fully explain. Due to the first hop having no momentum, flight time (and distance jumped) is largely dependent on ballistic force generation qualities [2], which, it seems, do not fluctuate significantly between test sessions.
In contrast, the other jumps are likely to be dependent on a range of factors, such as effective use of the stretch-shortening cycle, time constraints during ground contact, momentum upon impact, and center of mass relative to the base of support during the task. Collectively, these factors may have had an effect on the fluctuating nature of limb dominance during the test. Fluctuating limb dominance is an important concept of which practitioners should be aware, because it helps to contextualize the complexity of asymmetry, considering that this is a ratio number composed of two component parts [5][6][7]. Simply put, if practitioners only monitor asymmetry as a single absolute number, the inherent changes seen in limb dominance, shown in the present study between test sessions, will be missed.
It is important to note a few limitations in the present study. Firstly, the sample size was small, but we did aim to overcome this issue by providing individual data throughout. Thus, practitioners may wish to interpret these results within the context of pilot testing. Secondly, the use of 2D analysis was not compared against an alternative method of measurement (e.g., an optical measurement system like Optojump). Such analyses would provide additional confidence in the manual detection of touch-down and toe-off. We aimed to somewhat combat this confidence issue by undertaking a test-retest design, enabling both within- and between-session reliability data to be reported. Consequently, these metrics do appear to be usable for practitioners. Thirdly, no kinematic analysis was undertaken (e.g., assessment of joint angles, such as knee flexion or knee valgus), which future research should aim to include. Including kinematic analysis would help to complement the existing metrics reported in the present study and may help to further explain why fluctuations in limb dominance were evident between test sessions. Finally, owing to the existing general data protection regulations and the age of the participants, we were unable to assess interrater reliability, which, if assessed, would further enhance the usability of these metrics amongst those working in interdisciplinary teams. Moving forward, given the importance of monitoring more than the outcome measure alone (i.e., the distance jumped for horizontal jump tasks), we suggest that, where possible, practitioners take the time to calculate and quantify some strategy-based metrics, such as RSI and leg stiffness, as these may be more sensitive to change than jump distance [34].

Conclusions
The present study demonstrates that, with appropriate levels of familiarization, quantifying metrics for a repeated hopping strategy can be conducted reliably. These results represent a useful finding for practitioners who can start to consider both the outcome measure and the jump strategy during a repeated hopping task. Given the importance of strategy metrics in vertical jumping and the limitations of jump distance on its own in horizontal jumping, this study highlights that practitioners can confidently gather more data from the triple hop test, which is commonly employed as part of a return-to-play test battery.