Determination of a CrossFit® Benchmark Performance Profile

In the trend sport CrossFit®, international competition is held at the CrossFit® Games, known worldwide as the definitive fitness test. Since American athletes are the best in the world at CrossFit®, there might be factors influencing international competition performance. Here, we characterize the benchmark performance profile of American and German CrossFit® athletes (n = 162). To collect common benchmark performance data by questionnaire, 66 male and 96 female CrossFit® athletes (32.6 ± 8.2 years) from both nations participated in our survey. When comparing the individual performance variables between the nations, the only significant difference identified was in the total power-lift performance of males (p = 0.034). No other significant differences were found in the Olympic lift, running, or “Girl” Workout of the Day (Fran, Grace, Helen) performance. Very large to extremely large (r = 0.79–0.99, p < 0.01) positive correlations were found between the power-lift and Olympic-lift variables. Linear regression analysis showed that back squat performance predicted performance in the Olympic lifts, snatch (R² = 0.76) and clean and jerk (R² = 0.84). Our results suggest a dominant role of back squat performance in the assessment of the physical fitness of CrossFit® athletes.


Introduction
In the international competition of the trend sport CrossFit®, the CrossFit Games®, the athletes reach top performances every year [1]. Few previous studies have examined physiological variables that predict performance at the CrossFit® Games [2]. Although Martínez-Gómez et al. associated athletes' performances at the CrossFit® Games Open 2019 with various power, strength, and aerobic markers [3], there are still no specific criteria that allow performance to be predicted.
The training modality of CrossFit ® , as varied, high-intensity interval training (HIIT), includes exercises from the main elements of gymnastics, weightlifting exercises, and cardiovascular activities, and is usually performed as the "Workout of the Day" (WOD), with the focus on constantly varying functional movements [4]. The CrossFit ® training concept aims to prepare athletes to perform a variety of workouts. Considering that the constant variation of workouts is an essential element of CrossFit ® , in international competitions the WOD requirements are only announced to the athletes a few minutes before the competition [5]. The last-minute announcement of the WOD is an essential difference from other sports, as otherwise it is always known exactly which discipline will be performed in the next competition. Top performance in competition, as in any other sport, is only achievable after years of scheduled training, and requires continuous progression that is monitored in some manner during training [6].
Determining benchmarks and ascertaining performance variables of specific exercises and WODs can be used to monitor this progression [7]. Because training varies constantly, determining benchmark performance is especially necessary in CrossFit®. Since 2008, CrossFit® athletes have been able to use the online software "Beyond the Whiteboard" (BTWB) to collect benchmark performance data and compare them with those of others.
For this purpose, particular benchmark workouts have been developed in CrossFit®, such as the "Hero" WODs or "Girl" WODs. These benchmark workouts must be performed to the same specifications every time [8]. The "Fran" WOD consists of three rounds of 21, 15, and 9 repetitions, for time, of 95/65-pound barbell thrusters (male/female) and pull-ups. The "Grace" WOD consists of 30 repetitions of 135/95-pound clean and jerk (male/female) for time, and the "Helen" WOD consists of 3 rounds of a 400 m sprint, 21 repetitions of 53/35-pound American kettlebell swings, and 12 pull-ups. In parallel, CrossFit® also uses performance in the most common weightlifting exercises for benchmarking. Thus, the one-repetition maximum (1-RM) of the power lifts (deadlift, back squat, bench press, and shoulder press) and the Olympic lifts (snatch and clean and jerk) is of special interest [8]. Previous studies investigated the power of individual benchmark performances to predict top rankings in the CrossFit Games® 2013 and 2016, and found no significant results [9,10]. The CrossFit Open® is the main opportunity to qualify for the CrossFit Games®. Mangine et al. analyzed the primary predictor of success at the 2018 CrossFit Open®, and concluded that body fat percentage had the most significant effect [2]. A further study of the 19.1 CrossFit Open® workout and the WOD "Fran" concluded that absolute VO2 peak and CrossFit® Total (one-repetition maximum tests for the squat, deadlift, and overhead press) might be influencing factors [11]. Moreover, no German athlete has won the CrossFit Games® since they began in 2007. American participants, on the other hand, are the best in the world at CrossFit® [12].
However, no study has yet investigated significant differences in the athletes' performance profile between both nations, so for the first time, we analyzed the variation between German and American CrossFit ® performances.
Only a few studies have sought valid predictors of CrossFit® performance, and they showed conflicting results [13–16]. On the one hand, previous studies investigated the physiological variables of aerobic capacity and anaerobic power, and showed a significant influence on CrossFit® performance [13,15]. On the other hand, strength has been shown to affect performance only in the "Grace" and "Fran" WODs, but not in "Cindy" [14]. An examination of the CrossFit® "Murph" challenge (1-mile run, 100 pull-ups, 200 push-ups, 300 air squats, 1-mile run) showed that only the physiological parameter of body-fat percentage was significantly related to total "Murph" time [17]. Based on the results of Dexheimer et al. and Martinez et al., back squat performance may be considered a major predictor: in one study, back squat strength explained 42% of the variance in "Fran" performance [15]. Martinez et al. found moderate to strong positive correlations between squat variables and performance in the different WODs [16]. In summary, no single benchmark performance has been found to have high predictive power for the main CrossFit® WOD performances. We hypothesize that considering the entire benchmark performance profile, rather than individual variables, will allow us to predict an athlete's performance ability or compare performance internationally.
Thus, the aim of our study is to analyze the benchmark performance profile of American and German CrossFit ® athletes in detail, and to investigate any significant differences. In addition, we wanted to verify individual parameters of the benchmark performance profile with our data that predicted specific CrossFit ® performance in previous studies [15,16].

Materials and Methods
Here, we report the characterization of international CrossFit® athletes' benchmark performance profile based on the benchmark data of American and German participants collected by questionnaire. To determine the benchmark performance profile, we compared our results with over 60,000 data points of certain benchmark performances from the online benchmarking tool "BTWB". Based on our sample, we asked whether significant differences occurred between the nations and identified benchmark variables that predict others. Our results will allow CrossFit® athletes to rank their performance internationally, identify deficiencies, and predict specific benchmark variables.

Participants
To characterize the benchmark performance profile of American and German CrossFit® athletes, 162 CrossFit® athletes (male = 66; female = 96) from the United States of America (n = 82) and Germany (n = 80) participated in this study. The average age of participants was 32.6 ± 8.2 years. On average, the athletes had 3.4 ± 1.9 years of CrossFit® experience, with a training scope of 6.6 ± 3.5 h per week (see Table 1). The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Ethics Committee of University of the Federal Armed Forces Munich, Germany.
Table 1 (values in the original column order; the first column is the total sample).
Training scope per week (h): 6.6 ± 3.5, 6.9 ± 3.9, 6.4 ± 3.3, 6.3 ± 3.0, 6.9 ± 4.0
CrossFit® experience (years): 3.4 ± 1.9, 3.3 ± 1.9, 3.5 ± 1.9, 3.5 ± 2.1, 3.1 ± 1.8
Note: The values are expressed as mean ± standard deviation (SD).

Measures
The questionnaire contained 19 items for six overall metrics. Items 1-7 referred to anthropometric and training data, including gender, age, height, bodyweight, workout volume per week, workout frequency, and years of practice in CrossFit®. Item 8 asked about the athlete's focus on competition. Items 9-16 covered the current 1-RM for the common power lifts (bench press, deadlift, back squat, shoulder press), the 1-RM for the Olympic lifts (snatch and clean and jerk), and the running times for the 400 m sprint and the 1-mile run. Finally, participants completed items 17-19 regarding their current times for the three most common "Girl" workouts, "Fran", "Grace", and "Helen".

Procedure
The questionnaire was prepared in German and English, and both were validated for clarity for four weeks each. After validation, the English questionnaire was distributed in five CrossFit ® boxes around Austin (Texas, United States of America) to collect the American athletes' data. In the same way, the German questionnaires were distributed in six CrossFit ® boxes around Munich and Ratisbon (Bavaria, Germany) to collect the data of the German athletes. To include more participants, the questionnaire was also placed online via the platform www.soscisurvey.de (accessed period from 15 October 2018 to 5 November 2018) and shared in social media groups of the participating CrossFit ® boxes. The survey period was four weeks for each. To further interpret the results, the sample's performance profiles were compared using the "BTWB" benchmarking online tool, which includes a data set of millions of CrossFit ® athletes worldwide.

Statistical Analysis
Descriptive statistics were computed for participant characteristics (Table 1) and performance data. All data are presented as mean ± standard deviation (SD). Potential outliers were inspected using box plots and excluded from the description of the performance profiles. To obtain benchmarks more informative than arithmetic means alone, we also calculated percentile values for all performance variables from the sample and the online "BTWB" tool. Percentage thresholds of 1%, 10%, 25%, 50%, and 80% were determined to represent the different performance profiles by gender. Preliminary analyses were conducted to check for violations of the assumptions of normality and homogeneity of variance. Normality was tested using the Shapiro-Wilk test and Q-Q plots, and homogeneity of variance using the Levene test. An independent-samples t-test was conducted to compare the benchmark performance of American and German athletes; the Mann-Whitney U-test was performed instead when the assumption of normality or homogeneity of variance was violated.
Simple Pearson's r correlations were used to determine the associations between all benchmark performance data. r-values of 0.1, 0.3, 0.5, 0.7, and 0.9 were considered small, moderate, large, very large, and extremely large, respectively [18]. For each of the dependent Olympic-lift performance variables, a multiple regression model was created to analyze the influence of the independent power-lift performance variables. Each power-lift performance variable with significant influence (p < 0.001) was then examined in a simple linear regression model to create a predictive model of performance and to evaluate R² as the proportion of explained variance. The regression assumptions were checked by testing for multicollinearity using variance inflation factor values, homoscedasticity using a scatterplot of standardized residuals against predicted values, multivariate normality using Q-Q plots, and linearity using scatterplots. All analyses were conducted with the software package SPSS 25.0 (IBM, Armonk, NY, USA), and the level of statistical significance (α) was set at 0.05.
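The correlation and regression steps described above can be sketched in a few lines of plain Python. This is only an illustration, not the SPSS procedure used in the study: the function names, the label "below small" for |r| < 0.1, and the example values are assumptions introduced here, and the example values are not study data.

```python
import math

# Magnitude labels for |r|, using the cut-offs stated in the text [18].
R_LABELS = [
    (0.9, "extremely large"),
    (0.7, "very large"),
    (0.5, "large"),
    (0.3, "moderate"),
    (0.1, "small"),
]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classify_r(r):
    """Map a correlation coefficient to its magnitude label."""
    magnitude = abs(r)
    for cutoff, label in R_LABELS:
        if magnitude >= cutoff:
            return label
    return "below small"  # assumption: the text gives no label below 0.1


def simple_linear_regression(x, y):
    """Least-squares fit y = intercept + slope * x.

    Returns (intercept, slope, r_squared). With a single predictor,
    R^2 equals the squared Pearson correlation.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, pearson_r(x, y) ** 2


# Hypothetical 1-RM values in kg (NOT study data).
back_squat = [80, 100, 120, 140, 160]
snatch = [43, 52, 63, 72, 83]

r = pearson_r(back_squat, snatch)
intercept, slope, r2 = simple_linear_regression(back_squat, snatch)
print(f"r = {r:.2f} ({classify_r(r)})")
print(f"snatch = {intercept:.2f} + {slope:.3f} * back squat, R^2 = {r2:.2f}")
```

The sketch covers only the simple (single-predictor) case; the assumption checks and the multiple regression step would need a statistics package.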

Results
The anthropometric data of the participants showed that the training scope per week (h) for males was 0.5 h higher than for females and 0.4 h higher for Germans in the national comparison. The CrossFit ® experience (years) average was 3.4 ± 1.9, without any major differences between the subgroups.
In Table 2, all performance data are shown by gender and nationality. When comparing the genders, we found that males' total power-lift performance was 61% higher than that of females, and their total Olympic-lift performance was 53% higher. Males reported faster times for all "Girl" WODs, despite the scaled weights. This effect was also evident for all run values, as shown in Table 2. The American athletes showed higher average values for all power-lift and Olympic-lift performances, without higher maximum ranges.
Note: the values are expressed as mean ± standard deviation (SD). Abbreviations: BP = bench press, BS = back squat, CJ = clean and jerk, DL = deadlift, FR = Fran, GR = Grace, HE = Helen, RM = repetition maximum, SN = snatch, SP = shoulder press.
We next studied whether there were significant differences in the performance benchmarks between the nations. The t-test for independent samples showed a significant difference (54.5 kg) only for the total power-lift performance of American (523.9 ± 82.0 kg) and German (469.5 ± 105.8 kg) males (t (64) = −2.17; p = 0.034), and no significant difference for females (t (94) = −2.33; p = 0.062; see Figure 1). No other significant difference was observed in the Olympic-lift performance, the "Girl" WODs, or the running times between the nations.
The percentage performance thresholds were calculated (Table 3) and graphically visualized in Figure 1, separated by gender, to analyze the benchmark performance profile. Classifying performance by percentage threshold values enabled a more precise description of the physical fitness attainable by CrossFit® athletes. Females moved less weight in all weightlifting exercises in all performance groups. However, the relative proportions of the individual weightlifting exercises were similar between the genders.
The deadlift was the dominant exercise, with a bodyweight ratio of 2.0 for males and 1.7 for females, followed by the back squat, with bodyweight ratios of 1.7 and 1.4, respectively. The bench press was less pronounced in females than in males, with a bodyweight ratio of 0.8 compared to 1.3 (for comparison, see Figure 2A,C). In descending order, the subsequent weightlifting exercises and their bodyweight ratios for males and females were: clean and jerk (1.1 and 0.9), snatch (0.9 and 0.7), and shoulder press (0.8 and 0.6). However, for the "Girl" WOD "Grace", females achieved top performances comparable to those of males. The difference in mean times was only 1%. For the other benchmarks, the performance differences in the "Fran" WOD and the 1-mile time were less pronounced than in the "Helen" WOD and the 400 m run time. While females completed the "Fran" WOD an average of 70 s slower, they completed the "Helen" WOD an average of 84 s slower. Similar trends could be observed for the running performance: males ran the 400 m on average 25% faster, but the 1-mile only 18% faster.
To analyze the relationship between the benchmark performances, Pearson's correlations were calculated (see Table 4). The significant correlations indicated that the power-lift performance was strongly related to the Olympic-lifting performance (r = 0.79-0.99; p < 0.01). Moderate to strong negative correlations between the weightlifting exercises and the "Girl" WODs were also found, but some were nonsignificant (see Table 4). The performance in the "Helen" WOD was strongly related to the performance in the 400 m and 1-mile runs (r = 0.59 and 0.58, respectively; p < 0.01).
Based on the Pearson's correlation findings, multiple regression was calculated to predict the Olympic-lift performance values, snatch and clean and jerk, from the individual power-lift performance values. Of the deadlift, bench press, back squat, and shoulder press performance values, only the back squat performance was a significant predictor of snatch and clean and jerk performance (p < 0.001). A simple linear regression was performed to predict participants' snatch performance from their back squat performance (see Figure 3A). A significant regression equation was found (F (1,160) = 497.081, p < 0.001), with an R² of 0.756. Participants' predicted snatch performance was equal to 3.333 + 0.494 × (back squat performance) kg when back squat performance was measured in kilograms. Participants' average snatch performance increased by 0.494 kg for each kilogram of back squat performance. To predict the clean and jerk performance from the back squat performance, a simple linear regression was calculated in the same way (see Figure 3B). The regression equation was also significant (F (1,160) = 852.916; p < 0.001), with an R² of 0.841. The predicted clean and jerk performance was equal to 3.279 + 0.650 × (back squat performance) kg. For each kilogram of back squat performance, the clean and jerk performance increased by 0.650 kg.
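The bodyweight ratios reported above can be used as a rough reference for an individual athlete. The following sketch assumes that a ratio is simply the 1-RM divided by bodyweight; the function names and the example athlete are hypothetical and are not part of the study.

```python
# Bodyweight ratios (1-RM / bodyweight) as reported in the text.
REPORTED_RATIOS = {
    "male": {
        "deadlift": 2.0,
        "back squat": 1.7,
        "bench press": 1.3,
        "clean and jerk": 1.1,
        "snatch": 0.9,
        "shoulder press": 0.8,
    },
    "female": {
        "deadlift": 1.7,
        "back squat": 1.4,
        "bench press": 0.8,
        "clean and jerk": 0.9,
        "snatch": 0.7,
        "shoulder press": 0.6,
    },
}


def bodyweight_ratio(one_rm_kg, bodyweight_kg):
    """Return the 1-RM expressed as a multiple of bodyweight."""
    if bodyweight_kg <= 0:
        raise ValueError("bodyweight must be positive")
    return one_rm_kg / bodyweight_kg


def compare_to_reported(sex, bodyweight_kg, lifts_kg):
    """For each lift, return (athlete ratio, reported ratio, difference)."""
    reference = REPORTED_RATIOS[sex]
    result = {}
    for lift, one_rm_kg in lifts_kg.items():
        ratio = bodyweight_ratio(one_rm_kg, bodyweight_kg)
        result[lift] = (ratio, reference[lift], ratio - reference[lift])
    return result


# Hypothetical athlete (NOT study data): male, 80 kg bodyweight.
athlete_lifts = {"deadlift": 170.0, "back squat": 130.0, "snatch": 70.0}
for lift, (ratio, ref, diff) in compare_to_reported("male", 80.0, athlete_lifts).items():
    print(f"{lift}: ratio {ratio:.2f} vs reported {ref:.1f} (difference {diff:+.2f})")
```

A positive difference means the athlete lifts more, relative to bodyweight, than the reported ratio for that exercise.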
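As a worked example, the two regression equations reported above can be written as small functions. The coefficients are taken from the text; the function names and the 100 kg example input are assumptions made for illustration, and the predictions describe the sample average, not an individual guarantee.

```python
# Coefficients of the two simple linear regressions reported in the text
# (all performances in kilograms).
SNATCH_INTERCEPT, SNATCH_SLOPE = 3.333, 0.494
CJ_INTERCEPT, CJ_SLOPE = 3.279, 0.650


def predict_snatch(back_squat_kg):
    """Predicted 1-RM snatch (kg) from the 1-RM back squat (kg)."""
    return SNATCH_INTERCEPT + SNATCH_SLOPE * back_squat_kg


def predict_clean_and_jerk(back_squat_kg):
    """Predicted 1-RM clean and jerk (kg) from the 1-RM back squat (kg)."""
    return CJ_INTERCEPT + CJ_SLOPE * back_squat_kg


# Hypothetical input: an athlete with a 100 kg back squat.
back_squat = 100.0
print(round(predict_snatch(back_squat), 1))          # 52.7
print(round(predict_clean_and_jerk(back_squat), 1))  # 68.3
```

For a 100 kg back squat, the equations give about 52.7 kg for the snatch (3.333 + 49.4) and about 68.3 kg for the clean and jerk (3.279 + 65.0).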

Discussion
In this study, we characterized in detail the benchmark performance profile of American and German CrossFit® athletes and compared the obtained data with thousands of data points available online. We found only one significant difference, in the total power-lift performance of males between both nations. Based on our data, the power-lift and Olympic-lift variables showed very large to extremely large correlations. The back squat performance predicted 76% of the variance for the snatch performance, and even 84% of the variance for the clean and jerk performance.
To our knowledge, no studies have previously examined the benchmark performance profile of CrossFit® athletes in detail. For the first time, we were able to describe the overall performance ability of CrossFit® athletes and to identify differences between two nations. In a previous study, Mangine et al. presented normative scores for five common benchmark workouts (i.e., "Fran", "Grace", "Helen", "Filthy-50", and "Fight-Gone-Bad"), and observed that, on average, males achieved better scores than females for all WODs, despite weights scaled by gender [19]. However, the classification of performance by percentage thresholds in this study showed that females may well achieve values similar to those of males in WODs without bodyweight exercises. Females in the 1% performance group achieved values similar to those of males for the "Grace" WOD, which consists only of clean and jerk exercises with scaled weights (135/95 pounds for males/females). This was in contrast to the "Fran" and "Helen" WODs, both of which include the bodyweight exercise of pull-ups: here, females did not achieve values similar to those of males in any performance group, which was confirmed by the data analysis using the online "BTWB" tool.
Finding only one significant performance difference between the two nations was surprising. This result did not confirm our assumption that the two nations' different levels of success in the CrossFit Games® would be reflected in differences in fitness abilities. Thus, other factors, such as social capital [20] or the commercial environment, could be needed to achieve and sustain top athletic success, as in other sports such as tennis [21].
Determining which variables predict performance in one of the best-known WODs, "Fran", was also the purpose of previous studies. Leitão et al. showed that maximal and endurance strength training of thrusters was strongly related to "Fran" performance [22]. We can confirm moderate to strong negative correlations between weightlifting exercises and the "Girl" WODs "Fran", "Grace", and "Helen" in a multinational group with a larger sample size than in previous studies. Consistent with previous studies demonstrating the influence of back squat strength, our linear regression model explained 84% of the variance in 1-RM clean and jerk performance and 76% of the variance in snatch performance [14,23]. Thus, to the best of our knowledge, our regression model describes the variance of snatch and clean and jerk performance better than any existing study regarding CrossFit®. Of note was our large sample size (n = 162), which distinguished our regression model from the noted experimental studies [15,16]. Martinez et al. examined the relationship between squat performance and performances in different WODs and found moderate to strong (r = 0.47-0.69, p < 0.05) positive correlations, as our data also showed [16]. This underlines the squat as a major determinant of performance in CrossFit®.
However, CrossFit® WODs often consist of multimodal exercises that include not only strength- and power-based actions, but also aerobic exercises like rowing or running. Thus, CrossFit® is a complex training modality that requires different physical abilities (including stamina, flexibility, and agility). The interaction of different performances might therefore play a role in the overall assessment of CrossFit® athletes' fitness abilities. For this reason, the total benchmark performance profile should be considered and combined with the assessment of other physical tests, such as the squat test from Martinez et al. [16].
While the present investigation provided some information about the benchmark performance profile and the relationship between the performance values, it was not without limitation. Since the present study was only a questionnaire survey, it is unknown whether the results could be reproduced in a performance test. However, the performance profile can be validated by comparing it with the data from the online "BTWB" tool. Due to the large size of the online data set, possible incorrect data did not have a significant impact.
The training concept of CrossFit® intends to optimally prepare the athletes for unknown and unknowable challenges, as they face them in competition. Identifying predictors for best performance in unknown challenges remains the major task of future CrossFit® science. Our results confirmed the major role of back squat performance and showed, apart from the total power-lift performance of males, no differences in physical ability between German and American athletes. Further research should also apply cluster analysis, as shown by Peña et al., to find relationships between the outcome of a simulated CrossFit® competition, anthropometric measures, and performance variables [24].

Conclusions
To better understand CrossFit® performance, it is necessary to determine a CrossFit® benchmark performance profile, as we have presented in this study. In future studies, the consistency of the benchmark performance profile could be confirmed by experimental data collection. In summary, the profile allows CrossFit® athletes to rank their performance internationally, identify deficiencies, and predict specific benchmark variables.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Conflicts of Interest:
The authors declare no conflict of interest.