1. Introduction
CrossFit® is a form of high-intensity functional training (HIFT) that has experienced exponential growth in popularity over the past decade [1]. This training method combines elements of weightlifting, gymnastics, and cardiovascular endurance exercises, performed at high intensity with considerable variability across sessions [2,3,4,5,6,7]. This combination aims to develop multiple physical capacities simultaneously, including endurance, strength, power, agility, coordination, and speed [1].
One of the main challenges faced by CrossFit® coaches and practitioners is the monitoring and regulation of training load in such a complex setting. CrossFit® sessions are characterized by their heterogeneous structure, typically including daily programmed routines known as the Workout of the Day (WOD), published on the official CrossFit website (https://www.crossfit.com/ [accessed on 11 August 2025]). In sports with more homogeneous training structures, physiological markers such as heart rate or oxygen consumption have been successfully used to monitor training [8]. These variables have also been examined in CrossFit® workouts, where recent studies have characterized acute cardiovascular and metabolic responses, including oxygen uptake kinetics and energy expenditure [7,9]. However, these indicators present limitations when applied to sessions involving strength components or short intermittent efforts, as in CrossFit®, where physiological responses do not conform to conventional loading patterns [4,10,11,12]. Moreover, the equipment requirements and testing conditions make such physiological assessments impractical for routine use in typical CrossFit® environments.
For this reason, perceptual methods, such as the rating of perceived exertion (RPE), have gained increasing attention as practical tools for monitoring load. The RPE provides an integrative measure of internal load that reflects the combined influence of metabolic, cardiorespiratory, and neuromuscular stress [10,11,12]. Its session-based derivative (session RPE, or sRPE) has been widely validated as a reliable indicator of training intensity and load in both endurance- and strength-based modalities [10,11,13,14]. Previous studies have also shown strong associations between sRPE and physiological markers, such as blood lactate concentration during HIFT [3,4,12], supporting its physiological validity in addition to its practicality. Along with its physiological basis, sRPE stands out for its ecological validity, low cost, and ease of application, making it particularly suitable for continuous monitoring in CrossFit® boxes and similar training settings [3,4,10,12]. However, it is important to recognize that RPE-based methods are not free from limitations. Factors such as familiarization with the scale and consistent contextual conditions are essential to ensure reliable application of RPE in CrossFit® environments.
The validity of sRPE has been extensively confirmed across different sports [10,11,13,14]. In CrossFit® and other forms of HIFT, several studies have demonstrated that sRPE correlates with variables such as blood lactate concentration or the number of repetitions performed [4,12], although its relationship with heart rate appears weaker [3,12], suggesting greater sensitivity to metabolic and mechanical rather than cardiovascular demands. Moreover, longitudinal studies have shown that sRPE remains valid when compared with heart rate-based methods, while its reliability improves as participants become more familiar with linking perceived exertion to physiological effort [12]. Recent investigations have also examined its relationship with training volume and sex differences, reinforcing its utility as a subjective monitoring tool in high-intensity environments [3,5,15]. However, most of these studies have assessed sRPE only as a post-WOD measure [2,3,4,5,7,16,17,18,19], without considering the full structure of a CrossFit® session, which typically includes at least three distinct components: warm-up, strength/skill work, and the WOD [6]. This simplification limits the interpretation of overall perceived exertion and reduces the ecological validity of sRPE for real-world monitoring, highlighting the need for approaches that integrate all session phases.
Therefore, the primary aim of this study was to evaluate the validity of sRPE by comparing it with a weighted estimate (RPEW) derived from RPE values recorded during the different session phases, in order to determine whether sRPE accurately reflects accumulated effort. Additionally, we sought to quantify the perceptual demands of different CrossFit® sessions and analyze the relative contribution of each session phase (warm-up, strength/skill, WOD, and cooldown) to the overall sRPE. Finally, we examined whether the type of WOD performed (AMRAP, EMOM, or RFT) and participant sex influenced perceived exertion and its distribution throughout the session. We hypothesized that sRPE and RPEW would be interchangeable measures of overall session exertion, and that both WOD characteristics and participant sex would influence perceived exertion and its distribution across session phases.
2. Materials and Methods
2.1. Participants
Twenty-four CrossFit® practitioners participated in the study: 13 men (age 34.7 ± 8.1 years, height 180.7 ± 9.4 cm, body mass 88.5 ± 10.2 kg) and 11 women (age 34.2 ± 8.6 years, height 163.5 ± 6.9 cm, body mass 61.5 ± 4.7 kg). All participants trained regularly at the same CrossFit®-affiliated center and had at least one year of experience in this training modality. Inclusion criteria required active participation in box classes and prior experience with high-intensity training. Exclusion criteria included any cardiovascular or musculoskeletal condition within the previous six months that could compromise participation.
Participants were instructed to maintain their usual dietary habits throughout the study and to refrain from introducing new nutritional supplements or ergogenic aids. No formal dietary monitoring was performed, as the study aimed to preserve ecological validity under real-world training conditions. The menstrual cycle phase of female participants was not controlled, as sessions were conducted in their habitual training environment without specific scheduling constraints. Adherence and potential withdrawals were closely monitored by the principal investigator, who was the founder, owner, and certified instructor of the affiliated center. Attendance records were systematically reviewed using the box’s registration system, ensuring continuous follow-up of the entire sample throughout the 16-week intervention. No participants withdrew from the study.
Sample size was determined a priori, based on previous evidence and conventional statistical power criteria. Tibana et al. [4] reported moderate effects (f = 0.31; achieved power = 0.81) in eight trained men during functional training, while Crawford et al. [12] confirmed the validity of sRPE against heart rate-based methods in 25 recreational participants. Based on these data and a priori power calculations for detecting moderate effects in paired comparisons and repeated-measures analyses (α = 0.05, power = 0.80), a minimum of 18–22 participants was estimated. To ensure adequate power, account for potential dropouts, and maintain comparability with previous studies, 24 participants were finally recruited. The number of recorded sessions (20) and follow-up duration (16 weeks) were determined by the availability of the center and participants, yielding a large number of within-subject observations under ecologically valid training conditions.
Following the classification proposed by McKay et al. [20], 14 participants were categorized as recreationally active (Tier 1), 8 as trained (Tier 2), and 2 as highly trained (Tier 3). All participants volunteered for the study and provided written informed consent prior to participation. The study protocol was approved by the Ethics Committee of the University of León and conducted in accordance with the principles of the Declaration of Helsinki.
2.2. Procedures
A longitudinal observational study was conducted over 16 weeks in a CrossFit® box located in Pola de Siero (Asturias, Spain). The methodological design and data reporting followed the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology Statement) guidelines for observational studies. Each participant completed 20 training sessions designed and supervised by a certified CrossFit® Level 1 coach with five years of experience and a university degree in Sport Sciences. Sessions were performed in groups of up to 10 participants and followed a standardized structure: warm-up, strength/skill block, main workout (WOD), and cooldown. Among these sessions, three different WOD formats were specifically analyzed: (i) AMRAP (n = 10; “as many repetitions as possible”), in which participants completed as many rounds or repetitions as possible within a set time; (ii) EMOM (n = 8; “every minute on the minute”), where participants performed a prescribed task at the start of each minute and rested until the next; and (iii) RFT (n = 10; “rounds for time”), in which the objective was to complete a prescribed task in the shortest possible time. Sessions followed the international CrossFit® programming standards for the “as prescribed” (RX) level. The exercises, formats and external loads were consistent with those typically prescribed in affiliated boxes and were adjusted to participants’ individual capacities. Specifically, representative workouts were analyzed for each format: (i) AMRAP (~12 min): 10 thrusters (40–50% 1 RM), 15 box jumps, 20 kettlebell swings (16–24 kg); (ii) EMOM (~10 min): 10 burpees and 10 wall balls (6–9 kg); and (iii) RFT (3 rounds for time): 400 m run, 15 power cleans (50–60% 1 RM) and 15 sit-ups.
Prior to data collection, participants attended a familiarization session to ensure proper understanding of the modified Borg CR-10 scale of perceived exertion [10]. The scale ranged from 0 (“rest”) to 10 (“maximal”), with intermediate verbal anchors (e.g., “easy”, “hard”, “very hard”) according to Foster et al. [10]. In addition, during the two weeks preceding the study, participants practiced using the scale during their regular training sessions to ensure consistency in reporting. Throughout the study, RPE was recorded immediately after completing each session component (warm-up, strength/skill, WOD, and cooldown) (Figure 1). From these values, a weighted mean RPE (RPEW) was calculated, considering the relative duration of each phase with respect to the effective total training time. This index represented the time-weighted perceptual load of the entire session [11,21,22] and was used to assess the convergent validity of the sRPE, which was collected ~30 min after each session (Figure 1) to ensure an overall evaluation of exertion and minimize potential overestimation caused by the final exercises [10,23]. All measurements were performed by the same evaluator to ensure procedural consistency. Sessions were conducted under usual CrossFit® box conditions, including verbal feedback and peer encouragement, which are inherent to this training environment [24].
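The time-weighted index described above can be illustrated with a minimal sketch. It assumes that RPEW is the duration-weighted mean of the phase ratings and that the effective total time equals the sum of the phase durations; the source does not give the formula explicitly. The names `Phase` and `weighted_rpe` and all phase values are hypothetical, not study data.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    name: str
    rpe: float      # CR-10 rating recorded right after the phase
    minutes: float  # effective duration of the phase


def weighted_rpe(phases):
    """Duration-weighted mean RPE across session phases (assumed RPEW formula)."""
    total_minutes = sum(p.minutes for p in phases)
    if total_minutes <= 0:
        raise ValueError("total effective duration must be positive")
    return sum(p.rpe * p.minutes for p in phases) / total_minutes


# Hypothetical session, for illustration only
example_session = [
    Phase("warm-up", 3.0, 8.0),
    Phase("strength/skill", 5.0, 12.0),
    Phase("WOD", 8.0, 13.0),
    Phase("cooldown", 2.0, 4.0),
]

rpe_w = weighted_rpe(example_session)  # 196 / 37, about 5.3
```

Weighting by duration means that a short but very hard WOD pulls the index up less than it would in a simple average of peak impressions.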
Finally, training load (TL) for each session was calculated in two ways: (i) as the product of sRPE and effective session duration in minutes (excluding transitions or explanations) (TLsRPE) [10,14]; and (ii) as the sum of the partial loads of each session phase (phase-specific RPE × duration of the phase) (TLRPE).
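The two load calculations can be sketched as follows. This is an illustration under stated assumptions: the effective session duration is taken as the sum of the phase durations, the function names `tl_srpe` and `tl_rpe` are hypothetical, and the numbers are invented for the example.

```python
def tl_srpe(srpe, effective_minutes):
    """Session load from the global rating: sRPE x effective duration (AU)."""
    return srpe * effective_minutes


def tl_rpe(phases):
    """Session load as the sum of phase loads; phases is a list of (rpe, minutes)."""
    return sum(rpe * minutes for rpe, minutes in phases)


# Hypothetical session: warm-up, strength/skill, WOD, cooldown
phases = [(3.0, 8.0), (5.0, 12.0), (8.0, 13.0), (2.0, 4.0)]
effective_minutes = sum(minutes for _, minutes in phases)  # 37 min

load_from_srpe = tl_srpe(7.0, effective_minutes)  # 7 x 37 = 259 AU
load_from_phases = tl_rpe(phases)                 # 24 + 60 + 104 + 8 = 196 AU
```

Under the same assumption, the phase-based load equals the duration-weighted mean RPE multiplied by the effective duration, so any gap between the two loads comes entirely from the gap between sRPE and the weighted mean.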
2.3. Statistical Analysis
Results are presented as mean ± standard deviation. The normality of the data was assessed using the Shapiro–Wilk test. When significant deviations from normality were detected, a logarithmic transformation was applied prior to analysis. Paired Student’s t-tests were used to compare RPEW with sRPE, and TLsRPE with TLRPE. Reliability between measures was assessed using the intraclass correlation coefficient (ICC), typical error of measurement (TE), coefficient of variation (CV; TE expressed as a percentage), and minimal detectable change (MDC). ICC values were interpreted as poor (<0.50), moderate (0.50–0.74), good (0.75–0.90), or excellent (>0.90). CV values were classified as good (<5%), moderate (5–10%), or poor (>10%), and MDC values as excellent (<10%), moderate (10–30%), or poor (>30%). Agreement between measures was further examined using Bland–Altman plots.
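The agreement statistics named above can be sketched from paired values. The source does not state the exact formulas used, so this sketch assumes common conventions: Bland–Altman bias with 95% limits of agreement (bias ± 1.96 × SD of differences), TE as the SD of differences divided by √2, CV as TE relative to the grand mean, and MDC at 95% as 1.96 × √2 × TE. The ICC is omitted, and the data are hypothetical.

```python
import math
import statistics


def agreement_stats(method_a, method_b):
    """Bias, 95% limits of agreement, TE, CV% and MDC95 for paired measures."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two paired samples with at least two observations")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)   # sample SD of the paired differences
    te = sd_diff / math.sqrt(2)         # typical error (assumed convention)
    grand_mean = statistics.mean(list(method_a) + list(method_b))
    return {
        "bias": bias,
        "loa_lower": bias - 1.96 * sd_diff,
        "loa_upper": bias + 1.96 * sd_diff,
        "te": te,
        "cv_percent": 100 * te / grand_mean,
        "mdc95": 1.96 * math.sqrt(2) * te,
    }


# Hypothetical paired ratings (not study data)
srpe_values = [7.0, 6.0, 8.0, 7.0, 5.0, 8.0]
rpew_values = [6.0, 5.5, 6.5, 6.5, 4.5, 6.0]
stats = agreement_stats(srpe_values, rpew_values)
```

A positive bias with wide limits, as in this toy example, is the pattern that signals systematic overestimation combined with large between-person scatter.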
Differences in effective training time, RPEW, sRPE, TLRPE, and TLsRPE were analyzed using a two-way mixed ANOVA with sex (male vs. female) as a between-subjects factor and WOD type (AMRAP, EMOM, and RFT) as a within-subjects factor. Additionally, the duration, RPE, and TL of each session phase were analyzed using a mixed-model repeated-measures ANOVA with three factors: sex as a between-subjects factor, and WOD type and session phase (warm-up, strength/skill, WOD, and cooldown) as within-subjects factors. Sphericity was tested with Mauchly’s test, and when violated, the Greenhouse–Geisser correction was applied. Significant main effects or interactions were followed by Bonferroni-adjusted post hoc comparisons. Effect sizes for ANOVA were estimated using partial eta squared (η²p) and interpreted as trivial (<0.01), small (0.01–0.059), moderate (0.06–0.139), or large (≥0.14). Pairwise comparisons were evaluated using Cohen’s d, interpreted as trivial (<0.20), small (0.20–0.49), moderate (0.50–0.79), or large (≥0.80). Associations between variables were examined using Pearson’s correlation coefficient (r). Finally, stepwise multiple regression analyses were conducted to determine the influence of phase-specific RPE values on global sRPE. Multicollinearity was assessed by calculating the variance inflation factor (VIF), with values < 10 considered acceptable. Statistical significance was set at p < 0.05. All analyses were conducted using IBM SPSS Statistics v.24.0 (IBM Corp., Armonk, NY, USA).
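The interpretation thresholds for the two effect sizes can be written as a small lookup, shown here as a sketch. The cut-offs are those stated in the text; the function names are illustrative, and absolute values are used for Cohen's d on the assumption that only magnitude matters for the label.

```python
def interpret_partial_eta_squared(eta_sq_p):
    """Label a partial eta squared value using the thresholds given in the text."""
    if eta_sq_p < 0.01:
        return "trivial"
    if eta_sq_p < 0.06:
        return "small"
    if eta_sq_p < 0.14:
        return "moderate"
    return "large"


def interpret_cohens_d(d):
    """Label the magnitude of Cohen's d using the thresholds given in the text."""
    magnitude = abs(d)
    if magnitude < 0.20:
        return "trivial"
    if magnitude < 0.50:
        return "small"
    if magnitude < 0.80:
        return "moderate"
    return "large"
```

For instance, `interpret_cohens_d(0.69)` returns `"moderate"` and `interpret_partial_eta_squared(0.03)` returns `"small"`.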
No covariates were included in the models, as the sample was relatively homogeneous, and the within-subject design minimized the influence of potential confounders. All 24 participants completed the entire follow-up period, and no missing data were recorded for the main variables. Occasional missing entries were handled using a complete case approach. Therefore, no additional sensitivity analyses were required beyond those considered in the study design.
3. Results
All 24 participants met the inclusion criteria and were included in the study. They completed the 20 scheduled sessions within a period of 8 to 16 weeks, with no dropouts due to injury, scheduling incompatibilities or personal reasons. Adherence was 100%, and no missing data were recorded for the main variables (RPE, sRPE, and duration). Therefore, the final analysis was performed using the complete sample.
The effective duration of the training sessions was 37.3 ± 5.0 min, representing 62.1 ± 8.3% of the total session time (~60 min). RPEW (5.8 ± 1.5) was significantly lower (p < 0.001, d = 0.69) than sRPE (6.8 ± 1.4). Consequently, TLsRPE was 15.5 ± 15.2% higher than TLRPE (254.1 ± 59.6 vs. 213.8 ± 59.5 AU; p < 0.001, d = 0.68). Bland–Altman analysis (Figure 2) revealed a positive bias in both comparisons, indicating a systematic tendency of sRPE to overestimate load. However, the wide limits of agreement reflected substantial interindividual variability, limiting the level of concordance between methods. Finally, although the relative reliability of sRPE and TLsRPE was moderate to good, their absolute reliability and sensitivity to detect small changes were limited (Table 1).
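The reported effect sizes can be approximated from the summary statistics alone. The sketch below assumes that Cohen's d was computed with a pooled SD taken as the root mean of the two variances; the source does not say which variant was used, and `cohens_d_pooled` is an illustrative name. The inputs are the means and SDs given in the text.

```python
import math


def cohens_d_pooled(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using the root-mean-square of the two SDs as the denominator."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# sRPE (6.8 ± 1.4) vs. RPEW (5.8 ± 1.5)
d_rpe = cohens_d_pooled(6.8, 1.4, 5.8, 1.5)

# TLsRPE (254.1 ± 59.6 AU) vs. TLRPE (213.8 ± 59.5 AU)
d_load = cohens_d_pooled(254.1, 59.6, 213.8, 59.5)
```

Under this assumption the two values round to 0.69 and 0.68, which agrees with the effect sizes reported above.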
No main effects of sex or WOD type were observed for effective duration, sRPE, RPEW, TLsRPE, or TLRPE. In contrast, session phase had a significant effect (p < 0.001) on duration (F = 239.0, η²p = 0.58), RPE (F = 162.3, η²p = 0.49), and TL (F = 317.0, η²p = 0.65). The highest values (p < 0.01) were recorded in the WOD, followed by the strength/skill phase (Table 2). Together, these two phases accounted for ~65% of the effective time and ~75% of the total TLRPE of the session (Figure 3). In addition, a significant interaction between session phase and WOD type was observed for both duration (F = 2.39, p = 0.031, η²p = 0.03) and RPE (F = 3.3, p = 0.007, η²p = 0.04) (Figure 4).
The duration and RPE of the warm-up, strength/skill, WOD, and cooldown phases were significantly correlated (p < 0.001) with the effective session time (r = 0.52, 0.49, 0.56, and 0.50, respectively) and with sRPE (r = 0.38, 0.61, 0.83, and 0.33, respectively). In addition, a significant correlation (r = 0.59, p < 0.001) was observed between the difference in sRPE and RPEW values and the difference between WOD and strength/skill RPE.
Multiple regression analysis showed that sRPE was primarily determined by WOD RPE, which explained 70% of the variance (R² = 0.70, p < 0.001). Adding strength/skill RPE improved the model fit (R² = 0.72, p < 0.001), and including cooldown RPE increased the explained variance to 73% (R² = 0.73, p < 0.001). Among predictors, WOD RPE showed the highest standardized coefficient (β = 0.73), indicating that a one-standard-deviation increase in perceived exertion during the WOD was associated with an ~0.73-standard-deviation increase in sRPE. In contrast, the strength/skill (β = 0.14) and cooldown (β = 0.11) phases made smaller but still significant contributions (p < 0.01). The full regression models are presented in Table 3.
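Because a standardized coefficient is expressed in standard-deviation units, translating it into raw CR-10 points requires the SDs of the predictor and the outcome. The sketch below assumes that the descriptive SDs reported in the article (1.4 for sRPE and 1.4 for WOD RPE) also hold for the regression sample, which the source does not confirm; `unstandardized_slope` is an illustrative name.

```python
def unstandardized_slope(beta, sd_predictor, sd_outcome):
    """Convert a standardized regression coefficient to raw units: b = beta * SD_y / SD_x."""
    if sd_predictor <= 0:
        raise ValueError("predictor SD must be positive")
    return beta * sd_outcome / sd_predictor


# beta for WOD RPE from the text; both SDs assumed equal to 1.4
raw_slope_wod = unstandardized_slope(0.73, sd_predictor=1.4, sd_outcome=1.4)
```

With equal SDs the raw slope is numerically the same as β, so under this assumption a one-point rise in WOD RPE corresponds to roughly a 0.73-point rise in sRPE.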
4. Discussion
This study examined the convergent validity and reliability of sRPE in complete CrossFit® sessions. The main findings indicate that sRPE tends to overestimate RPEW and, consequently, the TL derived from sRPE compared with that obtained from RPEW. Although the relative reliability of both metrics was moderate to good, their absolute reliability and sensitivity to detect small changes were limited. Moreover, RPE reported during the WOD was the primary determinant of sRPE, explaining nearly 70% of its variance. While sRPE provides a reasonable estimate of perceived exertion at the group level, the wide limits of agreement observed in the Bland–Altman plots reveal considerable interindividual variability, which limits its accuracy for individual monitoring. The recreational status of the participants may have contributed to this variability, as less experienced individuals may exhibit greater inconsistency when integrating exertion across session phases.
Previous studies [3,4,12] have analyzed the relationship between sRPE and physiological indicators in HIFT, but none have specifically examined its convergent validity across complete CrossFit® sessions, which represents the main novelty of the present work. Our findings expand current evidence by showing that sRPE not only correlates with physiological responses [3,4,12], but also differs systematically from RPEW, reflecting the complex perceptual weighting that characterizes multimodal, high-intensity training. This pattern contrasts with what has been reported in more homogeneous training modalities, such as strength or power training. In the strength training domain, sRPE is sensitive to variations in intensity [11,21,22] and pace [25]. Some studies comparing sRPE with the average RPE recorded after each set found comparable values in power training [11,22], whereas others in maximal strength and hypertrophy protocols reported higher mean set RPEs than sRPE [21,22]. In contrast, the present study observed the opposite pattern, with sRPE overestimating RPEW, suggesting that perceptual mechanisms in multimodal, high-intensity intermittent training may differ from those in more uniform exercise modalities.
A possible explanation for this discrepancy is that sRPE appears to be strongly influenced by the final phase of the session. Regression analysis confirmed that WOD RPE was the main determinant of sRPE, supporting the hypothesis that perceptual recall is predominantly influenced by the most intense and recent stimuli of the session [26]. The accumulation of fatigue, elevated cardiorespiratory and metabolic responses, and the anaerobic contribution during the WOD [2,4,5,6,7,15,16,18,19,27,28] likely amplify perceived exertion in the final minutes. Indeed, in our study, the greater the difference between RPE during the WOD and strength/skill phases, the larger the discrepancy between sRPE and RPEW, indicating that although RPEW integrates the contribution of all phases, sRPE predominantly reflects the perceptual impact of the WOD [26]. Moreover, the high physiological demands of the WOD [2,4,5,6,7,15,16,18,19,28] elicit substantial post-exercise stress that may persist for several hours or even up to 48 h [19,28], potentially impairing participants’ ability to recall effort evenly across the session. Although Foster et al. [10] recommended collecting sRPE ~30 min post-exercise to minimize the influence of the final stimuli, this interval may be insufficient for highly intense WODs [29], as supported by Tibana et al. [3], who observed a notable reduction in sRPE when measured at longer recovery intervals.
A secondary but relevant finding was the small yet significant contribution of the cooldown to sRPE. In a previous study by our group [23], the type and duration of the cooldown influenced sRPE, particularly following high-intensity exercise. In the present study, cooldowns were standardized as passive and brief, which likely minimized their influence. However, differences in cooldown design may partly explain the variability in sRPE reported across studies [2,3,4,5,7,16,17]. Therefore, not only the main session content but also the final session phase should be considered when monitoring or planning training load in HIFT.
The highest RPE values were consistently recorded during the WOD, corroborating previous research identifying this phase as the most demanding [6]. Its continuous, high-intensity nature elicits greater cardiovascular and metabolic responses [6], which in turn contribute to higher RPE values [28]. In contrast, the preceding strength/skill phase, characterized by lower stimulus density and a more controlled pace, elicits lower RPEs despite involving high external loads [11,21,22,25]. Meier et al. [6] reported ~30% higher cardiovascular responses during the WOD compared with the strength/skill phase, consistent with the ~30% higher RPE observed in our data. The mean WOD RPE (7.5 ± 1.4) aligns with previously reported ranges (7–9) [2,3,4,5,7,16,17,18,19], although the slightly lower values observed here (<8) may reflect the inclusion of complete sessions (~60 min) rather than isolated WODs. When the WOD is performed as a stand-alone training stimulus, participants often reach near-maximal perceived exertion [7], whereas complete sessions include transitions and pauses that moderate overall RPE [30]. Additionally, differences in exercise selection [5,11,21,22], specific WOD format [5,17,19], and motivational factors such as verbal encouragement from coaches and peers [19,24] may further explain the slightly lower RPE values observed. Variability among participants related to training experience or sex could also influence RPE. However, previous studies have not found significant effects of either factor on WOD RPE despite differences in cardiovascular responses and work capacity [5,16]. Finally, the use of different RPE scales across studies (e.g., Borg 6–20 vs. CR-10) may also contribute to discrepancies in reported values.
Regarding WOD type, studies have generally shown that RPE does not differ substantially across formats [5,17,19]. However, a greater glycolytic contribution has been observed in RFT compared with AMRAP WODs, likely due to the shorter duration and higher intensity of the former versus the longer, self-paced nature of the latter [15,17,19]. Therefore, increases in RPE may depend on both exercise intensity and total work volume [4,22]. Despite these metabolic differences, both WOD types appear to induce similar autonomic responses [27] and comparable levels of fatigue and muscle damage [19]. Consistent with this, our results showed no significant differences in RPE across WOD formats (Figure 4).
The present study, together with that of Meier et al. [6], represents one of the few investigations to analyze complete CrossFit® sessions including all phases (warm-up, strength/skill work, WOD and cooldown). Previous studies typically assessed isolated WODs [2,3,4,5,7,15,16,17,18,19]. The training loads obtained here were higher than those reported in WOD-only protocols [3,5]. However, when computed solely from WOD RPE and duration (Table 2), they fell within previously reported ranges (~35–180 AU) [3,5], emphasizing the role of session duration in determining total load. For example, WODs lasting ~20 min have been associated with loads near 150–180 AU [3,5], whereas shorter efforts (~4 min) produce loads of ~35 AU [3]. In our study, the mean WOD duration (~13 min) explains why the training load from this phase fell between these extremes.
Finally, this study has several limitations. First, the sample comprised recreational CrossFit® practitioners from a single center, which may limit generalizability to other populations or training settings. Second, individual factors such as sleep, prior fatigue, or nutritional intake (including supplement use) were not controlled and could have increased random variability. Moreover, although participants underwent a familiarization phase, repeated exposure to 20 sessions might have influenced their perceptual calibration. Finally, all RPE measures were collected by a single male evaluator, which ensured methodological consistency but may have introduced a gender-related bias [31]. Future studies should consider using mixed-gender evaluators or self-reported RPE collection to minimize this potential bias.