1. Introduction
Triathlon is an increasingly popular sport with broad participation that combines three disciplines (swimming, cycling, and running) in a single event. In recent years in Spain, participation in triathlon has increased by more than 200% among young athletes of school age (Spanish Triathlon Federation) [1]. In this discipline, maximum oxygen consumption (VO2max) is considered the gold standard for determining cardiovascular capacity [2]. Accurate VO2max measurement requires specialised equipment found in exercise physiology laboratories, which is often not available to every professional. In addition, testing an entire team can be time consuming because only one athlete can be evaluated at a time. Therefore, alternative parameters have been developed to predict VO2max that allow several athletes to be tested at the same time without requiring sophisticated laboratory tools [3].
The ability both to maintain a high percentage of VO2max for long periods of time and simultaneously move efficiently, referred to as running effectiveness (RE), comes from a series of physiological attributes that contribute to the success of running performance and help athletes stand out [4]. RE is generally used to refer to steady-state oxygen consumption at a given running speed and expresses the energy expenditure required by individuals to perform at a given intensity [5]. Trained runners have higher REs compared to lesser-trained runners, which indicates that positive adaptations occur in response to regular training. Although a given athlete may be genetically predisposed to having a ‘good’ RE, various strategies can potentially further enhance an individual’s RE by increasing metabolic, cardiorespiratory, biomechanical, and/or neuromuscular responses [4].
Until a few years ago, RE was not considered an important factor in the improvement of athletes’ careers. However, this area is now the focus of increasing interest. RE is the result of the interaction between multiple factors. Of these, the most important may be biomechanical factors, neuromuscular variables such as leg stiffness, exposure to training periods at altitude, and anthropometric variables [5]. A good correlation has been observed between RE and oxygen consumption (VO2) while running. Runners with a good RE use less oxygen than runners with a poor RE at the same speed and under homogeneous conditions [6]. However, it has also been noted that RE can vary by up to 30% between trained runners with a similar VO2max [7].
In recent years, the advent of portable power estimators has dramatically changed training and competitive running, allowing athletes to be accurately evaluated [8]. Among these systems, Stryd, Boulder, CO, USA (www.stryd.com, accessed on 1 March 2021) pioneered the manufacture of power meters for runners. The Stryd running power meter is a pedometer that attaches to the shoe to measure variables that quantify performance including pace, distance, elevation, power, form power, cadence, ground contact time, vertical oscillation, and leg spring stiffness [9].
This is a relatively new type of instrument, and the validity and reliability of these systems for evaluating power output and space–time parameters have only recently been established. In this context, the running power data recorded by Stryd have been successfully used to establish a linear relationship between power and speed to predict power output at different submaximal running speeds, demonstrating the great potential of this portable equipment for studying efficiency patterns while running. Additionally, a few studies found a positive correlation between Stryd’s power data and running economy or metabolic demands. Indeed, a recent study by Cerezuela-Espejo et al. determined the correlation between these power meters and oxygen consumption [8]. Moreover, Cartón-Llorente et al. [10] determined that Stryd could reliably determine the functional threshold power (FTP) of runners.
The detection of running patterns and the variables involved in achieving the maximum possible speed while running has always been the subject of research [11,12,13]. This allows us to compare which parameters best define running efficiency, meaning that the similarities and discrepancies between athletes who are more or less successful in competitions can be examined. In this sense, the use of objective grouping or classification techniques (which are commonly employed with a variety of goals in different fields such as engineering, science, or technology) is also feasible in sports sciences. Thus, unsupervised classification (commonly known as clustering) is a classical technique used in the area of machine learning [14]. According to Rokach [15], clustering divides data patterns into subsets in such a way that similar patterns are grouped together.
Several studies have focused on gait patterns by using clustering techniques such as hierarchical clustering analysis (HCA). These provide an interpretable analysis of large quantities of data from sensors, as a multivariate problem, to obtain different groups of athletes with similar running gait patterns [16]. The objective of this study was to determine running patterns and variables involved in attaining maximum running speed in young triathletes.
3. Results
First, we studied the reliability of the Stryd sensor against the gold standard measured in the laboratory by calculating the correlation of the values obtained with the sensor and the standard at each of the three thresholds. Our data demonstrated the reliability of the Stryd compared to laboratory systems, as also shown in some recent studies [8,25], and so we used these data in the subsequent detection of running patterns. Thus, as shown in Table 2, we compared the speed obtained in the laboratory system with the values for W, W/kg, HW, and FPR obtained by the Stryd device. In addition, these data were also compared with VO2 (mL/kg/min) measured in the laboratory so we could find the variables that best correlated with speed.
The variables most strongly correlated with speed at each threshold were W/kg, HW, and FPR; in contrast, VO2max was not significantly correlated with speed at any of the thresholds.
Figure 2 shows the graphs corresponding to the regression models calculated to compare W/kg and VO2max for each of the three thresholds. This allowed us to identify which variable best explained the dependent variable of speed. In this case, the regression models highlighted two variables as explanatory factors for the speed reached by the study participants.
Power, but not VO2max, explained speed almost perfectly at each of the thresholds. As shown in Table 2, there was an exceptionally strong correlation between power and speed (coefficient close to 1), which was also observed in the regression model, with an R² remarkably close to 1. This indicated that variability in speed could be explained very well by the power of the athlete. Moreover, the regression model indicated how much power would be required to reach a given speed at each threshold. In contrast, this effect was not observed for VO2max, and the corresponding regression models could not explain the increase or decrease in speed based on this parameter. There was no evidence to indicate a linear relationship between VO2max and speed.
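As a rough illustration of the kind of regression model described above, the following Python sketch fits a least-squares line of speed on relative power and reports R². It is not the authors' code, and the numbers are hypothetical values chosen for the example, not data from this study; the function names `fit_line` and `power_for_speed` are likewise assumptions made here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept, r2), where r2 is the coefficient of determination.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)  # equals the squared Pearson r for a simple fit
    return slope, intercept, r2


def power_for_speed(target_speed, slope, intercept):
    """Invert the fitted line to estimate the power associated with a target speed."""
    return (target_speed - intercept) / slope


# Hypothetical values, NOT data from the study
power_w_per_kg = [3.0, 3.5, 4.0, 4.5, 5.0]
speed_km_h = [11.0, 12.5, 14.0, 15.5, 17.0]

slope, intercept, r2 = fit_line(power_w_per_kg, speed_km_h)
needed = power_for_speed(15.0, slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R2={r2:.3f}, W/kg for 15 km/h={needed:.2f}")
```

An R² close to 1 in such a fit means that nearly all of the variation in speed is accounted for by power, which is the pattern reported for power but not for VO2max.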
3.1. Correlations Map
We carried out both Pearson’s r analysis (the most commonly used method to assess correlations) and Kendall and Spearman correlations, which are non-parametric methods commonly used for rank-based correlation analysis [23]. The significant correlations were the same in all cases, and so we used Pearson’s correlation coefficient (r) to calculate the correlation matrix and highlight the most pertinent variables to study. As shown in Figure 3, the dots in red tones referred to negative correlations. For example, as the HW variable increased, the FPR decreased, in a significant and quite strong linear correlation remarkably close to −1. In addition, as FPR decreased, GCT decreased, and SPD, W/kg, HW, W, and LSS increased. Furthermore, as GCT decreased, LSS/kg, VO2max, SPD, W/kg, and HW increased. Finally, as RE increased, SPD and SL also increased.
Dots in blue tones referred to positive correlations such as the reasonable correlation between SPD and W/kg and the significant correlation between SPD and HW. This means that the higher the W/kg and HW, the higher the SPD runners attained, which is a clear indicator of running pattern. When LSS/kg increased, CAD also increased, and the increase in VO2max correlated with the increase in W/kg. As SPD increased, W/kg and HW also increased; increased W/kg was accompanied by increased HW, VO2max, W, and LSS; as HW increased, W and LSS increased; and increased W was accompanied by increased LSS.
In contrast, some variables were not significantly correlated and when we cross-referenced these there was only one gap in the matrix. For example, our data indicated that a higher running cadence did not mean that the athlete would run faster. Indeed, this variable did not show a significant linear correlation, meaning that, a priori, it was unlikely to be an important factor in the generation of more speed. Similarly, there was no correlation between a high VO2max and faster speed, as we previously observed in our regression models. However, faster athletes had a higher W/kg and HW, and a lower FPR. VO2max strongly correlated with W/kg and this seemed to indicate that to run fast, athletes must also correctly manage their power.
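To make the difference between the linear and rank-based coefficients mentioned in this section concrete, the Python sketch below computes Pearson's r and Spearman's rho for one pair of variables. It is an illustrative sketch only: the helper names (`pearson_r`, `ranks`, `spearman_rho`) and the sample values are assumptions introduced here, not the study's code or data, and Kendall's tau is omitted for brevity.

```python
def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical values, NOT data from the study
fpr = [0.30, 0.28, 0.25, 0.22, 0.20]
spd = [12.0, 13.0, 14.5, 16.0, 18.0]
print(round(pearson_r(fpr, spd), 3), round(spearman_rho(fpr, spd), 3))
```

When the two coefficients agree in sign and significance across variable pairs, as reported here, using Pearson's r for the correlation matrix loses little information.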
3.2. Clustering Heat Map
Finally, we studied the patterns of each runner by generating a heat map using HCA. First, the data were scaled to standardise the variables and minimise the impact of their different magnitudes; that is, the data were normalised to have zero mean and unit variance. When the data were scaled, the Euclidean distance of the z-scores was the same as the correlation distance. A connectivity-based clustering (HCA) approach was then used to identify homogeneous gait patterns in the entire participant group by creating a cluster tree or dendrogram. To perform the HCA, we used the R package ‘pheatmap’ (Version 1.0.12) [26]. This allowed us to generate clusters of similar runners based on the variables extracted from the Stryd data and to construct a heat map to observe these patterns according to assigned colours.
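The scaling step can be sketched as follows. This is a minimal Python illustration rather than the R code used in the study: the function name `zscore_columns` and the sample rows are hypothetical, and the sample standard deviation (n − 1 denominator) is assumed, since the text does not state which denominator was applied.

```python
def zscore_columns(rows):
    """Standardise each column to zero mean and unit sample variance.

    rows: list of equal-length lists, one row per athlete, one column per variable.
    Assumes the n - 1 (sample) denominator for the standard deviation.
    """
    n = len(rows)
    if n < 2:
        raise ValueError("at least two rows are needed")
    n_cols = len(rows[0])
    scaled = [[0.0] * n_cols for _ in range(n)]
    for j in range(n_cols):
        column = [row[j] for row in rows]
        mean = sum(column) / n
        sd = (sum((v - mean) ** 2 for v in column) / (n - 1)) ** 0.5
        if sd == 0:
            raise ValueError(f"column {j} is constant and cannot be scaled")
        for i in range(n):
            scaled[i][j] = (column[i] - mean) / sd
    return scaled


# Hypothetical rows (speed in km/h, ground contact time in ms), NOT study data
athletes = [[12.0, 250.0], [14.0, 230.0], [16.0, 210.0]]
print(zscore_columns(athletes))
```

After this step, a variable measured in hundreds (such as ground contact time in milliseconds) no longer dominates the distance between athletes simply because of its units.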
The procedure for performing agglomerative HCA on the data set consisted of three steps: calculation of the distance matrix between participants, computation of a linkage function, and definition of clusters. In brief, first the Euclidean distance between every pair of athletes was calculated for an M-dimensional space. Second, individual participants were paired into binary clusters based on the distance information using the Ward D2 linkage method [27]. Third, newly formed clusters were grouped into larger clusters until the dendrogram was formed [16,24]. The Ward minimum variance method was used to minimise the total within-cluster variance. At each step, the cluster pair with a minimum between-cluster distance was merged.
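The three steps above can be sketched in plain Python. This is a naive illustration of Ward's minimum variance criterion, not the 'pheatmap' implementation used in the study: the function `ward_clusters`, the parameter `k`, and the sample points are assumptions introduced for the example. At each step it merges the two clusters whose union gives the smallest increase in within-cluster sum of squares, which for clusters A and B equals |A||B| / (|A| + |B|) times the squared Euclidean distance between their centroids.

```python
def ward_clusters(points, k):
    """Agglomerative clustering with Ward's criterion, stopped at k clusters.

    points: list of equal-length numeric lists (ideally already standardised).
    Returns a sorted list of clusters, each a sorted list of row indices.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Each cluster is stored as (member indices, centroid coordinates)
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                members_a, centroid_a = clusters[a]
                members_b, centroid_b = clusters[b]
                sq_dist = sum((u - v) ** 2 for u, v in zip(centroid_a, centroid_b))
                na, nb = len(members_a), len(members_b)
                cost = na * nb / (na + nb) * sq_dist  # increase in within-cluster SS
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        members_a, centroid_a = clusters[a]
        members_b, centroid_b = clusters[b]
        na, nb = len(members_a), len(members_b)
        merged_centroid = [(na * u + nb * v) / (na + nb)
                           for u, v in zip(centroid_a, centroid_b)]
        merged = (sorted(members_a + members_b), merged_centroid)
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return sorted(members for members, _ in clusters)


# Hypothetical two-variable points forming three obvious groups, NOT study data
points = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0], [10.0, 0.2]]
print(ward_clusters(points, 3))
```

The pairwise search makes this sketch cubic in the number of athletes, which is acceptable for a small cohort; library implementations use the Lance-Williams update to avoid recomputing every pair and also record the merge heights needed to draw the dendrogram.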
Finally, we visually inspected the dendrogram and decided to separate the clusters into three groups based on our knowledge of the athletes. Thus, the K parameter was established at 3. As shown in Figure 4, we represented the result of the clustering as a heat map.
As shown on the heat map, three clusters of athletes with similar characteristics to each other were identified. The reference group was cluster two (athletes S7 and S13), representing the two individuals with the best competitive results. As shown, the SPD variable for these two athletes corresponded to the highest values, highlighted in warmer colours (red tones). In contrast, the participants with SPD marked in cooler colours (blue tones) were the slowest in the cohort. The colour scale was established by columns, with red representing the individual with the highest value in each of the variables.
Thus, a colour pattern could be observed for each individual with respect to the reference group by noting the variables for which warmer or cooler colours were obtained. For example, participant S14 obtained low SPD, VO2max, HW, and W/kg values and high values for FPR and GCT, indicating the aspects of their running technique they should work on to increase their RE or running speed. In contrast, athlete S2 had high W/kg, LSS, W, and VO2max values and low CAD values with respect to the reference participants, even though their running speed was normal. This was probably because of the strong correlation between SPD and W/kg and weak correlation between SPD and the other variables.
Based on these data, we carried out a detailed analysis of which characteristics in each athlete were increased or reduced compared to those who had obtained better results. Moreover, by examining certain reference variables such as RE, we observed differences between the participants.
Figure 5 shows a graphical representation of the relationship between speed and the variables that best correlated with it, also separating the individuals by each of the cluster groups. These graphs allowed us to better understand the differences between athletes who run faster and who better manage their performance power compared to those who run slower, according to these groups. Group two was used as the reference and was shown in green.
Thus, the fastest runners had a decreased FPR (A) and GCT (B), and an increased W/kg (C), HW (D), SL (E), and RE (F). Based on these results, it appears that power management and running dynamics play a more important role than VO2max in athletes who run faster.
4. Discussion
The objective of this study was to determine the running patterns and variables involved in the maximum running speed of young triathletes. We observed that there was a pattern of decreased FPR and GCT, and increased W/kg, HW, SL, and RE among faster athletes. Based on these results, it appears that power management and running dynamics play a more important role than VO2max in athletes who run faster. Various studies have demonstrated the reliability and validity of portable systems such as Stryd for measuring running power [9,28,29]. Additionally, running power is a more sensitive measure of exercise intensity than other internal and external parameters, such as heart rate or speed [28].
Calculation of the linear correlations for each of the variables we collected in this study was an easy and fast method to understand which factors or characteristics were related to each other. This allowed us to quickly find indications about the influence of some of these variables with respect to others in order to obtain the most important power parameters for running. We observed that the Stryd device data correlated well with VO2max laboratory equipment data. This added confidence to our study of the interrelation of variables and subsequently, to our comparisons between athletes to reliably apply grouping techniques to search for patterns representative of RE. It was also interesting to see that certain variables did not linearly influence RE and so, could be discarded for the purposes of this work, or studied using other distribution models.
In addition, we consider the clustering techniques represented by heat maps to constitute an especially useful tool for quickly explaining the differences between runners. The colour codes allowed us to find similar patterns for each variable collected during the test, which corresponded to the patterns of each competitor. Grouping athletes by these colour patterns makes it possible to characterise RE patterns, either with reference to the athletes who obtained the best competitive results or simply on the basis of a pre-determined set of variables. This may allow these findings to be extrapolated to other sports in which different characteristics are measured.
Limitations of the Study and Future Activities
One of the limitations of this work was its sample size, which was only sufficient to obtain preliminary results related to our research topic. However, these results are encouraging, and we believe that future work in this area is promising. We must also consider that obtaining data for high-performance athletes is quite difficult because they are a very small population, and therefore samples will tend to be small. Although our cohort was homogeneous, a larger number of subjects would be needed to give these results sufficient statistical power to be generalised with confidence to other athletes with similar characteristics. Additionally, future research should consider sex as a variable, as it could be a confounding factor when studying RE.
Finally, the sample size made it difficult to fully utilise the potential of some of the artificial intelligence techniques available to us. Future work should be directed towards the application of these results in the training of young triathletes to help improve their performance and to determine biomechanical running patterns that complement the present power study using the Stryd sensor in young athletes.
5. Conclusions
In this work, we studied how to identify running patterns among young athletes based on data from wearable sensors (such as Stryd) as compared to laboratory equipment results. Our findings indicate that power management was key to maximising running speed. VO2max strongly correlated with W/kg, indicating that to run faster, athletes must also correctly manage their power. We used different techniques to identify the relationship between strength and some of the other variables in our data set. Thus, we were able to establish which parameters each athlete should work on to enhance their running form. Heat maps also allowed us to quickly group runners with similar characteristics, defining colour patterns to characterise them. Furthermore, by comparing each athlete’s performance with the other competitors, we were able to work with individual runners to set target parameters for their improvement. Given that the data were obtained from measurement sensors, we consider them to be valuable and objective information that could perhaps lead to the modification of certain methodologies or training techniques. This work opens the door for future work with other types of variables, such as biomechanics obtained from other sensors, which will broaden the spectrum of factors that can be studied.