1. Introduction
Marathon running represents one of the most demanding tests of human endurance and strength, requiring participants to demonstrate resilience and mental fortitude over a distance of 42.195 km. Elite marathon runners showcase extraordinary capabilities, with the fastest male athletes averaging speeds of approximately 20 km/h and the fastest female athletes achieving around 19.3 km/h. These remarkable feats are not accomplished through a simple, unwavering pace. Instead, they result from carefully calibrated speed fluctuations designed to balance energy conservation and fatigue management. A key component of these performance strategies is the concept of critical speed: the maximum sustainable aerobic pace an athlete can maintain without rapidly accumulating fatigue. Running at or below this pace allows the body to sustain energy supply and meet the muscular demand for oxygen. Exceeding this velocity, however, forces the body’s metabolism to shift towards anaerobic pathways, leading to accelerated fatigue. Through rigorous training, elite runners develop the capacity to approach or even surpass their critical speed for extended periods. By alternating between slower and faster speeds, they optimize their energy use, delay fatigue, and maximize performance [1,2].
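The critical speed concept can be made concrete with the classical two-parameter (distance–time) model, in which the slope of the line through two exhaustive time trials gives the critical speed and the intercept gives the finite distance capacity D′ available above it. The sketch below is purely illustrative: the function name and trial values are ours, and this calibration was not part of the present study.

```python
# Two-point critical speed estimate (illustrative; not the protocol of this study).
# Linear distance-time model: d = D' + CS * t, fitted through two exhaustive trials.
def critical_speed(d1, t1, d2, t2):
    """Return (CS in m/s, D' in m) from two time-to-exhaustion trials
    of distances d1, d2 (m) and durations t1, t2 (s)."""
    cs = (d2 - d1) / (t2 - t1)   # slope: critical speed
    d_prime = d1 - cs * t1       # intercept: finite capacity usable above CS
    return cs, d_prime

# Hypothetical trials: 1000 m in 240 s and 3000 m in 740 s.
cs, d_prime = critical_speed(d1=1000, t1=240, d2=3000, t2=740)
print(cs, d_prime)  # -> 4.0 40.0  (4 m/s = 14.4 km/h, with D' = 40 m)
```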
This strategy enables them to sustain energy and delay fatigue, thereby optimizing their potential across the race distance without reaching their maximal oxygen uptake (VO₂max), which cannot be sustained for long [3]. The latter represents the maximum rate at which an individual can consume oxygen during intense exercise. Higher levels of VO₂max are typically associated with enhanced endurance and aerobic capacity, enabling runners to maintain faster speeds over extended distances [4]. The combination of an athlete’s VO₂max and their tolerance to oxygen deficit (their ability to continue exercising when the demand for oxygen surpasses the supply) helps to inform their pacing strategy and predict their endurance potential. In contrast to elite runners, recreational runners, an increasingly large and diverse demographic, often adopt a rigid pacing approach, aiming to maintain a constant speed throughout the marathon. This approach drives them to their VO₂max [5]. This strategy, however, frequently results in a phenomenon known as the marathon wall, characterized by a sudden onset of extreme fatigue that typically occurs around the 26th kilometer [6]. The phenomenon is caused by a depletion of glycogen stores and an increased reliance on less efficient energy sources, which results in a significant reduction in speed. A sharp decline in speed is often observed in recreational runners who encounter this phenomenon, resulting in a final median speed that is considerably lower than their initial pace [7]. This pacing challenge emphasizes the necessity for a more adaptive strategy that could assist runners in achieving their optimal performance without experiencing significant energy deficits in the latter stages of the race.
As the race progresses, particularly in the final 15 km, physiological indicators such as heart rate, oxygen consumption (VO₂), and respiratory rate exhibit increasing variability or entropy [8,9]. In this context, entropy reflects the body’s fluctuating physiological state as it strives to meet the escalating demands of prolonged exertion. Two primary types of entropy are relevant to marathon running:
Clausius Entropy (Thermodynamic Entropy): Rooted in the principles of thermodynamics, Clausius entropy reflects the disorder or heat accumulation within the body during prolonged exertion. As runners approach the final kilometers, their bodies generate a substantial amount of internal heat, particularly under elevated ambient temperatures, with body temperatures frequently exceeding 40 °C. This rise in thermodynamic entropy can place additional strain on the body’s cooling and energy systems, thereby exacerbating fatigue [10,11].
Shannon Entropy (Informational Entropy): Derived from information theory, Shannon entropy quantifies the predictability of the runner’s physiological data. During the marathon, Shannon entropy decreases over the final 10 km, indicating that physiological responses become less varied and more predictable as the body reaches a state of fatigue. This reduction in informational entropy indicates that the body’s capacity to adapt dynamically is impaired, reducing the efficacy of self-regulatory mechanisms in the latter stages of the race [9].
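To make the informational measure concrete, Shannon entropy can be estimated from a window of physiological samples by binning the values and applying H = −Σ p·log₂(p). The helper below is a minimal sketch; the binning scheme and the example heart-rate traces are illustrative assumptions, not data from the study.

```python
import math
from collections import Counter

def shannon_entropy(samples, n_bins=8):
    """Shannon entropy (bits) of a signal after uniform binning.
    Lower values indicate a more predictable physiological response."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / n_bins or 1.0          # guard against a constant signal
    bins = [min(int((x - lo) / width), n_bins - 1) for x in samples]
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bins).values())

# A steady heart-rate trace is more predictable (lower entropy) than a variable one:
steady = [150, 150, 151, 150, 150, 151, 150, 150]
varied = [140, 155, 148, 162, 151, 158, 144, 166]
print(shannon_entropy(steady) < shannon_entropy(varied))  # -> True
```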
One promising avenue for enhancing pacing strategies in real time is the use of a variational autoencoder. This AI-based technology is capable of analysing intricate physiological signals and translating them into actionable feedback for the runner. By dynamically encoding data from variables such as heart rate, VO₂, speed, and perceived exertion, a variational autoencoder can provide a continuously updated representation of the runner’s state. Integration of this encoder with a cardio-GPS device could facilitate personalized pacing adjustments based on the runner’s real-time physical state. This would assist runners in optimising speed fluctuations without exceeding their limits, and in developing an understanding of the relationship between their perceived exertion and their actual physiological response in terms of % of VO₂max. Indeed, AI could assist with pacing in a manner commensurate with the Borg Scale for Perceived Exertion [12]. In structured training programs, runners are frequently introduced to a variety of levels of exertion, typically evaluated using the Borg scale of perceived exertion. The scale ranges from 6 to 20, with each level corresponding to a subjective feeling of effort and difficulty. The most commonly utilized levels for marathon training are as follows [13]:
– “Easy pace”: equivalent to a speed of 5.5–6.4 km/h; the runner is comfortable and able to hold a conversation.
– Level 14, “moderate pace”: a moderate level of effort that still allows for sustained conversation; the difficulty is moderate, yet the pace is sustainable.
– Level 17, “hard pace”: the exertion is considerable, yet it can be sustained for relatively brief distances.
– Level 20, maximal effort: exertion close to the point of exhaustion, not sustainable over extended periods of time.
Perceived exertion levels allow runners to adjust their pace according to the specific demands of a race. This study aims to determine whether an AI system can learn these exertion levels through calibration tests to establish a personalized energetic signature. Based on VO₂max, oxygen deficit tolerance, and perceived exertion, this signature could guide AI-assisted pace adjustments during the race. By analyzing sophisticated physiological data, this pilot study explores whether an AI-powered system can offer more effective pacing strategies for marathon runners than traditional cardio-GPS devices. The goal is to design an adaptable pacing assistant that integrates and encodes each runner’s unique physiological responses, enabling optimal pace adjustments while avoiding the rigidity of fixed speed maintenance. Such an approach could allow recreational runners to input personalized pacing profiles into their devices, with AI guiding them through optimal speed variations across a marathon distance. This innovation aims to support runners in achieving their personal best under safe, optimized pacing conditions, fostering greater enjoyment and sustainability, especially for diverse and aging populations. To demonstrate this potential, a pilot experiment investigates the capacity of a deep neural network to extract an energetic signature. Physiological modulations in a cohort of runners during a marathon were analyzed using extensive datasets, including heart rate and speed (Garmin 630), oxygen uptake (VO₂), respiratory frequency, and metabolic data (Cosmed K5). The use of deep learning in multi-sensor sports data analysis remains uncommon, partly due to the high cost and variability of data acquisition. To address this, innovative strategies such as fractal methods and data augmentation techniques, including sliding windows, were applied to enhance temporal progression within datasets.
The study focuses on analyzing marathon runners’ physiological parameters through deep neural networks to derive insights about performance and propose race strategy adjustments. Artificial intelligence can assist sports physiologists in prioritizing performance tests, uncovering details that are otherwise inaccessible to the human eye. A Variational Autoencoder (VAE), a generative statistical model, was used to create individual signatures sensitive to physiological variations and fatigue [14]. Additionally, Hölder exponents and multifractal spectrum analysis provided a deeper understanding of cardiac autoregulation during intense exercise [15]. While Lyapunov exponents have been used to characterize equilibrium plateaus [16], their integration into a multivariable energetic context remains unexplored and could help identify exhaustion points and unsustainable pacing with greater precision.
The study seeks to elucidate the unique physiological signatures of marathon runners and explore the interpretability of Garmin and K5 data. Ultimately, it aims to detect fatigue-induced disruptions in race dynamics, advancing our understanding of marathon performance and supporting improved training and race strategies.
2. Materials and Methods
2.1. Subjects
Although we started with 10 runners, one was excluded on account of incomplete data, caused by an analyzer malfunction (a battery issue) at the half-marathon point. The study therefore comprised nine recreational but well-trained male marathon runners (mean age: 40.1 ± 10.6 years; weight: 72.7 ± 6.5 kg; height: 178.3 ± 7.5 cm), whose performances fall within the first quartile of finishers in popular marathons such as the Paris Marathon [17] (Table 1). To avoid introducing additional variables that could impact the statistical analysis, we deliberately included only one gender in our investigation.
All participants volunteered and maintained their regular training routines without alterations. The selected runners had prior experience completing a minimum of two marathons. They had been engaged in consistent training, involving three to four sessions per week, covering a range of 50 to 80 km per week, for over 5 years. Every week, the participants incorporated a High-Intensity Interval Training session, involving 6 repetitions of 1000 m at intensities between 90% to 100% of their maximal heart rate, along with a tempo training session of 15 to 25 km at speeds ranging from 100% to 90% of their average marathon pace.
Ethical considerations were met, as the study’s objectives and procedures received approval from an institutional review board (CPP Sud-Est V, Grenoble, France; reference: 2018-A01496-49). All participants were well-informed about the study and provided written consent to participate.
Table 1 presents information regarding participants’ ages, their personal best marathon completion times, and the year these performances were achieved. Notably, some of the runners achieved their personal best during the Sénart Marathon.
2.2. The Marathon and Experimental Measures
All participants took part in an official race, the Sénart Marathon, in France. The race commenced at 9 a.m. On 1 May 2019, in Sénart, the weather included temperatures between 11 and 15 °C (from 9 a.m. to 1 p.m.), no precipitation, and an average humidity of 60%. Blood lactate levels were assessed using a finger-based lactate measurement device (Lactate PRO2 LT-1730; ArKray, Kyoto, Japan) immediately after a 15-min warm-up at a leisurely pace and three minutes after crossing the finish line.
Throughout the study, we collected continuous data on respiratory gases (oxygen uptake [VO₂], ventilation [VE], and respiratory exchange ratio [RER]) using a portable telemetric system (K5; Cosmed, Rome, Italy) that allowed breath-by-breath analysis. Additionally, a combination of a global positioning system (GPS) watch (Garmin, Olathe, KS, USA) and the K5 system was utilized to monitor heart rate and speed responses, with 5-s averaged data, during each trial. Data were collected at the following sampling frequencies:
– Garmin data: collected at a frequency of 1 Hz (once per second), providing continuous recording of the athlete’s pacing, heart rate, and GPS coordinates.
– Cosmed K5 data: acquired at a sampling rate of 0.2 Hz (once every 5 s), capturing breath-by-breath physiological parameters, including oxygen uptake (VO2), carbon dioxide production (VCO2), and ventilation (VE).
– Outlier treatment: outliers were identified using statistical criteria based on physiological plausibility, particularly for heart rate, VO2, and speed data. Heart rate values outside physiologically plausible ranges (below 40 bpm or above 220 bpm) were flagged as outliers. VO2 and speed data points were similarly screened using statistical thresholds (values beyond three standard deviations from the participant’s mean). Approximately 1.5% of the collected data points were identified as outliers and removed from the dataset.
– Missing data: missing data (2% of the dataset) occasionally arose due to sensor failures, communication issues with the athlete, or manual data-collection errors. Short gaps (less than 30 s) were addressed using spline interpolation, connecting the last known data point to the next known data point with polynomials for a smoother approximation. Longer gaps or substantial missing segments were excluded from the analysis to ensure data integrity. The Kalman filter of Shumway and Stoffer [18] was used to ensure that each data point is assigned one definitive imputed value.
– Scaling of data: for any variable x (e.g., heart rate), the standardized value was calculated as z = (x − μ)/σ, where μ and σ denote the mean and standard deviation of x over all observations in the dataset. All physiological variables (heart rate, VO2, and speed) were standardized using Z-score normalization before being provided as inputs to the VAE. Such uniform scaling also assists dimensionality reduction and network training by preventing any one variable’s scale from dominating the learning process.
To prevent pacing-related influences, runners were encouraged to self-pace their runs while the cardio-GPS display was concealed.
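Taken together, the outlier screening and Z-score scaling steps above can be sketched as follows. This is a minimal illustration in plain Python: the function names and the sample heart-rate trace are ours, and the actual pipeline also handled VO2 and speed, together with the interpolation described above.

```python
from statistics import mean, stdev

HR_MIN, HR_MAX = 40, 220  # physiological plausibility bounds used in the text

def remove_outliers(values, lo=None, hi=None, n_sd=3.0):
    """Drop values outside [lo, hi], then values beyond n_sd standard
    deviations from the mean of the remaining points."""
    kept = [v for v in values
            if (lo is None or v >= lo) and (hi is None or v <= hi)]
    mu, sd = mean(kept), stdev(kept)
    return [v for v in kept if abs(v - mu) <= n_sd * sd]

def z_score(values):
    """Standardize a variable: z = (x - mu) / sigma."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]

hr = [150, 152, 149, 5, 151, 148, 230, 150]   # 5 and 230 mimic sensor artifacts
clean = remove_outliers(hr, lo=HR_MIN, hi=HR_MAX)
z = z_score(clean)
print(clean)  # -> [150, 152, 149, 151, 148, 150]
```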
Hydration and refreshment points were available every 5 km during the marathon, with additional stations offering sponges and sustenance every 5 km from 7.5 km onward. Runners could remove their masks at these points to consume food and beverages. Consistently, all participants drank one glass of water and consumed fruit at each hydration station along the route and at the Start/Finish area.
For metabolic assessment during exercise, the participants utilized COSMED reusable face masks constructed from silicone to prevent allergenic reactions. These masks were ergonomically designed to fit snugly and comfortably, maintaining a proper seal without compromising data accuracy. In high-intensity exercises, the runners used masks with inspiratory valves to reduce resistance during inhalation, enhancing comfort.
The Rating of Perceived Exertion (RPE) was tracked using the Borg 6–20 scale [12], and participants reported their level of fatigue at least every kilometer, or more frequently as needed. This scale was employed to correlate physiological stress indicators with marathon fatigue, and participants were familiarized with it during the two weeks preceding the race.
2.3. Mathematical Procedure
2.3.1. Variational Auto Encoder
Autoencoders are a class of neural networks widely employed in unsupervised learning tasks, particularly in the domain of dimensionality reduction, feature learning, and data generation. The fundamental architecture of an autoencoder consists of an encoder and a decoder, which work in tandem to learn a compressed representation of the input data. The encoder maps the high-dimensional input data into a lower-dimensional latent space, capturing essential features and patterns. Subsequently, the decoder reconstructs the original input data from this compressed representation (Figure 1).
The encoder and decoder components of a standard autoencoder are typically implemented using feedforward neural networks. During training, the autoencoder aims to minimize the difference between the original input and the reconstructed output. This process encourages the network to learn a meaningful encoding of the input data in the latent space. Autoencoders have found applications in various fields, including image denoising, anomaly detection, and feature extraction. While conventional AEs are effective at learning compact representations of data, they lack a probabilistic interpretation, making them limited in tasks that require uncertainty estimation or generative capabilities. Variational Autoencoders (VAEs) address this limitation by introducing probabilistic modeling into the autoencoder framework. VAEs reinterpret the latent space as a probability distribution, allowing for the generation of new data samples by sampling from this distribution. The key principle of VAEs is to impose a constraint on the latent space’s distribution to encourage it to follow a specific prior distribution, often a Gaussian distribution. This constraint is typically enforced using the Kullback-Leibler (KL) divergence, which pushes the learned distribution toward the chosen prior (Table 2). During training, VAEs aim to minimize two primary components of the loss function: the reconstruction loss, which ensures faithful data reconstruction, and the KL divergence, which aligns the learned latent distribution with the desired prior distribution. Balancing these two components enables VAEs to generate new data samples that exhibit both meaningful variations and adherence to the underlying data distribution [19].
Here we detail the experimental setup, including the architecture and hyperparameters of the VAEs trained within the project. The selection of these parameters was guided by both prior research in the field and empirical evaluation on our specific dataset. We used mean squared error (MSE) to compute the reconstruction loss, while we systematically tuned the hyperparameters to achieve optimal performance and convergence during training.
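The two loss components can be written out explicitly. The per-sample sketch below assumes a diagonal Gaussian posterior with parameters mu and log_var (the standard VAE parameterization) and a standard normal prior; it is an illustration of the objective, not the study’s training code.

```python
import math

def vae_loss(x, x_hat, mu, log_var):
    """Per-sample VAE objective: MSE reconstruction term plus the closed-form
    KL divergence between the diagonal Gaussian posterior q(z|x) and N(0, I)."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    kl = -0.5 * sum(1 + lv - m ** 2 - math.exp(lv)
                    for m, lv in zip(mu, log_var))
    return mse + kl

# Perfect reconstruction with a posterior already matching N(0, I) costs nothing:
print(vae_loss(x=[1.0, 2.0], x_hat=[1.0, 2.0], mu=[0.0, 0.0], log_var=[0.0, 0.0]))  # -> 0.0
```

Shifting the posterior mean away from the prior immediately incurs a KL penalty, which is the mechanism that keeps the latent space well structured.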
The Variational Autoencoder (VAE) was trained from scratch, without any pre-trained weights or transfer learning, ensuring that all latent representations emerged solely from the given race data. The following hyperparameters were chosen based on a combination of heuristics and experiments and are listed in Table 2. To validate the chosen parameters, we conducted a sensitivity analysis by systematically varying key hyperparameters while monitoring performance metrics. The results indicated that the selected parameters yielded optimal trade-offs between model complexity, convergence, and reconstruction quality. To ensure reproducibility, a public code repository (GitHub) will be provided upon publication, containing (1) data preprocessing scripts to replicate input transformations, (2) the model architecture and training configurations for a consistent setup, and (3) hyperparameter settings and evaluation metrics.
The Variational Autoencoder (VAE) was chosen for this study due to its ability to perform unsupervised learning and latent space modeling, making it particularly well-suited for capturing the intrinsic structure of marathon race data. Unlike supervised models that require labeled data, the VAE learns meaningful representations without explicit labels, which is crucial given the limited number of runners. The model allows for dimensionality reduction, encoding race dynamics into a latent space where variations in runner performance, including the “hitting the wall” phenomenon, can be analyzed.
The choice of a VAE is motivated by its ability to learn meaningful latent structures in an unsupervised manner, which is crucial given the limited number of runners and the absence of explicit labels. The VAE facilitates dimensionality reduction while preserving the variability in race dynamics, allowing for an interpretable latent space. Supervised models such as CNNs are primarily designed for classification tasks and require labeled data, which is not available in this context. While recurrent neural networks (RNNs) and long short-term memory (LSTM) models are effective for time-series prediction, this study does not aim to predict the race timeline but rather to explore underlying latent variables. Moreover, recurrent models often require large datasets and significant computational resources, whereas VAEs efficiently extract meaningful representations with limited data.
Alternative approaches such as standard autoencoders could have been considered; however, VAEs offer a probabilistic framework that enhances generalization and prevents overfitting. A potential comparison with other models, such as GANs or deterministic autoencoders, could further validate the VAE’s capacity to learn structured latent spaces. The primary objective is not classification or forecasting, but rather uncovering hidden dynamics in race behavior, making the VAE the most suitable choice for this study.
2.3.2. Model Validation
Given the unsupervised nature of the task, traditional supervised validation techniques such as accuracy or F1-score are not directly applicable. Instead, the model’s performance was assessed using reconstruction error, latent space coherence, and consistency of learned representations across different runners. To ensure robustness, a holdout validation approach was used, where the dataset was split into 80% for training and 20% for testing. The reconstruction loss (Mean Squared Error and Kullback-Leibler divergence) was monitored to prevent overfitting. Additionally, the VAE was tested on an independent subset of race segments, verifying that latent variables remained stable and interpretable across different runners.
Further evaluation involved visualizing the latent trajectories, revealing meaningful groupings related to physiological events like “hitting the wall”. To confirm the VAE’s reliability, multiple training runs with different initialization seeds were performed, ensuring that the latent space structure was consistent across experiments. While standard supervised benchmarks are not applicable here, these validation strategies provide strong evidence that the VAE captures relevant race dynamics in a reproducible and generalizable manner.
The small-sample size constraint is mitigated by the methodology used to extract overlapping temporal windows from race signals, significantly increasing the number of training samples. By segmenting race data into smaller windows, the VAE learns local patterns and variations within individual performances, rather than being constrained to whole-race trajectories. This approach enhances the effective sample size, improving the model’s ability to generalize. While a larger and more diverse dataset would undoubtedly strengthen the conclusions, the primary objective of this study is not to make broad population-level predictions, but rather to explore latent representations of race dynamics. The findings serve as a proof of concept, demonstrating that the VAE can encode meaningful variations in running performance.
2.3.3. Construction of the Learning Dataset
To use the datasets at hand while increasing the amount of data for training and considering the temporality of the race, we chose to implement a sliding window over our pre-existing datasets. The sliding window technique is commonly employed in signal processing and data analysis, particularly in scenarios where data is acquired progressively or in a continuous stream. This technique crops the data in a “sliding” manner.
In scenarios involving progressive data acquisition, such as streaming sensor data or online monitoring systems, it is often impractical to process the entire dataset at once due to memory and computational constraints. The sliding window segmentation technique addresses this challenge by considering a subset of the most recent data points, which are continuously updated as new data arrives. This approach maintains a fixed-size window over the data stream, allowing for the analysis of temporal trends, patterns, or features within the window’s span.
Choosing an appropriate window size Lw is critical, as a smaller window may capture rapid changes but overlook long-term trends, while a larger window may smooth out important fluctuations. Empirically, Lw = 60 with ΔT = 1 (a window of 10 min and a shift of 10 s) gives satisfactory results in the context of our study. The resultant learning dataset thus contains over 8000 2D samples for K5, and over 10,000 2D samples for GARMIN.
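The sliding-window construction can be sketched as follows; the function name is ours, and the toy signal merely illustrates how overlapping windows multiply the number of training samples relative to whole-race trajectories.

```python
def sliding_windows(series, lw=60, shift=1):
    """Segment a signal into overlapping windows of length lw,
    advancing the window start by `shift` samples each step."""
    return [series[i:i + lw] for i in range(0, len(series) - lw + 1, shift)]

# A toy 100-sample signal with lw = 60 and shift = 1 yields 41 overlapping windows,
# each shifted by one sample relative to its predecessor:
signal = list(range(100))
windows = sliding_windows(signal, lw=60, shift=1)
print(len(windows))                     # -> 41
print(windows[0][:3], windows[1][:3])   # -> [0, 1, 2] [1, 2, 3]
```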
2.3.4. Marathon Individual Signature
Introducing t-SNE: To visually comprehend the intricate dynamics and patterns inherent in marathon races, we employ t-Distributed Stochastic Neighbor Embedding (t-SNE) in conjunction with our Variational Autoencoder (VAE). t-SNE is a nonlinear dimensionality reduction technique that excels at preserving local structures in high-dimensional data when projected onto a lower-dimensional space. The primary goal of t-SNE is to map high-dimensional data points into a 2D or 3D space while maintaining the relationships and distances between these points as closely as possible. This makes t-SNE particularly effective at revealing clusters, patterns, and disparities in complex datasets [21].
Visualizing 2D signatures of Marathon races: In our study, the VAE compresses the high-dimensional feature space of marathon race data into a lower-dimensional latent space. However, to gain intuitive insights and visually represent the runners’ behaviors, we extend the analysis by integrating t-SNE. By applying t-SNE to the latent representations learned by the VAE, we further reduce the dimensionality and capture intricate structures that might not be apparent in the original feature space. The 2D t-SNE signatures provide a human-interpretable representation of the complex dynamics observed during marathon races. These signatures allow for exploring individual runner trajectories within the reduced space. This level of interpretability is particularly valuable for our study’s objectives, where understanding the variations and patterns in runners’ behaviors contributes to comprehensive insights into race dynamics.
Marathon signature viability: The choice of the hyperparameters in Table 2 was ultimately based upon the cleanliness of the signatures obtained with the t-SNE. To evaluate their viability, we decided to rely upon their continuity. Several methods are available to quantify the “continuity” of a discrete dynamic system, each offering a distinct perspective on how to measure the smoothness or regularity of transitions within the system. The study of the variation rate (measuring the rate of change between consecutive values in a time series [22]), entropy measures [23], and fractal analysis [24] are all means available to rate our satisfaction with the signatures obtained. While this criterion is important and warrants attention, within the scope of this project we consider the signatures we visualized to be sufficiently continuous to work with.
2.4. Use of Lyapunov Exponents for Anticipating the “Marathon Wall”
2.4.1. Lyapunov Exponents
Lyapunov exponents are mathematical quantities used to characterize the behavior of dynamic systems—whether discrete or continuous, particularly in the context of chaos theory. They provide insights into the sensitivity of trajectories within a system to small perturbations, aiding in identifying chaotic or unpredictable behavior.
For our analysis, we focus on the first Lyapunov exponent, denoted λ1. Its value is particularly insightful for quantifying chaos within dynamic systems. It characterizes the rate at which initially close trajectories in the phase space diverge exponentially, a hallmark of chaotic behavior. Higher values of λ1 indicate more robust chaos [25], suggesting that the system is highly sensitive to initial conditions and prone to unpredictability.
In the context of our study analyzing marathon runners’ behavior during a race, the calculation of the first Lyapunov exponent serves several vital purposes. Firstly, it quantifies the extent of variability and unpredictability exhibited in runners’ behavior throughout the race. Additionally, an elevated first Lyapunov exponent could signify the presence of intricate chaotic patterns within runners’ pacing and positioning, reflecting nuanced interactions among diverse influencing factors. Moreover, this analytical approach offers insights into performance dynamics, illuminating how minor deviations in initial conditions can yield diverse outcomes among individual runners. Lastly, shifts in the behavior of the Lyapunov exponent could correspond to critical junctures within the race, such as the commencement, culmination, or demanding segments, thereby elucidating runners’ responses to specific race dynamics. In essence, calculating the first Lyapunov exponent within our marathon study plays an essential role in characterizing variability, detecting chaotic tendencies, comprehending performance dynamics, and unveiling pivotal transition points in the race.
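The study estimates λ1 with Wolf’s ODE algorithm (Table 3); as a self-contained illustration of what λ1 measures, the snippet below computes it for the classic one-dimensional logistic map, for which the fully chaotic regime (r = 4) has the known value λ1 = ln 2, while a periodic regime (r = 3.2) yields a negative exponent. This toy computation is not the Wolf algorithm itself.

```python
import math

def lyapunov_logistic(r, x0=0.3, n=200000, discard=1000):
    """Estimate lambda_1 for the logistic map x -> r*x*(1-x) as the trajectory
    average of log|f'(x)| = log|r*(1 - 2x)|."""
    x = x0
    for _ in range(discard):          # let transients die out
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n):
        total += math.log(abs(r * (1 - 2 * x)))
        x = r * x * (1 - x)
    return total / n

# r = 4: fully chaotic, lambda_1 converges to ln 2 (about 0.693);
# r = 3.2: a stable period-2 cycle, so lambda_1 is negative.
print(lyapunov_logistic(4.0))
print(lyapunov_logistic(3.2) < 0)  # -> True
```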
2.4.2. Computing λ1 with the Wolf ODE Algorithm
Consider a dynamic system described by the vector differential equation dx/dt = F(x), where x(t) represents the state vector and F represents the vector field governing the system’s dynamics. The Wolf ODE algorithm estimates λ1 using the concept of tangent vectors and their evolution, and consists of several key steps detailed in Table 3 [26]:
In the same manner as in Section 2.3.3, we constructed a sliding window of chosen size Lw over each race to plot λ1 over time. We then computed λ1 with the Wolf algorithm method detailed above over the Lw window points to obtain a λ1 plot for the entire race. After optimization, we found Lw = 30 to be optimal: large enough for λ1 to be meaningful and small enough to obtain precise race-sensitivity information.
2.5. Mann-Whitney U Test for Comparing the Appearance of Fatigue-Induced Cracks in the K5 and Garmin Datasets
The Mann-Whitney U test has proven to be a valuable tool in our comparative analysis of the appearance of fatigue-induced cracks in the K5 and Garmin datasets.
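In practice one would typically call scipy.stats.mannwhitneyu; the snippet below is a dependency-free sketch of the U statistic itself (pairwise wins, with ties counted as one half), applied to hypothetical crack-onset distances that are not the study’s data.

```python
def mann_whitney_u(a, b):
    """Mann-Whitney U statistic for sample a versus sample b:
    the number of pairs (x, y) with x > y, counting ties as 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical crack-onset distances (km) detected from the two data sources:
k5     = [24.5, 26.0, 27.5, 25.0]
garmin = [28.0, 29.5, 27.0, 30.0]
u1 = mann_whitney_u(k5, garmin)
u2 = mann_whitney_u(garmin, k5)
print(u1, u2)  # -> 1.0 15.0  (u1 + u2 always equals len(k5) * len(garmin))
```

The smaller of the two U values is then compared against the null distribution (or a normal approximation for larger samples) to obtain a p-value.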
4. Discussion
Marathon running, as explored in this study, presents a unique interplay between physiological limits, pacing strategies, and emerging AI technologies. Although the AI community might stress the importance of benchmarking our proposed VAE-based deep neural network against other cutting-edge neural network approaches, such as those used for fault location in cloud data center interconnections, multi-fault location in 5G radio and optical wireless networks, or neural network mapping in optical Networks-on-Chip, these approaches primarily address optimization problems distinct from our focus on individualized athletic pacing strategies.
Indeed, our research specifically targets physiological data-driven approaches, emphasizing personalized energy-reserve management to avoid the drastic drop in marathon speed known as the “wall”, which appears close to the 30th km depending on the individual. Therefore, the primary objective was to address the critical challenge of balancing energy conservation with fatigue management, particularly for recreational runners who often struggle with rigid pacing strategies. By integrating advanced data analysis and AI tools, we aimed to uncover insights that could inform more adaptable pacing strategies, thereby enhancing performance and reducing the risk of energy deficits. Our study has demonstrated significant insights into the intricate dynamics of marathon performance, emphasizing the interplay between physiological regulation, pacing strategies, and emerging AI technologies. Through a combination of innovative analytical methods and advanced data acquisition techniques, we have provided a new perspective on how runners manage their pace and adapt to the physical demands of a marathon. The findings open avenues for further exploration and practical applications.
Research has demonstrated that muscular power output is regulated in an anticipatory manner to prevent uncontrolled disruptions in physiological homeostasis [
27]. Pacing has a significant impact on energy production from both aerobic and anaerobic energy systems. The goal of the pacing strategy is to optimize these energy systems accordingly. Although the effects of various physiological regulators overlap, the conscious brain integrates their net input using the rating of perceived exertion (RPE) [
12,
28,
29,
30].
Changes in homeostatic status, reflected by momentary RPE, allow for alteration of pacing strategy (power output) in both an anticipatory and responsive manner based on pre-exercise expectations and peripheral feedback from different physiological sensors [
31,
32]. Recent studies have examined the continuous physiological response and RPE during marathons, revealing a similar decrease in the ratio between RPE and speed, heart rate, and VO
2 for all recreational runners [
13].
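This declining ratio between speed and perceived exertion can be sketched as follows (a minimal illustration with synthetic values, not the data of the cited study):

```python
import numpy as np

# Synthetic marathon profile: RPE (Borg 6-20 scale) rises while speed falls,
# so the speed-to-RPE ratio decreases over the race, as described above.
distance_km = np.array([5, 10, 15, 20, 25, 30, 35, 40])
rpe = np.array([9, 10, 11, 12, 13, 15, 17, 18])
speed_kmh = np.array([11.5, 11.4, 11.3, 11.1, 10.8, 10.2, 9.5, 9.0])

ratio = speed_kmh / rpe  # km/h produced per unit of perceived exertion
print(np.round(ratio, 2))
```

A monotonically decreasing ratio signals a deteriorating balance between mechanical output and perceived cost.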
Our current understanding of how runners adapt their marathon pace to account for various cardiorespiratory and biomechanical factors remains incomplete. One hypothesis proposes a strong connection between the rating of perceived exertion (RPE) and a physiological and mechanical signal that is crucial for maintaining accurate adjustments in running speed. More specifically, it is essential to balance stride amplitude against stride frequency, much as a cyclist must choose an appropriate gear ratio. Furthermore, the signal must carry a sufficient level of uncertainty to convey information effectively.
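The stride trade-off can be made concrete: running speed is the product of stride length and stride frequency, so the same speed can be held with different combinations (the values below are illustrative):

```python
def speed_ms(stride_length_m: float, stride_freq_hz: float) -> float:
    """Running speed in m/s from stride length (m) and frequency (strides/s)."""
    return stride_length_m * stride_freq_hz

# Two ways to hold roughly 3.33 m/s (about 12 km/h):
print(speed_ms(1.20, 2.78))  # longer strides, lower cadence
print(speed_ms(1.05, 3.17))  # shorter strides, higher cadence
```

Which combination is optimal depends on the runner, which is precisely why a purely mechanical target is insufficient without the perceptual feedback discussed above.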
In this comprehensive study, we have delved into the intricate dynamics of marathon races, employing novel analytical approaches to gain deeper insights into runner behaviors and performance. Our findings have shed light on several key facets of marathon race dynamics, prompting valuable discussions and considerations for future research and applications. One noteworthy revelation from our analysis is the contrast between the K5 and GARMIN datasets in terms of their ability to accurately describe race dynamics and predict fatigue. K5 emerges as a robust data source, effectively encapsulating the nuances of marathon race behaviors. However, this effectiveness comes at the cost of invasiveness, raising concerns regarding the democratization of the method. While K5 offers rich insights, the accessibility of such technology remains a challenge for widespread application. Addressing this limitation warrants exploration of alternative, less invasive data sources that still capture race dynamics effectively.
To ensure the coherence of race signatures with the continuity criteria, adopting mathematical models becomes imperative. Our study acknowledges the importance of modeling in assessing the fidelity of the observed signatures in adhering to the principles of continuity. Future research endeavors should consider incorporating mathematical frameworks that quantify the continuity of race behaviors, offering a more rigorous and objective assessment of the dynamic race strategies observed. In our pursuit of a deeper understanding of marathon race dynamics, the utilization of a temporal Variational Autoencoder (t-VAE) proved instrumental. Unlike classic VAEs, t-VAEs are tailored to better consider the temporality of the race, allowing for more nuanced insights into the evolving behaviors of runners. This innovative approach offers promising avenues for future investigations into the dynamic interplay between various factors influencing marathon performance [
33,
34,
35,
36].
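A minimal sketch of such a temporal VAE, assuming a GRU encoder/decoder with per-step latent variables, is given below; this is an illustrative architecture, not the exact model used in the study:

```python
import torch
import torch.nn as nn

class TemporalVAE(nn.Module):
    """Toy t-VAE: recurrent encoder/decoder so latents follow race temporality."""
    def __init__(self, n_features: int, hidden: int = 32, latent: int = 4):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        self.decoder = nn.GRU(latent, hidden, batch_first=True)
        self.to_out = nn.Linear(hidden, n_features)

    def forward(self, x):                      # x: (batch, time, features)
        h, _ = self.encoder(x)                 # hidden state at each time step
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterize
        d, _ = self.decoder(z)
        return self.to_out(d), mu, logvar

# Toy forward pass: 2 runners, 100 time steps, 5 physiological channels.
x = torch.randn(2, 100, 5)
recon, mu, logvar = TemporalVAE(n_features=5)(x)
print(recon.shape)
```

The recurrent layers are what distinguish this sketch from a classic VAE: the latent code at each step is conditioned on the race history up to that point.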
Our analysis unveils a compelling relationship between the appearance of Lyapunov cracks, computed on the new variables synthesized by the VAE process, and critical moments in the marathon race, such as significant speed drops and the RPE reaching 15. This temporal alignment suggests the potential for a proactive race strategy to delay the Lyapunov crack. Runners and coaches can explore adaptive pacing techniques and interventions aimed at optimizing performance and mitigating the impact of fatigue-induced fluctuations. This strategic adaptation could lead to enhanced race outcomes and improved performance control through real-time race monitoring by future physiological sensors connected to the phone [
37].
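One way to quantify such instability (an assumed, simplified procedure in the spirit of Rosenstein's method, not the paper's exact pipeline) is to estimate the largest Lyapunov exponent of a pacing time series from the divergence of nearest-neighbour trajectories in a delay-embedded space:

```python
import numpy as np

def largest_lyapunov(series, dim=3, tau=1, horizon=10):
    """Crude largest-Lyapunov-exponent estimate from a 1-D time series."""
    n = len(series) - (dim - 1) * tau
    emb = np.array([series[i:i + dim * tau:tau] for i in range(n)])  # embedding
    rates = []
    for i in range(n - horizon):
        d = np.linalg.norm(emb - emb[i], axis=1)
        d[max(0, i - tau):i + tau + 1] = np.inf   # exclude temporally close points
        j = int(np.argmin(d))                      # nearest neighbour
        if j + horizon < n and d[j] > 0:
            dk = np.linalg.norm(emb[i + horizon] - emb[j + horizon])
            if dk > 0:
                rates.append(np.log(dk / d[j]) / horizon)
    return float(np.mean(rates))

rng = np.random.default_rng(0)
speeds = rng.normal(10.5, 0.3, 500)   # synthetic stable pacing around 10.5 km/h
lam = largest_lyapunov(speeds)
print(round(lam, 3))
```

In a monitoring setting, a sustained rise of such an exponent computed over a sliding window would be the candidate early-warning signal for an approaching crack.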
5. Conclusions
In conclusion, our study demonstrates the intricate and multifaceted nature of marathon race dynamics, as revealed through advanced data analysis techniques. While K5 is an effective method for identifying race behaviours, its intrusive nature presents challenges in terms of its applicability to a wider audience. The necessity for mathematical modelling to evaluate coherence with continuity criteria, strategies to delay the Lyapunov crack, and the utilisation of t-VAEs for temporal considerations collectively contribute to a rapidly expanding field of research aimed at optimising marathon performance and advancing our comprehension of the dynamics inherent to long-distance running.
More specifically, this study addressed the following key questions:
Could an AI system using a variational autoencoder and learned energetic signatures assist runners in managing their pace more effectively?
Does the use of AI-assisted pacing reduce the probability of experiencing the phenomenon known as the “marathon wall”?
Can AI-assisted pacing enhance performance outcomes and mitigate health risks, particularly in increasingly hot conditions?
In response to these three questions, the mathematical model is not yet sufficiently developed to allow accurate time calculation. Furthermore, it would be prudent to establish the suitability of this approach before advocating it to runners who may be reluctant to forgo cardio-GPS monitoring during their races. It would be more beneficial to encourage runners to trust their sensations, as measured by the Borg scale, rather than relying on speed in km/h or pace in minutes per kilometer. Furthermore, we observed that the cardio-GPS provides only a partial representation of the physiological response when compared with the full metabolic response.
Of course, the limited sample size of nine runners restricts the generalizability of our findings. However, it is important to highlight that obtaining comprehensive cardiorespiratory data during an official marathon event is exceptionally challenging. This difficulty arises from the high sensitivity of metabolic analyzers, constraints posed by marathon organization protocols, and official rules that significantly restrict equipment usage and data collection. Consequently, expanding the sample size under official competition conditions is logistically complex and technically constrained. Nevertheless, future research efforts will aim to overcome these barriers and incorporate a larger, more diverse sample to enhance the applicability of the results.