1. Introduction
The range of intelligent data analysis applications has increased steadily in recent years as more domains have been impacted. Intelligent data analysis uses advanced statistical, machine learning, and pattern recognition techniques to probe deeper into the data structure and extract meaningful, non-obvious insights beyond simple summarization [
1]. Sport is no exception: these techniques are used heavily in disciplines where competition results depend on a combination of factors that may not be apparent from unassisted observation. In sports, these approaches have been used in fan engagement [
2], officiating competitions [
3], sports betting [
4], and sports training [
5].
Moreover, artificial intelligence (AI) approaches are already being used in practice for selecting youth prospects in professional soccer leagues (e.g., the Premier League, La Liga, Bundesliga, etc.) [
6]. The market of intelligent data analytics in sports is expected to grow from USD 0.9 billion in 2023 to USD 15.6 billion in 2033 [
7] with some estimates [
8] ranging up to USD 20.5 billion. A further step in the evolution of sports could be digital sports coaches, which may make sports more accessible by offering a digital service that allows athletes to pursue their wellness and fitness goals more efficiently [
9]. Refereeing is also increasingly supported by information technology; this has happened in soccer with Video Assistant Referees [
10] and in the 2024 Paris Olympics by improving decision-making in various events, from biomechanics analysis in swimming to real-time performance tracking in athletics [
11].
Because intelligent data analysis methods are used to evaluate athletes objectively, they are also being incorporated into training routines to exploit every opportunity for improvement.
These trends have been observed in practice, and 109 cases of Smart Sports Training (SST) were identified between 2006 and 2020. SST approaches represent various forms of athletic training that utilize wearables, sensors, IoT devices, and intelligent data analysis methods and tools to enhance training efficiency and outcomes. SST aims to improve performance or reduce effort, all while maintaining or exceeding the current performance levels [
5].
Sports cover a diverse variety of activities with different demands on athletes. These demands can be mental and/or physical. In sports, this can range from short bursts of high intensity to prolonged, sustained efforts. This review relates to endurance, defined as the ability to keep engaged in something difficult, unpleasant, or painful for a long time [
12]. Given that endurance sports often require a balance of physical and mental endurance, making them a unique challenge for athletes, it is crucial to develop a deep understanding of how intelligent data analysis can address these challenges.
Several similar systematic literature reviews (SLRs) already exist, addressing either sports in general [
5,
13,
14] or team sports [
15]. They highlight the transformative potential of intelligent data analysis in sports. However, they neither address our topic directly nor cover endurance sports sufficiently, which is the focus of this review.
The authors in Rajšp et al. [
5] proposed the term Smart Sports Training and covered 108 scientific papers. The drawback is that the SLR was conducted in 2020 and is over four years old. The review also did not identify the devices used in training or the benefits of incorporating modern approaches in individual cases, and endurance sports were not investigated specifically.
Furthermore, Bonidia et al. [
13] presented an analysis of data mining techniques and algorithms in sports from 2010 to 2018. They identified 31 relevant studies, highlighting the interest in applying computational intelligence to improve sports performance, predict outcomes, and optimize training. Compared to our proposed study, the SLR did not identify the devices used in training and covered only three studies from the category of endurance sports training, which deserves an investigation of its own.
The authors in Krstić et al. [
14] presented a systematic literature review on applying artificial intelligence, machine learning, and deep learning to improve sports performance. The limitations of this SLR were that only one scientific database was queried and that only eleven studies were related to endurance sports, which is the focus of our review.
Although these three systematic literature reviews provide valuable insights into the topic, they fall short of addressing the specifics of the endurance sports field. For this reason, a more extensive study was proposed and conducted, with the goal of addressing the existing gaps in knowledge and providing an overview of intelligent data analysis in endurance sports training.
Through a systematic literature review, this paper identifies and presents the latest advancements in Smart Sports Training in the domain of endurance sports.
The objectives of this review were to investigate (1) how intelligent data analysis methods are used in endurance sports training, (2) which endurance sports are the most supported by such applications of Smart Sports Training, and (3) what types of performance improvements and training goals are targeted, reported, or investigated through the application of intelligent data analysis methods in endurance sports. Based on the study objectives, the following research questions (RQ) were formed:
- RQ 1: Which intelligent data analysis methods are used in endurance sports, what performance parameters and outcomes are they focused on, and in which disciplines are these methods implemented most frequently?
- RQ 1.1: Which intelligent data analysis approaches were used in endurance sports training?
- RQ 1.2: What was the focus of studies utilizing intelligent data analysis approaches?
- RQ 1.3: Which endurance sports disciplines are supported most frequently by the implementation of intelligent data analysis methods?
- RQ 2: What are the most common IoT (Internet of Things) devices, wearables, and sensors used in endurance sports training in combination with intelligent data analysis approaches, and how do they contribute to the training process and to data collection and analysis?
The main unique contributions of this review paper are as follows:
Comprehensive Taxonomy of Approaches: A taxonomy for intelligent data analysis methods in endurance sports, based on categorization in machine learning, deep learning, computational intelligence, and other data-driven techniques.
Holistic Overview of Sensors and Devices: A catalog of the standard IoT devices, wearables, and sensors adopted in endurance sports.
Focused Endurance Sports Insights: Unlike previous reviews, which cover sports broadly, our study focuses specifically on endurance sports.
The remainder of this paper is structured as follows. Section 2 presents endurance sports and their training. Section 3 presents the systematic literature review methodology followed. Section 4 presents the literature review results. Section 5 discusses the findings, and the paper is concluded in Section 6, where key insights and future research directions are presented.
4. Results
The literature search was performed on 22 October 2024. Before duplicate removal and screening, a total of 1305 results were identified, with most of the results (78.5% of the total) originating from Scopus (580) and Web of Science (445), as shown in
Table 4.
The investigation of duplicates was based on the Digital Object Identifiers (DOIs), authors, and titles of individual studies.
Table 5 shows the identified duplicates between the different databases.
Notably, duplicated results were sometimes identified even inside the same database (e.g., two in Scopus and two in Web of Science) because these databases aggregate results across multiple smaller databases. Most of the duplicates involved the Scopus (328 duplicates) and Web of Science (316 duplicates) databases, which was to be expected since the majority of studies were identified from them. The number of unique papers per database, which were not duplicated in other databases, was 18 in IEEE Xplore, 193 in ScienceDirect, 306 in Scopus, and 162 in Web of Science.
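The duplicate check described above (matching on DOIs, authors, and titles) can be illustrated with a minimal sketch. This is not the tooling used in the review; the record layout, the helper names (`normalize_title`, `record_key`, `deduplicate`), and the sample records are all hypothetical, and the matching rule (DOI when present, otherwise normalized title plus first author) is one simple assumption among many possible ones.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation and whitespace for comparison."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def record_key(record: dict) -> tuple:
    """Build a comparison key: the DOI when present, else title plus first author."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    authors = record.get("authors") or [""]
    first_author = authors[0].strip().lower()
    return ("title", normalize_title(record.get("title", "")), first_author)


def deduplicate(records: list) -> tuple:
    """Return (unique, duplicates), keeping the first occurrence of each key."""
    seen = set()
    unique, duplicates = [], []
    for rec in records:
        key = record_key(rec)
        if key in seen:
            duplicates.append(rec)
        else:
            seen.add(key)
            unique.append(rec)
    return unique, duplicates


# Hypothetical search results exported from several databases.
records = [
    {"source": "Scopus", "doi": "10.1000/example.1",
     "title": "Pacing in Marathon Running", "authors": ["Doe, J."]},
    {"source": "Web of Science", "doi": "10.1000/EXAMPLE.1",
     "title": "Pacing in marathon running", "authors": ["Doe, J."]},
    {"source": "IEEE Xplore", "doi": "",
     "title": "Stroke Detection in Rowing.", "authors": ["Roe, A."]},
    {"source": "ScienceDirect", "doi": None,
     "title": "Stroke detection in rowing", "authors": ["Roe, A."]},
]

unique, duplicates = deduplicate(records)
print(len(unique), "unique;", len(duplicates), "duplicates")
```

A limitation of this simple rule is that a record carrying a DOI will not match a copy of the same paper exported without one; a real pipeline would typically compare on several keys and review borderline cases manually.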
After duplicate removal, 943 papers were identified. The screening was performed in a three-step procedure: (a) the retracted papers were identified and eliminated, (b) the papers were screened based on abstract and title, and (c) the papers were screened based on their full-text contents. The papers accepted in the last step (c) were entered into the Data Extraction Table.
Retractions due to papers of questionable origins and practices have increased steadily, reaching over 10,000 papers in 2023 [
30]. We used Retraction Watch [
31], published by Crossref, to check our results against the retracted papers. In our review, 33 papers were identified as retracted and eliminated from the scope of our study, decreasing the total number of papers to 910.
These 910 papers were then screened by abstract and title, and 726 papers were eliminated, which brought the number of papers to 184. Each study was screened for relevance by the first two authors of this paper; if a consensus was not reached, the study was additionally screened by one of the remaining authors. The full texts of the remaining papers were then analyzed thoroughly. A total of 104 papers were eliminated because they proved irrelevant on close inspection, the full texts of 5 papers were unavailable, and 75 papers were selected for inclusion in the systematic literature review. The PRISMA flow diagram [
32] in
Figure 1 summarizes the whole selection process and shows the number of studies at each step.
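The counts reported for the selection process can be cross-checked with simple arithmetic. The sketch below uses only the figures stated in the text; the variable names are illustrative, and the number of removed duplicates is derived here rather than quoted from the text.

```python
# Counts reported in the text for each step of the selection process.
identified = 1305              # results before duplicate removal
after_dedup = 943              # papers after duplicate removal
retracted = 33                 # papers found in the retraction database
excluded_title_abstract = 726  # eliminated in title/abstract screening
excluded_full_text = 104       # irrelevant after full-text inspection
full_text_unavailable = 5      # full text could not be obtained

# Derived values for each stage.
duplicates_removed = identified - after_dedup
after_retraction = after_dedup - retracted
after_screening = after_retraction - excluded_title_abstract
included = after_screening - excluded_full_text - full_text_unavailable

print(duplicates_removed, after_retraction, after_screening, included)
```

The derived values agree with the 910, 184, and 75 papers reported at the respective stages.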
The first paper identified was from the year 2008, and the field has become increasingly relevant and studied through the years, as seen in
Figure 2, with the highest increase in published studies happening between 2017 (two studies) and 2018 (eight studies). The field has stayed relevant after this, and between 6 and 12 studies were published each year.
The papers were also analyzed by publication venue to identify whether any journals, book series, or conferences emerged as the best fit for the inspected theme. The majority of papers (42 of 75, 56%) were published in venues that contributed only a single paper. Only some venues had more than one paper related to the inspected field, as seen in
Table 6, most notably the journals MDPI Sensors (eight papers), IEEE Access (five papers), and Springer Journal of Sports Sciences (four papers). This reflects the high multidisciplinarity of the domain, which spans multiple fields.
Numerous approaches were utilized in the case of intelligent endurance sports training, and a diverse set of methods was identified. The variety of approaches, ranging from traditional machine learning and statistical methods to deep learning techniques, demonstrates the complexity and richness of the field.
4.1. Taxonomy of Intelligent Methods
Motivated by a taxonomy presented in [
5], a novel taxonomy was developed to represent the intelligent approaches used in endurance sports training better. The developed taxonomy used and applied existing taxonomies to the approaches identified in endurance sports training. The main categories of computational intelligence, data mining, machine learning, deep learning, and other methods were adopted from [
33]. The computational intelligence methods were divided and grouped based on Engelbrecht’s taxonomy presented in [
34]. The machine learning subcategories were categorized according to [
35,
36]. The taxonomy of techniques is shown in
Figure 3 below. It should be noted that this is not an exhaustive taxonomy of all the intelligent methods; the categories are shown only for methods that were actually identified to be used in the studies.
A total of 63 different techniques were identified. The techniques identified are categorized and presented below by their primary category.
4.1.1. Computational Intelligence
Computational Intelligence is an extensive and diverse collection of nature-inspired computational techniques and methods used to model and solve complex problems in which the conventional approaches based on strict and well-defined techniques are either not viable or not efficient [
37]. Computational intelligence-based approaches were used 13 times, ranging from evolutionary algorithms to swarm intelligence algorithms and fuzzy systems, as shown below.
4.1.2. Data Mining
Data Mining is the process of identifying and extracting information from large datasets to identify previously unknown and potentially useful patterns and relationships [
58]. Data mining techniques were applied three times in the identified literature, as seen below.
Association Rule Mining (ARM) [
59] in [
40];
Numerical Association Rule Mining [
59] in [
40];
Data mining (general) [
60] in [
61].
4.1.3. Machine Learning
Machine Learning is concerned with developing algorithms that enable computers to learn from data and improve their performance on a given task over time without being programmed explicitly [
62]; methods from supervised learning (58), unsupervised learning (6), and reinforcement learning (1) were identified in our case.
Supervised Learning
- (a) Regression
Polynomial regression [
68] in [
69];
LASSO regression [
74] in [
71,
75];
Support vector regression [
76] in [
77];
Bivariate regression [
78] in [
79];
Multivariate regression [
80] in [
79,
81];
Regression trees [
82] in [
83];
Regression models [
84] in [
85,
86].
- (b) Classification
Relevance vector machines [
101] in [
102];
Boosted trees [
109] in [
72].
- (c) Ensemble Learning
Unsupervised Learning
Clustering
Dimensionality Reduction
Singular Value Decomposition (SVD) [
119] in [
67,
120];
Local Matrix Completion (LMC) [
121] in [
120].
Reinforcement Learning
Reinforcement learning [
122] in [
123].
4.1.4. Deep Learning
Deep learning is a subdomain of machine learning that utilizes multilayered artificial neural networks to simulate the complex decision-making of the human brain [
124,
125]. Although it is a subset of machine learning, it was included in the taxonomy as a separate unit because it is characterized by a unique architecture that enables the model to learn multiple levels of abstraction, capturing both simple and complex data features. The most frequently used specific methods were Long Short-Term Memory (LSTM) networks (9) and Convolutional Neural Networks (CNNs) (7).
Recurrent Neural Networks (RNNs) [
138] in [
88,
139];
Feed-Forward Artificial Neural Networks [
140] in [
47,
141];
Temporal Convolutional Networks (TCNs) [
142] in [
143];
Deep Recurrent Q-Learning Network (DRQN) [
144] in [
137];
Region-Based Convolutional Neural Network (R-CNN) [
145] in [
146];
Deep Convolutional Neural Networks [
147] in [
106,
148];
Back-Propagation Neural Network [
149] in [
150];
Adaptive Neural Network [
151] in [
152];
Bidirectional LSTMs [
153] in [
131];
Hybrid Long Short-Term Memory + Convolutional Neural Network in [
132];
Cascaded Pyramid Network [
158] in [
146].
4.1.5. Other Methods
This category lists methods that were deemed not to fit a specific field or that were the only representatives of their field.
Markov Decision Process [
159] in [
160];
You Only Look Once (YOLO—real-time object detection) [
161] in [
106];
Change-Point Segmentation Algorithm in [
91];
Large Language Model (Chat GPT v3) [
162] in [
163];
Custom Supervised Learning Method in [
164];
Custom Correlation-Based Algorithm in [
165];
Customized Predictive Algorithm in [
166];
Collaborative Filtering [
167] in [
95];
Case-Based Reasoning (CBR) [
168] in [
98];
Subgroup Discovery [
170] in [
85].
The most popular general category was machine learning (used 65 times), followed by deep learning (40), computational intelligence (13), and data mining (3), with other methods being used 11 times. Among individual approaches, the most popular, occurring six or more times, were Artificial Neural Networks (10), Long Short-Term Memory (9), support vector machines (8), k-nearest neighbors (8), Convolutional Neural Networks (7), and XGBoost (6).
4.2. Sports
A total of nine endurance sports in which intelligent approaches were used were identified, as shown in
Table 7. The most studied sports were running (35) and cycling (21). A single study could address multiple sports.
4.3. Study Focuses
Each study had a specific goal, which was fulfilled by incorporating an intelligent method in the process. We identified seven general topics that were the focus of the inspected studies:
Fatigue and injury management aims to predict when an athlete is at risk of injuries and adjust their training loads or techniques accordingly.
Pacing/effort strategies and (training) path optimization are related to choosing the optimum effort the athlete should expend.
Performance prediction and evaluation are related to evaluating the training performance and predicting the athlete’s future results.
Physiological metrics and biomechanics focus on quantifying the internal and external factors that impact performance.
Technique analysis and classification are mostly related to analyzing the athlete's movement.
Training planning and adaptation address the design and continuous adjustment of training plans.
Other includes studies whose goals do not fit into identified categories.
The diverse research topics identified in the literature demonstrate the broad applicability of intelligent data analysis in endurance sports training. Together, these research papers illustrate how intelligent data analysis enhances our understanding of endurance sports and offers practical tools for optimizing athletic performance and safety. The categorization of individual papers into previously presented categories is shown in
Table 8.
Computational intelligence techniques were used primarily in training planning and adaptation, pacing/effort strategies and path optimization, as well as performance prediction and evaluation, reflecting their strength in optimizing training strategies and modeling performance outcomes.
Data mining approaches were applied in performance prediction and evaluation, where they played a critical role in extracting insights from large datasets of previous training or competition results. Deep learning methods were used most often in technique analysis and classification.
Deep learning was also used frequently in performance prediction and evaluation and physiological metrics and biomechanics, showcasing the increasing reliance on neural networks for processing complex physiological and biomechanical data. The distribution of different techniques demonstrates the applicability of intelligent techniques in endurance sports training research and is shown in
Table 9.
Studies were also examined and categorized according to the number of participants for the applied approaches, as shown in
Table 10 below. The category "not applicable or 0" refers to studies that were not tested on individuals or in which the number of individuals was not described. Combining these results, 37 (~49%) studies were applied to 10 or fewer individuals or did not state the number of individuals clearly.
The use of smart devices, sensors, and wearables has become an essential component in modern endurance sports training. These devices enable precise data collection, real-time feedback, and advanced performance analysis, providing athletes and coaches with valuable insights that help optimize training, prevent injuries, and improve overall efficiency. The data gathered using these devices allow for individualized training programs, ensuring athletes can push their natural limits while minimizing injury risks.
The research identified six categories of devices used in endurance sports training, each capturing specific types of data.
Wearable Devices and On-Body Sensors: Smartwatches, wristbands, and rings track metrics like heart rate, sleep patterns, step count, and movement efficiency. These are essential for monitoring daily training loads, energy expenditure, and recovery states [
173].
Inertial and Motion Sensors: Devices such as accelerometers, gyroscopes, and IMUs (inertial measurement units) provide crucial information about movement patterns, stride mechanics, cadence, and stability, which are key factors in optimizing running, cycling, and skiing techniques [
174].
Physiological and Biometric Sensors: Heart rate monitors, pulse oximeters, lactate analyzers, and surface electromyography (sEMG) sensors measure internal physiological responses to training. These devices help assess cardiovascular efficiency, muscular activation, and metabolic thresholds, aiding in performance prediction and fatigue management [
175].
Performance and Exercise Equipment: Power meters, ergometers, cadence sensors, and cycling computers measure real-time output, ensuring that athletes train at the correct intensity levels [
176].
Location and Environmental Sensors: GPS devices track distance, speed, elevation, and route optimization. Environmental sensors, such as barometers and weather API integrations, provide real-time information on external conditions affecting performance [
177].
Imaging and Motion Capture Systems: Video cameras, motion capture systems, and force plates analyze biomechanics, posture, and gait efficiency, helping refine technique and minimize injury risk [
178].
By leveraging data from these devices, endurance athletes can fine-tune their training plans based on objective metrics rather than subjective perception alone. Real-time feedback allows for immediate adjustments in intensity, form, or pacing. Furthermore, by monitoring biometrics and external conditions continuously, athletes can prevent overtraining and reduce injury risks. With machine learning and artificial intelligence applications, these data can be analyzed further to detect patterns, optimize training loads, and even predict future performance outcomes [
179]. The categorization of research papers by the sensors and devices used is shown in
Table 11. The most popular category of devices was Physiological and Biometric Sensors, which were used in 33 papers, followed by Inertial and Motion Sensors (20), and Location and Environmental Sensors (17). The least used were Wearable Devices and On-Body Sensors, which were used nine times. These devices are general and include several sensors, which were described more thoroughly in other categories. This may lead to a lower number in the first category.
The device used the most times was the heart rate monitor, with 28 occurrences, followed by GPS, with 16 occurrences.
The use of the three most popular devices in each sport is presented in
Table 12. Race walking and Ironman are not listed because no sensors were used in the corresponding studies, whose approaches were based on competition results. As shown in the table, the heart rate monitor is among the top three devices in four out of seven sports, since it is one of the most accurate sensors for obtaining measurable data in endurance sports. The only sports where a heart rate monitor is not used are cross-country skiing, speed skating, and biathlon, which use more specific sensors. Additionally, less research has focused on these sports, leading to fewer cases of sensors being used.
5. Discussion
This section presents the findings of our systematic literature review on the role of intelligent data analysis in endurance sports training. The review examines the trends resulting from the analysis of 75 primary sources identified in our systematic literature review. This discussion aims to contextualize the results within intelligent data analysis in the endurance sports training domain and assess the strengths and limitations of current practices critically. The discussion is organized according to the research questions outlined in the methodology.
RQ 1: Which intelligent data analysis methods are used in endurance sports, what performance parameters and outcomes are they focused on, and in which disciplines are they implemented most frequently?
Our first research question aimed to map the current research space of intelligent data analysis in endurance sports training. To construct this overview, we investigated three core components: the specific analytical methods being used (RQ 1.1), the primary goals and applications of these methods (RQ 1.2), and the endurance disciplines receiving the most research attention (RQ 1.3). The following subsections discuss the findings for each of these components in detail.
RQ 1.1: Which intelligent data analysis approaches were used in endurance sports training?
A wide range of data analysis methods were used in endurance sports training, but machine learning (ML) methods were by far the most common, used in ~49% of papers. Deep learning (DL) approaches followed, accounting for ~41%. This trend reflects the fact that the main goals in sports analytics often focus on prediction and classification (e.g., [
65,
95,
113,
115,
120]). The most popular machine learning techniques included SVM, KNN, linear regression, random forests, and XGBoost. These methods are well-established in machine learning and have been in use for over 20 years, except for XGBoost, which is newer.
Deep learning is used increasingly to analyze complex, high-dimensional data. For example, Long Short-Term Memory (LSTM) networks are particularly suited for the time-series nature of training data. Studies have shown their effectiveness in estimating physiological states during running [
134] and cycling [
129]. Convolutional Neural Networks (CNNs) excel at predicting from visual and sensor data. They are being used for technique analysis, such as identifying cross-country skiing techniques [
127,
131].
In contrast, computational intelligence (CI) methods were used less frequently, appearing 13 times, but they played a crucial niche role in optimization. These methods were applied mainly to complex planning problems, where a single correct answer might not exist. This includes generating personalized training routes through the use of genetic algorithms [
33], or adapting plans with Differential Evolution [
39]. This indicates a precise task-specific application of methods in the existing literature.
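To illustrate how a computational intelligence method can search for a plan when no single correct answer exists, the following is a minimal sketch of Differential Evolution (the classic DE/rand/1/bin variant) in plain Python. It is not the method of the cited studies: the objective `plan_cost`, the target weekly load, the per-day bounds, and all parameter values are hypothetical choices made purely for illustration.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimize `objective` over box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, none of them the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < cr or j == j_rand:
                    value = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    value = min(max(value, lo), hi)  # clip to the bounds
                else:
                    value = pop[i][j]
                trial.append(value)
            trial_score = objective(trial)
            # Greedy selection: keep the trial only if it is not worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Hypothetical objective: hit a weekly load target while avoiding
# large day-to-day jumps in load (arbitrary units).
TARGET_WEEKLY_LOAD = 400.0


def plan_cost(plan):
    total_gap = (sum(plan) - TARGET_WEEKLY_LOAD) ** 2
    jumps = sum((plan[d + 1] - plan[d]) ** 2 for d in range(len(plan) - 1))
    return total_gap + 0.1 * jumps


bounds = [(0.0, 120.0)] * 7  # one hypothetical load value per day
best_plan, best_cost = differential_evolution(plan_cost, bounds)
print([round(x, 1) for x in best_plan], round(best_cost, 3))
```

In practice, the objective would encode sport-specific constraints such as recovery days or intensity distribution, which is where the flexibility of population-based optimizers becomes useful.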
Overall, 63 different techniques were used, which shows that many of the challenges researchers are trying to tackle require specific niche approaches.
RQ 1.2: What was the focus of studies utilizing intelligent data analysis approaches?
The application of intelligent data analysis in endurance sports covers diverse categories of training themes. The most prominent focus areas were performance prediction and evaluation, technique analysis and classification, and training planning and adaptation, underscoring a primary demand for data-driven tools to forecast athletic potential and prescribe optimized training regimes.
For performance prediction, researchers relied heavily on traditional machine learning models to analyze historical data and predict future outcomes [
65,
97,
115]. In contrast, while machine learning still dominated training planning and adaptation, computational intelligence methods were often involved, as their optimization capabilities are well suited to creating personalized and adaptive plans [
48,
51,
72].
The next tier of research interests included technique analysis and classification (14 studies), followed by fatigue and injury management (10 studies), and pacing and effort strategies (9 studies). Technique analysis, crucial for improving efficiency and preventing strain, was dominated overwhelmingly by deep learning methods. The use of models like CNNs and LSTMs is a natural fit for interpreting complex, high-dimensional data from IMUs and video feeds to classify movements or identify flaws in form [
127,
146,
156]. The focus on fatigue and injury management addresses the critical need to maintain athlete health and longevity. Here, studies typically employed ML classifiers to predict injury risk based on training load, biomechanical markers, and physiological responses [
66,
92,
111]. Similarly, developing optimal pacing and effort strategies, another optimization-oriented task, benefited from both ML and CI approaches to model race dynamics and suggest ideal energy expenditure [
33,
57].
Finally, a smaller but foundational group of studies focused on modeling physiological metrics and biomechanics (seven studies). This research aims to translate raw sensor data into meaningful biological insights, such as estimating metabolic thresholds or biomechanical forces [
134,
139]. Deep learning was particularly prevalent in this area because it captures intricate patterns in physiological time-series data.
Overall, the distribution of the studies' focuses reveals a methodologically complex landscape. There is a clear alignment between the nature of the problem and the choice of intelligent method: established ML models are the choice of many for predictive tasks, CI algorithms are deployed for complex optimization challenges, and DL excels at perception and pattern recognition from raw sensor data. This task-specific application of technology demonstrates a maturing field focused on solving tangible problems in endurance sports.
RQ 1.3: Which endurance sports disciplines are most frequently supported by the implementation of intelligent data analysis methods?
Among the endurance sports studied, running was the most frequently studied, featuring in 35 of the analyzed studies. This is because running is accessible and ubiquitous worldwide, data are simple to obtain using commercially available wearables, and the sport is of interest at both the recreational and elite levels. The running studies mostly used intelligent data analysis to monitor training loads, predict injuries, and optimize pacing strategies. For instance, Rothschild et al. [
75] and Martinez-Gramage et al. [
92] utilized wearable-based data and machine learning to predict performance outcomes and determine biomechanical inefficiencies.
Cycling ranked as the second most reported sport, being featured in 21 studies. Cyclists benefit from high-grade datasets generated by power meters, cadence sensors, and GPS-equipped cycling computers. These data sources allow for many applications, from workload monitoring to power output modeling and predictive performance forecasting.
Fister et al. [
42] present an evolutionary-algorithm approach that generates cycling training sessions on a topology graph using a TSS-inspired objective, while Sagi et al. [
73] develop a classification-based recommendation framework for race team selection, where CatBoost achieves the best performance.
Rowing, the focus of seven studies, is technically demanding, with intelligent methods often being employed to study movement skills and stroke techniques. Zhang et al. [
146] employed convolutional neural networks and video-based analysis to extract biomechanical parameters, and Wang et al. [
150] employed AI to construct customized training (nutrition) programs from rowing ergometer data.
Cross-country skiing (seven cases), demonstrated a strong prevalence of inertial measurement units (IMUs) to determine types of skiing and detect movement inefficiencies (e.g., Jang et al. [
127] and Seeberg et al. [
165] are studies that show the ability of sensor fusion and neural networks in sports with complex movement patterns).
Triathlon, a data-rich endurance sport by nature, was researched in only two studies. The lack of studies may reflect, either its lower relative popularity compared to individual disciplines or increasing complexity in combining data from multiple sport disciplines at the same time.
Other sports studied less often included biathlon (two studies), speed skating (two), and Ironman competitions (one). These sports may require specialized equipment or make it difficult to standardize data, possibly limiting their appearance in intelligent data analysis research. Nevertheless, the existing studies (e.g., Maier et al. [
72] for biathlon and Krumm et al. [
64] for speed skating) yielded interesting results by incorporating shooting accuracy indicators and biomechanical analysis.
Endurance sports like running and cycling benefit from wide commercial sensor support and large user bases, while specialized endurance sports trail behind, suggesting where future research could be directed.
RQ 2.: What are the most common IoT (Internet of Things) devices, wearables, and sensors used in endurance sports training in combination with intelligent data analysis approaches, and how do they contribute to the training process and to data collection and analysis?
The deployment of IoT devices and wearable sensors is at the heart of applying intelligent data analysis to endurance training. Of the 75 reviewed studies, heart rate monitors were the most frequently applied devices (23 studies), as they provide valuable information on internal load, training intensity, and cardiovascular adaptation. Their widespread use in consumer markets and clinical settings makes them well suited to data-driven performance monitoring and modeling.
GPS sensors were the next most common (14 studies), employed mainly in outdoor sports such as running, cycling, and triathlon. GPS tracking allows distance, altitude, and speed to be monitored and supports route planning; these data are used in machine learning models to identify pacing habits or overtraining, as demonstrated by Lovdal et al. [
111] and Berndsen et al. [
116].
Inertial measurement units (IMUs) were applied extensively in biomechanical examinations, especially in cross-country skiing, running, and rowing. IMUs, which combine accelerometers, gyroscopes, and magnetometers, provide abundant movement data for classifying techniques and detecting inefficiency. For example, Jang et al. [
127] applied CNN and LSTM models based on IMU data to classify skiing styles accurately.
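A common preprocessing step before feeding IMU recordings to sequence models of this kind is to cut the continuous signal into fixed-length, overlapping windows. The sketch below illustrates that step only; it is an assumption about typical practice rather than the pipeline of the cited study, and the function name and parameters are hypothetical.

```python
def sliding_windows(samples, size, step):
    """Yield fixed-length windows over a sequence of IMU samples.

    `samples` is any list-like sequence (e.g., one tuple of accelerometer
    and gyroscope readings per time step); trailing samples that do not
    fill a complete window are dropped.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]
```

Each window can then be labeled with a technique class and passed to a classifier; choosing a step smaller than the window size produces overlapping windows and therefore more training examples from the same recording.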
Smartwatches, wristbands, and rings, which are multi-sensor wearables that can collect a range of data, including heart rate, step count, sleep, and stress level, were used in four studies. Devices such as the Oura Ring and Garmin watches offer cloud integration that facilitates large-scale data integration for predictive modeling, as illustrated in the study by Rothschild et al. [
75].
Power meters and cycling computers, popular in cycling research, were used in eight studies. These devices allow external workload (e.g., power in watts), cadence, and torque to be measured directly during exercise. Sagi et al. [
73] and Fister et al. [
42] used power output measurements to determine performance trends and predict fatigue through ensemble machine learning methods.
Specialized hardware, such as force plates, video cameras, and motion capture equipment, was applied primarily in sports requiring accurate technical assessment. For instance, Krumm et al. [
64] used ground reaction forces in speed skating, while Zhang et al. [
146] applied R-CNNs to extract posture information from rowing videos.
Less often, studies utilized lactate analyzers, pulse oximeters, and surface EMG sensors to quantify internal physiological responses, muscle activation, and aerobic fitness. Such sensors are particularly useful in clinical or laboratory-based measurements for building more precise models of training stress and adaptation.
More broadly, the results indicate a trend toward multimodal sensor fusion, whereby multiple data streams—physiological, biomechanical, and environmental—are integrated into smart systems. This enables holistic training analysis, personalized feedback, and the creation of adaptive training systems. As the price of devices declines and interoperability rises, the role of IoT and wearables in endurance sports can be anticipated to grow, supported by smart data analysis architectures.
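A basic prerequisite for fusing such streams is aligning samples that arrive at different rates. The following minimal sketch attaches the most recent heart rate reading to each GPS timestamp; the function name, the example values, and the 5 s staleness limit are illustrative assumptions and do not come from any of the reviewed studies.

```python
from bisect import bisect_right


def align_latest(target_times, source_times, source_values, max_gap_s=5.0):
    """For each target timestamp, return the most recent source value.

    `source_times` must be sorted in ascending order. A value older than
    `max_gap_s` seconds (or a missing earlier value) yields None.
    """
    aligned = []
    for t in target_times:
        i = bisect_right(source_times, t) - 1  # last source sample at or before t
        if i >= 0 and t - source_times[i] <= max_gap_s:
            aligned.append(source_values[i])
        else:
            aligned.append(None)
    return aligned


# Hypothetical example: GPS fixes and heart rate samples on separate clocks.
gps_times = [0.0, 1.0, 2.0, 10.0]
hr_times = [0.5, 1.9, 3.0]
hr_bpm = [120, 125, 130]
fused = list(zip(gps_times, align_latest(gps_times, hr_times, hr_bpm)))
```

Returning None for stale or missing readings, rather than carrying an old value forward indefinitely, keeps sensor dropouts visible to any downstream model.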
Limitations to Validity
Although this systematic literature review was conducted carefully, several limitations must be acknowledged: time frame limitations, the quality and validity of the primary studies, and scope and selection bias.
Time frame limitations: The review covers literature up to October 2024. Because the field is developing rapidly, newer methods and studies are likely to have been published since our search concluded.
Quality and validity of primary studies: The validity of our synthesis is fundamentally dependent on the quality of the included studies. A major challenge we identified is the prevalence of research based on small participant samples (as shown in
Table 10, almost 50% of studies had 10 or fewer participants). This limits the generalizability of findings and highlights a need for larger, more diverse validation studies.
Scope and selection bias: Our review protocol focused exclusively on peer-reviewed, English-language articles from four major scientific databases, with limited coverage of additional sources identified from relevant SLRs. This decision introduces potential publication bias (as studies with null or negative results are less likely to be published) and language bias. As a result, relevant work published in languages other than English, as well as any findings in the gray literature, has not been identified.
6. Conclusions
This paper reviewed the role of intelligent data analysis methods in endurance sports training and how they can enhance it. Endurance sports cover many highly competitive activities where sustained performance over extended durations is vital. Given the complexity of factors involved in athletic performance, intelligent data analysis has emerged as a powerful tool for refining the training approaches.
The field has risen in popularity from 2018 onwards and has received constant attention ever since. Overall, 75 papers were identified.
The approaches have been used in all areas of endurance sports training. The most frequently addressed areas were performance prediction and evaluation (19), technique analysis and classification (14), and training planning and adaptation (13); these areas demonstrate a broad need among athletes and coaches for accurate forecasting of training outcomes and individualized, data-driven program design. Additional study areas included pacing strategies and training path optimization (nine), reflecting the growing interest in real-time, adaptive coaching solutions.
The most often studied sports were running (35) and cycling (21), which could be attributed to their widespread popularity.
Of the algorithms identified, neural networks (general ANN, LSTM, CNN), support vector machines, k-nearest neighbors, and XGBoost were applied the most frequently. Overall, machine learning was the most popular approach. Our review shows that device use is dominated by heart rate monitors and GPS devices, which provide crucial data on internal physiological load and external performance. While these are standard, a growing number of studies are leveraging more advanced sensors like IMUs for biomechanical analysis and multimodal wearables for more comprehensive athlete monitoring. Despite the number of studies reviewed, this review is subject to potential publication bias, as predominantly English-language, peer-reviewed sources were included. Additionally, many identified studies involved small participant samples, limiting the generalizability of the results.
As costs decline and devices become more user-friendly, these tools could benefit elite competitors and everyday sports enthusiasts, changing how endurance training is conducted. In conclusion, intelligent data analysis offers substantial means of enhancing endurance sports training and raising the bar of athletic achievement. Given the competitiveness of endurance sports, it is likely to continue to do so in the coming years.
Future research directions in this domain could include applying methods used in running and cycling (currently the most studied) to underrepresented, less popular endurance sports (e.g., biathlon, cross-country skiing, triathlon), developing artificial sports trainers (as proposed in [
135] for cycling) for all types of endurance sports, integrating multimodal sensor data, developing robust frameworks that integrate multiple data sources and adapt based on available data, and verifying approaches tested on a small number of individuals.