A Systematic Literature Review of Intelligent Data Analysis Methods for Smart Sport Training

The rapid transformation of our communities and our way of life due to modern technologies has impacted sports as well. Artificial intelligence, computational intelligence, data mining, the Internet of Things (IoT), and machine learning have had a profound effect on the way we do things. These technologies have brought changes to the way we watch, play, compete, and also train in sports. What was once simply training is now a combination of smart IoT sensors, cameras, algorithms, and systems just to achieve a new peak: the optimum one. This paper provides a systematic literature review of smart sport training, presenting 109 identified studies. Intelligent data analysis methods are presented, which are currently used in the field of Smart Sport Training (SST). Sport domains in which SST is already used are presented, and phases of training are identified, together with the maturity of SST methods. Finally, future directions of research are proposed in the emerging field of SST.


Introduction
The rapid development of Information Technologies (IT) has had an impact on almost all areas of our lives. Computers, smartphones, smart watches, and other mobile and pervasive technologies change the way we work and how we perceive the outer world. Furthermore, robots are replacing human workers in various industries, especially in the era of Industry 4.0 [1]. There is no doubt that our civilization has to adapt to the many changes that are the consequence of modern technology [2]. Sport training is no exception, and is an interesting area in which modern technology is revolutionizing the way athletes maximize their performance and compete at a higher level than ever before. By the same token, with the increase of participation trends in mass sporting events [3], as well as the involvement of people in sporting activities, there is a need for systems/applications that can guide, help, and support people in enjoying their activities [4]. For instance, many people all over the world cannot hire a professional sports trainer due to various barriers, e.g., financial ones. On the other hand, extensive research that links intelligent data analysis tools/methods with sport science is building new intelligent solutions that support all phases of sports training. Smart Sport Training (SST) is a type of sports training that uses wearables, sensors, and Internet of Things (IoT) devices, and/or intelligent data analysis methods and tools, to improve training performance and/or reduce workload while maintaining the same or better training performance. This means that SST implementations range from simple tasks, such as introducing wearable devices [5] in a sports training session or performing intelligent data analysis of a session, to much more complex artificial trainer implementations, where a coach is replaced by a smart agent which manages all the aspects of training, except for actually performing the proposed exercises for the trainee [6].
The workload reduction can apply either to the athlete or his trainer. For an athlete, an improved training plan means he can achieve better results with the same, or even a smaller, amount of training, and for his trainer, this means that IT can automate parts of his coaching routine.
Research on intelligent data analysis methods in the domain of sport training is now becoming very popular, and the literature on this subject is expanding quickly. In this paper, we compile the latest advancements in this domain. We review the intelligent data analysis methods that are applied in different sports, either individual or team sports. Moreover, we study how mature the studies in the field are when measured according to the technology readiness level [7].
The remainder of this paper is structured as follows. Section 2 outlines the fundamentals of sports training. Section 3 presents a description of the research methodology used to conduct a systematic literature review. Section 4 identifies and classifies the intelligent data methods used in the field. Section 5 presents all the reviewed studies, sorted by the sports in which they were conducted. Section 6 analyzes the findings, and provides answers to the proposed research questions, together with future challenges. The paper is wrapped up in Section 7, where conclusions are drawn.

Sport Training
Sports training is a continuous process between an athlete and their trainer. It is a pedagogically organized process where the role of the trainer is one of teacher and organizer, with respect to guiding the athlete's activities, and organizing their training sessions [8]. Training exercises are precisely defined tasks that demand physical effort, and should in some way improve the sports results of the trainee. Multiple training exercises are then organized into complete units called training sessions. The end goal of a training session is the perfection of the athlete's abilities, in other words reaching their natural potential. The continuous process of training can be broken down briefly into the following four phases [6]:
• Planning refers to the prescription of the proper exercise units. The cycle of sports training sessions is focused around the competition calendar. It is the phase in which the trainer prepares the exercise schedule for the athlete.
• Realization is the execution phase of the prepared exercises. The roles of the trainer in this phase are: preparing (potential) equipment, conducting a psychophysical evaluation of the athlete before the session, monitoring the intensity of the session, and improving tactics in team-based sports. The exercise data needed for further analysis are recorded in this step.
• Control is a comparison of the exercises actually performed by the athlete versus the planned exercises. This can be completed by the use of video analysis and contemporary computational technology. In individual sports, a biometric performance analysis can be performed, whereas notational analysis systems are used in team sports.
• Evaluation is the measurement of the athlete's performance. Two kinds of evaluation exist: (1) the evaluation of the single training load (short-term performance analysis) and (2) the evaluation of the total training cycle load (long-term performance analysis). The evaluation is the comparison of set goals versus achieved results, and of the amount of planned versus actually performed exercises.
The interconnection and continuous transition between the four mentioned phases is seen in Figure 1. Each cycle should provide the athlete with improved results. Since sports training is an activity of at least two parties, notably the trainer and the athlete, various computational-based approaches can be used to aid the decision-making of the trainer, or replace him altogether by introducing a virtual assistant. This allows the athlete to choose from a variety of possible training regimes without the need for employing an actual person to aid them in their training.

Research Methodology
We followed the Systematic Literature Review Guidelines in Software Engineering [9] to conduct this review. The goals of this review were to: (1) identify how modern smart applications and methods assist athletes and trainers in sports training and (2) determine how fast the theoretical knowledge is transitioning into practical real-world use cases. Based on the study goals, the following Research Questions (RQ) were formulated:

The literature search was conducted between 12 March 2020 and 18 March 2020. The standard search string used to identify literature was: ("sport") AND ("training" OR "tracker" OR "logger" OR "diary" OR "trainer") AND ("data mining" OR "computational intelligence" OR "artificial intelligence" OR "big data" OR "machine learning").
There were some differences in the search strings used between databases, due to the different query languages and limitations of the scientific paper databases. For example, the ScienceDirect database allows a maximum of eight Boolean operators per search field, so the search string was split between the abstract and full text conditions. The databases queried are shown in Table 1 with their corresponding number of results; the results are shown prior to the exclusion of duplicates and prior to their evaluation based on inclusion and exclusion criteria. The abstracts of all the studies were inspected to decide whether to include them in, or exclude them from, the review. All the results were inspected on all databases except for Google Scholar, where the results were sorted by relevance and the search was stopped once there were no more included (relevant) studies on two successive pages (20 results); this criterion was satisfied after 270 inspected results.
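The operator limit mentioned above can be made concrete with a short sketch. The snippet below assembles the standard search string from its three term groups and counts its Boolean operators. The function names and the particular split into two fields are assumptions made for illustration; the text does not document exactly how the authors split the query.

```python
# Term groups taken from the standard search string given in the text.
GROUPS = [
    ["sport"],
    ["training", "tracker", "logger", "diary", "trainer"],
    ["data mining", "computational intelligence", "artificial intelligence",
     "big data", "machine learning"],
]

MAX_OPERATORS = 8  # ScienceDirect limit per search field, as stated in the text


def build_query(groups):
    """Join terms with OR inside a group and join the groups with AND."""
    parts = ["(" + " OR ".join(f'"{term}"' for term in group) + ")"
             for group in groups]
    return " AND ".join(parts)


def count_operators(groups):
    """Count the Boolean operators in the query built from `groups`."""
    ors = sum(len(group) - 1 for group in groups)
    ands = len(groups) - 1
    return ors + ands


print(build_query(GROUPS))
print(count_operators(GROUPS))  # 10 operators, which exceeds the limit of 8

# Hypothetical split (an assumption): the first two groups go into one
# search field and the third group into another.
field_a, field_b = GROUPS[:2], GROUPS[2:]
print(count_operators(field_a), count_operators(field_b))  # 5 and 4
```

With this split, each field stays below the eight-operator limit, which is one plausible way to satisfy the restriction.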
All the database results were checked for duplicates, and after removal, 181 results remained. The duplicates between database pairs are shown in Table 2, where it can be seen that every database had at least one duplicate when compared with Scopus or Google Scholar. That is because Scopus and Google Scholar merely index documents found in other databases, and do not host them. The selection and exclusion criteria were specified, and limitations were examined, with the aim of determining as complete and current a state of the field as possible.

Selection criteria:
• The research addressed sport training, sport trainers, or sport trainees.
• The research was peer reviewed.
• The research addressed sport as an athletic activity requiring skill or physical prowess and often of a competitive nature [10].
• The research used at least one of the intelligent data analysis technologies (e.g., data mining, computational intelligence, big data, and machine learning).

Exclusion criteria:
• The research was not in the English language.
• The full text of the research was not available on the digital library or any of the subscription services.
• The research only addressed activity recognition from a leisure perspective (e.g., general health).

Limitations:
• The research was limited to the five scientific databases/search engines: ACM Digital Library, IEEEXplore, ScienceDirect, Scopus, and Google Scholar.
• The research had to be available prior to 12 March 2020, when the indexing of potential articles was conducted.
• Google Scholar results were searched until there were at least two consecutive pages of non-relevant results (20), so a total of 270 results were inspected.
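The Google Scholar stopping rule can be sketched as a simple loop. The following is a minimal illustration, assuming 10 results per page (two pages correspond to 20 results in the text); the function name and the synthetic input are hypothetical and do not reproduce the review's actual data.

```python
def inspect_until_stale(pages, stale_pages=2):
    """Inspect result pages in order and stop after `stale_pages`
    consecutive pages that contain no relevant result.

    `pages` is an iterable of lists of booleans (True = relevant result).
    Returns (number of results inspected, number of relevant results).
    """
    inspected = relevant = stale = 0
    for page in pages:
        inspected += len(page)
        hits = sum(page)
        relevant += hits
        # Reset the counter on any hit, otherwise extend the stale streak.
        stale = 0 if hits else stale + 1
        if stale >= stale_pages:
            break
    return inspected, relevant


# Synthetic example: three pages with one relevant result each,
# followed by four pages without any relevant result.
pages = [[True] + [False] * 9] * 3 + [[False] * 10] * 4
print(inspect_until_stale(pages))  # (50, 3)
```

Under the same assumption of 10 results per page, the 270 inspected results reported above correspond to 27 pages, the last two of which contained no relevant studies.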
The attributes identified for each study are shown in Table 3. When research proposed general solutions/models for sport training across multiple disciplines, but the model was tested and/or used only on a specific discipline or athletes from a specific discipline, that discipline was identified as the only sport of application in the table. Differentiation between team and individual sports was determined for each piece of research individually (e.g., tennis may in some cases refer to 1 vs. 1 matches, and in others to 2 vs. 2 matches). The training phases presented were referenced from [6], and their maturity was identified according to the abstraction [11] of the H2020 European Union Technology Readiness Level (TRL) [7]. The proposal [11] mapped the nine levels of TRL to the four ordinal values:
• Idea (TRL 0-3).
• Production (TRL 8-9).

Attribute values for the training phases (from Table 3):
Planning: 0-not addressed, 1-idea, 2-prototype, 3-validation, 4-production
Realization: same scale as Planning
Control: same scale as Planning
Evaluation: same scale as Planning

By inspecting the title, keywords, and abstract, 207 studies were initially selected for review, out of which 27 were duplicates. The remaining 180 studies were selected for full text inspection. We could not find the full text for 25 of the studies. We do not consider this a concern, since the unavailable studies were of dubious origin, and we did not find them cited in any of the papers (when inspected on Google Scholar). Of the 155 fully available studies, 109 were selected as relevant for our literature review and 46 were excluded. The whole review process is shown briefly in Figure 2.

The field of smart sports training has been rising in popularity over the last few years, as demonstrated in Figure 3. The first identified study was from the year 2006, and between 2006 and 2012, one to four studies were published in the field each year. Its popularity started increasing sharply from the year 2013 onward, with no fewer than four research studies in any of the following years. Of the reviewed studies, 23 and 22 were published in 2018 and 2019, respectively, which contrasts sharply with previous years. The data for 2020 are shown in a different shade in the figure, since the year is still in progress and we anticipate more research to be published. The results of the literature search analysis are provided in the next sections.
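The counts reported for the selection process can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Counts reported in the text for the study selection process.
initially_selected = 207        # selected by title, keywords, and abstract
duplicates = 27                 # duplicates among the initially selected
full_text_unavailable = 25      # studies whose full text could not be found
excluded_after_full_text = 46   # excluded after full text inspection

for_full_text_inspection = initially_selected - duplicates
fully_available = for_full_text_inspection - full_text_unavailable
included = fully_available - excluded_after_full_text

print(for_full_text_inspection, fully_available, included)  # 180 155 109
```

The three derived values match the 180, 155, and 109 studies reported for the successive stages of the review.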

Intelligent Data Methods Used in Studies
The objective of this section is to present the intelligent data analysis methods that were used by the researchers in the domain of smart sport training. Taxonomies of intelligent methods have recently been proposed for other highly domain-specific fields, such as intrusion detection [12], very large-scale integrated circuits and systems [13], program binaries [14], and diabetes management [15]. Following the same practice, we propose a novel taxonomy of intelligent methods in the domain of SST. The proposed taxonomy is based on the methods identified and currently used in the domain, and may be extended as the domain grows and matures in the future.
According to Figure 4, our taxonomy comprises five main groups, from which the algorithms used were identified. Some algorithms can be counted in more than one group, e.g., Artificial Neural Networks. In our case, we put Artificial Neural Networks in the machine learning group, since most of the studies that reported the use of Artificial Neural Networks also used other machine learning methods, e.g., Decision Trees, in the same study.
• Computational Intelligence methods [16]:
-Evolutionary Algorithms: Differential Evolution (DE) [17].
-Swarm Intelligence Algorithms: Bat Algorithm (BA) [18] and Particle Swarm Optimization (PSO) [19].
-Fuzzy systems [20].
-Simulated annealing [21].
Some research did not clearly define the algorithms used, only the field they came from (e.g., [46]), or used custom non-conventional algorithms (e.g., [47]). In such cases, the method used was identified as custom, together with the data analysis field it came from (e.g., a custom data mining algorithm for [47]). In cases where the data analysis method used was not clearly visible and an accurate determination was not possible, the slash sign ('/') was used in the

Review of Sports
The following sports were detected in the literature review: aikido, archery, badminton, basketball, climbing, counter-movement jumping, cricket, cycling, fencing, fitness (gym), (American/Australian) football, golf, hammer throwing, handball, hockey, jumping, karate, kickboxing, rowing, running, shooting, ski jumping, skiing, soccer, swimming, table tennis, Tai-chi, tennis, triathlon, volleyball, weight lifting, and yoga. The remaining research was unrelated to a specific discipline, and was concerned with sports training in general. This research was placed in the General category. Some sports were investigated much more than others, as is shown in Figure 5, which may be due to their popularity among athletes and the general public, or because they are simply easier to evaluate and were therefore chosen by researchers. The most popular sports for research were: soccer (12 papers), running (11), and weight lifting (9).

Figure 5. Identified research by the relative frequency of sport it was applied on, rounded to three significant digits.
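The rounding mentioned in the Figure 5 caption can be illustrated with a short sketch. The paper counts are taken from the text; using the 109 reviewed studies as the denominator is an assumption made here for illustration, and the figure itself may be based on a different total. The helper name `round_sig` is hypothetical.

```python
from math import floor, log10


def round_sig(x, digits=3):
    """Round `x` to the given number of significant digits."""
    if x == 0:
        return 0.0
    return round(x, digits - 1 - floor(log10(abs(x))))


TOTAL_STUDIES = 109  # assumption: all reviewed studies form the denominator
paper_counts = {"soccer": 12, "running": 11, "weight lifting": 9}

for sport, count in paper_counts.items():
    share = 100 * count / TOTAL_STUDIES
    print(sport, round_sig(share))
```

Under this assumption, the three most researched sports account for roughly 11.0%, 10.1%, and 8.26% of the studies, respectively.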
We have also divided the sports by their participation into three categories: (1) individual sports, which are sports where the participant normally competes against other individuals and not as a part of a team; (2) mixed sports, where the individual sometimes competes individually against other individuals, but may, in some competitions, be part of a duo (e.g., tennis) or a team (e.g., ski jumping); and (3) team sports, where the individual is always part of a larger team and competes against other teams. We have classified the identified sports in the following way:
• Individual-aikido, archery, climbing, jumping, fencing, fitness (gym training), golf, hammer throwing, karate, kickboxing, rowing, running, shooting, skiing, swimming, Tai-chi, tennis, triathlon, weight lifting, and yoga.
• Mixed-badminton, cycling, rowing, ski jumping, table tennis, and tennis.
The research related to the General category (unrelated to a particular sport) was not classified according to the individual, mixed, and team division. Most of the studies were related to individual sports (54.6%), as seen in Table 4. This may be because it is much easier to control all the experiment variables with individuals, and it is also much easier to obtain consent from individuals than from whole teams, where every individual has to consent. The identified research is separated and presented by sport (alphabetically sorted) in the following subsections. For all sports for which three or more research studies were identified, the research is presented in a separate table, together with the methods used and the maturity of their implementation. The sports for which fewer than three studies were identified are presented in Section 5.14. For clarity, the following abbreviations for the sport training phases presented in Section 2 are used in the table data: Planning (P), Realization (R), Control (C), and Evaluation (E).

Basketball
Table 5 presents research on smart sports training done in the domain of basketball. All the studies were at least partially concerned with the realization phase of sports training; in [48], this meant recognizing the actions performed during training by the use of a wearable device, and in [49], the creation of a Virtual Reality (VR) environment. Only one study [50] was concerned with the evaluation phase of sports training at more than an idea level; it evaluated the actions performed by players and the effect they had on the game score. The only research that was at least partially related to three of the training phases was [51], where a comprehensive web information system was presented for sports statistics and analysis. There was, however, no presentation of the system by any practical means; only the proposal and prototype were described.
Identification of commonly used technical actions in basketball games to provide reference for the training of players and coaches, based on Apriori algorithm model generated association rules.
The system's usage in practice was demonstrated. 0 0 0 2
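To illustrate how association rules of the kind mentioned above can be derived, the following is a minimal, self-contained sketch of Apriori-style frequent itemset mining with a rule confidence calculation. It is not the implementation used in the reviewed study; the action names and the data are made up for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while current:
        # Keep the size-k candidates that reach the minimum support.
        level = {}
        for cand in current:
            support = sum(cand <= t for t in transactions) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join step: build size k+1 candidates, then prune those with an
        # infrequent size-k subset (the Apriori property).
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in candidates
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))]
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Synthetic sets of technical actions per possession (illustrative only).
possessions = [
    {"pick_and_roll", "pass", "layup"},
    {"pick_and_roll", "pass", "three_pointer"},
    {"pass", "layup"},
    {"pick_and_roll", "pass", "layup"},
]
freq = apriori(possessions, min_support=0.5)
print(confidence(freq, {"pick_and_roll"}, {"layup"}))  # 2/3 on this toy data
```

On this toy data, the rule "pick_and_roll -> layup" has a support of 0.5 and a confidence of about 0.67; a real analysis would need far more observations and a justified choice of thresholds.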

Cycling
No studies in the domain of cycling addressed all four stages of sports training, as seen in Table 6. All of the studies addressed the planning stage of sports training, which is not surprising, since data can be captured easily and data sets already exist [53]. One study analyzed cycling from the coach/manager perspective [54], identifying athletes with high potential in cycling. From this perspective, the objective of obtaining the best results does not necessarily mean the best results for the athlete, but rather the best results for the organization the athlete is representing. One of the studies [55] addressed nutrition planning for optimizing training performance.

Fitness (Gym Training)
Fitness (gym training) is a great environment for conducting research on SST, since the environment can be controlled and a lot of training can be done on devices, which allows for much easier control of variables. The athletes (e.g., [58][59][60]) can easily be equipped with different wearable devices or sensors to measure their exercise data accurately. None of the research was concerned with the control and evaluation phases of SST, as seen in Table 7. A lot of research addresses classifying the movement being performed and counting the repetitions of the exercises (e.g., [58,59]).
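As a rough illustration of repetition counting from a wearable sensor signal, the sketch below counts upward threshold crossings in a one-dimensional signal. This is a deliberately simple stand-in, not the method used in the cited studies; the threshold, the sampling rate, and the synthetic signal are all assumptions.

```python
import math


def count_repetitions(signal, threshold):
    """Count upward crossings of `threshold` in a 1-D signal.

    Each transition from below the threshold to at or above it is
    counted as one repetition.
    """
    reps = 0
    above = False
    for value in signal:
        if value >= threshold and not above:
            reps += 1
            above = True
        elif value < threshold:
            above = False
    return reps


# Synthetic, noise-free "acceleration": a 0.5 Hz sine sampled at 50 Hz
# for 10 s, i.e., five full movement cycles.
signal = [math.sin(2 * math.pi * 0.5 * (i / 50)) for i in range(500)]
print(count_repetitions(signal, threshold=0.5))  # 5
```

Real sensor data are noisy, so a practical counter would typically add filtering and hysteresis (separate upper and lower thresholds) to avoid counting jitter around the threshold as extra repetitions.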

Rowing
None of the research in rowing was concerned with the planning phase of SST (Table 8). The research ranged from VR-supported rowing simulators designed for use in combination with a rowing training machine [62] to the longitudinal analysis of training data in rowing [47].

Running
Running was the second most popular sport for SST research, and a total of 11 studies were identified and analyzed (Table 9). All four stages of SST were addressed in the domain of running. There were also some methods presented for post-analysis of runners' performance (e.g., [64,65]) to identify where running speeds were inadequate. Since running outside a stadium means that the athlete moves quickly through the environment and does not stay in the same area, trainers are limited in providing direct feedback during training. This problem has been addressed by the use of wearable sensors and feedback devices, which provide feedback instantly to the athlete, as is the case with fatigue detection systems [66], target heart rate planning systems [67], and virtual coach systems [68].

Table 9. Identified studies in the domain of running.

Ref.: [69]
SST Methods: Simulated annealing
Research Focus: Planning the optimum running speed of an athlete, by estimating the physical effort needed at each part of the competitions and training.
Results: An example of the application of a data-driven approach to the development of an adaptive decision support system for sports training, based on a case of estimating the optimum running speed of an athlete.
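To give an idea of how simulated annealing can be applied to speed planning, the following toy sketch chooses a speed for each course segment so as to minimize total time under an effort budget. The effort model (segment length times a difficulty factor times speed squared), the penalty term, and all parameter values are assumptions invented for this illustration; they are not the model used in [69].

```python
import math
import random


def cost(speeds, lengths, difficulty, effort_budget, penalty=10.0):
    """Total time plus a penalty for exceeding the effort budget (toy model)."""
    time = sum(l / v for l, v in zip(lengths, speeds))
    effort = sum(l * d * v ** 2
                 for l, d, v in zip(lengths, difficulty, speeds))
    return time + penalty * max(0.0, effort - effort_budget)


def anneal(lengths, difficulty, effort_budget, v_min=2.0, v_max=6.0,
           steps=5000, t_start=10.0, cooling=0.999, seed=0):
    """Simulated annealing over per-segment speeds (in m/s)."""
    rng = random.Random(seed)
    current = [v_min] * len(lengths)
    current_cost = cost(current, lengths, difficulty, effort_budget)
    best, best_cost = current[:], current_cost
    temp = t_start
    for _ in range(steps):
        # Neighbor: perturb the speed of one random segment within bounds.
        cand = current[:]
        i = rng.randrange(len(cand))
        cand[i] = min(v_max, max(v_min, cand[i] + rng.uniform(-0.2, 0.2)))
        cand_cost = cost(cand, lengths, difficulty, effort_budget)
        delta = cand_cost - current_cost
        # Metropolis criterion: always accept improvements, and accept
        # worse candidates with a probability that shrinks as temp drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = cand, cand_cost
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost


# Hypothetical course: segment lengths in meters and difficulty factors.
LENGTHS = [1000.0, 500.0, 1500.0]
DIFFICULTY = [1.0, 1.5, 0.8]
BUDGET = 40000.0

start_cost = cost([2.0] * 3, LENGTHS, DIFFICULTY, BUDGET)
best, best_cost = anneal(LENGTHS, DIFFICULTY, BUDGET)
print(start_cost, best_cost, best)
```

The fixed seed makes the run repeatable. In a real application, the effort model would have to be fitted to the athlete's data, which is the data-driven part emphasized in the reviewed study.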

Shooting
All three studies in the field of shooting were related to the realization training stage, as seen in Table 10. The main method of improving shooting training is to replace live ammunition with IT-supported training devices, so that actual ammunition does not need to be used. This, in turn, improves the safety of the training grounds, reduces training costs, since real ammunition is replaced with simulated shots, and enables advanced analysis of training data.

Soccer
Soccer is arguably one of the most popular sports in the world, with up to 43% of the world's population watching or playing it to some extent [77]. Soccer was also the most researched single sport in the SST research, and was the focus of 12 different papers (Table 11). All of the training stages were researched at least once in the field of soccer training research, with one research study [78] related to all the phases of training. Some of the main researched topics include: injury prediction, prevention, and recovery [79][80][81], match analysis [82][83][84], and analysis of performed training [85][86][87].

The system was developed, and is going to be tested at two soccer clubs. GPS data was used to uncover the training workload of players in a professional soccer club during a season.
The proposed Ordinal predictor was accurate and precise for medium RPE (rating of perceived exertion) values (i.e., between 4 and 7), but was not consistent for the extreme values (i.e., below 4 and above 7). The soccer training cycles detected were composed of two kinds of training: high and low intensity training loads performed in the days long before, and close to, the match.

Swimming
Swimming research was concerned mostly with the realization phase of SST, as seen in Table 12. One of the topics [90] was related to the recruitment of potential swimmers. Wearable devices for evaluating actual training were used in [91,92].

Kohonen's networks showed that, through the use of independent variables, they could group subjects accurately into categories which, after a year, achieved very good, average, and very weak performances.

Table Tennis
None of the table tennis SST research addressed the planning and evaluation phases of sports training, as shown in Table 13. Virtual reality training with the use of VR goggles for amateur table tennis players was presented and proposed in [94]. Since the primary equipment the player uses is a table tennis racket, most of the important movement is concentrated in the upper limbs. Two studies [95,96] related to table tennis measured training data with inertial measurement unit sensors fixed on the players' hands.

Tennis
The research in the domain of tennis addressed all four phases of SST, as seen in Table 14. The authors of [97] presented an interesting way of analyzing a tennis game; it was one of only two studies discovered by our review to use sound recordings as a way of analyzing a game and its events. Similarly to the discipline of table tennis, the use of inertial measurement unit sensors was also pervasive in this field (e.g., [98][99][100]).

Rules about an athlete were discovered, and using these rules in the future was proposed.

Triathlon
The triathlon is a multidisciplinary sport which consists of cycling, running, and swimming stages, so it should be noted that the research mentioned for those sports is at least partially relevant to the triathlon discipline as well. None of the research addressed the realization stage of SST, as displayed in Table 15. Two studies [103,104] focused on predicting the time in which the athlete will complete each part of the triathlon race. The triathlon discipline [103,105] was the only field where the particle swarm optimization method was used.

The k-NN classifier was found to be the best predictor, but the data are not generalizable, and need to be studied further.
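For readers unfamiliar with the k-NN classifier mentioned above, the following is a minimal sketch of the technique using majority voting among the nearest neighbors. The features (weekly training hours and 10 km pace in min/km), the labels, and the data are hypothetical and are not taken from the reviewed studies.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest neighbors.

    `train` is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical athletes: (weekly training hours, 10 km pace in min/km).
train = [
    ((12, 4.5), "fast"), ((11, 4.7), "fast"), ((10, 4.8), "fast"),
    ((4, 6.0), "slow"), ((5, 5.8), "slow"), ((3, 6.2), "slow"),
]
print(knn_predict(train, (9, 5.0)))  # fast
print(knn_predict(train, (5, 5.9)))  # slow
```

The features are left unscaled here for brevity; in practice, features with different ranges are usually normalized so that no single feature dominates the distance.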

Volleyball
The volleyball-related SST research contained two studies [107,108] which addressed training in full, covering all four stages of training, as seen in Table 16. Some of the research [107,109] was more focused on team training, i.e., on what to improve in the team to improve results, and some on improving individuals [109].

The system use in practice was presented. The system detection process of player jumps (which has a 93% rate for true positives and 100% accuracy for true negatives) was described in depth.
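The two detection figures quoted above correspond to the true positive rate (sensitivity) and the true negative rate (specificity) of a binary detector. The sketch below shows how such rates are computed from confusion matrix counts; the counts are hypothetical and were chosen only so that the rates equal 93% and 100%.

```python
def detection_rates(tp, fn, tn, fp):
    """Return (true positive rate, true negative rate) from confusion counts."""
    tpr = tp / (tp + fn)  # share of actual jumps that were detected
    tnr = tn / (tn + fp)  # share of non-jumps correctly left undetected
    return tpr, tnr


# Hypothetical counts (not taken from the reviewed study).
print(detection_rates(tp=93, fn=7, tn=50, fp=0))  # (0.93, 1.0)
```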

Weight Lifting
Weight lifting is, similarly to fitness (gym training), a fairly static discipline where movement is limited. It was the third most researched SST discipline, with a total of nine research studies, as seen in Table 17. The research was focused largely on the use of inertial measurement unit sensors [110][111][112]. Interestingly, one of the studies [113] put sensors on the equipment instead of the trainee.

Other Sports
The other sports category includes all the sports that had two or fewer studies addressing their training (N < 3). As such, the sports of aikido, archery, badminton, climbing, counter-movement jumping, cricket, fencing, (American/Australian) football, golf, hammer throwing, handball, hockey, karate, kickboxing, ski jumping, skiing, Tai-chi, and yoga are presented here and shown in Table 18. The research ranged from injury prediction and identification [119,120], to pose recognition and evaluation [121][122][123], virtual coaching and coaching assistants [124][125][126], and VR systems [127,128].

Table 18. Identified studies of other sports.

Ref.: [129]
Sport: Aikido
SST Methods: CBR
Research Focus: Presentation of an AI-Virtual Trainer educative system on a case of Aikido lessons. The system proposes varied lessons to trainers, via the utilization of case-based reasoning.
Results: The system can propose training tasks based on requested training objectives, without repeating the same exercises.

Results: Archers were split between low and high potential and a 97.5% classification accuracy rate was found.
P R C E: 2 0 0 2

Ref.: [131]
Sport: Badminton
SST Methods: LR
Research Focus: To design a mobile application, which will serve as a virtual trainer and provide the athlete with dietary, exercise and health related advice, based on his profile.
Results: A developed solution for managing stress and health, generating exercise and training schedules, suggesting meals by using the example of badminton.
P R C E: 3 1 3 2

Ref.: [132]
Sport: Climbing
SST Methods: /
Research Focus: A system for automatic route recognition on a climbing wall is proposed using an Inertial Measurement Unit sensor.
Results: The system was developed and tested. It was very accurate for ascent-only climbs, but only limited in use when the ascent was combined with a descent.

Research Focus: Developing an Artificial Intelligence based cricket coach, that amateur cricketers can use to practice and gain expertise in cricket, particularly in batting, bowling and fielding.
Results: The system can suggest the best strokes as well as bat movements and can train fielders with regard to every aspect of training.

Research Focus: Hamstring injury prediction models based on training loads, estimated by GPS, accelerometers and perceived exertion ratings of an Australian football club.
Results: Logistic Regressions were found to be the best performing model for predicting injuries. Poor accuracy was found when data from another football club was applied.

General
The General category covers all non-domain-specific research, i.e., research that was not focused explicitly on a particular discipline; the identified papers are shown in Table 19. The use of smart textiles [143] was addressed as part of training data collection. There were also proposals [144,145] of more generalized methods/frameworks regarding SST. Fatigue prediction [146] was done with real-time voice analysis software, and is the second of the reviewed studies to feature sound analysis. Another unique aspect addressed was the use of machine learning to shorten the standard estimation of cardiopulmonary function by predicting the end result of cardiopulmonary tests [147]. This is an interesting optimization of training, since such tests usually require the maximum exertion of the person under examination, which leaves the athlete tired after the examination and limits his options for further training on the same day.

Discussion
The pervasiveness of smart applications has influenced all aspects of sports training. Studies show that the SST is and can be a determining force in transforming all of the four training phases (1.RQ). The most researched training phase was realization, as shown in Table 20. The realization phase is arguably one of the least complex stages where the SST approach can be utilized, because recording the actual data during training can be done mostly by the use of wearable devices. Some multi-purpose wearable devices can be used in multiple domains and can provide sensors for sleep tracking, heart rate monitoring, step tracking, accelerometer, gyroscope, GPS, etc. as presented in [5]. Evaluation phase research was mostly related to longitudinal studies where the training data was compared with the competition performances (e.g., [88,107]). The research related to the training control phase was tightly interconnected with the realization training phase research, especially when measures were introduced to record training data (e.g., [72,91,112,117]).
The least research was related to the planning phase of the training, where real human coaches still prevail. The planning phase research was related to training (exercise) planning (e.g., [55,56,57,61,106,126]), nutrition planning (e.g., [55,151]), and full framework approaches (e.g., [78,131,145]), which were related to all training phases.
The sports most supported by SST research (2.RQ) were presented previously in Figure 5. The most utilized data analysis methods (3.RQ) were support vector machines (19), artificial neural networks (14), k-nearest neighbors (11), and random forest (11). Based on the taxonomy identified in Section 4, the most widely used methods were in the category of data mining, as seen in Figure 6.
Since SST research has experienced such a burst of growth in recent years, a number of papers presented numerous new approaches to sports training. We did not classify any of the reviewed approaches, in any of the training applications, as TRL validation-level research (4.RQ). This may seem surprising, since many applications are widely used by the general public, both on Android devices (e.g., MyFitnessPal [157], Endomondo Running and Walking [158], and Google Fit [159]) and on Apple ecosystem devices (e.g., Apple Health [160]), all of them having over 10 million users. The problem is that the providers of such applications do not disclose the algorithms and data analysis behind their systems, in order to maintain their competitive advantage. The highest TRL achieved was control; two planning, four realization, three control, and three evaluation research studies reached that level, as seen in Table 21. Since the majority of the research is still in the prototype phase, additional studies are needed in all SST stages and sports to progress towards the control and validation of the proposed ideas. The maturity of SST is very low for phases in sports where no research was found:
• Planning: no research in the domains of basketball, hammer throwing, rowing,

Figure 6. Relative frequency of intelligent methods used by the proposed taxonomy, rounded to three significant digits (chart label: Deep Learning, 7.38%).
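To put the control-level counts in proportion to the 109 reviewed studies, the arithmetic can be sketched in Python as follows. The variable names are illustrative. Because a single study may address more than one training phase, the sum of the per-phase counts is treated here as an upper bound on the number of distinct studies, which is an assumption and not a figure reported in the review.

```python
# Per-phase counts of studies that reached the control level of TRL,
# as quoted in the text (Table 21).
control_level_entries = {
    "planning": 2,
    "realization": 4,
    "control": 3,
    "evaluation": 3,
}

# Total number of studies identified by the review.
total_reviewed_studies = 109

# Assumption: every entry is a distinct study, so this is an upper bound.
entries = sum(control_level_entries.values())
upper_bound_share = 100 * entries / total_reviewed_studies

print(f"{entries} phase-level entries, at most "
      f"{upper_bound_share:.1f}% of the reviewed studies")
```

Under this assumption, only about one in nine reviewed studies progressed beyond the prototype stage, which is consistent with the statement that most of the research is still in the prototype phase.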
The datasets used in the studies were mostly private data collections, which hinders the verification and validation of published results. Only six studies used and referenced publicly available datasets, as presented in Table 22. The vast majority of research used private datasets based on real data. In Table 22, the Dataset column gives the reference to the dataset used, the Data type column identifies the data as real or synthetic, based on their origin, and the Studies column references the studies using the mentioned dataset. All the studies where private datasets were used, or for which we could not identify the dataset used, were classified under the Private dataset rows. This, however, does not mean that free, open, and public datasets from the domain of SST do not exist. There are already a significant number of them (e.g., [53,166–171]), together with websites that publish sanitized datasets (e.g., [172]); it remains a fact, however, that they are not used enough. If researchers choose not to use existing datasets, they should publish as much of their private datasets as possible. Also worth mentioning, although not used in any of the identified studies, are data generation methods [173], which could in the future allow easier verification and generation of data. According to our literature review, this research area is still very young, and has experienced increasing and persistent interest among researchers.
Therefore, there are still plenty of challenges for future research. The summarized challenges are:
• Knowledge transfer into the real world and validation-level research: Many research papers propose planning the training sessions for athletes in various sports. However, most of these papers conclude with a table showing the results generated by the selected method. We do not know how athletes respond to these plans, or what the long-term consequences or influence on race results are. Therefore, we encourage researchers to also share their insights into how these results perform in the real world. Most of the research that was presented reached at most the control level of TRL and, as such, stopped short of the validation level. If the field is to gain widespread validity, such research is needed, and researchers should try to get in contact with professional athletes more often and plan their experiments to reach a wider audience in the field researched.
• Cooperation with trainers and athletes: Every athlete is unique, with different physical and mental characteristics. How to integrate this component into automatic intelligent solutions remains a very topical problem. According to our systematic literature review, there are almost no papers in which researchers consulted athletes and their trainers in the design phase of their experiments.
• Obtaining test datasets and their dissemination: The experiments are based on data that can be real or synthetic (i.e., generated artificially). Although a lot of data are publicly available (for example, in cycling [53] or soccer [167]), most of the data are still inaccessible, mostly in the domain of individual sports. For that reason, researchers should be encouraged to deposit their data in public repositories and enable other researchers to access them.

Conclusions
In this paper, we reviewed the latest advances in the development and use of intelligent data analysis methods in the domain of sport training. The purpose of this systematic literature review was twofold. Firstly, we wanted to identify the main intelligent data analysis methods that can be used in different training phases, and, secondly, we wanted to determine which sports are the most supported by these methods.
The study revealed that researchers apply various methods, including computational intelligence, conventional data mining methods, deep learning, machine learning, and some other methods. Computational intelligence algorithms have been rising in popularity in recent years, while the most used intelligent data analysis methods remain support vector machines (19 studies), artificial neural networks (14 studies), k-nearest neighbors (11 studies), and random forest (11 studies). In this review of 109 studies, we identified soccer (12 studies), running (11 studies), and weight lifting (10 studies) as the most researched sports. When comparing sports based on participation levels, we found that over half of the research in this field (54.6%) can be classified as research based on sports for individuals, and that team and mixed sports represent roughly a third of the existing research. The research domain still has a lot of room for improvement: more validation-level research is needed, as well as more publicly available datasets for replicating research and improving methods. Since the field is relatively new, many sports exist with no research in the domain of smart sports training, which leaves wide scope for future research.

Conflicts of Interest:
The authors declare no conflict of interest.