Towards an AI-Based Tailored Training Planification for Road Cyclists: A Case Study

In a world where data plays a central role, we provide a novel technique to design training plans for road cyclists. This study presents an in-depth review of a virtual coach based on state-of-the-art artificial intelligence techniques to schedule road cycling training sessions. Using the training data of a dozen road cyclists, we were able to create and verify an e-coach dedicated to road cyclists of any level. The system can provide near-human coaching advice on the training of cycling athletes based on their past capabilities. In this case study, we extend the tests of our empirical research project and analyze the results provided by experts. Results of the conducted experiments show that the computational intelligence of our system can compete with human coaches at training planning. We evaluate the system we previously developed and provide new insights and paths for improvement for artificial intelligence-based systems for athletes. We observe that our system performs as well as or better than the control training plans in the 14 and 24 week training periods, where it was evaluated as better in 4 of our 5 test components. We also report a higher statistical difference in the results of the experts' evaluations between the control and virtual coach training plans (24 weeks; training load: χ² = 4.751; resting time quantity: χ² = 3.040; resting time distribution: χ² = 2.550; efficiency: χ² = 2.142).


Introduction
Performance is tracked and optimized everywhere, at any time. The sports world is an area where one can observe this ever-growing need to perform better, as evidenced by the growing market of mobile performance applications (see Deloitte's 2017 report). In the business field, companies try to maximize their benefits by increasing their performance at multiple levels [1]. The common aspect between these two fields is that one can track performance and get an idea of it through quantitative information. Data is nowadays ubiquitous, and devices are always collecting it. This makes it easy to access micro-level quantitative and visual feedback on our performance but does not necessarily explain its meaning from a macro perspective. Indeed, amateurs as well as professionals in the sports field rely on wearable devices as a motivational object, but more importantly to perform better [2].
The growing market of wearable devices enables the tracking of new body-related information in a more precise and more detailed way [3]. These improvements provide deeper quantitative feedback but might also overwhelm athletes with numerical and complex data, reducing their ability to understand and use it properly [4]. This large amount of data can lead athletes to confusion or misunderstanding and could even make them lose motivation to use such tools [5].
Making sense of one's data may become a complicated task. The athlete's level also determines how one interacts with and uses the collected data. One can also fall into overtraining syndrome, which is experienced when a person trains disproportionately and ends up training in a way that is inappropriate for their capabilities and for what their body can manage [6]. Additionally, this syndrome can occur when the equilibrium between the training load and the resting time is not respected, and it can lead to critical situations related to both physical and mental health [7]. Amateur athletes are also more prone to developing this syndrome, as they do not benefit from a coach's feedback on the way they train [6].
Personalized training feedback may be seen as a way to mitigate the issues induced by performance tracking in sports. Professional athletes rely on human coaches with expertise in managing training, efforts, and resting time. We suppose that amateur and semi-professional athletes are not willing to pay the price of a human expert to manage their training, as this would exceed their needs. Athletes at this level use training software that lacks a proper personalization component and may experience a decrease in their motivation after a certain time [8].
In our study, we present a novel approach to training management. We leverage recent artificial intelligence-based work in order to create a virtual coach with training personalization capabilities. The technique we use can adapt to athletes of any level, as it bases its feedback and instructions on the capabilities of its users. Using state-of-the-art technologies and specific measurements, we ensure a proper and convenient training plan designed and managed around the athletes. Thus, we provide amateurs and semi-professionals with a convenient training coaching system, enabling them to get expert-like instructions and feedback. Additionally, this solution can provide human coaches with a different point of view on their athletes' training and introduce more diversity in training plans.

Related Work
Through our research work, we reviewed a collection of research papers that demonstrate the trends in computational intelligence applied to sports. We particularly looked at the presence of machine learning and algorithms applied to cycling. The search also yielded results on additional sports, which we included in this review. We coded the results of our state-of-the-art review according to the categories of Lv and Ye [9]. We augmented the category set by adding new groups of systems, as reported in Table 1. As a general statement, we report a higher involvement of researchers in systems dedicated to endurance sports, like running [10][11][12][13] and cycling [13][14][15][16][17][18][19][20]. One of the main reasons for this is that these physical activities are quite easy to practice and affordable. Thanks to this ease of access and higher popularity, they both benefit from a wider range of sensors that can track the athletes [21]. These sports may also require less data manipulation in order to work with machine learning. Indeed, rankings in running and cycling competitions are based on incremental scores [14], and one can observe changes in the athletes' performance through the collected time-series data coming from many different sensors [21].

Usable Features for Sports
A primary data source for performance measurement is sensors. They can be integrated in specific devices or be the ones embedded in smartphones. Raw data from these sensors may be used, but they are usually processed in order to remove noise. Sensors provide data at a high frequency, which therefore may not be usable directly. For example, information such as the 3-axis acceleration enables the understanding of a movement relative to its past state but can fluctuate highly across short periods of time, since the sensors are particularly sensitive. Most of the time, we observed a higher usage of feature combinations, where features are merged in order to create a new one, a process commonly known as feature extraction. Such measurements are time-related and are highly impacted by past values. In cycling, for instance, one can use the Training Stress Score (TSS) to explain the effort required by a training session [22]. Using sensors allows measuring signals without intruding on the user's activity and, therefore, without impacting their performance. In contrast, measurements based on self-reporting, such as the Rating of Perceived Exertion (RPE), require that the athlete provide the required information explicitly. For the RPE, the user should have a certain knowledge of their current capabilities, as they have to report the perceived exertion on a 20-point scale. Despite being more accurate than sensor-based measurements, the RPE is hard to use [19].
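To illustrate how such a combined feature is built, the sketch below computes TSS from a session's duration, its normalized power and the rider's functional threshold power (FTP), following the commonly published definition in which one hour ridden exactly at FTP scores 100. It is not necessarily the exact computation used in [22], and the function and parameter names are our own assumptions.

```python
def intensity_factor(normalized_power_w: float, ftp_w: float) -> float:
    """Ratio of the session's normalized power to the rider's FTP."""
    if ftp_w <= 0:
        raise ValueError("FTP must be positive")
    return normalized_power_w / ftp_w


def training_stress_score(duration_s: float,
                          normalized_power_w: float,
                          ftp_w: float) -> float:
    """Commonly published TSS definition (assumed here, not taken from the paper).

    TSS = (duration_s * NP * IF) / (FTP * 3600) * 100
    """
    i_f = intensity_factor(normalized_power_w, ftp_w)
    return (duration_s * normalized_power_w * i_f) / (ftp_w * 3600.0) * 100.0


# Hypothetical rider with an FTP of 250 W riding two hours at 175 W.
example_tss = training_stress_score(2 * 3600, 175.0, 250.0)
```

Because the score depends on the individual's FTP, the same session yields a different TSS for different riders, which is what makes it usable as a personalized load feature.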
Systems for other sports do not necessarily depend on wearable sensors, unlike the ones used for running or cycling. Researchers also calculated statistics based on players' or teams' actions and scores, since in sports like football, videos of high-ranking teams' matches are quite difficult for external users to access. Thus, statistical data from the matches are used in order to predict future scores [23]. Similarly, in the case of fitness, the data gathered on the athlete cannot accurately describe their effort; rather, one needs to use additional devices to track the movements and state on the machine [24,25]. Features explaining the performance of a team or an individual may not be directly related to sensor data. In fact, one can understand the performance of a football team by checking their scores and enrich the statistics with additional information on whether it was a home win or an away win [26]. External information feeds can also help to understand one's performance, since people share a lot of information through social media platforms. Thus, mining the information shared on social media and using Natural Language Processing (NLP) can provide a rich source of feedback data about an athlete's performance [11].
Sports where the body position has a high impact on performance are also being helped by computer vision capabilities as well as recent research in deep learning. Indeed, it is possible to extract a great number of features and information from a camera feed. The biomechanical data can be processed in order to enhance the movements or correct the postures of the athletes and thus help them perform better. Sports such as golf, tennis or javelin benefit from these techniques to track the athlete's position and process it through image processing [27][28][29]. One's body shape may partially define their ability to perform at a certain level, as well as the extent to which one can perform a gesture. We observed the usage of data coming from specific sensors or measurements made to explain the current state of an athlete's organs and physiology. We unsurprisingly find a high usage of the heart rate, despite it being criticized for its high variance and dependence on the athlete's form [19]. While heart rate describes the evolution of one's heartbeats per minute, other features are not sensor-based and may require experts to perform measurements. For example, kinanthropometric data is a set of information composed of the body size, its shape or even its composition and may be used to evaluate one's potential performance [30].
In Table 2, we summarize the observed features and classify them using our own taxonomy. Compared to outer body-related data, the inner body information may vary faster. The two categories are linked but still separated by a thin line, which is the latter's fluctuation across a short period of time. Athlete-related data enable a clear categorization of the athlete using their age, sex and anthropometric information. Additionally, computer vision (CV) based features can be extracted from video frame processing. These are measurements based on the skeleton's joint positions and enable the understanding of one's body structure and position through time. A high number of sports rely on the athlete's position in time, thus the information may be key to an enhancement in performance [27]. Processing video feeds can provide a large amount of information in professional-level sports, since the data is undisclosed due to the competition aspect.

Machine Learning Usage in a Sports Context
Not all machine learning techniques can be used to treat sports problems. Through the papers we reviewed, we found some trends in the selected techniques. The type of data, the number of features and the distribution of the data are some of the most decisive aspects to consider when choosing a machine learning model. Thus, depending on the features and the quantity of available data, researchers favored some models over others. In Table 3, we present the count of machine learning techniques used in the reviewed papers. Artificial Neural Networks (ANNs) are the most used approach in our review. This is due to recent advances in deep learning and ANN-based models, but mainly to their generalization capability. ANNs tend to be, when well used, models that can treat any type of data. The architecture is flexible, as one can parametrize the number of neurons, the layer types (which depend on the data type), and the starting weights. The main issue is that ANNs are known to be overused and, in some cases, may also overfit the dataset quite easily. The Support Vector Machine (SVM) approach is very similar to ANNs in terms of generalization capabilities. Like ANNs, these models can also support large amounts of data. We are surprised that there are not more papers using this approach. Indeed, the use of SVMs can help limit overfitting on the available data. In contrast to supervised machine learning techniques, unsupervised machine learning tries to solve one of the main issues of creating a dataset, which is data labelling. The goal of adopting such techniques is to generate clusters of data that have similar characteristics or values. The purpose of such an approach is mostly to explain the dataset's content, from which one can find and extract trends or patterns. Unsupervised techniques seem to be quite efficient at determining winning strategies in multi-staged competitions [14,15].
We also counted papers using some older techniques to interpret sports data, such as fuzzy logic and ontologies. Both techniques provide results that can be easily understood and may also be used with unlabeled data. Fuzzy logic coupled with fuzzy inference can provide an idea of the different states, or classes, of the given data [25]. Thus, one could use it to extract classes from a given dataset that may not be initially understandable. In contrast, ontologies are used to define a fixed number of states and transitions, and one uses semantic reasoning to apply the data to them and extract results. The latter technique adapts the states and their transitions to a dataset, thus it can be used to provide recommendations on another athlete's performance with a certain degree of flexibility [11,17].
Computer vision (CV) is not itself a machine learning technique, but it can be used when one needs to extract data through image processing in order to construct a statistical analysis or a usable dataset for machine learning techniques [31]. We found many applications (mainly in sports) where it was not possible to use on-body sensors to track the user's movements. Thus, the tracking of data such as the skeleton joints can provide information on gestures and postures performed by the athletes and allow a system to react to them [27]. CV can also be used to track objects' movements and explain specific behaviors [28].

Sports Coaching Based on Machine Learning
Virtual coaches can be defined as "computer systems capable of sensing relevant context, determining user intent and providing useful feedback with the aim of improving some aspect of the user's life" [32]. We observed that virtual coaching has evolved as fast as machine learning research, enabling management of larger quantities of information and more data types, and offering new models to rely on. However, we hereafter point out aspects linked to coaching and a gap common to all reviewed solutions. E-coaching consists of virtual support for human real-life activities, and it can be deployed in a plethora of different contexts, including sports.
Coaching support can be provided in many ways, but the medium chosen for the interaction with the user must suit its application, especially in sports, since athletes may not be able to receive information in all contexts (i.e., before, during or after their effort). We relied on the modalities proposed by [33] to construct our synthesis. Audio communication is particularly interesting in sports, since it allows conveying information on the current performance to an athlete without engaging them in a high workload activity. We define high workload as any activity where the person needs to think and focus in order to get information. In contrast, a video communication system represents a high workload during the effort, since the athlete needs to focus on the screen and not on the effort they are making. However, head-mounted displays may reduce the induced workload, as the athlete's view of their main activity is only overlaid with a small piece of information, such as direction information [16]. Video cues and indications may be used more efficiently in sports or periods of effort where one is not stressed by a time constraint or can easily switch focus. Sports such as weight lifting or trail track running benefit from video-based indications, since the athlete has time to focus on a screen before performing an exercise [34]. Additionally, indoor and individual sports do not provide the same triggers for athletes' performances. Thus, immersing the user in a fully virtual environment can benefit their performance at particular tasks in sports like home cycling [35].
Text-based communication, either synchronous or asynchronous, presents the same issues as video communication. Additionally, it may not convey information as rich as that of the audio or video communication media. However, people rely on text for off-activity information gathering; thus, an athlete could easily look at coaching information while not practicing sports. Systems like training planners are only available through text-based communication, since they are used before or after the sports activity [11,20,36].
Athletes use coaches to push themselves to their limit and enhance their performance. Virtual coaching can provide motivational cues in an implicit or more explicit manner, depending on the communication medium and the coaching type used. Team sports may benefit from the group effect as a motivation source but can also be enhanced by adding virtual information to the group [12,18,37]. Although video communication is not recommended in certain sports, it can provide additional and richer information than audio or text-based feedback. People feel more comfortable and motivated when using a video-based coach, since the emotional engagement may be helped by facial micro-expressions [38]. Additionally, it is easier to get a correct execution of exercises using a video or virtual character showing the movements, rather than explaining them via text or audio information [39].
We observe a growing trend in the use and availability of computational intelligence-based systems to support athletes in sports. In our review, systems were directed at assisting the athletes in order to correct or provide feedback on their performance. This growth is also explained by the increasing capabilities of machine learning and artificial intelligence, enabling new analysis and modeling of the athletes' performance. However, we identified a gap in these systems, since no research leveraged these techniques to properly personalize the feedback given to the athlete. Thus, we introduce a new reinforcement learning-based virtual coach. The main aim of this study is to test our virtual coach training plans on the following five components: the training sessions' distribution, training load, resting time quantity, resting time distribution, and efficiency. We involve professional coaches in the evaluation process, compare our approach to training planning with an established training coaching platform, and gather quantitative and qualitative results that we further analyze.

Materials and Methods
We involved six road cycling experts to provide qualitative and quantitative feedback on three series of training plans. We generated the training plans using our virtual coaching system, which bases the planning on a balance of TSS and Training Stress Balance (TSB) values. The functioning of the system is further explained in a previous research paper [36]. The experts were provided with three different training plans, each designed for a specific training period. Thus, coaches had to evaluate plans for the following training periods: 6, 14 and 24 weeks. The selected time periods depended on the availability of training plans provided by TrainingPeaks' coaches, as we used TrainingPeaks as the control training plan platform. Using the selected time periods, we generated new virtual coach-based training plans and anonymized both training sources. We built the training plans using the anonymized data of a semi-professional road cyclist, which were provided after a consent agreement.
Both the control and virtual coach training systems were provided with the athlete's training data in order to build up the training. The training plans were composed of a date, a session type and the TSS score. The TSB score was recorded at some point in the time period in order to inform the participants about the timeline of the sessions, their type, the load induced by each session and its implication for the athlete's overall form. In the case of our virtual coach system, we selected the most rewarded attempt at planning for each time period [36].
The proposed sessions are standardized in a set of categories: endurance, anaerobic intervals, recovery and rest. This implies that the sessions proposed in the control training plans are converted into these categories. We further explain the categorization process in the following section.
The selected control training plans were described as not intended for professional athletes, and we qualitatively evaluated them as adequate in terms of the proposed sessions for our athlete's profile. Additionally, these training plans were also selected because they did not merge too many sessions of different sports in order to build up performance. Thus, at the end of our selection process we have chosen the following training plan as the control:
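For readers unfamiliar with TSB, the following sketch shows one common way to derive it from a series of daily TSS values: a long-term load average (often called chronic training load, CTL) minus a short-term one (acute training load, ATL). The 42-day and 7-day time constants, the zero starting values and all names are assumptions based on common practice; this is not the implementation of the virtual coach described in [36].

```python
from typing import List, Sequence


def training_stress_balance(daily_tss: Sequence[float],
                            ctl_days: int = 42,
                            atl_days: int = 7) -> List[float]:
    """Return one TSB value per day, computed before that day's session.

    CTL and ATL are exponentially weighted averages of daily TSS
    (assumed time constants: 42 and 7 days). Both start at zero,
    which is a simplification for an athlete with no recorded history.
    """
    ctl = 0.0  # long-term load ("fitness")
    atl = 0.0  # short-term load ("fatigue")
    tsb = []
    for tss in daily_tss:
        # Form for today is based on the loads accumulated up to yesterday.
        tsb.append(ctl - atl)
        ctl += (tss - ctl) / ctl_days
        atl += (tss - atl) / atl_days
    return tsb


# Hypothetical block: one week of 100 TSS per day, then two weeks of rest.
form = training_stress_balance([100.0] * 7 + [0.0] * 14)
```

In this hypothetical block, TSB drops while load accumulates and climbs back above zero during the rest weeks, which is the balance between training load and resting time that a planner has to manage.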

The Training Evaluation Questionnaire
The experts were provided with an online questionnaire that mixed quantitative questions on a 5-point Likert scale with more qualitative ones to gather the cycling coaches' comments and insights. The set of questions was the same for each time period to allow a comparison across them. The questionnaire included the following questions:
- Which of the two plannings seems the most appropriate for this period of time?
- Explain your choice in the previous question.
- Please evaluate the training load for the two proposed plannings (1 = very low, 5 = very high).
- Please evaluate the efficiency of the proposed planning (1 = very bad, 5 = very good).
- Additional comments on Planning 2?
The two proposed plannings were identified as "Planning 1" and "Planning 2" in each time period. The provided plannings did not mention whether they represented the control or the virtual coach training plan, so either could be identified as Planning 1 or Planning 2 in each time period. The training plans in each time period were randomly assigned to the two identifiers.
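The random attribution described above can be sketched as follows. This is only an illustration of the blinding idea; the function name, the labels of the sources and the use of a seed are our assumptions, not the procedure actually used in the study.

```python
import random
from typing import Dict, Optional, Sequence


def blind_labels(periods: Sequence[str],
                 sources: Sequence[str] = ("control", "virtual coach"),
                 seed: Optional[int] = None) -> Dict[str, Dict[str, str]]:
    """Randomly map the two plan sources to 'Planning 1' / 'Planning 2'.

    The assignment is drawn independently for each training period, so an
    expert cannot infer the source from the identifier.
    """
    rng = random.Random(seed)
    assignment = {}
    for period in periods:
        shuffled = list(sources)
        rng.shuffle(shuffled)
        assignment[period] = {
            "Planning 1": shuffled[0],
            "Planning 2": shuffled[1],
        }
    return assignment


# Hypothetical key kept by the experimenters and hidden from the experts.
key = blind_labels(["6 weeks", "14 weeks", "24 weeks"], seed=0)
```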

TrainingPeaks' Converted Sessions
To compare the two proposed training plans, we use a defined set of training types. Thus, some of the control plan session names had to be adapted. Since each planning period had specific workout names, we converted them depending on their initial name (if it mentioned the training type), the effort distribution along the session and the session's TSS load.
Hereafter are presented the original sessions' names and the converted ones with an additional explanation as to why they were categorized like this. The six week planning sessions converted as anaerobic intervals were threshold intervals, sub threshold + VO2/AP, threshold + VO2 intervals, sprint (15 and 30 sec sprints), sprint training, VO2 intervals, threshold + 30/30 intervals, sub threshold plus breakaway, combination big gear + sprint/finish, endurance with on/off sprints, tempo progression, and sprint + VO2 (race simulation). We motivate this categorization by observing the training intensity fluctuation during the training. All these sessions were composed of scheduled periods of different intensities denoting intervals.
In the same control training plan, we further grouped the following sessions under endurance: combination big gear + sprint/finish, endurance with on/off sprints, tempo progression and sprint + VO2 (race simulation). Even if these sessions were segmented, we considered the fact that their duration was longer than what was presented in intervals. The intensity in these sessions was mainly set to be the same during a long period of time.
Recovery sessions were considered by their low TSS load and are meant to allow the athlete to recover. We considered the following sessions as being part of the Recovery category: recovery + sprints, recovery + high cadence and accelerations, sprints on/off (10 s), and pre-race (one day before).
We also adapted the 14-week control training plan. We categorized the following sessions as anaerobic intervals: spin ups, 3 aerobic test, strengthreps (set 1; 8 × 2.5 FTP(HR); 2 × 10 FTP(HR) into 10 zone 2), FTP Test, STME intervals (one minute; FTP(HR) over/unders), hill reps-set, short spin, activation session and hilly ride-low RPM. The main reason for this categorization is the presence of multiple segments with alternating intensities, which occurred either in the whole training session or in most of it. We combined two training sessions (core strength and flexibility workout and leg speed set 1-short) as Recovery, since the combined TSS value was low.
In the same training period, we categorized a small number of sessions as Endurance due to their long segments at the same TSS value. Thus, the following sessions were considered Endurance: activation session and hilly ride-low RPM.
For the last training period (24 weeks), we identified the following training sessions as pertaining to the Endurance session type: hills >400 m 1 h 30 m; velocity skill 120 rpm; PD curve test: long; endurance 1 h 30 m; tempo intervals 1 × 45 m; goup (or Dartlek Solo) 2 h hard; rolling tempo 1 × 55 m; and race: circuit 1 h. These sessions were all composed of one or two long segments of the same intensity. Surprisingly, we could not find any session to define as Recovery in the 24-week training plan.

Measurements
Results are reported in a contingency table for each question having a Likert-scale answer, for each condition. The results are grouped into a positive part, a neutral part and a negative part, where we summed the values in the positive part (ratings above 3, up to 5), the neutral part (3) and the negative part (ratings below 3, down to 1). Additionally, we provide a χ² value for each table to indicate how strongly the gathered results depend on the condition. We also gather qualitative data on the reasons why the experts preferred one training plan over another for a certain period, as well as their thoughts on the proposed plannings.
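The grouping and the test statistic can be sketched as follows. We assume here that ratings 4 and 5 count as positive and ratings 1 and 2 as negative, and that the reported value is Pearson's χ² statistic on the resulting 2 × 3 table; the ratings in the example are invented for illustration and are not the study's data.

```python
from typing import Dict, List, Sequence


def group_likert(ratings: Sequence[int]) -> Dict[str, int]:
    """Count 5-point Likert ratings as negative (1-2), neutral (3), positive (4-5)."""
    counts = {"negative": 0, "neutral": 0, "positive": 0}
    for rating in ratings:
        if rating not in (1, 2, 3, 4, 5):
            raise ValueError(f"invalid Likert rating: {rating}")
        if rating < 3:
            counts["negative"] += 1
        elif rating == 3:
            counts["neutral"] += 1
        else:
            counts["positive"] += 1
    return counts


def chi_square(table: List[List[int]]) -> float:
    """Pearson's chi-square statistic for a contingency table (rows = conditions)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip empty columns to avoid dividing by zero
                statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical ratings from six experts for one question (not the study's data).
control = group_likert([4, 4, 4, 3, 5, 2])
virtual_coach = group_likert([3, 3, 2, 3, 4, 2])
order = ("negative", "neutral", "positive")
table = [[control[k] for k in order], [virtual_coach[k] for k in order]]
statistic = chi_square(table)
```

With only six experts per condition, the expected counts in each cell are small, so any significance threshold applied to such a statistic should be interpreted with caution.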

Results
The experiment was conducted with six sports coaching experts in the cycling domain. As we extend our previous research, the study was done across the span of one year, but the data and the questionnaire remained untouched.
In this section, we first present the results in terms of suitability of the proposed training plans for each training period. Then, we analyze the results through contingency tables for each question and each training period.

Suitability of the Training Plans
The binary answer provided by each participant indicated which planning was better in a given period of training. We asked the coaches for a detailed explanation to understand their choice for each training period. The difference was largest in the six week training period, where 60% of the participants selected the control planning. From the qualitative data collected on the reasons why they chose the latter, three of the experts mentioned too high TSS and TSB values in the virtual coach's proposed planning. Most importantly, one mentioned that TSS values of 457 must not be present in a training plan. The smoother approach to the TSB increase presented by the control training plan was described by two of these three coaches as a more adequate approach to training. The last one who voted for the control planning mentioned the diversity of the training sessions, while the two participants who voted for the virtual coach plan described the session distribution as "slightly better" and "more polarized" compared to the control plan.
Results for the same questions across a longer time period demonstrated less discrepancy between the two training plans. Indeed, three coaches voted for the control planning as the best solution for the 14 week planning period, while the three others voted for the virtual coach training plan. As with the previous virtual coach-based plan, the presence of training sessions with a 457 TSS value was criticized as being too high. The control planning was described as having a more gradual approach, with easier workout sessions in terms of TSS, and as avoiding TSB peaks in the high-risk zone (blog post by Joe Friel: https://joefrielsblog.com/managing-training-using-tsb/). The last control planning vote was motivated by the fact that sessions were more diverse and that the plan proposed a higher number of anaerobic training sessions. However, it seems that coaches all have their preferred style and approach, as one of them mentioned that the virtual coach planning better suited the training period since it proposed a higher number of endurance sessions relative to intervals. Another vote was given to the virtual coach plan as the participant noted the gradual increase over the period as preferable. The same expert also mentioned that the planning was better if one planned to build a strong foundation for long-term training. The last coach who voted for the virtual coach mentioned that the control training plan was too poor in terms of the provided training load.
As in the previous situation, the 24 week training plan proposed as the control received three votes, compared to three votes for the virtual coach-based one. The same expert voted for the control because the virtual coach plan presented 457 TSS sessions. Another participant who voted for the control planning mentioned that the planning was "more efficient without tiring the rider so much in each session," while the last one who also voted for the latter plan mentioned the predominance of anaerobic intervals. One coach mentioned that the virtual coach planning would be more interesting for long-term performance building and argued that this training plan could also be perceived as "more encouraging from a moral perspective" with the help of small peaks. Another participant mentioned that the number of interval sessions was too high in the control training plan and opted for the virtual coach proposition despite noting a high number of endurance sessions. Lastly, the virtual coach planning proposed fewer interval sessions than the control plan, which resulted in the third vote for the virtual coach-based training plan.

Training Sessions' Distribution
The second question of our questionnaire targeted the distribution of the sessions through the week or the whole training period. The collected data provide insights into the way the virtual coach schedules sessions and into the number of training sessions of each type, compared to the distribution of the sessions in the control training plan. In Table 4, we present the results of the 6 week training plans in a contingency table. We observe that the virtual coach's session distribution across this time period appears worse than the control's. However, according to the χ² test, the results are not significantly different (χ² = 13.897). We observed no statistical difference in the two other planning periods (14 weeks: χ² = 20.925 and 24 weeks: χ² = 6.303). Comparing the contingency tables, the distribution of the virtual coach sessions improved over longer spans, as demonstrated by Table 5, but still could not match the distribution quality of the control training plans. Thus, we conclude that the system has to be enhanced with respect to the distribution of the sessions, but we see an improvement as the training period lengthens.

Training Load
Experts were asked to rate the training load of the two proposed training plans for the three given periods. Results show no significant difference between the two conditions in the 6 and 14 week training periods, while the results from the 24 week planning were significantly different (χ² = 6.303). In the latter time period, the virtual coach outperformed the control plan by having a lower training load according to the participants. Table 6 shows the resulting contingency table for the 24 week planning period, where the experts rated the control plan as providing a training load that is too high. Comparing the training plans, we observed that the control plan maintained peaks at a higher TSS value, while the virtual coach-generated planning provided peaks of 457 TSS but still kept some training sessions with lower TSS in between. The same results appeared in the two other training periods, where the virtual coach generated a training plan that was evaluated as either more balanced or in a lower range of values than the control training plan. We also observed that most coaches evaluated the control training plan as providing a training load that was too high compared to the virtual coach in the 14 week training period (the control received a rating of four three times).
In summary, one could say that the training load was more balanced in the training plans provided by the virtual coach in the longest training period of 24 weeks.

Resting Time Quantity
The evaluation of the resting time quantity showed a statistically significant difference for all the training periods. In the 14 and 24 week time periods, the virtual coach training plans were more balanced in terms of resting time quantity and were evaluated as providing a more appropriate amount of it than the control plans, as reported in the contingency tables of Table 7. In the shortest time period, the control training plan provided a more satisfying resting time quantity than the virtual coach. We observed that the coaches mainly evaluated it as "neutral" in the control, with four of them giving a three, as shown in Table 8. Overall, the resting time quantity proved to be more adequate and balanced in the training proposed by the virtual coach. We observed better results in the 14 week training plans, although both propositions were judged as providing excessive resting time (relative to their respective planning). In the 6 week time period, the control training plan outperformed the virtual coach training plan.

Resting Time Distribution
Through this component of the questionnaire, we examined whether the resting time was correctly distributed across the training period, according to the coaches. We observed that only the 14 and 24 week training plans received statistically significant results. The 6 week training plans showed no significant difference according to the χ²-test (χ² = 7.733). Additionally, this time period is the only one where the virtual coach was worse than the control training plan, as shown in Table 9. In this 6 week time period, the virtual coach was putting either too many resting days in the same week or not enough, which means that the distribution of resting days between sessions was not balanced against the intensity of the training sessions. We observed that the training plans were evaluated equally by the experts for the 24 week training period, as shown in Table 10. However, the main discrepancy of Table 10 was observed in the 14 week training plans, where the control planning was evaluated as providing too much consecutive resting time (two more coaches gave a four than for the virtual coach). Results in the 14 week and 24 week training evaluations were statistically independent according to the χ²-tests (14 weeks: χ² = 4.751 and 24 weeks: χ² = 2.550). The virtual coach was always evaluated as better in the distribution of the resting days in longer periods of time. Thus, we observed that the virtual coach needs more constraints for short periods of time and still has to be more balanced between training and resting in periods exceeding 14 weeks.

Efficiency of the Training Plans
Results from the evaluation of the planning efficiency were statistically dependent for the 6 week and the 14 week training periods, while the 24 week results were statistically independent. In the two former time periods, the virtual coach was never evaluated as efficient and was considered more neutral, while the coaches mostly agreed on the efficiency of the control training plan, as shown in Table 11. As for the other evaluations, the virtual coach was rated better for the longer periods of time but received its worst results for 6 weeks. Table 12 shows the 24 week training plan scores, where we observe that the virtual coach received even less disagreement on efficiency than the control training plan. Efficiency seems to be one of the areas where our virtual coach was not performing better than the control training plans. We further looked at the coaches' qualitative evaluation of the whole training plans and linked it to this lacking component. However, the coaches did not find the training plans out of place but evaluated them more neutrally, indicating that the basis is there and only needs some changes to reach full agreement.

Discussion
Our virtual coach demonstrated good results on the longer time periods. We observed that, over these periods, several components are better managed by the virtual coach-generated training plans than by the training plans provided by real human coaches. Our panel of professional cycling experts evaluated the virtual coach planning to be just as appropriate as that of the human coaches for the 14 and 24 week training periods. We observed better results on the components relating to resting time quantity and resting time distribution. These results help us identify paths of improvement where adjustments to the virtual coach optimization conditions could lead to better management of the athletes' training plan.
Based on these expert evaluations, we identified flaws in the design of the generated training plans. Indeed, most of the coaches mentioned the control plan as the best suited for the provided training period. The reason is that, looking at the TSB curves, the control peaks less at each extreme and its TSB values are smaller too. Thus, the virtual coach appeared more chaotic in its management of TSB compared to the smoother curves of the control training plan. Additionally, the peaks of TSS sessions were higher in the virtual coach than in the control training plans, and the experts interpreted the virtual coach plans as very high intensity training plans. The experts' appreciation of the training plans is mainly explained by the lack of information provided to them. Providing them with a summary of the capabilities of the user for whom we built the training plan might have changed the results. We still think that the training plans are excessive in their performance requirements, and we build on these first comments to adapt the conditions we apply to the training.
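To make the notion of a smoother or more chaotic TSB curve concrete, the following minimal sketch derives a daily TSB series from daily TSS values and measures how widely it swings. It assumes the commonly used formulation in which TSB is the difference between a long-term (42 day) and a short-term (7 day) exponentially weighted average of TSS; the exact definition used by the virtual coach may differ, and the function names and the two example plans are hypothetical.

```python
def training_stress_balance(daily_tss, long_days=42, short_days=7):
    """Return the daily TSB values for a sequence of daily TSS values.

    Assumption: TSB = long-term load - short-term load, both computed as
    exponentially weighted averages of TSS up to the previous day.
    """
    long_load = short_load = 0.0
    tsb = []
    for tss in daily_tss:
        # Form on a given day reflects the loads accumulated before it.
        tsb.append(long_load - short_load)
        long_load += (tss - long_load) / long_days
        short_load += (tss - short_load) / short_days
    return tsb


def tsb_range(daily_tss):
    """Spread between the highest and lowest TSB, a crude smoothness proxy."""
    values = training_stress_balance(daily_tss)
    return max(values) - min(values)


# Two hypothetical four-week plans with the same weekly total (700 TSS).
steady_plan = [100] * 28
peaky_plan = [400, 0, 0, 300, 0, 0, 0] * 4

print(f"steady plan TSB range: {tsb_range(steady_plan):.1f}")
print(f"peaky plan TSB range:  {tsb_range(peaky_plan):.1f}")
```

Under these assumptions, the plan that concentrates the same weekly load into a few high TSS sessions produces a wider TSB range than the evenly spread plan, which is the kind of pattern the experts reacted to.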
Through the results, we observed that the main issues faced by the virtual coach training plans occurred in the 6 week training duration, where the system proposed high intensity trainings and provided unbalanced and intensive training plans for a short period. In fact, in such a reduced time frame, one would not want to see such high intensity training sessions and may need a more relaxed training plan. We need to reduce the TSS threshold for the selected sessions according to the length of the training requested. Additionally, the time of the season plays a role in training, since one would not start a training season with such high-intensity trainings. Adapting the range of TSS sessions proportionally to the current season's trainings might be a better strategy than providing the athletes with a training plan based on their whole sports life. The distribution of the sessions received bad evaluations for the virtual coach because it was proposing either too many or not enough of the appropriate training sessions. This is clearly seen in some generated training plans where the virtual coach fills the planning with resting days in order to reduce the TSB values and maintain the athlete in the proper zone. As previously mentioned, the intensity regulation should also have a positive effect on this component. Proposing lower intensity trainings will reduce the need for recovery and resting days and will thus reduce the probability of a chain of these sessions.
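One possible way to express such a length-dependent TSS threshold is sketched below. This is only an illustration of the idea, not the rule implemented in the virtual coach: the function names, the linear scaling, and the default values (`min_fraction`, `short_weeks`, `long_weeks`) are all hypothetical assumptions.

```python
def session_tss_cap(season_tss, plan_weeks,
                    short_weeks=6, long_weeks=24, min_fraction=0.6):
    """Upper TSS bound for candidate sessions (illustrative assumption).

    The cap is a fraction of the highest TSS recorded in the current
    season. The fraction grows linearly from `min_fraction` for plans of
    `short_weeks` or less up to 1.0 for plans of `long_weeks` or more.
    """
    if not season_tss:
        raise ValueError("season_tss must contain at least one session")
    span = long_weeks - short_weeks
    # Clamp the plan length to the [short_weeks, long_weeks] interval.
    progress = min(max((plan_weeks - short_weeks) / span, 0.0), 1.0)
    fraction = min_fraction + (1.0 - min_fraction) * progress
    return fraction * max(season_tss)


def filter_sessions(candidate_tss, cap):
    """Keep only the candidate sessions whose TSS does not exceed the cap."""
    return [tss for tss in candidate_tss if tss <= cap]


# Hypothetical TSS values of the sessions recorded in the current season.
current_season = [80, 120, 150, 210, 300]
for weeks in (6, 14, 24):
    cap = session_tss_cap(current_season, weeks)
    print(weeks, "weeks:", filter_sessions(current_season, cap))
```

Basing the cap on the current season rather than on the whole training history keeps the proposed sessions within the range the athlete has recently sustained, and the shorter the plan, the stricter the bound.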
In the longer time periods, we observe that our virtual coach system can perform as well as the human coaches' plans. In terms of resting time quantity and resting time distribution, the virtual coach outperformed the control training plans. Regarding training intensity, and despite the peaks of high TSS values in some sessions, the experts found that the virtual coach provided the athletes with a coherent training plan. In some cases, both training plans were judged as excessive, such as in terms of resting time distribution (see Table 10).
There are indeed points where the virtual coach's plans were not better than the control ones and other components where this tendency was inverted. We believe that working on a shorter time period and using less data for the initial training plan could result in a more balanced training plan for each time period. We also need to focus more on the management of the athlete's TSB evolution in order to smooth the progression of the proposed trainings. We keep in mind, however, that there were still some components where the experts were less likely to notice a difference between the two presented training plans. Thus, both approaches manage some components well and others poorly. Coaches should consider the planning proposed by the virtual coach, learn from it, and use it, while the virtual coach should be inspired by the way a human coach interprets athletes.

Limitations
From a technical point of view, the developed virtual coach currently bases the performance evaluation on the whole dataset of provided trainings. We observed that this could mislead the creation of our training plans, as one's performances may not stay the same over the span of years. Thus, the over-evaluation of an athlete's capabilities could also have a negative training effect, as the athlete might fall into overtraining after the first week of the proposed planning. However, assessing such limitations would require more real-time testing, with participants following the planning and using the virtual coach. We think that in such a situation, the model and the planning would be adjusted, which would mitigate the overtraining risks. The virtual coach only takes the athlete into consideration and not their environment. Ignoring the environment also causes issues, as a sports year depends on components like seasonality.
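A simple mitigation for this first limitation would be to restrict the performance evaluation to recent sessions. The sketch below shows one hypothetical way to do so; the `(day, tss)` session representation, the function name `recent_sessions`, and the one-year default window are assumptions made for illustration and are not part of the developed system.

```python
from datetime import date, timedelta


def recent_sessions(sessions, today, window_days=365):
    """Keep only the (day, tss) pairs recorded within the recent window.

    `sessions` is a list of (datetime.date, TSS) tuples; sessions older
    than `window_days` before `today` are discarded.
    """
    cutoff = today - timedelta(days=window_days)
    return [(day, tss) for day, tss in sessions if cutoff <= day <= today]


# Hypothetical training history spanning several years.
history = [
    (date(2019, 5, 1), 320),   # old peak, no longer representative
    (date(2021, 3, 1), 120),
    (date(2021, 6, 1), 150),
]
print(recent_sessions(history, today=date(2021, 7, 1)))
```

With such a filter, an old peak performance would no longer inflate the estimate of the athlete's current capabilities.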
From an experimental point of view, we observed that the information shown in the generated training plan lacked context. Experts were not always able to understand the plans and were sometimes frustrated by the peaks in the TSS of the proposed sessions. Providing them with an athlete profile would likely have changed their point of view on the proposed planning. Our case study does not test the use of such technology-based coaching with a real-life athlete, but this will be explored in future work.

Conclusions
The field of sports science is being highly impacted by the availability of wearable data. Such devices enable anyone to get information on their performance, mainly in endurance sports. We observed a growing trend to apply computational intelligence as well as artificial intelligence-based systems to understand and enhance athletes' performance in many kinds of sports. The provided tools are not always as simple as smart bracelets or smart watches; rather, they may involve a certain amount of installation, such as tracking cameras. Endurance sports, as mentioned earlier, are better served, as wearables and specific measurement tools are widely developed.
Using such information, one intends to understand one's training or get some feedback on one's performance. This may have a positive effect on the motivation to practice sports, but it may also lead to overtraining. Thus, researchers are looking for virtual coaching solutions to accompany new and current athletes in their training by providing algorithm-based training advice. Such tools have limits, as mentioned, and mainly rely on heuristic rules to determine training planning.
Through our empirical study, we propose a novel approach to training planning that tailors the training sessions drawn from the athlete's past performance to their current performance and evolution through time. We proposed our reinforcement learning-based approach in past research and tested it in this case study with cycling coaches as experts. The quantitative results demonstrated that our virtual coach was evaluated equally to, or even better than, training plans from real coaches. The qualitative data and these quantitative results also pointed out the lack of context in the developed virtual coach. Thus, we will pursue the development and research of our virtual coach by adding more contextualization to the training plan creation process.

Table 1. Applications of machine learning in sports.

Table 2. The categorization of used data in sports dedicated systems.

Table 3. Comparison of machine learning models usage in sports.
24 weeks: "Intermediate Peak form and Base for faster Criterium + email access to Coach (WKO4i Levels)," from Richard Rollinson.

Table 4. The contingency table of the 6 week sessions' distribution.

Table 5. The contingency tables of the 14 and 24 week sessions' distribution.

Table 6. The contingency table of the 24 week training load.

Table 7. The contingency tables of the 14 and 24 week resting time quantity evaluation.

Table 8. The contingency table of the 6 week resting time quantity evaluation.

Table 9. The contingency table of the resting time distribution in the 6 week plans.

Table 10. The contingency tables of the 14 week and 24 week resting time distribution evaluation.

Table 11. The contingency tables of the 6 and 14 week efficiency evaluation.

Table 12. The contingency table of the 24 week training plans efficiency evaluation.