Training Design, Performance Analysis, and Talent Identification—A Systematic Review about the Most Relevant Variables through the Principal Component Analysis in Soccer, Basketball, and Rugby

Given the accelerating development of technology applied to team sports and the resulting large amount of available information, data mining increasingly relies on data reduction techniques such as Principal Component Analysis (PCA). This systematic review aims to identify determinant variables in soccer, basketball and rugby using exploratory factor analysis for training design, performance analysis and talent identification. Four electronic databases (PubMed, Web of Science, SPORTDiscus, Scopus) were systematically searched and 34 studies were finally included in the qualitative synthesis. Through PCA, data sets were reduced by 75.07%, and 3.9 ± 2.53 factors were retained that explained 80 ± 0.14% of the total variance. All team sports should be analyzed or trained based on a high level of aerobic capacity combined with adequate levels of power and strength to perform repeated high-intensity actions in a very short time, which differ between team sports. Accelerations and decelerations are mainly significant in soccer, jumps and landings are crucial in basketball, and impacts are primarily identified in rugby. In addition, the main technical/tactical variables extracted for these team sports were (a) soccer: occupied space, ball controls, passes, and shots; (b) basketball: throws, rebounds, and turnovers; and (c) rugby: possession, game pace, and team formation. Regarding talent identification, both anthropometrics and some physical capacity measures are relevant in soccer and basketball. However, since these variables have been identified in different investigations, further studies should perform PCA on data sets that involve variables from different dimensions (technical, tactical, conditional).


Introduction
In the last decade, team sports have experienced accelerating technological development (e.g., small, wearable devices with inter-device connectivity), influencing the daily work of both researchers and practitioners in sports science. Thanks to this development, new and specific tools have been created for team sports science and medicine that are safer, less invasive, and highly valid and reliable [1,2]. These tools led to the development of software able to capture and analyze up to a thousand data points per second across up to 400 variables from different dimensions (technical, tactical, conditional), either post hoc or in real time [3].
A systematic review was performed following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [19]. The procedure followed for data identification, selection, and extraction is presented in Figure 1.

Data Sources
A systematic electronic search was conducted in PubMed (n = 67), Web of Science (n = 154), SPORTDiscus (n = 68) and Scopus (n = 179) on 1 November 2019 before 9:00 a.m., in order to identify studies that used PCA in team sports as a data reduction technique. The authors were not blinded to journal names or manuscript authors. The search strategy combined terms covering the topics of the population ("team sport", soccer, football, basketball, rugby) and intervention (PCA, "principal component analysis", "exploratory factor analysis"). The search was made using combinations of these terms linked with the Boolean operators "AND" (inter-group Boolean operator) and "OR" (intra-group Boolean operator, only for the second). Studies were included if PCA was performed in the most-studied team sports (soccer, rugby, or basketball), following previous research [16].
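As an illustration only, the search string described above could be assembled as follows. The exact syntax accepted by each database is not reported in this review, so the helper names (`or_group`, `build_query`) and the quoting convention are assumptions, and "OR" is applied here within both term groups.

```python
# Minimal sketch of the Boolean search string; helper names are hypothetical.
population = ["team sport", "soccer", "football", "basketball", "rugby"]
intervention = ["PCA", "principal component analysis", "exploratory factor analysis"]


def or_group(terms):
    """Join the terms of one topic with OR, quoting multi-word phrases."""
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*groups):
    """Link the topic groups with AND (the inter-group operator)."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(population, intervention)
print(query)
```

Each database would still need its own field tags and filters, which are not specified in the text.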

Data Collection
One of the authors downloaded the primary data from the articles (title, authors, date, and database) to an Excel spreadsheet (Microsoft Excel, Microsoft, Redmond, WA, USA) and removed the duplicate records. Then, two authors screened the search results independently against inclusion/exclusion criteria. The references that could not be eliminated by title or abstract were retrieved and independently evaluated for inclusion. The authors were not masked to the title or authors of the publications. Any disagreements (2%, n = 11) on the final inclusion-exclusion status were resolved through discussion in both the screening and excluding phases, and the final decision was an agreement between authors. Abstracts and conference papers from annual meetings were not included due to the lack of information needed to systematize (e.g., PC cumulative %, PCA total variance explained, eigenvalues, crucial statistical and methodological information). Additional information provided by the authors was considered during the screening process. When other necessary details were not forthcoming, the article was excluded. Documents in all languages were included, but were excluded if a translation could not be obtained.

Data Selection
Two authors performed the final study selection and information extraction. The data were systematized using the methodological outcomes and results of the studies. The methodological approach was analyzed through the criteria used to perform the exploratory analysis: data suitability testing, the extraction method used, the factor and loading retention criteria selected, and the rotation method, if performed (see Table 1). The EFA outcomes were summarized considering author, year, sport (discipline), variable characteristics, participant information, sample size, number of extracted factors, percentage of variance explained, number of variables extracted, and final extraction outcomes (see Table 1).
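The text does not name the suitability tests used by the included studies. As a hedged illustration, the sketch below computes two tests commonly used before factor extraction, Bartlett's test of sphericity and the Kaiser-Meyer-Olkin (KMO) measure. The choice of these two tests, the function names, and the synthetic data are assumptions; NumPy is assumed to be available.

```python
import numpy as np


def bartlett_sphericity(data):
    """Return Bartlett's chi-square statistic and its degrees of freedom."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)  # log-determinant of the correlation matrix
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof


def kmo(data):
    """Return the overall Kaiser-Meyer-Olkin sampling adequacy measure."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale  # partial correlations (off-diagonal entries)
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    a2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + a2)


# Synthetic example: four variables driven by one shared latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
data = latent + 0.5 * rng.normal(size=(200, 4))

statistic, dof = bartlett_sphericity(data)
kmo_value = kmo(data)
print(f"Bartlett chi-square = {statistic:.1f} (df = {dof}), KMO = {kmo_value:.2f}")
```

A large Bartlett statistic relative to the chi-square distribution with the given degrees of freedom, and a KMO value closer to 1, are usually read as support for proceeding with factor extraction.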

Results
A total of 468 articles was initially retrieved from the mentioned databases, of which 116 were excluded considering the title, abstract and year of publication. After duplicate removal, a total of 188 articles was analyzed, contemplating exclusion and inclusion criteria. Finally, the full text of 42 studies was read and, due to a lack of vital information, eight studies were not considered. Therefore, 34 articles were included in this review (Figure 1).

Study Characteristics
Big data reduction through PCA was performed in 34 articles, clustered in different team sports: 17 in soccer, 11 in basketball, and 6 in rugby. The extracted variables belonged to five metrics: technical, tactical, biomechanical, physical/physiological and anthropometrics. Overall, the most considered metric was physical/physiological.
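The record counts reported in the Data Sources, Results and Study Characteristics sections can be cross-checked with simple arithmetic. The snippet below is only a bookkeeping sketch using the figures stated in the text; the variable names are hypothetical.

```python
# Figures as reported in the text; variable names are illustrative only.
database_hits = {"PubMed": 67, "Web of Science": 154, "SPORTDiscus": 68, "Scopus": 179}
total_retrieved = sum(database_hits.values())

full_text_read = 42
excluded_missing_information = 8
included = full_text_read - excluded_missing_information

articles_by_sport = {"soccer": 17, "basketball": 11, "rugby": 6}

print(total_retrieved)                   # 468 records retrieved
print(included)                          # 34 articles included
print(sum(articles_by_sport.values()))   # 34 articles across the three sports
```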

Soccer
From the 17 articles on soccer (Table 1), four aimed to identify the technical patterns that define players' behavior, five analyzed tactics, two explored biomechanical aspects, ten assessed the physical/physiological requirements, and five characterized the anthropometrical variables. Overall, from 54% to 81% of the total variance was explained through ≤23 variables.

Basketball
From the 11 articles on basketball (Table 2), five articles extracted variables related to technical patterns, one article evaluated biomechanical aspects, five articles characterized physical/physiological requirements, and two articles assessed anthropometrical metrics. In general, the percentage of total variance explained was higher than 62% through 14 or fewer variables.

Rugby
From the six articles included in the qualitative synthesis (Table 3), five articles studied the physical/physiological requirements, and one article analyzed the tactical variables. In general, the principal components explained between 52% and 90% of the total variance; they were formed by 38 variables for tactical analysis and fewer than 9 variables for physical/physiological requirements.

Discussion
The present systematic review sought to identify the most relevant variables to explain players' performances, extracted through PCA in soccer, basketball and rugby, to highlight the practical applications for training design, performance analysis and talent identification. In soccer, together with some anthropometric measures, relative age is the most important factor for talent identification. A key tactic taught by soccer coaches is occupying space during the game. In addition to ball controls, passes, and shots on goal, issues related to tactics and space occupation should guide training task design, ensuring that a high level of aerobic endurance is combined with very intensive and short actions. In basketball, both anthropometrics and some physical capacity tests (sprint, flexibility, and agility) should be performed for talent identification. As in other team sports, training tasks in basketball should be based on a high level of aerobic endurance and the required accelerations/decelerations, but unlike in other sports, landing impacts and body positioning gain relevance. In rugby, as in other sports, high-intensity actions need to be trained, but in combination with collisions. In addition, the tactics or style of play and a high-intensity game pace are very relevant for this team sport.
A player's external perception defines the motor behavior required to respond, which in turn determines the resultant motor action. Each sport-specific action can be described along four main dimensions. Specifically, a player acts using a motor skill (technical dimension) that requires a movement (conditional dimension), according to the player's decision making (tactical dimension), and conditioned by the player's psychological state (psychological dimension) [49]. The uncertainty and non-linearity of the team sports environment engage these dimensions during competition without the possibility of pre-establishing which action will be performed in each situation, making team sports unpredictable. This is why behavior analysis has become crucial to better understand team sport athlete preparation.
The present systematic review analyzed studies in team sports that applied the PCA technique. Practically, the importance of this systematic review may be considered in three ways: (1) supporting team staff decision-making by focusing training processes on the most relevant performance manifestations (high-intensity actions, lower-intensity actions, short explosive actions), (2) making training and competition analysis processes more efficient, which remains crucial given the congested schedules in which many of these sports are usually involved, and (3) highlighting the most relevant variables for talent identification.

Soccer
One of the most relevant challenges for coaches of young soccer players is to develop training processes that reveal a player's potential to become a professional. Players' anthropometrics (i.e., weight, height, biceps/triceps, subscapular, and suprailiac measures) are essential performance indicators for talent identification. However, most studies identified maturity as one of the most critical variables in this regard [25,26,32,34]. To date, research in soccer talent identification reports a systematic selection bias towards players born early in the year (relative age effect) [18]. Players with early maturity tended to have better physiological and technical performance. Consequently, they are more influential in the game and are recognized as more talented [18].
However, all dimensions of team performance should be trained using soccer-specific situations. Therefore, the technical and conditional (in addition to psychological) dimensions should be developed during soccer-specific tasks in which tactical positioning is the main basis [49]. In this regard, authors have highlighted the team's surface area (occupied space) as the main variable to assess team positioning [20-22]. This variable may be suitable for evaluating team behavior during different game phases and for training the team to act in them, given that the attacking team should occupy a greater area than the defending team [50]. This is because, during the attacking phase, players operate in larger spaces per player and with greater inter-player distances. In contrast, in the defending phase, players should maintain shorter inter-player distances to close space within the convex hull, hindering the attacking team's progression.
However, team tactics are governed by inter-player connections through technical actions [51]. In this sense, training tasks based on collective tactical positioning should be constrained so that they also improve motor skills. Based on PCA, the main technical variables are ball controls, passes, and shots, specifically in youth soccer [25,32,34]. Coaches should therefore design training tasks with continuous role changes, ensuring that players concentrate on coordinating sudden movements with teammates, moving from larger to smaller areas to improve positional decision making, in combination with a high number of ball controls, passes, and shots. This is consistent with Cordón-Carmona et al. [51], who explained that pass networks imply that passes and controls are vital to the game.
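To make the notion of occupied space concrete, the following sketch computes a team's surface area as the area of the convex hull of its players' positions. It is a minimal illustration, not the method of any cited study: the function names and the pitch coordinates (in meters) are hypothetical.

```python
def convex_hull(points):
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def surface_area(points):
    """Area of the convex hull of the player positions (shoelace formula)."""
    hull = convex_hull(points)
    twice_area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical outfield positions (x, y) in meters for two teams.
attacking = [(0, 0), (40, 0), (40, 30), (0, 30), (20, 15)]
defending = [(10, 10), (30, 10), (30, 20), (10, 20), (20, 15)]

print(surface_area(attacking))  # 1200.0
print(surface_area(defending))  # 200.0
```

In this toy example the attacking team occupies the larger area, in line with the pattern described above.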
Finally, physical fitness and conditioning in soccer should be achieved using high-intensity intermittent actions (anaerobic endurance), lower-intensity and longer efforts (aerobic endurance), and considerable power and strength aimed at performing very fast and intensive actions (neuromuscular efforts); together, these can result in agility and flexibility training tasks. Therefore, from the more than 200 variables that may be extracted from internal/external load, the challenge is to identify the most relevant variables in each category. In the studies that performed principal component analysis, 17 variables showed the highest percentages explaining players' performance: anaerobic endurance (i.e., angular velocity, speed displacements, distance at high metabolic load, HSR, sprint running, maximum velocity), aerobic endurance (i.e., distance covered, distance covered slower than 6 km/h, distance covered between 21 and 24 km/h, metabolic power, dynamic stress load), and neuromuscular efforts (i.e., jumps, impacts, accelerations, decelerations, maximum acceleration, maximum deceleration).
In summary, both head coaches and physical fitness and conditioning coaches should design training tasks in which players are required to make optimal use of the occupied space. Simultaneously, players need to perform many ball controls, passes and shots during high-intensity aerobic endurance efforts, combined with a high number of impacts and accelerations/decelerations. For example, tasks designed on the basis of differential learning have been shown to improve players' performance in different dimensions (technical, tactical, and conditional) at the same time [52,53].
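As an illustration of how band-based distance variables such as "distance covered slower than 6 km/h" or "distance covered between 21 and 24 km/h" can be derived, the sketch below sums the distance travelled while speed falls inside a band. The function name, the 10 Hz sampling rate, and the speed trace are assumptions made for the example; tracking systems differ in how they filter and band speed data.

```python
def distance_in_band(speeds_kmh, sample_rate_hz, low_kmh, high_kmh):
    """Distance in meters covered while low_kmh <= speed < high_kmh."""
    dt = 1.0 / sample_rate_hz  # seconds between consecutive samples
    return sum(v / 3.6 * dt for v in speeds_kmh if low_kmh <= v < high_kmh)


# Hypothetical 2-second speed trace sampled at 10 Hz:
# one second walking at 5 km/h, then one second running at 22 km/h.
speeds = [5.0] * 10 + [22.0] * 10

slow_distance = distance_in_band(speeds, 10, 0.0, 6.0)
fast_distance = distance_in_band(speeds, 10, 21.0, 24.0)
print(f"below 6 km/h: {slow_distance:.2f} m, 21-24 km/h: {fast_distance:.2f} m")
```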

Basketball
Basketball has some unique qualities that distinguish it from other team sports when considering how to assist coaches with athlete selection decisions. Different authors have reported that hand measures, height, weight, muscle mass and fat mass or body fat % are the main anthropometric factors for talent identification in basketball. In fact, anthropometric measures have become one of the most important considerations in talent identification programs: anthropometric metrics formed the second largest principal component in basketball talent identification, explaining 20% of the variance [54]. Moreover, the first and third components related to talent identification involve sprint test performance from 10 to 30 m, plus flexibility and agility variables. These metrics and variables are consistent with those identified in this systematic review, in which these tests are supported for talent identification processes, together with countermovement jump and squat jump testing [42]. Therefore, anthropometrical variables and sprint, flexibility and agility tests seem to be the primary basis for talent identification in basketball.
Unlike in football, in basketball the technical and conditional dimensions have been the main research topics for data reduction through PCA. In contrast, positional decision-making variables (tactical dimension) have not been widely investigated in basketball [55], and their application has been lower than in soccer [56-58]. For conditioning in basketball, distance over 18 km/h, maximum accelerations and decelerations, together with impacts of 3-5 g, average landing and take-off, and relative distance have been found as principal components among basketball's load variables [15,43,44]. This is consistent with studies highlighting that smaller spaces per player are related to a high number of accelerations/decelerations and more high-intensity displacements [59,60]. When designing training tasks focused on these variables, free throws, 2- and 3-point shots, passes, turnovers, and defensive and offensive rebounds have been highlighted as essential requirements [17,35-37].
Beyond the focus on performance improvement, it should be highlighted that the sudden basketball-specific actions identified by the principal components (i.e., throws, turnovers, rebounds, accelerations/decelerations, and take-offs) make basketball a team sport with a high incidence of injury [61,62]. Thus, training processes should include coadjutant exercises focused on injury prevention. Specifically, injury prevention programs should be carried out during the in-season period because they are more effective than pre-season programs, at least for anterior cruciate ligament injury prevention [61]. Stojanovic and Ostojic [62] reported that stretching, proprioception, strength, plyometric and agility drills with additional verbal and visual feedback on proper landing technique lead to a decreased rate of anterior cruciate ligament injury in team sport athletes. Due to the high eccentric contractions during basketball-specific actions, overuse and inflammatory conditions accounted for more than 39% of injuries during a 32-week basketball season [63]; therefore, coadjutant training programs should take this into account.

Rugby
From a holistic viewpoint, players' and teams' positioning is the main basis on which to develop the remaining dimensions (technical, conditioning, and psychological). Although distances between rugby players have been assessed [64], they are not widely evaluated [65]. One study reported that time of possession, speed of play and playing form variables are the appropriate measures to assess tactical issues. However, further studies are needed to draw robust conclusions [8]. As with the tactical dimension, the most relevant technical variables were not extracted. Therefore, although some actions such as collisions/tackles and passes are highly considered in rugby [66,67], PCA is necessary to extract the most critical information from technical actions.
Technology capable of extracting physical and physiological variables is widely used in rugby [66]. Therefore, this is the dimension most often considered when extracting the most relevant variables [13,45,46,48]. Rugby is a team sport in which players engage in repeated high-intensity exercise involving frequent collisions, especially during the most demanding passages [66]. To assess players' performance during matches, the most relevant variables were aerobic/anaerobic endurance (rating of perceived exertion, cumulative load, week-to-week load increase, heart rate, player load and high-speed distance), in combination with high-intensity impacts [13,45,46,48]. Therefore, coaches should design training tasks in which the players perform robust high-intensity activity (i.e., HSR and sprint) separated by short bouts of lower-intensity activities (i.e., standing, walking, and jogging), together with collisions and wrestling bouts [47,66]. These tasks could be based on worst-case scenarios to ensure players' performance in these critical situations [66]. In these cases, high-speed running efforts and tackle success are important [67]. As a consequence of the high frequency of collisions, especially head/face collisions, complementary training programs are essential to reduce injury incidence [68,69].
Overall, each team sport's structural constraints define the players' and teams' behavior, and consequently the main variables regarding talent identification, training design, and match analysis differ between them. However, team sports are similar in that they are high-intensity sports in which a high degree of aerobic endurance is mixed with sub-maximal strength and power aimed at carrying out fast and intensive actions such as jumps, accelerations or decelerations. Because abilities, decision-making, movement and psychological state are all involved in a player's motor action, multivariate analysis should be performed to ensure optimal player performance in the most critical session: the official match.
However, in this systematic review, no study applied a data reduction technique to a data set involving variables from several dimensions. Therefore, the most relevant variables of each dimension in each team sport were extracted focusing on a single dimension only. In this regard, the general conclusion is that a combination of 3-4 principal components is needed to explain performance in team sports, at least when the extraction criterion was set at an eigenvalue greater than one. Intensity training load metrics with principal component "loadings" above 0.6 or 0.7 were deemed to possess well-defined relationships with the extracted principal component.
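Two of the load variables mentioned above, cumulative load and week-to-week load increase, can be derived from a series of weekly loads. The sketch below shows one possible definition (running sum and percentage change relative to the previous week); the included studies may define these variables differently, and the function names and weekly values (arbitrary units) are hypothetical.

```python
from itertools import accumulate


def cumulative_load(weekly_loads):
    """Running total of the weekly loads."""
    return list(accumulate(weekly_loads))


def week_to_week_change(weekly_loads):
    """Percentage change of each week relative to the previous one.

    Assumes every weekly load is positive, so the division is defined.
    """
    return [
        (current - previous) / previous * 100.0
        for previous, current in zip(weekly_loads, weekly_loads[1:])
    ]


# Hypothetical weekly loads in arbitrary units.
weekly_loads = [2000, 2200, 2750, 2475]

totals = cumulative_load(weekly_loads)
changes = week_to_week_change(weekly_loads)
print(totals)
print([round(change, 1) for change in changes])
```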
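The retention rules mentioned above (eigenvalue greater than one, loadings of at least 0.6) can be sketched as follows. This is an illustrative implementation on synthetic data, assuming NumPy is available; the function name `pca_kaiser`, the threshold default, and the data-generating setup are assumptions, and no rotation is applied, although several studies may have used one.

```python
import numpy as np


def pca_kaiser(data, loading_threshold=0.6):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)        # returned in ascending order
    order = np.argsort(eigvals)[::-1]              # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    keep = eigvals > 1.0                           # eigenvalue-greater-than-one rule
    explained = eigvals[keep] / eigvals.sum()      # share of total variance
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    salient = np.abs(loadings) >= loading_threshold
    return eigvals, explained, loadings, salient


# Synthetic data: four variables driven by one latent factor, two by another.
rng = np.random.default_rng(0)
n = 200
factor_a = rng.normal(size=(n, 1))
factor_b = rng.normal(size=(n, 1))
data = np.hstack([
    factor_a + 0.3 * rng.normal(size=(n, 4)),
    factor_b + 0.3 * rng.normal(size=(n, 2)),
])

eigvals, explained, loadings, salient = pca_kaiser(data)
print("components retained:", explained.size)
print("cumulative variance explained:", round(float(explained.sum()), 2))
print("variables with a salient loading per component:", salient.sum(axis=0))
```

With six input variables reduced to the retained components, the same function also gives the kind of data-set reduction and explained-variance figures summarized in this review.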

Conclusions
Since team sports' performance depends on different dimensions (i.e., technical, tactical, and conditional), multivariate data analysis should be performed for three purposes: to make game analysis efficient, to design training tasks based on the most relevant efforts, and to highlight the most relevant variables for talent identification. The most critical variables extracted, which differ between team sports in each dimension, were:

Soccer
• For talent identification: anthropometrical variables, together with a player's relative age.
• For training design and performance analysis: Tactical dimension: occupied space. Technical dimension: ball control, passes and shots on goal. Conditional dimension: angular velocity, speed displacements, distance at high metabolic load, HSR, sprint running, maximum velocity, distance covered, distance covered slower than 6 km/h, distance covered between 21 and 24 km/h, metabolic power, dynamic stress load, jumps, impacts, accelerations, decelerations, maximum acceleration, maximum deceleration.

Basketball
• For talent identification: anthropometrical variables (hand measures, height, weight, muscle mass, and fat mass or body fat %), and 10 to 30 m sprint, flexibility, and agility tests.
• For training design and performance analysis: Technical dimension: free throws, 2- and 3-point shots, passes, turnovers, defensive and offensive rebounds. Conditional dimension: distance over 18 km/h, maximum accelerations and decelerations, together with impacts of 3-5 g, average landing and take-off, and relative distance.

Rugby
• For talent identification: this review cannot give any information because of the lack of literature on this aspect.
• For training design and performance analysis: Tactical dimension: possession, speed of play, playing form, and infringement. Conditional dimension: rating of perceived exertion, cumulative load, week-to-week load increase, heart rate, player load, high-speed distance, and impacts.
However, since these variables were extracted from different studies, further research should perform PCA on databases with variables from different dimensions and analyze the impact of some dimensions on the behavior of others (e.g., biomechanical on physical/physiological, tactical dynamics on technical efficacy).

Practical Applications for Training Task Design and Performance Analysis
Training tasks should ensure the development of players' tactical, technical and conditional dimensions. Both head coaches and physical fitness and conditioning coaches should use only training tasks in which all dimensions are considered. Based on the variables that formed each principal component in each team sport, training tasks should differ between sports:
• Soccer: coaches should ensure that the occupied area is suitable during the game. Together with ball controls, passes, and shots, this should guide task design, which should combine a high level of aerobic endurance with very intensive and short actions.

Funding: This research received no external funding.