Psychosocial and Physiological Factors Affecting Selection to Regional Age-Grade Rugby Union Squads: A Machine Learning Approach

Talent selection programmes choose athletes for talent development pathways. Currently, the set of psychosocial variables that determine talent selection in youth Rugby Union is unknown, with the literature almost exclusively focusing on physiological variables. The purpose of this study was to use a novel machine learning approach to identify the physiological and psychosocial models that predict selection to a regional age-grade rugby union team. Age-grade club rugby players (n = 104; age, 15.47 ± 0.80 years; U16, n = 62; U18, n = 42) were assessed for physiological and psychosocial factors during regional talent selection days. Predictive models (selected vs. non-selected) were created for forwards, backs, and across all players using Bayesian machine learning. The generated physiological models correctly classified 67.55% of all players, 70.09% of forwards, and 62.50% of backs. Greater hand-grip strength, faster 10 m and 40 m sprint times, and greater power were common features for selection. The generated psychosocial models correctly classified 62.26% of all players, 73.66% of forwards, and 60.42% of backs. Reduced burnout, reduced emotional exhaustion, and lower reduced sense of accomplishment were common features for selection. Selection appears to be predominantly based on greater strength, speed, and power, as well as lower athlete burnout.


Introduction
Talent identification programmes assess the attributes of athletes to guide talent selection programmes [1]. The aim of talent selection programmes is to select players with the potential to be the 'sporting superstars' of tomorrow and help clubs/governing bodies achieve their long-term performance goals [2]. In furtherance of this long-term goal, selected players are usually integrated into talent development programmes which attempt to provide a learning environment that helps players achieve their potential [1]. However, talent selection programmes feature common problems. Firstly, youth performance is frequently used to predict success in adulthood when making selection decisions [3][4][5], despite youth performance offering low predictive accuracy [6][7][8]. For example, only 17% of male U18 sprinters who ranked among the top 50 highest performers internationally achieved the same ranking at senior level. Secondly, talent selection decisions are often made based on subjective criteria [1,3,4,9,10]. For example, interviews of national youth soccer coaches revealed that perceptions of talent and consequent selection decisions are primarily based on implicit coach preferences [9]. Consequently, current approaches to stages would identify similar predictive physiological variables to previous investigations in rugby union [31] and similar psychosocial variables to previous investigations in other sports [34][35][36][37][38][39][40].

Participants
A total of 104 male U16 and U18 Rugby Union players (Mage = 15.47, SDage = 0.80; U16 n = 62; U18 n = 42) who attended one of two North Wales Rugby 'Talent Camps' in 2019 or 2020, volunteered to take part and gave informed consent in-line with institutional ethics guidelines. Of the 104 players who attended, 66 players were selected and 38 were not selected to the regional squads. Of the selected players, 37 were forwards (of which 16 = U16 and 21 = U18) and 29 were backs (of which 17 = U16 and 12 = U18). Of the non-selected players, 19 were forwards (of which 16 = U16 and 3 = U18) and 19 were backs (of which 13 = U16 and 6 = U18). These selections formed the six classification groups for analyses (i.e., selected players vs. non-selected players, selected forwards vs. non-selected forwards, and selected backs vs. non-selected backs).

Procedure
Players from regional squads and eligible age-grade clubs received an invitation to participate in a 1-day 'talent camp' in early spring 2019 or 2020, to assess their suitability for selection to a regional U16s or U18s rugby academy. Prior to these talent camps, players were advised to rest. During the talent camp, players completed a range of physiological and psychological assessments in a station-format which players rotated around until all tests were completed, followed by rugby matches. The selection decisions were made by regional coaches and based on subjective perceptions of performance during matches held on the talent days. For the purpose of this investigation, players were assessed on demographics, anthropometric, performance, and psychosocial measures (with the former 3 comprising 21 'physiological' variables and the latter 47 'psychosocial' variables) to identify differential features between those who were selected and not selected for the regional academy.
Physiological demographic measures included self-reported weekly physical activity hours (assessed in 5-h increments, starting at 1-5 and going up to 30 h+), self-reported weekly training frequency with the academy before the talent camp (to the nearest integer), self-reported incidence of a significant injury during their career (assessed as 'yes' or 'no'), and birth quarter (determined via birthday as: quartile 1 = September 1st to November 30th; quartile 2 = December 1st to February 28th/29th; quartile 3 = March 1st to May 31st; quartile 4 = June 1st to August 31st). For physiological anthropometric measures, players removed all heavy garments and footwear prior to recording measurements. Players' body mass (kg) was measured using electronic column scales (Seca 799, GmbH, Hamburg, Germany). Standing height and sitting height (cm) were measured using a portable stadiometer (HR001, Tanita Europe BV, Amsterdam, The Netherlands) and leg length was calculated as the difference between standing and sitting height (cm). Body Mass Index (BMI) was calculated as weight (kg) divided by height (in metres) squared. The Reciprocal Ponderal Index (RPI), also known as Sheldon's index [48], was calculated using the following equation: $\mathrm{RPI} = \mathrm{height\,(cm)} / \mathrm{weight\,(kg)}^{0.333}$. Before measurement of physical performance measures, all participants completed a standardised (in terms of time and intensity) warm-up administered by regional strength and conditioning coaches and were briefed on how to execute each assessment. The counter movement jump was performed on a jump mat (JustJump, Probiotics Inc, Huntsville, AL, USA) indoors while wearing trainers, to assess jump height (cm) and peak anaerobic lower body power (W) using the Sayers Equation [49]; hands were positioned on the hips and the best jump height from three trials was recorded [50].
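The birth-quarter coding and the anthropometric indices described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, and the RPI exponent of 0.333 is taken directly from the text.

```python
from datetime import date


def birth_quarter(birthday: date) -> int:
    """Return the birth quartile (1-4) for a selection year starting 1 September."""
    # Months elapsed since September: Sep -> 0, Oct -> 1, ..., Aug -> 11.
    months_since_september = (birthday.month - 9) % 12
    return months_since_september // 3 + 1


def leg_length_cm(standing_height_cm: float, sitting_height_cm: float) -> float:
    """Leg length as the difference between standing and sitting height."""
    return standing_height_cm - sitting_height_cm


def bmi(mass_kg: float, height_cm: float) -> float:
    """Body Mass Index: mass (kg) divided by height (m) squared."""
    height_m = height_cm / 100
    return mass_kg / height_m ** 2


def rpi(mass_kg: float, height_cm: float) -> float:
    """Reciprocal Ponderal Index: height (cm) divided by mass (kg) ** 0.333."""
    return height_cm / mass_kg ** 0.333
```

Because mass sits in the denominator of the RPI, a heavier player of the same height has a lower RPI.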
A hand grip strength test (Takei 5001 Grip-A Handgrip Dynamometer, Takei Scientific Instruments Co, Niigata, Japan) was used to infer strength (kg) within the dominant and non-dominant arm; participants stood with their back against a wall with their testing arm at 10°-15° from the shoulder and elbow flexed at 90°, with the highest score from two attempts (per arm) recorded [51]. Time (s) taken to sprint 10 and 40 m was recorded using timing gates (Brower Timing Systems, Draper, USA) on a 3G artificial grass pitch while wearing rugby shoes with studs; each sprint distance was completed twice with a 2-min rest between each repetition, with the fastest time recorded for each player. For the 40 m sprint: velocity was calculated as 40 m divided by the time taken to complete the 40 m; acceleration was calculated as velocity divided by the time taken to complete the 40 m sprint; force was calculated as acceleration multiplied by weight (kg); momentum was calculated as the velocity multiplied by the player's weight (kg); and average power was calculated using the Harman Formula [52].
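The 40 m sprint variables defined above follow directly from the sprint time and body mass. The sketch below is illustrative only: the function name and the returned dictionary keys are hypothetical, and average power via the Harman Formula is omitted because the text does not state its terms.

```python
def sprint_metrics(time_40m_s: float, mass_kg: float, distance_m: float = 40.0) -> dict:
    """Derive mean sprint variables from a 40 m time (s) and body mass (kg)."""
    velocity = distance_m / time_40m_s      # m/s, mean velocity over the sprint
    acceleration = velocity / time_40m_s    # m/s^2, velocity divided by sprint time
    force = acceleration * mass_kg          # N, acceleration multiplied by mass
    momentum = velocity * mass_kg           # kg*m/s, velocity multiplied by mass
    return {
        "velocity": velocity,
        "acceleration": acceleration,
        "force": force,
        "momentum": momentum,
    }


# Hypothetical player: 40 m in 5.0 s at 80 kg body mass.
example = sprint_metrics(5.0, 80.0)
```

Since force and momentum are both scaled by mass, two players with identical sprint times can differ on these features purely through body mass.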
The psychosocial questionnaires were administered to players in two questionnaire packs during the 1-day 'talent camp'. Players were informed that their responses would not affect their selection. The first questionnaire pack gathered training behaviours (e.g., goal orientation, commitment, athlete identity). Players were also asked to report the number of hours of employed work they completed every week. The second questionnaire pack examined competitive experiences and personality traits (e.g., optimism, perfectionism, alexithymia). Questionnaires were chosen based on previous research which has identified these psychological constructs as important for athlete development [38]. In order to include several components and to circumvent issues with excessive questionnaire length, two items per construct were included. For complete information on the psychosocial variables collected, original sources, and items used, see Appendix A.

Data Analysis
To evaluate which features (i.e., predictor variables) best classified (i.e., determined) group membership (selected vs. non-selected), Bayesian pattern recognition was performed; a complete list of the features evaluated by the pattern recognition analysis (21 physiological and 47 psychosocial variables) can be found in Table 1. To explore the relative importance of factors within their topic-domains and player-positional categories, and to reduce the likelihood of machine learning overly reducing the features considered within datasets, we split analyses by positions (i.e., backs and forwards) and domains (i.e., physiological and psychosocial). Pattern recognition was performed using the open-source programming language R (R Core Development, 2021). Within this coding environment, the Tidyverse package [53] was used to perform advanced data manipulation, and the RWeka package [54] was used to interface R with WEKA machine learning algorithms [55]. Analysis comprised three stages: first, features were standardised as part of data pre-processing; second, feature selection was performed to filter the dataset to a combination of its most predictive features, thus creating 'models' of features that best differentiated group classification; and third, the classification accuracy of the created models was tested to evaluate how well the created models should predict group membership in future.

Pre-Processing
For all analyses, the data of U16 and U18 players were standardised and amalgamated. The raw U16 data were transformed into z-scores using the U16 means and standard deviations, and the raw U18 data were transformed into z-scores using the U18 means and standard deviations. Therefore, when the z-scored U16 and U18 data were amalgamated, z-scores indicated how much greater/less athletes scored on a feature (i.e., predictor variable) compared to their age-group peers. For data processing purposes, each z-scored feature was rescaled to a 0 to 100 scale (with a score of 50 representing the age-group mean, a score of 60 representing 1 SD above the age-group mean, etc.). The purpose of amalgamating the data of U16 and U18 players was to: construct/evaluate classification models with greater accuracy via a larger dataset; identify features which determine overall 'age-grade' rugby union selection; and aid interpretation because similar features and model classification accuracies emerged for U16 and U18 players when analysed separately.
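The within-age-group standardisation described above amounts to computing a z-score against the player's own age group and then mapping it to 50 + 10z. The following is a minimal sketch, assuming the sample standard deviation is used (the text does not say which estimator was applied); the record layout and the names `age_group`, `grip_kg`, and `standardise_by_age_group` are hypothetical.

```python
from statistics import mean, stdev


def standardise_by_age_group(records: list, feature: str) -> list:
    """Rescale a feature to 50 + 10 * z, with z computed within each age group."""
    # Collect the raw values of the feature for each age group.
    by_group = {}
    for record in records:
        by_group.setdefault(record["age_group"], []).append(record[feature])

    # Assumption: sample standard deviation (n - 1 denominator).
    stats = {group: (mean(vals), stdev(vals)) for group, vals in by_group.items()}

    scaled = []
    for record in records:
        group_mean, group_sd = stats[record["age_group"]]
        z = (record[feature] - group_mean) / group_sd
        scaled.append(50 + 10 * z)
    return scaled


# Hypothetical grip-strength data (kg) for two age groups.
players = [
    {"age_group": "U16", "grip_kg": 30.0},
    {"age_group": "U16", "grip_kg": 40.0},
    {"age_group": "U16", "grip_kg": 50.0},
    {"age_group": "U18", "grip_kg": 40.0},
    {"age_group": "U18", "grip_kg": 50.0},
    {"age_group": "U18", "grip_kg": 60.0},
]
scores = standardise_by_age_group(players, "grip_kg")
```

In this toy data a U16 player with 50 kg and a U18 player with 60 kg receive the same scaled score, because each sits 1 SD above the mean of his own age group. That is what allows the two age groups to be pooled.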

Feature Selection for Model Creation
Feature selection involved the use of correlation attribute evaluator [56], relief F attribute evaluator [57], gain ratio attribute evaluator [58], and info gain attribute evaluator [58], to identify (up to) 15 of the strongest features for determining group membership (i.e., selected vs. non-selected). Only features which were identified as being in the top 15 by at least two feature selection algorithms could become part of a 'model' and proceed in the analysis (this criterion was set arbitrarily based on the number of variables collected, prior to any data analysis). Some of the feature selection algorithms used can return fewer than 15 features if the remaining features are deemed insufficiently predictive [56][57][58]. The resulting models were the combination of features within the dataset that best predicted group classification. Feature selection was performed a total of 6 times to create 6 models for 3 position conditions (all players, forwards, and backs) × 2 feature subsets (physiology features and psychosocial features).

Model Classification Accuracy
Each of the six models created by the feature selection stage had its classification accuracy tested (i.e., how accurate, in percentage terms, a model is in predicting group membership) via the use of Naïve Bayes [59], J48 decision tree [60], Support Vector Machine [61], and K-nearest neighbour [62] classification algorithms. These algorithms assigned each player an expected group membership (selected or non-selected) based on their score on features within the model. This process was iterated using a 'leave one out' cross-validation procedure wherein classification algorithms were performed repeatedly, with each player's data left out once [45]. Thus, the final classification accuracy reported was the average percentage accuracy across each iteration. This 'leave one out' cross-validation procedure was chosen over a training/validation sample-split to create the most accurate possible models (i.e., via an as-large-as-possible dataset during feature selection), whilst still minimizing the overfitting of results to the specific dataset and preserving generalizability (i.e., via the conservative nature of the 'leave one out' method) [45]. Table 2 contains the models created by feature selection and their overall classification accuracy. All models comprised between three and six features; naturally, fewer features were agreed on by more feature selection algorithms. Classification accuracy of the models ranged between 60 and 72% and was better than chance.
11.08 ± 2.19 questionnaire score), lower burnout (27.12 ± 5.79 vs. 29.37 ± 5.99 questionnaire score), lower exhaustion (9.15 ± 2.69 vs. 10.24 ± 3.55 questionnaire score), and lower introjected regulation (4.12 ± 2.53 vs. 4.82 ± 3.14 questionnaire score) than non-selected players.
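The 'leave one out' cross-validation procedure described in the Model Classification Accuracy section can be illustrated with a small sketch. It assumes a 1-nearest-neighbour classifier as a stand-in for the four WEKA classifiers named there, and the data are invented; `nearest_neighbour_predict` and `leave_one_out_accuracy` are hypothetical names.

```python
import math


def nearest_neighbour_predict(train: list, query: tuple) -> str:
    """Return the label of the training player closest to the query features."""
    _, label = min(train, key=lambda row: math.dist(row[0], query))
    return label


def leave_one_out_accuracy(data: list) -> float:
    """Percentage of players classified correctly when each is held out once."""
    correct = 0
    for i, (features, label) in enumerate(data):
        # Train on every player except the one currently held out.
        train = data[:i] + data[i + 1:]
        if nearest_neighbour_predict(train, features) == label:
            correct += 1
    return 100 * correct / len(data)


# Invented single-feature data: (scaled score,), group membership.
toy_players = [
    ((60.0,), "selected"),
    ((62.0,), "selected"),
    ((45.0,), "selected"),
    ((40.0,), "non-selected"),
    ((42.0,), "non-selected"),
    ((44.0,), "non-selected"),
]
accuracy = leave_one_out_accuracy(toy_players)
```

Each player is predicted by a classifier that never saw that player, so the reported accuracy is not inflated by testing on training data, while all remaining players still contribute to every fit.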

Discussion
The aim of this study was to measure primary physiological and psychosocial factors in age-grade Rugby Union players and to utilize a novel Bayesian pattern recognition technique to identify which attributes differentiate between selection and non-selection to regional U16 and U18 performance pathways. The main findings of this investigation suggested that the generated physiological models correctly classified 67.55% of all players, 70.09% of forwards, and 62.50% of backs. Greater hand-grip strength, faster 10 m and 40 m sprint, and power were common features for selection. The generated psychosocial models correctly classified 62.26% of all players, 73.66% of forwards, and 60.42% of backs. Reduced burnout and emotional exhaustion, and lower reduced sense of accomplishment, were common features for selection. Selection appears to be predominantly based on greater strength, speed, and power, as well as lower athlete burnout. Of note, the greater specificity and lower sensitivity across all analyses suggests that non-selected players were easier for the algorithms to identify. This finding is logical when one considers that players who should not be selected likely stand out more (e.g., particularly slow/weak) compared to players who should be selected (i.e., where the margins may be finer). The present investigation offers an arguably more comprehensive test of factors than previous studies into age-grade rugby selection (e.g., [21]); is the first attempt to objectively understand the currently subjective decision-making that determines selection to regional age-grade academies in Wales; and tests the role of psychosocial and physiological attributes via new and cutting-edge analytical methods.
The findings of this investigation provide a unique insight into differences in psychosocial components between selected and non-selected players. The results suggest that selected players (generally across positions) reported lower levels of overall burnout and specifically lower exhaustion and lower reduced sense of accomplishment compared to non-selected players. Consistent with previous research in Rugby Union [32,33], these results suggest that burnout is a prominent factor in the sport. Interestingly, the present pattern recognition analysis did not support previously proposed theoretical explanations of burnout, exhaustion, and reduced sense of accomplishment, such as perfectionism and coping [32,63]; it is possible that the mechanisms producing these outcomes were too individualized within the present sample to be identified at the feature selection stage.
Regardless of the precise mechanisms leading to burnout, results highlight the need for coaches to consider how it could ultimately derail athlete progression within talent selection and development [10].
The psychosocial results also reveal differences across forwards and backs. Forwards report lower life stress, which is logical when viewed in line with results on burnout [64]. The selected forwards also reported lower scores in difficulties describing feelings, which is a component of the personality trait Alexithymia [65]. Those high in Alexithymia often are unable to express and recognise their emotions leading to difficulties in regulating emotions and difficulties with interpersonal relations [65]. Alexithymia is relatively under-researched in athletes, but some research has linked those high in Alexithymia with risk-taking [66], and endurance sports [67]. From a forward's perspective, it could be argued that the lower scores related to difficulty describing emotions are indicative of greater emotional regulation, and their ability to resolve negative emotions that arose from stressful aspects of life [67] and physical demands of playing forward. This is again logical when considered with the reports of lower life stress and burnout. Future research should attempt to tease the present findings apart by further investigating the coping strategies that might differ across positions.
Whilst research has shown positional differences from a relative age effect [68] and physiological perspective [69], the present investigation is, to the authors' knowledge, the first to reveal positional differences in psychosocial components of selected vs. non-selected rugby players. Research by Dimundo et al. [21] included one measure of cognitive skills in rugby union players but found no significant differences across players and went on to call for future research to include psychosocial characteristics in talent identification/selection methodologies. Adopting a battery of psychological tests comprising fewer items in the present investigation provided an opportunity to assess players on a wide variety of relevant psychosocial components. Recent applied research [70,71] has also adopted this method of utilizing fewer items per construct, both to facilitate a broad assessment and to encourage athlete engagement. Whilst we would recommend that any psychosocial investigations such as this are followed up on a more detailed basis between a sport psychologist and coaches, the method adopted here does facilitate a broad understanding of psychosocial components relevant to talent identification, selection, and development.
Physiological models correctly classified selected players in the range between 62.50% and 70.09% and were stronger predictors of selection than psychosocial models, which has been alluded to previously [21]. In addition, the common features for selection within our models are generally in agreement with previous research examining differences in physical and performance measures between selected and non-selected players. For example, in the present investigation, greater hand grip strength was a performance feature important for selection across all players and within positional categories. Others have confirmed that greater strength in general [72,73] and handgrip strength specifically [19,21,74] is a characteristic of selection to rugby performance pathways and distinguishes between standard of play in age-grade players [75].
Sprinting speed is an important physical quality in Rugby Union and is associated with many performance parameters such as evading opponents, line and tackle breaks and has been shown to distinguish between selected and non-selected age-grade players [18,31,73,75]. In the present investigation, selected players recorded faster sprint times over 10 m (all players and forwards) and 40 m (forwards and backs). Indeed, 10 m sprinting speed was one of the features within the model that correctly classified 67.55% of all players, and coaches and sport scientists should ensure the inclusion of these assessments into talent selection programmes.
Previous research has consistently shown that selection for Rugby Union performance pathways across U15-U21 age grades is biased towards taller [73,75] and heavier players [21,75]. This may explain the well-established selection bias towards relatively older players [31,76,77,78] and to some degree early maturing players [79] (i.e., the relative age effect). Notably, in the present investigation, stature and body mass were not common features of selection for all players regardless of positional category. Although earlier birth quartile (Q1 and Q2) was part of the model for selected backs, this was not a feature for selection in the physiological models covering all players and forwards and may partly explain these findings. Despite its absence as a direct feature for selection, body weight did appear to be an important factor when expressed as momentum (backs), force (forwards) and power during 40 m sprinting (all players and forwards). Further suggestion of the importance of body shape and size was evidenced via a lower Reciprocal Ponderal Index (RPI) as an important feature for selection for forward players. The RPI is an index of adiposity calculated as standing height divided by the cube root of body weight and, based on allometric modelling, has a stronger mathematical foundation than BMI, as weight is a variable of cubic dimensions [48]. RPI has been associated with performance in sports such as soccer and tennis [80,81]. The lower RPI found in selected forwards in the present investigation implies that greater body mass rather than a more linear (ectomorphic) body shape is an important factor in terms of selection within this positional category.
The methods used to derive the aforementioned psychosocial and physiological findings feature several strengths. Firstly, the present investigation was the first to directly assess the role of primary player-derived psychosocial attributes on talent selection in Rugby Union. Secondly, the novel pattern recognition analysis performed on physiological features revealed similar predictive features for talent selection in Rugby Union to previous correlational studies [31] which, importantly, gives confidence to the psychosocial features identified as predictive for the first time. Lastly, a rigorous and conservative 'leave one out' cross-validation classification procedure was used. This classification procedure facilitates more accurate feature models (i.e., via an as-large-as-possible dataset during feature selection) whilst minimizing the overfitting of results to the specific dataset (i.e., by testing classification accuracy on the entire sample, instead of a small validation-specific sample) [45,46]. The newfound knowledge from the present investigation can be used by coaches, managers, parents, and guardians in making sure youth Rugby Union players are adequately developed and supported for future success. Coaches may wish to prioritize the physiological development of relatively stronger and faster players, while parents and guardians may wish to monitor for signs and causes related to burnout and exhaustion. Such provision should position Rugby Union players optimally for selection by regional age-grade academies.
It is important, however, to note some of the present investigation's limitations. Classification accuracies (60-74%) were lower than those of studies utilizing similar machine learning approaches in other sporting domains [82,83]. However, this result can be expected for two reasons. The regional academy's subjective/intuition-based selection criteria likely introduce inevitable statistical 'noise', and the present investigation's conservative 'leave one out' cross-validation classification procedure likely resulted in lower classification accuracies. One method to increase classification accuracy despite this could be the use of even more comprehensive test batteries (e.g., via evaluations of practice histories, technical ability, tactical ability, and performance history). For example, evidence to suggest that the features collected in our study do not capture the role of tactical/technical attributes in determining selection can be seen in the backs' generally lower classification accuracy (~60%) compared to forwards' (~70%). For backs in particular, tactical and technical skill may be a particularly important trait when academies evaluate players. Future studies are encouraged to collect ratings of players' tactical and technical ability from independent coaches, alongside developmental variables such as practice histories, which have demonstrated themselves as important factors in determining future success [83]. Additionally, subsequent investigations may wish to also evaluate the interactive role of aerobic fitness, a variable that was not possible to assess in the present investigation due to time constraints on the talent camp day but has previously demonstrated an ability to differentiate between selected and non-selected rugby union players [31].

Conclusions
This is the first study that has utilized a machine learning approach to examine the factors that determine selection to a regional age-grade Rugby Union academy in Wales. The present investigation offers an arguably more comprehensive analysis of factors than previous studies in this population and informs an objective understanding of the current subjective decision-making that determines selection to regional age-grade academies in Wales. From these findings, it appears that physiological factors are more predictive of selection. Specifically, the findings of this present investigation suggest that greater strength, speed, and power during sprint running were important factors for selection and should be included as routine assessments in talent selection for regional academies. Nevertheless, psychosocial factors were also shown to be important with reduced burnout and emotional exhaustion, and lower reduced sense of accomplishment, common features for selection. Indeed, this is the first study to comprehensively examine psychosocial factors important for selection to rugby academies and the findings add weight to the argument that these factors should be considered as part of a holistic selection framework in Rugby Union. Furthermore, practitioners should also consider position-specific differences in factors important for selection when planning talent selection frameworks. Future studies are encouraged to adopt a holistic approach to talent selection through investigating a comprehensive combination of physiological and psychosocial factors alongside tactical and technical ratings and developmental variables such as practice histories.

Key: ** = Reverse Score (i.e., 1 = 5, 2 = 4, 3 = 3, 4 = 2 and 5 = 1).