Test–Retest Reliability of Task Performance for Golf Swings of Medium- to High-Handicap Players

Background: Golf swing performance in medium- to high-handicap players must be measured reliably if this variable is to be used in research studies and applied settings. Nevertheless, no studies have been published on this topic, and test–retest evidence is only available for low-handicap players. The aim of this study was to determine the number of attempts necessary to obtain a reliable measurement protocol for swing performance variables in medium- to high-handicap players. Methods: Ten amateur players (55.67 (13.64) years, 78.4 (11.4) kg, 1.75 (0.08) m) took part in a test–retest study consisting of two experimental sessions one week apart. In each session, fifteen swings with a six iron and fifteen with a driver were evaluated with a 3D Doppler tracking golf radar. Results: Variables related to side carry could not be reliably measured in medium- to high-handicap players even with fifteen trials (ICC < 0.26, SEM > 12.05 m and MDC > 33.41 m). The remaining performance variables related to club and ball trajectories could be reliably measured with a 3D Doppler radar using between seven and ten swings. Conclusions: At least seven swings are recommended for the driver and ten for the six iron to measure golf swing performance.


Introduction
Golf has become increasingly popular among amateur players [1]. At the beginning of the 21st century, there were about 30,000 golf courses and 55 million players around the world [2]; by 2021, the number of players had increased to 66 million [1]. This popularity has prompted scientific studies aiming to identify the most common injuries in golf players and the main biomechanical and physiological determinants of swing performance. The swing used depends on the type of shot played. There are three main types of swing: full swings, approach or chip shots, and putts [3].
Swing performance is usually measured using variables related to club and ball trajectories and their interaction, such as club head speed, total distance, carry distance, spin loft and smash factor. A more detailed description of swing performance variables can be found in [4]. Several studies have examined the effects of different training protocols on swing performance [5][6][7][8][9][10][11][12][13][14]. A key methodological issue that must be addressed to obtain robust conclusions is the reliability of the protocol used to measure full swing performance, yet very few of the published studies have provided a specific reliability analysis for the testing protocols they used [5][6][7][8][9][10].
Most of these studies measured different performance variables using three- to five-trial protocols and analyzed their reliability with the intraclass correlation coefficient (ICC). Read et al. measured club head speed in a three-trial protocol with low-handicap players (i.e., handicap 5.8 (2.26)) and obtained good reliability (i.e., ICC = 0.87) [10], while other studies used a five-trial protocol with low-handicap players (i.e., handicap ≤ 5) and obtained excellent ICC values (i.e., ICC > 0.9) [5][6][7][8]. Weston et al. used a ten-trial protocol to quantify club head speed, backspin and sidespin in participants with a medium handicap (i.e., 11.2 (6.1)) and obtained moderate to good reproducibility indices (ICC = 0.68-0.84) [9]. Notably, the study on medium-handicap players obtained less reliable results with a ten-trial protocol than the studies on low-handicap players did with fewer trials.
Other non-experimental studies have also quantified swing parameters and analyzed their reliability [15][16][17][18][19]. Schofield et al. measured downswing velocity with a three-trial protocol and obtained good to excellent ICC values (i.e., 0.7-0.98), depending on the load [15]. Lewis et al. used a three-trial protocol to study "maximal" tee shots in low-handicap participants and obtained good to excellent ICC values (i.e., ICC = 0.85-0.95) [16]. Gordon et al. studied club head speed in low-handicap players (i.e., ≤8) and obtained excellent ICC values (i.e., ICC = 0.95) with a five-shot protocol [18], while Barnett et al. [19] developed a process-oriented test to assess the golf swing and putting stroke in children and obtained acceptable test-retest reliability (i.e., ICC = 0.6).
Moreover, only one study has provided reliability data for a five-shot assessment protocol covering a wide variety of performance variables (i.e., club head speed, face angle, club path, attack angle, ball speed, carry, side carry and spin rate). In this study with low- to medium-handicap players (i.e., 9.3 ± 8), Outram and Wheat found that, using five good shots with a driver, five iron and nine iron, the ICC ranged from 0.64 to 0.97 across all performance variables [20]. However, these reliability data for ball and club performance variables were obtained with low- to medium-handicap players, and higher-handicap players may require a different number of strokes to obtain reliable results. This is because the variability in swing performance is lower in low-handicap players than in high-handicap players [21], and greater trial-to-trial variability increases measurement error and therefore lowers reliability.
The players' level is not the only factor that can influence test-retest reliability; the number of swings included in the measurement protocol is another key factor. Several studies (not only on the golf swing) have analyzed the effect of the number of trials in the experimental protocol on test-retest reliability [22,23]. Such analyses allow reliability studies to recommend the number of trials required to carry out a reliable measurement.
As far as the authors are aware, only one study has examined how the number of shots used to compute biomechanical variables related to golf performance affects reliability [20,22]. Severin et al. performed a reliability study that established the number of trials required to achieve good reliability based on the ICC, the standard error of measurement and sequential averaging analysis in low- to medium-handicap players (i.e., 7.8 ± 4.7) using a six iron and a driver [22]. The results showed that the number of trials required to measure range of motion, angular velocity, ground reaction forces and torques during the golf swing varied from 4 to 11. However, these are biomechanical variables rather than performance variables per se.
Thus, there is a gap in the literature regarding the number of trials required to obtain reliable performance measurements in medium- to high-handicap golfers. This information can be crucial for both researchers and coaches who need to evaluate swing performance in this population.
Reproducibility studies are therefore needed to establish the best protocols for measuring swing performance variables in participants with medium to high handicaps. The aim of this study was to determine the test-retest reliability of swing performance variables in medium- to high-handicap players and, in particular, the number of attempts necessary to obtain a reliable measurement protocol in this population. We hypothesized that more trials would be needed to achieve reliable measurements than previously reported for low- to medium-handicap players.

Participants
Sample size was determined using the equation proposed by Bonett et al. [24] and a previously published ICC value (i.e., 0.86) for seven-iron and driver club speeds and distances [5]. With the significance level set at p = 0.05 and 80% statistical power, ten participants were required. Participants were recruited using a non-probabilistic (convenience) sampling method. Eight participants were male and two female, with a mean age of 55.67 (13.64) years, weight of 78.4 (11.4) kg, height of 1.75 (0.08) m and body mass index of 25.62 (2.44) kg/m². The participants' mean handicap was 31.23 (6.63) strokes. The inclusion criteria were: (i) a handicap between 18 and 36 strokes and (ii) no self-reported injury during the 6 months prior to the study.
All participants voluntarily gave their consent to participate, and the protocols applied in this research project were approved by our university's Ethics Committee.

Tasks and Apparatus
The participants performed the tasks described below with a six iron and a driver in two experimental sessions separated by one week, so that test-retest reliability could be determined between the two sessions. The driver was selected because it is one of the clubs most frequently used both by players and in research [5][6][7][8]. The six iron was selected because it is an intermediate iron and has also been used in studies carrying out this type of analysis [22]. At the beginning of the first session, the participants were informed about the study on arrival and signed the informed consent form. They then completed a 10 min warm-up based on joint mobility and low-intensity muscle-resistance exercises, together with practice swings using their own six iron and driver [10]. The experimental task consisted of performing 15 "good" shots at maximum effort, first with the six iron and then with the driver, aiming at an 80 × 80 cm target. "Good" shots were self-determined by the participants according to their own perception and assessment; any trial that a participant reported as a mis-hit or considered unsatisfactory in any other way was excluded from the analysis [22]. The participants used their own clubs during the tests and were allowed 30-60 s of rest between trials. After completing the 15 "good" shots with the six iron, they rested for 5 min and then started the driver shots.
The laboratory was equipped with a golf cage (3 × 3 × 3 m) with a safety net to stop the ball after each shot. The participants stood 1 m from the cage wall, and the ball was placed on an artificial grass mat (1 × 1 m); for driver shots, the ball was placed on a rubber tee. A 3D Doppler tracking golf radar (FlightScope Mevo+, FlightScope EDH, Orlando, FL, USA) was placed 2.4 m behind the middle of the golf mat (Figure 1), as recommended by the manufacturer for indoor measurements. The 80 × 80 cm target was placed behind the net.
The golf radar recorded the performance variables of the subjects' shots, which were stored using the FS golf App (FlightScope EDH, Orlando, FL, USA). The performance parameters included were: ball speed, club speed, smash factor, carry distance, total distance, roll distance, spin rate, apex height, flight time, angle of attack, spin loft, spin axis, lateral distance, launch direction and launch angle [4]. The data parameters are explained on the manufacturer's website (https://flightscopemevo.com/pages/flightscope-data-parameters (accessed on 8 November 2022)).
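Some of these parameters are ratios of others reported by the radar. As a minimal illustration, assuming the conventional definition of smash factor as ball speed divided by club speed (both in the same units), it could be computed as below; the function name and the speeds in the example are hypothetical, and in this study the value was reported directly by the radar software.

```python
def smash_factor(ball_speed: float, club_speed: float) -> float:
    """Ratio of ball speed to club head speed; both must use the same units."""
    if club_speed <= 0:
        raise ValueError("club_speed must be positive")
    return ball_speed / club_speed


# Hypothetical driver shot: ball leaves at 60 m/s from a club head at 40 m/s.
print(smash_factor(60.0, 40.0))  # 1.5
```

Values above 1 simply mean the ball leaves the face faster than the club head was travelling, which is why the ratio is often read as an index of strike quality.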

Statistical Analysis
The statistical analysis was carried out in MATLAB R2021b (MathWorks, Natick, MA, USA). Three test-retest reliability parameters were computed: the intraclass correlation coefficient (ICC; A,k model [25]), the standard error of measurement (SEM) and the minimum detectable change (MDC). These parameters were computed using the first trial from the test and retest days, as well as the mean values obtained with 2 to 15 trials in each of the two testing sessions. This procedure has been used previously to determine the number of shots needed to reach acceptable reliability for swing biomechanical variables [22]. ICC values lower than 0.5 were interpreted as poor, those between 0.5 and 0.75 as moderate, those between 0.75 and 0.9 as good and those greater than 0.9 as excellent reliability [26]. To determine the number of trials required to compute each performance variable, a new index called the required trials index (RTI) was created for this type of reliability study (Equation (1)), where ICC is the intraclass correlation coefficient, i is the number of trials used to compute the ICC and ICC1 is the ICC obtained with only one trial. The same index was obtained for the SEM and MDC by changing the −1 in the numerator to +1. For each performance parameter, the number of trials with the lowest RTI value for the ICC and the highest RTI values for the SEM and MDC was then selected. This asymmetry arises because higher ICC values indicate greater reliability, whereas for the SEM and MDC, lower values indicate greater reliability. The means and standard deviations of all performance variables were calculated using the required number of trials, which, according to the RTI, were seven for the driver and ten for the six iron. Paired-samples Student's t-tests were applied to compare the test and retest means for each variable. The level of significance was set at p = 0.05.
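The three reliability parameters described in this section can be sketched in Python with NumPy. This is a minimal illustration, not the authors' MATLAB code, and it rests on stated assumptions: ICC(A,k) follows the McGraw and Wong formulation (two-way model, absolute agreement, average measures) applied to a participants × sessions matrix; SEM = SD·√(1 − ICC), with SD taken over all test and retest scores; and MDC = 1.96·√2·SEM at the 95% level. The SEM and MDC formulas are not given in the text, although the MDC relation does reproduce the lateral-distance values reported in the abstract (SEM 12.05 m, MDC 33.41 m). In `classify_icc`, the treatment of values lying exactly on a band boundary is an assumption.

```python
import numpy as np


def icc_a_k(data):
    """ICC(A,k), McGraw & Wong (1996): two-way model, absolute agreement,
    average measures. `data` is an (n participants x k sessions) array."""
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # per-participant means
    col_means = x.mean(axis=0)  # per-session means
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)


def sem(data, icc):
    """Assumed formula: SEM = SD * sqrt(1 - ICC), SD over all scores."""
    return np.std(np.asarray(data, dtype=float), ddof=1) * np.sqrt(1.0 - icc)


def mdc95(sem_value):
    """Assumed formula: MDC at 95% confidence = 1.96 * sqrt(2) * SEM."""
    return 1.96 * np.sqrt(2.0) * sem_value


def classify_icc(icc):
    """Bands from the text; exact-boundary handling is an assumption."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Hypothetical data: three participants whose retest scores are all
# exactly one unit higher than their test scores.
scores = [[1, 2], [3, 4], [5, 6]]
icc = icc_a_k(scores)
print(round(icc, 3), classify_icc(icc))  # 0.941 excellent
print(sem(scores, icc), mdc95(sem(scores, icc)))
```

Because the absolute-agreement model penalizes the systematic one-unit shift between sessions, the ICC stays below 1 even though every participant keeps the same rank; a consistency-type ICC would equal 1 for these data.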

Results
Figure 2 shows the ICC values for the driver and six iron using 1 to 15 shots for each swing parameter. Some parameters reached excellent reliability with only one or two shots (i.e., ball speed, club speed, carry distance and total distance), others needed between three and eight shots to reach good to excellent values (i.e., smash factor, roll distance, spin rate, apex height, flight time, angle of attack and spin loft) and others did not reach acceptable values (i.e., spin axis, lateral distance and launch direction). Furthermore, some variables appeared to be measured more reliably with the six iron than with the driver (e.g., launch angle). The SEM and MDC values in Tables 1 and 2 show the same tendency as the ICC results. However, some parameters with poor to moderate ICC values showed relatively low SEM values; for example, the SEM for launch direction ranged from 1.2 to 5.55 degrees. This indicates that absolute reliability indices (i.e., SEM and MDC) should be taken into account alongside relative reliability indices (i.e., ICC) when test-retest reliability is discussed. It should be noted that the number of required trials given by the RTI differed depending on whether the ICC, SEM or MDC was used to compute it (Table 3).
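One way to see why a low ICC can coexist with a small SEM is the classical test-theory decomposition below. It is used here as a simplifying assumption: the ICC(A,k) model also accounts for systematic differences between sessions, and the SEM formula shown is a common one rather than one stated in the text. The observed variance is split into a between-participant component and an error component:

$$
\mathrm{ICC} \approx \frac{\sigma^2_{b}}{\sigma^2_{b} + \sigma^2_{e}}, \qquad
\mathrm{SD}^2 = \sigma^2_{b} + \sigma^2_{e}, \qquad
\mathrm{SEM} = \mathrm{SD}\sqrt{1 - \mathrm{ICC}} \approx \sqrt{\sigma^2_{b} + \sigma^2_{e}}\,\sqrt{\frac{\sigma^2_{e}}{\sigma^2_{b} + \sigma^2_{e}}} = \sigma_{e}
$$

where σ²_b is the variance between participants and σ²_e is the trial-to-trial error variance. The ICC therefore falls when participants are similar to one another (small σ²_b) even if σ_e, and hence the SEM, is small in absolute terms, which offers one possible explanation for the pattern observed for launch direction.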
For example, for the smash factor of driver swings, Table 3 shows that the RTI recommended four shots when the ICC was used as the reliability variable and seven when the SEM or MDC was used. Considering all the data in Table 3 as a whole, seven trials can be recommended for the driver and ten for the six iron to reach good reliability values (i.e., ICC, SEM and MDC) for almost all variables. With the driver, only spin axis and launch direction needed more than seven trials; however, these variables did not reach good reliability values regardless of the number of trials chosen (see Figure 2 and Tables 1 and 2). Therefore, seven trials can be recommended for protocols aimed at measuring driver performance. For the six iron, spin loft, launch direction and launch angle needed 11, 12 and 11 trials, respectively. Nevertheless, spin loft and launch angle also showed excellent reliability with ten trials, and launch direction did not reach good reliability regardless of the number of shots considered. Overall, ten trials seem to be an adequate recommendation for measuring swing performance with a six iron. Table 4 shows the means and standard deviations of the parameters calculated using seven trials for the driver and ten trials for the six iron, together with the results of the paired-samples Student's t-tests between test and retest. There were no significant differences between the test and retest values for any variable.
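The way reliability changes with the number of averaged trials, as plotted in Figure 2 and summarized in Table 3, can be outlined as follows. This sketch assumes that the first i trials of each session are averaged per participant (i = 1, ..., n) and that ICC(A,k) is then computed on the resulting participants × sessions matrix. The RTI selection rule itself is not implemented here; instead, the hypothetical helper `min_trials_for` applies a plainly different and simpler rule, the smallest number of trials whose ICC reaches a chosen threshold.

```python
import numpy as np


def icc_a_k(data):
    """ICC(A,k) (McGraw & Wong, 1996) for an (n participants x k sessions) array."""
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)


def icc_by_trial_count(test, retest):
    """ICC for the mean of the first i trials, for i = 1..n_trials.

    `test` and `retest` are (n participants x n trials) arrays in shot order.
    """
    test = np.asarray(test, dtype=float)
    retest = np.asarray(retest, dtype=float)
    if test.shape != retest.shape:
        raise ValueError("test and retest must have the same shape")
    curve = []
    for i in range(1, test.shape[1] + 1):
        # Average the first i shots of each participant in each session.
        sessions = np.column_stack(
            [test[:, :i].mean(axis=1), retest[:, :i].mean(axis=1)]
        )
        curve.append(float(icc_a_k(sessions)))
    return curve


def min_trials_for(curve, target=0.75):
    """Hypothetical threshold rule (not the RTI): smallest i with ICC >= target."""
    for i, value in enumerate(curve, start=1):
        if value >= target:
            return i
    return None


# Hypothetical data: 3 participants x 2 trials; only the first retest
# shots of the first two participants deviate from the test session.
test = [[10, 10], [20, 20], [30, 30]]
retest = [[14, 10], [16, 20], [30, 30]]
curve = icc_by_trial_count(test, retest)
print([round(v, 3) for v in curve])  # [0.968, 0.993]
print(min_trials_for(curve, target=0.98))  # 2
```

Averaging dilutes the discrepant first shots, so the ICC rises with the number of trials; in real data, the same curve would be computed separately for each variable and club, and the SEM and MDC would be tracked alongside it.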

Discussion
To the authors' knowledge, this is the first study to establish the protocol needed to achieve test-retest reliability when measuring performance parameters of the golf swing in medium- to high-handicap players. The results have important implications for research on how the performance of novice amateur players can be improved and for how coaches can measure players' performance.
First, good reliability was achieved for all performance parameters except spin axis, lateral distance and launch direction, which probably need more than fifteen shots because of their high variability. In relation to this, Severin et al. performed a reliability study establishing that range of motion, angular velocity and ground reaction force variables (biomechanical variables related to swing performance) could be reliably measured in low- to medium-handicap players [22]. Outram and Wheat analyzed the reliability of ball and club trajectory variables (similar to those in the present study) in low- to medium-handicap players and found good to excellent ICC values when five swings were included in the analysis [20]. In their study, lateral distance achieved good reliability (i.e., ICC = 0.7-0.73 and SEM = 1.7-2.4 m, depending on the club used), which contrasts with the poor values obtained in the present study (i.e., ICC < 0.26, SEM > 12.05 m and MDC > 33.41 m, regardless of the number of swings and the club). This difference is probably because lateral carry is the most difficult variable for novice players to control [21]. Similarly, Weston et al. found a lower ICC for spin axis (0.68) than for spin loft or club head speed (i.e., 0.72-0.84), as well as a greater typical error (30.3%, 12.2% and 4.4% for spin axis, spin loft and club head speed, respectively) [9]. It should be noted that the sample in that study consisted of players with a medium handicap (i.e., 11.2), whereas the players in the present study had a higher handicap (i.e., 31.23). It is therefore possible that lateral distance cannot be reliably measured in medium- to high-handicap players with a laboratory protocol and a Doppler radar system, or that more than fifteen swing trials are needed.
Second, for the remaining performance variables, the number of required swings could be set at seven for the driver and ten for the six iron. These values were established using the ICC, SEM and MDC values rather than ICC values alone. For example, although carry distance with the six iron could be measured with a single swing according to the ICC (i.e., 0.88), the MDC decreased from 31.53 m for one trial to 17.97 m for the mean of six trials. This study recommends more trials than the five used by Outram and Wheat in their reliability study [20]. This difference may be due to the disparity in skill level between the golfers in the two studies. In this sense, Read et al. found good reliability (i.e., ICC = 0.87) with low-handicap players when club head speed was measured with a driver using three trials [10]. Other studies using five trials also found excellent values for club head speed with low-handicap players (i.e., ICC > 0.9) [5][6][7][8]. In contrast, Weston et al. found good reliability for club head speed (i.e., ICC = 0.84) with medium-handicap players (i.e., 11.2), but their protocol required ten trials [9]. These results suggest that the lower the players' handicap, the fewer trials are required to reach good reliability. Our results support this hypothesis, since seven to ten trials were required to reliably measure swing performance in medium- to high-handicap players. These numbers of trials are higher than those recommended for low-handicap players (i.e., three to five trials) [5][6][7][8][10][20] and similar to that found for medium-handicap players (i.e., ten trials) [9]. Finally, Severin et al. recommended between 4 and 12 trials for low- to medium-handicap players to measure kinematic and kinetic variables during the golf swing [22]. However, these biomechanical variables were not measured in the present study, so no direct comparisons can be made.
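The practical size of this change can be expressed as an implied SEM if the usual relation MDC = 1.96·√2·SEM (95% confidence) is assumed. This formula is not stated in the text, but it reproduces the lateral-distance values in the abstract (12.05 m × 2.772 ≈ 33.40 m, against the reported 33.41 m, a difference consistent with rounding of the SEM):

$$
\mathrm{SEM} = \frac{\mathrm{MDC}}{1.96\sqrt{2}}, \qquad
\mathrm{SEM}_{1} \approx \frac{31.53\ \mathrm{m}}{2.772} \approx 11.4\ \mathrm{m}, \qquad
\mathrm{SEM}_{6} \approx \frac{17.97\ \mathrm{m}}{2.772} \approx 6.5\ \mathrm{m}, \qquad
\frac{17.97}{31.53} \approx 0.57
$$

Under this assumption, averaging six swings instead of using a single swing reduces the smallest real change in six-iron carry distance that can be distinguished from measurement error by about 43%, even though the single-swing ICC was already good.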
Finally, this study contributes a method for test-retest studies that aim to determine the number of trials required in a motor task to obtain reliable measurements [27]. The required trials index can be used to select the number of trials that achieves reliable measurements while minimizing measurement time (i.e., using as few trials as possible). This index was proposed and used in this study to avoid subjective interpretations by the researchers. Future studies should evaluate the performance of this index in reliability studies of other motor actions and variables.
This study had some limitations that should be highlighted. As only fifteen trials were performed, it was not possible to obtain a reliable protocol for the parameters related to lateral deviation. It is possible that more reliable values could have been obtained for these variables with a greater number of shots.

Conclusions
It can be concluded that the performance variables related to club and ball trajectories of medium- to high-handicap golf players can be reliably measured with a 3D Doppler radar using between seven and ten swings. At least seven swings are recommended for the driver and ten for the six iron. Variables related to lateral deviation could not be reliably measured in medium- to high-handicap players with fifteen swings.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of Universidad Internacional de La Rioja (protocol code: PI033/2022).