What Is Performance? A Scoping Review of Performance Outcomes as Study Endpoints in Athletics

Purpose: This review set out to summarise, define, and provide future direction towards the use of performance outcome measures as endpoints in research performed at international benchmark events in athletics. Methods: Scoping review methodology was applied through a search of the PubMed and SPORTDiscus databases and a systematic article selection procedure. Articles that met the inclusion criteria underwent triage for further quantitative and qualitative analysis. A concept chart was generated to describe the methods by which performance had been measured and to introduce descriptive labels for theoretical and practical application. Results: None of the 2972 articles initially identified from the database search met the triage standards for quantitative data extraction. Eleven articles were included in a qualitative analysis. The analysis identified the common methods by which performance has been measured, reported and analysed. The resulting concept chart collates labels from the qualitative analysis (categories, themes, and constructs) with sports practice labels (performance metrics, framework, and analysis). Conclusions: The state of knowledge concerning methods to employ performance metrics as endpoints in studies performed at major competitions in athletics has been summarised. Constructing a methodology that combines the performance metric variables (continuous and ordinal) that are currently utilised as endpoints remains a challenge.


Introduction
Measurements of performance success influence possibilities to participate in individual sports at the highest level through mechanisms such as qualifications to the largest competitions, sponsorship contracts, government funding of sports organisations, and subsequently funding of athletes by those organisations. Winning and personal bests are typical benchmarks of outstanding achievements in this setting. Performance success may also be regarded as the ultimate endpoint of sports medicine and epidemiological research in elite sports through identifying and investigating factors that influence performance outcome (PO). The definition of PO and its evaluation through objective methodology therefore requires consistency and consideration of the broad array of factors that contribute to that endpoint.
The methodology to evaluate or define 'performance' among elite individual athletes has not been settled in the scientific literature and poses a real-world challenge for sporting bodies when objectively evaluating performance. The same challenge is posed when attempting to investigate factors that influence performance. One reason for this debate is that a paradox exists whereby a subjective evaluation of an individual athlete's PO can be dominated by the comparison with others regardless of a goal to outperform oneself [1]. Individuals competing at recreational sport level are often motivated by intrinsic factors such as psychological wellbeing, maintaining fitness, enjoyment or skill development [2,3]. Nonetheless, for individual athletes at the very highest level, performance at the key competitions is the outstanding endpoint measure for success. The international benchmark event (IBE) thus constitutes a highly suitable setting for studies of factors influencing performance among elite athletes.
Furthermore, the traditional endpoints for studies in sports medicine and epidemiology have been clinical endpoints like injury and illness [4]. However, these health factors may not be sensitive enough to capture the essence of what is required for sports performance at the very highest level. Notions such as time-loss injury have been introduced to better suit the sports context [5]. The measure of time-loss from participation may still be insufficient because even minor deviations from optimal health and capacity level may prove crucial to success at the highest level of sports [6]. A recent trend in sports epidemiology has endeavoured to progress from the traditional endpoints of injuries and illnesses towards the endpoint of 'performance'. Injuries impair sportspeople's chance of success [7], and time lost during a competition preparation phase due to injury and illness is associated with a decreased likelihood of achieving a performance goal in athletics at IBEs [6]. Continuing this progression from injury and illness as endpoints in high performance sport requires the development of an objective methodology and a working definition of 'performance' as an endpoint in sports medicine and epidemiology settings. This outcome would enable researchers to progress, when warranted, from injury and illness being the endpoint to investigating injury and illness as factors that may ultimately influence performance.
Athletics (track and field) is the most participated-in sport at the Olympic Games [8] and one of the highest participation sports in Europe [9]. The real-world challenge in evaluating performance objectively in athletics is demonstrated by the example where an athlete may underperform according to their meet rank yet outperform their season's best time. This challenge is compounded in athletics by some running events having an external tactical focus towards other competitors in a race (e.g., 800 m and 1500 m) and other events an internally focused maximum effort (e.g., 100 m and javelin throw). The objective evaluation of performance at an IBE and investigation of factors that influence performance are impaired currently by the absence of a consistent working definition and methodology to do so.
Scoping reviews have recently been introduced to examine the extent, variety, and nature of research evidence on a novel topic; summarise findings from a heterogeneous body of knowledge; and identify gaps in the literature to aid the planning of future research [10]. The aim of this study was to apply scoping review methodology to examine the use of various POs as endpoints in research performed at athletics competitions of the highest level. A secondary aim was to provide directions for future work on defining PO in individual sports and on the methodology to analyse performance objectively for its utilisation as an endpoint in performance evaluation, sports medicine, and epidemiology.

Methods
The Preferred Reporting Items for Systematic Reviews and Meta-analysis Protocols (PRISMA-P) guidelines [11] were followed to identify a primary set of articles for data extraction and review. The 5-step process as described by Arksey and O'Malley [12] with enhancements as described by Levac and colleagues [13] was utilised: Identify the research question, identify relevant studies, study selection, chart the data, and collate, summarise, and report the results. In the final step, the review process was supplemented by application of thematic analysis methods [14]. The PRISMA extension for scoping reviews (PRISMA-ScR) checklist was used to ensure complete and transparent reporting [10]. Before initiating the review, the protocol for the analysis was registered on the PROSPERO International prospective register for systematic reviews website (http://www.crd.york.ac.uk/PROSPERO) on 11 February 2018 (registration number: CRD42018087272).

Identification of Relevant Studies
The research question to be addressed by the review was: "How has performance at IBEs in athletics been objectively analysed?" The purpose of restricting the review to IBEs was to capture performances where athletes are most likely to be striving for their highest outcome of the season. The article inclusion criteria were: published in English, reporting original research, involving athletics athletes, reporting research encompassing IBEs (International Association of Athletics Federations (IAAF) World Championships, Olympic Games, continental championships, and Commonwealth Games), and reporting athletes' performance. Articles were excluded if the study involved subjects under 18 years old or used a performance metric that was not 'event outcome based', e.g., physiological measures, laboratory tests, or physical parameter tests such as jump tests, time trials, and strength tests.

Final Study Selection
The retrieved records were then uploaded to Covidence software (Covidence systematic review software, Veritas Health Innovation, Melbourne, Australia; available at www.covidence.org). Two authors (B.P.R., M.K.D.) independently reviewed titles and abstracts for potential eligibility. The full-text articles of potentially eligible records were then retrieved and assessed against the inclusion and exclusion criteria. The reference lists of the resulting articles were searched by the lead author (B.P.R.) for additional articles. Any discrepancies were discussed by the reviewers (B.P.R., M.K.D.), and no conflicts remained outstanding. The review of full-text articles revealed that those articles that reported a performance metric provided sufficient content data for a continued analysis.

Collating the Results
A thematic analysis was performed to assess the articles reporting on a performance metric for categories and themes using a six-stage recursive process [14]. The lead author (B.P.R.) conducted a risk of bias assessment using the Downs and Black checklist [15] for randomised controlled trials (RCTs) and the Newcastle-Ottawa scale (NOS) for assessing the quality of nonrandomised studies [16]. The Downs and Black checklist assigns a score out of 31, with ≥75% deemed to be high quality or 'low' risk of bias, 60-75% moderate quality, and ≤60% low quality or 'high' risk of bias [17]. The NOS uses a star rating system up to a maximum score of 9 stars divided into three classes: selection, comparability, and outcome. The thematic analysis was thereafter performed by the lead author (B.P.R.) to summarise and coalesce the content of the articles. The information charted from the articles included the study design, country, study population, study aims, performance metric, analysis, and risk of bias assessment (on RCTs and cohort studies). Significant parts were identified and sorted into categories and themes. In the final step, a concept chart was generated through an iterative process to provide a model representation of the methods by which the assessed articles measured, reported, and analysed performance.
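The Downs and Black banding described above can be sketched as a small function. This is an illustrative sketch only: the function name and the raw score of 17 are hypothetical, and because the stated bands overlap at exactly 60% and 75%, the boundary handling shown here is an assumption rather than the procedure used in the review.

```python
def downs_black_quality(score: float, max_score: int = 31) -> tuple[float, str]:
    """Convert a raw Downs and Black score to a percentage and a quality band."""
    if not 0 <= score <= max_score:
        raise ValueError("score must lie between 0 and max_score")
    percent = 100.0 * score / max_score
    # Assumption: the text gives overlapping bands (>=75%, 60-75%, <=60%),
    # so exactly 75% is treated as high quality and exactly 60% as low quality.
    if percent >= 75:
        band = "high quality (low risk of bias)"
    elif percent > 60:
        band = "moderate quality"
    else:
        band = "low quality (high risk of bias)"
    return percent, band


# Hypothetical raw score chosen only for illustration.
percent, band = downs_black_quality(17)
print(f"{percent:.0f}% -> {band}")
```

A raw score of 17 out of 31 corresponds to roughly 55%, which falls in the low quality ('high' risk of bias) band.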

Literature Search
The initial literature search identified a total of 2972 articles for title and abstract review. Twenty articles were selected for full-text assessment with the aim to extract data for either quantitative or qualitative analysis. Nine articles did not report a performance metric, seven did not report results from a benchmark event, and four articles included a sub-elite athlete population, thus providing no articles that met the desired frame of reference for quantitative data extraction. Eleven of the 20 articles reported a performance metric that provided data for qualitative thematic analysis (Figure 1).
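The counts in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes that each of the 20 full-text articles is counted under exactly one triage failure reason, which is one reading of the text; the variable names are illustrative.

```python
full_text_assessed = 20

# Triage failures as reported; assumed to be mutually exclusive.
failed_triage = {
    "no performance metric reported": 9,
    "no benchmark-event results": 7,
    "sub-elite athlete population": 4,
}

# Articles left for quantitative data extraction.
eligible_quantitative = full_text_assessed - sum(failed_triage.values())

# Articles that did report a performance metric (qualitative analysis).
reported_metric = full_text_assessed - failed_triage["no performance metric reported"]

print(eligible_quantitative, reported_metric)  # 0 11
```

Under that assumption, the three failure reasons account for all 20 articles, and the 11 articles in the qualitative analysis are those remaining after removing the nine without a performance metric.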

Risk of Bias, Data, and Concept Thematic Extraction
Six of the 11 articles eligible for thematic analysis (randomised controlled trial (RCT), n = 1; Cohort studies, n = 5) underwent risk of bias assessment (Appendix B). The RCT was deemed to have a 'high' risk of bias (55%). All cohort studies scored ≤2/4 in the 'selection' category relating to the studies not being designed in a fashion that included exposed and non-exposed cohorts, zero in the 'comparability' section for the same reason, and 3/3 in the 'outcome' section relating to the results recording process and subject follow-up. The remaining five articles were descriptive studies and did not undergo risk of bias assessment.
Two constructs were identified to 'analyse' the themed categories: 'deviations' (n = 8) represent a divergence from a performance trend like meet, annual or career performances [18,20,23,24,27] or divergence from performance standards such as season's best or ranking [21,25,26], and 'associations' (n = 6) between the performance metric and predetermined independent variables, such as psychological factors [19,25,28], physical or physiological factors [22], and societal factors [24,27]. Three articles analysed performance metrics by both constructs, deviations and associations [24,25,27]. Thematic extraction identified a common flow of analysing performance that established a PO for evaluation.
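The article counts per construct can be cross-checked against the cited reference numbers using set operations. The sketch below uses only the citation numbers given in the paragraph above; the variable names are illustrative.

```python
# Reference numbers as cited in the text for each construct.
deviations = {18, 20, 23, 24, 27} | {21, 25, 26}    # trends | standards
associations = {19, 25, 28} | {22} | {24, 27}       # psychological | physical | societal

both = deviations & associations
all_articles = deviations | associations

print(len(deviations), len(associations))  # 8 6
print(sorted(both))                        # [24, 25, 27]
print(len(all_articles))                   # 11
```

The eight 'deviations' and six 'associations' articles overlap in three, so together they account for the eleven articles in the thematic analysis.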

Concept Chart
Two primary features of the concept chart define and illustrate 'PO' as an endpoint measure (Figure 2). 'Thematic analysis labels' (categories, themes, and constructs) are provided for generalisability along with 'sports practice labels' (performance metrics, framework, and analysis) to guide the practical application of the results from the thematic analysis.

[Table fragment] Intra-personal framework: deviation from intra-personal performance standard; extrapolated or interpolated prediction of one distance from the other two performances.
Table note: Risk of bias tools: Downs and Black checklist; a score of ≥75% was deemed to be of high quality, 60-75% moderate quality, and ≤60% low quality [15]; NOS = Newcastle-Ottawa scale, a nine-star rating system divided into three categories (Appendix B) [17].

The performance metric 'categories' report the crude measurement of the event or competition result and rank. The 'framework' themes add scope/context to the performance metric for analysis. The resultant themed category, or 'contextual result', is finally 'analysed' using one or several constructs. The end result of this methodology provides the 'PO' that may then be 'evaluated' by a sports federation or for an academic purpose according to their own objectives. This objective methodology offers consistent sports practice labels for defining PO by applying a framework to the performance metric, establishing a contextual result that may then be analysed according to an applied construct.
A framework is applied to the performance metric producing the contextual result, which may then be analysed according to an applied construct.
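The flow described above (performance metric, then framework, then contextual result, then construct, then PO) can be sketched as a small data model. All class and field names below are illustrative assumptions and do not appear in the reviewed articles; the mapping of 'crude measurement' to continuous values and 'rank' to ordinal values follows the labels used in this review.

```python
from dataclasses import dataclass
from enum import Enum


class MetricCategory(Enum):
    CRUDE_MEASUREMENT = "crude measurement"  # continuous, e.g. time, height, distance
    RANK = "rank"                            # ordinal, e.g. finishing place


class Framework(Enum):
    INTRA_PERSONAL = "intra-personal"  # compared with the athlete's own results
    INTER_PERSONAL = "inter-personal"  # compared with other athletes


class Construct(Enum):
    DEVIATION = "deviation"      # from a trend or a performance standard
    ASSOCIATION = "association"  # with predetermined independent variables


@dataclass(frozen=True)
class ContextualResult:
    """A performance metric placed within a framework."""
    category: MetricCategory
    framework: Framework
    value: float


@dataclass(frozen=True)
class PerformanceOutcome:
    """A contextual result analysed using one or several constructs."""
    contextual_result: ContextualResult
    constructs: tuple[Construct, ...]

    def label(self) -> str:
        names = " + ".join(c.value for c in self.constructs)
        cr = self.contextual_result
        return f"{names} of {cr.framework.value} {cr.category.value}"


# Hypothetical example: a 100 m time analysed as a deviation from the
# athlete's own performance standard.
po = PerformanceOutcome(
    ContextualResult(MetricCategory.CRUDE_MEASUREMENT, Framework.INTRA_PERSONAL, 10.02),
    (Construct.DEVIATION,),
)
print(po.label())  # deviation of intra-personal crude measurement
```

Keeping the framework and the construct as separate fields mirrors the chart: the same crude measurement can yield different POs depending on the scope applied and the analysis chosen.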

Discussion
Despite its widespread discussion in nonacademic spheres, to our knowledge, no research has sought to use objective methodology to summarise use of POs as endpoints for studies performed at IBEs in athletics. This review reports that research to date on performance in athletics competitions at the highest level is predominated by continuous and ordinal performance metrics as endpoints for analysis within an intra-personal framework. The performance metrics have commonly been analysed by illustrating deviations from performance trends or performance standards, or by observing associations with predetermined independent variables, or both.
The framework resulting from the review also places the objectified PO as one important component within this broader appraisal process. From a sports practice perspective, performance evaluation processes have to date involved subjective evaluation according to the particular criteria that, for instance, a sports federation chooses to apply. Such evaluations require the consideration of many factors relating to an athlete's performance and context within which that performance lies. Further, a comprehensive structured framework may never have the capacity to incorporate all of those factors.

The Need for an Evaluation Context
The qualitative analysis of the included articles' content identified themes that provide an avenue to apply context to performance metric categories. By applying a framework to the interpretation of a performance metric, the crude measurement or finishing position can at the basic level be analysed in comparison to other athletes or within one's own historical results. This 'contextualised result' provides a foundation on which further analyses may follow. Such assessments may support sports practices by focusing on deviations from a performance standard like a national or personal record or deviations from a trend like seasonal improvement, or they may constitute academic investigations of associations with physiological, psychological or societal factors. The framing of the performance metric with intra- or inter-personal scope thus provides a platform to analyse the contextual result through a combination of constructs that enhance the possibilities to interpret the PO as a basis for decision-making in sports practice and/or scientific inferences.
A recently proposed practical application of structured PO analysis with intra-personal scope is tracking of performances to identify unusual improvements possibly caused by doping, the so-called athlete performance module (APM) in the "athlete's performance passport" [31]. The individual variation of performance in elite athletics athletes over a single season has been reported to be small, with a coefficient of variation ranging from 1.1 to 1.4% (90% CI: 1.0-1.6%) [32]. In the APM, performance data are modelled based on past performances, and unusual performances by an athlete trigger a more thorough testing program. By these means, athletes with unusual deviations from predicted performances are identified and subjected to blood testing. Moreover, structured PO analyses with inter-personal scope have been performed to study the possible influence of natural variations in hormone levels on athletic performance. To test associations between serum androgen levels and performance, athletes have been classified into tertiles according to their free testosterone (fT) concentration, and the best competition results achieved in the highest and lowest fT tertiles then compared [33]. When contrasted with women in the lowest fT tertile, women in the highest natural fT tertile performed significantly better in 400 m, 400 m hurdles, 800 m, hammer throw, and pole vault, with margins of 2-4%. The results have been used by the International Association of Athletics Federations (IAAF) to conclude that female athletes with high fT levels have a significant competitive advantage over those with low fT in the corresponding events. A question for future research is whether there is an opportunity to represent performance even more accurately in circumstances such as the two mentioned above using a contextualised result analysis.
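The two quantities mentioned above, a within-season coefficient of variation and a comparison between the lowest and highest tertiles, can be sketched as follows. All data are invented for illustration, the function names are hypothetical, and the cited studies may have used different estimators (for example, model-based estimates), so this shows only the basic arithmetic.

```python
import statistics


def coefficient_of_variation(results: list[float]) -> float:
    """Within-athlete CV (%) = sample standard deviation / mean * 100."""
    return 100.0 * statistics.stdev(results) / statistics.mean(results)


def split_into_tertiles(pairs: list[tuple[float, float]]):
    """Sort (exposure, result) pairs by exposure and cut into three groups.

    Groups are equal-sized only when the number of pairs is divisible by 3.
    """
    ordered = sorted(pairs, key=lambda p: p[0])
    n = len(ordered)
    return ordered[: n // 3], ordered[n // 3 : 2 * n // 3], ordered[2 * n // 3 :]


# Invented season of 800 m times (seconds) for one athlete.
season_800m = [106.2, 104.9, 105.6, 107.3, 105.1]
cv = coefficient_of_variation(season_800m)

# Invented (fT in arbitrary units, best 400 m time in seconds) pairs.
athletes = [
    (0.2, 53.0), (0.3, 52.8), (0.4, 53.2),
    (0.5, 52.5), (0.6, 52.9), (0.7, 52.4),
    (0.8, 52.0), (0.9, 51.8), (1.0, 52.2),
]
lowest, _, highest = split_into_tertiles(athletes)
mean_low = statistics.mean(t for _, t in lowest)
mean_high = statistics.mean(t for _, t in highest)

# For timed events a lower mean is better, so a positive margin means the
# highest tertile performed better than the lowest.
margin_pct = 100.0 * (mean_low - mean_high) / mean_low

print(f"CV = {cv:.2f}%, margin = {margin_pct:.2f}%")
```

For field events, where a larger mark is better, the sign of the margin would need to be reversed.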

Expanding the Scope of PO Evaluations
The performance metric categories of 'rank' and 'crude measurement' identified in the review have been shown to display limitations when used as singular endpoints for evaluation. Evaluating crude measurements has been challenged in its narrow approach to the evaluation of 'success', particularly with respect to 'losing' and 'failure' [34]. Correspondingly, the calibre of competition at an event may result in overvalued or undervalued rankings. An athlete may continually lose narrowly to the highest-calibre athletes and be poorly ranked; conversely, an athlete may continually beat competitors of lower calibre and be ranked unduly high [21]. Applying a contextual framework to a performance metric enables the evaluation of the resulting PO to move from a singular endpoint with limitations to a more versatile endpoint that captures and considers a broader array of factors. This process is important to coalesce the varied constructs that result in the PO, yet the challenge remains in quantifying the PO beyond identifying and describing the constructs that inform it.
An alternative approach to evaluation of POs is to base these on subjective goals. Subjective evaluation of performance metrics or contextual results as endpoints alone has limitations. Subjective seasonal goals could take many forms, including a 'mastery' approach whereby an athlete is motivated by the achievement of absolute or intra-personal competence or avoidance of incompetence, or conversely, a 'performance' approach whereby the athlete is motivated to do better than or not do worse than others [35]. Performance approach goals may lack stability over time, as 'goal switching' is thought to be more prevalent in goals that are established under reasons of 'external pressure' compared with 'autonomous reasons' [36]. Subjective goals evaluated by this approach may be challenged through questioning the veracity or appropriateness of the goal itself. A competition of higher standard may then shift the subjective evaluation of individual performance from a 'crude measurement' like finishing time to the 'rank' of finishing place [37]. Athletes may also avoid self-definitions of failure by obtaining satisfaction or success through smaller accomplishments [38]. Linking the evaluation of POs to the attainment of a subjective goal therefore has questionable value in the objective categorical evaluation of performance.

Practice Implications
Effective PO evaluation in sports practice settings, for instance, by a national federation, may require that a contextual result be analysed using a combination of constructs. Examples of constructs derived from combining contextual results include: Deviations from a trend of intra-personal crude measurements like height or distance, surpassing a performance standard like intra-personal season's best, or reference to an inter-personal meet ranking or national record. Each construct described in these examples may be analysed as a singular endpoint; however, a comprehensive evaluation of PO would consider the contribution of each of these endpoints in combination. The concept chart provides a methodology to objectify PO, yet further work is required to develop a 'next step' in the objective concept chart methodology to encapsulate a combination of constructs like the examples described above for effective evaluation.
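As a minimal sketch of why a combination of constructs may be needed, the example below evaluates one hypothetical meet result against an intra-personal standard (season's best) and an inter-personal standard (a pre-meet ranking). The class, the field names, and the values are assumptions made for illustration, not a method drawn from the reviewed articles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeetResult:
    mark: float                   # crude measurement, e.g. time in seconds
    place: int                    # finishing place at the meet
    seasons_best: float           # intra-personal standard before the meet
    entry_rank: int               # inter-personal standard (hypothetical pre-meet rank)
    lower_is_better: bool = True  # True for timed events, False for jumps and throws


def evaluate(result: MeetResult) -> dict:
    """Return each construct as a separate endpoint for the same result."""
    diff = result.seasons_best - result.mark
    if not result.lower_is_better:
        diff = -diff
    pct = 100.0 * diff / result.seasons_best
    return {
        "pct_vs_seasons_best": pct,            # positive = better than season's best
        "surpassed_seasons_best": pct > 0,
        # positive = finished higher than the pre-meet rank
        "places_vs_entry_rank": result.entry_rank - result.place,
    }


# Hypothetical 1500 m runner: new season's best, but 8th place from a rank of 5th.
outcome = evaluate(MeetResult(mark=215.0, place=8, seasons_best=216.5, entry_rank=5))
print(outcome)
```

In this invented case the athlete surpasses the intra-personal standard while falling short of the inter-personal one, so each endpoint taken alone would support a different evaluation.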

Review Strengths and Limitations
Strengths in this scoping review include the use of a highly structured method consistent with a systematic review process yet flexible where required within the scoping review. This allowed the identification and description of the various methods used in athletics to describe PO as an endpoint for evaluation. We delimited our search criteria to benchmark events in athletics, and more research may exist when assessing a broader array of events. We were thus able to clearly identify limitations in the current body of work towards objectifying PO and gaps in the literature that open opportunity to further research.
Having considered several alternative review methodologies, we found that the scoping review methods incorporating systematic processes were the current best practice to identify and report on research using POs as endpoints. An unexpected limitation was the lack of existing research that satisfied the triage criteria for quantitative data extraction. We searched a common array of academic research databases; however, the inclusion of a greater number of databases and of 'grey literature' may have enhanced the possibility of finding further research on the topic. The thematic analysis was conducted systematically and according to well-established methods. This iterative analysis was qualitative by nature, and generalisability in a strict sense therefore cannot be regarded as one of its expected attributes. A pragmatic approach to validating qualitative analyses was instead adopted: use of systematic sampling, triangulation and constant comparison, and proper audit and documentation [39].

Conclusions
The motivation to undertake this work was to address a 'real-world' desire to minimise possible subjectivity in performance evaluation processes. Through a scoping review and thematic analysis, we have described the existing foundation for an objective methodology towards categorising and analysing performance metrics from IBEs in athletics. By objectifying and establishing PO as an ultimate endpoint for sports medicine and epidemiological research, further opportunities evolve to apply this methodology to sports federation athlete evaluation and the investigation of factors that influence PO success and failure. A considerable challenge remains in constructing a methodology that combines the two observed independent performance metric variables (continuous and ordinal) that are currently utilised as endpoints.
Author Contributions: All authors contributed equally to the conceptualisation, methodology, and writing of this article.
Funding: This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.