Developing a Model of Risk Factors of Injury in Track and Field Athletes

This work aimed to develop a model to assess the likelihood of injury in track and field athletes and to establish which factors have the greatest impact. Tests verifying their significance were also reviewed, as well as the method for selecting variables. The key element was to confirm the quality of the classification system and to test the impact of individual factors on the likelihood of injury. The survey was carried out among physically active participants who take part in track and field disciplines. Cronbach's alpha was 0.73, which can be considered an acceptable value for the survey. The seven most important factors influencing the risk of injury were selected from a group of twenty-four and were used to create the model. Nagelkerke's R² reached 0.630 for the logit model, which indicates a good effect of the independent variables. The data suggested that the factor with the largest influence on the risk of injury was the number of prior injuries.


Introduction
According to sociologists, the popularity of sport in the world is rising, reflecting the needs of a modern society seeking adrenaline rushes and high emotions [1]. The majority, however, are satisfied with watching sport or treating physical activity as a hobby [2]. For the remaining group, sport reflects their way of living [3]. Their ultimate objective is to find the limits of their bodies and to achieve the best results [4]. In this case, an activity pursued for its health benefits can become dangerous. Where there is regular physical activity, especially in running, there is always a risk of injury. According to Wiese-Bjornstal [5] and Fernandes et al. [6], the main causes of sport injuries are physical, biological (including physiological and anatomical), and some training factors, such as muscle imbalances, overtraining, physical fatigue, or a lack of base physical condition. Environmental factors, such as facilities and unsuitable sport equipment, can also affect the occurrence of injury. Additionally, injuries often cause severe stress for athletes, disrupt their training, interfere with high-performance competition, and can lead to feelings of separation and isolation from their family, coaches, and teammates [6,7].
Injuries related to sport, cars [8], and bikes [9], or accidents in the workplace, are a daily occurrence in human lives. The usual outcome is some kind of injury, which can occur during competitive sport as well as during recreational activities, where people should have fun and rest. Bare and Holne [10] claimed that, according to the van Mechelen [11] model, once injury investigation has established that sports injuries threaten the health of athletes, injury prevention must be established. This includes information on why a particular athlete may be at risk in a given situation (risk factors) or how injuries happen [9,10]. A daily training regime and, consequently, excessive exercise frequently result in injuries. It is challenging to predict specific injuries since the entire process is very complex.

Subject
For this study, survey data and identifying factors were collected from a group of physically active track and field athletes, including jumpers, runners, and throwers. The sample of respondents included five age categories: U16, U18, U20, U23, and seniors (athletes between 24 and 35 years old). Moreover, the athletes were at different levels of sports performance: international, national, first, second, and third sports classes. The sample included 105 females and 101 males (Figure 1), which represents 5.5% of the total number (3750; 1938 women and 1812 men) of competitors registered in 2016-2017 by the Polish Athletics Association. The authors confirm that all ethics issues were taken into account. The research did not involve physical interventions on the study participants (it was based on a questionnaire) and did not, at any stage, involve animals. Therefore, the study (W13_221661_2018) was accepted by the Human Ethics Committee of the Wroclaw University of Science and Technology and does not require an ethics code.

Study Design
This study focused on identifying and determining the causes of injury occurrence among track and field athletes of different events, age categories, and performance levels. The necessary information was obtained using a specially constructed questionnaire, including information on why a particular athlete may be at risk in a given situation or how injuries happen. Cronbach's alpha was used to assess the internal consistency (reliability) of the questionnaire. The next step was selecting the risk factors that influence expected injuries and establishing relations between them. Logistic regression analysis was used to analyze the data obtained. Once it has been recognized that sports injuries threaten the health and performance of athletes, the reasons must be established for future injury prevention. Here, a theoretical approach was used. Modeling the risk of injury in sport assesses the most common injuries and their causes. The probability of injury was modeled using a binomial distribution with a logit link function. The Hosmer-Lemeshow (HL) test was applied, and weight of evidence (WoE) coefficients were calculated for all model variables. The design and implementation of this model may be used in the future for modeling the risk of key injuries among runners, which probably constitute the majority of injuries among track and field athletes.
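As an illustration of the reliability check mentioned above, the following minimal sketch computes Cronbach's alpha from item scores using the standard formula (the number of items k, the sum of item variances, and the variance of the respondents' total scores). The function name `cronbach_alpha` and the toy scores are hypothetical and are not taken from the study's questionnaire data.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of k items, each a list of n respondent scores."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("every item needs one score per respondent")
    # Total score of each respondent across all items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    item_variance_sum = sum(pvariance(col) for col in items)
    total_variance = pvariance(totals)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical toy data: three items answered by four respondents.
toy_items = [[1, 2, 3, 4], [2, 2, 3, 5], [1, 3, 3, 4]]
alpha = cronbach_alpha(toy_items)
```

The same type of variance (population variance here) is used in the numerator and the denominator, so the ratio does not depend on that choice.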

Injury History Survey
The first step toward injury evaluation and future modeling was selecting the factors that influence expected injuries and establishing relations between them. Different tools were used to acquire data and select these factors [29]. For this study, survey data were prepared and analyzed. The questionnaire was part of the first author's dissertation [30]. In our study, a reportable injury should be understood as a physical condition of the athlete that, at least once in one year of the training period (between August 2016 and August 2017), temporarily (for one to two months) prevented the athlete from continuing training or participating in competitions. Such an injury required medical advice and short rehabilitation. The injuries considered are of specific types characteristic of athletics, with similar recovery times. They concerned the following health problems: muscle pulls (hamstring, quadriceps, groin, and other), Achilles tendonitis, knee tendonitis, ankle sprain, wrist twist, or lower back pain. The injury specification applies to all athletes irrespective of sport class, age, gender, and event. The electronic survey data were collected from athletes in October 2017, after the end of the 2017 competition period.
The questionnaire was divided into three sections concerning demographic factors, training/competition factors, and health/regeneration factors. Demographics included sex, BMI, and morphology. The training/competition section contained information about training specification, including training load, warm-up connected with a single training session, training experience, sport level, availability and quality of sport facilities, atmospheric conditions in the summer training period and competition period, and biomechanics, which is connected to the techniques executed during training and competition. The answers were categorized in such a way that the outliers (values three standard deviations away from the mean) were included in one data set. More frequently occurring answers were further classified into sets. Adjacent categories for ordinal scale variables were combined when the number of observations in a selected category was not higher than five (Table 1). The majority of the factors were categorical variables, for which coding is an important process. The purpose of coding is to prepare data for multidimensional analysis, and it has a great influence on the interpretation of coefficients in models. All variables are coded as 0 or 1. The category taking the value 0 is called the reference category. Dummy coding is used in multidimensional data models and is designed to answer the question of how the estimated coefficient $\hat{\beta}_j$ of variable j in each analyzed category differs from the result for the reference category. A logical supplement to the categorical division, within which there is a likelihood of injury in the mobility aspect characteristic of athletics, is the introduction of a numerical ranking, from 1 to 5, of selected factors/observations. The numerical ranking was performed as follows:

• Frequency of practicing/competition in a badly prepared area of a sports facility: 1: never; 2: occasionally; 3: quite often; 4: often; 5: always.
When the number denotes sex: 1: injury among women; 2: injury among men. The following factors are also included in the binary numerical ranking:
• Biomechanics, quality of posture: 1: correct; 2: with defects.
• Natural regeneration ability: 1: insufficient; 2: good.
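The dummy coding described in this section can be sketched as follows. This is only an illustration under assumptions: the helper `dummy_code`, the choice of reference categories, and the sample answers are hypothetical, and the coding actually applied in Statistica may differ in detail.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator column per non-reference category.

    The reference category gets no column: an observation in that
    category is coded 0 in every indicator.
    """
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Binary factor (posture quality): 1 = correct, 2 = with defects.
posture = [1, 2, 1, 2, 2]
posture_coded = dummy_code(posture, reference=1)

# Five-level ranking (badly prepared facility): 1 = never ... 5 = always.
facility = [1, 3, 5, 2, 4, 1]
facility_coded = dummy_code(facility, reference=1)
```

With a five-level factor and one reference level, four indicator columns result, so each estimated coefficient compares one category with the reference category.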

Statistical Procedure
Logistic regression analysis was used to analyze the data obtained [31][32][33]. It aimed to determine the approximate relationship between a dependent variable (represented as y) and the independent variables (represented as x). An independent variable is referred to as a factor. With p factors, the data for n athletes can be represented as in Figure 2. Here, $x_{ij}$ is the value of factor j for element i, while $y_i$ is the value of the dependent variable for this element. Each observation $y_1, y_2, \ldots, y_n$ has one of two possible outcomes: 0 for no injury (failure) or 1 for an injury (pass). The following sections refer to these variables as predictors and outcomes. The predictors were either qualitative variables (nominal or ordinal, with categories) or quantitative variables. To improve the survey, some quantitative variables were divided into categories despite losing part of the information. Cronbach's alpha was used to assess the internal consistency (reliability) of the questionnaire. Cronbach's alpha was 0.73, which is considered an acceptable value. Cronbach's alpha values typically range from 0 to 1. In most cases, the value should be at least 0.70, although a value from 0.60 to 0.70 is acceptable. For the calculation of effect size in logit models, the coefficient of Cox and Snell was used. This coefficient takes values between 0 and 1, where 0 indicates a very weak effect of the independent variables. However, this coefficient cannot reach a value of 1; therefore, Nagelkerke's R² was applied, which is Cox and Snell's R² divided by the maximum value it can achieve. The Statistica 10.0 software divides the factors into uncategorized, quantitative predictors and categorized, categorical, and numerical predictors.
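The relation between the two effect-size coefficients can be sketched in a few lines. The formulas below are the standard definitions based on the log-likelihoods of the null (intercept-only) and fitted models; the function names and the numerical log-likelihood values are hypothetical, with only n = 206 taken from the sample size reported in this paper.

```python
import math


def cox_snell_r2(ll_null, ll_model, n):
    """Cox and Snell R^2 from the log-likelihoods of the null and fitted models."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)


def nagelkerke_r2(ll_null, ll_model, n):
    """Cox and Snell R^2 divided by its maximum attainable value."""
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_r2


n = 206                       # respondents in the survey
ll_null = n * math.log(0.5)   # assumption: balanced outcome classes
ll_model = -90.0              # assumption: illustrative fitted log-likelihood
r2_cs = cox_snell_r2(ll_null, ll_model, n)
r2_n = nagelkerke_r2(ll_null, ll_model, n)
```

For a hypothetical perfect fit (log-likelihood of 0) and balanced classes, Cox and Snell's coefficient stops at 0.75, whereas the rescaled coefficient reaches 1, which is the motivation for the rescaling.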

Results
Forward stepwise selection was used to choose seven predictors for the model: previous injuries, sleep, age, blood, training load, atmospheric conditions, and competition. Table 2 shows that, according to the S and Wald statistics, the other predictors should be removed from the model. This model will be referred to as M. The HL test statistic was 7.36, which produced a p-value of 0.5. The p-value was higher than the standard significance level of 0.05, which means there was no evidence to reject the hypothesis that the logistic regression model was well fitted to the data. This was also supported by the high AUC value of 0.88. Figure 3 presents the receiver operating characteristic (ROC) curve and the area under the curve (AUC) value. The odds ratio for model M calculated from the classification matrix (Table 3) was 19.53. It was much higher than one, indicating that the classification was much better than that expected "at random".
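The two goodness-of-fit figures quoted above can be checked with a short sketch. It assumes that the HL statistic is referred to a chi-square distribution with 8 degrees of freedom (the usual ten risk groups minus two), which the text does not state, and it uses hypothetical classification counts rather than the values of Table 3. The function names are illustrative.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable for even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum of (x/2)^i / i! for i = 0 .. df/2 - 1.
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def classification_odds_ratio(tp, fn, fp, tn):
    """Odds ratio of a 2x2 classification matrix: (TP * TN) / (FP * FN)."""
    return (tp * tn) / (fp * fn)


# HL statistic reported in the text; df = 8 is an assumption.
p_value = chi2_sf_even_df(7.36, 8)

# Hypothetical counts, not the values of Table 3.
odds = classification_odds_ratio(tp=50, fn=10, fp=10, tn=50)
```

Under the stated assumption, the p-value is close to 0.5, in line with the value reported for model M.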
Next, WoE coefficients for all model M variables were calculated. The strongest relationship was observed between the factor prv.inj (number of previous injuries) and the risk of injury, reflected by the highest value of IV = 0.993. Analysis of the graph presented in Figure 4 and of the WoE coefficients showed that the risk of injury increased with the category. The higher the category, the greater the number of previous injuries and the lower the WoE coefficient. The lower the WoE coefficient, the higher the risk of injury. The WoE coefficient gradually decreased (Figure 4) from categories 1 and 2 (positive WoE; the risk of injury is relatively small for elements included in these classes) to categories 3 and 4 (negative WoE; increased risk of injury).
The training period was the predictor with the second strongest relationship with the risk of injury (IV = 0.326). The IV of age, which is frequently related to the training period, was only slightly lower (IV = 0.282). The remaining predictors mostly showed moderate relationships with the probability of belonging to the class (0.1 < IV < 0.3).
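The WoE and IV statistics discussed above can be illustrated with a minimal sketch. It assumes the sign convention implied by the text (a positive WoE means relatively few injuries in the category), that is, WoE is the logarithm of the share of non-injured athletes in a category divided by the share of injured athletes in that category. The function name `woe_iv` and the counts are hypothetical and do not reproduce the study data.

```python
import math


def woe_iv(no_injury_counts, injury_counts):
    """Return the per-category WoE values and the information value (IV)."""
    total_no = sum(no_injury_counts)
    total_inj = sum(injury_counts)
    woe, iv = [], 0.0
    for n_no, n_inj in zip(no_injury_counts, injury_counts):
        if n_no <= 0 or n_inj <= 0:
            raise ValueError("each category needs at least one case of each outcome")
        share_no = n_no / total_no      # share of all non-injured athletes
        share_inj = n_inj / total_inj   # share of all injured athletes
        w = math.log(share_no / share_inj)
        woe.append(w)
        iv += (share_no - share_inj) * w
    return woe, iv


# Hypothetical counts for four categories of the number of previous injuries.
no_injury = [40, 30, 20, 10]
injury = [10, 20, 30, 40]
woe, iv = woe_iv(no_injury, injury)
```

In this toy example, WoE falls from positive values in the first two categories to negative values in the last two, the same pattern as described for prv.inj.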

Discussion
This work aimed to develop a model to assess the likelihood of injury in track and field athletes and to establish which factors determine the probability to the greatest extent. Our primary focus was on severe injuries that have extensive consequences and require the help of specialists as well as significant changes in the training program [28]. There were also large discrepancies between the group of respondents (a) and the representative group (b) in females and males. This was because the questionnaire was filled in mainly by athletes with higher sports classes.
According to van Mechelen [11], injury risk factors are mostly divided into two main categories: internal, athlete-related risk factors, often referred to as intrinsic factors, and external, environmental risk factors. When modeling the risk of injury in sport, all these factors can be divided into modifiable and nonmodifiable factors [10]. In our study, most of the factors can be classified into the above categories; however, our analysis is based on three categories: demographic factors, training/competition factors, and health/regeneration factors. The more precise the division and grouping of factors, the greater the likelihood of classifying a particular factor as modifiable. Such an assumption may indicate a more effective modeling process and thus help to determine the most critical factors affecting sports injuries. Nonmodifiable risk factors, such as gender and age, may be of interest.
Because gender is nonmodifiable, it was excluded from the model; age, although also nonmodifiable, was retained for the reasons given below. Most factors are training factors that are modifiable through behavioral approaches [10]. Therefore, such a multifactorial approach to the determination of risk factors of sports injuries requires a dynamic model that takes into account the entire sequence of events preceding or directly affecting the occurrence of an injury. According to Meeuwisse [34], the sum of these risk factors and the interaction between them "prepares" the athlete for an injury to occur in a given situation. Studies considering only a single variable and its impact on injury risk may be too simplistic. Therefore, in order to better understand the etiology of injuries, the collective involvement of many factors in the risk of injury should be analyzed [35,36].
According to Ruddy [35] and Bittencourt [36], injuries occur as a result of complex and nonlinear interactions between multiple factors. Bahr [37] stated that it is unlikely that a single, isolated factor is capable of providing enough information to predict injuries at the individual level. Our model includes seven predictors: previous injuries, sleep, age, blood, training load, atmospheric conditions, and competition. Although age belongs to the nonmodifiable category of risk factors [10], in our study, age was included in the modeling process because the athletes were divided into age categories. Training and sports level, as well as competitions, are related to age categories, including the number of competitions in the season. Training factors are modifiable through behavioral changes, which can alter blood parameters. Training and competition periods occur in an annual training cycle. The training period was a critical predictor, with the second strongest relationship with the risk of injury (IV = 0.326).
Track and field athletes are at particularly high risk of injury due to their involvement in many events. This shows that the inherent risk of sports injury is related to the degree of athlete event exposure [38,39]. The more one trains and competes, the more one is exposed and the more injuries occur. The frequency and number of injuries, as well as the significance of the injury, influence further training and competition. Therefore, the most critical risk factors affecting sports injuries are health-related factors, such as previous injuries and sleep, which are associated with resistance to illness [40]. This was confirmed by our model M, which showed a strong connection between the factor prv.inj (number of previous injuries) and the risk of injury, reflected by the highest value of IV = 0.993. A study by Chen [40] showed that 71.1% of athletes who had suffered sports injuries in the past were likely to suffer them again. A study by Walter et al. [41] pointed out that only 50% of sports injuries are new, while the rest are repeat injuries. These dependencies are in line with Ruddy [35], who used previous injury as an example to explain methodologies for determining the association between a factor and the risk of injury. They also agree with Akobeng [42], who pointed out that factors associated with an increase or decrease in the risk of injury are often repetitive.
Modeling the risk of injury in sport is a theoretical analysis of the risk of getting injured by paying attention to the most common injuries and what may cause them. The design and implementation of this model may be used for modeling the risk of car accidents [8] or accidents in the workplace.
We agree with Carey's [43] suggestion that a significant limitation in implementing complex approaches to modeling the risk of injury is the amount of data. Sufficient data are required to apply the appropriate methodology of investigation. Collecting sufficient data will help determine the most critical interactions between risk factors and, above all, find those factors that are most often repeated and play a serious role in the occurrence of injury. Therefore, more detailed injury data are needed in future research to reinforce the present results. The second limitation is the small number of athletes participating in the study. Further research should investigate more deeply the relationship between training load data and the risk of injury, especially considering the training and competition cycle. The limitations of the present survey-based study also include answers based on the subjective feelings of the respondents. The final conclusions were based on factors that can be clearly identified by the respondents.
Appl. Sci. 2020, 10, 2963

Conclusions
According to its AUC and IV values, previous injuries (prv.inj) was revealed to be the most significant factor. There is no doubt that each injury has a minor or major influence on the biomechanics of athletes' movement, causing unnatural loading and often leading to other injuries. Training experience (experience) was the second strongest predictor affecting the probability of injury. The risk of injury increases with each year of training. Although training and exercise prepare the body for loading and may protect it from injury, the final analysis showed that the potential risk of injury increases over time. The significance of these two factors is highly intuitive and commonly known among athletes; thus, these findings support the correctness of the presented method. Cronbach's alpha was 0.73, which can be considered acceptable. Nagelkerke's R² reached 0.630 for the logit model, which indicates a moderately strong effect of the independent variables. Our results indicate that early identification of risk factors and their gradation will help prevent further injuries.