1. Introduction
Musculoskeletal injuries represent a frequent health burden among physically active individuals, particularly young adults engaged in recreational and competitive activities. In a large collegiate survey, the 1-year prevalence of sports injuries reached 50.01% among Japanese university athletes [
1]. Recent biomechanical evidence also suggests that optimizing landing strategies may help reduce lower-limb injury risk [
2]. Beyond first-time injuries, subsequent injuries represent an important clinical and practical problem because they may disrupt training continuity, reduce physical performance, and increase longer-term health consequences [
3]. Subsequent injuries may occur due to incomplete recovery, persistent functional deficits, or an inappropriate return to physical activity, and they can be categorized according to whether they affect the same site and tissue as the index injury [
4,
5]. Understanding factors associated with subsequent injuries is therefore essential for informing prevention strategies and supporting safe participation in physical activity.
Previous research has identified several determinants of musculoskeletal injuries, including individual characteristics and body composition, as well as prior injury history [
6]. Evidence syntheses indicate that hamstring strain injuries have multifactorial determinants, with neuromuscular and exposure-related factors often discussed as relevant contributors, although effects are typically modest and context-dependent [
7]. In addition, systematic evidence indicates that training exposure and workload are associated with injury outcomes in physically active populations, although the strength of these associations varies across settings and outcome definitions [
8,
9]. Training exposure and load (e.g., training weekly load) may contribute to injury occurrence, particularly when increases in training dose are rapid and recovery is insufficient, potentially exceeding tissue capacity [
8,
9]. Despite these advances, most studies have focused primarily on initial injuries rather than subsequent injuries, and the determinants of repeated injury events remain less well understood in student populations [
7].
Analyzing subsequent injuries is crucial because they may involve different body sites or tissue types than the initial event, indicating that previously injured individuals can be vulnerable to broader injury patterns [
10]. Importantly, the risk of a subsequent time-loss injury appears to be highest immediately after return to play; for example, one study reported a 9.4% injury risk in the week of return to play compared with a 3.6% baseline risk [10]. Moreover, re-injuries may occur shortly after an initial injury and during return to play, when unresolved deficits combined with ongoing training exposure can increase vulnerability to a further episode [
11]. To better interpret these patterns, classification approaches allow researchers to distinguish between recurrent, related, and unrelated subsequent injuries and to examine injury trajectories more precisely [
12]. Such approaches improve the interpretability of repeated injury events and support the development of targeted prevention strategies [
13].
Despite the growing body of evidence on musculoskeletal injuries in young adults, university students remain understudied with respect to subsequent injury occurrence. In particular, there is a lack of studies that jointly examine the interplay between key intrinsic body-composition indices reflecting adiposity and muscularity, specifically fat mass index (FMI) and skeletal muscle index (SMI), and a key extrinsic exposure metric, training weekly load (TWL), in relation to subsequent injuries in this population [
14]. Although body composition indices have been associated with injury occurrence in physically active individuals [
15], most prior work in student populations has focused on first-time injuries or general musculoskeletal symptoms rather than on subsequent injuries following an initial episode. Given that musculoskeletal injuries are common among university students and may impair daily functioning and academic performance [
16], clarifying how FMI and SMI relate to subsequent injury outcomes in the context of TWL is important for developing evidence-informed prevention strategies. Furthermore, applying extended classification frameworks for subsequent injuries can provide deeper insight into mechanisms leading to repeated injury events and help differentiate recurrent, related, and unrelated outcomes [
17]. Together, these gaps form the rationale for the present study.
To our knowledge, this study is the first to comprehensively analyze subsequent musculoskeletal injuries among university students by focusing on the joint contribution of body composition indices (FMI and SMI) and training weekly load (TWL). Using an established subsequent injury classification framework, this study differentiates between recurrent, related, and unrelated injuries to characterize injury trajectories more precisely [
17]. Accordingly, this study addresses the following research questions:
- (1) What are the independent and combined associations of FMI, SMI, and TWL with the occurrence of subsequent musculoskeletal injuries (recurrent, related, or unrelated) in university students?
- (2) To what extent do FMI, SMI, and TWL contribute to explaining differences in subsequent injury outcomes, and what are the main explanatory limitations of these associations within the present study design?
We hypothesized that FMI, SMI, and TWL would each be associated with subsequent injury occurrence, and that considering body composition, training weekly load (TWL), and physical activity measures jointly would provide a more comprehensive explanatory perspective than single-domain analyses, without implying predictive performance.
2. Materials and Methods
The present analysis is based on data from two independently recruited cohorts of physically active university students assessed between 2022 and 2023 using an identical measurement protocol. The cohorts were merged to increase statistical power and improve the robustness of injury-related analyses, particularly for subsequent and recurrent injury outcomes characterized by relatively low event counts. For the purposes of this study, only participants with complete data for all variables relevant to the planned analyses were included, which explains differences between the current sample size and those reported previously.
Prior to merging, the cohorts were compared across all variables included in the present analyses; standardized mean differences below 0.10 supported their treatment as a single study population.
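As an illustration of the comparability check, the sketch below computes a standardized mean difference (SMD) for one continuous variable measured in two cohorts. The pooled-SD form used here (square root of the mean of the two sample variances) is an assumption, since the text does not state which SMD variant was applied, and the cohort values are hypothetical.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(cohort_a, cohort_b):
    """SMD = (mean_a - mean_b) / pooled SD, with pooled SD taken as
    sqrt((var_a + var_b) / 2). This SMD variant is an assumption."""
    pooled_sd = math.sqrt((variance(cohort_a) + variance(cohort_b)) / 2)
    return (mean(cohort_a) - mean(cohort_b)) / pooled_sd


# Hypothetical SMI values (kg/m^2) from two cohorts, for illustration only.
cohort_1 = [9.8, 10.4, 8.9, 11.2, 9.5, 10.1]
cohort_2 = [9.9, 10.3, 9.0, 11.0, 9.6, 10.2]

smd = standardized_mean_difference(cohort_1, cohort_2)
# Under the criterion described above, |SMD| < 0.10 supports pooling.
print(f"SMD = {smd:.3f}, comparable = {abs(smd) < 0.10}")
```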
2.1. Study Design
The investigation followed a cross-sectional framework, and participants were enrolled from a university student population using a non-probability (convenience) sampling strategy. A cross-sectional design was adopted because injury history, including prior and subsequent injuries, was assessed retrospectively within a defined recall period using a standardized questionnaire. Data collection was carried out at the Wroclaw University of Health and Sport Sciences during 2022–2023, involving students from physical education, sport, and physiotherapy degree programs.
Data from two independently recruited cohorts were combined to increase statistical power and ensure stable estimation of associations between subsequent musculoskeletal injury occurrence and intrinsic and extrinsic risk factors. Prior to merging, cohort comparability was verified as described above.
Participants completed an online questionnaire assessing musculoskeletal injury occurrence, physical activity patterns, and training characteristics, followed by an in-person laboratory assessment including anthropometric and body composition measurements.
2.2. Ethics
Ethical clearance for the study was granted by the Senate Research Ethics Committee of the Wroclaw University of Health and Sport Sciences (approval no. 13/2022). All stages of the research were carried out in line with the ethical principles defined by the Declaration of Helsinki. Before enrolling in the project, students received comprehensive information regarding the purpose of the study, the assessment procedures, and the conditions under which their data would be stored and used. Each participant provided electronic informed consent prior to the commencement of data collection.
2.3. Sample Size
The planned sample size was guided by methodological recommendations for multivariable analyses in epidemiological research. For logistic regression, we followed the widely used heuristic of approximately 10 outcome events per model parameter as a practical guideline to support reliable model estimation, including for interaction terms [
18,
19].
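Alongside this rule of thumb, we also performed an a priori sample-size calculation based on the conventional proportion formula, assuming a 95% confidence level and a prespecified margin of error (δ) [18,19]:
$$n = \frac{Z_{1-\alpha/2}^{2}\, p\,(1-p)}{\delta^{2}}$$
where $Z_{1-\alpha/2}$ = 1.96 is the standard normal quantile for a 95% confidence level, p is the expected proportion, and δ is the margin of error.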
With δ set to 0.05 and p fixed at 0.5 (worst-case variance), the resulting minimum sample size was 385. To account for potential missingness and incomplete records, we inflated this estimate by 20%, yielding a target sample size of approximately 460 participants. These parameters were chosen to provide sufficient power for sex-stratified analyses. In subgroup models where the “10 events per predictor” guideline could not be fully met, this criterion was treated as a flexible recommendation.
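The arithmetic behind these figures can be checked with a short sketch, assuming the conventional proportion formula n = Z²·p·(1 − p)/δ² with Z = 1.96. The function name is illustrative and not part of the study's code.

```python
import math


def sample_size_for_proportion(p, delta, z=1.96):
    """Minimum n for estimating a proportion p within margin delta
    at the confidence level implied by z (1.96 for 95%)."""
    return math.ceil(z ** 2 * p * (1 - p) / delta ** 2)


n_min = sample_size_for_proportion(p=0.5, delta=0.05)
n_target = round(n_min * 1.20)  # 20% inflation for missing or incomplete records

print(n_min, n_target)  # 385 462
```

The exact inflated value is 462, which the text rounds to approximately 460.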
2.4. Participants
In total, 454 university students participated (219 men, 48%; 235 women, 52%). Participants were drawn from students studying Physical Education, Sport, and Physiotherapy at the Wroclaw University of Health and Sport Sciences during the 2022–2023 academic year. Invitations to join the study were extended during scheduled classes, with no obligation to participate. The proportion of men and women in the final sample closely mirrored the typical sex distribution within these academic programs.
Initial eligibility criteria specified that participants had to be physically active students who regularly attended on-site university classes. To keep the sample comparable in terms of habitual physical activity exposure, we excluded students who participated in university-level competitive sport or were enrolled in elite performance programs. University-level competitive sports participation was defined as current membership in officially recognized university sports teams competing in inter-university or national-level competitions. Enrollment in elite performance programs referred to participation in structured, high-performance training pathways characterized by systematic coaching, competition schedules, and training volumes exceeding standard curricular requirements.
Eligibility for inclusion was assessed using a standardized self-report screening question administered during recruitment, in which participants declared whether they were currently involved in university sports teams or elite athletic programs. Individuals responding affirmatively were excluded from participation. Exclusion criteria were: (1) refusal to participate, (2) incomplete data for key study variables, (3) exemption from mandatory university physical classes lasting longer than two consecutive weeks, and (4) presence of an acute musculoskeletal injury within one month prior to the assessments.
Out of 454 participants, 36 were excluded as block-missing cases because data were missing for key variables (questionnaires, including injury occurrence, or laboratory measurements, mainly balance, which is not analyzed in this work).
A further 13 participants were missing data for only one variable; these missing values were addressed using multiple imputation, as detailed in the subsequent section. After applying data cleaning procedures and handling missing values through multiple imputation, the final analytical cohort comprised 418 students. The average total physical activity levels, expressed in MET·min/week, were above the IPAQ-defined high-activity criterion (≥3000 MET·min/week) in both men (3608 ± 1357) and women (3019 ± 1001).
2.5. Anthropometric Measurements
Anthropometric data collection took place at the Biokinetics Research Laboratory of the Central Research Laboratory, Wroclaw University of Health and Sport Sciences. The laboratory complies with a certified Quality Management System aligned with PN-EN ISO 9001:2015 (Certificate No. PW-15105-22X) [
20].
Stature was recorded in duplicate to the nearest 0.1 cm using a GPM anthropometer (GPM Instruments GmbH, Susten, Switzerland). Assessment of body mass and body fat percentage was carried out by means of bioelectrical impedance analysis (BIA) using the InBody 230 device (InBody Co., Ltd., Cerritos, CA, USA). Participants underwent testing without footwear and in light sports attire, in accordance with the manufacturer’s guidelines. Body mass index (BMI) was then computed using the following equation:
$$\mathrm{BMI} = \frac{\text{body mass (kg)}}{[\text{body height (m)}]^{2}}$$
Similarly, fat mass index (FMI) and skeletal muscle index (SMI) were calculated using the following formulas:
$$\mathrm{FMI} = \frac{\text{fat mass (kg)}}{[\text{body height (m)}]^{2}}, \qquad \mathrm{SMI} = \frac{\text{skeletal muscle mass (kg)}}{[\text{body height (m)}]^{2}}$$
In the present study, BMI was used solely as a descriptive variable. To avoid redundancy and potential collinearity, only body composition indices—fat mass index (FMI) and skeletal muscle index (SMI)—were included as independent variables in the analytical models.
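A minimal sketch of the three height-normalized indices follows. It assumes the standard definitions (mass in kilograms divided by height in metres squared) and that SMI is based on whole-body skeletal muscle mass as reported by the BIA device; the input values are hypothetical.

```python
def height_normalized_index(mass_kg, height_m):
    """Return mass (kg) divided by height squared (m^2)."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    return mass_kg / height_m ** 2


# Hypothetical participant, for illustration only.
height_m = 1.75
bmi = height_normalized_index(70.0, height_m)  # total body mass
fmi = height_normalized_index(14.0, height_m)  # fat mass
smi = height_normalized_index(32.0, height_m)  # skeletal muscle mass

print(f"BMI = {bmi:.1f}, FMI = {fmi:.1f}, SMI = {smi:.1f} kg/m^2")
```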
2.6. Questionnaire Measurements
Injury Occurrence—Injury History Questionnaire (IHQ)
Musculoskeletal injury history from the last 12 months was evaluated with the use of the standardized Injury History Questionnaire (IHQ), a commonly used instrument in epidemiological studies of physically active individuals [
14]. Participants reported the number and types of injuries related to sport, recreation, or everyday activities, including information on the injured body region and the duration of any resulting time-loss. For statistical analysis, responses were recoded into a binary variable reflecting the presence (1) or absence (0) of injury, which served as the main outcome measure. Previous studies in young adult and university student samples have demonstrated good test–retest reliability of the IHQ.
Subsequent injury classification
Based on the Injury History Questionnaire (IHQ), participants who reported at least one injury were additionally asked to indicate whether the injury occurred at the same anatomical location and involved the same injury type as any previously reported injury. Using this information, subsequent musculoskeletal injuries were classified into three mutually exclusive categories.
Recurrent injuries were defined as injuries occurring at the same anatomical location and of the same type as a previous injury. Related injuries were defined as injuries occurring at the same anatomical location but involving a different injury type, or injuries of the same type occurring at a different anatomical location. Unrelated injuries were defined as injuries occurring at a different anatomical location and involving a different injury type compared with any previously reported injury.
This classification was applied using predefined operational criteria derived from self-reported injury characteristics and was used to address study aims related to the differentiation of injury types.
For analytical purposes, subsequent injuries were further operationalized according to case numbers. While recurrent, related, and unrelated injuries were initially identified, injuries classified as related and unrelated were combined into a single category labeled “other subsequent injury” due to the limited number of cases in these subgroups. This approach ensured sufficient statistical power and stable estimation in the statistical analyses.
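The operational criteria above can be sketched as a small decision function. The function and argument names are illustrative and are not taken from the study's materials; the two boolean inputs correspond to the questionnaire items on anatomical location and injury type.

```python
def classify_subsequent_injury(same_location, same_type):
    """Classify a subsequent injury relative to a previously reported one."""
    if same_location and same_type:
        return "recurrent"
    if same_location or same_type:
        # Same site with a different type, or same type at a different site.
        return "related"
    return "unrelated"


def collapse_for_analysis(category):
    """Merge related and unrelated injuries into one analytical category."""
    return "recurrent" if category == "recurrent" else "other subsequent injury"


for location, injury_type in [(True, True), (True, False), (False, True), (False, False)]:
    category = classify_subsequent_injury(location, injury_type)
    print(location, injury_type, category, "->", collapse_for_analysis(category))
```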
Physical Activity—International Physical Activity Questionnaire (IPAQ)
Levels of physical activity were measured using the Polish version of the International Physical Activity Questionnaire—Long Form (IPAQ-LF) [
21]. The questionnaire was completed online through the Google Forms platform. The independent variables were total physical activity (TPA; MET·min/week) and its vigorous component (VPA), expressed as a percentage of TPA (%).
Training weekly load (TWL)
Training Weekly Load (TWL) was derived from self-report items included in the lifestyle questionnaire. Participants were asked: (1) “On average, how many structured training sessions do you perform per week?” and (2) “On average, how long does one training session last?” (reported in minutes). TWL was calculated as the product of weekly training frequency and average session duration, and expressed as minutes per week, providing an estimate of the total structured training volume (time-based exposure) accumulated across a week. This pragmatic approach was selected as it is suitable for cross-sectional, questionnaire-based studies in heterogeneous, non-elite student populations, where prospective session-level measures of internal load (e.g., session-RPE) are not feasible and may be affected by recall bias. This approach has been widely used in studies of physically active students and recreational athletes, offering a practical indicator of habitual training exposure.
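As a worked example of this definition, the sketch below computes TWL from the two questionnaire answers. The function name and the example responses are hypothetical.

```python
def training_weekly_load(sessions_per_week, minutes_per_session):
    """TWL in minutes per week: weekly frequency x average session duration."""
    if sessions_per_week < 0 or minutes_per_session < 0:
        raise ValueError("inputs must be non-negative")
    return sessions_per_week * minutes_per_session


# A student reporting 4 sessions per week of 90 minutes each.
twl = training_weekly_load(4, 90)
print(twl)  # 360
```

Two students with the same TWL may differ substantially in intensity, which this time-based measure does not capture.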
2.7. Treatment of Incomplete Data and Imputation Strategy
Data incompleteness was observed for some questionnaire results (Questionnaire of Eating Behaviors, QEB, and Pittsburgh Sleep Quality Index, PSQI) and balance assessments. The missing-data mechanism was evaluated using a logistic regression-based test of the Missing Completely at Random (MCAR) assumption, in which PSQI missingness was regressed on all 16 QEB dietary items. The likelihood ratio test (χ² = 18.64, df = 16, p = 0.288) indicated no dependence of missingness on the observed dietary items, supporting the assumption that missingness followed an MCAR pattern. Balance measures were excluded from the missingness diagnostics because they were not included in the primary analytical framework.
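The reported p-value can be reproduced from the test statistic and its degrees of freedom. The sketch below uses the closed-form chi-square survival function that holds for even degrees of freedom, so it needs only the standard library; it checks the arithmetic of the likelihood ratio test and does not refit the missingness model.

```python
import math


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees
    of freedom: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # i = 0 term of the series
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(18.64, 16)
print(f"p = {p_value:.3f}")  # p = 0.288
```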
Missing values in multivariable analyses were managed by applying a multiple imputation approach based on chained equations (MICE; mice v3.14.0). Twenty imputed datasets were generated using predictive mean matching for continuous variables, logistic regression for binary variables, and polytomous regression for ordinal dietary items. All analyses were conducted in R (RStudio 2025.09.1+401), with convergence confirmed through standard diagnostic plots.
Although the missing values did not concern the variables included in the present injury-related analyses, the imputation procedure is reported because it affected the composition of the overall dataset. Multiple imputation allowed all participants with partially incomplete records to remain in the analytical pool, ensuring that the final dataset consisted of 418 individuals rather than a smaller, case-wise reduced sample. In other words, imputing missing values preserved the full sample structure used across the study, while maintaining appropriate statistical integrity for subsequent modeling.
2.8. Statistics
Data analyses were carried out with the use of Statistica 14.0 (TIBCO Software Inc., Palo Alto, CA, USA) as well as RStudio (v2025.09.1+401). Assumptions of normality and homoscedasticity for continuous variables were verified using the Shapiro-Wilk and Levene’s tests, respectively, before further modeling. Baseline characteristics are summarized using means ± standard deviations with 95% confidence intervals or frequencies and proportions, as appropriate.
The sequence of statistical analyses and decision steps applied in the study is summarized in an analytical flowchart (Figure 1), providing an overview of the methodological process from preliminary data screening to multivariable modeling and ROC-based evaluation.
Simple comparisons
Sex-related differences were assessed by applying independent t-tests for continuous variables and chi-square tests for categorical variables.
Univariate logistic regression
Following the exploratory stage, separate univariate logistic regression models were estimated for each intrinsic and extrinsic variable to assess their individual associations with subsequent injury (injured vs. non-injured). These models provided crude odds ratios (ORs) and 95% confidence intervals, allowing identification of the strongest single predictors within each risk-factor domain.
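For readers less familiar with how crude odds ratios arise from a logistic model, the sketch below converts a fitted coefficient and its standard error into an OR with a Wald-type 95% confidence interval. The coefficient and standard error shown are hypothetical, and the Wald interval is an assumption about how the intervals were obtained.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) from a logit coefficient and its SE,
    using a Wald interval on the log-odds scale."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient for a one-unit increase in a predictor.
beta = math.log(1.09)
se = 0.04

odds_ratio, lower, upper = odds_ratio_with_ci(beta, se)
print(f"OR = {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```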
Multivariable logistic regression
Variables showing the strongest independent relationships in the univariate models (sex, FMI or SMI and TPA, VPA, TWL or EXP) were then jointly entered into multivariable logistic regression models. This step allowed assessment of whether intrinsic and extrinsic predictors remained significant when controlling for each other, and whether their combined effects improved model performance relative to single-factor models.
ROC analysis for single predictors
To quantify the discriminatory ability of the strongest body-composition and training/physical activity predictors, receiver operating characteristic (ROC) curves were generated for each variable separately. The discriminatory ability of the predictors was assessed by calculating the area under the curve (AUC). Optimal threshold values were established using the Youden index, with sensitivity and specificity reported for each variable. Cut-off points, sensitivities, and specificities were reported for descriptive and comparative purposes only and should not be interpreted as thresholds for practical screening or clinical decision-making.
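The quantities named in this step can be illustrated with a small, dependency-free sketch. It computes the AUC as the probability that a randomly chosen injured participant has a higher predictor value than a non-injured one, and selects the cut-off maximizing the Youden index (J = sensitivity + specificity − 1). The data are toy values, and the rule "positive when score ≥ threshold" is an assumption.

```python
def auc(scores, labels):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(threshold, sensitivity, specificity) for each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < thr and y == 0)
        points.append((thr, tp / n_pos, tn / n_neg))
    return points


def youden_cutoff(scores, labels):
    """Point on the ROC curve with the largest Youden index."""
    return max(roc_points(scores, labels), key=lambda t: t[1] + t[2] - 1)


# Toy predictor values (e.g., SMI) and injury status (1 = subsequent injury).
scores = [7.1, 7.8, 8.0, 8.4, 8.9, 9.3, 9.6, 10.2]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

threshold, sensitivity, specificity = youden_cutoff(scores, labels)
print(auc(scores, labels), threshold, sensitivity, specificity)  # 0.875 9.3 0.75 1.0
```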
ROC analysis for combined predictors
Finally, a ROC curve was constructed for the multivariable model including both the body-composition and training/physical activity predictors. Comparing this combined AUC to the AUCs of the two single-predictor models allowed evaluation of whether the cumulative effect of the strongest risk factors provided superior discrimination relative to individual predictors alone, and whether integrating intrinsic and extrinsic characteristics meaningfully enhanced injury-risk classification. Accordingly, sensitivity, specificity, and Youden index values are presented for descriptive and comparative purposes only and should not be interpreted as clinically actionable screening thresholds.
The significance threshold was set at p < 0.05. Interaction effects were visualized in R using ggplot2 (v. 4.0.1.).
Generative AI tools were used in accordance with COPE recommendations and MDPI transparency requirements and were limited to preparatory, editorial, and organizational support. No AI systems were involved in statistical analyses, interpretation of results, or formulation of scientific conclusions. Chat Academia (v1.0, 2025), Scholarcy (v4.0, 2025), NotebookLM (v1.3, 2025), and Elicit (v2.0, 2025) were used for literature exploration, preliminary summaries, and identification of common methodological approaches, with all outputs verified manually. ChatGPT (OpenAI, GPT-4.1, 2025) supported language editing, early drafting, and access to technical documentation for R (v. 4.5.2.) packages. All AI-assisted content was reviewed, edited, and approved by the authors, who take full responsibility for the final manuscript.
4. Discussion
This study examined how body composition indices, training weekly load (TWL), and physical activity measures relate to subsequent musculoskeletal injuries in physically active university students. Overall, the observed associations were weak. Univariate models showed only minimal associations between individual factors and subsequent injury, with SMI and, to a lesser degree, TWL emerging as the relatively strongest but still modest markers. The combined model indicated that neither SMI nor TWL, nor their interaction, contributed meaningfully to subsequent injury status. Consistent with these findings, exploratory ROC analyses showed poor discrimination (AUC marginally above 0.60 and below 0.65 across models), indicating that these models are not practically useful for screening or individual-level stratification of subsequent injury in this population.
A key finding was that SMI was the only intrinsic/extrinsic variable significantly associated with subsequent injury; however, the magnitude of this association was minimal (OR ≈ 1.09) and corresponds to a trivial effect (Cohen’s d ≈ 0.05), which limits its explanatory relevance and practical value. This pattern is consistent with broader evidence suggesting that many modifiable morphological or neuromuscular characteristics show weak or inconsistent associations with injury outcomes and often provide limited standalone utility beyond non-modifiable factors such as age and previous injury [
7]. Similarly, prospective research in professional football suggests that comprehensive strength-testing protocols may have limited value as screening tools for discriminating future musculoskeletal injury risk in applied settings [
22]. Taken together, the small effect of SMI and the absence of a robust TWL signal support the view that subsequent injuries emerge from complex, multifactorial interactions rather than from any single marker [
23]. One plausible interpretation is that SMI may partly reflect habitual training exposure and longer-term sport participation: students with higher muscle mass may have greater exposure to potentially injurious situations, which could explain the direction of the association despite its very small magnitude. This is consistent with evidence linking skeletal muscle mass indices with physical activity history in university students [
24].
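The pairing of OR ≈ 1.09 with d ≈ 0.05 reported above is consistent with the commonly used logit-based conversion d = ln(OR) × √3/π. Whether this exact conversion was used in the present analysis is an assumption; the sketch only checks that the two reported values agree under it.

```python
import math


def odds_ratio_to_cohens_d(odds_ratio):
    """Approximate Cohen's d from an odds ratio: d = ln(OR) * sqrt(3) / pi."""
    if odds_ratio <= 0:
        raise ValueError("odds_ratio must be positive")
    return math.log(odds_ratio) * math.sqrt(3) / math.pi


d = odds_ratio_to_cohens_d(1.09)
print(f"d = {d:.2f}")  # d = 0.05
```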
When interpreting training-related findings, the operationalization of exposure is important. Accordingly, findings related to training characteristics should be interpreted with caution, as the applied measures capture simplified and partial aspects of training exposure rather than its full physiological and temporal complexity. In our study, TWL was a pragmatic time-based proxy that may not capture intensity distribution, activity type, recovery, or temporal load fluctuations that influence adaptation. Contemporary conceptual models emphasize that training load alone is insufficient to explain injury occurrence and must be understood within broader interactions between intrinsic capacity, exposure, and recovery processes [
23]. In this context, injury risk is often framed as a function of how quickly and how far training loads are progressed relative to an individual’s capacity, rather than as a simple consequence of absolute weekly volume alone [
25]. Related work suggests that acute:chronic workload dynamics and short-term workload “spikes” (including EWMA-based approaches) may provide additional insight beyond simple weekly totals [
26]. These considerations likely contribute to the weak and imprecise association observed for TWL and help explain why combining TWL with SMI did not yield meaningfully stronger discrimination.
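To make the contrast with a weekly total concrete, the sketch below computes an exponentially weighted moving average (EWMA) acute:chronic workload ratio from daily loads. This measure was not collected or analyzed in the present study. The decay constant λ = 2/(N + 1) and the 7-day and 28-day windows are assumptions that follow a commonly described formulation, and the daily loads are invented.

```python
def ewma(loads, span_days):
    """EWMA series with decay constant lambda = 2 / (span_days + 1),
    initialized at the first observed load."""
    lam = 2 / (span_days + 1)
    value = loads[0]
    series = [value]
    for load in loads[1:]:
        value = lam * load + (1 - lam) * value
        series.append(value)
    return series


def acute_chronic_ratio(loads, acute_days=7, chronic_days=28):
    """Daily ratio of the acute EWMA to the chronic EWMA."""
    acute = ewma(loads, acute_days)
    chronic = ewma(loads, chronic_days)
    return [a / c if c > 0 else float("nan") for a, c in zip(acute, chronic)]


# Four weeks of steady 60-minute days, then a single 180-minute "spike".
daily_minutes = [60] * 28 + [180]
ratios = acute_chronic_ratio(daily_minutes)
print(f"ratio before spike = {ratios[-2]:.2f}, after spike = {ratios[-1]:.2f}")
```

A student whose weekly minutes barely change can still show a marked spike in this ratio, which a single TWL value would not reveal.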
A further explanation for the weak associations is that the evaluated variables are relatively distal proxies of injury mechanisms. Body composition indices (FMI/SMI) may reflect long-term morphology and exposure, whereas subsequent injury occurrence is also driven by short-term fluctuations in tissue capacity, recovery, and task-specific loading that were not captured by our measures. In addition, physical activity metrics derived from IPAQ (TPA and VPA%) quantify overall activity and relative intensity distribution rather than the mechanical characteristics of exposure (e.g., impact, direction changes, contact), which may attenuate associations with musculoskeletal injury outcomes.
The absence of a cumulative or interaction effect in our analyses indicates that the combined models did not improve discriminatory performance compared with single markers, suggesting no meaningful synergy between the tested intrinsic and extrinsic variables within this relatively homogeneous, highly active student cohort. Although some studies in other contexts have reported incremental gains when combining indicators (e.g., selected asymmetry-related measures with body composition indices) [
27], broader evaluations of musculoskeletal injury prediction modeling show that, even when multiple candidate variables are combined, model performance is often modest, with limited generalizability and practical utility [
28]. Evidence on asymmetry-related risk factors is also mixed overall, with inconsistent associations across sporting populations [
29]. Importantly, reviews of machine-learning approaches in sports injury research highlight that even advanced analytical methods often achieve only modest and poorly generalizable discrimination when based on limited sets of isolated predictors, supporting the need for broader, multidimensional monitoring frameworks [
30]. Collectively, these observations align with our results and reinforce the central practical conclusion that simple anthropometric/body-composition markers and time-based TWL summaries are not sufficient for useful discrimination of subsequent injury status in this population.
The weak discriminatory performance of our models (AUC ≈ 0.60) has clear practical and clinical implications. In applied sports medicine contexts, AUC values below approximately 0.70 are generally considered to reflect poor or only minimal discriminative capability, indicating insufficient accuracy for reliable individual-level injury risk prediction. Consequently, such models offer limited clinical utility for distinguishing between individuals who will and will not sustain a subsequent musculoskeletal injury. From a practical perspective, this suggests that coaches, physiotherapists, and physical education educators should not treat skeletal muscle index (SMI) or training weekly load (TWL) as standalone or clinically actionable prediction tools for subsequent injury risk in previously injured university students. Instead, our findings reinforce the broader conclusion that injury prevention strategies in highly active populations should not be built around simple anthropometric markers or single-volume workload summaries. In the context of this manuscript, ROC findings are interpreted as discrimination/classification within the observed sample, rather than as prospective, time-ordered prediction. Importantly, existing evidence in sports injury prediction research indicates that even increasingly sophisticated analytical approaches often struggle to achieve strong and clinically meaningful discrimination when relying on limited sets of isolated predictors, underscoring the need for broader, multidimensional monitoring frameworks [
30]. From an applied standpoint, prevention efforts in university settings are likely better served by focusing on controllable and modifiable domains such as training intensity regulation and progression, movement quality and technical execution, and structured recovery management, while also accounting for psychological and academic stressors. Building on this practical perspective, day-to-day prevention in university settings may be better served by individualized return-to-activity progression after an initial injury and cautious load progression in students with a recent injury history [
31]. Routine monitoring of subjective fatigue and early warning symptoms (e.g., soreness, perceived exertion, sleep quality, and overall wellness) may provide more actionable information than weekly load summaries alone, as supported by evidence on single-item athlete wellbeing measures and their relationship with training load [
32]. Preventive practice is also facilitated by instructors and coaches who can support implementation and adherence; evidence indicates that coach education can improve adherence to injury-prevention programs in real-world settings [
33].
Several limitations of the present study should be acknowledged when interpreting the findings. Due to the cross-sectional design and retrospective assessment of injury history, causal relationships and precise temporal sequencing of injuries cannot be inferred. Training weekly load (TWL) was operationalized as a simple time-based proxy derived from self-reported training frequency and session duration and therefore did not capture dimensions such as exercise intensity, activity type, or progressive overload. Furthermore, because all participants were recruited from sport- and health-related university programs and exhibited generally high physical activity levels, the generalizability of the findings to less active or non-sport student populations may be limited. Vigorous physical activity was expressed as a percentage of total physical activity (VPA%), reflecting relative intensity distribution rather than absolute volume; therefore, this measure does not capture the total amount of vigorous activity performed. Training-related findings should be interpreted cautiously in light of these methodological limitations. Future work would benefit from longitudinal designs, sex- and age-stratified analyses, and broader sets of measured determinants to clarify whether injury-relevant profiles differ across subgroups and exposure environments [
14,
34,
35,
36,
37]. At the same time, while movement-screening constructs (e.g., Functional Movement Screen), posture-related characteristics, and performance attributes have been examined in the wider literature [
38,
39,
40,
41,
42], these domains were not assessed in the present study and therefore cannot be evaluated here.