Cross-Cultural Adaptation of the Dance Functional Outcome Survey (DFOS) for Spanish Dancers

A growing number of research papers regarding Spanish-speaking dancers justifies the need for an adapted Spanish version of the Dance Functional Outcome Survey (DFOS). The objective of this study was to cross-culturally adapt and validate the DFOS for Spanish-speaking dancers. A sample of 127 healthy and injured professional and pre-professional dancers was recruited. Test-retest reliability of the Spanish version (DFOS-Sp) was examined using intraclass correlation coefficients (ICC). Construct validity was assessed by comparing DFOS-Sp to the Medical Outcomes Study 36-Item Short-Form Health Survey (SF-36) using Pearson correlations. Principal component analysis (PCA) was used to identify factors, and internal consistency was assessed with Cronbach's α. Sensitivity was evaluated using receiver operating characteristic (ROC) curves and area under the curve (AUC) analyses. A subgroup of 51 injured dancers was followed across three time-points to examine responsiveness using repeated measures analysis of variance. Injured dancers' scores were analyzed for floor and ceiling effects. The DFOS-Sp showed high test-retest reliability (ICC2,1 ≥ 0.92). DFOS-Sp scores had moderate construct validity compared with SF-36 physical component summary scores (r ≥ 0.56). PCA supported uni-dimensionality, explaining 58% of the variance, with high internal consistency (α = 0.91). AUC values were excellent (AUC ≥ 0.82). There were significant differences across time (p < 0.001), demonstrating responsiveness to change, with no floor or ceiling effects. The DFOS-Sp demonstrated acceptable test-retest reliability and validity in Spanish-speaking dancers, with comparable psychometric performance to the English-language version.


Introduction
Ballet and contemporary dance are activities that require advanced levels of technical skill. Dancers are considered high-performance athletes because they perform complex, physically demanding routines and are subjected to long periods of rehearsal, similar to other elite athletes [1,2].
Most studies that investigate the incidence and prevalence of injuries in dance point to classical ballet as the dance modality with the highest technical demands [3][4][5]. As the foundation for other dance forms, ballet also sustains the highest rate of injuries [1,6,7]. However, both classical ballet and contemporary dancers are at high risk of low back and lower extremity injuries [7][8][9].
Due to the functional impact of injury on the dancer's life, improved ways to assess and treat these challenging injuries are necessary. To analyze treatment efficacy, dance-specific outcome measures

Cross-Cultural Adaptation
We followed international recommendations to perform cross-cultural adaptation and translation of the DFOS [26,27]. Cross-cultural adaptation includes cultural and linguistic adaptation of a questionnaire, and examination of its psychometric properties of reliability and validity [28]. This process consisted of six steps: (1) forward translation; (2) reconciliation; (3) back translation; (4) review and reconciliation; (5) pilot study; and (6) validation (Figure 1, Supplementary Materials survey A and B). Recommendations advise using forward translation (from source language to target language) followed by back translation into the source language. The forward translators should be bilingual native speakers of the target language, and the back translators bilingual native speakers of the source language. The new version is then analyzed thoroughly to identify discrepancies and verify that the questionnaire will be clearly understood by study participants. In the forward translation, one translator was a physical therapist and the other was a sports physician with extensive clinical experience in dance musculoskeletal disorders. In the back translation, the translators were sports physical therapists.

Participants
We recruited healthy dancers from dance companies and schools and injured dancers from dance-medicine physical therapy clinics in Spain. Inclusion criteria for healthy participants were: (i) minimum of 3 years of dance training including ballet and/or contemporary dance; (ii) intermediate to expert skill level; (iii) ≥15 years of age; and (iv) no low back or lower extremity injury in the previous 3 months. For injured dancers, inclusion criteria were: (i) minimum of 3 years of dance training including ballet and/or contemporary dance; (ii) intermediate to expert skill level; (iii) over 15 years of age; and (iv) clinical diagnosis of any musculoskeletal injury in the low back or lower extremity verified on ultrasound or magnetic resonance imaging. Exclusion criteria were: (i) non-Spanish-speaking; (ii) pregnancy; (iii) current active disease processes; (iv) musculoskeletal injury in another part of the body, such as the upper limb or cervical region; or (v) previous surgery. Dancers were informed of the study objectives and provided written consent according to the guidelines approved by the local ethics committee (01/2019), which complied with all the principles set out in the Declaration of Helsinki. Parental consent was obtained if the dancer was under 18 years of age.
An a priori analysis was conducted to determine sample size for test-retest reliability and construct validity (one group, two measurements; effect size δ = 0.25, power = 0.95, α = 0.05), resulting in 54 subjects [29]. A sample of 89 dancers was enrolled in the test-retest analysis and 127 dancers in the construct validity analysis. For factor analysis, a minimum of five observations per item (i.e., 70 dancers for the 14 DFOS-Sp items) is recommended [30]. The sample of 89 dancers was used in the factor analysis.
To assess differences between healthy and injured groups in receiver operating characteristic (ROC) analyses, sample size estimation was conducted using a predetermined level of sensitivity of 80% (alternative hypothesis Ha = 0.80, null hypothesis H0 = 0.50, α = 0.05, power = 0.95) [31]. A sample of 52 per group was necessary, and 127 dancers were used in this analysis.
To assess instrument responsiveness to change, a priori analysis for sample size was conducted using one-group repeated measures ANOVA over three time-points: injured at intake (Intake Time), discharge (Discharged), and 3-months follow-up (3-months). With a small effect size δ = 0.25, power = 0.95, α = 0.05, a total sample of 43 was required. A sample of 52 injured dancers was used in this analysis.

Data Analysis
Incomplete questionnaires missing more than two items were eliminated. DFOS-Sp total, activities of daily living (ADL), and Technique scores were obtained by summing individual questions. SF-36 scores for the eight domains and the composite mental (MCS) and physical (PCS) component summary scores were obtained using standard procedures [23].
Principal component analysis (PCA) was conducted using eigenvalues and factor loading patterns to identify and extract factors, and Cronbach's α was calculated to assess internal consistency (JASP v.0.9.2.0, University of Amsterdam, The Netherlands). We hypothesized a single-factor model with item correlations ≥ 0.70 and Cronbach's α ≥ 0.70.
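To make the internal consistency criterion concrete, the following is a minimal sketch of Cronbach's α using the standard variance-based formula. It is not the JASP routine used in the study; the function name and the toy data are illustrative assumptions.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    Standard formula: alpha = k/(k-1) * (1 - sum(item variances) / variance(total score)).
    Illustrative helper only; not the procedure used in the study software.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of each respondent's summed score
    total_var = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: three respondents, two items
toy = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(toy), 3))
```

A value at or above the hypothesized 0.70 would count as acceptable under the criterion stated above.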
To compare the healthy and injured groups, we conducted a t test with equal variances not assumed, due to unequal sample sizes (Healthy = 74; Injured = 69) and a significant Levene test (p < 0.004). Predictive accuracy, or sensitivity, was measured by generating ROC curves, areas under the curve (AUC), and associated 95% CI for DFOS-Sp (total, ADL, and Technique subscores) in SPSS. ROC curves used DFOS-Sp scores as the test variables, with group membership as the binary state variable coded as 0 = healthy and 1 = injured. Sensitivity and specificity for cutoff values were determined.
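The ROC quantities described above can be illustrated with a short sketch. It assumes that lower DFOS-Sp scores indicate poorer function, so that a score at or below a cutoff is classified as injured; that direction, the function names, and the scores are assumptions for illustration and do not come from the study data or from SPSS.

```python
def auc_injured_lower(healthy, injured):
    """AUC as the probability that a randomly chosen injured score is lower
    than a randomly chosen healthy score (ties count as one half)."""
    wins = 0.0
    for inj in injured:
        for hea in healthy:
            if inj < hea:
                wins += 1.0
            elif inj == hea:
                wins += 0.5
    return wins / (len(injured) * len(healthy))


def sensitivity_specificity(healthy, injured, cutoff):
    """Classify a score <= cutoff as injured (assumed direction)."""
    true_pos = sum(1 for s in injured if s <= cutoff)
    true_neg = sum(1 for s in healthy if s > cutoff)
    return true_pos / len(injured), true_neg / len(healthy)


# Hypothetical scores, not study data
healthy_scores = [90, 85, 95, 80]
injured_scores = [60, 70, 85, 50]
print(auc_injured_lower(healthy_scores, injured_scores))
print(sensitivity_specificity(healthy_scores, injured_scores, cutoff=80))
```

Repeating the second call over every observed cutoff traces the ROC curve; the first function summarizes that curve in a single number.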
To determine internal responsiveness, we examined differences in DFOS-Sp and SF-36 scores in injured dancers across three time-points using repeated measures analysis of variance in SPSS. For all analyses, Mauchly's test was used to assess assumption of sphericity. In the case of significance, the Huynh-Feldt correction was applied to the degrees of freedom and F value if the epsilon value was 0.75 or greater, and the Greenhouse-Geisser correction if epsilon was <0.75. In these cases, epsilon and corrected values (e.g., degrees of freedom, F values) are reported. Pairwise comparisons were conducted where there was a significant main effect. We hypothesized pairwise differences across time-points.
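The decision rule for sphericity corrections described above can be written out as a small sketch. The function names and the 0.05 significance level for Mauchly's test are assumptions; the 0.75 threshold follows the text.

```python
def choose_sphericity_correction(mauchly_p, epsilon, alpha=0.05):
    """Return which correction applies under the rule described in the text.

    No correction if Mauchly's test is not significant; otherwise Huynh-Feldt
    when epsilon >= 0.75 and Greenhouse-Geisser when epsilon < 0.75.
    """
    if mauchly_p >= alpha:
        return "none"
    return "Huynh-Feldt" if epsilon >= 0.75 else "Greenhouse-Geisser"


def corrected_degrees_of_freedom(n_levels, n_subjects, epsilon):
    """Multiply the one-way repeated measures ANOVA degrees of freedom by epsilon."""
    df_effect = (n_levels - 1) * epsilon
    df_error = (n_levels - 1) * (n_subjects - 1) * epsilon
    return df_effect, df_error


# Hypothetical values, not study results
print(choose_sphericity_correction(mauchly_p=0.01, epsilon=0.60))
print(corrected_degrees_of_freedom(n_levels=3, n_subjects=20, epsilon=0.8))
```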
Internal responsiveness was further defined in four ways: SEM, minimal detectable change at 95% CI (MDC95), standardized response mean (SRM), and effect size, using the following equations: $\mathrm{MDC}_{95} = 1.96 \times \sqrt{2} \times \mathrm{SEM}$; $\mathrm{SRM} = \text{mean change in score} / \text{SD of change scores}$; $\text{effect size} = \text{mean change in score} / \text{SD of baseline scores}$. SEM, MDC95, SRM, and effect size were calculated for DFOS-Sp total and subscores and for SF-36 PCS and MCS. We anticipated SRM values demonstrating high responsiveness [35], and large effect sizes of 0.80 or greater [36].
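The responsiveness indices defined above translate directly into code. The sketch below is illustrative only: the text does not state how SEM was derived, so the helper assumes the common definition SEM = SD × √(1 − ICC), and all function names and scores are hypothetical.

```python
import math
from statistics import mean, stdev


def sem_from_icc(sd, icc):
    """Standard error of measurement, ASSUMING SEM = SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem):
    """Minimal detectable change at 95% confidence: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


def standardized_response_mean(baseline, follow_up):
    """Mean change in score divided by the SD of the change scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def effect_size(baseline, follow_up):
    """Mean change in score divided by the SD of the baseline scores."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(baseline)


# Hypothetical intake and discharge scores for four dancers
intake = [50, 60, 70, 80]
discharge = [60, 75, 80, 95]
print(round(mdc95(sem_from_icc(sd=10.0, icc=0.91)), 2))
print(round(standardized_response_mean(intake, discharge), 2))
print(round(effect_size(intake, discharge), 2))
```

A change score smaller than MDC95 cannot be distinguished from measurement error, which is why SEM and MDC95 are reported alongside SRM and effect size.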
For floor and ceiling effects, we determined the percentage of dancers who achieved the highest and lowest DFOS-Sp scores within the 'Injured' group. Ceiling and floor effects of <15% of respondents scoring the highest or lowest scores were considered acceptable [37].
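The floor and ceiling check described above amounts to counting extreme scores. The following sketch treats 15% or more of respondents at either extreme as an effect, matching the "<15% acceptable" criterion; the score range of 0 to 100 and the data are assumptions for illustration.

```python
def floor_ceiling_effects(scores, min_score, max_score, threshold=0.15):
    """Proportion of respondents at the lowest and highest possible scores.

    An effect is flagged when the proportion is not below the threshold,
    i.e. when fewer than 15% at an extreme is considered acceptable.
    """
    n = len(scores)
    floor_prop = sum(1 for s in scores if s == min_score) / n
    ceiling_prop = sum(1 for s in scores if s == max_score) / n
    return {
        "floor_proportion": floor_prop,
        "ceiling_proportion": ceiling_prop,
        "floor_effect": floor_prop >= threshold,
        "ceiling_effect": ceiling_prop >= threshold,
    }


# Hypothetical scores on an assumed 0-100 scale
result = floor_ceiling_effects([0, 40, 55, 60, 72, 80, 85, 90, 95, 100], 0, 100)
print(result)
```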

Results
Healthy group test-retest reliability of DFOS-Sp total, ADL, and Technique scores was high (ICC2,1 = 0.92, 0.89, and 0.91, respectively). DFOS-Sp item correlations were high, ranging from ICC2,1 = 0.70 to 0.92, with the exception of stairs, developpé, and rond de jambe, which were moderate (ICC2,1 = 0.66 to 0.69). SEM ranged from 1.29 to 1.93. 'Injured' group reliability was very high for all DFOS-Sp scores and at item level (ICC2,1 = 0.99). SEM ranged from 0.66 to 1.49.
The combined (Healthy and Injured) pool of 127 dancers was used in analyses of construct validity. Moderate Pearson correlations were found between SF-36 PCS and DFOS-Sp total (r = 0.61) and subscores (ADL r = 0.56 and Technique r = 0.58) (Table 3). Individual ADL items were compared to PCS domains (Physical Function, Role Physical, Bodily Pain), with correlations ranging from r = 0.26 to 0.54. Individual Technique items were compared to the PCS domain Physical Function, with correlations ranging from r = 0.20 to 0.50. In contrast, weak to no correlations were found for SF-36 MCS v. DFOS-Sp total and subscores. Correlations of individual ADL items with MCS domains (Vitality, Social Functioning, Mental Health, Role Emotional) and of Technique items with the MCS domain Social Functioning were also weak.
Data from 89 participants were used in PCA and Cronbach's α internal consistency analyses. PCA used single-factor loading with oblique oblimin rotation and suppression of coefficients < 0.40 [38]. Kaiser-Meyer-Olkin = 0.86 indicated sampling adequacy and Bartlett's Test of Sphericity was significant (χ² = 1053.222, df = 91, p < 0.001). Factor loadings ranged from 0.50 to 0.85 (Table 4), with the single factor accounting for 58% of the variance (eigenvalue = 8.12). Cronbach's α values were high for all 14 items (α = 0.91, 95% CI = 0.88-0.94), the 6 items within ADL (α = 0.91), and the 8 items within Technique (α = 0.91).
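As a quick arithmetic check on the PCA figures reported above, and assuming the analysis was run on the correlation matrix of the 14 items (so that the total variance equals the number of items), the share of variance carried by the single component follows from its eigenvalue. The symbols $\lambda_1$ (first eigenvalue) and $p$ (number of items) are introduced here for illustration.

```latex
\[
  \text{proportion of variance} = \frac{\lambda_1}{p} = \frac{8.12}{14} \approx 0.58
\]
```

This agrees with the 58% of variance explained reported for the single-factor solution.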
Fifty-one dancers (46 female, 5 male; mean ± SD age, 19.12 ± 3.02 years) participated in internal responsiveness investigations. The dancers were recruited from a private clinic that specializes in dance injuries. The principal investigator gave the instructions, educated the physiotherapists, and supervised the physiotherapy treatments to ensure that the procedures were conducted in a standard way. The dancers' information was given to the researchers. The main injuries were: soleus muscle strain, hamstring rupture, adductor magnus rupture, Achilles tendinopathy, posterior ankle impingement, psoas tendinopathy, bone edema, and patellar tendinopathy (Figure 3). Treatment provided to the injured dancers included rest, eccentric exercises, ultrasound-guided percutaneous needle electrolysis, ultrasound-guided percutaneous neuromodulation, manual therapy, magnetotherapy, and activity modifications.
Mauchly's Test of Sphericity was not significant; therefore, sphericity was not violated. There were significant differences across time for DFOS-Sp total, F(1,49) = 64.145, p < 0.001; ADL, F(1,49) = 90.954, p < 0.001; and Technique, F(1,49) = 28.081, p < 0.001 (Figure 4, Table 5). Pairwise comparisons were also significant between Injured, Discharged, and 3-months (p < 0.001) for DFOS-Sp total, ADL, and Technique scores. There were also significant differences across time for SF-36 PCS, F(1,49) = 135.892, p < 0.001, but not for MCS. PCS pairwise comparisons were significant between each of the three time-points (p < 0.001).
For DFOS-Sp and SF-36 scores over the three time-points, SEM values were highest at Intake Time and lowest at 3-months. MDC95 displayed the same pattern, decreasing from high at Intake Time to low at 3-months. SRM values for DFOS-Sp and PCS scores ranged from 0.65 to 1.37 when comparing Intake Time to Discharged.
'Injured' group DFOS-Sp scores were examined for floor and ceiling effects. One percent of 'Injured' individuals had minimum DFOS-Sp total scores, and none had maximal scores. Therefore, no ceiling or floor effects were considered to be present.

Discussion
The main finding of this study was that DFOS-Sp items were equivalent to those in the original version, as determined by the bilingual and clinical experts involved in the study. This adaptation showed good psychometric properties in Spanish dancers. The hypotheses regarding high test-retest and equivalence reliability, high internal-item consistency, and convergent and divergent correlations between DFOS-Sp and SF-36 PCS and MCS were supported. Most items in the single-factor PCA model were highly correlated. No ceiling or floor effects were considered to be present. Each finding is discussed below.
The DFOS contains French ballet terms used in ballet training around the world, which simplified translation. All dancers who completed the DFOS-Sp had ballet training. Although Beaton et al. [25] suggested that one forward translator have no knowledge of the concepts being quantified, we used bilingual native Spanish-speaking translators with dance backgrounds for the forward translation because of this shared terminology, which made it simpler to convey meaning and intent. The questionnaire seemed to be easily understood by participants, who required less than 5 min to complete it independently. All participants completed the full DFOS-Sp, resulting in a maximum response rate.
We approached our assessment of the psychometric properties of the DFOS-Sp using criteria described by the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN consensus) [39]. Important measurement properties to assess in health status questionnaires include internal consistency, reproducibility, construct validity, floor and ceiling effects, responsiveness, and interpretability. These are addressed below.
Test-retest reliability correlations were very high for DFOS-Sp total and subscores for 'Combined', 'Healthy', and 'Injured' groups, supporting our hypothesis of ICC ≥ 0.70. These findings are similar to those reported for the original DFOS [10]. Individual-items were also very high with several exceptions in the 'Healthy' group. Stairs, developpé, and rond de jambe were moderately correlated. This inter-item discrepancy was also reported for rond de jambe in the 'Healthy' group in the English version DFOS. It is often difficult for healthy dancers to discriminate between qualitative fluctuations of ability to perform, muscle soreness, and injury pain. Because they push themselves, we often see them, for example, avoiding stairs at the end of the day due to a heavy rehearsal or performance, without being injured.
We calculated SEM for 'Combined', 'Healthy', and 'Injured' groups, with similar results across the groups. SEM for DFOS-Sp was slightly lower than that reported for the English-DFOS [10] and lower than reported values for other orthopaedic outcome tools [40][41][42].
Construct validity was assessed by examining correlations between DFOS-Sp and SF-36. It was expected that PCS and its domain scores would have stronger correlations with DFOS-Sp, indicating convergent validity, while MCS and its domain scores would have weak or no correlation with DFOS-Sp, indicating divergent validity. Similar to previous studies, there were moderate correlations for DFOS-Sp v. PCS and weak correlations for DFOS-Sp v. MCS [42][43][44][45].
PCA found that all 14 DFOS-Sp items loaded onto a single factor. Factor loadings were ≥0.74 with the exception of overall activity and kneeling. These exceeded 0.50, so they were not considered for elimination. High Cronbach's α indicated excellent internal consistency, similar to the original DFOS [10]. Bronner et al. [10] reported that all 14 items loaded onto a single factor, indicating a single dimension in the original DFOS. However, they suggested that clinicians calculate both subscores as well as the total score in order to interpret the impact of injury on ADL versus Technique and to make clinical decisions in rehabilitation progression.
ROC analyses are used in clinical epidemiology to quantify how accurately medical diagnostic tests can distinguish between differing patient states, in this case, 'Healthy' and 'Injured' [46]. ROC plots sensitivity against 1-specificity across the full range of values. AUC assesses overall diagnostic accuracy, or discrimination, by summarizing the entire location of the ROC curve rather than depending on a specific operating point. DFOS-Sp total and subscores accurately discriminated between healthy and injured dancers. An AUC value of 0.82, found in DFOS-Sp total, is considered excellent [47].
The majority of dance injuries are to the lower extremity, with most at the foot and ankle [6,7]. Injuries in this study were typical of those seen in ballet, with 55% involving the calf, ankle, or foot.
All DFOS-Sp scores and PCS improved from Injured to Discharged and from Discharged to 3-months, as hypothesized. The greatest decreases in SEM and MDC95 values occurred from Injured to Discharged, with further decreases at 3-months. This is similar to previous reports of the English-DFOS and other instruments [10,42,48,49]. SRM values demonstrated high responsiveness [35], and effect sizes were large, as hypothesized. SRM and effect sizes were insubstantial in MCS scores.
Floor or ceiling effects were considered if more than 15% of the 'Injured' group achieved highest or lowest scores [37]. Floor or ceiling effects suggest limited content validity. If a patient achieves either a maximal or minimal score and subsequently becomes better or worse, the questionnaire cannot reflect this change and its responsiveness is compromised. None of the 'Injured' individuals had maximum or minimum scores. Therefore, no ceiling or floor effects were present.
All dancers were Caucasian, which may limit generalizability of these findings. However, the ballet world in Spain remains primarily Caucasian, which limited our population diversity. We recruited individuals from different cities in Spain to minimize bias due to cultural, semantic, or demographic factors. The DFOS-Sp has not been tested on Central or South American Spanish-speaking dancers and may require further study. In addition, future research should develop dance questionnaires focused on upper limb and spinal injuries as well as their impact on performance.

Conclusions
DFOS-Sp demonstrated excellent reliability, construct validity, and internal consistency for use with Spanish-speaking adult and adolescent ballet and contemporary dancers. Therefore, the DFOS-Sp is a useful tool to monitor both healthy state and functional limitation following lower extremity or back injury in ballet and contemporary dancers.
The sample of this study consisted primarily of Spanish Caucasian participants, which may limit the generalization of the results. The DFOS-Sp may require further testing in Central and South American Spanish-speaking dancers. This research supports further translation of the DFOS into other languages and its adoption as an international reference outcome measure in prospective studies on dancers.