Analysis of Structural Characteristics and Psychometric Properties of the SarQoL® Questionnaire in Different Languages: A Systematic Review

Background: Sarcopenia is the gradual and global loss of muscle and its functions. Primary sarcopenia is associated with the typical changes of advanced aging and affects approximately 5–10% of the population. The Sarcopenia and Quality of Life (SarQoL®) questionnaire comprises 55 items grouped into 22 questions and is organized into seven domains of quality of life. The main objective of this systematic review was to analyze the structural characteristics and psychometric properties of the SarQoL® questionnaire, and to rate the measurement properties, the methodological quality, and the criteria for good measurement properties of its adaptations and validations in different languages. Methods: A systematic review was carried out in the PUBMED, Web of Science, Cinahl, LatinIndex, and SCOPUS databases. The keywords used were: “SarQoL”, “assessment”, “sarcopenia”, “geriatric”, “PROM”, “quality of life”, and “questionnaire”, combined with the Boolean operator “AND”. All articles published up to 15 January 2022 were considered. Methodological quality and psychometric properties were assessed based on the COSMIN guidelines and the guidelines and general recommendations of PRISMA. Documents published in languages other than English were excluded, as were versions of the SarQoL® published only as conference abstracts when the full text was not available. Results: A total of 133 articles were identified, 14 of which were included. The evaluated questionnaires and the structural characteristics and psychometric properties of each of them were collected. Conclusion: The different cross-cultural versions of the questionnaire showed good basic structural and psychometric characteristics for the evaluation of patients with sarcopenia.


Introduction
Sarcopenia is the gradual and global loss of the musculoskeletal system associated with low muscle quality and quantity. Its most representative characteristic is muscle insufficiency, linked to the loss of strength and the dysfunction presented by patients suffering from this disorder [1,2]. Sarcopenia is currently associated with heterogeneous causality, including physiological, genetic, and environmental factors [3], and two different

Inclusion criteria: Studies including cross-cultural adaptation and validation of the SarQoL® questionnaire for assessing the quality of life in people with sarcopenia in any language other than the original language. Exclusion criteria: Studies that had completed the adaptation phase but not the validation phase, and studies presented as conference abstracts but not published in full text in any database, which made it impossible to analyze their psychometric properties.

Selection of Documents
The identified documents were submitted to the Rayyan platform (rayyan.qcri.org) [19] to collect, review, and evaluate citation titles and abstracts. First, duplicate articles were eliminated, comprising a total of 64 documents. Subsequently, two researchers (G.M.-T., J.M.-C.) carried out an independent and blinded screening based on titles and abstracts, and the articles that did not meet the inclusion criteria were eliminated. Those that met the criteria were selected and located for full-text reading. Articles that raised doubts, or whose title and abstract did not provide sufficient information to decide on inclusion or exclusion, were also retrieved. Discrepancies were resolved by a third reviewer (A.G.-M.).

Instrument
The original version of the SarQoL® questionnaire was developed by Beaudart et al. in 2015 [14] and validated by the same authors two years later [8]. The questionnaire consists of 55 items grouped into 22 questions, which in turn are organized into seven different dysfunction domains: physical and mental health, locomotion, body composition, functionality, activities of daily living, leisure activities, and fears [14]. The response options are a combination of Likert scales (3, 4, or 5 levels) and questions with different multiple-choice options. Although the questionnaire is easy to complete in just 10 min, there is also a short version of only 14 items, which the authors recommend where the original version may be too burdensome for respondents [20].

Results Synthesis and Data Extraction
All the articles finally selected were analyzed to identify the cross-cultural adaptation and to collect information on the process of construction and validation of these tools. The structural characteristics extracted from each cross-cultural adaptation were title, authors, year of publication, acronym, population, BMI, setting, diagnosis of sarcopenia, number of subjects with sarcopenia, number of subjects in the pilot phase, and number of subjects per item in the validation phase. The psychometric properties extracted were test-retest reliability, internal consistency, and construct validity.

Results
A total of 133 documents were identified, 132 through the database search and one [21] through the reference lists of the selected articles; 68 were eliminated as duplicates. Of the 65 remaining documents, 31 were excluded by title and abstract. The remaining 34 articles underwent full-text examination, after which 21 were excluded, three because they consisted only of conference abstracts, and one because the validation phase had not been carried out. Finally, a total of 14 articles remained for this systematic review; the entire selection process is shown in the flowchart (Figure 1).
After reading the titles and applying the selection criteria to all the documents, a total of 14 cross-cultural adaptations were selected [5,12,13,21–31] in different languages, such as English, Romanian, Dutch, Polish, Greek, Lithuanian, Russian, Spanish, Ukrainian, Korean, Serbian, Chinese, and Turkish. The Hungarian version was not considered, as the validation phase of the questionnaire had not been carried out. Likewise, the Persian, Czech, and Latvian versions were not included because their full text was not available; only an abstract of each was found in the World Congress on Osteoporosis, Osteoarthritis, and Musculoskeletal Diseases (WCO-IOF-ESCEO 2020): Poster abstracts [32].
Table 1 details the structural characteristics of the questionnaires: acronym, population, Body Mass Index (BMI), setting, diagnosis of sarcopenia/number of subjects with sarcopenia, number of subjects included in the piloting phase, and number of subjects included per item in the validation phase.
Furthermore, the number of questions and items was the same in all the cross-cultural adaptations of the SarQoL® questionnaire included in this review: every adaptation comprised 22 questions and 55 items. The self-administration time was specified in only two of the adaptations: 10 min in the English version [30] and 10-15 min in the Spanish version [5]. As for the diagnostic criteria for sarcopenia, the adaptations relied on those issued up to 2018 by the European Working Group on Sarcopenia in Older People (EWGSOP), whereas the adaptations made from 2019 onwards used the criteria revised in early 2018 (EWGSOP2). The Chinese version of the SarQoL® questionnaire was the only one to use different diagnostic criteria, namely those of the Asian Working Group for Sarcopenia (AWGS).
The psychometric properties of the questionnaires are shown in Table 2. They include reliability, internal consistency, and construct validity measured through convergent and divergent validity.

Content Validity
To evaluate content validity in the 13 versions of the SarQoL®, the three criteria considered in the COSMIN guidelines [33] were taken into account: relevance, comprehensiveness, and comprehensibility. Twelve of the thirteen studies analyzed content validity, and six of them [22–24,26,27,29] considered the comprehensibility criterion. Content validity could not be rated because these aspects were doubtful or unclear. Patients were not asked about any of the three aspects, and the authors' own evaluation of relevance and comprehensiveness was not reported clearly or in sufficient detail, so the results were classified as inconsistent in relation to the criteria for measurement properties.

Structural Validity
None of the studies evaluated the structural validity of the SarQoL®, so the extent to which the scores obtained reflect an adequate dimensionality of the quality of life in patients with sarcopenia could not be analyzed. The COSMIN guidelines [17] recommend that this property be evaluated prior to internal consistency or cross-cultural validity.

Internal Consistency
Internal consistency was calculated for the 13 adapted versions of the SarQoL® questionnaire using Cronbach's alpha. All versions reached α > 0.8, which is considered excellent and indicates high internal consistency. Cronbach's alpha ranged from 0.866 in the Korean version [26] to 0.96 in the Greek version [23]. Nevertheless, the rating against the criteria for a good measurement property was "indeterminate" for all the studies, since the criteria were not fully met: structural validity, as required by the COSMIN guidelines, had not been taken into account [17].
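As an illustration of the statistic reported above, the following minimal sketch computes Cronbach's alpha from a respondents-by-items score matrix using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of total scores). The function name and the small data set are hypothetical and are not taken from any of the reviewed studies.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Illustrative helper (not from the reviewed studies); uses sample variances.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("at least two items are required")
    # Variance of each item (column) across respondents.
    item_variances = [variance(column) for column in zip(*scores)]
    # Variance of the total score of each respondent.
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five subjects to three Likert-type items.
example = [[3, 4, 3], [4, 5, 4], [2, 3, 2], [5, 5, 4], [1, 2, 2]]
alpha = cronbach_alpha(example)
print(f"alpha = {alpha:.3f}")
```

A value above the thresholds cited in this review would be read as high internal consistency, although the COSMIN rating additionally depends on structural validity having been established.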

Test-Retest Reliability
The intraclass correlation coefficient (ICC) was used to test the reliability between the first and the second administration of the questionnaire, based on the scores of the individual domains and the overall score of the SarQoL®. An ICC greater than 0.7 is considered acceptable reliability [34]. It was measured in all the adaptations included in this review, except in the Romanian [31] and Serbian [27] versions of the SarQoL®. ICC values ranged from 0.935 for the Russian version [24] to 0.99 for the Spanish and Polish versions [5,13]. The highest score was recorded for Domain D6 (Leisure activities) at 1.00 in the Polish version [12] and the lowest for Domain D7 (Fears) at 0.64, 95% CI (0.52-0.70), in the Greek version [23]. The Korean version did not specify the time elapsed between the first and second administration; in the rest of the studies the interval was two weeks. Test-retest reliability was rated as a "sufficient" measurement property in the studies in which it was included.
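To make the test-retest statistic concrete, the sketch below computes an ICC from paired test and retest scores. The reviewed studies do not all state which ICC form they used, so the two-way random-effects, absolute-agreement, single-measurement form, often written ICC(2,1), is an assumption here, as are the function name and the example scores.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a list of subjects, each a list of k scores (e.g. [test, retest]).

    Assumed form: two-way random effects, absolute agreement, single measurement.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, occasions, residual error.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical total scores of five subjects at test and at retest.
pairs = [[60, 62], [45, 44], [80, 78], [55, 57], [70, 71]]
icc = icc_2_1(pairs)
print(f"ICC(2,1) = {icc:.3f}, acceptable: {icc > 0.7}")
```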

Measurement Errors
Only one adaptation considered measurement error, expressed as the Standard Error of Measurement (SEM). In the Greek version [23], the SEM was reported for each of the SarQoL® dimensions (D1: 2.42; D2: 3.15; D3: 6.95; D4: 2.7; D5: 5.04; D6: 6.23; D7: 9.17), and the SEM of the total score was 2.75. The SEM is a parameter commonly used to indicate the amount of measurement error in an instrument, the interpretation of the measurement around a mean value, or the range within which the "true" value lies [35]. Other error measurements, such as the Minimal Important Change (MIC) or Limits of Agreement (LoA), were not considered in that study. Therefore, based on the criteria for a good measurement property according to the COSMIN guidelines [17], the rating is "indeterminate" [23] because the MIC was not reported.
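The sketch below shows one common way to obtain and use the SEM, namely SEM = SD · √(1 − reliability), together with the approximate 95% range around an observed score. Whether the Greek study used this exact formula is not stated in this review, so the formula choice, the function names, and the observed score of 60 are assumptions; only the total SEM of 2.75 comes from the text.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM from the score standard deviation and a reliability coefficient (e.g. ICC).

    Assumed formula: SEM = SD * sqrt(1 - reliability).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def true_score_range(observed, sem, z=1.96):
    """Approximate range within which the 'true' score lies (95% by default)."""
    return observed - z * sem, observed + z * sem


# Total-score SEM of 2.75 as reported for the Greek version;
# the observed score of 60 is a hypothetical example.
low, high = true_score_range(60.0, 2.75)
print(f"true score likely between {low:.2f} and {high:.2f}")
```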

Construct Validity (Convergent Validity)
Convergent validity was reported in all studies [5,12,13,22–31], and all of them established correlations with the SF-36 and the EQ-5D. Regarding the SF-36, correlations were established, although not in every study, with the domains of physical functioning, role limitation due to physical problems, bodily pain, general health, and vitality. The Polish version used the SF-36v2® PCS (physical component summary) and the SF-36v2® MCS (mental component summary), and the Ukrainian version also used the SF-36v2® PCS. The EQ-5D was included for the dimensions of utility score, mobility, and usual activities; self-care was also added in the Turkish version. In addition, the EQ-VAS was used in the Dutch, Polish, Spanish, and Ukrainian versions (see Table 2). According to the COSMIN guidelines [17], based on the criteria for a good measurement property, the rating was "insufficient" in three of the studies [5,23,26] because the results did not agree with the hypothesis that 75% of the correlations are ≥0.50. The rest were rated "sufficient", as the results agreed with the 75% hypothesis (see Table 3).

Construct Validity (Divergent Validity)
Divergent validity was reported in all studies except for the Polish version [12]; the correlations were based on the SF-36, the EQ-5D, and the HADS. The SF-36 dimensions included social functioning, role limitation due to emotional problems, mental health, role limitation due to physical problems (only in the Greek and Korean versions), emotional wellbeing (only in the Korean version), and bodily pain (only in the Turkish version); the Ukrainian version was the only one to include the SF-36 MCS dimension in divergent validity. Moreover, HADS anxiety and HADS depression were correlated only in the Spanish version. For the EQ-5D, the dimensions of self-care, pain/discomfort, and anxiety/depression were included. According to the COSMIN guidelines [17], and based on the criteria for a good measurement property, all the studies were rated "insufficient", except the Spanish [5] and the Korean [26] versions, which were rated "sufficient" because 75% of their indices were less than 0.30. The Polish version [12] was rated "indeterminate" because it did not include divergent validity (see Table 3).
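The rating rule applied to convergent and divergent validity in this review can be sketched as follows: a version is rated "sufficient" when at least 75% of its correlations satisfy the hypothesis (≥0.50 for convergent validity, <0.30 for divergent validity), and "indeterminate" when no correlations are reported. The function name, the use of absolute values, and the example correlations are assumptions made for illustration.

```python
def rate_construct_validity(correlations, kind):
    """Rate construct validity against the 75% hypothesis rule described in the text.

    kind: "convergent" (correlation >= 0.50) or "divergent" (correlation < 0.30).
    Absolute values are used, which is an assumption of this sketch.
    """
    if kind not in ("convergent", "divergent"):
        raise ValueError("kind must be 'convergent' or 'divergent'")
    if not correlations:
        return "indeterminate"  # property not reported
    if kind == "convergent":
        confirmed = sum(abs(r) >= 0.50 for r in correlations)
    else:
        confirmed = sum(abs(r) < 0.30 for r in correlations)
    return "sufficient" if confirmed / len(correlations) >= 0.75 else "insufficient"


# Hypothetical correlation coefficients, not taken from any reviewed study.
print(rate_construct_validity([0.62, 0.55, 0.71, 0.48], "convergent"))  # 3 of 4 confirmed
print(rate_construct_validity([0.12, 0.35, 0.41, 0.25], "divergent"))   # 2 of 4 confirmed
print(rate_construct_validity([], "divergent"))
```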

Criterion Validity and Responsiveness
For criterion validity, the COSMIN guidelines evaluate this property based on agreement with the hypothesis, using an external instrument or "gold standard" [17]. The measure considered by the authors of the COSMIN guidelines is the AUC (Area Under the Curve), with 0.70 as the cut-off value. No information on this property was reported; therefore, it could not be rated. Similarly, responsiveness indicates the ability to detect changes over time and uses the same AUC cut-off of 0.70. None of the 13 validation studies of the SarQoL® evaluated this property, nor did the original version [8], so it could not be rated either.

Floor-Ceiling Effect
The ceiling or floor effect refers to the percentage of patients who obtained the highest scores (ceiling) or the lowest (floor), with percentages greater than 15% being considered significant [34,36]. These effects were analyzed in all the studies except the Korean version [26], which made no reference to them. No floor effect was observed in any study, and only the Ukrainian version [25] provided data on a ceiling effect: in Domain 7 (fears), 14 people scored 100 points, a ceiling effect of 28.6%. The floor-ceiling effect is not considered a measurement property according to the COSMIN guidelines, although it has been considered in the referred studies.
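The 15% rule described above can be checked with a few lines of code. In this sketch the function name and the 0 to 100 score range are assumptions, and the sample of 49 subjects is chosen only because 14 of 49 reproduces the 28.6% ceiling effect quoted in the text.

```python
def floor_ceiling_effect(scores, min_score=0, max_score=100, threshold=15.0):
    """Percentage of subjects at the lowest and highest possible score.

    An effect is flagged when the percentage exceeds `threshold` (15% in the text).
    The 0-100 range is an assumption of this sketch.
    """
    n = len(scores)
    floor_pct = 100.0 * sum(s == min_score for s in scores) / n
    ceiling_pct = 100.0 * sum(s == max_score for s in scores) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_effect": floor_pct > threshold,
        "ceiling_effect": ceiling_pct > threshold,
    }


# 14 subjects at the maximum score in an assumed sample of 49 subjects;
# the remaining 35 scores are hypothetical.
domain_scores = [100] * 14 + [50] * 35
result = floor_ceiling_effect(domain_scores)
print(f"ceiling: {result['ceiling_pct']:.1f}%, flagged: {result['ceiling_effect']}")
```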

Discriminative Power
Discriminative power was considered in all versions because the SarQoL® is specifically designed for use in sarcopenic populations, so the capacity of the questionnaire to differentiate between subjects with and without sarcopenia must be taken into account. It was evaluated by comparing the total score of the SarQoL® questionnaire and the scores of the individual domains between the two groups [14]. In all studies, quality of life was better in subjects without sarcopenia than in subjects diagnosed with sarcopenia. Discriminative power could not be rated, as it is not considered a measurement property by the COSMIN guidelines.

Methodological Quality
The "inadequate" methodological quality of most of the studies was due to sample sizes below five times the number of items; only three adaptations reached very good methodological quality [26,27,30] according to the COSMIN guidelines [18] (see Table 3). The methodological quality was evaluated based on the following criteria: (1) very good: seven subjects per item in samples ≥100 participants; (2) adequate: five subjects per item in samples ≥100 participants, or six subjects per item in samples <100 participants; (3) doubtful: five subjects per item in samples <100 participants; (4) inadequate: <5 subjects per item.
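The four sample-size criteria listed above can be expressed as a small classification function. This is a sketch under stated assumptions: "seven subjects per item" is read as "at least seven times the number of items" (and likewise for five and six), the function name is hypothetical, and the default of 55 items corresponds to the full SarQoL® questionnaire.

```python
def rate_sample_size(n_subjects, n_items=55):
    """Classify methodological quality from the subjects-per-item criteria in the text.

    Thresholds are interpreted as 'at least N subjects per item' (an assumption).
    """
    if n_subjects >= 7 * n_items and n_subjects >= 100:
        return "very good"
    if (n_subjects >= 5 * n_items and n_subjects >= 100) or (
        n_subjects >= 6 * n_items and n_subjects < 100
    ):
        return "adequate"
    if n_subjects >= 5 * n_items and n_subjects < 100:
        return "doubtful"
    return "inadequate"


# Sample sizes mentioned in this review for the 55-item questionnaire:
# the largest and smallest validation samples and the original version.
for n in (699, 296, 49):
    print(n, rate_sample_size(n))
```

With 55 items, the "very good" rating requires at least 385 subjects under this reading, which is consistent with the original version (296 subjects) being rated adequate.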

Discussion
The objective of this systematic review was to analyze the validated cross-cultural adaptations of the SarQoL® into different languages for the evaluation of the quality of life in patients with sarcopenia. For every version of the questionnaire, the structural characteristics, the psychometric properties, the classification of the measurement properties, the methodological quality, and the criteria for good measurement properties were collected and subsequently compared, in order to identify the most relevant versions for use in clinical practice and in research. A total of 14 studies were identified and included in the analysis of this systematic review.
The adaptation studies included pilot samples ranging from 10 to 25 subjects, except in the Spanish version [5], in which this was not reported, while the original version [8] included a total of 43 subjects and 12 experts. According to Beaton et al. [37], the ideal number of subjects for the pilot phase is between 30 and 40, as recommended in the AAOS (American Academy of Orthopedic Surgeons) guide. Therefore, larger samples in the cross-cultural adaptation phase should be considered for future adaptations of the SarQoL® in other languages.
In the validation phase, the sample in the different adaptations ranged between 49 and 699 subjects, while in the original version a total of 296 subjects were included. Only three adaptations [26,27,30] to other languages had a very good methodological quality, while the methodological quality of the original version [8] is adequate. Larger subject samples should be considered for future validation in other languages.
Structural validity was not considered in any of the SarQoL® adaptations, nor in the original version. Therefore, future adaptations of this questionnaire should include an analysis of its internal structure, since this is needed to decide how the items should be combined within a scale or subscale [17]. Regarding internal consistency, all the adaptations showed an excellent Cronbach's alpha greater than 0.7, and the original version reports a Cronbach's alpha of 0.87; across the seven SarQoL® domains, it ranged from 0.84 in D1-Mental Health to 0.89 in D6-Leisure Activities. Therefore, despite the excellent Cronbach's alpha, it would be advisable for future adaptations to analyze structural validity in order to complete the assessment of internal validity.
To test the reliability of the SarQoL®, the retest was performed at two weeks in all the studies except the Romanian [31] and the Serbian [27] versions. A time span of two weeks was likewise used in the original version [8]. The ICC was excellent in all versions of the SarQoL® and in their corresponding dimensions, as in the original version [8], except in the Dutch version [22] (domains 6 and 7) and in the Greek version [23] (domain 7), which showed low test-retest reliability.
The measurement error was only included in the Greek version [23] and it was not considered in the original version of the SarQoL®. Therefore, in future adaptations of the SarQoL® this should be taken into account as a measurement property [17,34].
To analyze construct validity, correlations with the SF-36 and EQ-5D were included for both convergent validity and divergent validity, except in the Spanish version [5], which also included the HADS for divergent validity. The original version [8], in addition to the SF-36 and EQ-5D, was also correlated with the Mini Mental State Examination and the Mobility-Tiredness Scale. Regarding convergent validity, all versions showed a good correlation, except three of them [5,23,26], which presented insufficient correlations; the original version shows good convergent validity. Regarding divergent validity, only two versions showed a good correlation [5,26]; a good correlation is also recorded in the original version [8].
Furthermore, criterion validity and responsiveness were not reported in any of the SarQoL® versions, nor in the original version. Therefore, future adaptations and validations of this questionnaire into other languages could include these measurement properties, so that the changes recorded when some type of treatment is applied can be assessed.
In both the SarQoL® versions and the original, no ceiling-floor effects were observed, except in the Ukrainian version [25], in which a ceiling effect occurred in only one of its domains. Moreover, all versions of the SarQoL® took discriminative power into account for the total score and by domain, in the same way as the original version [8], with the exception that the original version, like the Chinese version [28], used a logistic regression to compare both groups (sarcopenia and non-sarcopenia).

Strengths and Limitations
Although this is the first study to analyze the psychometric and structural characteristics of a reference questionnaire for the evaluation of patients with sarcopenia, such as the SarQoL®, some limitations must be borne in mind when interpreting the results. For example, although the search was carried out in five databases of worldwide relevance, some versions of the SarQoL® may not be indexed in these databases and would therefore not have been included in this study. In addition, the studies themselves have limitations that should be corrected in future work, since some psychometric properties could not be evaluated. This was the case for structural validity. Therefore, future studies should be designed to analyze this psychometric property, which is highly significant for questionnaire validation.
Although the different versions of the SarQoL® assess quality of life in people with sarcopenia, the impact of comorbidities and their interaction on functional capacity was not assessed. Comorbidity is a risk factor that may influence the SarQoL® results, so future adaptations and validations should take into account the different comorbidities that these subjects may present.

Conclusions
The main conclusion that can be drawn from this study is that the SarQoL® has been translated and adapted into multiple languages. All the versions analyzed show basic psychometric characteristics that can be classified qualitatively as ranging from good to excellent. However, only three of the 13 versions analyzed featured adequate methodological quality.
Clinicians and researchers at the international level have at their disposal adapted and validated versions of the SarQoL® in different languages whose psychometric properties are, as a rule, similar. These properties therefore allow the results obtained with samples from different countries to be compared. Despite these good characteristics, there are psychometric variables that none of the versions included, which affects their ratings against the criteria for good measurement properties and methodological quality according to the COSMIN guidelines. Therefore, it is necessary to design studies that include the same measurement properties so that the validation process is homogeneous within the scientific community. Likewise, it can be concluded from this study that the SarQoL® questionnaires available so far, which have been translated and validated in different languages as tools for evaluating the quality of life in people with sarcopenia, are valid for use in this population in different countries.

Acknowledgments:
The authors would like to thank Silvia Jiménez Jiménez for her invaluable help in understanding the Russian version.

Conflicts of Interest:
The authors declare no conflict of interest.