Absolute Reliability and Concurrent Validity of Hand-Held Dynamometry in Shoulder Rotator Strength Assessment: Systematic Review and Meta-Analysis

The purpose of this study was to establish the absolute reliability of hand-held dynamometers (HHDs) and the concurrent validity between HHDs and isokinetic dynamometers (IDs) in shoulder rotator strength assessment. The MEDLINE, CINAHL, and CENTRAL databases were searched for relevant studies up to July 2020. Absolute reliability was determined from test–retest studies presenting the standard error of measurement (SEM%) and/or minimal detectable change (MDC%) expressed as a percentage of the mean. Studies reporting the intra-class correlation coefficient (ICC) between IDs and HHDs were considered for concurrent validity. The risk of bias and the methodological quality were evaluated according to the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN). Thirteen studies were included in the meta-analysis. Shoulder internal rotator strength assessment MDC% was 0.78%, 95% confidence interval (CI) −5.21 to 3.66, while shoulder external rotator MDC% was 3.29%, 95% CI −2.69 to 9.27. ICC between devices was 0.94 (95% CI 0.91 to 0.96) for shoulder internal rotators and 0.92 (95% CI 0.88 to 0.97) for shoulder external rotators. Very high correlation was found for shoulder rotator torque assessment between HHDs and IDs. The COSMIN checklist classified the selected studies as either adequate or inadequate.


Introduction
Muscle strength assessment is relevant in patients with shoulder disorders [1,2]. Isokinetic and hand-held dynamometers (HHDs) are useful tools for clinicians to objectively assess muscle strength, quantify the degree of impairment, and evaluate treatment efficacy when testing is performed before and after an intervention [3,4]. Isokinetic dynamometers (IDs) are considered the reference standard in muscle testing. Maximal torque can be generated throughout the entire range of motion, and results are not underestimated when the examiner cannot apply enough resistance to match the strength of the assessed muscle [5]. Although considered the gold standard, isokinetic testing is limited in clinical settings because of its high cost and the laboratory setting required. HHDs are low-cost, conveniently sized, and easy-to-use instruments, but their psychometric properties in shoulder rotator strength assessment are not clearly understood. Relevant psychometric properties include absolute reliability and concurrent validity. Absolute reliability describes the variation in repeated measurements performed under similar conditions [6,7], ensuring that changes between measurements are due to differences in performance rather than inconsistency in the measuring capacity of the applied device [8].
The most common methods for this analysis are the standard error of measurement (SEM) and the minimal detectable change (MDC) for within-subject variation [9]. They are usually expressed as a percentage of the mean (SEM% and MDC%) for analysis purposes. Concurrent validity focuses on the extent to which scores of a certain instrument are related to those of a well-established gold standard obtained at the same point in time [7]. One common method for analyzing concurrent validity is the intra-class correlation coefficient (ICC). It determines the degree to which individuals maintain the same position in a sample assessed by different instruments. Few systematic reviews have summarized the results of strength assessment in the shoulder joint. Edouard et al. [10] conducted a systematic review of the influence of position on shoulder rotator strength assessment with isokinetic dynamometry, Sorensen et al. [11] reviewed the measurement properties of isokinetic dynamometry for assessment of shoulder muscle strength, and Schrama et al. [1] reviewed intra-examiner reliability of strength assessment in the upper extremity. No systematic review has been published to determine the absolute reliability and concurrent validity of HHDs in shoulder strength assessment. The aim of this systematic review was to (i) determine the absolute reliability of HHDs and (ii) determine the concurrent validity between HHDs and IDs in maximal shoulder internal and external rotator strength assessment.
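To make the SEM and MDC quantities concrete, the following is a minimal sketch of one commonly used set of definitions: SEM = SD × sqrt(1 − ICC) and MDC at the 95% level = 1.96 × sqrt(2) × SEM, each then divided by the mean to obtain SEM% and MDC%. These formulas are an assumption here; the review does not state which equations the included studies used, and it notes that no universal equation exists. The function names and input values are hypothetical.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """SEM from the between-subject SD and a reliability ICC: SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem: float) -> float:
    """MDC at the 95% level: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


def percent_of_mean(value: float, mean: float) -> float:
    """Express an absolute value (e.g. SEM or MDC) as a percentage of the mean."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return 100.0 * value / mean


if __name__ == "__main__":
    # Hypothetical test-retest summary for one study (not data from the review).
    sd_kg, icc, mean_kg = 2.0, 0.91, 10.0
    sem = sem_from_icc(sd_kg, icc)
    mdc = mdc95(sem)
    print(f"SEM% = {percent_of_mean(sem, mean_kg):.1f}")
    print(f"MDC% = {percent_of_mean(mdc, mean_kg):.1f}")
```

Expressing both values relative to the mean is what allows studies reporting in different units to be compared on a common scale.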

Materials and Methods
The reporting of this systematic review is based on the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines [12]. The PRISMA guidelines consist of a 27-item checklist and a four-phase flow diagram.

Search Strategy
A search was conducted for relevant studies in English published from 1990 up to and including July 2020. The CENTRAL Cochrane, MEDLINE, and CINAHL databases were explored, in addition to gray literature. Muscle strength, isokinetic, dynamometry or dynamometer, validity, reliability, gold standard, shoulder joint, and rotator cuff were included as search terms.

Study Selection
The inclusion criteria for studies were (1) asymptomatic participants; (2) shoulder rotator strength of the dominant side of participants assessed using isometric contractions with either an HHD or ID; (3a) absolute reliability, expressed as the SEM% or MDC% for within-subject variability between trials in maximal shoulder rotator strength assessment, and/or (3b) concurrent validity, expressed as ICC with 95% confidence interval (CI); and (4) for absolute reliability studies, strength assessment expressed in kilograms or pounds, and for concurrent validity studies, strength assessment expressed in newton-meters (Nm).
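Because the inclusion criteria mix force units (kilograms or pounds) with torque units (Nm), a short sketch of how the two relate may help. It assumes that an HHD reading in kilograms-force is converted to newtons with standard gravity and multiplied by a lever arm, taken here as the distance from the joint axis to the dynamometer pad. The review does not describe how the included studies performed this conversion, so the function names and the lever-arm value are hypothetical.

```python
STANDARD_GRAVITY = 9.80665   # m/s^2, conventional standard value
POUND_IN_KG = 0.45359237     # exact definition of the avoirdupois pound


def pounds_to_kg(pounds: float) -> float:
    """Convert a reading in pounds to kilograms."""
    return pounds * POUND_IN_KG


def kg_to_newtons(kilograms_force: float) -> float:
    """Convert a kilogram-force reading to newtons."""
    return kilograms_force * STANDARD_GRAVITY


def torque_nm(kilograms_force: float, lever_arm_m: float) -> float:
    """Torque in Nm from a kilogram-force reading and a lever arm in meters."""
    if lever_arm_m <= 0:
        raise ValueError("lever_arm_m must be positive")
    return kg_to_newtons(kilograms_force) * lever_arm_m


if __name__ == "__main__":
    # Hypothetical reading: 10 kg measured 0.25 m from the joint axis.
    print(f"{torque_nm(10.0, 0.25):.2f} Nm")
```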
The exclusion criteria were (1) no full text availability, (2) measures of central tendency and dispersion not mentioned, and (3) concurrent validity expressed in Pearson correlation.

Data Extraction
Titles and abstracts were screened by two reviewers (MA and BT). Each reviewer then judged the full texts against the inclusion/exclusion criteria with a yes-or-no decision. Discrepancies between reviewers were discussed until consensus was reached. Studies analyzing absolute reliability were included when the method employed was properly described, especially population, number of subjects, dynamometer model, shoulder assessment position, test-retest procedure, interval between the two tests, and a statistical analysis suitable for reliability tests. Studies analyzing concurrent validity were considered when sample characteristics, HHD model, ID model, and assessment position were mentioned.

Quality Assessment Methodology
The "Consensus-based Standards for the selection of health measurement instruments risk of bias checklist" (COSMIN) [13] was used for methodological assessment. Box 7 was used to assess absolute reliability studies and box 8 for concurrent validity studies. The methodological quality of each item in a box was rated as "very good", "adequate", "doubtful", or "inadequate". The lowest rating in the box was used for determining the overall quality of a measurement property in each study. Following COSMIN, the results of each study were rated as sufficient (+), insufficient (-), or indeterminate (?), according to Terwee et al. [14]. Absolute reliability measurement was rated as sufficient if MDC was less than 15%. Different studies [15,16] recommend that changes between 10% and 15% are clinically relevant. Concurrent validity was considered as sufficient if the obtained ICC was over 0.70 [13].
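The rating rules described above can be sketched as follows: the overall quality of a COSMIN box is its lowest item rating, absolute reliability is sufficient when MDC% is below 15%, and concurrent validity is sufficient when the ICC exceeds 0.70. Returning "?" when a value is not reported is an assumption about how the indeterminate rating was applied, and all function names are hypothetical.

```python
from typing import Optional, Sequence

# Ratings ordered from worst to best, as listed in the text.
RATING_ORDER = ["inadequate", "doubtful", "adequate", "very good"]


def overall_quality(item_ratings: Sequence[str]) -> str:
    """Return the lowest item rating in a box (the 'worst score counts' rule)."""
    return min(item_ratings, key=RATING_ORDER.index)


def rate_reliability(mdc_percent: Optional[float]) -> str:
    """'+' if MDC% is below 15, '-' otherwise, '?' if not reported (assumed)."""
    if mdc_percent is None:
        return "?"
    return "+" if mdc_percent < 15.0 else "-"


def rate_validity(icc: Optional[float]) -> str:
    """'+' if ICC is above 0.70, '-' otherwise, '?' if not reported (assumed)."""
    if icc is None:
        return "?"
    return "+" if icc > 0.70 else "-"


if __name__ == "__main__":
    # Hypothetical item ratings for one study.
    print(overall_quality(["very good", "adequate", "doubtful"]))
    print(rate_reliability(12.0), rate_validity(0.94))
```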
The lower limit of the 95% confidence interval of MDC% (MDC% lower limit) was calculated as MDC% lower limit = ((mean retest − mean test) × 100 / (mean of test and retest)) − MDC%. Concurrent validity was analyzed by ICC between HHDs and IDs, considering a two-way random effects model with absolute agreement. The level of agreement between devices was classified according to Munro's scale, where values of less than 0.25 represented little correlation; 0.26-0.49 low correlation; 0.50-0.69 moderate correlation; 0.70-0.89 high correlation; and over 0.90 very high correlation [17]. Stata Statistical Software: Release 13 (StataCorp LP, College Station, TX, USA) was used for statistical analysis.
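The lower-limit formula and Munro's scale can be sketched as below. The upper limit is not given in the text and is assumed here to mirror the lower limit with MDC% added instead of subtracted. Munro's published bands leave small gaps at their boundaries (for example between 0.25 and 0.26), so the cut-offs used to close those gaps are also an assumption. Function names and input values are hypothetical.

```python
from typing import Tuple


def mdc_percent_limits(mean_test: float, mean_retest: float,
                       mdc_percent: float) -> Tuple[float, float]:
    """Return (lower, upper) limits around the percentage test-retest change."""
    grand_mean = (mean_test + mean_retest) / 2.0
    change_percent = (mean_retest - mean_test) * 100.0 / grand_mean
    # Lower limit follows the formula in the text; upper limit is assumed symmetric.
    return change_percent - mdc_percent, change_percent + mdc_percent


def munro_label(coefficient: float) -> str:
    """Classify a correlation-type coefficient on Munro's scale."""
    if coefficient < 0.26:
        return "little"
    if coefficient < 0.50:
        return "low"
    if coefficient < 0.70:
        return "moderate"
    if coefficient < 0.90:
        return "high"
    return "very high"


if __name__ == "__main__":
    # Hypothetical study means in kg; pooled ICCs taken from the abstract.
    lower, upper = mdc_percent_limits(10.0, 10.2, 5.0)
    print(f"MDC% limits: {lower:.2f} to {upper:.2f}")
    print(munro_label(0.94), "/", munro_label(0.92))
```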

Overall Results
A total of 1053 studies were identified in the initial search. Seventy-four studies were considered for full-text review after removal of excluded studies and duplicates. Fifteen studies met the inclusion and exclusion criteria; 13 of those [18–30] were considered for meta-analysis. Two studies were not considered because their HHD values were not reported in kilograms or pounds. Ten of the assessed studies [18–27] provided a detailed report of the MDC% of within-subject variations between trials 1 and 2 for the HHD when assessing muscle strength in shoulder rotators (Figure 1). Descriptions of characteristics and outcomes of studies considering absolute reliability of HHDs are presented in Tables 1 and 2. Descriptions of characteristics and outcomes of studies considering concurrent validity between HHDs and the gold standard ID are presented in Tables 3 and 4.

Absolute Reliability
The methodological quality of the selected studies considering absolute reliability is shown in Table 5. One study presented adequate methodological quality [26], while nine studies [18–25,27] were rated as inadequate. Inadequate methodological quality of the studies was mainly due to an inappropriate time interval (fewer than 3 days) and a low sample size (<30).

Concurrent Validity
The methodological quality of the selected studies analyzing concurrent validity is shown in Table 6. One study was scored as adequate [28]. Two studies were scored as inadequate [29,30] due to a small sample size (<30). The ID used in each study was explicitly mentioned and recognized as the gold standard. Independent measurements were assessed between devices, and correlations were calculated in all studies.

Absolute Reliability of HHD and Concurrent Validity between HHDs and IDs Based on the Criteria for Good Measurement Properties
Absolute reliability for shoulder IR strength measurement was rated as sufficient in four studies [18,19,21,24] and as insufficient in three studies [20,22,26]. Absolute reliability for shoulder ER strength assessment was rated as sufficient in four studies [18–20,24] and as insufficient in six studies [21–23,25–27]. Concurrent validity for internal and external shoulder rotator strength assessment was rated as sufficient in all studies [28–30] (Table 7).

Hand-Held Dynamometry: Absolute Reliability
Effect size is expressed in kilograms as a percentage of the mean (kg%). One study [19] assessed internal and external rotation strength in two groups (healthy young adults and female athletes), which were analyzed separately.

Concurrent Validity between Hand-Held Dynamometers and Isokinetic Dynamometers
Effect size is expressed as ICC and 95% CI for shoulder internal and external rotator strength assessment.

Shoulder Internal Rotators
Two studies were considered for HHD and ID concurrent validity. One study compared the Lafayette HHD to the Biodex ID [28] with the patient seated and the shoulder positioned in the scapular plane, while the other study compared the FED HHD to the REV 7000 ID [30] in the supine position with the shoulder positioned at 90° abduction (Figure 4).

Shoulder External Rotators
Three studies (Figure 5) were considered for HHD and ID concurrent validity. One study compared the Lafayette HHD to the Biodex ID [28] with the patient seated and the shoulder positioned in the scapular plane. The second study compared the FED HHD to the REV 7000 ID [29] in the supine position with the shoulder positioned at 90° abduction. The third study compared the Lafayette HHD to the Cybex Norm ID in the supine position with the shoulder positioned at 0° abduction [30].

Discussion
The general results showed good absolute reliability for HHDs in shoulder internal and external rotator strength assessment. Lower and upper limits of MDC% did not exceed 15%, showing low random error of the instrument in strength test-retest evaluation. Minimal clinically important difference (MCID) reflects the smallest measured change in score that patients perceive as important. In strength evaluations, various authors [15,16] argue that the clinical significance is about 15%. In fact, they suggest that differences between the affected and non-affected limbs should not exceed 15% to be able to return safely to sports activity. As this systematic review shows MDC% under the MCID for both internal and external rotation, HHDs can detect small but clinically relevant changes in the assessment of shoulder rotator strength.
The narrower the interval between the upper and lower limits of the MDC%, the greater the precision of the assessment tool. For example, the analysis of absolute reliability in shoulder external rotator strength assessment (Figure 3, diamond) showed a confidence interval between −2% and 9%. This means that if a patient is referred by a physician to an eight-week rehabilitation program, the program would be considered effective for shoulder external rotator strengthening if at least a 9% improvement is observed at the end of those 8 weeks. Only McLaine et al. [18] and Kaleem et al. [23] showed MDC% values under 15% for internal rotation, while for external rotation only Kaleem et al. [23] published lower values. The upper limits of the MDC% in the studies that contributed to the review indicate a measurement error similar to that of IDs. This review showed lower MDC% than reported by van Meeteren et al. [31], but higher MDC% than reported by Collado-Mateo et al. [32]. The MDC% obtained in the meta-analysis for lower extremities [19] was higher than that obtained in this study. This is in accordance with Wikholm and Bohannon [33], who stated that reliability is higher in small muscle groups, such as the shoulder rotators, than in larger groups, such as the knee extensors.
Regarding concurrent validity, a very high correlation was established for internal and external rotator strength assessment according to the Munro scale [17]. Concurrent validity between the HHD and the ID was better for internal rotation than for external rotation. Very high correlation between devices and a narrow confidence interval were reported in all studies assessing internal rotation [28–30]. Two studies [28,29] classified the concurrent validity between instruments as very high correlation in the assessment of external rotator strength. One study classified it as high correlation. External rotation at 90° or higher glenohumeral abduction generates less activation in the coracobrachialis, biceps, brachialis, pectoralis major, and subscapularis, resulting in less glenohumeral stability. Evidence of anticipated external rotation in shoulder rotators has been published [34]. This could partially explain why reliability and validity are better for internal rotators than for external rotators in the shoulder joint.
To the best of our knowledge, this is the first systematic review that summarizes available research on measurement properties in HHDs when used to assess shoulder muscle strength. Although both rotations showed good absolute reliability, better reliability and concurrent validity are observed for the internal rotators than for the external rotators. Several factors with a relevant influence on the calculations of the psychometric properties of the instruments measuring muscle strength must be considered for correct interpretation of the tests.
The stabilization system refers to the external fixation applied so that the movement is as isolated as possible and compensations are eliminated. For example, when evaluating the shoulder joint, it is suggested to stabilize the pelvic girdle and trunk and to limit anterior translation of the humeral head. Proximal stabilizer deficiencies affect the distal force evaluated. Incorrect stabilization can also affect the length-tension relationship of the musculature to be evaluated. Unlike the ID, the HHD does not have a stabilization system, so it must be created externally. In all studies, stabilization systems were poorly described.
There are various ways of positioning the subject while assessing the shoulder joint. Articles included seated and supine shoulder evaluations with low and high angles of glenohumeral elevation. It is expected that the supine evaluation can better fix the scapula, which should increase the reliability and the evaluated torque [16]. New studies comparing different subjects' position for shoulder strength assessment are required.
If the same evaluator carries out the tests, an interval of at least seven days between test and retest is recommended [7]. In many publications, the retest was carried out less than 3 days after the first test, which compromises independence, since the evaluator may remember the result obtained. Independence between evaluations was achieved in only three studies [19,20,25]. This can lead to overestimation of the psychometric properties of HHDs in shoulder rotator strength assessment.
The evaluation protocol must be clearly described so that it can be reproduced. The conditions should be as similar as possible in both the protocol and environmental conditions. For example, if two submaximal tests are performed in a strength test, they must remain constant in the retest [6]. Similar conditions in the test-retest assessment were declared in all studies.
Randomizing the test order is one way to reduce order-related error. If shoulder internal and external rotations are evaluated, there must be a randomization system so that some subjects start with medial rotation and others with lateral rotation. Otherwise, systematic errors can arise, such as fatigue in lateral rotation when both types of rotation are always performed in the same order. As a considerable number of studies show deficits in these relevant factors, the information provided in this review should be interpreted with caution. Future studies with better protocol design and higher methodological quality are suggested to increase the fidelity of the results.
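The randomization of test order described above could be implemented in many ways; the following is a minimal sketch assuming a balanced allocation, in which half of the subjects start with internal (medial) rotation and the other half with external (lateral) rotation. None of the included studies is known to have used this scheme, and the function name and seed handling are hypothetical.

```python
import random
from typing import Dict, Hashable, Iterable, Optional, Tuple


def assign_test_order(subject_ids: Iterable[Hashable],
                      seed: Optional[int] = None) -> Dict[Hashable, Tuple[str, str]]:
    """Randomly assign each subject a rotation order, balanced across the group."""
    rng = random.Random(seed)      # local generator so results are reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    orders = {}
    for position, subject in enumerate(shuffled):
        # First half of the shuffled list starts with internal rotation.
        if position < half:
            orders[subject] = ("internal", "external")
        else:
            orders[subject] = ("external", "internal")
    return orders


if __name__ == "__main__":
    for subject, order in sorted(assign_test_order(range(1, 7), seed=42).items()):
        print(subject, " then ".join(order))
```

Fixing the seed makes the allocation reproducible, which helps when the protocol must be reported in enough detail to be repeated.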
Publication bias is evident, since only papers published in English were included. Another source of bias is the lack of a universal equation to calculate MDC and ICC. Only asymptomatic subjects were considered in this review, so the MDC% values obtained cannot be directly extrapolated to patients with shoulder dysfunction. Finally, the meta-analysis of concurrent validity for internal rotator strength assessment is based on only two studies with a total sample size of 41 participants, making it less informative than the other meta-analyses presented in this study.

Conclusions
Considering the COSMIN classifications, studies were ranked methodologically between adequate and inadequate. MDC% was higher for shoulder external rotator strength assessment than for internal rotators, although it remained within the MCID. Very high correlation between devices according to the Munro scale was found for internal and external rotation. Although HHDs appear to be reliable tools for strength assessment in shoulder rotators, results should be considered with caution due to the clinical heterogeneity between studies and some methodological flaws.