A Meta-Analysis of the Reliability of Four Field-Based Trunk Extension Endurance Tests

This meta-analysis aimed to estimate the inter- and intra-tester reliability of endurance measures obtained through field-based trunk extension tests and to explore the influence of potential moderators on the reliability estimates. The reliability induction rate of trunk extension endurance measures was also calculated. A systematic search of several databases was conducted, and 28 studies reporting intraclass correlation coefficients (ICCs) for trunk extension endurance measures were selected. Separate meta-analyses were conducted using a random-effects model. When possible, analyses of potential moderator variables were carried out. The average inter-tester reliability of the endurance measure obtained from the Biering-Sorensen test was ICC = 0.94. The intra-session reliability estimates of the endurance measures recorded using the Biering-Sorensen test, the prone isometric chest raise test, and the prone double straight-leg test were ICC = 0.88, 0.90, and 0.86, respectively. The average inter-session reliability estimates of the endurance measures from the Biering-Sorensen test, the prone isometric chest raise test, and the dynamic extensor endurance test were ICC = 0.88, 0.95, and 0.99, respectively. However, because of the limited evidence available, the reliability estimates for the prone isometric chest raise, prone double straight-leg, and dynamic extensor endurance tests should be interpreted with caution. Position control instruments, tools, and familiarization sessions were statistically associated with the inter-session reliability of the Biering-Sorensen test. The reliability induction rate was 72.8%. Only the trunk extension endurance measure obtained through the Biering-Sorensen test was supported by sufficient reliability evidence to justify its use for research and practical purposes.

Study characteristics (item 18): For each study, present characteristics for which data were extracted (e.g., study size, PICOS, follow-up period) and provide the citations. Reported on: 7 and Table S4.
Risk of bias within studies (item 19): Present data on risk of bias of each study and, if available, any outcome level assessment (see item 12). Reported on: 6-9, Table 1 and Table S5.
Results of individual studies (item 20): For all outcomes considered (benefits or harms), present, for each study: (a) simple summary data for each intervention group, and (b) effect estimates and confidence intervals, ideally with a forest plot. Reported on: 10, Figure 2.
Synthesis of results (item 21): Present results of each meta-analysis performed, including confidence intervals and measures of consistency. Reported on: Table 2.
Risk of bias across studies (item 22): Present results of any assessment of risk of bias across studies (see item 15). Reported on: not applicable.
Additional analysis (item 23): Give results of additional analyses, if performed (e.g., sensitivity or subgroup analyses, meta-regression (see item 16)).

Summary of evidence (item 24): Summarize the main findings, including the strength of evidence for each main outcome; consider their relevance to key groups (e.g., healthcare providers, users, and policymakers). Reported on: 15-17.
Limitations (item 25): Discuss limitations at the study and outcome level (e.g., risk of bias), and at the review level (e.g., incomplete retrieval of identified research, reporting bias). Reported on: 17.
Conclusions (item 26): Provide a general interpretation of the results in the context of other evidence, as well as implications for future research.

FUNDING
Funding (item 27): Describe sources of funding for the systematic review and other support (e.g., supply of data); role of funders for the systematic review.

For more information, visit: www.prisma-statement.org.

Table S2. Search strategy (all databases).

COSMIN risk of bias checklist
The checklist contains standards referring to design requirements and preferred statistical methods of studies on measurement properties. For each measurement property, a COSMIN box was developed containing all standards needed to assess the quality of a study on that specific measurement property. Each standard of the box is rated as "very good", "adequate", "doubtful", or "inadequate" quality. The overall rating of the quality of each study is determined by taking the lowest rating of any standard in the box.
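
The "lowest rating counts" rule described above can be illustrated with a minimal Python sketch. The four rating labels come from the text; the function name `overall_quality` and the list `RATINGS` are hypothetical names introduced only for this illustration and are not part of the COSMIN checklist itself.

```python
# Rating labels from the text, ordered from worst to best.
# The list name and ordering representation are illustrative assumptions.
RATINGS = ["inadequate", "doubtful", "adequate", "very good"]


def overall_quality(standard_ratings):
    """Return the lowest rating among the standards of one COSMIN box."""
    if not standard_ratings:
        raise ValueError("at least one rated standard is required")
    # RATINGS.index raises ValueError for an unknown label,
    # so misspelled ratings are rejected rather than silently ranked.
    return min(standard_ratings, key=RATINGS.index)


# Hypothetical study: three standards rated in the reliability box.
print(overall_quality(["very good", "adequate", "doubtful"]))
```

Under this rule a single "doubtful" or "inadequate" standard determines the overall rating of the study, regardless of how the remaining standards were rated.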

Criteria for good measurement properties
The results of each study on a measurement property should be rated against the criteria for good measurement properties. Each result is rated as either sufficient (+ = ICC ≥ 0.70), insufficient (− = ICC < 0.70), or indeterminate (? = ICC not reported).
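
As a minimal sketch of the rating rule above, the following Python function maps an ICC value to one of the three ratings. The function name `rate_icc` and the use of `None` to represent an unreported ICC are assumptions made for this illustration; the 0.70 threshold is the one stated in the text.

```python
from typing import Optional


def rate_icc(icc: Optional[float], threshold: float = 0.70) -> str:
    """Rate a reliability result against the criteria for good measurement properties.

    Returns "+" (sufficient), "-" (insufficient), or "?" (indeterminate).
    An unreported ICC is represented by None (an assumption of this sketch).
    """
    if icc is None:
        return "?"
    return "+" if icc >= threshold else "-"


# 0.94 is the pooled inter-tester ICC reported for the Biering-Sorensen test;
# 0.65 is a hypothetical value used only to show the "insufficient" branch.
for value in (0.94, 0.65, None):
    print(value, rate_icc(value))
```

Note that the boundary value ICC = 0.70 is rated as sufficient, because the criterion uses "greater than or equal to".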

Summarize the evidence and grade the quality of the evidence
The results from different studies on one measurement property can be quantitatively pooled in a meta-analysis or qualitatively summarized. After pooling or summarizing all evidence per measurement property, and rating the pooled or summarized results against the criteria for good measurement properties, the quality of the evidence is graded (high, moderate, low, or very low) on the basis of the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach. The GRADE approach uses five factors to determine the quality of the evidence: risk of bias (i.e., the methodological quality of the studies), inconsistency (i.e., unexplained inconsistency of results across studies), indirectness (i.e., evidence from different populations, interventions, or outcomes than the ones of interest), imprecision (i.e., total sample size of the available studies), and publication bias (i.e., negative results are less often published). The fifth factor, publication bias, is difficult to assess in studies on measurement properties because of a lack of registries for this type of study. Therefore, we did not take this factor into account in this meta-analysis [40].
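
The grading step can be sketched in Python as follows. The text above names the four evidence levels and the four factors retained in this meta-analysis, but it does not state the downgrading rules. The sketch therefore assumes that grading starts at "high" and that each factor lowers the grade by a given number of levels. Those two rules, together with the names `grade_evidence`, `LEVELS`, and `FACTORS`, are assumptions of this illustration rather than the procedure confirmed by the source.

```python
# Evidence levels from the text, ordered from best to worst.
LEVELS = ["high", "moderate", "low", "very low"]

# Factors considered in this meta-analysis; publication bias is excluded,
# as stated in the text.
FACTORS = ("risk of bias", "inconsistency", "indirectness", "imprecision")


def grade_evidence(downgrades):
    """Return the evidence level after applying downgrades.

    `downgrades` maps a factor name to the number of levels to subtract.
    ASSUMPTION: grading starts at "high" and downgrades are additive;
    the source text does not specify these rules.
    """
    total = sum(downgrades.get(factor, 0) for factor in FACTORS)
    if total < 0:
        raise ValueError("downgrades must not be negative")
    # The grade cannot fall below the lowest level.
    return LEVELS[min(total, len(LEVELS) - 1)]


# Hypothetical example: one level down for risk of bias, one for imprecision.
print(grade_evidence({"risk of bias": 1, "imprecision": 1}))
```

Any entry for publication bias in the input is ignored, which mirrors the decision not to take that factor into account.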