Clinical Evaluation of Functional Lumbar Segmental Instability: Reliability, Validity, and Subclassification of Manual Tests—A Scoping Review
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Thank you for the opportunity to evaluate this scientific work. For each section, I offer the following suggestions to the authors:
1. Introduction:
The general context is well presented, but it would have been beneficial to more clearly identify the gaps in the existing literature that justify this narrative review.
2. Materials and Methods:
The search strategy is missing important specific keywords, such as "Clinical Test of Thoracolumbar Dissociation" or "Deep Muscle Contraction scale", which are mentioned in the rest of the text. This may raise questions about the exhaustiveness of the search.
The flow chart is also present and useful. However, clear criteria for the "fundamental theoretical sources" are lacking. How were they selected? How many were there? This introduces an element of subjectivity and reduces transparency.
3. Results
Most sections provide a good narrative synthesis of the data in the tables. However, in some parts (e.g., subchapter 3.3 on reliability), the text relies too heavily on the tables and becomes a simple list of results without sufficient integration and interpretation. For example, it would have been useful to compare and contrast the conflicting PIT test results from different studies directly in the text. Table 3 also appears incomplete: rows with study names and results are missing and instead appear below the table in a disorganized format (as on page 10). This is a major problem of clarity and presentation.
4. Discussion
Although the synthesis of studies is coherent, the manner of presentation is generally too descriptive and affirmative, lacking critical dialogue with studies that report contradictory results.
Author Response
1st Reviewer
Responses to Reviewer
- Introduction
Reviewer comment: The general context is well presented, but it would have been beneficial to more clearly identify the gaps in the existing literature that justify this narrative review.
Response: We thank the reviewer for this important observation. In response, we have clarified the literature gap that justifies the present review. Specifically, the final paragraph of the Introduction (page 2) now states:
“Despite the growing body of literature, there is a lack of integrative reviews that critically synthesize manual tests for FLSI in relation to reliability, validity, subclassification, and predictive value. Moreover, the conceptual foundations of these tests are rarely linked to biomechanical models of spinal stability, such as Panjabi’s framework [22], nor are they consistently evaluated within stratified clinical contexts. This gap limits the development of standardized protocols and impairs clinical decision-making.”
This passage explicitly identifies the absence of integrative synthesis and conceptual linkage as key gaps, thereby strengthening the rationale for conducting this narrative review.
- Materials and Methods
Reviewer comment:
The search strategy is missing important specific keywords, such as "Clinical Test of Thoracolumbar Dissociation" or "Deep Muscle Contraction scale", which are mentioned in the rest of the text. This may raise questions about the exhaustiveness of the search.
The flow chart is also present and useful. However, clear criteria for the "fundamental theoretical sources" are lacking. How were they selected? How many were there? This introduces an element of subjectivity and reduces transparency.
Response:
We thank the reviewer for this precise and constructive observation. In response, we confirm that the search strategy was refined to include the keywords “Clinical Test of Thoracolumbar Dissociation” and “Deep Muscle Contraction scale,” as explicitly stated in Section 2.2 (Page 3):
“The search terms were expanded to include additional clinically relevant constructs such as ‘Clinical Test of Thoracolumbar Dissociation’ and ‘Deep Muscle Contraction scale,’ which are discussed in the Results section. These additions ensured alignment between the search strategy and the thematic scope of the review.”
These terms were incorporated during the second phase of search refinement, following preliminary thematic coding and identification of emerging constructs in the literature.
Regarding the selection of foundational theoretical sources, the manuscript provides a clear explanation in Section 2.2:
“In addition, 11 foundational theoretical sources were referenced to support the conceptual framework and biomechanical definitions relevant to FLSI. These sources were selected based on citation frequency, relevance to spinal stability models, and inclusion in prior reviews.”
Furthermore, the rationale for their inclusion is reiterated later in the same section:
“These sources were not subject to eligibility screening, as they did not meet the criteria for empirical inclusion, but were selected based on citation frequency, relevance to spinal stability models, and their role in prior conceptual reviews.”
This clarification addresses concerns regarding transparency and subjectivity, and aligns with accepted practices for scoping narrative reviews of complex clinical phenomena.
- Results
Reviewer comment:
Most sections provide a good narrative synthesis of the data in the tables. However, in some parts (e.g., subchapter 3.3 on reliability), the text relies too heavily on the tables and becomes a simple list of results without sufficient integration and interpretation. For example, it would have been useful to compare and contrast the conflicting PIT test results from different studies directly in the text. Table 3 also appears incomplete: rows with study names and results are missing and instead appear below the table in a disorganized format (as on page 10). This is a major problem of clarity and presentation.
Response:
We thank the reviewer for this detailed and constructive observation. In response, subchapter 3.3 has been revised to enhance interpretive synthesis and reduce reliance on tabulated data. Specifically, the revised text now includes direct comparison of PIT reliability findings across studies. As stated in the manuscript:
“The Prone Instability Test (PIT), a widely utilized method for evaluating lumbar shear instability, has demonstrated inconsistent reliability in various studies. Ravenna et al. [37] reported low inter-rater agreement (κ = 0.10–0.27), thereby raising concerns about its reproducibility. Conversely, Larkin et al. [38] reported enhanced reliability (κ = 0.72) through the implementation of a modified PIT protocol, accompanied by standardized procedures. Kim et al. [39] further corroborated the test's clinical relevance, reporting substantial reliability for PIT (κ = 0.79).”
This addition addresses the reviewer’s request for integration and contrast of conflicting results.
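For reference, the kappa (κ) statistics contrasted above follow the standard Cohen formulation (assumed here, as the individual studies' computational details are not restated in this response):

$$\kappa = \frac{p_o - p_e}{1 - p_e},$$

where $p_o$ denotes the observed proportion of inter-rater agreement and $p_e$ the proportion of agreement expected by chance. Against the widely used Landis and Koch benchmarks (0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial), the values above place the original PIT (κ = 0.10–0.27) in the slight-to-fair range and the modified or standardized protocols (κ = 0.72–0.79) in the substantial range, underscoring the protocol-dependence of the test's reproducibility.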
Regarding Table 3, we acknowledge the formatting issue noted. The table has been reconstructed to include complete rows with test names, reference standards, diagnostic metrics (sensitivity, specificity, LR+ and LR−), and source attribution. The previous fragmented listing of results beneath the table has been removed, and the data now appear in structured format with consistent alignment and labeling.
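For completeness, the diagnostic metrics tabulated in Table 3 are related by the standard definitions, stated here for the reader's convenience (Se = sensitivity, Sp = specificity):

$$LR^{+} = \frac{Se}{1 - Sp}, \qquad LR^{-} = \frac{1 - Se}{Sp}.$$

An LR+ well above 1 indicates that a positive test result substantially raises the probability of FLSI, while an LR− well below 1 indicates that a negative result substantially lowers it.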
Furthermore, Tables 4 through 8 have been revised to ensure clarity, completeness, and traceability. Each table now includes full entries with study identifiers, metrics, and interpretive notes, aligned with the corresponding narrative sections. This restructuring enhances the coherence of the Results section and supports the thematic synthesis across diagnostic accuracy, reliability, subclassification, and predictive value.
These revisions collectively improve the analytical depth and presentation quality of the Results section, in line with the reviewer’s recommendations.
- Discussion
Reviewer comment:
Although the synthesis of studies is coherent, the manner of presentation is generally too descriptive and affirmative, lacking critical dialogue with studies that report contradictory results.
Response:
We thank the reviewer for this important observation. In response, the Discussion section has been revised to incorporate a more critical dialogue with studies reporting divergent or contradictory findings. Specifically, the revised text now contrasts the consistent diagnostic performance of the Passive Lumbar Extension Test (PLET) with the inconsistent reliability and specificity observed in the Prone Instability Test (PIT) and Posterior Shear Test (PST). As stated:
“In contrast, the Prone Instability Test (PIT) and Posterior Shear Test (PST) exhibited inconsistent reliability and limited specificity across studies (Denteneer et al., 2017; Ravenna et al., 2011; Chatprem et al., 2022). These findings raise concerns about their standalone clinical utility and suggest that their use should be contextualized within broader test batteries.”
Additionally, the discussion now addresses the limitations of observational instruments such as the Aberrant Movement Pattern (AMP) test, noting:
“Composite observational instruments, such as the Aberrant Movement Pattern (AMP) test, offer a more nuanced approach to identifying motor control dysfunction. However, inter-rater variability and the absence of standardized operational definitions limit their reproducibility (Ferrari et al., 2015; Vanti et al., 2016).”
The revised section also critically appraises the predictive value of manual tests, acknowledging the absence of consistent prognostic associations in studies by Oliveira et al. (2019), Denteneer et al. (2021), and Thomson et al. (2025). These findings are explicitly discussed as limitations to the clinical utility of isolated test performance.
Finally, the methodological considerations subsection highlights the heterogeneity of study designs, sample sizes, and reference standards, and calls for the integration of emerging technologies (e.g., wearable sensors, AI-assisted motion analysis) to improve diagnostic objectivity and reproducibility. This reflects a shift from descriptive synthesis to critical evaluation and future-oriented recommendations.
These revisions collectively strengthen the interpretive depth of the Discussion and address the reviewer’s concern regarding the lack of engagement with contradictory evidence.
Reviewer 2 Report
Comments and Suggestions for Authors
Dear Authors,
Thank you for the opportunity to review your manuscript entitled “Clinical Evaluation of Functional Lumbar Segmental Instability: Reliability, Validity, and Subclassification of Manual Tests.” This narrative review addresses a clinically relevant and conceptually important topic in musculoskeletal rehabilitation, namely the evaluation of functional lumbar segmental instability (FLSI). The manuscript reflects an extensive effort to integrate biomechanical, clinical, and diagnostic literature. However, there are several critical issues related to scope, organization, and interpretive rigor that should be addressed before the manuscript can be considered for publication in a high-impact journal.
- (Introduction): The introduction is comprehensive but suffers from conceptual overcrowding. Numerous themes—such as subclassification models, prognostic tools, and emerging assessments—are introduced without a clear central research question or organizing framework.
- (Introduction): Please streamline the introduction to focus more explicitly on the rationale and objectives of the review. What are the key questions this review seeks to answer regarding FLSI evaluation?
- (Introduction): Redundant references and overlapping definitions (e.g., FLSI vs. structural instability vs. sub-threshold instability) should be minimized to improve clarity.
- (Materials and Methods): While the review is presented as a narrative, the methodology includes systematic features (e.g., search strategy, eligibility criteria, data extraction, thematic analysis). This hybrid structure is somewhat confusing.
- (Materials and Methods): If the authors aim to preserve the narrative review format, consider simplifying the methods accordingly. Alternatively, reframe the study as a scoping review to justify the semi-systematic approach.
- (Materials and Methods): The use of quality appraisal tools (QUADAS, QAREL) is appropriate; however, the outcomes of this appraisal process are not sufficiently reported or tabulated. Consider including a summary of quality ratings to enhance transparency.
- (Materials and Methods): The rationale behind the five thematic domains should be better justified with references or conceptual grounding.
- (Results): The results section is overly descriptive and lacks prioritization. While comprehensiveness is appreciated, the key findings regarding diagnostic accuracy and reliability are buried in dense paragraphs.
- (Results): Please consider summarizing diagnostic and reliability metrics in concise tables and limiting text to interpretive highlights.
- (Results): Certain studies (e.g., Rabin, Alyazedi, Oliveira) are cited repeatedly, potentially introducing bias. Please ensure balanced representation of the evidence base.
- (Results): The section on predictive validity is valuable but lacks critical analysis of why tests failed to predict outcomes—e.g., lack of stratification, intervention heterogeneity, or poor construct validity.
- (Discussion): The discussion synthesizes literature broadly but misses the opportunity to critically reflect on the limitations of current evidence or the reasons for conflicting findings.
- (Discussion): You state that manual tests show “limited predictive value” but do not fully explore the implications for clinical decision-making. Why do these tools fall short in forecasting treatment outcomes?
- (Discussion): The clinical application of subclassification models is promising but should be discussed with more nuance, particularly regarding feasibility, training needs, and potential misclassification.
- (Discussion): Recent technologies (e.g., sensors, AI) are mentioned briefly. Consider elaborating on how these tools could be integrated into a clinical framework for FLSI assessment.
- (Conclusion): The conclusion appropriately calls for integrative and multidimensional approaches, but remains abstract and somewhat repetitive of earlier sections.
- (Conclusion): Please clearly state what this review contributes to the field that previous reviews have not.
- (Conclusion): Highlight specific tests (e.g., PLET) or frameworks (e.g., subclassification) that show the most promise based on your synthesis.
- (Conclusion): Suggest concrete research directions, such as the validation of composite test batteries or stratified outcome studies in primary care settings.
Author Response
2nd Reviewer
Responses.
- Reviewer comment:
The introduction is comprehensive but suffers from conceptual overcrowding. Numerous themes—such as subclassification models, prognostic tools, and emerging assessments—are introduced without a clear central research question or organizing framework.
Response:
We thank the reviewer for this insightful observation. In response, we have revised the Introduction to reduce conceptual density and improve thematic focus. Supporting themes such as subclassification models and prognostic indicators are now introduced as extensions of the central diagnostic challenge, rather than as parallel concepts. The revised structure emphasizes the distinction between functional and structural instability and frames the subsequent discussion around the clinical evaluation of FLSI.
Supporting excerpt from manuscript:
“To reduce conceptual overlap, this review adopts the term functional lumbar segmental instability (FLSI) to describe mid-range motor control deficits that are not detectable through imaging but manifest during dynamic tasks… These approaches support the development of integrative diagnostic frameworks that combine clinical observation, motor control testing, and functional assessment [14].”
- Reviewer comment:
Please streamline the introduction to focus more explicitly on the rationale and objectives of the review. What are the key questions this review seeks to answer regarding FLSI evaluation?
Response:
We appreciate the reviewer’s recommendation to clarify the rationale and objectives. Accordingly, the revised Introduction now concludes with a clearly defined set of guiding questions that structure the scope and intent of the review. These questions directly address the reliability, validity, subclassification, and clinical implications of manual tests for FLSI.
Supporting excerpt from manuscript:
“To address these limitations, this review is guided by three core questions:
(1) What is the current evidence regarding the reliability and validity of manual tests for FLSI?
(2) How can these tests be organized into clinically meaningful subclassification domains?
(3) What are the implications of current evidence for clinical decision-making and future research?”
- Reviewer comment:
Redundant references and overlapping definitions (e.g., FLSI vs. structural instability vs. sub-threshold instability) should be minimized to improve clarity.
Response:
We thank the reviewer for highlighting this issue. The revised Introduction now applies consistent terminology and minimizes definitional overlap. The term “functional lumbar segmental instability (FLSI)” is used throughout to describe mid-range motor control deficits, while structural instability and sub-threshold instability are defined in contrast and referenced only where conceptually necessary. Redundant citations have been consolidated to improve clarity and flow.
Supporting excerpt from manuscript:
“Clinically, LSI is divided into two subtypes: structural instability… and functional instability… To reduce conceptual overlap, this review adopts the term functional lumbar segmental instability (FLSI)… Screening tools for sub-threshold lumbar instability (STLI) have also been developed to identify early-stage dysfunction before radiographic signs appear [18,19].”
- Materials and Methods
- Reviewer comment:
While the review is presented as a narrative, the methodology includes systematic features (e.g., search strategy, eligibility criteria, data extraction, thematic analysis). This hybrid structure is somewhat confusing.
Response:
We thank the reviewer for this observation. To clarify, the manuscript explicitly acknowledges the hybrid nature of the methodology. Although the review is structured narratively, it incorporates semi-systematic elements to enhance transparency and rigor. This format was intentionally selected to accommodate the conceptual complexity of FLSI while maintaining methodological clarity.
Supporting excerpt from manuscript:
“To address concerns regarding methodological clarity, we acknowledge that the present review incorporates semi-systematic features (e.g., structured search strategy, eligibility criteria, thematic coding), and may be more accurately described as a scoping narrative review. This hybrid format was selected to balance conceptual breadth with methodological transparency, in line with recent recommendations for reviews of complex clinical phenomena [23].”
- Reviewer comment:
If the authors aim to preserve the narrative review format, consider simplifying the methods accordingly. Alternatively, reframe the study as a scoping review to justify the semi-systematic approach.
Response:
We appreciate the reviewer’s suggestion. The manuscript has been reframed to describe the study as a “scoping narrative review,” reflecting its hybrid structure. This terminology is now used explicitly to justify the inclusion of structured methodological components within a narrative synthesis.
Supporting excerpt from manuscript:
“To address concerns regarding methodological clarity, we acknowledge that the present review incorporates semi-systematic features… and may be more accurately described as a scoping narrative review.”
- Reviewer comment:
The use of quality appraisal tools (QUADAS, QAREL) is appropriate; however, the outcomes of this appraisal process are not sufficiently reported or tabulated. Consider including a summary of quality ratings to enhance transparency.
Response:
We thank the reviewer for this important recommendation. In response, a structured summary of quality appraisal outcomes has been included in Table 1. The table presents the type of evaluation tool used, number of criteria met, and interpretive weighting for each study. This addition enhances transparency and supports the interpretive synthesis.
Supporting excerpt from manuscript:
“Table 1 presents a summary of the quality appraisal outcomes for key studies, including the type of evaluation tool used, criteria met, and interpretive weighting applied during synthesis.”
- Reviewer comment:
The rationale behind the five thematic domains should be better justified with references or conceptual grounding.
Response:
We thank the reviewer for this observation. The manuscript now clarifies that the five thematic domains were determined a priori based on clinical relevance and recurring constructs identified in the literature. This categorization supports the interpretive synthesis and reflects established patterns in the evidence base.
Supporting excerpt from manuscript:
“The thematic domains were determined a priori based on clinical relevance and recurring constructs identified in the literature. The review was structured into five domains: (a) conceptual definitions of FLSI, (b) diagnostic limitations of radiographic methods, (c) reliability and validity of manual clinical tests, (d) subclassification models, and (e) predictive value for rehabilitation outcomes.”
- Results
- Reviewer comment:
The results section is overly descriptive and lacks prioritization. While comprehensiveness is appreciated, the key findings regarding diagnostic accuracy and reliability are buried in dense paragraphs.
Response:
We thank the reviewer for this observation. In response, the Results section has been reorganized to prioritize key findings. Diagnostic accuracy and reliability metrics are now presented in structured tables (Tables 3 and 4), allowing for direct comparison and interpretive synthesis. The accompanying narrative highlights clinically relevant contrasts, such as the consistent performance of the Passive Lumbar Extension Test (PLET) versus the variability observed in the Prone Instability Test (PIT) and Aberrant Movement Patterns (AMP).
Supporting excerpt from manuscript:
“Table 3 presents a revised synthesis of diagnostic accuracy metrics across key studies. It summarizes sensitivity, specificity, and likelihood ratios for each test, facilitating direct comparison and interpretive synthesis.”
“Table 4 provides a structured summary of reliability metrics across studies. It enables comparison and highlights variability in reproducibility, underscoring the importance of standardized protocols.”
- Reviewer comment:
Please consider summarizing diagnostic and reliability metrics in concise tables and limiting text to interpretive highlights.
Response:
We appreciate the reviewer’s recommendation. Diagnostic and reliability metrics have been summarized in Tables 3 and 4, which present sensitivity, specificity, likelihood ratios, and inter-rater agreement values across key studies. The narrative text has been refined to emphasize interpretive highlights, such as the superior diagnostic accuracy of PLET (LR+ = 8.50) and the moderate reliability of PIT (κ = 0.52), as well as the implications of examiner variability and protocol standardization.
Supporting excerpt from manuscript:
“The Passive Lumbar Extension Test (PLET) has exhibited moderate to substantial inter-rater reliability… Rabin et al. [34] reported a kappa value of 0.76… Alyazedi et al. [3] corroborated these findings…”
“The Prone Instability Test (PIT)… has demonstrated inconsistent reliability… Ravenna et al. [37] reported low inter-rater agreement (κ = 0.10–0.27)… Kim et al. [39]… reported substantial reliability (κ = 0.79).”
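To make the interpretive highlight above concrete, the reported LR+ = 8.50 for PLET can be translated into a post-test probability via Bayes' rule in odds form. Assuming, purely for illustration, a pre-test probability of 30% (a hypothetical prevalence, not a figure drawn from the included studies):

$$\text{pre-test odds} = \frac{0.30}{1 - 0.30} \approx 0.43, \qquad \text{post-test odds} = 0.43 \times 8.50 \approx 3.64, \qquad P(\text{FLSI} \mid +) = \frac{3.64}{1 + 3.64} \approx 0.78.$$

Under this assumption, a positive PLET would raise the estimated probability of FLSI from 30% to roughly 78%, whereas the moderate κ = 0.52 for PIT cautions that such probability revisions are only as trustworthy as the test's reproducibility.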
- Reviewer comment:
Certain studies (e.g., Rabin, Alyazedi, Oliveira) are cited repeatedly, potentially introducing bias. Please ensure balanced representation of the evidence base.
Response:
We thank the reviewer for this important point. The repeated citation of Rabin, Alyazedi, and Oliveira reflects their methodological relevance across multiple domains (diagnostic accuracy, reliability, subclassification). Nonetheless, the revised Results section incorporates findings from a broader range of sources, including Ravenna et al. [37], Larkin et al. [38], Kim et al. [39], Denteneer et al. [21], and Behera et al. [35], to ensure balanced representation.
Supporting excerpt from manuscript:
“Kim et al. [39] further corroborated the test's clinical relevance… Ravenna et al. [37] reported low inter-rater agreement… Behera et al. [35] evaluated 1000 patients with chronic LBP…”
- Reviewer comment:
The section on predictive validity is valuable but lacks critical analysis of why tests failed to predict outcomes—e.g., lack of stratification, intervention heterogeneity, or poor construct validity.
Response:
We appreciate the reviewer’s insight. The section on predictive validity has been expanded to critically examine potential reasons for the limited prognostic utility of manual tests. Specifically, the manuscript now states:
“The absence of predictive associations may be indicative of limitations in sample stratification, test sensitivity, or the multifactorial nature of therapeutic response.”
This addition provides interpretive context for the findings reported by Oliveira et al. [10], Thomson et al. [16], and Rabin et al. [34], and acknowledges the complexity of outcome prediction in heterogeneous clinical populations.
Supporting excerpt from manuscript:
“Although the intervention yielded clinically meaningful improvements, baseline motor control tests did not consistently correlate with long-term outcomes… These findings reinforce the notion that initial test performance may not reliably forecast therapeutic response, particularly in multifactorial interventions.”
- Discussion
- Reviewer comment:
The discussion synthesizes literature broadly but misses the opportunity to critically reflect on the limitations of current evidence or the reasons for conflicting findings.
Response:
We thank the reviewer for this important observation. In response, the Discussion section has been expanded to critically reflect on the limitations of current evidence. Specific methodological issues—such as small sample sizes, lack of blinding, and inconsistent reference standards—are now explicitly acknowledged as barriers to synthesis and generalization. These limitations are discussed in relation to the diagnostic ambiguity of FLSI and the variability observed across manual tests.
Supporting excerpt from manuscript:
“The heterogeneity of study designs, sample characteristics, and test protocols poses challenges for synthesis. Many studies were conducted with small sample sizes, lacked blinding procedures, or used inconsistent reference standards.”
- Reviewer comment:
You state that manual tests show “limited predictive value” but do not fully explore the implications for clinical decision-making. Why do these tools fall short in forecasting treatment outcomes?
Response:
We appreciate the reviewer’s insight. The Discussion now addresses the limited predictive value of manual tests by highlighting the multifactorial nature of therapeutic response and the limitations of isolated test performance. The manuscript emphasizes that motor control impairment alone may be insufficient to guide intervention selection and calls for multivariate diagnostic models that integrate biomechanical, psychosocial, and patient-reported variables.
Supporting excerpt from manuscript:
“The absence of predictive associations may be indicative of limitations in sample stratification, test sensitivity, or the multifactorial nature of therapeutic response… A shift toward multivariate diagnostic models… may ultimately improve clinical decision-making and support personalized care pathways.”
- Reviewer comment:
The clinical application of subclassification models is promising but should be discussed with more nuance, particularly regarding feasibility, training needs, and potential misclassification.
Response:
We thank the reviewer for this thoughtful recommendation. The Discussion now elaborates on the feasibility of applying subclassification models in clinical practice. It emphasizes the importance of examiner training, standardized protocols, and operational definitions to minimize misclassification and ensure appropriate treatment selection.
Supporting excerpt from manuscript:
“The feasibility of applying subclassification models in clinical practice depends on examiner training, operational definitions, and standardized protocols. Without these elements, misclassification may occur, potentially leading to inappropriate treatment selection.”
- Reviewer comment:
Recent technologies (e.g., sensors, AI) are mentioned briefly. Consider elaborating on how these tools could be integrated into a clinical framework for FLSI assessment.
Response:
We appreciate the reviewer’s suggestion. The Discussion has been expanded to describe how emerging technologies—such as wearable sensors, AI algorithms, and motion capture systems—could be integrated into hybrid clinical frameworks. These tools may enhance diagnostic objectivity, quantify motor control deficits, and support training standardization.
Supporting excerpt from manuscript:
“Emerging technologies could be integrated into structured clinical frameworks through hybrid protocols that combine manual tests with sensor-based motion capture… AI algorithms could assist in pattern recognition and examiner feedback… Such integration may enhance diagnostic precision and support training standardization across clinical settings.”
- Conclusions
- Reviewer comment:
The conclusion appropriately calls for integrative and multidimensional approaches, but remains abstract and somewhat repetitive of earlier sections.
Response:
We thank the reviewer for this observation. In response, the Conclusion has been refined to reduce repetition and emphasize the practical implications of a multidimensional approach. The revised text highlights the synthesis of biomechanical, neuromuscular, and functional indicators within structured clinical reasoning, and frames this integration as essential for diagnostic precision and therapeutic relevance.
Supporting excerpt from manuscript:
“Clinicians should approach FLSI assessment as a multidimensional process, employing a battery of complementary tests within a structured clinical reasoning framework… Diagnostic accuracy and therapeutic relevance emerge from the synthesis of biomechanical, neuromuscular, and functional indicators.”
- Reviewer comment:
Please clearly state what this review contributes to the field that previous reviews have not.
Response:
We appreciate the reviewer’s request for clarification. The Conclusion now explicitly states that this review differs from previous work by integrating empirical findings with emerging subclassification models and technological innovations. This synthesis provides a broader conceptual and methodological framework for evaluating FLSI.
Supporting excerpt from manuscript:
“Unlike previous reviews, this synthesis integrates empirical evidence with emerging subclassification models and technological innovations, providing a multidimensional perspective on FLSI assessment.”
- Reviewer comment:
Highlight specific tests (e.g., PLET) or frameworks (e.g., subclassification) that show the most promise based on your synthesis.
Response:
We thank the reviewer for this suggestion. The Conclusion now highlights the Passive Lumbar Extension Test (PLET) as the most clinically robust tool, based on its consistent reliability and diagnostic accuracy. It also emphasizes the promise of subclassification frameworks that differentiate between functional, structural, and combined instability types.
Supporting excerpt from manuscript:
“Among the evaluated tests, the Passive Lumbar Extension Test (PLET) consistently demonstrated high inter-rater reliability and diagnostic accuracy… Subclassification frameworks… offer a promising basis for individualized assessment and intervention.”
- Reviewer comment:
Suggest concrete research directions, such as the validation of composite test batteries or stratified outcome studies in primary care settings.
Response:
We appreciate the reviewer’s recommendation. The Conclusion now outlines specific research priorities, including the validation of composite test batteries, longitudinal outcome studies, and implementation trials in primary care. It also proposes the integration of sensor-based technologies and hybrid protocols to improve reproducibility and clinical utility.
Supporting excerpt from manuscript:
“Future research should prioritize the validation of composite test batteries, longitudinal outcome studies, and implementation trials in primary care settings… The development of hybrid assessment protocols that combine manual testing with sensor-based quantification may further improve reproducibility and clinical utility.”
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors fully understood the comments and made all requested changes in accordance with the recommendations received. The revised document appropriately reflects the necessary adjustments, demonstrating responsiveness and scientific rigor.
Author Response
jfmk-3880779
Round 2: Response to Reviewer 1
Comment 1: The authors fully understood the comments and made all requested changes in accordance with the recommendations received. The revised document appropriately reflects the necessary adjustments, demonstrating responsiveness and scientific rigor.
Response: We sincerely thank the reviewer for this generous and affirming comment. We are grateful for the opportunity to revise the manuscript in accordance with the constructive feedback received. The revisions were undertaken with careful attention to conceptual clarity, methodological transparency, and thematic coherence. We appreciate the recognition of our efforts and remain committed to maintaining the highest standards of scientific rigor.
Reviewer 2 Report
Comments and Suggestions for Authors
Dear Authors,
Thank you for your detailed and thoughtful responses to the reviewer comments. Your revisions have clearly addressed many of the initial concerns regarding conceptual focus, methodological clarity, and excessive narrative complexity.
You have successfully:
- Streamlined the Introduction and clarified the central research questions guiding the review.
- Justified the hybrid methodological approach as a “scoping narrative review,” and explained the rationale for thematic organization.
- Improved transparency through the addition of quality appraisal summaries and structured tables for diagnostic accuracy and reliability.
- Expanded the Discussion section to address the limitations of current evidence, variability in predictive value, and the feasibility of applying subclassification models in clinical practice.
- Clarified the manuscript’s contribution relative to prior literature, including the integration of subclassification frameworks and emerging technologies.
These revisions substantially strengthen the manuscript and improve its coherence and clinical relevance.
However, a few points still merit further attention:
- Novel Contribution: While you have outlined how the review differs from prior work, the manuscript would benefit from more explicit comparisons with previous reviews, including what conceptual or methodological gap this review fills.
- Citation Balance: Although you have incorporated additional sources, citations of a few authors (e.g., Rabin, Alyazedi) still appear frequently and should be balanced more carefully to avoid perceived bias.
- Critical Appraisal: The Discussion could be further strengthened by deeper engagement with systematic biases (e.g., small sample sizes, inconsistent definitions, publication bias) that affect the field as a whole.
Overall, your revisions demonstrate a commendable effort to improve the rigor, clarity, and practical value of this review. With minor additional refinements, the manuscript may offer a valuable synthesis for clinicians and researchers working with FLSI.
Author Response
jfmk-3880779
Round 2: Response to Reviewer 2
General Appreciation:
We thank the reviewer for the thoughtful and encouraging evaluation of our revised manuscript. We are pleased that the revisions have addressed the initial concerns regarding conceptual focus, methodological clarity, and narrative structure. We are especially grateful for the recognition of improvements in the Introduction, Methods, Results, and Discussion sections, as well as the clarification of the manuscript’s contribution to the field.
Below we address the remaining points raised:
Comment 1 – Novel Contribution:
While you have outlined how the review differs from prior work, the manuscript would benefit from more explicit comparisons with previous reviews, including what conceptual or methodological gap this review fills.
Response:
We thank the reviewer for this important suggestion. In response, we have expanded the Conclusion to include explicit comparisons with prior reviews. Specifically, we clarify that previous literature has primarily focused on structural instability or isolated test metrics, whereas our review integrates empirical findings with subclassification frameworks and emerging technologies. This synthesis addresses the conceptual gap in evaluating motor control dysfunction and the methodological gap in linking manual tests to stratified care models.
Supporting excerpt from revised manuscript (Section 5. Conclusions):
“Unlike previous reviews, this synthesis integrates empirical evidence with emerging subclassification models and technological innovations, providing a multidimensional perspective on FLSI assessment. This approach addresses the conceptual gap in evaluating motor control dysfunction and the methodological gap in linking manual tests to stratified care frameworks.”
Comment 2 – Citation Balance:
Although you have incorporated additional sources, citations of a few authors (e.g., Rabin, Alyazedi) still appear frequently and should be balanced more carefully to avoid perceived bias.
Response:
We appreciate the reviewer’s observation. The frequent citation of Rabin and Alyazedi reflects their methodological relevance across multiple domains (e.g., reliability, subclassification, predictive value). Nonetheless, we have reviewed the manuscript to ensure balanced representation and have integrated additional sources (e.g., Chatprem et al., Ravenna et al., Kim et al., Denteneer et al., Behera et al.) to diversify the evidence base and mitigate any perception of citation bias.
Supporting excerpts from revised manuscript:
Section 3.3 Reliability of Manual Tests for FLSI:
“Kim et al. [39] further supported the reliability of PLET and PIT, reporting kappa values above 0.70 in a multicenter cohort. Ravenna et al. [37] and Larkin et al. [38] also contributed reliability data for PIT, highlighting protocol-dependent variability.”
Table 3 (Diagnostic Accuracy Metrics for Manual Tests):
“Note: Values drawn from Rabin et al. (2014), Alyazedi et al. (2021), Oliveira et al. (2019), Behera et al. (2023), Chatprem et al. (2022), and internal synthesis of included studies.”
Comment 3 – Critical Appraisal:
The Discussion could be further strengthened by deeper engagement with systematic biases (e.g., small sample sizes, inconsistent definitions, publication bias) that affect the field as a whole.
Response:
We thank the reviewer for this valuable recommendation. The Discussion and Limitations sections have been expanded to address systematic biases affecting the field. We now explicitly discuss the impact of small sample sizes, inconsistent operational definitions, reliance on radiographic reference standards, and potential publication bias. These factors are acknowledged as barriers to reproducibility and generalizability, and their implications for future research are outlined.
Supporting excerpts from revised manuscript:
Section 4.1 Limitations:
“Fifth, the lack of standardized operational definitions and test protocols across studies may contribute to variability in diagnostic outcomes and limit reproducibility.”
Section 4. Discussion: Methodological Considerations and Future Directions:
“Systematic biases also affect the interpretive strength of the current evidence base. Many studies relied on small samples, lacked examiner blinding, or used inconsistent operational definitions for manual tests. Publication bias may further skew the perceived efficacy of certain assessments, as studies reporting null or inconclusive findings are less likely to be published. These limitations underscore the need for multicenter trials, standardized protocols, and transparent reporting practices.”
