Inter-Rater Agreement Between a Trained Nurse and Physicians in FAST Examination of Trauma Patients: A Pilot Study in the Emergency Department
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Thank you for the opportunity to review this article. This topic is highly relevant due to the critical condition of most patients with multiple trauma admitted to the emergency department. This requires immediate decisions regarding patient management. Given the overload of emergency department physicians, training nurses in ultrasound diagnostics of fluid in the abdominal cavity, pleural cavities, and pericardium is an important task for the emergency medicine system. The main limitations of this study are the very small sample size and short duration of study (three months) conducted at a single medical center, which casts doubt on the validity of the results. Furthermore, in the description of materials and methods, the authors failed to indicate that ultrasound is used not only to detect fluid in the abdominal cavity and pericardium, but also fluid in the pleural cavities, as hemothorax is often possible in cases of multiple trauma. Furthermore, it is unclear whether the ultrasound results were available to the trauma and surgical teams providing medical care to the patient, as such data is essential for rapid decision-making regarding treatment. The study requires a multi-center approach with the involvement of a sufficient number of nurses trained in standard educational programs.
Author Response
Reviewer 1
|
Comment |
Response |
Citation (section, page and line/-s) |
|
Thank you for the opportunity to review this article. This topic is highly relevant due to the critical condition of most patients with multiple trauma admitted to the emergency department. This requires immediate decisions regarding patient management. Given the overload of emergency department physicians, training nurses in ultrasound diagnostics of fluid in the abdominal cavity, pleural cavities, and pericardium is an important task for the emergency medicine system.
|
We thank the reviewer for this positive and encouraging comment and for recognizing the clinical relevance of the topic. |
|
|
The main limitations of this study are the very small sample size and short duration of study (three months) conducted at a single medical center, which casts doubt on the validity of the results. |
We thank the reviewer for this important comment. We agree that the relatively small sample size, short study duration, and single-center design represent important limitations that may affect the robustness and generalizability of the findings.
These aspects have been explicitly acknowledged in the Limitations section: ‘It is important to acknowledge certain limitations of this study. First, the study was a pilot study conducted at a single center, which limits the generalizability of the findings. Second, the sample size was relatively small. Third, the relatively short study duration may have limited patient recruitment and the number of positive cases, potentially affecting the representativeness of the findings.’ |
4.3. Limitations p.12, 469-473
|
|
Furthermore, in the description of materials and methods, the authors failed to indicate that ultrasound is used not only to detect fluid in the abdominal cavity and pericardium, but also fluid in the pleural cavities, as hemothorax is often possible in cases of multiple trauma. |
We thank the reviewer for this valuable comment. We agree that FAST may also be used to detect fluid in the pleural cavities, including hemothorax in trauma patients.
In response, we have revised the Abstract, the Introduction and the Methods sections to clarify that FAST assessment includes evaluation for free fluid in the intraperitoneal cavity, pericardium, and pleural spaces.
Abstract: ‘The Focused Assessment with Sonography in Trauma (FAST) is a bedside ultrasound examination used for the early detection of free fluid in the intraperitoneal cavity, pericardium, and pleural spaces.’
Introduction: ‘FAST is used to assess the presence of free fluid in the intraperitoneal cavity, pericardium and pleural spaces’
Methods: ‘…with the aim of rapidly detecting free fluid in the intraperitoneal, pericardial cavity, and pleural spaces…’ |
Abstract, p.1, 24-26
Introduction, p.2, 82-83
2.3. Ultrasound equipment and data recording, p. 4, 158-159
|
|
Furthermore, it is unclear whether the ultrasound results were available to the trauma and surgical teams providing medical care to the patient, as such data is essential for rapid decision-making regarding treatment. |
We thank the reviewer for this important comment. We agree that clarification regarding the availability of ultrasound findings to the clinical team is essential.
In response, we have revised the Methods section to explicitly state that FAST examinations performed as part of the study did not influence clinical decision-making, and that patient management was based on standard clinical assessment and imaging pathways: ‘Ultrasound findings recorded for study purposes were not made available to the trauma or surgical teams for decision-making, and patient management was based on standard clinical assessment and imaging pathways.’ |
2.5.4. Collection of demographic and clinical data, p. 5-6, 236-238 |
|
The study requires a multi-center approach with the involvement of a sufficient number of nurses trained in standard educational programs. |
We thank the reviewer for this important comment. We fully agree that a multicenter approach involving a larger number of nurses trained through standardized educational programs is essential to strengthen the external validity of the findings.
As this study was designed as a pilot, it aimed to provide preliminary insights under controlled conditions. We have now further emphasized in the Conclusions the need for future multicenter studies including multiple operators and standardized training protocols to better assess the generalizability and clinical applicability of nurse-performed FAST: ‘…Further studies involving larger samples, multiple operators, multicentre designs, and standardized training protocols are needed to validate these findings, better define its role in clinical practice, evaluate its integration into routine care, and investigate its potential impact on patient safety.’ |
Conclusions, p. 13, 533-536
|
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
The study aims to compare FAST findings between an ED nurse and radiologists. The methodology is sound and the study design is quite standard and not flawed. However, the numbers of patients with positive findings are quite low compared to those with negative findings, only 4 out of 68 patients. This makes the acceptance of the results hard to accept as valid. The authors can work around this by perhaps including hemodialysis patients as part of the study subjects. This can include the number of positive patients. Alternatively, they can continue the study for longer. The authors recognise this limitation and should include more positive patients to power the study.
Author Response
Reviewer 2
|
Comment |
Response |
Citation (section, page and line/-s) |
|
The study aims to compare FAST findings between an ED nurse and radiologists. The methodology is sound and the study design is quite standard and not flawed. |
We sincerely appreciate the reviewer’s positive evaluation of the study. |
|
|
However, the numbers of patients with positive findings are quite low compared to those with negative findings, only 4 out of 68 patients. This makes the acceptance of the results hard to accept as valid. The authors can work around this by perhaps including hemodialysis patients as part of the study subjects. This can include the number of positive patients. Alternatively, they can continue the study for longer. The authors recognise this limitation and should include more positive patients to power the study. |
We thank the reviewer for this thoughtful comment. We agree that the low number of patients with positive findings represents an important limitation and affects the precision and interpretability of the results.
As acknowledged in the manuscript, this study was designed as a pilot study conducted in a real-world emergency trauma setting, where the prevalence of positive FAST findings was inherently low during the study period. Nevertheless, we have explicitly acknowledged this limitation and further elaborated on its implications in the Discussion section: ‘However, these findings should be interpreted with caution, as the study was conducted in a single-center setting and involved a single trained nurse, a limited sample size, a relatively short study duration, and a low number of positive cases. Therefore, the results may reflect individual operator performance rather than the broader feasibility of nurse-performed FAST and should not be generalized beyond the study setting.’
We agree that including a larger number of positive cases would strengthen the analysis; however, the inclusion of different patient populations (e.g., hemodialysis patients) would not be consistent with the study objective, which specifically focuses on trauma patients in the emergency department.
We also agree that extending the study duration would allow recruitment of a greater number of positive cases and improve the robustness of the findings. This has now been emphasized in the Limitations and Conclusions sections:
Limitations: ‘Third, the relatively short study duration may have limited patient recruitment and the number of positive cases, potentially affecting the representativeness of the findings.’
Conclusions: ‘These findings support the potential feasibility of nurse-performed FAST in emergency care settings. Further studies involving larger samples, multiple operators, multicenter designs, and standardized training protocols are needed to validate these findings, better define their role in clinical practice, evaluate their integration into routine care, and investigate their potential impact on patient safety.’ |
Discussion, p.10, 377-381
Limitations, p.12, 472-473
Conclusions, p.13, 532-536
|
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
Abstract
The abstract is too definitive for such a small pilot. It presents strong claims of accuracy and implies clinical readiness, yet the numbers are based on very few positive findings and very small CT-verified samples. It also reports very high agreement in regions where kappa could not even be meaningfully estimated, which risks overstating the robustness of the findings. Most importantly, the abstract does not warn the reader that the diagnostic-accuracy estimates come from only 23 patients in one subgroup and 10 in another, with only 2 positives in each table.
Introduction
The introduction sets up a patient-safety and operational-efficiency rationale, but it does not translate that rationale into measurable study endpoints. It argues that nurse FAST may optimize triage, reduce delays, and improve safety, yet none of those outcomes are assessed. That creates a mismatch between the problem framing and the actual study design. The literature gap is also stated somewhat vaguely: the manuscript says evidence on nurses is scarce, but it does not clearly explain what specific unanswered question this study addresses beyond repeating a basic agreement/accuracy comparison already explored in prior nurse FAST studies.
Methods
Design and setting
The design has serious threats to external validity. It is single-center, convenience-based, and explicitly dependent on when the nurse-researcher was available, which introduces selection bias from the outset. The manuscript does not describe how often eligible patients were missed, whether included cases differed from non-included cases, or whether the included time windows were representative of usual ED operations. That makes the sample vulnerable to systematic bias.
Participants
The eligibility criteria are problematic in several ways. First, requiring signed consent from the patient or relatives in an emergency-trauma context likely excludes the sickest, most time-critical, and less accessible patients. Second, excluding life-threatening presentations specifically to avoid delaying treatment means the study omits exactly the cohort in whom FAST is often most clinically consequential. Third, excluding patients who did not know Greek or English introduces language-based selection bias and limits generalizability. Fourth, excluding BMI > 40 and pregnancy removes clinically relevant subgroups in whom image acquisition may be harder but still matters in real practice. Overall, the sampling strategy appears to enrich for easier, more stable cases.
Index test and operator training
The nurse training description is insufficiently detailed for replication and for judging validity. The manuscript gives total hours and supervision duration, but not the competency criteria, number of supervised FAST scans completed, assessment thresholds, image-review process, or whether the nurse had to demonstrate interobserver proficiency before study enrollment. Because only one nurse performed all scans, the study may be measuring this individual nurse’s aptitude more than the feasibility of nurse-performed FAST as a practice model.
Blinding and independence
The blinding is weaker than the manuscript suggests. The radiologist knew the nurse had already examined the patient, which can subtly influence interpretation. The second examiner also assessed the patient after the first scan, so the patient’s condition and scanning context were not simultaneous. The manuscript does not report the time interval between scans, whether clinical changes occurred between examinations, or whether either examiner had access to other clinical cues likely to affect interpretation. This matters because inter-rater agreement may partly reflect shared clinical context rather than pure image interpretation agreement.
Reference standard / verification
The reference-standard strategy is inconsistent. Some patients had nurse FAST plus radiologist FAST plus CT, while trauma-team activation cases had nurse FAST plus CT but no radiologist comparator. This creates two analytically different pathways that are later combined under the umbrella of diagnostic accuracy. The manuscript does not explain the criteria for who received CT among the 58 dual-assessed patients, so partial verification bias is possible. Since CT was not done systematically in everyone, the reported diagnostic performance may reflect a selected subgroup rather than the whole study population.
Statistical analysis
The statistical plan is thin for the data structure. Cohen’s kappa was used, but the manuscript does not address the well-known instability of kappa under extreme prevalence, which is exactly the situation here, with almost all scans negative. Reporting raw agreement alone in that context is potentially misleading. The paper also gives p-values for kappa, which add little interpretive value compared with confidence intervals, yet confidence intervals for kappa are not reported. More importantly, the diagnostic-accuracy analysis does not seem preplanned around an adequate sample size and yields perfect point estimates with extremely wide confidence intervals, making the estimates practically uninformative. There is also no sample-size justification for either agreement or accuracy aims.
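The prevalence problem can be made concrete with a minimal sketch. The tables below are hypothetical and are not the study data; the function name `cohen_kappa_2x2` and the cell labels are illustrative assumptions. The sketch only shows that identical raw agreement can correspond to very different kappa values, and that kappa is undefined when neither rater ever calls a positive.

```python
def cohen_kappa_2x2(a, b, c, d):
    """Return (observed agreement, chance agreement, kappa) for a 2x2 table.

    a = both raters positive, b = only rater 1 positive,
    c = only rater 2 positive, d = both raters negative.
    kappa is None when chance agreement is 1 (neither rater varies).
    """
    n = a + b + c + d
    chance_count = (a + b) * (a + c) + (c + d) * (b + d)
    p_obs = (a + d) / n
    p_exp = chance_count / (n * n)
    if chance_count == n * n:
        return p_obs, p_exp, None
    return p_obs, p_exp, (p_obs - p_exp) / (1 - p_exp)


# Hypothetical tables (NOT study data), 58 patients each.
balanced = cohen_kappa_2x2(28, 1, 0, 29)     # about half the scans positive
skewed = cohen_kappa_2x2(0, 1, 0, 57)        # one positive call, unmatched
all_negative = cohen_kappa_2x2(0, 0, 0, 58)  # no positive call by either rater

for name, (p_obs, p_exp, kappa) in [
    ("balanced", balanced),
    ("skewed", skewed),
    ("all negative", all_negative),
]:
    shown = "undefined" if kappa is None else f"{kappa:.2f}"
    print(f"{name}: observed={p_obs:.3f} chance={p_exp:.3f} kappa={shown}")
```

The first two tables share the same observed agreement (57 of 58), yet kappa is about 0.97 in the balanced table and 0 in the skewed one, because chance agreement in the skewed table already equals the observed agreement.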
Results
Sample flow and cohort definition
The study flow is not reported cleanly enough. The manuscript states that 76 were evaluated, 8 excluded, 68 enrolled, 58 entered the agreement analysis, 23 of those had CT, and 10 additional trauma-team patients had nurse FAST plus CT. But it never clearly explains why only 23 of 58 had CT, what distinguished them clinically, or whether CT-negative cases differed systematically from CT-tested cases. That obscures the denominator and raises concern that the accuracy subset was selectively verified.
Distribution of findings
The event rate is too low to sustain strong conclusions. Positive findings were extremely rare: 4 RUQ positives, 1 LUQ positive, 1 SUPH positive, and none in BLADDER. With prevalence this low, very high agreement can occur largely because both raters mostly call everything negative. In other words, the manuscript’s headline agreement values may be driven more by class imbalance than by convincing demonstration of true discriminative performance across all windows.
Inter-rater agreement
The agreement section overinterprets sparse data. In RUQ, kappa is based on only four radiologist-positive cases and one disagreement. In LUQ and BLADDER, kappa could not be meaningfully estimated, yet the narrative still leans on the observed agreement percentages as if they were substantively informative. In SUPH, the “perfect” agreement is based on a single positive case. Those are fragile estimates, and the manuscript does not adequately emphasize how unstable they are. The phrase “substantial to excellent inter-rater reliability” reads too strongly given the tiny number of positive observations.
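The fragility of the RUQ estimate can be illustrated with a short sketch. The 2x2 table used here is an assumed reconstruction, not one reported by the authors: 58 paired scans, four radiologist-positive cases, and one disagreement in which the nurse called a radiologist-positive scan negative. The function name `kappa` and the cell order are illustrative. This reconstruction reproduces a kappa near the value cited for RUQ, and the sketch then shifts single cases to show how far the estimate moves.

```python
def kappa(both_pos, nurse_only, radiologist_only, both_neg):
    """Cohen's kappa for a 2x2 nurse-versus-radiologist table."""
    n = both_pos + nurse_only + radiologist_only + both_neg
    p_obs = (both_pos + both_neg) / n
    nurse_pos = both_pos + nurse_only
    radiologist_pos = both_pos + radiologist_only
    p_exp = (
        nurse_pos * radiologist_pos
        + (n - nurse_pos) * (n - radiologist_pos)
    ) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)


# ASSUMED reconstruction of the RUQ table (58 scans, 4 radiologist-positive).
no_miss = kappa(4, 0, 0, 54)    # hypothetical: nurse matches all four positives
one_miss = kappa(3, 0, 1, 54)   # assumed actual pattern: one positive missed
two_miss = kappa(2, 0, 2, 54)   # hypothetical: one further positive missed

print(f"no miss:  kappa = {no_miss:.2f}")
print(f"one miss: kappa = {one_miss:.2f}")
print(f"two miss: kappa = {two_miss:.2f}")
```

Under this assumption, reclassifying a single scan moves kappa from 1.00 to about 0.85, and a second reclassification moves it to about 0.65, which is why an estimate resting on four positive cases should be described as unstable.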
Diagnostic accuracy
This is the manuscript’s weakest section. Reporting 100% sensitivity, specificity, PPV, and NPV in the main CT subgroup is not persuasive when the 2×2 table contains only 23 patients and just 2 positives. The confidence intervals themselves show how uncertain the estimates are, especially sensitivity and PPV. The same problem appears in the trauma-team subgroup, again with only 10 patients and 2 positives. Perfect performance under such tiny denominators is exactly where readers should be most cautious, but the manuscript treats these results too affirmatively. There is also no explanation of whether CT readers were blinded to FAST findings, which matters when CT is the reference standard.
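How wide such intervals are can be checked with a minimal sketch. It assumes an exact (Clopper-Pearson) 95% interval, which may not be the method the authors used, and it takes the cell counts implied by the figures above: 2 of 2 positives detected in both subgroups, with 21 of 21 and 8 of 8 negatives correctly classified. The function names are illustrative.

```python
from math import comb


def binom_tail_ge(x, n, p):
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided confidence interval for a proportion x/n."""

    def solve(increasing_fn, target):
        # Bisection on [0, 1] for a function that increases with p.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if increasing_fn(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if x == 0 else solve(lambda p: binom_tail_ge(x, n, p), alpha / 2)
    upper = 1.0 if x == n else solve(
        lambda p: binom_tail_ge(x + 1, n, p), 1 - alpha / 2
    )
    return lower, upper


# Counts implied by the reported subgroups (assumed, see lead-in).
sensitivity_ci = clopper_pearson(2, 2)          # 2 of 2 positives detected
specificity_ci_ct = clopper_pearson(21, 21)     # CT subgroup, n = 23
specificity_ci_team = clopper_pearson(8, 8)     # trauma-team subgroup, n = 10

for label, (low, high) in [
    ("sensitivity (2/2)", sensitivity_ci),
    ("specificity (21/21)", specificity_ci_ct),
    ("specificity (8/8)", specificity_ci_team),
]:
    print(f"{label}: {low:.1%} to {high:.1%}")
```

When every case is classified correctly, the exact lower bound reduces to 0.025 raised to the power 1/n, so a sensitivity of 100% from two positives is compatible with a true sensitivity as low as roughly 16%.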
Discussion
Interpretation of agreement
The discussion generalizes too quickly from a very narrow dataset. It interprets the findings as support that emergency nurses can perform and interpret FAST correctly, but the actual evidence is from one nurse, one hospital, one device, a short enrollment period, and very few positive scans. The discussion does not sufficiently acknowledge that this could be an operator-specific success rather than evidence of broader role implementation. It also leans on system-level implications like radiologist availability and ED efficiency that were not studied here.
Interpretation of diagnostic accuracy
The manuscript treats the perfect diagnostic estimates as meaningful evidence of reliability, despite openly admitting later that the small sample may explain them. That caveat should not be buried; it should dominate interpretation. The comparison with prior literature is also selective in tone: the discussion highlights consistency with nurse FAST literature, but the manuscript’s own estimates are more extreme than prior studies and far less stable. The argument that stronger training may explain the superior performance is speculative and unsupported because the study does not compare training intensities or operators.
Broader claims
The discussion repeatedly pivots from agreement/accuracy to claims about faster assessment, improved ED functionality, patient safety, and role expansion, none of which were measured. That makes the discussion aspirational rather than evidence-bound. It would be more credible if it stayed closer to what the data can actually show.
Limitations
The limitations section is incomplete. It acknowledges small sample size, single center, convenience sampling, single nurse, and few positive cases, but it does not explicitly mention partial verification bias, prevalence effects on agreement statistics, the exclusion of unstable/high-risk patients, the possibility of spectrum bias from excluding harder-to-scan groups, the lack of simultaneous paired assessments, or the possibility that the reference-standard subset was selected by clinical judgment. These are not minor omissions; they are central threats to validity.
Conclusion
The conclusion is too strong relative to the evidence. Saying the findings show “high diagnostic accuracy” and support meaningful nurse contribution to timely assessment is more definitive than the data justify. Given the tiny CT-confirmed sample and the low event rate, the manuscript should conclude feasibility and preliminary signal, not accuracy in a way that sounds established. The closing claims again imply patient-safety benefit without having measured any patient-safety endpoint.
Author Response
Reviewer 3
|
Comment |
Response |
Citation (section, page and line/-s) |
|
Abstract |
||
|
The abstract is too definitive for such a small pilot. It presents strong claims of accuracy and implies clinical readiness, yet the numbers are based on very few positive findings and very small CT-verified samples.
|
We thank the reviewer for this thoughtful and constructive comment. We agree that the abstract, as originally written, may have conveyed a more definitive interpretation of the findings than is warranted given the pilot nature of the study and the limited number of positive cases. In response, we have revised the abstract to better reflect the exploratory nature of the study and to avoid overstatement. Specifically:
(a) We have emphasized inter-rater agreement as the primary focus of the study and reframed diagnostic accuracy as an exploratory objective: ‘This study aimed to evaluate the inter-rater agreement between a trained emergency nurse and physicians in performing FAST and to explore the diagnostic accuracy of nurse-performed FAST compared with computed tomography (CT)’
(b) We have explicitly clarified the size and structure of the CT-verified subgroups: ‘Results: The sample included 68 trauma patients, of whom 58 underwent FAST by both the nurse and the radiologist and were included in the inter-rater agreement analysis… In a subgroup of patients who underwent CT (n = 23), as well as in an additional trauma-team subgroup (n = 10), diagnostic accuracy estimates were…’
(c) We have softened the wording of the Results and Conclusions to reflect the preliminary nature of the findings and removed any implication of definitive clinical readiness:
Results: ‘Agreement in the RUQ area was 98.3% (Cohen’s kappa = 0.85, p < 0.001) while agreement was observed in all cases in the SUPH region (100%, Cohen’s kappa = 1.00, p < 0.001), although this finding was based on a single positive case. High observed agreement was also noted in LUQ (98.3%) and BLADDER regions; however, Cohen’s kappa could not be reliably estimated in these regions due to limited variability and the very small number of positive cases.’
Conclusions: ‘This pilot study suggests that, under specific training conditions, a trained emergency nurse may achieve a high level of agreement with physician assessments when performing FAST. The findings regarding diagnostic accuracy are preliminary and should be interpreted with caution due to the small sample size and low number of positive cases.’ |
Abstract, p.1, 28-31
Abstract, p.1, 42-45
Abstract, p.2, 52-55 |
|
It also reports very high agreement in regions where kappa could not even be meaningfully estimated, which risks overstating the robustness of the findings.
|
We thank the reviewer for this important comment. We agree that in regions with very low variability and an extremely small number of positive findings, the interpretation of agreement metrics, particularly Cohen’s kappa, is limited and may risk overstating the robustness of the findings. In response, we have revised the presentation of agreement results to avoid overstating robustness in regions where kappa could not be reliably estimated due to limited variability: ‘…; however, Cohen’s kappa could not be reliably estimated in these regions due to limited variability and the very small number of positive cases.’ |
Abstract, p.1-2, 46-47
|
|
Most importantly, the abstract does not warn the reader that the diagnostic-accuracy estimates come from only 23 patients in one subgroup and 10 in another, with only 2 positives in each table.
|
We thank the reviewer for this important comment. In response, we have revised the Abstract to explicitly report the number of patients included in each subgroup (n = 23 and n = 10) and to clarify that only two positive cases were identified in each subgroup. We have also added wording to highlight that these estimates are based on very small numbers and are associated with wide confidence intervals: ‘In a subgroup of patients who underwent CT (n = 23), as well as in an additional trauma-team subgroup (n = 10), diagnostic accuracy estimates were 100% for sensitivity and specificity; however, these estimates were based on a very small number of positive cases (only two positive cases in each subgroup) and were associated with wide confidence intervals.’ |
Abstract, p.2, 47-51 |
|
Introduction |
||
|
The introduction sets up a patient-safety and operational-efficiency rationale, but it does not translate that rationale into measurable study endpoints. It argues that nurse FAST may optimize triage, reduce delays, and improve safety, yet none of those outcomes are assessed. That creates a mismatch between the problem framing and the actual study design. |
We thank the reviewer for this insightful and constructive comment. We agree that the original Introduction placed disproportionate emphasis on broader clinical implications such as patient safety, triage optimization, and operational efficiency, without these outcomes being directly assessed in the present study. In response, we have revised the Introduction to better align the study rationale with the actual study design and measured outcomes. Specifically, we have removed statements referring to patient safety and operational outcomes that were not evaluated in this study.
We have also revised the title to avoid overemphasis on broader clinical and patient safety implications: ‘Inter-rater agreement between a trained nurse and physicians in FAST examination of trauma patients: a pilot study in the emergency department.’ |
Title, p.1, 2-4 |
|
The literature gap is also stated somewhat vaguely: the manuscript says evidence on nurses is scarce, but it does not clearly explain what specific unanswered question this study addresses beyond repeating a basic agreement/accuracy comparison already explored in prior nurse FAST studies. |
We thank the reviewer for this important comment. We agree and we have strengthened the description of the literature background and gap. While previous studies have examined nurse-performed FAST, we now more clearly indicate that the available evidence is limited and heterogeneous, and has primarily focused on diagnostic performance and feasibility rather than inter-rater agreement between nurses and physicians within the same clinical setting: ‘Furthermore, existing evidence is heterogeneous, with variability in study design, training approaches, and reference standards [17,18]. The literature has primarily focused on diagnostic performance and feasibility, while less emphasis has been placed on inter-rater agreement between nurses and physicians, especially using independently performed assessments within the same clinical setting.’ |
Introduction, p.3, 96-101 |
|
Methods |
||
|
Design and setting
The design has serious threats to external validity. It is single-center, convenience-based, and explicitly dependent on when the nurse-researcher was available, which introduces selection bias from the outset. The manuscript does not describe how often eligible patients were missed, whether included cases differed from non-included cases, or whether the included time windows were representative of usual ED operations. That makes the sample vulnerable to systematic bias.
|
We thank the reviewer for this important comment. We acknowledge that the study design includes limitations that may affect external validity, particularly due to its single-center, pilot nature and the use of convenience sampling.
Some of these aspects were already acknowledged in the original Limitations section; however, in response to the reviewer’s comment, we have further clarified and expanded this section to improve transparency. Specifically, we now state more explicitly that the study did not systematically record the number or characteristics of eligible patients who were not included, that no comparison was performed between included and non-included cases, and that patient inclusion depended on the availability of the trained nurse, which may limit the representativeness of the sample across different ED time periods: ‘Fourth, a convenience sample was used, and patient inclusion depended on the availability of the participating nurse; therefore, not all eligible patients may have been included during the study period, introducing potential selection bias. Fifth, the study did not systematically record the number or characteristics of eligible patients who were not included, and no comparison was performed between included and non-included cases; therefore, selection bias cannot be excluded. Sixth, patient inclusion was also limited to the time periods during which the trained nurse was available, and the included time windows may not fully represent routine ED operations across all shifts.’ |
Limitations, p.12, 474-482 |
|
Participants
The eligibility criteria are problematic in several ways.
First, requiring signed consent from the patient or relatives in an emergency-trauma context likely excludes the sickest, most time-critical, and less accessible patients. Second, excluding life-threatening presentations specifically to avoid delaying treatment means the study omits exactly the cohort in whom FAST is often most clinically consequential. Third, excluding patients who did not know Greek or English introduces language-based selection bias and limits generalizability. Fourth, excluding BMI > 40 and pregnancy removes clinically relevant subgroups in whom image acquisition may be harder but still matters in real practice. Overall, the sampling strategy appears to enrich for easier, more stable cases.
|
We thank the reviewer for this important and insightful comment. We acknowledge that the eligibility criteria may have introduced selection bias and may have led to the inclusion of more stable and less complex cases.
The requirement for informed consent, as well as the exclusion of life-threatening cases, was primarily driven by ethical considerations and the need to avoid any delay in urgent clinical management.
We also acknowledge that the exclusion of patients based on language, as well as specific clinical characteristics such as BMI > 40 and pregnancy, may further limit the generalizability of the findings and contribute to a more selected study population.
In response, we have expanded the Limitations section to more explicitly address these potential sources of selection and spectrum bias and their implications for the interpretation of the results: ‘Seventh, the eligibility criteria, including the requirement for informed consent, the exclusion of life-threatening cases, and the exclusion of specific patient groups (e.g., patients with language barriers, BMI > 40, or pregnancy), may have led to the underrepresentation of more critically ill or complex patients, thereby introducing potential selection bias.’ |
Limitations, p.12, 483-487 |
|
Index test and operator training
(a) The nurse training description is insufficiently detailed for replication and for judging validity. The manuscript gives total hours and supervision duration, but not the competency criteria, number of supervised FAST scans completed, assessment thresholds, image-review process, or whether the nurse had to demonstrate interobserver proficiency before study enrollment.
(b) Because only one nurse performed all scans, the study may be measuring this individual nurse’s aptitude more than the feasibility of nurse-performed FAST as a practice model.
|
We thank the reviewer for this important and detailed comment.
(a) In response, we have clarified the training description in the Methods section by providing additional details on the structure and duration of the training and supervision process. However, we acknowledge that specific competency criteria, formal assessment thresholds, and a predefined number of supervised FAST examinations were not systematically documented as part of the study protocol. We also clarify that no formal assessment of interobserver proficiency was required prior to study enrollment, beyond the completion of the training and supervised practice: ‘During the training period, the nurse performed FAST examinations as part of supervised clinical practice. However, a predefined minimum number of supervised examinations, formal competency thresholds, and standardized assessment criteria were not specified as part of the study protocol. In addition, no formal assessment of interobserver agreement was conducted prior to study initiation beyond the completion of the training and supervised clinical practice.’
In addition, we have expanded the Limitations section to explicitly acknowledge this issue: ‘Ninth, the training process was not based on predefined competency thresholds or standardized assessment criteria, and interobserver proficiency was not formally evaluated prior to study initiation.’
(b) We thank the reviewer for this important comment. We agree that the use of a single operator may influence the interpretation of the findings and may reflect individual performance rather than the broader feasibility of nurse-performed FAST as a practice model. In response, we have expanded the Limitations section to explicitly acknowledge that the use of a single trained nurse may reflect individual performance and may limit the generalizability of the findings to broader clinical practice: ‘Eighth, FAST examination was performed by a single nurse, which may limit the generalizability of the results. The findings may also reflect individual operator performance rather than the broader feasibility of nurse-performed FAST as a clinical practice model.’ |
2.4. Training of the nurse who collected the data, p. 4, 178-183
Limitations, p. 13, 494-496
Limitations, p. 12, 488-491 |
|
Blinding and independence
The blinding is weaker than the manuscript suggests. The radiologist knew the nurse had already examined the patient, which can subtly influence interpretation. The second examiner also assessed the patient after the first scan, so the patient’s condition and scanning context were not simultaneous. The manuscript does not report the time interval between scans, whether clinical changes occurred between examinations, or whether either examiner had access to other clinical cues likely to affect interpretation. This matters because inter-rater agreement may partly reflect shared clinical context rather than pure image interpretation agreement. |
We thank the reviewer for this important and insightful comment. We acknowledge that complete blinding between examiners was not feasible within the real-world emergency department setting in which the study was conducted. Although the nurse and physicians recorded their findings independently and did not have access to each other’s results or images, the radiologist was aware that a prior FAST examination had been performed, and the assessments were conducted sequentially rather than simultaneously.
However, it should be noted that FAST is a rapid bedside examination, typically completed within a few minutes, which may limit the likelihood of substantial clinical changes between assessments.
We agree that these factors may have introduced some degree of shared clinical context and may have influenced the observed inter-rater agreement. In addition, the time interval between examinations and potential clinical changes were not systematically recorded.
In response, we have revised the Methods section to provide a clearer and more balanced description of the procedures related to independence and blinding: ‘These procedures were implemented to promote independent recording of findings and to support blinding by minimizing access to prior results between examiners. FAST examinations were performed sequentially as part of routine clinical practice; however, given the rapid nature of the examination, the time interval between assessments was expected to be short, although it was not systematically recorded or analyzed.’
We have also expanded the Limitations section to explicitly acknowledge the potential impact of these factors on the study findings: ‘Tenth, complete blinding between examiners was not feasible, as FAST examinations were performed sequentially and the second examiner was aware that a prior examination had been conducted. The time interval between assessments and potential clinical changes were not systematically recorded. These factors may have introduced a degree of shared clinical context and may have influenced the observed inter-rater agreement.’ |
2.5.2. Ensuring independence and blinding, p. 5, 214-218
Limitations, p.13, 499-504
|
|
Reference standard / verification
The reference-standard strategy is inconsistent. Some patients had nurse FAST plus radiologist FAST plus CT, while trauma-team activation cases had nurse FAST plus CT but no radiologist comparator. This creates two analytically different pathways that are later combined under the umbrella of diagnostic accuracy. The manuscript does not explain the criteria for who received CT among the 58 dual-assessed patients, so partial verification bias is possible. Since CT was not done systematically in everyone, the reported diagnostic performance may reflect a selected subgroup rather than the whole study population. |
We thank the reviewer for this important and insightful comment. We agree that the reference-standard strategy reflects the constraints of real-world clinical practice and may introduce heterogeneity in the diagnostic accuracy analysis.
In our study, CT was performed based on clinical judgment and standard trauma management protocols, rather than as part of a predefined study algorithm. As a result, not all patients underwent CT, and different diagnostic pathways emerged (i.e., nurse FAST with radiologist assessment and CT in some cases, and nurse FAST with CT in trauma-team activation cases).
We acknowledge that this approach may introduce partial verification bias, as the diagnostic accuracy estimates are based on a selected subgroup of patients who underwent CT rather than the entire study population.
In response, we have clarified the criteria for CT utilization in the Methods section: ‘CT imaging was also performed in selected patients outside Trauma Team activation, based on clinical judgment and standard trauma management protocols, and was not applied systematically to all patients.’
Additionally, we have expanded the Limitations section to explicitly acknowledge the potential for verification bias and its impact on the interpretation of diagnostic performance: ‘Eleventh, CT imaging was not performed systematically in all patients but was based on clinical indications and routine trauma management protocols. As a result, different diagnostic pathways were followed (i.e., nurse-performed FAST with radiologist assessment and CT in some cases, and nurse-performed FAST with CT in Trauma Team activation cases). This approach may have introduced partial verification bias, as diagnostic accuracy estimates were derived from a selected subgroup of patients who underwent CT and may not fully represent the entire study population.’ |
2.5.3. Management of severely injured patients and CT utilization, p. 5, 228-230
Limitations, p. 13, 505-511
|
|
Statistical analysis
(a) The statistical plan is thin for the data structure. Cohen’s kappa was used, but the manuscript does not address the well-known instability of kappa under extreme prevalence, which is exactly the situation here, with almost all scans negative. Reporting raw agreement alone in that context is potentially misleading. The paper also gives p-values for kappa, which add little interpretive value compared with confidence intervals, yet confidence intervals for kappa are not reported.
(b) More importantly, the diagnostic-accuracy analysis does not seem preplanned around an adequate sample size and yields perfect point estimates with extremely wide confidence intervals, making the estimates practically uninformative. There is also no sample-size justification for either agreement or accuracy aims. |
We thank the reviewer for this important and insightful comment. We agree that the interpretation of Cohen’s kappa may be affected by the very low prevalence of positive findings in our study, which may lead to instability of the estimates. In response, we have clarified in the Methods section that kappa values should be interpreted with caution in the context of limited variability and have emphasized the role of observed agreement as a complementary descriptive measure: ‘Given the low prevalence of positive findings, kappa estimates were interpreted with caution, as agreement measures may be affected by limited variability and prevalence effects.’
We agree that no formal a priori sample size calculation was performed for either the agreement or diagnostic accuracy analyses. As this study was designed as a pilot study, the aim was to generate preliminary data rather than provide adequately powered confirmatory estimates. We have now explicitly acknowledged this in the Design and setting section and as a limitation:
Design and setting: ‘Given the pilot nature of the study, no formal sample size calculation was performed. The sample size was based on feasibility and the number of eligible patients presenting during the study period.’
Limitation: ‘Finally, no formal a priori sample size calculation was performed for either the inter-rater agreement or the diagnostic accuracy analyses, as this study was designed as a pilot study. Therefore, the sample size may not be sufficient to provide precise or stable estimates, particularly for diagnostic performance.’ |
2.7. Statistical analysis, p. 6, 259-261
2.1. Design and setting, p. 3, 118-120
Limitations, p.13, 519-522 |
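The reviewer's request for confidence intervals around kappa can be illustrated with a minimal sketch. The 2×2 cell counts below are an assumption: they are one reconstruction consistent with the figures quoted in this exchange for the RUQ window (58 dual-assessed patients, four radiologist-positive cases, one disagreement, kappa of about 0.85), not the manuscript's actual table. The interval uses the simple large-sample standard error, which is itself only approximate with so few positive cases.

```python
from math import sqrt

def cohen_kappa_2x2(a, b, c, d, z=1.96):
    """Cohen's kappa with an approximate 95% CI for a 2x2 agreement table.

    a: both raters positive      b: nurse positive, radiologist negative
    c: nurse negative, radiologist positive      d: both raters negative
    """
    n = a + b + c + d
    po = (a + d) / n                                          # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2     # chance agreement
    if pe == 1.0:
        return po, pe, None, None                             # kappa is undefined
    kappa = (po - pe) / (1 - pe)
    # Simple large-sample standard error (an approximation, not an exact method)
    se = sqrt(po * (1 - po) / (n * (1 - pe) ** 2))
    ci = (max(-1.0, kappa - z * se), min(1.0, kappa + z * se))
    return po, pe, kappa, ci

# Hypothetical RUQ table (assumed counts, see the note above)
po, pe, kappa, ci = cohen_kappa_2x2(a=3, b=0, c=1, d=54)
print(f"observed agreement = {po:.3f}, chance agreement = {pe:.3f}")
print(f"kappa = {kappa:.2f}, approx. 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Under these assumed counts the point estimate rounds to 0.85, while the approximate interval extends from roughly 0.55 up to the ceiling of 1.00, which is the kind of imprecision a p-value alone does not convey.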
|
Results |
||
|
Sample flow and cohort definition
The study flow is not reported cleanly enough. The manuscript states that 76 were evaluated, 8 excluded, 68 enrolled, 58 entered the agreement analysis, 23 of those had CT, and 10 additional trauma-team patients had nurse FAST plus CT. But it never clearly explains why only 23 of 58 had CT, what distinguished them clinically, or whether CT-negative cases differed systematically from CT-tested cases. That obscures the denominator and raises concern that the accuracy subset was selectively verified. |
We thank the reviewer for this important and insightful comment. We agree that the study flow and the definition of the diagnostic accuracy subset required clearer presentation.
The criteria for CT utilization and Trauma Team activation have been clarified in the Methods section (Section 2.5.3), where it is stated that CT imaging was performed based on clinical indications and routine trauma management protocols and was not applied systematically to all patients.
To further improve clarity, we have now also revised the Results section to explicitly indicate that only a subset of patients underwent CT based on clinical indications, thereby explaining why the diagnostic accuracy analysis was conducted on a selected subgroup:
‘Of these 58 patients, 23 also underwent CT scanning, based on clinical indications and physician judgment, in accordance with routine trauma management protocols.’
In addition, the Limitations section explicitly acknowledges the potential for partial verification bias and the possibility that patients who underwent CT may differ systematically from those who did not: ‘Eleventh, CT imaging was not performed systematically in all patients but was based on clinical indications and routine trauma management protocols. As a result, different diagnostic pathways were followed (i.e., nurse-performed FAST with radiologist assessment and CT in some cases, and nurse-performed FAST with CT in Trauma Team activation cases). This approach may have introduced partial verification bias, as diagnostic accuracy estimates were derived from a selected subgroup of patients who underwent CT and may not fully represent the entire study population.’ |
3.1. Patient characteristics, p. 6, 276-277
Limitations, p. 13, 505-511
|
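The denominators discussed in this comment can be laid out as a short tally. This is a sketch using only the counts quoted by the reviewer; it assumes that the 10 Trauma Team patients are part of the 68 enrolled patients (58 + 10 = 68), which the numbers suggest but the exchange does not state outright. Variable names are illustrative.

```python
# Counts quoted in the reviewer's comment
evaluated, excluded = 76, 8
dual_assessed = 58        # nurse FAST + radiologist FAST (agreement analysis)
ct_in_dual = 23           # dual-assessed patients who also had CT
trauma_team = 10          # nurse FAST + CT, no radiologist FAST

enrolled = evaluated - excluded
# Assumption: the trauma-team patients are counted within the enrolled cohort
assert enrolled == dual_assessed + trauma_team

ct_verified = ct_in_dual + trauma_team          # patients with a CT reference standard
unverified = dual_assessed - ct_in_dual         # dual-assessed patients without CT

print(f"enrolled: {enrolled}")
print(f"CT-verified: {ct_verified} of {enrolled} ({100 * ct_verified / enrolled:.1f}%)")
print(f"dual-assessed without CT: {unverified} of {dual_assessed} "
      f"({100 * unverified / dual_assessed:.1f}%)")
```

Read this way, more than half of the dual-assessed patients have no reference-standard result, which is the denominator problem behind the partial verification concern.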
|
Distribution of findings
The event rate is too low to sustain strong conclusions. Positive findings were extremely rare: 4 RUQ positives, 1 LUQ positive, 1 SUPH positive, and none in BLADDER. With prevalence this low, very high agreement can occur largely because both raters mostly call everything negative. In other words, the manuscript’s headline agreement values may be driven more by class imbalance than by convincing demonstration of true discriminative performance across all windows. |
We thank the reviewer for this important and insightful comment. We agree that the very low prevalence of positive findings in our study represents an important limitation and may influence the interpretation of agreement measures.
In response, we have taken several steps to address this issue. First, we have clarified in Section 2.7 (Statistical analysis) that kappa estimates should be interpreted with caution in the context of low prevalence and limited variability: ‘Given the low prevalence of positive findings, kappa estimates were interpreted with caution, as agreement measures may be affected by limited variability and prevalence effects.’ Second, in the Results and Abstract, we have avoided overinterpretation of agreement values, particularly in regions with very few or no positive findings.
Finally, we have expanded the Limitations section to explicitly acknowledge that the low number of positive cases limits the interpretability of agreement measures and may affect the robustness of the findings: ‘Twelfth, the number of patients included in the diagnostic accuracy analysis was limited, and the number of positive cases was particularly small. This may affect the estimation of sensitivity and specificity and lead to wide confidence intervals. In addition, the very low prevalence of positive findings may have influenced the observed inter-rater agreement, as high agreement may partly reflect concordance in negative classifications rather than consistent identification of positive cases, thereby limiting the stability and interpretability of agreement measures.’ |
2.7. Statistical analysis, p. 6, 259-261
Limitations, p. 13, 512-518 |
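The class-imbalance point can be made concrete with the prevalence index, bias index, and prevalence-adjusted bias-adjusted kappa (PABAK) described by Byrt and colleagues. These indices are not reported in the manuscript; they are swapped in here purely as an illustration, and all three tables below are hypothetical (58 patients each, chosen to resemble the situations the reviewer describes).

```python
def agreement_indices(a, b, c, d):
    """Descriptive indices for a 2x2 agreement table.

    a: both positive, b and c: discordant cells, d: both negative.
    """
    n = a + b + c + d
    po = (a + d) / n
    return {
        "observed_agreement": po,
        "prevalence_index": (a - d) / n,   # near -1 when almost all scans are negative
        "bias_index": (b - c) / n,         # near 0 when raters call positives equally often
        "pabak": 2 * po - 1,               # depends on observed agreement only
    }

# Hypothetical tables, not the study's data
tables = {
    "few positives, one discordant": (3, 0, 1, 54),
    "single discordant positive": (0, 0, 1, 57),
    "no positives at all": (0, 0, 0, 58),
}

results = {name: agreement_indices(*cells) for name, cells in tables.items()}
for name, idx in results.items():
    print(name, {k: round(v, 3) for k, v in idx.items()})
```

In every one of these tables observed agreement is at least 98%, yet the prevalence index sits close to -1, showing that the agreement is carried almost entirely by the jointly negative cell.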
|
Inter-rater agreement
The agreement section overinterprets sparse data. In RUQ, kappa is based on only four radiologist-positive cases and one disagreement. In LUQ and BLADDER, kappa could not be meaningfully estimated, yet the narrative still leans on the observed agreement percentages as if they were substantively informative. In SUPH, the “perfect” agreement is based on a single positive case. Those are fragile estimates, and the manuscript does not adequately emphasize how unstable they are. The phrase “substantial to excellent inter-rater reliability” reads too strongly given the tiny number of positive observations. |
We thank the reviewer for this important and insightful comment. We agree that the very small number of positive findings in several regions limits the stability and interpretability of the inter-rater agreement estimates.
In response, we have revised the Results and Discussion sections to avoid overinterpretation of agreement measures. Specifically, we have removed or softened expressions such as “substantial to excellent inter-rater reliability” and have emphasized that high observed agreement in some regions may largely reflect the predominance of negative findings rather than consistent identification of positive cases: Results: ‘Kappa for the RUQ region was 0.85 (p < 0.001), indicating high agreement between examiners. However, this estimate was based on a small number of positive cases (n = 4), with only one discordant result, and should therefore be interpreted with caution.’
‘SUPH: Kappa was 1.00 (p < 0.001), indicating complete agreement between the examiners, although this finding was based on a single positive case and should therefore be interpreted with caution.’
‘Overall, while observed agreement was high across regions, these estimates should be interpreted with caution, as the very low prevalence of positive findings and the small number of events limit the stability and interpretability of agreement measures.’
Discussion: ‘However, these findings should be interpreted with caution, as the study was conducted in a single-center setting and involved a single trained nurse, a limited sample size, a relatively short study duration, and a low number of positive cases. Therefore, the results may reflect individual operator performance rather than the broader feasibility of nurse-performed FAST and should not be generalized beyond the study setting.’
‘These findings suggest a high level of agreement between the nurse and the radiologists when performing FAST; however, this should be interpreted with caution given the small number of positive cases and the study design. In addition, the findings are based on a single trained nurse in a single-center setting and may therefore reflect individual operator performance rather than the broader feasibility of nurse-performed FAST as a clinical practice model.’
In addition, we have clarified in the Statistical Analysis section that kappa estimates should be interpreted with caution in the context of low prevalence: ‘Given the low prevalence of positive findings, kappa estimates were interpreted with caution, as agreement measures may be affected by limited variability and prevalence effects.’, and we have expanded the Limitations section to explicitly acknowledge the instability of agreement measures under these conditions: ‘In addition, the very low prevalence of positive findings may have influenced the observed inter-rater agreement, as high agreement may partly reflect concordance in negative classifications rather than consistent identification of positive cases, thereby limiting the stability and interpretability of agreement measures.’ |
Results, p.8, 310-311
Results, p.9, 329-331
Results, p.9, 344-346
Discussion, p.10, 377-381
Discussion, p.10, 388-393
2.7. Statistical analysis, p. 6, 259-261
Limitations, p.13, 514-518
|
|
Diagnostic accuracy
(a) This is the manuscript’s weakest section. Reporting 100% sensitivity, specificity, PPV, and NPV in the main CT subgroup is not persuasive when the 2×2 table contains only 23 patients and just 2 positives. The confidence intervals themselves show how uncertain the estimates are, especially sensitivity and PPV. The same problem appears in the trauma-team subgroup, again with only 10 patients and 2 positives. Perfect performance under such tiny denominators is exactly where readers should be most cautious, but the manuscript treats these results too affirmatively.
(b) There is also no explanation of whether CT readers were blinded to FAST findings, which matters when CT is the reference standard. |
(a) We thank the reviewer for this important and insightful comment. We fully agree that the diagnostic accuracy results should be interpreted with caution, given the very small sample size and the extremely low number of positive cases in both subgroups.
We acknowledge that reporting perfect diagnostic performance (100% sensitivity, specificity, PPV, and NPV) under such conditions may be misleading, as these estimates are based on very small denominators and are associated with wide confidence intervals, reflecting substantial uncertainty.
In response, we have revised the Results and Abstract to avoid overly affirmative language and to emphasize more clearly the exploratory nature and limited precision of these estimates.
Results: ‘However, these estimates are based on a very small number of positive cases and are associated with wide confidence intervals.’
Abstract: ‘…however, these estimates were based on a very small number of positive cases (only two positive cases in each subgroup) and were associated with wide confidence intervals’
(b) We thank the reviewer for this important comment.
As described in the Methods section, the radiologist was aware that a FAST examination had been performed but did not have access to the nurse’s findings or images. CT interpretation was conducted independently as part of routine clinical practice, without access to the recorded FAST results.
To improve clarity, we have revised the Methods section to explicitly state that CT interpretation was performed without access to FAST findings: ‘CT images were interpreted independently by the radiologist without access to the recorded FAST findings, as part of routine clinical practice. The CT findings were collected and recorded by the ED physician.’ |
3.4. Verification of diagnostic accuracy of nurse-performed FAST, p. 9, 357-358
Abstract, p. 2, 50-51
2.5.3. Management of severely injured patients and CT utilization, p. 5, 225-227
|
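To show how little a perfect point estimate constrains the truth at these denominators, the sketch below computes exact (Clopper-Pearson) 95% intervals for the counts implied by this comment: 2 of 2 positives detected and 21 of 21 negatives correctly classified in the main CT subgroup, and 8 of 8 negatives in the Trauma Team subgroup. The interval method used in the manuscript is not stated in this exchange, so Clopper-Pearson is an assumption, and the function names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def _solve(f, target, increasing, tol=1e-10):
    """Bisection on [0, 1] for a monotone function f with f(p) = target."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion x/n."""
    lower = 0.0 if x == 0 else _solve(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    upper = 1.0 if x == n else _solve(
        lambda p: binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper

for label, x, n in [
    ("sensitivity, 2 of 2 positives", 2, 2),
    ("specificity, 21 of 21 negatives", 21, 21),
    ("specificity, 8 of 8 negatives (Trauma Team)", 8, 8),
]:
    lo, hi = clopper_pearson(x, n)
    print(f"{label}: {100 * x / n:.0f}% (95% CI {100 * lo:.1f}% to {100 * hi:.1f}%)")
```

With two positives, the lower bound for sensitivity falls to about 16%, so the data are compatible with a test that misses most injuries as well as with one that misses none.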
|
Discussion |
||
|
Interpretation of agreement
The discussion generalizes too quickly from a very narrow dataset. It interprets the findings as support that emergency nurses can perform and interpret FAST correctly, but the actual evidence is from one nurse, one hospital, one device, a short enrollment period, and very few positive scans. The discussion does not sufficiently acknowledge that this could be an operator-specific success rather than evidence of broader role implementation. It also leans on system-level implications like radiologist availability and ED efficiency that were not studied here. |
We thank the reviewer for this important and insightful comment. We agree that the initial version of the Discussion may have overgeneralized findings from a limited dataset.
In response, we have revised the Discussion to adopt a more cautious interpretation of the results. Specifically, we now explicitly acknowledge that the findings are based on a single trained nurse in a single-center setting, with a limited sample size, a relatively short study duration, and a low number of positive cases, and may therefore reflect individual operator performance rather than the broader feasibility of nurse-performed FAST as a clinical practice model. Furthermore, we clarify that the results should not be generalized beyond the study setting: ‘However, these findings should be interpreted with caution, as the study was conducted in a single-center setting and involved a single trained nurse, a limited sample size, a relatively short study duration, and a low number of positive cases. Therefore, the results may reflect individual operator performance rather than the broader feasibility of nurse-performed FAST and should not be generalized beyond the study setting.’
‘These findings suggest a high level of agreement between the nurse and the radiologists when performing FAST; however, this should be interpreted with caution given the small number of positive cases and the study design. In addition, the findings are based on a single trained nurse in a single-center setting and may therefore reflect individual operator performance rather than the broader feasibility of nurse-performed FAST as a clinical practice model. There is limited data in the literature regarding the performance of FAST by nurses and its comparison with physicians’ assessments. The results of this study suggest that, following appropriate training, an emergency care nurse may be able to perform FAST and interpret the findings with a high level of agreement; however, these results should not be generalized beyond the specific study setting.’
We have also tempered statements related to wider implementation and system-level implications (e.g., radiologist availability and emergency department efficiency), clarifying that these aspects were not directly evaluated in the present study and should be explored in future research: ‘In this context, training nurses in FAST may potentially support more timely patient assessment and improved ED functionality. This enhances both the safe, effective and quality care of patients [22] and the advanced role of nurses in the emergency care setting [23–25]. However, these aspects were not directly evaluated in the present study and warrants further investigation.’ |
Discussion, p. 10, 377-381
Discussion, p. 10-11, 388-398
Discussion, p. 11, 401-405 |
|
Interpretation of diagnostic accuracy
The manuscript treats the perfect diagnostic estimates as meaningful evidence of reliability, despite openly admitting later that the small sample may explain them. That caveat should not be buried; it should dominate interpretation. The comparison with prior literature is also selective in tone: the discussion highlights consistency with nurse FAST literature, but the manuscript’s own estimates are more extreme than prior studies and far less stable. The argument that stronger training may explain the superior performance is speculative and unsupported because the study does not compare training intensities or operators. |
We thank the reviewer for this important and insightful comment. We agree that the initial version of the Discussion may have overemphasized the diagnostic accuracy estimates and did not sufficiently foreground the uncertainty associated with the small sample size and the very low number of positive cases.
In response, we have substantially revised Section 4.2 to ensure that this limitation is clearly stated at the outset and dominates the interpretation. The diagnostic accuracy estimates are now explicitly described as preliminary, derived from a very small sample, and associated with wide confidence intervals and low precision. We have removed language that could be interpreted as suggesting robust or definitive diagnostic performance: ‘The diagnostic accuracy of nurse-performed FAST was evaluated using CT as the reference standard. In the present study, sensitivity, specificity, and predictive values were estimated at 100%. However, these estimates should be interpreted with extreme caution, as they were derived from a very small number of patients and an extremely limited number of positive cases, resulting in wide confidence intervals and low precision. Therefore, these findings should be considered preliminary and not indicative of true diagnostic performance.’
We have also revised the comparison with prior literature to present a more balanced and cautious interpretation. Specifically, we now acknowledge that the estimates observed in the present study are higher and less stable than those reported in previous studies, and we emphasize that the differences are more likely attributable to sample size and low event prevalence than to true performance differences: ‘Compared with the limited available literature [17,18], the diagnostic estimates observed in the present study are higher and less stable.’
Furthermore, we have modified the discussion of training to avoid unsupported causal interpretations. While we retain a description of the training received, we now explicitly state that the study design does not allow conclusions regarding the impact of training intensity or comparison between operators: ‘The apparent differences between the present findings and prior literature may be explained primarily by the small sample size and the very low prevalence of positive cases in this study, rather than true differences in performance.’
‘However, although the nurse in the present study underwent extensive training, the study design does not allow conclusions to be drawn regarding the impact of training intensity on diagnostic accuracy.’ |
Discussion, p.11, 407-413
Discussion, p.11, 414-415
Discussion, p.11, 439-441
Discussion, p.12, 464-466
|
|
Broader claims
The discussion repeatedly pivots from agreement/accuracy to claims about faster assessment, improved ED functionality, patient safety, and role expansion, none of which were measured. That makes the discussion aspirational rather than evidence-bound. It would be more credible if it stayed closer to what the data can actually show. |
We thank the reviewer for this important comment. We agree that the initial version of the Discussion may have extended beyond the direct findings of the study by including broader system-level implications that were not formally evaluated.
In response, we have revised the relevant sections of the Discussion to ensure closer alignment with the data. Specifically, statements related to faster assessment, emergency department functionality, patient safety, and role expansion have been moderated and reframed as potential implications rather than demonstrated outcomes: ‘There is limited data in the literature regarding the performance of FAST by nurses and its comparison with physicians’ assessments. The results of this study suggest that, following appropriate training, an emergency care nurse may be able to perform FAST and interpret the findings with a high level of agreement; however, these results should not be generalized beyond the specific study setting.’
‘In this context, training nurses in FAST may potentially support more timely patient assessment and improved ED functionality. This enhances both the safe, effective and quality care of patients [22] and the advanced role of nurses in the emergency care setting [23–25]. However, these aspects were not directly evaluated in the present study and warrants further investigation.’
We now explicitly acknowledge that these aspects were not directly measured in the present study and should be interpreted cautiously. |
Discussion, p.11, 394-398
Discussion, p.11, 401-405
|
|
Limitations |
||
|
The limitations section is incomplete. It acknowledges small sample size, single center, convenience sampling, single nurse, and few positive cases, but it does not explicitly mention partial verification bias, prevalence effects on agreement statistics, the exclusion of unstable/high-risk patients, the possibility of spectrum bias from excluding harder-to-scan groups, the lack of simultaneous paired assessments, or the possibility that the reference-standard subset was selected by clinical judgment. These are not minor omissions; they are central threats to validity. |
We thank the reviewer for this important and insightful comment. We agree that several key threats to validity required more explicit acknowledgment in the Limitations section. In response, we have substantially revised and expanded the Limitations section to address all the points raised:
- partial verification bias: ‘Eleventh, CT imaging was not performed systematically in all patients but was based on clinical indications and routine trauma management protocols. As a result, different diagnostic pathways were followed (i.e., nurse-performed FAST with radiologist assessment and CT in some cases, and nurse-performed FAST with CT in Trauma Team activation cases). This approach may have introduced partial verification bias, as diagnostic accuracy estimates were derived from a selected subgroup of patients who underwent CT and may not fully represent the entire study population.’
- prevalence effects on agreement statistics: ‘Twelfth, the number of patients included in the diagnostic accuracy analysis was limited, and the number of positive cases was particularly small. This may affect the estimation of sensitivity and specificity and lead to wide confidence intervals. In addition, the very low prevalence of positive findings may have influenced the observed inter-rater agreement, as high agreement may partly reflect concordance in negative classifications rather than consistent identification of positive cases, thereby limiting the stability and interpretability of agreement measures.’
- the exclusion of unstable/high-risk patients: ‘Seventh, the eligibility criteria, including the requirement for informed consent, the exclusion of life-threatening cases, and the exclusion of specific patient groups (e.g., patients with language barriers, BMI > 40, or pregnancy), may have led to the underrepresentation of more critically ill or complex patients, thereby introducing potential selection and spectrum bias.’
- the possibility of spectrum bias from excluding harder-to-scan groups: ‘Seventh, the eligibility criteria, including the requirement for informed consent, the exclusion of life-threatening cases, and the exclusion of specific patient groups (e.g., patients with language barriers, BMI > 40, or pregnancy), may have led to the underrepresentation of more critically ill or complex patients, thereby introducing potential selection and spectrum bias.’
- the lack of simultaneous paired assessments: ‘Tenth, complete blinding between examiners was not feasible, as FAST examinations were performed sequentially and the second examiner was aware that a prior examination had been conducted. The time interval between assessments and potential clinical changes were not systematically recorded. These factors may have introduced a degree of shared clinical context and may have influenced the observed inter-rater agreement.’
- the possibility that the reference-standard subset was selected by clinical judgment: ‘Eleventh, CT imaging was not performed systematically in all patients but was based on clinical indications and routine trauma management protocols. As a result, different diagnostic pathways were followed (i.e., nurse-performed FAST with radiologist assessment and CT in some cases, and nurse-performed FAST with CT in Trauma Team activation cases). This approach may have introduced partial verification bias, as diagnostic accuracy estimates were derived from a selected subgroup of patients who underwent CT and may not fully represent the entire study population.’ |
Limitations: p. 13, 505-511
p. 13, 512-518
p. 12, 483-487
p. 12, 483-487
p. 13, 499-504
p. 13, 505-511
|
|
Conclusion |
||
|
The conclusion is too strong relative to the evidence. Saying the findings show “high diagnostic accuracy” and support meaningful nurse contribution to timely assessment is more definitive than the data justify. Given the tiny CT-confirmed sample and the low event rate, the manuscript should conclude feasibility and preliminary signal, not accuracy in a way that sounds established. The closing claims again imply patient-safety benefit without having measured any patient-safety endpoint. |
We thank the reviewer for this important comment. We agree that the initial version of the Conclusions may have been too strong relative to the available evidence.
In response, we have revised the Conclusions: ‘In conclusion, this pilot study suggests that a trained emergency care nurse may achieve a high level of agreement with physicians in performing and interpreting FAST examinations in trauma patients in the emergency department. With regard to the secondary objective, the findings provide preliminary insights into the diagnostic accuracy of nurse-performed FAST compared with CT; however, further investigation is required to more reliably assess diagnostic performance. These findings support the potential feasibility of nurse-performed FAST in emergency care settings. Further studies involving larger samples, multiple operators, multicentre designs, and standardized training protocols are needed to validate these findings, better define their role in clinical practice, evaluate their integration into routine care, and investigate their potential impact on patient safety.’
|
Conclusions, p. 13, 526-536 |
Author Response File:
Author Response.pdf
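As an illustration of the prevalence effect on agreement statistics acknowledged in the revised Limitations (the ‘Twelfth’ item quoted above), the following minimal Python sketch computes Cohen's kappa together with positive and negative specific agreement from a 2x2 table. The counts are hypothetical and chosen only to make the effect visible; they are not data from the study, and the function names are assumptions introduced for this example.

```python
import math


def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 agreement table.

    a: both raters positive
    b: rater 1 positive, rater 2 negative
    c: rater 1 negative, rater 2 positive
    d: both raters negative
    """
    n = a + b + c + d
    p_observed = (a + d) / n
    # Agreement expected by chance, from each rater's marginal totals.
    p_expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    if p_expected == 1.0:
        # Kappa is undefined when both raters use one category only.
        return math.nan
    return (p_observed - p_expected) / (1 - p_expected)


def specific_agreement(a, b, c, d):
    """Return (positive agreement, negative agreement) for the same table."""
    return 2 * a / (2 * a + b + c), 2 * d / (2 * d + b + c)


# Hypothetical tables: both have 90 concordant results out of 100 examinations.
balanced = (45, 5, 5, 45)
low_prevalence = (1, 5, 5, 89)

for name, table in [("balanced", balanced), ("low prevalence", low_prevalence)]:
    pos, neg = specific_agreement(*table)
    print(f"{name}: kappa={cohen_kappa(*table):.2f}, "
          f"positive agreement={pos:.2f}, negative agreement={neg:.2f}")
```

In this hypothetical example the observed agreement is identical in both tables (90 of 100), yet kappa falls from about 0.80 to about 0.11 when positive findings are rare, because nearly all of the concordance comes from negative classifications. Reporting positive and negative agreement separately is one way to make that visible.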
Reviewer 4 Report
Comments and Suggestions for Authors
The work touches on an important and current topic of the role of nurses in performing FAST in emergency departments. The topic is clinically relevant and is in line with the global trend of expanding nursing competencies. The article is written clearly, logically and contains well-presented results.
However, the methodology of the study needs significant refinements and reinforcements to ensure the reliability of the results.
- There is no justification for the sample size, even in the context of the pilot study.
- No research hypothesis or expected level of agreement has been specified (e.g. kappa > 0.8).
- It was not described whether the study was registered (e.g. clinicaltrials.gov), which is increasingly required even for observational studies.
- The number of patients who met the criteria but were not included due to the lack of availability of a nurse was not reported.
- Authors should provide a CONSORT-style flowchart to increase transparency.
- It was not described whether the radiologist performed FAST before or after other imaging tests (e.g., X-ray), which may have affected interpretation.
- It was not stated how many radiologists performed the tests and whether their experience was homogeneous.
- The authors use Cohen's kappa and classic diagnostic indicators, but:
- Lacks statistical power analysis
- Lack of analysis of FAST execution times, which are crucial for patient safety
- Lack of image compliance analysis (e.g., quality assessment by an independent expert).
The study has potential and addresses an important topic, but requires significant methodological reinforcement and a more critical interpretation of the results.
Author Response
Reviewer 4
|
Comment |
Response |
Citation (section, page and line/-s) |
|
The work touches on an important and current topic of the role of nurses in performing FAST in emergency departments. The topic is clinically relevant and is in line with the global trend of expanding nursing competencies. The article is written clearly, logically and contains well-presented results. |
We sincerely thank the reviewer for these positive and encouraging comments. |
|
|
However, the methodology of the study needs significant refinements and reinforcements to ensure the reliability of the results. |
We thank the reviewer for this important comment. We agree that methodological clarity and rigor are essential to ensure the reliability of the findings.
Although the reviewer did not specify particular aspects, we have carefully revisited the methodology section and strengthened it to improve clarity and transparency. In particular, we have further clarified:
(i) 2.1. Design and setting: ‘Given the pilot nature of the study, no formal sample size calculation was performed. The sample size was based on feasibility and the number of eligible patients presenting during the study period.’
(ii) 2.4. Training of the nurse who collected the data: ‘During the training period, the nurse performed FAST examinations as part of supervised clinical practice. However, a predefined minimum number of supervised examinations, formal competency thresholds, and standardized assessment criteria were not specified as part of the study protocol. In addition, no formal assessment of interobserver agreement was conducted prior to study initiation beyond the completion of the training and supervised clinical practice.’
(iii) 2.5.2. Ensuring independence and blinding: ‘FAST examinations were performed by ten radiologists as part of routine clinical practice. Although all were qualified specialists, their level of experience in FAST was not formally standardized or recorded.’
‘The radiologist did not have access to the results of other imaging tests (e.g., X-ray or CT) at the time of the FAST examination and was only aware of the type of injury.’
‘These procedures were implemented to promote independent recording of findings and to support blinding by minimizing access to prior results between examiners. FAST examinations were performed sequentially as part of routine clinical practice; however, given the rapid nature of the examination, the time interval between assessments was expected to be short, although it was not systematically recorded or analyzed. ’
(iv) 2.5.3. Management of severely injured patients and CT utilization: ‘CT images were interpreted independently by the radiologist without access to the recorded FAST findings, as part of routine clinical practice. The CT findings were collected and recorded by the ED physician. CT imaging was also performed in selected patients outside Trauma Team activation, based on clinical judgment and standard trauma management protocols, and was not applied systematically to all patients.’
(v) 2.5.4. Collection of demographic and clinical data: ‘Ultrasound findings recorded for study purposes were not made available to the trauma or surgical teams for decision-making, and patient management was based on standard clinical assessment and imaging pathways.’
(vi) 2.6. Ethical issues: ‘This study was not prospectively registered in a clinical trial registry.’
(vii) 2.7. Statistical analysis: ‘Given the low prevalence of positive findings, kappa estimates were interpreted with caution, as agreement measures may be affected by limited variability and prevalence effects.’
In addition, we have further emphasized key methodological limitations, including the single-center design, the involvement of a single operator, and the small number of positive cases and other methodological limitations: ‘Third, the relatively short study duration may have limited patient recruitment and the number of positive cases, potentially affecting the representativeness of the findings. Fifth, the study did not systematically record the number or characteristics of eligible patients who were not included, and no comparison was performed between included and non-included cases; therefore, selection bias cannot be excluded. Sixth, patient inclusion was also limited to the time periods during which the trained nurse was available, and the included time windows may not fully represent routine ED operations across all shifts. Seventh, the eligibility criteria, including the requirement for informed consent, the exclusion of life-threatening cases, and the exclusion of specific patient groups (e.g., patients with language barriers, BMI > 40, or pregnancy), may have led to the underrepresentation of more critically ill or complex patients, thereby introducing potential selection and spectrum bias. Eighth, FAST examination was performed by a single nurse, which may limit the generalizability of the results. The findings may also reflect individual operator performance rather than the broader feasibility of nurse-performed FAST as a clinical practice model. Ninth, the training process was not based on predefined competency thresholds or standardized assessment criteria, and interobserver proficiency was not formally evaluated prior to study initiation. Tenth, complete blinding between examiners was not feasible, as FAST examinations were performed sequentially and the second examiner was aware that a prior examination had been conducted. The time interval between assessments and potential clinical changes were not systematically recorded. 
These factors may have introduced a degree of shared clinical context and may have influenced the observed inter-rater agreement. Eleventh, CT imaging was not performed systematically in all patients but was based on clinical indications and routine trauma management protocols. As a result, different diagnostic pathways were followed (i.e., nurse-performed FAST with radiologist assessment and CT in some cases, and nurse-performed FAST with CT in Trauma Team activation cases). This approach may have introduced partial verification bias, as diagnostic accuracy estimates were derived from a selected subgroup of patients who underwent CT and may not fully represent the entire study population. Twelfth, the number of patients included in the diagnostic accuracy analysis was limited, and the number of positive cases was particularly small. This may affect the estimation of sensitivity and specificity and lead to wide confidence intervals. In addition, the very low prevalence of positive findings may have influenced the observed inter-rater agreement, as high agreement may partly reflect concordance in negative classifications rather than consistent identification of positive cases, thereby limiting the stability and interpretability of agreement measures. Finally, no formal a priori sample size calculation was performed for either the inter-rater agreement or the diagnostic accuracy analyses, as this study was designed as a pilot study. Therefore, the sample size may not be sufficient to provide precise or stable estimates, particularly for diagnostic performance.’
We also discussed their potential impact on the reliability and generalizability of the findings in the Discussion section: ‘However, these findings should be interpreted with caution, as the study was conducted in a single-center setting and involved a single trained nurse, a limited sample size, a relatively short study duration, and a low number of positive cases. Therefore, the results may reflect individual operator performance rather than the broader feasibility of nurse-performed FAST and should not be generalized beyond the study setting.’ |
p. 3, 118-120
p. 4, 178-183
p. 5, 202-204
p. 5, 207-209
p. 5, 214-218
p. 5, 225-230
p. 5-6, 236-238
p. 6, 251
p. 6, 259-261
Limitations, p. 12-13, 472-522
Discussion, p. 10, 377-381
|
|
There is no justification for the sample size, even in the context of the pilot study. |
We thank the reviewer for this important comment.
No formal sample size calculation was performed, as this study was designed as a pilot study aiming to explore the feasibility of nurse-performed FAST examinations and to generate preliminary data. The sample size was therefore determined pragmatically, based on the number of eligible trauma patients presenting to the emergency department during the predefined study period.
We have now clarified this in the Design and setting section: ‘Given the pilot nature of the study, no formal sample size calculation was performed. The sample size was based on feasibility and the number of eligible patients presenting during the study period.’
Additionally, we acknowledge this in the Limitations section: ‘Finally, no formal a priori sample size calculation was performed for either the inter-rater agreement or the diagnostic accuracy analyses, as this study was designed as a pilot study. Therefore, the sample size may not be sufficient to provide precise or stable estimates, particularly for diagnostic performance.’ |
2.1. Design and setting, p. 3, 118-120
Limitations, p. 13, 519-522 |
|
No research hypothesis or expected level of agreement has been specified (e.g. kappa > 0.8). |
We thank the reviewer for this important comment.
Given the pilot and exploratory nature of the study, no formal research hypothesis or predefined threshold for the expected level of agreement (e.g., kappa > 0.8) was specified a priori. The primary aim was to estimate the level of agreement between the nurse and the radiologists and to generate preliminary data to inform future studies. |
|
|
It was not described whether the study was registered (e.g. clinicaltrials.gov), which is increasingly required even for observational studies. |
We thank the reviewer for this important comment. Although registration is not mandatory for observational pilot studies, we acknowledge that it is increasingly recommended to improve transparency and reproducibility. This has now been clarified in the manuscript: ‘This study was not prospectively registered in a clinical trial registry.’ |
2.6. Ethical issues, p. 6, 251
|
|
The number of patients who met the criteria but were not included due to the lack of availability of a nurse was not reported. |
We thank the reviewer for this important comment. We agree that the number of eligible patients who were not included due to the unavailability of the participating nurse was not systematically recorded.
In response, we have now explicitly acknowledged this in the Limitations section, stating that the study did not record the number or characteristics of eligible patients who were not included, and that no comparison was performed between included and non-included cases. We also note that patient inclusion depended on the availability of the trained nurse, which may have introduced selection bias: ‘Fourth, a convenience sample was used, and patient inclusion depended on the availability of the participating nurse; therefore, not all eligible patients may have been included during the study period, introducing potential selection bias. Fifth, the study did not systematically record the number or characteristics of eligible patients who were not included, and no comparison was performed between included and non-included cases; therefore, selection bias cannot be excluded.’ |
Limitations, p. 12, 474-479
|
|
Authors should provide a CONSORT-style flowchart to increase transparency. |
We thank the reviewer for this valuable suggestion. We agree that a flowchart would improve the transparency of the study.
In response, we have added a CONSORT-style flow diagram illustrating the patient selection process, including the number of patients assessed for eligibility, excluded, and included in the different analyses. |
Figure 1: Flow diagram of patient inclusion and analysis, p. 7 |
|
It was not described whether the radiologist performed FAST before or after other imaging tests (e.g., X-ray), which may have affected interpretation. |
We thank the reviewer for this important comment. We agree that the timing of FAST in relation to other imaging tests may influence interpretation.
In the present study, the radiologist did not have access to other imaging results (e.g., X-ray or CT) at the time of the FAST examination and was only aware of the type of injury, as part of routine clinical information. Therefore, interpretation was not influenced by prior imaging findings.
We have now clarified this point in the Methods section to improve transparency: ‘The radiologist did not have access to the results of other imaging tests (e.g., X-ray or CT) at the time of the FAST examination and was only aware of the type of injury.’ |
2.5.2. Ensuring independence and blinding, p. 5, 207-209
|
|
It was not stated how many radiologists performed the tests and whether their experience was homogeneous. |
We thank the reviewer for this important comment. We agree that the number and level of experience of radiologists may influence the interpretation of FAST examinations.
In the present study, FAST examinations were performed by ten radiologists as part of routine clinical practice. Although all were qualified specialists, their level of experience in FAST was not formally standardized or recorded.
We have now clarified this in the Methods section and acknowledged the potential for inter-operator variability as a limitation: Methods: ‘FAST examinations were performed by ten radiologists as part of routine clinical practice. Although all were qualified specialists, their level of experience in FAST was not formally standardized or recorded.’
Limitations: ‘In addition, FAST examinations were performed by multiple radiologists as part of routine clinical practice, and their level of experience was not formally standardized, which may have introduced inter-operator variability.’
|
2.5.2. Ensuring independence and blinding, p. 5, 202-204
Limitations, p. 12, 491-493 |
|
The authors use Cohen's kappa and classic diagnostic indicators, but: - Lacks statistical power analysis
|
We agree that no formal a priori sample size calculation was performed for either the agreement or diagnostic accuracy analyses. As this study was designed as a pilot study, the aim was to generate preliminary data rather than provide adequately powered confirmatory estimates. We have now explicitly acknowledged this in the Design and setting section and as a limitation:
Design and setting: ‘Given the pilot nature of the study, no formal sample size calculation was performed. The sample size was based on feasibility and the number of eligible patients presenting during the study period.’
Limitations: ‘Finally, no formal a priori sample size calculation was performed for either the inter-rater agreement or the diagnostic accuracy analyses, as this study was designed as a pilot study. Therefore, the sample size may not be sufficient to provide precise or stable estimates, particularly for diagnostic performance.’ |
2.1. Design and setting, p. 3, 117-120
Limitations, p. 13, 519-522 |
|
- Lack of analysis of FAST execution times, which are crucial for patient safety |
We thank the reviewer for this important comment. However, the present study was designed to evaluate inter-rater agreement and the diagnostic accuracy of nurse-performed FAST, and did not include assessment of examination time. We have now clarified this in the Methods section by stating that the duration of FAST examinations and the time interval between assessments were not systematically recorded or analyzed: ‘These procedures were implemented to promote independent recording of findings and to support blinding by minimizing access to prior results between examiners. FAST examinations were performed sequentially as part of routine clinical practice; however, given the rapid nature of the examination, the time interval between assessments was expected to be short, although it was not systematically recorded or analyzed.’ In addition, this has been explicitly acknowledged in the Limitations section: ‘Tenth, complete blinding between examiners was not feasible, as FAST examinations were performed sequentially and the second examiner was aware that a prior examination had been conducted. The time interval between assessments and the duration of FAST examinations were not systematically recorded or analyzed,…’
|
2.5.2. Ensuring independence and blinding, p. 5, 214-218
Limitations, p. 13, 499-502
|
|
- Lack of image compliance analysis (e.g., quality assessment by an independent expert).
|
We thank the reviewer for this important comment. We agree that assessment of image quality by an independent expert could provide additional information regarding the technical adequacy of FAST examinations. However, the present study was designed to evaluate inter-rater agreement and diagnostic accuracy, and did not include a formal assessment of image quality. We have now acknowledged this as a limitation: ‘In addition, the quality of ultrasound images was not formally assessed by an independent expert, and therefore the technical adequacy of the examinations could not be evaluated.’ |
Limitations, p. 13, 496-498 |
|
The study has potential and addresses an important topic, but requires significant methodological reinforcement and a more critical interpretation of the results. |
We thank the reviewer for this overall assessment and constructive feedback. We have carefully addressed the methodological concerns raised and have revised the manuscript accordingly. Specifically, we have strengthened the Methods and Limitations sections and adopted a more cautious and critical interpretation of the findings throughout the Discussion and Conclusions. |
|
Author Response File:
Author Response.pdf
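To make concrete why a small number of positive cases leads to the wide confidence intervals mentioned in the Limitations quoted above, the following minimal Python sketch computes a Wilson score interval for a proportion such as sensitivity. The Wilson method is used here only as one common choice for small samples; the record does not state which interval method the authors applied, and the counts and the function name are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts: the same point estimate (80%) from 5 and from 50
# reference-standard positive cases.
for detected, positives in [(4, 5), (40, 50)]:
    low, high = wilson_interval(detected, positives)
    print(f"{detected}/{positives}: sensitivity={detected / positives:.2f}, "
          f"95% CI {low:.2f} to {high:.2f}")
```

With only five positive cases, the interval around an 80% point estimate spans most of the plausible range, which is why a pilot sample of this kind can indicate feasibility but cannot establish diagnostic accuracy.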
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have made adjustments based on the comments. I am satisfied with the adjustments made.
Author Response
Reviewer 1_2nd round
|
Comment |
Response |
|
The authors have made adjustments based on the comments. I am satisfied with the adjustments made.
|
We thank the reviewer for the positive feedback and for acknowledging the revisions made to the manuscript. |
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Thanks for the corrections and additional changes, I have nothing more to add.
Author Response
Reviewer 2_2nd round
|
Comment |
Response |
|
Thanks for the corrections and additional changes, I have nothing more to add.
|
We thank the reviewer for the positive feedback and appreciate the time and consideration given to the revised manuscript. |
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
This revised version is much stronger and now broadly matches the rebuttal. I do not see the earlier mismatch problem anymore.
Author Response
Reviewer 3_2nd round
|
Comment |
Response |
|
This revised version is much stronger and now broadly matches the rebuttal. I do not see the earlier mismatch problem anymore.
|
We thank the reviewer for this positive and encouraging feedback and are pleased that the revisions have addressed the concerns raised. |
Author Response File:
Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
The authors write: "Given the pilot and exploratory nature of the study, no formal research hypothesis was established..." - the pilot study is also covered by the hypothesis, especially since it is a pilot study and the research proper will likely be continued.
Please formulate the research hypotheses for the work.
Author Response
Reviewer 4_2nd round
|
Comment |
Response |
Citation (section, page and line/-s) |
|
The authors write: "Given the pilot and exploratory nature of the study, no formal research hypothesis was established..." - the pilot study is also covered by the hypothesis, especially since it is a pilot study and the research proper will likely be continued. Please formulate the research hypotheses for the work.
|
We thank the reviewer for this important comment. Although no formal a priori statistical hypothesis or predefined threshold for agreement was specified, the study was guided by the assumption that a trained emergency care nurse may achieve a high level of agreement with radiologists when performing and interpreting FAST examinations. We have now clarified this point in the manuscript: ‘This study was conceptually guided by the assumption that a trained emergency care nurse could achieve a high level of agreement with physicians and demonstrate acceptable diagnostic performance when performing and interpreting FAST examinations.’ |
Introduction, p. 3, lines 107-109
|
Author Response File:
Author Response.pdf

