Peer-Review Record

Topic Modeling the Academic Discourse on Critical Incident Stress Debriefing and Management (CISD/M) for First Responders

Trauma Care 2025, 5(3), 18; https://doi.org/10.3390/traumacare5030018
by Robert Lundblad 1, Saul Jaeger 1, Jennifer Moreno 2, Charles Silber 1, Matthew Rensi 3 and Cass Dykeman 4,*
Submission received: 29 December 2024 / Revised: 2 June 2025 / Accepted: 17 June 2025 / Published: 21 July 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

I suggest you update some of your references since some are dated from the early 1980s (see example: https://pubmed.ncbi.nlm.nih.gov/34672028/).

Check spacing between sentences (for example, line 94).

Change "one" to "our" in this section: Given the aforementioned, one research question (RQ) was designed to guide the present study. This question was: RQ1: What is the topical structure of the academic discourse on Critical Incident Stress Debriefing and Critical Incident Stress Management?

Provide the link for this: The exact nature of these steps can also be viewed on this research project's website.

The article needs official figures/tables with results for your topics explored rather than only figures showing examples/methodology. 

Line 442: consider rewriting, since this is a fragment: "For example, questions such as the following:"

Author Response

#1: I suggest you update some of your references since some are dated from the early 1980s (see example: https://pubmed.ncbi.nlm.nih.gov/34672028/). References updated.

#2: Check spacing between sentences (for example line 94). Done.

#3: Change "one" to "our" in this section: Given the aforementioned, one research question (RQ) was designed to guide the present study. This question was: RQ1: What is the topical structure of the academic discourse on Critical Incident Stress Debriefing and Critical Incident Stress Management? Done.

#4: Provide the link for this: The exact nature of these steps can also be viewed on this research project's website. Done; see the nota bene at the end of the MS.

#5: The article needs official figures/tables with results for your topics explored rather than only figures showing examples/methodology. Done; see Figures 2–5.

#6: Line 442: consider rewriting, since this is a fragment: "For example, questions such as the following." Done.

Reviewer 2 Report

Comments and Suggestions for Authors

Rationale for use of the method (LDA) should come in stronger in the introduction. The acronym is defined in the abstract, but many readers will not know what this is, and the method should be defined before you note that other studies in similar fields have used the same method. You can also include the rationale for applying the method to this particular topic area: what will it give you? The acronym should probably also be re-defined in the introduction.

 

Methods:

The documentation of methods seems incomplete. The preprocessing steps are not fully described; critical details like stop word lists, tokenisation approach, and final vocabulary size are missing. There are also a few incomplete sentences throughout that need to be addressed; for example, the "(e.g., stop word removal)" appears to be an incomplete sentence or list?

Search strategy: The search fields in Web of Science are not specified ("specify fields if possible" also appears to be an incomplete sentence?). Please justify why you chose the document types that were included; there is insufficient explanation for why certain document types (like letters and meeting abstracts) were included.

The filtering process is not clear. Although you mention, for example, removing CISD protein papers, the specific keywords used for filtering are not provided. The individual review process by authors lacks detail on criteria used, and there is no mention of whether inter-rater reliability measures were used if multiple authors reviewed abstracts.

Data: Although the total word count (40,040) is provided, other important corpus statistics are missing, such as average abstract length, information around the temporal distribution of papers, and whether you planned to account for temporal changes in the literature?

Define perplexity before you start using it in the methods; some re-ordering needs to occur here.

The preprocessing steps should be explicitly detailed in the manuscript rather than requiring readers to visit an external website.

Please justify use of abstracts only- given the goal of the paper, which is to understand the academic discourse around CISD/CISM, full texts may be more appropriate because intervention details and implementation variations are often found in the methods, important limitations and caveats are usually in the discussion. I know it would increase the computing power required, but as you only have 214 papers, using full texts would likely provide more robust data for the LDA. Further, the cumulative stress and longitudinal aspects you are interested in might be discussed in paper bodies but not abstracts. I think you need a strong justification around why abstracts are sufficient for your research questions, if you retain this approach, and acknowledge this limitation in the paper.

Analysis: You have used a complex method, and I think the details in the analysis section are insufficient. There are implementation details missing; for example, there is no mention of key LDA parameters (α and β hyperparameters), and no specification of the number of iterations or convergence criteria. The description of the process used for model selection is vague. "Low log perplexity" and "high topic coherence" need quantification, and similarly it is not well defined what you mean by "meaningful levels of detail". Was any cross-validation performed? Which generative AI models were used? How were they prompted? What criteria were used to evaluate and revise the AI-generated labels? What human validation process or expertise was required?

Details around reproducibility are also scant; as before, according to this section, critical methodological details will be made available on an external website. These were not evaluated during this review, and so some of the comments I have raised may be addressed there, but some of these items are crucial for inclusion in the paper itself.

Model validation: did you conduct any stability analysis? Any other quality metrics beyond perplexity and coherence? Were topic interpretations validated? How did you plan to handle potential topic overlap?

Results

This section is very brief and missing several key details I would expect to see for this method.

Model selection: two coherence scores (0.415 vs 0.437) are reported without context; I am not an expert in this method, and it is unclear what these are referring to. Although log perplexity values were a selection criterion, these were not provided. There was no statistical justification for choosing k=4 over k=3 beyond a vague reference to greater insight; selecting the appropriate k really needs to be well justified. In alignment with this, there was no comparison data provided for the other k values (2-10) that were tested. Some data around the other candidates would be excellent.

Topics were listed in terms of keywords without their weights/probabilities. There was no analysis of term co-occurrence patterns, no discussion of any topic overlap or distinctness, and no temporal analysis of how topics might have evolved or changed over time. Maybe this was not an issue for this literature, but this needs to be addressed. Also, if it hasn't changed in the abstracts, maybe the detail is too scant; a full-text analysis may provide different results?

I think the discussion will need to be revisited once methods/results are finalised; however, I have a few points:

When you discuss possible motivations for the presence of identified topics, your current presentation of results doesn't provide enough evidence about topic prevalence or relationships to support such discussion. The results will need to be increased in detail to support this discussion.

There is a risk in discussing absent topics- because you use abstracts only, have a relatively small corpus, and without a more rigorous analysis of your topic model (including topic coverage, stability, and validation), identifying genuine gaps versus modeling artifacts will be difficult. This needs justification.

The phrase "topics and associated keywords that are most relevant to researchers and practitioners" suggests selective reporting, but you haven't presented clear criteria for determining relevance. Overall, given the limited presentation of results, discussing implications for practice may be premature; it may be better to frame this in terms of future research/next steps rather than recommendations for practice.

Author Response

Thank you for the time you took to give us such excellent guidance!

#1: Rationale for use of the method (LDA) should come in stronger in the introduction. The acronym is defined in the abstract, but many readers will not know what this is, and the method should be defined before you note that other studies in similar fields have used the same method. You can also include the rationale for applying the method to this particular topic area: what will it give you? The acronym should probably also be re-defined in the introduction. Done.
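For reference, the textbook LDA generative process (Blei, Ng, & Jordan, 2003) behind the definition is:

```latex
% Standard LDA generative process: K topics, M documents, document d of length N_d.
\begin{align*}
\varphi_k &\sim \mathrm{Dirichlet}(\beta)               && k = 1,\dots,K   \quad \text{(topic-word distributions)}\\
\theta_d  &\sim \mathrm{Dirichlet}(\alpha)              && d = 1,\dots,M   \quad \text{(document-topic mixtures)}\\
z_{d,n}   &\sim \mathrm{Multinomial}(\theta_d)          && n = 1,\dots,N_d \quad \text{(per-word topic assignments)}\\
w_{d,n}   &\sim \mathrm{Multinomial}(\varphi_{z_{d,n}}) && \text{(observed words)}
\end{align*}
```

In this framing, each abstract is a mixture of latent topics and each topic is a distribution over vocabulary terms; inference recovers the hidden θ and φ from the observed words alone, which is what makes the method suitable for mapping the topical structure of a literature.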

 

Methods:

#2: The documentation of methods seems incomplete. The preprocessing steps are not fully described; critical details like stop word lists, tokenisation approach, and final vocabulary size are missing. There are also a few incomplete sentences throughout that need to be addressed; for example, the "(e.g., stop word removal)" appears to be an incomplete sentence or list? There were a number of preprocessing steps taken, and these are fully detailed on this MS's OSF page. A screenshot of the preprocessing steps is also provided here.

 

#3: Search strategy: The search fields in Web of Science are not specified ("specify fields if possible" also appears to be an incomplete sentence?). Please justify why you chose the document types that were included; there is insufficient explanation for why certain document types (like letters and meeting abstracts) were included. Done.

#4: The filtering process is not clear. Although you mention, for example, removing CISD protein papers, the specific keywords used for filtering are not provided. The individual review process by authors lacks detail on criteria used, and there is no mention of whether inter-rater reliability measures were used if multiple authors reviewed abstracts. Done.

#5: Data: Although the total word count (40,040) is provided, other important corpus statistics are missing, such as average abstract length, information around the temporal distribution of papers, and whether you planned to account for temporal changes in the literature? We have added the average abstract length, as well as the range of words per abstract. Temporal change was not a research question in the study, but we noted it as a topic to be pursued in future research.

#6: Define perplexity before you start using it in the methods; some re-ordering needs to occur here. Done.
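For reference, the conventional definition of perplexity on a held-out document set of M documents is:

```latex
% Held-out perplexity; \mathbf{w}_d are the words of document d and N_d its length.
\mathrm{perplexity}(D_{\mathrm{test}})
  = \exp\!\left( -\, \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)
```

Lower values indicate that the model assigns higher probability to unseen text, which is why low log perplexity served as one of the model-selection criteria.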

#7: The preprocessing steps should be explicitly detailed in the manuscript rather than requiring readers to visit an external website. The preprocessing steps that we took were standard ones for LDA research. We are going to respectfully push back on this a bit: we think that listing all of the preprocessing parameters would weigh down the narrative, and they are easily available online for any reader who wants that level of granularity.
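To give readers the general shape of those standard steps without leaving the manuscript, here is a minimal sketch assuming an NLTK/gensim stack and a hypothetical `abstracts` list of raw abstract strings; the study itself used Orange Data Mining, and its exact workflow is documented on the OSF page.

```python
# Illustrative sketch of standard LDA preprocessing (assumed NLTK/gensim stack,
# not the Orange Data Mining workflow actually used in the study).
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from gensim.corpora import Dictionary

nltk.download("punkt")      # tokenizer models
nltk.download("stopwords")  # standard English stop word list
nltk.download("wordnet")    # lemmatizer data

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(abstract):
    """Lowercase, tokenize, drop non-alphabetic tokens and stop words, lemmatize."""
    tokens = nltk.word_tokenize(abstract.lower())
    tokens = [t for t in tokens if t.isalpha() and t not in stop_words]
    return [lemmatizer.lemmatize(t) for t in tokens]

# `abstracts` is a hypothetical list of raw abstract strings.
docs = [preprocess(a) for a in abstracts]
dictionary = Dictionary(docs)                         # token -> integer id mapping
dictionary.filter_extremes(no_below=5, no_above=0.5)  # prune very rare/ubiquitous terms
corpus = [dictionary.doc2bow(d) for d in docs]        # bag-of-words vectors for LDA
```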

#9a: Analysis: You have used a complex method, and I think the details in the analysis section are insufficient. There are implementation details missing; for example, there is no mention of key LDA parameters (α and β hyperparameters), and no specification of the number of iterations or convergence criteria. Done.
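To illustrate what those parameters control, a minimal gensim sketch follows. The values shown are hypothetical placeholders rather than the settings reported in the revised Methods, and gensim stands in for Orange Data Mining's LDA widget:

```python
from gensim.models import LdaModel

# Hypothetical settings, shown only to make the reviewer's terms concrete.
lda = LdaModel(
    corpus=corpus,       # bag-of-words corpus from preprocessing
    id2word=dictionary,  # vocabulary mapping
    num_topics=4,        # k, the number of topics
    alpha="auto",        # document-topic Dirichlet prior (the reviewer's alpha)
    eta="auto",          # topic-word Dirichlet prior (the reviewer's beta)
    iterations=400,      # inference iterations per document chunk
    passes=10,           # full passes over the corpus
    random_state=42,     # fixed seed; Orange does not expose this (see #10 below)
)
```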

#9b: The description of the process used for model selection is vague. "Low log perplexity" and "high topic coherence" need quantification, and similarly it is not well defined what you mean by "meaningful levels of detail". Done.

#9c: Which generative AI models were used? How were they prompted? What criteria were used to evaluate and revise the AI-generated labels? What human validation process or expertise was required? This is provided in precise detail in the GenAI usage log available online on the research project's OSF webpage, and we believe such a level of detail would distract from the narrative in the manuscript itself.

#10: Details around reproducibility are also scant; as before, according to this section, critical methodological details will be made available on an external website. These were not evaluated during this review, and so some of the comments I have raised may be addressed there, but some of these items are crucial for inclusion in the paper itself. Orange Data Mining does not allow the random seed hyperparameter to be tuned. The implications of this for reproducibility are addressed in the implications section.

#11: Model validation- did you conduct any stability analysis? Any other quality metrics beyond perplexity and coherence? Were topic interpretations validated? How did you plan to handle potential topic overlap? Addressed in the Method and Implications sections.
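As one illustration of how topic overlap can be quantified (a common check, not necessarily the one used in the revised manuscript), the Jaccard similarity between each topic pair's top-keyword sets can be computed from the hypothetical `lda` model sketched above:

```python
# Pairwise Jaccard similarity of top-keyword sets: 0 = fully distinct, 1 = identical.
def top_words(model, topic_id, topn=10):
    return {word for word, _ in model.show_topic(topic_id, topn=topn)}

for i in range(lda.num_topics):
    for j in range(i + 1, lda.num_topics):
        a, b = top_words(lda, i), top_words(lda, j)
        print(f"topics {i} vs {j}: Jaccard = {len(a & b) / len(a | b):.2f}")
```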

Results

#12: This section is very brief and missing several key details I would expect to see for this method. Done.

#13: Model selection: two coherence scores (0.415 vs 0.437) are reported without context; I am not an expert in this method, and it is unclear what these are referring to. Although log perplexity values were a selection criterion, these were not provided. There was no statistical justification for choosing k=4 over k=3 beyond a vague reference to greater insight; selecting the appropriate k really needs to be well justified. In alignment with this, there was no comparison data provided for the other k values (2-10) that were tested. Some data around the other candidates would be excellent. Done.
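For illustration, a sweep over the candidate k values like the one described might look as follows in gensim; this sketch reuses the hypothetical `corpus`, `dictionary`, and `docs` objects from the earlier sketches and is not the Orange Data Mining workflow actually used:

```python
from gensim.models import CoherenceModel, LdaModel

# Fit one model per candidate k and report both selection criteria.
for k in range(2, 11):                                   # k = 2..10, as in the study
    model = LdaModel(corpus=corpus, id2word=dictionary,
                     num_topics=k, passes=10, random_state=42)
    bound = model.log_perplexity(corpus)                 # per-word likelihood bound
    perplexity = 2 ** (-bound)                           # lower perplexity is better
    coherence = CoherenceModel(model=model, texts=docs,
                               dictionary=dictionary,
                               coherence="c_v").get_coherence()  # higher is better
    print(f"k={k}: perplexity={perplexity:.1f}, c_v coherence={coherence:.3f}")
```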

#14: Topics were listed in terms of keywords without their weights/probabilities. There was no analysis of term co-occurrence patterns, no discussion of any topic overlap or distinctness, and no temporal analysis of how topics might have evolved or changed over time. Maybe this was not an issue for this literature, but this needs to be addressed. Also, if it hasn't changed in the abstracts, maybe the detail is too scant; a full-text analysis may provide different results? Temporal analysis was not part of this preliminary exploratory analysis but is a good idea for the future.

I think the discussion will need to be revisited once methods/results are finalised; however, I have a few points:

#15: When you discuss possible motivations for the presence of identified topics, your current presentation of results doesn't provide enough evidence about topic prevalence or relationships to support such discussion. The results will need to be increased in detail to support this discussion. Done.

#16: There is a risk in discussing absent topics- because you use abstracts only, have a relatively small corpus, and without a more rigorous analysis of your topic model (including topic coverage, stability, and validation), identifying genuine gaps versus modeling artifacts will be difficult. This needs justification.

Thank you for this important point. I agree that identifying absent or underrepresented topics in a topic model carries inherent risks, especially when using a relatively small corpus composed solely of abstracts. This is particularly true in the absence of deeper validation measures such as topic coverage metrics, stability tests across model runs, or human-coded comparisons.

To address this concern, we have revised the manuscript in two key ways:

  1. Added methodological transparency: we now explicitly acknowledge the limitations of using abstracts and a small dataset for LDA modeling, including the possibility that some topic gaps may reflect modeling artifacts rather than actual absences in the literature.
  2. Clarified the interpretive frame: we have reframed the discussion of “absent topics” as tentative observations rather than definitive claims. These are now presented as possible gaps that may warrant further investigation, ideally through larger corpora or complementary methods such as full-text analysis or expert validation.

Additionally, we have expanded the limitations section to note that future research would benefit from:

  • Models with full hyperparameter tuning (e.g., iterations, passes, alpha),
  • Stability assessments across multiple model runs,
  • And a more systematic validation of topic presence or absence, possibly incorporating human coders or domain experts.

I hope this revision strikes the right balance between analytical curiosity and methodological caution, and I appreciate the reviewer’s call for greater rigor in interpreting topic model outputs.

 

#17: The phrase "topics and associated keywords that are most relevant to researchers and practitioners" suggests selective reporting, but you haven't presented clear criteria for determining relevance. Overall, given the limited presentation of results, discussing implications for practice may be premature; it may be better to frame this in terms of future research/next steps rather than recommendations for practice. Done, see opening paragraph of the discussion section.

 

 

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

General Writing and Grammar
•    Proofread for minor grammatical issues, especially in complex compound sentences (e.g., missing commas or awkward phrasing like “Through this problemization…” → “Through this problematization…”).
•    Be consistent in using either “CISD/M” or “CISD and CISM”; switching back and forth can confuse readers unfamiliar with the acronyms.
•    Some sentences are overly long and could benefit from being broken up for clarity and flow.

Spelling & Formatting
•    “Problematization” is occasionally misspelled as “problemization.”
•    Fix inconsistent section and figure formatting (e.g., Figures 3–5 are titled “Figure X. Topic X LDAvis: ” and left incomplete).
•    Section headings are inconsistently styled; ensure uniform formatting across the manuscript.
•    Use of en-dashes and em-dashes is inconsistent. For example, “CISD/M—like police…” should be reviewed for stylistic consistency.

Background & Citations
•    The background is rich but may benefit from tighter organization, possibly with subheadings for “CISD,” “CISM,” and “Criticisms/Gaps.”
•    Citations are generally appropriate, but some bold claims could use more support (e.g., “CISD is often recognized as the most essential component…”).
•    Consider referencing key critiques of CISD (e.g., early criticisms by Rose et al., 2003 or Wessely et al.) more directly to balance the discussion.

Methods
•    Clarify inclusion/exclusion criteria for abstracts. Manual review is described but lacks inter-rater reliability assessment—even a brief justification for its absence would improve transparency.
•    Hyperparameter limitations in Orange should be clearly framed in terms of how they might bias findings or affect interpretability.

•    The choice to use a 4-topic model over the statistically better 3-topic model is justified, but the rationale could be explained more rigorously—perhaps with LDAvis visual comparisons or examples.
•    There’s a missed opportunity to comment on the overlap or distinctiveness of topics—how much did they share keywords?

•    The discussion is thorough and insightful. Still, it could be improved by summarizing the key finding from each topic more concisely at the start of each subsection.
•    The implications sections are strong but might be strengthened by clearly differentiating between research gaps and practice gaps.

•    Limitations: This section is relatively strong, but consider adding:
o    The limited generalizability due to only using abstracts instead of full texts.
o    The lack of qualitative triangulation with expert interviews or focus groups.
o    Potential bias introduced by relying on generative AI for topic labeling.

•    Add more concrete proposals for future work:
o    What kind of longitudinal studies?
o    How might cumulative trauma be operationalized and studied?
o    What would improved procedural adherence studies look like?

Author Response

Thank you for the opportunity to revise and resubmit. I have placed the completed punch list of the requested revisions below. All requested revisions were made. This round of revisions appears in green. Please thank the reviewer for their excellent work, for they have helped make this manuscript better.

 

Completed Punch List of Reviewers' Requests

Reviewer 2 Report

Comments and Suggestions for Authors

Thank you for such a thorough revision. The manuscript is improved; thank you for improving clarity around your methods, etc., and the acknowledgement of limitations is well written.

Round 3

Reviewer 1 Report

Comments and Suggestions for Authors

Thank you for addressing all of my suggested revisions. The paper is definitely improved for publication!
