Review Reports
- Chiara Rossi 1,2,
- Fabio Frisone 2,3 and
- Barbara Colombo 8,*
- et al.
Reviewer 1: Anonymous; Reviewer 2: Anonymous; Reviewer 3: Anonymous
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
In this manuscript, the authors assessed whether personalized music listening affected linguistic features of older adults’ memory retrieval in a nursing home setting, observing a greater word count during retrieval after music listening. I appreciated the authors’ use of personalized music and their choice of an understudied population. My primary critiques concern the study design, the sample size, and the choice of outcome metrics, which I believe limit the ability to draw conclusions.
Primary General Comments
- The inclusion of “narrative structure” in the title implies that the analysis included some form of structural analysis (i.e., how words fit together, the structure of the narrative arc), which I do not see in the manuscript. I recommend changing the language in the title and throughout to be more specific to the linguistic aspects evaluated. That being said, LIWC-22 does have these types of true structural analyses built in. If this is the authors’ goal, including these analyses would be beneficial.
- My primary concern regarding the study design is the possibility of order effects and the inconsistencies between the baseline and music conditions. Results may simply reflect participants’ familiarity with the task as it was repeated, or their comfort level in sharing memories with the experimenter. Did the authors design the study to mitigate this in a way that was perhaps unclear from the Methods? For example, did all participants complete the no-music condition first, or was the order balanced across participants? It appears not (and this is discussed in the limitations). However, even with this acknowledged in the limitations, it remains particularly challenging to accept any of the reported results. Additionally, the memory task appeared to differ between the baseline and music conditions (one involved three memories at once, the other one memory per week), which introduces further confounds that make it difficult to draw conclusions.
- Even with nonparametric statistics, the extremely small sample size makes the statistical results difficult to interpret. If the authors do not wish to collect data from additional participants (which would be ideal), I suggest supplementing the group-level statistics with individual-level reports (more like a multiple-case study). The authors already do some of this (i.e., “x participants showed increases and x participants showed decreases”). It might be worth including individual-level plots or placing more emphasis on this level of analysis (and perhaps retitling the paper accordingly as a ‘pilot’ or ‘multiple-case study’).
Specific Comments
Abstract
- The inclusion of the nonsignificant findings within the abstract is somewhat confusing, particularly when stating the primary findings as if these effects were significant (‘may primarily enhance structural and integrated aspects of’). I recommend including only statistically significant findings in the abstract.
Introduction
- In general, I think the use of the word “institutionalized” to mean older adults in nursing homes is somewhat confusing. There are many different reasons one could be placed in an ‘institution’ (mental health, memory care, prison, etc.). I think it would be helpful if the authors could specify what ‘institutionalization’ means in the context of this manuscript early on (i.e., in the introduction) to avoid confusion.
- The introduction could use more references to the existing literature on linguistic features of autobiographical memory in aging (using both LIWC and other metrics such as BERT).
- Additionally, a larger background describing LIWC and the meaningfulness of its associated metrics would be helpful (even if just elaboration on the original Pennebaker studies). While this is briefly mentioned in the introduction, it is unclear what terms like ‘reflective integration’ and ‘cognitive restructuring’ mean in this context.
Methods
- If the goal was to assess memory in a vulnerable population (i.e., individuals with cognitive decline), why were individuals with MMSE < 17 screened out? A rationale for this in the Methods would be helpful.
- The methods could be described more clearly: it was unclear whether each of the three music-evoked memory sessions covered all three lifetime periods or only one. If the latter, why were the lifetime periods separated for the music condition but not for the baseline condition? Was the order of the life periods always the same, or was it counterbalanced?
- How long were participants given to respond to each cue word? Were they given prompts to continue if the time allotment had not yet lapsed?
- During each 10-minute listening session, was the same music played? Or, was it a different 10-minute selection each time?
- Although the sample size was determined by feasibility (and therefore no power analysis was performed), readers might find it helpful to see a post-hoc sensitivity analysis showing the minimum detectable effect size for the given sample of N = 11.
- Line 206: “The increase in word count remained statistically significant.” This sentence would be better placed in the Results than in the Methods.
- Line 202: ‘selected LIWC-22 categories’. Please list which categories.
Results
- While I understand that metrics were aggregated across all time periods and listening sessions, it would be helpful to be able to see if order effects may have been present by comparing the metrics at each session separately.
- The figures could use additional clarification, including y-axis labels and explanations of what the central lines and error bars indicate.
- Line 236: “Changes were mixed across participants.” Does this refer to pre-to-post differences being positive vs. negative? This sentence is perhaps unnecessary given the statistical finding (alternatively, include qualitative descriptions, as is done for other LIWC categories).
- Several metrics are described in the results that are not mentioned in the methods or hypotheses (family-related language, past orientation). Were these exploratory?
Discussion
- I found that the Discussion overstated the results. Although the authors tempered their language by noting that these findings were nonsignificant, they still interpreted nonsignificant results as if they were significant at multiple points. If this were a case study or a truly qualitative paper, this type of interpretation might be more appropriate. In the current scope, however, this level of interpretation felt unwarranted.
Author Response
Reviewer 1
Primary General Comments
- The inclusion of “narrative structure” in the title implies that the analysis included some form of structural analysis (i.e., how words fit together, the structure of the narrative arc), which I do not see in the manuscript. I recommend changing the language in the title and throughout to be more specific to the linguistic aspects evaluated. That being said, LIWC-22 does have these types of true structural analyses built in. If this is the authors’ goal, including these analyses would be beneficial.
We thank the reviewer for this important observation. We agree that the term narrative structure may overstate the type of analysis performed in the present study. Our quantitative analyses relied on LIWC-22, which captures lexical and psycholinguistic categories rather than higher-order narrative organization (e.g., coherence, temporal sequencing, narrative arc, or discourse-level structure).
Accordingly, we have revised the title and the manuscript throughout to replace narrative structure with terminology that more accurately reflects the variables assessed (new title: Personalized Music Listening and Autobiographical Narration in Nursing Home Residents: Linguistic and Qualitative Findings from a Pilot Study). We also clarified in the Methods and Limitations sections that LIWC-derived variables should be interpreted as linguistic indicators rather than formal measures of narrative structure.
In addition, following both the reviewer’s observation and the editor’s recommendations, we expanded the manuscript by including an exploratory qualitative narrative component based on a manual analysis of the interviews. This addition was intended to capture aspects of autobiographical narration that extend beyond lexical counts, highlighting recurrent phenomenological and communicative features such as the re-emergence of concrete autobiographical scenes, activation of identity and social belonging, recovery of meaningful former roles, and increased emotional depth and communicative engagement.
- My primary concern regarding the study design is the possibility of order effects and the inconsistencies between the baseline and music conditions. Results may simply reflect participants’ familiarity with the task as it was repeated, or their comfort level in sharing memories with the experimenter. Did the authors design the study to mitigate this in a way that was perhaps unclear from the Methods? For example, did all participants complete the no-music condition first, or was the order balanced across participants? It appears not (and this is discussed in the limitations). However, even with this acknowledged in the limitations, it remains particularly challenging to accept any of the reported results. Additionally, the memory task appeared to differ between the baseline and music conditions (one involved three memories at once, the other one memory per week), which introduces further confounds that make it difficult to draw conclusions.
We thank the reviewer for highlighting this crucial methodological issue. We agree that the fixed order of conditions and the differences in recall format between baseline and music-assisted sessions may introduce practice, familiarity, and procedural confounds.
In the original design, the baseline session was intentionally administered first to obtain an uncontaminated pre-intervention narrative sample prior to any music exposure. However, we acknowledge that this choice limits causal interpretation.
We have now clarified this rationale in the Methods section, substantially expanded the Limitations section, and revised the manuscript language to avoid causal claims. Throughout the paper, findings are now framed as preliminary associations rather than definitive intervention effects.
We also explicitly recommend that future studies adopt randomized counterbalanced designs with matched recall procedures across conditions.
- Even with nonparametric statistics, the extremely small sample size makes the statistical results difficult to interpret. If the authors do not wish to collect data from additional participants (which would be ideal), I suggest supplementing the group-level statistics with individual-level reports (more like a multiple-case study). The authors already do some of this (i.e., “x participants showed increases and x participants showed decreases”). It might be worth including individual-level plots or placing more emphasis on this level of analysis (and perhaps retitling the paper accordingly as a ‘pilot’ or ‘multiple-case study’).
We appreciate this valuable suggestion. We agree that the small sample size limits the interpretability of group-level inferential statistics. In response, we have repositioned the manuscript more explicitly as a pilot exploratory study.
In addition, following the reviewer’s recommendation, we have supplemented the group-level analyses with an individual-level visualization of trajectories (Figure 2), illustrating pre–post changes in autobiographical narrative productivity (word count) for each participant. This addition allows for a more transparent appreciation of within-subject variability and highlights the consistency of the observed pattern across participants, with all individuals showing an increase in word count in the music-assisted condition.
We also revised the Results section to place greater emphasis on individual-level patterns alongside group-level statistics.
These changes are highlighted in yellow in the revised manuscript.
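As an illustration of why an individual-level summary is informative at N = 11, the pattern described in this response (every participant increasing in word count) can be checked with an exact sign test. The following is a minimal Python sketch, assuming 11 participants with non-zero pre–post differences and a two-sided test; the function name `exact_sign_test_two_sided` is hypothetical, and this is not the analysis reported in the manuscript.

```python
from math import comb


def exact_sign_test_two_sided(n_positive: int, n: int) -> float:
    """Two-sided exact sign test p-value for n non-zero paired differences.

    Under H0 each difference is positive with probability 0.5, so the
    number of positive differences follows Binomial(n, 0.5).
    """
    # Count of signs in the more extreme tail (positive or negative).
    k = max(n_positive, n - n_positive)
    # Probability of observing k or more signs of the same kind.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double the one-sided tail for a two-sided test, capped at 1.
    return min(1.0, 2 * tail)


# Hypothetical case: all 11 participants increase in word count.
print(exact_sign_test_two_sided(11, 11))  # prints 0.0009765625
```

With all 11 differences positive, the sketch gives p = 2/2^11 ≈ 0.00098. Under the same no-ties assumption, this is also the smallest two-sided p-value an exact Wilcoxon signed-rank test can reach with 11 pairs, which is why individual-level consistency is worth reporting alongside group-level tests in a sample this small.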
Specific Comments
Abstract
- The inclusion of the nonsignificant findings within the abstract is somewhat confusing, particularly when stating the primary findings as if these effects were significant (‘may primarily enhance structural and integrated aspects of’). I recommend including only statistically significant findings in the abstract.
We agree that the previous wording of the abstract may have overstated the interpretation of non-significant findings. In response, we revised the abstract to emphasize only the statistically significant result, namely the increase in total word count following music-assisted recall. We also removed language that could imply confirmed effects on structural or integrative aspects of narration, and reformulated the concluding sentence more cautiously to reflect the exploratory nature of the study.
Introduction
- In general, I think the use of the word “institutionalized” to mean older adults in nursing homes is somewhat confusing. There are many different reasons one could be placed in an ‘institution’ (mental health, memory care, prison, etc.). I think it would be helpful if the authors could specify what ‘institutionalization’ means in the context of this manuscript early on (i.e., in the introduction) to avoid confusion.
We agree that the term institutionalized may be interpreted broadly and could generate ambiguity across different care or custodial settings. In the context of the present manuscript, the term refers specifically to older adults residing in a nursing home. In response, we clarified this definition early in the Introduction and revised the manuscript throughout to use more specific terminology (e.g., nursing home residents, older adults living in residential care, or long-term care residents) whenever appropriate. These changes are highlighted in yellow in the manuscript. We also revised the title accordingly.
- The introduction could use more references to the existing literature on linguistic features of autobiographical memory in aging (using both LIWC and other metrics such as BERT).
We expanded the Introduction by adding references on age-related changes in autobiographical narration, including evidence that older adults may produce fewer episodic details and rely more on semantic or generalized information, as well as broader changes in lexical diversity, grammatical complexity, and discourse organization (e.g., Levine et al., 2002; Kemper et al., 2001; Burke & Shafto, 2004). We also incorporated a brief discussion of computational approaches beyond dictionary-based methods, including transformer-based models such as BERT, which may capture contextual meaning and higher-order discourse patterns (e.g., Devlin et al., 2019; Fraser et al., 2016). Finally, we clarified that LIWC was selected in the present study as a feasible, interpretable, and appropriate first-step method within an exploratory pilot design conducted in a residential care setting. These changes are highlighted in yellow in the revised manuscript.
- Additionally, a larger background describing LIWC and the meaningfulness of its associated metrics would be helpful (even if just elaboration on the original Pennebaker studies). While this is briefly mentioned in the introduction, it is unclear what terms like ‘reflective integration’ and ‘cognitive restructuring’ mean in this context.
We expanded the Introduction to better describe Linguistic Inquiry and Word Count (LIWC) as a validated dictionary-based text analysis tool that quantifies psychologically meaningful word categories and has been widely used in psychological research (Tausczik & Pennebaker, 2010; Pennebaker et al., 2015; Boyd et al., 2022).
We also clarified the meaning of the specific categories examined in the present study. For example, the cognitive processes category includes words related to causation, insight, discrepancy, and certainty, whereas exclusion words may reflect contrastive differentiation or the organization of alternative perspectives within narratives. Importantly, these categories are now explicitly described as indirect and probabilistic linguistic markers rather than direct measures of internal psychological states.
In addition, we revised the manuscript throughout to use more cautious terminology and to avoid overinterpretation of constructs such as “reflective integration” or “cognitive restructuring,” framing them instead as possible interpretive correlates of observed linguistic patterns.
Methods
- If the goal was to assess memory in a vulnerable population (i.e., individuals with cognitive decline), why were individuals with MMSE < 17 screened out? A rationale for this in the Methods would be helpful.
The rationale for excluding participants with lower MMSE scores was primarily ethical and procedural rather than theoretical. In accordance with local clinical and ethical standards, individuals with an MMSE score above 18 are generally considered able to understand the study procedures, provide autonomous informed consent, and participate reliably in interview-based research tasks. For this reason, we adopted this threshold to ensure participants’ decisional capacity and ability to engage meaningfully in autobiographical narration. We have now clarified this rationale in the Methods section. These changes are highlighted in yellow in the revised manuscript.
- The methods could be described more clearly: it was unclear whether each of the three music-evoked memory sessions covered all three lifetime periods or only one. If the latter, why were the lifetime periods separated for the music condition but not for the baseline condition? Was the order of the life periods always the same, or was it counterbalanced?
We have now revised the Methods section to specify that the baseline assessment (T0) consisted of a single autobiographical interview in which participants were invited to recall memories from three lifetime periods (childhood, adolescence, and adulthood), without music. In contrast, the music-assisted phase consisted of three separate weekly sessions, each focused on one specific lifetime period only. In each music session, participants listened to the playlist associated with that life period and were then presented with the same cue word used for that period at baseline.
We also clarified that the order of life periods was kept consistent across participants (childhood, adolescence, adulthood) and was not counterbalanced. This choice was made to preserve a coherent developmental sequence and to facilitate autobiographical recall in an older and cognitively vulnerable population. At the same time, we acknowledge that this fixed ordering and the procedural difference between baseline and music-assisted sessions represent important methodological limitations. We have therefore made this issue more explicit in both the Methods and Limitations sections. These changes are highlighted in yellow in the revised manuscript.
- How long were participants given to respond to each cue word? Were they given prompts to continue if the time allotment had not yet lapsed?
No fixed time limit was imposed for responses to each cue word. Participants were encouraged to narrate autobiographical memories in a natural and spontaneous manner, allowing sufficient time for recall and verbal expression according to their individual pace and communicative abilities. To ensure procedural consistency, memory retrieval was guided through the same standardized cue-word instruction for all participants: “What does the word X remind you of?” Participants were then invited to narrate a specific autobiographical memory associated with the cue word. No additional directive prompts were systematically used beyond this standardized instruction. We have now clarified this procedure in the Methods section. These changes are highlighted in yellow in the revised manuscript.
- During each 10-minute listening session, was the same music played? Or, was it a different 10-minute selection each time?
During the three music-assisted sessions, each participant was exposed to the same personalized playlist for approximately 10 minutes. Thus, the musical content remained stable across sessions for each individual, while the order of the tracks was varied to reduce repetition effects and maintain engagement. We have now clarified this procedure in the Methods section. We also note that the full list of songs included in each participant’s personalized playlist is reported in the Supplementary Materials.
- Although the sample size was determined by feasibility (and therefore no power analysis was performed), readers might find it helpful to see a post-hoc sensitivity analysis showing the minimum detectable effect size for the given sample of N = 11.
We appreciate the reviewer’s suggestion. Given the exploratory nature of the study and the very small sample size, we opted not to include a post-hoc sensitivity analysis, as such estimates may be unstable and potentially misleading in pilot designs. Instead, we emphasized effect sizes and individual-level patterns, which we believe provide a more informative interpretation of the data at this stage.
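For readers who want an order-of-magnitude answer to the reviewer's question, a rough sensitivity estimate can be sketched with a normal approximation. This is a minimal Python sketch under stated assumptions (paired design, two-sided α = 0.05, power = 0.80, Cohen's dz as the effect size); the function `min_detectable_dz` and its `are` parameter are hypothetical illustrations rather than part of the authors' analysis, and the approximation understates the exact small-sample threshold.

```python
from statistics import NormalDist


def min_detectable_dz(n: int, alpha: float = 0.05, power: float = 0.80,
                      are: float = 1.0) -> float:
    """Approximate minimum detectable within-subject effect size (Cohen's dz).

    Normal approximation: dz ~ (z_{1 - alpha/2} + z_{power}) / sqrt(n * are).
    `are` is an assumed asymptotic relative efficiency of the chosen test
    versus the paired t-test (about 3/pi, i.e. ~0.955, for the Wilcoxon
    signed-rank test when differences are normally distributed).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)  # quantile corresponding to desired power
    # Effective sample size shrinks when the test is less efficient.
    return (z_alpha + z_beta) / (n * are) ** 0.5


# Hypothetical usage for the pilot sample size discussed above.
print(round(min_detectable_dz(11), 2))
print(round(min_detectable_dz(11, are=0.955), 2))
```

For N = 11 the approximation gives dz ≈ 0.84, or about 0.86 after the Wilcoxon efficiency adjustment; an exact noncentral-t calculation would give a somewhat larger value. Consistent with the authors' caution, such figures are only a coarse guide in a pilot sample.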
- Line 206: “The increase in word count remained statistically significant.” This sentence would be better placed in the Results than in the Methods.
The sentence has been moved to the Results section.
- Line 202: ‘selected LIWC-22 categories’. Please list which categories.
We have clarified the specific LIWC categories included in the analyses in the Methods section.
Results
- While I understand that metrics were aggregated across all time periods and listening sessions, it would be helpful to be able to see if order effects may have been present by comparing the metrics at each session separately.
We thank the reviewer for this important suggestion. In response, we have conducted an additional descriptive analysis examining word count at each individual session and included a new figure (Figure 2) illustrating session-level trajectories for each participant.
As shown in Figure 2, narrative productivity did not exhibit a progressive increase across the baseline (no music) sessions. Instead, word count showed a marked increase at the onset of the music-assisted condition and remained elevated across subsequent sessions.
Although these observations are descriptive and should be interpreted cautiously given the small sample size and exploratory design, this pattern does not appear consistent with a simple linear practice or order effect across repeated sessions.
These additions have been incorporated in the Results section and are highlighted in the revised manuscript.
- The figures could use additional clarification, including y-axis labels and explanations of what the central lines and error bars indicate.
We thank the reviewer for this helpful suggestion. In the revised manuscript, we have improved the clarity of all figures by adding explicit axis labels (e.g., Word Count for the y-axis and Condition or Session for the x-axis) and by revising the figure captions to clearly describe the graphical elements.
In particular, we clarified that individual trajectories are represented by thin gray lines, while the central black line indicates the mean trend across participants.
In addition, to improve clarity and avoid potential ambiguity in interpretation, we removed the original Figure 1 and focused on figures that more directly represent individual-level trajectories and session-level patterns.
These revisions have been implemented throughout the manuscript and are highlighted in the updated version.
- Line 236: “Changes were mixed across participants.” Does this refer to pre-to-post differences being positive vs. negative? This sentence is perhaps unnecessary given the statistical finding (alternatively, include qualitative descriptions, as is done for other LIWC categories).
We thank the reviewer for this helpful observation. We agree that the statement “changes were mixed across participants” was vague and did not add meaningful information beyond the statistical results. In response, we have removed this sentence from the manuscript and relied instead on the reported statistics and the newly added figures illustrating individual-level trajectories to convey variability across participants.
- Several metrics are described in the results that are not mentioned in the methods or hypotheses (family-related language, past orientation). Were these exploratory?
Yes, the variables family-related language and past orientation were intended as exploratory analyses rather than primary hypothesis-driven outcomes. Our a priori hypotheses focused on narrative productivity (word count) and linguistic markers related to cognitive and integrative processing. In contrast, family-related language and past orientation were included as additional descriptive indicators to explore other potentially relevant dimensions of autobiographical narration. We have now clarified this distinction in the Methods section and revised the manuscript to explicitly label these variables as exploratory outcomes.
Discussion
- I found that the Discussion overstated the results. Although the authors tempered their language by noting that these findings were nonsignificant, they still interpreted nonsignificant results as if they were significant at multiple points. If this were a case study or a truly qualitative paper, this type of interpretation might be more appropriate. In the current scope, however, this level of interpretation felt unwarranted.
We agree that the previous Discussion may have placed excessive interpretative weight on non-significant findings. In response, we revised the Discussion to more clearly distinguish statistically supported results from exploratory observations. Interpretations of non-significant outcomes were substantially moderated, and stronger claims were replaced with cautious language such as “directional trends,” “tentative patterns,” and “should be interpreted cautiously.”
The increase in narrative productivity (word count) is now presented as the main quantitative finding, whereas the other LIWC variables are discussed only as exploratory indicators for future research. In line with the editors’ recommendations, we also added a qualitative narrative section based on manual analysis of the interviews. This allowed themes such as autobiographical vividness, identity reactivation, social belonging, and emotional communication to be grounded in participants’ narratives and excerpts rather than inferred from non-significant statistical results.
Reviewer 2 Report
Comments and Suggestions for Authors
This study uses a pre–post within-subjects design to evaluate the impact of listening to a personalized music playlist for 10 minutes on the retrieval of autobiographical memories, based on prompts to recall a story from childhood, adolescence, and adulthood over four sessions. Session 1 was a baseline covering all three time periods, and I assume the subsequent sessions proceeded in order of recall (childhood, adolescence, adulthood) for all participants.
The goal of this study was to evaluate the impact of music on autobiographical narration, assessed through linguistic characteristics including narrative production (word count), linguistic indicators of cognition (such as insight- and causation-related terms), markers of contrastive differentiation (such as exclusion words), and affective tone (positive and negative emotions).
This is an important line of research to help identify non-pharmacological ways to improve cognition especially in physically frail, institutionalized older adults. Their predictions were that listening to personalized music would lead to (1) increased verbal output and (2) increased cognitive processes and linguistic markers. Their only significant finding was an increase in verbal output, with other effects being identified as marginally significant.
I have a few procedural concerns about this study.
- Sample size and age range. The eleven participants ranged in age from 65 to 86, quite a span for a small sample. No data are available on how long the participants had been institutionalized, which may be a more important feature than overall age.
- Study design
- The baseline condition prompted narrative autobiographical memory (AM) from all three time periods. Recall sessions with the manipulation lasted 15 minutes; how long was the recall phase of the baseline condition (Session 1)?
- Might a more robust comparison to evaluate the impact of personalized music have included a 10 minute listening time of non-personalized music (or audio story from the relevant time period) in the baseline condition?
- The data from narrative AM in the experimental conditions (with personalized music) are combined to provide a median score for output, which was significantly different from baseline. It is not clear, however, whether word count increased across sessions. Might there be a confound with practice that is unrelated to music?
- Perhaps the study would benefit from a control condition where older adults are prompted similarly over the four sessions without any music. Or personalized vs non-personalized music.
- The title of the paper identifies narrative structure and identity as outcomes, but the only significant effect was on narrative productivity, and even that finding might be due to session order rather than personalized music.
While I think this is an important line of research, the study design leads to alternative conclusions unrelated to listening to personalized music.
Author Response
Reviewer 2
This study uses a pre–post within-subjects design to evaluate the impact of listening to a personalized music playlist for 10 minutes on the retrieval of autobiographical memories, based on prompts to recall a story from childhood, adolescence, and adulthood over four sessions. Session 1 was a baseline covering all three time periods, and I assume the subsequent sessions proceeded in order of recall (childhood, adolescence, adulthood) for all participants.
The goal of this study was to evaluate the impact of music on autobiographical narration, assessed through linguistic characteristics including narrative production (word count), linguistic indicators of cognition (such as insight- and causation-related terms), markers of contrastive differentiation (such as exclusion words), and affective tone (positive and negative emotions).
This is an important line of research to help identify non-pharmacological ways to improve cognition especially in physically frail, institutionalized older adults. Their predictions were that listening to personalized music would lead to (1) increased verbal output and (2) increased cognitive processes and linguistic markers. Their only significant finding was an increase in verbal output, with other effects being identified as marginally significant.
I have a few procedural concerns about this study.
- Sample size and age range. The eleven participants ranged in age from 65 to 86, quite a span for a small sample. No data are available on how long the participants had been institutionalized, which may be a more important feature than overall age.
We thank the reviewer for this important comment. We agree that the small sample size and broad age range limit the generalizability of the findings and increase heterogeneity within the sample. For this reason, the manuscript has now been more explicitly framed as an exploratory pilot study throughout.
With regard to duration of residence in the nursing home, we added some new clarifications as follows: “All participants had been residing in the nursing home for at least 10 years at the time of the study, indicating a shared long-term residential care context.”
- Study design
- The baseline condition prompted narrative autobiographical memory (AM) from all three time periods. Recall sessions with the manipulation lasted 15 minutes. How long was the recall phase of the baseline condition (Session 1)?
We have revised the Methods section to specify that no fixed time limit was imposed for responses during the baseline interview or during responses to individual cue words. Participants were encouraged to narrate autobiographical memories at their own pace in order to preserve the ecological and autobiographical nature of the task. The overall duration of the baseline session was longer than the single music-assisted recall sessions because it included narratives from all three life periods within one interview. We have now clarified this point in the revised manuscript.
- Might a more robust comparison to evaluate the impact of personalized music have included a 10 minute listening time of non-personalized music (or audio story from the relevant time period) in the baseline condition?
We agree that a more rigorous control condition including non-personalized music, neutral auditory stimulation, or autobiographical prompts without music would have strengthened causal inference and allowed more specific conclusions regarding the role of personalization. The present study was conceived as a preliminary feasibility pilot conducted within a residential care setting, and resource constraints limited the inclusion of additional control arms. We have now emphasized this issue more explicitly in the Limitations section and identified controlled comparisons (e.g., personalized vs. non-personalized music; music vs. no music) as a priority for future studies.
- The data from narrative AM in the experimental conditions (with personalized music) are combined to provide a median output score, which was significantly different from baseline. However, it is not clear whether word count increased across sessions. Might there be a practice confound that is unrelated to music?
We thank the reviewer for this important observation. We agree that aggregating the music-assisted sessions may obscure potential order or practice effects.
To address this concern, we conducted an additional descriptive analysis examining narrative productivity (word count) at each individual session and have included a new figure (Figure 2) illustrating session-level trajectories for each participant.
As shown in Figure 2, word count did not exhibit a progressive increase across the baseline (no music) sessions. Instead, narrative output showed a marked increase at the onset of the music-assisted condition and remained elevated across subsequent sessions.
Although these observations are descriptive and should be interpreted cautiously given the small sample size and exploratory design, this pattern does not appear consistent with a simple linear practice effect across repeated sessions.
We have now added this analysis to the Results section and clarified this point in the Discussion.
- Perhaps the study would benefit from a control condition in which older adults are prompted similarly over the four sessions without any music, or from a comparison of personalized vs. non-personalized music.
The absence of a control condition is a major limitation of the present study and prevents firm causal conclusions. We have strengthened the Limitations section accordingly and explicitly state that future research should include controlled and randomized designs comparing no-music, personalized music, and non-personalized music conditions. In addition, following the editors’ recommendations, we expanded the current manuscript by including an exploratory qualitative narrative section based on manual analysis of the interviews. Although this does not compensate for the lack of experimental control, it helps provide a richer understanding of how participants experienced and narrated music-assisted recall, highlighting communicative and autobiographical dimensions that can inform the design of future controlled studies.
- The title of the paper identifies narrative structure and identity as outcomes, but the only significant effect was on narrative productivity. And even that finding might be due to session order, not personalized music.
We agree that the previous title overstated the scope of the findings, particularly given that the only statistically significant result concerned narrative productivity. In response, we revised the title to better reflect the actual outcomes assessed and the exploratory nature of the study. The new title is “Personalized Music Listening and Autobiographical Narration in Nursing Home Residents: Linguistic and Qualitative Findings from a Pilot Study.”
We also revised the manuscript throughout to replace the term “narrative structure” with more precise expressions such as “linguistic features” and “narrative productivity,” and we adopted more cautious language regarding interpretation of the findings.
While I think this is an important line of research, the study design leads to alternative conclusions unrelated to listening to personalized music.
We substantially restructured the Discussion and Conclusions to reflect this issue more consistently and cautiously. Interpretations are now framed as preliminary and exploratory rather than causal, and we explicitly acknowledge the possible contribution of procedural and relational factors alongside music exposure.
Reviewer 3 Report
Comments and Suggestions for Authors
This manuscript investigates whether personalized music listening influences the linguistic and structural features of autobiographical narration in institutionalized older adults (N = 11). The topic is timely and theoretically interesting. The authors are also commended for applying FDR correction for multiple comparisons despite the small sample. However, several methodological and interpretive issues require substantive revision before the manuscript can be considered for publication.
- The most critical concern pertains to the structural confound between the baseline and post-intervention conditions. As described in the Methods, the baseline session asked participants to recall memories from all three life periods (childhood, adolescence, adulthood) in a single sitting, whereas the post-intervention phase consisted of three separate weekly sessions, each dedicated to one life period and each preceded by approximately 10 minutes of music listening followed by approximately 15 minutes of narration. The post-intervention linguistic indices were then computed by aggregating transcripts across these three sessions. This design introduces a fundamental asymmetry: the post-intervention condition afforded substantially more total elicitation time, more focused prompting per life period, greater interviewer familiarity, and cumulative practice effects from repeated narrative tasks. Consequently, the dramatic increase in word count (median 105 vs. 626), the study's primary significant finding, cannot be unambiguously attributed to music. It may simply reflect the fact that three dedicated sessions naturally yield more verbal output than one consolidated session. The authors acknowledge the absence of a control condition in the limitations, but this specific structural confound between session format and experimental condition is not explicitly addressed and represents a more fundamental threat to internal validity than a generic practice-effect caveat. I strongly recommend that the authors either (a) clarify the exact duration and structure of the baseline session to demonstrate comparability, or (b) reframe the word count finding with an explicit discussion of this confound and substantially temper the associated conclusions.
Relatedly, the manuscript at times employs causal or quasi-causal language that is not warranted by a single-group pre–post design without a control condition. For instance, the abstract states that "music-assisted recall produced a significant increase in total word count" (lines 27–28), and the discussion refers to music as "facilitating narrative activation" and "supporting more elaborated retrieval." The word "produced" implies a causal relationship that cannot be established from this design. Throughout the manuscript, the authors should systematically replace causal phrasing with associative or observational language (e.g., "was associated with," "was followed by," "coincided with") to accurately reflect the study's inferential limitations.
- A further statistical concern involves the interpretation of LIWC proportional measures in light of the large word count increase. LIWC computes the percentage of total words falling into each category, not absolute frequencies. When total word count increases approximately six-fold, proportional stability in a category (e.g., positive emotion words) does not necessarily mean that the absolute number of such words remained unchanged; it may have increased substantially in raw terms while remaining proportionally constant. The authors interpret unchanged proportions of emotion words as evidence that music did not alter emotional valence, but this interpretation conflates proportional and absolute change. I recommend that the authors either report absolute word counts for key LIWC categories alongside proportions, or explicitly discuss the proportional-versus-absolute distinction and its implications for the emotion-related findings.
- In Table 2, the median for positive emotion words appears to decline from 2.96% at baseline to 0.54% post-intervention, a notable drop in proportional terms, yet this is reported as non-significant (Z = -0.66, p = .557). While non-significance with N = 11 is understandable, the direction and magnitude of this median shift deserve comment, particularly given the theoretical interest in emotional valence. The authors should at minimum note this descriptive pattern and discuss whether it might reflect a floor effect, dilution by increased total output, or genuine inter-individual variability.
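The proportional-versus-absolute distinction and the possible dilution effect raised in the two comments above can be illustrated with simple arithmetic. The Python sketch below assumes a LIWC-style score computed as category words divided by total words, multiplied by 100; the function name `liwc_percent` and all word counts are hypothetical and do not come from the study.

```python
# Illustrative arithmetic only: every word count below is hypothetical,
# not data from the study.

def liwc_percent(category_count: int, total_words: int) -> float:
    """Percentage of total words falling in a category (LIWC-style proportional score)."""
    return 100.0 * category_count / total_words

# Hypothetical baseline narrative: 100 words, 3 of them positive-emotion words.
baseline = liwc_percent(3, 100)   # 3.0%

# Hypothetical post-intervention narrative, six times longer (600 words).
# Case A: positive-emotion words grow with total output.
# The percentage is unchanged, but the raw count rises from 3 to 18.
scaled = liwc_percent(18, 600)    # 3.0%

# Case B: the same 3 positive-emotion words in a longer narrative.
# The percentage falls (dilution) with no change in the raw count.
diluted = liwc_percent(3, 600)    # 0.5%

print(f"baseline={baseline:.1f}%  scaled={scaled:.1f}%  diluted={diluted:.1f}%")
```

With a six-fold increase in total words, an unchanged percentage (case A) can hide a six-fold rise in raw counts, while a falling percentage (case B) can arise with no change at all in the number of emotion words.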
- The title includes the term "Identity," yet identity was not directly measured in this study. The connection between linguistic markers and narrative identity is entirely inferential, mediated through LIWC categories that serve as indirect proxies at best. While the theoretical rationale linking cognitive-process language and exclusion markers to integrative processing is reasonable, the manuscript should either (a) modify the title to more accurately reflect what was measured (e.g., replacing "Identity" with "Narrative Structure" or "Linguistic Features"), or (b) more explicitly acknowledge in the discussion that identity-related claims remain speculative and are not directly supported by the current data.
- Several additional points merit attention. First, sample characteristics are notably skewed (9 males, 2 females), which the authors mention briefly in the limitations but do not discuss in terms of how gender-related differences in verbal expressiveness, emotional language use, or narrative style might have influenced the LIWC results. Second, the MMSE range (19.0–29.5) is broad and spans from borderline cognitive impairment to normal cognition; given that cognitive status likely moderates both music responsiveness and narrative productivity, even exploratory analyses examining whether MMSE scores correlate with the magnitude of pre–post changes would strengthen the manuscript. Third, the supplementary material provides the playlists, which is appreciated, but the semi-structured interview procedure used to develop them is not described in sufficient detail for replication; the authors should specify the questions asked, the criteria for song selection, and how disagreements between participant preferences and clinical judgment were resolved. Fourth, inter-rater reliability for transcription is not reported, nor is there any mention of how interviewer prompts or conversational scaffolding during the narrative sessions were standardized or controlled.
- Finally, the manuscript would benefit from a more balanced discussion section. The current discussion, while intellectually rich, tends to elaborate extensively on theoretical implications of trends that did not reach statistical significance (cognitive processes, exclusion markers, temporal orientation), sometimes in language that reads as though these were confirmed findings. Given that only word count survived FDR correction, the discussion of non-significant trends should be more concise and consistently framed as speculative, with clearer signposting that these patterns await confirmation in adequately powered designs.
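The notion of a result “surviving” FDR correction can be sketched with the Benjamini-Hochberg step-up procedure, the most widely used FDR method. This report does not say which FDR procedure the authors applied, so Benjamini-Hochberg is an assumption here, and the p-values below are hypothetical rather than the study's results.

```python
from typing import List

def benjamini_hochberg(p_values: List[float], q: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg step-up procedure.

    Returns one boolean per input p-value: True if that hypothesis is
    rejected while controlling the false discovery rate at level q.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * q.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k (step-up rule).
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected

# Hypothetical p-values for five outcome measures (illustrative only).
p = [0.003, 0.04, 0.06, 0.21, 0.48]
print(benjamini_hochberg(p))
```

In this illustration, the second p-value (.04) is nominally below .05 but exceeds its rank-specific threshold (2/5 × .05 = .02), so only the smallest p-value survives correction; this is the kind of pattern that leaves secondary outcomes described as trends.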
Author Response
Reviewer 3
This manuscript investigates whether personalized music listening influences the linguistic and structural features of autobiographical narration in institutionalized older adults (N = 11). The topic is timely and theoretically interesting. The authors are also commended for applying FDR correction for multiple comparisons despite the small sample. However, several methodological and interpretive issues require substantive revision before the manuscript can be considered for publication.
- The most critical concern pertains to the structural confound between the baseline and post-intervention conditions. As described in the Methods, the baseline session asked participants to recall memories from all three life periods (childhood, adolescence, adulthood) in a single sitting, whereas the post-intervention phase consisted of three separate weekly sessions, each dedicated to one life period and each preceded by approximately 10 minutes of music listening followed by approximately 15 minutes of narration. The post-intervention linguistic indices were then computed by aggregating transcripts across these three sessions. This design introduces a fundamental asymmetry: the post-intervention condition afforded substantially more total elicitation time, more focused prompting per life period, greater interviewer familiarity, and cumulative practice effects from repeated narrative tasks. Consequently, the dramatic increase in word count (median 105 vs. 626) — the study's primary significant finding — cannot be unambiguously attributed to music. It may simply reflect the fact that three dedicated sessions naturally yield more verbal output than one consolidated session. The authors acknowledge the absence of a control condition in the limitations, but this specific structural confound between session format and experimental condition is not explicitly addressed and represents a more fundamental threat to internal validity than a generic practice-effect caveat. I strongly recommend that the authors either (a) clarify the exact duration and structure of the baseline session to demonstrate comparability, or (b) reframe the word count finding with an explicit discussion of this confound and substantially temper the associated conclusions.
Relatedly, the manuscript at times employs causal or quasi-causal language that is not warranted by a single-group pre–post design without a control condition. For instance, the abstract states that "music-assisted recall produced a significant increase in total word count" (lines 27–28), and the discussion refers to music as "facilitating narrative activation" and "supporting more elaborated retrieval." The word "produced" implies a causal relationship that cannot be established from this design. Throughout the manuscript, the authors should systematically replace causal phrasing with associative or observational language (e.g., "was associated with," "was followed by," "coincided with") to accurately reflect the study's inferential limitations.
We sincerely thank the reviewer for this thoughtful and important comment. We agree that the asymmetry between the baseline session (three life periods assessed within one session) and the post-intervention phase (three separate weekly sessions, each focused on one life period) represents a substantial methodological limitation that may affect interpretation of the word-count findings. In response, we have revised the manuscript to explicitly acknowledge this structural confound as a central threat to internal validity, beyond a generic practice-effect explanation.
Specifically, we now clarify that the higher verbal output observed in the music-assisted condition may reflect multiple factors, including greater total elicitation time, reduced task demands through separation of life periods across sessions, increasing familiarity with the interviewer, and repeated autobiographical practice, in addition to any possible contribution of music. We therefore substantially tempered the interpretation of the word-count effect throughout the Abstract, Discussion, Limitations, and Conclusions.
We also systematically revised causal or quasi-causal wording (e.g., “produced,” “facilitating,” “supporting more elaborated retrieval”) and replaced it with more appropriate associative language such as “was associated with,” “coincided with,” or “may reflect.” These changes are highlighted in yellow in the revised manuscript.
- A further statistical concern involves the interpretation of LIWC proportional measures in light of the large word count increase. LIWC computes the percentage of total words falling into each category, not absolute frequencies. When total word count increases approximately six-fold, proportional stability in a category (e.g., positive emotion words) does not necessarily mean that the absolute number of such words remained unchanged; it may have increased substantially in raw terms while remaining proportionally constant. The authors interpret unchanged proportions of emotion words as evidence that music did not alter emotional valence, but this interpretation conflates proportional and absolute change. I recommend that the authors either report absolute word counts for key LIWC categories alongside proportions, or explicitly discuss the proportional-versus-absolute distinction and its implications for the emotion-related findings.
We thank the reviewer for this important and insightful observation. We agree that LIWC variables represent proportional measures and that, in the context of a substantial increase in total word count, proportional stability does not necessarily imply stability in absolute frequency.
In response, we have revised the manuscript to clarify this distinction and to avoid overinterpretation of unchanged proportions of emotion-related words. Specifically, we now acknowledge that, although the relative proportion of positive and negative emotion words did not change significantly, their absolute frequency may have increased in parallel with the overall increase in narrative output.
We have therefore tempered our interpretation and now describe these findings more cautiously, emphasizing that the data do not support a clear change in emotional valence at the proportional level, while recognizing that absolute emotional expression may have varied.
These clarifications have been incorporated in the Results and Discussion sections.
- In Table 2, the median for positive emotion words appears to decline from 2.96% at baseline to 0.54% post-intervention — a notable drop in proportional terms — yet this is reported as non-significant (Z = -0.66, p = .557). While non-significance with N = 11 is understandable, the direction and magnitude of this median shift deserve comment, particularly given the theoretical interest in emotional valence. The authors should at minimum note this descriptive pattern and discuss whether it might reflect a floor effect, dilution by increased total output, or genuine inter-individual variability.
We thank the reviewer for this thoughtful observation. We agree that, despite the absence of statistical significance, the descriptive decrease in the median proportion of positive emotion words (from 2.96% to 0.54%) warrants comment given the theoretical relevance of emotional valence. In response, we have revised the Results and Discussion sections to explicitly acknowledge this pattern.
We now note that, in the context of the small sample size and substantial increase in total word count, this proportional reduction may reflect several non-mutually exclusive factors, including dilution effects due to increased narrative output, marked inter-individual variability, or differences in the thematic focus of recalled memories across sessions. Because the study was not powered to detect subtle changes in emotional language, these descriptive trends should be interpreted cautiously and considered hypothesis-generating rather than conclusive. These changes are highlighted in yellow in the revised manuscript.
- The title includes the term "Identity," yet identity was not directly measured in this study. The connection between linguistic markers and narrative identity is entirely inferential, mediated through LIWC categories that serve as indirect proxies at best. While the theoretical rationale linking cognitive-process language and exclusion markers to integrative processing is reasonable, the manuscript should either (a) modify the title to more accurately reflect what was measured (e.g., replacing "Identity" with "Narrative Structure" or "Linguistic Features"), or (b) more explicitly acknowledge in the discussion that identity-related claims remain speculative and are not directly supported by the current data.
We revised the title as suggested: “Personalized Music Listening and Autobiographical Narration in Nursing Home Residents: Linguistic and Qualitative Findings from a Pilot Study.” These changes are highlighted in yellow in the revised manuscript. In addition, with the inclusion of the new qualitative narrative section, references to identity are now grounded primarily in participants’ autobiographical accounts (e.g., former roles, social belonging, meaningful self-images) rather than inferred solely from LIWC markers. Nevertheless, we explicitly frame these aspects as interpretative and exploratory rather than directly measured outcomes.
- Several additional points merit attention. First, sample characteristics are notably skewed (9 males, 2 females), which the authors mention briefly in the limitations but do not discuss in terms of how gender-related differences in verbal expressiveness, emotional language use, or narrative style might have influenced the LIWC results. Second, the MMSE range (19.0–29.5) is broad and spans from borderline cognitive impairment to normal cognition; given that cognitive status likely moderates both music responsiveness and narrative productivity, even exploratory analyses examining whether MMSE scores correlate with the magnitude of pre–post changes would strengthen the manuscript. Third, the supplementary material provides the playlists, which is appreciated, but the semi-structured interview procedure used to develop them is not described in sufficient detail for replication — the authors should specify the questions asked, the criteria for song selection, and how disagreements between participant preferences and clinical judgment were resolved. Fourth, inter-rater reliability for transcription is not reported, nor is there any mention of how interviewer prompts or conversational scaffolding during the narrative sessions were standardized or controlled.
We thank the reviewer for these thoughtful and constructive observations.
Gender imbalance: We have expanded the Limitations section to more explicitly discuss the potential impact of gender imbalance (9 males, 2 females), including possible differences in verbal expressiveness, emotional language use, and autobiographical narrative style. We now acknowledge that these factors may have influenced the observed linguistic patterns and limit generalizability.
Cognitive status (MMSE): We agree that cognitive status may play a moderating role. Given the small sample size (N = 11), we did not conduct correlational analyses between MMSE scores and outcome measures, as such analyses would be underpowered and potentially misleading. Instead, we have clarified the MMSE inclusion range and discussed its possible moderating role in the Limitations section, identifying this as an important direction for future research.
Playlist development procedure: We have expanded the Methods section to provide a more detailed description of the semi-structured interview used to develop the personalized playlists. This now includes the domains explored (e.g., preferred artists, genres, meaningful songs, and music associated with salient life periods), as well as the collaborative refinement process involving the facility’s music therapist.
Transcription and interviewer standardization: We have clarified the transcription procedures and, where applicable, added information regarding consistency in interviewer prompts and narrative elicitation. We also acknowledge in the Limitations section that formal inter-rater reliability was not assessed and that interviewer scaffolding may have influenced narrative production.
These revisions are reflected in the Methods and Limitations sections and are highlighted in the revised manuscript.
- Finally, the manuscript would benefit from a more balanced discussion section. The current discussion, while intellectually rich, tends to elaborate extensively on theoretical implications of trends that did not reach statistical significance (cognitive processes, exclusion markers, temporal orientation), sometimes in language that reads as though these were confirmed findings. Given that only word count survived FDR correction, the discussion of non-significant trends should be more concise and consistently framed as speculative, with clearer signposting that these patterns await confirmation in adequately powered designs.
We substantially revised and shortened the Discussion to improve balance and proportionality. The increase in word count, as the only result surviving FDR correction, is now clearly presented as the primary quantitative finding. Interpretations of cognitive processes, exclusion markers, and temporal orientation were condensed and consistently reframed as tentative, descriptive, and hypothesis-generating. Clearer signposting was added throughout to indicate that these trends require confirmation in larger and adequately powered controlled studies.
In parallel, as suggested also by the Editors, part of the interpretative focus was relocated to the newly added qualitative narrative section, where themes such as autobiographical vividness, identity reactivation, and emotional communication are grounded in participants’ narratives rather than in non-significant statistical effects.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The manuscript is much improved, and the authors have satisfactorily addressed my primary concerns. In particular, I appreciated the authors' reframing of the manuscript as a pilot study, with an emphasis on individual change and qualitative findings, while not overstating results.
Author Response
We sincerely thank Reviewer 1 for the careful and constructive evaluation throughout the review process.
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have addressed all of my concerns in this revision. The paper is clearer in its scope and contributions to the literature.
Author Response
We sincerely thank Reviewer 2 for the careful evaluation of the manuscript throughout the review process.
Reviewer 3 Report
Comments and Suggestions for Authors
The authors have engaged seriously with the first-round review and most of the previously raised concerns are now adequately addressed in the manuscript body. My remaining concerns focus on residual issues that were raised in the first round but only partially resolved, as well as one new issue introduced during revision.
First, the most fundamental concern from the first round, namely the structural confound between the single-session baseline and the three-session post-intervention condition, has been acknowledged but not fully resolved. The authors now state in the Discussion (lines 381–385) that "the post‐intervention condition involved repeated sessions whereas baseline narratives were collected in a single session" and that "greater verbal output may therefore reflect not only music‐related processes, but also increased familiarity with the task, repeated practice, or reduced cognitive load across sessions." This is an improvement over the original manuscript. However, the Abstract (lines 31–33) still states that "music-assisted recall was associated with a significant increase in total word count compared with baseline, indicating greater narrative productivity," which presents the word count finding as a substantive outcome without flagging the structural confound. The phrase "indicating greater narrative productivity" is particularly problematic, because it presupposes that the increase reflects a meaningful productivity change rather than the mechanical consequence of having three dedicated sessions instead of one consolidated session. The Abstract should be revised to explicitly note this structural asymmetry, or at minimum to soften the inferential claim (e.g., "was accompanied by higher total word count, although this difference may partly reflect the greater elicitation time afforded by the three-session post-intervention format"). Similarly, the Conclusions section (lines 516–522) still refers to "a marked increase in narrative productivity" without flagging the confound, and the session-level descriptive trend analysis added in Section 3.1.1 (lines 278–289), while a constructive addition, somewhat overstates what the data can support.
The claim that "this pattern does not appear consistent with a simple linear practice effect" addresses only one alternative explanation (linear practice across sessions) but does not address the more fundamental confound, which is that the three post-intervention sessions involve dedicated single-period prompting whereas the baseline session compressed all three life periods into a single sitting. This structural difference would plausibly produce a step-change in word output at the transition between conditions regardless of any music effect, simply because of the change in session format and elicitation time per period. The authors should acknowledge this more directly.
Second, the response regarding MMSE moderation is reasonable in principle but the implementation in the manuscript is minimal. The authors decline to run correlational analyses on the grounds of low power, which is defensible, but the Limitations section (lines 504–507) only briefly notes that "baseline cognitive functioning may also moderate responsiveness to music cues and narrative output." Given the broad MMSE range (19.0–29.5), spanning from borderline impairment to high-normal cognition, even a brief descriptive observation—for example, whether the magnitude of word count increase appeared visually associated with MMSE in Figure 1—would be informative for readers. The current treatment leaves the question entirely to future research without providing any descriptive context from the present sample.
Third, the response to the inter-rater reliability point is partially addressed but somewhat unsatisfying. The Methods section (lines 212–214) now states: "All transcripts were reviewed for accuracy prior to linguistic analysis. Formal inter-rater reliability coefficients were not computed." This is honest, but the question of how interviewer prompts and conversational scaffolding were standardized across sessions remains underspecified. The authors mention that "interviewers minimized additional prompting and adopted a supportive, non-directive stance" (lines 178–180), but it is not clear whether the same interviewer conducted all sessions for a given participant, whether interviewer training or calibration occurred, or whether there were any procedural differences between baseline and intervention sessions in how prompts were delivered. Given that interviewer behavior is one plausible alternative explanation for increased verbal output, even a brief clarification on these points would strengthen the manuscript.
Fourth, a new minor inconsistency appears to have been introduced during revision. The Abstract states that "qualitative findings suggested that music-assisted narratives were often characterized by more vivid autobiographical scenes, reactivation of meaningful social identities and former life roles, and emotionally salient communication" (lines 33–36). However, the qualitative analysis was conducted only on the music-assisted narratives and not comparatively on the baseline narratives (Methods section 2.3, lines 233–236, describes this as "an illustrative, conceptually guided reading of the narratives rather than as a formal qualitative analysis," applied to post-intervention transcripts). The Abstract phrasing implies a comparative qualitative finding ("more vivid... than baseline") that was not actually conducted. The authors should either revise the Abstract to clarify that the qualitative observations describe features of the music-assisted narratives without comparison to baseline, or extend the qualitative analysis to include baseline narratives as a comparison reference. The current phrasing risks overstating the comparative basis of the qualitative findings.
Fifth, a few minor issues warrant attention. The Supplementary Materials section (lines 531–532) still contains placeholder text from the journal template ("Figure S1: title; Table S1: title; Video S1: title") rather than actual descriptions of the supplementary content; this should be updated before publication. The Author Contributions section (lines 533–539) similarly contains residual instructional template language ("For research articles with several authors, a short paragraph specifying their individual contributions must be provided. The following statements should be used...") that should be removed. These appear to be oversights from the revision process rather than substantive issues.
Author Response
The authors have engaged seriously with the first-round review and most of the previously raised concerns are now adequately addressed in the manuscript body. My remaining concerns focus on residual issues that were raised in the first round but only partially resolved, as well as one new issue introduced during revision.
First, the most fundamental concern from the first round, namely the structural confound between the single-session baseline and the three-session post-intervention condition, has been acknowledged but not fully resolved. The authors now state in the Discussion (lines 381–385) that "the post‐intervention condition involved repeated sessions whereas baseline narratives were collected in a single session" and that "greater verbal output may therefore reflect not only music‐related processes, but also increased familiarity with the task, repeated practice, or reduced cognitive load across sessions." This is an improvement over the original manuscript. However, the Abstract (lines 31–33) still states that "music-assisted recall was associated with a significant increase in total word count compared with baseline, indicating greater narrative productivity," which presents the word count finding as a substantive outcome without flagging the structural confound. The phrase "indicating greater narrative productivity" is particularly problematic, because it presupposes that the increase reflects a meaningful productivity change rather than the mechanical consequence of having three dedicated sessions instead of one consolidated session. The Abstract should be revised to explicitly note this structural asymmetry, or at minimum to soften the inferential claim (e.g., "was accompanied by higher total word count, although this difference may partly reflect the greater elicitation time afforded by the three-session post-intervention format"). Similarly, the Conclusions section (lines 516–522) still refers to "a marked increase in narrative productivity" without flagging the confound, and the session-level descriptive trend analysis added in Section 3.1.1 (lines 278–289), while a constructive addition, somewhat overstates what the data can support.
The claim that "this pattern does not appear consistent with a simple linear practice effect" addresses only one alternative explanation (linear practice across sessions) but does not address the more fundamental confound, which is that the three post-intervention sessions involve dedicated single-period prompting whereas the baseline session compressed all three life periods into a single sitting. This structural difference would plausibly produce a step-change in word output at the transition between conditions regardless of any music effect, simply because of the change in session format and elicitation time per period. The authors should acknowledge this more directly.
We thank the Reviewer for these important observations. In response, we revised the Abstract to soften the interpretation of the word-count finding and explicitly note that the post-intervention condition involved three separate sessions whereas baseline narratives were collected within a single session. We therefore no longer describe the increase in word count as directly “indicating greater narrative productivity.”
We also revised Section 3.1.1 to clarify that the descriptive session-level trends cannot disentangle potential music-related effects from the structural differences between conditions. In particular, we now explicitly acknowledge that the transition from a single consolidated baseline session to dedicated post-intervention sessions for each life period may itself have contributed to the increase in verbal output independently of music exposure.
In the Discussion, we further expanded the methodological caveat by specifying that the baseline condition compressed autobiographical recall across all three life periods into one session, whereas the post-intervention condition distributed these periods across separate sessions, thereby increasing elicitation space and potentially facilitating longer narratives independently of music effects.
Finally, we revised the Conclusions section to replace the expression “marked increase in narrative productivity” with more cautious wording focused on “higher narrative/verbal output.”
Second, the response regarding MMSE moderation is reasonable in principle, but the implementation in the manuscript is minimal. The authors decline to run correlational analyses on the grounds of low power, which is defensible, but the Limitations section (lines 504–507) only briefly notes that "baseline cognitive functioning may also moderate responsiveness to music cues and narrative output." Given the broad MMSE range (19.0–29.5), spanning from borderline impairment to high-normal cognition, even a brief descriptive observation (for example, whether the magnitude of word count increase appeared visually associated with MMSE in Figure 1) would be informative for readers. The current treatment leaves the question entirely to future research without providing any descriptive context from the present sample.
We thank the Reviewer for this helpful suggestion. We agree that, despite the limited statistical power of the present pilot sample, providing descriptive context regarding the possible role of baseline cognitive functioning improves the interpretability of the findings.
In response, we added a descriptive observation in the Results section noting that increases in narrative output were observed across participants with varying levels of MMSE performance, including individuals with relatively lower MMSE scores. We also expanded the Limitations section to clarify that participants across the observed MMSE range (19.0–29.5) generally showed increased verbal output in the music-assisted condition, while emphasizing that the small sample size precludes any reliable conclusions regarding cognitive moderation effects.
We intentionally avoided formal correlational analyses because the sample size was insufficient to support stable inferential interpretation, and we wished to avoid overinterpreting exploratory associations.
Third, the response to the inter-rater reliability point is partially addressed but somewhat unsatisfying. The Methods section (lines 212–214) now states: "All transcripts were reviewed for accuracy prior to linguistic analysis. Formal inter-rater reliability coefficients were not computed." This is honest, but the question of how interviewer prompts and conversational scaffolding were standardized across sessions remains underspecified. The authors mention that "interviewers minimized additional prompting and adopted a supportive, non-directive stance" (lines 178–180), but it is not clear whether the same interviewer conducted all sessions for a given participant, whether interviewer training or calibration occurred, or whether there were any procedural differences between baseline and intervention sessions in how prompts were delivered. Given that interviewer behavior is one plausible alternative explanation for increased verbal output, even a brief clarification on these points would strengthen the manuscript.
We thank the Reviewer for this important methodological observation and agree that interviewer behavior represents a potentially relevant source of variability, particularly in relation to narrative productivity.
In response, we expanded the Methods section to clarify the procedures used to standardize autobiographical interviews across sessions. Specifically, all baseline and music-assisted sessions for all participants were conducted by the same interviewer to maximize procedural consistency and minimize interpersonal variability. Importantly, no procedural differences were introduced between baseline and music-assisted sessions in the manner or frequency of prompts.
Fourth, a new minor inconsistency appears to have been introduced during revision. The Abstract states that "qualitative findings suggested that music-assisted narratives were often characterized by more vivid autobiographical scenes, reactivation of meaningful social identities and former life roles, and emotionally salient communication" (lines 33–36). However, the qualitative analysis was conducted only on the music-assisted narratives and not comparatively on the baseline narratives (Methods section 2.3, lines 233–236, describes this as "an illustrative, conceptually guided reading of the narratives rather than as a formal qualitative analysis," applied to post-intervention transcripts). The Abstract phrasing implies a comparative qualitative finding ("more vivid... than baseline") that was not actually conducted. The authors should either revise the Abstract to clarify that the qualitative observations describe features of the music-assisted narratives without comparison to baseline, or extend the qualitative analysis to include baseline narratives as a comparison reference. The current phrasing risks overstating the comparative basis of the qualitative findings.
We revised the Abstract to clarify that the qualitative component consisted of exploratory observations describing recurrent features of the music-assisted narratives only, without direct qualitative comparison to baseline narratives. Specifically, we replaced the previous phrasing suggesting “more vivid” narratives with more neutral descriptive language focused on the presence of vivid autobiographical scenes, references to meaningful social identities and former life roles, and emotionally salient communication within the music-assisted accounts.
To ensure consistency throughout the manuscript, we also revised portions of the Discussion where the wording could be interpreted as implying qualitative baseline–post comparisons beyond the scope of the analyses actually performed.
Fifth, a few minor issues warrant attention. The Supplementary Materials section (lines 531–532) still contains placeholder text from the journal template ("Figure S1: title; Table S1: title; Video S1: title") rather than actual descriptions of the supplementary content; this should be updated before publication. The Author Contributions section (lines 533–539) similarly contains residual instructional template language ("For research articles with several authors, a short paragraph specifying their individual contributions must be provided. The following statements should be used...") that should be removed. These appear to be oversights from the revision process rather than substantive issues.
We thank the Reviewer for carefully identifying these formatting oversights. The placeholder text remaining from the journal template in the Supplementary Materials and Author Contributions sections has now been removed from the revised manuscript. The supplementary file itself already contained the intended supplementary material and has been retained unchanged.