Reading Between the Lines: Digital Annotation Insights from Heritage and L2 Learners
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This is a very interesting study about using social digital annotation (SDA) in the Spanish L2/heritage language classroom. The data-set under investigation offers important insights into how Spanish heritage language (HL) learners engage with a digital platform that allows them to collaboratively annotate their reading assignments. The analysis of the types of cognitive and interactive strategies used in the participant annotations is particularly insightful. Together with the data on participant engagement with the platform and the subsequent participant survey, the study indicates that SDA supports HL learners in developing specific aspects of their academic reading skills.
Despite the relevance of this study, the article has to be significantly improved in terms of (1) the types of research questions that can be answered on the basis of the existing data-set, (2) the stringency of the argument and (3) the empirical methodology employed to answer the research questions. Since the data-set itself is valuable, I think that these issues can be solved, albeit with considerable changes to the manuscript.
The most prevalent issue that, from my point of view, makes the study unpublishable in its current state concerns a statistically invalid comparison of data from L2 learners (N=4!) with data from HL learners (N=36). From my point of view, the analysis of the data from the HL learners is very valuable in its own right, and I strongly suggest that the authors not engage in the comparison of L2 vs. HL learners at all (they might report the results from the L2 speakers in comparison to the HL learners in a supplementary exploratory study, or look at the group as a whole and indicate verbally in which instances the L2 learners behave differently from their HL peers). Also, the authors are completely intransparent about the raw frequencies of nearly all quantitative measures (i.e., engagement with the platform, raw frequencies of annotations per group and category of analysis, distribution of the answers to the end-of-term survey); instead, they present mostly means or proportions. This does not allow the reader to retrace the analyses and evaluate the soundness of the conclusions drawn from them.
Another weak point of this study concerns the lack of grounding, in the literature, of the qualitative analysis of annotations in terms of cognitive and interpersonal strategies. How did the authors arrive at these categories? To what extent are these examples of cognitive (reading) strategies? To what extent do they indicate that students (a) cognitively engage with the text and (b) collaborate within a “so-called” community of readers?
Both of these issues should be addressed in a revised version of this paper to be considered for publication.
In the following, I give some comments per section. They summarize and complement my comments in the attached pdf-file.
Introduction:
- Overall comments:
- This section contains important information on what DSR and SDA are and why they might be useful methods in reading development in on- and offline instruction contexts. However, the section should be more clearly structured in order to ground the research questions which are proposed at the end of the section more thoroughly in the existing academic discourse.
- It remains unclear whether the sections reviewing findings from empirical studies refer to L1 or L2 instruction and/or to on- or offline instruction contexts. This should be made clear more systematically.
- Also, the section on literacy and reading development in heritage language (HL) learners needs to be revised as it currently focusses a lot on the differentiation between L2 learners and HL-learners.
- Finally, I miss a section that provides a theoretical and/or empirical basis for the categories of cognitive processes and interaction types that are investigated in the content analysis of the annotations. On the basis of what type of literature were these categories established?
- Lines 60 – 101:
- Here the authors provide a relevant review of empirical studies on the use of DSR, specifically SDA, in reading instruction. It remains unclear whether the studies you present refer specifically to L2 learning or are more general. I think this should be clarified.
- Currently the authors structure this section on the basis of individual studies. This leads to redundancies and a lack of overview at the end. Instead, the authors could condense the section by structuring it along the lines of findings and affordances and refer to several empirical studies at the same time.
- Lines 102 – 114:
- This section focusses on studies investigating DSR in L2 Spanish instruction. Why this focus? Do the authors really expect differences between different L2 target languages? If this is not the case, I suggest summarizing findings of studies of DSR in L2 reading instruction (in case they are any different from the studies presented in the previous section).
- Lines 115-152:
- This section should be framed slightly differently once the problematic statistical comparison between HL- and L2-learners is removed.
- Lines 168-189:
- The stance of this section does not really fit here. I get the sense that the authors are already discussing their results and their implications. I suggest integrating most of the information in this section into the conclusion and focusing on aspects that explain why the study was set up the way it was.
- Lines 190-200:
- In how far are research question 1 and 2 different from one another?
- The subquestion of RQ2 on differences between HL learners and L2 learners cannot be answered in a statistically meaningful way on the basis of this data-set. I recommend that the authors focus on the types of cognitive strategies displayed by HL learners.
Materials and Methods:
- Overall comment: This section gives important information on the methods employed but should be structured more systematically. I suggest the authors follow APA guidelines for structuring methods sections (1. Participants, 2. Materials (including intervention), 3. Procedure). Also, a section on research ethics and informed consent should be included somewhere in this section.
- Participants: We do not get enough information about the participant group. Please also report the CEFR-level of the course (or the individual course participants).
- Materials: We also do not find out how the annotation task was prompted by the tutor. It would be helpful to provide the “example of effective annotation” that the tutor gave the students at the beginning of the course. And how did the participants fill in the end-of-semester survey?
- Analysis: Please give more information on the thematic analysis. How were the thematic categories developed? Provide us with the coding manual (in appendix).
Results:
- General comments: The results should be structured according to the research questions. This is currently not the case. The authors should provide raw frequencies and descriptives for all data that they use to answer the individual questions.
- Section 3.1:
- Please provide (raw) frequencies of the total set of annotations that were analysed and then the frequencies per category both for TI and PI (you could do so in an overview table). Also present the categories in order of frequency.
- Section 3.2:
- Leave out the statistical comparison of L2 learners with HL learners. Instead, you could potentially conduct within-group comparisons of the distribution of TIs vs. PIs across categories, although, judging from the figures, I doubt that you will find significant differences.
- Section 3.3:
- Provide more extensive descriptive information on the outcomes of the end-of-term survey. (Just a mean is not enough. What is the mean based on? What was the range/SD?)
- It would also help to provide, in the results table (here or in the methods section), the exact statements that participants responded to.
- Finally, integrate the student quotes that you provide at the end of the section into the preceding text in which you describe and interpret the data.
Discussion:
- Overall comment: Some of this section is more an overall conclusion (first section) or a description of the results (ll. 628-644) than a discussion of the findings. The authors should discuss the outcomes concerning each research question individually. Also, I suggest an overall discussion of findings here, followed by the limitations.
Conclusion:
- Overall comment: The authors make some important points here. See my comments on the introduction section concerning the relevance of this study, which, from my point of view, should actually go here.
- Lines 661-684: The authors make some important recommendations here concerning the use of DSR and SDA in the classroom. However, I think it would be important to clearly delineate which recommendations are concretely supported by your own findings and which are more general and perhaps related to your own experiences in the project.
Limitations:
- Overall comments: Important points are being made here, but I suggest including this section as a subsection of the discussion. That way, the overall conclusion and the recommendations for practice provide a better ending. Please find attached my review report and recommendations for substantial revision.
- Ethical concerns:
The empirical component of this study is highly problematic, because the authors hardly ever give information on the absolute frequencies of their findings. It gives the impression that the authors use means and percentage scores to disguise the thin quantitative empirical basis on which they build their conclusions. This is presumably not their intention. Especially the statistical comparison of the L2 learners (N=4) with the HL learners (N=36) is misleading and invalid. This is a shame, as the results are, from a more qualitative point of view, quite interesting and valuable in their own right. From reading the paper, I got the impression that the authors might not be well trained in analysing, interpreting and reporting quantitative data. They might not be aware that absolute numbers should be reported, especially when dealing with small data-sets. They might also not have had good advice with regard to the statistical tests they performed. I hence suggest that they seek advice on how best and most transparently to present the quantitative data in their study when revising the article.
Comments for author File: Comments.pdf
Author Response
Please see the attachment
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Reading Between the Lines: Digital Annotation Insights from Heritage and L2 Learners
A brief summary: This article examines text engagement, reading behaviors, and peer interaction through digital annotations on the platform Perusall in two sections of an advanced Spanish language class. The participants in this study are 40 undergraduate university students representing two groups of learners: Spanish heritage language learners (SHL) and second language learners (L2). The results indicate that while both groups of learners showed equal levels of interaction with the reading materials and their peers, the L2 group appears to rely more heavily on questioning strategies and translations, while the SHL learners prefer strategies of evaluation and of connecting the readings with their everyday (language) experiences. For the authors, these results underline the usefulness of annotation platforms in the creation of meaningful and collaborative learning environments.
General concept comments
Overall, the study is well embedded in previous literature, and the research questions – clearly identified – are soundly motivated from the context. The results are presented logically and concisely. The interpretation is supported by the results. The study adds important insights to several areas of research. It examines the use of digital annotation platforms (here: Perusall) in the asynchronous classroom as a means to cognitively and socially engage with the text and the other course members. Moreover, it sheds light on the use of this technology in a language learning context and the collaborative classroom. Another merit of the study is that it highlights the abilities and strategies employed by learners in interacting with texts in the second language. Especially insightful is the focus on the strategies used by heritage learners of Spanish. The results here underline their crucial role in language learning classrooms and in their interactions with the group of L2 Spanish learners.
As the authors state in Section 6 (Limitations and Directions for Future Research), a weakness of the study is the small number of participants. Whereas the number of heritage learners is above 35 (this can be inferred from the percentages offered on p. 5, l. 220-222), the L2 group is very small. It is, however, comparable in size and scope to other studies in this research area (e.g., Zapata & Mesa-Morales 2018).
In sum, this study is an important addition to a small but growing area of research. As the authors state, to date there are very few studies examining the use of digital annotation in the online language learning classroom. Given the increase in online learning offerings in higher education and the increasing number of heritage Spanish speakers in particular attending colleges, studies such as this one will offer important insights into online learning, digital tools, and student engagement.
Specific comments:
p. 3, lines 115 – 123: In line 118 it says: “these students in U.S. universities”. Do you mean here both HS and L2 learners? While it is true that both groups attend universities, would you say that both groups of learners are growing? Could you reinforce this with some data (e.g., Pew Research Center, MLA)? Along the same lines, do you have any data on the use of digital annotation platforms in courses at universities? Adding this information could make your argument more relevant to the larger context.
p. 3, lines 125 and following: In this paragraph, you mention the linguistic profiles of SHL and L2 learners, yet in the following two paragraphs you cite studies only examining SHL learners. Since you cite studies on L2 learners in a language classroom earlier on the page, maybe here you could transition to research on SHL learners.
p. 5, lines 219 and following: 90% were heritage speakers and 10% L2 learners – you may want to mention here how this disparity could influence the interpretation of the results. Although you mention this in Section 6, it should perhaps at least be mentioned (albeit briefly) here as well.
p. 6, lines 255 and following: As you mention the engagement metrics of the analysis, could you indicate how these emerged? Are they based on previous studies or a focus of your study specifically?
p. 11: On the results of the longer viewing time among the SHL, you stated that this might be due to a more reflective reading approach. Could it be that this group, in contrast to the L2 group, might be less practiced in completing these kinds of exercises in Spanish and therefore need more time to process the reading? The different linguistic experiences between the two groups might influence this outcome.
p. 13, Table 2: Were there any differences in ratings between the two participant groups? I noticed that the results here are collapsed, which is a bit surprising given that other results are reported for both participant groups. It might be interesting for readers to see the means for both groups in the same table and whether there were any differences in perception.
p. 14: Also, here a more detailed report on students' self-reports might give more insights: Did the L2 learners draw different conclusions or have different perceptions than the SHL learners?
p. 16/17: The self-reported surveys give insights into students' perceptions, but – as you state – the responses are difficult to assess. Perhaps a future study that incorporates measurable changes in text understanding could offer more insight into any learning benefits other than the self-reported ones.
Author Response
Please, see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The present study examines how Spanish heritage language (SHL) and second language (L2) learners annotate texts through a collaborative digital annotation tool. The results showed that SHL learners favored evaluative/connective strategies (reflecting personal experiences), while L2 learners relied more on questioning/translating (analytical approaches). Students reported gains in vocabulary, motivation, and academic language awareness, challenging deficit views of SHL literacy. Overall, I believe that the current study has very good potential. However, I also have certain concerns. These concerns mostly relate to the methodological part of the study, and the statistical analysis and discussion of the results.
The authors can find my comments on each section of the manuscript below.
Introduction
The introduction successfully contextualizes the study within digital social reading and social digital annotation practices. However, it takes a long time to reach the purpose of the study, which is to investigate how Spanish heritage language (SHL) learners, and second language (L2) learners engage with texts through collaborative digital annotation. Readers who have not read the abstract may mistakenly believe that the focus of the current study is on collaborative digital annotations among speakers who were raised monolingually. Heritage speakers and L2 learners are not mentioned until page 3 of the introduction (line 102). The first three pages of the introduction could be an introduction to a study on collaborative digital annotations in general, not necessarily in heritage or second languages. Even when the focus shifts to heritage and second language learners on line 102, it is abrupt and seems unmotivated.
I suggest restructuring the Introduction so that the following is made clear from the beginning:
- What is the purpose of the present study?
- Why were heritage and second language learners chosen as participants for the present study?
- Is there relevant research on monolingually raised speakers of any language that motivates investigating collaborative digital annotations in the heritage and second language?
- Is there an actual need to examine collaborative digital annotations in heritage and second language learners?
- Why was Spanish the chosen language of study?
“This line of inquiry is increasingly urgent, as many Spanish language classrooms now bring together students who learned Spanish informally at home—heritage speakers—and those who were raised speaking only English and are learning Spanish as a second or foreign language.”: I find this argument very weak. The authors should explain what makes this line of inquiry urgent (was there a different policy in the past that has now changed? If the same situation that existed in the past also exists today, i.e., heritage language speakers and L2 speakers being in the same classrooms, I do not see the urgency of the situation).
I suggest moving the relevant research that starts on line 125 higher in the Introduction so that readers understand from the very beginning what the present study is about and what motivates it.
Materials and Methods
Participant numbers should be provided in a clear way, so that no mathematical calculations are necessary to figure out the number of participants per group: 36 participants were heritage speakers of Spanish, and 4 participants grew up with English as their dominant language.
Participant distribution is problematic. The article cannot examine heritage language and second language learners as two distinct groups when only one group (the heritage language learners) has a sufficient number of participants and the second group (the second language learners) consists of only four. The groups are thus not evenly distributed.
I suggest that the authors either collect more data from the L2 learner group, or only keep the heritage language learner group, or clearly state that the L2 learner group results are exploratory. If the authors still choose to keep the groups as they are, they should provide strong argumentation that supports the existence of the L2 learner group with such a small number of participants.
Results
How did the authors come up with the presented classification scheme for the annotations? Was it based on a previous scheme, or did they develop the classifications based on their personal judgements of the annotations? Please clarify.
Statistical analysis
In principle, using both descriptive and inferential statistics to compare the data of the two groups of learners is the right approach. However, the current approach raises certain concerns:
- Statistical results may not be valid because of the huge difference in size between the two groups, and because of the very low statistical power in the L2 learner group (n=4), which increases the risk of Type II errors (missing true effects).
- Effect sizes are missing. The authors should report effect sizes (e.g., Cohen's d for t-tests, Cramér's V for chi-square tests).
I suggest the authors clearly mention that the L2 learner findings are exploratory (if they decide to keep the L2 learner group as is), and they apply non-parametric tests to compare the two groups.
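For illustration, the effect sizes recommended above can be computed in a few lines. The sketch below uses invented annotation counts (hypothetical numbers, not taken from the manuscript under review) and only demonstrates the standard formulas for Cohen's d and Cramér's V:

```python
import math
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (effect size for t-tests)."""
    n1, n2 = len(a), len(b)
    pooled_sd = math.sqrt(((n1 - 1) * variance(a) + (n2 - 1) * variance(b))
                          / (n1 + n2 - 2))
    return (mean(a) - mean(b)) / pooled_sd

def cramers_v(table):
    """Cramér's V from a contingency table (effect size for chi-square tests)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    # Pearson chi-square statistic: sum of (observed - expected)^2 / expected
    chi2 = sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(table)) for j in range(len(table[0])))
    return math.sqrt(chi2 / (n * (min(len(table), len(table[0])) - 1)))

# Hypothetical annotations-per-participant counts for two groups
shl = [12, 15, 9, 14, 11, 13]
l2 = [8, 10, 7, 9]
print(round(cohens_d(shl, l2), 2))                # → 2.04

# Hypothetical group-by-category contingency table
print(round(cramers_v([[30, 10], [12, 18]]), 2))  # → 0.35
```

With groups this small, reporting such effect sizes alongside the raw counts is more informative than p-values alone, which is the spirit of the comment above.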
Perceptions Survey
The authors present findings of the perceptions survey without having previously introduced it in the Materials and Methods section (there is only a quick mention of it within two lines, 264-266). I suggest the authors properly introduce the perceptions survey, describing the number and type of questions included in it as well as the type of answers (e.g., open-ended, scale).
Importantly, in the Results section, the authors should report how the mean scores were calculated.
The Results section ends rather abruptly with the students' comments, without any commentary from the authors or any summary of the results. I suggest adding a transition paragraph to the Discussion section.
Discussion
The first paragraph restates the results without mentioning the first research question. Then, in the second paragraph, the authors link the results to MacGregor-Mendoza and Moreno’s (2021) call to move beyond deficit-based views of heritage language learners without discussing further and in more depth how the actual results of the present study support MacGregor-Mendoza and Moreno’s views. Additionally, there is no discussion regarding the results of the L2 learners in relation to the first research question.
The second and third research questions are discussed more effectively than the first one.
Overall, although the Discussion aligns with views challenging deficit models, it does not fully engage with debates that are also relevant to heritage language learners, such as translanguaging or multiliteracies frameworks, which would allow a more effective description of practices related to heritage language and second language learners.
When summarizing strategy differences, the authors restate findings without adding new interpretation. I suggest adding and expanding on why these patterns matter.
I find the link to the broader implications of the study to be weak. Although 21st-century classrooms are mentioned, I wondered how the authors would like to connect the results of the current study to larger trends such as AI-assisted annotation and equity in hybrid learning. I therefore suggest that the authors reframe the pedagogical recommendations of the current study. What teaching practices would the authors recommend?
It would also be great if the authors could provide more discussion of how they see digital annotation evolving in heritage and L2 education (e.g., could AI-generated annotations complement peer collaboration?).
Author Response
Please, see the attachment
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
23 July 2025
Journal: Languages
Manuscript ID: languages-3633645, v2
Title: Reading between the Lines: Digital Annotation Insights from Heritage and L2 Learners.
Special Issue: Language Processing in Spanish Heritage Speakers
Dear author and editors,
I am happy to hear that the author is grateful for the feedback I gave to the first draft of the paper. I appreciate the highlighting of revised passages and the revision chart that was provided by the author. Thank you!
I acknowledge that the author has taken on a number of my suggestions for revision in v2. I also agree with some of their arguments for not taking on other suggestions. However, I do see that a number of my major concerns about this study were not addressed appropriately or were interpreted in the wrong way. I hence cannot recommend the publication of this article in its current revised form and suggest major revisions.
Before I present a list of individual comments, here is my major point of concern: my previously raised concerns about the way quantitative results are reported and the statistical analyses that are performed on them have not been addressed appropriately. On the basis of the author's answers to my comments concerning these issues, I conclude that they do not have the methodological background necessary to transparently and reliably present, analyse and interpret the quantitative data that they collected. (1) They argue that they provide “summary statistics” (i.e., group means and percentage scores) instead of raw frequencies to ensure clarity (over transparency). But it would not be difficult to do both at the same time, as follows:
- Table 1:
- For each group indicate the number of participants in brackets (N=36 vs. N=4)
- For each metric (i.e., viewing time, active engagement, etc.), indicate that you are providing group means and additionally provide a measure of spread and variance per category (i.e., min/max, standard deviation (SD)).
- Graph 1:
- For each bar (i.e., immigration, identity, education, health), indicate the total mean number of annotations and the SD.
- For each subcategory mean (i.e., 2.3 PI for immigration), also provide the SD.
- Graph 2 and Graph 3:
- For each group (SHL vs. SL2), provide the total number of annotations on which the percentage score is based.
(2) They conduct tests of statistical difference between the two groups and argue that they point out that the results are exploratory. However, if you argue that your analysis is exploratory and should not be treated as generalizable across a population (which it really should not be for a group of just 4 SL2 participants), then a test of statistically significant difference should not be performed in the first place, because the results of such tests imply exactly that the findings generalize to the entire population. Interestingly enough, the author agrees with me on this issue in their analysis of the findings of the perceptions study (lines 548-552). There they stress that it does not make sense to statistically compare the results of the two groups. This makes their decision to conduct these tests earlier on questionable and leads to methodological inconsistency. In addition, the second type of group comparison that the author conducted, namely the chi-square test to detect differences in the distribution of the annotation categories between the two groups (ll. 515-516 and ll. 535-536), is an invalid statistical method. Such a difference cannot be tested by chi-square tests (which can only be used to compare the differences in distributions of ONE annotation category between two different groups). If at all (and I hope I have explained why statistical difference testing should not be conducted in this study in the first place), a Kruskal-Wallis test would be the appropriate means to test these differences in category distributions across the two groups. Once again, I suggest that the author only presents and describes the frequency data (with the additional information on variance and raw totals as suggested above) and attempts to explain why they assume the two groups tend to differ in some of the findings.
Without appropriate changes to the quantitative part of this study – as suggested above – I cannot recommend publishing this paper. In order to solve these issues, I reiterate my suggestion that the author seek advice from a statistical expert, in case they doubt my assessment concerning the invalid methodology or in case my suggestions for amendment remain unclear to them.
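The kind of descriptive reporting requested above (per-group N, mean, SD, and range) is straightforward to produce. The sketch below uses invented viewing-time values (hypothetical numbers, not the study's data) purely to show the format:

```python
from statistics import mean, stdev

def describe(label, values):
    """One summary line per group: N, mean, sample SD, and min-max range."""
    return (f"{label} (N={len(values)}): M={mean(values):.2f}, "
            f"SD={stdev(values):.2f}, range={min(values)}-{max(values)}")

# Hypothetical viewing times in minutes for the two groups
shl_viewing = [42, 55, 38, 61, 47]
l2_viewing = [35, 40, 33, 38]
print(describe("SHL", shl_viewing))  # SHL (N=5): M=48.60, SD=9.40, range=38-61
print(describe("L2", l2_viewing))    # L2 (N=4): M=36.50, SD=3.11, range=33-40
```

Reporting these figures directly in the tables and graphs, as suggested, would let readers retrace every percentage and mean.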
Besides these issues concerning the presentation and analysis of quantitative data, I reply to each of the author’s responses in their review chart, below, adding some points that were overlooked or that would improve the article even more.
Kind regards
The reviewer
Review chart (as provided by the Author in the cover letter with my responses and additional comments):
Section |
Reviewer’s comments v1 |
Author’s response (v2) |
Reviewer’s comments v2 |
|
overall |
The manuscript compares SHL and L2 learners despite the extremely small L2 sample (N = 4), making any statistical comparison invalid. |
I revised the manuscript to frame the L2 data as exploratory, noting that the small sample limits generalizability. This clarification appears in the Materials and Methods section (P. 5; lines 213 -216), in the Statistical Analysis section (P. 11, lines 453 -455), and in the Limitations section (P. 16, lines 683 – 685). |
You are more explicit about the fact that the small sample size only allows for an exploratory analysis. However, you do not follow up the consequences, namely that no statistical difference tests can (and need to) be conducted in case of exploratory analysis. See my comments and suggestions above to amend this. |
|
overall |
The coding categories for the qualitative analysis are not sufficiently grounded in literature. It is unclear how the examples reflect cognitive engagement or how they support the claim of a “community of readers.” |
I revised the Results section to clarify that the coding categories were developed inductively, based on patterns observed in student annotations, and are aligned with widely recognized cognitive reading strategies. I also added two sentences explaining how these categories reflect cognitive engagement (e.g., summarizing, evaluating, questioning) and how peer interactions contribute to collaborative meaning-making and the development of a reading community. The clarification appears on p. 6 - 7 lines 264- 273. |
You mention here that the cognitive reading strategies you find are "widely recognized cognitive reading strategies". Yet, besides a single sentence, you don’t make reference to the theoretical and empirical literature that you supposedly base these categories on. You should properly introduce them in the background chapter (see my suggestion for revising the structure of this chapter further below). |
|
overall |
The raw frequencies of annotations, engagement data, and survey responses are not reported, making it difficult to assess the soundness of the conclusions. |
I added a statement in the Results section noting that raw frequencies were reviewed during analysis and are available upon request. I retained a focus on summary statistics in line with the study’s design and conventions for mixed-methods reporting. The clarification appears on P. 11; lines 455- 458. |
This is not the appropriate way to do this. See my suggestions for presenting descriptive statistics of your quantitative data above. |
|
Intro/ background |
The Introduction provides valuable background on digital social reading (DSR) and social digital annotation (SDA), but it needs a clearer structure to ground the research questions. |
I revised the Introduction to improve the overall structure and flow, beginning with a clear rationale for the study and integrating a smoother transition to the research questions. |
I cannot detect a difference in the structure of this section and suggest the following structure:
- After contextualizing the importance of reading strategies, explain what cognitive reading strategies are and why they are important.
- Then explain why and how DSR, and specifically SDA, may support the development of cognitive reading strategies. Here you could also explain why the social dimension (i.e., peer interaction, developing a community of readers) might be important.
- Then introduce why this question is particularly relevant for Spanish HL and L2 learners and their reading development. Here, also add to what degree the two groups are similar and/or different.
|
|
Intro/ background |
It is not always clear whether cited studies refer to L1 or L2 instruction, or whether they occurred in online or offline contexts.
|
I added clarifying phrases throughout the literature review to indicate whether cited studies refer to L1 or L2 learners and to the instructional context (online or offline). Changes can be seen highlighted in the Introduction on P. 2 – 3. |
As you will see from my comments in the pdf, this is still not always clear. |
|
Intro/ background |
The theoretical or empirical grounding of the categories of cognitive and interaction strategies is missing in the introduction. |
I added a brief bridging sentence near the end of the Introduction clarifying that the strategy categories—such as summarizing, evaluating, and making connections—are described in more detail in the Results section, where they are grounded in observed annotation patterns and aligned with commonly recognized forms of cognitive engagement. This helps orient the reader without duplicating information already provided in the Results section. |
It makes sense to present the individual categories (summarizing etc.) only when you present the results from your analysis. But still, you need to SHOW in this section to what extent these categories are “commonly recognized” in the literature. You need to explain exactly what you mean by cognitive reading strategies and how they contribute to reading in general. See my comment above about the restructuring of the introduction. |
|
Intro/ background |
The “previous studies” section focuses on studies investigating DSR in L2 Spanish instruction. Why is this the focus? Do the authors expect differences between target languages? If not, consider summarizing DSR studies in L2 reading instruction more broadly.
|
The studies reviewed focus on DSR in L2 Spanish because the present study also investigates reading engagement among Spanish heritage and L2 learners. These sources provide a directly comparable instructional context, helping to ground the analysis. While DSR research in other L2 languages may show similar trends, expanding the scope was not necessary for the purposes of this Spanish-specific study. No changes were made. |
Fine by me, even though I think that you miss an opportunity in making your study results relevant for other contexts in which HL and L2 learners of other languages are in the same situation. |
|
Intro/ background |
Lines 168–189 (in the original): This section feels out of place, as it begins discussing results or implications prematurely. Consider moving most of it to the conclusion.
|
I revised the conclusion to integrate the paragraphs previously located in lines 168– 189 of the introduction. These paragraphs, which focused on the implications of the research, were relocated and blended with the original conclusion. The updated section now includes a more robust synthesis of the study’s pedagogical implications, connecting research findings with broader instructional recommendations. |
This makes more sense to me, now. |
|
|
Lines 190–200 (in the original): Research Questions 1 and 2 appear too similar; please clarify the difference. Additionally, the sub-question in RQ2 regarding differences between HL and L2 learners cannot be answered meaningfully given the small L2 sample. Consider focusing only on HL learners’ cognitive strategies. |
No changes were made to the research questions. While both RQ1 and RQ2 address aspects of engagement, RQ1 focuses on patterns of overall interaction (including peer engagement), whereas RQ2 analyzes the types of cognitive strategies learners used in their annotations. Additionally, comparisons between HL and L2 learners are clearly framed as exploratory due to the small L2 sample, a limitation acknowledged in the Methods, Discussion, and Limitations sections. |
Fine by me. |
|
Methods |
The Materials and Methods section should be organized more systematically. The authors are encouraged to follow APA’s structure (Participants, Materials, Procedure). A section on research ethics and informed consent should also be included. |
I reorganized Section 2 (Materials and Methods) into three clearly labeled subsections following APA formatting conventions: (2.1) Participants, (2.2) Materials, and (2.3) Procedure. I added a brief ethics statement confirming IRB approval and informed consent procedures, in alignment with standard research ethics practices (p. 5). |
OK, but see my individual comments for extra information below. |
|
Methods |
Participants: We do not get enough information about the participant group. Please also report the CEFR-level of the course (or the individual course participants). |
I added a sentence to the participant description clarifying that the course corresponds approximately to the B1–B2 levels of the Common European Framework of Reference for Languages (CEFR). (P. 5). |
OK, but some important background information on participants is still missing from this section. APA conventions usually call for age, gender, and other relevant background variables that could have influenced the results of the study. |
|
|
Materials: We don’t know how the annotation task was prompted. … |
|
This comment was not dealt with in this revised version. |
|
Methods |
Analysis: Please give more information on the analysis process of the cognitive engagement strategies. … |
|
This comment was dealt with somewhat superficially in Section 2.3 (Procedure). However, this is not enough information on how you conducted the thematic analysis. We need information on: (1) the selection of the unit of analysis; (2) the identification of themes and the development of a coding manual; (3) the final coding process, including how the two independent coders conducted their analyses and compared their results to establish intercoder agreement.
|
|
Results |
The results should be structured according to the research questions.
|
While I did not restructure the entire Results section, I clarified in the manuscript how each subsection aligns with the corresponding research question.
|
You don’t do this in the first section, 3.1, when you report on the thematic analysis. Please add some introductory sentences here to explain how this analysis contributes to answering your research questions. |
|
Results |
Section 3.1: Please provide (raw) frequencies of the total set of annotations that were analysed and then the frequencies per category both for TI and PI (you could do so in an overview table). Also present the categories in order of frequency. |
|
You did not react to this comment and did not revise your paper accordingly. I suggest you do so. Additionally, for each direct quote from a participant, please provide some background information: gender, age, SHL learner or SL2 learner. If you used anonymized participant IDs, you can also use them here. |
|
Results |
Section 3.2: Leave out the statistical comparison of L2-learners with HL-learners. Instead you could potentially conduct within- group comparisons between the distribution of TI vs. PI, although I doubt you will find significant differences from looking at the figures.
|
I retained the comparison between HL and L2 learners to highlight trends in cognitive engagement; however, I clearly framed all comparisons involving L2 learners as exploratory due to their small sample size (n = 4). I revised Section 3.2 to explicitly caution readers about the limited generalizability of these findings. I also expanded the analysis of within-group interaction types (TI vs. PI) to respond to the reviewer’s suggestion and contextualize the findings more carefully. |
See my general comments above. Concerning your reporting of within-group interaction types, the insertion does not make sense. I was actually referring to a repeated-measures test per group. However, I do think that this is not necessary and will not yield any insights that cannot be detected from the descriptive figures that you (should) provide. |
|
Results |
Section 3.3: Provide more extensive descriptive information on the outcomes of the end-of-term survey. (Just a mean is not enough. What is the mean based on? What was the range/SD?) It would also help to provide the exact statements that they answered to in the results table (here or in the methods section). Finally, integrate the student quotes that you provide at the end of the section in the text in which you describe and interpret the data before. |
I revised Section 3.3 to include more detailed descriptive statistics. I added the standard deviation, minimum, and maximum values for each survey item in Table 2 (p. 14). I also inserted the exact statements used in the survey into the results table. Finally, I integrated the most representative student quotes directly into the interpretive narrative, eliminating the separate bulleted list (lines 557–559, 563–565, 568–571, 574–576). |
Good that you provide the extra information on numbers here. Also good that you integrated the statements with the results from the table. Please also indicate what kind of participant each statement comes from (age, gender, SHL or L2 learner). |
|
Discussion: |
The authors should discuss the outcomes concerning each research question individually. Also, I suggest an overall discussion of findings here followed by limitations. |
To improve clarity and organization, I added transitional phrases that explicitly link each part of the discussion to the corresponding research question (lines 580, 613, 644). This helps guide the reader through the interpretation of findings in a more structured manner. Additionally, I ensured that the section begins with a synthesis of key results and concludes with a clearly labeled subsection on limitations and future directions, as recommended (see p. 16). |
Looks better. |
|
Conclusion:
|
The authors make some important points here. See my comments on the introduction section concerning the relevance of this study, which should actually go here. Lines 661–684: The authors make some important recommendations concerning the use of DSR and SDA in the classroom. However, it would be important to clearly delineate which recommendations are concretely supported by the study’s findings and which are more general or drawn from the authors’ own experience. |
I moved the paragraph on the study’s relevance from the introduction to the beginning of the conclusion, where it now frames the importance of the research in the current academic context. I also revised the recommendations section to clearly distinguish between those that are supported by my findings (e.g., strategy use patterns, peer interaction trends) and those based on my implementation experience (e.g., small annotation groups, initial ungraded tasks). |
Agree, see above. |
|
|
I suggest including this section as a subsection of the discussion. That way, the overall conclusion and the recommendations for practice provide a better closing. |
I moved the Limitations section to become a subsection of the Discussion section (now titled 4.1 Limitations). This structural adjustment ensures that the Conclusion remains focused on summarizing the findings and outlining practice-oriented recommendations, as suggested by the reviewer. |
Better. |
|
Comments for author File: Comments.pdf
Author Response
Dear Reviewer,
Thank you for your thorough and thoughtful feedback on the revised manuscript titled "Reading Between the Lines: Digital Annotation Insights from Heritage and L2 Learners" (Manuscript ID: languages-3633645). I truly appreciate the time and expertise you have invested in reviewing this work.
I understand and acknowledge your continuing concerns, particularly regarding the presentation and interpretation of the quantitative data. In response, I have carefully revised the manuscript to address the issues you identified. Below, I summarize the major changes made to the manuscript:
1. Descriptive Statistics and Transparency in Reporting:
- Table 1 now includes the number of participants in parentheses for each group (N=36 for HL, N=4 for L2).
- Each engagement metric (e.g., viewing time, active engagement) now includes mean, standard deviation, minimum, and maximum values to enhance transparency and clarify data variability.
- Graph 1 has been revised to display means and standard deviations (SDs) directly, either in parentheses or with a footnote.
- A new APA-formatted Table 2 has been added to provide the total number of annotations per cognitive strategy and per group, as requested for Graphs 2 and 3.
2. Clarification of Statistical Tests and Methodological Approach:
- I have removed the inferential statistical tests (t-tests and chi-square) comparing the two groups, as suggested. These were replaced with descriptive comparisons that emphasize observed trends without making claims of generalizability.
- The manuscript now clearly explains that findings related to L2 learners are exploratory and should be interpreted with caution due to the limited sample size.
3. Statistical Consultation:
- I consulted with a colleague with expertise in applied linguistics and quantitative methods to review the revised tables and clarify the appropriate use of descriptive versus inferential statistics in small-sample educational studies.
I have also carefully reviewed and addressed each of your specific suggestions in the attached updated revision chart. I believe these changes strengthen the study’s methodological clarity and transparency, while still supporting the qualitative and pedagogical contributions of the research.
Thank you again for your constructive and rigorous feedback, which has significantly improved the manuscript. I hope that the revisions made are now satisfactory and that the study may be reconsidered for publication.
Sincerely,
The Author
Author Response File: Author Response.pdf