Peer-Review Record

Inferential Reading Skills in High School: A Study on Comprehension Profiles

Educ. Sci. 2025, 15(6), 654; https://doi.org/10.3390/educsci15060654
by Andrea Nadalini *, Claudia Marzi, Marcello Ferro, Alessandra Cinini, Paola Cutugno and Davide Chiarella *
Reviewer 1: Anonymous
Reviewer 2:
Submission received: 10 April 2025 / Revised: 13 May 2025 / Accepted: 19 May 2025 / Published: 26 May 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This study investigates how school students in Italy aged between 15 and 19 years process and comprehend continuous text. To assess processing, the authors used the finger-tracking method to measure reading times. To assess certain aspects of comprehension (vocabulary knowledge, referential and inferential skills), the authors used a questionnaire. The authors discussed the findings alongside implications for the utility of instructional support.

I found the topic to be interesting and the manuscript to be generally well-written. However, I also found various points in the text that lacked sufficient detail/explanation/justification. Please find these listed below.



Abstract

1) Line 9: I would advise against the use of the term “lexical dependency patterns” to describe reading behaviour. Lexical dependency is a term widely used in syntax and NLP to refer to grammatical relationships between words, and as such, it could be confusing for readers to encounter it here.

Introduction

2) Line 34: The authors make reference to the existence of “different theoretical models” without specifying which ones they are referring to or explaining them further. It is also noteworthy that the authors later on (e.g. lines 382-384) claim that their results provide support to particular theoretical models. Yet, this claim lacks foundation since the authors have not explained in their Introduction the key assumptions of relevant theoretical models, so that the reader can evaluate which theoretical model the reported results are most consistent with.

3) Line 55: The authors state that in reading studies it is critical to take into account “non-linguistic” factors. The term “non-linguistic” is vague as it stands given that many factors (e.g. cognitive, experiential) can be described as “non-linguistic”, but in the context in which it appears it can be implied that it is used to refer to “working memory” and “inhibition” based on what was mentioned in the same paragraph. Since the authors stress how important these competencies are, readers may be left wondering why they were not assessed in this study. The authors could address this in a “Limitations” section.

4) Line 76-78: Related to my comment above, the authors mention “working memory” and “inhibition”, inter alia, but never explicitly define these terms in relation to “executive functions”. In fact, cognitive abilities such as “working memory” and “inhibition” are encompassed by the term “executive functions”, at least in most formal definitions of the EF construct I am aware of. Thus, the statement made in lines 76-78 appears circular and quite confusing.

5) Line 99: The authors mention the literacy achievement of adolescents in Italy specifically, but this comes out of the blue (i.e. no prior mention of Italy before this point in the text). The authors may want to consider preparing the reader, perhaps by discussing at the start of the paragraph literacy achievement more generally, and then zooming in to the Italian education context and its particularities/relevance for this research topic.

6) Line 107: The abbreviation ICT is not properly introduced in full for unfamiliar readers.

7) Line 128: The sentence is quite vague, as it is not specified which “strengths” and “weaknesses” of what/who exactly the authors are making reference to.

8) At the end of section 1 (Lines 140-153), the authors give an overview of their study methods (e.g. ReadLet app, questionnaire), but do not explain what their specific research questions and hypotheses are. I would advise the authors to be specific about what their research questions are, what their hypotheses are, and how they will be evaluated.

Materials and Methods

9) Line 156-159: The age range of this study’s sample (15-19 years) raises the following concern: why was the finger-tracking technique chosen when it is highly unlikely that students of these ages still use their finger to track where they are reading in their everyday life? If the recruited participants were beginner readers of younger ages (or if the same adolescent participants were tested using a different method), then the study would have been more methodologically sound. Adding to the above, a second concern about the chosen methodology is the following: typically, when using a touchscreen on most devices including tablets, users know that they can touch the screen to scroll up and down or move sideways. That means that the finger-tracking technique is quite unfamiliar and counter-intuitive, even after going through 1 practice trial. I think these are important methodological concerns that the authors should address either in the same section or in a “Limitations” section.

10) Line 168: In Table 1, I would advise the authors to include a note below the table to explain what certain variables stand for. While “Mean Word Length (letters)” is easy to understand, the abbreviation “POS” and the variable “Mean Dependency Length (words)” may not be clear to all your readers.

11) Line 170: The authors could provide the whole texts and accompanying questionnaire that they used via supplementary materials. It would also be helpful to provide excerpts of the texts and examples of questions so that readers can have a better idea of the materials used in this study.

12) Line 198: In the interest of Open Science practices, I would advise the authors to make their analysis scripts available.

13) Line 204: The authors do not justify why they chose a Gaussian distribution with an inverse link. Did they check if this fitted their data better than other options (e.g. gamma, log)? What criteria were used?

14) Line 205-208: When describing the independent variables in their models, the authors do not explain: (a) why age, a key developmental variable, was not included in models or why age differences were not taken into account in other ways (e.g. residualisation), (b) why question accuracy is treated as both an independent variable in the analyses of reading times and as a dependent variable later on. I think point (b) relates to my 8th comment above, namely the fact that no clear hypotheses were specified, and as such, it is hard to understand if the authors are testing particular predictions about effects of question accuracy on reading times.

15) Line 215-217: The authors do not justify why they chose to create 3 categories to classify participants based on their accuracy in the questionnaire. Why not 2 categories for instance using the median split approach, or any other number of categories? Also, the authors need to specify how many participants they ended up with in each category.

Results

16) Line 254: The results reported in Table 2 are confusing. In particular, the effect of Length is negative (estimate: -1.27; t value: -14.25). This contradicts what is commonly found in the literature as well as your own description in lines 248-249, namely that longer words result in higher tracking times. As mentioned in my 12th comment above, the authors could already provide access to their analysis scripts and model outputs to double check these results. This is easy to achieve with R Markdown files, which render both the code and output.

Discussion

17) Line 330-332: The authors make a statement regarding typical findings from previous research but do not provide references to substantiate it.

18) Line 345-347: Relevant theoretical work should have been introduced in the Introduction section, so as not to appear out of the blue.

19) Line 374-379: The explanation of what “inferential processing” stands for appears towards the end of the Discussion, but arguably, this is not the best place for this. If anything, key concepts and terms should be introduced in the Introduction section first.

20) Line 382-384: Again, as mentioned in my last two comments above, the authors have chosen to introduce at the end of the Discussion section theoretical work which has not been previously mentioned or explained. I would advise the authors to reorganise their manuscript to improve the flow of information.

Conclusion

21) Line 406-408: The link between "lexical features" and predictive processing as well as the distinction with "lexical-level decoding" is not clear.



Finally, please note that there are points in the manuscript where grammatical errors/typos are found (e.g. line 156, line 339, line 367). The authors could carefully review their manuscript to correct any such cases.

Author Response

This study investigates how school students in Italy aged between 15 and 19 years process and comprehend continuous text. To assess processing, the authors used the finger-tracking method to measure reading times. To assess certain aspects of comprehension (vocabulary knowledge, referential and inferential skills), the authors used a questionnaire. The authors discussed the findings alongside implications for the utility of instructional support.

I found the topic to be interesting and the manuscript to be generally well-written. However, I also found various points in the text that lacked sufficient detail/explanation/justification. Please find these listed below.

Abstract

1) Line 9: I would advise against the use of the term “lexical dependency patterns” to describe reading behaviour. Lexical dependency is a term widely used in syntax and NLP to refer to grammatical relationships between words, and as such, it could be confusing for readers to encounter it here.

R: Thanks for pointing this out. We welcomed the suggestion and replaced the term “lexical dependency patterns” with “lexical effects”.

Introduction

2) Line 34: The authors make reference to the existence of “different theoretical models” without specifying which ones they are referring to or explaining them further. It is also noteworthy that the authors later on (e.g. lines 382-384) claim that their results provide support to particular theoretical models. Yet, this claim lacks foundation since the authors have not explained in their Introduction the key assumptions of relevant theoretical models, so that the reader can evaluate which theoretical model the reported results are most consistent with.

R: We thank the Reviewer for this valuable observation. In response, we have revised the Introduction to explicitly present and briefly describe the key theoretical models relevant to our study. In particular, we now refer to the Simple View of Reading, the Construction-Integration Model, and the Landscape Model, each of which offers a distinct but complementary perspective on the cognitive processes underlying reading comprehension. We also clarify the common distinction across these models between lower-level processes (e.g., word recognition, syntactic parsing) and higher-level processes (e.g., inference generation, integration of information), which serves as the theoretical framework for interpreting our results. 

In the Conclusions section, we have also strengthened the link between our findings and these theoretical assumptions. In particular, we highlight how our results align with models that emphasize higher-order integration and context-based semantic processing in skilled readers, thereby addressing the concern regarding the interpretability of our findings in light of relevant theoretical frameworks. 

3) Line 55: The authors state that in reading studies it is critical to take into account “non-linguistic” factors. The term “non-linguistic” is vague as it stands given that many factors (e.g. cognitive, experiential) can be described as “non-linguistic”, but in the context in which it appears it can be implied that it is used to refer to “working memory” and “inhibition” based on what was mentioned in the same paragraph. Since the authors stress how important these competencies are, readers may be left wondering why they were not assessed in this study. The authors could address this in a “Limitations” section.

R: We thank the reviewer for this useful observation. We agree that the term “non-linguistic” was too vague in the original version. We have therefore revised the paragraph to specify that we are referring to domain-general cognitive mechanisms such as working memory, inhibition, attention allocation, and metacognitive monitoring. Although these functions were not directly assessed in our study, we consider them important contributors to comprehension and discuss their role and relevance in the concluding remarks of the Discussion section, acknowledging this as a limitation and suggesting that future studies should include independent measures of executive functions to better capture the interplay between linguistic and cognitive dimensions of reading comprehension.

4) Line 76-78: Related to my comment above, the authors mention “working memory” and “inhibition”, inter alia, but never explicitly define these terms in relation to “executive functions”. In fact, cognitive abilities such as “working memory” and “inhibition” are encompassed by the term “executive functions”, at least in most formal definitions of the EF construct I am aware of. Thus, the statement made in lines 76-78 appears circular and quite confusing.

R: We fully agree that “working memory” and “inhibition” are components of the broader construct of executive functions (EFs), as supported by widely accepted definitions in the cognitive literature. To address this, we have clarified the terminology in the revised version of the manuscript by explicitly referring to these processes as core executive functions. We have also modified the relevant paragraph (in the Introduction section) to better reflect the theoretical framework, ensuring that readers understand how these abilities relate to the broader domain of EF and why their role is important in reading comprehension, and added a small paragraph at the end of the Discussion highlighting the limitations of our approach.

5) Line 99: The authors mention the literacy achievement of adolescents in Italy specifically, but this comes out of the blue (i.e. no prior mention of Italy before this point in the text). The authors may want to consider preparing the reader, perhaps by discussing at the start of the paragraph literacy achievement more generally, and then zooming in to the Italian education context and its particularities/relevance for this research topic.

R: We added “Italian” in the abstract to explicitly mention the nationality of our participants. Moreover, we added a few lines of text in the Introduction to further discuss how the INVALSI test results show a decline in Italian language competence, comparing the 2024 results with those of 2019.

6) Line 107: The abbreviation ICT is not properly introduced in full for unfamiliar readers.

R: We added the specification of the acronyms ICT (Information and Communication Technology) and NLP (Natural Language Processing).

7) Line 128: The sentence is quite vague, as it is not specified which “strengths” and “weaknesses” of what/who exactly the authors are making reference to.

R: We refer to “strengths” and “weaknesses” of the readers, as emerging from the time-stamped profile that ReadLet can provide. Following the reviewer’s comment, we expanded the paragraph to highlight the adaptability of the protocol for large-scale reading assessment as well as for monitoring individual learning trajectories and potential difficulties.

8) At the end of section 1 (Lines 140-153), the authors give an overview of their study methods (e.g. ReadLet app, questionnaire), but do not explain what their specific research questions and hypotheses are. I would advise the authors to be specific about what their research questions are, what their hypotheses are, and how they will be evaluated.

R: We agree that our research questions and hypotheses required a more explicit formulation, as they were previously distributed across the paragraphs of the Introduction section. We have therefore added a dedicated statement at the end of the Introduction, clearly outlining our main research questions and hypotheses, in order to guide the reader more directly and address this important point.

Materials and Methods

9) Line 156-159: The age range of this study’s sample (15-19 years) raises the following concern: why was the finger-tracking technique chosen when it is highly unlikely that students of these ages still use their finger to track where they are reading in their everyday life? If the recruited participants were beginner readers of younger ages (or if the same adolescent participants were tested using a different method), then the study would have been more methodologically sound. Adding to the above, a second concern about the chosen methodology is the following: typically, when using a touchscreen on most devices including tablets, users know that they can touch the screen to scroll up and down or move sideways. That means that the finger-tracking technique is quite unfamiliar and counter-intuitive, even after going through 1 practice trial. I think these are important methodological concerns that the authors should address either in the same section or in a “Limitations” section.

R: There is already evidence showing a remarkably similar pattern of reading behavior when comparing finger- and eye-tracking measurements with mature readers. Crepaldi et al. (2022) revealed a strong correlation between fixation and tracking times in adult readers. In particular, Spearman’s ρ correlation values grew with the embedding level of linguistic units, ranging from .66 at the token level, to .81 at the chunk level, and .98 at the sentence level. We added these details in the Introduction section.

10) Line 168: In Table 1, I would advise the authors to include a note below the table to explain what certain variables stand for. While “Mean Word Length (letters)” is easy to understand, the abbreviation “POS” and the variable “Mean Dependency Length (words)” may not be clear to all your readers.

R: We agree with the reviewer that the acronym POS may not be familiar to every reader; accordingly, we spelled it out as Part-Of-Speech in the table. We removed details about mean dependency length, since we added in Appendix A a translated version of our texts, which aligns in complexity with the original ones.

11) Line 170: The authors could provide the whole texts and accompanying questionnaire that they used via supplementary materials. It would also be helpful to provide excerpts of the texts and examples of questions so that readers can have a better idea of the materials used in this study.

R: We created an Appendix section where we report an English translation of the originally submitted Italian texts, along with the related questionnaires. Both versions of the texts have been analyzed for linguistic complexity by reporting readability indices, average word length, and sentence length, thus trying to ensure clarity and accessibility for a wider audience.

12) Line 198: In the interest of Open Science practices, I would advise the authors to make their analysis scripts available.

R: We fully agree and have uploaded the anonymized data and analysis scripts to an openly accessible repository (GitHub), linked in the Data Availability Statement section at the end of the paper.

13) Line 204: The authors do not justify why they chose a Gaussian distribution with an inverse link. Did they check if this fitted their data better than other options (e.g. gamma, log)? What criteria were used?

R: We followed indications in the literature (Lo & Andrews, 2015) suggesting that reading times (and other variables that are heavily right-skewed) be modelled with an inverse Gaussian distribution. This approach yielded a remarkably high goodness of fit (in terms of R-squared), confirming the appropriateness of the selected method.
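For illustration, a minimal R sketch of such a model (assuming lme4; all data frame and variable names are hypothetical — the actual specification is in the analysis scripts linked in the Data Availability Statement):

```r
library(lme4)

# Hypothetical sketch, not the authors' actual script: a GLMM for tracking
# times in the spirit of Lo & Andrews (2015). Depending on the exact
# specification, the family could be gaussian(link = "inverse") or
# inverse.gaussian(); both are available in base R.
m_rt <- glmer(
  tracking_time ~ word_length + log_frequency + grade +
    (1 | participant) + (1 | word),
  data   = reading_data,
  family = gaussian(link = "inverse")
)
summary(m_rt)  # coefficient signs are on the inverse (1/y) scale
```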

14) Line 205-208: When describing the independent variables in their models, the authors do not explain: (a) why age, a key developmental variable, was not included in models or why age differences were not taken into account in other ways (e.g. residualisation), (b) why question accuracy is treated as both an independent variable in the analyses of reading times and as a dependent variable later on. I think point (b) relates to my 8th comment above, namely the fact that no clear hypotheses were specified, and as such, it is hard to understand if the authors are testing particular predictions about effects of question accuracy on reading times.

R: (a) We did not include age because it was mirrored by the grade-level variable. (b) Question accuracy was the dependent variable in the analyses of the questionnaire data (via logistic regression, with two levels corresponding to correct/incorrect), whereas it served as a continuous predictor when modelling the tracking times (in this case it is normalized between 0 and 1, and can be read as a percentage of accuracy).
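A hedged sketch of these two roles of question accuracy, again with hypothetical data frames and column names:

```r
library(lme4)

# Hypothetical sketch of the two roles of question accuracy.
# (a) Dependent variable: item-level correctness in a logistic GLMM.
m_acc <- glmer(correct ~ question_type + grade + (1 | participant),
               data = questionnaire_data, family = binomial)

# (b) Continuous predictor: per-participant proportion correct (0-1),
#     entered as a covariate when modelling tracking times.
reading_data$accuracy <- reading_data$n_correct / reading_data$n_questions
```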

15) Line 215-217: The authors do not justify why they chose to create 3 categories to classify participants based on their accuracy in the questionnaire. Why not 2 categories for instance using the median split approach, or any other number of categories? Also, the authors need to specify how many participants they ended up with in each category.

R: We chose to regroup participants into 3 categories so as not to strongly dichotomize a variable that is continuous in nature, and to avoid losing entirely what lies between extremely good and extremely poor comprehenders. We did so in a bottom-up manner via the quantile distributions, in order to avoid any subjective definition of question accuracy. We added, in the revised text, how many students belong to each of the three groups (i.e. N=16 good comprehenders, N=29 average/medium, and N=12 poor ones).
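For illustration, a minimal R sketch of this quantile-based regrouping, under hypothetical variable names:

```r
# Hypothetical sketch: bottom-up regrouping via the quartiles of the
# accuracy distribution (data frame and column names are illustrative).
qs <- quantile(participants$accuracy, probs = c(0.25, 0.75))
participants$profile <- cut(participants$accuracy,
                            breaks = c(-Inf, qs[1], qs[2], Inf),
                            labels = c("poor", "average", "good"))
table(participants$profile)  # in the revised text: N = 12, 29, 16
```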

Results

16) Line 254: The results reported in Table 2 are confusing. In particular, the effect of Length is negative (estimate: -1.27; t value: -14.25). This contradicts what is commonly found in the literature as well as your own description in lines 248-249, namely that longer words result in higher tracking times. As mentioned in my 12th comment above, the authors could already provide access to their analysis scripts and model outputs to double check these results. This is easy to achieve with R Markdown files, which render both the code and output.

R: This is due to the inverse link function used in the model, which assumes an inverse relationship between the predictors and the dependent variable (see Lo & Andrews, 2015). In fact, the coefficient of the frequency predictor, according to which more frequent words are read faster (i.e. a facilitatory effect), is positive (estimate: .25; t value: 2.48). However, the GLMM allows the model’s predictions to be back-transformed for plotting the results. To help the reader, we added this clarification to the captions of the two models (please see Tables 2 and 4).
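A short sketch of how such predictions can be back-transformed to the response scale in R (reusing the hypothetical m_rt model from the earlier sketch; all names are illustrative):

```r
# Hypothetical sketch: predictions from the inverse-link GLMM (m_rt above)
# back-transformed to the response scale (tracking time in ms) for plotting.
grid <- expand.grid(word_length   = 2:12,
                    log_frequency = mean(reading_data$log_frequency),
                    grade         = levels(reading_data$grade))
grid$predicted_ms <- predict(m_rt, newdata = grid,
                             type = "response", re.form = NA)
```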

Discussion

17) Line 330-332: The authors make a statement regarding typical findings from previous research but do not provide references to substantiate it.

R: We added explicit references to studies on Italian showing that the poorest readers exhibit word frequency and length effects during reading tasks. For instance, Burani et al. (2008) found that both dyslexic and younger readers benefited from morphological structure in words, indicating reliance on sublexical units. Similarly, De Luca et al. (2008) reported that word length significantly affected reading latencies in children with reading deficits, underscoring the impact of word length on reading performance in early readers.

18) Line 345-347: Relevant theoretical work should have been introduced in the Introduction section, so as not to appear out of the blue.

R: In response to this (and a previous) suggestion, we have expanded the Introduction section to better highlight how our study builds on these established models and contributes to the understanding of reading processes in high school students. We now more explicitly connect the theoretical models to our research questions and hypotheses, making the link between theory and our empirical study more evident to the reader. These changes are aimed at ensuring that the theoretical underpinnings are clearly established and provide a solid foundation for the subsequent analysis.

19) Line 374-379: The explanation of what “inferential processing” stands for appears towards the end of the Discussion, but arguably, this is not the best place for this. If anything, key concepts and terms should be introduced in the Introduction section first.

R: We have now explicitly defined this key concept within the Introduction section, directly following our discussion on the importance of inference-making. This revision aims to clarify the concept for readers early in the manuscript, ensuring a more coherent understanding of inferential processing before delving into the specific research questions.

20) Line 382-384: Again, as mentioned in my last two comments above, the authors have chosen to introduce at the end of the Discussion section theoretical work which has not been previously mentioned or explained. I would advise the authors to reorganise their manuscript to improve the flow of information.

R:  We understand the importance of presenting relevant theoretical frameworks earlier in the manuscript to facilitate a smoother flow of information. To address this, we have reorganized the manuscript to introduce the key theoretical models and related research findings in the Introduction section, prior to the presentation of our research questions. This revision ensures that the theoretical background is clearly established and allows for a more cohesive transition into the study’s objectives and discussion of results. We hope this restructuring enhances the clarity and coherence of the manuscript.

Conclusion

21) Line 406-408: The link between "lexical features" and predictive processing as well as the distinction with "lexical-level decoding" is not clear.

R: To address this concern, we have revised the relevant section of the manuscript to better explain the relationship between lexical features and predictive processing. Specifically, we clarify that skilled readers tend to rely less on simple lexical-level decoding (i.e., recognizing individual word forms) and more on higher-order processes such as prediction and integration of meaning across linguistic units. Lexical features, in this context, are seen as part of the broader process of semantic activation, which facilitates anticipation of upcoming words or concepts based on context and prior knowledge. This predictive processing, in turn, allows proficient readers to minimize the impact of lexical constraints, such as word frequency, and better integrate information across the text. Conversely, poor comprehenders tend to remain more reliant on lexical decoding strategies, which makes them more sensitive to surface-level features like word frequency and length. We hope this revision makes the distinction clearer.

Finally, please note that there are points in the manuscript where grammatical errors/typos are found (e.g. line 156, line 339, line 367). The authors could carefully review their manuscript to correct any such cases.

R: We checked and corrected syntactic errors and typos.

Reviewer 2 Report

Comments and Suggestions for Authors

2024 Infer paper reviewer comments

Your literature review seems well informed in many aspects of reading, and the title of your paper strongly suggests the focus of your paper is “inferential reading skills”. My problem is that you conclude this literature review with a statement about “understanding explicit details in the text” (Line 150-151), and your measures and data don’t seem to include these explicit text details?

Next, your selected texts are referenced to “Albertin et al., 2024” and this paper is (from its title, in the reference list, Line 447) for an elderly population and early detection of cognitive decline? These are very different participants from those in your study? So, I’m questioning if these are appropriate?

While you describe some features of these texts (Table 1, Line 168), readers of this paper (possibly teachers?) don’t know the topics or the grade level difficulty of these texts – only their similarity on your reported variables…

Line 165 states these are “excerpts adapted and rearranged from original article”, so any readers of your paper do not know if these texts were factual or narrative texts, or on familiar topics for the students, nor how these were “adapted and rearranged”? These selected texts came from an “Italian science communication magazine”; science texts and topics are likely the most difficult of all content area texts, especially in terms of vocabulary (like History, Geography) – I’m unsure how this would have affected these data?

Three types of questions are asked for the data from this study – 2 on vocabulary (using synonym detection), 3 on referential links (pronouns & other nouns, to actions), and 2 on inferences. No examples of these questions are provided, so readers cannot see the actual type of vocabulary or referent tested (e.g. simple pronoun or more complex?); the 2 inference questions simply “tapped into the ability to draw inferences on the basis of information given in the text” (lines 174-5).

Readers of this paper can’t specifically validate what any of these questions are measuring. More importantly, inference ability (your focus topic?) is too general.

My example, likely familiar to you, would be to say “my assessment included vegetables, fruits and cakes” – when there are many different types of each food category – and then, grouping eaters on what? Maybe, one type of vegetable (included) might have had the ONLY significant effect – or the same for fruits and cakes? Inferences can include many different types (cause/effect, using background knowledge or… pick one of many more?) any of which could impact on your results?

Additionally, for example, one specific type of question (either vocabulary, referent or inference) may or may not significantly impact on the other types – or interact with the other two “question types”…? I can’t tell?

To solve these issues, you could include an appendix, with samples of the text and samples of the questions, to validate what you claim in your paper?

You regrouped students based on quartiles (Line 214 – 218) with “above 3rd quartile… 85%” and “below 1st quartile…57%” – so about a quarter of your final group of participants were scoring less than 57% on the 7 questions… So, quartiles are calculated based on medians, to start, and I’m thinking 85% seems a high score (almost 6 out of 7) for the 3rd quartile, and 57% (4 out of 7) seems high for the bottom quartile? 

Given your definition of “comprehension depth as defined by their accuracy in the questionnaire” (Line 213), I’m concerned about what this actually means, as I have no evidence in terms of examples of the questions you used, so I can’t judge the validity of your regrouping.

Next, your literature review acknowledges the importance of automaticity (“these processes become increasingly automatic”, Line 69), and I’m wondering how this fits with your filtering out of low and high tracking times. Perhaps my problem is with filtering out “extremely low (< 35ms) or high (> 1500ms) tracking time, or because they were covered for less than half of their length (n= 3386)” (Line 228). Maybe short tracking time may indicate automaticity of knowledge or familiarity or deep knowledge? Covering a word for less than half its length may have different causes for students with different decoding or comprehension performance – so this may be a confound that perhaps hasn’t been considered here?

Line 248 - “longer and/or unfrequent words resulted in higher tracking times than shorter and/or frequent ones.” This confirms that what I would call “unfamiliar” words might be tracked LONGER as the reader tries to figure out their meaning – maybe using memory or context?

Taking out these participants & data – there are 57 participants – unsure which grades these are from? Whether one sample biases the results – and I count 9 Grade 10 points on Figure 1 in RED – so conclusions for that grade are based on 9 participants? This leaves 48 participants across Grades 11 & 12, who are spread how? And then, you grouped the 3 grades together into the profiles… So, as a reader I don’t know how many students are in each grade group, nor in each newly formed group.

All of your analyses rest on these questionnaire results, later grouped by performance, so I’m left unsure as to how “trends” can be validly and reliably concluded based on these final sample sizes?

All results are therefore questioned – as they all follow on from these final samples?

Line 249-250 “As shown in Figure 2, the effects of word (log) frequency and length were more pronounced for 10th graders.” – Table 2 doesn’t seem to include any Grade 10 data, because?? The Figures below (Figure 2, a, b, c) have all 3 grades (10th graders shown in red, Line 237). Maybe I am misreading or misunderstanding this?

Similarly, Table 3 (Line 267) doesn’t include Grade 10. Perhaps, I might be showing my ignorance of these complex statistical processes. Supportive data results could be presented simpler, showing means and standard deviations for each Grade Level Group, before any re-organising?

From Figure 3 (Line 270) it appears that Grade 12 students performed most poorly of ALL groups on inferences – the main focus of your paper? Later, Figure 4 (images on previous page to Line 283) confirms MUCH shorter tracking times for Grade 12 – for me this implies more automaticity in decoding (implying faster reading times) – and yet, the Grade 12 scores on inferences (Figure 3, Line 269) are much lower?

Your discussion talks about the “dynamic interaction between reading strategy and comprehension depth” (Line 315) and I’m questioning what “reading strategy” you have included, as well as my question about “comprehension depth” in terms of your questionnaire results (earlier, Line 213).

You state (“results did not reveal any striking difference across grade”, Line 318) when I can see on two figures (Figures 3 & 5) that Grade 12 appear to perform most poorly on inferences in your questionnaire?

 You continue to discuss (line 320-325) “personal attitude” and “habit toward reading”, for which you have no evidence from your paper.

I agree with the points about automaticity, via tracking times, from which you use the term “processing speed” (Line 339 – 350), and this finding is not something new, except that you link this to “finger tracking”. I’d question whether this is typically how students in Grades 10-12 read most of the time – as my experiences suggest they read silently and seldom use finger tracking.

Continuing, each of your conclusions (lines 351 to 367) is valid and was evident in research in the mid-1980’s, from my own doctoral thesis. One statement “we found that students with the lowest accuracy could not respond to any of the question types” (Line 358) seems to contradict your earlier point that “the learning curve of reading competence is approaching its ceiling by the time teenagers start high school” (Line 321). Clearly, students who can’t accurately answer any questions are not approaching ceiling performance.

“Effective inferential processing” (Line 377) outlines details of “given content, drawing on world knowledge, assumptions, deductions, contextual factors, and textual cues to derive deeper meaning. This is in line with studies suggesting that inferential processing imposes a high cognitive demand (Singer & Leon, 2007; Barth et al., 2015)” (to Line 381) – confirmatory information that isn’t new.

For your conclusions, I totally agree that “the relationship between inferential ability and other cognitive functions must be further explored in future studies.” (Line 390)

One final problem is this quote from Line 411-412 (“poor comprehenders might still depend on surface-level processing strategies, making them more susceptible to lexical properties”) as you don’t really have any evidence around “surface level” (explicitly stated) comprehension. Inferences depend on background knowledge and you didn’t provide any evidence about your participants’ background knowledge of the text topics in your materials.

“Developing students’ awareness of their own comprehension processes and encouraging metacognitive strategies may also enhance their ability to derive unstated meanings from connected text” (Line 417-419). This is simply synonymous with “awareness of their own comprehension”.

There is much evidence to support “the need for targeted instructional strategies to foster inferential reasoning and processing of implicit textual information” (Line 414-5), however, this isn’t new and possibly “guided questioning techniques to bridge explicit and implicit content, and exposure to diverse textual material” (Line 416-7) may be useful. Without details of your questioning techniques, you don’t seem to present evidence that supports this.

Much of your final paragraph, about “the role of metacognition” (Line 425) and “targeted interventions” (Line 427) can be justified by past research. Have you presented evidence to support this – I’m not sure?

I’m also unsure about the face validity of a “finger tracking technique” (Line 429-430) and whether or not this is really how Grade 10-12 or other readers actually read and learn from texts.

Finally, I am VERY interested in your work, despite this feedback. I’d be keen to see how you address my comments and provide evidence to support much of what you are saying – and, more importantly, support the development of effective, evidence-based interventions for making inferences.

Comments on the Quality of English Language

--

Author Response

Your literature review seems well informed in many aspects of reading, and the title of your paper strongly suggests the focus of your paper is “inferential reading skills”. My problem is that you conclude this literature review with a statement about “understanding explicit details in the text” (Line 150-151), and your measures and data don’t seem to include these explicit text details?

R: We agree that the Introduction needed to better clarify the articulation of skills under investigation. In the revised version, we have expanded the final paragraph of the Introduction to explicitly define inferential processing, while also reinforcing the connection between vocabulary knowledge, referential disambiguation, and the processing of explicit textual information. This addition is intended to highlight that our comprehension questionnaire was designed to assess both the ability to interpret explicit information and to draw inferences from implicit content. The updated version thus better reflects the structure of our comprehension measure and aligns with the multidimensional perspective on reading that we adopt in the study.

Next, your selected texts are referenced to “Albertin et al., 2024” and this paper is (from its title, in the reference list, Line 447) for an elderly population and early detection of cognitive decline? These are very different participants from those in your study? So, I’m questioning if these are appropriate?

R: We thank the reviewer for this observation. The reference to Albertin et al. (2024) was included to acknowledge the broader PRIN ReMind project in which the reading materials were originally developed. However, we have now clarified that the texts used in the present study were not designed for assessing cognitive decline nor tailored to elderly participants. As specified in the Materials section, the selected passages were adapted and rearranged from articles originally published in Focus, an Italian science communication magazine aimed at the general public. The texts were selected for their suitability for a high school population as well, ensuring an appropriate level of linguistic complexity and content engagement. To further enhance transparency and allow readers to better evaluate the materials, we have now included English translations of the two texts in the Appendix.

While you describe some features of these texts (Table 1, Line 168), readers of this paper (possibly teachers?) don’t know the topics or the grade level difficulty of these texts – only their similarity on your reported variables…

Line 165 states these are “excerpts adapted and rearranged from original article”, so any readers of your paper do not know if these texts were factual or narrative texts, or on familiar topics for the students, nor how these were “adapted and rearranged”? These selected texts came from an “Italian science communication magazine”; science texts and topics are likely the most difficult of all content area texts, especially in terms of vocabulary (like History, Geography) – I’m unsure how this would have affected these data?

R: While the texts are drawn from scientific communication, which can be more challenging in terms of vocabulary and conceptual complexity, the selection was intentional to ensure a certain level of linguistic demand. However, we understand the importance of contextualizing these texts within the grade levels and topics. Therefore, to clarify their appropriateness for our participants (high school students, grades 10-12), we have now included these texts in the Appendix. This additional information will allow readers to better assess the relevance and challenge of the materials in relation to the target population.

Three types of questions are asked for the data from this study – 2 on vocabulary (using synonym detection), 3 on referential links (pronouns & other nouns, to actions), and 2 on inferences. No examples of these questions are provided, so readers cannot see the actual type of vocabulary or referent tested (e.g. simple pronoun or more complex?); the 2 inference questions simply “tapped into the ability to draw inferences on the basis of information given in the text” (lines 174-5).

R: We acknowledge that the lack of specific examples for the types of questions posed in the study may have made it difficult to fully understand how the measures were applied. Accordingly, we have now included all questions (vocabulary, referential links, and inferences) in the Appendix. This will provide readers with a clearer understanding of the types of linguistic phenomena assessed and the structure of the questions. 

Readers of this paper can’t specifically validate what any of these questions are measuring. More importantly, inference ability (your focus topic?) is too general.

My example, likely familiar to you, would be to say “my assessment included vegetables, fruits and cakes” – when there are many different types of each food category – and then, grouping eaters on what? Maybe, one type of vegetable (included) might have had the ONLY significant effect – or the same for fruits and cakes? Inferences can include many different types (cause/effect, using background knowledge or… pick one of many more?) any of which could impact on your results?

Additionally, for example, one specific type of question (either vocabulary, referent or inference) may or may not significantly impact on the other types – or interact with the other two “question types”…? I can’t tell?


R: Regarding the potential interactions between the three question types (vocabulary, referential links, and inference), we agree that it is important to clarify whether these factors could interact or influence one another. We have included a brief discussion of this issue in the revised manuscript, highlighting that while these skills are related, each assesses different aspects of reading comprehension. 


To solve these issues, you could include an appendix, with samples of the text and samples of the questions, to validate what you claim in your paper?

R: We thank the reviewer for these valuable comments and hope the additions in Appendix A (texts and questions, with clarifications) address the concerns raised and contribute to a more transparent understanding of the study’s design and measures. We believe these changes will enhance the overall validity and comprehensibility of our study.

You regrouped students based on quartiles (Line 214 – 218) with “above 3rd quartile… 85%” and “below 1st quartile…57%” – so about a quarter of your final group of participants were scoring less than 57% on the 7 questions… So, quartiles are calculated based on medians, to start, and I’m thinking 85% seems a high score (almost 6 out of 7) for the 3rd quartile, and 57% (4 out of 7) seems high for the bottom quartile? 

R: We chose to regroup participants into 3 categories so as not to strongly dichotomize a variable that is continuous in nature, and to avoid losing entirely what lies between extremely good and extremely poor comprehenders. We did so in a bottom-up manner via the quantile distributions, in order to avoid any subjective – a priori – definition of question accuracy.

Given your definition of “comprehension depth as defined by their accuracy in the questionnaire” (Line 213), I’m concerned about what this actually means, as I have no evidence in terms of examples of the questions you used, so I can’t judge the validity of your regrouping.

R: We modified the text into “comprehension proficiency, as determined by their performance on the questionnaire, rather than by grade level”. We agree that the notion of “depth” was somewhat vague and not commonly used in the domain of reading comprehension.

Next, your literature review acknowledges the importance of automaticity (“these processes become increasingly automatic”, Line 69), and I’m wondering how this fits with your filtering out of low and high tracking times. Perhaps my problem is with filtering out “extremely low (< 35ms) or high (> 1500ms) tracking time, or because they were covered for less than half of their length (n= 3386)” (Line 228). Maybe short tracking time may indicate automaticity of knowledge or familiarity or deep knowledge? Covering a word for less than half its length may have different causes for students with different decoding or comprehension performance – so this may be a confound that perhaps hasn’t been considered here?

R: Thanks for the interesting observation. We agree that automaticity speeds up processing, facilitating the recognition of individual words and their integration within a coherent text representation. However, tracking times lower than 35ms seem too short to be associated with any kind of cognitive processing (similarly to how, in the context of eye-tracking, individual fixations shorter than 50ms are removed). Similar reasons led us to exclude words tracked for less than half of their length.
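A minimal R sketch of these exclusion criteria, assuming hypothetical column names:

```r
# Hypothetical sketch of the exclusion criteria: drop tokens tracked for
# < 35 ms or > 1500 ms, or covered for less than half of their length.
tokens_clean <- subset(tokens,
                       tracking_time >= 35 & tracking_time <= 1500 &
                       covered_proportion >= 0.5)
```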

Line 248 - “longer and/or unfrequent words resulted in higher tracking times than shorter and/or frequent ones.” This confirms that what I would call “unfamiliar” words might be tracked LONGER as the reader tries to figure out their meaning – maybe using memory or context?

R: The Reviewer is correct in suggesting that longer and/or less frequent words might take longer to process because readers may need to engage in additional cognitive processes, such as retrieving the word meaning from memory or using contextual clues to make sense of the word. This is consistent with research showing that more difficult words, whether due to length or frequency, often require more effort and attention, leading to longer reading times. We agree that these increased tracking times could reflect a more active processing effort to decode or integrate the unfamiliar word into the broader context of the text.

Taking out these participants & data – there are 57 participants – unsure which grades these are from? Whether one sample biases the results – and I count 9 Grade 10 points on Figure 1 in RED – so conclusions for that grade are based on 9 participants? This leaves 48 participants across Grades 11 & 12, who are spread how? And then, you grouped the 3 grades together into the profiles… So, as a reader I don’t know how many students are in each grade group, nor in each newly formed group.

R: Since we realized that overlap between the dots in the plot made it difficult to spot every single one of them, we added the sample size for each grade to the caption of Figure 2 and replaced the “merged” scatterplot with three separate ones, one for each grade level.

All of your analyses rest on these questionnaire results, later grouped by performance, so I’m left unsure as to how “trends” can be validly and reliably concluded based on these final sample sizes?

All results are therefore questioned – as they all follow on from these final samples?

R: We agree that the sample size could have been bigger. However, we were limited by students’ availability to sign the informed consent and take part in the study. In addition, the high goodness of fit of the models strengthens the reliability of the results. Nonetheless, we recognize that future studies using the current protocol should aim to include more students in the sample.

Line 249-250 “As shown in Figure 2, the effects of word (log) frequency and length were more pronounced for 10th graders.” – Table 2 doesn’t seem to include any Grade 10 data, because?? The Figures below (Figure 2, a, b, c) have all 3 grades (10th graders shown in red, Line 237). Maybe I am misreading or misunderstanding this?

Similarly, Table 3 (Line 267) doesn’t include Grade 10. Perhaps, I might be showing my ignorance of these complex statistical processes. Supportive data results could be presented simpler, showing means and standard deviations for each Grade Level Group, before any re-organising?

R: The baseline of the model corresponds to grade 10, and the other parameters have to be read as deviations from that baseline – note that, due to the inverse link function used in the model, the sign of the coefficients represents an inverse relationship with the dependent variable (i.e. tracking times). We added these clarifications to the captions; we agree that they are necessary for a complete understanding of the models.
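For illustration, a one-line R sketch of this baseline coding, with hypothetical names:

```r
# Hypothetical sketch: grade coded as a factor with "10" as reference level,
# so the grade-11/12 coefficients read as deviations from the grade-10
# baseline (with signs inverted by the inverse link).
reading_data$grade <- relevel(factor(reading_data$grade), ref = "10")
```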

From Figure 3 (Line 270) it appears that Grade 12 students performed most poorly of ALL groups on inferences – the main focus of your paper? Later, Figure 4 (images on previous page to Line 283) confirms MUCH shorter tracking times for Grade 12 – for me this implies more automaticity in decoding (implying faster reading times) – and yet, the Grade 12 scores on inferences (Figure 3, Line 269) are much lower?

R: Actually, Figures 4 and 5 (and the corresponding models, summarized in Tables 4 and 5) no longer include grade level, but are based on the regrouping of participants by their accuracy on the questionnaire. However, we used the same color code for both groupings, which could have made the graphs more difficult to read. We have changed the colors in the revised manuscript (see Figures 3, 4 and 5).

Regarding the results plotted in Fig 3, grade 12 achieved the lowest absolute accuracy. However, the model did not reveal a significant difference between grades in accuracy on inferential questions (as also indicated by the large overlap of the error bars). The main result of the current paper is rather the one plotted in Fig 5, namely that we could dissociate the ability to draw inferences from competence in vocabulary size and referential disambiguation.

Your discussion talks about the “dynamic interaction between reading strategy and comprehension depth” (Line 315) and I’m questioning what “reading strategy” you have included, as well as my question about “comprehension depth” in terms of your questionnaire results (earlier, Line 213).

R: Thank you for your insightful comment. By “reading strategy,” we refer to the cognitive processes and approaches that readers employ when engaging with a text, such as decoding strategies, use of context, and inference-making. These strategies, while not directly measured as a separate variable in our study, are inferred from the way participants responded to different types of questions (vocabulary, referential, and inference) in the comprehension questionnaire. The interaction between these strategies and comprehension depth is reflected in the performance differences across these question types, which depend on both explicit understanding (vocabulary and referential links) and more implicit processes (inference-making). Regarding “comprehension depth” (now redefined as comprehension proficiency), we defined it operationally based on participants’ accuracy in the questionnaire, which was designed to tap into different layers of comprehension -- ranging from explicit recall to higher-order inferential processing. We believe this operational definition allows us to capture a nuanced view of reading comprehension beyond basic understanding, though we agree that future work could further elaborate on the specific strategies employed by readers in relation to these different levels of comprehension.


You state (“results did not reveal any striking difference across grade”, Line 318) when I can see on two figures (Figures 3 & 5) that Grade 12 appear to perform most poorly on inferences in your questionnaire?

 You continue to discuss (line 320-325) “personal attitude” and “habit toward reading”, for which you have no evidence from your paper.

R: Thank you for pointing out this issue. You are correct in noting that we do not provide direct evidence of “personal attitude” or “habit toward reading” in the present study. These terms were introduced in the discussion to acknowledge potential confounding factors that might influence reading comprehension, which we consider part of the broader context of reading behavior. However, since we did not directly measure or collect data on these variables, we have removed these references from the revised manuscript to maintain clarity and focus on the actual findings of the study. We appreciate your comment, and we have adjusted the discussion to ensure that our conclusions are more tightly aligned with the evidence provided in our results: “The absence of a clear developmental pattern in tracking times might reflect individual differences in processing strategies rather than a uniform progression across grades.”

I agree with the points about automaticity, via tracking times, from which you use the term “processing speed” (Line 339 – 350), and this finding is not something new, except that you link this to “finger tracking”. I’d question whether this is typically how students in Grades 10-12 read most of the time – as my experiences suggest they read silently and seldom use finger tracking.

Continuing, each of your conclusions (lines 351 to 367) is valid and was evident in research in the mid-1980’s, from my own doctoral thesis. One statement “we found that students with the lowest accuracy could not respond to any of the question types” (Line 358) seems to contradict your earlier point that “the learning curve of reading competence is approaching its ceiling by the time teenagers start high school” (Line 321). Clearly, students who can’t accurately answer any questions are not approaching ceiling performance.

“Effective inferential processing” (Line 377) outlines details of “given content, drawing on world knowledge, assumptions, deductions, contextual factors, and textual cues to derive deeper meaning. This is in line with studies suggesting that inferential processing imposes a high cognitive demand (Singer & Leon, 2007; Barth et al., 2015)” (to Line 381) – confirmatory information that isn’t new.

For your conclusions, I totally agree that “the relationship between inferential ability and other cognitive functions must be further explored in future studies.” (Line 390)

One final problem is this quote from Lines 411-412 (“poor comprehenders might still depend on surface-level processing strategies, making them more susceptible to lexical properties”), as you don’t really have any evidence about “surface-level” (explicitly stated) comprehension. Inferences depend on background knowledge, and you didn’t provide any evidence about your participants’ background knowledge of the text topics in your materials.

 

R: We appreciate your observation regarding the lack of explicit evidence on "surface-level processing." While our study focuses on inferential reasoning, we recognize that explicit comprehension, including surface-level processing, is an underlying factor in the ability to make inferences. In the revised version we changed "surface-level processing" to "basic lexical processing", thus clarifying that we are referring to word-level recognition and surface features, such as word frequency or familiarity.

We recognize the importance of considering background knowledge in the comprehension process. In the current study, participants were not specifically assessed for prior knowledge of the text topics. We appreciate your suggestion and have incorporated an acknowledgement that prior knowledge of the topics addressed in the texts was not explicitly measured.

 

“Developing students’ awareness of their own comprehension processes and encouraging metacognitive strategies may also enhance their ability to derive unstated meanings from connected text” (Lines 417-419): this is simply synonymous with “awareness of their own comprehension”.

 

R: We agree, and we have modified the text into: "The absence of a clear developmental pattern in tracking times might reflect individual differences in processing strategies rather than a uniform progression across grades."

 

There is much evidence to support “the need for targeted instructional strategies to foster inferential reasoning and processing of implicit textual information” (Lines 414-415); however, this isn’t new, and possibly “guided questioning techniques to bridge explicit and implicit content, and exposure to diverse textual material” (Lines 416-417) may be useful. Without details of your questioning techniques, you don’t seem to present evidence that supports this.

 

R: We clarified this issue in both the Materials section and the Discussion.

Much of your final paragraph, about “the role of metacognition” (Line 425) and “targeted interventions” (Line 427), can be justified by past research. Have you presented evidence to support this? I’m not sure.

 

R: We clarified this issue by modifying the text into: "While our study did not directly assess students’ metacognitive strategies, our findings are consistent with the idea that enhancing students’ awareness of their own comprehension processes may support their ability to derive unstated meanings from connected text, in line with recent evidence on the role of metacognition in reading comprehension (Cartwright, 2023; Rice, 2024; Hall et al., 2020; Rice & Wijekumar, 2024)."

 

I’m also unsure about the face validity of a “finger-tracking technique” (Lines 429-430) and whether or not this is really how Grade 10-12 or other readers actually read and learn from texts.

 

R: While we acknowledge that finger-tracking is not a naturalistic reading modality for proficient readers, we have now clarified in the Introduction that its use is supported by strong correlations with eye-tracking data at multiple text levels, especially in studies with young adults. Accordingly, given its low intrusiveness and applicability in classroom settings, finger-tracking offers a feasible and informative method to investigate reading dynamics in school-aged participants.

 

Finally, I am VERY interested in your work, despite this feedback. I’d be keen to see how you address my comments and provide evidence to support much of what you are saying and, more importantly, to support the development of effective, evidence-based interventions for making inferences.

R: We would like to express our gratitude for the constructive and thoughtful comments provided. The reviewer's critical insights and encouragement have helped us clarify the scope of our study and strengthen both its empirical and interpretative components. We have revised the manuscript accordingly, and we hope that the changes implemented effectively address the concerns raised. We truly appreciate the reviewer’s interest in our work and their support for advancing evidence-based strategies to foster inferential reading skills.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have addressed the questions I had originally raised, and I have no further questions. There are just two remaining points that the authors may want to consider. Firstly, the authors have amended lines 411-414 in their Conclusion section as follows:

" This result aligns with theories– such as the Construction-Integration Model and the Landscape Model– proposing that proficient readers rely on lexical features 412– as part of a broader, context-based processing strategy– suggesting predictive processing and higher-order integration mechanisms rather than lexical-level decoding, in line with well-established findings (Perfetti & Stafura, 2014). "

I find this sentence hard to read, and thus it's difficult to understand what the authors are claiming. Could the authors clarify what they mean by "proficient readers rely on lexical features" and how/why exactly this suggests "predictive processing and higher-order integration mechanisms rather than lexical-level decoding"?

Secondly, the authors could consider revising their manuscript to correct any errors/typos, as I had mentioned in my last review. Please find some examples below:

Line 185 --> ...participants took part in our study...

Line 281 --> ...the highest accuracy 

Line 338 --> This hypothesis seems to be strengthened 

Author Response

The authors have addressed the questions I had originally raised, and I have no further questions. There are just two remaining points that the authors may want to consider. Firstly, the authors have amended lines 411-414 in their Conclusion section as follows:

 " This result aligns with theories– such as the Construction-Integration Model and the Landscape Model– proposing that proficient readers rely on lexical features – as part of a broader, context-based processing strategy– suggesting predictive processing and higher-order integration mechanisms rather than lexical-level decoding, in line with well-established findings (Perfetti & Stafura, 2014). ".I find this sentence to be hard-to-read and thus it's difficult to understand what the authors are claiming. Could the authors clarify what they mean by "proficient readers rely on lexical features" and how/why exactly this suggests "predictive processing and higher-order integration mechanisms rather than lexical-level decoding"? 

R: Following your comment, we expanded the paragraph to convey our interpretation in more detail.

Secondly, the authors could consider revising their manuscript to correct any errors/typos, as I had mentioned in my last review. Please find some examples below:

 Line 185 --> ...participants took part in our study...

Line 281 --> ...the highest accuracy 

Line 338 --> This hypothesis seems to be strengthened 

R: Thanks for pointing out these typos. We corrected them and a couple of others.

Reviewer 2 Report

Comments and Suggestions for Authors

Firstly, I want to congratulate the author(s) for taking on board many of my comments, which I can see in this second review!

I’d be really interested to see if this “Readlet protocol” might be useful for these purposes, in English-speaking schools…

 

My process is to re-read the whole paper, checking for coherent text, as well as each specific line reference for additional text added following my initial review…

My first comment is around paragraph length. At Line 129, where “However” appears, I think a new paragraph should begin; this would be consistent with Line 102, where the authors do this.

I am pleased to see the Appendix, with texts, questions, and answers; the numbers of students (around Line 246); and Figure 1, Figures 3, 4 & 5, Table 2, and Tables 3, 4 & 5, whose graphs and descriptions are much easier to read. These are valuable additions for readers of this paper. I’d suggest you double-check the texts in your Appendix, as there seem to be some minor punctuation errors.

I understand Table 1 has the data that your paper included, and I wondered if you could add a word count (in your Appendix) and perhaps some measure of text difficulty/complexity that teachers might understand (in Table 1)? There are such online measures, some of which are freely available.

Your discussion is one paragraph, which seems much too long. Please break this into separate main points for the ease of readers.

Please also re-read your conclusions with a similar goal of breaking them down into main ideas that readers can more easily understand and take on board.

Lastly, I’d be interested in following up further with you or your team about any other research you are working on in comprehension or, more specifically inferences. I hope that this process enables the sharing of my details, once your paper is published.

Author Response

Firstly, I want to congratulate the author(s) for taking on board many of my comments, which I can see in this second review! I’d be really interested to see if this “Readlet protocol” might be useful for these purposes, in English-speaking schools… My process is to re-read the whole paper, checking for coherent text, as well as each specific line reference for additional text added following my initial review…

My first comment is around paragraph length. At Line 129, where “However” appears, I think a new paragraph should begin; this would be consistent with Line 102, where the authors do this.

R: We split the text into two paragraphs.

 

I am pleased to see the Appendix, with texts, questions, and answers; the numbers of students (around Line 246); and Figure 1, Figures 3, 4 & 5, Table 2, and Tables 3, 4 & 5, whose graphs and descriptions are much easier to read. These are valuable additions for readers of this paper. I’d suggest you double-check the texts in your Appendix, as there seem to be some minor punctuation errors.

R: Thanks for pointing out the typos. We corrected them.

 

I understand Table 1 has the data that your paper included, and I wondered if you could add a word count (in your Appendix) and perhaps some measure of text difficulty/complexity that teachers might understand (in Table 1)? There are such online measures, some of which are freely available.

R: We added a new table to the Appendix listing the same lexical features described in Table 1 of the main text.

Your discussion is one paragraph, which seems much too long. Please break this into separate main points for the ease of readers.

Please also re-read your conclusions with a similar goal of breaking them down into main ideas that readers can more easily understand and take on board.

R: We split the text into separate paragraphs.

 

Lastly, I’d be interested in following up further with you or your team about any other research you are working on in comprehension or, more specifically inferences. I hope that this process enables the sharing of my details, once your paper is published.

R: Thank you very much for your interest. You can find our e-mail addresses in the manuscript. We are open to exchanging e-mails on the topic.
