Open Access Article

Peer-Review Record

Development and Diagnostic Accuracy of a Novel Screening Tool for Early Detection of Pediatric Visual Impairment in Indonesian School-Aged Children

Healthcare 2026, 14(9), 1233; https://doi.org/10.3390/healthcare14091233

by Arya Ananda Indrajaya Lukmana^1, Tri Rahayu^2,3,4,*, Kianti Raisa Darusman^1,2,5, Ray Wagiu Basrowi^1,6,7,* and Nila Djuwita F. Moeloek^1,2

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 25 March 2026 / Revised: 30 April 2026 / Accepted: 1 May 2026 / Published: 3 May 2026

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This study demonstrates strong statistical performance for the CIPSEL score, a 10-item questionnaire focused on visual symptoms to identify uncorrected refractive errors (URE), among children in a school setting. By utilizing a three-tier risk stratification system rather than a binary pass/fail approach, the authors established a robust correlation between questionnaire results and actual clinical visual acuity.

Please address the following points in the Discussion section:

1. While this tool shows significant promise as a first-line screening method, the study findings do not represent "confirmed" URE. Clearly framing the CIPSEL score as a preliminary screening layer is essential for managing clinical expectations. Users need to understand that the tool is designed to prioritize high-risk cases for specialist referral and that subsequent clinical confirmation is still required.

2. Since the study focused on children aged 8 to 12 years, this tool functions more accurately as a refractive error screener than as a comprehensive amblyopia prevention tool. Given that amblyopia treatment is most effective in younger children, please include a discussion of the need to develop a separate version of CIPSEL tailored to children aged 4–6 years, to address preventable vision loss early in the critical period of visual development.

3. Please provide specific examples of how the "medium risk" category should be managed in real-world clinical settings, such as a formal 3–6-month monitoring interval with scheduled re-screening.

4. Typo in Figure 2: "Students with normal Visus" should read "Students with normal Vision".

Author Response

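Comment 1

While this tool shows significant promise as a first-line screening method, the study findings do not represent "confirmed" URE. Clearly framing the CIPSEL score as a preliminary screening layer is essential for managing clinical expectations. Users need to understand that the tool is designed to prioritize high-risk cases for specialist referral and that subsequent clinical confirmation is still required.

Response: We thank the reviewer for this important observation and fully agree. We have added an explicit statement early in the Discussion section to frame CIPSEL unambiguously as a preliminary triage instrument rather than a diagnostic tool. The following text has been added (Discussion, Paragraph 1):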

"It is essential to emphasize that a positive CIPSEL screen does not constitute a diagnosis. The tool is intended as a preliminary triage layer, identifying children who warrant referral for formal clinical evaluation, including best-corrected visual acuity testing and, where indicated, cycloplegic refraction, rather than as a replacement for ophthalmological assessment."

This framing is also reinforced in the Conclusions section, which now consistently refers to CIPSEL as a first-line screening and triage tool.

Comment 2

Since the study focused on children aged 8 to 12 years, this tool functions more accurately as a refractive error screener than as a comprehensive amblyopia prevention tool. Given that amblyopia treatment is most effective in younger children, please include a discussion of the need to develop a separate version of CIPSEL tailored to children aged 4–6 years, to address preventable vision loss early in the critical period of visual development.

Response: We agree with this clinically important observation. A dedicated paragraph has been added to the Discussion section addressing the critical period of visual development and the necessity of developing a CIPSEL version adapted for children aged 4–6 years. This paragraph immediately follows the discussion of implications for community ophthalmology, before the Limitations paragraph:

"The current study was conducted among children aged 8–12 years (grades 3–5), a population in whom the critical period for amblyopia development has largely concluded. While CIPSEL demonstrates strong screening performance in this age group, the greatest opportunity for preventing permanent vision loss lies in earlier detection, ideally before the age of 7–8 years, when neuroplasticity of the visual cortex allows for effective amblyopia treatment [33]. Future development should therefore prioritize the adaptation of CIPSEL for younger cohorts, particularly children aged 4–6 years. A version tailored for this age group would require significant modification: the questionnaire format would need to shift from self-report to a proxy-report instrument completed by parents or caregivers, and items would need to be redesigned around observable behaviors accessible to non-clinical observers, given the limited verbal and cognitive capacity of preschool-aged children."

The supporting reference cited is: Thompson B et al. Harnessing Brain Plasticity to Improve Binocular Vision in Amblyopia: An Evidence-Based Update. Eur J Ophthalmol. 2024;34:901–912 [Ref. 33].

Comment 3

Please provide specific examples of how the "medium risk" category should be managed in real-world clinical settings, such as a formal 3–6-month monitoring interval with scheduled re-screening.

Response: We thank the reviewer for this practical and clinically relevant suggestion. We have added a dedicated paragraph within the tiered stratification discussion section that provides concrete, real-world management guidance for the Medium Risk category:

"For children in the Medium Risk category (score = 3), a structured monitoring approach is recommended rather than immediate referral or discharge. A re-screening interval of 3–6 months is proposed, allowing sufficient time to observe whether symptoms stabilize or progress, while avoiding unnecessary burden on specialist services. If symptoms worsen before the scheduled re-screening — such as increased squinting, declining academic performance, or new visual complaints — prompt referral to an ophthalmologist is warranted."

Additionally, Table 6 (CIPSEL Risk Stratification and Recommended Clinical Actions) has been updated to specify this interval explicitly in the Suggested Action column for Medium Risk, replacing the previous non-specific phrase 'Re-screen within a few months.'
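The three-tier logic described in this response can be expressed as a small lookup. This is a minimal sketch, assuming integer totals of 0–10 and the cutoffs reported in this record (Medium = a score of 3, with scores of 4 or more forming the higher tier). The tier names "Low" and "High" are inferred rather than quoted from Table 6, the action strings paraphrase the responses above, and the Low-tier action is a hypothetical placeholder, not the published wording.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskTier:
    label: str
    action: str


def stratify(score: int) -> RiskTier:
    """Map a total CIPSEL score (0-10) to a risk tier (sketch, not Table 6 verbatim)."""
    if not 0 <= score <= 10:
        raise ValueError(f"CIPSEL total must be between 0 and 10, got {score}")
    if score >= 4:
        # Higher tier: prioritized for formal clinical evaluation.
        return RiskTier("High", "Refer for formal clinical evaluation")
    if score == 3:
        # Medium tier: structured monitoring rather than referral or discharge.
        return RiskTier("Medium", "Re-screen in 3-6 months; refer promptly if symptoms worsen")
    # Low tier: hypothetical placeholder action (not stated in this record).
    return RiskTier("Low", "Continue routine school screening (assumed)")


for s in (2, 3, 5):
    tier = stratify(s)
    print(s, tier.label, "-", tier.action)
```

Keeping the Medium tier as an exact score rather than a range mirrors the record's description of it as "score = 3"; a different cutoff pair would only change the two comparisons.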

Comment 4

Typo in Figure 2: "Students with normal Visus" should read "Students with normal Vision".

Response: We thank the reviewer for identifying this typographical error. The label in Figure 2 has been corrected from "Students with Normal Visus" to "Students with Normal Vision" in the revised figure file submitted alongside this manuscript.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

I congratulate the authors on their work. Below are some specific comments:

The study included students in grades 3-5, and the age range is stated in the manuscript as "approximately aged 8–12 years" (line 270). This suggests that the authors did not directly record the age, but rather used an estimated value. However, the Abstract states "mean age 8.6 years" (line 23). The Methods section states that demographic data (including age) were not individually recorded due to an anonymity protocol (line 190). So how was the average age of 8.6 years arrived at? If the estimated age was calculated based on the grade level of the participating children and the number of children in each grade, is this truly a solid enough statistical calculation to justify specifying an average value in such a study?
Similarly, gender information was not recorded. Since neither gender nor age data are available, subgroup analyses are not possible.
The study used the Snellen test, which does not replace cycloplegic refraction. The authors, in a very honest approach, acknowledged this fact and did not claim a definitive diagnosis. However, this choice has a fundamental drawback: a Snellen test without cycloplegia can lead to misclassification. For example, hyperopia can easily be overlooked, and children with eye strain, attention deficit, or cooperation problems might be mistakenly classified as having "visual impairment." In short, the AUC values are computed against a flawed reference standard rather than a true gold standard. If the reference standard itself is flawed, how can we be confident that the reported AUC reflects the instrument's true diagnostic performance?
The Buderer formula indicated that the study's sample size should be at least 130 participants, and 131 students were included. However, all the children were recruited from a single school in South Jakarta. How can these 131 students be considered representative of all of Indonesia or of the WTO countries, or even of the region where the study was conducted? Furthermore, the 19.8% prevalence found in the study is seriously inconsistent with the 40% reference value used in the Methods section.

Author Response

Comment 1

The study included students in grades 3-5, and the age range is stated in the manuscript as "approximately aged 8–12 years" (line 270). This suggests that the authors did not directly record the age, but rather used an estimated value. However, the Abstract states "mean age 8.6 years" (line 23). The Methods section states that demographic data (including age) were not individually recorded due to an anonymity protocol (line 190). So how was the average age of 8.6 years arrived at? If the estimated age was calculated based on the grade level of the participating children and the number of children in each grade, is this truly a solid enough statistical calculation to justify specifying an average value in such a study?

Response: We thank the reviewer for identifying this important inconsistency. Upon reflection, we fully agree that specifying a mean age value of 8.6 years was methodologically unjustifiable given that individual age data were not recorded. The value was retrospectively estimated from grade-level distribution, which does not constitute a robust enough basis to report as a precise statistical mean.

Accordingly, we have removed the mean age of 8.6 years from the Abstract and all other sections of the manuscript. All references to participant age now consistently use the estimated range of 8–12 years based on grade level, with explicit justification. The following clarification has been added to the Methods section:

"The age range of 8–12 years was estimated based on the standard Indonesian primary school entry age (7 years) and the grade levels of participating students (grades 3–5), consistent with the Ministry of Education's national schooling guidelines. Individual age data were not recorded in accordance with the anonymization protocol."

The Highlights section has also been updated to read: "131 students from grades 3 to 5, estimated to be between 8 and 12 years of age based on the Indonesian standard school entry age system," replacing the previous incorrect reference to a mean age.
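The arithmetic behind the grade-based range can be made explicit. This is a minimal sketch, assuming a child in grade g is typically 7 + (g − 1) years old (from the stated entry age of 7) and allowing a hypothetical ±1-year tolerance for early or late school entry and birthdays within the school year; the tolerance is an assumption of this example, not a figure from the manuscript. It also illustrates why no mean follows from grade levels alone: per-grade counts are not reported, and each child's age is known only to within this tolerance.

```python
ENTRY_AGE = 7  # standard Indonesian primary school entry age, as stated in the Methods


def estimated_age_range(grades, entry_age=ENTRY_AGE, tolerance=1):
    """Return (min_age, max_age) implied by grade levels alone.

    Assumes a child in grade g is typically entry_age + g - 1 years old,
    widened by `tolerance` years on each side (hypothetical allowance).
    """
    typical = [entry_age + g - 1 for g in grades]
    return min(typical) - tolerance, max(typical) + tolerance


print(estimated_age_range([3, 4, 5]))  # (8, 12)
```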

Comment 2

Similarly, gender information was not recorded. Since neither gender nor age data are available, subgroup analyses are not possible.

Response: We agree with this observation and acknowledge it as a meaningful limitation. The absence of individually recorded sex and age data was a consequence of the anonymization protocol applied during the government-mandated school health screening program under which this study was embedded. While this protocol is ethically justified and consistent with the KEPPKN 2021 exemption criteria, the reviewer is correct that it precludes sex-stratified and age-stratified subgroup analyses.

We have added an explicit acknowledgement and methodological rationale for this limitation in the Limitations section:

"The absence of individually recorded sex and age data, while consistent with the anonymization protocol of the government-mandated screening program, precludes sex-stratified or age-stratified subgroup analyses. Given that the prevalence and symptomatology of refractive errors may differ between sexes and across specific age cohorts within the 8–12 year range, future studies should incorporate these variables under appropriate data protection frameworks to enable more granular analysis."

We also note this explicitly in the Future Research section as a priority direction for subsequent validation studies.

Comment 3

The study used the Snellen test, which does not replace cycloplegic refraction. The authors, in a very honest approach, acknowledged this fact and did not claim a definitive diagnosis. However, this choice has a fundamental drawback: a Snellen test without cycloplegia can lead to misclassification. For example, hyperopia can easily be overlooked, and children with eye strain, attention deficit, or cooperation problems might be mistakenly classified as having "visual impairment." In short, the AUC values are computed against a flawed reference standard rather than a true gold standard. If the reference standard itself is flawed, how can we be confident that the reported AUC reflects the instrument's true diagnostic performance?

Response: We thank the reviewer for raising this fundamental methodological point with such clarity. This is perhaps the most critical scientific concern in the review, and we have addressed it substantively rather than defensively.

We agree that the use of non-cycloplegic Snellen visual acuity as the reference standard represents an important limitation that requires transparent and specific acknowledgement. We have substantially expanded the discussion of this limitation with a dedicated paragraph supported by three peer-reviewed references, now included in the Discussion section (following the initial Snellen acknowledgement):

"A fundamental methodological consideration concerns the use of uncorrected Snellen visual acuity as the reference standard, rather than cycloplegic refraction, which is widely regarded as the gold standard for diagnosing refractive errors in children [18]. This limitation was acknowledged a priori and the study outcome variable was consequently framed as 'visual impairment consistent with possible URE' rather than 'confirmed refractive error.' It is recognized that non-cycloplegic Snellen testing may underestimate accommodative hyperopia, as children with significant hyperopia may compensate through active accommodation and achieve normal visual acuity despite clinically relevant refractive error [20], and may conversely overestimate visual impairment in children with accommodative excess or spasm, where transiently reduced distance acuity reflects a functional rather than structural deficit [21], or in those with limited cooperation, attention, or comprehension of the test task during mass screening [19]. However, this pragmatic choice reflects real-world primary care constraints in Indonesia, where cycloplegic refraction is not feasible in mass school-based screening. Importantly, this same limitation is shared by virtually all large-scale school vision screening programs globally, including those endorsed by the WHO, which rely on visual acuity as the primary screening metric because of its practicality [22]. The AUC of 0.887 therefore reflects CIPSEL's performance against the same reference standard used operationally in the target setting, a programmatically valid benchmark even if not the biological gold standard."

Regarding the reviewer's specific concern about whether the AUC can be trusted given an imperfect reference standard: we wish to clarify that the vast majority of school vision screening validation studies, including those cited in our manuscript, use non-cycloplegic visual acuity as their reference standard precisely because it represents the operational benchmark against which any screening tool must perform in real-world settings. Cycloplegic refraction is not available at the point of primary-care screening anywhere in Indonesia's public health system. The CIPSEL tool is designed to work within this operational reality, and its AUC reflects its accuracy relative to the same standard it would be used alongside in practice. Future studies should indeed compare CIPSEL performance against cycloplegic refraction as a confirmatory gold standard, a recommendation now explicitly stated in the Future Research section.

New references have been added to support the specific claims in this paragraph:

18. Morgan, I.G.; Iribarren, R.; Fotouhi, A.; Grzybowski, A. Cycloplegic Refraction Is the Gold Standard for Epidemiological Studies. Acta Ophthalmol. (Copenh.) 2015, 93, 581–585, doi:10.1111/aos.12642.

19. Arnold, R.W.; Donahue, S.P.; Silbert, D.I.; Longmuir, S.Q.; Bradford, G.E.; Peterseim, M.M.W.; Hutchinson, A.K.; O’Neil, J.W.; De Alba Campomanes, A.G.; Pineles, S.L. AAPOS Uniform Guidelines for Instrument-Based Pediatric Vision Screen Validation 2021. J. Am. Assoc. Pediatr. Ophthalmol. Strabismus 2022, 26, 1.e1-1.e6, doi:10.1016/j.jaapos.2021.09.009.

20. O’Donoghue, L.; Rudnicka, A.R.; McClelland, J.F.; Logan, N.S.; Saunders, K.J. Visual Acuity Measures Do Not Reliably Detect Childhood Refractive Error - an Epidemiological Study. PLoS ONE 2012, 7, e34441, doi:10.1371/journal.pone.0034441.

21. Hilora, M.; Tripathy, K. Accommodative Excess. In StatPearls [Internet]; StatPearls Publishing: Treasure Island, FL, USA, 2025.
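The reviewer's concern about an imperfect reference standard can be quantified with a standard misclassification model. Assuming the questionnaire and the Snellen reference err independently given a child's true refractive status (conditional independence, an assumption of this sketch), the sensitivity and specificity observed against Snellen can be written in terms of the true accuracies of both tests. All numerical inputs below are hypothetical and are not estimates from this study.

```python
def apparent_accuracy(prev, se_t, sp_t, se_r, sp_r):
    """Sensitivity and specificity of index test T as observed against reference R.

    Assumes T and R misclassify independently given the true condition.
    prev: true prevalence; se_t/sp_t: true accuracy of T; se_r/sp_r: accuracy of R.
    """
    p_r_pos = prev * se_r + (1 - prev) * (1 - sp_r)  # P(R+)
    p_r_neg = 1 - p_r_pos                              # P(R-)
    p_both_pos = prev * se_t * se_r + (1 - prev) * (1 - sp_t) * (1 - sp_r)  # P(T+, R+)
    p_both_neg = prev * (1 - se_t) * (1 - se_r) + (1 - prev) * sp_t * sp_r  # P(T-, R-)
    return p_both_pos / p_r_pos, p_both_neg / p_r_neg


# Hypothetical inputs: 20% true prevalence, questionnaire Se 0.80 / Sp 0.90.
print(apparent_accuracy(0.20, 0.80, 0.90, 1.00, 1.00))  # perfect reference
print(apparent_accuracy(0.20, 0.80, 0.90, 0.90, 0.95))  # imperfect reference
```

With these inputs, a questionnaire whose true sensitivity is 0.80 appears to have a sensitivity of about 0.67 against the imperfect reference. If the errors are instead positively correlated, for example when inattention affects both the interview and the Snellen reading, apparent accuracy can be inflated rather than deflated.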

Comment 4

The Buderer formula indicated that the study's sample size should be at least 130 participants, and 131 students were included. However, all the children were recruited from a single school in South Jakarta. How can these 131 students be considered representative of all of Indonesia or of the WTO countries, or even of the region where the study was conducted? Furthermore, the 19.8% prevalence found in the study is seriously inconsistent with the 40% reference value used in the Methods section.

Response: We thank the reviewer for raising both the generalizability concern and the prevalence discrepancy. We address each in turn.

On generalizability: We agree that a single-school sample from South Jakarta cannot and should not be presented as representative of all of Indonesia, let alone other LMIC settings. The study was explicitly designed as an initial pilot validation study, not a population survey, and the sample size was calculated for diagnostic accuracy purposes (i.e., to ensure reliable sensitivity and specificity estimates), not for population representativeness. We have strengthened the Limitations section with the following explicit statement:

"As a single-center pilot study conducted in one urban private school in South Jakarta, generalizability to rural, public school, or lower socioeconomic settings remains to be established. The study was designed as an initial validation study, not a population survey, and the sample size was calculated for diagnostic accuracy studies specifically, not for prevalence estimation."

On the prevalence discrepancy (19.8% vs. 40%): The reviewer identifies that the observed prevalence of 19.8% is substantially lower than the 40% figure used in the Buderer sample size calculation. We wish to clarify that the 40% figure was deliberately selected as a conservative, worst-case prevalence estimate derived from a post-pandemic Jakarta-wide study (Darusman et al., 2023), which assessed a broader and more heterogeneous school population. In diagnostic accuracy sample size calculations, using a higher prevalence estimate is standard practice to ensure that the number of 'positive' cases (i.e., children with visual impairment) is sufficient to produce reliable sensitivity estimates. Using the conservative estimate of 40% ensured that our sample would contain an adequate number of true positive cases even if the actual prevalence was lower. The observed prevalence of 19.8% in this private school sample is consistent with the urban private school setting of this study, where children may have already undergone prior eye care, and should not be interpreted as contradicting the broader Jakarta prevalence data. We have added the following clarifying sentence to the Methods section, immediately following the sample size paragraph:

"The 40% prevalence figure was selected as a conservative worst-case estimate to ensure adequate sample size for diagnostic accuracy analysis. The actual observed prevalence of 19.8% in this single-school pilot study reflects the specific characteristics of the study setting and should not be interpreted as a population-level prevalence estimate."

We appreciate the reviewer's attention to this detail, which has allowed us to provide a more rigorous and transparent explanation of the sample size rationale.
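In Buderer's formula, prevalence enters the denominator: the total sample needed for a given precision of sensitivity scales as 1/prevalence, while that for specificity scales as 1/(1 − prevalence). The sketch below makes this explicit under hypothetical planning values (Se = 0.80, Sp = 0.90, d = 0.10, z = 1.96) that are not the study's inputs, so it does not reproduce the reported minimum of 130. The `wald_half_width` helper, a standard Wald approximation, shows the precision achieved with a given number of impaired children.

```python
import math


def buderer_n_for_sensitivity(se: float, d: float, prevalence: float, z: float = 1.96) -> int:
    """Total sample size so that sensitivity is estimated within +/- d."""
    return math.ceil(z**2 * se * (1 - se) / (d**2 * prevalence))


def buderer_n_for_specificity(sp: float, d: float, prevalence: float, z: float = 1.96) -> int:
    """Total sample size so that specificity is estimated within +/- d."""
    return math.ceil(z**2 * sp * (1 - sp) / (d**2 * (1 - prevalence)))


def wald_half_width(estimate: float, n_group: int, z: float = 1.96) -> float:
    """Approximate 95% CI half-width for a proportion estimated in n_group children."""
    return z * math.sqrt(estimate * (1 - estimate) / n_group)


# Hypothetical planning values (not the study's): Se = 0.80, Sp = 0.90, d = 0.10.
for prev in (0.40, 0.198):
    print(prev,
          buderer_n_for_sensitivity(0.80, 0.10, prev),
          buderer_n_for_specificity(0.90, 0.10, prev))

# 19.8% of 131 children is roughly 26 impaired children.
print(round(wald_half_width(0.80, 26), 3))  # about 0.154
```

Run as written, the loop shows how the required total changes when the assumed prevalence moves from 40% to the observed 19.8%. Whether a given prevalence assumption is "conservative" therefore depends on whether the sensitivity or the specificity term drives the calculation.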

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

I thank the authors for their responses to my previous comments. I would like to address a point I omitted in the first round.

Questions 8 and 9 of the CIPSEL questionnaire are functionally conditional: Question 9 ('If you have ever worn glasses, do you wear them throughout the day?') is only valid for students who answered 'Yes' to Question 8 ('Have you ever worn glasses before?'). For students who have never worn glasses, Question 9 is, strictly speaking, semantically unanswerable.
The article does not explain how this conditional structure is handled in the scoring algorithm. Specifically: was a default score of 0 assigned to question 9 for students who do not wear glasses, or was the item skipped and the total score calculated proportionally? This information is important because it directly affects each participant's total CIPSEL score, the score distribution reported in Table 3, and therefore the AUC and cutoff point analyses.

Author Response

We confirm that a score of 0 was assigned to Item 9 for all students who had never worn glasses (i.e., those who responded 'No' to Item 8). No items were skipped, and no proportional score adjustment was applied. Total CIPSEL scores for all 131 participants were calculated as the straightforward sum of all 10 items, yielding a consistent range of 0–10 across the entire sample.
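The scoring rule confirmed here, and the proportional alternative the reviewer asked about, can be contrasted in a short sketch. It assumes responses are stored as ten booleans in item order (so Items 8 and 9 sit at zero-based indices 7 and 8) and that every 'Yes' scores 1 point; the scenario table in this response shows this only for Items 8 and 9, so the scoring of the other items is an assumption. `prorated_total` is a hypothetical illustration of the approach that was not used.

```python
ITEM_8 = 7  # zero-based index of Item 8 ("Have you ever worn glasses before?")
ITEM_9 = 8  # zero-based index of Item 9 ("... do you wear them throughout the day?")


def cipsel_total(responses: list[bool]) -> int:
    """Universal 10-item scoring: 'Yes' = 1, 'No' = 0 (assumed for all items).

    Item 9 is scored 0 whenever Item 8 is 'No'; nothing is skipped or prorated.
    """
    if len(responses) != 10:
        raise ValueError("CIPSEL has exactly 10 items")
    items = [int(r) for r in responses]
    if items[ITEM_8] == 0:
        items[ITEM_9] = 0  # never worn glasses: Item 9 contributes 0
    return sum(items)


def prorated_total(responses: list[bool]) -> float:
    """Hypothetical alternative (NOT used in the study): skip Item 9 for
    non-wearers and rescale the remaining nine items to a 0-10 range."""
    if len(responses) != 10:
        raise ValueError("CIPSEL has exactly 10 items")
    if responses[ITEM_8]:
        return float(sum(int(r) for r in responses))
    answered = [int(r) for i, r in enumerate(responses) if i != ITEM_9]
    return sum(answered) * 10 / len(answered)


# Scenarios A, B and C (all other items answered 'No').
base = [False] * 10
for name, q8, q9 in (("A", False, False), ("B", True, False), ("C", True, True)):
    r = base.copy()
    r[ITEM_8], r[ITEM_9] = q8, q9
    print(name, cipsel_total(r))  # prints A 0, then B 1, then C 2
```

The two rules diverge only for non-wearers with symptom items endorsed: three 'Yes' answers give a total of 3 under universal scoring but about 3.33 when prorated, which is why the choice can move a child across an integer cutoff.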

Point 1 — Item 9 Is Semantically Answerable by All Participants

We respectfully note that Item 9, "If you have worn glasses before, do you wear them throughout the day?", is, in our assessment, semantically answerable by all students, including those who have never worn glasses. The underlying behavior being measured is whether a child wears corrective lenses consistently. A child who has never worn glasses does not wear glasses throughout the day and can therefore answer 'No' meaningfully and validly. The conditional clause ("If you have worn glasses before...") functions as a contextual qualifier to help respondents interpret the question, not as a logical gate that renders the item inapplicable to non-wearers.

This is distinct from a genuinely unanswerable conditional item — for example, "What brand of glasses do you wear?" — where a non-wearer has no valid response option. Item 9 offers a binary Yes/No response, and 'No' is both linguistically appropriate and behaviorally accurate for non-wearers. Assigning a score of 0 therefore reflects a valid response, not an imputed default.

Point 2 — The Three-Scenario Structure Is Internally Consistent and Clinically Coherent

The combined scoring of Items 8 and 9 produces three distinct and clinically meaningful scenarios, as summarized in the table below:

Scenario	Q8 (Ever worn glasses?)	Q8 Score	Q9 (Worn throughout the day?)	Q9 Score	Combined Q8+Q9 Score	Clinical Interpretation
A	No — never worn glasses	0	No — does not wear glasses at all	0	0	No prior diagnosis or correction; vision has not been evaluated or corrected
B	Yes — worn glasses before	1	No — does not wear them consistently	0	1	Prior diagnosis but non-compliant with correction; potential ongoing uncorrected impairment
C	Yes — worn glasses before	1	Yes — wears them throughout the day	1	2	Prior diagnosis and fully compliant with correction; most likely managed

This gradient — from no prior diagnosis (Scenario A), to diagnosed but non-compliant (Scenario B), to diagnosed and fully compliant (Scenario C) — reflects a clinically coherent progression of refractive error history and management. The scoring structure is therefore not merely a procedural convention but carries substantive clinical meaning consistent with the construct being measured.

Point 3 — Empirical Evidence: Cronbach's Alpha Confirms Universal 10-Item Scoring

The internal consistency analysis (Cronbach's α = 0.649) was calculated across all 10 items for all 131 participants without any missing data handling or item exclusion for Q9. This serves as direct empirical evidence that Item 9 was scored as a complete item for all participants. Had items been skipped for a subset of participants or scores calculated proportionally, the reliability analysis would have required imputation or listwise exclusion procedures — neither of which was applied. The uniformity of the 10-item scoring structure is therefore embedded in the reported reliability statistics.
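Cronbach's α is computed from a complete respondent-by-item matrix, which is why universal scoring of Item 9 matters for the reported value. The sketch below implements the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of totals), and refuses missing entries to show where imputation or listwise deletion would otherwise be needed. The toy matrix in the usage line is illustrative only, not study data.

```python
from statistics import variance
from typing import Optional


def cronbach_alpha(rows: list[list[Optional[int]]]) -> float:
    """Cronbach's alpha for a complete respondents x items matrix.

    Uses sample variances (n - 1) for both the items and the total scores.
    """
    if any(v is None for row in rows for v in row):
        raise ValueError("missing responses: imputation or listwise deletion would be needed")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need a rectangular matrix with at least two items")
    items = list(zip(*rows))             # one tuple per item (column)
    totals = [sum(row) for row in rows]  # one total score per respondent
    item_var_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Toy data: 4 respondents x 3 binary items.
print(cronbach_alpha([[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]))  # approximately 0.75
```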

Point 4 — Intentional Adaptation for Field Administration Context

We acknowledge that the instrument from which CIPSEL was adapted — Vasudevan et al. (2023) — administered the equivalent item only to students who had previously used spectacles, effectively treating it as a filtered/branching item. In that large-scale epidemiological survey (n = 3,432), branching logic was feasible. In the CIPSEL context, questionnaires were administered via face-to-face interview by healthcare workers in a mass school screening setting, where branching logic introduces a meaningful risk of administration errors and increases cognitive load for non-specialist interviewers. The decision to administer Item 9 universally — with non-wearers naturally responding 'No' — was therefore a deliberate contextual adaptation designed to maximize administration fidelity in resource-limited, non-clinical settings.

Point 5 — No Impact on Score Distributions, AUC, or Cutoff Analyses

Since the scoring of Item 9 was applied consistently and uniformly to all 131 participants from the outset of data collection, the score distributions reported in Table 3, the AUC of 0.887 (95% CI: 0.829–0.946), the dual cutoff points (≥3 and ≥4), and all derived sensitivity and specificity values are accurate and unaffected by this clarification. There is no recalculation required; the data have always reflected this scoring approach.
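The AUC and the dual cutoffs referred to here can be computed directly from per-child totals and the Snellen-based reference classification. This sketch uses the Mann–Whitney formulation of the empirical AUC, with ties counted as one half (relevant for an integer 0–10 score), and a simple 2×2 tally per cutoff. The ten scores are invented toy data, not study data.

```python
def sens_spec_at_cutoff(scores, impaired, cutoff):
    """Sensitivity and specificity when 'score >= cutoff' counts as screen-positive."""
    tp = sum(1 for s, d in zip(scores, impaired) if d and s >= cutoff)
    fn = sum(1 for s, d in zip(scores, impaired) if d and s < cutoff)
    tn = sum(1 for s, d in zip(scores, impaired) if not d and s < cutoff)
    fp = sum(1 for s, d in zip(scores, impaired) if not d and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auc_mann_whitney(scores, impaired):
    """Empirical AUC: chance an impaired child outscores a non-impaired one (ties = 0.5)."""
    pos = [s for s, d in zip(scores, impaired) if d]
    neg = [s for s, d in zip(scores, impaired) if not d]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (invented): CIPSEL totals and Snellen-based impairment flags.
scores = [5, 4, 3, 2, 6, 1, 0, 2, 3, 1]
impaired = [True, True, True, False, True, False, False, False, False, False]
print(auc_mann_whitney(scores, impaired))
for c in (3, 4):
    print(c, sens_spec_at_cutoff(scores, impaired, c))
```

On these toy data, raising the cutoff from 3 to 4 lowers sensitivity and raises specificity, the trade-off that a dual-cutoff scheme exposes.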

Author Response File: Author Response.pdf
