Review Reports
Ahmadreza Mirzaei, Alireza Rahmani Shahraki, Fiona P. Maunsell et al.
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This study on an AI-assisted fecal egg counting system has clinical relevance. However, several issues need to be addressed:
- The authors evaluate both "quantitative agreement" and "diagnostic performance," but the main research aim is not clearly stated. It is recommended that the abstract and introduction clearly indicate whether the goal is to validate Parasight® as a replacement method or merely as a screening tool;
- All data in this study were used for both model building and evaluation, with no independent validation set, which may raise the risk of overfitting;
- The algorithmic process of the Parasight® system is largely undisclosed—can any relevant workflow or methodology be provided?
- Although samples were refrigerated, testing within 6 days may still affect egg integrity; experimental evidence or supporting literature should be provided;
- What is the repeatability or measurement stability of the Parasight® system itself? Has this been evaluated?
- The consistently low values of Lin’s concordance correlation coefficient—are they due to limitations in the Parasight® algorithm, or related to sample selection or data distribution?
Author Response
Reviewer 1 Comments and Author Responses
Comment 1.
The authors evaluate both “quantitative agreement” and “diagnostic performance,” but the main research aim is not clearly stated. It is recommended that the abstract and introduction clearly indicate whether the goal is to validate Parasight® as a replacement method or merely as a screening tool.
Response:
We appreciate this comment and agree that the primary objective required clearer articulation. We have revised both the abstract and the final paragraph of the Introduction to explicitly state that the goal of this study was not to validate Parasight® as a direct replacement for the manual McMaster method, but rather to evaluate its performance as an automated screening and decision-support tool for identifying animals exceeding a clinically relevant treatment threshold. The revised text now clearly emphasizes assessment of agreement, ranking ability, and diagnostic classification rather than interchangeability with manual counts.
Comment 2.
All data in this study were used for both model building and evaluation, with no independent validation set, which may raise the risk of overfitting.
Response:
We acknowledge this limitation. This study was designed as an analytical validation study rather than a machine-learning model development effort. The Parasight® algorithm was not trained or modified using our dataset; instead, it was evaluated as a fixed, commercially deployed system. Our analyses (CCC, Bland–Altman, ROC, and regression) were therefore performance evaluations, not model training or tuning exercises. To clarify this point, we have added text to the Discussion noting that while no independent external dataset was used, the risk of algorithmic overfitting is minimized because the AI model itself was not developed or adjusted using these data. We also note that future multi-farm studies with external validation cohorts are warranted.
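To make the evaluation-only nature of these analyses concrete, the minimal Python sketch below (all paired counts are hypothetical, not the study dataset) shows how agreement statistics such as Lin's CCC and Bland–Altman limits of agreement are computed directly from fixed method outputs, with no fitting or tuning of the algorithm under evaluation:

```python
import numpy as np

def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for paired measurements."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov = np.cov(x, y, bias=True)[0, 1]  # population covariance
    return 2 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

def bland_altman(x, y):
    """Bias (mean difference) and 95% limits of agreement."""
    d = np.asarray(x, float) - np.asarray(y, float)
    bias, sd = d.mean(), d.std(ddof=1)
    return bias, (bias - 1.96 * sd, bias + 1.96 * sd)

# Hypothetical paired eggs-per-gram (EPG) counts: manual vs. automated
manual = np.array([200.0, 450.0, 800.0, 1200.0, 2400.0])
auto = np.array([110.0, 260.0, 430.0, 640.0, 1300.0])
print("Lin's CCC:", round(lins_ccc(manual, auto), 2))
print("Bland-Altman bias, LoA:", bland_altman(manual, auto))
```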
Comment 3.
The algorithmic process of the Parasight® system is largely undisclosed—can any relevant workflow or methodology be provided?
Response:
We agree that transparency is important. However, Parasight® is a proprietary commercial system, and detailed algorithmic architecture is not publicly available. We have clarified this limitation in the Materials and Methods and Discussion, and we now provide a higher-level description of the workflow based on publicly available information: standardized filtration, fluorescent labeling of helminth eggs, automated image acquisition, and deep-learning–based classification and enumeration. We explicitly state that lack of access to internal algorithmic parameters is a limitation inherent to evaluating proprietary diagnostic platforms.
Comment 4.
Although samples were refrigerated, testing within 6 days may still affect egg integrity; experimental evidence or supporting literature should be provided.
Response:
We thank the reviewer for this important point. We have revised the Materials and Methods section to better justify sample handling and now explicitly cite supporting evidence. In particular, we reference Crawley et al. (2016), who demonstrated that fecal egg counts remain stable for up to 7 days under refrigeration (3–5 °C), with significant declines occurring only after longer storage periods. This reference has been added to support our storage protocol and to acknowledge that refrigeration beyond one week may affect egg recovery.
Comment 5.
What is the repeatability or measurement stability of the Parasight® system itself? Has this been evaluated?
Response:
We acknowledge that within-device repeatability of Parasight® was not directly assessed in this study. Our experimental design focused on between-method agreement using a single Parasight® measurement per sample, reflecting typical clinical use. We have now explicitly stated this as a limitation in the Discussion, noting that future studies should include replicate Parasight® measurements on the same samples to quantify intra-device precision and short-term measurement stability.
Comment 6.
The consistently low values of Lin’s concordance correlation coefficient—are they due to limitations in the Parasight® algorithm, or related to sample selection or data distribution?
Response:
This is an important observation. We have expanded the Discussion to clarify that the low CCC values primarily reflect systematic underestimation and scale bias rather than poor association. This interpretation is supported by (1) moderate-to-high Spearman correlations, (2) strong ROC performance (AUC 0.90–0.96), and (3) regression analyses demonstrating consistent monotonic relationships between methods. Additionally, the right-skewed distribution and high biological variability inherent to fecal egg count data likely contributed to reduced absolute concordance.
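This behavior of CCC can be illustrated with a short simulation (purely synthetic data, not drawn from the study): a constant 50% underestimation leaves the rank correlation perfect while substantially depressing the concordance coefficient, because CCC penalizes scale and location shifts that Spearman ignores.

```python
import numpy as np
from scipy.stats import spearmanr

def lins_ccc(x, y):
    """Lin's CCC: penalizes scale/location bias as well as scatter."""
    cov = np.cov(x, y, bias=True)[0, 1]
    return 2 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

rng = np.random.default_rng(0)
manual = rng.lognormal(mean=6.0, sigma=1.0, size=200)  # right-skewed, EPG-like
auto = 0.5 * manual                                    # pure 50% scale bias, ranking intact

rho, _ = spearmanr(manual, auto)
print(f"Spearman rho: {rho:.2f}")                    # 1.00 -- monotonic relationship preserved
print(f"Lin's CCC:    {lins_ccc(manual, auto):.2f}")  # well below 1 -- scale bias penalized
```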
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Dear authors,
I found your research highly engaging and believe it makes a significant and valuable contribution to the field. I have a few comments:
- Please clarify whether storing fecal samples for 6 days at +4°C may have contributed to the degradation of parasite eggs and potentially led to false-negative results. Were any preservatives used during storage? If so, please provide detailed information on the storage protocol, including the type and concentration of preservatives, if applicable.
- Please specify the qualifications of the observers involved in the study (e.g., their experience in laboratory diagnostics of parasite eggs). This information is important for assessing the reliability of the manual egg-counting process.
- If this study were conducted in a clinical setting, could the error associated with manual egg counting—particularly as parasitic load increases—impact the proposed treatment regimen for animals? Specifically: (1) Would the observed errors in manual counting influence the choice and/or dosage of prescribed antiparasitic drugs? Conversely, would the errors introduced by the automated egg-counting software affect treatment decisions (e.g., drug selection or dosage)? (2) It is essential to elaborate in greater detail on the practical implications of the errors identified in both manual and automated counting methods. How might these inaccuracies influence clinical decision-making? If the errors in automated counting do not significantly alter the overall treatment strategy compared to manual counting, could it be argued that these errors are less critical in a clinical context? Addressing these points will help underscore the relevance and applicability of your findings.
- Please include a conclusion that succinctly summarizes the key findings of your study, their implications for veterinary practice, and potential directions for future research. This will help contextualize the significance of your work for readers.
Author Response
Reviewer 2 Comments and Author Responses
Comment 1
Please clarify whether storing fecal samples for 6 days at +4 °C may have contributed to the degradation of parasite eggs and potentially led to false-negative results. Were any preservatives used during storage? If so, please provide detailed information on the storage protocol, including the type and concentration of preservatives, if applicable.
Response:
We thank the reviewer for this important comment. No chemical preservatives were used during fecal sample storage in this study. All samples were stored at 4 °C immediately after collection and maintained under refrigeration until analysis, with manual McMaster counts performed within a maximum of 6 days post-collection.
The chosen storage conditions are supported by experimental evidence indicating that short-term refrigeration preserves helminth egg integrity and detectability. In particular, Crawley et al. (2016) demonstrated that fecal egg counts remain stable for up to 7 days when samples are stored at 3–5 °C, with significant declines occurring only after longer refrigeration periods (>8–10 days). Similar findings have been reported in horses and small ruminants, where refrigeration within this time window does not result in biologically meaningful egg degradation or false-negative results.
To address this point more explicitly, the Materials and Methods section has been revised to clarify that no preservatives were used, that all samples were refrigerated at 4 °C immediately after collection, and that all counts were completed within 6 days, a storage window supported by the evidence cited above.
Comment 2
Please specify the qualifications of the observers involved in the study (e.g., their experience in laboratory diagnostics of parasite eggs). This information is important for assessing the reliability of the manual egg-counting process.
Response:
We thank the reviewer for this suggestion. Both manual observers were licensed veterinarians with approximately 10 years of experience in clinical parasitology and laboratory-based fecal egg counting. Their professional training included routine diagnostic microscopy for gastrointestinal parasites in both clinical and research settings. In addition, both observers completed similar formal training, as they were classmates during their veterinary education and subsequently received comparable postgraduate instruction in parasitological diagnostic techniques. This shared training background ensured methodological consistency and minimized inter-observer variability. The manuscript has been revised to explicitly describe observer qualifications and training.
Comment 3
If this study were conducted in a clinical setting, could the error associated with manual egg counting—particularly as parasitic load increases—impact the proposed treatment regimen for animals? Specifically: (1) Would the observed errors in manual counting influence the choice and/or dosage of prescribed antiparasitic drugs? Conversely, would the errors introduced by the automated egg-counting software affect treatment decisions (e.g., drug selection or dosage)? (2) It is essential to elaborate in greater detail on the practical implications of the errors identified in both manual and automated counting methods.
Response:
We thank the reviewer for highlighting the clinical implications of counting error. In small ruminant practice, fecal egg counts are not used to determine anthelmintic drug selection or dosage, which are instead based on parasite epidemiology, resistance status, and animal body weight. Rather, fecal egg counts are primarily applied to treatment eligibility decisions (e.g., above or below a predefined threshold) within targeted selective treatment frameworks.
Accordingly, the variability observed in manual egg counts at higher parasite burdens—although statistically detectable—is unlikely to influence drug choice or dosing in a clinical setting. Its primary impact lies in potential misclassification of animals near treatment thresholds. In practice, this risk is mitigated by integrating fecal egg counts with clinical indicators such as FAMACHA© score, body condition, age, and production status.
Errors associated with the automated system similarly do not affect drug selection or dosage but may influence treatment classification when the same numerical threshold is applied without calibration. In this study, the automated system consistently underestimated egg counts relative to the manual reference, which reduced sensitivity when using the conventional 1000 EPG cutoff. Importantly, this did not result in false-positive treatment decisions but rather in missed positives near the threshold. When an adjusted cutoff (~480–500 EPG) was applied, diagnostic equivalence with manual classification was restored, indicating that these errors are systematic and correctable, rather than random or clinically destabilizing.
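For readers interested in how such a method-specific threshold can be derived, the sketch below (Python, with simulated counts; all values hypothetical and not the study's actual calculation) shows one standard approach: building an ROC curve for the automated counts against the manual 1000 EPG classification and selecting the cutoff that maximizes Youden's J. Under a simulated ~50% underestimation, the recalibrated cutoff lands near 500 EPG, consistent with the pattern described above.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(1)
manual = rng.lognormal(mean=6.5, sigma=1.0, size=300)    # hypothetical manual EPG counts
auto = 0.5 * manual * rng.lognormal(0.0, 0.2, size=300)  # underestimating automated counts

truth = (manual >= 1000).astype(int)  # "treat" status under the manual 1000 EPG cutoff
fpr, tpr, thresholds = roc_curve(truth, auto)

j = tpr - fpr  # Youden's J = sensitivity + specificity - 1
best = thresholds[np.argmax(j)]
print(f"Recalibrated automated cutoff: ~{best:.0f} EPG")
```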
Taken together, inaccuracies in both manual and automated fecal egg counting methods primarily affect threshold-based treatment decisions, not therapeutic strategy or dosing. Once method-specific thresholds are established, the automated system’s errors are unlikely to be more clinically consequential than the known variability inherent to manual microscopy. This supports the interpretation that, in a calibrated clinical context, automated counting errors are not prohibitive and may be less critical than traditional manual variability, particularly when integrated with standard clinical assessment.
Comment 4
Please include a conclusion that succinctly summarizes the key findings of your study, their implications for veterinary practice, and potential directions for future research.
Response:
We sincerely thank the reviewer for this valuable suggestion. Based on this feedback, we added a concise concluding section summarizing the key findings, their clinical implications, and directions for future research, which has improved the clarity and overall impact of the paper.
Author Response File: Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
Dear Authors,
Thank you very much for clarifying all the points I raised and for revising the manuscript accordingly.
In my view, your research holds tremendous practical value and is poised to make a significant impact on real-world implementation. The meticulous selection of conditions to ensure accurate AI-based egg counting represents a critical step toward reducing the routine workload on veterinary services.
Wishing you all the best with your future research!