Peer-Review Record

Comparison of Machine Learning Algorithms to Predict Down Syndrome During the Screening of the First Trimester of Pregnancy

Appl. Sci. 2025, 15(10), 5401; https://doi.org/10.3390/app15105401

by Eduardo Alonso^1,2,*, Andoni Beristain^1,3, Jorge Burgos^4 and Ibai Gurrutxaga^2

Reviewer 1: Wai Lun Lo

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 15 April 2025 / Revised: 7 May 2025 / Accepted: 9 May 2025 / Published: 12 May 2025

(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper presents a new approach that uses machine learning algorithms to screen women during the first trimester of pregnancy for a high risk of Down syndrome (DS). Different machine learning models have been trained on a pseudonymized dataset of 90,532 screening patients from Cruces University Hospital, with fewer than 1% positive cases. The results indicate that the use of machine learning techniques in screening programs for early detection of Down syndrome can effectively predict the risk during pregnancy.
Overall, the quality of the paper is good.
The dataset is obtained from real cases at the hospital. The training method is well presented and the results support the conclusions of the paper.
However, the conclusion is quite brief, and this section should be extended to describe the original contributions of the paper and highlight directions for future research.

Introduction – The paper cites 20 references; I suggest that the authors include a more up-to-date survey of the research literature in this area. The Introduction section could be extended a bit.
Methodology – A flowchart could be included in this section to illustrate the logic flow of the method proposed in this paper.
Methodology – The color scheme of Fig. 1 could be changed to a more easily readable tone to increase the readability of the correlation matrix.
Methodology – Figure 2 (Correlation between variables and trisomy 21) could be enlarged and given a larger font size.
Figure 3 (Training Workflow) could be enlarged and given a larger font size.
In Section 2.3.3 (Training Model), a table can be included to summarize the types of models used in this paper.
In Section 3, the authors should explain why the following methods were chosen for comparison:

Logistic Regression
SGD Classifier
XGB Classifier
Gradient Boosting Classifier
Extra Trees Classifier
Random Forest Classifier

In Figure 5 (ROC-AUC curves on test data), the axis labels are too small; the figure should be reformatted with a larger font size and a clear axis scale.
In Table 2 (Scoring metrics for different thresholds), all the results fall within a quite narrow range (e.g., the first column varies from 0.975 to 0.982); the authors should explain the relative merits of the different methods. The authors could illustrate the results with a bar chart.
The Conclusion section should be extended to describe the original contributions of the paper and the insights for future research in this area.

Author Response

Dear reviewer,

Thank you for your insightful feedback on our manuscript. We appreciate you taking the time to provide these constructive comments. We have carefully considered each of your comments, and the following details the changes we have made to the manuscript in response to your suggestions.

Comments 1: Introduction – The paper cites 20 references; I suggest that the authors include a more up-to-date survey of the research literature in this area. The Introduction section could be extended a bit.
Response 1: Thank you for this helpful suggestion. We have now expanded the Introduction section and incorporated nine additional references to provide a more comprehensive survey of the current research literature in this area. We believe this extension clarifies the context and the significance of our work.

Comments 2: Methodology – A flowchart could be included in this section to illustrate the logic flow of the method proposed in this paper.
Response 2: We appreciate this suggestion for improved clarity. We believe that Figure 3 already clearly illustrates the workflow of the method proposed in this paper. However, we are open to further refining this figure or adding supplementary material if the reviewer feels specific aspects remain unclear after review.

Comments 3: Methodology – The color scheme of Fig. 1 could be changed to a more easily readable tone to increase the readability of the correlation matrix.
Response 3: Thank you for pointing out the readability issue with Figure 1. We have changed the color scheme of the correlation matrix to a more easily discernible tone. Additionally, we have added annotations to further improve its readability.

Comments 4: Methodology – Figure 2 (Correlation between variables and trisomy 21) could be enlarged and given a larger font size.
Response 4: We have addressed your concern regarding the size and font of Figure 2. The figure size has been increased, and we trust that the larger font sizes now make it more legible.

Comments 5: Figure 3 (Training Workflow) could be enlarged and given a larger font size.
Response 5: Following your suggestion, we have increased both the font size and the overall size of Figure 3 (Training Workflow) to enhance its clarity and readability.

Comments 6: In section 2.3.3. Training Model, a table can be included to summarize the types of models used in this paper.
Response 6: We agree that a table would improve the organization of this information. Therefore, the paragraph in section 2.3.3 has been replaced with a table summarizing the models used, and a corresponding reference has been added to the text.

Comments 7: In Section 3, the authors should explain why the following methods were chosen for comparison:
Logistic Regression
SGD Classifier
XGB Classifier
Gradient Boosting Classifier
Extra Trees Classifier
Random Forest Classifier
Response 7: Thank you for highlighting the need for clarification here. We have added a sentence at the beginning of section 3 to explain our choice of comparison models: "The machine learning models used for comparison were selected to provide a representative sample of common classification algorithms readily available in the scikit-learn library, supplemented with XGBoost, chosen for its performance, efficiency, and ability to mitigate overfitting, allowing for an evaluation of diverse modeling approaches." We hope this explanation is helpful.

Comments 8: In Figure 5 (ROC-AUC curves on test data), the axis labels are too small; the figure should be reformatted with a larger font size and a clear axis scale.
Response 8: We appreciate your feedback on Figure 5. The axis labels and the overall size of the figure have been increased to improve readability and ensure a clearer understanding of the axis scales.

Comments 9: In Table 2 (Scoring metrics for different thresholds), all the results fall within a quite narrow range (e.g., the first column varies from 0.975 to 0.982); the authors should explain the relative merits of the different methods. The authors could illustrate the results with a bar chart.
Response 9: Thank you for this insightful comment. We have added a sentence to the results section to provide context for the seemingly small differences in the scoring metrics: "In a clinical setting, even a seemingly small increase in sensitivity can translate to a meaningful reduction in false negatives, ensuring that more affected pregnancies are correctly identified. Similarly, a slight improvement in specificity reduces false positives, minimizing unnecessary anxiety and invasive procedures for low-risk pregnancies. It is also important to note the inherent trade-off between sensitivity and specificity. Some models, like XGBoost with the gmeans threshold, prioritize sensitivity to maximize the detection of positive cases, which may result in a slight decrease in specificity. Conversely, other models may exhibit higher specificity at the cost of slightly lower sensitivity. The choice of the most suitable model and threshold depends on the specific clinical priorities and the relative costs of false positives and false negatives." We will also consider your suggestion of illustrating these results with a bar chart for further clarity in future revisions.
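To make the trade-off described in Response 9 concrete, the following is a minimal Python sketch and not the authors' code. It assumes that the "gmeans threshold" refers to the cut-off that maximizes the geometric mean of sensitivity and specificity, which is a common convention but is not defined in this record. The function names and the toy labels and scores are hypothetical and chosen only for illustration.

```python
import math


def confusion_counts(y_true, y_score, threshold):
    """Count TP, FN, TN, FP when scores >= threshold are called positive."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if label == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def sensitivity_specificity(y_true, y_score, threshold):
    """Return (sensitivity, specificity) at the given threshold."""
    tp, fn, tn, fp = confusion_counts(y_true, y_score, threshold)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def best_gmean_threshold(y_true, y_score):
    """Scan every observed score as a candidate threshold and keep the one
    with the largest sqrt(sensitivity * specificity)."""
    best_threshold, best_gmean = None, -1.0
    for threshold in sorted(set(y_score)):
        sens, spec = sensitivity_specificity(y_true, y_score, threshold)
        gmean = math.sqrt(sens * spec)
        if gmean > best_gmean:
            best_threshold, best_gmean = threshold, gmean
    return best_threshold, best_gmean


# Made-up toy data: 6 negatives, 2 positives (not from the study).
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.40, 0.70, 0.35, 0.90]

threshold, gmean = best_gmean_threshold(y_true, y_score)
print(threshold, round(gmean, 3))
```

On this toy data, the selected threshold catches both positives at the cost of two false positives, whereas the highest threshold gives perfect specificity but misses one positive. This is the kind of sensitivity versus specificity choice the response describes.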

Comments 10: The Conclusion section should be extended to describe the original contributions of the paper and the insights for future research in this area.
Response 10: Thank you for this valuable suggestion. We have carefully revised and significantly extended the Conclusion section. We have made sure to explicitly highlight the original contributions of this paper and have provided a more detailed discussion of the insights for future research in this area. This revision goes beyond a simple rewrite and involves a substantial expansion to address your point thoroughly.

Reviewer 2 Report

Comments and Suggestions for Authors

The organization of the introduction should be improved carefully.
Line 40, please check the citation “[5][? ]”.
In the abstract, please give the full names of the PAPP-A protein and B-HCG hormone.
In general, there is a lack of explanation of replicates and statistical methods used in the study. Also, 0.89 ROC area?
Please check the last paragraph of the introduction. It does not make sense to conclude the text as “Section 2 (Methodology) introduces the background…”.
The presentation of Tables 1 and 2 should be improved.
The organization of this work could be improved. The sections on results, discussion, and conclusions could be improved.
Please give more annotations to Figure 1 and Figure 2.
Please check the order of the figures. Page 11 presents Figure 7. Page 12 presents Figure 6.

Comments on the Quality of English Language

The quality of the English needs improvement. The manuscript has grammatical issues, including improper syntax, spelling, and punctuation. It is not well written, and the language quality impairs the reader's understanding on several occasions. I recommend having the manuscript revised by a language professional or a language-polishing company.

Author Response

Dear reviewer,

Thank you for your insightful and constructive feedback on our manuscript. We have carefully considered each of your suggestions and have made the following revisions, including a comprehensive revision of the English to improve clarity and readability by addressing grammar, syntax, spelling, and punctuation.

Comment 1: The organization of the introduction should be improved carefully.
Response 1: Thank you for your feedback on the organization of the Introduction. We have carefully reviewed this section and have made revisions to improve its flow and clarity. Specifically, the Introduction has been modified and expanded to provide a clearer context for both traditional methods and machine learning approaches applied to T21 screening. We have also incorporated additional references to support this expanded context. We hope these changes have resulted in a more coherent and well-organized Introduction. Thank you for this suggestion, which we believe has strengthened the manuscript.

Comment 2: Line 40, please check the citation “[5][? ]”.
Response 2: Thank you for pointing out the error in the citation on line 40. We apologize for this oversight. We have thoroughly reviewed all citations in the manuscript and have made the necessary corrections, including adding new references where appropriate.

Comment 3: In the abstract, please give the full names of the PAPP-A protein and B-HCG hormone.
Response 3: We have now included the full names of the PAPP-A protein (Pregnancy-Associated Plasma Protein A) and β-hCG hormone (beta-human chorionic gonadotropin hormone) in the abstract as suggested.

Comment 4: In general, there is a lack of explanation of replicates and statistical methods used in the study. Also, 0.89 ROC area?
Response 4: Thank you for raising this important point regarding the clarity of our statistical methods. We have carefully reviewed the manuscript and expanded the explanations in the following sections:

Section 2.2.1 (Exploratory Data Analysis): To clarify the statistical approach in this section, we have provided a more detailed explanation of Pearson correlation, which was used to assess the relationships between variables. We now explain that Pearson correlation measures the linear association between continuous variables, detailing the interpretation of the correlation coefficient's range and values.
Section 2.2.3 (Outlier Removal): The Grubbs' test, used for outlier detection, was already mentioned, but we have ensured that its purpose and suitability for our data (univariate, approximately normally distributed) are explicitly stated.
First paragraph of Section 2.3.3 (Model Training): We have significantly expanded the explanation of k-fold cross-validation, detailing the process of partitioning the data into folds, the iterative training and validation procedure, and how this technique helps to estimate the model's generalization performance and prevent data leakage.
Second paragraph of Section 2.3.3 (Model Training): The Evolutionary Algorithms (EAs) used for hyperparameter optimization are now described in greater detail. We have explained the analogy to biological evolution, the role of the fitness function, and the genetic operators involved (mutation, crossover, recombination).
Third paragraph of Section 2.3.3 (Model Training): We have provided a more extensive description of the feature selection techniques employed, including the definition of feature importance and how the algorithm dynamically selects feature subsets.
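The k-fold procedure mentioned in the list above (partitioning into folds, then iterating training and validation) can be sketched as follows. This is a generic, standard-library illustration and not the authors' pipeline: the helper names `k_fold_indices`, `cross_validate`, `fit_majority`, and `accuracy` are hypothetical, and the toy data is made up. One assumption worth stating is that any preprocessing learned from data would need to happen inside `fit`, so that validation folds never influence training.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


def cross_validate(fit, score, X, y, k=5, seed=0):
    """Average a caller-supplied score over k train/validation splits."""
    fold_scores = []
    for train_idx, val_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold_scores.append(
            score(model, [X[i] for i in val_idx], [y[i] for i in val_idx])
        )
    return sum(fold_scores) / len(fold_scores)


def fit_majority(X_train, y_train):
    """Toy 'model': remember the most frequent training label."""
    return max(set(y_train), key=y_train.count)


def accuracy(model, X_val, y_val):
    """Fraction of validation labels equal to the remembered label."""
    return sum(1 for label in y_val if label == model) / len(y_val)


# Made-up imbalanced toy data: 8 negatives, 2 positives.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

mean_accuracy = cross_validate(fit_majority, accuracy, X, y, k=5)
print(mean_accuracy)
```

The majority-class baseline scores well on accuracy here despite detecting no positives, which is one reason imbalanced screening problems are usually judged on sensitivity, specificity, and ROC-AUC instead. For heavily imbalanced data, a stratified split that preserves the class ratio in each fold is often preferred; the plain split above is kept for brevity.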

Regarding the question about the "0.89 ROC area," we acknowledge that this was an error on our part. The standard term is "ROC-AUC" (Receiver Operating Characteristic Area Under the Curve), which is a measure of a classifier's ability to distinguish between classes. We have reviewed the Results section to ensure that ROC-AUC is consistently used and that its interpretation is clear. If the reviewer is referring to a specific instance where "ROC area" was used, we would appreciate a pointer to that location in the text so we can correct it.
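As an illustration of what a ROC-AUC value measures, the sketch below computes it from its pairwise interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This is a generic illustration with made-up data, not the authors' evaluation code, and the function name `roc_auc` is hypothetical. The quadratic loop is only suitable for small examples.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC via pairwise comparison of positive and negative scores."""
    positives = [s for label, s in zip(y_true, y_score) if label == 1]
    negatives = [s for label, s in zip(y_true, y_score) if label == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Made-up toy data: 6 negatives, 2 positives (not from the study).
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_score = [0.05, 0.10, 0.20, 0.30, 0.40, 0.70, 0.35, 0.90]

print(round(roc_auc(y_true, y_score), 3))
```

Under this reading, a value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.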

Comment 5: Please check the last paragraph of the introduction. It does not make sense to conclude the text as “Section 2 (Methodology) introduces the background…”.
Response 5: Thank you for pointing out the lack of logical flow in the final paragraph of the Introduction. You are right; directly stating what Section 2 introduces was not an effective way to conclude the section. We have therefore modified the last paragraph to provide a smoother transition and a clearer overview of the paper's structure. The revised paragraph now begins with: "The following sections of this paper detail the key components of our study, beginning..." and then outlines the subsequent sections. We believe this provides a more coherent and informative conclusion to the Introduction.

Comment 6: The presentation of Tables 1 and 2 should be improved.
Response 6: Thank you for pointing out the need for improved presentation of Tables 1 and 2. We have revised both tables to enhance their readability through a clearer layout and formatting, more descriptive captions, and improved labeling and organization of information, adhering to formatting guidelines commonly used in scientific publications. We believe these modifications have significantly improved the clarity and effectiveness of these tables.

Comment 7: The organization of this work could be improved. The sections on results, discussion, and conclusions could be improved.
Response 7: Thank you for this general feedback on the organization and clarity of the Results, Discussion, and Conclusions sections. In response to this and other comments, we have undertaken a thorough revision of the entire manuscript. This revision included a careful restructuring of these sections to improve their logical flow and coherence. Furthermore, we have focused on enhancing the clarity of the language throughout the document to ensure the presentation of our findings and their interpretation is as effective as possible. In particular, we have also significantly extended the Conclusions section to explicitly highlight the original contributions of this paper and provide more detailed insights for future research in this area. We believe these comprehensive changes have significantly improved the overall organization and readability of the work.

Comment 8: Please give more annotations to Figure 1 and Figure 2.
Response 8: Thank you for your suggestion to improve the annotations in Figure 1 and Figure 2. We have carefully revised these figures to enhance their clarity and informativeness by increasing the font size of all labels, axis titles, and values to improve readability, adding more descriptive labels to the color bar in Figure 1, clarifying the axis labels in both figures, and providing more detailed captions that explain their purpose and how to interpret them, as well as improving the title placement and wording. Furthermore, we have also taken the opportunity to enhance Figure 3 by adding more descriptive labels within the boxes, clarifying the flow of data and operations with more precise arrows and labels, and providing a more comprehensive caption that explains each step of the training workflow. We believe these enhancements make all three figures significantly more understandable and contribute to a clearer presentation of our methodology and results.

Comment 9: Please check the order of the figures. Page 11 presents Figure 7. Page 12 presents Figure 6.
Response 9: Thank you for pointing out the incorrect order of the figures. We apologize for this oversight, which can sometimes occur with LaTeX's figure placement. We have carefully reviewed the manuscript and have reordered the figures and tables to ensure they appear in the correct sequential order throughout the document.

Reviewer 3 Report

Comments and Suggestions for Authors

The article needs further work.

Comments for author File: Comments.pdf

Author Response

Dear reviewer,

Thank you for your insightful and constructive feedback on our manuscript. We have carefully considered each of your suggestions and have made the following revisions.

Comment 1: The article lacks sufficient information about the distribution of positive and negative cases in the dataset. It is only mentioned that positive cases are less than 1%. Providing the exact number of positive and negative cases would be helpful for assessing the balance of the dataset.
Response 1: The exact number of positive and negative cases is provided in the first row of Table 1.

Comment 2: While the authors mention the pseudonymization of the data, they do not provide enough detail about the methods used to protect patient confidentiality. Adding a section detailing the data protection measures would enhance the ethical soundness of the research.
Response 2: We appreciate the reviewer's concern regarding patient confidentiality. It is important to clarify that the data used in this study was extracted directly from the hospital's database, and our research team was not involved in the initial anonymization or data extraction process. The data was provided to us without any personal identifiers. The ethical approval for this process is detailed in the "Institutional Review Board Statement" section of the manuscript. However, to further address the reviewer's concern, we have expanded the description of the data handling procedures in Section 2.1 (Data Extraction) to provide more specific details about how the data was handled once it was received by our team.

Comment 3: The article lacks a comparison of the proposed approach with current standard screening methods. It would be beneficial to show how the machine learning models outperform traditional statistical methods in terms of accuracy, sensitivity, and specificity.
Response 3: Thank you for pointing out the need for a clearer comparison with current standard screening methods. To address this, we have revisited and expanded the fourth paragraph of the Introduction section to provide a more detailed description of traditional screening methods, and have added relevant references to support these points; furthermore, we have rewritten the Discussion section, and in the fourth paragraph, we have included a more direct comparison of our machine learning models' performance with the performance of traditional methods as reported in the literature.

Comment 4: The discussion section could be expanded to analyze the clinical significance of the obtained results in more depth. The authors should discuss the potential impact of these machine learning models on clinical practice and decision-making by doctors and patients.
Response 4: Thank you for this important suggestion. The Discussion section has been completely rewritten to provide a more thorough analysis of the clinical significance of our findings and to clarify the potential contributions of this research.

Comment 5: The article does not address the baseline cognitive model of the patients undergoing screening.
Response 5: Thank you for raising this point. We acknowledge that the baseline cognitive model of patients can be an important consideration in some medical contexts. However, in the specific context of first-trimester screening for Trisomy 21, the screening process relies primarily on the measurement of biochemical and biophysical markers, such as hormone levels and ultrasound measurements of nuchal translucency. These markers (PAPP-A, β-hCG, NT) are well-established indicators of Trisomy 21 risk and are not directly influenced by the mother's cognitive function. Therefore, data related to the baseline cognitive model of the patients were not included in our analysis. We recognize the importance of considering a broader range of factors in other medical contexts, but for the specific methodology of this screening, it is not a contributing factor.

Comment 6: The article does not specify the role of the physician as the primary expert in this matter, given that it is a medical problem and not one of artificial intelligence. Machine learning cannot make a final diagnosis.
Response 6: Thank you for highlighting the crucial role of the physician in clinical decision-making. We agree that machine learning models are tools to support, not replace, expert medical judgment. To clarify this in our manuscript, we have revised the Discussion section to explicitly state that our models are intended to support physicians in risk assessment and should not be used for final diagnoses. The interpretation of results and ultimate clinical decisions must remain with qualified medical professionals.

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

The article can be accepted in its original form.
