Article
Peer-Review Record

Identifying Risk Factors for Premature Birth in the UK Millennium Cohort Using a Random Forest Decision-Tree Approach

Reprod. Med. 2022, 3(4), 320-333; https://doi.org/10.3390/reprodmed3040025
by David Waynforth
Reviewer 1:
Reviewer 2:
Submission received: 10 November 2022 / Revised: 6 December 2022 / Accepted: 7 December 2022 / Published: 9 December 2022
(This article belongs to the Special Issue Recent Advances in Fetal Medicine 2022)

Round 1

Reviewer 1 Report

The work titled “Identifying risk factors for premature birth in the UK Millennium Cohort using a Random Forest decision-tree approach” proposes a machine learning approach to analyse EHR data and predict the type of preterm birth (e.g., “very preterm birth”).

The work is relevant from a clinical perspective; however, it suffers from weaknesses that should be addressed.

Hereafter my comments:

·      To improve the readability of the work, in the last paragraph of the introduction, I suggest that the author include the innovative contribution of the work compared to the state of the art in a bulleted list.

·       I do not properly understand the sentence “machine learning was selected over regression modelling”. It would be more appropriate to specify “a ML algorithm for classification purposes was selected over regression modelling”.

·       In the description of the dataset, the author should specify whether the dataset is publicly available and should explain how the training was conducted (specifically how the training, testing and validation sets were constructed) to allow researchers in the field to conduct fair comparisons. Besides, it is crucial to specify whether cross-validation for hyperparameter tuning was conducted.

·       How did the author handle class imbalance, if it was present?

·       Why did the author choose a random forest as the classification algorithm and not extreme gradient boosting (namely XGBoost, which is also robust to missing data)? The author should give more details on this choice.

·       The analysis of the state of the art is almost absent. I suggest that the author include it so that the reader can understand where this work lies in the literature.

Author Response

Reviewer 1

The work titled “Identifying risk factors for premature birth in the UK Millennium Cohort using a Random Forest decision-tree approach” proposes a machine learning approach to analyse EHR data and predict the type of preterm birth (e.g., “very preterm birth”).

The work is relevant from a clinical perspective; however, it suffers from weaknesses that should be addressed.

Hereafter my comments:

  • To improve the readability of the work, in the last paragraph of the introduction, I suggest that the author include the innovative contribution of the work compared to the state of the art in a bulleted list.

This was also raised by Reviewer 2, and I have added a final paragraph to the introduction to address it and conclude the introduction with a clear statement about the aims of the research.

  • I do not properly understand the sentence “machine learning was selected over regression modelling”. It would be more appropriate to specify “a ML algorithm for classification purposes was selected over regression modelling”.

Thank you: edited as you suggested.

  • In the description of the dataset, the author should specify whether the dataset is publicly available and should explain how the training was conducted (specifically how the training, testing and validation sets were constructed) to allow researchers in the field to conduct fair comparisons. Besides, it is crucial to specify whether cross-validation for hyperparameter tuning was conducted.

I have added information on data access in the methods section. Cross-validation scores in hyperparameter tuning are graphed in an appendix, and the code is included. I have edited the text to ensure that this is clear.
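
To illustrate what cross-validated hyperparameter tuning involves, here is a minimal sketch in plain Python. It is not the code from the paper's appendix: the 1-D nearest-neighbour classifier, the helper names (`k_fold_indices`, `knn_predict`, `cv_accuracy`) and the toy data are all assumptions chosen to keep the example self-contained.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_x, train_y, x, n_neighbors):
    """Majority vote among the n_neighbors closest training points (1-D)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    votes = [train_y[i] for i in nearest[:n_neighbors]]
    return max(set(votes), key=votes.count)


def cv_accuracy(xs, ys, n_neighbors, k=5, seed=0):
    """Mean held-out accuracy over k folds for one hyperparameter value."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        correct = sum(
            knn_predict(tx, ty, xs[i], n_neighbors) == ys[i] for i in held_out
        )
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Hypothetical toy data: label is 1 when x > 50, with four labels flipped.
xs = list(range(100))
ys = [int(x > 50) for x in xs]
for i in (10, 30, 70, 90):
    ys[i] = 1 - ys[i]

# Score each candidate setting on held-out folds and keep the best one.
grid = [1, 3, 5, 7]
results = {n: cv_accuracy(xs, ys, n) for n in grid}
best = max(results, key=results.get)
print(results, best)
```

Each candidate value is scored only on folds that were held out from fitting, which is what makes the comparison between settings fair.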

  • How did the author handle class imbalance, if it was present?

Given that preterm birth is relatively common, problems with extremely unbalanced classes were not expected. However, very preterm birth occurred in just over 1% of all births, and it is possible that oversampling (over-bagging) very preterm births could improve the algorithm's performance. I have added this point to the discussion.
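
The oversampling idea mentioned in this response can be sketched as follows. This is a minimal illustration of random oversampling with replacement, not the author's code; the function name `oversample_minority` and the toy labels are assumptions made for the example.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of the smaller classes until every class
    has as many rows as the largest class (random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        # Draw with replacement to make up the shortfall for this class.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Hypothetical toy data: 99 term births and 1 very preterm birth.
rows = [[i] for i in range(100)]
labels = ["term"] * 99 + ["very_preterm"]
balanced_rows, balanced_labels = oversample_minority(rows, labels)
print(Counter(balanced_labels))
```

In practice, resampling of this kind would be applied to the training portion only, so that duplicated rows cannot appear in both the training and the held-out data.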

  • Why did the author choose a random forest as the classification algorithm and not extreme gradient boosting (namely XGBoost, which is also robust to missing data)? The author should give more details on this choice.

XGBoost and random forest differ in how the decision trees are built: sequentially (XGBoost) versus simultaneously (random forest). While XGBoost is known to be a better choice for imbalanced data, random forest is more straightforward to tune to maximize algorithm performance. Random forest is also more widely used at this point in time: it can be implemented in Stata, has recently been added to SPSS via a Python plugin, and is available in the open-source BlueSky Statistics.
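
The sequential-versus-simultaneous distinction drawn in this response can be made concrete with a small sketch. It is an illustration only, not the paper's code and not XGBoost or any library's random forest: it uses one-split regression trees ("stumps") on hypothetical 1-D toy data, plain residual fitting in place of XGBoost's regularised objective, and bootstrap sampling without the per-split feature sampling that a real random forest adds. All function names are assumptions for the example.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D data by minimising squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # all x identical: predict the mean everywhere
        m = sum(ys) / len(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagged_trees(xs, ys, n_trees=25, seed=0):
    """Random-forest style: each tree sees its own bootstrap sample and
    depends on no other tree, so the trees could be built in parallel."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: sum(t(x) for t in trees) / len(trees)


def boosted_trees(xs, ys, n_trees=25, learning_rate=0.5):
    """Boosting style: each tree is fit to the residuals left by the trees
    before it, so construction is inherently sequential."""
    trees = []
    residuals = list(ys)
    for _ in range(n_trees):
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        residuals = [r - learning_rate * tree(x) for x, r in zip(xs, residuals)]
    return lambda x: learning_rate * sum(t(x) for t in trees)


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: a smooth curve sampled at 50 points.
xs = [i / 10 for i in range(50)]
ys = [x * x for x in xs]

print("bagged  MSE:", mse(bagged_trees(xs, ys), xs, ys))
print("boosted MSE:", mse(boosted_trees(xs, ys), xs, ys))
```

The loop in `bagged_trees` has no dependence between iterations, whereas each iteration of `boosted_trees` needs the residuals produced by the previous one.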

  • The analysis of the state of the art is almost absent. I suggest that the author include it so that the reader can understand where this work lies in the literature.

The discussion has been reworked to address this.

Reviewer 2 Report

Dear authors, thank you for your submission.

The paper is of interest; we thank you for bringing the discussion to such an important issue in the field of obstetrics.

We have a few suggestions for you to improve the visibility of the submitted paper.

1. The results in the abstract are discussed only briefly; you might consider adding a few points regarding the factors associated with predicting early- vs late-onset PL (preterm labour).

2. Though we liked the current introduction, it did not clearly highlight the study's aims or goals; instead, you discussed them at the beginning of your discussion. Please revise the introduction.

In your introduction you discussed maternal and paternal causes of PL; what about fetal causes?

You should briefly define the different aetiological factors of very early and late-onset PL, since many of these are parameters of the current study.

In line 78, delete “e.g.” from the reference.

3. Results

They were very nicely presented in clear, simple language, yet we felt that they were not explained thoroughly.

Machine learning is not familiar to many; we think that important results, especially those regarding the factors that played a strong role in predicting early and late PL, should be emphasized more.

Fig. 4 had an undefined abbreviation for maternal and paternal age: “ft”?

5. Discussion

It needs more tightening.

The different factors that predicted early and late PL were not discussed in depth; moreover, we did not see many studies compared with the current results.

In lines 241-244 the discussion needs to be strengthened. Please revise.

The references

Too many of them are old (20-25 years), and one is 50 years old. Please update.

Author Response

Reviewer 2

Dear authors, thank you for your submission.

The paper is of interest; we thank you for bringing the discussion to such an important issue in the field of obstetrics.

We have a few suggestions for you to improve the visibility of the submitted paper.

  1. The results in the abstract are discussed only briefly; you might consider adding a few points regarding the factors associated with predicting early- vs late-onset PL (preterm labour).

-- I have reworked the abstract a little to allow space for this.

  2. Though we liked the current introduction, it did not clearly highlight the study's aims or goals; instead, you discussed them at the beginning of your discussion. Please revise the introduction.

-- I have added a paragraph at the end of the introduction summarising the aims.

In your introduction you discussed maternal and paternal causes of PL; what about fetal causes?

You should briefly define the different aetiological factors of very early and late-onset PL, since many of these are parameters of the current study.

-- Thank you for this comment; I have added a brief discussion of this to the introduction.

In line 78, delete “e.g.” from the reference.

  3. Results

They were very nicely presented in clear, simple language, yet we felt that they were not explained thoroughly.

Machine learning is not familiar to many; we think that important results, especially those regarding the factors that played a strong role in predicting early and late PL, should be emphasized more.

-- I have added information to aid understanding of the variable importance scores.

Fig. 4 had an undefined abbreviation for maternal and paternal age: “ft”?

-- I have removed the abbreviation, ft, which stands for full-time. “Age left education” is sufficient.

  5. Discussion

It needs more tightening.

-- Thank you for this feedback: I have reorganised the discussion section, adding a clear structure.

The different factors that predicted early and late PL were not discussed in depth; moreover, we did not see many studies compared with the current results.

-- I have added a section to the discussion on similarities to and differences from past studies.

In lines 241-244 the discussion needs to be strengthened. Please revise.

The references

Too many of them are old (20-25 years), and one is 50 years old. Please update.

-- I have included some more recent references.

Round 2

Reviewer 1 Report

I thank the authors for the reply; I have no further concerns.

Author Response

(The reviewer had no further recommendations)

Reviewer 2 Report

Thank you for accepting our suggestion.

Author Response

(The reviewer had no further recommendations)
