Application of Artificial Intelligence in Screening for Adverse Perinatal Outcomes: A Systematic Review

(1) Background: AI-based solutions could become crucial for predicting pregnancy disorders and complications. This study investigated the evidence for applying artificial intelligence (AI) methods in obstetric pregnancy risk assessment and in the prediction of adverse pregnancy outcomes (APO). (2) Methods: The authors screened the following databases: PubMed/MEDLINE, Web of Science, Cochrane Library, EMBASE, and Google Scholar. All evaluative studies comparing artificial intelligence methods in predicting adverse pregnancy outcomes were included. The PROSPERO ID number is CRD42020178944, and the study protocol was published before this publication. (3) Results: AI applications were found in nine groups: general pregnancy risk assessment, prenatal diagnosis, hypertensive disorders of pregnancy, fetal growth, stillbirth, gestational diabetes, preterm delivery, delivery route, and others. According to this systematic review, artificial neural network (ANN) methods are the best artificial intelligence application for assessing medical conditions, with an average accuracy of around 80–90%. (4) Conclusions: AI methods applied as digital software can help medical practitioners in their everyday practice during pregnancy risk assessment. Based on published studies, models that used ANN methods could be applied in APO prediction. Nevertheless, further studies may identify new methods with even better predictive potential.


Introduction
The earliest ideas resembling artificial intelligence (AI) appear in ancient Greek mythology. In the 15th century, Leonardo da Vinci crafted an impressive mechanical knight that could move its arms, sit, and turn its head. However, artificial intelligence as a field of research was founded in 1956 [1].
Artificial intelligence is a field of science and engineering concerned with computer-driven mechanisms or machines that use computing power, memory, and large amounts of data. It addresses learning tasks by creating models of intelligent behavior with as little human intervention as possible. The application of artificial intelligence in medicine can be divided into two main branches: virtual and physical. The virtual branch is represented by machine learning (ML), which comprises computer algorithms that improve through experience [1]. In contrast, the physical branch consists of mechanical devices such as medical equipment or highly advanced robots [2].
Many complications of pregnancy cannot be treated directly once they occur. Therefore, perinatal medicine identifies high-risk groups and applies interventions to minimize adverse outcomes.
Box 1. Search strategy.
(pregnant OR pregnancy OR prepartum OR prenatal OR gestation OR prelabor OR maternal) AND (artificial neural networks OR artificial intelligence OR machine learning) AND (pregnancy risk)
All searches were conducted on 27 May 2022, with languages restricted to English, German, or Polish and with no limits on publication date. In addition, the references of all included studies were hand-searched for further relevant articles.
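The Box 1 query follows a simple pattern: synonyms within each concept are combined with OR, and the three concepts are combined with AND. The Python sketch below is our illustration, not the authors' tooling, and the helper names are hypothetical; it rebuilds the string from its term groups, which may make it easier to adapt the query to other databases. Phrase quoting and field tags are deliberately left out because their syntax differs between databases.

```python
# Hypothetical helper that rebuilds the Box 1 search string from term groups.
# Terms inside a group are joined with OR; groups are joined with AND.
POPULATION = ["pregnant", "pregnancy", "prepartum", "prenatal",
              "gestation", "prelabor", "maternal"]
METHODS = ["artificial neural networks", "artificial intelligence",
           "machine learning"]
OUTCOME = ["pregnancy risk"]


def or_group(terms: list[str]) -> str:
    """Wrap a list of synonyms in parentheses, separated by OR."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups: list[str]) -> str:
    """Combine the OR-groups with AND, in the order given."""
    return " AND ".join(or_group(g) for g in groups)


query = build_query(POPULATION, METHODS, OUTCOME)
print(query)
```

Adding a synonym then means editing one list rather than the whole string.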

Inclusion Criteria
All types of evaluative study designs were included and assessed. Two reviewers (SF and MP) independently screened the studies by title, abstract, and full text. Studies that met the selection criteria were included, and the reference lists of the included studies were additionally screened. Every screened study was scored for relevance (0 = not relevant, 1 = possibly relevant, 2 = very relevant), and only publications that scored at least 1 point were included in the review. Any disagreement was discussed and resolved by a third reviewer (AK).
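The text does not state exactly how the two reviewers' scores were combined. The Python sketch below assumes one plausible rule: a study is included when both reviewers score it at least 1, excluded when both score it 0, and decided by the third reviewer's score when they disagree. The function name and the rule itself are hypothetical.

```python
from typing import Optional

THRESHOLD = 1  # minimum relevance score for inclusion (scale 0, 1, 2)


def include(score_a: int, score_b: int, score_c: Optional[int] = None) -> bool:
    """Hypothetical inclusion rule for one study.

    score_a, score_b: independent scores from the two screening reviewers
    (0 = not relevant, 1 = possibly relevant, 2 = very relevant).
    score_c: third reviewer's score, consulted only on disagreement.
    """
    for s in (score_a, score_b, score_c):
        if s is not None and s not in (0, 1, 2):
            raise ValueError("scores must be 0, 1 or 2")
    a_in = score_a >= THRESHOLD
    b_in = score_b >= THRESHOLD
    if a_in == b_in:  # reviewers agree: their shared decision stands
        return a_in
    if score_c is None:
        raise ValueError("disagreement: third reviewer's score required")
    return score_c >= THRESHOLD  # third reviewer resolves the disagreement
```

For example, `include(0, 2, 1)` returns `True`, because the third reviewer's score of 1 meets the threshold.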

Data Extraction
The PI(E)CO question was "Artificial intelligence methods in the screening for pregnancy risk and adverse pregnancy outcomes in pregnant women." The population (P) was pregnant women with high-risk pregnancies (with specified pregnancy complications). The intervention/exposure (E) was the application of artificial intelligence methods to evaluate pregnancy risk and to screen for APO. The comparison group (C) consisted of pregnant women with low-risk pregnancies. The outcome (O) was the predictive value of the studied artificial intelligence method. The studies (S) included in the analyses were retrospective or prospective trials with an unaffected population as a control.
The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram was prepared according to the PRISMA guidelines and is shown in Figure 1 [28].

Quality Assessment and Risk of Bias
The risk of bias was assessed independently by two authors (SF and MP) using the Newcastle-Ottawa scale [29]. A third reviewer (AK) resolved any discrepancies. Overall, the included studies were of moderate to high quality. The selection process is shown in Table 1.

Synthesis of Results
Because of the heterogeneity of the included studies, a quantitative synthesis was not possible. Instead, the predictive values of the AI methods in the included studies were grouped by pregnancy outcome and assessed. The results are summarized in Tables 2-10.

Results
AI has been applied to many different aspects of perinatal medicine. The included studies described AI applications in predicting general pregnancy risk, prenatal diagnoses, hypertensive disorders of pregnancy, fetal growth, stillbirth, gestational diabetes, preterm delivery, the delivery route, and other conditions [69][70][71][72][73][74][75][76][77][78][79]. The studies were divided into nine groups according to the outcome predicted by the AI methods. Tables 2-10 present the main characteristics of the included studies. For each group, the most robust method was identified.
Various predictors were used in model construction. The most relevant were the mother's clinical parameters and existing health conditions (maternal and gestational age; gravidity and parity; BMI and weight gain; and other parameters such as blood pressure, lifestyle, and nutrition), pregnancy-related complications and risk factors (threatened preterm delivery, bleeding in pregnancy, pregestational and gestational diabetes, the hypertensive disorder spectrum, HELLP syndrome, cholestasis and other liver disorders in pregnancy, thrombophilia, autoimmune diseases, maternal age over 35 years, multiple pregnancies, and an interval of 10 years or more between pregnancies), laboratory parameters (albuminuria, hyperglycemia, leucocythemia, etc.), fetal monitoring parameters (baseline fetal heart rate, variability, and the occurrence of accelerations or decelerations), and ultrasound and Doppler parameters (fetal movement; fetal growth; uterine, cerebral, and umbilical Doppler signals; and the amniotic fluid index). Many other variables could be used to measure the overall risk of a pregnancy; this might increase a model's representativeness and accuracy, but the resulting gain would be difficult to quantify.
In the first group (general pregnancy risk assessment), classification and regression trees (CART) [30], fuzzy logic [31], the teletech architecture ILITIA [32], artificial neural networks (ANNs), and naïve Bayes [34] were used. CART had the best predictive value (accuracy of 93.4% in the training set and 82.5% in the test set), and naïve Bayes had the lowest accuracy, at 70% [34]. Nevertheless, the differences observed between the methods were small.
The second group included four articles on the prenatal diagnosis of chromosomal abnormalities (Table 3) [35][36][37][38]. These studies used ANNs and binary classification models such as the averaged perceptron, the boosted decision tree (DT), the Bayes point machine, the decision forest, the decision jungle, the locally deep support vector machine (SVM), logistic regression (LR), and the SVM. The AI models were built on well-established screening markers: ultrasound indicators (nuchal translucency, crown-rump length, and the presence of the nasal bone) and pregnancy-associated plasma protein A (PAPP-A) with free β-hCG, all obtained between the 9th week + 3 days and the 11th week + 6 days of gestation.
The decision forest model performed best, with the highest accuracy (89.5%) and a detection rate of almost 100% for trisomy 21 [38].
The third group included five articles on hypertensive disorders of pregnancy (Table 4) [39][40][41][42][43]. The methods used in this group were ANNs [41], neuro-fuzzy machine learning techniques [40], MLR [39,43], and other ML techniques such as LR, the DT model, naïve Bayes classification, the SVM, the random forest (RF) algorithm, and the stochastic gradient boosting method [42]. The following risk factors were assessed for predicting pregnancy hypertension: maternal age at delivery; gestational age at delivery; maternal race; parity; neonatal birth weight; prepregnancy BMI; cervical ripening during induction; fetal growth restriction; blood pressure; maternal medical history of hypertension, diabetes, and previous preeclampsia; obstetrical and social histories; medications prescribed during pregnancy; and laboratory data (blood urea nitrogen, serum creatinine, spot urine protein-to-creatinine ratio, urine albumin-to-creatinine ratio, hemoglobin, fasting blood glucose, serum albumin, uric acid, total bilirubin, aspartate transaminase, alanine transaminase, total cholesterol, triglycerides, high-density lipoprotein cholesterol, and low-density lipoprotein cholesterol).
ML techniques performed very well. The stochastic gradient boosting model had the best predictive performance, with an accuracy of 97.3% and a false-positive rate of 0.9% [42]. ANNs and MLR achieved an area under the ROC curve (AUC) of 0.952, and the ANN model had a sensitivity of 86.2% and a specificity of 95.4%.
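Accuracy, sensitivity, specificity, and the false-positive rate are all derived from the same 2 × 2 confusion matrix, and with a rare outcome a model can combine very high accuracy and a low false-positive rate with only moderate sensitivity. The Python sketch below uses invented counts (not data from any included study) to show these relationships; the `Confusion` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # affected pregnancies correctly flagged
    fp: int  # unaffected pregnancies wrongly flagged
    tn: int  # unaffected pregnancies correctly cleared
    fn: int  # affected pregnancies missed

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def false_positive_rate(self) -> float:
        # equals 1 - specificity
        return self.fp / (self.fp + self.tn)

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)


# Hypothetical cohort of 1000 pregnancies, 100 of them affected.
cm = Confusion(tp=80, fp=10, tn=890, fn=20)
print(f"accuracy={cm.accuracy():.3f} sensitivity={cm.sensitivity():.3f} "
      f"specificity={cm.specificity():.3f} FPR={cm.false_positive_rate():.3f}")
# accuracy=0.970 sensitivity=0.800 specificity=0.989 FPR=0.011
```

Here one affected pregnancy in five is missed even though accuracy is 97%, which is why sensitivity and specificity are more informative than accuracy alone.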
The fourth group included four articles on the prediction of fetal growth restriction (FGR) (Table 5) [44][45][46][47]. The three most important factors for predicting small-for-gestational-age (SGA) infants were gestational weight gain, maternal smoking, and previous low-birth-weight (LBW) infants. Prepregnancy BMI, gestational weight gain, and a history of delivering infants weighing more than 4080 g were the primary predictors of large-for-gestational-age (LGA) infants.
Methods such as the SVM, the RF, LR, sparse LR models, linear and quantile regression, Bayesian additive regression trees, generalized boosted models, and the bagged tree ML technique were used [44][45][46][47]. For SGA, the bagged tree model had a predictive value of 84.9% and an area under the receiver operating characteristic curve (AUC) of 0.636 [46]. The highest accuracy was achieved by the SVM, with a prediction score of 90.7% and an AUC of 0.588 [44].
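An accuracy around 90% alongside an AUC near 0.6 can arise when the outcome is rare: a model that labels almost every pregnancy as unaffected scores well on accuracy while ranking affected and unaffected cases little better than chance. The Python sketch below illustrates this with a hypothetical cohort of 20 pregnancies (2 affected). The scores are invented, the 0.5 decision threshold is an assumption, and the AUC is computed as the Mann-Whitney probability that an affected case receives a higher score than an unaffected one.

```python
def accuracy_at(scores, labels, threshold=0.5):
    """Share of correct labels when scores >= threshold are called positive."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney statistic; ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort: 2 affected (label 1), 18 unaffected (label 0).
labels = [1, 1] + [0] * 18
scores = [0.40, 0.20] + [0.10] * 8 + [0.30] * 5 + [0.45] * 5

print(f"accuracy={accuracy_at(scores, labels):.3f}")  # accuracy=0.900
print(f"AUC={auc(scores, labels):.3f}")                # AUC=0.583
```

Every score falls below the threshold, so the model predicts "unaffected" for everyone and is right 18 times out of 20, yet its ranking of cases is weak.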
The fifth group included four studies on stillbirth prediction (Table 6) [48][49][50][51]. These studies used MLR [48], artificial intelligence analysis of time-lapse (TLM) embryo images [49], LR, ANNs, the gradient boosting DT [50], regularized logistic regression, the DT based on classification and regression trees, the RF, extreme gradient boosting (XGBoost), and the multilayer perceptron neural network [51]. Sociodemographic and obstetric predictors (maternal age, parity, education, occupation, ethnicity, place of residence, previous fetal loss, bleeding during pregnancy, maternal BMI, number of prior caesarean sections, multiple pregnancies, fetal sex, and fetal growth rate) and comorbid conditions (hypertensive disorder spectrum, diabetes, sickle cell disease, renal disorders, thyroid dysfunction, and venereal diseases) were used to estimate the risk of miscarriage. MLR performed best, with a C-statistic of 0.80 for the basic stillbirth prediction model [48].
In the sixth group (gestational diabetes, GDM), factors such as the mother's age, BMI, triceps skinfold thickness, plasma glucose concentration 2 h into an oral glucose tolerance test, 2 h serum insulin level, and the diabetes pedigree function were particularly effective predictors. The ML method achieved high accuracy even at the start of pregnancy, with an AUC of 0.85, substantially outperforming a baseline risk score with an AUC of 0.68 [52]. The RBFNetwork also performed well, with a precision of 78.5%, an F-measure of 78.6%, an ROC area of 0.839, and a Kappa statistic of 0.509 [53]. These two models therefore seem the best fit for predicting GDM [52,53].
The seventh group included fourteen studies on preterm delivery prediction (Table 8) [59][60][61][62][63][64][65][66][67][68][69][70][71][72]. The methods used across these studies were ANNs [59], cross-validated regressions [60], the RF classifier, the rule-based classifier, penalized logistic regression [61], the synthetic minority oversampling technique [63], LR, classification and regression trees (CART), the SVM, the Bayesian classifier [64], the back-propagation neural network [65], an SVM-based mobile health system [72], deep learning [66], data mining, and the feed-forward back-propagation network [67][68][69][70][71]. Predictors such as first-trimester bleeding, preterm rupture of the membranes, polyhydramnios, oligohydramnios, infections, short intervals between pregnancies, and histories of preterm delivery and miscarriage played a crucial role in predicting preterm delivery, alongside the commonly used clinical evaluation of patients and their comorbidities.
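Precision, the F-measure, and the Kappa statistic reported for the RBFNetwork GDM model [53] can all be computed from a confusion matrix. The Python sketch below shows the standard binary definitions with invented counts; some toolkits report class-weighted averages of precision and F-measure, so these single-class values are not necessarily directly comparable with the figures quoted in [53].

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Binary precision, recall (sensitivity) and F-measure (F1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def cohens_kappa(tp: int, fp: int, tn: int, fn: int) -> float:
    """Agreement between predictions and true labels, corrected for chance."""
    n = tp + fp + tn + fn
    p_observed = (tp + tn) / n
    # chance agreement from the marginal totals of predictions and labels
    p_both_pos = ((tp + fp) / n) * ((tp + fn) / n)
    p_both_neg = ((tn + fn) / n) * ((tn + fp) / n)
    p_expected = p_both_pos + p_both_neg
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts: 40 women with GDM, 60 without.
p, r, f = precision_recall_f1(tp=30, fp=10, fn=10)
k = cohens_kappa(tp=30, fp=10, tn=50, fn=10)
print(f"precision={p:.3f} recall={r:.3f} F-measure={f:.3f} kappa={k:.3f}")
# precision=0.750 recall=0.750 F-measure=0.750 kappa=0.583
```

Although 80% of the predictions are correct, Kappa is lower because half of that agreement would be expected by chance given these class proportions.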
Of all the AI methods used, the RF performed best, with a sensitivity of 97%, a specificity of 85%, an AUC of 0.94, and a mean square error rate of 14% [61]. The predictive values of most studies were similar.
The eighth group included two articles on predicting the delivery route (Table 9) [73,74]. A DT [73] and the backpropagation learning algorithm (an ANN method) [74] were used. The predictors included maternal age, gravidity, parity, gestational age at delivery, the need for and type of labor induction, fetal presentation at birth, and maternal comorbidities. Again, the ANN performed better than the other ML method, with a specificity of 97.5% and a sensitivity of 60.9% [74].
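A specificity of 97.5% combined with a sensitivity of 60.9% means that the model rarely produces false-positive predictions but misses about 39% of positive cases. Two common single-number summaries that weight both properties equally are balanced accuracy and Youden's J. The Python sketch below applies them to the values reported in [74]; these summaries are our addition and were not reported in the original study.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity and specificity (insensitive to class balance)."""
    return (sensitivity + specificity) / 2


def youden_j(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: 0 for a useless test, 1 for a perfect one."""
    return sensitivity + specificity - 1


sens, spec = 0.609, 0.975  # values reported for the ANN model [74]
print(f"balanced accuracy={balanced_accuracy(sens, spec):.3f}")  # 0.792
print(f"Youden's J={youden_j(sens, spec):.3f}")                  # 0.584
```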
The last group consisted of five studies predicting other aspects of perinatal care (Table 10) [75][76][77][78][79]. Despite the heterogeneity of this group, the predictors from the previous groups were also used. The studies described the prediction of pregnancy outcomes among women with systemic lupus erythematosus (SLE) using a binary logistic regression model [75], the prediction of congenital heart disease (CHD) among pregnant women [44], the frequency of 27 potential risk factors related to pregnancy and the perinatal period [77], and the identification of patients at risk of fetal distress in labor using ANNs [78]. ML was also applied to identify severe maternal morbidity (SMM) and relevant risk factors from electronic health records (EHRs) [79].
For preeclampsia assessment, Jhee et al. conducted a prospective study of 11,006 pregnant patients [42]. Several machine learning models were trained and applied: decision trees (DTs), naïve Bayes classification (NBC), the support vector machine (SVM), the random forest (RF), and stochastic gradient boosting (SGB). The predictive value was 84.7% for the DT, 89.9% for the NBC, 89.2% for the SVM, 92.3% for the RF, and 97.3% for SGB [42].
Li et al. conducted a prospective study to predict fetal growth restriction in a sample of 215,568 pregnant women. Two ML techniques, the support vector machine (SVM) and the random forest (RF), were trained and tested in comparison with logistic regression (LR) and sparse LR. The accuracy was 92.4% for the SVM, 43.7% for C4.5, 61.2% for the RF, and 94.5% for sparse LR; LR had an AUC of 0.6 and an accuracy of 93% [44]. Koivu

Discussion
All of the presented studies showed the potential of AI methods to improve risk assessment and predict adverse perinatal outcomes. AI techniques achieved high predictive values, around 80-90%, which were better than those of logistic regression methods. However, this systematic review could not identify a single best AI method, and further prospective studies should be performed. We see two likely reasons for this. First, every perinatal complication has different risk factors and incidences [80][81][82][83], so comparisons across complications are biased by the heterogeneity of the results. Second, most models were tested on small groups or were not validated prospectively.
Risk factors for PTO, primarily sociological and clinical indicators, maternal health, pregnancy complications, comorbidities, laboratory values, and fetal monitoring parameters, were considered. However, many more characteristics could be taken into account to measure the overall risk of a pregnancy. Doing so could increase a model's representativeness and accuracy, although the benefit would be difficult to judge. Immunohistochemical and genetic predictors are two components that are typically not taken into account. Medical specialties such as immunology and genetics have advanced extremely quickly and have identified orphan molecules responsible for many illnesses. Combining genetic and immunohistochemical predictors with the socioeconomic, laboratory, and medical history components discussed above may therefore improve the predictive ability of AI models.
Study design and sample size are the most significant sources of bias in the included studies. Most studies were retrospective or did not state whether they were prospective or retrospective. Moreover, prospective analyses of patient records were often conducted on small samples. As a result, only a few studies provide robust evidence of AI accuracy [35][36][37][42][44][50][71]. Based on those studies, ANN models seem appropriate for APO prediction using extensive patient data.
AI is still far from being a gold standard in obstetrics. This summary of opportunities for AI application therefore seems important to show the untapped potential of these methods. The authors see opportunities for AI application in routine daily medical challenges. For example, the predictive value of AI techniques could be used during the first assessment of a pregnant woman, when planning a pregnancy, or in the perinatology department to determine the timing of delivery. Gathering the potential risk factors and using trained and validated software could lead to very early diagnosis of complications or even their prevention, for example, in patients with GDM [52,53]. Today, decisions are made according to the experience and knowledge of medical practitioners, whose input into medical diagnoses and procedures is unquestionable. Nevertheless, the human brain cannot process large volumes of prospective study data to calculate the exact risk in every medical case. We believe that digital software could help improve the prediction of medical conditions.
There are reports of AI applications using mobile software [84,85]. Telemedicine has been used to inform patients about their condition and gestational diabetes status. AI tools for patient examination, hospital monitoring, or everyday clinical routine could improve healthcare outcomes. Pregnant women often underestimate their health status and do not report everything to their medical care providers [86,87]. Combining AI and telemedicine software could help medical care providers assess risks and threats to pregnant women in real time, as is already done in cardiology and diabetology [88][89][90]. As mentioned above, such information could help avoid many adverse medical conditions.
This study had several limitations. The main one was the difficulty of synthesis caused by the heterogeneity of the reported results. The studies were therefore divided into groups to bring together similar outcomes. Unfortunately, this was insufficient because of methodological differences and the considerable variety of AI methods. As a result, only AI applications with a calculated LR were quantitatively synthesized. Moreover, the heterogeneity in how APO prediction was reported could have affected the results. A detailed assessment of each AI method could provide readers with the information needed to apply AI methods in practice. However, there were not enough studies of the same quality level (prospective, with large patient groups) for the same complication. More prospective studies are therefore needed to identify the best AI method.
A strength of this study is its novelty: to our knowledge, no previously published study has compared the use of artificial intelligence methods to predict perinatal complications.
This systematic review also described AI applications in predicting pregnancy complications and gave a broad overview of the different factors and diseases to which AI can be applied [1,16].

Conclusions
AI methods applied as digital software can help medical practitioners in their everyday practice in pregnancy risk assessment. Technology-supported decision making could reduce errors caused by the limitations of human judgment. Based on published studies, models that used ANNs and were tested on large prospective datasets could be applied in APO prediction. Nevertheless, further studies may identify new methods with even better predictive potential.
Funding: This research did not receive any specific grant from public, commercial, or not-for-profit funding agencies.

Institutional Review Board Statement:
This study was conducted in accordance with the Declaration of Helsinki.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.

Conflicts of Interest:
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.