Medical Sciences Forum
  • Proceeding Paper
  • Open Access

18 November 2024

A Machine Learning-Based Risk Prediction Model During Pregnancy in Low-Resource Settings †

1
Indian Institute of Technology Delhi, New Delhi 110016, India
2
Royal College of Surgeons in Ireland, University of Medical and Health Sciences, D02 YN77 Dublin, Ireland
*
Author to whom correspondence should be addressed.
Presented at the 2nd International One Health Conference, Barcelona, Spain, 19–20 October 2023.
This article belongs to the Proceedings The 2nd International One Health Conference

Abstract

Maternal health is a serious concern for many nations owing to a lack of adequate healthcare facilities and staff and to late diagnoses of life-threatening diseases. Pregnant women face numerous challenges during pregnancy and childbirth. Non-communicable diseases, poor nutrition, and a lack of awareness of the risks associated with pregnancy are the primary reasons for these challenges, and sometimes they become a direct cause of maternal mortality. Awareness of the risks and early detection may contribute to a reduction in maternal deaths during pregnancy and childbirth. Various information and communication technologies (ICTs) have been incorporated into the healthcare industry so that problems can be diagnosed as early as feasible and appropriate treatment can be initiated. Machine Learning (ML) techniques have the potential to predict probable risk factors for timely interventions; however, challenges arise when the data are limited and unstructured. The Decision Tree (DT), Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Linear Discriminant Analysis (LDA) algorithms, with 10-fold cross-validation, are used in this study. The dataset included both the present and past medical histories and important vitals of pregnant women. With a test score of 98.8%, the Decision Tree (DT) algorithm outperformed the other algorithms. Based on the predicted result, pregnant women can consult medical specialists to reduce potential complications.

1. Introduction

The United Nations (UN) has made maternal health a top priority. To focus on maternal health around the world, the Millennium Development Goals (MDG) were established in 2000, followed by the Sustainable Development Goals (SDG) in 2015 [1]. Maternal mortality rates have decreased significantly, but maternal deaths have not been eliminated.
Any risk during pregnancy can be detrimental to the health of the mother and unborn child. Although all pregnancies bear risk and necessitate care, a high-risk pregnancy (HRP) always necessitates additional care during and after delivery [2]. Depending on an expectant woman’s medical history, any pregnancy may become high-risk as it progresses. Pregnancies at high risk can be avoided if complications are identified early. Regular checkups before and during pregnancy aid many women in having a risk-free pregnancy and delivery free of significant complications [3].
The objective of this research is to evaluate several machine learning algorithms using a primary dataset in order to determine the most effective approach. This study introduces a prediction model that uses machine learning algorithms to identify high-risk pregnancies in rural areas of India. The model's performance was evaluated by assessing its accuracy, sensitivity, and specificity in predicting high-risk events. This approach will assist Accredited Social Health Activists (ASHA) and other frontline healthcare workers (FLHW) in detecting high-risk pregnancies by analyzing symptoms and vital signs, including body temperature, blood pressure, heart rate, fetal heart rate, and breathing rate, so that prompt and appropriate action can be taken. Based on the predicted risk, the expectant woman can then receive appropriate care to ensure a healthy pregnancy and a childbirth without complications.
In Section 2, we compare the machine learning algorithms proposed by other researchers. In Section 3, we explain our data processing approach in detail, and the results are presented and discussed in Section 4. In Sections 5 and 6, we present our discussion, conclusions, and recommendations.

3. Methodology

Initially, primary data were collected during the field study in the designated villages of the district Udham Singh Nagar in Uttarakhand. During the data-cleaning procedure, the classes were found to be imbalanced. The Synthetic Minority Over-Sampling Technique (SMOTE) was employed to achieve a more balanced dataset. Subsequently, various machine learning algorithms were applied to the dataset in order to identify the best algorithm for the high-risk pregnancy (HRP) prediction model. The entire process is depicted in Figure 1.
Figure 1. A block diagram for generating the machine learning (ML) model.

3.1. Data Collection

Data were gathered from the villages of Mohanpur, Devnagar, Bhuda, Rameshpur, Chutki Devaria, Fulshungi, Khera, Rampura, Bhadipura, Sanjay Nagar, Shaktifarm, and Maharajpur, located in the district of Udham Singh Nagar, Uttarakhand. We presented our work to the ethical committee prior to our field study and obtained their approval. The members of the ethical committee were from the Indian Institute of Technology (IIT) Delhi and the All India Institute of Medical Sciences (AIIMS) Delhi. Written consent was obtained from all 396 volunteers who participated in the interview and data collection. After receiving authorization from the authorities, 282 records were obtained from the district hospital. Additionally, SMOTE was employed to generate 159 synthetic records in order to balance the dataset.
According to figures from the Sample Registration System (SRS), Uttarakhand is the sole state in the country where the Maternal Mortality Ratio (MMR) climbed from 89 in the period of 2015–17 to 103 in the years 2018–20. The MMR rose in Uttarakhand even though the Government of India had implemented numerous programs and measures to reduce it. For this reason, villages in the Udham Singh Nagar district of Uttarakhand were selected for the field study. The villages listed above were selected using the stratified random sampling technique. The respective Sub Centers (SC), Primary Health Centers (PHC), and Community Health Centers (CHC) were visited, and pregnant women who had sought care there were interviewed in collaboration with Accredited Social Health Activists (ASHA) and Auxiliary Nurse Midwives (ANM) in order to gather the necessary data. The features of the dataset are described in Table 2. The final dataset consists of a total of 837 rows and 13 columns.
Table 2. HRP dataset description.
The study focuses on the population of women of reproductive age (15–49 years) in the Udham Singh Nagar district in Uttarakhand, which totals 519,972 individuals. Among these, 41,006 pregnant women were registered for Antenatal Care (ANC) [18].
To determine the appropriate sample size, Yamane’s formula was applied. Yamane’s formula is given as follows:
$$n = \frac{N}{1 + N e^{2}}$$
where N represents the population size and e is the margin of error. For this study, N = 41,006 and e = 5% (0.05). Plugging these values into the formula, we obtain the following:
$$n = \frac{41006}{1 + 41006 \times 0.05^{2}} \approx 396$$
This calculation yielded a required sample size of 396. However, an additional 282 samples were collected from the District Hospital of Udham Singh Nagar from women who delivered during the same period. Therefore, the total sample size increased to 678. During the classification process, the sample classes were found to be unbalanced. To address this imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was employed. This technique generated additional samples for the minority class, resulting in a final sample size of 837.
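The sample-size arithmetic above can be reproduced in a few lines. The following is a minimal sketch: the function name `yamane_sample_size` is illustrative, and rounding to the nearest integer is an assumption, since the text reports only the final value of 396.

```python
def yamane_sample_size(population: int, margin_of_error: float) -> float:
    """Yamane's formula: n = N / (1 + N * e^2)."""
    return population / (1 + population * margin_of_error ** 2)


# Values reported in the text.
N = 41_006  # pregnant women registered for ANC
e = 0.05    # 5% margin of error

n_exact = yamane_sample_size(N, e)
n_required = round(n_exact)  # rounding rule is an assumption

hospital_records = 282   # additional records from the district hospital
synthetic_records = 159  # rows generated by SMOTE

collected = n_required + hospital_records
final_size = collected + synthetic_records

print(f"Yamane n = {n_exact:.2f} -> {n_required}")          # Yamane n = 396.14 -> 396
print(f"collected = {collected}, after SMOTE = {final_size}")  # collected = 678, after SMOTE = 837
```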
The features listed in Table 2 were chosen after extensive consultation with the gynecologists and surgeons of Jawahar Lal Nehru District Hospital, Udham Singh Nagar, Uttarakhand. They recommended these non-invasive parameters as essential for the monitoring and prediction of high-risk pregnancies (HRP), and they assisted in assigning records to the high- and low-risk classes. Additionally, the dataset was shared and discussed with gynecologists from other hospitals to obtain their input on the risk classification classes. Table 3 provides a summary of the HRP dataset.
Table 3. Summary of HRP dataset.

3.2. Data Cleaning and Processing

The data were gathered from several villages during the field visits. The gathered data were initially unbalanced and therefore underwent a balancing procedure. At the outset, there were 455 rows designated as High Risk and 223 rows designated as No Risk. Applying the Synthetic Minority Oversampling Technique (SMOTE) [19] added 159 synthetic rows, increasing the number of No Risk rows to 382, as depicted in Figure 2.
Figure 2. HRP dataset before and after using SMOTE.
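The oversampling step can be illustrated with a simplified, standard-library-only sketch of the SMOTE idea: each synthetic row is placed at a random position on the line segment between a minority-class row and one of its k nearest minority-class neighbours. This is not the authors' implementation. The function `smote`, the choice k = 5, the two features, and the randomly generated "No Risk" rows are all hypothetical. Only the counts (223 original rows, 159 synthetic rows) come from the text.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a minority sample
    and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class (itself excluded)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature minority class (e.g. systolic BP, heart rate).
data_rng = random.Random(42)
no_risk = [(data_rng.gauss(115, 8), data_rng.gauss(78, 6)) for _ in range(223)]

new_rows = smote(no_risk, n_synthetic=159)
balanced_no_risk = no_risk + new_rows
print(len(no_risk), "->", len(balanced_no_risk))  # 223 -> 382
```

Because every synthetic row is an interpolation between two existing rows, it always stays within the range of the observed minority-class values. In practice, a maintained implementation such as the one in the imbalanced-learn package would normally be preferred over a hand-written version.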

3.3. Machine Learning Algorithms

The use of machine learning is rising notably across many domains, and it has proven beneficial in identifying and diagnosing problems in diverse areas of medicine. The primary aim of this study is to forecast the likelihood of pregnancy-related risks from the medical history and vital signs of pregnant individuals. Digital technology can assist healthcare workers in identifying pregnancies that have a higher risk. The prediction is made using several machine learning techniques, including Logistic Regression (LR), Decision Tree (DT), Linear Discriminant Analysis (LDA), Naive Bayes (NB), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). These algorithms learn from the dataset and make predictions for a new user. The entire dataset was split in an 80:20 ratio, with 80% of the data allocated for training the model and the remaining 20% used for testing [18]. The dataset consists of 837 rows, which belong to two risk classification classes: High Risk, with 455 rows, and No Risk, with 382 rows. To obtain a reliable estimate of accuracy, we employed the 10-fold cross-validation technique for both training and testing purposes.
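The partitioning described above can be sketched with the standard library alone. The helper names `train_test_indices` and `k_fold` are illustrative, and two details are assumptions rather than facts from the text: the test size is rounded up, and the 10 folds are formed on the training portion. Rounding up gives 168 test rows, which matches the total of the confusion matrix reported in Section 4.

```python
import math
import random


def train_test_indices(n_rows, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into (train, test) lists."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_test = math.ceil(n_rows * test_fraction)  # rounding up is an assumption
    return idx[n_test:], idx[:n_test]


def k_fold(indices, k=10):
    """Yield (train, validation) index lists; each fold is validation once."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


train_idx, test_idx = train_test_indices(837)
print(len(train_idx), len(test_idx))  # 669 168

fold_sizes = [len(validation) for _, validation in k_fold(train_idx, k=10)]
print(fold_sizes)  # [67, 67, 67, 67, 67, 67, 67, 67, 67, 66]
```

Libraries such as scikit-learn provide equivalent utilities (`train_test_split`, `KFold`); the hand-written version is shown only to make the row counts explicit.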

3.4. Performance Metrics for Comparative Analysis

On the collected dataset, multiple machine learning algorithms, namely Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Decision Tree (DT), Naive Bayes (NB), and Support Vector Machine (SVM), were used to predict the risk [19,20]. To assess the accuracy and performance of the chosen algorithms, a comparative analysis employing the 10-fold cross-validation technique was conducted. The algorithms were analyzed using the following criteria, and the best model was selected based on their performance [21]:
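$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
$$F_1\ \text{Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$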
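The five criteria above can be computed directly from confusion-matrix counts. The sketch below uses the standard definitions, in which Recall in the F1 formula is the same quantity as Sensitivity. The function name `classification_metrics` and the example counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return the evaluation criteria computed from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.850
# precision: 0.800
# sensitivity: 0.889
# specificity: 0.818
# f1: 0.842
```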

4. Results

Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Decision Tree (DT), Naive Bayes (NB), and Support Vector Machine (SVM) algorithms were used on the data collected from the field, and results are presented in this section [22,23,24].

Comparative Analysis of Result

All the machine learning algorithms were trained and tested using the k-fold cross-validation method. From Table 4 we can see that the Support Vector Machine (SVM) has only a 66.67% accuracy, 80% precision, 66% sensitivity, 100% specificity, and a 62% f1-score. Next, the Linear Discriminant Analysis (LDA) algorithm has an 83.33% accuracy, 84% precision, 83% sensitivity, 80% specificity, and an 83% f1-score. The Naive Bayes (NB) algorithm has an 88.69% accuracy, 89% precision, 89% sensitivity, 86% specificity, and an 89% f1-score. Logistic Regression (LR) performed better, with a 91.27% accuracy, 92% precision, 92% sensitivity, 94% specificity, and a 92% f1-score. On the same dataset, the K-Nearest Neighbors (KNN) algorithm gave better results than the abovementioned algorithms, with a 95.23% accuracy, 95% precision, 95% sensitivity, 94% specificity, and a 95% f1-score. Lastly, the Decision Tree (DT) algorithm gave the best result, with a 98.80% accuracy, 99% precision, 99% sensitivity, 99% specificity, and a 99% f1-score. This comparative analysis leads to the conclusion that the Decision Tree algorithm performed best among all the algorithms.
Table 4. Performance results of machine learning algorithms.
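For convenience, the figures reported in the text for Table 4 can be collected in one structure and ranked. The sketch below is illustrative only: the values are copied from the paragraph above, while ranking by accuracy alone is an assumption, since the paper states only that the best model was selected based on performance.

```python
# Metrics (in percent) as reported in the text for Table 4.
reported = {
    "SVM": {"accuracy": 66.67, "precision": 80, "sensitivity": 66, "specificity": 100, "f1": 62},
    "LDA": {"accuracy": 83.33, "precision": 84, "sensitivity": 83, "specificity": 80, "f1": 83},
    "NB": {"accuracy": 88.69, "precision": 89, "sensitivity": 89, "specificity": 86, "f1": 89},
    "LR": {"accuracy": 91.27, "precision": 92, "sensitivity": 92, "specificity": 94, "f1": 92},
    "KNN": {"accuracy": 95.23, "precision": 95, "sensitivity": 95, "specificity": 94, "f1": 95},
    "DT": {"accuracy": 98.80, "precision": 99, "sensitivity": 99, "specificity": 99, "f1": 99},
}

# Rank the algorithms from highest to lowest reported accuracy.
ranking = sorted(reported, key=lambda name: reported[name]["accuracy"], reverse=True)
best = ranking[0]

print(ranking)  # ['DT', 'KNN', 'LR', 'NB', 'LDA', 'SVM']
print(best)     # DT
```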
Among all the Machine Learning algorithms used to predict High Risk during pregnancy, the modified Decision Tree algorithm produced the best results.
For the Decision Tree (DT) algorithm with the k-fold cross-validation technique, Table 5 indicates an accuracy of 98.82% on the 20% of the data held out for testing. The algorithm predicted 84 samples as true positive (TP), 1 sample as false positive (FP), 1 sample as false negative (FN), and 82 samples as true negative (TN). In each run, 20% of the data are selected at random as test data and, similarly, one fold among the K folds of the training dataset is selected for testing. The Decision Tree algorithm gave a 99% accuracy for both High Risk and No Risk.
Table 5. Confusion matrix performance result of the selected model.
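As a worked check, the confusion-matrix counts reported above (TP = 84, FP = 1, FN = 1, TN = 82) can be substituted into the definitions of Section 3.4. The calculation assumes that High Risk is the positive class, which the text does not state explicitly; values are rounded to four decimal places.

$$\text{Accuracy} = \frac{84 + 82}{84 + 82 + 1 + 1} = \frac{166}{168} \approx 0.9881$$
$$\text{Precision} = \frac{84}{84 + 1} = \frac{84}{85} \approx 0.9882$$
$$\text{Sensitivity} = \frac{84}{84 + 1} = \frac{84}{85} \approx 0.9882$$
$$\text{Specificity} = \frac{82}{82 + 1} = \frac{82}{83} \approx 0.9880$$
$$F_1 = \frac{2 \times 84}{2 \times 84 + 1 + 1} = \frac{168}{170} \approx 0.9882$$

These values agree, up to rounding, with the roughly 99% figures reported for the Decision Tree, and the 168 test samples correspond to about 20% of the 837 rows.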

5. Discussion

Numerous studies have been conducted on the use of machine learning algorithms for the prediction of risk during pregnancy. The objective of this study was to enhance the precision of the machine learning algorithm. However, the performance of the algorithms can be influenced by several limitations of the available dataset, including but not limited to its accuracy, bias, and other weaknesses [24]. The model was designed and developed to proactively detect pregnancies with a high risk of complications in order to reduce maternal mortality rates. Furthermore, this intervention is expected to contribute to the enhancement of maternal health outcomes in rural regions. Due to the limited resources available, numerous healthcare centers in rural areas often consider it a viable alternative. The methodology can be effectively utilized by front-line healthcare workers (FLHWs) operating in remote regions to identify pregnancies that are at a heightened risk.
The objective of this study was to create and implement a robust predictive model for identifying high-risk pregnancies in rural regions. This was achieved by employing a range of machine learning techniques, including Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Decision Tree (DT), Naive Bayes (NB), and Support Vector Machine (SVM). The dataset was partitioned in an 80:20 ratio for training and evaluating the models. For each method, a random selection of 20% of the data was held out for validation. All the aforementioned algorithms were evaluated using a 10-fold cross-validation technique, with one fold used for validation in each cycle. The analysis showed that the Decision Tree (DT) method yielded the most favorable outcome, with an accuracy of 98.80%, a precision of 99%, a sensitivity of 99%, a specificity of 99%, and an f1-score of 99%. K-Nearest Neighbors (KNN) and Logistic Regression (LR) were the second-best algorithms, achieving accuracy, precision, sensitivity, and f1-score values of approximately 95% and 92%, respectively. Naive Bayes (NB) and Linear Discriminant Analysis (LDA) had the third-highest levels of accuracy, achieving 89% and 83%, respectively. The Support Vector Machine (SVM) approach exhibited the lowest accuracy, at 67%. All of the aforementioned Machine Learning (ML) algorithms and the prediction model were implemented in Python.
In this study, various machine learning algorithms were used to predict the risk during pregnancy. One limitation of this study is its sample size of 837, which potentially affects the generalizability of the findings. The dataset's regional specificity may not fully represent broader demographic variations, possibly limiting the model's applicability across different populations. Additionally, the predictive accuracy of the algorithm hinges on the selected features of the dataset; some other relevant variables are absent due to data constraints, which could affect the outcome. Furthermore, the complexity of machine learning algorithms might obscure the interpretability of predictions, which poses challenges for clinical integration when clear, actionable insights cannot be derived from the model's decision-making process.

6. Conclusions

In this study, it was determined that the Decision Tree (DT) algorithm can serve as a foundational algorithm for the development of a high-risk pregnancy prediction (HRPP) system. The aforementioned approach can prove to be beneficial in geographically isolated regions with limited resources, where healthcare professionals are burdened with additional responsibilities. This system is anticipated to contribute to a reduction in the burden and an enhancement in the efficiency of healthcare professionals, including ASHA and ANMs, by enabling them to evaluate high-risk pregnant women inside their respective healthcare facilities. Pregnant individuals can also utilize this system to evaluate their condition with appropriate training. For the future, there are plans to create a mobile application that may be utilized by healthcare professionals and expectant mothers through their mobile devices. Additionally, the application has the capability to provide recommendations for potential solutions. The successful implementation of this intervention necessitates substantial advice and support from healthcare practitioners. In order to decrease maternal mortality rates and enhance the health of pregnant women, the implementation of information and communication technology (ICT) interventions within the medical field could be beneficial.

Author Contributions

Conceptualization, K.T. and C.M.S.; methodology, K.T.; software, K.T.; validation, K.T., V.M.C. and T.P.; formal analysis, K.T., C.M.S. and T.P.; investigation, K.T., V.M.C. and T.P.; resources, K.T., V.M.C. and T.P.; data curation, K.T., C.M.S., V.M.C. and T.P.; writing—original draft preparation, K.T., V.M.C. and T.P.; writing—review and editing, K.T., C.M.S., V.M.C. and T.P.; visualization, K.T., C.M.S., V.M.C. and T.P.; supervision, V.M.C.; project administration, K.T., V.M.C. and T.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of IIT Delhi (under the protocol number 2021/P020).

Data Availability Statement

The data presented in this study are available upon reasonable request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Tomar, K.; Sharma, C.M.; Sharma, P.; Gupta, D.; Chariar, V.M. Impacts of Environmental Factors on Maternal Health in Low Resource Settings. In Proceedings of the 6th International Conference on Resources and Environment Sciences (ICRES 2024), Bangkok, Thailand, 7–9 June 2024.
  2. Cleveland Clinic. High-Risk Pregnancy. 14 December 2021. Available online: https://my.clevelandclinic.org/health/diseases/22190-high-risk-pregnancy (accessed on 10 April 2023).
  3. US National Institutes of Health. What Is a High-Risk Pregnancy? 31 January 2017. Available online: https://www.nichd.nih.gov/health/topics/pregnancy/conditioninfo/high-risk (accessed on 10 April 2023).
  4. Ebrahimzadeh, F.; Hajizadeh, E.; Vahabi, N.; Almasian, M.; Bakhteyar, K. Prediction of unwanted pregnancies using logistic regression, probit regression and discriminant analysis. Med. J. Islam. Repub. Iran 2015, 29, 828–832.
  5. Montella, E.; Ferraro, A.; Sperlì, G.; Triassi, M.; Santini, S.; Improta, G. Predictive Analysis of Healthcare-Associated Blood Stream Infections in the Neonatal Intensive Care Unit Using Artificial Intelligence: A Single Center Study. Int. J. Environ. Res. Public Health 2022, 19, 2498.
  6. Yiu, T. Understanding Random Forest. 12 June 2019. Available online: https://towardsdatascience.com/understanding-random-forest-58381e0602d2 (accessed on 14 April 2023).
  7. Zhu, W.; Zeng, N.; Wang, N. Sensitivity, Specificity, Accuracy, Associated Confidence Interval and ROC Analysis with Practical SAS® Implementations; Northeast SAS Users Group 2010; Health Care and Life Sciences: Baltimore, MD, USA, 2010; pp. 1–9.
  8. Lakshmi, B.N.; Indumathi, T.S.; Ravi, N. A Study on C.5 Decision Tree Classification Algorithm for Risk Predictions During Pregnancy. Procedia Technol. 2016, 24, 1542–1549.
  9. Pereira, S.; Portela, F.; Santos, M.F.; Machado, J.; Abelha, A. Predicting Type of Delivery by Identification of Obstetric Risk Factors through Data Mining. Procedia Comput. Sci. 2015, 64, 601–609.
  10. Akbulut, A.; Ertugrul, E.; Topcu, V. Fetal health status prediction based on maternal clinical history using machine learning techniques. Comput. Methods Programs Biomed. 2018, 163, 87–100.
  11. Bautista, J.M.; Quiwa, Q.A.I.; Reyes, R.S.J. Machine learning analysis for remote prenatal care. In Proceedings of the IEEE Region 10 Annual International Conference, Proceedings/TENCON, Osaka, Japan, 16–19 November 2020; pp. 397–402.
  12. Raja, R.; Mukherjee, I.; Sarkar, B.K. A Machine Learning-Based Prediction Model for Preterm Birth in Rural India. J. Healthc. Eng. 2021, 2021, 6665573.
  13. Moreira, M.W.; Rodrigues, J.J.; Kumar, N.; Al-Muhtadi, J.; Korotaev, V. Nature-Inspired Algorithm for Training Multilayer Perceptron Networks in e-health Environments for High-Risk Pregnancy Care. J. Med. Syst. 2018, 42, 51.
  14. Paydar, K.; Niakan Kalhori, S.R.; Akbarian, M.; Sheikhtaheri, A. A clinical decision support system for prediction of pregnancy outcome in pregnant women with systemic lupus erythematosus. Int. J. Med. Inform. 2017, 97, 239–246.
  15. Macrohon, J.J.E.; Villavicencio, C.N.; Inbaraj, X.A.; Jeng, J.-H. A Semi-Supervised Machine Learning Approach in Predicting High-Risk Pregnancies in the Philippines. Diagnostics 2022, 12, 2782.
  16. Yadav, A. Support Vector Machines (SVM)—20 October 2018. Available online: https://towardsdatascience.com/support-vector-machines-svm-c9ef22815589 (accessed on 14 April 2023).
  17. Antonogeorgos, G.; Panagiotakos, D.B.; Priftis, K.N.; Tzonou, A. Logistic Regression and Linear Discriminant Analyses in Evaluating Factors Associated with Asthma Prevalence among 10- to 12-Years-Old Children: Divergence and Similarity of the Two Statistical Methods. Int. J. Pediatr. 2009, 2009, 1–6.
  18. Singh, N.; Nguyen, P.H.; Jangid, M.; Singh, S.K.; Sarwal, R.; Bhatia, N.; Johnston, R.; Joe, W.; Menon, P. District Nutrition Profile: Udham Singh Nagar, Uttarakhand; International Food Policy Research Institute: New Delhi, India, 2022.
  19. Hernandez, M.; Epelde, G.; Beristain, A.; Álvarez, R.; Molina, C.; Larrea, X.; Alberdi, A.; Timoleon, M.; Bamidis, P.; Konstantinidis, E. Incorporation of Synthetic Data Generation Techniques within a Controlled Data Processing Workflow in the Health and Wellbeing Domain. Electronics 2022, 11, 812.
  20. Baratloo, A.; Hosseini, M.; Negida, A.; El Ashal, G. Part 1: Simple Definition and Calculation of Accuracy, Sensitivity and Specificity. Emergency 2015, 3, 48–49.
  21. Villavicencio, C.N.; Macrohon, J.J.E.; Inbaraj, X.A.; Jeng, J.H.; Hsieh, J.G. COVID-19 prediction applying supervised machine learning algorithms with comparative analysis using weka. Algorithms 2021, 14, 201.
  22. Chauhan, N.S. Decision Tree Algorithm, Explained. 9 February 2022. Available online: https://www.kdnuggets.com/2020/01/decision-tree-algorithm-explained.html (accessed on 14 April 2023).
  23. IBM K-Nearest Neighbors Algorithm. Available online: https://www.ibm.com/topics/knn (accessed on 14 April 2023).
  24. Raschka, S. STAT 479: Machine Learning. 2018. Available online: https://sebastianraschka.com/pdf/lecture-notes/stat479fs18/02_knn_notes.pdf (accessed on 14 April 2023).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
