Article

MultiScaleFusion-Net and ResRNN-Net: Proposed Deep Learning Architectures for Accurate and Interpretable Pregnancy Risk Prediction

1 Department of Computer Science, Aberystwyth University, Penglais Campus, Aberystwyth SY23 3FL, UK
2 Department of Physics, Aberystwyth University, Penglais Campus, Aberystwyth SY23 3FL, UK
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2025, 15(11), 6152; https://doi.org/10.3390/app15116152
Submission received: 16 April 2025 / Revised: 14 May 2025 / Accepted: 16 May 2025 / Published: 30 May 2025
(This article belongs to the Special Issue Application of Artificial Intelligence in Biomedical Informatics)

Abstract

Women exhibit marked physiological transformations in pregnancy, mandating regular and holistic assessment. Maternal and fetal vitality is governed by a spectrum of clinical, demographic, and lifestyle factors throughout this critical period. Existing maternal health monitoring techniques lack precision in assessing pregnancy-related risks, often leading to late interventions and adverse outcomes. Accurate and timely risk prediction is crucial to avoid miscarriages. This research proposes a deep learning framework for personalized pregnancy risk prediction using the NFHS-5 dataset, with class imbalance addressed through a hybrid NearMiss-SMOTE approach. Fifty-one primary features are selected via the LASSO to refine the dataset and enhance model interpretability and efficiency. The framework integrates a multimodal model (NFHS-5, fetal plane images, and EHG time series) along with two core architectures. ResRNN-Net combines Bi-LSTM, CNNs, and attention mechanisms to capture sequential dependencies, while MultiScaleFusion-Net leverages GRU and multiscale convolutions for effective feature extraction. Additionally, TabNet and MLP models are explored to compare interpretability and computational efficiency. SHAP and Grad-CAM are used to ensure transparency and explainability, offering both feature importance and visual explanations of predictions. The proposed models are trained using 5-fold stratified cross-validation and evaluated with metrics including accuracy, precision, recall, F1-score, and ROC–AUC. The results demonstrate that MultiScaleFusion-Net balances accuracy and computational efficiency, making it suitable for real-time clinical deployment, while ResRNN-Net achieves higher precision at a slight computational cost. Performance comparisons with baseline machine learning models confirm the superiority of deep learning approaches, achieving over 80% accuracy in pregnancy complication prediction.

1. Introduction

Pregnancy alters the physiology and structure of a woman’s body in many ways and therefore requires close, thorough, and consistent supervision. During this period, maternal and fetal health is influenced by numerous clinical, demographic, and lifestyle factors, as depicted in Figure 1.
However, pregnancy monitoring today still relies on coarse, generalized methods that do not account for differences among patients. The absence of customized monitoring increases the chance that early indicators of conditions that could adversely affect both the mother and newborn will be missed. Precise, individualized, and preventive tracking technologies are therefore critical for modern medical systems to safeguard maternal and fetal health.
AI, and particularly ML, opens new possibilities for improving pregnancy surveillance systems. In healthcare, machine learning systems have proved effective at analyzing complex patterns in extensive datasets. In pregnancy monitoring, the data streams that can be analyzed with ML include clinical scan data, wearable-device readings, and demographic information. One of the major datasets available in this domain is the National Family Health Survey-5 (NFHS-5), containing complete data for 136,136 instances and 95 features. Models built on such an extensive dataset are predictive in the broadest sense of the term. Still, the dataset has issues, such as class imbalance and feature redundancy, which strongly affect model performance. To meet these challenges, data balancing has been conducted using a hybrid NearMiss-SMOTE approach, and feature selection has been conducted using the least absolute shrinkage and selection operator (LASSO) to obtain more reliable prediction models [1].
Although a large number of features in any dataset can improve a model’s predictive power, they also increase complexity and make it more prone to overfitting. Feature selection techniques such as the LASSO avoid these pitfalls by reducing model complexity and thereby improving generalization. This optimization is particularly important for constructing a maternal health surveillance system: the LASSO provides valid predictive modeling capabilities because its sparse selection of features simplifies the model [1].
Currently, routine pregnancy monitoring remains limited to a great extent by traditional check-up appointments, paper-based data registration, and conventional instruments. These methods do not provide real-time assessment of maternal and fetal condition, and the situation is worsened by limited resources and unequal access to healthcare. Applying artificial intelligence to maternal health therefore offers a promising response to these challenges: advanced computational mechanisms can analyze various pregnancy-related factors and alert the relevant medical practitioners in time [2].
Deep learning, as a branch of AI, has extended these capabilities considerably. Neural networks can handle huge amounts of data and analyze complex sources, including clinical records of mother and fetus and all kinds of imaging data, to provide a data-driven view of a mother’s health condition. Explainable AI (XAI) builds on these ideas and improves the utility of such systems by making model outputs interpretable for healthcare providers [3].
Modern monitoring methods involve scanning electronic health record (EHR) data for complications such as pre-eclampsia, using blood pressure readings, proteinuria, and patient histories to recommend interventions appropriately [4,5].
Worldwide, and despite modern developments in medical resources, maternal mortality remains rife, estimated at 287,000 cases in 2020. Maternal health outcomes are influenced not only by biomedical factors but also by social, environmental, and systemic determinants; therefore, holistic, multi-faceted, and advanced approaches are required that go beyond clinical medicine [6]. Globally, the community healthcare system considers maternal health one of the critical success factors for social and economic growth. However, complications during pregnancy remain a major cause of hospitalization and mortality among pregnant women, particularly in low- and middle-income countries (LMICs). According to a WHO report, approximately 295,000 women died following childbirth in 2017, and 94% of these deaths occurred in LMICs [7]. The adoption of machine learning and deep learning in maternal healthcare enables the early prediction of potentially high-risk complications, such as preterm labor. In [8], the ability of ML algorithms to analyze patient records and data from wearable sensors to predict early labor signs accurately has been demonstrated. However, current pregnancy monitoring systems require several devices connected to a base unit to track a number of parameters, including blood oxygenation, temperature, blood pressure, heart rate, fetal heart rate, and uterine contractions [9].
Another main application area of maternal care is the use of methods and tools to detect fetal and maternal diseases. Owing to the utility of deep learning, especially convolutional neural networks (CNNs), ultrasonic visualization and examination have improved to the point of precise diagnoses of fetal structural abnormalities and placental pathologies; CNNs have achieved accuracy rates of 92% in detecting defects in ultrasound images [10].
Rigorous research on curated image databases has produced procedures that can identify disorders such as spina bifida and cardiac defects. In the same manner, wearable sensor technologies that integrate deep learning extract the mother’s vital signs, such as blood pressure and heart rate, to monitor her clinical well-being [11].
Owing to technological improvements in fetal MRI in recent years, it has become possible to describe the microstructures and patterns of fetal brain development in utero with higher spatial resolution. However, manual segmentation of fetal brain structures is time-consuming and prone to variability. This has motivated deep learning techniques such as CNNs and the U-Net family for segmenting fetal brain images while dealing with challenges such as motion artifacts, low tissue contrast, and limited annotated samples [12].
A supplementary aspect of maternal care is the woman’s mental health during pregnancy. Accordingly, recent works have integrated machine learning approaches with prenatal mental health screening to support early intervention in clinical practice. For instance, Krishnamurti et al. investigated an interpretable model that uses first-trimester structured patient-reported data to identify early-onset risk of moderate to severe depression during pregnancy. By incorporating psychosocial factors such as food insecurity, the model performed well in terms of predictive accuracy and remained easy to implement, underlining the role of mobile-app-based tools in early mental health screening [13].
The study by Yang et al. [14] is one of the few to apply machine learning to early pregnancy loss (EPL) prediction in women with recurrent pregnancy loss (RPL) based on preconception data only. Hou et al. implemented a Gradient Boosting Machine (GBM) on the study’s dataset to develop a strong and efficient predictive model with an AUC of 0.805 based on nine features, including age, BMI, induced abortion, previous pregnancy losses, and homocysteine concentration. This demonstrates that AI-based models can support early risk assessment for high-risk pregnancies and clinical decision-making.
Emerging trends in deep learning indicate that multimodal approaches aid clinical prediction. In [15], Wang et al. presented MSTP-Net, a three-path framework that enhances medical image segmentation by uniting local and global features. Similarly, the authors in [16] pointed out that multimodal learning and attention mechanisms in deep learning frameworks can handle limitations such as scarce annotation and lesion variability in medical imaging. These advances further underline the importance of AI in improving maternal health monitoring methods and recognition accuracy.
Some recent developments show that AI has a significant role in obstetric imaging. Bonnard et al. [17] presented a Prior-Guided Attribution framework aimed at enhancing ultrasound diagnosis by using spatial priors learned during the dataset training phase. Csillag et al. [18] designed AmnioML, a deep learning architecture for segmenting amniotic fluid and estimating volume from fetal MRIs, in which the proposed network integrated uncertainty quantification. Wang et al. [19] presented MiTU-Net, a transformer-based U-Net model for the segmentation of the fetal head and pubic symphysis, which helped in the accurate measurement of the angle of progression. Together, these milestones indicate a surging reliance on AI in enhancing maternal care and reducing the time to diagnose complications.
Contemporary research also discusses open issues in predictive approaches to maternal care. Bai et al. [20] discussed sonographic thresholds to guide timely treatment for miscarriage. According to Lee et al. [21], sperm DNA damage, characterized by double-strand breaks, predicts miscarriage in poor-prognosis couples. Qi et al. [22] developed ML models of postpartum depression based on biopsychosocial information. Hunt et al. [23] noted that emergency cesarean prediction models still need to become clinically useful tools for delivery planning. This work supports the previous research findings on the importance of data-driven models for improving maternal health.

1.1. Contributions

The major contributions of this study are as follows:

1.1.1. Model Development

  • We propose and develop two deep learning architectures—ResRNN-Net (combining Bi-LSTM, CNN, and attention mechanisms) and MultiScaleFusion-Net (utilizing GRU and multiscale convolution)—designed initially for tabular clinical data.
  • Based on the knowledge from single-modality experiments, we introduce a multimodal deep learning model that integrates tabular features, fetal ultrasound images, and EHG time-series [24] signals into a unified prediction system.

1.1.2. Methodological Innovations

  • A robust data preprocessing pipeline is implemented, addressing class imbalance through a hybrid NearMiss-SMOTE strategy and reducing feature redundancy using LASSO feature selection.
  • Explainability is prioritized through the application of SHAP values (for tabular and time-series data) and Grad-CAM (for imaging data), offering model interpretability that is essential for clinical adoption.

1.1.3. Experimental Rigor

  • Comprehensive 5-fold stratified cross-validation is used for robust evaluation.
  • Performance is compared against strong baseline models such as XGBoost, voting ensemble, TabNet, and MLPs.
  • An in-depth analysis of the trade-offs between precision, recall, computational efficiency, and interpretability is provided.

1.1.4. Real-World Applicability

  • The proposed models demonstrate strong generalizability, scalability, and transparency, making them viable candidates for integration into clinical maternal health monitoring systems.

2. Related Work

The research by Javed et al. presents two novel ensemble models, Echo Dense Inception Blending (EDI-Blend) and Dense Reservoir Inception Modular Network (DRIM-Net), aimed at predicting miscarriage risks by applying advanced data analytics techniques, particularly deep learning (DL). The study utilizes a balanced dataset processed through the hybrid balancing technique NearSMOTE and features selected using the least absolute shrinkage and selection operator (LASSO). The models are validated using 10-fold cross-validation, with EDI-Blend achieving notable performance metrics such as 0.732 accuracy and 0.721 ROC–AUC, while DRIM-Net outperformed it with 0.769 accuracy and 0.837 ROC–AUC. The results highlight the potential for these models to aid healthcare professionals in early detection and intervention, ultimately improving patient outcomes [1].
Through real-time data collection, healthcare providers obtain pregnancy-related health information about their patients that helps them to spot potential medical complications early. Wearable sensors possess the capability to send notifications about vital sign changes that indicate pre-eclampsia or gestational diabetes to both patients and healthcare staff [11].
Bada et al. propose a hypertension prediction model that merges deep learning (DL) with transfer learning (TL) techniques to improve early detection and lessen the risk of cardiovascular diseases. Their approach entails fine-tuning a pre-trained Feed-Forward Deep Neural Network (FFDNN), originally developed for diabetes prediction using the PIMA dataset, and adapting it for hypertension prediction using the PPG-BP dataset. They apply a hybrid feature selection strategy integrating statistical methods and domain expertise, choosing four major features—systolic and diastolic blood pressure, heart rate, and age—to improve model performance. The TL strategy involves freezing pre-trained layers and re-training newly added layers to ensure efficient knowledge transfer while decreasing training time. The final model achieved an accuracy of 81.34%, with 88% precision and 80% recall, evidencing a strong balance between identifying hypertensive patients and reducing false positives. This work highlights the potential of transfer learning to reshape existing healthcare models for related diagnostic tasks, especially in data-scarce environments [25].
Barredo Arrieta et al. address the requirement to provide understandable explanations, which faces additional challenges because various stakeholders, including patients, clinicians, and developers, possess different needs. Implementing XAI into clinical operational systems becomes difficult due to the requirement of avoiding workflow disruptions [26].
Gao, Osmundson et al. discuss various deep learning approaches with multiple applications in pregnancy monitoring that address crucial maternal and fetal health problems. The prediction of preterm birth stands out as the most essential deep learning application because it helps prevent neonatal mortality and morbidity. Studies conducted by [27] confirm the ability of DL models to accurately predict the risk of extreme preterm birth.
The research by Fiorentino, Villani et al. extensively assesses deep learning applications for fetal anomaly detection, where CNNs demonstrate high precision in anomaly recognition. The detection models achieve outstanding results in identifying minor structural anomalies, which leads to early identification of diseases and prompt medical intervention [28].
The research by Islam, Mustafina et al. presents an extensive review of machine learning systems for pregnancy outcome prediction, demonstrating their ability to improve maternal healthcare with data-based knowledge. The work organizes previous models into hierarchies grouped by clinical parameters, demographic factors, fetal health indicators, and other features. The authors note that machine learning has improved early risk assessment for gestational diabetes and preterm birth, yet data imbalance, interpretation difficulties, and generalization issues block its widespread adoption [29].
Machine learning explainability techniques deliver information about the prediction processes used by models. In their research, Tempel, Groos et al. compare SHAP and Grad-CAM for human activity recognition through a comparative analysis. They demonstrate that SHAP provides strong feature-level explanations through its quantitative analysis, which lets users track individual input feature contributions; thus, it excels at structured, tabular datasets. On the other hand, the visual nature of Grad-CAM produces clear heatmaps showing which image regions drive the model toward its decision [30].
Margret et al. introduce an extensive survey on the application of machine learning (ML) and deep learning (DL) models in maternal healthcare, with an emphasis on reducing maternal mortality. The survey classifies models into white, gray, and black boxes based on interpretability and assesses their efficacy across numerous predictive tasks, such as fetal monitoring, pre-eclampsia detection, and childbirth outcome prediction. Covering 71 studies, the review illuminates the growing role of AI-driven clinical decision support systems (CDSSs) while also observing their underutilization in the existing research. The researchers underscore the importance of leveraging diverse data sources, including electronic health records, imaging, and behavioral data, and advise improving model transparency and ethical deployment, especially in resource-constrained settings. Medical imaging benefits in particular, as such systems can identify differences in images beyond what traditional assessment tools detect [31,32].
Sun, Zou et al. present a deep learning model that uses EHR data to forecast preterm birth with excellent accuracy, creating time for necessary preventive healthcare actions for at-risk pregnancies [33]. The research by Du, McNestry et al. discusses how the implementation of CDSSs adds capabilities at multiple levels of health delivery: minimizing medication prescription errors and adverse incidents, enhancing clinical oversight, lowering costs, streamlining administrative tasks, supporting diagnosis and decision-making, and simplifying medical operations [34].
The research by Shaheen, Javed et al. examines a DL-based CDSS for gestational diabetes management that resulted in better patient outcomes and decreased healthcare expenses. The research demonstrates that DL has the potential to advance pregnancy medical care through its ability to deliver prompt data-based insights. The studies showcase obstacles as well as benefits, since healthcare systems require better interface design and adaptation of DL models into the current healthcare infrastructure [35]. These comparative approaches are summarized in Table 1, highlighting their respective outcomes, benefits, and limitations in prenatal care.
Ukrit et al. develop a web-based maternal health risk forecasting model using machine learning classifiers and IoT-collected vital signs. Among the models tested, an ensemble of random forest, AdaBoost, and Gradient Boosting with soft voting secures the highest accuracy of 81.5%. The research highlights the usefulness of ensemble learning in improving pregnancy risk classification for clinical decision support [36].
Li et al. propose PregnancyLine, a visual analysis system that improves pregnancy care and risk communication by revealing temporal medical data through intuitive visual metaphors. The system assists doctor–patient interaction, promotes early detection of abnormalities, and aids pregnant women in understanding their examination results and associated risks. The evaluation results display improved comprehension and reduced anxiety among users [37].
Continuous accumulation of data provides a broad array of information on maternal health and makes it easier for healthcare providers to track changes in pregnancy health over time. The integration of DL and IoT systems improves predictive analysis at the population level while making early prenatal services available, helping organizations to achieve better health for new mothers and their babies [38]. XAI develops methods that make artificial intelligence frameworks explain their choices in critical environments, including healthcare. However, various limitations block the widespread acceptance of XAI systems. The primary difficulty is the trade-off between understandable prediction systems with limited precision and advanced prediction methods with enhanced precision [39].

3. Architectural System Design

This section describes the overall approach used to create the deep learning framework for personalized pregnancy tracking. This includes dataset acquisition, preprocessing, feature selection, and model building.

3.1. Dataset Acquisition

The research utilizes the National Family Health Survey-5 (NFHS-5) dataset, which is publicly available and can be accessed via platforms such as Kaggle (https://www.kaggle.com/datasets/ravisinghiitbhu/nfhs5, accessed on 1 February 2025). The dataset contains 136,136 samples and 95 variables. The variables include essential clinical, demographic, and lifestyle factors related to maternal health, such as BMI, hemoglobin concentration, blood pressure readings, age, education, previous pregnancies, and breastfeeding history. The target variable indicates the presence or absence of pregnancy complications, modeled as a binary classification problem [40].
Figure 2 presents a snapshot of the NFHS-5 dataset, illustrating features such as respondent age, education level, household assets, and healthcare access. This tabular view highlights the diversity and richness of the collected data, forming the foundation for pregnancy complication prediction.
Figure 3 depicts the class imbalance in the target variable, with approximately 75.7% of instances indicating pregnancy complications and 24.3% representing non-complications. This imbalance was addressed using the NearMiss-SMOTE technique during preprocessing.

3.2. Data Preprocessing

To prepare the dataset, preprocessing steps were applied:
  • Missing values were imputed using mean imputation for numerical features and mode imputation for categorical variables.
  • Categorical features were one-hot-encoded for machine learning compatibility.
  • Numerical features were normalized using StandardScaler to ensure zero mean and unit variance.
A correlation heatmap (Figure 4) was generated to identify redundant features.
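For concreteness, a minimal sketch of this preprocessing pipeline is given below, assuming the NFHS-5 data are loaded into a pandas DataFrame df with a placeholder target column Complication (the column names are illustrative, not the dataset’s actual identifiers):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Split columns by type; "Complication" is a hypothetical target name
num_cols = df.select_dtypes(include="number").columns.drop("Complication")
cat_cols = df.select_dtypes(exclude="number").columns

# Mean imputation for numerical features, mode imputation for categorical ones
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())
df[cat_cols] = df[cat_cols].fillna(df[cat_cols].mode().iloc[0])

# One-hot-encode categoricals; standardize numerics to zero mean and unit variance
df = pd.get_dummies(df, columns=list(cat_cols))
df[num_cols] = StandardScaler().fit_transform(df[num_cols])
```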

3.3. Addressing Class Imbalance

A hybrid approach combining NearMiss under-sampling and SMOTE over-sampling was applied to address class imbalance. The final class distribution is illustrated in Figure 5 and Figure 6.
Figure 7 and Figure 8 show visually similar distributions before and after NearMiss-SMOTE, indicating that the synthetic samples generated by NearMiss-SMOTE follow the original data’s structure without introducing significant outliers.
Table 2 presents a comparative evaluation of the model’s performance with and without the NearMiss-SMOTE technique, illustrating its impact on recall and F1-score.
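A hedged sketch of this hybrid balancing step, using the imbalanced-learn library, is shown below; the sampling ratios are illustrative assumptions, as the exact values are not reported here:

```python
from imblearn.under_sampling import NearMiss
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

# Under-sample the majority class first, then synthesize minority samples
balancer = Pipeline(steps=[
    ("nearmiss", NearMiss(sampling_strategy=0.5)),            # majority reduced to a 2:1 ratio
    ("smote", SMOTE(sampling_strategy=1.0, random_state=42)),  # classes then equalized
])
# Applied to the training folds only, so validation data stay untouched
X_res, y_res = balancer.fit_resample(X_train, y_train)
```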

3.4. Feature Selection Using LASSO

The LASSO (least absolute shrinkage and selection operator) was applied to reduce high-dimensional features to the most relevant predictors. The regularization parameter (α) was selected using 5-fold cross-validation, automatically optimized via LassoCV from scikit-learn. The cross-validation procedure ensures a balance between underfitting and overfitting, enabling selection of the most predictive yet sparse feature subset.
After extensive tuning, a value of α = 0.01 was found to produce the best trade-off between model sparsity and performance. This value effectively shrank irrelevant coefficients to zero while preserving the most informative features for predicting pregnancy complications.
The objective function minimized by the LASSO is
$$\min_{\beta}\ \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - X_i\beta\right)^2 + \alpha\sum_{j=1}^{p}\lvert\beta_j\rvert$$
Here, $\alpha$ controls the regularization strength, and $\beta_j$ denotes the coefficients. Features with zero coefficients are excluded.
Figure 9 visualizes the non-zero (most impactful) features identified after regularization, reducing the dataset to 51 key features, some of which include
Sons_died, Daughters_died, Curr_Preg, LastChild_Want, ChildFood_bottle, Child_putToBrst, Hg_levelAdjusted, B_ChildTwin, Delivery_CSection, Child_Brst, Preg_iron, ChildGiven_otherBMilk, Preg_intParaDrug, ShortBreaths, ChildAge_mnths, Hg_child_adjust, Related, Hypertension, RespDisease, Thyroid, ultrasound, PostnatalChk, Resp_healthChk, DPTB, MMR, Benefit_HCare, Smoke, Prenatal_care, DeliveryPlace_Private, Water_Source_Natural_Source, Water_Source_Other, DPT_full, MEASLES_full, JE_full.
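The selection step can be sketched with scikit-learn’s LassoCV, assuming preprocessed arrays X and y; the alpha grid is an illustrative assumption:

```python
import numpy as np
from sklearn.linear_model import LassoCV

# 5-fold cross-validated search over candidate regularization strengths
lasso = LassoCV(alphas=np.logspace(-4, 0, 50), cv=5, random_state=42).fit(X, y)

selected = np.flatnonzero(lasso.coef_ != 0)  # indices of non-zero coefficients
X_selected = X[:, selected]                  # reduced matrix (51 features in this study)
print(f"alpha = {lasso.alpha_:.4f}, kept {selected.size} features")
```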

3.5. Model Development

Three deep learning architectures were implemented:
  • ResRNN-Net: Combines Bi-LSTM, CNN, and an attention mechanism to capture sequential dependencies.
  • MultiScaleFusion-Net: Utilizes GRU and multiscale convolutions for robust feature extraction.
  • Multimodal Model: Integrates tabular, imaging, and time-series data for comprehensive pregnancy monitoring.

3.6. Baseline Machine Learning Models

To set a performance baseline for the deep learning models, several conventional machine learning models were deployed and tested. These models indicated how well classical methods could forecast pregnancy complications from the NFHS-5 dataset.
  • XGBoost (Extreme Gradient Boosting): XGBoost is a powerful ensemble-based boosting algorithm known for its efficiency and accuracy. It builds decision trees sequentially, with each tree correcting errors from the previous one. The model was optimized using hyperparameter tuning to balance bias and variance effectively.
  • Random Forest: This ensemble method builds several decision trees at training time and returns the mode of the classes (classification) or the average prediction (regression). Its power is in preventing overfitting while having high predictive accuracy by randomized feature selection and bootstrapped sets.
  • Logistic Regression: As a baseline model, logistic regression offered a simple method for binary classification. It predicts the probability of a class from a linear combination of input features and played a critical role in establishing the minimum attainable performance.
Each model was trained on the preprocessed dataset, with the hyperparameters tuned via randomized search, as sketched below.
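A minimal sketch of this baseline setup follows; the hyperparameter grids and trial counts are illustrative assumptions rather than the exact search spaces used:

```python
from sklearn.model_selection import RandomizedSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# Candidate models with example hyperparameter grids
baselines = {
    "xgboost": (XGBClassifier(eval_metric="logloss"),
                {"n_estimators": [200, 400], "max_depth": [4, 6, 8]}),
    "random_forest": (RandomForestClassifier(),
                      {"n_estimators": [200, 400], "max_features": ["sqrt", "log2"]}),
    "logistic": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
}

# Randomized search with 5-fold CV, scored on F1 to respect class imbalance
fitted = {
    name: RandomizedSearchCV(model, grid, n_iter=3, cv=5, scoring="f1")
              .fit(X_train, y_train)
    for name, (model, grid) in baselines.items()
}
```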

3.7. System Overview: High-Level Architecture of the Pregnancy Monitoring Framework

The pregnancy monitoring framework was designed as an end-to-end system integrating state-of-the-art deep learning techniques with advanced data processing and explainability tools. The system follows a sequential modular pipeline, as illustrated in Figure 10.
The system starts with the collection of the National Family Health Survey-5 (NFHS-5) dataset. This dataset contains 136,136 records and 95 features spanning clinical, demographic, and lifestyle aspects of maternal health. In stage two, missing values are addressed through mean imputation for numerical features and mode imputation for categorical features. Numerical features are standardized to provide even scaling, whereas the sole categorical feature, ‘State’, receives one-hot encoding. To improve model efficacy and readability, the LASSO (least absolute shrinkage and selection operator) was used to reduce the dataset to 51 vital features responsible for making accurate predictions of pregnancy complications. Next, three deep neural network models were constructed: ResRNN-Net, based on Bi-LSTM, CNN, and attention techniques; MultiScaleFusion-Net, using GRU and multiscale convolution blocks for stronger feature fusion; and the multimodal model, which integrates images through a CNN, time series through an LSTM, and tabular data through an MLP.
Models were trained with 5-fold stratified cross-validation so that data splits were balanced across all folds. Accuracy, precision, recall, F1-score, and ROC–AUC were used as the evaluation metrics. A hyperparameter search was then used to optimize the learning rate, batch size, and dropout rate to achieve optimal model performance. For model explanation, SHAP (SHapley Additive exPlanation) and Grad-CAM (gradient-weighted class activation mapping) were used, providing global and local understanding of feature importance. The final outcomes were assessed according to model performance criteria, providing robust predictions and explainable insights to inform healthcare decisions. This modular structure guarantees the system’s scalability, stability, and clinical validity, making it flexible for operational deployment in real-world healthcare environments.

3.8. System Architecture

The system architecture comprises five main elements: data preprocessing, feature selection, model development, training and evaluation, and performance metrics with explainable AI.

3.8.1. ResRNN-Net and MultiScaleFusion-Net

Data Preprocessing and Balancing: The NFHS-5 dataset was processed to handle inconsistencies. Missing values were imputed using mean (numerical) and mode (categorical). StandardScaler was applied for feature normalization. Class imbalance was addressed using a hybrid NearMiss under-sampling and SMOTE over-sampling technique.
Feature Selection: The LASSO was applied to select 51 highly predictive features, reducing dimensionality and improving model efficiency.
Model Development: Two deep learning architectures were constructed:
  • ResRNN-Net: A hybrid architecture combining Bi-LSTM (64 units) for sequential learning, CNN (64 filters, kernel size = 3) for spatial feature extraction, an attention mechanism for feature prioritization, and a dense layer (128 units) for final predictions.
  • MultiScaleFusion-Net: Built around a bidirectional GRU (64 units), this model captures multiscale dependencies using Conv1D layers with kernel sizes of 1, 3, and 5. Outputs are concatenated and processed through a dense layer (128 units) followed by a sigmoid classifier.
The architectural choices of ResRNN-Net and MultiScaleFusion-Net are based on the nature and complexity of the NFHS-5 dataset. While the data are tabular, several variables exhibit logical progressions or temporal dependencies—such as antenatal visit sequences, immunization coverage, and time-related lifestyle changes. To exploit these implicit patterns, ResRNN-Net integrates Bi-LSTM for capturing sequential patterns, CNN for extracting local feature interactions, and attention mechanisms for prioritizing influential signals. On the other hand, MultiScaleFusion-Net employs GRU units for lightweight sequence modeling and leverages multiscale Conv1D filters to capture patterns at varying granularities across feature subsets. These architectures were selected to balance performance, interpretability, and computational efficiency for personalized maternal health prediction tasks.
Training and Evaluation: The data were divided into training (80%) and validation (20%) subsets. The deep learning models were trained using the Adam optimizer with a learning rate of 0.0005 and a batch size of 64. Overfitting was mitigated with early stopping (patience = 3 epochs), and the learning rate was reduced dynamically on performance plateaus using the ReduceLROnPlateau callback. For rigorous evaluation, 5-fold stratified cross-validation was utilized. Metrics used to evaluate the models included accuracy, precision, recall, F1-score, and ROC–AUC, ensuring balanced analysis across all aspects of performance.
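A compact Keras sketch of this setup, using MultiScaleFusion-Net as the example, is shown below. The layer sizes follow the description above; the remaining details (padding, dropout placement, callback settings) are assumptions, not the authors’ exact code:

```python
import tensorflow as tf
from tensorflow.keras import layers, callbacks

def build_multiscale_fusion_net(n_features: int) -> tf.keras.Model:
    inp = layers.Input(shape=(n_features, 1))
    # Parallel Conv1D branches with kernel sizes 1, 3, and 5 (multiscale extraction)
    branches = [layers.Conv1D(64, k, padding="same", activation="relu")(inp)
                for k in (1, 3, 5)]
    x = layers.Concatenate()(branches)
    x = layers.Bidirectional(layers.GRU(64))(x)      # lightweight sequence modeling
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    out = layers.Dense(1, activation="sigmoid")(x)   # binary complication risk
    return tf.keras.Model(inp, out)

model = build_multiscale_fusion_net(51)              # 51 LASSO-selected features
model.compile(optimizer=tf.keras.optimizers.Adam(5e-4), loss="binary_crossentropy")
model.fit(X_train[..., None], y_train, validation_split=0.2,
          batch_size=64, epochs=10,
          callbacks=[callbacks.EarlyStopping(patience=3, restore_best_weights=True),
                     callbacks.ReduceLROnPlateau(factor=0.5, patience=1)])
```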
Algorithm 1 illustrates the steps of the ResRNN-Net and MultiScaleFusion-Net framework.
Algorithm 1: ResRNN-Net and MultiScaleFusion-Net with Ensemble Learning
  • Input: Scaled input features $X_\text{scaled}$, binary labels $y$, input shape $s$, number of dense blocks $L$, growth rate $k$, selected block indices $\{l_1, l_2, \ldots, l_k\}$
  • Output: Binary prediction $\hat{y}$
1: Perform stratified train/validation split:
       $(X_\text{train}, y_\text{train}), (X_\text{val}, y_\text{val}) \leftarrow \text{StratifiedShuffleSplit}(X_\text{scaled}, y)$
2: Apply SMOTE on training data:
       $(X_\text{res}, y_\text{res}) \leftarrow \text{SMOTE}(X_\text{train}, y_\text{train})$
3: Compute class weights:
       $w_c = \frac{n}{|C_c|}, \quad c \in \{0, 1\}$
4: Define ResRNN-Net:
       Reshape: $X \in \mathbb{R}^{n \times s \times 1}$
       Apply Bi-LSTM: $H_1 \leftarrow \text{BiLSTM}(X) \rightarrow \text{Dropout}(H_1)$
       $H_2 \leftarrow \text{LSTM}(H_1) \rightarrow \text{Dropout}(H_2) \rightarrow \text{BatchNorm}(H_2)$
       CNN path: $C \leftarrow \text{Conv1D}(X) \rightarrow \text{Conv1D}(C) \rightarrow \text{BatchNorm}(C)$
       Attention vector: $A \leftarrow \text{Attention}(\text{Expand}(H_2), \text{Expand}(C))$
       Final representation: $Z \leftarrow [H_2, \text{Flatten}(C), \text{Flatten}(A)]$
       Output: $\hat{y}_\text{res} \leftarrow \sigma(WZ + b)$
5: Define MultiScaleFusion-Net:
       Reshape: $X \in \mathbb{R}^{n \times s \times 1}$
       for each $i = 1$ to $L$ do
           if $i \in \{l_1, l_2, \ldots, l_k\}$ then
               Add dense block with growth rate $k$
               Add transition layer
           end if
       end for
       Apply multiscale convolutions:
           $Y_{1\times1} \leftarrow \text{ReLU}(W_{1\times1} X + b_{1\times1})$
           $Y_{3\times3} \leftarrow \text{ReLU}(W_{3\times3} X + b_{3\times3})$
           $Y_{5\times5} \leftarrow \text{ReLU}(W_{5\times5} X + b_{5\times5})$
           $Y_\text{pool} \leftarrow \text{MaxPool}(X)$
           $Y \leftarrow \text{Concat}(Y_{1\times1}, Y_{3\times3}, Y_{5\times5}, Y_\text{pool}) \rightarrow \text{Flatten}$
       Apply dense block: $D_1 \leftarrow \text{Dense}_{128}(Y)$, $D_2 \leftarrow \text{Dense}_{128}(D_1)$
       $D \leftarrow [D_1, D_2]$
       GRU block: $G \leftarrow \text{BiGRU}(\text{Reshape}(D)) \rightarrow \text{Dropout}(G)$
       Final output: $Z \leftarrow [D, G] \rightarrow \text{Dense} \rightarrow \hat{y}_\text{fusion}$
6: Train XGBoost:
       Use early stopping with the validation set to obtain $\hat{y}_\text{xgb}$
7: Train voting classifier:
       Combine random forest and logistic regression (soft voting):
       $\hat{y}_\text{vote} = \alpha\, \hat{y}_\text{RF} + (1 - \alpha)\, \hat{y}_\text{LR}, \quad \alpha = 0.5$
8: Return predictions from all models: $\hat{y}_\text{res}$, $\hat{y}_\text{fusion}$, $\hat{y}_\text{xgb}$, $\hat{y}_\text{vote}$
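Step 7 of Algorithm 1 corresponds directly to scikit-learn’s soft-voting ensemble, sketched below with equal weights (α = 0.5); the tree count is an illustrative assumption:

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

# Soft voting averages the predicted probabilities of both base models
vote = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=300)),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="soft", weights=[0.5, 0.5],
).fit(X_res, y_res)

y_hat_vote = vote.predict_proba(X_val)[:, 1]  # the voted prediction in Algorithm 1
```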

3.8.2. Multimodal Model

To predict pregnancy complications, the multimodal model uses convolutional neural networks (CNNs) for ultrasound images, long short-term memory (LSTM) networks for electrohysterogram (EHG) time series, and multi-layer perceptron (MLP) networks for structured patient health data. The overall architecture is shown in Figure 11.
Data Preprocessing and Balancing: Fetal ultrasound images were resized to 224 × 224 pixels, normalized, and augmented for CNN training. Electrohysterogram (EHG) signals were resampled, min–max-scaled, and class-balanced using SMOTE. Tabular data underwent imputation, feature normalization, and LASSO-based selection to ensure seamless integration across modalities.
Model Development: The multimodal model consists of three key components:
  • Convolutional Neural Networks (CNNs): Used for analyzing ultrasound images and detecting fetal anomalies.
    $Z^{(l)} = \text{ReLU}\left(\sum_{c=1}^{C} W_c^{(l)} * X_c + b^{(l)}\right)$
    (Convolution + ReLU activation: implemented as nn.Conv2d followed by nn.ReLU())
    $P^{(l)} = \text{MaxPool}(Z^{(l)})$
    (Pooling operation: nn.MaxPool2d)
    $f_\text{CNN} = \text{Flatten}(\text{AdaptiveAvgPool}(P^{(L)}))$
    (Final feature vector via global pooling and flattening: nn.AdaptiveAvgPool2d + nn.Flatten)
  • Long Short-Term Memory (LSTM) Networks: Used for processing EHG time-series data and detecting uterine contraction patterns.
    $i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$
    (Input gate computation)
    $f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$
    (Forget gate computation)
    $o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$
    (Output gate computation)
    $g_t = \tanh(W_g x_t + U_g h_{t-1} + b_g)$
    (Candidate cell state)
    $c_t = f_t \odot c_{t-1} + i_t \odot g_t$
    (Updated cell state)
    $h_t = o_t \odot \tanh(c_t)$
    (Hidden state)
    $f_\text{LSTM} = h_T$
    (Final LSTM output: implemented using nn.LSTM, taking the last hidden state)
  • Multi-Layer Perceptron (MLP): Used for structured tabular data, capturing demographic and clinical factors.
    $h_1 = \text{ReLU}(W_1 x_\text{tab} + b_1)$
    (First dense layer with ReLU activation: nn.Linear → ReLU)
    $f_\text{MLP} = \text{ReLU}(W_2 h_1 + b_2)$
    (Second dense layer, producing the final tabular embedding)
    Outputs from MLP (Figure 12), CNN (Figure 13), and LSTM (Figure 14) models are fused in a shared integration layer before making final predictions. This fusion mechanism ensures that tabular, spatial, and temporal information contributes to more accurate predictions.
$f_\text{combined} = [\,f_\text{CNN};\ f_\text{LSTM};\ f_\text{MLP}\,]$
$\hat{y} = \sigma(w^{\top} f_\text{combined} + b)$
Training and Evaluation: The model was trained using the Adam optimizer (learning rate = 0.0005) and a batch size of 16/32. Early stopping (patience = 3 epochs) prevented overfitting, while the ReduceLROnPlateau scheduler dynamically adjusted the learning rate upon performance plateau. A 5-fold stratified cross-validation approach was applied to ensure model robustness across different data distributions.
Performance Metrics and Explainable AI: SHAP was used for interpreting feature contributions in tabular and time-series inputs, while Grad-CAM visualizations provided heatmaps highlighting critical regions in ultrasound images, ensuring model transparency.
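A hedged PyTorch sketch of this early-fusion design, matching the equations above, is given below; the layer sizes and channel counts are illustrative, not the authors’ exact configuration:

```python
import torch
import torch.nn as nn

class MultimodalNet(nn.Module):
    def __init__(self, n_tab: int, ehg_dim: int = 1, hidden: int = 64):
        super().__init__()
        self.cnn = nn.Sequential(                        # f_CNN: ultrasound branch
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm = nn.LSTM(ehg_dim, hidden, batch_first=True)  # f_LSTM: EHG branch
        self.mlp = nn.Sequential(nn.Linear(n_tab, 64), nn.ReLU(),
                                 nn.Linear(64, 32), nn.ReLU())  # f_MLP: tabular branch
        self.head = nn.Linear(16 + hidden + 32, 1)       # fusion layer + sigmoid output

    def forward(self, img, ehg, tab):
        f_cnn = self.cnn(img)                  # (B, 16) image embedding
        _, (h_T, _) = self.lstm(ehg)           # last hidden state of the LSTM
        f_lstm = h_T[-1]                       # (B, hidden) EHG embedding
        f_mlp = self.mlp(tab)                  # (B, 32) tabular embedding
        fused = torch.cat([f_cnn, f_lstm, f_mlp], dim=1)  # early fusion by concatenation
        return torch.sigmoid(self.head(fused))
```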

3.9. Ethical Considerations

This study utilized the publicly accessible and fully anonymized NFHS-5 dataset, provided by the Ministry of Health and Family Welfare, Government of India. As a secondary dataset, it contains no personally identifiable information, and no direct interaction with human subjects occurred. All preprocessing and analysis steps were conducted in accordance with data privacy regulations (Figure 15).

4. Results

To guarantee the generalization and robustness of the constructed deep learning models, ResRNN-Net, MultiScaleFusion-Net, and the multimodal model underwent a stringent cross-validation scheme. The multimodal model was evaluated separately in two experimental settings.
  • Training solely on tabular data (NFHS-5 dataset);
  • Training on a fusion of tabular, image, and time-series data.

4.1. Cross-Validation Strategy

The dataset was divided employing 5-fold stratified cross-validation such that every fold had an equal ratio of positive and negative pregnancy complications. This approach enabled the deep learning models to be trained and tested on several data splits, minimizing variance and providing a better estimate of model performance. For ResRNN-Net and MultiScaleFusion-Net, in every fold, the data were split into 80% training and 20% validation subsets. Due to the heterogeneous nature of the datasets, special measures were taken: images were augmented to prevent overfitting; time-series data were segmented and padded to maintain sequence-length consistency; and a combination strategy ensured that each fold contained a balanced subset of all three data types.
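The fold construction can be sketched as follows, where build_model and evaluate are hypothetical helpers standing in for the model constructors and metric computations described below:

```python
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_scores = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    X_tr, X_val = X[train_idx], X[val_idx]   # class ratios preserved in every fold
    y_tr, y_val = y[train_idx], y[val_idx]
    model = build_model()                    # e.g., ResRNN-Net or MultiScaleFusion-Net
    model.fit(X_tr, y_tr)
    fold_scores.append(evaluate(model, X_val, y_val))  # accuracy, F1, ROC–AUC, ...
# Final metrics are averaged over the five folds
```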

4.1.1. Model Training and Evaluation Using Cross-Validation

In every fold, ResRNN-Net and MultiScaleFusion-Net were trained for 10 epochs with a batch size of 32. The models were validated on the validation set during training using the following key performance metrics: accuracy (overall accuracy of predictions), precision (proportion of true positive predictions out of all positive predictions), recall (capacity to spot true positive cases), F1-score (harmonic mean of precision and recall), and ROC–AUC (robustness over different thresholds). The performance metrics were estimated from the validation set using the evaluation function and contrasted against true labels, with each performance measure being computed. The performance metrics for the final model were obtained by averaging the results from all five folds. ResRNN-Net had an F1-score of 0.82, an ROC–AUC of 0.79, and an accuracy of 79%, indicative of strong predictive power. Likewise, MultiScaleFusion-Net achieved an F1-score of 0.82, an ROC–AUC of 0.80, and an accuracy of 80%, very close to the performance of ResRNN-Net. A detailed comparison of the cross-validation results for both models is presented in Table 3.
For the multimodal model, two experiments, summarized in Table 4, were conducted; both were trained for 10 epochs with a batch size of 32, BCE loss, and an Adam optimizer with a learning rate of 0.001. The tabular-only model achieved 80% accuracy with balanced performance across all the metrics, while the integrated multimodal model had higher recall (100%), indicating that it detected all the positive cases but at the cost of lower precision (55%).

4.1.2. Advanced Model Training and Evaluation

For increased model refinement, an advanced evaluation and training scheme was adopted, involving class imbalance handling, efficient pipelines, early stopping, and threshold optimization. These measures ensured stable performance while avoiding significant overfitting. The data exhibited extreme class imbalance, with complications more common than non-complications. To counter this, SMOTE was used only on the training set, creating synthetic samples for the minority class. Class weights were also calculated to give greater significance to under-represented samples, encouraging balanced learning. An efficient data pipeline was produced based on TensorFlow’s tf.data API. This pipeline enabled effective batch processing, shuffling, and prefetching, minimizing training bottlenecks. A batch size of 128 was used to trade off computational efficiency against memory usage. Both ResRNN-Net and MultiScaleFusion-Net were trained with the Adam optimizer (learning rate 0.0005) and binary cross-entropy loss. Early stopping (patience of two epochs) avoided overfitting, and ReduceLROnPlateau halved the learning rate upon performance plateau. To optimize the F1-score, predictions were tested over thresholds from 0.3 to 0.7, and the threshold yielding the highest F1-score was chosen for final testing. To compare the deep learning models, XGBoost and a voting ensemble (random forest + logistic regression) were trained under the same conditions. XGBoost utilized early stopping, whereas the ensemble utilized soft voting for stable predictions.
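The threshold search described above reduces to a few lines, given validation-set probabilities probs and labels y_val (variable names are illustrative):

```python
import numpy as np
from sklearn.metrics import f1_score

# Scan candidate decision thresholds from 0.3 to 0.7 in steps of 0.05
thresholds = np.arange(0.30, 0.71, 0.05)
f1s = [f1_score(y_val, (probs >= t).astype(int)) for t in thresholds]
best_t = thresholds[int(np.argmax(f1s))]   # threshold used for final testing
```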
Table 5 illustrates the performance metrics of all the models, wherein ResRNN-Net and MultiScaleFusion-Net stood out as the best. The outcome proved that both deep learning models—ResRNN-Net and MultiScaleFusion-Net—performed better than the conventional models, with higher F1-scores, recall, and ROC–AUC. MultiScaleFusion-Net showed slightly higher recall, while ResRNN-Net maintained better balance across all the measures.

4.1.3. Hyperparameter Tuning for ResRNN-Net and MultiScale Fusion Net

To maximize model performance and generalizability, hyperparameter tuning was performed for the ResRNN-Net and MultiScaleFusion-Net architectures. This was conducted to determine the optimal learning rate, batch size, and dropout rate configurations, which would maximize prediction accuracy and model efficiency. The following hyperparameters were tuned:
  • Learning Rate: [0.0003, 0.0005]—Controls gradient update step size.
  • Batch Size: Evaluated at discrete values of 32, 64, and 128—Number of samples processed before model weight updates.
  • Dropout Rate: [0.3, 0.5]—Prevents overfitting by randomly deactivating neurons.
A random search approach was used to explore hyperparameter combinations for both models. For each trial, the models were initialized with the Adam optimizer and binary cross-entropy loss using early stopping (patience: one epoch). The models were evaluated using the F1-score, balancing precision and recall. ResRNN-Net was tested with ten trials and MultiScaleFusion-Net with five trials due to early convergence.
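A minimal sketch of this random search is shown below; train_and_score is a hypothetical helper that trains one configuration and returns its validation F1-score:

```python
import random

# Search space from the list above
space = {"lr": [0.0003, 0.0005], "batch_size": [32, 64, 128], "dropout": [0.3, 0.5]}

# Ten random trials (as used for ResRNN-Net); keep the best-scoring configuration
best_cfg = max(
    ({k: random.choice(v) for k, v in space.items()} for _ in range(10)),
    key=lambda cfg: train_and_score(cfg),
)
```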
As shown in Table 6, both models achieved F1-scores above 0.81, showcasing strong prediction performance with balanced precision and recall. The learning rate of 0.0005 ensured smooth convergence, while a batch size of 64 optimized computational efficiency. A 50% dropout rate effectively prevented overfitting without compromising learning capacity.
To determine whether the observed performance difference between ResRNN-Net and MultiScaleFusion-Net was statistically significant, a paired t-test was conducted on the F1-scores across cross-validation folds. The paired t-test yielded a t-statistic of 5.9761 with a p-value of 0.0039, indicating a statistically significant difference at the 0.01 level. These results suggest that, while both models performed competitively, ResRNN-Net demonstrates statistically superior performance under the t-test criterion.
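The significance test amounts to a paired t-test on the per-fold F1-scores, for example with SciPy; the fold values below are placeholders, not the study’s actual per-fold results:

```python
from scipy.stats import ttest_rel

f1_resrnn = [0.82, 0.83, 0.81, 0.82, 0.82]   # hypothetical per-fold F1-scores
f1_msf    = [0.81, 0.82, 0.80, 0.81, 0.81]
t_stat, p_value = ttest_rel(f1_resrnn, f1_msf)
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")  # the paper reports t = 5.9761, p = 0.0039
```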

4.1.4. Hyperparameter Tuning for Multimodal Model

To optimize the performance of the multimodal deep learning model, hyperparameter tuning was conducted as shown in Table 7 to identify the best configurations for the learning rate, batch size, and optimizer selection. The search aimed to enhance the model stability and predictive power across the three modalities: tabular, imaging, and time-series data. The following key hyperparameters were considered for tuning:
  • Learning Rate: [0.001, 0.0005].
  • Batch Size: [16, 32].
  • Optimizer: [Adam, AdamW].
A grid search approach was applied across eight hyperparameter configurations, where the model was trained and evaluated on validation accuracy, F1-score, precision, recall, and ROC–AUC. The best F1-score (83%), accuracy (78.9%), and recall (100%) were obtained with the optimal hyperparameters: learning rate = 0.0005, batch size = 16, and Adam as the optimizer.
By selecting optimized hyperparameters for cross-validation, the performance of the multimodal model improved. The accuracy increased from 60% to 63%, as shown in Table 8. This highlights the critical impact of hyperparameter tuning in enhancing model robustness and predictive power.
Discussion on Multimodal Integration: The multimodal model represents an important first step toward integrating structured tabular data, medical imaging, and temporal signals for more comprehensive risk prediction. Initially, the model achieved perfect recall, but fine-tuning the hyperparameters was crucial to improving the balance across the other performance metrics, as shown in Table 8. Currently, we use early fusion to combine the learned embeddings from each modality. However, the drop in precision and only moderate accuracy suggest that this approach may not fully harness the power of cross-modal interactions. To address this, future work will explore alternative fusion strategies, such as late fusion, attention-based modality weighting, and hierarchical feature aggregation. Additionally, we plan to conduct ablation studies using combinations of two modalities to better understand each data type’s contribution and refine our integration strategy further.
Table 9 presents a consolidated overview of the key hyperparameters used across the ResRNN-Net, MultiScaleFusion-Net, and multimodal models. It details the architectural components, such as recurrent layers, convolutional blocks, and fusion strategies, along with their associated configurations, like unit sizes, activation functions, and optimizers.

4.2. Performance Interpretation for ResRNN and MSF Net

The confusion matrix and ROC curve analyses present interesting observations of the efficacy and reliability of ResRNN-Net as well as MultiScaleFusion-Net to predict pregnancy complications. The first important observation is the high recall by both the models, guaranteeing that as many actual complication cases as possible were labeled correctly. The high sensitivity in this context is essential in real-world clinical uses, where the failure to recognize a potential complication would have a negative consequence.
Figure 16 illustrates that the optimal performance for ResRNN-Net was at a threshold of 0.40, whereas the best results for MultiScaleFusion-Net, as shown in Figure 17, were obtained at a marginally increased threshold of 0.45. This variation indicates the need for threshold optimization to realize clinically meaningful predictions. Finally, the collective evidence from the confusion matrix and ROC curve analyses corroborated the models’ performance, proving their feasibility for real-world application in maternal health tracking and early complication identification.

4.3. Multimodal Model Performance

The multimodal model was evaluated in two configurations: tabular-only and integrated (tabular + image + EHG). The tabular-only model achieved 80% accuracy with balanced precision and recall, making it a strong baseline. However, it lacked deeper insights from imaging and physiological signals. The integrated model improved recall (90%), ensuring fewer missed complications, but at the cost of lower precision (55%), leading to more false positives. The integrated model leveraged diverse data for richer predictions. Its increased training complexity and computational demands require further optimization for real-world deployment.

4.4. Discussion and Comparative Analysis

Although competitive performance was shown by all the models, ResRNN-Net and MultiScaleFusion-Net were remarkable due to their higher recall and F1-scores. TabNet was a very interpretable choice, and the MLP was computationally efficient, with excellent accuracy. The model selection was based on the deployment context—ResRNN-Net for clinical sensitivity, TabNet for interpretability, and an MLP for real-time resource-scarce scenarios. The integrated multimodal model, despite its complexity, reinforced the value of combining tabular, image, and time-series data for robust pregnancy complication prediction. Table 10 provides a critical evaluation of all implemented models, comparing key metrics including accuracy, precision, recall, F1-score, and ROC–AUC.

Statistical and Deployment-Oriented Insights

To support the experimental findings with practical insights, we examined computational efficiency and deployment feasibility. Both ResRNN-Net and MultiScaleFusion-Net were trained with the best hyperparameters obtained from a random search (learning rate = 0.0005; batch size = 64).
  • Training Time:
    ResRNN-Net took approximately 517.21 s to train for three epochs with early stopping.
    MultiScaleFusion-Net completed training in 113.8 s.
    The quicker convergence of MultiScaleFusion-Net highlights the efficiency of its architecture with multiscale convolutional layers.
  • Inference Time:
    For inference on 1000 samples:
    *
    ResRNN-Net took 2.62 s.
    *
    MultiScaleFusion-Net took 1.41 s.
    This demonstrates the effectiveness of MultiScaleFusion-Net for real-time applications.
  • Trade-offs Between Models:
    ResRNN-Net provided higher precision and F1-scores but required longer training and inference times due to recurrent layers.
    MultiScaleFusion-Net achieved similar performance with faster training and inference thanks to efficient convolutional layers, making it well suited for real-time applications.
    The TabNet and MLP models reached 78–80% accuracy but lacked the advanced feature extraction of the proposed deep architectures. Their simplicity aids interpretability but limits their ability to handle complex data.
    The multimodal model (tabular, image, and time-series data) had lower accuracy (60%), whereas the tabular-only model performed well, achieving 80% accuracy with stable precision and recall.
    Deployment Recommendation: MultiScaleFusion-Net is preferred for clinical deployment due to its speed and scalability. ResRNN-Net remains useful where precision outweighs efficiency. The model selection should align with the computational capacity, latency constraints, and prediction priorities.
  • Hardware Specifications: All the experiments were conducted using the following hardware:
    Programming Language: Python 3.
    GPU: T4.
    CPU: Intel Xeon @ 2.00 GHz.
    Environment: Google Colab.

4.5. Explainable AI and Feature Importance

For transparency and trust in the pregnancy complication prediction models, explainable AI (XAI) methods were utilized, with SHapley Additive exPlanation (SHAP) being the main focus. This method provided both global feature importance insights and local explanations for each prediction, increasing the interpretability of the ResRNN-Net and MultiScaleFusion-Net models.
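A hedged sketch of how such SHAP values can be produced for a Keras model is given below; KernelExplainer is shown as a generic, model-agnostic choice, and the background-sample size is an assumption:

```python
import shap

background = X_train[:100]   # background sample for the explainer

# Wrap the model's probability output as a plain function of 2-D input
explainer = shap.KernelExplainer(
    lambda x: model.predict(x[..., None]).ravel(), background)

shap_values = explainer.shap_values(X_val[:10])  # e.g., 10 samples as in Figure 19
shap.summary_plot(shap_values, X_val[:10], feature_names=feature_names)
```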
To provide deeper interpretability of the SHAP visualizations, it is essential to understand what each feature index represents. Figure 18 presents the complete mapping of the SHAP feature indices to their corresponding clinical, demographic, and lifestyle variables in the NFHS-5 dataset. This mapping aids clinicians and researchers in translating model insights into actionable factors for maternal health evaluation.
The SHAP summary plot in Figure 19 depicts the contribution of every feature to the model predictions. For ResRNN-Net (computed on 10 samples), the major features were Feature 47 (Water-Source-Other), Feature 33 (Postnatal-check), Feature 46 (Water-Source-Natural-Source), and Feature 38 (Smoke). Positive SHAP values represented features increasing the predicted probability of pregnancy complications, whereas negative values decreased it. Feature 17 (Child-putToBrst) and Feature 21 (Child-Brst) also made significant contributions, coinciding with clinical expectations.
Figure 20 presents the Tabular-SHAP analysis for the multimodal model, which reveals that Feature 39 (Prenatal_Care) proves to be a prominent feature, although Feature 27 (Hg_child_adjust) and Feature 24 (Preg_intParaDrug) contribute notably.
The SHAP feature importance bar plot shown in Figure 21 reports the mean contribution of every feature to the model predictions. Features such as 47, 33, and 46 recorded the highest mean SHAP values, justifying their critical role in classification. The visualization provides a clear ordering of feature impact, aiding the interpretability of ResRNN-Net’s decisions. These findings validate the model’s adherence to clinical expectations and its trustworthiness in practice.
The MultiScaleFusion-Net SHAP feature importance plot in Figure 22 reveals Feature 39 (Prenatal_Care) to be the most influential on the model predictions, with a much greater mean SHAP value than the others. Features 16 (ChildFood_bottle), 26 (ChildAge_mnths), and 33 (PostnatalChk) also make strong contributions, reinforcing their role in pregnancy complication classification. The distribution shows that MultiScaleFusion-Net relies on a more focused set of predictive variables than ResRNN-Net.
To examine individual predictions, SHAP waterfall plots were created for example predictions from the two models. These plots decompose the contribution of each feature, illustrating how the model arrived at its final prediction. For ResRNN-Net, Figure 23 identifies Feature 39 (Prenatal_Care) as the strongest factor, substantially reducing the predicted probability, while Feature 22 (Preg_iron) and others pushed the prediction towards the positive class. The SHAP values sum to the final output, making the model's decision process auditable. Likewise, the waterfall plot for MultiScaleFusion-Net, shown in Figure 24, identifies Feature 39 (Prenatal_Care) and Feature 33 (PostnatalChk) as the major contributors: Feature 39 drove the prediction towards the negative class, while Feature 22 (Preg_iron) influenced it positively. Both plots emphasize the models' reliance on clinically significant features, confirming their alignment with domain expertise. A minimal sketch of how these SHAP analyses can be generated is shown below.
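
The sketch below shows how such SHAP visualizations can be produced with the shap library. It is an illustration under stated assumptions, not the exact experimental code: `model` stands for one of the trained networks over the 51 LASSO-selected features, and `X_train`, `X_sample`, and `feature_names` are hypothetical stand-ins for the training matrix, the explained samples, and the index-to-name mapping of Figure 18. The model-agnostic KernelExplainer is used here; DeepExplainer may be faster for these architectures.

```python
import numpy as np
import shap

# Small background set keeps KernelExplainer tractable.
background = shap.sample(X_train, 100)
explainer = shap.KernelExplainer(lambda x: model.predict(x, verbose=0), background)

shap_values = explainer.shap_values(X_sample)  # e.g., 10 samples, as in Figure 19
sv = shap_values[0] if isinstance(shap_values, list) else shap_values
sv = np.squeeze(np.asarray(sv))  # (n_samples, n_features)

# Global explanations: beeswarm summary (cf. Figure 19) and mean-|SHAP| bar plot (cf. Figure 21).
shap.summary_plot(sv, X_sample, feature_names=feature_names)
shap.summary_plot(sv, X_sample, feature_names=feature_names, plot_type="bar")

# Local explanation: waterfall decomposition of a single prediction (cf. Figures 23 and 24).
base = float(np.ravel(explainer.expected_value)[0])
shap.plots.waterfall(shap.Explanation(values=sv[0], base_values=base,
                                      data=X_sample[0], feature_names=feature_names))
```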

4.6. Grad-CAM Visualization

To demonstrate the Grad-CAM approach, an alternative image dataset was employed: the Fetal Planes Ultrasound Dataset [41]. This dataset contains ultrasound images of fetal structures categorized into planes such as the fetal thorax, fetal brain, fetal abdomen, and maternal cervix. A ResNet-18 model was fine-tuned on this dataset, achieving high accuracy in classifying the fetal planes. Grad-CAM was then applied to visualize the regions that influenced the model's prediction for each image. As Figure 25 shows, the heatmaps highlight the critical regions within the ultrasound scans, such as the thoracic cavity in fetal thorax images or the brain structures in fetal brain images. This visualization confirmed that the model was making predictions based on medically relevant areas rather than irrelevant background features.
These results demonstrate the potential of Grad-CAM for enhancing interpretability in medical imaging tasks. Although Grad-CAM is not directly applicable to the tabular pregnancy dataset, the image-based evaluation gives a clear picture of how it exposes the regions driving a deep model's predictions.
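
A compact version of this Grad-CAM procedure can be written directly with PyTorch hooks. The sketch below is an assumption-laden illustration rather than the exact experimental code: `NUM_PLANES`, the commented checkpoint path, and the random input tensor are placeholders for the fine-tuned ResNet-18 and a preprocessed ultrasound frame.

```python
import torch
import torch.nn.functional as F
from torchvision import models

NUM_PLANES = 4  # e.g., fetal thorax, fetal brain, fetal abdomen, maternal cervix
model = models.resnet18(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, NUM_PLANES)  # fine-tuned head
# model.load_state_dict(torch.load("resnet18_fetal_planes.pt"))  # hypothetical checkpoint
model.eval()

# Capture activations and gradients of the last convolutional block.
feats, grads = {}, {}
target = model.layer4[-1]  # low resolution but semantically rich feature maps
target.register_forward_hook(lambda m, i, o: feats.update(a=o))
target.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

img = torch.randn(1, 3, 224, 224)  # placeholder for a normalized ultrasound frame
logits = model(img)
logits[0, int(logits.argmax(dim=1))].backward()  # grad of the predicted-class score

# Grad-CAM: weight each feature map by its average gradient, ReLU, then upsample.
w = grads["a"].mean(dim=(2, 3), keepdim=True)
cam = F.relu((w * feats["a"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=img.shape[2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # [0, 1] heatmap
```

The resulting `cam` tensor can then be colour-mapped and alpha-blended over the original scan to produce overlays like those in Figure 25.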

5. Conclusions

Maternal health is a cornerstone of worldwide healthcare initiatives, affecting public health as well as social and economic development. The application of technology, especially artificial intelligence (AI), to maternal healthcare offers a comprehensive answer to the existing challenges. In this study, we constructed a robust deep learning framework to predict pregnancy complications from the NFHS-5 dataset. Through a systematic pipeline of data preprocessing, feature selection, model building, and evaluation, the framework identified high-risk pregnancies with high accuracy and explainability.
ResRNN-Net and MultiScaleFusion-Net proved to be the top-performing architectures, utilizing Bi-LSTM, CNN, attention, and multiscale convolutional techniques to identify subtle relationships in the dataset. ResRNN-Net obtained the best F1-score (82%), balancing precision and recall, while MultiScaleFusion-Net provided similar performance (81%) with quicker inference. Conventional machine learning algorithms such as XGBoost and the voting ensemble lagged behind the deep learning strategies, highlighting the value of sophisticated architectures for high-complexity clinical prediction. The multimodal model, which integrates tabular, image, and time-series data, achieved perfect recall (100%) at the cost of lower overall accuracy, making it a candidate for risk-sensitive applications. The incorporation of explainable AI (XAI) methods, namely SHAP and Grad-CAM, further improved model explainability, ensuring that the essential features influencing the predictions were interpretable and clinically meaningful. The deep learning models thus combined strong predictive accuracy with interpretable results, making the system well suited for deployment in real-world maternal health monitoring.
Advancing the adoption of predictive models in maternal healthcare will require progress along several directions. Broadening datasets to cover more diverse populations, spanning geographic regions, age groups, and varying risk profiles, will improve the adaptability and robustness of these models. Improving data quality through state-of-the-art techniques, such as generative models for imputing missing values, and merging additional data modalities (e.g., genetic, environmental, and behavioral) are expected to significantly raise predictive performance. Coherent integration with clinical systems, particularly electronic health records (EHRs), would enable real-time predictions and actionable alerts, promoting more responsive clinical decision-making. Strengthening model interpretability through explainable AI (XAI) techniques and developing intuitive interfaces for healthcare professionals will be crucial for effective implementation. Finally, longitudinal studies that track maternal health over time will be fundamental for evaluating the long-term impact and clinical relevance of these predictive frameworks.

Author Contributions

Conceptualization, A.A. and M.S.; methodology, A.A.; software, A.A. and M.S.; validation, A.A., M.S. and S.F.J.; formal analysis, A.A. and M.S.; investigation, A.A. and M.S.; resources, M.A. and E.A.; data curation, A.A.; writing—original draft preparation, A.A. and M.S.; writing—review and editing, S.F.J.; visualization, A.A.; supervision, M.A. and E.A.; project administration, M.A. and E.A.; A.A. and M.S. contributed equally to this work. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

Not applicable. The study used publicly available, sanitized data from Kaggle, and no direct involvement of human participants occurred.

Data Availability Statement

The dataset used in this study is publicly available on Kaggle at the following URL: https://www.kaggle.com/datasets/ravisinghiitbhu/nfhs5, accessed on 1 February 2025.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Javed, A.; Javaid, N.; Hasnain, M.; Sarfraz, U.; Ahmed, I.; Shafiq, M.; Choi, J. Applying advanced data analytics on pregnancy complications to predict miscarriage with explainable AI. IEEE Access 2024, 12, 176845–176862. [Google Scholar] [CrossRef]
  2. Koivu, A.; Sairanen, M. Predicting risk of stillbirth and preterm pregnancies with machine learning. Health Inf. Sci. Syst. 2020, 8, 14. [Google Scholar] [CrossRef] [PubMed]
  3. Tjoa, E.; Guan, C. A survey on explainable artificial intelligence (XAI): Toward medical XAI. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4793–4813. [Google Scholar] [CrossRef]
  4. Mhatre, R.S.; Bakal, W.J. Fetal anomaly detection in ultrasound images: A review of deep learning-based approaches. In Proceedings of the 2024 3rd International Conference on Automation, Computing and Renewable Systems (ICACRS), Pudukkottai, India, 4–6 December 2024; pp. 936–942. [Google Scholar]
  5. Peahl, A.F.; Howell, J.D. The evolution of prenatal care delivery guidelines in the United States. Am. J. Obstet. Gynecol. 2021, 224, 339–347. [Google Scholar] [CrossRef]
  6. Souza, J.P.; Day, L.T.; Rezende-Gomes, A.C.; Zhang, J.; Mori, R.; Baguiya, A.; Jayaratne, K.; Osoti, A.; Vogel, J.P.; Campbell, O. Early warning indicators of severe maternal outcomes: A multicountry, facility-based study. Lancet Glob. Health 2024, 12, e306–e316. [Google Scholar] [CrossRef] [PubMed]
  7. World Health Organization (WHO). Maternal Mortality Evidence Brief Key Facts. 2019. Available online: https://iris.who.int/bitstream/handle/10665/329886/WHO-RHR-19.20-eng.pdf (accessed on 2 December 2024).
  8. Allahem, H.; Sampalli, S. Automated labour detection framework to monitor pregnant women with a high risk of premature labour using machine learning and deep learning. Inform. Med. Unlocked 2022, 28, 100771. [Google Scholar] [CrossRef]
  9. Ryu, D.; Kim, D.H.; Price, J.T.; Lee, J.Y.; Chung, H.U.; Allen, E.; Walter, J.R.; Jeong, H.; Cao, J.; Kulikova, E.; et al. Comprehensive pregnancy monitoring with a network of wireless, soft, and flexible sensors in high- and low-resource health settings. Proc. Natl. Acad. Sci. USA 2021, 118, e2100466118. [Google Scholar] [CrossRef]
  10. Nurmaini, S.; Partan, R.U.; Bernolian, N.; Sapitri, A.I.; Tutuko, B.; Rachmatullah, M.N.; Darmawahyuni, A.; Firdaus, F.; Mose, J.C. Deep learning for improving the effectiveness of routine prenatal screening for major congenital heart diseases. J. Clin. Med. 2022, 11, 6454. [Google Scholar] [CrossRef]
  11. Alim, A.; Imtiaz, M.H. Wearable sensors for the monitoring of maternal health—A systematic review. Sensors 2023, 23, 2411. [Google Scholar] [CrossRef]
  12. Ciceri, T.; Squarcina, L.; Giubergia, A.; Bertoldo, A.; Brambilla, P.; Peruzzo, D. Review on deep learning fetal brain segmentation from magnetic resonance images. Artif. Intell. Med. 2023, 143, 102608. [Google Scholar] [CrossRef]
  13. Krishnamurti, T.; Rodriguez, S.; Wilder, B.; Gopalan, P.; Simhan, H.N. Predicting first time depression onset in pregnancy: Applying machine learning methods to patient-reported data. Arch. Women’s Ment. Health 2024, 27, 1019–1031. [Google Scholar] [CrossRef] [PubMed]
  14. Yang, X.; Wang, R.; Zhang, W.; Yang, Y.; Wang, F. Predicting risk of the subsequent early pregnancy loss in women with recurrent pregnancy loss based on preconception data. BMC Women’s Health 2024, 24, 381. [Google Scholar] [CrossRef] [PubMed]
  15. Wang, J.; Li, X.; Ma, Z. Multi-Scale Three-Path Network (MSTP-Net): A new architecture for retinal vessel segmentation. Measurement 2025, 250, 117100. [Google Scholar] [CrossRef]
  16. Tian, Y.; Xu, Z.; Ma, Y.; Ding, W.; Wang, R.; Gao, Z.; Cheng, G.; He, L.; Zhao, X. Survey on deep learning in multimodal medical imaging for cancer detection. Neural Comput. Appl. 2023, 37, 1–16. [Google Scholar] [CrossRef]
  17. Wang, F.; Silvestre, G.; Curran, K. MiTU-Net: A fine-tuned U-Net with SegFormer backbone for segmenting pubic symphysis-fetal head. arXiv 2024, arXiv:2401.15513v1. [Google Scholar]
  18. Csillag, D.; Paes, L.M.; Ramos, T.; Romano, J.V.; Schuller, R.; Seixas, R.B.; Oliveira, R.I.; Orenstein, P. AmnioML: Amniotic fluid segmentation and volume prediction with uncertainty quantification. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2023; Volume 37, pp. 15494–15497. [Google Scholar]
  19. Bonnard, J.; Dapogny, A.; Zsamboki, R.; Braud, L.D.; Jurkovic, D.; Bailly, K.; Dhombres, F. Prior-guided attribution of deep neural networks for obstetrics and gynecology. IEEE J. Biomed. Health Inform. 2024, 28, 870–883. [Google Scholar] [CrossRef] [PubMed]
  20. Xue, B.; Li, T.; Lin, Q. Therapeutic effect of mifepristone combined with misoprostol in early missed miscarriage and prediction of incomplete abortion. Asian Pac. J. Reprod. 2025, 14, 77–83. [Google Scholar] [CrossRef]
  21. Lee, C.-I.; Wang, T.-E.; Chen, C.-H.; Huang, C.-C.; Tsao, H.-M.; Lee, T.-H.; Chang, T.A.; Hsu, C.-T.; Lee, M.-S. Personalized prediction model for miscarriage: In-depth sperm DNA fragmentation. J. Assist. Reprod. Genet. 2025, 1–11. [Google Scholar] [CrossRef]
  22. Qi, W.; Wang, Y.; Wang, Y.; Huang, S.; Li, C.; Jin, H.; Zuo, J.; Cui, X.; Wei, Z.; Guo, Q.; et al. Prediction of postpartum depression in women: Development and validation of multiple machine learning models. J. Transl. Med. 2025, 23, 291. [Google Scholar] [CrossRef]
  23. Hunt, A.; Bonnett, L.; Heron, J.; Lawton, M.; Clayton, G.; Smith, G.; Norman, J.; Kenny, L.; Lawlor, D.; Merriel, A. Systematic review of clinical prediction models for the risk of emergency caesarean births. BJOG Int. J. Obstet. Gynaecol. 2024, 132, 231–240. [Google Scholar] [CrossRef]
  24. EHG Time Series Dataset. Available online: https://physionet.org/content/ehgdb/1.0.0/ (accessed on 7 January 2025).
  25. Bada, A.B.; Garko, A.B.; Gabi, D. Hypertension prediction using deep learning with transfer learning techniques. Fudma J. Sci. 2024, 8, 257–263. [Google Scholar] [CrossRef]
  26. Barredo Arrieta, A.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef]
  27. Gao, C.; Osmundson, S.; Edwards, D.R.V.; Jackson, G.P.; Malin, B.A.; Chen, Y. Deep learning predicts extreme preterm birth from electronic health records. J. Biomed. Inform. 2019, 100, 103334. [Google Scholar] [CrossRef]
  28. Fiorentino, M.C.; Villani, F.P.; Cosmo, M.D.; Frontoni, E.; Moccia, S. A review on deep-learning algorithms for fetal ultrasound-image analysis. Med. Image Anal. 2023, 83, 102629. [Google Scholar] [CrossRef]
  29. Islam, M.N.; Mustafina, S.N.; Mahmud, T.; Khan, I. Machine learning to predict pregnancy outcomes: A systematic review, synthesizing framework and future research agenda. BMC Pregnancy Childbirth 2022, 22, 348. [Google Scholar] [CrossRef]
  30. Tempel, F.; Groos, D.; Ihlen, E.A.; Adde, L.; Strümke, I. Choose your explanation: A comparison of SHAP and GradCAM in human activity recognition. arXiv 2024, arXiv:2412.16003. [Google Scholar]
  31. Margret, I.N.; Rajakumar, K.; Arulalan, K.V.; Manikandan, S.V. Statistical insights into machine learning-based box models for pregnancy care and maternal mortality reduction: A literature survey. IEEE Access 2024, 12, 68184–68207. [Google Scholar] [CrossRef]
  32. Micucci, M.; Iula, A. Recent advances in machine learning applied to ultrasound imaging. Electronics 2022, 11, 1800. [Google Scholar] [CrossRef]
  33. Sun, Q.; Zou, X.; Yan, Y.; Zhang, H.; Wang, S.; Gao, Y.; Liu, H.; Liu, S.; Lu, J.; Yang, Y.; et al. Machine learning-based prediction model of preterm birth using electronic health record. J. Healthc. Eng. 2022, 2022, 9635526. [Google Scholar] [CrossRef]
  34. Du, Y.; McNestry, C.; Wei, L.; Antoniadi, A.M.; McAuliffe, F.M.; Mooney, C. Machine learning-based clinical decision support systems for pregnancy care: A systematic review. Int. J. Med. Inform. 2023, 173, 105040. [Google Scholar] [CrossRef]
  35. Shaheen, I.; Javaid, N.; Alrajeh, N.; Asim, Y.; Aslam, S. Hi-Le and HiTCLe: Ensemble learning approaches for early diabetes detection using deep learning and explainable artificial intelligence. IEEE Access 2024, 12, 66516–66538. [Google Scholar] [CrossRef]
  36. Ukrit, M.F.; Jeyavathana, R.B.; Rani, A.L.; Chana, V. Maternal Health Risk Prediction with Machine Learning Methods. In Proceedings of the 2024 Second International Conference on Emerging Trends in Information Technology and Engineering (ICETITE), Vellore, India, 22–23 February 2024; pp. 1–8. [Google Scholar]
  37. Li, L.; Liu, Z.; Wang, X.; Zhang, X.; Wang, L. PregnancyLine: A Visual Analysis System for Pregnancy Care and Risk Communication. IEEE Access 2022, 10, 111493–111505. [Google Scholar] [CrossRef]
  38. Tripathy, S.S.; Imoize, A.L.; Rath, M.; Tripathy, N.; Bebortta, S.; Lee, C.; Chen, T.; Ojo, S.; Isabona, J.; Pani, S.K. A novel edge-computing-based framework for an intelligent smart healthcare system in smart cities. Sustainability 2022, 15, 735. [Google Scholar] [CrossRef]
  39. Hulsen, T. Explainable artificial intelligence (XAI): Concepts and challenges in healthcare. AI 2023, 4, 652–666. [Google Scholar] [CrossRef]
  40. Singh, R. NFHS-5 Dataset on Kaggle. Available online: https://www.kaggle.com/datasets/ravisinghiitbhu/nfhs5 (accessed on 10 December 2024).
  41. Fetal Planes Ultrasound Dataset. Available online: https://zenodo.org/records/3904280 (accessed on 1 February 2025).
Figure 1. Critical factors affecting maternal health.
Figure 2. Sample view of the NFHS-5 dataset.
Figure 3. Class distribution.
Figure 4. Feature correlation heatmap.
Figure 5. Class distribution before applying SMOTE and NearMiss.
Figure 6. Class distribution after applying SMOTE and NearMiss.
Figure 7. Original train data distribution (PCA).
Figure 8. SMOTE-resampled train data distribution (PCA).
Figure 9. LASSO feature importance plot.
Figure 10. The implementation procedure for the deep learning framework: pipeline overview.
Figure 11. Proposed deep learning system for pregnancy risk prediction. (i) ResRNN-Net: combines Bi-LSTM and CNN to capture temporal and spatial features, followed by attention-based fusion and dense layers for classification. (ii) MultiScaleFusion-Net: utilizes parallel Conv1D layers at different scales, followed by GRU and dense layers to extract hierarchical features for final prediction.
Figure 12. Multi-layer perceptron (MLP) network architecture.
Figure 13. Convolutional neural network architecture.
Figure 14. Long short-term memory (LSTM) network architecture.
Figure 15. Extended proposed multimodal deep learning architecture. The system fuses three distinct input modalities: (i) ultrasound images processed through a CNN backbone, (ii) EHG time series processed with LSTM layers, and (iii) structured tabular data processed by a multi-layer perceptron (MLP). Learned feature vectors are concatenated and passed through fully connected layers for final binary classification, enabling comprehensive prediction by leveraging heterogeneous data sources.
Figure 16. Confusion matrix and ROC–AUC curve for ResRNN-Net.
Figure 17. Confusion matrix and ROC–AUC curve for MultiScaleFusion-Net.
Figure 18. Feature mapping and importance.
Figure 19. SHAP summary plot for ResRNN-Net.
Figure 20. SHAP summary plot for the multimodal model (tabular branch).
Figure 21. SHAP feature importance bar plot for ResRNN-Net.
Figure 22. SHAP feature importance bar plot for MultiScaleFusion-Net.
Figure 23. SHAP waterfall plot for ResRNN-Net.
Figure 24. SHAP waterfall plot for MultiScaleFusion-Net.
Figure 25. Grad-CAM heatmaps highlighting salient regions in fetal plane ultrasound images.
Table 1. Comparison of different techniques in prenatal care.

| Techniques | Outcomes | Benefits | Limitations | Reference |
|---|---|---|---|---|
| XGBoost + Neural Networks | High precision in predicting stillbirth risks | Effectively handles large datasets, improves clinical decisions | Model interpretability issues, sensitive to data quality | [2] |
| Wireless Sensors + Machine Learning | Real-time tracking of maternal health parameters | Enables remote monitoring, beneficial for low-resource settings | Data security concerns, variability in sensor accuracy | [9] |
| CNN + LASSO + Grad-CAM | 92% accuracy in fetal anomaly detection | Early identification of fetal defects, improved diagnostic accuracy | Requires large labeled datasets, high computational demand | [10,28] |
| Deep Learning + Transfer Learning | 81.34% accuracy, 88% precision, 80% recall | Enhances prediction of hypertensive disorders during pregnancy | Requires extensive pre-trained models, risk of bias in transfer learning | [25] |
| Deep Neural Networks + SHAP | Improved explainability of preterm birth risk factors | Provides interpretable results, identifies key risk indicators | Computationally intensive, explainability still limited | [27] |
| CNN + LSTM + SHAP | High accuracy in predicting diabetes risk during pregnancy | Captures time-series variations, improves patient-specific monitoring | Requires high-quality time-series data, model complexity | [29] |
| Random Forest + SVM | High-risk pregnancy classification model | Enhances early identification of maternal mortality risks | Less effective with complex, high-dimensional data | [31] |
| Logistic Regression + Decision Trees | 88.5% accuracy in preterm birth prediction | Simple model with interpretable results | Lower accuracy compared to deep learning models | [33] |
| Decision Trees + Different algorithms | Highlighted gaps in explainability | Balances simplicity with predictive power | Struggles with unstructured clinical data | [34] |
Table 2. Model performance comparison with and without NearMiss-SMOTE.

| Metric | Without NearMiss-SMOTE | With NearMiss-SMOTE |
|---|---|---|
| Accuracy | **0.798** | 0.791 |
| Precision | **0.749** | 0.725 |
| Recall | 0.897 | **0.938** |
| F1-Score | 0.816 | **0.818** |
| ROC–AUC | **0.866** | 0.864 |

Note: Bold values indicate the best performance between the two configurations.
Table 3. Cross-validation performance of ResRNN-Net and MultiScaleFusion-Net.

| Performance Metric | ResRNN-Net | MultiScaleFusion-Net |
|---|---|---|
| Accuracy | 0.79 ± 0.0022 | 0.80 ± 0.0040 |
| Precision | 0.74 ± 0.0033 | 0.74 ± 0.0121 |
| Recall | 0.91 ± 0.0100 | 0.91 ± 0.0169 |
| F1-Score | 0.82 ± 0.0039 | 0.82 ± 0.0027 |
| ROC–AUC | 0.79 ± 0.0017 | 0.80 ± 0.0036 |
Table 4. Cross-validation performance of tabular and integrated multimodal models.

| Performance Metric | Multimodal Model (Tabular) | Multimodal Model (Integrated All 3 Datasets) |
|---|---|---|
| Accuracy | 0.80 ± 0.0025 | 0.60 ± 7.2192 |
| Precision | 0.75 ± 0.0081 | 0.55 ± 0.0517 |
| Recall | 0.91 ± 0.0184 | 1.00 ± 0.0000 |
| F1-Score | 0.82 ± 0.0037 | 0.71 ± 0.0407 |
| ROC–AUC | 0.80 ± 0.0025 | 0.60 ± 0.0738 |
Table 5. Summary of model performance metrics with mean ± standard deviation.

| Model | Best Threshold | Accuracy | Precision | Recall | F1-Score | ROC–AUC |
|---|---|---|---|---|---|---|
| ResRNN-Net | 0.40 | 0.80 | 0.73 | 0.93 | 0.82 | 0.87 |
| MultiScaleFusion-Net | 0.45 | 0.79 | 0.72 | 0.95 | 0.81 | 0.86 |
| XGBoost | 0.45 | 0.71 | 0.64 | 0.94 | 0.76 | 0.81 |
| Voting Ensemble | 0.40 | 0.78 | 0.72 | 0.93 | 0.81 | 0.86 |
| Mean ± Standard Deviation | – | 0.7771 ± 0.0299 | 0.7128 ± 0.0303 | 0.9325 ± 0.0226 | 0.8075 ± 0.0212 | 0.8539 ± 0.0213 |
Table 6. Best hyperparameters found.

| Model | Learning Rate | Batch Size | Dropout Rate | F1-Score |
|---|---|---|---|---|
| ResRNN-Net | 0.0005 | 64 | 0.5 | 0.82 ± 0.0028 |
| MultiScaleFusion-Net | 0.0005 | 64 | 0.5 | 0.81 ± 0.0018 |
Table 7. Hyperparameter optimization results for the multimodal model [integrated].

| Learning Rate | Batch Size | Optimizer | Accuracy | F1-Score | Precision | Recall | ROC–AUC |
|---|---|---|---|---|---|---|---|
| 0.0010 | 16 | Adam | 0.5263 | 0.6667 | 0.5294 | 0.9 | 0.5056 |
| 0.0010 | 16 | AdamW | 0.5789 | 0.5556 | 0.6250 | 0.5 | 0.5833 |
| 0.0010 | 32 | Adam | 0.5263 | 0.6667 | 0.5294 | 0.9 | 0.5056 |
| 0.0010 | 32 | AdamW | 0.5263 | 0.6897 | 0.5263 | 1.0 | 0.5000 |
| **0.0005** | **16** | **Adam** | **0.7895** | **0.8333** | **0.7143** | **1.0** | **0.7778** |
| 0.0005 | 16 | AdamW | 0.6316 | 0.6667 | 0.6364 | 0.7 | 0.6278 |
| 0.0005 | 32 | Adam | 0.6316 | 0.7407 | 0.5882 | 1.0 | 0.6111 |
| 0.0005 | 32 | AdamW | 0.5789 | 0.5000 | 0.6667 | 0.4 | 0.5889 |

Note: Bold values highlight the best-performing configuration across all evaluation metrics.
Table 8. Cross-validation performance of the multimodal model (integrated) after hyperparameter tuning.

| Performance Metric | Old Value | New Value |
|---|---|---|
| Accuracy | 0.60 | 0.63 |
| Precision | 0.55 | 0.59 |
| Recall | 1.00 | 0.90 |
| F1-Score | 0.71 | 0.71 |
| ROC–AUC | 0.60 | 0.63 |
Table 9. Summary of key model hyperparameters.

| Model | Component | Hyperparameter | Value |
|---|---|---|---|
| ResRNN-Net | Bi-LSTM | Units | 64 |
| | CNN | Filters/Kernel Size | 64/3 |
| | Attention | Type | Additive Attention |
| | Dense Layer | Units/Activation | 128/ReLU |
| | Optimizer | Type/LR/Batch Size | Adam/0.0005/64 |
| MultiScaleFusion-Net | Conv1D | Kernel Sizes | 1, 3, 5 |
| | BiGRU | Units | 64 |
| | Dense Layers | Units | 128 × 2 layers |
| | Fusion | Method | Concatenation |
| Multimodal Model | CNN | Input Size/Activation | 224 × 224/ReLU |
| | LSTM | Units/Sequence Length | 64/Variable |
| | MLP | Layers/Activation | 2 layers (64, 32)/ReLU |
| | Fusion | Strategy | Concatenation + Dense Layer |
Table 10. Critical evaluation of models implemented.

| Model | Accuracy | Precision | Recall | F1-Score | ROC–AUC |
|---|---|---|---|---|---|
| ResRNN-Net | 0.80 | 0.73 | 0.93 | 0.82 | 0.87 |
| MultiScaleFusion-Net | 0.79 | 0.72 | 0.94 | 0.81 | 0.86 |
| TabNet | 0.78 | 0.89 | 0.78 | 0.78 | 0.85 |
| MLP (PyTorch) | 0.79 | – | – | – | – |
| Multimodal Model | 0.80 | 0.75 | 0.90 | 0.81 | 0.80 |
| Multimodal Model (Integrated 3 Datasets) | 0.60 | 0.55 | 1.00 | 0.71 | 0.59 |
