Implementation of Predictive Analytics in Healthcare Using Hybrid Deep Learning Models

Kargotra, Poonam; Parray, Irfan Ramzan; Malik, Arun; Kharisma, Ivana Lucia

doi:10.3390/engproc2025107067

Open Access Proceeding Paper

Implementation of Predictive Analytics in Healthcare Using Hybrid Deep Learning Models^†

¹ School of Computer Science and Engineering, Lovely Professional University, Phagwara 144411, India

² Informatic Engineering, Nusa Putra University, Sukabumi 43152, West Java, Indonesia

^* Author to whom correspondence should be addressed.

^† Presented at the 7th International Global Conference Series on ICT Integration in Technical Education & Smart Society, Aizuwakamatsu City, Japan, 20–26 January 2025.

Eng. Proc. 2025, 107(1), 67; https://doi.org/10.3390/engproc2025107067

Published: 8 September 2025

(This article belongs to the Proceedings of The 7th International Global Conference Series on ICT Integration in Technical Education & Smart Society)

Abstract

Predictive analytics has emerged as a powerful tool for improving decision-making in healthcare, particularly in disease prediction and patient management. However, conventional architectures can struggle with characteristics of healthcare data such as high dimensionality and unstructured content. This work addresses the shortcomings of the traditional machine learning (ML) strategy by fusing deep learning approaches with existing models to improve predictive performance. Specifically, we propose three hybrid models: (1) Random Forest and Neural Networks (RF + NN), (2) XGBoost and Neural Networks (XGBoost + NN), and (3) Autoencoder and Random Forest (Autoencoder + RF). The goal is to compare these models’ ability to predict healthcare outcomes using standard performance metrics: accuracy, precision, recall, and F1-score. An important research gap revealed by the literature review is that most models achieve higher precision at the cost of recall, or vice versa. Our proposed hybrid models combine the strengths of feature selection from traditional algorithms (RF, XGBoost) with the advanced pattern recognition capabilities of Neural Networks (NNs) and autoencoders, aiming for a more balanced predictive performance. The RF + NN model produces the highest accuracy at 96.81%, with a precision of 90.48% and a recall of 70.08%. The XGBoost + NN model, with a slightly lower accuracy of 96.75%, was better at identifying true positives, with a recall of 73.54%. The Autoencoder + RF model achieved the best precision, 91.36%, but the worst recall, only 66.22%. Accordingly, these findings imply that, for the same level of predictive accuracy, the hybrid models are better at handling imbalanced problems, and they provide directions for better healthcare predictive systems in the future.

Keywords:

predictive analytics; deep learning; random forest; ensemble techniques; XGBoost; neural network; machine learning

1. Introduction

In recent years, the healthcare industry has witnessed unprecedented growth in data generation, ranging from electronic health records (EHRs) to medical imaging and genomic data [1,2]. Advances in data collection technologies and the availability of big healthcare data are changing the industry: with EHRs, medical imaging, genomic sequencing, and patient monitoring systems, the amount of available health data is rapidly increasing [3,4]. This offers considerable potential for improving diagnostic precision, individualized treatment regimens, and therapeutic outcomes through enhanced computational methods. However, healthcare data poses several challenges: heterogeneity across structured and unstructured data, missing values, and high dimensionality, as seen with medical image data and genomic sequences. These characteristics challenge conventional machine learning methods and call for advanced models that can handle large, scattered, and noisy datasets.

In this context, deep learning algorithms [1] have emerged to address the challenges posed by healthcare data. Artificial Neural Networks (ANNs) are among the most successful deep learning algorithms and have been used to automate tasks such as disease diagnosis, image recognition, and patient prognosis. However, a single model is usually insufficient to capture the variability of patterns inherent in large medical datasets. This is where ensemble learning becomes important. Techniques that combine the results of several models into a final output have been shown to be highly useful in the field of medicine [5].

Notably, hybrid models that combine Autoencoders with Random Forests, XGBoost, and similar algorithms are gradually becoming popular for predictive healthcare analytics [2]. Autoencoders, a type of unsupervised neural network, are principally used for dimensionality reduction and feature extraction. They reduce the complexity of the data and highlight its most significant characteristics, which lowers the computational requirements of subsequent predictions. Combined with classifiers such as Random Forests or XGBoost, which work well on structured data and can detect feature interactions, Autoencoders are a suitable tool for handling healthcare data effectively.

Another area likely to benefit greatly from the integration of deep learning models is the prediction of chronic diseases [3], especially diabetes, a major global public health concern. People with diabetes have an increased risk of complications such as cardiovascular disease, renal failure, and neuropathy; early diagnosis can help reduce these complications. Conventional diagnosis involves reviewing patient histories and conducting various diagnostic examinations, which are expensive, time-consuming, and subject to human error. We therefore propose applying Autoencoders for feature extraction and Random Forest or XGBoost for classification, with the aim of achieving higher accuracy and lower costs in diabetes prediction.

The objective of this work is to understand the effectiveness of Autoencoders combined with Random Forest and XGBoost for predicting diabetes. The key contributions of this work include the following:

Leveraging Autoencoders for automated feature extraction from high-dimensional clinical data, such as EHRs and medical images.
Combining tree-based models (Random Forest and XGBoost) with neural network-based feature extraction to improve predictive accuracy and robustness.
Evaluating the ensemble approach against traditional machine learning models in terms of key performance metrics such as accuracy, precision, recall, and F1-score.

Apart from the predictive performance, the current study will also explore the explainability of the prediction model, a particularly important area in the context of healthcare, as the model’s reasoning needs to be trusted by clinicians and patients. Two tree-based methods, decision tree and boosting, are black-box models that lack interpretability as demanded by critical users of the AI-driven solutions for healthcare [6].

This study also aims to show how combining deep learning algorithms with conventional machine learning algorithms yields more accurate, explainable, and efficient healthcare prediction models. It may offer a blueprint for tools that support real-time diagnosis, with the aim of quicker diagnosis, individualized care, and better outcomes for patients. Figure 1 shows the key uses of healthcare predictive analytics.

1.1. Key Challenges in Healthcare Data Analytics

Despite these important opportunities, applying deep learning models to healthcare data raises several difficulties [4]. Healthcare datasets are often highly imbalanced: the condition of interest, such as diabetes, in many cases accounts for only a small proportion of the overall data, so a model tends to predict the majority class [7]. Further, the collected data is often noisy because of imprecision or rounding, and it may contain missing values. Data availability and data security are two other important aspects in healthcare settings, particularly when dealing with patient-related information. As a result, we must ensure not only that predictive models achieve high accuracy but also that they comply with the requirements on data sharing and usage.
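The effect of class imbalance on accuracy can be illustrated with a small sketch. The 10% positive rate below is an assumed value chosen for illustration, not a figure from the study; the point is only that a model which always predicts the majority class can look accurate while detecting no positive cases.

```python
# Hypothetical label distribution: 1 = condition present, 0 = absent.
# The 10% positive rate is an assumption for illustration only.
labels = [1] * 100 + [0] * 900

# A degenerate "model" that always predicts the majority class (0).
predictions = [0] * len(labels)

# Confusion-matrix counts.
tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)

accuracy = (tp + tn) / len(labels)
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```

Here accuracy is 0.90 although recall is 0.00, which is why recall and F1-score are reported alongside accuracy.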

In addition, the difficulty of explaining the decisions of deep learning models hinders the adoption of these systems in the healthcare sector [8]. Model predictions need to be explainable because clinicians rely on them to make sound decisions. This makes ensemble members such as Random Forest and XGBoost attractive, since they are relatively easy to interpret compared with more complex black-box models such as deep Neural Networks.

1.2. Significance of the Study

This research helps bridge the gap between state-of-the-art deep learning algorithms and their practical implementation in healthcare, with a focus on the prediction of chronic diseases such as diabetes. In this study, Autoencoders, Random Forest, and XGBoost are combined to examine how integrated models handle high-dimensional data, improve predictive accuracy, and reveal disease characteristics. The conclusions of this study may directly inform the future use of AI in healthcare, particularly in the earlier diagnosis of diseases and the development of more effective prevention and treatment measures tailored to individual patients.

2. Literature Review

The use of predictive analytics in healthcare has helped improve clinical care, with positive effects on patient care and health system functioning. Predictive analytics applies advanced analytical tools to collected data in order to anticipate future health episodes, so that corrective measures and treatment can be taken before they occur. Compared to other analytical techniques, deep learning has received significant interest for its ability to model complex, non-linear relationships and handle large volumes of medical data. Several studies have shown the pivotal role of deep learning in designing healthcare predictive analytics models. T.B. Sivakumar et al. [5] presented a study (2024) on diabetes analysis using deep learning, in which they implemented a single model, although ensemble approaches could also be applied. S. Mekruksavanich et al. [9] presented a study (2024) on sensor-based physical activity recognition models integrated with deep learning for enhanced performance. In 2023, S.S. Priya et al. [10] studied the impact of deep learning and machine learning on IoT-enabled healthcare systems. The article by A. Badhan et al. [11] demonstrates different machine learning and deep learning approaches and their implications in predictive analytics.

Table 1 presents a comparative review of several recent studies in healthcare predictive analytics using machine learning and deep learning. It lists their methodologies and key outcomes, with research gaps framed as limitations.

3. Proposed Methodology

The proposed methodology applies predictive analytics in the healthcare domain using deep learning models [21,22] and hybrid techniques. The overall framework includes multiple stages: dataset collection, data cleansing, feature transformation and selection, model training and testing, and assessment and ranking. The health outcome prediction models [23,24], including RF, NN, XGBoost, and Autoencoder, together with the RF + NN, XGBoost + NN, and Autoencoder + RF hybrids investigated further, are designed to generate highly accurate and robust predictions.

3.1. Data Collection

This research required well-organized data on human disease. The data was obtained from an open-source platform, Kaggle. The dataset contains nine columns and around 100,000 rows, one per patient. Figure 2 presents an overview of the dataset on which this research is based. It shows the columns of the dataset, which are named ‘gender’, ‘age’, ‘hypertension’, ‘heart disease’, ‘smoking history’, ‘bmi’, ‘HbA1c_level’, ‘blood glucose level’, and ‘diabetes’.

3.2. Hybrid Deep Learning Models

The proposed system integrates hybrid models to leverage the strength of individual techniques.

Random Forest (RF) + Neural Network (NN):

The RF model is used for feature selection and as a preliminary classification stage.
The selected features are passed to a Neural Network for the final predictions.

XGBoost + Neural Network (NN):

XGBoost first selects features and makes initial predictions.
The selected features are then passed through a Neural Network to refine the predictions and capture non-linear relationships.

Autoencoder + Random Forest (RF):

An autoencoder is used for unsupervised feature learning and dimensionality reduction.
The reduced feature set is fed into a Random Forest classifier for the final prediction.

Figure 3 shows the complete flowchart of the proposed methodology for the hybrid deep learning models listed above, applied to predictive analytics in healthcare with a focus on patient diabetes data.

4. Implementation

The proposed predictive analytics framework is implemented in a systematic manner. This section outlines the tools, programming environment, and step-by-step execution of the hybrid deep learning models RF + NN, XGBoost + NN, and Autoencoder + RF. In the implementation, extensive data preprocessing was conducted to handle missing values in the dataset. Categorical variables, such as the values of the ‘gender’ column, were then encoded. Finally, the dataset was split, with 80% of the data used for training and 20% for testing.
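The preprocessing steps described above can be sketched as follows. This is a minimal illustration on a small synthetic table, not the authors' code: the synthetic values, the median imputation, the integer encoding, and the stratified split are all assumptions, and only a few of the dataset's columns are mimicked.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200  # tiny synthetic stand-in; the real dataset has around 100,000 rows

# Synthetic table mimicking a few of the dataset's columns (values are made up).
df = pd.DataFrame({
    "gender": rng.choice(["Female", "Male"], size=n),
    "age": rng.integers(18, 80, size=n).astype(float),
    "bmi": rng.normal(27.0, 4.0, size=n),
    "diabetes": rng.integers(0, 2, size=n),
})
df.loc[rng.choice(n, size=10, replace=False), "bmi"] = np.nan  # inject gaps

# Encode the categorical column as integers (assumed encoding).
df["gender"] = df["gender"].map({"Female": 0, "Male": 1})

# 80/20 train/test split; stratification is an assumption that keeps the
# class ratio similar in both parts.
X = df.drop(columns="diabetes")
y = df["diabetes"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fill missing values with the training median (assumed strategy), so that
# no statistic computed on the test rows leaks into training.
bmi_median = X_train["bmi"].median()
X_train = X_train.assign(bmi=X_train["bmi"].fillna(bmi_median))
X_test = X_test.assign(bmi=X_test["bmi"].fillna(bmi_median))
```

The sketch imputes after splitting rather than before; this ordering is a design choice made here to avoid leakage and is not stated in the paper.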

4.1. Random Forest and Neural Network (RF + NN)

In the proposed RF + NN method, RF is first used to select the relevant features from the dataset. This reduces dimensionality so that only important inputs are forwarded to the Neural Network. The selected features are then passed through a feedforward Neural Network with several dense layers, trained with the Adam optimizer and a binary cross-entropy loss for classification. The RF model aids interpretability because it ranks features by importance, while the NN captures complex, possibly non-linear, relationships in the data. The hybrid thus combines RF’s feature importance with the predictive accuracy of Neural Networks in the healthcare field [25,26].

The model is evaluated on the testing data using a confusion matrix, a table that summarizes the predictions of a classification model in detail. It has four components: True Positives (TPs), True Negatives (TNs), False Positives (FPs), and False Negatives (FNs). From these counts, the accuracy, precision, recall, and F1-score can be calculated, which gives more information about the model than accuracy alone. Figure 4 shows the confusion matrix for the proposed RF + NN hybrid; the large share of correct predictions indicates that the model performs well, with high accuracy.
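The RF + NN pipeline and its confusion-matrix evaluation can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the data is synthetic, scikit-learn's `MLPClassifier` stands in for the feedforward network (it also trains with Adam on a log-loss, i.e. binary cross-entropy, objective), and the layer sizes and the number of retained features (`top_k`) are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the eight predictor columns (assumption).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Stage 1: Random Forest ranks features by impurity-based importance.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
top_k = 5  # hypothetical cut-off; the paper does not state how many are kept
selected = np.argsort(rf.feature_importances_)[::-1][:top_k]

# Stage 2: feedforward network trained on the selected features only.
scaler = StandardScaler().fit(X_train[:, selected])
nn = MLPClassifier(hidden_layer_sizes=(32, 16), solver="adam",
                   max_iter=500, random_state=0)
nn.fit(scaler.transform(X_train[:, selected]), y_train)
y_pred = nn.predict(scaler.transform(X_test[:, selected]))

# Evaluation: derive the four metrics from the confusion-matrix counts.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = (2 * precision * recall / (precision + recall)
      if (precision + recall) else 0.0)
print(f"acc={accuracy:.3f} prec={precision:.3f} rec={recall:.3f} f1={f1:.3f}")
```

Scores obtained on this synthetic data say nothing about the figures reported in the paper; the sketch only shows how the two stages connect.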

4.2. XGBoost and Neural Network (XGBoost + NN)

The hybrid XGBoost + Neural Network (NN) model uses XGBoost for feature selection and a Neural Network for prediction. XGBoost identifies the important features and keeps only those, which improves the quality of the inputs; the Neural Network then models the non-linear relationships. This integration is intended to improve computation speed and prediction reliability on large healthcare datasets.
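A boosted-tree selector feeding a neural network can be sketched as a single scikit-learn pipeline. To keep the example dependency-free, scikit-learn's `GradientBoostingClassifier` is used here as a stand-in for XGBoost; this substitution, the synthetic data, the `"median"` importance threshold, and the layer sizes are all assumptions. `xgboost.XGBClassifier` follows the scikit-learn estimator interface, so it should be usable in the same position.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the eight predictor columns (assumption).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

pipeline = make_pipeline(
    # Boosted trees (stand-in for XGBoost) score the features; only those
    # with importance at or above the median are kept.
    SelectFromModel(GradientBoostingClassifier(random_state=0),
                    threshold="median"),
    StandardScaler(),
    # The neural network makes the final prediction on the kept features.
    MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0),
)
pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

n_kept = int(pipeline.named_steps["selectfrommodel"].get_support().sum())
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)
print(f"features kept={n_kept} precision={precision:.3f} recall={recall:.3f}")
```

Wrapping the stages in one pipeline means the selector and the scaler are fitted on the training split only, which avoids leaking test information into feature selection.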

Figure 5 presents the confusion matrix for the XGBoost and Neural Network hybrid, which also performs very well but is slightly less accurate than RF + NN, as a decrease in True Positives can be observed from the confusion matrix.

4.3. Autoencoder and Random Forest (Autoencoder + RF)

The third proposed hybrid combines an Autoencoder with Random Forest. The Autoencoder compresses the healthcare data, using unsupervised learning to extract features from it. Random Forest, a reliable ensemble method, is then applied to these features for prediction. This combination is intended to improve model robustness, mitigate issues with missing data, and provide more efficient and precise predictions of diseases or outcomes on healthcare datasets.
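The Autoencoder + RF idea can be sketched with scikit-learn alone. This is a simplified illustration rather than the authors' architecture: an `MLPRegressor` trained to reconstruct its own input serves as a one-hidden-layer autoencoder, the bottleneck size is hypothetical, and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the eight predictor columns (assumption).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

scaler = StandardScaler().fit(X_train)
Xs_train, Xs_test = scaler.transform(X_train), scaler.transform(X_test)

# Autoencoder: a network trained to reproduce its input through a narrow
# hidden layer. No labels are used at this stage (unsupervised).
bottleneck = 4  # hypothetical code size
autoencoder = MLPRegressor(hidden_layer_sizes=(bottleneck,), activation="relu",
                           max_iter=1000, random_state=0)
autoencoder.fit(Xs_train, Xs_train)

def encode(data):
    """Return the bottleneck activations, i.e. the compressed features."""
    hidden = data @ autoencoder.coefs_[0] + autoencoder.intercepts_[0]
    return np.maximum(hidden, 0.0)  # ReLU, matching the hidden activation

# Random Forest classifies on the compressed representation.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(encode(Xs_train), y_train)
y_pred = rf.predict(encode(Xs_test))
```

A deeper encoder and decoder built in a deep learning framework would follow the same pattern: train on reconstruction, keep only the encoder, and pass its output to the classifier.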

5. Results

The evaluation shows that the RF + NN combination achieves the highest accuracy, 96.82%, with 90.47% precision, 70.08% recall, and a 78.98% F1-score. The XGBoost + NN model was not far behind, with 96.75% accuracy, 86.38% precision, 73.54% recall, and a 79.44% F1-score. Lastly, the Autoencoder + RF model gave an accuracy of 96.58% and the highest precision, 91.36%, but its recall was 66.22%, so its F1-score was 76.78%. These results suggest the feasibility and comparable effectiveness of ensemble and deep learning methods in predictive healthcare analytics.
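As a consistency check on the figures above, the F1-score can be recomputed from the reported precision and recall using the harmonic mean, F1 = 2PR / (P + R). The short sketch below uses only the values stated in this section; the helper name `f1_from` is introduced here for illustration.

```python
# Reported (precision %, recall %, F1 %) for each hybrid model.
reported = {
    "RF + NN": (90.47, 70.08, 78.98),
    "XGBoost + NN": (86.38, 73.54, 79.44),
    "Autoencoder + RF": (91.36, 66.22, 76.78),
}

def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

recomputed = {name: round(f1_from(p, r), 2)
              for name, (p, r, _) in reported.items()}

for name, (p, r, f1) in reported.items():
    print(f"{name}: reported F1={f1:.2f}, recomputed F1={recomputed[name]:.2f}")
```

The recomputed values agree with the reported F1-scores to two decimal places, and they make the trade-off visible: the model with the highest precision (Autoencoder + RF) has the lowest F1 because of its lower recall.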

Figure 6 presents the accuracy comparison for all three proposed hybrid models for predictive analytics on the healthcare dataset focusing on diabetes. RF + NN performed slightly better than other models.

6. Conclusions

The study shows that ensemble learning techniques combined with deep learning achieve a high predictive accuracy of over 96%, with the RF + NN model yielding the highest accuracy of 96.82%. Although these hybrid strategies perform well, there is still room to fine-tune the proposed models to increase recall and so obtain the balanced and reliable predictions that healthcare requires. Future work can consider hyperparameter optimization, selection of a greater number of features, improved interpretability through Explainable AI (XAI) techniques, and training on larger and more diverse datasets for better generalization. Integrating real-time big data frameworks may also help AI support real-time decision-making, clinical applications, and patient care reporting. Overall, the integration of ensemble learning and deep learning demonstrates considerable promise in healthcare analytics; addressing recall limitations, improving interpretability, and expanding the scalability of these methods are critical next steps. By developing methods such as real-time big data systems and explainability, future studies can lay the groundwork for reliable, reproducible, and clinically beneficial AI tools that improve the delivery of patient care and patient outcomes.

Author Contributions

Conceptualization, P.K. and A.M.; methodology, P.K.; software, I.R.P.; validation, P.K., I.R.P. and A.M.; formal analysis, P.K.; investigation, I.R.P.; resources, I.L.K.; data curation, I.R.P.; writing, original draft preparation, P.K.; writing, review and editing, A.M. and I.L.K.; visualization, I.R.P.; supervision, A.M.; project administration, A.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available upon reasonable request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Tripathy, N.; Moharana, B.; Balabantaray, S.K.; Nayak, S.K.; Pati, A.; Panigrahi, A. A Comparative Analysis of Diabetes Prediction Using Machine Learning and Deep Learning Algorithms in Healthcare. In Proceedings of the 2024 International Conference on Advancements in Smart, Secure and Intelligent Computing (ASSIC), Bhubaneswar, India, 27–29 January 2024; pp. 1–6.
2. Senthil, G.A.; Geerthik, S.; Jasphin Vijay, J.; Mohanakrishnan, K. A Novel Pharmacovigilance Strategy for Detecting Adverse Drug Reactions in Healthcare Using Machine Learning and Blockchain. In Proceedings of the 2024 Second International Conference on Intelligent Cyber Physical Systems and Internet of Things (ICoICI), Coimbatore, India, 28–30 August 2024; pp. 763–767.
3. Ashreetha, B.; Srinivasa Kumar, S.V.S.S.; Srinivas, J.S.; Prasad, K.; Shekhar, R.; Dankan Gowda, V. Accurate Neoplasm Diagnosis with Comprehensive Machine Learning and Deep Learning Approaches. In Proceedings of the 2024 5th International Conference for Emerging Technology (INCET), Belgaum, India, 24–26 May 2024; pp. 1–7.
4. Gupta, M. Advances in AI: Employing Deep Generative Models for the Creation of Synthetic Healthcare Datasets to Improve Predictive Analytics. In Proceedings of the 2023 International Conference on Communication, Security and Artificial Intelligence (ICCSAI), Greater Noida, India, 23–25 November 2023; pp. 1026–1030.
5. Sivakumar, T.B.; Malakar, A.; Lekshmi, S.; Shailaja, G.; Kalaivani, E.; Babu, K.D. Enhanced Diabetes Prediction Using Deep Autoencoder Framework and Electronic Health Records. In Proceedings of the 2024 Second International Conference on Advances in Information Technology (ICAIT), Chikkamagaluru, India, 24–27 July 2024; pp. 1–5.
6. Desai, V.V.; Patil, M.; Thorat, M.; Kshirsagar, N.S.; Kestwal, M.; Chougule, P. Analysis on the Various Approaches Using Machine Learning in Different Businesses. In Proceedings of the 2024 International Conference on Healthcare Innovations, Software and Engineering Technologies (HISET), Karad, India, 18–19 January 2024; pp. 254–257.
7. ul Haque, A.; Ghani, M.S.; Mahmood, T. Decentralized Transfer Learning using Blockchain & IPFS for Deep Learning. In Proceedings of the 2020 International Conference on Information Networking (ICOIN), Barcelona, Spain, 7–10 January 2020; pp. 170–177.
8. Khanna, T.; Mathur, S. An Enhanced Analysis of Cloud-Based Deep Learning Model in Smart Healthcare Management. In Proceedings of the 2024 2nd International Conference on Disruptive Technologies (ICDT), Greater Noida, India, 15–16 March 2024; pp. 705–710.
9. Mekruksavanich, S.; Jantawong, P.; Jitpattanakul, A. Ensemble Deep Learning Network for Enhancing Performances of Sensor-Based Physical Activity Recognition Based on IMU Sensor Data. In Proceedings of the 2024 5th International Conference on Big Data Analytics and Practices (IBDAP), Bangkok, Thailand, 23–25 August 2024; pp. 150–155.
10. Priya, S.S.; Al-Fatlawy, M.H.; Khare, N.; Mahalakshmi, V.; Ganesh, S.S. Machine and Deep Learning Classifications for IoT-Enabled Healthcare Devices. In Proceedings of the 2023 4th International Conference on Computation, Automation and Knowledge Management (ICCAKM), Dubai, United Arab Emirates, 12–13 December 2023; pp. 1–7.
11. Badhan, A.; Rana, A.; Malhi, S.S.; Kaur, P.; Saha, B. A Comparative Analysis for Loan Approval Prediction using Machine Learning. In Proceedings of the 2024 International Conference on Electrical Electronics and Computing Technologies (ICEECT), Greater Noida, India, 29–31 August 2024; pp. 1–5.
12. Khaloofi, H.; Hussain, J.; Azhar, Z.; Ahmad, H.F. Performance Evaluation of Machine Learning Approaches for COVID-19 Forecasting by Infectious Disease Modeling. In Proceedings of the 2021 International Conference of Women in Data Science at Taif University (WiDSTaif), Taif, Saudi Arabia, 30–31 March 2021; pp. 1–6.
13. Rasjid, Z.E. Predictive Analytics in Healthcare: The Use of Machine Learning for Diagnoses. In Proceedings of the 2021 International Conference on Electrical, Computer and Energy Technologies (ICECET), Cape Town, South Africa, 9–10 December 2021; pp. 1–6.
14. Shruti; Trivedi, N.K. Predictive Analytics in Healthcare using Machine Learning. In Proceedings of the 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 6–8 July 2023; pp. 1–5.
15. Vadivel, S.; Jayakarthik, R. Predictive Analytics on COVID-19 Prediction using ResNets. In Proceedings of the 2022 6th International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 29–31 March 2022; pp. 280–287.
16. Naik, S.; Kumar, P.; Saha, S.; Bairagya, S.D.; Rawat, D.; Baliarsingh, S.K. Predictive Healthcare Analytics: A Multidisease Approach using Logistic Regression. In Proceedings of the 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kamand, India, 24–28 June 2024; pp. 1–6.
17. Iqbal, S.; Siddiqui, G.F.; Rehman, A.; Hussain, L.; Saba, T.; Tariq, U.; Abbasi, A.A. Prostate Cancer Detection Using Deep Learning and Traditional Techniques. IEEE Access 2021, 9, 27085–27100.
18. Barhate, A.; Kumar, P.; Verma, P.; Jikar, N.; Tale, A.; Hikre, V. Smart Healthcare: Harnessing the Power of Machine Learning for Predictive Analysis. In Proceedings of the 2024 Parul International Conference on Engineering and Technology (PICET), Vadodara, India, 3–4 May 2024; pp. 1–7.
19. Sundas, A.; Badotra, S.; Shahi, G.S.; Verma, A.; Bharany, S.; Ibrahim, A.O.; Abulfaraj, A.W.; Binzagr, F. Smart Patient Monitoring and Recommendation (SPMR) Using Cloud Analytics and Deep Learning. IEEE Access 2024, 12, 54238–54255.
20. Puli, S.K.; Usha, P. Transforming Healthcare: Advancements, Applications, and Future Directions of Machine Learning. In Proceedings of the 2024 10th International Conference on Smart Computing and Communication (ICSCC), Bali, Indonesia, 25–27 July 2024; pp. 502–506.
21. Singh, A.P.; Luhach, A.K.; Jhanjhi, N.Z.; Ghosh, U. A Novel Patient-Centric Architectural Framework for Blockchain-Enabled Healthcare Applications. IEEE Trans. Ind. Inform. 2021, 17, 5779–5789.
22. Chatrati, S.P.; Hossain, G.; Goyal, A.; Bhan, A.; Bhattacharya, S.; Gauravc, D.; Tiwari, S.M. Smart home health monitoring system for predicting type 2 diabetes and hypertension. J. King Saud. Univ. Comput. Inf. Sci. 2022, 34, 862–870.
23. Ullah, A.; Azeem, M.; Ashraf, H.; Alabaudi, A.A.; Humayun, M.; Jhanjhi, A.N. Secure Healthcare Data Aggregation and Transmission in IoT: A Survey. IEEE Access 2021, 9, 16849–16865.
24. Singh, S.; Malik, A.; Batra, I.; Sharma, S.; Poongodi, M. Need for Integration of Blockchain Technology in Supply Chain Management of Health Supplements. In Proceedings of the 2023 3rd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 12–13 May 2023; pp. 1757–1761.
25. Diwaker, C.; Tomar, P.; Solanki, A.; Nayyar, A.; Jhanjhi, N.Z.; Abdullah, A.; Supramaniam, M.A. A New Model for Predicting Component-Based Software Reliability Using Soft Computing. IEEE Access 2019, 7, 147191–147203.
26. Ray, S.K.; Sinha, R.; Ray, S.K. A smartphone-based post-disaster management mechanism using WIFI tethering. In Proceedings of the 2015 IEEE 10th Conference on Industrial Electronics and Applications (ICIEA), Auckland, New Zealand, 15–17 June 2015; pp. 966–971.

Figure 1. Key uses of healthcare predictive analytics.

Figure 2. Dataset overview.

Figure 3. Proposed methodology flowchart.

Figure 4. Confusion matrix for hybrid prediction (Random Forest and NN).

Figure 5. Confusion matrix for hybrid predictive model XGBoost + NN.

Figure 6. Accuracy comparison of proposed hybrid models.

Table 1. Literature review table.

Authors	Methodology Used	Key Outcomes	Limitations
H. Khaloofi et al., 2021 [12]	Machine Learning: Regression Algorithms	Investigated COVID-19 data collected from Kaggle. Performed predictive analysis on different countries’ data.	The study shows comparative results for different algorithms and lists the best model based on RMSE. The accuracy parameter could also be integrated for the best decision-making on the basis of precision.
Z.E. Rasjid, 2021 [13]	Machine Learning: SVM, Decision Tree, RF, KNN.	Performed predictive analysis on patient health records; data includes different multiple diseases data.	The researchers mentioned accuracy results in the Conclusion section, showing an SVM with 59% accuracy, which is quite low compared to previous work. Also lacks mention about the dataset.
Shruti et al., 2023 [14]	Neural Network, SVM	A predictive, analytics-based study on healthcare data, with the conclusion that predictive analytics in healthcare improves outcomes, reduces costs, and enhances productivity, but challenges include data privacy, bias, and interpretation.	Lacks in showing the quantitative results, such as accuracy of model and any graphical performance of predictive analysis.
S. Vadivel et al., 2022 [15]	RNN, Dense Net, MLP, Mobile Net models	The paper proves that together with IoT data, ResNets can accurately predict COVID-19 cases with real-time updates of infection, recoveries, and deaths, surpassing the accuracy of other deep learning models.	There are some research limitations based on epidemic prediction, sustainable development, and forecasting with consideration to context.
S. Naik et al., 2024 [16]	Logistic Regression	The research proves that the use of logistic regression models in identifying diabetes, heart disease, breast cancer, and lung cancer guarantees early diagnoses and prompt managerial intercession.	The research shows good accuracy results but lacks in comparison to other models, as many other best-performing models exist that could perform more precisely.
S. Iqbal et al., 2021 [17]	LSTM, ResNet, KNN, SVM	The study achieved high accuracy (99.07%) and AUC (0.9984) using KNN-Cosine with GLCM features, with even better results (99.84% accuracy, 0.9999 AUC) using ResNet-101 and LSTM. LSTM-based methods outperformed hand-crafted feature extraction.	Researchers mentioned the research gap themselves; LSTM’s feed-forward nature and bit parity issues are limitations. Transfer learning and feature selection in the future could cover this research gap.
A. Barhate et al., 2024 [18]	Machine learning for disease prediction review study.	This review-based study states that machine learning and predictive analytics were coming into focus, because with their help diseases are now diagnosed in advance, treatment is planned individually, and the medical images are enhanced. These technologies are now popular in moving the health care system from the old model of disease-centered to patient-centered care and improve its predictive ability, as well as the chances of improving treatment.	Review very conventional in manner. More review papers must be proposed by taking AI-trained models for designing new models in healthcare predictive analytics.
A. Sundas et al., 2024 [19]	Deep learning integration with Categorical Cross Entropy (CCE) Optimization for prediction accuracy.	High accuracy and F-scores (>0.90), especially in forecasting urgent conditions. SPMR Local Predictive Model (LPM) excels in hypertensive individuals. The study gives real-time monitoring of chronic disease patients (e.g., hypertension, diabetes) both locally and via the cloud.	Scarcity of long-term data for chronic patients (hypertension, diabetes). Data imbalance in categories (e.g., emergency alerts) affecting performance metrics. Overfitting concerns with large, inconsistent datasets.
S.K. Puli et al., 2024 [20]	Review based on machine learning and deep learning.	The study shows that machine learning (ML) in healthcare is also being used for early diagnosis, individualized treatments, and resource management, and as a result also for enhanced patient satisfaction and lower expenses. But issues—for example, data privacy, model interpretability, and regulatory requirements—have to be solved, making it ethical and responsible usage. The future aims for a combination of new, innovative ideas like Explainable Artificial Intelligence and real-time healthcare.	The study shows the theoretical comparison of different studies, but it could have been much better if researchers would have performed a quantitative comparison of different papers as well.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).