Recent Trends in Machine Learning for Healthcare Big Data Applications: Review of Velocity and Volume Challenges
Abstract
1. Introduction
2. Background
2.1. Volume and Velocity in the Era of Machine Learning and Healthcare Big Data
2.2. Challenges Surrounding Large Volumes of Healthcare Big Data
2.3. Challenges Surrounding High Velocity Healthcare Big Data
3. Materials and Methods
3.1. Survey Methodology
3.2. Literature Selection and Sources
3.3. Inclusion and Exclusion Criteria
3.3.1. Inclusion Criteria
- Peer-reviewed articles discussing AI/ML applications in healthcare big data analytics.
- Studies that investigated distributed computing solutions (e.g., Apache Spark, federated learning, and neuromorphic computing) for scalable model training.
- Research with a focus on improving machine learning model velocity, scalability, model/data parallelism, and efficient utilization of resources and power.
- Research addressing challenges related to model velocity, volume, and optimization in large-scale healthcare big data systems.
3.3.2. Exclusion Criteria
- Disease-specific reviews and learning-model evaluations.
- Studies focused on general AI and machine learning applications without a healthcare context.
- Articles with limited technical or theoretical contributions.
- Articles that relied on outdated methods and lacked performance validation in real-world healthcare environments.
- Articles whose assessment was too weak to ensure the replicability of the proposed method.
- Articles with limited evaluation, such as no reported runtime or scalability assessment.
3.4. Categorization and Thematic Analysis
4. Literature Review
4.1. Efficient Techniques, Arithmetic Operations, and Improved Dimensionality Reduction
4.2. Advanced and Specialized Processing Hardware
4.3. Clustering and Parallel Processing Methods Frameworks
5. Discussion and Analysis
5.1. Role Analysis of Efficient Computations and Techniques in Healthcare Big Data
- Scalability: refers to the capacity of the proposed approach to maintain performance when applied to large-scale healthcare big data environments.
- Applicability to Healthcare Big Data: evaluates the potential for the approach to be adopted across diverse medical applications, particularly those involving high-volume and high-velocity data streams.
- Computational Resource Efficiency: assesses the computational cost and hardware requirements associated with the proposed methods, emphasizing whether the techniques truly align with the pursuit of computational efficiency in healthcare big data settings.
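As a rough illustration of how the scalability and computational resource efficiency criteria above could be probed empirically, the following minimal sketch times a training routine on progressively larger synthetic datasets and estimates how runtime grows with the number of samples. Everything in it is an assumption made for illustration and is not taken from the reviewed studies: the toy centroid classifier `train_centroid_classifier`, the synthetic data generator `make_synthetic`, and the helpers `time_training` and `scaling_exponent` are hypothetical names.

```python
import math
import random
import time


def train_centroid_classifier(rows, labels):
    """Toy stand-in for a learning step: one pass computing per-class feature means."""
    sums, counts = {}, {}
    for x, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def make_synthetic(n_rows, n_features, seed=0):
    """Generate random rows and binary labels (hypothetical data, not clinical)."""
    rng = random.Random(seed)
    rows = [[rng.random() for _ in range(n_features)] for _ in range(n_rows)]
    labels = [rng.randrange(2) for _ in range(n_rows)]
    return rows, labels


def time_training(train_fn, sizes, n_features=8):
    """Return (n_rows, wall-clock seconds) for each dataset size."""
    results = []
    for n in sizes:
        rows, labels = make_synthetic(n, n_features)
        start = time.perf_counter()
        train_fn(rows, labels)
        results.append((n, time.perf_counter() - start))
    return results


def scaling_exponent(results):
    """Least-squares slope of log(time) against log(n).

    A slope near 1 suggests roughly linear growth in runtime;
    a slope near 2 suggests roughly quadratic growth.
    """
    xs = [math.log(n) for n, _ in results]
    ys = [math.log(max(t, 1e-9)) for _, t in results]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    timings = time_training(train_centroid_classifier, [2_000, 8_000, 32_000])
    for n, seconds in timings:
        print(f"{n:>7} rows: {seconds:.4f} s")
    print(f"estimated scaling exponent: {scaling_exponent(timings):.2f}")
```

Measured timings depend on the machine and on background load, so the estimated exponent is only indicative; a real assessment would repeat each measurement several times and also record memory use.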
5.2. Role Analysis of Specialized Hardware in Healthcare Big Data
- Scalability: measures whether the proposed hardware-based approach is efficiently scalable in large-scale healthcare big data environments.
- Cost: measures the financial expenses associated with the deployment and maintenance of hardware-accelerated learning models in big data settings.
- Technical Difficulties: pertains to the complexity of implementing, configuring, scaling, and integrating the proposed approach in a healthcare big data setting.
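One common way to make the scalability criterion above measurable for hardware-accelerated or parallel approaches is through speedup and parallel efficiency. The definitions below are the standard ones from parallel computing and are offered only as an illustrative sketch; the symbols $T(p)$, $S(p)$, and $E(p)$ are introduced here as assumptions and are not metrics stated by the reviewed studies.

```latex
% T(p): wall-clock training time when using p processing units (assumed notation)
S(p) = \frac{T(1)}{T(p)}, \qquad E(p) = \frac{S(p)}{p}
```

Under these definitions, an approach scales well when $E(p)$ stays close to 1 as $p$ grows. For example, if training takes 120 s on one unit and 20 s on eight units, then $S(8) = 6$ and $E(8) = 0.75$.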
5.3. Role of Clustering and Parallel Processing in Healthcare Big Data
6. Research Gaps and Future Trends
6.1. Research Gaps and Future Directions in Efficient Computational Techniques
6.2. Research Gaps and Future Directions in Specialized Hardware Acceleration
6.3. Research Gaps and Future Directions in Parallel and Distributed Processing Frameworks
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Baalann, K.P.; Chandrasekar, V.S.A.; Kathirvel, M.; Chakraborty, T.; Vasanthi, R.K.; Ganesh, K.; Bhandari, A.; Prasanna, P.M.; Parthasarathy, S. AI innovations in anaesthesia: A systematic review of clinical application. Indian J. Clin. Anaesth. 2025, 12, 177–189.
- Khanra, S.; Dhir, A.; Islam, A.N.; Mäntymäki, M. Big data analytics in healthcare: A systematic literature review. Enterp. Inf. Syst. 2020, 14, 878–912.
- L’heureux, A.; Grolinger, K.; Elyamany, H.F.; Capretz, M.A.M. Machine learning with big data: Challenges and approaches. IEEE Access 2017, 5, 7776–7797.
- Batko, K.; Ślęzak, A. The use of big data analytics in healthcare. J. Big Data 2022, 9, 3.
- National Center for Biotechnology Information (NCBI). GenBank Statistics. Available online: https://www.ncbi.nlm.nih.gov/genbank/statistics/ (accessed on 22 January 2023).
- Santosh, K.C.; Ghosh, S. COVID-19 imaging tools: How big data is big? J. Med. Syst. 2021, 45, 71.
- Fatima, S. Improving healthcare outcomes through machine learning: Applications and challenges in big data analytics. Int. J. Adv. Res. Eng. Technol. Sci. 2024, 11, 2349–2819.
- Berros, N.; El Mendili, F.; Filaly, Y.; El Idrissi, Y.E.B.E. Enhancing digital health services with big data analytics. Big Data Cogn. Comput. 2023, 7, 64.
- Al-Sai, Z.A.; Husin, M.H.; Syed-Mohamad, S.M.; Abdin, R.M.S.; Damer, N.; Abualigah, L.; Gandomi, A.H. Explore big data analytics applications and opportunities: A review. Big Data Cogn. Comput. 2022, 6, 157.
- Miah, S.J.; Camilleri, E.; Vu, H.Q. Big data in healthcare research: A survey study. J. Comput. Inf. Syst. 2022, 62, 480–492.
- Khoei, T.T.; Singh, A. Data reduction in big data: A survey of methods, challenges and future directions. Int. J. Data Sci. Anal. 2025, 20, 1643–1682.
- Tsai, C.-W.; Lai, C.-F.; Chao, H.-C.; Vasilakos, A.V. Big data analytics: A survey. J. Big Data 2015, 2, 21.
- James, R. Out of the box: Big data needs the information profession—The importance of validation. Bus. Inf. Rev. 2014, 31, 118–121.
- Miller, H.D. From volume to value: Better ways to pay for health care. Health Aff. 2009, 28, 1418–1428.
- Guo, C.; Chen, J. Big data analytics in healthcare. In Knowledge Technology and Systems: Toward Establishing Knowledge Systems Science; Springer Nature: Singapore, 2023; pp. 27–70.
- Ohlhorst, F.J. Big Data Analytics: Turning Big Data into Big Money; John Wiley & Sons: Hoboken, NJ, USA, 2012.
- Laney, D. 3D data management: Controlling data volume, velocity, and variety. META Group Res. Note 2001, 6, 1.
- Malekloo, A.; Ozer, E.; AlHamaydeh, M.; Girolami, M. Machine learning and structural health monitoring overview with emerging technology and high-dimensional data source highlights. Struct. Health Monit. 2022, 21, 1906–1955.
- Palanisamy, V.; Thirunavukarasu, R. Implications of big data analytics in developing healthcare frameworks—A review. J. King Saud Univ. Comput. Inf. Sci. 2019, 31, 415–425.
- George, M.M.; Rasmi, P.S. Performance comparison of Apache Hadoop and Apache Spark for COVID-19 data sets. In Proceedings of the 4th International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, 20–22 January 2022; pp. 1659–1665.
- Kumari, S.; Muthulakshmi, P. High-performance computation in big data analytics. In International Conference on Intelligent Systems Design and Applications; Springer: Cham, Switzerland, 2022; pp. 543–553.
- Alanazi, A. Using machine learning for healthcare challenges and opportunities. Inform. Med. Unlocked 2022, 30, 100924.
- Lee, C.H.; Yoon, H.-J. Medical big data: Promise and challenges. Kidney Res. Clin. Pract. 2017, 36, 3–11.
- Sagi, O.; Rokach, L. Ensemble learning: A survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2018, 8, e1249.
- Alhenawi, E.; Al-Sayyed, R.; Hudaib, A.; Mirjalili, S. Feature selection methods on gene expression microarray data for cancer classification: A systematic review. Comput. Biol. Med. 2022, 140, 105051.
- Adadi, A. A survey on data-efficient algorithms in the big data era. J. Big Data 2021, 8, 24.
- Tchapga, C.T.; Mih, T.A.; Kouanou, A.T.; Fonzin, T.F.; Fogang, P.K.; Mezatio, B.A.; Tchiotsop, D. Biomedical image classification in a big data architecture using machine learning algorithms. J. Healthc. Eng. 2021, 2021, 9998819.
- Rehman, A.; Naz, S.; Razzak, I. Leveraging big data analytics in healthcare enhancement: Trends, challenges, and opportunities. Multimed. Syst. 2022, 28, 1339–1371.
- An, Q.; Rahman, S.; Zhou, J.; Kang, J.J. A comprehensive review on machine learning in the healthcare industry: Classification, restrictions, opportunities and challenges. Sensors 2023, 23, 4178.
- Azmi, J.; Arif, M.; Nafis, M.T.; Alam, M.A.; Tanweer, S.; Wang, G. A systematic review on machine learning approaches for cardiovascular disease prediction using medical big data. Med. Eng. Phys. 2022, 105, 103825.
- Altman, M.B.; Wan, W.; Hosseini, A.S.; Nowdeh, S.A.; Alizadeh, M. Machine learning algorithms for FPGA implementation in biomedical engineering applications: A review. Heliyon 2024, 10, 4.
- Zeydan, E.; Arslan, S.S.; Liyanage, M. Managing distributed machine learning lifecycle for healthcare data in the cloud. IEEE Access 2024, 12, 115750–115774.
- Khalsan, M.; Machado, L.R.; Al-Shamery, E.S.; Ajit, S.; Anthony, K.; Mu, M.; Agyeman, M.O. A survey of machine learning approaches applied to gene expression analysis for cancer prediction. IEEE Access 2022, 10, 27522–27534.
- Zhang, X.-D. A Matrix Algebra Approach to Artificial Intelligence; Springer: Singapore, 2020; p. 803.
- Ashraf, M.; Gupta, D.; Khanna, A.; Bhattacharyya, S.; Hassanien, A.E.; Anand, S.; Jaiswal, A. Prediction of cardio-vascular disease through cutting-edge deep learning technologies: An empirical study based on TensorFlow, PyTorch and Keras. In International Conference on Innovative Computing and Communications; Gupta, D., Khanna, A., Bhattacharyya, S., Hassanien, A.E., Anand, S., Jaiswal, A., Eds.; Advances in Intelligent Systems and Computing; Springer: Singapore, 2021; Volume 1165, pp. 239–255.
- Dai, H.; Peng, X.; Shi, X.; He, L.; Xiong, Q.; Jin, H. Reveal training performance mystery between TensorFlow and PyTorch in the single GPU environment. Sci. China Inf. Sci. 2022, 65, 172101.
- Kimm, H.; Paik, I.; Kimm, H. Performance comparison of TPU, GPU, CPU on Google Colaboratory over distributed deep learning. In Proceedings of the IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), Singapore, 20–23 December 2021; pp. 312–319.
- Nikolić, G.S.; Dimitrijević, B.R.; Nikolić, T.R.; Stojčev, M.K. A survey of three types of processing units: CPU, GPU, and TPU. In Proceedings of the 57th International Scientific Conference on Information, Communication and Energy Systems and Technologies (ICEST), Ohrid, North Macedonia, 16–18 June 2022; pp. 1–6.
- Chung, I.-H.; Sainath, T.N.; Ramabhadran, B.; Picheny, M.; Gunnels, J.; Austel, V.; Chauhari, U.; Kingsbury, B. Parallel deep neural network training for big data on Blue Gene/Q. IEEE Trans. Parallel Distrib. Syst. 2016, 28, 1703–1714.
- Dhouibi, M.; Ben Salem, A.K.; Saidi, A.; Ben Saoud, S. Accelerating deep neural networks implementation: A survey. IET Comput. Digit. Tech. 2021, 15, 79–96.
- Khalilian, M.; Boroujeni, F.Z.; Mustapha, N.; Sulaiman, M.N. K-means divide and conquer clustering. In Proceedings of the International Conference on Computer and Automation Engineering, Bangkok, Thailand, 8–10 March 2009; pp. 306–309.
- Imran, S.; Mahmood, T.; Morshed, A.; Sellis, T. Big data analytics in healthcare—A systematic literature review and roadmap for practical implementation. IEEE/CAA J. Autom. Sin. 2020, 8, 1–22.
- Qiu, J.; Wu, Q.; Ding, G.; Xu, Y.; Feng, S. A survey of machine learning for big data processing. EURASIP J. Adv. Signal Process. 2016, 2016, 67.
- Slavakis, K.; Kim, S.-J.; Mateos, G.; Giannakis, G.B. Stochastic approximation vis-a-vis online learning for big data analytics [lecture notes]. IEEE Signal Process. Mag. 2014, 31, 124–129.
- Ta, V.-D.; Liu, C.-M.; Nkabinde, G.W. Big data stream computing in healthcare real-time analytics. In Proceedings of the IEEE International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), Chengdu, China, 5–7 July 2016; pp. 37–42.
- Shahraki, A.; Abbasi, M.; Taherkordi, A.; Jurcut, A.D. A comparative study on online machine learning techniques for network traffic streams analysis. Comput. Netw. 2022, 207, 108836.
- Luo, Y.; Yin, L.; Bai, W.; Mao, K. An appraisal of incremental learning methods. Entropy 2020, 22, 1190.
- He, J.; Mao, R.; Shao, Z.; Zhu, F. Incremental learning in online scenario. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 13926–13935.
- Senthil, R.; Anand, T.; Somala, C.S.; Saravanan, K.M. Bibliometric analysis of artificial intelligence in healthcare research: Trends and future directions. Future Healthc. J. 2024, 11, 100182.
- Ganatra, H.A. Machine learning in pediatric healthcare: Current trends, challenges, and future directions. J. Clin. Med. 2025, 14, 807.
- Ahmed, A.; Xi, R.; Hou, M.; Shah, S.A.; Hameed, S. Harnessing big data analytics for healthcare: A comprehensive review of frameworks, implications, applications, and impacts. IEEE Access 2023, 11, 112891–112928.
- Domenteanu, A.; Cibu, B.; Delcea, C. Mapping the research landscape of Industry 5.0 from a machine learning and big data analytics perspective: A bibliometric approach. Sustainability 2024, 16, 2764.
- Mayer, R.; Jacobsen, H.-A. Scalable deep learning on distributed infrastructures: Challenges, techniques, and tools. ACM Comput. Surv. 2020, 53, 1–37.
- Lwakatare, L.E.; Raj, A.; Crnkovic, I.; Bosch, J.; Olsson, H.H. Large-scale machine learning systems in real-world industrial settings: A review of challenges and solutions. Inf. Softw. Technol. 2020, 127, 106368.
- Le, T.T.; Fu, W.; Moore, J.H. Scaling tree-based automated machine learning to biomedical big data with a feature set selector. Bioinformatics 2020, 36, 250–256.
- Zheng, X.; Li, P.; Wu, X. Data stream classification based on extreme learning machine: A review. Big Data Res. 2022, 30, 100356.
- Sangeetha, G.; Balasubramanian, V. HEL-MCNN: Hybrid extreme learning modified convolutional neural network for allocating suitable donors for patients with minimized waiting time. Expert Syst. Appl. 2023, 232, 120673.
- Lahoura, V.; Singh, H.; Aggarwal, A.; Sharma, B.; Mohammed, M.A.; Damaševičius, R.; Kadry, S.; Cengiz, K. Cloud computing-based framework for breast cancer diagnosis using extreme learning machine. Diagnostics 2021, 11, 241.
- Malik, H.; Anees, T.; Naeem, A.; Naqvi, R.A.; Loh, W.-K. Blockchain-federated and deep-learning-based ensembling of capsule network with incremental extreme learning machines for classification of COVID-19 using CT scans. Bioengineering 2023, 10, 203.
- Rajendran, S.; Khalaf, O.I.; Alotaibi, Y.; Alghamdi, S. MapReduce-based big data classification model using feature subset selection and hyperparameter tuned deep belief network. Sci. Rep. 2021, 11, 24138.
- Goswami, M.; Mohanty, S.; Pattnaik, P.K. Optimization of machine learning models through quantization and data bit reduction in healthcare datasets. Frankl. Open 2024, 8, 100136.
- Sharada, K.A.; Sushma, K.; Muthukumaran, V.; Mahesh, T.; Swapna, B.; Roopashree, S. High ECG diagnosis rate using novel machine learning techniques with distributed arithmetic (DA)-based gated recurrent units. Microprocess. Microsyst. 2023, 98, 104796.
- Rahman, M.M.; Al-Amin, M.; Hossain, J. Machine learning models for chronic kidney disease diagnosis and prediction. Biomed. Signal Process. Control 2024, 87, 105368.
- Narwane, S.V.; Sawarkar, S.D. Is handling unbalanced datasets for machine learning uplift system performance? A case of diabetic prediction. Diabetes Metab. Syndr. Clin. Res. Rev. 2022, 16, 102609.
- Kumar, V.; Biswas, S.; Rajput, D.S.; Patel, H.; Tiwari, B. PCA-based incremental extreme learning machine (PCA-IELM) for COVID-19 patient diagnosis using chest X-ray images. Comput. Intell. Neurosci. 2022, 2022, 9107430.
- Hoozemans, J.; Peltenburg, J.; Nonnemacher, F.; Hadnagy, A.; Al-Ars, Z.; Hofstee, H.P. FPGA acceleration for big data analytics: Challenges and opportunities. IEEE Circuits Syst. Mag. 2021, 21, 30–47.
- Wang, L.; Alexander, C.A. Big data analytics in medical engineering and healthcare: Methods, advances, and challenges. J. Med. Eng. Technol. 2020, 44, 267–283.
- Sanaullah, A.; Yang, C.; Alexeev, Y.; Yoshii, K.; Herbordt, M.C. Real-time data analysis for medical diagnosis using FPGA-accelerated neural networks. BMC Bioinform. 2018, 19 (Suppl. S19), 19–31.
- Sharma, Y.; Tiwari, N.K.; Upadhyay, V.K. EffSVMNet: An efficient hybrid neural network for improved skin disease classification. Smart Health 2024, 34, 100520.
- Sakthivel, R.; Thaseen, I.S.; Vanitha, M.; Deepa, M.; Angulakshmi, M.; Mangayarkarasi, R.; Mahendran, A.; Alnumay, W.; Chatterjee, P. An efficient hardware architecture based on an ensemble of deep learning models for COVID-19 prediction. Sustain. Cities Soc. 2022, 80, 103713.
- Cheng, X.; Liu, D.; Lu, J.; Wei, L.; Hu, A.; Lei, J.; Zou, Z.; Zou, X.; Jiang, Q. Efficient hardware design of a deep U-net model for pixel-level ECG classification in healthcare devices. Microelectron. J. 2022, 126, 105492.
- Soffer, S.; Ben-Cohen, A.; Shimon, O.; Amitai, M.M.; Greenspan, H.; Klang, E. Convolutional neural networks for radiologic images: A radiologist’s guide. Radiology 2019, 290, 590–606.
- Draelos, R.L.; Dov, D.; Mazurowski, M.A.; Lo, J.Y.; Henao, R.; Rubin, G.D.; Carin, L. Machine-learning-based multiple abnormality prediction with large-scale chest computed tomography volumes. Med. Image Anal. 2021, 67, 101857.
- Aruna, V.B.K.L.; Chitra, E.; Padmaja, M. Accelerating deep convolutional neural network on FPGA for ECG signal classification. Microprocess. Microsyst. 2023, 103, 104939.
- Yacoub, M.H.; Ismail, S.M.; Said, L.A.; Madian, A.H.; Radwan, A.G. Reconfigurable hardware implementation of K-nearest neighbor algorithm on FPGA. AEÜ–Int. J. Electron. Commun. 2024, 173, 154999.
- Shafqat, S.; Kishwer, S.; Rasool, R.U.; Qadir, J.; Amjad, T.; Ahmad, H.F. Big data analytics enhanced healthcare systems: A review. J. Supercomput. 2020, 76, 1754–1799.
- Kumar, S.; Singh, M. Big data analytics for healthcare industry: Impact, applications, and tools. Big Data Min. Anal. 2018, 2, 48–57.
- Abdel-Fattah, M.A.; Othman, N.A.; Goher, N. Predicting chronic kidney disease using hybrid machine learning based on Apache Spark. Comput. Intell. Neurosci. 2022, 2022, 9898831.
- Guan, P.; Yu, K.; Wei, W.; Tan, Y.; Wu, J. Big data analytics on lung cancer diagnosis framework with deep learning. IEEE/ACM Trans. Comput. Biol. Bioinf. 2023, 21, 757–768.
- Sukanya, J.; Gandhi, K.R.; Palanisamy, V. An assessment of machine learning algorithms for healthcare analysis based on improved MapReduce. Adv. Eng. Softw. 2022, 173, 103285.
- Albattah, W.; Khan, R.U.; Alsharekh, M.F.; Khasawneh, S.F. Feature selection techniques for big data analytics. Electronics 2022, 11, 3177.
- Xing, W.; Bei, Y. Medical health big data classification based on KNN classification algorithm. IEEE Access 2019, 8, 28808–28819.
- Jaiswal, V.; Saurabh, P.; Lilhore, U.K.; Pathak, M.; Simaiya, S.; Dalal, S. A breast cancer risk prediction and classification model with ensemble learning and big data fusion. Decis. Anal. J. 2023, 8, 100298.
- Orlu, G.U.; Abdullah, R.B.; Zaremohzzabieh, Z.; Jusoh, Y.Y.; Asadi, S.; Qasem, Y.A.M.; Nor, R.N.H.; Mohd Nasir, W.M.H. A Systematic Review of Literature on Sustaining Decision-Making in Healthcare Organizations Amid Imperfect Information in the Big Data Era. Sustainability 2023, 15, 15476.
- Vettoruzzo, A.; Bouguelia, M.-R.; Vanschoren, J.; Rögnvaldsson, T.; Santosh, K.C. Advances and challenges in meta-learning: A technical review. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 4763–4779.
- Rafiei, A.; Moore, R.; Jahromi, S.; Hajati, F.; Kamaleswaran, R. Meta-learning in healthcare: A survey. SN Comput. Sci. 2024, 5, 791.
- He, X.; Zhao, K.; Chu, X. AutoML: A survey of the state-of-the-art. Knowl. Based Syst. 2021, 212, 106622.
- Barbudo, R.; Ventura, S.; Romero, J.R. Eight years of AutoML: Categorization, review, and trends. Knowl. Inf. Syst. 2023, 65, 5097–5149.
- Yuan, H.; Yu, K.; Xie, F.; Liu, M.; Sun, S. Automated machine learning with interpretation: A systematic review of methodologies and applications in healthcare. Med. Adv. 2024, 2, 205–237.
- Parimanam, K.; Lakshmanan, L.; Palaniswamy, T. Hybrid optimization-based learning technique for multi-disease analytics from healthcare big data using optimal pre-processing, clustering, and classifier. Concurr. Comput. Pract. Exp. 2022, 34, e6986.
- Cao, B.; Zhao, J.; Lv, Z.; Liu, X.; Yang, S.; Kang, X.; Kang, K. Distributed parallel particle swarm optimization for multi-objective and many-objective large-scale optimization. IEEE Access 2017, 5, 8214–8221.
- Wang, X.; Wang, F.; He, Q.; Guo, Y. A multi-swarm optimizer with a reinforcement learning mechanism for large-scale optimization. Swarm Evol. Comput. 2024, 86, 101486.
- Bhattacharya, M.; Islam, R.; Abawajy, J. Evolutionary optimization: A big data perspective. J. Netw. Comput. Appl. 2016, 59, 416–426.
- Yang, T.; Deng, Y.; Yu, B.; Qian, Y.; Dai, J. Local feature selection for large-scale data sets with limited labels. IEEE Trans. Knowl. Data Eng. 2022, 35, 7152–7163.
- Liyanage, Y.W.; Zois, D.-S.; Chelmis, C. Dynamic instance-wise joint feature selection and classification. IEEE Trans. Artif. Intell. 2021, 2, 169–184.
- Zhu, X.; Song, Y.; Wang, P.; Li, L.; Fu, Z. Data-driven adaptive and stable feature selection method for large-scale industrial systems. Control Eng. Pract. 2024, 153, 106097.
- Sakivama, K.; Kato, S.; Ishikawa, Y.; Hori, A.; Monrroy, A. Deep learning on large-scale multicore clusters. In Proceedings of the International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), Lyon, France, 24–27 September 2018; pp. 314–321.
- Ragala, R.; Kumar, G. Rank-based pseudoinverse computation in extreme learning machine for large datasets. Int. J. Innov. Technol. Explor. Eng. 2019, 8, 1341–1346.
- Zhao, S.-X.; Wang, X.-Z.; Wang, L.-Y.; Hu, J.-M.; Li, W.-P. Analysis on fast training speed of extreme learning machine and replacement policy. Int. J. Wirel. Mob. Comput. 2017, 13, 314–322.
- Yang, B. Application of matrix decomposition in machine learning. In Proceedings of the IEEE International Conference on Computer Science, Electronics and Information Engineering and Intelligent Control Technology (CEI), Fuzhou, China, 24–26 September 2021; pp. 133–137.
- Dereziński, M.; Mahoney, M.W. Recent and upcoming developments in randomized numerical linear algebra for machine learning. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Barcelona, Spain, 25–29 August 2024; pp. 6470–6479.
- Ahmad, A.; Pasha, M.A. Optimizing hardware-accelerated general matrix–matrix multiplication for CNNs on FPGAs. IEEE Trans. Circuits Syst. II Express Briefs 2020, 67, 2692–2696.
- Robinson, T.; Harkin, J.; Shukla, P. Hardware acceleration of genomics data analysis: Challenges and opportunities. Bioinformatics 2021, 37, 1785–1795.
- Antunes, R.S.; da Costa, C.A.; Küderle, A.; Yari, I.A.; Eskofier, B. Federated learning for healthcare: Systematic review and architecture proposal. ACM Trans. Intell. Syst. Technol. 2022, 13, 1–23.
- Bharati, S.; Mondal, M.R.H.; Podder, P.; Prasath, V.B.S. Federated learning: Applications, challenges, and future directions. Int. J. Hybrid Intell. Syst. 2022, 18, 19–35.
- Joshi, M.; Pal, A.; Sankarasubbu, M. Federated learning for healthcare domain—Pipeline, applications, and challenges. ACM Trans. Comput. Healthc. 2022, 3, 1–36.
- Li, H.; Li, C.; Wang, J.; Yang, A.; Ma, Z.; Zhang, Z.; Hua, D. Review on the security of federated learning and its application in healthcare. Future Gener. Comput. Syst. 2023, 144, 271–290.
- Tian, F.; Yang, J.; Zhao, S.; Sawan, M. NeuroCARE: A generic neuromorphic edge computing framework for healthcare applications. Front. Neurosci. 2023, 17, 1093865.
- Gautam, A.; Sharma, S. Artificial narrow intelligence-inspired neuromorphic computing for logic operations in healthcare appliances. In Proceedings of the 7th International Conference on Circuit Power and Computing Technologies (ICCPCT), Kollam, India, 8–9 August 2024; Volume 1, pp. 714–719.
- Goyal, S.R. Neuromorphic system for real-time healthcare applications. In Primer to Neuromorphic Computing; Academic Press: Cambridge, MA, USA, 2025; pp. 83–96.
- Cohen, S.; Leve, F.; Trannois, H.; Badreddine, W.; Legendre, F. A decision-making model based on spiking neural network (SNN) for remote patient monitoring. Int. J. Mach. Learn. Comput. 2023, 13, 82–90.
- Yamazaki, K.; Vo-Ho, V.-K.; Bulsara, D.; Le, N. Spiking neural networks and their applications: A review. Brain Sci. 2022, 12, 863.
- Shahid, A.; Mushtaq, M. A survey comparing specialized hardware and evolution in TPUs for neural networks. In Proceedings of the IEEE 23rd International Multitopic Conference (INMIC), Bahawalpur, Pakistan, 5–7 November 2020; pp. 1–6.
- Jouppi, N.P.; Young, C.; Patil, N.; Patterson, D.; Agrawal, G.; Bajwa, R.; Bates, S.; Bhatia, S.; Boden, N.; Borchers, A.; et al. In-Datacenter Performance Analysis of a Tensor Processing Unit. Proc. Int. Symp. Comput. Archit. 2017, 45, 1–12.
- Azghadi, M.R.; Lammie, C.; Eshraghian, J.K.; Payvand, M.; Donati, E.; Linares-Barranco, B.; Indiveri, G. Hardware implementation of deep network accelerators towards healthcare and biomedical applications. IEEE Trans. Biomed. Circuits Syst. 2020, 14, 1138–1159.
- Theodorakopoulos, L.; Theodoropoulou, A.; Stamatiou, Y. A State-of-the-Art Review in Big Data Management Engineering: Real-Life Case Studies, Challenges, and Future Research Directions. Eng 2024, 5, 1266–1297.
- Huang, H.; Chow, E. Exploring the design space of distributed parallel sparse matrix–multiple vector multiplication. IEEE Trans. Parallel Distrib. Syst. 2024, 35, 1977–1988.
- Kang, H.; Kwon, H.C.; Kim, D. HPMaX: Heterogeneous parallel matrix multiplication using CPUs and GPUs. Computing 2020, 102, 2607–2631.
- Liu, J.; Liang, X.; Ruan, W.; Zhang, B. High-performance medical data processing technology based on distributed parallel machine learning algorithm. J. Supercomput. 2022, 78, 5933–5956.
- Sharma, S.K.; Dixit, R.J. Applications of parallel data processing for biomedical imaging. In Applications of Parallel Data Processing for Biomedical Imaging; IGI Global: Hershey, PA, USA, 2024; pp. 1–24.
- Misra, C.; Bhattacharya, S.; Ghosh, S.K. STARK: Fast and scalable Strassen’s matrix multiplication using Apache Spark. IEEE Trans. Big Data 2020, 8, 699–710.
- Foldi, T.; von Csefalvay, C.; Perez, N.A. JAMPI: Efficient matrix multiplication in Spark using barrier execution mode. Big Data Cogn. Comput. 2020, 4, 32.
- Mishra, R. Parallel computing techniques for accelerating machine learning algorithms on big data. In Proceedings of the International Conference on Power, Energy, Environment and Intelligent Control (PEEIC), Greater Noida, India, 19–23 December 2023; pp. 669–672.
- Jini, S.; Indra, N.C. Understanding the impact of data parallelism on neural network classification. Opt. Mem. Neural Netw. 2022, 31, 107–121.
- Salih, S.Q.; Alsewari, A.A. A new algorithm for normal and large-scale optimization problems: Nomadic people optimizer. Neural Comput. Appl. 2020, 32, 10359–10386.


| Reference | Aim | Findings |
|---|---|---|
| [25] | Investigate healthcare feature selection methods, with a focus on high-dimensional microarray gene expression datasets for cancer. The study evaluated 132 publications from nine different research directions. | Based on an assessment from six key perspectives, the study highlighted that no single feature selection algorithm is universally effective. It also emphasized that further research is required to address the computational intensity and time consumption of feature selection methods for high-dimensional datasets. |
| [26] | Explore how applications of machine learning algorithms in healthcare depend on the availability of medical datasets, given that data acquisition faces many challenges. | The study identified several potential research avenues that call for more research on data-efficient machine learning algorithms. The review suggested several solutions for the effective classification of small medical datasets and further suggested ML pipelines to address the challenges of large volumes of healthcare datasets. |
| [27] | Review, investigate, and evaluate the role of ML in the classification of biomedical image datasets, with emphasis on healthcare big data applications. The review also investigated the performance efficiency of ML when using big data frameworks such as Apache Spark. | Efficient and streamlined diagnosis of a substantial volume of healthcare biomedical images requires the integration of ML and big data technologies. The proposed workflow (deep learning and Apache Spark) offers several advantages, such as faster medical data queries, improved training speed, and efficient management of vast healthcare datasets (both structured and unstructured). |
| [28] | Explore and investigate the applications and utilization of big data analytics techniques in healthcare, emphasizing the early detection of diseases, prediction, and prevention. The study examined five healthcare sub-disciplines: medical signal analytics, bioinformatics, image analysis and informatics, and public and clinical informatics. | The review highlighted several key applications of big data in healthcare, including personalized medicine, clinical decision support, operational optimization, and cost-effectiveness analysis. It demonstrates how big data analytics facilitates early patient identification for timely intervention, thereby enhancing clinical outcomes across medical domains. The study also examined several key challenges in adopting big data analytics in healthcare, including privacy concerns, efficient data processing, and data heterogeneity, while emphasizing the need for further research and exploration. |
| [29] | Review and examine the effectiveness of supervised and unsupervised machine learning on time series healthcare datasets, such as heart rate datasets. The review also analyzed the advantages and disadvantages of utilizing unsupervised methods when dataset labels are unavailable. | Although the reviewed methods were effective in predictive analytics and diagnosis across several scenarios, the authors stressed the need for a collaborative approach between machine learning and data analytics to ensure effective integration of ML algorithms into healthcare practices. |
| [30] | Review and analyze the performance of 41 research articles aimed at the prediction of cardiovascular diseases (CVD) using healthcare big data. | The authors stressed that the performance of the reviewed ML algorithms notably degrades when applied to larger sample sizes. In addition, many studies lacked consistent clinical relevance, making it more challenging to apply the proposed methods in real-world healthcare scenarios. Based on several issues noted in the reviewed literature, the authors also suggested the need for efficient selection and hyper-parameter tuning techniques to enhance the prediction accuracy of CVD. |
| [31] | Examine a range of ML algorithms implemented on field-programmable gate arrays (FPGAs) and hybrid system-on-a-chip (SoC) platforms for the real-time classification required by high-velocity healthcare applications. The study primarily focused on tackling scalability challenges posed by embedded systems for biomedical applications, such as power, limited memory, and network sizes and topologies. | The study emphasized the real-time performance gain and lower energy consumption compared to traditional implementations of the nine reviewed ML algorithms. However, in addition to requiring low-level programming skills, the authors stressed that many existing embedded systems are not designed for general scalable purposes and lack the flexibility needed to accommodate healthcare big data applications. New design patterns for ML algorithms and embedded systems (such as flexible FPGA architectures) are required to solve such problems. |
| [32] | Explore the role of cloud infrastructure and machine learning algorithms in the lifecycle management of ML in healthcare and biomedical data. The study comprehensively reviewed the state-of-the-art architectural decisions necessary to ensure data privacy, security, and efficient management of AI-driven healthcare systems. | The study highlighted several critical factors in realizing data-driven decision-making in healthcare big data. Key findings include distributed learning across decentralized structures for efficient high-velocity processing, data pipelines that ensure effective learning from large volumes of medical data and mitigate ML bottlenecks, and the integration of federated learning as a key aspect in enhancing medical collaboration, privacy, lifecycle management, and efficient AI-driven healthcare applications. |

| Ref. | Focus | Scalability | Healthcare Big Data Applications | Computational Resources Requirement | Other Limitations |
|---|---|---|---|---|---|
| [54] | Volume | ●●●●●● | ●●●●●● | ●●●●●● | |
| [57] | Velocity | ●○○○○○ | ●●○○○○ | ●●●○○○ | |
| [58] | Velocity | ●●●●●● | ●●●○○○ | ●●●●●● | |
| [59] | Velocity | ●●○○○○ | ●●○○○○ | ●●○○○○ | |
| [60] | Volume | ●●●●●● | ●●○○○○ | ●●○○○○ | |
| [61] | Velocity | ●●●●●● | ●●●●●● | ●○○○○○ | |
| [62] | Velocity | ●○○○○○ | ●○○○○○ | ●●○○○○ | |
| [63] | Volume | ●●○○○○ | ●●○○○○ | ●●●○○○ | |
| [64] | Volume | ●○○○○○ | ●○○○○○ | ●●○○○○ | |
| [65] | Velocity | ●●●●●● | ●●●○○○ | ●●●●●● | |

| Ref. | Focus | Scalability | Cost | Technical Difficulties | Other Limitations |
|---|---|---|---|---|---|
| [69] | Velocity | ●●●●●● | ●●●●○○ | ●●●●○○ | |
| [70] | Velocity | ●○○○○○ | ●○○○○○ | ●●●●●● | |
| [71] | Velocity | ●●●●○○ | ●○○○○○ | ●●●○○○ | |
| [73] | Volume | ●●●●●● | ●●●●○○ | ●●●●○○ | |
| [74] | Velocity | ●○○○○○ | ●○○○○○ | ●●●○○○ | |
| [75] | Velocity | ●○○○○○ | ●○○○○○ | ●●●○○○ | |

| Ref. | Focus | Scalability | Cost | Technical Difficulties | Other Limitations |
|---|---|---|---|---|---|
| [78] | Volume | ●●●●●● | ●○○○○○ | ●●○○○○ | |
| [79] | Velocity | ●●●●●● | ●●●○○○ | ●●●●○○ | |
| [80] | Velocity | ●●●●●○ | ●○○○○○ | ●○○○○○ | |
| [81] | Volume | ●●●●●● | ●○○○○○ | ●●○○○○ | |
| [82] | Velocity | ●●●●○○ | ●○○○○○ | ●○○○○○ | |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Khudhur, D.Y.; Shibghatullah, A.S.; Shaker, K.; Abdul Latif, A.; Muda, Z.C. Recent Trends in Machine Learning for Healthcare Big Data Applications: Review of Velocity and Volume Challenges. Algorithms 2025, 18, 772. https://doi.org/10.3390/a18120772

