Next-Generation Machine Learning in Healthcare Fraud Detection: Current Trends, Challenges, and Future Research Directions

Razzaq, Kamran; Shah, Mahmood

doi:10.3390/info16090730


by Kamran Razzaq * and Mahmood Shah *

Newcastle Business School, The University of Northumbria, Newcastle upon Tyne NE1 8ST, UK

* Authors to whom correspondence should be addressed.

Information 2025, 16(9), 730; https://doi.org/10.3390/info16090730

Submission received: 28 June 2025 / Revised: 6 August 2025 / Accepted: 22 August 2025 / Published: 25 August 2025

(This article belongs to the Special Issue Emerging Applications of Machine Learning in Healthcare, Industry, and Beyond)


Abstract

The growing complexity and size of healthcare systems have rendered fraud detection increasingly challenging; however, the current literature lacks a holistic view of the latest machine learning (ML) techniques with practical implementation concerns. The present study addresses this gap by highlighting the importance of machine learning (ML) in preventing and mitigating healthcare fraud, evaluating recent advancements, investigating implementation barriers, and exploring future research dimensions. To further address the limited research on the evaluation of machine learning (ML) and hybrid approaches, this study considers a broad spectrum of ML techniques, including supervised ML, unsupervised ML, deep learning, and hybrid ML approaches such as SMOTE-ENN, explainable AI, federated learning, and ensemble learning. The study also explored their potential use in enhancing fraud detection in imbalanced and multidimensional datasets. A significant finding of the study was the identification of commonly employed datasets, such as Medicare, the List of Excluded Individuals and Entities (LEIE), and Kaggle datasets, which serve as a baseline for evaluating machine learning (ML) models. The study’s findings comprehensively identify the challenges of employing machine learning (ML) in healthcare systems, including data quality, system scalability, regulatory compliance, and resource constraints. The study provides actionable insights, such as model interpretability to enable regulatory compliance and federated learning for confidential data sharing, which is particularly relevant for policymakers, healthcare providers, and insurance companies that intend to deploy a robust, scalable, and secure fraud detection infrastructure. The study presents a comprehensive framework for enhancing real-time healthcare fraud detection through self-learning, interpretable, and safe machine learning (ML) infrastructures, integrating theoretical advancements with practical application needs.

Keywords:

machine learning; healthcare; cybercrimes; frauds; medical databases

1. Introduction

Healthcare fraud, waste and abuse (FWA) is a significant and growing threat to the financial health and integrity of healthcare systems. These illicit activities result in substantial economic losses, with estimates indicating that billions of dollars are lost annually [1]. The impact of this financial drain extends beyond increasing healthcare costs for individuals and institutions; it often diverts critically needed resources away from patient care, which further exacerbates inequities in access to services and erodes public trust in the healthcare system. For instance, the National Health Care Anti-Fraud Association (NHCAA) estimates that healthcare fraud costs the United States USD 60–250 billion a year [2]. In one fraud case registered at the U.S. Attorney’s Office in the Middle District of North Carolina, a healthcare company falsely claimed over USD 100 million through up-coded billing and fake diagnoses [3]. On the prevention side, according to Sweeney [4], the machine learning-based fraud prevention system implemented at the Centers for Medicare & Medicaid Services (CMS) has saved the federal government USD 1.5 billion since the programme launched in 2011.

These financial losses underline the need for more efficient and sophisticated detection and prevention of fraudulent activities. At the same time, the ongoing digitisation of healthcare processes and the colossal amount of data it generates open new avenues for increasingly complex fraud schemes.

Healthcare fraud is diverse, vast in scope, and constantly evolving. Schemes include performing unnecessary genetic or COVID-19 testing [5], claiming for behavioural health services that were not provided [6], wholly fictitious claims [7], the use of cloned or falsified medical records to support fraudulent billing [8], upcoding (billing for more expensive services than rendered) [9], double-billing for the same service [10], and duplicate claims for the same service [11]. The sheer variety and ingenuity of these fraudulent activities underscore the shortcomings of traditional detection methods, which typically resort to static rules and manual monitoring [12].

Traditionally, healthcare fraud detection has relied on rule-based systems and manual audits, but these approaches are losing ground against increasingly complex and expansive fraud [13]. They are often time-consuming, resource-intensive, and prone to human error. In addition, they are inherently reactive: they struggle to discover new fraud patterns and to respond to perpetrators’ constantly evolving methods. The static nature of these systems makes them vulnerable, as fraudsters can learn the rules and adapt their methods to bypass them, highlighting the need for a more dynamic and intelligent approach to fraud detection.

To overcome these limitations, machine learning (ML) has emerged as a powerful and promising approach for automating and improving healthcare fraud detection [14]. ML is a branch of artificial intelligence that allows systems to learn from large amounts of data and find complex patterns without being explicitly programmed [15]. This capability is essential in the healthcare context, where the volume and complexity of data are immense and human analysts cannot easily find the subtle abnormalities and suspicious patterns indicative of fraudulent behaviour [16,17]. ML algorithms can also adapt to new data, making them more dynamic and better able to discover new fraud patterns than rule-based systems.

The objective of this study is to comprehensively investigate the current use cases of machine learning for healthcare fraud detection. It discusses how various machine learning techniques are applied to identify types of healthcare fraud and highlights the significant challenges to their practical implementation. Additionally, the study outlines future directions in this field by highlighting upcoming trends and innovations that can help combat healthcare fraud. Its purpose is to present a thorough and insightful analysis for healthcare technology strategists and senior researchers who want to understand the role of advanced machine learning in protecting the healthcare system.

Although the datasets and many of the governing contexts come from the US healthcare system, the present study provides a global perspective on healthcare fraud detection. However, due to limited access to data from developing and underdeveloped regions, specific insights from those regions are restricted. Future research should focus on these settings to better understand healthcare fraud prevention there.

Although numerous studies have explored machine learning (ML) applications for fraud detection in the financial sector, there have been very few in the context of healthcare insurance fraud [7,11,18,19]. There has been a lack of comprehensive and comparative analysis incorporating the latest machine learning (ML) concepts, such as explainable AI, federated learning, and resampling techniques like SMOTE-ENN, in a single framework. Earlier literature reviews tend to focus on algorithms involved in or datasets related to analysis, and fail to connect theoretical advancements with practical challenges in implementation across various types of healthcare fraud, such as billing, insurance, prescription, and identity theft. The present study provides a comprehensive, full-spectrum analysis. It develops actionable recommendations relevant to researchers and practitioners, regulators, and insurance companies, enabling them to design robust and scalable systems for detecting and preventing fraud.

The research protocol addresses the following key research question: How effectively can machine learning be applied to healthcare organisations? This is followed by four further questions:

- What are the recent advancements in using machine learning to detect fraud in the healthcare sector?
- What barriers or challenges do organisations face in implementing machine learning in healthcare fraud detection?
- In which ways can the efficiency of machine learning be improved in detecting healthcare fraud?
- Which datasets are more common in healthcare fraud detection?

2. Background and Related Work

Machine learning in healthcare fraud detection employs various techniques, each with strengths and suitability in addressing multiple aspects of the current problem. These techniques can be further classified as supervised, unsupervised, semi-supervised, deep learning, and ensemble learning [20,21].

Earlier studies (e.g., Hamid, Khalique [13]; Zhang, Xiao [22]; Ali, Abd Razak [23]) have primarily listed ML techniques implemented for fraud detection, yet they fail to present a holistic framework that combines practical challenges, performance barriers, and cross-industry implementations.

For instance, ref. [13] employed traditional supervised learning algorithms, such as decision trees (DT) and support vector machines (SVMs), on structured datasets but paid little attention to class imbalance or privacy issues. In contrast, ref. [22] utilised unsupervised clustering techniques to discover fraudulent patterns, which were more effective in fraud detection. Moreover, ref. [24] integrated blockchain with machine learning to improve tamper resistance, showing robust regulatory configuration.

Furthermore, most studies ignore imbalanced data processing or recent developments, such as blockchain implementation, federated learning, explainable AI and SMOTE-ENN. The current research addresses the gaps above by integrating machine learning (ML) methods with practical implementation feasibility and providing a systematic roadmap for future studies.

Recent advancements in transformer models for time series analysis, such as Autoformer and FEDformer, have addressed some of the main drawbacks of traditional transformers, especially their struggles with efficiently capturing long-term patterns and their heavy computational demands with sequential healthcare data [25]. Autoformer, for instance, uses a decomposition block that separates the trend and seasonal elements within a time series. This makes the model’s predictions easier to interpret and boosts its ability to make accurate long-term forecasts. That is particularly useful for applications like monitoring claim submission trends or analysing billing cycles in healthcare fraud analytics.
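The decomposition idea can be illustrated with a centred moving average: the smoothed series is treated as the trend and the remainder as the seasonal part. The following is a minimal sketch under stated assumptions, not the Autoformer implementation; the claim-count series, the window length of seven days, and the edge padding by repeated end values are all hypothetical choices made for illustration.

```python
def moving_average(series, window):
    """Centred moving average; end values are repeated as padding (odd window assumed)."""
    half = window // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    return [sum(padded[i:i + window]) / window for i in range(len(series))]


def decompose(series, window=7):
    """Split a series into a smooth trend and the remaining seasonal component."""
    trend = moving_average(series, window)
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal


# Hypothetical daily claim counts: a linear upward trend plus a weekly cycle.
claims = [100 + 2 * d + (15 if d % 7 in (0, 1) else -6) for d in range(28)]
trend, seasonal = decompose(claims, window=7)
```

Away from the edges, the recovered trend follows the underlying linear growth, while the seasonal component isolates the weekly submission cycle that an analyst might inspect for irregular billing rhythms.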

Meanwhile, FEDformer introduces a frequency-enhanced approach using Fourier transforms to identify repeating patterns directly in the frequency domain. This method helps lower computational complexity while keeping the necessary seasonal and periodic signals intact. That makes it especially beneficial for working with high-frequency healthcare claims data, where understanding long-term patterns matters, but traditional attention mechanisms would be too resource-intensive. Together, these transformer variants show promising improvements in both performance and scalability for time-based healthcare applications.
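As a rough illustration of working in the frequency domain, the sketch below computes a discrete Fourier transform of a synthetic claims series and reports the periods of the strongest frequency modes. It is an assumption-laden toy: the series is invented, the transform is a plain O(n²) DFT, and keeping the largest-magnitude modes is only one simple selection rule, not necessarily the one FEDformer applies.

```python
import cmath
import math


def dft(series):
    """Plain discrete Fourier transform for the non-negative frequencies."""
    n = len(series)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series))
        for k in range(n // 2 + 1)
    ]


def dominant_periods(series, top_k=2):
    """Return the periods (in time steps) of the top_k strongest frequency modes."""
    mean = sum(series) / len(series)
    spectrum = dft([x - mean for x in series])  # remove the constant offset first
    ranked = sorted(range(1, len(spectrum)), key=lambda k: abs(spectrum[k]), reverse=True)
    return [len(series) / k for k in ranked[:top_k]]


# Hypothetical series with a strong weekly cycle and a weaker four-week cycle.
series = [
    10 + 5 * math.sin(2 * math.pi * t / 7) + 2 * math.sin(2 * math.pi * t / 28)
    for t in range(56)
]
periods = dominant_periods(series)
```

Because only a few modes carry most of the periodic signal, a model can operate on those modes instead of on every time step, which is the intuition behind the reduced computational cost.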

A direct approach to learning about fraud is supervised learning, which trains algorithms on labelled datasets of fraudulent and non-fraudulent activities [23]. Decision trees (DT), random forests (RF), support vector machines (SVMs), and logistic regression (LR) are standard algorithms in this category [26]. These models learn the patterns present in the labelled training data and use them to categorise new, unseen data. In practice, experienced practitioners can validate the output of these tools by labelling questionable billing behaviours, and this feedback helps the model become better at flagging suspect providers over time [27,28].

Supervised learning techniques like decision trees (DT), support vector machines (SVMs), and random forests (RF) are efficient in identifying known fraud patterns because they are trained on labelled datasets [29,30]. Yet this dependency on labelled data makes them impractical for many healthcare datasets, where privacy laws and the cost of labelling pose key barriers and the available labels are severely imbalanced [31]. Even though supervised learning performs well on structured claims and routine frauds, such as billing fraud and upcoding, it cannot identify novel frauds without being retrained on new data, which limits its effectiveness in the fast-changing healthcare environment [32]. Given privacy concerns and the rapid evolution of fraudulent schemes in healthcare, this reliance on labelled data is a significant limitation.

An alternative is unsupervised learning, which does not require labelling. Techniques such as clustering and anomaly detection are applied to uncover hidden patterns and identify outliers in the data [33]. Unsupervised algorithms, for instance, can identify claims that significantly diverge from standard billing patterns or patient behaviour, thereby identifying potential fraud. Another form of unsupervised learning is trend analysis, which compares a provider’s billing behaviour against that of their peers to determine whether an anomaly exists [34]. Unsupervised learning is especially valuable for detecting new and emerging fraud patterns that are not captured by labelled data or predefined rules. Nonetheless, human analysts must still review the flagged anomalies to determine whether they are fraudulent. Unsupervised learning also has limitations: it relies heavily on good-quality feature engineering and tends to have a very high false positive rate, which makes it unproductive for instantaneous decision-making within operational systems [35]. However, unsupervised techniques remained effective when combined with traditional methods, uncovering outlier activity within patient records [36].
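A peer comparison of this kind can be sketched without any labels by scoring each provider against the median of the group. The example below uses a robust z-score based on the median absolute deviation (MAD); the provider identifiers, billing amounts, and the threshold of 3.5 are hypothetical, and the method is one simple choice among many anomaly detection techniques.

```python
from statistics import median


def robust_z_scores(values):
    """Robust z-scores: deviation from the median, scaled by the MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return [0.0 for _ in values]
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores.
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(billing_by_provider, threshold=3.5):
    """Return providers whose average billing deviates strongly from their peers."""
    providers = list(billing_by_provider)
    scores = robust_z_scores([billing_by_provider[p] for p in providers])
    return [p for p, z in zip(providers, scores) if abs(z) > threshold]


# Hypothetical average billed amount per claim for six peer providers.
billing = {"P1": 210.0, "P2": 195.0, "P3": 205.0, "P4": 220.0, "P5": 200.0, "P6": 640.0}
flagged = flag_outliers(billing)
```

A flagged provider is only a candidate for review: the score indicates that billing is unusual relative to peers, not that it is fraudulent.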

Semi-supervised learning is a hybrid approach that utilises both a small amount of labelled data and a large amount of unlabelled data to train [37]. This becomes very useful in detecting healthcare fraud, where obtaining large labelled datasets can prove complicated or costly. Unlabelled data can help the model improve its knowledge of the underlying structure and distribution of the data and, therefore, may lead to better accuracy and generalisation [32].

Deep learning models, including neural networks with various layers, can analyse complex patterns in high-dimensional healthcare data [38]. These models can automatically learn intricate features from the data without requiring explicit feature engineering. Thus, these models are well-suited for detecting sophisticated fraud methods. Deep learning architectures, such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory Networks (LSTMs), are being investigated for their ability to discover complex patterns in claims data and temporal sequences of healthcare events [39].

Deep learning techniques, such as LSTMs and CNNs, possess sophisticated capabilities for extracting complex, non-linear patterns from large datasets, including electronic health records, medical images, and prescription logs, which makes them a suitable fit for complex healthcare fraud detection systems [40]. However, their black-box nature, rigorous training conditions, and the need for enormous labelled datasets restrict their application in highly regulated environments, where trust, auditability, and explainability are most important. Although efficient in preliminary fraud detection analyses, they are challenging to implement in real-world healthcare settings due to limitations in interpretability and ethical concerns [41].

Ensemble learning combines the predictions of multiple individual machine learning models to improve accuracy and robustness over single models [42]. Commonly used ensemble methods include adaptive boosting (AdaBoost), gradient boosting, and extreme gradient boosting (XGBoost) [43]. These boosting methods iteratively combine many weak classifiers into a robust predictive model. Stacking ensembles, which combine multiple base models through a meta-learner, have also been shown to perform very well at detecting specific types of fraud, such as overutilisation.

These machine learning techniques are increasingly being used across healthcare fraud detection. In billing fraud detection, ML algorithms analyse massive amounts of claims data and flag suspicious patterns, such as odd billing codes, duplicated claims, or services that do not fit a patient’s medical history [44,45,46]. Many discrepancies in patient records and insurance claims that signal fraudulent billing practices are found in unstructured data, and natural language processing (NLP) is necessary to make sense of them [47]. Predictive analytics uses historical data to anticipate fraudulent billing activity and so enables proactive intervention [28]. Chatterjee, Das [28] discussed AI-powered real-time monitoring systems that analyse transactions as they occur to detect and prevent fraudulent claims at the point of payment.

ML systems for prescription fraud watch for patterns in frequency, dosage, and relationships between patients, prescribers, and pharmacies that could indicate drug diversion schemes or doctor-shopping [48,49]. Unsupervised learning can identify unusual prescription patterns for individual patients or providers compared to typical behaviour [50]. Additionally, medical claims data can be classified to identify patterns of provider-side prescription fraud indicators [51].

ML techniques for healthcare identity theft analyse patient identification documents, insurance claims and medical records to detect anomalies and suspicious patterns indicating unauthorised use of patient identity [52]. In medical billing and real-time fraud detection systems, outlier and anomaly detection can flag activity that is unusual relative to a specific patient’s record. In addition, inconsistencies that indicate identity theft can be extracted from patient behaviour analysis, such as tracking appointment attendance and treatment type [53]. Finally, NLP supports verifying the authenticity of medical documents and checking for fabricated information [54].

The present study employs a systematic approach, adhering to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to enhance transparency and reliability. The literature review was conducted to summarise recent advancements in machine learning (ML) for healthcare fraud detection and prevention. Multiple databases were searched, including Scopus, IEEE Xplore, Web of Science, and Google Scholar. The search keywords were “machine learning”, “cybercrime”, “cyber frauds”, “healthcare”, “health insurance”, and “deep learning”. Studies published in English between 2015 and 2025 were included, and non-English papers were excluded. Finally, two independent reviewers evaluated the titles and abstracts, and the findings were synthesised qualitatively.

Articles were selected if they were peer-reviewed, published in English, and included healthcare fraud datasets. The selected articles were then screened by two independent experts from the field, who assessed the titles, abstracts, and full texts. Finally, a qualitative synthesis was conducted to identify patterns in the data and classify the results.

Table 1 below presents standard machine learning techniques and their commonly used algorithms.

3. Navigating the Challenges of Healthcare Fraud Detection with ML

Machine learning-based healthcare fraud detection has undergone several developments; however, some challenges may prevent it from being effective and widely adopted. Data complexity, heterogeneity, and sheer volume make working with healthcare data difficult [77]. Data come from various sources, including structured and unstructured claims data, clinical notes, medical images, and sensor readings.

This vast amount of information requires sophisticated preprocessing, such as handling missing values, encoding categorical variables, and scaling numerical features. These steps can be computationally intensive and require specialised data science and healthcare expertise. The more accurate modelling made possible by the ‘big data’ nature of healthcare comes at the cost of a robust data management infrastructure and pipelines [40].
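The three preprocessing steps named above can be sketched on a toy claims table as follows. The column names and values are hypothetical, and mean imputation, one-hot encoding, and z-score standardisation are simple default choices assumed here for illustration; production pipelines typically need more careful, domain-informed handling.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def standardise(column):
    """Scale a numerical column to zero mean and unit variance."""
    mu, sigma = mean(column), pstdev(column)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in column]


def one_hot(column):
    """Encode a categorical column as 0/1 indicator vectors."""
    categories = sorted(set(column))
    return [[1 if v == c else 0 for c in categories] for v in column], categories


# Hypothetical claim amounts (one missing) and provider specialties.
amounts = [120.0, None, 80.0, 100.0]
specialty = ["cardiology", "oncology", "cardiology", "dermatology"]

filled = impute_mean(amounts)
scaled = standardise(filled)
encoded, categories = one_hot(specialty)
```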

Fraudulent schemes continuously evolve, with fraudsters using advanced technologies such as AI to generate more sophisticated and harder-to-detect fraudulent artefacts, such as fake claims and duplicate medical records [78]. In such a dynamic landscape, fraud detection systems must themselves be dynamic rather than static. To remain effective, a machine learning model must be regularly retrained on new data and potentially refactored to incorporate new features or algorithms that can handle emerging fraud patterns.

Bias in AI algorithms, particularly bias stemming from the training data, can result in disparate detection rates among different demographic groups [79]. For instance, a model trained mostly on data from a majority demographic group may not perform with equal accuracy on data from a minority group. Moreover, healthcare fraud data is typically characterised by severe class imbalance: the number of fraudulent cases is much smaller than that of non-fraudulent cases [80]. As a result, models can become very good at recognising legitimate cases but poor at identifying fraudulent ones, since they are biased towards the majority class. To mitigate this problem, the Synthetic Minority Oversampling Technique (SMOTE) can create synthetic instances of the minority class [44]. Addressing data imbalance and bias is imperative to building fair and effective fraud detection systems that do not unfairly discriminate against one patient or provider population over another.

Table 2 presents the strengths and weaknesses of LLMs vs. graph-based models in healthcare fraud.
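The core of SMOTE is interpolation: a synthetic minority sample is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below shows that step only, under stated assumptions; the feature vectors are hypothetical, the neighbour count `k` and the seed are arbitrary, and the cleaning stage of hybrid variants such as SMOTE-ENN is not included.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic minority samples by interpolating towards near neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of the chosen sample.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical fraud cases described by two scaled features.
fraud_cases = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 3.0)]
new_cases = smote(fraud_cases, n_synthetic=6)
```

Because every synthetic point lies between two real minority samples, oversampling of this kind can amplify noise or blur class boundaries when the minority samples themselves are unreliable.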

Another major challenge is that cybercriminals could tamper with predictive models to avoid being caught [81,82]. Fraudsters may try to manipulate data inputs, or even alter the algorithms, so that their illicit activity goes undetected while legitimate activity is flagged as fraudulent. Strong precautions, such as robust data validation processes, frequent updates, and regular testing of AI systems, should be in place to maintain their accuracy and reliability. Resource limitations also constrain the deployment of advanced ML systems for healthcare fraud detection [83]. These systems require a substantial amount of computational power to build and maintain, as well as expertise in data science and machine learning, along with ongoing financial investment. Small organisations usually lack these resources and cannot afford to acquire them all at once.

In addition, the ethical considerations regarding the use of ML in healthcare, including the possibility of impersonal interaction and what that means for those who lack access to technology, are equally important to discuss [84]. Additionally, the privacy and security of the sensitive patient data used to train and operate these models must be rigorously protected [85]. Table A1 in Appendix A highlights machine-learning techniques in healthcare, their applications, benefits and potential challenges.

Although a comprehensive list of challenges is discussed in Table A1 of Appendix A, the following are some primary barriers.

3.1. Data Imbalance and Quality

The overwhelming predominance of non-fraudulent records tends to bias models towards the majority class, making resampling or synthetic data generation methods such as SMOTE necessary, although these methods carry risks of their own [86].

3.2. Privacy and Compliance Constraints

Data protection rules and regulations, such as GDPR, restrict the sharing of data, hindering collaborative model training and continuous monitoring across institutions [87].

3.3. Interpretability and Trust

In deep learning, black-box models erode stakeholders’ confidence in ML infrastructure, particularly where legal or clinical validation is required [88].

3.4. Resource Limitations

Expensive computation, a shortage of skilled staff, and high infrastructure costs hinder implementation in low-resource healthcare environments [2,89].

3.5. Adversarial Manipulation

Cybercriminals can exploit weaknesses in ML systems, which makes model accuracy, robustness, and continuous updating essential [90].

4. Emerging Trends in ML-Driven Fraud Detection

Several emerging trends position machine learning for a significant future role in healthcare fraud detection. With the increase in digital fraud, machine learning (ML) techniques are increasingly applied in critical application areas, such as healthcare, which makes it essential for AI-driven fraud detection systems to be able to explain why their decisions were made. For example, to increase the interpretability of ML models, SHapley Additive exPlanations (SHAP) values are used to quantify each feature’s contribution to a prediction [91]. Techniques such as SHAP and Local Interpretable Model-agnostic Explanations (LIME) are primarily used to open up the black-box behaviour of machine learning (ML) and deep learning (DL) models. They offer feature contributions that can be visualised, making fraud detection and prevention more explainable and reliable in domains subject to regulatory compliance, such as healthcare [92].
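To illustrate what a Shapley-based attribution computes, the sketch below evaluates exact Shapley values for a toy scoring function by enumerating all feature coalitions, with features outside a coalition replaced by a baseline value. The function `fraud_score`, its three features, and the baseline are hypothetical, and exhaustive enumeration is feasible only for a handful of features; practical SHAP tooling relies on approximations instead.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(instance)

    def value(coalition):
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return model(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = set(subset)
                total += weight * (value(without_i | {i}) - value(without_i))
        phi.append(total)
    return phi


def fraud_score(x):
    """Hypothetical risk score with an interaction between amount and visits."""
    claim_amount, visits, _distance = x
    return 0.002 * claim_amount + 0.05 * visits + 0.001 * claim_amount * visits


instance = [500.0, 10.0, 3.0]   # claim under review
baseline = [100.0, 2.0, 3.0]    # assumed "typical" claim
phi = shapley_values(fraud_score, instance, baseline)
```

The attributions sum to the difference between the score of the claim under review and the score of the baseline claim, which is the property that makes such values convenient for explaining an individual decision to an auditor.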

Integrating blockchain technology with machine learning presents various opportunities to enhance the security and integrity of healthcare data used for fraud detection [8,14,24,52,93,94]. Blockchain’s immutable and decentralised ledger can offer a tamper-proof record for healthcare transactions and patient data, making it resistant to fraudsters’ attempts to fabricate information and submit claims. Moreover, the integration of blockchain helps mitigate exploitation threats and enhances auditability in fraud detection workflows [95,96,97]. Finally, this technology can help secure data sharing and verification between authorised entities, enhancing the accuracy and reliability of fraud detection models.
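The tamper-evidence property can be illustrated with a plain hash chain, in which each block stores the hash of its predecessor so that altering any earlier record invalidates the chain. This is a minimal sketch of that one ingredient only: the record fields are hypothetical, and decentralisation, consensus, and access control, which a real blockchain deployment would need, are deliberately left out.

```python
import hashlib
import json


def block_hash(record, previous_hash):
    """SHA-256 over the record and the hash of the previous block."""
    payload = json.dumps({"record": record, "prev": previous_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, record):
    """Append a record, linking it to the current last block."""
    previous_hash = chain[-1]["hash"] if chain else "0" * 64
    chain.append({
        "record": record,
        "prev": previous_hash,
        "hash": block_hash(record, previous_hash),
    })


def is_valid(chain):
    """Recompute every hash and check every link."""
    previous_hash = "0" * 64
    for block in chain:
        if block["prev"] != previous_hash:
            return False
        if block["hash"] != block_hash(block["record"], block["prev"]):
            return False
        previous_hash = block["hash"]
    return True


# Hypothetical claim records.
ledger = []
append_block(ledger, {"claim_id": "C-001", "amount": 250.0})
append_block(ledger, {"claim_id": "C-002", "amount": 980.0})
valid_before = is_valid(ledger)

ledger[0]["record"]["amount"] = 2500.0  # simulated tampering with a stored claim
valid_after = is_valid(ledger)
```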

Despite these evident advantages, blockchain integration introduces critical challenges, such as transaction latency, high computational cost, and legal concerns regarding data immutability when storing personal information. To overcome these barriers, hybrid architectures are employed, which utilise off-chain storage, lightweight consensus mechanisms, and policy-based blockchain strategies [98].

Future fraud detection systems will be able to operate in real-time and adapt to new fraud techniques [1,13,72]. Real-time fraud detection works by analysing vast volumes of claims data as they are generated, enabling the immediate discovery and blocking of suspicious activities before claims are paid. Adaptive models will be able to learn from new data and adjust their detection strategies on the fly without requiring reprogramming, making them more resilient against emerging fraud schemes.

Federated learning, an emerging machine learning paradigm, allows models to be trained on the decentralised datasets of multiple organisations without directly sharing sensitive patient data [87,99,100]. It breaks down data silos while addressing privacy concerns, so that organisations can jointly develop more robust and generalisable fraud detection models from different datasets while keeping those datasets confidential. Because the data remain on local servers and patient records are never exchanged, collaborative training can proceed in line with strict privacy regulations such as the GDPR [101,102].
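One common way to realise this idea is federated averaging: each organisation trains locally and only model weights, weighted by local sample count, are combined centrally. The sketch below assumes a tiny logistic regression model and two hypothetical hospitals with invented data; the learning rate, epoch count, and number of rounds are arbitrary, and secure aggregation, differential privacy, and network communication are not modelled.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_update(weights, data, lr=0.1, epochs=20):
    """Train logistic regression locally with SGD, starting from the global weights."""
    w = list(weights)
    for _ in range(epochs):
        for features, label in data:
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, features))) - label
            w = [wi - lr * error * xi for wi, xi in zip(w, features)]
    return w


def federated_average(client_weights, client_sizes):
    """Average client weights, weighted by the number of local samples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
        for d in range(dim)
    ]


# Hypothetical local datasets: features are [bias, scaled claim amount]; label 1 = fraud.
hospital_a = [([1.0, 0.1], 0), ([1.0, 0.2], 0), ([1.0, 0.9], 1)]
hospital_b = [([1.0, 0.3], 0), ([1.0, 0.8], 1), ([1.0, 1.0], 1), ([1.0, 0.15], 0)]

global_w = [0.0, 0.0]
for _ in range(30):  # communication rounds
    updates = [local_update(global_w, data) for data in (hospital_a, hospital_b)]
    global_w = federated_average(updates, [len(hospital_a), len(hospital_b)])
```

Only the weight vectors leave each site in this scheme; the raw claim records stay where they were collected.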

Another promising direction is graph-based machine learning, which exploits the natively relational nature of healthcare data [103]. These techniques treat patients, providers, claims, and other entities as nodes in a graph and the relationships between them as edges, in order to identify extended, interrelated patterns such as a network of colluding providers or coordinated fraud involving patients. Compared with traditional tabular analysis, this approach can surface relational insights that would otherwise be lost.
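A simple, non-learned illustration of the graph view is to link providers that share an unusual number of patients and then extract the connected groups. The sketch below assumes hypothetical (provider, patient) claim pairs and an arbitrary threshold of two shared patients; it shows only the graph construction step, not a graph neural network.

```python
from collections import defaultdict
from itertools import combinations


def shared_patient_graph(claims, min_shared=2):
    """Link two providers when they share at least min_shared patients."""
    patients_by_provider = defaultdict(set)
    for provider, patient in claims:
        patients_by_provider[provider].add(patient)
    graph = defaultdict(set)
    for a, b in combinations(sorted(patients_by_provider), 2):
        if len(patients_by_provider[a] & patients_by_provider[b]) >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_groups(graph):
    """Return groups of providers connected through shared-patient links."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(graph[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical (provider, patient) pairs taken from claims.
claims = [
    ("A", "p1"), ("A", "p2"), ("A", "p3"),
    ("B", "p1"), ("B", "p2"),
    ("C", "p2"), ("C", "p3"),
    ("D", "p9"), ("E", "p9"),
]
groups = connected_groups(shared_patient_graph(claims))
```

Here providers B and C are never directly linked, yet both fall into the same group through provider A, a relationship that a row-by-row tabular analysis would not reveal.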

Temporal graph neural networks (GNNs), such as the Temporal Graph Attention Network (TGAT) and the Temporal Graph Convolutional Network (T-GCN), enhance traditional GNNs by integrating temporal dependencies and dynamic node interactions over time. TGAT employs time encoding and attention mechanisms to model temporal dependencies, allowing it to learn from interaction patterns. Meanwhile, T-GCN merges graph convolutions with gated recurrent units (GRUs) to capture both sequence and structure. Such capabilities enable temporal GNNs to detect fraud patterns that unfold over time.

5. The Art and Science of Feature Engineering and Selection for Fraud Detection

5.1. Importance of Feature Engineering

Feature engineering is a critical aspect of building effective machine learning models for healthcare fraud detection. The process, sometimes referred to as feature extraction, involves creating and selecting relevant and informative features from raw healthcare data that help the model differentiate between fraudulent and legitimate activities [104,105,106]. Without well-designed features, machine learning algorithms often cannot recognise the subtle patterns and anomalies that characterise fraudulent behaviour, leading to detection models with low accuracy and poor interpretability.

5.2. Common Features in Healthcare Fraud

In healthcare fraud detection, certain features are commonly used. They are often grouped as transactional, behavioural, contextual, and temporal. Basic transactional features include the transaction amount, the time of service, and provider or merchant details [13]. Behavioural features capture a user’s typical interaction patterns with the healthcare system, including their transaction history, appointment attendance, and the services they recommend.

Additional information related to the circumstances surrounding a transaction or patient interaction, including the geographic location where services were provided, is referred to as contextual features. Other features can also be derived from domain knowledge, such as billing codes, diagnosis codes, and relationships among referring and performing providers. Combining various features can provide a more comprehensive picture of the possibility of a fraudulent transaction.

5.3. Feature Selection Techniques

Feature selection techniques are then applied to the potentially large set of engineered features to identify the most relevant and informative ones [91,107,108]. These methods aim to reduce the dimensionality of the data and enhance the model’s predictive capabilities and interpretability. The standard approaches are filter, wrapper, and embedded methods. Filter methods use statistical measures to rank features based on their association with the target variable, wrapper methods evaluate candidate feature sets by training a model on each, and embedded methods integrate feature selection into the model training process itself.

Specific techniques, such as information gain, recursive feature elimination, and principal component analysis (PCA), are utilised in healthcare fraud detection. Figure 1 visually presents the multi-step flow of the feature engineering process.
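As an illustration of a filter-style technique, the sketch below computes information gain for two categorical features against a fraud label and ranks them. The feature names and the eight toy records are invented for the example and do not come from any dataset discussed in this review.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Toy data: 1 = fraudulent claim, 0 = legitimate claim (hypothetical).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
features = {
    # Separates the two classes perfectly in this toy sample.
    "high_cost_code": ["y", "y", "y", "n", "n", "n", "n", "n"],
    # Only weakly related to the label.
    "weekend_service": ["y", "n", "y", "n", "y", "n", "y", "n"],
}

# Rank features from most to least informative.
ranking = sorted(
    features,
    key=lambda name: information_gain(features[name], labels),
    reverse=True,
)
```

A filter method of this kind scores each feature independently of any model, which makes it cheap but blind to interactions between features; wrapper methods such as recursive feature elimination address that at a higher computational cost.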

Feature engineering should be tailored to the types of healthcare fraud being targeted [109]. For example, features related to billing patterns, such as the frequency of specific billing codes or the average cost per claim, can be engineered to detect billing fraud. Features regarding the number of services a provider offers compared to their peers can be created to identify overutilisation.
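The billing and overutilisation features described above could be derived along the following lines. This is a minimal sketch under assumed inputs: the claim records, the field names (provider, code, amount), and the function name provider_features are all hypothetical, and the billing codes are used purely as placeholders.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

# Hypothetical claim records; field names and values are illustrative only.
claims = [
    {"provider": "P1", "code": "99213", "amount": 80.0},
    {"provider": "P1", "code": "99213", "amount": 90.0},
    {"provider": "P1", "code": "99215", "amount": 130.0},
    {"provider": "P2", "code": "99213", "amount": 85.0},
    {"provider": "P2", "code": "99214", "amount": 115.0},
    {"provider": "P3", "code": "99215", "amount": 200.0},
    {"provider": "P3", "code": "99215", "amount": 210.0},
    {"provider": "P3", "code": "99215", "amount": 190.0},
    {"provider": "P3", "code": "99215", "amount": 220.0},
    {"provider": "P3", "code": "99215", "amount": 205.0},
    {"provider": "P3", "code": "99213", "amount": 75.0},
]

def provider_features(claims):
    """Aggregate claim-level records into one feature row per provider."""
    by_provider = defaultdict(list)
    for claim in claims:
        by_provider[claim["provider"]].append(claim)

    # Peer statistics for claim volume, used for the overutilisation feature.
    counts = {p: len(rows) for p, rows in by_provider.items()}
    peer_mean = mean(list(counts.values()))
    peer_std = pstdev(list(counts.values()))

    features = {}
    for p, rows in by_provider.items():
        code_counts = Counter(r["code"] for r in rows)
        top_code, top_n = code_counts.most_common(1)[0]
        features[p] = {
            "n_claims": counts[p],
            "avg_cost_per_claim": mean([r["amount"] for r in rows]),
            "top_code": top_code,
            "top_code_share": top_n / counts[p],  # billing-pattern feature
            # How far this provider's volume sits from its peers.
            "volume_z": (counts[p] - peer_mean) / peer_std if peer_std else 0.0,
        }
    return features

feats = provider_features(claims)
```

In this toy sample, provider P3 stands out on both the share of a single high-cost code and the peer-relative claim volume, which is the kind of signal the engineered features are meant to expose to a downstream model.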

For prescription fraud detection, features related to the types, amounts, and behaviours of prescribed drugs, as well as the prescribing practices of doctors and dispensing pharmacies, can be engineered [44]. Sometimes, data scientists and healthcare domain experts must collaborate to identify features that most likely indicate whether such activity is fraudulent.

In addition, federated learning offers a promising approach to privacy-preserving fraud detection; however, its implementation in a real-time environment remains challenging. Organisations must ensure that their operational structure has the necessary capacity, including consistent bandwidth, GPU computing, and a reliable aggregation mechanism. Such infrastructure is expensive and not available in every organisation.

Moreover, implementing real-time federated learning introduces a new layer of complexity, including latency, variability across participating clients, and the need for continuous updates, making its implementation feasible only in high-resource institutions.
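The aggregation mechanism mentioned above can be illustrated with a sample-weighted average of client model parameters, in the style of the widely used federated averaging rule. This is a simplified sketch: the three participating institutions, their parameter vectors, and their sample counts are hypothetical, and real deployments add secure communication, client selection, and many training rounds.

```python
def federated_average(client_updates):
    """Combine client models into a global model.

    client_updates is a list of (weights, n_samples) pairs, where weights is
    a list of floats. Each client's parameters are weighted by the number of
    local training samples; no raw claims data leaves the client.
    """
    total = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    global_weights = [0.0] * dim
    for weights, n in client_updates:
        for i, w in enumerate(weights):
            global_weights[i] += w * n / total
    return global_weights

# Hypothetical locally trained parameters from three institutions.
updates = [
    ([0.2, -1.0], 100),  # hospital A
    ([0.4, -0.5], 300),  # insurer B
    ([0.0, -2.0], 100),  # clinic C
]
global_model = federated_average(updates)
```

Every round requires each client to train locally and transmit its parameters, which is why bandwidth, compute, and a dependable aggregation server become prerequisites for real-time use.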

6. Datasets Used in Healthcare Fraud Detection Research

Machine learning’s effectiveness in detecting healthcare fraud relies heavily on high-quality and relevant data for training and evaluation. Various datasets are used in research in this area, each with its own characteristics and limitations.

Healthcare fraud detection research draws on primary claims data, electronic health records (EHR), provider records, and claims history as information sources. Public datasets from the Centers for Medicare & Medicaid Services (CMS), i.e., the Medicare Physician and Other Practitioners by Provider and Service (MPOP PS) and by Provider (MPOP P) datasets, are often used [110,111]. These datasets provide detailed information on healthcare utilisation, payments, and submitted charges. Another widely used dataset is the Office of the Inspector General’s List of Excluded Individuals and Entities (LEIE), a primary source for identifying providers excluded from federal healthcare programmes for fraud [44,112]. In addition, several other healthcare fraud datasets on platforms like Kaggle can be used for data analysis and machine learning competitions [113]. Some of the most commonly used datasets in healthcare fraud detection using machine learning (ML) are summarised in Table 3 below.

Nevertheless, the existing datasets face several challenges. The first is severe class imbalance, where the number of fraudulent cases is usually far smaller than the number of legitimate ones [31]. The imbalance can lead machine learning models to favour the majority class in order to reduce the number of false positives, making it hard for the model to identify fraud. While the LEIE dataset helps expose overtly fraudulent providers, it is limited in identifying other, more nuanced forms of fraud, such as overutilisation, which accounts for a large percentage of healthcare fraud losses [112]. In addition, some datasets contain little information about the methods by which fraudulent claims were submitted, making it challenging to train models to handle fraud at a detailed level. Other challenges include inconsistent data formats, a lack of standardisation across different data sources, and privacy concerns that can restrict access to specific patient-level data [114].
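The effect of class imbalance on evaluation can be seen with a small worked example. The sketch below assumes a hypothetical set of 1000 claims with a 1% fraud rate and a trivial model that labels every claim as legitimate; the numbers are illustrative and not taken from any of the datasets above.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy_and_recall(y_true, y_pred):
    """Overall accuracy and recall on the fraud class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall

# Hypothetical claim set: 990 legitimate (0) and 10 fraudulent (1).
y_true = [0] * 990 + [1] * 10

# A degenerate "model" that never flags anything.
always_legitimate = [0] * 1000

acc, rec = accuracy_and_recall(y_true, always_legitimate)
```

The degenerate model reaches 99% accuracy while catching none of the fraudulent claims, which is why recall, precision, and F1 on the fraud class are usually more informative than accuracy for imbalanced fraud data.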

Although class imbalance is an explicitly acknowledged issue in healthcare fraud detection, it varies remarkably across different types of datasets [13,44,86]. Public datasets, such as MIMIC-III, and synthetically structured datasets do not represent fraud well enough, which limits their effectiveness for supervised learning in the absence of data augmentation [115]. Moreover, administrative records from insurers are hard to access and carry subtle biases that reflect institutional reporting practices, making the balance of fraud and non-fraud cases even more problematic.

Efforts are being made to improve the quality and usefulness of datasets for healthcare fraud detection research. One method is to draw on the expertise of experienced physicians and medical billers to annotate datasets more accurately, address class imbalance, and identify subtle fraud [116]. Authorities are also becoming more aware that researchers’ training data needs to include more information about confirmed fraud outcomes. Developing standardised benchmarks that are publicly available and representative of different kinds of healthcare fraud would greatly help the field by enabling direct comparison and evaluation of multiple machine learning methods.

Table 3. Standard Datasets Used in Healthcare Fraud Detection Research.

Dataset	Description	Strengths	Limitations	Reference
CMS Medicare Data (MPOP PS/P)	Detailed data on Medicare utilisation, payments, and charges.	Large scale, publicly available.	Can suffer from class imbalance.	[117,118,119]
LEIE (List of Excluded Individuals and Entities)	List of providers excluded from federal healthcare programmes due to fraud.	Provides labels for overt fraud.	Significant class imbalance; labels mostly capture blatant fraud.	[44,91,120]
Kaggle Healthcare Datasets	Various datasets related to healthcare claims and provider information.	Publicly available, used in competitions.	Varies in quality and representativeness.	[86,121,122]
NHCAA Healthcare Fraud Dataset	Consists of labelled instances of various frauds and feature metadata.	Suitable for relative model benchmarking; works well when the input data are balanced.	Not publicly available; use is subject to ethical requirements.	[123,124,125]
MIMIC-III & IV	Commonly used in advanced fraud detection, e.g., document falsification and insurance exploitation inference.	Large-scale clinical and temporal data.	Explicitly designed for clinical analytics and requires a huge preprocessing time.	[126,127]

7. Interplay Between Cybersecurity and Machine Learning in Healthcare Fraud Prevention

Machine learning technologies are most beneficial when used within a layer of cybersecurity to prevent healthcare fraud. Qayyum, Qadir [128] and Razzaq and Shah [129] stated that robust cybersecurity is a must to protect the sensitive healthcare data used by ML models for training and prediction. Borky, Bradley [130] further noted that essential security controls, such as firewalls, antivirus software, strong password policies, and strict access controls, are critical to protecting data from unauthorised access, use, and disclosure. Another important preventative measure is establishing a strong security culture within healthcare organisations and giving users comprehensive training on phishing attacks [131,132,133].

The healthcare industry faces many sophisticated cyberattacks, including ransomware, phishing, data breaches, cloud compromises, supply chain attacks, and business email compromise [134]. These attacks can have severe consequences, from disrupting patient care and compromising sensitive patient data to causing significant financial losses and reputational damage. The high value of protected health information (PHI) on the black market makes healthcare organisations a prime target for cybercriminals.

A new wave of cyberattacks, which aim to drain healthcare organisations’ resources and increasingly exploit human vulnerabilities or weaknesses in engineering and software, requires healthcare organisations to build their own defensive capabilities [135]. One such capability, anomaly detection, can be performed through AI-powered tools that scrutinise network traffic and user behaviour for patterns that may point to a cyber intrusion or malicious activity [136]. Moreover, ML algorithms trained on the patterns and characteristics of phishing and malware can identify phishing emails and malicious software. Additionally, AI can play a role in identity verification processes and detect unusual access attempts to patient records or billing systems [137]. Figure 2 presents an overall view of how machine learning approaches are applied in healthcare. The present study discusses the significant role of machine learning in healthcare fraud detection alongside other fields, such as image processing and patient behaviour analysis.
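As a simple illustration of detecting unusual access attempts from user behaviour, the sketch below flags users whose daily count of patient-record views is far from the typical value, using a median-based (modified) z-score. It is a deliberately basic statistical stand-in for the AI-powered tools described above; the user identifiers, counts, and the 3.5 cut-off (a commonly quoted convention for this score) are assumptions for the example.

```python
from statistics import median

def modified_z_scores(counts):
    """Median/MAD-based z-score per user; robust to the outliers it looks for."""
    values = list(counts.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # All users behave almost identically; nothing stands out.
        return {user: 0.0 for user in counts}
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores.
    return {user: 0.6745 * (v - med) / mad for user, v in counts.items()}

# Hypothetical number of patient records viewed by each user in one day.
daily_record_views = {"u1": 12, "u2": 15, "u3": 11, "u4": 14, "u5": 13, "u6": 240}

scores = modified_z_scores(daily_record_views)
flagged = [user for user, z in scores.items() if abs(z) > 3.5]
```

A rule like this would only be a first filter; flagged accounts still need human review, since a high count may reflect a legitimate role such as records administration.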

8. Tools Used in Healthcare Fraud Detection

The development of ML-driven solutions for healthcare fraud requires a robust ML-integrated infrastructure. Practical analytical tools and an integrated framework are crucial to designing, training, validating and implementing healthcare fraud detection models. Moreover, apart from algorithmic capability, reliability and compliance with healthcare requirements drive their success in clinical and insurance domain applications.

8.1. Core Machine Learning Tools

8.1.1. TensorFlow and Keras

One of the most commonly used tools is TensorFlow, a popular and flexible open-source ML platform [138]. It supports various tasks, including deep learning, and provides robust and straightforward tools to build, train, and deploy models using CPUs, GPUs, or distributed systems [139,140]. Keras integrates with TensorFlow as a user-friendly high-level API for building neural networks. TensorFlow’s popularity is primarily due to its strong community support and extensive ecosystem, which enables the development of complex ML models for healthcare applications.

8.1.2. PyTorch

Another widely used open-source machine learning framework is PyTorch, which is particularly popular in research. PyTorch was developed by Meta and is known for its dynamic computation graphs, which give considerable flexibility in model design [141]. It features excellent GPU acceleration and robust tools and libraries for building and training neural networks and performing general machine learning tasks. With substantial research use and increasing industry adoption, PyTorch has proven an excellent tool for experimenting with new machine learning techniques to detect healthcare fraud.

8.1.3. Apache MXNet

Apache MXNet is a scalable deep learning framework that supports multiple programming languages and enables efficient training on various hardware [142]. It was designed to be flexible and performant for developing and deploying deep neural networks on various platforms. However, MXNet has not been actively developed since September 2023.

8.1.4. Weka

Weka is an open-source software tool for data mining and machine learning that is especially popular in academic and research environments [143,144]. It provides a user-friendly graphical interface that enables users to perform various data analysis and predictive modelling tasks, including classification, regression, clustering, and association rule mining, without writing code. Weka is a helpful tool for healthcare fraud research and practice, as it supports many data formats and a large set of machine learning algorithms.

The choice of machine learning framework depends on the needs of a particular project, including the complexity of the models to be developed, the size of the datasets to be processed, the level of customisability required, and the environment in which the models will be deployed. Weka can be chosen for exploratory analysis or small-scale projects, mainly because of its ease of use with traditional machine learning algorithms, where large-scale deployment is not required. TensorFlow or PyTorch, by contrast, is often the go-to framework for deep learning applications and deployments.

Table 4 presents TensorFlow and PyTorch as the most efficient tools for advancing deep learning models in healthcare fraud detection, providing scalability, substantial libraries and GPU support. TensorFlow supports the implementation of large-scale systems because of its growing ecosystem, while PyTorch boosts research infrastructure through its robust prototyping. Weka, by contrast, is less equipped for deep learning, but it remains helpful for processing smaller datasets and has a more user-friendly interface. Moreover, MXNet supports a multi-language environment but is no longer being updated to meet the latest technological needs.

Apart from these general tools used for fraud detection in healthcare, some specialised analytical tools are deployed in the healthcare domain, e.g., spaCy + ScispaCy, Transformers, OpenFL, PySyft, Flower, and Tesseract OCR [152,153].

8.2. Specialised Analytical Tools

8.2.1. spaCy + ScispaCy

spaCy combined with ScispaCy is used for entity detection and data mining from unstructured clinical notes, and for retrieving statements that reveal possible discrepancies and falsified documents [154]. The tool performs well in preprocessing and tagging procedures within fraud workflows, but requires the integration of anomaly detection layers to be applied effectively.

8.2.2. Transformers

Transformers are typically employed to identify linguistic patterns and semantic discrepancies in doctors’ notes and insurance documents through the use of deep contextual embeddings [155,156].

9. Discussion

The study highlights the growing complexity and potential of ML methodologies for addressing healthcare fraud, which is identified as a more complex, structured, and intricate form of fraud with stringent regulatory requirements. Although several studies, such as Ali, Abd Razak [23] and Qayyum, Qadir [128], have listed algorithmic strategies, the current study goes further by examining emerging technologies, such as federated learning, explainable AI, and graph-based models, while assessing their pragmatic and empirical suitability.

Unlike previous studies, the current study underscores the latest approaches, such as SMOTE-ENN [109] and federated learning [94], which were developed to solve real-world issues such as class imbalance and data privacy.

The literature review demonstrates that, although ML-driven practices have achieved convincing outcomes, their implementation remains challenging in multiple contexts, including infrastructure barriers, data privacy concerns, and the unavailability of labelled datasets.

The study further illustrates the technological shift from supervised learning techniques to ensemble learning, federated learning, and various hybrid learning approaches for addressing class imbalance and reducing false positives.

9.1. Recent Advancements in Using Machine Learning for Healthcare Fraud Detection

The latest improvements have introduced robust and intelligent methods for managing complex and advanced fraudulent activities in healthcare fraud detection using machine learning (ML). Bounab, Zarour [44] have shown that ML models, such as decision trees (DT), combined with SMOTE-ENN, demonstrate remarkable performance in managing imbalanced healthcare datasets.

Meanwhile, Kapadiya, Ramoliya [94] stated that using blockchain technology with ensemble learning helped improve security and privacy in healthcare insurance fraud detection systems. Furthermore, deep reinforcement learning, combined with autoencoders, has significantly contributed to detecting anomalies in MRI, X-ray, and clinical CT image datasets with high precision [46,157]. Moreover, federated learning, when integrated with relevant machine learning (ML) algorithms, helps secure the Internet of Medical Things (IoMT), supporting the acquisition of energy-efficient solutions among discrete machines [158,159].

Recent developments in healthcare fraud detection through machine learning (ML) have strengthened the fraud detection system, addressing several challenges, including data imbalance, data and system security, data privacy, and, most notably, real-time fraud detection.

9.2. Barriers Organisations Face in Implementing Machine Learning

9.2.1. Data Quality and Availability

ML models typically require high-quality and complete datasets; however, data quality and availability have long been key concerns in healthcare data, which is often missing, incomplete, and sometimes unavailable. Moreover, due to strict data protection rules and regulations, such as the General Data Protection Regulation (GDPR), healthcare data becomes inaccessible, which poses a significant barrier to implementing machine learning for the healthcare fraud detection process [160]. According to Zhang, Morley [161], the UK’s NHS has faced substantial difficulties regarding data transparency, data protection, and data quality, which resulted in uneven data training with ML-based fraud detection models.

9.2.2. Integration with Existing Systems

Like many other sectors, healthcare organisations rely on legacy IT infrastructures that are incompatible with modern technologies. In this scenario, healthcare organisations need to update their digital environment, which is costly. Hence, integration is a key barrier to implementation within available budgets.

9.2.3. Regulatory Compliance

Compliance with regulations such as the GDPR can be particularly challenging when handling sensitive data, like healthcare information. Moreover, it is essential to verify that ML models do not introduce bias, which could raise ethical concerns when making decisions. The GDPR also imposes stringent requirements on transferring personal data across borders; such transfers are permitted only when the destination country ensures an acceptable level of data safeguarding practices [162].

9.2.4. Resource Constraints

Substantial resources are required whenever the latest technologies are introduced into an organisation. When implemented in healthcare, ML requires significant financial investment in up-to-date hardware, data scientists, engineers, domain experts, and maintenance, which not every healthcare organisation can afford. Moreover, intelligent use of machine learning in healthcare fraud detection systems requires skilled personnel capable of understanding, managing, and interpreting model outputs.

9.2.5. Adaptability to Emerging Threats

In today’s rapidly evolving digital landscape, cybercriminals continue to attack, and cyber threats evolve rapidly. Healthcare organisations therefore need to retrain their models and update their systems frequently, which is not always feasible for them.

9.2.6. Resistance to Change by Organisation

Many healthcare organisations tend to rely on traditional fraud detection practices. Such a culture and resistance by staff and leaders towards updating their infrastructure often hinders or prevents ML implementation.

9.3. Improving Machine Learning Efficiency

The effectiveness of machine learning (ML) in healthcare organisations can be significantly improved through better management of healthcare data, efficient feature engineering, model upgrades, and adaptive learning methods. One significant improvement involves utilising SMOTE-ENN to rebalance classes and reduce noise in healthcare data before detecting fraudulent activities [44]. Moreover, ensemble learning techniques such as bagging, boosting, voting, and stacking are used to reduce variance, minimise bias, and improve prediction accuracy [163].
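The two stages of SMOTE-ENN can be sketched as follows: SMOTE first synthesises minority-class samples by interpolating between neighbouring minority points, and Edited Nearest Neighbours (ENN) then removes samples whose label disagrees with most of their neighbours. This is a simplified, dependency-free illustration on invented two-dimensional data, not the configuration used in [44]; it assumes binary labels and at least two minority samples, and practical work would more likely rely on a maintained implementation such as the one in the imbalanced-learn library.

```python
import random
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest(i, X, k, candidates):
    """Indices of the k candidates closest to X[i], excluding i itself."""
    others = [j for j in candidates if j != i]
    return sorted(others, key=lambda j: sq_dist(X[i], X[j]))[:k]

def smote(X, y, minority=1, k=3, seed=0):
    """Oversample the minority class until both classes are the same size."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_new = len(y) - 2 * len(min_idx)
    X_out, y_out = [list(row) for row in X], list(y)
    for _ in range(max(n_new, 0)):
        i = rng.choice(min_idx)
        j = rng.choice(nearest(i, X, k, min_idx))
        gap = rng.random()
        # New point lies on the segment between two minority neighbours.
        X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
        y_out.append(minority)
    return X_out, y_out

def edited_nearest_neighbours(X, y, k=3):
    """Drop samples whose label differs from the majority of their k neighbours."""
    keep = []
    for i in range(len(X)):
        votes = Counter(y[j] for j in nearest(i, X, k, range(len(X))))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

# Invented data: 0 = legitimate, 1 = fraud.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
     [0.5, 0.5], [0.2, 0.8], [0.8, 0.2], [1.0, 0.5],  # legitimate cluster
     [5.5, 5.5],                                       # noisy legitimate label
     [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]               # fraud cluster
y = [0] * 9 + [1] * 3

X_bal, y_bal = smote(X, y)                              # step 1: rebalance
X_res, y_res = edited_nearest_neighbours(X_bal, y_bal)  # step 2: clean noise
```

In this toy case SMOTE raises the fraud class from three to nine samples, and ENN then removes the single legitimate-labelled point sitting inside the fraud region, leaving cleaner class boundaries for a downstream classifier.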

Various tools, primarily sequential forward selection, SHAP, and LIME, enhance the effectiveness and explainability of models in feature selection. Meanwhile, federated learning supports maintaining privacy and decreasing transmission costs, which is especially helpful for IoMT systems [164].

9.4. Standard Datasets Used in Healthcare Fraud Detection

The most used datasets in the reviewed studies are Medicare CMS data, the List of Excluded Individuals and Entities (LEIE), and Kaggle datasets. The first has more than 9 million records and 29 features and is the most commonly used dataset for detecting fraud in healthcare services. The second consists of records of individuals and entities excluded from federal healthcare programmes and includes 18 attributes.

Finally, the Kaggle dataset is primarily used in academia and simulation-driven fraud detection projects. It consists of original healthcare data, which is a common reason for its suitability for benchmarking machine learning (ML) models.

9.5. Ethical Concerns in ML-Based Healthcare Fraud Detection

Implementing machine learning in healthcare fraud detection raises serious ethical concerns that must be addressed to ensure unbiased, transparent, and privacy-compliant procedures. Significant problems include algorithmic bias, patient record privacy concerns, a lack of model explainability, and inadequate ethical frameworks.

A substantial concern is algorithmic bias, as models trained on imbalanced or unrepresentative samples may falsely flag claims from specific demographic categories or types of providers [165]. Another critical issue is the privacy of patients’ records, particularly given the sensitive nature of healthcare information. Even fully anonymised datasets can pose re-identification risks when applied to complex ML pipelines. Hence, technologies such as federated learning, e.g., OpenFL, need to be implemented to mitigate these risks while maintaining model performance [166].

Additionally, the absence of model explainability raises concerns for ethical and legal rationales when employed in significant decision-making applications. A lack of precision can weaken the legal due process in fraud detection cases and lower the clinician’s trust in the technology [167]. For example, the machine learning-based model used in UnitedHealthcare underwent an inquiry in 2022, when it generated biased results for insurance claims [168]. The case underscores the imperative of explainability, fairness auditing, and active human involvement in implementing machine learning models.

9.6. Limitations of Machine Learning in Healthcare Systems

Although machine learning performs well in real-world healthcare systems, it faces several challenges when implemented in a healthcare environment.

9.6.1. Data Privacy and Governance

Data privacy and governance are the primary concerns in healthcare datasets, due to the GDPR and region-specific policies for data acquisition. This limits the overall scope of data sharing and model training, resulting in isolated models.

9.6.2. Label Noise

Several healthcare fraud detection systems require labelled datasets for supervised machine learning techniques. However, the labels in healthcare datasets are usually noisy and inconsistent because of errors in coding, inadequate documentation, and auditors’ subjective understanding, which can generate bias.

9.6.3. Model Drift

Fraud patterns and clinical behaviours change constantly over time, limiting the capability of models trained on historical data; regular retraining and real-time monitoring are therefore required to avoid performance degradation.
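One way to monitor for drift is to compare the distribution of a feature in recent data with its distribution at training time. The sketch below uses the Population Stability Index (PSI), a measure chosen here for illustration rather than one prescribed by the reviewed studies; the claim amounts and bin edges are hypothetical, and the frequently quoted rule of thumb that a PSI above roughly 0.25 signals a substantial shift should be treated as a convention, not a guarantee.

```python
import math

def bin_shares(values, edges, eps=1e-6):
    """Share of values falling into each bin defined by sorted edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        # Bin index = number of edges the value has reached or passed.
        counts[sum(1 for e in edges if v >= e)] += 1
    # eps avoids log(0) and division by zero for empty bins.
    return [max(c / len(values), eps) for c in counts]

def psi(expected, actual, edges):
    """Population Stability Index between a reference and a recent sample."""
    e = bin_shares(expected, edges)
    a = bin_shares(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Hypothetical claim amounts at training time and in a recent window.
edges = [100, 200, 300]
training = [50] * 40 + [150] * 30 + [250] * 20 + [350] * 10
recent_same = list(training)
recent_shifted = [50] * 10 + [150] * 20 + [250] * 30 + [350] * 40

psi_stable = psi(training, recent_same, edges)
psi_drifted = psi(training, recent_shifted, edges)
```

A monitoring job could compute such a score per feature on a schedule and trigger retraining or review when it crosses an agreed threshold.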

9.6.4. Infrastructure Constraints

A significant concern in implementing advanced machine learning models in the developing world is resource constraints, such as a lack of computational resources, a knowledge gap, and a lack of awareness.

9.7. Operationalising Transparency and Bias Mitigation

Transparency and bias mitigation should be incorporated into model training and governance roadmaps to ensure the accountable and impartial implementation of machine learning models in healthcare fraud detection. This is attainable via model auditing, explainability metrics, fairness metrics, human-in-the-loop review, and transparency dashboards.
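A fairness metric can be as simple as comparing error rates between groups. The sketch below computes the false positive rate, i.e., the share of legitimate claims wrongly flagged, separately for two hypothetical provider groups; the labels, predictions, and group names are invented for the example, and the choice of false positive rate is one of several possible fairness criteria.

```python
def false_positive_rate(y_true, y_pred):
    """Share of legitimate cases (label 0) that were flagged as fraud (1)."""
    flags_on_legit = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not flags_on_legit:
        return 0.0
    return sum(flags_on_legit) / len(flags_on_legit)

def fpr_by_group(y_true, y_pred, groups):
    """False positive rate computed separately for each group value."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        rates[g] = false_positive_rate(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
    return rates

# Hypothetical audit sample: 1 = fraud, 0 = legitimate.
y_true = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1]
groups = ["urban"] * 6 + ["rural"] * 6

rates = fpr_by_group(y_true, y_pred, groups)
fpr_gap = max(rates.values()) - min(rates.values())
```

Here legitimate rural providers are flagged three times as often as urban ones, a gap that an audit would need to investigate before the model's outputs are acted upon.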

9.7.1. Model Auditing

Regular audits of ML models, similar to financial or clinical audits, can identify potential issues such as model drift, biased analysis, and feature selection challenges. An effective audit practice keeps records of data origin, model intent, assumptions, and any known limitations.

9.7.2. Explainability Metrics

SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) help stakeholders assess the contribution of features both locally, for an individual prediction, and globally, across the whole model.
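The idea behind SHAP can be illustrated by computing exact Shapley values for a tiny scoring function. The sketch below enumerates every ordering of three features, which is only feasible for very small models; it is not the SHAP library, and the risk_score function, its coefficients, and the example provider are hypothetical.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]       # switch feature i from baseline to actual value
            new = predict(current)
            phi[i] += new - prev    # marginal contribution in this ordering
            prev = new
    return [p / len(orders) for p in phi]

def risk_score(features):
    """Hypothetical fraud risk score with one interaction term."""
    claims_per_day, avg_amount, weekend_share = features
    score = 0.02 * claims_per_day + 0.001 * avg_amount
    if claims_per_day > 40:
        # Weekend billing only matters for very high-volume providers.
        score += 0.5 * weekend_share
    return score

provider = [60, 300, 0.8]    # provider being explained
typical = [20, 150, 0.1]     # baseline ("typical") provider

phi = shapley_values(risk_score, provider, typical)
```

The contributions always sum to the difference between the provider's score and the baseline score, which is the property that lets an auditor attribute a flagged prediction to specific features.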

9.7.3. Transparency Dashboards

A transparent dashboard shows the real-time predictions, confidence intervals, and model interpretability layers, enabling the compliance team to monitor outputs in advance, thereby mitigating the risk of false outcomes and interconnected legal consequences.

10. Conclusions and Future Research Directions

Machine learning has shown great potential as a transformative tool in the war on healthcare fraud. ML algorithms can leverage a broad range of supervised, unsupervised, deep learning, and ensemble algorithms to sift through enormous amounts of healthcare data and identify meaningful patterns and anomalies that indicate fraudulent activity, including billing and prescription fraud, healthcare entitlement fraud, and healthcare identity theft.

The future of this field promises even more sophisticated and practical solutions, driven by emerging trends such as explainable AI, which will enhance the transparency and trustworthiness of these systems; the integration of blockchain technology to improve data security and integrity; the development of adaptive and real-time models capable of continuously learning and responding to evolving fraud tactics; the application of federated learning to enable collaborative fraud detection while preserving data privacy; and the use of graph-based machine learning to uncover complex relationships and networks involved in fraudulent schemes.

While the potential for futuristic machine learning to accelerate healthcare fraud detection exists, further research and innovation are necessary to realise its transformative potential. The challenges of addressing the complexity and volume of healthcare data, the ever-increasing sophistication of fraud schemes, the critical issues requiring AI and data balancing, and robust cybersecurity measures will be paramount.

Future research should yield more interpretable and stronger models, generate better and more accurate datasets that better represent the world, and ultimately facilitate more effective interaction between experts from various fields, including healthcare, machine learning, and cybersecurity. These advancements will continue to be pursued to protect the healthcare system from the insidious effects of fraud and to redirect resources to delivering quality care to all.

10.1. Practical Implications

The current study provides significant practical insights for healthcare organisations for ML-driven healthcare fraud detection systems. By conducting a critical analysis of the latest machine learning (ML) techniques, the current research demonstrates how cutting-edge ML strategies can enhance the effectiveness and performance of fraud detection in healthcare data. Furthermore, the study addresses prevalent challenges, including data imbalance, transparency, and data privacy issues. It provides potential solutions, enabling healthcare organisations to implement machine-learning techniques for fraud detection in healthcare data. By examining commonly used datasets in healthcare and machine learning analytical tools, the study provides a comprehensive guide to implementing intelligent ML-based fraud detection systems in healthcare environments.

10.2. Future Research Directions

Future studies should emphasise developing standardised, industry-specific datasets and implementing techniques such as synthetic data creation, federated learning (FL), explainable AI, and secure computation to balance data effectiveness with privacy preservation. There is a pressing need for more comprehensive studies examining machine learning models’ efficiency across diverse healthcare systems and regulatory environments. Moreover, interdisciplinary research collaboration, encompassing technical perspectives as well as legal, ethical, and governance infrastructure, is necessary to implement these initiatives effectively.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Machine Learning techniques in Healthcare, along with applications, benefits and challenges.

Paper	ML Techniques	ML Types	Applications	Datasets Used	Study Types	Benefits	Challenges	Mitigation Strategies
[44]	Decision trees combined with a hybrid resampling technique, i.e., SMOTE-ENN	Supervised ML	Detection of fraudulent transactions in U.S. Medicare claims processing systems	Medicare dataset from the US	Experimental study	Accuracy, F1 and recall of approximately 0.99; effective with imbalanced data	Severe class imbalance, risk of overfitting, SMOTE can introduce noise	Combining SMOTE-ENN with domain-related features such as “Provider Type”
[169]	ML Algorithms: Classification and Regression	Supervised ML	Financial management and decision-making in healthcare organisations	Administrative data	Experimental study	Enhanced Efficiency and accuracy in handling healthcare data, cost-effectiveness	Issues in the quality of data, Implementation Challenges	Usually, preprocessing and model tuning can resolve challenges.
[94]	Bagging and Stacking, and ML Integration with Blockchain Technology	Ensemble Learning Techniques	Fraud detection in healthcare insurance claims using decentralised blockchain systems	Healthcare insurance claims data	Experimental study	Advanced fraud detection process, Improved data security	Scalability issues between ML and blockchain	Integration of multiple classifiers and blockchains for securing data
[157]	Deep Reinforcement Learning and Convolutional Autoencoders	Deep learning and Reinforcement learning	Automatic anomaly detection from clinical image datasets	Clinical CT images dataset	Experimental study	Efficient data processing, minimises manual assignments, increases diagnostic precision	Challenges in structural variations	Using mixed methods and autoencoders can remove errors.
[158]	Random Forest, Support Vector Machine and Federated Learning	Supervised and unsupervised ML	Intrusion detection on Medical Internet of Things data to protect healthcare information	Medical Internet of Things (M-IoT) network data	Experimental study	Improved accuracy, energy efficiency, and resource optimisation	Computational resource constraints, high energy usage	Leveraging economical ML models and federated learning for model updating can mitigate the challenges.
[170]	Fuzzy Closure Miner for Frequent Itemset (FCMFI) and Nucleotide Sequence Comprehension Engine (NSCE)	Unsupervised ML	SARS-CoV-2 analysis to detect transmutation prototypes and abnormalities	SARS-CoV-2 data	Experimental study	Improved pattern recognition	Complicated genomic data patterns can be challenging to analyse	Anomaly identification by combining multiple techniques such as FCMFI and NSCE
[171]	Dense Multi-Scale - Transnet DMSC and Multi-level Fusion	Deep Learning	Automatic Detection of Anomalies	Multiple Medical and Image Datasets	Experimental study	Enhanced feature mining, Advanced anomaly detection	Conventional CNNs are not effective, Loss of features	Combining different transfer modules and DMSC for feature fusion.
[160]	A mixture of AI and ML techniques	Supervised ML, Unsupervised ML and Deep Learning	Detection and prevention of fraudulent transactions in Nigeria	Different real-world datasets from Nigeria	Empirical study	Enhanced accuracy, efficient fraud detection	Scalability and regulatory compliance challenges	Continuous learning and adaptive systems using current datasets
[172]	Patch-wise contrastive learning-based auto-encoder (PatchCL-AE)	Unsupervised ML and Deep Learning	Anomaly detection from medical image datasets to scan diseases	Medical Imaging Datasets	Experimental study	Enhanced anomaly detection	Constraints of reconstruction-based techniques, Over-dependence on pixel-level losses	Reduce noise and improve scalability.
[173]	Weighted MultiTree (WMT) and Density-Based Clustering (DBC)	Unsupervised ML	Detection of fraud from healthcare insurance claims datasets	Health Insurance Claims Datasets	Experimental study	Improved fraud detection performance	Challenges in clustering claims	Applying two different approaches in two different stages, i.e., WMT and DBC.
[174]	Bayesian Belief Network (BBN)	Supervised ML	Detection of fraud from healthcare insurance claims datasets	Health Insurance Claims Datasets	Experimental study	Enhanced model performance	Issues in Claims Analysis	Exploits relational structure of transaction attributes using BBN to integrate interconnections
[175]	Multiple anomaly detection techniques	Unsupervised, Supervised ML and Deep Learning	Detection of Anomalies from medical image datasets	Multiple Medical imaging datasets	Benchmark study	Enhanced anomaly detection	Unreliable processes and dataset selections limit productivity	Using numerous datasets and implementing a single framework.
[18]	Decision tree	Supervised ML	Detection of fraud from healthcare insurance claims datasets	Health Insurance Claims Datasets from Ghana	Experimental study	Efficient classification accuracy and improved security	Scalability issues between ML and blockchain, Challenges in fraud detection	Integrating ML into Ethereum smart contracts.
[32]	CatBoost, XGBoost, LightGBM, Random Forest	Supervised ML	Detection and prediction of fraud from healthcare insurance claims datasets.	Health Insurance Claims Datasets	Experimental study	Enhanced detection accuracy	Multi-dimensional and noisy data	Using ensemble learning with CatBoost, XGBoost, and LightGBM.
[176]	Random Forest, KNN, SVM, and MLP	Supervised ML	Detection of Anomalies from the Internet of Medical Things Datasets	Different Internet of Things (IoT) datasets	Experimental study	Outstanding accuracy and efficient detection	Scalability issues with conventional models, Challenges in the implementation of IoMT	Use of multiple ML algorithms at once.
[177]	Aggregated Mondrian Forests, Half-Space Trees, Bijective soft sets, Shannon entropy, and TOPSIS	Supervised and Unsupervised ML	Effective diagnosis in competent healthcare.	Medical Datasets	Experimental study	Enhanced Detection and accuracy	Reconstruction of the static model is required, Increased computational costs	Leveraging novel framework to minimise dimensionality and real-time processing.
[178]	Decision tree, K-NN, Logistic Regression, Random Forest, AdaBoost, XGBoost	Supervised ML	Detection of Anomalies from body area networks (BANs)	Body Area Networks Data	Experimental study	Exceptionally high accuracy	Resource constraints in IoT BANs, Anomaly detection challenges	Integration of multiple classifiers and using standard data conversion tools.
[179]	Meta-Reinforcement Learning (Meta-RL)	Reinforcement learning	Detection of fraud from healthcare insurance claims datasets	Medical Datasets	Experimental study	Improved performance and high accuracy	Substantial class imbalance, the threat of overfitting, Possibility of data loss	Using Meta Reinforcement Learning for task distribution.
[34]	Multiple Anomaly Detection Techniques	Unsupervised ML, Deep learning	Detection of fraud from healthcare insurance claims datasets and anomaly detection.	Health Insurance Claims Dataset from Belgium	Experimental study	Advanced anomaly detection	Unavailability of labelled dataset, Challenges in identifying the difference between fraud and abuse	Using advanced anomaly detection techniques.
[93]	Federated learning, Blockchain-based task scheduling	Supervised ML	Detection of Anomalies from the Internet of Medical Things Datasets	Healthcare Internet of Medical Things Data	Experimental study	Increased energy efficiency	Dispersed fraudulent transactions in IoMT data, Security and privacy challenges in cloud infrastructure	Integrating federated learning with blockchain-based task scheduling.
[14]	Logistic Regression, Decision Tree, KNN, Naive Bayes, SVM, and Random Forest	Supervised ML	Fraud detection in healthcare insurance claims using decentralised (blockchain) systems	Different Healthcare Datasets	Experimental study	Increased Accuracy and improved fraud detection	Security and privacy challenges in blockchain implementation, Data may be compromised	Leveraging ML to examine data and check blockchain transactions.
[88]	Black-box neural networks, XAI techniques, DT	Supervised ML	Detection of fraud from healthcare datasets	UCI datasets	Experimental study	Improved model Performance and accuracy	Black-box complexity, lack of conventional accuracy	Integrating XAI with traditional methods.
[180]	CNN-LSTM, DNN, Modified SHA-256 encryption algorithm	Supervised ML, Deep learning	Detection of fraud from healthcare insurance claims datasets	Hospital Administrative Datasets	Experimental study	Improved accuracy	Data security challenges	Leveraging DL and modified SHA-256 encryption algorithm for security enhancement.
[22]	Neural Networks, Focal-loss function	Unsupervised ML	Detection of fraud from healthcare insurance claims datasets	Health Insurance Claims Datasets from China	Experimental study	Improved fraud detection	Unavailability of labelled dataset, Noisy data	Use of unsupervised ML model and feature engineering, along with focal-loss function.
[24]	Deep learning ensemble (EffiIncepNet), EfficientNet, Inception-ResNet-v2 architectures	Supervised ML and Deep learning	Fraud detection in healthcare insurance claims using decentralised blockchain systems	IEEE-CIS Fraud Detection Dataset	Experimental study	Increased accuracy and improved information security	Managing complex and high-dimensional data, Issues in blockchain data	Using deep learning ensemble model with blockchain.
[181]	Multiple AI techniques and approaches	Supervised ML and Deep learning	Detection of Anomalies from medical image datasets	Clinical Dataset of MRI	Review article	Improved and efficient fraud detection	Manual explanation, Challenges in feature extraction	Different DL, ensemble learning and XAI approaches.
[182]	Multi-channel heterogeneous graph neural networks (HGNNs)	Unsupervised ML and Deep Learning	Detection of fraud from healthcare insurance claims datasets	Health Insurance Claims Dataset from China	Experimental study	Improved fraud detection	Conventional methods’ limitations, Issues in graph creation	Convert data into multi-channel heterogeneous graphs and use advanced anomaly detection.
[45]	Random Forest, Logistic Regression, ANNs, SMOTE Boruta	Supervised ML	Detection of fraud from healthcare insurance claims datasets from Saudi Arabia.	Health Insurance Claims Dataset of Saudi Arabia	Experimental study	Increased fraud detection	Data disparity issues, Vigorous fraudulent transactions	Use of SMOTE Boruta to improve model accuracy.
[46]	Deep autoencoders	Unsupervised ML and Deep learning	Detection of fraud from healthcare insurance claims datasets	Health Insurance Claims Datasets	Experimental study	Improved fraud detection	Unavailability of labelled dataset, Noisy data	Using deep autoencoders and evaluating performance with density-based models.
[13]	Association rule mining, Unsupervised classifiers, e.g., IF, CBLOF, ECOD, OCSVM	Unsupervised learning	Detection of fraud from healthcare insurance claims datasets	CMS DE-SynPUF dataset	Experimental study	Improved fraud detection	Complications in the data, i.e., data quality, Real-time fraud detection required	Use of association rule mining and unsupervised ML for anomaly detection.
[34]	Multiple anomaly detection techniques, SHAP	Unsupervised ML	Detection of fraud from healthcare insurance claims datasets and anomaly detection.	Health Insurance Claims Datasets from Belgium	Experimental study	Improved anomaly detection	Unavailability of labelled dataset, Challenges in anomaly detection	Using advanced anomaly detection techniques along with SHAP.
[183]	Feature ranking methods	Supervised learning	Detection of fraud from healthcare insurance claims datasets	Health Insurance Claims Datasets	Experimental study	Enhanced model performance	Substantial class imbalance, The threat of overfitting	Using feature ranking methods to identify the most relevant feature and minimising noise.
[184]	Random Forest, Adaptive Boosting, Logistic Regression, Perceptron, and Deep NN	Supervised ML	Detection of Anomalies from medical image datasets	Canadian Institute for Cybersecurity (CIC) IoT Dataset	Experimental study	Increased accuracy and advanced anomaly detection	Substantial class imbalance, The threat of overfitting, Effective threat detection is required	Applying feature-reduction methods and SMOTE to address class imbalance and overfitting.
[19]	Multiple Anomaly Detection techniques	Unsupervised ML and Deep Learning	Detection of fraud from healthcare insurance claims datasets and anomaly detection.	Smart home datasets	Systematic literature review	Automatic fraud detection	Substantial class imbalance, Issues due to differences in simulation and real-world datasets	Use of supervised ML to identify potential risks.
[185]	Multiple ML classification techniques, Risk classification, Premium prediction	Supervised ML	Detection of fraud from healthcare insurance claims datasets and anomaly detection	Health Insurance Claims Datasets from the US	Experimental study	Minimises manual workload, Enhanced operational excellence	Managing complex and high-dimensional data, Data security	Integrating ensemble ML with feature engineering for fast processing.

References

Najar, A.V.; Alizamani, L.; Zarqi, M.; Hooshmand, E. A global scoping review on the patterns of medical fraud and abuse: Integrating data-driven detection, prevention, and legal responses. Arch. Public Health 2025, 83, 1–24. [Google Scholar] [CrossRef]
Dorsey. Healthcare Fraud: A World Beyond the Anti-Kickback Statute. 2024. Available online: https://www.dorsey.com/newsresources/publications/client-alerts/2024/5/healthcare-fraud (accessed on 27 June 2025).
U.S. Attorney’s Office, Middle District of North Carolina. Man Pleads Guilty to Conspiracy to Launder Money in Connection with $100 Million Health Care Fraud Scheme. 2025. Available online: https://www.justice.gov/usao-mdnc/pr/man-pleads-guilty-conspiracy-launder-money-connection-100-million-health-care-fraud (accessed on 27 June 2025).
Sweeney, E. Predictive Analytics Saves Government $1.5B in Improper Payments. 2016. Available online: https://www.fiercehealthcare.com/antifraud/predictive-analytics-saves-government-1-5b-improper-payments (accessed on 27 June 2025).
Vandenberg, O.; Martiny, D.; Rochas, O.; van Belkum, A.; Kozlakidis, Z. Considerations for diagnostic COVID-19 tests. Nat. Rev. Microbiol. 2020, 19, 171–183. [Google Scholar] [CrossRef]
Duong, M.T.; Bruns, E.J.; Lee, K.; Cox, S.; Coifman, J.; Mayworm, A.; Lyon, A.R. Rates of Mental Health Service Utilization by Children and Adolescents in Schools and Other Common Service Settings: A Systematic Review and Meta-Analysis. Adm. Policy Ment. Health Ment. Health Serv. Res. 2020, 48, 420–439. [Google Scholar] [CrossRef]
Arshed, M.A.; Mumtaz, S.; Gherghina, Ș.C.; Urooj, N.; Ahmed, S.; Dewi, C. A Deep Learning Model for Detecting Fake Medical Images to Mitigate Financial Insurance Fraud. Computation 2024, 12, 173. [Google Scholar] [CrossRef]
Venkatesh, R.; Hanumantha, B.S. A Privacy-Preserving Quantum Blockchain Technique for Electronic Medical Records. IEEE Eng. Manag. Rev. 2023, 51, 137–144. [Google Scholar] [CrossRef]
Joiner, K.A.; Lin, J.; Pantano, J. Upcoding in Medicare: Where does it matter most? Health Econ. Rev. 2024, 14, 1. [Google Scholar] [CrossRef] [PubMed]
Viriyathorn, S.; Witthayapipopsakul, W.; Kulthanmanusorn, A.; Rittimanomai, S.; Khuntha, S.; Patcharanarumol, W.; Tangcharoensathien, V. Definition, Practice, Regulations, and Effects of Balance Billing: A Scoping Review. Health Serv. Insights 2023, 16, 11786329231178766. [Google Scholar] [CrossRef]
Branion-Calles, M.; Godfreyson, A.; Berniaz, K.; Arason, N.; Chan, H.; Erdelyi, S.; Winters, M.; Sengupta, J.; Essa, M.; Rajabali, F.; et al. Underreporting and selection bias of serious road traffic injuries in auto insurance claims and police reports in British Columbia, Canada. Transp. Res. Interdiscip. Perspect. 2025, 30, 101375. [Google Scholar] [CrossRef]
Lukyanenko, R.; Maass, W.; Storey, V.C. Trust in artificial intelligence: From a Foundational Trust Framework to emerging research opportunities. Electron. Mark. 2022, 32, 1993–2020. [Google Scholar] [CrossRef]
Hamid, Z.; Khalique, F.; Mahmood, S.; Daud, A.; Bukhari, A.; Alshemaimri, B. Healthcare insurance fraud detection using data mining. BMC Med. Inform. Decis. Mak. 2024, 24, 112. [Google Scholar] [CrossRef]
Mohammed, M.A.; Boujelben, M.; Abid, M. A Novel Approach for Fraud Detection in Blockchain-Based Healthcare Networks Using Machine Learning. Futur. Internet 2023, 15, 250. [Google Scholar] [CrossRef]
Kühl, N.; Schemmer, M.; Goutier, M.; Satzger, G. Artificial intelligence and machine learning. Electron. Mark. 2022, 32, 2235–2244. [Google Scholar] [CrossRef]
Aminizadeh, S.; Heidari, A.; Toumaj, S.; Darbandi, M.; Navimipour, N.J.; Rezaei, M.; Talebi, S.; Azad, P.; Unal, M. The applications of machine learning techniques in medical data processing based on distributed computing and the Internet of Things. Comput. Methods Programs Biomed. 2023, 241, 107745. [Google Scholar] [CrossRef] [PubMed]
Praveen, S.P.; Krishna, T.B.M.; Anuradha, C.H.; Mandalapu, S.R.; Sarala, P.; Sindhura, S. A robust framework for handling health care information based on machine learning and big data engineering techniques. Int. J. Health Manag. 2022, 1–18. [Google Scholar] [CrossRef]
Amponsah, A.A.; Adekoya, A.F.; Weyori, B.A. A novel fraud detection and prevention method for healthcare claim processing using machine learning and blockchain technology. Decis. Anal. J. 2022, 4, 100122. [Google Scholar] [CrossRef]
Galvão, Y.M.; Castro, L.; Ferreira, J.; Neto, F.B.D.L.; Fagundes, R.A.D.A.; Fernandes, B.J. Anomaly detection in smart houses for healthcare: Recent advances, and future perspectives. SN Comput. Sci. 2024, 5, 136. [Google Scholar] [CrossRef]
Yan, J.; Wang, X. Unsupervised and semi-supervised learning: The next frontier in machine learning for plant systems biology. Plant J. 2022, 111, 1527–1538. [Google Scholar] [CrossRef]
Ganaie, M.A.; Hu, M.; Malik, A.K.; Tanveer, M.; Suganthan, P.N. Ensemble deep learning: A review. Eng. Appl. Artif. Intell. 2022, 115, 105151. [Google Scholar] [CrossRef]
Zhang, C.; Xiao, X.; Wu, C. Medical Fraud and Abuse Detection System Based on Machine Learning. Int. J. Environ. Res. Public Health 2020, 17, 7265. [Google Scholar] [CrossRef]
Ali, A.; Razak, S.A.; Othman, S.H.; Eisa, T.A.E.; Al-Dhaqm, A.; Nasser, M.; Elhassan, T.; Elshafie, H.; Saif, A. Financial Fraud Detection Based on Machine Learning: A Systematic Literature Review. Appl. Sci. 2022, 12, 9637. [Google Scholar] [CrossRef]
Almazroi, A.A. Innovative AI ensemble model for robust and optimized blockchain-based healthcare systems. Netw. Model. Anal. Health Inform. Bioinform. 2025, 14, 1–19. [Google Scholar] [CrossRef]
Thundiyil; Picone, J.; McKenzie, S. Transformer Architectures in Time Series Analysis: A Review; Temple University: Philadelphia, PA, USA, 2014. [Google Scholar]
Shafiei, A.; Tatar, A.; Rayhani, M.; Kairat, M.; Askarova, I. Artificial neural network, support vector machine, decision tree, random forest, and committee machine intelligent system help to improve performance prediction of low salinity water injection in carbonate oil reservoirs. J. Pet. Sci. Eng. 2022, 219, 111046. [Google Scholar] [CrossRef]
Murorunkwere, B.F.; Ihirwe, J.F.; Kayijuka, I.; Nzabanita, J.; Haughton, D. Comparison of Tree-Based Machine Learning Algorithms to Predict Reporting Behavior of Electronic Billing Machines. Information 2023, 14, 140. [Google Scholar] [CrossRef]
Chatterjee, P.; Das, D.; Rawat, D.B. Digital twin for credit card fraud detection: Opportunities, challenges, and fraud detection advancements. Futur. Gener. Comput. Syst. 2024, 158, 410–426. [Google Scholar] [CrossRef]
Afriyie, J.K.; Tawiah, K.; Pels, W.A.; Addai-Henne, S.; Dwamena, H.A.; Owiredu, E.O.; Ayeh, S.A.; Eshun, J. A supervised machine learning algorithm for detecting and predicting fraud in credit card transactions. Decis. Anal. J. 2023, 6, 100163. [Google Scholar] [CrossRef]
Razzaq, K.; Shah, M.; Fattahi, M.; Tang, J. Empowering machine learning for robust cyber-attack prevention in online retail: An integrative analysis. Humanit. Soc. Sci. Commun. 2025, 12, 1–15. [Google Scholar] [CrossRef]
Niaz, N.U.; Shahariar, K.N.; Patwary, M.J.A. Class Imbalance Problems in Machine Learning: A Review of Methods and Future Challenges. In Proceedings of the ICCA 2022: 2nd International Conference on Computing Advancements, Dhaka, Bangladesh, 13–15 January 2022; pp. 485–490. [Google Scholar]
Wang, Z.; Chen, X.; Wu, Y.; Jiang, L.; Lin, S.; Qiu, G. A robust and interpretable ensemble machine learning model for predicting healthcare insurance fraud. Sci. Rep. 2025, 15, 218. [Google Scholar] [CrossRef] [PubMed]
Samara, M.A.; Bennis, I.; Abouaissa, A.; Lorenz, P. A survey of outlier detection techniques in IoT: Review and classification. J. Sens. Actuator Netw. 2022, 11, 4. [Google Scholar] [CrossRef]
De Meulemeester, H.; De Smet, F.; van Dorst, J.; Derroitte, E.; De Moor, B. Explainable unsupervised anomaly detection for healthcare insurance data. BMC Med. Inform. Decis. Mak. 2025, 25, 14. [Google Scholar] [CrossRef]
Li, D.; Qi, Z.; Zhou, Y.; Elchalakani, M. Machine Learning Applications in Building Energy Systems: Review and Prospects. Buildings 2025, 15, 648. [Google Scholar] [CrossRef]
Yang, X.; Qi, X.; Zhou, X. Deep Learning Technologies for Time Series Anomaly Detection in Healthcare: A Review. IEEE Access 2023, 11, 117788–117799. [Google Scholar] [CrossRef]
Li, G.; Yu, Z.; Yang, K.; Lin, M.; Chen, C.L.P. Exploring Feature Selection With Limited Labels: A Comprehensive Survey of Semi-Supervised and Unsupervised Approaches. IEEE Trans. Knowl. Data Eng. 2024, 36, 6124–6144. [Google Scholar] [CrossRef]
Iqbal, A.; Amin, R. Time series forecasting and anomaly detection using deep learning. Comput. Chem. Eng. 2023, 182, 108560. [Google Scholar] [CrossRef]
Abimbola, B.; Marin, E.d.L.C.; Tan, Q. Enhancing Legal Sentiment Analysis: A Convolutional Neural Network–Long Short-Term Memory Document-Level Model. Mach. Learn. Knowl. Extr. 2024, 6, 877–897. [Google Scholar] [CrossRef]
Amarasinghe, S.C. Developing Robust Deep Learning Models for Intelligent Infrastructure: Addressing Scalability, Security, and Privacy Challenges. Appl. Res. Artif. Intell. Cloud Comput. 2024, 7, 1–10. [Google Scholar]
Nesvijevskaia, A.; Ouillade, S.; Guilmin, P.; Zucker, J.-D. The accuracy versus interpretability trade-off in fraud detection model. Data Policy 2021, 3, e12. [Google Scholar] [CrossRef]
Rajendran, R.M.; Vyas, B. Detecting APT Using Machine Learning: Comparative Performance Analysis with Proposed Model. In Proceedings of the SoutheastCon 2024, Atlanta, GA, USA, 15–24 March 2024; pp. 1064–1069. [Google Scholar]
Verma, I.; Prasad, S.K. Exploring Ensemble Learning Techniques for Infant Mortality Prediction: A Technical Analysis of XGBoost Stacking AdaBoost and Bagging Models. Birth Defects Res. 2025, 117, e2443. [Google Scholar] [CrossRef]
Bounab, R.; Zarour, K.; Guelib, B.; Khlifa, N. Enhancing Medicare Fraud Detection Through Machine Learning: Addressing Class Imbalance With SMOTE-ENN. IEEE Access 2024, 12, 54382–54396. [Google Scholar] [CrossRef]
Nabrawi, E.; Alanazi, A. Fraud Detection in Healthcare Insurance Claims Using Machine Learning. Risks 2023, 11, 160. [Google Scholar] [CrossRef]
Suesserman, M.; Gorny, S.; Lasaga, D.; Helms, J.; Olson, D.; Bowen, E.; Bhattacharya, S. Procedure code overutilization detection from healthcare claims using unsupervised deep learning methods. BMC Med. Inform. Decis. Mak. 2023, 23, 1–11. [Google Scholar] [CrossRef]
Mahadevkar, S.V.; Patil, S.; Kotecha, K.; Soong, L.W.; Choudhury, T. Exploring AI-driven approaches for unstructured document analysis and future horizons. J. Big Data 2024, 11, 1–54. [Google Scholar] [CrossRef]
Ocan, P. Enhancing Prescription Fraud and Error Detection in NHS Prescriptions Through Anomaly Detection. Reflective Prof. 2024, 4, 1–63. [Google Scholar]
Darwish, D. Machine Learning and IoT in Health 4.0. In IoT and ML for Information Management: A Smart Healthcare Perspective; Springer: Berlin/Heidelberg, Germany, 2024; pp. 235–276. [Google Scholar]
Wang, J.E.; Beaulieu-Jones, B.; Brat, G.A.; Marwaha, J.S. The role of artificial intelligence in helping providers manage pain and opioid use after surgery. Glob. Surg. Educ.-J. Assoc. Surg. Educ. 2024, 3, 1–5. [Google Scholar] [CrossRef]
Nguyen, T.; Perez, V. Privatizing Plaintiffs: How Medicaid, the False Claims Act, and Decentralized Fraud Detection Affect Public Fraud Enforcement Efforts. J. Risk Insur. 2019, 87, 1063–1091. [Google Scholar] [CrossRef]
Kapadiya, K.; Patel, U.; Gupta, R.; Alshehri, M.D.; Tanwar, S.; Sharma, G.; Bokoro, P.N. Blockchain and AI-Empowered Healthcare Insurance Fraud Detection: An Analysis, Architecture, and Future Prospects. IEEE Access 2022, 10, 79606–79627. [Google Scholar] [CrossRef]
DeFulio, A.; Rzeszutek, M.J.; Furgeson, J.; Ryan, S.; Rezania, S. A smartphone-smartcard platform for contingency management in an inner-city substance use disorder outpatient program. J. Subst. Abus. Treat. 2021, 120, 108188. [Google Scholar] [CrossRef]
Mahmud, M.A.I.; Talukder, A.T.; Sultana, A.; Bhuiyan, K.I.A.; Rahman, M.S.; Pranto, T.H.; Rahman, R.M. Toward news authenticity: Synthesizing natural language processing and human expert opinion to evaluate news. IEEE Access 2023, 11, 11405–11421. [Google Scholar] [CrossRef]
Cord, M.; Cunningham, P. Machine Learning Techniques for Multimedia: Case Studies on Organization and Retrieval. J. Electron. Imaging 2007, 18, 039901-01-2. [Google Scholar] [CrossRef]
Hastie, T. Overview of supervised learning. In The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Berlin/Heidelberg, Germany, 2009; pp. 9–41. [Google Scholar]
Nasteski, V. An overview of the supervised machine learning methods. Horizons B 2017, 4, 51–62. [Google Scholar] [CrossRef]
Krishnan, R.; Rajpurkar, P.; Topol, E.J. Self-supervised learning in medicine and healthcare. Nat. Biomed. Eng. 2022, 6, 1346–1352. [Google Scholar] [CrossRef]
Tiwari, A. Supervised learning: From theory to applications. In Artificial Intelligence and Machine Learning for EDGE Computing; Elsevier: Amsterdam, The Netherlands, 2022; pp. 23–32. [Google Scholar]
Tyagi, K.; Rane, C.; Sriram, R.; Manry, M. Unsupervised learning. In Artificial Intelligence and Machine Learning for EDGE Computing; Elsevier: Amsterdam, The Netherlands, 2022; pp. 33–52. [Google Scholar]
James, G.; Witten, D.; Hastie, T.; Tibshirani, R.; Taylor, J. Unsupervised learning. In An Introduction to Statistical Learning: With Applications in Python; Springer: Berlin/Heidelberg, Germany, 2023; pp. 503–556. [Google Scholar]
Priyadarshi, R.; Ranjan, R.; Vishwakarma, A.K.; Yang, T.; Rathore, R.S. Exploring the Frontiers of Unsupervised Learning Techniques for Diagnosis of Cardiovascular Disorder: A Systematic Review. IEEE Access 2024, 12, 139253–139272. [Google Scholar] [CrossRef]
He, M.; Cerna, J.; Mathew, R.; Zhao, J.; Zhao, J.; Espina, E.; Clore, J.L.; Sowers, R.B.; Hsiao-Wecksler, E.T.; Hernandez, M.E. Objective anxiety level classification using unsupervised learning and multimodal physiological signals. Smart Health 2025, 36, 100572. [Google Scholar] [CrossRef]
Yang, X.; Song, Z.; King, I.; Xu, Z. A survey on deep semi-supervised learning. IEEE Trans. Knowl. Data Eng. 2022, 35, 8934–8954. [Google Scholar] [CrossRef]
Song, Z.; Yang, X.; Xu, Z.; King, I. Graph-Based Semi-Supervised Learning: A Comprehensive Review. IEEE Trans. Neural Networks Learn. Syst. 2022, 34, 8174–8194. [Google Scholar] [CrossRef] [PubMed]
Zhou, Z.-H. Semi-supervised learning. Mach. Learn. 2021, 1, 315–341. [Google Scholar]
Taye, M.M. Understanding of Machine Learning with Deep Learning: Architectures, Workflow, Applications and Future Directions. Computers 2023, 12, 91. [Google Scholar] [CrossRef]
Janiesch, C.; Zschech, P.; Heinrich, K. Machine learning and deep learning. Electron. Mark. 2021, 31, 685–695. [Google Scholar] [CrossRef]
Dong, S.; Wang, P.; Abbas, K. A survey on deep learning and its applications. Comput. Sci. Rev. 2021, 40, 100379. [Google Scholar] [CrossRef]
Matsuo, Y.; LeCun, Y.; Sahani, M.; Precup, D.; Silver, D.; Sugiyama, M.; Morimoto, J. Deep learning, reinforcement learning, and world models. Neural Netw. 2022, 152, 267–275. [Google Scholar] [CrossRef]
Archana, R.; Jeevaraj, P.S.E. Deep learning models for digital image processing: A review. Artif. Intell. Rev. 2024, 57, 1–33. [Google Scholar] [CrossRef]
Cha, Y.-J.; Ali, R.; Lewis, J.; Büyüköztürk, O. Deep learning-based structural health monitoring. Autom. Constr. 2024, 161, 105328. [Google Scholar] [CrossRef]
Razzaq, K.; Shah, M. Machine Learning and Deep Learning Paradigms: From Techniques to Practical Applications and Research Frontiers. Computers 2025, 14, 93. [Google Scholar] [CrossRef]
Ullah, F.; Ullah, I.; Khan, R.U.; Khan, S.; Khan, K.; Pau, G. Conventional to Deep Ensemble Methods for Hyperspectral Image Classification: A Comprehensive Survey. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 3878–3916. [Google Scholar] [CrossRef]
Nobel, S.M.N.; Swapno, S.M.M.R.; Islam, R.; Safran, M.; Alfarhood, S.; Mridha, M.F. A machine learning approach for vocal fold segmentation and disorder classification based on ensemble method. Sci. Rep. 2024, 14, 14435. [Google Scholar] [CrossRef]
Fathi, S.; Ahmadi, A.; Dehnad, A.; Almasi-Dooghaee, M.; Sadegh, M.; for the Alzheimer’s Disease Neuroimaging Initiative. A Deep Learning-Based Ensemble Method for Early Diagnosis of Alzheimer’s Disease using MRI Images. Neuroinformatics 2023, 22, 89–105. [Google Scholar] [CrossRef]
Zamani, A.S.; Hashim, A.H.A.; Shatat, A.S.A.; Akhtar, M.; Rizwanullah, M.; Mohamed, S.S.I. Implementation of machine learning techniques with big data and IoT to create effective prediction models for health informatics. Biomed. Signal Process. Control. 2024, 94, 106247. [Google Scholar] [CrossRef]
Razali, F.M.; Sulaiman, N.; Manan, D.I.A.; Said, J. Sustainability of Audit Profession in Digital Technology Era: The Role of Competencies and Digital Technology Capabilities to Detect Fraud Risk. SAGE Open 2025, 15, 21582440241304974. [Google Scholar] [CrossRef]
Mehrabi, N.; Morstatter, F.; Saxena, N.; Lerman, K.; Galstyan, A. A Survey on Bias and Fairness in Machine Learning. ACM Comput. Surv. 2021, 54, 1–35. [Google Scholar] [CrossRef]
Ekin, T.; Frigau, L.; Conversano, C. Health care fraud classifiers in practice. Appl. Stoch. Models Bus. Ind. 2021, 37, 1182–1199. [Google Scholar] [CrossRef]
Saxena, S.; Singh, A.; Tiwari, S. Prediction model for digital image tampering using customised deep neural network techniques. Int. J. Syst. Assur. Eng. Manag. 2024, 1–9. [Google Scholar] [CrossRef]
Li, C.; Ding, S.; Zou, N.; Hu, X.; Jiang, X.; Zhang, K. Multi-task learning with dynamic re-weighting to achieve fairness in healthcare predictive modeling. J. Biomed. Inform. 2023, 143, 104399. [Google Scholar] [CrossRef]
D’hOndt, E.; Ashby, T.J.; Chakroun, I.; Koninckx, T.; Wuyts, R. Identifying and evaluating barriers for the implementation of machine learning in the intensive care unit. Commun. Med. 2022, 2, 162. [Google Scholar] [CrossRef]
Karimian, G.; Petelos, E.; Evers, S.M.A.A. The ethical issues of the application of artificial intelligence in healthcare: A systematic scoping review. AI Ethics 2022, 2, 539–551. [Google Scholar] [CrossRef]
Tazi, F.; Nandakumar, A.; Dykstra, J.; Rajivan, P.; Das, S. SoK: Analyzing Privacy and Security of Healthcare Data from the User Perspective. ACM Trans. Comput. Health 2024, 5, 1–31. [Google Scholar] [CrossRef]
Gholampour, S. Impact of Nature of Medical Data on Machine and Deep Learning for Imbalanced Datasets: Clinical Validity of SMOTE Is Questionable. Mach. Learn. Knowl. Extr. 2024, 6, 827–841. [Google Scholar] [CrossRef]
Alsamhi, S.H.; Myrzashova, R.; Hawbani, A.; Kumar, S.; Srivastava, S.; Zhao, L.; Wei, X.; Guizani, M.; Curry, E. Federated Learning Meets Blockchain in Decentralized Data Sharing: Healthcare Use Case. IEEE Internet Things J. 2024, 11, 19602–19615. [Google Scholar] [CrossRef]
Khan, N.; Nauman, M.; Almadhor, A.S.; Akhtar, N.; Alghuried, A.; Alhudhaif, A. Guaranteeing Correctness in Black-Box Machine Learning: A Fusion of Explainable AI and Formal Methods for Healthcare Decision-Making. IEEE Access 2024, 12, 90299–90316. [Google Scholar] [CrossRef]
Razzaq, K.; Shah, M. Barriers to Implementing ML for Cybercrime Prevention in Online Retailing. In Proceedings of SaudiCIS 2024, Dhahran, Saudi Arabia, 19–21 November 2024. [Google Scholar]
Sarker, I.H. Multi-aspects AI-based modeling and adversarial learning for cybersecurity intelligence and robustness: A comprehensive overview. Secur. Priv. 2023, 6, e295. [Google Scholar] [CrossRef]
Liang, Q.; Bauder, R.A.; Khoshgoftaar, T.M. Enhancing Medicare Fraud Detection: Random Undersampling Followed by SHAP-Driven Feature Selection with Big Data. In Proceedings of the 2024 IEEE 36th International Conference on Tools with Artificial Intelligence (ICTAI), Herndon, VA, USA, 28–30 October 2024; pp. 256–263. [Google Scholar]
Mohale, V.Z.; Obagbuwa, I.C. A systematic review on the integration of explainable artificial intelligence in intrusion detection systems to enhancing transparency and interpretability in cybersecurity. Front. Artif. Intell. 2025, 8, 1526221. [Google Scholar] [CrossRef]
Lakhan, A.; Mohammed, M.A.; Nedoma, J.; Martinek, R.; Tiwari, P.; Vidyarthi, A.; Alkhayyat, A.; Wang, W. Federated-Learning Based Privacy Preservation and Fraud-Enabled Blockchain IoMT System for Healthcare. IEEE J. Biomed. Health Inform. 2022, 27, 664–672. [Google Scholar] [CrossRef]
Kapadiya, K.; Ramoliya, F.; Gohil, K.; Patel, U.; Gupta, R.; Tanwar, S.; Rodrigues, J.J.; Alqahtani, F.; Tolba, A. Blockchain-assisted healthcare insurance fraud detection framework using ensemble learning. Comput. Electr. Eng. 2024, 122, 109898. [Google Scholar] [CrossRef]
Fetaji, B.; Fetaji, M.; Hasan, A.; Rexhepi, S.; Armenski, G. FRAUD-X: An Integrated AI, Blockchain, and Cybersecurity Framework with Early Warning Systems for Mitigating Online Financial Fraud—A Case Study from North Macedonia. IEEE Access 2025, 13, 48068–48082. [Google Scholar] [CrossRef]
Cholevas, C.; Angeli, E.; Sereti, Z.; Mavrikos, E.; Tsekouras, G.E. Anomaly Detection in Blockchain Networks Using Unsupervised Learning: A Survey. Algorithms 2024, 17, 201. [Google Scholar] [CrossRef]
Benedetti, H.; Nikbakht, E.; Sarkar, S.; Spieler, A.C. Blockchain and corporate fraud. J. Financ. Crime 2020, 28, 702–721. [Google Scholar] [CrossRef]
Xu, C.; Zhang, C.; Xu, J.; Pei, J. SlimChain: Scaling blockchain transactions through off-chain storage and parallel processing. In Proceedings of the VLDB Endowment, Copenhagen, Denmark, 4 December 2021; Volume 14, pp. 2314–2326. [Google Scholar]
Cirillo, F.; De Santis, M.; Esposito, C. Applications of Solid Platform and Federated Learning for Decentralized Health Data Management. In Artificial Intelligence Techniques for Analysing Sensitive Data in Medical Cyber-Physical Systems: System Protection and Data Analysis; Springer: Berlin/Heidelberg, Germany, 2025; pp. 95–111. [Google Scholar]
Li, N.; Lewin, A.; Ning, S.; Waito, M.; Zeller, M.P.; Tinmouth, A.; Shih, A.W.; The Canadian Transfusion Trials Group. Privacy-preserving federated data access and federated learning: Improved data sharing and AI model development in transfusion medicine. Transfusion 2024, 65, 22–28. [Google Scholar] [CrossRef]
Long, G.; Shen, T.; Tan, Y.; Gerrard, L.; Clarke, A.; Jiang, J. Federated learning for privacy-preserving open innovation future on digital health. In Humanity Driven AI: Productivity, Well-Being, Sustainability and Partnership; Springer: Berlin/Heidelberg, Germany, 2021; pp. 113–133. [Google Scholar]
Joshi, M.; Pal, A.; Sankarasubbu, M. Federated learning for healthcare domain-pipeline, applications and challenges. ACM Trans. Comput. Healthc. 2022, 3, 1–36. [Google Scholar] [CrossRef]
Paul, S.G.; Saha, A.; Hasan, Z.; Noori, S.R.H.; Moustafa, A. A Systematic Review of Graph Neural Network in Healthcare-Based Applications: Recent Advances, Trends, and Future Directions. IEEE Access 2024, 12, 15145–15170. [Google Scholar] [CrossRef]
Islam, M.A.; Majumder, M.Z.H.; Miah, M.S.; Jannaty, S. Precision healthcare: A deep dive into machine learning algorithms and feature selection strategies for accurate heart disease prediction. Comput. Biol. Med. 2024, 176, 108432. [Google Scholar] [CrossRef]
Rakhmatulin, I.; Dao, M.-S.; Nassibi, A.; Mandic, D. Exploring Convolutional Neural Network Architectures for EEG Feature Extraction. Sensors 2024, 24, 877. [Google Scholar] [CrossRef]
Hassan, M.; Kaabouch, N. Impact of Feature Selection Techniques on the Performance of Machine Learning Models for Depression Detection Using EEG Data. Appl. Sci. 2024, 14, 10532. [Google Scholar] [CrossRef]
Theng, D.; Bhoyar, K.K. Feature selection techniques for machine learning: A survey of more than two decades of research. Knowl. Inf. Syst. 2023, 66, 1575–1637. [Google Scholar] [CrossRef]
Amiriebrahimabadi, M.; Mansouri, N. A comprehensive survey of feature selection techniques based on whale optimization algorithm. Multimedia Tools Appl. 2023, 83, 47775–47846. [Google Scholar] [CrossRef]
Bounab, R.; Guelib, B.; Benzerogue, S.; Zarour, K. Optimizing Machine Learning for Healthcare Fraud Detection: A Framework Using Hybrid Feature Selection and Hyperparameter Tuning. In Proceedings of the 2024 International Conference on Advanced Aspects of Software Engineering (ICAASE), Constantine, Algeria, 9–10 November 2024; pp. 1–8. [Google Scholar]
van den Puttelaar, R.; de Lima, P.N.; Knudsen, A.B.; Rutter, C.M.; Kuntz, K.M.; de Jonge, L.; Escudero, F.A.; Lieberman, D.; Zauber, A.G.; Hahn, A.I.; et al. Effectiveness and Cost-Effectiveness of Colorectal Cancer Screening With a Blood Test that Meets the Centers for Medicare & Medicaid Services Coverage Decision. Gastroenterology 2024, 167, 368–377. [Google Scholar] [CrossRef]
Pennap, D.; Swain, R.S.; Akhtar, S.; Liao, J.; Wei, Y.; Li, J.; Wernecke, M.; MaCurdy, T.E.; Kelman, J.A.; Mosholder, A.D.; et al. Comparing the Centers for Medicare and Medicaid Services (CMS) enrollment data death dates to the National Death Index (NDI). Pharmacoepidemiol. Drug Saf. 2024, 33, e5772. [Google Scholar] [CrossRef] [PubMed]
Hancock, J.T.; Wang, H.; Khoshgoftaar, T.M.; Liang, Q. Data reduction techniques for highly imbalanced medicare Big Data. J. Big Data 2024, 11, 8. [Google Scholar] [CrossRef]
Yang, X.; Zeng, Z.; Teo, S.G.; Wang, L.; Chandrasekhar, V.; Hoi, S. Deep learning for practical image recognition: Case study on kaggle competitions. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018. [Google Scholar]
Leone, N.; Greco, G.; Ianni, G.; Lio, V.; Terracina, G.; Eiter, T.; Staniszkis, W. The INFOMIX system for advanced integration of incomplete and inconsistent data. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, Baltimore, MD, USA, 14–16 June 2005. [Google Scholar]
Cai, L.; Li, J.; Lv, H.; Liu, W.; Niu, H.; Wang, Z. Integrating domain knowledge for biomedical text analysis into deep learning: A survey. J. Biomed. Inform. 2023, 143, 104418. [Google Scholar] [CrossRef]
Murala, D.K.; Panda, S.K.; Dash, S.P. MedMetaverse: Medical Care of Chronic Disease Patients and Managing Data Using Artificial Intelligence, Blockchain, and Wearable Devices State-of-the-Art Methodology. IEEE Access 2023, 11, 138954–138985. [Google Scholar] [CrossRef]
Warren, J.L.; Barrett, M.J.; White, D.P.; Banks, R.; Cafardi, S.; Enewold, L. Sensitivity of Medicare Data to Identify Oncologists. JNCI Monogr. 2020, 2020, 60–65. [Google Scholar] [CrossRef]
Jacobs, J.P.; Shahian, D.M.; Grau-Sepulveda, M.; O’Brien, S.M.; Pruitt, E.Y.; Bloom, J.P.; Edgerton, J.R.; Kurlansky, P.A.; Habib, R.H.; Antman, M.S.; et al. Current Penetration, Completeness, and Representativeness of The Society of Thoracic Surgeons Adult Cardiac Surgery Database. Ann. Thorac. Surg. 2022, 113, 1461–1468. [Google Scholar] [CrossRef]
Johnson, J.M.; Khoshgoftaar, T.M. Data-Centric AI for Healthcare Fraud Detection. SN Comput. Sci. 2023, 4, 1–14. [Google Scholar] [CrossRef]
Duman, E. Implementation of XGBoost Method for Healthcare Fraud Detection. Sci. J. Mehmet Akif Ersoy Univ. 2022, 5, 69–75. [Google Scholar]
Tariq, M.; Palade, V.; Ma, Y. Transfer learning based classification of diabetic retinopathy on the Kaggle EyePACS dataset. In International Conference on Medical Imaging and Computer-Aided Diagnosis; Springer: Singapore, 2022. [Google Scholar]
Neto, E.C.P.; Dadkhah, S.; Sadeghi, S.; Molyneaux, H.; Ghorbani, A.A. A review of Machine Learning (ML)-based IoT security in healthcare: A dataset perspective. Comput. Commun. 2023, 213, 61–77. [Google Scholar] [CrossRef]
Kumaraswamy, N.; Markey, M.K.; Barner, J.C.; Rascati, K. Feature engineering to detect fraud using healthcare claims data. Expert Syst. Appl. 2022, 210, 118433. [Google Scholar] [CrossRef]
Haque, M.E.; Tozal, M.E. Identifying health insurance claim frauds using mixture of clinical concepts. IEEE Trans. Serv. Comput. 2021, 15, 2356–2367. [Google Scholar] [CrossRef]
Mardani, S.; Moradi, H. Using Graph Attention Networks in Healthcare Provider Fraud Detection. IEEE Access 2024, 12, 132786–132800. [Google Scholar] [CrossRef]
Xu, J.; Cai, H.; Zheng, X. Timing of vasopressin initiation and mortality in patients with septic shock: Analysis of the MIMIC-III and MIMIC-IV databases. BMC Infect. Dis. 2023, 23, 1–10. [Google Scholar] [CrossRef]
Tian, J.; Cui, R.; Song, H.; Zhao, Y.; Zhou, T. Prediction of acute kidney injury in patients with liver cirrhosis using machine learning models: Evidence from the MIMIC-III and MIMIC-IV. Int. Urol. Nephrol. 2023, 56, 237–247. [Google Scholar] [CrossRef]
Qayyum, A.; Qadir, J.; Bilal, M.; Al-Fuqaha, A. Secure and Robust Machine Learning for Healthcare: A Survey. IEEE Rev. Biomed. Eng. 2020, 14, 156–180. [Google Scholar] [CrossRef]
Razzaq, K.; Shah, M. Advancing Cybersecurity Through Machine Learning: A Scientometric Analysis of Global Research Trends and Influential Contributions. J. Cybersecur. Priv. 2025, 5, 12. [Google Scholar] [CrossRef]
Borky, J.M.; Bradley, T.H. Protecting information with cybersecurity. In Effective Model-Based Systems Engineering; Springer: Cham, Switzerland, 2019; pp. 345–404. [Google Scholar]
Hassan, N.H.; Ismail, Z.; Maarop, N. A conceptual model for knowledge sharing towards information security culture in healthcare organization. In Proceedings of the 2013 International Conference on Research and Innovation in Information Systems (ICRIIS), Kuala Lumpur, Malaysia, 27–28 November 2013; pp. 516–520. [Google Scholar]
Nifakos, S.; Chandramouli, K.; Nikolaou, C.K.; Papachristou, P.; Koch, S.; Panaousis, E.; Bonacina, S. Influence of Human Factors on Cyber Security within Healthcare Organisations: A Systematic Review. Sensors 2021, 21, 5119. [Google Scholar] [CrossRef]
Georgiadou, A.; Mouzakitis, S.; Askounis, D. Assessing MITRE ATT&CK Risk Using a Cyber-Security Culture Framework. Sensors 2021, 21, 3267. [Google Scholar] [CrossRef]
Papathanasiou, A.; Liontos, G.; Liagkou, V.; Glavas, E. Business email compromise (BEC) attacks: Threats, vulnerabilities and countermeasures—A perspective on the greek landscape. J. Cybersecur. Priv. 2023, 3, 610–637. [Google Scholar] [CrossRef]
Yang, X.; Zhang, C.; Sun, Y.; Pang, K.; Jing, L.; Wa, S.; Lv, C. FinChain-BERT: A High-Accuracy Automatic Fraud Detection Model Based on NLP Methods for Financial Scenarios. Information 2023, 14, 499. [Google Scholar] [CrossRef]
Samariya, D.; Thakkar, A. A Comprehensive Survey of Anomaly Detection Algorithms. Ann. Data Sci. 2021, 10, 829–850. [Google Scholar] [CrossRef]
Jabarulla, M.Y.; Lee, H.-N. A Blockchain and Artificial Intelligence-Based, Patient-Centric Healthcare System for Combating the COVID-19 Pandemic: Opportunities and Applications. Healthcare 2021, 9, 1019. [Google Scholar] [CrossRef] [PubMed]
TensorFlow Developers. TensorFlow; Zenodo: Genève, Switzerland, 2022. [Google Scholar]
Tolstoluzka, O.; Telezhenko, D. Development and training of LSTM models for control of virtual distributed systems using TensorFlow and Keras. Radioelectron. Comput. Syst. 2024, 2024, 27–37. [Google Scholar] [CrossRef]
Abadi, Z.J.K.; Mansouri, N.; Javidi, M.M. Deep reinforcement learning-based scheduling in distributed systems: A critical review. Knowl. Inf. Syst. 2024, 66, 5709–5782. [Google Scholar] [CrossRef]
Imambi, S.; Prakash, K.B.; Kanagachidambaresan, G. PyTorch. In Programming with TensorFlow: Solution for Edge Computing Applications; Springer: Cham, Switzerland, 2021; pp. 87–104. [Google Scholar]
Kim, S.; Wimmer, H.; Kim, J. Analysis of deep learning libraries: Keras, pytorch, and MXnet. In Proceedings of the 2022 IEEE/ACIS 20th International Conference on Software Engineering Research, Management and Applications (SERA), Las Vegas, NV, USA, 22–25 May 2022. [Google Scholar]
Merlini, D.; Rossini, M. Text categorization with WEKA: A survey. Mach. Learn. Appl. 2021, 4, 100033. [Google Scholar] [CrossRef]
Qamar, U.; Raza, M.S. Practical Data Science with WEKA. In Data Science Concepts and Techniques with Applications; Springer: Berlin/Heidelberg, Germany, 2023; pp. 393–448. [Google Scholar]
Novac, O.-C.; Chirodea, M.C.; Novac, C.M.; Bizon, N.; Oproescu, M.; Stan, O.P.; Gordan, C.E. Analysis of the Application Efficiency of TensorFlow and PyTorch in Convolutional Neural Network. Sensors 2022, 22, 8872. [Google Scholar] [CrossRef]
Joseph, F.J.J.; Nonsiri, S.; Monsakul, A. Keras and TensorFlow: A hands-on experience. In Advanced Deep Learning for Engineers and Scientists: A Practical Approach; Springer: Cham, Switzerland, 2021; pp. 85–111. [Google Scholar]
Rivera-Escobedo, M.; López-Martínez, M.D.J.; Solis-Sánchez, L.O.; Guerrero-Osuna, H.A.; Vázquez-Reyes, S.; Acosta-Escareño, D.; Olvera-Olvera, C.A. Low-Scalability Distributed Systems for Artificial Intelligence: A Comparative Study of Distributed Deep Learning Frameworks for Image Classification. Appl. Sci. 2025, 15, 6251. [Google Scholar] [CrossRef]
Ketkar, N.; Moolayil, J. Introduction to PyTorch. In Deep Learning with Python: Learn Best Practices of Deep Learning Models with PyTorch; Apress: New York, NY, USA, 2021; pp. 27–91. [Google Scholar]
Shao, Y.; Zhang, C.; Xing, L.; Sun, H.; Zhao, Q.; Zhang, L. A new dust detection method for photovoltaic panel surface based on Pytorch and its economic benefit analysis. Energy AI 2024, 16, 100349. [Google Scholar] [CrossRef]
Li, M.; Wen, K.; Lin, H.; Jin, X.; Wu, Z.; An, H.; Chi, M. Improving the Performance of Distributed MXNet with RDMA. Int. J. Parallel Program. 2019, 47, 467–480. [Google Scholar] [CrossRef]
Stancato, G. Enhancing Parametric Design Education Through Rhinoceros/Grasshopper: Visual Perception Principles, Student Learning, and Future Integration with AI. In Advances in Representation: New AI-and XR-Driven Transdisciplinarity; Springer: Berlin/Heidelberg, Germany, 2024; pp. 813–824. [Google Scholar]
Sachin, D.N.; Annappa, B.; Ambesange, S. Federated learning for digital healthcare: Concepts, applications, frameworks, and challenges. Computing 2024, 106, 3113–3150. [Google Scholar] [CrossRef]
Bouh, M.M.; Hossain, F.; Paul, P.; Ahmed, A. Enhancing Medical Records Digitization Through a Post-OCR Processing Technique. In Proceedings of the 2024 IEEE-EMBS Conference on Biomedical Engineering and Sciences (IECBES), Penang, Malaysia, 11–13 December 2024; pp. 311–316. [Google Scholar]
Hernandez, F.G.; Nguyen, Q.; Smith, V.C.; Cordero, J.A.; Ballester, M.R.; Duran, M.; Solé, A.; Chotsiri, P.; Wattanakul, T.; Mundin, G.; et al. Named entity recognition of pharmacokinetic parameters in the scientific literature. Sci. Rep. 2024, 14, 1–8. [Google Scholar] [CrossRef] [PubMed]
Niu, H.; Omitaomu, O.A.; Langston, M.A.; Olama, M.; Ozmen, O.; Klasky, H.B.; Laurio, A.; Ward, M.; Nebeker, J. EHR-BERT: A BERT-based model for effective anomaly detection in electronic health records. J. Biomed. Inform. 2024, 150, 104605. [Google Scholar] [CrossRef] [PubMed]
Purushothaman, S.; Shanmugam, G.S.; Nagarajan, S. Achieving Seamless Semantic Interoperability and Enhancing Text Embedding in Healthcare IoT: A Deep Learning Approach with Survey. SN Comput. Sci. 2023, 5, 1–28. [Google Scholar] [CrossRef]
Diez, P.L.; Sundgaard, J.V.; Margeta, J.; Diab, K.; Patou, F.; Paulsen, R.R. Deep reinforcement learning and convolutional autoencoders for anomaly detection of congenital inner ear malformations in clinical CT images. Comput. Med. Imaging Graph. 2024, 113, 102343. [Google Scholar] [CrossRef]
Ioannou, I.; Nagaradjane, P.; Angin, P.; Balasubramanian, P.; Kavitha, K.J.; Murugan, P.; Vassiliou, V. GEMLIDS-MIOT: A Green Effective Machine Learning Intrusion Detection System based on Federated Learning for Medical IoT network security hardening. Comput. Commun. 2024, 218, 209–239. [Google Scholar] [CrossRef]
Imtiaz, M.A.; Razzaq, K.; Javed, M.A.; Masood, H.; Yousaf, H.F.; Siddique, H. An Enhanced Data Protection and Security based on Machine Learning: Deep Analysis on Threat Mitigation, Challenges in Internet of Medical Things (IoMTs). Spectr. Eng. Sci. 2025, 3, 496–521. [Google Scholar]
Odufisan, O.I.; Abhulimen, O.V.; Ogunti, E.O. Harnessing artificial intelligence and machine learning for fraud detection and prevention in Nigeria. J. Econ. Criminol. 2025, 7, 100127. [Google Scholar] [CrossRef]
Zhang, J.; Morley, J.; Gallifant, J.; Oddy, C.; Teo, J.T.; Ashrafian, H.; Delaney, B.; Darzi, A. Mapping and evaluating national data flows: Transparency, privacy, and guiding infrastructural transformation. Lancet Digit. Health 2023, 5, e737–e748. [Google Scholar] [CrossRef] [PubMed]
Andrade, D. GDPR and Cross-Border Data Transfers in Clinical Trials. 2025. Available online: https://www.clinicaltrialvanguard.com/article/gdpr-and-cross-border-data-transfers-in-clinical-trials/ (accessed on 27 June 2025).
Shaikh, T.A.; Rasool, T.; Verma, P.; Mir, W.A. A fundamental overview of ensemble deep learning models and applications: Systematic literature and state of the art. Ann. Oper. Res. 2024, 1–77. [Google Scholar] [CrossRef]
Alahmadi, A.; Khan, H.A.; Shafiq, G.; Ahmed, J.; Ali, B.; Javed, M.A.; Alahmadi, A.H. A privacy-preserved IoMT-based mental stress detection framework with federated learning. J. Supercomput. 2024, 80, 10255–10274. [Google Scholar] [CrossRef]
Kumar, R.; Garg, S.; Kaur, R.; Johar, M.G.M.; Singh, S.; Menon, S.V.; Kumar, P.; Hadi, A.M.; Hasson, S.A.; Lozanović, J. A comprehensive review of machine learning for heart disease prediction: Challenges, trends, ethical considerations, and future directions. Front. Artif. Intell. 2025, 8, 1583459. [Google Scholar] [CrossRef]
Vyas, A.; Abimannan, S.; Hwang, R.H. Sensitive Healthcare Data: Privacy and Security Issues and Proposed Solutions. In Emerging Technologies for Healthcare: Internet of Things and Deep Learning Models; Wiley Online Library: Hoboken, NJ, USA, 2021; pp. 93–127. [Google Scholar]
Sahoh, B.; Choksuriwong, A. The role of explainable Artificial Intelligence in high-stakes decision-making systems: A systematic review. J. Ambient. Intell. Humaniz. Comput. 2023, 14, 7827–7843. [Google Scholar] [CrossRef]
Schreiber, M. New AI tool counters health insurance denials decided by automated algorithms. In The Guardian; American Medical Association: Chicago, IL, USA, 2025. [Google Scholar]
Lebcir, I. Utilizing Machine Learning for Financial Management in Healthcare. South East. Eur. J. Public Health 2025, 26, 1529–1542. [Google Scholar]
Dubey, S.; Verma, D.K.; Kumar, M. Severe acute respiratory syndrome Coronavirus-2 GenoAnalyzer and mutagenic anomaly detector using FCMFI and NSCE. Int. J. Biol. Macromol. 2023, 258, 129051. [Google Scholar] [CrossRef]
Zhou, W.; Wu, S.; Wang, Y.; Zuo, L.; Yi, Y.; Cui, W. DMU-TransNet: Dense multi-scale U-shape transformer network for anomaly detection. Measurement 2024, 229, 114216. [Google Scholar] [CrossRef]
Lu, S.; Zhang, W.; Guo, J.; Liu, H.; Li, H.; Wang, N. PatchCL-AE: Anomaly detection for medical images using patch-wise contrastive learning-based auto-encoder. Comput. Med. Imaging Graph. 2024, 114, 102366. [Google Scholar] [CrossRef]
Settipalli, L.; Gangadharan, G. WMTDBC: An unsupervised multivariate analysis model for fraud detection in health insurance claims. Expert Syst. Appl. 2022, 215, 119259. [Google Scholar] [CrossRef]
Kumaraswamy, N.; Ekin, T.; Park, C.; Markey, M.K.; Barner, J.C.; Rascati, K. Using a Bayesian Belief Network to detect healthcare fraud. Expert Syst. Appl. 2023, 238, 122241. [Google Scholar] [CrossRef]
Cai, Y.; Zhang, W.; Chen, H.; Cheng, K.-T. MedIAnomaly: A comparative study of anomaly detection in medical images. Med. Image Anal. 2025, 102, 103500. [Google Scholar] [CrossRef] [PubMed]
Alsalman, D. A Comparative Study of Anomaly Detection Techniques for IoT Security Using Adaptive Machine Learning for IoT Threats. IEEE Access 2024, 12, 14719–14730. [Google Scholar] [CrossRef]
Sai, S.; Bhandari, K.S.; Nawal, A.; Chamola, V.; Sikdar, B. An IoMT-Based Incremental Learning Framework With a Novel Feature Selection Algorithm for Intelligent Diagnosis in Smart Healthcare. IEEE Trans. Mach. Learn. Commun. Netw. 2024, 2, 370–383. [Google Scholar] [CrossRef]
Siddiqui, M.A.; Kalra, M.; Krishna, C.R. ADSBAN: Anomaly detection system for body area networks utilizing IoT and machine learning. Concurr. Comput. Pr. Exp. 2024, 36, e8075. [Google Scholar] [CrossRef]
Seshagiri, S.; Prema, K.V. Efficient Handling of Data Imbalance in Health Insurance Fraud Detection Using Meta-Reinforcement Learning. IEEE Access 2025, 13, 23482–23497. [Google Scholar] [CrossRef]
Mohanty, M.D.; Das, A.; Mohanty, M.N.; Altameem, A.; Nayak, S.R.; Saudagar, A.K.J.; Poonia, R.C. Design of smart and secured healthcare service using deep learning with modified SHA-256 algorithm. Healthcare 2022, 10, 1274. [Google Scholar] [CrossRef]
Khosravi, P.; Mohammadi, S.; Zahiri, F.; Khodarahmi, M.; Zahiri, J. AI-Enhanced Detection of Clinically Relevant Structural and Functional Anomalies in MRI: Traversing the Landscape of Conventional to Explainable Approaches. J. Magn. Reson. Imaging 2024, 60, 2272–2289. [Google Scholar] [CrossRef]
Hong, B.; Lu, P.; Xu, H.; Lu, J.; Lin, K.; Yang, F. Health insurance fraud detection based on multi-channel heterogeneous graph structure learning. Heliyon 2024, 10, e30045. [Google Scholar] [CrossRef]
Hancock, J.T.; Bauder, R.A.; Wang, H.; Khoshgoftaar, T.M. Explainable machine learning models for Medicare fraud detection. J. Big Data 2023, 10, 154. [Google Scholar] [CrossRef]
Khan, M.M.; Alkhathami, M. Anomaly detection in IoT-based healthcare: Machine learning for enhanced security. Sci. Rep. 2024, 14, 5872. [Google Scholar] [CrossRef]
Devaguptam, S.; Gorti, S.S.; Akshaya, T.L.; Kamath, S.S. Automated Health Insurance Processing Framework with Intelligent Fraud Detection, Risk Classification and Premium Prediction. SN Comput. Sci. 2024, 5, 450. [Google Scholar] [CrossRef]

Figure 1. Workflow of feature engineering in healthcare fraud detection.

Figure 2. Overview of key applications of machine learning in healthcare fraud detection.

Table 1. Summary of Machine Learning Techniques Used in Healthcare Fraud Detection.

Category	Technique	Description	Algorithms	Strengths	References
Traditional ML	Supervised Learning	Trained on labelled data	DT, RF, LR, SVM	Effective with labelled data; detects known fraud patterns	[55,56,57,58,59]
Traditional ML	Unsupervised Learning	Identifies hidden patterns	Clustering, Anomaly Detection	Detects novel fraud schemes; labels not needed	[60,61,62,63]
Traditional ML	Semi-Supervised Learning	Uses labelled and unlabelled data	Mixed supervised/unsupervised	Works with limited labelled data	[64,65,66]
Advanced ML	Deep Learning	Neural networks for complex pattern analysis	CNNs, LSTMs	Handles high-dimensional data; detects sophisticated fraud	[67,68,69,70,71,72,73]
Advanced ML	Ensemble Methods	Combines multiple models	Boosting, Stacking	Robustness; high accuracy	[74,75,76]
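
To make the unsupervised row of Table 1 concrete, the following is a minimal sketch of label-free anomaly detection, assuming a single numeric feature (billed amount per claim). It is not a method taken from the reviewed studies: the function name mad_outliers, the threshold of 3.5, and the claim amounts are hypothetical choices made for illustration, and the technique shown is a simple modified z-score based on the median absolute deviation (MAD) rather than any specific algorithm from the cited references.

```python
from statistics import median


def mad_outliers(amounts, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    Hypothetical helper for illustration; the 3.5 cut-off is a common
    rule of thumb, not a value taken from the reviewed literature.
    """
    med = median(amounts)
    abs_dev = [abs(x - med) for x in amounts]
    mad = median(abs_dev)
    if mad == 0:
        # No spread around the median: nothing can be ranked as unusual.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # under an assumed normal distribution.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Illustrative billed amounts for one procedure code (made-up values).
claims = [120.0, 115.0, 130.0, 125.0, 118.0, 122.0, 950.0, 127.0]
flagged = mad_outliers(claims)
print(flagged)  # indices of claims that deviate strongly from the median
```

No fraud labels are used at any point, which is the property the table attributes to unsupervised methods. In practice, a flagged claim would only be a candidate for manual review, since an unusual amount is not by itself evidence of fraud.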

Table 2. Comparative analysis (strength) of LLMs and Graph-based models in healthcare fraud detection.

Capability	Temporal Abstraction	Cross-Modal Learning	Interpretability	Deployment
LLMs (GPT, BERT)	Moderate	Strong	Moderate	High
Graph-based models (TGAT, TGCN)	Strong	Moderate	Moderate	Moderate
Computational Trade-offs	Graph-based models require more memory	LLMs need high computational power	GNNs are more transparent	LLMs are good in text domains, while GNNs are good in graph scenarios

Table 4. Comparison of Machine Learning Frameworks for Healthcare Applications.

Framework	Key Functionalities	Strengths Relevant to Healthcare Fraud Detection	Limitations/Considerations	References
TensorFlow	Deep learning, numerical computation, large-scale ML, multi-language support, GPU/distributed processing, Keras integration.	Scalability, flexibility, strong industry adoption, and a comprehensive ecosystem for complex models.	It can have a steeper learning curve for beginners compared to Keras.	[138,139,145,146,147]
PyTorch	Deep learning, dynamic computation graphs, strong GPU acceleration, and extensive libraries.	Flexibility, ease of use in research, strong community support, and rapid prototyping.	It may require more coding for basic tasks than Weka.	[141,148,149]
MXNet	Scalable deep learning, multi-language support, and efficient training.	Scalability and support from major cloud providers.	No longer under active development as of September 2023.	[150,151]
Weka	Data mining, machine learning algorithms, user-friendly GUI, data preprocessing, and visualisation.	Ease of use, intuitive interface, a wide range of algorithms, and suitable for non-programmers.	Less emphasis on deep learning compared to TensorFlow and PyTorch.	[143,144]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Razzaq, K.; Shah, M. Next-Generation Machine Learning in Healthcare Fraud Detection: Current Trends, Challenges, and Future Research Directions. Information 2025, 16, 730. https://doi.org/10.3390/info16090730

