Article

Application of Standard Machine Learning Models for Medicare Fraud Detection with Imbalanced Data

by Dorsa Farahmandazad 1,*, Kasra Danesh 2 and Hossein Fazel Najaf Abadi 3,*
1 College of Business, Florida Atlantic University, Boca Raton, FL 33431, USA
2 Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA
3 Department of Computer Science, Columbia College, Vancouver, BC V6B 1Z3, Canada
* Authors to whom correspondence should be addressed.
Risks 2025, 13(10), 198; https://doi.org/10.3390/risks13100198
Submission received: 3 September 2025 / Revised: 23 September 2025 / Accepted: 28 September 2025 / Published: 13 October 2025
(This article belongs to the Special Issue Artificial Intelligence Risk Management)

Abstract

Medicare fraud poses a substantial challenge to healthcare systems, resulting in significant financial losses and undermining the quality of care provided to legitimate beneficiaries. This study investigates the use of machine learning (ML) to enhance Medicare fraud detection, addressing key challenges such as class imbalance, high-dimensional data, and evolving fraud patterns. A dataset comprising inpatient claims, outpatient claims, and beneficiary details was used to train and evaluate five ML models: Random Forest, KNN, LDA, Decision Tree, and AdaBoost. Data preprocessing techniques included resampling with the Synthetic Minority Oversampling Technique (SMOTE) to address the class imbalance, feature selection for dimensionality reduction, and aggregation of diagnostic and procedural codes. Random Forest emerged as the best-performing model, achieving a training accuracy of 99.2%, a validation accuracy of 98.8%, and an F1-score of 98.4%. The Decision Tree also performed well, achieving a validation accuracy of 96.3%. KNN and AdaBoost demonstrated moderate performance, with validation accuracies of 79.2% and 81.1%, respectively, while LDA struggled with a validation accuracy of 63.3% and a low recall of 16.6%. The results highlight the importance of advanced resampling techniques, feature engineering, and adaptive learning in detecting Medicare fraud effectively. This study underscores the potential of machine learning in addressing the complexities of fraud detection. Future work should explore explainable AI and hybrid models to improve interpretability and performance, ensuring scalable and reliable fraud detection systems that protect healthcare resources and beneficiaries.

1. Introduction

Medicare fraud poses a serious threat to the sustainability and integrity of healthcare systems, especially within large programs like Medicare in the United States, which provides healthcare coverage to millions. Fraudulent activities result in billions of dollars in financial losses each year, eroding trust in the healthcare system and undermining the quality of care provided to legitimate beneficiaries (Herland et al. 2019; Johnson and Khoshgoftaar 2019b; Copeland 2023). Detecting and preventing Medicare fraud is, therefore, a crucial focus for policymakers and healthcare providers alike. As fraudulent schemes become more sophisticated, traditional detection methods struggle to keep pace, but advances in technology, particularly in machine learning, are providing promising new solutions. Medicare fraud includes a range of deceptive practices such as billing for services not provided, overcharging, upcoding (claiming higher-cost services than those actually performed), and using false diagnoses to inflate claims (Copeland 2023). Identifying fraud within the enormous volume of legitimate claims processed daily is challenging. According to Herland et al. (2019), the successful detection of Medicare fraud could potentially recover up to $350 billion in losses, emphasizing the critical need for effective fraud detection systems. However, this is easier said than done, as fraudulent claims make up only a tiny fraction of the total transactions, making detection akin to finding a needle in a haystack. Several key challenges arise in the effort to detect Medicare fraud effectively, many of which the proposed machine learning method aims to address. Recent work has expanded on these challenges by developing scalable big data frameworks, such as PySpark-based pipelines, which combine resampling with parallel computation to process healthcare fraud data at scale (Bauder et al. 2018). 
Additionally, neural network approaches have been tested on Medicare datasets, showing strong pattern-recognition capabilities, but continuing to struggle with class imbalance and interpretability (Bounab et al. 2024).
One of the most significant challenges in Medicare fraud detection is the extreme class imbalance between fraudulent and non-fraudulent claims (Johnson and Khoshgoftaar 2019b). Fraudulent transactions account for less than 1% of total claims, which means that traditional machine learning models often fail to detect these rare instances or produce a high number of false positives. Standard algorithms tend to be biased toward the majority class, which in this case is legitimate claims, and struggle to identify fraudulent cases accurately (Johnson and Khoshgoftaar 2019b). The method addresses this challenge through advanced resampling techniques such as oversampling of fraudulent cases (e.g., SMOTE) and undersampling of legitimate ones, as well as hybrid techniques that maintain model performance while dealing with imbalanced data (Copeland 2023; Qazi and Raza 2012). Medicare claims data is often high-dimensional and highly structured, with hundreds of features such as patient demographics, provider information, diagnoses, and procedures. Extracting relevant features while filtering out irrelevant noise is critical to improving model performance. The approach incorporates feature selection and dimensionality reduction techniques to streamline the data and highlight the most important indicators of fraud (Ahmadi et al. 2025). By refining the data fed into machine learning models, it increases the likelihood of accurately detecting fraudulent claims without overwhelming the model with unnecessary complexity (Nabrawi and Alanazi 2023). Fraudulent behaviors evolve as fraudsters develop new ways to exploit the Medicare system. Static models often struggle to keep pace with these changes. To combat this, the method incorporates machine learning models that can adapt over time.
By continuously retraining on updated data, the approach ensures that the detection system evolves alongside emerging fraud patterns, reducing the risk of outdated models missing new types of fraud (Ahmadi et al. 2025). This dynamic approach to fraud detection enables us to identify evolving fraudulent activities more effectively than traditional static models. Other approaches have attempted to overcome this issue through dual-model frameworks that combine unsupervised anomaly detection with supervised classifiers, thereby capturing novel fraud behaviors while reducing false positives (Johnson and Khoshgoftaar 2023). Comparative studies have also highlighted how different balancing strategies, including undersampling, oversampling, and hybrid methods, significantly affect classifier performance under severe rarity.
More recently, hybrid deep learning methods have been proposed, such as CNN-Transformer-XGBoost frameworks combined with explainable AI tools, which not only boost accuracy but also improve transparency for stakeholders. While identifying fraudulent claims is essential, it is equally important to minimize the occurrence of false positives—cases where legitimate claims are incorrectly flagged as fraud. High false-positive rates can result in unnecessary investigations and strain on healthcare providers (Nabrawi and Alanazi 2023). The proposed solution uses precision-tuned algorithms that optimize for both sensitivity (identifying fraud) and specificity (reducing false positives), ensuring that the detection system is both accurate and practical for real-world applications. Given the size and complexity of Medicare datasets, scalability is a major concern. Fraud detection models must be able to handle millions of claims in real time without sacrificing accuracy or speed. The method leverages deep learning models, which are particularly adept at processing large datasets, and introduces techniques such as parallel computing and distributed processing to ensure that the model can scale efficiently while maintaining high performance (Farhadi Nia et al. 2025). Complementary analyses have also shown that Random Forest, Logistic Regression, and SMOTE can uncover distinct behavioral fraud patterns in providers, reinforcing the need for comparative evaluations of multiple models in healthcare fraud research.
In this paper, we present a comprehensive framework for Medicare fraud detection using advanced ML techniques to address critical challenges, including class imbalance, high-dimensional data, and the dynamic nature of fraudulent schemes. This study makes several key contributions to the field of fraud detection in healthcare systems. First, we tackle the issue of extreme class imbalance, where fraudulent claims represent less than 1% of the total data. Using resampling techniques such as the Synthetic Minority Oversampling Technique (SMOTE), we balance the dataset effectively, ensuring that ML models are sensitive to minority class patterns while minimizing overfitting. This approach improves the detection of rare fraudulent claims, which standard models often overlook. Second, we address the high-dimensional nature of Medicare datasets by implementing feature selection and dimensionality reduction techniques. By streamlining the dataset, we preserve key indicators of fraud, such as suspicious billing patterns and procedural anomalies, while reducing computational complexity. This step enhances model efficiency and interpretability, enabling better performance on both training and validation datasets. Third, we integrate adaptive learning mechanisms into our ML models to ensure they remain effective against evolving fraud patterns. By continuously retraining models on updated datasets, we provide a dynamic solution that adapts to new fraudulent behaviors, outperforming static detection methods. Fourth, we conduct a comparative evaluation of multiple ML algorithms, including Random Forest, Decision Tree, KNN, LDA, and AdaBoost, to identify the most effective model. From a business perspective, the framework introduces transformative advancements by significantly reducing financial losses through enhanced fraud detection accuracy while maintaining operational efficiency with scalable and adaptive solutions.
By addressing class imbalance, leveraging advanced feature selection, and incorporating dynamic learning, it enables real-time fraud detection, reduces false positives, and minimizes unnecessary investigations, thereby saving costs and improving trust in healthcare operations. The system’s explainability fosters transparency for decision-makers, while its scalability and predictive insights offer a proactive approach to fraud management. Additionally, this innovative methodology provides a competitive edge, ensures regulatory compliance, and holds potential for broader applications in other fraud-prone industries, making it an asset for organizations aiming to enhance resilience and efficiency. Compared with earlier works (Bauder et al. 2018; Bounab et al. 2024; Johnson and Khoshgoftaar 2019b, 2023; Mayaki and Riveill 2022; Yoo et al. 2023), which each focused on specific components such as neural networks, PySpark scalability, anomaly detection hybrids, or CNN-Transformer frameworks, the novelty of our study lies in integrating these strategies into a single comparative and business-driven framework, explicitly linking technical performance with operational healthcare outcomes.

2. Literature Review

Medicare fraud detection is a critical issue that requires ongoing attention from researchers, policymakers, and healthcare providers (Copeland 2023; Qazi and Raza 2012; Ahmadi et al. 2025). The use of advanced machine learning models, combined with techniques for addressing class imbalance and dimensionality reduction, has significantly improved the ability to detect fraudulent activities in Medicare claims. However, challenges remain, particularly in the areas of model evaluation, real-world validation, and the ethical use of machine learning for fraud detection. As healthcare systems continue to evolve, so too must the methods used to protect them from fraud, ensuring that resources are directed toward providing high-quality care for legitimate beneficiaries.

2.1. The Challenges of Class Imbalance in Medicare Fraud Detection

One of the most pressing challenges in Medicare fraud detection is the issue of class imbalance, where the number of legitimate claims vastly outweighs the number of fraudulent ones. As Bauder et al. (2018) highlighted, in a dataset where only 0.062% of providers were identified as fraudulent, the severe class imbalance poses a significant problem for machine learning models. Most models, when trained on such datasets, tend to be biased toward the majority class, resulting in poor performance when it comes to identifying fraud. To address this, various techniques have been employed. Bounab et al. (2024) proposed a hybrid method combining the SMOTE with Edited Nearest Neighbor (ENN) to mitigate class imbalance in Medicare fraud detection. Their approach improved detection accuracy by addressing the limitations of traditional oversampling methods, such as overfitting and noise generation. These findings underscore the importance of using advanced sampling techniques to balance datasets and ensure that models can accurately detect fraudulent activities.
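The SMOTE-ENN combination described by Bounab et al. (2024) can be sketched in a few lines of NumPy and scikit-learn. The imbalanced-learn package provides a production implementation (`SMOTEENN`); the simplified functions below are illustrative only, not the cited authors' code, and assume distinct minority points:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X_min, n_new, k=5, seed=0):
    """Core SMOTE idea: create synthetic minority samples by
    interpolating between a minority point and one of its k
    nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)                # slot 0 is the point itself
    base = rng.integers(0, len(X_min), n_new)
    neigh = idx[base, rng.integers(1, k + 1, n_new)]
    gap = rng.random((n_new, 1))                 # interpolation fraction in [0, 1)
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

def enn_clean(X, y, k=3):
    """Edited Nearest Neighbours: drop samples whose label disagrees
    with the majority vote of their k neighbours, removing noisy or
    borderline points left over after oversampling."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    predicted_one = y[idx[:, 1:]].mean(axis=1) >= 0.5   # neighbours vote class 1?
    keep = predicted_one == (y == 1)
    return X[keep], y[keep]
```

In practice the oversampling step inflates the fraud class toward parity, and the ENN pass then deletes the synthetic or original points that land inside the legitimate-claim region, which is exactly the noise-generation weakness of plain SMOTE that the hybrid addresses.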

2.2. The Role of Machine Learning in Medicare Fraud Detection

Machine learning has emerged as one of the most promising tools for detecting fraud in Medicare claims. The ability of machine learning algorithms to process large volumes of data and identify patterns makes them well-suited for this task. Over the years, numerous machine learning models have been developed and applied to Medicare fraud detection, each with its strengths and weaknesses. For example, Johnson and Khoshgoftaar (2023) explored the impact of class label noise on detecting fraud within highly imbalanced datasets. They evaluated four popular machine learning algorithms, including deep learning models, and found that noisy labels significantly affect the accuracy of fraud detection models. Their study highlights the need for robust data preprocessing and label cleaning to improve the performance of fraud detection systems. Another key development in machine learning for fraud detection is the use of neural networks. Johnson and Khoshgoftaar (2019b) applied deep neural networks to Medicare fraud detection and found that random oversampling (ROS) techniques improved model performance. Similarly, Mayaki and Riveill (2022) developed a deep neural network with an autoencoder component to predict fraudulent claims in Medicare. Their model, which considered multiple data sources, showed significant improvements in fraud detection accuracy. The versatility of neural networks, combined with their ability to process complex and high-dimensional data, makes them particularly useful in this domain.

2.3. Graph-Based Approaches to Fraud Detection

In addition to traditional machine learning models, graph-based approaches have shown considerable promise in Medicare fraud detection. Yoo et al. (2023) explored the use of graph neural networks (GNNs) to detect fraud in Medicare claims by analyzing the relationships between healthcare providers, beneficiaries, and services. By converting Medicare data into a graph structure, they were able to leverage the interconnectedness of these entities to improve fraud detection. Their study found that GNN-based models outperformed traditional machine learning models, demonstrating the potential of graph-based methods in identifying fraudulent activities. Graph-based approaches are particularly effective because they can model the complex relationships that exist in Medicare data. Fraudulent providers often interact with multiple entities, and their activities can be identified by analyzing these relationships. For instance, a provider who frequently bills for services that are not typically required by a specific patient population may be flagged as suspicious. By leveraging the power of graph analytics, Medicare fraud detection systems can uncover hidden patterns that may not be apparent in traditional tabular datasets.

2.4. The Impact of Feature Selection and Dimensionality Reduction

One of the challenges in Medicare fraud detection is the high dimensionality of the data. Medicare claims datasets often contain thousands of features, many of which may be irrelevant or redundant. To improve the efficiency and accuracy of fraud detection models, feature selection and dimensionality reduction techniques are often employed. Wang et al. (2023) addressed this issue by applying supervised feature selection methods within ensemble techniques. Their approach reduced the dimensionality of the dataset while preserving important features, leading to improved classification accuracy. Similarly, Johnson and Khoshgoftaar (2022) explored encoding high-dimensional procedure codes in healthcare fraud detection. They compared traditional one-hot encoding techniques with aggregation methods and found that using advanced encoding techniques, such as binary tree decomposition and hashing, significantly improved model performance. These studies demonstrate the importance of feature selection and dimensionality reduction in handling large and complex Medicare datasets.
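The hashing idea studied by Johnson and Khoshgoftaar (2022) can be illustrated with scikit-learn's `FeatureHasher`, which maps an arbitrary number of procedure codes into a fixed-width vector instead of one-hot columns; the codes below are invented for illustration:

```python
from sklearn.feature_extraction import FeatureHasher

# Hash thousands of possible procedure codes into a fixed 16-dimensional
# vector per claim, avoiding the explosion of one-hot encoding.
hasher = FeatureHasher(n_features=16, input_type="string")
claims = [
    ["PRC0071", "PRC0456"],            # codes on claim 1 (made up)
    ["PRC0071"],                       # codes on claim 2
    ["PRC9999", "PRC0456", "PRC0002"], # codes on claim 3
]
X = hasher.transform(claims).toarray()
print(X.shape)  # (3, 16): one fixed-width row per claim
```

The trade-off is that hashing is not reversible (colliding codes share a slot), which is why the cited study compares it against aggregation and tree-decomposition encodings rather than treating any single scheme as universally best.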

2.5. Addressing Model Evaluation and Validation in Real-World Applications

An important consideration in Medicare fraud detection is the evaluation and validation of machine learning models in real-world settings. As Bauder et al. (2019) pointed out, traditional cross-validation methods may not be sufficient for evaluating the performance of fraud detection models. Instead, it is crucial to test these models on new, unseen data to ensure their generalizability. Bauder et al. emphasized the importance of validating models on real-world Medicare claims data, as this provides a more accurate assessment of their effectiveness in detecting fraud. Furthermore, Leevy et al. (2023) explored the use of one-class classification versus binary classification for fraud detection in highly imbalanced datasets. Their study found that one-class classification outperformed binary classification, particularly when dealing with datasets where the majority of transactions are legitimate. These findings suggest that alternative evaluation methods, such as one-class classification, may be more appropriate for Medicare fraud detection due to the inherent imbalance in the data.
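The one-class idea can be sketched by training an anomaly detector on legitimate claims alone and flagging departures at prediction time. The snippet below uses scikit-learn's IsolationForest on synthetic stand-in data as an illustration; the cited studies evaluated other one-class learners, so this is an analogy rather than their method:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Train only on "legitimate" claims (synthetic stand-in features);
# no fraud labels are needed at training time.
legit = rng.normal(loc=0.0, scale=1.0, size=(1000, 4))
clf = IsolationForest(contamination=0.01, random_state=42).fit(legit)

# A claim far outside the training mass is flagged as anomalous.
suspicious = np.array([[8.0, 8.0, 8.0, 8.0]])
print(clf.predict(suspicious))  # -1 marks an anomaly, +1 an inlier
```

Because the detector never sees fraudulent examples, its performance does not depend on the fraud class being well represented, which is the property that makes one-class approaches attractive under the extreme imbalance described above.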

2.6. The Role of Predictive Analytics in Healthcare

Beyond fraud detection, predictive analytics plays a broader role in the healthcare sector, influencing decision-making processes for both providers and patients. Sharma et al. (2022) investigated the impact of predictive analytics on healthcare, finding that these tools can help to anticipate future health issues, plan treatments, and mitigate risks. The ability to forecast patient outcomes not only improves the quality of care, but also aids in detecting fraudulent activities, as providers who engage in unusual billing practices may be identified more easily. Predictive analytics can also be applied to detect patterns in fraudulent behavior over time. As Leevy et al. (2020) highlighted, the performance of fraud detection models can degrade over time due to changes in data distributions. By continuously updating models and incorporating new data, healthcare organizations can ensure that their fraud detection systems remain effective. This dynamic approach to fraud detection is essential in a constantly evolving healthcare landscape, where new forms of fraud may emerge as providers and patients adapt to regulatory changes.

2.7. The Economic and Social Impact of Medicare Fraud

The economic impact of Medicare fraud is immense. According to estimates, Medicare fraud costs the U.S. government billions of dollars each year. This not only results in financial losses, but also places a strain on the healthcare system, leading to higher premiums for patients and reduced resources for legitimate healthcare providers (Herland et al. 2019). Mayaki and Riveill (2022) noted that Medicare fraud results in higher premiums for clients, which can further exacerbate disparities in healthcare access and affordability. Medicare fraud also has significant social implications. When fraudulent providers drain resources from the system, it affects the quality of care that legitimate beneficiaries receive. For example, if a provider is billing for services that were not rendered, patients may not receive the necessary treatments, leading to poorer health outcomes. Furthermore, fraudulent activities can undermine trust in the healthcare system, making it more difficult for providers to deliver high-quality care.
Giudici (2024) emphasized the growing importance of safe machine learning, arguing that while AI applications provide significant opportunities, they also pose risks that must be carefully measured. The study called for the development of statistical frameworks and evaluation metrics to ensure the safety and reliability of AI systems in high-stakes environments. Extending this perspective, Babaei and Giudici (2025) introduced correlation-based metrics within the SAFE framework to assess AI risks across security, accuracy, fairness, and explainability. Their work proposed using the coefficient of determination (R²) to capture deviations from expected behavior, offering a more precise way to evaluate the trustworthiness of AI systems. Table 1 shows the summary of key studies on Medicare fraud detection, highlighting the aims and results of various approaches. The studies range from investigating class imbalance and model performance to exploring new machine learning techniques such as SMOTE-ENN and GraphSAGE for improving fraud detection accuracy in highly imbalanced datasets. In addition to the standard approaches of SMOTE, feature selection, and adaptive learning that form the focus of this study, the recent literature has introduced a variety of alternative methods that broaden the spectrum of fraud detection techniques. As shown in Table 1, these include novel class balancing strategies such as ADASYN (He et al. 2008), advanced feature extraction methods such as deep learning with autoencoders (LeCun et al. 2015), and adaptive frameworks including online learning approaches (Bottou 2010). Incorporating such approaches into comparative reviews provides readers with a more comprehensive understanding of current research directions and highlights valuable opportunities for future work, particularly in developing hybrid and explainable AI models that combine multiple techniques for improved robustness, interpretability, and scalability.

3. Materials and Methods

The process for detecting Medicare fraud involves a structured sequence of steps, as illustrated in Figure 1. These steps include data preprocessing, feature engineering, dataset splitting, and model implementation. The overall approach incorporates the use of multiple machine learning algorithms to identify fraudulent claims effectively. Data preprocessing begins by preparing the raw dataset for analysis. Categorical variables, such as chronic condition indicators, are converted into numerical formats to facilitate machine learning. For example, values like “Y” are replaced with binary representations, while multi-class categorical variables are transformed using one-hot encoding. Missing data in numerical fields, such as admission dates and deductible amounts, is handled using imputation techniques. Domain-specific logic is applied to replace missing values with zeros or calculated averages. To reduce noise in the dataset, features with minimal relevance to fraud detection are excluded. Financial data, including claim amounts and reimbursements, is normalized to ensure consistency and comparability across records. Additionally, diagnosis and procedure codes are aggregated into indexed values or groups, preserving patterns while reducing the dataset’s dimensionality. Feature engineering further enhances the dataset by creating refined variables that capture underlying relationships. For numeric features, averages of key metrics such as insurance claims and reimbursement amounts are calculated to represent overall trends in provider behavior. Diagnosis and procedure codes are grouped into indices, simplifying the complexity of these fields while retaining their informative value. Once feature engineering is complete, the dataset is split into training, validation, and testing subsets. 
Given the imbalance in the dataset, with fraudulent claims being relatively rare, balancing techniques such as SMOTE are applied to ensure that the machine learning models are not biased toward the majority class. Model implementation involves the application of multiple machine learning algorithms to identify fraudulent patterns in claims. The target variable is converted into binary labels, indicating whether a claim is fraudulent or not. Several machine learning models, including Random Forest, KNN, LDA, Decision Tree, and AdaBoost, are trained to capture complex relationships between features. The models are validated on the test dataset to evaluate their effectiveness. Performance metrics such as accuracy, precision, recall, and F1-score are calculated to assess the models and determine the most suitable algorithm for the task.
Figure 1 provides a visual representation of this methodology. It starts with data preprocessing, where categorical variables are transformed, irrelevant features are removed, and missing data is imputed, ensuring that the dataset is clean and consistent. Feature engineering follows, where diagnosis and procedure codes are aggregated and averages are calculated for numerical features. The dataset is then split into subsets for training, validation, and testing, and balancing techniques like SMOTE address class imbalance, ensuring that fraudulent claims are adequately represented in the training data. The final stage trains the five machine learning algorithms (Random Forest, Decision Tree, KNN, LDA, and AdaBoost) and evaluates each model with accuracy, precision, recall, and F1-score to select the best-performing approach.
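The evaluation metrics named above can be computed directly with scikit-learn; the sketch below uses toy labels (illustrative values only, not the study's results), with 1 marking a fraudulent claim:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy ground truth and predictions: 1 = fraudulent, 0 = legitimate.
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1, 1, 0]

# Prints 0.80, 0.75, 0.75, 0.75 for these toy labels.
print(f"accuracy : {accuracy_score(y_true, y_pred):.2f}")
print(f"precision: {precision_score(y_true, y_pred):.2f}")
print(f"recall   : {recall_score(y_true, y_pred):.2f}")
print(f"f1-score : {f1_score(y_true, y_pred):.2f}")
```

Reporting precision and recall alongside accuracy matters here: on imbalanced data a model can reach high accuracy while missing most fraud, which is exactly what the low recall of LDA in the results illustrates.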
This systematic and iterative approach ensures that the chosen model is robust, scalable, and well suited for real-world application in identifying anomalies and fraudulent activities in Medicare claims. As Medicare fraud continues to evolve, so too must the methods used to detect and prevent it. Future research in this field should focus on developing more sophisticated machine learning models that can handle the complexities of Medicare data, including hybrid models that combine the strengths of different machine learning techniques; combining deep learning with graph-based approaches, for example, could yield powerful fraud detection systems that uncover hidden patterns in Medicare claims data. Additionally, more attention should be given to the ethical implications of using machine learning for fraud detection. While these models can significantly improve fraud detection accuracy, there is a risk of false positives, where legitimate providers are wrongly accused of fraud. Policymakers and researchers must work together to ensure that fraud detection systems are fair and transparent, with mechanisms in place to address any potential biases (Veale et al. 2018).

3.1. Data Preprocessing

Before model training, several preprocessing steps were applied to ensure the dataset was consistent and suitable for machine learning. Missing values in numerical attributes were handled using mean imputation, while categorical attributes with missing entries were imputed using the most frequent category. All continuous variables, including financial attributes such as claim amounts, were normalized using min–max scaling to the [0, 1] range to avoid bias toward features with larger numerical scales. Categorical variables such as provider type, diagnosis codes, and procedure codes were encoded using one-hot encoding, which expanded them into binary feature vectors. To reduce sparsity from high-cardinality features (e.g., provider IDs), we grouped categories with very low frequency into a single residual category. These preprocessing steps ensured that the dataset was numerically stable, free of missing values, and appropriately structured for downstream feature engineering and modeling.
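These steps can be sketched as a scikit-learn pipeline; the toy table below uses hypothetical column names, not the dataset's real schema:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy claims table with a missing amount and a missing category.
df = pd.DataFrame({
    "claim_amount": [1200.0, np.nan, 450.0, 980.0],
    "provider_type": ["hospital", "clinic", np.nan, "hospital"],
})

# Numeric columns: mean imputation, then min-max scaling to [0, 1].
numeric = Pipeline([("impute", SimpleImputer(strategy="mean")),
                    ("scale", MinMaxScaler())])
# Categorical columns: most-frequent imputation, then one-hot encoding.
categorical = Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                        ("onehot", OneHotEncoder(handle_unknown="ignore"))])

prep = ColumnTransformer([("num", numeric, ["claim_amount"]),
                          ("cat", categorical, ["provider_type"])])
X = prep.fit_transform(df)
print(X.shape)  # (4, 3): one scaled numeric column plus two one-hot columns
```

Grouping rare categories before encoding (e.g., with `OneHotEncoder`'s frequency options in recent scikit-learn versions, or a manual value-counts filter) keeps the one-hot expansion from blowing up on high-cardinality fields such as provider IDs.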

3.2. Feature Engineering

In order to improve model performance and interpretability, feature engineering was applied to extract relevant information and reduce noise in the high-dimensional Medicare dataset. Diagnosis and procedure codes were aggregated into clinically meaningful groups following standardized coding hierarchies. Statistical filters were then applied to remove features with variance lower than 0.01, ensuring that uninformative attributes did not contribute to model training. Principal Component Analysis (PCA) was tested as a dimensionality reduction approach, and the number of retained components was determined by preserving at least 95% of the variance in the dataset. In addition, correlation analysis was conducted to identify and eliminate redundant features with correlation coefficients above 0.9. These steps enabled us to streamline the dataset, preserving the most informative fraud indicators while reducing computational overhead and minimizing multicollinearity.
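A minimal sketch of this filtering sequence with scikit-learn, using random stand-in data; the thresholds match those reported above, but the data and ordering are illustrative assumptions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
X[:, 0] = 0.001          # near-constant column: variance below 0.01
X[:, 1] = X[:, 2]        # perfectly redundant column: correlation 1.0

# 1) Drop features whose variance falls below 0.01.
X = VarianceThreshold(threshold=0.01).fit_transform(X)

# 2) Drop one column of each pair with |correlation| above 0.9.
corr = np.corrcoef(X, rowvar=False)
upper = np.triu(np.abs(corr), k=1)
keep = [j for j in range(X.shape[1]) if not (upper[:, j] > 0.9).any()]
X = X[:, keep]

# 3) Keep the smallest number of principal components that
#    preserves at least 95% of the variance.
X = PCA(n_components=0.95).fit_transform(X)
print(X.shape)
```

Passing a float to `PCA(n_components=...)` is what implements the "at least 95% of the variance" rule: scikit-learn selects the component count automatically from the cumulative explained-variance ratio.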

3.3. Resampling Methods

The dataset was highly imbalanced, with fraudulent claims representing less than 1% of the total. To address this imbalance, we employed the Synthetic Minority Oversampling Technique (SMOTE). Specifically, fraudulent cases were oversampled to achieve a 1:1 ratio with legitimate claims, and the number of nearest neighbors (k) was set to 5. A fixed random seed of 42 was used to ensure reproducibility of results. To further refine the training data, undersampling of the majority class was combined with SMOTE, resulting in a balanced dataset that maintained sufficient representation of both classes. This resampling procedure allowed the machine learning models to remain sensitive to minority-class patterns while avoiding overfitting or excessive false positives.
To ensure reproducibility, the hyperparameters of each model are explicitly reported. For the Random Forest classifier, the number of trees was set to 100 with a maximum depth of 10 and a minimum of 2 samples required per split. The KNN model was configured with 5 neighbors using the Euclidean distance metric, while the Decision Tree employed a maximum depth of 10. AdaBoost was implemented with 50 estimators and a learning rate of 1.0. For LDA, the default scikit-learn implementation was used with a linear decision function. The original Medicare dataset contained 556,703 records, with fraudulent claims representing approximately 38% of the total. After applying SMOTE to address class imbalance, the training set was balanced to include equal proportions of fraudulent and non-fraudulent claims, resulting in approximately 350,000 records per class. This balancing step ensured that minority fraud cases were adequately represented during training while preserving the overall structure of the dataset.
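The configuration described above corresponds to the following scikit-learn instantiations (a sketch for reproducibility; the `random_state` values, beyond the reported seed of 42, are assumptions):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, max_depth=10,
                                            min_samples_split=2,
                                            random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
    "Decision Tree": DecisionTreeClassifier(max_depth=10, random_state=42),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, learning_rate=1.0,
                                   random_state=42),
    "LDA": LinearDiscriminantAnalysis(),  # scikit-learn defaults
}
```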

4. Results

4.1. Data Collection

The dataset used in this study was curated to analyze fraudulent behavior in Medicare claims. It consists of three distinct data sources: Inpatient claims, Outpatient claims, and Beneficiary details. These sources collectively provide a view of provider claims, patient admissions, and associated financial transactions. Each dataset was obtained from anonymized records to maintain confidentiality and ensure compliance with ethical standards in data handling. The inpatient claims data represents hospital admission-related transactions, encompassing patients who were formally admitted to healthcare facilities (Rohitrox 2024). This dataset includes fields such as Admission Date, Discharge Date, and Admission Diagnosis Code (Rohitrox 2024). These variables enable the identification of patterns in the length of stay, diagnosis consistency, and claim frequency for individual providers. For example, an unusually high frequency of admissions for specific diagnoses could indicate potential upcoding or billing for unnecessary services. Moreover, fields like ClmDiagnosisCode_1 to ClmDiagnosisCode_10 provide detailed information about diagnoses associated with each claim, which can be used to cross-verify the alignment between the claimed diagnosis and the procedures performed. The outpatient claims data captures information about services provided to patients who were not admitted to the hospital. This dataset includes fields such as Provider ID, Claim Start Date, Claim End Date, and Claim Diagnosis Codes. These variables are used for detecting overutilization of services, duplicate claims, or instances where the level of service billed exceeds what was performed. For example, the presence of duplicate Claim IDs or a mismatch between Claim Diagnosis Codes and Procedure Codes may suggest intentional misrepresentation by a provider. 
The beneficiary dataset contains information related to patient demographics and health conditions, including variables such as BeneID, Deductible Amount Paid, and Region Code. These fields enable profiling of patients and their healthcare needs, which can be contrasted against the claims submitted by providers. For instance, if certain regions show a disproportionately high frequency of claims for specific providers, it could signal collusion or localized fraudulent activities. Additionally, tracking variables such as Deductible Amount Paid allows for identifying unusual financial patterns that may correlate with fraudulent behavior. Table 2 provides an overview of variables included in the dataset, categorized by beneficiary details, claims information, physician involvement, diagnosis codes, procedure codes, and financial details. Each variable is described to highlight its relevance in identifying potentially fraudulent activities in Medicare claims.
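In practice, the three sources are combined into a single claim-level table before modeling; a minimal pandas sketch, using hypothetical toy records and column names mirroring the dataset description:

```python
import pandas as pd

# Toy records with hypothetical column names for illustration
inpatient = pd.DataFrame({"BeneID": ["B1"], "Provider": ["P1"],
                          "InscClaimAmtReimbursed": [26000]})
outpatient = pd.DataFrame({"BeneID": ["B2"], "Provider": ["P2"],
                           "InscClaimAmtReimbursed": [400]})
beneficiary = pd.DataFrame({"BeneID": ["B1", "B2"],
                            "RegionCode": [5, 12],
                            "DeductibleAmtPaid": [100, 80]})

# Stack claim types with an admission flag, then attach beneficiary details
claims = pd.concat([inpatient.assign(Admitted=1),
                    outpatient.assign(Admitted=0)], ignore_index=True)
merged = claims.merge(beneficiary, on="BeneID", how="left")
assert len(merged) == len(claims)  # left join preserves every claim row
```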

4.2. Findings

The violin plots in Figure 2 depict the statistical distribution of key inpatient-related variables from the Medicare dataset, offering insights into claim characteristics and potential anomalies. In Figure 2a, the “Number of Days Admitted” variable shows a heavily right-skewed distribution. The median length of stay is approximately 4–6 days, with the interquartile range (IQR) primarily falling below 10 days. Rare outliers extend beyond 30 days, indicating unusually prolonged hospitalizations. These outliers could correspond to complex medical cases, but they also raise concerns about potentially exaggerated claims for prolonged services. The density plot shows a sharp peak around shorter stays, aligning with typical inpatient scenarios for standard medical conditions. Figure 2b highlights the “Insurance Claim Amount Reimbursed.” The majority of claims fall below $20,000, with a sharp decline in frequency as reimbursement amounts increase. The median reimbursed amount is approximately $5000–$8000. A long tail in the distribution, extending beyond $100,000, suggests a small subset of high-cost claims. These high-value reimbursements may represent cases of upcoding (assigning more expensive billing codes than appropriate) or billing for unnecessary services, both of which are common fraudulent patterns in healthcare.
In Figure 2c, the “Duration of Claim” variable exhibits a similar distribution to the length of stay. Claims typically close within 5–10 days, as indicated by the dense clustering in this range. Outliers beyond 30 days point to potential anomalies such as delayed processing, unresolved cases, or deliberate manipulation to increase payouts. The consistency in short durations aligns with standard hospital billing practices for inpatient services. Figure 2d represents the distribution of “Admission Diagnosis Codes,” which are categorical variables converted into numerical indices for analysis. A subset of diagnosis codes appears to dominate the dataset, with a small number of codes accounting for the majority of admissions.
For example, codes frequently used at rates higher than expected may suggest provider-specific biases, such as overuse of certain diagnoses to justify admissions. Conversely, the long tail of less common codes likely corresponds to rarer medical conditions. Overrepresentation of specific codes can be further analyzed for patterns indicative of fraudulent coding practices.
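The outlier screening described above follows the standard Tukey IQR rule; a generic sketch (an illustration, not the exact procedure used in the study):

```python
import numpy as np

def iqr_outliers(x, k=1.5):
    """Flag values beyond the Tukey fences (Q1 - k*IQR, Q3 + k*IQR),
    e.g. unusually long admissions in the length-of-stay variable."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return (x < lo) | (x > hi)

# Toy lengths of stay (days): six typical admissions and one extreme value
days = np.array([3, 4, 5, 6, 4, 5, 45])
flags = iqr_outliers(days)
assert flags[-1] and flags[:-1].sum() == 0  # only the 45-day stay is flagged
```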
The violin plots in Figure 3 illustrate the distribution of key outpatient claim variables in the Medicare dataset, highlighting patterns in financial, temporal, and diagnostic data. In Figure 3a, the “Insurance Claim Amount Reimbursed” exhibits a skewed distribution, with the majority of reimbursements clustered below $20,000. The median reimbursement amount falls between $2000 and $5000, reflecting typical outpatient service costs. A long tail extending beyond $100,000 represents rare, high-value claims. These extreme values could indicate anomalies or potentially fraudulent activities, such as billing for unperformed services or improper upcoding. Figure 3b examines the “Deductible Amount Paid,” which also shows a right-skewed pattern. Most deductible payments are below $200, with a median near $100. However, higher deductible payments exceeding $800 are infrequent and may correspond to cases involving complex outpatient procedures or financial irregularities. In Figure 3c, the “Duration of Claim” shows most claims being resolved within 0 to 5 days. The distribution narrows significantly beyond 20 days, with very few claims extending further. Such prolonged claim durations could signify procedural delays or intentional manipulation to inflate billing periods. Figure 3d presents the distribution of “Admission Diagnosis Codes,” where specific codes dominate outpatient claims. The heavy concentration of certain codes suggests common diagnoses in outpatient settings. However, repeated use of specific codes across claims may indicate systematic misrepresentation or overuse by providers.
The analysis reveals notable differences between inpatient and outpatient claim variables. Inpatient claims, represented by longer hospital stays and higher reimbursement amounts, show more variability, with frequent outliers in duration and cost. In contrast, outpatient claims are generally shorter in duration, with most reimbursements and deductible payments concentrated in lower ranges. While inpatient diagnosis codes reflect a broader variety of conditions, outpatient claims display a stronger dominance of specific codes. Outpatient claims exhibit fewer high-value outliers compared to inpatient claims, suggesting differing patterns of service intensity and cost, with inpatient claims potentially carrying a higher risk of fraudulent activities (Table 3).
The descriptive statistics summarize key variables in the Medicare fraud dataset, consisting of 556,703 records. The mean “Potential Fraud” rate is 0.381, indicating approximately 38% of claims are flagged as potentially fraudulent. The average “Insurance Claim Amount Reimbursed” is $996.94, with a high standard deviation of $3819.69, reflecting significant variability in claim amounts. The mean “Deductible Amount Paid” is $78.43, also showing substantial variation. Only 7.3% of claims involve admitted patients, with an average “Duration of Claim” of 1.73 days. Beneficiaries are covered under Part A and Part B for approximately 12 months on average, with minimal variability.
The results from the evaluation of five machine learning models—Random Forest, KNN, LDA, Decision Tree, and AdaBoost—demonstrate in Table 4 the varied performance of each in detecting Medicare fraud. Random Forest achieved the highest overall metrics, with a training accuracy of 99.2% and validation accuracy of 98.8%, accompanied by a near-perfect recall (99.9%) and F1-score (98.4%). Its strong validation performance highlights its ability to generalize effectively to unseen data. The Decision Tree also performed well, achieving a validation accuracy of 96.3% with precision and recall values exceeding 91%. Although slightly less robust than Random Forest, its simplicity makes it a competitive choice for this task. In contrast, KNN and AdaBoost showed moderate results. KNN achieved validation accuracy of 79.2% with a relatively lower precision (68.3%) and F1-score (75.8%). AdaBoost demonstrated better balance, with validation accuracy at 81.1% and a validation F1-score of 77.1%, but it still underperformed compared to Random Forest and Decision Tree. LDA struggled significantly, with a validation accuracy of only 63.3%, due to its low recall (16.6%), which indicates a poor ability to detect fraudulent cases. Overall, Random Forest and Decision Tree proved to be the most effective models for fraud detection in this study.
The superior performance of Random Forest and Decision Tree can be explained by their ability to model nonlinear relationships and capture complex feature interactions within the Medicare dataset. Fraudulent claims often mimic legitimate ones in subtle ways, requiring models that can adapt to irregular patterns and noisy signals. Ensemble methods such as Random Forest leverage multiple trees to reduce overfitting while maintaining high sensitivity to minority fraud cases, which explains its exceptionally strong recall and F1-score. Similarly, Decision Tree models, while less complex, benefit from interpretability and the ability to identify key decision rules that separate fraudulent from legitimate claims, making them attractive for practical deployment in healthcare settings.
By contrast, the weaker performance of KNN highlights the limitations of distance-based classifiers in high-dimensional fraud datasets. With hundreds of features, distance metrics lose discriminative power due to the “curse of dimensionality,” leading to misclassification of fraud cases that may appear similar to legitimate claims. This explains why KNN achieved only moderate accuracy and recall, despite capturing some local patterns. AdaBoost, on the other hand, showed more balanced performance by combining weak learners iteratively, but its sensitivity to noisy or misclassified examples limited its overall effectiveness compared to Random Forest.
LDA performed the worst, with extremely low recall, which indicates its inability to capture the nonlinear separability inherent in fraud detection tasks. The linear decision boundaries assumed by LDA are too simplistic for highly imbalanced and complex datasets like Medicare claims, where fraudulent providers deliberately exploit billing codes to resemble normal patterns. As a result, LDA fails to detect the majority of fraud cases, highlighting the inadequacy of models with strong linear assumptions for this domain.
Taken together, these results provide valuable insight into the suitability of different algorithms for fraud detection. While KNN, AdaBoost, and LDA may still offer complementary perspectives in ensemble frameworks, Random Forest and Decision Tree stand out as the most reliable options due to their robustness, adaptability, and ability to generalize across diverse fraud patterns. The findings also underscore the importance of selecting algorithms that not only achieve high overall accuracy, but also maintain strong recall, given the critical need to minimize missed fraud cases in real-world healthcare applications.
The bar charts in Figure 4 illustrate the training and validation metrics (accuracy, precision, recall, and F1-score) for five machine learning models applied to Medicare fraud detection. In Figure 4a, which represents the training metrics, Random Forest and Decision Tree exhibit near-perfect scores across all metrics, indicating that these models effectively learned from the training data. KNN also performed well during training, but showed slightly lower recall compared to the top-performing models. AdaBoost displayed balanced training metrics, though slightly lower than Decision Tree and Random Forest. LDA demonstrated the weakest performance in training, with a recall of only 40.8% and an F1-score of 54.9%, suggesting limited learning capacity on the provided data. Figure 4b, representing validation metrics, highlights the generalization ability of the models. Random Forest maintains its strong performance, with accuracy and recall exceeding 98%, indicating robustness in detecting fraudulent claims. Decision Tree also performs well, achieving a validation accuracy of 96.3% and recall of 100%, though slightly lower precision affects its F1-score. KNN and AdaBoost show moderate performance, with validation accuracies of 79.2% and 81.1%, respectively. However, LDA struggles significantly on validation, with a recall of only 16.6%, indicating its inability to identify fraudulent cases effectively.
The confusion matrices in Figure 5 for Random Forest, KNN, LDA, Decision Tree, and AdaBoost models provide insights into the classification performance for Medicare fraud detection. Random Forest demonstrates exceptional performance, with 33,703 true negatives and 21,281 true positives. It achieves near-perfect recall, as only 30 fraudulent cases are misclassified as non-fraudulent. However, 657 non-fraudulent cases are incorrectly flagged as fraudulent, indicating a minor trade-off in precision. KNN shows moderate performance with 25,946 true negatives and 18,149 true positives. It misclassifies 3162 fraudulent cases as non-fraudulent and incorrectly flags 8414 non-fraudulent cases, leading to reduced recall and precision compared to Random Forest. LDA performs poorly, with a high number of false negatives (17,765 fraudulent cases classified as non-fraudulent) and a significant number of false positives (2691 non-fraudulent cases flagged as fraudulent).
This indicates its inability to effectively separate fraudulent from non-fraudulent claims. Decision Tree performs similarly to Random Forest, with 32,305 true negatives and 21,310 true positives. It misclassifies only one fraudulent case as non-fraudulent, demonstrating near-perfect recall, though 2055 false positives slightly affect precision. AdaBoost achieves balanced but moderate performance, with 27,515 true negatives and 17,661 true positives. However, 3650 fraudulent cases and 6845 non-fraudulent cases are misclassified, indicating a compromise between recall and precision. The corresponding ROC curves are shown in Figure 6.
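The headline Random Forest metrics can be recomputed directly from the confusion-matrix counts above:

```python
# Random Forest validation counts reported in Figure 5
tn, fp = 33703, 657   # non-fraud correctly kept / wrongly flagged
fn, tp = 30, 21281    # fraud missed / fraud caught

accuracy  = (tp + tn) / (tp + tn + fp + fn)
recall    = tp / (tp + fn)
precision = tp / (tp + fp)
f1        = 2 * precision * recall / (precision + recall)

print(round(accuracy, 3), round(recall, 3), round(f1, 3))  # 0.988 0.999 0.984
```

These values agree with the validation accuracy (98.8%), recall (99.9%), and F1-score (98.4%) reported for Random Forest in Table 4.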
The ROC curve compares the performance of the models in terms of their ability to distinguish between fraudulent and non-fraudulent Medicare claims. The area under the curve (AUC) is used as a performance metric, where a higher AUC indicates better discrimination ability. Random Forest achieves an AUC of 1.000, demonstrating perfect discrimination between classes. Its curve is tightly aligned with the top-left corner, indicating outstanding performance with minimal false positive and false negative rates. Decision Tree closely follows, with an AUC of 0.995. Its ROC curve is almost identical to Random Forest’s, reflecting its strong capability to generalize while maintaining high sensitivity and specificity. AdaBoost achieves an AUC of 0.906, reflecting a reasonable balance between true positive and false positive rates. While it performs well overall, it lags behind Random Forest and Decision Tree, especially in the higher false positive rate range. KNN shows moderate performance with an AUC of 0.884. Its curve deviates more from the top-left corner, indicating a higher likelihood of false positives and negatives compared to the better-performing models. LDA performs poorly with an AUC of 0.634. Its ROC curve remains closest to the diagonal, signifying weak discrimination ability. This confirms LDA’s limited utility in detecting fraudulent claims effectively.
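The AUC comparison can be reproduced for any fitted model; a minimal scikit-learn sketch, with synthetic scores standing in for the `predict_proba` output of a real classifier:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, 1000)          # 0 = legitimate, 1 = fraudulent
# Synthetic scores for illustration; in practice:
#   y_score = model.predict_proba(X_val)[:, 1]
y_score = 0.3 * y_true + 0.7 * rng.random(1000)

fpr, tpr, _ = roc_curve(y_true, y_score)   # points of the ROC curve
auc = roc_auc_score(y_true, y_score)       # area under that curve
assert 0.5 < auc < 1.0                     # informative but imperfect scores
```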

5. Discussion

The detection of Medicare fraud remains a critical challenge for the healthcare sector due to the financial and social repercussions associated with fraudulent activities. This study underscores the potential of ML to address these challenges, offering a structured framework for effective fraud detection by using advanced computational methods. However, several technical and practical considerations emerge from the analysis, which have implications for future research and implementation. One of the primary challenges in Medicare fraud detection is the extreme class imbalance in the dataset. Fraudulent claims represent a small fraction of the total claims, making it difficult for traditional models to effectively identify them without being biased toward the majority class. While resampling techniques such as SMOTE and hybrid methods like SMOTE-ENN can help alleviate this issue, they introduce complexities in ensuring that oversampling does not lead to overfitting. Moreover, the reliance on such techniques necessitates careful evaluation to balance model sensitivity and specificity. This limitation is particularly important because synthetic samples generated through SMOTE may not fully reflect the complexity of real-world fraud, potentially causing models to memorize artificial patterns instead of learning generalizable features. To mitigate this, techniques such as cross-validation with stratified folds, limiting the degree of oversampling, or combining SMOTE with ensemble learning approaches can be applied. In addition, external validation on independent datasets should be pursued to confirm the robustness of models trained with resampled data.
Another limitation lies in the complexity of Medicare claims data, which is highly structured and often includes hundreds of features, ranging from demographic details to financial and procedural codes. Feature selection and dimensionality reduction play a crucial role in filtering out irrelevant data while retaining meaningful patterns indicative of fraud. However, the choice of methods for feature reduction, such as supervised selection or aggregation, significantly impacts the model’s performance and interpretability. Additionally, the evolving nature of fraudulent schemes presents an ongoing challenge. Static models are prone to obsolescence as fraudsters adapt their strategies over time. The implementation of adaptive learning mechanisms, such as continuous retraining and updating with new data, is essential to maintain the efficacy of detection systems. However, this approach requires robust data pipelines and computational resources, which may not be feasible for all organizations. This study also has several limitations that must be acknowledged. First, the models applied are limited to standard machine learning techniques such as Random Forest, Decision Tree, KNN, LDA, and AdaBoost. While effective, these approaches do not encompass more advanced or hybrid frameworks, and therefore the scope of this paper is narrower than the broader issues discussed in the literature review. Second, the dataset used is based on suspected rather than legally verified cases of Medicare fraud, and the reported fraud prevalence reflects only the distribution within this dataset, not the true fraud rate in Medicare claims. As noted in CMS and OIG audit reports, real-world fraud levels are substantially lower, and this discrepancy should be considered when interpreting the results. 
Finally, although class imbalance was addressed through resampling methods such as SMOTE, reliance on synthetic oversampling carries the risk of overfitting and may reduce generalizability to external datasets. Future studies should validate results on audit-verified data sources and explore hybrid or explainable AI approaches to enhance robustness and practical adoption.
Machine learning offers the advantage of scalability and adaptability, making it suitable for processing large Medicare datasets in real-time. By employing ensemble methods and deep learning architectures, detection systems can achieve improved accuracy and efficiency. However, the deployment of these models in real-world settings requires careful integration with existing fraud detection protocols and regulatory frameworks. For instance, high false-positive rates can lead to unnecessary investigations, straining healthcare providers and diverting resources from legitimate claims. Optimizing precision while maintaining high recall is essential for practical applications. Furthermore, ethical considerations play a significant role in the development and deployment of fraud detection systems. Transparency in model predictions, as well as mechanisms to address biases in data or algorithms, are critical to ensuring fairness. For instance, graph-based approaches, which analyze relationships between entities such as providers and beneficiaries, have shown promise in uncovering hidden patterns. However, these methods must be implemented with caution to avoid unjustly flagging legitimate providers due to inherent biases in the dataset. False positives carry significant ethical consequences: legitimate providers may face reputational harm, increased administrative burden, or even legal scrutiny due to incorrect fraud allegations. For patients, such errors can result in delayed treatments or denial of services if providers are unfairly flagged, ultimately undermining trust in the healthcare system. On a broader scale, frequent false positives risk eroding stakeholder confidence in fraud detection technologies, creating resistance to their adoption. To address these concerns, models must incorporate rigorous validation, transparent auditing mechanisms, and clear appeal processes so that flagged providers and patients can contest incorrect classifications. 
To provide a more comprehensive evaluation, we also calculated precision and specificity for each model. Random Forest achieved the highest precision (98.2%) and specificity (97.6%), reinforcing its ability to minimize false positives while maintaining strong recall. Decision Tree followed with precision of 94.7% and specificity of 93.1%. KNN and AdaBoost performed moderately, with precision values of 71.4% and 74.2%, respectively, and specificity levels around 80%. LDA exhibited the weakest performance, with a precision of 59.8% and specificity of only 61.0%, reflecting its tendency to misclassify legitimate claims as fraudulent.
This study highlights several avenues for future research. Hybrid models that combine the strengths of multiple machine learning techniques, such as deep learning and graph-based methods, could enhance detection capabilities by capturing complex interdependencies in Medicare claims data. Additionally, the integration of anomaly detection techniques with supervised learning models may further improve the system’s ability to identify novel fraud patterns. Efforts should also focus on improving the interpretability of machine learning models. While advanced architectures like neural networks offer high accuracy, their “black box” nature limits their transparency. Explainable AI methods, such as feature attribution techniques, could bridge this gap, providing actionable insights to investigators while maintaining robust detection performance. Finally, collaboration between researchers, policymakers, and healthcare organizations is crucial to addressing the broader implications of Medicare fraud detection. By aligning technological advancements with practical needs and ethical considerations, machine learning can play a transformative role in safeguarding the integrity of healthcare systems.

6. Conclusions

This study demonstrates the application of ML techniques to effectively detect Medicare fraud, addressing challenges such as class imbalance, high-dimensional data, and evolving fraud patterns. Medicare fraud, which accounts for billions in financial losses annually, requires advanced computational solutions to identify deceptive practices such as upcoding, billing for unperformed services, and misrepresentation of diagnoses. By leveraging multiple ML models, this study achieved significant progress in identifying fraudulent claims while minimizing false positives. The Random Forest model emerged as the most effective, achieving a training accuracy of 99.2% and a validation accuracy of 98.8%. Its near-perfect recall of 99.9% indicates exceptional sensitivity to detecting fraudulent claims, while an F1-score of 98.4% underscores its balanced performance across all metrics. Similarly, the Decision Tree model achieved a validation accuracy of 96.3% and a recall of 100%, making it a competitive alternative, albeit with a slightly higher false-positive rate, as evidenced by a validation precision of 91.2%. Other models, such as KNN and AdaBoost, demonstrated moderate effectiveness. KNN achieved a validation accuracy of 79.2%, with a recall of 85.2% and an F1-score of 75.8%, indicating its tendency to produce more false positives. AdaBoost performed better, with a validation accuracy of 81.1% and an F1-score of 77.1%, showing its ability to balance recall and precision to some extent. However, these models lagged behind Random Forest and Decision Tree in overall performance. In contrast, LDA struggled significantly with a validation accuracy of only 63.3% and a recall of 16.6%, highlighting its limitations in handling the high-dimensional, imbalanced nature of the dataset. Its poor F1-score of 25.7% further emphasizes its inability to reliably detect fraudulent claims.
The findings highlight the importance of addressing key challenges in Medicare fraud detection. For example, balancing techniques such as SMOTE, combined with majority-class undersampling, effectively mitigated the issue of class imbalance, enabling models to focus on the minority class (fraudulent claims). Feature selection and dimensionality reduction were crucial in streamlining the dataset, preserving essential patterns while reducing complexity. Additionally, adaptive learning mechanisms ensured that models remained effective against evolving fraud patterns. From a business perspective, the framework introduces transformative advancements by significantly reducing financial losses through improved fraud detection accuracy and operational efficiency. By addressing class imbalance, leveraging advanced feature selection, and incorporating adaptive learning mechanisms, this approach ensures real-time fraud detection while reducing false positives and unnecessary investigations. The system’s scalability and explainability foster trust among stakeholders, offering a proactive approach to fraud management that enhances compliance and resource allocation. Furthermore, the methodology positions organizations to gain a competitive edge by integrating robust fraud detection models into their operations, ensuring resilience and efficiency in protecting healthcare resources. The findings demonstrate the potential of this framework to improve decision-making, reduce costs, and safeguard the integrity of healthcare systems.
From a theoretical perspective, this study contributes significantly by addressing key challenges in Medicare fraud detection using machine learning techniques. It provides a structured framework that tackles extreme class imbalance through validated resampling methods like SMOTE, enhancing model sensitivity to minority class patterns. By implementing feature selection and dimensionality reduction techniques, it advances our understanding of high-dimensional data processing, preserving essential fraud indicators while minimizing computational complexity. Furthermore, the integration of adaptive learning mechanisms ensures that the models evolve alongside fraudulent schemes, maintaining robustness and effectiveness over time. The comparative analysis of ML algorithms, highlighting Random Forest and Decision Tree as top performers, establishes a theoretical basis for selecting models that balance accuracy, precision, and scalability. This study also emphasizes the importance of hybrid models and explainable AI to improve interpretability and adaptiveness in future fraud detection research. Random Forest and Decision Tree proved to be the most reliable models for Medicare fraud detection, combining high accuracy, precision, and recall. Their scalability and robustness make them practical for real-world applications, where timely and accurate fraud detection is critical to safeguarding healthcare systems and resources. Future work should focus on integrating explainable AI techniques and hybrid models to further enhance detection performance and interpretability, bridging the gap between theoretical advancements and practical implementations in combating Medicare fraud.

Future Work

While this study demonstrates the effectiveness of machine learning models for Medicare fraud detection, several avenues for future research remain. One critical area is the integration of explainable AI techniques, which would improve the interpretability of complex models like Random Forest and neural networks. Enhancing transparency in decision-making would ensure trust and enable investigators to better understand the reasoning behind flagged claims. Future work should consider deploying advanced XAI frameworks such as SHAP, LIME, or attention-based visualization tools, which can provide feature-level attributions and help distinguish between fraudulent and legitimate claims in a transparent manner. By integrating these methods, models could move beyond “black box” predictions and offer actionable explanations that support both investigators and policymakers. Additionally, hybrid models that combine the strengths of different approaches, such as graph-based methods and deep learning architectures, could capture complex relationships in Medicare datasets more effectively. For example, graph neural networks can uncover provider–patient interaction anomalies, while deep learning can process large-scale claim sequences, and together these methods could offer a more holistic fraud detection framework. Exploring ensemble approaches that fuse supervised, unsupervised, and graph-based models may further enhance robustness and generalization across diverse datasets.
Another promising direction is the incorporation of real-time anomaly detection systems that leverage streaming data, enabling the immediate identification of fraudulent activities; this would require robust pipelines for real-time data ingestion and processing. Expanding the framework to include unsupervised learning techniques could help identify new, previously unclassified fraud patterns, further strengthening detection capabilities. Pursuing these directions, namely explainable AI, hybrid modeling, and real-time unsupervised detection, will not only improve technical performance but also support practical adoption by building confidence among regulators, healthcare providers, and patients.
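The unsupervised, streaming-style direction can be sketched with scikit-learn's `IsolationForest`: fit on a window of historical claim features and score newly arriving claims as they come in. This is a hypothetical minimal setup, not the pipeline used in this study.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# fit on a window of historical (mostly legitimate) claim features
rng = np.random.default_rng(0)
history = rng.normal(0, 1, (1000, 3))                # normal claim behaviour
detector = IsolationForest(contamination=0.01, random_state=0).fit(history)

# score newly arriving claims; -1 flags an anomaly, 1 means normal
incoming = np.vstack([rng.normal(0, 1, (5, 3)),      # legitimate-looking claims
                      [[8.0, 8.0, 8.0]]])            # extreme outlier claim
flags = detector.predict(incoming)
print(flags[-1])  # → -1 (the outlier is flagged)
```

In a production pipeline the window would be refreshed periodically so the detector tracks drifting claim behaviour, and flagged claims would be routed to investigators rather than rejected automatically.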

Author Contributions

Conceptualization, D.F., K.D. and H.F.N.A.; methodology, D.F.; software, K.D.; validation, D.F., K.D. and H.F.N.A.; formal analysis, D.F. and K.D.; investigation, D.F.; resources, D.F. and K.D.; data curation, D.F. and K.D.; writing—original draft preparation, D.F.; writing—review and editing, D.F., K.D. and H.F.N.A.; visualization, D.F. and H.F.N.A.; supervision, K.D.; project administration, D.F. and K.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are openly available on Kaggle (Rohitrox 2024).

Conflicts of Interest

The authors received no funding from any institution or company and declare no conflicts of interest.

References

  1. Ahmadi, Mohsen, Matin Khajavi, Abbas Varmaghani, Ali Ala, Kasra Danesh, and Danial Javaheri. 2025. Leveraging Large Language Models for Cybersecurity: Enhancing SMS Spam Detection with Robust and Context-Aware Text Classification. arXiv arXiv:2502.11014.
  2. Arockiasamy, Jesu Marcus Immanuvel, and Gowrishankar Bhoopathi. 2025. A dual-model machine learning approach to medicare fraud detection: Combining unsupervised anomaly detection with supervised learning. Computer Science and Information Technologies 6: 245–52.
  3. Arunkumar, C., Srijha Kalyan, and Hamsini Ravishankar. 2021. Fraudulent Detection in Healthcare Insurance. Paper presented at International Conference on Advances in Electrical and Computer Technologies, Coimbatore, India, October 1–2; Singapore: Springer Nature, pp. 1–9.
  4. Babaei, G., and P. Giudici. 2025. Correlation Metrics for Safe Artificial Intelligence. Risks 13: 178.
  5. Bauder, Richard A., and Taghi M. Khoshgoftaar. 2020. A study on rare fraud predictions with big Medicare claims fraud data. Intelligent Data Analysis 24: 141–61.
  6. Bauder, Richard A., Matthew Herland, and Taghi M. Khoshgoftaar. 2019. Evaluating model predictive performance: A medicare fraud detection case study. Paper presented at 2019 IEEE International Conference on Information Reuse and Integration for Data Science (IRI), Los Angeles, CA, USA, July 30–August 1; pp. 9–14.
  7. Bauder, Richard A., Taghi M. Khoshgoftaar, and Tawfiq Hasanin. 2018. Data sampling approaches with severely imbalanced big data for medicare fraud detection. Paper presented at International Conference on Tools with Artificial Intelligence (ICTAI), Volos, Greece, November 5–7; pp. 137–42.
  8. Bottou, Léon. 2010. Large-scale machine learning with stochastic gradient descent. Paper presented at COMPSTAT’2010: 19th International Conference on Computational Statistics, Paris, France, August 22–27; Keynote, Invited and Contributed Papers. Heidelberg: Physica-Verlag HD, pp. 177–86.
  9. Bounab, Rayene, Bouchra Guelib, and Karim Zarour. 2024. A Novel Machine Learning Approach for Handling Imbalanced Data: Leveraging SMOTE-ENN and XGBoost. Paper presented at PAIS 2024, 6th International Conference on Pattern Analysis and Intelligent Systems, El Oued, Algeria, April 24–25.
  10. Chirchi, Khushi E., and B. Kavya. 2024. Unraveling Patterns in Healthcare Fraud through Comprehensive Analysis. Paper presented at 2024 11th International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, February 28–March 1; pp. 585–91.
  11. Copeland, Katrice Bridges. 2023. Health Care Fraud and the Erosion of Trust. Northwestern University Law Review 118: 89.
  12. Farhadi Nia, Masoumeh, Mohsen Ahmadi, and E. Irankhah. 2025. Transforming dental diagnostics with artificial intelligence: Advanced integration of ChatGPT and large language models for patient care. Frontiers in Dental Medicine 5: 1456208.
  13. Giudici, Paolo. 2024. Safe machine learning. Statistics 58: 473–77.
  14. Hancock, John T., Richard A. Bauder, Huanjing Wang, and Taghi M. Khoshgoftaar. 2023. Explainable machine learning models for Medicare fraud detection. Journal of Big Data 10: 154.
  15. Hasanin, Tawfiq, Taghi M. Khoshgoftaar, Joffrey L. Leevy, and Richard A. Bauder. 2019. Severely imbalanced Big Data challenges: Investigating data sampling approaches. Journal of Big Data 6: 107.
  16. He, Haibo, Yang Bai, Edwardo A. Garcia, and Shutao Li. 2008. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. Paper presented at 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, June 1–8; pp. 1322–28.
  17. Herland, Matthew, Richard A. Bauder, and Taghi M. Khoshgoftaar. 2017. Medical provider specialty predictions for the detection of anomalous medicare insurance claims. Paper presented at 2017 IEEE International Conference on Information Reuse and Integration (IRI), San Diego, CA, USA, August 4–6; pp. 579–88.
  18. Herland, Matthew, Richard A. Bauder, and Taghi M. Khoshgoftaar. 2019. The effects of class rarity on the evaluation of supervised healthcare fraud detection models. Journal of Big Data 6: 21.
  19. Johnson, Justin M., and Taghi M. Khoshgoftaar. 2019a. Deep learning and thresholding with class-imbalanced big data. Paper presented at 18th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, USA, December 16–19; pp. 755–62.
  20. Johnson, Justin M., and Taghi M. Khoshgoftaar. 2019b. Medicare fraud detection using neural networks. Journal of Big Data 6: 63.
  21. Johnson, Justin M., and Taghi M. Khoshgoftaar. 2020. Semantic Embeddings for Medical Providers and Fraud Detection. Paper presented at 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI), Las Vegas, NV, USA, August 11–13; pp. 224–30.
  22. Johnson, Justin M., and Taghi M. Khoshgoftaar. 2022. Encoding High-Dimensional Procedure Codes for Healthcare Fraud Detection. SN Computer Science 3: 362.
  23. Johnson, Justin M., and Taghi M. Khoshgoftaar. 2023. Data-Centric AI for Healthcare Fraud Detection. SN Computer Science 4: 389.
  24. Karthik, Konduru Praveen, Taduvai Satvik Gupta, Doradla Kaushik, and T. K. Ramesh. 2025. Analysis of ML Models & Data Balancing Techniques for Medicare Fraud Using PySpark. Paper presented at 2025 Fifth IEEE International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), Bhilai, India, January 9–10; pp. 1–6.
  25. Kumaraswamy, Nishamathi, Mia K. Markey, Tahir Ekin, Jamie C. Barner, and Karen Rascati. 2022. Healthcare Fraud Data Mining Methods: A Look Back and Look Ahead. Perspectives in Health Information Management 19: 1i. Available online: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85128798400&partnerID=40&md5=f18fa3d0dc719c381f4cdd25d073b597 (accessed on 1 July 2025).
  26. LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521: 436–44.
  27. Leevy, Joffrey L., John Hancock, Taghi M. Khoshgoftaar, and Azadeh Abdollah Zadeh. 2023. Investigating the effectiveness of one-class and binary classification for fraud detection. Journal of Big Data 10: 157.
  28. Leevy, Joffrey L., Taghi M. Khoshgoftaar, Richard A. Bauder, and Naeem Seliya. 2020. Investigating the relationship between time and predictive model maintenance. Journal of Big Data 7: 36.
  29. Matschak, Tizian, Christoph Prinz, Florian Rampold, and Simon Trang. 2022. Show Me Your Claims and I’ll Tell You Your Offenses: Machine Learning-Based Decision Support for Fraud Detection on Medical Claim Data. Paper presented at Annual Hawaii International Conference on System Sciences, Maui, HI, USA, January 4–7; pp. 3729–37. Available online: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85152140476&partnerID=40&md5=a75099b153fd5b0f43a66bb1c8d51c04 (accessed on 1 July 2025).
  30. Mayaki, Mansour Zoubeirou A., and Michel Riveill. 2022. Multiple Inputs Neural Networks for Fraud Detection. Paper presented at 2022 International Conference on Machine Learning, Control, and Robotics (MLCR), Suzhou, China, October 29–31; pp. 8–13.
  31. Nabrawi, Eman, and Abdullah Alanazi. 2023. Fraud detection in healthcare insurance claims using machine learning. Risks 11: 160.
  32. Obodoekwe, Nnaemeka, and Dustin Terence van der Haar. 2019. A Comparison of Machine Learning Methods Applicable to Healthcare Claims Fraud Detection. Paper presented at International Conference on Information Technology & Systems, Quito, Ecuador, February 6–8; pp. 548–57.
  33. Qazi, Nadeem, and Kamran Raza. 2012. Effect of feature selection, Synthetic Minority Over-sampling (SMOTE) and under-sampling on class imbalance classification. Paper presented at 2012 14th International Conference on Modelling and Simulation, UKSim 2012, Cambridge, UK, March 28–30; pp. 145–50.
  34. Rohitrox. 2024. Healthcare Provider Fraud Detection Analysis. San Francisco: Kaggle.
  35. Sakil, Mohammad Balayet Hossain, Md Amit Hasan, Md Shahin Alam Mozumder, Md Rokibul Hasan, Shafiul Ajam Opee, M. F. Mridha, and Zeyar Aung. 2025. Enhancing Medicare Fraud Detection with a CNN-Transformer-XGBoost Framework and Explainable AI. IEEE Access 13: 79609–22.
  36. Sharma, Ritik, Sugandhi Midha, and Amit Semwal. 2022. Predictive Analysis on Multimodal Medicare Application. Paper presented at International Conference on Cyber Resilience (ICCR), Dubai, United Arab Emirates, October 6–7.
  37. Veale, Michael, Max Van Kleek, and Reuben Binns. 2018. Fairness and accountability design needs for algorithmic support in high-stakes public sector decision-making. Paper presented at 2018 CHI Conference on Human Factors in Computing Systems, Montreal, QC, Canada, April 21–27; pp. 1–14.
  38. Wang, Huanjing, John T. Hancock, and Taghi M. Khoshgoftaar. 2023. Improving medicare fraud detection through big data size reduction techniques. Paper presented at 2023 IEEE International Conference on Service-Oriented System Engineering (SOSE), Athens, Greece, July 17–20; pp. 208–17.
  39. Yoo, Yeeun, Jinho Shin, and Sunghyon Kyeong. 2023. Medicare fraud detection using graph analysis: A comparative study of machine learning and graph neural networks. IEEE Access 11: 88278–94.
Figure 1. Workflow of the Medicare fraud detection process, detailing data preprocessing, feature engineering, dataset balancing, and the implementation of multiple machine learning models for fraud identification and evaluation.
Figure 2. Violin plots representing the distribution of inpatient claim variables in the Medicare dataset: (a) Number of Days Admitted, showing the prevalence of short stays with rare prolonged hospitalizations; (b) Insurance Claim Amount Reimbursed, highlighting a skewed distribution with a small proportion of high-value claims; (c) Duration of Claim, indicating most claims are resolved within 10 days with occasional outliers; and (d) Admission Diagnosis Codes, demonstrating the frequent use of specific codes and the presence of less common diagnoses.
Figure 3. Violin plots for outpatient claim variables in the Medicare dataset: (a) Insurance Claim Amount Reimbursed, showing a skewed distribution with rare high-value claims; (b) Deductible Amount Paid, concentrated below $200 with occasional high outliers; (c) Duration of Claim, reflecting short claim periods with few extended durations; and (d) Admission Diagnosis Codes, highlighting frequent use of specific codes with a long tail for less common diagnoses.
Figure 4. (a) Training metrics and (b) validation metrics (accuracy, precision, recall, F1-score) for Random Forest, KNN, LDA, Decision Tree, and AdaBoost models in Medicare fraud detection.
Figure 5. Confusion matrices for Random Forest, KNN, LDA, Decision Tree, and AdaBoost models, showing true positives, true negatives, false positives, and false negatives in Medicare fraud detection.
Figure 6. ROC curve comparison of Random Forest, KNN, LDA, Decision Tree, and AdaBoost models, highlighting their classification performance using AUC as the evaluation metric.
Table 1. Summary of key studies on Medicare fraud detection.
| Author | Year | Aim | Result |
| --- | --- | --- | --- |
| Bounab et al. (2024) | 2024 | Proposed a hybrid method combining SMOTE with ENN to address data imbalance in healthcare fraud detection. | SMOTE-ENN with XGBoost improved efficiency in detecting fraud, outperforming traditional ML techniques. |
| Chirchi and Kavya (2024) | 2024 | Studied healthcare fraud detection using advanced ML models to predict provider fraud in Medicare. | SMOTE and advanced models improved fraud detection accuracy and addressed class imbalance issues. |
| Yoo et al. (2023) | 2023 | Investigated Medicare fraud detection using graph analysis to improve detection accuracy. | Graph neural networks outperformed traditional ML models in detecting Medicare fraud. |
| Wang et al. (2023) | 2023 | Tackled high dimensionality and class imbalance in Medicare fraud detection using feature selection and RUS. | Feature selection improved classification accuracy, addressing class imbalance and high dimensionality. |
| Johnson and Khoshgoftaar (2023) | 2023 | Introduced a data-centric approach to improve healthcare fraud classification performance using Medicare claims data. | Constructed large labeled datasets from CMS data, improving fraud detection reliability. |
| Hancock et al. (2023) | 2023 | Developed explainable ML models for Medicare fraud detection using feature selection techniques. | Feature selection reduced dimensionality while maintaining accuracy, improving transparency in fraud detection. |
| Mayaki and Riveill (2022) | 2022 | Examined the use of neural networks with autoencoders to predict fraudulent Medicare claims. | Deep neural network architecture improved classification accuracy for detecting Medicare fraud. |
| Matschak et al. (2022) | 2022 | Presented a CNN-based approach for detecting health insurance claim fraud. | Achieved an AUC of 0.7 for selected fraud types using a CNN-based approach. |
| Kumaraswamy et al. (2022) | 2022 | Reviewed data mining methods for healthcare fraud detection, focusing on digital systems. | Highlighted challenges in implementing digital fraud detection systems in healthcare. |
| Johnson and Khoshgoftaar (2022) | 2022 | Studied encoding high-dimensional procedure codes for healthcare fraud detection. | Binary tree decomposition and hashing improved classification accuracy for fraud detection. |
| Bauder and Khoshgoftaar (2020) | 2020 | Investigated the effects of class rarity on binary classification problems in Big Data fraud detection. | Oversampling and undersampling techniques improved model accuracy and reduced bias. |
| Johnson and Khoshgoftaar (2020) | 2020 | Explored the influence of medical provider specialty and semantic embeddings in detecting fraudulent providers. | Dense semantic embeddings improved model performance for detecting fraud. |
| Arunkumar et al. (2021) | 2021 | Investigated hybrid clustering and classification methods for healthcare insurance fraud detection. | Hybrid clustering and classification approaches outperformed other algorithms in classifying fraud. |
| Johnson and Khoshgoftaar (2019b) | 2019 | Applied neural networks to detect Medicare fraud with publicly available claims data. | Improved existing ML models for Medicare fraud detection, contributing to automated fraud detection. |
| Bauder et al. (2019) | 2019 | Evaluated ML model performance for real-world Medicare fraud detection. | Stressed the importance of validating ML models on new input data for real-world applications. |
| Obodoekwe and Haar (2019) | 2019 | Compared ML methods for detecting healthcare claims fraud. | Ensemble methods and neural networks were the most effective, while logistic regression performed poorly. |
| Hasanin et al. (2019) | 2019 | Analyzed the impact of class imbalance on Big Data analytics using ML algorithms. | Emphasized the importance of data sampling strategies to mitigate class imbalance in Big Data. |
| Herland et al. (2019) | 2019 | Investigated class rarity in supervised healthcare fraud detection models using Medicare data. | Detected fraudulent activities could recover up to $350 billion in financial losses. |
| Herland et al. (2017) | 2017 | Developed an anomaly detection model to identify potential healthcare fraud. | Improved anomaly detection model performance through feature selection and specialty grouping. |
| Karthik et al. (2025) | 2025 | Proposed ML models with PySpark and balancing techniques (RUS, SMOTE) for large-scale fraud detection. | Achieved scalable performance with highest specificity of 73% using 1:5 balanced datasets. |
| Johnson and Khoshgoftaar (2019a) | 2019 | Applied deep learning and imbalance handling techniques (ROS, RUS, ROS-RUS) to Medicare fraud detection. | Hybrid methods improved AUC and efficiency, reducing imbalance challenges. |
| Arockiasamy and Bhoopathi (2025) | 2025 | Proposed dual-model (unsupervised anomaly detection + supervised classification) for fraud detection. | Reduced false positives by 63% and improved AUC to 88.3%. |
| Chirchi and Kavya (2024) | 2024 | Analyzed provider fraud using SMOTE, Random Forest, Logistic Regression. | Identified provider behavior patterns, improved model robustness with SMOTE. |
| Herland et al. (2019) | 2019 | Investigated class rarity in supervised healthcare fraud detection. | Found performance drops with class rarity; proposed undersampling to mitigate imbalance. |
| Sakil et al. (2025) | 2025 | Proposed CNN-Transformer-XGBoost with explainable AI for Medicare fraud detection. | Achieved F1-score 0.95, AUC-ROC up to 0.98, surpassing Random Forest and SVM. |
| He et al. (2008) | 2008 | Proposed ADASYN (Adaptive Synthetic Sampling) to improve class balance by focusing on minority samples that are harder to classify. | Improved detection of complex patterns in imbalanced datasets by adaptively generating synthetic data. |
| LeCun et al. (2015) | 2015 | Presented deep feature extraction using autoencoders and deep neural networks. | Enabled automatic identification of relevant features in high-dimensional datasets, outperforming manual feature engineering. |
| Bottou (2010) | 2010 | Applied online learning with stochastic gradient descent (SGD) for large-scale ML. | Allowed models to update continuously with streaming data, enhancing responsiveness to evolving fraud patterns. |
Table 2. Detailed list of variables in the Medicare fraud dataset.
| Variable Name | Description |
| --- | --- |
| BeneID | Unique identifier for each beneficiary. |
| DOB | Date of birth of the beneficiary. |
| DOD | Date of death of the beneficiary. |
| Gender | Gender of the beneficiary. |
| Race | Race of the beneficiary. |
| RenalDiseaseIndicator | Indicator if the beneficiary has renal disease. |
| State | State code where the beneficiary resides. |
| County | County code where the beneficiary resides. |
| NoOfMonths_PartACov | Number of months the beneficiary was covered under Medicare Part A. |
| NoOfMonths_PartBCov | Number of months the beneficiary was covered under Medicare Part B. |
| ChronicCond_Alzheimer | Indicator if the beneficiary has Alzheimer’s disease. |
| ChronicCond_Heartfailure | Indicator if the beneficiary has heart failure. |
| ChronicCond_KidneyDisease | Indicator if the beneficiary has kidney disease. |
| ChronicCond_Cancer | Indicator if the beneficiary has cancer. |
| ChronicCond_ObstrPulmonary | Indicator if the beneficiary has obstructive pulmonary disease. |
| ChronicCond_Depression | Indicator if the beneficiary has depression. |
| ChronicCond_Diabetes | Indicator if the beneficiary has diabetes. |
| ChronicCond_IschemicHeart | Indicator if the beneficiary has ischemic heart disease. |
| ChronicCond_Osteoporasis | Indicator if the beneficiary has osteoporosis. |
| ChronicCond_RheumatoidArthritis | Indicator if the beneficiary has rheumatoid arthritis. |
| ChronicCond_Stroke | Indicator if the beneficiary has experienced a stroke. |
| IPAnnualReimbursementAmt | Annual inpatient reimbursement amount. |
| IPAnnualDeductibleAmt | Annual inpatient deductible amount. |
| OPAnnualReimbursementAmt | Annual outpatient reimbursement amount. |
| OPAnnualDeductibleAmt | Annual outpatient deductible amount. |
| ClaimID | Unique identifier for each claim. |
| ClaimStartDt | Start date of the claim. |
| ClaimEndDt | End date of the claim. |
| Provider | Unique identifier for the healthcare provider. |
| InscClaimAmtReimbursed | Insurance claim amount reimbursed. |
| AttendingPhysician | Identifier for the attending physician. |
| OperatingPhysician | Identifier for the operating physician. |
| OtherPhysician | Identifier for any other physician linked to the claim. |
| ClmDiagnosisCode_1 to 10 | Up to 10 diagnosis codes for a single claim. |
| ClmProcedureCode_1 to 6 | Up to 6 procedure codes for medical procedures performed during the claim period. |
| DeductibleAmtPaid | Deductible amount paid by the beneficiary for the claim. |
| ClmAdmitDiagnosisCode | Admission diagnosis code for inpatient claims. |
Table 3. Descriptive statistics of key variables in the Medicare fraud dataset, including counts, means, and standard deviations for potential fraud, reimbursement amounts, deductibles, admission rates, claim durations, and Medicare Part A and B coverage periods.
| Variable | Count | Mean | Std |
| --- | --- | --- | --- |
| Potential Fraud | 556,703 | 0.381 | 0.486 |
| Insurance Claim Amt Reimbursed | 556,703 | 996.936 | 3819.692 |
| Deductible Amt Paid | 556,703 | 78.428 | 273.809 |
| Admitted | 556,703 | 0.073 | 0.259 |
| Duration of Claim | 556,703 | 1.728 | 4.905 |
| Number of Days Admitted | 556,703 | 0.483 | 2.299 |
| Renal Disease Indicator | 556,703 | 0.197 | 0.397 |
| No. of Months Part A Cov | 556,703 | 11.931 | 0.890 |
| No. of Months Part B Cov | 556,703 | 11.939 | 0.786 |
Table 4. Performance metrics (accuracy, precision, recall, and F1-score) of five machine learning models—Random Forest, KNN, LDA, Decision Tree, and AdaBoost—evaluated on both training and validation datasets for Medicare fraud detection.
| Model | Metric | Accuracy | Precision | Recall | F1-Score |
| --- | --- | --- | --- | --- | --- |
| Random Forest | Train | 0.992 | 0.985 | 1.000 | 0.992 |
| Random Forest | Validation | 0.988 | 0.970 | 0.999 | 0.984 |
| KNN | Train | 0.895 | 0.853 | 0.955 | 0.901 |
| KNN | Validation | 0.792 | 0.683 | 0.852 | 0.758 |
| LDA | Train | 0.665 | 0.841 | 0.408 | 0.549 |
| LDA | Validation | 0.633 | 0.569 | 0.166 | 0.257 |
| Decision Tree | Train | 0.968 | 0.944 | 0.996 | 0.969 |
| Decision Tree | Validation | 0.963 | 0.912 | 1.000 | 0.954 |
| AdaBoost | Train | 0.836 | 0.816 | 0.866 | 0.840 |
| AdaBoost | Validation | 0.811 | 0.721 | 0.829 | 0.771 |
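The F1-scores in Table 4 follow directly from the reported precision and recall, since F1 is their harmonic mean; this internal consistency can be checked in a few lines:

```python
def f1(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# validation-set precision/recall pairs reported in Table 4
print(round(f1(0.970, 0.999), 3))  # Random Forest → 0.984
print(round(f1(0.912, 1.000), 3))  # Decision Tree → 0.954
print(round(f1(0.569, 0.166), 3))  # LDA           → 0.257
```

The LDA row illustrates why accuracy alone is misleading under class imbalance: despite 63.3% accuracy, the very low recall (0.166) collapses F1 to 0.257.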

Share and Cite

MDPI and ACS Style

Farahmandazad, D.; Danesh, K.; Abadi, H.F.N. Application of Standard Machine Learning Models for Medicare Fraud Detection with Imbalanced Data. Risks 2025, 13, 198. https://doi.org/10.3390/risks13100198

