Article

Methodology for Detecting Suspicious Claims in Health Insurance Using Supervised Machine Learning

by Jose Villegas-Ortega 1,2,*, Luis Napoleon Quiroz Aviles 3, Juan Nazario Arancibia 3, Wilder Carpio Montenegro 3, Rosa Delgadillo 1 and David Mauricio 1

1 Faculty of System Engineering, Universidad Nacional Mayor de San Marcos, Lima 15081, Peru
2 Unidad de Postgrado, Universidad Peruana de Ciencias Aplicadas, Lima 15072, Peru
3 Ministerio de Salud, Seguro Integral de Salud, Lima 15018, Peru
* Author to whom correspondence should be addressed.
Future Internet 2025, 17(12), 584; https://doi.org/10.3390/fi17120584
Submission received: 15 November 2025 / Revised: 10 December 2025 / Accepted: 10 December 2025 / Published: 18 December 2025

Abstract

Health insurance fraud (HIF) places a substantial economic burden on global health systems. While supervised machine learning (SML) offers a promising solution for its detection, most approaches are ad hoc and lack a systematic methodological framework that ensures replicability, adaptability, and effectiveness, especially in contexts with severe class imbalance. We developed PDHIF (Phases for Detecting Health Insurance Fraud), a six-phase systematic methodology with a holistic focus that integrates fraud theory, actors, manifestations, and factors with the complete SML lifecycle. We applied this methodology in a case study using a dataset of 8.5 million claims from a public health insurance system in Peru. We trained and evaluated three SML models—random forest (RF), XGBoost (XGB), and a multilayer perceptron (MLP)—in two experimental scenarios: one with the original, highly unbalanced dataset and another with a training set balanced via the K-Means SMOTE technique. When PDHIF was applied, the results revealed a stark contrast: in the unbalanced scenario, the models were ineffective at detecting fraud (F1 score ≤ 0.521) despite high accuracy (>98%). In the balanced scenario, performance improved dramatically. The best-performing model, RF, achieved an F1 score of 0.994, a sensitivity of 0.994, and an AUC of 0.994 on the test set, demonstrating a robust ability to distinguish suspicious claims.

1. Introduction

Fraud, waste, and abuse in healthcare systems represent a global economic and social burden, consuming an estimated 3% to 10% of total annual healthcare spending [1] and costing hundreds of billions of dollars [2]. This diversion of resources not only compromises the financial sustainability of healthcare systems but also erodes patient trust and can negatively impact the quality of care.
Efforts to mitigate health insurance fraud (HIF) face significant challenges, including high initial detection costs and administrative complexities. While automated detection of health insurance fraud can result in substantial savings—estimated at 25% of total healthcare spending in the United States [3]—identifying such practices in the healthcare sector is more complex than in other insurance sectors, such as credit or auto insurance [4]. Furthermore, detecting health insurance fraud involves numerous challenges: information is often lacking or of poor quality, and reporting fraud can be arduous and cost-inefficient, resulting in high investigation costs and delays in legal action [5]. Fraudsters also evolve and adapt their strategies, making it difficult to analyze large volumes of overlapping data [6]. This is compounded by the lack of real-time detection [7] and the high costs of triage [8], which require human intervention to resolve doubtful claims [9]. Despite these efforts, an estimated USD 100 billion is lost annually to fraud, waste, and abuse in the U.S. healthcare system alone [2].
Methods for detecting health insurance fraud (HIF) require a deep understanding of the ecosystem in which it occurs. Five fundamental, interrelated elements are identified: (1) the conditions that define fraud as an intentional act that violates the law to obtain an illicit benefit [10]; (2) the actors who commit it (insurers, policyholders, and healthcare providers); (3) the specific manifestations of fraud for each actor; (4) the factors (macro, meso-, and microenvironmental) that influence fraudulent behavior; and (5) the key processes where fraud occurs (enrollment, service delivery, and billing/payment). The interaction of these elements is dynamic and adaptive. Figure 1 conceptualizes these relationships, providing an essential framework for designing robust detection strategies.
Furthermore, CRISP-DM exhibits low adherence when applied to highly complex domains such as health insurance fraud (HIF), given that its generalist, industry-independent nature lacks the necessary flexibility and operational focus [11,12]. This limitation is corroborated by the analysis of the 12 representative studies in Table 1: although supervised machine learning (SML) techniques have become established as key tools for identifying suspicious patterns, significant methodological shortcomings persist, and authors therefore prefer ad hoc methodologies.
Initially, studies focused on accuracy, a metric that can be misleading in datasets with class imbalance. This led to the adoption of more robust metrics, such as the F1 score. Since 2018, there has been a trend toward incorporating data balancing techniques to mitigate model bias [17,22]. Despite these advances, the review reveals a fundamental methodological gap: most studies use ad hoc approaches. Only one study has applied the cross-industry standard process for data mining (CRISP-DM) methodology [18], and there is no standardized machine learning framework adapted to the specificities of health insurance fraud (HIF). This methodological deficiency results in algorithms trained without clearly identifying the context in which they operate, the manifestations of fraud, or its associated factors.
Therefore, the objective of this study is to propose and validate a systematic methodology for developing SML models for detecting suspicious claims in health insurance. To demonstrate its effectiveness, we applied this methodology in a case study using data from a public health insurance system in Peru.

2. Materials and Methods

Part A: We present the PDHIF (Phases for Detecting Health Insurance Fraud) methodology, a systematic approach designed for the complete lifecycle of health insurance fraud detection. Unlike general methodologies such as CRISP-DM, a recognized standard for data mining [12], PDHIF is specific to health insurance fraud detection and consists of six phases: (1) identifying indicators of fraud; (2) identifying the manifestations associated with the actors and integrating data by associating the available variables with the factors of health insurance fraud; (3) preprocessing—which includes data cleaning, transformation, and preparation—and data balancing; (4) developing, training, and evaluating the health insurance fraud model; (5) deployment and operational integration; and (6) adaptive monitoring and evolution (Table 2). These phases not only enable more effective and efficient detection of HIF but also facilitate a better understanding of the phenomenon and continuous system improvement through feedback and model refinement. Unlike previous approaches, the method is grounded in the fundamental principles of machine learning and introduces improvements tailored to the specific challenges of HIF detection. Its systematic design spans everything from understanding fraud to adaptive monitoring, integrating components such as identifying the manifestations associated with the actors, data preprocessing and balancing, training supervised models, and their operational deployment. This ensures greater accuracy in identifying fraudulent patterns and constant evolution in response to changes in the healthcare environment. Figure 2 illustrates the complete workflow, which is detailed in the six phases described in Table 2.
Part B: We empirically validated the PDHIF methodology by applying the steps in Table 2 in a case study using data from Peru’s main public insurer, Seguro Integral de Salud (SIS), which covers 26.3 million people (77.2% of the Peruvian population). In total, 12.9 million people receive medical care, generating approximately 93 million services annually [27]. The volume of these transactions implies a high risk of fraud. Given the budget allocated by the Ministry of Economy and Finance [28] to this health insurance program and loss estimates ranging from 3% to 10% [1], annual losses due to fraud would amount to between USD 18.6 and 61.8 million. This highlights the urgent need to implement effective fraud detection and prevention solutions. Notably, SIS has previously developed medical auditing efforts for detecting nonconformities based on artificial intelligence. In this context, the first four phases of the PDHIF methodology are implemented to demonstrate its applicability.

2.1. Phase 1—Identifying Signs of Fraud

The SIS operates under a regulatory framework that establishes three audit processes for healthcare services recorded in single care forms (FUAs): automatic supervision (PSA), electronic medical supervision (SME), and post-appointment onsite control (PCPP). Noncompliance with these protocols is considered a suspected case of malpractice. The PSA is governed by resolutions No. 185-2009-SIS/J, No. 192-2016-SIS/J, and No. 000051-2024-SIS/J, which define 46 consistency rules and 27 automated validation rules [29]. The SME, regulated by Directive No. 001-2017-SIS/GNF-V.01 and Resolutions No. 210-2018/SIS and No. 000051-2024-SIS/J, oversees healthcare services, budget management, and the replenishment of supplies. Its implementation is limited, given that only a partial review of the FUAs is carried out by approximately 58 medical auditors nationwide, leaving a significant proportion without thorough analysis [30]. The PCPP, established by Resolutions No. 006-2016/SIS and No. 231-2022-SIS/J, evaluates the quality of services in person [31], which also generates the need for specialized personnel, travel expenses, and per diem allowances.
In Peru, local socioeconomic dynamics significantly influence HIF, including low salaries for medical personnel [32]; citizen insecurity and institutional distrust (92% of the population distrusts Congress and 83% distrust the Ministry of the Interior), which limit the effectiveness of audits and sanctions [33,34]; systemic and political corruption (exemplified by convicted and imprisoned presidents); and judicial impunity. These factors weaken control mechanisms and create an environment conducive to fraudulent practices [33,35]. In this context, we apply the fraud triangle theory with a focus on the provider, who operates under political, economic, and labor pressures. These pressures can motivate suspicious practices such as referring patients to private clinics, overbilling for services, excessive prescription of medications, or fabricating charges for procedures not performed. Opportunities for fraud include the ability to manipulate information, knowledge of the process, and regulatory ambiguity. Rationalization, in turn, rests on the cultural normalization of fraud, expressed in phrases such as “everyone does it,” the perception of impunity due to weak controls, a low probability of detection, and the normalization of lax data entry practices.

2.2. Phase 2—Identification of Available Manifestations and Factors

From the Phase 1 analysis, the reviewers (D.M., J.N.A., J.V.-O., W.C.M.) selected 12 provider-related manifestations [10]. As shown in Table 3, the reviewers subsequently worked in pairs to confirm the provider manifestations, considering the regulations and the information from the audited FUAs. In cases of discrepancy, a third reviewer (L.N.Q.A., R.D.) participated in the decision to identify potential manifestations. A manifestation was recorded as ELIGIBLE if, within the context of the SIS, there is any possibility of suspected fraud that violates a PSA, SME, or PCPP regulation identified in Phase 1 and has a variable recorded in the FUA; otherwise, it was recorded as INELIGIBLE. Table 3a presents the five eligible provider manifestations. Additionally, 10 HIF factors associated with 23 available variables (selected = yes) from the FUAs were mapped (Table 3b); these variables make up the RLIMA-CE dataset.

2.3. Phase 3—Data Preprocessing and Balancing

We considered the available variables associated with the factors identified in Phase 2. The records include FUAs from 2023 and 2024 corresponding to public providers of the third level of care in the Center and East zones of Metropolitan Lima. This level provides care for rare problems and complex pathologies that require specialized, high-technology procedures. These data make up the original dataset, called RLIMA-CE.
RLIMA-CE was preprocessed in three stages: cleaning, transformation, and normalization. Initially, 42 independent variables corresponding to the provider’s domains were processed. During cleaning, the following were removed: V02–V04, because they identify the FUAs; V07 and V12, because they are considered in the filtering of each dataset; V08–V09, because they identify regional records; V10, because it is represented in V11; V15, because it is not necessary for a classification problem; V16–V17, because it is represented in V18; V20, because it is included in V21; V34–V36, because they are included in V28–V33; and V39–V41, because they are synthesized in V42.
In the data transformation phase, clinical codes, such as the International Classification of Diseases (ICD-10) for diagnoses, the Catalog of Medical and Health Procedures (CPMS) for procedures, and the Integrated System for the Supply of Medicines and Medical-Surgical Supplies (SISMED) for supplies and medicines, were standardized to ensure consistency. Variables V21, V28, V30, and V32 underwent frequency encoding, whereas V22, V29, and V31 were transformed by counting unique codes. Finally, the dependent (target) variable V42 was created from an evaluation of the PSA, SME, and PCPP audit processes: a record was labeled “1” if there was an observation (suspected fraud) and “0” if it was legitimate. Upsampling of the minority class via historical records (2019–2022) was considered, thus allowing these markings from previous years to be leveraged. The variables were not normalized because we used decision tree-based algorithms.
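For illustration, the transformations described above could be implemented along the following lines in Python with pandas; this is a minimal sketch, and the dataframe, column names, and code contents are hypothetical stand-ins for the FUA variables, not the institution's actual pipeline:

```python
import pandas as pd

# Hypothetical claims table; columns stand in for the FUA variables.
df = pd.DataFrame({
    "V21_code": ["J18.9", "E11.9", "J18.9", "I10", "J18.9"],           # ICD-10 profile
    "V22_codes": ["J18.9;R50.9", "E11.9", "J18.9", "I10;E78.5", "J18.9"],
    "audit_flag": [1, 0, 0, 1, 0],  # outcome of the PSA/SME/PCPP audits
})

# Frequency encoding (as applied to V21, V28, V30, and V32): replace each
# categorical code with its number of occurrences in the dataset.
df["V21"] = df["V21_code"].map(df["V21_code"].value_counts())

# Unique-code count (as applied to V22, V29, and V31), assuming the codes
# of one claim are stored as a delimited string.
df["V22"] = df["V22_codes"].str.split(";").apply(lambda codes: len(set(codes)))

# Target variable V42: "1" if any audit process flagged the claim, else "0".
df["V42"] = df["audit_flag"].astype(int)

print(df[["V21", "V22", "V42"]])
```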
To assess the effectiveness of the PDHIF methodology, two experimental scenarios were created: Scenario 1 used the preprocessed, unbalanced dataset of 23 variables, and Scenario 2 used the same 23 variables after balancing with K-Means SMOTE, which achieved a balanced class distribution across all segments and produced a dataset of 16,648,713 records (Table 4).
To operationalize the detection of suspicious claims, we selected three distinct supervised learning algorithms based on their widespread adoption in the fraud detection literature and their differing learning capabilities [21,22]. Random forest (RF) is an ensemble method that constructs a multitude of decision trees during training. It operates on the principle of bagging (bootstrap aggregating), where the final prediction is the majority vote of the individual trees, making it highly robust to overfitting and noise [36]. Extreme gradient boosting (XGB) is a scalable implementation of gradient boosting machines; unlike RF, XGB builds trees sequentially, where each new tree attempts to correct the errors of the previous ones. It employs a regularized objective function to control model complexity and utilizes a sparsity-aware algorithm to handle missing data effectively [37]. The multilayer perceptron (MLP) is a feedforward artificial neural network consisting of at least three layers of nodes: an input layer, a hidden layer, and an output layer. It utilizes backpropagation for training and nonlinear activation functions, allowing it to model complex, nonlinear relationships within high-dimensional data [38]. Finally, to address the class imbalance, we utilized K-Means SMOTE. Unlike standard SMOTE, which interpolates between nearest neighbors regardless of the data distribution, K-Means SMOTE first clusters the data using k-means and then applies oversampling only in “safe” clusters. This approach prevents the generation of noise and ensures that synthetic samples are generated in crucial areas of the minority class distribution.
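As a minimal sketch of how these components can be instantiated with scikit-learn, XGBoost, and imbalanced-learn (all hyperparameters shown are illustrative placeholders; the study's settings are given in Table 5, and the toy data are constructed so that k-means can isolate a minority cluster):

```python
from collections import Counter

import numpy as np
from imblearn.over_sampling import KMeansSMOTE
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

# The three classifiers compared in this study (placeholder hyperparameters).
models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
    "XGB": XGBClassifier(eval_metric="logloss", random_state=42),
    "MLP": MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=42),
}

# Toy unbalanced data: a large "legitimate" blob and a small, separate
# "suspicious" blob, so that k-means can find a safe minority cluster.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (980, 5)), rng.normal(6, 1, (20, 5))])
y = np.array([0] * 980 + [1] * 20)

# K-Means SMOTE: cluster first, then oversample the minority class only
# inside clusters where it is sufficiently represented.
resampler = KMeansSMOTE(
    kmeans_estimator=KMeans(n_clusters=2, random_state=42),  # k = 2, as in the study
    random_state=42,
)
X_res, y_res = resampler.fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))  # minority class oversampled to parity
```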
For each scenario, three machine learning algorithms were trained and evaluated: random forest (RF), extreme gradient boosting (XGB), and multilayer perceptron (MLP). The data were randomly divided, with 70% for training and 30% for testing. Additionally, Scenario 2 was balanced via the K-Means SMOTE technique with k = 2 (silhouette score = 0.9960) and subjected to 10-fold cross-validation. Model performance was assessed via five metrics: precision, recall (sensitivity), F1 score, accuracy, and AUC-ROC. After several iterations, the hyperparameters were defined (Table 5).
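A condensed sketch of this evaluation protocol (70/30 stratified split, 10-fold cross-validation, and the five reported metrics) is shown below; the dataset and hyperparameters are synthetic stand-ins, not the study's actual configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic unbalanced stand-in for the preprocessed claims data.
X, y = make_classification(n_samples=5000, weights=[0.9], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30,
                                          stratify=y, random_state=42)

model = RandomForestClassifier(random_state=42)

# 10-fold cross-validation on the training partition (Scenario 2 protocol).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
print("CV F1:", cross_val_score(model, X_tr, y_tr, cv=cv, scoring="f1").mean())

# Final evaluation on the held-out 30% test set with the five study metrics.
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
proba = model.predict_proba(X_te)[:, 1]
print("precision:", precision_score(y_te, pred))
print("recall   :", recall_score(y_te, pred))
print("F1       :", f1_score(y_te, pred))
print("accuracy :", accuracy_score(y_te, pred))
print("AUC-ROC  :", roc_auc_score(y_te, proba))
```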

3. Results


3.1. Phase 4—Model Development, Training, and Evaluation

3.1.1. Confusion Matrix

Visual analysis corroborates the numerical findings. In Scenario 1 (unbalanced data), Figure 3a–c reveal a critical deficiency in the models: a high number of false negatives (FNs), representing suspicious cases that the models fail to detect. Specifically, the RF model (Figure 3a) incorrectly classified 24,091 suspicious claims as legitimate, whereas XGB (Figure 3b) and MLP (Figure 3c) failed in 25,637 and 29,581 cases, respectively. This result demonstrates that, despite high overall accuracy, the models were practically ineffective for their primary purpose: identifying suspected fraud. In contrast, Scenario 2 (balanced data) shows markedly improved performance. After the K-Means SMOTE balancing technique was applied, the number of false negatives was drastically reduced in all the models. The RF model (Figure 3d) now failed in only 14,341 cases, while it correctly identified over 2.48 million suspicious claims (true positives). Similarly, XGB (Figure 3e) and MLP (Figure 3f) demonstrated very robust detection capabilities, correctly identifying the vast majority of suspicious cases.
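For reference, the false-negative counts discussed above sit in a fixed position of scikit-learn's confusion matrix; a minimal sketch with placeholder label vectors:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 1, 0, 1, 0]  # 1 = suspicious claim, 0 = legitimate
y_pred = [0, 0, 1, 0, 1, 0, 1, 1]  # model output

# scikit-learn orders the binary matrix as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"false negatives (missed suspicious claims):  {fn}")  # 1
print(f"true positives (detected suspicious claims): {tp}")  # 3
```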

3.1.2. Loss Function

Figure 4 shows the evolution of the loss function for the XGB (a and e) and MLP (c and g) models in both scenarios for both the training and validation sets. In all these cases, a clear convergence of the function is observed, indicating an effective learning process and good generalization capacity without overfitting. Notably, the XGB model demonstrated superior performance, consistently achieving lower loss values than the MLP in both scenarios. With respect to the precision curves, XGB exhibits outstanding performance in both scenarios (b and f), with curves that rise rapidly and remain very close together, indicating fast learning without overfitting. The MLP model also shows good learning, although a slight tendency toward overfitting can be observed in the balanced scenario (h) because of the separation between the curves.
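Curves like those in Figure 4 can be recovered from XGBoost's built-in evaluation log; the following minimal sketch (synthetic data, placeholder hyperparameters) shows the mechanism for tracking training versus validation log-loss to diagnose convergence and overfitting:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in data split into training and validation partitions.
X, y = make_classification(n_samples=3000, random_state=42)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=42)

model = XGBClassifier(n_estimators=200, eval_metric="logloss", random_state=42)
model.fit(X_tr, y_tr, eval_set=[(X_tr, y_tr), (X_va, y_va)], verbose=False)

# evals_result() holds one log-loss series per eval_set entry; a falling
# training loss paired with a rising validation loss signals overfitting.
history = model.evals_result()
train_loss = history["validation_0"]["logloss"]
valid_loss = history["validation_1"]["logloss"]
print(f"final train loss {train_loss[-1]:.4f} | valid loss {valid_loss[-1]:.4f}")
```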

3.1.3. Results of the Metrics

Table 6 summarizes the performance of the three models (RF, XGB, and MLP) in the two scenarios, with Scenario 2 being the most effective. In Scenario 1, although all the models have high accuracy (>0.98), their sensitivity values are very low, indicating that they are not effective at detecting suspicious claims. The RF model (F1 = 0.521) achieves the best balance, outperforming XGB (F1 = 0.484) and the MLP (F1 = 0.358). Scenario 2 (balanced) is clearly superior for all the models, consistently achieving accuracy, sensitivity, and F1 score values above 0.94, demonstrating the effectiveness of the balancing technique in improving detection of the positive class. Within Scenario 2, the RF model remains the most robust, with F1 = 0.994, slightly outperforming XGB (F1 = 0.978) and the MLP (F1 = 0.945). In conclusion, Scenario 2 (balanced) is the most suitable approach for this classification task, and RF consistently performs best in both contexts.
To complement the analysis, we incorporated the Matthews Correlation Coefficient (MCC) into our evaluation. Unlike accuracy—which can be misleading in highly skewed datasets—the MCC provides a more robust and discriminative assessment of model performance. In the non-balanced scenario, the models achieved MCC values ranging from 0.41 to 0.56, demonstrating that the classifiers were able to learn meaningful patterns despite the extreme skew in class distribution. After applying data balancing techniques, performance improved substantially, with MCC values between 0.89 and 0.99, confirming that the models capture the underlying structure of the problem effectively when trained on a more representative sample.
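To make the metric's behavior concrete, the toy sketch below contrasts accuracy and MCC for a majority-class guesser and a modest classifier on a 5% minority problem (illustrative labels only, not the study's data):

```python
from sklearn.metrics import accuracy_score, matthews_corrcoef

y_true = [0] * 95 + [1] * 5          # 5% minority ("suspicious") class
naive = [0] * 100                    # always predicts "legitimate"
model = [0] * 95 + [1, 1, 1, 0, 0]   # catches 3 of the 5 suspicious claims

# The naive guesser scores 0.95 accuracy but an MCC of 0.0, while the real
# classifier scores 0.98 accuracy and an MCC of about 0.77.
print(accuracy_score(y_true, naive), matthews_corrcoef(y_true, naive))
print(accuracy_score(y_true, model), matthews_corrcoef(y_true, model))
```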
The high metrics observed in Scenario 2 reflect the model’s ability to learn patterns within the balanced training space. However, we recognize that implementation in a real-world environment with highly skewed distributions may result in different performance dynamics.

4. Discussion

4.1. About the Methodology

While CRISP-DM provides a global standard for data mining, its generic application is insufficient to address the inherent complexities of health insurance fraud (HIF), as evidenced by its limited adoption in the reviewed previous studies (Table 1). The PDHIF methodology overcomes these limitations by departing from this general framework and integrating a domain-specific taxonomy that operationalizes fundamental theories, such as the Fraud Triangle and related fraud theories, directly into the feature engineering and preprocessing phases. Unlike the generalist approach of CRISP-DM, PDHIF explicitly incorporates the identification of actors (insurer, provider, insured) and the local regulatory frameworks that define legality, considering macro-, meso-, and microenvironmental factors before the modeling phase. This conceptualization adds methodological value by reducing the “black box” nature of feature selection, ensuring that models are not only statistically valid but also contextually consistent with medical audit regulations.
PDHIF’s systematic six-phase design transcends the technical detection of anomalies by enabling a contextualized understanding of fraudulent dynamics, especially in environments with systemic challenges of corruption and impunity, such as the Peruvian case. The methodology ensures the system’s sustainability by including an operational deployment and adaptive monitoring phase (Phase 6), which facilitates periodic retraining cycles to incorporate new factors arising from regulatory changes and the evolution of fraudulent tactics. In this way, the PDHIF enhances the model’s interpretive capacity and offers a robust, tailored solution that reflects the true complexity of health claims content, filling the existing methodological gap in the current literature.
Furthermore, the methodology effectively addresses key technical challenges, such as high class imbalance, through advanced balancing techniques (SMOTE) and the selection of operationally focused evaluation metrics (F1 score, recall, and AUC-ROC). The inclusion of operational deployment phases and continuous monitoring ensures the system’s sustainability in real-world scenarios, where evolving fraudulent tactics and regulatory changes necessitate periodic updates. Thus, the PDHIF positions itself as a robust and reproducible framework that combines analytical rigor with practicality, offering a solid foundation for implementation in healthcare systems with characteristics similar to those of the Peruvian case study.

4.2. Regarding the Case Study: SIS

The empirical validation of the PDHIF methodology in Peru’s SIS demonstrates its effectiveness in addressing the real challenges of fraud detection in public health systems. The rigorous application of the initial phases allowed for the identification of specific manifestations of fraud among providers (M10, M11, M12, M14, M16) and their mapping to variables available in the single care forms (FUAs), creating a representative dataset (RLIMA-CE) with 23 predictor variables associated with 10 factors (provider characteristics, gender, age, specialty, diagnosis, chronic condition, reimbursement, treatments, medication, and auditing). Data preprocessing, which included the creation of synthetic variables and frequency encoding, along with the application of K-Means SMOTE to balance the classes, proved fundamental in overcoming the extreme imbalance (1.5% of suspects), a typical condition in fraud cases that limits the effectiveness of conventional approaches.
The experimental results confirm the superiority of Scenario 2 (balanced), where the application of K-Means SMOTE proved decisive. In the unbalanced scenario, the high accuracy (>98%) masked a critically low sensitivity (<0.38) due to the algorithms’ inherent bias toward the majority class. The introduction of synthetic samples fundamentally altered the decision boundaries, enabling the models to learn the structural features of fraud rather than treating them merely as statistical anomalies. Within this optimized context, RF emerged as the most robust algorithm (F1 = 0.994), outperforming XGBoost (F1 = 0.978) and the Multilayer Perceptron (F1 = 0.945). Theoretically, this dominance is attributed to RF’s bagging mechanism, which reduces variance and exhibits greater resilience to label noise—a limitation inherent in data derived from administrative audits. In contrast, boosting methods like XGBoost can be prone to overfitting as they attempt to correct errors in noisy instances, while MLPs typically require significantly larger data volumes to converge effectively on tabular data. The operational validation of this approach is evidenced by the drastic reduction in false negatives (from 24,091 to 14,341 in the best-performing model). The robustness of the proposed methodology is further validated by the high MCC scores (>0.89) in the balanced scenario, confirming that the high F1 scores are not artifacts of the metric but reflect true discriminative power. This corroborates that combining advanced balancing techniques with ensemble algorithms constitutes an efficient solution, thereby establishing the first high-precision digital auditor for the Peruvian health system.

4.3. Limitations

This study has limitations in its scope of validation. First, the empirical application was limited to tertiary-level providers in Metropolitan Lima. Consequently, the specific predictive patterns identified may not be directly generalizable to other levels of complexity (such as primary care centers), the private sector, or rural regional contexts with different resource limitations and socioeconomic dynamics. However, the PDHIF methodology itself—specifically in its actor identification and factor mapping phases—is intrinsically designed to be adaptable. Therefore, broader implementation would require retraining the models via region-specific datasets, thus capturing the local variations inherent in healthcare service delivery.
Second, a significant limitation lies in the construction of the target variable (V42), derived from observations of the SIS administrative audit processes (PSA, SME, PCPP) and not from judicial verdicts. This introduces a risk of labeling bias, where models could learn to replicate preexisting ‘audit patterns’ instead of detecting novel fraud schemes that circumvent current controls. However, according to the definition of an HIF as obtaining illegal benefits through deception [10], the regulatory infractions detected by the SIS constitute a solid basis for illegitimacy. Therefore, the results should be interpreted as ‘high-probability suspicious claims’ that require human review, rather than being considered confirmed fraud. While future iterations should incorporate feedback loops to refine these labels, the proposed methodological framework (PDHIF) remains generally applicable and transferable to other contexts.

5. Conclusions

The PDHIF methodology proves to be a robust and systematic framework for detecting fraud in health insurance. Unlike the predominantly ad hoc approaches documented in the literature and traditional data mining methods, PDHIF introduces a holistic approach that integrates fraud theory, actors, manifestations, and factors with the complete lifecycle of supervised machine learning algorithms. Its validation in the real-world case of the SIS confirms its ability to operationalize clinical, operational, and economic variables in a highly accurate predictive model, with random forest emerging as the most effective algorithm (F1 score of 0.994) after the critical application of advanced balancing techniques.
The successful implementation of the first four phases of PDHIF in a highly complex and extremely unbalanced environment demonstrates its practical utility in transforming transactional data into actionable alerts and overcoming the limitations of traditional audit systems. This approach provides a reproducible basis for public and private institutions to implement proactive, scalable, and adaptive detection systems capable of optimizing the allocation of audit resources and mitigating significant financial losses.

Future Work

A crucial next step for operational integration is the development of an interpretability layer to elucidate the decision-making process of black-box models. Future work will incorporate explainable AI (XAI) techniques, such as Shapley additive explanations (SHAPs), to decompose the model’s predictions, allowing medical auditors to understand which specific factors—for example, discrepancies between diagnoses and procedures or provider history—contribute to the suspicion score. This transparency is essential for increasing auditor confidence and developing effective prevention rules. Additionally, we plan to develop hybrid systems that combine the proposed supervised model with unsupervised approaches, thereby expanding the scope of fraud detection to include novel anomalies and emerging patterns.
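As an indication of how this planned XAI layer could operate, the sketch below applies the shap library's TreeExplainer to a tree-ensemble classifier; the data and model are synthetic stand-ins for the FUA variables and the deployed model, not the system described in this study:

```python
import shap
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# Synthetic stand-in for the claims data and the fitted booster.
X, y = make_classification(n_samples=500, n_features=5, random_state=42)
model = XGBClassifier(random_state=42).fit(X, y)

# TreeExplainer decomposes each prediction into additive per-feature
# contributions; positive values push a claim toward "suspicious", so an
# auditor can see which variables drive the suspicion score.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # shape: (n_samples, n_features)
print(shap_values[0])                   # per-feature contributions, first claim
```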
To further mitigate the risks of overfitting and metric inflation associated with synthetic oversampling (SMOTE), future work will focus on evaluating the models on strictly unbalanced test sets that mimic real-time production flows. In addition, we plan to conduct comparative analyses of alternative imbalance-handling techniques, such as undersampling, ADASYN, and cost-sensitive learning, to validate the robustness of the PDHIF; a sketch of these alternatives follows.
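All three alternatives are available off the shelf; a minimal, hedged sketch of how each would be instantiated (parameters are illustrative placeholders):

```python
from imblearn.over_sampling import ADASYN
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import RandomForestClassifier

# Adaptive synthetic oversampling: generates more synthetic minority points
# where the minority class is hardest to learn (near the decision boundary).
adasyn = ADASYN(random_state=42)

# Random undersampling: discards majority-class samples instead of
# synthesizing minority ones.
rus = RandomUnderSampler(random_state=42)

# Cost-sensitive learning: keep the data unbalanced and reweight errors,
# e.g., via class_weight in scikit-learn estimators.
cost_sensitive_rf = RandomForestClassifier(class_weight="balanced",
                                           random_state=42)
```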
Furthermore, to establish a more comprehensive baseline, future studies will expand the experimental section to include a wider range of algorithms, such as support vector machines (SVMs), LightGBM, and ensemble variations, thereby benchmarking the robustness of the PDHIF approach. Concurrently, these oversampling alternatives will be compared against K-Means SMOTE in terms of how well they preserve the underlying data distribution while optimizing minority-class detection.

Author Contributions

J.V.-O.: Contributed to the conception of the study, conceptualization of the methodology, literature review, validation of the methodology in the case study, preprocessing design, algorithm selection, formal analysis, writing—original draft and editing. L.N.Q.A.: Contributed to the formal analysis of the machine learning method in the case study, validation of the dataset, and development of the classification model in the SIS operational environment. J.N.A.: Contributed to the construction of the machine learning method in the case study, construction of software, data curation of the dataset, and development of the classification model in the SIS operational environment. W.C.M.: Contributed to the formal analysis of the machine learning method in the case study, validation of the dataset, and development of the classification model in the SIS operational environment. R.D.: Contributed to the application of the case study method, formal analysis, review and editing, and funding acquisition. D.M.: Contributed to the conception of the study, supervision, conceptualization of the methodology, validation of the methodology in the case study, preprocessing design, algorithm selection, formal analysis, writing—original draft and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Universidad Nacional Mayor de San Marcos (RR N° 005446-2025, project number C25202801).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

In accordance with international ethical guidelines for research involving anonymized secondary data, informed consent was not required. The study involved no direct interaction with patients, no intervention, and posed no risk to individual subjects.

Data Availability Statement

The data used for this case study, specifically the claims database of the Seguro Integral de Salud of Peru, are not publicly available due to the internal policies of the public institution, which prefers to keep these records confidential. However, the methodology detailed in the manuscript describes the entire process, and all results are based exclusively on the internal processing of these data by the collaborating institution. The data presented in this study are available on request from the corresponding author.

Acknowledgments

We wish to express our sincere gratitude to the Seguro Integral de Salud of Peru for its invaluable collaboration in providing data for the case study and for its commitment to applying the PDHIF methodology. This research would not have been possible without the institution’s support and interest in implementing innovative solutions for fraud detection in the public health system.

Conflicts of Interest

J.V.-O., J.N.A., W.C.M., and L.N.Q.A. worked at the Seguro Integral de Salud (SIS). The other authors declare that they have no conflicts of interest. This project is the result of a collaboration between the SIS and the Universidad Nacional Mayor de San Marcos (UNMSM).

Abbreviations

The following abbreviations are used in this manuscript:
HIF: Health insurance fraud
SML: Supervised machine learning
PDHIF: Phases for Detecting Health Insurance Fraud
CRISP-DM: Cross-industry standard process for data mining
ICD-10: International Classification of Diseases, 10th Revision
CPT: Current procedural terminology
SMOTE: Synthetic minority oversampling technique
FUA: Single care form
PSA: Automatic supervision
SME: Electronic medical supervision
PCPP: Post-appointment onsite control

References

  1. Joudaki, H.; Rashidian, A.; Minaei-Bidgoli, B.; Mahmoodi, M.; Geraili, B.; Nasiri, M.; Arab, M. Improving fraud and abuse detection in general physician claims: A data mining study. Int. J. Health Policy Manag. 2016, 5, 165–172. [Google Scholar] [CrossRef]
  2. U.S. Department of Justice. Justice Manual, Criminal Resource Manual 976: Health Care Fraud—Generally. Available online: https://www.justice.gov/archives/jm/criminal-resource-manual-976-health-care-fraud-generally (accessed on 6 April 2025).
  3. Shrank, W.H.; Rogstad, T.L.; Parekh, N. Waste in the US Health Care System: Estimated Costs and Potential for Savings. JAMA 2019, 322, 1501. [Google Scholar] [CrossRef]
  4. Kose, I.; Gokturk, M.; Kilic, K. An interactive machine-learning-based electronic fraud and abuse detection system in healthcare insurance. Appl. Soft Comput. J. 2015, 36, 283–299. [Google Scholar] [CrossRef]
  5. Kirlidog, M.; Asuk, C. A fraud detection approach with data mining in health insurance. Procedia Soc. Behav. Sci. 2012, 62, 989–994. [Google Scholar] [CrossRef]
  6. Shin, H.; Park, H.; Lee, J.; Jhee, W.C. A Scoring Model to Detect Abusive Billing Patterns in Health Insurance Claims. Expert. Syst. Appl. 2012, 39, 7441–7450. [Google Scholar] [CrossRef]
  7. Ahmed, M.; Ahamad, M.; Jaiswal, T. Augmenting Security and Accountability Within the eHealth Exchange. IBM J. Res. Dev. 2014, 58, 8. [Google Scholar] [CrossRef]
  8. Phua, C.; Alahakoon, D.; Lee, V. Minority report in fraud detection: Classification of skewed data. Acm Sigkdd Explor. Newsl. 2004, 6, 50–59. [Google Scholar] [CrossRef]
  9. Travaille, P.; Müller, R.M.; Thornton, D.; Hillegersberg, J. Electronic fraud detection in the US medicaid healthcare program: Lessons learned from other industries. In Proceedings of the 17th AMCIS 2011, Detroit, MI, USA, 4–8 August 2011; Available online: http://doc.utwente.nl/78000/ (accessed on 15 October 2016).
  10. Villegas-Ortega, J.; Bellido-Boza, L.; Mauricio, D. Fourteen years of manifestations and factors of health insurance fraud, 2006–2020: A scoping review. Health Justice 2021, 9, 26. [Google Scholar] [CrossRef] [PubMed]
  11. Shimaoka, A.M.; Ferreira, R.C.; Goldman, A. The evolution of CRISP-DM for data science: Methods, processes and frameworks. SBC Rev. Comput. Sci. 2024, 4, 28–43. [Google Scholar] [CrossRef]
  12. Schröer, C.; Kruse, F.; Gómez, J.M. A systematic literature review on applying CRISP-DM process model. Procedia Comput. Sci. 2021, 181, 526–534. [Google Scholar] [CrossRef]
  13. Ortega, P.A.; Figueroa, C.J.; Ruz, G.A. A Medical Claim Fraud/Abuse Detection System based on Data Mining: A Case Study in Chile. DMIN 2006, 6, 26–29. [Google Scholar]
  14. Liou, F.-M.; Tang, Y.-C.; Chen, J.-Y. Detecting hospital fraud and claim abuse through diabetic outpatient services. Health Care Manag. Sci. 2008, 11, 353–358. [Google Scholar] [CrossRef]
  15. Mailloux, A.T.; Cummings, S.W.; Mugdh, M. A decision support tool for identifying abuse of controlled substances by forwardhealth medicaid members. J. Hosp. Mark. Public Relat. 2010, 20, 34–55. [Google Scholar] [CrossRef]
  16. Francis, C.; Pepper, N.; Strong, H. Using support vector machines to detect medical fraud and abuse. In Proceedings of the 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Boston, MA, USA, 30 August–3 September 2011; pp. 8291–8294. [Google Scholar]
  17. Bauder, R.A.; Khoshgoftaar, T.M.; Hasanin, T. Data Sampling Approaches with Severely Imbalanced Big Data for Medicare Fraud Detection. In Proceedings of the 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI), Volos, Greece, 5–7 November 2018; IEEE: Volos, Greece, 2018; pp. 137–142. [Google Scholar]
  18. Johnson, J.M.; Khoshgoftaar, T.M. Data-Centric AI for Healthcare Fraud Detection. SN Comput. Sci. 2023, 4, 389. [Google Scholar] [CrossRef]
  19. Hancock, J.T.; Bauder, R.A.; Wang, H.; Khoshgoftaar, T.M. Explainable machine learning models for Medicare fraud detection. J. Big Data 2023, 10, 154. [Google Scholar] [CrossRef]
  20. Nabrawi, E.; Alanazi, A. Fraud Detection in Healthcare Insurance Claims Using Machine Learning. Risks 2023, 11, 160. [Google Scholar] [CrossRef]
  21. Prova, N. Healthcare Fraud Detection Using Machine Learning. In Proceedings of the 2024 Second International Conference on Intelligent Cyber Physical Systems and Internet of Things (ICoICI), Coimbatore, India, 20–30 August 2024. [Google Scholar] [CrossRef]
  22. Bounab, R.; Zarour, K.; Guelib, B.; Khlifa, N. Enhancing Medicare Fraud Detection Through Machine Learning: Addressing Class Imbalance With SMOTE-ENN. IEEE Access 2024, 12, 54382–54396. [Google Scholar] [CrossRef]
  23. Mao, Y.; Li, Y.; Xu, B.; Han, J. XGAN: A Medical Insurance fraud Detector based on GAN with XGBoost. J. Inf. Hiding Multim. Signal Process 2024, 15, 36–52. [Google Scholar]
  24. Cressey, D.R. Other People’s Money; A Study of the Social Psychology of Embezzlement; The Free Press: Los Angeles, CA, USA, 1953. [Google Scholar]
  25. Wolfe, D.T.; Hermanson, D.R. The fraud diamond: Considering the four elements of fraud. CPA J. 2004, 74, 38–42. [Google Scholar]
  26. Kranacher, M.-J.; Riley, R. Forensic Accounting and Fraud Examination; John Wiley & Sons: Hoboken, NJ, USA, 2019; Available online: https://books.google.com/books?hl=es&lr=&id=GnOODwAAQBAJ&oi=fnd&pg=PR12&dq=Forensic+Accounting+and+Fraud+Examination,+&ots=PMN4s72CCa&sig=9obhV0dZZK1s4MkAvFO_a6fVdl4 (accessed on 14 April 2025).
  27. SIS. Boletín Estadístico 2024 del Seguro Integral de Salud (SIS). Available online: https://cdn.www.gob.pe/uploads/document/file/7530499/6401616-boletin-estadistico-2024.pdf?v=1737643909 (accessed on 21 June 2025).
  28. MEF. Consulta Amigable del Ministerio de Economía y Finanzas del Perú; MEF: Lima, Peru, 2023; Available online: https://apps5.mineco.gob.pe/transparencia/Navegador/default.aspx?y=2023&ap=ActProy (accessed on 14 May 2025).
  29. Espinoza Rivera, S. Estrategias Implementadas por el Seguro Integral de Salud y su Influencia en las Transferencias Financieras y su Ejecución por Parte de los Hospitales Nacionales e Institutos Especializados, Lima–Peru, 2009–2017. Master’s Thesis, Universidad Nacional Federico Villarreal, Lima, Peru, 2019. Available online: https://repositorio.unfv.edu.pe/handle/20.500.13084/3121 (accessed on 14 February 2025).
  30. Espinal Redondez, L.Á.; Ibáñez Alvarado, C.M.; Moyano Melo, M.A.J.A. Propuesta de un Modelo Predictivo para Realizar un Control y Supervisión más Eficiente de las Prestaciones de Servicios de Salud en una Aseguradora Pública de Salud. Master’s Thesis, Universidad Peruana de Ciencias Aplicadas (UPC), Lima, Peru, 2020. Available online: https://repositorioacademico.upc.edu.pe/handle/10757/652194 (accessed on 14 February 2025).
  31. Galagarza Ruíz, G.I. Validación Prestacional Oportuna de las Prestaciones del Servicio de Cuidados Intensivos de un Hospital nivel III-I Periodo 2012–2014. Master’s Thesis, Universidad de San Martín de Porres, Lima, Peru, 2015. [Google Scholar]
  32. Quispe Mamani, J.C.; Quilca Soto, Y.; Calcina Álvarez, D.A.; Yapuchura Saico, C.R.; Ulloa Gallardo, N.J.; Aguilar Pinto, S.L.; Quispe Quispe, B.; Quispe Maquera, N.B.; Cutipa Quilca, B.E. Moral Risk in the Behavior of Doctors of the Comprehensive Health Insurance in the Province of San Román, Puno-Peru, 2021. Front. Public Health 2022, 9, 799708. [Google Scholar] [CrossRef] [PubMed]
  33. Encuesta Perú 21—Ipsos Inseguridad Ciudadana en Perú. Available online: https://www.ipsos.com/es-pe/inseguridad-ciudadana-en-peru-encuesta-peru-21-ipsos-febrero-2025 (accessed on 8 May 2025).
  34. INEI. Perú Delincuencia y Corrupción son los Principales Problemas que Afectan al País. Available online: https://m.inei.gob.pe/prensa/noticias/delincuencia-y-corrupcion-son-los-principales-problemas-que-afectan-al-pais-9294/ (accessed on 8 May 2025).
  35. Algalobo Távara, B.P.; Espinoza Sánchez, N.A. La corrupción y su relación con los índices de pobreza extrema en el Perú. Rev. InveCom 2025, 5. Available online: https://ve.scielo.org/scielo.php?script=sci_arttext&pid=S2739-00632025000102086 (accessed on 8 May 2025). [CrossRef]
  36. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  37. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar] [CrossRef]
  38. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Diagram of the elements of health insurance fraud.
Figure 2. Diagram of the 6-phase PDHIF methodology.
Figure 3. Confusion matrix: Scenario 1 ((a): RF, (b): XGB, (c): MLP). Scenario 2 ((d): RF, (e): XGB, (f): MLP).
Figure 4. Loss function: Scenario 1 ((a): XGB; (c): MLP). Scenario 2 ((e): XGB; (g): MLP). Precision function: Scenario 1 ((b): XGB; (d): MLP). Scenario 2 ((f): XGB; (h): MLP).
Table 1. Summary of previous studies on HIF detection based on SML.
Author | Country / Database (Size) | Algorithm | Balancing Method | Accuracy | Precision | AUC | Recall | Specificity | F1 Score | DM Methodology
[13] | Chile / Banmédica SA, claims (500,000), abusive (169) | ANN–MLP | NR | NR | NR | 82.0% | 73.4% | NR | 91.0% | Ad hoc
[14] | Taiwan / NHI (1,050,979 patients and 17,668 providers) | LR | NR | 84.6% | NR | Ad hoc
[14] (cont.) | ANN | 91.5%
[14] (cont.) | DT | 98.7%
[15] | US / Medicaid Wisconsin (medication for 190 beneficiaries) | DT-CHAID | NR | 95.3% | 91.9% | NR | 87.2% | 96.5% | NR | Ad hoc
[16] | US / Medicare and Medicaid (182,809 invoices) | SVM | NR | 99.0% | NR | NR | NR | NR | NR | Ad hoc
[6] | Republic of Korea / HIRA (3075 internal medicine providers) | DT-CDA score | NR | Ad hoc
[17] | US / Medicare Part A, B, D (759,267 nonfraud, 473 fraud) | LR | SMOTE | NR | 82.8% | NR | Ad hoc
[17] (cont.) | RF | RUS | 82.8%
[17] (cont.) | XGB | RUS | 81.7%
[18] | US / Nine datasets from Medicare Part B, Part D, DMEPOS | XGB | NR | 95.4% | 84.9% | 95.7% | NR | CRISP-DM
[18] (cont.) | RF | 87.2% | NR
[18] (cont.) | XGB | 95.9% | 86.2% | 96.8%
[18] (cont.) | RF | 80.2% | NR
[18] (cont.) | XGB | 95.0% | 85.5% | 96.9%
[18] (cont.) | RF | 83.8% | NR
[19] | US / Medicare Part D (5,344,106 instances, 0.07% fraud) | CATB | NR | 95.7% | NR | Ad hoc
[19] (cont.) | XGB | 95.6%
[19] (cont.) | RF | 86.4%
[19] (cont.) | LGBM | 84.8%
[19] (cont.) | ET | 84.1%
[19] (cont.) | LR | 83.5%
[19] | US / Medicare Part B (8,669,497 instances, 0.05% fraud) | CATB | NR | 95.9% | NR
[19] (cont.) | XGB | 94.7%
[19] (cont.) | RF | 84.6%
[19] (cont.) | LGBM | 85.0%
[19] (cont.) | ET | 86.1%
[19] (cont.) | LR | 91.4%
[20] | Saudi Arabia / Three providers from 2022 | RF | SMOTE | 98.2% | 98.1% | 90.0% | 100.0% | 80.0% | 99.0% | Ad hoc
[20] (cont.) | LR | SMOTE | 80.4% | 97.6% | 80.2% | 80.4% | 80.0% | 88.2%
[20] (cont.) | ANN | SMOTE | 94.6% | 98.0% | 88.0% | 96.1% | 80.0% | 97.0%
[21] | US / Exclusive dataset (558,000) | RF | NR | 92.4% | 95.6% | 90.7% | 83.8% | NR | 89.3% | Ad hoc
[21] (cont.) | XGB | 91.7% | 97.4% | 96.2% | 80.4% | 88.1%
[21] (cont.) | SVM | 81.8% | 81.6% | 87.6% | 67.1% | 73.7%
[21] (cont.) | IF | 62.6% | 51.5% | 40.6% | 21.9% | 30.8%
[21] (cont.) | DLM | 91.7% | 94.8% | 90.9% | 65.1% | 77.2%
[21] (cont.) | SEM | 92.8% | 93.6% | 97.0% | 86.9% | 90.2%
[22] | US / Medicare Part B (9,449,361) | LR | SMOTE-ENN and cross-validation | 65.0% | 69.0% | 73.0% | 67.0% | NR | 65.0% | Ad hoc
[22] (cont.) | DT | 100.0% | 99.0% | 95.0% | 100.0% | 100.0%
[22] (cont.) | RF | 95.0% | 95.0% | 99.0% | 95.0% | 95.0%
[22] (cont.) | XGB | 96.0% | 96.0% | 99.0% | 96.0% | 96.0%
[22] (cont.) | AdaGB | 65.0% | 70.0% | 68.0% | 67.0% | 64.0%
[22] (cont.) | LGBM | 91.0% | 90.0% | 97.0% | 91.0% | 91.0%
[23] | China / Tianchi Precision Social Security Competition: real health insurance data (NR) | LR | D.Samp. | NR | 95.6% | NR | 73.0% | NR | 82.8% | Ad hoc
[23] (cont.) | XGB | D.Samp. | 81.1% | 86.9% | 83.9%
[23] (cont.) | LR | SMOTE | 95.8% | 78.3% | 86.2%
[23] (cont.) | XGB | SMOTE | 91.2% | 90.3% | 90.8%
[23] (cont.) | LR | BSMOTE | 95.6% | 80.5% | 87.4%
[23] (cont.) | XGB | BSMOTE | 92.6% | 86.9% | 89.6%
[23] (cont.) | LR | GAN | 93.8% | 72.8% | 81.9%
[23] (cont.) | XGB | GAN | 94.4% | 96.4% | 95.4%
NR: Not reported. NHI: National Health Insurance. DM: Data mining. LR: Logistic regression. DT: decision tree. RF: random forest. IF: isolation forest. DLM: Deep learning Model. SVM: Support Vector Machine. SEM: stacking ensemble model. XGB: extreme gradient boosting. CATB: Categorical boosting. AdaGB: Adaptive boosting. LGBM: Light gradient boosting machine. ANN: artificial neural network. MLP: Multilayer perceptron. CHAID: Chi-square automatic interaction detection. HIRA: Health Insurance Review Agency. BSMOTE: Borderline synthetic minority-oversampling technique. D.Samp: Downsampling is a technique for reducing the size of the majority class and balancing an unbalanced dataset.
Table 2. Detailed description of the phases of the PDHIF methodology.
Phase | Description
Phase 1. Identifying signs of fraud | First, we consider the theoretical framework. According to the definition of HIF [10], fraud cannot be asserted without verifying intent and the obtaining of illegal benefits; that is, “HIF is an act based on intentional deception or misrepresentation to obtain illegal benefits related to the coverage provided by health insurance.” Since intent cannot be determined at this stage, it is possible to analyze noncompliance with the regulatory framework and local regulations, which could generate suspicion; this suspicion could correspond to potential fraud. Second, we consider the local socioeconomic dynamics specific to the health sector and their direct influence on HIF. To do this, we must analyze the elements defined in fraud theory, using the Fraud Triangle [24], the Fraud Diamond [25], MICE (money, ideology, coercion, ego) [26], and COSO ERM, among others †.
Phase 2. Identifying available manifestations and factors | The manifestations of HIF are associated with the types of actors involved ‡, considering the study by Villegas-Ortega et al. (2021) [10] and the legal framework. This delimitation facilitates a more precise and operational analysis, focused on evidence rather than empirical or abstract categories, identifying concrete patterns according to the actor’s profile: insurance companies (2 possible manifestations), policyholders (7), or medical providers (12). Similarly, the 47 HIF factors § [10] available in transactional (claims, invoices) and/or nontransactional (regulatory reports, surveys) data are identified. This phase ensures the coverage of the factors (variables) associated with HIF in the case study.
Phase 3. Data preprocessing and balancing | Data are prepared for analysis through a process that includes cleaning (duplicate removal, error correction, and handling of missing values), transformation (categorical variable coding and format standardization), and normalization (data scaling to ensure comparability). Additionally, transactions are labeled as fraudulent or legitimate, and the compatibility of clinical standards (ICD-10, CPT) is validated across heterogeneous systems. Because HIF datasets are often unbalanced—with a small proportion of fraudulent cases compared to legitimate ones—traditional techniques such as random oversampling or undersampling are avoided due to their risk of overfitting or loss of information. Instead, advanced methods such as SMOTE or others are applied, combined with other high-dimensionality techniques, allowing for category balancing and optimizing the model’s predictive capacity.
Phase 4. Model development, training, and evaluation | In this phase, the model is trained using SML, with a stratified division of the collected data (Phase 3) into training, validation, and test sets. The process includes cross-validation to ensure robustness and hyperparameter fitting, optimizing metrics for unbalanced problems such as AUC-ROC (to assess discriminatory capacity between classes), recall (maximizing the detection of true positives), and F1 score (balancing accuracy and completeness). The final evaluation is performed with the test set, ensuring that the reported metrics reflect real-world performance with unseen data, thus validating the model’s effectiveness in identifying fraud patterns under real-world operating conditions.
Phase 5. Deployment and operational integration | The model is implemented in a production environment using a scalable architecture that supports both real-time and batch processing, ensuring its integration with existing systems. To maximize its operational utility, automated workflows are established in which transactions classified as suspicious are routed to the specialized investigation team (auditors, tax officials, and SML specialists) to confirm fraud, prioritizing cases with a higher probability of fraud. In addition, security protocols (data encryption, access control) are implemented, and best practices for fraud detection are established.
Phase 6. Adaptive monitoring and evolution | A continuous monitoring system for the SML detection model’s performance is implemented, with automated alerts on key metrics (F1 score, recall, false positive rate) to ensure the model remains current in identifying new fraud patterns. To this end, periodic model retraining cycles are performed, incorporating new factors or manifestations based on regulatory changes, as well as false positive/negative reports from the research team. The integration of these components guarantees that the system evolves in tandem with fraudulent tactics and regulatory frameworks, preserving its accuracy in dynamic healthcare environments.
ICD-10: International Classification of Diseases, 10th Revision; CPT: current procedural terminology; SMOTE: synthetic minority oversampling technique. † The Fraud Triangle [24] examines three pillars: pressure (e.g., the economic need of patients or providers), opportunity (weak controls in billing or medical records), and rationalization (perception of impunity or ethical justifications); the Fraud Diamond [25] adds a fourth element: capability (technical skills to evade systems, such as mastery of diagnostic codes); MICE classifies motivations as money, ideology, coercion, or ego (such as physicians who falsify diagnoses out of “altruism”); and COSO ERM (2017) provides tools for designing proactive controls (e.g., detection algorithms). ‡ INSURANCE COMPANY: M1. Falsifying benefit or service statements; M2. Falsifying reimbursements. INSURED: M3. Using an incorrect diagnosis; M4. Document manipulation; M5. Billing for services not provided; M6. Opportunistic fraud; M7. Identity fraud; M8. Doctor shopping; M9. Misrepresenting eligibility. HEALTH CARE PROVIDER: M10. Upcoding; M11. Unperformed services; M12. Falsifying the diagnosis or treatment; M13. Soliciting, offering, or receiving a kickback; M14. Unbundling or exploding charges; M15. Falsifying medical documents or records; M16. Unjustified services/overutilization; M17. Opportunistic fraud; M18. Repeat billing for the same service provided; M19. Readmission/admission; M20. Type of room charge; M21. Services canceled but containing discounts and professional courtesies. § MACRO FACTORS: F1. Regulations and norms; F2. Socioeconomic and political conditions; F3. Infrastructure; F4. Culture; F5. Complexity of health systems; F6. Geography. MESO FACTORS: F7. Characteristics of the provider; F8. Provider responsibility; F9. Measures of the administrative authority; F10. Internal disciplinary mechanisms; F11. Payment methods and contracts; F12. Medical records; F13. Auditing, supervision, sanctions, and control; F14. Quality and performance evaluation system; F15. Reputation; F16. Commercial implications; F17. Claims management and policy; F18. Reimbursement processes; F19. Employability; F20. Patient identification mechanisms; F21. Types of professionals. MICRO FACTORS: F22. Gender; F23. Age; F24. Predominant race; F25. Marital status; F26. Place of residence; F27. Insurance status; F28. Predominant language; F29. Diagnoses; F30. Medical and surgical treatments; F31. Specialties; F32. Medications; F33. Chronic condition; F34. Risk of illness; F35. Ethics and morals; F36. Perception of inequity and injustice; F37. Information asymmetry; F38. The adjusters’ decision; F39. Capacity building; F40. Deductibles/coinsurance; F41. Economic situation. COLLABORATIVE FACTORS: F42. Professional and patient; F43. Provider and insurer; F44. Consumer and provider; F45. Consumer and insurer; F46. Bosses and employed; F47. Guanxi, a Chinese term that describes the basic dynamics of networks of contacts and personal influence.
Table 3. (a) Selection of provider manifestations. (b) Factors and variables available for the case study.
(a)
Manifestation | Justification | Result
M10 | This involves billing for services under diagnostic and procedure codes that are more expensive than those actually performed. The PSA detects this through automated validation rules, and medical appropriateness is determined in the SME; both are linked to the FUA. | ELIGIBLE
M11 | The SME identifies cases by detecting FUAs without support in clinical or surgical reports, medical signatures, or laboratory results; if such a situation is found, the FUA is marked as a charge for procedures never performed. | ELIGIBLE
M12 | The PCPP assesses the consistency between the diagnosis and the reported treatment. These irregularities are marked in the FUAs as observations, which may indicate fraud, making this manifestation a candidate for SML-based detection. | ELIGIBLE
M13 | None of the SIS audit processes identifies providers receiving possible bribes or illegal commissions, nor are such cases marked in the FUAs, because illegitimate payments cannot be identified; this manifestation is therefore not selected. | INELIGIBLE
M14 | Through automatic consistency rules, the PSA blocks these cases by detecting repetitive charges for separate procedures; this is complemented by the SME, which analyzes specialized clinical cases, so records marked with such observations exist, justifying its inclusion. | ELIGIBLE
M15 | The PCPP verifies inconsistencies in dates, seals, or protocols by comparing clinical records with validated standards; however, it does not record the falsification of certificates or medical records, or the alteration of documents to justify payments. | INELIGIBLE
M16 | The SME detects anomalies such as multiple diagnostic support procedures in short periods, cross-referencing data with medical records to validate their justification, marking the FUAs with unjustified services or overutilization. | ELIGIBLE
M17 | The PSA, with automated validation, identifies atypical claims (e.g., oxygen registered in liters instead of cubic meters) by comparison with historical profiles of patients and providers; however, the FUAs cannot guarantee that a case is opportunistic fraud. | INELIGIBLE
M18 | The PSA applies consistency rules, blocking duplicate FUAs for the same patient during overlapping periods, thus validating the uniqueness of procedures. However, no FUA records are flagged. | INELIGIBLE
M19 | None of the SIS audit processes analyzes readmissions, unnecessary admissions, or repeated hospitalizations without clinical improvement. | INELIGIBLE
M20 | The SIS does not monitor charges for unused room types or billing for stays in unassigned rooms. | INELIGIBLE
M21 | The SIS does not foresee canceled services with discounts in its audit processes. | INELIGIBLE
(b)
| Category | SML Factor | Available Variable | Type | Selected |
| Key identifiers | | V01. Anonymized FUA number | Alphanumeric | Yes |
| | | V02. FUA identifier | Numeric | No |
| | | V03. Year of production | Numeric | No |
| | | V04. Production month | Alphanumeric | No |
| Demographic data | F22. Gender | V05. Patient’s sex | Numeric | Yes |
| | F23. Age | V06. Patient’s age | Numeric | Yes |
| Supplier details | | V07. SIS Macro-Regional Management | Numeric | No |
| | | V08. UDR | Numeric | No |
| | | V09. Department | Numeric | No |
| | | V10. Implementing unit | Numeric | No |
| | F7. Supplier details | V11. Supplier code | Alphanumeric | Yes |
| | | V12. Supplier category | Alphanumeric | No |
| Service details | | V13. Service provision | Alphanumeric | Yes |
| | | V14. Destination at discharge | Numeric | Yes |
| | | V15. Date of service | Temporal | No |
| | | V16. Hospitalization start date | Temporal | No |
| | | V17. Date of hospital discharge | Temporal | No |
| | F31. Specialty | V18. Type of care | Numeric | Yes ² |
| | | V19. Hospital stay | Numeric | Yes ² |
| | F29. Diagnosis; F33. Chronic health condition | V20. ICD-10 primary diagnosis | Alphanumeric | No |
| | | V21. Diagnostic profile (ICD-10) | Numeric | Yes ² |
| | | V22. Number of diagnoses | Numeric | Yes ² |
| | | V23. Healthcare personnel | Alphanumeric | Yes |
| Economic variables | F18. Refund processes and billing features | V24. Gross valuation of the FUA | Numeric | No |
| | | V25. Gross value of procedures | Numeric | Yes |
| | | V26. Gross value of medicines | Numeric | Yes |
| | | V27. Gross value of inputs | Numeric | Yes |
| Consumption | F30. Medical and surgical treatments | V28. Procedure consumption profile | Numeric | Yes ² |
| | | V29. Number of FUA procedures | Numeric | Yes ² |
| | | V30. Input consumption profile | Numeric | Yes ² |
| | | V31. Number of FUA inputs | Numeric | Yes ² |
| | F32. Medicine | V32. Medication consumption profile | Numeric | Yes ² |
| | | V33. FUA drug number | Numeric | Yes ² |
| | | V34. Type of consumption (FUA details) | Alphanumeric | No |
| | | V35. CPMS/SISMED code (FUA details) | Alphanumeric | No |
| | | V36. Amount delivered (FUA details) | Numeric | No |
| | F13. Audit, supervision, sanction, and control | V37. Marking for PCPP in SME | Numeric | Yes |
| | | V38. Number of FUA rules observed in PSA | Numeric | Yes |
| | | V39. PSA result | Numeric | No |
| | | V40. SME result | Numeric | No |
| | | V41. PCPP result | Numeric | No |
| | | V42. Suspicion marking | Numeric | Yes ² |
| Total selected | | | | Yes = 23 |
ICD-10: International Statistical Classification of Diseases, 10th Revision; CPMS: Catalog of Medical and Health Procedures in the health sector, based on the international Current Procedural Terminology; SISMED: Integrated Supply System for Pharmaceutical Products, Medical Devices, and Health Products; UDR: regional decentralized unit; Yes: the variable is considered; No: the variable is not considered; V07 and V12 are used to filter each dataset. ²: newly calculated variables that replace and synthesize others, defined as follows:
V18 (type of care): 1 (hospital care) if V16 and V17 contain information; otherwise 0 (outpatient).
V19 (hospital stay): the difference in days between V17 and V16.
V21 (diagnostic profile): all ICD-10 diagnoses of the FUA, concatenated and ordered (Dx1 + Dx2 + … + Dxn); because the resulting profiles have high cardinality, each is encoded by its frequency.
V22: the number of distinct diagnoses composing V21.
V28 (procedure consumption profile): all procedures registered with CPMS codes in the FUA where V34 = ‘P’, concatenated and ordered as [(V35 Proc1 + V36 Qty1) + (V35 Proc2 + V36 Qty2) + … + (V35 Procn + V36 Qtyn)]; the resulting profiles are frequency-encoded owing to their high cardinality.
V29: the number of unique CPMS codes recorded in V35.
V30 (input consumption profile): all inputs registered with SISMED codes in the FUA where V34 = ‘I’, concatenated and ordered analogously; frequency-encoded owing to high cardinality.
V31: the number of unique SISMED input codes recorded in V35.
V32 (medication consumption profile): all medications registered with SISMED codes in the FUA where V34 = ‘M’, concatenated and ordered analogously; frequency-encoded owing to high cardinality.
V33: the number of unique SISMED medication codes recorded in V35.
V42 (suspicion marking): the outcome variable, constructed from the evaluation of the audit processes PSA, SME, and PCPP.
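For illustration, the derived variables above can be computed at the claim (FUA) level with a few grouped aggregations. The following pandas sketch shows one way to build V18, V19, V21/V22, and the frequency-encoded consumption profiles; all column names (fua_id, hosp_start, hosp_end, icd10_dx, consumption_type, item_code, qty) are hypothetical placeholders, not the actual SIS schema.

```python
import pandas as pd

# Sketch of the derived variables in Table 3(b). All column names are
# hypothetical placeholders for the FUA header and detail schemas.

def frequency_encode(s: pd.Series) -> pd.Series:
    """Replace a high-cardinality profile string by its frequency."""
    return s.map(s.value_counts())

def engineer_features(fua: pd.DataFrame, details: pd.DataFrame) -> pd.DataFrame:
    out = fua.copy()
    # V18: 1 (hospital care) if admission and discharge dates exist, else 0.
    out["v18_type_of_care"] = (out["hosp_start"].notna()
                               & out["hosp_end"].notna()).astype(int)
    # V19: hospital stay in days (0 for outpatient claims).
    out["v19_hospital_stay"] = ((out["hosp_end"] - out["hosp_start"])
                                .dt.days.fillna(0).astype(int))
    # V21/V22: ordered concatenation of ICD-10 codes and distinct count.
    dx = details.groupby("fua_id")["icd10_dx"].agg(
        v21_dx_profile=lambda s: "|".join(sorted(set(s))),
        v22_n_dx="nunique").reset_index()
    out = out.merge(dx, on="fua_id", how="left")
    out["v21_dx_profile"] = frequency_encode(out["v21_dx_profile"])
    # V28-V33: per-type profiles ('P' procedures, 'I' inputs, 'M' drugs)
    # built as ordered code:quantity strings, then frequency-encoded.
    for kind, name in [("P", "v28_proc"), ("I", "v30_input"), ("M", "v32_med")]:
        rows = details[details["consumption_type"] == kind]
        prof = (rows.groupby("fua_id")
                    .apply(lambda g: "|".join(sorted(
                        f"{c}:{q}" for c, q in zip(g["item_code"], g["qty"]))))
                    .rename(f"{name}_profile").reset_index())
        n_codes = (rows.groupby("fua_id")["item_code"].nunique()
                       .rename(f"{name}_count").reset_index())
        out = (out.merge(prof, on="fua_id", how="left")
                  .merge(n_codes, on="fua_id", how="left"))
        out[f"{name}_profile"] = frequency_encode(out[f"{name}_profile"])
    return out
```

Frequency encoding keeps each high-cardinality profile as a single numeric column, avoiding the feature explosion that one-hot encoding would cause over millions of distinct profiles.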
Table 4. Original dataset features and scenarios.
DATASET: RLIMA-CE
| Observations and Variables | Original Dataset | Scenario 1: Preprocessed, Unbalanced | Scenario 2: Preprocessed, Balanced |
| Variables | 42 | 23 | 23 |
| Records | 8,453,846 (100.0%) | 8,453,846 (100.0%) | 16,648,713 (100.0%) |
| Records with suspicions | 129,490 (1.5%) | 129,490 (1.5%) | 8,324,356 (50.0%) |
| Legitimate records | 8,324,356 (98.5%) | 8,324,356 (98.5%) | 8,324,357 (50.0%) |
RLIMA-CE: includes public tertiary care hospitals in Central Metropolitan Lima and East Metropolitan Lima, reserved for the treatment of rare conditions and complex pathologies that require specialized, high-tech procedures.
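The Scenario 2 training set was balanced with K-means SMOTE. The snippet below is a minimal sketch using the imbalanced-learn library; the synthetic data, the 80/20 split, and the cluster_balance_threshold setting are illustrative assumptions that merely mimic the 1.5% suspicion prevalence of Table 4, not the study's exact pipeline.

```python
from collections import Counter

from imblearn.over_sampling import KMeansSMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed claims matrix: 22 predictors
# and ~1.5% positives, mimicking the prevalence reported in Table 4.
X, y = make_classification(n_samples=100_000, n_features=22,
                           weights=[0.985], random_state=42)

# Split first, then oversample only the training fold, so the test set
# keeps the real-world class distribution.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# sampling_strategy=1.0 targets a 50/50 class ratio, as in Scenario 2;
# cluster_balance_threshold is relaxed here so sparse minority clusters
# still qualify for oversampling (a sketch setting, not the authors').
sampler = KMeansSMOTE(sampling_strategy=1.0,
                      cluster_balance_threshold=0.01,
                      random_state=42)
X_bal, y_bal = sampler.fit_resample(X_train, y_train)

print(Counter(y_train))  # heavily unbalanced, as in Scenario 1
print(Counter(y_bal))    # approximately 50/50, as in Scenario 2
```

Balancing only the training fold is essential: oversampling before the split would leak synthetic neighbors of test records into training and inflate the reported metrics.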
Table 5. SML model parameters for the RLIMA-CE dataset.
| Scenario | RF Hyperparameters | XGB Hyperparameters | MLP Hyperparameters |
| Scenario 1 | max_depth: None; n_estimators: 200; min_samples_split: 2; min_samples_leaf: 1; max_features: ‘sqrt’ | max_depth: 7; n_estimators: 300; learning_rate: 0.1; subsample: 1.0; colsample_bytree: 1.0 | dropout: 0.2; units: 64; learning_rate: 0.001 |
| Scenario 2 | max_depth: None; n_estimators: 200; min_samples_split: 2; min_samples_leaf: 1; max_features: ‘sqrt’ | max_depth: 7; n_estimators: 300; learning_rate: 0.1; subsample: 0.8; colsample_bytree: 0.8 | dropout: 0.2; units: 64; learning_rate: 0.001 |
RF: random forest, XGB: extreme gradient boosting, and MLP: multilayer perceptron.
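For reference, the Table 5 configurations map directly onto standard library constructors. The sketch below uses scikit-learn, XGBoost, and Keras with the Scenario 2 settings; the single 64-unit hidden layer for the MLP is an assumption, since Table 5 fixes only the units, dropout, and learning rate, not the full architecture.

```python
from sklearn.ensemble import RandomForestClassifier
from tensorflow import keras
from xgboost import XGBClassifier

# Random forest with the Table 5 settings.
rf = RandomForestClassifier(n_estimators=200, max_depth=None,
                            min_samples_split=2, min_samples_leaf=1,
                            max_features="sqrt", n_jobs=-1, random_state=42)

# XGBoost with the Scenario 2 settings (subsample/colsample_bytree = 0.8).
xgb = XGBClassifier(n_estimators=300, max_depth=7, learning_rate=0.1,
                    subsample=0.8, colsample_bytree=0.8,
                    eval_metric="logloss", random_state=42)

def build_mlp(n_features: int) -> keras.Model:
    # One hidden layer of 64 units with 0.2 dropout is an assumption;
    # Table 5 does not specify the number of layers.
    model = keras.Sequential([
        keras.layers.Input(shape=(n_features,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```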
Table 6. Key results from the models in both scenarios, including the Matthews correlation coefficient (MCC).
| Metric | Scenario 1 (Unbalanced) | | | Scenario 2 (Balanced) | | |
| | RF | XGB | MLP | RF | XGB | MLP |
| Precision_test | 0.829 | 0.839 | 0.714 | 0.995 | 0.973 | 0.926 |
| Recall_test | 0.380 | 0.340 | 0.239 | 0.994 | 0.984 | 0.965 |
| F1_test | 0.521 | 0.484 | 0.358 | 0.994 | 0.978 | 0.945 |
| Accuracy_test | 0.989 | 0.989 | 0.987 | 0.994 | 0.978 | 0.944 |
| MCC_test | 0.5569 | 0.5300 | 0.4079 | 0.9889 | 0.9562 | 0.8888 |
RF: random forest, XGB: extreme gradient boosting, and MLP: multilayer perceptron.
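The contrast between the near-perfect Scenario 1 accuracy and its weak F1 and MCC illustrates why these metrics must be reported together under a 98.5% legitimate-claim skew. All Table 6 metrics correspond to standard scikit-learn functions; the helper below (evaluate is a hypothetical name) sketches how to compute them for any fitted binary classifier.

```python
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             precision_score, recall_score)

def evaluate(model, X_test, y_test) -> dict:
    """Compute the Table 6 test-set metrics for a fitted binary classifier."""
    y_pred = model.predict(X_test)
    return {
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        # Accuracy alone is misleading at 1.5% prevalence: predicting
        # "legitimate" for every claim already scores 0.985.
        "accuracy": accuracy_score(y_test, y_pred),
        # MCC balances all four confusion-matrix cells, so it stays
        # informative under severe class imbalance.
        "mcc": matthews_corrcoef(y_test, y_pred),
    }
```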