  • Article
  • Open Access

5 October 2020

Medical Fraud and Abuse Detection System Based on Machine Learning

1 School of Management, Zhejiang University, Hangzhou 310058, China
2 School of Material Science and Engineering, Qingdao University of Science and Technology, Qingdao 266042, China
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Machine Learning and Analytics for Medical Care and Health Service

Abstract

It is estimated that approximately 10% of healthcare system expenditures are wasted due to medical fraud and abuse. In the medical domain, the combinations of thousands of drugs and diseases make the supervision of health care difficult. To quantify the disease–drug relationship as a relation score and perform anomaly detection based on this score together with other features, we proposed a neural network with fully connected layers and sparse convolution. We introduced a focal loss function to handle the data imbalance and a relative probability score to measure the model’s performance. Since our model performs much better than previous ones, it can substantially reduce analysts’ workload.

1. Introduction

In China, the health care market is worth more than 2 trillion yuan per year. Beyond the size of the market, China’s national health care security administration randomly inspected 197 thousand medical institutions in 2018 and found that nearly one third of them had health care violations [1]. It is estimated that approximately 10% of healthcare expenditure is wasted due to health care abuse or fraud, which makes this an essential issue for health care systems.
Inappropriate healthcare behavior generally includes system errors, medical abuse, and healthcare fraud [2]. With the development of health care settlement systems, the settlement process has become more reliable, but it still cannot prevent intentional deception.
Generally, medical abuse means that healthcare service providers offer unnecessary medical treatments or services to the patient in order to gain more profit or kickbacks. Healthcare fraud is an intentional deception intended to obtain unauthorized benefits [2]. It is usually carried out by a malicious “patient” or a group of malicious “patients” rather than by a medical service provider, which makes it more complex to supervise. Recently, it was reported that a family pretended to suffer from dozens of diseases and asked for hundreds of pills per day; it turned out that they had defrauded the healthcare fund of 400 thousand yuan in a single year [3]. Since healthcare fraud is more harmful to health insurance funds, the laws of most countries/regions define it as illegal. Both healthcare fraud and the milder medical abuse damage the health insurance system and ultimately harm social welfare.
Medical abuse and healthcare fraud behaviors differ only slightly from typical medical behaviors, so a professional data analyst can discriminate between abnormal suspects and regular records by reviewing records along multiple dimensions. However, with a participation rate of more than 95% and daily records exceeding 100 thousand, data analysts cannot comprehensively review all of the records [4].
With the help of machine learning, we can train a model to classify abnormal records by learning the characteristics of the samples. In order to find fraudulent behaviors, several main difficulties have to be dealt with:
(1)
There is no exact rule that can clearly distinguish abnormal medical insurance transactions. Moreover, the number of abnormal records is tiny compared to the massive number of regular treatment records. For these two reasons, the relatively small set of labeled abnormal records limits the algorithm’s accuracy.
(2)
Due to the influence of various concomitant diseases, patient characteristics, doctor preferences, and other noise factors in medical treatment records, the situation is complicated, making anomalies challenging to find [5].
(3)
Because fraudsters often use multiple methods to conceal their fraudulent behaviors within enormous volumes of normal transaction data, traditional rule-based methods struggle to find fraudsters and cannot easily cover newly emerging fraud behaviors.
(4)
The medical insurance drug list and disease relations change frequently, which calls for timely updates to the anomaly detection logic. Additionally, owing to the massive amount of data, retraining or re-detection to update the anomaly detection system takes considerable time.
For the reasons listed above, the real-world healthcare scenario is so complex that many reasonable behaviors appear abnormal; hence, anomaly detection systems in the healthcare domain are known to be hard to develop and apply. To escape this dilemma, we used a machine learning method (a neural network) to detect medical fraud cases. The results show that our model can indeed significantly reduce the analysts’ workload.
In our work, we used neural networks to learn the combination of disease and prescription, which plays a significant role in medical abuse and fraudulent behavior. After feature engineering, we applied an outlier detection model to find suspicious anomalous records. Finally, medical data analysts re-checked those suspicious records and analyzed them. The experimental results show that our model can improve the discovery rate of abnormal health care behavior.

3. Results

Because no labeled dataset matching Zhejiang province’s health care system was available, we randomly sampled records from the database and labeled them, obtaining 100 known abuse/fraud records and 900 known regular records.
We first evaluated the traditional rule-and-sort method, which found only 24% of the abnormal records among its top 100. In contrast, we tested several outlier detection algorithms using the features generated in Figure 3; the results are shown in Table 4.
Table 4. Detection rate of different algorithms.
The traditional method mainly computes percentiles and other statistical factors, sorts the records by a final weighted-sum score, and takes the highest-scoring records as the most abnormal. The other algorithms were based on the normalized features generated as in Table 2 in the previous section. To simulate anomaly detection in the real world (where, due to the massive volume of data, analysts usually check only a fixed fraction of all samples), we used a metric called the detection rate (DR), which indicates the number of actual anomalous samples among the top 10% of all samples. K-means and DBSCAN, both based on the common Euclidean distance, achieved 35% and 33% detection rates, respectively. The Isolation Forest performed best among the top 100 records, reaching 47% accuracy in the top 100 scores. The Isolation Forest algorithm builds a number of random trees that recursively split the samples, so that anomalies are isolated with fewer splits than regular samples. Because of the few available samples, we set the number of estimators in the forest to 100 and the subsample size to 256, using all features with a random shuffle. We believe that the Isolation Forest is better at handling irrelevant attributes, which resulted in the highest score. As Figure 5 shows, the Isolation Forest based on the comprehensive features with disease–drug relation scores achieves a higher detection rate than the traditional method.
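For concreteness, the Isolation Forest configuration and the detection rate metric described above can be sketched roughly as follows. This is a minimal sketch, assuming scikit-learn’s IsolationForest and illustrative variable names X (normalized feature matrix) and y (1 = known abuse/fraud, 0 = regular); the parameters n_estimators = 100 and max_samples = 256 come from the text, while the DR helper reflects our reading of the top-10% metric rather than the authors’ exact code.

```python
# Minimal sketch of the Isolation Forest setup and the detection rate (DR) metric.
# `X` (normalized features) and `y` (1 = abuse/fraud, 0 = regular) are assumed to
# be preloaded; all names here are illustrative, not the authors' actual code.
import numpy as np
from sklearn.ensemble import IsolationForest

def detection_rate(y_true, anomaly_score, top_frac=0.10):
    """Share of the top `top_frac` highest-scoring samples that are true anomalies."""
    y_true = np.asarray(y_true)
    k = int(len(y_true) * top_frac)
    top_idx = np.argsort(anomaly_score)[::-1][:k]   # indices of the k most anomalous
    return y_true[top_idx].sum() / k

# Parameters from the text: 100 trees, subsamples of 256 records, all features.
iforest = IsolationForest(n_estimators=100, max_samples=256, random_state=0)
iforest.fit(X)

# score_samples returns higher values for normal points, so negate it to obtain a
# score where larger means more suspicious.
scores = -iforest.score_samples(X)
print("Detection rate (top 10%):", detection_rate(y, scores))
```

On the 1 k dataset described above, the top 10% corresponds to exactly 100 records, so this DR matches the “47 in the top 100” figure reported for the Isolation Forest.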
Figure 5. Ranks of known abnormal records based on (a) our model and (b) the traditional rule.
For a better test of our detection model, we upsampled the 1000 records with 10× SMOTE on both the normal and abnormal records, and tested K-means and Isolation Forest on this 10 k SMOTE dataset. The results are shown in Table 5:
Table 5. Detection rate of different algorithms on the SMOTE 10 k dataset.
On the SMOTE 10 k dataset, the Isolation Forest with the same parameters (number of estimators and subsample size) achieves a lower detection rate than on the original 1 k dataset. K-means, in contrast, scores higher, which may indicate that the distribution tended toward a global anomaly pattern.
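The 10× SMOTE upsampling used to build this test set can be reproduced along the following lines. This is a sketch assuming the imbalanced-learn package and the same illustrative X, y arrays as above; the per-class target counts follow the 900/100 split stated in the text, while the remaining SMOTE settings (k_neighbors, random seed) are our own assumptions.

```python
# Sketch of 10x SMOTE upsampling of both classes (900 normal -> 9000, 100 abnormal -> 1000).
# Assumes the imbalanced-learn package; settings other than the class targets are guesses.
from collections import Counter
from imblearn.over_sampling import SMOTE

targets = {0: 9000, 1: 1000}   # ten times the original 900 normal / 100 abnormal records
smote = SMOTE(sampling_strategy=targets, k_neighbors=5, random_state=0)
X_10k, y_10k = smote.fit_resample(X, y)

print(Counter(y_10k))          # expected: Counter({0: 9000, 1: 1000})
```

The K-means and Isolation Forest comparison in Table 5 can then be rerun on X_10k and y_10k with the same parameters as before.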
Among the first 100 records (those with the highest relative probability scores), our model detected 55 actual abnormal ones. This means that healthcare data analysts only need to check the top 100 records scored by our model to find these 55 abnormal behaviors; without our model, they would have to check 550 records or even more (depending on the abnormal rate) to find the same number. In conclusion, our method concentrates the abnormal records and helps save manual checking effort.
After applying the above experimental steps to the original 7.37 million anonymized healthcare records, we obtained suspected abnormal records at an abnormal rate of one in ten thousand. Since the full set of records was too numerous to examine, we sampled 500 consecutive records at every 20% of the data ordered by abnormal score and labeled them manually, as sketched below. The relationship between the abnormal score and the actual abnormal ratio is shown in Table 6; the proportion of abnormal behaviors is noticeably higher than in the original data.
Table 6. Abnormal score vs. actual abnormal ratio.
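The level-sampling step used to produce Table 6 can be sketched as follows, assuming a pandas DataFrame df with an abnormal_score column (names are illustrative) and interpreting “every 20% of the data” as a block of 500 consecutive records starting at each 0%, 20%, …, 100% position of the score-ordered data; that interpretation is ours, not stated explicitly in the paper.

```python
# Sketch of the spot-check sampling: order all records by abnormal score, then take
# 500 consecutive records at each 20% position of the ordering for manual labeling.
# `df` and the `abnormal_score` column name are illustrative assumptions.
import pandas as pd

def level_samples(df, score_col="abnormal_score", block=500, step=0.20):
    ranked = df.sort_values(score_col, ascending=False).reset_index(drop=True)
    n = len(ranked)
    blocks = []
    for i in range(int(round(1 / step)) + 1):          # positions 0%, 20%, ..., 100%
        start = min(int(i * step * n), n - block)      # clamp so the last block fits
        chunk = ranked.iloc[start:start + block].copy()
        chunk["level"] = f"{i * step:.0%}"
        blocks.append(chunk)
    return pd.concat(blocks, ignore_index=True)

# samples = level_samples(df)   # six blocks of 500 records for manual labeling
```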
Our anomaly detection system is not only effective but also efficient. The feature extraction module takes 25 min on a GTX 1080 Ti to train on 7 million treatment records, and retraining is needed only when there are large-scale changes to the drug/disease lists. Inference time scales linearly with the number of samples, about 1 min per 5 million records, and the outlier detection step is also linear in time.

4. Discussion

During the manual labeling process, we found that different data analysts have different notions of what counts as abnormal and are constrained by their own domain knowledge of diseases and prescriptions, which results in inconsistent labels.
Several patterns can be found in the final abnormal records:
1.
Drug dosage abuse
In some cases, a treatment record shows much higher drug dosages than usual; typically, the drug is produced by a local pharmaceutical factory and carries a high price.
2.
Duplicate test
A patient takes the same, generally unnecessary, test multiple times. For example, in one of the abnormal records, a patient was charged three times by a medical service provider for testing the antibody to hepatitis B surface antigen (HBsAb) in a single treatment.
3.
Unrelated drugs
The medical provider prescribes drugs unrelated to the diagnosis, which results in a low disease–drug relation score for the record. For example, a Chinese patent drug for trauma was prescribed to a patient with influenza.
4.
Unrelated service
Certain medical services appear frequently in health care abuse, such as medical massage, which is intended for pulled muscles but billed during cold treatments.
5.
Drugs with similar effects abuse
Some records show that excessive amounts and dosages of drugs with similar effects were used in a single treatment. This usually relates to abuse by medical providers.
6.
Excessive outpatient frequency
Some records show excessive outpatient visits. For example, one malicious user visited different hospitals twice a day on average, and each time asked the doctor for a pseudoephedrine hydrochloride prescription.
While reviewing the abnormal data, we found that records whose prescriptions include traditional Chinese patent medicines made by local pharmaceutical factories are more likely to be abusive. We also noticed that the abnormal rate in pharmacies and private hospitals is higher than that in public hospitals, which may indicate that these places need closer supervision.
In summary, these main kinds of abnormal records share a commonality: the total price of the treatment is higher than that of other records, on average about five times that of regular records (the overall average for a single outpatient treatment is 300 yuan). The most typical of these abnormal behaviors is drug dosage abuse, and the abused drug usually carries a high price, which is directly related to the profit or kickback.

5. Conclusions and Future Work

We presented a model that detects abnormal records in the healthcare domain, using outlier detection and an end-to-end multi-label prediction method for disease–prescription correlation scores. The most significant advantages of this model are that it requires only simple data types, is highly practical, and achieves better accuracy and recall than the traditional rule-based method, which reduces data analysts’ workload.
To improve the performance of the model, we restricted the range of the data by limiting the number of drugs and diseases to fewer than 1000. As a side effect of this accuracy improvement, the anomaly detection system covers only about 71% of the records. Due to the constraints of the unlabeled dataset and the massive amount of data, we could only verify sampled results, checking against our 1000-record dataset or running level-sample tests, which are not comprehensive enough to cover every disease and drug.
A further limitation of our work is that, due to the privacy and security issues involved, we were unable to obtain sufficient training data. The medical histories are incomplete, and the dataset contains serious noise even though we cleaned the data several times. The results therefore contain misjudgments, and the system can only serve as an assistant tool that makes data analysts more efficient. We need to devote more effort to obtaining access to the databases held by governments, hospitals, and related institutions, as such data plays a pivotal role in building a better model. On the other hand, to balance the skewed label classes, we used an adjustment factor to tune the downsampling and upsampling rates, which produced an acceptable result. However, more techniques remain to be explored to enhance performance, such as SMOTE, threshold moving, or transforming the supervised classification problem into an unsupervised anomaly detection one.
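As a reference for the class-imbalance handling discussed above, the focal loss mentioned in the abstract can be written in its standard binary/multi-label form (in the sense of Lin et al.) roughly as follows; the PyTorch framing and the alpha/gamma values are our assumptions and not necessarily the paper’s exact configuration.

```python
# Sketch of the standard focal loss for imbalanced labels; the framework (PyTorch),
# the sigmoid/multi-label form, and the alpha/gamma values are illustrative assumptions.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Down-weights easy (well-classified) examples so the rare abnormal class
    contributes relatively more to the gradient."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * bce).mean()

# Usage (illustrative): loss = focal_loss(model(x), labels.float())
```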

Author Contributions

Conceptualization, C.Z., X.X., and C.W.; data curation, C.W.; formal analysis, C.Z. and C.W.; investigation, C.Z.; methodology, C.Z. and C.W.; project administration, C.W.; writing—original draft, C.Z.; writing—review and editing, C.Z. and X.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Fundamental Research Funds for the Central Universities, the Artificial Intelligence Research Foundation of Baidu Inc., the Zhejiang University and Cybervein Joint Research Lab, the Zhejiang Natural Science Foundation (LY19F020051), the Program of ZJU and Tongdun Joint Research Lab, the CAS Earth Science Research Project (XDA19020104), and the National Natural Science Foundation of China (U19B2042).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Available online: http://www.nhsa.gov.cn/art/2019/6/30/art_7_1477.html (accessed on 22 September 2020).
  2. Rashidian, A.; Joudaki, H.; Vian, T. No Evidence of the Effect of the Interventions to Combat Health Care Fraud and Abuse: A Systematic Review of Literature. PLoS ONE 2012, 7, e41988.
  3. Available online: https://mp.weixin.qq.com/s/y0sQsg8p48CcwO2BfDaTuQ (accessed on 22 September 2020).
  4. Copeland, L.; Edberg, D.; Panorska, A.K.; Wendel, J. Applying business intelligence concepts to Medicaid claim fraud detection. J. Inf. Syst. Appl. Res. 2012, 5, 51.
  5. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly Detection: A Survey. ACM Comput. Surv. 2009, 41, 1–58.
  6. Kotsiantis, S.B.; Zaharakis, I.; Pintelas, P. Supervised machine learning: A review of classification techniques. Emerg. Artif. Intell. Appl. Comput. Eng. 2007, 160, 3–24.
  7. Hastie, T.; Tibshirani, R.; Friedman, J. Unsupervised learning. In The Elements of Statistical Learning; Springer: New York, NY, USA, 2009; pp. 485–585.
  8. Wang, R.; Nie, K.; Wang, T.; Yang, Y.; Long, B. Deep Learning for Anomaly Detection. In Proceedings of the 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; pp. 894–896.
  9. Liou, F.M.; Tang, Y.C.; Chen, J.Y. Detecting hospital fraud and claim abuse through diabetic outpatient services. Health Care Manag. Sci. 2008, 11, 353–358.
  10. Margret, J.J.; Sreenivasan, S. Implementation of Data mining in Medical fraud Detection. Int. J. Comput. Appl. 2013, 69, 142–146.
  11. Ekina, T.; Leva, F.; Ruggeri, F.; Soyer, R. Application of Bayesian methods in detection of healthcare fraud. Chem. Eng. Trans. 2013, 33, 81–89.
  12. Van Capelleveen, G.; Poel, M.; Mueller, R.M.; Thornton, D.; van Hillegersberg, J. Outlier detection in healthcare fraud: A case study in the Medicaid dental domain. Int. J. Account. Inf. Syst. 2016, 21, 18–31.
  13. Ikono, R.; Iroju, O.; Olaleke, J.; Oyegoke, T. Meta-Analysis of Fraud, Waste and Abuse Detection Methods in Healthcare. Niger. J. Technol. 2019, 38, 490.
  14. Ekin, T.; Lakomski, G.; Musal, R.M. An unsupervised Bayesian hierarchical method for medical fraud assessment. Stat. Anal. Data Min. 2019, 12, 116–124.
  15. Matloob, I.; Khan, S.; ur Rahman, H.; Hussain, F. Medical Health Benefit Management System for Real-Time Notification of Fraud Using Historical Medical Records. Appl. Sci. 2020, 10, 5144.
  16. Yang, Y.-Y.; Lin, Y.-A.; Chu, H.-M.; Lin, H.-T. Deep Learning with a Rethinking Structure for Multi-label Classification. In Proceedings of the Asian Conference on Machine Learning, Nagoya, Japan, 17–19 November 2019; Volume 101, pp. 125–140.
  17. Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation Forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 413–422.
