Applied Sciences
  • Article
  • Open Access

27 July 2020

Medical Health Benefit Management System for Real-Time Notification of Fraud Using Historical Medical Records

1 Department of Computer and Software Engineering, College of Electrical and Mechanical Engineering (CEME), National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan
2 Shifa International Hospital, Islamabad 44000, Pakistan
* Author to whom correspondence should be addressed.
This article belongs to the Section Computing and Artificial Intelligence

Abstract

This paper presents a novel framework for fraud detection in healthcare systems that self-learns from historical medical data. Historical medical records are required for training and testing machine learning models. The main problem faced by both private and government-supported health schemes is a rapid rise in the number of claims by beneficiaries, mostly based on fraudulent billing. Detecting fraudulent transactions in healthcare systems is a strenuous task because of the intricate relationships among dynamic elements, including doctors, patients and services. In light of these challenges, there is a need to develop intelligent fraud detection models that trace the loopholes in procedures which may lead to successful reimbursement of fraudulent medical bills. To address fraud in healthcare programs, we propose a framework based on three entities (patient, doctor, service). First, the framework computes association scores for the three elements of the healthcare ecosystem, namely patients, doctors and services, and filters out identified cases using these scores. Confidence values are then computed for each service in each specialty after G-means clustering of the transactional data, and rules are generated from these confidence values. The identified cases are then evaluated by the rule engine. The framework classifies cases as fraudulent based on the value of the similarity bit. The framework is validated on the transactional data of a local hospital’s employees, which includes many reported cases of fraudulent activity in addition to some introduced anomalies.

1. Introduction

‘Fraud’ and ‘abuse’ are the two terms generally used to identify the major medical reimbursement issues that defeat the ultimate objective of a valid claim. We divide healthcare frauds into two major categories, service_availing patterns and service_providing patterns; a fraud can occur in either. Figure 1 explains these two categories. The service_availing patterns capture all the services availed by patients, including duplication of services (not actually availed) or of claims against those services. In simple words, they cover misrepresentation of services (or products) for which bills are generated but which are not actually availed. For example, an insurance claim submitted by a patient can be inconsistent with his or her age or gender. A patient may also avail the same service again and again, or unusually rarely; in such cases, the frequency of the patient’s visits to hospitals or doctors is either quite high or quite low. The service_providing patterns refer to the misrepresentation of facts by doctors, pharmacies or hospitals. These service_providers may generate duplicate bills for the same service. Doctors or hospitals can prescribe unnecessary treatments or perform unnecessary procedures, pharmacies can charge patients twice for the same medicine, and providers may allow misuse of the medical card.
Figure 1. Two categories of healthcare fraud: fraud in service availing patterns (patients) and fraud in service providing patterns (doctors, pharmacies, hospitals).
Many companies maintain ‘Special Investigation Departments’ to control fraud and abuse in the reimbursement of medical bills, but this is not enough to fulfil the purpose. Such departments get guidelines from multiple sources and apply ‘Conventional Surveillance Techniques’ [1]. Whenever these departments detect fraudulent payments, they proceed to recover the funds and then try to introduce controls to avoid future occurrences of such misrepresented billings. Once a claimed case is identified as fraud or abuse, it can be recognized as an identified pattern. Such identified patterns are then used to adjust the billing policies of the existing system in order to prevent the recurrence of fraudulent activities. This approach, commonly known as ‘pay and chase’, is not an efficient way of detecting fraud, as it only generates extra expense [2]. It is of partial use against healthcare fraud because clinical practices and billing patterns vary widely owing to the complexity of healthcare services. For example, doctors’ fee structures can vary even though they work in the same specialized department. Many studies have demonstrated variations as high as 400 percent in the frequency of major procedures among different doctors of the same hospital. There are four categories of claim analysis. The first is claim-centric, which identifies whether the provided services are consistent with a patient’s age, gender and diagnosis. The second is member-centric, which identifies whether the provided services are consistent with the specialty of the doctor. The third is provider-centric, which identifies whether the claimed services are provided by the specific hospital. The last category is ‘network analysis’, which is based on the combination of the member-centric and provider-centric analyses [3].
Our research focuses precisely on claim-centric and member-centric analysis. In the recent past, several studies have proposed techniques for developing fraud detection systems. Many of these studies used payment-based analysis to detect fraud, and they use one of the healthcare elements to identify a single type of healthcare fraud. To the best of our knowledge, no one has considered the delivered or availed service as a separate element. In contrast, our proposed framework detects fraudulent activities using all three main elements, namely doctor, patient and service. The important part of the framework is the rule engine, which processes over five years of original transactional data of the employees of a local hospital to generate rules. Moreover, a self-learned fraud detection system detects patient-, doctor- and service-level frauds.
Fraud-related claims in healthcare are a source of burden and inconvenience to society as a whole. Healthcare fraud affects both public and private sector employers in the form of high cost overruns. There are many victims of healthcare fraud who are exploited through unnecessary treatments. In some cases, the patient’s data is compromised to generate fraudulent claims. It would be wrong to say that healthcare fraud is not a crime or that it has no victims. Accordingly, the detection of fraud in healthcare is currently an active topic of research. There is a need to cut down this increasing cost, as the victim of healthcare fraud is none other than the common man. In most countries, including Pakistan, the government has recently initiated medical support programs through several national-level initiatives. One of these initiatives is the establishment of the Prime Minister Task Force on IT and Telecom in 2018 to lay down the foundation of data standards and annotations for incorporating improved plans in healthcare service delivery to the common man. Our work is part of this program, proposing a framework that can be adopted for this national initiative. The major concern is to reduce or prevent the chance of fraudulent activities in such programs. This can only be achieved by implementing adaptive modelling techniques that detect fraud from healthcare data. For this purpose, we have used the last five years of insurance claim data of the employees of one of the largest and best-equipped hospitals of Pakistan. The hospital provided us with sufficient details to support this national-level objective. According to the provided statistics, thousands of patients visit this hospital each day, and it has 62 different specialties.

Research Contributions

In recent years, fraud detection in healthcare has received more attention, as people in developed countries consider that fraud increases overall expenditure and makes health insurance problematic for genuine claimants. In most developing countries, the government has started medical support programs, and if such programs fall victim to healthcare fraud there will be no support left for genuine patients. This paper presents a novel framework for fraud detection in healthcare which considers all three main elements, namely Patients (service consumers), Doctors (service_providers) and Services (lab tests and treatments). Our proposed framework provides the following significant contributions required in any healthcare fraud detection scheme.
  • The framework provides a self-learned knowledge base system built on five years of original transactional data of a local hospital.
  • The novel concept of generating association scores between doctors, patients and services is introduced. The association scores are computed from the frequency of visits between these elements and are then used to detect anomalies.
  • Another novel idea, generating confidence values for all services in each specialty of a local hospital, is introduced. As per domain knowledge, only a cardiologist can recommend an ECG, whereas in real life even an ENT specialist can recommend it; the framework therefore computes the confidence value that the ECG service has in the ENT specialty. Similarly, even a paediatrician can recommend a kidney ultrasound, and the framework computes the confidence value of the service named “kidney ultrasound” in the paediatrics specialty. There are many other such examples. The rule engine is generated from these confidence values.
  • Another contribution is that this work is part of the national medical support program. We consider a private hospital as our pilot project because, in our country, electronic health records are not well maintained in public sector hospitals owing to a lack of resources, whereas private sector hospitals use automated Electronic Health Record Systems and the availability of patient data from the private sector is also better. For this reason, we consider the transactional data of a private sector hospital. Public sector programs normally run in parallel with the private sector, but this research represents the private sector in the National Health Programs.

3. Material and Methods

3.1. Dataset Details

The analysis is conducted on five years [2013, 2014, 2015, 2016, 2017, 2018, 2019] of insurance claim transactional data from a local hospital. The claimants are hospital employees covered by insurance policies provided by the hospital management. Insurance policies are allocated to each employee based on designation. The size of the transactional data is shown in Table 2. The initial framework is proposed in [37].
Table 2. Attributes in Dataset.
The attributes that provide details about the availed and provided services are shown in Table 3.
Table 3. Each transaction’s Attributes in data set.
The framework involves an implementation of the three phases for detecting fraudulent activities:
  • Association scores generation and threshold application
  • Rule generation engine
  • Similarity Function
We have implemented the fraud detection system by incorporating the three phases listed above. Detecting fraud in healthcare data is essentially the identification of outliers in such records. In the first phase, we identify the “outlier” and “need to be investigated” cases. In the second phase, we implement the rule engine to further analyze the cases identified in the first phase. In the third phase, we check each current transaction against the generated rules. The proposed framework is depicted in Figure 2. The associations between doctors, patients and services are computed, and whenever a case of fraud is identified, the rating score of that element is reduced. The association scores are derived from the number of visits and give an in-depth understanding of the behaviour of each element.
Figure 2. Functionality of the proposed fraud detection model. First, association scores are computed among services, doctors and patients; then, based on these scores, cases are forwarded to the rule engine for further processing.

3.2. Association Scores Computation and Threshold Application

The three main elements of the proposed framework are patients, providers (doctors, pharmacies, hospitals) and services. These three elements are associated with each other, and we need to find the association score of each element with every other element. The association scores are computed from the frequency of visits or the frequency of prescriptions. For example, if a patient visits frequently to avail a specific service (e.g., X-rays, ECGs) and is prescribed X-rays again and again by one doctor, this is considered an outlier. We compute association scores based on the frequency of patient visits to the providers and services. The purpose of this step is to forward to the rule engine only those patient records that are identified as “outliers” or “need to be investigated”. We computed the association scores by using Equations (1)–(4).
  • Doctor association score $Y$: $i$ denotes the number of times patient $P_k$ is checked by doctor $D_j$, and $D_n$ represents the total number of patients checked by doctor $D_j$, as shown in Equation (1).
    $Y = i / D_n$    (1)
  • Patient with services association score $S_p$: $m$ denotes the number of patients who availed service $S_h$, and $S_x$ represents the number of times patient $P_k$ availed service $S_h$ (for all patients).
    $S_p = m / S_x$    (2)
  • Service with doctor association score $S_d$: $T$ denotes the number of times doctor $D_j$ prescribed service $S_h$, and $S_n$ represents the number of times all doctors prescribed service $S_h$ (for all services).
    $S_d = T / S_n$    (3)
  • Patient association score $F$: $G$ denotes the number of times doctor $D_j$ examined patient $P_k$, and $P_n$ represents the total number of visits of patient $P_k$.
    $F = G / P_n$    (4)
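The frequency-based scores above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the helper `association_score`, the record layouts and the sample identifiers are all hypothetical, and we assume that each denominator counts every visit or prescription for the element concerned, which keeps the scores between 0 and 1.

```python
from collections import Counter

def association_score(pairs, normalize_by):
    """Share of each (a, b) pair among all pairs sharing one of its items.

    pairs:        list of 2-tuples, one per visit or prescription
    normalize_by: 0 to divide by the total for the first item,
                  1 to divide by the total for the second item
    """
    pair_counts = Counter(pairs)
    totals = Counter(pair[normalize_by] for pair in pairs)
    return {pair: n / totals[pair[normalize_by]] for pair, n in pair_counts.items()}

# Hypothetical visit log: (patient_id, doctor_id), one entry per visit.
visits = [("P1", "D1"), ("P1", "D1"), ("P2", "D1"), ("P1", "D2")]

Y = association_score(visits, normalize_by=1)  # doctor score: share of the doctor's visits
F = association_score(visits, normalize_by=0)  # patient score: share of the patient's visits

# Hypothetical prescription log: (doctor_id, service_id), one entry per prescription.
prescriptions = [("D1", "ECG"), ("D1", "ECG"), ("D2", "ECG"), ("D2", "X-ray")]
S_d = association_score(prescriptions, normalize_by=1)  # share of the service's prescriptions
```

The patient-with-service score is left out of the sketch because its numerator and denominator are defined differently in the text.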
The association scores lie between 0 and 1. After computing the association scores, we calculate a threshold as the average of all the association scores for each provider, service or patient. All scores that are less than or equal to the average but greater than the minimum threshold are considered normal cases, whereas all scores that are greater than the average but less than the maximum threshold are considered “need to be investigated”. The minimum and maximum threshold values are set up to identify the outliers. The minimum threshold reflects the idea that anything that has happened just once is an anomaly: if a patient visits a provider only once, or a doctor prescribes a service just once to only one patient, that can be an anomaly. Thus, we have kept the minimum threshold at 0.011. Similarly, we have chosen the maximum threshold by considering that if, out of a total of 100 visits, a patient visits the same doctor more than 70 times, there could be an anomaly. That is why we have set the maximum threshold at 0.7. All association scores that are less than the minimum threshold or greater than the maximum threshold are identified as outliers. The flowchart of this first phase is shown in Figure 3. Patient association scores are denoted by $F$, doctor association scores by $Y$, and the association scores of services with respect to patients and doctors by $S_p$ and $S_d$, respectively. We set thresholds for all association scores as discussed above. A hash algorithm is applied for the de-identification of patient records.
The variables $Y$, $S_p$, $S_d$ and $F$ hold the four types of association scores defined above: doctor, patient with service, service with doctor, and patient. The threshold is computed separately for each type of association score. The variable $Z$ represents the function or container that holds the values for all four types of association scores after computation of the threshold values. We apply four checks, on $Y$, $S_p$, $S_d$ and $F$ separately, and based on these checks the “outlier” and “need to be investigated” cases are identified. The rating score is initially set to 100 for each element of the framework, and after the first phase it is updated according to the identified cases: each time an identified case is found, the rating of that particular element is decremented, as shown in Figure 3. The “need to be investigated” and “outlier” cases are analyzed in the second phase.
Figure 3. This figure depicts functionality of first phase. Association scores are computed among three elements.
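The thresholding of this first phase can be illustrated with a short sketch. The fixed thresholds 0.011 and 0.7, the average-based threshold and the initial rating of 100 come from the text. The function names and the sample scores are hypothetical, the treatment of scores exactly equal to a fixed threshold is our assumption, and so is the decrement of one rating point per identified case (it matches the case-study examples, where two cases lower a rating from 100 to 98).

```python
MIN_THRESHOLD = 0.011  # anything rarer than this is treated as an outlier
MAX_THRESHOLD = 0.7    # anything more frequent than this is treated as an outlier

def classify(score, average):
    """Label one association score against the fixed and average thresholds."""
    # Assumption: scores exactly equal to a fixed threshold are not outliers.
    if score < MIN_THRESHOLD or score > MAX_THRESHOLD:
        return "outlier"
    if score > average:
        return "need to be investigated"
    return "normal"

def first_phase(scores, initial_rating=100):
    """Label every score of one element and decrement its rating per identified case."""
    average = sum(scores.values()) / len(scores)
    labels = {key: classify(score, average) for key, score in scores.items()}
    identified = sum(1 for label in labels.values() if label != "normal")
    return labels, initial_rating - identified

# Hypothetical association scores of one doctor with four patients.
labels, rating = first_phase({"P1": 0.005, "P2": 0.05, "P3": 0.5, "P4": 0.9})
```

Only the cases labelled “outlier” or “need to be investigated” would be forwarded to the second phase.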

3.3. Rule Engine Generation

The second phase of the proposed framework generates rules for each specialty of the local hospital. As already mentioned, the proposed framework is validated on the original data of a local hospital, which has 62 specialties. The following two main tasks are executed in this phase:
  • We perform hashing on the patient data and assign separate identification numbers to every service, doctor and specialty.
  • We cluster the transactional data and generate association rules.
During cluster analysis, we found outliers within different clusters. For this purpose, after clustering the transactional data we applied the concepts of support and confidence to the generated clusters. We applied three different clustering algorithms to this transactional data: G-means, X-means and Fuzzy C-means. The G-means clustering algorithm is an extension of K-means. It tries to find subsets of the data that fit a Gaussian distribution: G-means executes K-means and increments the value of k hierarchically until the data assigned to each centroid are Gaussian. Research has identified G-means as an improved form of clustering that has provided intrusion detection with a high detection rate and a low false positive rate. This technique can approximate the number of clusters in the data and initialize the centroids, which results in fast convergence of the algorithm [38]. X-means [39] executes K-means multiple times, and during each run it takes local decisions on whether to split the current centroid or not; this splitting decision is taken by computing the Bayesian Information Criterion (BIC) [40]. We have compared the clusters generated by all three algorithms in Table 4. We took one cluster and computed its mean. Centroids are generated by Fuzzy C-means, G-means and X-means, and for each algorithm we pick the centroid that is closest to the computed mean. The actual center is the computed mean of the selected cluster, the computed center is the centroid computed by the algorithm, and the difference is obtained by subtracting the actual center from the algorithm-computed centroid. Based on our analysis, G-means clustering is found to be more efficient than the other two clustering techniques for this transactional data.
Table 4. Comparison of Clustering Algorithms.
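The comparison procedure behind Table 4 can be sketched as follows. This is only an illustration under stated assumptions: the clustering algorithms themselves are not reimplemented, the centroids are taken as given, the points and centroids are hypothetical two-dimensional values, and closeness is measured with the Euclidean distance, which the text does not specify.

```python
import math

def cluster_mean(points):
    """Actual center: the component-wise mean of the points in one cluster."""
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dims))

def compare_with_actual_center(points, centroids):
    """Pick the algorithm's centroid closest to the actual center and subtract."""
    actual = cluster_mean(points)
    computed = min(centroids, key=lambda c: math.dist(c, actual))
    difference = tuple(c - a for c, a in zip(computed, actual))
    return actual, computed, difference

# Hypothetical cluster and hypothetical centroids returned by one algorithm.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
centroids = [(1.5, 1.0), (5.0, 5.0)]
actual, computed, difference = compare_with_actual_center(points, centroids)
```

Running the same comparison once per algorithm gives one row of actual center, computed center and difference for each of them.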

3.4. Rule Engine Algorithm

The following steps are used to generate the rule engine.

3.4.1. Step 1

Perform de-identification of patient records.
  • Each patient is assigned a unique number $patient_n$
  • Each doctor/specialization is assigned a unique identifier $doctor_n$
  • Each service is assigned a unique identifier $service_n$
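A de-identification step of this kind might look like the sketch below. The text only states that a hash algorithm is applied, so the choice of salted SHA-256, the 16-character truncation, the function name `deidentify` and the salt value are all assumptions made for illustration.

```python
import hashlib

def deidentify(raw_id, salt):
    """Map a raw identifier to a stable pseudonymous identifier.

    Assumption: a salted SHA-256 digest, truncated to 16 hex characters.
    Truncation makes collisions possible in principle, so a production
    system would need to check uniqueness.
    """
    digest = hashlib.sha256(f"{salt}:{raw_id}".encode("utf-8")).hexdigest()
    return digest[:16]

SALT = "hypothetical-secret-salt"  # kept private by the data owner
patient_n = deidentify("MR-0001", SALT)
doctor_n = deidentify("doctor-17", SALT)
service_n = deidentify("ECG", SALT)
```

The same input always maps to the same identifier, so transactions of one patient, doctor or service can still be linked after de-identification.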

3.4.2. Step 2

Patient records are grouped based on the specialty_id from which they availed a service. Gaussian-based clustering is used for the identification of clusters, as shown in Figure 4.
$\mathrm{Support}(S_h) = \mathrm{Count}(S_h) / cluster_n$
$\mathrm{Confidence}(S_h, D_j) = \mathrm{Support}(S_h, D_j) / \mathrm{Support}(S_h)$
where $cluster_n$ is the total number of elements in the clusters. $Transaction_{cn}$ represents the transactions of the patients $P_k$ that are identified as one of two separate cases, namely “need to be investigated” and “outliers”. All transactions identified with these labels are transferred to the rule engine for further analysis. We computed the confidence value for each service within the clusters. We apply a threshold on the confidence values of all members within the clusters, and all members whose confidence values lie on the boundaries are identified as anomalies. The flowchart for the second phase is shown in Figure 4, which shows how clusters are processed to generate rules. We find the support count of each specialty $D_j$ in all clusters and then find the support count of each service $S_h$ for this specialty $D_j$.
Figure 4. The rule engine is computing confidence values for all services in all specialties.
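The support and confidence computation can be sketched as below. It follows the two formulas as written, with the support of a service in the denominator. The function name, the record layout of (specialty_id, service_id) pairs and the sample transactions are hypothetical, and we assume that the joint support counts the transactions in which the service is availed in the given specialty.

```python
from collections import Counter

def service_confidences(cluster):
    """Confidence of every (specialty_id, service_id) pair within one cluster.

    cluster: list of (specialty_id, service_id) tuples, one per transaction.
    """
    cluster_n = len(cluster)
    service_counts = Counter(service for _, service in cluster)
    pair_counts = Counter(cluster)
    confidences = {}
    for (specialty, service), count in pair_counts.items():
        support_pair = count / cluster_n                       # assumed joint support
        support_service = service_counts[service] / cluster_n  # Support(S_h)
        confidences[(specialty, service)] = support_pair / support_service
    return confidences

# Hypothetical cluster with two specialties and two services.
cluster = [(620, 1070), (620, 1070), (620, 1152), (130, 1070)]
confidences = service_confidences(cluster)
```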
Finally, these support counts are used for computing confidence values. The last condition checks whether confidence values have been computed for all specialties. Based on the computed confidence values, rules are generated and stored in a database for the third phase. Figure 5 describes the complete fraud detection system. In Figure 5, there are three main elements, each receiving transactional data from different hospital servers, and each element (patient, provider, service) has its own storage. Association scores are computed between each pair, namely service with respect to doctor, service with respect to patient, patient with respect to doctor, and doctor with respect to patient. Once the association scores are computed and the thresholds are applied, we get a set of identified cases, labelled either “outlier” or “need to be investigated”. The rating of each element (patient, provider, service) whose transactions are found to be suspicious is decremented. These cases are used as input to the rule engine. The rule engine further analyzes the transactions: if a case is detected as fraud, the rating score of the involved element remains the same; otherwise, the rating score is updated. A set of rules is generated for each specialty_id. Whenever a patient visits the hospital to avail a particular set of services, the system first checks which specialty_id the patient visits and then evaluates the transaction against the rules already computed for that specialty_id. The third phase of the proposed framework is shown in Figure 6. A similarity function is used to compute the similarity between the current transaction $c$ and the generated rule $R$. The similarity bit is denoted by $a$ and the similarity function by $H$:
$H = R \cap c$
The similarity bit $a$ is equal to 1 if, after the similarity computation, the size of the input transaction $c$ is equal to the size of $H$; otherwise, the similarity bit is equal to 0. If the similarity bit is not 1, the transaction is marked as fraud; otherwise, it is marked as normal.
Figure 5. Detailed visualization of fraud detection system.
Figure 6. Third phase of Fraud detection model is described.
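The similarity check of the third phase amounts to a set intersection followed by a size comparison. A minimal sketch, assuming that both the rule and the transaction are represented as sets of service identifiers (the identifiers below are hypothetical):

```python
def similarity_bit(rule, transaction):
    """Return 1 if every service in the transaction appears in the rule, else 0."""
    c = set(transaction)
    H = set(rule) & c  # similarity function: services shared by rule and transaction
    return 1 if len(H) == len(c) else 0

def classify_transaction(rule, transaction):
    """Mark a transaction as normal or fraud from its similarity bit."""
    return "normal" if similarity_bit(rule, transaction) == 1 else "fraud"

# Hypothetical rule for one specialty and two hypothetical transactions.
rule = {10, 20, 30}
normal_case = classify_transaction(rule, {10, 20})
fraud_case = classify_transaction(rule, {10, 99})
```

Comparing sizes after the intersection is equivalent to testing whether the transaction is a subset of the rule, so a single service outside the rule is enough to set the bit to 0.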

4. Results and Discussions

4.1. Case Study

The annotated insurance claim transactional data of the employees of a local hospital for five years (2013, 2014, 2015, 2016, 2017, 2018, 2019) is considered for this analysis. The problem addressed is the constant year-on-year increase in employee insurance coverage expenditure, as depicted in Figure 7; an exponential increase can be predicted in coming years owing to the increase in healthcare fraud. The fraud detection model is applied to analyze this dataset, and only a subset of the results is shown in the figures to give a better understanding of the work.
Figure 7. Yearwise insurance amount utilization.

4.1.1. First Phase

In the first phase, the association scores are computed between each pair of elements. A few cases are shown in this section to explain how the association scores are actually computed. In this phase we identified two separate cases:
  • Outliers
  • Need to be investigated
The association scores between the Optical Coherence Tomography (OCT) scan service and patients are shown in Figure 8. In total, 21 patients availed this service, and the average of all association scores is 0.052. We set this average value as the threshold. It can be seen from Figure 8 that two patients are identified as “need to be investigated”, and the rating of this service is decremented from 100, the initial rating, to 98. Association scores for all other services and patients are computed in the same manner, and the rating scores are adjusted accordingly.
Figure 8. Service with respect to Patient association scores.
Figure 9 shows the association scores of a doctor_id with respect to patients. In total, 36 patients are examined by doctor_id 131. Two cases are identified as “need to be investigated”, and the rating score of this doctor is decreased from 100 to 98.
Figure 9. Doctor with respect to Patient association scores.
Figure 10 shows the association scores of services with respect to doctors. The Routine Electroencephalogram (EEG) service is prescribed by 50 different doctors, and six cases are identified as “need to be investigated”. The threshold value is 0.0476. The rating score of this service is decreased by 6, to 94. The complete output is shown in Appendix A.1.
Figure 10. Service with respect to doctor association scores.
Figure 11 shows the association scores of patients with respect to doctors. The patient with MR_no 959705 visited the hospital 126 times and was examined by 12 different doctors. This patient visited doctor_id 1511 forty-eight times. Four “need to be investigated” cases are identified, and the rating score of this patient is decreased to 96.
Figure 11. Patient with respect to doctor association scores.
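As a worked check of the patient association score $F = G / P_n$ from Section 3.2, the figures quoted for this patient can be substituted directly. Here we read $G = 48$ as the visits to doctor_id 1511 and $P_n = 126$ as the patient’s total visits, which is our pairing of the quoted numbers with the symbols:

$$F = \frac{G}{P_n} = \frac{48}{126} \approx 0.381$$

This value lies between the minimum threshold of 0.011 and the maximum threshold of 0.7, so on the fixed thresholds alone this pair would not be an outlier. Whether it counts as “need to be investigated” depends on the average of this patient’s association scores, which is not quoted in the text.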

4.1.2. Second Phase

All records identified as “outlier” or “need to be investigated” are forwarded to the rule engine for further investigation. In total, 62 association rules are generated from this dataset: a separate rule is generated for each specialization, and specialty_id is used as the identifier for each specialization. The rule engine generates rules that describe which specialization can provide which specific service. We generate each rule by computing confidence values for each service in a particular specialization. Using this knowledge, we can evaluate whether each transaction is normal or fraudulent by applying the similarity function. We have selected the specialty Urology, with specialty_id: 620, and all services provided by this specialty_id are shown in Figure 12. It can be seen that there is a confidence value for each service in each specialty_id.
Figure 12. The specialty_id 620 (Urology) is selected and confidence values of each service availed/provided in this specialty.
The relationship between service and specialty is depicted in Figure 13, which plots the confidence values of all service_ids for the specialty_ids. The confidence value gives an estimate of the probability that the considered service is prescribed in a given specialty_id. Based on this estimate, resources can also be allocated and budgets planned. Table 5 shows the confidence values of a few services for different specialties.
Figure 13. Scatter plot for all services confidence for all specialty.
Table 5. Services and their Confidence values for Pulmonologist, Orthopedic, Pediatrician, Neurologist, Urologist.
A rule contains only those service_ids whose confidence values are above 0.001; users can define a different threshold depending on their scenario. The rule generated by the rule engine can be explained with the help of an example. Consider the specialty Pediatric Cardiology. Table 6 shows the services availed from this specialty, together with their confidence values.
Table 6. Services and their confidence values for Pediatric Cardiology.
Table 7 shows the rule for this specialty. Suppose the Abdomen upper service is availed in a transaction. The similarity function first checks whether this service is present in the rule for Pediatric Cardiology, which is derived from Table 6. If it is not, the case is identified as fraud. Likewise, if a transaction contains a service whose confidence value in the considered specialty is less than 0.001, it is also identified as fraud and passed to the analyst dashboard for further investigation. The rules are generated from the historical medical data.
Table 7. Rule for Pediatric Cardiologist extracted from Table 6.
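Extracting a rule from a table of confidence values reduces to a threshold filter. The sketch below assumes the confidence values of one specialty are held in a dictionary keyed by service_id; the function name and the sample values are hypothetical, while the default threshold of 0.001 is the one stated in the text.

```python
def build_rule(confidences, threshold=0.001):
    """Keep the service_ids whose confidence is strictly above the threshold."""
    return {service_id for service_id, value in confidences.items() if value > threshold}

# Hypothetical confidence values for the services of one specialty.
specialty_confidences = {101: 0.42, 102: 0.0005, 103: 0.001, 104: 0.03}
rule = build_rule(specialty_confidences)
```

A service whose confidence is at or below the threshold stays out of the rule, so a transaction containing it would be flagged for the analyst dashboard.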

4.1.3. Third Phase

The following example explains how the similarity bit is computed using the rule already generated for specialty_id: 620. In the current transaction $c$, the patient avails three services from specialty_id: 620; it can be seen from Table 5 that this specialty has no service with service_id: 2. The computation of the similarity function and the similarity bit is shown below. The value of the similarity bit is 0, which means this transaction is a fraud case.
Transaction: $c = \{1070, 1152, 2\}$
$\mathrm{Size}(c) = 3$
Similarity function: $H = R \cap c$
$R = \{1070, 1152\}$
$\mathrm{Size}(H) = 2$
Since $\mathrm{Size}(c)$ and $\mathrm{Size}(H)$ are not equal, the similarity bit is $a = 0$. The current transaction is normal only if the similarity bit is equal to 1. The rule engine is generated from five years of annotated transactional data; based on the association scores, we identified cases and evaluated them against the rules, which recovered the already tagged fraud cases. After this analysis, we obtained the final status and rating of doctors, patients and services separately.

4.2. Detected Frauds

After the third phase, the fraud cases have been detected, and we can check the status and rating of each element. As already mentioned, owing to the large size of the data only a subset of records is shown in the screenshots that illustrate the performance of our system. One point must be clarified here: because we have considered employee insurance claim data, the detected cases are few in number, as the patients are either employees or their beneficiaries. We discuss the detected fraud cases using the screenshots.
Figure 14 depicts the final status and ratings of doctors. It can be seen that three doctors have been identified by our system as fraud cases and that their rating scores have been adjusted accordingly. Doctor_ids 2301, 551 and 31 are identified as the fraud cases. The initial status and ratings of these doctors, generated in the first phase of the proposed system, are depicted in Figure 15, Figure 16 and Figure 17.
Figure 14. Doctors rating and status.
Figure 15. Doctor_id 2301 initial rating in First phase.
Figure 16. Doctor_id 31 initial rating in First phase.
Figure 17. Doctor_id 551 initial rating in First phase.
Figure 15 shows the association score of doctor_id: 2301. The initial rating of this doctor is 95, with five identified cases forwarded to the rule engine (second phase). The complete output of Figure 15 is provided in Appendix A.2. The third phase identified doctor_id: 2301 as fraudulent, and the updated rating score of this doctor is 99, as shown in Figure 14.
Figure 16 shows the initial rating of doctor_id: 31, which is −9 (negative). More than 100 identified cases of this doctor are forwarded to the rule engine (second phase). The doctor_id: 31 is also identified as fraudulent, and the final rating is 99, as shown in Figure 14. Figure 17 shows that the initial first-phase rating of doctor_id: 551 is 7 and that 93 cases of this doctor are forwarded to the second phase for analysis. In the third phase this doctor is identified as fraudulent, and the updated rating score is 98.
Figure 18 shows that eight fraud cases are identified in the service-availing patterns of patients; the figure shows only a subset of the complete output. The initial ratings and association scores of each of these patients in the first phase can be checked in the same way as for the doctors. When the service element is considered, service_id 221 and service_id 250 are detected as fraud, as shown in Figure 19 and Figure 20.
Figure 18. Patients Rating and status.
Figure 19. Service wrt Patient Rating and status.
Figure 20. Service wrt doctors Rating and status.
Thus, we have analysed this transactional data in terms of the three elements of the proposed framework and detected a different number of already tagged fraud cases for each element. The rule engine has been designed on the basis of five years of transactional data: each specialty_id (a specialization such as cardiology or urology) has a set of services with confidence levels that define its rules.
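To make the idea of per-specialty rules concrete, the following Python sketch derives a rule set from historical transactions. It is only an illustration under stated assumptions: confidence is taken here in the usual association-rule sense (the fraction of a specialty's transactions that contain a given service), the G-means clustering step of the framework is omitted, and the function name `build_rules`, the threshold `min_confidence` and the toy `history` records are all hypothetical.

```python
from collections import Counter, defaultdict


def build_rules(transactions, min_confidence=0.5):
    """Map each specialty_id to {service_id: confidence}.

    transactions: iterable of (specialty_id, services) pairs.
    Assumption: confidence = share of the specialty's transactions that
    include the service; services below min_confidence are dropped.
    """
    totals = Counter()                      # transactions per specialty
    service_counts = defaultdict(Counter)   # service occurrences per specialty
    for specialty_id, services in transactions:
        totals[specialty_id] += 1
        for service_id in set(services):
            service_counts[specialty_id][service_id] += 1

    rules = {}
    for specialty_id, counts in service_counts.items():
        total = totals[specialty_id]
        rules[specialty_id] = {
            service_id: n / total
            for service_id, n in counts.items()
            if n / total >= min_confidence
        }
    return rules


# Toy history, invented for illustration only (not taken from the dataset).
history = [
    (620, {1070, 1152}),
    (620, {1070}),
    (620, {1070, 1152}),
    (620, {2}),
]
rules = build_rules(history, min_confidence=0.5)
print(rules)
```

In this toy history, service 2 appears in only one of the four transactions for specialty 620, so it falls below the threshold and is excluded from the rule, while services 1070 and 1152 are retained.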

5. Conclusions

Many countries have recently initiated government medical support programs, and in such programs there is no tolerance for fraudulent claims. There is a critical need for a system that captures and identifies fraud cases in the day-to-day transactions of the healthcare industry. Many research studies have been conducted in the last decade, but most of them are based on financial analysis and disease/medication analysis. We have proposed a framework that considers patients, doctors (providers) and services as its main elements. We computed the relationships between these elements by calculating association scores. By learning from the historical transactional data, we generated a rule engine. First, the dataset is filtered on the basis of the elements' association scores, and the identified cases are then forwarded to the rule engine for further analysis. The fraud cases are finally identified, and the ratings of all three elements are updated after evaluation by the rule engine. We validated this framework for detecting fraudulent transactions on annotated transactional data from a local hospital and successfully identified eight fraud cases along the patient element, two cases along the service element and three cases along the doctor element of the proposed system. We communicated our findings to the hospital management.
In future, the proposed methodology can be further improved by extracting the sequences of services availed from each specialty using data mining techniques. Once a set of sequences has been found for every specialty, fraud detection is expected to be more effective.

Author Contributions

Conceptualization, I.M. and S.K.; methodology, I.M. and S.K.; software, I.M.; validation, I.M., S.K. and H.u.R.; resources, H.u.R.; data curation, I.M.; writing (original draft preparation), I.M.; visualization, I.M.; supervision, S.K.; review and proofreading, F.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

Special thanks to Shifa International Hospital, Islamabad, Pakistan, for providing the dataset for validation of the proposed framework. This research is part of the PM Task Force on IT and Telecom initiative for the “Sehat Card Scheme”.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1

Complete output of Figure 10 is shown in Figure A1, Figure A2, Figure A3 and Figure A4.
Figure A1. Service wrt doctor Rating and status.
Figure A2. Service wrt doctor Rating and status.
Figure A3. Service wrt doctor Rating and status.
Figure A4. Service wrt doctor Rating and status.

Appendix A.2

Complete output of Figure 15 is depicted in Figure A5 and Figure A6.
Figure A5. Doctor wrt Patient Rating and status.
Figure A6. Doctor wrt Patient Rating and status.

References

  1. Optum. The Key to Detecting Fraud and Abuse in Medical Billing; White Paper 12-28110 04/12; Optuminsight, Inc.: Eden Prairie, MN, USA, 2012. [Google Scholar]
  2. Olsen, L.; Saunders, R.S.; Yong, P.L. The Healthcare Imperative: Lowering Costs and Improving Outcomes: Workshop Series Summary; National Academies Press: Washington, DC, USA, 2010. [Google Scholar]
  3. Landon, B.E.; Keating, N.L.; Barnett, M.L.; Onnela, J.; Paul, S.; O’Malley, A.J.; Keegan, T.; Christakis, N.A. Variation in patient-sharing networks of physicians across the united states. JAMA 2012, 308, 265–273. [Google Scholar] [CrossRef] [PubMed]
  4. Li, J.; Huang, K.Y.; Jin, J.; Shi, J. A survey on statistical methods for health care fraud detection. Health Care Manag. Sci. 2008, 11, 275–287. [Google Scholar] [CrossRef] [PubMed]
  5. Joudaki, H.; Rashidian, A.; Minaei-Bidgoli, B.; Mahmoodi, M.; Geraili, B.; Nasiri, M.; Arab, M. Using data mining to detect health care fraud and abuse: A review of literature. Global J. Health Sci. 2015, 7, 194. [Google Scholar] [CrossRef] [PubMed]
  6. Travaille, P.; Müller, R.M.; Thornton, D.; Hillegersberg, J.V. Electronic fraud detection in the us medicaid healthcare program: Lessons learned from other industries. In Proceedings of the 17th Americas Conference on Information Systems, AMCIS 2011, Detroit, MI, USA, 4–8 August 2011. [Google Scholar]
  7. Ortega, P.A.; Figueroa, R.G.A.; Cristin, J. A medical claim fraud/abuse detection system based on data mining: A case study in chile. DMIN 2006, 6, 26–29. [Google Scholar]
  8. Yang, W.; Hwang, S. A process-mining framework for the detection of healthcare fraud and abuse. Expert Syst. Appl. 2006, 31, 56–68. [Google Scholar] [CrossRef]
  9. Thornton, D.; van Capelleveen, G.; Poel, M.; van Hillegersberg, J.; Mueller, R.M. Outlier-based health insurance fraud detection for us medicaid data. In Proceedings of the 16th International Conference on Enterprise Information Systems, ICEIS (2), Lisbon, Portugal, 27–30 April 2014; pp. 684–694. [Google Scholar]
  10. Liu, Q.; Vasarhelyi, M. Healthcare fraud detection: A survey and a clustering model incorporating geo-location information. In Proceedings of the 29th World Continuous Auditing and Reporting Symposium (29WCARS), Brisbane, Australia, 21–22 November 2013. [Google Scholar]
  11. Kose, I.; Gokturk, M.; Kilic, K. An interactive machine-learning-based electronic fraud and abuse detection system in healthcare insurance. Appl. Soft Comput. 2015, 36, 283–299. [Google Scholar] [CrossRef]
  12. Thornton, D.; Mueller, R.M.; Schoutsen, P.; Hillegersberg, J.V. Predicting healthcare fraud in medicaid: A multidimensional data model and analysis techniques for fraud detection. Procedia Technol. 2013, 9, 1252–1264. [Google Scholar] [CrossRef]
  13. Feldman, K.; Chawla, N.V. Does medical school training relate to practice? Evidence from big data. Big Data 2015, 3, 103–113. [Google Scholar] [CrossRef] [PubMed]
  14. Herland, M.; Bauder, R.A.; Khoshgoftaar, T.M. Medical provider specialty predictions for the detection of anomalous medicare insurance claims. In Proceedings of the 2017 IEEE International Conference on Information Reuse and Integration (IRI), San Diego, CA, USA, 4–6 August 2017; pp. 579–588. [Google Scholar]
  15. Bauder, R.A.; Khoshgoftaar, T.M.; Richter, A.; Herland, M. Predicting medical provider specialties to detect anomalous insurance claims. In Proceedings of the 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), San Jose, CA, USA, 6–8 November 2016; pp. 784–790. [Google Scholar]
  16. Bauder, R.A.; Khoshgoftaar, T.M. A probabilistic programming approach for outlier detection in healthcare claims. In Proceedings of the 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), Anaheim, CA, USA, 18–20 December 2016; pp. 347–354. [Google Scholar]
  17. Bauder, R.A.; Khoshgoftaar, T.M. A novel method for fraudulent medicare claims detection from expected payment deviations (application paper). In Proceedings of the 2016 IEEE 17th International Conference on Information Reuse and Integration (IRI), Pittsburgh, PA, USA, 28–30 July 2016; pp. 11–19. [Google Scholar]
  18. Bauder, R.A.; Khoshgoftaar, T.M. The detection of medicare fraud using machine learning methods with excluded provider labels. In Proceedings of the Thirty-First International Florida Artificial Intelligence Research Society Conference (FLAIRS-31), Melbourne, FL, USA, 21–23 May 2018. [Google Scholar]
  19. Chandola, V.; Sukumar, S.R.; Schryver, J.C. Knowledge discovery from massive healthcare claims data. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11–14 August 2013; ACM: New York, NY, USA, 2013; pp. 1312–1320. [Google Scholar]
  20. Verma, A.; Taneja, A.; Arora, A. Fraud detection and frequent pattern matching in insurance claims using data mining techniques. In Proceedings of the 2017 Tenth International Conference on Contemporary Computing (IC3), Noida, India, 10–12 August 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–7. [Google Scholar]
  21. Huang, Z.; Lu, X.; Duan, H. Anomaly detection in clinical processes. In Proceedings of the AMIA Annual Symposium Proceedings, Chicago, IL, USA, 3–7 November 2012; American Medical Informatics Association: Bethesda, MD, USA, 2012; Volume 2012, p. 370. [Google Scholar]
  22. Okita, A.; Yamashita, M.; Abe, K.; Nagai, C.; Matsumoto, A.; Akehi, M.; Yamashita, R.; Ishida, N.; Seike, M.; Yokota, S.; et al. Variance analysis of a clinical pathway of video-assisted single lobectomy for lung cancer. Surg. Today 2009, 39, 104–109. [Google Scholar] [CrossRef] [PubMed]
  23. de Klundert, J.V.; Gorissen, P.; Zeemering, S. Measuring clinical pathway adherence. J. Biomed. Inform. 2010, 43, 861–872. [Google Scholar] [CrossRef] [PubMed]
  24. Gath, I.; Geva, A.B. Unsupervised optimal fuzzy clustering. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11, 773–780. [Google Scholar] [CrossRef]
  25. Lenard, M.J.; Alam, P. Application of fuzzy logic to fraud detection. In Encyclopedia of Information Science and Technology, 1st ed.; IGI Global: Hershey, PA, USA, 2005; pp. 135–139. [Google Scholar]
  26. Köppen, M.; Kasabov, N.; Coghill, G. Advances in Neuro-Information Processing: 15th International Conference, ICONIP 2008, Auckland, New Zealand, November 25–28, 2008, Revised Selected Papers; Springer: Berlin/Heidelberg, Germany, 2009; Volume 5507. [Google Scholar]
  27. Peng, J.; Li, Q.; Li, H.; Liu, L.; Yan, Z.; Zhang, S. Fraud detection of medical insurance employing outlier analysis. In Proceedings of the 2018 IEEE 22nd International Conference on Computer Supported Cooperative Work in Design (CSCWD), Nanjing, China, 9–11 May 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 341–346. [Google Scholar]
  28. Anbarasi, M.S.; Dhivya, S. Fraud detection using outlier predictor in health insurance data. In Proceedings of the 2017 International Conference on Information Communication and Embedded Systems (ICICES), Chennai, India, 23–24 February 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–6. [Google Scholar]
  29. Sun, C.; Yan, Z.; Li, Q.; Zheng, Y.; Lu, X.; Cui, L. Abnormal group-based joint medical fraud detection. IEEE Access 2018, 7, 13589–13596. [Google Scholar] [CrossRef]
  30. Cui, H.; Li, Q.; Li, H.; Yan, Z. Healthcare fraud detection based on trustworthiness of doctors. In Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, Tianjin, China, 23–26 August 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 74–81. [Google Scholar]
  31. Hristidis, V. Information Discovery on Electronic Health Records; CRC Press: Boca Raton, FL, USA, 2009. [Google Scholar]
  32. Altaf, W.; Shahbaz, M.; Guergachi, A. Applications of association rule mining in health informatics: A survey. Artif. Intell. Rev. 2017, 47, 313–340. [Google Scholar]
  33. Toti, G.; Vilalta, R.; Lindner, P.; Lefer, B.; Macias, C.; Price, D. Analysis of correlation between pediatric asthma exacerbation and exposure to pollutant mixtures with association rule mining. Artif. Intell. Med. 2016, 74, 44–52. [Google Scholar] [CrossRef] [PubMed]
  34. Cai, R.; Liu, M.; Hu, Y.; Melton, B.L.; Matheny, M.E.; Xu, H.; Duan, L.; Waitman, L.R. Identification of adverse drug-drug interactions through causal association rule discovery from spontaneous adverse event reports. Artif. Intell. Med. 2017, 76, 7–15. [Google Scholar] [CrossRef] [PubMed]
  35. Zeng, L.; Wang, B.; Fan, L.; Wu, J. Analyzing sustainability of chinese mining cities using an association rule mining approach. Resour. Policy 2016, 49, 394–404. [Google Scholar] [CrossRef]
  36. Sowah, R.A.; Kuuboore, M.; Ofoli, A.; Kwofie, S.; Asiedu, L.; Koumadi, K.M.; Apeadu, K.O. Decision support system (dss) for fraud detection in health insurance claims using genetic support vector machines (gsvms). J. Eng. 2019, 2019, 1432597. [Google Scholar] [CrossRef]
  37. Matloob, I.; Khan, S. A framework for fraud detection in government supported national healthcare programs. In Proceedings of the 2019 11th International Conference on Electronics, Computers and Artificial Intelligence (ECAI), Pitesti, Romania, 27–29 June 2019; pp. 1–7. [Google Scholar]
  38. Zhao, Z.; Guo, S.; Xu, Q.; Ban, T. G-means: A clustering algorithm for intrusion detection. In Proceedings of the International Conference on Neural Information Processing, Auckland, New Zealand, 25–28 November 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 563–570. [Google Scholar]
  39. Pelleg, D.; Moore, A.W. X-means: Extending k-means with efficient estimation of the number of clusters. In Proceedings of the ICML: Proceedings of the Seventeenth International Conference on Machine Learning, Stanford, CA, USA, 29 June–2 July 2000; Volume 1, pp. 727–734. [Google Scholar]
  40. Ekina, T.; Leva, F.; Ruggeri, F.; Soyer, R. Application of bayesian methods in detection of healthcare fraud. Chem. Eng. Trans. 2013, 33. [Google Scholar] [CrossRef]
