Next Article in Journal
Antibacterial Efficacy of Some Medicinal Plants on Multidrug Resistance Bacteria and Their Toxicity on Eukaryotic Cells
Next Article in Special Issue
Improving the Consistency of the Failure Mode Effect Analysis (FMEA) Documents in Semiconductor Manufacturing
Previous Article in Journal
Optimization Techniques for a Distributed In-Memory Computing Platform by Leveraging SSD
Previous Article in Special Issue
Study on the Digital Intelligent Diagnosis of Miniature Machine Tools
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Need-Based and Optimized Health Insurance Package Using Clustering Algorithm

Department of Computer and Software Engineering, College of Electrical and Mechanical Engineering (CEME), National University of Sciences and Technology (NUST), Islamabad 44000, Pakistan
*
Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(18), 8478; https://doi.org/10.3390/app11188478
Submission received: 26 July 2021 / Revised: 6 September 2021 / Accepted: 8 September 2021 / Published: 13 September 2021
(This article belongs to the Special Issue Advanced Applications of Industrial Informatic Technologies)

Abstract

:
The paper presents a novel methodology based on machine learning to optimize medical benefits in healthcare settings, i.e., corporate, private, public or statutory. The optimization is applied to design healthcare insurance packages based on the employee healthcare record. Moreover, with the advancement in the insurance industry, it is rapidly adapting mathematical and machine learning models to enhance insurance services like funds prediction, customer management and get better revenue from their businesses. However, conventional computing insurance packages and premium methods are time-consuming, designation specific, and not cost-effective. During the design of insurance packages, an employee’s needs should be given more importance than his/her designation or position in an organization. The design of insurance packages in healthcare is a non-trivial task due to the employees’ changing healthcare needs; therefore, using the proposed technique employees can be moved from their existing package to another depending upon his/her need. This provides the motivation to propose a methodology in which we applied machine learning concepts for designing need-based health insurance packages rather than professional tagging. By the design of need-based packages, medical benefit optimization which is the core goal of our proposed methodology is effectively achieved. Our proposed methodology derives insurance packages that are need-based and optimal based on our defined criteria. We achieved this by first applying the clustering technique to historical medical records. Subsequently, medical benefit optimization is achieved from these packages by applying a probability distribution model on five years employees’ insurance records. The designed technique is validated on real employees’ insurance records from a large enterprise.The proposed design provides 25% optimization on medical benefit amount compared to current medical benefits amount therefore, gives better healthcare to all the employees.

1. Introduction

In developing countries, the income persons/employees cannot afford the high cost medical services which are exponentially rising with time. Under such conditions, good insurance policies can prevent the employees from financial troubles when required [1]. Healthcare provision is the main challenge in health care industry. The cooperate organizations/enterprises help their employees with healthcare insurance coverage to increase their productivity. When insurance policy amount is not enough to fulfil the healthcare needs of the employees in that case employees tries to overcome their burden by mis-utilizing medical benefit amount. The main cause of these misutilizations is the inappropriate design of medical benefit packages. Moreover, the enterprises normally provide one package with a fixed premium amount and a medical benefit amount for each type of category in their organizational hierarchy, without considering that one employee need for healthcare services may vary as compared to others in the same category.
There are three main elements in the health insurance industry namely client, healthcare providers, and insurer [2]. Some of the important concepts related to the health insurance industry are listed below in Table 1.

1.1. Problem Description

In healthcare insurance, the sum insured and premium are directly proportional to each other. Whenever the sum insured amount is increased premium amount also increases accordingly.
The four main problems in health insurance are: moral hazard, adverse selection, healthcare fraud [2] and excess sum insured amounts as shown in Figure 1. Moral hazard is the term used for the situation, once an employee gets insurance coverage, he/she tries to get treatment for a long-standing health condition like hernia, etc. The adverse selection is the case when groups with poor health conditions, got insurance coverage. Pre-existing member groups with a fewer number of healthy members, join an insurance scheme results in monetary losses for these schemes because the insurance claims are more than the total premium collected. The third main problem is healthcare fraud which is the misutilization of available funds. The fourth important problem is that organizations are paying more premium amounts for their employee’s health insurance. If there are 10,000 employees in any organization, there is a possibility that only 5000 employees are visiting the hospital for availing health services whereas the remaining never avail services. In some cases, the premium amount is greater than the total computed medical benefit utilization. Therefore, there is a need to optimize the premium amount according to the medical benefit utilization amount.
The main goal is to keep a balance between two important factors: Employee need (sum insured amount) and employer budget (budget limits (premium)). The imbalance between these two factors results in overburdened employers or stressful employees as shown in Figure 1.
The objective is to develop a dynamic premium amount calculation technique for different groups within each category of employees in an organization using machine learning. Employees can be transferred from one group to another depending upon their needs. These packages are useful for employers as well as for insurance companies or employers.
Solution based on rigorous and systematic methodology is required, which can minimize the financial crisis of employees and enterprises. To the best of our knowledge, none of the existing designs used machine learning techniques to generate need-based insured packages as well as the data driven analysis on real enterprise record using probability distribution model. In the proposed methodology, optimization refers to the amount of medical benefit performed. The existing medical benefits amount is optimized by using the need-based packages evaluated using machine learning techniques.

1.2. Research Contributions

The proposed methodology focuses on medical benefit optimization. The designed methodology provides following research contributions to the research area:
1.
Incorporates medical benefit optimization using kmeans clustering and probability distribution model.
2.
Generates employee need-based health insurance packages for enterprises and insurance companies. The historical medical records of employees are analyzed for generating these packages.
3.
Provides detailed data-driven analysis for medical benefit optimization by defining lower and upper bounds for each package amount and estimation of out of pocket employees in each category.

2. Related Work

Most of the research is conducted on highlighting different issues in healthcare but none of them is focused on the cause of the incidence. The prevalence of the issues lies in addressing the lack of available funds to low-income employees for healthcare services. There are researches that employed machine learning techniques to address many problems in the insurance industry  [3]. Recent research is conducted to optimize benefit amounts in the automobile insurance industry but still lacks the focus on the optimization of the amounts using need-based insurance packages.
The genetic algorithm-based methodology for optimization of technical benefits over car insurance amount is proposed in  [4]. The design uses a combination of heuristic and predictive techniques to perform optimization for moving targets. An automated framework for insurance companies is suggested using an extreme gradient boosting algorithm on blockchain-based transactional records  [5]. The results showed that XGboost produces high performance as compared to other existing learning algorithms. An online learning solution is proposed in  [6] that can automatically handle real-time updates of the insurance network. Machine learning-based techniques are applied to predict fake claims in automobile insurance and simplifies the calculation of premium amounts based on previous financial details for different customers. The importance of machine learning models in the insurance domain is sssed, and a comparison of household and motor insurance is provided in  [7]. The detailed evaluation of machine learning algorithms in the insurance industry is performed in  [8] and proved the effectiveness of the random forest algorithm. A study on the health insurance scheme implementation constraints of an organization is discussed in  [9]. It uses the qualitative research design over the employee data to explore the challenges counter in the implementation process. A strategy to model the renewal price adjustment problem as a sequential decision problem and single and constrained Markov chain process is proposed in  [10]. The model analyzes revenue maximization and its effects on the customers’ retention levels by applying a model-free reinforcement learning algorithm. The results are validated on employee’s data of a Spanish company named BBVA. A hybrid methodology is proposed in  [11] that analyzes the data imbalance problem in the automobile insurance industry by applying K Reverse Nearest Neighborhood and one-class support vector machine (OCSVM) algorithms. A comparative study is performed in  [12] that compares various classifiers and an application of genetic algorithm-based fuzzy C-Means clustering in the automobile insurance industry.
Few proposed schemes utilized probability distributions and other statistical models for funds prediction and annuity valuation. A new risk-function premium principle is proposed in  [13]. An investigation is performed in  [14] that proves the fourth-order statistics as an accurate approximation of the expected loss. The research conducted in  [15] utilizes extreme trees and neural networks (NNs) models for bond return predictability. A methodology based on a machine-learning technique for pricing arithmetic and geometric average options is proposed in  [16]. It provides a model-free scheme for efficient and quick results. The advanced ensembles such as random forests and boosted trees are used for insurance pricing, presented in  [17]. The proposed design uses probability distribution models such as Poisson and gamma deviation for the purpose. A framework based on machine learning models is proposed in  [18] that elaborates the impact of models on individual insurance consumers. The evaluation of the risk and severity of an insurance claim is performed using the telematics data and prior knowledge in  [19]. The scheme computes the premium amount using variables, e.g., age, postal code and car model and driving patterns are used for generating premium. The methods for variable annuity valuation based on machine learning and data clustering are proposed in  [20]. The tree boosted models of machine learning are used to optimize the proposed premium, presented in  [21]. A system to adjust the insurance rates in real-time according to the change in character traits of the user is designed in  [22]. The cost optimization issue is addressed in  [23], that provides a comprehensive review of existing studies and analyzed research studies based on healthcare cost optimization problems. The impact of artificial intelligence on the insurance industry is presented in  [24]. The study illustrates that with the focus on cost efficiency and new revenue streams, the insurance business model is transformed from a loss compensation model to prediction and prevention.

3. Materials and Methods

The proposed methodology is divided into three cascaded steps, including generation of need-based package using mapping and clustering, computation of medical benefits using the generated packages and data driven analysis using probability distribution to estimate the amount. The notations used in the algorithms are depicted in Table 2.
Figure 2 depicts the overall methodology used for the medical benefit optimization. The concept of clustering is incorporated to identify natural groups in existing categories of employees.The kmeans clustering is applied on each category and the clusters centroid are mapped on the selected category records.

3.1. Need Based Package Generation Using Clustering

The machine learning concept of clustering is used for classification purposes. Clustering is the natural grouping of records or data, in which similar records are in the same group whereas dissimilar records are in the different group. For generating need-based packages, we must perform category based analysis of transactional data. One category of employees is considered at one time and then clustering and mapping are applied as shown in Figure 3, amount is the money used by the employees of the selected category and on the x-axis number of records are the number of transactions in the selected category. Each category can be divided into groups, as depicted in the Figure 3. As a result of clustering, clusters are obtained in each category. Once a centroid for each cluster is obtained, we compute the distance of each computed centroid from all records of the considered category. The distance between centroids and all records of each category are computed and based on these calculated distances, patients are assigned to each cluster. The values of amount and visits are kept same as that of the centroid for all records. These steps are repeated for all the centroids.
Algorithm 1, explains the procedure for the need-based package generation. Inputs for the algorithm 1 are: number of clusters, max_iterations, φ 1 , φ 2 , φ 3 , φ 4 , φ 5 , φ 6 , φ 7 along with all considered attributes. First of all, the Kmeans algorithm is executed for one category. The second for loop is on x, where x denotes the number of clusters and C x denotes centroid for cluster x. Mapping of C x on specific category records is performed. The meaning of term mapping is different in our scenario, it is the computation of distance of C x from all the records in the considered category. Fourth, for loop iterates for the number of records in a specific category. D u denotes the distance of record from centroid where subscript k denotes category. The mean of all the D u is computed. All those records whose D u is less than the Mean are added to the G x . All remaining are discarded. At line no 18, the amount of considered record is A m u , and A m x is the amount of centroid. A m u is replaced with A m x . In the end, the sum of all amounts A m x in one group is computed.
For generating packages, there is a need to sum up the total amount of all members in one group. Within every category u, total amount is computed for each group β i . Total amount α j for each category is computed in the next step.
Algorithm 1: Need based Package generation using clustering and mapping.
Applsci 11 08478 i001

3.2. Medical Benefit Computation Using Generated Packages

We generate different insurance packages for each category in the first step. Once all insurance packages are generated, the next step is to calculate the total medical benefit utilization is computed using the following equations:
ϵ G i = α G i / a m o u n t G i
where a m o u n t G i is the amount value of centroid of considered group in each category and i is the number of groups in each category. In Equation (2), we are computing number of employees from generated packages.
ϵ k = i G ϵ G i
Once we get ϵ , Equation (3) is used, for computing total medical benefit utilization.
ϕ = j = 1 j = k i = 1 i = k ϵ i α j
α j is the total computed amount for each category where j = A,B,C,D,E,F,G. The step by step implementation is represented in Algorithm 2.
Algorithm 2: Medical benefit computation using generated packages.
Applsci 11 08478 i002

3.3. Data Driven Analysis Using Probability Distribution Model

Once we get ϕ , then we will move to the third step. In this step, data-driven analysis is performed for computing:
1.
Lower and upper bound for each group of each category and
2.
Out of pocket employees before and after optimization
3.
Estimation of total medical benefits amount by using a probability distribution model
A probability distribution is a function that depicts the possibility of getting the values that a random variable can have within the distribution. In our case, this random variable is the a m o u n t . We evaluate the results of the above two steps, with the help of probability distribution models. The probability distribution model that fits employee insurance records is the Gamma distribution model as shown in Figure 4. On the X-axis, there number of employees and on the Y-axis is the probability density function.
The shape and scale parameters of gamma distributions are a and b respectively. The formula for Mean μ and standard deviation σ in terms of gamma distribution parameters are represented in Equations (4) and (5).
μ G h = a G h b G h
σ G h = a G h b G h
where h = {1,2,3,4..}, depending upon the number of groups within the category. Once we get μ G h and σ G h for each group of all categories, we can adjust the premium amount for each group of every category, by using a gamma distribution plot. Algorithm 3 explains the computation of lower and upper bound for premium amounts for each category using the gamma probability distribution model. V i is the out of pocket employees where i represents a category.
Algorithm 3: Data driven analysis using probability distribution Model.
Applsci 11 08478 i003
The main topic of discussion nowadays is that most employers or organizations are paying the excess premium amount to the insurance companies while their employees are not frequently consuming the amount of the medical benefit. The objective is to align the premium amount with actual medical utilization made by each category of employees. The historical medical records enable us to generate need-based packages, with the help of these packages we can estimate actual utilization of amount using the probability distribution concept. After data visualization we observed that the gamma distribution fits the data as shown in Figure 4. The process of optimization depends on the difference between T real medical benefits amount and ϕ computed medical benefits amount. The data-driven analysis using a probability distribution plot is performed for each group in the specified category. For each group, we evaluate the mean μ and standard deviation σ . It is observed that as the amount increases the number of employees availing that amount is reduced. This fact can be observed from the shape of the gamma distribution curve. When we get a value for a standard deviation for each group, we know this fact that standard deviation is used as a constant in the probability distribution. Within the gamma distribution, we can move towards the right side of the mean and can get estimates of how many employees are getting out of pocket while availing healthcare services. All medical expenses equal and greater than μ + 2 σ are not affordable for employees. These estimates let us find out the number of out of pocket employees in each category before and after optimization.The same concept is used for finding lower and upper bound for the premium amount for each group of all categories. We set the bracket from μ till μ + σ as the safe range for estimating the premium amount for the specified group. The members in one group can be moved to another group depending upon the amount that the employee has utilized. This optimization methodology introduces a model for employers and insurance companies. The complexity of these three algorithms is also not a limiting factor as the planning is only done once a year for deciding the packages for any enterprise.

4. Results and Analysis

The database used for medical benefit optimization methodology is depicted in Figure 5.
For medical benefit optimization methodology, we need to prepare data first. The table named facttable_data contains all the prepared columns for each employee. The table named as ’centers’ contain centroids after the kmeans algorithm. Euclidean_distance table contains results after computing distance from all centroids. Member_center contains members of each center.

4.1. Data Preparation

The data is prepared before the implementation of the Kmeans algorithm. The set of attributes providing details about the availed and provided services are shown in Table 3.
The queries which are applied on transactional data are provided below.
CREATE TABLE facttable_data (
EMP_ID varchar(255),
age float,
total_visits float,
total_amount float,
relation_status varchar(255),
relation float,
gender float,
CATEGORY nvarchar(255))
After creating table named facttable_data, we are going to insert values in this table by executing stored procedure ’prepareDataForKmeans’ Appendix A and following queries. As it is already mentioned that FACTTABLE is the main table which contains all transactions of last five years.
INSERT INTO facttable_data (EMP_ID,
total_visits,
total_amount, relation_status)
Select EMP_ID, COUNT(EMP_ID)
as total_visits,
SUM(amount) as total_amount,
STRING_AGG(RELATION_ ,’,’)
as relation_status
from FACTTABLE GROUP BY EMP_ID
In the original data set, relation and gender both are of string data type, but we define them as floats for further processing. This is done by using the following query.
UPDATE facttable_data
SET facttable_data.age
= employee.AGE,
facttable_data.CATEGORY
= employee.CATEGORY,
facttable_data.gender =
(Case when employee.gender = ’M’
then 1 when employee.gender
= ’F’ then 0 else 0 end)
FROM facttable_data
INNER JOIN employee
on facttable_data.EMP_ID
= employee.EMP_ID
For relation(status) we used 1 for married and 0 for single.
UPDATE facttable_data
SET facttable_data.relation =
(Case when
facttable_data.relation_status
like ’%SON%’or
facttable_data.relation_status
like ’%WIFE%’or
facttable_data.relation_status
like ’%DAUGHTER%’or
facttable_data.relation_status
like ’%HUSBAND%’  then 1
else 0 end )
The proposed methodology is implemented in eclipse using JAVA programming language. The screenshots of the computed values are provided in Figure 5, Figure 6, Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11.
Kmeans clustering algorithm is used for performing clustering. For Kmeans we need to enter the number of clusters and number of iterations as an input to the kmeans algorithm. We select the number of clusters by using the Silhouette score method. In Figure 6, it can be seen that peaks are at 2, which identifies the optimum number of cluster for all the categories in the provided data. Based on the Silhouette score, we kept the number of clusters as 2. The number of iterations is decided by observing the change in centroids values. Once the number of clusters and maximum iterations is decided. Then, need-based packages are generated for each category of employees. Let us first analyze the already in use packages for each category and then we discuss in detail the generated need-based packages for each category.
From Table 4, it can be seen that there is a single premium amount for each type of category. It seems that while designing such packages insurance companies did not consider the need of employees. The need of any person can be analyzed by different parameters like age, marital status, gender, amount and number of visits, etc. By considering above mentioned parameters, it can be seen that based on any person’s need each category can be divided into two groups. The proposed methodology generates packages by dividing each category into further two classes based on the above-mentioned parameters.
Table 5 depicts a need-based package for category A. The methodology further divides category A into two groups. It can be seen that five attributes are considered for defining the need of any employee age, gender, relation status(relationship status), amount, number of visits. The Sum amount is the sum of all employees amount in the first group. In category A, age is not the distinguishing attribute, all-male employees whose ages are greater than and equal to 30 and are married come in the first group. The premium amount for each employee in the first group is Rupees 55,205. All the female employees whose age is greater than and equal to 30 and are married come in the second group and the premium amount for each employee in this group is rupees 13,189. The total premium amount for category A α A is six million six hundred thirty-six thousand fifty-six hundred and seventy rupees. The total number of employees in category A is 170. The main purpose of this analysis is to define the premium amount according to the need of employees. Out of 170 employees, not all the employees visit the hospital for taking healthcare services every year but still employer is paying the huge and same amount of premium for all the employees of category A. From Table 5, it can be seen that the sum amount is the sum of amounts of all employees in one group. We use equation 1 for computing the number of employees for both groups. The number of employees in generated packages is 64, which is less than 170. This analysis on five years data proves that 40% of total employees in category A visit hospital. So the premium amount is computed accordingly. Equation (3) is used for computing the total sum amount for all groups of one category.
In Table 6, it can be seen that all employees whose ages are less than and equal to 26, male and married are in group I. For all employees whose ages are greater than and equal to 31, female and married, there is a total of 10 employees in this category and only 4 of them visited the hospital for availing health services. Again, 40% of employees are using health care services. Premium amount for each group in category B is computed using Equation (3).
Table 7 depicts the package for category C. Group I in category C consists of all employees whose ages are less than and equal to 28, female and married. The annual premium for the group I is rupees 17,208. In group II, all employees whose ages are greater and equal to 29, male and married. The premium amount for each employee in this group of category C is 58,784 rupees. The number of employees in group I are 318 and in group 2 are 150. The original number of employees in category C is 985. The number of employees who avail health services is 468. In category C, almost 50% employees are availing healthcare services.
Table 8 depicts the package for category D, employees whose ages are less than and equal to 28, male, married are in a group I and all employees whose ages are greater than and equal to 29, male and single are in group II. The original number of employees in category D is 252. Using equation1, the numbers of employees in both groups are computed for category D. Forty-nine employees are in a group I and one hundred and fifty-seven employees are in group II. In category D, 85% employees are availing healthcare services.
Table 9 depicts the package for category E, all employees whose ages are less than and equal to 28, male and married are in group I and all employees whose ages are greater than and equal to 29, female and married are in group II. The premium amount for each employee in group I is 54,665 rupees and the premium amount for each employee in group II is 13,802. The original number of employees in category E is 1991. In category E, 50% of employees are availing health care services.
In Table 10 package for the category, F is shown, all employees whose ages are 28, and are male and married are in one group and all employees whose ages are 26, and are female and single are in group II. In this category, only 40% are using healthcare services. The premium amount for each employee in group I is 41,191 and for each employee in group II is 7,768.
Table 11 depicts the package for category G. Employees who are married, male and aged less than and equal to 30 are in group I and all-female employees with age greater than and equal to 30, married are in group II. The premium amount for each employee in group I is 96,912 rupees and the premium amount for each employee in group II is 19,656. The total number of employees in category G is 1619. About 82% employees are using health services.
The current medical expense for 5143 employees and the total computed medical expense in rupees are depicted in Figure 7, which is the output of Algorithm 2.
Now there is a need for data-driven analysis. The probability distribution which fits our data is ’Gamma’. Let us first consider Category A. The probability distribution for each group in category A is shown in Figure 8 and Figure 9.
We combine both groups in category A and compute the total amount for this category. μ is mean, ϕ is the original medical amount and σ is the standard deviation. For data-driven analysis, we used a gamma probability distribution model. The parameters of gamma distribution for both groups of category A are shown in Table 12.
In category B there are four employees in group I and only one employee is in group II. In category C there are two groups.
In Figure 10 and Figure 11, a histogram with a distribution fit(histfit) is plotted for group I and group II in category C. In group I, the mean value is around 10,000 and standard deviation σ is 40,000. In group II, it can be seen that mean value μ is around 25,000 and standard deviation σ is 19,690.
The parameters a and b for gamma distribution are computed for both groups of category C as shown in Table 13.
In group I, of category D, the mean μ is around 54,087, and σ is around 42,686. In group II, the mean μ is 10,812 and the standard deviation σ is 11,105.67 as shown in Table 14. The probability distribution model for groups of category D is depicted in Figure 12 and Figure 13.
Similarly, gamma parameters for all categories E, F and G are computed as shown in Table 15.
Table 16 depicts the number of out of pocket employees in the current medical benefits amount and out of pocket employees in optimized benefit amount using Gamma distribution. The term out of pocket is used to explain how many employees are spending additional amounts for availing healthcare services and getting financially overburdened.
In Figure 14, outlier are detected and these outliers are basically depicting the out of pocket employees in each category.
In Figure 15, out of pocket employees before and after medical benefit optimization are shown. The y-axis on the left side is depicting the number of out-of-pocket employees and the y-axis on the right side is depicting the number of out of pocket employees after optimization. The x-axis is depicting categories used for insurance coverage.
In Table 17, the gamma distribution is depicted for each group in all categories. If we move to the right of gamma distribution each time by adding μ to σ . As we know the concept of scaling in distributions, we multiply constant with standard deviation σ as depicted in Table 9. It can be observed that the value of the probability density function decreases as we move right to the μ . By using these observations we defined lower and upper bound using gamma distribution, for already generated need-based packages. From μ till μ + σ , we can set affordable premium amounts for each package in each category. However, if we move to or greater than μ + 2 σ , then there is the chance of monetary losses. From generated gamma distribution, we can define lower bound and upper bound for each group of every category. Deciding when to move from lower to upper bound is based on attributes like age, gender and marital status which are defining needs of the employees.
Table 18 is showing lower and upper bounds for setting premium amounts for each group of all categories. Employers/organizations can decide on the affordable premium amount within defined bounds exceeding these amounts can result in financial losses.
The total amount is computed by first using the mean μ as the premium amount for each group in each category. Then μ + σ as the premium amount for each group of each category. After that μ + 2 σ is computed for each group of all categories and used as a premium amount. Table 19 depicts the total amount computed using gamma distribution for categories A,B,C,D,E,F and G. The last three columns in Table 19, are probability density functions for μ + σ , μ + 2 σ and μ + 3 σ . Category B, D and F, we can see currently allocated medical amount is less than the need of the employees who are in these categories. The proposed methodology solves the real-life problem of insurance package design. The insurance packages are designed by learning from historical medical records. This data-driven analysis provides an easy way to compute out of pocket employees and to define lower/upper bound for the premium amount for all groups of every category.
After analyzing each category, we compute the overall amount by moving to the right on the probability distribution curve as depicted in Table 19. The organization can adjust the amount of their medical benefit by using our proposed methodology. The need of the employees can be changed, so our model enables employers to move an employee from one group to another.

4.1.1. Case 1: T < ϕ

In this case no optimization is required.

4.1.2. Case 2: T > ϕ

There can be a case in which T is greater than the ϕ optimized amount, then there is a need of optimization. The scale parameter σ is multiplied two or three times and then added to the μ , according to the scenario.

4.1.3. Case 3: Need of Employees Changes with Time

There can be a case in which employee healthcare needs change due to multiple reasons for example, he/her gets married, requirements with ageing increase, etc. Thus, the employee group changes and his insurance package is accordingly updated. In such cases, probability distributions will be generated again for each category.

4.1.4. Case 4: T = ϕ

The ideal case is the case where T the current medical benefit is equal to the optimized medical benefit ϕ . In such cases, no more optimization is required.

4.1.5. Observation

Figure 16, depicts the real number of employees before the optimization methodology is applied in the left y-axis, and the number of employees in each category after medical benefit optimization methodology is applied is depicted by the right y-axis. The number of employees in each category is reduced after the optimization methodology is applied. This proves that not all insured employees are using healthcare services. The observation is that organization is paying extra premium amounts as not all employees are using healthcare services frequently.
It can be seen from Figure 17 that the allocated benefit amount is not fully utilized.The x-axis is representing categories and the y-axis is depicting the percentage of amount utilization in each category.
In Figure 18, it can be seen that after optimization total amount for each category is reduced. It is observed that in the three categories B, D and F, there are out of pocket employees. They paid an additional amount from their pocket for using healthcare services.
Figure 19 depicts the data-driven analysis of amounts for each category using a gamma distribution. After the optimization, it is necessary to compute the percentage of optimization achieved. Optimization percentage is computed by using Equation (6):
P = ( ϕ T ) T 100 = 29288429.57 116446432.6 100 = 25 %
ϕ is computed optimized amount and T is current total amount, both amounts are shown in Figure 12. The T is optimized to ϕ by 25% with the help of proposed methodology. This optimization is achieved as we have divided each category into two groups, each group has different sum amount and different premium amounts. By using these dynamic amounts for each category, need based packages are derived.

5. Discussion

The techniques and studies which are discussed in Section 2, shows a trend of applying machine learning techniques and gaining gradual recognition and acceptance for solving insurance-related issues in the healthcare industry. The focus of most of the research is more on cost efficiency and fraud detection. The designs are based on prediction, analysis and evaluation of healthcare services, operations and resources. It is observed that the data-driven analysis using probability distributions is the most helpful platform for analyzing existing loopholes. Optimization is an interdisciplinary term, which combines mathematics, computer science, economics and engineering. There is a pervasive requirement of the applicability of optimization approaches to healthcare-related problems. The optimization approaches improve the provision of healthcare services by making them more cost-effective and efficient. It is a good practice to perform experimentation, using different values for the parameters considered in the optimization problem, to verify the robustness of the optimization results. For this purpose, intensive experimentation is performed on medical benefits amount in a considered case study using the concepts of data clustering and a probability distribution model. After the detailed literature review in Section 2, we observed that there is a requirement of an efficient methodology to generate insurance packages that must not be designation specific rather evenly distributed fairly in all employees of an organization. None of the existing methodologies are focused on generating insurance packages and optimization of medical benefits using machine learning techniques. Our proposed methodology use machine learning concepts and perform data-driven analysis of original transactional data using probability distribution concepts. Table 20 shows the novelty of our proposed methodology.
This data-driven analysis is helpful in ensuring longterm sustainability of the insurance programs. The main purpose of using Kmeans is to get a group of all similar employees in each category. Further processing on centroids is performed to get better premium amounts for packages. Subsequently, the gamma probability distribution is applied to optimize the amount according to the requirement. There is a chance that any employee’s group can be changed, and accordingly, the amount can be optimized. The probability distribution for each category is computed again.

6. Practical Implications

Healthcare expenditure is continuously rising, especially in developing and low-income countries. According to the World Health Organization’s report published in 2016, the annual rise in healthcare costs in developing countries is around 6% greater than 4% of the developed countries.
In Pakistan, unluckily, the healthcare sector is among the most neglected sectors and is spending only 3% of its gross domestic product (GDP) on the education sector, health, and nutrition. Since the last decade, the government allocates 0.5pc to 0.8pc of its GDP for the health sector, less than the WHO benchmark that is 6pc of GDP. According to the World Bank report (April 2017), this figure is significantly less than other countries.
In most countries, including Pakistan, the government has just initiated medical support programs through several national-level initiatives. One of these initiatives is the establishment of the Prime Minister Task Force on IT and Telecom (in 2018) to lay down the foundation of data standards and annotations for incorporating the improved plans in healthcare service delivery to the common person. The Sehat card scheme was recently introduced in Pakistan to provide need-based insurance packages. The proposed methodology can be adopted to improve the provision of health insurance in healthcare institutions to control their overall expenditures. The designed methodology is beneficial for insurance companies as well as for enterprises.

7. Limitations

With the increase in data size, cloud computing will be required for processing. Although, the complexity of proposed three algorithms is not a limiting factor as the planning is only done once a year for deciding the packages for any enterprise. We have used age, gender, marital status, relation, amount and number of visits during the design of need based insurance packages. As more attributes are added analysis can produce more cost effective packages. So the proposed methodology can be extended by adding more features that can contribute to the evaluation of employee needs. It would reflect better and improved performance if applied for a wider perspective. We received data in a raw form and proper preprocessing of data was required before implementation. The data acquisition from well-reputed hospitals is a troublesome task and is the main hurdle in the implementation of this type of methodology.

8. Conclusions and Future Work

In the healthcare industry, there is a dire need to focus on health insurance issues and to introduce efficient cost-effective techniques for fair distribution of insurance facilities. The linkage of healthcare benefits with employee roles in the organization is one of the critical problems. There is a need to replace the current strategies with methodologies that can ensure need-based healthcare benefits to the employees. This will not only minimize the chances of fraud/misutilization of healthcare benefits but will also enhance the sense of health security among the employees irrespective of their grades or designations. The existing studies are based on practices and strategies, emphasized funds prediction, customer management and estimation of insurance premium amounts. Our proposed methodology generated need-based packages using a machine learning model based on K_means clustering. With the help of this model, we have computed the optimum premium amount. The data-driven analysis based on the probability distribution model is used for analyzing the adjustment of premium amounts within each package depending upon the available financial resources. The proposed methodology is validated via five years’ transactional data of employees of a large enterprise. The results indicate that the medical premium amount is optimized by 25% of the current benefit amounts. Therefore, if adopted, it will not only allow employers and insurance companies to design suitable insurance schemes for the provision of healthcare benefits but will also prevent financial losses in the long run. By such extensive analysis, we can understand the power of insurance industry policies which can save millions of lives but at the same time can result in a disaster. In this research, we have used only data from one enterprise, but shall be applicable to other similar enterprises, as the algorithm easily scales for several employees and the number of clusters.

Author Contributions

Conceptualization, I.M. and S.A.K.; methodology, I.M. and S.A.K.; software, I.M.; validation, I.M. and S.A.K.; formal analysis, I.M.; investigation, S.A.K.; resources, I.M.; data curation, S.A.K.; writing—original draft preparation, I.M.; writing—review and editing, F.H., W.H.B., R.R., F.K.; visualization, I.M.; supervision, S.A.K.; project administration, I.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Acknowledgments

The authors thank Shifa International Hospital, Islamabad, Pakistan for providing the employee’s insurance data to validate our proposed methodology. This work is part of PM Task Force on IT and Telecom initiative that genuine people can easily get medical benefits from the government level initiative and is presented as an initial proposed methodology to the taskforce for fraud prevention and package generation.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Stored Procedure
CREATE PROCEDURE prepareDataForKmeans
(
@category nvarchar(255)
)
AS
BEGIN
DECLARE @amountAvg float
DECLARE @ageAvg float
DECLARE @visitAvg float
DECLARE @relationAvg float
DECLARE @genderAvg float
DECLARE @amountStdev float
DECLARE @ageStdev float
DECLARE @visitStdev float
DECLARE @relationStdev float
DECLARE @genderStdev float
SET @amountAvg = (
SELECT AVG(total_amount)
FROM facttable_data
);
SET @visitAvg = (
SELECT AVG(total_visits)
FROM facttable_data
);
SET @ageAvg = (
SELECT AVG(age)
FROM facttable_data
);
SET @relationAvg = (
SELECT AVG(relation)
FROM facttable_data
);
SET @genderAvg = (
SELECT AVG(gender)
FROM facttable_data
);
SET @amountStdev = (
SELECT STDEV(total_amount)
FROM facttable_data
);
SET @visitStdev = (
SELECT STDEV(total_visits)
FROM facttable_data
);
SET @ageStdev = (
SELECT STDEV(age)
FROM facttable_data
);
SET @relationStdev = (
SELECT STDEV(relation)
FROM facttable_data
);
SET @genderStdev = (
SELECT STDEV(gender)
FROM facttable_data
);
select EMP_ID, (age - @ageAvg) / @ageStdev as age,
(total_visits - @visitAvg) / @visitStdev as total_visits,
(total_amount - @amountAvg) / @amountStdev as total_amount,
(relation - @relationAvg) / @relationStdev as relation,
(gender - @genderAvg) / @genderStdev as
gender from facttable_data where CATEGORY = ’A’
END

References

  1. Rao, S. Health insurance: Concepts, issues and challenges. Econ. Political Wkly. 2004, 39, 3835–3844. [Google Scholar]
  2. Radermacher, R.; Dror, I.; Noble, G. Challenges and strategies to extend health insurance to the poor. In Protecting the Poor: A Microinsurance Compendium; ILO: Geneva, Switzerland, 2006. [Google Scholar]
  3. Ding, K.; Lev, B.; Peng, X.; Sun, T.; Vasarhelyi, M.A. Machine learning improves accounting estimates: Evidence from insurance payments. Rev. Account. Stud. 2020, 25, 1098–1134. [Google Scholar] [CrossRef]
  4. Groba, C.; Sartal, A.; Vázquez, X.H. Solving the dynamic traveling salesman problem using a genetic algorithm with trajectory prediction: An application to fish aggregating devices. Comput. Oper. 2015, 56, 22–32. [Google Scholar] [CrossRef]
  5. Dhieb, N.; Ghazzai, H.; Besbes, H.; Massoud, Y. A secure ai-driven architecture for automated insurance systems: Fraud detection and risk measurement. IEEE Access 2020, 8, 58546–58558. [Google Scholar] [CrossRef]
  6. Kowshalya, G.; Nandhini, M. Predicting fraudulent claims in automobile insurance. In 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT); IEEE: Piscataway, NJ, USA, 2018; pp. 1338–1343. [Google Scholar]
  7. Grize, Y.-L.; Fischer, W.; Lützelschwab, C. Machine learning applications in nonlife insurance. Appl. Stoch. Model. In Business Ind. 2020, 36, 523–537. [Google Scholar] [CrossRef]
  8. Itri, B.; Mohamed, Y.; Mohammed, Q.; Omar, B. Performance comparative study of machine learning algorithms for automobile insurance fraud detection. In 2019 Third International Conference on Intelligent Computing in Data Sciences (ICDS); IEEE: Piscataway, NJ, USA, 2019; pp. 1–4. [Google Scholar]
  9. Hossain, S.S.M.R.; Salman, S.M. Implementation challenges of the mandatory health insurance scheme. Bull. Natl. Res. Centre 2019, 43, 151. [Google Scholar] [CrossRef] [Green Version]
  10. Krasheninnikova, E.; García, J.; Maestre, R.; Fernández, F. Reinforcement learning for pricing strategy optimization in the insurance industry. Eng. Appl. Artif. Intell. 2019, 80, 8–19. [Google Scholar] [CrossRef]
  11. Sundarkumar, G.G.; Ravi, V. A novel hybrid undersampling method for mining unbalanced datasets in banking and insurance. ENgineering Appl. Artif. Intell. 2015, 37, 368–377. [Google Scholar] [CrossRef]
  12. Subudhi, S.; Panigrahi, S. Use of optimized fuzzy c-means clustering and supervised classifiers for automobile insurance fraud detection. J. King Saud-Univ.-Comput. Inf. Sci. 2020, 32, 568–575. [Google Scholar] [CrossRef]
  13. Challa, A. Insurance Models and Risk-Function Premium Principle. 2012. Available online: https://www.semanticscholar.org/paper/Insurance-models-and-risk-function-premium-Challa/1f19f01beafdb3c451861c9275901e68ab3a0377 (accessed on 20 May 2021).
  14. Mazzoccoli, A.; Naldi, M. The expected utility insurance premium principle with fourth-order statistics: Does it make a difference? Algorithms 2020, 13, 116. [Google Scholar] [CrossRef]
  15. Bianchi, D.; Büchner, M.; Tamoni, A. Bond risk premiums with machine learning. Rev. Financ. Stud. 2020, 34, 1046–1089. [Google Scholar] [CrossRef]
  16. Gan, L.; Wang, H.; Yang, Z. Machine learning solutions to challenges in finance: An application to the pricing of financial products. Technol. Forecast. Soc. Chang. 2020, 153, 119928. [Google Scholar] [CrossRef]
  17. Henckaerts, R.; Côté, M.-P.; Antonio, K.; Verbelen, R. Boosting insights in insurance tariff plans with tree-based machine learning methods. North Am. Actuar. J. 2020, 25, 255–285. [Google Scholar] [CrossRef]
  18. Kuo, K.; Lupton, D. Towards explainability of machine learning models in insurance pricing. arXiv 2020, arXiv:2003.10674. [Google Scholar]
  19. Kröger, V.; Nordström, R. Expected Individual Insurance Cost Based on Driving Pattern: Machine Learning Methods Using Telemetric Data; Digitala Vetenskapliga Arkivet: Uppsala, Sweden, 2020. [Google Scholar]
  20. Gan, G. Application of data clustering and machine learning in variable annuity valuation. Insur. Math. Econ. 2013, 53, 795–801. [Google Scholar] [CrossRef]
  21. Spedicato, G.A.; Dutang, C.; Petrini, L. Machine Learning Methods to Perform Pricing Optimization. A Comparison with Standard Glms. 2018. Available online: https://www.semanticscholar.org/paper/Machine-Learning-Methods-to-Perform-Pricing-A-with-Spedicato-Dutang/6a51b2c8557acde21389193ea86f3d00482036c3 (accessed on 20 May 2021).
  22. Collopy, F.; Nard, C.A.; Amin, H.S.; Turocy, G.; Takieh, S.V.S.; Krosky, R.C.; Noonan, D.; Narvaez, G.A.; Asquith, B. Dynamic Insurance Rates. US Patent App. 12/536,999, 27 May 2010. [Google Scholar]
  23. Abdalkareem, Z.A.; Amir, A.; Al-Betar, M.A.; Ekhan, P.; Hammouri, A.I. Healthcare scheduling in optimization context: A review. Health Technol. 2021, 11, 445–469. [Google Scholar] [CrossRef]
  24. Eling, M.; Nuessle, D.; Staubli, J. The impact of artificial intelligence along the insurance value chain and on the insurability of risks. In The Geneva Papers on Risk and Insurance-Issues and Practice; Springer: Berlin/Heidelberg, Germany, 2021; pp. 1–37. [Google Scholar]
  25. Hassan, A.K.I.; Abraham, A. Modeling insurance fraud detection using imbalanced data classification. In Advances in Nature and Biologically Inspired Computing; Springer: Berlin/Heidelberg, Germany, 2016; pp. 117–127. [Google Scholar]
  26. Jha, B.K.; Sivasankari, G.; Venugopal, K. Fraud detection and prevention by using big data analytics. In 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC); IEEE: Piscataway, NJ, USA, 2020; pp. 267–274. [Google Scholar]
  27. Matloob, I.; Khan, S.A.; Rahman, H.U. Sequence mining and prediction-based healthcare fraud detection methodology. IEEE Access 2020, 8, 143256–143273. [Google Scholar] [CrossRef]
  28. Matloob, I.; Khan, S.; Hussain, F.; Rahman, H. Medical health benefit management system for real-time notification of fraud using historical medical records. Appl. Sci. 2020, 10, 5144. [Google Scholar] [CrossRef]
  29. Singh, D.; Kumar, P. Conceptual mapping of insurance risk management to data mining. Int. J. Comput. Appl. 2012, 975, 8887. [Google Scholar] [CrossRef]
  30. Soeini, R.A.; Rodpysh, K.V. Applying data mining to insurance customer churn management. Int. Proc. Comput. Sci. And Information Technol. 2012, 30, 82–92. [Google Scholar]
  31. Bhatnagar, V.; Ranjan, J.; Singh, R. Analytical customer relationship management in insurance industry using data mining: A case study of indian insurance company. Int. J. Netw. Virtual Organ. 2011, 9, 331–366. [Google Scholar] [CrossRef]
  32. Zhikun, X.; Yanwen, W.; Zhaohui, L. Optional insurance compensation rate selection and evaluation in financial institutions. Int. e-Serv. Sci. Technol. 2014, 7, 233–242. [Google Scholar]
  33. Goonetilleke, T.O.; Caldera, H. Mining life insurance data for customer attrition analysis. J. Ind. Intell. Inf. 2013. [Google Scholar] [CrossRef]
Figure 1. Description of problems to be addressed.
Figure 1. Description of problems to be addressed.
Applsci 11 08478 g001
Figure 2. Graphical Summary of Proposed Methodology.
Figure 2. Graphical Summary of Proposed Methodology.
Applsci 11 08478 g002
Figure 3. Clusters within categories.
Figure 3. Clusters within categories.
Applsci 11 08478 g003
Figure 4. Probability distribution model for employees insurance records.
Figure 4. Probability distribution model for employees insurance records.
Applsci 11 08478 g004
Figure 5. Relational database for medical benefit optimization methodology.
Figure 5. Relational database for medical benefit optimization methodology.
Applsci 11 08478 g005
Figure 6. Determining Number of clusters.
Figure 6. Determining Number of clusters.
Applsci 11 08478 g006
Figure 7. Total Medical Amount Comparison.
Figure 7. Total Medical Amount Comparison.
Applsci 11 08478 g007
Figure 8. HISTFIT for group 1 in Category A.
Figure 8. HISTFIT for group 1 in Category A.
Applsci 11 08478 g008
Figure 9. HISTFIT for group 2 in Category A.
Figure 9. HISTFIT for group 2 in Category A.
Applsci 11 08478 g009
Figure 10. HISTFIT for group 1 in Category C.
Figure 10. HISTFIT for group 1 in Category C.
Applsci 11 08478 g010
Figure 11. HISTFIT for group 2 in Category C.
Figure 11. HISTFIT for group 2 in Category C.
Applsci 11 08478 g011
Figure 12. HISTFIT for group 1 in Category D.
Figure 12. HISTFIT for group 1 in Category D.
Applsci 11 08478 g012
Figure 13. HISTFIT for group 2 in Category D.
Figure 13. HISTFIT for group 2 in Category D.
Applsci 11 08478 g013
Figure 14. Out of pocket employees in original data.
Figure 14. Out of pocket employees in original data.
Applsci 11 08478 g014
Figure 15. Number of Out of Pocket employees before and after Optimization.
Figure 15. Number of Out of Pocket employees before and after Optimization.
Applsci 11 08478 g015
Figure 16. Number of employees in each category.
Figure 16. Number of employees in each category.
Applsci 11 08478 g016
Figure 17. Percentage of Utilization Amount in each category.
Figure 17. Percentage of Utilization Amount in each category.
Applsci 11 08478 g017
Figure 18. Comparison of Utilization Amount in each category.
Figure 18. Comparison of Utilization Amount in each category.
Applsci 11 08478 g018
Figure 19. Data Driven Analysis using Gamma Distribution.
Figure 19. Data Driven Analysis using Gamma Distribution.
Applsci 11 08478 g019
Table 1. Important Terminologies.
Table 1. Important Terminologies.
TerminologyDescription
InsuredThe employee and his family members, who are availing the insurance policy.
Sum amountA maximum amount paid by the insurance company in case
the insured person gets hospitalized.
For example, the sum insurance of an employee is
PKR 3 Lacs and got hospitalized three times in a year.
On his first treatment, the billed amount is 50 thousands PKR,
on his second and third treatments, the amount is of PKR 1 Lac and
2 Lacs, respectively. The total bill amount for the year is PKR 3.5 Lacs
which exceeds the sum insured of the employee hence, paid by the insured person.
Premium amountThe fixed annual amount paid by the insured.
In the case of organizations or employers, insured are
employees who are availing insurance policy, and their
employers are the payer of premium amounts to insurance companies.
PayerIt is the employer/ organization paying the premium.
Insurance policyIt is a legal document that includes details
regarding particular insurance coverage for an insured.
ContractIt is the legal agreement between the insurer and insured.
Table 2. Notation with description.
Table 2. Notation with description.
NotationDescription
φ depicts transactional data of employees.
kThere are seven categories of employees in this organization and each category is availing separate
insurance plan. Categories are represented by k where k= (A B,C,D,E,F,G)
ϕ denotes total computed medical benefit.
α j denotes total amount in each category.
β i depicts packages for k categories.
α G i depicts amount of each group in each category
Ois the Premium amount which the hospital/organization is paying.
The organization is giving total premium T and for each category there is a separate value of
premium amount
ϵ Number of total employees are denoted by E and number
of employees in each group G x of each category are denoted by ϵ
V i is the out of pocket employees in each category i = k where k = A,B,C,D,E,F,G.
Table 3. Attributes.
Table 3. Attributes.
NameData Type
EMP_IDvarchar(255)
AgeFloat
Total_visitsFloat
Total_amountFloat
Marital_statusFloat
RelationFloat
GenderFloat
Categorynvarchar(255)
Table 4. Category wise Annual Premium amount and number of employees.
Table 4. Category wise Annual Premium amount and number of employees.
Category TypePremium Amount (PKR)Number of Employees
A39,036170
B17,466.65
C31,922985
D13,075.57252
E26,3091991
F88621619
G68,811120
Table 5. Need based Package for Category A.
Table 5. Need based Package for Category A.
AgeGenderRelation StatusSum AmountAmountVisit
30.00.01.02,484,22555,205.02700
30.01.01.0250,59113,189.0342
Table 6. Need based Package for Category B.
Table 6. Need based Package for Category B.
AgeGenderRelation StatusSum AmountAmountVisit
26.00.01.066,21233,106.0116.0
31.01.01.040,51820,259.034.0
Table 7. Need based Package for Category C.
Table 7. Need based Package for Category C.
AgeGenderRelation StatusSum AmountAmountVisit
28.01.01.02,460,74417,208.03003.0
29.00.01.018,693,31258,784.021,306.0
Table 8. Need based Package for Category D.
Table 8. Need based Package for Category D.
AgeGenderRelation StatusSum AmountAmountVisit
28.00.01.02,488,04854,088.03036.0
29.00.00.01,697,48410,812.01727.0
Table 9. Need based Package for Category E.
Table 9. Need based Package for Category E.
AgeGenderRelation StatusSum AmountAmountVisit
28.00.01.030,557,73554,665.036,894.0
29.01.01.04,182,00613,802.05151.0
Table 10. Need based Package for Category F.
Table 10. Need based Package for Category F.
AgeGenderRelation StatusSum AmountAmountVisit
28.00.01.010,750,85141,191.013,572.0
26.01.00.08,171,9367768.010,520.0
Table 11. Need based Package for Categoy G.
Table 11. Need based Package for Categoy G.
AgeGenderRelation StatusSum AmountAmountVisit
30.00.01.03,779,56896,912.04046.0
30.01.01.01,631,44819,656.01826.0
Table 12. Parameter Computation for Category A.
Table 12. Parameter Computation for Category A.
Groupa(Shape)b(Scale) μ σ
11.016512,868.713,08112,974
21.1717247,418.555,561.251,328
Table 13. Parameter Computation for Category C.
Table 13. Parameter Computation for Category C.
Groupa(Shape)b(Scale) μ σ
18.9108311,528.910,273234,414.82
21.138122,539.625,652.424,045.73
Table 14. Parameter Computation for Category D.
Table 14. Parameter Computation for Category D.
Groupa(Shape)b(Scale) μ σ
11.2610842,889.954,087.848,164.51
20.9478231407.210,81211,105.67
Table 15. Parameters estimation for Categories E, F and G.
Table 15. Parameters estimation for Categories E, F and G.
CategoryGroupa(Shape)b(Scale) μ σ
E11.2610842,889.954,087.848,164.51
20.76136518,17513,837.815,858.85
F10.9440728228.137767.957994.73
21.1246936,624.541,191.338,840.83
G14.6003421,066.496,91245,183
21.3139914,95919,656.117,147.48
Table 16. Out of Pocket Employees in Current Medical Benefit and in Optimized Amount.
Table 16. Out of Pocket Employees in Current Medical Benefit and in Optimized Amount.
CategoryOut of Pocket Employees in
Current Amount
Out of Pocket Employees in
Optimized Amount
A4533
B3010
C5433
D4715
E5731
F5212
G3017
Table 17. Optimization of total medical benefit for each Category A,B,C,D,E,F and G.
Table 17. Optimization of total medical benefit for each Category A,B,C,D,E,F and G.
CategoryGroup No μ σ μ + σ μ + 2 σ μ + 3 σ P ( μ + σ ) P ( μ + 2 σ ) P ( μ + 3 σ )
A113,08112,97426,05539,02952,0041.05 × 10 5 3.84 × 10 6 1.41 × 10 6
255,56151,328106,889158,218209,5472.75 × 10 6 9.9 × 10 7 3.5 × 10 7
C1102,73234,414137,146171,560205,9745.70 × 10 6 1.60 × 10 6 3.60 × 10 7
225,65224,04549,69873,74397,789.595.80 × 10 6 2.11 × 10 6 7.50 × 10 7
D154,08848,164102,252150,416198,5812.90 × 10 6 1.07 × 10 6 3.70 × 10 7
210,81211,10521,91733,02344,1291.20 × 10 5 4.44 × 10 6 1.65 × 10 6
E154,08748,164102,2521,504,16198,5813.10 × 10 6 1.03 × 10 6 3.26 × 10 7
213,83715,85929,69745,55561,4147.80 × 10 6 2.97 × 10 6 1.15 × 10 7
F17768799515,76223,75731,7521.66 × 10 5 6.16 × 10 6 2.29 × 10 6
241,19138,84180,032118,873157,7133.59 × 10 6 1.30 × 10 6 4.60 × 10 6
G196,91245,184142,096187,279232,4634.03 × 10 6 1.27 × 10 6 3.20 × 10 7
219,65617,14736,80353,95071,0978.45 × 10 6 3.03 × 10 6 1.05 × 10 6
Table 18. Optimization of total medical benefit for each Category A,B,C,D,E,F and G.
Table 18. Optimization of total medical benefit for each Category A,B,C,D,E,F and G.
CategoryGroup #Lower BoundUpper Bound
A113,08126055.4
255,5611,06,889
C11,02,732137146
225,65249,698
D154,0871,02,252
210,81221,917
E154,0871,02,252
213,83829,696
F1776815,763
241,19180,032
G196,912142,096
219,65636,803
Table 19. Optimization of total medical benefit for each Category A,B,C,D,E,F and G.
Table 19. Optimization of total medical benefit for each Category A,B,C,D,E,F and G.
Category ϕ β μ + σ μ + 2 σ μ + 3 σ
A2,722,4996,636,05714,224,15520,743,55627,262,958
B126,98987,333231,184335,380439,575
C21,274,51231,442,54672,638,05797,809,596122,981,136
D4,347,7963,295,044.14412,159,06317,871,77523,584,488
E35,067,87352,380,635161,637,982237,239,625312,841,268
F19,504,87814,347,41260,059,01989,927,607119,796,195
G4,113,4568,257,40511,036,09515,387,10819,738,122
Table 20. Comparison with Existing Techniques.
Table 20. Comparison with Existing Techniques.
ML Related Researches in Insurance IndustryType of ResearchComparison
[8,12,25,26,27,28]Fraud DetectionAll of these researches are
proposing methodologies
for detecting
fraud in insurance industry.
 [3,6,13,19,21,22,29]Premium amount in auto insuranceThese researches focus on risk
functions premium calculation,
automobile premium computation based on
user driving pattern etc
[30,31,32,33]Customer relatedAll these researches are related
to customer management.
Proposed
Methodology
Medical benefit optimizationWe observe that in the last decade focus
of the researcher is more on
fraud detection, risk prediction and
customer management.
None of the research focus on the medical
benefit optimization. The main goal of our
research is the medical benefit
optimization from need based packages.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Matloob, I.; Khan, S.A.; Hussain, F.; Butt, W.H.; Rukaiya, R.; Khalique, F. Need-Based and Optimized Health Insurance Package Using Clustering Algorithm. Appl. Sci. 2021, 11, 8478. https://doi.org/10.3390/app11188478

AMA Style

Matloob I, Khan SA, Hussain F, Butt WH, Rukaiya R, Khalique F. Need-Based and Optimized Health Insurance Package Using Clustering Algorithm. Applied Sciences. 2021; 11(18):8478. https://doi.org/10.3390/app11188478

Chicago/Turabian Style

Matloob, Irum, Shoab Ahmad Khan, Farhan Hussain, Wasi Haider Butt, Rukaiya Rukaiya, and Fatima Khalique. 2021. "Need-Based and Optimized Health Insurance Package Using Clustering Algorithm" Applied Sciences 11, no. 18: 8478. https://doi.org/10.3390/app11188478

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop