A New Case-Mix Classification Method for Medical Insurance Payment

Abstract: Rapidly rising medical expenses can be controlled by a well-designed medical insurance payment system with the ability to ensure the stability and development of medical insurance funds. At present, China is in the stage of exploring the reform of the medical insurance payment system. One of the significant tasks is to establish an appropriate reimbursement model for disease treatment expenses, so as to meet the needs of patients for medical services. In this paper, we propose a case-mix decision tree method that considers the homogeneity within the same case subgroup as well as the heterogeneity between different case subgroups. The optimal case mix is determined by maximizing the inter-group difference and minimizing the intra-group difference. In order to handle the instability of the tree-based method with a small amount of data, we propose a multi-model ensemble decision tree method. This method first extracts and merges the inherent rules of the data by a stacking-based ensemble learning method, then generates a new sample set by aggregating the original data with the additional samples obtained by applying these rules, and finally trains the case-mix decision tree with the augmented dataset. The proposed method ensures both the interpretability of the grouping rules and the stability of the grouping. The experimental results on real-world data demonstrate that our case-mix method can provide reasonable medical insurance payment standards and appropriate medical insurance compensation for different patient groups.


Introduction
In recent years, the rapid increase in health care costs has become a troublesome issue, and the diagnosis and treatment of diseases often face intrinsic complexity and uncertainty. Therefore, the reform and improvement of medical insurance payment methods have long been anticipated in the medical community. A good medical insurance payment system should not only control the expenditure of medical insurance funds and restrain unreasonable medical behaviors, but also fairly compensate medical costs and expenses to ensure the quality and enthusiasm of medical services. Reference [1] proposed a novel case-mix classification scheme, the diagnosis-related groups (DRGs) system, which comprehensively considers factors such as disease diagnosis, disease severity, and intensity of medical service usage, and establishes a suite of clinical case classification methods based on medical resource consumption. Owing to its wide applicability to practical situations, this method has played a positive role in promoting the medical service system in the United States and effectively controlled the growth of medical expenses [2,3]. Therefore, many countries followed suit and developed their own DRG grouping systems [4,5]. However, the differences in healthcare ecosystems between countries mean that DRGs do not work equally well everywhere. In 2020, the Chinese National Healthcare Security Administration proposed the Big Data Diagnosis-Intervention Packet (DIP) grouping scheme. From a large amount of data, DIP extracts features that are closely related to the patient's medical resource consumption level and combines cases through these features [6].
The case-mix model is essentially a disease grouping system designed to improve the quality of care or cost management. In general, research on medical expenses is based on various regression models to predict disease costs [7]. However, in case-mix studies, it is more important to classify patients into clinically meaningful and understandable groups that consume similar healthcare resources. Tree methods are often used to build case-mix models owing to their intuitive and interpretable representations. Numerous authors have proposed different tree-based models. Reference [8] conducted a study on the diagnosis-related grouping of inpatient medical expenses in colorectal cancer patients based on a decision tree model. Reference [9] proposed a method to build regression trees by bootstrap and use them for model retrieval in DRG systems. The authors in [10] generated diagnostically relevant groups through the CHAID model and provided a more accurate estimate of case-mix costs. Reference [11] investigated the diagnosis-related grouping of senile cataracts based on the E-CHAID algorithm. However, tree-based models also have some drawbacks, with an undesirable tendency to overfit the data. Furthermore, tree structures are notoriously unstable, especially when the number of training samples is small, and small perturbations in the training set may cause large changes in the generated classes [12]. The diversity of tree structures also comes from the different greedy search algorithms used to identify trees. In the related literature, there are generally two approaches used to deal with tree model instability: model selection and model combination. The advantage of choosing a single tree model is that simple and interpretable rules can be produced through faster computation. Similar approaches are based on selecting a single representative decision tree using a metric that evaluates the similarity or distance between trees, see [13,14].
However, these methods are mainly used in classification problems, and the accuracy and stability of such a single tree are not as good as those of an ensemble method. Ensemble learning usually achieves higher accuracy and better generalization ability than a single classifier by generating multiple models and combining them to obtain the final prediction [15]. Common ensemble learning methods include bagging, boosting, stacking, and Bayesian model averaging [16][17][18][19]. For some problems in the medical and bioinformatics communities, it is more important to extract useful knowledge from the data than to obtain a merely accurate model. Therefore, the output of the learning model should be accurate, stable, understandable, and acceptable to people. However, most existing ensemble methods focus on improving the accuracy of prediction models while ignoring interpretability. In addition, feature selection is an important step in the process of constructing a reasonable case mix. Especially when the data dimension is relatively high, selecting an optimal input feature set from a given dataset allows machine learning models to understand and distinguish the patterns in the dataset more efficiently. At the same time, reducing the data dimension shortens the time required for subsequent computation. In recent studies, several hybrid metaheuristic-based methods have been applied to the feature selection problem with good results. For example, reference [20] proposed a genetic algorithm-based hierarchical feature selection (HFS) model to optimize local and global features extracted from images. Reference [21] proposed a binary hybrid metaheuristic-based algorithm and applied it to feature selection for COVID-19 classification.
Most of the traditional case-mix methods consider the variability among patients from a medical point of view. In contrast to these methods, this study focuses on the development of data-driven methods with the motivation to explore the differences in medical costs among different patients and to give intuitive case-mix rules. The main work and contributions of this paper are as follows: First, we propose a new case-mix decision tree model. We define a new objective function to evaluate the differences in medical resource consumption between different subgroups of cases. Minimizing this objective function leads to simultaneously maximizing the difference in medical resource consumption between heterogeneous groups and minimizing the difference within homogeneous groups.
Second, considering that the tree method tends to become unstable when the amount of data is small, we borrow the idea of stacking to extract the internal rules of the data using multiple learners of different types and combine the models based on the least-squares method. At the same time, in order to avoid overfitting, we shrink the coefficients using the ℓ2 norm as a penalty term. A new sample set is constructed from the original data and the generated rules, and a case-mix tree model is built from it. This method exploits more model information through integration and improves the accuracy and reliability of the grouping results. Finally, we validate the effectiveness of the method on real-world data and formulate an appropriate case-mix payment standard.
The rest of the paper is organized as follows. In Section 2, we give a detailed introduction of the proposed case-mix decision tree model and multi-model ensemble decision tree method. In Section 3, the grouping performances of the proposed method in comparison with CART and CHAID are evaluated under various scenarios through simulation experiments. In Section 4, from the case data provided by the Jilin Province Administration of Social Medical Insurance of China, we construct an ovarian cancer case-mix model and formulate a payment standard to provide the reference for the medical expense reimbursement of ovarian cancer patients. Finally, we conclude the paper in Section 5.

Methodology
This paper proposes a multi-model ensemble decision tree model to solve the problem of case-mix and group payment, and generates an interpretable model while ensuring reasonable grouping. In the first subsection, we introduce the case-mix decision tree model (CDT). The second part describes the multi-model ensemble decision tree method (MEDT).

Case-Mix Decision Tree
Traditional decision trees are generally used to solve classification and regression problems. They recursively divide the data space by optimizing a specific objective function and generate multiple disjoint partitions [22]. The sub-nodes corresponding to each partition have different partitioning characteristics. Therefore, we can regard each partition as a different cluster. The objective functions used to select and divide features generally vary according to the problem. For example, the information gain ratio is used in the classical C4.5 algorithm, and the Gini index or squared error is used in CART. In the case-mix problem, we intend to merge different cases into groups, formulate the medical payment standard for different case mixes according to the medical resources they consume, and recommend reasonable medical insurance compensation payments for them [23]. Case-mix methods mainly depend on the selection of grouping features, and different selections lead to different grouping outputs. For example, in the case mix of patients with cerebral infarction, different clusters are formed using different grouping features, as shown in Figure 1. In Figure 1, the blue and green lines represent the density of the cost distribution for two different patient groups, the purple and orange lines represent the average costs of the two different disease groups, and the red line represents the mean cost of all patients. From Figure 1, we can see that different grouping characteristics differ significantly in how well they differentiate patients. When the disease type is selected as the grouping characteristic, the separation between the two disease groups is obvious, and the difference between the groups is large. When choosing ethnicity or marital status as the dividing feature, the difference between the two groups is hardly distinguishable.
We aim to find a reasonable grouping method that satisfies the following two properties: first, the difference between groups should be large enough to indicate that different groups have significant differences in the consumption of medical resources, so as to identify the needs of different patients for medical care; second, the difference within groups should be as small as possible, indicating that patients in the same group have similar medical needs. From these perspectives, we propose a new method for selecting features via a decision tree. Let X and Y denote the explanatory variables and the target variable, respectively, let D = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)} be the dataset, and let R_1, R_2, ..., R_M be the M sub-regions into which the feature space is divided. Here, j and s represent the splitting variable and the split point, respectively. If we first select the j-th variable x^(j) and use s as a split point, two sub-regions can be defined as follows:

R_1(j, s) = {x | x^(j) <= s},    (1)
R_2(j, s) = {x | x^(j) > s}.    (2)

To represent the in-group difference, we adopt the following in-group variance:

Q_in(j, s) = Σ_{x_i ∈ R_1} (y_i − ȳ_{R_1})² + Σ_{x_i ∈ R_2} (y_i − ȳ_{R_2})²,    (3)

where y_i represents the consumption level of the i-th patient's medical resources, which can be reflected in medical expenses, and ȳ_{R_1} and ȳ_{R_2} represent the mean medical expenses in the sub-regions R_1 and R_2, respectively. The smaller the intra-group variance, the smaller the difference in intra-group medical resource consumption between the two disease groups after segmentation, providing stronger homogeneity. To measure the difference in medical resource consumption between different disease groups, the mean squared distance between groups is used:

Q_out(j, s) = N_1 (ȳ_{R_1} − ȳ)² + N_2 (ȳ_{R_2} − ȳ)²,    (4)

where ȳ is the mean medical expense of all cases in the current sample set before grouping, and N_1 and N_2 are the numbers of cases in R_1 and R_2. The larger the mean squared distance between groups, the greater the difference between the two disease groups after grouping, providing stronger heterogeneity between groups.
It is necessary to measure the efficiency of different grouping methods, so we define a grouping objective function:

Q(j, s) = Q_in(j, s) / Q_out(j, s).    (5)

For a case-mix method, it is better to make the differences between groups as large as possible and the differences within groups as small as possible, so finding the best grouping can be turned into the optimization problem min_{j,s} Q(j, s). The solution to this problem is similar to that of regression trees. A greedy algorithm can be used to traverse all split variables j and, for a fixed split variable j, traverse all of the split points s, so as to find the optimal split. The split variable and split point form a pair (j, s). The input space is divided into two regions in turn, and the above division process is repeated on each region until the stop condition is satisfied. The CDT algorithm is as follows:
Step 1: Find the optimal split variable j and split point s by traversing every split variable j and the corresponding split points s, and selecting the pair (j, s) that minimizes the objective in Equation (5).
Step 2: Use the selected pair (j, s) to divide the area and determine the corresponding output.
Step 3: Continue to repeat Steps 1 and 2 for the two sub-regions until the stop condition is satisfied.
Step 4: Divide the input space into M sub-regions R_1, R_2, ..., R_M. For a sample x, the predicted value is given by

f(x) = Σ_{m=1}^{M} ȳ_{R_m} I(x ∈ R_m),    (6)

where I(x ∈ R_m) is the indicator function and all pairs (j, s) correspond to the splitting characteristics of R_m. A "fully grown" tree often overfits the data, so it is necessary to set a certain early-stop condition during the tree's growth. Furthermore, a procedure analogous to backward selection is used to prune the tree by cutting off unneeded leaf nodes [24]. A tree T with leaf nodes {N_k} is written as T = {N_1, N_2, ..., N_K}. We then calculate the cost-complexity [25] of tree T with the following formula:

D_α(T) = Σ_{k=1}^{|T|} Σ_{x_i ∈ N_k} (y_i − ȳ_{N_k})² + α|T|,    (7)

where α > 0 is the cost-complexity parameter and |T| is the number of leaf nodes in the tree. For a fixed α, there exists a subtree that minimizes D_α(T), denoted by T_α. Note that the optimal subtree T_α tends to be simple for large α and complex for small α. Reference [25] showed that the tree sequence minimizing D_α(T) is nested and that trees can be pruned recursively; cross-validation is commonly used to choose an appropriate subtree.
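As an illustration of Steps 1-3, the greedy split search can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes numeric covariates and reads the Q(j, s) criterion as the ratio of within-group variance to between-group distance, which is one way to make minimization simultaneously shrink within-group differences and grow between-group differences.

```python
import numpy as np

def split_objective(y, mask):
    """Q(j, s): within-group variance divided by between-group
    mean squared distance (smaller is better). `mask` selects R1."""
    y1, y2 = y[mask], y[~mask]
    if len(y1) == 0 or len(y2) == 0:
        return np.inf  # degenerate split
    within = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()
    between = (len(y1) * (y1.mean() - y.mean()) ** 2
               + len(y2) * (y2.mean() - y.mean()) ** 2)
    return np.inf if between == 0 else within / between

def best_split(X, y):
    """Greedily traverse every split variable j and candidate split
    point s, returning the pair (j, s) with the smallest objective."""
    best = (None, None, np.inf)
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:  # exclude max: both sides non-empty
            q = split_objective(y, X[:, j] <= s)
            if q < best[2]:
                best = (j, s, q)
    return best
```

Recursing on the two child regions until a stop condition holds, and then pruning, yields the full CDT procedure.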

The MEDT Algorithms
Due to the data collection burden and personal privacy issues, the amount of case data is often relatively small, making the single-tree model prone to structural instability. To address this issue, this paper proposes and evaluates a novel model combination method that combines the accuracy and stability of multiple models with the interpretability of a single model. This method uses stacking to combine the metadata generated by multiple base learners through a meta-learner. During model combination, the multiple learners jointly learn from the data and extract rules, which are then combined in some way to produce a new model. This model is based on the understanding of the data generated by multiple learners, i.e., in general a mapping of multiple models, which makes their individual rules difficult to disentangle. Many factors, such as the patient's physiological characteristics, disease severity, and treatment received, are closely related to the medical resources consumed by the patient. In the case-mix problem, we intend to find features that have significant impacts on patient healthcare costs and divide the population into different subgroups based on these features. In this way, reference medical expenses for the different subgroups can be obtained. A good grouping method should show the degree of influence of different features on the results while demonstrating the obtained rules in an intuitive and understandable way, such as a hierarchical structure or a tree diagram. The model obtained by combining the base learners through the meta-learner can be considered an explanation of the relation between the patient's physiological characteristics, disease severity, received treatment, and the medical resources consumed. Although this explanation may not be clear enough, it does not affect the grouping of patients generated according to the importance of features.
On the premise that the learned model is "true", we can extract a variety of rules through the above method and combine them to give an "explanation" of the data generation mechanism. To avoid overfitting, we use K-fold cross-validation and build a new set of synthetic samples based on the newly generated "rules"; these are aggregated with the original training set to form a new training set. This new training set contains not only the information in the original data but also the "rule" information extracted by the various learners, so the information it contains is more comprehensive. In general, the accuracy and stability of a learner tend to improve with the size of the training set. Therefore, more accurate and stable case-mix pricing results can be obtained from the augmented training set than from the original data alone. The procedure just described is called the multi-model ensemble decision tree (MEDT) and is shown in Algorithm 1.

Algorithm 1 The MEDT algorithm.
Input: training set D; learning algorithms L_1, ..., L_k (k classes of learning algorithms); combined model C.
Output: grouping result (classifier T).
1: for i = 1 to k do
2:     Generate M_i, the set of models, by applying the i-th learning algorithm L_i to D.
3:     Generate S_i, the set of metadata produced by the models M_i.
4: end for
5: Obtain C_{M_1,...,M_k} by applying C to S_1, ..., S_k.
6: Let x be covariates randomly generated from D, and let D' be the new dataset generated by applying C_{M_1,...,M_k} to x.
7: Merge data: D_new ← D ∪ D'.
8: Train the model on the dataset D_new to obtain the classifier T.
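The sample-generation and merging steps of the algorithm can be sketched as follows. This is an illustration, not the paper's implementation: `ensemble_predict` and `sample_x` are hypothetical stand-ins for the fitted combined model and for resampling covariate vectors from the empirical distribution of D.

```python
def augment(data, ensemble_predict, n_new, sample_x):
    """Generate synthetic (x, y) pairs from the fitted ensemble's
    rules and merge them with the original data (D_new = D u D').
    `sample_x(data)` draws one covariate vector from `data`;
    `ensemble_predict(x)` returns the combined model's prediction.
    Both callables are hypothetical stand-ins, not the paper's code."""
    extra = [(x, ensemble_predict(x))
             for x in (sample_x(data) for _ in range(n_new))]
    return data + extra
```

The merged set is then used to train the case-mix decision tree, as in the final step of Algorithm 1.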
During the actual case-mix process, we usually select the base learners from a variety of strong learners, such as random forest, gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), lasso regression, support vector regression (SVR), and other models. The advantage of this setting is that the proposed ensemble model takes the prediction capability for both linear and nonlinear structures into account, and the selected models have strong abilities to rank variable importance, which helps identify the variables that most affect the degree of medical resource consumption. While ensuring the generalization ability of the model, we intend to preserve the information of the original data as much as possible. We adopt five-fold cross-validation: we randomly divide the original data into five parts, of which four are used as a training set and the remaining one as a validation set. The base learners are trained on the training data, and the predictions they produce are used as metadata. The metadata can be regarded as the input features of the metamodel. Next, we use the metadata to train the metamodel (the combined model C). Figure 2 shows this learning process.

Model combination is of great significance in ensemble construction, since an appropriate combination can improve the data analysis ability of the model. The most common combination method is majority voting and its variants [26], such as simple majority voting and weighted majority voting. In this paper, we investigate a stacking-like approach to constructing the ensemble, so as to explore a better way to combine the trained base learners. Each base learner provides a different contribution to the final result, which can be represented as a weight. The question then becomes how to determine these weights. Reference [26] proposed a stacked regression method.
This method improves the prediction accuracy by linearly combining the different predictors and determines the weight of each predictor by the least-squares method. Since we choose the base learners among strong ones, there is a strong correlation between the prediction results of the models, and using the least-squares method to assign the weights often causes overfitting. Here, we use the ridge regression method to determine the weights of the models:

min_β Σ_{i=1}^{N} ( y_i − Σ_{j=1}^{k} β_j M_j(x_i) )² + λ Σ_{j=1}^{k} β_j²,    (8)

where λ > 0 is the penalty parameter, M_j(x) denotes the prediction of the j-th of the k models, and β_j is the weight of the j-th model. When combining the models, we shrink the weights by adding the ℓ2 norm penalty. This tackles the overfitting problem, improves the generalization ability of the model, and better integrates the multiple models for analyzing the data. We then obtain additional samples by applying the rules learned by the ensemble model, generate a new sample set D_new by aggregating them with the original data, and finally use D_new to construct a case-mix decision tree. Figure 3 shows the specific process of the MEDT algorithm.
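The ridge combiner has a closed form. The sketch below is an illustration under our own naming, not the authors' code: it assumes the base learners' out-of-fold predictions have been stacked into an n × k matrix, and solves the penalized least-squares problem for the combination weights.

```python
import numpy as np

def stack_weights(preds, y, lam=1.0):
    """Ridge-regression combiner for stacking. `preds` is an (n, k)
    matrix whose columns are the k base learners' predictions on the
    validation folds; returns the weights beta minimizing
    ||y - preds @ beta||^2 + lam * ||beta||^2 via the closed form
    beta = (P'P + lam*I)^{-1} P'y."""
    k = preds.shape[1]
    return np.linalg.solve(preds.T @ preds + lam * np.eye(k), preds.T @ y)
```

The ensemble prediction for new metadata `P_new` is then `P_new @ beta`; the λ > 0 penalty keeps the weights of strongly correlated base learners from blowing up.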

Simulation Study
We compare the performance of the multi-model ensemble decision tree method in terms of several criteria over some simulation scenarios. We generate an outcome y through a certain model with four categorical covariates. Several parameters are inspected to assess the prediction performance under various simulation scenarios. First, we change the sample size n. Next, we invoke three types of data generation schemes, where x_4, x_5, and x_6 are variables unrelated to y, and the error ε ∼ N(0, 1). In the first case, the outcome y is obtained by a linear model. In the second case, the outcome y is generated in a way that includes a polynomial term, a logarithmic term, and interaction terms. The third case is more complicated than the second one, with the participation of an exponential term.
Case 1: y = 10x_1 + 5x_2 + (1/5)x_3 + 3 + ε.
For each of the above scenarios, we augmented the data with an additional 1000 samples generated by simulation. We split each dataset into training and testing sets, each containing a random half of the dataset. Since class labels are not provided in our data, we evaluated the performance of the aforementioned methods based on cluster-internal information [27]. We calculated the Calinski-Harabasz index (CHI), silhouette coefficient (SC), and Davies-Bouldin index (DBI) on the testing set.
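As an example of these internal-validity metrics, the Calinski-Harabasz index can be computed directly from the grouped costs (a sketch; in practice library implementations such as scikit-learn's `calinski_harabasz_score` may be used instead):

```python
import numpy as np

def calinski_harabasz(X, labels):
    """Calinski-Harabasz index: between-cluster dispersion over
    within-cluster dispersion, each scaled by its degrees of freedom.
    Larger values indicate denser, better-separated groups."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    n, mu = len(X), X.mean(axis=0)
    clusters = [X[labels == c] for c in np.unique(labels)]
    k = len(clusters)
    between = sum(len(c) * ((c.mean(axis=0) - mu) ** 2).sum() for c in clusters)
    within = sum(((c - c.mean(axis=0)) ** 2).sum() for c in clusters)
    return (between / (k - 1)) / (within / (n - k))
```

The silhouette coefficient and Davies-Bouldin index are computed analogously from pairwise distances within and between clusters.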
We conducted two simulation experiments. In the first experiment, we considered settings with sample sizes n of 1000, 2000, and 4000, in which the training set and the test set each accounted for 50% of the dataset. We compared the performance of CART and CHAID with that of the proposed MEDT. Larger values of CHI and SC together with a smaller value of DBI indicate that the generated clusters are dense within the same cluster and that different clusters are farther apart, i.e., that the results of the case mix are more significant.
It can be seen from Table 1 that when the sample size was relatively small (n = 1000), our proposed MEDT method exploited more sample information, so its performance was significantly better than that of the other two methods, with larger CHI and SC values and a smaller DBI value. Additionally, CART and CHAID may suffer from under-fitting when the sample size is small, leading to inaccurate final grouping results. As the sample size increased, all three methods improved in terms of all metrics. When the sample size was increased from n = 1000 to n = 2000, the performance of CART and CHAID improved significantly, but their clustering quality was still worse than that of our method. When the sample size became larger (n = 4000), the differences between the three methods were subtle, but MEDT and CART performed relatively well. The results of the first simulation experiment show that the MEDT method performs better in the small-sample case, since it obtains more sample information by integrating more models.

Next, we verified that this approach favors better clustering, especially when the sample size is relatively small. In the second experiment, we considered two different settings: in the first, we duplicated samples of the original training set and aggregated them into a new training set to train our proposed case-mix tree model (denoted COM hereafter); in the second, we applied the MEDT method directly to the training set. As in the first simulation, we set the sample size n to 600 and 1000 and generated the data accordingly. The training set and the test set each accounted for 50%, and Table 2 shows the comparative performance under the two settings. The results in Table 2 show that when the sample size was small, the MEDT method obtained better clustering results by utilizing more sample and model information.
At the same time, we can see that applying the MEDT led to a slight increase in MSE compared with the method of directly duplicating samples, but this increase is hardly distinguishable. For the case-mix problem, we are more interested in the clustering of homogeneous patients than the accuracy of individual prediction. A reasonable grouping of cases will help health insurance departments to differentiate patients and make compensatory payments. In terms of the three metrics CHI, SC, and DBI, the MEDT method has better performance than that of direct sample duplication.
In some clinical scenarios, we often need to group patients in order to homogenize similar populations. However, when the amount of available data is insufficient, methods such as CART and CHAID do not perform so well, and the improvement brought about by simply copying the sample is also limited. In this case, the MEDT method performs better in grouping problems than the two comparative methods while keeping the predominant interpretability of the decision tree method. In the following section, we construct the case-mix of ovarian cancer patients based on this MEDT method.

Application
We conducted the experiment using the data of ovarian cancer (OC) patients in some tertiary hospitals in Jilin Province, China. This dataset was provided by the Jilin Province Administration of Social Medical Insurance of China and contains medical consumption records of OC patients from 2017 to 2019, patients' individual information (including age, gender, marital status, ethnicity, medical insurance category, etc.), medical diagnosis information (including disease name, main medical operations, comorbidities, complications, etc.), and medical expenses (total amount of consumption and various medical service expenses). The features in the original data are not presented as a formalized vector but are scattered across multiple records. Therefore, in order to conduct our subsequent experiments, we first preprocessed the data and quantified the categorical variables.

Ovarian Cancer Case-Mix Pricing
After preprocessing, we had 1463 OC cases. Next, we needed to price the case mix according to the differences in the medical resource consumption of patients. First, we used the MEDT method to build the case-mix model. The depth of the tree is proportional to the complexity of the model; therefore, we set the maximum depth of the tree to 4 and pruned it. Figure 4 shows that 12 different case subgroups were obtained using the MEDT method, and the final grouping results are very clear and interpretable. Then, we verified that our grouping is reasonable. Figure 5 shows the differences in the medical expenses of the 12 OC case subgroups based on the MEDT method. As can be seen from Figure 5, the OC case subgroups obtained using our method are relatively well separated, and each subgroup has different medical resource requirements. Afterward, we illustrated the rationality of the grouping from the perspective of statistical tests. Firstly, we performed the Kruskal-Wallis test on the medical costs of patients in the 12 case subgroups and found that the p-value of the test was less than 10^-2. Then, we performed multiple comparisons of the case subgroups by Holm's method [28]. The largest p-value in the multiple comparisons was 6.80 × 10^-3, which is still less than 10^-2, indicating that there are significant statistical differences in the medical costs of patients between different case subgroups. These test results verify that the proposed grouping method is reasonable.
Finally, we set a price for the ovarian cancer case-mix based on the grouping results in Figure 4 and compared it with the current payment standard for OC disease. In order to ensure consistency with the grouping criteria, we made small adjustments to the results of the case-mix. This helps the grouping criteria to be simple and comprehensible. Currently, the average reimbursement rate of medical insurance for treating OC is 70%. Therefore, we took 70% of the actual medical expenses of each case subgroup and rounded it to multiples of 500 to set as the corresponding payment standard for the OC case-mix. The results are shown in Table 3.
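The pricing rule described above (70% of each subgroup's actual average expense, rounded to a multiple of 500) can be sketched as follows; rounding to the nearest multiple is an assumption, and the example cost value is hypothetical:

```python
def payment_standard(avg_expense, rate=0.70, step=500):
    """Payment standard for one case subgroup: take `rate` (the current
    70% average reimbursement rate for OC) of the subgroup's average
    medical expense and round to the nearest multiple of `step`.
    Nearest-multiple rounding is our assumption, not stated in the paper."""
    return step * round(avg_expense * rate / step)

# hypothetical subgroup averaging 21,700 yuan: 0.7 * 21700 = 15190 -> 15000
```

Applying this rule per subgroup yields one payment standard per leaf of the case-mix tree, as in Table 3.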
From Table 3, we can see that the payment standard for OC patients based on the MEDT method is more reasonable and interpretable. By grouping patients according to differences in medical resource requirements, it can not only meet the medical needs of mild patients but also increase the degree of medical compensation for severe patients, making the allocation of medical insurance funds more reasonable. At the same time, after applying the payment standard for the case mix, the total medical expenses of OC patients can be decreased by 9.12% compared with the previous standard. The standards developed by our method are thus also beneficial for controlling and reducing health care costs.

Result Comparison
We applied CART, CHAID, and our proposed MEDT method to build a case-mix tree model, and evaluated the performance of our method and the comparative methods in terms of CV and RIV. The CV reflects the difference in medical resource consumption within each case subgroup, while the RIV reflects the degree of variance reduction after the case mix. A tree with a smaller CV value has a smaller degree of dispersion and better homogeneity, as well as smaller differences within the group. Similarly, a tree with a larger RIV can discover better underlying rules in the data, provide more reasonable grouping, and reduce variation more [29]. According to the current technical specifications for DRG and DIP group payment by the National Medical Security Administration, the CV value after applying the case mix should be less than 0.8, and the RIV value should be greater than 80%.
The two metrics are defined as

CV_i = (1 / x̄_i) sqrt( Σ_{j=1}^{n_i} (x_ij − x̄_i)² / (n_i − 1) ),

RIV = [ Σ_i Σ_j (x_ij − x̄)² − Σ_i Σ_j (x_ij − x̄_i)² ] / Σ_i Σ_j (x_ij − x̄)²,

in which x̄_i represents the average cost of the i-th case subgroup, x̄ represents the average medical cost of all patients, x_ij represents the medical cost of the j-th patient in the i-th subgroup, and n_i is the number of patients in the i-th subgroup. We set the same maximum tree depth and minimum sample number of leaf nodes for all three methods and pruned the results of CART and MEDT. Then, we calculated the average CV and RIV values of the OC case mix generated by the three methods. The results in Table 4 show that the grouping performance of the MEDT is better than that of the CART and CHAID methods, with a lower average CV value and a higher RIV value. Considering Table 3 together with Table 4, we can see that the CV values of the groups are all less than 0.8, indicating that the internal dispersion of each subgroup becomes smaller after grouping. The RIV is 93.90%, which is greater than 80%, indicating that the grouping method can discover more latent rules within the data and that the degree of systematization is higher; in other words, the grouping reduces the degree of variation by 93.90%, significantly reducing the within-group variation. In a nutshell, the simulation results show that the proposed method outperforms the two comparative methods (CART, CHAID) in terms of reasonable metrics while better reflecting the heterogeneity between different case subgroups. Meanwhile, the experiments on real-world data show that the grouping by our method produces better CV and RIV values than the two other methods, which indicates that our method also performs better on the real dataset.
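A sketch of the two grouping metrics, assuming the standard definitions used in DRG/DIP evaluations (CV as each subgroup's standard deviation over its mean; RIV as the proportional reduction in total variance achieved by grouping):

```python
import numpy as np

def cv_riv(groups):
    """`groups` is a list of arrays of per-patient costs, one array per
    case subgroup. Returns the per-group coefficients of variation and
    the overall reduction-in-variance (RIV = 1 - within/total)."""
    all_costs = np.concatenate(groups)
    cvs = [g.std(ddof=1) / g.mean() for g in groups]
    within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    total = ((all_costs - all_costs.mean()) ** 2).sum()
    return cvs, 1 - within / total
```

Under the regulatory thresholds cited above, each CV should fall below 0.8 and the RIV should exceed 80%.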

Conclusions
Faced with the rapid increase in medical expenses, the reform of medical insurance payment has become an imperative topic. At present, medical insurance payments in China are mainly based on case-mix payment methods such as DRG and DIP. A good case-mix payment method should generate reasonable groups and provide appropriate compensatory payments for patients with different medical resource needs. In this paper, we propose a case-mix decision tree method, which provides reasonable grouping of patients with different medical resource needs as well as the predominant interpretability of tree models. In practical situations, the available data are often insufficient, causing the single-tree model to be structurally unstable. To handle this problem, we propose a multi-model ensemble decision tree method. During model combination, we apply a ridge regression penalty to avoid overfitting. Eventually, we construct a case-mix decision tree model and provide interpretable grouping rules. The subgrouping experiments on both simulated and real-world data showed that our proposed method outperforms the two comparative methods (CART, CHAID).
The disadvantage of this method is that it requires more training time than a single decision tree, due to the integration of multiple models, especially when dealing with high-dimensional data. Furthermore, we have only conducted experiments on data with a few diseases. In the future, we can augment our dataset by collecting more cases from different medical centers to generate more reliable health insurance pricing models.