Application of Entropy-based Attribute Reduction and an Artificial Neural Network in Medicine: a Case Study of Estimating Medical Care Costs Associated with Myocardial Infarction

In medicine, artificial neural networks (ANNs) have been extensively applied in many fields to model the nonlinear relationships in multivariate data. Because selecting input variables is difficult, attribute reduction techniques are widely used to reduce the data to a smaller set of attributes. However, to compute reductions from heterogeneous data, dimensionality reduction methods often introduce a discretizing algorithm, which may cause information loss. In this study, we developed an integrated method for estimating the medical care costs associated with myocardial infarction, using data obtained from 798 cases. The subset of attributes used as the input variables of the ANN was selected with an entropy-based information measure, fuzzy information entropy, which can deal with both categorical attributes and numerical attributes without discretization. Then, we applied a correction for the Akaike information criterion (AICc) to compare the networks. The results revealed that fuzzy information entropy was capable of selecting input variables from heterogeneous data for the ANN, and the proposed procedure provided a reasonable estimation of medical care costs, which can be adopted in other fields of medical science.


Introduction
Reliable estimates of medical care costs for myocardial infarction (MI)-related patients can provide an alternative to cost-effectiveness evaluations of MI prevention, screening and treatment policies [1,2]. In addition, obtaining accurate estimates of this outcome will allow the administration to properly manage the available medical resources for inpatient hospitalizations. Previous works have demonstrated that demographic factors, percutaneous coronary intervention, coronary artery bypass graft surgery and length of stay were significantly associated with higher healthcare costs [3,4].
Artificial neural networks (ANNs) provide a rich, powerful and robust nonparametric modeling framework currently being used in a variety of applications in medicine, such as diagnosis, electronic signal analysis, medical image analysis, radiology and clinical outcome prediction. In [5], the authors made use of ANN analysis to assess the accuracy of real-time endoscopic ultrasound elastography in focal pancreatic lesions. In [6], the authors developed random forests, support vector machines and ANN models to diagnose acute appendicitis. Shi et al. validated the use of ANN models for predicting quality of life after breast cancer surgery [7]. In [8], the author developed a biomedical-based decision support system for the classification of heart sound signals by using principal component analysis (PCA) and ANN. In [9], an ANN model was developed to predict survival in patients with pancreatic ductal adenocarcinoma. In [10], the authors applied the ANN method to model the sample entropy.
ANNs offer several advantages, including requiring less formal statistical training, the ability to implicitly detect complex nonlinear relationships between dependent and independent variables, the ability to detect all possible interactions between predictor variables and the availability of multiple training algorithms [11]. However, ANN models also have disadvantages: their "black box" nature, heavy computational burden, proneness to overtraining and the empirical nature of model selection. Hybrid methods, such as ANN and genetic algorithms (GAs) [12], ANN and PCA [8], ANN and the artificial bee colony algorithm (ABC) [13], ANN and autoregressive integrated moving average (ARIMA) models [14] and ANN and rough sets [15], were developed to overcome these problems. In this study, an integrated method based on fuzzy information entropy and ANN was developed to estimate medical care costs for admissions of MI disease.
To handle the multidimensional data efficiently, dimensionality reduction should be performed to map the data to a lower dimensional space. A typical method is attribute reduction based on Pawlak's rough set model, which has been successfully used in feature subset selection and attribute reduction [16,17]. Pawlak's rough set model works in circumstances where only nominal attributes exist in an information system and cannot deal with numerical variables directly unless a discretizing algorithm is applied, which may lead to some information loss [18,19]. To address this problem, rough-fuzzy and fuzzy-rough sets were proposed in [20] and analyzed in detail in [21], and they were successfully used in attribute reduction. In [22,23], the authors proposed an integrated use of fuzzy and rough set theories to reduce the data redundancy based on the fuzzy dependency function.
Subsequently, an information measure, fuzzy information entropy, was introduced to measure the discernibility power of a fuzzy equivalence relation [24]. The significance of categorical, numeric and fuzzy attributes can be defined in a general form with this measure. Thus, we applied fuzzy information entropy to compute attribute reduction in this research.
In this study, a method was introduced to estimate the medical care costs, obtained from 798 cases of MI-related patients. The data set was mapped to a lower dimensional space using fuzzy information entropy as an attribute reduction technique. Therefore, the problems of overtraining and heavy computational burden of ANNs could be avoided by eliminating superfluous input attributes. The results showed that the proposed method was efficient in estimating medical care costs and could be adopted in various medical applications.

Raw Data
For this study, we obtained 798 cases of inpatients with myocardial infarction (ICD-10 code I21, International Classification of Diseases, 10th revision) from three comprehensive hospitals in Wuhan, China. The raw data contain the demographic factors of patients (age in years, gender), characteristics of patients (history of diabetes, blood pressure, history of smoking, cholesterol, physically active or not, obesity, history of angina, history of MI), medical treatment process (prescribed nitroglycerin, taking anti-clotting drugs, electrocardiogram result, creatine phosphate kinase blood result, troponin T blood result, taking clot-dissolving drugs, time of hospitalization, hemorrhaging, magnesium, digitalis, beta blockers, surgical treatment, surgical complications and length of stay) and the medical care costs for each patient.

Fuzzy Rough Set Model
The equivalence relations are introduced in Pawlak's rough set-based methodology to partition the universe and generate mutually exclusive equivalence classes as elemental concepts for categorical variables. In [22], the authors suggested that the fuzzy equivalence relation should be generated for numeric and fuzzy attributes, instead of crisp equivalence relations.
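The heterogeneous data can be depicted as an information system $\langle U, A \rangle$ with $B \subseteq A$, where $U = \{x_1, x_2, \ldots, x_n\}$ and $R_B$ is a binary relation on $U$, denoted by a relation matrix $M(R_B) = (r_{ij})_{n \times n}$, where $r_{ij} \in [0, 1]$ is the relation of $x_i$ and $x_j$.
Given that $U$ and $R$ represent a non-empty finite set and a binary fuzzy relation on this set, $R$ is a fuzzy equivalence relation if, for $\forall x, y, z \in U$, it satisfies: (1) Reflexivity: $R(x, x) = 1$; (2) Symmetry: $R(x, y) = R(y, x)$; (3) Transitivity: $R(x, z) \geq \min\{R(x, y), R(y, z)\}$.
Definition 1. For $\forall x_i \in U$, the fuzzy equivalence class generated by $x_i$ and $R$ is defined as:
$$[x_i]_R = \frac{r_{i1}}{x_1} + \frac{r_{i2}}{x_2} + \cdots + \frac{r_{in}}{x_n}$$
where $r_{ij}/x_j$ denotes that $x_j$ belongs to the class with membership degree $r_{ij}$ and "+" denotes the union.
Definition 2. Given a fuzzy information system $\langle U, C \cup d, V, f \rangle$, where $C$ is the set of condition attributes and $d$ is the decision attribute. For $B \subseteq C$, $\gamma_B(d) = |POS_B(d)| / |U|$ represents the dependency of $d$ on $B$, where $POS_B(d)$ is a lower approximation of the decision and is also called the positive region of the decision.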
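The three properties of a fuzzy equivalence relation can be checked directly on a relation matrix. The following is a minimal sketch, assuming the relation is stored as a square NumPy array with entries in [0, 1] and that transitivity is the usual max-min form; the function names `is_fuzzy_equivalence` and `class_cardinalities` are illustrative and do not come from the original study.

```python
import numpy as np

def is_fuzzy_equivalence(M, tol=1e-9):
    """Check reflexivity, symmetry and max-min transitivity of a relation matrix."""
    M = np.asarray(M, dtype=float)
    reflexive = np.allclose(np.diag(M), 1.0, atol=tol)
    symmetric = np.allclose(M, M.T, atol=tol)
    # composed[i, k] = max over j of min(M[i, j], M[j, k])
    composed = np.minimum(M[:, :, None], M[None, :, :]).max(axis=1)
    transitive = bool(np.all(M >= composed - tol))
    return reflexive and symmetric and transitive

def class_cardinalities(M):
    """Cardinality of each fuzzy equivalence class: the row sums of the matrix."""
    return np.asarray(M, dtype=float).sum(axis=1)
```

A crisp equivalence relation (a 0/1 block matrix) passes all three checks, so the same test covers categorical attributes as a special case.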

Entropy-Based Information Measure
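The fuzzy equivalence class has been defined in Definition 1. Then, the pertinent points of the fuzzy information entropy are introduced as follows.
Definition 5. The information quantity of the fuzzy equivalence relation $R$ is introduced as:
$$H(R) = -\frac{1}{n}\sum_{i=1}^{n} \log \frac{|[x_i]_R|}{n}$$
where $|[x_i]_R| = \sum_{j=1}^{n} r_{ij}$ is the cardinality of the fuzzy equivalence class. When the relation is a crisp equivalence relation, this information quantity reduces to Shannon's entropy [24].
Definition 6. Given a fuzzy decision system $S = \langle U, C \cup d, V, f \rangle$, $B$ and $E$ are two subsets of $C$, and the fuzzy equivalence classes on $B$ and $E$ are $[x_i]_B$ and $[x_i]_E$; the joint entropy and the conditional entropy are defined as:
$$H(BE) = -\frac{1}{n}\sum_{i=1}^{n} \log \frac{|[x_i]_B \cap [x_i]_E|}{n}$$
$$H(E \mid B) = -\frac{1}{n}\sum_{i=1}^{n} \log \frac{|[x_i]_E \cap [x_i]_B|}{|[x_i]_B|}$$
Definition 7. Given a fuzzy decision system $S = \langle U, C \cup d, V, f \rangle$, $B \subseteq C$, $\forall a \in B$, $a$ is superfluous if $H(d \mid B - a) = H(d \mid B)$; and $B$ is independent if $\forall a \in B$, $H(d \mid B - a) > H(d \mid B)$. $B$ is a reduct if $H(d \mid B) = H(d \mid C)$ and $B$ is independent.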
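The entropy measures of Definitions 5 and 6 can be computed from relation matrices in a few lines. The sketch below assumes the form given in [24], H(R) = -(1/n) Σ log(|[x_i]_R| / n), with the class cardinality taken as a row sum and the intersection of two fuzzy relations taken as the element-wise minimum. The base-2 logarithm and the function names are assumptions made for illustration.

```python
import numpy as np

def fuzzy_entropy(R):
    """Information quantity of a fuzzy equivalence relation (base-2 log assumed)."""
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    return float(-np.mean(np.log2(R.sum(axis=1) / n)))

def joint_entropy(R_B, R_E):
    """Joint entropy: entropy of the intersection (element-wise minimum)."""
    return fuzzy_entropy(np.minimum(R_B, R_E))

def conditional_entropy(R_E, R_B):
    """Conditional entropy H(E | B) from the two relation matrices."""
    R_B = np.asarray(R_B, dtype=float)
    inter = np.minimum(R_B, np.asarray(R_E, dtype=float)).sum(axis=1)
    return float(-np.mean(np.log2(inter / R_B.sum(axis=1))))

# Two crisp partitions of four objects: {0,1},{2,3} and {0,2},{1,3}.
R_B = np.kron(np.eye(2), np.ones((2, 2)))
R_E = np.kron(np.ones((2, 2)), np.eye(2))
print(fuzzy_entropy(R_B), joint_entropy(R_B, R_E), conditional_entropy(R_E, R_B))
```

For the crisp partitions in the usage lines, the values coincide with Shannon's entropy of the partitions, and the conditional entropy equals the joint entropy minus H(B).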

Artificial Neural Network (ANN)
An ANN is composed of a number of interconnected neurons (referred to as "nodes", "processing units" or "processing elements"), organized hierarchically in layers. In an ANN, knowledge about the problem is modeled by using learning algorithms and saved in weighted connections. The feed-forward neural network of the multilayer perceptron (MLP) type with an error back-propagation (BP) learning algorithm is the most popular ANN model used in estimation and regression problems. An MLP consists of an input layer, hidden layers and an output layer, each of which is composed of a set of neurons. The MLP with the back-propagation algorithm is trained using a dataset of associated input and target values, and the network is updated by adjusting the weights of neurons in every epoch upon calculating the error in the network's output, until the performance of the network is satisfactory.
The number of neurons in the input layer depends on the result of the dimensionality reduction based on the fuzzy information entropy, and the problems of overtraining and heavy computational burden could be avoided with the lower number of input variables. For the neurons in the hidden and output layers, their inputs are processed by multiplying each input by a corresponding weight, summing the products and then transmitting this sum to the output by using a nonlinear transfer function. In each epoch, the weights among the neurons of the network are adjusted based on the errors between the actual outputs and the target outputs. When the training is complete, the network should be able to provide an accurate estimation for a given input.
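The weighted-sum-and-transfer computation described above can be illustrated with a forward pass through a three-layer network. This is only a sketch under stated assumptions: the tanh hidden activation, the identity output unit and the small random Gaussian initialization are placeholders chosen for illustration, not the settings of the study, and `init_mlp` and `forward` are hypothetical names.

```python
import numpy as np

def init_mlp(n_inputs, n_hidden, rng):
    """Create weights for an input-hidden-output network with one output neuron."""
    return {
        "W1": rng.normal(scale=0.1, size=(n_hidden, n_inputs)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=0.1, size=(1, n_hidden)),
        "b2": np.zeros(1),
    }

def forward(params, X):
    """Weighted sum plus transfer function in the hidden layer, then the output sum."""
    hidden = np.tanh(X @ params["W1"].T + params["b1"])
    return (hidden @ params["W2"].T + params["b2"]).ravel()

rng = np.random.default_rng(0)
params = init_mlp(n_inputs=8, n_hidden=5, rng=rng)
estimates = forward(params, rng.normal(size=(3, 8)))  # one estimate per sample
```

Training would then adjust `W1`, `b1`, `W2` and `b2` in each epoch from the error between `estimates` and the target costs.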
A three-layered MLP feed-forward neural network with the back-propagation learning algorithm was applied in our study. For this research, there is only one neuron in the output layer representing medical care costs; thus, the most critical problem is defining the size of the input layer, which is directly related to the network's performance. The entropy-based information measure for the fuzzy equivalence relation can analyze the significance of various factors to remove the superfluous attributes from the heterogeneous data. Therefore, the combined method presented in this paper can overcome the disadvantages of ANNs in estimating medical costs.

Akaike Information Criterion (AIC)
Information-based criteria, such as the Akaike information criterion (AIC), which measure the relative quality of an estimated statistical model, are widely used for model selection. The underlying idea of the information-based criteria is to identify an optimal trade-off between an unbiased approximation of the model and the complexity of the model. In [25,26], the authors use AIC in neural network model selection to determine the optimal parameters of the ANN model. In this study, the comparison of ANN models is conducted using AIC. In the general sense, by using the likelihood $L$, the AIC is calculated as:
$$AIC = -2 \ln L + 2k$$
or, using the residual sum of squares (RSS):
$$AIC = n \ln\left(\frac{RSS}{n}\right) + 2k$$
where $k$ denotes the total number of estimable parameters in the model and $n$ is the sample size. For a small sample size, when $n/k$ is less than 40, a correction for AIC ($AIC_c$) is recommended in [27] as follows:
$$AIC_c = AIC + \frac{2k(k+1)}{n-k-1}$$
The model exhibiting the smallest AIC value is selected as the best fit model in this study.
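As a worked illustration of the criterion, the sketch below computes the RSS-based AIC, n ln(RSS/n) + 2k, and the small-sample correction AICc = AIC + 2k(k+1)/(n-k-1). It also counts the estimable parameters of a three-layer MLP as weights plus biases, which is an assumption about how k is counted; the function names and the numbers in the usage lines are hypothetical.

```python
import math

def mlp_param_count(n_inputs, n_hidden, n_outputs=1):
    """Weights and biases of an input-hidden-output network (assumed definition of k)."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

def aic_from_rss(rss, n, k):
    """AIC from the residual sum of squares."""
    return n * math.log(rss / n) + 2 * k

def aicc_from_rss(rss, n, k):
    """Small-sample corrected AIC; requires n > k + 1."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    return aic_from_rss(rss, n, k) + 2 * k * (k + 1) / (n - k - 1)

k = mlp_param_count(n_inputs=8, n_hidden=5)   # hypothetical network size
print(k, aicc_from_rss(rss=250.0, n=496, k=k))  # made-up RSS value
```

Candidate networks are then ranked by this value, and the one with the smallest value is retained.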

Evaluation of Performance
We use two different criteria to evaluate the performance of the integrated method based on fuzzy information entropy and ANNs: root mean square error (RMSE) and the mean absolute percentage error (MAPE). The first criterion is calculated by:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(C_i - T_i\right)^2} \qquad (9)$$
where $C_i$ and $T_i$ denote the calculated and target value of the medical costs, respectively, and $n$ is the number of training data. The reason for applying the training error in the criteria is that it is directly related to the ANN's memorization ability. The second criterion is MAPE, defined as:
$$MAPE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{C_i - T_i}{T_i}\right| \times 100\%$$
where $n$, $C$ and $T$ are defined as in (9). The second criterion is the average percentage of the absolute values of relative error, which can be used directly for experimental data without data normalization.
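Both criteria are simple to compute from the calculated and target values. The following sketch assumes the standard forms of RMSE and MAPE (the latter expressed in percent and requiring non-zero targets); the function names and the two-sample example are illustrative only.

```python
import numpy as np

def rmse(calculated, target):
    """Root mean square error between calculated and target values."""
    c = np.asarray(calculated, dtype=float)
    t = np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean((c - t) ** 2)))

def mape(calculated, target):
    """Mean absolute percentage error in percent; targets must be non-zero."""
    c = np.asarray(calculated, dtype=float)
    t = np.asarray(target, dtype=float)
    return float(np.mean(np.abs((c - t) / t)) * 100.0)

# Two made-up cost estimates against a target of 100 each.
print(rmse([110, 90], [100, 100]), mape([110, 90], [100, 100]))
```

Because MAPE is a relative measure, it can be compared across cost scales, whereas RMSE is expressed in the units of the cost variable.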

Dimension Reduction via Fuzzy Information Entropy
A three-layered MLP feed-forward ANN structure was used in this work. However, one of the problems related to ANN modeling is that sample data usually contain superfluous, irrelevant and noisy input variables, which may obstruct knowledge acquisition and affect the network's generalization ability. Therefore, given the many candidate input variables, a careful choice of the ANN input variables can overcome these drawbacks.

Definition 8. Given that $\langle U, C \cup d, V, f \rangle$ represents a fuzzy information system, for $B \subseteq C$, $\forall a \in B$, the significance of attribute $a$ in $B$ relative to $d$ is defined as:
$$SIG(a, B, d) = H(d \mid B - a) - H(d \mid B)$$
The fuzzy information entropy is introduced in Section 2.2. The greater the entropy value is, the stronger the discernibility is and the more significant the attribute is. If the significance equals zero, then we consider that the attribute $a$ is superfluous; otherwise, $a$ is indispensable. The aim of attribute selection is to search for a subset of attributes that has the same approximating power as the original data and that does not have any redundant attribute. We apply a forward search algorithm, which has been discussed in detail in [24]. The parameter $\varepsilon$ ($\varepsilon = 0.001$) is a tiny positive real number that controls the convergence. Attribute reduction starts with an empty set of attributes, and in each iteration, the one attribute that produces the maximal increment of fuzzy information entropy is added to this set; the process stops when adding any remaining attribute to the subset increases the entropy by less than $\varepsilon$ in one round. Then, the superfluous attributes were eliminated, and eight key attributes were selected as the input variables of the ANN. Let $B = \{a_1, a_2, \ldots, a_8\}$ be a reduct, where $a_i$ ($i = 1, 2, \ldots, 8$) denotes the eight selected attributes, respectively; the maximal increment of entropy is fulfilled during this process, and the significance of the selected attributes is summarized in Table 3. The subset of attributes provides the maximal entropy as 1.6824, which is the result of reduction.
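The forward search described above can be sketched as a greedy loop. The version below is a simplified illustration rather than the algorithm of [24]: it assumes each attribute is given as a fuzzy relation matrix, combines attributes by the element-wise minimum, scores a subset by the entropy of the combined relation (base-2 logarithm) and stops when the best increment falls below ε. The names `forward_select` and `relations` and the toy attributes are hypothetical; a conditional-entropy score relative to the decision attribute could be substituted for the subset entropy.

```python
import numpy as np

def fuzzy_entropy(R):
    """Entropy of a fuzzy relation matrix (base-2 log assumed)."""
    n = R.shape[0]
    return float(-np.mean(np.log2(R.sum(axis=1) / n)))

def forward_select(relations, eps=0.001):
    """Greedily add the attribute with the largest entropy increment until it is < eps."""
    names = list(relations)
    n = next(iter(relations.values())).shape[0]
    selected = []
    current = np.ones((n, n))  # empty subset: no objects are discerned, entropy 0
    current_h = 0.0
    while len(selected) < len(names):
        best_name, best_gain, best_rel = None, -np.inf, None
        for name in names:
            if name in selected:
                continue
            candidate = np.minimum(current, relations[name])
            gain = fuzzy_entropy(candidate) - current_h
            if gain > best_gain:
                best_name, best_gain, best_rel = name, gain, candidate
        if best_gain < eps:
            break  # no remaining attribute adds a meaningful increment
        selected.append(best_name)
        current, current_h = best_rel, current_h + best_gain
    return selected, current_h

# Toy data: a2 duplicates a1, a3 carries no information, a4 is complementary to a1.
a1 = np.kron(np.eye(2), np.ones((2, 2)))
relations = {"a1": a1, "a2": a1.copy(), "a3": np.ones((4, 4)),
             "a4": np.kron(np.ones((2, 2)), np.eye(2))}
selected, entropy = forward_select(relations)
```

In the toy data, the duplicated and uninformative attributes are never added, which mirrors the removal of superfluous attributes described in the text.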

Estimation Using the Artificial Neural Network
Two scenarios are simulated and compared in this step, namely estimation of the medical costs using the ANN only and using the proposed method. Table 4 presents the structure and training parameters, which consist of the number of neurons in the layers, the initial learning rate, the momentum constant value and the activation functions used in the different layers.
The stopping criteria in Table 4 determine when to stop training the neural network. Criterion 1 defines the maximum number of minutes for the algorithm to run; Criterion 2 defines the maximum number of epochs allowed, and the training stops if the maximum number of epochs is exceeded; Criterion 3 describes a threshold value for the training error, and the training stops if the relative change in the training error compared to the previous epoch is less than the threshold value. In each epoch, whether training proceeds or not is decided by the stopping criteria, which are checked in the given order. In this study, 496 samples were selected as the training set to update the weights of the network; 153 samples were selected as the validation set to prevent overtraining; and 149 samples were selected as the test set to measure the performance of the network. The results of the network with the best estimation performance in each of the two scenarios are described in Table 5. For each network, the training process was replicated 20 times with random initial values of weights and biases to acquire stable performance.
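The ordered check of the three stopping criteria can be sketched as a small function evaluated once per epoch. All threshold values are parameters here because the actual settings belong to Table 4; the function name `stop_reason` and its return labels are hypothetical.

```python
def stop_reason(elapsed_minutes, epoch, prev_error, curr_error,
                max_minutes, max_epochs, min_rel_change):
    """Return which criterion stops training, checked in the given order, or None."""
    # Criterion 1: the maximum training time has been exceeded.
    if elapsed_minutes > max_minutes:
        return "time"
    # Criterion 2: the maximum number of epochs has been exceeded.
    if epoch > max_epochs:
        return "epochs"
    # Criterion 3: the relative change in training error is below the threshold.
    if prev_error is not None and prev_error > 0:
        rel_change = abs(prev_error - curr_error) / prev_error
        if rel_change < min_rel_change:
            return "error_change"
    return None
```

A training loop would call this at the end of every epoch and stop as soon as the result is not `None`; because the checks are ordered, the time limit takes precedence when several criteria are met in the same epoch.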

ANN architecture
The number of layers: 3
The number of neurons in the layers: Input: 24 (ANN only)/8 (proposed method); Hidden: ≤10; Output: 1
The initial weights and bias: The Nguyen-Widrow method

Table 5 contains a comparison between the chosen networks of the two scenarios, selected by AICc. From Table 5, a network with nine hidden neurons was selected in one scenario, while the chosen network in the other has eight hidden units. The results reveal that the network complexity is simplified by eliminating the superfluous input attributes via fuzzy information entropy, which prevents the problems of overtraining and heavy computational burden of the ANN. At the same time, the results of RMSE and MAPE show that the proposed method provides a better estimation performance. Figure 2 displays two scatterplots of the estimated value on the y-axis by the observed medical costs on the x-axis for all training samples based on the two selected networks, which demonstrates that the selected network of the proposed method has a better ability of estimating medical costs than the chosen network of the ANN-only scenario. The results indicate that the reduct can capture the content in the original dataset while maintaining good approximating ability.

Conclusions
In this paper, we introduced an integrated method to address the overtraining and heavy computational burden of ANN modeling, and the proposed approach provides a relatively good approximating ability in estimating medical costs related to myocardial infarction. The results showed that attribute reduction based on fuzzy information entropy can be applied for selecting the input variables of an ANN.
The choice of input variables is a fundamental and crucial consideration in identifying the optimal functional form of statistical models [28]. However, ANN is a typical data-driven statistical modeling approach, and there is no prior assumption made regarding the structure of the model. Selecting input variables is complicated by the fact that the data are multidimensional and heterogeneous, that the available input variables are interdependent and that some redundant variables have little predictive power. This results in the usage of dimensionality reduction techniques, often accompanied by discretizing algorithms. Shannon's entropy has been widely used as an information measure in machine learning. In this research, we applied fuzzy information entropy, which is a generalization of Shannon's entropy, to deal with both categorical attributes and numerical variables simultaneously and to measure the fuzzy equivalence relation.
In this study, the results of the comparison indicate that AICc can be used in model selection and multi-model inference of ANNs for a small sample size. The problem of model selection is considerably important for acquiring a higher level of performance in an ANN. In the field of ecology, AIC is widely used to compare and rank multiple statistical models and to estimate which of them best approximates the "true" process underlying the biological phenomenon under study [29], and it has been applied to select the optimal structure of a neural network in many works.
Although the proposed approach generates a reasonable result in estimating the medical costs related to myocardial infarction, further work should address more detailed individual-level data collection, such as income level, education level, behavioral factors, residential environment, socioeconomic factors, neighborhood characteristics, etc. Moreover, the subjective factors of patients should be considered in further research. One major contribution of our work is to introduce an integrated method to model the nonlinear relationship in issues involving multivariate data and to overcome the disadvantages of ANN modeling, which can be applied in other fields of medical science.

Figure 1 shows the procedure proposed in this research. It consists of five parts: (1) raw data obtainment; (2) data preprocessing, including classification and normalization; (3) dimensionality reduction via fuzzy information entropy; (4) estimating medical costs through an ANN; and (5) comparing the results.

Figure 1. The procedure of the proposed method.

Figure 2. Estimated-by-observed charts for medical cost.

Table 2. Statistical properties of four numerical variables.

Table 3. Increment of the entropy and the significance of each attribute.

Table 4. ANN architecture and training parameters.

Table 5. Results of the selected networks. RSS, residual sum of squares. H.U. denotes the number of neurons in the hidden layer; k denotes the total number of estimable parameters.