The Medical Outcomes Distribution and the Interpretation of Clinical Data Based on C4.5 Algorithm for the RCC Patients in Taiwan

The aim of our study was to explore the medical outcomes of patients in the respiratory care center (RCC) and the factors related to those outcomes. A cross-sectional study was performed at a regional hospital in central Taiwan from January 2018 to December 2018. The sample consisted of 236 patients who received RCC medical services. The chi-square test, multiple ordinal logistic regression analyses, and the C4.5 decision tree algorithm were used. The risk factors for a critical or deceased outcome were obesity (BMI ≥ 27.0) (OR = 2.426, 95% C.I. = 1.106–5.318, p = 0.027), being admitted from home (OR = 2.104, 95% C.I. = 1.257–3.523, p = 0.005), and an Acute Physiology and Chronic Health Evaluation II (APACHE II) score ≥ 25 (OR = 2.640, 95% C.I. = 1.283–5.433, p = 0.008). The C4.5 algorithm achieved a precision of 79.80%, a recall of 78.80%, an F-measure of 78.20%, a receiver operating characteristic curve (ROC) area of 89.20%, and a precision-recall curve (PRC) area of 81.70%. It is important to design effective intervention strategies for patients who are obese or have high APACHE II scores and to provide timely treatment for patients whose disease onset occurs at home. Moreover, by using the C4.5 algorithm, data can be interpreted in terms of decision trees to aid the understanding of the medical outcomes of RCC patients.


Introduction
Patients ventilated for a prolonged period often consume a high amount of intensive care unit (ICU) resources [1]. It is estimated that 3% to 10% of ICU patients need prolonged mechanical ventilation and that these patients utilize 37–40% of ICU resources [2]. Taiwan's National Health Insurance Administration proposed a four-step Integrated Delivery System (IDS) for the care of ICU patients who receive ventilator treatment for longer than 21 days and require lengthy mechanical ventilation. Under this system, patients are transferred from ICUs to respiratory care centers (RCCs). Patients may also be transferred to a respiratory care ward (RCW) if they stay at an RCC for more than 42 days. RCWs are chronic care units designed for patients with prolonged respiratory failure [3,4]. Due to limited medical resources and the rising cost of health care, disease prognosis has become a very important issue for the health sciences.
The Acute Physiology and Chronic Health Evaluation II (APACHE II) score is a severity-of-illness score that is taken at the time of admission of the patient to the ICU, showing the worst figures during the patient's first 24 h of ICU stay [5]. It has been found that there is a significant association between APACHE II scores and risk of mortality (ROM), i.e., the higher the APACHE II score, the higher the ROM [6]. Clarification of the relationship between APACHE II scores and the medical outcomes of RCC patients can assist in designing more effective medical protocols.
Whether and how obesity affects the ROM of ICU patients is unclear. Akinussi et al. [7], for instance, reported that obesity (body mass index (BMI) ≥ 30 kg/m²) was not related to crude ICU mortality when compared with a BMI lower than 30 kg/m². Oliveros et al. [8] found lower mortality for obese patients than for those with normal weight, whereas Hogue et al. [9] stated that obesity is not associated with increased risk for ICU mortality. O'Brien et al. [10] demonstrated that lower BMI was linked with higher ROM, whereas overweight and obesity were associated with lower mortality. Additionally, the relationship between BMI and the weaning rates of RCC patients has received little study.
Timely access to appropriate care is imperative for critically ill patients. It has, for example, been found that transported patients have a significantly longer ICU stay and greater estimated ROM compared to non-transported patients [11]. The early transfer of patients has also been found to mitigate the risk of adverse impacts on medical outcomes [12]. The source from which patients are transported may influence the delivery process when they are referred to the ICU and thus affect the medical outcome.
An exploration of the relationship between sociodemographic factors and successful weaning rates has shown that gender does not affect weaning rates in the Taiwanese context [4]. Frengley et al. [13] showed that successful weaning rates decrease with patients' age. However, other studies have reported that weaning rates have no relationship with age [14]. The critical issues that affect RCC patients' weaning rates have been shown to be the causes of respiratory failure as well as comorbid diseases such as pulmonary disease, cardiovascular disease, cancer, and renal failure [3].
Data mining algorithms have been developed to analyze clinical data to aid in the diagnosis of diseases such as coronary heart disease, Type 2 diabetes, and Parkinson's disease [15][16][17]. Data mining is a new scientific area that integrates research from the fields of statistics, machine learning, and computer science (particularly database management). These algorithms facilitate the early detection of diseases, more precise medical treatment, and better use of social medical resources. The C4.5 algorithm is an improved version of the ID3 (Iterative Dichotomiser 3) algorithm. It uses the information gain ratio to select attributes, which produces a more compact decision tree and enhances the algorithm's efficiency. It has been used in many ways in the fields of medical and clinical informatics. For example, it has been successfully implemented for the testing of a clinical guideline-based decision support system (DSS) relating to chronic diseases [18]. Moreover, it has been adopted for the development of application tools that diagnose diabetes infection symptoms [19].
The aim of our study is to explore the medical outcome distribution among patients who stayed in the RCC with different APACHE II scores and sociodemographic features and to obtain knowledge regarding factors related to the prognosis of RCC patients, especially concerning the roles and effects of BMI and patient sources. In addition to the inferential statistical analysis, the C4.5 decision tree algorithm was applied to the data. Its performance was measured by precision, recall, F-measure (the harmonic mean of precision and recall), receiver operating characteristic curve (ROC) area, and precision-recall curve (PRC) area.

Study Design
A cross-sectional design was employed to achieve the study's research objectives. Clinical data were collected from 236 patients (135 men; 101 women) who received RCC medical services and were transferred from the ICU to the RCC at a regional hospital in central Taiwan between January 2018 and December 2018. Anonymous analysis of the data was used to ensure confidentiality, and the protocol was approved by Jen-Ai Hospital's Medical Ethics Committee (IRB no. 108-74).

Measurements
Outcome variables were classified according to patients' medical outcomes: critical or deceased, transferred to an RCW, or liberated from the ventilator (weaning). The independent variables were obtained from patients' medical records, i.e., age, gender, body weight, and height. The BMI value was calculated from weight and height and categorized in accordance with the BMI classification of Taiwan's Health Promotion Administration (<18.5 underweight; 18.5–23.9 healthy; 24.0–26.9 overweight; ≥27 obese). Other factors included the source of admission (home or long-term care facility) and the APACHE II score (divided into three categories: <15, 15–24, and ≥25). These variables were used to analyze the relationship with the medical outcomes of the patient.
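The categorization described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and the healthy/overweight boundary is assumed to be 24.0, matching the 18.5–23.9 and 24.0–26.9 ranges used in the results and discussion.

```python
def bmi_category(weight_kg: float, height_m: float) -> str:
    """Return the BMI class for a patient (cut-offs as assumed in the lead-in)."""
    bmi = weight_kg / height_m ** 2  # BMI = weight (kg) / height (m) squared
    if bmi < 18.5:
        return "underweight"
    if bmi < 24.0:  # assumed boundary between healthy and overweight
        return "healthy"
    if bmi < 27.0:
        return "overweight"
    return "obese"


def apache_category(score: int) -> str:
    """Return the APACHE II category used in the analysis: <15, 15-24, or >=25."""
    if score < 15:
        return "<15"
    if score < 25:
        return "15-24"
    return ">=25"
```

For instance, a hypothetical patient weighing 80 kg at 1.70 m has a BMI of about 27.7 and falls into the obese class.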

Statistical Analysis
Frequency analyses, mean values, and standard deviations were used to describe the distribution of patients' medical outcomes, sociodemographic factors, and APACHE II scores. Furthermore, to evaluate the relationship between patients' medical outcomes and related variables, a chi-square test and ordinal logistic regression analysis were performed for the inferential statistical analysis. SPSS software (version 18.0; SPSS Inc., Chicago, IL, USA) was used.
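As an illustration of the chi-square test of independence mentioned above, the following Python sketch computes the Pearson statistic and its degrees of freedom for a contingency table. The function name and the counts are hypothetical and are not the study data; the study's analysis itself was run in SPSS.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts (NOT study data): rows = two admission sources,
# columns = three medical outcomes.
counts = [[30, 20, 10],
          [20, 20, 20]]
statistic, dof = chi_square_statistic(counts)
```

The statistic would then be compared against the chi-square distribution with `dof` degrees of freedom to obtain a p-value.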

C4.5 Decision Tree Algorithm and Performance Evaluation
The C4.5 decision tree algorithm was used to analyze the data. Two models were built: one with five basic attributes and one with the same five attributes plus 12 disease attributes. C4.5, originally developed by Quinlan [20] as an improvement of his earlier ID3 algorithm, is used to generate decision trees, which serve mainly for classification. For this reason, C4.5 is often referred to as a statistical classifier, and it has been described as a landmark decision tree program that is probably the machine learning workhorse most widely used in practice to date [21]. Like ID3, C4.5 builds decision trees from a set of training data using the concept of information entropy. The training data are a set of pre-classified samples. At each node of the tree, C4.5 picks the attribute that most effectively splits its set of samples into subsets enriched in one class or another. The splitting criterion is the normalized information gain (gain ratio), and the attribute with the highest gain ratio is chosen for the decision. Quinlan [20] suggests the following equation for the gain ratio of a test on feature X over a training set T:

$$\mathrm{GainRatio}(X, T) = \frac{\mathrm{Gain}(X, T)}{\mathrm{SplitInfo}(X, T)}$$

Split Info is the information content of a message that indicates not the class to which a case belongs but rather the outcome of the test on feature X. For a test that partitions T into subsets $T_1, \dots, T_n$, it is given by:

$$\mathrm{SplitInfo}(X, T) = -\sum_{i=1}^{n} \frac{|T_i|}{|T|} \log_2 \frac{|T_i|}{|T|}$$

GainRatio(X, T) is thus the proportion of information generated by the split that is useful for classification.
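The gain ratio criterion can be sketched in a few lines of Python. This is an illustrative implementation of the standard definitions (information gain divided by split information) for a single categorical attribute, not the software used in the study; the function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the categorical attribute `values`."""
    total = len(labels)
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum(len(s) / total * log2(len(s) / total) for s in subsets.values())
    return gain / split_info if split_info > 0 else 0.0


# Toy data (hypothetical): the attribute separates the two classes perfectly.
source = ["home", "home", "facility", "facility"]
outcome = ["critical", "critical", "weaning", "weaning"]
ratio = gain_ratio(source, outcome)
```

Dividing by the split information penalizes attributes that fragment the data into many small subsets, which plain information gain would otherwise favor.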
The performance evaluators used for the C4.5 decision tree algorithm were precision, recall, F-measure, ROC area, and PRC area. These are defined below, where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives:

Precision (positive predictive value (PPV)): $\mathrm{Precision} = \frac{TP}{TP + FP}$;

Recall (also known as sensitivity, or true positive rate (TPR)): $\mathrm{Recall} = \frac{TP}{TP + FN}$;

F-measure (also known as F1 score, which is the harmonic mean of precision and recall): $F = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$;

ROC area: The ROC area is the area under the ROC curve (AUC), one of the common evaluators for machine learning algorithms. A ROC curve is a plot of the false positive rate (x-axis) versus the true positive rate (y-axis) for a number of different candidate threshold values between 0 and 1;

PRC area: The PRC area is the area under the PRC curve, another common performance evaluator for machine learning methods. A PRC curve is a plot of the precision (y-axis) against the recall (x-axis) for different thresholds, similar to the ROC curve.

Appl. Sci. 2021, 11, 2566
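The definitions above can be made concrete with a short Python sketch for the binary case. It is illustrative only: the function names and example numbers are hypothetical, and the study itself reports averages over three outcome classes rather than a single binary problem. The ROC area is computed through its rank interpretation, i.e., the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F-measure from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


def roc_auc(scores, labels):
    """Area under the ROC curve; labels are 1 (positive) or 0 (negative)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    # Count positive/negative pairs ranked correctly; ties count as one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 8 true positives, 2 false positives, 2 false negatives.
precision, recall, f1 = precision_recall_f1(8, 2, 2)
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

For a multi-class problem, these quantities are typically computed per class in a one-versus-rest fashion and then averaged.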

The Description of the Sample Data
The study included 236 patients (135 men; 101 women). Overall, 139 (58.9%) of the patients were above 75 years old (M = 74.1 ± 14.6 years old). A total of 66 (28.0%) of the patients' negative outcomes were due to cardiac disease, and 24 (10.2%) were due to cancer. Moreover, 111 (47.2%) of the patients' BMI values fell in the normal range (18.5–23.9), and 32.3% were obese (≥27.0). More patients were transferred from home than from a long-term care facility (54.7%/47.3%). Half (50%) of the samples' APACHE II scores fell in the 15–24 range. Overall, 47.9% of the patients were liberated from the ventilator, 26.3% were admitted to an RCW, and 25.8% were certified as either critical or deceased. The samples' basic sociodemographic and clinical features are shown in Table 1.

The differences in the distribution of patients' medical outcomes, as assessed with χ² tests, are shown in Table 2. The univariate analysis showed that the variables with a statistically significant difference in the distribution of medical outcomes were BMI level (p = 0.006), source of admission (p = 0.011), APACHE II score classification (p = 0.028), and diabetes (p = 0.023). When comparing the differences in medical outcomes, a higher percentage of successful weaning was found in patients with a normal BMI level (51.4%), those transported from a long-term care institution (56.1%), and those with an APACHE II score of less than 15 (56.3%). Conversely, 47.1% of the patients with BMI > 30 and 33.3% of the patients received from home died or were deemed critical. A higher APACHE II score was also associated with a lower percentage of weaning. No statistically significant differences between groups were found regarding gender, age, and other diseases (Table 2).

Table 3 shows the results of the ordinal logistic regression analysis. The results indicate that BMI, source of admission, and APACHE II scores can be considered predictors of medical outcomes.
Obese patients had 2.426 times (95% C.I. = 1.106 to 5.318) the odds of normal-weight patients of being critical or deceased (p = 0.027). Patients who were transported from home had 2.104 times (95% C.I. = 1.257 to 3.523) the odds of those from a long-term care institution (p = 0.005), and patients with APACHE II scores ≥ 25 had 2.640 times (95% C.I. = 1.283 to 5.433) the odds of patients whose APACHE II score was < 15 (p = 0.008).
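The reported odds ratios and confidence intervals can be cross-checked with a few lines of Python. The sketch assumes that the intervals are Wald-type intervals, i.e., symmetric around the estimate on the log-odds scale, which the source does not state explicitly; under that assumption the odds ratio is the geometric mean of the interval bounds. The function name is hypothetical.

```python
from math import exp, log


def summary_from_ci(lower, upper, z=1.96):
    """Odds ratio and log-scale standard error implied by a 95% Wald interval."""
    log_or = (log(lower) + log(upper)) / 2       # midpoint on the log scale
    std_err = (log(upper) - log(lower)) / (2 * z)
    return exp(log_or), std_err


# (reported OR, lower bound, upper bound) taken from the text above
reported = {
    "obesity": (2.426, 1.106, 5.318),
    "admitted from home": (2.104, 1.257, 3.523),
    "APACHE II >= 25": (2.640, 1.283, 5.433),
}
for name, (odds_ratio, lower, upper) in reported.items():
    implied, std_err = summary_from_ci(lower, upper)
    print(f"{name}: reported {odds_ratio:.3f}, implied {implied:.3f}, SE {std_err:.3f}")
```

Under the stated assumption, each implied odds ratio agrees with the reported value to within rounding.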

C4.5 Decision Tree Algorithm
Two different models were built using different attributes as the input for the C4.5 algorithm (i.e., the patients' medical data), and the experimental results were derived using 10-fold cross validation.
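The 10-fold cross validation procedure can be sketched as follows. This is a generic illustration, not the tooling used in the study: the function name, the shuffling seed, and the absence of class stratification are assumptions. Each sample is used for testing exactly once and for training in the other nine rounds.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k nearly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# With the study's sample size of 236, each fold holds 23 or 24 patients.
for train, test in k_fold_indices(236, k=10):
    pass  # fit the decision tree on `train`, evaluate it on `test`
```

The performance measures from the ten rounds are then pooled or averaged to give the reported figures.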
For Model I, five basic attributes were selected based on the statistical analysis. These included age, APACHE II score, move status, gender, and BMI value. Table 4 and Figure 1 show the performance of the C4.5 decision tree algorithm using these five basic attributes. The results were as follows: precision (74.90%), recall (70.80%), F-measure (69.60%), ROC area (85.10%), and PRC area (76%). The decision tree agrees well with the inferential statistical results reported above. It additionally reveals that younger patients tend to fall into a class with a high probability of successful weaning.
For Model II, in addition to the five basic attributes adopted in Model I, 12 disease-related factors were added as input for the C4.5 decision tree algorithm. These included cancer, cardiac disease, pneumonia, cerebrovascular accident, diabetes, lower respiratory illnesses, hypertension, chronic kidney disease (CKD), liver disease, dementia, Parkinson's disease, and other chronic diseases. Table 5 and Figure 2 show the performance of the C4.5 decision tree algorithm with its 5 basic and 12 disease attributes. The results of the C4.5 algorithm improved as follows: precision (79.80%), recall (78.80%), F-measure (78.20%), ROC area (89.20%), and PRC area (81.70%).

C4.5 PERFORMANCE WITH 5 BASIC ATTRIBUTES
Columns: Weaning, Critical_Death, RCW, Weighted Avg.

Discussion
Patients ventilated for prolonged periods often experience progressive chronic respiratory failure and substantial comorbidity. They also consume a large percentage of intensive care resources. It is, therefore, necessary to identify the significant factors related to RCC patients' medical outcomes. The results of this study indicated that BMI, APACHE II scores, and patient source are related to the medical outcomes of RCC patients. Compared to patients with successful weaning, critical or deceased patients were characterized by being obese (BMI value ≥ 27.0), transferred from home, and having APACHE II scores ≥ 25.
In ventilated patients, obesity (BMI value ≥ 27.0) was found to be related to higher odds of death or critical medical outcomes compared with normal weight, whereas overweight (BMI value between 24.0 and 26.9) and underweight (BMI value < 18.5) were not found to be significantly associated with the distribution of medical outcomes. This result implies that obesity may play an important role in the prognosis of RCC patients. These results are consistent with part of the extant literature. Mancuso [22], for instance, suggested that obesity may be a significant factor in the pathogenesis of pulmonary diseases because pro-inflammatory mediators produced in adipose tissue induce a state of systemic inflammation. Some researchers have also reported a higher frequency of organ dysfunction in obese patients [23][24][25]. However, other studies have found that a BMI less than 18.5 kg/m² was associated with increased mortality in ICU patients [26,27]. A possible reason is that BMI status is an indicator of general nutrition, and sufficient nutrition is important for respiratory muscle contractility and the consequent ability to expectorate phlegm. As such, the effect of obesity on the medical outcomes of RCC patients needs further investigation.
The APACHE II scoring system was developed by Knaus et al. in 1985 [5]. The score is calculated from the values of 12 physiological variables, age, and chronic health status, each of which is assigned points in a similar manner. Points are allotted to the worst value of each variable when the APACHE II score is calculated. Studies have demonstrated that an increasing score is closely associated with ROM in ICU patients and with the outcome in a wide range of disease conditions [6,28,29]. The results of our study similarly demonstrate a significant association between APACHE II scores and the medical outcome of RCC patients. It was found that the higher the APACHE II score, the higher the risk of being critical or deceased. These findings support the ability of this scoring system to predict patients' outcomes according to the severity of their disease.
Individuals living at home are usually thought to be healthier and to have a better ability to take care of themselves than those staying in long-term care institutions. However, the results of this study demonstrate that RCC patients who were transported from home had worse medical outcomes than patients who were transferred from long-term care facilities. The odds of becoming critical or deceased for patients transported from home were found to be more than twofold (OR = 2.104). This may be attributable to a delay in receiving medical treatment among patients transported from home. This result is also supported by the extant literature, in which transportation barriers have been found to play an important role: delayed access to health care may cause extended suffering, difficult and costly treatment, and increased morbidity and mortality, whereas intervention strategies such as the early detection of diseases and timely treatment can save patients transported from home [30]. Therefore, the reasons behind the poor medical outcomes of RCC patients referred from home require further investigation.
The results of the C4.5 decision tree algorithm suggest that adding the 12 disease attributes to the analysis improved the precision by almost 5 percentage points, from 74.90% to 79.80%. The recall also improved substantially, from 70.80% to 78.80%. The other three performance measurements, namely, F-measure, ROC area, and PRC area, also improved, as shown in Table 6 and Figure 3. These data can assist medical personnel in recognizing the workings of the system for the prognosis of RCC patients. As such, medical personnel can recognize the result and intervene in the structure of the resulting decision tree. Future studies can also be conducted to assist with the interpretation of patient data in order to facilitate high rates of true positive and true negative results.
Schönhofer et al. (2002) found age to be associated with increased mortality in ventilated patients [31], i.e., ventilator-dependent patients older than 80 usually experience extremely poor medical outcomes. Our study, however, did not find age to be a risk factor for the medical outcomes of RCC patients; higher age was not clearly associated with worse health status. As such, there may be other factors related to the medical outcomes of RCC patients, and more research is needed to explore the effect of age on the prognosis of RCC patients.
As with all research, our study has limitations. One is that the study employed a cross-sectional design, so the effect of the factors related to RCC patients' medical outcomes needs further examination. In addition, the study sample originated from one metropolitan regional hospital in central Taiwan, and thus, the findings may not be generalizable to RCC patients throughout Taiwan. Another weakness is the small number of cases in some of the categories used to evaluate the relationship between the variables. Despite these weaknesses, our study contributes to the literature in that it explored the relationship between the distribution of medical outcomes and factors such as APACHE II scores, BMI status levels, and admission sources for RCC patients in Taiwan.

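C4.5 PERFORMANCE COMPARISON FOR MODEL I AND MODEL II
Measure      Model I (5 basic attributes)   Model II (5 basic + 12 disease attributes)
Precision    74.90%                         79.80%
Recall       70.80%                         78.80%
F-measure    69.60%                         78.20%
ROC area     85.10%                         89.20%
PRC area     76%                            81.70%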

Conclusions
Weaning in the RCC suggests that the patient can tolerate spontaneous breathing. Since patients ventilated for a prolonged period usually consume a large amount of medical resources, our study explored and identified important factors related to their medical outcomes. It also applied the C4.5 data mining algorithm, with which clinical data can be interpreted in terms of a decision tree to aid the understanding of the medical outcomes of RCC patients. We suggest that new intervention strategies be designed to care for patients who are obese or have high APACHE II scores and that more effective methods be developed for the timely treatment of patients whose disease onset occurs at home.

Institutional Review Board Statement:
The study protocol was approved by the Medical Ethics Committee of Jen-Ai Hospital (IRB no. 108-74).

Informed Consent Statement:
The institutional review board of the Medical Ethics Committee of Jen-Ai Hospital permitted this study without the requirement of written informed consent from any of the study patients.