4.1. Results from Model 1
From Table 2, the most important predictor contributing to predicting the response is TSH (X17), with a score of 100. The second most important predictor is FTI (X21), with a score of 63.88. We can observe that all six quantitative variables contribute to predicting the response, as each quantitative variable has a score higher than zero. The dependency plots of all the quantitative variables are shown in Figure 1, Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6: the vertical axis represents 0.5[log(p1/p0)], where p1 is the probability that an observation is in class 1 and p0 is the probability that it is in class 0. For simplicity, we refer to 0.5[log(p1/p0)] as the logit or log-odds. The interpretation from TreeNet is based on comparing the relative values of the log-odds; i.e., the higher the value of the log-odds, the higher the probability of being in class 1.
From the partial dependency plot, we take each quantitative variable and divide its range into categories according to how each part relates to the logit. After the original variables are divided into parts, new variables are generated as binary variables (Table 3). Please note that TreeNet by Salford Systems includes a feature that shows which values of the predictors constitute the separating points.
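To make this construction concrete, the sketch below shows one way to turn a quantitative predictor into consecutive level indicators from its splitting values. The helper name, the pandas-based implementation, and the convention that a splitting value starts the next level are our own assumptions, not part of the original procedure.

```python
import pandas as pd

def make_level_indicators(x: pd.Series, splits, prefix: str) -> pd.DataFrame:
    """Cut a quantitative predictor at the given splitting values and
    return one 0/1 indicator column per consecutive level."""
    bins = [-float("inf")] + sorted(splits) + [float("inf")]
    # right=False places each splitting value at the start of the next level
    # (an assumption; the paper does not state which side the split belongs to).
    levels = pd.cut(x, bins=bins, labels=False, right=False)
    out = pd.DataFrame(index=x.index)
    for k in range(len(splits) + 1):
        out[f"{prefix}{k + 1}"] = (levels == k).astype(int)
    return out
```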
Figure 1 shows the partial dependency plot, which indicates the relationship between TSH (X17) and the log-odds from Model 1. From this plot, we can separate X17 into two levels given that the log-odds value shifts to a different level when X17 = 0.025, i.e., the splitting value is 0.025. The two binary variables, X17L1 and X17L2, are generated as shown in Table 3.
Figure 2 shows the partial dependency plot, which indicates the relationship between FTI (X21) and the log-odds from Model 1. From this plot, we can separate X21 into three levels given that the log-odds value stays constant and then shows a downward slope when X21 = 0.055. The downward slope stops when X21 = 0.07, and there is no change in the log-odds after this value. Therefore, this predictor is separated into three levels with two splitting values, 0.055 and 0.07. The three binary variables, X21L1, X21L2, and X21L3, are generated as shown in Table 3.
Figure 3 shows the partial dependency plot, which indicates the relationship between TT4 (X19) and the log-odds from Model 1. From this plot, we can separate X19 into three levels given that the log-odds value stays constant and then shows a downward slope when X19 = 0.042. The downward slope stops when X19 = 0.065, and there is no change in the log-odds after this value. Therefore, this predictor is separated into three levels with two splitting values, 0.042 and 0.065. The three binary variables, X19L1, X19L2, and X19L3, are generated as shown in Table 3.
Figure 4 shows the partial dependency plot, which indicates the relationship between T3 (X18) and the log-odds from Model 1. From this plot, we can separate X18 into two levels given that the log-odds value drops to a different level when X18 = 0.006, i.e., the splitting value is 0.006. The two binary variables, X18L1 and X18L2, are generated as shown in Table 3.
Figure 5 shows the partial dependency plot, which indicates the relationship between T4U (X20) and the log-odds from Model 1. From this plot, we can separate X20 into two levels given that the log-odds value drops to a different level when X20 = 0.097, i.e., the splitting value is 0.097. The two binary variables, X20L1 and X20L2, are generated as shown in Table 3.
Figure 6 shows the partial dependency plot, which indicates the relationship between age (X1) and the log-odds from Model 1. From this plot, we can separate X1 into two levels given that the log-odds value drops to a different level when X1 = 0.15, i.e., the splitting value is 0.15. The two binary variables, X1L1 and X1L2, are generated as shown in Table 3.
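Assuming a DataFrame df that holds the scaled predictors X1–X21 (an assumption about how the data are stored), the splitting values read from Figures 1–6 can be applied with the helper sketched above to reproduce the Table 3 variables:

```python
# Splitting values taken from the partial dependency plots of Model 1 (Figures 1-6).
model1_splits = {
    "X17": [0.025],         # TSH,  Figure 1
    "X21": [0.055, 0.07],   # FTI,  Figure 2
    "X19": [0.042, 0.065],  # TT4,  Figure 3
    "X18": [0.006],         # T3,   Figure 4
    "X20": [0.097],         # T4U,  Figure 5
    "X1":  [0.15],          # age,  Figure 6
}
level_vars = pd.concat(
    [make_level_indicators(df[col], splits, prefix=f"{col}L")
     for col, splits in model1_splits.items()],
    axis=1,
)
# Columns: X17L1, X17L2, X21L1, X21L2, X21L3, X19L1, ..., X1L1, X1L2 (as in Table 3).
```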
The second categories of X21 and X19 show a linear trend. Therefore, there are two more variables to generate for these two categories (Table 4).
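Because the log-odds changes linearly inside the second category of X21 (0.055–0.07) and of X19 (0.042–0.065), Table 4 adds one variable per category that carries the predictor value inside that interval. The construction below is a minimal sketch under that reading; the names X21Q2 and X19Q2 are hypothetical placeholders for the labels used in Table 4.

```python
import numpy as np
import pandas as pd

def make_trend_variable(x: pd.Series, lower: float, upper: float, name: str) -> pd.Series:
    """Keep the predictor value inside the interval where the log-odds
    changes linearly and set it to zero elsewhere (assumed construction)."""
    inside = (x >= lower) & (x < upper)
    return pd.Series(np.where(inside, x, 0.0), index=x.index, name=name)

x21_trend = make_trend_variable(df["X21"], 0.055, 0.07, name="X21Q2")   # second category of FTI
x19_trend = make_trend_variable(df["X19"], 0.042, 0.065, name="X19Q2")  # second category of TT4
```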
4.2. Results from Model 2
The variable importance scores for Model 2 are shown in Table 5.
From Table 5, the most important predictor that contributes to predicting the response is TSH (X17), with a score of 100. The second most important predictor is thyroxine (X3), with a score of 44.88. We can observe that all six quantitative variables contribute to predicting the response, as each quantitative variable has a score higher than zero. The dependency plots of all the quantitative variables are shown in Figure 7, Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12: the vertical axis represents 0.5[log(p2/p0)], where p2 is the probability that an observation is in class 2 and p0 is the probability that it is in class 0. The higher the value of the log-odds, the higher the probability of belonging to class 2.
From Figure 7, Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12, we take each quantitative variable and divide its range into categories according to how each part relates to the logit. Next, the new variables are generated as binary variables (Table 6). Please note that TreeNet by Salford Systems includes a feature that shows which values of the predictors constitute the separating points.
Figure 7 shows the partial dependency plot, which indicates the relationship between TSH (X17) and the log-odds from Model 2. From this plot, we can separate X17 into two levels given that the log-odds value shifts to a different level when X17 = 0.006, i.e., the splitting value is 0.006. The two binary variables, X17LL1 and X17LL2, are generated as shown in Table 6.
Figure 8 shows the partial dependency plot, which indicates the relationship between TT4 (X19) and the log-odds from Model 2. From this plot, we can separate X19 into four levels given that the log-odds value shifts to a different level when X19 = 0.065. Then, the value shows a downward slope when X19 = 0.145. The downward slope stops when X19 = 0.161, and there is no change in the log-odds after this value. Therefore, this predictor is separated into four levels with three splitting values: 0.065, 0.145, and 0.161. The four binary variables, X19LL1, X19LL2, X19LL3, and X19LL4, are generated as shown in Table 6.
Figure 9 shows the partial dependency plot, which indicates the relationship between T3 (X18) and the log-odds from Model 2. From this plot, we can separate X18 into three levels given that it shows a downward slope when X18 = 0.02. The downward slope stops when X18 = 0.045, and there is no change in the log-odds after this value. Therefore, this predictor is separated into three levels with two splitting values, 0.02 and 0.045. The three binary variables, X18LL1, X18LL2, and X18LL3, are generated as shown in Table 6.
Figure 10 shows the partial dependency plot, which indicates the relationship between FTI (X21) and the log-odds from Model 2. From this plot, we can separate X21 into five levels given that it shows an upward slope when X21 = 0.057. The upward slope stops when X21 = 0.071, and the log-odds value stays constant until X21 = 0.115. Then, the plot shows a downward slope until X21 = 0.217. There is no change in the log-odds after this value. Therefore, this predictor is separated into five levels with four splitting values: 0.057, 0.071, 0.115, and 0.217. The five binary variables, X21LL1, X21LL2, X21LL3, X21LL4, and X21LL5, are generated as shown in Table 6.
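The helper sketched in Section 4.1 handles this multi-level case directly; for example, the five FTI levels of Model 2 follow from its four splitting values (again assuming df holds the scaled predictors):

```python
# FTI (X21) in Model 2: five levels from the four splitting values in Figure 10.
x21_model2 = make_level_indicators(
    df["X21"], splits=[0.057, 0.071, 0.115, 0.217], prefix="X21LL"
)
# Columns: X21LL1, X21LL2, X21LL3, X21LL4, X21LL5 (as in Table 6).
```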
Figure 11 shows the partial dependency plot, which indicates the relationship between T4U (X20) and the log-odds from Model 2. From this plot, we can separate X20 into three levels given that the log-odds value shifts to a different level twice, when X20 = 0.07 and when X20 = 0.15, i.e., the splitting values are 0.07 and 0.15. The three binary variables, X20LL1, X20LL2, and X20LL3, are generated as shown in Table 6.
Figure 12 shows the partial dependency plot, which indicates the relationship between age (X1) and the log-odds from Model 2. From this plot, there is a downward slope when X1 = 0.75. The downward slope stops when X1 = 0.85, and there is no change in the log-odds after this value. Therefore, this predictor is separated into three levels with two splitting values: 0.75 and 0.85. The three binary variables, X1LL1, X1LL2, and X1LL3, are generated as shown in Table 6.
Note that the second and fourth categories of X21, the third category of X19, the second category of X18, and the second category of X1 all show a linear trend. Therefore, there are five more variables to generate for these five categories (Table 7).
All the variables generated in this step, as shown in Table 3, Table 4, Table 6 and Table 7, serve as the input for building the multinomial model in the final step (step 5). However, only the generated binary variables from Table 3 and Table 6 are included in the input used to search for interactions via ASA in step 2, since ASA can find rules from categorical variables only.
Step 2: Use CBA to obtain the active rules. In this step, the variables input into the process are (i) the original categorical predictors (X2–X16) and (ii) the generated binary variables from Table 3 and Table 6. For this dataset, because the first class (hypo-function) accounts for only 2.47% of the observations, it is necessary to lower the minimum support to below 1% in order to capture the rules for this class; the minimum confidence is set at 80% to generate the active rules. In total, 5808 rules are generated in this step.
Step 3: Select all the classifier rules from CBA. As a large number of rules are generated in step 2, we take this approach to decrease the number of rules and thereby simplify the process. In total, 26 classifier rules are generated, some examples of which are as follows:
Rule 6: If X20L1 = 1, X18LL1 = 0, and X21LL4 = 1, then Y = 0 with s = 8.537%, c = 100%.
Rule 8: If X11 = 0, X19LL3 = 1, and X18LL1 = 0, then Y = 0 with s = 3.075%, c = 100%.
Rule 22: If X8 = 0, X17L1 = 0, and X19L3 = 0, then Y = 1 with s = 1.935%, c = 97.26%.
Rule 26: If X3 = 0, X17LL1 = 0, and X19LL2 = 1, then Y = 2 with s = 5.276%, c = 88.945%.
Step 4: Convert the 26 classifier rules into variables. For example, Rule 6 is converted into the new variable referred to as X20L1(1)X18LL1(0)X21LL4(1), and Rule 22 is converted into the new variable referred to as X8(0)X17L1(0)X19L3(0). From the 26 classifier rules, we generate 26 interactions. However, two extra variables are generated from Rule 6 and Rule 8. From Model 2, the fourth category of X21 and the third category of X19 each generate a variable, X21QQ4 and X19QQ3, respectively. For Rule 6, as X21LL4 = 1, we generate another interaction, which involves X21QQ4 and is referred to as X20L1(1)X18LL1(0)X21QQ4. To generate this new variable, we multiply the generated variable X21QQ4 by the dummy variable representing the rule conditions. The extra variable from Rule 8 is generated similarly. In total, 28 interactions are generated from the 26 classifier rules.
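A minimal sketch of the conversion for Rule 6 is shown below, assuming a DataFrame named features that already contains the generated variables from Tables 3, 4, 6 and 7 (the frame name is illustrative):

```python
# Rule 6: if X20L1 = 1, X18LL1 = 0 and X21LL4 = 1, then Y = 0.
rule6 = (
    (features["X20L1"] == 1)
    & (features["X18LL1"] == 0)
    & (features["X21LL4"] == 1)
).astype(int)
features["X20L1(1)X18LL1(0)X21LL4(1)"] = rule6

# Extra variable: the rule dummy multiplied by the linear-trend variable X21QQ4.
features["X20L1(1)X18LL1(0)X21QQ4"] = rule6 * features["X21QQ4"]
```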
Step 5: In this illustration, we apply the backward stepwise method and use BIC to select the multinomial logit model. The candidate variables comprise the original variables (X1–X21), all the variables generated from TreeNet, and all 28 potential interactions from step 4.
From the backward stepwise method, we obtain the following model, which yields the best BIC at 233.41. We select this model to represent this dataset, and it will be used for classification. To be specific, we will use this model to compute the probability of each patient belonging to each class based on the patient's predictor values. Then, we can assign the most likely class to each patient.
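A minimal sketch of such a backward search under BIC, followed by classification from the predicted class probabilities, is given below using statsmodels' MNLogit. The function name, the greedy one-variable-at-a-time elimination, and the use of statsmodels are our own assumptions about one way to carry out this step, not a description of the software used in the paper; y is assumed to hold the class labels (0, 1, 2) and X the candidate predictors.

```python
import numpy as np
import statsmodels.api as sm

def backward_bic_mnlogit(y, X):
    """Greedy backward elimination for a multinomial logit model:
    at each pass, drop the predictor whose removal lowers BIC the most."""
    cols = list(X.columns)
    best_bic = sm.MNLogit(y, sm.add_constant(X[cols])).fit(disp=0).bic
    while len(cols) > 1:
        candidates = []
        for col in cols:
            trial = [c for c in cols if c != col]
            bic = sm.MNLogit(y, sm.add_constant(X[trial])).fit(disp=0).bic
            candidates.append((bic, col))
        bic, drop = min(candidates)
        if bic >= best_bic:          # no single removal improves BIC: stop
            break
        best_bic, cols = bic, [c for c in cols if c != drop]
    return cols, best_bic

selected, bic = backward_bic_mnlogit(y, X)
final = sm.MNLogit(y, sm.add_constant(X[selected])).fit(disp=0)
probs = np.asarray(final.predict(sm.add_constant(X[selected])))  # one column per class
predicted_class = probs.argmax(axis=1)   # assign each patient the most likely class
```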
We can further establish the following facts based on the signs and values of the coefficients. The interpretation of the level variables generated from TreeNet will be compared with the dependency plots from the TreeNet models, and the interactions generated in step 4 will be compared with the results from ASA (the log-odds relationship underlying these comparisons is sketched after the list):
(i) A patient who is on thyroxine and/or has had thyroid surgery has a higher probability of being in the normal class (Y = 0) than in any of the other classes.
(ii) A patient with an FTI value of 0.07 or higher has a greater probability of being in the normal class (Y = 0) than in the hypo-function class (Y = 1). This result is consistent with the result shown in Figure 2.
(iii) A patient with a TSH value below 0.006 has a greater probability of being in the normal class (Y = 0) than in the hyper-function class (Y = 2). This result is consistent with the result shown in Figure 7.
(iv) A patient with a TT4 value of between 0.065 and 0.145 or a T3 value below 0.02 has a greater probability of being in the hyper-function class (Y = 2) than in the normal class (Y = 0). This result is consistent with the results shown in Figure 8 and Figure 9.
(v) The higher the value of FTI in the range 0.057 to 0.071, the greater the probability of a patient being in the hyper-function class (Y = 2) than in the normal class (Y = 0). This result is consistent with the result shown in Figure 10.
(vi) A patient who has never had thyroid surgery and has a TT4 value below 0.065 and a TSH value of 0.025 or higher has a greater probability of being in the hypo-function class (Y = 1) than in either of the other two classes. This result is consistent with Rule 22.
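These statements follow from reading the fitted coefficients as log-odds against a baseline class. Assuming the normal class (Y = 0) is the reference category (the usual multinomial logit parameterization, which appears consistent with facts (i)–(vi)), the model takes the form log[P(Y = j | x)/P(Y = 0 | x)] = x′βj for j = 1, 2, so a positive coefficient on a predictor in the class-j equation raises the odds of class j relative to the normal class, and a negative coefficient lowers them.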
Our proposed model (Model 3) provides a useful interpretation. However, we will also compare the performance of our model with that of multinomial logit models developed from different sets of input. As shown in Table 8, Model 1 is the multinomial logit model selected when the candidate predictors comprise only the main effects (X1–X21), whereas Model 2 is the multinomial logit model selected when the candidate predictors comprise the main effects (X1–X21) and all the two-way interactions (X1X2–X20X21, where XiXj = Xi·Xj). Note that all the models are obtained from stepwise regression using the BIC criterion.