# Cohen’s Kappa Coefficient as a Measure to Assess Classification Improvement following the Addition of a New Marker to a Regression Model

## Abstract

## 1. Introduction

## 2. Materials and Methods

[…] $O' = a_{O} - e_{O}$ and the number of patients correctly reclassified as diseased by $O'' = f_{O} - b_{O}$.

[…] $E' = a_{E} - e_{E}$ and $E'' = f_{E} - b_{E}$. The random number of reclassifications concerns disease-free ($E'$) and diseased people ($E''$). Thus, the number of people for whom we can expect a real, i.e., non-random, improvement in reclassification is $O' - E'$ among the disease-free and $O'' - E''$ among the diseased.

If by $E = (a_{E} + f_{E})$ we denote the random (expected) number of correct reclassifications, and by $O = (a_{O} + f_{O})$ the observed number of correct reclassifications, then the calculations reduce to determining Cohen’s Kappa coefficient of agreement. However, a symmetric table is needed to determine the Kappa coefficient. After adding a new variable to the basic model, the change in the predicted probability may take three forms (reclassification down, reclassification up, and no reclassification), whereas the true status is a binary variable with two categories (disease-free and diseased). To keep the table symmetric, a hidden category must be added. No examined person can belong to the hidden category, so all counts in this category are zero (Table 2).

[…] $O = (a_{O} + f_{O})$ reduced by the expected number of correct reclassifications $E = (a_{E} + f_{E})$.

- κ > 0 indicates better reclassification after adding a new variable,
- κ < 0 indicates worse reclassification after adding a new variable,
- κ = 0 indicates that there are no changes in the reclassification.
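The construction described above (a symmetric table with an empty hidden category, followed by Cohen’s Kappa on that table) can be sketched in a few lines of Python. This is only an illustration under two assumptions that are not stated explicitly in the text: the expected counts are taken as row total × column total / n, and Kappa is the standard ratio (O − E) / (n − E). The function names and the counts in the usage line are hypothetical.

```python
from typing import List, Sequence


def expected_counts(observed: List[List[float]]) -> List[List[float]]:
    """Expected counts under random reclassification.

    Assumption: expected cell = row total * column total / n.
    """
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    return [[r * c / n for c in col_totals] for r in row_totals]


def reclassification_kappa(down: Sequence[float],
                           no_change: Sequence[float],
                           up: Sequence[float]) -> float:
    """Cohen's Kappa for a reclassification table.

    Each argument is a (disease_free, diseased) pair of observed counts.
    Raises ZeroDivisionError if n equals the expected agreement.
    """
    # Add the empty hidden category as the middle column to get a 3 x 3 table.
    table = [
        [down[0], 0, down[1]],
        [no_change[0], 0, no_change[1]],
        [up[0], 0, up[1]],
    ]
    n = sum(sum(row) for row in table)
    exp = expected_counts(table)
    # The diagonal holds the correct reclassifications: a + 0 + f.
    observed_agree = sum(table[i][i] for i in range(3))
    expected_agree = sum(exp[i][i] for i in range(3))
    return (observed_agree - expected_agree) / (n - expected_agree)


# Made-up counts, for illustration only.
print(round(reclassification_kappa((8, 2), (4, 4), (3, 9)), 3))  # 0.316
```

Because the hidden column contains only zeros, its expected counts are also zero, so it contributes nothing to either the observed or the expected agreement; it only makes the table square.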

$$(a_{O} + f_{O}) - (a_{E} + f_{E}) = x$$

$$(a_{O} - e_{O}) - (a_{E} - e_{E}) = x$$

$$(f_{O} - b_{O}) - (f_{E} - b_{E}) = x$$
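A short derivation may help to see why the overall difference $(a_{O} + f_{O}) - (a_{E} + f_{E})$, the disease-free difference $(a_{O} - e_{O}) - (a_{E} - e_{E})$, and the diseased difference $(f_{O} - b_{O}) - (f_{E} - b_{E})$ all give the same $x$. It is a sketch that assumes the expected counts keep the observed row totals of the reclassification table, as the counts in Table 3 do.

```latex
\begin{align*}
a_{E} + b_{E} &= a_{O} + b_{O}, \qquad e_{E} + f_{E} = e_{O} + f_{O}
  && \text{(assumption: equal row totals)} \\
\Rightarrow\quad a_{O} - a_{E} &= -(b_{O} - b_{E}), \qquad f_{O} - f_{E} = -(e_{O} - e_{E}) \\
(a_{O} - e_{O}) - (a_{E} - e_{E}) &= (a_{O} - a_{E}) - (e_{O} - e_{E})
  = (a_{O} - a_{E}) + (f_{O} - f_{E}) = x \\
(f_{O} - b_{O}) - (f_{E} - b_{E}) &= (f_{O} - f_{E}) - (b_{O} - b_{E})
  = (f_{O} - f_{E}) + (a_{O} - a_{E}) = x
\end{align*}
```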

#### 2.1. Example

$\mathrm{NRI}_{Down} = 0.50$, $\mathrm{NRI}_{Up} = 0.25$, $\mathrm{NRI} = 0.75$

$$(a_{O} + f_{O}) - (a_{E} + f_{E}) = a_{O} - a_{E} + f_{O} - f_{E} = 9$$

$$(a_{O} - e_{O}) - (a_{E} - e_{E}) = 9$$

$$(f_{O} - b_{O}) - (f_{E} - b_{E}) = 9$$

[…] $a_{O} + b_{O} = 25$ people, i.e., 50% of the study group, while an increase in $e_{O} + f_{O} = 15$ people (30% of the analyzed group). It is known that by adding this variable we can improve the forecast for disease-free individuals more than for the diseased. Under random reclassification, we expected the probability to decrease in half of the subjects in each group, i.e., in half of the thirty healthy people ($a_{E} = 15$) and in half of the twenty diseased ($b_{E} = 10$). Meanwhile, the reclassification turned out to be more favorable than expected: the probability of disease decreased in more healthy people ($a_{O} = 20$ people, i.e., 66.7%), whereas among those diagnosed with a disease the decrease concerned only $b_{O} = 5$ people, i.e., 25% of this group. Similarly, we expected the probability to increase in 30% of the subjects in each group, i.e., in the healthy group ($e_{E} = 9$) and in the diseased group ($f_{E} = 6$). Again, the reclassification turned out to be more favorable: the probability of disease increased in more diseased people ($f_{O} = 10$, 50%) and in fewer healthy people ($e_{O} = 5$, 16.7%).
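The arithmetic of this example can be retraced with the counts from Table 3. The snippet below is a sketch rather than the authors' code: it assumes that the expected counts are row total × column total / n (which reproduces the expected values in Table 3) and that the NRI components are the net proportions of correct reclassifications in each group. The Kappa value at the end uses the standard (O − E) / (n − E) form and is not quoted in the text above.

```python
# Observed counts from Table 3: (disease-free, diseased) for each row.
observed = {"down": (20, 5), "no changes": (5, 5), "up": (5, 10)}

n_free = sum(v[0] for v in observed.values())  # 30 disease-free
n_dis = sum(v[1] for v in observed.values())   # 20 diseased
n = n_free + n_dis                             # 50

# Expected counts (assumption: row total * column total / n).
expected = {
    key: (sum(v) * n_free / n, sum(v) * n_dis / n)
    for key, v in observed.items()
}

a_o, b_o = observed["down"]
e_o, f_o = observed["up"]
a_e, b_e = expected["down"]
e_e, f_e = expected["up"]

# Three equivalent ways of obtaining x.
x_total = (a_o + f_o) - (a_e + f_e)
x_free = (a_o - e_o) - (a_e - e_e)
x_dis = (f_o - b_o) - (f_e - b_e)

# NRI components (assumed definitions, consistent with the values above).
nri_down = (a_o - e_o) / n_free
nri_up = (f_o - b_o) / n_dis
nri = nri_down + nri_up

# Cohen's Kappa in its standard form (assumption).
kappa = x_total / (n - (a_e + f_e))

print(x_total, x_free, x_dis)   # 9.0 9.0 9.0
print(nri_down, nri_up, nri)    # 0.5 0.25 0.75
print(round(kappa, 2))          # 0.31
```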

#### 2.2. Plan of Simulation Study

#### 2.3. Selection of Candidates for Extended Models

#### 2.4. Construction of the Basic Model

## 3. Results

[…] $R^{2}$ (Nagelkerke) values. In the baseline model, after applying forward stepwise regression, three variables remained: BMI, income, and daily activity.

## 4. Discussion

## 5. Conclusions

**Figure 1.** Method for presenting variable x, i.e., the number of correct reclassifications reduced by the random number of correct reclassifications after adding a new variable to the logistic regression model.

**Table 1.** Observed reclassification to a group of people who were diagnosed with a disease and a group of disease-free individuals, and the numbers expected for random reclassification.

| | | disease-free | diseased | total |
|---|---|---|---|---|
| Observed frequency | | | | |
| reclassification | down | a_{O} | b_{O} | # down |
| | no changes | c_{O} | d_{O} | # no changes |
| | up | e_{O} | f_{O} | # up |
| total | | # disease-free | # diseased | n |
| Expected frequency * | | | | |
| reclassification | down | a_{E} | b_{E} | |
| | no changes | c_{E} | d_{E} | |
| | up | e_{E} | f_{E} | |

**Table 2.** Location of the hidden category in the table with observed reclassification to the group of people who were diagnosed with the disease and the group of disease-free individuals, and in the table with expected numbers for a random reclassification.

| | | disease-free | hidden category | diseased | total |
|---|---|---|---|---|---|
| Observed frequency | | | | | |
| reclassification | down | a_{O} | 0 | b_{O} | # down |
| | no changes | c_{O} | 0 | d_{O} | # no changes |
| | up | e_{O} | 0 | f_{O} | # up |
| total | | # disease-free | 0 | # diseased | n |
| Expected frequency * | | | | | |
| reclassification | down | a_{E} | 0 | b_{E} | |
| | no changes | c_{E} | 0 | d_{E} | |
| | up | e_{E} | 0 | f_{E} | |

**Table 3.** Observed reclassification to the group of people who were diagnosed with a disease and the group of disease-free individuals, and the numbers expected for random reclassification, using a 50-element sample.

| | | disease-free | hidden category | diseased | total |
|---|---|---|---|---|---|
| Observed frequency | | | | | |
| reclassification | down | a_{O} = 20 | 0 | b_{O} = 5 | a_{O} + b_{O} = 25 |
| | no changes | c_{O} = 5 | 0 | d_{O} = 5 | c_{O} + d_{O} = 10 |
| | up | e_{O} = 5 | 0 | f_{O} = 10 | e_{O} + f_{O} = 15 |
| total | | a_{O} + c_{O} + e_{O} = 30 | 0 | b_{O} + d_{O} + f_{O} = 20 | n = 50 |
| Expected frequency * | | | | | |
| reclassification | down | a_{E} = 15 | 0 | b_{E} = 10 | |
| | no changes | c_{E} = 6 | 0 | d_{E} = 4 | |
| | up | e_{E} = 9 | 0 | f_{E} = 6 | |

**Table 4.** Summary of results for one-dimensional logistic regression models describing CVD risk depending on BMI, place of residence, marital status, income, daily activity, education, SCORE result, and random variables with planned distributions.

Independent Variables | Frequency (%) | p-Value | OR [95%CI] | R^{2} # | BASIC MODEL * |
---|---|---|---|---|---|
CANDIDATES FOR THE BASIC MODEL | | | | | |
1. BMI | | | | 0.02 | BMI |
underweight | 21 (0.5) | 0.2287 | 1.7 [0.72, 4.05] | | |
standard | 931 (23.5) | reference | | | |
overweight | 1780 (44.8) | <0.0001 | 1.52 [1.29, 1.79] | | |
obesity | 1239 (31.2) | <0.0001 | 1.97 [1.65, 2.35] | | |
2. place of residence | | | | 0.0003 | |
rural area | 1496 (37.7) | 0.3207 | 0.94 [0.82, 1.07] | | |
urban area | 2475 (62.3) | reference | | | |
3. marital status | | | | 0.0004 | |
single | 1143 (28.8) | 0.1202 | 0.9 [0.78, 1.03] | | |
in a relationship | 2828 (71.2) | reference | | | |
4. income | | | | 0.007 | income |
low | 1034 (26.0) | 0.0007 | 0.77 [0.67, 0.9] | | |
average | 2226 (56.1) | reference | | | |
high | 711 (17.9) | 0.0001 | 0.7 [0.59, 0.83] | | |
5. daily activity | | | | 0.003 | daily activity |
passive | 1335 (33.6) | 0.0035 | 1.29 [1.09, 1.53] | | |
mixed | 1793 (43.8) | 0.6809 | 0.97 [0.82, 1.14] | | |
active | 897 (22.6) | reference | | | |
CANDIDATES FOR THE NEW MODELS | | | | | |
ADDITIONAL | | | | | |
6. education | | | | 0.04 | |
basic | 830 (20.9) | reference | | | |
professional | 1065 (26.8) | <0.0001 | 0.63 [0.52, 0.76] | | |
medium | 1408 (35.5) | <0.0001 | 0.45 [0.38, 0.53] | | |
higher | 668 (16.8) | <0.0001 | 0.40 [0.33, 0.49] | | |
7. SCORE | | | | 0.39 | |
high | 2573 (64.8) | <0.0001 | 22.66 [18.27, 28.12] | | |
low | 1398 (35.2) | reference | | | |
RANDOM | assumed parameters | | | | |
8. uniform | interval: [0, 100] | 0.3049 | 1.00 [1.00, 1.00] | 0.0003 | |
9. normal | mean (sd) = 0 (1) | 0.4043 | 1.03 [0.96, 1.09] | 0.0003 | |
10. Poisson | λ = 4 | 0.0443 | 1.03 [1.00, 1.07] | 0.001 | |
11. exponential | λ = 1 | 0.5114 | 1.02 [0.96, 1.09] | 0.0001 | |
12. binomial | p = 0.1 | 0.7362 | 0.96 [0.78, 1.19] | 0.00003 | |
13. binomial | p = 0.5 | 0.6574 | 0.97 [0.86, 1.10] | 0.00009 | |

\# R^{2} (Nagelkerke): the measure of the model fit quality.

**Table 5.** Results of logistic regression models describing CVD risk: basic model (variables in the model: BMI, income, and daily activity) and models expanded by education, SCORE, and random variables with uniform, normal, Poisson, exponential, binomial (p = 0.1), and binomial (p = 0.5) distributions.

Model | Wald Test p-Value | Likelihood Ratio Test p-Value | AUC [95%CI] | AUC Change after Adding Marker p-Value |
---|---|---|---|---|
basic | | | 0.59 [0.57, 0.60] | |
basic + education | (p < 0.0001 for each category) | <0.0001 | 0.63 [0.61, 0.65] | <0.0001 |
basic + SCORE | <0.0001 | <0.0001 | 0.79 [0.78, 0.81] | <0.0001 |
basic + uniform | 0.2532 | 0.2532 | 0.59 [0.57, 0.61] | 0.3989 |
basic + normal | 0.5251 | 0.5251 | 0.59 [0.57, 0.60] | 0.7074 |
basic + Poisson | 0.0550 | 0.0549 | 0.59 [0.57, 0.61] | 0.3206 |
basic + exponential | 0.4761 | 0.4764 | 0.59 [0.57, 0.61] | 0.4795 |
basic + binomial (p = 0.1) | 0.7848 | 0.7847 | 0.59 [0.57, 0.60] | 0.4742 |
basic + binomial (p = 0.5) | 0.8866 | 0.8866 | 0.59 [0.57, 0.60] | 0.6523 |

**Table 6.** Reclassification quality based on Cohen’s Kappa and Net Reclassification Improvement (NRI) for a continuous change of CVD risk between the base and extended logistic regression models.

Model | x Number (% from n) | p-Value * | κ [95%CI] | p-Value # | NRI [95%CI] |
---|---|---|---|---|---|
basic + education | 311 (7.82) | <0.0001 | 0.16 [0.13, 0.19] | <0.0001 | 0.32 [0.26, 0.38] |
basic + SCORE | 1035 (26.06) | <0.0001 | 0.50 [0.48, 0.53] | <0.0001 | 1.06 [1.01, 1.10] |
basic + uniform | 30 (0.74) | 0.3470 | 0.01 [−0.02, 0.05] | 0.3470 | 0.03 [−0.03, 0.09] |
basic + normal | 17 (0.41) | 0.6068 | 0.01 [−0.02, 0.04] | 0.6068 | 0.02 [−0.05, 0.08] |
basic + Poisson | 54 (1.35) | 0.0876 | 0.03 [0.00, 0.06] | 0.0874 | 0.05 [−0.01, 0.12] |
basic + exponential | −22 (−0.55) | 0.4733 | −0.01 [−0.04, 0.02] | 0.4736 | 0.02 [−0.04, 0.08] |
basic + binomial (p = 0.1) | 0 (0.00) | 0.6690 | 0.00 [−0.02, 0.02] | 0.6684 | 0.01 [−0.03, 0.05] |
basic + binomial (p = 0.5) | 14 (0.35) | 0.6574 | 0.01 [−0.02, 0.04] | 0.6574 | 0.01 [−0.05, 0.08] |

**Table 7.** Reclassification quality based on Cohen’s Kappa and Net Reclassification Improvement (NRI) for a change of CVD risk (between the base and extended logistic regression models) exceeding 1%.

Model | x Number (% from n) | p-Value * | Unit-κ [95%CI] | p-Value # | Unit-NRI [95%CI] |
---|---|---|---|---|---|
basic + education | 310 (7.8) | <0.0001 | 0.15 [0.12, 0.18] | <0.0001 | 0.31 [0.26, 0.38] |
basic + SCORE | 1035 (26.1) | <0.0001 | 0.50 [0.48, 0.53] | <0.0001 | 1.05 [1.01, 1.10] |
basic + uniform | 24 (0.6) | 0.1965 | 0.007 [−0.004, 0.018] | 0.1984 | 0.02 [−0.01, 0.06] |
basic + normal | −6 (−0.2) | 0.3595 | −0.002 [−0.005, 0.002] | 0.3599 | −0.006 [−0.019, 0.007] |
basic + Poisson | 30 (0.7) | 0.1588 | 0.010 [−0.004, 0.023] | 0.1590 | 0.030 [−0.011, 0.072] |
basic + exponential | 8 (0.2) | 0.2918 | 0.002 [−0.002, 0.006] | 0.2950 | 0.008 [−0.07, 0.023] |
basic + binomial (p = 0.1) | 0 (0) | 1.0000 | 0.000 [0.000, 0.000] | NA | 0.000 [0.000, 0.000] |
basic + binomial (p = 0.5) | 0 (0) | 1.0000 | 0.000 [0.000, 0.000] | NA | 0.000 [0.000, 0.000] |

**Table 8.** Reclassification quality based on Cohen’s Kappa and Net Reclassification Improvement (NRI) for categorical CVD risk (between the base and extended logistic regression models). The two risk categories were built based on a cut-off value p = 0.4444.

Model | x Number (% from n) | p-Value * | κ (p) [95%CI] | p-Value # | NRI (p) [95%CI] |
---|---|---|---|---|---|
basic + education | 52 (1.3) | 0.0012 | 0.01 [0.01, 0.02] | <0.0001 | 0.06 [0.03, 0.09] |
basic + SCORE | 397 (10.0) | <0.0001 | 0.13 [0.11, 0.14] | <0.0001 | 0.40 [0.37, 0.44] |
basic + uniform | −6 (−0.2) | 0.3936 | −0.002 [−0.001, 0.002] | 0.3934 | −0.007 [−0.022, 0.009] |
basic + normal | 2 (0.1) | 0.3332 | 0.001 [−0.001, 0.002] | 0.2749 | 0.004 [−0.003, 0.011] |
basic + Poisson | 9 (0.2) | 0.2882 | 0.002 [−0.002, 0.007] | 0.2879 | 0.009 [−0.008, 0.026] |
basic + exponential | 4 (0.1) | 0.1987 | 0.001 [−0.001, 0.002] | 0.1999 | 0.004 [−0.002, 0.010] |
basic + binomial (p = 0.1) | 0 (0.0) | 0.9092 | 0.000 [−0.002, 0.002] | 0.9998 | 0.000 [−0.004, 0.004] |
basic + binomial (p = 0.5) | 0 (0.0) | 1.0000 | 0.000 [0.000, 0.000] | NA | 0.000 [0.000, 0.000] |

