Author Contributions
Conceptualization, R.H.B.; methodology, V.P.O. and S.R.; software, R.H.B., V.P.O. and S.R.; validation, R.H.B., V.P.O., S.R. and S.Y.; formal analysis, R.H.B., V.P.O. and S.R.; investigation, R.H.B., V.P.O., S.R. and S.Y.; resources, R.H.B. and V.P.O.; data curation, R.H.B., V.P.O. and S.R.; writing—original draft preparation, R.H.B. and V.P.O.; writing—review and editing, R.H.B., V.P.O., S.R. and S.Y.; visualization, V.P.O. and S.R.; supervision, S.Y.; project administration, S.Y. All authors have read and agreed to the published version of the manuscript.
Acknowledgments
The NACC database is funded by NIA/NIH Grant U24 AG072122. NACC data are contributed by the NIA-funded ADRCs: P30 AG062429 (PI James Brewer, MD, PhD), P30 AG066468 (PI Oscar Lopez, MD), P30 AG062421 (PI Bradley Hyman, MD, PhD), P30 AG066509 (PI Thomas Grabowski, MD), P30 AG066514 (PI Mary Sano, PhD), P30 AG066530 (PI Helena Chui, MD), P30 AG066507 (PI Marilyn Albert, PhD), P30 AG066444 (PI David Holtzman, MD), P30 AG066518 (PI Lisa Silbert, MD, MCR), P30 AG066512 (PI Thomas Wisniewski, MD), P30 AG066462 (PI Scott Small, MD), P30 AG072979 (PI David Wolk, MD), P30 AG072972 (PI Charles DeCarli, MD), P30 AG072976 (PI Andrew Saykin, PsyD), P30 AG072975 (PI Julie A. Schneider, MD, MS), P30 AG072978 (PI Ann McKee, MD), P30 AG072977 (PI Robert Vassar, PhD), P30 AG066519 (PI Frank LaFerla, PhD), P30 AG062677 (PI Ronald Petersen, MD, PhD), P30 AG079280 (PI Jessica Langbaum, PhD), P30 AG062422 (PI Gil Rabinovici, MD), P30 AG066511 (PI Allan Levey, MD, PhD), P30 AG072946 (PI Linda Van Eldik, PhD), P30 AG062715 (PI Sanjay Asthana, MD, FRCP), P30 AG072973 (PI Russell Swerdlow, MD), P30 AG066506 (PI Glenn Smith, PhD, ABPP), P30 AG066508 (PI Stephen Strittmatter, MD, PhD), P30 AG066515 (PI Victor Henderson, MD, MS), P30 AG072947 (PI Suzanne Craft, PhD), P30 AG072931 (PI Henry Paulson, MD, PhD), P30 AG066546 (PI Sudha Seshadri, MD), P30 AG086401 (PI Erik Roberson, MD, PhD), P30 AG086404 (PI Gary Rosenberg, MD), P20 AG068082 (PI Angela Jefferson, PhD), P30 AG072958 (PI Heather Whitson, MD), and P30 AG072959 (PI James Leverenz, MD). The authors thank the reviewers for their time and effort in reviewing the paper and for their constructive comments.
Figure 1.
Overall association between E4 count and CI, assessed with a chi-squared test of independence. Blue represents the absence of cognitive impairment (CI = 0), and red represents the presence of cognitive impairment (CI = 1). The chi-squared statistic and corresponding p-value are given below the diagram. The test shows a significant association between E4 count and CI.
Figure 2.
ANOVA of age at onset of cognitive impairment by APOE-E4 allele count. Boxplots show the median (yellow line), interquartile range (blue box), and 5th–95th percentile whiskers for E4 counts of 0 (bottom), 1 (middle), and 2 (top). Each group’s mean ± SD is annotated above its box: 69.67 ± 11.8 years (E4 = 0, N = 25,661), 68.97 ± 9.9 years (E4 = 1), and 65.66 ± 8.3 years (E4 = 2). Vertical gray lines mark the overall median. The ANOVA (F = 131.776, p < 0.001) indicates a significant dose-dependent decrease in the age of onset with increasing E4 count.
Figure 3.
Prevalence of cognitive impairment for each E4 count by race.
Figure 4.
Association between E4 counts and CI for each race, shown in two parts. Part 1 panels: White (top), then Black and American Ind. (bottom). Part 2 panels: Pacific Island, Asian, Other (Hisp), and Unknown. Blue represents E4 count = 0, red represents E4 count = 1, and green represents E4 count = 2. Chi-squared statistics and corresponding p-values are given below each panel.
Figure 5.
Association between E4 counts and the age of onset of CI within each race, analyzed using ANOVA. Diagrams from top left to bottom right represent the White, Black, American Indian, and Pacific Island groups. Test statistics and corresponding p-values are given below each diagram.
Figure 6.
Association between E4 counts and the age of onset of CI within each race, analyzed using ANOVA (continued). Diagrams represent the Asian, Other (Hispanic), and Unknown groups. Test statistics and corresponding p-values are given below each diagram.
Figure 7.
Prevalence of cognitive impairment (CI) by APOE-E4 allele count (0, 1, 2) and gender. Blue bars show males, and gold bars show females. In both genders, CI prevalence rises with each additional E4 allele: at E4 = 0, approximately 66% of males and 52% of females are impaired; at E4 = 1, about 77% of males and 67% of females are impaired; and at E4 = 2, roughly 88% of males and 80% of females are impaired. Gender differences are modest at each genotype, indicating a consistent dose-dependent effect of E4 on CI prevalence across sexes.
Figure 8.
Chi-squared test shows strong dependence between gender and CI.
Figure 9.
A chi-squared test shows a strong dependence between E4 count and CI for both male (left) and female (right) subjects.
Figure 10.
ANOVA test shows a strong difference in the mean age of onset of CI for various E4 counts for both males (top) and females (bottom).
Table 1.
The number of subjects in each E4 count and race group and the percentage of cognitively impaired patients for each (E4 count, race) group.
Race | Total | E4 Count = 0 | % CI E4 Count = 0 | E4 Count = 1 | % CI E4 Count = 1 | E4 Count = 2 | % CI E4 Count = 2 |
---|
White | 32774 | 19438 | 59.20 | 11103 | 72.71 | 2233 | 84.46 |
Black | 5308 | 2923 | 49.06 | 2009 | 61.72 | 376 | 76.86 |
American Ind. | 269 | 181 | 51.38 | 73 | 52.05 | 15 | 80.00 |
Pacific Island | 42 | 28 | 57.14 | 12 | 83.33 | 2 | 50.00 |
Asian | 1067 | 778 | 52.83 | 251 | 70.12 | 38 | 78.95 |
Other (Hisp) | 511 | 335 | 62.99 | 150 | 75.33 | 26 | 96.15 |
Unknown | 239 | 145 | 69.66 | 81 | 71.60 | 13 | 76.92 |
Table 2.
Chi-squared tests of Hardy-Weinberg equilibrium (HWE) by race: observed vs. expected numbers of subjects with E4 = 0, 1, 2.
Race | Observed | Expected | Chi2 Stat | p-Value | HWE Conform? |
---|
White | [19438, 11103, 2233] | [19055.01, 11870.33, 1848.66] | 137.2063 | <0.0001 | No |
Black | [2923, 2009, 376] | [2905.88, 2043.03, 359.10] | 1.4633 | 0.2264 | Yes |
American Ind. | [181, 73, 15] | [175.88, 83.26, 9.85] | 4.1009 | 0.0429 | No |
Pacific Island | [28, 12, 2] | [27.52, 12.95, 1.52] | 0.2270 | 0.6337 | Yes |
Asian | [778, 251, 38] | [765.11, 276.84, 25.04] | 9.3337 | 0.0023 | No |
Other (Hisp) | [335, 150, 26] | [328.92, 162.10, 19.97] | 2.8351 | 0.0922 | Yes |
Unknown | [145, 81, 13] | [143.99, 83.04, 11.97] | 0.1454 | 0.7030 | Yes |
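The expected counts and test statistics in Table 2 can be approximated from the observed genotype counts alone. The following is a minimal sketch, assuming the standard one-degree-of-freedom chi-squared goodness-of-fit test for HWE; the function name `hwe_chi_square` is illustrative and is not taken from the study's software. Small differences from the tabulated expected counts may arise from rounding of the allele frequency.

```python
import math

def hwe_chi_square(n0, n1, n2):
    """Chi-squared goodness-of-fit test for Hardy-Weinberg equilibrium.

    n0, n1, n2 are the observed numbers of subjects carrying 0, 1, and 2
    copies of the allele of interest (here, APOE-E4).
    """
    n = n0 + n1 + n2
    q = (n1 + 2 * n2) / (2 * n)          # estimated E4 allele frequency
    p = 1.0 - q
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n0, n1, n2]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # 3 genotype classes - 1 - 1 estimated frequency = 1 degree of freedom.
    # With 1 df, the chi-squared upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p_value

# Observed counts for the Black group, taken from Table 2.
expected, chi2, p_value = hwe_chi_square(2923, 2009, 376)
print([round(e, 2) for e in expected], round(chi2, 4), round(p_value, 4))
```

A p-value above 0.05 corresponds to "Yes" in the HWE Conform? column.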
Table 3.
Proportion with cognitive impairment by race and E4 count.
Race | E4 Count | Proportion with CI | SD Proportion with CI | Count |
---|
White | 0 | 0.592 | 0.492 | 19,438 |
White | 1 | 0.727 | 0.446 | 11,103 |
White | 2 | 0.845 | 0.362 | 2233 |
Black | 0 | 0.491 | 0.500 | 2923 |
Black | 1 | 0.617 | 0.486 | 2009 |
Black | 2 | 0.769 | 0.422 | 376 |
American Ind. | 0 | 0.514 | 0.501 | 181 |
American Ind. | 1 | 0.521 | 0.503 | 73 |
American Ind. | 2 | 0.800 | 0.414 | 15 |
Pacific Island | 0 | 0.571 | 0.504 | 28 |
Pacific Island | 1 | 0.833 | 0.389 | 12 |
Pacific Island | 2 | 0.500 | 0.707 | 2 |
Asian | 0 | 0.528 | 0.500 | 778 |
Asian | 1 | 0.701 | 0.459 | 251 |
Asian | 2 | 0.790 | 0.413 | 38 |
Other | 0 | 0.630 | 0.484 | 335 |
Other | 1 | 0.753 | 0.433 | 150 |
Other | 2 | 0.962 | 0.196 | 26 |
Unknown | 0 | 0.697 | 0.461 | 145 |
Unknown | 1 | 0.716 | 0.454 | 81 |
Unknown | 2 | 0.769 | 0.439 | 13 |
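The SD column in Table 3 appears consistent with the sample standard deviation of a 0/1 impairment indicator, which depends only on the proportion and the group size. The sketch below rests on that assumption (an n − 1 denominator); the helper name `binary_sample_sd` is illustrative.

```python
import math

def binary_sample_sd(proportion, n):
    """Sample SD (n - 1 denominator) of a 0/1 indicator with the given mean."""
    if n < 2:
        raise ValueError("need at least two observations")
    return math.sqrt(proportion * (1 - proportion) * n / (n - 1))

# (label, proportion with CI, count) taken from Table 3.
rows = [
    ("American Ind., E4 = 2", 0.800, 15),
    ("Pacific Island, E4 = 2", 0.500, 2),
    ("Black, E4 = 2", 0.769, 376),
]
for label, proportion, count in rows:
    print(label, round(binary_sample_sd(proportion, count), 3))
```

For a group of only two subjects, such as Pacific Island with E4 = 2, the SD is large relative to the proportion, so the estimate carries little information.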
Table 4.
Logistic regression analysis of the association between APOE-E4 allele count and cognitive impairment.
Variable | Odds Ratio | 95% CI | p-Value |
---|
E4 count = 0 | Reference | - | - |
E4 count = 1 | 1.79 | 1.71–1.87 | <0.001 |
E4 count = 2 | 3.75 | 3.36–4.18 | <0.001 |
Model Statistics |
Sample size | 38,453 |
Likelihood ratio (df = 2) | 1144.89 |
LR test p-value | <0.001 |
Pseudo-R2 | 0.0229 |
Table 5.
The coefficients of the logistic regression model for CI using E4 counts as a predictor grouped by race.
Race | Intercept | E4 Count = 0 | E4 Count = 1 | E4 Count = 2 |
---|
Overall | 0.315 | 0.000 | 0.579 | 1.293 |
White | 0.373 | 0.000 | 0.607 | 1.317 |
Black | 0.000 | −0.036 | 0.476 | 1.186 |
American Ind. | 0.063 | 0.000 | 0.000 | 0.949 |
Pacific Island | 0.268 | 0.000 | 0.830 | 0.000 |
Asian | 0.118 | 0.000 | 0.716 | 1.052 |
Other (Hisp) | 0.544 | 0.000 | 0.537 | 1.941 |
Unknown | 0.859 | 0.000 | 0.006 | 0.000 |
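To read the coefficients in Table 5 as probabilities, the intercept and the coefficient for an E4 level can be summed and passed through the logistic function. This is a minimal sketch, assuming the model has the usual form logit(P(CI)) = intercept + coefficient for the subject's E4 level; the function name is illustrative.

```python
import math

def predicted_probability(intercept, coefficient):
    """Logistic-model probability of CI for one E4-count level."""
    log_odds = intercept + coefficient
    return 1.0 / (1.0 + math.exp(-log_odds))

# White row of Table 5: intercept 0.373; coefficients for E4 = 0, 1, 2.
white = [predicted_probability(0.373, c) for c in (0.000, 0.607, 1.317)]
print([round(p, 3) for p in white])
```

Under this assumption, the White row gives probabilities close to the observed proportions for the White group in Table 3 (0.592, 0.727, 0.845).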
Table 6.
The logistic regression model evaluations by race for CI using E4 counts as a predictor.
Race | AUC | CA | F1 | Prec | Recall | MCC |
---|
Overall | 0.585 | 0.639 | 0.499 | 0.409 | 0.639 | 0.000 |
White | 0.588 | 0.656 | 0.520 | 0.430 | 0.656 | 0.000 |
Black | 0.580 | 0.565 | 0.565 | 0.565 | 0.565 | 0.116 |
American Ind. | 0.485 | 0.496 | 0.483 | 0.487 | 0.496 | −0.028 |
Pacific Island | 0.495 | 0.488 | 0.497 | 0.525 | 0.488 | −0.026 |
Asian | 0.560 | 0.552 | 0.534 | 0.534 | 0.552 | 0.041 |
Other (Hisp) | 0.557 | 0.682 | 0.553 | 0.466 | 0.682 | 0.000 |
Unknown | 0.473 | 0.694 | 0.569 | 0.482 | 0.694 | 0.000 |
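Rows of Table 6 with MCC = 0.000 and Recall equal to CA are consistent with a model that predicts CI = 1 for every subject. The sketch below shows what class-weighted metrics such a constant predictor would produce; it assumes that F1, Prec, and Recall are averaged over both classes weighted by class size, which is not stated in the source.

```python
def majority_class_metrics(prevalence):
    """Class-weighted metrics for a classifier that always predicts CI = 1.

    prevalence is the fraction of subjects with CI = 1 (assumed > 0.5).
    """
    accuracy = prevalence                  # every CI = 1 subject is correct
    # Per-class precision and recall, ordered as (CI = 1, CI = 0).
    precision = (prevalence, 0.0)          # CI = 0 is never predicted
    recall = (1.0, 0.0)
    f1_pos = 2 * precision[0] * recall[0] / (precision[0] + recall[0])
    weights = (prevalence, 1.0 - prevalence)
    return {
        "CA": accuracy,
        "Prec": weights[0] * precision[0] + weights[1] * precision[1],
        "Recall": weights[0] * recall[0] + weights[1] * recall[1],
        "F1": weights[0] * f1_pos,         # F1 of the unpredicted class is 0
        # TP*TN - FP*FN is 0 because TN = FN = 0; reported as 0 by convention.
        "MCC": 0.0,
    }

# CA of the Overall row in Table 6, used here as the CI prevalence.
metrics = majority_class_metrics(0.639)
print({name: round(value, 3) for name, value in metrics.items()})
```

If this reading is right, the accuracy in those rows reflects the share of impaired subjects rather than any discrimination by E4 count, which fits the AUC values near 0.5 to 0.6.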
Table 7.
Race-specific logistic regression of CI on E4 count: log-odds coefficient, odds ratio, and p-value by race.
Race | Coefficient | Odds Ratio | p-Value |
---|
White | 0.6296 | 1.8769 | 3.43 × 10^−202 |
Black | 0.5632 | 1.7563 | 1.34 × 10^−33 |
American Ind. | 0.3166 | 1.3724 | 0.135306 |
Pacific Island | 0.5853 | 1.7955 | 0.344312 |
Asian | 0.6929 | 1.9994 | 9.12 × 10^−8 |
Other | 0.7615 | 2.1416 | 6.75 × 10^−5 |
Unknown | 0.1351 | 1.1446 | 0.577726 |
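The Odds Ratio column of Table 7 is the exponential of the Coefficient column. A short sketch of that conversion, using the tabulated coefficients (the dictionary name is illustrative):

```python
import math

# Log-odds coefficients from Table 7.
coefficients = {
    "White": 0.6296,
    "Black": 0.5632,
    "American Ind.": 0.3166,
    "Asian": 0.6929,
}

# An odds ratio is exp(coefficient) for a logistic regression term.
odds_ratios = {race: math.exp(beta) for race, beta in coefficients.items()}
for race, value in odds_ratios.items():
    print(race, round(value, 4))
```

Because the coefficients are rounded to four decimals, the last digit of a recomputed odds ratio can differ slightly from the table.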
Table 8.
The coefficients of the linear regression model for the age of onset of CI, using E4 counts as a predictor, grouped by race.
Race | Intercept | E4 Count = 0 | E4 Count = 1 | E4 Count = 2 |
---|
White | 68.298 | 1.371 | 0.576 | −2.699 |
Black | 69.562 | 1.419 | 0.757 | −2.825 |
American Ind. | 66.710 | 0.377 | 1.104 | −3.875 |
Pacific Island | 59.807 | 4.772 | −3.006 | 8.192 |
Asian | 67.835 | 0.716 | 0.261 | −3.155 |
Other (Hisp) | 65.990 | 1.392 | 0.497 | −3.029 |
Unknown | 63.397 | 0.431 | 1.029 | −3.895 |
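A predicted age of onset for a given race and E4 count can be read from Table 8 by adding the intercept to the coefficient for that E4 level. This is a sketch under the assumption that the model is additive in exactly this way; the variable names are illustrative.

```python
# White row of Table 8: intercept and per-level coefficients.
intercept = 68.298
level_coefficients = {0: 1.371, 1: 0.576, 2: -2.699}

def predicted_onset_age(intercept, coefficient):
    """Predicted age of onset (years) for one E4-count level."""
    return intercept + coefficient

white_predictions = {
    e4: round(predicted_onset_age(intercept, coef), 3)
    for e4, coef in level_coefficients.items()
}
print(white_predictions)
```

The Pacific Island row should be read with caution, since Table 1 lists only two subjects with E4 = 2 in that group.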
Table 9.
The linear regression model evaluations for the age of onset of CI using E4 counts as a predictor by race. (Note: a negative R2 value suggests that the model’s predictions are less accurate than simply using the average as a prediction.)
Race | MSE | RMSE | MAE | MAPE | R2 |
---|
White | 120.727 | 10.988 | 8.683 | 0.138 | 0.010 |
Black | 98.071 | 9.903 | 7.742 | 0.122 | 0.010 |
American Ind. | 103.988 | 10.197 | 8.208 | 0.132 | −0.124 |
Pacific Island | 249.128 | 15.784 | 12.022 | 0.219 | −1.000 |
Asian | 108.188 | 10.401 | 8.457 | 0.130 | −0.021 |
Other (Hisp) | 107.101 | 10.349 | 8.439 | 0.136 | −0.044 |
Unknown | 173.324 | 13.165 | 10.145 | 0.200 | −0.082 |
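The note on negative R2 follows from the definition R2 = 1 − SS_res / SS_tot: when the model's squared error exceeds that of predicting the mean, the ratio is above 1 and R2 drops below zero. The sketch below uses hypothetical ages, not study data, and also checks that RMSE is the square root of MSE for the White row of Table 9.

```python
import math

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical ages of onset (years); not taken from the study.
actual = [60.0, 70.0, 80.0]
close_fit = [62.0, 70.0, 78.0]       # better than predicting the mean
mean_only = [70.0, 70.0, 70.0]       # always predicts the mean
reversed_fit = [80.0, 70.0, 60.0]    # worse than predicting the mean

print(r_squared(actual, close_fit))
print(r_squared(actual, mean_only))
print(r_squared(actual, reversed_fit))

# RMSE is the square root of MSE (White row of Table 9: MSE = 120.727).
print(round(math.sqrt(120.727), 3))
```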
Table 10.
The coefficients of the logistic regression model for CI using E4 counts as a predictor, grouped by gender, do not show a significant difference.
Gender | Intercept | E4 Count = 0 | E4 Count = 1 | E4 Count = 2 |
---|
Male | 0.652 | 0.000 | 0.545 | 1.289 |
Female | 0.073 | 0.000 | 0.611 | 1.295 |
Table 11.
The coefficients of the linear regression model for the age of onset of CI, using E4 counts as a predictor, grouped by gender.
Gender | Intercept | E4 Count = 0 | E4 Count = 1 | E4 Count = 2 |
---|
Male | 68.032 | 0.694 | 0.359 | −2.146 |
Female | 68.693 | 1.875 | 0.781 | −3.239 |
Table 12.
The linear regression model evaluations for the age of onset of CI, using E4 counts as a predictor, grouped by gender.
Gender | MSE | RMSE | MAE | MAPE | R2 |
---|
Male | 113.835 | 10.669 | 8.440 | 0.135 | 0.005 |
Female | 119.651 | 10.939 | 8.635 | 0.135 | 0.014 |
Table 13.
Population characteristics before and after data cleaning.
Characteristic | Initial Sample (n = 24,673) | Final Sample (n = 10,023) |
---|
Records removed | - | 14,650 (59.4%) |
Primary reasons | - | Medical history values coded −4 or 9 |
Key variables affected | - | HYPERT, VB12DEF, THYDIS |
Table 14.
Odds ratios for E4 count across models.
Model | E4 Count | Odds Ratio | 95% CI | p-Value |
---|
Base Model | 1 E4 | 1.78 | 1.48–2.13 | <0.001 |
| 2 E4 | 2.90 | 1.94–4.33 | <0.001 |
Demographic Model | 1 E4 | 1.88 | 1.56–2.26 | <0.001 |
| 2 E4 | 3.25 | 2.17–4.88 | <0.001 |
Medical History Model | 1 E4 | 1.88 | 1.56–2.25 | <0.001 |
| 2 E4 | 3.25 | 2.17–4.87 | <0.001 |
Table 15.
Model Comparison Statistics.
Comparison | Chi2 Stat | df | p-Value |
---|
Base vs. Demographic | 66.99 | 11 | <0.001 |
Demographic vs. Medical | 5.03 | 4 | 0.285 |
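The p-values in Table 15 can be checked against the chi-squared distribution, assuming the unlabeled statistic is a likelihood-ratio chi-squared value for nested models. The sketch below computes the upper-tail probability for integer degrees of freedom using only the standard library; `chi2_sf` is an illustrative helper, not part of the study's code.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable (integer df >= 1)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(chi) * sum of chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)   # standard normal density
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2)) + 2 * phi * total

# Statistics and degrees of freedom from Table 15.
print(chi2_sf(66.99, 11))   # Base vs. Demographic
print(chi2_sf(5.03, 4))     # Demographic vs. Medical
```

The second comparison gives a p-value well above 0.05, which matches the table's indication that adding medical history does not significantly improve on the demographic model.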