Predicting the Risk of Alzheimer’s Disease and Related Dementia in Patients with Mild Cognitive Impairment Using a Semi-Competing Risk Approach

Alzheimer’s disease (AD) and AD-related dementias (AD/ADRD) are a group of progressive neurodegenerative diseases. The progression of AD can be conceptualized as a continuum in which patients progress from normal cognition to preclinical AD (i.e., no symptoms but biological changes in the brain) to mild cognitive impairment (MCI) due to AD (i.e., mild symptoms that do not interfere with daily activities), followed by increasing severity of dementia due to AD. Early detection and prediction models for the transition from MCI to AD/ADRD are needed, and efforts have been made to build prediction models of MCI conversion to AD/ADRD. However, most existing studies developing such prediction models did not consider the competing risk of death, which may result in biased risk estimates. In this study, we aim to develop a prediction model for AD/ADRD among patients with MCI that considers the competing risk of death using a semi-competing risk approach.


Introduction
Alzheimer's disease (AD) and AD-related dementias (AD/ADRD) are a group of progressive neurological diseases. As the most common cause of dementia, AD accounts for 60% to 80% of dementia cases [1]. AD/ADRD poses significant public health burdens in the United States (US). An estimated 6.5 million adults over 65 years of age are living with AD, and the number is expected to reach 12.7 million by 2050 [1]. The total healthcare cost for AD treatment in 2020 was estimated at USD 305 billion, and it is expected to increase to more than USD 1 trillion as the population ages [2].
The progression of AD/ADRD can be conceptualized as a continuum in which patients progress from normal cognition to preclinical AD/ADRD (i.e., no symptoms but biological changes in the brain) to mild cognitive impairment (MCI) due to AD/ADRD (i.e., mild symptoms that do not interfere with daily activities), followed by increasing severity of dementia due to AD/ADRD [1]. As an early stage of memory or other cognitive ability loss, MCI has usually been considered a pre-dementia phase of AD/ADRD. However, not all patients with MCI will transition to AD/ADRD dementia. Prior evidence suggests heterogeneity in AD progression pathways (e.g., faster progression or different clinical syndromes) [3,4]. Characterizing and predicting different AD/ADRD progression pathways and the associated risk factors is a crucial step in understanding the mechanism of AD/ADRD.
It is estimated that about 10-15% of patients with MCI will transition to AD/ADRD each year, and after six years of follow-up, approximately 80% of MCI patients will have converted to AD/ADRD [5][6][7][8][9][10]. Therefore, early detection of the transition from MCI to AD/ADRD, and thus prediction models for it, are needed. There has been a considerable increase in efforts over the past few years to build machine-learning-based models for AD/ADRD prediction with clinical data such as neurobehavioral status exam scores, patient demographics, neuroimaging data, and laboratory test values [11][12][13]. Meanwhile, the proliferation of clinical research networks with large collections of real-world data (RWD), including electronic health records (EHRs), claims, and billing data among others, offers unique opportunities to generate real-world evidence (RWE) [14] that will have direct translational impacts on AD/ADRD research. Recent advancements in machine learning (ML) have led to success in various RWD analysis tasks, such as clinical risk prediction [15,16], disease subphenotyping [17,18], and personalized treatment [19]. Analyses of EHRs are complicated by large sample sizes, high dimensionality, sparsity, and heterogeneity [20]; more importantly, they require an appropriate study design that accounts for the various potential biases inherent in observational EHR data.
A recent systematic review examined studies that used machine learning methods and clinical data to model risk for the progression of AD/ADRD [11]. Of the 64 papers included in the systematic review, about half modeled the development of AD/ADRD in individuals who were initially cognitively normal or had only MCI. However, most existing studies developing such prediction models do not consider competing risks, such as death. Competing risks refer to the situation where the study population is at risk for more than one type of possibly correlated failure event [21]. Simply treating death as random censoring and fitting a standard Cox proportional hazards regression model could lead to biased results and a misleading interpretation of the hazard ratio, because that approach does not account for patients who developed AD/ADRD and subsequently died. In the case of AD/ADRD prediction, the study population is subject to both the risk of AD/ADRD and the risk of death. On one hand, individuals with MCI are at an increased risk of developing AD/ADRD; on the other hand, these individuals are also at an increased risk of death compared to those without MCI, because of age and because aging-related health conditions such as cardiovascular disease, diabetes, and cancer are more common in older adults. These conditions can cause long-term damage to the body's systems, making individuals more vulnerable to complications and infections. Additionally, age-related changes in the immune system can weaken the body's defenses against infections and increase the risk of mortality from infectious diseases. Thus, older adults are at an increased risk of death due to the accumulation of aging-related health conditions and their impact on overall health and immune function [22][23][24][25].
The risk of death is a competing risk for the development of AD/ADRD, as individuals who die before developing AD/ADRD will not contribute to the incidence of AD/ADRD. In other words, death censors the AD/ADRD outcome, and it acts as informative censoring for AD/ADRD failure events. Failure to account for death as a competing risk may lead to biased estimates of the incidence of AD/ADRD and inaccurate predictions of the risk of developing AD/ADRD. The data used in such a prediction task can be viewed as semi-competing risks data with a non-terminal event (i.e., AD/ADRD) and a terminal event (i.e., death) [26], and the disease process can then be described as an illness-death process with three states: MCI, AD/ADRD, and death. The illness-death process is characterized by three hazard functions that quantify the transition rates between states, i.e., the hazard from MCI to AD/ADRD, the hazard from MCI to death, and the hazard from AD/ADRD to death, as shown in Figure 1. For simplicity, in this paper, we only consider the progression from MCI to AD/ADRD or death and aim to develop a prediction model for AD/ADRD among patients with MCI that considers the competing risk of death using a semi-competing risk approach; other semi-competing risks of AD/ADRD can be modeled similarly.

Materials and Methods
In this study, we used the structured electronic health records (EHR) data from the OneFlorida+ Clinical Research Network [27], one of the eight clinical data research networks contributing to the national Patient-Centered Clinical Research Network (PCORnet) funded by the Patient-Centered Outcomes Research Institute (PCORI). The OneFlorida+ network contains robust longitudinal and linked patient-level real-world clinical data of ~16.8 million Floridians, including data from Medicaid and Medicare claims, tumor registries, vital statistics, and EHRs from its clinical partners. The OneFlorida+ data is a Health Insurance Portability and Accountability Act of 1996 (HIPAA) limited data set (i.e., dates are not shifted) that contains detailed patient demographics and their clinical characteristics, including encounters, diagnoses, procedures, vitals, medications, and labs, following the PCORnet Common Data Model (CDM) [27,28]. The data contributed to the OneFlorida+ network undergo rigorous quality checks, and a privacy-preserving record linkage process is used to link and deduplicate patient records from multiple health systems and data sources (i.e., through Datavant, as required by PCORnet).
From the OneFlorida+ data, patients who had an MCI diagnosis recorded in at least one inpatient or two outpatient encounters within a year were identified with ICD codes (ICD-9: 331.83, 294.9; ICD-10: G31.84, F09). For each MCI patient, the first MCI diagnosis was considered the baseline, and the patient was followed up until the first AD/ADRD diagnosis, death, or the last record available. Figure 2 displays the timeline of a typical patient.
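The cohort rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(date, setting, code)` record layout and the function name are hypothetical, "within a year" is read as "no more than 365 days apart", and the actual PCORnet CDM tables and the authors' extraction code are not shown in the text.

```python
from datetime import date

# ICD-9 and ICD-10 codes for MCI as listed in the text.
MCI_CODES = {"331.83", "294.9", "G31.84", "F09"}


def first_qualifying_mci_date(encounters):
    """Return the baseline (first MCI diagnosis) date if the patient qualifies, else None.

    `encounters` is a list of (date, setting, code) tuples, where setting is
    "inpatient" or "outpatient". This layout is a simplified, hypothetical
    stand-in for the real encounter and diagnosis tables.
    """
    mci = sorted((d, s) for d, s, code in encounters if code in MCI_CODES)
    if not mci:
        return None
    # A single inpatient encounter with an MCI code is sufficient.
    if any(s == "inpatient" for _, s in mci):
        return mci[0][0]
    # Otherwise require two outpatient encounters no more than 365 days apart
    # (assumed reading of "within a year"). Checking adjacent pairs of the
    # sorted dates is enough to find any qualifying pair.
    out_dates = [d for d, s in mci if s == "outpatient"]
    for earlier, later in zip(out_dates, out_dates[1:]):
        if (later - earlier).days <= 365:
            return mci[0][0]
    return None
```

For example, a patient with two outpatient MCI codes five months apart would qualify with the earlier date as baseline, whereas two outpatient codes more than a year apart would not.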
To investigate the impact of risk factors on the progression from MCI to AD/ADRD in the presence of a competing risk of death, we utilized the illness-death model [19]. Our analysis focused on the time to AD/ADRD (T_1) and the time to death (T_2), which may be correlated. It is important to note that assuming independence between T_1 and T_2 and applying a separate Cox model to each time-to-event may bias the results, because the correlation between T_1 and T_2 is not fully accounted for. To address this, we applied the illness-death model to jointly model T_1 and T_2, accounting for their correlation and the possibility of T_1 influencing the occurrence of T_2.
The model assumes that T_1 and T_2 are semi-competing risks, where an individual can experience one event (AD/ADRD) while remaining at risk for the other event (death). We assumed that AD/ADRD is the non-terminal event and death is the terminal event. In Figure 3, we illustrate the joint distribution of T_1 and T_2. Under Scenario I, only the death event is observed, resulting in the marginal distribution of T_2 when T_1 approaches infinity.
Therefore, we assume that the hazard for death is proportional over time and the censoring process is non-informative. Under Scenario II, both AD/ADRD and death events are observed, with the support of the joint distribution in the upper wedge of the plot (T_1 < T_2), since death always occurs after AD/ADRD. In this scenario, we assume that the risks of AD/ADRD and death are correlated, the hazards for both events are proportional over time, and the censoring process is non-informative. These assumptions are crucial for ensuring the validity of the model and obtaining accurate estimates. By utilizing this model, we obtained a more comprehensive understanding of the relationship between AD/ADRD and death and identified potential risk factors for each event.
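To make the two scenarios concrete, the following minimal sketch maps latent event times to the observed semi-competing risks data in the usual (time, indicator) form. The function name, the use of infinity for "AD/ADRD never occurs before death", and the censoring time `c` (standing for the last available record) are illustrative assumptions rather than the authors' data pipeline.

```python
import math


def observe(t1, t2, c):
    """Map latent times to observed semi-competing risks data.

    t1: latent time to AD/ADRD (math.inf if it does not occur before death)
    t2: latent time to death
    c:  censoring time (e.g., time of the last available record)
    Returns (y1, delta1, y2, delta2), where delta1 and delta2 indicate that
    AD/ADRD and death, respectively, were observed.
    """
    y2 = min(t2, c)
    delta2 = int(t2 <= c)
    # AD/ADRD can only be observed before death or censoring, whichever comes first.
    y1 = min(t1, y2)
    delta1 = int(t1 <= y2)
    return y1, delta1, y2, delta2


# Scenario I: only death is observed (t1 is infinite).
scenario_1 = observe(math.inf, 5.0, 8.0)
# Scenario II: AD/ADRD at t1 = 2, then death at t2 = 5, both before censoring.
scenario_2 = observe(2.0, 5.0, 8.0)
```

Death can censor AD/ADRD but not the other way around, which is what distinguishes semi-competing risks from ordinary competing risks.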
The illness-death model is specified by three hazard equations, Equations (1) to (3), where α_i is a patient-specific frailty parameter and X_i is the covariate vector for the ith patient [34][35][36].
The frailty parameter is a random effect that accounts for unobserved heterogeneity among patients that could affect their risk of experiencing the events of interest. In Equation (1), h_1(t_1) denotes the baseline hazard function of time from MCI to AD/ADRD, and β_1 denotes a p-dimensional coefficient vector for the transition from MCI to AD/ADRD. We interpret the jth component of β_1 as the log of the hazard ratio (HR) for a one-unit increase in the jth covariate while holding the other components of X and α_i fixed. Similarly, in Equation (2), we denote the baseline hazard function for death from MCI by h_2(t_2) and interpret the jth component of exp(β_2) as the HR of death from MCI for a one-unit increase in the jth covariate while holding the other components of X and α_i fixed. The difference between the illness-death model and other competing risks models, such as the Fine-Gray model [21], is that the illness-death model also measures the transition-specific hazard from AD/ADRD to death, as death is a terminal event whereas AD/ADRD is a non-terminal event. Thus, the transition is from the AD/ADRD event to the death event and is irreversible. In Equation (3), we denote the baseline hazard function for transitioning from AD/ADRD to death by h_3(t_2 | t_1). The jth component of exp(β_3) can be interpreted as the HR of death after AD/ADRD for a one-unit increase in the jth covariate while holding the other components of X and α_i fixed.
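Equations (1) to (3) are not reproduced in this version of the text. Under the standard shared-frailty illness-death formulation, and using the symbols defined above, they would plausibly take the following form. The left-hand symbols λ_1, λ_2, and λ_3 for the conditional transition hazards are an assumed notation introduced here, not the authors'.

```latex
\begin{align}
\lambda_1(t_1 \mid \alpha_i, X_i) &= \alpha_i \, h_1(t_1) \exp\!\left(X_i^{\top} \beta_1\right), & t_1 &> 0, \tag{1} \\
\lambda_2(t_2 \mid \alpha_i, X_i) &= \alpha_i \, h_2(t_2) \exp\!\left(X_i^{\top} \beta_2\right), & t_2 &> 0, \tag{2} \\
\lambda_3(t_2 \mid t_1, \alpha_i, X_i) &= \alpha_i \, h_3(t_2 \mid t_1) \exp\!\left(X_i^{\top} \beta_3\right), & 0 < t_1 &< t_2. \tag{3}
\end{align}
```

In this form the same frailty α_i multiplies all three hazards, which is how a single patient-level random effect induces dependence between the time to AD/ADRD and the time to death.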
In our study, we assumed that α_i follows a gamma distribution, which is widely used for modeling random effects in survival analysis. The gamma assumption reflects the fact that individual frailties are non-negative and typically have a skewed distribution. The frailty parameter α_i accounts for the correlation between the time to AD/ADRD and the time to death and reflects the unobserved patient-specific factors that may influence the risk of experiencing the events of interest. The interpretation of β_1, β_2, and β_3 is thus different from that obtained by fitting a single Cox model for each transition, since the model incorporates the patient-specific effect. The Bayesian paradigm is computationally efficient and provides a framework for predicting future outcomes. All data analyses in this paper were conducted using R 4.0.3 with the package "SemiCompRisks" [37].
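As a rough illustration of how a shared gamma frailty links the two event times, the sketch below simulates one patient from an illness-death process. It is a toy example under assumptions that are not taken from the paper: constant baseline hazards, a frailty with mean 1 and variance `theta`, and a time from AD/ADRD to death that does not depend on when AD/ADRD occurred. The study itself was fitted in R with SemiCompRisks; this Python code only mimics the data-generating idea.

```python
import math
import random


def simulate_patient(rng, theta, h1, h2, h3, lp1=0.0, lp2=0.0, lp3=0.0):
    """Draw (t1, t2) for one patient from a gamma-frailty illness-death model.

    theta:       frailty variance (frailty has mean 1), an illustrative choice
    h1, h2, h3:  constant baseline hazards for MCI -> AD/ADRD, MCI -> death,
                 and AD/ADRD -> death (assumed constant for simplicity)
    lp1..lp3:    linear predictors X_i' beta_k for the three transitions
    Returns t1 = math.inf when the patient dies without AD/ADRD.
    """
    # Gamma with shape 1/theta and scale theta has mean 1 and variance theta.
    alpha = rng.gammavariate(1.0 / theta, theta)
    rate1 = alpha * h1 * math.exp(lp1)
    rate2 = alpha * h2 * math.exp(lp2)
    rate3 = alpha * h3 * math.exp(lp3)
    to_ad = rng.expovariate(rate1)
    to_death = rng.expovariate(rate2)
    if to_ad < to_death:
        # AD/ADRD occurs first; death follows after an additional sojourn time.
        return to_ad, to_ad + rng.expovariate(rate3)
    return math.inf, to_death


rng = random.Random(0)
draws = [simulate_patient(rng, theta=0.5, h1=0.1, h2=0.05, h3=0.2) for _ in range(2000)]
share_with_ad = sum(t1 != math.inf for t1, _ in draws) / len(draws)
```

Because the same `alpha` scales all three rates, patients with a large frailty tend to experience both events early, which produces the positive correlation between the two event times that the gamma frailty is meant to capture.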

Results
A total of 35,774 patients with MCI were identified from the OneFlorida+ clinical research network. Figure 4 shows a flow diagram of the study cohort. After excluding patients who had no visits after their first MCI diagnosis, 33,661 patients were included in the analysis. The distributions of the included risk factors also differed between AD/ADRD and non-AD/ADRD patients. In general, AD/ADRD patients had higher frequencies of most conditions except for anxiety, rheumatic disease, liver diseases, hemiplegia or paraplegia, HIV/AIDS, sleep disorder, and visual impairment.
Table 2 shows the hazard ratios for AD/ADRD when treating death as random censoring vs. when considering death as a semi-competing risk. Several factors were identified by both models as risk factors for AD/ADRD, including older age, being Hispanic, and having depression, hypertension, diabetes, cerebrovascular diseases, dementia, and stroke. In general, there was little difference between the two models in the hazard ratios (and their corresponding confidence intervals) for most included predictors; however, renal diseases, traumatic brain injury, and visual impairment had wider confidence intervals and were not statistically significant in the model that considered the competing risk.

Discussion
In this study, using a large collection of real-world data from the OneFlorida+ network, we aimed to develop models to predict the conversion from MCI to AD/ADRD in the presence of death as a competing risk. Through this analysis, we identified several important risk factors for the development of AD/ADRD. We found that patients who are older, are Hispanic, or have depression, diabetes, cerebrovascular diseases, renal disease, or stroke have a higher hazard of AD/ADRD. These findings are consistent with previous literature [38], supporting the validity of our study. For example, vascular diseases have been linked with an increased risk of AD, as impairments to the cerebrovascular network and neurovascular control mechanisms would reduce their ability to maintain brain activity [39]. History of hypertension, high blood pressure, and heart diseases have all been reported to be associated with a higher risk of AD/ADRD. Diabetes, especially type 2 diabetes (T2D), is also associated with an increased risk of cognitive dysfunction and dementia through mechanisms such as insulin resistance and metabolic syndrome [38,[40][41][42]. Individuals with a history of depression were more likely to develop AD/ADRD later in life, especially those with earlier-life depression. Finally, it has also been reported that Hispanics are more likely to develop AD/ADRD, partially because of their higher risk of high blood pressure, heart disease, diabetes, and stroke, all of which are additional risk factors for AD/ADRD [43,44].
In this study, in addition to the standard Cox model, we used a semi-competing risk approach to build the AD/ADRD prediction model. In theory, semi-competing risk models can account for the occurrence of the competing risk event (i.e., death in our case) and its relationship to the primary outcome (AD/ADRD), which improves the accuracy of risk prediction when the two hazards are strongly correlated. In this study, the hazard of AD/ADRD (at time t) is interpreted as the cause-specific hazard considering a patient-specific effect, i.e., the instantaneous risk of developing AD/ADRD (at time t) in the presence of death, given that neither AD/ADRD nor death has occurred (up to time t). In comparison, the standard Cox model assumes that the only possible outcome is the occurrence of the primary event and does not fully account for the correlation between the primary event and the competing event. Given the complexity of AD/ADRD, the hypothesis that the two events are correlated is plausible and important to capture in the analytical methods.
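The verbal definition of the hazard of AD/ADRD given above corresponds to the usual limit form of a cause-specific hazard. Writing T_1 for the time to AD/ADRD, T_2 for the time to death, α_i for the patient-specific frailty, and X_i for the covariates, it can be sketched as follows; the symbol λ_1 is an assumed notation introduced here for illustration.

```latex
\lambda_1(t \mid \alpha_i, X_i)
  = \lim_{\Delta \to 0^{+}}
    \frac{P\left(t \le T_1 < t + \Delta \mid T_1 \ge t,\ T_2 \ge t,\ \alpha_i,\ X_i\right)}{\Delta}
```

The conditioning on both T_1 ≥ t and T_2 ≥ t is what "given that neither AD/ADRD nor death has occurred up to time t" means, and the conditioning on α_i is what makes the resulting hazard ratios patient-specific rather than population-averaged.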
The assumptions for the semi-competing risks model are reasonable for this study. First, the independent censoring assumption is untestable [45]. However, we expect that the event and censoring times are conditionally independent given the covariates and frailty in this dataset, because risk factors for ADRD, such as age and comorbidities, can also influence the risk of mortality. Because we controlled for covariates and included frailty effects in our model, their influence on the probabilities of death and ADRD is accounted for. Secondly, regarding the assumption of proportional hazards, we conducted a comprehensive analysis using Schoenfeld residuals and performed formal tests for each covariate included in the data analysis [46]. These evaluations aimed to determine whether the multiplicative relationship holds. We found that the majority of covariates (38 of 41 risk factors for time to AD/ADRD and 39 of 41 risk factors for time to death) in our dataset satisfied the proportional hazards assumption. For these covariates, the Schoenfeld residuals exhibited no significant deviations from proportionality, indicating that their hazard ratios remain constant over time. Finally, regarding the assumption on the frailty distribution, extensive evidence in the statistics literature supports the use of the gamma frailty model in situations where events are positively correlated [26,47,48]. We have thoroughly discussed the justifications for this choice in relation to our dataset and research question. Additionally, we examined our data using Pearson's correlation to assess the relationship between the occurrence of AD/ADRD and death events, and our analysis revealed a positive and statistically significant correlation coefficient. Consequently, assuming a gamma distribution for the frailty is reasonable. It is worth noting that the semi-competing risks model we employed is also flexible in terms of the choice of frailty assumptions, including the inverse gamma and Gaussian distributions.
However, we did not observe significant differences in the HRs between the semi-competing risks model and the standard Cox model (i.e., the differences were not significant at the 0.05 level) except for a few variables. The lack of observed differences suggests that the two methods perform similarly in predicting AD/ADRD among patients with MCI and that the additional adjustment in the semi-competing risks regression model did not yield significantly different estimates. It is possible that death is not a strong competing risk in the progression from MCI to AD/ADRD, as suggested by the lower mortality in patients with no AD/ADRD (13.5%) than in patients who developed AD/ADRD (23.5%). Despite the lack of observed differences, it is still important to consider the potential advantages of the semi-competing risk approach. As is evident from Table 2, renal disease, traumatic brain injury, sleep disorder, and visual impairment are statistically significantly associated with AD/ADRD only in the model that did not consider death as a competing risk. In other words, patients with these conditions appeared to be at higher risk of developing AD/ADRD than those without them only when death was treated as random censoring, which indicates that the association between these predictors and the risk of developing AD/ADRD may be confounded by the presence of death as a semi-competing risk. Therefore, the interpretation of the HRs in our findings should take into account the model used and the presence of death as a semi-competing risk. The HRs obtained from the semi-competing risks model may be useful for predicting the risk of developing AD/ADRD in patients with conditions such as renal disease, traumatic brain injury, sleep disorder, or visual impairment, whereas the standard Cox model may not fully capture the association between these predictors and the risk of developing AD/ADRD because of the presence of death as a semi-competing risk.
There are some limitations in this work. First, we only considered death as a potential competing risk; however, MCI patients may also have other significant conditions, such as various types of cancer and other types of dementia besides AD/ADRD, especially as they age, which may serve as competing risks. Multiple chronic conditions that are competing risks of AD/ADRD, such as hyperlipidemia, ischemic heart disease, and chronic kidney disease, among others, are highly prevalent in older adults [49]. Nevertheless, our approach can be easily extended to consider multiple competing risks in one model [37,50,51]. Secondly, in this study, we only used the structured data from the OneFlorida+ network, where some other important risk factors, such as social determinants of health (SDoH) and information in clinical narratives, were not readily available and thus not included in our analysis; there may also be other unmeasured confounding variables that bias the model. Furthermore, we were only able to model AD/ADRD onset as the outcome. As such, a better understanding of the heterogeneity of the progression from MCI to AD/ADRD is essential for the development of personalized treatment and prevention strategies.
In addition to the above-mentioned improvements to the methodology, potential future work in this area could focus on further validating and improving the semi-competing risk model by incorporating additional predictors or exploring different modeling approaches. Additionally, it may be valuable to explore the impact of different types of competing events on the prediction of AD/ADRD, as well as to develop and evaluate interventions aimed at reducing the risk of both AD/ADRD and death in this population. Furthermore, it may be beneficial to investigate the generalizability of the findings to other populations and healthcare settings.
While our study developed a novel semi-competing risks regression model to predict the risk of AD/ADRD in individuals with MCI, we did not report the precision and sensitivity of the model in our current findings. We acknowledge that these performance metrics are important for evaluating the predictive ability of the model and are of great interest to the research community. However, due to the limitations of this current study, we were unable to include these results in our analysis. We plan to further investigate the precision and sensitivity of the model in future research and will report our findings in the discussion section (Section 4) of our paper. We believe that this future work will provide valuable insights into the performance of our model and its potential use in clinical practice.

Conclusions
In this work, using large collections of real-world clinical data from the OneFlorida+ Clinical Research Network, we identified a number of risk factors for AD/ADRD, which are consistent with the literature. We considered death as a competing risk and fitted a semi-competing risks model in addition to the standard Cox model. We did not observe significant differences between the two models, which suggests that, in this specific study, the traditional Cox regression model may be a sufficient and appropriate approach for predicting the occurrence of AD/ADRD in the presence of the competing risk of death among MCI patients. However, further research may be warranted to investigate the performance of semi-competing risks regression models in other settings and populations.

Figure 1. The illness-death process for semi-competing risks data.
Figure 3. Illustration for the joint distribution of T_1 and T_2.
Figure 4. A flow chart of study population selection.
Table 2. Hazard ratios for the occurrence of Alzheimer's disease and AD-related dementias (AD/ADRD) with treating death as random censoring vs. with consideration of death as a semi-competing risk.

Figure 2. Patient timeline of the predictor extraction time window and the outcome observation window.
Among them, 27,771 did not develop any type of AD/ADRD, while 5890 developed AD/ADRD. Among patients with AD/ADRD, 3214 developed AD, 1268 developed vascular dementia, 283 developed Lewy body dementia, 105 developed frontotemporal dementia, and 1020 had mixed dementia. In terms of the number of deaths, 3749 (13.5%) of the patients with no AD/ADRD died, while 23.5% of the 5890 patients with AD/ADRD died. The distributions of all baseline characteristics from the predictor extraction window are shown in Table 1 for patients with any type of AD/ADRD vs. those with no AD/ADRD. Compared with those who did not develop AD/ADRD, patients with AD/ADRD included more females (60.1% vs. 52.8%) and more Hispanics (25.7% vs. 18.2%), and they tended to be older (mean age: 74.4 vs. 59.4) and to have higher mortality (23.5% vs. 13.5%). Both groups had similar BMI, and there were fewer current smokers (9.7% vs. 14.4%) among AD/ADRD patients but more patients with unknown smoking status (61.2% vs. 56.0%).
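The cohort counts reported above can be cross-checked with simple arithmetic. The short sketch below uses only numbers stated in the text; the variable names are illustrative, and the last quantity is merely the count implied by the reported 23.5% mortality in the AD/ADRD group, not a figure reported by the authors.

```python
# Counts taken from the text.
no_ad, ad = 27771, 5890
subtypes = {
    "AD": 3214,
    "vascular dementia": 1268,
    "Lewy body dementia": 283,
    "frontotemporal dementia": 105,
    "mixed dementia": 1020,
}
deaths_no_ad = 3749

total = no_ad + ad                                        # analysis cohort size
subtype_total = sum(subtypes.values())                    # should equal the AD/ADRD group
pct_died_no_ad = round(100 * deaths_no_ad / no_ad, 1)     # mortality without AD/ADRD

# Implied (not reported) number of deaths among AD/ADRD patients at 23.5%.
implied_deaths_ad = round(0.235 * ad)

print(total, subtype_total, pct_died_no_ad, implied_deaths_ad)
```

The two groups add up to the 33,661 patients included in the analysis, the five subtype counts add up to the 5890 patients with AD/ADRD, and 3749 of 27,771 is 13.5%.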
To accurately study the progression of AD/ADRD, we would need to be able to extract and model other intermediate outcomes, such as neuropsychological tests (e.g., Mini-Mental State Examination [MMSE]), that are not typically captured in structured EHR either. Advanced natural language processing (NLP) methods can be leveraged to extract additional risk factors and neuropsychological test results that measure disease severity [52] from clinical narratives. It is also worth noting that the progression from MCI to AD/ADRD is a heterogeneous process. Some individuals may progress quickly, while others may not progress at all. Many factors can influence the rate and direction of the progression, including age, genetics, comorbidities, lifestyle factors, and other environmental factors. Furthermore, there are multiple subtypes of AD/ADRD. In this study, we grouped five conditions (Alzheimer's, vascular dementia, Lewy body, frontotemporal dementia, and mixed dementias) together as AD/ADRD, but individuals with different subtypes may present with different patterns of cognitive impairment and neuropathology.

Table 1. Baseline characteristics of the study population.

Variable | Hazard Ratio (HR), Treating Death as Random Censoring | Hazard Ratio (HR), Considering Death as a Semi-Competing Risk
Informatics (MDPI). Author manuscript; available in PMC 2024 June 25.