Article

A Bayesian Non-Linear Mixed-Effects Model for Accurate Detection of the Onset of Cognitive Decline in Longitudinal Aging Studies

by Franklin Fernando Massa 1,*, Marco Scavino 1 and Graciela Muniz-Terrera 2

1 Departamento de Métodos Cuantitativos, Universidad de la República, Montevideo 11400, Uruguay
2 Heritage College of Osteopathic Medicine, Ohio University, Athens, OH 45701, USA
* Author to whom correspondence should be addressed.
Stats 2025, 8(3), 74; https://doi.org/10.3390/stats8030074
Submission received: 26 June 2025 / Revised: 28 July 2025 / Accepted: 29 July 2025 / Published: 18 August 2025

Abstract

Change-point models are frequently used to model phenomena in which a regime shift occurs at an unknown time. In aging research, these models are commonly adopted to estimate the onset of cognitive decline. However, they have several limitations. Here, we present a Bayesian non-linear mixed-effects model based on a differential equation and designed for longitudinal studies, which overcomes some limitations of the classical change-point models used in aging research. In a simulation study, we show that the proposed model avoids biases in estimates of the onset of cognitive impairment. Finally, we illustrate the methodology by analyzing memory test results from older adults who participated in the English Longitudinal Study of Ageing.

1. Introduction

In aging research, determining the onset of cognitive decline is highly relevant, since its accurate and early detection allows for a better understanding of the aging process, its characteristics, and factors associated with its onset [1]. Early detection is therefore critical for the implementation of preventive or therapeutic interventions that can slow or mitigate cognitive decline. The inaccurate estimation of this onset can have significant and potentially harmful consequences; for instance, overestimation of cognitive decline onset may result in delayed care provision for older adults. On the other hand, underestimating the onset of cognitive decline can lead to unnecessary anxiety and worry. Caregivers may think that older persons are in a more advanced state of cognitive deterioration than they are, negatively impacting their quality of life and emotional well-being. Therefore, there is a need for methods that permit the accurate estimation of the onset of cognitive decline in older adults, backed by solid data and appropriate clinical assessments.
Change-point (CP) models are commonly used in aging research to estimate the onset of cognitive decline [2,3,4] and to answer questions about the timing of processes such as terminal decline [5]. Detecting the moment at which a CP occurs in the trajectory of a stochastic process is a problem that has been addressed from multiple perspectives in different disciplines. It is a common problem in time series analysis [6] and of utmost interest in longitudinal studies, where a set of individuals is followed over time. Although both types of studies aim to predict the behavior of the paths of stochastic processes, longitudinal studies usually pay more attention to the determinants of the phenomenon under investigation. Accordingly, statistical analysis is frequently performed with regression models, often including random effects. In longitudinal studies [7,8,9], and more specifically in aging research [10,11,12], CP regression models are commonly formulated within a linear mixed-model framework. The change over time may be abrupt, as in the broken stick model (BSM) [13], or gradual, as in the Bacon & Watts model (BWM) [14] and the bent cable regression model (BCR) [15].
Formulating models for longitudinal data within a differential equation (DE) framework relaxes linearity assumptions such as those imposed in linear mixed-effects models. For describing cognitive decline, mixed models in a non-linear setting retain the advantages of this methodology, such as describing population and individual variation and accounting for dependent data. They also make it possible to model the onset of the decline phase, the factors associated with its delay or advancement, and the speed at which the decline occurs.
We introduce a Bayesian non-linear mixed-effects model in which the longitudinal trajectory is modeled through a DE. DE models are increasingly used in longitudinal studies [16,17,18] because they can represent complex dynamics in which the passage of time plays a fundamental role. This line of research focuses on describing the temporal evolution of a specific phenomenon and on predicting future observations. The mean of a linear mixed-effects model for longitudinal data can itself be viewed as a particular case of a linear DE.
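The following is our own one-line illustration of the last point, not a derivation from the paper. Take a linear mixed model with fixed effects $\beta_0, \beta_1$ and random effects $\eta_{0i}, \eta_{1i}$. The mean trajectory of individual $i$ is the solution of a linear DE with a constant right-hand side:

$$\frac{d\mu_i(t)}{dt} = \beta_1 + \eta_{1i}, \qquad \mu_i(0) = \beta_0 + \eta_{0i} \quad\Longrightarrow\quad \mu_i(t) = (\beta_0 + \eta_{0i}) + (\beta_1 + \eta_{1i})\,t.$$

If the right-hand side is allowed to depend on the current state, as in $d\mu_i/dt = r(t)\,\mu_i(t)$, the result is the exponential-decay structure that the DEM in Section 2.2 builds on.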
The rest of the paper is organized as follows. In Section 2, we review three CP regression models that are often used in practice and then present the new DE model we propose. We carry out statistical inference for these models within a Bayesian framework and provide criteria for model selection. In Section 3, we describe a simulation study that evaluates the performance of the DE model under different data-generating processes (DGPs) and compares it with the models presented in Section 2. Section 4 presents an application to cognitive data from the English Longitudinal Study of Ageing (ELSA) [19]. There, the proposed DE model shows better predictive accuracy than the other three models and, on average, shifts the estimated onset of cognitive decline approximately six years later. Finally, in Section 5, we draw conclusions and propose future lines of research.

2. Materials and Methods

We formulate CP regression models within a non-linear mixed modeling (NLMM) framework, as described by [20]. Let the $n_i$ measurements of the $i$-th individual be contained in the vector $Y_i = (y_{i1}, y_{i2}, \ldots, y_{in_i})$. We adopt the hierarchical definition of the NLMM given by the following equations:
$$y_{ij} = f(t_{ij}, \theta_{0i}, \ldots, \theta_{Ki}) + \varepsilon_{ij}, \qquad i = 1, \ldots, n, \quad j = 1, \ldots, n_i, \tag{1}$$
$$\theta_{ki} = \beta_{k0} + \eta_{ki}, \qquad k = 0, \ldots, K, \tag{2}$$
where $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$ and $\eta_{ki} \sim N(0, \omega_k^2)$.
Under this specification, the outcome $y_{ij}$ (the response of subject $i$ at time $t_{ij}$) is a non-linear function of time and of a set of parameters $\theta_{0i}, \ldots, \theta_{Ki}$. Equation (1) is often referred to as the “individual-level model”. The parameters of this equation may include random effects $\eta_{ki}$, as specified in Equation (2), which is often referred to as the “population-level model”. Both the random effects ($\eta_{ki}$) and the model errors ($\varepsilon_{ij}$) are assumed to follow normal distributions with zero mean and variances $\omega_k^2$ and $\sigma_\varepsilon^2$, respectively. An advantage of the NLMM is that it can predict future values for a particular individual or for an average individual.
The CP models most commonly used in the literature, as well as our new proposal, are introduced below. All the models are formulated within an NLMM framework with only an intercept $\beta_{k0}$ for each parameter in the population-level equation. The formulation extends easily to models with explanatory variables.

2.1. Change Point Models

Several formulations of CP models have been used in the literature. Here, we recall the BSM, a piecewise linear model with one free knot, characterized by the following continuous piecewise linear function:
$$f_{BSM}(t,\theta) = \begin{cases} \theta_0 + \theta_1 t, & t \le \theta_{CP} \\ \theta_0 + \theta_1 \theta_{CP} + \theta_2 (t - \theta_{CP}), & t > \theta_{CP} \end{cases} \tag{3}$$
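Transcribing Equation (3) directly is a convenient way to check the formula numerically. The sketch below is ours, written in Python with NumPy, and is not the authors' code (the authors used rjags/rstan). The function name `f_bsm` is hypothetical. It evaluates the population mean for given parameter values.

```python
import numpy as np


def f_bsm(t, theta0, theta1, theta2, theta_cp):
    """Broken stick mean, Equation (3): slope theta1 up to the CP, slope theta2 after it."""
    t = np.asarray(t, dtype=float)
    before = theta0 + theta1 * t
    after = theta0 + theta1 * theta_cp + theta2 * (t - theta_cp)  # continuous at t = theta_cp
    return np.where(t <= theta_cp, before, after)


# Flat trajectory at 11 until the CP at t = 10, then a slope of -0.5.
print(f_bsm([0.0, 10.0, 12.0], theta0=11.0, theta1=0.0, theta2=-0.5, theta_cp=10.0))  # [11. 11. 10.]
```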
The BWM provides an alternative characterization with a smooth transition function:
$$f_{BWM}(t,\theta) = \theta_0 + \theta_1 (t - \theta_{CP}) + \theta_2 (t - \theta_{CP}) \tanh\!\left(\frac{t - \theta_{CP}}{\theta_T}\right) \tag{4}$$
where $\tanh(\cdot)$ denotes the hyperbolic tangent function.
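The BWM can be transcribed in the same way. Far from the CP, $\tanh$ approaches $\pm 1$, which helps when reading $\theta_1$ and $\theta_2$: the asymptotic slopes are $\theta_1 - \theta_2$ before the CP and $\theta_1 + \theta_2$ after it. The sketch below uses a hypothetical function name, `f_bwm`. It checks this property numerically with the illustrative choice $\theta_1 = \theta_2 = -0.25$, which gives a flat trajectory before the CP and a slope of $-0.5$ after it.

```python
import numpy as np


def f_bwm(t, theta0, theta1, theta2, theta_cp, theta_t):
    """Bacon & Watts mean, Equation (4); theta_t > 0 controls the smoothness of the change."""
    d = np.asarray(t, dtype=float) - theta_cp
    return theta0 + theta1 * d + theta2 * d * np.tanh(d / theta_t)


pars = dict(theta0=11.0, theta1=-0.25, theta2=-0.25, theta_cp=10.0, theta_t=3.0)
slope_before = f_bwm(-79.0, **pars) - f_bwm(-80.0, **pars)  # approximately theta1 - theta2 = 0
slope_after = f_bwm(101.0, **pars) - f_bwm(100.0, **pars)   # approximately theta1 + theta2 = -0.5
print(slope_before, slope_after)
```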
Finally, we consider the BCR where two linear functions are connected by a quadratic polynomial:
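$$f_{BCR}(t,\theta) = \begin{cases} \theta_0 + \theta_1 t, & t \le \theta_{CP} - \theta_T \\ \theta_0 + \theta_1 t + \theta_2 \dfrac{(t - \theta_{CP} + \theta_T)^2}{4\theta_T}, & \theta_{CP} - \theta_T < t \le \theta_{CP} + \theta_T \\ \theta_0 + (\theta_1 + \theta_2)\, t - \theta_2 \theta_{CP}, & t > \theta_{CP} + \theta_T \end{cases} \tag{5}$$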
Including random effects and explanatory variables in these models is optional. This choice determines which parameters are allowed to vary among individuals, as in Equations (1) and (2). The interpretation of the intercept ($\theta_0$) and the slope parameters ($\theta_1$, $\theta_2$) differs across models, but the CP parameter ($\theta_{CP}$) has the same meaning in all three. Equations (4) and (5) offer smoother alternatives to the abrupt change of the BSM. The smoothness of the change is controlled by the transition parameter ($\theta_T$) together with a smooth function: a hyperbolic tangent in the BWM and a quadratic polynomial in the BCR.
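Equation (5) can be evaluated through the bent-cable basis $q(t)$, so that $f_{BCR} = \theta_0 + \theta_1 t + \theta_2\, q(t)$. The basis equals 0 before the transition, $(t - \theta_{CP} + \theta_T)^2/(4\theta_T)$ inside it, and $t - \theta_{CP}$ after it. The sketch below uses a hypothetical function name, `f_bcr`. It writes the middle segment as $\theta_0 + \theta_1 t + \theta_2 (t - \theta_{CP} + \theta_T)^2/(4\theta_T)$, which is the form that keeps the function continuous at both ends of the transition.

```python
import numpy as np


def f_bcr(t, theta0, theta1, theta2, theta_cp, theta_t):
    """Bent cable mean, Equation (5), written as theta0 + theta1*t + theta2*q(t)."""
    t = np.asarray(t, dtype=float)
    start, end = theta_cp - theta_t, theta_cp + theta_t
    q = np.where(
        t <= start,
        0.0,                                          # first linear segment
        np.where(t <= end,
                 (t - start) ** 2 / (4.0 * theta_t),  # quadratic bend of half-width theta_t
                 t - theta_cp),                       # second linear segment
    )
    return theta0 + theta1 * t + theta2 * q


print(f_bcr([5.0, 7.0, 10.0, 13.0, 15.0], 11.0, 0.0, -0.5, 10.0, 3.0))
```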

2.2. Differential Equation Model (DEM)

Unlike the three previous alternatives, the model presented in Equation (6) is based on the description of a rate of change that is not constant over the course of aging. The mean trajectory is described by a simple exponential decay DE where the key element of this model is the rate function r ( t , θ ) :
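$$\frac{d f_{DEM}(t,\theta)}{dt} = \begin{cases} \theta_1, & t \le \theta_{CP} \\ r(t,\theta)\, f_{DEM}(t,\theta), & t \ge \theta_{CP} \end{cases}, \qquad f_{DEM}(\theta_{CP},\theta) = \theta_0 \tag{6}$$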
The family of solutions of this simple DE is determined by the specification of the rate function $r(t,\theta)$. In this study, we propose a rate function whose magnitude is non-decreasing over time, as presented in Figure 1. This rate function resembles the mean function of the BCR model, in which two straight lines are connected by a polynomial. The resulting mean function, however, is different.
Up to the point where deterioration begins, the rate of change is zero (when $\theta_1 = 0$), which produces a horizontal trajectory of $f$ at the value $\theta_0$. A transition period follows, during which the magnitude of the rate increases until it stabilizes at its maximum value. From then on, the decay is proportional to the cognitive state. Equation (7) gives the components of the rate function:
$$r(t,\theta) = \begin{cases} \theta_1/\theta_0, & t = \theta_{CP} \\ p_3(t), & \theta_{CP} < t \le \theta_{CP} + \theta_T \\ \theta_2, & t > \theta_{CP} + \theta_T \end{cases} \tag{7}$$
As in the previous cases, the CP is modeled through the parameter $\theta_{CP}$. The model also includes a transition period of length $\theta_T$ that starts at the CP. Lastly, $p_3(t)$ is a third-degree polynomial that smoothly connects the two parts of the rate function. To this end, it satisfies the following constraints:
  • $p_3(\theta_{CP}) = \theta_1/\theta_0$
  • $p_3'(\theta_{CP}) = 0$
  • $p_3(\theta_{CP} + \theta_T) = \theta_2$
  • $p_3'(\theta_{CP} + \theta_T) = 0$.
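The four constraints form a 4×4 linear system in the coefficients of the polynomial. The sketch below illustrates how to solve it; it is not necessarily the system given in Appendix A. As our own choice, it writes $p_3$ in the shifted variable $u = t - \theta_{CP}$, which keeps the system well conditioned, and solves it with `numpy.linalg.solve`. The function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def p3_coefficients(theta0, theta1, theta2, theta_t):
    """Coefficients (c0, c1, c2, c3) of p3 = c0 + c1*u + c2*u**2 + c3*u**3, with u = t - theta_cp."""
    T = theta_t
    A = np.array([
        [1.0, 0.0, 0.0, 0.0],          # p3(theta_cp) = theta1 / theta0
        [0.0, 1.0, 0.0, 0.0],          # p3'(theta_cp) = 0
        [1.0, T, T**2, T**3],          # p3(theta_cp + theta_t) = theta2
        [0.0, 1.0, 2.0 * T, 3.0 * T**2],  # p3'(theta_cp + theta_t) = 0
    ])
    rhs = np.array([theta1 / theta0, 0.0, theta2, 0.0])
    return np.linalg.solve(A, rhs)


def p3(t, theta0, theta1, theta2, theta_cp, theta_t):
    """Evaluate the transition polynomial at time(s) t."""
    c = p3_coefficients(theta0, theta1, theta2, theta_t)
    return P.polyval(np.asarray(t, dtype=float) - theta_cp, c)  # polyval takes increasing order


print(p3_coefficients(11.0, 0.0, -0.5, 3.0))
```

With $\theta_1 = 0$, the solution reduces to $p_3 = \theta_2\,(3s^2 - 2s^3)$ with $s = u/\theta_T$, the classical cubic “smoothstep”. The rate therefore moves monotonically from 0 to $\theta_2$ without adding free parameters.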
The coefficients of this polynomial can be found by solving a linear system (see Appendix A). The cubic polynomial $p_3(t)$ that defines the transition in the rate function does not introduce additional free parameters into the model. Its four coefficients are entirely determined by the continuity and differentiability constraints imposed at the boundaries of the transition period. The polynomial therefore serves strictly as a smooth interpolant that ensures a gradual change in the rate function, and it adds no flexibility in terms of parameter estimation. As a consequence, the number of parameters estimated in the DEM remains comparable to that of the other models considered. Under this specification, $f_{DEM}(\cdot)$ has the closed-form expression given in Equation (8):
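$$f_{DEM}(t,\theta) = \begin{cases} \theta_0 + \theta_1 (t - \theta_{CP}), & t \le \theta_{CP} \\ \theta_0 \exp\!\left(\displaystyle\int_{\theta_{CP}}^{t} r(s,\theta)\, ds\right), & t > \theta_{CP} \end{cases} \tag{8}$$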
Expanding the first part of Equation (8) shows that $\theta_0 - \theta_1 \theta_{CP}$ acts as an intercept-like parameter for the linear segment before the CP. With straightforward algebra, the exponential expression in (8) can be expanded as follows:
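$$\theta_0 \exp\!\left(\int_{\theta_{CP}}^{t} r(s,\theta)\, ds\right) = \begin{cases} \theta_0\, e^{P_3(t)}, & \theta_{CP} < t \le \theta_{CP} + \theta_T \\ \theta_0\, e^{P_3(\theta_{CP} + \theta_T) + \theta_2 (t - \theta_{CP} - \theta_T)}, & t > \theta_{CP} + \theta_T \end{cases}$$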
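where $P_3(t) = \int_{\theta_{CP}}^{t} p_3(s)\, ds$. Defining $\bar{\theta}_0 = \theta_0 e^{P_3(\theta_{CP} + \theta_T)}$ turns the last segment into $\bar{\theta}_0\, e^{\theta_2 (t - \theta_{CP} - \theta_T)}$, which makes the exponential decay structure explicit.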
Nonetheless, we prefer to present the model by means of Equations (6) and (7) since they provide a clearer interpretation of the parameters.
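Because $P_3$ is a polynomial, Equation (8) with the cubic transition has a closed form. The sketch below uses a hypothetical function name, `f_dem`. It works in $u = t - \theta_{CP}$ with the Hermite-form coefficients $c_2 = 3(\theta_2 - \theta_1/\theta_0)/\theta_T^2$ and $c_3 = -2(\theta_2 - \theta_1/\theta_0)/\theta_T^3$, which solve the four constraints on $p_3$ listed above, and integrates them analytically. Since $df/dt = r f$, a negative $\theta_2$ produces the decline, consistent with the post-decline rate of −0.5 used in Section 3. The sketch also shows that the transition contributes $P_3(\theta_{CP} + \theta_T) = \theta_T(\theta_1/\theta_0 + \theta_2)/2$ to the exponent.

```python
import numpy as np


def f_dem(t, theta0, theta1, theta2, theta_cp, theta_t):
    """DEM mean, Equation (8), with the cubic transition of Equation (7) integrated analytically."""
    t = np.asarray(t, dtype=float)
    a, b = theta1 / theta0, theta2            # rate at the CP and after the transition
    c2 = 3.0 * (b - a) / theta_t**2           # p3(u) = a + c2*u**2 + c3*u**3, u = t - theta_cp
    c3 = -2.0 * (b - a) / theta_t**3

    def P3(u):                                # integral of p3 from the CP to CP + u
        return a * u + c2 * u**3 / 3.0 + c3 * u**4 / 4.0

    u = t - theta_cp
    linear = theta0 + theta1 * u                                   # t <= theta_cp
    transition = theta0 * np.exp(P3(np.clip(u, 0.0, theta_t)))     # CP < t <= CP + theta_t
    decay = theta0 * np.exp(P3(theta_t) + theta2 * (u - theta_t))  # t > CP + theta_t
    return np.where(u <= 0.0, linear, np.where(u <= theta_t, transition, decay))


pars = dict(theta0=11.0, theta1=0.0, theta2=-0.5, theta_cp=10.0, theta_t=3.0)
print(f_dem([5.0, 13.0, 15.0], **pars))  # 11, 11*exp(-0.75), 11*exp(-1.75)
```

A finite-difference check confirms that the closed form satisfies $df/dt = r(t)\,f$ inside the transition. The output also stays strictly positive after the CP.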
To better understand how these models work, Figure 2 compares the mean trajectory of the four alternatives presented in Equations (3)–(6).
Figure 2 shows the phases that the four models distinguish in the mean function. The BSM consists of two linear segments with different slopes that are joined abruptly at the CP. The other alternatives follow this pattern but add a third, “transition” phase between the two regimes.
The transition parameter has a different meaning in each of the three models that include it. In the BWM, its meaning is not straightforward, but a “radius of curvature” (see page 528 of ref. [14]) can be constructed around the CP, which allows a smoother transition between the two regimes. The BSM can be viewed as a limiting case of the BWM in which the transition parameter tends to zero, producing an abrupt transition. In the BCR, the transition parameter is the half-width of the transition period around the CP. Finally, in the DEM, it is the length of the period that connects the linear and decay phases. This period starts at the CP and ends at the beginning of the decay phase.
The DEM has the advantage that $f_{DEM}$ never falls below zero: after the CP, it decays exponentially toward zero instead of crossing it (for $\theta_0 > 0$). This property may be useful in applications where the response variable is inherently non-negative, such as cognitive scores. Moreover, as Figure 2 shows, the DEM is the only specification that can capture non-linear behavior after the CP. This matters because, as the Monte Carlo scenarios in Section 3 show, choosing an inappropriate CP model can lead to substantial biases in the estimated CP.

2.3. Bayesian Inference

Because it includes random effects, a CP, and a transition parameter, the proposed DEM is a non-linear model that requires iterative and computationally demanding estimation methods. We therefore adopted a Bayesian estimation approach, which handles a (possibly) large number of random effects better because it does not require numerical methods to solve high-dimensional integrals [21]. As described below, estimation was based on efficient Markov chain Monte Carlo (MCMC) algorithms.
We selected the following prior distributions. The fixed effects received non-informative Gaussian priors with precision set to 0.001 (i.e., variance 1000). The CP and the transition parameter received uniform priors whose ranges covered the duration of the study. Finally, following [22,23], we assigned half-Cauchy distributions as weakly informative priors for the parameters associated with the error and random-effects variances. The resulting hierarchical models are summarized in Figure 3.
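The precision parameterization is easy to misread. JAGS's `dnorm(mu, tau)` takes a precision $\tau = 1/\sigma^2$, so a precision of 0.001 corresponds to a standard deviation of about 31.6. The sketch below draws from priors of the kind described above, using Python and NumPy, to make their scale concrete. It rests on three assumptions. The uniform range (0, 20) mirrors the follow-up window of the simulation study. The half-Cauchy scale of 5 is a hypothetical value chosen for illustration; the actual hyperparameters are listed in Table A1 of Appendix B. Placing the half-Cauchy on standard deviations is also our choice.

```python
import numpy as np

rng = np.random.default_rng(2025)
n_draws = 10_000

# Fixed effects: Gaussian with precision 0.001 (JAGS-style), i.e., variance 1000.
precision = 0.001
sd_fixed = 1.0 / np.sqrt(precision)  # about 31.62
beta_draws = rng.normal(0.0, sd_fixed, size=n_draws)

# Change point: uniform over the study window (0 to 20 assumed, as in the simulation study).
cp_draws = rng.uniform(0.0, 20.0, size=n_draws)

# Standard deviations: half-Cauchy(0, scale); the scale is a hypothetical value, not the paper's.
scale = 5.0
sigma_draws = scale * np.abs(rng.standard_cauchy(size=n_draws))

print(f"fixed-effect prior sd = {sd_fixed:.2f}")
print(f"median of half-Cauchy draws = {np.median(sigma_draws):.2f} (theoretical median: {scale:.2f})")
```

The half-Cauchy has median equal to its scale but very heavy tails. This is why it is described as weakly informative rather than non-informative.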
In all cases, we ran four parallel Markov chains for 5000 iterations each. We discarded the first 2500 iterations of each chain and assessed the convergence of the MCMC algorithm using the four remaining samples of size 2500. Convergence was monitored using the $\hat{R}$ statistic, trace plots, and the effective sample size [24]. We applied these checks systematically in both the simulation study and the application to real-world data from ELSA. All statistical analyses were conducted in R [25] using the MCMC algorithms available in the rstan and rjags packages [26,27].
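For reference, the classical split-$\hat{R}$ splits each chain in half and then compares within-chain and between-chain variances. The sketch below implements this textbook version for a single parameter, using a hypothetical function name. It is a simplification: newer diagnostics, such as the rank-normalized split-$\hat{R}$ of Vehtari et al. (2021), refine it. Values close to 1 indicate that the chains agree.

```python
import numpy as np


def split_rhat(chains):
    """Classical split-R-hat for one parameter; chains has shape (M chains, N draws after warmup)."""
    chains = np.asarray(chains, dtype=float)
    half = chains.shape[1] // 2
    # Split every chain into two halves -> 2M sequences of length `half`.
    seqs = np.concatenate([chains[:, :half], chains[:, half:2 * half]], axis=0)
    n = seqs.shape[1]
    W = seqs.var(axis=1, ddof=1).mean()    # mean within-sequence variance
    B = n * seqs.mean(axis=1).var(ddof=1)  # between-sequence variance
    var_plus = (n - 1) / n * W + B / n     # pooled estimate of the posterior variance
    return np.sqrt(var_plus / W)


rng = np.random.default_rng(1)
draws = rng.normal(size=(4, 2500))                # four chains of 2500 retained draws
shifted = draws + np.array([[0.0], [0.0], [0.0], [3.0]])  # one chain stuck elsewhere
print(round(float(split_rhat(draws)), 3))    # close to 1
print(round(float(split_rhat(shifted)), 3))  # clearly above 1
```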

2.4. Model Selection

One way to perform model selection within a Bayesian framework is to use the marginal likelihood [28]. In model selection, $K$ competing models are considered, and interest lies in the relative plausibility of each model $M_k$, $k \in \{1, 2, \ldots, K\}$, given the priors and the data. This relative plausibility is captured by the posterior model probability $p(M_k \mid y)$ of model $M_k$ given the data:
$$p(M_k \mid y) = \frac{p(y \mid M_k)\, p(M_k)}{\sum_{k'=1}^{K} p(y \mid M_{k'})\, p(M_{k'})}$$
where $p(y \mid M_k) = \int p(y \mid \theta_k, M_k)\, p(\theta_k \mid M_k)\, d\theta_k$ is the marginal likelihood of the data under model $M_k$. The priors $p(\theta_k \mid M_k)$ and $p(M_k)$ represent the initial modeling uncertainty, and the posterior model probability $p(M_k \mid y)$ updates this uncertainty after observing the data. For the applications in this study, these quantities were computed with the algorithms of the bridgesampling R package [29].
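Given log marginal likelihoods $\log p(y \mid M_k)$, such as those estimated by bridge sampling, the posterior model probabilities follow from the formula above. Log marginal likelihoods are often large negative numbers, so the normalization is safer on the log scale (log-sum-exp). The sketch below assumes equal prior model probabilities unless others are supplied. The function name and the input values in the usage line are hypothetical and are not results from this study.

```python
import numpy as np


def posterior_model_probs(log_marglik, prior=None):
    """Posterior model probabilities from log marginal likelihoods log p(y | M_k)."""
    log_marglik = np.asarray(log_marglik, dtype=float)
    if prior is None:  # assumption: equal prior model probabilities
        prior = np.full(log_marglik.shape, 1.0 / log_marglik.size)
    log_w = log_marglik + np.log(prior)  # log of p(y | M_k) p(M_k)
    log_w -= log_w.max()                 # log-sum-exp shift avoids underflow of exp(-2000)
    w = np.exp(log_w)
    return w / w.sum()


# Hypothetical log marginal likelihoods for BSM, BWM, BCR, DEM (not values from this study).
print(posterior_model_probs([-2104.3, -2103.9, -2104.1, -2101.2]))
```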
Another widely used indicator is the widely applicable information criterion (WAIC) [30]:
$$\mathrm{WAIC} = -2\left(\sum_{i=1}^{n} \log \mathrm{E}_{post}\left[p(y_i \mid \theta)\right] - \sum_{i=1}^{n} \mathrm{Var}_{post}\left[\log p(y_i \mid \theta)\right]\right)$$
where $\mathrm{E}_{post}$ and $\mathrm{Var}_{post}$ denote the mean and variance with respect to the posterior distribution $p(\theta \mid y)$. Unlike posterior model probabilities, WAIC evaluates out-of-sample predictive accuracy. It does so using the log-predictive density (a more general quantity than the mean squared error) and a bias correction that accounts for the effective number of parameters. Lower values indicate better expected predictive accuracy. In this study, WAIC was computed with the algorithms in the loo R package [31].
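The WAIC formula above can be computed from an $S \times n$ matrix of pointwise log-likelihood values $\log p(y_i \mid \theta^{(s)})$ evaluated at $S$ posterior draws. The sketch below is a minimal NumPy version with a hypothetical function name. It also returns the usual standard error based on the pointwise contributions. The loo package additionally reports diagnostics that this sketch omits.

```python
import numpy as np


def waic(log_lik):
    """WAIC (deviance scale) and its standard error from an (S draws x n obs) log-likelihood matrix."""
    log_lik = np.asarray(log_lik, dtype=float)
    m = log_lik.max(axis=0)
    lppd_i = m + np.log(np.mean(np.exp(log_lik - m), axis=0))  # log E_post[p(y_i | theta)], stabilized
    p_waic_i = np.var(log_lik, axis=0, ddof=1)                  # Var_post[log p(y_i | theta)]
    waic_i = -2.0 * (lppd_i - p_waic_i)                         # pointwise contributions
    n = waic_i.size
    return waic_i.sum(), np.sqrt(n * np.var(waic_i, ddof=1))


# Toy check: identical draws give a zero penalty, so WAIC = -2 * sum(log-likelihood).
toy = np.tile([-1.0, -2.0], (4, 1))
print(waic(toy))
```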
The fit indicators described above were introduced to assess the performance of the new DEM against the BSM, BWM, and BCR. Because this paper emphasizes accurate estimation of the CP, the comparison focuses on bias and interval coverage. To explore the overall fit of each model, we also report the posterior model probabilities and the WAIC values.
To this end, we designed three simulation-based experiments. They compare the fit of the models in different situations and assess how well each model estimates the CP. Finally, we illustrate the performance of the DEM against its competitors using real cognitive data from ELSA.

3. Simulation Study

We designed a simulation study based on three experiments. In the first, the DEM is the data-generating process (DGP). In this case, the DEM is expected to provide the best fit, and the other three model specifications are expected to show negative biases when estimating the CP. In the second scenario, the DGP is the BSM; this scenario assesses how robust the DEM is when it is not the appropriate model. The third scenario is similar to the first, but the follow-up period after the CP is shortened, so the curvature of the trajectory is not pronounced enough to single out the DEM as the best model.
In all scenarios, we simulated 1000 data sets, each composed of 50 individuals observed on 10 occasions at random times between 0 and 20. The parameter $\beta_1$ is set to 0. The model is designed for studies of cognitive decline, where it is natural to assume that the cognitive state of individuals remains constant until cognitive decline begins. This assumption is consistent with previous literature (e.g., Karr et al., 2018 [1]), which suggests that cognitive performance remains relatively stable before decline. Each scenario therefore considered a horizontal trajectory up to the CP, which was fixed in the middle of the follow-up period ($\beta_{CP} = 10$). In Scenario 2, the slope after the CP ($\beta_2$) was set to −0.5, and in Scenarios 1 and 3, the decline rate was −0.5. In all cases, $\beta_0$ was set to 11, the observation noise had variance $\sigma_\varepsilon^2 = 1.4$, and the random effects had variances of 0.3, 0.1, and 2 for the intercept, the slope (or rate), and the CP, respectively. Finally, the transition parameter ($\theta_T$) was set to 3. The data sets were generated following the pseudo-code in Algorithm 1.
The parameters used in the simulation study were selected to reflect typical values observed in empirical cognitive aging research. In particular, the intercept value corresponds to the average total recall score observed among cognitively healthy older adults in the English Longitudinal Study of Ageing (ELSA) [32]. The post-decline rate was set to −0.5, representing a moderate deterioration compatible with prior studies, which report average annual declines in total recall scores ranging from 0.3 to 0.5 points [33]. This value was chosen to ensure that the change point is identifiable without being unrealistically steep. The remaining parameters were chosen to reproduce plausible levels of observation noise and inter-individual variability, and to allow for a meaningful transition period ( θ T = 3 ) across different model specifications.
Algorithm 1 Data simulation for the s-th scenario.
  • Require: $N = 1000$; $n_{obs} = 50$; $n_i = 10$; $T_{max} = 20$; DGP parameters
  • Ensure: $N$ data sets
1: for $k \leftarrow 1$ to $N$ do
2:     for $i \leftarrow 1$ to $n_{obs}$ do
           (a) Generate 10 random times $t_{i1}, t_{i2}, \ldots, t_{i10}$ from a uniform distribution on $(0, T_{max})$.
           (b) Generate the individual parameters $\theta_{0i}, \theta_{2i}, \theta_{CPi}$ from the population-level Equation (2), using the DGP inputs and simulated random effects (see Figure 3).
           (c) Generate the observations $y_{i1}, y_{i2}, \ldots, y_{i10}$ using the individual-level Equation (1).
3:     end for
4: end for
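Below is a minimal Python/NumPy sketch of Algorithm 1 for one data set. It illustrates the procedure under stated assumptions and is not the authors' code. For brevity, it uses the BSM mean (Scenario 2); the DEM scenarios would use the DEM mean function instead. The random effects act on the intercept, the post-CP slope, and the CP, with the variances 0.3, 0.1, and 2 given in the text. Sorting the observation times is our choice, and the function names are hypothetical.

```python
import numpy as np


def f_bsm(t, theta0, theta1, theta2, theta_cp):
    """Broken stick mean function (Equation (3))."""
    return np.where(t <= theta_cp, theta0 + theta1 * t,
                    theta0 + theta1 * theta_cp + theta2 * (t - theta_cp))


def simulate_dataset(rng, n_ind=50, n_times=10, t_max=20.0,
                     beta=(11.0, 0.0, -0.5, 10.0),          # beta0, beta1, beta2, beta_CP
                     var_eps=1.4, var_re=(0.3, 0.1, 2.0)):  # intercept, slope, CP
    """One data set following Algorithm 1, with the BSM (Scenario 2) as the DGP."""
    b0, b1, b2, bcp = beta
    sd_re = np.sqrt(var_re)
    rows = []
    for i in range(n_ind):
        t = np.sort(rng.uniform(0.0, t_max, size=n_times))  # observation times
        eta0, eta2, eta_cp = rng.normal(0.0, sd_re)         # random effects -> Equation (2)
        mean = f_bsm(t, b0 + eta0, b1, b2 + eta2, bcp + eta_cp)
        y = mean + rng.normal(0.0, np.sqrt(var_eps), size=n_times)  # observations -> Equation (1)
        rows.append(np.column_stack([np.full(n_times, i), t, y]))
    return np.vstack(rows)  # columns: individual id, time, response


rng = np.random.default_rng(2025)
data = simulate_dataset(rng)
print(data.shape)  # (500, 3)
```

The outer loop over $N = 1000$ data sets is then `[simulate_dataset(rng) for _ in range(1000)]`.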
In each scenario, the four models were fitted to each of the 1000 data sets using the hierarchical Bayesian non-linear mixed model proposed in Figure 3. The prior distributions are listed in Appendix B (see Table A1). Posterior summaries of the parameters were extracted after verifying the convergence of the MCMC algorithm (see the distribution of the $\hat{R}$ statistic and $n_{eff}$ for the $\hat{\beta}_{CP}$ parameter in Figure A1 in Appendix C). The posterior median (PM) of the CP was used as the point estimator, and credible intervals (CrI) were constructed from the 0.025 and 0.975 sample quantiles. Highest posterior density intervals were also constructed but did not differ appreciably from those reported in Table 1. The performance of the four models in estimating the CP was evaluated through the bias of the $\beta_{CP}$ estimate, calculated as the difference between the estimate and the true value. An estimation was considered successful (unbiased) when the CrI of the bias included zero. In addition, the CrI of $\beta_{CP}$ obtained in each of the 1000 data sets was compared with the true CP to approximate the effective coverage of this parameter for all models in each scenario. Ideally, these coverage values should be as close as possible to 95%. The posterior draws of each model's parameters were also used to calculate posterior model probabilities and the WAIC (with its standard error). Estimates, intervals, and fit indicators were obtained for every data set following the pseudo-code in Algorithm 2.
Algorithm 2 Model fitting and evaluation for the s-th scenario.
  • Require: $N$ simulated data sets
  • Ensure: bias, effective coverage, and posterior model probabilities
1: for $k \leftarrow 1$ to $N$ do
       (a) Fit the BSM, BWM, BCR, and DEM to data set $k$.
       (b) Obtain the PM and the 95% CrI of the CP.
       (c) Compute the fit indicators.
2: end for
3: Summarize the values obtained in steps (b) and (c) across the $N$ data sets.
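For the CP, the summary step of Algorithm 2 reduces to a few array operations. The sketch below assumes that the posterior draws of $\beta_{CP}$ for all data sets are stored in an $N \times S$ array, a hypothetical layout. It takes the posterior median as the point estimate and the equal-tailed 95% interval as the CrI, and it summarizes the bias across data sets by its median and its 2.5% and 97.5% quantiles. The bias is computed as estimate minus true value. Under this sign convention, the negative biases reported for Scenario 1 correspond to onsets estimated too early.

```python
import numpy as np


def summarize_cp(draws, true_cp, level=0.95):
    """Bias summary and effective coverage for the CP across simulated data sets.

    draws: array (N data sets, S posterior draws) of beta_CP.
    """
    draws = np.asarray(draws, dtype=float)
    alpha = (1.0 - level) / 2.0
    pm = np.median(draws, axis=1)                                    # posterior median per data set
    lower, upper = np.quantile(draws, [alpha, 1.0 - alpha], axis=1)  # equal-tailed CrI per data set
    bias = pm - true_cp                                              # estimate minus true value
    bias_summary = (np.median(bias), *np.quantile(bias, [alpha, 1.0 - alpha]))
    coverage = np.mean((lower <= true_cp) & (true_cp <= upper))      # share of CrIs containing the truth
    return bias_summary, coverage


# Toy input: ten identical "posteriors" spread evenly over [8, 12].
toy = np.tile(np.linspace(8.0, 12.0, 101), (10, 1))
print(summarize_cp(toy, true_cp=10.0))
```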
Table 1 reports the PM and the 95% CrI for the CP parameter, its bias, and the effective coverage probability of the CrI. It also reports the posterior model probability and the WAIC for the four models in each scenario.
The results in Table 1 suggest that the DEM performs best when the decline after the CP shows enough curvature (Scenario 1). This is reflected in its posterior model probability and its lowest WAIC estimate. Its CP estimator is also unbiased, with a credible interval for the bias that covers zero without being too wide. Finally, the effective coverage probability of its credible interval for the true CP is very close to the nominal value of 95%. By contrast, the BSM performs poorly, with a negative bias in the estimated CP.
In the other two scenarios, the DEM remains competitive with the alternatives, even in its least favorable case (Scenario 2). When the true decline is linear (Scenario 2) or close to linear (Scenario 3), the estimated CP has a negligible median bias and high posterior coverage of the true CP value. Even when the post-CP curvature is not sufficiently pronounced (Scenario 3), the posterior model probability indicates moderate evidence in favor of the DEM.

4. Application

4.1. Data

ELSA is a prospective, population-based cohort study of individuals aged over 50, designed to understand different aspects of aging in England [19]. The cohort was established in 2002, and data have been collected biennially across nine waves, with interviews conducted with 19,221 individuals. Because of computational constraints, we analyzed a randomly selected subset of 5000 ELSA participants. We focus on a memory marker, the total word recall test [34], defined as the sum of the immediate and delayed recall tests. Participants were presented with a list of ten words and asked to recall them immediately and again after approximately five minutes. This test measures verbal skills and working memory.

4.2. Results

Figure 4a shows a spaghetti plot of the total recall test score versus age for a sample of 5000 randomly selected participants. Figure 4b highlights some individual trajectories in black over the light gray trajectories from the first panel. Although Figure 4a shows considerable variability among trajectories, a “stable” phase followed by a “downward” phase beginning between 60 and 70 years of age can be discerned. Figure 4b shows the same pattern: individual trajectories vary in level, but their rate of change consistently differs between the beginning and the end of the follow-up period.
We fitted the four models introduced in Sections 2.1 and 2.2 and calculated the model selection criteria presented in Section 2.4. Model hyperparameters matched those used in our simulation study (Appendix B, Table A1), with one key adaptation. For the ELSA application, the change-point parameter ($\beta_{CP}$) was assigned a uniform (60, 80) prior. This choice reflects both the cohort's age distribution (baseline range: 50–90 years) and clinical evidence that cognitive decline rarely begins outside this window. All other priors remained unchanged. Table 2 reports the estimated parameters (posterior medians and credible intervals) and the model selection indicators for the three baseline models (BSM, BWM, and BCR) and the proposed DEM. All four models agree on the average intercept and on the estimates of the random components. The posterior model probability provides only anecdotal evidence in favor of the DEM, but the WAIC also favors it. All estimates presented in this section met standard convergence criteria for all four models.
Figure 5 compares the posterior distributions of the CP across the four models and reveals critical differences in how they characterize the onset of cognitive decline. The BSM, BWM, and BCR converge on earlier CP estimates (medians: 67–68 years), whereas the DEM suggests a markedly later onset (median: 74.3 years; 95% CrI: 72.9–75.6). This discrepancy agrees with our simulation results, in which the DEM avoided the downward bias that the other models showed when the true decline was non-linear. The width of the DEM's credible interval (2.7 years, vs. 1–3 years in the other models) reflects its capacity to capture individual variability in non-linear trajectories, a biologically plausible feature that piecewise-linear alternatives lack. Clinically, this shift of roughly 6 years in the estimated onset could substantially affect intervention timing. Earlier estimates may prompt premature interventions, whereas the later DEM estimates better match the observed preservation of cognitive function in aging populations. These findings underscore the DEM's advantages in both statistical rigor and translational relevance.
As in the simulation experiments, this difference could be due to non-linearities in the mean trajectory. The DEM also estimates a narrower transition period, which is offset by greater variability in the random component of the CP.
In the empirical application, the models disagree more in their CP estimates than in the simulations. Several factors may explain this, most notably the presence of non-linear dynamics that simpler models fail to capture adequately. Unlike in the simulated data, the estimates in Table 2 diverge not only in the CP but also in the parameters associated with random effects and measurement error.
Finally, based on the posterior draws of the DEM parameters, Figure 6 presents trace plots for the fixed effects of the DEM and the fitted total recall curve with a 95% probability band.

5. Conclusions

This paper introduced a non-linear alternative to commonly used CP models, based on a new DEM specification, and compared its performance with that of the CP models most commonly used in the literature. Models were fitted in a Bayesian framework in R using the rjags or rstan packages. In this study, estimation with rjags required shorter runtimes than estimation with rstan. Parameter estimation could also be carried out straightforwardly in a frequentist framework. Models were compared using model selection indicators computed from the MCMC samples.
Different simulation scenarios were implemented to explore the performance of the DEM with reference to the BSM, BWM, and BCR models. From the analysis of the results of the simulation design, we observed that, when the cognitive decline phase had a sufficiently pronounced curvature, the DEM presented the best fit and produced a less biased estimator for the CP with higher posterior coverage in the credible interval. However, when there were not enough observations in the period where the curvature manifests itself, the performance of the DEM decreased. Furthermore, when the DGP did not exhibit curvature in the post-CP phase, the performance of DEM was found to be on par with that of its competitors.
Finally, we illustrated the models using cognitive data from the ELSA study. The DEM fit slightly better and placed the onset of cognitive decline, on average, 6 years later than the other models. This result is of particular interest in aging research because it provides vital information for health policy planning. The model described in this paper is not meant to replace the models proposed in the literature. Rather, it is a viable alternative that offers good statistical properties, transparent parameter interpretation, and biologically plausible features, such as a mean function that never crosses the x-axis. It should be preferred when there are reasons to include non-linear behavior in the model.
Future research will focus on (1) extending the DEM to incorporate explanatory variables (e.g., genetic risk factors) and to handle the missing data patterns common in longitudinal aging studies, (2) developing open-source software tools to facilitate clinical adoption, (3) applying the model to other neurodegenerative processes, and (4) informing personalized intervention timing based on model-derived decline trajectories. These developments will further bridge statistical innovation and clinical practice in aging research.

Author Contributions

Conceptualization, G.M.-T. and F.F.M.; methodology, F.F.M.; software, F.F.M.; validation, G.M.-T., M.S. and F.F.M.; formal analysis, F.F.M.; investigation, G.M.-T. and F.F.M.; resources and data curation, F.F.M.; writing—original draft preparation, F.F.M., M.S. and G.M.-T.; writing—review and editing, F.F.M., M.S. and G.M.-T.; visualization, F.F.M.; supervision, G.M.-T. and M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

All participants gave written informed consent. The English Longitudinal Study of Ageing has been approved by the National Research Ethics Service (London Multicentre Research Ethics Committee (MREC/01/2/91)).

Data Availability Statement

ELSA data for all waves, including wave 0 (Health Survey for England), are available from the UK Data Service. The code used to generate the simulated data sets and obtain the results presented in this paper is available from the corresponding author upon request.

Conflicts of Interest

The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Abbreviations

The following abbreviations are used in this manuscript:
  • CP: Change point
  • BSM: Broken stick model
  • BWM: Bacon & Watts model
  • BCR: Bent cable regression
  • DE: Differential equation
  • DEM: Differential equation model
  • DGP: Data generating process
  • ELSA: English Longitudinal Study of Ageing
  • NLMM: Non-linear mixed model
  • MCMC: Markov chain Monte Carlo
  • WAIC: Widely applicable information criterion
  • CrI: Credible interval
  • PMP: Posterior model probability

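Appendix A. Coefficients of $p_3(t)$

The polynomial in the rate function $r(t, \theta)$ has the form
$$p_3(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3.$$
Imposing the constraints presented in Section 2.2, the following system of linear equations is obtained:
$$
\begin{pmatrix}
1 & \theta_{CP} & \theta_{CP}^2 & \theta_{CP}^3 \\
0 & 1 & 2\theta_{CP} & 3\theta_{CP}^2 \\
1 & \theta_{CP}+\theta_T & (\theta_{CP}+\theta_T)^2 & (\theta_{CP}+\theta_T)^3 \\
0 & 1 & 2(\theta_{CP}+\theta_T) & 3(\theta_{CP}+\theta_T)^2
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{pmatrix}
=
\begin{pmatrix} \theta_1/\theta_0 \\ 0 \\ \theta_2 \\ 0 \end{pmatrix}.
$$
The four rows impose $p_3(\theta_{CP}) = \theta_1/\theta_0$, $p_3'(\theta_{CP}) = 0$, $p_3(\theta_{CP}+\theta_T) = \theta_2$, and $p_3'(\theta_{CP}+\theta_T) = 0$. The coefficients of $p_3(t)$ that arise after solving this system are
  • $a_3 = \dfrac{2(\theta_1/\theta_0 - \theta_2)}{\theta_T^3}$
  • $a_2 = -\dfrac{3(2\theta_{CP} + \theta_T)(\theta_1/\theta_0 - \theta_2)}{\theta_T^3}$
  • $a_1 = \dfrac{6\theta_{CP}(\theta_{CP} + \theta_T)(\theta_1/\theta_0 - \theta_2)}{\theta_T^3}$
  • $a_0 = \dfrac{\theta_1}{\theta_0} - \dfrac{(\theta_1/\theta_0 - \theta_2)\,\theta_{CP}^2(2\theta_{CP} + 3\theta_T)}{\theta_T^3}$.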
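The closed-form coefficients can be checked numerically. The sketch below builds the 4 × 4 system with NumPy, with rows encoding the value and first derivative of $p_3$ at $\theta_{CP}$ and at $\theta_{CP}+\theta_T$. It then solves the system and compares the solution with the closed-form expressions. The parameter values are arbitrary illustrations, `ratio` stands for $\theta_1/\theta_0$, and both helper names are hypothetical. For realistic ages (e.g., $\theta_{CP}$ near 70), the matrix can become poorly conditioned, so centering the time axis first may be advisable.

```python
import numpy as np


def cubic_coefficients(theta_cp: float, theta_t: float, ratio: float,
                       theta_2: float) -> np.ndarray:
    """Solve the Appendix A system for (a0, a1, a2, a3); `ratio` is theta_1 / theta_0."""
    t1 = theta_cp + theta_t
    # Rows: p(theta_CP), p'(theta_CP), p(theta_CP + theta_T), p'(theta_CP + theta_T).
    m = np.array([
        [1.0, theta_cp, theta_cp**2, theta_cp**3],
        [0.0, 1.0, 2 * theta_cp, 3 * theta_cp**2],
        [1.0, t1, t1**2, t1**3],
        [0.0, 1.0, 2 * t1, 3 * t1**2],
    ])
    rhs = np.array([ratio, 0.0, theta_2, 0.0])
    return np.linalg.solve(m, rhs)


def closed_form(theta_cp: float, theta_t: float, ratio: float,
                theta_2: float) -> np.ndarray:
    """Closed-form coefficients (a0, a1, a2, a3) listed above."""
    d = (ratio - theta_2) / theta_t**3
    a3 = 2 * d
    a2 = -3 * (2 * theta_cp + theta_t) * d
    a1 = 6 * theta_cp * (theta_cp + theta_t) * d
    a0 = ratio - theta_cp**2 * (2 * theta_cp + 3 * theta_t) * d
    return np.array([a0, a1, a2, a3])


params = dict(theta_cp=2.0, theta_t=1.5, ratio=0.8, theta_2=0.3)
print(np.allclose(cubic_coefficients(**params), closed_form(**params)))
```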

Appendix B. Prior Distributions

Table A1 describes the prior distributions selected for each parameter of the four models considered in the simulation study of Section 3.
Table A1. Prior distributions for the simulation study described in Section 3. Note that N, U, and HC denote the normal, uniform, and half-Cauchy distributions, respectively.
| Parameter | Prior | Description |
|---|---|---|
| Fixed effects | | |
| $\beta_0$ | N(10, 100) | Intercept |
| $\beta_2$ | N(0, 100) | Linear slope before CP (decay rate in DEM) |
| $\beta_{CP}$ | U(0, 20) | Change point |
| $\theta_T$ | U(0, 5) | Transition parameter |
| Random effects | | |
| $\sigma^2_\epsilon$ | HC(0, 10) | Observation noise variance |
| $\omega^2_{b_0}$ | HC(0, 1) | Intercept variance |
| $\omega^2_{b_2}$ | HC(0, 1) | Slope (decay rate in DEM) variance |
| $\omega^2_{b_{CP}}$ | HC(0, 1) | Change point variance |
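For a prior predictive check, the Table A1 priors can be sampled directly. This NumPy sketch makes two assumptions. First, it treats the second argument of N(·,·) as a variance (JAGS parameterizes the normal by its precision, so the authors' code may differ). Second, it reads HC(0, s) as a half-Cauchy distribution with location 0 and scale s, drawn as s times the absolute value of a standard Cauchy variate. The helper name `draw_priors` and the dictionary keys are hypothetical.

```python
import numpy as np


def draw_priors(n: int, seed: int = 0) -> dict[str, np.ndarray]:
    """Draw n samples from the Table A1 priors (hypothetical helper; see assumptions)."""
    rng = np.random.default_rng(seed)

    def half_cauchy(scale: float) -> np.ndarray:
        # |Cauchy(0, scale)| is distributed as half-Cauchy(0, scale).
        return scale * np.abs(rng.standard_cauchy(n))

    return {
        "beta_0": rng.normal(10.0, np.sqrt(100.0), n),   # N(10, 100), 100 read as a variance
        "beta_2": rng.normal(0.0, np.sqrt(100.0), n),    # N(0, 100)
        "beta_CP": rng.uniform(0.0, 20.0, n),            # U(0, 20)
        "theta_T": rng.uniform(0.0, 5.0, n),             # U(0, 5)
        "sigma2_eps": half_cauchy(10.0),                 # HC(0, 10)
        "omega2_b0": half_cauchy(1.0),                   # HC(0, 1)
        "omega2_b2": half_cauchy(1.0),                   # HC(0, 1)
        "omega2_bCP": half_cauchy(1.0),                  # HC(0, 1)
    }


draws = draw_priors(5000)
print({k: round(float(np.median(v)), 2) for k, v in draws.items()})
```

The heavy right tail of the half-Cauchy allows large variance components while still concentrating mass near zero, which is the usual motivation for this choice [22,23].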

Appendix C. MCMC Diagnostics

Figure A1 summarizes the convergence indicators for the estimate $\hat{\beta}_{CP}$ obtained with each of the four models in the three simulation scenarios.
Figure A1. Convergence diagnostics for the $\hat{\beta}_{CP}$ estimate.
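The specific indicators summarized in Figure A1 are not listed here. One standard choice is the potential scale reduction factor $\hat{R}$ [24], which compares between-chain and within-chain variability. The sketch below computes its classic, non-split form for a single parameter such as $\beta_{CP}$ from an array of shape (chains, draws). It illustrates the diagnostic and is not necessarily the procedure used to produce Figure A1. The name `rhat` is hypothetical.

```python
import numpy as np


def rhat(chains: np.ndarray) -> float:
    """Classic Gelman-Rubin potential scale reduction factor.

    `chains` has shape (m, n): m chains with n post-warm-up draws each.
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # Between-chain variance B = n * sample variance of the chain means.
    b = n * chain_means.var(ddof=1)
    # Within-chain variance W = average of the per-chain sample variances.
    w = chains.var(axis=1, ddof=1).mean()
    # Pooled estimate of the marginal posterior variance.
    var_plus = (n - 1) / n * w + b / n
    return float(np.sqrt(var_plus / w))


rng = np.random.default_rng(42)
mixed = rng.normal(size=(4, 2000))            # four well-mixed chains
stuck = mixed + np.arange(4)[:, None]         # chains centred at different values
print(round(rhat(mixed), 3), round(rhat(stuck), 3))
```

Values close to 1 suggest that the chains have mixed. Values well above 1 indicate that more iterations or a reparameterization may be needed.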

References

  1. Karr, J.E.; Graham, R.B.; Hofer, S.M.; Muniz-Terrera, G. When Does Cognitive Decline Begin? A Systematic Review of Change Point Studies on Accelerated Decline in Cognitive and Neurological Outcomes Preceding Mild Cognitive Impairment, Dementia, and Death. Psychol. Aging 2018, 33, 195–218.
  2. Hall, C.; Lipton, R.; Sliwinski, M.; Stewart, W.F. A change point model for estimating the onset of cognitive decline in preclinical Alzheimer’s disease. Stat. Med. 2000, 19, 1555–1566.
  3. Muniz-Terrera, G.; van den Hout, A.; Matthews, F. Random change point models: Investigating cognitive decline in the presence of missing data. J. Appl. Stat. 2011, 38, 705–716.
  4. van den Hout, A.; Muniz-Terrera, G.; Matthews, F. Smooth random change point model. Stat. Med. 2011, 30, 599–610.
  5. Sprague, B.N.; Freed, S.A.; Phillips, C.B.; Ross, L.A. A viewpoint on change point modeling for cognitive aging research: Moving from description to intervention and practice. Ageing Res. Rev. 2020, 58, 101003.
  6. Aminikhanghahi, S.; Cook, D.J. A Survey of Methods for Time Series Change Point Detection. Knowl. Inf. Syst. 2017, 51, 339–367.
  7. Kiuchi, A.S.; Hartigan, J.A.; Holford, T.R.; Rubinstein, P.; Stevens, C.E. Change points in the series of T4 counts prior to AIDS. Biometrics 1995, 51, 236–248.
  8. McLain, A.; Albert, P. Modeling Longitudinal Data with a Random Change Point and No Time-Zero: Applications to Inference and Prediction of the Labor Curve. Biometrics 2014, 70, 1052–1060.
  9. Muggeo, M. Modeling temperature effects on mortality: Multiple segmented relationships with common break points. Biostatistics 2008, 9, 613–620.
  10. Dominicus, A.; Ripatti, S.; Pedersen, N.; Palmgren, J. A random change point model for assessing variability in repeated measures of cognitive function. Stat. Med. 2008, 27, 5786–5798.
  11. van den Hout, A.; Muniz-Terrera, G.; Matthews, F. Change point models for cognitive tests using semi-parametric likelihood. Comput. Stat. Data Anal. 2013, 57, 684–698.
  12. Yu, L.; Boyle, P.; Wilson, R.; Segawa, E.; Leurgans, S.; Jager, P. A random change point model for cognitive decline in Alzheimer’s disease and mild cognitive impairment. Neuroepidemiology 2012, 70, 73–83.
  13. Cohen, P. Applied Data Analytic Techniques for Turning Points Research; Routledge: New York, NY, USA, 2008.
  14. Bacon, D.; Watts, D. Estimating the transition between two intersecting straight lines. Biometrika 1971, 58, 525–534.
  15. Chiu, G.; Lockhart, R.; Routledge, R. Bent-Cable Regression Theory and Applications. J. Am. Stat. Assoc. 2006, 101, 542–553.
  16. Albano, G.; Giorno, V.; Román-Román, P.; Torres-Ruiz, F. Inference on a stochastic two-compartment model in tumor growth. Comput. Stat. Data Anal. 2012, 56, 1723–1736.
  17. Hu, Y.; Treinen, R. A one-step method for modelling longitudinal data with differential equations. Br. J. Math. Stat. Psychol. 2019, 72, 38–60.
  18. Rosenström, T.; Jokela, M.; Hintsanen, M.; Pulkki-Råback, L.; Hutri-Kähönen, N.; Keltikangas-Järvinen, L. Longitudinal course of depressive symptoms in adulthood: Linear stochastic differential equation modeling. Psychol. Med. 2013, 43, 933–944.
  19. Steptoe, A.; Breeze, E.; Banks, J.; Nazroo, J. Cohort profile: The English Longitudinal Study of Ageing. Int. J. Epidemiol. 2013, 42, 1640–1648.
  20. Demidenko, E. Mixed Models: Theory and Applications with R, 2nd ed.; Wiley & Sons: Hoboken, NJ, USA, 2013.
  21. Lee, S.Y. Bayesian Nonlinear Models for Repeated Measurement Data: An Overview, Implementation, and Applications. Mathematics 2022, 10, 898.
  22. Gelman, A. Prior distributions for variance parameters in hierarchical models. Bayesian Anal. 2006, 1, 515–533.
  23. Polson, N.G.; Scott, J.G. On the Half-Cauchy Prior for a Global Scale Parameter. Bayesian Anal. 2012, 7, 887–902.
  24. Gelman, A.; Carlin, J.B.; Stern, H.S.; Dunson, D.B.; Vehtari, A.; Rubin, D.B. Bayesian Data Analysis, 3rd ed.; Chapman & Hall/CRC Press: London, UK, 2013.
  25. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2024; Available online: https://www.R-project.org/ (accessed on 27 July 2025).
  26. Stan Development Team. RStan: The R Interface to Stan. 2020. Available online: https://mc-stan.org/rstan (accessed on 27 July 2025).
  27. Plummer, M. rjags: Bayesian Graphical Models Using MCMC; R package version 4-15; 2023. Available online: https://CRAN.R-project.org/package=rjags (accessed on 27 July 2025).
  28. Llorente, F.; Martino, L.; Delgado, D.; López-Santiago, J. Marginal Likelihood Computation for Model Selection and Hypothesis Testing: An Extensive Review. SIAM Rev. 2023, 65, 3–58.
  29. Gronau, Q.F.; Singmann, H.; Wagenmakers, E.J. Bridgesampling: An R Package for Estimating Normalizing Constants. J. Stat. Softw. 2020, 92, 1–29.
  30. Watanabe, S. Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. J. Mach. Learn. Res. 2010, 11, 3571–3594.
  31. Vehtari, A.; Gelman, A.; Gabry, J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat. Comput. 2017, 27, 1413–1432.
  32. Zheng, F.; Yan, L.; Yang, Z.; Zhong, B.; Xie, W. HbA1c, diabetes and cognitive decline: The English Longitudinal Study of Ageing. Diabetologia 2018, 61, 839–848.
  33. Zaninotto, P.; Batty, G.D.; Allerhand, M.; Deary, I.J. Cognitive function trajectories and their determinants in older people: 8 years of follow-up in the English Longitudinal Study of Ageing. J. Epidemiol. Community Health 2018, 72, 685–694.
  34. McFall, S. Understanding Society: UK Household Longitudinal Study: Cognitive Ability Measures; Institute for Social and Economic Research, University of Essex: Essex, UK, 2013.
Figure 1. Rate function for the DEM.
Figure 2. Change point models.
Figure 3. Graphical model of the hierarchical Bayesian non-linear mixed model. Nodes represent variables of interest (observed = shaded, latent = unshaded), with dependencies represented via the graph structure.
Figure 4. (a) 5000 randomly selected trajectories of total recall. (b) A set of 16 individual trajectories showing that individuals may differ not only in their number of measurements but also in the periods during which they participated in the study.
Figure 5. Posterior distribution of the CP under the four models considered.
Figure 6. (a) Trace plots for $\beta_0$, $\beta_2$, $\beta_{CP}$, and $\beta_T$. (b) Fitted trajectory of total recall according to the DEM.
Table 1. Results from the simulation experiments.
| | Scenario 1 | Scenario 2 | Scenario 3 |
|---|---|---|---|
| DGP | DEM | BSM | DEM |
| CP estimate (95% CrI) | | | |
| BSM | 8.89 (7.96; 9.84) | 9.98 (8.42; 11.56) | 10.06 (9.31; 10.91) |
| BWM | 9.23 (7.84; 16.13) | 11.86 (8.65; 13.38) | 10.49 (9.08; 18.78) |
| BCR | 9.20 (7.97; 15.58) | 10.91 (9.11; 13.82) | 9.99 (9.34; 10.91) |
| DEM | 10.40 (9.68; 11.24) | 10.06 (8.50; 11.67) | 10.56 (9.45; 14.17) |
| CP bias (95% CrI) | | | |
| BSM | −1.11 (−2.04; −0.16) | −0.02 (−1.58; 1.56) | 0.06 (−0.69; 0.91) |
| BWM | −0.77 (−2.16; 6.13) | 1.86 (−1.35; 3.38) | 0.49 (−0.92; 8.78) |
| BCR | −0.80 (−2.03; 5.58) | 0.91 (−0.89; 3.82) | −0.01 (−0.66; 0.91) |
| DEM | 0.40 (−0.32; 1.24) | 0.06 (−1.50; 1.67) | 0.56 (−0.55; 4.17) |
| CP effective coverage (95% CrI) | | | |
| BSM | 0.25 (0.18; 0.31) | 0.92 (0.89; 0.96) | 0.946 (0.92; 0.98) |
| BWM | 0.18 (0.13; 0.23) | 0.79 (0.73; 0.84) | 0.817 (0.76; 0.87) |
| BCR | 0.25 (0.19; 0.31) | 0.90 (0.86; 0.94) | 0.952 (0.92; 0.98) |
| DEM | 0.92 (0.88; 0.96) | 0.95 (0.92; 0.98) | 0.849 (0.80; 0.90) |
| Posterior model probability | | | |
| BSM | <0.001 | 0.25 | 0.09 |
| BWM | <0.001 | 0.19 | 0.11 |
| BCR | <0.001 | 0.26 | 0.07 |
| DEM | >0.99 | 0.31 | 0.73 |
| WAIC (SE) | | | |
| BSM | 1841.83 (30.50) | 1651.82 (29.38) | 1670.53 (30.85) |
| BWM | 1829.44 (30.31) | 1653.00 (29.37) | 1670.10 (30.70) |
| BCR | 1842.73 (30.29) | 1652.69 (29.44) | 1671.69 (30.85) |
| DEM | 1666.15 (26.61) | 1658.81 (29.31) | 1657.94 (30.34) |
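Posterior model probabilities such as those in Table 1 follow from the log marginal likelihoods of the competing models, which can be estimated, for example, by bridge sampling [29]. If the four models are assumed to have equal prior probabilities (an assumption made here for illustration), the conversion reduces to a softmax over the log marginal likelihoods. The input values below are invented for the example and are not results from the paper. The helper name is hypothetical.

```python
import numpy as np


def posterior_model_probs(log_marginal_likelihoods: dict[str, float]) -> dict[str, float]:
    """Posterior model probabilities assuming equal prior model probabilities."""
    names = list(log_marginal_likelihoods)
    log_ml = np.array([log_marginal_likelihoods[k] for k in names])
    # Subtract the maximum before exponentiating to avoid overflow/underflow.
    weights = np.exp(log_ml - log_ml.max())
    probs = weights / weights.sum()
    return dict(zip(names, probs.tolist()))


# Hypothetical log marginal likelihoods (illustrative values only).
example = {"BSM": -915.2, "BWM": -914.8, "BCR": -915.6, "DEM": -913.1}
for model, p in posterior_model_probs(example).items():
    print(f"{model}: {p:.3f}")
```

Subtracting the maximum matters in practice because log marginal likelihoods of longitudinal models are typically large negative numbers that would underflow if exponentiated directly.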
Table 2. Parameter estimates, 95% credible intervals, and model selection indicators.
| | BSM | BWM | BCR | DEM |
|---|---|---|---|---|
| Fixed effects | | | | |
| $\beta_0$ | 11.29 (11.19; 11.40) | 11.28 (10.17; 11.39) | 11.31 (11.20; 11.42) | 11.22 (11.11; 11.32) |
| $\beta_2$ | −0.26 (−0.29; −0.24) | −0.12 (−0.14; −0.11) | −0.26 (−0.28; −0.23) | −0.05 (−0.06; −0.04) |
| $\beta_{CP}$ | 68.61 (67.30; 70.50) | 67.77 (66.14; 69.20) | 68.33 (66.85; 69.94) | 74.28 (72.93; 75.60) |
| $\theta_T$ | – | 3.33 (0.26; 4.95) | 2.47 (0.13; 4.82) | 3.63 (0.69; 4.96) |
| Random effects | | | | |
| $\sigma_\epsilon$ | 2.27 (2.24; 2.30) | 2.27 (2.25; 2.30) | 2.27 (2.25; 2.30) | 2.24 (2.22; 2.27) |
| $\sigma_{\beta_0}$ | 2.12 (2.05; 2.20) | 2.13 (2.05; 2.20) | 2.13 (2.06; 2.20) | 2.16 (2.08; 2.32) |
| $\sigma_{\beta_2}$ | 0.11 (0.09; 0.13) | 0.06 (0.05; 0.07) | 0.11 (0.09; 0.14) | 0.03 (0.02; 0.04) |
| $\sigma_{\beta_{CP}}$ | 6.84 (6.05; 7.80) | 6.68 (5.81; 7.49) | 6.74 (5.86; 7.61) | 8.39 (7.74; 9.09) |
| Model selection | | | | |
| WAIC | 1,233,242.5 | 123,265.3 | 123,223.6 | 122,827.6 |
| PMP | 0.229 | 0.228 | ≤0.001 | 0.542 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Massa, F.F.; Scavino, M.; Muniz-Terrera, G. A Bayesian Non-Linear Mixed-Effects Model for Accurate Detection of the Onset of Cognitive Decline in Longitudinal Aging Studies. Stats 2025, 8, 74. https://doi.org/10.3390/stats8030074

AMA Style

Massa FF, Scavino M, Muniz-Terrera G. A Bayesian Non-Linear Mixed-Effects Model for Accurate Detection of the Onset of Cognitive Decline in Longitudinal Aging Studies. Stats. 2025; 8(3):74. https://doi.org/10.3390/stats8030074

Chicago/Turabian Style

Massa, Franklin Fernando, Marco Scavino, and Graciela Muniz-Terrera. 2025. "A Bayesian Non-Linear Mixed-Effects Model for Accurate Detection of the Onset of Cognitive Decline in Longitudinal Aging Studies" Stats 8, no. 3: 74. https://doi.org/10.3390/stats8030074

APA Style

Massa, F. F., Scavino, M., & Muniz-Terrera, G. (2025). A Bayesian Non-Linear Mixed-Effects Model for Accurate Detection of the Onset of Cognitive Decline in Longitudinal Aging Studies. Stats, 8(3), 74. https://doi.org/10.3390/stats8030074
