Associations of the Short Physical Performance Battery (SPPB) with Adverse Health Outcomes in Older Adults: A 14-Year Follow-Up from the English Longitudinal Study of Ageing (ELSA)

The Short Physical Performance Battery (SPPB) is an objective tool for evaluating three domains (balance, repeated chair stands, and gait speed) of lower extremity physical function in older age. It is unclear how the associations between SPPB scores and health outcomes persist over time. The aim of this 14-year cohort study was to investigate associations between SPPB scores and health outcomes among participants aged 60+ years in the English Longitudinal Study of Ageing (ELSA). The exposures were SPPB scores (total and domain-specific) at baseline (Wave 2). The outcomes were mobility impairments, difficulties in performing basic activities of daily living (ADL) or instrumental activities of daily living (IADL), and falls, measured at seven subsequent timepoints (Waves 3 to 9). The analyses involved linear and logistic multilevel regressions. After adjusting for potential confounders, a one-point increase in the total SPPB score was associated with a 0.13 (95% CI: −0.16, −0.10) decrease in mobility impairment, a 0.06 (−0.08, −0.05) decrease in ADL disabilities, a 0.06 (−0.07, −0.04) decrease in IADL disabilities, and 8% (0.90, 0.95) lower odds of falling (averaged across all follow-ups). Associations between the SPPB domains and health outcomes were more varied. The SPPB may be a useful measure for identifying older adults at a high risk of adverse outcomes.


Introduction
Biological aging in humans is often characterised by a decline in physical function including balance, muscular strength, and walking speed [1]. Good physical function has been suggested to prevent frailty and underpin people's ability to remain physically active and carry out the activities of daily living (ADL) that help maintain independence, as well as prevent falls and hospitalisation [2]. Given the rapid aging of the population and the associated strain on health and social care, much attention in contemporary public health policy and research has focused on the relationship between physical function and health to guide preventive interventions that help keep older adults independent for as long as possible [3,4]. Central to the understanding of this relationship is a measure of physical function that can guide researchers and practitioners on the most appropriate action to prevent adverse health outcomes and declining quality of life [5].
The Short Physical Performance Battery (SPPB) [6] has emerged as one of the leading tools for assessing physical function. The SPPB measures three components of physical function: balance, as measured across three levels of difficulty; lower limb strength, measured as the speed at which an individual can perform five unassisted sit-to-stand movements; and gait speed, measured as a 2.44-metre walk. The original study presenting the validity of the SPPB has been cited over 8600 times as of October 2022, and the instrument is often used as a measure to assess an intervention's effects in trials targeting improved physical function [7,8]. Moreover, the SPPB has been proposed to be an appropriate measure for characterising the frailty status of older adults, distinguishing robust physical function from pre-frailty or frailty [9,10].
In addition to its widespread use as an outcome measure of physical function, previous work has shown that lower scores on the SPPB are associated with a range of adverse outcomes, including falls [11,12], hospitalisation [13,14], long-term care needs [13,15], frailty [16], and all-cause mortality [17]. However, studies with longer follow-up periods providing evidence on how the association between SPPB scores and physical outcomes changes over time are scarce. Likewise, little attention has been paid to the relative contribution of the SPPB subcomponents to protecting against mobility impairments, loss of ADL, and falls.
The aim of the present study was therefore to examine associations of the SPPB and its domains of balance, strength, and gait speed with the key functional outcomes collected over 14 years. We hypothesised that higher total SPPB scores at baseline would be associated with more favourable outcomes over time, as would higher values across each of the domains.

Study Design and Participants
We used data from Waves 2 (2004-2005) to 9 (2018-2019) of the English Longitudinal Study of Ageing (ELSA) [18], a nationally representative biennial survey of English adults aged 50+ years, living in private households. The original sample of respondents was recruited from households participating in the Health Survey for England in 1998, 1999, or 2001. Further information on the cohort profile is available elsewhere [19]. Wave 2 was used as the baseline assessment, as the SPPB was first administered in this wave. We limited the sample to core members aged 60+ years at baseline, to align with the World Health Organization's definition of older age [20]. ELSA received ethical approval from the National Research Ethics Service and all participants provided informed consent. The present study was approved by the Research Ethics Approval Committee for Health (EP 22 048) at the University of Bath. This study has been reported in line with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines (see the checklist in Table S1) [21].

Outcomes
There were four outcome variables assessed at every timepoint: (1) mobility impairments, (2) ADL disabilities, (3) instrumental activities of daily living (IADL) disabilities, and (4) falls. Mobility impairments were operationalised as the sum of reported difficulty with 10 activities (0, no difficulty; 10, difficulty performing all activities), namely walking 100 yards; sitting for about 2 h; getting up from a chair after sitting for long periods; climbing several flights of stairs without resting; climbing one flight of stairs without resting; stooping, kneeling, or crouching; reaching or extending arms above shoulder level; pulling or pushing large objects; lifting or carrying over 10 pounds; and picking up a 5 pence coin from a table. Functional decline was evaluated on the basis of self-reported limitations in basic ADL and IADL. There were six ADL items: dressing, walking across a room, bathing or showering, eating, getting in and out of bed, and using the toilet. The seven IADL items included using a map, preparing a hot meal, shopping for groceries, making telephone calls, taking medications, doing work around the house and garden, and managing money. The responses were summed to construct a scale ranging from 0 to 6 for ADL disabilities, and from 0 to 7 for IADL disabilities, representing the number of activities with reported difficulty. Finally, the participants were asked whether they had fallen in the previous 2 years (no versus yes).
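The count and binary versions of these outcomes can be illustrated with a minimal sketch. The item keys below are hypothetical labels chosen for illustration, not ELSA variable names, and the handling of missing responses (raising an error) is an assumption rather than the procedure used in the analyses.

```python
from typing import Mapping, Sequence

# Hypothetical item keys for the six ADL items described in the text.
ADL_ITEMS = (
    "dressing",
    "walking_across_room",
    "bathing_or_showering",
    "eating",
    "getting_in_out_of_bed",
    "using_toilet",
)


def count_difficulties(responses: Mapping[str, bool], items: Sequence[str]) -> int:
    """Return the number of listed items on which difficulty was reported."""
    missing = [item for item in items if item not in responses]
    if missing:
        # Assumption: an incomplete set of responses is not scored.
        raise ValueError(f"missing responses for: {missing}")
    return sum(1 for item in items if responses[item])


def any_difficulty(score: int) -> int:
    """Binary outcome: 1 if at least one difficulty was reported, else 0."""
    return int(score >= 1)


# Illustrative respondent reporting difficulty with two ADL items.
respondent = {item: False for item in ADL_ITEMS}
respondent["dressing"] = True
respondent["bathing_or_showering"] = True
adl_score = count_difficulties(respondent, ADL_ITEMS)
adl_binary = any_difficulty(adl_score)
```

The same helper would apply to the 10 mobility items and the seven IADL items by passing a different item list.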

Exposures
The SPPB is composed of three measures of physical performance: standing balance, repeated chair stands, and gait speed [6]. Balance was evaluated using three hierarchical tasks involving side-by-side, semi-tandem, and full-tandem stands. For the repeated chair stand test, participants were timed as they completed five sit-to-stand repetitions. Gait speed was assessed by timing the participants as they walked 2.44 metres at their regular pace. Each SPPB domain was scored from 0 (worst) to 4 (best) according to established cut-off points and summed to generate a total score from 0 to 12 [6]. A detailed explanation of the scoring methods is provided in the Supplementary Materials (Figures S1-S3). We used the baseline SPPB scores (total and domain-specific) as the exposures.
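The summation of the domain scores can be sketched as follows. This is an illustrative helper only: the function name is hypothetical, and the cut-off points that convert test times into domain scores are deliberately not reproduced here (they are given in Figures S1-S3), so the function assumes the three domain scores have already been derived.

```python
def sppb_total(balance: int, chair_stand: int, gait: int) -> int:
    """Sum three domain scores (each 0 to 4) into a total score (0 to 12)."""
    for name, score in (
        ("balance", balance),
        ("chair_stand", chair_stand),
        ("gait", gait),
    ):
        if not isinstance(score, int) or not 0 <= score <= 4:
            raise ValueError(f"{name} score must be an integer from 0 to 4")
    return balance + chair_stand + gait


# Illustrative participant: full balance score, moderate strength, good gait.
example_total = sppb_total(balance=4, chair_stand=2, gait=3)
```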

Covariates
Potential confounders were selected in line with existing studies [13,22]. All measures were retrieved at baseline. Sociodemographic covariates included age (a continuous variable, collapsed to 90 for participants aged 90+ years), biological sex, ethnicity (dichotomised in ELSA as White versus non-White), marital status (single, separated/divorced/widowed, and married), employment status (not employed versus working full- or part-time), education (no formal qualifications, school qualifications, at least some higher education), and total non-pension wealth (quintiles) at the benefit unit level.
Health-related covariates comprised physical activity, body mass index (BMI), cognitive function, and depressive symptoms. The respondents reported how frequently they participated in sports or activities that were vigorous, moderately energetic, and mildly energetic (more than once a week, once a week, 1 to 3 times a month, hardly ever or never). A four-category variable was created for physical activity, in line with previous work [23]: inactive (no weekly activity), only mild activity at least once per week, moderate but no vigorous activity at least once per week, and vigorous activity at least once per week. BMI was a continuous variable, calculated as weight (kg) divided by height squared (m²). Cognitive function was assessed by aggregating information from neuropsychiatric batteries testing time orientation, immediate and delayed recall, prospective memory, verbal fluency, and executive function (processing speed and efficiency) [24,25]. As the scoring of each test varied, we computed z-scores for the individual tests. The sum of the individual domains' z-scores was then standardised to obtain a z-score for global cognitive function [26]. Depressive symptoms were assessed using the 8-item Centre for Epidemiologic Studies Depression Scale (CES-D) [27].
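The derivation of BMI and the global cognition z-score can be sketched as below. The function names and the test scores are hypothetical, and the use of the sample standard deviation is an assumption; the text does not state which reference mean and standard deviation were used for standardisation.

```python
from statistics import mean, stdev
from typing import List, Mapping, Sequence


def z_scores(values: Sequence[float]) -> List[float]:
    """Standardise values to mean 0 and (sample) standard deviation 1."""
    centre = mean(values)
    spread = stdev(values)  # Assumption: sample SD within the analysed sample.
    if spread == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(value - centre) / spread for value in values]


def global_cognition_z(tests: Mapping[str, Sequence[float]]) -> List[float]:
    """Sum per-test z-scores for each participant, then standardise the sum."""
    per_test = [z_scores(scores) for scores in tests.values()]
    summed = [sum(participant) for participant in zip(*per_test)]
    return z_scores(summed)


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height squared (m^2)."""
    return weight_kg / height_m ** 2


# Hypothetical raw scores for four participants on two tests.
raw_tests = {
    "delayed_recall": [3.0, 5.0, 7.0, 9.0],
    "verbal_fluency": [10.0, 20.0, 15.0, 25.0],
}
cognition = global_cognition_z(raw_tests)
example_bmi = bmi(weight_kg=70.0, height_m=1.75)
```

Standardising each test before summing gives tests with different scoring ranges equal weight in the global score.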

Statistical Analysis
We defined our complete-case samples as participants with data on the SPPB exposures (total score), covariates, and outcomes at baseline and at least one follow-up wave. Descriptive statistics were calculated as the mean (standard deviation (SD)) for continuous or count variables, and the frequency (percentage) for categorical variables. We compared the unweighted baseline characteristics of the participants with complete and missing data, using independent t-tests and Pearson's chi-squared (χ²) tests.
In our primary analyses, we used linear (mobility impairments, ADL disabilities, IADL disabilities) and logistic (falls) multilevel models to examine the associations between total SPPB scores (Wave 2; continuous exposure) and subsequent outcomes (Waves 3 to 9). The models were built in 10 stages (Table S2) [26]. The analyses were repeated to explore the associations between the SPPB domain scores (entered as categorical exposures in separate models and then simultaneously in mutually adjusted models) and outcomes previously described (see Table S3 for the stages of the models). Binary variables (no difficulties versus difficulties performing at least one activity) for mobility impairment, ADL disabilities, and IADL disabilities were used as secondary outcomes in the multilevel logistic models.
ELSA has a hierarchical structure, with repeated measures nested within persons. Therefore, we included a random intercept at the individual level and a random slope according to time (defined by the wave of follow-up) in the linear multilevel models. Due to convergence issues encountered when running the random intercept and (random) slope models, random intercept (fixed slope) models were performed for all binary outcomes. The baseline cross-sectional sampling weight was applied to all models to improve population representativeness.
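For orientation, the two model structures described above can be written out in general form. The notation is an assumption introduced here for illustration: y_{ti} is the outcome for person i at follow-up occasion t, SPPB_i is the baseline exposure, x_i is the vector of baseline covariates, and the interaction terms added in later model stages (Tables S2 and S3) are omitted.

```latex
% Linear multilevel model: random intercept and random slope for time
y_{ti} = \beta_0 + \beta_1\,\mathrm{SPPB}_i + \beta_2\,t_{ti}
       + \boldsymbol{\beta}_3^{\top}\mathbf{x}_i
       + u_{0i} + u_{1i}\,t_{ti} + \varepsilon_{ti},
\qquad
\begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix}
\sim \mathcal{N}\!\left(\mathbf{0}, \boldsymbol{\Sigma}_u\right),
\quad
\varepsilon_{ti} \sim \mathcal{N}\!\left(0, \sigma^2_{\varepsilon}\right)

% Logistic multilevel model: random intercept, fixed slope for time
\operatorname{logit}\,\Pr\!\left(y_{ti} = 1\right)
  = \beta_0 + \beta_1\,\mathrm{SPPB}_i + \beta_2\,t_{ti}
  + \boldsymbol{\beta}_3^{\top}\mathbf{x}_i + u_{0i},
\qquad
u_{0i} \sim \mathcal{N}\!\left(0, \sigma^2_{u_0}\right)
```

In the linear specification, β_1 is the mean difference in the outcome per one-point increase in the SPPB score; in the logistic specification, exp(β_1) is the corresponding odds ratio.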
As a sensitivity analysis, multiple imputation with chained equations was conducted to replace missing data on covariates and outcomes, using all exposures (total SPPB score, balance score, repeated chair stand test score, and gait score), outcomes, and covariates as predictors, as well as several baseline auxiliary variables including self-reported general health (1: poor, 5: excellent), occupational class (using the three-class National Statistics Socio-Economic Classification), smoking status (never a smoker, former smoker, current smoker), alcohol consumption (less than once a week, one to four times per week, five or more times per week), living status (living alone versus not living alone), and the presence of any limiting long-standing illness (no versus yes). Predictive mean matching was used to impute the data on mobility impairment, ADL disabilities, IADL disabilities, BMI, and depressive symptoms, according to the "just another variable" approach [28]. Following the recommendations for imputing derived variables (e.g., BMI) [29], height and weight were incorporated into the imputation model. Given the discrepancies in the missing data across cognitive tests, the z-scores for the individual domains were imputed using a linear regression model [30]. The z-score of global cognitive function was imputed passively [30]. We included interactions between total SPPB score and age, and between total SPPB score and biological sex to ensure congruence with the primary analytical models [28,30]. The imputation model was weighted using the cross-sectional sampling weight from Wave 2. The patterns of missing data are shown in the Supplementary Materials (Tables S4 and S5). The data were assumed to be missing at random, and 25 datasets were imputed and combined for analyses using Rubin's rules.
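The pooling step can be illustrated with a minimal sketch of Rubin's rules for a single coefficient. The function name and the input values are hypothetical and are not results from this study; in practice, statistical software applies this pooling automatically, and degrees-of-freedom adjustments for confidence intervals are omitted here.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(
    estimates: Sequence[float], std_errors: Sequence[float]
) -> Tuple[float, float]:
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching estimates and standard errors, m >= 2")
    pooled = mean(estimates)
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)  # variance of estimates across imputations
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical coefficients and standard errors from three imputed datasets.
pooled_estimate, pooled_se = pool_rubin(
    estimates=[-0.10, -0.14, -0.12],
    std_errors=[0.02, 0.02, 0.02],
)
```

The pooled standard error exceeds the within-imputation standard error because it also reflects the uncertainty due to the missing data.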
In a further set of sensitivity analyses, we planned to analyse the three count outcomes (i.e., mobility impairment, ADL disabilities, and IADL disabilities) using multilevel mixed-effects negative binomial regressions. However, because of convergence issues, these models are not presented.

Descriptive Statistics
Of 6183 eligible participants aged 60+ years at baseline, our complete-case samples for models with mobility impairment, ADL or IADL disabilities, and falls as outcomes included 3548, 3547, and 3505 participants, respectively (Table 1, Figure S4). Descriptive statistics summarising the outcome variables at each of the follow-up waves are presented in Table S6. Compared with the complete-case samples, participants with missing data were older on average, and a higher proportion were female, of non-White ethnic origin, unmarried, and of lower socioeconomic status (Table S7). Moreover, the excluded samples had lower cognitive function scores, SPPB scores (total and domain-specific), and BMI values; they also reported more depressive symptoms, mobility impairments, ADL disabilities, and IADL disabilities. In addition, a lower proportion of participants in the analytical samples were physically inactive and had experienced a fall.  Table 1), these were treated as distinct outcomes in the analyses.
There was a slightly greater decrease (interaction term: −0.004 (95% CI: −0.006, −0.002)) in IADL disabilities per one-point increase in the total SPPB score for older participants in our sample (Model 7). However, there was no evidence that the associations between the total SPPB score, and mobility impairment, ADL disabilities, or falls were modified by age (all p > 0.05). For each one-point increase in the total SPPB score, the decrease in mobility impairment (interaction term: 0.06 (95% CI: 0.02, 0.11)) and the odds of falling (interaction term: 1.07 (95% CI: 1.02, 1.12)) was smaller among women than men, on average across all follow-ups (Model 8).
Table 2. Mean changes in mobility impairment, ADL disabilities, IADL disabilities, and the odds ratios of falls per one-point increase in the total SPPB score, using repeated measures outcomes from seven waves of follow-up.
ADL, activities of daily living; IADL, instrumental activities of daily living; SPPB, Short Physical Performance Battery; n, number of participants; CI, confidence interval; OR, odds ratio.
1 The quadratic time variable was subsequently excluded from the models, as there was no evidence of non-linearity.
2 Age, biological sex, ethnicity, marital status, employment status, education, and wealth.
3 Physical activity, body mass index, cognitive function, and depressive symptoms.
Note: All values are weighted estimates.
on the association between the total SPPB score and mobility impairment (p = 0.013 for the interaction between the total SPPB score and quadratic time). These interactions were statistically significant in the mutually adjusted models (Model 11; Figures 1 and S5). No statistically significant interactions emerged in the multilevel logistic models with mobility impairment or IADL disabilities as outcomes. However, in addition to a statistically significant interaction with linear time (p = 0.005), the decrease in the odds of reporting one or more ADL disabilities for each one-point increase in the total SPPB score was smaller among older respondents (interaction term: 1.007 (95% CI: 1.003, 1.012); Model 11) in the multilevel logistic model. The results of multiple imputation were comparable with the primary analyses. An exception was the interaction between the total SPPB score and time, which was no longer statistically significant (p = 0.079) in the model with mobility impairment as the outcome (Model 9). Moreover, there was evidence of an interaction between the total SPPB score and quadratic time (0.005 (95% CI: 0.003, 0.008)) on IADL disabilities (Models 10 and 11).

Associations of the SPPB Domains with Adverse Health Outcomes
In the unadjusted models (Model 1; Table 3), balance scores of two or more were associated with significantly fewer mobility impairments and IADL disabilities (all p ≤ 0.01), relative to individuals with zero points (the reference class). There was a clear gradient in ADL disabilities from the highest to the lowest balance score (all p ≤ 0.001). Scoring three (odds ratio (OR) = 0.60, p = 0.042) or four (OR = 0.33, p < 0.001) points on the balance test was associated with significantly lower odds of falling. Gait scores of two or more were associated with fewer mobility impairments and ADL disabilities, and lower odds of falling (all p < 0.05); participants scoring three or four points on the gait test reported fewer IADL disabilities (both p < 0.001).
In the fully adjusted models (Model 6), balance was not associated with mobility impairment. Only participants scoring four points on the balance test had significantly fewer IADL disabilities (−0.36 (95% CI: −0.65, −0.06)) and reduced odds of falling (0.58 (95% CI: 0.37, 0.89)). Balance scores of two or more were associated with fewer ADL disabilities (all p < 0.01). Three or four points on the gait test were associated with fewer mobility impairments (both p < 0.05), and four points were associated with fewer (−0.42 (95% CI: −0.68, −0.17)) IADL disabilities. Higher gait scores were associated with significantly fewer ADL disabilities and decreased odds of falling (all p < 0.05). Sit-to-stand performance was associated with all outcomes in the expected directions in the unadjusted and fully adjusted models (all p < 0.01).
When the SPPB domains controlled for one another, the results were largely maintained (Table S10). However, in the unadjusted models, only a balance score of four was associated with significantly fewer IADL disabilities (p = 0.007). Balance was not associated with mobility impairment or falls. Furthermore, only participants with gait scores of three or four had significantly fewer mobility impairments and lower odds of falling (all p ≤ 0.001). In the fully adjusted models, balance and sit-to-stand scores were no longer associated with IADL disabilities or falls. A gait score of four was associated with a 0.70 (95% CI: −1.07, −0.33) decrease in mobility impairments and a 0.33 (95% CI: −0.59, −0.07) decrease in IADL disabilities. Participants scoring one, three, or four points on the gait test reported fewer ADL disabilities, and those scoring three or four points showed reduced odds of falling, relative to the reference groups (all p < 0.05). These patterns of the findings were broadly similar when using binary outcomes (Tables S11 and S12) and multiple imputation (Tables S13 and S14), although the statistical significance levels varied.

Discussion
In this study, we provide evidence that the SPPB, as a global measure of physical function, is consistently associated with favourable outcomes relating to robust physical health, including the prevention of mobility impairment, ADL disabilities, and IADL disabilities, as well as lower odds of falling. These associations remained across seven waves of follow-up spanning 14 years, even after adjusting for demographic and health-related covariates. The domain-specific results were more varied, whereby only lower extremity muscle strength, as measured by repeated chair stands, was consistently independently associated with favourable outcomes, although the statistical significance levels varied between the unadjusted and fully adjusted models. For balance and gait speed, only scores at the highest end of the respective scales were associated with better outcomes relative to the lowest (most unfavourable) possible score.
The present study, importantly, provides evidence for the discriminant validity of the SPPB, that is, its ability to distinguish between people with high and low function, in line with their differing functional capabilities. The finding that the total SPPB score was associated with favourable outcomes is consistent with other studies that have explored associations between physical function and a range of physical health outcomes [11-14,16,31], including difficulty performing ADL and IADL [13,14]. The results, however, contrast with those of a recent prospective study in Sweden, which showed that the SPPB score was not associated with falls in 202 older adults aged 75 years or older [32]. These discrepancies might be explained by the smaller sample size, the shorter 1-year follow-up, or the high mean (SD) baseline SPPB score of 10.7 (1.4) and the consequent limited variability in the Swedish study for detecting meaningful differences.
An interesting finding of the present study relates to the incongruent results amongst the subcomponents of the SPPB tool. We hypothesised that each domain of balance, strength, and gait speed would be independently associated with the outcomes of interest. However, this was not entirely the case. Differences in the performance of the subcomponents may be explained by the way they assess the respective aspects of physical function. The measure of strength, which asks the participant to perform five sit-to-stand repetitions as fast as they can, is the only one of the three independent assessments to measure maximal capability. The balance task, by contrast, has three levels of difficulty but the test ceases if a participant holds each position for 10 s, with individuals being assigned maximum points for achieving all three. Indeed, a top score of 4 was by far the most frequent score amongst the analysed cohort, accounting for over 78% of participants. Similarly, the gait speed test, which asks participants to walk at their normal, rather than top, speed also does not require maximal effort. The frequency of achieving a top score for gait speed was 68% in the current sample, compared with 40% for the chair rise test.
These findings suggest that the SPPB, when used as intended as a composite measure of physical function, is a useful tool for identifying individuals who might be more prone to falling or losing independence. Consistent with previous work, the analysis of this large dataset demonstrated that higher overall function scores may have a protective influence on physical health, both in the short and long term [17]. The strength component of the SPPB appears to be useful as an independent assessment of that domain. However, for independent assessments of balance or gait speed, researchers may look to other more sensitive measures to discriminate older adults' capabilities in these components of physical function. Such examples might be the Community Balance and Mobility Scale, which has been shown to overcome issues with ceiling effects pertaining to the SPPB [33], or a fast, rather than normal, gait speed test [34]. Nonetheless, our results suggest the SPPB can be used as a measure for helping clinical practitioners or researchers identify individuals who would benefit from interventions targeting improved physical function.
A strength of the present work is the use of data from a large longitudinal cohort study with long-term follow-up that allows an assessment of the associations between physical function and physical health over time. This study also benefitted from a rigorous analysis of the SPPB and its domains using statistical models assembled in several stages. Still, the analysis may be limited somewhat by the homogeneous sample in terms of ethnicity (although it was diverse across certain markers of socioeconomic position), self-reported outcomes that may be prone to bias, and survival bias within the study population as reflected by the observed attrition across timepoints, which, given the age of the population and the length of follow-up, was not unexpected. Moreover, while zero-inflated negative binomial models have been recommended for analysing count outcomes (e.g., mobility impairment, ADL disabilities, IADL disabilities) with excess zeros, these models were not considered in the present work due to software constraints precluding the specification of a multilevel framework [35]. Nevertheless, these models are likely to produce less biased estimates than linear or logistic regression models, and may be better suited for the investigation of exposures associated with count outcome measures, such as mobility impairment, ADL disabilities, or IADL disabilities [35].

Conclusions
Overall, this study demonstrated that the SPPB was associated with adverse future physical health outcomes in a sample of older adults in England. As hypothesised, we observed that higher overall physical function, as measured by the total SPPB score, was associated with the maintenance of mobility, retention of the ability to complete ADL and IADL, and lower odds of falling. While the SPPB-measured chair rise test appears to be a useful measure of lower extremity strength, for isolated assessments of balance or gait speed, researchers and practitioners may be advised to seek more robust and sensitive measures.
Supplementary Materials: The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/ijerph192316319/s1. Table S1: STROBE checklist. Figure S1: Flowchart summarising the balance test's scoring system. Figure S2: Flowchart summarising the repeated chair stand test's scoring system. Figure S3: Flowchart summarising the gait test's scoring system. Table S2: Specifications for multilevel models with the total SPPB score as the exposure. Table S3: Specifications for multilevel models with SPPB domain scores (balance, repeated chair stands, and gait) as the exposures. Table S4: Patterns of missing data prior to multiple imputation. Table S5: Number of observations imputed per dataset. Figure S4: Flow of study members into the complete-case analytical samples. Table S6: Descriptive statistics for mobility impairment, ADL disabilities, IADL disabilities, and falls at each wave of follow-up. Table S7: Baseline characteristics of the complete-case samples used for the analyses versus the samples excluded because of missing data (unweighted). Table S8: Odds ratios for mobility impairment, ADL disabilities, and IADL disabilities per one-point increase in the total SPPB score, using the repeated measures outcomes from seven waves of follow-up. Table S9: Mean changes in mobility impairment, ADL disabilities, and IADL disabilities, and odds ratios for falls per one-point increase in the total SPPB score, using the repeated measures outcomes from seven waves of follow-up (imputed samples). Figure S5: Marginal effects of biological sex on mobility impairment (a) and falls (b) at representative values of the total SPPB score at baseline, and simple slopes for the relationship between total SPPB score at baseline and IADL disabilities at different ages (c).
Table S10: Results of the multilevel model for the SPPB domain scores at baseline (balance, repeated chair stands, and gait) on mobility impairment, ADL disabilities, IADL disabilities, and falls over seven waves of follow-up: all exposures entered simultaneously. Table S11: Results of the multilevel logistic regression model for the SPPB domain scores at baseline (balance, repeated chair stands, and gait) on mobility impairment, ADL disabilities, and IADL disabilities over seven waves of follow-up: each exposure entered in separate models. Table S12: Results of the multilevel logistic regression model for the SPPB domain scores at baseline (balance, repeated chair stands, and gait) on mobility impairment, ADL disabilities, and IADL