# Estimating Prevalence of Coronary Heart Disease for Small Areas Using Collateral Indicators of Morbidity

## Abstract


## 1. Background

## 2. Modelling Latent Morbidity at the Lower Spatial Scale

Let $j = 1, \ldots, N_L$ denote the set of lower level small areas within a particular region, and let $i = 1, \ldots, N_H$ denote the set of aggregated higher level areas (e.g., local health authorities) within which the small areas are nested. The available data contain P observed indicators $y_j = (y_{j1}, \ldots, y_{jP})$ at the small area scale (such as small area death totals), and counts $Z_i = (Z_{i1}, \ldots, Z_{iQ})$ (e.g., disease prevalence totals) observed only at the aggregated area scale. However, one aim of the modelling process is to develop estimates $z_j = (z_{j1}, \ldots, z_{jQ})$ of these indicators at a small area scale.

… $(f_1, \ldots, f_R)$, where R is typically much smaller than the total number P + Q of observed indicators. For simplicity, a univariate common factor $f = (f_1, \ldots, f_{N_L})$ is considered (i.e., R = 1). In the parlance of factor analysis, the observed indicators are proxies for, or "measures of", the underlying latent factor.

… $y_{jp}$ ($j = 1, \ldots, N_L$; $p = 1, \ldots, P$) and the latent factor. In population health applications, the indicators are typically discrete counts (e.g., deaths, hospital admissions), assumed either Poisson or binomial, so that a generalized linear mixed model is appropriate for the measurement equations. In the application here, mortality or admission is infrequent in relation to the population at risk, and Poisson sampling is relevant. Expected mortality or admission counts $O_{jp}$ are obtained by applying a standard age-sex schedule (for the entire region, providing an internal standard, or for the nation, providing an external standard) to small area populations at risk. Then one has

… where $\rho_{jp}$ is the relative risk of outcome p in small area j. In the present application, the expectations $O_{jp}$ are scaled so that their total over all small areas equals the total of observed counts, namely $\sum_j y_{jp} = \sum_j O_{jp}$, so that the region-wide average relative risk $\rho_p$ for indicator p is 1 if an internal standard is used.
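The rescaling of expected counts can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's code: `expected_counts` is a hypothetical helper, populations are assumed to be arranged by age-sex stratum, and all numbers are made up.

```python
import numpy as np

def expected_counts(pop, std_rates, observed):
    """Indirectly standardised expected counts, rescaled to the observed total.

    pop:       (N_L, S) small area populations by age-sex stratum
    std_rates: (S,) standard rates per stratum (regional or national schedule)
    observed:  (N_L,) observed counts y_jp for one indicator p
    """
    raw = pop @ std_rates                    # apply the standard schedule to each area
    scale = observed.sum() / raw.sum()       # force sum_j O_jp == sum_j y_jp
    return raw * scale

# Hypothetical example: three small areas, two age-sex strata
pop = np.array([[1000.0, 500.0],
                [2000.0, 800.0],
                [1500.0, 700.0]])
rates = np.array([0.002, 0.010])
y = np.array([9, 20, 14])

O = expected_counts(pop, rates, y)
crude_rr = y / O                             # area-level ratios y_jp / O_jp
regional_rr = y.sum() / O.sum()              # region-wide average relative risk
```

After rescaling, `regional_rr` equals 1 up to floating-point error, which is the property the text relies on.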

… also control for structural influences unrelated to population morbidity per se (e.g., effectiveness of health care services, hospital configuration). Intercepts are not included in (2), thereby providing a form of location constraint on the latent variable f [8]. The coefficients $\lambda_p$ are known as loadings, the specification of which is considered below.
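To make the measurement equations concrete, the sketch below simulates Poisson indicators whose log relative risks are loadings times the latent factor, with no intercept. It assumes a log link with $\log \rho_{jp} = \lambda_p f_j$ and omits any additional indicator-specific terms that (2) may contain. `simulate_indicators` is a hypothetical name, the expected counts and factor values are invented, and the loadings are the Model 1 posterior means from the parameter estimates table, used purely as illustrative values.

```python
import numpy as np

def simulate_indicators(O, lam, f, rng):
    """Draw y_jp ~ Poisson(O_jp * rho_jp) with log(rho_jp) = lam_p * f_j.

    O:   (N_L, P) expected counts
    lam: (P,) loadings (no intercepts, as in the text)
    f:   (N_L,) latent morbidity index
    """
    rho = np.exp(np.outer(f, lam))           # (N_L, P) relative risks
    y = rng.poisson(O * rho)                 # independent Poisson draws
    return y, rho

rng = np.random.default_rng(0)
O = np.array([[12.0, 10.0, 40.0, 35.0],
              [8.0, 7.0, 30.0, 28.0],
              [15.0, 14.0, 55.0, 50.0]])      # hypothetical expected counts
lam = np.array([0.450, 0.410, 0.720, 0.769])  # illustrative loadings
f = np.array([-0.5, 0.0, 0.8])                # hypothetical latent values
y, rho = simulate_indicators(O, lam, f, rng)
```

An area with $f_j = 0$ has relative risk 1 on every indicator; without intercepts, shifting all $f_j$ by a constant would change every $\rho_{jp}$, which is why omitting intercepts pins down the location of f.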

… $x_j = (x_{j1}, \ldots, x_{jL})'$ (such as small area socio-economic or population risk variables) of the latent morbidity index. These influence the latent morbidity index $f_j$ via regression terms

If the $x_{jl}$ are standardised, the absolute size of the β coefficients measures the relative importance of different population risk factors or socio-economic variables in defining the morbidity index.
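Standardising the predictors and ranking them by the absolute size of their coefficients can be sketched with NumPy. The predictor values and the helper name `standardise` are hypothetical; the coefficients are the Model 1 posterior means for $\beta_1$ to $\beta_3$ from the parameter estimates table, used only to illustrate the ranking.

```python
import numpy as np

def standardise(X):
    """Centre each column to mean 0 and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical ward-level predictors: income, proportion south Asian, smoking prevalence
X_raw = np.array([[5.3, 0.10, 0.28],
                  [7.6, 0.05, 0.22],
                  [8.4, 0.02, 0.18],
                  [4.8, 0.30, 0.31]])
Xs = standardise(X_raw)
beta = np.array([-0.199, 0.180, 0.050])  # illustrative coefficients
mu = Xs @ beta                           # regression term x_j' beta for each area
order = np.argsort(-np.abs(beta))        # predictor indices, most important first
```

Because every column of `Xs` has unit spread, a one-unit change means one standard deviation for each predictor, so the magnitudes in `beta` are directly comparable.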

… $w_{jh}$. Also let $f_{[j]} = (f_1, \ldots, f_{j-1}, f_{j+1}, \ldots, f_{N_L})$ denote the collection of morbidity effects for all areas but area j. Under the scheme of Leroux et al. [10], though adapted here to include regression effects, as in (3.1)–(3.2), the expected value of the latent effect in area j and its variance are

The $w_{jh}$ may incorporate factors such as distances between areas j and h. However, in many applications the $w_{jh}$ simply represent adjacency, namely $w_{jh} = w_{hj} = 1$ if areas h and j are adjacent, and zero otherwise. In this case it is relevant to define the neighbourhood $\partial_j$ of small area j, which contains the $m_j$ areas adjacent to area j, and one then has $\sum_{h \ne j} w_{jh} = m_j$. The expectations are then
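As a hedged illustration of the Leroux et al. scheme, the sketch below uses its standard conditional form with a regression mean $\mu_j = x_j'\beta$ (how exactly the regression enters is an assumption here): $E(f_j \mid f_{[j]}) = \mu_j + \kappa \sum_h w_{jh}(f_h - \mu_h) / (1 - \kappa + \kappa \sum_h w_{jh})$, with variance $\sigma_f^2 / (1 - \kappa + \kappa \sum_h w_{jh})$. With adjacency weights the denominator is $1 - \kappa + \kappa m_j$. The function name `leroux_conditional` and the four-area chain are hypothetical.

```python
import numpy as np

def leroux_conditional(f, mu, W, kappa, sigma2, j):
    """Conditional mean and variance of f_j given all other f_h (Leroux-type prior).

    Corresponds to f ~ N(mu, sigma2 * inv(Q)) with
    Q = kappa * (diag(W.sum(1)) - W) + (1 - kappa) * I.
    W must be symmetric with a zero diagonal.
    """
    w = W[j]
    denom = 1.0 - kappa + kappa * w.sum()          # 1 - kappa + kappa * m_j for adjacency
    mean = mu[j] + kappa * np.dot(w, f - mu) / denom
    var = sigma2 / denom
    return mean, var

# Four areas in a chain 1-2-3-4, with 0/1 adjacency weights
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
f = np.array([0.3, -0.1, 0.5, 0.2])
mu = np.zeros(4)                                   # e.g. x_j' beta; zero here for simplicity
m1, v1 = leroux_conditional(f, mu, W, kappa=0.9, sigma2=1.0, j=1)
```

Setting κ = 0 gives independent effects, while κ close to 1 approaches the intrinsic CAR prior; the posterior means of κ near 0.94 in the parameter estimates table sit close to that end.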

… $\lambda_p$, or on the variance $\sigma_f^2$ in (4.2). One option uses standardized factors, with $\sigma_f^2 = 1$, as in the spatial factor model of Wang and Wall [11], with all loadings then unknown. An alternative constraint involves appropriately fixed loadings, such as setting one of the loadings $\lambda_p$ to a fixed value, usually 1. The variance $\sigma_f^2$ is then an unknown parameter.
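Why one of these constraints is needed can be shown numerically: rescaling the factor by any constant and the loadings by its inverse leaves every product $\lambda_p f_j$, and hence the likelihood, unchanged. The sketch below applies each of the two constraints; the helper names and all values are hypothetical.

```python
import numpy as np

def fix_reference_loading(lam, f, sigma_f, ref):
    """Rescale so lam[ref] == 1; the products lam_p * f_j are unchanged."""
    c = lam[ref]
    return lam / c, f * c, sigma_f * abs(c)

def unit_variance_factor(lam, f, sigma_f):
    """Rescale so the factor has sigma_f == 1; the loadings absorb the scale."""
    return lam * sigma_f, f / sigma_f, 1.0

lam = np.array([0.45, 0.41, 0.72, 0.77, 1.30])   # hypothetical loadings
f = np.array([-0.4, 0.1, 0.6])                   # hypothetical factor values
sigma_f = 0.8

lam_a, f_a, s_a = fix_reference_loading(lam, f, sigma_f, ref=4)
lam_b, f_b, s_b = unit_variance_factor(lam, f, sigma_f)
```

Both parameterisations give identical linear predictors, so the data alone cannot choose between them; fixing one loading leaves $\sigma_f^2$ free, and fixing $\sigma_f^2 = 1$ leaves all loadings free.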

## 3. Methods: Estimating Prevalence at Small Area Level based on the Morbidity Index

… $(z_{j1}, \ldots, z_{jQ})$ (e.g., prevalence totals) for small areas $j = 1, \ldots, N_L$. Estimation of the missing lower area scale data takes account of (a) the values of the small area morbidity index $f = (f_1, \ldots, f_{N_L})$, and (b) the known prevalence totals $(Z_{i1}, \ldots, Z_{iQ})$ for the $i = 1, \ldots, N_H$ higher level areas. The small areas are nested within the higher level areas, with $H_j \in \{1, \ldots, N_H\}$ denoting the higher level area to which small area j belongs, and the region is defined equivalently by all the higher level areas or all the lower level areas.

… $E_{jq}$ of the small area counts $z_{jq}$. This involves using an external schedule of prevalence rates $r_{qsk}$ for the $q^{th}$ outcome by age k and sex s, and applying this schedule to small area population estimates $P_{jsk}$, so that $E_{jq} = \sum_s \sum_k P_{jsk} r_{qsk}$.
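The double sum over sex and age maps directly onto `numpy.einsum`. The sketch below computes $E_{jq}$ for a hypothetical three-area, two-sex, three-age-band example with a single outcome, and then shows an optional rescaling to a known regional total (the kind of scaling used later for London); `expected_prevalence` is a hypothetical helper and all numbers are invented.

```python
import numpy as np

def expected_prevalence(P, r):
    """E[j, q] = sum over sex s and age k of P[j, s, k] * r[q, s, k]."""
    return np.einsum("jsk,qsk->jq", P, r)

# Hypothetical: 3 small areas, 2 sexes, 3 age bands, 1 outcome (Q = 1)
P = np.array([[[400, 300, 100], [420, 310, 130]],
              [[250, 200,  90], [260, 220, 120]],
              [[600, 350, 150], [610, 380, 190]]], dtype=float)
r = np.array([[[0.01, 0.05, 0.15],            # male rates by age band
               [0.005, 0.03, 0.10]]])         # female rates by age band
E = expected_prevalence(P, r)                 # shape (3, 1)

# Optional rescaling so the regional total of E matches a known observed total
Z_total = 150.0                               # hypothetical
E_scaled = E * Z_total / E.sum()
```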

To ensure that the small area estimates $z_j = (z_{j1}, \ldots, z_{jQ})$ take account of the observed prevalence counts $Z_{iq}$ of the higher level areas they are located in, the Poisson means $\Delta_{iq}$ in the likelihood $Z_{iq} \sim \mathrm{Po}(\Delta_{iq})$ for the higher level observed totals are defined as totals of the small area means $\delta_{jq}$ located within each higher area. Thus let

$$\Delta_{iq} = \sum_{j: H_j = i} \delta_{jq}.$$
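Summing small area means into higher level means is a grouped sum, which `numpy.bincount` with weights performs directly. If the small area counts are assumed to be independent Poisson with means $\delta_{jq}$, their sum within a higher level area is Poisson with the summed mean, which is what justifies $Z_{iq} \sim \mathrm{Po}(\Delta_{iq})$. The helper name and values below are hypothetical, and area codes are 0-based.

```python
import numpy as np

def aggregate_means(delta, H, n_high):
    """Delta_i = sum of delta_j over small areas j with H[j] == i (0-based codes)."""
    return np.bincount(H, weights=delta, minlength=n_high)

delta = np.array([1.5, 2.0, 0.5, 3.0, 1.0])   # hypothetical small area means
H = np.array([0, 0, 1, 1, 1])                 # higher level area of each small area
Delta = aggregate_means(delta, H, n_high=2)   # Poisson means for Z_1, Z_2
```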

… can be set up to ensure that the posterior means of the $\Delta_{iq}$ equal (to a close approximation) the known higher level totals $Z_{iq}$. One way to achieve this is via a collection of $N_H$ fixed effects $\gamma_{q,H_j}$ in the model for the $\delta_{jq}$, equivalent to using dummy variables in the small area model for each higher scale area, and providing a Poisson equivalent of the multinomial [12]. Thus the $z_{jq}$ for $H_j = i$ are multinomial within $Z_{iq}$. We also wish the values of the latent morbidity index $f_j$ to influence the multinomial allocation of $Z_{iq}$ to small areas in a manner analogous to that in Equation (2). So the small area model is

where $\lambda_{P+q}$ are additional loadings on the latent spatial morbidity index. Whether they are set to known values or taken as unknowns depends on the identification constraint adopted for the scale of the $f_j$.
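The effect of one fixed effect per higher level area can be sketched as follows, assuming (hypothetically) a small area model of the form $\log \delta_j = \log E_j + \gamma_{H_j} + \lambda f_j$ for a single outcome. Given f and λ, the Poisson maximum likelihood value of each $\gamma_i$ makes the small area means in area i sum exactly to $Z_i$, and the shares $\delta_j / Z_{H_j}$ are the multinomial allocation probabilities. `constrained_means` and all inputs are illustrative.

```python
import numpy as np

def constrained_means(E, f, lam, H, Z):
    """Small area means with one fixed effect gamma_i per higher level area.

    Assumes log(delta_j) = log(E_j) + gamma_{H_j} + lam * f_j (hypothetical form).
    gamma_i is set to the Poisson MLE given f and lam, which makes the
    small area means in area i sum exactly to Z_i.
    """
    base = E * np.exp(lam * f)                                # E_j * exp(lam * f_j)
    totals = np.bincount(H, weights=base, minlength=len(Z))   # per higher area
    gamma = np.log(Z) - np.log(totals)
    delta = base * np.exp(gamma[H])
    probs = delta / Z[H]                                      # multinomial shares within area
    return delta, gamma, probs

E = np.array([10.0, 20.0, 15.0, 5.0])    # hypothetical expected counts
f = np.array([0.2, -0.1, 0.4, 0.0])      # hypothetical morbidity index
H = np.array([0, 0, 1, 1])               # 0-based higher level area codes
Z = np.array([35.0, 18.0])               # hypothetical higher level totals
delta, gamma, probs = constrained_means(E, f, lam=1.0, H=H, Z=Z)
```

Conditional on their total, independent Poisson counts are multinomial with probabilities proportional to their means, which is the Poisson–multinomial equivalence the text relies on; areas with higher $f_j$ receive a larger share of their $Z_i$.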

… $\gamma_{q,N_H}$, for example as random rather than fixed effects, in practice have a very similar consequence, namely that the means of the $\Delta_{iq}$ equal (to a close approximation) the known higher level totals $Z_{iq}$. For example, one might use random effect spatial priors, comparable to (4)–(5) but at the higher area level.

… $Z_{iq}$ are accurate measures of morbidity, and a constraint to reproduce them may not be advantageous. For example, the prevalence counts obtained under the Quality and Outcomes Framework (QOF) system in England may under-record prevalence in deprived areas, since the quality of primary care is lower in such areas [13], which may result in less effective case-finding [14]. To allow unconstrained estimation of small area prevalence counts, one may use an intercept in the model for $\delta_{jq}$ that is not specific to the higher area, namely

In this case the estimated $z_j$ are no longer constrained to sum to the known $Z_i$ for the large area i within which areas j are located. It seems reasonable to use socioeconomic variables x as causes of variability in f, but another strategy would be to use small area socioeconomic variables as additional indicators of the latent variable.
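For contrast, the unconstrained version replaces the area-specific effects with a single intercept. Assuming (hypothetically) $\log \delta_j = \log E_j + \gamma + \lambda f_j$, the Poisson maximum likelihood value of γ given f and λ reproduces only the region-wide total, so individual higher level totals are generally not matched. `unconstrained_means` and all inputs are illustrative.

```python
import numpy as np

def unconstrained_means(E, f, lam, Z):
    """Small area means with a single intercept gamma shared by all areas.

    Assumes log(delta_j) = log(E_j) + gamma + lam * f_j (hypothetical form).
    The Poisson MLE of gamma given f and lam matches only the overall total sum(Z).
    """
    base = E * np.exp(lam * f)
    gamma = np.log(Z.sum()) - np.log(base.sum())
    return base * np.exp(gamma), gamma

E = np.array([10.0, 20.0, 15.0, 5.0])    # hypothetical expected counts
f = np.array([0.2, -0.1, 0.4, 0.0])      # hypothetical morbidity index
H = np.array([0, 0, 1, 1])               # 0-based higher level area codes
Z = np.array([35.0, 18.0])               # hypothetical higher level totals
delta_u, gamma_u = unconstrained_means(E, f, lam=1.0, Z=Z)
area_totals = np.bincount(H, weights=delta_u)   # generally differ from Z
```

Here the implied higher level totals follow the morbidity index rather than the recorded $Z_i$, which is the intended behaviour when the recorded totals are suspected of under-recording.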

## 4. CHD Morbidity in London: Data

… $N_L = 625$ wards and $N_H = 31$ primary care trusts (PCTs) in London. The first two small area indicators $(y_{j1}, y_{j2})$ are male and female CHD deaths over 2004–2006, while $(y_{j3}, y_{j4})$ are male and female hospitalisations for CHD over the three financial years 2003–2004 to 2005–2006. Expected deaths and hospitalisations $O_{jp}$ in (2) are based on London-wide death and hospitalisation rates specific to gender and five-year age bands.

… $Z_i$ (for 2004–2005 and 2005–2006 combined) are observed only for PCTs, but one goal of the model is to estimate the missing small area CHD prevalence totals $z_j$. Expected CHD prevalence totals $E_{jq} = E_j$ at ward level in (7.2) are obtained by applying an external schedule of CHD prevalence rates by age and sex to small area population estimates (here 2005 intercensal estimates of ward populations developed by the UK Office for National Statistics). The external schedule used is based on the 2003 Health Survey for England [15], with the expectations $E_j$ scaled so that the London-wide standard prevalence ratio is 1 (i.e., the total of observed prevalence counts $Z_i$ across all London PCTs equals the total of expected prevalence counts $E_j$ over all London wards).

… $x_1$ = average weekly household income in 2001–2002 [16], $x_2$ = proportion of population of south Asian ethnicity, 2001 Census [17,18], and $x_3$ = estimated ward level smoking prevalence [19]. These predictors are converted to standardised form so that their relative importance can be assessed.

## 5. CHD Morbidity in London: Models

… $\lambda_{P+1} = \lambda_5 = 1$, so that $\sigma_f^2$ is an unknown; the inverse variance $1/\sigma_f^2$ is accordingly assigned a Gamma(1,1) prior. To ensure the model produces a positive index of CHD morbidity, the remaining $\lambda_p$ parameters also follow Gamma(1,1) priors [20]. Fixed effect parameters, namely the β parameters in (3) and the γ parameters in (7.2) and (8), are assigned diffuse N(0, 100) priors, while a uniform prior κ ∼ U(0, 1) is assumed for the spatial correlation coefficient in (4)–(5).

… $p(y_{rep} \mid y)$, under a mixed predictive approach [22], where sampled replicates $y_{rep}$ are based on model means that include replicate samples from the random effects (f and u effects). Then a mixed-predictive test for area j and outcome p has the form

… ($p_{jp,mix} < 0.05$) or over-fitted ($p_{jp,mix} > 0.95$), with the expected proportions in these two tails (namely 0.05 in each).
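The mixed-predictive check can be sketched as follows, assuming the replicate Poisson means have already been computed at each MCMC draw from freshly replicated random effects (that step is model-specific and omitted). The p-value is estimated as the share of draws with $y_{rep} > y$, matching the criterion in the model fit table; `mixed_predictive_pvalues` and `tail_proportion` are hypothetical names, and the example inputs are invented.

```python
import numpy as np

def mixed_predictive_pvalues(y, rate_draws, rng):
    """Estimate Pr(y_rep > y | y) for each observation.

    y:          (N,) observed counts
    rate_draws: (S, N) Poisson means per MCMC draw, each built from a
                replicated random effect (the mixed-predictive step)
    """
    y_rep = rng.poisson(rate_draws)          # one replicate per draw and observation
    return (y_rep > y).mean(axis=0)

def tail_proportion(p, lo=0.05, hi=0.95):
    """Share of observations flagged as under-fitted (p < lo) or over-fitted (p > hi)."""
    return float(np.mean((p < lo) | (p > hi)))

rng = np.random.default_rng(1)
y = np.array([100, 0, 5])                    # hypothetical observed counts
rates = np.full((2000, 3), 5.0)              # hypothetical replicate means
p = mixed_predictive_pvalues(y, rates, rng)
share = tail_proportion(p)
```

For a well-calibrated model roughly 10% of observations would fall in the two tails combined; the model fit table reports 0.093 and 0.085 for the two models.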

The $\beta_l$ parameters from the multiple cause regression (3) show income differences between wards to be the most important known influence on the index, though concentrations of south Asian ethnic groups are also important. As expected, higher income levels are negatively associated with morbidity (the 95% interval for the coefficient $\beta_1$ is confined to negative values). The importance of area socioeconomic status for CHD outcomes is also reported in other studies [24,25].

… $\xi_j = z_j/E_j$. For policy purposes, the probability that a small area has a significantly higher relative risk, and thus possibly needs special resources, is important. Therefore the marginal variance $\omega^2 = \mathrm{var}(\xi_j)$ is monitored during the MCMC run, and the standardized relative risks (SRRs)

… $\mathrm{SRR}_j > 1$, $-1 < \mathrm{SRR}_j < 1$ and $\mathrm{SRR}_j < -1$. Clusters of elevated risk are now clearly apparent.
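A sketch of how such posterior summaries might be computed from MCMC output. It assumes, as an illustration rather than the paper's exact definition, that $\mathrm{SRR}_j = (\xi_j - 1)/\omega$, with ω the across-area standard deviation of $\xi_j$ at each iteration; it also estimates the exceedance probability $\Pr(\xi_j > 1 \mid \text{data})$ and assigns the three mapping categories. `srr_summary` and the draws are hypothetical.

```python
import numpy as np

def srr_summary(z_draws, E):
    """Posterior summaries for relative risks xi_j = z_j / E_j.

    z_draws: (S, N) MCMC draws of small area counts; E: (N,) expected counts.
    Assumes SRR_j = (xi_j - 1) / omega, with omega the across-area standard
    deviation of xi at each iteration (an illustrative standardisation).
    """
    xi = z_draws / E                                    # (S, N) relative risks
    omega = xi.std(axis=1, ddof=1, keepdims=True)       # spread across areas, per draw
    srr = ((xi - 1.0) / omega).mean(axis=0)             # posterior mean SRR per area
    p_exceed = (xi > 1.0).mean(axis=0)                  # Pr(xi_j > 1 | data)
    category = np.where(srr > 1, "high", np.where(srr < -1, "low", "middle"))
    return srr, p_exceed, category

z_draws = np.array([[20.0, 10.0, 10.0, 10.0],           # hypothetical draws, 2 iterations
                    [22.0, 9.0, 10.0, 11.0]])
E = np.full(4, 10.0)                                    # hypothetical expected counts
srr, p_exceed, category = srr_summary(z_draws, E)
```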

## 6. Discussion

… $Z_i$ of prevalence, one can estimate lower scale totals $z_j$, using information on both socioeconomic structure ($x_j$) and related outcomes ($y_j$) at the lower spatial scale. The procedures outlined in the paper could also be used to disaggregate survey-based estimates $Z_{ik}$ which include relevant demographic stratifiers k (e.g., age, sex, ethnicity). Relevant spatial structural equation model (SEM) coefficients (β and λ parameters) may well differ between demographic categories. For example, one might seek to disaggregate survey-based regional estimates of diabetes by ethnicity to a lower spatial scale.

## References

1. Sundquist, K; Malmström, M; Johansson, S; Sundquist, J. Care need index, a useful tool for the distribution of primary health care resources. J. Epid. Comm. Health **2003**, 57, 347–352.
2. Morris, R; Whincup, P; Lampe, F; Walker, M; Wannamethee, S; Shaper, A. Geographic variation in incidence of coronary heart disease in Britain: the contribution of established risk factors. Heart **2001**, 86, 277–283.
3. Hogan, J; Tchernis, R. Bayesian factor analysis for spatially correlated data, with application to summarizing area-level material deprivation from census data. J. Amer. Stat. Assoc. **2004**, 99, 314–324.
4. Kline, R. Principles and practice of structural equation modeling; Guilford Press: New York, NY, USA, 2004.
5. Gelfand, A; Smith, A. Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. **1990**, 85, 398–409.
6. Lunn, D; Spiegelhalter, D; Thomas, A; Best, N. The BUGS project: evolution, critique and future directions. Stat. Med. **2009**, 28, 3049–3067.
7. McCullagh, P; Nelder, J. Generalized Linear Models; Chapman and Hall/CRC: New York, NY, USA, 1989.
8. Best, N; Hansell, A. Geographic variations in risk: adjusting for unmeasured confounders through joint modeling of multiple diseases. Epidemiology **2009**, 20, 400–410.
9. Rezaeian, M; Dunn, G; St Leger, S; Appleby, L. Geographical epidemiology, spatial analysis and geographical information systems: a multidisciplinary glossary. J. Epidemiol. Community Health **2007**, 61, 98–102.
10. Leroux, B; Lei, X; Breslow, N. Estimation of disease rates in small areas: a new mixed model for spatial dependence. In Statistical Models in Epidemiology, the Environment and Clinical Trials; Halloran, M, Berry, D, Eds.; Springer-Verlag: New York, NY, USA, 1999; pp. 135–178.
11. Wang, F; Wall, M. Generalized common spatial factor model. Biostatistics **2003**, 4, 569–582.
12. Forster, J. Bayesian Inference for Poisson and Multinomial Log-Linear Models; Southampton Statistical Sciences Research Institute: Southampton, UK; (S3RI Methodology Working Papers, M09/11), 2009.
13. Wright, J; Martin, D; Cockings, S; Polack, C. Overall QOF scores lower in practices in deprived areas. Br. J. Gen. Prac. **2006**, 56, 277–279.
14. Sigfrid, L; Turner, C; Crook, D; Ray, S. Using the UK primary care Quality and Outcomes Framework to audit health care equity: preliminary data on diabetes management. J. Publ. Health **2006**, 28, 221–225.
15. Strong, M; Maheswaran, R; Radford, J. Socioeconomic deprivation, coronary heart disease prevalence and quality of care: a practice-level analysis in Rotherham using data from the new UK general practitioner Quality and Outcomes Framework. J. Publ. Health **2006**, 28, 39–42.
16. Heady, P; Clarke, P; Brown, G; Ellis, K; Heasman, D; Hennell, S; Longhurst, J; Mitchell, B. Model-Based Small Area Estimation; Office for National Statistics: London, UK, 2003.
17. Forouhi, N; Sattar, N; Tillin, T; McKeigue, P; Chaturvedi, N. Do known risk factors explain the higher coronary heart disease mortality in south Asian compared with European men? Prospective follow-up of the Southall and Brent studies, UK. Diabetologia **2006**, 49, 2580–2588.
18. Tziomalos, K; Weerasinghe, C; Mikhailidis, D; Seifalian, A. Vascular risk factors in South Asians. Int. J. Cardiol. **2008**, 128, 5–16.
19. Scarborough, P; Allender, S; Rayner, M; Goldacre, M. Validation of model-based estimates (synthetic estimates) of the prevalence of risk factors for coronary heart disease for wards in England. Health and Place **2009**, 15, 596–605.
20. Sahu, S. Bayesian estimation and model choice in item response models. J. Stat. Comp. Sim. **2002**, 72, 217–232.
21. Spiegelhalter, D; Best, N; Carlin, B; van der Linde, A. Bayesian measures of model complexity and fit. J. Roy. Stat. Soc. B **2002**, 64, 583–639.
22. Marshall, C; Spiegelhalter, D. Identifying outliers in Bayesian hierarchical models: a simulation-based approach. Bayesian Analysis **2007**, 2, 1–33.
23. Brooks, S; Gelman, A. General methods for monitoring convergence of iterative simulations. J. Comp. Graph. Stat. **1998**, 7, 434–456.
24. Winkleby, M; Sundquist, K; Cubbin, C. Inequities in CHD incidence and case fatality by neighborhood deprivation. Am. J. Prev. Med. **2007**, 32, 97–106.
25. Sundquist, K; Winkleby, M; Ahlén, H; Johansson, S. Neighborhood socioeconomic environment and incidence of coronary heart disease: a follow-up study of 25,319 women and men in Sweden. Am. J. Epid. **2004**, 159, 655–662.
26. Hogan, J; Tchernis, R. Bayesian factor analysis for spatially correlated data, with application to summarizing area-level material deprivation from census data. J. Amer. Stat. Assoc. **2004**, 99, 314–324.
27. Liu, X; Wall, M; Hodges, J. Generalized spatial structural equation modeling. Biostatistics **2005**, 6, 539–557.
28. Jarman, B. Underprivileged areas: validation and distribution of scores. Brit. Med. J. **1984**, 289, 1587–1592.
29. Glover, G; Robin, E; Emami, J; Arabscheibani, G. A needs index for mental health care. Soc. Psychiatry Psychiatr. Epidemiol. **1998**, 33, 89–96.

| Model | Average Deviance | Complexity | DIC | Proportion of y values with Pr($y_{rep}$ > y) under 0.05 or over 0.95 |
|---|---|---|---|---|
| Model 1 (Multinomial Constraint) | 2,570 | 1,290 | 3,860 | 0.093 |
| Model 2 (unconstrained) | 2,529 | 1,512 | 4,041 | 0.085 |

| Model | Parameter | Mean | SD | Monte Carlo SE | 2.5% | 97.5% |
|---|---|---|---|---|---|---|
| Model 1 | $\beta_1$ | −0.199 | 0.020 | 0.002 | −0.242 | −0.162 |
| | $\beta_2$ | 0.180 | 0.023 | 0.002 | 0.139 | 0.231 |
| | $\beta_3$ | 0.050 | 0.020 | 0.002 | 0.009 | 0.087 |
| | κ | 0.937 | 0.055 | 0.002 | 0.800 | 0.998 |
| | $\lambda_1$ | 0.450 | 0.039 | 0.003 | 0.371 | 0.525 |
| | $\lambda_2$ | 0.410 | 0.044 | 0.003 | 0.325 | 0.495 |
| | $\lambda_3$ | 0.720 | 0.040 | 0.004 | 0.638 | 0.783 |
| | $\lambda_4$ | 0.769 | 0.041 | 0.004 | 0.682 | 0.893 |
| Model 2 | $\beta_1$ | −0.394 | 0.032 | 0.003 | −0.461 | −0.333 |
| | $\beta_2$ | 0.158 | 0.025 | 0.0001 | 0.108 | 0.208 |
| | $\beta_3$ | 0.062 | 0.030 | 0.002 | 0.007 | 0.120 |
| | κ | 0.935 | 0.055 | 0.002 | 0.797 | 0.998 |
| | $\lambda_1$ | 0.303 | 0.024 | 0.001 | 0.258 | 0.352 |
| | $\lambda_2$ | 0.241 | 0.026 | 0.001 | 0.191 | 0.292 |
| | $\lambda_3$ | 0.366 | 0.020 | 0.001 | 0.326 | 0.406 |
| | $\lambda_4$ | 0.413 | 0.023 | 0.001 | 0.366 | 0.456 |

| PCT | Observed from QOF | Expected using HSE 2003 as standard | RR based on actual QOF prevalence records | Rank of RR | Average income | Income rank |
|---|---|---|---|---|---|---|
| Barking and Dagenham | 9,800 | 9,147 | 1.071 | 25 | 5.3 | 2 |
| Barnet | 20,161 | 19,287 | 1.045 | 21 | 7.6 | 25 |
| Bexley | 13,973 | 14,778 | 0.946 | 12 | 6.7 | 14 |
| Brent | 14,040 | 13,542 | 1.037 | 20 | 6.6 | 10 |
| Bromley | 19,466 | 20,883 | 0.932 | 11 | 7.5 | 23 |
| Camden | 9,430 | 9,389 | 1.004 | 15 | 7.3 | 21 |
| City and Hackney | 9,030 | 8,746 | 1.033 | 19 | 5.5 | 4 |
| Croydon | 17,519 | 18,868 | 0.928 | 9 | 6.8 | 15 |
| Ealing | 18,410 | 15,228 | 1.209 | 29 | 7.3 | 22 |
| Enfield | 14,839 | 16,232 | 0.914 | 5 | 6.6 | 11 |
| Greenwich | 12,419 | 11,464 | 1.083 | 26 | 5.8 | 5 |
| Hammersmith and Fulham | 7,022 | 7,609 | 0.923 | 7 | 7.8 | 27 |
| Haringey | 9,318 | 9,360 | 0.996 | 14 | 6.6 | 12 |
| Harrow | 13,680 | 12,949 | 1.056 | 22 | 7.7 | 26 |
| Havering | 16,650 | 16,538 | 1.007 | 16 | 6.9 | 16 |
| Hillingdon | 13,929 | 14,408 | 0.967 | 13 | 7.2 | 19 |
| Hounslow | 8,127 | 7,663 | 1.061 | 24 | 6.5 | 9 |
| Islington | 13,929 | 14,408 | 0.967 | 13 | 7.2 | 19 |
| Kensington and Chelsea | 6,953 | 9,506 | 0.731 | 1 | 8.0 | 28 |
| Kingston | 8,573 | 8,485 | 1.010 | 17 | 8.1 | 29 |
| Lambeth | 9,768 | 10,499 | 0.930 | 10 | 6.6 | 13 |
| Lewisham | 12,027 | 11,348 | 1.060 | 23 | 6.1 | 8 |
| Newham | 12,495 | 9,433 | 1.325 | 31 | 4.8 | 1 |
| Redbridge | 14,488 | 14,223 | 1.019 | 18 | 6.9 | 17 |
| Richmond and Twickenham | 7,802 | 10,312 | 0.757 | 2 | 9.0 | 31 |
| Southwark | 10,233 | 11,168 | 0.916 | 6 | 6.0 | 6 |
| Sutton and Merton | 19,303 | 21,385 | 0.903 | 4 | 7.6 | 24 |
| Tower Hamlets | 9,724 | 7,523 | 1.293 | 30 | 5.4 | 3 |
| Waltham Forest | 11,955 | 10,640 | 1.124 | 27 | 6.0 | 7 |
| Wandsworth | 10,904 | 11,763 | 0.927 | 8 | 8.4 | 30 |
| Westminster | 9,921 | 11,097 | 0.894 | 3 | 7.1 | 18 |

© 2010 by the authors; licensee Molecular Diversity Preservation International, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Congdon, P. Estimating Prevalence of Coronary Heart Disease for Small Areas Using Collateral Indicators of Morbidity. *Int. J. Environ. Res. Public Health* **2010**, *7*, 164-177.
https://doi.org/10.3390/ijerph7010164

**AMA Style**

Congdon P. Estimating Prevalence of Coronary Heart Disease for Small Areas Using Collateral Indicators of Morbidity. *International Journal of Environmental Research and Public Health*. 2010; 7(1):164-177.
https://doi.org/10.3390/ijerph7010164

**Chicago/Turabian Style**

Congdon, Peter. 2010. "Estimating Prevalence of Coronary Heart Disease for Small Areas Using Collateral Indicators of Morbidity" *International Journal of Environmental Research and Public Health* 7, no. 1: 164-177.
https://doi.org/10.3390/ijerph7010164