1. Introduction
There has been a proliferation of place (or neighbourhood) and health research over the last two decades, in which a range of attributes related to place of residence and their potential links to behavioural and clinical outcomes have been investigated in multiple contexts. Methodological heterogeneity across studies, however, precludes assessment of the robustness of these associations and their comparison across study sites [1]. For this reason, replication studies have been recommended. Beyond the ascertainment of general conclusions relating to the effect of attributes of where people live on health, there is also a need to enhance our understanding of the processes which both shape and explain these observations [2]. Causal inference may be augmented through studies that seek to capitalise on “natural” variations in contextual conditions, which exist across regions and countries [3,4]. Such comparisons are starting to emerge [5,6,7]. Critical to the success of these comparisons is not only the development of comparable attributes of place and clinical outcome measures, but also an a priori understanding of the geographic clustering of health outcomes within differently scaled expressions of a given context as well as altogether different contexts. Gaining insights into the scale and nature of environmental influences on cardiometabolic diseases in different contexts can therefore help guide the selection of an appropriate geographic unit to be used for comparison purposes. Researchers undertaking cross-context comparisons appear to have an appreciation of measurement specification, but limited attention seems to have been paid to geographic clustering thus far.
Geographic clustering of outcomes can be assessed using either geospatial measures, such as Moran’s I [8], or the standard Intra-Class Correlation (ICC) coefficient, each readily calculated using standard geographic administrative units and relevant to the multi-level analytic models often used in place and health studies. The ICC provides information on the degree of “relatedness” of aggregates of people in the same geographic (or social) unit, reflecting shared experiences or conditions [9]. This information is useful to orient future research on the geographic drivers of cardiovascular diseases (CVD) by analysing CVD risk factors that have the largest amount of within-area correlation and between-area variability. Information on the strength of clustering of risk factors is important to public health authorities to assist the targeting and analysis of interventions to reduce regional disparities in disease [10].
It is also known that naively analysing clustered observations as if they were independent results in underestimation of the standard error of the estimated cluster-level effect and thus the inflation of Type I error rates [11]. The ICC quantifies the bias created by not accounting for the extent of non-independence in analyses and therefore has utility in sample size calculations and in obtaining accurate estimates of parameter effects and their variability in clustered designs [12]. Intra-Class Correlation is a function of the similarity of individuals within groups, relative to the similarity between groups. Thus, the extra variation attributable to the geographic unit is given by the ICC, which indexes the geographic clustering of individuals for the geographic unit being analysed. The effect of clustering depends, however, on both the cluster size and the ICC, from which the ‘design effect’ can be expressed as a multiplier of the usual variance estimate [13,14]. For a given ICC, the design effect can therefore serve to quantify the extent of analytic efficiency associated with the use of differently sized geographic units (translating into different numbers of geographic units for a given region). The design effect will be close to 1.0 for large numbers of geographic units, indicating a minimal impact on analytic efficiency. The design effect can be much greater, and analytic efficiency correspondingly reduced, for limited numbers of geographic units. Understanding the effects of clustering on analytic efficiency hence has important implications for the selection and use of different geographic units in place and health analyses.
Using Australian and French cardiometabolic risk factor data from population-based cohorts in large metropolitan centres in each nation, this study of the geographic clustering of risk factors aimed to provide knowledge relevant to decisions on choice of geographic unit and interpretation of analyses of relationships between environmental factors and cardiometabolic outcomes for different places. The objectives were to describe and compare the extent of geographic variation, and the implications for analytic efficiency, across a range of cardiometabolic parameters for geographic units in these countries. In so doing, this report implicates cross-country differences in the distribution and/or role of environmental features that bear on cardiometabolic health. It also provides information that can assist place and health researchers in understanding the implications of geographic clustering of health outcomes related to decisions on geographic units for sampling and analysis.
2. Materials and Methods
2.1. Context and Data Sources
This study uses baseline data from cohort studies in the metropolitan regions of Adelaide, South Australia and Paris, France. Both cohorts and their study regions are described below.
2.1.1. Australian Cohort and Study Region
Adelaide was settled in 1836 with systematic urban planning by Colonel William Light [15], featuring a square mile street grid and parklands region surrounding a central business district. The Adelaide metropolitan region has developed over time to stretch 80 km north to south, and 30 km east to west, and also to reflect a diverse ethnic and socioeconomic profile. In 2001, population density for the Adelaide Statistical Division was 587.8 persons per square kilometre. The Adelaide baseline cohort study region includes the northern and western regions of metropolitan Adelaide (see Figure 1), accounting for 38% of the city population and 28% of the population of the state of South Australia [16]. The population of the study region reflects the post-World War II migration of peoples from Europe and the United Kingdom. Relative to the state, the study region has high proportions of low income families and low full-time secondary school participation rates [17]. Despite this profile, the region covers a diverse spectrum of both high and low socioeconomic status and ethnic background residents, as well as commercial, educational and industrial sites.
The Australian study is part of the Place and Metabolic Syndrome (PAMS) project, which links geospatial data with clinical data to analyse environmental attributes hypothesised to affect cardiometabolic risk and disease. Clinical data are derived from a population-based biomedical cohort, the North West Adelaide Health Study (NWAHS). Spatially-referenced data on cohort participants’ social and built environments are extracted from a comprehensive geographic information system (GIS). The NWAHS is a collaboration between the Central Northern Area Health Service (CNAHS), South Australian Department for Health and Ageing, The University of Adelaide, University of South Australia and SA Pathology [18].
The NWAHS comprises adults aged 18 years and over from the north-west region of Adelaide, randomly selected from the Electronic White Pages (EWP) telephone directory. To date, there have been three waves of data collection: Wave 1 (2000–2003), Wave 2 (2004–2006) and Wave 3 (2008–2010). At baseline recruitment there were 8213 eligible individuals, of whom 5850 completed a Computer Assisted Telephone Interview (CATI) including socio-demographic, behavioural and self-reported health questions. These participants were invited to undertake a clinical assessment, taking approximately 45 min and conducted by trained clinical staff. Clinical measures included weight, height, waist and hip girths, blood pressure, and a fasting blood sample from which glycaemic and lipid variables were determined. These assessments, conducted at either of two hospitals within the north-west region, were completed by 4056 participants, or 49.4% of the eligible sample and 69.4% of those who completed the CATI. All participants provided informed consent. Relative to the study region population, cohort participants were more likely to be female, aged 45 years and over, and hold a university degree.
NWAHS participants were assigned a geo-reference that represented their residential address at all stages of data collection. Of the original baseline sample, 4041 participants had a valid geo-reference. Missing geo-references reflect incomplete residential street addresses that were not traceable following a crosscheck of paper records or telephone contact. Ethics review board approval for the PAMS project was provided by the University of South Australia (P029-10; P030-10), the Central Northern Area Health Service (HREC No. 2010010), and South Australian Department for Health and Ageing (HREC/13/SAH/57).
2.1.2. French Cohort and Study Region
The city of Paris was remodelled by Baron Georges-Eugène Haussmann during the 1850s, incorporating regulations on building ways, public parks, and facilities. The grid plan of Paris saw frequent intersections and orthogonal geometry to accommodate daily population movement. According to the National Institute of Statistics and Economic Studies (Insee), the overall population density of the City of Paris in 2009 was 21,196 residents per square kilometre. Older cities such as this grew organically to include regions less reminiscent of the original urban design. The Paris cohort study region, situated within the Île-de-France region, includes 10 of the 20 districts of Paris and 111 other municipalities of the region (see Figure 1). These municipalities are located in close proximity to Paris, in the first crown of counties around Paris and further away in the second crown of counties in the Île-de-France region. The Île-de-France region has, overall, the highest socioeconomic status in France yet also the greatest socio-territorial inequalities. These inequalities were reflected in the municipalities sampled in the French cohort.
Figure 1 provides the geographic setting of the two study regions.
The RECORD Cohort Study (“Residential Environment and CORonary heart Disease” [19]) includes 7290 participants recruited between March 2007 and February 2008 [20,21]. Additional participants were recruited for a second wave of data collection (2011–2013). The study is a collaboration between UMR-S 707 (Inserm–Université Pierre et Marie Curie) and the Centre d’Investigations Préventives et Cliniques of Paris. Participants benefit from a free health check-up, offered every five years to all working and retired employees and their families by the French National Health Insurance System for Salaried Workers. Participants were recruited without a priori sampling during these two-hour-long preventive health check-ups conducted by the Centre d’Investigations Préventives et Cliniques [22,23] in four of its health centres, located in the Paris metropolitan area (Paris, Argenteuil, Trappes, and Mantes-la-Jolie). Eligibility criteria were age 30 to 79 years, ability to complete the study questionnaires in French, and residence within the study region. Among those attending health centres and who were eligible based on age and place of residence, 10.9% were not admitted given linguistic or cognitive difficulties in completing questionnaires. Individuals admitted for participation received further information about the study from trained survey staff. Of these, 83.6% agreed to participate and completed the data collection protocol. Relative to the population attending the preventive health checks, those individuals admitted for participation in the study had greater education, and resided closer to the examination centres and in more affluent and/or low building density neighbourhoods [24].
Participants were geocoded using their residential address in 2007–2008. Research assistants corrected all incorrect or incomplete addresses with the participants by telephone. Extensive investigations with local Departments of Urban Planning were conducted to complete the geocoding. Precise spatial co-ordinates and block group codes were identified for 100% of the participants. The study protocol was approved by the French Data Protection Authority (Authorisation #907011). All participants signed an informed consent to enter the study.
2.2. Outcome Variables
The French and Australian studies both focused on cardiovascular health and collected information on recognised risk markers for cardiovascular diseases. The following risk markers measured at baseline were therefore used for calculation and comparison of ICCs: Body Mass Index (BMI), waist girth, systolic blood pressure, diastolic blood pressure, resting heart rate, and fasting level of triglycerides, total cholesterol, high-density lipoprotein cholesterol (HDL-C), blood glucose, and glycosylated haemoglobin (HbA1c). Resting heart rate was not measured in the Australian cohort, and HbA1c was not measured in the French Cohort. Participants with a missing value for any outcome measure were excluded, resulting in a final sample of 6430 French and 3893 Australian participants (88.2%, and 96.3% of the original cohorts, respectively).
Both studies measured height via wall-mounted stadiometer and body mass using calibrated scales [22,25]. BMI was computed as body mass (kg) divided by height squared (m²). Waist girth was measured to the nearest millimetre, with the participant standing evenly on both feet, using an inelastic tape placed midway between the lower ribs and iliac crests on the mid-axillary line [23].
Supine brachial blood pressure was measured three times in the right arm after a 10 min rest period, using a manual sphygmomanometer [22,25]. Cuff size was selected based on arm girth. The first and the fifth Korotkoff phases were used to define systolic blood pressure (mmHg) and diastolic blood pressure (mmHg), respectively. The mean of the second and third readings of each measure was taken as the ‘true’ value [23]. For the French cohort only, resting heart rate in beats per minute was measured by electrocardiogram using a Cardionics CardioPlug device following a 5–7 min rest period [22].
For the Australian cohort, concentrations of fasting serum total cholesterol, HDL-C, and triglycerides, and fasting plasma glucose were measured using Chemistry Immuno Analyzer systems (Olympus AU5402 (total cholesterol/HDL-C), AU5401 (Triglycerides), Olympus AU5400 (Glucose); Olympus Optical Co. Ltd., Tokyo, Japan). Whole blood was used for HbA1c. In the French cohort, biomarker concentrations were measured on serum under fasting conditions (enzymatic method, automat HITACHI 917, Hitachi, Meylan, France). The HDL cholesterol was measured by direct enzymatic assay with cyclodextrin. Further detail on methodologies for both cohort studies has been provided elsewhere [26,27].
2.3. Administrative Geographic Units
Cohort participants were located within a hierarchy of administrative geographic units according to country-specific standards. For Adelaide, administrative geographic units compliant with the 2001 Australian Standard Geographical Classification (ASGC) [28] and Census Geographic Areas [29] included, from smallest to largest: Census Collection District (CD); Statistical Local Area (SLA); State Suburb; Postal Areas (POA); and Local Government Areas (LGA). The CD, the smallest geographic unit in the ASGC, averages 220 dwellings in urban areas [28]. The four larger Australian geographic units were formed by aggregating CDs without omission or overlap. The Statistical Local Area (SLA) is a general purpose geographic unit and the base geographic unit by which population statistics other than Censuses are collected and disseminated [28]. A Postal Area (POA) is created by allocating whole CDs to approximate Australia Post® postcode areas, and the State Suburb is formed by allocating CDs to form a unit that aligns with the most recent gazetted suburb at the time of Census [29]. The Australia Post® postcode and gazetted suburb are not a part of the ASGC or Census Geographic Areas and have no associated demography, and therefore do not support direct comparisons.
For Paris, administrative geographic units included, from smallest to largest: IRIS (Îlots Regroupés pour l’Information Statistique) neighbourhoods (i.e., census block groups); TRIRIS neighbourhoods (i.e., census tracts); and municipalities. A further category, census blocks, refers to the portions of territory that are delimited by the street network. Given that the French participants were spread over a large territory, the number of participants per census block was very low; this classification was therefore not considered in the present study. IRIS neighbourhoods are defined by Insee (National Institute of Statistics and Economic Studies) based on the 1999 French Census to group census blocks largely homogeneous in socioeconomic status (SES) and housing features. IRIS neighbourhoods comprise an average of 2000 residents. TRIRIS areas are defined by Insee by grouping an average of three contiguous IRIS neighbourhoods from the same municipality. Municipalities can pertain to large, medium-size, or small towns and are the smallest administrative level for the election of representatives. The 10 districts of Paris assessed in this study were treated as equivalent to the 111 independent municipalities included (hence “municipalities” number 121 in total). Some of the smallest municipalities in the sample were not subdivided into IRIS/TRIRIS neighbourhoods.
2.4. Demographic Characteristics
For each cohort, each participant’s date of birth and gender were recorded at baseline, with age being calculated from date of birth to clinic appointment. Educational attainment and income were also collected. These measures were used to describe the samples only.
2.5. Statistical Analyses
Two-sample t-tests and two-sided chi-square tests were used to compare continuous and categorical demographic and clinical measures, respectively, between the Australian and French samples.
ICCs were computed for each outcome variable and for each administrative geographic unit from within- and between-unit variance parameters estimated from a two-level random intercept multi-level linear model with no predictors [12]. In linear models, the ICC quantifies the strength of association for measures of individuals within the same cluster, and the proportion of the total variability attributable to between-cluster variations [30]. Thus, for continuous variables the ICC is expressed as:
$$\mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}$$
where $\sigma_u^2$ is the variance in the true mean level of the outcome variable between clusters (the geographic unit variance component or between-cluster variance) and $\sigma_e^2$ is the variance in the outcome variable among study participants within a geographic unit (the individual-level variance component or within-cluster variance). The formulation based on correlated observations therefore is closely related to that based on variance components, as the ICC can be seen as a measure of the relative sizes of the two variance components. The ICC reflects the extra variation caused by the natural differences among the individuals within each cluster; it indexes dependence among individuals within a given cluster. Models were estimated with SAS software (version 9.1, SAS Institute Inc., Cary, NC, USA) using the Mixed procedure.
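The variance-component definition of the ICC can be illustrated with a short sketch. It does not reproduce the SAS Mixed procedure used in the study: it swaps in the simpler one-way ANOVA (method of moments) estimator, so values may differ somewhat from likelihood-based estimates. The function name `anova_icc` and the toy data are hypothetical.

```python
from collections import defaultdict


def anova_icc(values, units):
    """Estimate variance components and the ICC by one-way random-effects ANOVA.

    values: outcome for each participant.
    units:  geographic unit label for each participant (same order).
    Requires at least two units and more participants than units.
    Returns (between_var, within_var, icc), with icc as a proportion (0 to 1).
    """
    groups = defaultdict(list)
    for y, u in zip(values, units):
        groups[u].append(y)

    k = len(groups)            # number of geographic units
    n_total = len(values)      # number of participants
    grand_mean = sum(values) / n_total
    means = {u: sum(g) / len(g) for u, g in groups.items()}

    # Between-unit and within-unit sums of squares.
    ss_between = sum(len(g) * (means[u] - grand_mean) ** 2 for u, g in groups.items())
    ss_within = sum((y - means[u]) ** 2 for u, g in groups.items() for y in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Adjusted average unit size, which allows for unequal unit sizes.
    n0 = (n_total - sum(len(g) ** 2 for g in groups.values()) / n_total) / (k - 1)

    within_var = ms_within
    # Negative moment estimates are truncated at zero.
    between_var = max(0.0, (ms_between - ms_within) / n0)

    total = between_var + within_var
    icc = between_var / total if total > 0 else 0.0
    return between_var, within_var, icc


if __name__ == "__main__":
    # Toy data: two units with three participants each (not study data).
    y = [1, 2, 3, 4, 5, 6]
    unit = ["A", "A", "A", "B", "B", "B"]
    print(anova_icc(y, unit))
```

Multiplying the returned ICC by 100 gives the percentage scale on which ICCs are reported in this paper.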
The design effect can be calculated as the ratio of the variance of the estimate under the actual (clustered) design to the variance of the estimate obtained assuming the same data to have come from a simple random sample [13]. It quantifies the analytic efficiency of clustered designs and is used in sample size and statistical power calculations, and to adjust statistics naively generated under the assumption of independence. In this study, design effects were derived from ICC estimates by the following equation:
$$\mathrm{DEFF} = 1 + (\tilde{n} - 1)\,\mathrm{ICC}$$
where $\tilde{n}$ is the harmonic mean number of participants per unit for each administrative geographic unit. The harmonic mean was used because sample sizes (participant numbers within unit) were unequal across geographic units of a given type. It provides an estimate of the equivalent sample size for each group under a balanced design that takes into account the loss of statistical power attributed to the unequal sample sizes.
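To make the calculation concrete, the following sketch assumes the standard form of the design effect, 1 + (ñ − 1) × ICC, with ñ the harmonic mean number of participants per unit. The function names and the example unit sizes are hypothetical and are not taken from the study data.

```python
from statistics import harmonic_mean


def design_effect(icc, unit_sizes):
    """Design effect from an ICC (as a proportion, not a percentage)
    and the number of participants in each geographic unit."""
    n_h = harmonic_mean(unit_sizes)   # equivalent balanced unit size
    return 1.0 + (n_h - 1.0) * icc


def effective_sample_size(n_participants, deff):
    """Sample size of a simple random sample carrying the same information."""
    return n_participants / deff


if __name__ == "__main__":
    sizes = [10, 40]                  # hypothetical participants per unit
    deff = design_effect(0.01, sizes) # ICC of 1%
    print(deff, effective_sample_size(sum(sizes), deff))
```

With an ICC of 1% and unit sizes of 10 and 40 (harmonic mean 16), the design effect is about 1.15, so 50 clustered participants carry roughly the information of 43 independent ones. The same ICC applied to much larger units gives a much larger design effect, which is why unit size matters even when clustering is weak.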
ICCs were pooled across outcomes and geographic units using meta-analytic methods for correlations in MedCalc Statistical Software version 16.2.1 (MedCalc Software bvba, Ostend, Belgium) [31].
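The exact pooling procedure is not detailed here, so the sketch below should be read only as one common way of pooling correlation-type coefficients: a fixed-effect combination on the Fisher z scale with weights of n − 3. Treating each ICC as a correlation, the choice of sample size used for weighting, and the function name `pool_correlations` are all assumptions made for illustration.

```python
import math


def pool_correlations(rs, ns, z_crit=1.96):
    """Fixed-effect pooling of correlation-type coefficients via Fisher's z.

    rs: coefficients as proportions, each strictly between -1 and 1.
    ns: sample size associated with each coefficient (each must exceed 3).
    Returns (pooled_r, (lower, upper)) with an approximate 95% interval.
    """
    zs = [math.atanh(r) for r in rs]   # Fisher z transformation
    ws = [n - 3 for n in ns]           # inverse-variance weights on the z scale
    w_sum = sum(ws)
    z_bar = sum(w * z for w, z in zip(ws, zs)) / w_sum
    se = 1.0 / math.sqrt(w_sum)
    lower, upper = z_bar - z_crit * se, z_bar + z_crit * se
    # Back-transform to the correlation scale.
    return math.tanh(z_bar), (math.tanh(lower), math.tanh(upper))


if __name__ == "__main__":
    # Hypothetical ICCs (as proportions) and sample sizes, not study results.
    print(pool_correlations([0.01, 0.03, 0.02], [3893, 6430, 5000]))
```

A random-effects variant would add a between-estimate variance term to the weights; it is omitted here to keep the sketch short.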
4. Discussion
This study of two population-based samples in different nations indicates that the proportion of variability in cardiometabolic risk factors explained by variation between geographic administrative units was overall relatively low, as indicated by a median ICC of 1% (Table 3 and Table 4). This modest level of clustering is consistent with several previous reports concerning the geographic clustering of cardiometabolic risk [33], mortality [34], self-reported health problems, quality of life and well-being [35], yet lower than indicated by one report [36]. Our results align, however, with community-level ICCs for cardiometabolic risk factors in a six-community heart health intervention project [10].
Despite the overall low level of clustering, the degree of clustering varied according to geographic unit, risk factor, and population setting. Variations in clustering by geographic unit and health outcome are consistent with previous investigations [35]. Considering both samples together, clustering was especially noticeable for BMI, resting heart rate and HbA1c, but less so for blood pressure and lipidaemic outcomes. Blood pressure and lipid measures are routinely assessed in clinical settings and largely managed through pharmacological intervention. This contrasts with the management of anthropometric and glycaemic risk factors, which are largely (although not exclusively) recommended to be addressed by lifestyle change. The greater successes achieved through pharmacological approaches compared to preventive and health promotion efforts could explain why risk factors such as blood pressure and lipidaemia showed less variability than anthropometric and glycaemic outcomes. This interpretation is also consistent with the relatively higher within-area correlation observed for systolic as opposed to diastolic blood pressure in both samples. Poor blood pressure control is common among treated hypertensive patients and has been shown to be largely attributed to poor systolic blood pressure control [37]. The above interpretation may also be partly coherent with the relatively high clustering that was found for resting heart rate, as resting heart rate control is not as commonly considered a clinical target as blood pressure control [38].
More pronounced clustering (ICC > 2%) was observed for HbA1c in the Australian sample, and for BMI and resting heart rate in the French sample. Design effects for these measures were consistently greater for the larger geographic administrative units in each nation. Markedly more clustering was observed for HbA1c than for fasting glucose, with ICCs of 4.91% and 2.69%, respectively, in the Australian sample. Although both correspond to glycaemic control, HbA1c is an indicator of long-term glycaemia and has been shown to respond to social environmental stress [39,40], and also to vary with race and ethnicity [41,42]. Both social stressors and ethnic composition are likely to vary locally, potentially explaining the high level of clustering found for HbA1c. With respect to resting heart rate, a previous study based on the French Cohort [43] showed that higher resting heart rates were associated with lower individual- and area-level socioeconomic status as well as measures of physical inactivity and larger waist girth. The fact that these factors each exhibit geographic variations in the French region examined might explain the significant level of geographic clustering that was detected.
Although the extent of clustering in waist girth did not greatly differ between samples, clustering was greater for BMI in the French sample. Median BMI was within the overweight range for the Australian sample but within the healthful range for the French sample. This finding may reflect variation in behavioural norms between the study regions. The French study has reported lesser geographic clustering for waist girth than for BMI, contrary to previous reports where more comparable clustering was observed for these anthropometric outcomes [44,45]. A post-hoc analysis of the data suggested that clustering in waist girth was of a magnitude comparable to that of BMI, once statistical models were adjusted for age and gender. For example, for IRIS the ICCs were 5.6% for BMI and 4.0% for waist girth following adjustments.
The ICCs in the current study displayed sizeable variability and the corresponding design effects illustrate that even low ICCs will have substantial implications for analytic efficiency, depending on the size of the geographic administrative units used in analysis. What this study contributes beyond prior reports on geographic clustering is the demonstration, consistent across two different settings, that the extent of clustering given by the ICC is inversely related to the size of the geographic administrative unit in which individuals are clustered, whereas analytic inefficiency indexed by design effects greater than 1.0 is positively related to the size of the geographic unit in which individuals are clustered.
For a given population sample in a given geographic setting, the smallest geographic administrative unit will be the most numerous but include the fewest study participants per unit, whereas the largest geographic unit will be the least numerous but include the greatest number of study participants per unit. Decisions on the most appropriate scale and corresponding geographic administrative unit to use in spatial epidemiological studies must therefore consider the methodological relationship between analytic efficiency and clustering, and, a priori, the theoretical basis for a certain scale of observation and analysis.
As shown here, analytic efficiency is improved as the number of geographic units increases, even though parallel reductions in the size of geographic units generate reductions in the median number of study participants per geographic unit. This relationship reflects the fact that statistical power varies directly with the precision by which the mean level of a clustered outcome can be estimated, given by the inverse of:
$$\operatorname{Var}(\bar{y}) = \frac{\sigma_u^2 + \sigma_e^2 / n}{c}$$
where $\sigma_u^2$ is the variance in the true mean level of the outcome variable among clusters (the geographic unit variance component) and $\sigma_e^2$ is the variance in the outcome variable among study participants within a geographic unit (the individual-level variance component). The equation shows that if $\sigma_u^2$ is large relative to $\sigma_e^2$, then only modest gains will accrue by increasing the number ($n$) of study participants per geographic unit, whereas far larger gains in power can be obtained if the number of clusters ($c$) is increased.
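The trade-off between adding participants per unit and adding units can be checked numerically. The sketch below assumes the usual balanced-design expression for the variance of an estimated mean, (between-unit variance + within-unit variance / n) / c; the function name and all numeric values are hypothetical.

```python
def variance_of_mean(between_var, within_var, n_per_unit, n_units):
    """Variance of the estimated overall mean in a balanced clustered sample.

    between_var: geographic unit variance component.
    within_var:  individual-level variance component.
    n_per_unit:  participants per geographic unit (n).
    n_units:     number of geographic units (c).
    """
    return (between_var + within_var / n_per_unit) / n_units


if __name__ == "__main__":
    between, within = 1.0, 99.0   # hypothetical components (ICC of 1%)
    base = variance_of_mean(between, within, n_per_unit=20, n_units=50)
    more_people = variance_of_mean(between, within, n_per_unit=40, n_units=50)
    more_units = variance_of_mean(between, within, n_per_unit=20, n_units=100)
    # Precision is the inverse of the variance, so a smaller variance is better.
    print(base, more_people, more_units)
```

Doubling the number of units halves the variance whatever the variance components are, whereas doubling the participants per unit shrinks only the within-unit term, so its benefit diminishes as the between-unit component grows.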
Clustering estimates provide an understanding of the scale at which environmental influences operate [46] and, with estimates of the design effect and analytic efficiency, can inform the selection of administrative geographic units to be used for local as well as cross-country comparisons. Overall, greater clustering was found for smaller geographic units such as the French IRIS and the Australian Collection District and State Suburbs as opposed to larger units. Contrary to the Australian Collection District and French IRIS units, which had few participants per unit, Australian State Suburb and French TRIRIS units were found to have relatively high variability while still maximising the number of geographic units and participants per unit available for analysis and thus providing greater statistical power for analyses conducted at that level. Final decisions regarding the selection of appropriate units for cross-site comparisons need to be informed by not only the above observations but also relevant theory, and the nature and expected variability of the environmental exposure hypothesised to explain the geographic clustering in outcomes.
The study represents a first step in an international collaborative effort to investigate differences between countries in place and health relationships. Both studies have already yielded evidence that features of geographic areas do reflect local variations in cardiometabolic risk factors. For instance, the French study has documented associations between individual and neighbourhood socio-economic status (SES) and resting heart rate [43]; between neighbourhood SES, neighbourhood urbanicity, supermarket characteristics and BMI [21,44,45]; and between socioeconomic, physical, service, and social environment characteristics and blood pressure [20,47]. The Australian study has reported on associations between public open space [48] measures, a wealth indicator derived from property value [49], walkability [50], and socioeconomic conditions of areas [51,52] and a range of cardiometabolic risk markers. The next step in this Australia–France comparison is to derive common measures of these social and built environmental features, and using appropriate geographic administrative units, assess the degree to which they can explain the local-area geographic variations in cardiometabolic health illustrated by the present study.
The current study examined geographic variation simply in the prevalence of cardiometabolic risk factors. This approach precludes inference as to whether individual socioeconomic conditions or environmental conditions shaped the geographic distribution of health outcomes or whether relatively healthier (or unhealthy) individuals tended to migrate toward particular areas. Further research is needed to assess whether risk factors in participants followed up over time retain the same extent of geographic patterning. Differences and/or similarities between Adelaide and Paris also require follow-up assessment. A greater residential density of study participants in the Paris region may have contributed to the slightly higher average level of geographic clustering observed for the French sample. The French study also covered a larger area, which resulted in fewer participants within units with comparable sizes. The study assessed risk factors for chronic conditions characterised by very long induction periods. It is not known for how long participants had lived in a given geographic unit, which may have had an impact on the extent of clustering found. Finally, this study sought to identify the geographic administrative units that had the greatest utility, in terms of both clustering and analytic efficiency, across the two study contexts by which to inform cross-country comparisons. For this reason, the ICC, estimating clustering within pre-determined administrative units, was preferred over additional geographic clustering statistics (e.g., Moran’s I) that assess the clustering of discrete point locations over a given space continuum.