1. Introduction
Obesity and related chronic diseases are recognized as a global population health challenge [1,2]. Calls have been made to move beyond behavioral risk factors such as physical activity levels for these conditions to consider associated environmental, political, economic, and social determinants in populations [3,4], including the effects of built environment features on energy balance behaviors and active living [5]. Any call to develop and implement public health and urban planning policies to address these relationships [6,7] may be premature, however. Some built environment features (e.g., land use mix, intersection density, and recreational facilities) have been shown to be related to some individual health outcomes (e.g., body mass index, physical activity, healthy eating) [8,9,10,11,12], although not consistently [13,14,15,16].
A recent review of tools used to assess environments for physical activity [15] distinguishes two general types: (1) self-reports of perceptions of the environment, typically completed by participants in studies that also record information about physical activity; and (2) instruments that are used to independently assess the environment for particular properties. This second category includes both GIS-based measures, which use existing administrative data as a basis for the formation of measures, and observational measures or community audits, which involve the direct observation of features of the built environment thought to be relevant to physical activity. The reviewers call for additional research with all tools to establish relationships between characteristics of the built environment and physical activity, as well as the relationships between different ways of assessing the built environment. Such research is important to establishing an evidence base to support specific improvements to the built environment that will result in improvements in active living and physical activity.
Our work has focused upon the objective assessment of the micro features of the built environment, in part because we wished to form a database of this information for future use by community partners. Databases of the spatial distribution of individual features of the built environment have many potential uses, including the secondary identification of specific local barriers to, and assets for, physical activity as part of community planning [17,18,19]. We chose to use the Irvine Minnesota Inventory (IMI) [20,21] because it is the most comprehensive of the community audit tools.
Two recent papers [22,23] have reported relationships between individual health behaviors and scales developed by aggregating IMI items. This paper reports (1) our unsuccessful attempts to replicate these findings using similar measures in two different locations; and (2) our subsequent attempt to examine the properties of the proposed scales and to repeat the scale construction procedures. Our findings call into question the viability of these procedures to create scales that will generalize. We believe that it is important to make these failures known for several reasons: (1) recent work suggests that current publication practices result in many false positive findings being reported [24]; (2) the failure to publish failures to replicate introduces publication bias that will interfere with the ability of future reviews and meta-analyses to accurately summarize evidence [25,26]; and (3) the discovery of fraudulent published findings in psychology has led to an increased call to conduct and publish research that repeats and extends earlier work [27].
1.1. Irvine Minnesota Inventory
The IMI is a comprehensive measure of macro- and micro-level built environment characteristics thought to be linked to physical activity [20]. Trained observers rate 162 built environment characteristics for each road segment (two facing sides of one street block) in a study area. A high degree of inter-rater reliability for these ratings has been demonstrated [21]. While the developers created items that sampled four general domains (Accessibility, Pleasurability, Perceived Safety from Traffic, and Perceived Safety from Crime), no scoring procedures were initially provided to reduce or summarize the large amount of information collected by this procedure.
1.2. Proposed IMI Scales Concerning Physical Activity and Walking (Boarnet et al.)
Boarnet’s team [22] proposed scales of IMI items based on their associations with different physical activity measures. The Twin Cities Walking Study [23] structured the data collection. Thirty-six urban study areas, each measuring 805 m by 805 m, were selected to fit four combinations of residential density and street pattern (high density/small blocks; low density/small blocks; high density/large blocks; and low density/large blocks).
Health data for a 7-day period were obtained from 716 recruited participants (20 living in each of the study areas, less a small number who did not complete the study or who had missing data). The health measures included two measures of Total Physical Activity (one obtained through accelerometer data measuring distance walked over the 7-day period and one obtained by completion of the self-report International Physical Activity Questionnaire (IPAQ) [28]) and two measures of each of Total Walking, Total Walking for Leisure, and Total Walking for Travel (all obtained from the self-report IPAQ and separately from a self-report travel diary).
IMI data were collected by separate observers from a random sample of 20% of the segments in each of the 36 study areas. For each of the 716 participants who supplied health data, environment measures were the means across all observed segments in the study area in which that participant lived for each of the 162 IMI items.
Analysis consisted of examining the relationships between the health measures and the built environment measures. Each IMI item was entered into a separate regression analysis to determine its relationship with each of the physical activity measures (a maximum of 9 times 162 separate regressions if all items had variability). Each regression analysis also included the covariates enumerated in Table 1.
The authors proposed that the items showing significant relationships (defined as p < 0.1) in these analyses could be assembled into separate scales to score the propensity of environments to support each of: Physical Activity, Total Walking, Total Walking for Travel, and Total Walking for Leisure. Two versions of scales for each of these outcome variables were proposed: a moderate version which included items associated with either the IPAQ or the travel diary version of each measure, and a prudent version restricted to the items associated with both the IPAQ and the travel diary version of the measure. An additional scale was proposed out of items associated with accelerometer data. No item weights were provided for assembling items into scales.
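The distinction between the moderate and prudent versions can be illustrated with a minimal sketch. It assumes that the per-item p-values from the separate regressions are already available; the function name `select_items`, the item names, and the p-values below are hypothetical and are not taken from Boarnet et al.

```python
ALPHA = 0.10  # significance threshold described in the text (p < 0.1)

def select_items(p_ipaq, p_diary, alpha=ALPHA):
    """Return (moderate, prudent) item sets from two dicts of per-item p-values."""
    sig_ipaq = {item for item, p in p_ipaq.items() if p < alpha}
    sig_diary = {item for item, p in p_diary.items() if p < alpha}
    moderate = sig_ipaq | sig_diary  # associated with either version of the measure
    prudent = sig_ipaq & sig_diary   # associated with both versions of the measure
    return moderate, prudent

# Hypothetical p-values for four made-up items (not data from either study).
p_ipaq = {"sidewalk": 0.03, "street_trees": 0.20, "bike_lane": 0.08, "graffiti": 0.50}
p_diary = {"sidewalk": 0.07, "street_trees": 0.04, "bike_lane": 0.30, "graffiti": 0.60}

moderate, prudent = select_items(p_ipaq, p_diary)
print(sorted(moderate))
print(sorted(prudent))
```

Under this reading, the prudent set is always a subset of the moderate set, so a prudent scale can contain no more items than its moderate counterpart.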
Table 1. Comparison of covariates used in the current study and Boarnet et al. [22].
Covariate | Current Study | Boarnet et al. |
---|---|---|
Age | In Years | In Years |
Age squared | In Years Squared | In Years Squared |
Children | Dummy variable = 1 if children < 18 years in household, else 0 | Dummy variable = 1 if children < 18 years in household, else 0 |
Married | Dummy variable = 1 if married, else 0 | Dummy variable = 1 if married, else 0 |
Education | Dummy variable = 1 indicating that the respondent has completed some college/university, college/university degree, or graduate/professional degree, else 0 | Three dummy variables indicating highest level of education (some college, college degree, or graduate/professional degree) |
Employment | Dummy variable = 1 if currently employed, else 0 | Dummy variable = 1 if currently employed, else 0 |
Student | Dummy variable = 1 if a student, else 0 | Dummy variable = 1 if a student, else 0 |
Household Income | Dummy variable = 1, indicating the respondent (and their family) may be in straitened financial circumstances | Household income, indicated by 3 dummy variables for annual income in ranges of US$20,000 to US$50,000, US$50,000 to US$80,000, and more than US$80,000 |
Race/Ethnicity | Not Collected | Race/ethnicity, indicated by 3 dummy variables for Black, Asian, and Hispanic |
Drive to work | Not Collected | Dummy variable indicating whether respondent drives to work |
Vehicle | Not Collected | Dummy variable = 1, if vehicle is available to respondent |
Dog | Not Collected | Dummy variable = 1, if dog owner |
1.3. Proposed IMI Scales Concerning Walkability in the Context of Light Rail Transit Use (Werner et al.)
Werner et al. [29] studied whether individuals were more likely to use light rail transit (i.e., walk to a transit stop) if they lived on a “walkable” block in Salt Lake City, Utah. Light Rail Transit (LRT) usage data were collected by survey at two time points from 51 individuals living within 0.5 miles of a new transit station. To measure walkability, independent observers completed the IMI for each segment in the study area on which a participant lived. Scales were derived from the items of the IMI. Beginning with the domains proposed by the IMI authors, Werner and colleagues divided the Accessibility domain into three new sub-domains (Density, Diversity, and Pedestrian Access), renamed the Pleasurability domain (Attractiveness), and retained the Traffic Safety and Crime Safety domains. Next, they re-categorized some items to ensure that each item represented only a single domain. Standard scores were then calculated for each feature and aggregated into scales corresponding to the six domains by averaging the item standard scores. Items in each domain with no rated scores were ignored when calculating these averages. A participant’s built environment scores were the scale scores representing each of the six domains for the segment on which he or she lived.
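The scoring procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names, item names, and ratings are hypothetical, unrated items are represented as `None`, and the population standard deviation is used for standardization because the text does not say which form was applied.

```python
from statistics import mean, pstdev

def standardize(column):
    """z-score one item across segments, leaving unrated (None) entries as None."""
    rated = [v for v in column if v is not None]
    mu, sd = mean(rated), pstdev(rated)  # assumption: population SD
    return [None if v is None else ((v - mu) / sd if sd > 0 else 0.0) for v in column]

def domain_scores(ratings, domain_items):
    """ratings: {item: [rating per segment]}; returns one mean z-score per segment."""
    z = {item: standardize(ratings[item]) for item in domain_items}
    n_segments = len(next(iter(z.values())))
    scores = []
    for s in range(n_segments):
        # Only items that were rated on this segment enter the average.
        available = [z[item][s] for item in domain_items if z[item][s] is not None]
        scores.append(mean(available) if available else None)
    return scores

# Hypothetical ratings for two made-up items on four segments.
ratings = {
    "mixed_use": [0, 1, 1, 0],
    "retail": [1, None, 3, 0],  # not rated on the second segment
}
diversity = domain_scores(ratings, ["mixed_use", "retail"])
print(diversity)
```

On the second segment only one item is rated, so the domain score is simply that item's standard score. Segments therefore differ in how many items contribute to their score.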
ANCOVA analyses were conducted to determine if the derived IMI scales differentiated non-users, new users, and continuing LRT users. Positive relationships were reported between LRT usage and Diversity (p < 0.05), Safety from Crime (p < 0.05) and Residential Density (p < 0.1).
1.4. Community Health and the Built Environment (CHBE) Project
The CHBE project [30] sought to uncover opportunities in four communities in Alberta, Canada, for promoting physical activity and healthy eating by overcoming barriers in the built environment while working directly with diverse communities to act on these opportunities. The process included expert assessment of the built environment, but went beyond this activity to share this information with a Community Working Group that included representatives from each community and the CHBE research team. This Working Group then jointly planned, managed, and evaluated interventions. It was hoped that the Working Group members would disseminate their enhanced understandings through their regular roles within the community, and also generate sufficient support for the process to ensure its sustainability after the research project concluded. As part of this project, built environment data were collected using an adapted version of the IMI.
Individual self-reported health survey data from a computer-assisted, random-digit-dial phone survey were made available for the current analyses by the Healthy Alberta Communities (HAC) project [31], an earlier independent study that examined the effect of a number of community interventions on community obesity rates in the same Alberta communities. This linked dataset provided an opportunity to examine the findings reported by Boarnet et al. [22] and the scale construction procedures reported both by Boarnet et al. and by Werner et al. [29]. The study received ethical clearance from the Health Research Ethics Board (Panel B), University of Alberta.
4. Discussion
This study failed to support the hypotheses of Boarnet et al. [22] and failed to support the viability of scales created using the methods of either Boarnet et al. [22] or Werner et al. [29] in the current settings. We can offer no evidence that the scales produced by either research group have general value for establishing relationships between built environment features and health behaviors. The reasons relate not only to the lack of consistent relationships between scales derived from the IMI and physical activity measures, but also to general properties of the scales derived by the procedures used by Boarnet et al. and by Werner et al.
It is possible that two or more items, each with very little relationship to the others, may all contribute to the prediction of another variable (such as a health behavior) and therefore may conceivably be combined into a scale used to predict that other variable. The method typically used to establish such a scale would be multiple regression analysis. However, multiple regression analyses generally require large samples in order to be reliable in this task. Rules of thumb for the number of individuals required per item considered range from a low of 10 [40] to 30 or higher [41]. In Boarnet et al.’s [22] analysis, there is a sample of 716 persons on at least 178 IMI items (roughly four persons per item), considerably below what would be considered sufficient to produce replicable findings. However, the situation is much poorer than even this sample size would suggest. Because there were only 36 study areas over which built environment scores were derived by averaging, there were only 36 possible sets of scores for the 716 participants on those environment scales, and as a result, the sample size is effectively only 36. This very small effective sample size helps to explain why Boarnet et al. needed to conduct their analyses one IMI variable at a time rather than including many items simultaneously in their regression analyses. However, this has resulted in a very large number of analyses, each of which is reported at a “relaxed” level of statistical significance. For a large number of such tests (i.e., the multiple comparison problem), conventional wisdom is that significance levels should be substantially tightened rather than relaxed [42]. Altogether, it should come as no shock that the scales suggested by Boarnet et al. perform poorly in another setting.
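The scale of the multiple comparison problem can be illustrated with simple arithmetic. The sketch below assumes the maximum of 9 times 162 regressions mentioned earlier, and it assumes for illustration that no item is truly related to any outcome, so that each test has a 0.1 chance of appearing significant. The Bonferroni adjustment shown is one common correction, used here only as an example of what "tightening" could mean.

```python
n_outcomes = 9   # maximum number of physical activity measures stated in the text
n_items = 162    # IMI items
alpha = 0.10     # per-test threshold used for item selection

n_tests = n_outcomes * n_items

# If every null hypothesis were true, this many tests would be expected
# to reach p < alpha by chance alone.
expected_false_positives = n_tests * alpha

# Bonferroni-adjusted per-test threshold that keeps the family-wise
# error rate at alpha (an illustrative choice of correction).
bonferroni_alpha = alpha / n_tests

print(n_tests)
print(round(expected_false_positives, 1))
print(f"{bonferroni_alpha:.2e}")
```

Under these assumptions, well over one hundred item-outcome pairs would be flagged as significant by chance, which is consistent with the poor generalization discussed above.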
In addition, all of the proposed scales examined here showed low alpha coefficients. That is, the collections of items that are aggregated into single scales do not intercorrelate very highly, and therefore there is minimal evidence that they are measuring anything in common. The scales proposed by Werner et al. [29] combine items classified into domains related to particular abstract properties of the built environment. If such domain classifications were accurate, then the items that measure a single property should intercorrelate, and, in turn, aggregating them into scales should result in scales with high(er) internal consistency coefficients. Werner et al. do not calculate internal consistency coefficients. However, it would have been possible for Werner et al. to report alpha coefficients even though there were many items that could not be scored for particular segments. In fact the scoring procedure that they used (for each individual, averaging the standard scores for items which did have a score) is formally equivalent to assigning that mean to each item that could not be scored. When this is explicitly done, alpha coefficients can be calculated as we have demonstrated above.
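The equivalence described above, and the resulting alpha calculation, can be sketched in a few lines. This is a minimal illustration rather than the analysis code used in this study: the function names and the z-scores are hypothetical, unscored items are represented as `None`, and each row is assumed to have at least one scored item.

```python
from statistics import mean, variance

def impute_row_mean(row):
    """Replace unscored (None) items with the mean of that row's scored items."""
    scored = [v for v in row if v is not None]
    fill = mean(scored)
    return [fill if v is None else v for v in row]

def cronbach_alpha(rows):
    """Coefficient alpha for complete data; rows are per-segment lists of item scores."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical item z-scores for four segments and three items.
raw = [
    [-1.0, -0.5, None],
    [0.2, None, 0.4],
    [1.1, 0.9, 0.7],
    [-0.3, -0.4, -1.1],
]
complete = [impute_row_mean(row) for row in raw]

# Imputing the row mean leaves each scale score (the row mean) unchanged.
for original, filled in zip(raw, complete):
    scored = [v for v in original if v is not None]
    assert abs(mean(filled) - mean(scored)) < 1e-12

alpha = cronbach_alpha(complete)
print(round(alpha, 3))
```

Because the imputed value equals the mean of the scored items, the scale score for each segment is identical before and after imputation. Only the item-level variances and covariances that enter alpha are affected.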
These low alpha coefficients suggest that the properties in the proposed classifications of items are not well represented by these scales. This may be because of a very large number of very narrow items, and a small number of very broad concepts. One way to proceed in attempting to develop scales for the IMI would be to look for a larger number of intermediate properties within each domain and seek items that intercorrelate with each other to form into scales. If such can be discovered, they would have high(er) internal consistency and should be easier to validate. For example, preliminary work suggests that IMI items indicating the presence of green spaces are associated with the presence of schools, traffic calming devices, and controlled crossings. Such a cluster of properties might in turn be associated with greater physical activity among individuals living close to it. Our team is currently engaged in attempting to form such scales from the items of the IMI using methodologies that have long been used for this purpose in psychometrics [42].
For policy and planning activities, successful scales would be able to locate clusters of interlocking properties, or their absence, that might support particular outcomes. This does not mean that the value of individual items, such as pinpointing the location of particular environmental features, would be lost; rather, an appropriate contextualization of such specific features might be encouraged in efforts to establish successful policies to modify the built environment.
Strengths and Limitations
The current study was based upon a substantially larger data set than either the Boarnet et al. or the Werner et al. studies. It included IMI data on all segments rather than a sample of segments and thus avoided sampling error in the environmental ratings. It also included information from two differing urban environments which together provided a larger number of study areas than were available in the Boarnet et al. study.
However, the current study did not have physical activity outcome measures exactly comparable to those of the Boarnet et al. or the Werner et al. studies. In particular, the current study did not have direct behavioral (accelerometer) data as the Boarnet et al. study did. As a result, we were unable to examine whether the specific scales that Boarnet et al. formed to predict these data were replicable. However, we were clearly unable to replicate the recommended Boarnet et al. scales to predict self-report walking data, even where our measures differed primarily in the span of time over which the self-report measures were reported. Similarly, we did not have an outcome variable comparable to the LRT usage variable in the Werner et al. study, and it remains possible that their procedures would work for their outcome variable in future studies. Nevertheless, since the scales that they propose were general scales that together include the majority of IMI items, we believe that they should show internal consistency as measured by coefficient alpha, and that they should be sufficiently broad to also correlate with related outcome variables such as the ones used here. This property of scales is generally taken to be an important part of the external validity of the underlying concept putatively being measured by the scale [43].