Relating Built Environment to Physical Activity: Two Failures to Validate

The Irvine-Minnesota Inventory (IMI) is an audit tool used to record properties of built environments. It was designed to explore the relationships between environmental features and physical activity. As published, the IMI does not provide scoring to support this use. Two papers have since been published recommending methods to form scales from IMI items. This study examined these scoring procedures in new settings. IMI data were collected in two urban settings in Alberta in 2008. Scale scores were calculated using the methods presented in previous papers and used to test whether the relationships between IMI scales and walking behaviors were consistent with previously reported results. The scales from previous work did not show expected relationships with walking behavior. The scale construction techniques from previous work were repeated but scales formed in this way showed little similarity to previous scales. The IMI has great potential to contribute to understanding relationships between built environment and physical activity. However, constructing reliable and valid scales from IMI items will require further research.


Introduction
Obesity and related chronic diseases are recognized as a global population health challenge [1,2]. Calls have been made to move beyond behavioral risk factors such as physical activity levels for these conditions to consider associated environmental, political, economic, and social determinants in populations [3,4], including the effects of built environment features on energy balance behaviors and active living [5]. Any call to develop and implement public health and urban planning policies to address these relationships [6,7] may be premature, however. Some built environment features (e.g., land use mix, intersection density, and recreational facilities) have been shown to be related to some individual health outcomes (e.g., body mass index, physical activity, healthy eating) [8][9][10][11][12], although not consistently [13][14][15][16].
A recent review of tools used to assess environments for physical activity [15] distinguishes two general types: (1) self-reports of perceptions of the environment typically completed by participants in studies that also record information about physical activity information; and (2) instruments that are used to independently assess the environment for particular properties. This second category includes both GIS-based measures which use existing administrative data as a basis for the formation of measures, and observational measures or community audits that involve the direct observation of features of the built environment thought to be relevant to physical activity. The reviewers call for additional research with all tools to establish relationships between characteristics of the built environment and physical activity, as well as the relationships between different ways of assessing the built environment. Such research is important to establishing an evidence base to support specific improvements to the built environment that will result in improvements in active living and physical activity.
Our work has focused upon the objective assessment of the micro features of the built environment, in part because we wished to form a database of this information for future use by community partners. Databases of the spatial distribution of individual features of the built environment have many potential uses, including the secondary identification of specific local barriers to, and assets for, physical activity as part of community planning [17][18][19]. We chose to use the Irvine Minnesota Inventory (IMI) [20,21] because it is the most comprehensive of the community audit tools.
Two recent papers [22,23] have reported relationships between individual health behaviors and scales developed by aggregating IMI items. This paper reports our unsuccessful attempts to (1) replicate these findings using similar measures in a two different locations; and (2) our subsequent attempt to examine the properties of the proposed scales and to repeat the scale construction procedures. Our findings call into question the viability of these procedures to create scales that will generalize. We believe that it is important to make these failures known for several reasons: (1) recent work suggests that current publication practices result in many false positive findings being reported [24]; (2) the failure to publish failures to replicate introduces publication bias that will interfere with the ability of future reviews and meta-analyses to accurately summarize evidence [25,26]; and (3), the discovery of fraudulent published findings in psychology have led to an increased call to conduct and publish research which repeats and extends [27].

Irvine Minnesota Inventory
The IMI is a comprehensive measure of macro-and micro-level built environment characteristics thought to be linked to physical activity [20]. Trained observers rate 162 built environment characteristics for each road segment (two facing sides of one street block) in a study area. A high degree of inter-rater reliability for these ratings has been demonstrated [21]. While the developers created items that sampled four general domains (Accessibility, Pleasurability, Perceived Safety from Traffic, and Perceived Safety from Crime), no scoring procedures were initially provided to reduce or summarize the large amount of information collected by this procedure.

Proposed IMI Scales Concerning Physical Activity and Walking (Boarnet et al.)
Boarnet's team [22] proposed scales of IMI items based on their associations with different physical activity measures. The Twin Cities Walking Study [23] structured the data collection. Thirty-six urban study areas, each measuring 805 m by 805 m, were selected to fit four combinations of residential density and street pattern (high density/small blocks; low density/small blocks; high density/large blocks; and low density/large blocks).
Health data for a 7 day period were obtained from 716 recruited participants (20 living in each of the study areas with a small number who did not complete the study or who had missing data). The health measures included two measures of Total Physical Activity (one obtained through accelerometer data measuring distance walked over the seven day period and one obtained by completion of the self-report International Physical Activity Questionnaire (IPAQ) [28]) and two measures of each of Total Walking, Total Walking for Leisure and Total Walking for Travel (all obtained from the self-report IPAQ and separately from a self-report travel diary). IMI data were collected by separate observers from a random sample of 20% of the segments in each of the 36 study areas. For each of the 716 participants who supplied health data, environment measures were the means across all observed segments in the study area in which that participant lived for each of the 162 IMI items.
Analysis consisted of examining the relationships between the health measures and the built environment measures. Each IMI item was entered into a separate regression analysis to determine its relationship with each of the physical activity measures (a maximum of 9 times 162 separate regressions if all items had variability). Each regression analysis also included the covariates enumerated in Table 1.
The authors proposed that the items showing significant relationships (defined as p < 0.1) in these analyses could be assembled into separate scales to score the propensity of environments to support each of: Physical Activity, Total Walking, Total Walking for Travel, and Total Walking for Leisure. Two versions of scales for each of these outcome variables were proposed: a moderate version which included items associated with either the IPAQ or the travel diary version of each measure, and a prudent version restricted to the items associated with both the IPAQ and the travel diary version of the measure. An additional scale was proposed out of items associated with accelerometer data. No item weights were provided for assembling items into scales. Werner et al. [29] studied whether individuals were more likely to use light rail transit (i.e., walk to a transit stop) if they lived on a "walkable" block in Salt Lake City, Utah. Light Rail Transit (LRT) usage data were collected by survey at two times points from 51 individuals living within 0.5 miles of a new transit station. To measure walkability, independent observers completed the IMI for each segment in the study area on which a participant lived. Scales were derived from the items of the IMI. Beginning with the domains proposed by the IMI authors, Werner and colleagues divided the Accessibility domain into three new sub-domains (Density, Diversity, and Pedestrian Access), renamed the Pleasurability domain (Attractiveness), and retained the Traffic Safety and Crime Safety domains. Next, they re-categorized some items to ensure that items represented only a single domain. Then standard scores were calculated for each feature, and aggregated into scales corresponding to the six domains. This was accomplished by averaging the item standard scores. Items in each domain with no rated scores were ignored when calculating these averages. A participant's built environment scores were the scale scores representing each of the six domains for the segment on which he or she lived. ANCOVA analyses were conducted to determine if the derived IMI scales differentiated non-users, new users, and continuing LRT users. Positive relationships were reported between LRT usage and Diversity (p < 0.05), Safety from Crime (p < 0.05) and Residential Density (p < 0.1).

Community Health and the Built Environment (CHBE) Project
The CHBE project [30] sought to uncover opportunities in four communities in Alberta, Canada, for promoting physical activity and healthy eating by overcoming barriers in the built environment while working directly with diverse communities to act on these opportunities. The process included expert assessment of the built environment, but went beyond this activity to share this information with a Community Working Group that included representatives from each community and the CHBE research team. This Working Group then jointly planned, managed, and evaluated interventions. It was hoped that the Working Group members would disseminate their enhanced understandings through their regular roles within the community, and also generate sufficient support for the process to insure its sustainability after the research project concluded. As part of this project, built environment data was collected using an adapted version of the IMI.
Individual self-reported health survey data from a computer-assisted, random-digit-dial phone survey were made available for the current analyses by the Healthy Alberta Communities (HAC) project [31], an earlier independent study that examined the effect of a number of community interventions on community obesity rates in the same Alberta communities. This linked dataset provided an opportunity to examine the findings reported by Boarnet et al. [22] and the scale construction procedures reported both by Boarnet et al. and by Werner et al. [29]. The study received ethical clearance from the Health Research Ethics Board (Panel B), University of Alberta.

Built Environment Measurement
In June 2008, three observers attended a 3-day training session on the administration of an adapted version of the IMI. The IMI remained intact, but the CHBE-modified tool [30] included additional items from the Systematic Pedestrian and Cycling Environmental Scan [32] and the Pedestrian Environment Data Scan [33]. In addition, a separate IMI rating was performed on each side of the road constituting a segment in the original IMI. Finally, all segments from all four communities were rated to provide a complete database for future use. Ratings were registered on a Motorola MC35 handheld computer running CyberTrack software (CyberTracker, v3.129, CyberTracker, Cape Town, South Africa). A GPS reading was taken at the mid-point of each segment.
In total, observers documented 3,786 segments in four communities including two smaller rural communities in North East Alberta. All four communities had previously been chosen as representative sites for health promotion interventions for the Healthy Alberta Communities project [31] by government funders. Because the smaller communities differed markedly, only the data from the 3,195 segments in the two larger urban settings were used in the current work. The North Central Edmonton community (population: 41,026) [34] comprises 11 neighborhoods in Edmonton, Alberta (see Figure 1) and is characterized as an inner-city area. It is a relatively homogenous environment built on a grid pattern. Medicine Hat (population: 61,097) [35] is a city in southern Alberta that includes a wide range of built environments from downtown to suburban spaces (see Figure 2).

Health Survey Data
Individual self-reported health data were made available from the HAC project [31]. Measures included self-report physical activity variables for the 2,042 participants in the two urban communities examined in the current analyses (780 from North Central Edmonton and 1,262 from Medicine Hat that could be geocoded (GeoPinpoint, v6.4) to a segment within the community). (HAC participants could optionally provide an address, but not all chose to do so. Analysis suggested no differences between these groups on the physical activity measures employed here).
Physical activity questions asked in the HAC survey were taken from the Canadian Community Health Survey (CCHS) Cycle 3.1. [36,37] and had, in turn, been based on the IPAQ [28]. One difference is that CCHS data were collected for a 3-month recall period. Measures of Total Walking, Total Walking for Leisure, and Total Walking for Travel comparable to the self-report measures calculated by Boarnet et al. [22] could be calculated. Instructions are available for calculating Total Walking for Leisure, i.e., multiplying daily frequency of walking for leisure by a categorical variable indicating the normal duration of walking for leisure. This number is then multiplied by a factor which expresses the metabolic energy cost as a multiple of the resting metabolic rate (MET). Instructions were not provided for calculating Total Walking for Travel in comparable units. As the Walking for Travel question requires a categorical response indicating the amount of time spent walking for travel in the past week, creating values for the categorical ranges of times allows for the same Total Walking for Travel variable as was calculated for Total Walking for Leisure. The following values were assigned: none = 0; <1 h = 0.5; 1-5 h = 3; 6-10 h = 8; 11-20 h = 15.5; and >20 h = 22.5. Thus, using the calculation described for Total Walking for Leisure, Total Walking for Travel was calculated by multiplying the average number of hours spent walking for travel per day by the MET value for walking. The Total Walking variable was then calculated as the sum of Total Walking for Leisure and Total Walking for Travel.

Repeating Proposed IMI Scales Concerning Physical Activity and Walking (Boarnet et al.)
Data were re-coded and analyzed to approximate the methods described by Boarnet et al. [22]. Applying an 805 m × 805 m grid to CHBE data (ArcGIS Desktop: Release 10, Redlands, CA, USA), 50 study areas were created for Medicine Hat and 16 study areas were created for North Central Edmonton. By way of illustration Figure 3 shows the study areas and distribution of respondents for Medicine Hat.
Built environment feature scores were the average scores for the individual features across observed segments within the individual's study area. However, 100% of the segments inside the study area were used (rather than 20%). As accelerometer and travel diary information were not collected by HAC, only the Boarnet et al. scales based on self-reported physical activity survey data (i.e., Total Walking, Total Walking for Leisure, or Total Walking for Travel) could be examined. Averages were taken for all IMI items proposed for the moderate scales by Boarnet et al. to represent each scale. Internal consistency of the items in these scales was assessed by calculating Cronbach's alpha [38]. To repeat the Boarnet et al. scale construction procedures, regression analyses were performed separately in each of our two communities to determine the relationship between the three physical activity scales (Total Walking, Walking for Leisure, and Walking for Travel) and each of the IMI items as corrected for the variables in Table 1. The overlap between the scales proposed by Boarnet et al. and the items suggested by these analyses were examined.

Generalizing Proposed IMI scales Concerning Walkability in the Context of Light Rail Transit Use (Werner et al.)
While the focus of Werner et al. [29] was on walking to LRT stations, the current analysis applied their derived scales to the more general walking behaviors: Total Walking, Walking for Leisure, and Walking for Travel. As in the Werner et al. analysis, each participant's six scale scores (Density, Diversity, Pedestrian Access, Attractiveness, Safety from Traffic, Safety from Crime) were based on the average of standardized scores on the street segment of their residence for each variable in the scale as specified by Werner et al. Multiple linear regression was used to determine if a relationship existed between the built environment scales and each walking behavior measure. As the walking behaviors reported here are continuous variables, multiple linear regression was used in place of ANCOVA. Analyses were corrected for age and gender, in addition to education. Finally, internal consistency of the six IMI derived scales was examined by calculating Cronbach's alpha. Table 2 summarizes the relationships between the walking behavior scales from the HAC data and the scales calculated according to the recommendations of Boarnet et al. None of the correlation coefficients met conventional levels of statistical significance. These results did not change when the relationships were examined by regression analyses that corrected for the variables in Table 1.  Table 3 shows the internal consistency of the Boarnet et al. recommended scales, as measured by Cronbach's alpha. Values range from below 0 (which indicates that on average the items have negative correlations with each other) to 0.608. Common rules of thumb for internal consistency place the limit of acceptability for a scale at 0.7 or greater, but suggest 0.8 or higher for a scale's routine use [39]. Table 4 summarizes how the items recommended by Boarnet et al. fared in the current data in predicting the scales to which they had been assigned, based on a level of significance of p < 0.10. Only a very few of these items show relationships here.  Table 5 summarizes the scales that would have been constructed by replicating the procedures of Boarnet et al. [22]. Note the very small number of common scale items across the two CHBE settings. Most of the items that would have been included in scales for either CHBE setting were not included in Boarnet et al. scales.

Results of Generalizing Proposed IMI Scales Concerning Walkability in the Context of Light Rail Transit Use (Werner et al.)
A comparison of the relationships between domains of the built environment and health behaviors is presented in Table 6. For the Werner results, the level of significance is reported for relationships between LRT use and the independent variables. Standardized beta coefficients for the regression analyses and level of significance are presented for results generated from CHBE/HAC data. In Medicine Hat, a statistically significant relationship was discovered between the Attractiveness scale and the Walking for Leisure outcome. In North Central Edmonton, statistically significant relationships were discovered between the Crime Safety variable and two health outcomes: Total Walking and Walking for Travel. A consistent pattern of significance was not found for any relationships between built environment domains and physical activity behaviors across all three settings. The Diversity scale is found to have a significant relationship with walking behaviors only by Werner. Crime Safety was found to have significant relationship with walking behaviors by Werner, and in CHBE's North Central Edmonton setting. In North Central Edmonton, this relationship is negative, suggesting less Total Walking and less Walking for Travel in areas considered safer from crime. The Attractiveness scale is an indicator of any walking behaviors only in Medicine Hat.

Discussion
This study failed to support the hypotheses of Boarnet et al. [22] and failed to support the viability of scales created using the methods of either Boarnet et al. [22] or Werner et al. [29] in the current settings. We can offer no evidence that the scales produced by either research group have general value for establishing relationships between built environment features and health behaviors. The reasons relate not only to the lack of consistent relationships between scales derived from the IMI and physical activity measures, but also to general properties of the scales derived by the procedures used by Boarnet et al. and by Werner et al. It is possible that two or more items, each with very little relationship to each other, may both contribute to the prediction of another variable (such as a health behavior) and therefore may conceivably be combined into a scale to use to predict that other variable. The method typically used to establish such a scale would be multiple regression analysis. However, multiple regression analyses generally require large samples in order to be reliable in this task. Rules of thumb for the number of individuals required per item considered range from a low of 10 [40] to 30 or higher [41]. In Boarnet et al.'s [22] analysis, there is a sample of 716 persons on at least 178 IMI items, considerably below what would be considered sufficient to produce replicable findings. However, the situation is much poorer than even this sample size would suggest. Because there were only 36 study areas over which built environment scores were derived by averaging, there were only 36 possible sets of scores for the 716 participants on those environment scales, and as a result, the sample size is effectively only 36. This very small effective sample size helps to explain why Boarnet et al. needed to conduct their analyses one IMI variable at a time rather than including many items simultaneously in their regression analyses. However, this has resulted in a very large number of analyses, each of which is reported at a "relaxed" level of statistical significance. For a large number of such tests (i.e., the multiple comparison problem), conventional wisdom is that significance levels should be substantially tightened rather than relaxed [42]. Altogether it should come as no shock that the scales suggested by Boarnet et al. perform poorly in another setting.
In addition, all of the proposed scales examined here showed low alpha coefficients. That is, the collections of items that are aggregated into single scales do not intercorrelate very highly, and therefore there is minimal evidence that they are measuring anything in common. The scales proposed by Werner et al. [29] combine items classified into domains related to particular abstract properties of the built environment. If such domain classifications were accurate, then the items that measure a single property should intercorrelate, and, in turn, aggregating them into scales should result in scales with high(er) internal consistency coefficients. Werner et al. do not calculate internal consistency coefficients. However, it would have been possible for Werner et al. to report alpha coefficients even though there were many items that could not be scored for particular segments. In fact the scoring procedure that they used (for each individual, averaging the standard scores for items which did have a score) is formally equivalent to assigning that mean to each item that could not be scored. When this is explicitly done, alpha coefficients can be calculated as we have demonstrated above.
These low alpha coefficients suggest that the properties in the proposed classifications of items are not well represented by these scales. This may be because of a very large number of very narrow items, and a small number of very broad concepts. One way to proceed in attempting to develop scales for the IMI would be to look for a larger number of intermediate properties within each domain and seek items that intercorrelate with each other to form into scales. If such can be discovered, they would have high(er) internal consistency and should be easier to validate. For example, preliminary work suggests that IMI items indicating the presence of green spaces are associated with the presence of schools, traffic calming devices, and controlled crossings. Such a cluster of properties might in turn be associated with greater physical activity among individuals living close to it. Our team is currently engaged in attempting to form such scales from the items of the IMI using methodologies that have long been used for this purpose in psychometrics [42].
For policy and planning activities, successful scales would be able to locate clusters of interlocking properties or their absence that might support particular outcomes. This does not mean that the potential of individual items such as locating precisely the location of particular environmental features would be lost; rather an appropriate contextualization of such specific features might be encouraged in efforts to establish successful policies to modify the built environment.

Strengths and Limitations
The current study was based upon a substantially larger data set than either the Boarnet et al. or the Werner et al. studies. It included IMI data on all segments rather than a sample of segments and thus avoided sampling error in the environmental ratings. It also included information from two differing urban environments which together provided a larger number of study areas than were available in the Boarnet et al. study.
However, the current study did not have exactly comparable physical activity outcome measures to the Boarnet et al. or the Werner et al. studies. Thus the current study did not have direct behavioral (accelerometer) data as the Boarnet et al. study did. As result, we were unable to examine whether the specific scales that Boarnet et al. formed to predict this data were replicable. However, we were clearly unable to replicate the recommended Boarnet et al. scales to predict self-report walking data, even where our measures differed primarily in the span of time over which the self-report measures were reported. Similarly, we did not have an outcome variable comparable to the LRT usage variable in Werner et al. study, and it remains possible that their procedures would work for their outcome variable in future studies. Nevertheless since the scales that they propose were general scales that together include the majority of IMI items, we believe that they should show internal consistency as measured by coefficient alpha, and that they should be sufficiently broad to also correlate with related outcome variables such as the ones used here. This property of scales is generally taken to be an important part of the external validity of the underlying concept putatively being measured by the scale [43].

Conclusions
By failing to validate the findings or procedures of other research groups, this paper demonstrated that no acceptable scoring scheme has yet been developed for the IMI. The authors are currently applying psychometric methods to create such a scoring scheme. Until a better scoring scheme is available, any generalizations concerning the impact of the built environment on population health using the IMI tool remain premature.