2.1. Data and Variables
The variables tested in the model are shown in Table 1. Descriptive statistics are computed, including percentages, means, and standard deviations for sociodemographic and built environmental variables. Our measure of life expectancy came from a recently released dataset of life expectancies by county [50].
Four mediating (endogenous) variables are posited between sprawl and life expectancy, and a fifth between socioeconomics and life expectancy. The first four are: average county-level vehicle miles traveled (VMT) per household [51], the U.S. Environmental Protection Agency’s (EPA) air quality index (AQI, a combination of six air quality indicators) [52], average county body mass index (BMI) [53], and the violent crime rate (crime) [54]. They relate, respectively, to four causes of premature death: traffic accidents, respiratory illnesses, obesity-related chronic health conditions, and crime and its effects on physical and mental health. The fifth mediating variable is the prevalence of smoking in the population, which has no obvious relationship to sprawl but a strong relationship to socioeconomic status.
County VMT estimates were obtained from the EPA. The EPA used surrogates such as population and roadway miles to allocate statewide total VMT to individual counties. Total VMT was divided by the number of households in each metropolitan county in 2010 to obtain VMT per household [51].
The EPA has estimated AQI at the county level and provides an annual summary of days with good, moderate, unhealthy, and very unhealthy air. The AQI takes all six air pollutant criteria into account: carbon monoxide, nitrogen dioxide, ozone, sulfur dioxide, PM2.5, and PM10. The ratio of unhealthy days to total days is included as a variable in the model [52].
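As a minimal sketch of how such a ratio could be computed from an annual summary of day counts, consider the following. The category keys, the choice to count both "unhealthy" and "very unhealthy" days in the numerator, and the example counts are all assumptions made for illustration; the text does not specify which categories were counted.

```python
def unhealthy_day_ratio(day_counts, unhealthy_categories=("unhealthy", "very_unhealthy")):
    """Share of monitored days whose AQI category is treated as unhealthy.

    `day_counts` maps a category name to a number of days. Which categories
    count as unhealthy is an assumption of this sketch.
    """
    total_days = sum(day_counts.values())
    if total_days == 0:
        raise ValueError("no monitored days for this county")
    unhealthy_days = sum(day_counts.get(cat, 0) for cat in unhealthy_categories)
    return unhealthy_days / total_days


# Hypothetical county-year summary (made-up counts, not EPA data).
example_counts = {"good": 250, "moderate": 100, "unhealthy": 12, "very_unhealthy": 3}
ratio = unhealthy_day_ratio(example_counts)
```

Dividing by the number of monitored days rather than by 365 keeps the ratio comparable across counties with incomplete monitoring, which is one possible design choice among several.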
The BMI data came from the Behavioral Risk Factor Surveillance System (BRFSS), a telephone survey conducted by state health departments and managed by the Centers for Disease Control and Prevention (CDC). More than 350,000 adults are interviewed nationally each year to collect detailed information on health risk behaviors, preventive health practices, and health care access primarily related to chronic disease and injury. The Selected Metropolitan/Micropolitan Area Risk Trends (SMART) project, which is populated with BRFSS data for metropolitan and micropolitan statistical areas with 500 or more respondents, was used in this study [53]. This study used the county average BMI estimated from data for survey years 2007 through 2010.
Crime statistics were obtained from the Uniform Crime Report of the Federal Bureau of Investigation (FBI). The FBI supplies crime data by type (e.g., violent and property) and subtype (e.g., murder, rape, theft) aggregated by county. In this study, the violent crime rate is computed by dividing the total number of violent crimes by the county population in hundreds of thousands, giving violent crimes per 100,000 residents. The violent crime rate is treated as an endogenous variable [54].
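The rate calculation described above can be sketched as follows. The function name and the example figures are hypothetical and are not taken from the FBI data.

```python
def violent_crime_rate(violent_crimes, population):
    """Violent crimes per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Express the population in hundreds of thousands, then divide.
    return violent_crimes / (population / 100_000)


# Hypothetical county: 1,200 violent crimes among 400,000 residents.
rate = violent_crime_rate(violent_crimes=1_200, population=400_000)
```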
Finally, smoking prevalence, the only endogenous variable unrelated to sprawl, has been estimated at the county level by the National Cancer Institute based on combined information from the two major health surveys, the BRFSS and the National Health Interview Survey (NHIS) [55]. The estimates are based on grouped years to provide reasonable sample sizes in each county. This study used the estimates for the most recent time period, 2000–2003. The smoking prevalence variable in this study is “ever smoked”. To be counted as having ever smoked, a person 18 years of age or older must have reported smoking at least 100 cigarettes in their lifetime by the time of interview, in both the BRFSS and NHIS surveys [55]. This study treats smoking as a mediating (endogenous) variable on the pathway between sociodemographic variables and life expectancy.
Exogenous variables came from various sources. From the 2010 Census, we downloaded data on population, households, sex, age, and race/ethnicity, and computed the percentage of the population that is male and the percentage of the population that is white. In this study, the Yost et al. SES index was modified as a measure of socioeconomic status [56,57]. Yost et al. treated SES as a composite factor that combines three generally accepted domains: education, income, and occupation. For this study, we updated Yost’s index using 2010 Census data.
The exogenous variable of greatest interest is the county compactness/sprawl index. This index places urban sprawl at one end of a continuous scale and compact development at the other [59]. The original index, developed in 2002, was updated to 2010 in a recent study [58,60,61]. The updated index incorporates more measures of the built environment than the original index did, and captures four distinct dimensions of sprawl: development density; land use mix; population and employment centering; and street accessibility, which represents the relative connectivity of the street network at the county level. These four dimensions are extracted from multiple correlated variables using principal component analysis, and the first principal component is transformed to an index with a mean of 100 and a standard deviation of 25. The National Institutes of Health website [62] provides detailed information on the methodology, variable names under each dimension, factor loadings (the correlation between a variable and a principal component), eigenvalues (the explanatory power of a single principal component), and percentages of explained variance. This updated index is freely available for 994 counties and county equivalents [62]. The updated index was used as the measure of compactness in this study.
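To make the rescaling step concrete, the sketch below extracts the first principal component from a set of correlated variables and converts it to a scale with mean 100 and standard deviation 25. It illustrates the general technique only and is not the published index construction: the function name, the use of the correlation matrix, the sign convention, and the synthetic data are all assumptions.

```python
import numpy as np


def first_component_index(X, target_mean=100.0, target_sd=25.0):
    """Rescale the first principal component of X to a chosen mean and SD.

    Returns the index values and the share of variance explained by the
    first component.
    """
    X = np.asarray(X, dtype=float)
    # Standardize columns so the PCA operates on the correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(Z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending eigenvalues
    weights = eigenvectors[:, -1]  # eigenvector of the largest eigenvalue
    # An eigenvector's sign is arbitrary; orient it so that, on balance,
    # higher input values give higher scores (an assumed convention).
    if weights.sum() < 0:
        weights = -weights
    scores = Z @ weights
    z_scores = (scores - scores.mean()) / scores.std(ddof=1)
    explained = eigenvalues[-1] / eigenvalues.sum()
    return target_mean + target_sd * z_scores, explained


# Synthetic data: four noisy measurements of one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
X = np.column_stack([latent + 0.5 * rng.normal(size=200) for _ in range(4)])
index, explained = first_component_index(X)
```

On a scale built this way, a county with a score of 125 sits one standard deviation above the average county in the sample.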
2.2. Statistical Analysis
This study used structural equation modeling (SEM) to address associations between life expectancy and urban sprawl. SEM is a “model-centered” methodology that seeks to evaluate theoretically-justified models against data [63,64]. The estimation of SEM models involves solving a set of equations, one for each “response” or “endogenous” variable in the network. Variables that are solely predictors of other variables are termed “influences” or “exogenous” variables.
An SEM for life expectancy was estimated using Amos 19 and maximum likelihood procedures. A total of 606 metropolitan counties with no missing data were included in the analysis. Working with complete data allowed us to compute modification indices, which are computable only when the dataset contains no missing values, and these in turn allowed us to identify missing links in the model. Data were examined for frequency distributions and simple bivariate relationships, especially for linearity. All variables were natural log (ln) transformed to equalize variances and improve linearity.
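The two data-preparation steps, keeping only complete cases and taking natural logs, could look like the following minimal sketch. The array layout and values are invented for illustration. The text does not say how zero values were handled before the log transform, so this sketch simply refuses non-positive inputs.

```python
import numpy as np


def complete_cases_log(data):
    """Drop rows with any missing value, then take natural logs."""
    data = np.asarray(data, dtype=float)
    complete = data[~np.isnan(data).any(axis=1)]
    # ln is undefined for zero or negative values; how the study treated
    # such values is not stated, so this sketch raises instead of guessing.
    if (complete <= 0).any():
        raise ValueError("natural log requires strictly positive values")
    return np.log(complete)


# Hypothetical rows (one per county); columns are arbitrary variables.
raw = np.array([
    [78.1, 21000.0, 27.4],
    [np.nan, 18000.0, 28.0],  # dropped: first value is missing
    [80.3, 15000.0, 26.1],
])
logged = complete_cases_log(raw)
```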
Four plausible mediating pathways connecting sprawl with life expectancy were included. One pathway was through average county VMT, used as a proxy for traffic fatalities in the SEM. A second pathway was through air pollution, measured by the AQI. A third pathway was through obesity, measured as the average BMI. A fourth and final pathway was through the violent crime rate.
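The logic of a single mediating pathway can be illustrated with a small path-analysis sketch. It is a simplification under stated assumptions: it uses ordinary least squares in place of the maximum likelihood estimation performed in Amos, one mediator in place of four, and synthetic data with invented coefficients. The indirect effect is the product of the two path coefficients along the pathway, and the total effect is the direct effect plus the indirect effect.

```python
import numpy as np


def ols_slopes(y, *predictors):
    """Least-squares slopes of y on the predictors (intercept fitted, not returned)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


# Synthetic data with made-up effects (not estimates from the study).
rng = np.random.default_rng(42)
n = 500
compactness = rng.normal(size=n)
mediator = -0.5 * compactness + rng.normal(size=n)
outcome = 0.2 * compactness - 0.3 * mediator + rng.normal(size=n)

(a,) = ols_slopes(mediator, compactness)             # compactness -> mediator
direct, b = ols_slopes(outcome, compactness, mediator)  # direct path; mediator -> outcome
indirect = a * b                                      # effect carried by the mediator
(total,) = ols_slopes(outcome, compactness)           # total effect of compactness
```

With least squares, the total effect from the last regression equals `direct + indirect` up to floating-point error, which is a convenient check on the decomposition.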
This study reports the following measures of fit: the chi-square, the root mean square error of approximation (RMSEA), and the comparative fit index (CFI). This study also reports results as standardized regression coefficients, which represent the change in the outcome, in standard deviations, associated with a one-standard-deviation change in the independent variable.
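For readers unfamiliar with these fit measures, the sketch below computes RMSEA and CFI from chi-square statistics using their commonly cited definitions. It is not the Amos implementation; some software uses N rather than N - 1 in the RMSEA denominator. The chi-square and degrees-of-freedom values in the example are made up and are not results of this study; only the sample size of 606 comes from the text.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 convention)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    if d_baseline == 0.0:
        return 1.0  # neither model shows any misfit beyond its df
    return 1.0 - d_model / d_baseline


# Hypothetical fit statistics for a model estimated on 606 counties.
example_rmsea = rmsea(chi2=30.0, df=20, n=606)
example_cfi = cfi(chi2_model=30.0, df_model=20, chi2_baseline=1020.0, df_baseline=30)
```

Lower RMSEA values and CFI values closer to 1 indicate better fit. A chi-square no larger than its degrees of freedom yields an RMSEA of 0 and a CFI of 1 under these definitions.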