In 2014, the U.S. Public Health Leadership Forum proposed that local and state health departments act as the Community Chief Health Strategist [1]. One of the practice recommendations in the report specifically calls for analysis and translation of large, real-time data sets to identify trends and hot spots. A key step toward achieving this recommendation involves leveraging, in new ways, data already routinely available to health departments as secondary data sources, namely birth and death certificate data. The overarching objective of this undertaking is to identify trends in health outcomes in a population, along with the associated socio-economic determinants, so that adequate and efficient interventions can be developed to enhance public health. This is of particular relevance for the resilience of urban regions, where inter-generational and cross-cultural population dynamics challenge standards and practices in healthy cities.
Increasing the availability of standardized sub-county data is important for improving our understanding of public health in urban areas. Geographic variation in health factors and outcomes at the small-area level, including zip codes and census tracts, has been noted in many contexts [2]. Moreover, small area analysis has become an important tool in the effective targeting of limited public health resources [3]. Although spatial data analytics can detect clustering of events in small geographies across an urban region, data on health conditions in small geographies may be very sparse, which can bias cross-sectional analysis through small-n designs. The challenge this creates for public health researchers and practitioners is to discern longitudinal trends from noise associated with low frequencies. Thus, given the criticality of small-area public health analysis and the constraints on access to relevant public health data, the purpose of our analysis is to present and demonstrate a robust research design to (a) study the longitudinal stability of spatial clustering with small case numbers per census tract and (b) assess the clustering changes over time across the urban environment to better inform public health policy making at the community level.
To this end, we use a case study of Mecklenburg County, North Carolina, to demonstrate that temporal windowing may effectively smooth out noise, enhance the cross-sectional validity of results, and allow us to trace longitudinal trends in spatial clusters of hypertension during pregnancy compiled from 2011–2014 birth certificates. If applied in an ongoing fashion, this approach would provide an important tool for targeting limited public health resources. We argue that this analysis enables greater efficiency in public health departments, while leveraging existing data and preserving citizens’ personal privacy.
Local health departments are increasingly adopting geographic information system (GIS) software for epidemiological analyses [4] and to build analytic capacity [6]. Historically, the geographic unit of analysis most often used has been the county [7] or zip code [10], often due to the availability of geography-related data. Using the county as the aggregation unit, however, precludes identifying specific high and low risk locales within the county, which may covary with the socioeconomics of the local population, environmental exposure, or geographic access to health care services. Although analysis of within-county units, such as zip codes, has been used with publicly available data, analysis at smaller geographic units, such as city blocks or census tracts, has been used less frequently due to concerns about individual privacy and confidentiality [11]. Using the smallest feasible unit of analysis could uncover more distinct or isolated high and low risk locations in need of public health attention. As local health departments seek to act as Community Chief Health Strategists, public health administrators will want geospatially informed analysis.
Geospatial analysis begins by assigning a location to each case [12], while balancing considerations of accuracy and anonymity. Street addresses act as the finest level of analysis, yielding the most detailed results. Aggregating cases into larger spatial units, such as zip codes or counties, hides valuable details and reduces variance in the data [13]. In other words, the spatial unit acts as a proxy for individual cases, resulting in a positional discrepancy between points on a map and true home locations [14]. The degree of location accuracy becomes important when planning local interventions or when proximity of cases to a source of exposure needs to be determined [15]. Using the most accurate location provides health program planners with evidence to more reliably identify where increases in resources or interventions are needed [16].
Accuracy in location, however, can lead to cases being identifiable, particularly in small spatial units [17]. The identifiability of cases has been noted as a potential or actual issue of concern in reproductive health [18], birth defects [19], diet [20], environmental health [21], social care planning [22], and geo-privacy studies [23]. The U.S. Privacy Rule of the Health Insurance Portability and Accountability Act (HIPAA) [24] requires that disclosed health information be restricted to the minimum necessary to satisfy its intended purpose. Data are considered de-identified in accordance with the HIPAA Privacy Rule if the data do not “identify an individual and if the covered entity has no reasonable basis to believe it can be used to identify an individual” [25]. De-identification of geographic information is accomplished by aggregating geographic identifiers to large-population area-based units or applying statistical principles to render information not individually identifiable [26]. However, there is no universal standard for “adequate confidentiality protection” or “acceptable risk” [27].
Thus, selecting an approach to reduce the probability of identifying individuals, while preserving the characteristics of the geographic data for valid inference, depends in part on the nature of the data, acceptable confidentiality risk, and current and future use of the data [28].
Public health administrators and researchers should be aware of the analytic approach used to identify high and low risk areas. Research using census tracts identifies fairly specific spatial clusters of high rates (hot spots) of adverse health conditions, and conversely, spatial clusters of low rates (cold spots) of that condition [29]. Whether a hot spot is statistically significant (that is, unlikely to have occurred by chance) can be determined through various approaches of spatial statistics [30]. The health geography literature has evolved towards fully recognizing the scientific merit of exploratory analysis [31], and cluster detection in particular. Open source geospatial software (e.g., GeoDa, PySAL, R code libraries) makes geographic statistical methods more accessible for targeting locations for interventions [32].
The United States Patient Protection and Affordable Care Act requirement that health care organizations conduct community needs assessments [33] has led to new collaborations between health departments and health care organizations, including data sharing. Leveraging the value of contemporaneous data from electronic health records requires not only sharing data elements across health organizations, but also overcoming historical and logistical challenges of sharing data among health departments [34]. The geographic mapping of any health condition is a cogent framework to comply with legal requirements because it assumes that the health condition has an underlying spatial pattern [35] and is well positioned to capitalize on the extensive toolbox of geospatial methods to identify where underlying spatial patterns exist to direct strategic planning [36].
We argue, however, that using the county level as the spatial unit may smooth out much of the spatial variability conducive to effective strategic planning for certain health conditions. Our contention, therefore, is that aggregating confidential health data temporally and spatially yields results that are stable at the census tract level. In particular, we study the use of temporal moving windows spanning multiple years to reduce uncertainty in prevalence rates resulting from small counts in small geographies. Windowing is aimed at detecting the stable spatial patterns embedded in a spatial data series, including possible hot and cold spots, when spatial data series are based on small-n data sets, as is the case for a number of chronic diseases in urban settings. As a corollary, this temporal smoothing reduces the sensitivity of longitudinal analysis to annual fluctuations that could be ascribed to non-systematic causes. Hence, it is anticipated that sensitive temporal windowing smooths out seemingly random effects to reveal longitudinal trends.
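To make the windowing idea concrete, the following is a minimal sketch, not the procedure used in the study. It assumes that each window pools cases and births across its years before the rate is computed (a mean of annual rates would be an alternative), and the function name `windowed_rates`, the tract identifiers, and all counts are hypothetical.

```python
from typing import Dict, List, Optional


def windowed_rates(
    cases: Dict[str, List[int]],
    births: Dict[str, List[int]],
    window: int,
) -> Dict[str, List[Optional[float]]]:
    """Rate per 1,000 births for each moving window of `window` years.

    Assumption: cases and births are pooled within the window before
    dividing, so years with more births carry more weight.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    rates: Dict[str, List[Optional[float]]] = {}
    for tract, yearly_cases in cases.items():
        yearly_births = births[tract]
        if len(yearly_cases) != len(yearly_births):
            raise ValueError(f"length mismatch for tract {tract}")
        series: List[Optional[float]] = []
        for start in range(len(yearly_cases) - window + 1):
            pooled_cases = sum(yearly_cases[start:start + window])
            pooled_births = sum(yearly_births[start:start + window])
            # A tract with no births in the window has no defined rate.
            series.append(
                1000.0 * pooled_cases / pooled_births if pooled_births else None
            )
        rates[tract] = series
    return rates


# Hypothetical counts for two tracts over four years (not study data).
cases = {"tract_A": [2, 6, 1, 5], "tract_B": [4, 4, 5, 3]}
births = {"tract_A": [50, 50, 50, 50], "tract_B": [100, 100, 100, 100]}

annual = windowed_rates(cases, births, window=1)
three_year = windowed_rates(cases, births, window=3)
print(annual["tract_A"])      # year-to-year swings driven by small counts
print(three_year["tract_A"])  # narrower range once three years are pooled
```

With these invented counts, the annual series for the smaller tract swings widely from year to year, while the three-year series varies far less, which is the behavior the windowing argument relies on.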
The geographic variation in prenatal hypertension rates per census tract in Mecklenburg County appears to reveal a non-random spatial pattern in the annual, two-year moving average, and three-year moving average series (Figure 1, Figure 2 and Figure 3). This spatial pattern is formally tested with Moran’s I and LISA statistics. Given the geography of census tracts and the series of prenatal hypertension rates, we find that the distribution of prenatal hypertension exhibits a significantly positive Moran’s I statistic for each of the eight data series tested (Table 2), indicating a level of positive spatial autocorrelation in hypertension across the county. The four categories of hot and cold spot census tracts are seen in the LISA maps of Mecklenburg County (Figure 4, Figure 5 and Figure 6) and the LISA results are documented in Table 3.
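For readers unfamiliar with the global test, the sketch below computes Moran’s I and a permutation-based pseudo p-value in plain Python. It is an illustration under stated assumptions rather than the study’s implementation: weights are binary rook contiguity on a hypothetical 4 × 4 grid of "tracts", the rates are invented, and the helper names `rook_neighbors`, `morans_i`, and `permutation_test` are ours.

```python
import random
from typing import Dict, List, Sequence, Tuple


def rook_neighbors(rows: int, cols: int) -> Dict[int, List[int]]:
    """Neighbors sharing an edge on a regular grid (hypothetical tract layout)."""
    nbrs: Dict[int, List[int]] = {}
    for r in range(rows):
        for c in range(cols):
            nbrs[r * cols + c] = [
                rr * cols + cc
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols
            ]
    return nbrs


def morans_i(values: Sequence[float], neighbors: Dict[int, List[int]]) -> float:
    """Global Moran's I with binary (0/1) contiguity weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    variance_sum = sum(d * d for d in dev)
    if variance_sum == 0:
        raise ValueError("all values are identical; Moran's I is undefined")
    # Cross-products summed over every ordered neighbor pair (i, j).
    cross = sum(dev[i] * dev[j] for i, nbrs in neighbors.items() for j in nbrs)
    total_weight = sum(len(nbrs) for nbrs in neighbors.values())
    return (n / total_weight) * (cross / variance_sum)


def permutation_test(
    values: Sequence[float],
    neighbors: Dict[int, List[int]],
    n_perm: int = 999,
    seed: int = 0,
) -> Tuple[float, float]:
    """Observed I and a one-sided pseudo p-value for positive autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between rate and location
        if morans_i(shuffled, neighbors) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Invented rates: high in the two western columns, low in the two eastern ones.
grid = rook_neighbors(4, 4)
rates = [80.0, 80.0, 20.0, 20.0] * 4
observed_i, p_value = permutation_test(rates, grid)
print(observed_i, p_value)
```

A positive statistic with a small pseudo p-value indicates that similar rates sit next to each other more often than random placement would produce. Packages such as GeoDa commonly row-standardize the weights, which can change the numerical value of the statistic.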
Overall, prenatal hypertension is not randomly distributed across the urban region under study. When we look at neighborhoods within the Mecklenburg County region, we find that hot spots tend to be loosely found in a crescent to the north of the county center throughout the study years. However, hot spots tend to shift to western portions of Mecklenburg County in 2013 and become more prominent in the east in 2014. These areas are known to be associated with the social geography of the region, particularly with a large proportion of African American and Hispanic populations, lower educational attainment, and lower socio-economic status (i.e., higher poverty rates, lower household income, higher unemployment) [57]. Prenatal hypertension cold spots are mainly found in an area fanning out south from the county center as well as in the northern section of the county; this spatial pattern carries through the study period. Cold spots are found in neighborhoods with a large presence of Caucasian populations, higher educational attainment, and higher socio-economic status [58]. Hot and cold spots fall in these areas with greater consistency as temporal windowing is applied to the yearly series, which suggests that this is the discernible trend in prenatal hypertension once noise has been filtered out. Finally, while tracts that are outliers begin to fade away over time, hot and cold spots become more prominent. Hence, when noise caused by small-n annual statistics is controlled for, the same neighborhoods of the city continue to be separated by sharp health disparities epitomized by hot and cold spots in prenatal hypertension.
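The hot spot, cold spot, and outlier categories discussed above come from a local statistic. The following sketch shows one way such labels can be derived; it is not the study’s code. It assumes row-standardized rook contiguity on a hypothetical 6 × 6 grid, a conditional permutation test with a simplified two-sided pseudo p-value based on the absolute value of the local statistic, and a 0.05 cut-off; `lisa_labels` and the invented rates are illustrative only.

```python
import random
from typing import Dict, List, Sequence


def rook_neighbors(rows: int, cols: int) -> Dict[int, List[int]]:
    """Neighbors sharing an edge on a regular grid (hypothetical tract layout)."""
    return {
        r * cols + c: [
            rr * cols + cc
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if 0 <= rr < rows and 0 <= cc < cols
        ]
        for r in range(rows)
        for c in range(cols)
    }


def lisa_labels(
    values: Sequence[float],
    neighbors: Dict[int, List[int]],
    n_perm: int = 999,
    alpha: float = 0.05,
    seed: int = 0,
) -> List[str]:
    """Label each unit by local Moran quadrant, or as not significant."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    if sd == 0:
        raise ValueError("all values are identical; local Moran's I is undefined")
    z = [(v - mean) / sd for v in values]
    rng = random.Random(seed)
    labels: List[str] = []
    for i in range(n):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            labels.append("no neighbors")
            continue
        lag = sum(z[j] for j in nbrs) / len(nbrs)  # mean z-score of neighbors
        local_i = z[i] * lag
        # Conditional permutation: keep unit i fixed, redraw its neighbors.
        others = z[:i] + z[i + 1:]
        extreme = 0
        for _ in range(n_perm):
            draw = rng.sample(others, len(nbrs))
            perm_lag = sum(draw) / len(nbrs)
            if abs(z[i] * perm_lag) >= abs(local_i):
                extreme += 1
        p_value = (extreme + 1) / (n_perm + 1)
        if p_value > alpha:
            labels.append("not significant")
        elif z[i] > 0 and lag > 0:
            labels.append("hot spot (high-high)")
        elif z[i] < 0 and lag < 0:
            labels.append("cold spot (low-low)")
        elif z[i] > 0:
            labels.append("high-low outlier")
        else:
            labels.append("low-high outlier")
    return labels


def demo_rate(r: int, c: int) -> float:
    """Invented rates: high block in one corner, low block in the opposite one."""
    if r < 3 and c < 3:
        return 100.0
    if r >= 3 and c >= 3:
        return 0.0
    return 50.0


rates = [demo_rate(r, c) for r in range(6) for c in range(6)]
labels = lisa_labels(rates, rook_neighbors(6, 6))
print(labels[7], "|", labels[28], "|", labels[5])
```

Running the same labeling on the annual, two-year, and three-year series and comparing labels tract by tract is one simple way to check whether hot and cold spots persist as the window widens.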
Identifying the location of at-risk hot spot communities provides an opportunity to refine place-based health programs in urban regions with the use of temporal windowing. Geospatial analysis makes it feasible for local health departments to analyze their own data while complying with regulations and ethics related to protection of human subjects for public health surveillance and planning purposes. With a few procedures, existing data and publicly available databases can be leveraged by geocoding birth and death certificates to explore those data from different perspectives such that hot spots of census tracts can be pinpointed for further investigation, including for further confirmatory analysis of spatial epidemiology.
In the case of Mecklenburg County, chronic diseases such as hypertension continue to be cited as high priorities in the 2010–2018 Community Health Assessments (CHA) [59]. Although the CHA gives health departments a place to compile health priorities for the county, one major challenge health departments face is understanding the impact of initiatives aimed at reducing health disparities. With the CHA conducted every four years, the county intends to track the overall indicator of women who have a history of one or more chronic diseases. However, Charlotte administrators have not tracked where hypertension hot spots are located, or whether these hot spots shrank or expanded over time.
In Charlotte, communities that have the highest priority in the CHA tend to have less education and income and live in neighborhoods that lack access to healthy food and safe places for recreation. Spatially, a crescent-shaped area of poverty and low educational attainment has formed around the center city of Charlotte. These residents may also be exposed to risk factors that increase their chance of chronic disease. Our results demonstrated that once temporal windowing is used with the hypertension data, especially the two- and three-year windows, hot spot clusters emerged from these traditionally segregated communities in the Charlotte area. The benefit of tracking hypertension in Charlotte with Moran’s I and LISA analysis at the census tract level is that it becomes possible to identify whether health departments achieve their goals from assessment to assessment. Although county-level spatial aggregation continues to be used routinely in most urban areas, this level of analysis precludes this type of surveillance and is rather ineffective at informing decision makers on where to target their efforts and on whether past efforts were successful.
Studying a series of spatial health data through a finer-scale analysis, such as at the neighborhood level, allows a better understanding of local health outcomes and risk factors over time. Incorporating smaller geographies into a GIS can also aid health administrators by adding the location of hospitals and medical facilities to address whether proximity influences the spread of the hot or cold spots. Moreover, U.S. Census and American Community Survey demographic data can be linked to see whether trends correlate with socio-economic conditions. Linking these data into the CHA will require collaboration with data stewards and adequate training of public health practitioners so that the benefits of using these data can be fully realized. In this way, a more tailored approach to surveying health priorities can be undertaken.
Our findings support the view that data spatially aggregated to the tract level increase the capacity to identify local clusters [60]. We also show that data stability and greater consistency in the significant spatial patterns can be obtained through a temporal windowing approach, which is an important achievement for strategic planning when rates are based on small numbers. Another class of models, Bayesian hierarchical models, has also been used to address sparseness in populations and cases, allowing for an adaptive smoothing approach [61]. This can, however, create overly smoothed maps, masking the true risk distribution. The degree of smoothing used is a trade-off between high sensitivity and high specificity [62]. Prior research suggests that a numerator of 20 or more is needed to produce fairly stable estimates [63]. Also, it is well known that denominator data that rely on annual sampling to estimate the at-risk population, such as annual American Community Survey data, are affected by large and spatially variable margins of error across the urban region. While some methods have been developed to provide unbiased estimates of statistics [64], many analyses continue to ignore this important and impactful data uncertainty [66].
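The rule of thumb of at least 20 cases in the numerator can be checked mechanically before rates are mapped. The sketch below is illustrative only: the tract names and counts are invented, `unstable_tracts` and `relative_standard_error` are hypothetical helpers, and the relative standard error formula assumes that case counts behave approximately as Poisson counts, which is our simplifying assumption rather than a claim from the cited literature.

```python
import math
from typing import Dict, List


def relative_standard_error(case_count: int) -> float:
    """Approximate relative standard error of a count, assuming Poisson variation."""
    if case_count <= 0:
        raise ValueError("case_count must be positive")
    return 1.0 / math.sqrt(case_count)


def unstable_tracts(case_counts: Dict[str, int], min_cases: int = 20) -> List[str]:
    """Tracts whose numerator falls below the stability threshold."""
    return sorted(t for t, n in case_counts.items() if n < min_cases)


# Invented case counts for three tracts: one year versus a three-year window.
one_year = {"tract_A": 5, "tract_B": 3, "tract_C": 24}
three_years = {"tract_A": 21, "tract_B": 12, "tract_C": 70}

print(unstable_tracts(one_year))     # tracts flagged on annual data
print(unstable_tracts(three_years))  # fewer tracts flagged after pooling
print(round(relative_standard_error(20), 3))
```

Under the Poisson assumption, a count of 20 corresponds to a relative standard error of roughly 22 percent, which gives a sense of how much uncertainty remains even at the threshold.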
We also acknowledge a key limitation in spatial analytics when dealing with aggregated data, such as census tracts. The possibility exists of having a Modifiable Areal Unit Problem [67], in the form of a scale effect or of a zoning effect. The scale effect arises when different results are attained due to variations in the scale of aggregation units. This implies that using census tracts rather than zip codes, for example, can lead to different findings. The zoning effect occurs when a constant scale of analysis is used with a variation in the shape of aggregation units. This is the situation for census tracts, as well as zip codes and county boundaries. We did not test for scale or zoning effects because the purpose of the study was to document how administrators can use confidential data. Further research is needed to understand which of the commonly used geographic scales is more useful for public health planning and under which conditions.
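A toy calculation can illustrate the zoning effect described above. In this sketch, four hypothetical tracts with invented counts are grouped into two zones in two different ways; the scale (two zones of two tracts each) is held constant and only the zone boundaries change. The function `aggregate_rates` and all names and numbers are assumptions made for illustration.

```python
from typing import Dict


def aggregate_rates(
    cases: Dict[str, int],
    births: Dict[str, int],
    zoning: Dict[str, str],
) -> Dict[str, float]:
    """Rate per 1,000 births for each zone after pooling its member tracts."""
    pooled_cases: Dict[str, int] = {}
    pooled_births: Dict[str, int] = {}
    for tract, zone in zoning.items():
        pooled_cases[zone] = pooled_cases.get(zone, 0) + cases[tract]
        pooled_births[zone] = pooled_births.get(zone, 0) + births[tract]
    return {
        zone: 1000.0 * pooled_cases[zone] / pooled_births[zone]
        for zone in pooled_cases
    }


# Invented counts for four tracts with equal numbers of births.
cases = {"t1": 9, "t2": 1, "t3": 1, "t4": 9}
births = {"t1": 100, "t2": 100, "t3": 100, "t4": 100}

# Same scale (two zones of two tracts), different boundaries.
zoning_a = {"t1": "north", "t2": "north", "t3": "south", "t4": "south"}
zoning_b = {"t1": "east", "t4": "east", "t2": "west", "t3": "west"}

print(aggregate_rates(cases, births, zoning_a))  # both zones look identical
print(aggregate_rates(cases, births, zoning_b))  # a sharp disparity appears
```

The underlying tract data are the same in both runs, yet one zoning shows no disparity and the other shows a large one, which is why conclusions drawn at one set of boundaries should not be assumed to hold at another.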
With tight state budgets limiting health departments’ funds, using LISA on spatially aggregated secondary health data with a GIS provides for a targeted and more efficient approach to health resources planning, enabling monitoring of socio-economic determinants of health at a geographic scale commensurate with policy making and assessment. With the use of temporal windowing, administrators can reduce the effect of random noise while using health indicators that are based on a small number of cases. The geographic tools used in this study are not intended to draw any causal conclusions about the spatial patterns that emerge from the analysis. Instead, they are powerful, effective, and robust for pattern detection and monitoring to enhance health administrators’ understanding of the processes that occur at the neighborhood or community level, such as census tracts, or whichever level administrators deem appropriate for the data to be aggregated to. Over time, the use of these tools will help local health departments identify how health indicators change and what socio-demographic data are associated with those changes.
We successfully demonstrated the application of this geospatial health analytics research design to the case of prenatal hypertension in the urban setting of Mecklenburg County and, because it is easy to reproduce, we argue for its broad use in public health departments as part of their standard analytic toolbox. Tracking a system of sub-county data will allow public health officials in different urban contexts to benefit from our research by better understanding local health outcomes and risk factors over time. We advocate incorporating these data into a collaborative urban network so that the benefits of using these data can be fully realized and identified challenges resolved.