Space-Time Statistical Insights about Geographic Variation in Lung Cancer Incidence Rates: Florida, USA, 2000–2011

Lung cancer rates tend to vary across a geographic landscape, and covariates (e.g., smoking rates, demographic factors, socio-economic indicators) commonly are employed in spatial analysis to explain the spatial heterogeneity of these cancer rates. However, such cancer risk factors often are not available, and conventional statistical models are unable to fully capture hidden spatial effects in cancer rates. Introducing random effects into model specifications can furnish an efficient approach to account for variation left unexplained because of omitted variables. In particular, a random effects model can be effective for a phenomenon that is static over time. The goal of this paper is to investigate geographic variation in Florida lung cancer incidence data for the time period 2000–2011 using random effects models. In doing so, a Moran eigenvector spatial filtering technique is utilized, which allows random effects to be decomposed into spatially structured (SSRE) and spatially unstructured (SURE) components. Analysis results confirm that random effects models capture a substantial amount of variation in the cancer data. Furthermore, the results suggest that spatial patterns in the cancer data display a mixture of positive and negative spatial autocorrelation, although the global map pattern of the random effects term may appear random.


Introduction
Spatial scientists, practitioners, and policy makers are interested in understanding the spatial variation in cancer rates at various geographic scales and resolutions (e.g., [1]). One commonly employed geographic resolution is the county, because aggregating counts of cancer cases, especially of rare cancers, by county for ecological analyses almost always preserves patient confidentiality [2]. This ethical and legal goal comes at the expense of data analysis accuracy and precision, and it introduces analytic complications such as the ecological fallacy. Meanwhile, depicting rates at a finer geographic resolution with choropleth or kernel density smoothed maps can maintain patient confidentiality while improving data analysis accuracy and precision (e.g., [3]), and can help avoid or minimize complications such as the ecological fallacy.
Often, one objective of such research is to investigate associations between cancer rates and socio-economic/demographic characteristics. The availability of such covariates, generally retrieved from government census publications, tends to be very limited at fine geographic resolutions (e.g., census blocks). Furthermore, because cancer is a rare event, rates are zero in a sizeable number of areal units at a very fine geographic resolution. Because a spatial analysis

Background
A number of risk factors associated with lung cancer incidence have been examined and characterized in the literature (e.g., [10,11,12]); cigarette smoking is the best-known factor that can trigger lung cancer. Studies also show that nonsmokers exposed to secondhand tobacco smoke have higher risks of developing lung cancer (e.g., [13]). Other lifestyle-related risk factors, such as an unhealthy diet and alcohol consumption, also increase the risk of developing lung cancer (e.g., [14]). Another suspected contributor to the human lung cancer burden is outdoor air pollution (e.g., fine particulate matter and ozone); a number of studies examine, and their findings support, an association between air pollution and lung cancer risk (e.g., [15,16]).
Indicators of socio-economic status also tend to be highly correlated with lung cancer risk. Because they are readily available, these variables have been popular among researchers for describing lung cancer incidence rates. For example, Mao et al. [10] report a significant inverse relationship between socio-economic status and lung cancer risk. Socio-economic status reflects lifestyle, including diet and working and living conditions, so socio-economic indicators can be treated as surrogates for these factors and are assumed to be associated with lung cancer [17,18]. Specifically, the part of the population with lower socio-economic status (e.g., less educated, below a poverty level, unemployed) tends to have a higher risk of developing lung cancer than its counterparts with higher socio-economic status [19].
In addition, the risk of developing lung cancer tends to vary across racial/ethnic, age, and sex groups. Alberg et al. [19] argue that lung cancer incidence rates are similar for African Americans and white Americans; however, a higher risk is observed for black American men than for white American men. Haiman et al. [20] comment that this difference is attributable to varying smoking behaviors among these racial/ethnic groups. Some case-control studies suggest higher risks of smoking-related lung cancer in women than in men (e.g., [21,22]); however, this sex difference in susceptibility to lung cancer still lacks supporting evidence. Age is an important risk factor for most cancers; the risk of lung cancer increases with age, seemingly as part of the natural aging process. Studies also report that immigration status plays a role in lung cancer risk; for example, United States (U.S.) Asian immigrants have higher lung cancer mortality rates than their U.S.-born counterparts, whereas the rates are lower among U.S. black immigrants than among U.S.-born blacks (e.g., [23]). This variation may be attributable to differences in smoking prevalence between the U.S. and the countries of origin, and to differences across socio-economic classes (e.g., [24,25]).
Lung cancer incidence rates generally are observed to vary substantially across geographic space. The literature suggests that air pollution is one of the major contributors to this geographic variation [13]. For example, Jacquez and Greiling [26] observe clusters of significantly high lung cancer incidence rates in central Long Island coinciding with a concentration of air toxics. The spatial variation in lung cancer risk also is attributable to the geographic distribution of the population. For example, Kelsall and Diggle [27] report that lung cancer incidence is higher in areas with high social deprivation, which may be directly linked to smoking behavior and eating habits. A range of spatial models, including Bayesian space-time joint models (e.g., [28]), spatial multilevel regression models (e.g., [29]), and conditional autoregressive models (e.g., [30]), have been applied to account for the geographic variation present in geospatial cancer data.
A random effects (RE) model frequently is utilized for longitudinal data analysis exploiting repeated measures over time [31]. In particular, it has been widely applied to model economic/social phenomena. Frondel and Vance [32] specify a RE model to estimate fuel price elasticities with household data. Clarke et al. [33] use a model with both fixed and random effects to analyze the determinants of pupil achievement in primary school, finding that a RE model outperforms a fixed-effects-only model in terms of statistical efficiency. Chen and Tarko [34] employ a RE model to investigate traffic safety in highway work zones, and their results indicate that a RE model is a good option for that type of research.

Data and Methodology
Lung cancer cases were obtained from the Florida cancer registry. The data cleaning process removed duplicates (e.g., patients diagnosed with lung cancer as a secondary cancer), records with missing information (e.g., age and sex), and records that could not be geocoded (failed-to-be-geocoded cases were deleted for the entire state, and subsets for the specific study areas were then extracted from the clean dataset). After cleaning, 172,495 lung cancer cases were used in the data analyses. These cases are distributed unevenly across the 67 counties of the state, with sample sizes ranging from 31 (Liberty County) to 13,918 (Broward County) and a median of 1277 (Santa Rosa County). They occurred over a 12-year span, from 2000 to 2011. At the block group resolution, a relatively fine geographic resolution, many block groups have zero cancer cases. In contrast, only 1.98% of the census tracts, a coarser geographic resolution, have zero cancer counts. To avoid the issue of excessive zeros, this research focuses on two geographic resolutions, namely county and census tract, for comparison purposes. In addition, at the census tract resolution, this paper limits its study areas to six relatively densely populated metropolitan statistical areas (MSAs) in the state: Pensacola, Tallahassee, Jacksonville, Orlando, Miami, and Tampa.

Lung Cancer Incidence Rates
The crude cancer incidence rate, the ratio of cancer counts to the population at risk, generally is considered a limited measure because cancer occurs at different rates depending on the age, sex, and even racial composition of a population. A comparison of crude cancer rates over time or across different geographic areas therefore is likely to be biased by differences in local population composition [35]. Standardization of disease rates has been proposed to control for differences in population structure. Adjustment of cancer rates for age is a frequently applied standardization [36], and the Centers for Disease Control and Prevention (CDC) also adopts this approach for statistical reporting. Because age and sex information is available for lung cancer patients, this research adjusts lung cancer incidence rates for both age and sex. Figure 1 portrays the geographic distribution of adjusted lung cancer incidence rates across the State of Florida and its six MSAs. The Moran coefficient (MC) and Geary ratio (GR) statistics suggest that adjusted cancer rates exhibit a very weak positive spatial autocorrelation (PSA) map pattern at the county resolution (Figure 1a), and random spatial patterns at the census tract resolution (Figure 1b-g).
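The MC and GR statistics quoted for Figure 1 are the standard global measures of spatial autocorrelation. The following is a minimal sketch of their usual definitions, assuming a symmetric binary contiguity matrix W (the weights used for Figure 1 are not specified in this section); the four-unit example is hypothetical.

```python
import numpy as np

def moran_coefficient(y, W):
    """Moran coefficient: > 0 for PSA, < 0 for NSA; its null expectation is -1/(n-1)."""
    y = np.asarray(y, dtype=float)
    z = y - y.mean()                         # deviations from the mean
    n, s0 = len(y), W.sum()                  # number of units, sum of all weights
    return float(n / s0 * (z @ W @ z) / (z @ z))

def geary_ratio(y, W):
    """Geary ratio: about 1 for no SA, below 1 for PSA, above 1 for NSA."""
    y = np.asarray(y, dtype=float)
    z = y - y.mean()
    n, s0 = len(y), W.sum()
    diff2 = (y[:, None] - y[None, :]) ** 2   # squared differences for every pair of units
    return float((n - 1) / (2 * s0) * np.sum(W * diff2) / (z @ z))

# Hypothetical binary contiguity for four units in a row: 1-2-3-4
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

print(moran_coefficient([1, 1, 0, 0], W), geary_ratio([1, 1, 0, 0], W))  # clustered values: MC > 0, GR < 1
print(moran_coefficient([1, 0, 1, 0], W), geary_ratio([1, 0, 1, 0], W))  # alternating values: MC < 0, GR > 1
```

The two toy maps show how the statistics move in opposite directions: clustered high and low values push MC up and GR down, while the alternating pattern described for adjusted rates pushes MC negative and GR above 1.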
Compared with the crude lung cancer incidence rates summarized in Appendix A (Figure A1), the standardization process tends to reduce spatial clusters of similar cancer rates (i.e., clusters of high or low values) and to generate alternating patterns (i.e., a low lung cancer rate surrounded by high rates in neighboring units, or a high rate surrounded by low rates) at both the county and census tract resolutions. In addition, because census tracts have relatively small populations, rate adjustment produces outliers (e.g., very high cancer rates). For example, the highest adjusted cancer rate in the Miami MSA reaches 2.73%, whereas the highest crude rate is 0.36%. Also, more census tracts stand out with high adjusted cancer rates than with high crude rates. To mitigate the negative impacts of extreme outliers, census tracts with small populations but some cancer cases are aggregated with their neighboring tracts for the analyses summarized in this paper. Specifically, the Miami MSA has 19 such census tracts that were merged into adjacent tracts; the Tampa and Orlando MSAs have five and one such census tracts, respectively. Most of these merged census tracts involve commercial, industrial, or coastal land use.
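Why small tracts produce extreme adjusted rates can be seen with a directly standardized rate, in which stratum-specific rates are weighted by a standard population. This is a sketch under stated assumptions: the section does not say whether direct or indirect standardization was used, the two strata (which in practice would be age-sex combinations) and the standard weights are hypothetical, and empty strata are simply skipped.

```python
def direct_adjusted_rate(cases, population, std_weights):
    """Directly standardized rate: stratum rates weighted by standard-population shares."""
    if abs(sum(std_weights) - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")
    rate = 0.0
    for c, p, w in zip(cases, population, std_weights):
        if p > 0:  # simplifying assumption: strata with no residents contribute nothing
            rate += w * c / p
    return rate

def crude_rate(cases, population):
    return sum(cases) / sum(population)

# Hypothetical strata (e.g., under 65 and 65 or older) with standard shares 0.7 and 0.3
weights = [0.7, 0.3]
tracts = {
    "large": ([1, 5], [1000, 500]),   # (cases per stratum, population per stratum)
    "small": ([1, 1], [1000, 2]),     # one case among only two older residents
}
for name, (c, p) in tracts.items():
    print(name, round(crude_rate(c, p), 4), round(direct_adjusted_rate(c, p, weights), 4))
```

In the hypothetical small tract, a single case in a two-person stratum yields a stratum rate of 0.5, which the standard weight carries straight into the adjusted rate; merging such a tract with a neighbor enlarges the stratum denominators and damps this effect.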

Moran Eigenvector Spatial Filtering
Moran eigenvector spatial filtering (MESF) is a spatial statistical methodology that introduces a set of eigenvectors into a regression model specification to capture spatial autocorrelation (SA). Eigenvectors can be extracted from a spatial weights matrix $\mathbf{C}$ transformed as

$$\left(\mathbf{I} - \frac{\mathbf{1}\mathbf{1}^{T}}{n}\right)\mathbf{C}\left(\mathbf{I} - \frac{\mathbf{1}\mathbf{1}^{T}}{n}\right),$$

where $\mathbf{I}$ is an n-by-n identity matrix, $\mathbf{1}$ is an n-by-1 vector of ones, n is the number of areal units, and T is the matrix transpose operator. This transformed spatial weights matrix generates n eigenvectors; however, only a subset of them serves as independent variables in a model specification [37]. This subset can be identified from a candidate eigenvector set with a stepwise regression procedure [38]. A RE model can be specified as

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta}_{X} + \mathbf{Z} + \boldsymbol{\varepsilon},$$

where $\mathbf{Y}$ denotes a response variable, $\mathbf{X}$ denotes a matrix of covariates, $\boldsymbol{\beta}_{X}$ denotes regression coefficients for the covariates, $\mathbf{Z}$ denotes a RE term, and $\boldsymbol{\varepsilon}$ denotes a regression error term. The RE term $\mathbf{Z}$ is commonly assumed to be normally distributed with a mean of zero and to be uncorrelated with both the covariates and the residuals. To estimate the RE term and separate it from the residual error $\boldsymbol{\varepsilon}$, additional information (e.g., repeated measures furnished by a space-time series, or priors in a Bayesian analysis) is necessary (e.g., [39]). A RE model can be further extended with MESF to accommodate both SSRE and SURE terms simultaneously:

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta}_{X} + \mathbf{E}_{k}\boldsymbol{\beta}_{E} + \mathbf{Z}_{SURE} + \boldsymbol{\varepsilon},$$

where $\mathbf{E}_{k}$ denotes a subset of eigenvectors and $\boldsymbol{\beta}_{E}$ are the unknown coefficients of these eigenvectors. $\mathbf{E}_{k}\boldsymbol{\beta}_{E}$ furnishes a SSRE term, and $\mathbf{Z}_{SURE}$ denotes a SURE term; that is, the RE term $\mathbf{Z}$ is decomposed into the sum of $\mathbf{E}_{k}\boldsymbol{\beta}_{E}$ and $\mathbf{Z}_{SURE}$. Furthermore, separating the selected eigenvectors $\mathbf{E}_{k}$ into PSA and negative spatial autocorrelation (NSA) eigenvectors furnishes a way to investigate the PSA and NSA components of a SSRE term. In this paper, space-time lung cancer counts (e.g., n-by-T = 67-by-12 at the county resolution) furnish the repeated measures for the response variable.
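The eigenvector extraction described above can be sketched in Python with NumPy. This is a minimal sketch under two assumptions not taken from the paper: a hypothetical 5-by-5 lattice with binary rook contiguity stands in for the Florida county or tract adjacency, and candidate eigenvectors are retained with the common MESF rule of thumb that an eigenvector's MC be at least 25% of the maximum MC (or, for NSA candidates, of the minimum MC).

```python
import numpy as np

def rook_contiguity(nrows, ncols):
    """Binary rook-adjacency matrix C for a regular nrows-by-ncols lattice (hypothetical map)."""
    n = nrows * ncols
    C = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if r + 1 < nrows:                      # neighbour below
                C[i, i + ncols] = C[i + ncols, i] = 1.0
            if c + 1 < ncols:                      # neighbour to the right
                C[i, i + 1] = C[i + 1, i] = 1.0
    return C

def moran_eigenvectors(C, threshold=0.25):
    """Eigen-decompose (I - 11'/n) C (I - 11'/n); flag candidate PSA and NSA eigenvectors."""
    n = C.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n            # centring projector
    vals, vecs = np.linalg.eigh(M @ C @ M)         # matrix is symmetric, so eigh applies
    order = np.argsort(vals)[::-1]                 # strongest PSA pattern first
    vals, vecs = vals[order], vecs[:, order]
    mc = n / C.sum() * vals                        # MC of each unit-length eigenvector
    psa = np.where(mc / mc.max() >= threshold)[0]  # assumed cutoff, relative to max MC
    nsa = np.where(mc / mc.min() >= threshold)[0]  # assumed cutoff, relative to min MC
    return vecs, mc, psa, nsa

C = rook_contiguity(5, 5)
E, mc, psa, nsa = moran_eigenvectors(C)
print(len(psa), "PSA and", len(nsa), "NSA candidate eigenvectors")
```

Because each eigenvector with a nonzero eigenvalue is orthogonal to the vector of ones and has unit length, its MC is simply the eigenvalue scaled by n divided by the sum of C, which is why the eigenvalue order doubles as an ordering from strongest PSA to strongest NSA map pattern.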
The count variable is described with a Poisson probability model that includes the logarithm of expected lung cancer counts as an offset variable. After the RE term is successfully estimated with the Poisson RE model, a MESF model is specified to estimate the SSRE and SURE components, with the estimated RE term as the response variable and the candidate eigenvectors as independent variables. Essentially, a linear combination of the selected eigenvectors constructs a SSRE term, which is further decomposed into a PSA-NSA mixture [40], and the MESF model residual constitutes the SURE term. The Poisson RE and MESF models were implemented in R 3.4.2; the glmer function (package lme4) was utilized to estimate the RE components.
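The second-stage decomposition can be sketched as follows. Several assumptions here are hypothetical and are not the paper's procedure: the estimated RE vector is simulated rather than taken from a fitted model (in practice it would be the area-level random effects extracted from the Poisson RE fit, for example with `ranef()` from lme4 in R); a small lattice stands in for real areal units; and eigenvectors are chosen by greedy forward selection with a Gaussian AIC stopping rule, swapped in for the unspecified stepwise procedure of [38].

```python
import numpy as np

def moran(y, C):
    z = y - y.mean()
    return float(len(y) / C.sum() * (z @ C @ z) / (z @ z))

def lattice_candidates(k, threshold=0.25):
    """Hypothetical k-by-k rook lattice: returns C, candidate eigenvectors, and PSA flags."""
    n = k * k
    C = np.zeros((n, n))
    for i in range(n):
        if i % k + 1 < k:                          # right neighbour in the same row
            C[i, i + 1] = C[i + 1, i] = 1.0
        if i + k < n:                              # neighbour in the next row
            C[i, i + k] = C[i + k, i] = 1.0
    M = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(M @ C @ M)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    keep = (vals / vals.max() >= threshold) | (vals / vals.min() >= threshold)
    return C, vecs[:, keep], vals[keep] > 0

def decompose_re(re, E, is_psa):
    """Split an RE vector into SSRE-PSA, SSRE-NSA, and SURE parts (forward selection by AIC)."""
    y = re - re.mean()
    n = len(y)
    proj = E.T @ y                                 # OLS coefficients, since columns are orthonormal
    selected, rss = [], float(y @ y)
    for j in np.argsort(-proj ** 2):               # largest RSS reduction first
        new_rss = max(rss - proj[j] ** 2, 1e-300)
        if n * np.log(new_rss / rss) + 2 >= 0:     # AIC would not decrease: stop
            break
        selected.append(int(j))
        rss = new_rss
    sel = np.array(selected, dtype=int)
    psa_sel, nsa_sel = sel[is_psa[sel]], sel[~is_psa[sel]]
    ssre_psa = E[:, psa_sel] @ proj[psa_sel]
    ssre_nsa = E[:, nsa_sel] @ proj[nsa_sel]
    sure = y - ssre_psa - ssre_nsa                 # MESF residual = SURE component
    return {"selected": selected, "ssre_psa": ssre_psa, "ssre_nsa": ssre_nsa, "sure": sure}

C, E, is_psa = lattice_candidates(5)
rng = np.random.default_rng(0)
re = 2.0 * E[:, 0] - 1.5 * E[:, -1] + rng.normal(scale=0.1, size=E.shape[0])  # simulated RE
parts = decompose_re(re, E, is_psa)
print(moran(parts["ssre_psa"], C), moran(parts["ssre_nsa"], C))
```

Because the candidate eigenvectors are orthonormal and orthogonal to the intercept, each coefficient is a simple projection that does not change as other eigenvectors enter, so greedy selection reduces to ranking squared projections. The SSRE-PSA part can only have a positive MC and the SSRE-NSA part only a negative one, which is what makes the PSA-NSA split of the SSRE term interpretable.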

Results and Discussion
This section summarizes analysis results for both county and census tract resolutions. Regression results for quasi-Poisson and Poisson RE models are compared, and the estimated RE components are portrayed with maps.

The State Scale and County Resolution
Seven variables were retrieved to describe lung cancer incidence rates at the county resolution: the smoking rate, from the Florida Department of Health, and six socio-economic and demographic variables from the U.S. Census Bureau, namely median household income, the percentage of population with a college or higher degree, the percentage of population below a poverty threshold, the percentage of Hispanic population, the percentage of black population, and immigrants. Table 1 summarizes the estimation results for a Poisson RE model, together with the results of a quasi-Poisson model for comparison. The lung cancer data exhibit considerable overdispersion (i.e., extra-Poisson variation). However, this extra-Poisson variation is successfully accounted for in the RE model, with the overdispersion parameter decreasing from 13.36 to 2.15. Moreover, including the RE term increases the pseudo-R² from 0.30 to 0.74. All variance inflation factor (VIF) values are below the commonly used threshold of 10 (e.g., [41,42]), indicating no excessive multicollinearity among the covariates.
Table 1 also shows that standard errors increase in the Poisson RE model specification, which changes the significance levels of some covariates relative to the covariates-only quasi-Poisson specification. For example, the percentage of population with a college or higher degree, the percentage of population below poverty, and the percentage of black population become insignificant in the RE model. The immigrant variable is included mainly because the State of Florida has gained a large number of immigrants, and the literature suggests that lung cancer risk may differ between U.S.-born residents and immigrants, as discussed in the Background section. However, the immigrant variable is not significantly associated with lung cancer risk in either model. The only significant variable in the Poisson RE model is the smoking rate, which exhibits a positive relationship with lung cancer risk.
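The overdispersion and multicollinearity diagnostics cited for Table 1 can be sketched as below. The dispersion function is the usual quasi-Poisson estimate, the Pearson chi-square statistic divided by the residual degrees of freedom; the section does not say how the RE-model value of 2.15 was computed, so applying the same formula there is an assumption. The VIF of covariate j is 1/(1 − R²_j), where R²_j comes from regressing that covariate on the others. All data shown are hypothetical.

```python
import numpy as np

def pearson_dispersion(y, mu, n_params):
    """Quasi-Poisson dispersion: sum of squared Pearson residuals over residual df."""
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    return float(np.sum((y - mu) ** 2 / mu) / (len(y) - n_params))

def vif(X):
    """Variance inflation factor for each column of X (covariates only, no intercept column)."""
    X = np.asarray(X, float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # intercept + other covariates
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        tss = np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = tss / (resid @ resid)             # equals 1 / (1 - R^2_j)
    return out

# Hypothetical observed counts and fitted means from a one-parameter model
print(pearson_dispersion([2, 4, 6], [3, 3, 6], n_params=1))  # (1/3 + 1/3 + 0) / 2

# Two uncorrelated, centred covariates: both VIFs are 1
X = np.array([[1, 1], [-1, 1], [1, -1], [-1, -1]])
print(vif(X))
```

A dispersion value well above 1, such as the 13.36 reported for the quasi-Poisson county model, signals extra-Poisson variation; a VIF above 10 would flag a covariate that is nearly a linear combination of the others.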
The estimated RE term has a mean of zero and is uncorrelated with the covariates, as expected. Figure 2 portrays the geographic distributions of the RE components at the county resolution. The counties with high/low adjusted lung cancer rates in Figure 1a also are conspicuous in Figure 2a, which captures the major spatial pattern of lung cancer rates. However, the MC values suggest that both the RE and SSRE terms contain only trace amounts of SA, which implies that the covariates in the Poisson mixed model explain part of the PSA observed in Figure 1a. The p-values of the Shapiro-Wilk (S-W) normality diagnostic statistic indicate that neither term closely conforms to a normal distribution. The decomposition of the SSRE term yields a mixture of moderate-to-strong PSA (Figure 2c) and moderate NSA (Figure 2d). The p-values of the S-W statistic indicate that the SSRE-PSA and SSRE-NSA components are normally distributed. The MC suggests no significant SA in the SURE component, and its distribution deviates from a bell-shaped curve.

The Metropolitan Statistical Area Scale and Census Tract Resolution
Smoking prevalence data are not available at the census tract resolution, so only socio-economic and demographic variables were included to describe lung cancer incidence rates. Results for the Poisson RE models for each MSA are compared with covariate-only quasi-Poisson regression results. The overdispersion values larger than one in Table 2 indicate that the lung cancer counts are slightly overdispersed for all MSAs; however, all of these values move closer to one in the mixed models. A comparison of Tables 2 and 3 shows that standard errors become larger for the Poisson RE model specifications, which may affect the significance levels of the independent variables. Including the RE terms also enhances model performance: all RE specifications have larger pseudo-R² values. Tables 2 and 3 show that median household income is significant in all specifications and is negatively associated with lung cancer risk. The well-educated population variable is significant in some cases, exhibiting an inverse relationship, whereas the population below poverty variable tends to be positively associated with lung cancer rates. The relationships between these socio-economic indicators and lung cancer risk corroborate findings in the literature (e.g., [43,44]). For demographic factors, the estimated results suggest lower lung cancer risks for Hispanics, blacks, and immigrants. Stellman et al. [45] comment that the white and black populations have similar lung cancer risks when their smoking habits are similar. However, studies (e.g., [46]) find that Caucasians are more likely than African Americans to be heavier smokers, which makes them more susceptible to lung cancer. Singh and Miller [23] observe that although lung cancer risk varies among racial/ethnic groups, it tends to be lower among U.S. immigrants because of a relatively lower smoking prevalence.
Intercept-only RE models are specified for each study area to examine the spatial variation in lung cancer incidence rates. Table 4 summarizes the amount of variation explained by the RE terms. It indicates that the RE terms explain substantially less variation at the census tract resolution than at the county resolution. In addition, this percentage varies across the six MSAs, with the Tallahassee MSA having the lowest statistical explanation (11.64%) and the Pensacola MSA the highest (27.13%). On average, the RE terms account for roughly 21% of the variation, indicating a substantial amount of unexplained geographic variation in lung cancer rates at the census tract resolution. Figure 3 depicts the amount of variation accounted for by each RE component beyond that accounted for by the covariates. The SSRE and SURE components constituting a RE term explain almost the same amount of variation across all MSAs. Meanwhile, of the two SSRE sub-terms, the SSRE-NSA term outperforms the SSRE-PSA term for the Orlando, Pensacola, Tallahassee, and Tampa MSAs.
Figure 4 portrays the spatial patterns of RE components for the six MSAs. Because the RE components account for relatively low percentages of the geographic variation at the census tract resolution, Figure 4(a1-a6) does not reflect the map patterns of adjusted lung cancer rates well; however, the RE components capture high cancer rates in urban areas and low rates in rural areas for most of the MSAs, which also are highlighted on the corresponding cancer rate maps. For example, Figure 4(a3) highlights census tracts within Fort Lauderdale and Pompano Beach that have relatively high cancer rates, which also stand out in Figure 1d. The MCs imply the presence of weak PSA in the RE components, except for the Pensacola and Tallahassee MSAs, and the p-values of the S-W statistic indicate that the RE components all barely conform to normal distributions.
After removing the SURE components from the RE terms, stronger PSA is detected in the SSRE components, with MC values increasing for most MSAs (Figure 4(b1-b6)). However, the SSRE components in the Pensacola and Tallahassee MSAs still exhibit (near-)zero SA. Similarly, decomposing these SSRE terms yields mixtures of moderate-to-strong PSA components (Figure 4(c1-c6)) and weak-to-moderate NSA components (Figure 4(d1-d6)) for all MSAs. The p-values of the S-W statistic suggest that all of the SSRE-PSA and SSRE-NSA terms closely conform to normal distributions, except for the Jacksonville MSA. Map patterns displayed in Figure 4(e1-e6) appear random, an outcome confirmed by their insignificant MCs. All of the SURE components are normally distributed.

Conclusions
This research examines the spatial patterns of lung cancer incidence rates at different geographic resolutions and scales in Florida, and also investigates factors associated with lung cancer risk. The major findings are as follows. First, lung cancer count data contain a substantial amount of overdispersion (13.36) at the county resolution, whereas they are much less overdispersed (less than 2) at the census tract resolution. A RE model specification successfully addresses this issue. Because the estimated overdispersion parameter is closer to 1 for the RE model specifications, substituting a negative binomial model becomes unnecessary, which is a desirable outcome given the reservations expressed by [47] concerning the suitability of this latter specification for SA situations. Second, a RE model furnishes an efficient method to correct for biased estimation (e.g., underestimated standard errors). Regression results indicate that including a RE term, which can serve as a proxy for omitted variables, improves model performance (e.g., it increases pseudo-R² values). Third, the estimated results suggest that lung cancer risk is positively associated with smoking behavior and with low socio-economic status (e.g., low household income, low educational attainment), and negatively associated with the percentages of black, Hispanic, and immigrant populations. These positive/negative relationships corroborate findings already appearing in the literature.
This research contributes to the literature in the following two ways. First, it shows that RE model specifications improve model performance by including a RE term that successfully accounts for variation beyond that attributable to covariates. Here the RE terms account for 58.39% of the geographic variation in lung cancer incidence rates at the county resolution, and 21% of this variation, on average, at the census tract resolution. This outcome indicates that considerable unexplained variation exists in the lung cancer data at the census tract resolution. This poor statistical explanation probably is attributable to two major factors: aggregating cancer cases to a coarser resolution (e.g., county) averages out noise that is present at a finer resolution (e.g., census tract) [48]; and the massive migration of seniors to Florida. Generally speaking, population migration over time can change cancer rates and introduces a source of variation that is not well described by RE. Purposeful migration for health reasons can have a large impact. For example, unhealthy migrants may choose to move closer to health facilities or away from contaminated areas, whereas healthy people relocate to regions that are economically better off (e.g., [49,50]). Migration, thus, may muddle disease rates in a region, with rates increasing in some areas while decreasing in others [51]. In addition, the State of Florida is a well-known destination for retired people. Such movement of elderly people can distort the age pyramid of the state, affecting adjusted cancer rates.
Second, the RE term comprises SSRE and SURE components; their MCs indicate the existence of weak-to-moderate PSA (e.g., the Miami MSA) or (near-)zero SA (e.g., the Tallahassee MSA) in the SSRE components. However, a decomposition of the SSRE terms explicitly shows that they essentially are mixtures of moderate-to-strong PSA and weak-to-moderate NSA. Griffith and Arbia [8] utilize a two-SA-parameter spatial simultaneous autoregressive model to uncover a mixture of SA, in which the PSA component counterbalances the NSA component. SA mixtures have rarely been reported in the literature, especially in epidemiology, and their detection can help researchers gain a better understanding of the geographic distribution of, geographic variation in, and risk factors for a disease. As discussed earlier, the moderate-to-strong PSA largely is associated with the geographic distribution of socio-economic phenomena (e.g., employment status, population migration), whereas the weak-to-moderate NSA likely is linked to mechanisms such as a decrease in lung cancer rates due to increased cancer screening when lung cancer cases are detected in neighboring places.
This study furnishes motivation for a number of future research efforts. First, a comparison of research outcomes at the county and census tract resolutions reveals the presence of substantial heterogeneity in lung cancer data, and more noise is expected if a spatial analysis is conducted at a finer resolution (e.g., block groups). Thus, extending the current research to a finer resolution would be beneficial. Second, a comparison of crude and adjusted lung cancer incidence rates shows that some prominent spatial patterns (e.g., PSA) disappear at both geographic resolutions. However, this observation has rarely been discussed in the literature, and hence further examination of rate standardization and/or additional case studies are needed. Third, to date, the literature about PSA-NSA mixtures is relatively scant. This study only explores the scenario in which a weak-to-moderate PSA or (near-)zero SA can be partitioned into a mixture of moderate-to-strong PSA and weak-to-moderate NSA. Other scenarios (e.g., a global strong PSA; moderate NSA) remain to be investigated. Finally, SA mixtures were detected in the Florida lung cancer data. Similar research should be conducted to examine whether consistent results would be obtained with different empirical data or for different study areas.