1. Introduction
Lung and bronchus cancer remains a leading cause of mortality in the United States. While smoking is a well-established risk factor, the contribution of environmental and urban structural characteristics to lung cancer mortality is less understood. This paper examines how smoking prevalence and urban density—proxied by the number of skyscrapers per state—jointly influence lung cancer mortality, emphasizing gender differences in these relationships.
In this paper, I use the term Vertical Density to denote height-based structural density, rather than overall population or employment density. Vertical Density is proxied by the count of skyscrapers, which capture extreme realizations of high floor-area ratios (FAR) in the most intensively developed urban locations. In standard urban economics models, FAR is the canonical measure of land-use intensity and increases where land rents and accessibility are high, leading developers to substitute capital for land by building upward (McDonald and McMillen [1]). Because consistent state-level data on FAR or fine-grained urban density are not available over long time horizons in the United States, skyscraper counts provide a practical and theoretically grounded proxy for the extent to which a state contains highly vertical urban cores. Importantly, this measure is not intended to represent average urban density, but rather the presence and scale of vertically intensive development within a state.
Greater vertical density in urban settings can increase the accumulation of air pollutants, particularly fine particulate matter and nitrogen oxides, and reduce their dispersion within dense built forms. This can result in higher population-level exposures in affected neighborhoods. Such exposures have been associated with increased lung cancer risk through pathways including oxidative stress and chronic inflammation [2,3].
Dense urban configurations also contribute to urban heat island effects that can worsen local air quality by enhancing photochemical reactions and reducing pollutant dispersion, thereby compounding respiratory stress in exposed populations [4,5].
Moreover, disparities in access to high-quality healthcare and environmental mitigation resources across densely built urban areas may contribute to differences in lung cancer screening uptake, timeliness of diagnosis, and treatment quality, thereby shaping survival outcomes. Such inequalities in both environmental exposure and healthcare access have been identified as key drivers of lung cancer disparities [6,7].
Drawing on a novel dataset of 2034 state-year observations across 48 U.S. states from 1999 to 2022, I merge epidemiological data from the CDC with static measures of skyscraper prevalence. The dataset includes age-adjusted mortality and smoking rates disaggregated by gender. Delaware, North Carolina, South Carolina, and Puerto Rico were excluded due to incomplete data on age-adjusted lung cancer mortality rates for females, males, or both. While skyscraper counts are time-invariant, they capture persistent state-level differences in urban density, land use, and exposure to environmental risk factors. Urban form, especially vertical density, has been linked to pollution exposure, health access, and behavioral risk patterns [8,9,10,11].
This study makes three key contributions. First, it employs a fully interacted quadratic regression model to identify gender-specific, nonlinear relationships between urban density and lung cancer mortality. Following Chiang and Wainwright [12], I incorporate quadratic terms to capture U-shaped or inverted U-shaped effects, reflecting the complex health implications of increasing urban density. Second, it introduces skyscraper count as a structural proxy for vertical urban development, leveraging its correlation with economic activity, environmental exposure, and infrastructure access (Brueckner & Rosenthal [13]). Third, the model integrates behavioral (smoking) and structural (urban form) variables to produce a nuanced understanding of cancer mortality trends.
The results confirm a strong, positive association between smoking prevalence and lung cancer mortality, in line with epidemiological evidence [14,15]. Mortality rates decline over time, though the trend is more pronounced for males than females. A key finding is the inverted U-shaped relationship between skyscraper density and mortality for both sexes: mortality initially rises with urban density but eventually declines, likely reflecting a shift from pollution-driven risks to better healthcare access and urban amenities at very high densities (Rundle et al. [16]).
Importantly, even after accounting for smoking and time, significant gender disparities in cancer mortality persist. This underscores the value of disaggregated analysis and the limitations of one-size-fits-all urban health policies. By integrating urban economics with public health, this study advances our understanding of how built environments shape population health across space and time.
3. Methodology
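Consider the following empirical model, in which all independent variables fully interact with the binary variable $\text{Female}$ (where $\text{Female} = 0$ denotes the base category of males, and $\text{Female} = 1$ denotes females):
$$
\begin{aligned}
\text{Mortality} ={}& \beta_0 + \beta_1\,\text{Skyscrapers} + \beta_2\,\text{Skyscrapers}^2 + \beta_3\,\text{Smoking} + \beta_4\,\text{Year} \\
&+ \gamma_0\,\text{Female} + \gamma_1\,(\text{Female}\times\text{Skyscrapers}) + \gamma_2\,(\text{Female}\times\text{Skyscrapers}^2) \\
&+ \gamma_3\,(\text{Female}\times\text{Smoking}) + \gamma_4\,(\text{Female}\times\text{Year}) + u \qquad (1)
\end{aligned}
$$
In this model, the dependent variable (Mortality) is the age-adjusted mortality rate from lung and bronchus cancer. The explanatory variables include the number of skyscrapers in each state (Skyscrapers), its squared term (Skyscrapers$^2$), age-adjusted smoking prevalence (Smoking), a linear time trend (Year), and their interactions with the sex indicator Female. The coefficients $\beta_0,\dots,\beta_4$ apply to males, and $\gamma_0,\dots,\gamma_4$ measure the corresponding female differentials. The disturbance term $u$ satisfies the classical assumptions of the linear regression model (Ramanathan [20] p. 96).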
The analysis employs pooled OLS rather than fixed-effects or first-difference models, which rely exclusively on within-unit variation and therefore absorb time-invariant regressors such as the skyscraper measure. The pooled specification exploits cross-sectional variation across states and is identified under the standard assumption that the regressors are uncorrelated with the error term (see Wooldridge [19]).
This fully interacted specification allows both intercepts and slopes to differ systematically between males and females, providing a flexible framework for testing gender heterogeneity in the effects of smoking behavior, time trends, and urban density (Wooldridge [19] pp. 225–263).
Quadratic terms in skyscraper counts are included to allow for non-linear effects of urban density on lung cancer mortality. Such non-monotonic relationships are theoretically plausible, as increases in vertical density may initially elevate environmental exposure risks but eventually reduce mortality through improved access to healthcare, infrastructure, and urban amenities (Chiang & Wainwright, 2005 [12] pp. 229–231).
The variable Year is normalized such that Year = 0 corresponds to 1999. This transformation ensures that the intercept represents a meaningful baseline mortality level rather than an extrapolation to an implausible reference year, thereby improving interpretability of the constant term (Ramanathan [20] pp. 147–148; Hoaglin [21,22]).
The model is estimated using pooled state-year observations with sex-specific data. Standard errors are reported in parentheses, and the interacted structure permits direct comparison of marginal effects across genders within a unified empirical framework.
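To make the fully interacted structure concrete, the following minimal sketch builds one row of regressors for a state-year-sex observation and evaluates a linear prediction. It is an illustration only: the function names, the ordering of the regressors, and every coefficient value are hypothetical and are not taken from Table 3.

```python
BASE_YEAR = 1999  # Year is normalized so that Year = 0 corresponds to 1999


def design_row(skyscrapers, smoking, year, female):
    """Return the 10 regressors of a fully interacted quadratic model.

    Order (an assumption of this sketch): constant, Skyscrapers, Skyscrapers^2,
    Smoking, Year, followed by each of these five terms multiplied by Female.
    """
    base = [
        1.0,
        float(skyscrapers),
        float(skyscrapers) ** 2,
        float(smoking),
        float(year - BASE_YEAR),
    ]
    interactions = [female * x for x in base]  # all zero when female == 0
    return base + interactions


def predict(coefs, row):
    """Linear prediction: the sum of coefficient times regressor."""
    return sum(c * x for c, x in zip(coefs, row))


# Hypothetical coefficients, for illustration only (NOT the estimates in Table 3).
# First five entries: male coefficients; last five: female differentials.
HYPOTHETICAL_COEFS = [50.0, 0.05, -0.0002, 1.0, -1.5,
                      5.0, -0.01, 0.00005, -0.3, 0.4]

male_baseline = predict(HYPOTHETICAL_COEFS, design_row(0, 0.0, 1999, 0))
female_baseline = predict(HYPOTHETICAL_COEFS, design_row(0, 0.0, 1999, 1))
```

With all regressors at zero, the male prediction equals the constant term and the female prediction equals the constant plus the Female differential, which is why a single pooled regression can recover sex-specific intercepts and slopes.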
4. Results
One concern with the pooled OLS specification is that identification relies on the assumption that the explanatory variables are uncorrelated with the regression disturbance term. In particular, if the skyscraper measure were systematically related to unobserved state-level characteristics affecting mortality, the estimated coefficients could be biased. To assess this concern, Table S1 in the Supplementary Materials reports diagnostic evidence based on the correlation between the skyscraper variable and the regression residuals. The results show no statistically meaningful correlation, providing support for the maintained exogeneity assumption and suggesting that the estimated association is not driven by residual correlation between skyscrapers and unobserved determinants of mortality.
Table 3 presents the regression results corresponding to the empirical models specified in Equation (1). In line with previous findings, the results reveal a positive association between lung cancer mortality and smoking prevalence, and a negative association with time.
The baseline expected lung cancer mortality rate in 1999, among individuals in states with no skyscrapers and the minimum observed percentage of smokers, is reported in Table 3 in cases per 100,000 males and per 100,000 females. A one-percentage-point increase in smoking prevalence is associated with an increase in age-adjusted lung cancer cases per 100,000 for both males and females; the sex-specific magnitudes are reported in Table 3.
Results from the fully interacted regression model (Column (1), Table 3) show that in 1999, the lung cancer mortality rate among non-smoking women was higher than that of men. However, after adjusting for smoking prevalence, this gender gap narrows: each additional percentage point in smoking is associated with a smaller expected increase in mortality per 100,000 for females than for males. Notably, once the smoking rate exceeds a threshold that lies below the minimum observed female smoking rate of 19.0%, the expected mortality rate for women falls below that of men. This threshold is the smoking prevalence (%) at which the projected lung cancer mortality rate for females becomes lower than that for males. It is obtained by setting the projected female–male difference implied by Column (1) of Table 3 equal to zero and solving for the smoking prevalence.
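The threshold calculation can be written out symbolically. The notation here is an assumption introduced for illustration: $\gamma_0$ denotes the female intercept differential, $\gamma_3$ the female differential in the smoking slope, and $S$ the smoking regressor as it enters the model, with skyscrapers and Year held at zero.

```latex
\[
\Delta(S) \;=\; \widehat{\text{Mortality}}_{\text{female}}(S) - \widehat{\text{Mortality}}_{\text{male}}(S)
\;=\; \gamma_0 + \gamma_3 S ,
\qquad
\Delta(S^{*}) = 0 \;\Longrightarrow\; S^{*} = -\frac{\gamma_0}{\gamma_3}.
\]
```

When $\gamma_0 > 0$ and $\gamma_3 < 0$, as the text describes, $S^{*}$ is positive and the projected female rate lies below the male rate for every $S > S^{*}$.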
Looking at the coefficients for the Year variable, each additional year is associated with a reduction in lung cancer deaths per 100,000 for both males and females; the sex-specific magnitudes are reported in Column (1) of Table 3. These estimates indicate a steady downward trend in lung cancer mortality over time for both sexes. If one were to mechanically extrapolate this linear trend under the highly restrictive assumptions of no skyscrapers and the lowest observed smoking prevalence, the implied time at which the linear prediction would intersect zero is approximately 30.5 years for males and 48.6 years for females. These thresholds are obtained by setting the sex-specific linear predictions, based on the estimates reported in Column (1) of Table 3, equal to zero and solving for Year. This calculation is intended solely as a theoretical illustration of the relative pace of decline and should not be interpreted as a plausible forecast, as mortality rates are bounded away from zero and subject to biological, behavioral, and institutional constraints.
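The extrapolation is a one-line calculation: a linear prediction of the form baseline + slope × t reaches zero at t = −baseline / slope. The sketch below assumes that form; the numeric inputs are hypothetical and were chosen only so that the ratios reproduce the horizons quoted in the text, so they should not be read as the estimates in Table 3.

```python
def years_to_zero(baseline_rate, annual_change):
    """Years after the base year at which baseline_rate + annual_change * t equals zero."""
    if annual_change >= 0:
        raise ValueError("the linear trend must be declining to reach zero")
    return -baseline_rate / annual_change


# Hypothetical (baseline, annual change) pairs, for illustration only.
male_years = years_to_zero(61.0, -2.0)    # 30.5
female_years = years_to_zero(72.9, -1.5)  # about 48.6
```

A higher baseline combined with a slower annual decline yields the longer horizon, which is the sense in which the comparison illustrates the relative pace of decline.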
Figure 1 illustrates the relationship between the age-adjusted lung cancer mortality rate and the number of skyscrapers, based on the estimates presented in Column (1) of Table 3. Notably, this relationship exhibits an inverted U-shaped curve for both genders. The projected mortality rate rises from 49.54 cases per 100,000 males and 39.45 cases per 100,000 females in states with zero skyscrapers, peaking at 52.97 cases per 100,000 males at 125 skyscrapers and 45 cases per 100,000 females at 136 skyscrapers. Beyond these points, the projected rates decline, falling to 48.39 cases per 100,000 males and 39.54 cases per 100,000 females at 270 skyscrapers.
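The location of each peak follows from the quadratic form of the projection. In the notation assumed here for illustration, $a_s$ and $b_s$ are the linear and quadratic skyscraper coefficients for sex $s$, $c_s$ collects all remaining terms held fixed, and $K$ is the skyscraper count.

```latex
\[
f_s(K) = c_s + a_s K + b_s K^{2},
\qquad
f_s'(K^{*}_s) = a_s + 2 b_s K^{*}_s = 0
\;\Longrightarrow\;
K^{*}_s = -\frac{a_s}{2 b_s},
\]
\[
\text{with an inverted U requiring } a_s > 0 \text{ and } b_s < 0,
\qquad
f_s(K^{*}_s + d) = f_s(K^{*}_s - d) \ \text{ for any } d .
\]
```

The symmetry property gives a quick check on the reported projections. With the male turning point at 125 skyscrapers, the projection returns to its zero-skyscraper level at 250, so the male value at 270 (48.39) should lie below the value at zero (49.54), as reported. With the female turning point at 136, the return point is 272, so the female value at 270 (39.54) should sit just above the value at zero (39.45), again as reported.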
This pattern suggests that, initially, increased vertical urban development may reflect higher population density and greater industrial activity, both of which are associated with elevated levels of ambient air pollution and exposure to carcinogens. Epidemiological studies have long established a strong link between urban air pollution, particularly fine particulate matter (PM2.5), and increased lung cancer incidence (Pope et al., 2002; Hamra et al., 2014 [14,15]). Moreover, densely built urban environments are often associated with behavioral and socioeconomic risk factors, such as higher smoking rates, elevated stress levels, and lower levels of physical activity [16,23].
Figure 1 displays projected mortality rates separately for women and men as a function of the skyscraper measure. While these projections convey overall gender-specific patterns, they do not directly report the implied female–male difference or its statistical uncertainty, making it difficult to determine whether visually small gaps in certain ranges are statistically meaningful. To address this, Figure S1 in the Supplementary Materials reports the estimated female–male difference in projected mortality rates together with the corresponding 99% confidence interval. The bottom panel of Figure S1 shows that the estimated difference remains on one side of zero across the full range of skyscraper values, indicating a statistically significant gender gap at each level of vertical urban development. Although the magnitude of the gap varies and appears modest in some regions, the confidence interval does not cross zero, clarifying that the visual proximity of the gender-specific projections in Figure 1 reflects scale rather than statistical insignificance.
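In a fully interacted model, the female–male difference at a given covariate profile is a linear combination of the Female differentials, so its confidence interval can be built from their covariance matrix. The sketch below shows one way this could be done under a normal approximation. The source does not state how the intervals in Figure S1 were computed, and the coefficient and covariance values used here are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def gap_weights(skyscrapers, smoking, year_index):
    """Weights on the five Female differentials (constant, K, K^2, Smoking, Year)."""
    k = float(skyscrapers)
    return [1.0, k, k ** 2, float(smoking), float(year_index)]


def gap_interval(weights, coefs, cov, level=0.99):
    """Point estimate and normal-approximation interval for sum(w_i * coef_i)."""
    n = len(weights)
    estimate = sum(w * c for w, c in zip(weights, coefs))
    # Variance of a linear combination: w' V w
    variance = sum(weights[i] * cov[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 2.576 for a 99% interval
    half_width = z * sqrt(variance)
    return estimate, estimate - half_width, estimate + half_width


# Hypothetical differentials and a hypothetical diagonal covariance matrix.
HYPOTHETICAL_GAMMA = [5.0, -0.01, 0.00005, -0.3, 0.4]
_variances = [0.25, 1e-6, 1e-12, 1e-4, 1e-4]
HYPOTHETICAL_COV = [[v if i == j else 0.0 for j in range(5)]
                    for i, v in enumerate(_variances)]

estimate, lower, upper = gap_interval(gap_weights(0, 0.0, 0),
                                      HYPOTHETICAL_GAMMA, HYPOTHETICAL_COV)
```

An interval that excludes zero at a given skyscraper level corresponds to the statement that the gender gap is statistically distinguishable from zero at that level. A real application would use the full estimated covariance matrix, including the off-diagonal terms.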
In the case of lung cancer, modern medical treatments can be highly effective. Urban areas with greater building density typically have better access to advanced healthcare facilities and superior medical services, which may contribute to declining mortality rates as the number of skyscrapers increases. However, even when “controlling” for urban density and smoking rates, the expected mortality rate from lung cancer remains consistently higher among males than females for any given number of skyscrapers.
Robustness to Highly Urbanized States
To further strengthen the internal coherence of the analysis, the main results are complemented by additional robustness checks reported in the Supplementary Materials. Because skyscraper construction is highly concentrated in a small number of states, it is informative to assess whether the estimated relationships are disproportionately influenced by highly urbanized observations. To this end, Table S2 presents median (quantile) regression estimates, which reduce sensitivity to extreme observations and mitigate the influence of states with exceptionally large skyscraper counts by emphasizing the central tendency of the conditional mortality distribution. The median regression results closely replicate the baseline pooled OLS findings, preserving both the inverted U-shaped relationship between vertical urban density and lung cancer mortality and the observed gender differences. This consistency indicates that the main conclusions are not driven by highly urbanized states or by mean-based estimation per se.
In addition, Figures S1 and S2 explicitly examine how the female–male mortality gap varies across levels of vertical urban density by reporting projected differences together with their associated confidence intervals. These supplementary analyses clarify that, even in density ranges where the gender-specific mortality curves appear visually close, the estimated gender gap remains statistically distinguishable from zero throughout the relevant range. Together, the results reported in the Supplementary Materials underscore the incremental contribution of the additional analyses, demonstrating that the nonlinear density effects and gender heterogeneity documented in the main text are robust to alternative estimation strategies and variations in urban density.
Figure S2 examines gender-based mortality variation with respect to urban density based on the median regression. The bottom panel of Figure S2 reports the estimated female–male mortality difference implied by the median regression specification, along with its statistical uncertainty. As in Figure S1, the estimated difference does not cross zero over the relevant range, indicating that the gender gap remains statistically distinguishable from zero throughout. This consistency across alternative measures of urban form and estimation approaches reinforces the conclusion that observed gender differences reflect systematic variation rather than noise, even when point estimates appear close in magnitude.
5. Summary and Conclusions
This study investigates how smoking prevalence and urban density, measured by skyscraper counts, interact to influence lung and bronchus cancer mortality across U.S. states from 1999 to 2022. Using a gender-disaggregated dataset and a fully interacted quadratic regression model, the analysis reveals that lung cancer mortality is strongly associated with smoking and exhibits a non-linear relationship with vertical urban density. Mortality initially increases with urban density but later declines, suggesting a complex interplay between environmental exposure and healthcare access. Importantly, the study finds persistent gender differences in cancer mortality that are not fully explained by smoking rates or urban form, indicating the need for more tailored health interventions.
Initially, increased vertical urban development likely reflects higher population density and industrial activity, both linked to greater air pollution and cancer risk. Dense urban areas also concentrate behavioral and socioeconomic risks, such as higher smoking prevalence and stress. However, at very high levels of density, improved access to healthcare may help reduce lung cancer mortality. Notably, even after accounting for urban density and smoking, men consistently show higher lung cancer mortality than women, pointing to persistent gender disparities.
The observed differences between men and women should be interpreted within a broader sex- and gender-based framework that goes beyond purely quantitative contrasts. A growing interdisciplinary literature demonstrates that men and women differ systematically in health perception, risk evaluation, and decision-making under uncertainty, all of which influence disease prevention and health outcomes. For example, Arbel, Fialkoff, and Kerner [24] show that health conditions such as obesity are differently internalized and economically evaluated by men and women, highlighting gender-specific health trade-offs. More recent evidence indicates persistent sex differences in preventive health decision-making, including vaccination behavior, reflecting divergent risk attitudes and cognitive strategies between females and males (Arbel et al. [25]). These behavioral and social mechanisms are particularly relevant for lung cancer, where smoking trajectories, occupational exposures, healthcare-seeking behavior, and adherence to screening recommendations remain strongly gendered.
Complementing the behavioral and decision-making evidence discussed above, recent lung cancer–specific clinical reviews published in 2025 emphasize that biological sex differences interact with gendered behaviors across the entire disease trajectory. Advances in basic and translational lung cancer research highlight sex-based differences in tumor biology, molecular pathways, and immune response, particularly in non-small-cell lung cancer, with important implications for personalized prevention and treatment strategies (Mascaux et al. [26]). Clinical reviews further document that men and women differ in screening uptake, diagnostic pathways, and histological subtypes, reinforcing the need for sex-sensitive screening and early detection approaches (Tahayneh et al. [27]). Growing evidence also indicates that sex modifies response to systemic therapies, including immunotherapy and chemo-immunotherapy combinations, with systematic reviews and meta-analyses reporting sex-specific differences in treatment efficacy and toxicity profiles [28,29]. In addition, recent work on patient-centered outcomes shows that postoperative recovery trajectories and health-related quality of life following pulmonary resection differ significantly by sex, underscoring the relevance of gender-sensitive clinical management (Beushausen et al. [30]). Taken together, this body of 2025 clinical literature supports the interpretation that the sex differences observed in the present study reflect an interaction between biological mechanisms and socially constructed gender roles, rather than purely statistical variation, and reinforces the importance of integrating public health, oncology, and gender perspectives when interpreting econometric findings.
Despite its contributions, the study has several limitations. First, skyscraper counts are a crude, time-invariant proxy for urban density and may not fully capture temporal shifts in population distribution, environmental conditions, or healthcare infrastructure. Second, the analysis assumes uniform effects within states, potentially overlooking intrastate disparities such as those between urban cores and peripheral regions. Third, omitted variables—such as occupational exposure, socioeconomic status, or access to early detection—may confound the observed relationships.
A key strength of the pooled OLS approach used in this study, relative to fixed-effects regression, is that the skyscraper variable captures persistent cross-sectional differences in vertical urban development across states using accurately observed building counts. By construction, this measure exploits between-state variation in urban form, which is central to the research question. Econometric approaches that rely exclusively on within-state variation, such as fixed-effects or first-difference models, would absorb this cross-sectional dimension and therefore be unable to identify the association between vertical urban structure and mortality outcomes. Retaining this cross-sectional variation allows the analysis to preserve meaningful differences in urban form across states that are directly relevant to the study.
The empirical analysis relies on pooled OLS estimation, which rests on the assumption that the independent variables are uncorrelated with the random disturbance term. A potential concern is that the skyscraper measure, as a proxy for broader urban environments, may be correlated with unobserved state-level characteristics that also affect mortality. To assess this concern, I provide additional diagnostic evidence in Table S1 in the Supplementary Materials, where I examine the correlation between the skyscraper variable and the regression residuals. The results show no statistically meaningful correlation, lending support to the maintained exogeneity assumption underlying the pooled OLS specification. While this evidence cannot definitively rule out all forms of omitted-variable bias, it increases confidence that the estimated associations are not driven by systematic correlation between skyscrapers and unobserved determinants of mortality.
A related limitation concerns smoking prevalence, which is measured at a single point in time (BRFSS 2022) and applied uniformly across the mortality panel. This temporal mismatch implies that smoking should be interpreted as a contemporaneous cross-sectional control rather than a historical exposure, and the associated coefficients should not be used to support strong temporal or causal interpretations.
From a policy perspective, these findings underscore the need for targeted public health strategies that account for both environmental and demographic heterogeneity. Urban planning should prioritize reducing pollution in moderately dense areas while ensuring equitable access to healthcare in high-density regions. Gender-specific health campaigns and lung cancer screening programs may also be warranted, particularly in states with persistently high male mortality. More broadly, this research highlights the importance of integrating urban economics into public health frameworks to better address the spatial and structural determinants of disease.