Understanding Regional Risk Factors for Cancer: A Cluster Analysis of Lifestyle, Environment and Socio-Economic Status in Poland

To date, no results have been published regarding cluster analysis of risk factors for cancer in Poland. Many cancer deaths are preventable through the modification of cancer risk behaviours. This study explores the multidisciplinary connection between lifestyle, environment and socio-economic status (SES). Cluster analyses indicate that major metropolitan areas and large industrial regions differ significantly in terms of SES, lifestyle and environment when compared with other parts of Poland. Our findings show that in order for interventions to be effective, cancer-prevention policy should be addressed on both local and national scales. While anti-cancer policies in Poland’s industrial regions should focus on air pollution, the country’s northern regions should aim to curb smoking, increase sports activity and improve SES. Policy interventions must target the root causes of cancer in each region of Poland and must account for SES.


Introduction
Cancer is the second leading cause of death worldwide, responsible for one in six deaths, or a total of 9.6 million, in 2018 [1]. The World Health Organization (WHO, Geneva, Switzerland) estimates that nearly a quarter of the global cancer burden occurs in Europe, where the probability of dying from cancer or any one of three other leading non-communicable diseases (NCDs: cardiovascular disease, diabetes and chronic respiratory disease) among adults between the ages of 30 and 70 ranges from 9.1% in Sweden to 23.6% in Bulgaria [2,3]. By comparison, the probability of dying from NCDs in the United States (US) is around 14.3%, with 1.7 million newly diagnosed cancer cases in 2018 [4]. In an effort to fast-track the overall aims of cancer research, including improved prevention, detection and new therapy development, the US Congress allocated 500 million USD to the Cancer Moonshot Research Initiatives 2017-2025 [5]. Globally, the provision of policy options stems from the WHO Global Action Plan for the Prevention and Control of NCDs 2013-2020, which seeks to reduce cancer mortality rates by 20% by 2030 [6]. Efforts to fight cancer have also been undertaken at the European level. In 2009, the European Commission (EC) launched a joint action, the "European Partnership for Action Against Cancer" (EPAAC), focused on facilitating cooperation and sharing experiences among the EU member states in the area of cancer control policies [7]. Intended to help policy-makers in the EU member states design national cancer control programmes [8], the key
1. Invest in human resources by increasing medical staff and improving the quality of oncological training and education;
2. Reduce cancer incidence through investments in education and primary prevention related to cancer disease (i.e., promotion of healthy lifestyles to curb cancer-risk behaviours);
3. Improve the effectiveness of secondary prevention through larger investments in screening;
4. Invest in bench sciences and healthcare innovation in order to introduce effective diagnostic and therapeutic solutions;
5. Fund improvements within the oncological care system in order to improve organization and processes whilst also providing patients with access to the highest quality of comprehensive, diagnostic and therapeutic care along the "patient path".
The National Oncological Strategy 2020-2030 was adopted by the Parliament in February 2020. As such, evidence-based strategies for cancer-risk factors in Poland are desperately needed to improve policy interventions and curb rising rates of cancer mortality. To date, only a limited number of studies have been published to help fill this void. Polak et al. [14] showed that socio-economic status (SES) is significantly correlated with lung cancer mortality in Poland. These findings, however, are constrained by the study's sole focus on lung cancer outcomes. Another study in Poland examined breast cancer and found a negative correlation between income, poverty and breast cancer incidence [15]. What is lacking in the literature are studies that take into account a variety of cancer types, coupled with geospatial considerations, cancer-risk behaviours and social and economic demographic conditions. The WHO's International Agency for Research on Cancer (IARC) classifies cancer-risk factors into two main categories: internal and external. The former depends on genetic profiling and unhealthy behaviours (e.g., tobacco and alcohol use, physical inactivity and unhealthy diet). The latter is shaped by levels of physical, chemical and other biologically carcinogenic agents borne in air, water, soil, etc. [1]. Studies show that cancer-risk factors are unevenly distributed across regions and vary across gender, age and a variety of socio-economic factors [16][17][18].
Some cancer-risk behaviours, such as tobacco and/or alcohol use, have already been studied in Poland and across Central and Eastern Europe [14,19,20,21]. Thus far, no empirical evidence combining both internal and external cancer-risk factors across regions in Poland has been published. This study aims to fill this gap by investigating regional differences in SES (income and education levels) in Poland combined with lifestyle (tobacco and alcohol use, diet, sports activity) and environment (air pollution). This paper investigates the clustering of health-risk factors using data collected through the 2015 wave of the Social Diagnosis study (objective and subjective quality of life survey) and local pollution data obtained from Poland's Chief Inspectorate for Environmental Protection.

Lifestyle
Cancer incidence is highly correlated with a number of lifestyle factors [22], which include tobacco use, diet, physical inactivity, obesity and alcohol consumption [23][24][25]. Tobacco use increases the risk of fourteen types of cancer [26][27][28] including lung cancer, which is the leading cause of cancer deaths in Poland [29]. One-third of annual cancer deaths in the US are caused by tobacco exposure and use, while another one-third of cancer deaths are associated with diet, physical inactivity and obesity [30]. Obesity increases the risk of thirteen types of cancer [30][31][32][33][34][35][36][37], while physical activity has been shown to reduce the risk of four of these cancer types [30,31,34,38]. Regular physical activity combined with a healthy caloric intake helps to maintain a healthy body weight, thus reducing the overall risk of cancer [30]. Excessive regular alcohol consumption (more than two drinks for men and one drink for women) is an established cause of many cancers, including mouth, pharynx, larynx, esophagus and liver [39][40][41]. In the US, multiple studies have shown that cancer-risk behaviours often co-occur [23,42,43,44], and only a very small percentage of adults (<5%) achieve a healthy lifestyle that includes no tobacco use, low alcohol consumption, regular physical activity as well as a healthy diet and weight [43,45].
Some studies have found evidence of health-risk lifestyles clustering within geographic areas and across gender, age and socio-economic groups. For example, high rates of the co-occurrence of smoking and alcohol have been identified among both females [24] and males [46]. A study from Spain examined the relationship between smoking, alcohol consumption, physical activity, and the diet of college students and found that clusters of smoking, diet, and physical inactivity occur together [47]. Other studies show a high prevalence of unhealthy diet and inactivity among young adults in college [40,41].

Air Quality
The impacts of environmental pollution on health and cancer incidence are studied continuously, and results vary depending on geographical region, pollutant type and environmental medium (e.g., air, soil, water). Because research conducted by the European Environment Agency (EEA) finds that "air pollution is a major cause of premature death and disease and is the single largest environmental health risk" [48], this paper focuses on air pollution in Poland. Key air pollutants with standards set by WHO air quality guidelines and EU Ambient Air Quality Directives include particulate matter (PM), carbon monoxide (CO), hydrocarbons, such as benzo[a]pyrene (BaP), sulphur oxides (SO2), nitrogen oxides (NO2), ozone (O3), benzene (C6H6), certain metals (Pb, As, Cd, Ni) and polycyclic aromatic hydrocarbons [48].
Numerous epidemiological studies conducted over the last few decades have provided evidence that risk of cancer is related to air pollution [14,49,50,51]. Particulate matter (PM10, PM2.5) and benzo[a]pyrene (BaP) are regarded as pollutants of particularly high risk [48,52]. In general, PM consists of complex mixtures of solid and liquid particles in the air. Particles with a diameter of 10 microns or less (PM10) can penetrate and stay inside the lungs. Even more damaging to health are particles with a diameter of 2.5 microns or less (PM2.5), which can penetrate lung barriers and enter the circulatory system. Chronic exposure to these types of particulate matter increases the risk of developing cardiovascular and respiratory diseases, as well as lung cancer. Limits aim to achieve the lowest concentrations of PM possible [53]. Therefore, these air pollutants were included in our analysis of cancer risk factors.

Socio-Economic Factors
Many studies provide evidence of the effect of socio-economic differences (i.e., income, household wealth gains and losses, education) on individual health [54][55][56][57][58] as well as on the cost of illness [59]. Studies also show disparities in cancer risk and mortality related to socio-economic factors, such as income or educational level. Income plays a significant role in shaping social living conditions since higher income allows for better housing, increased schooling and more sophisticated forms of recreation [60].
Education may influence the risk of cancer in a variety of ways, including but not limited to the choice of profession, sports activities, smoking and alcohol consumption, and participation in health screening programmes [61].
There is general agreement among scholars that the role of SES in shaping health is largely indirect. SES shapes individual consumer behaviour (i.e., smoking, diet, physical activity, obesity) and thus impacts health. SES also influences health through better or worse access to the health care system.
Significant disparities in cancer death rates due to differences in income were identified through a cross-sectional study of 3135 counties in the US [62] and were found to be influenced by mediators such as health-risk behaviours, quality and costs of medical care and food insecurity. Similar findings regarding reasons for the growing rates of cancer mortality disparities in US counties are provided by Withrow et al. [63], who conclude that smoking, obesity and high alcohol consumption are highly correlated with income and may contribute to differences in cancer mortality.
Another study uses survey data collected in France (N = 3359) to examine the relationship between perceptions of cancer risk factors and SES. Results from this analysis show that the perception of cancer risk factors depends on the SES of respondents. Here, higher perceptions of both behavioural and psychosocial factors emerge for respondents with higher SES, with respondents with intermediate levels of SES being more likely to stress environmental risk factors [17].
Polak et al. [14] confirm the correlation between socio-economic factors and death from lung cancer and other respiratory diseases in 66 regions in Poland for the period 2010-2014. The authors measured SES using an index derived from five variables: average monthly salary, percentage of the population with higher education, percentage of people employed in finance and real estate, unemployment rate, and percentage of the population receiving social support due to poverty. Using multiple weighted linear regression models, the authors estimated a relationship between SES and mortality resulting from respiratory diseases and lung cancer. The results confirm a correlation between current SES measured by the SES index and mortality from both respiratory diseases and lung cancer at the population level. However, changes in SES were found to be important only for mortality caused by respiratory diseases. An increase in SES was associated with a decrease in mortality from respiratory disease but did not matter for lung cancer mortality.
Tanaka et al. examine links between the risk of cancer-related mortality and economic disparities among 21,000 patients diagnosed with various types of cancer between 2010 and 2012 in the Aomori Prefecture of Japan. To measure cancer incidence, the authors employ two indicators: (1) an age-standardized incidence rate and (2) the incidence rate ratio. Using a multivariable Cox regression approach, the authors investigate the relationship between patient survival time and income disparities. They also examine how specific factors, such as stage of cancer diagnosis and type of treatment, influence the rate of cancer death at a particular point in time (hazard rate). The authors find that patient income levels differentiate age-standardized cancer incidence rates for breast, cervical and prostate cancer but not for stomach, lung and liver cancer [64]. These results are in line with findings from Korea, where a study used Korean National Health Insurance Cancer Registration data to investigate the relationship between cancer incidence and family income. The age-standardized incidence rates for major cancer types were calculated and adjusted for age, number of family members and residential area. Using a logistic model, the authors show that there is a negative and significant relationship between income and cancer incidence risk. In general, cancer incidence risk is higher for low-income families than for their higher-income counterparts. However, this study also shows that within the same income group, cancer risk differs across types of cancer [65].
The results of previous studies of the relationship between socio-economic factors and cancer incidence are mixed. The impact of socio-economic factors varies by country or district and may be different for various types of cancer. However, the majority of studies confirm that socio-economic factors have an indirect impact on the incidence of certain cancers. Therefore, studies on cancer risk factors must be situated in a socio-economic context.

Research Questions
Based on the literature review, the following research questions were developed:
• Are clusters distinguishable when internal factors (lifestyle and socio-economic status) are combined with external factors (environment) into a single spatial analysis?
• What are the implications of spatial cluster analysis of cancer-risk factors for health policy interventions in Poland?

Data
The Social Diagnosis (SD) study is a long-standing survey developed by an interdisciplinary team of professionals seeking to measure significant aspects of household life in Poland, including attitudes, mindsets and behaviour. The survey focuses on collecting economic information (e.g., income, material wealth, savings and finance) as well as behavioural, attitudinal, demographic and socio-economic details (including but not limited to educational attainment, medical care, approach to problem-solving, stress, psychological well-being, lifestyle, pathologies, engagement in cultural events and the arts, use of communication technologies) from participants aged 16 or older and living in households across all regions in Poland. The dataset was derived from a panel study where the sample of households was first drawn and surveyed in 2000. This same sample of households was re-surveyed in 2003, and then every two years thereafter.
In this study, we merged the 2015 SD sample (19,769 respondents) with 2014 local air pollution data received from the Office of the Chief Inspectorate for Environmental Protection, which provides hourly measurements of various air pollutants taken at approximately 2000 air-quality measuring stations from across Poland in the years 2000-2018 [66,67]. The data were aggregated and merged using the classification of Poland into 66 Nomenclature of Territorial Units for Statistics (NUTS-3) regions [68]. The sample characteristics of regions are provided in Table 1. Four cancer-risk behaviours were identified using data from the SD on the regional level: average daily cigarette use, alcohol consumption, sports activity, and being overweight. Cigarettes per day is a continuous variable constructed from the self-reported average number of cigarettes smoked per day. Alcohol consumption is a dichotomous measure that was defined as true if the respondent reported "drinking too much" in the last year and false otherwise. The sports activity condition was set to be true if the respondent reported engaging in any sports activity, including aerobics, running/jogging/Nordic walking, gym, cycling, skiing or other winter sport, swimming, football or other team sport, yoga, martial arts, or another sport or type of physical activity. The measure of overweight/obesity is represented by the body-mass index (BMI). It is a continuous measure calculated from the respondent's self-reported weight and height, measured in kilograms and centimetres, respectively.
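The construction of the regional lifestyle measures can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the record layout, the field names and the three respondents are hypothetical, and the actual SD variable names and any survey weighting are not reproduced here. BMI is computed with the standard formula, weight in kilograms divided by the square of height in metres.

```python
from collections import defaultdict
from statistics import mean


def bmi(weight_kg, height_cm):
    """Body-mass index: weight (kg) divided by squared height (m)."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def regional_means(respondents):
    """Average each risk measure within a region.

    The dictionary keys used here are hypothetical field names,
    not the variable names of the Social Diagnosis dataset.
    """
    by_region = defaultdict(list)
    for person in respondents:
        by_region[person["region"]].append(person)

    summary = {}
    for region, rows in by_region.items():
        summary[region] = {
            "cigarettes_per_day": mean(r["cigarettes_per_day"] for r in rows),
            # Dichotomous measures become shares of respondents in the region.
            "drinks_too_much_share": mean(1.0 if r["drinks_too_much"] else 0.0 for r in rows),
            "sports_share": mean(1.0 if r["sports"] else 0.0 for r in rows),
            "bmi": mean(bmi(r["weight_kg"], r["height_cm"]) for r in rows),
        }
    return summary


# Made-up respondents in two made-up regions, for illustration only.
respondents = [
    {"region": "A", "cigarettes_per_day": 0, "drinks_too_much": False,
     "sports": True, "weight_kg": 70.0, "height_cm": 175.0},
    {"region": "A", "cigarettes_per_day": 10, "drinks_too_much": True,
     "sports": False, "weight_kg": 90.0, "height_cm": 180.0},
    {"region": "B", "cigarettes_per_day": 5, "drinks_too_much": False,
     "sports": False, "weight_kg": 60.0, "height_cm": 160.0},
]
summary = regional_means(respondents)
```

Aggregating the dichotomous measures as shares keeps every regional variable continuous, which is what a distance-based cluster analysis requires.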
Four continuous measures of air quality were merged with the SD survey data: the level of benzo[a]pyrene (BaP), concentrations of airborne particulate matter with a diameter of 10 microns or less (PM10) and of 2.5 microns or less (PM2.5), and the number of days on which PM10 concentrations exceeded 50 µg/m³ (PM10 days). In general, the lower the ambient air measure, the better the air quality.
In this sample, tobacco use averages 3.5 cigarettes per day. Nearly 5% of respondents declared that they consume too much alcohol. The average national BMI is 26.3, and 36% of respondents confirmed completing at least one sports activity per week. The average number of years of completed education for the sample is around 12, indicating that most of the sample earned a high school diploma or its equivalent. Monthly net household income was around 4000 PLN (approx. 1000 USD, with an exchange rate of 1 USD = 4 PLN). In our sample, the annual concentration of larger particulate matter, PM10, averaged around 32 µg/m³, which exceeds WHO guidelines (20 µg/m³). In addition, although elevated PM10 concentrations should not occur on more than 35 days per year, our air-quality measurement shows that the PM10 concentration exceeded the limit on over 53 days. Annual measures of fine particulate matter concentrations (PM2.5) reached 23.60 µg/m³, which also exceeds the recommended WHO annual limit of 10 µg/m³ [69,70]. Moreover, the average annual level of benzo[a]pyrene was 4.42 µg/m³, which is very close to the recommended limit (5 µg/m³) [69].
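As an illustration of how a "PM10 days" measure can be derived from hourly station readings, the following Python sketch counts the days whose mean PM10 concentration lies above the 50 µg/m³ threshold named above. It rests on assumptions: the text does not state how hourly values were turned into daily values, so the daily mean used here is a choice made for this sketch, and the readings are made up.

```python
from statistics import mean

# Threshold taken from the text; the unit is micrograms per cubic metre.
PM10_DAILY_THRESHOLD = 50.0


def daily_means(hourly_by_day):
    """Collapse hourly readings into one mean value per day (an assumption)."""
    return {day: mean(values) for day, values in hourly_by_day.items()}


def exceedance_days(hourly_by_day, threshold=PM10_DAILY_THRESHOLD):
    """Count the days whose mean concentration is strictly above the threshold."""
    return sum(1 for value in daily_means(hourly_by_day).values() if value > threshold)


# Made-up readings for three days at a hypothetical station.
sample = {
    "2014-01-01": [40.0, 60.0, 80.0],  # mean 60, above the threshold
    "2014-01-02": [20.0, 30.0, 40.0],  # mean 30, below the threshold
    "2014-01-03": [50.0, 50.0, 50.0],  # mean 50, not strictly above
}
pm10_days = exceedance_days(sample)
```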

Methods
Analysis of air quality conditions, socio-economic factors and cancer-risk behaviours was completed using Ward's method [71], in which clusters are formed according to a minimum variance criterion and the distance between two objects is calculated using Euclidean distance. In the beginning, each cluster is a single observation. At each step, the algorithm merges the pair of clusters whose merger produces the smallest increase in within-cluster variance. The within-cluster sum of squares therefore begins at zero and grows as the algorithm merges observations. Ward's method does not provide guidelines on the number of clusters to select, thereby allowing for some theoretical argumentation as to the appropriate number of clusters. According to earlier studies, there is no single decision rule for selecting the proper number of clusters [72][73][74][75]. The selection of the number of clusters is a multi-decision problem, and algorithms need to be developed in order to automatically choose adequate numbers. In this study, the best fit was based on the R-square parameter (>0.7) and the cubic clustering criterion, which led to the selection of 3 to 5 clusters. It was empirically determined that the 5-cluster solution yielded the best match and was most easily interpretable. This solution was selected for further exploration. The PROC CLUSTER and PROC FASTCLUS procedures in SAS (9.4) were used to perform cluster analyses. Because the cluster analysis involved all variables, the variables were standardized with Z-scores in order to prevent variables with large variance (e.g., income) from having larger effects on results than variables with smaller variance (e.g., number of cigarettes per day) [76,77]. Standardization (sometimes called normalization) is important as it helps ensure that groups are defined based on the distance between observations rather than on the scale of individual variables.
Final results are interpreted as deviations of each cluster from the national average, that is, from the mean value across all regions. Standardization is necessary in order to limit the influence of outliers and to measure all variables on a similar scale. This helps avoid situations in which one variable has a larger impact on cluster analysis relative to another one. The graphical representation of clusters was performed using R software's sp package [78] and the NUTS-3 classification [68]. In order to measure potential impact on raw cancer morbidity data, we conducted multiple regression modelling [79]. The dependent variable was the number of newly diagnosed cancer cases without age adjustment, and the explanatory variables included SES, lifestyle and environment.
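The two steps described above, Z-score standardization followed by Ward's minimum-variance merging, can be sketched in plain Python. This is not the SAS code used in the study: it is a naive illustration on made-up data with two variables (income and cigarettes per day), and the population standard deviation is an assumption of the sketch. The merge cost uses the standard identity that joining clusters A and B raises the within-cluster sum of squares by |A||B|/(|A|+|B|) times the squared Euclidean distance between their centroids.

```python
from itertools import combinations
from statistics import mean, pstdev


def z_scores(rows):
    """Standardize each column to mean 0 and standard deviation 1.

    Uses the population standard deviation (an assumption of this sketch);
    a constant column would cause a division by zero.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [mean(col) for col in zip(*points)]


def ward_cost(a, b):
    """Increase in the within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(points, k):
    """Agglomerate singleton clusters until k clusters remain."""
    clusters = [[i] for i in range(len(points))]  # each observation starts alone
    while len(clusters) > k:
        def cost(pair):
            a, b = pair
            return ward_cost([points[i] for i in clusters[a]],
                             [points[i] for i in clusters[b]])
        # Pick the pair whose merger increases the variance the least.
        a, b = min(combinations(range(len(clusters)), 2), key=cost)
        clusters[a].extend(clusters[b])
        del clusters[b]  # safe because a < b
    return clusters


# Made-up regions: [monthly income in PLN, cigarettes per day].
regions = [
    [4000.0, 3.0], [4100.0, 3.2],
    [6000.0, 2.0], [6200.0, 2.1],
    [3000.0, 5.0], [3100.0, 5.2],
]
clusters = ward_clusters(z_scores(regions), 3)
groups = sorted(sorted(c) for c in clusters)
```

Without the standardization step, the income column, whose values are roughly a thousand times larger than the cigarette counts, would dominate every distance.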

Cluster Analysis
Our cluster analysis captured three groups of factors: environment, lifestyle and SES. Table 2 presents five clusters based on a combined analysis of environmental, lifestyle and SES factors. The first cluster is the city of Kraków (green district). This cluster is characterized by the second-highest level of household income, years of education and sports activity. In this cluster, measures of air pollution are very high and exceed WHO limits. The second cluster (yellow) covers the largest part of Poland. This cluster neighbours industrial and metropolitan areas. When it comes to lifestyle factors, this cluster shows a low level of sports activity and a low BMI, which falls below the national average. In this cluster, SES is low, with the lowest household income across all clusters. In Figure 1, the third cluster (purple) represents a district that includes the Silesia region and the city of Bełchatów. This cluster is characterized by limit-exceeding levels of benzo[a]pyrene and fairly high measures of PM2.5 and PM10. Additionally, these regions are characterized by high BMI and appreciable levels of educational attainment and income. Figure 1 reveals that district 4 (red), which represents the metropolitan areas of Warsaw, Wrocław, Szczecin, Poznań and Gdańsk, differs significantly from other areas of Poland. Cluster 4 is characterized by the highest levels of household net monthly income, most years of education, and the highest declared sports activity. In addition, this metropolitan cluster includes a fairly high percentage of respondents who declare consuming too much alcohol. It appears that these respondents are aware of both their healthy and unhealthy behaviours. Cluster 5 (blue), which encompasses the northern part of Poland, is the cleanest area in terms of air pollution. Nevertheless, it presents with the nation's highest average prevalence of cigarette use and over-consumption of alcohol, together with low sports activity and high BMI.
The difference between clusters 2 (yellow) and 5 (blue) stems from their respective measures of air pollution. Areas near industrial regions (i.e., old central industrial regions of Silesia) as well as those in and near Kraków appear more heavily polluted than the northern region of Poland.

Table 3 reports the number of newly diagnosed cancer cases per 100,000 citizens in each cluster in 2014. The highest incidence occurs in clusters 1 and 4, which comprise the city of Kraków and the metropolitan areas, respectively. These clusters are characterized by high levels of SES. The lowest number of newly diagnosed cases occurs in cluster 5, which is located in northern Poland and is characterized by the lowest levels of air pollution.

Multiple Linear Regression Model of Cancer Morbidity
Standard multiple linear regression was performed in order to examine which factors are associated with cancer morbidity. Cancer morbidity was measured per 100,000 citizens based on the National Cancer Registry (KRN) data [80] at the NUTS-3 regional level.
In order to discuss the potential effects of three groups of factors (environment, lifestyle and SES), it is necessary to understand the impact of these cancer risk factors on cancer morbidity. Firstly, there is a strong and significantly positive impact of the benzo[a]pyrene (BaP) level on cancer morbidity. If the BaP level rises by 1 µg/m³, cancer morbidity increases by 24 cases per 100,000 citizens. No variable in the lifestyle group has a statistically significant impact. As presented in Table 4, the estimated coefficient for cigarette consumption is positive, while that for sports activity is negative, meaning that sports activity is associated with fewer newly diagnosed cancer cases. With respect to the two variables capturing SES, higher household income has a negative effect on cancer morbidity, while additional years of education are associated with higher morbidity. These measures need to be analysed according to cluster segments, because higher SES is often associated with industrial areas where environmental aspects play a significant role. The R-squared statistic was 0.5673, which means that almost 57% of the variation in cancer morbidity is explained by the SES, lifestyle and environment variables in this regression model. A significant limitation of the regression model is the sample size, with observations from only 66 regions in one period of time. Even so, the regression model may offer insights that complement the cluster analysis.
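How a regression coefficient and the R-squared statistic are read can be shown with a small ordinary least squares sketch in Python. The data are synthetic and were constructed so that morbidity rises by exactly 24 cases per unit of BaP, mirroring the interpretation above; the income coefficient, the intercept and all values are made up, and nothing here reproduces Table 4. The solver works through the normal equations, which is adequate for a toy example but is not how statistical packages usually fit such models.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(xs, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in xs]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y_i for r, y_i in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(xs, y, beta):
    """Share of the variation in y explained by the fitted model."""
    fitted = [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in xs]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic regions: [BaP level, household income in thousands of PLN].
xs = [[1.0, 3.0], [2.0, 4.0], [4.0, 3.5], [6.0, 5.0], [8.0, 4.5]]
# Synthetic morbidity generated as 300 + 24 * BaP - 10 * income (no noise).
y = [300.0 + 24.0 * bap - 10.0 * income for bap, income in xs]

beta = ols(xs, y)
r2 = r_squared(xs, y, beta)
```

Because the synthetic data contain no noise, the fit recovers the BaP coefficient of 24 and an R-squared of essentially 1; with real regional data the unexplained share would be considerably larger, as the reported value of 0.5673 indicates.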

Discussion
This is the first attempt to examine a comprehensive set of cancer risk factors at the regional level in Poland. In 2017, the number of newly diagnosed cancer cases exceeded 164,000. According to the National Cancer Registry, the average growth rate of newly diagnosed cases in 2011-2016 measured around 2.4% yearly [29]. The negative impact of tobacco use, poor diet, physical inactivity and obesity has already been documented by several earlier studies [23,25,45,81]. Tobacco use accounts for nearly 22% of cancer deaths [1] and increases the risk of fourteen types of cancer [82], particularly lung cancer, which is the leading cause of cancer death in Poland [29]. Moreover, obesity linked with physical inactivity, alcohol consumption and an unhealthy diet increases the risk of cancer [24,25]. Another key finding is that air pollution increases the risk of cancer [49][50][51]. In addition, low SES is associated with an increased probability of tobacco use and related exposure to carcinogenic substances [17,83]. One study from Poland examined the relationship between SES and mortality resulting from lung cancer and respiratory diseases and confirmed a correlation between current SES and mortality from both respiratory diseases and lung cancer [14]. Considering WHO-defined cancer risk factors [1], this paper focuses not only on smoking and declared alcohol consumption but also takes into account internal factors such as SES, diet and sports activity combined with external determinants such as the environment. With this extensive approach in mind, a regional analysis of cancer risk factors was also undertaken. In order to develop cluster analysis on the aggregated regional NUT-3 level [68] data, the SD dataset with 19,769 respondents was merged with a local data bank on pollution [66,67].
The main research question examines the impact of combined environment, lifestyle and SES on the risk of cancer while taking the spatial split into account. The results of the final cluster analysis are presented in Figure 1 and Table 2. The first cluster (green) occurs in the city of Kraków, which is an interesting cluster, where citizens present similarly in terms of SES and sports activity to those living in other large metropolitan areas across Poland. However, Kraków is very heavily polluted due to coal, rubbish and wood heating methods as well as heavy and congested traffic. Moreover, Kraków is located in the Vistula River valley, which significantly limits natural air circulation. The second cluster (yellow) is the area neighbouring industrial regions. Table 2 shows that cluster 2 has slightly higher air pollution compared to the national average. Moreover, people living in the second cluster declare low sports activity and have relatively low SES. The third cluster (purple) includes the Silesia region and metropolitan area of Bełchatów. Like Krakow, this area is heavily polluted due to coal-mining and heavy industry. Unlike Kraków, however, the people living in this area exhibit less sports activity, which implies a higher BMI level. The fourth cluster (red) is a metropolitan district characterized by high SES and high sports activity ratio. On the other hand, people in large cities are more aware of risky behaviours and declare a fairly high level of alcohol consumption. People here are highly educated, which implies a higher awareness of non-healthy behaviours. District 5 (blue) in Figure 1 covers northern Poland and is characterized by the lowest levels of pollution in the country. However, the people of this region are heavy consumers of tobacco and alcohol, with low engagement in sports activity and a high BMI level.
In addition to the cluster analysis, a multiple regression model was estimated to help draw conclusions about the impact of the proposed interventions on cancer morbidity. Given the significant limitations of our regression model (a low number of districts), the regression analysis indicates only the direction of the impact of the environment, lifestyle and SES on cancer morbidity. Smoking and air pollution are positively associated with cancer morbidity, whereas sports activity and household income are negatively associated with it. Because the regression is an ecological analysis, causal interpretations should be made with caution.
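Reading the direction of an effect from a multiple regression can be sketched as follows. This is a minimal ordinary least squares example solved through the normal equations, not the study's model: the two predictors (a smoking share and a sports-activity share), the outcome and every number are synthetic, and the signs were chosen only to mirror the directions reported above.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest entry in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(predictors, outcome):
    """Return [intercept, b1, b2, ...] from the normal equations."""
    rows = [[1.0] + list(r) for r in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic regional data: (smoking share, sports-activity share).
X = [(0.20, 0.50), (0.30, 0.40), (0.25, 0.60), (0.35, 0.30), (0.40, 0.45)]
# Synthetic outcome built as 100 + 200 * smoking - 50 * sports.
y = [100 + 200 * smoking - 50 * sports for smoking, sports in X]

intercept, b_smoking, b_sports = ols(X, y)
```

Here `b_smoking` comes out positive and `b_sports` negative, which is the kind of directional reading the text describes. With only a handful of regions, as in the limitation noted above, coefficient magnitudes are unstable and only the signs can reasonably be interpreted.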
This study relied on self-reported survey data from the SD study and was therefore subject to response bias. A second potential limitation is related to the SD alcohol consumption data. A detailed review of the available datasets found no precise data on alcohol consumption per capita at the regional NUTS-3 level in Poland. A third possible limitation relates to cancer risk specifications. That is, this analysis did not take into account differences in cancer risk factors as they relate to different types of cancer. For example, 90% of lung cancer deaths are related to cigarette smoking [84]. Finally, this analysis does not differentiate within large metropolitan cities, where population density is high, because more detailed data are not available. For this reason, our study aggregated all data to the NUTS-3 level. Nonetheless, this study examines global cancer risk factors [1], and its results can be referenced when seeking to improve the effectiveness of cancer-prevention policy in Poland.

Conclusions
This study focused on identifying clusters in which local cancer prevention policy can be better targeted. Policy interventions should not only consider anti-smoking campaigns but may also include tailored fiscal and social policy tools that are specific to geographical clusters. These tools should be designed to focus on issues related to air pollution, consumer lifestyle and SES in target regions. Our main recommendations are presented in Table 5. In order to improve their effectiveness, national cancer prevention campaigns should focus on local cancer-risk factors. Where national policy tackles broadly defined problems such as cigarette smoking or air pollution, national and local policy-makers must work together to match proper policy tools to the cancer-risk factors that dominate a particular jurisdiction. Moreover, better methods of communication must be developed to adequately address the different needs of the various regions of Poland.
Author Contributions: D.M.-conceptualization; data curation; formal analysis; funding acquisition; methodology; validation; visualization; writing-original draft preparation; writing-review and editing preparation; M.A.W.-conceptualization; funding acquisition; project administration; supervision; writing-original draft preparation; C.C.-conceptualization; funding acquisition; supervision; writing-original draft preparation; writing-review and editing preparation. All authors have read and agreed to the published version of the manuscript.

Funding: The study was conducted under the New Economy Lab project funded by the Polish National Agency for Academic Exchange (NAWA). Ciecierski also acknowledges funding from the National Institutes of Health's National Cancer Institute, Grant Number U54CA202995. The content is solely the responsibility of the authors and does not necessarily represent the official views of NAWA or the National Institutes of Health.

Conflicts of Interest: The authors declare no conflict of interest.