Area-Level Determinants in Colorectal Cancer Spatial Clustering Studies: A Systematic Review

Colorectal cancer (CRC) incidence is increasing in specific geographic regions and, compounded by interactions among multifactorial determinants, shows a tendency to cluster. This review aimed to identify and synthesize available evidence on clustering patterns of CRC incidence, specifically in relation to the associated determinants. Articles were systematically searched in four databases: Scopus, Web of Science, PubMed, and EBSCOHost. The final articles were identified following PRISMA guidelines. The selected full-text articles were English-language spatial studies focusing on CRC cluster identification, published between 2016 and 2021. Systematic reviews, conference proceedings, book chapters, and reports were excluded. From the final 12 articles, data on the spatial statistics used and the associated factors were extracted. The factors linked with CRC clusters were further classified into ecological (health care accessibility, urbanicity, dirty streets, tree coverage), biological (age, sex, ethnicity, overweight and obesity, daily consumption of milk and fruit), and social determinants (median income level, smoking status, health cost, employment status, housing violations, and domestic violence). Future spatial studies that incorporate the physical environment related to CRC clusters and the potential interactions among ecological, biological, and social determinants are warranted to provide more insight into the complex mechanisms of CRC cluster patterns.


Introduction
Cancer is one of the most important causes of mortality and morbidity around the globe. It is the third leading cause of death after cardiovascular diseases and road traffic injuries, as reported by the Global Burden of Disease Study 2017 [1]. Among cancer types, colorectal cancer (CRC) is the third most common cancer in men and the second most common in women [2]. The disease accounted for the loss of 15,800,000 disability-adjusted life years in 2013, 56% of which occurred in middle- and low-income countries and 44% in industrialized countries [3]. While the age distribution of colorectal cancer has shifted to the left (toward younger ages) in Western regions, the number of new cases diagnosed among young and elderly Asians is increasing [4,5].
CRC is multifactorial by nature. No single risk factor plausibly explains CRC on its own. Individual factors such as sex, age, and family history, and lifestyle behaviors including alcohol consumption, high intake of red and processed meat, low fruit and vegetable intake, high-fat diet, and physical inactivity, have been extensively studied [6–8]. A few researchers have postulated specific gene-environment interactions likely to cause CRC at the individual level [9,10]. Although the public has been continuously educated about CRC risk factors, descriptive studies still show large disparities in CRC incidence across locations [11,12].
While many factors have been related to CRC, they can generally be classified as modifiable or non-modifiable. The established modifiable risk factors include obesity, westernized diet, physical inactivity, and low fiber intake [7,13–15]. Meanwhile, the non-modifiable risk factors include heredity, age, gender, and ethnicity [16–18]. Numerous studies have focused on colorectal cancer because of its high incidence and mortality and its close link to individual lifestyle (modifiable risk factors), which indirectly implies a tendency to cluster. People living in the same neighborhood tend to have similar lifestyles and share many cluster-inducing factors, raising substantial public concern over locally elevated CRC incidence [19,20]. Despite this knowledge of modifiable and non-modifiable risk factors, less is known about interactions among multiple risk factors that may occur within a small geographic area. Thus, studies exploring how simultaneous risk factors collectively contribute to potential CRC clusters within a local area have expanded.
According to the classical epidemiological triad, disease transmission can be explained by host, agent, and environmental factors and the interactions among them. The triad model has guided successful interventions to curb the spread of infectious diseases. The recognition of multiple risk factors for CRC suggests an ecobiosocial approach to explain the relationships and interactions among ecological, biological, and social factors. Historically, the ecobiosocial concept was essential in the field of vector-borne diseases, where integrated vector management actions and planning were first developed [21]. In 2016, an obesity framework based on the ecobiosocial concept was proposed to elucidate how environmental influences shape unhealthy food choices which, compounded by social and genetic factors, lead to obesity [22]. Because CRC is a chronic disease that requires more than a decade of potential exposure to multiple risk factors, it may result from complex interactions among ecobiosocial factors. The ecological component has been described as "individual activity" and "activity environment" [22], which disrupt energy balance and promote obesity, one of the CRC risk factors. It includes any factor that promotes or inhibits an individual's physical activity within a local setting, such as the availability of recreational parks, green areas, safe walkability, or accessible public transportation, as provided in many major cities. The biological component covers predisposing genetic factors, age, sex, ethnicity, and obesity, which together describe an individual's susceptibility to developing CRC. Finally, the social component covers behaviors and attitudes that are likely shaped by the cultural importance of eating patterns, westernized diet, socioeconomic status, smoking habits, and lack of health-seeking behavior.
Spatial cluster detection is an important tool in cancer surveillance for identifying areas of elevated risk and generating hypotheses about cancer etiology [23,24]. Where the established risk factors of an area are known, spatial cluster analysis may help predict local cancer trends and inform control strategies. A spatial disease cluster is defined as an area with an unusually high disease incidence rate [25]. However, the term has been used loosely in population-based cancer epidemiology because of the complex interactions among the multiple factors believed to contribute to such events. Cancer cluster identification depends heavily on the accuracy of the methodological design used to estimate local relative risk compared with a reference (control) population [26,27]. In addition, spatial analysis of CRC incidence may provide new knowledge on how external risk factors and people's lifestyles relate to the CRC burden across communities. This will enable policymakers to develop interventions tailored to areas where CRC risk is greater. This review therefore aimed to identify and synthesize available evidence on clustering patterns of CRC incidence, specifically in relation to the associated determinants.

Materials and Methods
The systematic review was conducted in compliance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol [28]. PRISMA guides researchers in sourcing appropriate information with a high level of accuracy. Following the protocol, the authors began the systematic literature review by formulating appropriate research questions. The authors then performed a systematic search consisting of identification, screening, and eligibility stages. Next, the authors appraised the quality of the selected articles using the Appraisal tool for Cross-Sectional Studies (AXIS tool) [29]. Finally, the authors read all the included articles in detail for data extraction and analysis.

Formulation of the Research Question
The research question was formulated based on the PICO concept, a tool often used to help authors develop suitable research questions for a review. It consists of Population or Problem, Interest, and Context/Outcome [30]. Based on this concept, the authors included three main aspects in the review: colorectal cancer (Population), spatial cluster (Interest), and determinants (Context/Outcome). These led to the main research question: "What are the determinants commonly linked to colorectal cancer cluster in spatial studies?".

Systematic Searching Strategies
The systematic search strategy consisted of the identification, screening, and eligibility stages (Figure 1).

Identification
The identification stage involved enriching the keywords with synonyms and their variations for use in the database searches. The search string was constructed using Boolean operators and phrase searching, as shown in Table 1. The systematic literature search was conducted between 24 and 27 May 2021 in four primary databases (Scopus, Web of Science, PubMed, and EBSCOHost) and retrieved 3134 records. These four databases were selected because they were available and accessible to our institution. A total of 45 duplicate records were found and removed. The records were exported from the databases and arranged for screening in an Excel spreadsheet.
Table 1. Keyword search used in the identification process.

Database: EBSCOHost
Search string: (("colorectal cancer" OR "colorectal neoplasm*" OR "colorectal tumor*" OR "colorectal carcinoma" OR "large bowel cancer") AND ("cluster analysis" OR "spatial analysis" OR "geographical information system" OR "geographic distribution" OR "incidence distribution" OR "demography")) AND ("risk factor*" OR "cancer risk" OR "determinant*")
Note: The symbol * was used in the search strategy as a truncation and wildcard operator to capture variants of the selected keywords.
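The search string follows a common pattern: the synonyms for one concept are joined with OR, and the concepts are joined with AND. As an illustration only, the minimal Python sketch below assembles such a string from the Table 1 keyword groups and checks that its parentheses balance before it is pasted into a search interface. The helper names `or_group`, `build_query`, and `parentheses_balanced` are hypothetical and not part of any database's API. Each database has its own field tags and truncation rules, so a generated string would still need adapting per database.

```python
def or_group(terms):
    """Join the synonyms of one concept with OR, quoting each as a phrase."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*concept_groups):
    """Combine concept groups with AND, so a record must match every concept."""
    return " AND ".join(or_group(group) for group in concept_groups)


def parentheses_balanced(query):
    """Return True if every '(' in the query has a matching ')'."""
    depth = 0
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' closes a group that was never opened
                return False
    return depth == 0


# Keyword groups from Table 1: Population, Interest, Context/Outcome.
population = ["colorectal cancer", "colorectal neoplasm*", "colorectal tumor*",
              "colorectal carcinoma", "large bowel cancer"]
interest = ["cluster analysis", "spatial analysis", "geographical information system",
            "geographic distribution", "incidence distribution", "demography"]
outcome = ["risk factor*", "cancer risk", "determinant*"]

query = build_query(population, interest, outcome)
print(query)
print(parentheses_balanced(query))
```

The generated string groups the concepts as (population) AND (interest) AND (outcome). Because AND is associative, this is logically equivalent to the nested grouping shown in Table 1.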

Screening
The authors examined the title and abstract of each article for relevance and screened them against specific criteria. The inclusion criteria were: (1) published between 2016 and 2021, (2) full original article, (3) written in English, and (4) study focused on identifying colorectal cancer clusters or the interrelations between two or more clusters. The publication window was chosen to reflect recent developments in, and the rapidly changing nature of, Geographic Information System (GIS) software. Systematic reviews, conference proceedings, book chapters, and reports were excluded. Any disagreement on article selection was resolved through discussion. The screening process excluded 1603 articles, and the remaining 46 articles proceeded to full-text retrieval for eligibility assessment.
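The four inclusion criteria can be expressed as a single filter over exported records. The sketch below assumes a hypothetical record layout: `Record` and its field names are illustrative and do not reflect the structure of any database export. It also treats criterion (4) as a yes/no judgement that a reviewer has already recorded from the title and abstract.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One exported search record; the field names are hypothetical."""
    title: str
    year: int
    language: str
    article_type: str             # e.g. "original article", "systematic review"
    addresses_crc_clusters: bool  # reviewer's judgement from title and abstract


def meets_inclusion_criteria(record: Record) -> bool:
    """Apply the four screening criteria stated in the text."""
    # Reviews, proceedings, book chapters, and reports fail criterion (2).
    return (
        2016 <= record.year <= 2021                    # (1) publication window
        and record.article_type == "original article"  # (2) full original article
        and record.language.lower() == "english"       # (3) written in English
        and record.addresses_crc_clusters              # (4) focus on CRC clusters
    )


records = [
    Record("Hypothetical study A", 2018, "English", "original article", True),
    Record("Hypothetical study B", 2014, "English", "original article", True),
    Record("Hypothetical study C", 2020, "English", "systematic review", True),
]
kept = [r.title for r in records if meets_inclusion_criteria(r)]
print(kept)
```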

Eligibility
Of these, 38 full-text articles were successfully retrieved for eligibility assessment. The authors reviewed all full-text articles and recorded the reason for each exclusion. A total of 26 articles were excluded because they lacked spatial analysis (n = 13), focused on CRC mortality (n = 5), focused on CRC screening adherence (n = 5), or combined CRC with other types of cancer (n = 3). The remaining 12 articles proceeded to the quality appraisal process.
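At each PRISMA stage, the records passed on should equal the records entering the stage minus those excluded, counted by reason. The small Python sketch below makes that check explicit using the eligibility counts reported above. The helper `remaining_after` is hypothetical.

```python
def remaining_after(stage, n_records, exclusions):
    """Return the records left after a PRISMA stage, checking the arithmetic."""
    n_excluded = sum(exclusions.values())
    if n_excluded > n_records:
        raise ValueError(f"{stage}: {n_excluded} exclusions exceed {n_records} records")
    return n_records - n_excluded


# Counts reported for the eligibility stage.
full_texts_assessed = 38
exclusion_reasons = {
    "no spatial analysis": 13,
    "focus on CRC mortality": 5,
    "focus on CRC screening adherence": 5,
    "combined with other cancer types": 3,
}
included = remaining_after("eligibility", full_texts_assessed, exclusion_reasons)
print(f"excluded: {sum(exclusion_reasons.values())}, remaining: {included}")
```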

Quality Appraisal
The articles selected in the eligibility stage were further assessed for risk of bias to ensure study quality [31]. Study quality was assessed using the appraisal tool for observational and cross-sectional studies (AXIS tool), as shown in Table 2. The tool is designed for non-experimental research and includes 20 items covering each aspect of study quality [29]. Each study was assessed for potential risk of bias across key domains: study design, sample size justification, target population, sampling frame, sample selection, measurement validity and reliability, methodological limitations, and discussion. Two authors conducted the assessments independently. Any disagreement between them was resolved through discussion until consensus was reached; when necessary, a third reviewer was consulted. A total of 12 articles were included in the final stage.
The quality assessment results are presented in Table 2. The total number of "yes" answers was recorded for each study, because the tool guide does not specify a standardized scoring measure. The mean total quality score was 15.4 (range 14–16). Among the 12 included studies, the AXIS assessment showed that all had clear objectives and used a study design appropriate to those objectives. Similarly, all 12 studies clearly defined the target population with an appropriate sampling frame. Only three studies addressed and categorized non-responders (AXIS item 7) [32–34]. The risk factor and outcome variables were appropriate to the aims of each study and were correctly measured. All studies clearly explained how statistical significance was determined and described the methodology in enough detail for it to be repeated. One study [35] described the basic data inadequately. None of the studies described information about non-responders (AXIS item 14), possibly because of the ecological nature of the analyses used in spatial studies. All studies provided information on their methodological limitations. Five studies did not state any information regarding ethical approval or participant consent [33,36–39].
Table 2. Quality assessment of included studies using AXIS tool.

Columns: Author (Year), Country; AXIS items 1–20; Total recorded "Yes". Items are grouped as Introduction (1), Methods (2–11), Results (12–16), Discussion (17–18), and Other (19–20).
1. Were the aims/objectives of the study clear?
2. Was the study design appropriate for the stated aim(s)?
3. Was the sample size justified?
4. Was the target/reference population clearly defined?
5. Was the sample frame taken from an appropriate population base so that it closely represented the target/reference population under investigation?
6. Was the selection process likely to select subjects/participants that were representative of the target/reference population under investigation?
7. Were measures undertaken to address and categorise non-responders?
8. Were the risk factor and outcome variables measured appropriate to the aims of the study?
9. Were the risk factor and outcome variables measured correctly using instruments/measurements that had been trialled, piloted, or published previously?
10. Is it clear what was used to determine statistical significance and/or precision estimates?
11. Were the methods (including statistical methods) sufficiently described to enable them to be repeated?
12. Were the basic data adequately described?
13. Does the response rate raise concerns about non-response bias?
14. If appropriate, was information about non-responders described?
15. Were the results internally consistent?
16. Were the results for the analyses described in the methods, presented?
17. Were the authors' discussions and conclusions justified by the results?
18. Were the limitations of the study discussed?
19. Were there any funding sources or conflicts of interest that may affect the authors' interpretation of the results?
20. Was ethical approval or consent of participants attained?
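The AXIS guide does not prescribe a scoring scheme, so the review recorded the number of "yes" answers per study and summarized the totals by mean and range. A minimal sketch of that tally follows. The three studies and their answers are hypothetical placeholders, not the review's appraisal data.

```python
from statistics import mean


def axis_total(answers):
    """Number of 'yes' answers across the 20 AXIS items for one study."""
    if len(answers) != 20:
        raise ValueError("expected answers to all 20 AXIS items")
    return sum(1 for answer in answers if answer == "yes")


# Hypothetical answers ("yes", "no", "unclear") for three illustrative studies.
appraisals = {
    "Study A": ["yes"] * 16 + ["no"] * 4,
    "Study B": ["yes"] * 15 + ["no"] * 3 + ["unclear"] * 2,
    "Study C": ["yes"] * 14 + ["no"] * 6,
}

totals = {study: axis_total(answers) for study, answers in appraisals.items()}
scores = list(totals.values())
print(totals)
print(f"mean {mean(scores):.1f}, range {min(scores)}-{max(scores)}")
```

Items 13 and 19 are worded so that a "yes" flags a potential problem (non-response bias, or funding and conflicts of interest). An alternative scheme could reverse-score those two items, whereas the review counted "yes" answers directly.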

Type of Spatial Analysis
The review showed that multiple types of spatial cluster statistics were used across the included studies. Seven studies [11,34,35,38,39,42,43] used Moran's I to summarize spatial autocorrelation over the study area. Three studies [32,36,41] used Poisson regression models, three studies [11,32,37] used the Getis-Ord Gi statistic, and two studies [35,42] used local indicators of spatial association (LISA). One study each used the Besag-York-Mollié (BYM) model [38] and generalized linear models [33], alone or in combination with other methods. Generally, these tests can be classified as global, local, or focused according to the study hypotheses. Global cluster statistics such as Moran's I indicate whether spatial structure exists across an area, without locating individual clusters or distinguishing between them [44]. Local statistics such as LISA and the Getis-Ord Gi statistic describe the nature of spatial dependency at a given location. Focused tests (e.g., Poisson regression models) explore possible clusters near potential risk factors [44].
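To make the global versus local distinction concrete, the plain-Python sketch below computes three statistics for six hypothetical areas arranged in a row: global Moran's I, local Moran's I (one LISA statistic), and Getis-Ord Gi* z-scores (the "star" variant, which counts each area in its own neighbourhood). The sketch assumes simple contiguity, meaning each area neighbours the next. The Moran statistics use row-standardized weights, and Gi* uses binary weights. Significance testing is omitted. In practice, local statistics are usually assessed by conditional permutation and mapped as high-high or low-low clusters, for example with the PySAL `esda` package.

```python
import math


def row_standardize(neighbors):
    """Turn neighbour lists into row-standardized weights: w[i][j] = 1 / |N(i)|."""
    return [{j: 1.0 / len(nbrs) for j in nbrs} for nbrs in neighbors]


def deviations(values):
    """Deviations of each value from the mean."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]


def global_morans_i(values, weights):
    """Global Moran's I: one summary of spatial autocorrelation for the whole area."""
    n = len(values)
    z = deviations(values)
    s0 = sum(sum(w.values()) for w in weights)  # sum of all weights
    cross = sum(wij * z[i] * z[j] for i, w in enumerate(weights) for j, wij in w.items())
    return (n / s0) * cross / sum(zi * zi for zi in z)


def local_morans_i(values, weights):
    """Local Moran's I: one value per area; positive means similar to its neighbours."""
    n = len(values)
    z = deviations(values)
    m2 = sum(zi * zi for zi in z) / n
    return [z[i] / m2 * sum(wij * z[j] for j, wij in w.items())
            for i, w in enumerate(weights)]


def getis_ord_gi_star(values, neighbors):
    """Getis-Ord Gi* z-scores with binary weights that include the area itself."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - mean ** 2)
    scores = []
    for i, nbrs in enumerate(neighbors):
        window = set(nbrs) | {i}  # Gi* counts area i in its own window
        k = len(window)           # binary weights: sum(w) = sum(w^2) = k
        local_sum = sum(values[j] for j in window)
        denom = s * math.sqrt((n * k - k * k) / (n - 1))
        scores.append((local_sum - mean * k) / denom)
    return scores


# Hypothetical incidence rates for six areas in a row; each area borders the next.
rates = [10.0, 12.0, 11.0, 30.0, 32.0, 31.0]
neighbors = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
w = row_standardize(neighbors)

print(round(global_morans_i(rates, w), 3))
print([round(v, 2) for v in local_morans_i(rates, w)])
print([round(v, 2) for v in getis_ord_gi_star(rates, neighbors)])
```

The global value compresses the whole row into a single number. The local values and Gi* scores show where the similarity sits: Gi* is negative for the three low-rate areas and positive for the three high-rate areas, with the largest magnitudes in the middle of each group.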

Factors Associated with CRC Cluster
Evidence of clustering was abundant, as most of the studies reported the presence of CRC clusters in the study population. However, the results were more meaningful when studies incorporated other factors to further understand their association with CRC clusters. The most frequently studied factors were age, sex, ethnicity, overweight and obesity, smoking, daily consumption of fruit and milk, socioeconomic status as represented by median income level, employment status, health costs, housing violations or domestic violence, health care coverage, urbanicity, dirty streets, and tree coverage. Collectively, these can be summarized as social, biological, and ecological determinants (Table 4). Across the included studies, eight studies [11,32,33,36,38,41–43] examined the relationship of CRC clusters with social factors, nine studies [32,34–39,41,42] examined biological factors, and two studies [11,42] analyzed ecological factors.
Table 4. Factors analyzed in each of the included studies.
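A study can analyse factors from more than one domain, so the three counts above overlap. The short Python sketch below applies set operations to the citation numbers given in the paragraph, assuming those lists are complete for each domain. It shows how many studies combined domains and which study covered all three.

```python
# Citation numbers listed in the paragraph above for each determinant domain.
domains = {
    "social": {11, 32, 33, 36, 38, 41, 42, 43},
    "biology": {32, 34, 35, 36, 37, 38, 39, 41, 42},
    "ecology": {11, 42},
}

all_studies = set().union(*domains.values())
# Number of domains examined by each study.
domain_count = {s: sum(s in refs for refs in domains.values()) for s in all_studies}
multi_domain = sorted(s for s, k in domain_count.items() if k > 1)
all_three = sorted(set.intersection(*domains.values()))

print(len(all_studies), multi_domain, all_three)
```

With these lists, the union recovers the 12 included studies. Six studies span more than one domain, and only [42] covers all three.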

Ecology
While CRC has been commonly linked to westernized diet and physical inactivity, less attention has been given to ecological factors that may contribute to colorectal carcinogenesis. Two studies suggest that the surrounding physical environment has shaped the progression of CRC clusters within an area over time [11,42]. High accessibility to healthcare facilities was correlated with higher CRC screening rates and, hence, with increased CRC incidence and clustering [42]. Similarly, CRC clusters were found to depend on the urbanicity level of an area [42]. Kuo et al. (2019) defined urban areas by the population density of each county, a definition that can be misleading when used to capture contextual geographic variation in ecological factors [42]. Fast-food outlets offering high-fat foods and processed meat are more common in urban areas and are therefore worth exploring as predictors of future CRC clusters. Although factors such as dirty streets and tree coverage were incorporated as proxies for the physical environment in the scoring of community statistical area characteristics, the role of ecology per se was not highlighted [11]. Future studies exploring the influence of ecological factors on CRC clusters are therefore recommended.
Social
Eight studies [11,32,33,36,38,41–43] applied spatial autocorrelation approaches to explore the influence of social factors on CRC clusters. Neighborhoods with a higher median household income, ranging between USD 38,040 and USD 80,876 annually in 2011, were associated with a decreased risk of both early- and late-stage CRC [32]. On the other hand, in middle-income countries where universal health coverage is an issue, high CRC cluster patterns were observed alongside more frequent utilization of health services, as measured through health costs. Indirectly, this association reflects the impact of socioeconomic inequalities on CRC incidence over time [36,43]. However, information on tumor stage following early detection and treatment, which would be important to complement these findings, is lacking. In addition, aggregated analyses make it difficult to establish causal links at the individual level with regard to each person's economic background.
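Associations such as the income gradient above are often estimated with an ecological Poisson regression, one of the focused approaches noted earlier, in which area case counts are modelled with a log-population offset. The sketch below fits `log E[cases] = log(population) + b0 + b1 * income` by Newton-Raphson in plain Python. It is illustrative only. The income and population figures are hypothetical, and the "cases" are exact expected counts generated from known coefficients so the fit can be checked. It is not the model specification used in [32]. A real analysis would normally use a statistics package, such as statsmodels' GLM with a Poisson family and an offset.

```python
import math


def poisson_regression(cases, population, covariate, max_iter=50, tol=1e-10):
    """Fit log E[cases] = log(population) + b0 + b1 * covariate by Newton-Raphson."""
    x_mean = sum(covariate) / len(covariate)
    xc = [x - x_mean for x in covariate]         # centring improves numerical stability
    a0 = math.log(sum(cases) / sum(population))  # start from the overall rate
    b1 = 0.0
    for _ in range(max_iter):
        mu = [p * math.exp(a0 + b1 * x) for p, x in zip(population, xc)]
        # Score vector (gradient of the log-likelihood).
        u0 = sum(y - m for y, m in zip(cases, mu))
        u1 = sum(x * (y - m) for x, y, m in zip(xc, cases, mu))
        # Information matrix [[h00, h01], [h01, h11]].
        h00 = sum(mu)
        h01 = sum(m * x for m, x in zip(mu, xc))
        h11 = sum(m * x * x for m, x in zip(mu, xc))
        det = h00 * h11 - h01 * h01
        step0 = (h11 * u0 - h01 * u1) / det
        step1 = (h00 * u1 - h01 * u0) / det
        a0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return a0 - b1 * x_mean, b1  # intercept on the uncentred covariate scale


# Hypothetical areas: median income (USD thousands) and population.
income = [38.0, 50.0, 62.0, 80.0]
population = [10000, 20000, 15000, 12000]
true_b0, true_b1 = -7.0, -0.01
# Exact expected counts (not random draws), so the fit should recover the true coefficients.
cases = [p * math.exp(true_b0 + true_b1 * x) for p, x in zip(population, income)]

b0, b1 = poisson_regression(cases, population, income)
print(f"b1 = {b1:.4f}, rate ratio per USD 10,000 = {math.exp(10 * b1):.3f}")
```

Here `exp(10 * b1)` is the rate ratio per USD 10,000 of median income. A value below 1 corresponds to lower incidence in higher-income areas, which is the direction of the association described above. It remains an area-level association, not an individual-level effect.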
Biology
The biological factors most frequently analyzed in CRC spatial cluster studies were age and sex. Geographic areas with ageing populations tended to form CRC clusters more than areas with younger populations [34,39,41]. However, information on length of residence and migration was lacking to verify the plausible relationship between age and CRC clusters in the context of residential areas [11,42]. On the other hand, Pakzad et al. (2016) and Roquette et al. (2019) revealed sex-specific spatial patterns of CRC clusters for men and women [35,37]. These findings suggest potential sex-specific determinants of CRC susceptibility in certain areas, despite exposure to several other risk factors such as high-fat diet, physical inactivity, and smoking. In areas with heterogeneous ethnic populations, Liu et al. (2016) reported higher CRC incidence among Hispanics than among non-Hispanic Whites and Blacks [32]. This ethnic pattern may support the notion of gene-environment interaction in CRC progression, arising from culturally specific dietary patterns and lifestyles. However, other factors such as length of residence and social reciprocity should be critically considered and compared with the native population.
Based on these findings, there is a compelling case for future research on how ecological, biological, and social factors collectively relate to the geographic distribution of CRC incidence, in order to create area-level tailored cancer care services. Baseline information on local patterns of CRC distribution can guide resource allocation and the planning of more targeted community interventions.

Discussion
The review systematically identified how the included studies defined population-based CRC clusters through their methodological approaches. Differences in the methodology and statistical methods used were described to better understand how spatial clusters are defined [11,45]. To date, there is little consensus on the definition of clusters for non-communicable diseases, specifically colorectal cancer, in the community [46]. The CDC defines a cancer cluster as "a greater than expected number of cancer cases that occurs within a group of people in a geographic area over a defined period of time". However, there is no definitive baseline figure for such a cancer in an area, which leaves a notoriously vague grey area [25,47]. Challenges encountered in field investigations of cancer clusters have prompted multiple arguments about the validity of the statistical analyses [27,48]. To overcome potentially inflated probabilities, an alternative approach has been proposed that uses standardized incidence ratio (SIR) analysis within a causal inference framework for cancer clusters [49]. Framing the statistical tests around exposure hypotheses, rather than around the observed cancer outcomes alone as is traditional, may lead to more reliable findings. Spatial cluster analysis plays an important role in quantifying patterns of geographic variation [26,42,50]. It is commonly used in disease surveillance, spatial epidemiology, population genetics, landscape ecology, and many other fields, but the underlying principles are the same [51,52]. In cancer research, spatial patterns are of interest for explaining the link between exposure to the surrounding environment and the development of cancer over more than ten years, a period during which the environment and ecosystem may have undergone drastic changes.
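As a point of reference for the SIR mentioned above, the conventional (non-causal) calculation compares observed cases with the number expected if reference age-specific rates applied to the area's population (indirect standardization). The sketch below computes an SIR with Byar's approximation to the exact Poisson 95% confidence interval. The population and rates are hypothetical, and the sketch does not implement the causal-inference framework of [49].

```python
import math
from statistics import NormalDist


def expected_cases(area_population_by_age, reference_rates_by_age):
    """Indirect standardization: cases the area would have at the reference rates."""
    return sum(p * r for p, r in zip(area_population_by_age, reference_rates_by_age))


def sir_with_ci(observed, expected, level=0.95):
    """SIR = observed / expected, with Byar's approximation to the exact Poisson CI."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    sir = observed / expected
    if observed == 0:
        lower = 0.0
    else:
        lower = observed * (1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))) ** 3 / expected
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3 / expected
    return sir, lower, upper


# Hypothetical area: population in three age bands and reference (e.g. national) rates.
population = [20000, 15000, 8000]
reference_rates = [0.00005, 0.0004, 0.0015]  # cases per person-year
expected = expected_cases(population, reference_rates)
sir, lo, hi = sir_with_ci(observed=30, expected=expected)
print(f"E = {expected:.1f}, SIR = {sir:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

An interval lying entirely above 1 suggests more cases than expected in that area. When many areas are tested, multiple comparisons also need to be handled, which relates to the concern about inflated probabilities raised above.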
Approaches to geographic pattern recognition include visualization techniques based on "eye-balling", kernel-based methods that accentuate differences on a surface, artificial intelligence approaches, and exploratory spatial data analysis (ESDA), which relies on statistical tests [44].
Despite progress in the spatial statistics used in CRC research, the review highlighted that many studies still rely on the traditional Moran's I to test for spatial independence. Its frequent use may reflect its widely understood interpretation, which resembles that of a correlation coefficient [53]. With growing interest in spatial statistical methods across various contexts, the heterogeneity of geospatial studies of CRC incidence has made study outcomes difficult to compare [32,37,38]. Spatial analysis has been applied to improve understanding of a range of CRC-related issues, including its distribution and determinants, the mechanisms driving local CRC epidemiology, the effect of preventive strategies, and barriers to seeking treatment. Geospatial methods have often been combined with environmental exposure data to understand the drivers of local cancer epidemiology; however, such studies remain limited for CRC in high-incidence areas [11,42,54,55].
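Moran's I behaves much like a correlation between each area's value and its neighbours' values. One common way to judge whether it departs from spatial independence is a permutation test: the values are shuffled across areas many times, and the observed I is compared with the shuffled ones. The Python sketch below assumes binary contiguity weights on hypothetical areas in a row, a fixed random seed for reproducibility, and a one-sided test for positive autocorrelation.

```python
import random


def morans_i(values, neighbors):
    """Global Moran's I with binary contiguity weights (w_ij = 1 if j borders i)."""
    n = len(values)
    mean = sum(values) / n
    z = [x - mean for x in values]
    s0 = sum(len(nbrs) for nbrs in neighbors)  # number of ordered neighbour pairs
    cross = sum(z[i] * z[j] for i, nbrs in enumerate(neighbors) for j in nbrs)
    return (n / s0) * cross / sum(zi * zi for zi in z)


def permutation_test(values, neighbors, n_perm=999, seed=1):
    """One-sided pseudo p-value for positive spatial autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign the same values to areas at random
        if morans_i(shuffled, neighbors) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Ten hypothetical areas in a row; 1 = above-median incidence, 0 = below.
n_areas = 10
path = [[j for j in (i - 1, i + 1) if 0 <= j < n_areas] for i in range(n_areas)]
clustered = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
alternating = [1, 0] * 5

print(permutation_test(clustered, path))
print(permutation_test(alternating, path))
```

A contiguous block of high values yields a positive I with a small pseudo p-value. A perfectly alternating pattern yields a negative I, and no shuffle produces a smaller value, so its one-sided p-value is 1.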
Although ecological determinants have shown great significance in spatial research, they remain understudied in relation to CRC. Health care coverage, urbanicity, dirty streets, and tree coverage were the ecological determinants analyzed in the reviewed studies. High accessibility of healthcare facilities offers better services, including screening, and is therefore associated with high CRC clustering [42,56]. Many studies have linked urbanicity with higher CRC incidence. This is partly because of the availability of health infrastructure and advanced treatment options, and partly because of high population density. Walkability and street greenness have been associated with the amount and duration of physical activity [57], one of the well-known protective factors for CRC.
Exposure to unhealthy food environments, such as the availability of and accessibility to unhealthy food stores, may encourage the surrounding community to adopt less healthy diets [54,58,59]. Likewise, the absence of nearby green spaces or recreational parks for physical activity may perpetuate physical inactivity [19,60,61]. These are examples of how the physical environment may instill unhealthy lifestyles in localized settings, which may increase CRC risk in the long term. The review identified few studies that explored the association of the physical environment with population-based CRC clusters [11,42]. The physical environment plays an important role in the formation of obesogenic environments, shaping the behavior and lifestyle of the population [62]. It contributes to CRC occurrence through both direct and indirect pathways. Previous literature has highlighted the significance of exposure to ecological factors for CRC clusters to justify public health actions and policy in the context of preventive strategies [14,54,60,61,63]. Failure to identify potentially modifiable factors within the physical environment poses salient challenges for future CRC prevention and control.
Previous literature has examined the molecular genetics associated with CRC. This includes the APC gene, the K-ras family, p53, DCC, and several mismatch repair genes, whose alterations lead to mutations throughout the genome of affected cells. While more than 50% of sporadic CRC cases have been linked to some degree of genetic mutation, COX-2 genetic polymorphism is particularly common among Caucasians compared with Asians [64]. Males are more likely than females to be diagnosed with CRC across all age groups, demonstrating the role of sex in carcinogenesis [64,65].
Other social aspects, such as adherence to existing CRC screening programs, may help explain the existence of a local cluster pattern [66,67]. High recorded CRC incidence in an area can result from high uptake of CRC screening, indicating good health awareness [68]. Health behaviors and attitudes are strongly influenced by socioeconomic status [69]. Studies have reported better health outcomes in countries that provide universal health coverage [41]. Similarly, a few recent spatial studies showed high-high clusters of CRC concentrated in urban rather than rural areas [11,34,42]. This may be further explained by urbanized lifestyles, in which online food delivery is now readily accessible and available [70], promoting physical inactivity across all age groups in the population. Thus, future studies examining the influence of urbanized lifestyles on the formation of CRC spatial clusters are recommended.
When studied independently, ecological, biological, and social determinants each have, to an extent, a significant impact on the geographic aggregation of CRC incidence. In real populations, however, most of these factors are present simultaneously and may interact with each other, producing a greater combined effect on CRC risk. Many studies have highlighted synergistic effects at the individual level, including animal studies supporting gene-environment interaction [71,72]. However, few studies have examined the interactions among these factors in ways that benefit preventive strategies at the population level. Analysis of combined effects is crucial for informing multisectoral stakeholders about the challenging CRC burden as a shared public health issue. Figure 3 summarizes the interactions between ecological, biological, and social determinants that potentially influence the existence of CRC clusters within a population.

Strength
The review identified potential future research areas on the association of ecological, biological, and social factors with the clustering pattern of CRC incidence. Building on existing knowledge of population CRC clusters driven by various sociodemographic circumstances, future studies can be designed to explore the physical and built environment across various geographic settings. Furthermore, the review gives multilevel stakeholders insight into more specific intervention and preventive strategies tailored to high-risk areas.

Limitation
Because ecological, biological, and social determinants interact to cause colorectal cancer at both the individual and community levels, it is difficult to distinguish the effect of each factor independently. Spatial analyses of occupation-related cancer incidence were not included in the review, which limits the discussion of ecological and social influences on cluster distribution.
Most of the reviewed articles were from middle-income settings, which may reflect either publication bias or a concentration of research efforts in such settings. In high-incidence countries of the Asian region, the limited use of spatial analysis methods could reflect a lack of access to information resources or insufficient expertise. Nonetheless, the review suggests that areas with high CRC incidence, where clustering may be epidemiologically important, stand to gain the most from understanding CRC spatial patterns.
Nearly all the models showed significant associations between CRC clusters and demographic, socioeconomic, and risk-factor variables, although it is difficult to rule out publication bias favoring studies with positive findings. However, the population-level associations observed between CRC clusters and factors such as sex, household income, and obesity varied across studies. These are recognized as important individual-level risk factors, so inferring individual risk from such area-level associations carries the potential for ecological fallacy.

Conclusions
In conclusion, the review identified robust evidence of CRC clustering across different geographic settings. However, attempts to examine the association of area-level determinants with CRC clusters are lacking for ecological determinants compared with the more commonly studied biological and social attributes. Therefore, future spatial studies that incorporate physical environment (ecological) factors in this research field are warranted to guide policymakers in planning more targeted prevention and control actions. Studies relating more than one determinant to CRC clusters indicated a potential degree of interaction, which remains understudied. Future interaction analyses that combine ecological, biological, and social attributes may help explain CRC cluster trends in detail, thereby informing planning across the cancer control continuum.