Examining the Impacts of the Built Environment on Quality of Life in Cancer Patients Using Machine Learning

Abstract: Despite accumulating evidence regarding the impact of the physical environment on health-related outcomes, very little is known about the relationships between built environment characteristics and the quality of life (QoL) of cancer patients. This study aims to investigate the association between the built environment and QoL by using survey data collected from cancer patients within the United States in 2019. To better understand the associations, we controlled for the effects of sociodemographic attributes and health-related factors, along with those of the residential built environment (density, diversity, design, and distance to transit and hospitals), on the self-reported QoL of cancer patients after treatment. Furthermore, machine learning models, i.e., logistic regression, decision tree, random forest, and multilayer perceptron neural network, were employed to evaluate the contribution of these features in predicting the QoL. The results from the machine learning models indicated that the travel distance to the closest large hospital, perceived accessibility, distance to transit, and population density were among the most significant predictors of the cancer patients' QoL. Additionally, the health insurance status, age, and education of patients are associated with QoL. The adverse effects of density on the self-reported QoL in this study may be attributed to individuals' emotions towards the negative aspects of density. Given the strong association between QoL and urban sustainability, consideration should be given to the side effects of urban density on cancer patients' perceived wellbeing.


Introduction
The relationship between the built environment and health-related conditions (such as physical activity, obesity, and cardiovascular disease) has been extensively discussed in previous studies [1][2][3]. Although the sustainable built environment is a multidimensional concept, the literature defines it through particular measures including urban density and intensity of activities, diversity of land use, street network design, aesthetic qualities, and transportation facilities [4]. Neighborhood walkability, street connectivity, density, and mixed land use are associated with more walking trips and physical activity [5][6][7] and can reduce the risk of cardiovascular disease [8,9]. Built environment attributes including density, diversity of land use, availability of destinations, and distance to transit can explain physical activity-related improvements in mortality and morbidity [10].
However, the evaluation of built environment attributes on cancer outcomes is a relatively novel arena that has not been extensively discussed in cancer-related research.
Notably, a few studies discuss the role of residential neighborhoods on the level of physical activities and body mass of cancer patients [11][12][13][14], while a majority of literature focuses on the effects of geographical accessibility and distance to cancer care providers on cancer public health planners to recognize vulnerable groups of patients who require further support interventions and provide appropriate services to cancer survivors [49].
The growing need to develop sustainability in cities points to the importance of health and wellbeing in shaping sustainable communities. Researchers suggest a close association between QoL and environmental sustainability. Accordingly, in addition to measuring sustainability indicators, measuring and tracking the QoL of urban residents could be regarded as a critical goal of city planners and policymakers [50]. Moreover, the satisfaction of city residents regarding environmental sustainability indicators such as green spaces, air quality, noise level, cleanliness, and climate change can increase city livability and QoL [51]. Accordingly, investigating the effects of built environment factors on QoL can be regarded as an effort to understand the environmental sustainability concept.
This study aims to identify the factors that contribute to the QoL of people who struggle with cancer while considering a comprehensive set of internal and external factors. Accordingly, we address the following research questions: (1) How do built environment attributes, along with health-related and sociodemographic factors, shape the self-reported QoL of cancer patients? and (2) What are the most influential factors associated with patients' QoL?
To fill this research gap, the present study designs a comprehensive survey to systematically collect data from cancer patients across the US. We explore the effects of built environment attributes by considering objective measures as well as subjective, perceptual factors. Furthermore, to build a comprehensive framework, we include the sociodemographic attributes and health-related variables suggested in the literature as significant internal determinants of cancer patients' QoL. Although most previous studies have relied on simple regression models to examine the linear relationship between QoL and its predictors [36,52], we employ machine learning models (i.e., logistic regression, decision tree, random forest, and multilayer perceptron neural network) to analyze the survey data and delineate the nonlinear patterns underlying the predictor variables with respect to the QoL of cancer patients. The application of a broad conceptual model based on machine learning algorithms allows us to better understand self-reported QoL in cancer patients. The results of this study have great potential to help urban planners and policymakers design health-oriented neighborhoods that can improve cancer patients' wellbeing and satisfaction.

Survey Design
In this study, an online cross-sectional survey was designed to collect data from cancer patients across the US. All participants consented, and Institutional Review Board (IRB) approval was obtained for survey administration and data usage. To be eligible, participants had to (1) have been treated by radiotherapy, chemotherapy, or other treatments, (2) be in remission or still seeking other treatments, and (3) be over 18 years old. The main objective of the questionnaire was to obtain information related to the behavioral patterns of cancer patients during primary treatments, including radiotherapy, chemotherapy, and other treatments. The questionnaire contained general information and questions related to the cancer type and the treatments, patients' residential neighborhoods, as well as their perceptions, attitudes, and quality of life. The third part of the questionnaire covered the socioeconomic attributes of the respondents. After obtaining the initial data (n = 950), we omitted those patients who completed the questionnaire in less than 600 s to remove unreliable entries. As a result, 750 surveys remained for the spatial analysis.

Geocoding
By requesting the respondents' home addresses, we geocoded home locations. We omitted cases with invalid home addresses that we were not able to locate on the map. The remaining addresses were fed into Google MyMap (https://www.caliper.com/maptovu.htm, accessed on 20 January 2020), which draws pushpins on locations corresponding to given addresses. To ensure every pushpin was placed correctly, we manually verified each location by matching the home zip code and street names on Google Maps with the address provided by the respondent. In this way, we obtained latitude and longitude coordinates of home address locations (n = 589). Figure 1 depicts the spatial distribution of the cancer patients according to the types of treatments (radiotherapy and chemotherapy).


Built Environment Measures
Earlier studies have often calculated built environment attributes at the census block level [53] or based on the zip codes of participants' home addresses at the time of diagnosis [54]. While the literature has often measured the built environment at the block group level [14,24,25], this study measured disaggregated built environment attributes in a one-mile buffer area around each participant's home location. The built environment measures in the present study include density, land use diversity, street design, and distance to transit. Using a geographic information system (GIS), we joined different datasets to the extracted buffer layers and measured the built environment attributes.

To calculate population density, we used population data at the census tract level from the American Community Survey (ACS) (https://www.nhgis.org/, accessed on 6 February 2020). Hence, the population density of each participant's home location was calculated within the corresponding one-mile buffer area.
Further, we computed an "entropy index" (EI) as our measure of land use diversity (mixed-use) in buffer areas [55]. The jobs by sector, covering five sectors that are known to be serving jobs (retail, services, food and accommodation, health, and education), were summed for blocks within the buffer areas of participants' home locations. The entropy is calculated as follows:

$$EI = -\frac{\sum_{k=1}^{K} P_k \ln(P_k)}{\ln(K)}$$

where $P_k$ is the share of jobs in sector $k$, and $K$ is the number of sectors (i.e., $K = 5$). Notably, the EI ranges from 0 to 1, where a value of 1 indicates an equal number of jobs in each sector within the buffer area and 0 indicates that all jobs are in a single sector.
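The entropy index described above can be illustrated with a minimal Python sketch. It assumes the standard normalized Shannon entropy form (the sum of P ln P over sectors, negated and divided by ln K), which matches the stated 0 to 1 range; the function name and the job counts are hypothetical and are not taken from the study data.

```python
import math

def entropy_index(jobs_by_sector):
    """Normalized Shannon entropy of job counts across K sectors (0 to 1)."""
    k = len(jobs_by_sector)
    total = sum(jobs_by_sector)
    if k < 2 or total <= 0:
        return 0.0
    shares = [jobs / total for jobs in jobs_by_sector]
    # Sectors with no jobs are skipped, i.e., 0 * ln(0) is treated as 0.
    entropy = sum(-p * math.log(p) for p in shares if p > 0)
    return entropy / math.log(k)

# Hypothetical job counts within one buffer, ordered as:
# retail, services, food and accommodation, health, education.
even = entropy_index([100, 100, 100, 100, 100])   # jobs spread equally
single = entropy_index([500, 0, 0, 0, 0])         # all jobs in one sector
mixed = entropy_index([300, 100, 50, 40, 10])     # somewhere in between
```

An equal split across the five sectors gives the maximum value, a single-sector buffer gives the minimum, and uneven mixes fall strictly between the two.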
The distance to transit was measured as the network distance from participants' homes to the closest public transit stop by using Maptitude (https://www.caliper.com/maptovu.htm, accessed on 12 February 2020) software. Moreover, to evaluate participants' accessibility to public transportation, transit stop density was estimated by dividing the number of public transportation stops by the area of the buffer. We measured the travel distance to the closest large hospital along the route with the shortest travel time from each respondent's geocoded home location. Travel distances of more than 50 miles were excluded from the final analysis.

Perceived Built Environment and Accessibility
Although a supportive actual environment has been proven a necessary factor in improving individuals' health outcomes, studies suggest the importance of the perceived environment in promoting active mobility and health-related behaviors [56]. To explore the perceived built environment, we asked the participants to evaluate their neighborhood's characteristics on five-point Likert scales, where 1 = very poor to 5 = very good. Respondents stated how well their residence and its location met their needs through six statements in terms of easy access to their health provider, easy access to drugstores, closeness to work/school, closeness to family members, affordability of the neighborhood according to the patients' income and their treatment costs, quietness, and safety and security of their neighborhood according to the cancer patients' mental and physical condition. We then used confirmatory factor analysis (maximum likelihood with Promax rotation with 59.54% variance explained and Kaiser-Meyer-Olkin (KMO) = 0.869) to reduce the number of factors and extract one factor to indicate the perceived built environment. We also asked respondents to evaluate their residential accessibility in terms of approximate driving distance (in minutes) from their residential built environment to six different errands. We factor analyzed the distances to obtain a factor that indicated the built environment accessibility (maximum likelihood with Promax rotation with 62.27% variance explained and KMO = 0.879).

Quality of Life
All of our participants were selected from patients who had received three types of cancer treatments. Accordingly, to identify QoL, the survey included a self-reported question evaluating the respondents' overall quality of life after cancer treatments on a five-point Likert-type scale from 1 = terrible to 5 = excellent.

Other Key Variables
Previous studies have found that other key variables, such as sociodemographic attributes and health conditions, can also influence QoL. For instance, people on low incomes are more likely to have less physical activity and hence higher rates of morbidity and poorer physical function [57]. In contrast, higher income among adults is associated with higher levels of health-related QoL [58]. Accordingly, the survey contained self-reported questions related to the socioeconomic attributes of the patients including age, gender, income, race, education, employment status, homeownership, car ownership, and health insurance coverage.
Regarding cancer-related factors, the survey included questions about the cancer type and the type of treatments. We categorized the patients' cancer types based on diagnosis difficulty into three groups: easy, intermediate, and hard to diagnose [22]. Table 1 shows the descriptive statistics of the key variables and Table 2 demonstrates the results from the factor analysis. We also grouped the patients' cancers based on radiotherapy and chemotherapy.
Sustainability 2021, 13, 5438

Predictive Modeling for Quality of Life of Cancer Patients
In the literature, machine learning models have been widely used for predictive modeling tasks, e.g., to predict chronic diseases and analyze vital signs [59,60], to study the effects of the built environment on driving distance [61], to identify abnormalities in manufacturing processes and schedule predictive maintenance [62][63][64], and to recognize transportation modes with mobile sensing [65,66]. In this study, four machine learning models, i.e., logistic regression, decision tree, random forest, and multilayer perceptron neural network, were employed to investigate how built environment characteristics, perceived built environment, sociodemographic attributes, and patients' health-related variables were correlated with their QoL. Notably, the QoL scores are binarized into high- or low-level QoL using a cut-off of QoL = 3. In other words, a patient with QoL ≥ 3 is assigned the label "high-level QoL" (i.e., 1); otherwise, the patient is assigned the label "low-level QoL" (i.e., 0).

Let $x = (1, x_1, x_2, \ldots, x_m)$ denote the feature vector of an instance (i.e., a patient) and $y \in \{0, 1\}$ be the label. In logistic regression, the log-odds for label 1 are calculated as:

$$\log \frac{p(x)}{1 - p(x)} = \beta^{T} x$$

where $p(x)$ is the probability of label 1 given $x$. The parameter $\beta$ can be determined using maximum likelihood estimation [67]. As a classification problem, we adopted 0.5 as the cut-off probability: if $\hat{p}(x) \geq 0.5$, the estimated label $\hat{y}$ was considered as 1; otherwise, it was 0 [68].

The decision tree model deploys a tree-like structure to learn simple decision rules inferred from data. Starting from the root node, an instance is sorted through a sequence of internal nodes to reach a leaf node, which assigns a class label to the instance. Each internal node represents a test on the instance, and the path from the root to a leaf node can be represented as a classification rule [69]. Assume the leaf node $h$ contains $n_h$ patients, and let $\hat{p}_{hk}$ denote the proportion of class $k$ observations in node $h$:

$$\hat{p}_{hk} = \frac{1}{n_h} \sum_{i \in h} I(y_i = k)$$

The patients in node $h$ can be classified based on majority voting:

$$k(h) = \arg\max_{k} \hat{p}_{hk}$$

A few criteria can be used for splitting internal nodes, such as cross entropy and the Gini index [70]. Notably, although the decision tree method is considered a relatively simple approach, the generated classification rules are highly interpretable. Thus, it is still widely used in the machine learning community, especially among medical scientists [71].
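As an illustration of the labeling and decision rules described above, the following Python sketch binarizes QoL scores at the stated cut-off, applies the 0.5 probability threshold to a logistic model, and assigns a leaf label by majority vote. The coefficient values, feature values, and function names are hypothetical; in the study the coefficients would come from maximum likelihood estimation on the survey data.

```python
import math
from collections import Counter

def binarize_qol(score, cutoff=3):
    """Label 1 (high-level QoL) if score >= cutoff, else 0 (low-level QoL)."""
    return 1 if score >= cutoff else 0

def predict_proba(beta, features):
    """Probability of label 1 from the log-odds beta^T x, with x = (1, x_1, ..., x_m)."""
    x = [1.0] + list(features)
    log_odds = sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-log_odds))

def predict_label(beta, features, threshold=0.5):
    """Estimated label: 1 if the predicted probability reaches the threshold."""
    return 1 if predict_proba(beta, features) >= threshold else 0

def leaf_label(labels_in_leaf):
    """Majority class among the training labels that reached a leaf node."""
    counts = Counter(labels_in_leaf)
    # Ties are broken in favor of the smallest label (an arbitrary choice).
    return max(sorted(counts), key=lambda k: counts[k])

# Hypothetical five-point QoL answers and their binary labels.
labels = [binarize_qol(s) for s in [1, 2, 3, 4, 5]]

# Hypothetical coefficients (intercept first) and two standardized features.
beta = [0.5, -0.8, 0.3]
high = predict_label(beta, [1.0, 2.0])   # log-odds = 0.3
low = predict_label(beta, [2.0, 0.0])    # log-odds = -1.1
leaf = leaf_label([1, 0, 1, 1, 0])
```

A positive log-odds value corresponds to a probability above 0.5 and therefore to the high-level QoL label, which is why the first hypothetical patient is labeled 1 and the second 0.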
An extension of the decision tree classifier is the random forest, which consists of a large number of decision trees that operate as an ensemble [72]. To promote diversity among the trees, the bootstrap aggregation (i.e., bagging) strategy is incorporated in the random forest [69]. That is, each decision tree is grown on a bootstrap sample (i.e., instances randomly sampled from the dataset with replacement). Then, the prediction of class membership is based on a majority voting process. Let $\hat{y}_t(x)$ be the predicted label from the $t$th decision tree for an instance with feature vector $x$; the final predicted label of that instance is:

$$\hat{y}(x) = \text{majority vote}\,\{\hat{y}_t(x)\}_{t=1}^{T}$$

where $T$ is the total number of trees in the forest, which can be tuned by the user [69].

Finally, a multilayer perceptron (MLP) neural network model was implemented. The MLP maps feature vectors in the input layer to the class labels in the output layer through hidden layers [73]. Usually, multiple hidden layers are incorporated to handle the nonlinearity of the input data. The output layer contains two neurons representing the classification results (i.e., 0 and 1) [74]. The mean squared normalized error is used as the performance measure of the MLP, and the weight associated with each neuron is optimized based on the backpropagation approach [75].

Table 1 indicates that approximately half of the sample were male, with an average age of 53 years. The majority of the respondents were white American and mostly highly educated. The sample was covered by a variety of health insurance plans, with the majority covered by Medicaid and Medicare. About 83% of the sample were categorized into the easy to intermediate levels of cancer diagnosis. As cancer patients can receive more than one type of treatment during their care, respondents could give multiple answers. Thus, the distributions of the three cancer treatments are roughly similar to each other.
Table 2 presents the factor analysis for perceived accessibility and perceived built environment. Utilizing confirmatory factor analysis for each set of questions, we extracted two main factors.
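The bagging and majority-voting steps described for the random forest can be sketched in a few lines of Python. This is an illustration only: the "trees" below are hypothetical one-rule classifiers standing in for fitted decision trees, and the feature names and thresholds are invented for the example rather than taken from the fitted models.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) instances from rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def forest_predict(trees, x):
    """Final label: majority vote over the labels predicted by each tree."""
    votes = Counter(tree(x) for tree in trees)
    # Ties are broken in favor of the smallest label (an arbitrary choice).
    return max(sorted(votes), key=lambda label: votes[label])

# Hypothetical stand-ins for T = 3 fitted trees; each returns 0 or 1.
trees = [
    lambda x: 1 if x["age"] >= 50 else 0,
    lambda x: 1 if x["hospital_miles"] <= 10 else 0,
    lambda x: 1 if x["private_insurance"] else 0,
]

# Hypothetical patient record: the three trees vote 1, 0, 1.
patient = {"age": 62, "hospital_miles": 25.0, "private_insurance": True}
label = forest_predict(trees, patient)

# Each tree would be grown on its own bootstrap sample of the training rows.
rng = random.Random(0)
sample = bootstrap_sample(list(range(10)), rng)
```

Because sampling is done with replacement, a bootstrap sample has the same size as the original data but typically repeats some instances and omits others, which is what gives each tree a slightly different view of the data.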

Predictive Modeling Results
In this study, three metrics, i.e., accuracy, F-score, and area under the receiver operating characteristic curve (AUROC), were used to evaluate the performance of the proposed models. Accuracy was calculated as the number of correctly classified instances over the total number of instances. The F-score balanced the precision and recall of the classification results. Precision referred to the number of true positives over predicted positive instances, whereas recall measured the ratio between the number of true positives and all positive instances [69]. AUROC calculated the area under the ROC curve, which plots the true positive rate versus the false positive rate. The true positive rate is the probability that a positive instance (i.e., a high QoL patient in our case) will be predicted as positive, and the false positive rate is the probability that a negative instance is predicted as positive. Both the accuracy and F-score were in the range [0, 1], where the ideal value is 1. The AUROC ranged between 0.5 and 1, in which 0.5 corresponded to random classification and 1 corresponded to the perfect result. Notably, we randomly selected 80% of instances for training and 20% for testing, and each result is an average of 50 replications.
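The three metrics defined above can be computed as in the following Python sketch. It assumes the F-score is the usual F1 (harmonic mean of precision and recall), which the text does not state explicitly, and it computes AUROC through the equivalent pairwise-ranking formulation rather than by integrating the curve. The labels and scores are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    """Correctly classified instances over all instances."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def f_score(y_true, y_pred):
    """F1: harmonic mean of precision and recall (assumed form of the F-score)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def auroc(y_true, scores):
    """Share of positive/negative pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test labels (1 = high QoL) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.7, 0.3, 0.55]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
f1 = f_score(y_true, y_pred)
auc = auroc(y_true, scores)
```

Accuracy and the F-score depend on the 0.5 cut-off used to produce `y_pred`, whereas AUROC uses only the ranking of the scores and is therefore threshold-free.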
As shown in Figure 2, the most complex model, i.e., the MLP, achieved the highest accuracy for both the training (90%) and test sets (69%). Here, three hidden layers were deployed with 12, 12, and 6 neurons, respectively. The model with moderate complexity, i.e., the decision tree, achieved a test accuracy of 64%, which was slightly worse than the MLP but better than the logistic regression. Notably, the decision tree model is associated with high interpretability, and the obtained tree structure is visualized in Figure 3. In addition, the simplest model, i.e., logistic regression, achieved 66% and 61% accuracy for training and testing, respectively. This corroborates the results from models with higher complexity and demonstrates the effectiveness of the selected predictive variables. Notably, the best accuracy achieved for the test data is ~70%. This is mainly due to the high heterogeneity of cancer patients within each group (i.e., high-level QoL and low-level QoL), as we binarized the QoL scores from the survey.

In addition, the F-score and AUROC of each model are summarized in Table 3. It is noteworthy that all four models achieved similar F-scores and AUROCs. The MLP achieved the best F-score, i.e., 0.72. The other three models, i.e., logistic regression, decision tree, and random forest, achieved slightly lower F-scores of around 0.69 to 0.71. Further, the decision tree had the best AUROC, i.e., 0.67. The three other models obtained an AUROC ranging from 0.63 to 0.66. The results show that all the models are associated with good discriminative power, and they are quite robust in predicting the patients in the positive class (i.e., patients with high QoL). This, in turn, indicates that our selected predictive variables are closely related to the QoL of cancer patients.
To gain insight into the usefulness of each feature, we computed the importance score of each feature using the random forest approach. We omitted features with small scores and only considered the highest determinants of QoL (such as income, household size, perceived built environment). Table 4 indicates the top 17 features according to the scores. The score is calculated as the decrease in node impurity weighted by the probability of reaching that node and is normalized to [0, 1]. Moreover, to understand the intensity and direction of the relationship between the independent variables and QoL, Table 5 summarizes the results from the logistic regression and describes the most significant determinants of QoL in cancer patients. The results from the logistic regression support those of the tree-based algorithms. Comparing Tables 4 and 5 indicates that the top 10 features selected by the random forest have larger coefficients than other variables in the logistic regression model. Moreover, the results from the decision tree closely follow the random forest scores and are relatively in line with the coefficients from the logistic model (see Figure 3).
Age, health insurance, education, travel distance to the closest large hospital, and perceived accessibility are among the most important predictors of cancer patients' QoL in both the decision tree and the random forest scores.
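To make the importance score concrete, the following Python sketch computes an impurity-based importance for a toy tree. It assumes the Gini index as the impurity measure and the common "weighted impurity decrease" definition (the drop in impurity at a split, weighted by the share of samples reaching that node, then normalized so the scores sum to 1); the text does not specify these details, and the feature names and label lists are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def split_gain(parent, left, right, n_total):
    """Impurity decrease of one split, weighted by the share of samples reaching it."""
    n = len(parent)
    children = (len(left) * gini(left) + len(right) * gini(right)) / n
    return (n / n_total) * (gini(parent) - children)

def normalized_importance(gains_by_feature):
    """Scale the summed gains per feature so that they add up to 1."""
    total = sum(gains_by_feature.values())
    if total <= 0:
        return {feature: 0.0 for feature in gains_by_feature}
    return {feature: gain / total for feature, gain in gains_by_feature.items()}

# Toy tree on 8 hypothetical patients (1 = high QoL).
root = [1, 1, 1, 1, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [1, 0, 0, 0]   # root split on "age"
left_a, left_b = [1, 1, 1], [0]            # left child split on "hospital_distance"

gains = {
    "age": split_gain(root, left, right, n_total=len(root)),
    "hospital_distance": split_gain(left, left_a, left_b, n_total=len(root)),
}
importance = normalized_importance(gains)
```

In a random forest, such per-tree scores would additionally be averaged over all trees, so a feature ranks highly when it produces large, frequently reached impurity reductions across the ensemble.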

Discussion
This study employs a cross-sectional survey to investigate how the built environment affects the quality of life (QoL) of cancer patients.
The random forest results identify the ten most important features for predicting the QoL of cancer patients (Table 4), and the logistic regression indicates the associations. Our results demonstrate that built environment characteristics contribute considerably to predicting the QoL of the participants. According to the feature scores from the random forest, the travel distance to the closest hospital is one of the most significant predictors of QoL. Previous studies have suggested that the distance from residential neighborhoods to patients' treating hospital influences cancer outcomes, and consequently, cancer survivors who reside far from their care providers may have a lower QoL [76]. Although travel distance to health facilities can be a barrier for cancer patients [77][78][79][80], this study considers the distance to the closest hospital and not the treating hospital. Hence, residing in neighborhoods distant from a large hospital can be an indicator of living in low-density suburbs.
Perceived accessibility is the third most important predictor of QoL in the random forest [33]. The perception of accessibility to local neighborhood services, such as schools, public transportation, medical care, and shopping, exhibits a significant effect on self-rated health [81]. Although the logistic regression does not indicate a significant association between perceived accessibility and QoL, it seems that patients residing in areas with less accessibility (greater values of perceived accessibility) reported lower levels of QoL.
Distance to transit is the fourth most important feature in the random forest. Such measures are regarded as supportive built environment features that can significantly predict QoL [29]. The literature introduces distance to transit and residential density as two of the objective indicators measuring the quality of urban life [50]. According to the logistic model, patients residing farther from transit report lower QoL. The association is not statistically significant, but its direction is consistent with theory.
Population density is another determinant of QoL in the random forest. Despite the lack of a clear understanding of the mechanism through which different urban densities influence QoL, some studies have suggested that high density positively affects life satisfaction [82]. Higher population density can be positively associated with subjective wellbeing when accompanied by mixed land uses, public transport, limited car traffic, access to green spaces, and social equity [83]. People who reside in higher-density neighborhoods are more likely to perform physical activities [84] and to experience better health conditions and life satisfaction [85]. On the other hand, some research suggests that living in less dense areas can increase the quality of life while controlling for all other sociodemographic and somatic health variables [86]. Accordingly, urban density contributes to QoL in different ways.
The results of the logistic regression for the association between density and QoL indicate an apparent paradox. Earlier studies have often reported a positive relationship between population density and health outcomes due to the availability of walkable destinations, and consequently a higher tendency towards walking, biking, or public transit [87,88]. In contrast, our results suggest that a higher level of QoL is reported by participants in neighborhoods with lower population density. Research on compact city form states that the negative association between life satisfaction and urban density stems from the emotional response of residents to perceived crime and stress in crowded and noisy neighborhoods [83]. In contrast, residing in low-density suburbs has positive effects on the wellbeing of individuals through positive emotions and calmness [89]. In addition, higher levels of anxiety can be found in high-density areas, which consequently decreases mental health [90]. The positive effects of density on wellbeing occur when it brings with it mixed land use, access to public transit, restricted car travel, access to green spaces, and social equity [83]. Accordingly, the adverse effects of population density on the self-reported QoL of cancer patients may result from their negative emotions towards the negative aspects of density, such as traffic congestion, the sense of crime, and lack of green space.
The feature scores of the random forest reveal that the entropy index plays a moderate role in defining the level of self-reported QoL in cancer patients. Neighborhoods with mixed land use give cancer survivors access to different errands within walking distance [25]. This result is in accordance with some previous studies on the compact city form, in which mixed land use has the potential to provide a better quality of life by offering longer, healthier, and safer lives and contributing to the economic wellbeing and health of cities [91].
Random forest scores show that, among the sociodemographic characteristics, respondents' age makes a large contribution to the level of QoL among cancer patients. The process of aging in cancer patients appears to influence adjustment to the disease and therefore to affect health-related QoL [92]. Our results from the regression model reveal that older cancer patients have a higher level of QoL. This finding is in line with similar studies, which suggest that younger patients fare worse than older adults on some quality of life dimensions because they suffer more from psychological symptoms and financial issues [93,94].
The random forest score for health insurance shows that this feature helps distinguish between levels of QoL. This result is in line with previous studies demonstrating that health insurance status is associated with health-related attributes of cancer patients over time [95]. Patients with poorer insurance coverage may have less access to high-quality treatment, which can result in later diagnoses and worse outcomes [96]. This result confirms empirical evidence showing that health insurance can improve the health of vulnerable groups, such as senior adults, children, people with pre-existing medical conditions, and low-income populations [97]. Moreover, the associations between health insurance and QoL indicate that participants who have private and/or employer-paid health insurance reported higher QoL levels than low-income participants who have government-related insurance. This confirms previous studies reporting that cancer-related financial burdens are related to an increased risk of depression and lower health-related QoL levels in cancer patients [98].
The number of cars in the family is the tenth most important factor in predicting QoL identified by the random forest. To the best of our knowledge, there is no direct evidence on the effect of vehicle ownership on the QoL of cancer patients. However, the private vehicle is the most common mobility mode, particularly for residents of distant and rural areas, so it can affect cancer patients' access to treatment facilities where they might not have access to other mobility modes [77,78,99,100]. Access to private vehicles and the option of driving with others are among the most crucial treatment-related factors that can pose barriers to cancer patients [101]. Vehicle availability is considered to be positively related to diagnosis at an early stage [15] and to receiving first-line treatment [79]. Patients residing in areas with no access to a private vehicle are less likely to follow cancer screening treatments [102]. This evidence supports the contribution of access to a car to the QoL of cancer patients.
Furthermore, education is another factor contributing to the QoL of cancer patients. This result supports studies proposing that education improves wellbeing because it expands access to economic resources, enhances a person's sense of control over life, and increases social support [103]. The positive association between education and QoL in this study is consistent with earlier research suggesting that low education, along with low neighborhood socioeconomic status, results in worse all-cause survival for particular cancers [24]. The higher importance score of chemotherapy compared with radiotherapy indicates that chemotherapy treatment contributes more to predicting quality of life [104]. Chemotherapy treatment appears to have a negative effect on the QoL of patients who received it. Although physicians suggest chemotherapy to improve QoL for patients with end-stage cancer, it does not improve QoL for patients with moderate or poor performance status, and it worsens QoL for patients with good performance status [105]. Gender and race play a small role in determining the level of QoL. The race of the participants (white versus other races) has a small but notable effect on QoL after treatment [106].

Conclusions
This study brings new insights regarding the impacts of actual and perceived built environment characteristics on the QoL of cancer patients while controlling for sociodemographic and health-related factors. To address the first research question regarding the factors explaining self-reported QoL, we employed the random forest approach. Results suggested that the QoL of cancer patients can be principally influenced by built environment features, including travel distance to the closest large hospital, perceived accessibility, distance to transit, and population density, and by sociodemographic factors such as age, health insurance status, and education. Results from the logistic regression address the second research question regarding the most significant determinants of QoL in cancer patients. Population density, age, education, health insurance, and chemotherapy treatment are the most critical determinants of QoL in cancer patients. We point out the main research outcomes in the following areas:

• Our findings regarding the effects of built environment features such as density and access to healthcare facilities on the QoL of cancer patients indicate that a supportive built environment can overcome barriers in the outdoor environment, increase the likelihood of physical activity, and therefore improve perceived quality of life. These results indicate that urban design and transportation planning need to become more accommodating of this population group and its particular needs.

• To improve social equity, it is fundamental to design environments compatible with the needs of all community groups, including people who are struggling with chronic diseases that require ongoing medical attention or limit activities of daily living in the long term.

• Understanding the associations between built environment and health-related QoL can help in the development of intervention policies that aim to improve cancer patients' wellbeing. Hence, there is a need for collaboration between transit agencies, MPOs, and community planners to target the living environment and mobility needs of people who are burdened with chronic disease. To this end, urban and transportation planners and practitioners should be more involved in this field and acquire more knowledge from other disciplines. Integrating transportation planning with public health and social studies could reinforce existing policies and strategies in transportation accessibility and equity and therefore increase wellbeing and QoL.

• In addition, there is an inherent need to develop a QoL measurement that comprehensively accounts for subjective feelings as well as objective factors in terms of patients' health condition, transportation, and built environment. This QoL measurement can be used as a policy tool by communities and local governments to evaluate the extent to which mobility options and the built environment meet the needs of patients with chronic diseases.

• The inverse associations between population density and cancer patients' QoL indicate that compact development strategies can succeed when policymakers address the side effects of urban density, such as fear of crime, high noise, and traffic congestion. This compact development pattern should concentrate on strategies that expand robust transportation options and improve public health indicators such as air quality while creating safe and secure neighborhoods that preserve more open space.

There is considerable room for future research to improve our understanding of the effects of built environment and transportation accessibility on cancer patients' QoL.
The small sample size is a principal limitation of our study and may have prevented us from identifying more associations between the key variables, particularly in the logistic regression model. Further studies are needed to collect data on a large population of cancer patients regarding their mobility needs, their concerns about residential neighborhoods, and their preferences for the attributes of a supportive neighborhood that can overcome their physical, mental, social, and environmental barriers. The other limitation of this study is related to measuring QoL. Measuring the QoL of patients through the standard EQ-5D-5L or EQ-5D-3L instruments can allow future studies to explain the QoL of cancer patients more thoroughly. This study also emphasizes the need for collaboration between health policymakers, urban planners, and transportation experts to conduct more research regarding the effects of transportation policies on health outcomes.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the privacy constraints.

Conflicts of Interest:
The authors declare no conflict of interest.