Social Determinants of Health Data Improve the Prediction of Cardiac Outcomes in Females with Breast Cancer

Stabellini, Nickolas; Cullen, Jennifer; Moore, Justin X.; Dent, Susan; Sutton, Arnethea L.; Shanahan, John; Montero, Alberto J.; Guha, Avirup

doi:10.3390/cancers15184630

by Nickolas Stabellini ^1,2,3,4,*, Jennifer Cullen ^1,5, Justin X. Moore ⁶, Susan Dent ⁷, Arnethea L. Sutton ⁸, John Shanahan ⁹, Alberto J. Montero ² and Avirup Guha ^4,10

¹ Case Western Reserve University School of Medicine, Case Western Reserve University, Cleveland, OH 44106, USA
² Department of Hematology-Oncology, University Hospitals Seidman Cancer Center, Cleveland, OH 44106, USA
³ Faculdade Israelita de Ciências da Saúde Albert Einstein, Hospital Israelita Albert Einstein, São Paulo 05652-900, SP, Brazil
⁴ Department of Medicine, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA
⁵ Case Comprehensive Cancer Center, Cleveland, OH 44106, USA
⁶ Center for Health Equity Transformation, Department of Behavioral Science, Department of Internal Medicine, Markey Cancer Center, University of Kentucky College of Medicine, Lexington, KY 40506, USA
⁷ Duke Cancer Institute, Duke University, Durham, NC 27708, USA
⁸ Department of Kinesiology and Health Sciences, College of Humanities and Sciences, Virginia Commonwealth University, Richmond, VA 23284, USA
⁹ Cancer Informatics, Seidman Cancer Center, University Hospitals of Cleveland, Cleveland, OH 44106, USA
¹⁰ Cardio-Oncology Program, Medical College of Georgia, Augusta University, Augusta, GA 30912, USA
* Author to whom correspondence should be addressed.

Cancers 2023, 15(18), 4630; https://doi.org/10.3390/cancers15184630

Submission received: 31 July 2023 / Revised: 8 September 2023 / Accepted: 14 September 2023 / Published: 19 September 2023

(This article belongs to the Topic Public Health and Healthcare in the Context of Big Data)

Simple Summary

This research aimed to investigate whether adding social determinants of health (SDOH) to predictive models improves the prediction of major adverse cardiovascular events (MACEs) in breast cancer patients, as cardiovascular disease is their leading cause of death. Machine learning (ML) models incorporating SDOH, demographics, risk factors, tumor characteristics, and treatments were developed and compared. The results showed that including SDOH enhanced ML model performance in forecasting MACEs within two years of breast cancer diagnosis, especially for non-Hispanic Black patients. These findings offer more accurate risk assessments and personalized care insights for breast cancer patients, while also guiding efforts toward achieving healthcare equity.

Abstract

Cardiovascular disease is the leading cause of mortality among breast cancer (BC) patients aged 50 and above. Machine Learning (ML) models are increasingly utilized as prediction tools, and recent evidence suggests that incorporating social determinants of health (SDOH) data can enhance their performance. This study included females ≥ 18 years diagnosed with BC at any stage. The outcomes were the diagnosis and time-to-event of major adverse cardiovascular events (MACEs) within two years following a cancer diagnosis. Covariates encompassed demographics, risk factors, individual and neighborhood-level SDOH, tumor characteristics, and BC treatment. Race-specific and race-agnostic Extreme Gradient Boosting ML models with and without SDOH data were developed and compared based on their C-index. Among 4309 patients, 11.4% experienced a 2-year MACE. The race-agnostic models exhibited a C-index of 0.78 (95% CI 0.76–0.79) and 0.81 (95% CI 0.80–0.82) without and with SDOH data, respectively. In non-Hispanic Black women (NHB; n = 765), models without and with SDOH data achieved a C-index of 0.74 (95% CI 0.72–0.76) and 0.75 (95% CI 0.73–0.78), respectively. Among non-Hispanic White women (n = 3321), models without and with SDOH data yielded a C-index of 0.79 (95% CI 0.77–0.80) and 0.79 (95% CI 0.77–0.80), respectively. In summary, including SDOH data improves the predictive performance of ML models in forecasting 2-year MACE among females with BC, particularly among NHB women.

Keywords:

breast cancer; cardiovascular disease; machine learning; social determinants of health; race; disparities; prediction; cardiooncology

1. Introduction

In 2020, breast cancer (BC) was the primary driver of global cancer incidence, accounting for an estimated 2.3 million new cases (11.7% of all cancer cases) [1]. In the United States (US), projections for 2023 indicate an estimated 300,590 new BC cases and 43,700 BC-related fatalities [2]. BC is the most prevalent form of cancer worldwide, and around 91% of individuals diagnosed with BC survive at least five years [1,3]. However, for every molecular subtype and stage of disease (except stage I), Black women exhibit the lowest 5-year relative survival rate compared to all other racial/ethnic groups [4]. The most significant disparities between Black and White women are observed in hormone-receptor-positive/human epidermal growth factor receptor 2-negative disease, with survival rates of 88% and 96% for Black and White women, respectively [4].

Cardiovascular disease (CVD) is the leading cause of death among patients with active BC aged over 50 [5]. The risk of CVD-related mortality is higher in post-menopausal female BC survivors than in individuals without a BC history [5]. Effective management of preexisting CVD risk factors, such as diabetes mellitus and hypertension, significantly influences the prognosis of older BC patients [6]. The World Health Organization defines social determinants of health (SDOH) as “the conditions in which people are born, grow, work, live, and age, and the wider set of forces and systems shaping the conditions of daily life”. SDOH contribute significantly to the development of CVD risk factors, morbidity, and mortality, especially within marginalized communities [7,8]. SDOH, encompassing factors such as poverty, limited education, neighborhood disadvantage, racial residential segregation, discrimination, insufficient social support, and isolation, also significantly influence both the stage at which BC is diagnosed and the subsequent survival outcomes [9].

Machine Learning (ML) models have been increasingly used as prediction tools because of their potentially greater performance compared with traditional regression models and their capacity to learn from and handle data with multiple structures, especially clinical data [10,11,12]. These models operate by receiving input data and employing mathematical optimization and statistical analysis techniques to predict outcomes [13]. A meta-analysis published in 2020 demonstrated that ML algorithms exhibit a high level of accuracy in predicting CVD outcomes [13].

According to recent evidence, ML models incorporating SDOH data improve the risk prediction of in-hospital mortality after hospitalization for heart failure (HF), particularly among Black adults [14]. The inclusion of SDOH data raised the model’s concordance index (C-index) from 0.72 (95% confidence interval [CI] 0.71–0.75) to 0.77 (95% CI 0.73–0.79) for Black patients, yet this effect was not observed in non-Black patients [14]. However, to our knowledge, there are no studies examining whether the inclusion of SDOH data enhances the prediction of cardiovascular events in patients with BC. We hypothesize that ML models incorporating SDOH data will outperform models without this integration in predicting major adverse cardiovascular events (MACEs) in BC patients, especially in patients who are non-Hispanic Black (NHB). The primary objective of this study is to develop and compare race-specific (separate models for NHB and non-Hispanic White (NHW) patients) and race-agnostic (race as a covariate) ML models with and without SDOH data in the prediction of MACE in patients with BC.

2. Materials and Methods

2.1. Study Setting

The study setting was the University Hospitals (UH) Seidman Cancer Center in Northeast Ohio, US. UH is a large hybrid academic-community tertiary care center that provides medical services to diverse communities, including urban, suburban, and rural areas. Its network comprises 23 hospitals, over 50 health centers and outpatient facilities, and more than 200 physicians’ offices across 16 counties in the region [15,16]. The patient population at UH is predominantly from inner-city areas, leading to a higher representation of Black patients and comparatively lower percentages of Hispanic and Asian minorities than the US population [15,16].

2.2. Data Source

The data for this study were collected from the UH Seidman Cancer Center data repository, which is based on the CAISIS platform, an open-source, web-based cancer data management system that integrates multiple sources of patient data [16,17,18,19,20,21,22]. To enhance the accuracy and comprehensiveness of the obtained information for each patient, additional data from Electronic Health Records (EHR) were incorporated using the Electronic Medical Record Search Engine (EMERSE) [23]. All patient records were deidentified.

2.3. Inclusion and Exclusion Criteria

The cohort (Figure 1) consisted of females aged 18 years or older diagnosed with BC at any stage. The diagnosis was determined based on specific ICD 9/10 codes, including C50.XX, C79.81, 174.X, 175.0, 175.9, 198.81, and 217, where “X” represents any integer [24,25]. The inclusion criteria encompassed patients diagnosed between 1 January 2010 and 31 December 2019, ensuring a minimum follow-up period of two years by the year 2022, which was the data collection year. Patients were excluded from the analysis if they were male or had in situ carcinoma. Due to a low number of patients with Hispanic ethnicity, these individuals were also excluded from the analysis. All patients with available SDOH data were included, while patients without SDOH data were excluded from the analysis.
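The code-based part of the cohort definition can be illustrated with a short Python sketch. This is not the authors' code (the analyses were run in R). The names `is_breast_cancer_code` and `in_diagnosis_window` are hypothetical, and reading each "X" as exactly one digit is an assumption of this example; real ICD-10-CM codes may carry additional digits.

```python
import re
from datetime import date

# Patterns transcribed from the codes listed in the text. Each "X" is read
# as exactly one digit, which is an assumption of this sketch.
BC_CODE_PATTERNS = [
    r"C50\.\d\d",  # ICD-10 C50.XX
    r"C79\.81",    # ICD-10 C79.81
    r"174\.\d",    # ICD-9 174.X
    r"175\.0",     # ICD-9 175.0
    r"175\.9",     # ICD-9 175.9
    r"198\.81",    # ICD-9 198.81
    r"217",        # ICD-9 217
]
BC_CODE_RE = re.compile("|".join(f"(?:{p})" for p in BC_CODE_PATTERNS))


def is_breast_cancer_code(code: str) -> bool:
    """Return True if the whole code matches one of the listed patterns."""
    return BC_CODE_RE.fullmatch(code.strip().upper()) is not None


def in_diagnosis_window(dx: date) -> bool:
    """Return True if the diagnosis date falls in the stated inclusion window."""
    return date(2010, 1, 1) <= dx <= date(2019, 12, 31)
```

Matching the full code rather than a prefix keeps the sketch faithful to the list as written; a real pipeline would likely need prefix matching to cover more specific subcodes.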

2.4. Outcome

The co-primary outcomes of this study were the diagnosis and time-to-event occurrence of 2-year MACE following the diagnosis of BC. The MACE events considered included heart failure (HF), acute coronary syndrome (ACS), atrial fibrillation (A-fib), and ischemic stroke (IS) [16,26]. The diagnosis of these events was determined using specific ICD 9/10 codes obtained from the complete medical history recorded in the EHR of each patient.
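To make the two co-primary outcomes concrete, the sketch below derives an event indicator and a time-to-event value from two dates. It is an illustration only: the function name `two_year_mace` is hypothetical, two years is taken as 730 days, patients without a recorded event are censored at the end of the window, and events dated before the BC diagnosis are not counted. None of these choices is stated in the text, and death or loss to follow-up is ignored here.

```python
from datetime import date
from typing import Optional, Tuple

TWO_YEARS_DAYS = 730  # assumption: two years taken as 730 days


def two_year_mace(bc_dx: date, first_mace: Optional[date]) -> Tuple[int, int]:
    """Return (event, time_days) for the 2-year MACE outcome.

    event is 1 if a MACE was recorded on or after the BC diagnosis and
    within the window; otherwise the patient is censored at the window end.
    """
    if first_mace is not None:
        days = (first_mace - bc_dx).days
        if 0 <= days <= TWO_YEARS_DAYS:
            return 1, days
    return 0, TWO_YEARS_DAYS
```

The pair returned here is the usual input shape for survival models: a binary event flag plus the observed or censored time.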

2.5. Covariates

Data on demographics, risk factors, SDOH, tumor characteristics, and treatment were collected for all eligible patients. Demographic information obtained from the patient’s EHR included age at diagnosis, self-reported race/ethnicity (NHB, NHW, other), and payer information (Medicaid, Medicare, private insurance, self-pay, other). Risk factors were extracted from the comorbidities list using relevant ICD codes identified prior to the MACE diagnosis. These risk factors encompassed self-reported smoking status (yes, no, former, unknown), Charlson comorbidity index, and cardiovascular (CV) history/risk factors (yes, no) [27,28]. Positive CV history/risk factors were identified if the patient had a diagnosis of hyperlipidemia, cardiomyopathy, known coronary artery disease, prior myocardial infarction (MI), carotid disease, prior transient ischemic attack (TIA)/stroke, and/or chronic kidney disease (CKD) (Supplemental Table S1). Combining these factors into a single variable aimed to generate a covariate that characterizes patients at high CV risk [29].
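The composite CV history/risk factor variable described above amounts to a logical OR over a fixed set of conditions. A minimal Python sketch follows. The condition labels are placeholders chosen for this example (the study used the ICD codes in Supplemental Table S1, which are not reproduced here), and `has_cv_history` is a hypothetical name.

```python
# Placeholder labels for the conditions named in the text; the study itself
# identified them through ICD codes (Supplemental Table S1).
CV_RISK_CONDITIONS = frozenset({
    "hyperlipidemia",
    "cardiomyopathy",
    "coronary artery disease",
    "myocardial infarction",
    "carotid disease",
    "tia/stroke",
    "chronic kidney disease",
})


def has_cv_history(diagnoses):
    """Return True if any listed condition appears among the diagnoses."""
    normalized = {d.strip().lower() for d in diagnoses}
    return not CV_RISK_CONDITIONS.isdisjoint(normalized)
```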

Individual and neighborhood-level SDOH features were sourced from LexisNexis, the world’s largest electronic database for legal and public-records-related information. These features were grouped into four domains: social and community context (marital status, number of household members, distance to closest relatives), economic stability (address stability, property status, annual income, properties owned, wealth index, household income, total count of transport properties owned), neighborhood and built environment (crime index, burglary index, car theft index, murder index, neighborhood median household income, neighborhood median home values), and educational access and quality (education institution rating, college attendance) [30,31]. The LexisNexis dataset used in our study is a compilation of public and private records that are updated at different frequencies. The data obtained reflect the most current available records and combine records from adult patients discharged from a UH facility over 2.5 years with records from adult patients who are members of an Accountable Care Organization [32].

Tumor characteristics included date of cancer diagnosis, hormone receptor status (estrogen receptor (ER), progesterone receptor (PR), and HER2), histological type (ductal or lobular, not specified (NOS), other/unknown), and TNM staging group (stage 0–IV). Treatment characteristics encompassed appointment completion rates and the use of single or combination treatments throughout a patient’s follow-up, including radiation of the breast (right, left), chemotherapy, endocrine therapy, and immunotherapy.

2.6. Descriptive Analysis

To ensure the integrity and reliability of our dataset for analysis, we implemented an outlier detection procedure [33]. This involved the application of data visualization techniques, specifically utilizing box plots, to effectively identify and subsequently remove outliers from the dataset [34].
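The text does not state which rule was used to read outliers off the box plots. A common convention, assumed here, is the 1.5 × IQR whisker rule (Tukey fences). The sketch below uses only the Python standard library; the function names and the choice of the "inclusive" quantile method are assumptions of this example.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return the (lower, upper) box-plot fences for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values lying outside the fences, in their original order."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]
```

For example, `flag_outliers([50, 53, 58, 63, 66, 72, 75, 140])` flags only the value 140, since the fences for that list are 32.75 and 96.75.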

The data were categorized based on race/ethnicity (NHB, NHW) and presented as absolute values and percentages for categorical variables and as median and quartiles for continuous variables. To compare categorical variables among different racial/ethnic groups, the Pearson chi-square test was employed. The distribution assumptions of continuous variables were assessed using histograms and the Kolmogorov–Smirnov test. Student’s t-tests were conducted for normally distributed factors, while non-parametric Kruskal–Wallis tests were used for non-normally distributed factors.

Correlation plots were used to examine the correlations among independent variables, and variables that exhibited statistically significant correlations were not included simultaneously in the models. A significance level of p < 0.05 was considered, and missing values were excluded from the analysis.

2.7. Machine Learning Development

Race-specific and race-agnostic ML models, with and without SDOH data, were developed and compared (Figure 2). The ML approach was chosen in this study due to its ability to learn from data and handle diverse data structures [14,35,36]. We utilized the tree-based method called extreme gradient boosting (XGBoost), designed for ML in survival analysis [37,38].

The preprocessing phase encompassed three main stages: data splitting, feature engineering, and feature selection. During the data split, the data were chronologically divided into three sets: 60% for training, 20% for testing, and 20% for validation [39]. During feature engineering, each categorical variable was expanded into one binary indicator column per category (one-hot encoding), in which a value of 1 represented true and 0 denoted false [40]. Feature selection was performed on the training set by comparing variables based on their association with MACE (yes vs. no) and selecting those with a p-value less than 0.30, a conservative approach to avoid the exclusion of relevant covariates [41].

The testing set was used for hyperparameter tuning using a 10-fold, 10-times cross-validation with 100 iterations, prioritizing the C-index [42]. All the models were tuned over the following hyperparameters: nrounds (number of additional trees or weak learners added to the model), nthread (number of parallel threads used), eta (shrinkage of feature weights in each boosting step), max_depth (the maximum depth of each tree), min_child_weight (the minimum weight/number of samples required to create a new node in the tree), gamma (the minimum loss reduction to create new tree-split), subsample (the fraction of observations/rows to subsample at each step), and colsample_bytree (percentage of features/columns used to build each tree). The hyperparameter tuning was conducted using the randomized search approach [43]. Subsequently, the tuned model was applied to the validation set using a 10-fold, 10-times cross-validation. The performance of the ML models was assessed using the mean C-index, a precise and appropriate technique for measuring prediction error, along with its 95% CI [42,44,45]. The models ultimately chosen following the aforementioned phases were the ones exhibiting the highest C-index values.
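The first two preprocessing steps, the chronological 60/20/20 split and the expansion of categorical variables into binary columns, can be sketched in plain Python as follows. This illustrates the described steps and is not the authors' mlr3 pipeline. The function names, the record layout (a list of dictionaries), and the rounding of set sizes are assumptions of this example.

```python
def chronological_split(records, date_key, fractions=(0.6, 0.2, 0.2)):
    """Sort records by date and cut them into training/testing/validation.

    The earliest records go to training and the latest to validation, so
    no set contains patients diagnosed before those of the preceding set.
    """
    ordered = sorted(records, key=lambda r: r[date_key])
    n = len(ordered)
    n_train = int(n * fractions[0])
    n_test = int(n * fractions[1])
    train = ordered[:n_train]
    test = ordered[n_train:n_train + n_test]
    validation = ordered[n_train + n_test:]  # remainder
    return train, test, validation


def one_hot(records, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in records})
    encoded = []
    for r in records:
        row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            row[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(row)
    return encoded
```

In a real pipeline the category list would be fixed on the training set and reused for the other two sets, so that all three share the same columns.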

2.8. Software and Packages

The analyses were conducted using RStudio software, version 4.2.2 [46]. The ML models were developed using the “mlr3” (version 0.16.1) and “mlr3proba” (version 0.5.2) packages [47,48].

3. Results

3.1. Population

We included 4309 females with BC (Figure 1; Table 1), of which 765 (17.8%) were categorized as NHB. The median age at diagnosis for the cohort was 63 years, with an interquartile range (IQR) of 53 to 72 years. Ductal carcinoma accounted for 49.2% of the diagnoses, while 5.7% were classified as stage III and 1.9% as stage IV. Among the cases, 44.9% were ER-positive, 40.2% were PR-positive, and 6.8% were HER2-positive. Most patients were never smokers (50.6%) and had a history or risk factor for cardiovascular disease (74.6%). The median Charlson comorbidity score was 4 (IQR 2–7). Surgery was performed in 60% of the cohort, while 28.2% received chemotherapy, 46% received endocrine therapy, 4.7% received immunotherapy, and 39.4% received radiotherapy.

3.2. Outcomes

Within a two-year follow-up period after the BC diagnosis, 11.4% of the patients experienced a MACE, with a median time-to-event of 177 days and an IQR of 45 to 414 days. HF was the most commonly diagnosed event, occurring in 6.9% of the patients, followed by A-fib in 3.7%, IS in 2.4%, and ACS in 2.3%. When comparing NHB individuals to NHW individuals, significantly higher rates of MACE (19.2% vs. 9.9%), HF (13.1% vs. 5.5%), and ACS (4.8% vs. 1.7%) were observed among NHB patients (p < 0.001). Moreover, NHB individuals had a rate of IS of 3.4% and A-fib of 3.8%, while NHW had rates of IS of 2.3% and A-fib of 3.8%. There were no notable differences in the time-to-event between racial/ethnic groups.

3.3. Race-Agnostic ML Models

The race-agnostic models with and without SDOH data were developed in 4309 female patients with BC (Table 2). The model without SDOH data exhibited a C-index of 0.78 (95% CI 0.76–0.79), while the model with SDOH data exhibited a C-index of 0.81 (95% CI 0.80–0.82).

3.4. Race-Specific ML Models—NHB

The race-specific models in NHB were developed in 765 patients (Table 2). The model without SDOH data exhibited a C-index of 0.74 (95% CI 0.72–0.76). The model with SDOH data exhibited a C-index of 0.75 (95% CI 0.73–0.78).

3.5. Race-Specific ML Model—NHW

The race-specific models in NHW were developed in 3321 patients (Table 2). The model without SDOH data exhibited a C-index of 0.79 (95% CI 0.77–0.80). The model with SDOH data exhibited a C-index of 0.79 (95% CI 0.77–0.80).
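For readers less familiar with the metric compared across these models, the C-index is the share of comparable patient pairs in which the patient with the earlier event was assigned the higher predicted risk. The Python sketch below implements Harrell's version with ties in predicted risk counted as one half. Which C-index variant the authors' software computed is not stated in the text, so this choice and the function name `concordance_index` are assumptions of this example.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data.

    times: observed or censored times; events: 1 if the event occurred,
    0 if censored; risk_scores: higher values mean higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's event or censoring time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported values of 0.74 to 0.81 should be read.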

4. Discussion

This study aimed to develop and compare race-specific and race-agnostic ML models, with and without SDOH data, in predicting MACE in patients with BC. Our findings indicate that including SDOH data significantly improved the predictive performance of the ML models in NHB patients. Conversely, for NHW patients, the addition of SDOH data did not result in a noticeable change in the model’s performance, suggesting that other factors may have a more prominent role in driving MACE development in this group. Racial disparities in SDOH may contribute to the higher incidence of MACE in NHB patients, further emphasizing the social construct of race.

As a field, cardiology has been at the forefront of adopting ML techniques [49,50,51,52]. Several studies have demonstrated that ML algorithms outperform traditional risk assessments that rely on established CVD risk factors [13,53,54]. Conventional CVD risk assessment models often assume a linear relationship between each risk factor and CVD outcomes [55]. In addition, these models have limitations, including variations among specific populations, the overestimation of CVD risk in certain situations, and a limited number of predictors [56,57]. In previously published ML models for CVD prediction that did not incorporate SDOH data, most shared a common set of demographic variables (e.g., age, sex, smoking status) and laboratory values [13]. Our results encourage the integration of SDOH into ML algorithms developed for predicting CVD in patients with BC.

Traditional clinical risk factors for CVD have long been acknowledged in prevention efforts [58]. However, there is increasing recognition of the significant role played by the SDOH in the development of CVD [7]. Recent evidence has shown that specific SDOH, such as socioeconomic status (SES), race and ethnicity, social support, cultural and language factors, access to healthcare, and residential environment, play a crucial role in predicting disparities in CVD risk and CVD outcomes [7]. A lower SES is hypothesized to act as a chronic stressor, contributing to promoting a proinflammatory state and developing atherosclerosis [59,60,61,62]. The chronic stress associated with lower SES can be quantified using allostatic load, which is linked to a significant increase of up to 31% in CVD risk [21]. Taking into account the aspect of the neighborhood-built environment (which refers to the physical characteristics and design of neighborhoods), research has consistently demonstrated that adverse neighborhood conditions such as higher population density; increased traffic; limited availability of nearby stores, supermarkets, and fitness centers; and insufficient green spaces or vegetation are associated with an elevated CVD risk [63,64,65,66,67]. Furthermore, psychosocial factors (psychological and social characteristics) play a crucial role in CVD—various factors within this domain, including job strain, childhood experiences, depression, perceived discrimination, and social isolation, have been shown to have significant associations with the development and progression of CVD [68,69,70,71,72,73,74,75,76]. Our findings reaffirm the crucial role of SDOH in CVD. We observed a noteworthy enhancement in the predictive performance of the race-agnostic model when incorporating SDOH data, with the model’s C-index increasing from 0.78 to 0.81. This underscores the significance of considering SDOH factors in improving the accuracy of CVD prediction models.

Notably, our results have shown that the gain in predictive performance after including SDOH data is greater in NHB than in NHW patients (C-index 0.74 to 0.75 in NHB vs. unchanged at 0.79 in NHW). This highlights the importance of understanding racial disparities and conceptualizing race as a social construct. Structural racism can contribute to residential segregation, which in turn influences employment prospects, economic status, access to quality education, and exposure to higher levels of neighborhood violence, crime, and poverty [7]. An illustrative example of this effect is the higher likelihood of Black individuals residing in states with high levels of structural racism reporting a history of MI within the past year compared to Black individuals in states with low levels of structural racism [7]. Focusing specifically on patients with BC, it is hypothesized that adverse SDOH may explain the racial disparities observed in CVD outcomes within this population, as NHB women with BC face greater adversity in SDOH factors [16]. This is of utmost importance considering the higher MACE/CVD rates observed in NHB individuals, as confirmed by our study results [16].

From a practical standpoint, the findings of our study align with the principles outlined in the 2023 American Heart Association statement titled “Equity in Cardio-Oncology Care and Research”, emphasizing the need to implement strategies that mitigate inequalities and address the healthcare needs of underserved populations [77]. The results underscore the urgency of developing public health policies aimed at addressing disparities in SDOH. Immediate action is needed to ensure equitable healthcare access and tackle the underlying factors contributing to SDOH disparities. Furthermore, our study has demonstrated the importance of integrating SDOH data into future predictive models to enhance their performance.

This study possesses several limitations. First, the database used in this study relies on EHR, and some information may be incomplete or missing. Furthermore, while our institution maintains a close follow-up with patients as a nationally recognized comprehensive cancer center, some patients may still be lost to follow-up or seek emergency care at other healthcare facilities, which could introduce a potential bias. Additionally, the criteria for data availability in LexisNexis may have led to a selection bias in our sample. The results reported may reflect the characteristics and demographics of the catchment area where our institution is located and may represent individuals with a higher propensity for seeking healthcare services. Moreover, including both patients with curable and incurable BC could have influenced the reported rates of MACE. The ML models were not validated in an external dataset.

5. Conclusions

In summary, incorporating social determinants of health (SDOH) data improves the performance of machine learning models in predicting MACEs in patients with BC, particularly for NHB patients. These findings underscore that race is a social construct and emphasize the importance of public policies to reduce inequalities and address SDOH disparities. Future studies should consider prospective and multicenter designs or US nationally representative samples, encompass diverse populations, explore a broader range of covariates, develop specific models for different types of CVD, scrutinize optimal cut-off points for individual models, and investigate the geographical variations in SDOH within regions.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/cancers15184630/s1, Table S1: ICD codes used to categorize cardiovascular history/risk factors and MACE. Medication names included in each medication category. Social determinants of health variables from LexisNexis and their domain categorization.

Author Contributions

A.G. and N.S. conceived the study concept and the study design. N.S. had full access to all the data in the study and took responsibility for the data’s integrity and the accuracy of the data analysis. N.S. and A.G. drafted the first version of the manuscript. All authors contributed to manuscript revisions and editing. All authors have read and agreed to the published version of the manuscript.

Funding

AG is supported by an American Heart Association-Strategically Focused Research Network Grant in Disparities in Cardio-Oncology (#847740, #863620); NS is supported through funding from the Sociedade Beneficente Israelita Brasileira Albert Einstein on the program “Marcos Lottenberg & Marcos Wolosker International Fellowship for Physicians Scientist—Case Western”.

Institutional Review Board Statement

Patient records were deidentified, and the study was approved by the University Hospitals of Cleveland Institutional Review Board (IRB), ethic code: STUDY20200606.

Informed Consent Statement

Patient consent was waived due to the deidentification of the data.

Data Availability Statement

University Hospitals (UH) Seidman Cancer Center database is available at UH Cleveland Medical Center and has access restricted to researchers with IRB approval.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

ACS	acute coronary syndrome
A-fib	atrial fibrillation
BC	breast cancer
CI	confidence interval
CVD	cardiovascular disease
CV	cardiovascular
C-index	concordance index
EMERSE	Electronic Medical Record Search Engine
ER	estrogen receptor
EHR	electronic health records
HF	heart failure
IQR	Interquartile range
ICD	International Classification of Diseases
IS	ischemic stroke
MACE	major adverse cardiovascular events
ML	machine learning
MI	myocardial infarction
NHB	non-Hispanic Black
NHW	non-Hispanic White
NOS	not specified
PR	progesterone receptor
SDOH	social determinants of health
SES	socioeconomic status
TIA	transient ischemic attack
US	United States
UH	University Hospitals

References

1. Sung, H.; Ferlay, J.; Siegel, R.L.; Laversanne, M.; Soerjomataram, I.; Jemal, A.; Bray, F. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA. Cancer J. Clin. 2021, 71, 209–249.
2. Siegel, R.L.; Miller, K.D.; Wagle, N.S.; Jemal, A. Cancer statistics, 2023. CA. Cancer J. Clin. 2023, 73, 17–48.
3. SEER [Internet]. Cancer of the Breast (Female)—Cancer Stat Facts. Available online: https://seer.cancer.gov/statfacts/html/breast.html (accessed on 1 June 2022).
4. Giaquinto, A.N.; Sung, H.; Miller, K.D.; Kramer, J.L.; Newman, L.A.; Minihan, A.; Jemal, A.; Siegel, R.L. Breast Cancer Statistics, 2022. CA. Cancer J. Clin. 2022, 72, 524–541.
5. Mehta, L.S.; Watson, K.E.; Barac, A.; Beckie, T.M.; Bittner, V.; Cruz-Flores, S.; Dent, S.; Kondapalli, L.; Ky, B.; Okwuosa, T.; et al. Cardiovascular Disease and Breast Cancer: Where These Entities Intersect: A Scientific Statement from the American Heart Association. Circulation 2018, 137, e30–e66.
6. Haque, R.; Prout, M.; Geiger, A.M.; Kamineni, A.; Thwin, S.S.; Avila, C.; Silliman, R.A.; Quinn, V.; Yood, M.U. Comorbidities and Cardiovascular Disease Risk in Older Breast Cancer Survivors. Am. J. Manag. Care 2014, 20, 86–92.
7. Powell-Wiley, T.M.; Baumer, Y.; Baah, F.O.; Baez, A.S.; Farmer, N.; Mahlobo, C.T.; Pita, M.A.; Potharaju, K.A.; Tamura, K.; Wallen, G.R. Social Determinants of Cardiovascular Disease. Circ. Res. 2022, 130, 782–799.
8. Social Determinants of Health. Available online: https://www.who.int/health-topics/social-determinants-of-health (accessed on 25 August 2022).
9. Coughlin, S.S. Social determinants of breast cancer risk, stage, and survival. Breast Cancer Res. Treat. 2019, 177, 537–548.
10. Shin, S.; Austin, P.C.; Ross, H.J.; Abdel-Qadir, H.; Freitas, C.; Tomlinson, G.; Chicco, D.; Mahendiran, M.; Lawler, P.R.; Billia, F.; et al. Machine learning vs. conventional statistical models for predicting heart failure readmission and mortality. ESC Heart Fail. 2020, 8, 106–115.
11. Stevens, L.M.; Mortazavi, B.J.; Deo, R.C.; Curtis, L.; Kao, D.P. Recommendations for Reporting Machine Learning Analyses in Clinical Research. Circ. Cardiovasc. Qual. Outcomes 2020, 13, e006556.
12. Stabellini, N.; Nazha, A.; Agrawal, N.; Huhn, M.; Shanahan, J.; Hamerschlak, N.; Waite, K.; Barnholtz-Sloan, J.S.; Montero, A.J. Thirty-Day Unplanned Hospital Readmissions in Patients with Cancer and the Impact of Social Determinants of Health: A Machine Learning Approach. JCO Clin. Cancer Inform. 2023, 7.
13. Krittanawong, C.; Virk, H.U.H.; Bangalore, S.; Wang, Z.; Johnson, K.W.; Pinotti, R.; Zhang, H.; Kaplin, S.; Narasimhan, B.; Kitai, T.; et al. Machine learning prediction in cardiovascular diseases: A meta-analysis. Sci. Rep. 2020, 10, 16057.
14. Segar, M.W.; Hall, J.L.; Jhund, P.S.; Powell-Wiley, T.M.; Morris, A.A.; Kao, D.; Fonarow, G.C.; Hernandez, R.; Ibrahim, N.E.; Rutan, C.; et al. Machine Learning–Based Models Incorporating Social Determinants of Health vs. Traditional Models for Predicting In-Hospital Mortality in Patients with Heart Failure. JAMA Cardiol. 2022, 7, 844–854.
15. Annual Report 2021|University Hospitals|Cleveland, OH|University Hospitals. Available online: https://www.uhhospitals.org/about-uh/publications/UH-Annual-Report/2021-annual-report (accessed on 5 March 2023).
16. Stabellini, N.; Dmukauskas, M.; Bittencourt, M.S.; Cullen, J.; Barda, A.J.; Moore, J.X.; Dent, S.; Abdel-Qadir, H.; Kawatkar, A.A.; Pandey, A.; et al. Social Determinants of Health and Racial Disparities in Cardiac Events in Breast Cancer. J. Natl. Compr. Canc. Netw. 2023, 21, 705–714.e17.
17. Stabellini, N. Racial Differences in Chronic Stress/Allostatic Load variation due to Androgen Deprivation Therapy in Prostate Cancer. JACC Cardio Oncol. 2022, 4, 555–557.
18. Stabellini, N.; Bruno, D.S.; Dmukauskas, M.; Barda, A.J.; Cao, L.; Shanahan, J.; Waite, K.; Montero, A.J.; Barnholtz-Sloan, J.S. Sex Differences in Lung Cancer Treatment and Outcomes at a Large Hybrid Academic-Community Practice. JTO Clin. Res. Rep. 2022, 3, 100307.
19. Stabellini, N.; Chandar, A.K.; Chak, A.; Barda, A.J.; Dmukauskas, M.; Waite, K.; Barnholtz-Sloan, J.S. Sex differences in esophageal cancer overall and by histological subtype. Sci. Rep. 2022, 12, 5248.
20. Stabellini, N.; Cullen, J.; Cao, L.; Shanahan, J.; Hamerschlak, N.; Waite, K.; Barnholtz-Sloan, J.S.; Montero, A.J. Racial disparities in breast cancer treatment patterns and treatment related adverse events. Sci. Rep. 2023, 13, 1233.
21. Stabellini, N.; Cullen, J.; Bittencourt, M.S.; Moore, J.X.; Cao, L.; Weintraub, N.L.; Harris, R.A.; Wang, X.; Datta, B.; Coughlin, S.S.; et al. Allostatic load and cardiovascular outcomes in males with prostate cancer. JNCI Cancer Spectr. 2023, 7, pkad005.
Stabellini, N.; Tomlinson, B.; Cullen, J.; Shanahan, J.; Waite, K.; Montero, A.J.; Barnholtz-Sloan, J.S.; Hamerschlak, N. Sex differences in adults with acute myeloid leukemia and the impact of sex on overall survival. Cancer Med. 2023, 12, 6711–6721. [Google Scholar] [CrossRef]
EMERSE: Electronic Medical Record Search Engine. Available online: https://project-emerse.org/index.html (accessed on 16 May 2022).
ICD—ICD-9—International Classification of Diseases, Ninth Revision. Available online: https://www.cdc.gov/nchs/icd/icd9.htm (accessed on 25 July 2022).
ICD-10 Version:2019. Available online: https://icd.who.int/browse10/2019/en (accessed on 15 April 2021).
Bonsu, J.M.; Guha, A.; Charles, L.; Yildiz, V.O.; Wei, L.; Baker, B.; Brammer, J.E.; Awan, F.; Lustberg, M.; Reinbolt, R.; et al. Reporting of Cardiovascular Events in Clinical Trials Supporting FDA Approval of Contemporary Cancer Therapies. J. Am. Coll. Cardiol. 2020, 75, 620–628. [Google Scholar] [CrossRef]
Guha, A.; Dey, A.K.; Omer, S.; Abraham, W.T.; Attizzani, G.; Jneid, H.; Addison, D. Contemporary Trends and Outcomes of Percutaneous and Surgical Mitral Valve Replacement or Repair in Patients with Cancer. Am. J. Cardiol. 2020, 125, 1355–1360. [Google Scholar] [CrossRef] [PubMed]
Charlson, M.E.; Pompei, P.; Ales, K.L.; MacKenzie, C.R. A new method of classifying prognostic comorbidity in longitudinal studies: Development and validation. J. Chronic Dis. 1987, 40, 373–383. [Google Scholar] [CrossRef] [PubMed]
Kerr, A.J.; Broad, J.; Wells, S.; Riddell, T.; Jackson, R. Should the first priority in cardiovascular risk management be those with prior cardiovascular disease? Heart Br. Card. Soc. 2009, 95, 125–129. [Google Scholar] [CrossRef] [PubMed]
Social Determinants of Health. Available online: https://risk.lexisnexis.com/healthcare/social-determinants-of-health (accessed on 3 May 2022).
Social Determinants of Health—Healthy People 2030|Health.Gov. Available online: https://health.gov/healthypeople/priority-areas/social-determinants-health (accessed on 3 May 2022).
Accountable Care Organizations (ACOs)|CMS. Available online: https://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/ACO (accessed on 19 October 2022).
Osborne, J.W.; Overbay, A. The power of outliers (and why researchers should ALWAYS check for them). Pract. Assess. Res. Eval. 2004, 9, 6. [Google Scholar] [CrossRef]
Nuzzo, R.L. The Box Plots Alternative for Visualizing Quantitative Data. PM&R 2016, 8, 268–272. [Google Scholar] [CrossRef]
Lewis, E.F. Machine Learning and Social Determinants of Health—An Opportunity to Move Beyond Race for Inpatient Risk Prediction in Patients with Heart Failure. JAMA Cardiol. 2022, 7, 854–855. [Google Scholar] [CrossRef]
Azuaje, F. Artificial intelligence for precision oncology: Beyond patient stratification. Npj Precis. Oncol. 2019, 3, 6. [Google Scholar] [CrossRef]
Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794, (KDD ’16). [Google Scholar] [CrossRef]
Rusdah, D.A.; Murfi, H. XGBoost in handling missing values for life insurance risk prediction. SN Appl. Sci. 2020, 2, 1336. [Google Scholar] [CrossRef]
Xu, Y.; Goodacre, R. On Splitting Training and Validation Set: A Comparative Study of Cross-Validation, Bootstrap and Systematic Sampling for Estimating the Generalization Performance of Supervised Learning. J. Anal. Test. 2018, 2, 249–262. [Google Scholar] [CrossRef]
Verdonck, T.; Baesens, B.; Óskarsdóttir, M.; vanden Broucke, S. Special issue on feature engineering editorial. Mach. Learn. 2021. [Google Scholar] [CrossRef]
Pudjihartono, N.; Fadason, T.; Kempa-Liehr, A.W.; O’Sullivan, J.M. A Review of Feature Selection Methods for Machine Learning-Based Disease Risk Prediction. Front. Bioinforma. 2022, 2. [Google Scholar] [CrossRef] [PubMed]
Harrell, F.E., Jr.; Califf, R.M.; Pryor, D.B.; Lee, K.L.; Rosati, R.A. Evaluating the Yield of Medical Tests. JAMA 1982, 247, 2543–2546. [Google Scholar] [CrossRef] [PubMed]
Bergstra, J.; Bengio, Y. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res. 2012, 13, 281–305. [Google Scholar]
Harrell, F.E.; Lee, K.L.; Mark, D.B. Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat. Med. 1996, 15, 361–387. [Google Scholar] [CrossRef]
Longato, E.; Vettoretti, M.; Di Camillo, B. A practical perspective on the concordance index for the evaluation and selection of prognostic time-to-event models. J. Biomed. Inform. 2020, 108, 103496. [Google Scholar] [CrossRef] [PubMed]
RStudio|Open Source & Professional Software for Data Science Teams. Available online: https://www.rstudio.com/ (accessed on 3 May 2022).
Lang, M.; Bischl, B.; Richter, J.; Schratz, P.; Casalicchio, G.; Coors, S.; Au, Q.; Binder, M.; Becker, M. Mlr3: Machine Learning in R—Next Generation 2022. Available online: https://CRAN.R-project.org/package=mlr3 (accessed on 20 July 2023).
Sonabend, R.; Király, F.J.; Bender, A.; Bischl, B.; Lang, M. mlr3proba: An R package for machine learning in survival analysis. Bioinformatics 2021, 37, 2789–2791. [Google Scholar] [CrossRef]
Zhang, J.; Gajjala, S.; Agrawal, P.; Tison, G.H.; Hallock, L.A.; Beussink-Nelson, L.; Lassen, M.H.; Fan, E.; Aras, M.A.; Jordan, C.; et al. Fully Automated Echocardiogram Interpretation in Clinical Practice. Circulation 2018, 138, 1623–1635. [Google Scholar] [CrossRef]
Madani, A.; Arnaout, R.; Mofrad, M.; Arnaout, R. Fast and accurate view classification of echocardiograms using deep learning. Npj Digit. Med. 2018, 1, 6. [Google Scholar] [CrossRef]
Attia, Z.I.; Kapa, S.; Lopez-Jimenez, F.; McKie, P.M.; Ladewig, D.J.; Satam, G.; Pellikka, P.A.; Enriquez-Sarano, M.; Noseworthy, P.A.; Munger, T.M.; et al. Screening for cardiac contractile dysfunction using an artificial intelligence–enabled electrocardiogram. Nat. Med. 2019, 25, 70–74. [Google Scholar] [CrossRef]
Javaid, A.; Zghyer, F.; Kim, C.; Spaulding, E.M.; Isakadze, N.; Ding, J.; Kargillis, D.; Gao, Y.; Rahman, F.; Brown, D.E.; et al. Medicine 2032: The future of cardiovascular disease prevention with machine learning and digital health technology. Am. J. Prev. Cardiol. 2022, 12, 100379. [Google Scholar] [CrossRef]
Kakadiaris, I.A.; Vrigkas, M.; Yen, A.A.; Kuznetsova, T.; Budoff, M.; Naghavi, M. Machine Learning Outperforms ACC/AHA CVD Risk Calculator in MESA. J. Am. Heart Assoc. 2018, 7, e009476. [Google Scholar] [CrossRef]
Alaa, A.M.; Bolton, T.; Angelantonio, E.D.; Rudd, J.H.F.; Schaar, M. van der Cardiovascular disease risk prediction using automated machine learning: A prospective study of 423,604 UK Biobank participants. PLoS ONE 2019, 14, e0213653. [Google Scholar] [CrossRef]
Pal, M.; Parija, S.; Panda, G.; Dhama, K.; Mohapatra, R.K. Risk prediction of cardiovascular disease using machine learning classifiers. Open Med. 2022, 17, 1100–1113. [Google Scholar] [CrossRef]
Kremers, H.M.; Crowson, C.S.; Therneau, T.M.; Roger, V.L.; Gabriel, S.E. High ten-year risk of cardiovascular disease in newly diagnosed rheumatoid arthritis patients: A population-based cohort study. Arthritis Rheum. 2008, 58, 2268–2274. [Google Scholar] [CrossRef]
Damen, J.A.; Pajouheshnia, R.; Heus, P.; Moons, K.G.M.; Reitsma, J.B.; Scholten, R.J.P.M.; Hooft, L.; Debray, T.P.A. Performance of the Framingham risk models and pooled cohort equations for predicting 10-year risk of cardiovascular disease: A systematic review and meta-analysis. BMC Med. 2019, 17, 109. [Google Scholar] [CrossRef] [PubMed]
Roth, G.A.; Mensah, G.A.; Johnson, C.O.; Addolorato, G.; Ammirati, E.; Baddour, L.M.; Barengo, N.C.; Beaton, A.Z.; Benjamin, E.J.; Benziger, C.P.; et al. Global Burden of Cardiovascular Diseases and Risk Factors, 1990–2019. J. Am. Coll. Cardiol. 2020, 76, 2982–3021. [Google Scholar] [CrossRef] [PubMed]
Miller, G.E.; Chen, E.; Shimbo, D. Mechanistic Understanding of Socioeconomic Disparities in Cardiovascular Disease*. J. Am. Coll. Cardiol. 2019, 73, 3256–3258. [Google Scholar] [CrossRef] [PubMed]
Schultz, W.M.; Kelli, H.M.; Lisko, J.C.; Varghese, T.; Shen, J.; Sandesara, P.; Quyyumi, A.A.; Taylor, H.A.; Gulati, M.; Harold, J.G.; et al. Socioeconomic Status and Cardiovascular Outcomes. Circulation 2018, 137, 2166–2178. [Google Scholar] [CrossRef] [PubMed]
Rosengren, A.; Hawken, S.; Ôunpuu, S.; Sliwa, K.; Zubaid, M.; Almahmeed, W.A.; Blackett, K.N.; Sitthi-amorn, C.; Sato, H.; Yusuf, S. Association of psychosocial risk factors with risk of acute myocardial infarction in 11,119 cases and 13,648 controls from 52 countries (the INTERHEART study): Case-control study. The Lancet 2004, 364, 953–962. [Google Scholar] [CrossRef] [PubMed]
Marmot, M.G.; Stansfeld, S.; Patel, C.; North, F.; Head, J.; White, I.; Brunner, E.; Feeney, A.; Marmot, M.G.; Smith, G.D. Health inequalities among British civil servants: The Whitehall II study. The Lancet 1991, 337, 1387–1393. [Google Scholar] [CrossRef]
Malambo, P.; Kengne, A.P.; Villiers, A.D.; Lambert, E.V.; Puoane, T. Built Environment, Selected Risk Factors and Major Cardiovascular Disease Outcomes: A Systematic Review. PLoS ONE 2016, 11, e0166846. [Google Scholar] [CrossRef] [PubMed]
Christine, P.J.; Auchincloss, A.H.; Bertoni, A.G.; Carnethon, M.R.; Sánchez, B.N.; Moore, K.; Adar, S.D.; Horwich, T.B.; Watson, K.E.; Diez Roux, A.V. Longitudinal Associations between Neighborhood Physical and Social Environments and Incident Type 2 Diabetes Mellitus: The Multi-Ethnic Study of Atherosclerosis (MESA). JAMA Intern. Med. 2015, 175, 1311–1320. [Google Scholar] [CrossRef] [PubMed]
James, P.; Banay, R.F.; Hart, J.E.; Laden, F. A Review of the Health Benefits of Greenness. Curr. Epidemiol. Rep. 2015, 2, 131–142. [Google Scholar] [CrossRef] [PubMed]
Yeager, R.; Riggs, D.W.; DeJarnett, N.; Tollerud, D.J.; Wilson, J.; Conklin, D.J.; O’Toole, T.E.; McCracken, J.; Lorkiewicz, P.; Xie, Z.; et al. Association between Residential Greenness and Cardiovascular Disease Risk. J. Am. Heart Assoc. 2018, 7, e009117. [Google Scholar] [CrossRef]
Chandrabose, M.; Rachele, J.N.; Gunn, L.; Kavanagh, A.; Owen, N.; Turrell, G.; Giles-Corti, B.; Sugiyama, T. Built environment and cardio-metabolic health: Systematic review and meta-analysis of longitudinal studies. Obes. Rev. 2019, 20, 41–54. [Google Scholar] [CrossRef]
Wu, Q.; Kling, J.M. Depression and the Risk of Myocardial Infarction and Coronary Death. Medicine 2016, 95, e2815. [Google Scholar] [CrossRef]
Everson-Rose, S.A.; Lutsey, P.L.; Roetker, N.S.; Lewis, T.T.; Kershaw, K.N.; Alonso, A.; Diez Roux, A.V. Perceived Discrimination and Incident Cardiovascular Events: The Multi-Ethnic Study of Atherosclerosis. Am. J. Epidemiol. 2015, 182, 225–234. [Google Scholar] [CrossRef]
Valtorta, N.K.; Kanaan, M.; Gilbody, S.; Hanratty, B. Loneliness, social isolation and risk of cardiovascular disease in the English Longitudinal Study of Ageing. Eur. J. Prev. Cardiol. 2018, 25, 1387–1396. [Google Scholar] [CrossRef]
Deschênes, S.S.; Graham, E.; Kivimäki, M.; Schmitz, N. Adverse Childhood Experiences and the Risk of Diabetes: Examining the Roles of Depressive Symptoms and Cardiometabolic Dysregulations in the Whitehall II Cohort Study. Diabetes Care 2018, 41, 2120–2126. [Google Scholar] [CrossRef]
Stewart, R.A.H.; Colquhoun, D.M.; Marschner, S.L.; Kirby, A.C.; Simes, J.; Nestel, P.J.; Glozier, N.; O’Neil, A.; Oldenburg, B.; White, H.D.; et al. Persistent psychological distress and mortality in patients with stable coronary artery disease. Heart 2017, 103, 1860–1866. [Google Scholar] [CrossRef]
Everson-Rose, S.A.; Roetker, N.S.; Lutsey, P.L.; Kershaw, K.N.; Longstreth, W.T.; Sacco, R.L.; Diez Roux, A.V.; Alonso, A. Chronic Stress, Depressive Symptoms, Anger, Hostility, and Risk of Stroke and Transient Ischemic Attack in the Multi-Ethnic Study of Atherosclerosis. Stroke 2014, 45, 2318–2323. [Google Scholar] [CrossRef] [PubMed]
Öhlin, B.; Nilsson, P.M.; Nilsson, J.-Å.; Berglund, G. Chronic psychosocial stress predicts long-term cardiovascular morbidity and mortality in middle-aged men. Eur. Heart J. 2004, 25, 867–873. [Google Scholar] [CrossRef] [PubMed]
Demakakos, P.; Biddulph, J.P.; de Oliveira, C.; Tsakos, G.; Marmot, M.G. Subjective social status and mortality: The English Longitudinal Study of Ageing. Eur. J. Epidemiol. 2018, 33, 729–739. [Google Scholar] [CrossRef] [PubMed]
Kivimäki, M.; Pentti, J.; Ferrie, J.E.; Batty, G.D.; Nyberg, S.T.; Jokela, M.; Virtanen, M.; Alfredsson, L.; Dragano, N.; Fransson, E.I.; et al. Work stress and risk of death in men and women with and without cardiometabolic disease: A multicohort study. Lancet Diabetes Endocrinol. 2018, 6, 705–713. [Google Scholar] [CrossRef]
Addison, D.; Branch, M.; Baik, A.H.; Fradley, M.G.; Okwuosa, T.; Reding, K.W.; Simpson, K.E.; Suero-Abreu, G.A.; Yang, E.H.; Yancy, C.W.; et al. Equity in Cardio-Oncology Care and Research: A Scientific Statement From the American Heart Association. Circulation 2023, 148, 297–308. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Study CONSORT diagram for breast cancer patients, University Hospitals (UH) population (2010–2020).

Figure 2. Study machine learning design detailing race-specific and race-agnostic models.

Table 1. Characteristics of patients with breast cancer at University Hospitals (UH) Seidman Cancer Center, 2010–2020.

Characteristic	Patients diagnosed with breast cancer, University Hospitals (UH), 2010–2020 (n = 4309)
Age at diagnosis—median (IQR)	63 (53–72)
Race/ethnicity—n (%)
non-Hispanic Black	765 (17.7)
non-Hispanic White	3321 (77.1)
Other	223 (5.2)
Stage—n (%)
III–IV	326 (7.5)
Histology—n (%)
Ductal	2121 (49.2)
ER+—n (%)	1936 (44.9)
PR+—n (%)	1732 (40.2)
HER2+—n (%)	90 (2.1)
Smoking status—n (%)
Smoker	303 (7)
Former smoker	987 (22.9)
Never smoker	2182 (50.6)
Unknown	837 (19.4)
Charlson comorbidity score—median (IQR)	4 (2–7)
Cardiovascular history/risk factor—n (%)	3123 (74.6)
Cardiomyopathy	230 (5.3)
Coronary artery disease (CAD)	775 (18)
Myocardial infarction (MI)	261 (6.1)
Carotid disease (CD)	141 (3.3)
Transient ischemic attack (TIA)/Stroke	67 (1.6)
Chronic kidney disease (CKD)	536 (12.4)
Dyslipidemia	2285 (53)
Diagnosis per patient—median (IQR)	2 (0–2)
Surgery—n (%)
Mastectomy	792 (18.4)
Lumpectomy	964 (22.4)
Chemotherapy (C)—n (%)	1213 (28.2)
Radiotherapy (R)—n (%)	1699 (39.4)
Left	401 (9.3)
Right	436 (10.1)
Immunotherapy (I)—n (%)	204 (4.7)
Endocrine therapy (E)—n (%)	1982 (46)
Combined therapy—n (%)
C + R	761 (17.7)
I + R	123 (2.9)
H + C + R	459 (10.7)
H + C + R + I	61 (1.4)
% appointments attended—median (IQR)	66.6 (50–81.8)

Table 2. Hyperparameters and performance for race-agnostic and race-specific ML models designed to predict 2-year MACE.

Model	SDOH data	Hyperparameters	Performance (C-Index)
Race-agnostic	Without SDOH data	nrounds = 2050; nthread = 10; verbose = 0; eta = 0.02715107; max_depth = 9; min_child_weight = 2.886243; gamma = 3.93808; subsample = 0.9668632; colsample_bytree = 0.9550104	0.78 (0.76–0.79)
Race-agnostic	With SDOH data	nrounds = 50; nthread = 8; verbose = 0; eta = 0.1013887; max_depth = 1; min_child_weight = 2.971928; gamma = 3.337559; subsample = 0.804832; colsample_bytree = 0.97875	0.81 (0.80–0.82)
NHB	Without SDOH data	nrounds = 50; nthread = 14; verbose = 0; eta = 0.02364827; max_depth = 1; min_child_weight = 2.62171; gamma = 4.533674; subsample = 0.9894932; colsample_bytree = 0.6737331	0.74 (0.72–0.76)
NHB	With SDOH data	nrounds = 50; nthread = 16; verbose = 0; eta = 0.04240374; max_depth = 4; min_child_weight = 7.789127; gamma = 4.256919; subsample = 0.9581859; colsample_bytree = 0.6278961	0.75 (0.73–0.78)
NHW	Without SDOH data	nrounds = 50; nthread = 4; verbose = 0; eta = 0.03734001; max_depth = 2; min_child_weight = 2.380759; gamma = 4.503645; subsample = 0.8980231; colsample_bytree = 0.8306106	0.79 (0.77–0.80)
NHW	With SDOH data	nrounds = 4050; nthread = 14; verbose = 0; eta = 0.06144029; max_depth = 2; min_child_weight = 0.1104873; gamma = 2.937595; subsample = 0.999557; colsample_bytree = 0.8240068	0.79 (0.77–0.80)
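
The performance column in Table 2 reports a concordance index (C-index). As a rough illustration of what that number measures, the sketch below computes a Harrell-type C-index in plain Python on made-up data. It is not the authors' code: the function name `harrell_c_index`, the toy follow-up times, event flags, and risk scores are all hypothetical, and it assumes the reported metric is of the Harrell type. Pairs with tied follow-up times are skipped for simplicity, which some implementations handle differently.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell-type concordance index for right-censored data (illustrative).

    A pair is comparable when the subject with the shorter follow-up time
    experienced the event. The pair is concordant when that subject also has
    the higher predicted risk; tied risk scores count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore pairs with tied times
        # 'early' is the subject with the shorter follow-up time
        early, late = (i, j) if times[i] < times[j] else (j, i)
        if not events[early]:
            continue  # earlier subject was censored, so the ordering is unknown
        comparable += 1
        if risk_scores[early] > risk_scores[late]:
            concordant += 1.0
        elif risk_scores[early] == risk_scores[late]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Hypothetical toy data: months of follow-up, event flag (1 = MACE), model risk
toy_times = [5, 10, 15, 20]
toy_events = [1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.1, 0.6]

toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
print(f"C-index on toy data: {toy_c:.2f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of comparable pairs, which is the scale on which the entries in Table 2 are read.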

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
