A Systematic Review and Critical Assessment of Breast Cancer Risk Prediction Tools Incorporating a Polygenic Risk Score for the General Population

Simple Summary
Several risk prediction tools have been developed to better stratify women according to their risk of developing breast cancer (BC) and inform prevention and early detection strategies. Many recent versions of these tools now incorporate a polygenic risk score (PRS) that uses the aggregated effect of common genetic variants, also known as single nucleotide polymorphisms (SNPs), as a reliable predictor to estimate BC risk. However, the characteristics of each tool in terms of PRS development, population, and risk factors included vary considerably, which may affect their predictive performance and limit their use in public health practices. Thus, this systematic review characterizes BC risk prediction tools incorporating a PRS and explores the factors that can influence their ability to predict a woman’s risk of developing BC during her lifetime.

Abstract
Single nucleotide polymorphisms (SNPs) in the form of a polygenic risk score (PRS) have emerged as a promising factor that could improve the predictive performance of breast cancer (BC) risk prediction tools. This study aims to appraise and critically assess the current evidence on these tools. Studies were identified using Medline, EMBASE and the Cochrane Library up to November 2022 and were included if they described the development and/or validation of a BC risk prediction model using a PRS for women of the general population and if they reported a measure of predictive performance. We identified 37 articles, of which 29 combined genetic and non-genetic risk factors using seven different risk prediction tools. Most models (55.0%) were developed on populations of European ancestry and performed better than those developed on populations from other ancestry groups. Regardless of the number of SNPs in each PRS, models combining a PRS with genetic and non-genetic risk factors generally had better discriminatory accuracy (AUC from 0.52 to 0.77) than those using a PRS alone (AUC from 0.48 to 0.68). The overall risk of bias was considered low in most studies. BC risk prediction tools combining a PRS with genetic and non-genetic risk factors provided better discriminative accuracy than either used alone. Further studies are needed to cross-compare their clinical utility and readiness for implementation in public health practices.


Introduction
Breast cancer (BC) remains a major public health issue worldwide and is the second leading cause of death among women annually [1]. There is compelling evidence that early detection by mammography screening improves prognosis and reduces mortality rates from BC, even though the risks of overdiagnosis, overtreatment and psychological impacts cannot be discounted [2,3].
Currently, most organized BC screening programs are offered to women based solely on their age, from age 40-50 to age 70-74, depending on the country [4,5]. Although the risk of developing BC increases with age, genetic, environmental, lifestyle, reproductive and hormonal factors have also been found to be associated with the risk of developing the disease [6]. In this context, risk-stratified BC screening, in which individual risk assessment based on multiple risk factors is used to tailor screening recommendations (e.g., more screening for women at higher risk and less screening for those at lower risk), has been proposed as an alternative to the current age-based approach [7][8][9]. Developing and validating accurate BC risk prediction tools is therefore critical to achieving optimal risk-stratified BC screening strategies.
Genome-wide association studies (GWAS) have identified common, low-penetrance genetic variants associated with BC risk [10,11]. Although individually these variants confer minimal risk of BC, their effect becomes significant when aggregated as a polygenic risk score (PRS; also known as a genetic risk score, GRS) [12]. This PRS can be used alone or incorporated into a risk prediction model to identify women at higher risk of developing BC [13,14]. At the population level, risk prediction tools could be used to stratify healthy women based on their risk of developing cancer in a certain time period (commonly 5 or 10 years) in order to adapt preventive measures [15][16][17]. While risk prediction models aim to predict the probability of an event occurring in individuals based on a combination of factors, risk prediction tools are the means by which these models are implemented in clinical or public health practice [18]. Commonly used tools to predict BC risk include the Breast Cancer Risk Assessment Tool (BCRAT; also referred to as the Gail model) [19], the International Breast Cancer Intervention Study model (IBIS; the Tyrer-Cuzick model) [20], the BRCAPRO risk assessment tool [21], and the Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm model (BOADICEA) [13,22]. Clinical-grade tests to measure PRS are now available, and several BC risk prediction tools, including IBIS and BOADICEA, have been extended to include a PRS value [13,[23][24][25]].
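To illustrate how individually small variant effects are aggregated, the following minimal sketch computes a PRS as a weighted sum of risk-allele counts. The weighted-sum form, the function name `polygenic_risk_score`, and the weights shown are assumptions made for illustration; individual studies in this review may construct their PRS differently.

```python
from typing import Sequence


def polygenic_risk_score(allele_counts: Sequence[int],
                         weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele counts (0, 1 or 2 copies per SNP).

    Assumption: each weight is a per-allele effect size, such as a
    log odds ratio taken from a GWAS.
    """
    if len(allele_counts) != len(weights):
        raise ValueError("allele_counts and weights must have the same length")
    for count in allele_counts:
        if count not in (0, 1, 2):
            raise ValueError("each allele count must be 0, 1 or 2")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


# Hypothetical effect sizes and genotype for three SNPs (illustrative only).
example_weights = [0.05, 0.10, -0.02]
example_genotype = [2, 1, 0]
example_score = polygenic_risk_score(example_genotype, example_weights)
```

In practice the raw score is often standardized against a reference population before being entered into a risk model; that step is omitted here.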
For a risk prediction tool based on or incorporating a PRS to be clinically useful for prevention or early detection, it must provide good risk discrimination between individuals who will develop the disease and those who will not and account for the population risk of the disease. Even if a model is well-calibrated to predict different risk categories, its ability to stratify groups of individuals in the population with a sufficient difference in absolute risk to justify additional preventive interventions plays an important role in its clinical utility [23]. Nonetheless, considerable heterogeneity in PRS development methods brings uncertainty and potential bias to BC risk prediction tools incorporating a PRS [26].
To date, there is no critical assessment regarding the development, performance, and risk of bias of BC risk prediction tools that include a PRS. In addition, population characteristics for which these tools are best suited remain to be further investigated. Finally, we need to understand the current validation processes of these tools and the existence of comparative studies to envision how they could apply to clinical or public health practices. Thus, we conducted this systematic review to help fill this gap in the literature. Our specific aims are to (1) identify, characterize, and summarise the different risk prediction models incorporating a PRS to estimate the risk of developing BC in women in the general population; and (2) assess the risk of bias of individual studies reporting on their performance.

Protocol and Registration
A systematic review protocol was published in the International Prospective Register of Systematic Reviews, PROSPERO (PROSPERO 2020 CRD42020198930; available at https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=198930, accessed on 6 November 2023).

Search Strategy
This systematic review followed PICOTS and PRISMA guidelines [27,28]. We first searched the Medline and EMBASE databases and the Cochrane Library up to June 2021 using the strategy presented in Supplemental Table S1. We then updated our search to retrieve relevant literature up to November 2022. Our search strategy, adapted for each database, consisted of a combination of keywords and controlled vocabulary for three concepts: "breast cancer", "polygenic risk score or genetic risk score" and "cancer risk prediction tools". We also manually screened bibliographic references of all included papers and other relevant systematic reviews or meta-analyses to retrieve additional studies.

Eligibility Criteria
We included studies reporting original research published in a peer-reviewed journal describing the development and/or validation of prediction models incorporating a PRS and using it to estimate the risk of developing BC for adult women in the general population. We defined the general population as a cohort representing women typically considered at average risk of developing BC. Therefore, we excluded studies including individuals with a history of BC or focusing on specific population groups (e.g., individuals with a known mutation in BRCA1 or BRCA2 genes, nurses' study, hereditary BC risk in a familial setting, etc.). The prediction models included in this review could either use only genetic factors in the form of a PRS or a combination of genetic and non-genetic risk factors. To be included in our review, studies needed to meet the following criteria: (1) describe the development and/or validation of a prediction model; (2) use at least two SNPs in the form of a PRS or GRS; (3) predict the risk of developing BC for a specified period in an individual's life (e.g., 5 or 10-year risk, lifetime or remaining lifetime risk); (4) report a measure of performance to assess the predictive capacity of the model (i.e., a measure of discrimination (e.g., C-statistic, AUC) or calibration (e.g., Hosmer-Lemeshow statistic)). Articles published in French or English and with any study design were considered without restriction on the publication year.

Study Selection
Two independent reviewers (C. Mbuya-Bienge and C.D. Kazemali) screened the titles and abstracts. A first pilot selection of titles and abstracts was conducted on a random sample of 10% of the identified articles to verify the clarity and consistency of the inclusion criteria. Since the kappa statistic was 0.9 for this pilot, indicating no significant problems, no changes were made to the selection process and criteria. The full text of all potentially relevant studies was also assessed independently by the two researchers. When a consensus to include or exclude a study could not be reached, a senior researcher (H. Nabi) made the final decision.
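For readers unfamiliar with the agreement statistic mentioned above, the sketch below computes a kappa for two raters, assuming the statistic is Cohen's kappa (the text does not name the variant). The function name `cohen_kappa` and the screening decisions are hypothetical and are not the reviewers' data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must rate the same non-empty set of items")
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical screening decisions for ten abstracts (illustrative only).
reviewer_1 = ["include"] * 3 + ["exclude"] * 7
reviewer_2 = ["include"] * 2 + ["exclude"] * 8
example_kappa = cohen_kappa(reviewer_1, reviewer_2)
```

In this toy example the reviewers agree on 9 of 10 abstracts, yet kappa is about 0.74 because much of that agreement would be expected by chance when most records are excluded.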

Data Extraction Process and Analysis
Data extraction was undertaken independently by the same two researchers (C. Mbuya-Bienge and C.D. Kazemali) using a grid based on the CHARMS checklist of relevant items to extract from individual studies of prediction tools [29]. Data were extracted into tables divided into four categories that influence the models' validity and utility: (1) study characteristics, (2) outcomes and predictors, (3) model development and (4) model performance and validation. Study characteristics included information such as study design, source of data, study population size, study population characteristics (e.g., ethnicity) and type of study according to the TRIPOD guidelines [30] (e.g., development only (1a); development and validation using resampling (1b); random (2a) or non-random (2b) split sample development and validation; development and validation using separate data (3); and validation only (4)). Outcomes and predictors were assessed based on the studies' definition and method for measuring the outcomes and predictors, handling of predictors and selecting genetic and non-genetic predictors. For the model development phase, we considered the handling of missing data, the modelling method and the model presentation. Model performance and validation were assessed from the reported measure of performance, the classification measures if available and the method of internal or external validation if applicable.
The heterogeneity of model characteristics in terms of predictors and outcomes precluded the possibility of pooling data across studies. Therefore, a narrative synthesis was conducted. Key study characteristics, validation and accuracy of individual risk prediction models, as well as the methodological quality, are described in tables and summarised narratively. We presented the studies' main measure of discrimination and its 95% confidence interval (CI), when provided, using a forest plot. The measures of discrimination, such as the area under the receiver operating characteristic curve (AUROC or AUC) or the concordance statistic (c-statistic), indicate how well patients can be classified into two groups (usually the cases with the disease and the controls without the disease). Possible values range between 0.0 and 1.0, with a value of 1.0 indicating that the model has perfect classification accuracy and 0.5 indicating that the classification is no better than a random classification. On rare occasions, the value can be less than 0.5, indicating that the model performs worse than chance [31,32]. When the same study reported performance measures for multiple steps of the same model (e.g., model development and internal validation), only the best-performing model was included in our main analysis. When an article presented performance measures for the development of a model and external validation on a different population (TRIPOD level 3), both models were included in our analysis. If a performance measure was presented separately for a model including only a PRS and the same model combining the PRS with genetic and non-genetic risk factors, both models were considered separately. The same approach was applied if a model presented results for different subtypes of BC or different ethnicities. However, if a study presented different performance measures for the PRS and some genetic and non-genetic risk factors individually, only the most comprehensive model was retained.
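The interpretation of the AUC given above can be made concrete with a minimal sketch that computes it as the proportion of case-control pairs in which the case receives the higher predicted risk, with ties counted as one half. The function name `pairwise_auc` and the risk values are hypothetical; the included studies may have used other estimators (e.g., time-dependent concordance indices).

```python
from typing import Sequence


def pairwise_auc(case_scores: Sequence[float],
                 control_scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outranks a random control."""
    if not case_scores or not control_scores:
        raise ValueError("at least one case and one control are required")
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0   # case correctly ranked above control
            elif case == control:
                concordant += 0.5   # tie counts as half a concordant pair
    return concordant / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks (illustrative only).
example_cases = [0.8, 0.6, 0.4]
example_controls = [0.7, 0.3, 0.2]
example_auc = pairwise_auc(example_cases, example_controls)
```

Here 7 of the 9 case-control pairs are correctly ordered, giving an AUC of about 0.78; a model that assigned every woman the same risk would score exactly 0.5.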
We also presented the calibration assessed with the Hosmer-Lemeshow test or the expected-to-observed (E/O) ratio, and the reclassification assessed with the net reclassification index (NRI) when available. The Hosmer-Lemeshow test provides a chi-square statistic and a p-value that indicate the goodness of fit. When this test is not statistically significant, it indicates a lack of evidence of model miscalibration. The E/O ratio is the ratio of the total expected number of cases (individuals with the outcome) to the observed number of cases. A value of 1 indicates that the model is perfectly calibrated, while values less than 1 and above 1 indicate, respectively, that the model is underpredicting or overpredicting the total number of cases in the population [31]. The NRI seeks to quantify the agreement between risk classification and event status (cases and controls) when comparing an old model to a new model given a set of predefined risk categories. It allows the incremental value of adding new predictors to an existing set of predictors to be evaluated. The statistic is calculated as NRI = P(up|case) − P(down|case) + P(down|control) − P(up|control), and its value ranges between −2 and 2. The terms "up" and "down" refer, respectively, to the new risk model placing an individual into a higher or a lower risk category than the old model [33,34].
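As an illustration of the two definitions above, the following minimal sketch computes the E/O ratio and the categorical NRI. The function names, the integer coding of risk categories (a higher number meaning a higher risk category) and the example data are assumptions made for illustration, not values from the included studies.

```python
from typing import Sequence


def expected_observed_ratio(predicted_risks: Sequence[float],
                            outcomes: Sequence[int]) -> float:
    """Sum of predicted risks divided by the observed number of cases."""
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("E/O is undefined when no cases are observed")
    return sum(predicted_risks) / observed


def net_reclassification_index(old_category: Sequence[int],
                               new_category: Sequence[int],
                               is_case: Sequence[bool]) -> float:
    """NRI = P(up|case) - P(down|case) + P(down|control) - P(up|control)."""
    n_cases = sum(1 for flag in is_case if flag)
    n_controls = len(is_case) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("both cases and controls are required")
    up_case = down_case = up_control = down_control = 0
    for old, new, case in zip(old_category, new_category, is_case):
        if new > old:          # moved to a higher risk category
            up_case += case
            up_control += not case
        elif new < old:        # moved to a lower risk category
            down_case += case
            down_control += not case
    return ((up_case - down_case) / n_cases
            + (down_control - up_control) / n_controls)


# Hypothetical data: four cases followed by four controls (illustrative only).
old_cat = [0, 0, 1, 1, 1, 1, 0, 0]
new_cat = [1, 0, 1, 2, 0, 1, 0, 1]
case_flag = [True, True, True, True, False, False, False, False]
example_nri = net_reclassification_index(old_cat, new_cat, case_flag)
example_eo = expected_observed_ratio([0.1, 0.2, 0.3, 0.4], [0, 0, 1, 0])
```

In this toy example two of the four cases move up and none move down, while one control moves down and one moves up, so the NRI is 0.5 and is driven entirely by the cases.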
Finally, we performed sensitivity analyses by using multiple comparative assessments based on characteristics such as the populations on which the model was developed, the number of SNPs, the type of risk prediction tools used, the BC subtype and the age category to determine their impact on the models' performance.

Risk of Bias of Individual Studies
We used PROBAST (Prediction model Risk Of Bias Assessment Tool) [35], a tool organized into four domains (participants, predictors, outcome and analysis), to assess the risk of bias of individual studies that developed or validated multivariable diagnostic or prognostic prediction models. We used the same classification as the tool to indicate whether the studies were at low (+), high (−) or unclear (?) risk of bias for each domain separately. Based on our classification for each domain, we followed the PROBAST method [35] to determine the overall risk of bias for a study.
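A simplified sketch of how domain-level ratings can be combined into an overall judgement is given below. It assumes the basic PROBAST rule (overall low only if every domain is low, high if any domain is high, unclear otherwise) and omits any additional considerations described in the PROBAST guidance; the function name and rating labels are illustrative.

```python
from typing import Dict

DOMAINS = ("participants", "predictors", "outcome", "analysis")
VALID_RATINGS = ("low", "high", "unclear")


def overall_risk_of_bias(domain_ratings: Dict[str, str]) -> str:
    """Combine the four domain ratings into one overall rating."""
    ratings = []
    for domain in DOMAINS:
        rating = domain_ratings[domain]  # KeyError if a domain is missing
        if rating not in VALID_RATINGS:
            raise ValueError(f"invalid rating for {domain}: {rating!r}")
        ratings.append(rating)
    if "high" in ratings:
        return "high"      # any high-risk domain makes the study high risk
    if "unclear" in ratings:
        return "unclear"   # no high-risk domain, but at least one unclear
    return "low"           # all four domains rated low


# Hypothetical study (illustrative only).
example_overall = overall_risk_of_bias({
    "participants": "low",
    "predictors": "low",
    "outcome": "unclear",
    "analysis": "low",
})
```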

Study Selection
A total of 7377 records were found from our search strategy after removing duplicates. We excluded 7191 records after screening their titles and abstracts and assessed the full text of 186 papers. A flow diagram of the selection process is presented in Figure 1. At the full-text level, the main reasons for exclusion were that the studies did not include a PRS or a GRS (n = 53) or did not present a predictive model (n = 28) or a measure of performance (n = 19). Additionally, seven studies did not present a risk prediction model for the general population: two were developed on a population of working nurses [36,37], and five were developed for women at increased familial risk of BC [38,39], with a previous diagnosis of BC [40,41] or with known genetic mutations [42]. Two additional studies were identified via reference screening of the included studies. In total, we included 37 studies [12,[23][24][25] in our systematic review, presenting seven different risk prediction tools.

Characteristics of Risk Prediction Models
Table 2 shows the main non-genetic predictors used in combined models. The number of predictors included in the models varied between one [75] and twenty-five [58]. Almost all models used the current age of women as a predictor of BC. Age was considered either by directly introducing it in the models as an independent variable or by stratifying by age groups [25,47,51,58,73]. Age at menarche, age at menopause, age at first live birth and family history of BC were also used in the majority of models. Details on participants and risk models are shown in Supplementary Table S2. Most models (55.0%) were developed or validated in populations of European descent, 26.3% in Asian populations, 12.5% in populations of African descent and 6.3% in Hispanic populations. A little more than 55%, 30% and 15% of the models estimated the 5-year, 10-year and lifetime risk of developing BC, respectively. In addition, one model each estimated the 2-year [66] and 3-year [71] BC risk. Among the models based on a combination of genetic and non-genetic risk factors, 32.7% were an upgraded version of the BCRAT, 21.2% of the IBIS, 9.6% of the Breast Cancer Surveillance Consortium (BCSC) tool and 7.7% of the BOADICEA tool.

Characteristics of Included Studies
The main characteristics of individual studies are presented in Table 1.
Age of breast cancer diagnosis in affected first- and second-degree relatives is also collected. ‡ = Results are presented stratified by age. a = Also includes number of first-degree relatives with ovarian cancer; b = Presented as percentage mammographic density; c = Also includes microcalcifications and masses; d = Also includes a custom gene panel (ATM, BARD1, BRCA1, BRCA2, CDH1, CHEK2, NF1, PALB2, PTEN, RAD50, RAD51C, RAD51D and TP53) and weight; e = Also includes HRT type; f = Used as an interaction between BMI and menopausal status at baseline; g = Also includes weight and history of ovarian cancer; h = Also includes year of birth, age at cancer diagnosis for family history and history of ovarian cancer; i = Also includes history of lobular carcinoma in situ, age at cancer diagnosis for family history and history of ovarian cancer; j = Also includes hyperplasia, lobular carcinoma in situ (LCIS), age at onset of breast cancer in a relative, bilateral breast cancer in a relative, ovarian cancer in a relative and male breast cancer; k = BOADICEA model also includes information on explicit family history of breast and other cancers, genetic factors such as pathogenic variants and unobserved genetic effects, breast tumor pathology and demographic factors (see Lee et al. 2019 [13]); l = Other factors include sex hormone levels of estradiol; m = Also includes waist-to-hip ratio.

Discriminatory Accuracy
Figures 2 and 3 show the discriminative performance of the individual risk models and their 95% confidence intervals (CI) when provided. For the models including a PRS only, 105 measures of discrimination were reported, representing different versions of certain models (Supplemental Table S2). The discrimination measure values ranged from 0.48 (95% CI = 0.43-0.53) [74] to 0.68 (95% CI = 0.61-0.75) [7]. For models including a combination of genetic and non-genetic factors, a total of 93 measures of discrimination were reported. The discrimination measure values ranged from 0.52 (95% CI = 0.48-0.57) [57] to 0.77 (95% CI = 0.75-0.79) [66]. Among the PRS-only models, the one by Shieh et al. (2017) [7] had the best predictive capacity and predicted the 5-year risk of ER-positive BC in women of European descent, whereas the one by Liu et al. (2021) [74] had the lowest predictive capacity and predicted the 5-year overall BC risk in women of Latinx descent. The combined model by Erikson et al. (2020) [66] had the best predictive capacity and predicted the 2-year overall BC risk in women of European descent, and the combined model by Mealiffe et al. (2010) [57] had the lowest predictive capacity and estimated the 5-year risk of ER-negative BC in women of European descent. Due to differences in outcomes, predictors and time frame, direct comparisons between models could not be made; thus, these results are intended only to provide an overview of the models' performance.

Net Reclassification Improvement
A measure of reclassification was provided in twelve studies [23,43,[46][47][48]51,52,57,61,63,65,71]. All but three studies [23,51,65] used the NRI to quantify the discriminatory ability of the combined models with a PRS and genetic and non-genetic risk factors compared to the same model without the PRS. Studies usually used a net reclassification measure for cases (identified as events, NRIe) and controls (identified as nonevents, NRIne) at a predefined risk threshold. NRI values ranged from −0.029 (p-value = 0.5) [47] to 0.181 (95% CI: 0.09-0.27) [43]. In general, the addition of the PRS to clinical risk prediction tools such as the BCRAT or IBIS improved the classification of patients, with cases going into higher risk categories and controls into lower risk categories.

Figure 2. Discriminative performance of individual risk models, including only a PRS. Each dot represents a measure of discrimination for different versions (represented by the letters when applicable) of risk models as described in Supplementary Table S2. The horizontal segment represents the 95% CIs when provided. Blue, green and yellow dots indicate that the AUC, c-statistic and c-index were the measure of performance, respectively.

Effect of Combining a PRS and Genetic and Non-Genetic Risk Factors
Many studies reported results for a model including a PRS only and models including the same PRS combined with genetic and non-genetic risk factors [23,43,44,[46][47][48][51][52][53]55,57,[59][60][61]72,75]. When comparing the same models with and without the addition of risk factors to the PRS, we observed that the combination of genetic and non-genetic risk factors and a PRS improved the discriminative capacity. The model with the greatest improvement from the combination of a PRS and genetic and non-genetic risk factors is the one by Dite et al. (2016) [48]. This study, based on the BOADICEA tool, estimated the 5-year risk and presented discrimination measures for the clinical risk factors only (AUC: 0.66; 95% CI: 0.63-0.70), for the PRS only (AUC: 0.61; 95% CI: 0.58-0.65) and for the combined model (AUC: 0.70; 95% CI: 0.67-0.73). The same study also used the BRCAPRO tool, for which the AUC was 0.65 (95% CI: 0.62-0.68) for clinical risk factors only and 0.69 (95% CI: 0.66-0.72) for the combined model. Another study by Shieh et al. (2016) [59], based on the BCSC tool and estimating the 5-year BC risk in women of East Asian descent, showed a marked improvement, with an AUC of 0.72 (95% CI: 0.62-0.82) for the combined model compared to 0.64 (95% CI: 0.53-0.74) for the PRS only. One of the most comprehensive models is the one by Yang X. et al. (2022) [23], combining a PRS and genetic and non-genetic risk factors using the BOADICEA tool. Their PRS-only model had an AUC of 0.67 (95% CI: 0.64-0.69), whereas the addition of genetic and non-genetic risk factors, including family history, lifestyle, hormonal and reproductive risk factors, mammographic density and pathogenic variants in BC susceptibility genes such as BRCA1 and BRCA2, provided an AUC of 0.70 (95% CI: 0.66-0.73).

Effect of Ethnicity
In general, models developed on populations of European descent performed better than models developed on populations from other ethnicities. A study by Allman et al. (2021) [44] validated the same model on three different populations: African Americans, Whites and Hispanics. The AUCs were 0.57 (95% CI: 0.54-0.60), 0.64 (95% CI: 0.61-0.67) and 0.60 (95% CI: 0.55-0.65), respectively. Liu et al. (2021) [74] also validated three PRS originally developed on women of European descent in women of African and Latinx descent. The AUCs were consistently lower in women of African and Latinx descent. One PRS had AUCs of 0.60 (95% CI: 0.59-0.61), 0.55 (95% CI: 0.51-0.58) and 0.55 (95% CI: 0.50-0.60) in women of European, African and Latinx descent, respectively. However, models developed on populations of Asian descent tended to have similar predictive performance to those from populations of European descent. Ho et al. (2020) [50] validated the PRS developed by Mavaddat et al. (2019) [12] on an Asian population, which had been previously validated on a White population. Both AUCs were close, with values of 0.61 and 0.63. A more recent study by Ho et al. (2022) [68] showed that a PRS developed on a population of Asian descent using a Bayesian polygenic prediction approach and a combination of European and Asian-specific SNP weights from a subset of SNPs by Mavaddat et al. (2019) [12] provided an AUC of 0.64. In fact, PRS based on SNPs associated with BC in a specific ethnicity performed better than general PRS. For example, Shieh et al. (2016) [59] applied to an East Asian population a PRS with 76 SNPs associated with BC in that subpopulation and a general PRS with 83 SNPs associated with BC in populations of European descent. The Asian-specific PRS had an AUC of 0.64 (95% CI: 0.53-0.74), whereas the general PRS in East Asians had an AUC of 0.62 (95% CI: 0.52-0.73).

Effect of Prediction Time Frame
Most models predicted the risk of BC within 5 or 10 years. Some models also predicted the lifetime risk (i.e., until the age of 80 to 90). Generally, models with a shorter prediction time frame had better predictive performance than models with a longer prediction time frame. Two studies used the same model on the same population but varied the prediction time frame [61,71]. The study by Starlard-Davenport et al. (2018) [61] predicted the 5-year risk and the lifetime risk of BC in African American women. The 5-year risk model had a slightly better performance than the lifetime risk model, with an AUC of 0.68 (95% CI: 0.64-0.72) compared to 0.66 (95% CI: 0.62-0.70). We observed a similar pattern of results for the study by Olsen et al. (2021) [71], predicting the 5-year and 3-year risk with AUCs of 0.70 (95% CI: 0.67-0.74) and 0.72 (95% CI: 0.68-0.77), respectively.
Allman et al. (2021) [44] used a streamlined version of the Gail tool based only on risk factors such as age and family history that could be easily used by physicians in the absence of the complete information required by the tool. Models based on the BOADICEA tool usually performed well, considering that it is the most comprehensive BC risk prediction tool in terms of risk factors included. The value of their predictive statistic ranged from 0.62 (95% CI: 0.59-0.64) to 0.70 (95% CI: 0.66-0.73). Several studies compared the predictive capacity of different tools on the same population [25,43,48,53,58,73]. The predictive values of different tools tended to be similar when validated on the same population. Jantzen et al. (2020) [53] used the BCRAT and IBIS models with a PRS of 86 SNPs to evaluate the 5-year BC risk in White women and obtained the same c-index value of 0.63 (95% CI: 0.56-0.70) for both. This observation was similar to the study by Allman et al. (2015) [43] comparing the same two tools in African Americans and Hispanics. For African Americans, the respective AUCs for the BCRAT and IBIS tools were 0.59 (95% CI: 0.56-0.61) and 0.55 (95% CI: 0.52-0.58), whereas for Hispanics, they were 0.61 (95% CI: 0.56-0.66) and 0.59 (95% CI: 0.54-0.64). Another study by Dite et al. (2016) [48] predicting the 5-year risk of invasive BC in White women compared four tools, namely the BOADICEA, BRCAPRO, BCRAT and IBIS tools, and obtained AUCs of 0.70 (95% CI: 0.67-0.73), 0.69 (95% CI: 0.66-0.73), 0.67 (95% CI: 0.63-0.70) and 0.63 (95% CI: 0.59-0.66), respectively.

Quality of Reporting
The TRIPOD checklist considers 22 items to be essential for good reporting of studies developing or validating multivariable prediction models [30]. Of the 37 studies, only four encompassed all the items on the checklist. The vast majority of studies did not follow the recommendation for the title; namely, they did not identify whether the study developed or validated a model. In the methods, the description of how missing data were handled was the most frequently omitted item. For most studies, the results section was clear and complete. However, seven studies did not report confidence intervals for all discriminative measures [12,50,55,63-65,68]. Also, as recommended by the TRIPOD guidelines, calibration performance should be reported for all prediction models, but it was assessed less often than discriminative performance. All studies reported their limitations and provided an overall interpretation of their results given those limitations. Finally, other information, such as supplementary materials, was often provided, and a funding statement was present in all studies.

Risk of Bias within Studies
Assessment of the risk of bias is presented in Figure 4 based on the four domains of the PROBAST tool [35]. Overall, 19 studies were at low risk of bias, 7 were at high risk and 11 were at unclear risk of bias. When the participant domain was at unclear risk, it was mostly because participants' inclusion or exclusion criteria were not described, or not described in enough detail to determine whether they were appropriate. For the predictors domain, the main reason for a high or unclear risk of bias was the absence of important predictors such as age when the model was developed or validated. A couple of studies raised concerns in the outcome domain since it was unclear whether the outcome was a preclinical stage of cancer [66,71]. Most risks of bias occurred in the analysis domain, as many development models did not account for complexities in the data, such as competing risks or model overfitting, underfitting and optimism, or did not explain how they handled missing data. Some of these risks of bias, such as potential overfitting, were acknowledged in the studies themselves [64,68,72].

Discussion
The goal of this systematic review was to appraise and critically assess different prediction models incorporating a PRS used to estimate the risk of developing BC for women in the general population. We identified 37 studies, of which 8 included genetic factors only, whereas the rest combined genetic and non-genetic risk factors. The combined models were based on 7 different risk prediction tools and provided 93 measures of discrimination. For model development, the median value of discriminative performance measures was 0.60 (range = 0.53 to 0.68) for models with PRS only and 0.62 (range = 0.52 to 0.77) for models combining PRS and genetic and non-genetic risk factors. For model validation, the median value of discriminative performance measures was 0.61 (range = 0.48 to 0.67) for models with PRS only and 0.64 (range = 0.55 to 0.70) for models combining PRS and genetic and non-genetic risk factors. Although the increase in AUC from the combination of the PRS and genetic and non-genetic risk factors may look small, from a public health perspective, even a modest increase in discriminative performance may lead to a considerable improvement in overall risk stratification and be clinically meaningful [23].
Comprehensive BC risk prediction tools incorporating known risk factors could have two potential applications. They can be used as risk-stratification tools to improve the ability to identify women in the general population at increased risk who would most likely benefit from personalized screening recommendations. They can also be used as risk prediction tools to predict the risk of developing overall BC and molecular subtypes in healthy women. However, there are many aspects to consider when evaluating whether these tools could be part of clinical routine or public health practices for risk prediction and stratification. The first is to determine the models' capacity to predict the outcome of interest in a defined population, known as the analytical validity [8]. The second is to evaluate the clinical utility of the tools (i.e., their usefulness, benefits and harms) [8]. The first aspect may be taken into account by evaluating, as performed in this review, the discriminating capacity, the calibration or the fit of a model and, additionally, other performance measures such as the net reclassification index, which has been proposed as an alternative or adjunct to discrimination and calibration measures [18].
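To make the two main performance measures concrete, the following minimal Python sketch computes discrimination as the AUC (the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case) and a simple calibration summary, the expected-to-observed (E/O) ratio. The function names and the toy predicted risks are illustrative assumptions, not data or code from any of the reviewed studies.

```python
def auc(case_risks, control_risks):
    """Concordance: share of case/control pairs in which the case has the
    higher predicted risk; tied pairs count as one half."""
    concordant = 0.0
    for c in case_risks:
        for k in control_risks:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(case_risks) * len(control_risks))


def expected_observed_ratio(predicted_risks, outcomes):
    """Calibration-in-the-large: sum of predicted risks (expected cases)
    divided by the number of observed cases. A value of 1 means the model
    predicts, on average, as many cases as were observed."""
    return sum(predicted_risks) / sum(outcomes)


# Toy data (hypothetical): predicted risks for 3 cases and 4 non-cases.
cases = [0.04, 0.03, 0.02]
controls = [0.01, 0.02, 0.03, 0.015]

toy_auc = auc(cases, controls)
print(f"AUC = {toy_auc:.3f}")

# Hypothetical cohort of 1000 women, each with a 3% predicted risk, 25 cases.
toy_eo = expected_observed_ratio([0.03] * 1000, [1] * 25 + [0] * 975)
print(f"E/O = {toy_eo:.2f}")
```

A model can rank women well (high AUC) and still over- or under-estimate absolute risk (E/O far from 1), which is why both measures are needed.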
A first consideration when assessing the predictive performance of a model is that a risk prediction model should be developed in one sample of a data set and validated in a separate independent sample or new data [30]. In fact, associations between risk factors and BC derived from the same data set in which the model was developed may occur by chance due to multiple testing. This problem becomes important when the sample size is relatively small and many risk factors are included in the model. In studies with a small sample size, there is a serious risk of selecting unimportant variables and omitting some variables relevant to the model [86]. At the same time, studies with a very large sample size are more likely to include variables that are statistically significant but of little clinical importance [87]. Simulation studies have suggested that, to build a valid model, there should be at least 10, and preferably 20 or more, subjects with events per risk factor [88,89]. In our review, the number of variables included in the models varied from 2 to 25, so the required number of BC cases should range between 20 and 500. In this regard, two models could present issues. The model estimating the 10-year ER-negative BC risk by Brentnall et al. (2020) [45], with an AUC of 0.63 (95% CI: 0.54-0.71), had 39 cases for 7 predictors, including the PRS. The same situation is also present with the model by Shieh et al. (2016) [59], with an AUC of 0.72 (95% CI: 0.65-0.79) but 51 cases for 7 predictors, including the PRS. Therefore, these models could lead to overoptimistic results in the validation data [90]. However, robust methods can facilitate variable selection, especially when there is a large number of predictors, and can account for many challenges in SNP selection and model specification [91]. Thus, some studies have used more sophisticated methods to select or develop their PRS, such as penalized regression or Bayesian approaches [12,67,68,70,72], and show promising results, particularly in diverse populations.
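The events-per-variable arithmetic behind these figures can be checked with a short Python sketch. The helper names are illustrative assumptions; the case counts and predictor counts are those quoted above.

```python
def required_events(n_predictors, events_per_variable=10):
    """Minimum number of cases suggested by an events-per-variable rule."""
    return n_predictors * events_per_variable


def events_per_variable(n_events, n_predictors):
    """Observed number of cases available per predictor in a model."""
    return n_events / n_predictors


# Range implied by models with 2 to 25 predictors under the 10 and 20 rules.
print(required_events(2, 10))    # 20
print(required_events(25, 20))   # 500

# The two models flagged in the text (cases, predictors as reported above).
flagged = [
    ("Brentnall et al. (2020), ER-negative BC", 39, 7),
    ("Shieh et al. (2016)", 51, 7),
]
for label, n_cases, n_predictors in flagged:
    epv = events_per_variable(n_cases, n_predictors)
    print(f"{label}: {epv:.1f} events per predictor")
```

Both flagged models fall below 10 events per predictor (about 5.6 and 7.3, respectively), which is why their results may be overoptimistic.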
As models perform better in the sample in which they were developed than in a different sample or a completely new population, model development should include a validation process. However, about a third of the models did not present any validation (i.e., internal or external validation), which raises concerns regarding the validity of some models, as they might not be ready to be used. Nonetheless, several studies were a validation or an extension of existing prediction models. While the addition of a PRS to existing models increased their discriminative accuracy, the number of SNPs considered in the PRS varied widely from one study to another. This could be another factor influencing their predictive performance. SNPs included in a PRS should be inherited independently (i.e., in linkage equilibrium). Some studies excluded SNPs in high linkage disequilibrium from those reported in the original study [75] or used them as proxies for risk variants not available in their dataset [52,59,75]. With the discovery of more SNPs from larger GWAS, the AUC of the risk prediction models is improving. For instance, the oldest model by Mealiffe et al. (2010) [57] had an AUC of 0.58 (95% CI: 0.57-0.60) for the prediction of overall BC using only a 7-SNP PRS, while a more recent model by Mavaddat et al. (2019) [12] had an AUC of 0.63 (95% CI: 0.63-0.65) for the prediction of overall BC using only a 313-SNP PRS. Since a small improvement in AUC can have a significant impact on risk stratification, one relevant parameter to evaluate a PRS should be the proportion of the polygenic variance attributable to the PRS, as expressed by the odds ratio per 1 standard deviation [14].
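The link between the odds ratio per standard deviation and the AUC of a PRS-only model can be illustrated with a short Python sketch. It relies on a simplifying assumption that is not taken from the reviewed studies: the standardized PRS is normally distributed with the same variance in cases and non-cases, so that the case distribution is shifted by ln(OR) standard deviations and AUC = Φ(ln(OR)/√2). The odds ratios used below are hypothetical inputs chosen only to show the shape of the relationship.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def auc_from_or_per_sd(or_per_sd):
    """Approximate AUC of a PRS-only model from its odds ratio per 1 SD.

    Assumes a normally distributed PRS with equal variance in cases and
    non-cases, so the case mean is shifted by ln(OR) standard deviations.
    """
    shift = math.log(or_per_sd)
    return normal_cdf(shift / math.sqrt(2.0))


# Hypothetical odds ratios per SD (illustrative values only).
for or_per_sd in (1.0, 1.2, 1.4, 1.6):
    print(f"OR per SD = {or_per_sd:.1f} -> AUC ~ {auc_from_or_per_sd(or_per_sd):.3f}")
```

Under this approximation, an odds ratio of 1 gives an AUC of 0.5, and the AUC rises slowly as the odds ratio per SD increases, which is consistent with the modest AUC gains seen as more SNPs are added to a PRS.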
Another consideration when developing a risk prediction tool is to choose the time frame for which risk should be predicted [18]. Follow-up time has been shown to have an impact on discriminative accuracy measures such as the concordance index and is likely important to understand differences in predictive ability [92,93]. In our review, the model with the highest discriminative accuracy had the shortest prediction time frame. In fact, the model by Eriksson et al. (2020) [66] had an AUC of 0.77 (95% CI: 0.75-0.79) but estimated the 2-year BC risk, whereas most models predicted the 5-year, 10-year or lifetime risk. However, models evaluating short time frames would likely identify existing cancers or preclinical cases. The lead time for BC, which is the period between the early detection of BC by screening and the moment the cancer clinically presents or is diagnosed, is about two to three years [94]. Thus, models predicting two- or three-year risk are more likely to be diagnostic tools and be considered as screening tests. Tools with longer time frames could be more effective in predicting BC risk in a screening setting for risk stratification and be used as complements to early detection tools [66].
The choice of the risk prediction tool, especially in a public health setting, should also be examined [18]. In our review, based on the latest version of the risk prediction tools and considering the overall risk of bias and the population size, we observed that the best combined models were validation models derived from the BOADICEA v.6 and the Tyrer-Cuzick v.8 tools [23,25]. Both models predicted the 5-year risk of overall BC in White women and used the 313-SNP PRS developed by Mavaddat et al. (2019) [12]. Although both models have limitations, including that they are not as well-calibrated in non-White populations, the Tyrer-Cuzick tool for women over 50 provided an AUC of 0.69 (95% CI: 0.64-0.75), and the BOADICEA tool provided an AUC of 0.70 (95% CI: 0.66-0.73). The Tyrer-Cuzick tool used a wide range of non-genetic risk factors but was missing important risk factors such as breast density. The BOADICEA tool was the most comprehensive, combining genetic risk factors such as the PRS and pathogenic variants in BC susceptibility genes as well as non-genetic risk factors, including breast density.
Some tools included predictors easily collected in routine clinical practice or even by questionnaires, like smoking status and BMI, whereas others included predictors requiring an extensive medical examination. For example, biopsy histopathology (i.e., the presence of atypical hyperplasia) increases a woman's risk of BC and is included in the BCRAT model [95]. However, it can be difficult or expensive to collect and, therefore, has limited use in a population-wide public health approach. Indeed, most models that considered this predictor did not have the information and coded the variable as unknown [43,47,48,57,73]. On the other hand, including unique risk factors such as mammographic masses and microcalcifications, as performed by Eriksson et al. (2020) [66], could improve BC risk prediction tools. Lastly, the effect of some risk factors, such as family history, needs to be carefully considered, as it may inflate the AUC when cases are enriched for a strong family history of BC. This has been shown in the study by Starlard-Davenport et al. (2018), which reported an AUC of 0.68 (95% CI: 0.64-0.72) for the 5-year risk of overall BC in African American women that was significantly higher than that of other models developed in African American women unselected for family history [61].
The heterogeneity of populations is also challenging when choosing a prediction tool. We observed that only a limited number of risk models were developed or validated in non-White populations. The models developed or validated on individuals of Latin or African descent performed particularly poorly, with the highest AUC of 0.68 (95% CI: 0.64-0.72) for African descent [61] and 0.60 (95% CI: 0.55-0.68) for Latin descent [44]. In comparison, models developed or validated in Asian women showed fairly similar performance to the ones developed in White populations [50], with the highest AUC of 0.72 (95% CI: 0.62-0.82) [59]. When integrating PRS into public health practice, we need to ensure that it does not exacerbate health disparities. Currently, risk prediction tools including a PRS are not easily generalizable across diverse populations. Ethnic groups of non-European ancestry have been underrepresented in GWAS, in which about 80% of participants are of European ancestry [96,97]. This is even more problematic for African, Hispanic or Indigenous individuals, who were included in less than 4.0% of the GWAS from the first decade of their development [98]. The poor performance of models for non-White populations may be explained by the fact that PRS are usually calculated as a weighted sum of the risk alleles of SNPs derived from GWAS. However, most PRS do not account for effect sizes differing from those of the reference populations of European descent [98,99]. Also, events such as the "flip-flop" phenomenon, where a variant is a risk factor in one population but protective in another, have been observed in about 30 to 40% of variants across studies and affect the performance of risk prediction models [11,78]. Thus, the performance of the PRS declines with increasing genetic divergence from the reference population, resulting in attenuated associations partly due to variation in linkage disequilibrium patterns and allele frequencies. Therefore, many researchers and organizations have raised the need for more diverse biobanks to conduct GWAS [100].
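The weighted-sum construction of a PRS, and the way a "flip-flop" variant can distort it, can be sketched in a few lines of Python. All weights and genotypes below are hypothetical values chosen for illustration; they are not taken from any GWAS or from the reviewed studies.

```python
def polygenic_risk_score(allele_counts, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 per SNP).

    Each weight is the per-allele log odds ratio for that SNP, as would be
    estimated in a GWAS.
    """
    if len(allele_counts) != len(weights):
        raise ValueError("one weight is required per SNP")
    return sum(w * g for w, g in zip(weights, allele_counts))


# Hypothetical genotype at three SNPs for one woman.
genotype = [2, 1, 0]

# Hypothetical per-allele log odds ratios from a reference population.
reference_weights = [0.10, 0.05, -0.08]

# Hypothetical weights in a second population where the first SNP
# "flip-flops": the allele is a risk factor in one population and
# protective in the other.
other_weights = [-0.10, 0.05, -0.08]

prs_reference = polygenic_risk_score(genotype, reference_weights)
prs_other = polygenic_risk_score(genotype, other_weights)
print(f"PRS with reference weights:    {prs_reference:+.2f}")
print(f"PRS with population weights:   {prs_other:+.2f}")
```

In this toy case the same genotype is scored as above-average risk with the reference weights and below-average risk with the population-specific weights, which illustrates why applying weights derived from one ancestry group to another can attenuate or even reverse associations.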
In conclusion, although the addition of new risk factors such as SNPs has improved the discriminative ability of risk prediction models, they still need to be further evaluated to address the potential barriers to using these tools and the appropriate threshold for interventions and/or recommendations [17,18]. Also, there is currently no recommendation for any tool predicting individual risk to be used as the standard in a screening context. Therefore, comparative assessment and validation of risk prediction models in the same populations would be necessary to evaluate the effect of individual risk factors and determine which tool could be useful at the population level [25,101].

Strengths and Limitations
Our review has many strengths worth mentioning. To our knowledge, this is the first systematic review focusing specifically on BC risk prediction models that include a PRS. We used a rigorous methodologic approach in which two reviewers independently performed each step of study selection and data extraction. The publication of the review protocol was another strength that ensured transparency in our work. Finally, including multiple measures of discrimination from the same study allowed us to appraise the incremental improvement resulting from adding genetic and non-genetic factors to the PRS. However, our review also has some limitations. Although two reviewers were involved in identifying the studies, we cannot exclude the possibility that some studies might have been missed. In fact, genetic research related to cancer is a growing field, and new articles on the subject are published regularly. We included only studies published in English or French, and those available in the gray literature or published in other languages were not considered. Also, we did not include simulation studies or studies focusing on the development or validation of new methods for risk prediction tools, as those were outside the scope of this review. Furthermore, it was not possible to present pooled results of individual studies in a meta-analysis due to the heterogeneity of included studies in terms of predictors and PRS. Some limitations were due to the quality of individual studies. A measure of calibration or reclassification was not always provided, making it difficult to determine how close the predicted risk was to the observed risk. Finally, studies showed the predictive performance of models, but only some of them considered the clinical and practical utility of these models [102].

Conclusions
Our research brings evidence on BC risk prediction tools incorporating a PRS. This review shows that the combination of genetic and non-genetic risk factors and PRS tends to increase the predictive performance compared to the PRS only and can improve risk stratification in the population. While most tools' discriminative accuracy was still modest, predictive performance is only one component when considering whether a risk prediction tool will be implemented and useful in a clinical or public health setting. Many barriers (legal, social, ethical and economic) can influence the implementation of a prediction tool. Therefore, this review is only a first step in understanding the issues related to the validation of BC risk prediction tools that include a genetic risk score, and more studies are needed to shed light on potential challenges in implementing these tools.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers15225380/s1, Table S1. Example of search strategy; Table S2. Description of risk models and participants; Table S3. Predictive performance of models in individual studies.

* Results for development not shown. Abbreviations: UK = United Kingdom; USA = United States of America; LR = likelihood ratio; OR = Odds ratio; SNP = Single nucleotide polymorphism; BCAC = Breast Cancer Association Consortium; BPC3 = Breast and Prostate Cancer Cohort Consortium; BCSC = Breast Cancer Surveillance Consortium; BOADICEA = Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm; BCRAT = Breast Cancer Risk Assessment Tool (Gail Model); COGS = Collaborative Oncological Gene-Environment Study; IBIS = International Breast Intervention Study (Tyrer-Cuzick model); iCARE-Lit = Individualized Coherent Absolute Risk Estimator based on literature review; iCARE-BPC3 = Individualized Coherent Absolute Risk Estimator based on BPC3 analysis. TRIPOD levels: 1a = development only; 1b = development and validation using resampling; 2a = random split sample development and validation; 2b = non-random split sample development and validation; 3 = development and validation using separate data; 4 = validation only.
Abbreviations: BDD = Benign breast disease; BMI = Body mass index; HRT = Hormone therapy replacement; NA = Information not available in the dataset noted as unknown; OC = Oral contraceptive. Notes: 1 = First-degree relative with cancer; 2 = Second-degree relative with cancer; 3 = Third-degree relatives with cancer. † = Results are presented separately for different ethnicities. †† = Age of breast cancer diagnosis in affected first- and second-degree relatives is also collected. ‡ = Results are presented stratified by age. a = Also includes number of first-degree relatives with ovarian cancer; b = Presented as percentage mammographic density; c = Also includes microcalcifications and masses; d = Also includes a custom gene panel (ATM, BARD1, BRCA1, BRCA2, CDH1, CHEK2, NF1, PALB2, P[…], RAD50, RAD51C, RAD51D and TP53) and weight; e = Also includes HRT type; f = Used as an interaction between BMI and menopausal status at baseline; g = Also includes weight and history of ovarian cancer; h = Also includes year of birth, age at cancer diagnosis for family history and history of ovarian cancer; i = Also includes history of lobular carcinoma in situ, age at cancer diagnosis for family history and history of ovarian cancer; j = Also includes hyperplasia, lobular carcinoma in situ (LCIS), age at onset of breast cancer in a relative, bilateral breast cancer in a relative, ovarian cancer in a relative and male breast cancer; k = BOADICEA model also includes information on explicit family history of breast and other cancers, genetic factors such as pathogenic variants and unobserved genetic e[…], breast tumor pathology and demographic factors (see Lee et al. 2019 [13]); l = Other factors includes sex hormone levels of estradiol; m = Also includes waist-to-hip ratio.

Figure 3. Discriminative performance of individual risk tools for models including a PRS and genetic and non-genetic risk factors. Each dot represents a measure of discrimination for different versions (represented by the letters when applicable) of risk models as described in Supplementary Table S2. The horizontal segment represents the 95% CIs when provided. Blue, green and yellow dots indicate that the AUC, c-statistic and c-index were the measure of performance, respectively.

Figure 4. Summary of the assessment of risk of bias.


Table 1. Characteristics of individual studies based on the type of included risk models.

Table 2. Main predictors used in breast cancer (BC) prediction models including a Polygenic Risk Score (PRS).