Development and Validation of Risk Prediction Models for Colorectal Cancer in Patients with Symptoms

We aimed to develop and validate prediction models incorporating demographics, clinical features, and a weighted genetic risk score (wGRS) for individual prediction of colorectal cancer (CRC) risk in patients with gastroenterological symptoms. Prediction models were developed with internal validation (CRC cases: n = 1686; controls: n = 963). Candidate predictors included age, sex, BMI, wGRS, family history, and symptoms (changes in bowel habits, rectal bleeding, weight loss, anaemia, abdominal pain). The baseline model included all the non-genetic predictors. Models A (baseline model + wGRS) and B (baseline model) were developed using LASSO regression to select predictors. Models C (baseline model + wGRS) and D (baseline model) were built using all variables. Model calibration was evaluated with the Hosmer-Lemeshow test and calibration curves, and discrimination with C-statistics corrected using 1000 bootstrap resamples. The models' prediction performance was: model A (corrected C-statistic = 0.765); model B (corrected C-statistic = 0.753); model C (corrected C-statistic = 0.764); and model D (corrected C-statistic = 0.752). Models A and C, which integrated the wGRS with demographic and clinical predictors, had statistically significantly improved prediction performance. Our findings suggest that the future application of genetic predictors holds significant promise for enhancing CRC risk prediction. Therefore, further investigation through external validation and clinical impact assessment is merited.


Introduction
Colorectal cancer (CRC) was the third most common cancer and the second leading cause of cancer-related death worldwide in 2022 [1]. Early diagnosis and timely treatment of CRC could improve survival. Survival depends on cancer stage at diagnosis, with 5-year net survival falling from approximately 90% for stage I to 10% for stage IV [2]. Although screening has successfully reduced CRC incidence and mortality, most CRCs are still diagnosed after symptomatic presentation [3]. It is therefore important to develop accurate prediction models that identify symptomatic patients at higher CRC risk, in whom referral is most appropriate. These models could assist clinicians in decisions about further care, such as risk-tailored cancer screening, testing, and treatment [4].

Studies and Variables
CRC prediction models were developed with internal validation in a study that included participants from the Study of Colorectal Cancer in Scotland (SOCCS) (n = 1649) and the Lothian Bowel Symptoms Study (LABSS) (n = 1000). SOCCS, a case-control study that started in 1999, has been recruiting incident CRC cases (aged ≥ 16 years) and healthy controls (matched on age, sex, and health board) from across Scotland. In the current study, we used only data from CRC cases who had developed gastrointestinal symptoms before their recruitment to SOCCS. LABSS, a multi-centre case-control study that started in 2017, recruited patients (aged ≥ 18 years) with gastrointestinal symptoms through endoscopy, CT scanning, colorectal surgery, and gastroenterology units within NHS recruiting centres across Scotland. Both studies collected age, sex, BMI, family history, and symptoms (changes in bowel habits, rectal bleeding, weight loss, anaemia, abdominal pain). Age (years), sex (male/female), BMI (kg/m²), and family history of CRC (yes/no) were collected and documented in questionnaires by the study nurse in SOCCS and LABSS. We designated individuals as having a positive family history (yes) if a first-degree relative (e.g., parents, siblings, and children), a second-degree relative (e.g., grandparent/grandchild, half-siblings, aunt/uncle, and niece/nephew), or any other relative had a documented history of CRC. In SOCCS, symptoms (yes/no) were extracted by the study nurse from GP and/or consultant clinic referral letters, as documented in medical records in TRAK (the NHS Lothian electronic patient data system). In LABSS, symptoms (yes/no) were collected by the study nurse through interviews during patient recruitment and recorded in a pre-designed consultation questionnaire. SOCCS and LABSS also collected blood samples, and DNA samples were genotyped using Illumina® HumanHap300, HumanHap240S, and OmniExpressExome BeadChip 8v1 arrays.
Genotype data quality control was performed following the method proposed by Anderson [23]. Untyped variants were imputed using the Michigan Imputation Server with the 1000 Genomes European reference panel [24].

Descriptive and Association Analysis
We produced baseline summaries for SOCCS and LABSS. Differences in variables between cases and controls in the two studies were tested for statistical significance using the t-test (continuous variables) and Pearson's χ2 test (categorical variables). Univariable and multivariable logistic regression models were fitted to test the associations between variables and CRC risk; factors with a univariable p < 0.05 were included in the multivariable analysis.
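For a binary characteristic such as anaemia, Pearson's χ2 comparison of cases and controls reduces to a closed-form statistic on a 2×2 table with one degree of freedom. The Python sketch below is a hedged illustration, not the study's R analysis code: the function name `pearson_chi2_2x2` is hypothetical, no continuity correction is applied, and the example counts are back-calculated from the anaemia percentages reported in the results (23.31% of 1686 cases, 14.75% of 963 controls), assuming no missing data.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square test for a 2x2 table, without continuity correction.

    Layout (rows = cases / controls, columns = factor present / absent):
        [[a, b],
         [c, d]]
    Returns (chi-square statistic, p-value on 1 degree of freedom).
    """
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("a row or column total is zero; the test is undefined")
    n = row1 + row2
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df the statistic is a squared standard normal, so
    # P(X >= chi2) = P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Anaemia counts back-calculated from the reported percentages (approximate).
chi2, p_value = pearson_chi2_2x2(393, 1686 - 393, 142, 963 - 142)
print(f"chi2 = {chi2:.2f}, p = {p_value:.1e}")
```

With these reconstructed counts the statistic is about 27.9 and the p-value is far below 0.001, in line with the reported p < 0.001. R's `chisq.test` applies Yates' continuity correction to 2×2 tables by default, so a p-value from it may differ slightly from this uncorrected version.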

Weighted Genetic Risk Scores
A weighted genetic risk score (wGRS) is defined as a weighted sum of the risk-allele dosages of k considered SNPs, $(g_{i1}, \dots, g_{ik})$, for each of the n subjects ($i = 1, \dots, n$):

$$\mathrm{wGRS}_i = w_1 g_{i1} + \dots + w_k g_{ik} = \sum_{j=1}^{k} w_j g_{ij}$$

That is, for each individual, the risk-allele dosage carried at each SNP is multiplied by that SNP's effect size, and the products are summed. The effect size of each SNP, derived from the meta-GWAS, is referred to as its 'weight' ($w_1, \dots, w_k$).
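The wGRS formula translates directly into code. The sketch below is a minimal Python illustration, not the study's pipeline; the function name and the example weights and dosages are hypothetical, and treating the weights as per-allele log odds ratios is an assumption (the text only says they are meta-GWAS effect sizes).

```python
def weighted_grs(dosages, weights):
    """Weighted genetic risk score for one individual: sum over j of w_j * g_ij.

    dosages: risk-allele dosages g_i1..g_ik (0 to 2; imputed dosages may be fractional)
    weights: per-SNP weights w_1..w_k (assumed here: per-allele log odds ratios)
    """
    if len(dosages) != len(weights):
        raise ValueError("expected exactly one weight per SNP")
    return sum(w * g for w, g in zip(weights, dosages))


# Hypothetical example with k = 3 SNPs; the values are illustrative only.
weights = [0.10, 0.05, 0.20]
dosages_by_person = {
    "person_1": [0, 1, 2],
    "person_2": [2, 2, 0.8],  # 0.8 mimics an imputed (non-integer) dosage
}
wgrs = {pid: weighted_grs(g, weights) for pid, g in dosages_by_person.items()}
print(wgrs)
```

In practice the dosages would come from the imputed genotype data and the 202 weights from the meta-GWAS [25]; the computation itself stays the same.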

Model Development and Internal Validation
CRC prediction models' development and validation were conducted and reported following the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) guideline [26] (Supplementary Figure S2).
Models were developed with internal validation in the combined dataset of 2649 participants (CRC symptomatic cases = 1686, symptomatic controls = 963; Figure 1). The prediction outcome (Y) was defined as CRC (yes/no). Candidate predictors (X) included (i) the continuous variables age, BMI, and wGRS and (ii) the categorical variables sex, family history, and symptoms (changes in bowel habits, rectal bleeding, weight loss, anaemia, and abdominal pain). The association of each continuous variable (X) with the outcome (Y) was modelled using two approaches: (i) a linear term and (ii) restricted cubic splines (RCS). The continuous variables, modelled each way, were then incorporated into full models C (linear) and E (RCS). The prediction performance of the two approaches was compared in terms of overall accuracy (R², Brier score, AIC, BIC), discrimination (C-statistic), and calibration (p-value of the Hosmer-Lemeshow test). The Brier score (range: 0-1) quantifies the mean squared difference between the predicted probability and the observed outcome, with a lower score indicating better prediction performance [27]. AIC and BIC are information criteria that trade goodness of fit against model complexity, with a lower AIC or BIC value indicating a better model fit [28].
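The overall-accuracy measures named above can be computed from a model's predicted probabilities and its number of estimated parameters. The Python sketch below is a hedged illustration using the standard definitions (Brier score as the mean squared error of the probabilities; AIC = 2k - 2 ln L; BIC = k ln(n) - 2 ln L); the function names and example values are hypothetical, and the predicted probabilities are assumed to lie strictly between 0 and 1, as they do for a fitted logistic model.

```python
import math


def brier_score(y, p):
    """Mean squared difference between predicted probability and observed 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def bernoulli_log_likelihood(y, p):
    """Log-likelihood of 0/1 outcomes under predicted probabilities strictly inside (0, 1)."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))


def aic_bic(y, p, n_params):
    """AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, with k = number of estimated parameters."""
    ll = bernoulli_log_likelihood(y, p)
    n = len(y)
    return 2 * n_params - 2 * ll, n_params * math.log(n) - 2 * ll


# Hypothetical outcomes and fitted probabilities from a model with an
# intercept and one slope (k = 2); illustrative values only.
y = [1, 0, 1, 1, 0]
p = [0.8, 0.3, 0.6, 0.9, 0.2]
aic, bic = aic_bic(y, p, n_params=2)
print(f"Brier = {brier_score(y, p):.3f}, AIC = {aic:.2f}, BIC = {bic:.2f}")
```

Comparing the linear and RCS approaches amounts to computing these quantities for both models on the same data and preferring lower values. Because BIC charges ln(n) per parameter (about 7.9 for n = 2649) against AIC's 2, the two criteria can disagree when the spline model adds extra terms.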
The decision on whether to use linear or RCS to adjust continuous variables in the final model was made by evaluating which method yielded better prediction performance.
After the handling of the continuous variables (X) had been decided, CRC risk prediction models were built (Figure 1). The two main strategies for developing the final models are predictor selection and the full-model approach [29]; their strengths and limitations are compared in Supplementary Table S11. Models A (baseline model + wGRS) and B (baseline model) were constructed using LASSO regression, with the penalty λ (lambda) chosen to give the most parsimonious model whose cross-validated prediction error is within one standard error of the minimum [30]. The influential predictors selected by LASSO were incorporated into the prediction models. Models C (baseline model + wGRS) and D (baseline model) were built using all 10 variables collected in SOCCS and LABSS. These 10 variables had been used as predictors in the 19 previously developed CRC prediction models (Supplementary Table S3) and were therefore incorporated in models C and D irrespective of their association with the outcome or influence on model performance. In addition, we built prediction models F and G based on random forest regression [31,32]; the results are presented in Supplementary Table S12 and Figures S11-S13.
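The one-standard-error rule used to choose λ can be stated in a few lines. The sketch below assumes the cross-validation errors and their standard errors have already been computed over a grid of λ values (in R, `cv.glmnet` returns these as `cvm` and `cvsd` and reports the selected value as `lambda.1se`); the Python function name and the example numbers are hypothetical.

```python
def lambda_one_se(lambdas, cv_mean, cv_se):
    """Pick lambda by the one-standard-error rule.

    lambdas: candidate penalty values
    cv_mean: mean cross-validated error (e.g. binomial deviance) for each lambda
    cv_se:   standard error of that cross-validated error for each lambda
    Returns the largest lambda (strongest L1 penalty, hence fewest non-zero
    coefficients) whose mean CV error is within one SE of the minimum.
    """
    if not (len(lambdas) == len(cv_mean) == len(cv_se)):
        raise ValueError("inputs must have the same length")
    i_min = min(range(len(cv_mean)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    return max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)


# Hypothetical cross-validation results, for illustration only.
lambdas = [0.001, 0.005, 0.01, 0.05, 0.1]
cv_mean = [1.060, 1.055, 1.058, 1.068, 1.100]
cv_se = [0.010] * 5
print(lambda_one_se(lambdas, cv_mean, cv_se))
```

Here the minimum error (1.055) occurs at λ = 0.005, the threshold is 1.065, and the rule returns λ = 0.01: the strongest penalty whose error stays within that threshold, which trades a statistically negligible loss of fit for a sparser model.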

Model Prediction Performance
Models' prediction performance was evaluated in terms of calibration and discrimination. Calibration, the agreement between the model-predicted probabilities of CRC and the observed probabilities, was assessed using the Hosmer-Lemeshow (HL) goodness-of-fit test, with p > 0.05 taken to indicate no evidence of poor calibration. Calibration curves were plotted to visualize the models' calibration. Discrimination was examined through the area under the receiver operating characteristic curve (AUC), also referred to as the C-statistic. Corrected C-statistics were calculated by bootstrap validation (1000 bootstrap resamples). The receiver operating characteristic (ROC) curve and the precision-recall curve (PRC) were plotted [33,34]. The continuous Net Reclassification Index (NRI) and Integrated Discrimination Index (IDI) were calculated after recalibration to compare models and assess the incremental predictive value [35]. An online nomogram for the final model was built using Shiny and shinyapps.io.
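The C-statistic and its bootstrap correction can be sketched as follows. This is a hedged Python illustration of the generic optimism-correction procedure (apparent C minus the mean optimism over bootstrap refits), not the study's R code; `mean_difference_fit` is a deliberately simple, hypothetical stand-in for refitting the logistic model, and the data are simulated.

```python
import random


def c_statistic(y, score):
    """C-statistic (ROC AUC) via the Mann-Whitney rank-sum identity.

    Equals the proportion of (case, control) pairs in which the case has the
    higher score, with tied pairs counted as 1/2.
    """
    n = len(y)
    n_cases = sum(y)
    n_controls = n - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    order = sorted(range(n), key=lambda i: score[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and score[order[end + 1]] == score[order[start]]:
            end += 1
        mid_rank = (start + end) / 2 + 1  # average 1-based rank within a block of ties
        for pos in range(start, end + 1):
            ranks[order[pos]] = mid_rank
        start = end + 1
    case_rank_sum = sum(r for r, yi in zip(ranks, y) if yi == 1)
    return (case_rank_sum - n_cases * (n_cases + 1) / 2) / (n_cases * n_controls)


def optimism_corrected_c(X, y, fit, n_boot=1000, seed=1):
    """Bootstrap optimism correction of the C-statistic (Harrell-style).

    fit(X, y) must return a predict(X) function giving risk scores.
    Returns (apparent C, optimism-corrected C).
    """
    rng = random.Random(seed)
    apparent = c_statistic(y, fit(X, y)(X))
    n = len(y)
    optimisms = []
    while len(optimisms) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y_b = [y[i] for i in idx]
        if not 0 < sum(y_b) < n:
            continue  # resample contains only one outcome class
        X_b = [X[i] for i in idx]
        predict = fit(X_b, y_b)
        # Optimism = C of the refitted model on its own bootstrap sample
        # minus C of the same model on the original sample.
        optimisms.append(c_statistic(y_b, predict(X_b)) - c_statistic(y, predict(X)))
    return apparent, apparent - sum(optimisms) / n_boot


def mean_difference_fit(X, y):
    """HYPOTHETICAL stand-in for model fitting (not the study's logistic model):
    weight each predictor by its case-minus-control mean difference."""
    cases = [x for x, yi in zip(X, y) if yi == 1]
    controls = [x for x, yi in zip(X, y) if yi == 0]
    w = [sum(col) / len(cases) - sum(ctl) / len(controls)
         for col, ctl in zip(zip(*cases), zip(*controls))]
    return lambda X_new: [sum(wj * xj for wj, xj in zip(w, x)) for x in X_new]


# Simulated data: one informative predictor plus five pure-noise predictors.
rng = random.Random(0)
y = [rng.randint(0, 1) for _ in range(200)]
X = [[yi + rng.gauss(0, 1.5)] + [rng.gauss(0, 1) for _ in range(5)] for yi in y]
apparent, corrected = optimism_corrected_c(X, y, mean_difference_fit, n_boot=200)
print(f"apparent C = {apparent:.3f}, corrected C = {corrected:.3f}")
```

In a real analysis, `fit` would repeat the whole model-building procedure in each resample (including any LASSO selection), so that the optimism estimate reflects every data-driven step.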

Statistical Analysis
LASSO regression was conducted using the 'glmnet' R package. Random forest regression was performed using the 'randomForest' R package. The HL test was performed using the 'hoslem.test' function in the 'ResourceSelection' R package. The C-statistic was calculated using the 'rcorr.cens' and 'roc' functions in the 'rms' package. The online CRC risk prediction nomogram/calculator was built using the 'DynNom' and 'rsconnect' R packages. A two-sided p-value less than 0.05 was considered statistically significant. All analyses were performed using R, version 4.0.3 (R Foundation for Statistical Computing).
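The R `hoslem.test` function is not reproduced here, but the statistic it reports can be sketched in standard-library Python to show what an HL p-value summarizes. Assumptions: subjects are sorted by predicted risk and split into g = 10 near-equal-sized groups (`hoslem.test` also defaults to g = 10 but cuts at quantiles of the fitted values, so results can differ slightly when risks are tied), and `chi2_sf` and `hosmer_lemeshow` are hypothetical helpers written for this sketch.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees
    of freedom, via the series for the regularized lower incomplete gamma
    function (adequate for the moderate statistics seen in HL tests)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def hosmer_lemeshow(y, p, g=10):
    """Hosmer-Lemeshow goodness-of-fit test.

    Sorts subjects by predicted risk, splits them into g groups of (nearly)
    equal size, and compares observed with expected events and non-events
    in each group. Returns (statistic, df, p-value) with df = g - 2.
    Assumes all predicted probabilities lie strictly between 0 and 1.
    """
    data = sorted(zip(p, y), key=lambda pair: pair[0])
    n = len(data)
    stat = 0.0
    for k in range(g):
        group = data[k * n // g:(k + 1) * n // g]
        if not group:
            continue
        size = len(group)
        expected = sum(pi for pi, _ in group)
        observed = sum(yi for _, yi in group)
        # Events and non-events share the same squared difference (O - E)^2.
        stat += (observed - expected) ** 2 * (1.0 / expected + 1.0 / (size - expected))
    df = g - 2
    return stat, df, chi2_sf(stat, df)
```

For a fixed degree of miscalibration the statistic grows with sample size, which is one reason the HL test is treated with caution alongside calibration plots.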

Baseline Characteristics
The baseline characteristics of the SOCCS (n = 1649) and LABSS (n = 1000) studies are summarized in Table 1. The distribution of each variable among symptomatic cases and symptomatic controls in the two studies is presented in Supplementary Table S4. There were no statistically significant differences between CRC symptomatic cases in SOCCS and LABSS with regard to wGRS202, age, sex, BMI, family history, or symptoms (p > 0.05).
Comparing symptomatic cases (n = 1686) with symptomatic controls (n = 963) in SOCCS and LABSS (Table 1), CRC symptomatic cases had a higher wGRS202, were older, and included a higher proportion of male patients (p < 0.001). Cases had a lower BMI (p = 0.017). No statistically significant difference was found between symptomatic cases and controls for family history (p = 0.570). Regarding symptoms, the proportion with anaemia was significantly higher in CRC symptomatic cases than in symptomatic controls (23.31% vs. 14.75%; p < 0.001), whereas the proportions with changes in bowel habits (42.41% vs. 74.87%), weight loss (14.77% vs. 18.59%), and abdominal pain (19.69% vs. 43.93%) were significantly lower in cases than in controls (p < 0.001). Rectal bleeding did not differ significantly between symptomatic cases and controls (p = 0.219).
In univariable analysis, statistically significant baseline factors for CRC risk included wGRS202, age, sex, BMI, and the symptoms changes in bowel habits, weight loss, anaemia, and abdominal pain (p < 0.05). Family history and rectal bleeding were not associated with CRC risk (p > 0.05). These eight significant baseline factors were included in the multivariable analysis, the results of which are presented in Table 1.

Prediction Models of CRC Risk in Patients with Symptoms
Models A-D were developed with internal validation in SOCCS and LABSS to predict CRC risk in patients with symptoms (Figure 1).

Continuous Variables Adjustment
The shape of the relationship between each continuous variable (age, BMI, and wGRS202) and the predicted outcome (CRC probability) is presented in Supplementary Figures S3-S5. CRC probability increased steadily with increasing age, decreasing BMI, and increasing wGRS202, and these relationships were roughly linear in shape.
Continuous variables were then transformed using RCS, and we tested the hypothesis that their associations with the predicted outcome are non-linear [36]. Spline functions with three, four, and five knots were created for each variable and fitted in the logistic regression model.
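The restricted cubic spline transformation can be written as a small basis function. The sketch below follows Harrell's truncated-power formulation of restricted cubic splines (the form used by the R `rms` package) and is an illustration rather than the study's code; the helper names, the optional normalization by the squared outer-knot range, the commonly used knot quantiles in `DEFAULT_KNOT_QUANTILES`, and the example ages are assumptions.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a list (0 <= q <= 1)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


# Commonly used knot locations (as quantiles) for 3, 4 and 5 knots.
DEFAULT_KNOT_QUANTILES = {
    3: (0.10, 0.50, 0.90),
    4: (0.05, 0.35, 0.65, 0.95),
    5: (0.05, 0.275, 0.50, 0.725, 0.95),
}


def rcs_basis(x, knots, normalise=True):
    """Restricted cubic spline basis for a scalar x (truncated-power form).

    With k knots t_1 < ... < t_k this returns k - 1 columns: x itself plus
    k - 2 non-linear terms; the fitted curve is forced to be linear below t_1
    and above t_k.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def cube_pos(u):
        return max(u, 0.0) ** 3

    scale = (t[-1] - t[0]) ** 2 if normalise else 1.0
    span = t[-1] - t[-2]
    columns = [x]
    for j in range(k - 2):
        term = (cube_pos(x - t[j])
                - cube_pos(x - t[-2]) * (t[-1] - t[j]) / span
                + cube_pos(x - t[-1]) * (t[-2] - t[j]) / span)
        columns.append(term / scale)
    return columns


# Hypothetical ages, used only to place three knots and expand one value.
ages = [45, 52, 58, 61, 63, 66, 68, 70, 72, 75, 79, 84]
knots = [quantile(ages, q) for q in DEFAULT_KNOT_QUANTILES[3]]
print(knots, rcs_basis(67.0, knots))
```

Each spline-transformed variable enters the logistic model as k - 1 columns; in the `rms` framework the non-linearity p-values reported in the next paragraph correspond to a joint test that the coefficients of the k - 2 non-linear columns are all zero.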
Supplementary Figures S6-S8 and Tables S5-S7 show that R², AIC, and BIC were lowest using RCS with three knots, compared with four and five knots. There was no evidence of significant non-linear associations between CRC risk and age (p for non-linearity = 0.105), BMI (p for non-linearity = 0.587), or wGRS202 (p for non-linearity = 0.688). These findings are consistent with Supplementary Figures S3-S5, showing that the relationships between age, BMI, wGRS202, and CRC risk were linear in shape.
The continuous variables were incorporated into full model C (linear) and model E (RCS with three knots). Supplementary Table S8 summarizes and compares the two models' prediction performance. Model C had a higher AIC, a lower BIC, and a higher corrected C-statistic than model E. Therefore, age, BMI, and wGRS202 were kept as linear continuous covariates in the CRC prediction models.

Models' Development and Validation
Each model's predictors, intercept, coefficients, discrimination, and calibration estimates are presented in Table 2. Model formulas are presented in Supplementary Table S9. CRC prediction models A, B, C, and D were evaluated and demonstrated good prediction performance. Model A had a C-statistic of 0.767 (corrected: 0.765) and an HL test p-value of 0.024, while model B had a C-statistic of 0.754 (corrected: 0.753) and an HL test p-value of 0.711 (Table 2; Figures 2-4). Model C had a C-statistic of 0.767 (corrected: 0.764) and an HL test p-value of 0.018, while model D had a C-statistic of 0.755 (corrected: 0.752) and an HL test p-value of 0.428 (Table 2; Figures 5-7). Precision-recall curves, which visualize the relationship between precision (positive predictive value) and recall (sensitivity), are shown for comparison across models in Figures 4 and 7.
J. Pers. Med. 2023, 13, x. https://doi.org/10.3390/xxxxx www.mdpi.com/journal/jpm
There was no statistically significant difference in predictive accuracy between models A and C (C-statistic increment = 0.001, p = 0.479). In addition, a sensitivity analysis found no statistically significant difference in predictive accuracy between models using wGRS137, wGRS163, and wGRS202 (Supplementary Table S2; Figures S9-S10). Random forest models F (baseline model + wGRS) and G (baseline model), with 500 trees, were built, and the results were consistent with the findings of the cross-assessment of models A/B and C/D (Supplementary Table S12; Figures S11-S13). Model F had an out-of-bag (OOB) prediction error rate of 27.64%, compared with 27.37% for model G. Models that integrated the wGRS with demographic and clinical predictors performed better than the baseline models.
We developed an online CRC risk prediction nomogram/calculator for model A, which can be accessed at https://crcpredictionmodel.shinyapps.io/dynnomapp/ (accessed on 27 June 2023). An individual's CRC risk can be calculated by entering the patient's information.
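A calculator of this kind evaluates the fitted logistic model for one patient: the linear predictor is the intercept plus the sum of coefficient × predictor value, and the predicted risk is its inverse logit. The Python sketch below illustrates that arithmetic only; the function name, predictor names, and coefficients are hypothetical placeholders (the fitted values are given in Table 2 and Supplementary Table S9), and it is not the DynNom application's code.

```python
import math


def predicted_crc_risk(intercept, coefficients, patient):
    """Logistic model risk: 1 / (1 + exp(-(b0 + sum_j b_j * x_j))).

    coefficients and patient are dicts keyed by predictor name; every
    predictor in the model must be supplied for the patient.
    """
    linear_predictor = intercept + sum(
        beta * patient[name] for name, beta in coefficients.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# HYPOTHETICAL coefficients for illustration only; these are not the
# fitted values of models A-D.
coefs = {"age": 0.05, "male": 0.4, "anaemia": 0.6, "wGRS": 1.0}
patient = {"age": 70, "male": 1, "anaemia": 1, "wGRS": 0.5}
risk = predicted_crc_risk(-4.0, coefs, patient)
print(f"predicted CRC risk = {risk:.3f}")
```

With these placeholder values the linear predictor is -4.0 + 3.5 + 0.4 + 0.6 + 0.5 = 1.0, so the predicted risk is 1 / (1 + e^-1), roughly 0.73.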

Interpretation of Main Findings
Our study investigated the predictive value of demographic characteristics, a wGRS based on 202 CRC susceptibility SNPs, family history, and symptoms for CRC risk. The CRC prediction models were developed and internally validated for personalized cancer risk prediction in patients presenting with symptoms.

Model Predictors
CRC risk prediction models A-D were constructed using a polygenic risk score, age, sex, BMI, family history, and symptoms to predict CRC risk in patients with symptoms.
All 10 candidate variables in our study except the wGRS were used as predictors in the 19 previously developed CRC prediction models, and our findings were in line with these previous studies. It should be noted that family history data in SOCCS and LABSS were based on self-reported bowel cancer history recorded in patient questionnaires, which may be affected by recall bias. Furthermore, the predictive value of symptoms as indicators of CRC is not well established. Previous studies argued that bowel symptoms correlate poorly with the presence of CRC [37]. They are also common in patients without CRC, which implies that they have poor specificity for CRC [38]. Bowel symptoms are associated with CRC risk, but only in patients who have had the symptom at least weekly and for less than 12 months [5]. For potentially relevant symptoms, investigating their frequency and duration is therefore helpful. Unfortunately, data on the duration and frequency of bowel symptoms were not collected in SOCCS, so we could not explore this in our study.
None of the 19 models incorporated genetic factors (either individual SNPs or a wGRS). To the best of our knowledge, this is the first study to develop and internally validate prediction models that include a wGRS in addition to demographic and clinical factors for CRC risk in patients with symptoms. Models A and C, which included the wGRS based on 202 CRC susceptibility SNPs, had better prediction performance than baseline models B and D, showing that adding the genetic predictor (wGRS) to the baseline model could improve CRC risk stratification. By comparison, previous studies mainly focused on the ability of genetic factors to predict the overall risk of CRC in the general population, not in symptomatic patients [39]. A recently published systematic review synthesized and evaluated 33 CRC risk prediction models that incorporated genetic predictors (SNPs or a GRS) to predict CRC risk in the general population [39] (Supplementary Table S10). Of these 33 models, 78.8% applied a GRS and the remaining 21.2% incorporated individual SNPs as genetic predictors. The meta-analysis found no correlation between the number of SNPs and AUC improvement (p = 0.695). The AUC improvement from adding genetic predictors to baseline models ranged from 0.010 to 0.084, with a pooled estimate of 0.040 (95% CI: 0.035-0.045) for genetically enhanced models compared with baseline models [39].
These results are consistent with our finding on the value of a polygenic risk score in symptomatic patients: integrating genetic predictors into classical CRC prediction models (baseline models) could improve the models' prediction accuracy. Genetic risk stratification in CRC has several advantages. First, the wGRS provides a measure of genetic susceptibility to CRC. Second, genetic predisposition to CRC remains largely unchanged throughout life, which affords the opportunity to provide long-term estimates of risk trajectories. Third, genetic risk stratification could improve CRC risk prediction in people who carry high-impact disease-causing genetic variants. Future application of genetic predictors holds significant promise and has the potential to enhance CRC risk prediction, assist clinical decision-making in precision therapeutics, and improve population-level screening [40]. Despite these potential benefits, the risks and limitations of using genetic predictors clinically should be acknowledged. The first concern is balancing the cost and net benefit of using genetic predictors [40]. Genetic variants are not routinely collected in clinical practice, and it is not clear whether their predictive accuracy is better than that of traditional risk factors, which can be collected more easily from routine patient records [39]. In addition, the standards and methods for incorporating genetic predictors into prediction models are still developing [41]. There is no unified standard, and this inconsistency is a major challenge for clinical application. Another challenge is ensuring that genetic predictors are equally applicable to all ethnic groups [42]. Most current genetic variant data come from European populations; thus, GRSs are primarily developed and validated in people of European descent [43].
This usually leads to a decrease in predictive accuracy when they are applied to non-European ancestries [44]. Lastly, it is important to validate the feasibility of genetic predictors in routine clinical practice [41], and we suggest evaluating the clinical impact (e.g., cost-effectiveness) of a CRC genetic model before implementation in the clinical setting [45].

Model Prediction Performance, Validation, and Clinical Impact
CRC prediction models A, B, C, and D had good predictive performance, with AUCs exceeding the commonly used threshold of 0.7. Our models can therefore help identify, among all symptomatic patients, those with a higher probability of CRC. In addition, the calibration plots showed acceptable agreement between the observed and predicted CRC probabilities. Owing to a lack of external data, models A, B, C, and D could not be validated in an external population. There was no statistically significant difference in predictive accuracy between LASSO model A and full model C. It is therefore critical to consider whether an increment in predictive accuracy is worth the additional time and cost of collecting all the predictors. The parsimonious model A used five predictors selected by LASSO, an approach that retains the most influential predictors [46], whereas full model C used all 10 predictors. In this study, the additional time and cost of collecting the larger number of predictors for full model C outweighed the gain in predictive accuracy. It is important to balance model parsimony and accuracy [47]. From a practical perspective, the parsimonious model A is easier to interpret, generalize, and use in practice; model A is therefore preferred over model C in the current study.
Of the 19 previously published risk prediction models, 13 (68.4%) reported AUC values, with a median of 0.85 (range: 0.73 to 0.97), indicating better reported discrimination. With regard to validation, 10 (52.6%) models underwent neither internal nor external validation, five (26.3%) were internally validated, and three (15.8%) were validated in external datasets. One model (5.3%) underwent both internal and external validation. None of the 19 models underwent clinical impact analysis. Although these models perform at a level considered 'clinically acceptable' (C-statistic > 0.7), they have not yet been applied in clinical practice.

Strengths and Limitations
The main strength of this study is that the CRC prediction models were developed with internal validation to mitigate overfitting and optimism. The models incorporated influential genetic and non-genetic predictors to increase prediction performance and were validated as having good calibration and discrimination.
However, the following potential limitations should be considered. (1) This risk prediction modelling study was based on a small sample size and may not be sufficiently representative of the population. Furthermore, because of the small sample size, we did not develop separate risk prediction models for males and females or for different CRC sites. (2) Most CRC cases came from SOCCS (97.81%), and all controls were from LABSS. The different methods of collecting variables in SOCCS (GP e-referrals) and LABSS (questionnaire) could bias the study's results. With GP e-referrals, not all symptoms may have been accurately recorded by GPs. By contrast, in LABSS, patients were asked directly whether they had experienced each symptom of interest, as specified in the questionnaire, and were therefore likely to report a greater number of symptoms. (3) Previous systematic reviews found that biomarkers (e.g., haemoglobin, CEA, qFIT result), lifestyle variables (e.g., vitamin D), and bowel symptoms (e.g., rectal mass, abdominal mass) are associated with CRC risk [48,49]. However, these predictors were not collected in SOCCS and LABSS and could not be used in the developed CRC prediction models. (4) The prediction performance of genetic predictors may vary, depending on the SNPs included (e.g., whether they are high-risk susceptibility variants), the SNP weight estimates from the meta-GWAS dataset, and the computational method used to construct the GRS [39]. We included genome-wide significant CRC SNPs (p < 5 × 10⁻⁸) from the most recently published meta-GWAS [25]. However, 8.43% of the meta-GWAS participants were SOCCS participants, so using their SNP coefficients as external weights could overestimate the predictive performance of our wGRS.
Another limitation is that most current genetic variant data are from European populations, which usually leads to a decrease in predictive accuracy when applied to non-European ancestries [50]. (5) Internal validation cannot address selection bias in recruitment or measurement error, because validation is performed within the study population [51]. (6) The C-statistic, HL goodness-of-fit test, and calibration plots were used to examine model performance (discrimination and calibration), and these metrics have their own limitations. The C-statistic does not have a clear interpretation when assessing the incremental value of adding a new predictor [52]. The HL test may lack statistical power to detect overfitting, is sensitive to sample size, and provides no information on the direction or magnitude of miscalibration [53]. The calibration plot does not provide a quantitative assessment of model calibration [54]. (7) The developed CRC risk prediction models have not been externally validated owing to a lack of data; validation studies with large sample sizes should be considered in the future.

Clinical Implications and Future Research
CRC prediction models can provide disease risk assessment to identify patients at higher risk, while also supporting clinical decision-making about risk-tailored, personalised clinical care [55]. This could ultimately improve patients' health outcomes and the cost-effectiveness of care [38]. Despite these benefits, CRC prediction models remain under-utilized in front-line clinical practice, and their clinical use carries risks and limitations. The first concern is prediction accuracy: inaccurate CRC prediction models might prioritize the wrong patients for further screening, interventions, and clinical treatment [56]. In addition, two studies used interviews/focus groups and surveys to investigate GPs' attitudes towards CRC prediction models and to identify barriers to their clinical use [57,58]. Their findings indicate that clinicians may interpret symptoms inconsistently, which could lead to inaccurate and unreliable CRC risk assessment. Because genetic predictors do not depend on clinicians' interpretation of symptoms, their future application holds significant promise and has the potential to enhance CRC risk prediction.

Conclusions
CRC prediction models were developed with internal validation for personalized cancer risk prediction in patients presenting with symptoms. Integrating genetic information (a wGRS) into classical CRC prediction models could improve prediction performance. This could help identify a subpopulation of symptomatic patients at higher CRC risk owing to genetic susceptibility. These findings merit further investigation through external validation and assessment of clinical impact.

Data Availability Statement:
The data presented in this study are available upon reasonable request to the corresponding author.