Machine Learning for Risk Prediction of Oesophago-Gastric Cancer in Primary Care: Comparison with Existing Risk-Assessment Tools
Simple Summary
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Overview
2.2. Ethical Approval
2.3. Feature Selection
2.4. Predictive Models
2.5. Evaluation
3. Results
3.1. Model Performance
3.2. Feature Contribution Estimates
4. Discussion
Limitations
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Patient Characteristic | Total Cohort Count (%) N = 40,348 | Case Count (%) N = 7471 (18.5%) | Control Count (%) N = 32,877 (81.5%) |
---|---|---|---|
Age | |||
Under 55 | 2550 (6.3) | 514 (6.9) | 2036 (6.2) |
Over 55 | 37,798 (93.7) | 6957 (93.1) | 30,841 (93.8) |
Sex | |||
Male | 14,860 (36.8) | 2672 (35.8) | 12,188 (37.1) |
Female | 25,488 (63.2) | 4799 (64.2) | 20,689 (62.9) |
Cancer Site | |||
Oesophagus | 26,360 (65.3) | 4854 (65.0) | 21,506 (65.4) |
Stomach | 13,988 (34.7) | 2617 (35.0) | 11,371 (34.6) |
Symptoms | |||
Abdominal Pain | 2215 (5.5) | 905 (12.1) | 1310 (4.0) |
Chest Pain | 2316 (5.7) | 727 (9.7) | 1589 (4.8) |
Constipation | 1681 (4.2) | 608 (8.1) | 1073 (3.3) |
Cough | 4782 (11.9) | 1005 (13.4) | 3777 (11.4) |
Dyspepsia | 2085 (5.2) | 1294 (17.3) | 764 (2.3) |
Dyspepsia (repeat) | 699 (1.7) | 532 (7.1) | 167 (0.5) |
Dysphagia | 2605 (6.5) | 2420 (32.3) | 185 (0.6) |
Dysphagia (repeat) | 678 (1.7) | 635 (8.5) | 43 (0.1) |
Epigastric Pain | 883 (2.2) | 617 (8.3) | 266 (0.8) |
Fatigue | 1362 (3.4) | 388 (5.2) | 974 (3.0) |
Nausea/Vomiting | 1616 (4.0) | 979 (13.1) | 637 (1.9) |
Nausea/Vomiting (repeat) | 534 (1.3) | 386 (5.2) | 148 (0.5) |
Reflux | 1355 (3.4) | 842 (11.3) | 513 (1.6) |
Shortness of breath | 2621 (6.5) | 629 (8.4) | 1992 (6.1) |
Weight loss | 891 (2.2) | 615 (8.2) | 276 (0.8) |
Lab Test Results | |||
Cholesterol (high) | 6100 (15.1) | 920 (12.3) | 5180 (15.8) |
Haemoglobin (low) | 5398 (13.3) | 2045 (27.3) | 3353 (10.2) |
Inflammatory Markers (high) | 2431 (6.0) | 1010 (13.5) | 1421 (4.3) |
Liver Function Test (high) | 4751 (11.8) | 1272 (17.0) | 3479 (10.6) |
Mean Corpuscular Volume (low) | 1007 (2.5) | 640 (8.6) | 367 (1.1) |
Platelet Count (high) | 1274 (3.2) | 706 (9.4) | 568 (1.7) |
White Cell Count (high) | 1533 (3.8) | 671 (9.0) | 862 (2.6) |

Classifier | AUROC | Classification Threshold | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|---|---|
Linear Support Vector Machine | 0.87 | 0.500 | 0.89 | 0.85 | 0.53 | 0.65 |
Linear Support Vector Machine | 0.87 | 0.800 | 0.88 | 0.90 | 0.41 | 0.57 |
Logistic Regression | 0.87 | 0.425 | 0.89 | 0.81 | 0.58 | 0.68 |
Logistic Regression | 0.87 | 0.750 | 0.88 | 0.90 | 0.44 | 0.59 |
Random Forest | 0.86 | 0.600 | 0.89 | 0.86 | 0.48 | 0.62 |
Random Forest | 0.86 | 0.700 | 0.88 | 0.92 | 0.39 | 0.55 |
Bernoulli Naïve Bayes | 0.86 | 0.700 | 0.89 | 0.80 | 0.55 | 0.65 |
Bernoulli Naïve Bayes | 0.86 | 0.900 | 0.86 | 0.84 | 0.34 | 0.49 |
XGBoost | 0.87 | 0.500 | 0.89 | 0.85 | 0.54 | 0.66 |
XGBoost | 0.87 | 0.800 | 0.88 | 0.91 | 0.39 | 0.55 |
ogRAT | 0.81 | 0.010 | 0.87 | 0.86 | 0.41 | 0.56 |
ogRAT | 0.81 | 0.020 | 0.87 | 0.91 | 0.33 | 0.49 |

Classifier | AUROC | Classification Threshold | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|---|---|
Linear Support Vector Machine | 0.87 | 0.56 | 0.89 | 0.86 | 0.52 | 0.65 |
Linear Support Vector Machine | 0.87 | 0.82 | 0.88 | 0.91 | 0.41 | 0.57 |
Logistic Regression | 0.87 | 0.57 | 0.89 | 0.86 | 0.51 | 0.64 |
Logistic Regression | 0.87 | 0.82 | 0.88 | 0.91 | 0.41 | 0.57 |
ogRAT | 0.81 | 0.01 | 0.87 | 0.86 | 0.41 | 0.56 |
ogRAT | 0.81 | 0.02 | 0.87 | 0.91 | 0.33 | 0.49 |
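
The F1 column in the table above is, by the standard definition, the harmonic mean of precision and recall. As a minimal sketch, the Python below recomputes F1 from the rounded precision and recall values in that table and compares it with the reported figure. The function names and the `ROWS` list are illustrative and not taken from the article, and because the inputs are already rounded to two decimal places, small differences from the reported F1 are expected.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (classifier, threshold, precision, recall, reported F1), copied from the table above.
ROWS = [
    ("Linear Support Vector Machine", 0.56, 0.86, 0.52, 0.65),
    ("Linear Support Vector Machine", 0.82, 0.91, 0.41, 0.57),
    ("Logistic Regression", 0.57, 0.86, 0.51, 0.64),
    ("Logistic Regression", 0.82, 0.91, 0.41, 0.57),
    ("ogRAT", 0.01, 0.86, 0.41, 0.56),
    ("ogRAT", 0.02, 0.91, 0.33, 0.49),
]


def max_discrepancy(rows) -> float:
    """Largest absolute gap between recomputed and reported F1."""
    return max(abs(f1_score(p, r) - reported) for _, _, p, r, reported in rows)


if __name__ == "__main__":
    for name, threshold, p, r, reported in ROWS:
        print(f"{name} @ {threshold}: recomputed {f1_score(p, r):.3f}, reported {reported:.2f}")
    print(f"largest gap: {max_discrepancy(ROWS):.4f}")
```

Every recomputed value falls within 0.01 of the reported F1, which is consistent with rounding of the inputs rather than a different definition of the metric.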
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Briggs, E.; de Kamps, M.; Hamilton, W.; Johnson, O.; McInerney, C.D.; Neal, R.D. Machine Learning for Risk Prediction of Oesophago-Gastric Cancer in Primary Care: Comparison with Existing Risk-Assessment Tools. Cancers 2022, 14, 5023. https://doi.org/10.3390/cancers14205023