How Is the Lung Cancer Incidence Rate Associated with Environmental Risks? Machine-Learning-Based Modeling and Benchmarking
Abstract
1. Introduction
2. Materials and Method
2.1. Data Source
2.2. Variables
2.3. Method
2.4. Feature Selection
2.5. Evaluation Criteria
3. Results and Discussion
3.1. Key Features of Lung Cancer Incidence
3.2. Benchmarking of Machine Learning Algorithms
3.3. Discussion
4. Conclusions
4.1. Summary
4.2. Limitation
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
References
| Factor | Variable (Notation) | Description | Data Type |
|---|---|---|---|
| Air pollution | CO | Average CO concentration (ppm) | Continuous |
| | NO2 | Average NO2 concentration (ppb) | Continuous |
| | SO2 | Average SO2 concentration (ppb) | Continuous |
| | O3 | Average O3 concentration (ppb) | Continuous |
| | PM10 | Average PM10 concentration (μg/m³) | Continuous |
| | VEHICLES | Total number of registered vehicles, including buses, heavy trucks, sedans, light trucks, specially constructed vehicles, and motorcycles | Discrete |
| | FACTORIES | Total number of factories | Discrete |
| Tobacco use | TOBACCO | Consumption of tobacco per capita aged 18 and over (pieces/year) | Discrete |
| | SMOKERS | Percentage of smokers from population aged 18 and over | Continuous |
| Socioeconomic status | LI | Percentage of low-income persons from total population | Continuous |
| Employment status | EMPLOYED | Percentage of employed from civilian population aged 15 and over | Continuous |
| | UNEMPLOYMENT | Total unemployment rate | Continuous |
| Marital status | DIVORCE | Divorce status of population aged 15 and over | Continuous |
| Living environment | ONE | Number of households living in one-story buildings | Continuous |
| | APARTMENTS | Number of households living in apartments six stories or over | Continuous |
| | PSI | Percentage of days measured with PSI > 100 | Continuous |
| | SANITARY | Percentage of public sanitary sewer availability | Continuous |
| | POLLUTED | Percentage of heavily polluted sections in the total length of major rivers | Continuous |
| | UNQDRINK | Percentage of unqualified drinking water as tested | Continuous |
| | DISPOSAL | Percentage of proper refuse disposal | Continuous |
| Dependent variable | | Trachea, bronchus, and lung cancer (C33–C34) incidence rates per 100,000 in Taiwan | Continuous |
| Factor | Predictor Variable | Stepwise Regression | Feature Selection Based on the VIF Value |
|---|---|---|---|
| Air pollution | CO | | |
| | NO2 | | |
| | SO2 | | |
| | O3 | | |
| | PM10 | | |
| | VEHICLES | | |
| | FACTORIES | | |
| Tobacco use | TOBACCO | | |
| | SMOKERS | | |
| Socioeconomic status | LI | | |
| Employment status | EMPLOYED | | |
| | UNEMPLOYMENT | | |
| Marital status | DIVORCE | | |
| Living environment | ONE | | |
| | APARTMENTS | | |
| | PSI | | |
| | SANITARY | | |
| | POLLUTED | | |
| | UNQDRINK | | |
| | DISPOSAL | | |
| Total number of variables | | 15 | 8 |
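The last column of the table refers to feature selection based on the variance inflation factor (VIF). As a rough illustration only, the sketch below computes each predictor's VIF as the corresponding diagonal entry of the inverse correlation matrix and drops the worst predictor until every VIF falls below a cutoff. The cutoff of 10, the function names, and the toy columns `x1`, `x2`, `x3` are assumptions made for this example; they are not taken from the article, and the authors' actual procedure may differ.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    z = []
    for col in columns:
        m, s = mean(col), pstdev(col)  # a constant column (s == 0) is not supported
        z.append([(x - m) / s for x in col])
    k = len(columns)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
            for i in range(k)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


def select_by_vif(data, threshold=10.0):
    """Drop the predictor with the largest VIF until all VIFs are <= threshold.

    The threshold of 10 is a common rule of thumb, assumed here for illustration.
    """
    kept = dict(data)
    while len(kept) > 1:
        names = list(kept)
        scores = vif([kept[name] for name in names])
        worst = max(range(len(names)), key=scores.__getitem__)
        if scores[worst] <= threshold:
            break
        del kept[names[worst]]
    return list(kept)


# Synthetic toy data (not from the study): x2 is almost exactly 2 * x1.
toy = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1, 14.0, 16.1],
    "x3": [5, 3, 8, 1, 9, 2, 7, 4],
}
selected = select_by_vif(toy)
print(selected)
```

On this toy input, one of the two nearly collinear columns is removed and `x3` is retained, which is the behavior a VIF filter is meant to produce.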
| Algorithm | Fold | RMSE (without feature selection) | R-Squared (without feature selection) | RMSE (with feature selection) | R-Squared (with feature selection) |
|---|---|---|---|---|---|
| Linear regression | 1 | 17.612 | 0.632 | 22.122 | 0.682 |
| | 2 | 2.341 | 0.980 | 5.279 | 0.875 |
| | 3 | 134.232 | 0.532 | 24.519 | 0.827 |
| | 4 | 13.419 | 0.080 | 6.846 | 0.960 |
| | 5 | 4.911 | 0.849 | 10.789 | 0.374 |
| | Average | 34.503 | 0.615 | 13.911 | 0.743 |
| Support vector regression | 1 | 2.144 | 0.971 | 1.617 | 0.994 |
| | 2 | 3.712 | 0.978 | 5.296 | 0.919 |
| | 3 | 2.447 | 0.996 | 5.223 | 0.941 |
| | 4 | 4.055 | 0.922 | 4.244 | 0.984 |
| | 5 | 9.489 | 0.173 | 9.758 | 0.182 |
| | Average | 4.369 | 0.808 | 5.228 | 0.804 |
| Random forest | 1 | 5.402 | 0.853 | 4.532 | 0.885 |
| | 2 | 4.599 | 0.905 | 5.067 | 0.895 |
| | 3 | 1.732 | 0.969 | 2.448 | 0.935 |
| | 4 | 5.086 | 0.897 | 4.996 | 0.885 |
| | 5 | 7.365 | 0.853 | 7.570 | 0.868 |
| | Average | 4.837 | 0.895 | 4.922 | 0.894 |
| K-nearest neighbor | 1 | 2.562 | 0.946 | 7.215 | 0.974 |
| | 2 | 6.008 | 0.749 | 6.008 | 0.842 |
| | 3 | 3.925 | 0.875 | 3.516 | 0.923 |
| | 4 | 4.282 | 0.913 | 6.862 | 0.669 |
| | 5 | 10.792 | 0.590 | 6.393 | 0.660 |
| | Average | 5.514 | 0.814 | 5.999 | 0.814 |
| Cubist model tree | 1 | 5.817 | 0.831 | 6.524 | 0.853 |
| | 2 | 3.508 | 0.910 | 2.712 | 0.971 |
| | 3 | 5.615 | 0.869 | 2.607 | 0.988 |
| | 4 | 7.451 | 0.550 | 2.897 | 0.998 |
| | 5 | 2.007 | 0.987 | 1.808 | 0.990 |
| | Average | 4.880 | 0.829 | 3.310 | 0.960 |
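The benchmark table reports RMSE and R-squared per fold and then averages them over the five folds. The sketch below shows one way such numbers can be computed and aggregated. The `rmse` and `r_squared` helpers use the textbook definitions (R-squared as 1 - SS_res / SS_tot); which R-squared definition the authors used is not stated here, so that choice is an assumption. The per-fold RMSE values are copied from the "with feature selection" column of the table.

```python
import math
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    return math.sqrt(mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Per-fold RMSE with feature selection, copied from the table.
fold_rmse = {
    "Linear regression": [22.122, 5.279, 24.519, 6.846, 10.789],
    "Support vector regression": [1.617, 5.296, 5.223, 4.244, 9.758],
    "Random forest": [4.532, 5.067, 2.448, 4.996, 7.570],
    "K-nearest neighbor": [7.215, 6.008, 3.516, 6.862, 6.393],
    "Cubist model tree": [6.524, 2.712, 2.607, 2.897, 1.808],
}

# Average over the five folds, rounded to three decimals as in the table.
average_rmse = {name: round(mean(values), 3) for name, values in fold_rmse.items()}

# Lower RMSE is better, so the best algorithm minimizes the average.
best = min(average_rmse, key=average_rmse.get)

for name, value in average_rmse.items():
    print(f"{name}: {value:.3f}")
print("Lowest average RMSE:", best)
```

Because the per-fold values in the table are already rounded, a recomputed average can differ from the printed one in the last digit.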
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wang, K.-M.; Chen, K.-H.; Hernanda, C.A.; Tseng, S.-H.; Wang, K.-J. How Is the Lung Cancer Incidence Rate Associated with Environmental Risks? Machine-Learning-Based Modeling and Benchmarking. Int. J. Environ. Res. Public Health 2022, 19, 8445. https://doi.org/10.3390/ijerph19148445