Prospects and Pitfalls of Machine Learning in Nutritional Epidemiology
Abstract
1. Introduction
2. Nutritional Epidemiology
2.1. Errors in Measurement Methods
2.2. Nonlinearities
2.3. Confounding
2.4. Missing Data
3. Machine Learning
3.1. Training and Evaluation of Machine Learning Models
3.2. Machine-Learning Techniques
3.3. Neural Networks and Deep Learning
4. Applications and Common Pitfalls
4.1. Health and Dietary Input Data
4.1.1. Increasing the Amount of Data
4.1.2. Improving Data Quality
4.2. Modelling of Dietary Variables
4.2.1. Non-Linearities
4.2.2. Dimensionality Reduction
4.3. ML Approaches to Confounding
5. Practical Recommendations
5.1. Data Preparation
5.2. Data Quality and Quantity
5.3. Avoiding Overfitting
5.4. Dealing with Biased Data
5.5. Performance Metrics
5.6. Skilled Personnel
6. Conclusions
6.1. Critical Points for the Application of ML
- Most studies in the literature are limited to a few models and small datasets, so they do not reveal the real advantages of one method over another. Systematic comparisons and benchmark datasets are therefore needed.
- It is important to make use of the datasets already collected in different studies. This requires an organised system for aggregating the data, together with a regulatory framework that ensures data privacy and trustworthiness. Extensive work is needed to ensure that research projects collect and publish datasets in a well-organised manner and with robust security.
- In addition, technical skills in the use of ML, as well as access to high-performance computing, are needed to produce clear, quantifiable demonstrations of the benefits of ML in nutritional epidemiology research. This can be achieved through collaboration and substantial investment in the training of personnel and in infrastructure.
- When dealing with data with a high signal-to-noise ratio, such as survival or hospital readmission rates, several epidemiologic studies have shown that ML algorithms outperform traditional statistical models (Feng et al. [96], Mortazavi et al. [97]). The situation is reversed for data with a lower signal-to-noise ratio, such as risk prediction of major chronic diseases or depression, where ML models have performed no better than conventional statistical models (Nusinovici et al. [98], Gravesteijn et al. [99]). We expect a similar situation in nutritional epidemiology, although the high correlation between nutritional variables could also weigh in favour of ML models.
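The first and last points above, comparing models on a common footing and the role of the signal-to-noise ratio, can be explored with a small simulation. The sketch below uses only the Python standard library to score a conventional model (one-predictor least squares) and a flexible one (k-nearest-neighbour regression, standing in for an ML model) on the same cross-validation folds while the noise level is varied. The data-generating function, the definition SNR = Var(signal) / Var(noise), the choice k = 15, and all function names are illustrative assumptions, not taken from the cited studies; which model wins depends on these choices, so no particular outcome is implied.

```python
import random
from statistics import mean, pvariance


def simulate(noise_sd, n=300, seed=1):
    """Toy data y = f(x) + noise_sd * z; x and z are identical for every noise_sd."""
    rng = random.Random(seed)
    xs = [rng.uniform(-3, 3) for _ in range(n)]
    zs = [rng.gauss(0, 1) for _ in range(n)]
    signal = [x + 0.5 * x ** 2 for x in xs]  # hypothetical, mildly nonlinear truth
    ys = [s + noise_sd * z for s, z in zip(signal, zs)]
    return xs, ys, signal


def empirical_snr(signal, ys):
    """Signal-to-noise ratio taken here as Var(signal) / Var(noise)."""
    noise = [y - s for y, s in zip(ys, signal)]
    return pvariance(signal) / pvariance(noise)


def fit_linear(xs, ys):
    """Conventional model: one-predictor ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k=15):
    """Flexible model: k-nearest-neighbour regression (mean y of the k closest x)."""
    pairs = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return mean(y for _, y in nearest)

    return predict


def compare_on_shared_folds(xs, ys, models, k=5, seed=0):
    """Score every model on the same k folds; return mean test MSE per model."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # one fixed partition reused by all models
    scores = {name: [] for name in models}
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in idx if i not in held_out]
        for name, fit in models.items():
            predict = fit([xs[i] for i in train], [ys[i] for i in train])
            scores[name].append(mean((ys[i] - predict(xs[i])) ** 2 for i in test_idx))
    return {name: mean(s) for name, s in scores.items()}


if __name__ == "__main__":
    models = {"linear": fit_linear, "knn": fit_knn}
    for sd in (0.5, 2.0, 8.0):
        xs, ys, signal = simulate(sd)
        snr = empirical_snr(signal, ys)
        print(f"noise sd={sd}: SNR={snr:.2f}", compare_on_shared_folds(xs, ys, models))
```

Adding further candidate models only requires new entries in the `models` dictionary. Because every model is scored on identical folds, differences in mean MSE are not driven by different data splits, which is the property a systematic benchmark needs.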
6.2. Limitations of Current Work
6.3. Future Perspectives
Funding
Institutional Review Board Statement
Informed Consent Statement
Acknowledgments
Conflicts of Interest
References
- Satija, A.; Yu, E.; Willett, W.C.; Hu, F.B. Understanding nutritional epidemiology and its role in policy. Adv. Nutr. 2015, 6, 5–18.
- Illner, A.; Freisling, H.; Boeing, H.; Huybrechts, I.; Crispim, S.; Slimani, N. Review and evaluation of innovative technologies for measuring diet in nutritional epidemiology. Int. J. Epidemiol. 2012, 41, 1187–1203.
- Thornton, K.; Villamor, E. Nutritional Epidemiology. In Encyclopedia of Food and Health; Caballero, B., Finglas, P.M., Toldrá, F., Eds.; Academic Press: Oxford, UK, 2016; pp. 104–107.
- Hebert, J.R.; Clemow, L.; Pbert, L.; Ockene, I.S.; Ockene, J.K. Social desirability bias in dietary self-report may compromise the validity of dietary intake measures. Int. J. Epidemiol. 1995, 24, 389–398.
- May, S.; Bigelow, C. Modeling nonlinear dose-response relationships in epidemiologic studies: Statistical approaches and practical challenges. Dose-Response 2005, 3.
- Greenland, S.; Morgenstern, H. Confounding in health research. Annu. Rev. Public Health 2001, 22, 189–212.
- Zeraatkar, D.; Cheung, K.; Milio, K.; Zworth, M.; Gupta, A.; Bhasin, A.; Bartoszko, J.J.; Kiflen, M.; Morassut, R.E.; Noor, S.T.; et al. Methods for the selection of covariates in nutritional epidemiology studies: A meta-epidemiological review. Curr. Dev. Nutr. 2019, 3, nzz104.
- Sangra, R.A.; Codina, A.F. The identification, impact and management of missing values and outlier data in nutritional epidemiology. Nutr. Hosp. 2015, 31, 189–195.
- Ciavatta, S.; Pastres, R.; Lin, Z.; Beck, M.; Badetti, C.; Ferrari, G. Fault detection in a real-time monitoring network for water quality in the lagoon of Venice (Italy). Water Sci. Technol. 2004, 50, 51–58.
- Shanthamallu, U.S.; Spanias, A.; Tepedelenlioglu, C.; Stanley, M. A brief survey of machine learning methods and their sensor and IoT applications. In Proceedings of the 2017 8th International Conference on Information, Intelligence, Systems & Applications (IISA), Larnaca, Cyprus, 27–30 August 2017; pp. 1–8.
- Mahdavinejad, M.S.; Rezvan, M.; Barekatain, M.; Adibi, P.; Barnaghi, P.; Sheth, A.P. Machine learning for Internet of Things data analysis: A survey. Digit. Commun. Netw. 2018, 4, 161–175.
- Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Berlin/Heidelberg, Germany, 2009; Volume 2.
- Caruana, R.; Niculescu-Mizil, A. An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; pp. 161–168.
- Hastie, T.; Tibshirani, R.; Friedman, J. Unsupervised learning. In The Elements of Statistical Learning; Springer: Berlin/Heidelberg, Germany, 2009; pp. 485–585.
- Hassoun, M.H. Fundamentals of Artificial Neural Networks; MIT Press: Cambridge, MA, USA, 1995.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Winkler, D.A.; Le, T.C. Performance of deep and shallow neural networks, the universal approximation theorem, activity cliffs, and QSAR. Mol. Inform. 2017, 36, 1600118.
- Murphy, K.P. Machine Learning: A Probabilistic Perspective; MIT Press: Cambridge, MA, USA, 2012.
- Morgenstern, J.D.; Rosella, L.C.; Costa, A.P.; de Souza, R.J.; Anderson, L.N. Perspective: Big data and machine learning could help advance nutritional epidemiology. Adv. Nutr. 2021, 12, 621–631.
- Phillips, S.M.; Cadmus-Bertram, L.; Rosenberg, D.; Buman, M.P.; Lynch, B.M. Wearable technology and physical activity in chronic disease: Opportunities and challenges. Am. J. Prev. Med. 2018, 54, 144.
- Vu, T.; Lin, F.; Alshurafa, N.; Xu, W. Wearable food intake monitoring technologies: A comprehensive review. Computers 2017, 6, 4.
- Cappon, G.; Acciaroli, G.; Vettoretti, M.; Facchinetti, A.; Sparacino, G. Wearable continuous glucose monitoring sensors: A revolution in diabetes treatment. Electronics 2017, 6, 65.
- Contreras, I.; Vehi, J. Artificial intelligence for diabetes management and decision support: Literature review. J. Med. Internet Res. 2018, 20, e10775.
- Kavakiotis, I.; Tsave, O.; Salifoglou, A.; Maglaveras, N.; Vlahavas, I.; Chouvarda, I. Machine learning and data mining methods in diabetes research. Comput. Struct. Biotechnol. J. 2017, 15, 104–116.
- Limketkai, B.N.; Mauldin, K.; Manitius, N.; Jalilian, L.; Salonen, B.R. The Age of Artificial Intelligence: Use of Digital Technology in Clinical Nutrition. Curr. Surg. Rep. 2021, 9, 20.
- Kao, C.K.; Liebovitz, D.M. Consumer mobile health apps: Current state, barriers, and future directions. PM&R 2017, 9, S106–S115.
- Bandy, L.; Adhikari, V.; Jebb, S.; Rayner, M. The use of commercial food purchase data for public health nutrition research: A systematic review. PLoS ONE 2019, 14, e0210192.
- Kalantarian, H.; Sarrafzadeh, M. Audio-based detection and evaluation of eating behavior using the smartwatch platform. Comput. Biol. Med. 2015, 65, 1–9.
- Shah, N.; Srivastava, G.; Savage, D.W.; Mago, V. Assessing Canadians health activity and nutritional habits through social media. Front. Public Health 2020, 7, 400.
- Gerina, F.; Pes, B.; Reforgiato Recupero, D.; Riboni, D. Toward supporting food journaling using air quality data mining and a social robot. In Proceedings of the European Conference on Ambient Intelligence, Rome, Italy, 13–15 November 2019; pp. 318–323.
- Grimes, D.A. Epidemiologic research using administrative databases: Garbage in, garbage out. Obstet. Gynecol. 2010, 116, 1018–1019.
- Lo, F.P.W.; Sun, Y.; Qiu, J.; Lo, B. Image-based food classification and volume estimation for dietary assessment: A review. IEEE J. Biomed. Health Inform. 2020, 24, 1926–1939.
- Tay, W.; Kaur, B.; Quek, R.; Lim, J.; Henry, C.J. Current developments in digital quantitative volume estimation for the optimisation of dietary assessment. Nutrients 2020, 12, 1167.
- Sahoo, D.; Hao, W.; Ke, S.; Xiongwei, W.; Le, H.; Achananuparp, P.; Lim, E.P.; Hoi, S.C. FoodAI: Food image recognition via deep learning for smart food logging. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2260–2268.
- Lo, F.P.W.; Sun, Y.; Qiu, J.; Lo, B. Food volume estimation based on deep learning view synthesis from a single depth map. Nutrients 2018, 10, 2005.
- Ege, T.; Ando, Y.; Tanno, R.; Shimoda, W.; Yanai, K. Image-based estimation of real food size for accurate food calorie estimation. In Proceedings of the 2019 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), San Jose, CA, USA, 28–30 March 2019; pp. 274–279.
- Puri, M.; Zhu, Z.; Yu, Q.; Divakaran, A.; Sawhney, H. Recognition and volume estimation of food intake using a mobile device. In Proceedings of the 2009 Workshop on Applications of Computer Vision (WACV), Snowbird, UT, USA, 7–8 December 2009; pp. 1–8.
- Zhu, F.; Bosch, M.; Woo, I.; Kim, S.; Boushey, C.J.; Ebert, D.S.; Delp, E.J. The use of mobile devices in aiding dietary assessment and evaluation. IEEE J. Sel. Top. Signal Process. 2010, 4, 756–766.
- Woo, I.; Otsmo, K.; Kim, S.; Ebert, D.S.; Delp, E.J.; Boushey, C.J. Automatic portion estimation and visual refinement in mobile dietary assessment. In Computational Imaging VIII; International Society for Optics and Photonics: Bellingham, WA, USA, 2010; Volume 7533, p. 75330O.
- Jia, W.; Yue, Y.; Fernstrom, J.D.; Yao, N.; Sclabassi, R.J.; Fernstrom, M.H.; Sun, M. Imaged based estimation of food volume using circular referents in dietary assessment. J. Food Eng. 2012, 109, 76–86.
- Min, W.; Wang, Z.; Liu, Y.; Luo, M.; Kang, L.; Wei, X.; Wei, X.; Jiang, S. Large scale visual food recognition. arXiv 2021, arXiv:2103.16107.
- Aguilar, E.; Bolaños, M.; Radeva, P. Regularized uncertainty-based multi-task learning model for food analysis. J. Vis. Commun. Image Represent. 2019, 60, 360–370.
- He, J.; Zhu, F. Online continual learning for visual food classification. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 2337–2346.
- Arpey, N.C.; Gaglioti, A.H.; Rosenbaum, M.E. How socioeconomic status affects patient perceptions of health care: A qualitative study. J. Prim. Care Community Health 2017, 8, 169–175.
- Gianfrancesco, M.A.; Tamang, S.; Yazdany, J.; Schmajuk, G. Potential biases in machine learning algorithms using electronic health record data. JAMA Intern. Med. 2018, 178, 1544–1547.
- Boeing, H. Nutritional epidemiology: New perspectives for understanding the diet-disease relationship? Eur. J. Clin. Nutr. 2013, 67, 424–429.
- Ioannidis, J.P. The challenge of reforming nutritional epidemiologic research. JAMA 2018, 320, 969–970.
- Kong, Y.W.; Baqar, S.; Jerums, G.; Ekinci, E.I. Sodium and its role in cardiovascular disease—The debate continues. Front. Endocrinol. 2016, 7, 164.
- Investigators, S.; Dehghan, M.; Mente, A.; Zhang, X.; Swaminathan, S.; Li, W.; Mohan, V.; Iqbal, R.; Kumar, R.; Wentzel-Viljoen, E.; et al. Associations of fats and carbohydrate intake with cardiovascular disease and mortality in 18 countries from five continents (PURE): A prospective cohort study. Lancet 2017, 390, 2050–2062.
- Bodnar, L.M.; Cartus, A.R.; Kirkpatrick, S.I.; Himes, K.P.; Kennedy, E.H.; Simhan, H.N.; Grobman, W.A.; Duffy, J.Y.; Silver, R.M.; Parry, S.; et al. Machine learning as a strategy to account for dietary synergy: An illustration based on dietary intake and adverse pregnancy outcomes. Am. J. Clin. Nutr. 2020, 111, 1235–1243.
- de Cos Juez, F.J.; Suárez-Suárez, M.; Lasheras, F.S.; Murcia-Mazón, A. Application of neural networks to the study of the influence of diet and lifestyle on the value of bone mineral density in post-menopausal women. Math. Comput. Model. 2011, 54, 1665–1670.
- Zeng, J.; Zhang, J.; Li, Z.; Li, T.; Li, G. Prediction model of artificial neural network for the risk of hyperuricemia incorporating dietary risk factors in a Chinese adult study. Food Nutr. Res. 2020, 64, 3712.
- Chew, E.Y. Age-related Macular Degeneration: Nutrition, Genes and Deep Learning—The LXXVI Edward Jackson Memorial Lecture. Am. J. Ophthalmol. 2020, 217, 335–347.
- Puvanesarajah, S.; Hodge, J.M.; Evans, J.L.; Seo, W.; Yi, M.; Fritz, M.M.; Macheski-Preston, M.; Gansler, T.; Gapstur, S.M.; Gaudet, M.M. Unsupervised deep-learning to identify histopathological features among breast cancers in the Cancer Prevention Study-II Nutrition Cohort. Cancer Res. 2019, 79, 2417.
- Vivot, A.; Grégory, J.; Porcher, R. Application of Basic Epidemiologic Principles and Electronic Health Records in a Deep Learning Prediction Model. JAMA Dermatol. 2020, 156, 472–473.
- Wong, T.Y.; Bressler, N.M. Artificial intelligence with deep learning technology looks into diabetic retinopathy screening. JAMA 2016, 316, 2366–2367.
- Byeon, H. Is Deep Learning Better than Machine Learning to Predict Benign Laryngeal Disorders? Int. J. Adv. Comput. Sci. Appl. 2021, 12, 112–117.
- Xiong, H.; Lin, P.; Yu, J.G.; Ye, J.; Xiao, L.; Tao, Y.; Jiang, Z.; Lin, W.; Liu, M.; Xu, J.; et al. Computer-aided diagnosis of laryngeal cancer via deep learning based on laryngoscopic images. EBioMedicine 2019, 48, 92–99.
- VoPham, T.; Hart, J.E.; Laden, F.; Chiang, Y.Y. Emerging trends in geospatial artificial intelligence (geoAI): Potential applications for environmental epidemiology. Environ. Health 2018, 17, 40.
- Hoffmann, K.; Schulze, M.B.; Schienkiewitz, A.; Nöthlings, U.; Boeing, H. Application of a new statistical method to derive dietary patterns in nutritional epidemiology. Am. J. Epidemiol. 2004, 159, 935–944.
- Zhang, F.; Tapera, T.M.; Gou, J. Application of a new dietary pattern analysis method in nutritional epidemiology. BMC Med. Res. Methodol. 2018, 18, 119.
- Santos, R.d.O.; Gorgulho, B.M.; Castro, M.A.d.; Fisberg, R.M.; Marchioni, D.M.; Baltar, V.T. Principal component analysis and factor analysis: Differences and similarities in nutritional epidemiology application. Rev. Bras. Epidemiol. 2019, 22, e190041.
- Falissard, L.; Fagherazzi, G.; Howard, N.; Falissard, B. Deep clustering of longitudinal data. arXiv 2018, arXiv:1802.03212.
- Wang, Y.; Yao, H.; Zhao, S. Auto-encoder based dimensionality reduction. Neurocomputing 2016, 184, 232–242.
- Kwon, Y.J.; Kim, H.S.; Jung, D.H.; Kim, J.K. Cluster analysis of nutritional factors associated with low muscle mass index in middle-aged and older adults. Clin. Nutr. 2020, 39, 3369–3376.
- Walter, S.; Tiemeier, H. Variable selection: Current practice in epidemiological studies. Eur. J. Epidemiol. 2009, 24, 733–736.
- Altmann, A.; Toloşi, L.; Sander, O.; Lengauer, T. Permutation importance: A corrected feature importance measure. Bioinformatics 2010, 26, 1340–1347.
- Zeevi, D.; Korem, T.; Zmora, N.; Israeli, D.; Rothschild, D.; Weinberger, A.; Ben-Yacov, O.; Lador, D.; Avnit-Sagi, T.; Lotan-Pompan, M.; et al. Personalized nutrition by prediction of glycemic responses. Cell 2015, 163, 1079–1094.
- Dipnall, J.F.; Pasco, J.A.; Berk, M.; Williams, L.J.; Dodd, S.; Jacka, F.N.; Meyer, D. Fusing data mining, machine learning and traditional statistics to detect biomarkers associated with depression. PLoS ONE 2016, 11, e0148195.
- Russo, S.; Li, G.; Villez, K. Automated model selection in principal component analysis: A new approach based on the cross-validated ignorance score. Ind. Eng. Chem. Res. 2019, 58, 13448–13468.
- Trepanowski, J.F.; Ioannidis, J.P. Perspective: Limiting dependence on nonrandomized studies and improving randomized trials in human nutrition research: Why and how. Adv. Nutr. 2018, 9, 367–377.
- Brisk, R.; Bond, R.; Finlay, D.; McLaughlin, J.; Piadlo, A.; Leslie, S.J.; Gossman, D.E.; Menown, I.B.; McEneaney, D.J.; Warren, S. The effect of confounding data features on a deep learning algorithm to predict complete coronary occlusion in a retrospective observational setting. Eur. Heart J.-Digit. Health 2021, 2, 127–134.
- Badgeley, M.A.; Zech, J.R.; Oakden-Rayner, L.; Glicksberg, B.S.; Liu, M.; Gale, W.; McConnell, M.V.; Percha, B.; Snyder, T.M.; Dudley, J.T. Deep learning predicts hip fracture using confounding patient and healthcare variables. NPJ Digit. Med. 2019, 2, 31.
- García, S.; Ramírez-Gallego, S.; Luengo, J.; Benítez, J.M.; Herrera, F. Big data preprocessing: Methods and prospects. Big Data Anal. 2016, 1, 9.
- Kotsiantis, S.B.; Kanellopoulos, D.; Pintelas, P.E. Data preprocessing for supervised leaning. Int. J. Comput. Sci. 2006, 1, 111–117.
- Lakshminarayan, K.; Harp, S.A.; Goldman, R.P.; Samad, T. Imputation of Missing Data Using Machine Learning Techniques. In Proceedings of the KDD, Portland, OR, USA, 2–4 August 1996; Volume 96.
- Richman, M.B.; Trafalis, T.B.; Adrianto, I. Missing data imputation through machine learning algorithms. In Artificial Intelligence Methods in the Environmental Sciences; Springer: Berlin/Heidelberg, Germany, 2009; pp. 153–169.
- Batista, G.E.; Monard, M.C. An analysis of four missing data treatment methods for supervised learning. Appl. Artif. Intell. 2003, 17, 519–533.
- Jerez, J.M.; Molina, I.; García-Laencina, P.J.; Alba, E.; Ribelles, N.; Martín, M.; Franco, L. Missing data imputation using statistical and machine learning methods in a real breast cancer problem. Artif. Intell. Med. 2010, 50, 105–115.
- Al-Milli, N.; Almobaideen, W. Hybrid neural network to impute missing data for IoT applications. In Proceedings of the 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), Amman, Jordan, 9–11 April 2019; pp. 121–125.
- Heaton, J. An empirical analysis of feature engineering for predictive modeling. In Proceedings of the SoutheastCon 2016, Norfolk, VA, USA, 30 March–3 April 2016; pp. 1–6.
- Morgenstern, J.D.; Rosella, L.C.; Costa, A.P.; Anderson, L.N. Development of Machine Learning Prediction Models to Explore Nutrients Predictive of Cardiovascular Disease Using Canadian Linked Population-Based Data. Appl. Physiol. Nutr. Metab. 2022.
- Russo, S.; Besmer, M.D.; Blumensaat, F.; Bouffard, D.; Disch, A.; Hammes, F.; Hess, A.; Lürig, M.; Matthews, B.; Minaudo, C.; et al. The value of human data annotation for machine learning based anomaly detection in environmental systems. Water Res. 2021, 206, 117695.
- Sheng, V.S.; Provost, F.; Ipeirotis, P.G. Get another label? Improving data quality and data mining using multiple, noisy labelers. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008; pp. 614–622.
- Gudivada, V.; Apon, A.; Ding, J. Data quality considerations for big data and machine learning: Going beyond data cleaning and transformations. Int. J. Adv. Softw. 2017, 10, 1–20.
- Wang, Q.; Ma, Y.; Zhao, K.; Tian, Y. A comprehensive survey of loss functions in machine learning. Ann. Data Sci. 2020, 9, 187–212.
- Tran, G.S.; Nghiem, T.P.; Nguyen, V.T.; Luong, C.M.; Burie, J.C. Improving accuracy of lung nodule classification using deep learning with focal loss. J. Healthc. Eng. 2019, 2019, 5156416.
- Rodriguez, J.D.; Perez, A.; Lozano, J.A. Sensitivity analysis of k-fold cross validation in prediction error estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 569–575.
- Holzinger, A.; Biemann, C.; Pattichis, C.S.; Kell, D.B. What do we need to build explainable AI systems for the medical domain? arXiv 2017, arXiv:1712.09923.
- Gunning, D.; Stefik, M.; Choi, J.; Miller, T.; Stumpf, S.; Yang, G.Z. XAI—Explainable artificial intelligence. Sci. Robot. 2019, 4, eaay7120.
- Kendall, A.; Gal, Y. What uncertainties do we need in bayesian deep learning for computer vision? In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Batterham, M.; Neale, E.; Martin, A.; Tapsell, L. Data mining: Potential applications in research on nutrition and health. Nutr. Diet. 2017, 74, 3–10.
- Davis, J.; Goadrich, M. The relationship between Precision-Recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; pp. 233–240.
- Schelter, S.; Biessmann, F.; Januschowski, T.; Salinas, D.; Seufert, S.; Szarvas, G. On Challenges in Machine Learning Model Management. 2018. Available online: http://sites.computer.org/debull/A18dec/p5.pdf (accessed on 17 March 2022).
- Diebolt, V.; Azancot, I.; Boissel, F.H.; Adenot, I.; Balague, C.; Barthelemy, P.; Boubenna, N.; Coulonjou, H.; Fernandez, X.; Habran, E.; et al. “Artificial intelligence”: Which services, which applications, which results and which development today in clinical research? Which impact on the quality of care? Which recommendations? Therapies 2019, 74, 155–164.
- Feng, J.Z.; Wang, Y.; Peng, J.; Sun, M.W.; Zeng, J.; Jiang, H. Comparison between logistic regression and machine learning algorithms on survival prediction of traumatic brain injuries. J. Crit. Care 2019, 54, 110–116.
- Mortazavi, B.J.; Downing, N.S.; Bucholz, E.M.; Dharmarajan, K.; Manhapra, A.; Li, S.X.; Negahban, S.N.; Krumholz, H.M. Analysis of machine learning techniques for heart failure readmissions. Circ. Cardiovasc. Qual. Outcomes 2016, 9, 629–640.
- Nusinovici, S.; Tham, Y.C.; Yan, M.Y.C.; Ting, D.S.W.; Li, J.; Sabanayagam, C.; Wong, T.Y.; Cheng, C.Y. Logistic regression was as good as machine learning for predicting major chronic diseases. J. Clin. Epidemiol. 2020, 122, 56–69.
- Gravesteijn, B.Y.; Nieboer, D.; Ercole, A.; Lingsma, H.F.; Nelson, D.; Van Calster, B.; Steyerberg, E.W.; Åkerlund, C.; Amrein, K.; Andelic, N.; et al. Machine learning algorithms performed no better than regression models for prognostication in traumatic brain injury. J. Clin. Epidemiol. 2020, 122, 95–107.
- Rosso, N.; Giabbanelli, P. Accurately inferring compliance to five major food guidelines through simplified surveys: Applying data mining to the UK National Diet and Nutrition Survey. JMIR Public Health Surveill. 2018, 4, e9536.
- Riboli, E.; Hunt, K.; Slimani, N.; Ferrari, P.; Norat, T.; Fahey, M.; Charrondiere, U.; Hemon, B.; Casagrande, C.; Vignat, J.; et al. European Prospective Investigation into Cancer and Nutrition (EPIC): Study populations and data collection. Public Health Nutr. 2002, 5, 1113–1124.
- Sak, J.; Suchodolska, M. Artificial Intelligence in Nutrients Science Research: A Review. Nutrients 2021, 13, 322.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016.
Learning Type | Technique | Models |
---|---|---|
Supervised | Classification | Random Forest, Naïve Bayes, Support Vector Machine, k-Nearest Neighbour, ANN |
Supervised | Regression | Linear Regression, Logistic Regression, Random Forest, ANN |
Unsupervised | Feature extraction | PCA, Deep Autoencoders, Manifold Learning |
Unsupervised | Clustering | Gaussian Mixture Models, k-Means, Deep Neural Networks |
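Clustering methods such as k-means, listed in the table above, are used in nutritional epidemiology to group participants by dietary profile. The following minimal sketch implements the standard Lloyd iteration (assign each profile to its nearest centroid, then move each centroid to the mean of its members) in plain Python. The two-dimensional intake profiles, the initialisation from randomly sampled data points, and the function name `kmeans` are illustrative assumptions; a real analysis would standardise many intake variables and choose k with a validation criterion.

```python
import random
from math import dist


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct starting points
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical two-nutrient intake profiles (arbitrary units), one row per participant
    profiles = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (20.0, 20.0), (20.0, 21.0), (21.0, 20.0)]
    labels, centroids = kmeans(profiles, k=2)
    print(labels, centroids)
```

Because Lloyd's algorithm only reaches a local optimum, the result can depend on the starting centroids; running it from several seeds and keeping the solution with the lowest within-cluster sum of squares is a common safeguard.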
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Russo, S.; Bonassi, S. Prospects and Pitfalls of Machine Learning in Nutritional Epidemiology. Nutrients 2022, 14, 1705. https://doi.org/10.3390/nu14091705