Mining Early Life Risk and Resiliency Factors and Their Influences in Human Populations from PubMed: A Machine Learning Approach to Discover DOHaD Evidence
Abstract
1. Introduction
2. Methods
2.1. Part 1: Text-Mining Model 1
- A corpus (a body of text; in this case, the titles and abstracts of the articles);
- A dictionary that maps each word to a unique numeric ID;
- The number of topics to be extracted from the documents.
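The three inputs listed above can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the toy documents, the whitespace tokenizer, the helper names `build_dictionary` and `to_bag_of_words`, and the value of `num_topics` are all assumptions made for illustration. It only shows what a word-to-ID dictionary and a bag-of-words corpus look like before they are handed to a topic model.

```python
from collections import Counter


def build_dictionary(tokenized_docs):
    """Map each distinct word to a unique integer ID, in first-seen order."""
    word_to_id = {}
    for doc in tokenized_docs:
        for word in doc:
            if word not in word_to_id:
                word_to_id[word] = len(word_to_id)
    return word_to_id


def to_bag_of_words(doc, word_to_id):
    """Represent one document as sorted (word_id, count) pairs."""
    counts = Counter(word_to_id[w] for w in doc if w in word_to_id)
    return sorted(counts.items())


# Hypothetical toy documents standing in for article titles and abstracts.
raw_docs = [
    "Maternal diet and fetal growth",
    "Prenatal stress and child development",
    "Maternal stress and fetal development",
]

# Assumption: lowercase and split on whitespace; a real pipeline would
# likely also remove stop words and punctuation.
tokenized = [doc.lower().split() for doc in raw_docs]

dictionary = build_dictionary(tokenized)                           # input 2
corpus = [to_bag_of_words(doc, dictionary) for doc in tokenized]   # input 1
num_topics = 2                                                     # input 3 (illustrative)

for bow in corpus:
    print(bow)
```

In a library such as Gensim, the equivalent objects would typically come from `corpora.Dictionary` and its `doc2bow` method, and would then be passed, together with the topic count, to a model such as `LdaModel`; whether the study used exactly these calls is not stated in this section.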
2.2. Part 2: Manual Labelling Text-Mining Model 1 Results
2.3. Part 3: Manual Assessment of Model 1 Accuracy–Classifying Model 1 Articles
2.4. Part 4: Text-Mining Model 2
2.5. Part 5: PyLDAvis Visualization
2.6. Part 6: Consensus Evaluation of Model 2
3. Results
3.1. Determining Topic Number from Text-Mining Model 1 to Best Capture DOHaD Domains
3.2. Labelling Topics from Text-Mining Model 1
3.3. Determining Types and Relevance to DOHaD of Articles Scraped in Model 1
3.4. Topic Modelling of Text-Mining Model 2
3.5. Consensus Evaluation of Model 2
4. Discussion
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest

| Manual Labelling Results: Scraping Tool Articles | Number of Articles |
|---|---|
| Articles Scraped from Code That Were Found in PubMed Search of Rapid Evidence Review (RER) Articles (Out of a Sample of 100) | 75 |
| Animal Studies | 392 |
| Reviews/Lectures/Commentaries/Protocols | 824 |
| Systematic Reviews and Meta-Analyses | 44 |
| Not a Primary Study Article | 72 |
| No Access/Information | 76 |
| Different Language | 52 |
| Not Related to DOHaD | 255 |
| Included Articles (Related to DOHaD According to Abstract Judgement) | 848 |

| Topic Number | Suggested Label |
|---|---|
| 1 | Brain/Neuro Dysfunction |
| 2 | Health Research |
| 3 | Parent-Child Relationship |
| 4 | Global/Rural Socioeconomic Status |
| 5 | Auto-Immune Disorders (Allergies, Asthma, Wheeze, Eczema, etc.) |
| 6 | Metabolic Programming (Cardiovascular, Hypertension, Programming, etc.) |
| 7 | Blood/Hormone Levels (Serum Properties) |
| 8 | Infection/Inflammation (Cellular Level) |
| 9 | Pathogens |
| 10 | Hepatogenic Secretions |
| 11 | None |
| 12 | Respiratory Disorders |
| 13 | Fatty Acids/Diet |
| 14 | Metabolic/Supplementation |
| 15 | Diabetes |
| 16 | Hospitalization |
| 17 | Development (Mutation, Underdeveloped) |
| 18 | Immunology (Infection, etc.) |
| 19 | Time Periods (Prenatal, Neonatal, Birth) (Possibly as an Exposure) |
| 20 | Steroids |
| 21 | Fetal Development |
| 22 | Drug/Toxicant Exposure |
| 23 | Cardiovascular Dysfunction |
| 24 | Stress-Related |
| 25 | None |
| 26 | Microbiome-Related |
| 27 | Alcohol |
| 28 | Toxicants |
| 29 | Environmental |
| 30 | Childhood Disorders |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tewari, S.; Toledo Margalef, P.; Kareem, A.; Abdul-Hussein, A.; White, M.; Wazana, A.; Davidge, S.T.; Delrieux, C.; Connor, K.L. Mining Early Life Risk and Resiliency Factors and Their Influences in Human Populations from PubMed: A Machine Learning Approach to Discover DOHaD Evidence. J. Pers. Med. 2021, 11, 1064. https://doi.org/10.3390/jpm11111064

