Accelerating Disease Model Parameter Extraction: An LLM-Based Ranking Approach to Select Initial Studies for Literature Review Automation
Abstract
1. Introduction
1.1. Contribution
- Validating the use of a generative LLM for relevancy ranking of primary studies by utilising a QA framework in the area of climate-sensitive zoonotic diseases.
- Evaluating how well the solution can generalise across the climate-sensitive zoonotic disease literature.
- Evaluating the utility of the LLM-generated reasoning text for human reviewers to enhance transparency and trust.
1.1.1. Problem Definition
1.1.2. Background
1.2. Research Questions
- RQ1
- How does an LLM-based assessor utilising a QA framework compare to baseline models utilising the review title and selection criteria?
- RQ2
- Does label granularity affect the ranking performance of an LLM-based assessor utilising a QA framework for climate-sensitive zoonotic diseases?
- RQ3
- Does the ranking performance of an LLM-based assessor generalise across climate-sensitive zoonotic disease datasets with varying relevance rates?
- RQ4
- Does the CoT rationale provided by an LLM improve a human reviewer’s ability to detect misclassifications in an SLR?
2. Methodology
2.1. Dataset
2.2. QA Framework
2.3. Prompts
Listing 1. TSC prompt, based on the review title and selection criteria.
2.4. Answer Labels
2.5. Relevancy Ranking
- Let $Q = \{q_1, \ldots, q_n\}$ be the eligibility questions derived from the selection criteria.
- Let $A_d = \{a_{d,1}, \ldots, a_{d,n}\}$ be the set of predicted answers for each question in document $d$.
- Let $w_i$ be a predefined weight reflecting the importance of each question $q_i$.
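As a minimal sketch of how these three ingredients could combine into a ranking, the Python below scores each document by a weighted mean of its per-question answer scores and sorts in descending order. The weighted-mean aggregation and the names `relevancy_score` and `rank_documents` are assumptions made for illustration; the exact scoring formula is not reproduced in this section.

```python
from typing import Sequence


def relevancy_score(answer_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of per-question answer scores (assumed aggregation)."""
    if len(answer_scores) != len(weights):
        raise ValueError("one weight is required per eligibility question")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, answer_scores)) / total_weight


def rank_documents(
    doc_scores: dict[str, Sequence[float]], weights: Sequence[float]
) -> list[tuple[str, float]]:
    """Return (document id, score) pairs, most relevant first."""
    scored = [(doc_id, relevancy_score(s, weights)) for doc_id, s in doc_scores.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical answer scores for four eligibility questions per abstract.
    docs = {"abstract_a": [1.0, 0.5, 0.0, 1.0], "abstract_b": [1.0, 1.0, 1.0, 1.0]}
    print(rank_documents(docs, weights=[1.0, 1.0, 1.0, 1.0]))
```

With equal weights this reduces to a plain average; unequal weights let one eligibility question dominate the ordering.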
2.6. Models
2.6.1. Baseline Models
2.6.2. Large Language Models
2.7. Evaluation Metrics
- Recall @ k%
- nWSS @ r%
- AP
- MAP
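The sketch below shows the standard information-retrieval definitions of three of these metrics (Recall @ k%, AP and MAP) applied to a ranked list of binary relevance labels. It is an illustration under stated assumptions rather than the evaluation code used in the study: the function names are hypothetical, the top-k% cutoff is rounded up, and nWSS is left out because its exact formulation is not given in this section.

```python
import math
from typing import Sequence


def recall_at_k_percent(ranked_labels: Sequence[int], k_percent: float) -> float:
    """Share of all relevant documents found in the top k% of the ranking."""
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        return 0.0
    cutoff = math.ceil(len(ranked_labels) * k_percent / 100)  # assumed rounding
    return sum(ranked_labels[:cutoff]) / total_relevant


def average_precision(ranked_labels: Sequence[int]) -> float:
    """Mean of the precision values taken at the rank of each relevant document."""
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of AP over several rankings (e.g. one per disease)."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


if __name__ == "__main__":
    # 1 = relevant, 0 = irrelevant, listed in ranked order (toy data).
    ranking = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    print(recall_at_k_percent(ranking, 10), average_precision(ranking))
```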
2.8. Experimental Setup
3. Results
3.1. RQ1. LLM-Based QA Assessor vs. Baseline
3.2. RQ2. Effect of Label Granularity
3.3. RQ3. Performance Across Zoonotic Diseases
3.4. RQ4. Utility of Generated CoT Rationale
Listing 2. Example JSON response for an RVF abstract.
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
MDPI | Multidisciplinary Digital Publishing Institute |
LLM | Large language model |
QA | Question and answer |
CCHF | Crimean–Congo haemorrhagic fever |
RVF | Rift Valley fever |
Appendix A. Additional Prompts and Disease Information
Listing A1. QA single-level prompt, based on QA framework.
Listing A2. QA multi-level prompt, based on QA framework and supporting an answer and a confidence level.
Inclusion Criteria | Exclusion Criteria |
---|---|
|
|
Disease | Climate Variable | Relevant | Irrelevant | Total | % Relevant |
---|---|---|---|---|---|
CCHF | Rainfall | 57 | 397 | 454 | 12.6% |
Ebola | Rainfall | 14 | 901 | 915 | 1.5% |
Lepto | Rainfall | 108 | 891 | 999 | 10.8% |
RVF | Rainfall | 63 | 474 | 537 | 11.7% |
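The "% Relevant" column follows directly from the counts in the table above: relevant divided by total, expressed as a percentage. The short Python check below recomputes it; the dictionary layout and the function name `relevance_rate` are illustrative choices, while the counts are copied from the table.

```python
# (relevant, irrelevant) counts per disease, copied from the dataset table.
DATASETS = {
    "CCHF": (57, 397),
    "Ebola": (14, 901),
    "Lepto": (108, 891),
    "RVF": (63, 474),
}


def relevance_rate(relevant: int, irrelevant: int) -> float:
    """Percentage of documents in a dataset that are labelled relevant."""
    return 100.0 * relevant / (relevant + irrelevant)


if __name__ == "__main__":
    for disease, (relevant, irrelevant) in DATASETS.items():
        total = relevant + irrelevant
        print(f"{disease}: {relevant}/{total} = {relevance_rate(relevant, irrelevant):.1f}%")
```

The Ebola dataset stands out with roughly one relevant abstract in every 65, against about one in eight or nine for the other three diseases.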
Disease | Topic and Eligibility Questions |
---|---|
CCHF | Topic: Impact of Climate Change on CCHF: A Focus on Rainfall Eligibility Questions: Q1. Does the study report on primary research or a meta-analysis rather than a review, opinion, or book? Q2. Does the study measure the incidence or prevalence or virulence or survival or transmission of Crimean-Congo haemorrhagic fever or a relevant vector (such as ticks) without specifically measuring the incidence of the pathogens? Q3. Does the research examine environmental factors such as rainfall, seasonality (e.g., wet vs. dry season) or regional comparisons impacting disease prevalence or vector distribution? Q4. Is the study focused on field-based or epidemiological research rather than laboratory method validation? |
Ebola | Topic: Impact of Climate Change on Ebola: A Focus on Rainfall Eligibility Questions: Q1. Does the study report on primary research or a meta-analysis rather than a review, opinion, or book? Q2. Does the study measure the incidence or prevalence or virulence or survival or transmission of Ebola or Marburg, a relevant vector, or reservoir hosts abundance or distribution (such as bats or primates) without specifically measuring the incidence of the pathogens? Q3. Does the research examine environmental factors such as rainfall, seasonality (e.g., wet vs. dry season) or regional comparisons impacting disease prevalence or vector distribution? Q4. Is the study focused on field-based or epidemiological research rather than laboratory method validation? |
Lepto | Topic: Impact of Climate Change on Leptospirosis: A Focus on Rainfall Eligibility Questions: Q1. Does the study report on primary research or a meta-analysis rather than a review, opinion, or book? Q2. Does the study measure the incidence or prevalence or virulence or survival or transmission of Leptospirosis, a relevant arthropod vector, or reservoir hosts (such as rodents) without specifically measuring the incidence of the pathogens? Q3. Does the research examine environmental factors such as rainfall, seasonality (e.g., wet vs. dry season) or regional comparisons impacting disease prevalence or vector distribution? Q4. Is the study focused on field-based or epidemiological research rather than laboratory method validation? |
RVF | Topic: Impact of Climate Change on Rift Valley Fever Virus: A Focus on Rainfall Eligibility Questions: Q1. Does the study report on primary research or a meta-analysis rather than a review, opinion, or book? Q2. Does the study measure the incidence or prevalence or virulence or survival or transmission of Rift Valley fever or other vector-borne diseases (such as malaria) that share similar vectors (e.g., mosquitoes) without specifically measuring the incidence of the pathogen? Q3. Does the research examine environmental factors such as rainfall, seasonality (e.g., wet vs. dry season) or regional comparisons impacting disease prevalence or vector distribution? Q4. Is the study focused on field-based or epidemiological research rather than laboratory method validation? |
Answer Schema | Answer Labels | Answer Score |
---|---|---|
QA-2 | Yes | 1.0 |
QA-2 | No | 0.0 |
QA-3 | Yes | 1.00 |
QA-3 | Unsure | 0.50 |
QA-3 | No | 0.00 |
QA-4 | Definitely Yes | 0.95 |
QA-4 | Probably Yes | 0.75 |
QA-4 | Probably No | 0.25 |
QA-4 | Definitely No | 0.05 |
QA-5 | Definitely Yes | 1.00 |
QA-5 | Probably Yes | 0.75 |
QA-5 | Unsure | 0.50 |
QA-5 | Probably No | 0.25 |
QA-5 | Definitely No | 0.00 |
TSC-5 | Definitely Include | 1.00 |
TSC-5 | Probably Include | 0.75 |
TSC-5 | Unsure | 0.50 |
TSC-5 | Probably Exclude | 0.25 |
TSC-5 | Definitely Exclude | 0.00 |
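One way to use the answer schemas above in code is a plain lookup table from label to score. The sketch below encodes the values from the table; the names `ANSWER_SCORES` and `score_answers` are hypothetical, and rejecting unknown labels with an error is an assumption about how malformed LLM output might be handled.

```python
# Label-to-score lookup, with values copied from the answer schema table.
ANSWER_SCORES: dict[str, dict[str, float]] = {
    "QA-2": {"Yes": 1.0, "No": 0.0},
    "QA-3": {"Yes": 1.0, "Unsure": 0.5, "No": 0.0},
    "QA-4": {"Definitely Yes": 0.95, "Probably Yes": 0.75,
             "Probably No": 0.25, "Definitely No": 0.05},
    "QA-5": {"Definitely Yes": 1.0, "Probably Yes": 0.75, "Unsure": 0.5,
             "Probably No": 0.25, "Definitely No": 0.0},
    "TSC-5": {"Definitely Include": 1.0, "Probably Include": 0.75, "Unsure": 0.5,
              "Probably Exclude": 0.25, "Definitely Exclude": 0.0},
}


def score_answers(schema: str, labels: list[str]) -> list[float]:
    """Convert the answer labels returned for one abstract into numeric scores."""
    lookup = ANSWER_SCORES[schema]
    unknown = [label for label in labels if label not in lookup]
    if unknown:
        # Assumed policy: fail loudly instead of guessing a score.
        raise ValueError(f"labels not in schema {schema}: {unknown}")
    return [lookup[label] for label in labels]


if __name__ == "__main__":
    print(score_answers("QA-4", ["Definitely Yes", "Probably No", "Probably Yes", "Definitely No"]))
```

QA-4 is the only schema whose extreme labels stop short of 0 and 1 (0.05 and 0.95), and it has no neutral "Unsure" option.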
Answer Schema | Answer Labels | Confidence Labels | Confidence Score |
---|---|---|---|
QA-2-C | Yes | High | 1.00 |
QA-2-C | Yes | Medium | 0.75 |
QA-2-C | Yes | Low | 0.50 |
QA-2-C | No | High | 0.00 |
QA-2-C | No | Medium | 0.25 |
QA-2-C | No | Low | 0.50 |
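The QA-2-C scores in the table above are symmetric around 0.5: confidence moves a "Yes" answer up and a "No" answer down by the same amount, so a low-confidence answer scores 0.5 either way. The sketch below reproduces the table with that rule; the offset formulation and the name `confidence_score` are an illustrative reading of the table, not code taken from the study.

```python
# Distance from the neutral score 0.5 for each confidence label (derived from the table).
CONFIDENCE_OFFSET = {"High": 0.5, "Medium": 0.25, "Low": 0.0}


def confidence_score(answer: str, confidence: str) -> float:
    """Score for a QA-2-C response: 0.5 shifted towards 1 (Yes) or 0 (No)."""
    offset = CONFIDENCE_OFFSET[confidence]
    if answer == "Yes":
        return 0.5 + offset
    if answer == "No":
        return 0.5 - offset
    raise ValueError(f"unexpected answer label: {answer!r}")


if __name__ == "__main__":
    for answer in ("Yes", "No"):
        for confidence in ("High", "Medium", "Low"):
            print(answer, confidence, confidence_score(answer, confidence))
```

The resulting set of distinct values (0, 0.25, 0.5, 0.75, 1) matches the five score levels of QA-5.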
Disease | Model | Recall@5% | Recall@10% | Recall@20% | Recall@30% | Recall@50% | nWSS@95% | nWSS@100% | AP |
---|---|---|---|---|---|---|---|---|---|
CCHF | QA-2 | 0.33 | 0.47 | 0.81 | 0.88 | 1.00 | 0.69 | 0.63 | 0.70 |
CCHF | QA-2-C | 0.39 | 0.54 | 0.82 | 0.93 | 0.98 | 0.64 | 0.53 | 0.77 |
CCHF | QA-3 | 0.39 | 0.53 | 0.79 | 0.93 | 0.98 | 0.67 | 0.51 | 0.76 |
CCHF | QA-4 | 0.35 | 0.63 | 0.86 | 0.96 | 1.00 | 0.78 | 0.61 | 0.80 |
CCHF | QA-5 | 0.33 | 0.65 | 0.84 | 0.95 | 1.00 | 0.74 | 0.62 | 0.78 |
CCHF | TSC-5 | 0.37 | 0.46 | 0.58 | 0.63 | 0.82 | 0.16 | 0.12 | 0.55 |
CCHF | TSC-BM25 | 0.23 | 0.32 | 0.46 | 0.58 | 0.79 | 0.22 | 0.17 | 0.40 |
CCHF | TSC-MiniLM | 0.25 | 0.33 | 0.40 | 0.54 | 0.74 | 0.15 | 0.18 | 0.38 |
Ebola | QA-2 | 0.50 | 0.71 | 0.79 | 1.00 | 1.00 | 0.67 | 0.72 | 0.18 |
Ebola | QA-2-C | 0.36 | 0.79 | 0.93 | 0.93 | 1.00 | 0.61 | 0.66 | 0.33 |
Ebola | QA-3 | 0.43 | 0.86 | 0.86 | 1.00 | 1.00 | 0.66 | 0.71 | 0.23 |
Ebola | QA-4 | 0.71 | 0.93 | 1.00 | 1.00 | 1.00 | 0.86 | 0.91 | 0.51 |
Ebola | QA-5 | 0.64 | 0.93 | 1.00 | 1.00 | 1.00 | 0.83 | 0.88 | 0.50 |
Ebola | TSC-5 | 0.43 | 0.43 | 0.50 | 0.50 | 0.71 | 0.04 | | 0.37 |
Ebola | TSC-BM25 | 0.14 | 0.36 | 0.57 | 0.57 | 0.86 | 0.02 | 0.07 | 0.05 |
Ebola | TSC-MiniLM | 0.21 | 0.29 | 0.57 | 0.71 | 0.86 | 0.32 | 0.37 | 0.04 |
Lepto | QA-2 | 0.32 | 0.54 | 0.82 | 0.92 | 0.95 | 0.57 | 0.41 | 0.58 |
Lepto | QA-2-C | 0.36 | 0.65 | 0.81 | 0.94 | 0.96 | 0.72 | 0.28 | 0.69 |
Lepto | QA-3 | 0.32 | 0.52 | 0.77 | 0.89 | 0.97 | 0.56 | 0.25 | 0.57 |
Lepto | QA-4 | 0.39 | 0.75 | 0.92 | 0.94 | 0.99 | 0.72 | 0.42 | 0.80 |
Lepto | QA-5 | 0.40 | 0.72 | 0.92 | 0.94 | 0.98 | 0.70 | 0.40 | 0.78 |
Lepto | TSC-5 | 0.38 | 0.78 | 0.87 | 0.90 | 0.94 | 0.46 | 0.10 | 0.75 |
Lepto | TSC-BM25 | 0.15 | 0.28 | 0.48 | 0.60 | 0.85 | 0.26 | 0.14 | 0.26 |
Lepto | TSC-MiniLM | 0.32 | 0.56 | 0.81 | 0.94 | 0.98 | 0.72 | 0.26 | 0.62 |
RVF | QA-2 | 0.21 | 0.37 | 0.76 | 0.84 | 0.95 | 0.54 | 0.31 | 0.44 |
RVF | QA-2-C | 0.29 | 0.41 | 0.76 | 0.90 | 0.97 | 0.73 | 0.36 | 0.53 |
RVF | QA-3 | 0.24 | 0.41 | 0.76 | 0.90 | 0.98 | 0.61 | 0.29 | 0.51 |
RVF | QA-4 | 0.32 | 0.56 | 0.83 | 0.89 | 0.98 | 0.70 | 0.27 | 0.66 |
RVF | QA-5 | 0.29 | 0.52 | 0.83 | 0.89 | 0.98 | 0.67 | 0.30 | 0.61 |
RVF | TSC-5 | 0.21 | 0.27 | 0.40 | 0.48 | 0.73 | 0.23 | 0.08 | 0.28 |
RVF | TSC-BM25 | 0.11 | 0.22 | 0.37 | 0.49 | 0.73 | 0.19 | 0.06 | 0.21 |
RVF | TSC-MiniLM | 0.14 | 0.24 | 0.40 | 0.56 | 0.90 | 0.35 | 0.26 | 0.26 |
Model | MAP | PR-AUC |
---|---|---|
QA-2 | 0.476 | 0.621 |
QA-2-C | 0.579 | 0.669 |
QA-3 | 0.519 | 0.636 |
QA-4 | 0.691 | 0.761 |
QA-5 | 0.670 | 0.740 |
TSC-5 | 0.489 | 0.639 |
TSC-BM25 | 0.229 | 0.206 |
TSC-MiniLM | 0.325 | 0.271 |
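The MAP column can be cross-checked against the per-disease AP values in the results table, since MAP is the unweighted mean of AP over the four disease datasets. The sketch below does this for seven of the eight models; small differences are expected because the AP values are only available rounded to two decimals. The data layout and the name `mean_ap` are illustrative, and TSC-5 is left out because its Ebola row is incomplete in the results table.

```python
# Per-disease AP (CCHF, Ebola, Lepto, RVF) copied from the results table,
# paired with the MAP reported in the summary table.
AP_BY_MODEL = {
    "QA-2": ([0.70, 0.18, 0.58, 0.44], 0.476),
    "QA-2-C": ([0.77, 0.33, 0.69, 0.53], 0.579),
    "QA-3": ([0.76, 0.23, 0.57, 0.51], 0.519),
    "QA-4": ([0.80, 0.51, 0.80, 0.66], 0.691),
    "QA-5": ([0.78, 0.50, 0.78, 0.61], 0.670),
    "TSC-BM25": ([0.40, 0.05, 0.26, 0.21], 0.229),
    "TSC-MiniLM": ([0.38, 0.04, 0.62, 0.26], 0.325),
}


def mean_ap(ap_values: list[float]) -> float:
    """Unweighted mean of the per-disease AP values."""
    return sum(ap_values) / len(ap_values)


if __name__ == "__main__":
    for model, (ap_values, reported_map) in AP_BY_MODEL.items():
        recomputed = mean_ap(ap_values)
        print(f"{model}: recomputed {recomputed:.3f}, reported {reported_map:.3f}")
```

Every recomputed value lands within 0.005 of the reported MAP, which is the size of gap that two-decimal rounding of the inputs can produce.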
Disease | Revised to Include | Revised to Exclude | Total |
---|---|---|---|
CCHF | 7 | 3 | 10 |
Ebola | 1 | 0 | 1 |
Lepto | 13 | 3 | 16 |
RVF | 8 | 5 | 13 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sujau, M.; Wada, M.; Vallée, E.; Hillis, N.; Sušnjak, T. Accelerating Disease Model Parameter Extraction: An LLM-Based Ranking Approach to Select Initial Studies for Literature Review Automation. Mach. Learn. Knowl. Extr. 2025, 7, 28. https://doi.org/10.3390/make7020028