Machine Learning Approach to Predicting Rift Valley Fever Disease Outbreaks in Kenya
Simple Summary
Abstract
1. Introduction
2. Literature Review
3. Materials and Methods
3.1. Study Area
3.2. Data Description and Attribute Selection
Correlation Coefficient
3.3. Data Pre-Processing
3.4. Statistical Software
3.5. Machine Learning Methodology
Models in ML
3.6. Analytical Flow Chart Approach
3.7. Evaluation Metrics in ML
4. Results
4.1. Descriptive Statistics of Data Used
4.2. Correlation Across Variables
4.3. Model Selection and Evaluation
4.4. ML Models Evaluation Metrics and Ensemble Predictions
4.5. Comparison Between ML Models Based on Accuracy
4.6. Advanced ML Models Evaluation Metrics
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Variable | Scale of Measurement | Variable Category | Possible Impact |
---|---|---|---|
Dependent/independent | Discrete | Independent variable | +/− |
Month | Categorical (Jan–Dec) | Independent variable | +/− |
Rainfall | Continuous | Independent variable | Higher rainfall creates vector breeding grounds, increasing RVF outbreaks. |
Elevation | Continuous | Independent variable | Lower elevations create ideal conditions for mosquitoes, while higher elevations reduce mosquito activity and virus transmission. |
Slope | Continuous | Independent variable | Steeper slopes facilitate runoff, reducing mosquito habitats and RVF outbreak risk. |
Clay | Continuous | Independent variable | Soils with high clay content retain water for longer periods, increasing the likelihood of RVF outbreaks. |
Humidity | Continuous | Independent variable | High humidity enhances mosquito survival and activity and prolongs virus persistence in the environment, raising the risk of RVF transmission in humid regions. |
RVF outbreak cases | Categorical | Dependent variable | +/− |
Metric and Curves | Implication of Usage | Formula |
---|---|---|
False positive (FP) | An event is predicted, but it did not happen | |
False negative (FN) | No event is predicted, but one did happen | |
True positive (TP) | An event is predicted, and it did happen | |
Negative predictive value | Precision for the negative class: the share of predicted non-events that are actual non-events (TN denotes true negatives) | TN / (TN + FN) |
Sensitivity/Recall | The share of actual events that the classifier detects | TP / (TP + FN) |
Precision | The share of predicted events that are actual events | TP / (TP + FP) |
Accuracy | The share of all cases, positive and negative, that the model classifies correctly | (TP + TN) / (TP + TN + FP + FN) |
Confusion matrix | Table that contains the true negative, false positive, false negative, and true positive counts | |
F1 score | Harmonic mean of precision and recall | 2 × Precision × Recall / (Precision + Recall) |
ROC curve and AUC score | Shows the trade-off between the false positive rate (FPR) and the true positive rate (TPR) in a single visualization | |
Precision–Recall curve and AUC score | Useful when data are heavily imbalanced; combines precision (PPV) and recall (TPR) in a single visualization | |
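
The threshold-based metrics in the table above can all be derived from the four cells of a binary confusion matrix. The following is a minimal Python sketch, assuming the outbreak class is treated as the positive class; the class name `ConfusionCounts`, the helper `_ratio`, and the example counts are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 if the denominator is zero."""
    return numerator / denominator if denominator else 0.0


@dataclass(frozen=True)
class ConfusionCounts:
    """Cell counts of a binary confusion matrix (outbreak = positive class)."""
    tp: int  # outbreaks predicted and observed
    fp: int  # outbreaks predicted but not observed
    fn: int  # outbreaks observed but not predicted
    tn: int  # non-outbreaks predicted and observed

    def sensitivity(self) -> float:
        # Recall: share of actual outbreaks that were detected.
        return _ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        # Share of actual non-outbreaks that were correctly ruled out.
        return _ratio(self.tn, self.tn + self.fp)

    def precision(self) -> float:
        # Share of predicted outbreaks that really occurred.
        return _ratio(self.tp, self.tp + self.fp)

    def negative_predictive_value(self) -> float:
        # Share of predicted non-outbreaks that really were non-outbreaks.
        return _ratio(self.tn, self.tn + self.fn)

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return _ratio(self.tp + self.tn, total)

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.sensitivity()
        return _ratio(2 * p * r, p + r)


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = ConfusionCounts(tp=8, fp=2, fn=4, tn=86)
print(f"sensitivity={example.sensitivity():.3f}")
print(f"precision={example.precision():.3f}")
print(f"accuracy={example.accuracy():.3f}")
print(f"F1={example.f1():.3f}")
```

Returning 0.0 for an undefined ratio is a convention chosen here for convenience; other tools may report such cases as undefined instead.
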
Province | RVF Outbreaks | Percentage (%) |
---|---|---|
Central | 63 | 14.5 |
Coast | 46 | 10.6 |
Eastern | 89 | 20.6 |
Nairobi | 37 | 8.5 |
North Eastern | 82 | 18.9 |
Nyanza | 0 | 0 |
Rift Valley | 116 | 26.8 |
Western | 0 | 0 |
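
The percentage column in the table above is each province's share of all recorded outbreaks. As a quick arithmetic check, the sketch below recomputes those shares in Python from the counts listed in the table; the variable names are illustrative only.

```python
# Outbreak counts per province, copied from the table above.
outbreaks = {
    "Central": 63,
    "Coast": 46,
    "Eastern": 89,
    "Nairobi": 37,
    "North Eastern": 82,
    "Nyanza": 0,
    "Rift Valley": 116,
    "Western": 0,
}

total = sum(outbreaks.values())

# Share of all outbreaks per province, rounded to one decimal place.
shares = {
    province: round(100 * count / total, 1)
    for province, count in outbreaks.items()
}

print(f"total outbreaks: {total}")
for province, share in shares.items():
    print(f"{province}: {share}%")
```
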
Metric | LR | LDA | KNN | CART | NB | SVM |
---|---|---|---|---|---|---|
Accuracy | 0.997 | 0.997 | 0.997 | 0.994 | 0.989 | 0.997 |
Sensitivity | 0.000 | 0.000 | 0.000 | 0.021 | 0.010 | 0.000 |
Specificity | 1.000 | 0.999 | 1.000 | 0.997 | 0.992 | 1.000 |
Precision | 0.000 | 0.000 | 0.000 | 0.021 | 0.004 | 0.000 |
Recall | 0.000 | 0.000 | 0.000 | 0.0206 | 0.010 | 0.000 |
F1 score | 0.000 | 0.000 | 0.000 | 0.0212 | 0.005 | 0.000 |
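
The pattern in the table above, with accuracy near 1 alongside sensitivity at or near 0, is what one would expect when the positive class is very rare and a model predicts "no outbreak" almost everywhere. The Python sketch below illustrates this with a hypothetical dataset of 1000 observations containing 3 outbreaks; these counts are an assumption made for illustration and are not the study's data.

```python
def evaluate(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Compute accuracy, sensitivity, and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical imbalanced data: 3 outbreaks among 1000 observations.
y_true = [1] * 3 + [0] * 997

# A trivial "model" that never predicts an outbreak.
y_pred = [0] * 1000

scores = evaluate(y_true, y_pred)
# Balanced accuracy averages the two class-wise rates, so it is not
# inflated by the majority class.
scores["balanced_accuracy"] = (scores["sensitivity"] + scores["specificity"]) / 2

for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the trivial model scores 0.997 on accuracy while detecting no outbreaks, which is why accuracy alone is a weak basis for comparing models on heavily imbalanced data.
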
Classifier (PR) | PR AUC | Classifier (ROC) | ROC AUC |
---|---|---|---|
Decision Tree Classifier | 0.0223 | XGB Classifier | 0.9110 |
XGB Classifier | 0.0214 | Gaussian NB | 0.7192 |
K-Neighbors Classifier | 0.0096 | LDA | 0.6941 |
Random Forest Classifier | 0.0089 | Logistic Regression | 0.6756 |
Gaussian NB | 0.0062 | Random Forest Classifier | 0.5736 |
LDA | 0.0059 | K-Neighbors Classifier | 0.5303 |
Logistic Regression | 0.0052 | Decision Tree Classifier | 0.5090 |
SVM | 0.0049 | SVM | 0.4487 |
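
The two AUC columns above are on different scales: a classifier that ranks at random has an expected ROC AUC of 0.5, whereas its expected PR AUC equals the prevalence of the positive class, which can be very small for rare events. The Python sketch below shows one common way to compute each score from predicted scores and true labels. It uses the rank-based (Mann-Whitney) form of ROC AUC and average precision as a step-wise estimate of PR AUC. Whether the study used these exact estimators is not stated here, and the function names and toy data are hypothetical.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: list[int], scores: list[float]) -> float:
    """Mean of the precision values at the rank of each positive.

    Simplification: tied scores are kept in input order rather than
    being treated as a single threshold.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


# Toy data for illustration only: 2 positives among 10 observations.
toy_labels = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.90, 0.20, 0.30, 0.05, 0.15, 0.25]

print(f"ROC AUC: {roc_auc(toy_labels, toy_scores):.3f}")
print(f"Average precision: {average_precision(toy_labels, toy_scores):.3f}")
print(f"PR baseline (prevalence): {sum(toy_labels) / len(toy_labels):.3f}")
```

Comparing a PR AUC against the prevalence, rather than against 0.5, gives a fairer sense of how much a model improves on random ranking.
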
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Mulwa, D.; Kazuzuru, B.; Misinzo, G.; Bett, B. Machine Learning Approach to Predicting Rift Valley Fever Disease Outbreaks in Kenya. Zoonotic Dis. 2025, 5, 20. https://doi.org/10.3390/zoonoticdis5030020