Generalized Linear Models to Forecast Malaria Incidence in Three Endemic Regions of Senegal
Abstract
1. Introduction
2. Materials and Methods
2.1. Data and Notation
- “Faciès tropical”: corresponds to the regions of Ziguinchor, Kolda, Tambacounda, and Kedougou. In this zone, the rainy season is the longest and most intense in the country, lasting 5 to 6 months. Malaria cases are observed over 4 to 6 months of the year, and transmission is high (20 to 100 infected bites/human/year).
- “Faciès sahélien”: corresponds to regions such as Kaolack, Fatick, Diourbel, Dakar, Thies, Louga, Saint-Louis, and Matam, where the rainy season is less intense and lasts 2 to 3 months. Transmission is generally very low (0 to 20 infected bites/human/year).
2.1.1. Response Variable
2.1.2. Independent Variables
2.2. Models
2.2.1. Poisson Regression Model
2.2.2. NB Regression Model
2.3. Estimation and Forecasting Methods
2.3.1. Train and Test Sets
2.3.2. Parameter Estimation and Principles of Forecasting
Algorithm 1: Forecasting Algorithm
Input: times of Section 2.3.1; ℓ ← vector (Section 3.1); h ← forecast horizon; observed malaria incidence (dependent variable); set of explanatory variables.
Output: the forecasted vector of malaria incidence.
2.3.3. Saturation Method
Algorithm 2: Forecasting Algorithm with Saturation
Input: times of Section 2.3.1; ℓ ← vector (Section 3.1); h ← forecast horizon; observed malaria incidence (dependent variable); set of explanatory variables.
Output: the forecasted vector of malaria incidence.
3. Results and Discussion
3.1. Determination of Lags
3.2. Model Performance Metrics
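- Root mean square error (RMSE): $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$, where $y_i$ is the observed value, $\hat{y}_i$ the forecasted value, and $n$ the number of observations.
- Mean absolute error (MAE): $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}|y_i - \hat{y}_i|$.
- Mean absolute scaled error (MASE): the ratio between the MAE and the mean monthly variation of the observed values. A MASE value around 1 or below indicates excellent accuracy.
- Mean absolute relative error (MARE): $\mathrm{MARE} = \frac{1}{n}\sum_{i=1}^{n}\frac{|y_i - \hat{y}_i|}{y_i}$.
- R-squared [2]: the proportion of variation in the outcome that is explained by the predictor variables. In contrast to all the above metrics, the higher the R-squared, the better the model.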
- Scatter index (SI): the RMSE expressed as a percentage of the mean observation, which gives the expected percentage error. Lower values of the SI indicate better model performance.
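- Reliability analysis (RA): a statistical method for measuring the overall consistency of a model by determining whether the model achieves a permissible level of performance. With $y_i$ the observed and $\hat{y}_i$ the forecasted value, the relative error is $\mathrm{RAE}_i = |y_i - \hat{y}_i| / y_i$. Next, if $\mathrm{RAE}_i \le \delta$, then $K_i = 1$, otherwise $K_i = 0$, where $\delta$ is a threshold value set to 0.2 (20%) based on Chinese standards. The RA is the percentage of forecasts that meet the threshold, $\mathrm{RA} = \frac{100}{n}\sum_{i=1}^{n} K_i$.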
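The metrics listed above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function name `forecast_metrics` is hypothetical, R-squared is assumed to be the coefficient of determination 1 − SS_res/SS_tot, the MASE denominator is assumed to be the mean absolute month-to-month change of the observations, and the observations are assumed to be strictly positive so that relative errors are defined.

```python
import numpy as np


def forecast_metrics(y, y_hat, delta=0.2):
    """Return the forecast accuracy metrics as a dict (hypothetical helper).

    y     : observed incidence (assumed strictly positive)
    y_hat : forecasted incidence, same length as y
    delta : relative-error threshold for the reliability analysis (0.2 = 20%)
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    err = y - y_hat

    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    # MASE: MAE divided by the mean absolute change between consecutive observations.
    mase = mae / np.mean(np.abs(np.diff(y)))
    rel = np.abs(err) / np.abs(y)          # relative error of each forecast
    mare = np.mean(rel)
    # Assumption: R-squared computed as 1 - SS_res / SS_tot.
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)
    si = 100.0 * rmse / y.mean()           # scatter index, in percent
    ra = 100.0 * np.mean(rel <= delta)     # share of forecasts within the threshold

    return {"RMSE": rmse, "MAE": mae, "MASE": mase, "MARE": mare,
            "R2": r2, "SI": si, "RA": ra}


if __name__ == "__main__":
    observed = [100, 120, 80, 100]
    forecast = [110, 100, 90, 100]
    for name, value in forecast_metrics(observed, forecast).items():
        print(f"{name}: {value:.3f}")
```

Returning all metrics from one function keeps the residuals and relative errors shared between them, so the seven values are always computed on the same pair of series.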
3.3. Model Selection and Result Comparison
3.3.1. Model Selection by Using the Vuong Test
3.3.2. Results Comparison by Using Metrics
3.4. Forecasts Results by Various Sets of Explanatory Variables
3.4.1. Forecasts Results Using History of Malaria Incidence Only
3.4.2. Forecasts Results by Using all Explanatory Variables
3.5. Addition Study
3.6. Ablation Study
3.7. Forecasts Results Using Saturation
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Method for Parameter Estimation
Appendix A.1. Poisson Log-Likelihood
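1. The likelihood function is defined as follows, where $y_1, \dots, y_n$ are the observed counts, $x_i$ is the vector of explanatory variables for observation $i$, $\beta$ is the parameter vector, and $\mu_i = g^{-1}(x_i^\top \beta)$ is the mean under the link function $g$:

$$L(\beta) = \prod_{i=1}^{n} \frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!}$$

2. The log-likelihood is defined as follows:

$$\mathcal{L}(\beta) = \log L(\beta) = \sum_{i=1}^{n} \left[ y_i \log \mu_i - \mu_i - \log(y_i!) \right]$$

3. The first derivative of the log-likelihood is called the gradient. If $g = \log$, we have $\mu_i = \exp(x_i^\top \beta)$, so the log-likelihood is

$$\mathcal{L}(\beta) = \sum_{i=1}^{n} \left[ y_i\, x_i^\top \beta - \exp(x_i^\top \beta) - \log(y_i!) \right]$$

and the $j$-th component of the gradient is

$$\frac{\partial \mathcal{L}}{\partial \beta_j} = \sum_{i=1}^{n} \left( y_i - \exp(x_i^\top \beta) \right) x_{ij}.$$

If $g$ = identity, we have $\mu_i = x_i^\top \beta$, so

$$\mathcal{L}(\beta) = \sum_{i=1}^{n} \left[ y_i \log(x_i^\top \beta) - x_i^\top \beta - \log(y_i!) \right]$$

and the $j$-th component of the gradient is

$$\frac{\partial \mathcal{L}}{\partial \beta_j} = \sum_{i=1}^{n} \left( \frac{y_i}{x_i^\top \beta} - 1 \right) x_{ij}.$$

Finally, if $g = \sqrt{\cdot}$, we have $\mu_i = (x_i^\top \beta)^2$, so

$$\mathcal{L}(\beta) = \sum_{i=1}^{n} \left[ 2 y_i \log\lvert x_i^\top \beta \rvert - (x_i^\top \beta)^2 - \log(y_i!) \right]$$

and the $j$-th component of the gradient is

$$\frac{\partial \mathcal{L}}{\partial \beta_j} = \sum_{i=1}^{n} \left( \frac{2 y_i}{x_i^\top \beta} - 2\, x_i^\top \beta \right) x_{ij}.$$

4. The second derivative of the log-likelihood function, called the Hessian, is defined, if $g = \log$, by

$$\frac{\partial^2 \mathcal{L}}{\partial \beta_j \partial \beta_k} = -\sum_{i=1}^{n} \exp(x_i^\top \beta)\, x_{ij} x_{ik}.$$

If $g$ = identity,

$$\frac{\partial^2 \mathcal{L}}{\partial \beta_j \partial \beta_k} = -\sum_{i=1}^{n} \frac{y_i}{(x_i^\top \beta)^2}\, x_{ij} x_{ik}.$$

Finally, if $g = \sqrt{\cdot}$,

$$\frac{\partial^2 \mathcal{L}}{\partial \beta_j \partial \beta_k} = -\sum_{i=1}^{n} \left( \frac{2 y_i}{(x_i^\top \beta)^2} + 2 \right) x_{ij} x_{ik}.$$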
Appendix A.2. NB Log-Likelihood
Appendix A.3. Optimization Algorithm
Algorithm A1: Newton-Raphson [29]
1. Choose an initial parameter estimate.
2. Calculate the score.
3. Calculate the derivative of the function whose roots are sought.
4. Walk along the first derivative until the line (plane) of the derivative crosses zero.
5. Update the betas.
6. Iterate steps 2 to 5 until convergence.
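As an illustration of the Newton-Raphson steps, the sketch below fits a Poisson regression with the log link only. It is not the authors' implementation: the function name `poisson_newton_raphson`, the tolerance, the iteration cap, and the least-squares starting point on log(y + 0.5) are all assumptions made for this example. The score used is Xᵀ(y − μ) and the Hessian is −Xᵀ diag(μ) X, with μ = exp(Xβ).

```python
import numpy as np


def poisson_newton_raphson(X, y, tol=1e-8, max_iter=50):
    """Maximum-likelihood Poisson regression (log link) by Newton-Raphson.

    X : (n, p) design matrix, including a column of ones for the intercept
    y : (n,) vector of observed counts
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Step 1: initial estimate. Assumption: least squares on log(y + 0.5),
    # which keeps the first Newton step from overflowing for large counts.
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]

    for _ in range(max_iter):
        mu = np.exp(X @ beta)             # fitted means under the log link
        score = X.T @ (y - mu)            # Step 2: gradient of the log-likelihood
        hessian = -(X.T * mu) @ X         # Step 3: derivative of the score
        step = np.linalg.solve(hessian, score)
        beta = beta - step                # Steps 4-5: Newton update of the betas
        if np.max(np.abs(step)) < tol:    # Step 6: stop once the update is negligible
            break
    return beta


if __name__ == "__main__":
    # Two groups of observations; the second column is a group indicator.
    X = np.array([[1, 0], [1, 0], [1, 1], [1, 1]])
    y = np.array([2, 4, 10, 14])
    print(poisson_newton_raphson(X, y))   # estimates of (log 3, log 4)
```

In the usage example the maximum-likelihood fit reproduces the group means (3 and 12), so the coefficients can be checked by hand as log 3 and log(12/3) = log 4.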
References
1. Putri, R.G.; Jaharuddin; Bakhtiar, T. SIRS-SI Model of Malaria Disease with Application of Vaccines, Anti-Malarial Drugs, and Spraying. IOSR J. Math. 2014, 10, 66–72.
2. Cameron, A.C.; Trivedi, P.K. Regression Analysis of Count Data, 2nd ed.; Econometric Society Monographs, Cambridge University Press: Cambridge, UK, 2013.
3. Lindsey, J.K. Generalized Linear Modelling. In Applying Generalized Linear Models; Springer: New York, NY, USA, 1997; pp. 1–26.
4. McCullagh, P.; Nelder, J.A. Generalized Linear Models; Routledge: London, UK, 1983.
5. Lee, S.C. Delta Boosting Implementation of Negative Binomial Regression in Actuarial Pricing. Risks 2020, 8, 19.
6. Abiodun, G.J.; Makinde, O.S.; Adeola, A.M.; Njabo, K.Y.; Witbooi, P.J.; Djidjou-Demasse, R.; Botai, J.O. A Dynamical and Zero-Inflated Negative Binomial Regression Modelling of Malaria Incidence in Limpopo Province, South Africa. Int. J. Environ. Res. Public Health 2019, 16, 2000.
7. Nakashima, E. Some Methods for Estimation in a Negative Binomial Model. Ann. Inst. Stat. Math. 1997, 49, 101–115.
8. Famoye, F. A Multivariate Generalized Poisson Regression Model. Commun. Stat.-Theory Methods 2015, 44, 497–511.
9. Makinde, O.S.; Abiodun, G.J.; Ojo, O.T. Modelling of malaria incidence in Akure, Nigeria: Negative binomial approach. GeoJournal 2020, 86, 1327–1336.
10. Mabaso, M.L.; Vounatsou, P.; Midzi, S.; Silva, J.D.; Smith, T. Spatio-temporal analysis of the role of climate in inter-annual variation of malaria incidence in Zimbabwe. Int. J. Health Geogr. 2006, 5, 20.
11. Asnath, S.M.; Daniel, M.; Alexander, B. Modelling Malaria Incidence in the Limpopo Province, South Africa: Comparison of Classical and Bayesian Methods of Estimation. Int. J. Environ. Res. Public Health 2020, 17, 5016.
12. Yirga, A.A.; Melesse, S.F.; Mwambi, H.G.; Ayele, D.G. Negative binomial mixed models for analyzing longitudinal CD4 count data. Sci. Rep. 2020, 10, 16742.
13. Giardina, F.; Gosoniu, L.; Konate, L.; Diouf, M.B.; Perry, R.; Gaye, O.; Faye, O.; Vounatsou, P. Estimating the Burden of Malaria in Senegal: Bayesian Zero-Inflated Binomial Geostatistical Modeling of the MIS 2008 Data. PLOS ONE 2012, 7, e32625.
14. Nkiruka, O.; Prasad, R.; Clement, O. Prediction of malaria incidence using climate variability and machine learning. Inform. Med. Unlocked 2021, 22, 100508.
15. Programme National de Lutte contre le Paludisme. Bulletin Epidemiologique Annuel 2016 du Paludisme au Senegal. 2016. Available online: https://www.dropbox.com/scl/fi/n2w8hoi2ureubud7usc6e/Bulletin-Epidemiologique-Annuel-2016-du-Paludisme-au-Senegal-VF.pdf?rlkey=wryw7t3z4xt4ov3f5edwgaj33&dl=0 (accessed on 3 July 2023).
16. Faye, S.; Cico, A.; Gueye, A.B.; Baruwa, E.; Johns, B.; Ndiop, M.; Alilio, M. Scaling up malaria intervention “packages” in Senegal: Using cost effectiveness data for improving allocative efficiency and programmatic decision-making. Malar. J. 2018, 17, 159.
17. Adepoju, P. Les Tests de Diagnostic Rapide Pourraient Omettre Jusqu’à 20% des cas de Paludisme. 2021. Available online: https://www.nature.com/articles/d44148-021-00087-0 (accessed on 14 June 2023).
18. Love, D.E.; Aseidu, L.J.; Adjei, L.E. A Weather-Based Prediction Model of Malaria Prevalence in Amenfi West District, Ghana; Hindawi Publishing Corporation Malaria Research and Treatment: London, UK, 2017.
19. Okuneye, K.; Gumel, A.B. Analysis of a temperature- and rainfall-dependent model for malaria transmission dynamics. Math. Biosci. 2017, 287, 72–92.
20. Ndiaye, O.; Hesran, J.Y.L.; Etard, J.F.; Diallo, A.; Simondon, F.; Ward, M.N.; Robert, V. Variations climatiques et mortalité attribuée au paludisme dans la zone de Niakhar, Sénégal, de 1984 à 1996. Cah. Santé 2001, 11, 25–33.
21. Maslen, B. How to Deal with Count Data? Technical Report; Stats Central: Mark Wainwright Analytical Centre, UNSW Sydney. 2019. Available online: https://www.analytical.unsw.edu.au/sites/default/files/document_related_files/2019April_Seminar_How%20to%20deal%20with%20count%20data_Maslen_1.pdf (accessed on 3 July 2023).
22. Absil, P.A.; Diao, O.; Diallo, M. Assessment of COVID-19 Hospitalization Forecasts from a Simplified SIR Model. Lett. Biomath. 2021, 8, 215–228.
23. Jin, C.; Liu, J.A. Applications of Support Vector Machine and Unsupervised Learning for Predicting Maintainability Using Object-Oriented Metrics. In Proceedings of the 2010 Second International Conference on Multimedia and Information Technology, Kaifeng, China, 24–25 April 2010; Volume 1, pp. 24–27.
24. Saberi-Movahed, F.; Najafzadeh, M.; Mehrpooya, A. Receiving More Accurate Predictions for Longitudinal Dispersion Coefficients in Water Pipelines: Training Group Method of Data Handling Using Extreme Learning Machine Conceptions. Water Resour. Manag. 2020, 34, 529–561.
25. Gowsar, S.N.; Radha, M.; Devi, M.N. A Comparison of Generalized Linear Models for Insect Count Data. Int. J. Stat. Anal. 2019, 9, 1–9.
26. Hashim, L.H.; Dreeb, N.K.; Hashim, K.H.; Shiker, M.A.K. An Application Comparison of Two Negative Binomial Models on Rainfall Count Data; IOP Publishing: Bristol, UK, 2021; Volume 1818, p. 012100.
27. Midekisa, A.; Senay, G.; Henebry, G.M.; Semuniguse, P.; Wimberly, M.C. Remote sensing-based time series models for malaria early warning in the highlands of Ethiopia. Malar. J. 2012, 11, 165.
28. Cruyff, M.J.; van der Heijden, P.G. Point and interval estimation of the population size using a zero-truncated negative binomial regression model. Biomed. J. 2008.
29. Clemen, L. Poisson IRWLS. 2019. Available online: https://statomics.github.io/SGA2019/assets/poissonIRWLS-implemented.html (accessed on 3 July 2023).
Dakar | Fatick | Kedougou | |
---|---|---|---|
0.79 | 0.62 | 0.61 | |
0.65 | 0.43 | 0.56 | |
0.56 | 0.69 | 0.58 | |
0.15 | 0.38 | 0.60 | |
0.129 | |||
0.85 | 0.88 | 0.92 | |
0.72 | 0.75 | 0.80 | |
Regions | Model | Link | RMSE | MASE | MARE | R² | min | RA |
---|---|---|---|---|---|---|---|---|
Dk | G | id | 2197.29/2384.26 | 0.52/1.01 | 0.68/1.54 | 0.84/0.79 | −517.03 | 28.75/20 |
Dk | G | log | 2466.29/2352.81 | 0.6/1.38 | 0.85/2.66 | 0.79/0.78 | 511.81 | 22.5/4 |
Dk | P | id | 2245.27/2689.74 | 0.52/1.02 | 0.54/1.28 | 0.83/0.78 | 282.6 | 32.5/16 |
Dk | P | log | 2523.01/2297.45 | 0.57/1.23 | 0.65/2.05 | 0.79/0.77 | 341.22 | 27.5/4 |
Dk | P | sqrt | 2354.97/2303.14 | 0.54/1.08 | 0.62/1.77 | 0.81/0.8 | 279.49 | 30/12 |
Dk | NB | id | 2558.87/3555.94 | 0.58/1.28 | 0.5/1.24 | 0.82/0.76 | 460.63 | 27.5/16 |
Dk | NB | log | 3736.91/2424.02 | 0.74/1.1 | 0.61/1.89 | 0.73/0.79 | 471.42 | 28.75/8 |
Dk | NB | sqrt | 3409.99/3234.53 | 0.73/1.23 | 0.55/1.56 | 0.79/0.79 | 482.42 | 28.75/16 |
Ft | G | id | 768.52/578.88 | 0.83/1.33 | 0.66/0.59 | 0.67/0.75 | 87.78 | 27.5/24 |
Ft | G | log | 721.66/484.07 | 0.81/1.31 | 0.73/0.7 | 0.71/0.83 | 83.29 | 25/16 |
Ft | P | id | 772.32/543.56 | 0.82/1.26 | 0.65/0.6 | 0.67/0.72 | 99.12 | 31.25/24 |
Ft | P | log | 741.92/491.42 | 0.83/1.37 | 0.85/0.88 | 0.69/0.78 | 215.44 | 25/8 |
Ft | P | sqrt | 763.78/526.94 | 0.84/1.32 | 0.79/0.74 | 0.68/0.73 | 215.13 | 26.25/20 |
Ft | NB | id | 839.53/527.32 | 0.87/1.22 | 0.61/0.59 | 0.64/0.67 | 69.22 | 30/20 |
Ft | NB | log | 861.98/466.03 | 0.91/1.21 | 0.84/0.79 | 0.63/0.7 | 268.19 | 21.25/20 |
Ft | NB | sqrt | 942.27/504.91 | 0.98/1.13 | 0.75/0.59 | 0.62/0.64 | 180.39 | 22.25/28 |
Kd | G | id | 1230.61/2467.27 | 1.01/0.92 | 1.19/0.63 | 0.62/0.62 | 29.85 | 10/16 |
Kd | G | log | 1409.28/2720.94 | 1.27/1.01 | 2.18/0.73 | 0.51/0.56 | 872.92 | 13.75/20 |
Kd | P | id | 1241.89/2431.91 | 0.99/0.87 | 0.9/0.47 | 0.61/0.63 | 126.76 | 16.25/24 |
Kd | P | log | 1488.67/3009.49 | 1.15/1.1 | 1.47/0.52 | 0.46/0.59 | 510.81 | 21.25/16 |
Kd | P | sqrt | 1352.9/2523.28 | 1.03/0.89 | 1.07/0.42 | 0.56/0.62 | 324.21 | 18.75/12 |
Kd | NB | id | 1287.33/2346.07 | 1.07/0.83 | 0.9/0.44 | 0.61/0.65 | −52.81 | 16.25/28 |
Kd | NB | log | 2160.53/7024.86 | 1.42/2.47 | 1.08/0.73 | 0.4/0.58 | 275.72 | 16.25/8 |
Kd | NB | sqrt | 1567.23/3037.79 | 1.19/1.14 | 0.91/0.44 | 0.54/0.63 | 132.16 | 20/16 |
Regions | h | RMSE | MASE | MARE | R² | SI |
---|---|---|---|---|---|---|
Dakar | 1 | 3886.43/2367.55 | 0.95/1.06 | 0.85/1.59 | 0.51/0.54 | 33.93/49.35 |
Dakar | 2 | 5265.41/3565.7 | 1.47/2.15 | 2.4/5.97 | 0.08/0.06 | 41.5/69.91 |
Dakar | 3 | 5399.89/4075.83 | 1.55/2.79 | 3.15/9.38 | 0.01/0.01 | 60.31/56.57 |
Fatick | 1 | 916.35/408.29 | 0.96/1.11 | 0.76/0.75 | 0.55/0.5 | 22.22/18.83 |
Fatick | 2 | 1250.88/741.17 | 1.47/2.26 | 1.99/2.21 | 0.14/0.03 | 24.92/24.75 |
Fatick | 3 | 1339.06/807.71 | 1.67/2.71 | 2.85/3.32 | 0.0/0.01 | 25.23/25.19 |
Kedougou | 1 | 1250.53/2523.74 | 1.02/0.93 | 1.06/0.61 | 0.61/0.59 | 30.69/36.47 |
Kedougou | 2 | 1790.88/3770.72 | 1.58/1.59 | 2.67/1.23 | 0.19/0.18 | 41.54/41.69 |
Kedougou | 3 | 1972.37/4427.78 | 1.87/1.84 | 3.65/1.43 | 0.02/0.01 | 42.41/47.11 |
Regions | Variable | RMSE w/wo | MASE w/wo | MARE w/wo | w/wo |
---|---|---|---|---|---|
Dakar | Rainfall | 1.18 | 1.02 | 0.99 | 1.43 |
Dakar | Temperature | 0.99 | 1.05 | 1.15 | 1.04 |
Dakar | Humidity | 0.94 | 1.04 | 1.06 | 1.11 |
Fatick | Rainfall | 1.24 | 1.22 | 1.04 | 1.34 |
Fatick | Temperature | 1 | 1.06 | 1.08 | 1.1 |
Fatick | Humidity | 1.18 | 1.05 | 0.77 | 1.34 |
Kedougou | Rainfall | 0.99 | 0.97 | 0.92 | 1.02 |
Kedougou | Temperature | 0.96 | 0.95 | 0.88 | 1.05 |
Kedougou | Humidity | 0.98 | 0.93 | 0.7 | 1.03 |
Regions | Variable | RMSE wo/w | MASE wo/w | MARE wo/w | wo/w |
---|---|---|---|---|---|
Dakar | Mcp | 1.5 | 1.74 | 2.04 | 0.82 |
Dakar | Rainfall | 0.83 | 1.06 | 1.27 | 0.75 |
Dakar | Temperature | 1.05 | 1.07 | 1.11 | 0.98 |
Dakar | Humidity | 1.04 | 1.05 | 1.25 | 0.98 |
Fatick | Mcp | 1.48 | 1.63 | 1.58 | 1.11 |
Fatick | Rainfall | 0.94 | 0.95 | 1.06 | 0.93 |
Fatick | Temperature | 0.96 | 0.98 | 0.96 | 1.01 |
Fatick | Humidity | 0.92 | 1.05 | 1.3 | 0.98 |
Kedougou | Mcp | 1.5 | 1.64 | 1.25 | 1.21 |
Kedougou | Rainfall | 1 | 0.99 | 0.92 | 1.01 |
Kedougou | Temperature | 1.02 | 1.02 | 0.92 | 0.97 |
Kedougou | Humidity | 1.01 | 1.03 | 1.14 | 0.98 |
Regions | RMSE w/wo | MASE w/wo | MARE w/wo | w/wo |
---|---|---|---|---|
Dakar | 0.95 | 0.97 | 0.96 | 1.01 |
Fatick | 1 | 1 | 1 | 1 |
Kedougou | 1.01 | 1.02 | 0.87 | 1.04 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Diao, O.; Absil, P.-A.; Diallo, M. Generalized Linear Models to Forecast Malaria Incidence in Three Endemic Regions of Senegal. Int. J. Environ. Res. Public Health 2023, 20, 6303. https://doi.org/10.3390/ijerph20136303