A Dynamical and Zero-Inflated Negative Binomial Regression Modelling of Malaria Incidence in Limpopo Province, South Africa

Recent studies have considered the connections between malaria incidence and climate variables using mathematical and statistical models. Some of the statistical models focused on time series approach based on Box–Jenkins methodology or on dynamic model. The latter approach allows for covariates different from its original lagged values, while the Box–Jenkins does not. In real situations, malaria incidence counts may turn up with many zero terms in the time series. Fitting time series model based on the Box–Jenkins approach and ARIMA may be spurious. In this study, a zero-inflated negative binomial regression model was formulated for fitting malaria incidence in Mopani and Vhembe―two of the epidemic district municipalities in Limpopo, South Africa. In particular, a zero-inflated negative binomial regression model was formulated for daily malaria counts as a function of some climate variables, with the aim of identifying the model that best predicts reported malaria cases. Results from this study show that daily rainfall amount and the average temperature at various lags have a significant influence on malaria incidence in the study areas. The significance of zero inflation on the malaria count was examined using the Vuong test and the result shows that zero-inflated negative binomial regression model fits the data better. A dynamical climate-based model was further used to investigate the population dynamics of mosquitoes over the two regions. Findings highlight the significant roles of Anopheles arabiensis on malaria transmission over the regions and suggest that vector control activities should be intense to eradicate malaria in Mopani and Vhembe districts. Although An. arabiensis has been identified as the major vector over these regions, our findings further suggest the presence of additional vectors transmitting malaria in the study regions. The findings from this study offer insight into climate-malaria incidence linkages over Limpopo province of South Africa.

Abstract: Recent studies have considered the connections between malaria incidence and climate variables using mathematical and statistical models. Some of the statistical models focused on time series approach based on Box-Jenkins methodology or on dynamic model. The latter approach allows for covariates different from its original lagged values, while the Box-Jenkins does not. In real situations, malaria incidence counts may turn up with many zero terms in the time series. Fitting time series model based on the Box-Jenkins approach and ARIMA may be spurious. In this study, a zero-inflated negative binomial regression model was formulated for fitting malaria incidence in Mopani and Vhembe-two of the epidemic district municipalities in Limpopo, South Africa. In particular, a zero-inflated negative binomial regression model was formulated for daily malaria counts as a function of some climate variables, with the aim of identifying the model that best predicts reported malaria cases. Results from this study show that daily rainfall amount and the average temperature at various lags have a significant influence on malaria incidence in the study areas. The significance of zero inflation on the malaria count was examined using the Vuong test and the result shows that zero-inflated negative binomial regression model fits the data better. A dynamical climate-based model was further used to investigate the population dynamics of mosquitoes over the two regions. Findings highlight the significant roles of Anopheles arabiensis on malaria transmission over the regions and suggest that vector control activities should be intense to eradicate malaria in Mopani and Vhembe districts. Although An. arabiensis has been identified as the major vector over these regions, our findings further suggest the presence of additional vectors transmitting malaria in the study regions. The findings from this study offer insight into climate-malaria incidence linkages over Limpopo province of South Africa.

Introduction
Malaria is a life-threatening disease that continues to claim a significant number of lives globally. In 2016 alone, malaria claimed roughly 445,000 lives across the globe from 216 million cases in 91 countries [1]. Despite various ongoing malaria control programmes, Africa continues to bear 90% of malaria cases and 91% of malaria deaths worldwide [1]. South Africa recently witnessed a significant increase in malaria cases across its epidemic regions, which are Limpopo, KwaZulu-Natal, and Mpumalanga province [2,3]. The sudden increase has been linked to climate and environmental factors [4], and reduction in indoor residual spraying [2]. In addition, the resurgence is more significant over Limpopo province. For instance, over 27,500 cases were reported in the province in 2017 as Mopani and Vhembe district municipalities presented the highest number of cases in the province [4,5]. Anopheles arabiensis has been identified as the major vector transmitting Plasmodium falciparum over the study regions [4,5].
Both malaria parasite and mosquito species are very sensitive to climatic conditions. Several studies [6][7][8][9][10][11][12] have investigated the impact of climate variables on the transmission of malaria and mosquito abundance. For instance, Craig et al. [13] developed a climate-based distribution model to investigate the impact of climate change on An. gambiae and malaria transmission over Sub-Saharan Africa. Hoshen and Morse [14] also developed a mathematical-biological model, comprising both the climate-dependent within-vector (An. gambiae s.l.) stages and the climate-independent within host stages to simulate malaria incidence in Zimbabwe. More recently, Abiodun et al. [12] developed mathematical models to investigate the impact of temperature and rainfall on the population dynamics of An. arabiensis malaria transmission over Nkomazi local municipality in KwaZulu-Natal province, South Africa. However, limited investigations have been made over Mopani and Vhembe districts; the regions in Limpopo province that are most prominent with respect to the malaria epidemic.
Recent studies have considered some statistical models for the transmission of malaria over some regions. For instance, various studies have presented various time series models based on the Box-Jenkins methodology [15][16][17]. Arab et al. [18] presented hierarchical Bayesian modelling of malaria in ten West African countries. Using Spearman correlation analysis, Adeola et al. [4] explored the roles of climate variables on malaria transmission in Mutale local municipality of Limpopo, South Africa. The analysis showed that monthly total rainfall, mean minimum temperature, mean maximum temperature, mean average temperature, and mean relative humidity were significantly and positively correlated with monthly malaria cases over the study areas. The monthly total rainfall and monthly mean minimum temperature came up as most significant. Malaria transmission is complex and involves a range of climatic, biological, and environmental factors. However, the high degree of non-linearity in these factors makes it difficult to predict and intervene against malaria [19]. Most statistical models are centred on time series approach grounded on the Box-Jenkins methodology [20]. The Box-Jenkins methodology has two approaches. These include the traditional autoregressive integrated moving average models and its seasonal extensions which do not allow for covariates different from lagged values of response variables. The other approach is the dynamic model (also referred to as ARIMAX), which allows for covariates different from its lagged values of the response variable [16]. Moreover, Briët et al. [21] formulated generalized seasonal autoregressive integrated moving average models for fitting monthly malaria case time series in a district in Sri Lanka, where malaria has decreased dramatically in recent years. In a real situation, malaria incidence counts may be inflated with many zeros. Fitting time series model based on the Box-Jenkins approach and ARIMA on malaria count data may give a spurious result. A zero-inflated model is designed to accommodate the extra zeros in the data.
Using a zero-inflated model for analysing malaria count data with an excessive number of zero, the present study investigates the impact of two climate variables on malaria incidence over Mopani and Vhembe. The malaria incidence is recorded in terms of the number of admission (number of inpatients) in all public health care stations in both regions. The zero-inflated negative binomial regression model was further developed to establish the links between climate variables and malaria cases over the study regions. In addition, the study simulates the population dynamics of An. arabiensis over both Mopani and Vhembe using climate-based mosquito model presented in the study of Abiodun et al. [8]. This is in order to investigate the impact of An. arabiensis abundance (in addition to climate) on malaria transmission over the epidemic regions.

Study Area
Vhembe and Mopani district municipalities are two of the five administrative district municipalities of Limpopo province, located in the north-eastern part of South Africa. The five district municipalities are further sub-divided into 25 local municipalities ( Figure 1). According to the 2011 census, Limpopo province accommodates about 10% (5,404,868) of the total South African (51,770,560) population with 44.2% of the province's population residing in Vhembe (24%) and Mopani (20.2%) districts [22]. These two districts account for about 96.3% of total malaria cases recorded within the province from 1998 to 2017, with 63.2% in Vhembe and 33.1% in Mopani. A large part of the study area is a remote area with pockets of commercial farms. The major malaria vector control strategies include the use of indoor residual spraying with Dichlorodiphenyltrichloroethane (DDT), larviciding of identified breeding habitats and insecticide-impregnated bed nets. Additionally, about 51% of the Kruger National Park, which records high malaria transmission is located within the study area [5]. The average annual temperature in both districts is 21.9 • C. In Vhembe an average of about 350 mm of rainfall is received while about 600 mm of rainfall is received in Mopani district.

Data
The malaria data reported in this study have been sourced from the provincial Integrated Malaria Information System (IMIS) of Malaria Control Programme in the Limpopo Provincial Department of Health and were obtained from the South African Weather Service (SAWS) through its collaborative research with the University of Pretoria Institute for Sustainable Malaria Control (UP ISMC), with ethical approval number MP_2014RP39_978. The data includes both active and passive surveillance malaria case patients, diagnosis date, sex, age, district and local council where the patient resides, source country or province in South Africa where the patient presumably contracted malaria and reported malaria deaths. The daily observation climatic data (total rainfall, maximum, minimum and mean temperatures) were also obtained from SAWS. The locations of the weather stations are shown in Figure 1. Both climate and malaria data span a period of 20 years (1 January 1998 to 31 December 2017).

Dealing with Missing Values
The malaria data denoted by M, consist of daily malaria incidence counts of Mopani and Vhembe District Municipalities from 1 January 1998 to 31 December 2017. The data were characterised by a large number of zeroes and some missing values. Predictor variables are the climate variables of the two districts: daily minimum temperature (T min t ), daily maximum temperature (T max t ) and daily total rainfall amount (R t ). For Mopani district, the proportions of the original data values that were missing are 0.00014, 0.00424 and 0.00424 for malaria count, daily minimum temperature and daily maximum temperature, respectively. For Vhembe district, the proportion of the original data values that were missing is 0.00096 for daily minimum temperature. In this study, multivariate imputation by chained equations (MICE) based on random forest was implemented for estimating a missing daily malaria count and missing values of some climate variables. Multivariate imputation by

Dealing with Missing Values
The malaria data denoted by , consist of daily malaria incidence counts of Mopani and Vhembe District Municipalities from 1 January 1998 to 31 December 2017. The data were characterised by a large number of zeroes and some missing values. Predictor variables are the climate variables of the two districts: daily minimum temperature ( ), daily maximum temperature ( ) and daily total rainfall amount ( ). For Mopani district, the proportions of the original data values that were missing are 0.00014, 0.00424 and 0.00424 for malaria count, daily minimum temperature and daily maximum temperature, respectively. For Vhembe district, the proportion of the original data values that were missing is 0.00096 for daily minimum temperature. In this study, multivariate imputation by chained equations (MICE) based on random forest was implemented for estimating a missing daily malaria count and missing values of some climate variables. Multivariate imputation by chained equations [23,24] estimate missing values for continuous data using predictive mean matching approach and binary data using logistic regression.

The Zero-Inflated Negative Binomial Regression Model
The effect of zero inflation on the malaria incidence is that the relationship may not be wellinformed in terms of the significance of the correlation between malaria and some climate variables. For instance, the estimates of Spearman's rank correlation coefficients between malaria and daily total rainfall of Mopani and Vhembe districts are 0.1342 (p-value < 2.2 × 10 −16 ) and 0.1977 (p-value < 2.2 × 10 −16 ) respectively. The measure of correlation between malaria and daily average temperature at lag 0 is 0.3001 (p-value < 2.2 × 10 −16 ) for Mopani and 0.3754 (p-value < 2.2 × 10 −16 ) for Vhembe. Similarly, measure of correlation between malaria and daily mosquito population at lag 0 is 0.0835 (p-value = 9.515 × 10 −13 ) for Mopani and 0.1655 (p-value < 2.2 × 10 −16 ) for Vhembe. The correlation

The Zero-Inflated Negative Binomial Regression Model
The effect of zero inflation on the malaria incidence is that the relationship may not be well-informed in terms of the significance of the correlation between malaria and some climate variables. For instance, the estimates of Spearman's rank correlation coefficients between malaria and daily total rainfall of Mopani and Vhembe districts are 0.1342 (p-value < 2.2 × 10 −16 ) and 0.1977 (p-value < 2.2 × 10 −16 ) respectively. The measure of correlation between malaria and daily average temperature at lag 0 is 0.3001 (p-value < 2.2 × 10 −16 ) for Mopani and 0.3754 (p-value < 2.2 × 10 −16 ) for Vhembe. Similarly, measure of correlation between malaria and daily mosquito population at lag 0 is 0.0835 (p-value = 9.515 × 10 −13 ) for Mopani and 0.1655 (p-value < 2.2 × 10 −16 ) for Vhembe. The correlation values are very small but significant and show that daily rainfall, average temperature, and mosquito population do have a major influence on malaria prevalence in the two district municipalities of Limpopo, South Africa. The measure of the correlation between malaria count and each of the climate variables at lag 0 is significant but not significant in models for the district municipalities as shown in Tables 1 and 2. The negative binomial distribution, also known as Poisson-Gamma mixture distribution is defined by its probability mass function as where µ i = t i µ and µ is the mean incidence rate of y per unit time t i . Suppose a random variable Y i follows the negative binomial distribution. Then its conditional expected value is where θ is the over-dispersion parameter and X 1 , X 2 , . . . , X p are predictor variables.
Negative binomial regression is used to model count data with the condition that the variance of the data is much greater than its mean. As a result, it is very good for over-dispersed count data. Negative binomial regression model for count data expresses µ in terms of explanatory variables. It is assumed in this study that the dispersion parameter θ takes the same value at all predictor values, following [25].
Suppose that events y 1 , y 2 , . . . , y n are identically distributed. Then the probability distribution of the zero-inflated negative binomial random variable can be expressed as is formulated for malaria counts of Mopani and Vhembe districts, where M t denotes daily malaria count, R t−k denotes daily rain amount at lag k, T ave t−k denotes daily average temperature at lag k and m t−k denotes simulated daily mosquito population at lag k. This model considers daily rain amount and its first K lagged values, average temperature and its first K lagged values and simulated daily mosquito population and its first K lagged values. It is assumed that patients are infected by mosquitoes before the day on which climate variables would be correlated with. It is noted that the incubation period of malaria within mosquito is 8-15 days depending on the daily temperature [26][27][28]. As a result, the value of K is taken to be 20.
It is observed that the time series structure makes malaria incidence counts dependent on each other. Ljung-Box test [29], a statistical test for determining whether any of a group of autocorrelations of a time series is different from zero, is employed to test if the residuals (η t ) of the zero-inflated negative binomial model (M t = E(M t |x i ) + η t ) are correlated. As a remedial measure, we suggest fitting a time series model based on autoregressive integrated moving average (ARIMA(p, d, q)) model to the residuals (η t ) of the fitted ZINB models following [30]. The ARIMA(p, d, q) on η t is defined as where p, d and q are orders of autoregressive, integrated and moving average parts respectively. Residuals, t , of the fitted ARIMA model on η t are uncorrelated. The choice of optimal values of p and q are based on the ARIMA(p, d, q) model with the least Akaike information criterion and root mean square of error. The parameters of ARIMA model are estimated by minimising sum of square of t using maximum likelihood estimation.

The Dynamical Mosquito Model
The importance of long-term data series in the analysis of climate impact on both mosquito abundance and malaria transmission have been highlighted in some studies [8,31,32]. However, long-term mosquito data are not easily accessible. For this reason, several studies [6][7][8]12] have used a deterministic model to simulate mosquito abundance over some regions. Similarly, due to the unavailability of mosquito data over the study regions, the present study used the dynamical model presented in the study of Abiodun et al. [8] to simulate abundance of An. arabiensis over Mopani and Vhembe. The climate-based model was developed to analyse how temperature and the availability of water affect mosquito population size. The model was validated over a town in eastern Sudan and was further used to investigate the influence of ambient temperature on the development and the mortality rate of An. arabiensis over Dondotha town in KwaZulu-Natal Province, South Africa. In particular, the model was used to examine the impact of climatic factors on the gonotrophic cycle and the dynamics of mosquito population over the study region. For details on the formulation of the mosquito model, we refer to Abiodun et al. [8].
The dynamical mosquito model was coded in MATLAB R2013b (MathWorks, Natick, MA, USA), while that of the regression model was handled by R programming language to implement methods in this paper. An R package pscl is used to implement zero-inflated negative binomial model, an R package forecast is used to implement autoregressive integrated moving average model and an R package tseries is used to implement Ljung-Box test and make plots of autocorrelation functions and partial autocorrelation functions.   Comparing the two study regions, Vhembe (in most days) seems hotter during the summer (December, January and February) months and cooler during the winter (June, July and August) months than Mopani although not statistically significant (p-value = 0.132). For instance, the black line (indicating Vhembe daily average temperature) is seen overlapping the green line (indicating Mopani daily average temperature) in most of the days (Figure 3a). However, the summer of Mopani was hotter than that of Vhembe in 2003 (p-value = 0.0033). The rainfall pattern shows that Vhembe generally experiences more rainfall than Mopani especially between 1998 and 2010 (p-value = 2.766 × 10 −6 ) ( Figure 3b). However, more rainfall is observed in Mopani than Vhembe between 2010 and 2014 (p-value = 3.614 × 10 −11 ). Although similar patterns of malaria cases are observed over the two regions, the cases are more noticeable over Vhembe than Mopani (Figure 3c). One reason traceable to this could be that the climate variables of Vhembe are more conducive for malaria transmission than that of Mopani [4]. Malaria cases in both regions are also higher throughout 2017 compared to previous years, but the cases are slightly higher in Mopani than Vhembe in May 2017. The total malaria cases over the study period in Mopani and Vhembe are about 28,811 and 55,037 respectively. Following the 2011 census [33,34], the incident rate per 100,000 people in Mopani is calculated to be approximately 2637.15, while that of Vhembe is 4250.87. The Wilcoxon rank sum test with continuity correction is applied to test if daily rain amount, as well as daily average temperature and simulated daily mosquito abundance, of Mopani and Vhembe districts, is significantly different. The daily rain amount of Mopani and Vhembe districts are not statistically significantly different (p-value = 0.8803) over the study period. The daily average temperature of Mopani and Vhembe districts are not statistically significantly different (p-value = 0.6754) over the study period. The simulated daily mosquito abundance of Vhembe district is statistically higher than that of Mopani district (p-value = 0.0002) over the study period.   Findings from the zero-inflated negative binomial regression model show that Mopani and Vhembe malaria incidence data are over-dispersed (Figure 4). This is because Mopani malaria count data has its variance (206.0995) greater than mean (4.0464). Similarly, Vhembe malaria count data has variance (201.0317) greater than mean (7.5342). Moreover, zero over-inflation of the malaria counts in both locations is evident in the figure as the number of days with no malaria count exceed the number of days with positive malaria count in each of the districts.

Analysis over Mopani District Municipality
A stepwise model selection procedure based on Akaike information criterion (AIC) was applied to drop models with highest AIC values in the fitted zero-inflated negative binomial model. The root mean square error (RMSE) of the full model for Mopani district, which is a measure of the deviation of observed malaria count from the fitted value, is 13.9049 while RMSE of the reduced model is 13.9137. The AIC value for the full model is 31,597.14, while the AIC value for the reduced model is 31,542.55. As a result, the reduced model is preferred for Mopani district.
The first block in Table 1 contains the count model coefficient and their standard error, z-score and p-value for each of the variables. The second block corresponds to the inflation model. The inflation model contains logit coefficients for predicting excess zeroes and the corresponding standard errors, z-scores and p-values for the coefficients. Table 1 presents the estimates of the zeroinflated negative binomial model (reduced model) for Mopani district. The coefficient of daily average temperature at lag 18 in the negative binomial regression part predicting the malaria count is statistically significant at 5% level of significance. The coefficients of daily rain amount at lag 9 and lag 16, daily average temperature at lag 9, lag 10, lag 12, lag 15 and lag 18, simulated daily mosquito population at lag 9, lag 10 and lag 20 in the logit model part predicting excessive zeroes are statistically significant. Other predictor variables are not statistically significant and are, therefore, excluded in the model. It is desirable to know whether zero-inflated negative binomial regression model fits the data statistically better than usual negative binomial regression model. The Vuong test [35] is employed to determine whether the formulated model (zero-inflated negative binomial regression model) fits the data better than the usual negative binomial regression model. The Vuong test is the likelihood-ratio-based test for model selection using the Kullback-Leibler information criterion. The test suggests that the zero-inflated negative binomial model is a significant improvement over a standard negative binomial model. The Vuong statistic tests the null hypothesis that the formulated zero-inflated negative binomial model and the negative binomial model are equally close to the true data generating process, against the alternative that the formulated zeroinflated negative binomial model is closer. The Vuong test is asymptotically distributed as a standard normal distribution (that is, N (0,1)) under the null hypothesis that the models are equivalent. The test rejects the null hypothesis at 5% level of significance (p-value < 2.22 × 10 −16 ) and suggests that zero-inflated negative binomial model with lagged predictors fits the data better than the usual negative binomial regression model.
The number of malaria cases decreases by a factor of 0.9742 for a one-unit increase in daily average temperature at lag 18 when other variables are held constant. This implies that it is much

Analysis over Mopani District Municipality
A stepwise model selection procedure based on Akaike information criterion (AIC) was applied to drop models with highest AIC values in the fitted zero-inflated negative binomial model. The root mean square error (RMSE) of the full model for Mopani district, which is a measure of the deviation of observed malaria count from the fitted value, is 13.9049 while RMSE of the reduced model is 13.9137. The AIC value for the full model is 31,597.14, while the AIC value for the reduced model is 31,542.55. As a result, the reduced model is preferred for Mopani district.
The first block in Table 1 contains the count model coefficient and their standard error, z-score and p-value for each of the variables. The second block corresponds to the inflation model. The inflation model contains logit coefficients for predicting excess zeroes and the corresponding standard errors, z-scores and p-values for the coefficients. Table 1 presents the estimates of the zero-inflated negative binomial model (reduced model) for Mopani district. The coefficient of daily average temperature at lag 18 in the negative binomial regression part predicting the malaria count is statistically significant at 5% level of significance. The coefficients of daily rain amount at lag 9 and lag 16, daily average temperature at lag 9, lag 10, lag 12, lag 15 and lag 18, simulated daily mosquito population at lag 9, lag 10 and lag 20 in the logit model part predicting excessive zeroes are statistically significant. Other predictor variables are not statistically significant and are, therefore, excluded in the model. It is desirable to know whether zero-inflated negative binomial regression model fits the data statistically better than usual negative binomial regression model. The Vuong test [35] is employed to determine whether the formulated model (zero-inflated negative binomial regression model) fits the data better than the usual negative binomial regression model. The Vuong test is the likelihood-ratio-based test for model selection using the Kullback-Leibler information criterion. The test suggests that the zero-inflated negative binomial model is a significant improvement over a standard negative binomial model. The Vuong statistic tests the null hypothesis that the formulated zero-inflated negative binomial model and the negative binomial model are equally close to the true data generating process, against the alternative that the formulated zero-inflated negative binomial model is closer. The Vuong test is asymptotically distributed as a standard normal distribution (that is, N (0,1)) under the null hypothesis that the models are equivalent. The test rejects the null hypothesis at 5% level of significance (p-value < 2.22 × 10 −16 ) and suggests that zero-inflated negative binomial model with lagged predictors fits the data better than the usual negative binomial regression model.
The number of malaria cases decreases by a factor of 0.9742 for a one-unit increase in daily average temperature at lag 18 when other variables are held constant. This implies that it is much likely to have any malaria cases as the daily average temperature at lag 9, lag 12 and lag 14 increase. The odds of being an excessive zero would decrease by 0.9404, 0.9335, 0.8481, 0.8872, 0.8668, 0.9242, 0.8729 and  0.9455 for every one-unit increase in daily rain amount at lag 9 and lag 16, daily average temperature at lag 9, lag 10, lag 12, lag 15 and lag 18, and simulated daily mosquito at lag 9 respectively. Increase in the odds of being an excessive zero means that it is less likely that there will be malaria cases. This implies that the likelihood that daily malaria count would be zero in Mopani district municipality decreases with an increase in daily rain amount at lag 9 and lag 16, daily average temperature at lag 9, lag 10, lag 12, lag 15 and lag 18, and simulated daily mosquito at lag 9. Moreover, the log odds of being an excessive zero would increase by 1.0366 and 1.0195 for every one-unit increase in the simulated daily mosquito at lag 10 and lag 20, respectively.

Analysis over Vhembe District Municipality
A stepwise model selection procedure based on Akaike information criterion (AIC) was applied to drop models with highest AIC values in the fitted zero-inflated negative binomial model for Vhembe district. The RMSE of the full model for Vhembe district is 13.7776 while RMSE of the reduced model is 13.79789. The AIC value for the full model is 42,218.47, while the AIC value for the reduced model is 42,232.6. As a result, the reduced model is preferred for Vhembe district. Table 2 presents the estimates of the zero-inflated negative binomial model (reduced model) for Vhembe district. The coefficients of daily average temperature at lag 9, lag 12 and lag 14, simulated daily mosquito population at lag 20 in the count model predicting daily malaria count are statistically significant at 5% level of significance. The coefficients of daily average temperature at lag 10, lag 12 and lag 14, and simulated daily mosquito population at lag 9 and lag 15 in the logit model part predicting excessive zeroes are statistically significant. Other predictors are not statistically significant and are therefore excluded from the model. The Vuong test is also employed to determine whether a negative binomial regression model fits the Vhembe district malaria data statistically better than the formulated zero-inflated negative binomial regression model. The test rejects the null hypothesis at 5% level of significance (p-value < 2.22 × 10 −16 ) and suggests that zero-inflated negative binomial regression model fits the data better than the negative binomial regression model.
The number of malaria cases increases by 1.0247, 1.0189 and 1.0151 for a one-unit increase in daily average temperature at lag 9, lag 12 and lag 14, respectively, when other variables are held constant. This implies that it is more likely to have any malaria cases as the daily average temperature at lag 9, lag 12 and lag 14 increase. The number of malaria cases decreases by a factor of 0.9979 for a one-unit increase in simulated daily mosquito population at lag 20 when other variables are held constant. This implies that it is less likely to have any malaria cases as the daily average temperature at lag 18 increase. The odds of being an excessive zero would decrease by 0.7965, 0.8848, 0.8364 and 0.9541 for every one-unit increase in daily average temperature at lag 10, daily average temperature at lag 12, daily average temperature at lag 14 and simulated daily mosquito population at lag 9 respectively. This implies that the likelihood that daily malaria count would be zero in Vhembe district municipality decreases with an increase in daily average temperature at lag 10, daily average temperature at lag 12, daily average temperature at lag 14 and simulated daily mosquito population at lag 9. Moreover, the odds of being an excessive zero would increase by a factor of 1.0296 for every one-unit increase in the simulated daily mosquito population at lag 15.
The dispersion parameter θ in Tables 1 and 2 gives an indication if zero-inflated negative binomial model is fit for the data. If θ approaches infinity, then variance equals mean and as a result, zero-inflated Poisson model will fit the data better. Additionally, θ is finite implies that the variance is greater than mean. As θ approaches 0, the farther the variance is from the mean. Exponentiating log(θ) in Tables 1  and 2, the values of θ are 0.4292 and 0.6257 for Mopani and Vhembe districts. Hence, the zero-inflated negative binomial model is appropriate for the model and confirm the result in Section 3.1.
This complements the findings of previous studies. It was argued in [36] that a moderate transmission intensity climate is crucial to malaria transmission. Based on the findings of [37,38] concluded that climate predictor variables generate a better predictive power when modelling malaria incidence in areas with unstable transmission compared to areas with stable endemicity. However, [36] shows that the development of clinical immunity buffers any effect of climate under high endemicity. In addition, [18] showed that there is a statistically significant correspondence between malaria rates and the climate variables, mostly air temperature and precipitation. This is confirmed in the fitted models for malaria incidence in Mopani and Vhembe districts. An increase in daily average temperature and its lagged values significantly raise the chance of malaria transmission and thereby leads to an increase in malaria incidence in Vhembe district. Furthermore, an increase in rainfall amount at lags 9 and 16 increases the probability of malaria cases occurring in the Mopani district. This is in line with several other studies that have highlighted the importance of rainfall on malaria transmission and other infectious diseases in western Kenya [39], Tanzania [40], East Africa [41] and Ghana [42].
Ljung-Box test [29] is employed to test if the residuals (η t ) of the zero-inflated negative binomial model (M t = E(M t |x i ) + η t ) are correlated. The Ljung-Box test shows that residuals of a fitted model for each of Mopani district (p-value < 2.2 × 10 −16 ) and Vhembe district (p-value < 2.2 × 10 −16 ) are autocorrelated. This confirms the result of plots of the autocorrelation function and partial autocorrelation function in Figure 5. The η t achieves stationarity at d = 1. The optimal models for η t are ARIMA(5,1,4) and ARIMA(2,1,1) for Mopani and Vhembe district municipalities, respectively.

Mosquito Abundance and Malaria Cases of Mopani and Vhembe
Findings further highlight the importance of mosquitoes in the transmission of malaria ( Figure 9). Results also show that abundance of An. arabiensis is positively correlated with malaria transmission over the two study regions (Figure 9). The measure of the correlation (Spearman's rank correlation coefficients) between mosquito abundance and malaria count is 0.0835 (p-value = 9.515 × 10 −13 ) in Mopani district while the measure of the correlation between mosquito abundance and malaria count is 0.1655 (p-value < 2.2 ×10 −16 ) in Vhembe district. However, findings show that transmission is possible over the study regions even with temperate amount of An. arabiensis. For instance, over Mopani, malaria cases maintain a steady increase from 0 to almost 250 even below estimated 60,000 An. arabiensis (Figure 9a). Similarly, with just about 50,000 An. arabiensis, malaria cases went up to 350 in Vhembe (Figure 9b). This is also an indication that the impact of other malaria vectors over the study regions cannot be overlooked. In other words, all control measures to eradicate malaria over these regions should target An. arabiensis and other malaria-transmitting vectors. Although it has been established that An. arabiensis is the primary malaria vector in South Africa [32], the findings here suspect the presence of additional mosquito species transmitting malaria over the study regions as recently found in KwaZulu-Natal and Mpumalanga province [32]. This is also in line with the findings of [5] where several other mosquito species were found across five different regions in Limpopo province [5].
been established that An. arabiensis is the primary malaria vector in South Africa [32], the findings here suspect the presence of additional mosquito species transmitting malaria over the study regions as recently found in KwaZulu-Natal and Mpumalanga province [32]. This is also in line with the findings of [5] where several other mosquito species were found across five different regions in Limpopo province [5].

Conclusions
In this study, the importance of climate variables on population dynamics of An. arabiensis and malaria transmission over Mopani and Vhembe (two epidemic regions in Limpopo) was investigated. In particular, a zero-inflated negative binomial regression model was formulated for predicting daily counts of malaria incidence in the two regions as a function of these variables. Results from the study show that daily average temperature, rain amount and simulated daily mosquito population at various lags affects the probability of having malaria count in the Mopani and Vhembe district municipalities. The time series structure of the data from the two district municipalities makes each of the malaria incidence count, simulated daily mosquito population and climate variables autocorrelated. Time series models based on autoregressive integrated moving average (ARIMA) are employed on the residuals of zero-inflated negative binomial models as a remedial measure. This gives better predictive models and the associated residuals are not autocorrelated, as supported by the Ljung-Box test.
In general, since there are no exceptional variations in the climate variables in 2017 (for example, daily average temperature (p-value = 0.0868), daily rain amount (p-value = 0.0867)), the sudden increase of the cases might not totally depend on climate. There could be other factors associated with the increase around this period. It could also be that malaria control activities were relaxed during this period as suggested by the National Institute for Communicable Diseases (NICD) (NICD update, 2017). A further reason could be that malaria transmission started in more areas in both study regions.
Due to unavailability of actual mosquito data over the study regions, the present study considered simulated mosquito data for its analyses. It is envisaged that actual data would produce more precise results in this type of study.